diff --git a/COPYING b/COPYING index dc63aaca..1d1e693a 100644 --- a/COPYING +++ b/COPYING @@ -1,340 +1,16 @@ +Hypermail LICENCE - GNU GENERAL PUBLIC LICENSE - Version 2, June 1991 +Hypermail is distributed under the GNU GPL license (see the file +LICENSE.txt for details). Some programs that are distributed with it +in the archive and contrib directories have different licenses - check +the individual files for details. - Copyright (C) 1989, 1991 Free Software Foundation, Inc. - 675 Mass Ave, Cambridge, MA 02139, USA - Everyone is permitted to copy and distribute verbatim copies - of this license document, but changing it is not allowed. +Regular expression support is provided by the PCRE library package, +which is open source software, written by Philip Hazel, and copyright +by the University of Cambridge, England. See http://www.pcre.org/. - Preamble +Validation of UTF-8 strings is provided by the utf8.h package, which +is open source software covered by The Unlicense, written by Neil +Henning. See https://github.com/sheredom/utf8.h - The licenses for most software are designed to take away your -freedom to share and change it. By contrast, the GNU General Public -License is intended to guarantee your freedom to share and change free -software--to make sure the software is free for all its users. This -General Public License applies to most of the Free Software -Foundation's software and to any other program whose authors commit to -using it. (Some other Free Software Foundation software is covered by -the GNU Library General Public License instead.) You can apply it to -your programs, too. - - When we speak of free software, we are referring to freedom, not -price. 
Our General Public Licenses are designed to make sure that you -have the freedom to distribute copies of free software (and charge for -this service if you wish), that you receive source code or can get it -if you want it, that you can change the software or use pieces of it -in new free programs; and that you know you can do these things. - - To protect your rights, we need to make restrictions that forbid -anyone to deny you these rights or to ask you to surrender the rights. -These restrictions translate to certain responsibilities for you if you -distribute copies of the software, or if you modify it. - - For example, if you distribute copies of such a program, whether -gratis or for a fee, you must give the recipients all the rights that -you have. You must make sure that they, too, receive or can get the -source code. And you must show them these terms so they know their -rights. - - We protect your rights with two steps: (1) copyright the software, and -(2) offer you this license which gives you legal permission to copy, -distribute and/or modify the software. - - Also, for each author's protection and ours, we want to make certain -that everyone understands that there is no warranty for this free -software. If the software is modified by someone else and passed on, we -want its recipients to know that what they have is not the original, so -that any problems introduced by others will not reflect on the original -authors' reputations. - - Finally, any free program is threatened constantly by software -patents. We wish to avoid the danger that redistributors of a free -program will individually obtain patent licenses, in effect making the -program proprietary. To prevent this, we have made it clear that any -patent must be licensed for everyone's free use or not licensed at all. - - The precise terms and conditions for copying, distribution and -modification follow. 
- - GNU GENERAL PUBLIC LICENSE - TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION - - 0. This License applies to any program or other work which contains -a notice placed by the copyright holder saying it may be distributed -under the terms of this General Public License. The "Program", below, -refers to any such program or work, and a "work based on the Program" -means either the Program or any derivative work under copyright law: -that is to say, a work containing the Program or a portion of it, -either verbatim or with modifications and/or translated into another -language. (Hereinafter, translation is included without limitation in -the term "modification".) Each licensee is addressed as "you". - -Activities other than copying, distribution and modification are not -covered by this License; they are outside its scope. The act of -running the Program is not restricted, and the output from the Program -is covered only if its contents constitute a work based on the -Program (independent of having been made by running the Program). -Whether that is true depends on what the Program does. - - 1. You may copy and distribute verbatim copies of the Program's -source code as you receive it, in any medium, provided that you -conspicuously and appropriately publish on each copy an appropriate -copyright notice and disclaimer of warranty; keep intact all the -notices that refer to this License and to the absence of any warranty; -and give any other recipients of the Program a copy of this License -along with the Program. - -You may charge a fee for the physical act of transferring a copy, and -you may at your option offer warranty protection in exchange for a fee. - - 2. 
You may modify your copy or copies of the Program or any portion -of it, thus forming a work based on the Program, and copy and -distribute such modifications or work under the terms of Section 1 -above, provided that you also meet all of these conditions: - - a) You must cause the modified files to carry prominent notices - stating that you changed the files and the date of any change. - - b) You must cause any work that you distribute or publish, that in - whole or in part contains or is derived from the Program or any - part thereof, to be licensed as a whole at no charge to all third - parties under the terms of this License. - - c) If the modified program normally reads commands interactively - when run, you must cause it, when started running for such - interactive use in the most ordinary way, to print or display an - announcement including an appropriate copyright notice and a - notice that there is no warranty (or else, saying that you provide - a warranty) and that users may redistribute the program under - these conditions, and telling the user how to view a copy of this - License. (Exception: if the Program itself is interactive but - does not normally print such an announcement, your work based on - the Program is not required to print an announcement.) - -These requirements apply to the modified work as a whole. If -identifiable sections of that work are not derived from the Program, -and can be reasonably considered independent and separate works in -themselves, then this License, and its terms, do not apply to those -sections when you distribute them as separate works. But when you -distribute the same sections as part of a whole which is a work based -on the Program, the distribution of the whole must be on the terms of -this License, whose permissions for other licensees extend to the -entire whole, and thus to each and every part regardless of who wrote it. 
- -Thus, it is not the intent of this section to claim rights or contest -your rights to work written entirely by you; rather, the intent is to -exercise the right to control the distribution of derivative or -collective works based on the Program. - -In addition, mere aggregation of another work not based on the Program -with the Program (or with a work based on the Program) on a volume of -a storage or distribution medium does not bring the other work under -the scope of this License. - - 3. You may copy and distribute the Program (or a work based on it, -under Section 2) in object code or executable form under the terms of -Sections 1 and 2 above provided that you also do one of the following: - - a) Accompany it with the complete corresponding machine-readable - source code, which must be distributed under the terms of Sections - 1 and 2 above on a medium customarily used for software interchange; or, - - b) Accompany it with a written offer, valid for at least three - years, to give any third party, for a charge no more than your - cost of physically performing source distribution, a complete - machine-readable copy of the corresponding source code, to be - distributed under the terms of Sections 1 and 2 above on a medium - customarily used for software interchange; or, - - c) Accompany it with the information you received as to the offer - to distribute corresponding source code. (This alternative is - allowed only for noncommercial distribution and only if you - received the program in object code or executable form with such - an offer, in accord with Subsection b above.) - -The source code for a work means the preferred form of the work for -making modifications to it. For an executable work, complete source -code means all the source code for all modules it contains, plus any -associated interface definition files, plus the scripts used to -control compilation and installation of the executable. 
However, as a -special exception, the source code distributed need not include -anything that is normally distributed (in either source or binary -form) with the major components (compiler, kernel, and so on) of the -operating system on which the executable runs, unless that component -itself accompanies the executable. - -If distribution of executable or object code is made by offering -access to copy from a designated place, then offering equivalent -access to copy the source code from the same place counts as -distribution of the source code, even though third parties are not -compelled to copy the source along with the object code. - - 4. You may not copy, modify, sublicense, or distribute the Program -except as expressly provided under this License. Any attempt -otherwise to copy, modify, sublicense or distribute the Program is -void, and will automatically terminate your rights under this License. -However, parties who have received copies, or rights, from you under -this License will not have their licenses terminated so long as such -parties remain in full compliance. - - 5. You are not required to accept this License, since you have not -signed it. However, nothing else grants you permission to modify or -distribute the Program or its derivative works. These actions are -prohibited by law if you do not accept this License. Therefore, by -modifying or distributing the Program (or any work based on the -Program), you indicate your acceptance of this License to do so, and -all its terms and conditions for copying, distributing or modifying -the Program or works based on it. - - 6. Each time you redistribute the Program (or any work based on the -Program), the recipient automatically receives a license from the -original licensor to copy, distribute or modify the Program subject to -these terms and conditions. You may not impose any further -restrictions on the recipients' exercise of the rights granted herein. 
-You are not responsible for enforcing compliance by third parties to -this License. - - 7. If, as a consequence of a court judgment or allegation of patent -infringement or for any other reason (not limited to patent issues), -conditions are imposed on you (whether by court order, agreement or -otherwise) that contradict the conditions of this License, they do not -excuse you from the conditions of this License. If you cannot -distribute so as to satisfy simultaneously your obligations under this -License and any other pertinent obligations, then as a consequence you -may not distribute the Program at all. For example, if a patent -license would not permit royalty-free redistribution of the Program by -all those who receive copies directly or indirectly through you, then -the only way you could satisfy both it and this License would be to -refrain entirely from distribution of the Program. - -If any portion of this section is held invalid or unenforceable under -any particular circumstance, the balance of the section is intended to -apply and the section as a whole is intended to apply in other -circumstances. - -It is not the purpose of this section to induce you to infringe any -patents or other property right claims or to contest validity of any -such claims; this section has the sole purpose of protecting the -integrity of the free software distribution system, which is -implemented by public license practices. Many people have made -generous contributions to the wide range of software distributed -through that system in reliance on consistent application of that -system; it is up to the author/donor to decide if he or she is willing -to distribute software through any other system and a licensee cannot -impose that choice. - -This section is intended to make thoroughly clear what is believed to -be a consequence of the rest of this License. - - 8. 
If the distribution and/or use of the Program is restricted in -certain countries either by patents or by copyrighted interfaces, the -original copyright holder who places the Program under this License -may add an explicit geographical distribution limitation excluding -those countries, so that distribution is permitted only in or among -countries not thus excluded. In such case, this License incorporates -the limitation as if written in the body of this License. - - 9. The Free Software Foundation may publish revised and/or new versions -of the General Public License from time to time. Such new versions will -be similar in spirit to the present version, but may differ in detail to -address new problems or concerns. - -Each version is given a distinguishing version number. If the Program -specifies a version number of this License which applies to it and "any -later version", you have the option of following the terms and conditions -either of that version or of any later version published by the Free -Software Foundation. If the Program does not specify a version number of -this License, you may choose any version ever published by the Free Software -Foundation. - - 10. If you wish to incorporate parts of the Program into other free -programs whose distribution conditions are different, write to the author -to ask for permission. For software which is copyrighted by the Free -Software Foundation, write to the Free Software Foundation; we sometimes -make exceptions for this. Our decision will be guided by the two goals -of preserving the free status of all derivatives of our free software and -of promoting the sharing and reuse of software generally. - - NO WARRANTY - - 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY -FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN -OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES -PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED -OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF -MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS -TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE -PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, -REPAIR OR CORRECTION. - - 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING -WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR -REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, -INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING -OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED -TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY -YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER -PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE -POSSIBILITY OF SUCH DAMAGES. - - END OF TERMS AND CONDITIONS - - Appendix: How to Apply These Terms to Your New Programs - - If you develop a new program, and you want it to be of the greatest -possible use to the public, the best way to achieve this is to make it -free software which everyone can redistribute and change under these terms. - - To do so, attach the following notices to the program. It is safest -to attach them to the start of each source file to most effectively -convey the exclusion of warranty; and each file should have at least -the "copyright" line and a pointer to where the full notice is found. - - - Copyright (C) 19yy - - This program is free software; you can redistribute it and/or modify - it under the terms of the GNU General Public License as published by - the Free Software Foundation; either version 2 of the License, or - (at your option) any later version. 
- - This program is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - GNU General Public License for more details. - - You should have received a copy of the GNU General Public License - along with this program; if not, write to the Free Software - Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. - -Also add information on how to contact you by electronic and paper mail. - -If the program is interactive, make it output a short notice like this -when it starts in an interactive mode: - - Gnomovision version 69, Copyright (C) 19yy name of author - Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. - This is free software, and you are welcome to redistribute it - under certain conditions; type `show c' for details. - -The hypothetical commands `show w' and `show c' should show the appropriate -parts of the General Public License. Of course, the commands you use may -be called something other than `show w' and `show c'; they could even be -mouse-clicks or menu items--whatever suits your program. - -You should also get your employer (if you work as a programmer) or your -school, if any, to sign a "copyright disclaimer" for the program, if -necessary. Here is a sample; alter the names: - - Yoyodyne, Inc., hereby disclaims all copyright interest in the program - `Gnomovision' (which makes passes at compilers) written by James Hacker. - - , 1 April 1989 - Ty Coon, President of Vice - -This General Public License does not permit incorporating your program into -proprietary programs. If your program is a subroutine library, you may -consider it more useful to permit linking proprietary applications with the -library. If this is what you want to do, use the GNU Library General -Public License instead of this License. 
+End diff --git a/Changelog b/Changelog index 25fb4c22..636c33f8 100644 --- a/Changelog +++ b/Changelog @@ -2,26 +2,489 @@ Version Changes for Hypermail ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ============================ -HYPERMAIL VERSION 2.4.1: +HYPERMAIL VERSION 3.0.0 ============================ -2022-11-04 Jose Kahan - * src/Makefile.in - Some C linkers require libraries to be declared after functions in that - library are called, and not before. This was notably with -lm and - -ltrio, where -lm had to be declared after -ltrio. - (closes # 84) +2024-11-07 Jose Kahan + * configure.ac, configure + Improve detection of libchardet (system, local dir) and enable + it by default + + * pkgconf, libiconv + Make pkgconf and libiconv mandatory for compiling hypermail + +2024-11-06 Jose Kahan + * src/hypermail.c + If hypermail was called with its default configuration values, + set_labels was given a static value, which resulted in a SIGSEGV + when hypermail tried to do clean-up when ending its run. + +2023-00-00 Jose Kahan + * src/parse.c + Some old versions of mail clients like Pine and Thunderbird URL-escaped + the <> that separate the In-Reply-To and the first References header + values. This kept hypermail from finding those values and generating an + In-Reply-To link. The parser now unescapes those header values before + further processing. + + * src/parse.c + If a multipart/* message had an RFC2046 MIME preamble with a line + having "--\n" or "--\s\n", the parser would assume there was an issue + with the boundaries and would stop parsing the rest of the message + parts. + + * src/parse.c + For attachment filenames, if the line was folded after the attribute + and the value itself (quoted or not), parsing the filename could result + in a single space filename. + + Spaces in attachment filenames were not being replaced with + REPLACEMENT_CHAR (_) when the filename was made of only spaces. 
+ + * src/{parse.c, string.c, proto.h} + Some old messages or broken UA used "DEFAULT_CHARSET" or "foo_CHARSET" + as a value for the charset attribute. We now filter out those broken + values from the charset value. In the case of "DEFAULT_CHARSET", the + end result will be equivalent to not having a charset attribute. In + that case, the parser will either use the charset associated with + another header value or the first body part or, by default, use + US-ASCII. + + * src/{parse.c, print.c, hypermail.h} + Fixes an issue where parsing a message with one or more invalid email + headers could make hypermail crash. If hypermail detects an invalid + header, it will mark it as such and skip it during the rest of the + parsing and markup generation. Some examples of invalid headers + are those missing a header name, value, ':\s', etc. + + * src/{parse.c, string.c, proto.h} configure.in + Extend mdecode_RFC2047() so that: + 1) it partially supports RFC6532 + 2) if a header value is not valid UTF-8, + replaces it with "(invalid string)" + 3) if a header value is given as a binary unencoded string, + it tries to detect the charset with libchardet. In case of failure, + it will replace the header value with "(invalid string)" + + * src/parse.c + Fixes an error when a multipart/alternative message is missing + its end boundary immediately followed by another message that + has a multipart/alternative. + + * src/{hypermail.h, print.c} + Two enhancements to simplify the understanding of complex structured + messages when using screen readers. + + For forwarded messages, appending a forwarded message counter + to identify each message and its optional list of stored + attachments. + + Reduced the text "List of stored attachments" to "Attachments for + message nesting level-sequence" for forwarded messages and just + "Attachments" for the root message. 
+ + * src/parse.c + Filenames were not being taken into account for text/plain attachments + with Content-Disposition: attachment and a filename given only in + the Content-Type name attribute. + + * aclocal.m4 moved to m4/apr.m4 + We were wrongly storing our local autoconf macros in aclocal.m4. That + file is reserved for aclocal and the autotools. We moved our macros to + m4/apr.m4 (apr is where we borrowed those macros from). + + * src/print.c + Integrate @bert-github's markup improvements for forwarded messages. + Replace
with
, move the

forwarded message from + below the corresponding
parent to article to below
+ and move the aria-labelled-by attribute from that
to +
. + + * src/{hypermail.h, string.c} + Increase the max size for a scanned url string to 4096 to reflect + modern trends. + + * src/parse.c + parse_old_html() was not unconverting all the protected + HTML entities (such as –) from the comments in a + hypermail archive directory. + + * src/{printfile.c, lang.h} + Associate title="Normal view" to the default stylesheet to take + into account someone adding alternate css files by means + of hypermail's configuration directives. + + * src/{parse.c, setup.c, setup.h} + New configuration option ignore_content_disposition to be able to + ignore the Content-Disposition header for some Mime-Types. This is + particularly useful for old Apple Mail messages that associated + Content-Disposition: attachment with multipart/appledouble. + + * src/parse.c + When parse_old_html() was called with cmp_msgid to check if the first + message in the archive corresponds to the first one that is added, + the comparison failed if the msgid had characters that had to be + escaped inside xml comments. + + * src/{parse.c, struct.c, struct.h} + In complex messages where a multipart/mixed had a message/rfc822 part, + with the latter having multipart/mixed parts, parsing the message could + fail if there were one or more missing boundary end separators. + + * src/parse.c + If a multipart/mixed attachment had a Content-Disposition header + that didn't give a filename, the parser wasn't checking if the + Content-Type header had a name attribute it could use for the + attachment file. 
+ + * src/print.c + Although description strings for attachments are stored in UTF-8, + they were not being converted to the preferred charset for message + files, which led to invalid chars appearing in messages + + * src/print.c + Special chars in subjects in forwarded messages were not being + escaped + + * config.guess, config.sub + Updated from https://git.savannah.gnu.org/gitweb/?p=config.git + + * configure.ac, configure + Rewrote the rules for supporting user-defined paths / system paths for + the gdbm library + + * removed acconfig.h from FILES and from the repository. + Not needed anymore by autoconf. + + * src{print.c, printfile.c, struct.c} + Removed warnings, errors when compiling without iconv + + * src{*.c, *.h} + Added missing copyright, GPL licence where needed + Added protection against multiple includes of .h files + + * src/{hypermail.c, setup.c}, docs/{hypermail.html, hypermail.1, + hmrc.html, hmrc.4} + Removed the -T option from the doc, the hypermail -h page and other + places where the option was mentioned. This option + was deprecated and removed in 2.2.25 but it was still documented. + Removed the -t option from the hypermail -h page. This option + was deprecated in 3.0.0 and is now ignored. + + * src/parse.c + fixnextheader(), fixreplyheader(), and fixthreadheader() were + not freeing memory allocated via i18n_utf2numref to + numsubject and numname + + * src/string.c + translateurl() was not URL-encoding UTF-8 chars correctly + makeinreplytocommand() was not calling translateurl for the subject + + * src/parse.c + Hypermail leaked memory if a message had more than one of + the following headers: Subject:, Date:, From:, Message-Id:. + Hypermail now only takes into account the first header + it finds, and ignores the duplicates. Note that multiple + Message-Ids will be rendered, but only the first one will + be taken into account internally (e.g., for threads). 
+ + * src/{hypermail.h, struct.c} + When a message/rfc822 body part contained only a stored + attachment (e.g., text/html), even if the attachment + was being added to the attachments dir, both the message/rfc822 + headers and the link to the stored attachment were not being + added to the resulting markup and generated a memory leak. + + * src/parse.c + Some old mail clients could associate a message/rfc822 + Content-Type with a BASE64 Content-Transfer-Encoding, which + violates RFC2046. If we find this case, we now force + ENCODE_NORMAL (the one used for both 7bit and 8bit). + + * src/{setup.c, setup.h} + new conf. option warn_deprecated_options to control whether we + want hypermail to display warnings when a configuration file + has options that have been deprecated or that we plan to deprecate + in the next release. + + * src/{defaults.h.in, setup.c, hypermail.c} + set a default breadcrumb for messages if neither the + mhtmlnavbar2upfile nor the ihtmlnavbar2upfile config option is set + + * src/{parse.c, struct.c, hypermail.h} + spamify() was being systematically applied to boundary-ids which broke + boundary detection in multipart/ messages whenever a boundary-id had a + '@' character. + closes #94 + + * src/{string.c, proto.h, struct.c, parse.c} + When parsing message headers having DOS-style line + endings ('\r\n'), the '\r' was not being dropped in most + of the headers. Expanded the solution proposed in PR#98 + closes #95 + + * src/{setup.c, setup.h, parse.c}, doc/{hmrc.html, hmrc.4} + New conf. option max_attachments_per_msg. Used for limiting + the number of attachments we want to parse for messages. + See hmrc.html, hmrc.4 or hypermail -v for more info. + + * src/{parse.c, string.c, proto.h} + Some malformed messages could insert a \r followed by some + text between the Content-Type or Content-Type parameters, + their values and the end of the line or the ';'. Hypermail + now detects these values and filters them out. 
+ + * src/mail.c libcgi configure.ac Makefile.in src/Makefile.in + Purged the years old deprecated mail script and its associated + libraries. Although the script itself had been disabled + by code changes, its code base was still being built and + distributed with hypermail, even if it did nothing anymore. + + * src/{hypermail.c, print.c, setup.c, setup.h} + When reading a multi-message mbox that is made only of messages that + are annotated as 'spam' or 'deleted', hypermail will now print + an "empty archive" notice in all the generated indices. You can + customize the markup and text of that notice with the + 'empty_archive_notice' configuration option. + Note that this behavior is disabled when calling hypermail with the -u + option (upgrade existing archive). + + * src/getdate.y + Fixed bison warning "POSIX yacc reserves %type to nonterminals [-Wyacc]" + by adapting the solution proposed for kerberos5: + https://github.com/krb5/krb5/commit/0108d7d7fbb1111c062ac580e69e97103662fc2b + + * many files + To prune out potential errors and portability issues, compiled + hypermail with the gcc flags listed below and removed most of the + compiler warnings. The few that remain are mostly implicit + declaration of functions fileno(), symlink(), and lstat() whose + prototypes have switched between glib and posix and 'const' qualifier + being discarded in utf8.h, a 3rd party lib. Need more tests on + different platforms to improve hypermail's configure.ac. This has no + impact when compiling hypermail on debian with gcc either without these + warnings or with -Wall and with -O2. + + -Wall -ansi -pedantic -Wno-overlength-strings -Wshadow -Wpointer-arith + -Wcast-qual -Wcast-align -Waggregate-return -Wstrict-prototypes + -Wmissing-prototypes -Wnested-externs -Winline + + * src/getname.c + When a From: header had two email addresses, hypermail was + always extracting the first one, regardless if it was in + a comment, while the 2nd one was the valid one. 
In some + cases that address could also include enclosing parentheses + Closes #85 + + * src/setup.c + Marked config options finelinks, linkquotes as "deprecated, with + intention to remove" due to corresponding code not being maintained + anymore. + + New conf. option show_headers_msg_rfc822. Useful if you want to + fine-tune which headers you want to be displayed in message/rfc822 + attachments. If not declared, hypermail will use + show_headers_msg for message/rfc822 attachments. + + * src/lock.c + unlock_archive() wasn't removing the lockfile + + * src/string.c + fixed a memory leak after calling obfuscated_email_address() + + * many files + Reworked the parser so that it now produces a tree that represents + a message to have better parsing of multipart/mixed and message/rfc822 + messages, choosing a charset for the whole message, and, most + importantly, separate the identification of the logic structures + of a message from the generation of markup. Once a message is parsed + and processed, the tree will then be flattened to the structure the + print modules are used to, but including now markers for the beginning + and end of sections, etc. + +2023-05-16 Andy Valencia (@vandys) + * src/{parse.c, base64.h, base64.c} + The BASE64 decoder didn't take into account lines that were not + multiples of four and this resulted in a corrupted decode. Changed the + decoder so that state can be preserved between subsequent calls. + Closes #96 + +2023-05-12 @shlomif + * README + Fixed spelling errors + PR#87 + +2022-10-26 Jose Kahan + * src/getname.c + Remove a potential out-of-bounds write issue + +2022-04-07 Jose Kahan + * src/parse.c + When parsing a mailbox, the code for skipping MIME epilogues wasn't + detecting the start of the next message. Thanks to James Riordon and + Christof Meerwald for reporting and pointing out where the issue was + (closes #82). 
+ +2022-04-04 Jose Kahan + * src/string.c + We're now using iconv's transliteration flag when converting a string + from UTF-8 to another charset. From the man page: "this means that when + a character cannot be represented in the target character set, it can + be approximated through one or several similarly looking characters." + +2022-04-01 Jose Kahan + * src/string.c src/utf8.h + Integrated utf8.h to provide validation of UTF-8 strings + + When i18n_convstring is called to convert from UTF-8 to UTF-8, we + skip the conversion, but we now replace invalid UTF-8 characters + with a '?' character. + +2022-03-30 Jose Kahan + * src/parse.c + The demimed flag was not being updated when decoding RFC2047 lines + + * src/hypermail.h + Increase NAMESTRLEN from 80 to 320 to take into account that a single + UTF-8 character may be encoded in up to four bytes. + + * src/string.c src/parse.c src/proto.h + If a header value is not using RFC2047, make sure its value consists + of only US-ASCII printable characters plus newline. + +2022-01-15 Christof Meerwald + * src/print.c + Fixed a format string issue leading to a segfault or worse (fixes #81) 2021-11-17 David Hughes * docs/hmrc.html src/parse.c - Added support for strftime (3) formatting in the append_filename configuration - option to allow for archived messages to be split over multiple directories - (monthly or yearly for example) + Added support for strftime (3) formatting in the append_filename + configuration option to allow for archived messages to be split over + multiple directories (monthly or yearly for example) + +2021-11-09 Jose Kahan + * src/parse.c + MIME RFC1341 epilogues were not being ignored when they were made of + more than newlines + +2021-11-08 Jose Kahan + * src/print.c + Restoring in-reply-to and next-in-thread links to deleted messages. + Removing them caused confusion in the archives.
One small change + w.r.t. 2.4.3: links now only display "deleted message" and not + "(deleted message) subject" as previously. + +2021-11-03 Jose Kahan + * src/print.c src/print.h src/hypermail.h src/parse.c src/struct.c + Improved the handling and displaying of message/rfc822 + attachments (closes #72). + + Rewrote parts of printbody() to improve how attachments are written + when using showhtml and inlinehtml options. + + Encapsulating attachments within sections. + +2021-10-15 Jose Kahan + * Added the css class showhtml-body to help customize displayed + messages when using the showhtml option. + +2021-10-13 @AverageGuy + * Hypermail wasn't compiling under Ubuntu due to the position + where -lm appeared when linking libraries. (closes #84) + +2021-10-11 Jose Kahan + * src/string.c src/proto.h src/parse.c + Hypermail uses C-lib functions, such as isspace() and sscanf(), that + don't understand Unicode spaces. As a workaround, we're converting + those spaces to ASCII ones so that the parts of hypermail that depend + on those functions, such as link scanning, will continue to work. + +2021-10-01 Jose Kahan + * src/threadprint.c src/print.c src/lang.h + Improve the thread index and next in thread links. If a thread's node + is deleted, don't display it unless it has non-deleted children. In + this case, display it using MSG_DEL_SHORT for its name. Make the next + in thread links skip deleted nodes. + +2021-09-16 Jose Kahan + * src/print.c + The default MESSAGE_DELETE text was being displayed twice in deleted + messages, once in the header, and again in the body. This message will + now only be displayed in the body and is now associated with the CSS + class "message-deleted". + +2021-09-14 Jose Kahan + * various files + Replaced the few remaining calls to snprintf with trio_snprintf, a more + portable snprintf. Those snprintf calls were supposed to be + trio_snprintf ones since the beginning.
This was also indirectly + causing problems when compiling under macOS (issue #66). + +2021-09-14 Jose Kahan + * src/struct.c src/quotes.c configure.ac src/Makefile.in pcre2/ + Support for PCRE2. Updating shipped PCRE2 lib to upstream v10.37. + +2021-09-10 Jose Kahan + * src/Makefile.in + Add convert-css rule to use contrib/csstoc.pl to convert + ../docs/hypermail.css into printcss.c. This rule must be run by hand as, + for the moment, we didn't want to tie hypermail's compilation to a + working install of Perl. + + * docs/hypermail.css + Added the default stylesheet used by hypermail + +2021-09-08 Jose Kahan + * src/print.c src/print.h + If set_show_headers is configured, output headers respecting the same + order as they are declared in this configuration variable. + + * src/setup.c + When using hypermail -v to print out the config options, Makeconfig() + was dereferencing the values using a pointer to a long, instead of + a pointer to an int. Even if this function could display wrong values, + it didn't have any impact on how hypermail works. + +2021-09-07 Jose Kahan + * src/string.c + Make unrre() return NOSUBJECT when a subject is made only of + one or more Re: (or equivalent) strings. + + * src/parse.c + findre() was being too greedy and detecting all Re: strings + even if they were not at the beginning of a subject. + + * src/setup.c src/setup.h src/print.c src/hypermail.c + Move all CSS statements to an external CSS file called by default + hypermail.css. Hypermail will generate that file if it's missing + and if either the index or message CSS URLs are not declared. + This name can be configured through the set_hypermail_css variable. + + * src/string.c + Very long mail subjects were not being handled correctly and could + result in invalid markup. + + * src/hypermail.h + changed the link pointing to the hypermail project to the GitHub + repository + + * src/print.c src/printfile.c src/threadprint.c + updated markup to HTML5.
Reviewed and enhanced the WAI related markup + + * src/hypermail.c src/setup.[ch] src/print.c + new configuration option mhtmlnavbar2upfile to specify a specific + navbar for messages. By default uses the value of ihtmlnavbar2upfile 2021-06-04 Baptiste Daroussin * src/uudecode.c src/parse.c Fixes for memory issues detected by libasan +2020-08-27 Jose Kahan + * src/string.c + extended parseurl() in a naive approach for supporting U+00A0 + nonbreakable space inside sscanf. This function should eventually be + rewritten to use libpcre + 2020-06-19 Jose Kahan * src/domains.c valid_root_domains() was validating a domain name against the @@ -56,7 +519,8 @@ HYPERMAIL VERSION 2.4.0: * src/parse.c parsemail(): a Content-Transfer-Encoding header with a missing value would result in an unitialized variable being used to output an unknown - encoding warning message. Hypermail now skips this header if it's empty. + encoding warning message. Hypermail now skips this header if it's + empty. 2019-11-22 Jose Kahan * src/print.c @@ -104,51 +568,49 @@ HYPERMAIL VERSION 2.4.0: main(): if the en_US locale is not available, try en_US.UTF-8 2018-10-11 Jose Kahan - * string.c parseemail(): for some reason if a ',' char was concatenaned to an email address, it was being parsed as part of the email username. 2018-10-10 Jose Kahan - * string.c parseurl(): remove deprecated URLs. Improve support for URLs that don't have slashes following the 'protocol:' schema. Improve support for tel: URLs. 2018-10-09 Bill Shannon - * string.c - parseurl(): non-URL text can be misinterpreted as a URL, causing segfault - Fixes issue #39 + parseurl(): non-URL text can be misinterpreted as a URL, causing + segfault Fixes issue #39 2018-10-08 Jose Kahan - * print.c, string.c, proto.h - The inreplyto_command could generate invalid links if the parser interpolated the - in-reply-to from the subject. If a message's in-reply-to is interpolated, - hypermail won't honor it anymore for that message. 
+ The inreplyto_command could generate invalid links if the parser + interpolated the in-reply-to from the subject. If a message's + in-reply-to is interpolated, hypermail won't honor it anymore for that + message. 2018-10-07 Jose Kahan - - * print.c printauthors(), printthreads(), printsubjects, printdates() - Even if set_i18n was enabled, Indexes were written with mixed charsets - instead of using utf-8 throughout. + * print.c + printauthors(), printthreads(), printsubjects, printdates() Even if + set_i18n was enabled, Indexes were written with mixed charsets instead + of using utf-8 throughout. * configure.ac, Makefile.in, src/Makefile.in If the system has a recent libpcre, compile against it instead of using - the bundled one. This check is done by default when launching configure. + the bundled one. This check is done by default when launching + configure. New configure options to allow to link against an external pcre library (--with-external-pcre) or to force the build and link against the bundled one (--enable-bundled-pcre). New configure option to allow to compile against a system libtrio - (--enable-system-libtrio). Contrary to libpcre, it's not possible to find the - version of libtrio so we cannot compile automatically against it. + (--enable-system-libtrio). Contrary to libpcre, it's not possible to + find the version of libtrio so we cannot compile automatically against + it. 2018-10-04 Jose Kahan - * src/pcre Upgraded to upstream version 8.42 @@ -156,35 +618,32 @@ HYPERMAIL VERSION 2.4.0: Upgraded to upstream version 1.16 * parse.c - Only use the headers charset for text/plain if its absent. If no charset - is available in the other parts, do not add it. + Only use the headers charset for text/plain if its absent. If no + charset is available in the other parts, do not add it. 
2018-10-04 Jose Kahan - * Updated Changelog format for entries newer than version 2.3.0 * hypermail.h, parse.c, struct.c, struct.h Charset handling for multipart messages was being handled wrong, giving - priority to the headers charset (if found) over that of the displayed body. - Sometimes the last found charset was the one being used throughout in the - generated body. + priority to the headers charset (if found) over that of the displayed + body. Sometimes the last found charset was the one being used + throughout in the generated body. - The metafile for attachments was sometimes inheriting the charset of the whole - body, even when not necessary or wrong. Now the meta doesn't include a charset - if the attachment doesn't explicitly give one. + The metafile for attachments was sometimes inheriting the charset of + the whole body, even when not necessary or wrong. Now the meta doesn't + include a charset if the attachment doesn't explicitly give one. * parse.c - References header was processed multiple times as it was not being marked as parsed - after being processed. - The epilogue of MIME parts was not being ignored. + References header was processed multiple times as it was not being + marked as parsed after being processed. The epilogue of MIME parts was + not being ignored. 2018-06-19 Jose Kahan - * Changed mispelled configuration option warn_surpressions to the correct spelling warn_suppressions. 2018-06-14 Jose Kahan - * Hypermail would segfault or have an incorrect thread view if it was handling an archive with a message (msg1) that was a reply to a message not in the archive, if msg1 had a reply to it in the archive (msg2), @@ -192,7 +651,6 @@ HYPERMAIL VERSION 2.4.0: subject (regardless of Re: prefixes). 
2018-06-07 Bill Shannon - * Add charset alias for Thai and Chinese * The markup for access key "j" to jump to the start of a message was @@ -211,22 +669,18 @@ HYPERMAIL VERSION 2.4.0: segfault because of a null pointer 2015-04-03 Ivan Kuraj - * Correct includes for new glibc 2013-06-10 Jose Kahan - * Even if a message was annotated as spam/deleted, its Attachments were still being created 2013-04-18 Jose Kahan - * Removed commented out code in printfile.c. * Added missing ';' in generated css rules in the same file 2013-04-11 Jose Kahan - * Extended the configure options so that a user can decide if hypermail should be compiled and statically linked against the bundled pcre lib or if it should be dynamically linked with an external (or system) pcre @@ -241,44 +695,37 @@ HYPERMAIL VERSION 2.4.0: * The make clean in libfnv wasn't removing the libfnv.a file 2013-03-29 Jose Kahan - * Updated configure to latest autoconf syntax conventions. * Renamed configure.in to configure.ac and cleaned it up partially. 2013-03-20 Jose Kahan - * Updated pcre to pcre-8.32 from pcre-4.3. This may have broken the Windows lcc compilation but I don't have the means to verify it. 2013-03-15 Jose Kahan - * Updated trio code to trio version 1.14. Moved all the trio code to src/trio. Now compiling trio as a static library and linking against it, which makes binary a bit smaller. See src/trio/README.hypermail for details. 2013-03-12 Jose Kahan - * New experimental support for RFC 3676 format=flowed. There are two new related configuration options: format_flowed to enable it and format_flowed_disable_quotes to disable it in specific cases. See the documentation and the RFC for further info. 2013-03-08 Jose Kahan - * Error: string.c:parseurl() assumes that all URL motifs it can match end with a :// string. However, the URL match table it uses included some URL motifs that didn't have that string. These have been commented as they may cause parseulr to SIGSEVs in some cases. 
2013-02-26 Jose Kahan - * New configuration option, noindex_onindexes, for associating a "noindex" robot metadata value with hypermail generated indexes. 2013-02-26 Jose Kahan - * New configuration option, userobotmeta, for Associatating robot annotations with attachments, using the experimental X-Robots-Tag HTTP header @@ -291,7 +738,6 @@ HYPERMAIL VERSION 2.4.0: option says we're not to preserve the original message body. 2013-01-30 Jose Kahan - * Migrated source code repository from cvs.hypermail.org to http://sourceforge.net/projects/hypermail/ diff --git a/FILES b/FILES index c89be6cd..16a81174 100644 --- a/FILES +++ b/FILES @@ -1,12 +1,14 @@ # # FILES: Automatically generated by mkFILES: -# - Wed Mar 20 19:50:32 CET 2013 +# - Fri Oct 15 18:19:15 CET 2021 # ################## # Top-level files # Changelog COPYING +LICENSE.txt +RELATED_LICENSES.txt KNOWN_BUGS FILES INSTALL @@ -14,7 +16,8 @@ Makefile.in README TODO UPGRADE -acconfig.h +RELEASE_NOTES +ROADMAP aclocal.m4 config.guess config.h.in @@ -25,7 +28,11 @@ install-sh ltmain.sh maketgz patchlevel.h -# +# +# - m4 macros +# +m4/apr.m4 +# # - Archive directory files # archive/.indent.pro @@ -47,7 +54,7 @@ configs/hypermail-msg.hyp configs/hypermail.rc # # - Contributed utilities -# +# contrib/hoaf-28/haof-0.1.dtd contrib/hoaf-28/README contrib/hoaf-28/collect_snipplets.py @@ -57,6 +64,7 @@ contrib/hoaf-28/hypermail-2b28-2b28+.patch contrib/hoaf-28/top_html.hdr contrib/canonicalize.pl contrib/cron_hypermail +contrib/css_to_c.pl contrib/fixhtime.pl contrib/hyperfeed.pl contrib/hypetombox.pl @@ -76,13 +84,14 @@ docs/attachments.txt docs/customizing.html docs/hmrc.4 docs/hmrc.html -docs/hr.yellow.png -docs/hypermail-faq.html +docs/faq.html docs/hypermail.1 +docs/hypermail.css docs/hypermail.html docs/hypermail.png +docs/hypermail-doc.css docs/index_hypermail.txt -docs/stars.png +docs/thanks.html # # - LCC-Win32 Build Support # @@ -97,24 +106,7 @@ lcc/getdate.c lcc/hypermail_files.txt lcc/lcc_extras.c 
lcc/lcc_extras.h -lcc/pcre.h -# -# libcgi - mail suport lib -# -libcgi/.indent.pro -libcgi/Makefile.in -libcgi/cgi.h -libcgi/form_ent.c -libcgi/form_tags.c -libcgi/get_cgi_info.c -libcgi/html.c -libcgi/libcgi.html -libcgi/main.c -libcgi/mcode.c -libcgi/strops.c -libcgi/syn_mime.c -libcgi/syn_url.c -libcgi/template.c +lcc/pcre2.h # # - Hypermail source # @@ -142,12 +134,12 @@ src/hypermail.h src/lang.c src/lang.h src/lock.c -src/mail.c src/mem.c src/parse.c src/parse.h src/print.c src/print.h +src/printcss.c src/printfile.c src/printfile.h src/proto.h @@ -164,386 +156,424 @@ src/threadprint.h src/txt2html.c src/txt2html.h src/uconvert.h +src/utf8.h src/uudecode.c src/uudecode.h # -# - Source to pcre supporting library -# -src/pcre -src/pcre/pcre_dfa_exec.c -src/pcre/PrepareRelease -src/pcre/pcre32_jit_compile.c -src/pcre/pcre_ord2utf8.c -src/pcre/pcre_maketables.c -src/pcre/makevp.bat -src/pcre/config.h.in -src/pcre/pcre32_byte_order.c -src/pcre/pcre_newline.c -src/pcre/compile -src/pcre/pcre32_printint.c -src/pcre/pcre.h.generic -src/pcre/pcre32_ord2utf32.c -src/pcre/pcre32_fullinfo.c -src/pcre/pcre16_valid_utf16.c -src/pcre/pcre_study.c -src/pcre/sljit -src/pcre/sljit/sljitNativeTILEGX-encoder.c -src/pcre/sljit/sljitNativeMIPS_32.c -src/pcre/sljit/sljitNativeARM_T2_32.c -src/pcre/sljit/sljitUtils.c -src/pcre/sljit/sljitNativeMIPS_common.c -src/pcre/sljit/sljitNativePPC_64.c -src/pcre/sljit/sljitLir.c -src/pcre/sljit/sljitNativeX86_32.c -src/pcre/sljit/sljitLir.h -src/pcre/sljit/sljitNativeTILEGX_64.c -src/pcre/sljit/sljitNativeX86_64.c -src/pcre/sljit/sljitNativeSPARC_common.c -src/pcre/sljit/sljitNativePPC_32.c -src/pcre/sljit/sljitConfigInternal.h -src/pcre/sljit/sljitExecAllocator.c -src/pcre/sljit/sljitConfig.h -src/pcre/sljit/sljitNativePPC_common.c -src/pcre/sljit/sljitNativeARM_32.c -src/pcre/sljit/sljitNativeSPARC_32.c -src/pcre/sljit/sljitNativeX86_common.c -src/pcre/sljit/sljitNativeMIPS_64.c -src/pcre/sljit/sljitNativeARM_64.c -src/pcre/pcre16_tables.c 
-src/pcre/pcre_tables.c -src/pcre/pcre32_string_utils.c -src/pcre/pcre_stringpiece.h.in -src/pcre/libpcre16.pc.in -src/pcre/pcreposix.c -src/pcre/pcre32_config.c -src/pcre/pcre_printint.c -src/pcre/pcre_jit_test.c -src/pcre/pcre16_newline.c -src/pcre/CMakeLists.txt -src/pcre/CheckMan -src/pcre/pcreposix.h -src/pcre/pcre16_refcount.c -src/pcre/Makefile.in -src/pcre/config.sub -src/pcre/pcre32_ucd.c -src/pcre/pcre16_version.c -src/pcre/makevp_l.txt -src/pcre/pcre32_valid_utf32.c -src/pcre/pcre16_study.c -src/pcre/pcre16_exec.c -src/pcre/COPYING -src/pcre/pcre16_ord2utf16.c -src/pcre/pcre_valid_utf8.c -src/pcre/pcre32_version.c -src/pcre/AUTHORS -src/pcre/pcre16_jit_compile.c -src/pcre/depcomp -src/pcre/pcre_stringpiece.cc -src/pcre/pcre16_dfa_exec.c -src/pcre/HACKING -src/pcre/pcre32_get.c -src/pcre/pcre32_newline.c -src/pcre/pcre16_config.c -src/pcre/pcre16_string_utils.c -src/pcre/pcre32_exec.c -src/pcre/pcre_xclass.c -src/pcre/pcre_exec.c -src/pcre/makevp_c.txt -src/pcre/m4 -src/pcre/m4/pcre_visibility.m4 -src/pcre/m4/ltversion.m4 -src/pcre/m4/ltsugar.m4 -src/pcre/m4/ax_pthread.m4 -src/pcre/m4/lt~obsolete.m4 -src/pcre/m4/ltoptions.m4 -src/pcre/m4/libtool.m4 -src/pcre/pcregexp.pas -src/pcre/cmake -src/pcre/cmake/COPYING-CMAKE-SCRIPTS -src/pcre/cmake/FindPackageHandleStandardArgs.cmake -src/pcre/cmake/FindEditline.cmake -src/pcre/cmake/FindReadline.cmake -src/pcre/pcre_ucd.c -src/pcre/missing -src/pcre/pcre32_dfa_exec.c -src/pcre/pcre_scanner_unittest.cc -src/pcre/LICENCE -src/pcre/configure -src/pcre/pcre_compile.c -src/pcre/ucp.h -src/pcre/pcretest.c -src/pcre/INSTALL -src/pcre/pcre32_globals.c -src/pcre/pcre16_printint.c -src/pcre/config.guess -src/pcre/pcre16_globals.c -src/pcre/libpcre32.pc.in -src/pcre/CleanTxt -src/pcre/testdata -src/pcre/testdata/testinput5 -src/pcre/testdata/testoutput5 -src/pcre/testdata/saved8 -src/pcre/testdata/grepfilelist -src/pcre/testdata/testoutput20 -src/pcre/testdata/testoutput3A -src/pcre/testdata/testoutput26 
-src/pcre/testdata/testoutput16 -src/pcre/testdata/testinput10 -src/pcre/testdata/saved16 -src/pcre/testdata/greplist -src/pcre/testdata/testoutput4 -src/pcre/testdata/testinput21 -src/pcre/testdata/testinput25 -src/pcre/testdata/testoutput21-16 -src/pcre/testdata/testoutput12 -src/pcre/testdata/saved32BE-1 -src/pcre/testdata/grepoutput -src/pcre/testdata/saved32 -src/pcre/testdata/testinput23 -src/pcre/testdata/testoutput15 -src/pcre/testdata/testinput24 -src/pcre/testdata/testoutput11-8 -src/pcre/testdata/grepinput -src/pcre/testdata/saved16BE-1 -src/pcre/testdata/testinput3 -src/pcre/testdata/saved32BE-2 -src/pcre/testdata/testinput20 -src/pcre/testdata/testinput15 -src/pcre/testdata/testinputEBC -src/pcre/testdata/testinput17 -src/pcre/testdata/testoutput22-16 -src/pcre/testdata/testoutput25 -src/pcre/testdata/wintestoutput3 -src/pcre/testdata/testinput7 -src/pcre/testdata/testoutput11-32 -src/pcre/testdata/grepinput8 -src/pcre/testdata/saved32LE-1 -src/pcre/testdata/testoutput23 -src/pcre/testdata/testoutput18-16 -src/pcre/testdata/wintestinput3 -src/pcre/testdata/testinput11 -src/pcre/testdata/testoutputEBC -src/pcre/testdata/saved16LE-2 -src/pcre/testdata/testinput1 -src/pcre/testdata/saved16LE-1 -src/pcre/testdata/grepinputx -src/pcre/testdata/grepoutput8 -src/pcre/testdata/testoutput24 -src/pcre/testdata/testoutput19 -src/pcre/testdata/testoutput6 -src/pcre/testdata/testinput9 -src/pcre/testdata/testoutput3B -src/pcre/testdata/grepbinary -src/pcre/testdata/grepinputv -src/pcre/testdata/testoutput14 -src/pcre/testdata/testinput8 -src/pcre/testdata/testinput6 -src/pcre/testdata/testoutput21-32 -src/pcre/testdata/testinput13 -src/pcre/testdata/testoutput17 -src/pcre/testdata/testinput2 -src/pcre/testdata/testoutput18-32 -src/pcre/testdata/testinput16 -src/pcre/testdata/testoutput8 -src/pcre/testdata/valgrind-jit.supp -src/pcre/testdata/grepoutputN -src/pcre/testdata/testoutput7 -src/pcre/testdata/testoutput3 -src/pcre/testdata/testoutput1 
-src/pcre/testdata/testoutput2 -src/pcre/testdata/testoutput22-32 -src/pcre/testdata/greppatN4 -src/pcre/testdata/testinput14 -src/pcre/testdata/testinput18 -src/pcre/testdata/testoutput10 -src/pcre/testdata/testoutput11-16 -src/pcre/testdata/saved16BE-2 -src/pcre/testdata/saved32LE-2 -src/pcre/testdata/testinput22 -src/pcre/testdata/testinput19 -src/pcre/testdata/grepinput3 -src/pcre/testdata/testinput4 -src/pcre/testdata/testoutput9 -src/pcre/testdata/testoutput13 -src/pcre/testdata/testinput12 -src/pcre/testdata/testinput26 -src/pcre/pcre32_study.c -src/pcre/aclocal.m4 -src/pcre/pcre32_refcount.c -src/pcre/pcre_stringpiece_unittest.cc -src/pcre/pcre_jit_compile.c -src/pcre/test-driver -src/pcre/pcre32_compile.c -src/pcre/ChangeLog -src/pcre/pcre_globals.c -src/pcre/Detrail -src/pcre/pcre_string_utils.c -src/pcre/pcre32_maketables.c -src/pcre/pcre_chartables.c.dist -src/pcre/libpcrecpp.pc.in -src/pcre/pcre32_xclass.c -src/pcre/dftables.c -src/pcre/pcredemo.c -src/pcre/config-cmake.h.in -src/pcre/pcrecpparg.h.in -src/pcre/config.h.generic -src/pcre/RunGrepTest -src/pcre/pcre32_utf32_utils.c -src/pcre/pcre_internal.h -src/pcre/ar-lib -src/pcre/ltmain.sh -src/pcre/pcre16_xclass.c -src/pcre/pcre16_maketables.c -src/pcre/pcre_fullinfo.c -src/pcre/configure.ac -src/pcre/132html -src/pcre/pcre_get.c -src/pcre/pcre16_fullinfo.c -src/pcre/NEWS -src/pcre/pcre16_ucd.c -src/pcre/pcre_scanner.h -src/pcre/pcregrep.c -src/pcre/libpcreposix.pc.in -src/pcre/pcre32_chartables.c -src/pcre/pcre32_tables.c -src/pcre/pcre_version.c -src/pcre/pcre-config.in -src/pcre/pcre16_chartables.c -src/pcre/perltest.pl -src/pcre/pcrecpp_internal.h -src/pcre/libpcre.pc.in -src/pcre/pcrecpp.h -src/pcre/pcrecpp_unittest.cc -src/pcre/pcre16_get.c -src/pcre/pcre16_utf16_utils.c -src/pcre/Makefile.am -src/pcre/pcrecpp.cc -src/pcre/pcre16_byte_order.c -src/pcre/README -src/pcre/pcre_scanner.cc -src/pcre/NON-AUTOTOOLS-BUILD -src/pcre/pcre_config.c -src/pcre/pcre_byte_order.c -src/pcre/RunTest.bat 
-src/pcre/pcre.h.in -src/pcre/NON-UNIX-USE -src/pcre/doc -src/pcre/doc/pcrebuild.3 -src/pcre/doc/pcre_get_substring_list.3 -src/pcre/doc/pcre_free_study.3 -src/pcre/doc/pcrelimits.3 -src/pcre/doc/pcre_get_stringnumber.3 -src/pcre/doc/pcre_jit_exec.3 -src/pcre/doc/pcre16.3 -src/pcre/doc/pcre-config.txt -src/pcre/doc/perltest.txt -src/pcre/doc/pcretest.1 -src/pcre/doc/pcrematching.3 -src/pcre/doc/pcre_jit_stack_alloc.3 -src/pcre/doc/pcreprecompile.3 -src/pcre/doc/pcre_dfa_exec.3 -src/pcre/doc/pcre_pattern_to_host_byte_order.3 -src/pcre/doc/pcre_compile.3 -src/pcre/doc/pcredemo.3 -src/pcre/doc/pcre_utf16_to_host_byte_order.3 -src/pcre/doc/pcre_study.3 -src/pcre/doc/pcrecallout.3 -src/pcre/doc/pcre_version.3 -src/pcre/doc/pcre_utf32_to_host_byte_order.3 -src/pcre/doc/pcre_jit_stack_free.3 -src/pcre/doc/pcrecpp.3 -src/pcre/doc/pcregrep.1 -src/pcre/doc/pcre_assign_jit_stack.3 -src/pcre/doc/pcre_exec.3 -src/pcre/doc/pcre-config.1 -src/pcre/doc/pcrepattern.3 -src/pcre/doc/pcre_get_stringtable_entries.3 -src/pcre/doc/pcre_free_substring.3 -src/pcre/doc/pcre_get_substring.3 -src/pcre/doc/pcrepartial.3 -src/pcre/doc/pcre.3 -src/pcre/doc/pcreunicode.3 -src/pcre/doc/pcretest.txt -src/pcre/doc/pcre_copy_named_substring.3 -src/pcre/doc/pcre_maketables.3 -src/pcre/doc/pcreapi.3 -src/pcre/doc/pcre.txt -src/pcre/doc/pcre_copy_substring.3 -src/pcre/doc/pcre_config.3 -src/pcre/doc/pcre32.3 -src/pcre/doc/pcrestack.3 -src/pcre/doc/pcre_compile2.3 -src/pcre/doc/pcrecompat.3 -src/pcre/doc/pcreposix.3 -src/pcre/doc/pcre_refcount.3 -src/pcre/doc/pcre_fullinfo.3 -src/pcre/doc/pcregrep.txt -src/pcre/doc/pcresample.3 -src/pcre/doc/pcresyntax.3 -src/pcre/doc/pcre_get_named_substring.3 -src/pcre/doc/index.html.src -src/pcre/doc/pcrejit.3 -src/pcre/doc/html -src/pcre/doc/html/pcre_free_substring_list.html -src/pcre/doc/html/pcre_assign_jit_stack.html -src/pcre/doc/html/pcre_config.html -src/pcre/doc/html/pcre16.html -src/pcre/doc/html/pcrelimits.html -src/pcre/doc/html/pcrejit.html 
-src/pcre/doc/html/pcre-config.html -src/pcre/doc/html/pcre_jit_exec.html -src/pcre/doc/html/pcre_refcount.html -src/pcre/doc/html/pcrepartial.html -src/pcre/doc/html/pcreunicode.html -src/pcre/doc/html/index.html -src/pcre/doc/html/pcre_exec.html -src/pcre/doc/html/pcre_fullinfo.html -src/pcre/doc/html/pcrecompat.html -src/pcre/doc/html/pcreapi.html -src/pcre/doc/html/pcre_get_stringnumber.html -src/pcre/doc/html/pcre_get_substring.html -src/pcre/doc/html/pcrecpp.html -src/pcre/doc/html/pcrecallout.html -src/pcre/doc/html/pcre_free_study.html -src/pcre/doc/html/pcre_pattern_to_host_byte_order.html -src/pcre/doc/html/pcreprecompile.html -src/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt -src/pcre/doc/html/pcrematching.html -src/pcre/doc/html/pcre_get_stringtable_entries.html -src/pcre/doc/html/pcre_jit_stack_alloc.html -src/pcre/doc/html/pcre_free_substring.html -src/pcre/doc/html/pcre_version.html -src/pcre/doc/html/pcresyntax.html -src/pcre/doc/html/pcre_jit_stack_free.html -src/pcre/doc/html/pcretest.html -src/pcre/doc/html/pcre_utf16_to_host_byte_order.html -src/pcre/doc/html/pcre_maketables.html -src/pcre/doc/html/pcrestack.html -src/pcre/doc/html/pcre_copy_substring.html -src/pcre/doc/html/pcreposix.html -src/pcre/doc/html/README.txt -src/pcre/doc/html/pcrepattern.html -src/pcre/doc/html/pcre_get_named_substring.html -src/pcre/doc/html/pcrebuild.html -src/pcre/doc/html/pcresample.html -src/pcre/doc/html/pcre.html -src/pcre/doc/html/pcre_study.html -src/pcre/doc/html/pcre_copy_named_substring.html -src/pcre/doc/html/pcre_compile.html -src/pcre/doc/html/pcre_compile2.html -src/pcre/doc/html/pcregrep.html -src/pcre/doc/html/pcre32.html -src/pcre/doc/html/pcre_utf32_to_host_byte_order.html -src/pcre/doc/html/pcredemo.html -src/pcre/doc/html/pcre_get_substring_list.html -src/pcre/doc/html/pcre_dfa_exec.html -src/pcre/doc/html/pcreperform.html -src/pcre/doc/pcreperform.3 -src/pcre/doc/pcre_free_substring_list.3 -src/pcre/pcre_refcount.c -src/pcre/pcre16_compile.c 
-src/pcre/install-sh -src/pcre/RunTest +# - Source to pcre2 supporting library +# +src/pcre2 +src/pcre2/PrepareRelease +src/pcre2/libpcre2-posix.pc.in +src/pcre2/compile +src/pcre2/pcre2-config.in +src/pcre2/CMakeLists.txt +src/pcre2/CheckMan +src/pcre2/Makefile.in +src/pcre2/config.sub +src/pcre2/COPYING +src/pcre2/RunGrepTest.bat +src/pcre2/AUTHORS +src/pcre2/depcomp +src/pcre2/HACKING +src/pcre2/perltest.sh +src/pcre2/libpcre2-16.pc.in +src/pcre2/m4 +src/pcre2/m4/ltversion.m4 +src/pcre2/m4/ltsugar.m4 +src/pcre2/m4/pcre2_visibility.m4 +src/pcre2/m4/ax_pthread.m4 +src/pcre2/m4/lt~obsolete.m4 +src/pcre2/m4/ltoptions.m4 +src/pcre2/m4/libtool.m4 +src/pcre2/cmake +src/pcre2/cmake/COPYING-CMAKE-SCRIPTS +src/pcre2/cmake/FindPackageHandleStandardArgs.cmake +src/pcre2/cmake/FindEditline.cmake +src/pcre2/cmake/FindReadline.cmake +src/pcre2/missing +src/pcre2/LICENCE +src/pcre2/configure +src/pcre2/INSTALL +src/pcre2/config.guess +src/pcre2/CleanTxt +src/pcre2/testdata +src/pcre2/testdata/testinput5 +src/pcre2/testdata/testoutput5 +src/pcre2/testdata/grepfilelist +src/pcre2/testdata/testoutput20 +src/pcre2/testdata/testoutput3A +src/pcre2/testdata/testoutput22-8 +src/pcre2/testdata/testoutput16 +src/pcre2/testdata/testinput10 +src/pcre2/testdata/testoutput8-16-3 +src/pcre2/testdata/greplist +src/pcre2/testdata/testoutput4 +src/pcre2/testdata/testinput21 +src/pcre2/testdata/testinput25 +src/pcre2/testdata/testoutput8-8-3 +src/pcre2/testdata/testbtables +src/pcre2/testdata/grepoutput +src/pcre2/testdata/testinput23 +src/pcre2/testdata/testoutput14-32 +src/pcre2/testdata/testoutput15 +src/pcre2/testdata/testoutput12-32 +src/pcre2/testdata/testinput24 +src/pcre2/testdata/grepinputM +src/pcre2/testdata/testoutput12-16 +src/pcre2/testdata/grepinput +src/pcre2/testdata/testinput3 +src/pcre2/testdata/testoutput8-8-2 +src/pcre2/testdata/testoutput18 +src/pcre2/testdata/grepoutputCN +src/pcre2/testdata/testinput20 +src/pcre2/testdata/testoutput21 +src/pcre2/testdata/testinput15 
+src/pcre2/testdata/testinputEBC +src/pcre2/testdata/testinput17 +src/pcre2/testdata/testoutput22-16 +src/pcre2/testdata/testoutput25 +src/pcre2/testdata/wintestoutput3 +src/pcre2/testdata/testinput7 +src/pcre2/testdata/testoutput8-16-2 +src/pcre2/testdata/testoutput11-32 +src/pcre2/testdata/testoutput8-32-3 +src/pcre2/testdata/grepinput8 +src/pcre2/testdata/testoutput23 +src/pcre2/testdata/wintestinput3 +src/pcre2/testdata/testinput11 +src/pcre2/testdata/testoutputEBC +src/pcre2/testdata/testinput1 +src/pcre2/testdata/grepinputx +src/pcre2/testdata/grepoutput8 +src/pcre2/testdata/testoutput14-8 +src/pcre2/testdata/testoutput24 +src/pcre2/testdata/testoutput19 +src/pcre2/testdata/testoutput6 +src/pcre2/testdata/testinput9 +src/pcre2/testdata/testoutput3B +src/pcre2/testdata/testoutput8-32-2 +src/pcre2/testdata/grepbinary +src/pcre2/testdata/grepinputv +src/pcre2/testdata/testoutput14-16 +src/pcre2/testdata/testinput8 +src/pcre2/testdata/testinput6 +src/pcre2/testdata/testoutput8-16-4 +src/pcre2/testdata/testinput13 +src/pcre2/testdata/testoutput17 +src/pcre2/testdata/testinput2 +src/pcre2/testdata/testinput16 +src/pcre2/testdata/testoutput8-8-4 +src/pcre2/testdata/valgrind-jit.supp +src/pcre2/testdata/testoutput8-32-4 +src/pcre2/testdata/grepoutputN +src/pcre2/testdata/testoutput7 +src/pcre2/testdata/testoutput3 +src/pcre2/testdata/testoutput1 +src/pcre2/testdata/testoutput2 +src/pcre2/testdata/testoutput22-32 +src/pcre2/testdata/greppatN4 +src/pcre2/testdata/testinput14 +src/pcre2/testdata/testinput18 +src/pcre2/testdata/testoutput10 +src/pcre2/testdata/grepoutputC +src/pcre2/testdata/testoutput11-16 +src/pcre2/testdata/testinput22 +src/pcre2/testdata/testinput19 +src/pcre2/testdata/grepinput3 +src/pcre2/testdata/testinput4 +src/pcre2/testdata/testoutput9 +src/pcre2/testdata/testoutput13 +src/pcre2/testdata/testinput12 +src/pcre2/aclocal.m4 +src/pcre2/test-driver +src/pcre2/ChangeLog +src/pcre2/Detrail +src/pcre2/config-cmake.h.in +src/pcre2/RunGrepTest 
+src/pcre2/ar-lib +src/pcre2/ltmain.sh +src/pcre2/configure.ac +src/pcre2/132html +src/pcre2/libpcre2-8.pc.in +src/pcre2/NEWS +src/pcre2/Makefile.am +src/pcre2/libpcre2-32.pc.in +src/pcre2/README +src/pcre2/NON-AUTOTOOLS-BUILD +src/pcre2/RunTest.bat +src/pcre2/doc +src/pcre2/doc/pcre2_code_free.3 +src/pcre2/doc/pcre2_substring_length_byname.3 +src/pcre2/doc/pcre2convert.3 +src/pcre2/doc/pcre2jit.3 +src/pcre2/doc/pcre2_get_match_data_size.3 +src/pcre2/doc/pcre2_set_parens_nest_limit.3 +src/pcre2/doc/pcre2_get_error_message.3 +src/pcre2/doc/pcre2_match_context_copy.3 +src/pcre2/doc/pcre2_converted_pattern_free.3 +src/pcre2/doc/pcre2_substring_get_byname.3 +src/pcre2/doc/pcre2_dfa_match.3 +src/pcre2/doc/pcre2_substitute.3 +src/pcre2/doc/pcre2posix.3 +src/pcre2/doc/pcre2_set_bsr.3 +src/pcre2/doc/pcre2_jit_stack_free.3 +src/pcre2/doc/pcre2_convert_context_create.3 +src/pcre2/doc/pcre2_get_ovector_pointer.3 +src/pcre2/doc/pcre2pattern.3 +src/pcre2/doc/pcre2_substring_list_get.3 +src/pcre2/doc/pcre2_substring_nametable_scan.3 +src/pcre2/doc/pcre2syntax.3 +src/pcre2/doc/pcre2_match.3 +src/pcre2/doc/pcre2_config.3 +src/pcre2/doc/pcre2test.1 +src/pcre2/doc/pcre2.3 +src/pcre2/doc/pcre2_match_data_free.3 +src/pcre2/doc/pcre2-config.txt +src/pcre2/doc/pcre2_general_context_free.3 +src/pcre2/doc/pcre2_compile_context_free.3 +src/pcre2/doc/pcre2build.3 +src/pcre2/doc/pcre2_substring_number_from_name.3 +src/pcre2/doc/pcre2_set_newline.3 +src/pcre2/doc/pcre2_substring_free.3 +src/pcre2/doc/pcre2_serialize_free.3 +src/pcre2/doc/pcre2matching.3 +src/pcre2/doc/pcre2_match_context_create.3 +src/pcre2/doc/pcre2_set_compile_extra_options.3 +src/pcre2/doc/pcre2_set_callout.3 +src/pcre2/doc/pcre2_get_ovector_count.3 +src/pcre2/doc/pcre2_jit_match.3 +src/pcre2/doc/pcre2serialize.3 +src/pcre2/doc/pcre2_pattern_info.3 +src/pcre2/doc/pcre2_set_depth_limit.3 +src/pcre2/doc/pcre2_substring_list_free.3 +src/pcre2/doc/pcre2test.txt +src/pcre2/doc/pcre2demo.3 
+src/pcre2/doc/pcre2_set_recursion_limit.3 +src/pcre2/doc/pcre2_get_startchar.3 +src/pcre2/doc/pcre2-config.1 +src/pcre2/doc/pcre2_convert_context_free.3 +src/pcre2/doc/pcre2_set_glob_escape.3 +src/pcre2/doc/pcre2_substring_length_bynumber.3 +src/pcre2/doc/pcre2_set_match_limit.3 +src/pcre2/doc/pcre2_serialize_decode.3 +src/pcre2/doc/pcre2_compile_context_copy.3 +src/pcre2/doc/pcre2unicode.3 +src/pcre2/doc/pcre2_set_recursion_memory_management.3 +src/pcre2/doc/pcre2_set_offset_limit.3 +src/pcre2/doc/pcre2_jit_compile.3 +src/pcre2/doc/pcre2grep.txt +src/pcre2/doc/pcre2compat.3 +src/pcre2/doc/pcre2_general_context_copy.3 +src/pcre2/doc/pcre2_match_context_free.3 +src/pcre2/doc/pcre2_jit_free_unused_memory.3 +src/pcre2/doc/pcre2_maketables_free.3 +src/pcre2/doc/pcre2_set_compile_recursion_guard.3 +src/pcre2/doc/pcre2.txt +src/pcre2/doc/pcre2_callout_enumerate.3 +src/pcre2/doc/pcre2_get_mark.3 +src/pcre2/doc/pcre2_set_heap_limit.3 +src/pcre2/doc/pcre2_serialize_get_number_of_codes.3 +src/pcre2/doc/pcre2_convert_context_copy.3 +src/pcre2/doc/pcre2_match_data_create_from_pattern.3 +src/pcre2/doc/pcre2limits.3 +src/pcre2/doc/pcre2_compile.3 +src/pcre2/doc/pcre2_jit_stack_assign.3 +src/pcre2/doc/pcre2_general_context_create.3 +src/pcre2/doc/pcre2_pattern_convert.3 +src/pcre2/doc/pcre2_match_data_create.3 +src/pcre2/doc/pcre2_set_max_pattern_length.3 +src/pcre2/doc/pcre2_serialize_encode.3 +src/pcre2/doc/pcre2perform.3 +src/pcre2/doc/pcre2_code_copy.3 +src/pcre2/doc/pcre2_set_glob_separator.3 +src/pcre2/doc/pcre2_jit_stack_create.3 +src/pcre2/doc/pcre2_code_copy_with_tables.3 +src/pcre2/doc/pcre2partial.3 +src/pcre2/doc/pcre2_maketables.3 +src/pcre2/doc/pcre2_set_character_tables.3 +src/pcre2/doc/pcre2_substring_get_bynumber.3 +src/pcre2/doc/pcre2_compile_context_create.3 +src/pcre2/doc/pcre2_set_substitute_callout.3 +src/pcre2/doc/pcre2sample.3 +src/pcre2/doc/pcre2_substring_copy_bynumber.3 +src/pcre2/doc/index.html.src +src/pcre2/doc/pcre2api.3 
+src/pcre2/doc/pcre2callout.3 +src/pcre2/doc/html +src/pcre2/doc/html/pcre2_jit_stack_assign.html +src/pcre2/doc/html/pcre2_code_copy.html +src/pcre2/doc/html/pcre2_set_bsr.html +src/pcre2/doc/html/pcre2build.html +src/pcre2/doc/html/pcre2_serialize_encode.html +src/pcre2/doc/html/pcre2_maketables_free.html +src/pcre2/doc/html/pcre2_set_newline.html +src/pcre2/doc/html/pcre2compat.html +src/pcre2/doc/html/pcre2_convert_context_free.html +src/pcre2/doc/html/pcre2_match_context_free.html +src/pcre2/doc/html/pcre2_callout_enumerate.html +src/pcre2/doc/html/pcre2_substring_get_bynumber.html +src/pcre2/doc/html/pcre2_compile_context_copy.html +src/pcre2/doc/html/pcre2_substring_number_from_name.html +src/pcre2/doc/html/pcre2partial.html +src/pcre2/doc/html/pcre2convert.html +src/pcre2/doc/html/pcre2_set_recursion_memory_management.html +src/pcre2/doc/html/pcre2_get_mark.html +src/pcre2/doc/html/pcre2_compile.html +src/pcre2/doc/html/pcre2_general_context_create.html +src/pcre2/doc/html/pcre2_substring_list_free.html +src/pcre2/doc/html/pcre2_maketables.html +src/pcre2/doc/html/pcre2_match_data_create_from_pattern.html +src/pcre2/doc/html/index.html +src/pcre2/doc/html/pcre2_set_substitute_callout.html +src/pcre2/doc/html/pcre2_substring_nametable_scan.html +src/pcre2/doc/html/pcre2api.html +src/pcre2/doc/html/pcre2_pattern_convert.html +src/pcre2/doc/html/pcre2_substring_free.html +src/pcre2/doc/html/pcre2limits.html +src/pcre2/doc/html/pcre2_compile_context_create.html +src/pcre2/doc/html/pcre2_set_parens_nest_limit.html +src/pcre2/doc/html/pcre2pattern.html +src/pcre2/doc/html/pcre2_set_recursion_limit.html +src/pcre2/doc/html/pcre2_code_copy_with_tables.html +src/pcre2/doc/html/pcre2callout.html +src/pcre2/doc/html/pcre2_substring_copy_bynumber.html +src/pcre2/doc/html/pcre2_set_depth_limit.html +src/pcre2/doc/html/pcre2perform.html +src/pcre2/doc/html/NON-AUTOTOOLS-BUILD.txt +src/pcre2/doc/html/pcre2_set_max_pattern_length.html 
+src/pcre2/doc/html/pcre2_substring_length_bynumber.html +src/pcre2/doc/html/pcre2_get_match_data_size.html +src/pcre2/doc/html/pcre2_jit_stack_free.html +src/pcre2/doc/html/pcre2_match_data_free.html +src/pcre2/doc/html/pcre2_set_glob_separator.html +src/pcre2/doc/html/pcre2_set_callout.html +src/pcre2/doc/html/pcre2_set_glob_escape.html +src/pcre2/doc/html/pcre2matching.html +src/pcre2/doc/html/pcre2_compile_context_free.html +src/pcre2/doc/html/pcre2_serialize_free.html +src/pcre2/doc/html/pcre2_convert_context_create.html +src/pcre2/doc/html/pcre2_match_context_copy.html +src/pcre2/doc/html/pcre2_set_character_tables.html +src/pcre2/doc/html/pcre2_get_ovector_pointer.html +src/pcre2/doc/html/pcre2demo.html +src/pcre2/doc/html/pcre2_jit_match.html +src/pcre2/doc/html/pcre2_set_match_limit.html +src/pcre2/doc/html/pcre2unicode.html +src/pcre2/doc/html/pcre2_converted_pattern_free.html +src/pcre2/doc/html/pcre2_serialize_get_number_of_codes.html +src/pcre2/doc/html/pcre2_general_context_free.html +src/pcre2/doc/html/pcre2_substring_get_byname.html +src/pcre2/doc/html/README.txt +src/pcre2/doc/html/pcre2_get_startchar.html +src/pcre2/doc/html/pcre2jit.html +src/pcre2/doc/html/pcre2_set_compile_extra_options.html +src/pcre2/doc/html/pcre2_set_heap_limit.html +src/pcre2/doc/html/pcre2_jit_compile.html +src/pcre2/doc/html/pcre2_jit_free_unused_memory.html +src/pcre2/doc/html/pcre2-config.html +src/pcre2/doc/html/pcre2.html +src/pcre2/doc/html/pcre2_dfa_match.html +src/pcre2/doc/html/pcre2_general_context_copy.html +src/pcre2/doc/html/pcre2posix.html +src/pcre2/doc/html/pcre2_get_ovector_count.html +src/pcre2/doc/html/pcre2_get_error_message.html +src/pcre2/doc/html/pcre2_pattern_info.html +src/pcre2/doc/html/pcre2_jit_stack_create.html +src/pcre2/doc/html/pcre2_match_context_create.html +src/pcre2/doc/html/pcre2_config.html +src/pcre2/doc/html/pcre2_serialize_decode.html +src/pcre2/doc/html/pcre2grep.html +src/pcre2/doc/html/pcre2_substring_copy_byname.html 
+src/pcre2/doc/html/pcre2_set_offset_limit.html +src/pcre2/doc/html/pcre2_substring_length_byname.html +src/pcre2/doc/html/pcre2serialize.html +src/pcre2/doc/html/pcre2_substitute.html +src/pcre2/doc/html/pcre2_match.html +src/pcre2/doc/html/pcre2sample.html +src/pcre2/doc/html/pcre2_set_compile_recursion_guard.html +src/pcre2/doc/html/pcre2_substring_list_get.html +src/pcre2/doc/html/pcre2syntax.html +src/pcre2/doc/html/pcre2_code_free.html +src/pcre2/doc/html/pcre2test.html +src/pcre2/doc/html/pcre2_match_data_create.html +src/pcre2/doc/html/pcre2_convert_context_copy.html +src/pcre2/doc/pcre2_substring_copy_byname.3 +src/pcre2/doc/pcre2grep.1 +src/pcre2/src +src/pcre2/src/pcre2_match.c +src/pcre2/src/pcre2_substitute.c +src/pcre2/src/pcre2.h.generic +src/pcre2/src/pcre2_newline.c +src/pcre2/src/config.h.in +src/pcre2/src/pcre2posix.h +src/pcre2/src/pcre2_ord2utf.c +src/pcre2/src/pcre2_maketables.c +src/pcre2/src/pcre2_dftables.c +src/pcre2/src/pcre2_extuni.c +src/pcre2/src/pcre2_config.c +src/pcre2/src/pcre2_find_bracket.c +src/pcre2/src/pcre2_jit_match.c +src/pcre2/src/pcre2_fuzzsupport.c +src/pcre2/src/pcre2_chartables.c.dist +src/pcre2/src/sljit +src/pcre2/src/sljit/sljitNativeMIPS_32.c +src/pcre2/src/sljit/sljitNativeARM_T2_32.c +src/pcre2/src/sljit/sljitUtils.c +src/pcre2/src/sljit/sljitNativeS390X.c +src/pcre2/src/sljit/sljitNativeMIPS_common.c +src/pcre2/src/sljit/sljitNativePPC_64.c +src/pcre2/src/sljit/sljitWXExecAllocator.c +src/pcre2/src/sljit/sljitProtExecAllocator.c +src/pcre2/src/sljit/sljitLir.c +src/pcre2/src/sljit/sljitNativeX86_32.c +src/pcre2/src/sljit/sljitLir.h +src/pcre2/src/sljit/sljitNativeX86_64.c +src/pcre2/src/sljit/sljitNativeSPARC_common.c +src/pcre2/src/sljit/sljitNativePPC_32.c +src/pcre2/src/sljit/sljitConfigInternal.h +src/pcre2/src/sljit/sljitExecAllocator.c +src/pcre2/src/sljit/sljitConfig.h +src/pcre2/src/sljit/sljitNativePPC_common.c +src/pcre2/src/sljit/sljitNativeARM_32.c +src/pcre2/src/sljit/sljitNativeSPARC_32.c 
+src/pcre2/src/sljit/sljitNativeX86_common.c +src/pcre2/src/sljit/sljitNativeMIPS_64.c +src/pcre2/src/sljit/sljitNativeARM_64.c +src/pcre2/src/pcre2.h.in +src/pcre2/src/pcre2_error.c +src/pcre2/src/pcre2_convert.c +src/pcre2/src/pcre2_serialize.c +src/pcre2/src/pcre2_jit_simd_inc.h +src/pcre2/src/pcre2_match_data.c +src/pcre2/src/pcre2_compile.c +src/pcre2/src/pcre2_string_utils.c +src/pcre2/src/pcre2demo.c +src/pcre2/src/pcre2_jit_neon_inc.h +src/pcre2/src/pcre2_printint.c +src/pcre2/src/pcre2_jit_test.c +src/pcre2/src/pcre2_intmodedep.h +src/pcre2/src/config.h.generic +src/pcre2/src/pcre2_internal.h +src/pcre2/src/pcre2_auto_possess.c +src/pcre2/src/pcre2_xclass.c +src/pcre2/src/pcre2_tables.c +src/pcre2/src/pcre2_valid_utf.c +src/pcre2/src/pcre2_substring.c +src/pcre2/src/pcre2posix.c +src/pcre2/src/pcre2_study.c +src/pcre2/src/pcre2_pattern_info.c +src/pcre2/src/pcre2_jit_misc.c +src/pcre2/src/pcre2_jit_compile.c +src/pcre2/src/pcre2_context.c +src/pcre2/src/pcre2grep.c +src/pcre2/src/pcre2_dfa_match.c +src/pcre2/src/pcre2_ucd.c +src/pcre2/src/pcre2_script_run.c +src/pcre2/src/pcre2_ucp.h +src/pcre2/src/pcre2test.c +src/pcre2/install-sh +src/pcre2/RunTest # # - Source to ctrio supporting library # diff --git a/INSTALL b/INSTALL index 8c5df4ab..081c8f01 100644 --- a/INSTALL +++ b/INSTALL @@ -1,72 +1,167 @@ +======================================= +||BUILDING AND INSTALLING HYPERMAIL || +======================================= + +PLEASE READ THE RELEASE NOTES FIRST. + Quick summary: +============= +cd hypermail_checkout_directory +./autogen.sh ./configure make make install -See the Upgrading section at the end for changes that might affect users -of older versions. -========================== +Read the separate UPGRADE file for changes that might affect +users of older versions. 
+ +External dependencies: +====================== + +Note: configure will warn you if a mandatory dependency is missing and +if it's going to skip or handle in a different way the optional ones. + +For compiling hypermail: + +* bison >=3.7 or yacc (mandatory) + bison is preferred as it's what we use in our dev. environment. + +* libiconv (mandatory) + libiconv is included in the libc6 package as a GNU standard C + library. In order to compile hypermail, you'll need to install its + headers. + + On Debian: + + apt-get install libc6-dev + +* pkgconf (mandatory) + pkgconf is a system for configuring build dependency information + and is used for compiling against some of the libraries hypermail uses. + + On Debian: + + apt-get install pkgconf + +* libpcre2-dev >= 10.32 (optional) + if you don't have it, hypermail will compile and link with + its own shipped-in version (src/pcre2) - SECURITY WARNING: Do not put hypermail's output anyplace where a web - server might have server side includes (SSI) enabled, unless you are sure - you know what you are doing. If in doubt, check your web server +* libgdbm-dev (optional) + only needed if you plan to use the usegdbm hypermail configuration + option (unmaintained, see docs/hmrc.html) + +* libchardet-dev (optional) + only needed if you want to benefit from automatic character set + detection in messages that don't provide any charset information. + Mostly useful when you're dealing with archives having messages + dating from 1990-2002, when many mail clients were a bit broken.
+ + On Debian: + + apt-get install libchardet-dev libchardet + + If not available on your system, you can download the source code, + compile and install it: + + https://github.com/Joungkyun/libchardet + +For running hypermail: + +* libpcre2 >= 10.42 + only needed if you linked hypermail against the system libpcre2 + library + +* libchardet (optional) + only needed if you linked hypermail against the system libchardet + library + +* libgdbm (optional) + only needed if you plan to use the usegdbm hypermail configuration option + (unmaintained, see docs/hmrc.html) + +SECURITY WARNING: +================ + + Do not put hypermail's output anyplace where a web server might + have server side includes (SSI) enabled, unless you are sure you + know what you are doing. If in doubt, check your web server configuration. If you are using Apache, look for an Options line that mentions Includes or IncludesNOEXEC. The author of an email normally has substantial control over what files hypermail creates, particularly via attachments. Hypermail is designed to insure that - filenames don't end in .shtml and don't contain characters like / or \, - which prevent some security problems, but there are few restrictions - on what can go in a file (e.g. possibly malicious html tags - try the - "text_types = *" option or "ignore_types = $NONPLAIN" option if you want - to restrict this). You might also want to look at the attachmentlink - and unsafe_chars options to restrict attachment filenames. - Also, it is probably a bad idea to enable cgi execution on a directory - that hypermail puts files in. - Do not use the crappy cgi program called "mail". + filenames don't end in .shtml and don't contain characters like / + or \, which prevents some security problems, but there are few + restrictions on what can go in a file (e.g. possibly malicious html + tags - try the "text_types = *" option or "ignore_types = + $NONPLAIN" option if you want to restrict this).
You might also + want to look at the attachmentlink and unsafe_chars options to + restrict attachment filenames. + + Also, it is probably a bad idea to enable cgi execution on a + directory that hypermail puts files in. Before Building Hypermail: ========================== Hypermail now uses "configure" to generate the Makefiles. In the - top level directory, type "configure" to create the Makefiles. - If it does not work on your system, please let me know. + top level directory, type "configure" to create the Makefiles. If + it does not work on your system, please let us know. Building Hypermail: =================== Hypermail has been normally compiled and run on Unix-based systems - in the past. Today it can be configured and built using Cygwin - software. I have either compiled and tested this code successfully - on the following platforms or others have told me of their success. + in the past. It compiles with both gcc and clang on the following + platforms: - Solaris, SunOS 4.1.3, FreeBSD 2.2.5 and later, - BSDI/3.x, Linux kernel 2.0.18 and 2.0.30, Redhat 5.x and later, - NT using CygWin-b19 , - Irix6.2, HP-UX 10.20 and later, SCO OS 5.0.5, and TRU64/OSF1 - on a DEC Alpha + Debian 12, Ubuntu 22.04.2 LTS - Hypermail compiles on MacOSX, tested with X.2.6. Beware that - you may need to configure it to use --disable-shared and manually - execute make in src/pcre before. + Previously, it used to compile on a wide variety of Unix + platforms, MacOS, and Cygwin, but we have not built it recently + on those platforms, nor do we have access to them anymore. We'll + update this page when there is more feedback. - For more information on Cygwin and build hypermail on a Windows-based - system, see the file docs/Install-win32.txt. + For some (old) information on Cygwin and building hypermail on a + Windows-based system, see the file docs/Install-win32.txt.
Generic Build: + 0) If you're checking out and building hypermail for the first time, + type "./autogen.sh" to update the configure files to your + local system setup. + + You only need to do this once, right after cloning the + repository. + 1) Type "./configure". This creates the makefiles and the - config.h file needed to build the software. - If you want to install Hypermail somewhere other than in /usr/local, - run something like this instead: ./configure --prefix=$HOME + config.h file needed to build the software. + + If you want to install Hypermail somewhere other than in + /usr/local, run something like this instead: + ./configure --prefix=$HOME + + There are some options you can enable or disable but all the + useful ones are enabled by default. You can also use the + options to give paths to local checkouts of the dependent libraries + if you didn't install them on your system. + ./configure --help + + Check the configure output to see if you're missing system + libraries. If so, install them and run ./configure again. + 2) Type "make". This will build the software. - If it has trouble finding gdbm (e.g. if it fails with a message such as - "cannot open -lgdbm", you may need to disable gdbm with: + + If it has trouble finding gdbm (e.g. if it fails with a message + such as "cannot open -lgdbm"), you may need to disable gdbm + with: ./configure --without-gdbm - and type "make clean" and then "make" again. (Note that without gdbm, - you can't do incremental updates using the folder_by_date option). + and type "make clean" and then "make" again. (Note that without + gdbm, you can't do incremental updates using the folder_by_date + option).
On some systems you may need to add "-R/usr/local/lib -L/usr/local/lib" to the CFLAGS variable in the Makefiles, or alter your LD_LIBRARY_PATH @@ -94,7 +189,8 @@ Building Hypermail: proto.h:99: conflicting types for `strcasestr' /usr/include/string.h:86: previous declaration of `strcasestr' - then you should try removing the line in proto.h that refers to strcasestr. + then you should try removing the line in proto.h that refers to + strcasestr. Testing Hypermail: ================== @@ -133,40 +229,13 @@ Installing Hypermail: make install - in the main Hypermail directory (the one where you did ./configure). - If it fails with something like: + in the main Hypermail directory (the one where you did + ./configure). If it fails with something like: + mkdir: "/usr/local/apache/htdocs/hypermail": Permission denied + then you may need to rerun ./configure giving it the directory in - which you want to install html documentation files using this option: + which you want to install html documentation files using this + option: + --with-htmldir=/var/www/htdocs - -Upgrading: -============== - The body option has been disabled as of version 2.1.4 for strict - HTML 4.01 compatibility. You should replace any body command you - have in your .hmrc with a style sheet (such as a file called - hypermail.css in the archive directory), and set icss_url and - mcss_url to the url of that style sheet. If you want the appearance - that was the default before 2.1.4, your style sheet should contain this: - -body {color: black; background: #ffffff} -h1.center {text-align: center} -div.center {text-align: center} - - Also, if you have been using the icss_url and/or mcss_url options and - are upgrading to 2.1.4 or higher, you might want to add those statements - to your style sheet, as their style is no longer being provided by - tags, etc. (For users not specifying an icss_url and mcss_url, - default style sheets have been put in all files to maintain that style.) 
- - The overwrite option defaulted to On for many versions. Starting with - version 2.1.4, it defaults to off again. You may want to turn it on - again occasionally to insure that all of your archive uses the same style - (assuming you have a copy of the archive in mbox format). - -::: SPECIAL NOTE::: - The cgi program called "mail" has been disabled. If you've been - using it, you should either stop using it or look carefully enough - at what it does to understand whether it is safe for you to enable - its functionality. This could be a security concern for your site. - diff --git a/LICENSE.txt b/LICENSE.txt new file mode 100644 index 00000000..f288702d --- /dev/null +++ b/LICENSE.txt @@ -0,0 +1,674 @@ + GNU GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The GNU General Public License is a free, copyleft license for +software and other kinds of works. + + The licenses for most software and other practical works are designed +to take away your freedom to share and change the works. By contrast, +the GNU General Public License is intended to guarantee your freedom to +share and change all versions of a program--to make sure it remains free +software for all its users. We, the Free Software Foundation, use the +GNU General Public License for most of our software; it applies also to +any other work released this way by its authors. You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +them if you wish), that you receive source code or can get it if you +want it, that you can change the software or use pieces of it in new +free programs, and that you know you can do these things. 
+ + To protect your rights, we need to prevent others from denying you +these rights or asking you to surrender the rights. Therefore, you have +certain responsibilities if you distribute copies of the software, or if +you modify it: responsibilities to respect the freedom of others. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must pass on to the recipients the same +freedoms that you received. You must make sure that they, too, receive +or can get the source code. And you must show them these terms so they +know their rights. + + Developers that use the GNU GPL protect your rights with two steps: +(1) assert copyright on the software, and (2) offer you this License +giving you legal permission to copy, distribute and/or modify it. + + For the developers' and authors' protection, the GPL clearly explains +that there is no warranty for this free software. For both users' and +authors' sake, the GPL requires that modified versions be marked as +changed, so that their problems will not be attributed erroneously to +authors of previous versions. + + Some devices are designed to deny users access to install or run +modified versions of the software inside them, although the manufacturer +can do so. This is fundamentally incompatible with the aim of +protecting users' freedom to change the software. The systematic +pattern of such abuse occurs in the area of products for individuals to +use, which is precisely where it is most unacceptable. Therefore, we +have designed this version of the GPL to prohibit the practice for those +products. If such problems arise substantially in other domains, we +stand ready to extend this provision to those domains in future versions +of the GPL, as needed to protect the freedom of users. + + Finally, every program is threatened constantly by software patents. 
+States should not allow patents to restrict development and use of +software on general-purpose computers, but in those that do, we wish to +avoid the special danger that patents applied to a free program could +make it effectively proprietary. To prevent this, the GPL assures that +patents cannot be used to render the program non-free. + + The precise terms and conditions for copying, distribution and +modification follow. + + TERMS AND CONDITIONS + + 0. Definitions. + + "This License" refers to version 3 of the GNU General Public License. + + "Copyright" also means copyright-like laws that apply to other kinds of +works, such as semiconductor masks. + + "The Program" refers to any copyrightable work licensed under this +License. Each licensee is addressed as "you". "Licensees" and +"recipients" may be individuals or organizations. + + To "modify" a work means to copy from or adapt all or part of the work +in a fashion requiring copyright permission, other than the making of an +exact copy. The resulting work is called a "modified version" of the +earlier work or a work "based on" the earlier work. + + A "covered work" means either the unmodified Program or a work based +on the Program. + + To "propagate" a work means to do anything with it that, without +permission, would make you directly or secondarily liable for +infringement under applicable copyright law, except executing it on a +computer or modifying a private copy. Propagation includes copying, +distribution (with or without modification), making available to the +public, and in some countries other activities as well. + + To "convey" a work means any kind of propagation that enables other +parties to make or receive copies. Mere interaction with a user through +a computer network, with no transfer of a copy, is not conveying. 
+ + An interactive user interface displays "Appropriate Legal Notices" +to the extent that it includes a convenient and prominently visible +feature that (1) displays an appropriate copyright notice, and (2) +tells the user that there is no warranty for the work (except to the +extent that warranties are provided), that licensees may convey the +work under this License, and how to view a copy of this License. If +the interface presents a list of user commands or options, such as a +menu, a prominent item in the list meets this criterion. + + 1. Source Code. + + The "source code" for a work means the preferred form of the work +for making modifications to it. "Object code" means any non-source +form of a work. + + A "Standard Interface" means an interface that either is an official +standard defined by a recognized standards body, or, in the case of +interfaces specified for a particular programming language, one that +is widely used among developers working in that language. + + The "System Libraries" of an executable work include anything, other +than the work as a whole, that (a) is included in the normal form of +packaging a Major Component, but which is not part of that Major +Component, and (b) serves only to enable use of the work with that +Major Component, or to implement a Standard Interface for which an +implementation is available to the public in source code form. A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. 
However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. 
+ + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. + + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. 
+ + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. 
+ + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. 
If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. + + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. 
If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. 
+ + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. 
For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. 
+ + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. 
You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. + + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. 
The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. 
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. 
+ + + Copyright (C) + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + +Also add information on how to contact you by electronic and paper mail. + + If the program does terminal interaction, make it output a short +notice like this when it starts in an interactive mode: + + Copyright (C) + This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, your program's commands +might be different; for a GUI interface, you would use an "about box". + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU GPL, see +. + + The GNU General Public License does not permit incorporating your program +into proprietary programs. If your program is a subroutine library, you +may consider it more useful to permit linking proprietary applications with +the library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. But first, please read +. 
diff --git a/Makefile.in b/Makefile.in index c679d37d..09e48a9d 100644 --- a/Makefile.in +++ b/Makefile.in @@ -20,13 +20,10 @@ srcdir=@srcdir@ # This is where the HTML documentation goes htmldir=@htmldir@ -# This is where your CGI programs live -cgidir=@cgidir@ - INSTALL_PROG=@INSTALL@ #WNOERROR=-Werror -#WARNINGS=$(WNOERROR) -ansi -pedantic -Wall -Wtraditional -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline -Dlint +#WARNINGS=$(WNOERROR) -ansi -pedantic -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline -Dlint # Compiler to use CC=@CC@ @@ -41,7 +38,7 @@ hypermail: @cd src; $(MAKE) all CC="$(CC)" \ CFLAGS="$(CFLAGS)" \ CPPFLAGS="$(CPPFLAGS)" \ - cgidir="$(cgidir)" bindir="$(bindir)" LIBS="$(LIBS)" + bindir="$(bindir)" LIBS="$(LIBS)" support: @cd archive; $(MAKE) all CC="$(CC)" CFLAGS="$(CFLAGS)" CPPFLAGS="$(CPPFLAGS)" @@ -49,7 +46,7 @@ support: install: @cd src; $(MAKE) install CC="$(CC)" CFLAGS="$(CFLAGS)" \ CPPFLAGS="$(CPPFLAGS)" \ - cgidir="$(cgidir)" bindir="$(bindir)" LIBS="$(LIBS)" + bindir="$(bindir)" LIBS="$(LIBS)" @cd docs; $(MAKE) install CC="$(CC)" CFLAGS="$(CFLAGS)" \ CPPFLAGS="$(CPPFLAGS)" \ $(MAKEFLAGS) mandir="$(mandir)" htmldir="$(htmldir)" @@ -58,7 +55,7 @@ install: bindir="$(bindir)" uninstall: - @cd src; $(MAKE) uninstall cgidir="$(cgidir)" bindir="$(bindir)" + @cd src; $(MAKE) uninstall bindir="$(bindir)" @cd docs; $(MAKE) uninstall mandir="$(mandir)" htmldir="$(htmldir)" @cd archive; $(MAKE) uninstall bindir="$(bindir)" @@ -79,7 +76,6 @@ clobber: clean rm -f config.status rm -f archive/Makefile rm -f docs/Makefile - rm -f libcgi/Makefile rm -f src/Makefile rm -f src/defaults.h rm -f tests/testhm @@ -90,9 +86,23 @@ clobber: clean distclean: clobber rm -f tests/testmail +# creates a distribution tar file. 
+# taking care to replace the git checkout dirname +# by hypermail-version[-extraversion]/ +# from the values found in patchlevel.h tgz: - @(dir=`pwd`;name=`basename $$dir`;echo Creates $$name.tar.gz; cd .. ; \ - tar -cf $$name.tar \ - `cat $$name/FILES | grep -v "^#" | sed "s:^:$$name/:g"` ; \ - gzip $$name.tar ; chmod a+r $$name.tar.gz ; mv $$name.tar.gz $$name/) + @(dir=`pwd`; source_dir=`basename $$dir`; \ + version=`sed -nr '1s/.*VERSION\s+\"(.*)\"/\1/p' patchlevel.h`; \ + extraversion=`sed -nr '2s/.*EXTRAVERSION\s+\"(.*)\"/-\1/p' patchlevel.h`; \ + version=$$version$$extraversion; \ + target_dir="hypermail-$$version"; \ + echo creates $$target_dir.tgz from $$source_dir; \ + cd .. ; \ + transform_exp=s,^$$source_dir/,$$target_dir/,; \ + tar -cf $$target_dir.tar \ + --transform=$$transform_exp \ + `cat $$source_dir/FILES | grep -v "^#" | sed "s:^:$$source_dir/:g"`; \ + gzip $$target_dir.tar ; chmod a+r $$target_dir.tar.gz ; \ + mv $$target_dir.tar.gz $$source_dir/ ; \ + ) diff --git a/README b/README index d5aaa9a0..22d1b55e 100644 --- a/README +++ b/README @@ -1,9 +1,9 @@ Hypermail - Version: 2.4.x + Version: 3.0.x -This is a release of the 2.4.x version of hypermail. +This is a release of the 3.0.x version of hypermail. Hypermail is a program that takes a file of mail messages in UNIX mailbox format and generates a set of cross-referenced HTML documents. @@ -13,25 +13,17 @@ SECURITY WARNING: server side includes (SSI) enabled unless you are sure you know what you are doing. -WARNING: - There once existed a program call "mail" that came with hypermail. - 'mail' utility has not installed by default for the last two years. - This program has been disabled because it was probably easy for spammers - to use as an open relay. It also had problems with enabling malicious - use of JavaScript and CRLF Injection. The 'mail' utility is a historic - reclic and will not be supplied in future versions.
Its functionality - has been replaced with a warning that anyone using it should remove it - immediately. - See the INSTALL file to get started. For a description of how to use it, see the hypermail.html, hmrc.html, and hypermail-faq.html files that come in the docs/ directory. -Please refer to Changelog for the list of recent changes. -Hypermail is distributed under the GNU GPL license (see the file COPYING for -details). Some programs that are distributed with it in the archive and -contrib directories have different licenses - check the individual files for -details. +Please refer to Release Notes and the Changelog for the list of recent +changes. + +Hypermail is distributed under the GNU GPL license (see the file +LICENSE.txt for details). Some programs that are distributed with it +in the archive and contrib directories have different licenses - check +the individual files for details. Hypermail Background: ===================== @@ -59,7 +51,7 @@ EIT's net.disappearance: A very old and established government contractor company called Electronic Instrumentation and Technology Inc. made legal moves to obtain the eit.com domain. Since VeriFone/HP had no interest in keeping -EIT, dissolved it completely. As this company had a trademark on EIT, +EIT, it dissolved it completely. As this company had a trademark on EIT, the domain name was given to them. Elizabeth Batson of EIT/VeriFone/HP informed Kevin he could maintain all his old software himself wherever he wished to put it. @@ -109,7 +101,6 @@ accordingly. 
* contrib - contributed hypermail relate utilities * configs - sample hypermail configuration files, * docs - documentation and documentation support files, - * libcgi - support library for the mail utility, * src - here's the beef, * tests - directory for supporting local testing, @@ -154,7 +145,7 @@ Getting Help: Additionally: ============= - You'll find the image "hypermail.gif" included with the source; + You'll find the image "hypermail.png" included with the source; this icon is for your use in your Hypermail-related pages and links to them. If you are talented with graphics and would like to donate new icons and images to the hypermail effort, please feel free. diff --git a/README.CVS b/README.CVS deleted file mode 100644 index 8361a175..00000000 --- a/README.CVS +++ /dev/null @@ -1,91 +0,0 @@ -[ DEPRECATED We're now using github : https://github.com/hypermail-project/hypermail ] - - Hypermail CVS Server Access - - ---- - Hypermail Development has a CVS server, where we (try to) keep the - latest changes (usually hot out of the oven) and anyone is welcome to - use it. Thanks to Elliot Lee for helping with - setting it up, Daniel Stenberg with his - contributions. And many thanks to Ashley M. Kirchner - for hosting and managing the hypermail CVS server. - ------------------------------------- -Setting up the CVS software locally: - - If you system does not have cvs installed on it already then you - need to do the following to install the client locally. - - - Obtain the cvs source at ftp://ftp.cvshome.org/pub/ - For more information on CVS stop by http://www.cvshome.org/ - - - Compile and install the client (you can disable the server with - the --disable-server during the 'configure' command. 
Read the - INSTALL file once you're uncompressed the archive) - ------------------------------------- -General information on accessing the Hypermail CVS repository: - - Hypermail CVS Archive Address: - - :pserver:cvs@cvs.hypermail.org:/CVS - - The 'cvs' user doesn't have a password, so just hit return when it - asks you for one. The cvs user is setup for read access only. - ------------------------------------- -Step by Step information on accessing the Hypermail CVS repository: - - Aftering installing the CVS software: - - Set your CVSROOT enviroment: - For sh, bash and ksh users, execute the following commands: - - CVSROOT=:pserver:cvs@cvs.hypermail.org:/CVS - export CVSROOT - - (or you can stick them in your .profile and/or .bash_profile file) - For C shell users (csh, tcsh), you can do the following: - - setenv CVSROOT :pserver:cvs@cvs.hypermail.org:/CVS - - (or stick it in your .cshrc and/or .login file) - - - From here you can login to the server with: - - $ cvs login - (Logging in to cvs@cvs.hypermail.org) - CVS password: <-- hit RETURN (cvs user password is blank) - - - Now you're ready to grab the source - - $ cvs checkout hypermail - - This will create a mirror of the sources in your account/on your - machine called 'hypermail' that you can then compile and play - with. 
- - - Once done, don't forget to log out: - - $ cvs logout - - All of the above commands can be performed without having to set a - CVSROOT enviroment if you want, it's just a lot more to type in since - you'll have to specify the directory every time with: - - -d :pserver:cvs@cvs.hypermail.org:/CVS - - For example, you would need to use the following to execute the login - command - - cvs -d :pserver:cvs@cvs.hypermail.org:/CVS login - cvs -d :pserver:cvs@cvs.hypermail.org:/CVS checkout hypermail - cvs -d :pserver:cvs@cvs.hypermail.org:/CVS logout - ------------------------------------- -Browsing the Hypermail CVS Archive: - - You can also browse the repository at: - - http://cvsweb.hypermail.org/ - - ---- diff --git a/RELATED_LICENSES.txt b/RELATED_LICENSES.txt new file mode 100644 index 00000000..0adfadbb --- /dev/null +++ b/RELATED_LICENSES.txt @@ -0,0 +1,316 @@ +This file contains the copying permission notices for various files +and functions in hypermail that have copyright owners other than +the hypermail developers. + +These notices all require that a copy of the notice be included in the +accompanying documentation and be distributed with binary +distributions of the code, so be sure to include this file along with +any binary distributions derived from hypermail. + + +Regular expression support is provided by the PCRE library package, +which is open source software, written by Philip Hazel, and copyright +by the University of Cambridge, England. See http://www.pcre.org/. + +PCRE2 LICENCE +------------- + +PCRE2 is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + +Releases 10.00 and above of PCRE2 are distributed under the terms of the "BSD" +licence, as specified below, with one exemption for certain binary +redistributions. The documentation for PCRE2, supplied in the "doc" directory, +is distributed under the same terms as the software itself. 
The data in the +testdata directory is not copyrighted and is in the public domain. + +The basic library functions are written in C and are freestanding. Also +included in the distribution is a just-in-time compiler that can be used to +optimize pattern matching. This is an optional feature that can be omitted when +the library is built. + + +THE BASIC LIBRARY FUNCTIONS +--------------------------- + +Written by: Philip Hazel +Email local part: Philip.Hazel +Email domain: gmail.com + +Retired from University of Cambridge Computing Service, +Cambridge, England. + +Copyright (c) 1997-2021 University of Cambridge +All rights reserved. + + +PCRE2 JUST-IN-TIME COMPILATION SUPPORT +-------------------------------------- + +Written by: Zoltan Herczeg +Email local part: hzmester +Email domain: freemail.hu + +Copyright(c) 2010-2021 Zoltan Herczeg +All rights reserved. + + +STACK-LESS JUST-IN-TIME COMPILER +-------------------------------- + +Written by: Zoltan Herczeg +Email local part: hzmester +Email domain: freemail.hu + +Copyright(c) 2009-2021 Zoltan Herczeg +All rights reserved. + + +THE "BSD" LICENCE +----------------- + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notices, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notices, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of any + contributors may be used to endorse or promote products derived from this + software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. + + +EXEMPTION FOR BINARY LIBRARY-LIKE PACKAGES +------------------------------------------ + +The second condition in the BSD licence (covering binary redistributions) does +not apply all the way down a chain of software. If binary package A includes +PCRE2, it must respect the condition, but if package B is software that +includes package A, the condition is not imposed on package B unless it uses +PCRE2 independently. + +End + + +The code for strcasecmp() was borrowed from OpenBSD. OpenBSD code can be freely +used, copied, modified, and distributed by anyone and for any purpose. They +are both covered by The Berkeley Copyright license, which has been included +here above. See https://www.openbsd.org/policy.html + +Parts of the code in the uudecode.c module were borrowed from the BSD uudecode +command, which is also covered the Berkeley Copyright license, included here +above. + +The code for strcasestr() was borrowed from glib.c 2.1 +This code is covered by the GNU Lesser General Public License. + + GNU LESSER GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. 
+ Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + + This version of the GNU Lesser General Public License incorporates +the terms and conditions of version 3 of the GNU General Public +License, supplemented by the additional permissions listed below. + + 0. Additional Definitions. + + As used herein, "this License" refers to version 3 of the GNU Lesser +General Public License, and the "GNU GPL" refers to version 3 of the GNU +General Public License. + + "The Library" refers to a covered work governed by this License, +other than an Application or a Combined Work as defined below. + + An "Application" is any work that makes use of an interface provided +by the Library, but which is not otherwise based on the Library. +Defining a subclass of a class defined by the Library is deemed a mode +of using an interface provided by the Library. + + A "Combined Work" is a work produced by combining or linking an +Application with the Library. The particular version of the Library +with which the Combined Work was made is also called the "Linked +Version". + + The "Minimal Corresponding Source" for a Combined Work means the +Corresponding Source for the Combined Work, excluding any source code +for portions of the Combined Work that, considered in isolation, are +based on the Application, and not on the Linked Version. + + The "Corresponding Application Code" for a Combined Work means the +object code and/or source code for the Application, including any data +and utility programs needed for reproducing the Combined Work from the +Application, but excluding the System Libraries of the Combined Work. + + 1. Exception to Section 3 of the GNU GPL. + + You may convey a covered work under sections 3 and 4 of this License +without being bound by section 3 of the GNU GPL. + + 2. Conveying Modified Versions. 
+ + If you modify a copy of the Library, and, in your modifications, a +facility refers to a function or data to be supplied by an Application +that uses the facility (other than as an argument passed when the +facility is invoked), then you may convey a copy of the modified +version: + + a) under this License, provided that you make a good faith effort to + ensure that, in the event an Application does not supply the + function or data, the facility still operates, and performs + whatever part of its purpose remains meaningful, or + + b) under the GNU GPL, with none of the additional permissions of + this License applicable to that copy. + + 3. Object Code Incorporating Material from Library Header Files. + + The object code form of an Application may incorporate material from +a header file that is part of the Library. You may convey such object +code under terms of your choice, provided that, if the incorporated +material is not limited to numerical parameters, data structure +layouts and accessors, or small macros, inline functions and templates +(ten or fewer lines in length), you do both of the following: + + a) Give prominent notice with each copy of the object code that the + Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the object code with a copy of the GNU GPL and this license + document. + + 4. Combined Works. + + You may convey a Combined Work under terms of your choice that, +taken together, effectively do not restrict modification of the +portions of the Library contained in the Combined Work and reverse +engineering for debugging such modifications, if you also do each of +the following: + + a) Give prominent notice with each copy of the Combined Work that + the Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the Combined Work with a copy of the GNU GPL and this license + document. 
+ + c) For a Combined Work that displays copyright notices during + execution, include the copyright notice for the Library among + these notices, as well as a reference directing the user to the + copies of the GNU GPL and this license document. + + d) Do one of the following: + + 0) Convey the Minimal Corresponding Source under the terms of this + License, and the Corresponding Application Code in a form + suitable for, and under terms that permit, the user to + recombine or relink the Application with a modified version of + the Linked Version to produce a modified Combined Work, in the + manner specified by section 6 of the GNU GPL for conveying + Corresponding Source. + + 1) Use a suitable shared library mechanism for linking with the + Library. A suitable mechanism is one that (a) uses at run time + a copy of the Library already present on the user's computer + system, and (b) will operate properly with a modified version + of the Library that is interface-compatible with the Linked + Version. + + e) Provide Installation Information, but only if you would otherwise + be required to provide such information under section 6 of the + GNU GPL, and only to the extent that such information is + necessary to install and execute a modified version of the + Combined Work produced by recombining or relinking the + Application with a modified version of the Linked Version. (If + you use option 4d0, the Installation Information must accompany + the Minimal Corresponding Source and Corresponding Application + Code. If you use option 4d1, you must provide the Installation + Information in the manner specified by section 6 of the GNU GPL + for conveying Corresponding Source.) + + 5. Combined Libraries. 
+ + You may place library facilities that are a work based on the +Library side by side in a single library together with other library +facilities that are not Applications and are not covered by this +License, and convey such a combined library under terms of your +choice, if you do both of the following: + + a) Accompany the combined library with a copy of the same work based + on the Library, uncombined with any other library facilities, + conveyed under the terms of this License. + + b) Give prominent notice with the combined library that part of it + is a work based on the Library, and explaining where to find the + accompanying uncombined form of the same work. + + 6. Revised Versions of the GNU Lesser General Public License. + + The Free Software Foundation may publish revised and/or new versions +of the GNU Lesser General Public License from time to time. Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the +Library as you received it specifies that a certain numbered version +of the GNU Lesser General Public License "or any later version" +applies to it, you have the option of following the terms and +conditions either of that published version or of any later version +published by the Free Software Foundation. If the Library as you +received it does not specify a version number of the GNU Lesser +General Public License, you may choose any version of the GNU Lesser +General Public License ever published by the Free Software Foundation. + + If the Library as you received it specifies that a proxy can decide +whether future versions of the GNU Lesser General Public License shall +apply, that proxy's public statement of acceptance of any version is +permanent authorization for you to choose that version for the +Library. 
+ + +The utf8.h library comes from https://github.com/sheredom/utf8.h +and is covered by the unlicense. + +/* This is free and unencumbered software released into the public domain. + * + * Anyone is free to copy, modify, publish, use, compile, sell, or + * distribute this software, either in source code form or as a compiled + * binary, for any purpose, commercial or non-commercial, and by any + * means. + * + * In jurisdictions that recognize copyright laws, the author or authors + * of this software dedicate any and all copyright interest in the + * software to the public domain. We make this dedication for the benefit + * of the public at large and to the detriment of our heirs and + * successors. We intend this dedication to be an overt act of + * relinquishment in perpetuity of all present and future rights to this + * software under copyright law. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + * IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + * For more information, please refer to */ diff --git a/RELEASE_NOTES b/RELEASE_NOTES new file mode 100644 index 00000000..d17ed554 --- /dev/null +++ b/RELEASE_NOTES @@ -0,0 +1,205 @@ +Hypermail 3.0.0 + +If you are migrating to 3.0.0 from 2.4.0, please consult the UPGRADE +file. + +*** NEW FEATURES + +- HTML5 + +Both the markup generated by hypermail and its documentation have +been updated to HTML5. + +- WAI enhancements / deprecation of text-only browser support + +The generated markup was reviewed and improved for better screen +reader user experience as well as for the overall accessibility of +hypermail.
Due to the need to use HTML5 and CSS to achieve this goal, +we unfortunately had to stop generating markup that could be readable +and look good in text-only browsers, such as lynx and w3m. + +- CSS and hypermail + +The CSS rules that used to be inside messages generated by hypermail +have been moved to external files. + +You now have the possibility to associate the indexes and messages +with a CSS file to customize their presentation. + +You have three configuration options that let you define your own +CSS files: + +- "icss_url" : URL of a CSS file you would like to use only with indexes +- "mcss_url" : URL of a CSS file you would like to use only with messages +- "default_css_url" : URL of a default CSS file that will be used if + either icss_url or mcss_url is not declared. + +If "default_css_url" is not defined and is needed, hypermail will use +the name hypermail.css and will generate this file by itself with some +pre-defined rules, unless this file already exists in the archive +directory. + +Hypermail's default CSS file is also available under +docs/hypermail.css and will be installed with the other documentation +files. + +- Navigation bar for messages + +Previously, hypermail used the same navigation bar for both indexes and +messages. The new configuration option "mhtmlnavbar2upfile" lets you +include in hypermail generated messages more specific HTML formatting +statements giving links to the hierarchy of your archive. By default, +this option uses the value of "ihtmlnavbar2upfile". + +- Deleted messages + +If a message has been marked as deleted by means of an annotation, the +thread view will display a "deleted" message, but not remove the +deleted node, to avoid misinterpretation of threads. A related +change: the in-reply-to and next in thread messages will display +"deleted message" but won't display the subject of the deleted message +anymore.
+ +- Switch to PCRE2 + +Up to now hypermail had been using PCRE behind the scenes of its +message filtering options (docs/hmrc.html#filters). Since PCRE is now +at end of life and is no longer being actively maintained, we have +updated hypermail to support its successor, PCRE2. If you were using any +of Hypermail's filtering options, we advise you to check out PCRE2's +doc to see if there have been any changes there that may impact you: + + https://www.pcre.org/current/doc/html/pcre2pattern.html + +As before, we're shipping the latest source of PCRE2 with hypermail. +However, if your system has PCRE2 installed, hypermail will use that +one when being compiled. At this point in time we're not sure if +shipping this library source is a practice that will be continued in +the future. + +- Other configuration option changes + +Two new configuration options, "archive_date" and "hypermail_colophon", +allow you to choose whether you want to display a line that says when the +archive was generated (only in the indexes) as well as a line that +states that the archive was generated with hypermail and the +generation date (both in messages and indexes). Both these options may +be considered mutually exclusive as they both display the archive +generation date. You can disable both of them. + +"empty_archive_notice" will let you customize the markup and text that +is displayed in indices when you're converting an mbox that consists +exclusively of messages that have been annotated as 'spam' or 'deleted'. + +"show_headers_msg_rfc822" will let you customize the list of headers +that you want to be shown in message/rfc822 attachments. If not +defined, hypermail will use the value of "show_headers". + +"archived_date" lets you control whether you want to add a line +in the indices giving the date the archive was generated. + +Please refer to hmrc.html, hmrc.4, or use hypermail -v for a +more detailed description of the configuration options.
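For illustration, the three CSS options described in these notes could appear in a configuration file roughly as follows. This is a sketch rather than an excerpt from the hypermail documentation: the URLs are placeholders, and the "name = value" layout with "#" comments is assumed to follow the usual hmrc conventions, so check hmrc.html for the authoritative form.

```
# Hypothetical hmrc fragment; every URL below is a placeholder.
icss_url = https://lists.example.org/css/index.css
mcss_url = https://lists.example.org/css/message.css
default_css_url = https://lists.example.org/css/hypermail.css
```

Per the notes above, "default_css_url" is only consulted when one of the two more specific options is not declared.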
+ +- Parser partially revisited + +The parser (a very old source code comment rightfully states this is +the heart of hypermail) has been revised and partially rewritten. We +went from a linear parser output to an intermediary tree +representation of a message that, after the parsing is completed, is +flattened to the structure that hypermail print functions are used +to. + +This new parser has allowed us to better handle message/rfc822 and +multipart/alternative attachments, simplify hypermail's parsing code, +and use better heuristics for determining the charset that will be +associated with a processed message. The new parser also greatly +simplifies adding new markup changes as it allowed us to separate the +parsing from the markup generation. + +- Compiler warnings and memory leaks + +All compiler warnings generated by gcc -Wall -O2 have been resolved. + +Although all major runtime cumulative memory leaks that gcc's +-fsanitize and -libasan found were fixed, there may be parts of the +code that were not tested as they depend on the use and combination of +different hypermail configuration options. + +- Other code improvements + +Read the Changelog for more details. + +*** DEPRECATED FEATURES + +The configuration variable "indextable", which let hypermail +generate message indexes using tables, has been deprecated in favor of +using HTML and CSS. This option has been disabled. The related +code will be removed from the code base in a future release. + +The "quotes" and "finequotes" options are now in the to be deprecated +list. The code that handles these options hasn't been updated or +reviewed in a long time and will be deleted in a future release +unless someone gives them more love. + +Removed the code that was used for the stand-alone script 'mail'. +This script had security issues and had been disabled in 2003. +However, the code and associated libraries were still in +the dist.
+ +*** HYPERMAIL FEATURES THAT NEED MORE LOVE in 3.0.0 + +This is a list of features that have only been partially updated in +this version. We are unsure if people are using them and if they +should be preserved or deprecated. If you're using any of these +features, please check that this new version of hypermail is still +working on your archive as expected. If that's not the case, please +stick to the 2.4 version for the moment. Feedback to know if people +are using these features and/or code patches (better) are welcome. + +All of these features now have a corresponding issue on hypermail's +github repository, labelled with "support requested", for tracking +their status. If we don't get code patches or feedback, we will +consider deprecating them in the next version of hypermail. Please use +those issue trackers to send your feedback. + +- showhtml option: only bare minimum of markup update done. CSS and + generated markup could be improved. Issue #74 on github. + +- linkquotes: this piece of code seems to have stopped working at + least since 2.4.0. This feature has been downgraded to experimental + and risks being deprecated in the next version of + hypermail unless someone updates it. Issue #75 on github. + +- inline html: no specific updates done. The way inline HTML is added + is very simple (just cut and paste of the attachment's body), and often + breaks the valid markup that hypermail added. Issue #76 on github. + +- yearly and monthly summaries. These features do not seem to have been + updated in a long time and are missing links for navigating + back from a message to the summaries, which would make them more useful. + Note that you could add those links with the navbar options. Issue + #77 on github. + +- translation of messages: some languages have been out of date for years and + need updating. Unmaintained languages risk being dropped in a + future version of hypermail. Issue #78 on github.
+ +- compilation: windows LCC support (not updated, seems very + old). Issue #79 on github. + +*** ACKNOWLEDGMENTS + +Many thanks to W3C for their contributions to this version of +hypermail. In particular Josh O'Connor, Daniel Montalvo, and Shadi +Abou-Zahra from the WAI Team for their evaluation and feedback on how +to improve hypermail's accessibility; thanks to Vivien Lacourba and +Gerald Oskoboiny from the Systems Team for their testing and feedback +of the pre-release version of hypermail. + +Many thanks too to Judy Brewer of the W3C WAI Team as well as Vivien +Lacourba of the W3C Systems Team for their involvement and +facilitating discussions and work on this version of hypermail. + +Thanks to Baptiste Daroussin, Jim (@AverageGuy), @cacsar, @schlomif, +and Andy Valencia (@vandys) for their bug reports and code contributions. diff --git a/ROADMAP b/ROADMAP new file mode 100644 index 00000000..22e2e097 --- /dev/null +++ b/ROADMAP @@ -0,0 +1,50 @@ +Here are the priorities I see for the next versions of hypermail: + +- Clean the code. + +Hypermail has many features, but not all of them have been maintained +and updated through the years. Some really need to be revised, like +the txt2html() and finequotes() modules. If people are not using those +features, I'd like to deprecate them. + +In many parts of the code we have commented code with #ifdef 0. Those +are very old parts which made sense to keep when new code was being +introduced to replace old one, but could safely be removed now. + +We have started tracking some potential features to be removed using +github issues labelled with "support request". + +- Support for UTF-8 + +We have some partial support. However, there are many C lib functions +that don't understand UTF-8, such as isspace() and scanf(). These +functions are used in important parts of hypermail, such as parsing +messages and scanning lines for URLs. 
In order to increase the use of +UTF-8 inside hypermail, we need to: + + - Complete the support for RFC2646 format=flowed. This RFC allows + breaking long lines in the body of a message into multiple lines for + transmission. Due to UTF-8 having the potential to be multi-byte, + we need to reconstitute the original unbroken line before + processing it for UTF-8. + + - Add support for RFC2231 data format. This is similar to RFC2646 + but applies to multi-line header parameter values. + + - Convert the message body into UTF-8. We are already using iconv() + for converting headers. What is missing is converting the parts of + the message that will be further processed (like when scanning for + links, removing extra space from subjects, ...) into UCS-4 so that + we can use unicode-aware library functions. glib seems to be a + good candidate for providing the last part of these functions. I'm + not sure at this point if we should convert the whole message body into + UCS-4 and just output UTF-8 when writing the converted message. + +- Support for a json output + +The idea here is to add an option to be able to output a headless +archive, formatted into json and let other applications transform this +data into markup. This would help separate the parsing and updating of +messages from their presentation and probably help ease the +customization of markup into something more modern or reactive as +needed.
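The format=flowed item in the UTF-8 list above can be sketched as follows. This is a minimal Python illustration of the unfolding step only, not hypermail's C code: the function name unflow is hypothetical, and the sketch assumes DelSp=no, ignores quote depth ('>' prefixes), and works on text that has already been decoded.

```python
def unflow(lines):
    """Rejoin format=flowed soft-wrapped lines into unbroken paragraphs.

    Simplified sketch: assumes DelSp=no and ignores quote depth.
    """
    paragraphs = []
    pending = None  # paragraph still being accumulated, or None

    for raw in lines:
        line = raw.rstrip("\r\n")

        # Undo space-stuffing: a leading space was added by the sender.
        if line.startswith(" "):
            line = line[1:]

        # The signature separator ends with a space but is never flowed.
        if line == "-- ":
            if pending is not None:
                paragraphs.append(pending)
                pending = None
            paragraphs.append(line)
            continue

        pending = line if pending is None else pending + line

        # A trailing space marks a soft break; anything else is a hard break.
        if not line.endswith(" "):
            paragraphs.append(pending)
            pending = None

    # Flush a paragraph left open by a final soft break.
    if pending is not None:
        paragraphs.append(pending)
    return paragraphs


print(unflow(["Hello ", "world", "", "-- ", "sig"]))
# ['Hello world', '', '-- ', 'sig']
```

Working on whole rejoined paragraphs, rather than on each transmitted line, is what would let later steps such as link scanning see the original unbroken text.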
diff --git a/UPGRADE b/UPGRADE index fe80ac9e..12f15d93 100644 --- a/UPGRADE +++ b/UPGRADE @@ -1,6 +1,80 @@ Upgrade Notes ============== +From Hypermail 2.4.0 to Hypermail 3.0.0 +----------------------------------------- + +Due to the markup changes to support HTML5 and enhance the WAI +support, it is strongly recommended that you delete all existing HTML +files and directories in the archive directory and that you rebuild +your archive completely using version 3.0.0. Failure to do so may +produce invalid markup in some of your messages or have duplicate +links that point to replies and other messages (the 2.4.0 + the 3.0.0 +ones) in your current messages. + +With a few exceptions related to specific options (see +below), all generated markup was reviewed with the help of experts of +the WAI community to make sure it is accessible. We may have missed +some files that are only generated when using specific configuration +options; if that's the case, please open an issue and we'll look at +it. + +Hypermail is now using PCRE2. If you were using any of the filtering +options, you should check if your regular expressions need updating. +For more info see for example: + + https://wiki.php.net/rfc/pcre2-migration + +All styling that used to be embedded in hypermail's generated archive +files was moved to an external stylesheet. This stylesheet, +hypermail.css, will be generated by default in your archive directory +unless you use options to give a link to an external one. Together +with this move, more markup was added to allow a better use of CSS and +better accessibility and use of screen readers. + +You can find hypermail's current stylesheet under docs/hypermail.css. +If you were already relying on an external stylesheet file, we advise +you to check hypermail's default shipped one to see if there are +changes you'd be interested in supporting in your own stylesheet file.
+ +The following configuration options are now considered deprecated and +will be ignored. They will be removed from the code base in the next +hypermail release: + +- showhr, usetable, iquotes. All of these options have been made + obsolete by markup changes in hypermail. You can emulate their + behavior by using CSS. + +The following options are being considered as future candidates for +deprecation: + +- showhtml. We are not sure if people are using this option with + values 1 and 2. The markup that's produced isn't so special and we + think that it'd be better to merge whatever features people deem + useful there to the normal markup and use CSS to control + presentation. Minimal work has been done with this option other than + to make sure the produced markup is valid. The markup that's + produced with this option was not given a WAI review. + +The following options have been considered unstable since version 2.4 +and are considered future candidates for deprecation unless people +state they are being used and someone proposes to adopt their code and +gives it more love and maintenance: + +- linkquotes, searchbackmsgnum, link_to_replies, quote_hide_threshold, + quote_link_string. Note that all these options are related to + linkquotes. No work was done in 3.0.0 regarding these options. + +By default, hypermail will warn you if your configuration file includes +deprecated or planned to be deprecated options. You can switch off this +warning by using the new "warn_deprecated_options" configuration option. + +The yearly and monthly summaries seem to have been broken for a +long time. A few updates were made but they require more +love to make them more useful. In particular, messages are missing +links pointing back to the summaries. Note that you could add these links +by using the different navbar options.
+ From Hypermail 2.3.0 to Hypermail 2.4.0 ----------------------------------------- diff --git a/acconfig.h b/acconfig.h deleted file mode 100644 index eb7720ab..00000000 --- a/acconfig.h +++ /dev/null @@ -1,2 +0,0 @@ - -#undef HAVE_GDBM_H diff --git a/aclocal.m4 b/aclocal.m4 index ebc6a54d..460e78ff 100644 --- a/aclocal.m4 +++ b/aclocal.m4 @@ -1,140 +1,15 @@ -dnl Some macros borrowed from Apache's httpd config file. Thanks! +# generated automatically by aclocal 1.16.3 -*- Autoconf -*- -dnl -dnl APR_SUBDIR_CONFIG(dir [, sub-package-cmdline-args, args-to-drop]) -dnl -dnl dir: directory to find configure in -dnl sub-package-cmdline-args: arguments to add to the invocation (optional) -dnl args-to-drop: arguments to drop from the invocation (optional) -dnl -dnl Note: This macro relies on ac_configure_args being set properly. -dnl -dnl The args-to-drop argument is shoved into a case statement, so -dnl multiple arguments can be separated with a |. -dnl -dnl Note: Older versions of autoconf do not single-quote args, while 2.54+ -dnl places quotes around every argument. So, if you want to drop the -dnl argument called --enable-layout, you must pass the third argument as: -dnl [--enable-layout=*|\'--enable-layout=*] -dnl -dnl Trying to optimize this is left as an exercise to the reader who wants -dnl to put up with more autoconf craziness. I give up. -dnl -AC_DEFUN([APR_SUBDIR_CONFIG], [ - # save our work to this point; this allows the sub-package to use it - AC_CACHE_SAVE +# Copyright (C) 1996-2020 Free Software Foundation, Inc. - echo "configuring package in $1 now" - ac_popdir=`pwd` - apr_config_subdirs="$1" - test -d $1 || $mkdir_p $1 - ac_abs_srcdir=`(cd $srcdir/$1 && pwd)` - cd $1 +# This file is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. -changequote(, )dnl - # A "../" for each directory in /$config_subdirs. 
- ac_dots=`echo $apr_config_subdirs|sed -e 's%^\./%%' -e 's%[^/]$%&/%' -e 's%[^/]*/%../%g'` -changequote([, ])dnl +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY, to the extent permitted by law; without +# even the implied warranty of MERCHANTABILITY or FITNESS FOR A +# PARTICULAR PURPOSE. - # Make the cache file pathname absolute for the subdirs - # required to correctly handle subdirs that might actually - # be symlinks - case "$cache_file" in - /*) # already absolute - ac_sub_cache_file=$cache_file ;; - *) # Was relative path. - ac_sub_cache_file="$ac_popdir/$cache_file" ;; - esac - - ifelse($3, [], [apr_configure_args=$ac_configure_args],[ - apr_configure_args= - apr_sep= - for apr_configure_arg in $ac_configure_args - do - case "$apr_configure_arg" in - $3) - continue ;; - esac - apr_configure_args="$apr_configure_args$apr_sep'$apr_configure_arg'" - apr_sep=" " - done - ]) - - dnl autoconf doesn't add --silent to ac_configure_args; explicitly pass it - test "x$silent" = "xyes" && apr_configure_args="$apr_configure_args --silent" - - dnl AC_CONFIG_SUBDIRS silences option warnings, emulate this for 2.62 - apr_configure_args="--disable-option-checking $apr_configure_args" - - dnl The eval makes quoting arguments work - specifically the second argument - dnl where the quoting mechanisms used is "" rather than []. - dnl - dnl We need to execute another shell because some autoconf/shell combinations - dnl will choke after doing repeated APR_SUBDIR_CONFIG()s. 
(Namely Solaris - dnl and autoconf-2.54+) - if eval $SHELL $ac_abs_srcdir/configure $apr_configure_args --cache-file=$ac_sub_cache_file --srcdir=$ac_abs_srcdir $2 - then : - echo "$1 configured properly" - else - echo "configure failed for $1" - exit 1 - fi - - cd $ac_popdir - - # grab any updates from the sub-package - AC_CACHE_LOAD -])dnl - -dnl -dnl APR_ADDTO(variable, value) -dnl -dnl Add value to variable -dnl -AC_DEFUN([APR_ADDTO], [ - if test "x$$1" = "x"; then - test "x$silent" != "xyes" && echo " setting $1 to \"$2\"" - $1="$2" - else - apr_addto_bugger="$2" - for i in $apr_addto_bugger; do - apr_addto_duplicate="0" - for j in $$1; do - if test "x$i" = "x$j"; then - apr_addto_duplicate="1" - break - fi - done - if test $apr_addto_duplicate = "0"; then - test "x$silent" != "xyes" && echo " adding \"$i\" to $1" - $1="$$1 $i" - fi - done - fi -])dnl - -dnl -dnl APR_REMOVEFROM(variable, value) -dnl -dnl Remove a value from a variable -dnl -AC_DEFUN([APR_REMOVEFROM], [ - if test "x$$1" = "x$2"; then - test "x$silent" != "xyes" && echo " nulling $1" - $1="" - else - apr_new_bugger="" - apr_removed=0 - for i in $$1; do - if test "x$i" != "x$2"; then - apr_new_bugger="$apr_new_bugger $i" - else - apr_removed=1 - fi - done - if test $apr_removed = "1"; then - test "x$silent" != "xyes" && echo " removed \"$2\" from $1" - $1=$apr_new_bugger - fi - fi -]) dnl +m4_ifndef([AC_CONFIG_MACRO_DIRS], [m4_defun([_AM_CONFIG_MACRO_DIRS], [])m4_defun([AC_CONFIG_MACRO_DIRS], [_AM_CONFIG_MACRO_DIRS($@)])]) +m4_include([m4/apr.m4]) diff --git a/config.guess b/config.guess index ed2e03b7..69188da7 100755 --- a/config.guess +++ b/config.guess @@ -1,13 +1,14 @@ #! /bin/sh # Attempt to guess a canonical system name. -# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, -# 2000, 2001, 2002 Free Software Foundation, Inc. +# Copyright 1992-2023 Free Software Foundation, Inc. 
-timestamp='2002-03-20'
+# shellcheck disable=SC2006,SC2268 # see below for rationale
+
+timestamp='2023-01-01'
 
 # This file is free software; you can redistribute it and/or modify it
 # under the terms of the GNU General Public License as published by
-# the Free Software Foundation; either version 2 of the License, or
+# the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful, but
@@ -16,24 +17,30 @@ timestamp='2002-03-20'
 # General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
-# along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+# along with this program; if not, see .
 #
 # As a special exception to the GNU General Public License, if you
 # distribute this file as part of a program that contains a
 # configuration script generated by Autoconf, you may include it under
-# the same distribution terms that you use for the rest of that program.
-
-# Originally written by Per Bothner .
-# Please send patches to . Submit a context
-# diff and a properly formatted ChangeLog entry.
+# the same distribution terms that you use for the rest of that
+# program. This Exception is an additional permission under section 7
+# of the GNU General Public License, version 3 ("GPLv3").
 #
-# This script attempts to guess a canonical system name similar to
-# config.sub. If it succeeds, it prints the system name on stdout, and
-# exits with 0. Otherwise, it exits with 1.
+# Originally written by Per Bothner; maintained since 2000 by Ben Elliston.
 #
-# The plan is that this can be called by configure scripts if you
-# don't specify an explicit build system type.
+# You can get the latest version of this script from:
+# https://git.savannah.gnu.org/cgit/config.git/plain/config.guess
+#
+# Please send patches to .
+
+
+# The "shellcheck disable" line above the timestamp inhibits complaints
+# about features and limitations of the classic Bourne shell that were
+# superseded or lifted in POSIX. However, this script identifies a wide
+# variety of pre-POSIX systems that do not have POSIX shells at all, and
+# even some reasonably current systems (Solaris 10 as case-in-point) still
+# have a pre-POSIX /bin/sh.
+
 me=`echo "$0" | sed -e 's,.*/,,'`
@@ -42,7 +49,7 @@ Usage: $0 [OPTION]
 
 Output the configuration name of the system \`$me' is run on.
 
-Operation modes:
+Options:
   -h, --help print this help, then exit
   -t, --time-stamp print date of last modification, then exit
   -v, --version print version number, then exit
@@ -53,8 +60,7 @@ version="\
 GNU config.guess ($timestamp)
 
 Originally written by Per Bothner.
-Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001
-Free Software Foundation, Inc.
+Copyright 1992-2023 Free Software Foundation, Inc.
 
 This is free software; see the source for copying conditions. There is NO
 warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE."
@@ -66,11 +72,11 @@ Try \`$me --help' for more information."
 while test $# -gt 0 ; do
   case $1 in
     --time-stamp | --time* | -t )
-      echo "$timestamp" ; exit 0 ;;
+      echo "$timestamp" ; exit ;;
     --version | -v )
-      echo "$version" ; exit 0 ;;
+      echo "$version" ; exit ;;
     --help | --h* | -h )
-      echo "$usage"; exit 0 ;;
+      echo "$usage"; exit ;;
     -- ) # Stop option processing
       shift; break ;;
     - ) # Use stdin as input.
@@ -88,48 +94,106 @@ if test $# != 0; then
   exit 1
 fi
+# Just in case it came from the environment.
+GUESS=
-dummy=dummy-$$
-trap 'rm -f $dummy.c $dummy.o $dummy.rel $dummy; exit 1' 1 2 15
+# CC_FOR_BUILD -- compiler used by this script. Note that the use of a
+# compiler to aid in system detection is discouraged as it requires
+# temporary files to be created and, as you can see below, it is a
+# headache to deal with in a portable fashion.
-# CC_FOR_BUILD -- compiler used by this script.
 # Historically, `CC_FOR_BUILD' used to be named `HOST_CC'. We still
 # use `HOST_CC' if defined, but it is deprecated.
-set_cc_for_build='case $CC_FOR_BUILD,$HOST_CC,$CC in
-  ,,) echo "int dummy(){}" > $dummy.c ;
-    for c in cc gcc c89 c99 ; do
-      ($c $dummy.c -c -o $dummy.o) >/dev/null 2>&1 ;
-      if test $? = 0 ; then
-        CC_FOR_BUILD="$c"; break ;
-      fi ;
-    done ;
-    rm -f $dummy.c $dummy.o $dummy.rel ;
-    if test x"$CC_FOR_BUILD" = x ; then
-      CC_FOR_BUILD=no_compiler_found ;
-    fi
-    ;;
-  ,,*) CC_FOR_BUILD=$CC ;;
-  ,*,*) CC_FOR_BUILD=$HOST_CC ;;
-esac'
+# Portable tmp directory creation inspired by the Autoconf team.
+
+tmp=
+# shellcheck disable=SC2172
+trap 'test -z "$tmp" || rm -fr "$tmp"' 0 1 2 13 15
+
+set_cc_for_build() {
+    # prevent multiple calls if $tmp is already set
+    test "$tmp" && return 0
+    : "${TMPDIR=/tmp}"
+    # shellcheck disable=SC2039,SC3028
+    { tmp=`(umask 077 && mktemp -d "$TMPDIR/cgXXXXXX") 2>/dev/null` && test -n "$tmp" && test -d "$tmp" ; } ||
+    { test -n "$RANDOM" && tmp=$TMPDIR/cg$$-$RANDOM && (umask 077 && mkdir "$tmp" 2>/dev/null) ; } ||
+    { tmp=$TMPDIR/cg-$$ && (umask 077 && mkdir "$tmp" 2>/dev/null) && echo "Warning: creating insecure temp directory" >&2 ; } ||
+    { echo "$me: cannot create a temporary directory in $TMPDIR" >&2 ; exit 1 ; }
+    dummy=$tmp/dummy
+    case ${CC_FOR_BUILD-},${HOST_CC-},${CC-} in
+      ,,) echo "int x;" > "$dummy.c"
+        for driver in cc gcc c89 c99 ; do
+          if ($driver -c -o "$dummy.o" "$dummy.c") >/dev/null 2>&1 ; then
+            CC_FOR_BUILD=$driver
+            break
+          fi
+        done
+        if test x"$CC_FOR_BUILD" = x ; then
+          CC_FOR_BUILD=no_compiler_found
+        fi
+        ;;
+      ,,*) CC_FOR_BUILD=$CC ;;
+      ,*,*) CC_FOR_BUILD=$HOST_CC ;;
+    esac
+}
 # This is needed to find uname on a Pyramid OSx when run in the BSD universe.
# (ghazi@noc.rutgers.edu 1994-08-24) -if (test -f /.attbin/uname) >/dev/null 2>&1 ; then +if test -f /.attbin/uname ; then PATH=$PATH:/.attbin ; export PATH fi UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown -UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown +UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown +case $UNAME_SYSTEM in +Linux|GNU|GNU/*) + LIBC=unknown + + set_cc_for_build + cat <<-EOF > "$dummy.c" + #include + #if defined(__UCLIBC__) + LIBC=uclibc + #elif defined(__dietlibc__) + LIBC=dietlibc + #elif defined(__GLIBC__) + LIBC=gnu + #else + #include + /* First heuristic to detect musl libc. */ + #ifdef __DEFINED_va_list + LIBC=musl + #endif + #endif + EOF + cc_set_libc=`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^LIBC' | sed 's, ,,g'` + eval "$cc_set_libc" + + # Second heuristic to detect musl libc. + if [ "$LIBC" = unknown ] && + command -v ldd >/dev/null && + ldd --version 2>&1 | grep -q ^musl; then + LIBC=musl + fi + + # If the system lacks a compiler, then just pick glibc. + # We could probably try harder. + if [ "$LIBC" = unknown ]; then + LIBC=gnu + fi + ;; +esac + # Note: order is significant - the case branches are not exclusive. -case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in +case $UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION in *:NetBSD:*:*) # NetBSD (nbsd) targets should (where applicable) match one or - # more of the tupples: *-*-netbsdelf*, *-*-netbsdaout*, + # more of the tuples: *-*-netbsdelf*, *-*-netbsdaout*, # *-*-netbsdecoff* and *-*-netbsd*. For targets that recently # switched to ELF, *-*-netbsd* would select the old # object file format. 
This provides both forward @@ -138,22 +202,34 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in # # Note: NetBSD doesn't particularly care about the vendor # portion of the name. We always set it to "unknown". - sysctl="sysctl -n hw.machine_arch" - UNAME_MACHINE_ARCH=`(/sbin/$sysctl 2>/dev/null || \ - /usr/sbin/$sysctl 2>/dev/null || echo unknown)` - case "${UNAME_MACHINE_ARCH}" in + UNAME_MACHINE_ARCH=`(uname -p 2>/dev/null || \ + /sbin/sysctl -n hw.machine_arch 2>/dev/null || \ + /usr/sbin/sysctl -n hw.machine_arch 2>/dev/null || \ + echo unknown)` + case $UNAME_MACHINE_ARCH in + aarch64eb) machine=aarch64_be-unknown ;; + armeb) machine=armeb-unknown ;; arm*) machine=arm-unknown ;; sh3el) machine=shl-unknown ;; sh3eb) machine=sh-unknown ;; - *) machine=${UNAME_MACHINE_ARCH}-unknown ;; + sh5el) machine=sh5le-unknown ;; + earmv*) + arch=`echo "$UNAME_MACHINE_ARCH" | sed -e 's,^e\(armv[0-9]\).*$,\1,'` + endian=`echo "$UNAME_MACHINE_ARCH" | sed -ne 's,^.*\(eb\)$,\1,p'` + machine=${arch}${endian}-unknown + ;; + *) machine=$UNAME_MACHINE_ARCH-unknown ;; esac # The Operating System including object format, if it has switched - # to ELF recently, or will in the future. - case "${UNAME_MACHINE_ARCH}" in + # to ELF recently (or will in the future) and ABI. + case $UNAME_MACHINE_ARCH in + earm*) + os=netbsdelf + ;; arm*|i386|m68k|ns32k|sh3*|sparc|vax) - eval $set_cc_for_build + set_cc_for_build if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \ - | grep __ELF__ >/dev/null + | grep -q __ELF__ then # Once all utilities can be ECOFF (netbsdecoff) or a.out (netbsdaout). # Return netbsd for either. FIX? @@ -163,198 +239,248 @@ case "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" in fi ;; *) - os=netbsd + os=netbsd + ;; + esac + # Determine ABI tags. 
+ case $UNAME_MACHINE_ARCH in + earm*) + expr='s/^earmv[0-9]/-eabi/;s/eb$//' + abi=`echo "$UNAME_MACHINE_ARCH" | sed -e "$expr"` ;; esac # The OS release - release=`echo ${UNAME_RELEASE}|sed -e 's/[-_].*/\./'` + # Debian GNU/NetBSD machines have a different userland, and + # thus, need a distinct triplet. However, they do not need + # kernel version information, so it can be replaced with a + # suitable tag, in the style of linux-gnu. + case $UNAME_VERSION in + Debian*) + release='-gnu' + ;; + *) + release=`echo "$UNAME_RELEASE" | sed -e 's/[-_].*//' | cut -d. -f1,2` + ;; + esac # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM: # contains redundant information, the shorter form: # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used. - echo "${machine}-${os}${release}" - exit 0 ;; - amiga:OpenBSD:*:*) - echo m68k-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - arc:OpenBSD:*:*) - echo mipsel-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - hp300:OpenBSD:*:*) - echo m68k-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - mac68k:OpenBSD:*:*) - echo m68k-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - macppc:OpenBSD:*:*) - echo powerpc-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - mvme68k:OpenBSD:*:*) - echo m68k-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - mvme88k:OpenBSD:*:*) - echo m88k-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - mvmeppc:OpenBSD:*:*) - echo powerpc-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - pmax:OpenBSD:*:*) - echo mipsel-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - sgi:OpenBSD:*:*) - echo mipseb-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - sun3:OpenBSD:*:*) - echo m68k-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; - wgrisc:OpenBSD:*:*) - echo mipsel-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; + GUESS=$machine-${os}${release}${abi-} + ;; + *:Bitrig:*:*) + UNAME_MACHINE_ARCH=`arch | sed 's/Bitrig.//'` + GUESS=$UNAME_MACHINE_ARCH-unknown-bitrig$UNAME_RELEASE + ;; *:OpenBSD:*:*) - echo ${UNAME_MACHINE}-unknown-openbsd${UNAME_RELEASE} - exit 0 ;; + 
UNAME_MACHINE_ARCH=`arch | sed 's/OpenBSD.//'` + GUESS=$UNAME_MACHINE_ARCH-unknown-openbsd$UNAME_RELEASE + ;; + *:SecBSD:*:*) + UNAME_MACHINE_ARCH=`arch | sed 's/SecBSD.//'` + GUESS=$UNAME_MACHINE_ARCH-unknown-secbsd$UNAME_RELEASE + ;; + *:LibertyBSD:*:*) + UNAME_MACHINE_ARCH=`arch | sed 's/^.*BSD\.//'` + GUESS=$UNAME_MACHINE_ARCH-unknown-libertybsd$UNAME_RELEASE + ;; + *:MidnightBSD:*:*) + GUESS=$UNAME_MACHINE-unknown-midnightbsd$UNAME_RELEASE + ;; + *:ekkoBSD:*:*) + GUESS=$UNAME_MACHINE-unknown-ekkobsd$UNAME_RELEASE + ;; + *:SolidBSD:*:*) + GUESS=$UNAME_MACHINE-unknown-solidbsd$UNAME_RELEASE + ;; + *:OS108:*:*) + GUESS=$UNAME_MACHINE-unknown-os108_$UNAME_RELEASE + ;; + macppc:MirBSD:*:*) + GUESS=powerpc-unknown-mirbsd$UNAME_RELEASE + ;; + *:MirBSD:*:*) + GUESS=$UNAME_MACHINE-unknown-mirbsd$UNAME_RELEASE + ;; + *:Sortix:*:*) + GUESS=$UNAME_MACHINE-unknown-sortix + ;; + *:Twizzler:*:*) + GUESS=$UNAME_MACHINE-unknown-twizzler + ;; + *:Redox:*:*) + GUESS=$UNAME_MACHINE-unknown-redox + ;; + mips:OSF1:*.*) + GUESS=mips-dec-osf1 + ;; alpha:OSF1:*:*) - if test $UNAME_RELEASE = "V4.0"; then + # Reset EXIT trap before exiting to avoid spurious non-zero exit code. + trap '' 0 + case $UNAME_RELEASE in + *4.0) UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'` - fi + ;; + *5.*) + UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $4}'` + ;; + esac + # According to Compaq, /usr/sbin/psrinfo has been available on + # OSF/1 and Tru64 systems produced since 1995. I hope that + # covers most systems running today. This code pipes the CPU + # types through head -n 1, so we only detect the type of CPU 0. 
+ ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^ The alpha \(.*\) processor.*$/\1/p' | head -n 1` + case $ALPHA_CPU_TYPE in + "EV4 (21064)") + UNAME_MACHINE=alpha ;; + "EV4.5 (21064)") + UNAME_MACHINE=alpha ;; + "LCA4 (21066/21068)") + UNAME_MACHINE=alpha ;; + "EV5 (21164)") + UNAME_MACHINE=alphaev5 ;; + "EV5.6 (21164A)") + UNAME_MACHINE=alphaev56 ;; + "EV5.6 (21164PC)") + UNAME_MACHINE=alphapca56 ;; + "EV5.7 (21164PC)") + UNAME_MACHINE=alphapca57 ;; + "EV6 (21264)") + UNAME_MACHINE=alphaev6 ;; + "EV6.7 (21264A)") + UNAME_MACHINE=alphaev67 ;; + "EV6.8CB (21264C)") + UNAME_MACHINE=alphaev68 ;; + "EV6.8AL (21264B)") + UNAME_MACHINE=alphaev68 ;; + "EV6.8CX (21264D)") + UNAME_MACHINE=alphaev68 ;; + "EV6.9A (21264/EV69A)") + UNAME_MACHINE=alphaev69 ;; + "EV7 (21364)") + UNAME_MACHINE=alphaev7 ;; + "EV7.9 (21364A)") + UNAME_MACHINE=alphaev79 ;; + esac + # A Pn.n version is a patched version. # A Vn.n version is a released version. # A Tn.n version is a released field test version. # A Xn.n version is an unreleased experimental baselevel. # 1.2 uses "1.2" for uname -r. - cat <$dummy.s - .data -\$Lformat: - .byte 37,100,45,37,120,10,0 # "%d-%x\n" - - .text - .globl main - .align 4 - .ent main -main: - .frame \$30,16,\$26,0 - ldgp \$29,0(\$27) - .prologue 1 - .long 0x47e03d80 # implver \$0 - lda \$2,-1 - .long 0x47e20c21 # amask \$2,\$1 - lda \$16,\$Lformat - mov \$0,\$17 - not \$1,\$18 - jsr \$26,printf - ldgp \$29,0(\$26) - mov 0,\$16 - jsr \$26,exit - .end main -EOF - eval $set_cc_for_build - $CC_FOR_BUILD $dummy.s -o $dummy 2>/dev/null - if test "$?" 
= 0 ; then - case `./$dummy` in - 0-0) - UNAME_MACHINE="alpha" - ;; - 1-0) - UNAME_MACHINE="alphaev5" - ;; - 1-1) - UNAME_MACHINE="alphaev56" - ;; - 1-101) - UNAME_MACHINE="alphapca56" - ;; - 2-303) - UNAME_MACHINE="alphaev6" - ;; - 2-307) - UNAME_MACHINE="alphaev67" - ;; - 2-1307) - UNAME_MACHINE="alphaev68" - ;; - esac - fi - rm -f $dummy.s $dummy - echo ${UNAME_MACHINE}-dec-osf`echo ${UNAME_RELEASE} | sed -e 's/^[VTX]//' | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` - exit 0 ;; - Alpha\ *:Windows_NT*:*) - # How do we know it's Interix rather than the generic POSIX subsystem? - # Should we change UNAME_MACHINE based on the output of uname instead - # of the specific Alpha model? - echo alpha-pc-interix - exit 0 ;; - 21064:Windows_NT:50:3) - echo alpha-dec-winnt3.5 - exit 0 ;; + OSF_REL=`echo "$UNAME_RELEASE" | sed -e 's/^[PVTX]//' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` + GUESS=$UNAME_MACHINE-dec-osf$OSF_REL + ;; Amiga*:UNIX_System_V:4.0:*) - echo m68k-unknown-sysv4 - exit 0;; + GUESS=m68k-unknown-sysv4 + ;; *:[Aa]miga[Oo][Ss]:*:*) - echo ${UNAME_MACHINE}-unknown-amigaos - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-amigaos + ;; *:[Mm]orph[Oo][Ss]:*:*) - echo ${UNAME_MACHINE}-unknown-morphos - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-morphos + ;; *:OS/390:*:*) - echo i370-ibm-openedition - exit 0 ;; + GUESS=i370-ibm-openedition + ;; + *:z/VM:*:*) + GUESS=s390-ibm-zvmoe + ;; + *:OS400:*:*) + GUESS=powerpc-ibm-os400 + ;; arm:RISC*:1.[012]*:*|arm:riscix:1.[012]*:*) - echo arm-acorn-riscix${UNAME_RELEASE} - exit 0;; + GUESS=arm-acorn-riscix$UNAME_RELEASE + ;; + arm*:riscos:*:*|arm*:RISCOS:*:*) + GUESS=arm-unknown-riscos + ;; SR2?01:HI-UX/MPP:*:* | SR8000:HI-UX/MPP:*:*) - echo hppa1.1-hitachi-hiuxmpp - exit 0;; + GUESS=hppa1.1-hitachi-hiuxmpp + ;; Pyramid*:OSx*:*:* | MIS*:OSx*:*:* | MIS*:SMP_DC-OSx*:*:*) # akee@wpdis03.wpafb.af.mil (Earle F. Ake) contributed MIS and NILE. 
- if test "`(/bin/universe) 2>/dev/null`" = att ; then - echo pyramid-pyramid-sysv3 - else - echo pyramid-pyramid-bsd - fi - exit 0 ;; + case `(/bin/universe) 2>/dev/null` in + att) GUESS=pyramid-pyramid-sysv3 ;; + *) GUESS=pyramid-pyramid-bsd ;; + esac + ;; NILE*:*:*:dcosx) - echo pyramid-pyramid-svr4 - exit 0 ;; + GUESS=pyramid-pyramid-svr4 + ;; + DRS?6000:unix:4.0:6*) + GUESS=sparc-icl-nx6 + ;; + DRS?6000:UNIX_SV:4.2*:7* | DRS?6000:isis:4.2*:7*) + case `/usr/bin/uname -p` in + sparc) GUESS=sparc-icl-nx7 ;; + esac + ;; + s390x:SunOS:*:*) + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'` + GUESS=$UNAME_MACHINE-ibm-solaris2$SUN_REL + ;; sun4H:SunOS:5.*:*) - echo sparc-hal-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` - exit 0 ;; + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'` + GUESS=sparc-hal-solaris2$SUN_REL + ;; sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*) - echo sparc-sun-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` - exit 0 ;; - i86pc:SunOS:5.*:*) - echo i386-pc-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` - exit 0 ;; + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'` + GUESS=sparc-sun-solaris2$SUN_REL + ;; + i86pc:AuroraUX:5.*:* | i86xen:AuroraUX:5.*:*) + GUESS=i386-pc-auroraux$UNAME_RELEASE + ;; + i86pc:SunOS:5.*:* | i86xen:SunOS:5.*:*) + set_cc_for_build + SUN_ARCH=i386 + # If there is a compiler, see if it is configured for 64-bit objects. + # Note that the Sun cc does not turn __LP64__ into 1 like gcc does. + # This test works for both compilers. + if test "$CC_FOR_BUILD" != no_compiler_found; then + if (echo '#ifdef __amd64'; echo IS_64BIT_ARCH; echo '#endif') | \ + (CCOPTS="" $CC_FOR_BUILD -m64 -E - 2>/dev/null) | \ + grep IS_64BIT_ARCH >/dev/null + then + SUN_ARCH=x86_64 + fi + fi + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'` + GUESS=$SUN_ARCH-pc-solaris2$SUN_REL + ;; sun4*:SunOS:6*:*) # According to config.sub, this is the proper way to canonicalize # SunOS6. 
Hard to guess exactly what SunOS6 will be like, but # it's likely to be more like Solaris than SunOS4. - echo sparc-sun-solaris3`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` - exit 0 ;; + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'` + GUESS=sparc-sun-solaris3$SUN_REL + ;; sun4*:SunOS:*:*) - case "`/usr/bin/arch -k`" in + case `/usr/bin/arch -k` in Series*|S4*) UNAME_RELEASE=`uname -v` ;; esac # Japanese Language versions have a version number like `4.1.3-JL'. - echo sparc-sun-sunos`echo ${UNAME_RELEASE}|sed -e 's/-/_/'` - exit 0 ;; + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/-/_/'` + GUESS=sparc-sun-sunos$SUN_REL + ;; sun3*:SunOS:*:*) - echo m68k-sun-sunos${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-sun-sunos$UNAME_RELEASE + ;; sun*:*:4.2BSD:*) UNAME_RELEASE=`(sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null` - test "x${UNAME_RELEASE}" = "x" && UNAME_RELEASE=3 - case "`/bin/arch`" in + test "x$UNAME_RELEASE" = x && UNAME_RELEASE=3 + case `/bin/arch` in sun3) - echo m68k-sun-sunos${UNAME_RELEASE} + GUESS=m68k-sun-sunos$UNAME_RELEASE ;; sun4) - echo sparc-sun-sunos${UNAME_RELEASE} + GUESS=sparc-sun-sunos$UNAME_RELEASE ;; esac - exit 0 ;; + ;; aushp:SunOS:*:*) - echo sparc-auspex-sunos${UNAME_RELEASE} - exit 0 ;; + GUESS=sparc-auspex-sunos$UNAME_RELEASE + ;; # The situation for MiNT is a little confusing. The machine name # can be virtually everything (everything which is not # "atarist" or "atariste" at least should have a processor @@ -364,41 +490,44 @@ EOF # MiNT. But MiNT is downward compatible to TOS, so this should # be no problem. 
atarist[e]:*MiNT:*:* | atarist[e]:*mint:*:* | atarist[e]:*TOS:*:*) - echo m68k-atari-mint${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-atari-mint$UNAME_RELEASE + ;; atari*:*MiNT:*:* | atari*:*mint:*:* | atarist[e]:*TOS:*:*) - echo m68k-atari-mint${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-atari-mint$UNAME_RELEASE + ;; *falcon*:*MiNT:*:* | *falcon*:*mint:*:* | *falcon*:*TOS:*:*) - echo m68k-atari-mint${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-atari-mint$UNAME_RELEASE + ;; milan*:*MiNT:*:* | milan*:*mint:*:* | *milan*:*TOS:*:*) - echo m68k-milan-mint${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-milan-mint$UNAME_RELEASE + ;; hades*:*MiNT:*:* | hades*:*mint:*:* | *hades*:*TOS:*:*) - echo m68k-hades-mint${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-hades-mint$UNAME_RELEASE + ;; *:*MiNT:*:* | *:*mint:*:* | *:*TOS:*:*) - echo m68k-unknown-mint${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-unknown-mint$UNAME_RELEASE + ;; + m68k:machten:*:*) + GUESS=m68k-apple-machten$UNAME_RELEASE + ;; powerpc:machten:*:*) - echo powerpc-apple-machten${UNAME_RELEASE} - exit 0 ;; + GUESS=powerpc-apple-machten$UNAME_RELEASE + ;; RISC*:Mach:*:*) - echo mips-dec-mach_bsd4.3 - exit 0 ;; + GUESS=mips-dec-mach_bsd4.3 + ;; RISC*:ULTRIX:*:*) - echo mips-dec-ultrix${UNAME_RELEASE} - exit 0 ;; + GUESS=mips-dec-ultrix$UNAME_RELEASE + ;; VAX*:ULTRIX*:*:*) - echo vax-dec-ultrix${UNAME_RELEASE} - exit 0 ;; + GUESS=vax-dec-ultrix$UNAME_RELEASE + ;; 2020:CLIX:*:* | 2430:CLIX:*:*) - echo clipper-intergraph-clix${UNAME_RELEASE} - exit 0 ;; + GUESS=clipper-intergraph-clix$UNAME_RELEASE + ;; mips:*:*:UMIPS | mips:*:*:RISCos) - eval $set_cc_for_build - sed 's/^ //' << EOF >$dummy.c + set_cc_for_build + sed 's/^ //' << EOF > "$dummy.c" #ifdef __cplusplus #include /* for printf() prototype */ int main (int argc, char *argv[]) { @@ -407,89 +536,96 @@ EOF #endif #if defined (host_mips) && defined (MIPSEB) #if defined (SYSTYPE_SYSV) - printf ("mips-mips-riscos%ssysv\n", argv[1]); exit (0); + printf ("mips-mips-riscos%ssysv\\n", 
argv[1]); exit (0); #endif #if defined (SYSTYPE_SVR4) - printf ("mips-mips-riscos%ssvr4\n", argv[1]); exit (0); + printf ("mips-mips-riscos%ssvr4\\n", argv[1]); exit (0); #endif #if defined (SYSTYPE_BSD43) || defined(SYSTYPE_BSD) - printf ("mips-mips-riscos%sbsd\n", argv[1]); exit (0); + printf ("mips-mips-riscos%sbsd\\n", argv[1]); exit (0); #endif #endif exit (-1); } EOF - $CC_FOR_BUILD $dummy.c -o $dummy \ - && ./$dummy `echo "${UNAME_RELEASE}" | sed -n 's/\([0-9]*\).*/\1/p'` \ - && rm -f $dummy.c $dummy && exit 0 - rm -f $dummy.c $dummy - echo mips-mips-riscos${UNAME_RELEASE} - exit 0 ;; + $CC_FOR_BUILD -o "$dummy" "$dummy.c" && + dummyarg=`echo "$UNAME_RELEASE" | sed -n 's/\([0-9]*\).*/\1/p'` && + SYSTEM_NAME=`"$dummy" "$dummyarg"` && + { echo "$SYSTEM_NAME"; exit; } + GUESS=mips-mips-riscos$UNAME_RELEASE + ;; Motorola:PowerMAX_OS:*:*) - echo powerpc-motorola-powermax - exit 0 ;; + GUESS=powerpc-motorola-powermax + ;; + Motorola:*:4.3:PL8-*) + GUESS=powerpc-harris-powermax + ;; + Night_Hawk:*:*:PowerMAX_OS | Synergy:PowerMAX_OS:*:*) + GUESS=powerpc-harris-powermax + ;; Night_Hawk:Power_UNIX:*:*) - echo powerpc-harris-powerunix - exit 0 ;; + GUESS=powerpc-harris-powerunix + ;; m88k:CX/UX:7*:*) - echo m88k-harris-cxux7 - exit 0 ;; + GUESS=m88k-harris-cxux7 + ;; m88k:*:4*:R4*) - echo m88k-motorola-sysv4 - exit 0 ;; + GUESS=m88k-motorola-sysv4 + ;; m88k:*:3*:R3*) - echo m88k-motorola-sysv3 - exit 0 ;; + GUESS=m88k-motorola-sysv3 + ;; AViiON:dgux:*:*) - # DG/UX returns AViiON for all architectures - UNAME_PROCESSOR=`/usr/bin/uname -p` - if [ $UNAME_PROCESSOR = mc88100 ] || [ $UNAME_PROCESSOR = mc88110 ] + # DG/UX returns AViiON for all architectures + UNAME_PROCESSOR=`/usr/bin/uname -p` + if test "$UNAME_PROCESSOR" = mc88100 || test "$UNAME_PROCESSOR" = mc88110 then - if [ ${TARGET_BINARY_INTERFACE}x = m88kdguxelfx ] || \ - [ ${TARGET_BINARY_INTERFACE}x = x ] + if test "$TARGET_BINARY_INTERFACE"x = m88kdguxelfx || \ + test "$TARGET_BINARY_INTERFACE"x = x then - 
echo m88k-dg-dgux${UNAME_RELEASE} + GUESS=m88k-dg-dgux$UNAME_RELEASE else - echo m88k-dg-dguxbcs${UNAME_RELEASE} + GUESS=m88k-dg-dguxbcs$UNAME_RELEASE fi else - echo i586-dg-dgux${UNAME_RELEASE} + GUESS=i586-dg-dgux$UNAME_RELEASE fi - exit 0 ;; + ;; M88*:DolphinOS:*:*) # DolphinOS (SVR3) - echo m88k-dolphin-sysv3 - exit 0 ;; + GUESS=m88k-dolphin-sysv3 + ;; M88*:*:R3*:*) # Delta 88k system running SVR3 - echo m88k-motorola-sysv3 - exit 0 ;; + GUESS=m88k-motorola-sysv3 + ;; XD88*:*:*:*) # Tektronix XD88 system running UTekV (SVR3) - echo m88k-tektronix-sysv3 - exit 0 ;; + GUESS=m88k-tektronix-sysv3 + ;; Tek43[0-9][0-9]:UTek:*:*) # Tektronix 4300 system running UTek (BSD) - echo m68k-tektronix-bsd - exit 0 ;; + GUESS=m68k-tektronix-bsd + ;; *:IRIX*:*:*) - echo mips-sgi-irix`echo ${UNAME_RELEASE}|sed -e 's/-/_/g'` - exit 0 ;; + IRIX_REL=`echo "$UNAME_RELEASE" | sed -e 's/-/_/g'` + GUESS=mips-sgi-irix$IRIX_REL + ;; ????????:AIX?:[12].1:2) # AIX 2.2.1 or AIX 2.1.1 is RT/PC AIX. - echo romp-ibm-aix # uname -m gives an 8 hex-code CPU id - exit 0 ;; # Note that: echo "'`uname -s`'" gives 'AIX ' + GUESS=romp-ibm-aix # uname -m gives an 8 hex-code CPU id + ;; # Note that: echo "'`uname -s`'" gives 'AIX ' i*86:AIX:*:*) - echo i386-ibm-aix - exit 0 ;; + GUESS=i386-ibm-aix + ;; ia64:AIX:*:*) - if [ -x /usr/bin/oslevel ] ; then + if test -x /usr/bin/oslevel ; then IBM_REV=`/usr/bin/oslevel` else - IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} + IBM_REV=$UNAME_VERSION.$UNAME_RELEASE fi - echo ${UNAME_MACHINE}-ibm-aix${IBM_REV} - exit 0 ;; + GUESS=$UNAME_MACHINE-ibm-aix$IBM_REV + ;; *:AIX:2:3) if grep bos325 /usr/include/stdio.h >/dev/null 2>&1; then - eval $set_cc_for_build - sed 's/^ //' << EOF >$dummy.c + set_cc_for_build + sed 's/^ //' << EOF > "$dummy.c" #include main() @@ -500,119 +636,143 @@ EOF exit(0); } EOF - $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0 - rm -f $dummy.c $dummy - echo rs6000-ibm-aix3.2.5 + if $CC_FOR_BUILD -o "$dummy" 
"$dummy.c" && SYSTEM_NAME=`"$dummy"` + then + GUESS=$SYSTEM_NAME + else + GUESS=rs6000-ibm-aix3.2.5 + fi elif grep bos324 /usr/include/stdio.h >/dev/null 2>&1; then - echo rs6000-ibm-aix3.2.4 + GUESS=rs6000-ibm-aix3.2.4 else - echo rs6000-ibm-aix3.2 + GUESS=rs6000-ibm-aix3.2 fi - exit 0 ;; - *:AIX:*:[45]) + ;; + *:AIX:*:[4567]) IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | sed 1q | awk '{ print $1 }'` - if /usr/sbin/lsattr -El ${IBM_CPU_ID} | grep ' POWER' >/dev/null 2>&1; then + if /usr/sbin/lsattr -El "$IBM_CPU_ID" | grep ' POWER' >/dev/null 2>&1; then IBM_ARCH=rs6000 else IBM_ARCH=powerpc fi - if [ -x /usr/bin/oslevel ] ; then - IBM_REV=`/usr/bin/oslevel` + if test -x /usr/bin/lslpp ; then + IBM_REV=`/usr/bin/lslpp -Lqc bos.rte.libc | \ + awk -F: '{ print $3 }' | sed s/[0-9]*$/0/` else - IBM_REV=${UNAME_VERSION}.${UNAME_RELEASE} + IBM_REV=$UNAME_VERSION.$UNAME_RELEASE fi - echo ${IBM_ARCH}-ibm-aix${IBM_REV} - exit 0 ;; + GUESS=$IBM_ARCH-ibm-aix$IBM_REV + ;; *:AIX:*:*) - echo rs6000-ibm-aix - exit 0 ;; - ibmrt:4.4BSD:*|romp-ibm:BSD:*) - echo romp-ibm-bsd4.4 - exit 0 ;; + GUESS=rs6000-ibm-aix + ;; + ibmrt:4.4BSD:*|romp-ibm:4.4BSD:*) + GUESS=romp-ibm-bsd4.4 + ;; ibmrt:*BSD:*|romp-ibm:BSD:*) # covers RT/PC BSD and - echo romp-ibm-bsd${UNAME_RELEASE} # 4.3 with uname added to - exit 0 ;; # report: romp-ibm BSD 4.3 + GUESS=romp-ibm-bsd$UNAME_RELEASE # 4.3 with uname added to + ;; # report: romp-ibm BSD 4.3 *:BOSX:*:*) - echo rs6000-bull-bosx - exit 0 ;; + GUESS=rs6000-bull-bosx + ;; DPX/2?00:B.O.S.:*:*) - echo m68k-bull-sysv3 - exit 0 ;; + GUESS=m68k-bull-sysv3 + ;; 9000/[34]??:4.3bsd:1.*:*) - echo m68k-hp-bsd - exit 0 ;; + GUESS=m68k-hp-bsd + ;; hp300:4.4BSD:*:* | 9000/[34]??:4.3bsd:2.*:*) - echo m68k-hp-bsd4.4 - exit 0 ;; + GUESS=m68k-hp-bsd4.4 + ;; 9000/[34678]??:HP-UX:*:*) - HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'` - case "${UNAME_MACHINE}" in - 9000/31? ) HP_ARCH=m68000 ;; - 9000/[34]?? 
) HP_ARCH=m68k ;; + HPUX_REV=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*.[0B]*//'` + case $UNAME_MACHINE in + 9000/31?) HP_ARCH=m68000 ;; + 9000/[34]??) HP_ARCH=m68k ;; 9000/[678][0-9][0-9]) - if [ -x /usr/bin/getconf ]; then + if test -x /usr/bin/getconf; then sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null` - sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` - case "${sc_cpu_version}" in - 523) HP_ARCH="hppa1.0" ;; # CPU_PA_RISC1_0 - 528) HP_ARCH="hppa1.1" ;; # CPU_PA_RISC1_1 - 532) # CPU_PA_RISC2_0 - case "${sc_kernel_bits}" in - 32) HP_ARCH="hppa2.0n" ;; - 64) HP_ARCH="hppa2.0w" ;; - '') HP_ARCH="hppa2.0" ;; # HP-UX 10.20 - esac ;; - esac + sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` + case $sc_cpu_version in + 523) HP_ARCH=hppa1.0 ;; # CPU_PA_RISC1_0 + 528) HP_ARCH=hppa1.1 ;; # CPU_PA_RISC1_1 + 532) # CPU_PA_RISC2_0 + case $sc_kernel_bits in + 32) HP_ARCH=hppa2.0n ;; + 64) HP_ARCH=hppa2.0w ;; + '') HP_ARCH=hppa2.0 ;; # HP-UX 10.20 + esac ;; + esac fi - if [ "${HP_ARCH}" = "" ]; then - eval $set_cc_for_build - sed 's/^ //' << EOF >$dummy.c - - #define _HPUX_SOURCE - #include - #include - - int main () - { - #if defined(_SC_KERNEL_BITS) - long bits = sysconf(_SC_KERNEL_BITS); - #endif - long cpu = sysconf (_SC_CPU_VERSION); - - switch (cpu) - { - case CPU_PA_RISC1_0: puts ("hppa1.0"); break; - case CPU_PA_RISC1_1: puts ("hppa1.1"); break; - case CPU_PA_RISC2_0: - #if defined(_SC_KERNEL_BITS) - switch (bits) - { - case 64: puts ("hppa2.0w"); break; - case 32: puts ("hppa2.0n"); break; - default: puts ("hppa2.0"); break; - } break; - #else /* !defined(_SC_KERNEL_BITS) */ - puts ("hppa2.0"); break; - #endif - default: puts ("hppa1.0"); break; - } - exit (0); - } + if test "$HP_ARCH" = ""; then + set_cc_for_build + sed 's/^ //' << EOF > "$dummy.c" + + #define _HPUX_SOURCE + #include + #include + + int main () + { + #if defined(_SC_KERNEL_BITS) + long bits = sysconf(_SC_KERNEL_BITS); + #endif + long cpu = sysconf 
(_SC_CPU_VERSION); + + switch (cpu) + { + case CPU_PA_RISC1_0: puts ("hppa1.0"); break; + case CPU_PA_RISC1_1: puts ("hppa1.1"); break; + case CPU_PA_RISC2_0: + #if defined(_SC_KERNEL_BITS) + switch (bits) + { + case 64: puts ("hppa2.0w"); break; + case 32: puts ("hppa2.0n"); break; + default: puts ("hppa2.0"); break; + } break; + #else /* !defined(_SC_KERNEL_BITS) */ + puts ("hppa2.0"); break; + #endif + default: puts ("hppa1.0"); break; + } + exit (0); + } EOF - (CCOPTS= $CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null) && HP_ARCH=`./$dummy` - if test -z "$HP_ARCH"; then HP_ARCH=hppa; fi - rm -f $dummy.c $dummy + (CCOPTS="" $CC_FOR_BUILD -o "$dummy" "$dummy.c" 2>/dev/null) && HP_ARCH=`"$dummy"` + test -z "$HP_ARCH" && HP_ARCH=hppa fi ;; esac - echo ${HP_ARCH}-hp-hpux${HPUX_REV} - exit 0 ;; + if test "$HP_ARCH" = hppa2.0w + then + set_cc_for_build + + # hppa2.0w-hp-hpux* has a 64-bit kernel and a compiler generating + # 32-bit code. hppa64-hp-hpux* has the same kernel and a compiler + # generating 64-bit code. 
GNU and HP use different nomenclature: + # + # $ CC_FOR_BUILD=cc ./config.guess + # => hppa2.0w-hp-hpux11.23 + # $ CC_FOR_BUILD="cc +DA2.0w" ./config.guess + # => hppa64-hp-hpux11.23 + + if echo __LP64__ | (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | + grep -q __LP64__ + then + HP_ARCH=hppa2.0w + else + HP_ARCH=hppa64 + fi + fi + GUESS=$HP_ARCH-hp-hpux$HPUX_REV + ;; ia64:HP-UX:*:*) - HPUX_REV=`echo ${UNAME_RELEASE}|sed -e 's/[^.]*.[0B]*//'` - echo ia64-hp-hpux${HPUX_REV} - exit 0 ;; + HPUX_REV=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*.[0B]*//'` + GUESS=ia64-hp-hpux$HPUX_REV + ;; 3050*:HI-UX:*:*) - eval $set_cc_for_build - sed 's/^ //' << EOF >$dummy.c + set_cc_for_build + sed 's/^ //' << EOF > "$dummy.c" #include int main () @@ -637,500 +797,791 @@ EOF exit (0); } EOF - $CC_FOR_BUILD $dummy.c -o $dummy && ./$dummy && rm -f $dummy.c $dummy && exit 0 - rm -f $dummy.c $dummy - echo unknown-hitachi-hiuxwe2 - exit 0 ;; - 9000/7??:4.3bsd:*:* | 9000/8?[79]:4.3bsd:*:* ) - echo hppa1.1-hp-bsd - exit 0 ;; + $CC_FOR_BUILD -o "$dummy" "$dummy.c" && SYSTEM_NAME=`"$dummy"` && + { echo "$SYSTEM_NAME"; exit; } + GUESS=unknown-hitachi-hiuxwe2 + ;; + 9000/7??:4.3bsd:*:* | 9000/8?[79]:4.3bsd:*:*) + GUESS=hppa1.1-hp-bsd + ;; 9000/8??:4.3bsd:*:*) - echo hppa1.0-hp-bsd - exit 0 ;; + GUESS=hppa1.0-hp-bsd + ;; *9??*:MPE/iX:*:* | *3000*:MPE/iX:*:*) - echo hppa1.0-hp-mpeix - exit 0 ;; - hp7??:OSF1:*:* | hp8?[79]:OSF1:*:* ) - echo hppa1.1-hp-osf - exit 0 ;; + GUESS=hppa1.0-hp-mpeix + ;; + hp7??:OSF1:*:* | hp8?[79]:OSF1:*:*) + GUESS=hppa1.1-hp-osf + ;; hp8??:OSF1:*:*) - echo hppa1.0-hp-osf - exit 0 ;; + GUESS=hppa1.0-hp-osf + ;; i*86:OSF1:*:*) - if [ -x /usr/sbin/sysversion ] ; then - echo ${UNAME_MACHINE}-unknown-osf1mk + if test -x /usr/sbin/sysversion ; then + GUESS=$UNAME_MACHINE-unknown-osf1mk else - echo ${UNAME_MACHINE}-unknown-osf1 + GUESS=$UNAME_MACHINE-unknown-osf1 fi - exit 0 ;; + ;; parisc*:Lites*:*:*) - echo hppa1.1-hp-lites - exit 0 ;; + GUESS=hppa1.1-hp-lites + ;; 
C1*:ConvexOS:*:* | convex:ConvexOS:C1*:*) - echo c1-convex-bsd - exit 0 ;; + GUESS=c1-convex-bsd + ;; C2*:ConvexOS:*:* | convex:ConvexOS:C2*:*) if getsysinfo -f scalar_acc then echo c32-convex-bsd else echo c2-convex-bsd fi - exit 0 ;; + exit ;; C34*:ConvexOS:*:* | convex:ConvexOS:C34*:*) - echo c34-convex-bsd - exit 0 ;; + GUESS=c34-convex-bsd + ;; C38*:ConvexOS:*:* | convex:ConvexOS:C38*:*) - echo c38-convex-bsd - exit 0 ;; + GUESS=c38-convex-bsd + ;; C4*:ConvexOS:*:* | convex:ConvexOS:C4*:*) - echo c4-convex-bsd - exit 0 ;; + GUESS=c4-convex-bsd + ;; CRAY*Y-MP:*:*:*) - echo ymp-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' - exit 0 ;; + CRAY_REL=`echo "$UNAME_RELEASE" | sed -e 's/\.[^.]*$/.X/'` + GUESS=ymp-cray-unicos$CRAY_REL + ;; CRAY*[A-Z]90:*:*:*) - echo ${UNAME_MACHINE}-cray-unicos${UNAME_RELEASE} \ + echo "$UNAME_MACHINE"-cray-unicos"$UNAME_RELEASE" \ | sed -e 's/CRAY.*\([A-Z]90\)/\1/' \ -e y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ \ -e 's/\.[^.]*$/.X/' - exit 0 ;; + exit ;; CRAY*TS:*:*:*) - echo t90-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' - exit 0 ;; - CRAY*T3D:*:*:*) - echo alpha-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' - exit 0 ;; + CRAY_REL=`echo "$UNAME_RELEASE" | sed -e 's/\.[^.]*$/.X/'` + GUESS=t90-cray-unicos$CRAY_REL + ;; CRAY*T3E:*:*:*) - echo alphaev5-cray-unicosmk${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' - exit 0 ;; + CRAY_REL=`echo "$UNAME_RELEASE" | sed -e 's/\.[^.]*$/.X/'` + GUESS=alphaev5-cray-unicosmk$CRAY_REL + ;; CRAY*SV1:*:*:*) - echo sv1-cray-unicos${UNAME_RELEASE} | sed -e 's/\.[^.]*$/.X/' - exit 0 ;; + CRAY_REL=`echo "$UNAME_RELEASE" | sed -e 's/\.[^.]*$/.X/'` + GUESS=sv1-cray-unicos$CRAY_REL + ;; + *:UNICOS/mp:*:*) + CRAY_REL=`echo "$UNAME_RELEASE" | sed -e 's/\.[^.]*$/.X/'` + GUESS=craynv-cray-unicosmp$CRAY_REL + ;; F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*) - FUJITSU_PROC=`uname -m | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz'` - FUJITSU_SYS=`uname -p | 
tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' | sed -e 's/\///'` - FUJITSU_REL=`echo ${UNAME_RELEASE} | sed -e 's/ /_/'` - echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" - exit 0 ;; + FUJITSU_PROC=`uname -m | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` + FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` + FUJITSU_REL=`echo "$UNAME_RELEASE" | sed -e 's/ /_/'` + GUESS=${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL} + ;; + 5000:UNIX_System_V:4.*:*) + FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` + FUJITSU_REL=`echo "$UNAME_RELEASE" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/ /_/'` + GUESS=sparc-fujitsu-${FUJITSU_SYS}${FUJITSU_REL} + ;; i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*) - echo ${UNAME_MACHINE}-pc-bsdi${UNAME_RELEASE} - exit 0 ;; + GUESS=$UNAME_MACHINE-pc-bsdi$UNAME_RELEASE + ;; sparc*:BSD/OS:*:*) - echo sparc-unknown-bsdi${UNAME_RELEASE} - exit 0 ;; + GUESS=sparc-unknown-bsdi$UNAME_RELEASE + ;; *:BSD/OS:*:*) - echo ${UNAME_MACHINE}-unknown-bsdi${UNAME_RELEASE} - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-bsdi$UNAME_RELEASE + ;; + arm:FreeBSD:*:*) + UNAME_PROCESSOR=`uname -p` + set_cc_for_build + if echo __ARM_PCS_VFP | $CC_FOR_BUILD -E - 2>/dev/null \ + | grep -q __ARM_PCS_VFP + then + FREEBSD_REL=`echo "$UNAME_RELEASE" | sed -e 's/[-(].*//'` + GUESS=$UNAME_PROCESSOR-unknown-freebsd$FREEBSD_REL-gnueabi + else + FREEBSD_REL=`echo "$UNAME_RELEASE" | sed -e 's/[-(].*//'` + GUESS=$UNAME_PROCESSOR-unknown-freebsd$FREEBSD_REL-gnueabihf + fi + ;; *:FreeBSD:*:*) - echo ${UNAME_MACHINE}-unknown-freebsd`echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'` - exit 0 ;; + UNAME_PROCESSOR=`/usr/bin/uname -p` + case $UNAME_PROCESSOR in + amd64) + UNAME_PROCESSOR=x86_64 ;; + i386) + UNAME_PROCESSOR=i586 ;; + esac + FREEBSD_REL=`echo "$UNAME_RELEASE" | sed -e 's/[-(].*//'` + 
GUESS=$UNAME_PROCESSOR-unknown-freebsd$FREEBSD_REL + ;; i*:CYGWIN*:*) - echo ${UNAME_MACHINE}-pc-cygwin - exit 0 ;; - i*:MINGW*:*) - echo ${UNAME_MACHINE}-pc-mingw32 - exit 0 ;; + GUESS=$UNAME_MACHINE-pc-cygwin + ;; + *:MINGW64*:*) + GUESS=$UNAME_MACHINE-pc-mingw64 + ;; + *:MINGW*:*) + GUESS=$UNAME_MACHINE-pc-mingw32 + ;; + *:MSYS*:*) + GUESS=$UNAME_MACHINE-pc-msys + ;; i*:PW*:*) - echo ${UNAME_MACHINE}-pc-pw32 - exit 0 ;; - x86:Interix*:3*) - echo i386-pc-interix3 - exit 0 ;; - i*:Windows_NT*:* | Pentium*:Windows_NT*:*) - # How do we know it's Interix rather than the generic POSIX subsystem? - # It also conflicts with pre-2.0 versions of AT&T UWIN. Should we - # UNAME_MACHINE based on the output of uname instead of i386? - echo i386-pc-interix - exit 0 ;; + GUESS=$UNAME_MACHINE-pc-pw32 + ;; + *:SerenityOS:*:*) + GUESS=$UNAME_MACHINE-pc-serenity + ;; + *:Interix*:*) + case $UNAME_MACHINE in + x86) + GUESS=i586-pc-interix$UNAME_RELEASE + ;; + authenticamd | genuineintel | EM64T) + GUESS=x86_64-unknown-interix$UNAME_RELEASE + ;; + IA64) + GUESS=ia64-unknown-interix$UNAME_RELEASE + ;; + esac ;; i*:UWIN*:*) - echo ${UNAME_MACHINE}-pc-uwin - exit 0 ;; - p*:CYGWIN*:*) - echo powerpcle-unknown-cygwin - exit 0 ;; + GUESS=$UNAME_MACHINE-pc-uwin + ;; + amd64:CYGWIN*:*:* | x86_64:CYGWIN*:*:*) + GUESS=x86_64-pc-cygwin + ;; prep*:SunOS:5.*:*) - echo powerpcle-unknown-solaris2`echo ${UNAME_RELEASE}|sed -e 's/[^.]*//'` - exit 0 ;; + SUN_REL=`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'` + GUESS=powerpcle-unknown-solaris2$SUN_REL + ;; *:GNU:*:*) - echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` - exit 0 ;; - i*86:Minix:*:*) - echo ${UNAME_MACHINE}-pc-minix - exit 0 ;; + # the GNU system + GNU_ARCH=`echo "$UNAME_MACHINE" | sed -e 's,[-/].*$,,'` + GNU_REL=`echo "$UNAME_RELEASE" | sed -e 's,/.*$,,'` + GUESS=$GNU_ARCH-unknown-$LIBC$GNU_REL + ;; + *:GNU/*:*:*) + # other systems with GNU libc and userland + GNU_SYS=`echo 
"$UNAME_SYSTEM" | sed 's,^[^/]*/,,' | tr "[:upper:]" "[:lower:]"` + GNU_REL=`echo "$UNAME_RELEASE" | sed -e 's/[-(].*//'` + GUESS=$UNAME_MACHINE-unknown-$GNU_SYS$GNU_REL-$LIBC + ;; + x86_64:[Mm]anagarm:*:*|i?86:[Mm]anagarm:*:*) + GUESS="$UNAME_MACHINE-pc-managarm-mlibc" + ;; + *:[Mm]anagarm:*:*) + GUESS="$UNAME_MACHINE-unknown-managarm-mlibc" + ;; + *:Minix:*:*) + GUESS=$UNAME_MACHINE-unknown-minix + ;; + aarch64:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + aarch64_be:Linux:*:*) + UNAME_MACHINE=aarch64_be + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + alpha:Linux:*:*) + case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' /proc/cpuinfo 2>/dev/null` in + EV5) UNAME_MACHINE=alphaev5 ;; + EV56) UNAME_MACHINE=alphaev56 ;; + PCA56) UNAME_MACHINE=alphapca56 ;; + PCA57) UNAME_MACHINE=alphapca56 ;; + EV6) UNAME_MACHINE=alphaev6 ;; + EV67) UNAME_MACHINE=alphaev67 ;; + EV68*) UNAME_MACHINE=alphaev68 ;; + esac + objdump --private-headers /bin/sh | grep -q ld.so.1 + if test "$?" = 0 ; then LIBC=gnulibc1 ; fi + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + arc:Linux:*:* | arceb:Linux:*:* | arc32:Linux:*:* | arc64:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; arm*:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu - exit 0 ;; + set_cc_for_build + if echo __ARM_EABI__ | $CC_FOR_BUILD -E - 2>/dev/null \ + | grep -q __ARM_EABI__ + then + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + else + if echo __ARM_PCS_VFP | $CC_FOR_BUILD -E - 2>/dev/null \ + | grep -q __ARM_PCS_VFP + then + GUESS=$UNAME_MACHINE-unknown-linux-${LIBC}eabi + else + GUESS=$UNAME_MACHINE-unknown-linux-${LIBC}eabihf + fi + fi + ;; + avr32*:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + cris:Linux:*:*) + GUESS=$UNAME_MACHINE-axis-linux-$LIBC + ;; + crisv32:Linux:*:*) + GUESS=$UNAME_MACHINE-axis-linux-$LIBC + ;; + e2k:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + frv:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + hexagon:Linux:*:*) + 
GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + i*86:Linux:*:*) + GUESS=$UNAME_MACHINE-pc-linux-$LIBC + ;; ia64:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + k1om:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + loongarch32:Linux:*:* | loongarch64:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + m32r*:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; m68*:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu - exit 0 ;; - mips:Linux:*:*) - eval $set_cc_for_build - sed 's/^ //' << EOF >$dummy.c + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + mips:Linux:*:* | mips64:Linux:*:*) + set_cc_for_build + IS_GLIBC=0 + test x"${LIBC}" = xgnu && IS_GLIBC=1 + sed 's/^ //' << EOF > "$dummy.c" #undef CPU #undef mips #undef mipsel + #undef mips64 + #undef mips64el + #if ${IS_GLIBC} && defined(_ABI64) + LIBCABI=gnuabi64 + #else + #if ${IS_GLIBC} && defined(_ABIN32) + LIBCABI=gnuabin32 + #else + LIBCABI=${LIBC} + #endif + #endif + + #if ${IS_GLIBC} && defined(__mips64) && defined(__mips_isa_rev) && __mips_isa_rev>=6 + CPU=mipsisa64r6 + #else + #if ${IS_GLIBC} && !defined(__mips64) && defined(__mips_isa_rev) && __mips_isa_rev>=6 + CPU=mipsisa32r6 + #else + #if defined(__mips64) + CPU=mips64 + #else + CPU=mips + #endif + #endif + #endif + #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || defined(MIPSEL) - CPU=mipsel + MIPS_ENDIAN=el #else #if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || defined(MIPSEB) - CPU=mips + MIPS_ENDIAN= #else - CPU= + MIPS_ENDIAN= #endif #endif EOF - eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^CPU=` - rm -f $dummy.c - test x"${CPU}" != x && echo "${CPU}-pc-linux-gnu" && exit 0 + cc_set_vars=`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^CPU\|^MIPS_ENDIAN\|^LIBCABI'` + eval "$cc_set_vars" + test "x$CPU" != x && { echo "$CPU${MIPS_ENDIAN}-unknown-linux-$LIBCABI"; exit; } + ;; + mips64el:Linux:*:*) + 
GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + openrisc*:Linux:*:*) + GUESS=or1k-unknown-linux-$LIBC + ;; + or32:Linux:*:* | or1k*:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + padre:Linux:*:*) + GUESS=sparc-unknown-linux-$LIBC + ;; + parisc64:Linux:*:* | hppa64:Linux:*:*) + GUESS=hppa64-unknown-linux-$LIBC ;; - ppc:Linux:*:*) - echo powerpc-unknown-linux-gnu - exit 0 ;; - ppc64:Linux:*:*) - echo powerpc64-unknown-linux-gnu - exit 0 ;; - alpha:Linux:*:*) - case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' < /proc/cpuinfo` in - EV5) UNAME_MACHINE=alphaev5 ;; - EV56) UNAME_MACHINE=alphaev56 ;; - PCA56) UNAME_MACHINE=alphapca56 ;; - PCA57) UNAME_MACHINE=alphapca56 ;; - EV6) UNAME_MACHINE=alphaev6 ;; - EV67) UNAME_MACHINE=alphaev67 ;; - EV68*) UNAME_MACHINE=alphaev68 ;; - esac - objdump --private-headers /bin/sh | grep ld.so.1 >/dev/null - if test "$?" = 0 ; then LIBC="libc1" ; else LIBC="" ; fi - echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC} - exit 0 ;; parisc:Linux:*:* | hppa:Linux:*:*) # Look for CPU level case `grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2` in - PA7*) echo hppa1.1-unknown-linux-gnu ;; - PA8*) echo hppa2.0-unknown-linux-gnu ;; - *) echo hppa-unknown-linux-gnu ;; + PA7*) GUESS=hppa1.1-unknown-linux-$LIBC ;; + PA8*) GUESS=hppa2.0-unknown-linux-$LIBC ;; + *) GUESS=hppa-unknown-linux-$LIBC ;; esac - exit 0 ;; - parisc64:Linux:*:* | hppa64:Linux:*:*) - echo hppa64-unknown-linux-gnu - exit 0 ;; + ;; + ppc64:Linux:*:*) + GUESS=powerpc64-unknown-linux-$LIBC + ;; + ppc:Linux:*:*) + GUESS=powerpc-unknown-linux-$LIBC + ;; + ppc64le:Linux:*:*) + GUESS=powerpc64le-unknown-linux-$LIBC + ;; + ppcle:Linux:*:*) + GUESS=powerpcle-unknown-linux-$LIBC + ;; + riscv32:Linux:*:* | riscv32be:Linux:*:* | riscv64:Linux:*:* | riscv64be:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; s390:Linux:*:* | s390x:Linux:*:*) - echo ${UNAME_MACHINE}-ibm-linux - exit 0 ;; + GUESS=$UNAME_MACHINE-ibm-linux-$LIBC + ;; + sh64*:Linux:*:*) + 
GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; sh*:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; sparc:Linux:*:* | sparc64:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + tile*:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC + ;; + vax:Linux:*:*) + GUESS=$UNAME_MACHINE-dec-linux-$LIBC + ;; x86_64:Linux:*:*) - echo x86_64-unknown-linux-gnu - exit 0 ;; - i*86:Linux:*:*) - # The BFD linker knows what the default object file format is, so - # first see if it will tell us. cd to the root directory to prevent - # problems with other programs or directories called `ld' in the path. - # Set LC_ALL=C to ensure ld outputs messages in English. - ld_supported_targets=`cd /; LC_ALL=C ld --help 2>&1 \ - | sed -ne '/supported targets:/!d - s/[ ][ ]*/ /g - s/.*supported targets: *// - s/ .*// - p'` - case "$ld_supported_targets" in - elf32-i386) - TENTATIVE="${UNAME_MACHINE}-pc-linux-gnu" - ;; - a.out-i386-linux) - echo "${UNAME_MACHINE}-pc-linux-gnuaout" - exit 0 ;; - coff-i386) - echo "${UNAME_MACHINE}-pc-linux-gnucoff" - exit 0 ;; - "") - # Either a pre-BFD a.out linker (linux-gnuoldld) or - # one that does not give us useful --help. 
- echo "${UNAME_MACHINE}-pc-linux-gnuoldld" - exit 0 ;; - esac - # Determine whether the default compiler is a.out or elf - eval $set_cc_for_build - sed 's/^ //' << EOF >$dummy.c - #include - #ifdef __ELF__ - # ifdef __GLIBC__ - # if __GLIBC__ >= 2 - LIBC=gnu - # else - LIBC=gnulibc1 - # endif - # else - LIBC=gnulibc1 - # endif - #else - #ifdef __INTEL_COMPILER - LIBC=gnu - #else - LIBC=gnuaout - #endif - #endif + set_cc_for_build + CPU=$UNAME_MACHINE + LIBCABI=$LIBC + if test "$CC_FOR_BUILD" != no_compiler_found; then + ABI=64 + sed 's/^ //' << EOF > "$dummy.c" + #ifdef __i386__ + ABI=x86 + #else + #ifdef __ILP32__ + ABI=x32 + #endif + #endif EOF - eval `$CC_FOR_BUILD -E $dummy.c 2>/dev/null | grep ^LIBC=` - rm -f $dummy.c - test x"${LIBC}" != x && echo "${UNAME_MACHINE}-pc-linux-${LIBC}" && exit 0 - test x"${TENTATIVE}" != x && echo "${TENTATIVE}" && exit 0 + cc_set_abi=`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^ABI' | sed 's, ,,g'` + eval "$cc_set_abi" + case $ABI in + x86) CPU=i686 ;; + x32) LIBCABI=${LIBC}x32 ;; + esac + fi + GUESS=$CPU-pc-linux-$LIBCABI + ;; + xtensa*:Linux:*:*) + GUESS=$UNAME_MACHINE-unknown-linux-$LIBC ;; i*86:DYNIX/ptx:4*:*) # ptx 4.0 does uname -s correctly, with DYNIX/ptx in there. # earlier versions are messed up and put the nodename in both # sysname and nodename. - echo i386-sequent-sysv4 - exit 0 ;; + GUESS=i386-sequent-sysv4 + ;; i*86:UNIX_SV:4.2MP:2.*) - # Unixware is an offshoot of SVR4, but it has its own version - # number series starting with 2... - # I am not positive that other SVR4 systems won't match this, + # Unixware is an offshoot of SVR4, but it has its own version + # number series starting with 2... + # I am not positive that other SVR4 systems won't match this, # I just have to hope. -- rms. - # Use sysv4.2uw... so that sysv4* matches it. 
- echo ${UNAME_MACHINE}-pc-sysv4.2uw${UNAME_VERSION} - exit 0 ;; - i*86:*:4.*:* | i*86:SYSTEM_V:4.*:*) - UNAME_REL=`echo ${UNAME_RELEASE} | sed 's/\/MP$//'` + # Use sysv4.2uw... so that sysv4* matches it. + GUESS=$UNAME_MACHINE-pc-sysv4.2uw$UNAME_VERSION + ;; + i*86:OS/2:*:*) + # If we were able to find `uname', then EMX Unix compatibility + # is probably installed. + GUESS=$UNAME_MACHINE-pc-os2-emx + ;; + i*86:XTS-300:*:STOP) + GUESS=$UNAME_MACHINE-unknown-stop + ;; + i*86:atheos:*:*) + GUESS=$UNAME_MACHINE-unknown-atheos + ;; + i*86:syllable:*:*) + GUESS=$UNAME_MACHINE-pc-syllable + ;; + i*86:LynxOS:2.*:* | i*86:LynxOS:3.[01]*:* | i*86:LynxOS:4.[02]*:*) + GUESS=i386-unknown-lynxos$UNAME_RELEASE + ;; + i*86:*DOS:*:*) + GUESS=$UNAME_MACHINE-pc-msdosdjgpp + ;; + i*86:*:4.*:*) + UNAME_REL=`echo "$UNAME_RELEASE" | sed 's/\/MP$//'` if grep Novell /usr/include/link.h >/dev/null 2>/dev/null; then - echo ${UNAME_MACHINE}-univel-sysv${UNAME_REL} + GUESS=$UNAME_MACHINE-univel-sysv$UNAME_REL else - echo ${UNAME_MACHINE}-pc-sysv${UNAME_REL} + GUESS=$UNAME_MACHINE-pc-sysv$UNAME_REL fi - exit 0 ;; - i*86:*:5:[78]*) + ;; + i*86:*:5:[678]*) + # UnixWare 7.x, OpenUNIX and OpenServer 6. 
case `/bin/uname -X | grep "^Machine"` in *486*) UNAME_MACHINE=i486 ;; *Pentium) UNAME_MACHINE=i586 ;; *Pent*|*Celeron) UNAME_MACHINE=i686 ;; esac - echo ${UNAME_MACHINE}-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION} - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION} + ;; i*86:*:3.2:*) if test -f /usr/options/cb.name; then UNAME_REL=`sed -n 's/.*Version //p' /dev/null >/dev/null ; then - UNAME_REL=`(/bin/uname -X|egrep Release|sed -e 's/.*= //')` - (/bin/uname -X|egrep i80486 >/dev/null) && UNAME_MACHINE=i486 - (/bin/uname -X|egrep '^Machine.*Pentium' >/dev/null) \ + UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')` + (/bin/uname -X|grep i80486 >/dev/null) && UNAME_MACHINE=i486 + (/bin/uname -X|grep '^Machine.*Pentium' >/dev/null) \ && UNAME_MACHINE=i586 - (/bin/uname -X|egrep '^Machine.*Pent ?II' >/dev/null) \ + (/bin/uname -X|grep '^Machine.*Pent *II' >/dev/null) \ && UNAME_MACHINE=i686 - (/bin/uname -X|egrep '^Machine.*Pentium Pro' >/dev/null) \ + (/bin/uname -X|grep '^Machine.*Pentium Pro' >/dev/null) \ && UNAME_MACHINE=i686 - echo ${UNAME_MACHINE}-pc-sco$UNAME_REL + GUESS=$UNAME_MACHINE-pc-sco$UNAME_REL else - echo ${UNAME_MACHINE}-pc-sysv32 + GUESS=$UNAME_MACHINE-pc-sysv32 fi - exit 0 ;; - i*86:*DOS:*:*) - echo ${UNAME_MACHINE}-pc-msdosdjgpp - exit 0 ;; + ;; pc:*:*:*) # Left here for compatibility: - # uname -m prints for DJGPP always 'pc', but it prints nothing about - # the processor, so we play safe by assuming i386. - echo i386-pc-msdosdjgpp - exit 0 ;; + # uname -m prints for DJGPP always 'pc', but it prints nothing about + # the processor, so we play safe by assuming i586. + # Note: whatever this is, it MUST be the same as what config.sub + # prints for the "djgpp" host, or else GDB configure will decide that + # this is a cross-build. 
+ GUESS=i586-pc-msdosdjgpp + ;; Intel:Mach:3*:*) - echo i386-pc-mach3 - exit 0 ;; + GUESS=i386-pc-mach3 + ;; paragon:*:*:*) - echo i860-intel-osf1 - exit 0 ;; + GUESS=i860-intel-osf1 + ;; i860:*:4.*:*) # i860-SVR4 if grep Stardent /usr/include/sys/uadmin.h >/dev/null 2>&1 ; then - echo i860-stardent-sysv${UNAME_RELEASE} # Stardent Vistra i860-SVR4 + GUESS=i860-stardent-sysv$UNAME_RELEASE # Stardent Vistra i860-SVR4 else # Add other i860-SVR4 vendors below as they are discovered. - echo i860-unknown-sysv${UNAME_RELEASE} # Unknown i860-SVR4 + GUESS=i860-unknown-sysv$UNAME_RELEASE # Unknown i860-SVR4 fi - exit 0 ;; + ;; mini*:CTIX:SYS*5:*) # "miniframe" - echo m68010-convergent-sysv - exit 0 ;; - M68*:*:R3V[567]*:*) - test -r /sysV68 && echo 'm68k-motorola-sysv' && exit 0 ;; - 3[34]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 3[34]??/*:*:4.0:3.0 | 4850:*:4.0:3.0 | SKA40:*:4.0:3.0) + GUESS=m68010-convergent-sysv + ;; + mc68k:UNIX:SYSTEM5:3.51m) + GUESS=m68k-convergent-sysv + ;; + M680?0:D-NIX:5.3:*) + GUESS=m68k-diab-dnix + ;; + M68*:*:R3V[5678]*:*) + test -r /sysV68 && { echo 'm68k-motorola-sysv'; exit; } ;; + 3[345]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 3[34]??/*:*:4.0:3.0 | 4400:*:4.0:3.0 | 4850:*:4.0:3.0 | SKA40:*:4.0:3.0 | SDS2:*:4.0:3.0 | SHG2:*:4.0:3.0 | S7501*:*:4.0:3.0) OS_REL='' test -r /etc/.relid \ && OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid` /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ - && echo i486-ncr-sysv4.3${OS_REL} && exit 0 + && { echo i486-ncr-sysv4.3"$OS_REL"; exit; } /bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \ - && echo i586-ncr-sysv4.3${OS_REL} && exit 0 ;; + && { echo i586-ncr-sysv4.3"$OS_REL"; exit; } ;; 3[34]??:*:4.0:* | 3[34]??,*:*:4.0:*) - /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ - && echo i486-ncr-sysv4 && exit 0 ;; + /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ + && { echo i486-ncr-sysv4; exit; } ;; + NCR*:*:4.2:* | MPRAS*:*:4.2:*) + OS_REL='.3' + 
test -r /etc/.relid \ + && OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid` + /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ + && { echo i486-ncr-sysv4.3"$OS_REL"; exit; } + /bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \ + && { echo i586-ncr-sysv4.3"$OS_REL"; exit; } + /bin/uname -p 2>/dev/null | /bin/grep pteron >/dev/null \ + && { echo i586-ncr-sysv4.3"$OS_REL"; exit; } ;; m68*:LynxOS:2.*:* | m68*:LynxOS:3.0*:*) - echo m68k-unknown-lynxos${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-unknown-lynxos$UNAME_RELEASE + ;; mc68030:UNIX_System_V:4.*:*) - echo m68k-atari-sysv4 - exit 0 ;; - i*86:LynxOS:2.*:* | i*86:LynxOS:3.[01]*:* | i*86:LynxOS:4.0*:*) - echo i386-unknown-lynxos${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-atari-sysv4 + ;; TSUNAMI:LynxOS:2.*:*) - echo sparc-unknown-lynxos${UNAME_RELEASE} - exit 0 ;; + GUESS=sparc-unknown-lynxos$UNAME_RELEASE + ;; rs6000:LynxOS:2.*:*) - echo rs6000-unknown-lynxos${UNAME_RELEASE} - exit 0 ;; - PowerPC:LynxOS:2.*:* | PowerPC:LynxOS:3.[01]*:* | PowerPC:LynxOS:4.0*:*) - echo powerpc-unknown-lynxos${UNAME_RELEASE} - exit 0 ;; + GUESS=rs6000-unknown-lynxos$UNAME_RELEASE + ;; + PowerPC:LynxOS:2.*:* | PowerPC:LynxOS:3.[01]*:* | PowerPC:LynxOS:4.[02]*:*) + GUESS=powerpc-unknown-lynxos$UNAME_RELEASE + ;; SM[BE]S:UNIX_SV:*:*) - echo mips-dde-sysv${UNAME_RELEASE} - exit 0 ;; + GUESS=mips-dde-sysv$UNAME_RELEASE + ;; RM*:ReliantUNIX-*:*:*) - echo mips-sni-sysv4 - exit 0 ;; + GUESS=mips-sni-sysv4 + ;; RM*:SINIX-*:*:*) - echo mips-sni-sysv4 - exit 0 ;; + GUESS=mips-sni-sysv4 + ;; *:SINIX-*:*:*) if uname -p 2>/dev/null >/dev/null ; then UNAME_MACHINE=`(uname -p) 2>/dev/null` - echo ${UNAME_MACHINE}-sni-sysv4 + GUESS=$UNAME_MACHINE-sni-sysv4 else - echo ns32k-sni-sysv + GUESS=ns32k-sni-sysv fi - exit 0 ;; - PENTIUM:*:4.0*:*) # Unisys `ClearPath HMP IX 4000' SVR4/MP effort - # says - echo i586-unisys-sysv4 - exit 0 ;; + ;; + PENTIUM:*:4.0*:*) # Unisys `ClearPath HMP IX 4000' SVR4/MP effort + # says + 
GUESS=i586-unisys-sysv4 + ;; *:UNIX_System_V:4*:FTX*) # From Gerald Hewes . # How about differentiating between stratus architectures? -djm - echo hppa1.1-stratus-sysv4 - exit 0 ;; + GUESS=hppa1.1-stratus-sysv4 + ;; *:*:*:FTX*) # From seanf@swdc.stratus.com. - echo i860-stratus-sysv4 - exit 0 ;; + GUESS=i860-stratus-sysv4 + ;; + i*86:VOS:*:*) + # From Paul.Green@stratus.com. + GUESS=$UNAME_MACHINE-stratus-vos + ;; *:VOS:*:*) # From Paul.Green@stratus.com. - echo hppa1.1-stratus-vos - exit 0 ;; + GUESS=hppa1.1-stratus-vos + ;; mc68*:A/UX:*:*) - echo m68k-apple-aux${UNAME_RELEASE} - exit 0 ;; + GUESS=m68k-apple-aux$UNAME_RELEASE + ;; news*:NEWS-OS:6*:*) - echo mips-sony-newsos6 - exit 0 ;; + GUESS=mips-sony-newsos6 + ;; R[34]000:*System_V*:*:* | R4000:UNIX_SYSV:*:* | R*000:UNIX_SV:*:*) - if [ -d /usr/nec ]; then - echo mips-nec-sysv${UNAME_RELEASE} + if test -d /usr/nec; then + GUESS=mips-nec-sysv$UNAME_RELEASE else - echo mips-unknown-sysv${UNAME_RELEASE} + GUESS=mips-unknown-sysv$UNAME_RELEASE fi - exit 0 ;; + ;; BeBox:BeOS:*:*) # BeOS running on hardware made by Be, PPC only. - echo powerpc-be-beos - exit 0 ;; + GUESS=powerpc-be-beos + ;; BeMac:BeOS:*:*) # BeOS running on Mac or Mac clone, PPC only. - echo powerpc-apple-beos - exit 0 ;; + GUESS=powerpc-apple-beos + ;; BePC:BeOS:*:*) # BeOS running on Intel PC compatible. - echo i586-pc-beos - exit 0 ;; + GUESS=i586-pc-beos + ;; + BePC:Haiku:*:*) # Haiku running on Intel PC compatible. 
+ GUESS=i586-pc-haiku + ;; + ppc:Haiku:*:*) # Haiku running on Apple PowerPC + GUESS=powerpc-apple-haiku + ;; + *:Haiku:*:*) # Haiku modern gcc (not bound by BeOS compat) + GUESS=$UNAME_MACHINE-unknown-haiku + ;; SX-4:SUPER-UX:*:*) - echo sx4-nec-superux${UNAME_RELEASE} - exit 0 ;; + GUESS=sx4-nec-superux$UNAME_RELEASE + ;; SX-5:SUPER-UX:*:*) - echo sx5-nec-superux${UNAME_RELEASE} - exit 0 ;; + GUESS=sx5-nec-superux$UNAME_RELEASE + ;; + SX-6:SUPER-UX:*:*) + GUESS=sx6-nec-superux$UNAME_RELEASE + ;; + SX-7:SUPER-UX:*:*) + GUESS=sx7-nec-superux$UNAME_RELEASE + ;; + SX-8:SUPER-UX:*:*) + GUESS=sx8-nec-superux$UNAME_RELEASE + ;; + SX-8R:SUPER-UX:*:*) + GUESS=sx8r-nec-superux$UNAME_RELEASE + ;; + SX-ACE:SUPER-UX:*:*) + GUESS=sxace-nec-superux$UNAME_RELEASE + ;; Power*:Rhapsody:*:*) - echo powerpc-apple-rhapsody${UNAME_RELEASE} - exit 0 ;; + GUESS=powerpc-apple-rhapsody$UNAME_RELEASE + ;; *:Rhapsody:*:*) - echo ${UNAME_MACHINE}-apple-rhapsody${UNAME_RELEASE} - exit 0 ;; + GUESS=$UNAME_MACHINE-apple-rhapsody$UNAME_RELEASE + ;; + arm64:Darwin:*:*) + GUESS=aarch64-apple-darwin$UNAME_RELEASE + ;; *:Darwin:*:*) - echo `uname -p`-apple-darwin${UNAME_RELEASE} - exit 0 ;; + UNAME_PROCESSOR=`uname -p` + case $UNAME_PROCESSOR in + unknown) UNAME_PROCESSOR=powerpc ;; + esac + if command -v xcode-select > /dev/null 2> /dev/null && \ + ! xcode-select --print-path > /dev/null 2> /dev/null ; then + # Avoid executing cc if there is no toolchain installed as + # cc will be a stub that puts up a graphical alert + # prompting the user to install developer tools. 
+ CC_FOR_BUILD=no_compiler_found + else + set_cc_for_build + fi + if test "$CC_FOR_BUILD" != no_compiler_found; then + if (echo '#ifdef __LP64__'; echo IS_64BIT_ARCH; echo '#endif') | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ + grep IS_64BIT_ARCH >/dev/null + then + case $UNAME_PROCESSOR in + i386) UNAME_PROCESSOR=x86_64 ;; + powerpc) UNAME_PROCESSOR=powerpc64 ;; + esac + fi + # On 10.4-10.6 one might compile for PowerPC via gcc -arch ppc + if (echo '#ifdef __POWERPC__'; echo IS_PPC; echo '#endif') | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ + grep IS_PPC >/dev/null + then + UNAME_PROCESSOR=powerpc + fi + elif test "$UNAME_PROCESSOR" = i386 ; then + # uname -m returns i386 or x86_64 + UNAME_PROCESSOR=$UNAME_MACHINE + fi + GUESS=$UNAME_PROCESSOR-apple-darwin$UNAME_RELEASE + ;; *:procnto*:*:* | *:QNX:[0123456789]*:*) UNAME_PROCESSOR=`uname -p` - if test "$UNAME_PROCESSOR" = "x86"; then + if test "$UNAME_PROCESSOR" = x86; then UNAME_PROCESSOR=i386 UNAME_MACHINE=pc fi - echo ${UNAME_PROCESSOR}-${UNAME_MACHINE}-nto-qnx${UNAME_RELEASE} - exit 0 ;; + GUESS=$UNAME_PROCESSOR-$UNAME_MACHINE-nto-qnx$UNAME_RELEASE + ;; *:QNX:*:4*) - echo i386-pc-qnx - exit 0 ;; - NSR-[GKLNPTVW]:NONSTOP_KERNEL:*:*) - echo nsr-tandem-nsk${UNAME_RELEASE} - exit 0 ;; + GUESS=i386-pc-qnx + ;; + NEO-*:NONSTOP_KERNEL:*:*) + GUESS=neo-tandem-nsk$UNAME_RELEASE + ;; + NSE-*:NONSTOP_KERNEL:*:*) + GUESS=nse-tandem-nsk$UNAME_RELEASE + ;; + NSR-*:NONSTOP_KERNEL:*:*) + GUESS=nsr-tandem-nsk$UNAME_RELEASE + ;; + NSV-*:NONSTOP_KERNEL:*:*) + GUESS=nsv-tandem-nsk$UNAME_RELEASE + ;; + NSX-*:NONSTOP_KERNEL:*:*) + GUESS=nsx-tandem-nsk$UNAME_RELEASE + ;; *:NonStop-UX:*:*) - echo mips-compaq-nonstopux - exit 0 ;; + GUESS=mips-compaq-nonstopux + ;; BS2000:POSIX*:*:*) - echo bs2000-siemens-sysv - exit 0 ;; + GUESS=bs2000-siemens-sysv + ;; DS/*:UNIX_System_V:*:*) - echo ${UNAME_MACHINE}-${UNAME_SYSTEM}-${UNAME_RELEASE} - exit 0 ;; + GUESS=$UNAME_MACHINE-$UNAME_SYSTEM-$UNAME_RELEASE + ;; *:Plan9:*:*) # 
"uname -m" is not consistent, so use $cputype instead. 386 # is converted to i386 for consistency with other x86 # operating systems. - if test "$cputype" = "386"; then + if test "${cputype-}" = 386; then UNAME_MACHINE=i386 - else - UNAME_MACHINE="$cputype" + elif test "x${cputype-}" != x; then + UNAME_MACHINE=$cputype fi - echo ${UNAME_MACHINE}-unknown-plan9 - exit 0 ;; - i*86:OS/2:*:*) - # If we were able to find `uname', then EMX Unix compatibility - # is probably installed. - echo ${UNAME_MACHINE}-pc-os2-emx - exit 0 ;; + GUESS=$UNAME_MACHINE-unknown-plan9 + ;; *:TOPS-10:*:*) - echo pdp10-unknown-tops10 - exit 0 ;; + GUESS=pdp10-unknown-tops10 + ;; *:TENEX:*:*) - echo pdp10-unknown-tenex - exit 0 ;; + GUESS=pdp10-unknown-tenex + ;; KS10:TOPS-20:*:* | KL10:TOPS-20:*:* | TYPE4:TOPS-20:*:*) - echo pdp10-dec-tops20 - exit 0 ;; + GUESS=pdp10-dec-tops20 + ;; XKL-1:TOPS-20:*:* | TYPE5:TOPS-20:*:*) - echo pdp10-xkl-tops20 - exit 0 ;; + GUESS=pdp10-xkl-tops20 + ;; *:TOPS-20:*:*) - echo pdp10-unknown-tops20 - exit 0 ;; + GUESS=pdp10-unknown-tops20 + ;; *:ITS:*:*) - echo pdp10-unknown-its - exit 0 ;; - i*86:XTS-300:*:STOP) - echo ${UNAME_MACHINE}-unknown-stop - exit 0 ;; - i*86:atheos:*:*) - echo ${UNAME_MACHINE}-unknown-atheos - exit 0 ;; + GUESS=pdp10-unknown-its + ;; + SEI:*:*:SEIUX) + GUESS=mips-sei-seiux$UNAME_RELEASE + ;; + *:DragonFly:*:*) + DRAGONFLY_REL=`echo "$UNAME_RELEASE" | sed -e 's/[-(].*//'` + GUESS=$UNAME_MACHINE-unknown-dragonfly$DRAGONFLY_REL + ;; + *:*VMS:*:*) + UNAME_MACHINE=`(uname -p) 2>/dev/null` + case $UNAME_MACHINE in + A*) GUESS=alpha-dec-vms ;; + I*) GUESS=ia64-dec-vms ;; + V*) GUESS=vax-dec-vms ;; + esac ;; + *:XENIX:*:SysV) + GUESS=i386-pc-xenix + ;; + i*86:skyos:*:*) + SKYOS_REL=`echo "$UNAME_RELEASE" | sed -e 's/ .*$//'` + GUESS=$UNAME_MACHINE-pc-skyos$SKYOS_REL + ;; + i*86:rdos:*:*) + GUESS=$UNAME_MACHINE-pc-rdos + ;; + i*86:Fiwix:*:*) + GUESS=$UNAME_MACHINE-pc-fiwix + ;; + *:AROS:*:*) + GUESS=$UNAME_MACHINE-unknown-aros + ;; + 
x86_64:VMkernel:*:*) + GUESS=$UNAME_MACHINE-unknown-esx + ;; + amd64:Isilon\ OneFS:*:*) + GUESS=x86_64-unknown-onefs + ;; + *:Unleashed:*:*) + GUESS=$UNAME_MACHINE-unknown-unleashed$UNAME_RELEASE + ;; esac -#echo '(No uname command or uname output not recognized.)' 1>&2 -#echo "${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION}" 1>&2 +# Do we have a guess based on uname results? +if test "x$GUESS" != x; then + echo "$GUESS" + exit +fi -eval $set_cc_for_build -cat >$dummy.c < "$dummy.c" < -# include +#include +#include +#endif +#if defined(ultrix) || defined(_ultrix) || defined(__ultrix) || defined(__ultrix__) +#if defined (vax) || defined (__vax) || defined (__vax__) || defined(mips) || defined(__mips) || defined(__mips__) || defined(MIPS) || defined(__MIPS__) +#include +#if defined(_SIZE_T_) || defined(SIGLOST) +#include +#endif +#endif #endif main () { @@ -1143,22 +1594,14 @@ main () #include printf ("m68k-sony-newsos%s\n", #ifdef NEWSOS4 - "4" + "4" #else - "" + "" #endif - ); exit (0); + ); exit (0); #endif #endif -#if defined (__arm) && defined (__acorn) && defined (__unix) - printf ("arm-acorn-riscix"); exit (0); -#endif - -#if defined (hp300) && !defined (hpux) - printf ("m68k-hp-bsd\n"); exit (0); -#endif - #if defined (NeXT) #if !defined (__ARCHITECTURE__) #define __ARCHITECTURE__ "m68k" @@ -1198,39 +1641,54 @@ main () #endif #if defined (_SEQUENT_) - struct utsname un; - - uname(&un); - - if (strncmp(un.version, "V2", 2) == 0) { - printf ("i386-sequent-ptx2\n"); exit (0); - } - if (strncmp(un.version, "V1", 2) == 0) { /* XXX is V1 correct? */ - printf ("i386-sequent-ptx1\n"); exit (0); - } - printf ("i386-sequent-ptx\n"); exit (0); + struct utsname un; + uname(&un); + if (strncmp(un.version, "V2", 2) == 0) { + printf ("i386-sequent-ptx2\n"); exit (0); + } + if (strncmp(un.version, "V1", 2) == 0) { /* XXX is V1 correct? 
*/ + printf ("i386-sequent-ptx1\n"); exit (0); + } + printf ("i386-sequent-ptx\n"); exit (0); #endif #if defined (vax) -# if !defined (ultrix) -# include -# if defined (BSD) -# if BSD == 43 - printf ("vax-dec-bsd4.3\n"); exit (0); -# else -# if BSD == 199006 - printf ("vax-dec-bsd4.3reno\n"); exit (0); -# else - printf ("vax-dec-bsd\n"); exit (0); -# endif -# endif -# else - printf ("vax-dec-bsd\n"); exit (0); -# endif -# else - printf ("vax-dec-ultrix\n"); exit (0); -# endif +#if !defined (ultrix) +#include +#if defined (BSD) +#if BSD == 43 + printf ("vax-dec-bsd4.3\n"); exit (0); +#else +#if BSD == 199006 + printf ("vax-dec-bsd4.3reno\n"); exit (0); +#else + printf ("vax-dec-bsd\n"); exit (0); +#endif +#endif +#else + printf ("vax-dec-bsd\n"); exit (0); +#endif +#else +#if defined(_SIZE_T_) || defined(SIGLOST) + struct utsname un; + uname (&un); + printf ("vax-dec-ultrix%s\n", un.release); exit (0); +#else + printf ("vax-dec-ultrix\n"); exit (0); +#endif +#endif +#endif +#if defined(ultrix) || defined(_ultrix) || defined(__ultrix) || defined(__ultrix__) +#if defined(mips) || defined(__mips) || defined(__mips__) || defined(MIPS) || defined(__MIPS__) +#if defined(_SIZE_T_) || defined(SIGLOST) + struct utsname *un; + uname (&un); + printf ("mips-dec-ultrix%s\n", un.release); exit (0); +#else + printf ("mips-dec-ultrix\n"); exit (0); +#endif +#endif #endif #if defined (alliant) && defined (i860) @@ -1241,52 +1699,46 @@ main () } EOF -$CC_FOR_BUILD $dummy.c -o $dummy 2>/dev/null && ./$dummy && rm -f $dummy.c $dummy && exit 0 -rm -f $dummy.c $dummy +$CC_FOR_BUILD -o "$dummy" "$dummy.c" 2>/dev/null && SYSTEM_NAME=`"$dummy"` && + { echo "$SYSTEM_NAME"; exit; } # Apollos put the system type in the environment. 
+test -d /usr/apollo && { echo "$ISP-apollo-$SYSTYPE"; exit; } -test -d /usr/apollo && { echo ${ISP}-apollo-${SYSTYPE}; exit 0; } +echo "$0: unable to guess system type" >&2 -# Convex versions that predate uname can use getsysinfo(1) +case $UNAME_MACHINE:$UNAME_SYSTEM in + mips:Linux | mips64:Linux) + # If we got here on MIPS GNU/Linux, output extra information. + cat >&2 <&2 <&2 < in order to provide the needed -information to handle your system. +If $0 has already been updated, send the following data and any +information you think might be pertinent to config-patches@gnu.org to +provide the necessary information to handle your system. config.guess timestamp = $timestamp @@ -1305,16 +1757,17 @@ hostinfo = `(hostinfo) 2>/dev/null` /usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null` /usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null` -UNAME_MACHINE = ${UNAME_MACHINE} -UNAME_RELEASE = ${UNAME_RELEASE} -UNAME_SYSTEM = ${UNAME_SYSTEM} -UNAME_VERSION = ${UNAME_VERSION} +UNAME_MACHINE = "$UNAME_MACHINE" +UNAME_RELEASE = "$UNAME_RELEASE" +UNAME_SYSTEM = "$UNAME_SYSTEM" +UNAME_VERSION = "$UNAME_VERSION" EOF +fi exit 1 # Local variables: -# eval: (add-hook 'write-file-hooks 'time-stamp) +# eval: (add-hook 'before-save-hook 'time-stamp) # time-stamp-start: "timestamp='" # time-stamp-format: "%:y-%02m-%02d" # time-stamp-end: "'" diff --git a/config.h.in b/config.h.in index 28805a72..dec6ec2f 100644 --- a/config.h.in +++ b/config.h.in @@ -3,6 +3,9 @@ /* Define if you have the strftime function. */ #undef HAVE_STRFTIME +/* Define to a `signed integer` if stdint.h or inttypes.h don't define */ +#undef intptr_t + /* Define to `unsigned' if doesn't define. */ #undef size_t @@ -18,11 +21,23 @@ /* Define if your declares struct tm. */ #undef TM_IN_SYS_TIME +/* Define if has a prototype for isblank. */ +#undef HAVE_DECL_ISBLANK + +/* Define if has a prototype for strcasecmp. */ +#undef HAVE_DECL_STRCASECMP + +/* Define if has a prototype for strcasestr. 
*/ +#undef HAVE_DECL_STRCASESTR + #undef HAVE_GDBM_H #undef HAVE_ICONV #undef HAVE_ICONV_H +/* Define if you're using the chardet library */ +#undef HAVE_CHARDET + /* Define if you're using the FNV hash library */ #undef HAVE_LIBFNV @@ -44,12 +59,6 @@ /* Define if you have the mkdir function. */ #undef HAVE_MKDIR -/* Define if you have the strcasecmp function. */ -#undef HAVE_STRCASECMP - -/* Define if you have the strcasestr function. */ -#undef HAVE_STRCASESTR - /* Define if you have the strdup function. */ #undef HAVE_STRDUP @@ -80,6 +89,9 @@ /* Define if you have the <fcntl.h> header file. */ #undef HAVE_FCNTL_H +/* Define if you have the <inttypes.h> header file. */ +#undef HAVE_INTTYPES_H + /* Define if you have the <locale.h> header file. */ #undef HAVE_LOCALE_H @@ -101,12 +113,24 @@ /* Define if you have the <stdarg.h> header file. */ #undef HAVE_STDARG_H +/* Define if you have the <stdint.h> header file. */ +#undef HAVE_STDINT_H + /* Define if you have the <stdio.h> header file. */ #undef HAVE_STDIO_H /* Define if you have the <stdlib.h> header file. */ #undef HAVE_STDLIB_H +/* Define if you have the isblank function. */ +#undef HAVE_ISBLANK + +/* Define if you have the strcasecmp function. */ +#undef HAVE_STRCASECMP + +/* Define if you have the strcasestr function. */ +#undef HAVE_STRCASESTR + /* Define if you have the <string.h> header file. */ #undef HAVE_STRING_H diff --git a/config.sub b/config.sub index f3657978..de4259e4 100755 --- a/config.sub +++ b/config.sub @@ -1,42 +1,42 @@ #! /bin/sh # Configuration validation subroutine script. -# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, -# 2000, 2001, 2002 Free Software Foundation, Inc. +# Copyright 1992-2023 Free Software Foundation, Inc. -timestamp='2002-03-07' +# shellcheck disable=SC2006,SC2268 # see below for rationale -# This file is (in principle) common to ALL GNU software. -# The presence of a machine in this file suggests that SOME GNU software -# can handle that machine. It does not imply ALL GNU software can.
-# -# This file is free software; you can redistribute it and/or modify -# it under the terms of the GNU General Public License as published by -# the Free Software Foundation; either version 2 of the License, or +timestamp='2023-01-21' + +# This file is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # -# This program is distributed in the hope that it will be useful, -# but WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -# GNU General Public License for more details. +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. # # You should have received a copy of the GNU General Public License -# along with this program; if not, write to the Free Software -# Foundation, Inc., 59 Temple Place - Suite 330, -# Boston, MA 02111-1307, USA. - +# along with this program; if not, see <https://www.gnu.org/licenses/>. +# # As a special exception to the GNU General Public License, if you # distribute this file as part of a program that contains a # configuration script generated by Autoconf, you may include it under -# the same distribution terms that you use for the rest of that program. +# the same distribution terms that you use for the rest of that +# program. This Exception is an additional permission under section 7 +# of the GNU General Public License, version 3 ("GPLv3"). + -# Please send patches to <config-patches@gnu.org>. Submit a context -# diff and a properly formatted ChangeLog entry. +# Please send patches to <config-patches@gnu.org>. # # Configuration subroutine to validate and canonicalize a configuration type. # Supply the specified configuration type as an argument.
# If it is invalid, we print an error message on stderr and exit with code 1. # Otherwise, we print the canonical config type on stdout and succeed. +# You can get the latest version of this script from: +# https://git.savannah.gnu.org/cgit/config.git/plain/config.sub + # This file is supposed to be the same for all GNU packages # and recognize all the CPU types, system types and aliases # that are meaningful with *any* GNU software. @@ -52,15 +52,21 @@ timestamp='2002-03-07' # CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM # It is wrong to echo any other type of specification. +# The "shellcheck disable" line above the timestamp inhibits complaints +# about features and limitations of the classic Bourne shell that were +# superseded or lifted in POSIX. However, this script identifies a wide +# variety of pre-POSIX systems that do not have POSIX shells at all, and +# even some reasonably current systems (Solaris 10 as case-in-point) still +# have a pre-POSIX /bin/sh. + me=`echo "$0" | sed -e 's,.*/,,'` usage="\ -Usage: $0 [OPTION] CPU-MFR-OPSYS - $0 [OPTION] ALIAS +Usage: $0 [OPTION] CPU-MFR-OPSYS or ALIAS Canonicalize a configuration name. -Operation modes: +Options: -h, --help print this help, then exit -t, --time-stamp print date of last modification, then exit -v, --version print version number, then exit @@ -70,8 +76,7 @@ Report bugs and patches to <config-patches@gnu.org>." version="\ GNU config.sub ($timestamp) -Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001 -Free Software Foundation, Inc. +Copyright 1992-2023 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -83,23 +88,23 @@ Try \`$me --help' for more information."
while test $# -gt 0 ; do case $1 in --time-stamp | --time* | -t ) - echo "$timestamp" ; exit 0 ;; + echo "$timestamp" ; exit ;; --version | -v ) - echo "$version" ; exit 0 ;; + echo "$version" ; exit ;; --help | --h* | -h ) - echo "$usage"; exit 0 ;; + echo "$usage"; exit ;; -- ) # Stop option processing shift; break ;; - ) # Use stdin as input. break ;; -* ) - echo "$me: invalid option $1$help" + echo "$me: invalid option $1$help" >&2 exit 1 ;; *local*) # First pass through any local machine types. - echo $1 - exit 0;; + echo "$1" + exit ;; * ) break ;; @@ -114,932 +119,1186 @@ case $# in exit 1;; esac -# Separate what the user gave into CPU-COMPANY and OS or KERNEL-OS (if any). -# Here we must recognize all the valid KERNEL-OS combinations. -maybe_os=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\2/'` -case $maybe_os in - nto-qnx* | linux-gnu* | storm-chaos* | os2-emx* | windows32-* | rtmk-nova*) - os=-$maybe_os - basic_machine=`echo $1 | sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1/'` - ;; - *) - basic_machine=`echo $1 | sed 's/-[^-]*$//'` - if [ $basic_machine != $1 ] - then os=`echo $1 | sed 's/.*-/-/'` - else os=; fi - ;; -esac +# Split fields of configuration type +# shellcheck disable=SC2162 +saved_IFS=$IFS +IFS="-" read field1 field2 field3 field4 <&2 + exit 1 ;; - -sco*) - os=-sco3.2v2 - basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + *-*-*-*) + basic_machine=$field1-$field2 + basic_os=$field3-$field4 ;; - -udk*) - basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + *-*-*) + # Ambiguous whether COMPANY is present, or skipped and KERNEL-OS is two + # parts + maybe_os=$field2-$field3 + case $maybe_os in + nto-qnx* | linux-* | uclinux-uclibc* \ + | uclinux-gnu* | kfreebsd*-gnu* | knetbsd*-gnu* | netbsd*-gnu* \ + | netbsd*-eabi* | kopensolaris*-gnu* | cloudabi*-eabi* \ + | storm-chaos* | os2-emx* | rtmk-nova* | managarm-*) + basic_machine=$field1 + basic_os=$maybe_os + ;; + android-linux) + basic_machine=$field1-unknown + basic_os=linux-android + ;; + *) + 
basic_machine=$field1-$field2 + basic_os=$field3 + ;; + esac ;; - -isc) - os=-isc2.2 - basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` + *-*) + # A lone config we happen to match not fitting any pattern + case $field1-$field2 in + decstation-3100) + basic_machine=mips-dec + basic_os= + ;; + *-*) + # Second component is usually, but not always the OS + case $field2 in + # Prevent following clause from handling this valid os + sun*os*) + basic_machine=$field1 + basic_os=$field2 + ;; + zephyr*) + basic_machine=$field1-unknown + basic_os=$field2 + ;; + # Manufacturers + dec* | mips* | sequent* | encore* | pc533* | sgi* | sony* \ + | att* | 7300* | 3300* | delta* | motorola* | sun[234]* \ + | unicom* | ibm* | next | hp | isi* | apollo | altos* \ + | convergent* | ncr* | news | 32* | 3600* | 3100* \ + | hitachi* | c[123]* | convex* | sun | crds | omron* | dg \ + | ultra | tti* | harris | dolphin | highlevel | gould \ + | cbm | ns | masscomp | apple | axis | knuth | cray \ + | microblaze* | sim | cisco \ + | oki | wec | wrs | winbond) + basic_machine=$field1-$field2 + basic_os= + ;; + *) + basic_machine=$field1 + basic_os=$field2 + ;; + esac + ;; + esac ;; - -clix*) - basic_machine=clipper-intergraph + *) + # Convert single-component short-hands not valid as part of + # multi-component configurations. 
+ case $field1 in + 386bsd) + basic_machine=i386-pc + basic_os=bsd + ;; + a29khif) + basic_machine=a29k-amd + basic_os=udi + ;; + adobe68k) + basic_machine=m68010-adobe + basic_os=scout + ;; + alliant) + basic_machine=fx80-alliant + basic_os= + ;; + altos | altos3068) + basic_machine=m68k-altos + basic_os= + ;; + am29k) + basic_machine=a29k-none + basic_os=bsd + ;; + amdahl) + basic_machine=580-amdahl + basic_os=sysv + ;; + amiga) + basic_machine=m68k-unknown + basic_os= + ;; + amigaos | amigados) + basic_machine=m68k-unknown + basic_os=amigaos + ;; + amigaunix | amix) + basic_machine=m68k-unknown + basic_os=sysv4 + ;; + apollo68) + basic_machine=m68k-apollo + basic_os=sysv + ;; + apollo68bsd) + basic_machine=m68k-apollo + basic_os=bsd + ;; + aros) + basic_machine=i386-pc + basic_os=aros + ;; + aux) + basic_machine=m68k-apple + basic_os=aux + ;; + balance) + basic_machine=ns32k-sequent + basic_os=dynix + ;; + blackfin) + basic_machine=bfin-unknown + basic_os=linux + ;; + cegcc) + basic_machine=arm-unknown + basic_os=cegcc + ;; + convex-c1) + basic_machine=c1-convex + basic_os=bsd + ;; + convex-c2) + basic_machine=c2-convex + basic_os=bsd + ;; + convex-c32) + basic_machine=c32-convex + basic_os=bsd + ;; + convex-c34) + basic_machine=c34-convex + basic_os=bsd + ;; + convex-c38) + basic_machine=c38-convex + basic_os=bsd + ;; + cray) + basic_machine=j90-cray + basic_os=unicos + ;; + crds | unos) + basic_machine=m68k-crds + basic_os= + ;; + da30) + basic_machine=m68k-da30 + basic_os= + ;; + decstation | pmax | pmin | dec3100 | decstatn) + basic_machine=mips-dec + basic_os= + ;; + delta88) + basic_machine=m88k-motorola + basic_os=sysv3 + ;; + dicos) + basic_machine=i686-pc + basic_os=dicos + ;; + djgpp) + basic_machine=i586-pc + basic_os=msdosdjgpp + ;; + ebmon29k) + basic_machine=a29k-amd + basic_os=ebmon + ;; + es1800 | OSE68k | ose68k | ose | OSE) + basic_machine=m68k-ericsson + basic_os=ose + ;; + gmicro) + basic_machine=tron-gmicro + basic_os=sysv + ;; + go32) + 
basic_machine=i386-pc + basic_os=go32 + ;; + h8300hms) + basic_machine=h8300-hitachi + basic_os=hms + ;; + h8300xray) + basic_machine=h8300-hitachi + basic_os=xray + ;; + h8500hms) + basic_machine=h8500-hitachi + basic_os=hms + ;; + harris) + basic_machine=m88k-harris + basic_os=sysv3 + ;; + hp300 | hp300hpux) + basic_machine=m68k-hp + basic_os=hpux + ;; + hp300bsd) + basic_machine=m68k-hp + basic_os=bsd + ;; + hppaosf) + basic_machine=hppa1.1-hp + basic_os=osf + ;; + hppro) + basic_machine=hppa1.1-hp + basic_os=proelf + ;; + i386mach) + basic_machine=i386-mach + basic_os=mach + ;; + isi68 | isi) + basic_machine=m68k-isi + basic_os=sysv + ;; + m68knommu) + basic_machine=m68k-unknown + basic_os=linux + ;; + magnum | m3230) + basic_machine=mips-mips + basic_os=sysv + ;; + merlin) + basic_machine=ns32k-utek + basic_os=sysv + ;; + mingw64) + basic_machine=x86_64-pc + basic_os=mingw64 + ;; + mingw32) + basic_machine=i686-pc + basic_os=mingw32 + ;; + mingw32ce) + basic_machine=arm-unknown + basic_os=mingw32ce + ;; + monitor) + basic_machine=m68k-rom68k + basic_os=coff + ;; + morphos) + basic_machine=powerpc-unknown + basic_os=morphos + ;; + moxiebox) + basic_machine=moxie-unknown + basic_os=moxiebox + ;; + msdos) + basic_machine=i386-pc + basic_os=msdos + ;; + msys) + basic_machine=i686-pc + basic_os=msys + ;; + mvs) + basic_machine=i370-ibm + basic_os=mvs + ;; + nacl) + basic_machine=le32-unknown + basic_os=nacl + ;; + ncr3000) + basic_machine=i486-ncr + basic_os=sysv4 + ;; + netbsd386) + basic_machine=i386-pc + basic_os=netbsd + ;; + netwinder) + basic_machine=armv4l-rebel + basic_os=linux + ;; + news | news700 | news800 | news900) + basic_machine=m68k-sony + basic_os=newsos + ;; + news1000) + basic_machine=m68030-sony + basic_os=newsos + ;; + necv70) + basic_machine=v70-nec + basic_os=sysv + ;; + nh3000) + basic_machine=m68k-harris + basic_os=cxux + ;; + nh[45]000) + basic_machine=m88k-harris + basic_os=cxux + ;; + nindy960) + basic_machine=i960-intel + basic_os=nindy 
+ ;; + mon960) + basic_machine=i960-intel + basic_os=mon960 + ;; + nonstopux) + basic_machine=mips-compaq + basic_os=nonstopux + ;; + os400) + basic_machine=powerpc-ibm + basic_os=os400 + ;; + OSE68000 | ose68000) + basic_machine=m68000-ericsson + basic_os=ose + ;; + os68k) + basic_machine=m68k-none + basic_os=os68k + ;; + paragon) + basic_machine=i860-intel + basic_os=osf + ;; + parisc) + basic_machine=hppa-unknown + basic_os=linux + ;; + psp) + basic_machine=mipsallegrexel-sony + basic_os=psp + ;; + pw32) + basic_machine=i586-unknown + basic_os=pw32 + ;; + rdos | rdos64) + basic_machine=x86_64-pc + basic_os=rdos + ;; + rdos32) + basic_machine=i386-pc + basic_os=rdos + ;; + rom68k) + basic_machine=m68k-rom68k + basic_os=coff + ;; + sa29200) + basic_machine=a29k-amd + basic_os=udi + ;; + sei) + basic_machine=mips-sei + basic_os=seiux + ;; + sequent) + basic_machine=i386-sequent + basic_os= + ;; + sps7) + basic_machine=m68k-bull + basic_os=sysv2 + ;; + st2000) + basic_machine=m68k-tandem + basic_os= + ;; + stratus) + basic_machine=i860-stratus + basic_os=sysv4 + ;; + sun2) + basic_machine=m68000-sun + basic_os= + ;; + sun2os3) + basic_machine=m68000-sun + basic_os=sunos3 + ;; + sun2os4) + basic_machine=m68000-sun + basic_os=sunos4 + ;; + sun3) + basic_machine=m68k-sun + basic_os= + ;; + sun3os3) + basic_machine=m68k-sun + basic_os=sunos3 + ;; + sun3os4) + basic_machine=m68k-sun + basic_os=sunos4 + ;; + sun4) + basic_machine=sparc-sun + basic_os= + ;; + sun4os3) + basic_machine=sparc-sun + basic_os=sunos3 + ;; + sun4os4) + basic_machine=sparc-sun + basic_os=sunos4 + ;; + sun4sol2) + basic_machine=sparc-sun + basic_os=solaris2 + ;; + sun386 | sun386i | roadrunner) + basic_machine=i386-sun + basic_os= + ;; + sv1) + basic_machine=sv1-cray + basic_os=unicos + ;; + symmetry) + basic_machine=i386-sequent + basic_os=dynix + ;; + t3e) + basic_machine=alphaev5-cray + basic_os=unicos + ;; + t90) + basic_machine=t90-cray + basic_os=unicos + ;; + toad1) + basic_machine=pdp10-xkl 
+ basic_os=tops20 + ;; + tpf) + basic_machine=s390x-ibm + basic_os=tpf + ;; + udi29k) + basic_machine=a29k-amd + basic_os=udi + ;; + ultra3) + basic_machine=a29k-nyu + basic_os=sym1 + ;; + v810 | necv810) + basic_machine=v810-nec + basic_os=none + ;; + vaxv) + basic_machine=vax-dec + basic_os=sysv + ;; + vms) + basic_machine=vax-dec + basic_os=vms + ;; + vsta) + basic_machine=i386-pc + basic_os=vsta + ;; + vxworks960) + basic_machine=i960-wrs + basic_os=vxworks + ;; + vxworks68) + basic_machine=m68k-wrs + basic_os=vxworks + ;; + vxworks29k) + basic_machine=a29k-wrs + basic_os=vxworks + ;; + xbox) + basic_machine=i686-pc + basic_os=mingw32 + ;; + ymp) + basic_machine=ymp-cray + basic_os=unicos + ;; + *) + basic_machine=$1 + basic_os= + ;; + esac ;; - -isc*) - basic_machine=`echo $1 | sed -e 's/86-.*/86-pc/'` +esac + +# Decode 1-component or ad-hoc basic machines +case $basic_machine in + # Here we handle the default manufacturer of certain CPU types. It is in + # some cases the only manufacturer, in others, it is the most popular. + w89k) + cpu=hppa1.1 + vendor=winbond ;; - -lynx*) - os=-lynxos + op50n) + cpu=hppa1.1 + vendor=oki ;; - -ptx*) - basic_machine=`echo $1 | sed -e 's/86-.*/86-sequent/'` + op60c) + cpu=hppa1.1 + vendor=oki ;; - -windowsnt*) - os=`echo $os | sed -e 's/windowsnt/winnt/'` + ibm*) + cpu=i370 + vendor=ibm ;; - -psos*) - os=-psos + orion105) + cpu=clipper + vendor=highlevel ;; - -mint | -mint[0-9]*) - basic_machine=m68k-atari - os=-mint + mac | mpw | mac-mpw) + cpu=m68k + vendor=apple ;; -esac - -# Decode aliases for certain CPU-COMPANY combinations. -case $basic_machine in - # Recognize the basic CPU types without company name. - # Some are omitted here because they have special meanings below. 
- 1750a | 580 \ - | a29k \ - | alpha | alphaev[4-8] | alphaev56 | alphaev6[78] | alphapca5[67] \ - | alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] | alpha64pca5[67] \ - | arc | arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb] | avr \ - | c4x | clipper \ - | d10v | d30v | dsp16xx \ - | fr30 \ - | h8300 | h8500 | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \ - | i370 | i860 | i960 | ia64 \ - | m32r | m68000 | m68k | m88k | mcore \ - | mips | mips16 | mips64 | mips64el | mips64orion | mips64orionel \ - | mips64vr4100 | mips64vr4100el | mips64vr4300 \ - | mips64vr4300el | mips64vr5000 | mips64vr5000el \ - | mipsbe | mipseb | mipsel | mipsle | mipstx39 | mipstx39el \ - | mipsisa32 | mipsisa64 \ - | mn10200 | mn10300 \ - | ns16k | ns32k \ - | openrisc | or32 \ - | pdp10 | pdp11 | pj | pjl \ - | powerpc | powerpc64 | powerpc64le | powerpcle | ppcbe \ - | pyramid \ - | sh | sh[34] | sh[34]eb | shbe | shle | sh64 \ - | sparc | sparc64 | sparc86x | sparclet | sparclite | sparcv9 | sparcv9b \ - | strongarm \ - | tahoe | thumb | tic80 | tron \ - | v850 | v850e \ - | we32k \ - | x86 | xscale | xstormy16 | xtensa \ - | z8k) - basic_machine=$basic_machine-unknown - ;; - m6811 | m68hc11 | m6812 | m68hc12) - # Motorola 68HC11/12. - basic_machine=$basic_machine-unknown - os=-none - ;; - m88110 | m680[12346]0 | m683?2 | m68360 | m5200 | v70 | w65 | z8k) + pmac | pmac-mpw) + cpu=powerpc + vendor=apple ;; - # We use `pc' rather than `unknown' - # because (1) that's what they normally are, and - # (2) the word "unknown" tends to confuse beginning users. - i*86 | x86_64) - basic_machine=$basic_machine-pc - ;; - # Object if more than one company name word. - *-*-*) - echo Invalid configuration \`$1\': machine \`$basic_machine\' not recognized 1>&2 - exit 1 - ;; - # Recognize the basic CPU types with company name. 
- 580-* \ - | a29k-* \ - | alpha-* | alphaev[4-8]-* | alphaev56-* | alphaev6[78]-* \ - | alpha64-* | alpha64ev[4-8]-* | alpha64ev56-* | alpha64ev6[78]-* \ - | alphapca5[67]-* | alpha64pca5[67]-* | arc-* \ - | arm-* | armbe-* | armle-* | armv*-* \ - | avr-* \ - | bs2000-* \ - | c[123]* | c30-* | [cjt]90-* | c54x-* \ - | clipper-* | cydra-* \ - | d10v-* | d30v-* \ - | elxsi-* \ - | f30[01]-* | f700-* | fr30-* | fx80-* \ - | h8300-* | h8500-* \ - | hppa-* | hppa1.[01]-* | hppa2.0-* | hppa2.0[nw]-* | hppa64-* \ - | i*86-* | i860-* | i960-* | ia64-* \ - | m32r-* \ - | m68000-* | m680[012346]0-* | m68360-* | m683?2-* | m68k-* \ - | m88110-* | m88k-* | mcore-* \ - | mips-* | mips16-* | mips64-* | mips64el-* | mips64orion-* \ - | mips64orionel-* | mips64vr4100-* | mips64vr4100el-* \ - | mips64vr4300-* | mips64vr4300el-* | mipsbe-* | mipseb-* \ - | mipsle-* | mipsel-* | mipstx39-* | mipstx39el-* \ - | none-* | np1-* | ns16k-* | ns32k-* \ - | orion-* \ - | pdp10-* | pdp11-* | pj-* | pjl-* | pn-* | power-* \ - | powerpc-* | powerpc64-* | powerpc64le-* | powerpcle-* | ppcbe-* \ - | pyramid-* \ - | romp-* | rs6000-* \ - | sh-* | sh[34]-* | sh[34]eb-* | shbe-* | shle-* | sh64-* \ - | sparc-* | sparc64-* | sparc86x-* | sparclet-* | sparclite-* \ - | sparcv9-* | sparcv9b-* | strongarm-* | sv1-* | sx?-* \ - | tahoe-* | thumb-* | tic30-* | tic54x-* | tic80-* | tron-* \ - | v850-* | v850e-* | vax-* \ - | we32k-* \ - | x86-* | x86_64-* | xps100-* | xscale-* | xstormy16-* \ - | xtensa-* \ - | ymp-* \ - | z8k-*) - ;; # Recognize the various machine names and aliases which stand # for a CPU type and a company and sometimes even an OS. 
- 386bsd) - basic_machine=i386-unknown - os=-bsd - ;; 3b1 | 7300 | 7300-att | att-7300 | pc7300 | safari | unixpc) - basic_machine=m68000-att + cpu=m68000 + vendor=att ;; 3b*) - basic_machine=we32k-att - ;; - a29khif) - basic_machine=a29k-amd - os=-udi - ;; - adobe68k) - basic_machine=m68010-adobe - os=-scout - ;; - alliant | fx80) - basic_machine=fx80-alliant - ;; - altos | altos3068) - basic_machine=m68k-altos - ;; - am29k) - basic_machine=a29k-none - os=-bsd - ;; - amdahl) - basic_machine=580-amdahl - os=-sysv - ;; - amiga | amiga-*) - basic_machine=m68k-unknown - ;; - amigaos | amigados) - basic_machine=m68k-unknown - os=-amigaos - ;; - amigaunix | amix) - basic_machine=m68k-unknown - os=-sysv4 - ;; - apollo68) - basic_machine=m68k-apollo - os=-sysv + cpu=we32k + vendor=att ;; - apollo68bsd) - basic_machine=m68k-apollo - os=-bsd - ;; - aux) - basic_machine=m68k-apple - os=-aux - ;; - balance) - basic_machine=ns32k-sequent - os=-dynix - ;; - c90) - basic_machine=c90-cray - os=-unicos - ;; - convex-c1) - basic_machine=c1-convex - os=-bsd - ;; - convex-c2) - basic_machine=c2-convex - os=-bsd - ;; - convex-c32) - basic_machine=c32-convex - os=-bsd - ;; - convex-c34) - basic_machine=c34-convex - os=-bsd - ;; - convex-c38) - basic_machine=c38-convex - os=-bsd - ;; - cray | j90) - basic_machine=j90-cray - os=-unicos - ;; - crds | unos) - basic_machine=m68k-crds - ;; - cris | cris-* | etrax*) - basic_machine=cris-axis - ;; - da30 | da30-*) - basic_machine=m68k-da30 - ;; - decstation | decstation-3100 | pmax | pmax-* | pmin | dec3100 | decstatn) - basic_machine=mips-dec + bluegene*) + cpu=powerpc + vendor=ibm + basic_os=cnk ;; decsystem10* | dec10*) - basic_machine=pdp10-dec - os=-tops10 + cpu=pdp10 + vendor=dec + basic_os=tops10 ;; decsystem20* | dec20*) - basic_machine=pdp10-dec - os=-tops20 + cpu=pdp10 + vendor=dec + basic_os=tops20 ;; delta | 3300 | motorola-3300 | motorola-delta \ | 3300-motorola | delta-motorola) - basic_machine=m68k-motorola - ;; - delta88) - 
basic_machine=m88k-motorola - os=-sysv3 - ;; - dpx20 | dpx20-*) - basic_machine=rs6000-bull - os=-bosx + cpu=m68k + vendor=motorola ;; - dpx2* | dpx2*-bull) - basic_machine=m68k-bull - os=-sysv3 - ;; - ebmon29k) - basic_machine=a29k-amd - os=-ebmon - ;; - elxsi) - basic_machine=elxsi-elxsi - os=-bsd + dpx2*) + cpu=m68k + vendor=bull + basic_os=sysv3 ;; encore | umax | mmax) - basic_machine=ns32k-encore + cpu=ns32k + vendor=encore ;; - es1800 | OSE68k | ose68k | ose | OSE) - basic_machine=m68k-ericsson - os=-ose + elxsi) + cpu=elxsi + vendor=elxsi + basic_os=${basic_os:-bsd} ;; fx2800) - basic_machine=i860-alliant + cpu=i860 + vendor=alliant ;; genix) - basic_machine=ns32k-ns - ;; - gmicro) - basic_machine=tron-gmicro - os=-sysv - ;; - go32) - basic_machine=i386-pc - os=-go32 + cpu=ns32k + vendor=ns ;; h3050r* | hiux*) - basic_machine=hppa1.1-hitachi - os=-hiuxwe2 - ;; - h8300hms) - basic_machine=h8300-hitachi - os=-hms - ;; - h8300xray) - basic_machine=h8300-hitachi - os=-xray - ;; - h8500hms) - basic_machine=h8500-hitachi - os=-hms - ;; - harris) - basic_machine=m88k-harris - os=-sysv3 - ;; - hp300-*) - basic_machine=m68k-hp - ;; - hp300bsd) - basic_machine=m68k-hp - os=-bsd - ;; - hp300hpux) - basic_machine=m68k-hp - os=-hpux + cpu=hppa1.1 + vendor=hitachi + basic_os=hiuxwe2 ;; hp3k9[0-9][0-9] | hp9[0-9][0-9]) - basic_machine=hppa1.0-hp + cpu=hppa1.0 + vendor=hp ;; hp9k2[0-9][0-9] | hp9k31[0-9]) - basic_machine=m68000-hp + cpu=m68000 + vendor=hp ;; hp9k3[2-9][0-9]) - basic_machine=m68k-hp + cpu=m68k + vendor=hp ;; hp9k6[0-9][0-9] | hp6[0-9][0-9]) - basic_machine=hppa1.0-hp + cpu=hppa1.0 + vendor=hp ;; hp9k7[0-79][0-9] | hp7[0-79][0-9]) - basic_machine=hppa1.1-hp + cpu=hppa1.1 + vendor=hp ;; hp9k78[0-9] | hp78[0-9]) # FIXME: really hppa2.0-hp - basic_machine=hppa1.1-hp + cpu=hppa1.1 + vendor=hp ;; hp9k8[67]1 | hp8[67]1 | hp9k80[24] | hp80[24] | hp9k8[78]9 | hp8[78]9 | hp9k893 | hp893) # FIXME: really hppa2.0-hp - basic_machine=hppa1.1-hp + cpu=hppa1.1 + vendor=hp 
;; hp9k8[0-9][13679] | hp8[0-9][13679]) - basic_machine=hppa1.1-hp + cpu=hppa1.1 + vendor=hp ;; hp9k8[0-9][0-9] | hp8[0-9][0-9]) - basic_machine=hppa1.0-hp - ;; - hppa-next) - os=-nextstep3 + cpu=hppa1.0 + vendor=hp ;; - hppaosf) - basic_machine=hppa1.1-hp - os=-osf - ;; - hppro) - basic_machine=hppa1.1-hp - os=-proelf - ;; - i370-ibm* | ibm*) - basic_machine=i370-ibm - ;; -# I'm not sure what "Sysv32" means. Should this be sysv3.2? i*86v32) - basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` - os=-sysv32 + cpu=`echo "$1" | sed -e 's/86.*/86/'` + vendor=pc + basic_os=sysv32 ;; i*86v4*) - basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` - os=-sysv4 + cpu=`echo "$1" | sed -e 's/86.*/86/'` + vendor=pc + basic_os=sysv4 ;; i*86v) - basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` - os=-sysv + cpu=`echo "$1" | sed -e 's/86.*/86/'` + vendor=pc + basic_os=sysv ;; i*86sol2) - basic_machine=`echo $1 | sed -e 's/86.*/86-pc/'` - os=-solaris2 + cpu=`echo "$1" | sed -e 's/86.*/86/'` + vendor=pc + basic_os=solaris2 ;; - i386mach) - basic_machine=i386-mach - os=-mach - ;; - i386-vsta | vsta) - basic_machine=i386-unknown - os=-vsta + j90 | j90-cray) + cpu=j90 + vendor=cray + basic_os=${basic_os:-unicos} ;; iris | iris4d) - basic_machine=mips-sgi - case $os in - -irix*) + cpu=mips + vendor=sgi + case $basic_os in + irix*) ;; *) - os=-irix4 + basic_os=irix4 ;; esac ;; - isi68 | isi) - basic_machine=m68k-isi - os=-sysv - ;; - m88k-omron*) - basic_machine=m88k-omron - ;; - magnum | m3230) - basic_machine=mips-mips - os=-sysv - ;; - merlin) - basic_machine=ns32k-utek - os=-sysv - ;; - mingw32) - basic_machine=i386-pc - os=-mingw32 - ;; miniframe) - basic_machine=m68000-convergent - ;; - *mint | -mint[0-9]* | *MiNT | *MiNT[0-9]*) - basic_machine=m68k-atari - os=-mint - ;; - mips3*-*) - basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'` - ;; - mips3*) - basic_machine=`echo $basic_machine | sed -e 's/mips3/mips64/'`-unknown - ;; - mmix*) - basic_machine=mmix-knuth - os=-mmixware 
- ;; - monitor) - basic_machine=m68k-rom68k - os=-coff - ;; - morphos) - basic_machine=powerpc-unknown - os=-morphos - ;; - msdos) - basic_machine=i386-pc - os=-msdos + cpu=m68000 + vendor=convergent ;; - mvs) - basic_machine=i370-ibm - os=-mvs - ;; - ncr3000) - basic_machine=i486-ncr - os=-sysv4 - ;; - netbsd386) - basic_machine=i386-unknown - os=-netbsd - ;; - netwinder) - basic_machine=armv4l-rebel - os=-linux - ;; - news | news700 | news800 | news900) - basic_machine=m68k-sony - os=-newsos - ;; - news1000) - basic_machine=m68030-sony - os=-newsos + *mint | mint[0-9]* | *MiNT | *MiNT[0-9]*) + cpu=m68k + vendor=atari + basic_os=mint ;; news-3600 | risc-news) - basic_machine=mips-sony - os=-newsos - ;; - necv70) - basic_machine=v70-nec - os=-sysv - ;; - next | m*-next ) - basic_machine=m68k-next - case $os in - -nextstep* ) + cpu=mips + vendor=sony + basic_os=newsos + ;; + next | m*-next) + cpu=m68k + vendor=next + case $basic_os in + openstep*) + ;; + nextstep*) ;; - -ns2*) - os=-nextstep2 + ns2*) + basic_os=nextstep2 ;; *) - os=-nextstep3 + basic_os=nextstep3 ;; esac ;; - nh3000) - basic_machine=m68k-harris - os=-cxux - ;; - nh[45]000) - basic_machine=m88k-harris - os=-cxux - ;; - nindy960) - basic_machine=i960-intel - os=-nindy - ;; - mon960) - basic_machine=i960-intel - os=-mon960 - ;; - nonstopux) - basic_machine=mips-compaq - os=-nonstopux - ;; np1) - basic_machine=np1-gould - ;; - nsr-tandem) - basic_machine=nsr-tandem + cpu=np1 + vendor=gould ;; op50n-* | op60c-*) - basic_machine=hppa1.1-oki - os=-proelf - ;; - or32 | or32-*) - basic_machine=or32-unknown - os=-coff - ;; - OSE68000 | ose68000) - basic_machine=m68000-ericsson - os=-ose - ;; - os68k) - basic_machine=m68k-none - os=-os68k + cpu=hppa1.1 + vendor=oki + basic_os=proelf ;; pa-hitachi) - basic_machine=hppa1.1-hitachi - os=-hiuxwe2 - ;; - paragon) - basic_machine=i860-intel - os=-osf + cpu=hppa1.1 + vendor=hitachi + basic_os=hiuxwe2 ;; pbd) - basic_machine=sparc-tti + cpu=sparc + vendor=tti ;; pbb) 
- basic_machine=m68k-tti - ;; - pc532 | pc532-*) - basic_machine=ns32k-pc532 - ;; - pentium | p5 | k5 | k6 | nexgen | viac3) - basic_machine=i586-pc - ;; - pentiumpro | p6 | 6x86 | athlon) - basic_machine=i686-pc - ;; - pentiumii | pentium2) - basic_machine=i686-pc - ;; - pentium-* | p5-* | k5-* | k6-* | nexgen-* | viac3-*) - basic_machine=i586-`echo $basic_machine | sed 's/^[^-]*-//'` - ;; - pentiumpro-* | p6-* | 6x86-* | athlon-*) - basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'` + cpu=m68k + vendor=tti ;; - pentiumii-* | pentium2-*) - basic_machine=i686-`echo $basic_machine | sed 's/^[^-]*-//'` + pc532) + cpu=ns32k + vendor=pc532 ;; pn) - basic_machine=pn-gould + cpu=pn + vendor=gould ;; - power) basic_machine=power-ibm + power) + cpu=power + vendor=ibm ;; - ppc) basic_machine=powerpc-unknown - ;; - ppc-*) basic_machine=powerpc-`echo $basic_machine | sed 's/^[^-]*-//'` + ps2) + cpu=i386 + vendor=ibm ;; - ppcle | powerpclittle | ppc-le | powerpc-little) - basic_machine=powerpcle-unknown - ;; - ppcle-* | powerpclittle-*) - basic_machine=powerpcle-`echo $basic_machine | sed 's/^[^-]*-//'` + rm[46]00) + cpu=mips + vendor=siemens ;; - ppc64) basic_machine=powerpc64-unknown - ;; - ppc64-*) basic_machine=powerpc64-`echo $basic_machine | sed 's/^[^-]*-//'` + rtpc | rtpc-*) + cpu=romp + vendor=ibm ;; - ppc64le | powerpc64little | ppc64-le | powerpc64-little) - basic_machine=powerpc64le-unknown - ;; - ppc64le-* | powerpc64little-*) - basic_machine=powerpc64le-`echo $basic_machine | sed 's/^[^-]*-//'` + sde) + cpu=mipsisa32 + vendor=sde + basic_os=${basic_os:-elf} ;; - ps2) - basic_machine=i386-ibm + simso-wrs) + cpu=sparclite + vendor=wrs + basic_os=vxworks ;; - pw32) - basic_machine=i586-unknown - os=-pw32 + tower | tower-32) + cpu=m68k + vendor=ncr ;; - rom68k) - basic_machine=m68k-rom68k - os=-coff + vpp*|vx|vx-*) + cpu=f301 + vendor=fujitsu ;; - rm[46]00) - basic_machine=mips-siemens + w65) + cpu=w65 + vendor=wdc ;; - rtpc | rtpc-*) - 
basic_machine=romp-ibm + w89k-*) + cpu=hppa1.1 + vendor=winbond + basic_os=proelf ;; - s390 | s390-*) - basic_machine=s390-ibm + none) + cpu=none + vendor=none ;; - s390x | s390x-*) - basic_machine=s390x-ibm + leon|leon[3-9]) + cpu=sparc + vendor=$basic_machine ;; - sa29200) - basic_machine=a29k-amd - os=-udi + leon-*|leon[3-9]-*) + cpu=sparc + vendor=`echo "$basic_machine" | sed 's/-.*//'` ;; - sequent) - basic_machine=i386-sequent + + *-*) + # shellcheck disable=SC2162 + saved_IFS=$IFS + IFS="-" read cpu vendor <&2 - exit 1 + # Recognize the canonical CPU types that are allowed with any + # company name. + case $cpu in + 1750a | 580 \ + | a29k \ + | aarch64 | aarch64_be \ + | abacus \ + | alpha | alphaev[4-8] | alphaev56 | alphaev6[78] \ + | alpha64 | alpha64ev[4-8] | alpha64ev56 | alpha64ev6[78] \ + | alphapca5[67] | alpha64pca5[67] \ + | am33_2.0 \ + | amdgcn \ + | arc | arceb | arc32 | arc64 \ + | arm | arm[lb]e | arme[lb] | armv* \ + | avr | avr32 \ + | asmjs \ + | ba \ + | be32 | be64 \ + | bfin | bpf | bs2000 \ + | c[123]* | c30 | [cjt]90 | c4x \ + | c8051 | clipper | craynv | csky | cydra \ + | d10v | d30v | dlx | dsp16xx \ + | e2k | elxsi | epiphany \ + | f30[01] | f700 | fido | fr30 | frv | ft32 | fx80 \ + | h8300 | h8500 \ + | hppa | hppa1.[01] | hppa2.0 | hppa2.0[nw] | hppa64 \ + | hexagon \ + | i370 | i*86 | i860 | i960 | ia16 | ia64 \ + | ip2k | iq2000 \ + | k1om \ + | le32 | le64 \ + | lm32 \ + | loongarch32 | loongarch64 \ + | m32c | m32r | m32rle \ + | m5200 | m68000 | m680[012346]0 | m68360 | m683?2 | m68k \ + | m6811 | m68hc11 | m6812 | m68hc12 | m68hcs12x \ + | m88110 | m88k | maxq | mb | mcore | mep | metag \ + | microblaze | microblazeel \ + | mips | mipsbe | mipseb | mipsel | mipsle \ + | mips16 \ + | mips64 | mips64eb | mips64el \ + | mips64octeon | mips64octeonel \ + | mips64orion | mips64orionel \ + | mips64r5900 | mips64r5900el \ + | mips64vr | mips64vrel \ + | mips64vr4100 | mips64vr4100el \ + | mips64vr4300 | mips64vr4300el \ + | 
mips64vr5000 | mips64vr5000el \ + | mips64vr5900 | mips64vr5900el \ + | mipsisa32 | mipsisa32el \ + | mipsisa32r2 | mipsisa32r2el \ + | mipsisa32r3 | mipsisa32r3el \ + | mipsisa32r5 | mipsisa32r5el \ + | mipsisa32r6 | mipsisa32r6el \ + | mipsisa64 | mipsisa64el \ + | mipsisa64r2 | mipsisa64r2el \ + | mipsisa64r3 | mipsisa64r3el \ + | mipsisa64r5 | mipsisa64r5el \ + | mipsisa64r6 | mipsisa64r6el \ + | mipsisa64sb1 | mipsisa64sb1el \ + | mipsisa64sr71k | mipsisa64sr71kel \ + | mipsr5900 | mipsr5900el \ + | mipstx39 | mipstx39el \ + | mmix \ + | mn10200 | mn10300 \ + | moxie \ + | mt \ + | msp430 \ + | nds32 | nds32le | nds32be \ + | nfp \ + | nios | nios2 | nios2eb | nios2el \ + | none | np1 | ns16k | ns32k | nvptx \ + | open8 \ + | or1k* \ + | or32 \ + | orion \ + | picochip \ + | pdp10 | pdp11 | pj | pjl | pn | power \ + | powerpc | powerpc64 | powerpc64le | powerpcle | powerpcspe \ + | pru \ + | pyramid \ + | riscv | riscv32 | riscv32be | riscv64 | riscv64be \ + | rl78 | romp | rs6000 | rx \ + | s390 | s390x \ + | score \ + | sh | shl \ + | sh[1234] | sh[24]a | sh[24]ae[lb] | sh[23]e | she[lb] | sh[lb]e \ + | sh[1234]e[lb] | sh[12345][lb]e | sh[23]ele | sh64 | sh64le \ + | sparc | sparc64 | sparc64b | sparc64v | sparc86x | sparclet \ + | sparclite \ + | sparcv8 | sparcv9 | sparcv9b | sparcv9v | sv1 | sx* \ + | spu \ + | tahoe \ + | thumbv7* \ + | tic30 | tic4x | tic54x | tic55x | tic6x | tic80 \ + | tron \ + | ubicom32 \ + | v70 | v850 | v850e | v850e1 | v850es | v850e2 | v850e2v3 \ + | vax \ + | visium \ + | w65 \ + | wasm32 | wasm64 \ + | we32k \ + | x86 | x86_64 | xc16x | xgate | xps100 \ + | xstormy16 | xtensa* \ + | ymp \ + | z8k | z80) + ;; + + *) + echo Invalid configuration \`"$1"\': machine \`"$cpu-$vendor"\' not recognized 1>&2 + exit 1 + ;; + esac ;; esac # Here we canonicalize certain aliases for manufacturers. 
-case $basic_machine in - *-digital*) - basic_machine=`echo $basic_machine | sed 's/digital.*/dec/'` +case $vendor in + digital*) + vendor=dec ;; - *-commodore*) - basic_machine=`echo $basic_machine | sed 's/commodore.*/cbm/'` + commodore*) + vendor=cbm ;; *) ;; @@ -1047,165 +1306,219 @@ esac # Decode manufacturer-specific aliases for certain operating systems. -if [ x"$os" != x"" ] +if test x$basic_os != x then + +# First recognize some ad-hoc cases, or perhaps split kernel-os, or else just +# set os. +case $basic_os in + gnu/linux*) + kernel=linux + os=`echo "$basic_os" | sed -e 's|gnu/linux|gnu|'` + ;; + os2-emx) + kernel=os2 + os=`echo "$basic_os" | sed -e 's|os2-emx|emx|'` + ;; + nto-qnx*) + kernel=nto + os=`echo "$basic_os" | sed -e 's|nto-qnx|qnx|'` + ;; + *-*) + # shellcheck disable=SC2162 + saved_IFS=$IFS + IFS="-" read kernel os <&2 - exit 1 + # No normalization, but not necessarily accepted, that comes below. ;; esac + else # Here we handle the default operating systems that come with various machines. @@ -1218,225 +1531,376 @@ else # will signal an error saying that MANUFACTURER isn't an operating # system, and we'll never get to this point. -case $basic_machine in +kernel= +case $cpu-$vendor in + score-*) + os=elf + ;; + spu-*) + os=elf + ;; *-acorn) - os=-riscix1.2 + os=riscix1.2 ;; arm*-rebel) - os=-linux + kernel=linux + os=gnu ;; arm*-semi) - os=-aout + os=aout + ;; + c4x-* | tic4x-*) + os=coff + ;; + c8051-*) + os=elf + ;; + clipper-intergraph) + os=clix + ;; + hexagon-*) + os=elf + ;; + tic54x-*) + os=coff + ;; + tic55x-*) + os=coff + ;; + tic6x-*) + os=coff ;; # This must come before the *-dec entry. pdp10-*) - os=-tops20 + os=tops20 ;; - pdp11-*) - os=-none + pdp11-*) + os=none ;; *-dec | vax-*) - os=-ultrix4.2 + os=ultrix4.2 ;; m68*-apollo) - os=-domain + os=domain ;; i386-sun) - os=-sunos4.0.2 + os=sunos4.0.2 ;; m68000-sun) - os=-sunos3 - # This also exists in the configure program, but was not the - # default. 
- # os=-sunos4 + os=sunos3 ;; m68*-cisco) - os=-aout + os=aout + ;; + mep-*) + os=elf ;; mips*-cisco) - os=-elf + os=elf ;; mips*-*) - os=-elf + os=elf ;; or32-*) - os=-coff + os=coff ;; *-tti) # must be before sparc entry or we get the wrong os. - os=-sysv3 + os=sysv3 ;; sparc-* | *-sun) - os=-sunos4.1.1 + os=sunos4.1.1 + ;; + pru-*) + os=elf ;; *-be) - os=-beos + os=beos ;; *-ibm) - os=-aix + os=aix + ;; + *-knuth) + os=mmixware ;; *-wec) - os=-proelf + os=proelf ;; *-winbond) - os=-proelf + os=proelf ;; *-oki) - os=-proelf + os=proelf ;; *-hp) - os=-hpux + os=hpux ;; *-hitachi) - os=-hiux + os=hiux ;; i860-* | *-att | *-ncr | *-altos | *-motorola | *-convergent) - os=-sysv + os=sysv ;; *-cbm) - os=-amigaos + os=amigaos ;; *-dg) - os=-dgux + os=dgux ;; *-dolphin) - os=-sysv3 + os=sysv3 ;; m68k-ccur) - os=-rtu + os=rtu ;; m88k-omron*) - os=-luna + os=luna ;; - *-next ) - os=-nextstep + *-next) + os=nextstep ;; *-sequent) - os=-ptx + os=ptx ;; *-crds) - os=-unos + os=unos ;; *-ns) - os=-genix + os=genix ;; i370-*) - os=-mvs - ;; - *-next) - os=-nextstep3 + os=mvs ;; - *-gould) - os=-sysv + *-gould) + os=sysv ;; - *-highlevel) - os=-bsd + *-highlevel) + os=bsd ;; *-encore) - os=-bsd + os=bsd ;; - *-sgi) - os=-irix + *-sgi) + os=irix ;; - *-siemens) - os=-sysv4 + *-siemens) + os=sysv4 ;; *-masscomp) - os=-rtu + os=rtu ;; f30[01]-fujitsu | f700-fujitsu) - os=-uxpv + os=uxpv ;; *-rom68k) - os=-coff + os=coff ;; *-*bug) - os=-coff + os=coff ;; *-apple) - os=-macos + os=macos ;; *-atari*) - os=-mint + os=mint + ;; + *-wrs) + os=vxworks ;; *) - os=-none + os=none ;; esac + fi +# Now, validate our (potentially fixed-up) OS. +case $os in + # Sometimes we do "kernel-libc", so those need to count as OSes. + musl* | newlib* | relibc* | uclibc*) + ;; + # Likewise for "kernel-abi" + eabi* | gnueabi*) + ;; + # VxWorks passes extra cpu info in the 4th filed. + simlinux | simwindows | spe) + ;; + # Now accept the basic system types. + # The portable systems comes first. 
+ # Each alternative MUST end in a * to match a version number. + gnu* | android* | bsd* | mach* | minix* | genix* | ultrix* | irix* \ + | *vms* | esix* | aix* | cnk* | sunos | sunos[34]* \ + | hpux* | unos* | osf* | luna* | dgux* | auroraux* | solaris* \ + | sym* | plan9* | psp* | sim* | xray* | os68k* | v88r* \ + | hiux* | abug | nacl* | netware* | windows* \ + | os9* | macos* | osx* | ios* \ + | mpw* | magic* | mmixware* | mon960* | lnews* \ + | amigaos* | amigados* | msdos* | newsos* | unicos* | aof* \ + | aos* | aros* | cloudabi* | sortix* | twizzler* \ + | nindy* | vxsim* | vxworks* | ebmon* | hms* | mvs* \ + | clix* | riscos* | uniplus* | iris* | isc* | rtu* | xenix* \ + | mirbsd* | netbsd* | dicos* | openedition* | ose* \ + | bitrig* | openbsd* | secbsd* | solidbsd* | libertybsd* | os108* \ + | ekkobsd* | freebsd* | riscix* | lynxos* | os400* \ + | bosx* | nextstep* | cxux* | aout* | elf* | oabi* \ + | ptx* | coff* | ecoff* | winnt* | domain* | vsta* \ + | udi* | lites* | ieee* | go32* | aux* | hcos* \ + | chorusrdb* | cegcc* | glidix* | serenity* \ + | cygwin* | msys* | pe* | moss* | proelf* | rtems* \ + | midipix* | mingw32* | mingw64* | mint* \ + | uxpv* | beos* | mpeix* | udk* | moxiebox* \ + | interix* | uwin* | mks* | rhapsody* | darwin* \ + | openstep* | oskit* | conix* | pw32* | nonstopux* \ + | storm-chaos* | tops10* | tenex* | tops20* | its* \ + | os2* | vos* | palmos* | uclinux* | nucleus* | morphos* \ + | scout* | superux* | sysv* | rtmk* | tpf* | windiss* \ + | powermax* | dnix* | nx6 | nx7 | sei* | dragonfly* \ + | skyos* | haiku* | rdos* | toppers* | drops* | es* \ + | onefs* | tirtos* | phoenix* | fuchsia* | redox* | bme* \ + | midnightbsd* | amdhsa* | unleashed* | emscripten* | wasi* \ + | nsk* | powerunix* | genode* | zvmoe* | qnx* | emx* | zephyr* \ + | fiwix* | mlibc* ) + ;; + # This one is extra strict with allowed versions + sco3.2v2 | sco3.2v[4-9]* | sco5v6*) + # Don't forget version if it is 3.2v4 or newer. 
+ ;; + none) + ;; + kernel* ) + # Restricted further below + ;; + *) + echo Invalid configuration \`"$1"\': OS \`"$os"\' not recognized 1>&2 + exit 1 + ;; +esac + +# As a final step for OS-related things, validate the OS-kernel combination +# (given a valid OS), if there is a kernel. +case $kernel-$os in + linux-gnu* | linux-dietlibc* | linux-android* | linux-newlib* \ + | linux-musl* | linux-relibc* | linux-uclibc* | linux-mlibc* ) + ;; + uclinux-uclibc* ) + ;; + managarm-mlibc* | managarm-kernel* ) + ;; + -dietlibc* | -newlib* | -musl* | -relibc* | -uclibc* | -mlibc* ) + # These are just libc implementations, not actual OSes, and thus + # require a kernel. + echo "Invalid configuration \`$1': libc \`$os' needs explicit kernel." 1>&2 + exit 1 + ;; + -kernel* ) + echo "Invalid configuration \`$1': \`$os' needs explicit kernel." 1>&2 + exit 1 + ;; + *-kernel* ) + echo "Invalid configuration \`$1': \`$kernel' does not support \`$os'." 1>&2 + exit 1 + ;; + kfreebsd*-gnu* | kopensolaris*-gnu*) + ;; + vxworks-simlinux | vxworks-simwindows | vxworks-spe) + ;; + nto-qnx*) + ;; + os2-emx) + ;; + *-eabi* | *-gnueabi*) + ;; + -*) + # Blank kernel with real OS is always fine. + ;; + *-*) + echo "Invalid configuration \`$1': Kernel \`$kernel' not known to work with OS \`$os'." 1>&2 + exit 1 + ;; +esac + # Here we handle the case where we know the os, and the CPU type, but not the # manufacturer. We pick the logical manufacturer. 
-vendor=unknown -case $basic_machine in - *-unknown) - case $os in - -riscix*) +case $vendor in + unknown) + case $cpu-$os in + *-riscix*) vendor=acorn ;; - -sunos*) + *-sunos*) vendor=sun ;; - -aix*) + *-cnk* | *-aix*) vendor=ibm ;; - -beos*) + *-beos*) vendor=be ;; - -hpux*) + *-hpux*) vendor=hp ;; - -mpeix*) + *-mpeix*) vendor=hp ;; - -hiux*) + *-hiux*) vendor=hitachi ;; - -unos*) + *-unos*) vendor=crds ;; - -dgux*) + *-dgux*) vendor=dg ;; - -luna*) + *-luna*) vendor=omron ;; - -genix*) + *-genix*) vendor=ns ;; - -mvs* | -opened*) + *-clix*) + vendor=intergraph + ;; + *-mvs* | *-opened*) vendor=ibm ;; - -ptx*) + *-os400*) + vendor=ibm + ;; + s390-* | s390x-*) + vendor=ibm + ;; + *-ptx*) vendor=sequent ;; - -vxsim* | -vxworks*) + *-tpf*) + vendor=ibm + ;; + *-vxsim* | *-vxworks* | *-windiss*) vendor=wrs ;; - -aux*) + *-aux*) vendor=apple ;; - -hms*) + *-hms*) vendor=hitachi ;; - -mpw* | -macos*) + *-mpw* | *-macos*) vendor=apple ;; - -*mint | -mint[0-9]* | -*MiNT | -MiNT[0-9]*) + *-*mint | *-mint[0-9]* | *-*MiNT | *-MiNT[0-9]*) vendor=atari ;; - -vos*) + *-vos*) vendor=stratus ;; esac - basic_machine=`echo $basic_machine | sed "s/unknown/$vendor/"` ;; esac -echo $basic_machine$os -exit 0 +echo "$cpu-$vendor-${kernel:+$kernel-}$os" +exit # Local variables: -# eval: (add-hook 'write-file-hooks 'time-stamp) +# eval: (add-hook 'before-save-hook 'time-stamp) # time-stamp-start: "timestamp='" # time-stamp-format: "%:y-%02m-%02d" # time-stamp-end: "'" diff --git a/configure b/configure index 7858d04e..c8aceae3 100755 --- a/configure +++ b/configure @@ -1,10 +1,11 @@ #! /bin/sh # From configure.ac Revision: 1.2 . # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.69. +# Generated by GNU Autoconf 2.71. # # -# Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc. +# Copyright (C) 1992-1996, 1998-2017, 2020-2021 Free Software Foundation, +# Inc. 
# # # This configure script is free software; the Free Software Foundation @@ -15,14 +16,16 @@ # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh -if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : +as_nop=: +if test ${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1 +then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST -else +else $as_nop case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( @@ -32,46 +35,46 @@ esac fi + +# Reset variables that may have inherited troublesome values from +# the environment. + +# IFS needs to be set, to space, tab, and newline, in precisely that order. +# (If _AS_PATH_WALK were called with IFS unset, it would have the +# side effect of setting IFS to empty, thus disabling word splitting.) +# Quoting is to prevent editors from complaining about space-tab. as_nl=' ' export as_nl -# Printing a long string crashes Solaris 7 /usr/bin/printf. -as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo -# Prefer a ksh shell builtin over an external printf program on Solaris, -# but without wasting forks for bash or zsh. 
-if test -z "$BASH_VERSION$ZSH_VERSION" \ - && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='print -r --' - as_echo_n='print -rn --' -elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='printf %s\n' - as_echo_n='printf %s' -else - if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then - as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' - as_echo_n='/usr/ucb/echo -n' - else - as_echo_body='eval expr "X$1" : "X\\(.*\\)"' - as_echo_n_body='eval - arg=$1; - case $arg in #( - *"$as_nl"*) - expr "X$arg" : "X\\(.*\\)$as_nl"; - arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; - esac; - expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" - ' - export as_echo_n_body - as_echo_n='sh -c $as_echo_n_body as_echo' - fi - export as_echo_body - as_echo='sh -c $as_echo_body as_echo' -fi +IFS=" "" $as_nl" + +PS1='$ ' +PS2='> ' +PS4='+ ' + +# Ensure predictable behavior from utilities with locale-dependent output. +LC_ALL=C +export LC_ALL +LANGUAGE=C +export LANGUAGE + +# We cannot yet rely on "unset" to work, but we need these variables +# to be unset--not just set to an empty or harmless value--now, to +# avoid bugs in old shells (e.g. pre-3.0 UWIN ksh). This construct +# also avoids known problems related to "unset" and subshell syntax +# in other old shells (e.g. bash 2.01 and pdksh 5.2.14). +for as_var in BASH_ENV ENV MAIL MAILPATH CDPATH +do eval test \${$as_var+y} \ + && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : +done + +# Ensure that fds 0, 1, and 2 are open. +if (exec 3>&0) 2>/dev/null; then :; else exec 0</dev/null; fi +if (exec 3>&1) 2>/dev/null; then :; else exec 1>/dev/null; fi +if (exec 3>&2) ; then :; else exec 2>/dev/null; fi # The user is always right.
-if test "${PATH_SEPARATOR+set}" != set; then +if ${PATH_SEPARATOR+false} :; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || @@ -80,13 +83,6 @@ if test "${PATH_SEPARATOR+set}" != set; then fi -# IFS -# We need space, tab and new line, in precisely that order. Quoting is -# there to prevent editors from complaining about space-tab. -# (If _AS_PATH_WALK were called with IFS unset, it would disable word -# splitting by setting IFS to empty value.) -IFS=" "" $as_nl" - # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( @@ -95,8 +91,12 @@ case $0 in #(( for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + test -r "$as_dir$0" && as_myself=$as_dir$0 && break done IFS=$as_save_IFS @@ -108,30 +108,10 @@ if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then - $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 + printf "%s\n" "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi -# Unset variables that we do not need and which cause bugs (e.g. in -# pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" -# suppresses any "Segmentation fault" message there. '((' could -# trigger a bug in pdksh 5.2.14. -for as_var in BASH_ENV ENV MAIL MAILPATH -do eval test x\${$as_var+set} = xset \ - && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : -done -PS1='$ ' -PS2='> ' -PS4='+ ' - -# NLS nuisances. -LC_ALL=C -export LC_ALL -LANGUAGE=C -export LANGUAGE - -# CDPATH. -(unset CDPATH) >/dev/null 2>&1 && unset CDPATH # Use a proper internal environment variable to ensure we don't fall # into an infinite loop, continuously re-executing ourselves. 
@@ -153,20 +133,22 @@ esac exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"} # Admittedly, this is quite paranoid, since all the known shells bail # out after a failed `exec'. -$as_echo "$0: could not re-execute with $CONFIG_SHELL" >&2 -as_fn_exit 255 +printf "%s\n" "$0: could not re-execute with $CONFIG_SHELL" >&2 +exit 255 fi # We don't want this to propagate to other subprocesses. { _as_can_reexec=; unset _as_can_reexec;} if test "x$CONFIG_SHELL" = x; then - as_bourne_compatible="if test -n \"\${ZSH_VERSION+set}\" && (emulate sh) >/dev/null 2>&1; then : + as_bourne_compatible="as_nop=: +if test \${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1 +then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on \${1+\"\$@\"}, which # is contrary to our usage. Disable this feature. alias -g '\${1+\"\$@\"}'='\"\$@\"' setopt NO_GLOB_SUBST -else +else \$as_nop case \`(set -o) 2>/dev/null\` in #( *posix*) : set -o posix ;; #( @@ -186,42 +168,53 @@ as_fn_success || { exitcode=1; echo as_fn_success failed.; } as_fn_failure && { exitcode=1; echo as_fn_failure succeeded.; } as_fn_ret_success || { exitcode=1; echo as_fn_ret_success failed.; } as_fn_ret_failure && { exitcode=1; echo as_fn_ret_failure succeeded.; } -if ( set x; as_fn_ret_success y && test x = \"\$1\" ); then : +if ( set x; as_fn_ret_success y && test x = \"\$1\" ) +then : -else +else \$as_nop exitcode=1; echo positional parameters were not saved. 
fi test x\$exitcode = x0 || exit 1 +blah=\$(echo \$(echo blah)) +test x\"\$blah\" = xblah || exit 1 test -x / || exit 1" as_suggested=" as_lineno_1=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_1a=\$LINENO as_lineno_2=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_2a=\$LINENO eval 'test \"x\$as_lineno_1'\$as_run'\" != \"x\$as_lineno_2'\$as_run'\" && test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1 test \$(( 1 + 1 )) = 2 || exit 1" - if (eval "$as_required") 2>/dev/null; then : + if (eval "$as_required") 2>/dev/null +then : as_have_required=yes -else +else $as_nop as_have_required=no fi - if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null; then : + if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null +then : -else +else $as_nop as_save_IFS=$IFS; IFS=$PATH_SEPARATOR as_found=false for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac as_found=: case $as_dir in #( /*) for as_base in sh bash ksh sh5; do # Try only shells that exist, to save several forks. 
- as_shell=$as_dir/$as_base + as_shell=$as_dir$as_base if { test -f "$as_shell" || test -f "$as_shell.exe"; } && - { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$as_shell"; } 2>/dev/null; then : + as_run=a "$as_shell" -c "$as_bourne_compatible""$as_required" 2>/dev/null +then : CONFIG_SHELL=$as_shell as_have_required=yes - if { $as_echo "$as_bourne_compatible""$as_suggested" | as_run=a "$as_shell"; } 2>/dev/null; then : + if as_run=a "$as_shell" -c "$as_bourne_compatible""$as_suggested" 2>/dev/null +then : break 2 fi fi @@ -229,14 +222,21 @@ fi esac as_found=false done -$as_found || { if { test -f "$SHELL" || test -f "$SHELL.exe"; } && - { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$SHELL"; } 2>/dev/null; then : - CONFIG_SHELL=$SHELL as_have_required=yes -fi; } IFS=$as_save_IFS +if $as_found +then : + +else $as_nop + if { test -f "$SHELL" || test -f "$SHELL.exe"; } && + as_run=a "$SHELL" -c "$as_bourne_compatible""$as_required" 2>/dev/null +then : + CONFIG_SHELL=$SHELL as_have_required=yes +fi +fi - if test "x$CONFIG_SHELL" != x; then : + if test "x$CONFIG_SHELL" != x +then : export CONFIG_SHELL # We cannot yet assume a decent shell, so we have to provide a # neutralization value for shells without unset; and this also @@ -254,18 +254,19 @@ esac exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"} # Admittedly, this is quite paranoid, since all the known shells bail # out after a failed `exec'. -$as_echo "$0: could not re-execute with $CONFIG_SHELL" >&2 +printf "%s\n" "$0: could not re-execute with $CONFIG_SHELL" >&2 exit 255 fi - if test x$as_have_required = xno; then : - $as_echo "$0: This script requires a shell more modern than all" - $as_echo "$0: the shells that I found on your system." - if test x${ZSH_VERSION+set} = xset ; then - $as_echo "$0: In particular, zsh $ZSH_VERSION has bugs and should" - $as_echo "$0: be upgraded to zsh 4.3.4 or later." 
+ if test x$as_have_required = xno +then : + printf "%s\n" "$0: This script requires a shell more modern than all" + printf "%s\n" "$0: the shells that I found on your system." + if test ${ZSH_VERSION+y} ; then + printf "%s\n" "$0: In particular, zsh $ZSH_VERSION has bugs and should" + printf "%s\n" "$0: be upgraded to zsh 4.3.4 or later." else - $as_echo "$0: Please tell bug-autoconf@gnu.org about your system, + printf "%s\n" "$0: Please tell bug-autoconf@gnu.org about your system, $0: including any error possibly output before this $0: message. Then install a modern shell, or manually run $0: the script under such a shell if you do have one." @@ -292,6 +293,7 @@ as_fn_unset () } as_unset=as_fn_unset + # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. @@ -309,6 +311,14 @@ as_fn_exit () as_fn_set_status $1 exit $1 } # as_fn_exit +# as_fn_nop +# --------- +# Do nothing but, unlike ":", preserve the value of $?. +as_fn_nop () +{ + return $? +} +as_nop=as_fn_nop # as_fn_mkdir_p # ------------- @@ -323,7 +333,7 @@ as_fn_mkdir_p () as_dirs= while :; do case $as_dir in #( - *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( + *\'*) as_qdir=`printf "%s\n" "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" @@ -332,7 +342,7 @@ $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$as_dir" | +printf "%s\n" X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -371,12 +381,13 @@ as_fn_executable_p () # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. 
-if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : +if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null +then : eval 'as_fn_append () { eval $1+=\$2 }' -else +else $as_nop as_fn_append () { eval $1=\$$1\$2 @@ -388,18 +399,27 @@ fi # as_fn_append # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. -if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : +if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null +then : eval 'as_fn_arith () { as_val=$(( $* )) }' -else +else $as_nop as_fn_arith () { as_val=`expr "$@" || test $? -eq 1` } fi # as_fn_arith +# as_fn_nop +# --------- +# Do nothing but, unlike ":", preserve the value of $?. +as_fn_nop () +{ + return $? +} +as_nop=as_fn_nop # as_fn_error STATUS ERROR [LINENO LOG_FD] # ---------------------------------------- @@ -411,9 +431,9 @@ as_fn_error () as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi - $as_echo "$as_me: error: $2" >&2 + printf "%s\n" "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error @@ -440,7 +460,7 @@ as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 
2>/dev/null || -$as_echo X/"$0" | +printf "%s\n" X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q @@ -484,7 +504,7 @@ as_cr_alnum=$as_cr_Letters$as_cr_digits s/-\n.*// ' >$as_me.lineno && chmod +x "$as_me.lineno" || - { $as_echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2; as_fn_exit 1; } + { printf "%s\n" "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2; as_fn_exit 1; } # If we had to re-execute with $CONFIG_SHELL, we're ensured to have # already done that, so ensure we don't try to do so again and fall @@ -498,6 +518,10 @@ as_cr_alnum=$as_cr_Letters$as_cr_digits exit } + +# Determine whether it's possible to make 'echo' print without a newline. +# These variables are no longer used directly by Autoconf, but are AC_SUBSTed +# for compatibility with existing Makefiles. ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) @@ -511,6 +535,13 @@ case `echo -n x` in #((((( ECHO_N='-n';; esac +# For backward compatibility with old third-party macros, we provide +# the shell variables $as_echo and $as_echo_n. New code should use +# AS_ECHO(["message"]) and AS_ECHO_N(["message"]), respectively. +as_echo='printf %s\n' +as_echo_n='printf %s' + + rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file @@ -636,52 +667,48 @@ MFLAGS= MAKEFLAGS= # Identity of this package. -PACKAGE_NAME= -PACKAGE_TARNAME= -PACKAGE_VERSION= -PACKAGE_STRING= -PACKAGE_BUGREPORT= -PACKAGE_URL= +PACKAGE_NAME='' +PACKAGE_TARNAME='' +PACKAGE_VERSION='' +PACKAGE_STRING='' +PACKAGE_BUGREPORT='' +PACKAGE_URL='' ac_unique_file="src/hypermail.c" ac_default_prefix=/usr/local enable_option_checking=no # Factoring default headers for most tests. 
ac_includes_default="\ -#include <stdio.h> -#ifdef HAVE_SYS_TYPES_H -# include <sys/types.h> -#endif -#ifdef HAVE_SYS_STAT_H -# include <sys/stat.h> +#include <stddef.h> +#ifdef HAVE_STDIO_H +# include <stdio.h> #endif -#ifdef STDC_HEADERS +#ifdef HAVE_STDLIB_H # include <stdlib.h> -# include <stddef.h> -#else -# ifdef HAVE_STDLIB_H -# include <stdlib.h> -# endif #endif #ifdef HAVE_STRING_H -# if !defined STDC_HEADERS && defined HAVE_MEMORY_H -# include <memory.h> -# endif # include <string.h> #endif -#ifdef HAVE_STRINGS_H -# include <strings.h> -#endif #ifdef HAVE_INTTYPES_H # include <inttypes.h> #endif #ifdef HAVE_STDINT_H # include <stdint.h> #endif +#ifdef HAVE_STRINGS_H +# include <strings.h> +#endif +#ifdef HAVE_SYS_TYPES_H +# include <sys/types.h> +#endif +#ifdef HAVE_SYS_STAT_H +# include <sys/stat.h> +#endif #ifdef HAVE_UNISTD_H # include <unistd.h> #endif" +ac_header_c_list= ac_subst_vars='LTLIBOBJS LIBOBJS INCLUDES @@ -689,8 +716,8 @@ EXTRA_LIBS HAVE_STRERROR HAVE_MEMMOVE FNV_DEP -PCRE_DEP -PCRE_CONFIG +PCRE2_DEP +PCRE2_CONFIG TRIO_DEP EGREP GREP @@ -698,10 +725,10 @@ domainaddr defaultindex htmlsuffix language -cgidir httpddir suffix INSTALL +PKGCONF RANLIB AR SET_MAKE @@ -776,17 +803,16 @@ ac_user_opts=' enable_option_checking enable_warnings with_httpddir -with_cgidir with_htmldir with_language with_htmlsuffix enable_defaultindex with_domainaddr with_gdbm -enable_i18n enable_system_libtrio enable_bundled_pcre -with_external_pcre +with_external_pcre2 +with_libchardet enable_libfnv ' ac_precious_vars='build_alias @@ -868,8 +894,6 @@ do *) ac_optarg=yes ;; esac - # Accept the important Cygnus configure options, so we can diagnose typos. - case $ac_dashdash$ac_option in --) ac_dashdash=yes ;; @@ -910,9 +934,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*disable-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? 
"invalid feature name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" @@ -936,9 +960,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid feature name: $ac_useropt" + as_fn_error $? "invalid feature name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" @@ -1149,9 +1173,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*with-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid package name: $ac_useropt" + as_fn_error $? "invalid package name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" @@ -1165,9 +1189,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*without-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid package name: $ac_useropt" + as_fn_error $? "invalid package name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" @@ -1211,9 +1235,9 @@ Try \`$0 --help' for more information" *) # FIXME: should be removed in autoconf 3.0. 
- $as_echo "$as_me: WARNING: you should use --build, --host, --target" >&2 + printf "%s\n" "$as_me: WARNING: you should use --build, --host, --target" >&2 expr "x$ac_option" : ".*[^-._$as_cr_alnum]" >/dev/null && - $as_echo "$as_me: WARNING: invalid host type: $ac_option" >&2 + printf "%s\n" "$as_me: WARNING: invalid host type: $ac_option" >&2 : "${build_alias=$ac_option} ${host_alias=$ac_option} ${target_alias=$ac_option}" ;; @@ -1229,7 +1253,7 @@ if test -n "$ac_unrecognized_opts"; then case $enable_option_checking in no) ;; fatal) as_fn_error $? "unrecognized options: $ac_unrecognized_opts" ;; - *) $as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2 ;; + *) printf "%s\n" "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2 ;; esac fi @@ -1293,7 +1317,7 @@ $as_expr X"$as_myself" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_myself" : 'X\(//\)[^/]' \| \ X"$as_myself" : 'X\(//\)$' \| \ X"$as_myself" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$as_myself" | +printf "%s\n" X"$as_myself" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -1425,28 +1449,29 @@ Optional Features: --enable-FEATURE[=ARG] include FEATURE [ARG=yes] --enable-warnings Enable -Wall if using gcc. 
--enable-defaultindex=type Default index page type thread - --disable-i18n Disable I18N support --enable-system-libtrio Use the system libtrio instead of compiling the bundled one - --enable-bundled-pcre Force the use of the bundled PCRE library instead of - the system one - --enable-libfnv use the fnv hash library for generating - non-sequential filenames [no] + --enable-bundled-pcre2 Force the use of the bundled PCRE2 library instead + of the system one + --enable-libfnv (EXPERIMENTAL, UNMAINTAINED) use the fnv hash + library for generating non-sequential filenames [no] Optional Packages: --with-PACKAGE[=ARG] use PACKAGE [ARG=yes] --without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no) --with-httpddir=DIR webserver's root directory /usr/local/apache - --with-cgidir=DIR where to install CGI scripts --with-htmldir=DIR where to install Hypermail HTML pages --with-language=xx two character language indicator en --with-htmlsuffix=xx two character language indicator html --with-domainaddr=YOURDOMAIN domain address of local domain - --with-gdbm=DIR Include GDBM support - --with-external-pcre=PATH_TO_PCRE_DIR|PATH_TO_PCRE_CONFIG_SCRIPT - Use an external PCRE library instead of the system + --with-gdbm=DIR (UNMAINTAINED) Include GDBM support + --with-external-pcre2=PATH_TO_PCRE2_DIR|PATH_TO_PCRE2_CONFIG_SCRIPT + Use an external PCRE2 library instead of the system or the bundled one + --with-libchardet=DIR Use libchardet for character set detection, optional + DIR points to path to local installed libchardet; + leave empty for using system libchardet Some influential environment variables: CC C compiler command @@ -1483,9 +1508,9 @@ if test "$ac_init_help" = "recursive"; then case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) - ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` + ac_dir_suffix=/`printf "%s\n" "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. 
- ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` + ac_top_builddir_sub=`printf "%s\n" "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; @@ -1513,7 +1538,8 @@ esac ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix cd "$ac_dir" || { ac_status=$?; continue; } - # Check for guested configure. + # Check for configure.gnu first; this name is used for a wrapper for + # Metaconfig's "Configure" on case-insensitive file systems. if test -f "$ac_srcdir/configure.gnu"; then echo && $SHELL "$ac_srcdir/configure.gnu" --help=recursive @@ -1521,7 +1547,7 @@ ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix echo && $SHELL "$ac_srcdir/configure" --help=recursive else - $as_echo "$as_me: WARNING: no configuration information is in $ac_dir" >&2 + printf "%s\n" "$as_me: WARNING: no configuration information is in $ac_dir" >&2 fi || ac_status=$? cd "$ac_pwd" || { ac_status=$?; break; } done @@ -1531,9 +1557,9 @@ test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF configure -generated by GNU Autoconf 2.69 +generated by GNU Autoconf 2.71 -Copyright (C) 2012 Free Software Foundation, Inc. +Copyright (C) 2021 Free Software Foundation, Inc. This configure script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it. _ACEOF @@ -1550,14 +1576,14 @@ fi ac_fn_c_try_compile () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - rm -f conftest.$ac_objext + rm -f conftest.$ac_objext conftest.beam if { { ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>conftest.err ac_status=$? 
if test -s conftest.err; then @@ -1565,14 +1591,15 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err - } && test -s conftest.$ac_objext; then : + } && test -s conftest.$ac_objext +then : ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 @@ -1594,7 +1621,7 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_cpp conftest.$ac_ext") 2>conftest.err ac_status=$? if test -s conftest.err; then @@ -1602,14 +1629,15 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } > conftest.i && { test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || test ! -s conftest.err - }; then : + } +then : ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 @@ -1621,8 +1649,8 @@ fi # ac_fn_c_try_run LINENO # ---------------------- -# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes -# that executables *can* be run. +# Try to run conftest.$ac_ext, and return whether this succeeded. Assumes that +# executables *can* be run. 
ac_fn_c_try_run () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack @@ -1632,25 +1660,26 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { ac_try='./conftest$ac_exeext' { { case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_try") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; } +then : ac_retval=0 -else - $as_echo "$as_me: program exited with status $ac_status" >&5 - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: program exited with status $ac_status" >&5 + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=$ac_status @@ -1661,93 +1690,6 @@ fi } # ac_fn_c_try_run -# ac_fn_c_check_header_mongrel LINENO HEADER VAR INCLUDES -# ------------------------------------------------------- -# Tests whether HEADER exists, giving a warning if it cannot be compiled using -# the include files in INCLUDES and setting the cache variable VAR -# accordingly. -ac_fn_c_check_header_mongrel () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - if eval \${$3+:} false; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... 
" >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -else - # Is the header compilable? -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 usability" >&5 -$as_echo_n "checking $2 usability... " >&6; } -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -$4 -#include <$2> -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_header_compiler=yes -else - ac_header_compiler=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_compiler" >&5 -$as_echo "$ac_header_compiler" >&6; } - -# Is the header present? -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 presence" >&5 -$as_echo_n "checking $2 presence... " >&6; } -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include <$2> -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : - ac_header_preproc=yes -else - ac_header_preproc=no -fi -rm -f conftest.err conftest.i conftest.$ac_ext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_preproc" >&5 -$as_echo "$ac_header_preproc" >&6; } - -# So? What about this header? -case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in #(( - yes:no: ) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&5 -$as_echo "$as_me: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5 -$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;} - ;; - no:yes:* ) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: present but cannot be compiled" >&5 -$as_echo "$as_me: WARNING: $2: present but cannot be compiled" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: check for missing prerequisite headers?" 
>&5 -$as_echo "$as_me: WARNING: $2: check for missing prerequisite headers?" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: see the Autoconf documentation" >&5 -$as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&5 -$as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5 -$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;} - ;; -esac - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else - eval "$3=\$ac_header_compiler" -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -fi - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - -} # ac_fn_c_check_header_mongrel - # ac_fn_c_check_header_compile LINENO HEADER VAR INCLUDES # ------------------------------------------------------- # Tests whether HEADER exists and can be compiled using the include files in @@ -1755,26 +1697,28 @@ fi ac_fn_c_check_header_compile () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +printf %s "checking for $2... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ $4 #include <$2> _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : eval "$3=yes" -else +else $as_nop eval "$3=no" fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno } # ac_fn_c_check_header_compile @@ -1785,14 +1729,14 @@ $as_echo "$ac_res" >&6; } ac_fn_c_try_link () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - rm -f conftest.$ac_objext conftest$ac_exeext + rm -f conftest.$ac_objext conftest.beam conftest$ac_exeext if { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>conftest.err ac_status=$? if test -s conftest.err; then @@ -1800,17 +1744,18 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { test -z "$ac_c_werror_flag" || test ! 
-s conftest.err } && test -s conftest$ac_exeext && { test "$cross_compiling" = yes || test -x conftest$ac_exeext - }; then : + } +then : ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 @@ -1825,17 +1770,70 @@ fi } # ac_fn_c_try_link +# ac_fn_check_decl LINENO SYMBOL VAR INCLUDES EXTRA-OPTIONS FLAG-VAR +# ------------------------------------------------------------------ +# Tests whether SYMBOL is declared in INCLUDES, setting cache variable VAR +# accordingly. Pass EXTRA-OPTIONS to the compiler, using FLAG-VAR. +ac_fn_check_decl () +{ + as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack + as_decl_name=`echo $2|sed 's/ *(.*//'` + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $as_decl_name is declared" >&5 +printf %s "checking whether $as_decl_name is declared... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop + as_decl_use=`echo $2|sed -e 's/(/((/' -e 's/)/) 0&/' -e 's/,/) 0& (/g'` + eval ac_save_FLAGS=\$$6 + as_fn_append $6 " $5" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +$4 +int +main (void) +{ +#ifndef $as_decl_name +#ifdef __cplusplus + (void) $as_decl_use; +#else + (void) $as_decl_name; +#endif +#endif + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + eval "$3=yes" +else $as_nop + eval "$3=no" +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + eval $6=\$ac_save_FLAGS + +fi +eval ac_res=\$$3 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } + eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno + +} # ac_fn_check_decl + # ac_fn_c_check_func LINENO FUNC VAR # ---------------------------------- # Tests whether FUNC exists, setting the cache variable VAR accordingly ac_fn_c_check_func () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +printf %s "checking for $2... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ /* Define $2 to an innocuous variant, in case <limits.h> declares $2. @@ -1843,16 +1841,9 @@ else #define $2 innocuous_$2 /* System header to define __stub macros and hopefully few prototypes, - which can conflict with char $2 (); below. - Prefer <limits.h> to <assert.h> if __STDC__ is defined, since - <limits.h> exists even on freestanding compilers. */ - -#ifdef __STDC__ -# include <limits.h> -#else -# include <assert.h> -#endif + which can conflict with char $2 (); below. */ +#include <limits.h> #undef $2 /* Override any GCC internal prototype to avoid an error.
@@ -1870,24 +1861,25 @@ choke me #endif int -main () +main (void) { return $2 (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : eval "$3=yes" -else +else $as_nop eval "$3=no" fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext fi eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno } # ac_fn_c_check_func @@ -1899,17 +1891,18 @@ $as_echo "$ac_res" >&6; } ac_fn_c_check_type () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +printf %s "checking for $2... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop eval "$3=no" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ $4 int -main () +main (void) { if (sizeof ($2)) return 0; @@ -1917,12 +1910,13 @@ if (sizeof ($2)) return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ $4 int -main () +main (void) { if (sizeof (($2))) return 0; @@ -1930,29 +1924,50 @@ if (sizeof (($2))) return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : -else +else $as_nop eval "$3=yes" fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno } # ac_fn_c_check_type +ac_configure_args_raw= +for ac_arg +do + case $ac_arg in + *\'*) + ac_arg=`printf "%s\n" "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; + esac + as_fn_append ac_configure_args_raw " '$ac_arg'" +done + +case $ac_configure_args_raw in + *$as_nl*) + ac_safe_unquote= ;; + *) + ac_unsafe_z='|&;<>()$`\\"*?[ '' ' # This string ends in space, tab. + ac_unsafe_a="$ac_unsafe_z#~" + ac_safe_unquote="s/ '\\([^$ac_unsafe_a][^$ac_unsafe_z]*\\)'/ \\1/g" + ac_configure_args_raw=` printf "%s\n" "$ac_configure_args_raw" | sed "$ac_safe_unquote"`;; +esac + cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. It was created by $as_me, which was -generated by GNU Autoconf 2.69. Invocation command line was +generated by GNU Autoconf 2.71. Invocation command line was - $ $0 $@ + $ $0$ac_configure_args_raw _ACEOF exec 5>>config.log @@ -1985,8 +2000,12 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- $as_echo "PATH: $as_dir" + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + printf "%s\n" "PATH: $as_dir" done IFS=$as_save_IFS @@ -2021,7 +2040,7 @@ do | -silent | --silent | --silen | --sile | --sil) continue ;; *\'*) - ac_arg=`$as_echo "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; + ac_arg=`printf "%s\n" "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; esac case $ac_pass in 1) as_fn_append ac_configure_args0 " '$ac_arg'" ;; @@ -2056,11 +2075,13 @@ done # WARNING: Use '\'' to represent an apostrophe within the trap. # WARNING: Do not start the trap code with a newline, due to a FreeBSD 4.0 bug. trap 'exit_status=$? + # Sanitize IFS. + IFS=" "" $as_nl" # Save into config.log some information that might help in debugging. { echo - $as_echo "## ---------------- ## + printf "%s\n" "## ---------------- ## ## Cache variables. ## ## ---------------- ##" echo @@ -2071,8 +2092,8 @@ trap 'exit_status=$? case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -2096,7 +2117,7 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; ) echo - $as_echo "## ----------------- ## + printf "%s\n" "## ----------------- ## ## Output variables. 
## ## ----------------- ##" echo @@ -2104,14 +2125,14 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; do eval ac_val=\$$ac_var case $ac_val in - *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; + *\'\''*) ac_val=`printf "%s\n" "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac - $as_echo "$ac_var='\''$ac_val'\''" + printf "%s\n" "$ac_var='\''$ac_val'\''" done | sort echo if test -n "$ac_subst_files"; then - $as_echo "## ------------------- ## + printf "%s\n" "## ------------------- ## ## File substitutions. ## ## ------------------- ##" echo @@ -2119,15 +2140,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; do eval ac_val=\$$ac_var case $ac_val in - *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; + *\'\''*) ac_val=`printf "%s\n" "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac - $as_echo "$ac_var='\''$ac_val'\''" + printf "%s\n" "$ac_var='\''$ac_val'\''" done | sort echo fi if test -s confdefs.h; then - $as_echo "## ----------- ## + printf "%s\n" "## ----------- ## ## confdefs.h. ## ## ----------- ##" echo @@ -2135,8 +2156,8 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; echo fi test "$ac_signal" != 0 && - $as_echo "$as_me: caught signal $ac_signal" - $as_echo "$as_me: exit $exit_status" + printf "%s\n" "$as_me: caught signal $ac_signal" + printf "%s\n" "$as_me: exit $exit_status" } >&5 rm -f core *.core core.conftest.* && rm -f -r conftest* confdefs* conf$$* $ac_clean_files && @@ -2150,63 +2171,48 @@ ac_signal=0 # confdefs.h avoids OS command line length limits that DEFS can exceed. rm -f -r conftest* confdefs.h -$as_echo "/* confdefs.h */" > confdefs.h +printf "%s\n" "/* confdefs.h */" > confdefs.h # Predefined preprocessor variables. 
-cat >>confdefs.h <<_ACEOF -#define PACKAGE_NAME "$PACKAGE_NAME" -_ACEOF +printf "%s\n" "#define PACKAGE_NAME \"$PACKAGE_NAME\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_TARNAME "$PACKAGE_TARNAME" -_ACEOF +printf "%s\n" "#define PACKAGE_TARNAME \"$PACKAGE_TARNAME\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_VERSION "$PACKAGE_VERSION" -_ACEOF +printf "%s\n" "#define PACKAGE_VERSION \"$PACKAGE_VERSION\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_STRING "$PACKAGE_STRING" -_ACEOF +printf "%s\n" "#define PACKAGE_STRING \"$PACKAGE_STRING\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_BUGREPORT "$PACKAGE_BUGREPORT" -_ACEOF +printf "%s\n" "#define PACKAGE_BUGREPORT \"$PACKAGE_BUGREPORT\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_URL "$PACKAGE_URL" -_ACEOF +printf "%s\n" "#define PACKAGE_URL \"$PACKAGE_URL\"" >>confdefs.h # Let the site file select an alternate cache file if it wants to. # Prefer an explicitly selected file to automatically selected ones. -ac_site_file1=NONE -ac_site_file2=NONE if test -n "$CONFIG_SITE"; then - # We do not want a PATH search for config.site. 
- case $CONFIG_SITE in #(( - -*) ac_site_file1=./$CONFIG_SITE;; - */*) ac_site_file1=$CONFIG_SITE;; - *) ac_site_file1=./$CONFIG_SITE;; - esac + ac_site_files="$CONFIG_SITE" elif test "x$prefix" != xNONE; then - ac_site_file1=$prefix/share/config.site - ac_site_file2=$prefix/etc/config.site + ac_site_files="$prefix/share/config.site $prefix/etc/config.site" else - ac_site_file1=$ac_default_prefix/share/config.site - ac_site_file2=$ac_default_prefix/etc/config.site + ac_site_files="$ac_default_prefix/share/config.site $ac_default_prefix/etc/config.site" fi -for ac_site_file in "$ac_site_file1" "$ac_site_file2" + +for ac_site_file in $ac_site_files do - test "x$ac_site_file" = xNONE && continue - if test /dev/null != "$ac_site_file" && test -r "$ac_site_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading site script $ac_site_file" >&5 -$as_echo "$as_me: loading site script $ac_site_file" >&6;} + case $ac_site_file in #( + */*) : + ;; #( + *) : + ac_site_file=./$ac_site_file ;; +esac + if test -f "$ac_site_file" && test -r "$ac_site_file"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading site script $ac_site_file" >&5 +printf "%s\n" "$as_me: loading site script $ac_site_file" >&6;} sed 's/^/| /' "$ac_site_file" >&5 . "$ac_site_file" \ - || { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} + || { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "failed to load site script $ac_site_file See \`config.log' for more details" "$LINENO" 5; } fi @@ -2216,153 +2222,546 @@ if test -r "$cache_file"; then # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. 
if test /dev/null != "$cache_file" && test -f "$cache_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 -$as_echo "$as_me: loading cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 +printf "%s\n" "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 -$as_echo "$as_me: creating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 +printf "%s\n" "$as_me: creating cache $cache_file" >&6;} >$cache_file fi -# Check that the precious variables saved in the cache have kept the same -# value. -ac_cache_corrupted=false -for ac_var in $ac_precious_vars; do - eval ac_old_set=\$ac_cv_env_${ac_var}_set - eval ac_new_set=\$ac_env_${ac_var}_set - eval ac_old_val=\$ac_cv_env_${ac_var}_value - eval ac_new_val=\$ac_env_${ac_var}_value - case $ac_old_set,$ac_new_set in - set,) - { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 -$as_echo "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} - ac_cache_corrupted=: ;; - ,set) - { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5 -$as_echo "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} - ac_cache_corrupted=: ;; - ,);; - *) - if test "x$ac_old_val" != "x$ac_new_val"; then - # differences in whitespace do not lead to failure. 
- ac_old_val_w=`echo x $ac_old_val` - ac_new_val_w=`echo x $ac_new_val` - if test "$ac_old_val_w" != "$ac_new_val_w"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5 -$as_echo "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} - ac_cache_corrupted=: - else - { $as_echo "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5 -$as_echo "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;} - eval $ac_var=\$ac_old_val - fi - { $as_echo "$as_me:${as_lineno-$LINENO}: former value: \`$ac_old_val'" >&5 -$as_echo "$as_me: former value: \`$ac_old_val'" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: current value: \`$ac_new_val'" >&5 -$as_echo "$as_me: current value: \`$ac_new_val'" >&2;} - fi;; - esac - # Pass precious variables to config.status. - if test "$ac_new_set" = set; then - case $ac_new_val in - *\'*) ac_arg=$ac_var=`$as_echo "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; - *) ac_arg=$ac_var=$ac_new_val ;; - esac - case " $ac_configure_args " in - *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. - *) as_fn_append ac_configure_args " '$ac_arg'" ;; - esac - fi -done -if $ac_cache_corrupted; then - { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5 -$as_echo "$as_me: error: changes in the environment can compromise the build" >&2;} - as_fn_error $? "run \`make distclean' and/or \`rm $cache_file' and start over" "$LINENO" 5 -fi -## -------------------- ## -## Main body of script. ## -## -------------------- ## +# Test code for whether the C compiler supports C89 (global declarations) +ac_c_conftest_c89_globals=' +/* Does the compiler advertise C89 conformance? 
+ Do not test the value of __STDC__, because some compilers set it to 0 + while being otherwise adequately conformant. */ +#if !defined __STDC__ +# error "Compiler does not advertise C89 conformance" +#endif -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu +#include <stddef.h> +#include <stdarg.h> +struct stat; +/* Most of the following tests are stolen from RCS 5.7 src/conf.sh. */ +struct buf { int x; }; +struct buf * (*rcsopen) (struct buf *, struct stat *, int); +static char *e (p, i) + char **p; + int i; +{ + return p[i]; +} +static char *f (char * (*g) (char **, int), char **p, ...) +{ + char *s; + va_list v; + va_start (v,p); + s = g (p, va_arg (v,int)); + va_end (v); + return s; +} +/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has + function prototypes and stuff, but not \xHH hex character constants. + These do not provoke an error unfortunately, instead are silently treated + as an "x". The following induces an error, until -std is added to get + proper ANSI mode. Curiously \x00 != x always comes out true, for an + array size at least. It is necessary to write \x00 == 0 to get something + that is true only with -std. */ +int osf4_cc_array ['\''\x00'\'' == 0 ? 1 : -1]; +/* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters + inside strings and character constants. */ +#define FOO(x) '\''x'\'' +int xlc6_cc_array[FOO(a) == '\''x'\'' ? 1 : -1]; -ac_config_headers="$ac_config_headers config.h" +int test (int i, double x); +struct s1 {int (*f) (int a);}; +struct s2 {int (*f) (double a);}; +int pairnames (int, char **, int *(*)(struct buf *, struct stat *, int), + int, int);' +# Test code for whether the C compiler supports C89 (body of main).
+ac_c_conftest_c89_main=' +ok |= (argc == 0 || f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]); +' -LDFLAGS="" -LIBS="" -EXTRA_LIBS="" -INCLUDES="" -CPPFLAGS="" -GDBM_INCLUDE="" -GDBM_LIB="" -FNV_DEP="" -TRIO_DEP="" -PCRE_DEP="" -PCRE_MIN_VERSION="8.39" +# Test code for whether the C compiler supports C99 (global declarations) +ac_c_conftest_c99_globals=' +// Does the compiler advertise C99 conformance? +#if !defined __STDC_VERSION__ || __STDC_VERSION__ < 199901L +# error "Compiler does not advertise C99 conformance" +#endif +#include <stdbool.h> +extern int puts (const char *); +extern int printf (const char *, ...); +extern int dprintf (int, const char *, ...); +extern void *malloc (size_t); + +// Check varargs macros. These examples are taken from C99 6.10.3.5. +// dprintf is used instead of fprintf to avoid needing to declare +// FILE and stderr. +#define debug(...) dprintf (2, __VA_ARGS__) +#define showlist(...) puts (#__VA_ARGS__) +#define report(test,...) ((test) ? puts (#test) : printf (__VA_ARGS__)) +static void +test_varargs_macros (void) +{ + int x = 1234; + int y = 5678; + debug ("Flag"); + debug ("X = %d\n", x); + showlist (The first, second, and third items.); + report (x>y, "x is %d but y is %d", x, y); +} -ac_aux_dir= -for ac_dir in "$srcdir" "$srcdir/.." "$srcdir/../.."; do - if test -f "$ac_dir/install-sh"; then - ac_aux_dir=$ac_dir - ac_install_sh="$ac_aux_dir/install-sh -c" - break - elif test -f "$ac_dir/install.sh"; then - ac_aux_dir=$ac_dir - ac_install_sh="$ac_aux_dir/install.sh -c" - break - elif test -f "$ac_dir/shtool"; then - ac_aux_dir=$ac_dir - ac_install_sh="$ac_aux_dir/shtool install -c" - break - fi -done -if test -z "$ac_aux_dir"; then - as_fn_error $? "cannot find install-sh, install.sh, or shtool in \"$srcdir\" \"$srcdir/..\" \"$srcdir/../..\"" "$LINENO" 5 +// Check long long types.
+#define BIG64 18446744073709551615ull +#define BIG32 4294967295ul +#define BIG_OK (BIG64 / BIG32 == 4294967297ull && BIG64 % BIG32 == 0) +#if !BIG_OK + #error "your preprocessor is broken" +#endif +#if BIG_OK +#else + #error "your preprocessor is broken" +#endif +static long long int bignum = -9223372036854775807LL; +static unsigned long long int ubignum = BIG64; + +struct incomplete_array +{ + int datasize; + double data[]; +}; + +struct named_init { + int number; + const wchar_t *name; + double average; +}; + +typedef const char *ccp; + +static inline int +test_restrict (ccp restrict text) +{ + // See if C++-style comments work. + // Iterate through items via the restricted pointer. + // Also check for declarations in for loops. + for (unsigned int i = 0; *(text+i) != '\''\0'\''; ++i) + continue; + return 0; +} + +// Check varargs and va_copy. +static bool +test_varargs (const char *format, ...) +{ + va_list args; + va_start (args, format); + va_list args_copy; + va_copy (args_copy, args); + + const char *str = ""; + int number = 0; + float fnumber = 0; + + while (*format) + { + switch (*format++) + { + case '\''s'\'': // string + str = va_arg (args_copy, const char *); + break; + case '\''d'\'': // int + number = va_arg (args_copy, int); + break; + case '\''f'\'': // float + fnumber = va_arg (args_copy, double); + break; + default: + break; + } + } + va_end (args_copy); + va_end (args); + + return *str && number && fnumber; +} +' + +# Test code for whether the C compiler supports C99 (body of main). +ac_c_conftest_c99_main=' + // Check bool. + _Bool success = false; + success |= (argc != 0); + + // Check restrict. + if (test_restrict ("String literal") == 0) + success = true; + char *restrict newvar = "Another string"; + + // Check varargs. + success &= test_varargs ("s, d'\'' f .", "string", 65, 34.234); + test_varargs_macros (); + + // Check flexible array members. 
+ struct incomplete_array *ia = + malloc (sizeof (struct incomplete_array) + (sizeof (double) * 10)); + ia->datasize = 10; + for (int i = 0; i < ia->datasize; ++i) + ia->data[i] = i * 1.234; + + // Check named initializers. + struct named_init ni = { + .number = 34, + .name = L"Test wide string", + .average = 543.34343, + }; + + ni.number = 58; + + int dynamic_array[ni.number]; + dynamic_array[0] = argv[0][0]; + dynamic_array[ni.number - 1] = 543; + + // work around unused variable warnings + ok |= (!success || bignum == 0LL || ubignum == 0uLL || newvar[0] == '\''x'\'' + || dynamic_array[ni.number - 1] != 543); +' + +# Test code for whether the C compiler supports C11 (global declarations) +ac_c_conftest_c11_globals=' +// Does the compiler advertise C11 conformance? +#if !defined __STDC_VERSION__ || __STDC_VERSION__ < 201112L +# error "Compiler does not advertise C11 conformance" +#endif + +// Check _Alignas. +char _Alignas (double) aligned_as_double; +char _Alignas (0) no_special_alignment; +extern char aligned_as_int; +char _Alignas (0) _Alignas (int) aligned_as_int; + +// Check _Alignof. +enum +{ + int_alignment = _Alignof (int), + int_array_alignment = _Alignof (int[100]), + char_alignment = _Alignof (char) +}; +_Static_assert (0 < -_Alignof (int), "_Alignof is signed"); + +// Check _Noreturn. +int _Noreturn does_not_return (void) { for (;;) continue; } + +// Check _Static_assert. +struct test_static_assert +{ + int x; + _Static_assert (sizeof (int) <= sizeof (long int), + "_Static_assert does not work in struct"); + long int y; +}; + +// Check UTF-8 literals. +#define u8 syntax error! +char const utf8_literal[] = u8"happens to be ASCII" "another string"; + +// Check duplicate typedefs. +typedef long *long_ptr; +typedef long int *long_ptr; +typedef long_ptr long_ptr; + +// Anonymous structures and unions -- taken from C11 6.7.2.1 Example 1. 
+struct anonymous +{ + union { + struct { int i; int j; }; + struct { int k; long int l; } w; + }; + int m; +} v1; +' + +# Test code for whether the C compiler supports C11 (body of main). +ac_c_conftest_c11_main=' + _Static_assert ((offsetof (struct anonymous, i) + == offsetof (struct anonymous, w.k)), + "Anonymous union alignment botch"); + v1.i = 2; + v1.w.k = 5; + ok |= v1.i != 5; +' + +# Test code for whether the C compiler supports C11 (complete). +ac_c_conftest_c11_program="${ac_c_conftest_c89_globals} +${ac_c_conftest_c99_globals} +${ac_c_conftest_c11_globals} + +int +main (int argc, char **argv) +{ + int ok = 0; + ${ac_c_conftest_c89_main} + ${ac_c_conftest_c99_main} + ${ac_c_conftest_c11_main} + return ok; +} +" + +# Test code for whether the C compiler supports C99 (complete). +ac_c_conftest_c99_program="${ac_c_conftest_c89_globals} +${ac_c_conftest_c99_globals} + +int +main (int argc, char **argv) +{ + int ok = 0; + ${ac_c_conftest_c89_main} + ${ac_c_conftest_c99_main} + return ok; +} +" + +# Test code for whether the C compiler supports C89 (complete). +ac_c_conftest_c89_program="${ac_c_conftest_c89_globals} + +int +main (int argc, char **argv) +{ + int ok = 0; + ${ac_c_conftest_c89_main} + return ok; +} +" + +as_fn_append ac_header_c_list " stdio.h stdio_h HAVE_STDIO_H" +as_fn_append ac_header_c_list " stdlib.h stdlib_h HAVE_STDLIB_H" +as_fn_append ac_header_c_list " string.h string_h HAVE_STRING_H" +as_fn_append ac_header_c_list " inttypes.h inttypes_h HAVE_INTTYPES_H" +as_fn_append ac_header_c_list " stdint.h stdint_h HAVE_STDINT_H" +as_fn_append ac_header_c_list " strings.h strings_h HAVE_STRINGS_H" +as_fn_append ac_header_c_list " sys/stat.h sys_stat_h HAVE_SYS_STAT_H" +as_fn_append ac_header_c_list " sys/types.h sys_types_h HAVE_SYS_TYPES_H" +as_fn_append ac_header_c_list " unistd.h unistd_h HAVE_UNISTD_H" +as_fn_append ac_header_c_list " sys/time.h sys_time_h HAVE_SYS_TIME_H" + +# Auxiliary files required by this configure script. 
+ac_aux_files="install-sh config.guess config.sub" + +# Locations in which to look for auxiliary files. +ac_aux_dir_candidates="${srcdir}${PATH_SEPARATOR}${srcdir}/..${PATH_SEPARATOR}${srcdir}/../.." + +# Search for a directory containing all of the required auxiliary files, +# $ac_aux_files, from the $PATH-style list $ac_aux_dir_candidates. +# If we don't find one directory that contains all the files we need, +# we report the set of missing files from the *first* directory in +# $ac_aux_dir_candidates and give up. +ac_missing_aux_files="" +ac_first_candidate=: +printf "%s\n" "$as_me:${as_lineno-$LINENO}: looking for aux files: $ac_aux_files" >&5 +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +as_found=false +for as_dir in $ac_aux_dir_candidates +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + as_found=: + + printf "%s\n" "$as_me:${as_lineno-$LINENO}: trying $as_dir" >&5 + ac_aux_dir_found=yes + ac_install_sh= + for ac_aux in $ac_aux_files + do + # As a special case, if "install-sh" is required, that requirement + # can be satisfied by any of "install-sh", "install.sh", or "shtool", + # and $ac_install_sh is set appropriately for whichever one is found. 
+ if test x"$ac_aux" = x"install-sh" + then + if test -f "${as_dir}install-sh"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}install-sh found" >&5 + ac_install_sh="${as_dir}install-sh -c" + elif test -f "${as_dir}install.sh"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}install.sh found" >&5 + ac_install_sh="${as_dir}install.sh -c" + elif test -f "${as_dir}shtool"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}shtool found" >&5 + ac_install_sh="${as_dir}shtool install -c" + else + ac_aux_dir_found=no + if $ac_first_candidate; then + ac_missing_aux_files="${ac_missing_aux_files} install-sh" + else + break + fi + fi + else + if test -f "${as_dir}${ac_aux}"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}${ac_aux} found" >&5 + else + ac_aux_dir_found=no + if $ac_first_candidate; then + ac_missing_aux_files="${ac_missing_aux_files} ${ac_aux}" + else + break + fi + fi + fi + done + if test "$ac_aux_dir_found" = yes; then + ac_aux_dir="$as_dir" + break + fi + ac_first_candidate=false + + as_found=false +done +IFS=$as_save_IFS +if $as_found +then : + +else $as_nop + as_fn_error $? "cannot find required auxiliary files:$ac_missing_aux_files" "$LINENO" 5 fi + # These three variables are undocumented and unsupported, # and are intended to be withdrawn in a future Autoconf release. # They can cause serious problems if a builder's source tree is in a directory # whose full name contains unusual characters. -ac_config_guess="$SHELL $ac_aux_dir/config.guess" # Please don't use this var. -ac_config_sub="$SHELL $ac_aux_dir/config.sub" # Please don't use this var. -ac_configure="$SHELL $ac_aux_dir/configure" # Please don't use this var. 
+if test -f "${ac_aux_dir}config.guess"; then + ac_config_guess="$SHELL ${ac_aux_dir}config.guess" +fi +if test -f "${ac_aux_dir}config.sub"; then + ac_config_sub="$SHELL ${ac_aux_dir}config.sub" +fi +if test -f "$ac_aux_dir/configure"; then + ac_configure="$SHELL ${ac_aux_dir}configure" +fi + +# Check that the precious variables saved in the cache have kept the same +# value. +ac_cache_corrupted=false +for ac_var in $ac_precious_vars; do + eval ac_old_set=\$ac_cv_env_${ac_var}_set + eval ac_new_set=\$ac_env_${ac_var}_set + eval ac_old_val=\$ac_cv_env_${ac_var}_value + eval ac_new_val=\$ac_env_${ac_var}_value + case $ac_old_set,$ac_new_set in + set,) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 +printf "%s\n" "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} + ac_cache_corrupted=: ;; + ,set) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5 +printf "%s\n" "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} + ac_cache_corrupted=: ;; + ,);; + *) + if test "x$ac_old_val" != "x$ac_new_val"; then + # differences in whitespace do not lead to failure. 
+ ac_old_val_w=`echo x $ac_old_val` + ac_new_val_w=`echo x $ac_new_val` + if test "$ac_old_val_w" != "$ac_new_val_w"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5 +printf "%s\n" "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} + ac_cache_corrupted=: + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5 +printf "%s\n" "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;} + eval $ac_var=\$ac_old_val + fi + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: former value: \`$ac_old_val'" >&5 +printf "%s\n" "$as_me: former value: \`$ac_old_val'" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: current value: \`$ac_new_val'" >&5 +printf "%s\n" "$as_me: current value: \`$ac_new_val'" >&2;} + fi;; + esac + # Pass precious variables to config.status. + if test "$ac_new_set" = set; then + case $ac_new_val in + *\'*) ac_arg=$ac_var=`printf "%s\n" "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; + *) ac_arg=$ac_var=$ac_new_val ;; + esac + case " $ac_configure_args " in + *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. + *) as_fn_append ac_configure_args " '$ac_arg'" ;; + esac + fi +done +if $ac_cache_corrupted; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5 +printf "%s\n" "$as_me: error: changes in the environment can compromise the build" >&2;} + as_fn_error $? "run \`${MAKE-make} distclean' and/or \`rm $cache_file' + and start over" "$LINENO" 5 +fi +## -------------------- ## +## Main body of script. 
## +## -------------------- ## + +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu -# Make sure we can run config.sub. -$SHELL "$ac_aux_dir/config.sub" sun4 >/dev/null 2>&1 || - as_fn_error $? "cannot run $SHELL $ac_aux_dir/config.sub" "$LINENO" 5 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking build system type" >&5 -$as_echo_n "checking build system type... " >&6; } -if ${ac_cv_build+:} false; then : - $as_echo_n "(cached) " >&6 -else +ac_config_headers="$ac_config_headers config.h" + + + +LDFLAGS="" +LIBS="" +EXTRA_LIBS="" +INCLUDES="" +CPPFLAGS="" +GDBM_INCLUDE="" +GDBM_LIB="" +FNV_DEP="" +TRIO_DEP="" +PCRE2_DEP="" +PCRE2_MIN_VERSION="10.32" + + + + + + # Make sure we can run config.sub. +$SHELL "${ac_aux_dir}config.sub" sun4 >/dev/null 2>&1 || + as_fn_error $? "cannot run $SHELL ${ac_aux_dir}config.sub" "$LINENO" 5 + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking build system type" >&5 +printf %s "checking build system type... " >&6; } +if test ${ac_cv_build+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_build_alias=$build_alias test "x$ac_build_alias" = x && - ac_build_alias=`$SHELL "$ac_aux_dir/config.guess"` + ac_build_alias=`$SHELL "${ac_aux_dir}config.guess"` test "x$ac_build_alias" = x && as_fn_error $? "cannot guess build type; you must specify one" "$LINENO" 5 -ac_cv_build=`$SHELL "$ac_aux_dir/config.sub" $ac_build_alias` || - as_fn_error $? "$SHELL $ac_aux_dir/config.sub $ac_build_alias failed" "$LINENO" 5 +ac_cv_build=`$SHELL "${ac_aux_dir}config.sub" $ac_build_alias` || + as_fn_error $? 
"$SHELL ${ac_aux_dir}config.sub $ac_build_alias failed" "$LINENO" 5 fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_build" >&5 -$as_echo "$ac_cv_build" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_build" >&5 +printf "%s\n" "$ac_cv_build" >&6; } case $ac_cv_build in *-*-*) ;; *) as_fn_error $? "invalid value of canonical build" "$LINENO" 5;; @@ -2381,21 +2780,22 @@ IFS=$ac_save_IFS case $build_os in *\ *) build_os=`echo "$build_os" | sed 's/ /-/g'`;; esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking host system type" >&5 -$as_echo_n "checking host system type... " >&6; } -if ${ac_cv_host+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking host system type" >&5 +printf %s "checking host system type... " >&6; } +if test ${ac_cv_host+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test "x$host_alias" = x; then ac_cv_host=$ac_cv_build else - ac_cv_host=`$SHELL "$ac_aux_dir/config.sub" $host_alias` || - as_fn_error $? "$SHELL $ac_aux_dir/config.sub $host_alias failed" "$LINENO" 5 + ac_cv_host=`$SHELL "${ac_aux_dir}config.sub" $host_alias` || + as_fn_error $? "$SHELL ${ac_aux_dir}config.sub $host_alias failed" "$LINENO" 5 fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_host" >&5 -$as_echo "$ac_cv_host" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_host" >&5 +printf "%s\n" "$ac_cv_host" >&6; } case $ac_cv_host in *-*-*) ;; *) as_fn_error $? "invalid value of canonical host" "$LINENO" 5;; @@ -2414,21 +2814,22 @@ IFS=$ac_save_IFS case $host_os in *\ *) host_os=`echo "$host_os" | sed 's/ /-/g'`;; esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking target system type" >&5 -$as_echo_n "checking target system type... " >&6; } -if ${ac_cv_target+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking target system type" >&5 +printf %s "checking target system type... 
" >&6; } +if test ${ac_cv_target+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test "x$target_alias" = x; then ac_cv_target=$ac_cv_host else - ac_cv_target=`$SHELL "$ac_aux_dir/config.sub" $target_alias` || - as_fn_error $? "$SHELL $ac_aux_dir/config.sub $target_alias failed" "$LINENO" 5 + ac_cv_target=`$SHELL "${ac_aux_dir}config.sub" $target_alias` || + as_fn_error $? "$SHELL ${ac_aux_dir}config.sub $target_alias failed" "$LINENO" 5 fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_target" >&5 -$as_echo "$ac_cv_target" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_target" >&5 +printf "%s\n" "$ac_cv_target" >&6; } case $ac_cv_target in *-*-*) ;; *) as_fn_error $? "invalid value of canonical target" "$LINENO" 5;; @@ -2455,27 +2856,37 @@ test -n "$target_alias" && program_prefix=${target_alias}- -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking cached information" >&5 -$as_echo_n "checking cached information... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking cached information" >&5 +printf %s "checking cached information... " >&6; } hostcheck="$host" -if ${ac_cv_hostcheck+:} false; then : - $as_echo_n "(cached) " >&6 -else +if test ${ac_cv_hostcheck+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_cv_hostcheck="$hostcheck" fi if test "$ac_cv_hostcheck" != "$hostcheck"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: changed" >&5 -$as_echo "changed" >&6; } - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: config.cache exists!" >&5 -$as_echo "$as_me: WARNING: config.cache exists!" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: changed" >&5 +printf "%s\n" "changed" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: config.cache exists!" >&5 +printf "%s\n" "$as_me: WARNING: config.cache exists!" >&2;} as_fn_error $? "you must do 'make clobber' first to compile for different host or different parameters." 
"$LINENO" 5 else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 -$as_echo "ok" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ok" >&5 +printf "%s\n" "ok" >&6; } fi + + + + + + + + + ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' @@ -2484,11 +2895,12 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else @@ -2496,11 +2908,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}gcc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -2511,11 +2927,11 @@ fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -2524,11 +2940,12 @@ if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else @@ -2536,11 +2953,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="gcc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -2551,11 +2972,11 @@ fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 -$as_echo "$ac_ct_CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 +printf "%s\n" "$ac_ct_CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi if test "x$ac_ct_CC" = x; then @@ -2563,8 +2984,8 @@ fi else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC @@ -2577,11 +2998,12 @@ if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else @@ -2589,11 +3011,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}cc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -2604,11 +3030,11 @@ fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -2617,11 +3043,12 @@ fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else @@ -2630,15 +3057,19 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + if test "$as_dir$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -2654,18 +3085,18 @@ if test $ac_prog_rejected = yes; then # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. shift - ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" + ac_cv_prog_CC="$as_dir$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -2676,11 +3107,12 @@ if test -z "$CC"; then do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. 
else @@ -2688,11 +3120,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -2703,11 +3139,11 @@ fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -2720,11 +3156,12 @@ if test -z "$CC"; then do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else @@ -2732,11 +3169,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -2747,11 +3188,11 @@ fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 -$as_echo "$ac_ct_CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 +printf "%s\n" "$ac_ct_CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -2763,34 +3204,138 @@ done else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + CC=$ac_ct_CC + fi +fi + +fi +if test -z "$CC"; then + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}clang", so it can be a program name with args. +set dummy ${ac_tool_prefix}clang; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_CC="${ac_tool_prefix}clang" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +CC=$ac_cv_prog_CC +if test -n "$CC"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + +fi +if test -z "$ac_cv_prog_CC"; then + ac_ct_CC=$CC + # Extract the first word of "clang", so it can be a program name with args. +set dummy clang; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_CC"; then + ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_CC="clang" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +ac_ct_CC=$ac_cv_prog_ac_ct_CC +if test -n "$ac_ct_CC"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 +printf "%s\n" "$ac_ct_CC" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + if test "x$ac_ct_CC" = x; then + CC="" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC fi +else + CC="$ac_cv_prog_CC" fi fi -test -z "$CC" && { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +test -z "$CC" && { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "no acceptable C compiler found in \$PATH See \`config.log' for more details" "$LINENO" 5; } # Provide some information about the compiler. 
-$as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5 +printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5 set X $ac_compile ac_compiler=$2 -for ac_option in --version -v -V -qversion; do +for ac_option in --version -v -V -qversion -version; do { { ac_try="$ac_compiler $ac_option >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_compiler $ac_option >&5") 2>conftest.err ac_status=$? if test -s conftest.err; then @@ -2800,7 +3345,7 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 fi rm -f conftest.er1 conftest.err - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } done @@ -2808,7 +3353,7 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; @@ -2820,9 +3365,9 @@ ac_clean_files="$ac_clean_files a.out a.out.dSYM a.exe b.out" # Try to create an executable without -o first, disregard a.out. # It will help us diagnose broken compilers, and finding out an intuition # of exeext. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C compiler works" >&5 -$as_echo_n "checking whether the C compiler works... " >&6; } -ac_link_default=`$as_echo "$ac_link" | sed 's/ -o *conftest[^ ]*//'` +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler works" >&5 +printf %s "checking whether the C compiler works... 
" >&6; } +ac_link_default=`printf "%s\n" "$ac_link" | sed 's/ -o *conftest[^ ]*//'` # The possible output files: ac_files="a.out conftest.exe conftest a.exe a_out.exe b.out conftest.*" @@ -2843,11 +3388,12 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link_default") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } +then : # Autoconf-2.13 could set the ac_cv_exeext variable to `no'. # So ignore a value of `no', otherwise this would lead to `EXEEXT = no' # in a Makefile. We should not override ac_cv_exeext if it was cached, @@ -2864,7 +3410,7 @@ do # certainly right. break;; *.* ) - if test "${ac_cv_exeext+set}" = set && test "$ac_cv_exeext" != no; + if test ${ac_cv_exeext+y} && test "$ac_cv_exeext" != no; then :; else ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` fi @@ -2880,44 +3426,46 @@ do done test "$ac_cv_exeext" = no && ac_cv_exeext= -else +else $as_nop ac_file='' fi -if test -z "$ac_file"; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -$as_echo "$as_me: failed program was:" >&5 +if test -z "$ac_file" +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error 77 "C compiler cannot create executables See \`config.log' for more details" "$LINENO" 5; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; 
} -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler default output file name" >&5 -$as_echo_n "checking for C compiler default output file name... " >&6; } -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_file" >&5 -$as_echo "$ac_file" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler default output file name" >&5 +printf %s "checking for C compiler default output file name... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_file" >&5 +printf "%s\n" "$ac_file" >&6; } ac_exeext=$ac_cv_exeext rm -f -r a.out a.out.dSYM a.exe conftest$ac_cv_exeext b.out ac_clean_files=$ac_clean_files_save -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of executables" >&5 -$as_echo_n "checking for suffix of executables... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for suffix of executables" >&5 +printf %s "checking for suffix of executables... " >&6; } if { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } +then : # If both `conftest.exe' and `conftest' are `present' (well, observable) # catch `conftest.exe'. 
For instance with Cygwin, `ls conftest' will # work properly (i.e., refer to `conftest.exe'), while it won't with @@ -2931,15 +3479,15 @@ for ac_file in conftest.exe conftest conftest.*; do * ) break;; esac done -else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +else $as_nop + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of executables: cannot compile and link See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest conftest$ac_cv_exeext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5 -$as_echo "$ac_cv_exeext" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5 +printf "%s\n" "$ac_cv_exeext" >&6; } rm -f conftest.$ac_ext EXEEXT=$ac_cv_exeext @@ -2948,7 +3496,7 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include <stdio.h> int -main () +main (void) { FILE *f = fopen ("conftest.out", "w"); return ferror (f) || fclose (f) != 0; @@ -2960,8 +3508,8 @@ _ACEOF ac_clean_files="$ac_clean_files conftest.out" # Check that the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are cross compiling" >&5 -$as_echo_n "checking whether we are cross compiling... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether we are cross compiling" >&5 +printf %s "checking whether we are cross compiling... " >&6; } if test "$cross_compiling" != yes; then { { ac_try="$ac_link" case "(($ac_try" in @@ -2969,10 +3517,10 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } if { ac_try='./conftest$ac_cv_exeext' { { case "(($ac_try" in @@ -2980,39 +3528,40 @@ $as_echo "$ac_try_echo"; } >&5 *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_try") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; }; then cross_compiling=no else if test "$cross_compiling" = maybe; then cross_compiling=yes else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} -as_fn_error $? "cannot run C compiled programs. + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} +as_fn_error 77 "cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details" "$LINENO" 5; } fi fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5 -$as_echo "$cross_compiling" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5 +printf "%s\n" "$cross_compiling" >&6; } rm -f conftest.$ac_ext conftest$ac_cv_exeext conftest.out ac_clean_files=$ac_clean_files_save -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5 -$as_echo_n "checking for suffix of object files... " >&6; } -if ${ac_cv_objext+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5 +printf %s "checking for suffix of object files... " >&6; } +if test ${ac_cv_objext+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int -main () +main (void) { ; @@ -3026,11 +3575,12 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } +then : for ac_file in conftest.o conftest.obj conftest.*; do test -f "$ac_file" || continue; case $ac_file in @@ -3039,31 +3589,32 @@ $as_echo "$ac_try_echo"; } >&5 break;; esac done -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of object files: cannot compile See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest.$ac_cv_objext conftest.$ac_ext fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5 -$as_echo "$ac_cv_objext" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5 +printf "%s\n" "$ac_cv_objext" >&6; } OBJEXT=$ac_cv_objext ac_objext=$OBJEXT -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C compiler" >&5 -$as_echo_n "checking whether we are using the GNU C compiler... " >&6; } -if ${ac_cv_c_compiler_gnu+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the compiler supports GNU C" >&5 +printf %s "checking whether the compiler supports GNU C... 
" >&6; } +if test ${ac_cv_c_compiler_gnu+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { #ifndef __GNUC__ choke me @@ -3073,29 +3624,33 @@ main () return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_compiler_gnu=yes -else +else $as_nop ac_compiler_gnu=no fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5 -$as_echo "$ac_cv_c_compiler_gnu" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5 +printf "%s\n" "$ac_cv_c_compiler_gnu" >&6; } +ac_compiler_gnu=$ac_cv_c_compiler_gnu + if test $ac_compiler_gnu = yes; then GCC=yes else GCC= fi -ac_test_CFLAGS=${CFLAGS+set} +ac_test_CFLAGS=${CFLAGS+y} ac_save_CFLAGS=$CFLAGS -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC accepts -g" >&5 -$as_echo_n "checking whether $CC accepts -g... " >&6; } -if ${ac_cv_prog_cc_g+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $CC accepts -g" >&5 +printf %s "checking whether $CC accepts -g... " >&6; } +if test ${ac_cv_prog_cc_g+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_save_c_werror_flag=$ac_c_werror_flag ac_c_werror_flag=yes ac_cv_prog_cc_g=no @@ -3104,57 +3659,60 @@ else /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_prog_cc_g=yes -else +else $as_nop CFLAGS="" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : -else +else $as_nop ac_c_werror_flag=$ac_save_c_werror_flag CFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_prog_cc_g=yes fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ac_c_werror_flag=$ac_save_c_werror_flag fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5 -$as_echo "$ac_cv_prog_cc_g" >&6; } -if test "$ac_test_CFLAGS" = set; then +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5 +printf "%s\n" "$ac_cv_prog_cc_g" >&6; } +if test $ac_test_CFLAGS; then CFLAGS=$ac_save_CFLAGS elif test $ac_cv_prog_cc_g = yes; then if test "$GCC" = yes; then @@ -3169,94 +3727,144 @@ else CFLAGS= fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $CC option to accept ISO C89" >&5 -$as_echo_n "checking for $CC option to accept ISO C89... " >&6; } -if ${ac_cv_prog_cc_c89+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_cv_prog_cc_c89=no +ac_prog_cc_stdc=no +if test x$ac_prog_cc_stdc = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable C11 features" >&5 +printf %s "checking for $CC option to enable C11 features... " >&6; } +if test ${ac_cv_prog_cc_c11+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_prog_cc_c11=no ac_save_CC=$CC cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ -#include <stdarg.h> -#include <stdio.h> -struct stat; -/* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */ -struct buf { int x; }; -FILE * (*rcsopen) (struct buf *, struct stat *, int); -static char *e (p, i) - char **p; - int i; -{ - return p[i]; -} -static char *f (char * (*g) (char **, int), char **p, ...) -{ - char *s; - va_list v; - va_start (v,p); - s = g (p, va_arg (v,int)); - va_end (v); - return s; -} - -/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has - function prototypes and stuff, but not '\xHH' hex character constants. - These don't provoke an error unfortunately, instead are silently treated - as 'x'. The following induces an error, until -std is added to get - proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an - array size at least. It's necessary to write '\x00'==0 to get something - that's true only with -std. */ -int osf4_cc_array ['\x00' == 0 ? 1 : -1]; +$ac_c_conftest_c11_program +_ACEOF +for ac_arg in '' -std=gnu11 +do + CC="$ac_save_CC $ac_arg" + if ac_fn_c_try_compile "$LINENO" +then : + ac_cv_prog_cc_c11=$ac_arg +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam + test "x$ac_cv_prog_cc_c11" != "xno" && break +done +rm -f conftest.$ac_ext +CC=$ac_save_CC +fi -/* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters - inside strings and character constants. */ -#define FOO(x) 'x' -int xlc6_cc_array[FOO(a) == 'x' ? 
1 : -1]; +if test "x$ac_cv_prog_cc_c11" = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 +printf "%s\n" "unsupported" >&6; } +else $as_nop + if test "x$ac_cv_prog_cc_c11" = x +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 +printf "%s\n" "none needed" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c11" >&5 +printf "%s\n" "$ac_cv_prog_cc_c11" >&6; } + CC="$CC $ac_cv_prog_cc_c11" +fi + ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c11 + ac_prog_cc_stdc=c11 +fi +fi +if test x$ac_prog_cc_stdc = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable C99 features" >&5 +printf %s "checking for $CC option to enable C99 features... " >&6; } +if test ${ac_cv_prog_cc_c99+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_prog_cc_c99=no +ac_save_CC=$CC +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$ac_c_conftest_c99_program +_ACEOF +for ac_arg in '' -std=gnu99 -std=c99 -c99 -qlanglvl=extc1x -qlanglvl=extc99 -AC99 -D_STDC_C99= +do + CC="$ac_save_CC $ac_arg" + if ac_fn_c_try_compile "$LINENO" +then : + ac_cv_prog_cc_c99=$ac_arg +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam + test "x$ac_cv_prog_cc_c99" != "xno" && break +done +rm -f conftest.$ac_ext +CC=$ac_save_CC +fi -int test (int i, double x); -struct s1 {int (*f) (int a);}; -struct s2 {int (*f) (double a);}; -int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); -int argc; -char **argv; -int -main () -{ -return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; - ; - return 0; -} +if test "x$ac_cv_prog_cc_c99" = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 +printf "%s\n" "unsupported" >&6; } +else $as_nop + if test "x$ac_cv_prog_cc_c99" = x +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 +printf "%s\n" "none needed" >&6; } +else 
$as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c99" >&5 +printf "%s\n" "$ac_cv_prog_cc_c99" >&6; } + CC="$CC $ac_cv_prog_cc_c99" +fi + ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c99 + ac_prog_cc_stdc=c99 +fi +fi +if test x$ac_prog_cc_stdc = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable C89 features" >&5 +printf %s "checking for $CC option to enable C89 features... " >&6; } +if test ${ac_cv_prog_cc_c89+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_prog_cc_c89=no +ac_save_CC=$CC +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$ac_c_conftest_c89_program _ACEOF -for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std \ - -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" +for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" do CC="$ac_save_CC $ac_arg" - if ac_fn_c_try_compile "$LINENO"; then : + if ac_fn_c_try_compile "$LINENO" +then : ac_cv_prog_cc_c89=$ac_arg fi -rm -f core conftest.err conftest.$ac_objext +rm -f core conftest.err conftest.$ac_objext conftest.beam test "x$ac_cv_prog_cc_c89" != "xno" && break done rm -f conftest.$ac_ext CC=$ac_save_CC - fi -# AC_CACHE_VAL -case "x$ac_cv_prog_cc_c89" in - x) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 -$as_echo "none needed" >&6; } ;; - xno) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 -$as_echo "unsupported" >&6; } ;; - *) - CC="$CC $ac_cv_prog_cc_c89" - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5 -$as_echo "$ac_cv_prog_cc_c89" >&6; } ;; -esac -if test "x$ac_cv_prog_cc_c89" != xno; then : +if test "x$ac_cv_prog_cc_c89" = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 +printf "%s\n" "unsupported" >&6; } +else $as_nop + if test "x$ac_cv_prog_cc_c89" = x +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 +printf "%s\n" "none 
needed" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5 +printf "%s\n" "$ac_cv_prog_cc_c89" >&6; } + CC="$CC $ac_cv_prog_cc_c89" +fi + ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c89 + ac_prog_cc_stdc=c89 +fi fi ac_ext=c @@ -3270,40 +3878,36 @@ ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to run the C preprocessor" >&5 -$as_echo_n "checking how to run the C preprocessor... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to run the C preprocessor" >&5 +printf %s "checking how to run the C preprocessor... " >&6; } # On Suns, sometimes $CPP names a directory. if test -n "$CPP" && test -d "$CPP"; then CPP= fi if test -z "$CPP"; then - if ${ac_cv_prog_CPP+:} false; then : - $as_echo_n "(cached) " >&6 -else - # Double quotes because CPP needs to be expanded - for CPP in "$CC -E" "$CC -E -traditional-cpp" "/lib/cpp" + if test ${ac_cv_prog_CPP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + # Double quotes because $CC needs to be expanded + for CPP in "$CC -E" "$CC -E -traditional-cpp" cpp /lib/cpp do ac_preproc_ok=false for ac_c_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. - # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since - # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ -#ifdef __STDC__ -# include <limits.h> -#else -# include <assert.h> -#endif +#include <limits.h> Syntax error _ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : +if ac_fn_c_try_cpp "$LINENO" +then : -else +else $as_nop # Broken: fails on valid input. 
continue fi @@ -3315,10 +3919,11 @@ rm -f conftest.err conftest.i conftest.$ac_ext /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : +if ac_fn_c_try_cpp "$LINENO" +then : # Broken: success on invalid input. continue -else +else $as_nop # Passes both tests. ac_preproc_ok=: break @@ -3328,7 +3933,8 @@ rm -f conftest.err conftest.i conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. rm -f conftest.i conftest.err conftest.$ac_ext -if $ac_preproc_ok; then : +if $ac_preproc_ok +then : break fi @@ -3340,29 +3946,24 @@ fi else ac_cv_prog_CPP=$CPP fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $CPP" >&5 -$as_echo "$CPP" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CPP" >&5 +printf "%s\n" "$CPP" >&6; } ac_preproc_ok=false for ac_c_preproc_warn_flag in '' yes do # Use a header file that comes with gcc, so configuring glibc # with a fresh cross-compiler works. - # Prefer <limits.h> to <assert.h> if __STDC__ is defined, since - # <limits.h> exists even on freestanding compilers. # On the NeXT, cc -E runs the code through the compiler's parser, # not just through cpp. "Syntax error" is here to catch this case. cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ -#ifdef __STDC__ -# include <limits.h> -#else -# include <assert.h> -#endif +#include <limits.h> Syntax error _ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : +if ac_fn_c_try_cpp "$LINENO" +then : -else +else $as_nop # Broken: fails on valid input. continue fi @@ -3374,10 +3975,11 @@ rm -f conftest.err conftest.i conftest.$ac_ext /* end confdefs.h. */ #include <ac_nonexistent.h> _ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : +if ac_fn_c_try_cpp "$LINENO" +then : # Broken: success on invalid input. continue -else +else $as_nop # Passes both tests. ac_preproc_ok=: break @@ -3387,11 +3989,12 @@ rm -f conftest.err conftest.i conftest.$ac_ext done # Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. 
rm -f conftest.i conftest.err conftest.$ac_ext -if $ac_preproc_ok; then : +if $ac_preproc_ok +then : -else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +else $as_nop + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "C preprocessor \"$CPP\" fails sanity check See \`config.log' for more details" "$LINENO" 5; } fi @@ -3406,11 +4009,12 @@ for ac_prog in 'bison -y' byacc do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_YACC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_YACC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$YACC"; then ac_cv_prog_YACC="$YACC" # Let the user override the test. else @@ -3418,11 +4022,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_YACC="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3433,11 +4041,11 @@ fi fi YACC=$ac_cv_prog_YACC if test -n "$YACC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $YACC" >&5 -$as_echo "$YACC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $YACC" >&5 +printf "%s\n" "$YACC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3447,11 +4055,12 @@ test -n "$YACC" || YACC="yacc" # Extract the first word of "$YACC", so it can be a program name with args. set dummy $YACC; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_YACC_CHECK+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_YACC_CHECK+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$YACC_CHECK"; then ac_cv_prog_YACC_CHECK="$YACC_CHECK" # Let the user override the test. else @@ -3459,11 +4068,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_YACC_CHECK="yes" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3474,18 +4087,20 @@ fi fi YACC_CHECK=$ac_cv_prog_YACC_CHECK if test -n "$YACC_CHECK"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $YACC_CHECK" >&5 -$as_echo "$YACC_CHECK" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $YACC_CHECK" >&5 +printf "%s\n" "$YACC_CHECK" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi -if test x"$YACC_CHECK" != x"yes"; then : +if test x"$YACC_CHECK" != x"yes" +then : as_fn_error $? "Please install either bison or yacc and run configure again" "$LINENO" 5 fi -# Find a good install program. We prefer a C program (faster), + + # Find a good install program. We prefer a C program (faster), # so one script is as good as another. But avoid the broken or # incompatible versions: # SysV /etc/install, /usr/sbin/install @@ -3499,20 +4114,25 @@ fi # OS/2's system install, which has a completely different semantic # ./install, which can be erroneously created by make from ./install.sh. # Reject install programs that cannot install multiple files. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for a BSD-compatible install" >&5 -$as_echo_n "checking for a BSD-compatible install... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for a BSD-compatible install" >&5 +printf %s "checking for a BSD-compatible install... 
" >&6; } if test -z "$INSTALL"; then -if ${ac_cv_path_install+:} false; then : - $as_echo_n "(cached) " >&6 -else +if test ${ac_cv_path_install+y} +then : + printf %s "(cached) " >&6 +else $as_nop as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - # Account for people who put trailing slashes in PATH elements. -case $as_dir/ in #(( - ./ | .// | /[cC]/* | \ + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + # Account for fact that we put trailing slashes in our PATH walk. +case $as_dir in #(( + ./ | /[cC]/* | \ /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \ ?:[\\/]os2[\\/]install[\\/]* | ?:[\\/]OS2[\\/]INSTALL[\\/]* | \ /usr/ucb/* ) ;; @@ -3522,13 +4142,13 @@ case $as_dir/ in #(( # by default. for ac_prog in ginstall scoinst install; do for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_prog$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_prog$ac_exec_ext"; then if test $ac_prog = install && - grep dspmsg "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then + grep dspmsg "$as_dir$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # AIX install. It has an incompatible calling convention. : elif test $ac_prog = install && - grep pwplus "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then + grep pwplus "$as_dir$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # program-specific install script used by HP pwplus--don't use. 
: else @@ -3536,12 +4156,12 @@ case $as_dir/ in #(( echo one > conftest.one echo two > conftest.two mkdir conftest.dir - if "$as_dir/$ac_prog$ac_exec_ext" -c conftest.one conftest.two "`pwd`/conftest.dir" && + if "$as_dir$ac_prog$ac_exec_ext" -c conftest.one conftest.two "`pwd`/conftest.dir/" && test -s conftest.one && test -s conftest.two && test -s conftest.dir/conftest.one && test -s conftest.dir/conftest.two then - ac_cv_path_install="$as_dir/$ac_prog$ac_exec_ext -c" + ac_cv_path_install="$as_dir$ac_prog$ac_exec_ext -c" break 3 fi fi @@ -3557,7 +4177,7 @@ IFS=$as_save_IFS rm -rf conftest.one conftest.two conftest.dir fi - if test "${ac_cv_path_install+set}" = set; then + if test ${ac_cv_path_install+y}; then INSTALL=$ac_cv_path_install else # As a last resort, use the slow shell script. Don't cache a @@ -3567,8 +4187,8 @@ fi INSTALL=$ac_install_sh fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $INSTALL" >&5 -$as_echo "$INSTALL" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $INSTALL" >&5 +printf "%s\n" "$INSTALL" >&6; } # Use test -z because SunOS4 sh mishandles braces in ${var-val}. # It thinks the first close brace ends the variable substitution. @@ -3578,24 +4198,25 @@ test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}' test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644' -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5 -$as_echo_n "checking whether ln -s works... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5 +printf %s "checking whether ln -s works... 
" >&6; } LN_S=$as_ln_s if test "$LN_S" = "ln -s"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 -$as_echo "no, using $LN_S" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 +printf "%s\n" "no, using $LN_S" >&6; } fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} sets \$(MAKE)" >&5 -$as_echo_n "checking whether ${MAKE-make} sets \$(MAKE)... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} sets \$(MAKE)" >&5 +printf %s "checking whether ${MAKE-make} sets \$(MAKE)... " >&6; } set x ${MAKE-make} -ac_make=`$as_echo "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'` -if eval \${ac_cv_prog_make_${ac_make}_set+:} false; then : - $as_echo_n "(cached) " >&6 -else +ac_make=`printf "%s\n" "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'` +if eval test \${ac_cv_prog_make_${ac_make}_set+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat >conftest.make <<\_ACEOF SHELL = /bin/sh all: @@ -3611,22 +4232,23 @@ esac rm -f conftest.make fi if eval test \$ac_cv_prog_make_${ac_make}_set = yes; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } SET_MAKE= else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } SET_MAKE="MAKE=${MAKE-make}" fi # Extract the first word of "ar", so it can be a program name with args. set dummy ar; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... 
" >&6; } -if ${ac_cv_prog_AR+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_AR+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$AR"; then ac_cv_prog_AR="$AR" # Let the user override the test. else @@ -3634,11 +4256,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_AR="ar" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3650,11 +4276,11 @@ fi fi AR=$ac_cv_prog_AR if test -n "$AR"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $AR" >&5 -$as_echo "$AR" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $AR" >&5 +printf "%s\n" "$AR" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3662,11 +4288,12 @@ if test -z "$no_ranlib"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args. set dummy ${ac_tool_prefix}ranlib; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_RANLIB+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_RANLIB+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$RANLIB"; then ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test. else @@ -3674,11 +4301,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3689,11 +4320,11 @@ fi fi RANLIB=$ac_cv_prog_RANLIB if test -n "$RANLIB"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5 -$as_echo "$RANLIB" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5 +printf "%s\n" "$RANLIB" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3702,11 +4333,12 @@ if test -z "$ac_cv_prog_RANLIB"; then ac_ct_RANLIB=$RANLIB # Extract the first word of "ranlib", so it can be a program name with args. set dummy ranlib; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_RANLIB+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_RANLIB+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_RANLIB"; then ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test. 
else @@ -3714,11 +4346,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_RANLIB="ranlib" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3729,11 +4365,11 @@ fi fi ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB if test -n "$ac_ct_RANLIB"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5 -$as_echo "$ac_ct_RANLIB" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5 +printf "%s\n" "$ac_ct_RANLIB" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi if test "x$ac_ct_RANLIB" = x; then @@ -3741,8 +4377,8 @@ fi else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac RANLIB=$ac_ct_RANLIB @@ -3754,6 +4390,108 @@ fi else RANLIB=":" fi +if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}pkgconf", so it can be a program name with args. +set dummy ${ac_tool_prefix}pkgconf; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_PKGCONF+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$PKGCONF"; then + ac_cv_prog_PKGCONF="$PKGCONF" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_PKGCONF="${ac_tool_prefix}pkgconf" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +PKGCONF=$ac_cv_prog_PKGCONF +if test -n "$PKGCONF"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $PKGCONF" >&5 +printf "%s\n" "$PKGCONF" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + +fi +if test -z "$ac_cv_prog_PKGCONF"; then + ac_ct_PKGCONF=$PKGCONF + # Extract the first word of "pkgconf", so it can be a program name with args. +set dummy pkgconf; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_PKGCONF+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_PKGCONF"; then + ac_cv_prog_ac_ct_PKGCONF="$ac_ct_PKGCONF" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_PKGCONF="pkgconf" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +ac_ct_PKGCONF=$ac_cv_prog_ac_ct_PKGCONF +if test -n "$ac_ct_PKGCONF"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_PKGCONF" >&5 +printf "%s\n" "$ac_ct_PKGCONF" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + if test "x$ac_ct_PKGCONF" = x; then + PKGCONF="" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + PKGCONF=$ac_ct_PKGCONF + fi +else + PKGCONF="$ac_cv_prog_PKGCONF" +fi + INSTALL="install-sh" @@ -3761,7 +4499,7 @@ INSTALL="install-sh" case "$host" in *-*-solaris*) # Solaris flags - $as_echo "#define NO_MACRO 1" >>confdefs.h + printf "%s\n" "#define NO_MACRO 1" >>confdefs.h ;; @@ -3782,22 +4520,24 @@ export CFLAGS CC -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking that the compiler works" >&5 -$as_echo_n "checking that the compiler works... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking that the compiler works" >&5 +printf %s "checking that the compiler works... " >&6; } -if test "$cross_compiling" = yes; then : +if test "$cross_compiling" = yes +then : as_fn_error $? "Could not compile and run even a trivial ANSI C program - check CC." "$LINENO" 5 -else +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ main(int ac, char **av) { return 0; } _ACEOF -if ac_fn_c_try_run "$LINENO"; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } +if ac_fn_c_try_run "$LINENO" +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } as_fn_error $? "Could not compile and run even a trivial ANSI C program - check CC." "$LINENO" 5 fi rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ @@ -3814,46 +4554,39 @@ case "$build" in esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to enable -Wall" >&5 -$as_echo_n "checking whether to enable -Wall... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to enable -Wall" >&5 +printf %s "checking whether to enable -Wall... " >&6; } # Check whether --enable-warnings was given. -if test "${enable_warnings+set}" = set; then : +if test ${enable_warnings+y} +then : enableval=$enable_warnings; if test -n "$GCC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: adding -Wall to CFLAGS." >&5 -$as_echo "adding -Wall to CFLAGS." >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: adding -Wall to CFLAGS." >&5 +printf "%s\n" "adding -Wall to CFLAGS." >&6; } CFLAGS="$CFLAGS -Wall" fi -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi # Check whether --with-httpddir was given. -if test "${with_httpddir+set}" = set; then : +if test ${with_httpddir+y} +then : withval=$with_httpddir; httpddir=$with_httpddir -else +else $as_nop httpddir=/usr/local/apache fi -# Check whether --with-cgidir was given. 
-if test "${with_cgidir+set}" = set; then : - withval=$with_cgidir; cgidir=$with_cgidir -else - cgidir=$httpddir/cgi-bin -fi - - - - # Check whether --with-htmldir was given. -if test "${with_htmldir+set}" = set; then : +if test ${with_htmldir+y} +then : withval=$with_htmldir; htmldir=$with_htmldir -else +else $as_nop htmldir=$httpddir/htdocs/hypermail fi @@ -3861,9 +4594,10 @@ fi # Check whether --with-language was given. -if test "${with_language+set}" = set; then : +if test ${with_language+y} +then : withval=$with_language; language=$with_language -else +else $as_nop language=en fi @@ -3871,18 +4605,20 @@ fi # Check whether --with-htmlsuffix was given. -if test "${with_htmlsuffix+set}" = set; then : +if test ${with_htmlsuffix+y} +then : withval=$with_htmlsuffix; htmlsuffix=$with_htmlsuffix -else +else $as_nop htmlsuffix=html fi # Check whether --enable-defaultindex was given. -if test "${enable_defaultindex+set}" = set; then : +if test ${enable_defaultindex+y} +then : enableval=$enable_defaultindex; defaultindex=$enableval -else +else $as_nop defaultindex="thread" fi @@ -3890,9 +4626,10 @@ fi # Check whether --with-domainaddr was given. -if test "${with_domainaddr+set}" = set; then : +if test ${with_domainaddr+y} +then : withval=$with_domainaddr; domainaddr=$with_domainaddr -else +else $as_nop domainaddr=NONE fi @@ -3902,12 +4639,43 @@ fi +# Autoupdate added the next two lines to ensure that your configure +# script's behavior did not change. They are probably safe to remove. 
+ac_header= ac_cache= +for ac_item in $ac_header_c_list +do + if test $ac_cache; then + ac_fn_c_check_header_compile "$LINENO" $ac_header ac_cv_header_$ac_cache "$ac_includes_default" + if eval test \"x\$ac_cv_header_$ac_cache\" = xyes; then + printf "%s\n" "#define $ac_item 1" >> confdefs.h + fi + ac_header= ac_cache= + elif test $ac_header; then + ac_cache=$ac_item + else + ac_header=$ac_item + fi +done -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5 -$as_echo_n "checking for grep that handles long lines and -e... " >&6; } -if ${ac_cv_path_GREP+:} false; then : - $as_echo_n "(cached) " >&6 -else + + + + + + + +if test $ac_cv_header_stdlib_h = yes && test $ac_cv_header_string_h = yes +then : + +printf "%s\n" "#define STDC_HEADERS 1" >>confdefs.h + +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5 +printf %s "checking for grep that handles long lines and -e... " >&6; } +if test ${ac_cv_path_GREP+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -z "$GREP"; then ac_path_GREP_found=false # Loop through the user's path and test for each of PROGNAME-LIST @@ -3915,10 +4683,15 @@ else for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in grep ggrep; do + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in grep ggrep + do for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext" + ac_path_GREP="$as_dir$ac_prog$ac_exec_ext" as_fn_executable_p "$ac_path_GREP" || continue # Check for GNU ac_path_GREP and select it if it is found. 
# Check for GNU $ac_path_GREP @@ -3927,13 +4700,13 @@ case `"$ac_path_GREP" --version 2>&1` in ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;; *) ac_count=0 - $as_echo_n 0123456789 >"conftest.in" + printf %s 0123456789 >"conftest.in" while : do cat "conftest.in" "conftest.in" >"conftest.tmp" mv "conftest.tmp" "conftest.in" cp "conftest.in" "conftest.nl" - $as_echo 'GREP' >> "conftest.nl" + printf "%s\n" 'GREP' >> "conftest.nl" "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break as_fn_arith $ac_count + 1 && ac_count=$as_val @@ -3961,16 +4734,17 @@ else fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5 -$as_echo "$ac_cv_path_GREP" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5 +printf "%s\n" "$ac_cv_path_GREP" >&6; } GREP="$ac_cv_path_GREP" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5 -$as_echo_n "checking for egrep... " >&6; } -if ${ac_cv_path_EGREP+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5 +printf %s "checking for egrep... " >&6; } +if test ${ac_cv_path_EGREP+y} +then : + printf %s "(cached) " >&6 +else $as_nop if echo a | $GREP -E '(a|b)' >/dev/null 2>&1 then ac_cv_path_EGREP="$GREP -E" else @@ -3981,10 +4755,15 @@ else for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in egrep; do + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in egrep + do for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_EGREP="$as_dir/$ac_prog$ac_exec_ext" + ac_path_EGREP="$as_dir$ac_prog$ac_exec_ext" as_fn_executable_p "$ac_path_EGREP" || continue # Check for GNU ac_path_EGREP and select it if it is found. 
# Check for GNU $ac_path_EGREP @@ -3993,13 +4772,13 @@ case `"$ac_path_EGREP" --version 2>&1` in ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;; *) ac_count=0 - $as_echo_n 0123456789 >"conftest.in" + printf %s 0123456789 >"conftest.in" while : do cat "conftest.in" "conftest.in" >"conftest.tmp" mv "conftest.tmp" "conftest.in" cp "conftest.in" "conftest.nl" - $as_echo 'EGREP' >> "conftest.nl" + printf "%s\n" 'EGREP' >> "conftest.nl" "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break as_fn_arith $ac_count + 1 && ac_count=$as_val @@ -4028,163 +4807,171 @@ fi fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5 -$as_echo "$ac_cv_path_EGREP" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5 +printf "%s\n" "$ac_cv_path_EGREP" >&6; } EGREP="$ac_cv_path_EGREP" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ANSI C header files" >&5 -$as_echo_n "checking for ANSI C header files... " >&6; } -if ${ac_cv_header_stdc+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -#include -#include -#include -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_header_stdc=yes -else - ac_cv_header_stdc=no +ac_fn_c_check_header_compile "$LINENO" "alloca.h" "ac_cv_header_alloca_h" "$ac_includes_default" +if test "x$ac_cv_header_alloca_h" = xyes +then : + printf "%s\n" "#define HAVE_ALLOCA_H 1" >>confdefs.h + fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +ac_fn_c_check_header_compile "$LINENO" "arpa/inet.h" "ac_cv_header_arpa_inet_h" "$ac_includes_default" +if test "x$ac_cv_header_arpa_inet_h" = xyes +then : + printf "%s\n" "#define HAVE_ARPA_INET_H 1" >>confdefs.h -if test $ac_cv_header_stdc = yes; then - # SunOS 4.x string.h does not declare mem*, contrary to ANSI. 
- cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include +fi +ac_fn_c_check_header_compile "$LINENO" "ctype.h" "ac_cv_header_ctype_h" "$ac_includes_default" +if test "x$ac_cv_header_ctype_h" = xyes +then : + printf "%s\n" "#define HAVE_CTYPE_H 1" >>confdefs.h -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "memchr" >/dev/null 2>&1; then : +fi +ac_fn_c_check_header_compile "$LINENO" "dirent.h" "ac_cv_header_dirent_h" "$ac_includes_default" +if test "x$ac_cv_header_dirent_h" = xyes +then : + printf "%s\n" "#define HAVE_DIRENT_H 1" >>confdefs.h -else - ac_cv_header_stdc=no fi -rm -f conftest* +ac_fn_c_check_header_compile "$LINENO" "errno.h" "ac_cv_header_errno_h" "$ac_includes_default" +if test "x$ac_cv_header_errno_h" = xyes +then : + printf "%s\n" "#define HAVE_ERRNO_H 1" >>confdefs.h fi +ac_fn_c_check_header_compile "$LINENO" "fcntl.h" "ac_cv_header_fcntl_h" "$ac_includes_default" +if test "x$ac_cv_header_fcntl_h" = xyes +then : + printf "%s\n" "#define HAVE_FCNTL_H 1" >>confdefs.h -if test $ac_cv_header_stdc = yes; then - # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -#include +fi +ac_fn_c_check_header_compile "$LINENO" "inttypes.h" "ac_cv_header_inttypes_h" "$ac_includes_default" +if test "x$ac_cv_header_inttypes_h" = xyes +then : + printf "%s\n" "#define HAVE_INTTYPES_H 1" >>confdefs.h -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "free" >/dev/null 2>&1; then : +fi +ac_fn_c_check_header_compile "$LINENO" "locale.h" "ac_cv_header_locale_h" "$ac_includes_default" +if test "x$ac_cv_header_locale_h" = xyes +then : + printf "%s\n" "#define HAVE_LOCALE_H 1" >>confdefs.h -else - ac_cv_header_stdc=no fi -rm -f conftest* +ac_fn_c_check_header_compile "$LINENO" "malloc.h" "ac_cv_header_malloc_h" "$ac_includes_default" +if test "x$ac_cv_header_malloc_h" = xyes +then : + printf "%s\n" "#define HAVE_MALLOC_H 1" >>confdefs.h fi +ac_fn_c_check_header_compile "$LINENO" "netdb.h" "ac_cv_header_netdb_h" "$ac_includes_default" +if test "x$ac_cv_header_netdb_h" = xyes +then : + printf "%s\n" "#define HAVE_NETDB_H 1" >>confdefs.h -if test $ac_cv_header_stdc = yes; then - # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. - if test "$cross_compiling" = yes; then : - : -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -#include -#if ((' ' & 0x0FF) == 0x020) -# define ISLOWER(c) ('a' <= (c) && (c) <= 'z') -# define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) -#else -# define ISLOWER(c) \ - (('a' <= (c) && (c) <= 'i') \ - || ('j' <= (c) && (c) <= 'r') \ - || ('s' <= (c) && (c) <= 'z')) -# define TOUPPER(c) (ISLOWER(c) ? 
((c) | 0x40) : (c)) -#endif +fi +ac_fn_c_check_header_compile "$LINENO" "netinet/in.h" "ac_cv_header_netinet_in_h" "$ac_includes_default" +if test "x$ac_cv_header_netinet_in_h" = xyes +then : + printf "%s\n" "#define HAVE_NETINET_IN_H 1" >>confdefs.h -#define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) -int -main () -{ - int i; - for (i = 0; i < 256; i++) - if (XOR (islower (i), ISLOWER (i)) - || toupper (i) != TOUPPER (i)) - return 2; - return 0; -} -_ACEOF -if ac_fn_c_try_run "$LINENO"; then : +fi +ac_fn_c_check_header_compile "$LINENO" "pwd.h" "ac_cv_header_pwd_h" "$ac_includes_default" +if test "x$ac_cv_header_pwd_h" = xyes +then : + printf "%s\n" "#define HAVE_PWD_H 1" >>confdefs.h -else - ac_cv_header_stdc=no fi -rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ - conftest.$ac_objext conftest.beam conftest.$ac_ext +ac_fn_c_check_header_compile "$LINENO" "stdarg.h" "ac_cv_header_stdarg_h" "$ac_includes_default" +if test "x$ac_cv_header_stdarg_h" = xyes +then : + printf "%s\n" "#define HAVE_STDARG_H 1" >>confdefs.h + fi +ac_fn_c_check_header_compile "$LINENO" "stdint.h" "ac_cv_header_stdint_h" "$ac_includes_default" +if test "x$ac_cv_header_stdint_h" = xyes +then : + printf "%s\n" "#define HAVE_STDINT_H 1" >>confdefs.h fi +ac_fn_c_check_header_compile "$LINENO" "stdio.h" "ac_cv_header_stdio_h" "$ac_includes_default" +if test "x$ac_cv_header_stdio_h" = xyes +then : + printf "%s\n" "#define HAVE_STDIO_H 1" >>confdefs.h + fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stdc" >&5 -$as_echo "$ac_cv_header_stdc" >&6; } -if test $ac_cv_header_stdc = yes; then +ac_fn_c_check_header_compile "$LINENO" "stdlib.h" "ac_cv_header_stdlib_h" "$ac_includes_default" +if test "x$ac_cv_header_stdlib_h" = xyes +then : + printf "%s\n" "#define HAVE_STDLIB_H 1" >>confdefs.h -$as_echo "#define STDC_HEADERS 1" >>confdefs.h +fi +ac_fn_c_check_header_compile "$LINENO" "string.h" "ac_cv_header_string_h" "$ac_includes_default" +if test 
"x$ac_cv_header_string_h" = xyes +then : + printf "%s\n" "#define HAVE_STRING_H 1" >>confdefs.h fi +ac_fn_c_check_header_compile "$LINENO" "sys/dir.h" "ac_cv_header_sys_dir_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_dir_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_DIR_H 1" >>confdefs.h +fi +ac_fn_c_check_header_compile "$LINENO" "sys/param.h" "ac_cv_header_sys_param_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_param_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_PARAM_H 1" >>confdefs.h -# On IRIX 5.3, sys/types and inttypes.h are conflicting. -for ac_header in sys/types.h sys/stat.h stdlib.h string.h memory.h strings.h \ - inttypes.h stdint.h unistd.h -do : - as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` -ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default -" -if eval test \"x\$"$as_ac_Header"\" = x"yes"; then : - cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1 -_ACEOF +fi +ac_fn_c_check_header_compile "$LINENO" "sys/socket.h" "ac_cv_header_sys_socket_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_socket_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_SOCKET_H 1" >>confdefs.h fi +ac_fn_c_check_header_compile "$LINENO" "sys/stat.h" "ac_cv_header_sys_stat_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_stat_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_STAT_H 1" >>confdefs.h -done +fi +ac_fn_c_check_header_compile "$LINENO" "sys/time.h" "ac_cv_header_sys_time_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_time_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_TIME_H 1" >>confdefs.h +fi +ac_fn_c_check_header_compile "$LINENO" "sys/types.h" "ac_cv_header_sys_types_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_types_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_TYPES_H 1" >>confdefs.h -for ac_header in alloca.h arpa/inet.h ctype.h dirent.h errno.h \ - fcntl.h locale.h malloc.h netdb.h netinet/in.h pwd.h 
stdarg.h \ - stdio.h stdlib.h string.h sys/dir.h sys/param.h sys/socket.h \ - sys/stat.h sys/time.h sys/types.h time.h unistd.h -do : - as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` -ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" -if eval test \"x\$"$as_ac_Header"\" = x"yes"; then : - cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1 -_ACEOF +fi +ac_fn_c_check_header_compile "$LINENO" "time.h" "ac_cv_header_time_h" "$ac_includes_default" +if test "x$ac_cv_header_time_h" = xyes +then : + printf "%s\n" "#define HAVE_TIME_H 1" >>confdefs.h fi +ac_fn_c_check_header_compile "$LINENO" "unistd.h" "ac_cv_header_unistd_h" "$ac_includes_default" +if test "x$ac_cv_header_unistd_h" = xyes +then : + printf "%s\n" "#define HAVE_UNISTD_H 1" >>confdefs.h -done +fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether stat file-mode macros are broken" >&5 -$as_echo_n "checking whether stat file-mode macros are broken... " >&6; } -if ${ac_cv_header_stat_broken+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether stat file-mode macros are broken" >&5 +printf %s "checking whether stat file-mode macros are broken... " >&6; } +if test ${ac_cv_header_stat_broken+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include @@ -4207,36 +4994,38 @@ extern char c4[S_ISSOCK (S_IFREG) ? 
-1 : 1]; #endif _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_header_stat_broken=no -else +else $as_nop ac_cv_header_stat_broken=yes fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stat_broken" >&5 -$as_echo "$ac_cv_header_stat_broken" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stat_broken" >&5 +printf "%s\n" "$ac_cv_header_stat_broken" >&6; } if test $ac_cv_header_stat_broken = yes; then -$as_echo "#define STAT_MACROS_BROKEN 1" >>confdefs.h +printf "%s\n" "#define STAT_MACROS_BROKEN 1" >>confdefs.h fi ac_header_dirent=no for ac_hdr in dirent.h sys/ndir.h sys/dir.h ndir.h; do - as_ac_Header=`$as_echo "ac_cv_header_dirent_$ac_hdr" | $as_tr_sh` -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_hdr that defines DIR" >&5 -$as_echo_n "checking for $ac_hdr that defines DIR... " >&6; } -if eval \${$as_ac_Header+:} false; then : - $as_echo_n "(cached) " >&6 -else + as_ac_Header=`printf "%s\n" "ac_cv_header_dirent_$ac_hdr" | $as_tr_sh` +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_hdr that defines DIR" >&5 +printf %s "checking for $ac_hdr that defines DIR... " >&6; } +if eval test \${$as_ac_Header+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ #include #include <$ac_hdr> int -main () +main (void) { if ((DIR *) 0) return 0; @@ -4244,19 +5033,21 @@ return 0; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : eval "$as_ac_Header=yes" -else +else $as_nop eval "$as_ac_Header=no" fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi eval ac_res=\$$as_ac_Header - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -if eval test \"x\$"$as_ac_Header"\" = x"yes"; then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } +if eval test \"x\$"$as_ac_Header"\" = x"yes" +then : cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_hdr" | $as_tr_cpp` 1 +#define `printf "%s\n" "HAVE_$ac_hdr" | $as_tr_cpp` 1 _ACEOF ac_header_dirent=$ac_hdr; break @@ -4265,11 +5056,12 @@ fi done # Two versions of opendir et al. are in -ldir and -lx on SCO Xenix. if test $ac_header_dirent = dirent.h; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing opendir" >&5 -$as_echo_n "checking for library containing opendir... " >&6; } -if ${ac_cv_search_opendir+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for library containing opendir" >&5 +printf %s "checking for library containing opendir... " >&6; } +if test ${ac_cv_search_opendir+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_func_search_save_LIBS=$LIBS cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ @@ -4277,56 +5069,59 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char opendir (); int -main () +main (void) { return opendir (); ; return 0; } _ACEOF -for ac_lib in '' dir; do +for ac_lib in '' dir +do if test -z "$ac_lib"; then ac_res="none required" else ac_res=-l$ac_lib LIBS="-l$ac_lib $ac_func_search_save_LIBS" fi - if ac_fn_c_try_link "$LINENO"; then : + if ac_fn_c_try_link "$LINENO" +then : ac_cv_search_opendir=$ac_res fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext - if ${ac_cv_search_opendir+:} false; then : + if test ${ac_cv_search_opendir+y} +then : break fi done -if ${ac_cv_search_opendir+:} false; then : +if test ${ac_cv_search_opendir+y} +then : -else +else $as_nop ac_cv_search_opendir=no fi rm conftest.$ac_ext LIBS=$ac_func_search_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_opendir" >&5 -$as_echo "$ac_cv_search_opendir" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_opendir" >&5 +printf "%s\n" "$ac_cv_search_opendir" >&6; } ac_res=$ac_cv_search_opendir -if test "$ac_res" != no; then : +if test "$ac_res" != no +then : test "$ac_res" = "none required" || LIBS="$ac_res $LIBS" fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing opendir" >&5 -$as_echo_n "checking for library containing opendir... " >&6; } -if ${ac_cv_search_opendir+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for library containing opendir" >&5 +printf %s "checking for library containing opendir... " >&6; } +if test ${ac_cv_search_opendir+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_func_search_save_LIBS=$LIBS cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ @@ -4334,101 +5129,185 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. 
Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif char opendir (); int -main () +main (void) { return opendir (); ; return 0; } _ACEOF -for ac_lib in '' x; do +for ac_lib in '' x +do if test -z "$ac_lib"; then ac_res="none required" else ac_res=-l$ac_lib LIBS="-l$ac_lib $ac_func_search_save_LIBS" fi - if ac_fn_c_try_link "$LINENO"; then : + if ac_fn_c_try_link "$LINENO" +then : ac_cv_search_opendir=$ac_res fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext - if ${ac_cv_search_opendir+:} false; then : + if test ${ac_cv_search_opendir+y} +then : break fi done -if ${ac_cv_search_opendir+:} false; then : +if test ${ac_cv_search_opendir+y} +then : -else +else $as_nop ac_cv_search_opendir=no fi rm conftest.$ac_ext LIBS=$ac_func_search_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_opendir" >&5 -$as_echo "$ac_cv_search_opendir" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_opendir" >&5 +printf "%s\n" "$ac_cv_search_opendir" >&6; } ac_res=$ac_cv_search_opendir -if test "$ac_res" != no; then : +if test "$ac_res" != no +then : test "$ac_res" = "none required" || LIBS="$ac_res $LIBS" fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether time.h and sys/time.h may both be included" >&5 -$as_echo_n "checking whether time.h and sys/time.h may both be included... " >&6; } -if ${ac_cv_header_time+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext + + +# Obsolete code to be removed. +if test $ac_cv_header_sys_time_h = yes; then + +printf "%s\n" "#define TIME_WITH_SYS_TIME 1" >>confdefs.h + +fi +# End of obsolete code. 
+ + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC options needed to detect all undeclared functions" >&5 +printf %s "checking for $CC options needed to detect all undeclared functions... " >&6; } +if test ${ac_cv_c_undeclared_builtin_options+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_save_CFLAGS=$CFLAGS + ac_cv_c_undeclared_builtin_options='cannot detect' + for ac_arg in '' -fno-builtin; do + CFLAGS="$ac_save_CFLAGS $ac_arg" + # This test program should *not* compile successfully. + cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ -#include -#include -#include int -main () +main (void) { -if ((struct tm *) 0) -return 0; +(void) strchr; ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_header_time=yes -else - ac_cv_header_time=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_time" >&5 -$as_echo "$ac_cv_header_time" >&6; } -if test $ac_cv_header_time = yes; then +if ac_fn_c_try_compile "$LINENO" +then : + +else $as_nop + # This test program should compile successfully. + # No library function is consistently available on + # freestanding implementations, so test against a dummy + # declaration. Include always-available headers on the + # off chance that they somehow elicit warnings. + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ +#include <float.h> +#include <limits.h> +#include <stdarg.h> +#include <stddef.h> +extern void ac_decl (int, char *); -$as_echo "#define TIME_WITH_SYS_TIME 1" >>confdefs.h +int +main (void) +{ +(void) ac_decl (0, (char *) 0); + (void) ac_decl; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + if test x"$ac_arg" = x +then : + ac_cv_c_undeclared_builtin_options='none needed' +else $as_nop + ac_cv_c_undeclared_builtin_options=$ac_arg fi + break +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + done + CFLAGS=$ac_save_CFLAGS + +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_undeclared_builtin_options" >&5 +printf "%s\n" "$ac_cv_c_undeclared_builtin_options" >&6; } + case $ac_cv_c_undeclared_builtin_options in #( + 'cannot detect') : + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} +as_fn_error $? "cannot make $CC report undeclared builtins +See \`config.log' for more details" "$LINENO" 5; } ;; #( + 'none needed') : + ac_c_undeclared_builtin_options='' ;; #( + *) : + ac_c_undeclared_builtin_options=$ac_cv_c_undeclared_builtin_options ;; +esac - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether struct tm is in sys/time.h or time.h" >&5 -$as_echo_n "checking whether struct tm is in sys/time.h or time.h... 
" >&6; } -if ${ac_cv_struct_tm+:} false; then : - $as_echo_n "(cached) " >&6 -else +ac_fn_check_decl "$LINENO" "isblank" "ac_cv_have_decl_isblank" "$ac_includes_default" "$ac_c_undeclared_builtin_options" "CFLAGS" +if test "x$ac_cv_have_decl_isblank" = xyes +then : + ac_have_decl=1 +else $as_nop + ac_have_decl=0 +fi +printf "%s\n" "#define HAVE_DECL_ISBLANK $ac_have_decl" >>confdefs.h +ac_fn_check_decl "$LINENO" "strcasecmp" "ac_cv_have_decl_strcasecmp" "$ac_includes_default" "$ac_c_undeclared_builtin_options" "CFLAGS" +if test "x$ac_cv_have_decl_strcasecmp" = xyes +then : + ac_have_decl=1 +else $as_nop + ac_have_decl=0 +fi +printf "%s\n" "#define HAVE_DECL_STRCASECMP $ac_have_decl" >>confdefs.h +ac_fn_check_decl "$LINENO" "strcasestr" "ac_cv_have_decl_strcasestr" "$ac_includes_default" "$ac_c_undeclared_builtin_options" "CFLAGS" +if test "x$ac_cv_have_decl_strcasestr" = xyes +then : + ac_have_decl=1 +else $as_nop + ac_have_decl=0 +fi +printf "%s\n" "#define HAVE_DECL_STRCASESTR $ac_have_decl" >>confdefs.h + + + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether struct tm is in sys/time.h or time.h" >&5 +printf %s "checking whether struct tm is in sys/time.h or time.h... " >&6; } +if test ${ac_cv_struct_tm+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ #include #include int -main () +main (void) { struct tm tm; int *p = &tm.tm_sec; @@ -4437,37 +5316,39 @@ struct tm tm; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_struct_tm=time.h -else +else $as_nop ac_cv_struct_tm=sys/time.h fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_struct_tm" >&5 -$as_echo "$ac_cv_struct_tm" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_struct_tm" >&5 +printf "%s\n" "$ac_cv_struct_tm" >&6; } if test $ac_cv_struct_tm = sys/time.h; then -$as_echo "#define TM_IN_SYS_TIME 1" >>confdefs.h +printf "%s\n" "#define TM_IN_SYS_TIME 1" >>confdefs.h fi -for ac_func in strftime + + for ac_func in strftime do : ac_fn_c_check_func "$LINENO" "strftime" "ac_cv_func_strftime" -if test "x$ac_cv_func_strftime" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_STRFTIME 1 -_ACEOF +if test "x$ac_cv_func_strftime" = xyes +then : + printf "%s\n" "#define HAVE_STRFTIME 1" >>confdefs.h -else +else $as_nop # strftime is in -lintl on SCO UNIX. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for strftime in -lintl" >&5 -$as_echo_n "checking for strftime in -lintl... " >&6; } -if ${ac_cv_lib_intl_strftime+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for strftime in -lintl" >&5 +printf %s "checking for strftime in -lintl... " >&6; } +if test ${ac_cv_lib_intl_strftime+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lintl $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -4476,66 +5357,178 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char strftime (); int -main () +main (void) { return strftime (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_intl_strftime=yes -else +else $as_nop ac_cv_lib_intl_strftime=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_intl_strftime" >&5 -$as_echo "$ac_cv_lib_intl_strftime" >&6; } -if test "x$ac_cv_lib_intl_strftime" = xyes; then : - $as_echo "#define HAVE_STRFTIME 1" >>confdefs.h +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_intl_strftime" >&5 +printf "%s\n" "$ac_cv_lib_intl_strftime" >&6; } +if test "x$ac_cv_lib_intl_strftime" = xyes +then : + printf "%s\n" "#define HAVE_STRFTIME 1" >>confdefs.h LIBS="-lintl $LIBS" fi fi + done +ac_fn_c_check_func "$LINENO" "getopt" "ac_cv_func_getopt" +if test "x$ac_cv_func_getopt" = xyes +then : + printf "%s\n" "#define HAVE_GETOPT 1" >>confdefs.h -for ac_func in mkdir strdup strstr strtol memcpy memset lstat strcasecmp \ - strcasestr getpwuid getopt snprintf memmove strerror -do : - as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` -ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" -if eval test \"x\$"$as_ac_var"\" = x"yes"; then : - cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1 +fi +ac_fn_c_check_func "$LINENO" "getpwuid" "ac_cv_func_getpwuid" +if test "x$ac_cv_func_getpwuid" = xyes +then : + printf "%s\n" "#define HAVE_GETPWUID 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "isblank" "ac_cv_func_isblank" +if test "x$ac_cv_func_isblank" = xyes +then : + printf "%s\n" "#define HAVE_ISBLANK 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "lstat" "ac_cv_func_lstat" +if test "x$ac_cv_func_lstat" = xyes +then : + printf "%s\n" "#define HAVE_LSTAT 1" >>confdefs.h + 
+fi +ac_fn_c_check_func "$LINENO" "memcpy" "ac_cv_func_memcpy" +if test "x$ac_cv_func_memcpy" = xyes +then : + printf "%s\n" "#define HAVE_MEMCPY 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "memmove" "ac_cv_func_memmove" +if test "x$ac_cv_func_memmove" = xyes +then : + printf "%s\n" "#define HAVE_MEMMOVE 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "memset" "ac_cv_func_memset" +if test "x$ac_cv_func_memset" = xyes +then : + printf "%s\n" "#define HAVE_MEMSET 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "mkdir" "ac_cv_func_mkdir" +if test "x$ac_cv_func_mkdir" = xyes +then : + printf "%s\n" "#define HAVE_MKDIR 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "snprintf" "ac_cv_func_snprintf" +if test "x$ac_cv_func_snprintf" = xyes +then : + printf "%s\n" "#define HAVE_SNPRINTF 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strcasecmp" "ac_cv_func_strcasecmp" +if test "x$ac_cv_func_strcasecmp" = xyes +then : + printf "%s\n" "#define HAVE_STRCASECMP 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strcasestr" "ac_cv_func_strcasestr" +if test "x$ac_cv_func_strcasestr" = xyes +then : + printf "%s\n" "#define HAVE_STRCASESTR 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strdup" "ac_cv_func_strdup" +if test "x$ac_cv_func_strdup" = xyes +then : + printf "%s\n" "#define HAVE_STRDUP 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror" +if test "x$ac_cv_func_strerror" = xyes +then : + printf "%s\n" "#define HAVE_STRERROR 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strstr" "ac_cv_func_strstr" +if test "x$ac_cv_func_strstr" = xyes +then : + printf "%s\n" "#define HAVE_STRSTR 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strtol" "ac_cv_func_strtol" +if test "x$ac_cv_func_strtol" = xyes +then : + printf "%s\n" "#define HAVE_STRTOL 1" >>confdefs.h + +fi + + + + ac_fn_c_check_type "$LINENO" "intptr_t" "ac_cv_type_intptr_t" "$ac_includes_default" +if test "x$ac_cv_type_intptr_t" = xyes +then : + 
+printf "%s\n" "#define HAVE_INTPTR_T 1" >>confdefs.h + +else $as_nop + for ac_type in 'int' 'long int' 'long long int'; do + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$ac_includes_default +int +main (void) +{ +static int test_array [1 - 2 * !(sizeof (void *) <= sizeof ($ac_type))]; +test_array [0] = 0; +return test_array [0]; + + ; + return 0; +} _ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + +printf "%s\n" "#define intptr_t $ac_type" >>confdefs.h + ac_type= +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + test -z "$ac_type" && break + done fi -done ac_fn_c_check_type "$LINENO" "size_t" "ac_cv_type_size_t" "$ac_includes_default" -if test "x$ac_cv_type_size_t" = xyes; then : +if test "x$ac_cv_type_size_t" = xyes +then : -else +else $as_nop -cat >>confdefs.h <<_ACEOF -#define size_t unsigned int -_ACEOF +printf "%s\n" "#define size_t unsigned int" >>confdefs.h fi if test $ac_cv_func_snprintf != no; then - $as_echo "#define HAVE_SNPRINTF 1" >>confdefs.h + printf "%s\n" "#define HAVE_SNPRINTF 1" >>confdefs.h fi @@ -4554,30 +5547,49 @@ fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for pkg-config" >&5 +printf "%s\n" "$as_me: checking for pkg-config" >&6;} +if test x"$PKGCONF" = x ; then + as_fn_error $? "'pkgconf' is missing; please install it." "$LINENO" 5 +fi + + + # Check whether --with-gdbm was given. 
-if test "${with_gdbm+set}" = set; then : +if test ${with_gdbm+y} +then : withval=$with_gdbm; given_gdbm=$withval fi -if test "$given_gdbm" != "no"; then - for i in /usr/local /usr $withval; do - if test -f "$i/include/gdbm.h"; then - GDBM_INCLUDE="$i/include" - THIS_PREFIX="$i" - fi - done +if test "${given_gdbm}" = "no"; then + given_gdbm="" +fi + +if test -n "${given_gdbm}" -a "${given_gdbm}" != "yes" -a "${given_gdbm}" != "/usr"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for \"${given_gdbm}/include/gdbm.h\"" >&5 +printf %s "checking for \"${given_gdbm}/include/gdbm.h\"... " >&6; } + if test -f "${given_gdbm}/include/gdbm.h"; then + GDBM_INCLUDE="${given_gdbm}/include" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + fi - unset ac_cv_lib_gdbm_gdbm_open + if test -n "${GDBM_INCLUDE}"; then + unset ac_cv_lib_gdbm_gdbm_open old_LDFLAGS="$LDFLAGS" - LDFLAGS="-L$THIS_PREFIX/lib $LDFLAGS" - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for gdbm_open in -lgdbm" >&5 -$as_echo_n "checking for gdbm_open in -lgdbm... " >&6; } -if ${ac_cv_lib_gdbm_gdbm_open+:} false; then : - $as_echo_n "(cached) " >&6 -else + LDFLAGS="-L${given_gdbm}/lib $LDFLAGS" + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for gdbm_open in -lgdbm" >&5 +printf %s "checking for gdbm_open in -lgdbm... " >&6; } +if test ${ac_cv_lib_gdbm_gdbm_open+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lgdbm $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -4586,51 +5598,124 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char gdbm_open (); int -main () +main (void) { return gdbm_open (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_gdbm_gdbm_open=yes -else +else $as_nop ac_cv_lib_gdbm_gdbm_open=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_gdbm_gdbm_open" >&5 -$as_echo "$ac_cv_lib_gdbm_gdbm_open" >&6; } -if test "x$ac_cv_lib_gdbm_gdbm_open" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_gdbm_gdbm_open" >&5 +printf "%s\n" "$ac_cv_lib_gdbm_gdbm_open" >&6; } +if test "x$ac_cv_lib_gdbm_gdbm_open" = xyes +then : GDBM_LIB="-lgdbm" +else $as_nop + GDBM_INCLUDE="" fi LDFLAGS="$old_LDFLAGS" - if test "$THIS_PREFIX" != "" && test "$THIS_PREFIX" != "/usr"; then - THIS_LFLAGS="$THIS_PREFIX/lib" - fi + if test -n "${GDBM_LIB}"; then + + if test "x$LDFLAGS" = "x"; then + test "x$silent" != "xyes" && echo " setting LDFLAGS to \""-L${given_gdbm}/lib"\"" + LDFLAGS=""-L${given_gdbm}/lib"" + else + apr_addto_bugger=""-L${given_gdbm}/lib"" + for i in $apr_addto_bugger; do + apr_addto_duplicate="0" + for j in $LDFLAGS; do + if test "x$i" = "x$j"; then + apr_addto_duplicate="1" + break + fi + done + if test $apr_addto_duplicate = "0"; then + test "x$silent" != "xyes" && echo " adding \"$i\" to LDFLAGS" + LDFLAGS="$LDFLAGS $i" + fi + done + fi + + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: gdbm includes and/or libs not found under \"${given_gdbm}\"" >&5 +printf "%s\n" "$as_me: WARNING: gdbm includes and/or libs not found under \"${given_gdbm}\"" >&2;} + fi + fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for GDBM support" >&5 -$as_echo_n "checking for GDBM support... 
" >&6; } +if test -n "${given_gdbm}" -a -z "${GDBM_INCLUDE}"; then + gdbm_headers_found=1 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: Trying /usr/include/gdbm.h" >&5 +printf "%s\n" "$as_me: Trying /usr/include/gdbm.h" >&6;} + ac_fn_c_check_header_compile "$LINENO" "gdbm.h" "ac_cv_header_gdbm_h" "$ac_includes_default" +if test "x$ac_cv_header_gdbm_h" = xyes +then : + GDBM_INCLUDE="" +else $as_nop -if test "$GDBM_LIB" = "" && test "$given_gdbm" != "no"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for gdbm_open in -lgdbm" >&5 -$as_echo_n "checking for gdbm_open in -lgdbm... " >&6; } -if ${ac_cv_lib_gdbm_gdbm_open+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: Trying /usr/local/include/gdbm.h" >&5 +printf "%s\n" "$as_me: Trying /usr/local/include/gdbm.h" >&6;} + ac_fn_c_check_header_compile "$LINENO" "/usr/local/include/gdbm.h" "ac_cv_header__usr_local_include_gdbm_h" "$ac_includes_default" +if test "x$ac_cv_header__usr_local_include_gdbm_h" = xyes +then : + GDBM_INCLUDE="-I/usr/local/include" +else $as_nop + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: Trying /opt/local/include/gdbm.h" >&5 +printf "%s\n" "$as_me: Trying /opt/local/include/gdbm.h" >&6;} + ac_fn_c_check_header_compile "$LINENO" "/opt/local/include/gdbm.h" "ac_cv_header__opt_local_include_gdbm_h" "$ac_includes_default" +if test "x$ac_cv_header__opt_local_include_gdbm_h" = xyes +then : + GDBM_INCLUDE="-I/opt/local/include" +else $as_nop + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: Trying /usr/pkg/include/gdbm.h" >&5 +printf "%s\n" "$as_me: Trying /usr/pkg/include/gdbm.h" >&6;} + ac_fn_c_check_header_compile "$LINENO" "/usr/pkg/include/gdbm.h" "ac_cv_header__usr_pkg_include_gdbm_h" "$ac_includes_default" +if test "x$ac_cv_header__usr_pkg_include_gdbm_h" = xyes +then : + GDBM_INCLUDE="" +else $as_nop + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: gdbm.h not found" >&5 +printf "%s\n" "gdbm.h not found" >&6; } + 
gdbm_headers_found=0 + +fi + + +fi + + +fi + + +fi + + + if test ${gdbm_headers_found} -eq 1; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for gdbm_open in -lgdbm" >&5 +printf %s "checking for gdbm_open in -lgdbm... " >&6; } +if test ${ac_cv_lib_gdbm_gdbm_open+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lgdbm $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -4639,103 +5724,46 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif char gdbm_open (); int -main () +main (void) { return gdbm_open (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_gdbm_gdbm_open=yes -else +else $as_nop ac_cv_lib_gdbm_gdbm_open=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_gdbm_gdbm_open" >&5 -$as_echo "$ac_cv_lib_gdbm_gdbm_open" >&6; } -if test "x$ac_cv_lib_gdbm_gdbm_open" = xyes; then : - -$as_echo "#define GDBM 1" >>confdefs.h - DBM_TYPE=gdbm; GDBM_LIB=-lgdbm -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_gdbm_gdbm_open" >&5 +printf "%s\n" "$ac_cv_lib_gdbm_gdbm_open" >&6; } +if test "x$ac_cv_lib_gdbm_gdbm_open" = xyes +then : + DBM_TYPE=gdbm; GDBM_LIB=-lgdbm +else $as_nop DBM_TYPE="" fi - { $as_echo "$as_me:${as_lineno-$LINENO}: checking gdbm library" >&5 -$as_echo_n "checking gdbm library... 
" >&6; } - if test "a$DBM_TYPE" = a; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: none found" >&5 -$as_echo "none found" >&6; } - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: No gdbm library found - will limit a few features" >&5 -$as_echo "$as_me: WARNING: No gdbm library found - will limit a few features" >&2;} - else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $DBM_TYPE found" >&5 -$as_echo "$DBM_TYPE found" >&6; } fi -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - -if test "$GDBM_LIB" = "-lgdbm" && test "$GDBM_INCLUDE" = ""; then - ac_fn_c_check_header_mongrel "$LINENO" "gdbm.h" "ac_cv_header_gdbm_h" "$ac_includes_default" -if test "x$ac_cv_header_gdbm_h" = xyes; then : - GDBM_INCLUDE="" -else - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: Try /usr/local/include/gdbm.h" >&5 -$as_echo "Try /usr/local/include/gdbm.h" >&6; } - ac_fn_c_check_header_mongrel "$LINENO" "/usr/local/include/gdbm.h" "ac_cv_header__usr_local_include_gdbm_h" "$ac_includes_default" -if test "x$ac_cv_header__usr_local_include_gdbm_h" = xyes; then : - GDBM_INCLUDE="-I/usr/local/include" -else - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: Try /opt/local/include/gdbm.h" >&5 -$as_echo "Try /opt/local/include/gdbm.h" >&6; } - ac_fn_c_check_header_mongrel "$LINENO" "/opt/local/include/gdbm.h" "ac_cv_header__opt_local_include_gdbm_h" "$ac_includes_default" -if test "x$ac_cv_header__opt_local_include_gdbm_h" = xyes; then : - GDBM_INCLUDE="-I/opt/local/include" -else - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: Try /usr/pkg/include/gdbm.h" >&5 -$as_echo "Try /usr/pkg/include/gdbm.h" >&6; } - ac_fn_c_check_header_mongrel "$LINENO" "/usr/pkg/include/gdbm.h" "ac_cv_header__usr_pkg_include_gdbm_h" "$ac_includes_default" -if test "x$ac_cv_header__usr_pkg_include_gdbm_h" = xyes; then : - GDBM_INCLUDE="" -else - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: Giving up - You need to install gdbm.h somewhere" >&5 
-$as_echo "Giving up - You need to install gdbm.h somewhere" >&6; } - exit - -fi - - + unset gdbm_headers_found fi +if test -n "${GDBM_LIB}"; then + printf "%s\n" "#define HAVE_GDBM_H 1" >>confdefs.h -fi - +printf "%s\n" "#define GDBM 1" >>confdefs.h -fi - - -fi - -if test -n "$GDBM_LIB"; then - if test "$GDBM_INCLUDE" != "/usr/include"; then if test -z "$GDBM_INCLUDE" || echo "$GDBM_INCLUDE" | grep '^/' >/dev/null ; then @@ -4762,61 +5790,56 @@ if test -n "$GDBM_LIB"; then fi - $as_echo "#define HAVE_GDBM_H 1" >>confdefs.h - - EXTRA_LIBS="$EXTRA_LIBS $GDBM_LIB" + EXTRA_LIBS="$EXTRA_LIBS $GDBM_LIB" +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: hypermail's usegdbm config option won't be available unless you install libgdbm-dev" >&5 +printf "%s\n" "$as_me: WARNING: hypermail's usegdbm config option won't be available unless you install libgdbm-dev" >&2;} fi + for ac_header in iconv.h +do : + ac_fn_c_check_header_compile "$LINENO" "iconv.h" "ac_cv_header_iconv_h" "$ac_includes_default" +if test "x$ac_cv_header_iconv_h" = xyes +then : + printf "%s\n" "#define HAVE_ICONV_H 1" >>confdefs.h -# Check whether --enable-i18n was given. -if test "${enable_i18n+set}" = set; then : - enableval=$enable_i18n; given_iconv=$enableval -fi - -if test "$given_iconv" = "no"; then - echo "disabled I18N support." -else - for ac_func in iconv -do : - ac_fn_c_check_func "$LINENO" "iconv" "ac_cv_func_iconv" -if test "x$ac_cv_func_iconv" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_ICONV 1 -_ACEOF - +else $as_nop + as_fn_error $? 
"unable to find iconv.h headers (libc6-dev may be missing)" "$LINENO" 5 fi + done - for ac_header in iconv.h + for ac_func in iconv do : - ac_fn_c_check_header_mongrel "$LINENO" "iconv.h" "ac_cv_header_iconv_h" "$ac_includes_default" -if test "x$ac_cv_header_iconv_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_ICONV_H 1 -_ACEOF + ac_fn_c_check_func "$LINENO" "iconv" "ac_cv_func_iconv" +if test "x$ac_cv_func_iconv" = xyes +then : + printf "%s\n" "#define HAVE_ICONV 1" >>confdefs.h +else $as_nop + as_fn_error $? "unable to find iconv() function" "$LINENO" 5 fi done -fi - # Check whether --enable-system_libtrio was given. -if test "${enable_system_libtrio+set}" = set; then : +if test ${enable_system_libtrio+y} +then : enableval=$enable_system_libtrio; fi if test "${enable_system_libtrio}" = yes; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for system libtrio" >&5 -$as_echo "$as_me: checking for system libtrio" >&6;} - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing trio_printf" >&5 -$as_echo_n "checking for library containing trio_printf... " >&6; } -if ${ac_cv_search_trio_printf+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for system libtrio" >&5 +printf "%s\n" "$as_me: checking for system libtrio" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for library containing trio_printf" >&5 +printf %s "checking for library containing trio_printf... " >&6; } +if test ${ac_cv_search_trio_printf+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_func_search_save_LIBS=$LIBS cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ @@ -4824,65 +5847,67 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char trio_printf (); int -main () +main (void) { return trio_printf (); ; return 0; } _ACEOF -for ac_lib in '' trio; do +for ac_lib in '' trio +do if test -z "$ac_lib"; then ac_res="none required" else ac_res=-l$ac_lib LIBS="-l$ac_lib $ac_func_search_save_LIBS" fi - if ac_fn_c_try_link "$LINENO"; then : + if ac_fn_c_try_link "$LINENO" +then : ac_cv_search_trio_printf=$ac_res fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext - if ${ac_cv_search_trio_printf+:} false; then : + if test ${ac_cv_search_trio_printf+y} +then : break fi done -if ${ac_cv_search_trio_printf+:} false; then : +if test ${ac_cv_search_trio_printf+y} +then : -else +else $as_nop ac_cv_search_trio_printf=no fi rm conftest.$ac_ext LIBS=$ac_func_search_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_trio_printf" >&5 -$as_echo "$ac_cv_search_trio_printf" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_trio_printf" >&5 +printf "%s\n" "$ac_cv_search_trio_printf" >&6; } ac_res=$ac_cv_search_trio_printf -if test "$ac_res" != no; then : +if test "$ac_res" != no +then : test "$ac_res" = "none required" || LIBS="$ac_res $LIBS" -else +else $as_nop as_fn_error $? "unable to find trio_copy() function" "$LINENO" 5 fi - ac_fn_c_check_header_mongrel "$LINENO" "trio.h" "ac_cv_header_trio_h" "$ac_includes_default" -if test "x$ac_cv_header_trio_h" = xyes; then : + ac_fn_c_check_header_compile "$LINENO" "trio.h" "ac_cv_header_trio_h" "$ac_includes_default" +if test "x$ac_cv_header_trio_h" = xyes +then : -else +else $as_nop as_fn_error $? 
"unable to find trio.h header file (dev version not installed?)" "$LINENO" 5 fi - else - { $as_echo "$as_me:${as_lineno-$LINENO}: using bundled libtrio" >&5 -$as_echo "$as_me: using bundled libtrio" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: using bundled libtrio" >&5 +printf "%s\n" "$as_me: using bundled libtrio" >&6;} if test "x$INCLUDES" = "x"; then test "x$silent" != "xyes" && echo " setting INCLUDES to \""-Itrio"\"" @@ -4976,8 +6001,8 @@ _ACEOF case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -5007,15 +6032,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; /^ac_cv_env_/b end t clear :clear - s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ + s/^\([^=]*\)=\(.*[{}].*\)$/test ${\1+y} || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then if test "x$cache_file" != "x/dev/null"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 -$as_echo "$as_me: updating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 +printf "%s\n" "$as_me: updating cache $cache_file" >&6;} if test ! 
-f "$cache_file" || test -h "$cache_file"; then cat confcache >"$cache_file" else @@ -5029,8 +6054,8 @@ $as_echo "$as_me: updating cache $cache_file" >&6;} fi fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 -$as_echo "$as_me: not updating unwritable cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 +printf "%s\n" "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache @@ -5061,7 +6086,7 @@ rm -f confcache for apr_configure_arg in $ac_configure_args do case "$apr_configure_arg" in - --with-pcre=*|\'--with-pcre=*) + --with-pcre2=*|\'--with-pcre2=*) continue ;; esac apr_configure_args="$apr_configure_args$apr_sep'$apr_configure_arg'" @@ -5088,16 +6113,16 @@ rm -f confcache # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. if test /dev/null != "$cache_file" && test -f "$cache_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 -$as_echo "$as_me: loading cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 +printf "%s\n" "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 -$as_echo "$as_me: creating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 +printf "%s\n" "$as_me: creating cache $cache_file" >&6;} >$cache_file fi @@ -5107,52 +6132,54 @@ fi # Check whether --enable-bundled_pcre was given. -if test "${enable_bundled_pcre+set}" = set; then : +if test ${enable_bundled_pcre+y} +then : enableval=$enable_bundled_pcre; fi -# Check whether --with-external_pcre was given. 
-if test "${with_external_pcre+set}" = set; then : - withval=$with_external_pcre; +# Check whether --with-external_pcre2 was given. +if test ${with_external_pcre2+y} +then : + withval=$with_external_pcre2; fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for PCRE regular expressions library" >&5 -$as_echo "$as_me: checking for PCRE regular expressions library" >&6;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for PCRE2 regular expressions library" >&5 +printf "%s\n" "$as_me: checking for PCRE2 regular expressions library" >&6;} -if test ! -z ${enable_bundled_pcre}; then - with_external_pcre="" +if test ! -z ${enable_bundled_pcre2}; then + with_external_pcre2="" fi -case $with_external_pcre in - /*) { $as_echo "$as_me:${as_lineno-$LINENO}: --with-external-pcre => checking for an external libpcre" >&5 -$as_echo "$as_me: --with-external-pcre => checking for an external libpcre" >&6;} - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for pcre-config" >&5 -$as_echo_n "checking for pcre-config... " >&6; } - if test -d "$with_external_pcre" && test -x "$with_external_pcre/pcre-config"; then - PCRE_CONFIG=$with_external_pcre/pcre-config - elif test -x "$with_external_pcre"; then - PCRE_CONFIG=$with_external_pcre +case $with_external_pcre2 in + /*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: --with-external-pcre2 => checking for an external libpcre2" >&5 +printf "%s\n" "$as_me: --with-external-pcre2 => checking for an external libpcre2" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for pcre2-config" >&5 +printf %s "checking for pcre2-config... " >&6; } + if test -d "$with_external_pcre2" && test -x "$with_external_pcre2/pcre2-config"; then + PCRE2_CONFIG=$with_external_pcre2/pcre2-config + elif test -x "$with_external_pcre2"; then + PCRE2_CONFIG=$with_external_pcre2 else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } - as_fn_error $? 
"${PCRE_CONFIG} does not point to a directory or pcre-config" "$LINENO" 5 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + as_fn_error $? "${PCRE2_CONFIG} does not point to a directory or pcre2-config" "$LINENO" 5 fi - { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${PCRE_CONFIG}" >&5 -$as_echo "${PCRE_CONFIG}" >&6; } - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for pcre version equal or greater than ${PCRE_MIN_VERSION}" >&5 -$as_echo_n "checking for pcre version equal or greater than ${PCRE_MIN_VERSION}... " >&6; } - if test -f "${PCRE_CONFIG}" && test -x "${PCRE_CONFIG}"; then - PCRE_VERSION=$(${PCRE_CONFIG} --version) - as_arg_v1="${PCRE_VERSION}" -as_arg_v2="${PCRE_MIN_VERSION}" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ${PCRE2_CONFIG}" >&5 +printf "%s\n" "${PCRE2_CONFIG}" >&6; } + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for pcre2 version equal or greater than ${PCRE2_MIN_VERSION}" >&5 +printf %s "checking for pcre2 version equal or greater than ${PCRE2_MIN_VERSION}... " >&6; } + if test -f "${PCRE2_CONFIG}" && test -x "${PCRE2_CONFIG}"; then + PCRE2_VERSION=$(${PCRE2_CONFIG} --version) + as_arg_v1="${PCRE2_VERSION}" +as_arg_v2="${PCRE2_MIN_VERSION}" awk "$as_awk_strverscmp" v1="$as_arg_v1" v2="$as_arg_v2" /dev/null case $? in #( 1) : - PCRE_CONFIG=false ;; #( + PCRE2_CONFIG=false ;; #( 0) : ;; #( 2) : @@ -5160,49 +6187,54 @@ case $? in #( *) : ;; esac - if test "${PCRE_CONFIG}" != false; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + if test "${PCRE2_CONFIG}" != false; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } - as_fn_error $? 
"the PCRE library version must be equal or greater than ${PCRE_MIN_VERSION}" "$LINENO" 5 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + as_fn_error $? "the PCRE2 library version must be equal or greater than ${PCRE2_MIN_VERSION}" "$LINENO" 5 fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } - as_fn_error $? "${PCRE_CONFIG} does not point to a script" "$LINENO" 5 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + as_fn_error $? "${PCRE2_CONFIG} does not point to a script" "$LINENO" 5 fi ;; - *) PCRE_CONFIG=false + *) PCRE2_CONFIG=false ;; esac -if test -z "${enable_bundled_pcre}" && test -z "${PCRE_CONFIG}" -o "${PCRE_CONFIG}" = "false"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for a system libpcre" >&5 -$as_echo "$as_me: checking for a system libpcre" >&6;} - # Extract the first word of "pcre-config", so it can be a program name with args. -set dummy pcre-config; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_path_PCRE_CONFIG+:} false; then : - $as_echo_n "(cached) " >&6 -else - case $PCRE_CONFIG in +if test -z "${enable_bundled_pcre2}" && test -z "${PCRE2_CONFIG}" -o "${PCRE2_CONFIG}" = "false"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for a system libpcre2" >&5 +printf "%s\n" "$as_me: checking for a system libpcre2" >&6;} + # Extract the first word of "pcre2-config", so it can be a program name with args. +set dummy pcre2-config; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_path_PCRE2_CONFIG+y} +then : + printf %s "(cached) " >&6 +else $as_nop + case $PCRE2_CONFIG in [\\/]* | ?:[\\/]*) - ac_cv_path_PCRE_CONFIG="$PCRE_CONFIG" # Let the user override the test with a path. 
+ ac_cv_path_PCRE2_CONFIG="$PCRE2_CONFIG" # Let the user override the test with a path. ;; *) as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_path_PCRE_CONFIG="$as_dir/$ac_word$ac_exec_ext" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_path_PCRE2_CONFIG="$as_dir$ac_word$ac_exec_ext" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -5212,26 +6244,26 @@ IFS=$as_save_IFS ;; esac fi -PCRE_CONFIG=$ac_cv_path_PCRE_CONFIG -if test -n "$PCRE_CONFIG"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $PCRE_CONFIG" >&5 -$as_echo "$PCRE_CONFIG" >&6; } +PCRE2_CONFIG=$ac_cv_path_PCRE2_CONFIG +if test -n "$PCRE2_CONFIG"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $PCRE2_CONFIG" >&5 +printf "%s\n" "$PCRE2_CONFIG" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - if test ! -z "${PCRE_CONFIG}"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking checking pcre version equal or greater than ${PCRE_MIN_VERSION}" >&5 -$as_echo_n "checking checking pcre version equal or greater than ${PCRE_MIN_VERSION}... " >&6; } - PCRE_VERSION=$(${PCRE_CONFIG} --version) - as_arg_v1="${PCRE_VERSION}" -as_arg_v2="${PCRE_MIN_VERSION}" + if test ! -z "${PCRE2_CONFIG}"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking checking pcre2 version equal or greater than ${PCRE2_MIN_VERSION}" >&5 +printf %s "checking checking pcre2 version equal or greater than ${PCRE2_MIN_VERSION}... 
" >&6; } + PCRE2_VERSION=$(${PCRE2_CONFIG} --version) + as_arg_v1="${PCRE2_VERSION}" +as_arg_v2="${PCRE2_MIN_VERSION}" awk "$as_awk_strverscmp" v1="$as_arg_v1" v2="$as_arg_v2" /dev/null case $? in #( 1) : - PCRE_CONFIG=false ;; #( + PCRE2_CONFIG=false ;; #( 0) : ;; #( 2) : @@ -5239,40 +6271,41 @@ case $? in #( *) : ;; esac - if test "${PCRE_CONFIG}" != false; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + if test "${PCRE2_CONFIG}" != false; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi else - PCRE_CONFIG=false + PCRE2_CONFIG=false fi fi -if test "${PCRE_CONFIG}" != false; then - PCRE_PATH=$(${PCRE_CONFIG} --prefix) - as_ac_Header=`$as_echo "ac_cv_header_"$PCRE_PATH/include/pcre.h"" | $as_tr_sh` -ac_fn_c_check_header_mongrel "$LINENO" ""$PCRE_PATH/include/pcre.h"" "$as_ac_Header" "$ac_includes_default" -if eval test \"x\$"$as_ac_Header"\" = x"yes"; then : +if test "${PCRE2_CONFIG}" != false; then + PCRE2_PATH=$(${PCRE2_CONFIG} --prefix) + as_ac_Header=`printf "%s\n" "ac_cv_header_"$PCRE2_PATH/include/pcre2.h"" | $as_tr_sh` +ac_fn_c_check_header_compile "$LINENO" ""$PCRE2_PATH/include/pcre2.h"" "$as_ac_Header" "#define PCRE2_CODE_UNIT_WIDTH 8 +" +if eval test \"x\$"$as_ac_Header"\" = x"yes" +then : -else - PCRE_CONFIG=false +else $as_nop + PCRE2_CONFIG=false fi - fi -if test "$PCRE_CONFIG" != "false"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: using system PCRE regular expressions library" >&5 -$as_echo "$as_me: using system PCRE regular expressions library" >&6;} +if test "$PCRE2_CONFIG" != "false"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: using system PCRE2 regular expressions library" >&5 +printf "%s\n" "$as_me: using system PCRE2 regular expressions library" >&6;} if test 
"x$CFLAGS" = "x"; then - test "x$silent" != "xyes" && echo " setting CFLAGS to \"`$PCRE_CONFIG --cflags`\"" - CFLAGS="`$PCRE_CONFIG --cflags`" + test "x$silent" != "xyes" && echo " setting CFLAGS to \"`$PCRE2_CONFIG --cflags`\"" + CFLAGS="`$PCRE2_CONFIG --cflags`" else - apr_addto_bugger="`$PCRE_CONFIG --cflags`" + apr_addto_bugger="`$PCRE2_CONFIG --cflags`" for i in $apr_addto_bugger; do apr_addto_duplicate="0" for j in $CFLAGS; do @@ -5290,10 +6323,10 @@ $as_echo "$as_me: using system PCRE regular expressions library" >&6;} if test "x$LIBS" = "x"; then - test "x$silent" != "xyes" && echo " setting LIBS to \"`$PCRE_CONFIG --libs`\"" - LIBS="`$PCRE_CONFIG --libs`" + test "x$silent" != "xyes" && echo " setting LIBS to \"`$PCRE2_CONFIG --libs8`\"" + LIBS="`$PCRE2_CONFIG --libs8`" else - apr_addto_bugger="`$PCRE_CONFIG --libs`" + apr_addto_bugger="`$PCRE2_CONFIG --libs8`" for i in $apr_addto_bugger; do apr_addto_duplicate="0" for j in $LIBS; do @@ -5310,14 +6343,14 @@ $as_echo "$as_me: using system PCRE regular expressions library" >&6;} fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: using bundled PCRE regular expressions library" >&5 -$as_echo "$as_me: using bundled PCRE regular expressions library" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: using bundled PCRE2 regular expressions library" >&5 +printf "%s\n" "$as_me: using bundled PCRE2 regular expressions library" >&6;} if test "x$INCLUDES" = "x"; then - test "x$silent" != "xyes" && echo " setting INCLUDES to \"-Ipcre\"" - INCLUDES="-Ipcre" + test "x$silent" != "xyes" && echo " setting INCLUDES to \"-Ipcre2/src\"" + INCLUDES="-Ipcre2/src" else - apr_addto_bugger="-Ipcre" + apr_addto_bugger="-Ipcre2/src" for i in $apr_addto_bugger; do apr_addto_duplicate="0" for j in $INCLUDES; do @@ -5335,10 +6368,10 @@ $as_echo "$as_me: using bundled PCRE regular expressions library" >&6;} if test "x$LDFLAGS" = "x"; then - test "x$silent" != "xyes" && echo " setting LDFLAGS to \""-Lpcre/.libs"\"" - 
LDFLAGS=""-Lpcre/.libs"" + test "x$silent" != "xyes" && echo " setting LDFLAGS to \""-Lpcre2/.libs"\"" + LDFLAGS=""-Lpcre2/.libs"" else - apr_addto_bugger=""-Lpcre/.libs"" + apr_addto_bugger=""-Lpcre2/.libs"" for i in $apr_addto_bugger; do apr_addto_duplicate="0" for j in $LDFLAGS; do @@ -5354,7 +6387,7 @@ $as_echo "$as_me: using bundled PCRE regular expressions library" >&6;} done fi - PCRE_DEP="pcre/.libs/libpcre.a" + PCRE2_DEP="pcre2/.libs/libpcre2-8.a" # save our work to this point; this allows the sub-package to use it cat >confcache <<\_ACEOF @@ -5384,8 +6417,8 @@ _ACEOF case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -5415,15 +6448,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; /^ac_cv_env_/b end t clear :clear - s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ + s/^\([^=]*\)=\(.*[{}].*\)$/test ${\1+y} || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then if test "x$cache_file" != "x/dev/null"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 -$as_echo "$as_me: updating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 +printf "%s\n" "$as_me: updating cache $cache_file" >&6;} if test ! 
-f "$cache_file" || test -h "$cache_file"; then cat confcache >"$cache_file" else @@ -5437,18 +6470,18 @@ $as_echo "$as_me: updating cache $cache_file" >&6;} fi fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 -$as_echo "$as_me: not updating unwritable cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 +printf "%s\n" "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache - echo "configuring package in src/pcre now" + echo "configuring package in src/pcre2 now" ac_popdir=`pwd` - apr_config_subdirs="src/pcre" - test -d src/pcre || $mkdir_p src/pcre - ac_abs_srcdir=`(cd $srcdir/src/pcre && pwd)` - cd src/pcre + apr_config_subdirs="src/pcre2" + test -d src/pcre2 || $mkdir_p src/pcre2 + ac_abs_srcdir=`(cd $srcdir/src/pcre2 && pwd)` + cd src/pcre2 # A "../" for each directory in /$config_subdirs. ac_dots=`echo $apr_config_subdirs|sed -e 's%^\./%%' -e 's%[^/]$%&/%' -e 's%[^/]*/%../%g'` @@ -5469,7 +6502,7 @@ rm -f confcache for apr_configure_arg in $ac_configure_args do case "$apr_configure_arg" in - --with-pcre=*|\'--with-pcre=*) + --with-pcre2=*|\'--with-pcre2=*) continue ;; esac apr_configure_args="$apr_configure_args$apr_sep'$apr_configure_arg'" @@ -5483,9 +6516,9 @@ rm -f confcache if eval $SHELL $ac_abs_srcdir/configure $apr_configure_args --cache-file=$ac_sub_cache_file --srcdir=$ac_abs_srcdir then : - echo "src/pcre configured properly" + echo "src/pcre2 configured properly" else - echo "configure failed for src/pcre" + echo "configure failed for src/pcre2" exit 1 fi @@ -5496,16 +6529,16 @@ rm -f confcache # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. 
if test /dev/null != "$cache_file" && test -f "$cache_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 -$as_echo "$as_me: loading cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 +printf "%s\n" "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 -$as_echo "$as_me: creating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 +printf "%s\n" "$as_me: creating cache $cache_file" >&6;} >$cache_file fi @@ -5514,21 +6547,141 @@ fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to use fnv hash library for non-sequential filenames" >&5 -$as_echo_n "checking whether to use fnv hash library for non-sequential filenames... " >&6; } + +# Check whether --with-libchardet was given. +if test ${with_libchardet+y} +then : + withval=$with_libchardet; given_libchardet=$withval +else $as_nop + given_libchardet='yes' +fi + + +found_libchardet="" + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to use libchardet for automatic character set detection" >&5 +printf %s "checking whether to use libchardet for automatic character set detection... " >&6; } +if test -n "${given_libchardet}" -a "${given_libchardet}" != "no"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + + user_pkg_config_path="" + pkg_config_print_libs="" + + if test "${given_libchardet}" != "yes" -a "${given_libchardet}" != "/usr"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for libchardet in \"${given_libchardet}\"" >&5 +printf %s "checking for libchardet in \"${given_libchardet}\"... 
" >&6; } + if test -d "${given_libchardet}/lib/pkgconfig"; then + user_pkg_config_path="${given_libchardet}/lib/pkgconfig" + pkg_config_print_libs="--libs" + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \"${given_libchardet}\"/lib/pkgconfig doesn't exist" >&5 +printf "%s\n" "$as_me: \"${given_libchardet}\"/lib/pkgconfig doesn't exist" >&6;} + fi + fi + + if test -z "${user_pkg_config_path}"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for system libchardet" >&5 +printf %s "checking for system libchardet... " >&6; } + pkg_config_print_libs="--libs-only-l" + fi + + cflags=`PKGCONFIG_PATH="${user_pkg_config_path}:${PKG_CONFIG_PATH}" $PKGCONF chardet --cflags-only-I 2>/dev/null` + + if test $? -eq 0; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + + if test "x$INCLUDES" = "x"; then + test "x$silent" != "xyes" && echo " setting INCLUDES to \"$cflags\"" + INCLUDES="$cflags" + else + apr_addto_bugger="$cflags" + for i in $apr_addto_bugger; do + apr_addto_duplicate="0" + for j in $INCLUDES; do + if test "x$i" = "x$j"; then + apr_addto_duplicate="1" + break + fi + done + if test $apr_addto_duplicate = "0"; then + test "x$silent" != "xyes" && echo " adding \"$i\" to INCLUDES" + INCLUDES="$INCLUDES $i" + fi + done + fi + + ldflags=`PKG_CONFIG_PATH="${user_pkg_config_path}:${PKG_CONFIG_PATH}" $PKGCONF chardet ${pkg_config_print_libs} 2>/dev/null` + + if test "x$EXTRA_LIBS" = "x"; then + test "x$silent" != "xyes" && echo " setting EXTRA_LIBS to \"$ldflags\"" + EXTRA_LIBS="$ldflags" + else + apr_addto_bugger="$ldflags" + for i in $apr_addto_bugger; do + apr_addto_duplicate="0" + for j in $EXTRA_LIBS; do + if test "x$i" = "x$j"; then + apr_addto_duplicate="1" + break + fi + done + if test $apr_addto_duplicate = "0"; then + test "x$silent" != "xyes" && echo " adding \"$i\" to EXTRA_LIBS" + EXTRA_LIBS="$EXTRA_LIBS 
$i" + fi + done + fi + + unset ldflags + found_libchardet=yes + +printf "%s\n" "#define HAVE_CHARDET 1" >>confdefs.h + + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + fi + + unset cflags + unset user_pkg_config_path + unset pkg_config_print_libs +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + +if test "${found_libchardet}" != "yes"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: hypermail's automatic charset detection won't" >&5 +printf "%s\n" "$as_me: WARNING: hypermail's automatic charset detection won't" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: be available unless you install libchardet." >&5 +printf "%s\n" "$as_me: WARNING: be available unless you install libchardet." >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: See hypermail's release notes" >&5 +printf "%s\n" "$as_me: WARNING: See hypermail's release notes" >&2;} + fi + +unset found_libchardet + + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to use fnv hash library for non-sequential filenames" >&5 +printf %s "checking whether to use fnv hash library for non-sequential filenames... " >&6; } # Check whether --enable-libfnv was given. 
-if test "${enable_libfnv+set}" = set; then : - enableval=$enable_libfnv; given_libfnv=$enableval +if test ${enable_libfnv+y} +then : + enableval=$enable_libfnv; given_libfnv=$enableval fi if test "$given_libfnv" != "yes"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } - $as_echo "#define HAVE_LIBFNV 1" >>confdefs.h + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + printf "%s\n" "#define HAVE_LIBFNV 1" >>confdefs.h if test "x$INCLUDES" = "x"; then @@ -5614,8 +6767,8 @@ _ACEOF case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -5645,15 +6798,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; /^ac_cv_env_/b end t clear :clear - s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ + s/^\([^=]*\)=\(.*[{}].*\)$/test ${\1+y} || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then if test "x$cache_file" != "x/dev/null"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 -$as_echo "$as_me: updating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 +printf "%s\n" "$as_me: updating cache $cache_file" >&6;} if test ! 
-f "$cache_file" || test -h "$cache_file"; then cat confcache >"$cache_file" else @@ -5667,8 +6820,8 @@ $as_echo "$as_me: updating cache $cache_file" >&6;} fi fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 -$as_echo "$as_me: not updating unwritable cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 +printf "%s\n" "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache @@ -5714,16 +6867,16 @@ rm -f confcache # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. if test /dev/null != "$cache_file" && test -f "$cache_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 -$as_echo "$as_me: loading cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 +printf "%s\n" "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 -$as_echo "$as_me: creating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 +printf "%s\n" "$as_me: creating cache $cache_file" >&6;} >$cache_file fi @@ -5733,249 +6886,13 @@ fi -USENSL=no -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for gethostbyaddr in -lsocket" >&5 -$as_echo_n "checking for gethostbyaddr in -lsocket... " >&6; } -if ${ac_cv_lib_socket_gethostbyaddr+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsocket $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif -char gethostbyaddr (); -int -main () -{ -return gethostbyaddr (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_socket_gethostbyaddr=yes -else - ac_cv_lib_socket_gethostbyaddr=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_socket_gethostbyaddr" >&5 -$as_echo "$ac_cv_lib_socket_gethostbyaddr" >&6; } -if test "x$ac_cv_lib_socket_gethostbyaddr" = xyes; then : - result=yes -else - result=no -fi - -if test $result = yes; then - LIBS="$LIBS -lsocket" -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for gethostbyaddr in -lsocket" >&5 -$as_echo_n "checking for gethostbyaddr in -lsocket... " >&6; } -if ${ac_cv_lib_socket_gethostbyaddr+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsocket -lnsl $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif -char gethostbyaddr (); -int -main () -{ -return gethostbyaddr (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_socket_gethostbyaddr=yes -else - ac_cv_lib_socket_gethostbyaddr=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_socket_gethostbyaddr" >&5 -$as_echo "$ac_cv_lib_socket_gethostbyaddr" >&6; } -if test "x$ac_cv_lib_socket_gethostbyaddr" = xyes; then : - result=yes -else - result=no -fi - - if test $result = yes; then - LIBS = "$LIBS -lsocket -lnsl" - USENSL=yes - else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for inet_addr in -lsocket" >&5 -$as_echo_n "checking for inet_addr in -lsocket... " >&6; } -if ${ac_cv_lib_socket_inet_addr+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsocket $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif -char inet_addr (); -int -main () -{ -return inet_addr (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_socket_inet_addr=yes -else - ac_cv_lib_socket_inet_addr=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_socket_inet_addr" >&5 -$as_echo "$ac_cv_lib_socket_inet_addr" >&6; } -if test "x$ac_cv_lib_socket_inet_addr" = xyes; then : - result=yes -else - result=no -fi - - if test $result = yes; then - LIBS="$LIBS -lsocket" - else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for inet_addr in -lsocket" >&5 -$as_echo_n "checking for inet_addr in -lsocket... " >&6; } -if ${ac_cv_lib_socket_inet_addr+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsocket -lnsl $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif -char inet_addr (); -int -main () -{ -return inet_addr (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_socket_inet_addr=yes -else - ac_cv_lib_socket_inet_addr=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_socket_inet_addr" >&5 -$as_echo "$ac_cv_lib_socket_inet_addr" >&6; } -if test "x$ac_cv_lib_socket_inet_addr" = xyes; then : - result=yes -else - result=no -fi - - if test $result = yes; then - LIBS="$LIBS -lsocket -lnsl" - USENSL=yes - fi - fi - fi -fi -if test $USENSL != yes; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for inet_addr in -lnsl" >&5 -$as_echo_n "checking for inet_addr in -lnsl... " >&6; } -if ${ac_cv_lib_nsl_inet_addr+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lnsl $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif -char inet_addr (); -int -main () -{ -return inet_addr (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_nsl_inet_addr=yes -else - ac_cv_lib_nsl_inet_addr=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_nsl_inet_addr" >&5 -$as_echo "$ac_cv_lib_nsl_inet_addr" >&6; } -if test "x$ac_cv_lib_nsl_inet_addr" = xyes; then : - result=yes -else - result=no -fi - - if test $result = yes; then - LIBS="$LIBS -lnsl" - fi -fi - - - -ac_config_files="$ac_config_files Makefile archive/Makefile docs/Makefile libcgi/Makefile src/Makefile tests/testhm src/defaults.h" +ac_config_files="$ac_config_files Makefile archive/Makefile docs/Makefile src/Makefile tests/testhm src/defaults.h" cat >confcache <<\_ACEOF # This file is a shell script that caches the results of configure @@ -6004,8 +6921,8 @@ _ACEOF case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -6035,15 +6952,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; /^ac_cv_env_/b end t clear :clear - s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ + s/^\([^=]*\)=\(.*[{}].*\)$/test ${\1+y} || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then if test "x$cache_file" != "x/dev/null"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" 
>&5 -$as_echo "$as_me: updating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 +printf "%s\n" "$as_me: updating cache $cache_file" >&6;} if test ! -f "$cache_file" || test -h "$cache_file"; then cat confcache >"$cache_file" else @@ -6057,8 +6974,8 @@ $as_echo "$as_me: updating cache $cache_file" >&6;} fi fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 -$as_echo "$as_me: not updating unwritable cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 +printf "%s\n" "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache @@ -6075,7 +6992,7 @@ U= for ac_i in : $LIBOBJS; do test "x$ac_i" = x: && continue # 1. Remove the extension, and $U if already installed. ac_script='s/\$U\././;s/\.o$//;s/\.obj$//' - ac_i=`$as_echo "$ac_i" | sed "$ac_script"` + ac_i=`printf "%s\n" "$ac_i" | sed "$ac_script"` # 2. Prepend LIBOBJDIR. When used with automake>=1.10 LIBOBJDIR # will be set to the directory where LIBOBJS objects are built. as_fn_append ac_libobjs " \${LIBOBJDIR}$ac_i\$U.$ac_objext" @@ -6091,8 +7008,8 @@ LTLIBOBJS=$ac_ltlibobjs ac_write_fail=0 ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files $CONFIG_STATUS" -{ $as_echo "$as_me:${as_lineno-$LINENO}: creating $CONFIG_STATUS" >&5 -$as_echo "$as_me: creating $CONFIG_STATUS" >&6;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating $CONFIG_STATUS" >&5 +printf "%s\n" "$as_me: creating $CONFIG_STATUS" >&6;} as_write_fail=0 cat >$CONFIG_STATUS <<_ASEOF || as_write_fail=1 #! 
$SHELL @@ -6115,14 +7032,16 @@ cat >>$CONFIG_STATUS <<\_ASEOF || as_write_fail=1 # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh -if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : +as_nop=: +if test ${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1 +then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST -else +else $as_nop case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( @@ -6132,46 +7051,46 @@ esac fi + +# Reset variables that may have inherited troublesome values from +# the environment. + +# IFS needs to be set, to space, tab, and newline, in precisely that order. +# (If _AS_PATH_WALK were called with IFS unset, it would have the +# side effect of setting IFS to empty, thus disabling word splitting.) +# Quoting is to prevent editors from complaining about space-tab. as_nl=' ' export as_nl -# Printing a long string crashes Solaris 7 /usr/bin/printf. -as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo -# Prefer a ksh shell builtin over an external printf program on Solaris, -# but without wasting forks for bash or zsh. 
-if test -z "$BASH_VERSION$ZSH_VERSION" \ - && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='print -r --' - as_echo_n='print -rn --' -elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='printf %s\n' - as_echo_n='printf %s' -else - if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then - as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' - as_echo_n='/usr/ucb/echo -n' - else - as_echo_body='eval expr "X$1" : "X\\(.*\\)"' - as_echo_n_body='eval - arg=$1; - case $arg in #( - *"$as_nl"*) - expr "X$arg" : "X\\(.*\\)$as_nl"; - arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; - esac; - expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" - ' - export as_echo_n_body - as_echo_n='sh -c $as_echo_n_body as_echo' - fi - export as_echo_body - as_echo='sh -c $as_echo_body as_echo' -fi +IFS=" "" $as_nl" + +PS1='$ ' +PS2='> ' +PS4='+ ' + +# Ensure predictable behavior from utilities with locale-dependent output. +LC_ALL=C +export LC_ALL +LANGUAGE=C +export LANGUAGE + +# We cannot yet rely on "unset" to work, but we need these variables +# to be unset--not just set to an empty or harmless value--now, to +# avoid bugs in old shells (e.g. pre-3.0 UWIN ksh). This construct +# also avoids known problems related to "unset" and subshell syntax +# in other old shells (e.g. bash 2.01 and pdksh 5.2.14). +for as_var in BASH_ENV ENV MAIL MAILPATH CDPATH +do eval test \${$as_var+y} \ + && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : +done + +# Ensure that fds 0, 1, and 2 are open. +if (exec 3>&0) 2>/dev/null; then :; else exec 0</dev/null; fi +if (exec 3>&1) 2>/dev/null; then :; else exec 1>/dev/null; fi +if (exec 3>&2) ; then :; else exec 2>/dev/null; fi # The user is always right.
-if test "${PATH_SEPARATOR+set}" != set; then +if ${PATH_SEPARATOR+false} :; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || @@ -6180,13 +7099,6 @@ if test "${PATH_SEPARATOR+set}" != set; then fi -# IFS -# We need space, tab and new line, in precisely that order. Quoting is -# there to prevent editors from complaining about space-tab. -# (If _AS_PATH_WALK were called with IFS unset, it would disable word -# splitting by setting IFS to empty value.) -IFS=" "" $as_nl" - # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( @@ -6195,8 +7107,12 @@ case $0 in #(( for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + test -r "$as_dir$0" && as_myself=$as_dir$0 && break done IFS=$as_save_IFS @@ -6208,30 +7124,10 @@ if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then - $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 + printf "%s\n" "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi -# Unset variables that we do not need and which cause bugs (e.g. in -# pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" -# suppresses any "Segmentation fault" message there. '((' could -# trigger a bug in pdksh 5.2.14. -for as_var in BASH_ENV ENV MAIL MAILPATH -do eval test x\${$as_var+set} = xset \ - && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : -done -PS1='$ ' -PS2='> ' -PS4='+ ' - -# NLS nuisances. -LC_ALL=C -export LC_ALL -LANGUAGE=C -export LANGUAGE - -# CDPATH. 
-(unset CDPATH) >/dev/null 2>&1 && unset CDPATH # as_fn_error STATUS ERROR [LINENO LOG_FD] @@ -6244,13 +7140,14 @@ as_fn_error () as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi - $as_echo "$as_me: error: $2" >&2 + printf "%s\n" "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error + # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. @@ -6277,18 +7174,20 @@ as_fn_unset () { eval $1=; unset $1;} } as_unset=as_fn_unset + # as_fn_append VAR VALUE # ---------------------- # Append the text in VALUE to the end of the definition contained in VAR. Take # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. -if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : +if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null +then : eval 'as_fn_append () { eval $1+=\$2 }' -else +else $as_nop as_fn_append () { eval $1=\$$1\$2 @@ -6300,12 +7199,13 @@ fi # as_fn_append # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. -if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : +if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null +then : eval 'as_fn_arith () { as_val=$(( $* )) }' -else +else $as_nop as_fn_arith () { as_val=`expr "$@" || test $? -eq 1` @@ -6336,7 +7236,7 @@ as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 
2>/dev/null || -$as_echo X/"$0" | +printf "%s\n" X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q @@ -6358,6 +7258,10 @@ as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits + +# Determine whether it's possible to make 'echo' print without a newline. +# These variables are no longer used directly by Autoconf, but are AC_SUBSTed +# for compatibility with existing Makefiles. ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) @@ -6371,6 +7275,12 @@ case `echo -n x` in #((((( ECHO_N='-n';; esac +# For backward compatibility with old third-party macros, we provide +# the shell variables $as_echo and $as_echo_n. New code should use +# AS_ECHO(["message"]) and AS_ECHO_N(["message"]), respectively. +as_echo='printf %s\n' +as_echo_n='printf %s' + rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file @@ -6412,7 +7322,7 @@ as_fn_mkdir_p () as_dirs= while :; do case $as_dir in #( - *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( + *\'*) as_qdir=`printf "%s\n" "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" @@ -6421,7 +7331,7 @@ $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$as_dir" | +printf "%s\n" X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -6484,7 +7394,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # values after options handling. ac_log=" This file was extended by $as_me, which was -generated by GNU Autoconf 2.69. Invocation command line was +generated by GNU Autoconf 2.71. Invocation command line was CONFIG_FILES = $CONFIG_FILES CONFIG_HEADERS = $CONFIG_HEADERS @@ -6542,14 +7452,16 @@ $config_headers Report bugs to the package provider." 
_ACEOF +ac_cs_config=`printf "%s\n" "$ac_configure_args" | sed "$ac_safe_unquote"` +ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\''/g"` cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 -ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`" +ac_cs_config='$ac_cs_config_escaped' ac_cs_version="\\ config.status -configured by $0, generated by GNU Autoconf 2.69, +configured by $0, generated by GNU Autoconf 2.71, with options \\"\$ac_cs_config\\" -Copyright (C) 2012 Free Software Foundation, Inc. +Copyright (C) 2021 Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it." @@ -6587,15 +7499,15 @@ do -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) ac_cs_recheck=: ;; --version | --versio | --versi | --vers | --ver | --ve | --v | -V ) - $as_echo "$ac_cs_version"; exit ;; + printf "%s\n" "$ac_cs_version"; exit ;; --config | --confi | --conf | --con | --co | --c ) - $as_echo "$ac_cs_config"; exit ;; + printf "%s\n" "$ac_cs_config"; exit ;; --debug | --debu | --deb | --de | --d | -d ) debug=: ;; --file | --fil | --fi | --f ) $ac_shift case $ac_optarg in - *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; + *\'*) ac_optarg=`printf "%s\n" "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; '') as_fn_error $? "missing file argument" ;; esac as_fn_append CONFIG_FILES " '$ac_optarg'" @@ -6603,7 +7515,7 @@ do --header | --heade | --head | --hea ) $ac_shift case $ac_optarg in - *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; + *\'*) ac_optarg=`printf "%s\n" "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; esac as_fn_append CONFIG_HEADERS " '$ac_optarg'" ac_need_defaults=false;; @@ -6612,7 +7524,7 @@ do as_fn_error $? 
"ambiguous option: \`$1' Try \`$0 --help' for more information.";; --help | --hel | -h ) - $as_echo "$ac_cs_usage"; exit ;; + printf "%s\n" "$ac_cs_usage"; exit ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil | --si | --s) ac_cs_silent=: ;; @@ -6640,7 +7552,7 @@ cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 if \$ac_cs_recheck; then set X $SHELL '$0' $ac_configure_args \$ac_configure_extra_args --no-create --no-recursion shift - \$as_echo "running CONFIG_SHELL=$SHELL \$*" >&6 + \printf "%s\n" "running CONFIG_SHELL=$SHELL \$*" >&6 CONFIG_SHELL='$SHELL' export CONFIG_SHELL exec "\$@" @@ -6654,7 +7566,7 @@ exec 5>>config.log sed 'h;s/./-/g;s/^.../## /;s/...$/ ##/;p;x;p;x' <<_ASBOX ## Running $as_me. ## _ASBOX - $as_echo "$ac_log" + printf "%s\n" "$ac_log" } >&5 _ACEOF @@ -6671,7 +7583,6 @@ do "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;; "archive/Makefile") CONFIG_FILES="$CONFIG_FILES archive/Makefile" ;; "docs/Makefile") CONFIG_FILES="$CONFIG_FILES docs/Makefile" ;; - "libcgi/Makefile") CONFIG_FILES="$CONFIG_FILES libcgi/Makefile" ;; "src/Makefile") CONFIG_FILES="$CONFIG_FILES src/Makefile" ;; "tests/testhm") CONFIG_FILES="$CONFIG_FILES tests/testhm" ;; "src/defaults.h") CONFIG_FILES="$CONFIG_FILES src/defaults.h" ;; @@ -6686,8 +7597,8 @@ done # We use the long form for the default assignment because of an extremely # bizarre bug on SunOS 4.1.3. if $ac_need_defaults; then - test "${CONFIG_FILES+set}" = set || CONFIG_FILES=$config_files - test "${CONFIG_HEADERS+set}" = set || CONFIG_HEADERS=$config_headers + test ${CONFIG_FILES+y} || CONFIG_FILES=$config_files + test ${CONFIG_HEADERS+y} || CONFIG_HEADERS=$config_headers fi # Have a temporary directory for convenience. 
Make it in the build tree @@ -7023,7 +7934,7 @@ do esac || as_fn_error 1 "cannot find input file: \`$ac_f'" "$LINENO" 5;; esac - case $ac_f in *\'*) ac_f=`$as_echo "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac + case $ac_f in *\'*) ac_f=`printf "%s\n" "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac as_fn_append ac_file_inputs " '$ac_f'" done @@ -7031,17 +7942,17 @@ do # use $as_me), people would be surprised to read: # /* config.h. Generated by config.status. */ configure_input='Generated from '` - $as_echo "$*" | sed 's|^[^:]*/||;s|:[^:]*/|, |g' + printf "%s\n" "$*" | sed 's|^[^:]*/||;s|:[^:]*/|, |g' `' by configure.' if test x"$ac_file" != x-; then configure_input="$ac_file. $configure_input" - { $as_echo "$as_me:${as_lineno-$LINENO}: creating $ac_file" >&5 -$as_echo "$as_me: creating $ac_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating $ac_file" >&5 +printf "%s\n" "$as_me: creating $ac_file" >&6;} fi # Neutralize special characters interpreted by sed in replacement strings. case $configure_input in #( *\&* | *\|* | *\\* ) - ac_sed_conf_input=`$as_echo "$configure_input" | + ac_sed_conf_input=`printf "%s\n" "$configure_input" | sed 's/[\\\\&|]/\\\\&/g'`;; #( *) ac_sed_conf_input=$configure_input;; esac @@ -7058,7 +7969,7 @@ $as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_file" : 'X\(//\)[^/]' \| \ X"$ac_file" : 'X\(//\)$' \| \ X"$ac_file" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$ac_file" | +printf "%s\n" X"$ac_file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -7082,9 +7993,9 @@ $as_echo X"$ac_file" | case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) - ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` + ac_dir_suffix=/`printf "%s\n" "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. 
- ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` + ac_top_builddir_sub=`printf "%s\n" "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; @@ -7141,8 +8052,8 @@ ac_sed_dataroot=' case `eval "sed -n \"\$ac_sed_dataroot\" $ac_file_inputs"` in *datarootdir*) ac_datarootdir_seen=yes;; *@datadir@*|*@docdir@*|*@infodir@*|*@localedir@*|*@mandir@*) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&5 -$as_echo "$as_me: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&5 +printf "%s\n" "$as_me: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&2;} _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_datarootdir_hack=' @@ -7185,9 +8096,9 @@ test -z "$ac_datarootdir_hack$ac_datarootdir_seen" && { ac_out=`sed -n '/\${datarootdir}/p' "$ac_tmp/out"`; test -n "$ac_out"; } && { ac_out=`sed -n '/^[ ]*datarootdir[ ]*:*=/p' \ "$ac_tmp/out"`; test -z "$ac_out"; } && - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable \`datarootdir' + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable \`datarootdir' which seems to be undefined. Please make sure it is defined" >&5 -$as_echo "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir' +printf "%s\n" "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir' which seems to be undefined. Please make sure it is defined" >&2;} rm -f "$ac_tmp/stdin" @@ -7203,20 +8114,20 @@ which seems to be undefined. 
Please make sure it is defined" >&2;} # if test x"$ac_file" != x-; then { - $as_echo "/* $configure_input */" \ + printf "%s\n" "/* $configure_input */" >&1 \ && eval '$AWK -f "$ac_tmp/defines.awk"' "$ac_file_inputs" } >"$ac_tmp/config.h" \ || as_fn_error $? "could not create $ac_file" "$LINENO" 5 if diff "$ac_file" "$ac_tmp/config.h" >/dev/null 2>&1; then - { $as_echo "$as_me:${as_lineno-$LINENO}: $ac_file is unchanged" >&5 -$as_echo "$as_me: $ac_file is unchanged" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: $ac_file is unchanged" >&5 +printf "%s\n" "$as_me: $ac_file is unchanged" >&6;} else rm -f "$ac_file" mv "$ac_tmp/config.h" "$ac_file" \ || as_fn_error $? "could not create $ac_file" "$LINENO" 5 fi else - $as_echo "/* $configure_input */" \ + printf "%s\n" "/* $configure_input */" >&1 \ && eval '$AWK -f "$ac_tmp/defines.awk"' "$ac_file_inputs" \ || as_fn_error $? "could not create -" "$LINENO" 5 fi @@ -7257,8 +8168,9 @@ if test "$no_create" != yes; then $ac_cs_success || as_fn_exit 1 fi if test -n "$ac_unrecognized_opts" && test "$enable_option_checking" != no; then - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5 -$as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5 +printf "%s\n" "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;} fi + diff --git a/configure.ac b/configure.ac index 79714bfd..de6deb53 100644 --- a/configure.ac +++ b/configure.ac @@ -1,10 +1,11 @@ dnl Process this file with autoconf to produce a configure script. 
-AC_PREREQ([2.69]) +AC_PREREQ([2.71]) AC_REVISION($Revision: 1.2 $)dnl AC_INIT AC_CONFIG_SRCDIR([src/hypermail.c]) -AC_CONFIG_HEADER(config.h) +AC_CONFIG_HEADERS([config.h]) +AC_CONFIG_MACRO_DIRS([m4]) AC_PREFIX_DEFAULT(/usr/local) LDFLAGS="" LIBS="" @@ -15,8 +16,8 @@ GDBM_INCLUDE="" GDBM_LIB="" FNV_DEP="" TRIO_DEP="" -PCRE_DEP="" -PCRE_MIN_VERSION="8.39" +PCRE2_DEP="" +PCRE2_MIN_VERSION="10.32" dnl =========================================================================== dnl Get host, target and build variables filled with appropriate info. @@ -57,6 +58,7 @@ if test -z "$no_ranlib"; then else RANLIB=":" fi +AC_CHECK_TOOL([PKGCONF], [pkgconf]) dnl =========================================================================== dnl Determine the host type and set compliation flags as needed @@ -120,11 +122,6 @@ AC_ARG_WITH(httpddir, httpddir=$with_httpddir, httpddir=/usr/local/apache) AC_SUBST(httpddir) -AC_ARG_WITH(cgidir, - [ --with-cgidir=DIR where to install CGI scripts ], - cgidir=$with_cgidir, cgidir=$httpddir/cgi-bin) -AC_SUBST(cgidir) - AC_ARG_WITH(htmldir, [ --with-htmldir=DIR where to install Hypermail HTML pages ], htmldir=$with_htmldir, htmldir=$httpddir/htdocs/hypermail) @@ -163,16 +160,39 @@ dnl =========================================================================== dnl Checks headers dnl =========================================================================== -AC_HEADER_STDC +m4_warn([obsolete], +[The preprocessor macro `STDC_HEADERS' is obsolete. + Except in unusual embedded environments, you can safely include all + ISO C90 headers unconditionally.])dnl +# Autoupdate added the next two lines to ensure that your configure +# script's behavior did not change. They are probably safe to remove. 
+AC_CHECK_INCLUDES_DEFAULT +AC_PROG_EGREP + AC_CHECK_HEADERS(alloca.h arpa/inet.h ctype.h dirent.h errno.h \ - fcntl.h locale.h malloc.h netdb.h netinet/in.h pwd.h stdarg.h \ - stdio.h stdlib.h string.h sys/dir.h sys/param.h sys/socket.h \ + fcntl.h inttypes.h locale.h malloc.h netdb.h netinet/in.h pwd.h stdarg.h \ + stdint.h stdio.h stdlib.h string.h sys/dir.h sys/param.h sys/socket.h \ sys/stat.h sys/time.h sys/types.h time.h unistd.h) AC_HEADER_STAT AC_HEADER_DIRENT -AC_HEADER_TIME +m4_warn([obsolete], +[Update your code to rely only on HAVE_SYS_TIME_H, +then remove this warning and the obsolete code below it. +All current systems provide time.h; it need not be checked for. +Not all systems provide sys/time.h, but those that do, all allow +you to include it and time.h simultaneously.])dnl +AC_CHECK_HEADERS_ONCE([sys/time.h]) +# Obsolete code to be removed. +if test $ac_cv_header_sys_time_h = yes; then + AC_DEFINE([TIME_WITH_SYS_TIME],[1],[Define to 1 if you can safely include both <sys/time.h> + and <time.h>. This macro is obsolete.]) +fi +# End of obsolete code. + + +AC_CHECK_DECLS([isblank, strcasecmp, strcasestr]) dnl =========================================================================== dnl Checks for library functions. @@ -182,9 +202,10 @@ dnl =========================================================================== AC_STRUCT_TM AC_FUNC_STRFTIME -AC_CHECK_FUNCS(mkdir strdup strstr strtol memcpy memset lstat strcasecmp \ - strcasestr getpwuid getopt snprintf memmove strerror) +AC_CHECK_FUNCS(getopt getpwuid isblank lstat memcpy memmove memset mkdir snprintf \ + strcasecmp strcasestr strdup strerror strstr strtol) +AC_TYPE_INTPTR_T AC_TYPE_SIZE_T if test $ac_cv_func_snprintf != no; then @@ -274,80 +295,100 @@ ifelse($3,,[ ]) ]) +dnl +dnl check for pkg-config +dnl + +AC_MSG_NOTICE(checking for pkg-config) +if test x"$PKGCONF" == x ; then + AC_MSG_ERROR('pkgconf' is missing; please install it.)
+fi + +dnl +dnl gdbm check +dnl + AC_ARG_WITH(gdbm, AS_HELP_STRING([--with-gdbm=[DIR]], - [Include GDBM support]), + [(UNMAINTAINED) Include GDBM support]), [ given_gdbm=$withval]) -if test "$given_gdbm" != "no"; then - for i in /usr/local /usr $withval; do - if test -f "$i/include/gdbm.h"; then - GDBM_INCLUDE="$i/include" - THIS_PREFIX="$i" - fi - done - - unset ac_cv_lib_gdbm_gdbm_open - AC_TEMP_LDFLAGS(-L$THIS_PREFIX/lib,[ - AC_CHECK_LIB(gdbm, gdbm_open, [GDBM_LIB="-lgdbm"]) - ]) - - if test "$THIS_PREFIX" != "" && test "$THIS_PREFIX" != "/usr"; then - THIS_LFLAGS="$THIS_PREFIX/lib" - fi +if test "${given_gdbm}" = "no"; then + given_gdbm="" fi -AC_MSG_CHECKING(for GDBM support) - -if test "$GDBM_LIB" = "" && test "$given_gdbm" != "no"; then - AC_CHECK_LIB(gdbm, gdbm_open,[AC_DEFINE(GDBM,1, [Whether you have GDBM]) DBM_TYPE=gdbm; GDBM_LIB=-lgdbm], - [DBM_TYPE=""]) - AC_MSG_CHECKING([gdbm library]) - if test "a$DBM_TYPE" = a; then - AC_MSG_RESULT(none found) - AC_MSG_WARN(No gdbm library found - will limit a few features) - else - AC_MSG_RESULT($DBM_TYPE found) - fi -else - AC_MSG_RESULT(no) +dnl user-defined GDBM local install +if test -n "${given_gdbm}" -a "${given_gdbm}" != "yes" -a "${given_gdbm}" != "/usr"; then + AC_MSG_CHECKING(for "${given_gdbm}/include/gdbm.h") + if test -f "${given_gdbm}/include/gdbm.h"; then + GDBM_INCLUDE="${given_gdbm}/include" + AC_MSG_RESULT(yes) + else + AC_MSG_RESULT(no) + fi + + if test -n "${GDBM_INCLUDE}"; then + unset ac_cv_lib_gdbm_gdbm_open + AC_TEMP_LDFLAGS(-L${given_gdbm}/lib,[ + AC_CHECK_LIB(gdbm, gdbm_open,[GDBM_LIB="-lgdbm"],[GDBM_INCLUDE=""]) + ]) + + if test -n "${GDBM_LIB}"; then + APR_ADDTO(LDFLAGS, "-L${given_gdbm}/lib") + else + AC_MSG_WARN(gdbm includes and/or libs not found under "${given_gdbm}") + fi + fi fi -if test "$GDBM_LIB" = "-lgdbm" && test "$GDBM_INCLUDE" = ""; then +dnl if the user chose the system gdbm or alternative if autoconf found +dnl issues with the path the user gave to a local gdbm +if test -n 
"${given_gdbm}" -a -z "${GDBM_INCLUDE}"; then + gdbm_headers_found=1 + AC_MSG_NOTICE(Trying /usr/include/gdbm.h) AC_CHECK_HEADER(gdbm.h, [ GDBM_INCLUDE="" ], [ - AC_MSG_RESULT(Try /usr/local/include/gdbm.h) + AC_MSG_NOTICE(Trying /usr/local/include/gdbm.h) AC_CHECK_HEADER(/usr/local/include/gdbm.h, [ GDBM_INCLUDE="-I/usr/local/include" ],[ - AC_MSG_RESULT(Try /opt/local/include/gdbm.h) + AC_MSG_NOTICE(Trying /opt/local/include/gdbm.h) AC_CHECK_HEADER(/opt/local/include/gdbm.h, [ GDBM_INCLUDE="-I/opt/local/include" ],[ dnl if in /usr/pkg/include, do not add anything. See above. - AC_MSG_RESULT(Try /usr/pkg/include/gdbm.h) + AC_MSG_NOTICE(Trying /usr/pkg/include/gdbm.h) AC_CHECK_HEADER(/usr/pkg/include/gdbm.h, [ GDBM_INCLUDE="" ],[ - AC_MSG_RESULT([Giving up - You need to install gdbm.h somewhere]) - exit + AC_MSG_RESULT([gdbm.h not found]) + gdbm_headers_found=0 ]) ]) ]) - ]) -fi + ]) -if test -n "$GDBM_LIB"; then - AC_ADD_INCLUDE($GDBM_INCLUDE) - AC_DEFINE(HAVE_GDBM_H) - EXTRA_LIBS="$EXTRA_LIBS $GDBM_LIB" + if test ${gdbm_headers_found} -eq 1; then + AC_CHECK_LIB(gdbm, gdbm_open,[DBM_TYPE=gdbm; GDBM_LIB=-lgdbm], + [DBM_TYPE=""]) + fi + + unset gdbm_headers_found fi +if test -n "${GDBM_LIB}"; then + AC_DEFINE(HAVE_GDBM_H) + AC_DEFINE(GDBM,1, [Whether you have GDBM]) + AC_ADD_INCLUDE($GDBM_INCLUDE) + EXTRA_LIBS="$EXTRA_LIBS $GDBM_LIB" +else + AC_MSG_WARN(hypermail's usegdbm config option won't be available unless you install libgdbm-dev) +fi dnl dnl iconv check dnl -AC_ARG_ENABLE(i18n, [ --disable-i18n Disable I18N support], [given_iconv=$enableval]) -if test "$given_iconv" = "no"; then - echo "disabled I18N support." 
-else - AC_CHECK_FUNCS(iconv) - AC_CHECK_HEADERS(iconv.h) -fi +dnl iconv is part of libc6 +AC_CHECK_HEADERS(iconv.h, + [], + [AC_MSG_ERROR([unable to find iconv.h headers (libc6-dev may be missing)])]) +AC_CHECK_FUNCS(iconv, + [], + [AC_MSG_ERROR([unable to find iconv() function])]) dnl dnl libtrio: select whether to use the system or the bundled libtrio @@ -374,117 +415,184 @@ else TRIO_DEP="trio/libtrio.a" APR_SUBDIR_CONFIG([src/trio], [CFLAGS=[-DTRIO_MINIMAL]], - [--with-pcre=*|\'--with-pcre=*]) + [--with-pcre2=*|\'--with-pcre2=*]) AC_SUBST([TRIO_DEP]) fi dnl -dnl PCRE: select whether to use an external, the system, or the bundled PCRE lib +dnl PCRE2: select whether to use an external, the system, or the bundled PCRE2 lib dnl partially borrowed from Apache's configure.in dnl dnl bundled pcre lib AC_ARG_ENABLE(bundled_pcre, - AS_HELP_STRING([--enable-bundled-pcre], - [Force the use of the bundled PCRE library instead of the system one])) + AS_HELP_STRING([--enable-bundled-pcre2], + [Force the use of the bundled PCRE2 library instead of the system one])) -dnl external PCRE lib -AC_ARG_WITH(external_pcre, - AS_HELP_STRING([--with-external-pcre=PATH_TO_PCRE_DIR|PATH_TO_PCRE_CONFIG_SCRIPT], - [Use an external PCRE library instead of the system or the bundled one])) +dnl external PCRE2 lib +AC_ARG_WITH(external_pcre2, + AS_HELP_STRING([--with-external-pcre2=PATH_TO_PCRE2_DIR|PATH_TO_PCRE2_CONFIG_SCRIPT], + [Use an external PCRE2 library instead of the system or the bundled one])) -AC_MSG_NOTICE(checking for PCRE regular expressions library) +AC_MSG_NOTICE(checking for PCRE2 regular expressions library) dnl if user selected the bundled one, give it priority over the bundled one -if test ! -z ${enable_bundled_pcre}; then - with_external_pcre="" +if test ! 
-z ${enable_bundled_pcre2}; then + with_external_pcre2="" fi -case $with_external_pcre in - /*) AC_MSG_NOTICE(--with-external-pcre => checking for an external libpcre) - AC_MSG_CHECKING(for pcre-config) - if test -d "$with_external_pcre" && test -x "$with_external_pcre/pcre-config"; then - PCRE_CONFIG=$with_external_pcre/pcre-config - elif test -x "$with_external_pcre"; then - PCRE_CONFIG=$with_external_pcre +case $with_external_pcre2 in + /*) AC_MSG_NOTICE(--with-external-pcre2 => checking for an external libpcre2) + AC_MSG_CHECKING(for pcre2-config) + if test -d "$with_external_pcre" && test -x "$with_external_pcre2/pcre2-config"; then + PCRE2_CONFIG=$with_external_pcre2/pcre2-config + elif test -x "$with_external_pcre2"; then + PCRE2_CONFIG=$with_external_pcre2 else AC_MSG_RESULT(no) - AC_MSG_ERROR(${PCRE_CONFIG} does not point to a directory or pcre-config) + AC_MSG_ERROR(${PCRE2_CONFIG} does not point to a directory or pcre2-config) fi - AC_MSG_RESULT(${PCRE_CONFIG}) + AC_MSG_RESULT(${PCRE2_CONFIG}) - AC_MSG_CHECKING(for pcre version equal or greater than ${PCRE_MIN_VERSION}) - if test -f "${PCRE_CONFIG}" && test -x "${PCRE_CONFIG}"; then - PCRE_VERSION=$(${PCRE_CONFIG} --version) - AS_VERSION_COMPARE("${PCRE_VERSION}", "${PCRE_MIN_VERSION}", - [PCRE_CONFIG=false], [], []) - if test "${PCRE_CONFIG}" != false; then + AC_MSG_CHECKING(for pcre2 version equal or greater than ${PCRE2_MIN_VERSION}) + if test -f "${PCRE2_CONFIG}" && test -x "${PCRE2_CONFIG}"; then + PCRE2_VERSION=$(${PCRE2_CONFIG} --version) + AS_VERSION_COMPARE("${PCRE2_VERSION}", "${PCRE2_MIN_VERSION}", + [PCRE2_CONFIG=false], [], []) + if test "${PCRE2_CONFIG}" != false; then AC_MSG_RESULT(yes) else AC_MSG_RESULT(no) - AC_MSG_ERROR(the PCRE library version must be equal or greater than ${PCRE_MIN_VERSION}) + AC_MSG_ERROR(the PCRE2 library version must be equal or greater than ${PCRE2_MIN_VERSION}) fi else AC_MSG_RESULT(no) - AC_MSG_ERROR(${PCRE_CONFIG} does not point to a script) + 
AC_MSG_ERROR(${PCRE2_CONFIG} does not point to a script) fi ;; - *) PCRE_CONFIG=false + *) PCRE2_CONFIG=false ;; esac dnl nope; do we have a system PCRE lib? -if test -z "${enable_bundled_pcre}" && test -z "${PCRE_CONFIG}" -o "${PCRE_CONFIG}" = "false"; then - AC_MSG_NOTICE(checking for a system libpcre) - AC_PATH_PROG([PCRE_CONFIG], [pcre-config], []) - if test ! -z "${PCRE_CONFIG}"; then - AC_MSG_CHECKING(checking pcre version equal or greater than ${PCRE_MIN_VERSION}) - PCRE_VERSION=$(${PCRE_CONFIG} --version) - AS_VERSION_COMPARE("${PCRE_VERSION}", "${PCRE_MIN_VERSION}", - [PCRE_CONFIG=false], [], []) - if test "${PCRE_CONFIG}" != false; then +if test -z "${enable_bundled_pcre2}" && test -z "${PCRE2_CONFIG}" -o "${PCRE2_CONFIG}" = "false"; then + AC_MSG_NOTICE(checking for a system libpcre2) + AC_PATH_PROG([PCRE2_CONFIG], [pcre2-config], []) + if test ! -z "${PCRE2_CONFIG}"; then + AC_MSG_CHECKING(checking pcre2 version equal or greater than ${PCRE2_MIN_VERSION}) + PCRE2_VERSION=$(${PCRE2_CONFIG} --version) + AS_VERSION_COMPARE("${PCRE2_VERSION}", "${PCRE2_MIN_VERSION}", + [PCRE2_CONFIG=false], [], []) + if test "${PCRE2_CONFIG}" != false; then AC_MSG_RESULT(yes) else AC_MSG_RESULT(no) fi else - PCRE_CONFIG=false + PCRE2_CONFIG=false fi fi dnl found a library, do we have compilation headers? 
-if test "${PCRE_CONFIG}" != false; then - PCRE_PATH=$(${PCRE_CONFIG} --prefix) - AC_CHECK_HEADER(["$PCRE_PATH/include/pcre.h"], [ ],[PCRE_CONFIG=false]) +if test "${PCRE2_CONFIG}" != false; then + PCRE2_PATH=$(${PCRE2_CONFIG} --prefix) + AC_CHECK_HEADER(["$PCRE2_PATH/include/pcre2.h"], [ ],[PCRE2_CONFIG=false], + [#define PCRE2_CODE_UNIT_WIDTH 8]) fi dnl PCRE conclusion -if test "$PCRE_CONFIG" != "false"; then - AC_MSG_NOTICE([using system PCRE regular expressions library]) - APR_ADDTO(CFLAGS, [`$PCRE_CONFIG --cflags`]) - APR_ADDTO(LIBS, [`$PCRE_CONFIG --libs`]) +if test "$PCRE2_CONFIG" != "false"; then + AC_MSG_NOTICE([using system PCRE2 regular expressions library]) + APR_ADDTO(CFLAGS, [`$PCRE2_CONFIG --cflags`]) + APR_ADDTO(LIBS, [`$PCRE2_CONFIG --libs8`]) else - AC_MSG_NOTICE([using bundled PCRE regular expressions library]) - APR_ADDTO(INCLUDES, [-Ipcre]) - APR_ADDTO(LDFLAGS, "[-Lpcre/.libs]") - PCRE_DEP="pcre/.libs/libpcre.a" - APR_SUBDIR_CONFIG([src/pcre], + AC_MSG_NOTICE([using bundled PCRE2 regular expressions library]) + APR_ADDTO(INCLUDES, [-Ipcre2/src]) + APR_ADDTO(LDFLAGS, "[-Lpcre2/.libs]") + PCRE2_DEP="pcre2/.libs/libpcre2-8.a" + APR_SUBDIR_CONFIG([src/pcre2], [], - [--with-pcre=*|\'--with-pcre=*]) - AC_SUBST([PCRE_DEP]) + [--with-pcre2=*|\'--with-pcre2=*]) + AC_SUBST([PCRE2_DEP]) +fi + +dnl +dnl check for libchardet +dnl + +AC_ARG_WITH(libchardet, + AS_HELP_STRING([--with-libchardet=[DIR]], + [Use libchardet for character set detection, optional DIR points to path to local installed libchardet; leave empty for using system libchardet]), + [ given_libchardet=$withval ], + [ given_libchardet='yes' ]) + +found_libchardet="" + +AC_MSG_CHECKING(whether to use libchardet for automatic character set detection) +if test -n "${given_libchardet}" -a "${given_libchardet}" != "no"; then + AC_MSG_RESULT(yes) + + user_pkg_config_path="" + pkg_config_print_libs="" + + dnl check for libchardet in user-defined dir or system one (default) + if test 
"${given_libchardet}" != "yes" -a "${given_libchardet}" != "/usr"; then + AC_MSG_CHECKING(for libchardet in "${given_libchardet}") + if test -d "${given_libchardet}/lib/pkgconfig"; then + user_pkg_config_path="${given_libchardet}/lib/pkgconfig" + pkg_config_print_libs="--libs" + else + AC_MSG_RESULT(no) + AC_MSG_NOTICE("${given_libchardet}"/lib/pkgconfig doesn't exist) + fi + fi + + if test -z "${user_pkg_config_path}"; then + AC_MSG_CHECKING(for system libchardet) + pkg_config_print_libs="--libs-only-l" + fi + + cflags=`PKGCONFIG_PATH="${user_pkg_config_path}:${PKG_CONFIG_PATH}" $PKGCONF chardet --cflags-only-I 2>/dev/null` + + if test $? -eq 0; then + AC_MSG_RESULT(yes) + APR_ADDTO(INCLUDES, [$cflags]) + ldflags=`PKG_CONFIG_PATH="${user_pkg_config_path}:${PKG_CONFIG_PATH}" $PKGCONF chardet ${pkg_config_print_libs} 2>/dev/null` + APR_ADDTO(EXTRA_LIBS, [$ldflags]) + unset ldflags + found_libchardet=yes + AC_DEFINE_UNQUOTED(HAVE_CHARDET,[1],[Use libchardet for character set detection]) + else + AC_MSG_RESULT(no) + fi + + unset cflags + unset user_pkg_config_path + unset pkg_config_print_libs +else + AC_MSG_RESULT(no) +fi + +if test "${found_libchardet}" != "yes"; then + AC_MSG_WARN(hypermail's automatic charset detection won't) + AC_MSG_WARN(be available unless you install libchardet.) 
+ AC_MSG_WARN(See hypermail's release notes) + dnl AC_MSG_WARN(https://github.com/Joungkyun/libchardet) fi +unset found_libchardet + dnl dnl The FNV hash library used for the nonsequential filenames dnl AC_MSG_CHECKING([whether to use fnv hash library for non-sequential filenames]) AC_ARG_ENABLE(libfnv, - [ AS_HELP_STRING([--enable-libfnv], - [use the fnv hash library for generating non-sequential filenames [no]]) + [AS_HELP_STRING([--enable-libfnv], + [(EXPERIMENTAL, UNMAINTAINED) use the fnv hash library for generating non-sequential filenames [no]]) ], - [ given_libfnv=$enableval ]) + [given_libfnv=$enableval]) if test "$given_libfnv" != "yes"; then AC_MSG_RESULT([no]) @@ -499,41 +607,6 @@ else fi AC_SUBST(FNV_DEP) - -dnl =========================================================================== -dnl Checks for libraries. -dnl nsl socket lib? -dnl =========================================================================== - -USENSL=no -AC_CHECK_LIB(socket,gethostbyaddr,result=yes,result=no) -if test $result = yes; then - LIBS="$LIBS -lsocket" -else - AC_CHECK_LIB(socket,gethostbyaddr,result=yes,result=no,-lnsl) - if test $result = yes; then - LIBS = "$LIBS -lsocket -lnsl" - USENSL=yes - else - AC_CHECK_LIB(socket,inet_addr,result=yes,result=no) - if test $result = yes; then - LIBS="$LIBS -lsocket" - else - AC_CHECK_LIB(socket,inet_addr,result=yes,result=no,-lnsl) - if test $result = yes; then - LIBS="$LIBS -lsocket -lnsl" - USENSL=yes - fi - fi - fi -fi -if test $USENSL != yes; then - AC_CHECK_LIB(nsl,inet_addr,result=yes,result=no) - if test $result = yes; then - LIBS="$LIBS -lnsl" - fi -fi - dnl =========================================================================== dnl Makefile variable substitution dnl =========================================================================== @@ -545,6 +618,6 @@ AC_SUBST(LIBS) AC_SUBST(EXTRA_LIBS) AC_SUBST(INCLUDES) -AC_CONFIG_FILES([Makefile archive/Makefile docs/Makefile libcgi/Makefile src/Makefile tests/testhm 
src/defaults.h]) +AC_CONFIG_FILES([Makefile archive/Makefile docs/Makefile src/Makefile tests/testhm src/defaults.h]) AC_OUTPUT diff --git a/contrib/css_to_c.pl b/contrib/css_to_c.pl new file mode 100644 index 00000000..9556495d --- /dev/null +++ b/contrib/css_to_c.pl @@ -0,0 +1,95 @@ +#!/usr/bin/perl + +use strict; +use warnings; + +# This script reads a CSS file from stdin and writes to stdout the C +# code for the src/printcss.c file. Replace that file with the +# output from this script. +# +# print_default_css_file() will generate the default CSS file for +# a hypermail mailing list archive + +# Usage: css_to_c.pl <../docs/hypermail.css >../src/printcss.c +# +# Author: J. Kahan Sep 2021 +# + +sub beginning { + + print << "__HERE__"; +/* +** Copyright (C) 2021 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + +/* THIS FILE IS GENERATED BY RUNNING contrib/css_to_c.pl ON A CSS FILE +** DO NOT EDIT THIS FILE BY HAND; RATHER EDIT docs/hypermail.css TO +** REGENERATE IT. 
IF YOU WANT TO CHANGE THE C CODE ITSELF, PLEASE EDIT +** THE css_to_c.pl SCRIPT +*/ + +#include "hypermail.h" +#include "setup.h" + +#include "proto.h" + +void print_default_css_file(char *filename) +{ + FILE *fp; + + if ((fp = fopen(filename, "w")) == NULL) { + trio_snprintf(errmsg, sizeof(errmsg), "%s \\"%s\\".", lang[MSG_COULD_NOT_WRITE], filename); + progerr(errmsg); + } + +__HERE__ +} + +sub end { + print << "__HERE__"; + + fclose(fp); + + if (chmod(filename, set_filemode) == -1) { + trio_snprintf(errmsg, sizeof(errmsg), "%s \\"%s\\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + progerr(errmsg); + } + +} /* end print_default_css_file */ + +__HERE__ +} + +sub convert_css { + while (<>) { + chomp; + # escape the few chars that may interfere with fprintf + # % == %% + # " == \" + # \ == \\ + s#%#%%#g; + s#(["\\])#\\$1#g; + print " fprintf(fp, \"" . $_ . "\\n\");\n"; + } +} + +# main +{ + beginning(); + convert_css(); + end(); +} diff --git a/docs/Makefile.in b/docs/Makefile.in index a8a087bf..f271348c 100644 --- a/docs/Makefile.in +++ b/docs/Makefile.in @@ -17,23 +17,24 @@ htmldir=@htmldir@ INSTALL_PROG=@INSTALL@ all: install - + uninstall: rm -f $(mandir)/man1/hypermail.1 rm -f $(mandir)/man4/hmrc.4 (if [ -d $(htmldir) ]; then \ - rm -f $(htmldir)/hr.yellow.png; \ - rm -f $(htmldir)/hypermail.png; \ - rm -f $(htmldir)/stars.png; \ rm -f $(htmldir)/archive_search.html; \ - rm -f $(htmldir)/hypermail.html; \ - rm -f $(htmldir)/hypermail-faq.html; \ rm -f $(htmldir)/customizing.html; \ + rm -f $(htmldir)/faq.html; \ rm -f $(htmldir)/hmrc.html; \ + rm -f $(htmldir)/hypermail.css; \ + rm -f $(htmldir)/hypermail.html; \ + rm -f $(htmldir)/hypermail.png; \ + rm -f $(htmldir)/hypermail-doc.css; \ + rm -f $(htmldir)/thanks.html; \ rmdir $(htmldir); \ fi) -install: install.man install.html +install: install.man install.html install.css install.man: @(if [ ! -d $(mandir) ]; then mkdir -p $(mandir); fi) @@ -44,14 +45,18 @@ install.man: install.html: @(if [ ! 
-d $(htmldir) ]; then mkdir -p $(htmldir); fi) - $(INSTALL_PROG) -c -m 0644 hr.yellow.png $(htmldir) - $(INSTALL_PROG) -c -m 0644 hypermail.png $(htmldir) - $(INSTALL_PROG) -c -m 0644 stars.png $(htmldir) $(INSTALL_PROG) -c -m 0644 archive_search.html $(htmldir) - $(INSTALL_PROG) -c -m 0644 hypermail.html $(htmldir) - $(INSTALL_PROG) -c -m 0644 hypermail-faq.html $(htmldir) $(INSTALL_PROG) -c -m 0644 customizing.html $(htmldir) + $(INSTALL_PROG) -c -m 0644 faq.html $(htmldir) $(INSTALL_PROG) -c -m 0644 hmrc.html $(htmldir) + $(INSTALL_PROG) -c -m 0644 hypermail.html $(htmldir) + $(INSTALL_PROG) -c -m 0644 hypermail.png $(htmldir) + $(INSTALL_PROG) -c -m 0644 hypermail-doc.css $(htmldir) + $(INSTALL_PROG) -c -m 0644 thanks.html $(htmldir) + +install.css: + @(if [ ! -d $(htmldir) ]; then mkdir -p $(htmldir); fi) + $(INSTALL_PROG) -c -m 0644 hypermail.css $(htmldir) clean: clobber: diff --git a/docs/archive_search.html b/docs/archive_search.html index 0ee645b0..faad40db 100644 --- a/docs/archive_search.html +++ b/docs/archive_search.html @@ -1,74 +1,85 @@ - - - - Adding a search engine to your Hypermail archives - - - - - + + + + + + Adding a search engine to your Hypermail archives + + + + + + -

Adding a search engine
to your Hypermail archives

- -

One of the feature requests we hear most often is to incorporate -a search engine in Hypermail. But because everyone has his or her -own favorite search engine, and since it's hard to imagine a search engine -that may not turn out to be unusable -for some archives, we haven't built a search engine into hypermail, and -we probably won't.

- -

But hypermail's page -customizations make it easy to integrate your own search engine into your -hypermail archives.

- -

For our example, we're going to put a form box on the top and bottom of -every index page, and we'll use the swish-e -search engine. We'll show a typical PHP -script and a typical Perl script that can -function as a glue layer between the web and the search engine. -There ought to be enough information here to get you -started regardless of what search engine and scripting language you choose -for your site.

- -

Let's begin by modifying the header and footer hypermail puts on each -index file.

- -

(1) Create a file called indexheader.hyp containing the following:

-

indexheader.hyp

-
+    

THIS DOC IS VERY OLD, WRITTEN FOR THE DEFUNCT SWISH SEARCH ENGINE AND HASN'T BEEN REVIEWED IN AGES.

+

IT'S INCLUDED HERE FOR INFORMATIONAL PURPOSES.

+ +

Adding a search engine
+ to your Hypermail archives

+

One of the feature requests we hear most often is to + incorporate a search engine in Hypermail. But because everyone + has his or her own favorite search engine, and since it's hard to + imagine a search engine that would not turn out to be unusable for + some archives, we haven't built a search engine into hypermail, + and we probably won't.

+

But hypermail's page customizations make it easy to integrate + your own search engine into your hypermail archives.

+

For our example, we're going to put a form box on the top and + bottom of every index page, and we'll use the swish-e search engine. We'll show a + typical PHP script and a + typical Perl script that can + function as a glue layer between the web and the search engine. + There ought to be enough information here to get you started + regardless of what search engine and scripting language you + choose for your site.

+

Let's begin by modifying the header and footer hypermail puts + on each index file.

+

(1) Create a file called indexheader.hyp containing the + following:

+

indexheader.hyp

+
     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -110,26 +121,25 @@ 

indexheader.hyp

</div>
- -

Notice the HTML comment about 2/3 of the way down. Everything -above that line is standard, and ought to be very similar to the default -header. The custom code (below the comment) adds the features you -want to add. In this case, it's a form for a search.

- -

You will, of course, substitute your own URL for search.php for the form's ACTION -and the name of your own search index for "name.of.index". See the bottom of this page for -an example of creating a search index file.

- -

If you like the general appearance of the web pages Hypermail creates and -don't want to spend a lot of time playing with the HTML, you might want to -look at the source of "index.html" in the directory for which you're installing -a search engine and just copy the appropriate lines (e.g., the lines -above the first <hr />) to indexheader.hyp.

- -

(2) Create a file called indexfooter.hyp containing the following:

- -

indexfooter.hyp

-
+  

Notice the HTML comment about 2/3 of the way down. Everything + above that line is standard, and ought to be very similar to the + default header. The custom code (below the comment) adds the + features you want to add. In this case, it's a form for a + search.

+

You will, of course, substitute your own URL for search.php + for the form's ACTION and the name of your own search index for + "name.of.index". See the bottom of this page for an example of + creating a search index file.

+

If you like the general appearance of the web pages Hypermail + creates and don't want to spend a lot of time playing with the + HTML, you might want to look at the source of "index.html" in the + directory for which you're installing a search engine and just + copy the appropriate lines (e.g., the lines above the + first <hr />) to indexheader.hyp.

+

(2) Create a file called indexfooter.hyp containing the + following:

+

indexfooter.hyp

+
     <div class="center">
     <form action="/url/of/search.php" method="post"
         enctype="application/x-www-form-urlencoded">
@@ -153,18 +163,16 @@ 

indexfooter.hyp

</html>
-

Since this is a footer file, the common code will be at the end, and -you'll put your custom code above the comment.

- -

You may have noted what seem to be superfluous <div> elements in -the HTML in the header and footer file. -The W3C validator likes them, and -they do no harm.

- -

(3) Modify the .hmrc file to point to your custom header and footer:

- -

.hmrc (excerpt)

-
+  

Since this is a footer file, the common code will be at the + end, and you'll put your custom code above the comment.

+

You may have noted what seem to be superfluous <div> + elements in the HTML in the header and footer files. The W3C validator likes them, and they + do no harm.

+

(3) Modify the .hmrc file to point to your custom header and + footer:

+

.hmrc (excerpt)

+
     # ihtmlheaderfile = [ path to index header template file | NONE ]
     #
     # Set this to the path to the Index header template file containing
@@ -182,18 +190,16 @@ 

.hmrc (excerpt)

ihtmlfooterfile = /path/to/indexfooter.hyp
-

You'll call hypermail using this .hmrc file to create your archive -or add a message:

- -
+  

You'll call hypermail using this .hmrc file to create your + archive or add a message:

+
     hypermail -c /path/to/.hmrc [...]
 
 
-

Here's a sample PHP script that performs the minimum functionality -required for "search.php":

- -

search.php

-
+  

Here's a sample PHP script that performs the minimum + functionality required for "search.php":

+

search.php

+
     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -272,29 +278,25 @@ 

search.php

</html>
- -

If you prefer Perl, or if your web server doesn't offer PHP, then you could -modify these lines in indexheader.hyp and indexfooter.hyp:

- -
+  

If you prefer Perl, or if your web server doesn't offer PHP, + then you could modify these lines in indexheader.hyp and + indexfooter.hyp:

+
     <form action="/url/of/search.php" method="post"
         enctype="application/x-www-form-urlencoded">
 
 
- -

to

- -
+  

to

+
     <form action="/url/of/search.pl" method="post"
         enctype="application/x-www-form-urlencoded">
 
 
- -

Here's a sample script contributed by Perl maven Greg Bacon that -performs the minimum functionality required for "search.pl":

- -

search.pl

-
+  

Here's a sample script contributed by Perl maven Greg Bacon + that performs the minimum functionality required for + "search.pl":

+

search.pl

+
 
     #! /usr/local/bin/perl -T
     
@@ -502,14 +504,13 @@ 

search.pl

EOFooter
- -

Swish-e, like many other search engines, requires you to generate -a search index before you perform any searches. Here's a .conf file for -swish-e that might generate a reasonable -search index for a hypermail archive of messages about model railroading:

- -

model-rr.conf

-
+  

Swish-e, like many other search engines, requires you to + generate a search index before you perform any searches. Here's a + .conf file for swish-e that might generate a reasonable search + index for a hypermail archive of messages about model + railroading:

+

model-rr.conf

+
     IndexDir /path/to/model-rr-archive
     IndexFile /path/to/model-rr.index
     IndexReport 3
@@ -523,30 +524,32 @@ 

model-rr.conf

FileRules filename is thread.html
- -

This tells swish-e to collect data from all the HTML files in your -model railroading archive except the index pages that hypermail -generates. That way, all the search results will point directly to the -messages themselves.

- -

When you want to build your search index, you'll call swish-e something like this:

- -
+  

This tells swish-e to collect data from all the HTML files in + your model railroading archive except the index pages + that hypermail generates. That way, all the search results will + point directly to the messages themselves.

+

When you want to build your search index, you'll call swish-e + something like this:

+
     swish-e -v 3 -c /path/to/model-rr.conf > /path/to/model-rr.report 2>&1
 
 
-

Many people do this either in a cron job during off-peak hours or -through a CGI script that they call whenever they update their archive.

- -

You might want to add search logging, page control (so you only -print, say, 20 results at a time), some nice CSS, and all sorts of other -things to your script. And if you'd prefer to write your script in -Python, sh, or Ada, you can do that too. From here on, it's up to you.

- -
--- -
Bob Crispen
-
Thursday, June 26, 2003
-
+

Many people do this either in a cron job during off-peak hours + or through a CGI script that they call whenever they update their + archive.

+

You might want to add search logging, page control (so you + only print, say, 20 results at a time), some nice CSS, and all + sorts of other things to your script. And if you'd prefer to + write your script in Python, sh, or Ada, you can do that too. + From here on, it's up to you.

+
+ -- +
+ Bob Crispen +
+
+ Thursday, June 26, 2003 +
+
diff --git a/docs/customizing.html b/docs/customizing.html index 653bd241..9a085e0c 100644 --- a/docs/customizing.html +++ b/docs/customizing.html @@ -1,478 +1,514 @@ - - - -Hypermail Documentation - - - -

   Customizing Hypermail Pages

- -
- -

-Contents: -

- -


- -

Hypermail Pages

- -You can customize hypermail generated pages to suit your local web site -needs as well as the needs of the list. -Hypermail generates three types of files: -
    -
  • HTML index pages, -
  • HTML message pages, -
  • MIME enclosure attactment files. -
-

-The attachment files are a copy of the attachment the user included and -are not altered. -

-This version of hypermail allows you to -customize both index and message pages separately as described below. - -


- -

Definitions

-

-In the examples below, the following terms are used. -

-label - the label passed in via the -command line or specified in the list configuration file. -

-indextype - depends on the type of index being -presented. It could be By Author, By Date, By Subject, or By -Thread. -

-mailto-address - the MAILTO -value compiled into hypermail, specified in the environment with the -HM_MAILTO variable, or specified in the -hm_mailto variable in the list specific -configuration file. -

-subject-of-message - the contents of the message's -RFC 2822 Subject: -header. -

-HMURL - Used to contain a link to the Hypermail -Development Center. Defined in hypermail.h. -

-PROGNAME - contains the name of the executable. -Defined in hypermail.h. -

-VERSION - contains the version of the software that -generated the page this appears on. Defined in src/hypermail.h. - -


- -

Choosing the Default Look of Your Pages

-

-There is no need to customize hypermail pages unless you have a specific -need. There are two different default page layouts provided with hypermail, -the Table Menu Display and the Standard Display. - -

-

Standard Page HTML

-

- -If you are not using the HTML template files described -below then Hypermail generates headers and footers that look similar -to the following. Note that you can substitute a <BODY> statement by -either defining BODY in options.h or by using the -hm_body variable in a list specific configuration file. - -

-

Index Page Headers

-

- -The default Index page headers used in hypermail look like: - -

- - -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -
<HTML> -
<HEAD> -
<TITLE>labelindextype</TITLE> -
<LINK REV="made" HREF="mailto:mailto-address"> -
<HEAD> -
<BODY BGCOLOR="#ffffff" TEXT="#000000"> -
<H1 ALIGN=CENTER>label<BR>By indextype</H1> -
<HR WIDTH=400> -
-
-
- -

-

Message Pages

-

- -The default Message page headers used in hypermail look like: - -

- - -<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> -
<HTML> -
<HEAD> -
<TITLE>label:  subject-of-message</TITLE> -
<LINK REV="made" HREF="mailto:mailto-address"> -
<HEAD> -
<BODY BGCOLOR="#ffffff" TEXT="#000000"> -<H1 ALIGN=CENTER>subject</H1> -
<HR> -
-
-
- -

-

Page Footer

-

- -The default page footer shown below is used in hypermail on both the -index and the message pages look like: - -

- - -<HR> -
<P> -
<SMALL> -
<EM> -
This archive was generated by  <A HREF="HMURL">PROGNAME VERSION</A> on DATE and TIME -
</EM> -
</SMALL> -
</BODY> -
</HTML> -
-
-
- -

-

Table Menu Display

-

- -The table menu display generates pages that have a menu bar at the top -and the bottom of the page that looks something like the following. If -you have enabled the "About" and "Other Archives" links are displayed if -you have enabled them in options.h, the environment variables or in the -list configuration file. A example page: - -

TITLE HERE

-

-

-

-Kent Landfield (kent@landfield.com)
-Wed, 17 Jun 1998 22:28:29 -0500 (CDT) -

-

-
-

BODY OF MESSAGE HERE

-
-

-

- - -

- - -This archive was generated by hypermail 2.0x -on Thu Jun 04 1998 - 10:05:34 CDT - - -

-If you do not want to use the table display then make sure that -USETABLE is not defined in options.h and the -hm_usetable is not enabled in the list configuration file. -When you do so you will get the standard look and feel of the hypermail -you have grown acustomed to. - -

- -Also note that New Message allows you to provide a means -for someone to post a message to the list. This feature is currently only -availabe on the Table Menu Display. It can be enabled or disabled by defining -HMAIL in options.h or by setting the hm_hmail -variable in the list specific configuration file. - -


- -

Using Template Files to Customize Your Pages

-

-You can customize your page headers and footers by specifying HTML template -files. Hypermail reads template files and uses those in generating the header -and footer sections of the index and message pages. Template files contain -the actual HTML that you want used when generating the pages. Template files -may also contain "Substitution Cookies". - -

-

Substitution Cookies

-

-You can insert "substitution cookies" in the header and footer template -files so the appropriate information is filled in at runtime. -

-Substitution cookies supported: -

-

- - - - - - - - - - - - - - - - - -
%%- '%' character
%~- Storage directory
%a- Other Archives URL
%b- About Archive URL
%e- Email address of message Author - Not valid on index pages
%g- Date and time archive generated
%h- HMURL
%i- Message-id - Not valid on index pages
%l- Archive label
%m- Mailto address
%p- PROGNAME
%s- Subject of message or Index Title
%v- VERSION
%u- Expanded version link (HMURL,PROGNAME,VERSION)
\n- newline character
\t- tab character
-
-

-Additional cookies generate the complete HTML lines: -

- - - - -
%A- Author META HTML - Not valid on index pages
%B- BODY HTML statement
%S- Subject META HTML
-
- -

-

Specifying template file locations

-

-You can specify the location of the template files either via environment -variables or via entries in the list specific configuration file. -

-

Using Environment Variables

-

-If you wish to use environment variables, make sure they are exported and -correctly available for hypermail to use. How to do this is dependent on -the type of shell in use. See the appropriate man pages if you need more -information. -

    -
  • HM_IHTMLHEADERFILE - the location of the INDEX header template. -
  • HM_IHTMLFOOTERFILE - the location of the INDEX footer template. -
  • HM_MHTMLHEADERFILE - the location of the MESSAGE header template. -
  • HM_MHTMLFOOTERFILE - the location of the MESSAGE footer template. -
-

-

Using Configuration File Entries

-

-An easy way is to tell hypermail where the template files are is via a -list specific configuration file. The following entries can be used. -

-

    -
  • hm_ihtmlheaderfile - the location of the INDEX header template. -
  • hm_ihtmlfooterfile - the location of the INDEX footer template. -
  • hm_mhtmlheaderfile - the location of the MESSAGE header template. -
  • hm_mhtmlfooterfile - the location of the MESSAGE footer template. -
- -

-

Examples

-

- -It is acceptable to have a single configuration file listed in more than -one entry. Suppose you want to have a common footer for all pages and -separate headers. The following example shows that. -

-

-hm_ihtmlheaderfile = /lists/wu-ftpd/wu-ftpd-index.hyp
-hm_mhtmlheaderfile = /lists/wu-ftpd/wu-ftpd-msg.hyp
-hm_ihtmlfooterfile = /lists/wu-ftpd/wu-ftpd-msgfooter.hyp
-hm_mhtmlfooterfile = /lists/wu-ftpd/wu-ftpd-msgfooter.hyp -
-

- -If an entry is left blank and a location is not specified via an environment -variable then the hypermail default headers are used. -

-

-hm_ihtmlheaderfile = /lists/wu-ftpd/wu-ftpd-index.hyp
-hm_ihtmlfooterfile =
-hm_mhtmlheaderfile = /lists/wu-ftpd/wu-ftpd-msg.hyp
-hm_mhtmlfooterfile = /lists/wu-ftpd/wu-ftpd-msgfooter.hyp
-
- -The above example informs hypermail to use the template files listed for -the Index header and the Message header and footer. The hypermail default -page footer would be used on the index pages. -

- -NOTE: While it is not necessary to provide absolute paths, -it is a good idea to. - -

-

Message Pages

-

- -Each HTML file that is generated for a message contains (where applicable): -

-

    -
  • the subject of the article, -
  • the name and email address of the sender, -
  • the date the article was sent, -
  • links to the next and previous messages in the archive, -
  • a link to the message the article is in reply to, and -
  • a link to the message next in the current thread. -
- -


- -

Including Reference Links

-

- -Reference links such as the following - -

- -are normally included on each of the message pages. If this -is not what you want you can disable them by using the -SHOW_MESSAGE_LINKS -define in options.h or the hm_show_msg_links -in the list's configuration file. - -

- -Additionally, if you want to list all replies to a message -such as the following, - -

- -you can do so by setting the SHOWREPLIES -define in options.h or the hm_showreplies -in the list's configuration file. - -


- -

In-lining Images

-

-It is possible to have images that are sent in email automatically -displayed when the message is presented. To do this you need to set -the hm_inline_types in the list configuration file. - -

-For example, if you listed -

-hm_inline_types = image/gif image/jpeg -
-then both GIF files and JPEG files would be displayed as part of the -message. Types that are not "in-lined" are linked as a simple attachment -requiring the user to click on it to have it displayed. - -


- -

Changing The HTML File Suffix

-

-You may wish to have the pages generated use a different HTML file -suffix other than the default ".html". To do this you need -to either set the default define HTMLSUFFIX in options.h, -set the environment variable HM_HTMLSUFFIX or set it in -the list's configuration file by using the hm_htmlsuffix -variable. -

-Note: -Do not include a "." in the suffix; If you do -you will end up with filenames that look like. "..html" - -


-

See Also

-
-hypermail.(1), -  -hmrc.(4), -  -Hypermail -  -and -  -Hypermail List Configuration File. -and Adding a Search Engines to your Hypermail Archive -
- -
- -Last updated April 10, 2003 - - - - + + + + + + Hypermail Documentation - Customizing Hypermail Pages + + + + +

Customizing Hypermail Pages

+
+

Contents:

+ +
+

Hypermail Pages

You can + customize hypermail generated pages to suit your local web site + needs as well as the needs of the list. Hypermail generates three + types of files: +
    +
  • HTML index pages,
  • +
  • HTML message pages,
  • +
MIME enclosure attachment files.
  • +
+

The attachment files are a copy of the attachment the user + included and are not altered.

+

This version of hypermail allows you to customize both index + and message pages separately as described below.

+
+

Definitions

+

In the examples below, the following terms are used.

+

label - the label passed in via the + command line or specified in the list configuration file.

+

indextype - depends on the type of + index being presented. It could be By Author, By + Date, By Subject, or By Thread.

+

mailto-address - the + MAILTO value compiled into hypermail, specified + in the environment with the HM_MAILTO variable, + or specified in the hm_mailto variable in the + list specific configuration file.

+

subject-of-message - the contents of + the message's RFC + 2822 Subject: header.

+

HMURL - Used to contain a link to + the Hypermail Development Center. Defined in hypermail.h.

+

PROGNAME - contains the name of the + executable. Defined in hypermail.h.

+

VERSION - contains the version of + the software that generated the page this appears on. Defined in + src/hypermail.h.

+
+

Choosing the Default Look of Your + Pages

+

There is no need to customize hypermail pages unless you have + a specific need. There are two different default page layouts + provided with hypermail, the Table Menu Display and the Standard + Display.

+

Standard Page HTML

+

If you are not using the HTML template files + described below then Hypermail generates headers and footers that + look similar to the following. Note that you can substitute a + <BODY> statement by either defining BODY + in options.h or by using the hm_body variable in + a list specific configuration file.

+

Index Page + Headers

+

The default Index page headers used in hypermail look + like:

+
+ <!DOCTYPE HTML PUBLIC "-//IETF//DTD + HTML//EN">
+ <HTML>
+ <HEAD>
+ <TITLE>labelindextype</TITLE>
+ + <LINK REV="made" + HREF="mailto:mailto-address">
+ </HEAD>
+ <BODY BGCOLOR="#ffffff" TEXT="#000000">
+ <H1 + ALIGN=CENTER>label<BR>By indextype</H1>
+ + <HR WIDTH=400>
+
+

Message Pages

+

The default Message page headers used in hypermail look + like:

+
+ <!DOCTYPE HTML PUBLIC "-//IETF//DTD + HTML//EN">
+ <HTML>
+ <HEAD>
+ <TITLE>label:  subject-of-message</TITLE>
+ + <LINK REV="made" + HREF="mailto:mailto-address">
+ </HEAD>
+ <BODY BGCOLOR="#ffffff" TEXT="#000000"> <H1 + ALIGN=CENTER>subject</H1>
+ <HR>
+
+

Page Footer

+

The default page footer, used on + both the index and the message pages, looks like:

+
+ <HR>
+ <P>
+ <SMALL>
+ <EM>
+ This archive was generated by  <A + HREF="HMURL">PROGNAME VERSION</A> on DATE and + TIME
+ </EM>
+ </SMALL>
+ </BODY>
+ </HTML>
+
+

Table Menu Display

+

The table menu display generates pages that have a menu bar at + the top and the bottom of the page that looks something like the + following. The "About" and "Other Archives" + links are displayed if you have enabled them in options.h, the + environment variables or the list configuration file. An + example page:

+

TITLE HERE

+ +

Kent Landfield (kent@landfield.com)
+ + Wed, 17 Jun 1998 22:28:29 -0500 (CDT)

+ +
+

BODY OF MESSAGE + HERE

+
+ + +

This archive was generated by hypermail 2.0x on Thu Jun 04 + 1998 - 10:05:34 CDT

+

If you do not want to use the table display then make sure + that USETABLE is not defined in options.h and + that hm_usetable is not enabled in the list + configuration file. When you do so you will get the standard look + and feel of the hypermail you have grown accustomed to.

+

Also note that New Message allows you to + provide a means for someone to post a message to the list. This + feature is currently only available on the Table Menu Display. It + can be enabled or disabled by defining HMAIL in + options.h or by setting the hm_hmail variable in + the list specific configuration file.

+
+

Using Template Files to Customize + Your Pages

+

You can customize your page headers and footers by specifying + HTML template files. Hypermail reads template files and uses + those in generating the header and footer sections of the index + and message pages. Template files contain the actual HTML that + you want used when generating the pages. Template files may also + contain "Substitution Cookies".

+

Substitution Cookies

+

You can insert "substitution cookies" in the header and footer + template files so the appropriate information is filled in at + runtime.

+

Substitution cookies supported:

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
%% - '%' character
%~ - Storage directory
%a - Other Archives URL
%b - About Archive URL
%e - Email address of message Author - Not valid on index + pages
%g - Date and time archive generated
%h - HMURL
%i - Message-id - Not valid on index pages
%l - Archive label
%m - Mailto address
%p - PROGNAME
%s - Subject of message or Index Title
%v - VERSION
%u - Expanded version link (HMURL, PROGNAME, VERSION)
\n - newline character
\t - tab character
+
+

Additional cookies generate the complete HTML + lines:

+
+ + + + + + + + + + + + + +
%A - Author META HTML - Not valid on index pages
%B - BODY HTML statement
%S - Subject META HTML
+
+
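The cookie tables above amount to a lookup from a two-character sequence to a replacement string. The following is a minimal Python sketch of that idea, not hypermail's actual C implementation. The function name `expand_cookies` and the `values` mapping are hypothetical, and leaving an unknown cookie untouched is an assumption about the behaviour rather than something the documentation states.

```python
def expand_cookies(template, values):
    """Expand %x cookies and the \\n, \\t escapes in a template string.

    `values` maps a cookie letter (for example "l" for the archive label)
    to its replacement text. Both names are illustrative only.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if ch == "%" and nxt:
            if nxt == "%":
                out.append("%")          # %% is a literal percent sign
            elif nxt in values:
                out.append(values[nxt])  # known cookie: substitute its value
            else:
                out.append(ch + nxt)     # unknown cookie: kept as-is (assumption)
            i += 2
        elif ch == "\\" and nxt in ("n", "t"):
            out.append("\n" if nxt == "n" else "\t")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = {"l": "wu-ftpd", "s": "By Thread"}
    print(expand_cookies("<title>%l: %s</title>\\n", sample), end="")
```

A real template would also need the message-only cookies (%e, %i, %A) to be absent from `values` when an index page is generated, since the table marks them as not valid there.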

Specifying template file + locations

+

You can specify the location of the template files either via + environment variables or via entries in the list specific + configuration file.

+

Using Environment + Variables

+

If you wish to use environment variables, make sure they are + exported and correctly available for hypermail to use. How to do + this is dependent on the type of shell in use. See the + appropriate man pages if you need more information.

+
    +
  • HM_IHTMLHEADERFILE - the location of the + INDEX header template.
  • +
  • HM_IHTMLFOOTERFILE - the location of the + INDEX footer template.
  • +
  • HM_MHTMLHEADERFILE - the location of the + MESSAGE header template.
  • +
  • HM_MHTMLFOOTERFILE - the location of the + MESSAGE footer template.
  • +
+

Using Configuration File + Entries

+

An easy way to tell hypermail where the template files are + is via a list specific configuration file. The following entries + can be used.

+
    +
  • hm_ihtmlheaderfile - the location of the + INDEX header template.
  • +
  • hm_ihtmlfooterfile - the location of the + INDEX footer template.
  • +
  • hm_mhtmlheaderfile - the location of the + MESSAGE header template.
  • +
  • hm_mhtmlfooterfile - the location of the + MESSAGE footer template.
  • +
+

Examples

+

It is acceptable to have a single template file listed in + more than one entry. Suppose you want to have a common footer for + all pages and separate headers. The following example shows + that.

+
+ hm_ihtmlheaderfile = /lists/wu-ftpd/wu-ftpd-index.hyp
+ hm_mhtmlheaderfile = /lists/wu-ftpd/wu-ftpd-msg.hyp
+ hm_ihtmlfooterfile = /lists/wu-ftpd/wu-ftpd-msgfooter.hyp
+ hm_mhtmlfooterfile = /lists/wu-ftpd/wu-ftpd-msgfooter.hyp +
+

If an entry is left blank and a location is not specified via + an environment variable then the hypermail default headers are + used.

+
+ hm_ihtmlheaderfile = /lists/wu-ftpd/wu-ftpd-index.hyp
+ hm_ihtmlfooterfile =
+ hm_mhtmlheaderfile = /lists/wu-ftpd/wu-ftpd-msg.hyp
+ hm_mhtmlfooterfile = /lists/wu-ftpd/wu-ftpd-msgfooter.hyp
+
The above example tells hypermail to use the template files listed for the Index header and the Message header and footer. The hypermail default page footer would be used on the index pages.

NOTE: While it is not necessary to provide absolute paths, it is a good idea to do so.

+

Message Pages

+

Each HTML file that is generated for a message contains (where + applicable):

+
    +
  • the subject of the article,
  • +
  • the name and email address of the sender,
  • +
  • the date the article was sent,
  • +
  • links to the next and previous messages in the + archive,
  • +
  • a link to the message the article is in reply to, and
  • +
  • a link to the message next in the current thread.
  • +
+
+

Including Reference + Links

+

Reference links such as the following

+ are normally included on each of the message pages. If this is not what you want, you can disable them with the SHOW_MESSAGE_LINKS define in options.h or the hm_show_msg_links setting in the list's configuration file.

Additionally, if you want to list all replies + to a message such as the following,

+ you can do so by setting the SHOWREPLIES define in options.h or the hm_showreplies setting in the list's configuration file.
+

In-lining Images

+

It is possible to have images that are sent in email + automatically displayed when the message is presented. To do this + you need to set the hm_inline_types in the list + configuration file.

+

For example, if you listed

+
+ hm_inline_types = image/gif image/jpeg +
then both GIF files and JPEG files would be + displayed as part of the message. Types that are not "in-lined" + are linked as a simple attachment requiring the user to click on + it to have it displayed. +
+

Changing The HTML File + Suffix

+

You may wish to have the generated pages use an HTML file suffix other than the default ".html". To do this, either set the default define HTMLSUFFIX in options.h, set the environment variable HM_HTMLSUFFIX, or set the hm_htmlsuffix variable in the list's configuration file.

+

Note: Do not include a "." in the suffix. If you do, you will end up with filenames that look like "..html".
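A small sketch of guarding against the leading-dot mistake when the suffix is set through the environment. The variable `suffix` and the value ".htm" are hypothetical and only serve the illustration; HM_HTMLSUFFIX is the variable named above.

```shell
# Hypothetical input: a suffix someone wrote with a leading dot by mistake.
suffix=".htm"

# ${suffix#.} removes a single leading "." if one is present.
HM_HTMLSUFFIX=${suffix#.}
export HM_HTMLSUFFIX

echo "$HM_HTMLSUFFIX"
```

A value that has no leading dot passes through the expansion unchanged.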

+
+

See Also

+
+ hypermail(1), hmrc(4), Hypermail, Hypermail List Configuration File, and Adding a Search Engine to your Hypermail Archive
+
+ Last updated April 10, 2003
+
+
diff --git a/docs/faq.html b/docs/faq.html
new file mode 100644
index 00000000..644cc595
--- /dev/null
+++ b/docs/faq.html
@@ -0,0 +1,499 @@
+ Hypermail Frequently Asked Questions
+
+

Hypermail Frequently Asked Questions

+
+

This is the beginning of the Hypermail FAQ. Don't despair that there is little here. That will change shortly.

+

Table of Contents

+
    +
  1. Why is the License Different ?
  2. Will hypermail run on my system ?
  3. What Happened to EIT ?
  4. Where in the World is Kevin Hughes ?
  5. What is the latest version of Hypermail ?
  6. Where can I get the latest version of Hypermail ?
  7. How can I split the archives into months ?
  8. How do I change the font on the pages ?
  9. What is HM_MAILTO used for ?
  10. Can I send a message to the list from the web archive ?
  11. Can I build and run this on Windows 98/2000/NT ?
  12. How can I remove a listserver subject prefix ?
  13. Which configuration file should I use ?
  14. Why is the downloaded file name different when I download hypermail.tar.gz ?
  15. Can I throttle hypermail's CPU usage ?
  16. How can I make my archive searchable ?
  17. How does hypermail decide whether messages are in the same thread?
  18. How can I have multiple mailboxes produce one archive?
  19. I have received an email with attachment that needs octet-stream. What should I do?
  20. Why is Hypermail ignoring some messages?
  21. Bogus dates are causing some emails to be put in a strange folder. Is there an easy way around this?
+
+

1. Why is the License Different + ?

+

Hewlett-Packard (now the legal owner of Hypermail, since EIT was bought by VeriFone, which was in turn bought by Hewlett-Packard) has put it under the GNU General Public License (GPL), a widely used "free software" license. This means that you can use and modify the source code as long as you make your changes publicly available and do not charge for their use (although, under the GPL, you can charge for their distribution). More details are available at http://www.fsf.org/ and http://www.fsf.org/copyleft/gpl.html.

+
+ +

2. Will hypermail run on my + system ?

+

Hypermail can compile and run as-is on most of the more + popular UNIX systems today. If you're not sure if hypermail will + run on your UNIX system, try compiling it and see!

+

Hypermail has been reported to work on MacOSX (X.2.6), with + the advice to use --disable-shared and manually execute make in + src/pcre.

+

People have indeed ported hypermail to DOS, Windows 95, and + Windows NT (but not Java ...yet). Installation advice for some + Windows systems is available at win32.html.

+
+ +

3. What Happened to EIT + ?

+

A very old and established government contractor company + called Electronic Instrumentation and Technology Inc. made legal + moves to obtain the eit.com domain. Since VeriFone/HP had no + interest in keeping EIT, they dissolved it completely some months + ago. This company had a trademark on EIT so the domain name was + given to them. (Thank you InterNic...)

+
+ +

4. Where in the World is Kevin + Hughes ?

+

Kevin has not dropped off the face of the earth but is + extremely busy these days as a Hypermedia Engineer at Veo + Systems.

+
+ Kevin Hughes
+ kev@kevcom.com
+ kevinh@veosys.com
+ (650) 858-7710 (office number)
+ (650) 858-4925 (fax)
+ www.veosystems.com +
+
+ +

5. What is the latest version of + Hypermail ?

+

The latest stable version of Hypermail is Version 2.1.9. It is + available from http://www.hypermail-project.org

+
+ +

6. Where can I get the latest + version of Hypermail ?

+

The latest version of Hypermail is available from the + Hypermail Development Center at SourceForge + and/or at

+

http://www.hypermail-project.org/

It + and past versions are also available via FTP from +

ftp://ftp.hypermail.org/hypermail/

+

The cvs server at :pserver:cvs@cvs.hypermail.org:/CVS + usually has a more recent but less tested version.

+
+

7. How can I split the archives + into months ?

+

I have a fat archive that I'd like to split up by month, + like I see so many others doing. Is there a description somewhere + of a procedure to follow, or will I just have to think it + through?

+

If you are using version 2.1.0 or later and either have gdbm + or don't do incremental updates, see the folder_by_date option. The simple + way to use this is to add these to your .hmrc:

+
+ folder_by_date="%y%m"
+ usegdbm = 1 +
and regenerate your archive from its mbox file. +

An older alternative is a set of scripts in a directory named + archive/ that has tools to do just + what you want to do.

+

If you want index files split by month but don't need to split + the archive into multiple directories, try adding + "monthly_index=1" in your config file (usually ~/.hmrc) + (available in version 2.1). A summary.html file will provide + links to all the monthly indices. This is probably appropriate + for archives with a few thousand messages, but for larger + archives I recommend splitting into multiple directories.

+

There's a script at http://users.netrus.net/troc/perl.html + called mms (monthly mail splitter) which has also been reported + useful.

+
+ +

8. How do I change the font on + the pages ?

+

What I'd like to do is change the font of the archives + pages. I tried doing this by adding a <FONT FACE=...> tag + in the header.hyp file and a <FONT> tag in the footer file, + but it didn't work. Is there something in the program itself + that's preventing me from making this change?

+

Yes and no. Let me guess... You have hm_usetable = 1. The code for tables does not inherit the FONT values, so they need to be set in the <TH..> tags. If tables are not used it works as expected.

+

To test it put a

+
+ <FONT SIZE="1" + FACE="GENEVA,ARIAL,HELVETICA"> +
in the test-index.hyp and test-msg.hyp. In + test-footer.hyp put +
+ </FONT> +
+

With hm_usetable = 1 in the test.rc file, I then ran "hypermail -c test.rc -m testmail" and everything between the "menus" was in the right FONT but the menus were not.

+

Then edit the test.rc file and set hm_usetable = 0. Next remove the existing testdir and rerun hypermail. This time the FONT works as expected.

+

Is this what is happening to you ? If so the code will need to + be modified.

+
+ +

9. What is HM_MAILTO used for + ?

+

I've enabled this option in .hyprc (the configuration file + to which the pipe script for my archive points). Unlike the other + options, this one does not create a link to + mailto:admin@domain.com as I would expect. Any ideas?

+

HM_MAILTO has a couple of different uses. One is to trigger the insertion of the <LINK REV=made HREF=mailto:...> header in the HTML sources. This is most useful with text-mode browsers such as lynx.

+
+ hm_mailto = [ email-address | NONE ]
+ #
+ # The address of the contact point that is put in the HTML + header line
+ # <LINK REV=made HREF=mailto:hm_mailto>
+ #
+ # The <LINK...> header can be disabled by setting
+ # mailto to NONE.
+
+

It can also be used in a hypermail page template file, where the %m cookie expands to its value.

+

For example, your footer file might look like...

+

<P ALIGN=CENTER><IMG + SRC="/images/bar.png" WIDTH="400" HEIGHT="4" + ALT="---------"></P>
+ <ADDRESS>
+ <EM>
+ <SMALL>
+ This archive was generated by %u on %g
+ <P ALIGN=CENTER>
+ Send administrative comments to <A HREF="mailto:%m">%m</A>
+ </P>
+ </SMALL>
+ </EM>
+ </ADDRESS>
+ </BODY>
+ </HTML>

+

In this case the %m is expanded to admin@domain.com.

+
+ +

10. Can I send a message to the + list from the web archive ?

+

I would like to add a link to the menu that says "Send a + Message to the List." I read through the documentation on your + website, and I think I know what I have to do. However, I cannot + find an example of what a .hyp template file should look + like.

+

That's what "Respond" on the menubar does. It allows a person + to reply to the existing message, with the reply sent to the list + address. And "mail a new topic" on the menubar allows a user to + send a new message to the list.

+

To enable this feature set

+


+ hm_hmail = listaddr@your-site.domain

+
+ +

11. Can I build and run this on Windows 98/2000/NT ?

+

I would like to use Hypermail on my Windows box. Can I ? + If so how do I build the latest version ?

+

There is a complete description on how to build hypermail on a + Windows system at win32.html. This was + written by Bob Crispen <bob.crispen@boeing.com>. (Thanks + Bob!)

+
+ +

12. How can I remove a listserver subject prefix ?

+

The Subject index page does not seem to do the right + sorting. My list has a "subject prefix" so people can filter + their inbound mail better. Is that confusing the sorting + ?

+

Yes. Hypermail functions that deal with replies look for the + Re: and return a pointer to the subject line after the "Re: ". + The listserver software is prefixing the subject line contents + with [subject-prefix]. For example:

+

Initial message:

+
+ Subject: [subject-prefix] the subject of the + message +
+

Replies:

+
+ Subject: [subject-prefix] Re: the subject of the + message +
+

The best solution is to use the directive

+
+ stripsubject = [subject-prefix] +
in the config file for that list. Then + [subject-prefix] is stripped before the Reply processing occurs + and the proper things happen. This is the proper fix since + hypermail would need to know to strip the [subject-prefix] from + the initial messages to get it right. +
+ +

13. Which configuration file should I use ?

+

Which configuration file should I use ? Is it the hypermail.rc file in the Configs directory? I'm trying to edit the appearance of pages created by hypermail, but I think I'm trying to edit the wrong config file.

+

With one of the later versions you can use the -v option to generate a config file that you can use. There is not just one config file; the ones included in the distribution are examples.

+
+ $ hypermail -v > + some-file.rc +
Edit the some-file.rc and set the values to what you + want. +

Then run hypermail using the config file by:

+
+ $ hypermail -c some-file.rc
+
+ +

14. Why is the downloaded file name different when I download hypermail.tar.gz ?

+

The source is available from

+
+ www.hypermail.org/hypermail.tar.gz +
However: When downloading through the IE v4 browser on an NT PC, you get a file called hypermail_tar.tar.

Depending on your browser MIME type setup and support + applications, the filename downloaded might be hypermail.tar, + hypermail.tar.gz, etc. In any case, you will be informed as to + the filename when you actually download it onto a Windows/NT PC. + Use that name from that point forward.

+
+ +

15. Can I throttle hypermail's CPU usage ?

+

I'm looking for a good way to throttle hypermail. When it runs it is taking over both CPUs on my system. Is there a way to have it limit itself?

+

There is currently no "nice" call in hypermail to limit + itself. Maybe we can consider it in the future. You might run it + as a subcommand to nice. For example in an aliases file:

+
+ "|nice someval 'hypermail command + line'" +
I've NEVER tried this so... ;) +

You might also be looking at a very large archive with lots of + messages being the cause. If so then try breaking it up into + smaller archives such as monthly archives. Tools are available + for that in the latest release.

+

And if the list is a high traffic list, consider not running it from a sendmail alias and instead running it from cron. (To ensure it does not run unnecessarily, consider using 'make' to check dependencies on the inbound mailbox and the archive itself. If the mailbox is newer, then run the hypermail command. If not, then no new messages have been received, so don't bother running it.)
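The same "only run when the mailbox is newer" check can be sketched in plain shell instead of make. This is an untested-against-hypermail sketch under stated assumptions: the function names needs_update and update_archive are hypothetical, the archive index is assumed to be a file such as index.html inside the archive directory, and the archive directory itself is assumed to be set in the config file passed with -c. The -nt test is not strictly POSIX but is supported by the common sh implementations.

```shell
# needs_update MBOX INDEX
# Succeeds when INDEX does not exist yet or MBOX is newer than INDEX.
needs_update() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# update_archive MBOX INDEX CONFIG
# Runs hypermail at reduced priority, but only when there is new mail.
# Uses only the -u, -c and -m options described in this FAQ.
update_archive() {
    if needs_update "$1" "$2"; then
        nice hypermail -u -c "$3" -m "$1"
    fi
}
```

A cron job would then call something like `update_archive /path/to/mbox /path/to/archive/index.html /path/to/list.rc` from a small wrapper script, with all three paths replaced by your own.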

+

If you are using the linkquotes option and incremental update (-u option), add a searchbackmsgnum line to your config file. It controls the tradeoff between speed and the reliability of finding the right source for quoted text. Try to set it to the largest number of messages between a message and the final direct reply to that message.

+
+ +

16. How can I make my archive searchable ?

+

Where can I find a program that provides an easy way of + adding a "search this archive" feature to my hypermail + archive?

+

These seem to be the most popular choices:

+ Swish-E comes with a script (index_hypermail.pl) that is + customized for indexing Hypermail archives. index_hypermail.txt has a description + of how to use it. +

You can also use search engines by putting text such as this + on your web page (although this search produces results from an + entire site, not just the email archive):
+ <form action=http://www.google.com/search>
+ <input type=text name=as_q>
+ <input type=hidden name=as_sitesearch + value=hypermail-project.org>
+ <input type=submit value="Search Hypermail">
+ </form>

+

You can probably get more complete (but apparently not recent) + info at: Search + Engine Software For Your Web Site.

+
+ +

17. How does hypermail decide whether messages are in the same thread?

+

It uses the In-Reply-To: header if that is available. If not, + it uses the References: header if that is available. If these are + not available, it looks for previous messages with the same + subject header. Matches based on the subject are listed as + "maybe" replies.

+

If the linkquotes option is used, it will also search the + message bodies of the previous searchbackmsgnum (default = 500) + messages for text that looks like it is being quoted in the + current message. If it finds one or more prior messages with such + text matches, it will treat the one with the longest match as the + message being replied to. The exact algorithms for deciding what + is quoted text are a bit complex. These matches override the + In-Reply-To: and References: info (this rule may deserve further + thought).

+
+ +

18. How can I have multiple mailboxes produce one archive?

+

One simple way is to combine them into one mailbox, and send + that to hypermail. Something like

+
+ "cat file1 file2 file3 > combined" +
should do that. +

Another way is to invoke hypermail several times, using the -u + flag for any mailboxes that you want added after the html archive + has been started:

+
+ hypermail -m file1
+ hypermail -u -m file2
+ hypermail -u -m file3
+
+

Do not try to send hypermail more than one mailbox at a time, + or send it a second mailbox without using the -u flag or the + increment=1 or increment=-1 option.
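The rule above (first mailbox without -u, every later mailbox with -u, one mailbox per invocation) can be wrapped in a small shell function. This is only a sketch: archive_mailboxes and the HYPERMAIL override variable are hypothetical names introduced for illustration, and any other options you need (such as -c) would have to be added to both calls.

```shell
# Command to run; override for a dry run, e.g. HYPERMAIL=echo.
HYPERMAIL=${HYPERMAIL:-hypermail}

# archive_mailboxes MBOX...
# Feeds the mailboxes to hypermail one at a time, in the order given.
archive_mailboxes() {
    first=yes
    for mbox in "$@"; do
        if [ "$first" = yes ]; then
            # The first mailbox starts the archive.
            "$HYPERMAIL" -m "$mbox" || return 1
            first=no
        else
            # Later mailboxes are added incrementally with -u.
            "$HYPERMAIL" -u -m "$mbox" || return 1
        fi
    done
}
```

Setting `HYPERMAIL=echo` before calling `archive_mailboxes file1 file2 file3` prints the argument lists that would be passed, without touching any archive.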

+
+ +

19. I have received an email with attachment that needs octet-stream. What should I do?

+

No attachment "needs" octet-stream. octet-stream is a label that describes the attachment as being "a stream of octets". What is that, you say? Well, the application didn't know any more specifics and thus it identified the attachment as well as it could. It knows it is a stream of octets, nothing else.

+

So, an attachment coming as "octet-stream" can be pretty much ANYTHING. You have no idea, and neither does anyone else. The only way to figure it out is to download the file and see for yourself, ask the person who mailed it, or hope that the mail it came with describes what the attachment was about.

+

"octet-stream" could just as well be named "I haven't got the + slightest idea what this is, but I know it is built up with a + series of bytes".

+
+ +

20. Why is Hypermail ignoring some messages?

+

One possibility is that those messages have a header + saying

+
+ X-No-Archive: yes +
which seems to indicate that the author doesn't want them included in any archive. This is controlled by a configuration option which defaults to:
+ deleted = "X-Hypermail-Deleted X-No-Archive" +
To get Hypermail to treat these as normal messages, + add this to your config file (which is ~/.hmrc by default) or + alter the "deleted" line in your existing config file to: +
+ deleted = "X-Hypermail-Deleted" +
+
+ +

21. Bogus dates are causing some emails to be put in a strange folder. Is there an easy way around this?

+

If the bad dates were caused by a computer with its date set absurdly, try running mailbox_date_trimmer.py, available in Hypermail's contrib directory.

+

If the bad dates are in a format that Hypermail doesn't + understand, then you will probably need to edit them + manually.

+
+ +
+
+
+ Do you have a Hypermail related question you'd like + to see listed here ? If so send mail to the Hypermail mailing + list hypermail@hypermail-project.org +   (preferably with the answers). In order to + minimize spam on the list, you must subscribe to the list (at + least temporarily) in order to send mail to it. You may + subscribe to the list by sending a message with the word + "subscribe" in the Subject: field to + hypermail-request@hypermail-project.org. +
+
+

 

+
+
diff --git a/docs/hmrc.4 b/docs/hmrc.4
index 8a79e180..736192a1 100644
--- a/docs/hmrc.4
+++ b/docs/hmrc.4
@@ -184,6 +184,15 @@ for the valid conversion specifications.
 A string to be stripped from all subject lines. Helps unclutter
 mailing lists which add tags to subject lines.
 .TP
+.B archive_date = boolean_number
+Adds a specific line in the indexes giving the date the archive
+was generated. Disabled by default.
+.TP
+.B hypermail_colophon = boolean_number
+Adds a footer line to messages and indexes stating that the archive
+was generated by hypermail, the version, and the generation date.
+Enabled by default.
+.TP
 .B archives = "URL"
 This will create a link in the archived index pages labeled
 .I "Other mail archives"
@@ -236,15 +245,6 @@ experimental
 .B X-Robots-Tag
 HTTP header. For more information, browse https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
 .TP
-.B indextable = boolean_number
-Setting this variable to
-.B 1
-will tell Hypermail to generate an message index
-Subject/Author/Date listings using a table format.
-Set to
-.B 0
-if you want the standard Hypermail index page look and feel.
-.TP
 .B reverse = boolean_number
 Setting this variable to
 .B 1
@@ -286,6 +286,11 @@ will word wrap. This only takes effect if
 is enabled.
 .TP
 .B iquotes = boolean_number
+.B NOTE:
+This option has been deprecated as markup allows you to replicate this
+behavior using the CSS
+.B .quote
+rule.
 Set this to
 .B 1
 to italicize quoted lines.
@@ -477,6 +482,11 @@ for further discussion on the
 .B noindex
 metadata value.
 .TP
+.B empty_archive_notice = "string"
+Message that hypermail should display in the indices if the archive
+is empty (e.g, no messages, or all messages marked as deleted or spam),
+If not set, hypermail will use a default localized message.
+.TP
 .B base_url = "url"
 The url of the archive's main directory. This is needed when the
 latest_folder option is used and the folder_by_date makes
@@ -543,7 +553,7 @@ directives.
 .B Robot annotations
 instruct a visiting web robot if the contents of a message should be
 indexed and/or if the outgoing links from the message should be followed, doing so thru
-a specific HTML meta tag. (browse http://www.robotstxt.org/ for further
+a specific HTML meta tag. (browse https://www.robotstxt.org/ for further
 details).
 .B Robot annotations
@@ -690,7 +700,7 @@ a file for each thread that contains all the messages in that thread.
 .TP
 .B href_detection = boolean_number
 Set this to On to assume that any string on the body of the message
-that says is a URL, together with its markup and treat it as such.
 .TP
 .B mbox_shortened = boolean_number
@@ -763,6 +773,16 @@ With a setting of 2, attachment creation information is listed individually
 with the number of the message the attachments relate to.
 This is written to stdout.
 .TP
+.B warn_deprecated_options = [ 0 | 1 ]
+Set this to
+.B 1
+if you want hypermail to display warning messages if you
+are using deprecated or planned to deprecate configuration options.
+Set it to
+.B 0
+to hide those warnings. This option is enabled by default
+This warning is written to stdout.
+.TP
 .B thrdlevels = number
 This specifies the number of thread levels to outline in the thread index. For instance, if
 .B thrdlevels
@@ -821,6 +841,21 @@ and
 .B statement.
 .TP
+.B mhtmlnavbar2upfile = path
+Define path as the path to a template file containing valid HTML formatting
+statements you wish to be included as information on each archived message,
+giving links to the hierarchy of your archive. By default uses the
+value of
+.B ihtmlnavbar2upfile
+If
+.B mhtmlnavbar2upfile
+and
+.B ihtmlnavbar2upfile
+are undefined, hypermail will generate a generic breadcrumb
+that uses the archive's
+.B label
+and links back to the archive's default index file.
+.TP
 .B hmail = Mailing_List_Submission_Address
 Set this to the list's submission address. When enabled, this can be used
 to submit a new message to the list served by the hypermail archive.
@@ -857,25 +892,36 @@ header. The variable
 is used to specify where the
 .B Message-Identifier
 value will appear in the link. A possible command one could use is
-.B http://example.org/mid-resolver/$ID.
+.B https://example.org/mid-resolver/$ID.
 This option is
 .B disabled
 by default.
 .TP
+.B default_css_url = "URL"
+This option points to an external stylesheet that will be used
+for indexes and messages if either
+.B icss_url
+or
+.B mcss_url
+are not configured.
+
+By default this option is the relative URL
+.B hypermail.css
+.TP
 .B icss_url = "URL"
 This will link an external stylesheet found at the given URL to the
 index files. This will happen thru a
-.B LINK
+.B link
 element in the index document's
-.B HEAD.
+.B head.
 By default this option is disabled.
 .TP
 .B mcss_url = "URL"
 This will link an external stylesheet found at the given URL to the
 message files. This will happen thru a
-.B LINK
+.B link
 element in the message document's
-.B HEAD.
+.B head.
 By default this option is disabled.
 .TP
 .B show_headers = list_of_RFC_Headers_to_display
@@ -939,6 +985,14 @@ do anything with. They are quietly ignored. They can be
 listed individually on multiple lines or comma or space separated on a single
 line.
 .TP
+.B ignore_content_disposition = types_of_MIME_attachments
+This is the list of MIME attachment types for which you wish to ignore any
+associated
+.B Content-Disposition
+header. They are quietly ignored. They can be
+listed individually on multiple lines or comma or space separated on a single
+line.
+.TP
 .B applemail_mimehack = [ 0 | 1]
 In a multipart/alternative message, Apple Mail (as of June/2018) is only
 adding attachments to the text/html related part. Set this
@@ -957,6 +1011,20 @@ option.
 .B Disabled
 by default.
 .TP
+.B max_attachments_per_msg = positive integer
+This specifies the maximum number of attachments that will be
+processed for a message. Any attachments beyond this limit
+will be ignored.
+Set to
+.B 0
+to remove this limit.
+Note that if you remove this limit or set it too high,
+memory usage may increase when parsing a message
+made up of hundreds of attachments.
+Set to
+.B 200
+by default.
+.TP
 .B searchbackmsgnum = postive integer
 If the linkquotes option is on and an incremental update is being
 done (-u option), this controls the tradeoff between speed and
@@ -1002,8 +1070,6 @@ recommended for archives with over a few hundred messages.
 Setting this greater than 1 will produce multiple levels of files for
 each thread whose replies are nested by more than 1 level,
 but that is rarely useful. This option is currently disabled
-if the indextable option is turned on, and probably needs to
-be less than thrdlevels.
 .LP
 .SH HTML TEMPLATE FILE SUBSTITUTION COOKIES
 .LP
diff --git a/docs/hmrc.html b/docs/hmrc.html
index 4953123b..211f9fa7 100644
--- a/docs/hmrc.html
+++ b/docs/hmrc.html
@@ -1,28 +1,30 @@
-
-
-
-
-
-Hypermail - hmrc list configuration
-
-
-
-

   Hypermail List + + + + + + + Hypermail - hmrc list configuration + + + + +
+

image used to illustrate hypermail in action   Hypermail List Configuration File

-
+

The hypermail list configuration file is used to specify list specific or user specific information to hypermail. Comments are denoted by the '#' character at the beginning of the line. The file to use can be specified via the -c command line argument. The default file is .hmrc in the user's home directory.

-

Examples listed on this page are shown in this style. The -default value is shown unless otherwise indicated. Off is -equivalent to 0, and On is equivalent to 1 for options which -are either on or off.

+

Examples listed on this page are shown in this style. The +default value is shown unless otherwise indicated. Off is +equivalent to 0, and On is equivalent to 1 for options which + are either on or off.

+
+
+
+
+

Options affecting both messages and index pages

-

Locale

+

Locale

-
-
language = [ two-or-more-letter-language-id -]
+
language = [ two-or-more-letter-language-id +]
This is a two-letter string specifying the default language to use, or a longer string specifying a language and locale. Set this to the value of the language table you wish to use when running and generating archives. See also iso2022jp
-
-Current supported languages, with their default locales:
-
-de (de_DE) - German
-en (en_US) - English
-es (es_ES) - Spanish
-fi (fi_FI) - Finnish
-fr (fr_FR) - French
-el (el) - Greek
-gr (el_GR) - Greek
-is (is_IS) - Icelandic
-no (no_NO) - Norwegian
-pl (pl_PL) - Polish
-pt (pt_BR) - Brazilian Portuguese
-ru (ru_RU) - Russian
-sv (sv_SE) - Swedish
-
+and eurodate.
+
+Current supported languages, with their default locales:
+
+de (de_DE) - German
+en (en_US) - English
+es (es_ES) - Spanish
+fi (fi_FI) - Finnish
+fr (fr_FR) - French
+el (el) - Greek
+gr (el_GR) - Greek
+is (is_IS) - Icelandic
+no (no_NO) - Norwegian
+pl (pl_PL) - Polish
+pt (pt_BR) - Brazilian Portuguese
+ru (ru_RU) - Russian
+sv (sv_SE) - Swedish
+
The directory /usr/share/i18n/locales on many systems has the -locale codes that are available on that system.
-
-language = en
-
-
iso2022jp = [ 0 | 1 ]
-
Set this to On to support ISO-2022-JP messages.
-
-iso2022jp = 0
-
-
i18n = [ 0 | 1 ]
+locale codes that are available on that system.
+
+language = en +
iso2022jp = [ 0 | 1 ]
+
Set this to On to support ISO-2022-JP messages.
+
+iso2022jp = 0
+
i18n = [ 0 | 1 ]
Enable I18N features, hypermail must be linked with libiconv.
-
-i18n = 1 (disabled by default)
-
-
eurodate = [ 0 | 1 ]
+"https://www.gnu.org/software/libiconv/">libiconv.
+
+i18n = 1 (disabled by default) +
eurodate = [ 0 | 1 ]
Set this to reflect how you want dates displayed in the index -files.
-Set as 1 to to use European date format "DD MM YYYY".
-Define as 0 to to use American date format "MM DD YYYY".
-
-eurodate = 0
-
-
dateformat = strftime-date-format
-
Format used in strftime(3) call for displaying dates.
-See strftime(3) for the valid conversion specifications.
-
-dateformat = "%D-%r Z" (disabled by default)
-
-
isodate = [ 0 | 1 ]
+files.
+Set as 1 to use European date format "DD MM YYYY".
+Define as 0 to use American date format "MM DD YYYY".
+
+eurodate = 0 +
dateformat = strftime-date-format
+
Format used in strftime(3) call for displaying dates.
+See strftime(3) for the valid conversion specifications.
+
+dateformat = "%D-%r Z" (disabled by default)
+
isodate = [ 0 | 1 ]
Set this to On to display article received dates in YYYY-MM-DD HH:MM:SS format. If used with the gmtime option, a Z will be -inserted between the DD and HH.
-
-isodate = 0
-
-
gmtime = [ 0 | 1 ]
+inserted between the DD and HH.
+
+isodate = 0 +
gmtime = [ 0 | 1 ]
Set this to On to display article received dates using -Greenwich Mean Time (UTC) rather than local time.
-
-gmtime = 0
-
-

Header options

-
-
label = [ Title | NONE ]
-
This is the default title you want to call your archives.
-Set this to NONE to use the name of the input mailbox.
-
-label = Hypermail Development List (default value is +Greenwich Mean Time (UTC) rather than local time.
+
+gmtime = 0
+
+

Header options

+
+
label = [ Title | NONE ]
+
This is the default title you want to call your archives.
+Set this to NONE to use the name of the input mailbox.
+
+label = Hypermail Development List (default value is filename?????)
-
-
hmail = [ Mailing List Submission Address | NONE -]
+
hmail = [ Mailing List Submission Address | NONE +]
Set this to the list's submission address. When enabled, this can be used to submit a new message to the list served by the -hypermail archive. "NONE" means don't use it.
-
-hmail = hypermail@hypermail.org (disabled by default)
-
-
newmsg_command = [ string ]
+hypermail archive. "NONE" means don't use it.
+
+hmail = hypermail@hypermail.org (disabled by default) +
newmsg_command = [ string ]
This specifies the mail command to use when converting the set_hmail address to links in replies. The variables $TO, $SUBJECT, -and $ID can be used in constructing the command string.
-
-newmsg_command=mailto:$TO
-
-
replymsg_command = [ string ]
+and $ID can be used in constructing the command string.
+
+newmsg_command=mailto:$TO +
replymsg_command = [ string ]
This specifies the mail command to use when converting the set_hmail address to links in replies. The variables $TO, $SUBJECT, and $ID can be used in constructing the command string. The value from the mailcommand option will be used -if this option is not specified.
+if this option is not specified.
There may be browsers that will benefit from adding something like
%26In-Reply-To=&lt;$ID&gt;
to the command, but I've heard no reports of this actually -working.
-
-replymsg_command=mailto:$TO?Subject=$SUBJECT
-
-
inreplyto_command = [ string ]
+working.
+
+replymsg_command=mailto:$TO?Subject=$SUBJECT +
inreplyto_command = [ string ]
This specifies a URI template to a script that hypermail will link to if it's unable to find in the archive's messages the MID corresponding to an In-Reply-To header. The variable $ID is used to specify where the Message-Identifier -value will appear in the link.
-
-inreplyto_command = http://example.org/mid-resolver/$ID. +value will appear in the link.
+
+inreplyto_command = https://example.org/mid-resolver/$ID. (disabled by default)
-
-

Miscellaneous

-
-
stripsubject = [ string | NONE ]
+
+

Miscellaneous

+
+
archive_date = [ 0 | 1 ]
+
adds a specific line in the indexes giving the date the archive was generated.
+
+ archive_date = On (disabled by default)
+
hypermail_colophon = [ 0 | 1 ]
+
adds a footnote stating the hypermail version that was used and generation date.
+
+ hypermail_colophon = On (enabled by default)
+
stripsubject = [ string | NONE ]
A string to be stripped from all subject lines. Helps unclutter -mailing lists which add tags to subject lines.
-
-stripsubject = NONE
-
-
mailcommand = [ direct mailto | cgi-bin script path | -NONE ]
+mailing lists which add tags to subject lines.
+
+stripsubject = NONE +
mailcommand = [ direct mailto | cgi-bin script path | +NONE ]
This is the mail command that email links go to, for instance "mailto:$TO" or -"/cgi-bin/mail?to=$TO&replyto=$ID&subject=$SUBJECT"
-In constructing this command, you can specify variables:
-
-$TO : the email address of the person you're sending mail to.
-$ID : the ID of the message you're replying to.
-$SUBJECT: the subject you're replying to.
-
-NONE disables mailcommand usage.
+"/cgi-bin/mail?to=$TO&replyto=$ID&subject=$SUBJECT"
+In constructing this command, you can specify variables:
+
+$TO : the email address of the person you're sending mail to.
+$ID : the ID of the message you're replying to.
+$SUBJECT: the subject you're replying to.
+
+NONE disables mailcommand usage.
There may be browsers that will benefit from adding something like
%26In-Reply-To=&lt;$ID&gt;
to the command, but I've heard no reports of this actually -working.
-
-mailcommand = mailto:$TO?Subject=$SUBJECT
-
-
mailto = [ email-address | NONE ]
+working.
+
+mailcommand = mailto:$TO?Subject=$SUBJECT +
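The variable substitution described for mailcommand can be sketched as follows. This is a minimal Python illustration, not hypermail's implementation: the function name expand_mail_command is hypothetical, and percent-encoding the substituted values is an assumption made here so that the result is a usable link.

```python
from string import Template
from urllib.parse import quote


def expand_mail_command(template: str, to: str, msg_id: str, subject: str) -> str:
    """Fill $TO, $ID and $SUBJECT in a mailcommand-style template."""
    values = {
        "TO": quote(to, safe="@"),        # keep the @ readable in the address
        "ID": quote(msg_id, safe=""),     # message IDs contain <, > and @
        "SUBJECT": quote(subject, safe=""),
    }
    # safe_substitute leaves any unknown $NAME untouched instead of raising.
    return Template(template).safe_substitute(values)


print(expand_mail_command("mailto:$TO?Subject=$SUBJECT",
                          "list@example.org", "<123@example.org>", "Re: hello"))
```

The same function works for the cgi-bin form of the command, since only the three documented variables are replaced.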
mailto = [ email-address | NONE ]
The address of the contact point that is put in the HTML header -line
-<LINK REV=made HREF=mailto:mailto>
-
+line
+<LINK REV=made HREF=mailto:mailto>
+
The <LINK...> header can be disabled by default by setting -mailto to NONE.
-
-mailto = webmaster@hypermail.org (disabled by default)
-
-
domainaddr = [ domainname | NONE ]
+mailto to NONE.
+
+mailto = webmaster@hypermail.org (disabled by default) +
domainaddr = [ domainname | NONE ]
Domain-ize Addresses -- addresses appearing in the RFC2822 field which lack hostname can't be made into proper HREFs. Because the MTA resides on the same host as the list, it is often not required to domain-ize these addresses for delivery. In such cases, -hypermail will add the DOMAINADDR to the email address.
-
-domainaddr = hypermail.org (disabled by default)
-
-
use_sender_date = [ 0 | 1 ]
+hypermail will add the DOMAINADDR to the email address.
+
+domainaddr = hypermail.org (disabled by default) +
use_sender_date = [ 0 | 1 ]
Set this to On to have it use the Date: header (created by the system that sent the message) rather than the date/time the message was received, for purposes such as putting in folders or sorting. Details of which purposes this affects may change in the -future.
-
-use_sender_date = 0
-
-
fragment_prefix = [ preifx ]
+future.
+
+use_sender_date = 0 +
fragment_prefix = [ prefix ]
Put this string before the message number in each URI -fragment.
-
-fragment_prefix = id (default is msg)
-
-
email_address_obfuscation = [ 0 | 1 ]
+fragment.
+
+fragment_prefix = id (default is msg) +
email_address_obfuscation = [ 0 | 1 ]
Set to 1 to enable email address obfuscation using numeric -character references.
-
-mail_address_obfuscationx = 1 (disabled by default)
-
-

Index page options

-

Index + character references.
+
+ email_address_obfuscation = 1 (disabled by default)
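As an illustration of what obfuscation with numeric character references looks like, here is a short Python sketch. It encodes every character of the address, which is an assumption; the text does not say which characters hypermail encodes, and the function name obfuscate is hypothetical.

```python
import html


def obfuscate(address: str) -> str:
    """Encode each character as a decimal numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = obfuscate("a@b.org")
print(encoded)
# A browser decodes the references, so readers still see the plain address.
print(html.unescape(encoded))
```

The encoded form renders normally in a browser but no longer matches a naive search for text containing "@".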

+
default_css_url = [ URL | NONE ]
+
Gives the URL for the default CSS file used by hypermail if either the icss_url or mcss_url configuration options are not declared. The stylesheet will be linked to through a link element in the document's head. +
+ The default value is hypermail.css (link relative to the archive).
+
+ default_css_url = https://example.org/StyleSheets/my_hypermail.css +
+
+
+ +
+

Index page options

+

Index availability

- -
folder_by_date = [ strftime-date-format ]
+
+
folder_by_date = [ strftime-date-format ]
This string causes the messages to be put in subdirectories by date. The string will be passed to strftime(3) to generate subdirectory names based on message dates. Suggested values are @@ -511,131 +527,116 @@

Index "%G/%V" for weekly. Do not alter this for an existing archive without removing the old html files. If you use this and update the archive incrementally (e.g. with -u), you must use the usegdbm option.
-
-folder_by_date = %y%m (disabled by default)

-
-
monthly_index = [ 0 | 1 ]
+"#usegdbm">usegdbm option.
+
+folder_by_date = %y%m (disabled by default) +
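The effect of a folder_by_date value can be previewed with a small Python sketch. The message date is invented, and Python's strftime is used here as a stand-in for strftime(3). The "%b %Y" string is the sample describe_folder value given elsewhere in this document, and its month name depends on the locale.

```python
from datetime import date

# Invented message date, used only to show the resulting names.
message_date = date(2018, 6, 15)

monthly_folder = message_date.strftime("%y%m")   # two-digit year and month
weekly_folder = message_date.strftime("%G/%V")   # ISO year / ISO week number
folder_label = message_date.strftime("%b %Y")    # label in the describe_folder style

print(monthly_folder, weekly_folder, folder_label)
```

Because the format string determines the directory names, changing it on an existing archive would leave the old directories behind, which is why the text asks for the old html files to be removed first.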
monthly_index = [ 0 | 1 ]
Set this to On to create additional index files broken up by month. A summary.html file will provide links to all the monthly -indices.
-
-monthly_index = 0
-
-
msgsperfolder = integer
+indices.
+
+monthly_index = 0 +
msgsperfolder = integer
Put messages in subdirectories with this many messages per directory. Do not use this and folder_by_date on the same archive. Do not alter this for an existing archive without removing the old html files. Deleted/expired messages are counted for the purpose of deciding how many messages to put in a -subdirectory.
-
-msgsperfolder = 100 (disabled by default)
-
-
yearly_index = [ 0 | 1 ]
+subdirectory.
+
+msgsperfolder = 100 (disabled by default) +
yearly_index = [ 0 | 1 ]
Set this to On to create additional index files broken up by year. A summary.html file will provide links to all the yearly -indices.
-
-yearly_index = 0
-
-
defaultindex = [ thread | date | subject | author | -attachment ]
+indices.
+
+yearly_index = 0 +
defaultindex = [ thread | date | subject | author | +attachment ]
This indicates the default type of main index hypermail will generate. Users see this type of index when the archive is first accessed. When using the folder_by_date or msgsperfolder options, this option applies to -subdirectories.
-
-defaultindex = thread
-
-
default_top_index = [ folders | thread | date | subject -| author | attachment ]
+subdirectories.
+
+defaultindex = thread +
default_top_index = [ folders | thread | date | subject +| author | attachment ]
This specifies the default index that users can view when entering the top level of an archive that uses the folder_by_date or msgsperfolder option.
-
-default_top_index = folders
-
-
avoid_indices = [ string ]
+"#msgsperfolder">msgsperfolder option.
+
+default_top_index = folders +
avoid_indices = [ string ]
This is a list of index files to not generate. Valid types are date, thread, author, and subject. They can be listed individually on multiple lines or comma or space separated on a single line. When using the folder_by_date or msgsperfolder options, this option -applies to subdirectories.
-
-avoid_indices = subject author (disabled by default)
-
-
avoid_top_indices = [ string ]
+applies to subdirectories.
+
+avoid_indices = subject author (disabled by default) +
avoid_top_indices = [ string ]
This is a list of index files to not generate for the top directory of an archive using the folder_by_date or msgsperfolder option. Valid types are date, thread, author, subject, folders, and -attachment.
-
-avoid_top_indices = date thread author subject
-
-
attachmentsindex = [ 0 | 1 ]
+attachment.
+
+avoid_top_indices = date thread author subject +
attachmentsindex = [ 0 | 1 ]
Set this to Off to make hypermail not output an index of -messages with attachments.
-
-attachmentsindex = On
-
-
latest_folder = [ string ]
+messages with attachments.
+
+attachmentsindex = On +
latest_folder = [ string ]
If folder_by_date or msgsperfolder are in use, create a symbolic link by this name to the most recently created subdirectory. Note that many web servers are configured to not follow symbolic links for security reasons. The link will be created in the directory -specified by the "dir" or "-d" option.
-
-latest_folder = current (disabled by default)
-
-
noindex_onindexes = [ 0 | 1 ]
+specified by the "dir" or "-d" option.
+
+latest_folder = current (disabled by default) +
noindex_onindexes = [ 0 | 1 ]
Tells hypermail to add a noindex metadata to its generated message indexes (by author, etc.), to instruct robots to not index the indexes. See annotated -for further discussion on noindex.
-
+for further discussion on noindex.
+
noindex_onindexes = 0
-
-

Index body style

-
-
indextable = [ 0 | 1 ]
-
Setting this variable to 1 will tell Hypermail to generate a -message index Subject/Author/Date listings using a table format. -Set to 0 if you want the standard Hypermail index page look and -feel.
-
-indextable = 0
-
-
reverse = [ 0 | 1 ]
+
empty_archive_notice = [ string ]
+
Message that hypermail should display in the indices if the archive +is empty (e.g., no messages, or all messages marked as deleted or spam). +If not set, hypermail will use a default localized message. +
+empty_archive_notice = "(no messages are available in this archive)"
+
+

Index body style

+
+
reverse = [ 0 | 1 ]
Setting this variable to 1 will reverse-sort the article entries in the date and thread index files by the date they were received. That is, the most recent messages will appear at the top of the index rather than the other way around. Set to 0 if you want -latest message on the bottom for date and thread indexes.
-
-reverse = 0
-
-
reverse_folders = [ 0 | 1 ]
+latest message on the bottom for date and thread indexes.
+
+reverse = 0 +
reverse_folders = [ 0 | 1 ]
Setting this variable to On will reverse-sort the list of folders. That is, the most recent folders will appear at the top of -the index rather than the other way around.
-
-reverse_folders = 0
-
-
thrdlevels = number
+the index rather than the other way around.
+
+reverse_folders = 0 +
thrdlevels = number
This specifies the number of thread levels to outline in the thread index. For instance, if thrdlevels is 2, replies to messages will be indented once in the index, but replies to replies, etc., -will only be indented once as well. The normal value is 4.
-
-thrdlevels = 4
-
-
thread_file_depth = [ 0 | 1 ]
+will only be indented once as well. The normal value is 4.
+
+thrdlevels = 4 +
thread_file_depth = [ 0 | 1 ]
If nonzero, break the threads index file into multiple files, with the initial message of each thread in the main index file along with links to files containing the replies. Setting this to 1 @@ -643,336 +644,342 @@

Index body style

recommended for archives with over a few hundred messages. Setting this greater than 1 will produce multiple levels of files for each thread whose replies are nested by more than 1 level, but that is -rarely useful. This option is currently disabled if the indextable -option is turned on, and probably needs to be less than -thrdlevels.
-
-thread_file_depth = 0
-
-
icss_url= [ URL | NONE ]
+rarely useful.
+
+thread_file_depth = 0 +
icss_url = [ URL | NONE ]
This option lets you specify an external stylesheet that you -would like to link to the index files. The stylesheet will be -linked to thru a LINK element in the HEAD in the document's -HEAD.
-By default, this option is deactivated.
-
-icss_url = -http://www.w3.org/StyleSheets/Mail/public-messagelist.css
-
-
describe_folder = format string
+ would like to link to the index files. The stylesheet will be + linked to through a link element in the document's + head. If this option is not configured, indexes will use the value of the + default_css_url.
+ By default, this option is inactive.
+
+icss_url = +https://example.org/StyleSheets/index.css +
describe_folder = format string
Controls the labels used in folders.html to describe the directories created by the folder_by_date or msgsperfolder options. For folder_by_date labels, the describe_folder string will be passed to strftime(3) -the same as the folder_by_date string.
-For msgsperfolder:
-%d for the directory number (starts with 0)
-%D for the directory number (starts with 1)
-%m for the number of the first message in the directory
+the same as the folder_by_date string.
+For msgsperfolder:
+%d for the directory number (starts with 0)
+%D for the directory number (starts with 1)
+%m for the number of the first message in the directory
%M for the number of the last message that can be put in the -directory.
+directory.
The default is the value of folder_by_date if that is selected, -"%d" for msgsperfolder.
-
-describe_folder = "%b %Y"
-
-

Index headers/footers

-
-
archives = [ URL | NONE ]
+"%d" for msgsperfolder.
+
+describe_folder = "%b %Y" +
+

Index headers/footers

+
+
archives = [ URL | NONE ]
This creates a link in the archived index pages labeled "Other -mail archives". Set this to NONE to omit such a link.
-
-archives = NONE
-
-
custom_archives = [ HTML text | NONE ]
+mail archives". Set this to NONE to omit such a link.
+
+archives = NONE +
custom_archives = [ HTML text | NONE ]
If this variable is defined, a navigation entry will be created below the sorted_by_x list entry, with the text "Other mail archives: " followed by the value of this variable. Set it to NONE -to ommit such an entry.
-
-custom_archives = NONE
-
-
about = [ URL | NONE ]
+to omit such an entry.
+
+custom_archives = NONE +
about = [ URL | NONE ]
This creates a link in the archived index pages labeled "About -this archive". Set this to NONE to omit such a link.
-
-about = NONE
-
-
ihtmlheaderfile = [ path to index header template file -| NONE ]
+this archive". Set this to NONE to omit such a link.
+
+about = NONE +
ihtmlheaderfile = [ path to index header template file +| NONE ]
Set this to the path to the Index header template file. The template file contains HTML directives and substitution cookies for -runtime expansion.
-
-ihtmlheaderfile = /lists/hypermail-idxheader.hyp (disabled +runtime expansion.
+
+ihtmlheaderfile = /lists/hypermail-idxheader.hyp (disabled by default)
-
-
ihtmlfooterfile = [ path to index footer template file -| NONE ]
+
ihtmlfooterfile = [ path to index footer template file +| NONE ]
Set this to the path to the Index footer template file. The template file contains HTML directives and substitution cookies for -runtime expansion.
-
-ihtmlfooterfile = /lists/hypermail-idxfooter.hyp (disabled +runtime expansion.
+
+ihtmlfooterfile = /lists/hypermail-idxfooter.hyp (disabled by default)
-
-

Message page options

-

Body style

-
-
showhtml = [ 0 | 1 | 2 ]
+
+
+
+

Message page options

+

Body style

+
+
showhtml = [ 0 | 1 | 2 ]
Set this to 1 to show the articles in a proportionally-spaced font rather than a fixed-width (monospace) font. Setting this option to 1 also tells Hypermail to attempt to italicize quoted -passages in articles.
+passages in articles.
Set this to 2 for more complex conversion to html similar to that in txt2html.pl.
-Showhtml = 2 will normally produce nicer looking results than
-showhtml = 1, and showhtml = 0 will look pretty dull, but
-1 and 2 run risks of altering the appearance in undesired ways.
-
-showhtml = 1
-
-
href_detection = [ 0 | 1 ]
+"https://www.cs.wustl.edu/~seth/txt2html/">txt2html.pl.
+Showhtml = 2 will normally produce nicer looking results than
+showhtml = 1, and showhtml = 0 will look pretty dull, but
+1 and 2 run risks of altering the appearance in undesired +ways.
+
+showhtml = 1 +
href_detection = [ 0 | 1 ]
Set this to 1 to assume that any string on the body of the message that says <A HREF=" ... </A> is a URL, together -with its markup and treat it as such.
-
-href_detection = 0
-
-
showbr = [ 0 | 1 ]
+with its markup and treat it as such.
+
+href_detection = 0 +
showbr = [ 0 | 1 ]
Set this to 1 if you want article lines to end with the <br> tag. Else set to 0 to have non-quoted lines word-wrap. -Only takes effect if showhtml is set to 1.
-
-showbr = 1
-
-
iquotes = [ 0 | 1 ]
-
Set this to 1 if you want quoted lines to be shown in italics. -Only take effect if showhtml is set to 1.
-
-iquotes = 1
-
-
i18n_body = [ 0 | 1 ]
-
Translate message body into UTF-8. The i18n -configuration option must be enabled.
-
-i18n_body = 1 (disabled by default)
-
-
mcss_url= [ URL | NONE ]
+Only takes effect if showhtml is set to 1.
+
+showbr = 1 +
iquotes = [ 0 | 1 ] DEPRECATED, use css .quote rule instead
+
Set this to 1 if you want quoted lines to be shown in italics. +Only takes effect if showhtml is set to 1.
+
+iquotes = 1
+
i18n_body = [ 0 | 1 ]
+
Translate message body into UTF-8. The i18n +configuration option must be enabled.
+
+i18n_body = 1 (disabled by default)
+
mcss_url = [ URL | NONE ]
This option lets you specify an external stylesheet that you would like to link to the message files. The stylesheet will be -linked to thru a LINK element in the HEAD in the document's HEAD. -By default, this option is inactive.
-
-mcss_url = -http://www.w3.org/StyleSheets/Mail/public-message.css
-
-
quote_hide_threshold = percent (integer)
-
If the linkquotes option is on, + linked to through a link element in the document's head. +If this option is not configured, messages will use the value of the + default_css_url.
+ By default, this option is inactive.
+
+mcss_url = +https://example.org/StyleSheets/message.css
+
quote_hide_threshold = percent (integer)
+
If the linkquotes option is on, setting this to an integer less than 100 will cause it to replace quoted text with one-line links if the percent of lines in the message body (excluding the signature) consisting of quoted text -exceeds the number indicated by this option.
-
-quote_hide_threshold = 100
-
-
files_by_thread = [ 0 | 1]
+exceeds the number indicated by this option.
+
+quote_hide_threshold = 100 +
files_by_thread = [ 0 | 1]
Set this to 1 to generate (in addition to the usual files), a file for each thread that contains all the messages in that thread. The first line in each thread of the thread index page links to -this file instead of to a single message.
-
-files_by_thread = 0
-
-

Message page links

-
-
linkquotes = [ 0 | 1 ]
-
Set this to On to create fine-grained links from quoted text to +this file instead of to a single message.
+
+files_by_thread = 0
+
+

Message page links

+
+
linkquotes = [ 0 | 1 ]
+
NOTE: this option has not been working +well since 2.4.0 and should now be considered experimental.
+It may be deprecated in the next version of hypermail.

+Set this to On to create fine-grained links from quoted text to the text where the quote originated. It also improves the threads index file by more accurately matching messages with replies. Note that this may be rather cpu intensive (see the searchbackmsgnum option to alter the -performance).
-
-linkquotes = 0
-
-
searchbackmsgnum = postive integer
+performance).
+
+linkquotes = 0 +
searchbackmsgnum = positive integer
If the linkquotes option is on and an incremental update is being done (-u option), this controls the tradeoff between speed and the reliability of finding the right source for quoted text. Try to set it to the largest number of messages between a message and the final direct reply to that -message.
-
-searchbackmsgnum = 500
-
-
link_to_replies = [ string | NONE]
+message.
+
+searchbackmsgnum = 500 +
If the linkquotes option is on, specifying a string here causes it to generate links from original quoted text to the location(s) in replies which quote them. The string -is used to display the link.
-
-link_to_replies = NONE
-
-
quote_link_string = [ string | NONE ]
+is used to display the link.
+
+link_to_replies = NONE +
If the quote_hide_threshold option is being used, the quote_link_string will be used if available to display the link that replaces the quoted text. If no string is specified here, the first line of each section of quoted -text will used.
-
-quote_link_string = NONE
-
-
spamprotect = [ 0 | 1 ]
+text will be used.
+
+quote_link_string = NONE +
spamprotect = [ 0 | 1 ]
Set this to On to make hypermail not output real email addresses in the output HTML but instead it will obfuscate them a little. You can control the obfuscation with antispamdomain.
-
-spamprotect = On
-
-
antispamdomain = string with invalid -domain
+"#antispamdomain">antispamdomain.
+
+spamprotect = On +
antispamdomain = string with invalid +domain
By default the spamprotect option only does a small amount of massaging of email addresses. Use this to completely replace the domain from which a message originates (everything after the @) with some string to confuse screen-scraping programs. It is -probably wise to make this an invalid mail domain.
-
-antispamdomain = "email.domain.hidden" (disabled by +probably wise to make this an invalid mail domain.
+
+antispamdomain = "email.domain.hidden" (disabled by default)
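The replacement that antispamdomain describes, everything after the @ swapped for a fixed string, can be sketched in a few lines of Python. The function name hide_domain is hypothetical, and leaving addresses without an @ unchanged is an assumption made for this sketch.

```python
def hide_domain(address: str, antispamdomain: str) -> str:
    """Replace everything after the last @ with the configured string."""
    local_part, separator, _domain = address.rpartition("@")
    if not separator:
        # No @ present: nothing to replace (assumed behaviour for this sketch).
        return address
    return f"{local_part}@{antispamdomain}"


print(hide_domain("jane@example.org", "email.domain.hidden"))
```

Using a string that is not a valid mail domain, as the text recommends, means a harvested address cannot be delivered to anyone.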
-
-
spamprotect_id = [ 0 | 1 ]
+
spamprotect_id = [ 0 | 1 ]
Set this to On to make hypermail not output real email message ids in HTML comments (sometimes used internally by hypermail) but instead it will obfuscate them a little so they don't look like -email addresses to spammers.
-
-spamprotect_id = On
-
-

Message page +email addresses to spammers.
+
+spamprotect_id = On

+
+

Message page headers/footers

- -
showreplies = [ 0 | 1 ]
+
+
showreplies = [ 0 | 1 ]
Set to 1 to show all replies to a message as links in article -files. If this is set to 0 no reply links are generated.
-
-showreplies = 1
-
-
show_msg_links = [ 0 | 1 | 3 | 4 ]
+files. If this is set to 0 no reply links are generated.
+
+showreplies = 1 +
Set this to 1 if you want links to Next, Prev, Next thread, Reply to, etc. displayed on the article pages. Setting this to 0 disables these links from appearing on the generated pages. Set it to 3 to produce those links only at the top of the message pages, -or 4 to produce those links only at the bottom of the message.
-
-show_msg_links = 1
-
-
show_index_links = [ 0 | 1 | 3 | 4 ]
+or 4 to produce those links only at the bottom of the +message.
+
+show_msg_links = 1 +
Set this to 1 to show links to index pages from the top and bottom of each message file. Set it to 0 to avoid those links. Set it to 3 to show the links only at the top of the message pages, or -4 to produce those links only at the bottom of the message.
-
-show_index_links = 1
-
-
showheaders = [ 0 | 1 ]
+4 to produce those links only at the bottom of the message.
+
+show_index_links = 1 +
showheaders = [ 0 | 1 ]
Set this to 1 to show the RFC 2822 message headers To:, From:, and Subject: information found in the email messages. Set to 0 if -you want to hide mail headers in articles.
-
-showheaders = 0
-
-
show_headers = List of RFC 2822 Headers to -display
+you want to hide mail headers in articles.
+
+showheaders = 0 +
show_headers = List of RFC 2822 Headers to +display
This is the list of headers to be displayed if showheaders is set to 1 (TRUE). They can be listed comma or space separated all on -a single line such as
+a single line such as
      show_headers = -From,Subject,Date,Message-ID
-
-or they can be listed individually or any combination of.
-
-      show_headers = From
-      show_headers = Subject
-      show_headers = Date
-      show_headers = Message-ID
-
+From,Subject,Date,Message-ID
+
+or they can be listed individually or any combination of.
+
+      show_headers = From
+      show_headers = Subject
+      show_headers = Date
+      show_headers = Message-ID
+
If show_headers contains the special character ``*'', then -hypermail will display all header lines.
+hypermail will display all header lines.
NOTE: Do not put the ':' at the end of the -headers.
-
-show_headers = From,Subject,Date,Message-ID (disabled by +headers.
+
+show_headers = From,Subject,Date,Message-ID (disabled by +default)
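The list syntax accepted by show_headers (comma or space separated, on one line or several, with * meaning every header) can be illustrated with this Python sketch. The function names are hypothetical, and matching header names case-insensitively is an assumption, not something the text states.

```python
import re


def parse_header_list(values):
    """Merge one or more show_headers values into a flat list of names."""
    names = []
    for value in values:
        # Split on commas and/or whitespace, dropping empty pieces.
        names.extend(name for name in re.split(r"[,\s]+", value.strip()) if name)
    return names


def is_displayed(header, names):
    """Return True if the header should be shown for this list of names."""
    if "*" in names:
        return True
    wanted = {name.lower() for name in names}
    return header.rstrip(":").lower() in wanted


names = parse_header_list(["From,Subject", "Date Message-ID"])
print(names, is_displayed("Subject", names))
```

Both spellings from the text, the single comma-separated line and the one-option-per-line form, produce the same list here.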
+
show_headers_msg_rfc822 = List of RFC 2822 Headers to + display in message/rfc822 attachments
+
This option is identical to show_headers but only +applies to message/rfc822 attachments. If it is not defined, hypermail will use + show_headers for message/rfc822 attachments.
+
+show_headers_message_rfc822 = From,Subject,Date,Message-ID,Archived-At (disabled by default)
-
-
format_flowed [ 0 | 1 ] (EXPERIMENTAL)
+
format_flowed [ 0 | 1 ] (EXPERIMENTAL)
Enable this option to support RFC 3676 format=flowed. When this option is enabled and a message says it is supporting format=flowed, hypermail will recreate a long-line that has been -split into multiple lines as a single one.
-
+split into multiple lines as a single one.
+
format_flowed = 1 (disabled by default)
-
-
format_flowed_disable_quoted [ 0 | 1 ]
+
format_flowed_disable_quoted [ 0 | 1 ]
Use this option if you want to disable support for format=flowed inside quoted text (the lines starting with one or more '>' characters). This option is always disabled if -format_flowed is not enabled.
-
+format_flowed is not enabled.
+
format_flowed_disable_quoted = 1 (disabled by default)
-
-
mhtmlheaderfile = [ path to message header template -file | NONE ]
+
mhtmlheaderfile = [ path to message header template +file | NONE ]
Set this to the path to the Message header template file. The template file contains HTML directives and substitution cookies for -runtime expansion.
-
-mhtmlheaderfile = /lists/hypermail-msgheader.hyp (disabled +runtime expansion.
+
+mhtmlheaderfile = /lists/hypermail-msgheader.hyp (disabled by default)
-
-
mhtmlfooterfile = [ path to message footer template -file | NONE ]
+
mhtmlfooterfile = [ path to message footer template +file | NONE ]
Set this to the path to the Message footer template file. The -template file contains HTML directives and substitution cookies for -runtime expansion.
-
-mhtmlfooterfile = /lists/hypermail-msgfooter.hyp (disabled -by default)
-
-

Attachments

-
-
inlinehtml [ 0 | 1 ]
+ template file contains HTML directives and substitution cookies for + runtime expansion.
+
+ mhtmlfooterfile = /lists/hypermail-msgfooter.hyp (disabled + by default) + +
mhtmlnavbar2upfile = [ path to message navbar template + file | NONE ]
+
Set this to the path to a template file containing + valid HTML formatting statements that you wish to be + included as information in each archived message, + giving links to the hierarchy of your archive.
+ By default uses the value of ihtmlnavbar2upfile.
+ If mhtmlnavbar2upfile and ihtmlnavbar2upfile + are unspecified, hypermail will use the following breadcrumb + that will use the archive's label and link to the default + index of the archive:
+
+   <ul>
+      <li><a href="./index.html" rel="start"><em>label for the archive</em></a></li>
+   </ul>     
+	
+ mhtmlnavbar2upfile = /lists/hypermail-navbar2upfile.hyp (disabled + by default)
+
+
+
+

Attachments

+
+
inlinehtml [ 0 | 1 ]
Define to On to make text/html parts to get inlined with the mails. If set to Off, HTML-parts will be stored as separate files. A "Content-Disposition: attachment;" line in the mail will cause an HTML-part to be stored as a separate file even if this option is -On.
-
-inlinehtml = 1
-
-
usemeta [ 0 | 1 ]
+On.
+
+inlinehtml = 1 +
usemeta [ 0 | 1 ]
This option allows you to use metadata to store the content type of MIME attachments and, later on, when a user browses the attachment, send back this information in the HTTP Content-Type header. When set to 1, the Content-Type header of -a MIME attachment will be stored in a metadata file.
+a MIME attachment will be stored in a metadata file.
Let us say that the MIME attachments for a message are stored in directory att-num. The metadata for those attachments will then be stored in directory att-num/.meta. If a MIME attachment is stored in file att-file, its metadata will be stored in file att-file.meta. This convention is directly -compatible with the Apache server handling of metadata.
-
-usemeta = 0
-
-
userobotmeta [ 0 | 1 ]
+compatible with the Apache server handling of metadata.
+
+usemeta = 0 +
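The file naming convention described for usemeta can be written down as a small Python helper. This is a sketch of the path rule only; the function name metadata_path and the sample directory and file names are hypothetical.

```python
from pathlib import PurePosixPath


def metadata_path(attachment: PurePosixPath) -> PurePosixPath:
    """Map att-num/att-file to att-num/.meta/att-file.meta."""
    return attachment.parent / ".meta" / (attachment.name + ".meta")


print(metadata_path(PurePosixPath("att-0001/report.pdf")))
```

Keeping the metadata in a .meta subdirectory next to the attachments is what makes the layout usable by a web server that looks for per-file metadata in that location.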
userobotmeta [ 0 | 1 ]
If a message has annotations for robots and usemeta is enabled, setting this option to 1 will associate the value of the annotations to @@ -980,85 +987,95 @@

Attachments

X-Robots-Tag HTTP header. For more information, browse Google's -Robots Meta Tag documentation.
-
+Robots Meta Tag documentation.
+
userobotmeta = 0
-
-
text_types = list of types to be the same as -text/plain
+
text_types = list of types to be the same as +text/plain
This is a list of MIME types that you want hypermail to treat exactly as if they were text/plain. They can be listed individually -on multiple lines or comma or space separated on a single line.
-
-text_types = text, text/plain, message/rfc2822 (disabled by +on multiple lines or comma or space separated on a single +line.
+
+text_types = text, text/plain, message/rfc2822 (disabled by default)
-
-
inline_types = indicate data types data to be -inlined
+
inline_types = indicate data types data to be +inlined
This is the list of MIME types that you want inlined as opposed to simply linked into the message. They can be listed individually -on multiple lines or comma or space separated on a single line.
-
+on multiple lines or comma or space separated on a single +line.
+
      inline_types = image/gif -image/jpeg
-or
-      inline_types = image/gif
-      inline_types = image/jpeg
-
-inline_types = image/gif image/jpeg
-
-
inline_addlink = [ 0 | 1 ]
+image/jpeg
+or
+      inline_types = image/gif
+      inline_types = image/jpeg
+
+inline_types = image/gif image/jpeg +
Set to On to add inline links to content that is stored in the attachments subdirectory. inline_types must be -enabled.
-
-inline_addlink = 1 (enabled by default)
-
-
prefered_types = multipart/mixed types to -present
+enabled.
+
+inline_addlink = 1 (enabled by default) +
prefered_types = multipart/mixed types to +present
When mails using multipart/mixed or multipart/alternative types are scanned, this list of MIME types defines which part you want -presented in the result.
-
-prefered_types = text/plain, text/html
-
-
ignore_types = indicate types of attachments to -ignore
+presented in the result.
+
+prefered_types = text/plain, text/html +
ignore_types = indicate types of attachments to +ignore
This is the list of MIME attachment types that you do not want to do anything with. They are quietly ignored and are not processed. They can be listed individually on multiple lines or comma or space separated on a single line. -

Two special types may be used here:
+

Two special types may be used here:
$BINARY - ignore all types that would be stored as separate -files.
+files.
$NONPLAIN - ignore all types not treated as text/plain, and all -$BINARY types.
+$BINARY types.
Note: the behavior of these may be affected by the inlinehtml option.

-


+


      ignore_types = text/x-vcard -application/x-msdownload
-or
-      ignore_types = text/x-vcard
+application/x-msdownload
+or
+      ignore_types = +text/x-vcard
      ignore_types = -application/x-msdownload
-
-ignore_types = text/x-vcard
-ignore_types = application/x-msdownload

+application/x-msdownload
+
+ignore_types = text/x-vcard
+ignore_types = application/x-msdownload

-
-
attachmentlink = attachment-link-format
-
Format of the attachment links.
-
-%p for the full path to the attachment
-%f for the file name part only
-%d for the directory name only
-%n for the message number
-%c for the content type string
-
-attachmentlink = "%p"
-
-
applemail_mimehack [ 0 | 1 ]
+
ignore_content_disposition = indicate types + of attachments for which you want to ignore the Content-Disposition +header
+
This is the list of MIME attachment types for which you do not want +to honor an associated Content-Disposition header (thus making them inline by default). +The header is silently ignored and is not parsed.
+The MIME types can be listed individually on multiple lines or +comma or space separated on a single line.
+This option is useful when dealing with broken user agents (UAs), such as early Apple Mail +clients that associated Content-Disposition: attachment with +multipart/appledouble. Later on, Apple fixed this and either used +Content-Disposition: inline or discarded this header entirely.
+
+ignore_content_disposition = multipart/appledouble (disabled by default)
+ +
Format of the attachment links.
+
+%p for the full path to the attachment
+%f for the file name part only
+%d for the directory name only
+%n for the message number
+%c for the content type string
+
+attachmentlink = "%p"
+
applemail_mimehack [ 0 | 1 ]
In a multipart/alternative message, Apple Mail (as of June/2018) is only adding attachments to the text/html related part. Set this option to On to force the display of all alternate @@ -1066,10 +1083,9 @@

Attachments

only a text/plain and a text/html alternatives and the preference is for text/plain, the text/html alternative won't be displayed. This option won't be taken into account if your prefered type is -text/html or if you enabled the save_alts option.
-applemail_mimehack = 0 (disabled by default)
-
-
unsafe_chars = list of chars to prohibit
+text/html or if you enabled the save_alts option.
+applemail_mimehack = 0 (disabled by default) +
unsafe_chars = list of chars to prohibit
Any characters listed in this string are removed from user-specified attachment filenames. Those characters will be replaced by a "_" (which means that specifying "_" here won't have @@ -1079,64 +1095,72 @@

Attachments

prevented if you specify "." here (e.g. if a web server is configured to enable server side includes on filenames ending in something other than .shtml), but that will prevent browsers from -recognizing many file types.
-
-unsafe_chars = "." (disabled by default)
-
-
save_alts = [ 0 | 1 | 2 ]
+recognizing many file types.
+
+unsafe_chars = "." (disabled by default) +
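As a rough illustration of the replacement rule described above, the Python sketch below swaps every listed character for "_". `sanitize_filename` is a hypothetical name, and hypermail may apply further sanitizing of its own that is not modelled here.

```python
def sanitize_filename(name, unsafe_chars):
    """Replace every character listed in unsafe_chars with '_'.

    Illustrative only: this models the documented rule, not hypermail's
    actual implementation.
    """
    return "".join("_" if ch in unsafe_chars else ch for ch in name)
```

With `unsafe_chars = "."`, a name such as `report.final.shtml` becomes `report_final_shtml`, which also shows why browsers can no longer recognize the file type from its extension. Listing "_" itself changes nothing, since the replacement character is the same.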
save_alts = [ 0 | 1 | 2 ]
This controls what happens to alternatives (other than the -prefered alternative) for multipart/alternative messages.
-0 - discard non-prefered alternatives
-1 - show all alternatives inline
-2 - put non-prefered alternatives in a separate file.
-
-save_alts = 0
-
-
alts_text = descriptive text
+prefered alternative) for multipart/alternative messages.
+0 - discard non-prefered alternatives
+1 - show all alternatives inline
+2 - put non-prefered alternatives in a separate file.
+
+save_alts = 0 +
alts_text = descriptive text
If save_alts is 1, this text is put between the -alternatives.
+alternatives.
If save_alts is 2, this text is used to describe the link to each -alternative file.
-
-alts_text = "alternate version of message" (the default if -save_alts = 2)
-alts_text = "<hr>" (the default if save_alts = 1)
-
-

System administration

-

Message input

-
-
increment = [ -1 | 0 | 1 ]
-

+alternative file.
+
+alts_text = "alternate version of message" (the default if +save_alts = 2)
+alts_text = "<hr>" (the default if save_alts = 1)
+
max_attachments_per_message = positive integer
+
This specifies the maximum number of attachments that will be + processed for a message. Any attachments beyond this limit + will be ignored.
+ Set to 0 to remove this limit.
+ Note that if you remove this limit or set it too high, + memory usage may increase when parsing a message made up + of hundreds of attachments.
+
+ max_attachments_per_message = 30 (set to 200 by default) +
+
+
+
+

System administration

+

Message input

+
+
increment = [ -1 | 0 | 1 ]
+

Define as 1 to append all input messages to the end of existing -archives.
+archives.
Define as 0 for it to read a mailbox that corresponds to the entire archive. (See the mbox_shortened option for an exception to the requirement that it be the entire archive). If there are any existing html messages, it will figure out which ones at the end of the mailbox are new, and add only -those that haven't been converted yet.
+those that haven't been converted yet.
Define as -1 to have hypermail figure out whether the input is entirely new messages to be appended or whether it contains messages that are already in the archive. A value of -1 cannot be used with the mbox_shortened option or with the -i command line -option or with mbox = NONE.
-
-increment = 0
-
-
readone = [ 0 | 1 ]
+option or with mbox = NONE.
+
+increment = 0 +
readone = [ 0 | 1 ]
Set this to 1 to specify there is only one message in the -input.
-
-readone = 0
-
-
mbox = [ filename | NONE ]
+input.
+
+readone = 0 +
mbox = [ filename | NONE ]
This is the default mailbox to read messages in from. Set this with a value of NONE to read from standard input as the -default.
-
-mbox = NONE
-
-
mbox_shortened = [ 0 | 1 ]
+default.
+
+mbox = NONE +
mbox_shortened = [ 0 | 1 ]
Set this to 1 to enable use of mbox that has had some of its initial messages deleted. Requires usegdbm = 1 and increment = 0. The first message in the shortened mbox must have a Message-Id @@ -1145,195 +1169,181 @@

Message input

Message-Id as a message that was deleted. The mbox may not be altered in any way other than deleting from beginning of the mbox or appending new messages to the end (unless you rebuild the -archive from scratch using a complete mbox).
-
-mbox_shortened = 0
-
-
ietf_mbox = [ 0 | 1 ]
+archive from scratch using a complete mbox).
+
+mbox_shortened = 0 +
ietf_mbox = [ 0 | 1 ]
Setting this variable to 1 will tell hypermail that the mbox is formatted according to the IETF mbox convention: all lines, except -for the envelope, are prefixed with a > char.
-
-ietf_mbox = 0
-
-
discard_dup_msgids = [ 0 | 1 ]
+for the envelope, are prefixed with a > char.
+
+ietf_mbox = 0 +
discard_dup_msgids = [ 0 | 1 ]
Set this to 0 to accept messages with a Message-ID matching that of a message already in this archive. By default such messages -are discarded.
-
-discard_dup_msgids = 1
-
-
require_msgids = [ 0 | 1 ]
+are discarded.
+
+discard_dup_msgids = 1 +
require_msgids = [ 0 | 1 ]
Set this to 0 to accept messages without a Message-ID -header.
-Set this to 1 to discard messages without a Message-ID header.
-By default such messages are discarded.
-
-require_msgids = 1
-
-

Message Filtering

-Regular expression support is provided by the PCRE  library package, which is +header.
+Set this to 1 to discard messages without a Message-ID +header.
+By default such messages are discarded.
+
+require_msgids = 1
+
+

Message Filtering

+

Regular expression support is provided by the PCRE  library package, which is open source software, written by Philip Hazel, and copyright by the -University of Cambridge, England. +University of Cambridge, England.

The full body searches can be slow, and do not match multi-line -strings in message bodies. A string that spans multiple lines of a -header can be matched.

-

- -
filter_out = expression
+ strings in message bodies. A string that spans multiple lines of a + header can be matched.

+
+
filter_out = expression
Delete from the html archives any message having a header line which matches any of these expressions. Uses the same rules for deletion as the expires option. The expressions use the same syntax as Perl regular expressions.

The following examples should reject messages Cc'd to more than 3 addresses or from any address at spammers.com. This option is -disabled by default.
-
-filter_out=Cc:([^,]*,){3}
-filter_out=From:.+@spammers.com

+disabled by default.
+
+filter_out=Cc:([^,]*,){3}
+filter_out=From:.+@spammers.com
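To see what the two example expressions accept and reject, they can be tried outside hypermail. The Python sketch below uses the standard `re` module as a stand-in for PCRE (these two patterns use only syntax common to both); `is_filtered_out` is a hypothetical helper, and it assumes each header has already been unfolded into a single string.

```python
import re

# The two filter_out expressions from the example above.
FILTER_OUT = [r"Cc:([^,]*,){3}", r"From:.+@spammers.com"]

def is_filtered_out(header_lines):
    """Return True if any header line matches any filter_out expression.

    Assumption: matching is a case-sensitive search anywhere in the line;
    hypermail's own matching flags may differ.
    """
    return any(re.search(pattern, line)
               for pattern in FILTER_OUT
               for line in header_lines)
```

The first expression needs three commas after `Cc:`, so it only fires from the fourth address onward. Note that the unescaped `.` in `spammers.com` matches any character; writing `spammers\.com` would restrict the match to a literal dot.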

-
-
filter_require = expression
+
filter_require = expression
Delete from the html archives any message not having header lines which match each of these expressions. Uses the same rules for deletion as the expires option. The expressions use the same -syntax as Perl regular expressions.
-
-filter_require =
-
-
filter_out_full_body = expression
+syntax as Perl regular expressions.
+
+filter_require = +
filter_out_full_body = expression
Delete from the html archives any message having a line which matches any of these expressions. Uses the same rules for deletion as the expires option. The expressions use the same syntax as Perl -regular expressions.
-
-filter_out_full_body =
-
-
filter_require_full_body = expression
+regular expressions.
+
+filter_out_full_body = +
filter_require_full_body = expression
Delete from the html archives any message not having lines which match each of these expressions. Uses the same rules for deletion as the expires option. The expressions use the same syntax -as Perl regular expressions.
-
-filter_require_full_body =
-
-

Filesystem output

-
-
dir = [ directory path | NONE ]
+as Perl regular expressions.
+
+filter_require_full_body = +
+

Filesystem output

+
+
dir = [ directory path | NONE ]
This is the default directory that Hypermail uses when creating and updating archives. If set to NONE, the directory will have the -same name as the input mailbox.
+same name as the input mailbox.
Note that the date that Hypermail was run will be used, not a date from the message (use the folder_by_date option to have Hypermail use -dates from messages).
-
-dir = NONE
-
-
overwrite = [ 0 | 1 ]
-
Set to 1 to make Hypermail rewrite all messages.
-Set to 0 to rewrite as few messages as possible.
+dates from messages).
+
+dir = NONE
+
overwrite = [ 0 | 1 ]
+
Set to 1 to make Hypermail rewrite all messages.
+Set to 0 to rewrite as few messages as possible.
Rewriting all messages is slower, but if you change the options that control the appearance of the messages you may want to rewrite all the messages to make the appearance consistent throughout the archive. (This defaulted to 1 for most versions 2.0 through 2.1.3, presumably to encourage archives that upgraded to have a single -style. The default was changed back to 0 after 2.1.3).
-
-overwrite = 0
-
-
htmlsuffix = [ html | htm | shtml ... ]
+style. The default was changed back to 0 after 2.1.3).
+
+overwrite = 0 +
htmlsuffix = [ html | htm | shtml ... ]
Use this to specify the html file suffix to be used when Hypermail generates the html files. This is dependent on local needs. Do not put a '.' in the value. It would result in -"file..html", probably not what you want.
-
-htmlsuffix = shtml
-
-
dirmode = octal number
+"file..html", probably not what you want.
+
+htmlsuffix = shtml +
dirmode = octal number
This is an octal number representing the rwx modes that new directories are set to when they are created. If the archives will be made publicly available, it's a good idea to define this as -0755. This must be an octal number.
-
-dirmode = 0755
-
-
filemode = octal number
+0755. This must be an octal number.
+
+dirmode = 0755 +
filemode = octal number
This is an octal number representing the permission modes that new files are set to when they are created. If the archives will be made publicly available, it's a good idea to define this as 0644. -This must be an octal number.
-
-filemode = 0644
-
-
filename_base = [ string ]
+This must be an octal number.
+
+filemode = 0644 +
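Because dirmode and filemode are read as octal, it can help to check what a given value grants before putting it in the config. The Python sketch below renders an octal string in the familiar `ls -l` form; `describe_mode` is a hypothetical helper for this note and has nothing to do with how hypermail itself applies the modes.

```python
import stat

def describe_mode(octal_text, is_dir=False):
    """Parse an octal permission string and render it in ls -l style."""
    mode = int(octal_text, 8)  # "0755" is 493 in decimal
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)
```

`describe_mode("0755", is_dir=True)` gives `drwxr-xr-x` and `describe_mode("0644")` gives `-rw-r--r--`, i.e. world-readable but writable only by the owner, which is what the recommendation for public archives amounts to.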
filename_base = [ string ]
This option overrides the normal rules for creating attachment file names, and creates file names from the string that this option is set to plus a file name extension if one can be found in the name supplied by the message. This option is mainly for languages -that use different character sets from English.
-
-filename_base = attachment (disabled by default)
-
-

System miscellaneous

-
-
usegdbm = [ 0 | 1 ]
+that use different character sets from English.
+
+filename_base = attachment (disabled by default) +
+

System miscellaneous

+
+
usegdbm = [ 0 | 1 ]
Set this to 1 to use gdbm to implement a header cache. This will speed up hypermail, especially if your filesystem is slow. It will not provide any speedup with the linkquotes option.
-
-usegdbm = 0
-
-
writehaof = [ 0 | 1 ]
+"#linkquotes">linkquotes option.
+Note that this option has not been reviewed for years and should +now be considered UNMAINTAINED and at risk of being DEPRECATED.
+
+usegdbm = 0 +
writehaof = [ 0 | 1 ]
Set this to On to let hypermail write an XML archive overview -file in each directory. The filename is archive_overview.haof.
-
-writehaof = 0
-
-
append = [ 0 | 1 ]
+file in each directory. The filename is +archive_overview.haof.
+
+writehaof = 0 +
append = [ 0 | 1 ]
Set this to On to maintain a parallel mbox archive. The file -name defaults to mbox in the directory specified by -d or dir.
-
-append = 1
-
-
append_filename = [ string ]
+name defaults to mbox in the directory specified by -d or +dir.
+
+append = 1 +
append_filename = [ string ]
Specifies the filename to be used by the append option. $DIR may be used to specify a name relative to the directory specified in the -d or dir option. The string will be passed to strftime(3) to allow splitting the mailbox into yearly or monthly files, such as -"%Y-%m.mbox".
-
-append_filename = $DIR/INBOX
-
-
txtsuffix = [ string ]
+"%Y-%m.mbox".
+
+append_filename = $DIR/INBOX +
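The combination of $DIR and strftime(3) conversions can be previewed with a short Python sketch. `expand_append_filename` is a hypothetical helper; in particular, expanding the date conversions before substituting $DIR is an assumption made here, not something the documentation specifies.

```python
import time

def expand_append_filename(template, archive_dir, when):
    """Expand strftime conversions, then substitute $DIR.

    Assumption: the order of the two steps is chosen so that a '%' in
    the directory name is left untouched; hypermail may differ.
    """
    return time.strftime(template, when).replace("$DIR", archive_dir)
```

For a message handled on 14 March 2001, the template `$DIR/%Y-%m.mbox` with an archive directory of `/var/archive` (an example path) expands to `/var/archive/2001-03.mbox`, so each month gets its own parallel mbox file.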
txtsuffix = [ string ]
If you want the original mail messages archived in individual files, set this to the extension that you want these messages to -have (recommended value: txt).
-
-txtsuffix = txt (off by default)
-
-
annotated = list of headers used to indicate -deletion
+have (recommended value: txt).
+
+txtsuffix = txt (off by default) +
annotated = list of headers used to indicate +deletion
This is the list of headers that indicate that a message contains an annotation. Options disabled by -default.
-
+default.
+
In an annotated message, the values of the header specify the type of annotations. The header may have one or more comma-separated values. Order and case are not important. Hypermail recognizes two types of annotations: content and -robot annotations.
-
+robot annotations.
+
Content annotations give information to the reader about how an archive maintainer has operated on an original received message. This operation typically happens as a belated action, for example, when removing spam from an existing archive. Content annotations can have one, and only one, of the following -values:
+values:
spam
message deleted because it is spam;
@@ -1342,7 +1352,7 @@

System miscellaneous

edited
original received message was manually edited.
-
+
If a message specifies more than one content annotation, only the first one will be taken into account. You can customize the markup that's shown for content annotations by means of the System miscellaneous

htmlmessage_deleted_spam, and htmlmessage_edited directives. See also the delete_level -option for more info about what happens to a deleted message.
-
+option for more info about what happens to a deleted message.
+
Robot annotations instruct a visiting web robot whether the contents of a message should be indexed and/or whether the outgoing links from the message should be followed, doing so through a specific HTML meta tag (browse About the Robots <META> tag -for further details).
+"https://www.robotstxt.org/">About the Robots <META> tag +for further details).
Robot annotations can have either one or both of the following values:
@@ -1366,91 +1376,85 @@

System miscellaneous

noindex
don't index this message.
-
+
You can use one or both robot annotation values and combine them with the edited content annotation. Note that spam and deleted annotations have an implicit robot noindex annotation. In such case, user supplied robot annotations values will be silently -ignored.
-
+ignored.
+
Use userobotmeta for associating -attachments with annotations for robots.
-
+attachments with annotations for robots.
+
Note that the list maintainer must be careful on whether to accept incoming messages containing the annotated header. If the policy is not to allow that header on incoming messages, it must be filtered out before the message is stored or acted upon by -hypermail.
-
-In an hmrc file:
-annotated = X-Hypermail-Annotated (off by default)
-In a message:
+hypermail.
+
+In an hmrc file:
+annotated = X-Hypermail-Annotated (off by default)
+In a message:
X-Hypermail-Annotated: edited, noindex -
-
deleted = list of headers used to indicate -deletion [DEPRECATED]
+
deleted = list of headers used to indicate + deletion DEPRECATED, replaced by annotated
This is the list of headers that indicate the message should not be displayed if the value of this header is 'yes'. See the delete_level option for more info about -what happens to the message.
+what happens to the message.
NOTE: This option has been deprecated in favor of annotated. However, it will still be -parsed and honored to take into account legacy archives.
-
-deleted = X-Hypermail-Deleted X-No-Archive
-
-
expires = list of headers used to indicate -expiration
+parsed and honored to take into account legacy archives.
+
+deleted = X-Hypermail-Deleted X-No-Archive +
expires = list of headers used to indicate +expiration
This is the list of headers that indicate the message should not be displayed if the value of this header is a date in the past. See the delete_level option for more -info about what happens to the message.
-
-expires = Expires
-
-
delete_older = date/time string
+info about what happens to the message.
+
+expires = Expires +
delete_older = date/time string
Any message older than this date should not be displayed. See the delete_level option for more info about what happens to the message. Any date format that works in -the Date: header line of an email message should work here.
-
-delete_older = "Wed, 14 Mar 2001 12:59:51 +0200" (off by +the Date: header line of an email message should work here.
+
+delete_older = "Wed, 14 Mar 2001 12:59:51 +0200" (off by default)
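Since the value uses the same format as a Date: header, the timezone offset matters when deciding which messages fall before the cutoff. The Python sketch below shows the comparison using the standard `email.utils` parser; `is_older_than` is a hypothetical helper, and whether hypermail compares strictly or inclusively is not stated in the documentation, so the strict comparison here is an assumption.

```python
from email.utils import parsedate_to_datetime

def is_older_than(message_date, cutoff):
    """Return True if message_date is strictly earlier than cutoff.

    Both arguments are Date:-header style strings with an explicit
    timezone offset, so the comparison is made on absolute time.
    """
    return parsedate_to_datetime(message_date) < parsedate_to_datetime(cutoff)
```

With the cutoff from the example above, "Wed, 14 Mar 2001 12:59:51 +0200", a message dated "Wed, 14 Mar 2001 11:30:00 +0000" is not older, even though its clock time looks earlier: the cutoff corresponds to 10:59:51 UTC.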
-
-
delete_newer = date/time string
+
delete_newer = date/time string
Any message newer than this date should not be displayed. See the delete_level option for more info about what happens to the message. Any date format that works in -the Date: header line of an email message should work here.
-
-delete_newer = "Wed, 28 Mar 2001 12:59:51 +0200" (off by +the Date: header line of an email message should work here.
+
+delete_newer = "Wed, 28 Mar 2001 12:59:51 +0200" (off by default)
-
-
delete_msgnum = list of message numbers
+
delete_msgnum = list of message numbers
This is the list of message numbers that should be deleted from the html archive. The mbox is not changed. See the delete_level option for more info about what -happens to the message.
-
-delete_msgnum = 42 666 (off by default)
-
delete_level = [ 0 | 1 | 2 | 3 ]
-

+happens to the message.
+
+delete_msgnum = 42 666 (off by default) +
+
delete_level = [ 0 | 1 | 2 | 3 ]
+

0 - remove deleted and expired files. Note that with this choice threading may be screwed up if there are replies to deleted or -expired options and the archive is updated incrementally
-1 - remove message body
+expired options and the archive is updated incrementally
+1 - remove message body
2 - remove message body for deleted messages, leave expired -messages
-3 - leave all messages
+messages
+3 - leave all messages
Deleted and expired messages are removed from the index files -regardless of the delete_level selection.
-
-delete_level = 1
-
-
delete_incremental = [ 0 | 1 ]
-

+regardless of the delete_level selection.
+
+delete_level = 1
+
delete_incremental = [ 0 | 1 ]
+

If this option is enabled, hypermail will perform deletions on old messages when run in incremental mode (according to the other delete configuration options). Note that depending on your @@ -1458,124 +1462,124 @@

System miscellaneous

markup, there may be memory and parsing issues, specifically when there are non-deleted replies to a deleted message. If this option is disabled, deleted messages will only be removed when rebuilding -the whole archive.
-
-delete_incremental = 0 (enabled by default)
-
-
htmlmessage_edited = [HTML markup string]
-

-Custom markup to use in the body of manually edited message -when delete_level is equal or superior to 2.
-
-htmlmessage_edited = <div class="edited"><img -src="http://example.org/Mail/edited.png" alt="original message +the whole archive.
+
+delete_incremental = 0 (enabled by default)
+
htmlmessage_edited = [HTML markup string]
+

+Custom markup to use in the body of manually edited messages when +delete_level is greater than or equal to 2.
+
+htmlmessage_edited = <div class="edited"><img +src="https://example.org/Mail/edited.png" alt="original message edited" /> <p class="editedmmes">The originally received message was edited by the archive maintainer.</p><p class="spamfooter">The editing of this email is consistent with -<a href="http://example.org/Mail/">example.org's Mailing List -and Archive Usage Policy.</a></p></div>
+<a href="https://example.org/Mail/">example.org's Mailing List +and Archive Usage Policy.</a></p></div> (disabled by default)
-
-
htmlmessage_deleted_other = [HTML markup -string]
-

-Custom markup to use in the body of deleted messages (by -reasons other than spam) when delete_level is equal or -superior to 2.
-
-htmlmessage_deleted_spam = <div class="deleted"><img -src="http://example.org/Mail/deleted.png" alt="email removed" /> +
htmlmessage_deleted_other = [HTML markup +string]
+

+Custom markup to use in the body of deleted messages (for reasons +other than spam) when delete_level is greater than or equal to +2.
+
+htmlmessage_deleted_other = <div class="deleted"><img +src="https://example.org/Mail/deleted.png" alt="email removed" /> <p class="deletedmmes">This message was removed from our mail archives by the archive maintainer.</p><p class="deletedfooter">The removal of this email is consistent -with <a href="http://example.org/Mail/">example.org's Mailing -List and Archive Usage Policy.</a></p></div>
+with <a href="https://example.org/Mail/">example.org's Mailing +List and Archive Usage Policy.</a></p></div> (disabled by default)
-
-
htmlmessage_deleted_spam = [HTML markup -string]
-

+
htmlmessage_deleted_spam = [HTML markup +string]
+

Custom markup to use in the body of deleted messages (by spam -reasons) when delete_level is equal or superior to 2.
-
-htmlmessage_deleted_spam = <div class="spam"><img -src="http://example.org/Mail/noUCE.png" alt="unsolicited bulk email +reasons) when delete_level is equal or superior to +2.
+
+htmlmessage_deleted_spam = <div class="spam"><img +src="https://example.org/Mail/noUCE.png" alt="unsolicited bulk email removed" /> <p class="spammes">This message was determined to be unsolicited bulk email and has been removed from our archives.</p><p class="spamfooter">The removal of this email is consistent with <a -href="http://example.org/Mail/">example.org's Mailing List and -Archive Usage Policy.</a></p></div>
(disabled +href="https://example.org/Mail/">example.org's Mailing List and +Archive Usage Policy.</a></p></div> (disabled by default)
-
-
progress = [ 0 | 1 | 2 ]
+
progress = [ 0 | 1 | 2 ]
Set to 1 or 2 to show progress as Hypermail works. Set to 0 for silent operation. Output goes to standard output. Set to 1, progress information relating to attachments creation is overwritten for each new attachment. Set to 2, attachment creation information is listed individually with the number of the message -the attachments relates to.
-
-progress = 0
-
-
warn_suppressions = [ 0 | 1 ]
+the attachments relate to.
+
+progress = 0 +
warn_deprecated_options = [ 0 | 1 ]
+
+Set to 1 if you want hypermail to display warning messages when you +are using configuration options that are deprecated or planned for deprecation. +Set it to 0 to hide those warnings. These warnings are written to stdout.
+This option is enabled by default.
+
+warn_deprecated_options = 0 +
+
warn_suppressions = [ 0 | 1 ]
Set this to 1 to get warnings (on stdout) about messages that are not converted because they are missing a msgid (if require_msgids is On) or because one of the following options suppressed it: deleted expires delete_msgnum filter_out -filter_require filter_out_full_body filter_require_full_body.
-
-warn_suppressions = 1
-
-
uselock = [ 0 | 1 ]
+filter_require filter_out_full_body filter_require_full_body.
+
+warn_suppressions = 1 +
uselock = [ 0 | 1 ]
Controls whether to use hypermail's built-in locking mechanism. By default, this option is set to 1. Set it to 0 if you have an external locking mechanism, like, -for example, when using procmail or smartlist.
-
-uselock = 0
-
-
locktime = number-of-seconds
+for example, when using procmail or smartlist.
+
+uselock = 0 +
locktime = number-of-seconds
The number of seconds that a lock should be honored when -processing inbound messages before it is overridden.
-
-locktime = 3600
-
-
base_url = url-of-main-archive-directory
+processing inbound messages before it is overridden.
+
+locktime = 3600 +
base_url = url-of-main-archive-directory
The url of the archive's main directory. This is needed when the latest_folder option is used and the folder_by_date makes -directories more than one level deep (e.g. with '%y/%m').
-
-base_url = http://www.hypermail-project.org/archive/
-
-
report_new_folder = [ 0 | 1 ]
+directories more than one level deep (e.g. with '%y/%m').
+
+base_url = https://www.hypermail-project.org/archive/ +
report_new_folder = [ 0 | 1 ]
Set this to On to have it print (on stdout) the names of any new directories created pursuant to the folder_by_date or msgsperfolder option, or the initial creation of the archive. It will print the full path if that is what you use to specify the archive directory. Does not print anything when attachment or -metadata directories are created.
-
-report_new_folder = 0
-
-
report_new_file = [ 0 | 1 ]
+metadata directories are created.
+
+report_new_folder = 0 +
report_new_file = [ 0 | 1 ]
Set this to On to have it print (on stdout) the names of any -new files created for new messages. It will print the full path if -that is what you use to specify the archive directory.
-report_new_file = 0
+ new files created for new messages. It will print the full path if + that is what you use to specify the archive directory.
+ report_new_file = 0 -
-See Also +
+
+
diff --git a/docs/hr.yellow.png b/docs/hr.yellow.png
deleted file mode 100644
index af299993..00000000
Binary files a/docs/hr.yellow.png and /dev/null differ
diff --git a/docs/hypermail-doc.css b/docs/hypermail-doc.css
new file mode 100644
index 00000000..cff56a0b
--- /dev/null
+++ b/docs/hypermail-doc.css
@@ -0,0 +1,81 @@
+>/*
+** This program and library is free software; you can redistribute it and/or
+** modify it under the terms of the GNU (Library) General Public License
+** as published by the Free Software Foundation; either version 2
+** of the License, or any later version.
+**
+** This program is distributed in the hope that it will be useful,
+** but WITHOUT ANY WARRANTY; without even the implied warranty of
+** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+** GNU (Library) General Public License for more details.
+**
+** You should have received a copy of the GNU (Library) General Public License
+** along with this program; if not, write to the Free Software
+** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
+**/
+
+/*
+** stylesheet for hypermail HTML documentation
+**
+** (extracted from what was previously embedded
+** in the HTML files)
+**
+** Last revised: 22/May/2023
+**/
+
+body {
+  background-color: #FFFFFF;
+  color: #000000;
+  margin: 20px;
+  line-height: 1.2;
+}
+
+h1 {
+  text-align: center;
+}
+
+img {
+  vertical-align: middle;
+}
+
+hr {
+  /* width: 400px; */
+  border: 1px solid black;
+}
+
+hr.centered {
+  width: 55%;
+  text-align: center;
+}
+
+dt {
+  font-weight: bold;
+  padding-bottom: 4px;
+}
+
+dd {
+  padding-bottom: 6px;
+}
+
+main {
+  border-bottom: 2px solid black;
+}
+
+.deprecated {
+  text-decoration: line-through;
+}
+
+/* rules from the faq */
+
+/*
+  body {
+    background-image: url(stars.png);
+    background-color: #000000;
+    color: #FFFFFF;
+  }
+  :link { color: #998800 }
+  :visited { color: #998800 }
+  :active { color: #FFFFFF }
+  h1.c1 {text-align: center}
+  /*]]>*/
+*/
diff --git a/docs/hypermail-faq.html b/docs/hypermail-faq.html
deleted file mode 100644
index a068b588..00000000
--- a/docs/hypermail-faq.html
+++ /dev/null
@@ -1,513 +0,0 @@
- - - -Hypermail Frequently Asked Questions - - - - - -
-

Hypermail Frequently Asked Questions

-

----

- -This is the beginning of the Hypermail FAQ. Don't dispair that there is -little here. That will change shortly. -

-

Table of Contents

-
    -
  1. Why is the License Different ? -
  2. Will hypermail run on my system ? -
  3. What Happened to EIT ? -
  4. Where in the World is Kevin Hughes ? -
  5. What is the latest version of Hypermail ? -
  6. Where can I get the latest version of Hypermail ? -
  7. How can I split the archives into months ? -
  8. How do I change the font on the pages ? -
  9. What is HM_MAILTO used for ? -
  10. Can I send a message to the list from the web archive ? -
  11. Can I build and run this on Windows 98/2000/NT ? -
  12. How can I remove a listserver subject prefix ? -
  13. Which configuration file should I use ? -
  14. Why is the downloaded file name different when I download hypermail.tar.gz ? -
  15. Can I throttle hypermail's CPU usage ? -
  16. How can I make my archive searchable ? -
  17. How does hypermail decide whether messages are in the same thread? -
  18. How can I have multiple mailboxes produce one archive? -
  19. I have received an email with attachment that needs octet-stream. What should I do? -
  20. Why is Hypermail ignoring some messages? -
  21. Bogus dates are causing some emails to be put in a strange folder. Is there an easy way around this? -
- -

----

-

1. Why is the License Different ?

-

-Hewlett-Packard (who is now the legal owner of Hypermail, since EIT was -bought by VeriFone, which was bought by Hewlett-Packard) has put it -under the GNU license, a widely used "free software" license. This means -that you can use and modify the source code as long as you make your changes -publically available and do not charge for their use (although, under GNU, -you can charge for their distribution). More details are available at -http://www.fsf.org/ and -http://www.fsf.org/copyleft/gpl.html. - -

----

- -

2. Will hypermail run on my system ?

-

-Hypermail can compile and run as-is on most of the more popular UNIX systems -today. If you're not sure if -hypermail will run on your UNIX system, try compiling it and see! -

-Hypermail has been reported to work on MacOSX (X.2.6), with the advice -to use --disable-shared and manually execute make in src/pcre. -

-People have indeed ported hypermail to DOS, Windows 95, and Windows NT (but -not Java ...yet). Installation advice for some Windows systems is available at -win32.html. - -

----

-

-

3. What Happened to EIT ?

-

-A very old and established government contractor company called Electronic -Instrumentation and Technology Inc. made legal moves to obtain the eit.com -domain. Since VeriFone/HP had no interest in keeping EIT, they dissolved it -completely some months ago. This company had a trademark on EIT so the -domain name was given to them. (Thank you InterNic...) - -

----

-

4. Where in the World is Kevin Hughes ?

-

-Kevin has not dropped off the face of the earth but is extremely busy -these days as a Hypermedia Engineer at Veo Systems. - -

- Kevin Hughes -
kev@kevcom.com -
kevinh@veosys.com -
(650) 858-7710 (office number) -
(650) 858-4925 (fax) -
www.veosystems.com -
- -

----

-

-

5. What is the latest version of Hypermail ?

-

-The latest stable version of Hypermail is Version 2.1.9. It is available from -http://www.hypermail-project.org - -

----

-

-

6. Where can I get the latest version of Hypermail ?

-

-The latest version of Hypermail is available from the Hypermail Development -Center at - -SourceForge -and/or at -

-http://www.hypermail-project.org/ -

- -It and past versions are also available via FTP from - -

ftp://ftp.hypermail.org/hypermail/

-

-The cvs server at :pserver:cvs@cvs.hypermail.org:/CVS -usually has a more recent but less tested version. - -

----

-

-

7. How can I split the archives into months ?

-

-I have a fat archive that I'd like to split up by month, like I see so many -others doing. Is there a description somewhere of a procedure to follow, -or will I just have to think it through? -

-If you are using version 2.1.0 or later and either have gdbm or don't -do incremental updates, see the - -folder_by_date - -option. The simple way to use this is to add these to your .hmrc: -

-folder_by_date="%y%m" -
usegdbm = 1 -
-and regenerate your archive from its mbox file. -

-An older alternative is a set of scripts in a directory -named archive/ that has tools to do just what -you want to do. -

-If you want index files split by month but don't need to split the archive -into multiple directories, try adding "monthly_index=1" in your -config file (usually ~/.hmrc) (available in version 2.1). A summary.html -file will provide links to all the monthly indices. -This is probably appropriate for archives with a few thousand messages, -but for larger archives I recommend splitting into multiple directories. -

- There's a script at -http://users.netrus.net/troc/perl.html -called mms (monthly mail splitter) which has also been reported useful. - -

----

-

-

8. How do I change the font on the pages ?

-

-What I'd like to do is change the font of the archives pages. I tried -doing this by adding a <FONT FACE=...> tag in the header.hyp file and a -<FONT> tag in the footer file, but it didn't work. Is there something -in the program itself that's preventing me from making this change? -

-Yes and no. Let me guess... You have hm_usetable = 1. -The code for tables does not inherit the FONT values, so they need -to be set in the <TH..> tags. If tables are not used, it works as -expected. -

-To test it put a - -

-<FONT SIZE="1" FACE="GENEVA,ARIAL,HELVETICA"> -
- -in the test-index.hyp and test-msg.hyp. In test-footer.hyp put - -
-</FONT> -
-
-

-With hm_usetable = 1 in the test.rc file, run -"hypermail -c test.rc -m testmail". Everything -between the "menus" is in the right FONT but the menus are not. -

-Then edit the test.rc file and set hm_usetable = 0. -Next remove the existing testdir and rerun hypermail again. This time -the FONT works as expected. -

-Is this what is happening to you ? If so the code will need to be modified. - -

----

-

-

9. What is HM_MAILTO used for ?

-

- -I've enabled this option in .hyprc (the configuration file to which -the pipe script for my archive points). Unlike the other options, this one -does not create a link to mailto:admin@domain.com as I would expect. Any -ideas? -

-HM_MAILTO has a couple of different uses. One is to trigger the insertion of -the <LINK REV=made HREF=mailto:...> header in the HTML sources. -This is most useful with ASCII browsers such as lynx. -

- -

- hm_mailto = [ email-address | NONE ] -
# -
# The address of the contact point that is put in the HTML header line -
# <LINK REV=made HREF=mailto:hm_mailto> -
# -
# The <LINK...> header can be disabled by default by setting -
# mailto to NONE. -
- -

-It can also be used in a hypermail page template file since it resolves -to %m. -

-For example, your footer file might look like... - -

<P ALIGN=CENTER><IMG SRC="/images/bar.png" WIDTH="400" HEIGHT="4" ALT="---------"></P> -
<ADDRESS> -
<EM> -
<SMALL> -
This archive was generated by %u on %g -
<P ALIGN=CENTER> -
Send administrative comments to <A HREF="mailto:%m">%m</A> -
</P> -
</SMALL> -
</EM> -
</ADDRESS> -
</BODY> -
</HTML> -
-

-In this case the %m is expanded to admin@domain.com. - -

----

-

-

10. Can I send a message to the list from the web archive ?

-

-I would like to add a link to the menu that says "Send a Message to the -List." I read through the documentation on your website, and I think I -know what I have to do. However, I cannot find an example of what a .hyp -template file should look like. -

-That's what "Respond" on the menubar does. It allows a person -to reply to the existing message, with the reply sent to the list address. -And "mail a new topic" on the menubar allows a user to send -a new message to the list. -

-To enable this feature set -

- -
hm_hmail = listaddr@your-site.domain -
-

----

-

11. Can I build and run this on Windows 98/2000/NT ?

-

-I would like to use Hypermail on my Windows box. Can I ? If so how do -I build the latest version ? -

-There is a complete description on how to build hypermail on a Windows - system at win32.html. This was written by -Bob Crispen <bob.crispen@boeing.com>. (Thanks Bob!) - - -

----

-

12. How can I remove a listserver subject prefix ?

-

-The Subject index page does not seem to do the right sorting. My list -has a "subject prefix" so people can filter their inbound mail better. Is -that confusing the sorting ? -

-Yes. Hypermail functions that deal with replies look for the Re: and -return a pointer to the subject line after the "Re: ". -The listserver software is prefixing the subject line contents with -[subject-prefix]. For example: -

- -Initial message:
-

- Subject: [subject-prefix] the subject of the message -
-

-Replies:
-

- Subject: [subject-prefix] Re: the subject of the message -
-
-

-The best solution is to use the directive -

- stripsubject = [subject-prefix] -
-in the config file for that list. Then [subject-prefix] is stripped before -the Reply processing occurs and the proper things happen. This is the proper -fix since hypermail would need to know to strip the [subject-prefix] from -the initial messages to get it right. - -

----

-

13. Which configuration file should I use ?

-

-Which configuration file should I use ? Is it the hypermail.rc file -in the Configs directory? I'm trying to edit the appearance of pages created -by hypermail, but I think I'm trying to edit the wrong config file. -

-With one of the later versions you can use the -v option to generate -a config file that you can use. There is not just one config file; -the files that are included are examples. -

- -

- $ hypermail -v > some-file.rc -
- - -Edit the some-file.rc and set the values to what you want. -

-Then run hypermail using the config file by: - - -

- $ hypermail -c some-file.rc -
- - -

----

-

14. Why is the downloaded file name different when I download hypermail.tar.gz ?

-

- -The source is available from -

-www.hypermail.org/hypermail.tar.gz -
-However, when downloading through the IE v4 browser on an NT -PC, you get a file called hypermail_tar.tar. - -

-Depending on your browser MIME type setup and support applications, -the filename downloaded might be hypermail.tar, hypermail.tar.gz, etc. -In any case, you will be informed as to the filename when you actually -download it onto a Windows/NT PC. Use that name from that point forward. - -

----

-

15. Can I throttle hypermail's CPU usage ?

-

- -I'm looking for a good way to throttle hypermail. When it runs it is -taking over both CPU's on my system. Is there a way to have it limit -itself? - -

-There is currently no "nice" call in hypermail to limit itself. Maybe -we can consider it in the future. You might run it as a subcommand to -nice. For example in an aliases file: - -

- "|nice someval 'hypermail command line'" -
- -I've NEVER tried this so... ;) -

-You might also be looking at a very large archive with lots of messages -being the cause. If so then try breaking it up into smaller archives -such as monthly archives. Tools are available for that in the latest -release. -

-And if the list is a high traffic list, consider not running it from a -sendmail alias and instead running it from cron. (To ensure it does not -run unnecessarily, consider using 'make' to check dependencies on the -inbound mailbox and the archive itself. If the mailbox is newer, then -run the hypermail command. If not, then no new messages have been received, -so don't bother running it.) - -
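The dependency check described above (run hypermail only when the mailbox is newer than the archive) can also be sketched without 'make'. The following is a minimal illustration in Python, not something shipped with hypermail; the function name and the choice of the archive's index file as the thing to compare against are assumptions.

```python
import os


def archive_needs_update(mailbox_path, index_path):
    """Return True if the mailbox changed after the archive index was last written."""
    if not os.path.exists(index_path):
        # No archive yet, so hypermail has to run at least once.
        return True
    # Same test 'make' would do: is the source newer than the target?
    return os.path.getmtime(mailbox_path) > os.path.getmtime(index_path)
```

A cron job could call this first and invoke the hypermail command only when it returns True.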

-If you are using the linkquotes option and incremental update -(-u option), add a searchbackmsgnum line to your config file. -It controls the tradeoff between speed and -the reliability of finding the right source for quoted text. -Try to set it to the largest number of messages between a -message and the final direct reply to that message. - -

----

-

16. How can I make my archive searchable ?

-

-Where can I find a program that provides an easy way of adding a -"search this archive" feature to my hypermail archive? - -

-These seem to be the most popular choices: -

-Swish-E comes with a script (index_hypermail.pl) that is customized for -indexing Hypermail archives. index_hypermail.txt -has a description of how to use it. -

- You can also use search engines by putting text such as this on -your web page (although this search produces results from an entire site, -not just the email archive): -
<form action=http://www.google.com/search> -
<input type=text name=as_q> -
<input type=hidden name=as_sitesearch value=hypermail-project.org> -
<input type=submit value="Search Hypermail"> -
</form> -

-You can probably get more complete (but apparently not recent) info at: -Search Engine Software For Your Web Site. - -

----

-

17. How does hypermail decide whether messages are in the same thread?

-

- It uses the In-Reply-To: header if that is available. If not, it uses -the References: header if that is available. If these are not available, -it looks for previous messages with the same subject header. Matches based -on the subject are listed as "maybe" replies. -

- If the linkquotes option is used, it will also search the message bodies -of the previous searchbackmsgnum (default = 500) messages for text that -looks like it is being quoted in the current message. If it finds one or more -prior messages with such text matches, it will treat the one with the longest -match as the message being replied to. The exact algorithms for deciding what -is quoted text are a bit complex. These matches override the In-Reply-To: -and References: info (this rule may deserve further thought). - -
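The fallback order described above (In-Reply-To:, then References:, then matching subject) can be sketched in Python. This is only an illustration, not hypermail's actual C code: the dictionary keys, the function names, trying the last References: entry first, and picking the most recent subject match are all assumptions, and the linkquotes body search is left out.

```python
def strip_reply_prefix(subject):
    """Drop any leading "Re:" markers so replies compare equal to the original."""
    text = subject.strip()
    while text[:3].lower() == "re:":
        text = text[3:].strip()
    return text


def find_parent(message, earlier):
    """Return (parent, kind), where kind is "reply", "maybe" or None.

    Messages are hypothetical dicts with the keys "message_id" and "subject",
    and optionally "in_reply_to" and "references" (a list of message ids).
    """
    by_id = {m["message_id"]: m for m in earlier}

    # 1. In-Reply-To: wins when it names a message already in the archive.
    parent = by_id.get(message.get("in_reply_to"))
    if parent is not None:
        return parent, "reply"

    # 2. Otherwise try References:, last id first (an assumption).
    for ref in reversed(message.get("references", [])):
        if ref in by_id:
            return by_id[ref], "reply"

    # 3. Last resort: same subject, reported as a "maybe" reply.
    wanted = strip_reply_prefix(message["subject"])
    for candidate in reversed(earlier):
        if strip_reply_prefix(candidate["subject"]) == wanted:
            return candidate, "maybe"
    return None, None
```

Keeping the "maybe" label separate from "reply" mirrors the way subject-only matches are listed differently in the archive.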

----

-

18. How can I have multiple mailboxes produce one archive?

-

- One simple way is to combine them into one mailbox, and send that to -hypermail. Something like

"cat file1 file2 file3 > combined"
-should do that. -

- Another way is to invoke hypermail several times, using the -u flag for -any mailboxes that you want added after the html archive has been started: -

- hypermail -m file1
- hypermail -u -m file2
- hypermail -u -m file3
-
-

- Do not try to send hypermail more than one mailbox at a time, or send -it a second mailbox without using the -u flag or the increment=1 or -increment=-1 option. - -

----

-

19. I have received an email with attachment that needs octet-stream. What should I do?

-

-No attachment "needs" octet-stream. octet-stream is a label that describes -the attachment as being "a stream of octets". What is that you say? Well, the -application didn't know any more specifics and thus it identified it as well -as it could. It knows it is a stream of octets, nothing else. -

-So, an attachment coming as "octet-stream" can be pretty much ANYTHING. You -have no idea, and neither does anyone else. The only way to figure it out is to -download the file and see for yourself, to ask the person who mailed it, or to -hope that the mail it came with describes what the attachment was about. -

-"octet-stream" could just as well be named "I haven't got the slightest idea -what this is, but I know it is built up with a series of bytes". - -

----

-

20. Why is Hypermail ignoring some messages?

-

One possibility is that those messages have a header saying -

-X-No-Archive: yes -
-which seems to indicate that the author doesn't want them included in any -archive. This is controlled by a configure option which defaults to: -
-deleted = "X-Hypermail-Deleted X-No-Archive" -
-To get Hypermail to treat these as normal messages, add this to your -config file (which is ~/.hmrc by default) or alter the "deleted" line -in your existing config file to: -
-deleted = "X-Hypermail-Deleted" -
- -

----

-

21. Bogus dates are causing some emails to be put in a strange folder. Is there an easy way around this?

-

-If the bad dates were caused by a computer with the date set absurdly, try -running -mailbox_date_trimmer.py, available in Hypermail's contrib directory. -

-If the bad dates are in a format that Hypermail doesn't understand, then -you will probably need to edit them manually. - -

----

-
-
-
- -Do you have a Hypermail related question you'd like to see listed here ? -If so send mail to the Hypermail mailing list -hypermail@hypermail-project.org -  (preferably with the answers). In order to minimize spam on the list, you must subscribe to the list (at least temporarily) in order to send mail to it. -You may subscribe to the list by sending a message with the word -"subscribe" in the Subject: field to hypermail-request@hypermail-project.org. - -
-
-

 

- - diff --git a/docs/hypermail.1 b/docs/hypermail.1 index a1b3ef54..c40959ab 100644 --- a/docs/hypermail.1 +++ b/docs/hypermail.1 @@ -146,14 +146,6 @@ generates the html files. This is dependent on local needs. Do not put a '.' in the value. It would result in "file..html", probably not what you want. .TP -.B \-t - This will tell Hypermail to generate an -index menu at the top and bottom of each page in a table format. -.TP -.B \-T -This will tell Hypermail to generate -a message index Subject/Author/Date listings using a table format. -.TP .B \-u This option tells Hypermail to add message(s) to the end of the existing HTML file archive and integrate them into it by links and cross-references. All archive index files will be regenerated to include the new message. Hypermail used to require that you only send it one message at a time when @@ -554,8 +546,6 @@ recommended for archives with over a few hundred messages. Setting this greater than 1 will produce multiple levels of files for each thread whose replies are nested by more than 1 level, but that is rarely useful. This option is currently disabled -if the indextable option is turned on, and probably needs to -be less than thrdlevels. .SH BUGS @@ -572,15 +562,15 @@ was originally designed and developed by Tom Gruber .RI for Enterprise Integration Technologies (EIT) in Common Lisp. It was later rewritten in C by Kevin Hughes .RI -while at EIT. Kevin passed on\-going development and support for Hypermail +while at EIT. Kevin passed development and support for Hypermail to Kent Landfield .RI . 
.LP -The latest documentation can usually be found at - .B http://www.hypermail.org/ - but you might also want to check the cvs repository which is the first -place that changes become available: - .B http://cvs.hypermail.org/cgi-bin/cvsweb.cgi/hypermail/docs/ +As of today, current on-going development and support for Hypermail are done directly on its github repository +.B https://github.com/hypermail-project/hypermail +.LP +The latest documentation can be found at +.B https://github.com/hypermail-project/hypermail/docs .LP .SH CREDITS .LP diff --git a/docs/hypermail.css b/docs/hypermail.css new file mode 100644 index 00000000..ff07db4e --- /dev/null +++ b/docs/hypermail.css @@ -0,0 +1,383 @@ +/* +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 2 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +**/ + +/* +** Default stylesheet for hypermail archives +** +** Last revised: 07/July/2023 +**/ + +/* Use this CSS rule to choose the background color + for your archive */ +html { + color: black; + background: #e2edfe; + /** you may also try #FFF6E0, #FFFFF6 or #FFEEC2. 
+ ** For other color combinations, make sure your choices pass + ** the WAI contrast criteria + ** by using https://webaim.org/resources/contrastchecker/ */ +} + +/* wai-menu.css */ + +/* accessible breadcrumbs */ +nav.breadcrumb ul { padding: 0; } +nav.breadcrumb ul li { display: inline;} +nav.breadcrumb ul li+li::before { content: " > "; } + + +/* accessible horizontal menus */ +ul.hmenu_container { margin-left: 1em; padding-left: 0; } +ul.hmenu_container > li { list-style-type: none; } +ul.hmenu { font-style: normal; padding: 0; display: inline; } +ul.hmenu li { display: inline; } + +ul.hmenu li { + border-right: 1.2px solid black; + padding-right: 0.4em; +} + +ul.hmenu li:last-child { + border-right: none; + padding-right: 0; +} + +/* replacement for dfn */ +.heading { font-weight: bold; } + +h2.heading, h2.theading { + font-size: inherit; + font-weight: normal; + margin-top: 0.2em; + margin-bottom: 0; + display: list-item; +} + +h2.theading:before { + width:10px; + height:10px; + border-radius:50%; + background: #b83b3b; + display:inline-block; +} + +h2.theading + ul { + list-style-type: disc; +} + +/* default settings for narrow screens (smaller headings and padding) */ +/* height and min-height: 100% are to make backgrounds fill the window */ +html { + height: 100%; +} + +body { + font-family: sans-serif; + margin: 0; + padding: 0.5em 1em; + line-height: 1.4; + min-height: 100%; +} +h1 { + font-size: 1.25em; +} +h2 { + font-size: 1.125em; +} +h3 { + font-size: 1em; +} +.messages-list { + padding-top: 1em; + margin-left: 1em; +} +.messages-list ul { + margin: 0; + padding-left: 1em; + list-style-type: disc; +} + +/* bigger headings and padding on wider screens (desktops, tablets) */ +@media screen and (min-width: 48em) { + body { + padding: 0.5em 2.5em; + } + h1 { + font-size: 1.5em; + } + h2 { + font-size: 1.25em; + } + h3 { + font-size: 1.125em; + } + ul.hmenu_container { + padding-left: 1.5em; + } + .messages-list { + padding-top: 1.5em; + margin-left: 
2.5em; + } + .messages-list ul { + padding-left: 2em; + } +} + +ul.footer_hmenu { padding: 0; display: inline; } +ul.footer_hmenu::after { content: ""; clear: both; display: block; } +ul.footer_hmenu > li { list-style-type: none; } +ul.footer_hmenu li { display: inline; } +ul.footer_hmenu li.footer_admin { float: right; } + +/* make links visible thru focus or hover */ +a:link { + text-decoration-thickness: 1px; + text-underline-offset: 2px; +} + +a:visited { + color:#529; +} + +a:focus { + outline: 2px solid; + outline-offset: 1px; + text-decoration: underline; + text-decoration-thickness: 1px; + outline-color:#005A9C; + cursor :pointer; +} + +a:hover { + color:#930; + text-decoration: underline; + text-decoration-thickness: 2px; +} + +a:active { + /* outline: 0; */ +} + + +/* mainindex-wai.css */ + +table { margin: 1em; } + +/* Table head -- should be w3c blue */ +thead tr { color: #FFFFFF; background-color: rgb(0,90,156); } + +tbody tr { font-family: sans-serif; } + +.head img {float:left;} +.head p {clear: left;} +/* th {font-style:italic;} */ +nav ul { list-style-type: none; } + +table { border: 0px; border-spacing: 3px; } +td { padding: 2px; } +/* accessible horizontal menus */ +.cell_message { text-align: center; } +.cell_period { text-align: right; } + +a.sub:link { color: green; } +a.unsub:link {color: #b22222; } + +.header .important { padding: 0.5em 1em; } + +.visually_hidden { + border: 0; + clip: rect(0 0 0 0); + height: 1px; + margin: -1px; + overflow: hidden; + padding: 0; + position: absolute; + width: 1px; +} + +.restricted:after { + content: " (restricted)"; +} + +body.mainindex .head { + border-bottom: none; +} + + +/* messagelist-wai.css */ + +dfn {font-weight: bold;} +nav ul {list-style:none;} +.messages-list { + border-bottom:thin solid black; +} + + +/* message.css */ + + +/* Leave message itself white but rest of metadata very pale grey. +** This makes it more legible. 
+** it also by analogy points out that the content of the message +** is someone else's, not W3C material necessarily -- we just run the list +*/ + +pre { + color: black; + white-space: pre-wrap; + overflow-wrap: break-word; +} + + +/* want to be able to identify the author and style it +** but can't +*/ + +dfn {font-weight: bold;} +/* .mail, .head { border-bottom:1px solid black;} */ +map ul {list-style:none;} +#message-id { font-size: small;} +address { font-style:inherit ;} + +@media print { + #upper, #navbarfoot, #navbar { + display:none; + } +} + +/* for edited messages */ +div.edit { + background: #CCFF99; + padding: 0.25em 2em 0.25em 1em; + margin-top: 0.25em; +} +p.editmes{ + font-style: italic; + font-size: small; + padding: 0; +} +p.editfooter{ + font-size: small; + padding: 0; +} + +/* for deleted messages */ +div.spam { + background: #aaffaa; + padding: 1em 2em; +} +p.spammes{ + font-size: x-large; + font-weight: bold; +} +p.spamfooter{ + font-size: small; +} + + +/* updates added by Gerald 27 Apr 2023 */ + +.message-forwarded { + padding: 0 1em; + border: 2px solid #888; + border-radius: 0.5em; +} + +.message-body-part { + margin: 1em 0; +} +.message-forwarded > h2 { + color: #303030; + padding: 0.25rem 1rem 0.1em; + background: #DEE7CA; + margin: 0 -1rem 0.5rem; + border-radius: 0.3rem 0.3rem 0 0; +} + +.message-forwarded ul.headers { + margin: 0; +} + +.message-forwarded pre { + padding: 0; + margin-bottom: 0.25em; +} + +section section { + text-indent: 0; +} + +ul.headers { + list-style: none; + padding-left: 0em; +} + +.attachment-links { + background: aliceblue; + padding: 0.01em 1em; +} + +section.attachment-links ul { + padding-left: 1.5em; +} + +.attachment-links img { + padding: 0.5em 0; +} + +img { + max-width: 100%; +} + +/* experimental: */ + +.mail { + background: white; + /* + padding-left: 0.75em; + padding-top: 0.025em; + */ + padding: 0.025em 0.75em; +} + +/* the following two rules apply when you're using the + yearly_index config option 
*/ +.summary-year { + padding-top: 1em; + padding-bottom: 1em; +} + +main.summary-year th.cell_period + th, +main.summary-year td.cell_message + td { + padding-left: 0.5em; +} + +/* the following rules are used to set the style of quotes + when you're using the 'showhtml' config option with + values 1 or 2 */ + +/* use this option to specify the style for a quote line */ +.quote { +/* enable this one to change the style to italic to emulate + the deprecated iquote config option */ + font-style: italic; +} + +/* add more entries or separate this rule into + multiple ones if you want a specific style for + quotes depending on their level */ +.quotelev1, .quotelev2, .quotelev3, .quotelev5, .quotelev6 { +} diff --git a/docs/hypermail.html b/docs/hypermail.html index 3f3022c6..a42cbc19 100644 --- a/docs/hypermail.html +++ b/docs/hypermail.html @@ -1,77 +1,91 @@ - - - -Hypermail Documentation - - - -

  Hypermail

- -
- -

-Contents: -

-

- -


- -

What is Hypermail?

- -Hypermail is a program that takes a file of mail messages in UNIX mailbox format and generates a set of cross-referenced HTML documents. Each file that is created represents a separate message in the mail archive and contains links to other articles, so that the entire archive can be browsed in a number of ways by following links. Archives generated by Hypermail can be incrementally updated, and Hypermail is set by default to only update archives when changes are detected. -

-Each HTML file that is generated for a message contains (where applicable): -

-

    -
  • the subject of the article, -
  • the name and email address of the sender, -
  • the date the article was sent, -
  • links to the next and previous messages in the archive, -
  • a link to the message the article is in reply to, and -
  • a link to the message next in the current thread. -
-

-In addition, Hypermail will convert references in each message to email addresses and URLs to hyperlinks so they can be selected. Email addresses can be converted to mailto: URLs or links to a CGI mail program. -

-To complement each set of HTML messages, four index files are created which sort the articles by date received, thread, subject, and author. Each entry in these index files is a link to an individual article, and together they provide a bird's-eye view of every archived message. -

-Hypermail was originally developed and designed by Tom Gruber for Enterprise Integration Technologies (EIT) in Common Lisp. It was later rewritten in C by Kevin Hughes while at EIT. Hypermail is now being maintained by Peter McCluskey <pcm@rahul.net>. -

- -To see what Hypermail can do, take a look at these Hypermail-produced archives: -

-

-

- -


- -

Usage

- -

-

+
+
+    
+    		
+	Hypermail Documentation
+		
+	
+    
+    
+	
+

hypermail logo  Hypermail

+
+ + +
+
+
+

What is Hypermail?

+ Hypermail is a program that takes a file of mail + messages in UNIX mailbox format and generates a set of + cross-referenced HTML documents. Each file that is created + represents a separate message in the mail archive and contains + links to other articles, so that the entire archive can be browsed + in a number of ways by following links. Archives generated by + Hypermail can be incrementally updated, and Hypermail is set by + default to only update archives when changes are detected. +

Each HTML file that is generated for a message contains (where + applicable):

+
    +
  • the subject of the article,
  • +
  • the name and email address of the sender,
  • +
  • the date the article was sent,
  • +
  • links to the next and previous messages in the archive,
  • +
  • a link to the message the article is in reply to, and
  • +
  • a link to the message next in the current thread.
  • +
+

In addition, Hypermail will convert references in each message + to email addresses and URLs to hyperlinks so they can be selected. + Email addresses can be converted to mailto: URLs + or links to a CGI mail program.

+

To complement each set of HTML messages, four index files are + created which sort the articles by date received, thread, subject, + and author. Each entry in these index files is a link to an + individual article, and together they provide a bird's-eye view of every archived + message.

+ +

To see what Hypermail can do, take a look at these + Hypermail-produced archives:

+ +
+

Usage

+
 Usage: hypermail [options]
 
 Options:
@@ -89,8 +103,6 @@ 

Usage

-o keyword=val: Set config item -p : Show progress -s htmlsuffix : HTML file suffix (.html, .htm, ..) - -t : Use Tables - -T : Use index tables -u : Append all input messages -v : Show configuration variables only -V : Show version information and exit @@ -100,42 +112,46 @@

Usage

-1 : Read only one mail from input -L lang : Specify language to use (de en es fi fr is pl pt sv no el gr ru it ) -
-

-Using the flags -h, or -? with Hypermail will display this usage summary. -

- -


- -

Command-Line Options

- -Input and Output Options:
-
--i, -
-m  "mailbox", -
-d  "directory", -
-c  "file" -
-

- -To tell Hypermail what mailbox to read in, use the -m option. If articles will be sent to Hypermail through standard input, use the -i option. Note that the -m and -i options can't be used together! By default, Hypermail will look for a file called mbox to read its articles in from. -

-The -d option specifies the directory to put the HTML files and index files that are created into. If the directory doesn't exist, a new one will be created with the name that is specified. If the -d option isn't used, Hypermail will look for a directory with the same name as the mailbox or will create one if needed. -

-

+
+

Using the flag -h or -? with +Hypermail will display this usage summary.

+
+

Command-Line Options

+Input and Output +Options:
+
-i,
+-m  "mailbox",
+-d  "directory",
+-c  "file"
+

To tell Hypermail what mailbox to read in, use the +-m option. If articles will be sent to Hypermail +through standard input, use the -i option. Note +that the -m and -i options can't +be used together! By default, Hypermail will look for a file called +mbox to read its articles in from.

+

The -d option specifies the directory to put +the HTML files and index files that are created into. If the +directory doesn't exist, a new one will be created with the name +that is specified. If the -d option isn't used, +Hypermail will look for a directory with the same name as the +mailbox or will create one if needed.

+
    example 1: hypermail -m "wu-ftpd" -d "/wu-ftpd"
    example 2: cat "/var/spool/mail/wu-ftpd" | hypermail -i
-
-

-

    -
  1. This example reads the articles in wu-ftpd and will save the output in the /wu-ftpd directory. -

    -

  2. This reads the file /var/spool/mail/wu-ftpd from standard input and will save the output in a directory called archive in the same directory Hypermail was run from. -
-

-Note that Hypermail can only read messages in the UNIX mailbox format! Such archives are typically RFC 2822 mail messages appended to each other that look similar to this: -

-

+
+
    +
  1. This example reads the articles in wu-ftpd and +will save the output in the /wu-ftpd +directory.
  2. +
  3. This reads the file /var/spool/mail/wu-ftpd +from standard input and will save the output in a directory called +archive in the same directory Hypermail was run +from.
  4. +
+

Note that Hypermail can only read messages in the UNIX mailbox +format! Such archives are typically RFC 2822 mail messages appended +to each other that look similar to this:

+
    From john@foo.com  Mon Jan  1 00:01:30 1994
    Date: Mon, 1 Jan 1994 00:01:15 PDT
    From: john@foo.com
@@ -147,482 +163,517 @@ 

Command-Line Options

From someone.else@foo.com Mon Jan 1 00:02:00 1994 Date: Mon, 1 Jan 1994 00:01:45 PDT ... -
-

-The messages are typically separated by lines in this format: -

-

+
+

The messages are typically separated by lines in this +format:

+
    From wu-ftpd@wugate.wustl.edu  Fri Jul  1 00:18:20 1994
-
-

-The -c option tells Hypermail to read in settings from a configuration file. By default, the program will attempt to read settings from a file called .hmrc in the user's home directory if it exists. -

-In the configuration file, variables are set in the following manner: -

-

- -variable = number -
variable = "string" -
-
- -The complete set of variables that Hypermail recognizes is described in the Configuration Options page. -

- -Archive Interface Options: -

--l  "label", -
-b  "About URL", -
-a  "Other Archives URL" -
-

-The -l option tells Hypermail what to call the archive - the name that is specified will be in the title of the index pages so users know what sort of messages are being archived. -

-The -a option includes a link labelled "Other mail archives" in the index pages to any specified URL. This way users who are looking at the archive have the opportunity to go to pointers to other mail archives. By default, this will be a pointer to the parent directory in which the archive files reside. -

-The -b option includes a link labelled "About this archive" in the index pages to any specified URL. This way users who are looking at the archive have the opportunity to go to information about the archive. -

-

+
+
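As a rough illustration of the separator convention, the sketch below splits mailbox text wherever a line begins with "From " followed by a space. It is not how Hypermail itself parses mailboxes. The name split_mbox and the sample text are made up for this example, and the sketch assumes that no message body contains an unescaped line starting with "From ".

```python
SAMPLE = (
    "From john@foo.com  Mon Jan  1 00:01:30 1994\n"
    "Date: Mon, 1 Jan 1994 00:01:15 PDT\n"
    "From: john@foo.com\n"
    "\n"
    "First message body.\n"
    "\n"
    "From someone.else@foo.com  Mon Jan  1 00:02:00 1994\n"
    "Date: Mon, 1 Jan 1994 00:01:45 PDT\n"
    "\n"
    "Second message body.\n"
)


def split_mbox(text):
    """Split mailbox text into messages on 'From ' separator lines.

    Hypothetical helper: a header such as 'From: john@foo.com' does not
    match, because the separator is 'From' followed by a space.
    """
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            current = [line]          # a separator line starts a new message
            messages.append(current)
        elif current is not None:
            current.append(line)      # header or body line of the current message
    return ["".join(parts) for parts in messages]


messages = split_mbox(SAMPLE)
print(len(messages))
```

Counting the returned messages is a quick way to check that a file really is in mailbox format before handing it to Hypermail.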

The -c option tells Hypermail to read in +settings from a configuration file. By +default, the program will attempt to read settings from a file +called .hmrc in the user's home directory if it +exists.

+

In the configuration file, variables are set in the following +manner:

+
variable = number
+variable = "string"
+ +

The complete set of variables that Hypermail recognizes is +described in the Configuration Options + page.

+ +

To get you up to speed, a sample annotated hypermail configuration file is included in the docs dir.

+ +

Archive Interface +Options:

+
-l  "label",
+-b  "About URL",
+-a  "Other Archives +URL"
+

The -l option tells Hypermail what to call the +archive - the name that is specified will be in the title of the +index pages so users know what sort of messages are being +archived.

+

The -a option includes a link labelled "Other +mail archives" in the index pages to any specified URL. This way +users who are looking at the archive have the opportunity to go to +pointers to other mail archives. By default, this will be a pointer +to the parent directory in which the archive files reside.

+

The -b option includes a link labelled "About +this archive" in the index pages to any specified URL. This way +users who are looking at the archive have the opportunity to go to +information about the archive.

+
    example: hypermail -l "WU-FTPD Development Archives"
             -a "http://www.landfield.com/wu-ftpd/"
             -b "http://www.landfield.com/wu-ftpd/mail-archive/"
-
-

-In the index files for the archive, the above setting will produce something like this: -

-

-(top of page) -

-WU-FTPD Archives -

-

-(list of indexed articles below) -

-

- -Updating Options:
-

--x, -
-u -
-

-The -x option tells Hypermail to explicitly overwrite any previous HTML files that may exist. Use this option only when it is desirable to completely rewrite the entire archive. -

-The -u option tells Hypermail to add message(s) to the end of the existing HTML file archive and integrate them into it by links and cross-references. All archive index files will be regenerated to include the new message. -

- Hypermail used to require that you only send it one message at a time when -using the -u option, but it should now work reasonably when -given mailboxes containing multiple messages. -

-When using the -u option, don't send any messages that -Hypermail has already processed. If you want Hypermail to recognize that -some messages are old messages that shouldn't be added to the archive again, -send it a mailbox with a complete set of messages and avoid the --u option. -

-

+
+

In the index files for the archive, the above setting will +produce something like this:

+
(top of page) +

WU-FTPD Archives

+ +

(list of indexed articles below)

+
+

Updating +Options:

+
-x,
+-u
+

The -x option tells Hypermail to explicitly +overwrite any previous HTML files that may exist. Use this option +only when it is desirable to completely rewrite the entire +archive.

+

The -u option tells Hypermail to add message(s) +to the end of the existing HTML file archive and integrate them +into it by links and cross-references. All archive index files will +be regenerated to include the new message.

+

Hypermail used to require that you only send it one message at a +time when using the -u option, but it should now +work reasonably when given mailboxes containing multiple +messages.

+

When using the -u option, don't send any +messages that Hypermail has already processed. If you want +Hypermail to recognize that some messages are old messages that +shouldn't be added to the archive again, send it a mailbox with a +complete set of messages and avoid the -u +option.

+
    example 1: cat "one.letter" | hypermail -i -u -d "/wu-ftpd/mail-archives"
    example 2: hypermail -u -m "one.letter" -d "/wu-ftpd/mail-archives"
    example 3: hypermail -m "mailbox" -d "/wu-ftpd/mail-archives" -x
    example 4: hypermail -m "mailbox" -d "/wu-ftpd/mail-archives"
-
-

-

    -
  1. This tells Hypermail to take the article it receives from standard input -and integrate it with the archive under the wu-ftpd/mail-archives -directory. If no archive exists, a new one will be created with the specified -letter as the first file of the archive. -

    -

  2. This does the same thing, except that the letter is read in from a file that contains only that letter. -

    -

  3. With these options, Hypermail will read in the articles from -mailbox and write over any existing files in the -wu-ftpd/mail-archives directory if they exist. If no archive -exists, a new one will be created. -

    -

  4. With these options, Hypermail will read in the articles from -mailbox and only write new articles - it will not overwrite -any existing archive files. -
-

-Note that no matter what options are specified, the index files are always -rewritten. The date when Hypermail was last run is included in index pages, -so it's easy to tell when the archive was last updated. -

- -Miscellaneous Options
-

--p -
-v -
-V -
-

-The -p option shows a progress report as Hypermail reads in and writes out messages - the number of files that Hypermail is reading and writing and the file names of the directory and files created are shown. This information is written to standard output. -

-The -v option shows the configuration variables and their values that Hypermail would use if it was run with the same configuration file and command line options. This is useful when starting up a new list or modifying a list configuration file. Once the information is displayed, Hypermail terminates and no actual processing occurs. -

-The -V option prints the Hypermail version information. Once the information is displayed, Hypermail terminates and not actual processing occurs. -

-The -0 option list message numbers that should be deleted -from the html archive. The mbox is not changed. -It is equivalent to the delete_msgnum -option. -

-


- -

Configuration Options

- -Hypermail has many variables that can be set as environment variables or as variables in the specified configuration file. For instance, using the C shell, one could define variables in this manner: -

-

+
+
    +
  1. This tells Hypermail to take the article it receives from +standard input and integrate it with the archive under the +wu-ftpd/mail-archives directory. If no archive +exists, a new one will be created with the specified letter as the +first file of the archive.
  2. +
  3. This does the same thing, except that the letter is read in +from a file that contains only that letter.
  4. +
  5. With these options, Hypermail will read in the articles from +mailbox and write over any existing files in the +wu-ftpd/mail-archives directory if they exist. If +no archive exists, a new one will be created.
  6. +
  7. With these options, Hypermail will read in the articles from +mailbox and only write new articles - it will not +overwrite any existing archive files.
  8. +
+

Note that no matter what options are specified, the index files +are always rewritten. The date when Hypermail was last run is +included in index pages, so it's easy to tell when the archive was +last updated.
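For scripted use, the four example invocations above can be assembled from a small helper. This is a sketch only: the function name hypermail_args and its parameters are hypothetical, and it uses no flags beyond those shown in the examples (-i, -u, -m, -d, -x).

```python
def hypermail_args(directory, mailbox=None, update=False, overwrite=False):
    """Build a hypermail argument list from the flags used in the examples."""
    args = ["hypermail"]
    if mailbox is None:
        args.append("-i")             # no mailbox file: read standard input, as in example 1
    if update:
        args.append("-u")             # add the message(s) to an existing archive
    if mailbox is not None:
        args += ["-m", mailbox]       # read from the named mailbox file
    args += ["-d", directory]         # archive directory
    if overwrite:
        args.append("-x")             # overwrite existing HTML files, as in example 3
    return args


print(hypermail_args("/wu-ftpd/mail-archives", update=True))
print(hypermail_args("/wu-ftpd/mail-archives", mailbox="mailbox", overwrite=True))
```

The resulting list can be handed to subprocess.run on a system where hypermail is installed. Building a list rather than a single string avoids shell quoting problems with labels and paths.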

+

Miscellaneous +Options

+
-p
+-v
+-V
+

The -p option shows a progress report as +Hypermail reads in and writes out messages - the number of files +that Hypermail is reading and writing and the file names of the +directory and files created are shown. This information is written +to standard output.

+

The -v option shows the configuration variables +and their values that Hypermail would use if it was run with the +same configuration file and command line +options. This is useful when starting up a new list or modifying a +list configuration file. Once the information is displayed, +Hypermail terminates and no actual processing occurs.

+

The -V option prints the Hypermail version +information. Once the information is displayed, Hypermail +terminates and no actual processing occurs.

+

The -0 option lists the message numbers that should +be deleted from the html archive. The mbox is not changed. It is +equivalent to the delete_msgnum option.

+
+

Configuration Options

+Hypermail has many variables that can be set as environment +variables or as variables in the specified configuration file. For instance, using the C +shell, one could define variables in this manner: +
    setenv HM_MBOX /home/john/my_mailbox
    setenv HM_FILEMODE 0600
-
-

-In the configuration file, variables must be in lowercase and separated by their values with an equals (=) sign. Blank lines and lines beginning with the # character are skipped: -

+
+

In the configuration file, variables must be in lowercase and +separated by their values with an equals (=) sign. +Blank lines and lines beginning with the # +character are skipped:

+
    mbox = "/home/john/my_mailbox"
    filemode = 0600
-
-While the example uses quotes ("), they is not required when used in the configuration file. -

-Below is a list of the more important configuration variables. -For a complete list, see hmrc.html. -

-

-
HM_LABEL "label name" -
Define this as the default label to put in archives. -

-

HM_ARCHIVES "URL" -
This will create a link in the archived index pages to the specified - URL. Define as "NONE" to omit such a link. - See also custom_archives. -

-

HM_HMAIL "list submission address" -
This is the email address used to send a new message to a hypermail archive. "NONE" means don't use it. Since this is different for each hypermail archive, you should probably leave it set to "NONE" here, and let it be specified at runtime by command-line parameters in the list specific configfile. -
See also newmsg_command -and replymsg_command. -

-

HM_DIR "directory" -
This is the default directory that Hypermail will look for when creating and updating archives. If defined as "NONE", the directory name will be the same name as the mailbox read in. -

-

HM_MBOX "filename" -
This is the default mailbox to read messages in from. Define this with - a value of "NONE" to read from standard input as the - default. -

-

HM_STRIPSUBJECT "text"
- A string to be stripped from all subject lines. Helps -unclutter mailing lists which add tags to subject lines. -
-

- -

HM_FOLDER_BY_DATE = "strftime-date-format"
-This string causes the messages to be put in subdirectories -by date. The string will be passed to strftime(3) to generate -subdirectory names based on message dates. Suggested values are -"%y%m" or "%b%y" for monthly subdirectories, "%Y" for -yearly, "%G/%V" for weekly. Do not alter this for an existing -archive without removing the old html files. If you use this -and update the archive incrementally (e.g. with -u), you must -use the usegdbm option. -
See also monthly_index. -
-

- -

HM_ISODATE boolean_number
-Set this to On to display article received dates in -YYYY-MM-DD HH:MM:SS format. If used with the gmtime -option, a Z will be inserted between the DD and HH. -
See also eurodate and dateformat. -
-

-

HM_LANGUAGE "language-id" -
This is a two-letter string specifying the default - language to use, or a longer string specifying a language - and locale. Set this the value of the language - table you wish to use when running and generating - archives. See also iso2022jp - and eurodate. -
-
Current supported languages, with their default locales: -
-de (de_DE) - German -
en (en_US) - English -
es (es_ES) - Spanish -
fi (fi_FI) - Finnish -
fr (fr_FR) - French -
el (el) - Greek -
gr (el_GR) - Greek -
is (is_IS) - Icelandic -
no (no_NO) - Norwegian -
pl (pl_PL) - Polish -
pt (pt_BR) - Brazilian Portuguese -
ru (ru_RU) - Russian -
sv (sv_SE) - Swedish -
-The directory /usr/share/i18n/locales on many systems has the locale -codes that are available on that system. -

-

HM_INCREMENT -1, 0, or 1 -
Define as 1 to append all input messages to the end of existing archives. -
Define as 0 for it to read a mailbox that corresponds to the entire -archive. If there are any existing html messages, it will figure out which -ones at the end of the mailbox are new, and add only those that haven't been -converted yet. -
Define as -1 to have hypermail figure out whether the input -is entirely new messages to be appended or whether it contains -messages that are already in the archive. A value of -1 cannot be -used with the mbox_shortened option or with the -i command line -option or with mbox = NONE.
-

- -

HM_APPEND boolean_number
-Set this to On to maintain a parallel mbox archive. The file -name defaults to mbox in the directory specified by -d or dir. -
See also append_filename -and txtsuffix. -
-

-

HM_SHOWHTML 0, 1, or 2 -
-Define as 1 to show the articles in a proportionally-spaced -font rather than a fixed-width (monospace) font. Setting this -option to 1 also tells Hypermail to attempt to italicize quoted -passages in articles. -

-Define as 2 for more complex conversion to html -similar to that in txt2html.pl. -Showhtml = 2 will normally produce nicer looking results than -showhtml = 1, and showhtml = 0 will look pretty dull, but -1 and 2 run risks of altering the appearance in undesired ways. -

-

HM_LINKQUOTES boolean_number -
Set this to On to create fine-grained links from quoted -text to the text where the quote originated. It also improves -the threads index file by more accurately matching messages -with replies. Note that this may be rather cpu intensive (see -the searchbackmsgnum option to alter the performance). -

-

HM_ABOUT "URL" -
This will create a link in the archived index pages to the specified URL. Define as "NONE" to omit such a link. -

-

HM_MAILTO address -
The address of the contact point that is put in the HTML header line -
-
<LINK REV=made HREF=mailto:MAILTO> -
- The <LINK...> header can be disabled by default by setting HM_MAILTO to "NONE". -

-

HM_INDEXTABLE boolean_number -
Setting this variable to 1 will tell Hypermail to generate - a message index Subject/Author/Date listings using a table - format. Set to 0 if you want the standard Hypermail index - page look and feel. -

-

HM_FILTER_OUT expression -Delete messages with headers matching regular expressions -(PCRE syntax). -See also filter_require, -filter_out_full_body, and -filter_require_full_body. - -

-

HM_DOMAINADDR "domainname" -
Set this to the domainname you want added to a mail address appearing -in the RFC2822 field which lack a hostname. When the list resides on the -same host as the user sending the message, it is often not required of -the MTA to domain-ize these addresses for delivery. In such cases, -Hypermail will add the DOMAINADDR to the email address. If defined as -NONE, this feature is turned off. -

-

HM_USEMETA [ 0 | 1 ] -
This option allows you to use metadata to store the content type - of a MIME attachments and, later on, when a user browses the - attachment, send back this information in the HTTP Content-Type - header. When set to 1, the Content-Type header of a - MIME attachment will be stored in a metadata file. Let us say that - the MIME attachments for a message are stored in directory - att-num. The metadata for those attachments will - then be stored in directory att-num/.meta. If a - MIME attachment is stored in file att-file, its - metadata will be stored in file att-file.meta. This - convention is directly compatible with the Apache server handling of - metadata. -

-

HM_REVERSE boolean_number -
Defining this variable as 1 will reverse-sort the article entries in the date and thread index files by the date they were received. That is, the most recent messages will appear at the top of the index rather than the other way around. -

-

HM_MHTMLHEADERFILE "path" -
Define path as the path to a file containing valid HTML formatting statements that you wish to included at the top of every message page. Hypermail will print this file as the header of the message so make sure it contains <HTML>, <HEAD>, and <BODY> and other statements that suit your local customized needs. -
-See also ihtmlheaderfile, -ihtmlfooterfile, and -mhtmlfooterfile. -

-

HM_CONFIGFILE "filename" -
This is the default configuration file to read settings in from. This - can only be specified as an environment variable. If the first character - is "~", Hypermail will look for the file under the current user's home - directory. -

-

-

- -


- -

Order of Options Processing

-

-Settings are processed in this order: -

-

    -
  1. From the program's hard-wired internal defaults (specified in options.h), -
  2. From runtime environment variables, -
  3. From the configuration file, -
  4. From command-line options. -
-

-Early versions of Hypermail read the command line before reading the -configuration file. -

-


- -

Other Things

- -Filenames: In the specified directory, articles will be read out in the order that they were read in from a mailbox or standard input. Filenames start at zero and increase in this fashion: 0000.html, 0001.html, 0002.html, etc. In the same directory: -

-

    -
  • date.html is the index of articles sorted by the date they were received by the system's mail daemon. -
  • thread.html is the index of articles sorted by thread first, then the date they were received. -
  • subject.html is the index of articles sorted by subject. Any "Re:" prefixes in front of subjects will have been stripped out. -
  • author.html is the index of articles sorted by the first word of the author's name. If the author's name can't be determined, their email address will be substituted. -
  • One of the above files will be called index.html and is the default index that users can go to when entering the archive. -
-

- -Sorting: In the date and thread index files, note that these lists are sorted by the date the articles were received by the system's mail daemon, not by the date they were written on. The order of articles in the date index may not necessarily match the order in which the article files are written and linked together. Because of this, it is a good idea to make sure the mailbox is sorted by date with the most recent messages towards the bottom. -

- - -Running Hypermail automatically: All that's needed to start archiving email messages is to set up Hypermail to do incremental updates in your /etc/aliases file (assuming that you use sendmail or something that works like it to deliver mail). - Here's what an entry might look like (the last line is one unbroken line): -

- -# -
# WU-FTPD Mailing List Archives -
# -
wulist: "|/usr/local/bin/hypermail -i -u -d /wu-ftpd/mail-archive -l \"WU-FTPD Mailing List Archive\"" -
-
- -After adding the entry, make sure newaliases is run to update the -mail aliases. This entry will run Hypermail and update/create the archive whenever -a new message is received. Hypermail also works well as a cron job. -Because sendmail may run Hypermail as different users, you will -want to make sure that archive directories and files are made readable and writeable -by a trusted sendmail user (or read/writable by everyone if you can't do that) -when they are created. This will ensure that there will be no problems incrementally -updating the archive. -

If you use qmail instead of sendmail, you probably want to create a file -/var/qmail/alias/.qmail-<mylistemailaddress> containing something like this: -

-|/usr/local/bin/hypermail -i -u -d /wu-ftpd/mail-archive -l \"WU-FTPD Mailing List Archive\" -
-

- If you are running Linux kernel version 2.4 or higher, -dnotify -looks like it provides another way to automate Hypermail. - -

-Including HTML in messages: One can include formatted HTML in message bodies by enclosing the HTML with the <HTML> tag (in either uppercase or lowercase). This tag must be on a line by itself: -

-

-   This text will not be parsed...
-   <html>
-   this text will be parsed as HTML.
-   </html>
-   This text will not be parsed...
-
-

-There is no limit to how often the <HTML> tag can be used in an article. -

- -


- -

Getting Help With Hypermail

-

-If you are are looking for more information on Hypermail and its features and -developmental status, check out -SourceForge: Project Info - hypermail -and hypermail-project.org. Additional -documentation and the hypermail development list archives are available there. - -

-


- -

Getting Hypermail Software

- -

-Hypermail is available free of charge under GNU Public License. More details -about the GPL are available at http://www.fsf.org/copyleft/gpl.html. -

-Currently, SourceForge: Project Info - hypermail -has the most recent version. -

-The Hypermail Development Center also has beta development versions of hypermail available from time to time. - -

-


- -

Credits

- -I would like to thank Tom Gruber, -who originally designed and developed Hypermail in Common Lisp, for the basis -of a GREAT tool. -

-I'd also like to thank Kevin Hughes for -developing the initial C version of Hypermail. Kevin also provided a great -deal of assistance with restarting the current hypermail development. -

-There are a great deal of people that also contributed to Hypermail's development. -The Hypermail Development Center Credits page is an attempt to let you know just who they are. -

-Hypermail development is currently being fostered by -<Peter McCluskey>. - -

- -


-

See Also

-
-hypermail(1), -  -hmrc(4), -  -Hypermail List Configuration File. -  -and -  -Customizing Hypermail Pages -and Adding a Search Engines to your Hypermail Archive -
- - -
-

- -Please send any feature requests, bug fixes, and comments on Hypermail -to <hypermail@hypermail-project.org>. In order to minimize spam on the list, you must subscribe to the list (at least temporarily) in order to send mail to it. -You may subscribe to the list by sending a message with the word -"subscribe" in the Subject: field to hypermail-request@hypermail-project.org. - -

- -Last updated Sep 2, 2006 - - - - +

+While the example uses quotes ("), they are not +required in the configuration file. +
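The syntax rules above can be sketched in a few lines of Python. This only illustrates the rules as stated here (variable = value lines, blank lines and # lines skipped, optional quotes). It is not Hypermail's own parser, and the function name parse_hmrc is hypothetical. Tolerating leading whitespace and keeping every value as a string are assumptions of the sketch.

```python
def parse_hmrc(text):
    """Parse 'variable = value' lines into a dict of strings (illustrative only)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()            # assumption: surrounding whitespace is ignored
        if not line or line.startswith("#"):
            continue                  # blank lines and comment lines are skipped
        name, sep, value = line.partition("=")
        if not sep:
            continue                  # not an assignment; this sketch ignores it
        value = value.strip()
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]       # quotes are optional, so drop them if present
        settings[name.strip()] = value
    return settings


example = '''
# sample settings
mbox = "/home/john/my_mailbox"
filemode = 0600
'''
print(parse_hmrc(example))
```

Values are left as strings so that an octal-looking setting such as 0600 is not silently reinterpreted as a decimal number.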

Below is a list of the more important configuration variables. +For a complete list, see hmrc.html.

+
+
HM_LABEL "label name"
+
Define this as the default label to put in archives.
+ +
HM_ARCHIVES "URL"
+
This will create a link in the archived index pages to the + specified URL. Define as "NONE" to omit such a + link. See also custom_archives. +
+ +
HM_HMAIL "list submission address"
+
This is the email address used to send a new message to a + hypermail archive. "NONE" means don't use it. Since this is + different for each hypermail archive, you should probably leave it + set to "NONE" here, and let it be specified at runtime by + command-line parameters in the list specific configfile.
+ See also newmsg_command and + replymsg_command. +
+ +
HM_DIR "directory"
+
This is the default directory that Hypermail will look for when + creating and updating archives. If defined as + "NONE", the directory name will be the same name + as the mailbox read in. +
+ +
HM_MBOX "filename"
+
This is the default mailbox to read messages in from. Define + this with a value of "NONE" to read from standard + input as the default. +
+ +
HM_STRIPSUBJECT "text"
+
A string to be stripped from all subject lines. Helps unclutter + mailing lists which add tags to subject lines. +
+ +
HM_FOLDER_BY_DATE ="strftime-date-format"
+
This string causes the messages to be put in subdirectories by + date. The string will be passed to strftime(3) to generate + subdirectory names based on message dates. Suggested values are + "%y%m" or "%b%y" for monthly subdirectories, "%Y" for yearly, + "%G/%V" for weekly. Do not alter this for an existing archive + without removing the old html files. If you use this and update the + archive incrementally (e.g. with -u), you must use the + usegdbm option.
+ See also monthly_index. +
+ +
HM_ISODATE boolean_number
+
Set this to On to display article received dates in YYYY-MM-DD + HH:MM:SS format. If used with the + gmtime option, a Z will be inserted between + the DD and HH.
+ See also eurodate and + dateformat. +
+ +
HM_LANGUAGE "language-id"
+
NB: the non-English language files have not been updated in a long time + and now lag behind in translation.
+ This is a two-letter string specifying the default language to + use, or a longer string specifying a language and locale. Set this + to the value of the language table you wish to use when running and + generating archives. See also iso2022jp and eurodate.
+
+ Currently supported languages, with their default locales: +
+ de (de_DE) - German
+ en (en_US) - English
+ es (es_ES) - Spanish
+ fi (fi_FI) - Finnish
+ fr (fr_FR) - French
+ el (el) - Greek
+ gr (el_GR) - Greek
+ is (is_IS) - Icelandic
+ no (no_NO) - Norwegian
+ pl (pl_PL) - Polish
+ pt (pt_BR) - Brazilian Portuguese
+ ru (ru_RU) - Russian
+ sv (sv_SE) - Swedish +
+ The directory /usr/share/i18n/locales on many systems has the + locale codes that are available on that system. +
+ +
HM_INCREMENT -1, 0, or 1
+
Define as 1 to append all input messages to the + end of existing archives.
+ Define as 0 for it to read a mailbox that + corresponds to the entire archive. If there are any existing html + messages, it will figure out which ones at the end of the mailbox + are new, and add only those that haven't been converted yet.
+ Define as -1 to have hypermail figure out whether + the input is entirely new messages to be appended or whether it + contains messages that are already in the archive. A value of -1 + cannot be used with the mbox_shortened option or with the -i + command line option or with mbox = NONE.
+
+
+ +
HM_APPEND boolean_number
+
Set this to On to maintain a parallel mbox archive. The file + name defaults to mbox in the directory specified by -d or + dir.
+ See also append_filename + and txtsuffix. +
+ +
HM_SHOWHTML 0, 1, or 2
+ +
Define as 1 to show the articles in a + proportionally-spaced font rather than a fixed-width + (monospace) font. Setting this option to 1 also tells + Hypermail to attempt to italicize quoted passages in articles. +

Define as 2 for more complex conversion to html + similar to that in + txt2html.pl. Showhtml + = 2 will normally produce nicer looking results than showhtml = 1, + and showhtml = 0 will look pretty dull, but 1 and 2 run risks of + altering the appearance in undesired ways.

+
+ +
HM_LINKQUOTES boolean_number
+
Set this to On to create fine-grained links from quoted text to + the text where the quote originated. It also improves the threads + index file by more accurately matching messages with replies. Note + that this may be rather cpu intensive (see the + searchbackmsgnum option to alter + the performance). +
+ +
HM_ABOUT "URL"
+
This will create a link in the archived index pages to the + specified URL. Define as "NONE" to omit such a link. +
+ +
HM_MAILTO address
+
The address of the contact point that is put in the HTML header + line +
+ <LINK REV=made HREF=mailto:MAILTO> +
+ The <LINK...> header can be disabled by default by setting + HM_MAILTO to "NONE". +
+ +
HM_FILTER_OUT expression
+
Delete messages with + headers matching regular expressions + (PCRE syntax). + See also filter_require, + filter_out_full_body, + and filter_require_full_body.
+
+ +
HM_DOMAINADDR "domainname"
+
Set this to the domainname you want added to a mail address + appearing in the RFC2822 field which lacks a hostname. When the list + resides on the same host as the user sending the message, it is + often not required of the MTA to domain-ize these addresses for + delivery. In such cases, Hypermail will add the DOMAINADDR to the + email address. If defined as NONE, this feature is turned off. +
+ +
HM_USEMETA [ 0 | 1 ]
+
This option allows you to use metadata to store the content + type of MIME attachments and, later on, when a user browses the + attachment, send back this information in the HTTP Content-Type + header. When set to 1, the Content-Type header of + a MIME attachment will be stored in a metadata file. Let us say + that the MIME attachments for a message are stored in directory + att-num. The metadata for those attachments will + then be stored in directory att-num/.meta. If a + MIME attachment is stored in file att-file, its + metadata will be stored in file att-file.meta. + This convention is directly compatible with the Apache server + handling of metadata. +
+ +
HM_REVERSE boolean_number
+
Defining this variable as 1 will reverse-sort + the article entries in the date and thread index files by the date + they were received. That is, the most recent messages will appear + at the top of the index rather than the other way around. +
+ +
HM_MHTMLHEADERFILE "path"
+
Define path as the path to a file containing valid HTML + formatting statements that you wish to include at the top of every + message page. Hypermail will print this file as the header of the + message so make sure it contains <HTML>, + <HEAD>, and <BODY> and other statements that + suit your local customized needs.
+ See also ihtmlheaderfile, + ihtmlfooterfile, and + mhtmlfooterfile. +
+ +
HM_CONFIGFILE "filename"
+
This is the default configuration file to read settings in + from. This can only be specified as an environment variable. If the + first character is "~", Hypermail will look for the file under the + current user's home directory. +
+
+
+ +
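To make the HM_FOLDER_BY_DATE description above more concrete, the sketch below shows how a strftime format string turns a message date into a subdirectory name. It uses Python's datetime.strftime as a stand-in for the strftime(3) call that the description mentions. The helper name subdir_for is hypothetical, and the date is taken from the sample separator line earlier in this document. Python hands the format to the platform's strftime, so support for "%G/%V" depends on the system, and "%b%y" depends on the locale.

```python
from datetime import datetime


def subdir_for(message_date, folder_by_date):
    """Return the subdirectory name a strftime-style format yields for a date."""
    return message_date.strftime(folder_by_date)


# Fri Jul  1 00:18:20 1994, as in the sample separator line
received = datetime(1994, 7, 1, 0, 18, 20)

print(subdir_for(received, "%y%m"))   # monthly subdirectories
print(subdir_for(received, "%Y"))     # yearly subdirectories
```

Because the subdirectory name is derived from the format string, changing the format on an existing archive would place new messages in differently named directories, which is why the old html files have to be removed first.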

Order of Options Processing

+

Settings are processed in this order:

+
    +
  1. From the program's hard-wired internal defaults (specified in +defaults.h and setup.c),
  2. +
  3. From runtime environment variables,
  4. +
  5. From the configuration file,
  6. +
  7. From command-line options.
  8. +
+
+ +
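The processing order can be pictured as a stack of layers. The sketch below assumes that a source processed later overrides one processed earlier, so command-line options win over the configuration file, which wins over environment variables, which win over the built-in defaults. The function name and the sample values are hypothetical, and Hypermail itself does not work with Python dictionaries.

```python
from collections import ChainMap


def effective_settings(defaults, environment, config_file, command_line):
    """Merge the four sources; ChainMap searches its maps from left to right,
    so the highest-priority source is listed first."""
    return dict(ChainMap(command_line, config_file, environment, defaults))


settings = effective_settings(
    defaults={"mbox": "NONE", "label": "NONE"},
    environment={"mbox": "/home/john/my_mailbox"},
    config_file={"label": "WU-FTPD Archives"},
    command_line={"mbox": "one.letter"},
)
print(settings)
```

In this sample the mailbox from the command line replaces the one from the environment, while the label from the configuration file replaces the built-in default.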

Other Things

+ +

Filenames

+ +

In the specified directory, articles will be written out in the order +that they were read in from a mailbox or standard input. Filenames +start at zero and increase in this fashion: 0000.html, 0001.html, 0002.html, +etc. In the same directory:

+ +
    +
  • date.html is the index of articles sorted by + the date they were received by the system's mail daemon.
  • +
  • thread.html is the index of articles sorted by + thread first, then the date they were received.
  • +
  • subject.html is the index of articles sorted + by subject. Any "Re:" prefixes in front of subjects will have been + stripped out.
  • +
  • author.html is the index of articles sorted by + the first word of the author's name. If the author's name can't be + determined, their email address will be substituted.
  • +
  • One of the above files will be called + index.html and is the default index that users can + go to when entering the archive.
  • +
+ +

Sorting

+ +

In the date and thread index files, note that these lists are sorted +by the date the articles were received by the system's mail daemon, +not by the date they were written. The order of articles in the +date index may not necessarily match the order in which the article +files are written and linked together. Because of this, it is a good +idea to make sure the mailbox is sorted by date with the most recent +messages towards the bottom.

+ +

Running Hypermail automatically

+ +

The following are some tips on how to make hypermail archive incoming + messages on-the-fly.

+ +

Sendmail and qmail aliases

+ +

All that's needed to start archiving email +messages is to set up Hypermail to do incremental updates in your +/etc/aliases file (assuming that you use +sendmail or something that works like it to +deliver mail). Here's what an entry might look like (the last line +is one unbroken line):

+ +
#
+# WU-FTPD Mailing List Archives
+#
+wulist: "|/usr/local/bin/hypermail -i -u -d /wu-ftpd/mail-archive +-l \"WU-FTPD Mailing List Archive\""
+ +

After adding the entry, make sure newaliases is +run to update the mail aliases. This entry will run Hypermail and + update/create the archive whenever a new message is received.

+ +

Because sendmail may run Hypermail as different users, you +will want to make sure that archive directories and files are made +readable and writeable by a trusted sendmail user (or read/writable +by everyone if you can't do that) when they are created. This will +ensure that there will be no problems incrementally updating the + archive.

+ +

If you use qmail instead of sendmail, you probably want to +create a file /var/qmail/alias/.qmail-<mylistemailaddress> +containing something like this:

+
|/usr/local/bin/hypermail -i -u -d + /wu-ftpd/mail-archive -l \"WU-FTPD Mailing List + Archive\"
+ +

Smartlist / Procmail

+ +

If you are using smartlist, you can call hypermail automatically by enabling rc.local.s20 and adding a filter. You may also use this filter in a procmail rule. Note that in both cases, we're relying on smartlist/procmail's locks instead of hypermail's:

+
+ :0 cw + | /usr/local/bin/hypermail -o uselock=0 -i -u -d + /wu-ftpd/mail-archive -l \"WU-FTPD Mailing List + Archive\"
+ +

Other ways of automating hypermail

+ +

On Linux, cron jobs and inotify are two other possibilities for running hypermail.
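The cron variant could look roughly like the wrapper below. This is a sketch, not something shipped with Hypermail: the script, the function name update_archive, and the mailbox path are all hypothetical, and it assumes that -m names the mailbox to read, with -d and -l used as in the alias entry above.

```shell
#!/bin/sh
# Hypothetical cron wrapper (not shipped with Hypermail). All paths are
# placeholders; the mailbox location in particular is an assumption.
HYPERMAIL="${HYPERMAIL:-/usr/local/bin/hypermail}"
MBOX="${MBOX:-/var/spool/mail/wulist}"
ARCHIVE="${ARCHIVE:-/wu-ftpd/mail-archive}"
LABEL="${LABEL:-WU-FTPD Mailing List Archive}"

update_archive() {
    # Skip the run when the mailbox is missing or empty.
    [ -s "$MBOX" ] || return 0
    # -m names the mailbox to read; -d and -l are used as in the alias entry.
    "$HYPERMAIL" -m "$MBOX" -d "$ARCHIVE" -l "$LABEL"
}

update_archive
```

If the script were saved as, say, /usr/local/bin/update-archive.sh (a made-up path), a crontab line such as `*/10 * * * * /usr/local/bin/update-archive.sh` would run it every ten minutes. Check hypermail(1) for how your version treats messages that are already in the archive before relying on repeated runs over the same mailbox.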

+ +
+ +

Getting Hypermail Software

+ +

Please visit the Hypermail Project GitHub repository to download the latest stable version of Hypermail and to follow its development.

+ +

Hypermail is available free of charge under the GNU General Public License version 3 (GPLv3). Please read the LICENSES.txt file for the licenses that cover libraries and some functions used by Hypermail.

+ +
+ +

Getting Help With Hypermail

+ +

We don't run any mailing lists anymore. We rely on GitHub for issue reports, feature requests, bug fixes, releases, and development tracking.

+ +

If you're interested in the history of hypermail and its design decisions, you can browse the hypermail developers' mailing list archives (mirror), covering the 1998-2010 activity period.

+ +
+ +

Credits

+ +

Hypermail was originally developed and designed by + Tom Gruber for Enterprise + Integration Technologies (EIT) in Common Lisp. It was later + rewritten in C by Kevin Hughes + while at EIT.

+ +

Hypermail is currently fostered by + José Kahan.

+ +

Please refer to the credits page for a list of + people who have contributed to the hypermail project.

+ +
+ +

See Also

+
hypermail(1), hmrc(4), Hypermail List Configuration File, Customizing Hypermail Pages, Adding a Search Engine to your Hypermail Archive
+
+ +

Last updated May 24, 2023

+ + + diff --git a/docs/stars.png b/docs/stars.png deleted file mode 100644 index 212902bb..00000000 Binary files a/docs/stars.png and /dev/null differ diff --git a/docs/thanks.html b/docs/thanks.html new file mode 100644 index 00000000..0bb05a35 --- /dev/null +++ b/docs/thanks.html @@ -0,0 +1,111 @@ + + + + + + Hypermail Credits Page + + + + + + + + +

Thank You!

+
+
+

This project has been alive for many years. Countless people have provided feedback that has improved hypermail.

+ +

Thanks to + Tom Gruber, who originally designed and developed Hypermail + in Common Lisp, for the basis of a GREAT + tool.

+ +

Thanks to Kevin Hughes for developing the initial C version of + Hypermail and assisting in its transition to the hypermail + project organization. Thanks Kevin!

+ +

Thanks to Kent Landfield for having started the hypermail project organization and the first hypermail project web site, for leading the project from 1997 to 2003, and for encouraging people to contribute their patches and ideas. The few years you spent on hypermail really made the project evolve.

+ +

Thanks to Darci Chapman for encouraging Kent to incorporate his + initial "hacks" into the hypermail source code and for the + initial code clean up.

+ +

Many thanks to Daniel Stenberg for all the work he did for + adding support for MIME attachments and his trio library.

+ +

Thanks to Peter C. McCluskey for his code contributions, for having curated hypermail from roughly 2003 to 2008, and for keeping the hypermail project site running for so long.

+ +

Thanks to Ashley M. Kirchner for having maintained the original CVS server for hypermail and for giving us access to the repository for the transition to SourceForge.

+ +

The following people should be noted for their work and contributions to Hypermail's development (alphabetical by last name or user handle).

+ +

If you have contributed but are missing here, please let us know!

+ +
    +
  • Tom von Alten
  • +
  • Bob Crispen
  • +
  • Baptiste Daroussin
  • +
  • Byron C. Darrah
  • +
  • Roy T. Fielding
  • +
  • John Finlay
  • +
  • Paul Haldane
  • +
  • I. Ioannou
  • +
  • José Kahan
  • +
  • Fumihiro Kato
  • +
  • David D. Kilzer
  • +
  • Ashley M. Kirchner
  • +
  • Dave Kopper
  • +
  • Elliot Lee
  • +
  • Daigo Matsubara
  • +
  • Peter C. McCluskey
  • +
  • Christof Meerwald
  • +
  • Jared Reisinger
  • +
  • James Riordon
  • +
  • Scott Rose
  • +
  • Martin Schulze
  • +
  • Bill Shannon
  • +
  • Jay Soffian
  • +
  • Glen Stewart
  • +
  • Craig A Summerhill
  • +
  • Roy Tennant
  • +
  • Andy Valencia
  • +
+ +

Last, but not least, a BIG THANKS! to the past members of the (now defunct) Hypermail Development list (1998-2010) and to the people who participate in the hypermail project GitHub repository for their encouragement, ideas, bug fixes, and participation.

+ +

Thanks All!

+
+ +
+
+
+ + diff --git a/lcc/config.h b/lcc/config.h index 07fe111d..661c26c2 100644 --- a/lcc/config.h +++ b/lcc/config.h @@ -142,7 +142,7 @@ #define GDBM 1 /* PCRE (Perl regular expressions) */ -#define HAVE_PCRE 1 +#define HAVE_PCRE2 1 /* Whether you want function version of ctype functions */ #undef NO_MACRO diff --git a/lcc/defaults.h b/lcc/defaults.h index 33296ba9..933814a5 100644 --- a/lcc/defaults.h +++ b/lcc/defaults.h @@ -11,6 +11,8 @@ #define CONFIGFILE "~/.hmrc" +#define MSG_FRAGMENT_PREFIX "msg" + #define INLINE_TYPES "image/gif image/jpeg image/png" #define SHOW_HEADERS "From Subject Date Message-ID" @@ -27,8 +29,28 @@ #define DEFAULTINDEX "@defaultindex@" +#define DEFAULT_TOP_INDEX "folders" + #define DOMAINADDR "@domainaddr@" #define ANTISPAM_AT "_at_" +#define APPLE_MAIL_UA_HEADER "X-Mailer" + +#define APPLE_MAIL_UA "Apple iPhone iPad" + +#define DEFAULT_CHARSET "US-ASCII" + +#define HM_ANNOTATION_HEADER "X-Hypermail-Annotated" + +#define HM_DELETED_HEADERS "X-Hypermail-Deleted X-No-Archive" + +#define EXPIRES_HEADER "Expires" + +#define NEW_MSG_COMMAND "mailto:$TO" + +#define REPLYMSG_COMMAND "not set" + +#define DEFAULT_CSS_URL "hypermail.css" + #endif diff --git a/lcc/pcre.h b/lcc/pcre.h deleted file mode 100644 index a6aa4e93..00000000 --- a/lcc/pcre.h +++ /dev/null @@ -1,653 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* This is the public header file for the PCRE library, to be #included by -applications that call the PCRE functions. - - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. 
- - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -#ifndef _PCRE_H -#define _PCRE_H - -/* The current PCRE version information. */ - -#define PCRE_MAJOR 8 -#define PCRE_MINOR 32 -#define PCRE_PRERELEASE -#define PCRE_DATE 2012-11-30 - -/* When an application links to a PCRE DLL in Windows, the symbols that are -imported have to be identified as such. When building PCRE, the appropriate -export setting is defined in pcre_internal.h, which includes this file. So we -don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL. 
*/ - -#if defined(_WIN32) && !defined(PCRE_STATIC) -# ifndef PCRE_EXP_DECL -# define PCRE_EXP_DECL extern __declspec(dllimport) -# endif -# ifdef __cplusplus -# ifndef PCRECPP_EXP_DECL -# define PCRECPP_EXP_DECL extern __declspec(dllimport) -# endif -# ifndef PCRECPP_EXP_DEFN -# define PCRECPP_EXP_DEFN __declspec(dllimport) -# endif -# endif -#endif - -/* By default, we use the standard "extern" declarations. */ - -#ifndef PCRE_EXP_DECL -# ifdef __cplusplus -# define PCRE_EXP_DECL extern "C" -# else -# define PCRE_EXP_DECL extern -# endif -#endif - -#ifdef __cplusplus -# ifndef PCRECPP_EXP_DECL -# define PCRECPP_EXP_DECL extern -# endif -# ifndef PCRECPP_EXP_DEFN -# define PCRECPP_EXP_DEFN -# endif -#endif - -/* Have to include stdlib.h in order to ensure that size_t is defined; -it is needed here for malloc. */ - -#include - -/* Allow for C++ users */ - -#ifdef __cplusplus -extern "C" { -#endif - -/* Public options. Some are compile-time only, some are run-time only, and some -are both, so we keep them all distinct. However, almost all the bits in the -options word are now used. In the long run, we may have to re-use some of the -compile-time only bits for runtime options, or vice versa. Any of the -compile-time options may be inspected during studying (and therefore JIT -compiling). - -Some options for pcre_compile() change its behaviour but do not affect the -behaviour of the execution functions. Other options are passed through to the -execution functions and affect their behaviour, with or without affecting the -behaviour of pcre_compile(). - -Options that can be passed to pcre_compile() are tagged Cx below, with these -variants: - -C1 Affects compile only -C2 Does not affect compile; affects exec, dfa_exec -C3 Affects compile, exec, dfa_exec -C4 Affects compile, exec, dfa_exec, study -C5 Affects compile, exec, study - -Options that can be set for pcre_exec() and/or pcre_dfa_exec() are flagged with -E and D, respectively. 
They take precedence over C3, C4, and C5 settings passed -from pcre_compile(). Those that are compatible with JIT execution are flagged -with J. */ - -#define PCRE_CASELESS 0x00000001 /* C1 */ -#define PCRE_MULTILINE 0x00000002 /* C1 */ -#define PCRE_DOTALL 0x00000004 /* C1 */ -#define PCRE_EXTENDED 0x00000008 /* C1 */ -#define PCRE_ANCHORED 0x00000010 /* C4 E D */ -#define PCRE_DOLLAR_ENDONLY 0x00000020 /* C2 */ -#define PCRE_EXTRA 0x00000040 /* C1 */ -#define PCRE_NOTBOL 0x00000080 /* E D J */ -#define PCRE_NOTEOL 0x00000100 /* E D J */ -#define PCRE_UNGREEDY 0x00000200 /* C1 */ -#define PCRE_NOTEMPTY 0x00000400 /* E D J */ -#define PCRE_UTF8 0x00000800 /* C4 ) */ -#define PCRE_UTF16 0x00000800 /* C4 ) Synonyms */ -#define PCRE_UTF32 0x00000800 /* C4 ) */ -#define PCRE_NO_AUTO_CAPTURE 0x00001000 /* C1 */ -#define PCRE_NO_UTF8_CHECK 0x00002000 /* C1 E D J ) */ -#define PCRE_NO_UTF16_CHECK 0x00002000 /* C1 E D J ) Synonyms */ -#define PCRE_NO_UTF32_CHECK 0x00002000 /* C1 E D J ) */ -#define PCRE_AUTO_CALLOUT 0x00004000 /* C1 */ -#define PCRE_PARTIAL_SOFT 0x00008000 /* E D J ) Synonyms */ -#define PCRE_PARTIAL 0x00008000 /* E D J ) */ -#define PCRE_DFA_SHORTEST 0x00010000 /* D */ -#define PCRE_DFA_RESTART 0x00020000 /* D */ -#define PCRE_FIRSTLINE 0x00040000 /* C3 */ -#define PCRE_DUPNAMES 0x00080000 /* C1 */ -#define PCRE_NEWLINE_CR 0x00100000 /* C3 E D */ -#define PCRE_NEWLINE_LF 0x00200000 /* C3 E D */ -#define PCRE_NEWLINE_CRLF 0x00300000 /* C3 E D */ -#define PCRE_NEWLINE_ANY 0x00400000 /* C3 E D */ -#define PCRE_NEWLINE_ANYCRLF 0x00500000 /* C3 E D */ -#define PCRE_BSR_ANYCRLF 0x00800000 /* C3 E D */ -#define PCRE_BSR_UNICODE 0x01000000 /* C3 E D */ -#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* C5 */ -#define PCRE_NO_START_OPTIMIZE 0x04000000 /* C2 E D ) Synonyms */ -#define PCRE_NO_START_OPTIMISE 0x04000000 /* C2 E D ) */ -#define PCRE_PARTIAL_HARD 0x08000000 /* E D J */ -#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* E D J */ -#define PCRE_UCP 0x20000000 /* C3 
*/ - -/* Exec-time and get/set-time error codes */ - -#define PCRE_ERROR_NOMATCH (-1) -#define PCRE_ERROR_NULL (-2) -#define PCRE_ERROR_BADOPTION (-3) -#define PCRE_ERROR_BADMAGIC (-4) -#define PCRE_ERROR_UNKNOWN_OPCODE (-5) -#define PCRE_ERROR_UNKNOWN_NODE (-5) /* For backward compatibility */ -#define PCRE_ERROR_NOMEMORY (-6) -#define PCRE_ERROR_NOSUBSTRING (-7) -#define PCRE_ERROR_MATCHLIMIT (-8) -#define PCRE_ERROR_CALLOUT (-9) /* Never used by PCRE itself */ -#define PCRE_ERROR_BADUTF8 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF16 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF32 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF8_OFFSET (-11) /* Same for 8/16 */ -#define PCRE_ERROR_BADUTF16_OFFSET (-11) /* Same for 8/16 */ -#define PCRE_ERROR_PARTIAL (-12) -#define PCRE_ERROR_BADPARTIAL (-13) -#define PCRE_ERROR_INTERNAL (-14) -#define PCRE_ERROR_BADCOUNT (-15) -#define PCRE_ERROR_DFA_UITEM (-16) -#define PCRE_ERROR_DFA_UCOND (-17) -#define PCRE_ERROR_DFA_UMLIMIT (-18) -#define PCRE_ERROR_DFA_WSSIZE (-19) -#define PCRE_ERROR_DFA_RECURSE (-20) -#define PCRE_ERROR_RECURSIONLIMIT (-21) -#define PCRE_ERROR_NULLWSLIMIT (-22) /* No longer actually used */ -#define PCRE_ERROR_BADNEWLINE (-23) -#define PCRE_ERROR_BADOFFSET (-24) -#define PCRE_ERROR_SHORTUTF8 (-25) -#define PCRE_ERROR_SHORTUTF16 (-25) /* Same for 8/16 */ -#define PCRE_ERROR_RECURSELOOP (-26) -#define PCRE_ERROR_JIT_STACKLIMIT (-27) -#define PCRE_ERROR_BADMODE (-28) -#define PCRE_ERROR_BADENDIANNESS (-29) -#define PCRE_ERROR_DFA_BADRESTART (-30) -#define PCRE_ERROR_JIT_BADOPTION (-31) -#define PCRE_ERROR_BADLENGTH (-32) - -/* Specific error codes for UTF-8 validity checks */ - -#define PCRE_UTF8_ERR0 0 -#define PCRE_UTF8_ERR1 1 -#define PCRE_UTF8_ERR2 2 -#define PCRE_UTF8_ERR3 3 -#define PCRE_UTF8_ERR4 4 -#define PCRE_UTF8_ERR5 5 -#define PCRE_UTF8_ERR6 6 -#define PCRE_UTF8_ERR7 7 -#define PCRE_UTF8_ERR8 8 -#define PCRE_UTF8_ERR9 9 -#define PCRE_UTF8_ERR10 10 -#define 
PCRE_UTF8_ERR11 11 -#define PCRE_UTF8_ERR12 12 -#define PCRE_UTF8_ERR13 13 -#define PCRE_UTF8_ERR14 14 -#define PCRE_UTF8_ERR15 15 -#define PCRE_UTF8_ERR16 16 -#define PCRE_UTF8_ERR17 17 -#define PCRE_UTF8_ERR18 18 -#define PCRE_UTF8_ERR19 19 -#define PCRE_UTF8_ERR20 20 -#define PCRE_UTF8_ERR21 21 -#define PCRE_UTF8_ERR22 22 - -/* Specific error codes for UTF-16 validity checks */ - -#define PCRE_UTF16_ERR0 0 -#define PCRE_UTF16_ERR1 1 -#define PCRE_UTF16_ERR2 2 -#define PCRE_UTF16_ERR3 3 -#define PCRE_UTF16_ERR4 4 - -/* Specific error codes for UTF-32 validity checks */ - -#define PCRE_UTF32_ERR0 0 -#define PCRE_UTF32_ERR1 1 -#define PCRE_UTF32_ERR2 2 -#define PCRE_UTF32_ERR3 3 - -/* Request types for pcre_fullinfo() */ - -#define PCRE_INFO_OPTIONS 0 -#define PCRE_INFO_SIZE 1 -#define PCRE_INFO_CAPTURECOUNT 2 -#define PCRE_INFO_BACKREFMAX 3 -#define PCRE_INFO_FIRSTBYTE 4 -#define PCRE_INFO_FIRSTCHAR 4 /* For backwards compatibility */ -#define PCRE_INFO_FIRSTTABLE 5 -#define PCRE_INFO_LASTLITERAL 6 -#define PCRE_INFO_NAMEENTRYSIZE 7 -#define PCRE_INFO_NAMECOUNT 8 -#define PCRE_INFO_NAMETABLE 9 -#define PCRE_INFO_STUDYSIZE 10 -#define PCRE_INFO_DEFAULT_TABLES 11 -#define PCRE_INFO_OKPARTIAL 12 -#define PCRE_INFO_JCHANGED 13 -#define PCRE_INFO_HASCRORLF 14 -#define PCRE_INFO_MINLENGTH 15 -#define PCRE_INFO_JIT 16 -#define PCRE_INFO_JITSIZE 17 -#define PCRE_INFO_MAXLOOKBEHIND 18 -#define PCRE_INFO_FIRSTCHARACTER 19 -#define PCRE_INFO_FIRSTCHARACTERFLAGS 20 -#define PCRE_INFO_REQUIREDCHAR 21 -#define PCRE_INFO_REQUIREDCHARFLAGS 22 - -/* Request types for pcre_config(). Do not re-arrange, in order to remain -compatible. 
*/ - -#define PCRE_CONFIG_UTF8 0 -#define PCRE_CONFIG_NEWLINE 1 -#define PCRE_CONFIG_LINK_SIZE 2 -#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD 3 -#define PCRE_CONFIG_MATCH_LIMIT 4 -#define PCRE_CONFIG_STACKRECURSE 5 -#define PCRE_CONFIG_UNICODE_PROPERTIES 6 -#define PCRE_CONFIG_MATCH_LIMIT_RECURSION 7 -#define PCRE_CONFIG_BSR 8 -#define PCRE_CONFIG_JIT 9 -#define PCRE_CONFIG_UTF16 10 -#define PCRE_CONFIG_JITTARGET 11 -#define PCRE_CONFIG_UTF32 12 - -/* Request types for pcre_study(). Do not re-arrange, in order to remain -compatible. */ - -#define PCRE_STUDY_JIT_COMPILE 0x0001 -#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE 0x0002 -#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE 0x0004 -#define PCRE_STUDY_EXTRA_NEEDED 0x0008 - -/* Bit flags for the pcre[16|32]_extra structure. Do not re-arrange or redefine -these bits, just add new ones on the end, in order to remain compatible. */ - -#define PCRE_EXTRA_STUDY_DATA 0x0001 -#define PCRE_EXTRA_MATCH_LIMIT 0x0002 -#define PCRE_EXTRA_CALLOUT_DATA 0x0004 -#define PCRE_EXTRA_TABLES 0x0008 -#define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0x0010 -#define PCRE_EXTRA_MARK 0x0020 -#define PCRE_EXTRA_EXECUTABLE_JIT 0x0040 - -/* Types */ - -struct real_pcre; /* declaration; the definition is private */ -typedef struct real_pcre pcre; - -struct real_pcre16; /* declaration; the definition is private */ -typedef struct real_pcre16 pcre16; - -struct real_pcre32; /* declaration; the definition is private */ -typedef struct real_pcre32 pcre32; - -struct real_pcre_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre_jit_stack pcre_jit_stack; - -struct real_pcre16_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre16_jit_stack pcre16_jit_stack; - -struct real_pcre32_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre32_jit_stack pcre32_jit_stack; - -/* If PCRE is compiled with 16 bit character support, PCRE_UCHAR16 must contain -a 16 bit wide signed data type. 
Otherwise it can be a dummy data type since -pcre16 functions are not implemented. There is a check for this in pcre_internal.h. */ -#ifndef PCRE_UCHAR16 -#define PCRE_UCHAR16 unsigned short -#endif - -#ifndef PCRE_SPTR16 -#define PCRE_SPTR16 const PCRE_UCHAR16 * -#endif - -/* If PCRE is compiled with 32 bit character support, PCRE_UCHAR32 must contain -a 32 bit wide signed data type. Otherwise it can be a dummy data type since -pcre32 functions are not implemented. There is a check for this in pcre_internal.h. */ -#ifndef PCRE_UCHAR32 -#define PCRE_UCHAR32 unsigned int -#endif - -#ifndef PCRE_SPTR32 -#define PCRE_SPTR32 const PCRE_UCHAR32 * -#endif - -/* When PCRE is compiled as a C++ library, the subject pointer type can be -replaced with a custom type. For conventional use, the public interface is a -const char *. */ - -#ifndef PCRE_SPTR -#define PCRE_SPTR const char * -#endif - -/* The structure for passing additional data to pcre_exec(). This is defined in -such as way as to be extensible. Always add new fields at the end, in order to -remain compatible. */ - -typedef struct pcre_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - unsigned char **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre_extra; - -/* Same structure as above, but with 16 bit char pointers. 
*/ - -typedef struct pcre16_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - PCRE_UCHAR16 **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre16_extra; - -/* Same structure as above, but with 32 bit char pointers. */ - -typedef struct pcre32_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - PCRE_UCHAR32 **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre32_extra; - -/* The structure for passing out data via the pcre_callout_function. We use a -structure so that new fields can be added on the end in future versions, -without changing the API of the function, thereby allowing old clients to work -without modification. 
*/ - -typedef struct pcre_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const unsigned char *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre_callout_block; - -/* Same structure as above, but with 16 bit char pointers. 
*/ - -typedef struct pcre16_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR16 subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const PCRE_UCHAR16 *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre16_callout_block; - -/* Same structure as above, but with 32 bit char pointers. 
*/ - -typedef struct pcre32_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR32 subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const PCRE_UCHAR32 *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre32_callout_block; - -/* Indirection for store get and free functions. These can be set to -alternative malloc/free functions if required. Special ones are used in the -non-recursive case for "frames". There is also an optional callout function -that is triggered by the (?) regex item. For Virtual Pascal, these definitions -have to take another form. 
*/ - -#ifndef VPCOMPAT -PCRE_EXP_DECL void *(*pcre_malloc)(size_t); -PCRE_EXP_DECL void (*pcre_free)(void *); -PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre_stack_free)(void *); -PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *); - -PCRE_EXP_DECL void *(*pcre16_malloc)(size_t); -PCRE_EXP_DECL void (*pcre16_free)(void *); -PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre16_stack_free)(void *); -PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *); - -PCRE_EXP_DECL void *(*pcre32_malloc)(size_t); -PCRE_EXP_DECL void (*pcre32_free)(void *); -PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre32_stack_free)(void *); -PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *); -#else /* VPCOMPAT */ -PCRE_EXP_DECL void *pcre_malloc(size_t); -PCRE_EXP_DECL void pcre_free(void *); -PCRE_EXP_DECL void *pcre_stack_malloc(size_t); -PCRE_EXP_DECL void pcre_stack_free(void *); -PCRE_EXP_DECL int pcre_callout(pcre_callout_block *); - -PCRE_EXP_DECL void *pcre16_malloc(size_t); -PCRE_EXP_DECL void pcre16_free(void *); -PCRE_EXP_DECL void *pcre16_stack_malloc(size_t); -PCRE_EXP_DECL void pcre16_stack_free(void *); -PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *); - -PCRE_EXP_DECL void *pcre32_malloc(size_t); -PCRE_EXP_DECL void pcre32_free(void *); -PCRE_EXP_DECL void *pcre32_stack_malloc(size_t); -PCRE_EXP_DECL void pcre32_stack_free(void *); -PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *); -#endif /* VPCOMPAT */ - -/* User defined callback which provides a stack just before the match starts. 
*/ - -typedef pcre_jit_stack *(*pcre_jit_callback)(void *); -typedef pcre16_jit_stack *(*pcre16_jit_callback)(void *); -typedef pcre32_jit_stack *(*pcre32_jit_callback)(void *); - -/* Exported PCRE functions */ - -PCRE_EXP_DECL pcre *pcre_compile(const char *, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre16 *pcre16_compile(PCRE_SPTR16, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre32 *pcre32_compile(PCRE_SPTR32, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre *pcre_compile2(const char *, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL pcre16 *pcre16_compile2(PCRE_SPTR16, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL pcre32 *pcre32_compile2(PCRE_SPTR32, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL int pcre_config(int, void *); -PCRE_EXP_DECL int pcre16_config(int, void *); -PCRE_EXP_DECL int pcre32_config(int, void *); -PCRE_EXP_DECL int pcre_copy_named_substring(const pcre *, const char *, - int *, int, const char *, char *, int); -PCRE_EXP_DECL int pcre16_copy_named_substring(const pcre16 *, PCRE_SPTR16, - int *, int, PCRE_SPTR16, PCRE_UCHAR16 *, int); -PCRE_EXP_DECL int pcre32_copy_named_substring(const pcre32 *, PCRE_SPTR32, - int *, int, PCRE_SPTR32, PCRE_UCHAR32 *, int); -PCRE_EXP_DECL int pcre_copy_substring(const char *, int *, int, int, - char *, int); -PCRE_EXP_DECL int pcre16_copy_substring(PCRE_SPTR16, int *, int, int, - PCRE_UCHAR16 *, int); -PCRE_EXP_DECL int pcre32_copy_substring(PCRE_SPTR32, int *, int, int, - PCRE_UCHAR32 *, int); -PCRE_EXP_DECL int pcre_dfa_exec(const pcre *, const pcre_extra *, - const char *, int, int, int, int *, int , int *, int); -PCRE_EXP_DECL int pcre16_dfa_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int , int *, int); -PCRE_EXP_DECL int pcre32_dfa_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, 
int , int *, int); -PCRE_EXP_DECL int pcre_exec(const pcre *, const pcre_extra *, PCRE_SPTR, - int, int, int, int *, int); -PCRE_EXP_DECL int pcre16_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int); -PCRE_EXP_DECL int pcre32_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, int); -PCRE_EXP_DECL int pcre_jit_exec(const pcre *, const pcre_extra *, - PCRE_SPTR, int, int, int, int *, int, - pcre_jit_stack *); -PCRE_EXP_DECL int pcre16_jit_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int, - pcre16_jit_stack *); -PCRE_EXP_DECL int pcre32_jit_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, int, - pcre32_jit_stack *); -PCRE_EXP_DECL void pcre_free_substring(const char *); -PCRE_EXP_DECL void pcre16_free_substring(PCRE_SPTR16); -PCRE_EXP_DECL void pcre32_free_substring(PCRE_SPTR32); -PCRE_EXP_DECL void pcre_free_substring_list(const char **); -PCRE_EXP_DECL void pcre16_free_substring_list(PCRE_SPTR16 *); -PCRE_EXP_DECL void pcre32_free_substring_list(PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_fullinfo(const pcre *, const pcre_extra *, int, - void *); -PCRE_EXP_DECL int pcre16_fullinfo(const pcre16 *, const pcre16_extra *, int, - void *); -PCRE_EXP_DECL int pcre32_fullinfo(const pcre32 *, const pcre32_extra *, int, - void *); -PCRE_EXP_DECL int pcre_get_named_substring(const pcre *, const char *, - int *, int, const char *, const char **); -PCRE_EXP_DECL int pcre16_get_named_substring(const pcre16 *, PCRE_SPTR16, - int *, int, PCRE_SPTR16, PCRE_SPTR16 *); -PCRE_EXP_DECL int pcre32_get_named_substring(const pcre32 *, PCRE_SPTR32, - int *, int, PCRE_SPTR32, PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_get_stringnumber(const pcre *, const char *); -PCRE_EXP_DECL int pcre16_get_stringnumber(const pcre16 *, PCRE_SPTR16); -PCRE_EXP_DECL int pcre32_get_stringnumber(const pcre32 *, PCRE_SPTR32); -PCRE_EXP_DECL int pcre_get_stringtable_entries(const pcre *, 
const char *, - char **, char **); -PCRE_EXP_DECL int pcre16_get_stringtable_entries(const pcre16 *, PCRE_SPTR16, - PCRE_UCHAR16 **, PCRE_UCHAR16 **); -PCRE_EXP_DECL int pcre32_get_stringtable_entries(const pcre32 *, PCRE_SPTR32, - PCRE_UCHAR32 **, PCRE_UCHAR32 **); -PCRE_EXP_DECL int pcre_get_substring(const char *, int *, int, int, - const char **); -PCRE_EXP_DECL int pcre16_get_substring(PCRE_SPTR16, int *, int, int, - PCRE_SPTR16 *); -PCRE_EXP_DECL int pcre32_get_substring(PCRE_SPTR32, int *, int, int, - PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_get_substring_list(const char *, int *, int, - const char ***); -PCRE_EXP_DECL int pcre16_get_substring_list(PCRE_SPTR16, int *, int, - PCRE_SPTR16 **); -PCRE_EXP_DECL int pcre32_get_substring_list(PCRE_SPTR32, int *, int, - PCRE_SPTR32 **); -PCRE_EXP_DECL const unsigned char *pcre_maketables(void); -PCRE_EXP_DECL const unsigned char *pcre16_maketables(void); -PCRE_EXP_DECL const unsigned char *pcre32_maketables(void); -PCRE_EXP_DECL int pcre_refcount(pcre *, int); -PCRE_EXP_DECL int pcre16_refcount(pcre16 *, int); -PCRE_EXP_DECL int pcre32_refcount(pcre32 *, int); -PCRE_EXP_DECL pcre_extra *pcre_study(const pcre *, int, const char **); -PCRE_EXP_DECL pcre16_extra *pcre16_study(const pcre16 *, int, const char **); -PCRE_EXP_DECL pcre32_extra *pcre32_study(const pcre32 *, int, const char **); -PCRE_EXP_DECL void pcre_free_study(pcre_extra *); -PCRE_EXP_DECL void pcre16_free_study(pcre16_extra *); -PCRE_EXP_DECL void pcre32_free_study(pcre32_extra *); -PCRE_EXP_DECL const char *pcre_version(void); -PCRE_EXP_DECL const char *pcre16_version(void); -PCRE_EXP_DECL const char *pcre32_version(void); - -/* Utility functions for byte order swaps. 
*/ -PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *, pcre_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *, pcre16_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *, pcre32_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *, - PCRE_SPTR16, int, int *, int); -PCRE_EXP_DECL int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *, - PCRE_SPTR32, int, int *, int); - -/* JIT compiler related functions. */ - -PCRE_EXP_DECL pcre_jit_stack *pcre_jit_stack_alloc(int, int); -PCRE_EXP_DECL pcre16_jit_stack *pcre16_jit_stack_alloc(int, int); -PCRE_EXP_DECL pcre32_jit_stack *pcre32_jit_stack_alloc(int, int); -PCRE_EXP_DECL void pcre_jit_stack_free(pcre_jit_stack *); -PCRE_EXP_DECL void pcre16_jit_stack_free(pcre16_jit_stack *); -PCRE_EXP_DECL void pcre32_jit_stack_free(pcre32_jit_stack *); -PCRE_EXP_DECL void pcre_assign_jit_stack(pcre_extra *, - pcre_jit_callback, void *); -PCRE_EXP_DECL void pcre16_assign_jit_stack(pcre16_extra *, - pcre16_jit_callback, void *); -PCRE_EXP_DECL void pcre32_assign_jit_stack(pcre32_extra *, - pcre32_jit_callback, void *); - -#ifdef __cplusplus -} /* extern "C" */ -#endif - -#endif /* End of pcre.h */ diff --git a/lcc/pcre2.h b/lcc/pcre2.h new file mode 100644 index 00000000..7ab6b39a --- /dev/null +++ b/lcc/pcre2.h @@ -0,0 +1,991 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* This is the public header file for the PCRE library, second API, to be +#included by applications that call PCRE2 functions. 
+ + Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifndef PCRE2_H_IDEMPOTENT_GUARD +#define PCRE2_H_IDEMPOTENT_GUARD + +/* The current PCRE version information. 
*/ + +#define PCRE2_MAJOR 10 +#define PCRE2_MINOR 37 +#define PCRE2_PRERELEASE +#define PCRE2_DATE 2021-05-26 + +/* When an application links to a PCRE DLL in Windows, the symbols that are +imported have to be identified as such. When building PCRE2, the appropriate +export setting is defined in pcre2_internal.h, which includes this file. So we +don't change existing definitions of PCRE2_EXP_DECL. */ + +#if defined(_WIN32) && !defined(PCRE2_STATIC) +# ifndef PCRE2_EXP_DECL +# define PCRE2_EXP_DECL extern __declspec(dllimport) +# endif +#endif + +/* By default, we use the standard "extern" declarations. */ + +#ifndef PCRE2_EXP_DECL +# ifdef __cplusplus +# define PCRE2_EXP_DECL extern "C" +# else +# define PCRE2_EXP_DECL extern +# endif +#endif + +/* When compiling with the MSVC compiler, it is sometimes necessary to include +a "calling convention" before exported function names. (This is secondhand +information; I know nothing about MSVC myself). For example, something like + + void __cdecl function(....) + +might be needed. In order to make this easy, all the exported functions have +PCRE2_CALL_CONVENTION just before their names. It is rarely needed; if not +set, we ensure here that it has no effect. */ + +#ifndef PCRE2_CALL_CONVENTION +#define PCRE2_CALL_CONVENTION +#endif + +/* Have to include limits.h, stdlib.h, and inttypes.h to ensure that size_t and +uint8_t, UCHAR_MAX, etc are defined. Some systems that do have inttypes.h do +not have stdint.h, which is why we use inttypes.h, which according to the C +standard is a superset of stdint.h. If none of these headers are available, +the relevant values must be provided by some other means. */ + +#include <limits.h> +#include <stdlib.h> +#include <inttypes.h> + +/* Allow for C++ users compiling this directly. */ + +#ifdef __cplusplus +extern "C" { +#endif + +/* The following option bits can be passed to pcre2_compile(), pcre2_match(), +or pcre2_dfa_match(). PCRE2_NO_UTF_CHECK affects only the function to which it +is passed. 
Put these bits at the most significant end of the options word so +others can be added next to them */ + +#define PCRE2_ANCHORED 0x80000000u +#define PCRE2_NO_UTF_CHECK 0x40000000u +#define PCRE2_ENDANCHORED 0x20000000u + +/* The following option bits can be passed only to pcre2_compile(). However, +they may affect compilation, JIT compilation, and/or interpretive execution. +The following tags indicate which: + +C alters what is compiled by pcre2_compile() +J alters what is compiled by pcre2_jit_compile() +M is inspected during pcre2_match() execution +D is inspected during pcre2_dfa_match() execution +*/ + +#define PCRE2_ALLOW_EMPTY_CLASS 0x00000001u /* C */ +#define PCRE2_ALT_BSUX 0x00000002u /* C */ +#define PCRE2_AUTO_CALLOUT 0x00000004u /* C */ +#define PCRE2_CASELESS 0x00000008u /* C */ +#define PCRE2_DOLLAR_ENDONLY 0x00000010u /* J M D */ +#define PCRE2_DOTALL 0x00000020u /* C */ +#define PCRE2_DUPNAMES 0x00000040u /* C */ +#define PCRE2_EXTENDED 0x00000080u /* C */ +#define PCRE2_FIRSTLINE 0x00000100u /* J M D */ +#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u /* C J M */ +#define PCRE2_MULTILINE 0x00000400u /* C */ +#define PCRE2_NEVER_UCP 0x00000800u /* C */ +#define PCRE2_NEVER_UTF 0x00001000u /* C */ +#define PCRE2_NO_AUTO_CAPTURE 0x00002000u /* C */ +#define PCRE2_NO_AUTO_POSSESS 0x00004000u /* C */ +#define PCRE2_NO_DOTSTAR_ANCHOR 0x00008000u /* C */ +#define PCRE2_NO_START_OPTIMIZE 0x00010000u /* J M D */ +#define PCRE2_UCP 0x00020000u /* C J M D */ +#define PCRE2_UNGREEDY 0x00040000u /* C */ +#define PCRE2_UTF 0x00080000u /* C J M D */ +#define PCRE2_NEVER_BACKSLASH_C 0x00100000u /* C */ +#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */ +#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */ +#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */ +#define PCRE2_EXTENDED_MORE 0x01000000u /* C */ +#define PCRE2_LITERAL 0x02000000u /* C */ +#define PCRE2_MATCH_INVALID_UTF 0x04000000u /* J M D */ + +/* An additional compile options word is available in the 
compile context. */ + +#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */ +#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */ +#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */ +#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */ +#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */ +#define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */ + +/* These are for pcre2_jit_compile(). */ + +#define PCRE2_JIT_COMPLETE 0x00000001u /* For full matching */ +#define PCRE2_JIT_PARTIAL_SOFT 0x00000002u +#define PCRE2_JIT_PARTIAL_HARD 0x00000004u +#define PCRE2_JIT_INVALID_UTF 0x00000100u + +/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and +pcre2_substitute(). Some are allowed only for one of the functions, and in +these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and +PCRE2_NO_UTF_CHECK can also be passed to these functions (though +pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */ + +#define PCRE2_NOTBOL 0x00000001u +#define PCRE2_NOTEOL 0x00000002u +#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */ +#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. 
*/ +#define PCRE2_PARTIAL_SOFT 0x00000010u +#define PCRE2_PARTIAL_HARD 0x00000020u +#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */ +#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */ +#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */ +#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */ +#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u +#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u /* pcre2_substitute() only */ + +/* Options for pcre2_pattern_convert(). */ + +#define PCRE2_CONVERT_UTF 0x00000001u +#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u +#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u +#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u +#define PCRE2_CONVERT_GLOB 0x00000010u +#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u +#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u + +/* Newline and \R settings, for use in compile contexts. The newline values +must be kept in step with values set in config.h and both sets must all be +greater than zero. */ + +#define PCRE2_NEWLINE_CR 1 +#define PCRE2_NEWLINE_LF 2 +#define PCRE2_NEWLINE_CRLF 3 +#define PCRE2_NEWLINE_ANY 4 +#define PCRE2_NEWLINE_ANYCRLF 5 +#define PCRE2_NEWLINE_NUL 6 + +#define PCRE2_BSR_UNICODE 1 +#define PCRE2_BSR_ANYCRLF 2 + +/* Error codes for pcre2_compile(). Some of these are also used by +pcre2_pattern_convert(). 
*/ + +#define PCRE2_ERROR_END_BACKSLASH 101 +#define PCRE2_ERROR_END_BACKSLASH_C 102 +#define PCRE2_ERROR_UNKNOWN_ESCAPE 103 +#define PCRE2_ERROR_QUANTIFIER_OUT_OF_ORDER 104 +#define PCRE2_ERROR_QUANTIFIER_TOO_BIG 105 +#define PCRE2_ERROR_MISSING_SQUARE_BRACKET 106 +#define PCRE2_ERROR_ESCAPE_INVALID_IN_CLASS 107 +#define PCRE2_ERROR_CLASS_RANGE_ORDER 108 +#define PCRE2_ERROR_QUANTIFIER_INVALID 109 +#define PCRE2_ERROR_INTERNAL_UNEXPECTED_REPEAT 110 +#define PCRE2_ERROR_INVALID_AFTER_PARENS_QUERY 111 +#define PCRE2_ERROR_POSIX_CLASS_NOT_IN_CLASS 112 +#define PCRE2_ERROR_POSIX_NO_SUPPORT_COLLATING 113 +#define PCRE2_ERROR_MISSING_CLOSING_PARENTHESIS 114 +#define PCRE2_ERROR_BAD_SUBPATTERN_REFERENCE 115 +#define PCRE2_ERROR_NULL_PATTERN 116 +#define PCRE2_ERROR_BAD_OPTIONS 117 +#define PCRE2_ERROR_MISSING_COMMENT_CLOSING 118 +#define PCRE2_ERROR_PARENTHESES_NEST_TOO_DEEP 119 +#define PCRE2_ERROR_PATTERN_TOO_LARGE 120 +#define PCRE2_ERROR_HEAP_FAILED 121 +#define PCRE2_ERROR_UNMATCHED_CLOSING_PARENTHESIS 122 +#define PCRE2_ERROR_INTERNAL_CODE_OVERFLOW 123 +#define PCRE2_ERROR_MISSING_CONDITION_CLOSING 124 +#define PCRE2_ERROR_LOOKBEHIND_NOT_FIXED_LENGTH 125 +#define PCRE2_ERROR_ZERO_RELATIVE_REFERENCE 126 +#define PCRE2_ERROR_TOO_MANY_CONDITION_BRANCHES 127 +#define PCRE2_ERROR_CONDITION_ASSERTION_EXPECTED 128 +#define PCRE2_ERROR_BAD_RELATIVE_REFERENCE 129 +#define PCRE2_ERROR_UNKNOWN_POSIX_CLASS 130 +#define PCRE2_ERROR_INTERNAL_STUDY_ERROR 131 +#define PCRE2_ERROR_UNICODE_NOT_SUPPORTED 132 +#define PCRE2_ERROR_PARENTHESES_STACK_CHECK 133 +#define PCRE2_ERROR_CODE_POINT_TOO_BIG 134 +#define PCRE2_ERROR_LOOKBEHIND_TOO_COMPLICATED 135 +#define PCRE2_ERROR_LOOKBEHIND_INVALID_BACKSLASH_C 136 +#define PCRE2_ERROR_UNSUPPORTED_ESCAPE_SEQUENCE 137 +#define PCRE2_ERROR_CALLOUT_NUMBER_TOO_BIG 138 +#define PCRE2_ERROR_MISSING_CALLOUT_CLOSING 139 +#define PCRE2_ERROR_ESCAPE_INVALID_IN_VERB 140 +#define PCRE2_ERROR_UNRECOGNIZED_AFTER_QUERY_P 141 +#define 
PCRE2_ERROR_MISSING_NAME_TERMINATOR 142 +#define PCRE2_ERROR_DUPLICATE_SUBPATTERN_NAME 143 +#define PCRE2_ERROR_INVALID_SUBPATTERN_NAME 144 +#define PCRE2_ERROR_UNICODE_PROPERTIES_UNAVAILABLE 145 +#define PCRE2_ERROR_MALFORMED_UNICODE_PROPERTY 146 +#define PCRE2_ERROR_UNKNOWN_UNICODE_PROPERTY 147 +#define PCRE2_ERROR_SUBPATTERN_NAME_TOO_LONG 148 +#define PCRE2_ERROR_TOO_MANY_NAMED_SUBPATTERNS 149 +#define PCRE2_ERROR_CLASS_INVALID_RANGE 150 +#define PCRE2_ERROR_OCTAL_BYTE_TOO_BIG 151 +#define PCRE2_ERROR_INTERNAL_OVERRAN_WORKSPACE 152 +#define PCRE2_ERROR_INTERNAL_MISSING_SUBPATTERN 153 +#define PCRE2_ERROR_DEFINE_TOO_MANY_BRANCHES 154 +#define PCRE2_ERROR_BACKSLASH_O_MISSING_BRACE 155 +#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156 +#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157 +#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158 +/* Error 159 is obsolete and should now never occur */ +#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159 +#define PCRE2_ERROR_VERB_UNKNOWN 160 +#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161 +#define PCRE2_ERROR_SUBPATTERN_NAME_EXPECTED 162 +#define PCRE2_ERROR_INTERNAL_PARSED_OVERFLOW 163 +#define PCRE2_ERROR_INVALID_OCTAL 164 +#define PCRE2_ERROR_SUBPATTERN_NAMES_MISMATCH 165 +#define PCRE2_ERROR_MARK_MISSING_ARGUMENT 166 +#define PCRE2_ERROR_INVALID_HEXADECIMAL 167 +#define PCRE2_ERROR_BACKSLASH_C_SYNTAX 168 +#define PCRE2_ERROR_BACKSLASH_K_SYNTAX 169 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_LOOKBEHINDS 170 +#define PCRE2_ERROR_BACKSLASH_N_IN_CLASS 171 +#define PCRE2_ERROR_CALLOUT_STRING_TOO_LONG 172 +#define PCRE2_ERROR_UNICODE_DISALLOWED_CODE_POINT 173 +#define PCRE2_ERROR_UTF_IS_DISABLED 174 +#define PCRE2_ERROR_UCP_IS_DISABLED 175 +#define PCRE2_ERROR_VERB_NAME_TOO_LONG 176 +#define PCRE2_ERROR_BACKSLASH_U_CODE_POINT_TOO_BIG 177 +#define PCRE2_ERROR_MISSING_OCTAL_OR_HEX_DIGITS 178 +#define PCRE2_ERROR_VERSION_CONDITION_SYNTAX 179 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_AUTO_POSSESS 180 +#define 
PCRE2_ERROR_CALLOUT_NO_STRING_DELIMITER 181 +#define PCRE2_ERROR_CALLOUT_BAD_STRING_DELIMITER 182 +#define PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED 183 +#define PCRE2_ERROR_QUERY_BARJX_NEST_TOO_DEEP 184 +#define PCRE2_ERROR_BACKSLASH_C_LIBRARY_DISABLED 185 +#define PCRE2_ERROR_PATTERN_TOO_COMPLICATED 186 +#define PCRE2_ERROR_LOOKBEHIND_TOO_LONG 187 +#define PCRE2_ERROR_PATTERN_STRING_TOO_LONG 188 +#define PCRE2_ERROR_INTERNAL_BAD_CODE 189 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP 190 +#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191 +#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192 +#define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193 +#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194 +#define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN 195 +#define PCRE2_ERROR_SCRIPT_RUN_NOT_AVAILABLE 196 +#define PCRE2_ERROR_TOO_MANY_CAPTURES 197 +#define PCRE2_ERROR_CONDITION_ATOMIC_ASSERTION_EXPECTED 198 + + +/* "Expected" matching error codes: no match and partial match. */ + +#define PCRE2_ERROR_NOMATCH (-1) +#define PCRE2_ERROR_PARTIAL (-2) + +/* Error codes for UTF-8 validity checks */ + +#define PCRE2_ERROR_UTF8_ERR1 (-3) +#define PCRE2_ERROR_UTF8_ERR2 (-4) +#define PCRE2_ERROR_UTF8_ERR3 (-5) +#define PCRE2_ERROR_UTF8_ERR4 (-6) +#define PCRE2_ERROR_UTF8_ERR5 (-7) +#define PCRE2_ERROR_UTF8_ERR6 (-8) +#define PCRE2_ERROR_UTF8_ERR7 (-9) +#define PCRE2_ERROR_UTF8_ERR8 (-10) +#define PCRE2_ERROR_UTF8_ERR9 (-11) +#define PCRE2_ERROR_UTF8_ERR10 (-12) +#define PCRE2_ERROR_UTF8_ERR11 (-13) +#define PCRE2_ERROR_UTF8_ERR12 (-14) +#define PCRE2_ERROR_UTF8_ERR13 (-15) +#define PCRE2_ERROR_UTF8_ERR14 (-16) +#define PCRE2_ERROR_UTF8_ERR15 (-17) +#define PCRE2_ERROR_UTF8_ERR16 (-18) +#define PCRE2_ERROR_UTF8_ERR17 (-19) +#define PCRE2_ERROR_UTF8_ERR18 (-20) +#define PCRE2_ERROR_UTF8_ERR19 (-21) +#define PCRE2_ERROR_UTF8_ERR20 (-22) +#define PCRE2_ERROR_UTF8_ERR21 (-23) + +/* Error codes for UTF-16 validity checks */ + +#define PCRE2_ERROR_UTF16_ERR1 (-24) +#define PCRE2_ERROR_UTF16_ERR2 (-25) +#define 
PCRE2_ERROR_UTF16_ERR3 (-26) + +/* Error codes for UTF-32 validity checks */ + +#define PCRE2_ERROR_UTF32_ERR1 (-27) +#define PCRE2_ERROR_UTF32_ERR2 (-28) + +/* Miscellaneous error codes for pcre2[_dfa]_match(), substring extraction +functions, context functions, and serializing functions. They are in numerical +order. Originally they were in alphabetical order too, but now that PCRE2 is +released, the numbers must not be changed. */ + +#define PCRE2_ERROR_BADDATA (-29) +#define PCRE2_ERROR_MIXEDTABLES (-30) /* Name was changed */ +#define PCRE2_ERROR_BADMAGIC (-31) +#define PCRE2_ERROR_BADMODE (-32) +#define PCRE2_ERROR_BADOFFSET (-33) +#define PCRE2_ERROR_BADOPTION (-34) +#define PCRE2_ERROR_BADREPLACEMENT (-35) +#define PCRE2_ERROR_BADUTFOFFSET (-36) +#define PCRE2_ERROR_CALLOUT (-37) /* Never used by PCRE2 itself */ +#define PCRE2_ERROR_DFA_BADRESTART (-38) +#define PCRE2_ERROR_DFA_RECURSE (-39) +#define PCRE2_ERROR_DFA_UCOND (-40) +#define PCRE2_ERROR_DFA_UFUNC (-41) +#define PCRE2_ERROR_DFA_UITEM (-42) +#define PCRE2_ERROR_DFA_WSSIZE (-43) +#define PCRE2_ERROR_INTERNAL (-44) +#define PCRE2_ERROR_JIT_BADOPTION (-45) +#define PCRE2_ERROR_JIT_STACKLIMIT (-46) +#define PCRE2_ERROR_MATCHLIMIT (-47) +#define PCRE2_ERROR_NOMEMORY (-48) +#define PCRE2_ERROR_NOSUBSTRING (-49) +#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50) +#define PCRE2_ERROR_NULL (-51) +#define PCRE2_ERROR_RECURSELOOP (-52) +#define PCRE2_ERROR_DEPTHLIMIT (-53) +#define PCRE2_ERROR_RECURSIONLIMIT (-53) /* Obsolete synonym */ +#define PCRE2_ERROR_UNAVAILABLE (-54) +#define PCRE2_ERROR_UNSET (-55) +#define PCRE2_ERROR_BADOFFSETLIMIT (-56) +#define PCRE2_ERROR_BADREPESCAPE (-57) +#define PCRE2_ERROR_REPMISSINGBRACE (-58) +#define PCRE2_ERROR_BADSUBSTITUTION (-59) +#define PCRE2_ERROR_BADSUBSPATTERN (-60) +#define PCRE2_ERROR_TOOMANYREPLACE (-61) +#define PCRE2_ERROR_BADSERIALIZEDDATA (-62) +#define PCRE2_ERROR_HEAPLIMIT (-63) +#define PCRE2_ERROR_CONVERT_SYNTAX (-64) +#define PCRE2_ERROR_INTERNAL_DUPMATCH 
(-65) +#define PCRE2_ERROR_DFA_UINVALID_UTF (-66) + + +/* Request types for pcre2_pattern_info() */ + +#define PCRE2_INFO_ALLOPTIONS 0 +#define PCRE2_INFO_ARGOPTIONS 1 +#define PCRE2_INFO_BACKREFMAX 2 +#define PCRE2_INFO_BSR 3 +#define PCRE2_INFO_CAPTURECOUNT 4 +#define PCRE2_INFO_FIRSTCODEUNIT 5 +#define PCRE2_INFO_FIRSTCODETYPE 6 +#define PCRE2_INFO_FIRSTBITMAP 7 +#define PCRE2_INFO_HASCRORLF 8 +#define PCRE2_INFO_JCHANGED 9 +#define PCRE2_INFO_JITSIZE 10 +#define PCRE2_INFO_LASTCODEUNIT 11 +#define PCRE2_INFO_LASTCODETYPE 12 +#define PCRE2_INFO_MATCHEMPTY 13 +#define PCRE2_INFO_MATCHLIMIT 14 +#define PCRE2_INFO_MAXLOOKBEHIND 15 +#define PCRE2_INFO_MINLENGTH 16 +#define PCRE2_INFO_NAMECOUNT 17 +#define PCRE2_INFO_NAMEENTRYSIZE 18 +#define PCRE2_INFO_NAMETABLE 19 +#define PCRE2_INFO_NEWLINE 20 +#define PCRE2_INFO_DEPTHLIMIT 21 +#define PCRE2_INFO_RECURSIONLIMIT 21 /* Obsolete synonym */ +#define PCRE2_INFO_SIZE 22 +#define PCRE2_INFO_HASBACKSLASHC 23 +#define PCRE2_INFO_FRAMESIZE 24 +#define PCRE2_INFO_HEAPLIMIT 25 +#define PCRE2_INFO_EXTRAOPTIONS 26 + +/* Request types for pcre2_config(). */ + +#define PCRE2_CONFIG_BSR 0 +#define PCRE2_CONFIG_JIT 1 +#define PCRE2_CONFIG_JITTARGET 2 +#define PCRE2_CONFIG_LINKSIZE 3 +#define PCRE2_CONFIG_MATCHLIMIT 4 +#define PCRE2_CONFIG_NEWLINE 5 +#define PCRE2_CONFIG_PARENSLIMIT 6 +#define PCRE2_CONFIG_DEPTHLIMIT 7 +#define PCRE2_CONFIG_RECURSIONLIMIT 7 /* Obsolete synonym */ +#define PCRE2_CONFIG_STACKRECURSE 8 /* Obsolete */ +#define PCRE2_CONFIG_UNICODE 9 +#define PCRE2_CONFIG_UNICODE_VERSION 10 +#define PCRE2_CONFIG_VERSION 11 +#define PCRE2_CONFIG_HEAPLIMIT 12 +#define PCRE2_CONFIG_NEVER_BACKSLASH_C 13 +#define PCRE2_CONFIG_COMPILED_WIDTHS 14 +#define PCRE2_CONFIG_TABLES_LENGTH 15 + + +/* Types for code units in patterns and subject strings. 
*/ + +typedef uint8_t PCRE2_UCHAR8; +typedef uint16_t PCRE2_UCHAR16; +typedef uint32_t PCRE2_UCHAR32; + +typedef const PCRE2_UCHAR8 *PCRE2_SPTR8; +typedef const PCRE2_UCHAR16 *PCRE2_SPTR16; +typedef const PCRE2_UCHAR32 *PCRE2_SPTR32; + +/* The PCRE2_SIZE type is used for all string lengths and offsets in PCRE2, +including pattern offsets for errors and subject offsets after a match. We +define special values to indicate zero-terminated strings and unset offsets in +the offset vector (ovector). */ + +#define PCRE2_SIZE size_t +#define PCRE2_SIZE_MAX SIZE_MAX +#define PCRE2_ZERO_TERMINATED (~(PCRE2_SIZE)0) +#define PCRE2_UNSET (~(PCRE2_SIZE)0) + +/* Generic types for opaque structures and JIT callback functions. These +declarations are defined in a macro that is expanded for each width later. */ + +#define PCRE2_TYPES_LIST \ +struct pcre2_real_general_context; \ +typedef struct pcre2_real_general_context pcre2_general_context; \ +\ +struct pcre2_real_compile_context; \ +typedef struct pcre2_real_compile_context pcre2_compile_context; \ +\ +struct pcre2_real_match_context; \ +typedef struct pcre2_real_match_context pcre2_match_context; \ +\ +struct pcre2_real_convert_context; \ +typedef struct pcre2_real_convert_context pcre2_convert_context; \ +\ +struct pcre2_real_code; \ +typedef struct pcre2_real_code pcre2_code; \ +\ +struct pcre2_real_match_data; \ +typedef struct pcre2_real_match_data pcre2_match_data; \ +\ +struct pcre2_real_jit_stack; \ +typedef struct pcre2_real_jit_stack pcre2_jit_stack; \ +\ +typedef pcre2_jit_stack *(*pcre2_jit_callback)(void *); + + +/* The structures for passing out data via callout functions. We use structures +so that new fields can be added on the end in future versions, without changing +the API of the function, thereby allowing old clients to work without +modification. Define the generic versions in a macro; the width-specific +versions are generated from this macro below. */ + +/* Flags for the callout_flags field. 
These are cleared after a callout. */ + +#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */ +#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */ + +#define PCRE2_STRUCTURE_LIST \ +typedef struct pcre2_callout_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + uint32_t callout_number; /* Number compiled into pattern */ \ + uint32_t capture_top; /* Max current capture */ \ + uint32_t capture_last; /* Most recently closed capture */ \ + PCRE2_SIZE *offset_vector; /* The offset vector */ \ + PCRE2_SPTR mark; /* Pointer to current mark or NULL */ \ + PCRE2_SPTR subject; /* The subject being matched */ \ + PCRE2_SIZE subject_length; /* The length of the subject */ \ + PCRE2_SIZE start_match; /* Offset to start of this match attempt */ \ + PCRE2_SIZE current_position; /* Where we currently are in the subject */ \ + PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \ + PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ + /* ------------------- Added for Version 1 -------------------------- */ \ + PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ + PCRE2_SPTR callout_string; /* String compiled into pattern */ \ + /* ------------------- Added for Version 2 -------------------------- */ \ + uint32_t callout_flags; /* See above for list */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_callout_block; \ +\ +typedef struct pcre2_callout_enumerate_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \ + PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ + uint32_t callout_number; 
/* Number compiled into pattern */ \ + PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ + PCRE2_SPTR callout_string; /* String compiled into pattern */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_callout_enumerate_block; \ +\ +typedef struct pcre2_substitute_callout_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + PCRE2_SPTR input; /* Pointer to input subject string */ \ + PCRE2_SPTR output; /* Pointer to output buffer */ \ + PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \ + PCRE2_SIZE *ovector; /* Pointer to current ovector */ \ + uint32_t oveccount; /* Count of pairs set in ovector */ \ + uint32_t subscount; /* Substitution number */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_substitute_callout_block; + + +/* List the generic forms of all other functions in macros, which will be +expanded for each width below. Start with functions that give general +information. */ + +#define PCRE2_GENERAL_INFO_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION pcre2_config(uint32_t, void *); + + +/* Functions for manipulating contexts. 
*/ + +#define PCRE2_GENERAL_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_general_context PCRE2_CALL_CONVENTION \ + *pcre2_general_context_copy(pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_general_context PCRE2_CALL_CONVENTION \ + *pcre2_general_context_create(void *(*)(PCRE2_SIZE, void *), \ + void (*)(void *, void *), void *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_general_context_free(pcre2_general_context *); + +#define PCRE2_COMPILE_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_compile_context PCRE2_CALL_CONVENTION \ + *pcre2_compile_context_copy(pcre2_compile_context *); \ +PCRE2_EXP_DECL pcre2_compile_context PCRE2_CALL_CONVENTION \ + *pcre2_compile_context_create(pcre2_general_context *);\ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_compile_context_free(pcre2_compile_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_bsr(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_character_tables(pcre2_compile_context *, const uint8_t *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_newline(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_parens_nest_limit(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_compile_recursion_guard(pcre2_compile_context *, \ + int (*)(uint32_t, void *), void *); + +#define PCRE2_MATCH_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_match_context PCRE2_CALL_CONVENTION \ + *pcre2_match_context_copy(pcre2_match_context *); \ +PCRE2_EXP_DECL pcre2_match_context PCRE2_CALL_CONVENTION \ + *pcre2_match_context_create(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_match_context_free(pcre2_match_context *); 
\ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_callout(pcre2_match_context *, \ + int (*)(pcre2_callout_block *, void *), void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_substitute_callout(pcre2_match_context *, \ + int (*)(pcre2_substitute_callout_block *, void *), void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_heap_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_match_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_offset_limit(pcre2_match_context *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_recursion_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_recursion_memory_management(pcre2_match_context *, \ + void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *); + +#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \ + *pcre2_convert_context_copy(pcre2_convert_context *); \ +PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \ + *pcre2_convert_context_create(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_convert_context_free(pcre2_convert_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_glob_separator(pcre2_convert_context *, uint32_t); + + +/* Functions concerned with compiling a pattern to PCRE internal code. 
*/ + +#define PCRE2_COMPILE_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_compile(PCRE2_SPTR, PCRE2_SIZE, uint32_t, int *, PCRE2_SIZE *, \ + pcre2_compile_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_code_free(pcre2_code *); \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_code_copy(const pcre2_code *); \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_code_copy_with_tables(const pcre2_code *); + + +/* Functions that give information about a compiled pattern. */ + +#define PCRE2_PATTERN_INFO_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_pattern_info(const pcre2_code *, uint32_t, void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_callout_enumerate(const pcre2_code *, \ + int (*)(pcre2_callout_enumerate_block *, void *), void *); + + +/* Functions for running a match and inspecting the result. */ + +#define PCRE2_MATCH_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_match_data PCRE2_CALL_CONVENTION \ + *pcre2_match_data_create(uint32_t, pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_match_data PCRE2_CALL_CONVENTION \ + *pcre2_match_data_create_from_pattern(const pcre2_code *, \ + pcre2_general_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_dfa_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *, int *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_match_data_free(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SPTR PCRE2_CALL_CONVENTION \ + pcre2_get_mark(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \ + pcre2_get_match_data_size(pcre2_match_data *); \ +PCRE2_EXP_DECL uint32_t PCRE2_CALL_CONVENTION \ + pcre2_get_ovector_count(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE 
PCRE2_CALL_CONVENTION \ + *pcre2_get_ovector_pointer(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \ + pcre2_get_startchar(pcre2_match_data *); + + +/* Convenience functions for handling matched substrings. */ + +#define PCRE2_SUBSTRING_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_copy_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_UCHAR *, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_copy_bynumber(pcre2_match_data *, uint32_t, PCRE2_UCHAR *, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_substring_free(PCRE2_UCHAR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_get_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_UCHAR **, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_get_bynumber(pcre2_match_data *, uint32_t, PCRE2_UCHAR **, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_length_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_length_bynumber(pcre2_match_data *, uint32_t, PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_nametable_scan(const pcre2_code *, PCRE2_SPTR, PCRE2_SPTR *, \ + PCRE2_SPTR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_number_from_name(const pcre2_code *, PCRE2_SPTR); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_substring_list_free(PCRE2_SPTR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_list_get(pcre2_match_data *, PCRE2_UCHAR ***, PCRE2_SIZE **); + +/* Functions for serializing / deserializing compiled patterns. 
*/ + +#define PCRE2_SERIALIZE_FUNCTIONS \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_encode(const pcre2_code **, int32_t, uint8_t **, \ + PCRE2_SIZE *, pcre2_general_context *); \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_decode(pcre2_code **, int32_t, const uint8_t *, \ + pcre2_general_context *); \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_get_number_of_codes(const uint8_t *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_serialize_free(uint8_t *); + + +/* Convenience function for match + substitute. */ + +#define PCRE2_SUBSTITUTE_FUNCTION \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substitute(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *, PCRE2_SPTR, \ + PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *); + + +/* Functions for converting pattern source strings. */ + +#define PCRE2_CONVERT_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \ + PCRE2_SIZE *, pcre2_convert_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_converted_pattern_free(PCRE2_UCHAR *); + + +/* Functions for JIT processing */ + +#define PCRE2_JIT_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_jit_compile(pcre2_code *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_jit_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_jit_free_unused_memory(pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_jit_stack PCRE2_CALL_CONVENTION \ + *pcre2_jit_stack_create(PCRE2_SIZE, PCRE2_SIZE, pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_jit_stack_assign(pcre2_match_context *, pcre2_jit_callback, void *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + 
pcre2_jit_stack_free(pcre2_jit_stack *); + + +/* Other miscellaneous functions. */ + +#define PCRE2_OTHER_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_get_error_message(int, PCRE2_UCHAR *, PCRE2_SIZE); \ +PCRE2_EXP_DECL const uint8_t PCRE2_CALL_CONVENTION \ + *pcre2_maketables(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_maketables_free(pcre2_general_context *, const uint8_t *); + +/* Define macros that generate width-specific names from generic versions. The +three-level macro scheme is necessary to get the macros expanded when we want +them to be. First we get the width from PCRE2_LOCAL_WIDTH, which is used for +generating three versions of everything below. After that, PCRE2_SUFFIX will be +re-defined to use PCRE2_CODE_UNIT_WIDTH, for use when macros such as +pcre2_compile are called by application code. */ + +#define PCRE2_JOIN(a,b) a ## b +#define PCRE2_GLUE(a,b) PCRE2_JOIN(a,b) +#define PCRE2_SUFFIX(a) PCRE2_GLUE(a,PCRE2_LOCAL_WIDTH) + + +/* Data types */ + +#define PCRE2_UCHAR PCRE2_SUFFIX(PCRE2_UCHAR) +#define PCRE2_SPTR PCRE2_SUFFIX(PCRE2_SPTR) + +#define pcre2_code PCRE2_SUFFIX(pcre2_code_) +#define pcre2_jit_callback PCRE2_SUFFIX(pcre2_jit_callback_) +#define pcre2_jit_stack PCRE2_SUFFIX(pcre2_jit_stack_) + +#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_) +#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_) +#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_) +#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_) +#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_) +#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_) +#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_) + + +/* Data blocks */ + +#define pcre2_callout_block PCRE2_SUFFIX(pcre2_callout_block_) +#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_) +#define 
pcre2_substitute_callout_block PCRE2_SUFFIX(pcre2_substitute_callout_block_) +#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_) +#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_) +#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_) +#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_) +#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_) + + +/* Functions: the complete list in alphabetical order */ + +#define pcre2_callout_enumerate PCRE2_SUFFIX(pcre2_callout_enumerate_) +#define pcre2_code_copy PCRE2_SUFFIX(pcre2_code_copy_) +#define pcre2_code_copy_with_tables PCRE2_SUFFIX(pcre2_code_copy_with_tables_) +#define pcre2_code_free PCRE2_SUFFIX(pcre2_code_free_) +#define pcre2_compile PCRE2_SUFFIX(pcre2_compile_) +#define pcre2_compile_context_copy PCRE2_SUFFIX(pcre2_compile_context_copy_) +#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_) +#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_) +#define pcre2_config PCRE2_SUFFIX(pcre2_config_) +#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_) +#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_) +#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_) +#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_) +#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_) +#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_) +#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_) +#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_) +#define pcre2_get_error_message PCRE2_SUFFIX(pcre2_get_error_message_) +#define pcre2_get_mark PCRE2_SUFFIX(pcre2_get_mark_) +#define pcre2_get_match_data_size PCRE2_SUFFIX(pcre2_get_match_data_size_) +#define pcre2_get_ovector_pointer PCRE2_SUFFIX(pcre2_get_ovector_pointer_) +#define pcre2_get_ovector_count 
PCRE2_SUFFIX(pcre2_get_ovector_count_) +#define pcre2_get_startchar PCRE2_SUFFIX(pcre2_get_startchar_) +#define pcre2_jit_compile PCRE2_SUFFIX(pcre2_jit_compile_) +#define pcre2_jit_match PCRE2_SUFFIX(pcre2_jit_match_) +#define pcre2_jit_free_unused_memory PCRE2_SUFFIX(pcre2_jit_free_unused_memory_) +#define pcre2_jit_stack_assign PCRE2_SUFFIX(pcre2_jit_stack_assign_) +#define pcre2_jit_stack_create PCRE2_SUFFIX(pcre2_jit_stack_create_) +#define pcre2_jit_stack_free PCRE2_SUFFIX(pcre2_jit_stack_free_) +#define pcre2_maketables PCRE2_SUFFIX(pcre2_maketables_) +#define pcre2_maketables_free PCRE2_SUFFIX(pcre2_maketables_free_) +#define pcre2_match PCRE2_SUFFIX(pcre2_match_) +#define pcre2_match_context_copy PCRE2_SUFFIX(pcre2_match_context_copy_) +#define pcre2_match_context_create PCRE2_SUFFIX(pcre2_match_context_create_) +#define pcre2_match_context_free PCRE2_SUFFIX(pcre2_match_context_free_) +#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_) +#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_) +#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_) +#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_) +#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_) +#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_) +#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_) +#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_) +#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_) +#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_) +#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_) +#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_) +#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_) +#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_) +#define pcre2_set_depth_limit 
PCRE2_SUFFIX(pcre2_set_depth_limit_) +#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_) +#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_) +#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_) +#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_) +#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_) +#define pcre2_set_newline PCRE2_SUFFIX(pcre2_set_newline_) +#define pcre2_set_parens_nest_limit PCRE2_SUFFIX(pcre2_set_parens_nest_limit_) +#define pcre2_set_offset_limit PCRE2_SUFFIX(pcre2_set_offset_limit_) +#define pcre2_set_substitute_callout PCRE2_SUFFIX(pcre2_set_substitute_callout_) +#define pcre2_substitute PCRE2_SUFFIX(pcre2_substitute_) +#define pcre2_substring_copy_byname PCRE2_SUFFIX(pcre2_substring_copy_byname_) +#define pcre2_substring_copy_bynumber PCRE2_SUFFIX(pcre2_substring_copy_bynumber_) +#define pcre2_substring_free PCRE2_SUFFIX(pcre2_substring_free_) +#define pcre2_substring_get_byname PCRE2_SUFFIX(pcre2_substring_get_byname_) +#define pcre2_substring_get_bynumber PCRE2_SUFFIX(pcre2_substring_get_bynumber_) +#define pcre2_substring_length_byname PCRE2_SUFFIX(pcre2_substring_length_byname_) +#define pcre2_substring_length_bynumber PCRE2_SUFFIX(pcre2_substring_length_bynumber_) +#define pcre2_substring_list_get PCRE2_SUFFIX(pcre2_substring_list_get_) +#define pcre2_substring_list_free PCRE2_SUFFIX(pcre2_substring_list_free_) +#define pcre2_substring_nametable_scan PCRE2_SUFFIX(pcre2_substring_nametable_scan_) +#define pcre2_substring_number_from_name PCRE2_SUFFIX(pcre2_substring_number_from_name_) + +/* Keep this old function name for backwards compatibility */ +#define pcre2_set_recursion_limit PCRE2_SUFFIX(pcre2_set_recursion_limit_) + +/* Keep this obsolete function for backwards compatibility: it is now a noop. 
*/ +#define pcre2_set_recursion_memory_management PCRE2_SUFFIX(pcre2_set_recursion_memory_management_) + +/* Now generate all three sets of width-specific structures and function +prototypes. */ + +#define PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS \ +PCRE2_TYPES_LIST \ +PCRE2_STRUCTURE_LIST \ +PCRE2_GENERAL_INFO_FUNCTIONS \ +PCRE2_GENERAL_CONTEXT_FUNCTIONS \ +PCRE2_COMPILE_CONTEXT_FUNCTIONS \ +PCRE2_CONVERT_CONTEXT_FUNCTIONS \ +PCRE2_CONVERT_FUNCTIONS \ +PCRE2_MATCH_CONTEXT_FUNCTIONS \ +PCRE2_COMPILE_FUNCTIONS \ +PCRE2_PATTERN_INFO_FUNCTIONS \ +PCRE2_MATCH_FUNCTIONS \ +PCRE2_SUBSTRING_FUNCTIONS \ +PCRE2_SERIALIZE_FUNCTIONS \ +PCRE2_SUBSTITUTE_FUNCTION \ +PCRE2_JIT_FUNCTIONS \ +PCRE2_OTHER_FUNCTIONS + +#define PCRE2_LOCAL_WIDTH 8 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +#define PCRE2_LOCAL_WIDTH 16 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +#define PCRE2_LOCAL_WIDTH 32 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +/* Undefine the list macros; they are no longer needed. */ + +#undef PCRE2_TYPES_LIST +#undef PCRE2_STRUCTURE_LIST +#undef PCRE2_GENERAL_INFO_FUNCTIONS +#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS +#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS +#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS +#undef PCRE2_MATCH_CONTEXT_FUNCTIONS +#undef PCRE2_COMPILE_FUNCTIONS +#undef PCRE2_PATTERN_INFO_FUNCTIONS +#undef PCRE2_MATCH_FUNCTIONS +#undef PCRE2_SUBSTRING_FUNCTIONS +#undef PCRE2_SERIALIZE_FUNCTIONS +#undef PCRE2_SUBSTITUTE_FUNCTION +#undef PCRE2_JIT_FUNCTIONS +#undef PCRE2_OTHER_FUNCTIONS +#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS + +/* PCRE2_CODE_UNIT_WIDTH must be defined. If it is 8, 16, or 32, redefine +PCRE2_SUFFIX to use it. If it is 0, undefine the other macros and make +PCRE2_SUFFIX a no-op. Otherwise, generate an error. */ + +#undef PCRE2_SUFFIX +#ifndef PCRE2_CODE_UNIT_WIDTH +#error PCRE2_CODE_UNIT_WIDTH must be defined before including pcre2.h. +#error Use 8, 16, or 32; or 0 for a multi-width application. 
+#else /* PCRE2_CODE_UNIT_WIDTH is defined */ +#if PCRE2_CODE_UNIT_WIDTH == 8 || \ + PCRE2_CODE_UNIT_WIDTH == 16 || \ + PCRE2_CODE_UNIT_WIDTH == 32 +#define PCRE2_SUFFIX(a) PCRE2_GLUE(a, PCRE2_CODE_UNIT_WIDTH) +#elif PCRE2_CODE_UNIT_WIDTH == 0 +#undef PCRE2_JOIN +#undef PCRE2_GLUE +#define PCRE2_SUFFIX(a) a +#else +#error PCRE2_CODE_UNIT_WIDTH must be 0, 8, 16, or 32. +#endif +#endif /* PCRE2_CODE_UNIT_WIDTH is defined */ + +#ifdef __cplusplus +} /* extern "C" */ +#endif + +#endif /* PCRE2_H_IDEMPOTENT_GUARD */ + +/* End of pcre2.h */ diff --git a/libcgi/.indent.pro b/libcgi/.indent.pro deleted file mode 100644 index 632288da..00000000 --- a/libcgi/.indent.pro +++ /dev/null @@ -1,6 +0,0 @@ --nbad -bap -nbbo -nbc -br -brs -c33 -cd33 -ncdb -nce -ci4 --cp33 -ncs -d0 -di1 -nfc1 -nfca -hnl -i4 -ip0 -l75 -lp -npcs --npsl -nsc -nsob -nss --Tform_entry --Tcgi_info --TFILE diff --git a/libcgi/Makefile.in b/libcgi/Makefile.in deleted file mode 100644 index 30409dd6..00000000 --- a/libcgi/Makefile.in +++ /dev/null @@ -1,45 +0,0 @@ -# CGI virdoc library makefile - -#WNOERROR=-Werror -#WARNINGS=$(WNOERROR) -ansi -pedantic -Wall -Wtraditional -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline -Dlint - -# You shouldn't have to edit anything else. -CC=@CC@ $(WARNINGS) -INSTALL=@INSTALL@ -AUX_LIBS= -RANLIB=@RANLIB@ -RM=rm -LINT=lint - -SRCS=form_ent.c get_cgi_info.c main.c syn_mime.c syn_url.c mcode.c\ - form_tags.c strops.c html.c - -.c.o: - $(CC) -c $(CFLAGS) $(AUX_CFLAGS) $(INF_INCS) $< - -ALL = libcgi.a - -%.o: %.c - $(CC) -c $(CFLAGS) $(AUX_CFLAGS) $(INF_INCS) $< - -all: $(ALL) - -libcgi.a: form_ent.o get_cgi_info.o main.o syn_mime.o syn_url.o mcode.o\ - form_tags.o strops.o html.o - ar r $@ $? 
- ${RANLIB} $@ - -install: $(ALL) - -clean: - -$(RM) -f *.o *~ *.a - -clobber: clean -distclean: clean - -insight: clean - $(MAKE) CC="insight" - -lint: - $(LINT) $(LINTFLAGS) $(SRCS) 2>&1 | tee lint.out - diff --git a/libcgi/cgi.h b/libcgi/cgi.h deleted file mode 100644 index 9674f579..00000000 --- a/libcgi/cgi.h +++ /dev/null @@ -1,119 +0,0 @@ -/** - ** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. - ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. - ** Kevin Hughes, kev@kevcom.com 3/11/94 - ** Kent Landfield, kent@landfield.com 4/6/97 - ** - ** This program and library is free software; you can redistribute it and/or - ** modify it under the terms of the GNU (Library) General Public License - ** as published by the Free Software Foundation; either version 2 - ** of the License, or any later version. - ** - ** This program is distributed in the hope that it will be useful, - ** but WITHOUT ANY WARRANTY; without even the implied warranty of - ** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - ** GNU (Library) General Public License for more details. 
- ** - ** You should have received a copy of the GNU (Library) General Public License - ** along with this program; if not, write to the Free Software - ** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA - **/ - -/** - ** This file is part of the LIBCGI library - ** - */ - -#ifndef CGI_H -#define CGI_H - -#include -#include -#include -#include -#include "../config.h" - -#ifdef NO_MACRO -#undef isspace -#undef isdigit -#undef isalpha -#undef isupper -#undef islower -#undef isxdigit -#undef isalnum -#undef ispunct -#undef isprint -#undef isgraph -#undef iscntrl -#undef isascii -#endif - - - -#define MCODE_GET 1 -#define MCODE_POST 2 -#define MCODE_PUT 3 -#define MCODE_HEAD 4 - -typedef struct { - char *server_software; - char *server_name; - char *gateway_interface; - char *server_protocol; - char *server_port; - char *request_method; - char *http_accept; - char *path_info; - char *path_translated; - char *script_name; - char *query_string; - char *remote_host; - char *remote_addr; - char *remote_user; - char *auth_type; - char *remote_ident; - char *content_type; - int content_length; -} cgi_info; - -typedef struct festruct { - char *name; - char *val; - struct festruct *next; -} form_entry; - - -/* Prototypes */ -/* ---------- */ - -int get_cgi_info(cgi_info *); -int syn_base_url(char *, cgi_info *); -int syn_mimeheader(char *, char *); -int mcode(cgi_info *); -char *trim(char *); -char *strmaxcpy(char *, char *, int); -char *sanitize(char *, char *); -char *parmval(form_entry *, char *); -int print_base_url(cgi_info *); -int print_mimeheader(const char *); -void print_doc_begin(char *); -void print_doc_end(char *); -void print_logo(void); -void print_sel_list(char *, char **, char *); -void print_submit(char *); -void print_input_blank(char *, unsigned int, unsigned int, char *); -form_entry *get_form_entries(cgi_info *); -void free_form_entries(form_entry *); -form_entry *get_fes_from_string(char *); -form_entry *get_fes_from_stream(int, 
FILE *); -unsigned char dd2c(char, char); -void dump_cgi_info(cgi_info *); - -#ifdef lint -extern int isspace(int); -extern int isalnum(int); - -extern int strcasecmp(const char *, const char *); -extern int strncasecmp(const char *, const char *, size_t); -#endif -#endif diff --git a/libcgi/form_ent.c b/libcgi/form_ent.c deleted file mode 100644 index 5022d432..00000000 --- a/libcgi/form_ent.c +++ /dev/null @@ -1,290 +0,0 @@ -/** - ** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. - ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. - ** Kevin Hughes, kev@kevcom.com 3/11/94 - ** Kent Landfield, kent@landfield.com 4/6/97 - ** - ** This program and library is free software; you can redistribute it and/or - ** modify it under the terms of the GNU (Library) General Public License - ** as published by the Free Software Foundation; either version 2 - ** of the License, or any later version. - ** - ** This program is distributed in the hope that it will be useful, - ** but WITHOUT ANY WARRANTY; without even the implied warranty of - ** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - ** GNU (Library) General Public License for more details. 
- ** - ** You should have received a copy of the GNU (Library) General Public License - ** along with this program; if not, write to the Free Software - ** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA - */ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -#define LF 10 -#define CR 13 - -#define STARTSIZE 8 - -form_entry *get_form_entries(cgi_info *ci) -{ - if (ci && ci->request_method && - !strncasecmp(ci->request_method, "POST", 4) && ci->content_type && - !strncasecmp(ci->content_type, "application/x-www-form-urlencoded", - 33)) return get_fes_from_stream(ci->content_length, - stdin); - else if (ci && ci->request_method && - !strncasecmp(ci->request_method, "GET", 3)) - return get_fes_from_string(ci->query_string); - else - return NULL; -} - -void free_form_entries(form_entry *fe) -{ - form_entry *tempfe; - while (fe) { - if (fe->name) - free(fe->name); - if (fe->val) - free(fe->val); - tempfe = fe->next; - free(fe); - fe = tempfe; - } -} - -char *parmval(form_entry *fe, char *s) -{ - while (fe) { - if (!strcasecmp(fe->name, s)) - return fe->val; - else - fe = fe->next; - } - return NULL; -} - -form_entry *get_fes_from_string(char *s) -{ - form_entry *fe; - int asize; - int i; - - if (s == NULL) - return NULL; - while (isspace(*s) || *s == '&') - s++; /* some cases that shouldn't happen */ - if (*s == '\0') - return NULL; - fe = (form_entry *)malloc(sizeof(form_entry)); - if (fe == NULL) - return NULL; - fe->name = malloc((asize = STARTSIZE * sizeof(char))); - if (fe->name == NULL) { - free(fe); - return NULL; - } - /* get form field name */ - for (i = 0; *s && *s != '&' && *s != '='; s++, i++) { - switch (*s) { - case '+': - fe->name[i] = ' '; - break; - case '%': - fe->name[i] = dd2c(s[1], s[2]); - s += 2; - break; - default: - fe->name[i] = *s; - } - if (i + 1 >= asize) { /* try to double the buffer */ - fe->name = realloc(fe->name, (asize *= 2)); - if (fe->name == NULL) - return NULL; - } - } - 
fe->name[i] = '\0'; - switch (*s++) { - case '&': - fe->val = NULL; - break; - case '=': - fe->val = malloc((asize = STARTSIZE * sizeof(char))); - if (fe->val == NULL) - break; - for (i = 0; *s && *s != '&'; s++, i++) { - switch (*s) { - case '+': - fe->val[i] = ' '; - break; - case '%': - fe->val[i] = dd2c(s[1], s[2]); - s += 2; - break; - default: - fe->val[i] = *s; - } - if (i + 1 >= asize) { /* try to double the buffer */ - fe->val = realloc(fe->val, (asize *= 2)); - if (fe->val == NULL) - return NULL; - } - } - fe->val[i] = '\0'; - switch (*s++) { - case '&': - fe->next = get_fes_from_string(s); - break; - case '\0': - default: - fe->next = NULL; - } - break; - case '\0': - default: - fe->val = NULL; - fe->next = NULL; - } - return fe; -} - -#define getccl(s, l) (l-- ? getc(s) : EOF) - -form_entry *get_fes_from_stream(int length, FILE *stream) -{ - form_entry *fe; - int asize; - int i; - int c; - int c1, c2; - - while (isspace(c = getccl(stream, length)) || c == '&'); - if (c == EOF) - return NULL; - fe = (form_entry *)malloc(sizeof(form_entry)); - if (fe == NULL) - return NULL; - fe->name = malloc((asize = STARTSIZE * sizeof(char))); - if (fe->name == NULL) { - free(fe); - return NULL; - } - /* get form field name */ - for (i = 0; c != EOF && c != '&' && c != '='; - c = getccl(stream, length), i++) { - switch (c) { - case '+': - fe->name[i] = ' '; - break; - case '%': - c1 = getccl(stream, length); - c2 = getccl(stream, length); - fe->name[i] = dd2c(c1, c2); - break; - default: - fe->name[i] = c; - } - if (i + 1 >= asize) { /* try to double the buffer */ - fe->name = realloc(fe->name, (asize *= 2)); - if (fe->name == NULL) - return NULL; - } - } - fe->name[i] = '\0'; - if (c == EOF) { - fe->val = NULL; - fe->next = NULL; - } - else - switch (c) { - case '&': - fe->val = NULL; - break; - case '=': - fe->val = malloc((asize = STARTSIZE * sizeof(char))); - for (i = 0, c = getccl(stream, length); c != EOF && c != '&'; - c = getccl(stream, length), i++) { - switch 
(c) { - case '+': - fe->val[i] = ' '; - break; - case '%': - c1 = getccl(stream, length); - c2 = getccl(stream, length); - fe->val[i] = dd2c(c1, c2); - break; - default: - fe->val[i] = c; - } - if (i + 1 >= asize) { /* try to double the buffer */ - fe->val = realloc(fe->val, (asize *= 2)); - if (fe->val == NULL) - return NULL; - } - } - fe->val[i] = '\0'; - if (c == '&') { - fe->next = get_fes_from_stream(length, stream); - } - else - fe->next = NULL; - } - return fe; -} - -unsigned char dd2c(char d1, char d2) -{ - register unsigned char digit; - - digit = (d1 >= 'A' ? ((d1 & 0xdf) - 'A') + 10 : (d1 - '0')); - digit *= 16; - digit += (d2 >= 'A' ? ((d2 & 0xdf) - 'A') + 10 : (d2 - '0')); - return (digit); -} - - -void dump_cgi_info(cgi_info *ci) -{ - printf("CONTENT_LENGTH: %d\n", ci->content_length); - if (ci->content_type != NULL) - printf("
CONTENT_TYPE: %s\n", ci->content_type); - if (ci->server_name != NULL) - printf("
SERVER_NAME: %s\n", ci->server_name); - if (ci->server_software != NULL) - printf("
SERVER_SOFTWARE: %s\n", ci->server_software); - if (ci->gateway_interface != NULL) - printf("
GATEWAY_INTERFACE: %s\n", ci->gateway_interface); - if (ci->server_protocol != NULL) - printf("
SERVER_PROTOCOL: %s\n", ci->server_protocol); - if (ci->server_port != NULL) - printf("
SERVER_PORT: %s\n", ci->server_port); - if (ci->request_method != NULL) - printf("
REQUEST_METHOD: %s\n", ci->request_method); - if (ci->http_accept != NULL) - printf("
HTTP_ACCEPT: %s\n", ci->http_accept); - if (ci->path_info != NULL) - printf("
PATH_INFO: %s\n", ci->path_info); - if (ci->path_translated != NULL) - printf("
PATH_TRANSLATED: %s\n", ci->path_translated); - if (ci->script_name != NULL) - printf("
SCRIPT_NAME: %s\n", ci->script_name); - if (ci->query_string != NULL) - printf("
QUERY_STRING: %s\n", ci->query_string); - if (ci->remote_host != NULL) - printf("
REMOTE_HOST: %s\n", ci->remote_host); - if (ci->remote_addr != NULL) - printf("
REMOTE_ADDR: %s\n", ci->remote_addr); - if (ci->auth_type != NULL) - printf("
AUTH_TYPE: %s\n", ci->auth_type); - if (ci->remote_user != NULL) - printf("
REMOTE_USER: %s\n", ci->remote_user); - if (ci->remote_ident != NULL) - printf("
REMOTE_IDENT: %s\n", ci->remote_ident); - return; -} diff --git a/libcgi/form_tags.c b/libcgi/form_tags.c deleted file mode 100644 index f2fe6a44..00000000 --- a/libcgi/form_tags.c +++ /dev/null @@ -1,59 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. -** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -void print_sel_list(char *tname, char **opts, char *init) -{ - printf("", stdout); -} - -void print_input_blank(char *tname, unsigned int size, - unsigned int maxlength, char *init) -{ - printf("", stdout); -} - -void print_submit(char *label) -{ - printf("", stdout); -} diff --git a/libcgi/get_cgi_info.c b/libcgi/get_cgi_info.c deleted file mode 100644 index 9ad918db..00000000 --- a/libcgi/get_cgi_info.c +++ /dev/null @@ -1,52 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. 
-** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. -** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -int get_cgi_info(cgi_info *ci) -{ - char *s; - - ci->content_length = (s = getenv("CONTENT_LENGTH")) ? 
atoi(s) : 0; - ci->content_type = getenv("CONTENT_TYPE"); - ci->server_software = getenv("SERVER_SOFTWARE"); - ci->gateway_interface = getenv("GATEWAY_INTERFACE"); - ci->server_protocol = getenv("SERVER_PROTOCOL"); - ci->server_port = getenv("SERVER_PORT"); - ci->request_method = getenv("REQUEST_METHOD"); - ci->http_accept = getenv("HTTP_ACCEPT"); - ci->path_info = getenv("PATH_INFO"); - ci->path_translated = getenv("PATH_TRANSLATED"); - ci->script_name = getenv("SCRIPT_NAME"); - ci->query_string = getenv("QUERY_STRING"); - ci->remote_host = getenv("REMOTE_HOST"); - ci->remote_addr = getenv("REMOTE_ADDR"); - ci->remote_user = getenv("REMOTE_USER"); - ci->auth_type = getenv("AUTH_TYPE"); - ci->remote_user = getenv("REMOTE_USER"); - ci->remote_ident = getenv("REMOTE_IDENT"); - return (ci->server_name = getenv("SERVER_NAME")) != NULL; -} diff --git a/libcgi/html.c b/libcgi/html.c deleted file mode 100644 index 729ddf38..00000000 --- a/libcgi/html.c +++ /dev/null @@ -1,50 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. 
-** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -void print_doc_begin(char *title) -{ - printf("%s\n", title); - printf("

%s

\n\"\"
", - title); -} - -void print_doc_end(char *text) -{ - char *w; - - puts("


"); - if (text && *text) - puts(text); - else if ((w = getenv("WEBMASTER"))) - printf("
%s
\n", w); -} - -void print_logo(void) -{ - printf("\"HYPERMAIL\""); -} diff --git a/libcgi/libcgi.html b/libcgi/libcgi.html deleted file mode 100644 index 9788c460..00000000 --- a/libcgi/libcgi.html +++ /dev/null @@ -1,232 +0,0 @@ - - -Hypermail's CGI Library - - -

Hypermail.ORG
CGI Library

-

-These functions help you write virtual document (CGI) programs using C. -Look at the template.c file for an illustrative -example. Feel free to download -the latest -distributions from ftp.hypermail.org. -

- -

- -
-

Synopsis

-
-#include "libcgi/cgi.h"
-main()
-cgi_main(cgi_info *ci)
-
- -

Description

-libcgi contains a simple stub of a main program, which merely -calls cgi_main() with a struct filled with all the CGI -vars. Thus cgi_main is actually the entry point for -your CGI-processing code. It is structured this way to remain upwardly compatible -with a scheme for virtual document "daemons" that we're hatching. - -
-

Synopsis

-
-#include "libcgi/cgi.h"
-int get_cgi_info(cgi_info *)
-
-

Description

-This routine paws through the environment and fills up the struct -provided, which must already be allocated. -

-This function is called by main, and the result passed to -cgi_main. - -

Returns

-0 if there is a problem. - -
- -

Synopsis

-
-#include "libcgi/cgi.h"
-form_entry *get_form_entries(cgi_info *)
-void free_form_entries(form_entry *)
-char *parmval(form_entry *, char *)
-
-

Description

-get_form_entries parses any form input information into a linked list -of name/value pairs, returning the head pointer of that list. It does -all plus-to-space and hex code translations. -

-free_form_entries reclaims all the memory from the provided linked-list. -

-parmval returns the value corresponding to the name in the second -argument (a caseless string compare) by a linear search through the list -in the first argument. - -

Returns

-get_form_entries returns the head pointer, or NULL if there is a problem -or no form input information was available. - -parmval returns the corresponding value string or NULL if there is a -problem or no matching name. - -
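As a rough illustration of the plus-to-space and hex code translations described above, the following C sketch decodes one URL-encoded component in place. The names `decode_component` and `hexval` are hypothetical and not part of libcgi. Unlike `dd2c` in form_ent.c, this version checks that both characters after a '%' are hex digits before converting them.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of libcgi): value of one hex digit, or -1. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical sketch: decode a URL-encoded component in place.
 * '+' becomes a space and "%XY" becomes the byte 0xXY; a '%' that is
 * not followed by two hex digits is copied through unchanged.
 * Returns the decoded length. */
size_t decode_component(char *s)
{
    char *out = s;
    const char *in = s;

    while (*in) {
        if (*in == '+') {
            *out++ = ' ';
            in++;
        } else if (*in == '%' && hexval((unsigned char)in[1]) >= 0 &&
                   hexval((unsigned char)in[2]) >= 0) {
            *out++ = (char)(hexval((unsigned char)in[1]) * 16 +
                            hexval((unsigned char)in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}
```

Decoding in place works because the output is never longer than the input, so no buffer growth is needed for this step.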
- -

Synopsis

-
-#include "libcgi/cgi.h"
-int syn_mimeheader(char *, char *)
-int print_mimeheader(char *)
-
-

Description

-syn_mimeheader creates a MIME header based on the MIME type in -the second string and writes it into the first string buffer -(including trailing double-newline). -

-print_mimeheader creates the same MIME header based on the MIME -type in its sole argument, and prints it to stdout - -

Returns

-both return 0 if there is a problem - -
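The header-building step described above might look roughly like the following C sketch. It is not libcgi's implementation: the function name `format_mimeheader`, the explicit buffer-size parameter, and the exact "Content-type:" spelling are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical sketch (not libcgi's implementation): write a
 * "Content-type" header for `type` into `buf`, followed by the blank
 * line that ends the header block. Unlike syn_mimeheader, this takes
 * the buffer size. Returns 1 on success and 0 if there is a problem. */
int format_mimeheader(char *buf, size_t size, const char *type)
{
    int n;

    if (buf == NULL || type == NULL || size == 0)
        return 0;
    n = snprintf(buf, size, "Content-type: %s\n\n", type);
    /* snprintf reports the length it wanted; reject truncated output. */
    return n >= 0 && (size_t)n < size;
}
```

Passing the buffer size lets the sketch report truncation with the same 0-on-problem convention the documentation describes.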
- -

Synopsis

-
-#include "libcgi/cgi.h"
-int syn_base_url(char *, cgi_info *)
-int print_base_url(cgi_info *)
-
-

Description

-syn_base_url reconstructs the virtual document's URL given the -cgi_info, minus any query string, and fills the provided char -buffer. -

-print_base_url does the same but prints to stdout instead - -

Returns

-both return 0 if there is a problem - -
- -

Synopsis

-
-#include "libcgi/cgi.h"
-int mcode(cgi_info *)
-
-

Description

-This function examines the request_method in the cgi information -and returns an integer code. These codes are defined in cgi.h. - -

Returns

-0 if it doesn't recognize the method name, otherwise the code as -defined in cgi.h - -
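The method-to-code mapping can be illustrated roughly as below. The numeric values of the MCODE_ constants are assumptions (the real ones live in cgi.h, which is not shown here), the function takes the method string directly rather than a cgi_info pointer, and it compares whole strings where the removed mcode.c compares prefixes with strncasecmp.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Assumed values; only "0 means unrecognized" comes from the text above. */
enum { MCODE_GET = 1, MCODE_POST, MCODE_PUT, MCODE_HEAD };

/* Maps a request method name to an integer code, ignoring case. */
int mcode_sketch(const char *request_method)
{
    if (request_method == NULL)
        return 0;
    if (strcasecmp(request_method, "GET") == 0)
        return MCODE_GET;
    if (strcasecmp(request_method, "POST") == 0)
        return MCODE_POST;
    if (strcasecmp(request_method, "PUT") == 0)
        return MCODE_PUT;
    if (strcasecmp(request_method, "HEAD") == 0)
        return MCODE_HEAD;
    return 0;   /* unrecognized method name */
}
```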
- -

Synopsis

-
-#include "libcgi/cgi.h"
-void print_sel_list(char *tname, char **opts, char *init)
-
-

Description

-Prints an HTML+ selection list construct to stdout. The name of -the SELECT tag is given by tname, and the NULL-terminated string -array opts is turned into separate OPTION tags with values -corresponding to entries in opts. If any of these entries -are a caseless match with init, then that OPTION tag is the -default selection. - -
- -

Synopsis

-
-#include "libcgi/cgi.h"
-void print_input_blank(char *tname, unsigned size, char *init)
-
-

Description

-Prints an HTML+ INPUT tag (of type text) to stdout. The tag's -name is tname, its size is size, and initial value is init. - -
- -

Synopsis

-
-#include "libcgi/cgi.h"
-void print_submit(char *label)
-
-

Description

-Prints an HTML+ INPUT tag of type submit to stdout. The submit -button will be labelled by label if non-NULL. - -
- -

Synopsis

-
-#include "libcgi/cgi.h"
-char *trim(char *s)
-
-

Description

-Changes the string from blank-padded to NUL-padded: trailing blanks are -overwritten in place with NUL characters. - -

Returns

-its argument - -
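A minimal sketch of this behaviour, under the assumption that only trailing space characters (not tabs or newlines) count as blanks; trim_sketch is a hypothetical name:

```c
#include <string.h>

/* Overwrites trailing spaces with NUL bytes, in place, and returns s. */
char *trim_sketch(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && s[len - 1] == ' ')
        s[--len] = '\0';
    return s;
}
```

Leading blanks are left untouched, and a string made only of blanks becomes empty.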
- -

Synopsis

-
-#include "libcgi/cgi.h"
-char *sanitize(char *to, char *from)
-
-

Description

-Prepares the string for inclusion in a URL. That is, the from -string is copied to the to buffer except that blanks are turned -into '+' characters and non-alphanumerics are turned into -3-character sequences of a '%' followed by two hex digits -corresponding to the ascii code. - -

Returns

-the to string - -
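The encoding rule above might be sketched like this. The tosize parameter is an assumption added for safety (the documented signature has no size argument), sanitize_sketch is a hypothetical name, and the escape uses a zero-padded "%02X" so that codes below 0x10 still produce two hex digits.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Copies from into to: blanks become '+', alphanumerics pass through,
 * anything else becomes '%' plus two hex digits. Output is truncated
 * (never overrun) when to is too small. Returns to, or NULL on bad input. */
char *sanitize_sketch(char *to, size_t tosize, const char *from)
{
    size_t used = 0;

    if (to == NULL || from == NULL || tosize == 0)
        return NULL;
    for (; *from != '\0'; from++) {
        /* cast first: passing a negative char to isalnum is undefined */
        unsigned char c = (unsigned char)*from;

        if (c == ' ' || isalnum(c)) {
            if (used + 1 >= tosize)      /* need room for 1 char + NUL */
                break;
            to[used++] = (c == ' ') ? '+' : (char)c;
        } else {
            if (used + 3 >= tosize)      /* need room for 3 chars + NUL */
                break;
            snprintf(to + used, tosize - used, "%%%02X", (unsigned)c);
            used += 3;
        }
    }
    to[used] = '\0';
    return to;
}
```

Since each input byte can expand to three output bytes, a buffer of 3 * strlen(from) + 1 bytes is always large enough to avoid truncation.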
- -

Synopsis

-
-#include "libcgi/cgi.h"
-char *strmaxcpy(char *s1, char *s2, int n)
-
- -

Description

-copies at most n-1 characters from s2 into s1, and then -null-terminates. Handy for truncating while copying. - -

Returns

-s1 as long as n > 0, NULL otherwise - -
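The copy-and-truncate contract described above can be sketched as follows; strmaxcpy_sketch is a hypothetical name, and the const qualifier on the source argument is an addition not present in the documented signature.

```c
#include <stddef.h>

/* Copies at most n-1 characters from src into dest and always
 * NUL-terminates. Returns dest, or NULL when n is less than 1. */
char *strmaxcpy_sketch(char *dest, const char *src, int n)
{
    char *d = dest;

    if (n < 1)
        return NULL;
    /* --n reserves one byte of the budget for the terminator */
    while (--n > 0 && *src != '\0')
        *d++ = *src++;
    *d = '\0';
    return dest;
}
```

Unlike strncpy, the result is always terminated, so passing the full size of the destination buffer as n is safe.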

-


-
hypermail@hypermail.org
- - - diff --git a/libcgi/main.c b/libcgi/main.c deleted file mode 100644 index 51678423..00000000 --- a/libcgi/main.c +++ /dev/null @@ -1,38 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. -** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -void cgi_main(cgi_info *ci); - -int main(void) -{ - cgi_info ci; - - get_cgi_info(&ci); - cgi_main(&ci); - return (0); -} diff --git a/libcgi/mcode.c b/libcgi/mcode.c deleted file mode 100644 index 891a3e3d..00000000 --- a/libcgi/mcode.c +++ /dev/null @@ -1,43 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. 
-** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. -** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -int mcode(cgi_info *ci) -{ - if (ci->request_method == NULL) - return 0; - else if (!strncasecmp(ci->request_method, "GET", 3)) - return MCODE_GET; - else if (!strncasecmp(ci->request_method, "POST", 4)) - return MCODE_POST; - else if (!strncasecmp(ci->request_method, "PUT", 3)) - return MCODE_PUT; - else if (!strncasecmp(ci->request_method, "HEAD", 4)) - return MCODE_HEAD; - else - return 0; -} diff --git a/libcgi/strops.c b/libcgi/strops.c deleted file mode 100644 index a402b4cb..00000000 --- a/libcgi/strops.c +++ /dev/null @@ -1,69 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. 
-** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* -** This file is part of the LIBCGI library -** -*/ - -#include "cgi.h" - -char *trim(char *s) -{ - char *t = s; - - while (*t) - t++; - while (t > s && *--t == ' ') - *t = 0; - return s; -} - -#if 0 -char *sanitize(char *buf, char *s) -{ - char *t; - - for (t = buf; *s; s++) - if (*s == ' ') - *t++ = '+'; - else if (isalnum(*s)) - *t++ = *s; - else { - sprintf(t, "%%%2X", *s); - t += 3; - } - *t = '\0'; - return buf; -} -#endif - -char *strmaxcpy(char *dest, char *src, int n) -{ - char *d = dest; - - if (n < 1) - return NULL; - while (--n && *src) - *d++ = *src++; - *d = 0; - return dest; -} diff --git a/libcgi/syn_mime.c b/libcgi/syn_mime.c deleted file mode 100644 index f412be6c..00000000 --- a/libcgi/syn_mime.c +++ /dev/null @@ -1,46 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. 
-** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -#if 0 -int syn_mimeheader(char *buf, char *ct) -{ - int x; - - if (buf && ct) { - x = (int)sprintf(buf, "Content-Type: %s\n\n", ct); - return (x && x != EOF); - } - else - return 0; -} -#endif - -int print_mimeheader(const char *ct) -{ - return (ct && (printf("Content-Type: %s\n\n", ct) != EOF)); -} diff --git a/libcgi/syn_url.c b/libcgi/syn_url.c deleted file mode 100644 index 908e2dce..00000000 --- a/libcgi/syn_url.c +++ /dev/null @@ -1,50 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. 
-** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -#if 0 -int syn_base_url(char *buf, cgi_info *ci) -{ - int x; - - if (ci && buf) { - x = - (int)sprintf(buf, "http://%s:%s%s", ci->server_name, - ci->server_port, ci->script_name); - return (x && x != EOF); - } - else - return 0; -} -#endif - -int print_base_url(cgi_info *ci) -{ - return (ci && - (printf("http://%s:%s%s", ci->server_name, ci->server_port, - ci->script_name) != EOF)); -} diff --git a/libcgi/template.c b/libcgi/template.c deleted file mode 100644 index 86d9789f..00000000 --- a/libcgi/template.c +++ /dev/null @@ -1,72 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. -** Kevin Hughes, kev@kevcom.com 3/11/94 -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. 
-** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ - -/* - * This file is part of the LIBCGI library - * - */ - -#include "cgi.h" - -void cgi_main(cgi_info *ci) -{ - char *parmval(); - form_entry *parms, *p; - form_entry *get_form_entries(); - char *foo, *bar; - - print_mimeheader("text/html"); - - puts("Your Title Here"); - puts("

Your heading here

"); - - parms = get_form_entries(ci); - if (parms) { - /* extract specific form parameters */ - for (p = parms; p; p = p->next) { - if (strcasecmp(p->name, "foo")) - foo = p->val; - else if (strcasecmp(p->name, "bar")) - bar = p->val; - } - } - - switch (mcode(ci)) { - - case MCODE_HEAD: - return; - - case MCODE_GET: - puts("Your GET response here"); - printf("based on foo=%s and bar=%s.\n", foo, bar); - break; - - case MCODE_POST: - puts("Your POST response here"); - printf("based on foo=%s and bar=%s.\n", foo, bar); - break; - - default: - printf("Unrecognized method '%s'.\n", ci->request_method); - } - - free_form_entries(parms); -} diff --git a/m4/apr.m4 b/m4/apr.m4 new file mode 100644 index 00000000..ebc6a54d --- /dev/null +++ b/m4/apr.m4 @@ -0,0 +1,140 @@ +dnl Some macros borrowed from Apache's httpd config file. Thanks! + +dnl +dnl APR_SUBDIR_CONFIG(dir [, sub-package-cmdline-args, args-to-drop]) +dnl +dnl dir: directory to find configure in +dnl sub-package-cmdline-args: arguments to add to the invocation (optional) +dnl args-to-drop: arguments to drop from the invocation (optional) +dnl +dnl Note: This macro relies on ac_configure_args being set properly. +dnl +dnl The args-to-drop argument is shoved into a case statement, so +dnl multiple arguments can be separated with a |. +dnl +dnl Note: Older versions of autoconf do not single-quote args, while 2.54+ +dnl places quotes around every argument. So, if you want to drop the +dnl argument called --enable-layout, you must pass the third argument as: +dnl [--enable-layout=*|\'--enable-layout=*] +dnl +dnl Trying to optimize this is left as an exercise to the reader who wants +dnl to put up with more autoconf craziness. I give up. 
+dnl +AC_DEFUN([APR_SUBDIR_CONFIG], [ + # save our work to this point; this allows the sub-package to use it + AC_CACHE_SAVE + + echo "configuring package in $1 now" + ac_popdir=`pwd` + apr_config_subdirs="$1" + test -d $1 || $mkdir_p $1 + ac_abs_srcdir=`(cd $srcdir/$1 && pwd)` + cd $1 + +changequote(, )dnl + # A "../" for each directory in /$config_subdirs. + ac_dots=`echo $apr_config_subdirs|sed -e 's%^\./%%' -e 's%[^/]$%&/%' -e 's%[^/]*/%../%g'` +changequote([, ])dnl + + # Make the cache file pathname absolute for the subdirs + # required to correctly handle subdirs that might actually + # be symlinks + case "$cache_file" in + /*) # already absolute + ac_sub_cache_file=$cache_file ;; + *) # Was relative path. + ac_sub_cache_file="$ac_popdir/$cache_file" ;; + esac + + ifelse($3, [], [apr_configure_args=$ac_configure_args],[ + apr_configure_args= + apr_sep= + for apr_configure_arg in $ac_configure_args + do + case "$apr_configure_arg" in + $3) + continue ;; + esac + apr_configure_args="$apr_configure_args$apr_sep'$apr_configure_arg'" + apr_sep=" " + done + ]) + + dnl autoconf doesn't add --silent to ac_configure_args; explicitly pass it + test "x$silent" = "xyes" && apr_configure_args="$apr_configure_args --silent" + + dnl AC_CONFIG_SUBDIRS silences option warnings, emulate this for 2.62 + apr_configure_args="--disable-option-checking $apr_configure_args" + + dnl The eval makes quoting arguments work - specifically the second argument + dnl where the quoting mechanisms used is "" rather than []. + dnl + dnl We need to execute another shell because some autoconf/shell combinations + dnl will choke after doing repeated APR_SUBDIR_CONFIG()s. 
(Namely Solaris + dnl and autoconf-2.54+) + if eval $SHELL $ac_abs_srcdir/configure $apr_configure_args --cache-file=$ac_sub_cache_file --srcdir=$ac_abs_srcdir $2 + then : + echo "$1 configured properly" + else + echo "configure failed for $1" + exit 1 + fi + + cd $ac_popdir + + # grab any updates from the sub-package + AC_CACHE_LOAD +])dnl + +dnl +dnl APR_ADDTO(variable, value) +dnl +dnl Add value to variable +dnl +AC_DEFUN([APR_ADDTO], [ + if test "x$$1" = "x"; then + test "x$silent" != "xyes" && echo " setting $1 to \"$2\"" + $1="$2" + else + apr_addto_bugger="$2" + for i in $apr_addto_bugger; do + apr_addto_duplicate="0" + for j in $$1; do + if test "x$i" = "x$j"; then + apr_addto_duplicate="1" + break + fi + done + if test $apr_addto_duplicate = "0"; then + test "x$silent" != "xyes" && echo " adding \"$i\" to $1" + $1="$$1 $i" + fi + done + fi +])dnl + +dnl +dnl APR_REMOVEFROM(variable, value) +dnl +dnl Remove a value from a variable +dnl +AC_DEFUN([APR_REMOVEFROM], [ + if test "x$$1" = "x$2"; then + test "x$silent" != "xyes" && echo " nulling $1" + $1="" + else + apr_new_bugger="" + apr_removed=0 + for i in $$1; do + if test "x$i" != "x$2"; then + apr_new_bugger="$apr_new_bugger $i" + else + apr_removed=1 + fi + done + if test $apr_removed = "1"; then + test "x$silent" != "xyes" && echo " removed \"$2\" from $1" + $1=$apr_new_bugger + fi + fi +]) dnl diff --git a/patchlevel.h b/patchlevel.h index 389e0790..542bc256 100644 --- a/patchlevel.h +++ b/patchlevel.h @@ -1,2 +1,2 @@ -#define VERSION "2.4.0" -#define PATCHLEVEL "0" +#define VERSION "3.0.0" + diff --git a/src/Makefile.in b/src/Makefile.in index 2838fa4b..4344107c 100644 --- a/src/Makefile.in +++ b/src/Makefile.in @@ -14,9 +14,6 @@ bindir=@bindir@ # This is where the man page goes mandir=@mandir@ -# This is where your CGI programs live -cgidir=@cgidir@ - # Executable program suffix (.exe for windows, null for Unix systems) SUFFIX=@suffix@ @@ -28,44 +25,51 @@ INSTALL_PROG=@INSTALL@ SPLINTFLAGS=@INCLUDES@ 
-PCRE_DEP=@PCRE_DEP@ +PCRE2_DEP=@PCRE2_DEP@ TRIO_DEP=@TRIO_DEP@ FNV_DEP=@FNV_DEP@ #WNOERROR=-Werror -#WARNINGS=$(WNOERROR) -ansi -pedantic -Wall -Wtraditional -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline -Dlint +# Note that -ansi and -std=iso9899:199409 seem to be equivalent in gcc as it +# doesn't support the +# for clang -std=c90 +# @@ make this part of autoconf? +#WARNINGS=$(WNOERROR) -Wall -Wextra -std=iso9899:199409 -pedantic -Wno-overlength-strings -Wall -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline -Dlint + +# While waiting to put this in configure.ac +# Note: if you're debugging memory issues (leaks, overruns, unallocated memory use) +# Add the folowing two warnings -fsanitize=address -fno-omit-frame-pointer +# and -static-libasan to LIBS CFLAGS=@CFLAGS@ $(WARNINGS) CPPFLAGS=@CPPFLAGS@ @INCLUDES@ YACC=@YACC@ -NETLIBS=@LIBS@ LDFLAGS=@LDFLAGS@ -MISC_LIBS= -lpcre -ltrio -lm +MISC_LIBS= -lpcre2-8 -ltrio -lm OPT_LIBS=@EXTRA_LIBS@ INCS= domains.h hypermail.h lang.h proto.h \ ../config.h ../patchlevel.h dsprintf.h threadprint.h \ - getdate.h getname.h finelink.h txt2html.h search.h + getdate.h getname.h finelink.h txt2html.h search.h \ + utf8.h SRCS= base64.c date.c domains.c file.c hypermail.c lang.c lock.c \ - mem.c parse.c print.c printfile.c string.c struct.c uudecode.c\ + mem.c parse.c print.c printfile.c printcss.c string.c struct.c uudecode.c\ dmatch.c setup.c threadprint.c getdate.c getname.c\ finelink.c txt2html.c search.c quotes.c OBJS= base64.o date.o domains.o file.o hypermail.o lang.o lock.o \ - mem.o parse.o print.o printfile.o string.o struct.o uudecode.o\ + mem.o parse.o print.o printfile.o printcss.o string.o struct.o uudecode.o\ dmatch.o setup.o threadprint.o getdate.o getname.o\ finelink.o txt2html.o search.o quotes.o -MAILOBJS= mail.o ../libcgi/libcgi.a - .c.o: $(CC) 
-c $(CFLAGS) $(CPPFLAGS) $< -all: @PCRE_DEP@ @TRIO_DEP@ @FNV_DEP@ hypermail$(SUFFIX) mail$(SUFFIX) lang$(SUFFIX) +all: @PCRE2_DEP@ @TRIO_DEP@ @FNV_DEP@ hypermail$(SUFFIX) lang$(SUFFIX) -pcre/.libs/libpcre.a: - @cd pcre; $(MAKE) CC="$(CC)" ; rm -f .libs/lib*.so* +pcre2/.libs/libpcre2-8.a: + @cd pcre2; $(MAKE) CC="$(CC)" ; rm -f .libs/lib*.so* trio/libtrio.a: @cd trio; $(MAKE) CC="$(CC)" @@ -77,32 +81,24 @@ hypermail$(SUFFIX): $(OBJS) $(CC) -o $@ $(CFLAGS) $(LDFLAGS) $(OBJS) $(OPT_LIBS) $(MISC_LIBS) chmod 0755 $@ -mail$(SUFFIX): $(MAILOBJS) - $(CC) -o $@ $(CFLAGS) $(MAILOBJS) $(NETLIBS) -lm - chmod 0755 $@ - lang$(SUFFIX): lang.c lang.h $(CC) -DLANG_PROG $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) -o $@ lang.c $(MISC_LIBS) -../libcgi/libcgi.a: - @cd ../libcgi; $(MAKE) all CC="$(CC)" CFLAGS="$(CFLAGS)" CPPFLAGS="$(CPPFLAGS)" - getdate.c: getdate.y getdate.h @echo "Expect 13 shift/reduce conflicts." $(YACC) getdate.y @mv -f y.tab.c getdate.c +convert-css: ../docs/hypermail.css ../contrib/css_to_c.pl + @echo "Converting ../docs/hypermail.css to printcss.c" + @perl ../contrib/css_to_c.pl ../docs/hypermail.css > printcss.c + install: all @if [ ! -d $(bindir) ]; then mkdir -p $(bindir); fi $(INSTALL_PROG) -s -c -m 0755 hypermail$(SUFFIX) $(bindir) -mail.install: - @if [ ! 
-d $(cgidir) ]; then mkdir -p $(cgidir); fi - $(INSTALL_PROG) -s -c -m 0755 mail$(SUFFIX) $(cgidir) - uninstall: rm -f $(bindir)/hypermail$(SUFFIX) - rm -f $(cgidir)/mail$(SUFFIX) insight: $(MAKE) CC="insight" @@ -122,31 +118,24 @@ splint: lint: lint $(SRCS) 2>&1 | tee lint.out -lint_mail: - lint mail.c 2>&1 | tee lint.out - @(cd ../libcgi; $(MAKE) lint 2>&1 | tee -a ../lint.out) - clean: - rm -f hypermail$(SUFFIX) mail$(SUFFIX) lang$(SUFFIX) + rm -f hypermail$(SUFFIX) lang$(SUFFIX) rm -f *.o .pure *qx *qv *.ln core rm -f .inslog tca.map lint.out splint.out rm -f getdate.c - @(if test "$(PCRE_DEP)" != "" ; then cd pcre; $(MAKE) clean; fi) + @(if test "$(PCRE2_DEP)" != "" ; then cd pcre2; $(MAKE) clean; fi) @(if test "$(TRIO_DEP)" != "" ; then cd trio; $(MAKE) clean; fi) @(if test "$(FNV_DEP)" != "" ; then cd fnv; $(MAKE) clean; fi) - @cd ../libcgi; $(MAKE) clean clobber: clean - @(if test "$(PCRE_DEP)" != "" ; then cd pcre; rm -f *.lock; fi) + @(if test "$(PCRE_DEP2)" != "" ; then cd pcre2; rm -f *.lock; fi) @(if test "$(TRIO_DEP)" != ""; then cd trio; rm -f *.lock; fi) @(if test "$(FNV_DEP)" != "" ; then cd fnv; $(MAKE) clobber; fi) - @cd ../libcgi; $(MAKE) clobber distclean: clobber - @(if test "$(PCRE_DEP)" != "" ; then cd pcre; $(MAKE) distclean; fi) + @(if test "$(PCRE2_DEP)" != "" ; then cd pcre2; $(MAKE) distclean; fi) @(if test "$(TRIO_DEP)" != ""; then cd trio; $(MAKE) clean; fi) @(if test "$(FNV_DEP)" != "" ; then cd fnv; $(MAKE) distclean; fi) - @cd ../libcgi; $(MAKE) distclean # # Regenerate this dependency list with gcc -MM *.c: @@ -170,15 +159,18 @@ hypermail.o: hypermail.c hypermail.h ../config.h ../patchlevel.h proto.h \ lang.o: lang.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h lock.o: lock.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ setup.h -mail.o: mail.c ../libcgi/cgi.h ../libcgi/../config.h ../config.h mem.o: mem.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h parse.o: parse.c hypermail.h ../config.h 
../patchlevel.h proto.h lang.h \ setup.h struct.h uudecode.h base64.h search.h getname.h parse.h print.h print.o: print.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ setup.h struct.h printfile.h print.h parse.h txt2html.h finelink.h \ threadprint.h +printcss.o: printcss.c hypermail.h ../config.h ../patchlevel.h proto.h \ + lang.h setup.h printfile.o: printfile.c hypermail.h ../config.h ../patchlevel.h proto.h \ lang.h setup.h print.h printfile.h struct.h +printcss.o: printcss.c hypermail.h ../config.h ../patchlevel.h proto.h \ + lang.h setup.h quotes.o: quotes.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ setup.h search.o: search.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ @@ -186,7 +178,7 @@ search.o: search.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ setup.o: setup.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ defaults.h setup.h struct.h print.h string.o: string.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ - setup.h parse.h uconvert.h + setup.h parse.h uconvert.h utf8.h struct.o: struct.c hypermail.h ../config.h ../patchlevel.h proto.h lang.h \ dmatch.h setup.h struct.h parse.h getname.h threadprint.o: threadprint.c hypermail.h ../config.h ../patchlevel.h \ @@ -195,3 +187,4 @@ txt2html.o: txt2html.c hypermail.h ../config.h ../patchlevel.h proto.h \ lang.h setup.h print.h finelink.h txt2html.h uudecode.o: uudecode.c hypermail.h ../config.h ../patchlevel.h proto.h \ lang.h setup.h uudecode.h + diff --git a/src/base64.c b/src/base64.c index b28b92ec..a34b28cf 100644 --- a/src/base64.c +++ b/src/base64.c @@ -10,85 +10,146 @@ ** ** - Encoded strings that ended with more than one = caused the decode ** function+ to generate 3 extra zero bytes at the end of the output. 
+** +** CHANGES by Andy Valencia - May 15 2023: +** +** - Preserve decoding state between calls to the +** base64_stream_decode() function to take into account UA that +** output intermediate base64 lines that are not always multiple of +** four. */ #include "hypermail.h" #include "base64.h" -void base64Decode(char *intext, char *out, int *length) +/* + * base64_decoder_state_new() + * Allocate a new base64 decoder state + */ +struct base64_decoder_state *base64_decoder_state_new(void) +{ + struct base64_decoder_state *st = (struct base64_decoder_state *)emalloc(sizeof(struct base64_decoder_state)); + if (!st) { + return(0); + } + memset(st, 0, sizeof(struct base64_decoder_state)); + return(st); +} + +/* + * base64_decoder_state_free() + * Release storage + */ +void base64_decoder_state_free(struct base64_decoder_state *st) +{ + free(st); +} + +/* + * base64_decode_stream() + * + * Accept base64 "intext", + * place resulting decoded output in a null-terminated "out". + * + * "st" is our state, which will be updated and can carry state between + * calls. 
+ */ +int base64_decode_stream(struct base64_decoder_state *st, const char *intext, char *out) { - unsigned char ibuf[4]; - unsigned char obuf[3]; char ignore; - char endtext = FALSE; char ch; - int lindex = 0; - *length = 0; - - memset(ibuf, 0, sizeof(ibuf)); + int length; + + /* Ignore trailing garbage */ + if (st->endtext) { + *out = '\0'; + return(0); + } + length = 0; while (*intext) { ch = *intext; - ignore = FALSE; - if ((ch >= 'A') && (ch <= 'Z')) + + if ((ch >= 'A') && (ch <= 'Z')) { ch = ch - 'A'; - else if ((ch >= 'a') && (ch <= 'z')) + } else if ((ch >= 'a') && (ch <= 'z')) { ch = ch - 'a' + 26; - else if ((ch >= '0') && (ch <= '9')) + } else if ((ch >= '0') && (ch <= '9')) { ch = ch - '0' + 52; - else if (ch == '+') + } else if (ch == '+') { ch = 62; - else if (ch == '=') { /* end of text */ - if (endtext) + } else if (ch == '=') { /* end of text */ + if (st->endtext) { break; - endtext = TRUE; - lindex--; - if (lindex < 0) - lindex = 3; - } - else if (ch == '/') + } + st->endtext = TRUE; + st->lindex--; + if (st->lindex < 0) { + st->lindex = 3; + } + } else if (ch == '/') { ch = 63; - else if (endtext) + } else if (st->endtext) { break; - else + } else { ignore = TRUE; - + } + if (!ignore) { - if (!endtext) { - ibuf[lindex] = ch; + if (!st->endtext) { + st->ibuf[st->lindex] = ch; - lindex++; - lindex &= 3; /* use bit arithmetic instead of remainder */ + st->lindex++; + st->lindex &= 3; /* use bit arithmetic instead of remainder */ } - if ((0 == lindex) || endtext) { - - obuf[0] = (ibuf[0] << 2) | ((ibuf[1] & 0x30) >> 4); - obuf[1] = - ((ibuf[1] & 0x0F) << 4) | ((ibuf[2] & 0x3C) >> 2); - obuf[2] = ((ibuf[2] & 0x03) << 6) | (ibuf[3] & 0x3F); + if ((0 == st->lindex) || st->endtext) { - switch (lindex) { + st->obuf[0] = (st->ibuf[0] << 2) | ((st->ibuf[1] & 0x30) >> 4); + st->obuf[1] = + ((st->ibuf[1] & 0x0F) << 4) | ((st->ibuf[2] & 0x3C) >> 2); + st->obuf[2] = ((st->ibuf[2] & 0x03) << 6) | (st->ibuf[3] & 0x3F); + + switch (st->lindex) { case 1: - 
sprintf(out, "%c", obuf[0]); - out++; - (*length)++; + *out++ = st->obuf[0]; + length += 1; break; case 2: - sprintf(out, "%c%c", obuf[0], obuf[1]); - out += 2; - (*length) += 2; + *out++ = st->obuf[0]; + *out++ = st->obuf[1]; + length += 2; break; default: - sprintf(out, "%c%c%c", obuf[0], obuf[1], obuf[2]); - out += 3; - (*length) += 3; + *out++ = st->obuf[0]; + *out++ = st->obuf[1]; + *out++ = st->obuf[2]; + length += 3; break; } - memset(ibuf, 0, sizeof(ibuf)); + memset(st->ibuf, 0, sizeof(st->ibuf)); } } intext++; } *out = 0; + return (length); +} + +/* + * base64_decode_string() + * Convenience wrapper when decoding a single string + */ +int base64_decode_string(const char *intext, char *out) +{ + struct base64_decoder_state *st = base64_decoder_state_new(); + int length = 0; + + if (!st) { + return 0; + } + length = base64_decode_stream(st, intext, out); + base64_decoder_state_free(st); + + return (length); } diff --git a/src/base64.h b/src/base64.h index a0d5b0dc..e98e0977 100644 --- a/src/base64.h +++ b/src/base64.h @@ -1,5 +1,43 @@ +#ifndef _HYPERMAIL_BASE64_H +#define _HYPERMAIL_BASE64_H +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. 
+** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + /* ** MIME Decode - base64.c */ -void base64Decode(char *, char *, int *); +/* Common state as you feed successive lines into base64_decode_stream() */ +struct base64_decoder_state { + unsigned char ibuf[4]; /* input buffer */ + unsigned char obuf[3]; /* output buffer */ + int lindex; /* index for ibuf / obuf */ + char endtext; /* base64 end detected */ +}; + +/* allocate and free a base64_state structure */ +struct base64_decoder_state *base64_decoder_state_new(void); +void base64_decoder_state_free(struct base64_decoder_state *); + +/* decode a stream made of multiple base64 lines */ +int base64_decode_stream(struct base64_decoder_state *, const char *, char *); + +/* decode a single string */ +int base64_decode_string(const char *, char *); + +#endif /* _HYPERMAIL_BASE64_H */ diff --git a/src/date.c b/src/date.c index 660c34be..00a632b0 100644 --- a/src/date.c +++ b/src/date.c @@ -3,6 +3,7 @@ ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. 
** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 +** Hypermail Project 1998-2023 ** ** This program and library is free software; you can redistribute it and/or ** modify it under the terms of the GNU (Library) General Public License diff --git a/src/defaults.h.in b/src/defaults.h.in index 44bbec97..ca83ef07 100644 --- a/src/defaults.h.in +++ b/src/defaults.h.in @@ -11,6 +11,8 @@ #define CONFIGFILE "~/.hmrc" +#define MSG_FRAGMENT_PREFIX "msg" + #define INLINE_TYPES "image/gif image/jpeg image/png" #define SHOW_HEADERS "From Subject Date Message-ID" @@ -27,10 +29,31 @@ #define DEFAULTINDEX "@defaultindex@" +#define DEFAULT_TOP_INDEX "folders" + #define DOMAINADDR "@domainaddr@" #define ANTISPAM_AT "_at_" +#define APPLE_MAIL_UA_HEADER "X-Mailer" + #define APPLE_MAIL_UA "Apple iPhone iPad" +#define DEFAULT_CHARSET "US-ASCII" + +#define HM_ANNOTATION_HEADER "X-Hypermail-Annotated" + +#define HM_DELETED_HEADERS "X-Hypermail-Deleted X-No-Archive" + +#define EXPIRES_HEADER "Expires" + +#define NEW_MSG_COMMAND "mailto:$TO" + +#define REPLYMSG_COMMAND "not set" + +#define DEFAULT_CSS_URL "hypermail.css" + +/* this is a format string where %s will be replaced by the archive's label */ +#define DEFAULT_MHTML_NAVBAR2UP "\n" + #endif diff --git a/src/dmatch.c b/src/dmatch.c index f6d8d2d4..fd328810 100644 --- a/src/dmatch.c +++ b/src/dmatch.c @@ -1,3 +1,21 @@ +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. 
+** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + /* The shortest ever? 88 characters... diff --git a/src/dmatch.h b/src/dmatch.h index 353b717c..dfe2d07f 100644 --- a/src/dmatch.h +++ b/src/dmatch.h @@ -1 +1,23 @@ +#ifndef _HYPERMAIL_DMATCH_H +#define _HYPERMAIL_DMATCH_H +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + char Match(char *, char *); + +#endif /* _HYPERMAIL_DMATCH_H */ diff --git a/src/domains.c b/src/domains.c index 324a1322..f5e1636a 100644 --- a/src/domains.c +++ b/src/domains.c @@ -1,3 +1,21 @@ +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + #include "hypermail.h" #include "domains.h" diff --git a/src/domains.h b/src/domains.h index 6d3eeaaa..7645fee8 100644 --- a/src/domains.h +++ b/src/domains.h @@ -1,3 +1,23 @@ +#ifndef _HYPERMAIL_DOMAINS_H +#define _HYPERMAIL_DOMAINS_H +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + /* ** @(#)domains.h 1.17 03/09/03 - Kent Landfield */ @@ -284,3 +304,5 @@ struct co_code domain_codes[] = { #define num_root_domains (sizeof (domain_codes) / sizeof (struct co_code)) #define MIN_DOMAIN_LEN 2 + +#endif /* _HYPERMAIL_DOMAINS_H */ diff --git a/src/file.c b/src/file.c index 5aafc834..6917ca6c 100644 --- a/src/file.c +++ b/src/file.c @@ -3,11 +3,12 @@ ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. 
** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 +** Hypermail Project 1998-2023 ** ** This program and library is free software; you can redistribute it and/or ** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. ** ** This program is distributed in the hope that it will be useful, ** but WITHOUT ANY WARRANTY; without even the implied warranty of @@ -63,7 +64,7 @@ int isfile(char *path) if (stat(path, &stbuf)) return 0; - return ((stbuf.st_mode & S_IFMT) == S_IFREG) ? 1 : 0; + return (S_ISREG(stbuf.st_mode)) ? 1 : 0; } /* @@ -76,7 +77,7 @@ int isdir(char *path) if (stat(path, &stbuf)) return 0; - return ((stbuf.st_mode & S_IFMT) == S_IFDIR) ? 1 : 0; + return (S_ISDIR(stbuf.st_mode)) ? 1 : 0; } /* @@ -98,7 +99,7 @@ void check1dir(char *dir) if (errno != ENOENT || mkdir(dir, set_dirmode) < 0) { #endif if (errno != EEXIST) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_CANNOT_CREATE_DIRECTORY], dir); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_CANNOT_CREATE_DIRECTORY], dir); progerr(errmsg); } } @@ -106,7 +107,7 @@ void check1dir(char *dir) printf(" %s \"%s\", %s %o.\n", lang[MSG_CREATING_DIRECTORY], dir, lang[MSG_MODE], set_dirmode); if (chmod(dir, set_dirmode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\" to %o.", lang[MSG_CANNOT_CHMOD], dir, set_dirmode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\" to %o.", lang[MSG_CANNOT_CHMOD], dir, set_dirmode); progerr(errmsg); } } @@ -139,7 +140,7 @@ void checkdir(char *dir) if (errno != ENOENT || mkdir(dir, set_dirmode) < 0) { #endif if (errno != EEXIST) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_CANNOT_CREATE_DIRECTORY], dir); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", 
lang[MSG_CANNOT_CREATE_DIRECTORY], dir); progerr(errmsg); } } @@ -150,7 +151,7 @@ void checkdir(char *dir) printf(" %s \"%s\", %s %o.\n", lang[MSG_CREATING_DIRECTORY], dir, lang[MSG_MODE], set_dirmode); if (chmod(dir, set_dirmode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\" to %o.", lang[MSG_CANNOT_CHMOD], dir, set_dirmode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\" to %o.", lang[MSG_CANNOT_CHMOD], dir, set_dirmode); progerr(errmsg); } } @@ -288,13 +289,13 @@ void readconfigs(char *path, int cmd_show_variables) ** location for the config file and go on to try $HOME. */ if ((pp = getpwuid(getuid())) != NULL) { - snprintf(tmppath, sizeof(tmppath), "%s%s", pp->pw_dir, path + 1); /* AUDIT biege: who gurantees that path+1 contains data? */ + trio_snprintf(tmppath, sizeof(tmppath), "%s%s", pp->pw_dir, path + 1); /* AUDIT biege: who gurantees that path+1 contains data? */ ConfigInit(tmppath); } else #endif if ((ep = getenv("HOME")) != NULL) { /* AUDIT biege: possible BOF.. but it's not setuid.. so why to care? */ - snprintf(tmppath, sizeof(tmppath), "%s%s", ep, path + 1); /* AUDIT biege: who gurantees that path+1 contains data? */ + trio_snprintf(tmppath, sizeof(tmppath), "%s%s", ep, path + 1); /* AUDIT biege: who gurantees that path+1 contains data? */ ConfigInit(tmppath); } /* @@ -317,7 +318,7 @@ void symlink_latest() ** skip that whole thing and ignore that option. 
*/ #ifdef __LCC__ - snprintf(errmsg, sizeof(errmsg), + trio_snprintf(errmsg, sizeof(errmsg), "WARNING: latest_folder not supported in Win32 environment.\n"); fprintf(stderr, "%s", errmsg); #else @@ -330,13 +331,13 @@ void symlink_latest() trio_snprintf(filename, MAXFILELEN, "%s%s", set_dir, set_latest_folder); if (!stat(filename, &stbuf) && unlink(filename)) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\" (latest_folder option).", lang[MSG_CANNOT_UNLINK], filename); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\" (latest_folder option).", lang[MSG_CANNOT_UNLINK], filename); progerr(errmsg); return; } if (symlink(latest_folder_path, filename)) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\" (latest_folder option).", lang[MSG_CANNOT_CREATE_SYMLINK], filename); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\" (latest_folder option).", lang[MSG_CANNOT_CREATE_SYMLINK], filename); progerr(errmsg); return; } @@ -392,7 +393,7 @@ int find_max_msgnum() return -1; dir = opendir(s_dir); if (dir == NULL) { - snprintf(errmsg, sizeof(errmsg), "internal error find_max_msgnum opening \"%s\".", s_dir); + trio_snprintf(errmsg, sizeof(errmsg), "internal error find_max_msgnum opening \"%s\".", s_dir); progerr(errmsg); } } diff --git a/src/finelink.c b/src/finelink.c index e320f8c1..845e7b71 100644 --- a/src/finelink.c +++ b/src/finelink.c @@ -59,7 +59,7 @@ static struct body *place_anchor(const String_Match * match_info, if (*ptr == match_info->start_match) { strcpy(token, last_ptr); *last_ptr = 0; - fprintf(fp2, "%s", buffer, anchor); + fprintf(fp2, "%s", buffer, anchor); strcpy(buffer, token); *ptr = last_ptr0; return bp; @@ -86,7 +86,7 @@ static struct body *place_anchor(const String_Match * match_info, if (0) printf("No match found %s; %s", anchor, buffer); } - fprintf(fp2, "", anchor); + fprintf(fp2, "", anchor); return bp; } @@ -95,7 +95,9 @@ static int place_a_end(const String_Match * match_info, struct body **bp, char * int index; char token[MAXLINE]; char *ptr1 = buffer; 
+#ifdef IS_THIS_NEEDED_20230503 char *last_ptr = ptr1; +#endif char *tptr; if (!*bp) return FALSE; @@ -106,7 +108,9 @@ static int place_a_end(const String_Match * match_info, struct body **bp, char * fprintf(fp2, "%s%s", buffer, token); return TRUE; } +#ifdef IS_THIS_NEEDED_20230503 last_ptr = ptr1; +#endif if (!(tptr = strstr(ptr1, *ptr))) { int len = (4 * strlen(*ptr)) / 5; char *temp1 = (char *)emalloc(len + 1); @@ -168,7 +172,7 @@ static int add_anchor(int msgnum, int quoting_msgnum, int quote_num, const char } tmpfilename = htmlfilename("tmp", ep, "tmp"); /* AUDIT biege: where is the tmp-file created? cwd? what about checking the return-value */ if ((fp2 = fopen(tmpfilename, "w")) == NULL) { - snprintf(errmsg, sizeof(errmsg), "Couldn't write \"%s\".", tmpfilename); + trio_snprintf(errmsg, sizeof(errmsg), "Couldn't write \"%s\".", tmpfilename); progerr(errmsg); } while (fgets(buffer, sizeof(buffer), fp1)) { @@ -242,11 +246,11 @@ static int add_anchor(int msgnum, int quoting_msgnum, int quote_num, const char remove(tmpfilename); else { if (rename(tmpfilename, filename) == -1) { - snprintf(errmsg, sizeof(errmsg), "Couldn't rename \"%s\" to %s.", tmpfilename, filename); + trio_snprintf(errmsg, sizeof(errmsg), "Couldn't rename \"%s\" to %s.", tmpfilename, filename); progerr(errmsg); } if (chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "Couldn't chmod \"%s\" to %o.", filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "Couldn't chmod \"%s\" to %o.", filename, set_filemode); progerr(errmsg); } } @@ -430,7 +434,7 @@ int handle_quoted_text(FILE *fp, struct emailinfo *email, const struct body *bp, tmpline[part2 - line] = 0; } if (set_link_to_replies) - fprintf(fp, "", quote_num); + fprintf(fp, "", quote_num); p2 = ConvURLsString(part2, email->msgid, email->subject, email->charset); if (replacing) fprintf(fp, fmt1, url1, set_quote_link_string, p2 ? 
p2 : ""); @@ -510,7 +514,7 @@ int get_new_reply_to() * "In reply to" */ -void replace_maybe_replies(const char *filename, struct emailinfo *ep, int new_reply_to) +void replace_maybe_replies(const char *filename, struct emailinfo *ep, int local_new_reply_to) { char tmpfilename[MAXFILELEN]; char buffer[MAXLINE]; @@ -520,15 +524,15 @@ void replace_maybe_replies(const char *filename, struct emailinfo *ep, int new_r char *ptr; static const char *prev_patt0 = ".html\">[ Previous ]"; - if (!hashnumlookup(new_reply_to, &ep2)) + if (!hashnumlookup(local_new_reply_to, &ep2)) return; - snprintf(tmpfilename, sizeof(tmpfilename), "%s/aaaa.tmp", set_dir); /* AUDIT biege: poss. BOF. */ + trio_snprintf(tmpfilename, sizeof(tmpfilename), "%s/aaaa.tmp", set_dir); /* AUDIT biege: poss. BOF. */ if ((fp1 = fopen(filename, "r")) == NULL) { - snprintf(errmsg, sizeof(errmsg), "Couldn't read \"%s\".", filename); + trio_snprintf(errmsg, sizeof(errmsg), "Couldn't read \"%s\".", filename); progerr(errmsg); } if ((fp2 = fopen(tmpfilename, "w")) == NULL) { - snprintf(errmsg, sizeof(errmsg), "Couldn't write \"%s\".", tmpfilename); + trio_snprintf(errmsg, sizeof(errmsg), "Couldn't write \"%s\".", tmpfilename); progerr(errmsg); } while (fgets(buffer, sizeof(buffer), fp1)) { @@ -541,9 +545,9 @@ void replace_maybe_replies(const char *filename, struct emailinfo *ep, int new_r char *tmpptr = convchars(ep2->subject, ep2->charset); if (tmpptr) { char *path = get_path(ep, ep2); - fprintf(fp2,"[ %s ]\n", - path, new_reply_to, set_htmlsuffix, lang[MSG_LTITLE_IN_REPLY_TO], - ep2->name, tmpptr ? tmpptr : ""); + fprintf(fp2,"
  • %s
  • \n", + path, local_new_reply_to, set_htmlsuffix, + tmpptr ? tmpptr : ""); free(tmpptr); } } @@ -552,10 +556,10 @@ void replace_maybe_replies(const char *filename, struct emailinfo *ep, int new_r char *tmpptr = convchars(ep2->subject, ep2->charset); if (tmpptr) { char *path = get_path(ep, ep2); - fprintf(fp2, "
  • %s " - "%s: \"%s\"
  • \n", + fprintf(fp2, "
  • %s " + "%s: \"%s\"
  • \n", lang[MSG_IN_REPLY_TO], path, - new_reply_to, set_htmlsuffix, lang[MSG_LTITLE_IN_REPLY_TO], + local_new_reply_to, set_htmlsuffix, ep2->name, tmpptr ? tmpptr : ""); free(tmpptr); } @@ -565,7 +569,7 @@ void replace_maybe_replies(const char *filename, struct emailinfo *ep, int new_r char *tmpptr = convchars(ep2->subject, ep2->charset); if (tmpptr) { char *path = get_path(ep, ep2); - fprintf(fp2, "
  • %s: " "%s: \"%s\"\n", lang[MSG_IN_REPLY_TO], path, new_reply_to, set_htmlsuffix, ep2->name, tmpptr ? tmpptr : ""); + fprintf(fp2, "
  • %s: " "%s: \"%s\"\n", lang[MSG_IN_REPLY_TO], path, local_new_reply_to, set_htmlsuffix, ep2->name, tmpptr ? tmpptr : ""); free(tmpptr); } } @@ -577,9 +581,9 @@ void replace_maybe_replies(const char *filename, struct emailinfo *ep, int new_r "%s:", "
  • Previous message: %s: %s: %s: %s: %s: %s: %s: tDAY tDAY_UNIT tDAYZONE tHOUR_UNIT tMINUTE_UNIT -%type tMONTH tMONTH_UNIT -%type tSEC_UNIT tSNUMBER tUNUMBER tYEAR_UNIT tZONE -%type tMERIDIAN o_merid +%token tAGO tID tDST tNEVER +%token tDAY tDAY_UNIT tDAYZONE tHOUR_UNIT tMINUTE_UNIT +%token tMONTH tMONTH_UNIT +%token tSEC_UNIT tSNUMBER tUNUMBER tYEAR_UNIT tZONE +%token tMERIDIAN +%type o_merid %% diff --git a/src/getname.c b/src/getname.c index edd5d5b4..111cc441 100644 --- a/src/getname.c +++ b/src/getname.c @@ -1,3 +1,21 @@ +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + #include "hypermail.h" #include "getname.h" #include "setup.h" @@ -56,6 +74,12 @@ static int blankstring(char *str) ** ** This is an interesting new one (1998-11-26): ** From: ›Name.Hidden@era.ericsson.seœ +** +** Another case that was not yet handled, when there are two @ chars +** in the line, one in comments, and one for the address. The code was only +** detecting the first one and including the parenthesis as part of the address. +** (2023-04-27): +** From: "Roy T. 
Fielding (fielding@kiwi.ics.uci.edu)" */ /* AUDIT biege: this code is really tricky and may lead to BOFs in email[] and/or name[] */ @@ -64,6 +88,7 @@ void getname(char *line, char **namep, char **emailp) int i; int len; char *c; + int offset; int comment_fnd; char email[MAILSTRLEN]; @@ -84,7 +109,17 @@ void getname(char *line, char **namep, char **emailp) /* EMail Processing First: ** First, is there an '@' sign we can use as an anchor ? */ - if ((c = hm_strchr(line, '@')) == NULL) { + + /* email is often found between <> chars, let's try to find it there + to cover the case of the @ char found in comments and between <> */ + c = hm_strchr(line, '<'); + if (c && hm_strchr(c, '@')) { + offset = c - line; + } else { + offset = 0; + } + + if ((c = hm_strchr(line + offset, '@')) == NULL) { /* ** No '@' sign here so ... */ @@ -204,7 +239,12 @@ void getname(char *line, char **namep, char **emailp) for (i = 0, len = NAMESTRLEN - 1; *c && *c != '\"' && *c != '[' && *c != '\n' && i < len; c++) name[i++] = *c; - name[--i] = '\0'; + if (i > 0) { + --i; + } else { + i = 0; + } + name[i] = '\0'; comment_fnd = 1; } else { diff --git a/src/getname.h b/src/getname.h index 5fc5e5d6..1fd67acd 100644 --- a/src/getname.h +++ b/src/getname.h @@ -1 +1,23 @@ +#ifndef _HYPERMAIL_GETNAME_H +#define _HYPERMAIL_GETNAME_H +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. 
+** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + void getname(char *, char **, char **); + +#endif /* _HYPERMAIL_GETNAME_H */ diff --git a/src/hypermail.c b/src/hypermail.c index b293302a..45a9f4fb 100644 --- a/src/hypermail.c +++ b/src/hypermail.c @@ -3,10 +3,11 @@ ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. ** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 +** Hypermail Project 1998-2023 ** ** This program and library is free software; you can redistribute it and/or ** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 +** as published by the Free Software Foundation; either version 3 ** of the License, or any later version. ** ** This program is distributed in the hope that it will be useful, @@ -114,7 +115,7 @@ char *setindex(char *dfltindex, char *indextype, char *suffix) void version(void) { - printf("%s: %s: %s %s: %s\n", PROGNAME, lang[MSG_VERSION], VERSION, lang[MSG_PATCHLEVEL], PATCHLEVEL); + printf("%s: %s: %s\n", PROGNAME, lang[MSG_VERSION], VERSION); exit(0); } @@ -147,8 +148,6 @@ void usage(void) printf(" -o keyword=val: Set config item\n"); printf(" -p : %s\n", lang[MSG_OPTION_P]); printf(" -s htmlsuffix : %s\n", "HTML file suffix (.html, .htm, ..)"); - printf(" -t : %s\n", "Use Tables"); - printf(" -T : %s\n", "Use index tables"); printf(" -u : %s\n", lang[MSG_OPTION_U]); printf(" -v : %s\n", lang[MSG_OPTION_VERBOSE]); printf(" -V : %s\n", lang[MSG_OPTION_VERSION]); @@ -165,7 +164,6 @@ void usage(void) } printf(")\n"); printf("%s : %s\n", lang[MSG_VERSION], VERSION); - printf("%s : %s\n", lang[MSG_PATCHLEVEL], PATCHLEVEL); printf("%s : %s\n\n", lang[MSG_DOCS], HMURL); exit(1); } @@ -225,7 +223,6 @@ int main(int argc, char **argv) case 'p': 
case 's': case 't': - case 'T': case 'u': case 'x': case 'X': @@ -303,9 +300,12 @@ int main(int argc, char **argv) case 't': set_usetable = TRUE; break; + /* removed in 2.2.25 */ + /* case 'T': set_indextable = TRUE; break; + */ case 'u': set_increment = TRUE; break; @@ -335,7 +335,10 @@ int main(int argc, char **argv) break; } } - + + /* do some postconfig checks for deprecated / obsolete options and inits */ + PostConfig(); + #ifdef DEBUG dump_config(); exit(0); @@ -353,35 +356,35 @@ int main(int argc, char **argv) */ if (strlen(set_language) > 2) { - locale_code = strsav(set_language); + locale_code = set_language; set_language[2] = 0; /* shorten to 2-letter code */ } else locale_code = NULL; if ((tlang = valid_language(set_language, &locale_code)) == NULL) { - snprintf(errmsg, sizeof(errmsg), "\"%s\" %s.", set_language, lang[MSG_LANGUAGE_NOT_SUPPORTED]); - cmderr(errmsg); + trio_snprintf(errmsg, sizeof(errmsg), "\"%s\" %s.", set_language, lang[MSG_LANGUAGE_NOT_SUPPORTED]); + cmderr(errmsg); } #ifdef HAVE_LOCALE_H - if (!setlocale(LC_ALL, locale_code)) { - char *rv; - - if (!strcmp(locale_code, "en_US")) { - /* many systems now install by defualt en_US.UTF-8. - Here we assume that the mapping between en_US and en_US.UTF-8 - in system messages is identical. - We cannot do the same for other languages, though */ - rv = setlocale(LC_ALL, "en_US.UTF-8"); - } - if (!rv) { - snprintf(errmsg, sizeof(errmsg), "WARNING: locale \"%s\", not supported.\n", locale_code); - fprintf(stderr, "%s", errmsg);/* AUDIT biege: avoid format-bug warning */ - } + if (!setlocale(LC_ALL, locale_code)) { + char *rv = NULL; + + if (!strcmp(locale_code, "en_US")) { + /* many systems now install by defualt en_US.UTF-8. + Here we assume that the mapping between en_US and en_US.UTF-8 + in system messages is identical. 
+ We cannot do the same for other languages, though */ + rv = setlocale(LC_ALL, "en_US.UTF-8"); + } + if (!rv) { + trio_snprintf(errmsg, sizeof(errmsg), "WARNING: locale \"%s\", not supported.\n", locale_code); + fprintf(stderr, "%s", errmsg);/* AUDIT biege: avoid format-bug warning */ + } } #endif - + lang = tlang; /* A good language, make it so. */ if (print_usage) /* Print the usage message and terminate */ @@ -389,17 +392,17 @@ int main(int argc, char **argv) #ifndef GDBM if (set_usegdbm) { - fprintf(stderr, "%s: %s\n", PROGNAME, lang[MSG_OPTION_G_NOT_BUILD_IN]); - usage(); + fprintf(stderr, "%s: %s\n", PROGNAME, lang[MSG_OPTION_G_NOT_BUILD_IN]); + usage(); } #endif - + #ifndef HAVE_LIBFNV if (set_nonsequential) - progerr("Hypermail isn't built with the libfnv hash library.\n" - "You cannot use the nonsequential option.\n"); + progerr("Hypermail isn't built with the libfnv hash library.\n" + "You cannot use the nonsequential option.\n"); #endif /* HAVE_LIBFNV */ - + if (set_mbox && !strcasecmp(set_mbox, "NONE")) { use_stdin = TRUE; } @@ -425,24 +428,39 @@ int main(int argc, char **argv) use_stdin = FALSE; } else { - if (set_mbox) + if (set_mbox) { free(set_mbox); + } set_mbox = NULL; } - /* - ** Deprecated options - */ - if (set_showhr) { - fprintf (stderr, "The \"showhr\" option has been deprecated. Ignoring it.\n"); - set_showhr = FALSE; + if (set_dir) { + char *dp = dirpath(set_dir); + set_dir = strreplace(set_dir, dp); + free (dp); } + + /* + * Default names for directories and labels need to be figured out. + */ + + if (use_stdin && (!set_dir || !strcasecmp(set_dir, "NONE"))) + set_dir = strreplace(set_dir, DIRNAME); - if (set_usetable) { - fprintf (stderr, "The \"usetable\" option has been deprecated. Ignoring it.\n"); - set_usetable = FALSE; + if (!set_dir || !strcasecmp(set_dir, "NONE")) + set_dir = strreplace(set_dir, (strrchr(set_mbox, '/')) ? 
strrchr(set_mbox, '/') + 1 : set_mbox); + + if (set_dir[strlen(set_dir) - 1] != PATH_SEPARATOR) { + char *t = set_dir; + + trio_asprintf(&set_dir, "%s%c", t, PATH_SEPARATOR); + free(t); } + if (!set_label || !strcasecmp(set_label, "NONE")) { + set_label = set_mbox ? (strreplace(set_label, (strrchr(set_mbox, '/')) ? strrchr(set_mbox, '/') + 1 : set_mbox)) : strsav("stdin"); + } + /* * Read the contents of the file into the variables to be used * in printing out the pages. @@ -456,26 +474,19 @@ int main(int argc, char **argv) ihtmlnavbar2upfile = expand_contents(set_ihtmlnavbar2up); mhtmlheaderfile = expand_contents(set_mhtmlheader); mhtmlfooterfile = expand_contents(set_mhtmlfooter); + if (set_mhtmlnavbar2up) { + mhtmlnavbar2upfile = expand_contents(set_mhtmlnavbar2up); + } else { + mhtmlnavbar2upfile = expand_contents(set_ihtmlnavbar2up); + } - if (set_dir) - set_dir = strreplace(set_dir, dirpath(set_dir)); - - /* - * Default names for directories and labels need to be figured out. - */ - - if (use_stdin && (!set_dir || !strcasecmp(set_dir, "NONE"))) - set_dir = strreplace(set_dir, DIRNAME); - - if (!set_dir || !strcasecmp(set_dir, "NONE")) - set_dir = strreplace(set_dir, (strrchr(set_mbox, '/')) ? strrchr(set_mbox, '/') + 1 : set_mbox); - - if (set_dir[strlen(set_dir) - 1] != PATH_SEPARATOR) - trio_asprintf(&set_dir, "%s%c", set_dir, PATH_SEPARATOR); - - if (!set_label || !strcasecmp(set_label, "NONE")) - set_label = set_mbox ? (strreplace(set_label, (strrchr(set_mbox, '/')) ? strrchr(set_mbox, '/') + 1 : set_mbox)) : "stdin"; - + /* if the user didn't specify a message navbar for messages, we + set up a generic one using the archives's label and linking + back to the main index */ + if (!mhtmlnavbar2upfile || !*mhtmlnavbar2upfile) { + trio_asprintf(&mhtmlnavbar2upfile, DEFAULT_MHTML_NAVBAR2UP, set_label); + } + /* * Which index file will be called "index.html"? 
*/ @@ -594,6 +605,22 @@ int main(int argc, char **argv) checkdir(set_dir); + /* write the default css if any of the two custom ones was not + declared */ + if (! (set_icss_url && *set_icss_url) || + ! (set_mcss_url && *set_mcss_url)) { + + char *filename; + + if (set_default_css_url && !strcmp (set_default_css_url, "hypermail.css")) { + trio_asprintf(&filename, "%s%s", set_dir, "hypermail.css"); + if (!isfile(filename)) { + print_default_css_file (filename); + } + free(filename); + } + } + /* * Let's do it. */ @@ -678,8 +705,18 @@ int main(int argc, char **argv) writearticles(0, max_msgnum + 1); } - if (amount_new) { /* Always write the index files */ - if (set_linkquotes) { + /* only update the indices when we either have a new + message (if set_increment) is set or when inputting + a whole mbox. + This is a lame way to take into account msgid collision + when adding a new message. Without binding the condition + to set_condition, this could result in the indices stating + the archive is empty */ + if ((set_increment && amount_new > 0) + || (!set_increment + && (amount_new > 0 || count_deleted(max_msgnum + 1)))) { + /* Always write the index files */ + if (amount_new && set_linkquotes) { threadlist = NULL; threadlist_end = NULL; printedthreadlist = NULL; @@ -703,7 +740,7 @@ int main(int argc, char **argv) /* if (ep->flags & THREADING_ALTERED) */ } } - count_deleted(max_msgnum + 1); + if (show_index[0][DATE_INDEX]) writedates(amount_new, NULL); if (show_index[0][THREAD_INDEX]) @@ -731,24 +768,16 @@ int main(int argc, char **argv) if (set_uselock) unlock_archive(); + /* + ** do some cleanup + */ + printed_free(printedlist); + printed_free(printedthreadlist); + + ConfigCleanup(); + if (configfile) free(configfile); - if (ihtmlheaderfile) - free(ihtmlheaderfile); - if (ihtmlfooterfile) - free(ihtmlfooterfile); - if (ihtmlheadfile) - free(ihtmlheadfile); - if (ihtmlhelpupfile) - free(ihtmlhelpupfile); - if (ihtmlhelplowfile) - free(ihtmlhelplowfile); - if 
(ihtmlnavbar2upfile) - free(ihtmlnavbar2upfile); - if (mhtmlheaderfile) - free(mhtmlheaderfile); - if (mhtmlfooterfile) - free(mhtmlfooterfile); - + return (0); } diff --git a/src/hypermail.h b/src/hypermail.h index 38d71ac9..4f126f47 100644 --- a/src/hypermail.h +++ b/src/hypermail.h @@ -1,12 +1,15 @@ +#ifndef _HYPERMAIL_HYPERMAIL_H +#define _HYPERMAIL_HYPERMAIL_H /* ** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. ** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 +** Hypermail Project 1998-2023 ** ** This program and library is free software; you can redistribute it and/or ** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 +** as published by the Free Software Foundation; either version 3 ** of the License, or any later version. ** ** This program is distributed in the hope that it will be useful, @@ -19,9 +22,6 @@ ** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA */ -#ifndef _HYPERMAIL_HYPERMAIL_H -#define _HYPERMAIL_HYPERMAIL_H - #ifndef MAIN_FILE #define VAR extern #else @@ -98,8 +98,8 @@ #endif /* -* this redefines the standard *printf() to use ours -*/ + * this redefines the standard *printf() to use ours + */ #define TRIO_REPLACE_STDIO #define HAVE_SSCANF /* avoid problems in setup.c with trio_sscanf */ #include @@ -114,7 +114,7 @@ #define TRUE 1 #define PROGNAME "hypermail" -#define HMURL "http://www.hypermail-project.org/" +#define HMURL "https://github.com/hypermail-project/hypermail/" #define INDEXNAME "index" #define DIRNAME "archive" @@ -134,7 +134,9 @@ #define NUMSTRLEN 10 #define MAXLINE 1024 +#define MAXURLLEN 4096 #define MAXFILELEN 256 +#define MAX_FWD_MSG_NESTING_LEVEL 100 #define NAMESTRLEN 320 #define MAILSTRLEN 80 #define DATESTRLEN 80 @@ -191,10 +193,27 @@ typedef enum { FORMAT_FLOWED = 1 } textplain_format_t; +typedef 
enum { + MN_KEEP = 0, + MN_KEEP_WITH_STORED_ATTACHMENT = 1, + MN_SKIP_BUT_KEEP_CHILDREN = 2, + MN_SKIP_STORED_ATTACHMENT = 4, + MN_SKIP_ALL = 8 +} message_node_skip_t; + +typedef enum { + MN_KEEP_NODE = 0, + MN_FREE_NODE = 1, + MN_FREE_ROOT_NODE = 2, + MN_DELETE_ATTACHMENTS = 4 +} message_node_release_details_t; + /* conversions supported by string.c:parseemail() */ typedef enum { - MAKEMAILCOMMAND = 1, /* makes links clickable */ - REPLACE_DOMAIN = 2, /* replaces domain by antispamdomain */ + MAKEMAILCOMMAND = 1, /* makes links clickable */ + OBFUSCATE_ADDRESS = 2, /* only obfuscate the email address */ + REPLACE_DOMAIN = 3 /* replaces domain by antispamdomain */ + } parseemail_conversion_t; /* @@ -233,7 +252,28 @@ struct body { char html; /* set to TRUE if already converted to HTML */ char header; /* part of header */ char parsedheader; /* this header line has been parsed once */ + char invalid_header; /* this is an invalid header line, it's missing its + header name, header value, and or has an invalid value */ + char antispam_disabled; /* no antispam was applied to this line */ +#ifdef DELETE_ME char attached; /* part of attachment */ +#endif +#ifdef DELETE_ME + char attachment_status; /* says if this is the start / end of an attachment and + type, expected to replace attached */ +#endif + /* review if still used */ + char attachment_links; /* part of generated links to attachments */ + int attachment_links_flags; + char attachment_rfc822; /* first line of a message/rfc822 attachment */ + int attachment_flags; /* states metadata for generating + markup when printing out the body, + like start or end of an attachment, + or a list of stored attachments, + and so on. 
check the flags defined + in BODY_ATTACHMENT early on this + section (and convert them to use an + typedef enum one day */ char demimed; /* if this is a header, this is set to TRUE if it has passed the decoderfc2047() function */ int format_flowed; /* TRUE if this a text/plain f=f line */ @@ -241,6 +281,41 @@ struct body { struct body *next; }; +/* here we divide a message into nodes. A message that has no attachments + has a single node. A message with attachments has one attachment child. + All those attachments then are considered siblings. + An attachment itself may be a message with children attachments, for + example a message/rfc822 */ +struct message_node { + struct body *bp; + struct body *lp; +#ifdef DEBUG_PARSE_MSGID_TRACE + char *msgid; /* for helping debugging */ +#endif + char *charset; /* the charset declared in the content-type */ + char *charsetsave; /* the first charset found in MIME RFC2047 encoded headers */ + char *content_type; + char *bin_filename; /* gives the path + filename if the part is stored */ + char *meta_filename; /* gives the path + filename of the metadata filename if it + was created */ + char *html_link; /* gives the link that will be added to the stored attachment + list */ + char *comment_filename; /* gives the filename that will be mentioned + in the HTML comment underneath the link */ + char *boundary_part; /* for multipart/mixed and message/rfc822, the + MIME for a given part */ + char *boundary_type; /* for multipart/mixed, the boundary declared in + the content type */ + char attachment_rfc822; /* set to TRUE if this message node is a + message/rfc822 attachment */ + char alternative; /* set to TRUE if this message node is a child of multipart/alternative */ + message_node_skip_t skip; /* different values stating how we should deal with this node + if we need to fully or partially skip it when flattening it */ + struct message_node *attachment_child; + struct message_node *attachment_next_sibling; + struct message_node 
*parent; +}; + struct printed { int msgnum; struct printed *next; @@ -330,25 +405,52 @@ struct attach { #define BODY_CONTINUE (1<<0) /* this is a continued line */ #define BODY_HTMLIZED (1<<1) /* this is already htmlified */ #define BODY_HEADER (1<<2) /* this is a header line */ -#define BODY_ATTACHED (1<<3) /* this line was attached */ -#define BODY_FORMAT_FLOWED (1<<4) /* this line is format-flowed */ -#define BODY_DEL_SSQ (1<<5) /* remove both space stuffing and +#define BODY_ATTACHMENT (1<<3) /* this line was attached */ +#define BODY_ATTACHMENT_START (1<<4) /* meta data to help encapsulate attachments */ +#define BODY_ATTACHMENT_END (1<<5) +/* beginning and ending of a list of external attachment links */ +#define BODY_ATTACHMENT_LINKS_START (1<<6) +#define BODY_ATTACHMENT_LINKS_END (1<<7) +#define BODY_ATTACHMENT_LINKS (1<<8) /* this line is a child of the list of external + attachment links */ +#define BODY_ATTACHMENT_RFC822 (1<<9) /* this line is the beginning of an message/rfc822 + ** attachment */ +#define BODY_FORMAT_FLOWED (1<<10) /* this line is format-flowed */ +#define BODY_DEL_SSQ (1<<11) /* remove both space stuffing and * quotes where applicable for f=f */ +#define BODY_NO_ANTISPAM (1<<12) /* disables anti spam protection for this line */ +#ifdef DELETE_ME +#define BODY_ATTACHED (1<<13) /* temp while cleaning code */ +#endif - -struct boundary { - struct boundary *next; - struct boundary *prev; - char *line; +/* used to store a MIME boundary and all the context related to it +** alternative_info, charset, applemail_hack, ... 
*/
+struct boundary_stack {
+    struct boundary_stack *next;
+    struct boundary_stack *prev;
+    char *boundary_id;
+
+    char alternativeparser;
+    int alternative_weight;
+    struct body *alternative_lp;
+    struct body *alternative_bp;
+    struct message_node *current_alt_message_node;
+    struct message_node *root_alt_message_node;
+    char alternative_message_node_created;
+    char alternative_file[131];
+    char alternative_lastfile[131];
+    char last_alternative_type[131];
+    /* the following three store the context for the applemail hack */
+    int parse_multipart_alternative_force_save_alts;
+    int applemail_old_set_save_alts;
+    int set_save_alts;
 };
 
-struct charset_stack {
-    struct charset_stack *next;
-    struct charset_stack *prev;
-    char *charset;
-    char *charsetsave;
+struct hm_stack {
+    struct hm_stack *prev;
+    void *value;
 };
-
+
 VAR struct header *subjectlist;
 VAR struct header *authorlist;
 VAR struct header *datelist;
@@ -395,6 +497,7 @@ VAR char *ihtmlhelplowfile;
 VAR char *ihtmlnavbar2upfile;
 VAR char *mhtmlheaderfile;
 VAR char *mhtmlfooterfile;
+VAR char *mhtmlnavbar2upfile;
 
 VAR long firstdatenum;
 VAR long lastdatenum;
@@ -431,4 +534,4 @@ extern int strcasecmp(const char *, const char *);
 extern int strncasecmp(const char *, const char *, size_t);
 #endif
 
-#endif /* ! _HYPERMAIL_HYPERMAIL_H */
+#endif /* _HYPERMAIL_HYPERMAIL_H */
diff --git a/src/lang.c b/src/lang.c
index 11952330..1fb8e80c 100644
--- a/src/lang.c
+++ b/src/lang.c
@@ -1,3 +1,21 @@
+/*
+** Copyright (C) 1997-2023 Hypermail Project
+**
+** This program and library is free software; you can redistribute it and/or
+** modify it under the terms of the GNU (Library) General Public License
+** as published by the Free Software Foundation; either version 3
+** of the License, or any later version.
+**
+** This program is distributed in the hope that it will be useful,
+** but WITHOUT ANY WARRANTY; without even the implied warranty of
+** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the
+** GNU (Library) General Public License for more details.
+**
+** You should have received a copy of the GNU (Library) General Public License
+** along with this program; if not, write to the Free Software
+** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
+*/
+
 #ifdef LANG_PROG
 #define MAIN_FILE
 #endif
diff --git a/src/lang.h b/src/lang.h
index 7ec42e66..ec265819 100644
--- a/src/lang.h
+++ b/src/lang.h
@@ -1,3 +1,23 @@
+#ifndef _HYPERMAIL_LANG_H
+#define _HYPERMAIL_LANG_H
+/*
+** Copyright (C) 1997-2023 Hypermail Project
+**
+** This program and library is free software; you can redistribute it and/or
+** modify it under the terms of the GNU (Library) General Public License
+** as published by the Free Software Foundation; either version 3
+** of the License, or any later version.
+**
+** This program is distributed in the hope that it will be useful,
+** but WITHOUT ANY WARRANTY; without even the implied warranty of
+** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+** GNU (Library) General Public License for more details.
+**
+** You should have received a copy of the GNU (Library) General Public License
+** along with this program; if not, write to the Free Software
+** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA
+*/
+
 /*
 ** WARNING!!! Don't muck with this file unless you know what you are
 ** getting yourself into!!!!!!
@@ -231,6 +251,16 @@ struct language_entry { #define MSG_EDITED 166 #define MSG_SENDER_DELETED 167 #define MSG_SUBJECT_DELETED 168 +#define MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION 169 +#define MSG_CSORT_BY 170 +#define MSG_EMPTY_ARCHIVE 171 +#define MSG_ATTACHED_MESSAGE_NOTICE 172 +#define MSG_FORWARDED_MESSAGE_NOTICE 173 +#define MSG_ATTACHMENTS_NOTICE 174 +#define MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE 175 +#define MSG_EMPTY_ARCHIVE_NOTICE 176 +#define MSG_CSS_NORMAL_VIEW 177 + #ifdef MAIN_FILE /* @@ -344,9 +374,9 @@ char *de[] = { /* German */ "Aktionsmöglichkeiten", /* Mail actions (MA) header -HTML*/ "Sende E-Mail mit neuen Titel", /* MA New Message -HTML*/ "Antworte auf die E-Mail", /* MA Reply -HTML*/ - "Zusammenfassung der monatlichen Index-Dateien", + "zusammenfassung der monatlichen index-dateien", /* monthly -HTML*/ - "Zusammenfassung der jährlichen Index-Dateien", + "zusammenfassung der jährlichen index-dateien", /* yearly -HTML*/ "Lege GDBM-Zwischenspeicher für Kopfzeilen an", /* Build a GDBM header cache -STDOUT*/ @@ -370,7 +400,7 @@ char *de[] = { /* German */ "Verzeichnisliste", /* MSG_FOLDERS_INDEX -HTML */ "Diese Nachricht wurde aus dem Archiv entfernt",/* MSG_DELETED -HTML */ "Diese Nachricht ist abgelaufen", /* MSG_EXPIRED -HTML */ - "(gelöschte Nachricht)", /* MSG_DEL_SHORT -HTML */ + "gelöschte Nachricht", /* MSG_DEL_SHORT -HTML */ "Ursprünglicher Text dieser Nachricht", /* MSG_TXT_VERSION -HTML */ "Diese Nachricht wurde herausgefiltert", /* MSG_FILTERED_OUT -HTML */ "Autor", /* MSG_FROM -HTML*/ @@ -418,11 +448,20 @@ char *de[] = { /* German */ " bis ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Diese Nachricht wurde aus dem Archiv entfernt", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED 
-HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sortieren nach", /* CSort by - HTML */ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -529,8 +568,8 @@ char *pl[] = { /* English */ "Wybierz", /* Mail actions (MA) header -HTML*/ "wy¶lij nowy temat", /* MA New Message -HTML*/ "odpowiedz na t± wiadomo¶æ", /* MA Reply -HTML*/ - "Zestawienie miesiêcy", /* monthly -HTML*/ - "Zestawienie lat", /* yearly -HTML*/ + "zestawienie miesiêcy", /* monthly -HTML*/ + "zestawienie lat", /* yearly -HTML*/ "Utwórz cache nag³owków GDBM", /* Build a GDBM header cache -STDOUT*/ "GDBM header cache option not build in", /* GDBM header cache option not build in -STDERR*/ @@ -549,7 +588,7 @@ char *pl[] = { /* English */ "Lista Katalogów", /* MSG_FOLDERS_INDEX -HTML */ "Ta wiadomo¶æ zosta³a usuniêta z archiwum", /* MSG_DELETED -HTML */ "Ta wiadomo¶æ jest przedawniona", /*MSG_EXPIRED -HTML */ - "(usuniêta wiadomo¶æ)", /* MSG_DEL_SHORT -HTML */ + "usuniêta wiadomo¶æ", /* MSG_DEL_SHORT -HTML */ "Tekst tej zawarto¶ci", /* MSG_TXT_VERSION -HTML */ "Ta wiadomo¶æ zosta³a odfiltrowana", /* MSG_FILTERED_OUT -HTML */ "Autor", /* MSG_FROM -HTML*/ @@ -597,11 +636,20 @@ char *pl[] = { /* English */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Ta 
wiadomo¶æ zosta³a usuniêta z archiwum", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML */ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -709,8 +757,8 @@ char *en[] = { /* English */ "Mail actions", /* Mail actions (MA) header -HTML*/ "mail a new topic", /* MA New Message -HTML*/ "respond to this message", /* MA Reply -HTML*/ - "Summary of Monthly Index Files", /* monthly -HTML*/ - "Summary of Yearly Index Files", /* yearly -HTML*/ + "summary of monthly index files", /* monthly -HTML*/ + "summary of yearly index files", /* yearly -HTML*/ "Build a GDBM header cache", /* Build a GDBM header cache -STDOUT*/ "GDBM header cache option not build in", /* GDBM header cache option not build in -STDERR*/ @@ -729,7 +777,7 @@ char *en[] = { /* English */ "List of Folders", /* MSG_FOLDERS_INDEX -HTML */ "This message has been deleted from the archive", /* MSG_DELETED -HTML */ "This message has expired", /* MSG_EXPIRED -HTML */ - "(deleted message)", /* MSG_DEL_SHORT -HTML */ + "deleted message", /* MSG_DEL_SHORT -HTML */ "Original text of this message", /* MSG_TXT_VERSION -HTML */ "This message has been filtered out", /* MSG_FILTERED_OUT -HTML */ "From", /* MSG_FROM -HTML*/ @@ -777,11 +825,20 @@ char *en[] = { /* English */ " to ", /* to - HTML */ 
"from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ - "This message has been deleted from the archive", /* MSG_DELETED_OTHER -HTML */ - "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ - "deleted", /* MSG_SENDER_DELETED -HTML */ - "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ + "This message has been deleted from the archive", /* MSG_DELETED_OTHER - HTML */ + "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED - HTML */ + "deleted", /* MSG_SENDER_DELETED - HTML */ + "deleted", /* MSG_SUBJECT_DELETED - HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -899,8 +956,8 @@ char *es[] = { /* Espanol/Spanish */ "Cabecera MA (Mail actions)", /* Mail actions (MA) header - HTML */ "Enviar un nuevo tema", /* MA New Message - HTML */ "responder a este mensaje", /* MA Reply - HTML */ - "Resumen de índices mensuales", /* monthly - HTML */ - "Resumen de índices anuales", /* yearly - HTML */ + "resumen de índices mensuales", /* monthly - HTML */ + "resumen de índices anuales", /* yearly - HTML */ "Costruir cabecera para caché GDBM",/* Build a GDBM header cache - STDOUT*/ "Creando índice gdbm... 
", /* Creating gdbm index - STDOUT*/ "No pudo crearse fichero gdbm... ", /* Can't create gdbm index - STDOUT*/ @@ -919,7 +976,7 @@ char *es[] = { /* Espanol/Spanish */ "El mensaje ha sido borrado del archivo", /* MSG_DELETED - HTML */ "El mensaje ha caducado", /* MSG_EXPIRED - HTML */ - "(mensaje borrado)", /* MSG_DEL_SHORT - HTML */ + "mensaje borrado", /* MSG_DEL_SHORT - HTML */ "Texto original del mensaje", /* MSG_TXT_VERSION - HTML */ "El mensaje ha sido filtrado", /* MSG_FILTERED_OUT - HTML */ "Autor", /* MSG_FROM - HTML */ @@ -967,12 +1024,21 @@ char *es[] = { /* Espanol/Spanish */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "El mensaje ha sido borrado del archivo", /* MSG_DELETED_OTHER - HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table */ }; @@ -1082,8 +1148,8 @@ char *pt[] = { /* Brazilian Portuguese */ "Ações de E-Mail", /* Mail actions (MA) header -HTML*/ "Novo tópico de E-Mail", /* MA New Message -HTML*/ "Responder à esta mensagem", /* MA Reply -HTML*/ - "Sumário dos Arquivos Mensais de Índice", /* monthly -HTML*/ - "Sumário dos 
Arquivos Anuais de Índice", /* yearly -HTML*/ + "sumário dos arquivos mensais de índice", /* monthly -HTML*/ + "sumário dos arquivos anuais de índice", /* yearly -HTML*/ "Compilar cache de cabeçalho GDBM", /* Build a GDBM header cache -STDOUT*/ "Opção de cabeçalho GDBM não compilada", /* GDBM header cache option not build in -STDERR*/ "Criando índice gdbm... ", /* Creating gdbm index -STDOUT*/ @@ -1102,7 +1168,7 @@ char *pt[] = { /* Brazilian Portuguese */ "Lista de Diretórios", /* MSG_FOLDERS_INDEX -HTML */ "Esta mensagem foi removida do arquivo", /* MSG_DELETED -HTML */ "Esta mensagem expirou", /* MSG_EXPIRED -HTML */ - "(mensagem removida)", /* MSG_DEL_SHORT -HTML */ + "mensagem removida", /* MSG_DEL_SHORT -HTML */ "Texto original desta mensagem", /* MSG_TXT_VERSION -HTML */ "Esta mensagem foi filtrada", /* MSG_FILTERED_OUT -HTML */ "De", /* MSG_FROM -HTML*/ @@ -1150,11 +1216,20 @@ char *pt[] = { /* Brazilian Portuguese */ "por anexo", /* by attachment - HTML */ "período", /* period - HTML */ " para ", /* to - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Esta mensagem foi removida do arquivo", /* MSG_DELETE_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Ordenar por", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + 
"Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -1260,8 +1335,8 @@ char *fi[] = { /* Finnish */ "Mail actions", /* Mail actions (MA) header -HTML*/ "mail a new topic", /* MA New Message -HTML*/ "respond to this message", /* MA Reply -HTML*/ - "Summary of Monthly Index Files", /* monthly -HTML*/ - "Summary of Yearly Index Files", /* yearly -HTML*/ + "summary of monthly index files", /* monthly -HTML*/ + "summary of yearly index files", /* yearly -HTML*/ "Build a GDBM header cache", /* Build a GDBM header cache -STDOUT*/ "GDBM header cache option not build in", /* GDBM header cache option not build in -STDERR*/ @@ -1280,7 +1355,7 @@ char *fi[] = { /* Finnish */ "List of Folders", /* MSG_FOLDERS_INDEX -HTML */ "This message has been deleted from the archive", /* MSG_DELETED -HTML */ "This message has expired", /* MSG_EXPIRED -HTML */ - "(deleted message)", /* MSG_DEL_SHORT -HTML */ + "deleted message", /* MSG_DEL_SHORT -HTML */ "Original text of this message", /* MSG_TXT_VERSION -HTML */ "This message has been filtered out", /* MSG_FILTERED_OUT -HTML */ "Kirjoittajan mukaan", /* MSG_FROM -HTML*/ @@ -1328,11 +1403,20 @@ char *fi[] = { /* Finnish */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "This message has been deleted from the archive", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", 
/* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -1443,8 +1527,8 @@ char *it[] = { /* Italian */ "Azioni di posta", /* Mail actions (MA) header -HTML*/ "spedisci un nuovo argomento", /* MA New Message -HTML*/ "rispondi a questo messaggio", /* MA Reply -HTML*/ - "Riepilogo dei file di indice mensili", /* monthly -HTML*/ - "Riepilogo dei file di indice annuali", /* yearly -HTML*/ + "riepilogo dei file di indice mensili", /* monthly -HTML*/ + "riepilogo dei file di indice annuali", /* yearly -HTML*/ "Costruisci una cache degli header in GDBM", /* Build a GDBM header cache -STDOUT*/ "Creazione dell'indice gdbm ... ", /* Creating gdbm index -STDOUT*/ "Impossibile creare l'indice gdbm ... ", /* Can't create gdbm index -STDOUT*/ @@ -1461,7 +1545,7 @@ char *it[] = { /* Italian */ "Lista delle cartelle", /* MSG_FOLDERS_INDEX -HTML */ "Questo messaggio è stato cancellato dall'archivio", /* MSG_DELETED -HTML */ "Il messaggio è scaduto", /* MSG_EXPIRED -HTML */ - "(messaggio cancellato)", /* MSG_DEL_SHORT -HTML */ + "messaggio cancellato", /* MSG_DEL_SHORT -HTML */ "Testo originale di questo messaggio", /* MSG_TXT_VERSION -HTML */ "Questo messaggio è stato filtrato", /* MSG_FILTERED_OUT -HTML */ "From", /* MSG_FROM -HTML*/ @@ -1509,11 +1593,20 @@ char *it[] = { /* Italian */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Questo messaggio è stato cancellato dall'archivio", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", 
/* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -1621,8 +1714,8 @@ char *fr[] = { /* French */ "Actions sur les mails", /* Mail actions (MA) header -HTML*/ "créer un nouveau thème", /* MA New Message -HTML*/ "répondre à ce message", /* MA Reply -HTML*/ - "Récapitulatif des fichiers Index mensuels", /* monthly -HTML*/ - "Récapitulatif des fichiers Index annuels", /* yearly -HTML*/ + "récapitulatif des fichiers index mensuels", /* monthly -HTML*/ + "récapitulatif des fichiers index annuels", /* yearly -HTML*/ "Creation d'un cache GDBM pour les en-têtes", /* Build a GDBM header cache -STDOUT*/ "GDBM header cache option not build in", /* GDBM header cache option not build in -STDERR*/ @@ -1641,7 +1734,7 @@ char *fr[] = { /* French */ "Liste des dossiers", /* MSG_FOLDERS_INDEX -HTML */ "Ce message a été supprimé de l'archive", /* MSG_DELETED -HTML */ "Ce message est trop vieux", /* MSG_EXPIRED -HTML */ - "(message supprimé)", /* MSG_DEL_SHORT -HTML */ + "message supprimé", /* MSG_DEL_SHORT -HTML */ "Texte original de ce message", /* MSG_TXT_VERSION -HTML */ "Ce message a été supprimé par filtrage", /* MSG_FILTERED_OUT -HTML */ "Auteur", /* MSG_FROM -HTML*/ @@ -1669,7 +1762,7 @@ char *fr[] = { /* French */ "Messages récents par fichier attaché", /* Contemporary messages by attachments - HTML*/ "Barre de navigation vers le niveau 
supérieur", /* Navigation bar, upper levels - HTML*/ "Barre de navigation", /* Navigation bar - HTML*/ - "Trier par", /* Sort by - HTML*/ + "trier par", /* Sort by - HTML*/ "Autres périodes", /* Other periods - HTML */ "Suivant", /* Next folder - HTML */ "Messages archives dans la période suivante, triés par date", /* Next folder, by date - HTML link */ @@ -1689,11 +1782,20 @@ char *fr[] = { /* French */ " &eagrave; ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Ce message a été supprimé de l'archive", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Trier par", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -1801,8 +1903,8 @@ char *is[] = { /* Icelandic */ "Mail actions", /* Mail actions (MA) header -HTML*/ "mail a new topic", /* MA New Message -HTML*/ "respond to this message", /* MA Reply -HTML*/ - "Summary of Monthly Index Files", /* monthly -HTML*/ - "Summary of Yearly Index Files", /* yearly -HTML*/ + "summary of monthly index files", /* monthly -HTML*/ + "summary of yearly index files", /* yearly -HTML*/ "Build a GDBM header cache", /* Build a GDBM header 
cache -STDOUT*/ "GDBM header cache option not build in", /* GDBM header cache option not build in -STDERR*/ @@ -1823,7 +1925,7 @@ char *is[] = { /* Icelandic */ "List of Folders", /* MSG_FOLDERS_INDEX -HTML */ "This message has been deleted from the archive", /* MSG_DELETED -HTML */ "This message has expired", /* MSG_EXPIRED -HTML */ - "(deleted message)", /* MSG_DEL_SHORT -HTML */ + "deleted message", /* MSG_DEL_SHORT -HTML */ "Original text of this message", /* MSG_TXT_VERSION -HTML */ "This message has been filtered out", /* MSG_FILTERED_OUT -HTML */ "Höfundur", /* MSG_FROM -HTML*/ @@ -1871,11 +1973,20 @@ char *is[] = { /* Icelandic */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "This message has been deleted from the archive", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -1988,8 +2099,8 @@ char *sv[] = { "E-postfunktioner", /* Mail actions (MA) header -HTML*/ "sänd ett nytt ämne", /* MA New Message -HTML*/ "svara på brevet", /* MA Reply -HTML*/ - "Sammanfattning över månatliga indexfiler", /* 
monthly -HTML*/ - "Sammanfattning över årliga indexfiler", /* yearly -HTML*/ + "sammanfattning över månatliga indexfiler", /* monthly -HTML*/ + "sammanfattning över årliga indexfiler", /* yearly -HTML*/ "Bygger en GDBM-huvudcache", /* Build a GDBM header cache -STDOUT*/ "Tillval för GDBM-huvudcache inte inkompilerat", /* GDBM header cache option not build in -STDERR*/ @@ -2008,7 +2119,7 @@ char *sv[] = { "Mapplista", /* MSG_FOLDERS_INDEX -HTML */ "Detta brev har tagits bort från arkivet", /* MSG_DELETED -HTML */ "Detta brev har utgått", /* MSG_EXPIRED -HTML */ - "(borttaget brev)", /* MSG_DEL_SHORT -HTML */ + "borttaget brev", /* MSG_DEL_SHORT -HTML */ "Ursprunglig brevtext", /* MSG_TXT_VERSION -HTML */ "Detta brev har filtrerats", /* MSG_FILTERED_OUT -HTML */ "Författare", /* MSG_FROM -HTML*/ @@ -2056,11 +2167,20 @@ char *sv[] = { " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Detta brev har tagits bort från arkivet", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -2173,8 +2293,8 @@ char *no[] = { 
"E-postfunksjoner", /* Mail actions (MA) header -HTML*/ "Lag ny tråd", /* MA New Message -HTML*/ "besvare meldingen", /* MA Reply -HTML*/ - "Sammenfatning over månedlige indeksfiler", /* monthly -HTML*/ - "Sammenfatning over årlige indeksfiler", /* yearly -HTML*/ + "sammenfatning over månedlige indeksfiler", /* monthly -HTML*/ + "sammenfatning over årlige indeksfiler", /* yearly -HTML*/ "Bygger en GDBM-headercache", /* Build a GDBM header cache -STDOUT*/ "GDBM header cache opsjon ikke innebygget", /* GDBM header cache option not build in -STDERR*/ @@ -2193,7 +2313,7 @@ char *no[] = { "Mappeliste", /* MSG_FOLDERS_INDEX -HTML */ "Denne meldingen er fjernet fra arkivet", /* MSG_DELETED -HTML */ "Dette meldingen har utgått", /* MSG_EXPIRED -HTML */ - "(slettet melding)", /* MSG_DEL_SHORT -HTML */ + "slettet melding", /* MSG_DEL_SHORT -HTML */ "Opprinnelig brevtekst", /* MSG_TXT_VERSION -HTML */ "Denne meldingen har blitt filtrert bort", /* MSG_FILTERED_OUT -HTML */ "Forfatter", /* MSG_FROM -HTML*/ @@ -2241,11 +2361,20 @@ char *no[] = { " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Denne meldingen er fjernet fra arkivet", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + 
"(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -2363,8 +2492,8 @@ char *gr[] = { /* Greek */ "Mail åíÝñãåéåò", /* Mail actions (MA) header -HTML*/ "Óôåßëå åíá êáéíïýñéï ìÞíõìá ", /* MA New Message -HTML*/ "ÁðÜíôçóå óå áõôü ôï ìÞíõìá", /* MA Reply -HTML*/ - "Ðåñßëçøç ôùí ìçíéáßùí åõñåôÞñéùí", /* monthly -HTML*/ - "Ðåñßëçøç ôùí åôÞóéùí åõñåôÞñéùí", /* yearly -HTML*/ + "ðåñßëçøç ôùí ìçíéáßùí åõñåôÞñéùí", /* monthly -HTML*/ + "ðåñßëçøç ôùí åôÞóéùí åõñåôÞñéùí", /* yearly -HTML*/ "Build a GDBM header cache", /* Build a GDBM header cache -STDOUT*/ "Äçìéïõñãþ ôï gdbm åõñåôÞñéï... ", /* Creating gdbm index -STDOUT*/ "Äåí ìðïñþ íá äçìéïõñãÞóù ôï gdbm áñ÷åßï... ", /* Can't create gdbm index -STDOUT*/ @@ -2381,7 +2510,7 @@ char *gr[] = { /* Greek */ "Ëßóôá êáôáëüãùí", /* MSG_FOLDERS_INDEX -HTML */ "Áõôü ôï ìÞíõìá Ý÷åé óâçóôåß áðï ôï áñ÷åßï", /* MSG_DELETED -HTML */ "Áõôü ôï ìÞíõìá Ý÷åé ëÞîåé", /* MSG_EXPIRED -HTML */ - "(óâçóìÝíï ìÞíõìá)", /* MSG_DEL_SHORT -HTML */ + "óâçóìÝíï ìÞíõìá", /* MSG_DEL_SHORT -HTML */ "Ðñüôõðï êåßìåíï ôïõ ìçíýìáôïò", /* MSG_TXT_VERSION -HTML */ "Áõôü ôï ìÞíõìá Ý÷åé öéëôñáñéóôåß ", /* MSG_FILTERED_OUT -HTML */ "ÓõããñáöÝáò", /* MSG_FROM -HTML*/ @@ -2427,11 +2556,20 @@ char *gr[] = { /* Greek */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "Áõôü ôï ìÞíõìá Ý÷åé óâçóôåß áðï ôï áñ÷åßï", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + 
"Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -2538,8 +2676,8 @@ char *ru[] = { /* Russian */ "äÅÊÓÔ×ÉÑ Ó ÐÏÞÔÏÊ", /* Mail actions (MA) header -HTML*/ "ðÏÓÌÁÔØ ÎÏ×ÕÀ ÓÔÁÔØÀ", /* MA New Message -HTML*/ "ïÔ×ÅÔÉÔØ ÎÁ ÜÔÏ ÓÏÏÂÝÅÎÉÅ", /* MA Reply -HTML*/ - "óÕÍÍÁÒÎÏ ÚÁ ÍÅÓÑà ÉÎÄÅËÓÎÙÈ ÆÁÊÌÏ×", /* monthly -HTML*/ - "óÕÍÍÁÒÎÏ ÚÁ ÇÏÄ ÉÎÄÅËÓÎÙÈ ÆÁÊÌÏ×", /* yearly -HTML*/ + "óõííáòîï úá íåóñã éîäåëóîùè æáêìï×", /* monthly -HTML*/ + "óõííáòîï úá çïä éîäåëóîùè æáêìï×", /* yearly -HTML*/ "ðÏÓÔÒÏÅÎÉÅ GDBM ÚÁÇÏÌÏ×ËÁ ËÜÛÁ", /* Build a GDBM header cache -STDOUT*/ "ïÐÃÉÑ GDBM header cache ÎÅ ×ÓÔÒÏÅÎÁ",/* GDBM header cache option not build in -STDERR*/ "óÔÒÏÀ gdbm ÉÎÄÅËÓ... 
", /* Creating gdbm index -STDOUT*/ @@ -2557,7 +2695,7 @@ char *ru[] = { /* Russian */ "óÐÉÓÏË ÄÉÒÅËÔÏÒÉÊ", /* MSG_FOLDERS_INDEX -HTML */ "üÔÏ ÓÏÏÂÝÅÎÉÅ ÂÙÌÏ ÕÄÁÌÅÎÏ ÉÈ ÁÒÈÉ×Á", /* MSG_DELETED -HTML */ "üÔÏ ÓÏÏÂÝÅÎÉÅ ÐÒÏÓÒÏÞÅÎÏ", /* MSG_EXPIRED -HTML */ - "(ÕÄÁÌÅÎÎÏÅ ÓÏÏÂÝÅÎÉÅ)", /* MSG_DEL_SHORT -HTML */ + "ÕÄÁÌÅÎÎÏÅ ÓÏÏÂÝÅÎÉÅ", /* MSG_DEL_SHORT -HTML */ "éÓÈÏÄÎÙÊ ÔÅËÓÔ ÄÁÎÎÏÇÏ ÓÏÏÂÝÅÎÉÑ", /* MSG_TXT_VERSION -HTML */ "üÔÏ ÓÏÏÂÝÅÎÉÅ ÂÙÌÏ ÏÔÆÉÌØÔÒÏ×ÁÎÏ", /* MSG_FILTERED_OUT -HTML */ "ïÔ", /* MSG_FROM -HTML*/ @@ -2605,11 +2743,20 @@ char *ru[] = { /* Russian */ " to ", /* to - HTML */ "from", /* from - HTML */ "on", /* on - HTML */ - "message archived in another list or period", /* unknown in reply to - HTML */ + "Message archived in another list or period", /* unknown in reply to - HTML */ "üÔÏ ÓÏÏÂÝÅÎÉÅ ÂÙÌÏ ÕÄÁÌÅÎÏ ÉÈ ÁÒÈÉ×Á", /* MSG_DELETED_OTHER -HTML */ "Note: this message has been edited and differs from the originally archived copy.", /* MSG_EDITED -HTML */ "deleted", /* MSG_SENDER_DELETED -HTML */ "deleted", /* MSG_SUBJECT_DELETED -HTML */ + "Access mail archives by date, thread, author and subject", /* MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION - HTML */ + "Sort by", /* CSort by - HTML*/ + "Nothing received yet!", /* MSG_EMPTY_ARCHIVE - HTML */ + "attached message", /* MSG_ATTACHED_MESSAGE_NOTICE - HTML */ + "Forwarded message", /* MSG_FORWARDED_MESSAGE_NOTICE - HTML */ + "Attachments", /* MSG_ATTACHMENTS_NOTICE */ + "Attachments for message", /* MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE */ + "(no messages are available in this archive)", /* MSG_EMPTY_ARCHIVE_NOTICE */ + "Normal view", /* MSG_CSS_NORMAL_VIEW */ NULL, /* End Of Message Table - NOWHERE*/ }; @@ -2647,3 +2794,5 @@ extern char **lang; extern struct language_entry ltable[]; #endif + +#endif /* _HYPERMAIL_LANG_H */ diff --git a/src/lock.c b/src/lock.c index ad919913..8f6fa14d 100644 --- a/src/lock.c +++ b/src/lock.c @@ -1,3 +1,21 @@ +/* +** Copyright (C) 1997-2023 Hypermail Project +** +** This program and 
library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. +** +** You should have received a copy of the GNU (Library) General Public License +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +*/ + #include "hypermail.h" #include "setup.h" @@ -13,7 +31,7 @@ void lock_archive(char *dir) int count = 0; /* # minutes waited */ i_locked_it = 0; /* guilty until proven innocent */ - snprintf(lockfile, sizeof(lockfile), "%s/%s", dir, LOCKBASE); + trio_snprintf(lockfile, sizeof(lockfile), "%s/%s", dir, LOCKBASE); while ((fp = fopen(lockfile, "r")) != NULL) { fgets(buffer, MAXLINE-1, fp); @@ -36,14 +54,14 @@ void lock_archive(char *dir) fclose(fp); } else if (dir[0]) { - snprintf(errmsg, sizeof(errmsg), "Couldn't create lock file \"%s\".", lockfile); + trio_snprintf(errmsg, sizeof(errmsg), "Couldn't create lock file \"%s\".", lockfile); progerr(errmsg); } } void unlock_archive(void) { - if (lockfile && i_locked_it) + if (*lockfile != '\0' && i_locked_it) remove(lockfile); lockfile[0] = '\0'; } diff --git a/src/mail.c b/src/mail.c deleted file mode 100644 index 5cb16ab8..00000000 --- a/src/mail.c +++ /dev/null @@ -1,36 +0,0 @@ -/* -** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. -** VeriFone Inc./Hewlett-Packard. All Rights Reserved. 
-** Kevin Hughes, kev@kevcom.com 3/11/94 -** Jay Weber, weber@eit.com -** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. -** -** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA -*/ -#include "../libcgi/cgi.h" -#include "../config.h" - -void cgi_main(cgi_info *ci) -{ - /* This program has been disabled because it - * was probably easy for spammers to use as an open relay. It also - * had problems with enabling malicious use of JavaScript and - * CRLF Injection. - */ - printf("\n"); - printf("\n"); - printf("This page has been disabled due to potential abuse by spammers.\n"); - printf("\n\n"); -} diff --git a/src/mem.c b/src/mem.c index 9a687f51..2cfe8447 100644 --- a/src/mem.c +++ b/src/mem.c @@ -3,10 +3,11 @@ ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. ** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 +** Hypermail Project 1998-2023 ** ** This program and library is free software; you can redistribute it and/or ** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 +** as published by the Free Software Foundation; either version 3 ** of the License, or any later version. 
** ** This program is distributed in the hope that it will be useful, diff --git a/src/parse.c b/src/parse.c index aad1740e..18dba039 100644 --- a/src/parse.c +++ b/src/parse.c @@ -1,26 +1,26 @@ -/* +/* ** Copyright (C) 1994, 1995 Enterprise Integration Technologies Corp. ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. ** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 -** -** This program and library is free software; you can redistribute it and/or -** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 -** of the License, or any later version. -** -** This program is distributed in the hope that it will be useful, -** but WITHOUT ANY WARRANTY; without even the implied warranty of -** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -** GNU (Library) General Public License for more details. -** +** Hypermail Project 1998-2023 +** +** This program and library is free software; you can redistribute it and/or +** modify it under the terms of the GNU (Library) General Public License +** as published by the Free Software Foundation; either version 3 +** of the License, or any later version. +** +** This program is distributed in the hope that it will be useful, +** but WITHOUT ANY WARRANTY; without even the implied warranty of +** MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +** GNU (Library) General Public License for more details. 
+** ** You should have received a copy of the GNU (Library) General Public License -** along with this program; if not, write to the Free Software -** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA +** along with this program; if not, write to the Free Software +** Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA */ #include -#include #include "hypermail.h" #include "setup.h" @@ -64,7 +64,7 @@ #include "../lcc/lcc_extras.h" #endif -extern char *mktemp(char *); +#define NEW_PARSER 1 typedef enum { ENCODE_NORMAL, @@ -91,11 +91,11 @@ typedef enum { CONTENT_UNKNOWN /* must be the last one */ } ContentType; -static int hasblack(char *p) -{ - while(p && *p && isspace(*p++)); - return (*p ? TRUE : FALSE); -} +typedef enum { + NO_FILE, + MAKE_FILE, + MADE_FILE +} FileStatus; /* for attachments */ int ignorecontent(char *type) { @@ -214,9 +214,8 @@ int isre(char *re, char **end) endp = re + 3; } else if (!strncasecmp("Re[", re, 3)) { - long level; re += 3; - level = strtol(re, &re, 10); /* eat the number */ + strtol(re, &re, 10); /* eat the number */ if (!strncmp("]:", re, 2)) { /* we have an end "]:" and therefore it qualifies as a Re */ endp = re + 2; @@ -241,7 +240,11 @@ char *findre(char *in, char **end) while (*in) { if (isre(in, end)) return in; - in++; + if (isspace(*in)) { + in++; + } else { + break; + } } return NULL; } @@ -278,9 +281,9 @@ void print_progress(int num, char *msg, char *filename) fputs(bufstr, stdout); /* put out the string */ len = strlen(bufstr); /* get length of new string */ - /* - * If there is a new message then erase - * the trailing info from the enw string + /* + * If there is a new message then erase + * the trailing info from the enw string */ if (msg != NULL) { @@ -305,13 +308,22 @@ char *safe_filename(char *name) register char *np; np = name; + + if (!np || *np == '\0') { + return NULL; + } + + /* skip leading spaces in the filename */ while (*np && (*np == ' ' || *np == '\t')) np++; - 
if (!*np) - return (NULL); + if (!*np || *np == '\n' || *np == '\r') { + /* filename is made of only spaces; replace them with + REPLACEMENT_CHAR */ + np = name; + } - for (sp = name, np = name; *np && *np != '\n';) { + for (sp = name, np = name; *np && *np != '\n' && *np != '\r';) { /* if valid character then store it */ if (((*np >= 'a' && *np <= 'z') || (*np >= '0' && *np <= '9') || (*np >= 'A' && *np <= 'Z') || (*np == '-') || (*np == '.') || @@ -346,7 +358,9 @@ create_attachname(char *attachname, int max_len) strncpy(suffix, attachname + i, sizeof(suffix) - 1); else suffix[0] = 0; - strncpy(attachname, set_filename_base, max_len); + strncpy(attachname, set_filename_base, max_len - 1); + /* make sure it is a NUL-terminated string */ + attachname[max_len - 1] = '\0'; strncat(attachname, suffix, max_len - strlen(attachname) - 1); safe_filename(attachname); } @@ -376,7 +390,7 @@ void crossindex(void) &maybereply); if (status != -1) { struct emailinfo *email2; - + if (!hashnumlookup(status, &email2)) { ++num; continue; @@ -390,7 +404,7 @@ void crossindex(void) ++num; continue; } - + if (set_linkquotes) { struct reply *rp; int found_num = 0; @@ -406,7 +420,7 @@ void crossindex(void) #else replylist = addreply(replylist, status, email, maybereply, &replylist_end); -#endif +#endif } else { #ifdef FASTREPLYCODE @@ -415,7 +429,7 @@ void crossindex(void) #else replylist = addreply(replylist, status, email, maybereply, &replylist_end); -#endif +#endif } } num++; @@ -437,7 +451,7 @@ void crossindex(void) #endif } -/* +/* ** Recursively checks for replies to replies to a message, etc. ** Replies are added to the thread list. 
*/ @@ -448,8 +462,7 @@ void crossindexthread2(int num) struct reply *rp; struct emailinfo *ep; if(!hashnumlookup(num, &ep)) { - char errmsg[512]; - snprintf(errmsg, sizeof(errmsg), + trio_snprintf(errmsg, sizeof(errmsg), "internal error crossindexthread2 %d", num); progerr(errmsg); } @@ -460,7 +473,15 @@ void crossindexthread2(int num) if (0) fprintf(stderr, "add thread.b %d %d %d\n", num, rp->data->msgnum, rp->msgnum); threadlist = addreply(threadlist, num, rp->data, 0, &threadlist_end); - printedlist = markasprinted(printedthreadlist, rp->msgnum); +#ifdef FIX_OR_DELETE_ME + /* JK: 2023-05-17: this seems to have been a longtime typo, + it produces memory leaks and didn't have any use in the + thread code. Tentatively correcting it to printthreadlist + and checking for side effects */ + printedlist = markasprinted(printedthreadlist, rp->msgnum); +#else + printedthreadlist = markasprinted(printedthreadlist, rp->msgnum); +#endif crossindexthread2(rp->msgnum); } } @@ -475,7 +496,14 @@ void crossindexthread2(int num) rp->data->flags |= USED_THREAD; threadlist = addreply(threadlist, num, rp->data, 0, &threadlist_end); +#ifdef FIX_OR_DELETE_ME + /* JK: 2023-05-17: this seems to have been a longtime typo, + it produces memory leaks and didn't have any use in the + thread code. 
Tentatively correcting it to printthreadlist + and checking for side effects */ printedlist = markasprinted(printedthreadlist, rp->msgnum); +#endif + printedthreadlist = markasprinted(printedthreadlist, rp->msgnum); crossindexthread2(rp->msgnum); } } @@ -543,14 +571,16 @@ char *getmaildate(char *line) INIT_PUSH(buff); c = strchr(line, ':'); - if ((*(c + 1) && *(c + 1) == '\n') || (*(c + 2) && *(c + 2) == '\n')) { + if (!*(c + 1) + || ((*(c + 1) == '\n') + || (*(c + 1) == '\r'))) { PushString(&buff, NODATE); RETURN_PUSH(buff); } c += 2; while (*c == ' ' || *c == '\t') c++; - for (i = 0, len = DATESTRLEN - 1; *c && *c != '\n' && i < len; c++) + for (i = 0, len = DATESTRLEN - 1; *c && *c != '\n' && *c != '\r' && i < len; c++) PushByte(&buff, *c); RETURN_PUSH(buff); @@ -572,7 +602,7 @@ char *getfromdate(char *line) if (days[i] == NULL) tmpdate[0] = '\0'; else { - for (i = 0, len = DATESTRLEN - 1; *c && *c != '\n' && i < len; c++) + for (i = 0, len = DATESTRLEN - 1; *c && *c != '\n' && *c != '\r' && i < len; c++) tmpdate[i++] = *c; tmpdate[i] = '\0'; @@ -581,7 +611,7 @@ char *getfromdate(char *line) } -/* +/* ** Grabs the message ID, like <...> from the Message-ID: header. */ @@ -595,9 +625,9 @@ char *getid(char *line) INIT_PUSH(buff); if (strrchr(line, '<') == NULL) { - /* + /* * bozo alert! - * msg-id = "<" addr-spec ">" + * msg-id = "<" addr-spec ">" * try to recover as best we can */ c = strchr(line, ':') + 1; /* we know this exists! 
*/ @@ -609,7 +639,7 @@ char *getid(char *line) else c = strrchr(line, '<') + 1; - for (i = 0; *c && *c != '>' && *c != '\n'; c++) { + for (i = 0; *c && *c != '>' && *c != '\n' && *c != '\r'; c++) { if (*c == '\\') continue; PushByte(&buff, *c); @@ -662,7 +692,7 @@ char *getsubject(char *line) startp = c; - for (i = len = 0; c && *c && (*c != '\n'); c++) { + for (i = len = 0; c && *c && (*c != '\n') && (*c != '\r'); c++) { i++; /* keep track of the max length without trailing white spaces: */ if (!isspace(*c)) @@ -670,7 +700,7 @@ char *getsubject(char *line) } if (isre(startp, &postre)) { - if (!*postre || (*postre == '\n')) + if (!*postre || (*postre == '\n') || (*postre == '\r')) len = 0; } @@ -687,7 +717,7 @@ char *getsubject(char *line) /* ** Grabs the annotation values given in the annotation user-defined header -** +** ** annotation_content is set to the value of the content annotation ** annotation_robot is set to the values of the robot annotations ** Returns TRUE if an annotation was found, FALSE otherwise. @@ -701,7 +731,7 @@ getannotation(char *line, annotation_content_t *annotation_content, *annotation_content = ANNOTATION_CONTENT_NONE;; *annotation_robot = ANNOTATION_ROBOT_NONE; - + c = strchr(line, ':'); if (!c) return FALSE; @@ -713,9 +743,9 @@ getannotation(char *line, annotation_content_t *annotation_content, while (isspace(*c)) c++; - + startp = c; - while (!isspace (*c) && *c != '\n' && *c != ',') { + while (!isspace (*c) && *c != '\n' && *c != '\r' && *c != ',') { c++; } @@ -741,7 +771,7 @@ getannotation(char *line, annotation_content_t *annotation_content, } /* only return true if at least a valid annotation was found */ - return (*annotation_content != ANNOTATION_CONTENT_NONE + return (*annotation_content != ANNOTATION_CONTENT_NONE || *annotation_robot != ANNOTATION_ROBOT_NONE); } @@ -749,15 +779,15 @@ getannotation(char *line, annotation_content_t *annotation_content, ** Grabs the message ID, or date, from the In-reply-to: header. 
** ** Maybe I'm confused but.... -** What either ? Should it not be consistent and choose to return -** one (the msgid) as the default and fall back to date when a +** What either ? Should it not be consistent and choose to return +** one (the msgid) as the default and fall back to date when a ** msgid cannot be found ? ** ** Who knows what other formats are out there... ** ** In-Reply-To: <1DD9B854E27@everett.pitt.cc.nc.us> ** In-Reply-To: <199709181645.MAA02097@mail.clark.net> from "Marcus J. Ranum" at Sep 18, 97 12:41:40 pm -** In-Reply-To: <199709181645.MAA02097@mail.clark.net> from +** In-Reply-To: <199709181645.MAA02097@mail.clark.net> from ** In-Reply-To: "L. Detweiler"'s message of Fri, 04 Feb 94 22:51:22 -0700 <199402050551.WAA16189@longs.lance.colostate.edu> ** ** The message id should always be returned for threading purposes. Mixing @@ -776,42 +806,42 @@ char *getreply(char *line) /* Check for blank line */ - /* - * Check for line with " from " and " at ". Format of the line is + /* + * Check for line with " from " and " at ". Format of the line is * from "quoted user name" at date-string */ if (strstr(line, " from ") != NULL) { if ((strstr(line, " at ")) != NULL) { if ((m = strchr(line, '<')) != NULL) { - for (m++; *m && *m != '>' && *m != '\n'; m++) { + for (m++; *m && *m != '>' && *m != '\n' && *m != '\r'; m++) { PushByte(&buff, *m); } RETURN_PUSH(buff); } } - /* + /* * If no 'at' the line may be a continued line or a truncated line. * Both will be picked up later. */ } - /* - * Check for line with " message of ". Format of the line is + /* + * Check for line with " message of ". Format of the line is * "quoted user name"'s message of date-string */ if ((c = strstr(line, "message of ")) != NULL) { /* - * Check to see if there is a message ID on the line. + * Check to see if there is a message ID on the line. 
* If not this is a continued line and when you add a readline() * function that concatenates continuation lines collapsing * white space, you might want to revisit this... */ if ((m = strchr(line, '<')) != NULL) { - for (m++; *m && *m != '>' && *m != '\n'; m++) { + for (m++; *m && *m != '>' && *m != '\n' && *m != '\r'; m++) { PushByte(&buff, *m); } RETURN_PUSH(buff); @@ -824,7 +854,7 @@ char *getreply(char *line) if (*c == '"') c++; - for (; *c && *c != '.' && *c != '\n'; c++) { + for (; *c && *c != '.' && *c != '\n' && *c != '\r'; c++) { PushByte(&buff, *c); } RETURN_PUSH(buff); @@ -832,7 +862,7 @@ char *getreply(char *line) if ((c = strstr(line, "dated: ")) != NULL) { c += 7; - for (; *c && *c != '.' && *c != '\n'; c++) { + for (; *c && *c != '.' && *c != '\n' && *c != '\r'; c++) { PushByte(&buff, *c); } RETURN_PUSH(buff); @@ -840,7 +870,7 @@ char *getreply(char *line) if ((c = strstr(line, "dated ")) != NULL) { c += 6; - for (; *c && *c != '.' && *c != '\n'; c++) { + for (; *c && *c != '.' && *c != '\n' && *c != '\r'; c++) { PushByte(&buff, *c); } RETURN_PUSH(buff); @@ -849,7 +879,7 @@ char *getreply(char *line) if ((c = strchr(line, '<')) != NULL) { c++; - for (; *c && *c != '>' && *c != '\n'; c++) { + for (; *c && *c != '>' && *c != '\n' && *c != '\r'; c++) { if (*c == '\\') continue; PushByte(&buff, *c); @@ -862,7 +892,7 @@ char *getreply(char *line) if (*c == '\"') c++; - for (; *c && *c != '.' && *c != '\n' && *c != 'f'; c++) { + for (; *c && *c != '.' && *c != '\n' && *c != '\r' && *c != 'f'; c++) { PushByte(&buff, *c); } RETURN_PUSH(buff); @@ -910,7 +940,7 @@ extract_rfc2047_content(char *iptr) ** ** Should result in "I'm called Daniel" too. 
** -** Returns the newly allcated string, or the previous if nothing changed +** Returns the newly allcated string, or the previous if nothing changed */ static char *mdecodeRFC2047(char *string, int length, char *charsetsave) @@ -924,13 +954,12 @@ static char *mdecodeRFC2047(char *string, int length, char *charsetsave) char charset[129]; char encoding[33]; char dummy[129]; - char *ptr, *endptr; - char *old_output; + char *endptr; #ifdef NOTUSED char equal; #endif - int value; + unsigned int value; char didanything = FALSE; @@ -959,14 +988,15 @@ static char *mdecodeRFC2047(char *string, int length, char *charsetsave) if (!strcasecmp("q", encoding)) { /* quoted printable decoding */ - endptr = ptr + strlen(ptr); +#ifdef HAVE_ICONV + char *orig2,*output2,*output3; + size_t len, charsetlen; +#endif + endptr = ptr + strlen(ptr); #ifdef HAVE_ICONV - char *orig2,*output2,*output3; - size_t len, charsetlen; - orig2=output2=malloc(strlen(string)+1); - memset(output2,0,strlen(string)+1); - old_output=output; + orig2=output2=malloc(strlen(string)+1); + memset(output2,0,strlen(string)+1); for (; ptr < endptr; ptr++) { switch (*ptr) { @@ -1011,12 +1041,12 @@ static char *mdecodeRFC2047(char *string, int length, char *charsetsave) } else if (!strcasecmp("b", encoding)) { /* base64 decoding */ - int len; - size_t charsetlen; #ifdef HAVE_ICONV + size_t charsetlen; size_t tmplen; char *output2; - base64Decode(ptr, output, &len); + + base64_decode_string(ptr, output); output2=i18n_convstring(output,charset,"UTF-8",&tmplen); memcpy(output,output2,tmplen); output += tmplen; @@ -1025,7 +1055,9 @@ static char *mdecodeRFC2047(char *string, int length, char *charsetsave) memcpy(charsetsave,charset,charsetlen); charsetsave[charsetlen] = '\0'; #else - base64Decode(ptr, output, &len); + int len; + + len = base64_decode_string(ptr, output); output += len; #endif } @@ -1083,30 +1115,115 @@ static char *mdecodeRFC2047(char *string, int length, char *charsetsave) puts(""); } #endif + /* here we 
should add calls to validate the utf8 string, + to avoid security issues */ return storage; /* return new */ } else { free(storage); -#ifdef HAVE_ICONV - /* make sure there are only ascii chars in the string - ** for messages that don't respect rfc2047 */ - i18n_replace_non_ascii_chars(string); -#endif + + if ( i18n_is_valid_us_ascii(string) ) { + /* nothing to do, passing thru */ + } + + /* RFC6532 allows for using UTF-8 as a header value; we make + sure that it is valid UTF-8 */ + else if ( i18n_is_valid_utf8(string) ) { + /* "default" UTF-8 charset */ + strcpy(charsetsave, "UTF-8"); + + } else { + + /* + * try to detect the charset of the string and convert it to UTF-8; + * in case of failure, replace the header value with "(invalid string)" + */ + +#if defined HAVE_CHARDET && HAVE_ICONV + char *charset; + char *conv_string; + char header_name[129]; + char *header_value; + struct Push pbuf; + + INIT_PUSH(pbuf); + + sscanf(string, "%127[^:]", header_name); + + /* save the header_name:\s */ + PushString(&pbuf, header_name); + + header_value = string + strlen(header_name); + PushByte(&pbuf, *header_value); + header_value++; + PushByte(&pbuf, *header_value); + header_value++; + + /* consider the header_value everything after header_name:\s */ + charset = i18n_charset_detect(header_value); + + if (!charset || charset[0] == '\0' && !strcmp(charset, "UTF-8") ) { + PushString (&pbuf, "(invalid string)"); + } + else { + size_t conv_string_sz; + conv_string = i18n_convstring(header_value, charset, "UTF-8", &conv_string_sz); + if ( !i18n_is_valid_utf8(conv_string) ) { + free(conv_string); + PushString (&pbuf, "(invalid string)"); + } else { + int charsetlen = strlen(charset) < 255 ? 
strlen(charset) : 255; + memcpy(charsetsave,charset,charsetlen); + charsetsave[charsetlen] = '\0'; + PushString(&pbuf, conv_string); + free(conv_string); + } + free(charset); + } + + free(string); + string = PUSH_STRING(pbuf); +#else + free(string); + string = strsav("(invalid string)"); +#endif + } + return string; } } -/* +/* ** RFC 3676 format=flowed parsing routines */ -/* get_quote_level returns the number of quotes in a line, +/* +** returns true if a string line is a signature start +** rfc3676 gives "-- \n" and "-- \r\n" as signatures. +** We also add "--\n" to this list, as mutt allows it +*/ +static int is_sig_separator (const char *line) +{ + bool rv; + + if (!strcmp (line, "-- \n") + || !strcmp (line, "-- \r\n") + || !strcmp (line, "--\n")) { + rv = TRUE; + } else { + rv = FALSE; + } + + return rv; +} + +/* get_quote_level returns the number of quotes in a line, following the RFC 3676 section 4.5 criteria. */ static int get_quotelevel (const char *line) { int quoted = 0; - char *p = (char *) line; + const char *p = line; while (p && *p == '>') { @@ -1135,7 +1252,7 @@ static int get_quotelevel (const char *line) ** The function returns true if the current line is flowed. ** */ -static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, +static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, bool *continue_prev_flow_flag) { int new_quotelevel = 0; @@ -1156,8 +1273,8 @@ static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, rules for quotes: 1. quoted lines always begin with a '>' char. This symbol may depend on the - msg charset. - 2. They are not ss before the quote symbol but may be after it + msg charset. + 2. They are not ss before the quote symbol but may be after it appears. 
rules for seeing if a line should be flowed with the next one: @@ -1175,7 +1292,7 @@ static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, printf("RFC3676: Previous line flow flag: %d\n", *continue_prev_flow_flag); #endif - /* + /* ** hard crlf detection. */ if (rfc3676_ishardlb(line)) { @@ -1187,7 +1304,7 @@ static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, #endif return FALSE; } - + /* ** quote level detection */ @@ -1203,21 +1320,21 @@ static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, #if DEBUG_PARSE printf("RFC3676: different quote levels detected. Stopping ff\n"); -#endif +#endif } tmp_padding = new_quotelevel; - /* - ** skip space stuffing if any + /* + ** skip space stuffing if any */ if (line[tmp_padding] == ' ') { tmp_padding++; #if DEBUG_PARSE printf("RFC3676: space-stuffing detected; skipping space\n"); -#endif +#endif } - /* + /* ** hard crlf detection after quotes */ if (rfc3676_ishardlb(line+tmp_padding)) { @@ -1233,13 +1350,9 @@ static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, /* ** signature detection */ - - /* Is it a signature separator? */ - /* rfc3676 gives "-- \n" and "--\r\n" as signatures. We also add "--\n" to this list, - as mutt allows it */ - if (!strcmp (line + tmp_padding, "-- \n") - || !strcmp (line + tmp_padding, "-- \r\n") - || !strcmp (line + tmp_padding, "--\n")) { + + /* Is it an RFC3676 signature separator? 
*/ + if (is_sig_separator (line + tmp_padding)) { /* yes, stop f=f */ *continue_prev_flow_flag = FALSE; sig_sep = TRUE; @@ -1276,12 +1389,12 @@ static bool rfc3676_handler (char *line, bool delsp_flag, int *quotelevel, } } } - + /* ** update flags */ *quotelevel = new_quotelevel; - + #if DEBUG_PARSE if (*continue_prev_flow_flag) printf("RFC3676: Continuing previous flow\n"); @@ -1329,7 +1442,7 @@ static char * mdecodeQP(FILE *file, char *input, char **result, int *length, input++; if ('=' == inchar) { - int value; + unsigned int value; if ('\n' == *input) { if (!fgets(i_buffer, MAXLINE, file)) break; @@ -1462,7 +1575,7 @@ static int do_uudecode(FILE *fp, char *line, char *line_buf, if (uudecode(fp, line, line, NULL, &pbuf)) /* - * oh gee, we failed this is chaos + * oh gee, we failed this is chaos */ return 0; p2 = PUSH_STRING(pbuf); @@ -1490,7 +1603,7 @@ static void write_txt_file(struct emailinfo *emp, struct Push *raw_text_buf) sprintf(tmp_buf, "%.4d", emp->msgnum); txt_filename = htmlfilename(tmp_buf, emp, set_txtsuffix); if ((!emp->is_deleted - || ((emp->is_deleted & (FILTERED_DELETE | FILTERED_OLD | FILTERED_NEW + || ((emp->is_deleted & (FILTERED_DELETE | FILTERED_OLD | FILTERED_NEW | FILTERED_DELETE_OTHER)) && set_delete_level > 2) || (emp->is_deleted == FILTERED_EXPIRE && set_delete_level == 2)) @@ -1505,6 +1618,256 @@ static void write_txt_file(struct emailinfo *emp, struct Push *raw_text_buf) INIT_PUSH(*raw_text_buf); } +/* +** returns the value for a message_node skip value field +** following some heuristics +*/ +static message_node_skip_t message_node_skip_status(FileStatus file_created, + ContentType content, + char *content_type) +{ + message_node_skip_t rv; + + if (content == CONTENT_IGNORE) { + rv = MN_SKIP_ALL; + /* we want to skip adding a section when root is multipart/foo + but we'll handle that elsewhere */ + + } + + else if (!strncasecmp(content_type, "multipart/", 10) + && content == CONTENT_BINARY && file_created == NO_FILE) { + rv = 
MN_SKIP_BUT_KEEP_CHILDREN; + } + + else if (content == CONTENT_BINARY || content == CONTENT_UNKNOWN) { + rv = MN_SKIP_STORED_ATTACHMENT; + } + + else { + rv = MN_KEEP; + } + + return rv; +} + +/* +** _single_content_get_charset +** +** for single (not multipart/) messages, returns +** the best charset; if none available returns +** set_default_charset +** +** caller must free the returned string +*/ +static char *_single_content_get_charset(char *charset, char *charsetsave) +{ + char *rv; + char *s; + + s = choose_charset(charset, charsetsave); + if (!s || *s == '\0') { + rv = set_default_charset; + } else { + rv = s; + } + + return strsav(rv); +} + +/* +** returns TRUE if line is just a stand-alone +** "--" or "-- " +*/ +static bool _is_signature_separator(const char *line) +{ + bool rv; + int l = strlen(line); + + if (!strncmp(line, "--", 2) + && ((l == 2 && line[2] =='\0') + || (l > 2 + && (line[2] == ' ' || line[2] == '\r' || line[2] == '\n')))) { + rv = TRUE; + } else { + rv= FALSE; + } + + return rv; +} + +/* +** Some old versions of Thunderbird, Pine, and other UAs +** URL-escaped the <> in the In-Reply-To and first +** References header values. +** This function normalizes them by unescaping those +** characters. +** +** If any unescaping takes place, returns a new string +** that the caller must free. +** +** If no unescaping happened, returns NULL. 
+** +*/ +static char * _unescape_reply_and_reference_values(char *line) +{ + char *ptr_lower_than; + char *ptr_greater_than; + char *c; + struct Push buff; + + if (!line || !*line || *line == '\n' || *line == '\r') { + return NULL; + } + + ptr_lower_than = strstr(line, " %3C"); + ptr_greater_than = strstr(line, "%3E"); + + /* we only do the replacement if we found both <> */ + if (!ptr_lower_than || !ptr_greater_than) { + return NULL; + } + + /* verify that we have a contiguous string between both + * characters */ + for (c = ptr_lower_than + sizeof(char) * 1; c < ptr_greater_than; c++) { + if (isspace(*c) || *c == '\r' || *c == '\n') + return NULL; + } + + /* verify what the char immediately after the ptr_greater_than to make + sure it's a separator or EOL */ + c = ptr_greater_than + sizeof(char) * 3; + if (!isspace(*c) && *c != '\n' && *c != '\r') { + return NULL; + } + + INIT_PUSH(buff); + + PushNString(&buff, line, ptr_lower_than - line + 1); + PushByte(&buff, '<'); + PushNString(&buff, ptr_lower_than + sizeof(char) * 4, ptr_greater_than - ptr_lower_than - sizeof(char) *4); + PushString(&buff, ">"); + PushString(&buff, ptr_greater_than + sizeof(char) * 3); + + RETURN_PUSH(buff); +} + +/* +** parses a filename in either a Content-Disposition or Content-Description +** line. +** +** np must be pointing at the first character after the attribute and equal +** sign, i.e., filename= or name=, respectively. +** attachname is a preallocated string of size attachname_size +** the function copies the filename, if found, to attachname and calls +** safe_filename to make sure it's a valid O.S. name. 
+*/ +static void _extract_attachname(char *np, char *attachname, size_t attachname_size) +{ + char *jp; + + /* some UA may have done line folding between filename= and the "foo" attribute value; + if this is the case, we skip all spaces until we find the first non-space char */ + jp = np; + while (*jp && isspace(*jp)) { + jp++; + } + + /* if we find a non space character, update np to the new position; + otherwise we ignore jp and just use np as it was as + we'll handle the only spaces case further down */ + if (*jp && *jp != '\n' && *jp != '\r' && *jp != ';') { + np = jp; + } + + /* skip the first quote */ + if (*np == '"') + np++; + + /* stop at the end of the string as well as at EOL, quote or ';' */ + for (jp = attachname; np && *np && *np != '\n' && *np != '\r' + && *np != '"' && *np != ';' + && jp < attachname + attachname_size - 1;) { + *jp++ = *np++; + } + *jp = '\0'; + safe_filename(attachname); +} + +/* +** if the attachname that is given is empty, searches the Content-Type: +** header value for a name attribute and, if found, copies it to +** attachname. If the Content-Type: header value doesn't have a name +** attribute either, it clears the attachname. 
+*/ +static void _control_attachname(char *content_type, char *attachname, size_t attachname_size) +{ + /* only use the Content-Type name attribute to get + the filename if Content-Disposition didn't + provide a filename */ + char *fname; + + if (*attachname == '\0') { + fname = strcasestr(content_type, "name="); + if (fname) { + fname += 5; + _extract_attachname(fname, attachname, attachname_size); +#ifdef FACTORIZE_ATTACHNAME + if ('\"' == *fname) + fname++; + sscanf(fname, "%128[^\"]", attachname); + safe_filename(attachname); +#endif /* FACTORIZE_ATTACHNAME */ + } + else { + attachname[0] = '\0'; /* just clear it */ + } + } +} + +/* validates that a header name is RFC 2822 compliant + returns TRUE if valid, FALSE otherwise +*/ +static bool _validate_header(const char *header_line) +{ + char header_name[129]; + const char *ptr; + + /* check that we have a header_name: header_value */ + if (!header_line + || *header_line=='\0' + || !(ptr = strstr(header_line, ":")) + || ptr == header_line + || *(ptr + 1) == '\0' + || (*(ptr + 1) != ' ' && *(ptr + 1) != '\t')) { + + return FALSE; + } + + /* check the length of the header name and its requirement + to be only valid printable US-ASCII */ + + if (!sscanf(header_line, "%127[^:]", header_name) + /* header name was not truncated and is followed by : */ + || header_line[strlen(header_name)] != ':' + /* header name is us_ascii */ + || !i18n_is_valid_us_ascii(header_name)) { + + return FALSE; + } + + /* check that we have a value that is not only spaces */ + ptr = header_line + strlen(header_name) + 1; + while (*ptr) { + if (*ptr != ' ' && *ptr != '\t' && *ptr != '\r' && *ptr != '\n') { + return TRUE; + } + ptr++; + } + + return FALSE; +} + /* ** Parsing...the heart of Hypermail! 
** This loads in the articles from stdin or a mailbox, adding the right @@ -1529,7 +1892,12 @@ int parsemail(char *mbox, /* file name */ char *inreply = NULL; char *namep = NULL; char *emailp = NULL; - char *line = NULL; + char message_headers_parsed = FALSE; /* we use this flag to avoid + having message/rfc822 + headers clobber the + encapsulating message + headers */ + char *line = NULL; char line_buf[MAXLINE], fromdate[DATESTRLEN] = ""; char *cp; char *dp = NULL; @@ -1552,34 +1920,38 @@ int parsemail(char *mbox, /* file name */ char *att_dir = NULL; /* directory name to store attachments in */ char *meta_dir = NULL; /* directory name where we're storing the meta data that describes the attachments */ - typedef enum { - NO_FILE, - MAKE_FILE, - MADE_FILE - } FileStatus; /* for attachments */ - /* -- variables for the multipart/alternative parser -- */ struct body *origbp = NULL; /* store the original bp */ struct body *origlp = NULL; /* ... and the original lp */ char alternativeparser = FALSE; /* set when inside alternative parser mode */ int alternative_weight = -1; /* the current weight of the prefered alternative content */ - char *prefered_content_charset = NULL; /* the current charset of the alternative */ + char *prefered_charset = NULL; /* the charset for a message as chosen by heuristics */ struct body *alternative_lp = NULL; /* the previous alternative lp */ struct body *alternative_bp = NULL; /* the previous alternative bp */ struct body *append_bp = NULL; /* text to append to body after parse done*/ struct body *append_lp = NULL; + FileStatus alternative_lastfile_created = NO_FILE; /* previous alternative attachments, for non-inline MIME types */ - char alternative_file[129]; /* file name where we store the non-inline alternatives */ - char alternative_lastfile[129]; /* last file name where we store the non-inline alternatives */ - char last_alternative_type[129]; /* the alternative Content-Type value */ + char alternative_file[131]; /* file name where 
we store the non-inline alternatives */ + char alternative_lastfile[131]; /* last file name where we store the non-inline alternatives */ + char last_alternative_type[131]; /* the alternative Content-Type value */ int att_counter = 0; /* used to generate a unique name for attachments */ + int parse_multipart_alternative_force_save_alts = 0; /* used to control if we are parsing alternative as multipart */ - int old_set_save_alts = -1; /* used to store the set_save_alts when overriding it for apple mail */ - int applemail_ua_header_len = (set_applemail_mimehack) ? strlen (set_applemail_ua_header) : 0; /* code optimization to avoid computing it each time */ - /* - ** keeps track of attachment file name used so far for this message + + /* used to store the set_save_alts when overriding it for apple mail */ + int applemail_old_set_save_alts = -1; + /* code optimization to avoid computing it each time */ + int applemail_ua_header_len = (set_applemail_mimehack) ? strlen (set_applemail_ua_header) : 0; + /* we make a local copy of this config variable because the apple mail + hack will alter it and we may need to fall back to the original value + while processing a complex multipart/ message/rfc822 message */ + int local_set_save_alts = set_save_alts; + + /* + ** keeps track of attachment file name used so far for this message */ - struct hmlist *att_name_list = NULL; + struct hmlist *att_name_list = NULL; struct hmlist *att_name_last = NULL; /* -- end of alternative parser variables -- */ @@ -1591,37 +1963,40 @@ int parsemail(char *mbox, /* file name */ struct body *headp = NULL; /* stored pointer to the point where we last scanned the headers of this mail. 
*/ - struct body *content_type_p = NULL; /* pointer to the Content-Type header */ - + char Mime_B = FALSE; char boundbuffer[256] = ""; - struct boundary *boundp = NULL; /* This variable is used to store a stack - of boundary separators in cases with mimed - mails inside mimed mails */ + /* This variable is used to store a stack of boundary separators + when having multipart body parts embeeded inside other + multipart body parts */ + struct boundary_stack *boundp = NULL; - struct boundary *multipartp = NULL; /* This variable is used to store a stack of - mimetypes when dealing with multipart mails */ + /* This variable is used to store a stack of mime types when + dealing with multipart mails */ + struct hm_stack *multipartp = NULL; - struct charset_stack *charsetsp = NULL; /* This variable is used - to store a stack of - charset/charset_save - values when dealing - with multipart mails */ + struct message_node *root_message_node = NULL; /* points to the first node of a message */ + struct message_node *current_message_node = NULL; + struct message_node *root_alt_message_node = NULL; /* for temporarily storing alternatives */ + struct message_node *current_alt_message_node = NULL; + char alternative_message_node_created = FALSE; /* true if we have created a node used to + store multipart/alternative while selecting + the prefered one */ bool skip_mime_epilogue = FALSE; /* This variable is used to help skip multipart/foo epilogues */ - - char multilinenoend = FALSE; /* This variable is set TRUE if we have read - a partial line off a multiline-encoded line, + + char multilinenoend = FALSE; /* This variable is set TRUE if we have read + a partial line off a multiline-encoded line, and the next line we read is supposed to get appended to the previous one */ - int bodyflags = 0; /* This variable is set to extra flags that the + int bodyflags = 0; /* This variable is set to extra flags that the addbody() calls should OR in the flag parameter */ /* RFC 3676 related 
variables, set while parsing the headers and body content */ - textplain_format_t textplain_format = FORMAT_FIXED; + textplain_format_t textplain_format = FORMAT_FIXED; bool flowed_line = FALSE; int quotelevel = 0; bool continue_previous_flow_flag = FALSE; @@ -1629,55 +2004,63 @@ int parsemail(char *mbox, /* file name */ int binfile = -1; - char *charset = NULL; /* this is the LOCAL charset used in the mail */ + char *charset = NULL; /* this is the charset declared in the Content-Type header */ char *charsetsave; /* charset in MIME encoded text */ char *boundary_id = NULL; char type[129]; /* for Content-Type type */ + char *content_type_ptr; /* pointing to the Content-Type parsed line */ + bool attachment_rfc822; /* set to TRUE if the current attachment type is + message/rfc822 */ + char charbuffer[129]; /* for Content-Type charset */ FileStatus file_created = NO_FILE; /* for attachments */ char attachname[129]; /* for attachment file names */ + char *att_binname = NULL; /* full path + filename pointing to a stored attachment */ + char *meta_filename = NULL; /* full path + filename to metadata associated with + a stored attachment */ + char *att_link = NULL; /* for a stored attachment HTML link */ + char *att_comment_filename = NULL; /* for the HTML comment that is inserted after att_link */ char inline_force = FALSE; /* show a attachment in-line, regardles of the content_disposition */ char *description = NULL; /* user-supplied description for an attachment */ - /* @@@ test for attachment */ char attach_force; - /* @@@ */ - + struct base64_decoder_state *b64_decoder_state = NULL; /* multi-line base64 decoding */ + EncodeType decode = ENCODE_NORMAL; ContentType content = CONTENT_TEXT; charsetsave=malloc(256); memset(charsetsave,0,255); - - + *directory = 0; + *filename = 0; + *pathname = 0; + *attachname = '\0'; + if (use_stdin || !mbox || !strcasecmp(mbox, "NONE")) fp = stdin; else if ((fp = fopen(mbox, "rb")) == NULL) { - snprintf(errmsg, sizeof(errmsg), "%s 
\"%s\".", + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_CANNOT_OPEN_MAIL_ARCHIVE], mbox); progerr(errmsg); } if(set_append) { - + /* add to an mbox as we read */ - *directory = 0; - *filename = 0; - *pathname = 0; if (set_append_filename) { time_t curtime; const struct tm *local_curtime; - + time(&curtime); local_curtime = localtime(&curtime); - + if(strncmp(set_append_filename, "$DIR/", 5) == 0) { strncpy(directory, dir, MAXFILELEN - 1); - strftime(filename, MAXFILELEN - 1, set_append_filename+5, + strftime(filename, MAXFILELEN - 1, set_append_filename+5, local_curtime); } else { - strftime(filename, MAXFILELEN - 1, set_append_filename, + strftime(filename, MAXFILELEN - 1, set_append_filename, local_curtime); } } else { @@ -1694,6 +2077,9 @@ int parsemail(char *mbox, /* file name */ lang[MSG_CANNOT_OPEN_MAIL_ARCHIVE], pathname); progerr(errmsg); } + *directory = 0; + *filename = 0; + *pathname = 0; } num = startnum; @@ -1707,9 +2093,12 @@ int parsemail(char *mbox, /* file name */ msgid = NULL; bp = NULL; subject = NOSUBJECT; + message_headers_parsed = FALSE; parse_multipart_alternative_force_save_alts = 0; - old_set_save_alts = -1; + attachment_rfc822 = FALSE; + applemail_old_set_save_alts = -1; + local_set_save_alts = set_save_alts; require_filter_len = require_filter_full_len = 0; for (tlist = set_filter_require; tlist != NULL; require_filter_len++, tlist = tlist->next) @@ -1744,7 +2133,7 @@ int parsemail(char *mbox, /* file name */ } } - for ( ; fgets(line_buf, MAXLINE, fp) != NULL; + for ( ; fgets(line_buf, MAXLINE, fp) != NULL; set_txtsuffix ? PushString(&raw_text_buf, line_buf) : 0) { #if DEBUG_PARSE fprintf(stderr,"\n^IN: %s", line_buf); @@ -1756,8 +2145,8 @@ int parsemail(char *mbox, /* file name */ "alternative_lp", (alternative_lp) ? alternative_lp->line : "", "origbp", (origbp) ? origbp->line : "", "origlp", (origlp) ? origlp->line : "", - "headp", (headp) ? headp->line : ""); -#endif + "headp", (headp) ? 
headp->line : ""); +#endif if(set_append) { if(fputs(line_buf, fpo) < 0) { progerr("Can't write to \"mbox\""); /* revisit me */ @@ -1765,14 +2154,20 @@ int parsemail(char *mbox, /* file name */ } line = line_buf + set_ietf_mbox; + /* skip the mime epilogue until we find a known boundary or + a new message */ if (skip_mime_epilogue) { - if (line[0] == '\n') { + int l = strlen(line); + if ((strncmp(line, "--", 2) + || _is_signature_separator(line) + || !boundary_stack_has_id(boundp, line)) + && strncasecmp(line_buf, "From ", 5)) { continue; } else { skip_mime_epilogue = FALSE; } } - + if (!is_deleted && inlist_regex_pos(set_filter_out_full_body, line) != -1) { is_deleted = FILTERED_OUT; @@ -1787,12 +2182,18 @@ int parsemail(char *mbox, /* file name */ /* check for MIME */ else if (!strncasecmp(line, "MIME-Version:", 13)) Mime_B = TRUE; + else if (!strncasecmp(line, "Content-Type:", 13)) { + /* we don't do anything here except switch off anti-spam + to avoid having boundaries with @ chars being changed + by the antispam functions */ + bp = addbody(bp, &lp, line, BODY_HEADER | BODY_NO_ANTISPAM | bodyflags); + } else if (isspace(line[0]) && ('\n' != line[0]) \ && !('\r' == line[0] && '\n' == line[1])) { /* - * since this begins with a whitespace, it means the - * previous line is continued on this line, leave only - * one space character and go! + * since this begins with a whitespace, it means the + * previous line is continued on this line, leave only + * one space character and go! 
*/ char *ptr = line; while (isspace(*ptr)) @@ -1809,9 +2210,9 @@ int parsemail(char *mbox, /* file name */ char savealternative; - /* - * we mark this as a header-line, and we use it to - * track end-of-header displays + /* + * we mark this as a header-line, and we use it to + * track end-of-header displays */ /* skip the alternate "\n", otherwise, we'll have @@ -1821,109 +2222,229 @@ int parsemail(char *mbox, /* file name */ isinheader--; /* - * This signals us that we are no longer in the header, - * let's fill in all those fields we are interested in. - * Parse the headers up to now and copy to the target - * variables + * This signals us that we are no longer in the header, + * let's fill in all those fields we are interested in. + * Parse the headers up to now and copy to the target + * variables */ + /* parsing of all headers except for Content-* related ones */ for (head = bp; head; head = head->next) { - char head_name[128]; + char head_name[129]; + + /* if we have a single \n, we just mark it as head->demimed + and skip the rest of the checks, which would give the + same result */ + if (head->line && rfc3676_ishardlb(head->line)) { + head->demimed = TRUE; + continue; + } + if (head->header && !head->demimed) { - head->line = - mdecodeRFC2047(head->line, strlen(head->line),charsetsave); - head->demimed = TRUE; + char *ptr; + + /* control that we have a valid header line */ + if ( !_validate_header(head->line) ) { + /* not a valid header line, we mark it as so to ignore it + later on */ + head->invalid_header = TRUE; + head->parsedheader = TRUE; + /* the following line is probably overkill and can be skipped */ + head->demimed = TRUE; + continue; + } + + head->line = + mdecodeRFC2047(head->line, strlen(head->line), charsetsave); + head->demimed = TRUE; } - if (head->parsedheader || head->attached || - !head->header) { + if (head->parsedheader +#ifdef DELETE_ME + || head->attached +#endif + || !head->header) { continue; } - if (!sscanf(head->line, "%127[^:]", 
head_name)) - continue; - + + /* we probably would be ok just with the sscanf as we + validated the header line some lines above */ + if (!sscanf(head->line, "%127[^:]", head_name)) { + head->invalid_header = TRUE; + head->parsedheader = TRUE; + continue; + } + if (inlist(set_deleted, head_name)) { - char *val = getsubject(head->line); /* revisit me */ - if (!strcasecmp(val, "yes")) - is_deleted = FILTERED_DELETE; - free(val); + if (!message_headers_parsed) { + char *val = getsubject(head->line); /* revisit me */ + if (!strcasecmp(val, "yes")) + is_deleted = FILTERED_DELETE; + free(val); + } + head->parsedheader = TRUE; } if (inlist(set_expires, head_name)) { - char *val = getmaildate(head->line); - exp_time = convtoyearsecs(val); - if (exp_time != -1 && exp_time < time(NULL)) - is_deleted = FILTERED_EXPIRE; - free(val); + if (!message_headers_parsed) { + char *val = getmaildate(head->line); + exp_time = convtoyearsecs(val); + if (exp_time != -1 && exp_time < time(NULL)) + is_deleted = FILTERED_EXPIRE; + free(val); + } + head->parsedheader = TRUE; } if (inlist(set_annotated, head_name)) { - getannotation(head->line, &annotation_content, - &annotation_robot); - if (annotation_content == ANNOTATION_CONTENT_DELETED_OTHER) - is_deleted = FILTERED_DELETE_OTHER; - else if (annotation_content == ANNOTATION_CONTENT_DELETED_SPAM) - is_deleted = FILTERED_DELETE; + if (!message_headers_parsed) { + getannotation(head->line, &annotation_content, + &annotation_robot); + if (annotation_content == ANNOTATION_CONTENT_DELETED_OTHER) + is_deleted = FILTERED_DELETE_OTHER; + else if (annotation_content == ANNOTATION_CONTENT_DELETED_SPAM) + is_deleted = FILTERED_DELETE; + } head->parsedheader = TRUE; } - if (!is_deleted && - inlist_regex_pos(set_filter_out, head->line) != -1) { - is_deleted = FILTERED_OUT; - } - - pos = inlist_regex_pos(set_filter_require, head->line); - if (pos != -1 && pos < require_filter_len) { - require_filter[pos] = TRUE; - } + + if (!message_headers_parsed) { + 
if (!is_deleted + && inlist_regex_pos(set_filter_out, head->line) != -1) { + is_deleted = FILTERED_OUT; + } - if (!strncasecmp(head->line, "Date:", 5)) { - date = getmaildate(head->line); - head->parsedheader = TRUE; - hasdate = 1; + pos = inlist_regex_pos(set_filter_require, head->line); + if (pos != -1 && pos < require_filter_len) { + require_filter[pos] = TRUE; + } + } + + if (!strncasecmp(head->line, "Received:", 8)) { + /* we are not doing anything with these + headers and there can be many of them, let's + mark them as parsed to speed up the processing + further below */ + head->parsedheader = TRUE; + continue; + } + else if (!strncasecmp(head->line, "Date:", 5)) { + strlftonl(head->line); + head->parsedheader = TRUE; + if (!message_headers_parsed) { + if (hasdate) { + /* msg has two or more of this header, + ignore them */ + continue; + } + date = getmaildate(head->line); + hasdate = 1; + } } else if (!strncasecmp(head->line, "From:", 5)) { - getname(head->line, &namep, &emailp); - head->parsedheader = TRUE; - if (set_spamprotect) { - emailp = spamify(strsav(emailp)); - /* we need to "fix" the name as well, as sometimes - the email ends up in the name part */ - namep = spamify(strsav(namep)); + head->parsedheader = TRUE; + strlftonl(head->line); + if (!message_headers_parsed) { + if (namep || emailp) { + /* msg has two or more of this header, + ignore them */ + continue; + } + getname(head->line, &namep, &emailp); + if (set_spamprotect) { + char *tmp; + tmp = emailp; + emailp = spamify(tmp); + free(tmp); + /* we need to "fix" the name as well, as sometimes + the email ends up in the name part */ + tmp = namep; + namep = spamify(tmp); + free(tmp); + } } } + else if (!strncasecmp(head->line, "To:", 3)) { + /* we don't do anything specific with this header, + we just want to mark it as parsed to avoid + processing it over and over here below + */ + head->parsedheader = TRUE; + strlftonl(head->line); + } else if (!strncasecmp(head->line, "Message-Id:", 11)) { - 
msgid = getid(head->line); - head->parsedheader = TRUE; + head->parsedheader = TRUE; + strlftonl(head->line); + if (!message_headers_parsed) { + if (msgid) { + /* msg has two or more of this header, + ignore them */ + continue; + } + msgid = getid(head->line); + } } else if (!strncasecmp(head->line, "Subject:", 8)) { - subject = getsubject(head->line); - hassubject = 1; - head->parsedheader = TRUE; + head->parsedheader = TRUE; + strlftonl(head->line); + if (!message_headers_parsed) { + if (hassubject) { + /* msg has two or more of this header, + ignore them */ + continue; + } + subject = getsubject(head->line); + hassubject = 1; + } } else if (!strncasecmp(head->line, "In-Reply-To:", 12)) { - inreply = getreply(head->line); - head->parsedheader = TRUE; - } - else if (!strncasecmp(head->line, "References:", 11)) { - /* - * Adding threading capability for the "References" - * header, ala RFC 822, used only for messages that - * have "References" but do not have an "In-reply-to" - * field. This is partically a concession for Netscape's - * email composer, which erroneously uses "References" - * when it should use "In-reply-to". 
- */ - if (!inreply) - inreply = getid(head->line); - if (set_linkquotes) { - bp = addbody(bp, &lp, line, 0); - } + char *unescaped_reply_to; head->parsedheader = TRUE; + strlftonl(head->line); + unescaped_reply_to = + _unescape_reply_and_reference_values(head->line); + if (unescaped_reply_to) { + free(head->line); + head->line = unescaped_reply_to; + } + if (!message_headers_parsed) { + if (inreply) { + /* we already parsed a References: header before, but + we're going to give priority to In-Reply-To */ + free(inreply); + } + inreply = getreply(head->line); + } + } + else if (!strncasecmp(head->line, "References:", 11)) { + head->parsedheader = TRUE; + if (!message_headers_parsed) { + char *unescaped_references; + + unescaped_references = + _unescape_reply_and_reference_values(head->line); + if (unescaped_references) { + free(head->line); + head->line = unescaped_references; + } + + /* + * Adding threading capability for the "References" + * header, ala RFC 822, used only for messages that + * have "References" but do not have an "In-reply-to" + * field. This is partically a concession for Netscape's + * email composer, which erroneously uses "References" + * when it should use "In-reply-to". 
+ */ + if (!inreply) { + inreply = getid(head->line); + } + if (set_linkquotes) { + bp = addbody(bp, &lp, line, 0); + } + } } - else if (!strncasecmp(head->line, "Content-Type:", 13)) { - content_type_p = head; - } else if (applemail_ua_header_len > 0 && !strncasecmp(head_name, set_applemail_ua_header, applemail_ua_header_len)) { @@ -1931,11 +2452,11 @@ int parsemail(char *mbox, /* file name */ head->parsedheader = TRUE; if (alternativeparser || !Mime_B - || set_save_alts + || local_set_save_alts || !set_applemail_mimehack) { continue; } - + /* If the UA is an apple mail client and we're configured to do the * applemail hack and we're not already configured to * save the alternatives, memorize the old setting and force @@ -1951,26 +2472,32 @@ int parsemail(char *mbox, /* file name */ ** in-line. */ - old_set_save_alts = set_save_alts; - set_save_alts = 2; - + applemail_old_set_save_alts = local_set_save_alts; + local_set_save_alts = 2; + #if DEBUG_PARSE printf("Applemail_hack force save_alts: yes\n"); printf("Applemail_hack set_save_alts changed from %d to %d\n", - old_set_save_alts, set_save_alts); + applemail_old_set_save_alts, local_set_save_alts); #endif } } } - - if (!is_deleted && set_delete_older && (date || fromdate)) { + + /* avoid overwriting the message headers by those coming from + message/rfc attachments */ + if (!message_headers_parsed) { + message_headers_parsed = TRUE; + } + + if (!is_deleted && set_delete_older && (date || *fromdate)) { time_t email_time = convtoyearsecs(date); if (email_time == -1) email_time = convtoyearsecs(fromdate); if (email_time != -1 && email_time < delete_older_than) is_deleted = FILTERED_OLD; } - if (!is_deleted && set_delete_newer && (date || fromdate)) { + if (!is_deleted && set_delete_newer && (date || *fromdate)) { time_t email_time = convtoyearsecs(date); if (email_time == -1) email_time = convtoyearsecs(fromdate); @@ -1984,28 +2511,189 @@ int parsemail(char *mbox, /* file name */ savealternative = FALSE; 
attach_force = FALSE; +#if NEW_PARSER + /* testing separating parsing from post-processing */ + /* extract content-type and other values from the headers */ + content_type_ptr = NULL; + for (head = headp; head; head = head->next) { + if (head->parsedheader || !head->header || head->invalid_header) + continue; + + if (!strncasecmp(head->line, "Content-Type:", 13)) { + char *ptr = head->line + 13; +#define DISP_HREF 1 +#define DISP_IMG 2 +#define DISP_IGNORE 3 + /* we must make sure this is not parsed more times + than this */ + head->parsedheader = TRUE; + + while (isspace(*ptr)) + ptr++; + + content_type_ptr = ptr; + sscanf(ptr, "%128[^;]", type); + + filter_content_type_values(type); + + /* now, check if there's a charset indicator here too! */ + cp = strcasestr(ptr, "charset="); + if (cp) { + cp += 8; /* pass charset= */ + if ('\"' == *cp) + cp++; /* pass a quote too if one is there */ + + sscanf(cp, "%128[^;\"\n\r]", charbuffer); + /* @@ we need a better filter here, to remove all non US-ASCII */ + filter_content_type_values(charbuffer); + /* some old messages use DEFAULT_CHARSET or foo_CHARSET, + we strip it out */ + filter_charset_value(charbuffer); + /* save the charset info */ + if (charbuffer[0] != '\0') { + charset = strsav(charbuffer); + } + } + + /* now check if there's a format indicator */ + if (set_format_flowed) { + cp = strcasestr(ptr, "format="); + if (cp) { + cp += 7; /* pass format= */ + if ('\"' == *cp) + cp++; /* pass a quote too if one is there */ + + sscanf(cp, "%128[^;\"\n\r]", charbuffer); + /* save the format info */ + if (!strcasecmp (charbuffer, "flowed")) + textplain_format = FORMAT_FLOWED; + } + + /* now check if there's a delsp indicator */ + cp = strcasestr(ptr, "delsp="); + if (cp) { + cp += 6; /* pass delsp= */ + if ('\"' == *cp) + cp++; /* pass a quote too if one is there */ + + sscanf(cp, "%128[^;\"\n\r]", charbuffer); + /* save the delsp info */ + if (!strcasecmp (charbuffer, "yes")) + delsp_flag = TRUE; + } + } + break; + } 
+ + } /* for content-type */ + + /* post-processing Content-Type: + check if we have a Content-Type, a boundary parameter, + and a corresponding start boundary; + revert to a default type otherwise. + */ + if (content_type_ptr == NULL) { + /* missing Content-Type header, use default text/plain unless + immediate parent is multipart/digest; in that case, use + message/rfc822 (RFC 2046) */ + if (multipart_stack_top_has_type(multipartp, "multipart/digest") + && !attachment_rfc822) { + strcpy(type, "message/rfc822"); + } else { + strcpy(type, "text/plain"); + } + content_type_ptr = type; +#if DEBUG_PARSE + printf("Missing Content-Type header, defaulting to %s\n", type); +#endif + } else if (!strncasecmp(type, "multipart/", 10)) { + boundary_id = strcasestr(content_type_ptr, "boundary="); +#if DEBUG_PARSE + printf("boundary found in %s\n", content_type_ptr); +#endif + if (boundary_id) { + boundary_id = strchr(boundary_id, '='); + if (boundary_id) { + boundary_id++; + while (isspace(*boundary_id)) + boundary_id++; + *boundbuffer ='\0'; + if ('\"' == *boundary_id) { + sscanf(++boundary_id, "%255[^\"]", + boundbuffer); + } + else + sscanf(boundary_id, "%255[^;\n]", + boundbuffer); + boundary_id = (*boundbuffer) ? 
boundbuffer : NULL; + } + } + + /* if we have multipart/ but the boundary attribute + is missing, downgrade the content type to + text/plain */ + if (!boundary_id) { + strcpy(type, "text/plain"); + content_type_ptr = type; +#if DEBUG_PARSE + printf("Missing boundary attribute in multipart/*, downgrading to text/plain\n"); +#endif + } + } + + /* a limit to avoid having the message_node tree growing + uncontrollably */ + if ((set_max_attach_per_msg != 0) + && (att_counter > set_max_attach_per_msg)) { + content = CONTENT_IGNORE; +#if DEBUG_PARSE + printf("Hit max_attach_per_msg limit; ignoring further attachments for msgid %s\n", msgid); +#endif + } + + if (content == CONTENT_IGNORE) { + continue; + } else if (ignorecontent(type)) { + /* don't save this */ + content = CONTENT_IGNORE; + continue; + } +#if 0 + /* not sure if we should add charset save here or wait until later */ + if (charset[0] == NULL) { + strcpy(charset, set_default_charset); + } +#endif + + /* parsing of all Content-* related headers except for Content-Type */ description = NULL; for (head = headp; head; head = head->next) { - if (head->parsedheader || !head->header) + if (head->parsedheader || !head->header || head->invalid_header) continue; + /* Content-Description is defined in RFC 2045, section 8 
*/ if (!strncasecmp(head->line, "Content-Description:", 20)) { char *ptr = head->line; description = ptr + 21; + head->parsedheader = TRUE; } /* Content-Disposition is defined in RFC 2183 */ - else - if (!strncasecmp (head->line, "Content-Disposition:", 20)) { + else if (!strncasecmp (head->line, "Content-Disposition:", 20)) { char *ptr = head->line + 20; char *fname; - char *jp; char *np; + head->parsedheader = TRUE; + + if (inlist(set_ignore_content_disposition, type)) { + continue; + } + while (*ptr && isspace(*ptr)) ptr++; - if (!strncasecmp(ptr, "attachment;", 11) + if (!strncasecmp(ptr, "attachment", 10) && (content != CONTENT_IGNORE)) { - /* signal we want to attach, rather than embeed this MIME + /* signal we want to attach, rather than embeed this MIME attachment */ if (inlist(set_ignore_types, "$NONPLAIN") || inlist(set_ignore_types, "$BINARY")) @@ -2020,14 +2708,7 @@ int parsemail(char *mbox, /* file name */ fname = strcasestr(ptr, "filename="); if (fname) { np = fname+9; - if (*np == '"') - np++; - for (jp = attachname; np && *np != '\n' - && *np != '"' && jp < attachname + sizeof(attachname) - 1;) { - *jp++ = *np++; - } - *jp = '\0'; - safe_filename(attachname); + _extract_attachname(np, attachname, sizeof(attachname)); } else { attachname[0] = '\0'; /* just clear it */ @@ -2035,18 +2716,8 @@ int parsemail(char *mbox, /* file name */ file_created = MAKE_FILE; /* please make one */ } } -#if 0 -/* -** Why was this limited to just type image ? There are more inline types than just image. -** I removed the image restriction and all of a sudden more attachments had the proper name. 
-*/ - else if (!strncasecmp(ptr, "inline;", 7) - && (content != CONTENT_IGNORE) - && (!strncasecmp(type, "image/", 5))) { - /* @@@ <-- here I should use the inline thingy */ -#endif - else if (!strncasecmp(ptr, "inline;", 7) + else if (!strncasecmp(ptr, "inline", 6) && (content != CONTENT_IGNORE) && inlinecontent(type)) { inline_force = TRUE; @@ -2056,482 +2727,608 @@ int parsemail(char *mbox, /* file name */ fname = strcasestr(ptr, "filename="); if (fname) { np = fname+9; - if (*np == '"') - np++; - for (jp = attachname; np && *np != '\n' && *np != '"' - && jp < attachname + sizeof(attachname) - 1;) { - *jp++ = *np++; - } - *jp = '\0'; - safe_filename(attachname); + _extract_attachname(np, attachname, sizeof(attachname)); } else { attachname[0] = '\0'; /* just clear it */ } file_created = MAKE_FILE; /* please make one */ } /* inline */ - } /* Content-Disposition: */ + + } /* Content-Disposition: */ else if (!strncasecmp(head->line, "Content-Base:", 13)) { #ifdef NOTUSED char *ptr = head->line + 13; -#endif + /* we just ignore this header. Why were we ignoring the whole + attachment? 
*/ content=CONTENT_IGNORE; +#endif /* we must make sure this is not parsed more times than this */ head->parsedheader = TRUE; - } - else if (!strncasecmp(head->line, "Content-Type:", 13)) { - char *ptr = head->line + 13; -#define DISP_HREF 1 -#define DISP_IMG 2 -#define DISP_IGNORE 3 - /* we must make sure this is not parsed more times - than this */ + } else if (!strncasecmp + (head->line, "Content-Transfer-Encoding:", 26)) { + char *ptr = head->line + 26; + head->parsedheader = TRUE; while (isspace(*ptr)) ptr++; - - sscanf(ptr, "%128[^;]", type); - cp = type + strlen(type) - 1; - while (cp > type && isspace(*cp)) { - *cp = '\0'; /* rm newlines, etc */ - --cp; + if (!strncasecmp(ptr, "QUOTED-PRINTABLE", 16)) { + decode = ENCODE_QP; + } + else if (!strncasecmp(ptr, "BASE64", 6)) { + decode = ENCODE_BASE64; + b64_decoder_state = base64_decoder_state_new(); + } + else if (!strncasecmp(ptr, "8BIT", 4)) { + decode = ENCODE_NORMAL; + } + else if (!strncasecmp(ptr, "7BIT", 4)) { + decode = ENCODE_NORMAL; + } + else if (!strncasecmp(ptr, "x-uue", 5)) { + decode = ENCODE_UUENCODE; + /* JK 20230504: what does this do? + break; do we need to abort content-type too? */ + if (!do_uudecode(fp, line, line_buf, + &raw_text_buf, fpo)) + break; } + else { + /* Unknown format, we use default decoding */ + char code[64]; - /* now, check if there's a charset indicator here too! */ - cp = strcasestr(ptr, "charset="); - if (cp) { - cp += 8; /* pass charset= */ - if ('\"' == *cp) - cp++; /* pass a quote too if one is there */ + /* is there any value for content-encoding or is it missing? 
*/ + if (sscanf(ptr, "%63s", code) != EOF) { - sscanf(cp, "%128[^;\"\n]", charbuffer); - /* save the charset info */ - charset = strsav(charbuffer); - } + trio_snprintf(line, sizeof(line_buf) - set_ietf_mbox, + " ('%s' %s)\n", code, + lang[MSG_ENCODING_IS_NOT_SUPPORTED]); - /* now check if there's a format indicator */ - if (set_format_flowed) { - cp = strcasestr(ptr, "format="); - if (cp) { - cp += 7; /* pass charset= */ - if ('\"' == *cp) - cp++; /* pass a quote too if one is there */ - - sscanf(cp, "%128[^;\"\n]", charbuffer); - /* save the format info */ - if (!strcasecmp (charbuffer, "flowed")) - textplain_format = FORMAT_FLOWED; - } - - /* now check if there's a delsp indicator */ - cp = strcasestr(ptr, "delsp="); - if (cp) { - cp += 6; /* pass charset= */ - if ('\"' == *cp) - cp++; /* pass a quote too if one is there */ - - sscanf(cp, "%128[^;\"\n]", charbuffer); - /* save the delsp info */ - if (!strcasecmp (charbuffer, "yes")) - delsp_flag = TRUE; - } - } + bp = addbody(bp, &lp, line, + BODY_HTMLIZED | bodyflags); - if (alternativeparser) { - struct body *next; - struct body *temp_bp = NULL; - - /* We are parsing alternatives... */ +#if DEBUG_PARSE + printf("Ignoring unknown Content-Transfer-Encoding: %s\n", code); +#endif + } else { +#if DEBUG_PARSE + printf("Missing Content-Transfer-Encoding value\n"); +#endif + } + } +#if DEBUG_PARSE + printf("DECODE set to %d\n", decode); +#endif + } /* Content-Transfer-Encoding */ + } /* for Content-* except Content-Type */ - if (parse_multipart_alternative_force_save_alts - && multipartp - && !strcasecmp(multipartp->line, "multipart/alternative") - && *last_alternative_type - && !strcasecmp(last_alternative_type, "text/plain")) { + /* process specific Content-Type values */ + do { + if (alternativeparser) { + struct body *temp_bp = NULL; + + /* We are parsing alternatives... 
*/ + + if (parse_multipart_alternative_force_save_alts + && multipart_stack_top_has_type(multipartp, "multipart/alternative") + && *last_alternative_type + && !strcasecmp(last_alternative_type, "text/plain")) { - /* if the UA is Apple mail and if the only - ** alternatives are text/plain and - ** text/html and if the preference is - ** text/plain, skip the text/html version - ** if the applemail_hack is enabled - */ - if (!strcasecmp(type, "text/html")) { + /* if the UA is Apple mail and if the only + ** alternatives are text/plain and + ** text/html and if the preference is + ** text/plain, skip the text/html version + ** if the applemail_hack is enabled + */ + if (!strcasecmp(type, "text/html")) { #if DEBUG_PARSE - fprintf(stderr, "Discarding apparently equivalent text//html alternative\n"); + fprintf(stderr, "Discarding apparently equivalent text/html alternative\n"); #endif - content = CONTENT_IGNORE; - break; - } + content = CONTENT_IGNORE; + break; } - - if (preferedcontent(&alternative_weight, type, decode)) { - /* ... this is a prefered type, we want to store - this [instead of the earlier one]. */ - /* erase the previous alternative info */ - temp_bp = alternative_bp; /* remember the value of bp for GC */ - alternative_bp = alternative_lp = NULL; - if (prefered_content_charset) { - free(prefered_content_charset); - } - prefered_content_charset = strsav (charset); - strncpy(last_alternative_type, type, - sizeof(last_alternative_type) - 1); + } + + if (preferedcontent(&alternative_weight, type, decode)) { + /* ... this is a prefered type, we want to store + this [instead of the earlier one]. 
*/ + /* erase the previous alternative info */ + if (current_message_node->alternative) { + current_message_node->skip = MN_SKIP_ALL; + } + + strncpy(last_alternative_type, type, + sizeof(last_alternative_type) - 2); + /* make sure it's a NULL ending string if ever type > 128 */ + last_alternative_type[sizeof(last_alternative_type) - 1] = '\0'; #ifdef DEBUG_PARSE - fprintf(stderr, "setting new prefered alternative charset to %s\n", charset); + fprintf(stderr, "setting new prefered alternative charset to %s\n", charset); #endif - alternative_lastfile_created = NO_FILE; - content = CONTENT_UNKNOWN; - if (alternative_lastfile[0] != '\0') { - /* remove the previous attachment */ - unlink(alternative_lastfile); - alternative_lastfile[0] = '\0'; - } - } - else if (set_save_alts == 2) { - content = CONTENT_BINARY; - } else { - /* ...and this type is not a prefered one. Thus, we - * shall ignore it completely! */ - content = CONTENT_IGNORE; - /* erase the current alternative info */ - temp_bp = bp; /* remember the value of bp for GC */ - lp = alternative_lp; - bp = alternative_bp; - strcpy(alternative_file, - alternative_lastfile); - file_created = - alternative_lastfile_created; - alternative_bp = alternative_lp = NULL; - alternative_lastfile_created = NO_FILE; - alternative_lastfile[0] = '\0'; - /* we haven't yet created any attachment file, so there's no need - to erase it yet */ + alternative_lastfile_created = NO_FILE; + content = CONTENT_UNKNOWN; + /* @@ JK: add here a delete for mmixed, for all children, + composite or not under this node */ + if (root_message_node != current_message_node + && current_alt_message_node == current_message_node) { + message_node_delete_attachments(current_message_node); } - - /* free any previous alternative */ - while (temp_bp) { - next = temp_bp->next; - if (temp_bp->line) - free(temp_bp->line); - free(temp_bp); - temp_bp = next; - } + if (alternative_lastfile[0] != '\0') { + /* remove the previous attachment */ + /* 
unlink(alternative_lastfile); */ + alternative_lastfile[0] = '\0'; + } + } + else if (local_set_save_alts == 2) { + content = CONTENT_BINARY; + } else { + /* ...and this type is not a prefered one. Thus, we + * shall ignore it completely! */ + content = CONTENT_IGNORE; + /* erase the current alternative info */ + temp_bp = bp; /* remember the value of bp for GC */ + /* + lp = alternative_lp; + bp = alternative_bp; + */ + lp = bp = headp = NULL; + strcpy(alternative_file, + alternative_lastfile); + file_created = + alternative_lastfile_created; + alternative_bp = alternative_lp = NULL; + alternative_lastfile_created = NO_FILE; + alternative_lastfile[0] = '\0'; + /* we haven't yet created any attachment file, so there's no need + to erase it yet */ + } + + /* free any previous alternative */ + free_body (temp_bp); + + /* @@ not sure if I should add a diff flag to do this break */ + if (content == CONTENT_IGNORE) + /* end the header parsing... we already know what we want */ + break; + + } /* alternativeparser */ + + if (content == CONTENT_IGNORE) + break; + else if (ignorecontent(type)) { + /* don't save this */ + content = CONTENT_IGNORE; + break; + } else if (textcontent(type) + || (inlinehtml && + !strcasecmp(type, "text/html"))) { + /* text content or text/html follows. + */ + + if (local_set_save_alts && alternativeparser + && content == CONTENT_BINARY) { + file_created = MAKE_FILE; /* please make one */ + description = set_alts_text ? set_alts_text + : "alternate version of message"; + /* JK 2023/04: why is description tied to + the length of attachname and why it was + using it to make a filename? code + commented out while investigating. We + get the filename from the filename + found in Content-Disposition or + Content-Type, and if none is found, we + generate one. 
+ */ +#ifdef FIX_OR_DELETE_ME + strncpy(attachname, description, sizeof(attachname) - 1); + /* make sure it's a NULL terminated string */ + attachname[sizeof(attachname) - 1] = '\0'; + safe_filename(attachname); +#endif + } - /* @@ not sure if I should add a diff flag to do this break */ - if (content == CONTENT_IGNORE) - /* end the header parsing... we already know what we want */ - break; - } + /* if it's not a stored attachment, + ** try to define content more precisely + ** The condition to detect if it's a + ** is to see if file_created == MAKE_FILE + ** or content = CONTENT_BINARY */ + else if (file_created != MAKE_FILE) { + if (!strcasecmp(type, "text/html")) + content = CONTENT_HTML; + else + content = CONTENT_TEXT; + } else { + /* we should refactor and simplify the cases when + we call the following function. + It's needed here when a text/plain part has + Content-Disposition: attachment and a filename + given only in the Content-Type name attribute */ + _control_attachname(content_type_ptr, attachname, sizeof(attachname)); + } + break; - if (content == CONTENT_IGNORE) - continue; - else if (ignorecontent(type)) - /* don't save this */ - content = CONTENT_IGNORE; - else if (textcontent(type) - || (inlinehtml && - !strcasecmp(type, "text/html"))) { - /* text content or text/html follows. - */ + } /* textcontent(type) || inlinehtml && type == text/html */ - if (set_save_alts && alternativeparser - && content == CONTENT_BINARY) { - file_created = MAKE_FILE; /* please make one */ - description = set_alts_text ? 
set_alts_text - : "alternate version of message"; - if (strlen(description) >= sizeof(attachname)) - progerr("alts_text too long"); - strcpy(attachname, description); - safe_filename(attachname); - } - else if (!strcasecmp(type, "text/html")) - content = CONTENT_HTML; - else - content = CONTENT_TEXT; - - if (!alternativeparser && !prefered_content_charset) { - /* there are apparently no - alternatives in this message, let's - use the first text/* charset we - found as the prefered one */ - prefered_content_charset = strsav (charset); - } + +#if 1 || TESTING_IF_THIS_IS_AN_ERROR + else if (attach_force) { + /* maybe copy description and desc default values here? + other things here? + what to do with content == CONTENT_BINARY? + */ + + { + /* don't like calling this function in two parts, + but we need to fix a bug. Will have to refactorize how + to handle attach_force when we revisit the code */ - continue; - } - else if (!strncasecmp(type, "message/rfc822", 14)) { - /* - * Here comes an attached mail! This can be ugly, - * since the attached mail may very well itself - * contain attached binaries, or why not another - * attached mail? :-) - * - * We need to store the current boundary separator - * in order to get it back when we're done parsing - * this particular mail, since each attached mail - * will have its own boundary separator that *might* - * be used. - */ - bp = addbody(bp, &lp, - "

    attached mail follows:


    ", - BODY_HTMLIZED | bodyflags); - bodyflags |= BODY_ATTACHED; - /* @@ should it be 1 or 2 ?? should we use another method? */ -#if 0 - isinheader = 2; + /* if attachname is empty, copy the value of the name attribute, + if given in the Content-Type header */ + _control_attachname(content_type_ptr, attachname, sizeof(attachname)); + } + break; + } #endif - isinheader = 1; - continue; - } - else if (strncasecmp(type, "multipart/", 10)) { - /* - * This is not a multipart and not text - */ - char *fname = NULL; /* attachment filename */ - - /* - * only do anything here if we're not - * ignoring this content - */ - if (CONTENT_IGNORE != content) { + else if (!strncasecmp(type, "message/rfc822", 14)) { + /* + * Here comes an attached mail! This can be ugly, + * since the attached mail may very well itself + * contain attached binaries, or why not another + * attached mail? :-) + * + * We need to store the current boundary separator + * in order to get it back when we're done parsing + * this particular mail, since each attached mail + * will have its own boundary separator that *might* + * be used. + */ + + /* need to take into account alternates with rfc822? */ + if (boundp == NULL && multipartp == NULL) { + /* we have a non multipart message with a message/rfc822 + content-type body */ + bp = addbody(bp, &lp, + NULL, + BODY_ATTACHMENT | BODY_ATTACHMENT_RFC822); + + } else { + free_body(bp); + description = NULL; + bp = lp = headp = NULL; + attachment_rfc822 = TRUE; + } + isinheader = 1; - fname = strcasestr(ptr, "name="); - if (fname) { - fname += 5; - if ('\"' == *fname) - fname++; - sscanf(fname, "%128[^\"]", attachname); - safe_filename(attachname); - } - else { - attachname[0] = '\0'; /* just clear it */ - } + /* RFC2046 states that message/rfc822 can only + have Content-Transfer-Encoding values of 7bit, + 8bit, and binary. 
Some broken mail clients + may have used something else */ + if (decode != ENCODE_NORMAL) { +#if DEBUG_PARSE + printf("Error: msgid %s : message/rfc822 Content-Type associated with a\n" + "Content-Transfer-Encoding that is not\n7bit, 8bit, or binary.\n" + "Forcing ENCODE_NORMAL\n", msgid); +#endif + if (decode == ENCODE_BASE64) { + base64_decoder_state_free(b64_decoder_state); + b64_decoder_state = NULL; + } + decode = ENCODE_NORMAL; + } + + /* reset the apple mail hack and the + local_set_save_alts as we don't know if the + forwarded message was originally sent from + an apple mal client */ + parse_multipart_alternative_force_save_alts = 0; + applemail_old_set_save_alts = -1; + local_set_save_alts = set_save_alts; + break; + + } /* message/rfc822 */ - file_created = MAKE_FILE; /* please make one */ + else if (strncasecmp(type, "multipart/", 10)) { + /* + * This is not a multipart and not text + */ + + /* + * only do anything here if we're not + * ignoring this content + */ + if (CONTENT_IGNORE != content) { + /* only use the Content-Type name attribute to get + the filename if Content-Disposition didn't + provide a filename */ + _control_attachname(content_type_ptr, attachname, sizeof(attachname)); + file_created = MAKE_FILE; /* please make one */ + content = CONTENT_BINARY; /* uknown turns into binary */ + } + break; + + } /* !multipart/ */ - content = CONTENT_BINARY; /* uknown turns into binary */ - } - continue; - } - else { - /* - * Find the first boundary separator - */ + else { + /* + * Find the first boundary separator + */ + + struct body *tmpbp; + struct body *tmplp; + bool found_start_boundary; + - struct body *tmpbp; - struct body *tmplp; - - boundary_id = strcasestr(ptr, "boundary="); +#if DELETE_ME_CODE_MOVED_UP + boundary_id = strcasestr(content_type_ptr, "boundary="); #if DEBUG_PARSE - printf("boundary found in %s\n", ptr); + printf("boundary found in %s\n", ptr); #endif - if (boundary_id) { - boundary_id = strchr(boundary_id, '='); - if 
(boundary_id) { - boundary_id++; - while (isspace(*boundary_id)) - boundary_id++; - if ('\"' == *boundary_id) { - sscanf(++boundary_id, "%255[^\"]", - boundbuffer); - } - else - sscanf(boundary_id, "%255[^;\n]", - boundbuffer); - boundary_id = boundbuffer; - } - - /* restart on a new list: */ - tmpbp = tmplp = NULL; - - while (fgets(line_buf, MAXLINE, fp)) { - if(set_append) { - if(fputs(line_buf, fpo) < 0) { - progerr("Can't write to \"mbox\""); /* revisit me */ - } - } - if (!strncmp(line_buf + set_ietf_mbox, "--", 2) && - !strncmp(line_buf + set_ietf_mbox + 2, boundbuffer, - strlen(boundbuffer))) { - break; - } - if (!strncasecmp(line_buf, "From ", 5)) { -#if DEBUG_PARSE - printf("Error, new message found instead of boundary!\n"); #endif - isinheader = 0; - if (tmpbp) - bp = append_body(bp, &lp, tmpbp); - boundary_id = NULL; - goto leave_header; - } - /* save lines in case no boundary found */ - tmpbp = addbody(tmpbp, &tmplp, line_buf, bodyflags); - } - if (!strncmp(line_buf + set_ietf_mbox + 2 + strlen(boundary_id), "--", 2) - && tmpbp) { -#if DEBUG_PARSE - printf("Error, end of mime found before mime start!\n"); + if (boundary_id) { +#if DELETE_ME_CODE_MOVED_UP + boundary_id = strchr(boundary_id, '='); + if (boundary_id) { + boundary_id++; + while (isspace(*boundary_id)) + boundary_id++; + *boundbuffer = '\0'; + if ('\"' == *boundary_id) { + sscanf(++boundary_id, "%255[^\"]", + boundbuffer); + } + else + sscanf(boundary_id, "%255[^;\n]", + boundbuffer); + boundary_id = boundbuffer; + } #endif - /* end of mime found before mime start */ - bp = append_body(bp, &lp, tmpbp); - boundary_id = NULL; - goto leave_header; - } - free_body(tmpbp); - - /* - * This stores the boundary string in a stack - * of strings: - */ - boundp = bound(boundp, boundbuffer); - multipartp = multipart(multipartp, type); - skip_mime_epilogue = FALSE; + + /* restart on a new list: */ + tmpbp = tmplp = NULL; + found_start_boundary = FALSE; + + while (fgets(line_buf, MAXLINE, fp)) { + char 
*tmpline; - /* printf("set new boundary: %s\n", boundp->line); */ - - /* @@JK Take into account errors when we abort, malformed mime, etc, - probably put this call up, before detecting errors? */ - charsetsp = charsets(charsetsp, charset, charsetsave); -#ifdef DEBUG_PARSE - fprintf(stderr, "pushing charset %s and charsetsave %s\n", charset, charsetsave); -#endif - if (charset) { - free(charset); - charset = NULL; + if(set_append) { + if(fputs(line_buf, fpo) < 0) { + progerr("Can't write to \"mbox\""); /* revisit me */ } - charsetsave[0] = '\0'; + } -#ifdef DEBUG_PARSE - fprintf(stderr, "restoring parents charset %s and charsetsave %s\n", charset, charsetsave); -#endif + tmpline = line_buf + set_ietf_mbox; - /* - * We set ourselves, "back in header" since there is - * gonna come MIME headers now after the separator - */ - isinheader = 1; + /* + ** detect different cases where we may have broken, missing, + ** or unexpected start and end boundaries. + ** Using mutt as a reference on how to process each case + **/ - /* Daniel Stenberg started adding the - * "multipart/alternative" parser 13th of July - * 1998! We check if this is a 'multipart/ - * alternative' header, in which case we need to - * treat it very special. - */ + /* start boundary? */ + if (is_start_boundary(boundary_id, tmpline)) { + found_start_boundary = TRUE; + break; + } + /* new message found */ + if (!strncasecmp(line_buf, "From ", 5)) { +#if DEBUG_PARSE + printf("Error, new message found instead of expected start_boundary: %s\n", boundbuffer); +#endif + break; - if (!strncasecmp - (&ptr[10], "alternative", 11)) { - /* It *is* an alternative session! Alternative - * means there will be X parts with the same text - * using different content-types. We are supposed - * to take the most prefered format of the ones - * used and only output that one. MIME defines - * the order of the texts to start with pure text - * and then continue with more and more obscure - * formats. 
(well, it doesn't use those terms but - * that's what it means! ;-)) - */ - - /* How "we" are gonna deal with them: - * - * We create a "spare" linked list body for the - * very first part. Since the first part is - * defined to be the most readable, we save that - * in case no content-type present is prefered! - * - * We skip all parts that are not prefered. All - * prefered parts found will replace the first - * one that is saved. When we reach the end of - * the alternatives, we will use the last saved - * one as prefered. - */ - - savealternative = TRUE; + } + /* a preceding non-closed boundary? */ + else if (!strncmp(tmpline, "--", 2) + && ! _is_signature_separator(line)) { + char *tmp_boundary = boundary_stack_has_id(boundp, tmpline); + + boundary_id = tmp_boundary; +#if DEBUG_PARSE + printf("Error, an existing boundary found instead of expected start_boundary: %s\n", boundbuffer); +#endif + break; + } + /* save lines in case no boundary found */ + tmpbp = addbody(tmpbp, &tmplp, tmpline, bodyflags); + } + + /* control we found the start boundary we were expecting */ + if (!found_start_boundary) { #if DEBUG_PARSE - printf("SAVEALTERNATIVE: yes\n"); + printf("Error: didn't find start boundary\n"); + printf("last line read:\n%s", line_buf); #endif + isinheader = 0; + boundary_id = NULL; + + if (tmpbp) { + bp = append_body(bp, &lp, tmpbp, TRUE); } - } - else - boundary_id = NULL; - } - } - else - if (!strncasecmp - (head->line, "Content-Transfer-Encoding:", 26)) { - char *ptr = head->line + 26; + /* downgrading to text/plain */ + strcpy(type, "text/plain"); + content_type_ptr = type; +#if DEBUG_PARSE + printf("Downgrading to text/plain\n"); +#endif + goto leave_header; + } + free_body(tmpbp); + + /* + ** we got a new part coming + */ + current_message_node = + message_node_mimetest(current_message_node, + bp, lp, charset, charsetsave, + type, + (boundp) ? 
boundp->boundary_id : NULL, + boundary_id, + att_binname, + meta_filename, + att_link, + att_comment_filename, + attachment_rfc822, + message_node_skip_status(file_created, + content, + type)); +#if DEBUG_PARSE_MSGID_TRACE + current_message_node->msgid = strsav(msgid); +#endif + if (alternativeparser) { + current_alt_message_node = current_message_node; + } + if (att_binname) { + free(att_binname); + att_binname = NULL; + } + if (meta_filename) { + free(meta_filename); + meta_filename = NULL; + } + if (att_link) { + free(att_link); + att_link = NULL; + } + if (att_comment_filename) { + free(att_comment_filename); + att_comment_filename = NULL; + } + + if (alternativeparser) { + current_message_node->alternative = TRUE; + } - head->parsedheader = TRUE; - while (isspace(*ptr)) - ptr++; - if (!strncasecmp(ptr, "QUOTED-PRINTABLE", 16)) { - decode = ENCODE_QP; - } - else if (!strncasecmp(ptr, "BASE64", 6)) { - decode = ENCODE_BASE64; - } - else if (!strncasecmp(ptr, "8BIT", 4)) { - decode = ENCODE_NORMAL; - } - else if (!strncasecmp(ptr, "7BIT", 4)) { - decode = ENCODE_NORMAL; - } - else if (!strncasecmp(ptr, "x-uue", 5)) { - decode = ENCODE_UUENCODE; - if (!do_uudecode(fp, line, line_buf, - &raw_text_buf, fpo)) - break; - } - else { - /* Unknown format, we use default decoding */ - char code[64]; + /* + if (!strncasecmp(type, "multipart/related", 17)) { + current_message_node->skip = MN_SKIP_BUT_KEEP_CHILDREN; + } + */ + + if (!root_message_node) { + root_message_node = current_message_node; + } + + /* + * This stores the boundary string in a stack + * of strings: + */ + if (boundp && alternativeparser) { + /* if we were dealing with multipart/alternative or + message/rfc822, store the current content */ + boundp->alternativeparser = alternativeparser; + boundp->alternative_weight = alternative_weight; + boundp->alternative_message_node_created = + alternative_message_node_created; + strcpy(boundp->alternative_file, alternative_file); + 
strcpy(boundp->alternative_lastfile, alternative_lastfile); + strcpy(boundp->last_alternative_type, last_alternative_type); + boundp->alternative_lp = alternative_lp; + boundp->alternative_bp = alternative_bp; + boundp->current_alt_message_node = current_alt_message_node; + boundp->root_alt_message_node = root_alt_message_node; + current_alt_message_node = root_alt_message_node = NULL; + alternative_file[0] = alternative_lastfile[0] = last_alternative_type[0] = '\0'; + alternative_message_node_created = FALSE; + alternativeparser = FALSE; + } - /* is there any value for content-encoding or is it missing? */ - if (sscanf(ptr, "%63s", code) != EOF) { - - snprintf(line, sizeof(line_buf) - set_ietf_mbox, - " ('%s' %s)\n", code, - lang[MSG_ENCODING_IS_NOT_SUPPORTED]); + boundp = boundary_stack_push(boundp, boundbuffer); + boundp->parse_multipart_alternative_force_save_alts = parse_multipart_alternative_force_save_alts; + boundp->applemail_old_set_save_alts = applemail_old_set_save_alts; + boundp->set_save_alts = local_set_save_alts; + multipartp = multipart_stack_push(multipartp, type); + skip_mime_epilogue = FALSE; - bp = addbody(bp, &lp, line, - BODY_HTMLIZED | bodyflags); + attachment_rfc822 = FALSE; + + description = NULL; + *filename = '\0'; + bp = lp = headp = NULL; + /* printf("set new boundary: %s\n", boundp->boundary_id); */ -#if DEBUG_PARSE - printf("Ignoring unknown Content-Transfer-Encoding: %s\n", code); -#endif - } else { -#if DEBUG_PARSE - printf("Missing Content-Transfer-Encoding value\n"); + if (charset) { + free(charset); + charset = NULL; + } + charsetsave[0] = '\0'; + +#ifdef DEBUG_PARSE + fprintf(stderr, "restoring parents charset %s and charsetsave %s\n", charset, charsetsave); #endif - } - } + + /* + * We set ourselves, "back in header" since there is + * gonna come MIME headers now after the separator + */ + isinheader = 1; + + /* Daniel Stenberg started adding the + * "multipart/alternative" parser 13th of July + * 1998! 
We check if this is a 'multipart/ + * alternative' header, in which case we need to + * treat it very special. + */ + + if (!strncasecmp + (&content_type_ptr[10], "alternative", 11)) { + /* It *is* an alternative session! Alternative + * means there will be X parts with the same text + * using different content-types. We are supposed + * to take the most prefered format of the ones + * used and only output that one. MIME defines + * the order of the texts to start with pure text + * and then continue with more and more obscure + * formats. (well, it doesn't use those terms but + * that's what it means! ;-)) + */ + + /* How "we" are gonna deal with them: + * + * We create a "spare" linked list body for the + * very first part. Since the first part is + * defined to be the most readable, we save that + * in case no content-type present is prefered! + * + * We skip all parts that are not prefered. All + * prefered parts found will replace the first + * one that is saved. When we reach the end of + * the alternatives, we will use the last saved + * one as prefered. + */ + + savealternative = TRUE; #if DEBUG_PARSE - printf("DECODE set to %d\n", decode); + printf("SAVEALTERNATIVE: yes\n"); #endif - } - } + } + + } + else + boundary_id = NULL; + } + break; + } while (0); /* do .. 
while (0) */ + +#endif /* NEW_PARSER */ /* @@@ here we try to do a post parsing cleanup */ /* have to find out all the conditions to turn it off */ if (attach_force) { savealternative = FALSE; isinheader = 0; + /* a kludge while I wait to see how to better integrate this + case */ + content = CONTENT_BINARY; } if (savealternative) { - /* let's remember 'bp' and 'lp' */ - - origbp = bp; - origlp = lp; - alternativeparser = TRUE; /* restart on a new list: */ - lp = bp = NULL; + lp = bp = headp = NULL; /* clean the alternative status variables */ alternative_weight = -1; alternative_lp = alternative_bp = NULL; @@ -2564,8 +3361,58 @@ int parsemail(char *mbox, /* file name */ binfile = -1; } + if (bp || lp) { + /* if we reach this condition, it means the message is missing one or + more mime boundary ends. Closing the current active node should fix + this */ + if (current_message_node) { + current_message_node = + message_node_mimetest(current_message_node, + bp, lp, charset, charsetsave, + type, + (boundp) ? 
boundp->boundary_id : NULL, + boundary_id, + att_binname, + meta_filename, + att_link, + att_comment_filename, + attachment_rfc822, + message_node_skip_status(file_created, + content, + type)); +#if DEBUG_PARSE_MSGID_TRACE + current_message_node->msgid = strsav(msgid); +#endif + } + } + + /* THE PREFERED CHARSET ALGORITHM */ + /* as long as we don't handle UTF-8 throughout), use the prefered content charset if we got one */ + + /* see struct.c:choose_charset() for the algo heuristics 1 */ + if (root_message_node) { + prefered_charset = message_node_get_charset(root_message_node); + } else { + prefered_charset = _single_content_get_charset(charset, charsetsave); + } + + if (prefered_charset && set_replace_us_ascii_with_utf8 + && !strncasecmp(prefered_charset, "us-ascii", 8)) { + if (set_debug_level) { + fprintf(stderr, "Replacing content charset %s with UTF-8\n", + prefered_charset); + } + free(prefered_charset); + prefered_charset = strsav("UTF-8"); + } + + if (set_debug_level) { + fprintf(stderr, "Message will be stored using charset %s\n", prefered_charset); + } + +#ifdef CHARSETSP if (prefered_content_charset) { if (charset) { free(charset); @@ -2583,10 +3430,10 @@ int parsemail(char *mbox, /* file name */ charset=strsav(charsetsave); } else{ /* default charset for plain/text is US-ASCII */ - /* ISO-8859-1 is modern, however (DM) */ - charset=strsav("US-ASCII"); + /* UTF-8 is modern, however (DM) */ + charset=strsav(set_default_charset); #ifdef DEBUG_PARSE - fprintf(stderr, "found no charset for body, set ISO-8859-1.\n"); + fprintf(stderr, "found no charset for body, using default_charset %s.\n", set_default_charset); #endif } } else { @@ -2599,11 +3446,8 @@ int parsemail(char *mbox, /* file name */ } } } -#endif - -#ifdef DEBUG_PARSE - fprintf(stderr, "Message will be stored using charset %s\n", charset); -#endif +#endif /* ICONV */ +#endif /* CHARSETSP */ isinheader = 1; if (!hassubject) @@ -2620,11 +3464,12 @@ int parsemail(char *mbox, /* file name */ inreply 
= oneunre(subject); /* control the use of format and delsp according to RFC 3676 */ - if (textplain_format == FORMAT_FLOWED - && content != CONTENT_TEXT - || (content == CONTENT_TEXT && strcasecmp (type, "text/plain"))) { - /* format flowed only allowed on text/plain */ - textplain_format = FORMAT_FIXED; + if (textplain_format == FORMAT_FLOWED + && (content != CONTENT_TEXT + || (content == CONTENT_TEXT + && strcasecmp (type, "text/plain")))) { + /* format flowed only allowed on text/plain */ + textplain_format = FORMAT_FIXED; } if (textplain_format == FORMAT_FIXED && delsp_flag) { @@ -2632,29 +3477,48 @@ int parsemail(char *mbox, /* file name */ delsp_flag = FALSE; } + if (root_message_node) { + /* multipart message */ + + if (set_debug_level == DEBUG_DUMP_ATT + || set_debug_level == DEBUG_DUMP_ATT_VERBOSE) { + message_node_dump (root_message_node); + progerr("exiting"); + } + + bp = message_node_flatten (&lp, root_message_node); + /* free memory allocated to message nodes */ + message_node_free(root_message_node); + root_message_node = current_message_node = NULL; + root_alt_message_node = current_alt_message_node = NULL; + } else { + /* it was not a multipart message, remove all empty lines + at the end of the message */ + while (rmlastlines(bp)); + } + if (append_bp && append_bp != bp) { /* if we had attachments, close the structure */ - append_bp = - addbody(append_bp, &append_lp, "
  • \n", - BODY_HTMLIZED | bodyflags); - bp = append_body(bp, &lp, append_bp); + append_bp = addbody(append_bp, &append_lp, + NULL, + BODY_ATTACHMENT_LINKS | BODY_ATTACHMENT_LINKS_END); + lp = quick_append_body(lp, append_bp); append_bp = append_lp = NULL; } - else if(!bp) /* probably never used */ + else if(!bp) { /* probably never used */ bp = addbody(bp, &lp, "Hypermail was not able " "to parse this message correctly.\n", bodyflags); - - while (rmlastlines(bp)); - + } + if (set_mbox_shortened && !increment && num == startnum && max_msgnum >= set_startmsgnum) { emp = hashlookupbymsgid(msgid); if (!emp) { - snprintf(errmsg, sizeof(errmsg), - "Message with msgid '%s' not found in .hm2index", -msgid); - progerr(errmsg); + trio_snprintf(errmsg, sizeof(errmsg), + "Message with msgid '%s' not found in .hm2index", + msgid); + progerr(errmsg); } num = emp->msgnum; num_added = insert_older_msgs(num); @@ -2664,23 +3528,24 @@ msgid); if (hashnumlookup(num, &emp)) { if(strcmp(msgid, emp->msgid) && !strstr(emp->msgid, "hypermail.dummy")) { - snprintf(errmsg, sizeof(errmsg), - "msgid mismatch %s %s", msgid, emp->msgid); + trio_snprintf(errmsg, sizeof(errmsg), + "msgid mismatch %s %s", msgid, emp->msgid); progerr(errmsg); } } } - if (!emp) + if (!emp) { emp = addhash(num, date, namep, emailp, msgid, subject, - inreply, fromdate, charset, NULL, NULL, bp); - /* + inreply, fromdate, prefered_charset, NULL, NULL, bp); + } + /* * dp, if it has a value, has a date from the "From " line of - * the message after the one we are just finishing. + * the message after the one we are just finishing. * SMR 19 Oct 99: moved this *after* the addhash() call so it - * isn't erroneously associate with the previous message + * isn't erroneously associate with the previous message */ - + strcpymax(fromdate, dp ? 
dp : "", DATESTRLEN); if (emp) { @@ -2693,11 +3558,19 @@ msgid); require_filter_len + require_filter_full_len)) ++num_added; num++; - } - else if (att_dir != NULL) { - emptydir(att_dir); - rmdir(att_dir); - } + + } else { + /* addhash refused to add this message, maybe it's a duplicate id + or it failed one of its tests. + We delete the body to avoid and associated attachments to + avoid memory leaks */ + free_body(bp); + + if (att_dir != NULL) { + emptydir(att_dir); + rmdir(att_dir); + } + } for (pos = 0; pos < require_filter_len; ++pos) require_filter[pos] = FALSE; for (pos = 0; pos < require_filter_full_len; ++pos) @@ -2705,14 +3578,14 @@ msgid); if (set_txtsuffix && emp && set_increment != -1) write_txt_file(emp, &raw_text_buf); - if (hasdate) + if (hasdate) { free(date); - if (hassubject) + date = NULL; + } + if (hassubject) { free(subject); - if (inreply) { - free(inreply); - inreply = NULL; - } + subject = NULL; + } if (charset) { free(charset); charset = NULL; @@ -2720,14 +3593,18 @@ msgid); if (charsetsave){ *charsetsave = 0; } - if (prefered_content_charset) { - free(prefered_content_charset); - prefered_content_charset = NULL; - } + if (prefered_charset) { + free(prefered_charset); + prefered_charset = NULL; + } if (msgid) { free(msgid); msgid = NULL; } + if (inreply) { + free(inreply); + inreply = NULL; + } if (namep) { free(namep); namep = NULL; @@ -2737,7 +3614,7 @@ msgid); emailp = NULL; } - bp = NULL; + bp = lp = headp = NULL; bodyflags = 0; /* reset state flags */ /* reset related RFC 3676 state flags */ @@ -2748,12 +3625,17 @@ msgid); continue_previous_flow_flag = FALSE; /* go back to default mode: */ + file_created = alternative_lastfile_created = NO_FILE; content = CONTENT_TEXT; + if (decode == ENCODE_BASE64) { + base64_decoder_state_free(b64_decoder_state); + b64_decoder_state = NULL; + } decode = ENCODE_NORMAL; Mime_B = FALSE; skip_mime_epilogue = FALSE; headp = NULL; - content_type_p = NULL; + attachment_rfc822 = FALSE; multilinenoend = 
FALSE; if (att_dir) { free(att_dir); @@ -2764,13 +3646,35 @@ msgid); meta_dir = NULL; } att_counter = 0; - att_name_list = NULL; + if (att_name_list) { + hmlist_free (att_name_list); + att_name_list = NULL; + } inline_force = FALSE; - attachname[0] = '\0'; + attach_force = FALSE; + *attachname = '\0'; + if (att_binname) { + free(att_binname); + att_binname = NULL; + } + if (meta_filename) { + free(meta_filename); + meta_filename = NULL; + } + if (att_link) { + free(att_link); + att_link = NULL; + } + if (att_comment_filename) { + free(att_comment_filename); + att_comment_filename = NULL; + } + /* by default we have none! */ hassubject = 0; hasdate = 0; + message_headers_parsed = FALSE; annotation_robot = ANNOTATION_ROBOT_NONE; annotation_content = ANNOTATION_CONTENT_NONE; @@ -2778,32 +3682,30 @@ msgid); is_deleted = 0; exp_time = -1; - free_bound (boundp); + boundary_stack_free(boundp); boundp = NULL; - - free_multipart (multipartp); + boundary_id = NULL; + + multipart_stack_free(multipartp); multipartp = NULL; - free_charsets (charsetsp); - charsetsp = NULL; - alternativeparser = FALSE; /* there is none anymore */ if (parse_multipart_alternative_force_save_alts) { parse_multipart_alternative_force_save_alts = 0; - + #if DEBUG_PARSE printf("Applemail_hack resetting parse_multipart_alternative_force_save_alts\n"); #endif - if (old_set_save_alts != -1) { - set_save_alts = old_set_save_alts; - old_set_save_alts = -1; -#if DEBUG_PARSE - printf("Applemail_hack resetting save_alts to %d\n", old_set_save_alts); -#endif + if (applemail_old_set_save_alts != -1) { + local_set_save_alts = applemail_old_set_save_alts; + applemail_old_set_save_alts = -1; +#if DEBUG_PARSE + printf("Applemail_hack resetting save_alts to %d\n", local_set_save_alts); +#endif } } - + if (!(num % 10) && set_showprogress && !readone) { print_progress(num - startnum, NULL, NULL); } @@ -2818,76 +3720,175 @@ msgid); if (set_linkquotes && !inreply) { /* why only if set_linkquotes? 
pcm */ char *new_inreply = getreply(line); - if (new_inreply && !*new_inreply) free(new_inreply); - else inreply = new_inreply; + if (new_inreply && !*new_inreply) { + free(new_inreply); + } else { + inreply = new_inreply; + } } - + if (Mime_B) { if (boundp && - !strncmp(line, "--", 2) && - !strncmp(line + 2, boundp->line, - strlen(boundp->line))) { + !strncmp(line, "--", 2) + && ! _is_signature_separator(line) + && boundary_stack_has_id(boundp, line)) { /* right at this point, we have another part coming up */ #if DEBUG_PARSE printf("hit %s\n", line); #endif - if (!strncmp(line + 2 + strlen(boundp->line), "--", 2)) { - /* @@@ don't know why we had this line here. Doesn't hurt to take - it out, though */ -#if 0 - bp = addbody(bp, &lp, "\n", - BODY_HTMLIZED | bodyflags); + + if (bp) { + /* store the current attachment and prepare for + the new one */ + current_message_node = + message_node_mimetest(current_message_node, + bp, lp, charset, charsetsave, + type, + (boundp) ? boundp->boundary_id : NULL, + boundary_id, + att_binname, + meta_filename, + att_link, + att_comment_filename, + attachment_rfc822, + message_node_skip_status(file_created, + content, + type)); +#if DEBUG_PARSE_MSGID_TRACE + current_message_node->msgid = strsav(msgid); #endif - isinheader = 0; /* no header, the ending boundary - can't have any describing - headers */ + if (alternativeparser) { + current_alt_message_node = current_message_node; + } + if (att_binname) { + free(att_binname); + att_binname = NULL; + } + if (meta_filename) { + free(meta_filename); + meta_filename = NULL; + } + if (att_link) { + free(att_link); + att_link = NULL; + } + if (att_comment_filename) { + free(att_comment_filename); + att_comment_filename = NULL; + } + if (alternativeparser) { + current_message_node->alternative = TRUE; + } + + attachment_rfc822 = FALSE; + + description = NULL; + *filename = '\0'; + bp = lp = headp = NULL; + } + + /* make sure the boundaryp stack's top corresponds + to the boundary we're 
processing. This is to take + into account missing end boundaries */ + if ( ! boundary_stack_top_has_id(boundp, line) ) { + boundary_stack_pop_to_id(&boundp, line); + /* move the current_message_node pointer */ + current_message_node = message_node_get_parent_with_boundid(current_message_node, boundp); + /* restore context for this boundp here (hate that this + restore context code is duplicated) */ + if (boundp) { + parse_multipart_alternative_force_save_alts = boundp->parse_multipart_alternative_force_save_alts; + applemail_old_set_save_alts = boundp->applemail_old_set_save_alts; + local_set_save_alts = boundp->set_save_alts; + + if (boundp->alternativeparser) { + alternativeparser = boundp->alternativeparser; + alternative_weight = boundp->alternative_weight; + alternative_message_node_created = + boundp->alternative_message_node_created; + strcpy(alternative_file, boundp->alternative_file); + strcpy(alternative_lastfile, boundp->alternative_lastfile); + strcpy(last_alternative_type, boundp->last_alternative_type); + alternative_lp = boundp->alternative_lp; + alternative_bp = boundp->alternative_bp; + current_alt_message_node = boundp->current_alt_message_node; + root_alt_message_node = boundp->root_alt_message_node; + boundp->alternative_file[0] = '\0'; + boundp->alternative_lastfile[0] = '\0'; + boundp->last_alternative_type[0] = '\0'; + boundp->current_alt_message_node = NULL; + boundp->root_alt_message_node = NULL; + boundp->alternativeparser = FALSE; + boundp->alternative_message_node_created = FALSE; + } + } + } + + if (is_end_boundary(boundp->boundary_id, line)) { + isinheader = 0; /* no header, the ending boundary + can't have any describing + headers */ #if DEBUG_PARSE printf("End boundary %s\n", line); printf("alternativeparser %d\n", alternativeparser); - printf("has_more_alternatives %d\n", has_multipart(multipartp, "multipart/alternative")); + printf("has_more_alternatives %d\n", multipart_stack_has_type(multipartp, "multipart/alternative")); #endif 
- boundp = bound(boundp, NULL); - if (!boundp) { + /* this multipart/ part ends, move the message_node cursor to + its parent unless we are at root */ + if (current_message_node->parent) { + current_message_node = message_node_get_parent(current_message_node); + } + boundp = boundary_stack_pop(boundp); + /* restore the context associated with the active boundary */ + if (boundp) { + parse_multipart_alternative_force_save_alts = boundp->parse_multipart_alternative_force_save_alts; + applemail_old_set_save_alts = boundp->applemail_old_set_save_alts; + local_set_save_alts = boundp->set_save_alts; + + if (boundp->alternativeparser) { + alternativeparser = boundp->alternativeparser; + alternative_weight = boundp->alternative_weight; + alternative_message_node_created = + boundp->alternative_message_node_created; + strcpy(alternative_file, boundp->alternative_file); + strcpy(alternative_lastfile, boundp->alternative_lastfile); + strcpy(last_alternative_type, boundp->last_alternative_type); + alternative_lp = boundp->alternative_lp; + alternative_bp = boundp->alternative_bp; + current_alt_message_node = boundp->current_alt_message_node; + root_alt_message_node = boundp->root_alt_message_node; + boundp->alternative_file[0] = '\0'; + boundp->alternative_lastfile[0] = '\0'; + boundp->last_alternative_type[0] = '\0'; + boundp->current_alt_message_node = NULL; + boundp->root_alt_message_node = NULL; + boundp->alternativeparser = FALSE; + boundp->alternative_message_node_created = FALSE; + } + } +#if DELETE_ME + if (!boundp) { bodyflags &= ~BODY_ATTACHED; } +#endif /* skip the MIME epilogue until the next section (or next message!) */ skip_mime_epilogue = TRUE; - multipartp = multipart(multipartp, NULL); - - /* retrieve the parent's charset and charsetsave */ - if (charsetsp->prev != NULL) { - charsetsp = charsets(charsetsp, NULL, NULL); - } - if (charsetsp) { - if (charset) { - free(charset); - if (charsetsp) { - charset = (charsetsp->charset) ? 
strsav (charsetsp->charset) : NULL; } - } else { - charsetsave[0]='\0'; - } - strcpy (charsetsave, charsetsp->charsetsave); + multipartp = multipart_stack_pop(multipartp); + + *charsetsave='\0'; + if (charset) { + free(charset); + charset = NULL; } -#ifdef DEBUG_PARSE - fprintf(stderr, "Pulling charset %s and charsetsave %s\n", charset, charsetsave); -#endif - if (!boundp && charsetsp->prev == NULL) { -#ifdef DEBUG_PARSE - fprintf(stderr, "No more MIME parts, freeing charsetsp\n"); -#endif - free_charsets(charsetsp); - charsetsp = NULL; - } - if (alternativeparser - && !has_multipart(multipartp, "multipart/alternative")) { + && !multipart_stack_has_type(multipartp, "multipart/alternative")) { #ifdef NOTUSED struct body *next; #endif - + #if DEBUG_PARSE printf("We no longer have alternatives\n"); #endif @@ -2897,16 +3898,22 @@ msgid); /* reset the alternative variables (I think we can skip this step without problems */ alternative_weight = -1; - alternative_bp = NULL; + alternative_bp = alternative_lp = NULL; alternative_lastfile_created = NO_FILE; alternative_file[0] = alternative_lastfile[0] = '\0'; last_alternative_type[0] = '\0'; + type[0] = '\0'; + root_alt_message_node = current_alt_message_node = NULL; #if DEBUG_PARSE printf("We DUMP the chosen alternative\n"); #endif + + bp = lp = NULL; + /* if (bp != origbp) - origbp = append_body(origbp, &origlp, bp); + origbp = append_body(origbp, &origlp, bp, TRUE); + */ bp = origbp; lp = origlp; origbp = origlp = NULL; @@ -2915,12 +3922,12 @@ msgid); } #if DEBUG_PARSE if (boundp) - printf("back %s\n", boundp->line); + printf("back %s\n", boundp->boundary_id); else printf("back to NONE\n"); - + if (multipartp) - printf("current multipart: %s\n", multipartp->line); + printf("current multipart: %s\n", multipart_stack_top_type(multipartp)); else printf("current multipart: NONE\n"); #endif @@ -2928,28 +3935,38 @@ msgid); else { /* we found the beginning of a new section */ skip_mime_epilogue = FALSE; - - if 
(alternativeparser && !set_save_alts) { + + if (alternativeparser && !local_set_save_alts) { /* * parsing another alternative, so we save the - * precedent values + * precedent values */ + + /* JK: can we delete this? */ + /* alternative_bp = bp; alternative_lp = lp; + */ alternative_lastfile_created = file_created; strcpy(alternative_lastfile, alternative_file); strncpy(last_alternative_type, type, - sizeof(last_alternative_type) - 1); - + sizeof(last_alternative_type) - 2); + /* make sure it's a NULL ending string if ever type > 128 */ + last_alternative_type[sizeof(last_alternative_type) - 1] = '\0'; + /* and now reset them */ headp = bp = lp = NULL; alternative_file[0] = '\0'; + type[0] = '\0'; } else { att_counter++; - if (alternativeparser && set_save_alts == 1) { + if (alternativeparser && local_set_save_alts == 1) { + /* JK: @@@ REVIEW THIS FOR WAI CONTENT. WE DON'T WANT + TO USE
    ANYMORE .. + set_save_alts NEEDS REVIEW AFTER OUR RECENT CHANGES 202305*/ bp = addbody(bp, &lp, set_alts_text ? set_alts_text : "
    ", @@ -2965,9 +3982,14 @@ msgid); } /* go back to the MIME attachment default mode */ content = CONTENT_TEXT; + if (decode == ENCODE_BASE64) { + base64_decoder_state_free(b64_decoder_state); + b64_decoder_state = NULL; + } decode = ENCODE_NORMAL; multilinenoend = FALSE; - + *attachname = '\0'; + /* reset related RFC 3676 state flags */ textplain_format = FORMAT_FIXED; delsp_flag = FALSE; @@ -2975,27 +3997,17 @@ msgid); quotelevel = 0; continue_previous_flow_flag = FALSE; - /* restore the parent's charset/charsetsave values */ - if (charsetsp) { - if (charset) { - free(charset); - } - if (charsetsp->charset) { - charset = strsav(charsetsp->charset); - } else { - charset = NULL; - } - strcpy(charsetsave, charsetsp->charsetsave); - -#ifdef DEBUG_PARSE - printf("New section: restoring charset %s and charsetsave %s\n", charset, charsetsave); -#endif + *charsetsave = '\0'; + if(charset) { + free(charset); + charset = NULL; } + if (-1 != binfile) { close(binfile); binfile = -1; } - + continue; } } @@ -3015,7 +4027,7 @@ msgid); } break; case ENCODE_BASE64: - base64Decode(line, newbuffer, &datalen); + datalen = base64_decode_stream(b64_decoder_state, line, newbuffer); data = newbuffer; break; case ENCODE_UUENCODE: @@ -3034,11 +4046,22 @@ msgid); printf("LINE %s\n", (content != CONTENT_BINARY) ? 
data : ""); #endif if (data) { + if (content == CONTENT_TEXT && + charset && !strncasecmp (charset, "UTF-8", 5)) { + /* replace all unicode spaces with ascii spaces, + ** as hypermail is using C-lib functions that don't + ** understand them (like isspace() and sscanf() ) */ + i18n_replace_unicode_spaces(data, strlen(data)); +#if DEBUG_PARSE + printf("LINE with ascii spaces: %s\n", data); +#endif + } + if ((content == CONTENT_TEXT) || (content == CONTENT_HTML)) { if (decode > ENCODE_MULTILINED) { - /* - * This can be more than one resulting line, + /* + * This can be more than one resulting line, * as the decoded the string may look like: * "#!/bin/sh\r\n\r\nhelp() {\r\n echo 'Usage: difftree" */ @@ -3067,10 +4090,10 @@ msgid); p = n + 1; } if (strlen(p)) { - /* - * This line doesn't really end here, - * we will get another line soon that - * should get appended! + /* + * This line doesn't really end here, + * we will get another line soon that + * should get appended! */ #if DEBUG_PARSE printf("CONTINUE %s\n", p); @@ -3093,7 +4116,7 @@ msgid); /* remove both space stuffing and quotes * where applicable for f=f */ bodyflags |= BODY_DEL_SSQ; - flowed_line = rfc3676_handler (data, delsp_flag, "elevel, + flowed_line = rfc3676_handler (data, delsp_flag, "elevel, &continue_previous_flow_flag); if (continue_previous_flow_flag) { bodyflags |= BODY_CONTINUE; @@ -3126,37 +4149,32 @@ msgid); file_created = MADE_FILE; } -#ifndef REMOVED_990310 /* If there is no file created, we create and init one */ if (file_created == MAKE_FILE) { char *fname; - char *binname; char *file = NULL; - char buffer[512]; + char buffer[1024]; file_created = MADE_FILE; /* we have, or at least we tried */ /* create the attachment directory if it doesn't exist */ if (att_dir == NULL) { - /* first check the DIR_PREFIXER */ -#ifdef JOSE - trio_asprintf(&att_dir,"%s%c" DIR_PREFIXER "%s", - dir, PATH_SEPARATOR, - message_name (emp)) -#else trio_asprintf(&att_dir,"%s%c" DIR_PREFIXER "%04d", dir, 
PATH_SEPARATOR, num); -#endif + if (set_increment != -1) check1dir(att_dir); /* If this is a repeated run on the same archive we already * have HTML'ized, we risk extracting the same attachments - * several times and therefore we need to remove all the + * several times and therefore we need to remove all the * attachments currently present before we go ahead! *(Daniel -- August 6, 1999) */ - /* jk: removed it for a while, as it's not so necessary - once we can generate the same file names */ + /* jk: disabled it as it's not so necessary + as we have collision detection for attachment names + and a safer mechanism when rebuilding archives to guarantee + that the same attachment files and names are recreated + after each rebuild run */ #if DEBUG_PARSE emptydir(att_dir); #endif @@ -3177,7 +4195,7 @@ msgid); any links */ if (att_counter > 99) - binname = NULL; + att_binname = NULL; else { if (set_filename_base) create_attachname(attachname, sizeof(attachname)); @@ -3186,29 +4204,31 @@ msgid); else fname = FILE_SUFFIXER; if (!attachname[0] || inlist(att_name_list, fname)) - trio_asprintf(&binname, "%s%c%02d-%s", - att_dir, PATH_SEPARATOR, - att_counter, fname); + trio_asprintf(&att_binname, "%s%c%02d-%s", + att_dir, PATH_SEPARATOR, + att_counter, fname); else - trio_asprintf(&binname, "%s%c%s", - att_dir, PATH_SEPARATOR, - fname); + trio_asprintf(&att_binname, "%s%c%s", + att_dir, PATH_SEPARATOR, + fname); + if (att_name_list == NULL) - att_name_list = att_name_last = (struct hmlist *)malloc(sizeof(struct hmlist)); + att_name_list = att_name_last = (struct hmlist *)emalloc(sizeof(struct hmlist)); else { - att_name_last->next = (struct hmlist *)malloc(sizeof(struct hmlist)); + att_name_last->next = (struct hmlist *)emalloc(sizeof(struct hmlist)); att_name_last = att_name_last->next; } att_name_last->next = NULL; att_name_last->val = strsav(fname); - /* @@ move this one up */ + + /* JK: moved this one up */ /* att_counter++; */ } - /* - * Saving of the attachments is 
being done - * inline as they are encountered. The - * directories must exist first... + /* + * Saving of the attachments is being done + * inline as they are encountered. The + * directories must exist first... */ #ifdef O_BINARY @@ -3216,36 +4236,35 @@ msgid); #else #define OPENBITMASK O_WRONLY | O_CREAT | O_TRUNC #endif - if (binname) { - binfile = open(binname, OPENBITMASK, + if (att_binname) { + binfile = open(att_binname, OPENBITMASK, set_filemode); #if DEBUG_PARSE - printf("%4d open attachment %s\n", num, binname); + printf("%4d open attachment %s\n", num, att_binname); #endif if (-1 != binfile) { - chmod(binname, set_filemode); + chmod(att_binname, set_filemode); if (set_showprogress) print_progress(num, lang [MSG_CREATED_ATTACHMENT_FILE], - binname); + att_binname); if (set_usemeta) { /* write the mime meta info */ FILE *file_ptr; - char *meta_file; char *ptr; - ptr = strrchr(binname, PATH_SEPARATOR); + ptr = strrchr(att_binname, PATH_SEPARATOR); *ptr = '\0'; - trio_asprintf(&meta_file, "%s%c%s" + trio_asprintf(&meta_filename, "%s%c%s" META_EXTENSION, meta_dir, PATH_SEPARATOR, ptr + 1); *ptr = PATH_SEPARATOR; - file_ptr = fopen(meta_file, "w"); + file_ptr = fopen(meta_filename, "w"); if (file_ptr) { - if (type) { + if (*type) { if (charset) fprintf(file_ptr, "Content-Type: %s; charset=\"%s\"\n", @@ -3255,33 +4274,38 @@ msgid); "Content-Type: %s\n", type); } - if (annotation_robot && set_userobotmeta) { + if (annotation_robot != ANNOTATION_ROBOT_NONE + && set_userobotmeta) { /* annotate the attachments using the experimental google X-Robots-Tag HTTP header. 
See https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag */ - char *value; - if (annotation_robot == 1) - value = "nofollow"; - else if (annotation_robot == 2) - value = "noindex"; - else if (annotation_robot == 3) - value = "nofollow, noindex"; - fprintf(file_ptr,"X-Robots-Tag: %s\n", value); + char *value = NULL; + + if (annotation_robot == ANNOTATION_ROBOT_NO_FOLLOW) + value = "nofollow"; + else if (annotation_robot == ANNOTATION_ROBOT_NO_INDEX) + value = "noindex"; + else if (annotation_robot == (ANNOTATION_ROBOT_NO_FOLLOW | ANNOTATION_ROBOT_NO_INDEX)) + value = "nofollow, noindex"; + fprintf(file_ptr,"X-Robots-Tag: %s\n", value); } fclose(file_ptr); - chmod(meta_file, set_filemode); - free(meta_file); + chmod(meta_filename, set_filemode); } } if (alternativeparser) { /* save the last name, in case we need to supress it */ - strncpy(alternative_file, binname, + strncpy(alternative_file, att_binname, sizeof(alternative_file) - 1); - /* save the last mime type to help deal with the + /* make sure it's a NULL ending string if ever type > 128 */ + alternative_file[sizeof(alternative_file) - 1] = '\0'; + /* save the last mime type to help deal with the * apple mail hack */ strncpy(last_alternative_type, type, - sizeof(last_alternative_type) - 1); + sizeof(last_alternative_type) - 2); + /* make sure it's a NULL ending string if ever type > 128 */ + last_alternative_type[sizeof(last_alternative_type) - 1] = '\0'; } } @@ -3296,17 +4320,20 @@ msgid); } /* point to the filename and skip the separator */ - file = &binname[strlen(att_dir) + 1]; + file = &att_binname[strlen(att_dir) + 1]; /* protection against having a filename bigger than buffer */ if (strlen(file) <= 500) { char *desc; + bool free_desc=FALSE; char *sp; struct emailsubdir *subdir; if (description && description[0] != '\0' - && hasblack(description)) + && !strisspace(description)) { desc = convchars(description, charset); + free_desc = TRUE; + } else if (inline_force || 
inlinecontent(type)) desc = @@ -3317,9 +4344,6 @@ msgid); attachname[0] ? attachname : "stored"; - if (description) - description = NULL; - subdir = NULL; if (set_msgsperfolder || set_folder_by_date) { struct emailinfo e; @@ -3342,22 +4366,29 @@ msgid); + 1], file, num, type); trio_snprintf(buffer, sizeof(buffer), - "

    \"%s\"\n
    \n(%s %s:
    %s)

    \n", + "
  • %s %s: %s
    \n" + "\"%s\"\n" + "
  • \n", + type, + lang[MSG_ATTACHMENT], + subdir ? subdir->rel_path_to_top : "", + created_link, file, subdir ? subdir->rel_path_to_top : "", &att_dir[strlen(dir) + 1], PATH_SEPARATOR, file, - desc, type, - lang[MSG_ATTACHMENT], - subdir ? subdir->rel_path_to_top : "", - created_link, file); + desc); free(created_link); - }else { + } else { trio_snprintf(buffer, sizeof(buffer), - "\"%s\"\n", - subdir ? subdir->rel_path_to_top : "", - &att_dir[strlen(dir) + 1], - PATH_SEPARATOR, file, - desc); + "
  • %s %s:
    \n" + "\"%s\"\n" + "
  • \n", + type, + lang[MSG_ATTACHMENT], + subdir ? subdir->rel_path_to_top : "", + &att_dir[strlen(dir) + 1], + PATH_SEPARATOR, file, + desc); } } else { char *created_link = @@ -3370,46 +4401,54 @@ msgid); NULL) *sp = '\0'; trio_snprintf(buffer, sizeof(buffer), - "
      \n
    • %s %s: %s
    • \n
    \n", + "
  • %s %s: %s
  • \n", type, lang[MSG_ATTACHMENT], subdir ? subdir->rel_path_to_top : "", created_link, desc); - free(created_link); } + att_link = strsav(buffer); + att_comment_filename = strsav(file); + + /* use the correct condition to know we're not in + a multipart/ message, just in a single message + that has non-inline content */ + if (!root_message_node && !boundary_id && !boundp) { + /* Print attachment comment before attachment */ + /* add a SECTION to store all this info first */ + if (!append_bp) + append_bp = + addbody(append_bp, &append_lp, + NULL, + BODY_ATTACHMENT_LINKS | BODY_ATTACHMENT_LINKS_START | bodyflags); + append_bp = + addbody(append_bp, &append_lp, buffer, + BODY_HTMLIZED | BODY_ATTACHMENT_LINKS | bodyflags); + trio_snprintf(buffer, sizeof(buffer), + "\n", + file); + append_bp = + addbody(append_bp, &append_lp, buffer, + BODY_HTMLIZED | BODY_ATTACHMENT_LINKS | bodyflags); + } - /* Print attachment comment before attachment */ - /* add a DIV to store all this info first */ - if (!append_bp) - append_bp = - addbody(append_bp, &append_lp, "
    \n", - BODY_HTMLIZED | bodyflags); - append_bp = - addbody(append_bp, &append_lp, buffer, - BODY_HTMLIZED | bodyflags); - trio_snprintf(buffer, sizeof(buffer), - "\n", - file); - append_bp = - addbody(append_bp, &append_lp, buffer, - BODY_HTMLIZED | bodyflags); + if (free_desc) { + free(desc); + } } } inline_force = FALSE; attachname[0] = '\0'; - if (binname && (binfile != -1)) + if (att_binname && (binfile != -1)) content = CONTENT_BINARY; else content = CONTENT_UNKNOWN; - - if (binname) - free(binname); } } -#endif + if (-1 != binfile) { if (datalen < 0) datalen = strlen(data); @@ -3418,8 +4457,9 @@ msgid); } } - if (ENCODE_QP == decode) + if (ENCODE_QP == decode) { free(data); /* this was allocatd by mdecodeQP() */ + } } } } @@ -3429,35 +4469,45 @@ msgid); if (!isinheader || readone) { -#ifdef HAVE_ICONV - if (!charset){ - if (*charsetsave!=0){ - /** - if(set_showprogress){ - printf("\nput charset from subject header..\n"); - } - **/ - charset=strsav(charsetsave); - }else{ - /* default charset is US-ASCII */ - charset=strsav("US-ASCII"); - /** - if(set_showprogress){ - printf("\nfound no charset for body, set ISO-8859-1.\n"); - } - **/ - } - }else{ - /* if body is us-ascii but subject is not, - try to use subject's charset. */ - if (strncasecmp(charset,"us-ascii",8)==0){ - if (*charsetsave!=0 && strcasecmp(charsetsave,"us-ascii")!=0){ - free(charset); - charset=strsav(charsetsave); - } - } - } -#endif +#ifdef CHARSETSP + +#ifdef HAVE_ICONV + /* THE PREFERED CHARSET ALGORITHM ... 
AGAIN */ + if (root_message_node) { + /* multipart message */ + if (charset) { + free(charset); + } + prefered_charset = strsav(message_node_get_charset(root_message_node)); + + } else { + if (!charset){ + if (*charsetsave!=0){ +#ifdef DEBUG_PARSE + printf("put charset from subject header..\n"); +#endif + charset=strsav(charsetsave); + } else { + /* default charset is US-ASCII */ + charset=strsav(set_default_charset); +#ifdef DEBUG_PARSE + fprintf(stderr, "found no charset for body, using default_charset %s.\n", set_default_charset); +#endif + } + } else { + /* if body is us-ascii but subject is not, + try to use subject's charset. */ + if (strncasecmp(charset,"us-ascii",8)==0){ + if (*charsetsave!=0 && strcasecmp(charsetsave,"us-ascii")!=0){ + free(charset); + charset=strsav(charsetsave); + } + } + } + } +#endif /* HAVE_ICONV */ +#endif /* CHARSETSP */ + if (!hassubject) subject = NOSUBJECT; @@ -3468,9 +4518,9 @@ msgid); inreply = oneunre(subject); /* control the use of format and delsp according to RFC2646 */ - if (textplain_format == FORMAT_FLOWED - && content != CONTENT_TEXT - || (content == CONTENT_TEXT && strcasecmp (type, "text/plain"))) { + if ((textplain_format == FORMAT_FLOWED) + && (content != CONTENT_TEXT + || (content == CONTENT_TEXT && strcasecmp (type, "text/plain")))) { /* format flowed only allowed on text/plain */ textplain_format = FORMAT_FIXED; } @@ -3480,36 +4530,91 @@ msgid); delsp_flag = FALSE; } + if (bp || lp) { + /* if we reach this condition, it means the message is missing one or + more mime boundary ends. Closing the current active node should fix + this */ + if (current_message_node) { + current_message_node = + message_node_mimetest(current_message_node, + bp, lp, charset, charsetsave, + type, + (boundp) ? 
boundp->boundary_id : NULL, + boundary_id, + att_binname, + meta_filename, + att_link, + att_comment_filename, + attachment_rfc822, + message_node_skip_status(file_created, + content, + type)); +#if DEBUG_PARSE_MSGID_TRACE + current_message_node->msgid = strsav(msgid); +#endif + + } + } + + /* use heuristics to choose the charset for the whole parsed + * message 2 */ + if (root_message_node) { + prefered_charset = message_node_get_charset(root_message_node); + } else { + prefered_charset = _single_content_get_charset(charset, charsetsave); + } + + if (prefered_charset && set_replace_us_ascii_with_utf8 + && !strncasecmp(prefered_charset, "us-ascii", 8)) { + if (set_debug_level) { + fprintf(stderr, "Replacing content charset %s with UTF-8\n", + prefered_charset); + } + free(prefered_charset); + prefered_charset = strsav("UTF-8"); + } + + if (set_debug_level) { + fprintf(stderr, "Message will be stored using charset %s\n", prefered_charset); + } + + if (root_message_node) { + /* multipart message */ + + if (set_debug_level == DEBUG_DUMP_ATT + || set_debug_level == DEBUG_DUMP_ATT_VERBOSE) { + message_node_dump (root_message_node); + progerr("exiting"); + } + + bp = message_node_flatten (&lp, root_message_node); + /* free memory allocated to message nodes */ + message_node_free(root_message_node); + root_message_node = current_message_node = NULL; + root_alt_message_node = current_alt_message_node = NULL; + } else { + /* it was not a multipart message, remove all empty lines + at the end of the message */ + while (rmlastlines(bp)); + } + + if (append_bp && append_bp != bp) { - /* close the DIV */ - append_bp = - addbody(append_bp, &append_lp, "
    \n", - BODY_HTMLIZED | bodyflags); - bp = append_body(bp, &lp, append_bp); + append_bp = addbody(append_bp, &append_lp, + NULL, + BODY_ATTACHMENT_LINKS | BODY_ATTACHMENT_LINKS_END); + + /* + bp = append_body(bp, &lp, append_bp, TRUE); + */ + lp = quick_append_body(lp, append_bp); append_bp = append_lp = NULL; } - while (rmlastlines(bp)); - strcpymax(fromdate, dp ? dp : "", DATESTRLEN); - if (prefered_content_charset) { - if (prefered_content_charset[0] != '\0') { -#ifdef DEBUG_PARSE - fprintf(stderr, "Replacing charset %s with prefered_content_charset %s\n", - charset, prefered_content_charset); -#endif - if (charset) { - free(charset); - } - charset = prefered_content_charset; - } else { - free(prefered_content_charset); - } - prefered_content_charset = NULL; - } - + emp = addhash(num, date, namep, emailp, msgid, subject, inreply, - fromdate, charset, NULL, NULL, bp); + fromdate, prefered_charset, NULL, NULL, bp); if (emp) { emp->exp_time = exp_time; emp->is_deleted = is_deleted; @@ -3521,34 +4626,65 @@ msgid); if (set_txtsuffix && set_increment != -1) write_txt_file(emp, &raw_text_buf); num++; - } - + } else { + /* addhash refused to add this message, maybe it's a duplicate id + or it failed one of its tests. 
+ We delete the body and associated attachments to + avoid memory leaks */ + free_body(bp); + bp = NULL; + + if (att_dir != NULL) { + emptydir(att_dir); + rmdir(att_dir); + } + } + /* @@@ if we didn't add the message, we should consider erasing the attdir if it's there */ - - if (hasdate) + if (att_binname) { + free(att_binname); + att_binname = NULL; + } + if (meta_filename) { + free(meta_filename); + meta_filename = NULL; + } + if (att_link) { + free(att_link); + att_link = NULL; + } + if (att_comment_filename) { + free(att_comment_filename); + att_comment_filename = NULL; + } + if (hasdate) { free(date); - if (hassubject) + date = NULL; + } + if (hassubject) { free(subject); - if (inreply) { - free(inreply); - inreply = NULL; - } + subject = NULL; + } if (charset) { free(charset); charset = NULL; } if (charsetsave){ - *charsetsave = 0; + *charsetsave = 0; } - if (prefered_content_charset) { - free(prefered_content_charset); - prefered_content_charset = NULL; + if (prefered_charset) { + free(prefered_charset); + prefered_charset = NULL; } if (msgid) { free(msgid); msgid = NULL; } + if (inreply) { + free(inreply); + inreply = NULL; + } if (namep) { free(namep); namep = NULL; @@ -3568,9 +4704,13 @@ msgid); flowed_line = FALSE; quotelevel = 0; continue_previous_flow_flag = FALSE; - + /* go back to default mode: */ content = CONTENT_TEXT; + if (ENCODE_BASE64 == decode) { + base64_decoder_state_free(b64_decoder_state); + b64_decoder_state = NULL; + } decode = ENCODE_NORMAL; Mime_B = FALSE; skip_mime_epilogue = FALSE; @@ -3585,33 +4725,32 @@ msgid); meta_dir = NULL; } att_counter = 0; - while (att_name_list != NULL) { - struct hmlist *ptr_next_att = att_name_list->next; - free(att_name_list->val); - free(att_name_list); - att_name_list = ptr_next_att; - } - att_name_list = NULL; + if (att_name_list) { + hmlist_free (att_name_list); + att_name_list = NULL; + } description = NULL; - + *attachname = '\0'; + if (parse_multipart_alternative_force_save_alts) {
parse_multipart_alternative_force_save_alts = 0; - + #if DEBUG_PARSE printf("Applemail_hack resetting parse_multipart_alternative_force_save_alts\n"); #endif - if (old_set_save_alts != -1) { - set_save_alts = old_set_save_alts; - old_set_save_alts = -1; -#if DEBUG_PARSE - printf("Applemail_hack resetting save_alts to %d\n", old_set_save_alts); -#endif + if (applemail_old_set_save_alts != -1) { + local_set_save_alts = applemail_old_set_save_alts; + applemail_old_set_save_alts = -1; +#if DEBUG_PARSE + printf("Applemail_hack resetting save_alts to %d\n", local_set_save_alts); +#endif } } - + /* by default we have none! */ hassubject = 0; hasdate = 0; + message_headers_parsed = FALSE; annotation_robot = ANNOTATION_ROBOT_NONE; annotation_content = ANNOTATION_CONTENT_NONE; @@ -3644,6 +4783,7 @@ msgid); threadlist = NULL; printedthreadlist = NULL; crossindexthread1(datelist); + #if DEBUG_THREAD { struct reply *r; @@ -3667,13 +4807,23 @@ msgid); #endif /* can we clean up a bit please... */ - - free_bound (boundp); - free_multipart (multipartp); + + if (printedthreadlist) { + printed_free(printedthreadlist); + printedthreadlist = NULL; + } + + boundary_stack_free(boundp); + multipart_stack_free(multipartp); if(charsetsave){ free(charsetsave); } + + if (set_debug_level == DEBUG_DUMP_BODY) { + dump_mail(0, num_added); + } + return num_added; /* amount of mails read */ } @@ -3734,25 +4884,25 @@ int parse_old_html(int num, struct emailinfo *ep, int parse_body, FILE *fp; char inreply_start[256]; - static char *inreply_start_old = "
  • In reply to: In reply to: %s: %s: subdir : "", - msgnum_id_table[num], - set_htmlsuffix); + trio_asprintf(&filename, "%s%s%s.%s", set_dir, + subdir ? subdir->subdir : "", + msgnum_id_table[num], + set_htmlsuffix); else - trio_asprintf(&filename, "%s%s%.4d.%s", set_dir, - subdir ? subdir->subdir : "", num, set_htmlsuffix); + trio_asprintf(&filename, "%s%s%.4d.%s", set_dir, + subdir ? subdir->subdir : "", num, set_htmlsuffix); /* * fromdate == @@ -3779,21 +4929,41 @@ int parse_old_html(int num, struct emailinfo *ep, int parse_body, fromdate = getvalue(line); else if (!strcasecmp(command, "sent")) date = getvalue(line); - else if (!strcasecmp(command, "name")) - name = getvalue(line); - else if (!strcasecmp(command, "email")) - email = unobfuscate_email_address(getvalue(line)); + else if (!strcasecmp(command, "name")) { + valp = getvalue(line); + if (valp) { + name = unconvchars(valp); + free(valp); + } + } + else if (!strcasecmp(command, "email")) { + char *tmp = getvalue(line); + if (tmp) { + valp = unconvchars(tmp); + free (tmp); + if (valp) { + email = unobfuscate_email_address(valp); + free(valp); + } + } + } else if (!strcasecmp(command, "subject")) { valp = getvalue(line); - { + if (valp) { subject = unconvchars(valp); free(valp); } } else if (!strcasecmp(command, "id")) { - char *raw_msgid = getvalue(line); - msgid = unspamify(raw_msgid); - if (raw_msgid) free(raw_msgid); + valp = getvalue(line); + if (valp) { + char *raw_msgid = unconvchars(valp); + free(valp); + msgid = unspamify(raw_msgid); + if (raw_msgid) { + free(raw_msgid); + } + } if (msgid && !strstr(line,"-->") && set_linkquotes) msgid = NULL;/* old version of Hypermail wrote junk?
*/ } @@ -3819,8 +4989,12 @@ int parse_old_html(int num, struct emailinfo *ep, int parse_body, } else if (!strcasecmp(command, "inreplyto")) { char *raw_msgid = getvalue(line); - valp = unspamify(raw_msgid); - if (raw_msgid) free(raw_msgid); + if (raw_msgid) { + valp = unspamify(raw_msgid); + free(raw_msgid); + } else { + valp = NULL; + } if (valp) { inreply = unconvchars(valp); free(valp); @@ -3880,9 +5054,11 @@ int parse_old_html(int num, struct emailinfo *ep, int parse_body, } } } - else if (cmp_msgid) + else if (cmp_msgid) { + free(filename); return -1; - + } + if (legal) { /* only do this if the input was reliable */ struct emailinfo *emp; @@ -3906,8 +5082,15 @@ int parse_old_html(int num, struct emailinfo *ep, int parse_body, emp = addhash(num, date ? date : NODATE, name, email, msgid, subject, inreply, fromdate, charset, isodate, isofromdate, bp); - if (cmp_msgid) - msgids_are_same = !strcmp(ep->msgid, msgid); + if (cmp_msgid) { + /* at this point, special xml chars have been escaped in msgid, + but not in ep->msgid. 
We temporarily unconvert them so that we + can do the comparition */ + char *tmpmsgid = unconvchars(msgid); + + msgids_are_same = !strcmp(ep->msgid, tmpmsgid); + free(tmpmsgid); + } if (emp != NULL && replylist_tmp != NULL) { if (do_insert) { emp->exp_time = exp_time; @@ -3961,6 +5144,8 @@ int parse_old_html(int num, struct emailinfo *ep, int parse_body, free(isofromdate); } free(filename); + + free_body(bp); #if 0 if (bp != NULL) { /* revisit me */ if (bp->line) @@ -3989,7 +5174,7 @@ static int loadoldheadersfrommessages(char *dir, int num_from_gdbm) struct reply *replylist_tmp = NULL; int first_read_body = set_startmsgnum; - + if (num_from_gdbm != -1) max_num = num_from_gdbm - 1; else if (set_nonsequential) @@ -4046,29 +5231,29 @@ static int loadoldheadersfrommessages(char *dir, int num_from_gdbm) if (num_from_gdbm == -1) { if (is_empty_archive()) return 0; - snprintf(errmsg, sizeof(errmsg), - "Error: This archive does not appear to be empty, " - "and it has no gdbm file\n(%s). If you want to " - "use incremental updates with the folder_by_date\n" - "option, you must start with an empty archive or " - "with an archive\nthat was generated using the " - "usegdbm option.", GDBM_INDEX_NAME); + trio_snprintf(errmsg, sizeof(errmsg), + "Error: This archive does not appear to be empty, " + "and it has no gdbm file\n(%s). 
If you want to " + "use incremental updates with the folder_by_date\n" + "option, you must start with an empty archive or " + "with an archive\nthat was generated using the " + "usegdbm option.", GDBM_INDEX_NAME); } else - snprintf(errmsg, sizeof(errmsg), - "Error set_folder_by_date msg %d num_from_gdbm %d", - first_read_body, num_from_gdbm); + trio_snprintf(errmsg, sizeof(errmsg), + "Error set_folder_by_date msg %d num_from_gdbm %d", + first_read_body, num_from_gdbm); } else - snprintf(errmsg, sizeof(errmsg), "folder_by_date with incremental update requires usegdbm option"); + trio_snprintf(errmsg, sizeof(errmsg), "folder_by_date with incremental update requires usegdbm option"); #else - snprintf(errmsg, sizeof(errmsg), - "folder_by_date requires usegdbm option" - ". gdbm support has not been compiled into this" - " copy of hypermail. You probably need to install" - "gdbm and rerun configure."); + trio_snprintf(errmsg, sizeof(errmsg), + "folder_by_date requires usegdbm option" + ". gdbm support has not been compiled into this" + " copy of hypermail. 
You probably need to install" + "gdbm and rerun configure."); #endif - progerr(errmsg); + progerr(errmsg); } } @@ -4212,7 +5397,7 @@ int loadoldheadersfromGDBMindex(char *dir, int get_count_only) content = gdbm_fetch(gp, key); if(!(dp = content.dptr)) { if (max_num == -1) /* old file where gaps in nums not legal */ - break; /* must be at end */ + break; /* must be at end */ continue; } dp_end = dp + content.dsize; @@ -4271,7 +5456,7 @@ int loadoldheadersfromGDBMindex(char *dir, int get_count_only) free(inreply); #if 0 if(bp) { - if (bp->line) + if (bp->line) free(bp->line); free(bp); } @@ -4289,7 +5474,7 @@ int loadoldheadersfromGDBMindex(char *dir, int get_count_only) loadoldheadersfrommessages(dir, num); } /* end case of able to read gdbm index */ - else { + else { struct emailinfo *emp; if (get_count_only) @@ -4299,7 +5484,7 @@ int loadoldheadersfromGDBMindex(char *dir, int get_count_only) if (set_showprogress) printf(lang[MSG_CREATING_GDBM_INDEX]); num = loadoldheadersfrommessages(dir, -1); - + if(!(gp = gdbm_open(indexname, 0, GDBM_NEWDB, 0600, 0))){ /* Serious problem here: can't create! So, just muddle on. */ @@ -4372,7 +5557,7 @@ void fixnextheader(char *dir, int num, int direction) dp = NULL; ul = 0; - if ((e3 = neighborlookup(num, direction)) != NULL + if ((e3 = neighborlookup(num, direction)) != NULL && (email = neighborlookup(num-1, 1)) != NULL) filename = articlehtmlfilename(e3); else @@ -4389,14 +5574,14 @@ void fixnextheader(char *dir, int num, int direction) cp = bp; /* save start of list to free later */ -#ifdef HAVE_ICONV - char *numsubject,*numname; - numsubject=i18n_utf2numref(email->subject,1); - numname=i18n_utf2numref(email->name,1); -#endif fp = fopen(filename, "w+"); if (fp) { +#ifdef HAVE_ICONV + char *numsubject,*numname; + numsubject=i18n_utf2numref(email->subject,1); + numname=i18n_utf2numref(email->name,1); +#endif while (bp) { if (!strncmp(bp->line, " - -PCRE specification - - -

    Perl-compatible Regular Expressions (PCRE)

    -

    -The HTML documentation for PCRE consists of a number of pages that are listed -below in alphabetical order. If you are new to PCRE, please read the first one -first. -

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    pcre  Introductory page
    pcre-config  Information about the installation configuration
    pcre16  Discussion of the 16-bit PCRE library
    pcre32  Discussion of the 32-bit PCRE library
    pcreapi  PCRE's native API
    pcrebuild  Building PCRE
    pcrecallout  The callout facility
    pcrecompat  Compatibility with Perl
    pcrecpp  The C++ wrapper for the PCRE library
    pcredemo  A demonstration C program that uses the PCRE library
    pcregrep  The pcregrep command
    pcrejit  Discussion of the just-in-time optimization support
    pcrelimits  Details of size and other limits
    pcrematching  Discussion of the two matching algorithms
    pcrepartial  Using PCRE for partial matching
    pcrepattern  Specification of the regular expressions supported by PCRE
    pcreperform  Some comments on performance
    pcreposix  The POSIX API to the PCRE 8-bit library
    pcreprecompile  How to save and re-use compiled patterns
    pcresample  Discussion of the pcredemo program
    pcrestack  Discussion of PCRE's stack usage
    pcresyntax  Syntax quick-reference summary
    pcretest  The pcretest command for testing PCRE
    pcreunicode  Discussion of Unicode and UTF-8/UTF-16/UTF-32 support
    - -

    -There are also individual pages that summarize the interface for each function -in the library. There is a single page for each triple of 8-bit/16-bit/32-bit -functions. -

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    pcre_assign_jit_stack  Assign stack for JIT matching
    pcre_compile  Compile a regular expression
    pcre_compile2  Compile a regular expression (alternate interface)
    pcre_config  Show build-time configuration options
    pcre_copy_named_substring  Extract named substring into given buffer
    pcre_copy_substring  Extract numbered substring into given buffer
    pcre_dfa_exec  Match a compiled pattern to a subject string - (DFA algorithm; not Perl compatible)
    pcre_exec  Match a compiled pattern to a subject string - (Perl compatible)
    pcre_free_study  Free study data
    pcre_free_substring  Free extracted substring
    pcre_free_substring_list  Free list of extracted substrings
    pcre_fullinfo  Extract information about a pattern
    pcre_get_named_substring  Extract named substring into new memory
    pcre_get_stringnumber  Convert captured string name to number
    pcre_get_stringtable_entries  Find table entries for given string name
    pcre_get_substring  Extract numbered substring into new memory
    pcre_get_substring_list  Extract all substrings into new memory
    pcre_jit_exec  Fast path interface to JIT matching
    pcre_jit_stack_alloc  Create a stack for JIT matching
    pcre_jit_stack_free  Free a JIT matching stack
    pcre_maketables  Build character tables in current locale
    pcre_pattern_to_host_byte_order  Convert compiled pattern to host byte order if necessary
    pcre_refcount  Maintain reference count in compiled pattern
    pcre_study  Study a compiled pattern
    pcre_utf16_to_host_byte_order  Convert UTF-16 string to host byte order if necessary
    pcre_utf32_to_host_byte_order  Convert UTF-32 string to host byte order if necessary
    pcre_version  Return PCRE version and release date
    - - diff --git a/src/pcre/doc/html/pcre-config.html b/src/pcre/doc/html/pcre-config.html deleted file mode 100644 index 56a80604..00000000 --- a/src/pcre/doc/html/pcre-config.html +++ /dev/null @@ -1,109 +0,0 @@ - - -pcre-config specification - - -

    pcre-config man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SYNOPSIS
    -

    -pcre-config [--prefix] [--exec-prefix] [--version] [--libs] - [--libs16] [--libs32] [--libs-cpp] [--libs-posix] - [--cflags] [--cflags-posix] -

    -
    DESCRIPTION
    -

    -pcre-config returns the configuration of the installed PCRE -libraries and the options required to compile a program to use them. Some of -the options apply only to the 8-bit, or 16-bit, or 32-bit libraries, -respectively, and are -not available if only one of those libraries has been built. If an unavailable -option is encountered, the "usage" information is output. -

    -
    OPTIONS
    -

    ---prefix -Writes the directory prefix used in the PCRE installation for architecture -independent files (/usr on many systems, /usr/local on some -systems) to the standard output. -

    -

    ---exec-prefix -Writes the directory prefix used in the PCRE installation for architecture -dependent files (normally the same as --prefix) to the standard output. -

    -

    ---version -Writes the version number of the installed PCRE libraries to the standard -output. -

    -

    ---libs -Writes to the standard output the command line options required to link -with the 8-bit PCRE library (-lpcre on many systems). -

    -

    ---libs16 -Writes to the standard output the command line options required to link -with the 16-bit PCRE library (-lpcre16 on many systems). -

    -

    ---libs32 -Writes to the standard output the command line options required to link -with the 32-bit PCRE library (-lpcre32 on many systems). -

    -

    ---libs-cpp -Writes to the standard output the command line options required to link with -PCRE's C++ wrapper library (-lpcrecpp -lpcre on many -systems). -

    -

    ---libs-posix -Writes to the standard output the command line options required to link with -PCRE's POSIX API wrapper library (-lpcreposix -lpcre on many -systems). -

    -

    ---cflags -Writes to the standard output the command line options required to compile -files that use PCRE (this may include some -I options, but is blank on -many systems). -

    -

    ---cflags-posix -Writes to the standard output the command line options required to compile -files that use PCRE's POSIX API wrapper library (this may include some -I -options, but is blank on many systems). -

    -
    SEE ALSO
    -

    -pcre(3) -

    -
    AUTHOR
    -

    -This manual page was originally written by Mark Baker for the Debian GNU/Linux -system. It has been subsequently revised as a generic PCRE man page. -

    -
    REVISION
    -

    -Last updated: 24 June 2012 -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre.html b/src/pcre/doc/html/pcre.html deleted file mode 100644 index c87b1066..00000000 --- a/src/pcre/doc/html/pcre.html +++ /dev/null @@ -1,224 +0,0 @@ - - -pcre specification - - -

    pcre man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    PLEASE TAKE NOTE
    -

    -This document relates to PCRE releases that use the original API, -with library names libpcre, libpcre16, and libpcre32. January 2015 saw the -first release of a new API, known as PCRE2, with release numbers starting at -10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old -libraries (now called PCRE1) are still being maintained for bug fixes, but -there will be no new development. New projects are advised to use the new PCRE2 -libraries. -

    -
    INTRODUCTION
    -

    -The PCRE library is a set of functions that implement regular expression -pattern matching using the same syntax and semantics as Perl, with just a few -differences. Some features that appeared in Python and PCRE before they -appeared in Perl are also available using the Python syntax, there is some -support for one or two .NET and Oniguruma syntax items, and there is an option -for requesting some minor changes that give better JavaScript compatibility. -

    -

    -Starting with release 8.30, it is possible to compile two separate PCRE -libraries: the original, which supports 8-bit character strings (including -UTF-8 strings), and a second library that supports 16-bit character strings -(including UTF-16 strings). The build process allows either one or both to be -built. The majority of the work to make this possible was done by Zoltan -Herczeg. -

    -

    -Starting with release 8.32 it is possible to compile a third separate PCRE -library that supports 32-bit character strings (including UTF-32 strings). The -build process allows any combination of the 8-, 16- and 32-bit libraries. The -work to make this possible was done by Christian Persch. -

    -

    -The three libraries contain identical sets of functions, except that the names -in the 16-bit library start with pcre16_ instead of pcre_, and the -names in the 32-bit library start with pcre32_ instead of pcre_. To -avoid over-complication and reduce the documentation maintenance load, most of -the documentation describes the 8-bit library, with the differences for the -16-bit and 32-bit libraries described separately in the -pcre16 -and -pcre32 -pages. References to functions or structures of the form pcre[16|32]_xxx -should be read as meaning "pcre_xxx when using the 8-bit library, -pcre16_xxx when using the 16-bit library, or pcre32_xxx when using -the 32-bit library". -

    -

    -The current implementation of PCRE corresponds approximately with Perl 5.12, -including support for UTF-8/16/32 encoded strings and Unicode general category -properties. However, UTF-8/16/32 and Unicode support has to be explicitly -enabled; it is not the default. The Unicode tables correspond to Unicode -release 6.3.0. -

    -

    -In addition to the Perl-compatible matching function, PCRE contains an -alternative function that matches the same compiled patterns in a different -way. In certain circumstances, the alternative function has some advantages. -For a discussion of the two matching algorithms, see the -pcrematching -page. -

    -

    -PCRE is written in C and released as a C library. A number of people have -written wrappers and interfaces of various kinds. In particular, Google Inc. -have provided a comprehensive C++ wrapper for the 8-bit library. This is now -included as part of the PCRE distribution. The -pcrecpp -page has details of this interface. Other people's contributions can be found -in the Contrib directory at the primary FTP site, which is: -ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre -

    -

    -Details of exactly which Perl regular expression features are and are not -supported by PCRE are given in separate documents. See the -pcrepattern -and -pcrecompat -pages. There is a syntax summary in the -pcresyntax -page. -

    -

    -Some features of PCRE can be included, excluded, or changed when the library is -built. The -pcre_config() -function makes it possible for a client to discover which features are -available. The features themselves are described in the -pcrebuild -page. Documentation about building PCRE for various operating systems can be -found in the -README -and -NON-AUTOTOOLS_BUILD -files in the source distribution. -

    -

    -The libraries contain a number of undocumented internal functions and data -tables that are used by more than one of the exported external functions, but -which are not intended for use by external callers. Their names all begin with -"_pcre_" or "_pcre16_" or "_pcre32_", which hopefully will not provoke any name -clashes. In some environments, it is possible to control which external symbols -are exported when a shared library is built, and in these cases the -undocumented symbols are not exported. -

    -
    SECURITY CONSIDERATIONS
    -

    -If you are using PCRE in a non-UTF application that permits users to supply -arbitrary patterns for compilation, you should be aware of a feature that -allows users to turn on UTF support from within a pattern, provided that PCRE -was built with UTF support. For example, an 8-bit pattern that begins with -"(*UTF8)" or "(*UTF)" turns on UTF-8 mode, which interprets patterns and -subjects as strings of UTF-8 characters instead of individual 8-bit characters. -This causes both the pattern and any data against which it is matched to be -checked for UTF-8 validity. If the data string is very long, such a check might -use sufficiently many resources as to cause your application to lose -performance. -

    -

    -One way of guarding against this possibility is to use the -pcre_fullinfo() function to check the compiled pattern's options for UTF. -Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF option at -compile time. This causes a compile-time error if a pattern contains a -UTF-setting sequence. -

    -

    -If your application is one that supports UTF, be aware that validity checking -can take time. If the same data string is to be matched many times, you can use -the PCRE_NO_UTF[8|16|32]_CHECK option for the second and subsequent matches to -save redundant checks. -

    -

    -Another way that performance can be hit is by running a pattern that has a very -large search tree against a string that will never match. Nested unlimited -repeats in a pattern are a common example. PCRE provides some protection -against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the -pcreapi -page. -

    -
    USER DOCUMENTATION
    -

    -The user documentation for PCRE comprises a number of different sections. In -the "man" format, each of these is a separate "man page". In the HTML format, -each is a separate page, linked from the index page. In the plain text format, -the descriptions of the pcregrep and pcretest programs are in files -called pcregrep.txt and pcretest.txt, respectively. The remaining -sections, except for the pcredemo section (which is a program listing), -are concatenated in pcre.txt, for ease of searching. The sections are as -follows: -

    -  pcre              this document
    -  pcre-config       show PCRE installation configuration information
    -  pcre16            details of the 16-bit library
    -  pcre32            details of the 32-bit library
    -  pcreapi           details of PCRE's native C API
    -  pcrebuild         building PCRE
    -  pcrecallout       details of the callout feature
    -  pcrecompat        discussion of Perl compatibility
    -  pcrecpp           details of the C++ wrapper for the 8-bit library
    -  pcredemo          a demonstration C program that uses PCRE
    -  pcregrep          description of the pcregrep command (8-bit only)
    -  pcrejit           discussion of the just-in-time optimization support
    -  pcrelimits        details of size and other limits
    -  pcrematching      discussion of the two matching algorithms
    -  pcrepartial       details of the partial matching facility
    -  pcrepattern       syntax and semantics of supported regular expressions
    -  pcreperform       discussion of performance issues
    -  pcreposix         the POSIX-compatible C API for the 8-bit library
    -  pcreprecompile    details of saving and re-using precompiled patterns
    -  pcresample        discussion of the pcredemo program
    -  pcrestack         discussion of stack usage
    -  pcresyntax        quick syntax reference
    -  pcretest          description of the pcretest testing command
    -  pcreunicode       discussion of Unicode and UTF-8/16/32 support
    -
    -In the "man" and HTML formats, there is also a short page for each C library -function, listing its arguments and results. -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -

    -Putting an actual email address here seems to have been a spam magnet, so I've -taken it away. If you want to email me, use my two initials, followed by the -two digits 10, at the domain cam.ac.uk. -

    -
    REVISION
    -

    -Last updated: 10 February 2015 -
    -Copyright © 1997-2015 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre16.html b/src/pcre/doc/html/pcre16.html deleted file mode 100644 index f00859f0..00000000 --- a/src/pcre/doc/html/pcre16.html +++ /dev/null @@ -1,384 +0,0 @@ - - -pcre16 specification - - -

    pcre16 man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -

    -#include <pcre.h> -

    -
    PCRE 16-BIT API BASIC FUNCTIONS
    -

    -pcre16 *pcre16_compile(PCRE_SPTR16 pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre16 *pcre16_compile2(PCRE_SPTR16 pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre16_extra *pcre16_study(const pcre16 *code, int options, - const char **errptr); -
    -
    -void pcre16_free_study(pcre16_extra *extra); -
    -
    -int pcre16_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -
    -
    -int pcre16_dfa_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -

    -
    PCRE 16-BIT API STRING EXTRACTION FUNCTIONS
    -

    -int pcre16_copy_named_substring(const pcre16 *code, - PCRE_SPTR16 subject, int *ovector, - int stringcount, PCRE_SPTR16 stringname, - PCRE_UCHAR16 *buffer, int buffersize); -
    -
    -int pcre16_copy_substring(PCRE_SPTR16 subject, int *ovector, - int stringcount, int stringnumber, PCRE_UCHAR16 *buffer, - int buffersize); -
    -
    -int pcre16_get_named_substring(const pcre16 *code, - PCRE_SPTR16 subject, int *ovector, - int stringcount, PCRE_SPTR16 stringname, - PCRE_SPTR16 *stringptr); -
    -
    -int pcre16_get_stringnumber(const pcre16 *code, - PCRE_SPTR16 name); -
    -
    -int pcre16_get_stringtable_entries(const pcre16 *code, - PCRE_SPTR16 name, PCRE_UCHAR16 **first, PCRE_UCHAR16 **last); -
    -
    -int pcre16_get_substring(PCRE_SPTR16 subject, int *ovector, - int stringcount, int stringnumber, - PCRE_SPTR16 *stringptr); -
    -
    -int pcre16_get_substring_list(PCRE_SPTR16 subject, - int *ovector, int stringcount, PCRE_SPTR16 **listptr); -
    -
    -void pcre16_free_substring(PCRE_SPTR16 stringptr); -
    -
    -void pcre16_free_substring_list(PCRE_SPTR16 *stringptr); -

    -
    PCRE 16-BIT API AUXILIARY FUNCTIONS
    -

    -pcre16_jit_stack *pcre16_jit_stack_alloc(int startsize, int maxsize); -
    -
    -void pcre16_jit_stack_free(pcre16_jit_stack *stack); -
    -
    -void pcre16_assign_jit_stack(pcre16_extra *extra, - pcre16_jit_callback callback, void *data); -
    -
    -const unsigned char *pcre16_maketables(void); -
    -
    -int pcre16_fullinfo(const pcre16 *code, const pcre16_extra *extra, - int what, void *where); -
    -
    -int pcre16_refcount(pcre16 *code, int adjust); -
    -
    -int pcre16_config(int what, void *where); -
    -
    -const char *pcre16_version(void); -
    -
    -int pcre16_pattern_to_host_byte_order(pcre16 *code, - pcre16_extra *extra, const unsigned char *tables); -

    -
    PCRE 16-BIT API INDIRECTED FUNCTIONS
    -

    -void *(*pcre16_malloc)(size_t); -
    -
    -void (*pcre16_free)(void *); -
    -
    -void *(*pcre16_stack_malloc)(size_t); -
    -
    -void (*pcre16_stack_free)(void *); -
    -
    -int (*pcre16_callout)(pcre16_callout_block *); -

    -
    PCRE 16-BIT API 16-BIT-ONLY FUNCTION
    -

    -int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *output, - PCRE_SPTR16 input, int length, int *byte_order, - int keep_boms); -

    -
    THE PCRE 16-BIT LIBRARY
    -

    -Starting with release 8.30, it is possible to compile a PCRE library that -supports 16-bit character strings, including UTF-16 strings, as well as or -instead of the original 8-bit library. The majority of the work to make this -possible was done by Zoltan Herczeg. The two libraries contain identical sets -of functions, used in exactly the same way. Only the names of the functions and -the data types of their arguments and results are different. To avoid -over-complication and reduce the documentation maintenance load, most of the -PCRE documentation describes the 8-bit library, with only occasional references -to the 16-bit library. This page describes what is different when you use the -16-bit library. -

    -

    -WARNING: A single application can be linked with both libraries, but you must -take care when processing any particular pattern to use functions from just one -library. For example, if you want to study a pattern that was compiled with -pcre16_compile(), you must do so with pcre16_study(), not -pcre_study(), and you must free the study data with -pcre16_free_study(). -

    -
    THE HEADER FILE
    -

    -There is only one header file, pcre.h. It contains prototypes for all the -functions in all libraries, as well as definitions of flags, structures, error -codes, etc. -

    -
    THE LIBRARY NAME
    -

    -In Unix-like systems, the 16-bit library is called libpcre16, and can -normally be accessed by adding -lpcre16 to the command for linking an -application that uses PCRE. -

    -
    STRING TYPES
    -

    -In the 8-bit library, strings are passed to PCRE library functions as vectors -of bytes with the C type "char *". In the 16-bit library, strings are passed as -vectors of unsigned 16-bit quantities. The macro PCRE_UCHAR16 specifies an -appropriate data type, and PCRE_SPTR16 is defined as "const PCRE_UCHAR16 *". In -very many environments, "short int" is a 16-bit data type. When PCRE is built, -it defines PCRE_UCHAR16 as "unsigned short int", but checks that it really is a -16-bit data type. If it is not, the build fails with an error message telling -the maintainer to modify the definition appropriately. -

    -
    STRUCTURE TYPES
    -

    -The types of the opaque structures that are used for compiled 16-bit patterns -and JIT stacks are pcre16 and pcre16_jit_stack respectively. The -type of the user-accessible structure that is returned by pcre16_study() -is pcre16_extra, and the type of the structure that is used for passing -data to a callout function is pcre16_callout_block. These structures -contain the same fields, with the same names, as their 8-bit counterparts. The -only difference is that pointers to character strings are 16-bit instead of -8-bit types. -

    -
    16-BIT FUNCTIONS
    -

    -For every function in the 8-bit library there is a corresponding function in -the 16-bit library with a name that starts with pcre16_ instead of -pcre_. The prototypes are listed above. In addition, there is one extra -function, pcre16_utf16_to_host_byte_order(). This is a utility function -that converts a UTF-16 character string to host byte order if necessary. The -other 16-bit functions expect the strings they are passed to be in host byte -order. -

    -

    -The input and output arguments of -pcre16_utf16_to_host_byte_order() may point to the same address, that is, -conversion in place is supported. The output buffer must be at least as long as -the input. -

    -

    -The length argument specifies the number of 16-bit data units in the -input string; a negative value specifies a zero-terminated string. -

    -

    -If byte_order is NULL, it is assumed that the string starts off in host -byte order. This may be changed by byte-order marks (BOMs) anywhere in the -string (commonly as the first character). -

    -

    -If byte_order is not NULL, a non-zero value of the integer to which it -points means that the input starts off in host byte order, otherwise the -opposite order is assumed. Again, BOMs in the string can change this. The final -byte order is passed back at the end of processing. -

    -

    -If keep_boms is not zero, byte-order mark characters (0xfeff) are copied -into the output string. Otherwise they are discarded. -

    -

    -The result of the function is the number of 16-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -
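
The byte-order handling described above can be sketched in plain C. This is only an illustration of the documented behaviour, not PCRE's own code: the function name `utf16_to_host_order_sketch` is hypothetical, and the sketch assumes that a BOM read as 0xfffe signals the opposite of host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the behaviour described above; this is not
   PCRE's implementation. Returns the number of 16-bit units written. */
int utf16_to_host_order_sketch(uint16_t *output, const uint16_t *input,
                               int length, int *byte_order, int keep_boms)
{
    /* A NULL byte_order means "assume the input starts in host order". */
    int host_order = (byte_order != NULL) ? (*byte_order != 0) : 1;
    uint16_t *out = output;
    int i;

    assert(output != NULL && input != NULL);

    if (length < 0) {
        /* Zero-terminated input: count the units, terminator included. */
        length = 0;
        while (input[length] != 0)
            length++;
        length++;
    }

    for (i = 0; i < length; i++) {
        uint16_t c = input[i];
        if (c == 0xfeff || c == 0xfffe) {
            /* 0xfeff reads correctly in host order; 0xfffe is a swapped BOM. */
            host_order = (c == 0xfeff);
            if (keep_boms)
                *out++ = 0xfeff;
        } else {
            *out++ = host_order ? c : (uint16_t)((c >> 8) | (c << 8));
        }
    }

    /* Pass the final byte order back, as the text describes. */
    if (byte_order != NULL)
        *byte_order = host_order;
    return (int)(out - output);
}
```

Because the write position never moves ahead of the read position, the sketch also works when `output` and `input` point to the same buffer, which matches the in-place conversion mentioned above.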

    -
    SUBJECT STRING OFFSETS
    -

    -The lengths and starting offsets of subject strings must be specified in 16-bit -data units, and the offsets within subject strings that are returned by the -matching functions are also in 16-bit units rather than bytes. -

    -
    NAMED SUBPATTERNS
    -

    -The name-to-number translation table that is maintained for named subpatterns -uses 16-bit characters. The pcre16_get_stringtable_entries() function -returns the length of each entry in the table as the number of 16-bit data -units. -

    -
    OPTION NAMES
    -

    -There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK, -which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In -fact, these new options define the same bits in the options word. There is a -discussion about the -validity of UTF-16 strings -in the -pcreunicode -page. -

    -

    -For the pcre16_config() function there is an option PCRE_CONFIG_UTF16 -that returns 1 if UTF-16 support is configured, otherwise 0. If this option is -given to pcre_config() or pcre32_config(), or if the -PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 option is given to pcre16_config(), -the result is the PCRE_ERROR_BADOPTION error. -

    -
    CHARACTER CODES
    -

    -In 16-bit mode, when PCRE_UTF16 is not set, character values are treated in the -same way as in 8-bit, non UTF-8 mode, except, of course, that they can range -from 0 to 0xffff instead of 0 to 0xff. Character types for characters less than -0xff can therefore be influenced by the locale in the same way as before. -Characters greater than 0xff have only one case, and no "type" (such as letter -or digit). -

    -

    -In UTF-16 mode, the character code is Unicode, in the range 0 to 0x10ffff, with -the exception of values in the range 0xd800 to 0xdfff because those are -"surrogate" values that are used in pairs to encode values greater than 0xffff. -
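
As a minimal sketch of how a surrogate pair encodes a value greater than 0xffff, the following C helper applies the standard UTF-16 arithmetic. The function names are hypothetical and are not part of the PCRE API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not PCRE functions. */
int is_high_surrogate(uint16_t u) { return u >= 0xd800 && u <= 0xdbff; }
int is_low_surrogate(uint16_t u)  { return u >= 0xdc00 && u <= 0xdfff; }

/* Combine a high/low surrogate pair into one code point in the
   range 0x10000 to 0x10ffff. Each surrogate carries 10 bits. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u + (((uint32_t)(high - 0xd800) << 10)
                       | (uint32_t)(low - 0xdc00));
}
```

For instance, the pair 0xd83d 0xde00 combines to the code point 0x1f600.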

    -

    -A UTF-16 string can indicate its endianness by a special code known as a -byte-order mark (BOM). The PCRE functions do not handle this, expecting strings -to be in host byte order. A utility function called -pcre16_utf16_to_host_byte_order() is provided to help with this (see -above). -

    -
    ERROR NAMES
    -

    -The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 correspond to -their 8-bit counterparts. The error PCRE_ERROR_BADMODE is given when a compiled -pattern is passed to a function that processes patterns in the other -mode, for example, if a pattern compiled with pcre_compile() is passed to -pcre16_exec(). -

    -

    -There are new error codes whose names begin with PCRE_UTF16_ERR for invalid -UTF-16 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that -are described in the section entitled -"Reason codes for invalid UTF-8 strings" -in the main -pcreapi -page. The UTF-16 errors are: -

    -  PCRE_UTF16_ERR1  Missing low surrogate at end of string
    -  PCRE_UTF16_ERR2  Invalid low surrogate follows high surrogate
    -  PCRE_UTF16_ERR3  Isolated low surrogate
    -  PCRE_UTF16_ERR4  Non-character
    -
    -
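The first three conditions in the list above can be illustrated with a small standalone checker. This is a sketch under stated assumptions, not PCRE's own validity check: the function `check_utf16` and the enum names are hypothetical, the numeric values merely mirror the numbering in the list, and the non-character check is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reason codes, numbered like the first three list entries. */
enum {
    U16_OK = 0,
    U16_MISSING_LOW = 1,   /* high surrogate at end of string */
    U16_INVALID_LOW = 2,   /* high surrogate not followed by a low one */
    U16_ISOLATED_LOW = 3   /* low surrogate with no preceding high one */
};

static int is_high(uint16_t u) { return u >= 0xd800 && u <= 0xdbff; }
static int is_low(uint16_t u)  { return u >= 0xdc00 && u <= 0xdfff; }

/* Scan len 16-bit units; on failure store the offending offset
   (in 16-bit units) in *erroffset and return a reason code. */
int check_utf16(const uint16_t *s, size_t len, size_t *erroffset)
{
    for (size_t i = 0; i < len; i++) {
        if (is_high(s[i])) {
            if (i + 1 >= len)    { *erroffset = i; return U16_MISSING_LOW; }
            if (!is_low(s[i + 1])) { *erroffset = i; return U16_INVALID_LOW; }
            i++;                 /* skip the low half of a valid pair */
        } else if (is_low(s[i])) {
            *erroffset = i;
            return U16_ISOLATED_LOW;
        }
    }
    return U16_OK;
}
```

Reporting the offset in 16-bit units rather than bytes is a design choice made here to stay consistent with how the 16-bit library counts subject offsets.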

    -
    ERROR TEXTS
    -

    -If there is an error while compiling a pattern, the error text that is passed -back by pcre16_compile() or pcre16_compile2() is still an 8-bit -character string, zero-terminated. -

    -
    CALLOUTS
    -

    -The subject and mark fields in the callout block that is passed to -a callout function point to 16-bit vectors. -

    -
    TESTING
    -

    -The pcretest program continues to operate with 8-bit input and output -files, but it can be used for testing the 16-bit library. If it is run with the -command line option -16, patterns and subject strings are converted from -8-bit to 16-bit before being passed to PCRE, and the 16-bit library functions -are used instead of the 8-bit ones. Returned 16-bit strings are converted to -8-bit for output. If neither the 8-bit nor the 32-bit library was compiled, -pcretest defaults to 16-bit and the -16 option is ignored. -

    -

    -When PCRE is being built, the RunTest script that is called by "make -check" uses the pcretest -C option to discover which of the 8-bit, -16-bit and 32-bit libraries has been built, and runs the tests appropriately. -

    -
    NOT SUPPORTED IN 16-BIT MODE
    -

    -Not all the features of the 8-bit library are available with the 16-bit -library. The C++ and POSIX wrapper functions support only the 8-bit library, -and the pcregrep program is at present 8-bit only. -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 12 May 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre32.html b/src/pcre/doc/html/pcre32.html deleted file mode 100644 index f96876e7..00000000 --- a/src/pcre/doc/html/pcre32.html +++ /dev/null @@ -1,382 +0,0 @@ - - -pcre32 specification - - -

    pcre32 man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -

    -#include <pcre.h> -

    -
    PCRE 32-BIT API BASIC FUNCTIONS
    -

    -pcre32 *pcre32_compile(PCRE_SPTR32 pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre32 *pcre32_compile2(PCRE_SPTR32 pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre32_extra *pcre32_study(const pcre32 *code, int options, - const char **errptr); -
    -
    -void pcre32_free_study(pcre32_extra *extra); -
    -
    -int pcre32_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -
    -
    -int pcre32_dfa_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -

    -
    PCRE 32-BIT API STRING EXTRACTION FUNCTIONS
    -

    -int pcre32_copy_named_substring(const pcre32 *code, - PCRE_SPTR32 subject, int *ovector, - int stringcount, PCRE_SPTR32 stringname, - PCRE_UCHAR32 *buffer, int buffersize); -
    -
    -int pcre32_copy_substring(PCRE_SPTR32 subject, int *ovector, - int stringcount, int stringnumber, PCRE_UCHAR32 *buffer, - int buffersize); -
    -
    -int pcre32_get_named_substring(const pcre32 *code, - PCRE_SPTR32 subject, int *ovector, - int stringcount, PCRE_SPTR32 stringname, - PCRE_SPTR32 *stringptr); -
    -
    -int pcre32_get_stringnumber(const pcre32 *code, - PCRE_SPTR32 name); -
    -
    -int pcre32_get_stringtable_entries(const pcre32 *code, - PCRE_SPTR32 name, PCRE_UCHAR32 **first, PCRE_UCHAR32 **last); -
    -
    -int pcre32_get_substring(PCRE_SPTR32 subject, int *ovector, - int stringcount, int stringnumber, - PCRE_SPTR32 *stringptr); -
    -
    -int pcre32_get_substring_list(PCRE_SPTR32 subject, - int *ovector, int stringcount, PCRE_SPTR32 **listptr); -
    -
    -void pcre32_free_substring(PCRE_SPTR32 stringptr); -
    -
    -void pcre32_free_substring_list(PCRE_SPTR32 *stringptr); -

    -
    PCRE 32-BIT API AUXILIARY FUNCTIONS
    -

    -pcre32_jit_stack *pcre32_jit_stack_alloc(int startsize, int maxsize); -
    -
    -void pcre32_jit_stack_free(pcre32_jit_stack *stack); -
    -
    -void pcre32_assign_jit_stack(pcre32_extra *extra, - pcre32_jit_callback callback, void *data); -
    -
    -const unsigned char *pcre32_maketables(void); -
    -
    -int pcre32_fullinfo(const pcre32 *code, const pcre32_extra *extra, - int what, void *where); -
    -
    -int pcre32_refcount(pcre32 *code, int adjust); -
    -
    -int pcre32_config(int what, void *where); -
    -
    -const char *pcre32_version(void); -
    -
    -int pcre32_pattern_to_host_byte_order(pcre32 *code, - pcre32_extra *extra, const unsigned char *tables); -

    -
    PCRE 32-BIT API INDIRECTED FUNCTIONS
    -

    -void *(*pcre32_malloc)(size_t); -
    -
    -void (*pcre32_free)(void *); -
    -
    -void *(*pcre32_stack_malloc)(size_t); -
    -
    -void (*pcre32_stack_free)(void *); -
    -
    -int (*pcre32_callout)(pcre32_callout_block *); -

    -
    PCRE 32-BIT API 32-BIT-ONLY FUNCTION
    -

    -int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *output, - PCRE_SPTR32 input, int length, int *byte_order, - int keep_boms); -

    -
    THE PCRE 32-BIT LIBRARY
    -

    -Starting with release 8.32, it is possible to compile a PCRE library that -supports 32-bit character strings, including UTF-32 strings, as well as or -instead of the original 8-bit library. This work was done by Christian Persch, -based on the work done by Zoltan Herczeg for the 16-bit library. All three -libraries contain identical sets of functions, used in exactly the same way. -Only the names of the functions and the data types of their arguments and -results are different. To avoid over-complication and reduce the documentation -maintenance load, most of the PCRE documentation describes the 8-bit library, -with only occasional references to the 16-bit and 32-bit libraries. This page -describes what is different when you use the 32-bit library. -

    -

    -WARNING: A single application can be linked with all or any of the three -libraries, but you must take care when processing any particular pattern -to use functions from just one library. For example, if you want to study -a pattern that was compiled with pcre32_compile(), you must do so -with pcre32_study(), not pcre_study(), and you must free the -study data with pcre32_free_study(). -

    -
    THE HEADER FILE
    -

    -There is only one header file, pcre.h. It contains prototypes for all the -functions in all libraries, as well as definitions of flags, structures, error -codes, etc. -

    -
    THE LIBRARY NAME
    -

    -In Unix-like systems, the 32-bit library is called libpcre32, and can -normally be accessed by adding -lpcre32 to the command for linking an -application that uses PCRE. -

    -
    STRING TYPES
    -

    -In the 8-bit library, strings are passed to PCRE library functions as vectors -of bytes with the C type "char *". In the 32-bit library, strings are passed as -vectors of unsigned 32-bit quantities. The macro PCRE_UCHAR32 specifies an -appropriate data type, and PCRE_SPTR32 is defined as "const PCRE_UCHAR32 *". In -very many environments, "unsigned int" is a 32-bit data type. When PCRE is -built, it defines PCRE_UCHAR32 as "unsigned int", but checks that it really is -a 32-bit data type. If it is not, the build fails with an error message telling -the maintainer to modify the definition appropriately. -
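The kind of width check described above can be sketched with a C11 static assertion. This is an illustration only; how PCRE's build actually performs the check is not shown in this page, and `uchar32_sketch` is a hypothetical stand-in for PCRE_UCHAR32.

```c
#include <limits.h>

/* Hypothetical stand-in for PCRE_UCHAR32. */
typedef unsigned int uchar32_sketch;

/* Stops the build with a message if the chosen type is not exactly
   32 bits wide, so the maintainer knows to adjust the typedef. */
_Static_assert(sizeof(uchar32_sketch) * CHAR_BIT == 32,
               "uchar32_sketch must be a 32-bit type; adjust the typedef");
```

On a platform where "unsigned int" is not 32 bits, compilation fails at the assertion instead of producing a library that silently mishandles strings.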

    -
    STRUCTURE TYPES
    -

    -The types of the opaque structures that are used for compiled 32-bit patterns -and JIT stacks are pcre32 and pcre32_jit_stack respectively. The -type of the user-accessible structure that is returned by pcre32_study() -is pcre32_extra, and the type of the structure that is used for passing -data to a callout function is pcre32_callout_block. These structures -contain the same fields, with the same names, as their 8-bit counterparts. The -only difference is that pointers to character strings are 32-bit instead of -8-bit types. -

    -
    32-BIT FUNCTIONS
    -

    -For every function in the 8-bit library there is a corresponding function in -the 32-bit library with a name that starts with pcre32_ instead of -pcre_. The prototypes are listed above. In addition, there is one extra -function, pcre32_utf32_to_host_byte_order(). This is a utility function -that converts a UTF-32 character string to host byte order if necessary. The -other 32-bit functions expect the strings they are passed to be in host byte -order. -

    -

    -The input and output arguments of -pcre32_utf32_to_host_byte_order() may point to the same address, that is, -conversion in place is supported. The output buffer must be at least as long as -the input. -

    -

    -The length argument specifies the number of 32-bit data units in the -input string; a negative value specifies a zero-terminated string. -

    -

    -If byte_order is NULL, it is assumed that the string starts off in host -byte order. This may be changed by byte-order marks (BOMs) anywhere in the -string (commonly as the first character). -

    -

    -If byte_order is not NULL, a non-zero value of the integer to which it -points means that the input starts off in host byte order, otherwise the -opposite order is assumed. Again, BOMs in the string can change this. The final -byte order is passed back at the end of processing. -

    -

    -If keep_boms is not zero, byte-order mark characters (0xfeff) are copied -into the output string. Otherwise they are discarded. -

    -

    -The result of the function is the number of 32-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -
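The behaviour described in the preceding paragraphs can be sketched as a standalone function. This is a hedged re-implementation written from the description above, not PCRE's source; the name `utf32_to_host_sketch` and the helper `swap32` are hypothetical, and the sketch assumes that a byte-swapped BOM appears as 0xfffe0000 when read in host order.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit unit. */
static uint32_t swap32(uint32_t c)
{
    return ((c & 0x000000ffu) << 24) | ((c & 0x0000ff00u) << 8) |
           ((c & 0x00ff0000u) >> 8)  | ((c & 0xff000000u) >> 24);
}

/* Copy input to output in host byte order; returns the number of
   32-bit units written. A negative length means zero-terminated. */
int utf32_to_host_sketch(uint32_t *output, const uint32_t *input,
                         int length, int *byte_order, int keep_boms)
{
    /* NULL byte_order: assume the string starts in host order. */
    int host_order = (byte_order != NULL) ? (*byte_order != 0) : 1;
    int written = 0;

    if (length < 0) {                 /* count units, plus the terminator */
        length = 0;
        while (input[length] != 0) length++;
        length++;
    }

    for (int i = 0; i < length; i++) {
        uint32_t c = input[i];
        if (c == 0x0000feffu || c == 0xfffe0000u) {
            /* A BOM anywhere in the string resets the byte order. */
            host_order = (c == 0x0000feffu);
            if (keep_boms) output[written++] = 0x0000feffu;
        } else {
            output[written++] = host_order ? c : swap32(c);
        }
    }

    if (byte_order != NULL) *byte_order = host_order;
    return written;
}
```

Because the write index never runs ahead of the read index, passing the same pointer for input and output works, which corresponds to the in-place conversion mentioned above.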

    -
    SUBJECT STRING OFFSETS
    -

    -The lengths and starting offsets of subject strings must be specified in 32-bit -data units, and the offsets within subject strings that are returned by the -matching functions are also in 32-bit units rather than bytes. -

    -
    NAMED SUBPATTERNS
    -

    -The name-to-number translation table that is maintained for named subpatterns -uses 32-bit characters. The pcre32_get_stringtable_entries() function -returns the length of each entry in the table as the number of 32-bit data -units. -

    -
    OPTION NAMES
    -

    -There are two new general option names, PCRE_UTF32 and PCRE_NO_UTF32_CHECK, -which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In -fact, these new options define the same bits in the options word. There is a -discussion about the -validity of UTF-32 strings -in the -pcreunicode -page. -

    -

    -For the pcre32_config() function there is an option PCRE_CONFIG_UTF32 -that returns 1 if UTF-32 support is configured, otherwise 0. If this option is -given to pcre_config() or pcre16_config(), or if the -PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 option is given to pcre32_config(), -the result is the PCRE_ERROR_BADOPTION error. -

    -
    CHARACTER CODES
    -

    -In 32-bit mode, when PCRE_UTF32 is not set, character values are treated in the -same way as in 8-bit, non UTF-8 mode, except, of course, that they can range -from 0 to 0x7fffffff instead of 0 to 0xff. Character types for characters less -than 0xff can therefore be influenced by the locale in the same way as before. -Characters greater than 0xff have only one case, and no "type" (such as letter -or digit). -

    -

    -In UTF-32 mode, the character code is Unicode, in the range 0 to 0x10ffff, with -the exception of values in the range 0xd800 to 0xdfff because those are -"surrogate" values that are ill-formed in UTF-32. -

    -

    -A UTF-32 string can indicate its endianness by a special code known as a -byte-order mark (BOM). The PCRE functions do not handle this, expecting strings -to be in host byte order. A utility function called -pcre32_utf32_to_host_byte_order() is provided to help with this (see -above). -

    -
    ERROR NAMES
    -

    -The error PCRE_ERROR_BADUTF32 corresponds to its 8-bit counterpart. -The error PCRE_ERROR_BADMODE is given when a compiled -pattern is passed to a function that processes patterns in the other -mode, for example, if a pattern compiled with pcre_compile() is passed to -pcre32_exec(). -

    -

    -There are new error codes whose names begin with PCRE_UTF32_ERR for invalid -UTF-32 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that -are described in the section entitled -"Reason codes for invalid UTF-8 strings" -in the main -pcreapi -page. The UTF-32 errors are: -

    -  PCRE_UTF32_ERR1  Surrogate character (range from 0xd800 to 0xdfff)
    -  PCRE_UTF32_ERR2  Non-character
    -  PCRE_UTF32_ERR3  Character > 0x10ffff
    -
    -
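Two of the conditions in the list above (surrogate values and values beyond 0x10ffff) can be illustrated with a short standalone checker. This is a sketch, not PCRE's validity check: `check_utf32` and the enum names are hypothetical, the numeric values only mirror the numbering in the list, and the non-character check is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reason codes, numbered like the list entries they mirror. */
enum {
    U32_OK = 0,
    U32_SURROGATE = 1,   /* value in the range 0xd800 to 0xdfff */
    U32_TOO_LARGE = 3    /* value greater than 0x10ffff */
};

/* Scan len 32-bit units; on failure store the offending offset
   (in 32-bit units) in *erroffset and return a reason code. */
int check_utf32(const uint32_t *s, size_t len, size_t *erroffset)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0xd800u && s[i] <= 0xdfffu) {
            *erroffset = i;
            return U32_SURROGATE;
        }
        if (s[i] > 0x10ffffu) {
            *erroffset = i;
            return U32_TOO_LARGE;
        }
    }
    return U32_OK;
}
```

Unlike UTF-16, no pairing logic is needed here, because every character occupies exactly one 32-bit unit.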

    -
    ERROR TEXTS
    -

    -If there is an error while compiling a pattern, the error text that is passed -back by pcre32_compile() or pcre32_compile2() is still an 8-bit -character string, zero-terminated. -

    -
    CALLOUTS
    -

    -The subject and mark fields in the callout block that is passed to -a callout function point to 32-bit vectors. -

    -
    TESTING
    -

    -The pcretest program continues to operate with 8-bit input and output -files, but it can be used for testing the 32-bit library. If it is run with the -command line option -32, patterns and subject strings are converted from -8-bit to 32-bit before being passed to PCRE, and the 32-bit library functions -are used instead of the 8-bit ones. Returned 32-bit strings are converted to -8-bit for output. If neither the 8-bit nor the 16-bit library was compiled, -pcretest defaults to 32-bit and the -32 option is ignored. -

    -

    -When PCRE is being built, the RunTest script that is called by "make -check" uses the pcretest -C option to discover which of the 8-bit, -16-bit and 32-bit libraries has been built, and runs the tests appropriately. -

    -
    NOT SUPPORTED IN 32-BIT MODE
    -

    -Not all the features of the 8-bit library are available with the 32-bit -library. The C++ and POSIX wrapper functions support only the 8-bit library, -and the pcregrep program is at present 8-bit only. -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 12 May 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_assign_jit_stack.html b/src/pcre/doc/html/pcre_assign_jit_stack.html deleted file mode 100644 index b2eef704..00000000 --- a/src/pcre/doc/html/pcre_assign_jit_stack.html +++ /dev/null @@ -1,76 +0,0 @@ - - -pcre_assign_jit_stack specification - - -

    pcre_assign_jit_stack man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -void pcre_assign_jit_stack(pcre_extra *extra, - pcre_jit_callback callback, void *data); -
    -
    -void pcre16_assign_jit_stack(pcre16_extra *extra, - pcre16_jit_callback callback, void *data); -
    -
    -void pcre32_assign_jit_stack(pcre32_extra *extra, - pcre32_jit_callback callback, void *data); -

    -
    -DESCRIPTION -
    -

    -This function provides control over the memory used as a stack at run-time by a -call to pcre[16|32]_exec() with a pattern that has been successfully -compiled with JIT optimization. The arguments are: -

    -  extra     the data pointer returned by pcre[16|32]_study()
    -  callback  a callback function
    -  data      a JIT stack or a value to be passed to the callback
    -              function
    -
    -

    -

    -If callback is NULL and data is NULL, an internal 32K block on -the machine stack is used. -

    -

    -If callback is NULL and data is not NULL, data must -be a valid JIT stack, the result of calling pcre[16|32]_jit_stack_alloc(). -

    -

    -If callback is not NULL, it is called with data as an argument at -the start of matching, in order to set up a JIT stack. If the result is NULL, -the internal 32K stack is used; otherwise the return value must be a valid JIT -stack, the result of calling pcre[16|32]_jit_stack_alloc(). -

    -

    -You may safely assign the same JIT stack to multiple patterns, as long as they -are all matched in the same thread. In a multithread application, each thread -must use its own JIT stack. For more details, see the -pcrejit -page. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_compile.html b/src/pcre/doc/html/pcre_compile.html deleted file mode 100644 index 95b4bec6..00000000 --- a/src/pcre/doc/html/pcre_compile.html +++ /dev/null @@ -1,111 +0,0 @@ - - -pcre_compile specification - - -

    pcre_compile man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -pcre *pcre_compile(const char *pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre16 *pcre16_compile(PCRE_SPTR16 pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre32 *pcre32_compile(PCRE_SPTR32 pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -

    -
    -DESCRIPTION -
    -

    -This function compiles a regular expression into an internal form. It is the -same as pcre[16|32]_compile2(), except for the absence of the -errorcodeptr argument. Its arguments are: -

    -  pattern       A zero-terminated string containing the
    -                  regular expression to be compiled
    -  options       Zero or more option bits
    -  errptr        Where to put an error message
    -  erroffset     Offset in pattern where error was found
    -  tableptr      Pointer to character tables, or NULL to
    -                  use the built-in default
    -
    -The option bits are: -
    -  PCRE_ANCHORED           Force pattern anchoring
    -  PCRE_AUTO_CALLOUT       Compile automatic callouts
    -  PCRE_BSR_ANYCRLF        \R matches only CR, LF, or CRLF
    -  PCRE_BSR_UNICODE        \R matches all Unicode line endings
    -  PCRE_CASELESS           Do caseless matching
    -  PCRE_DOLLAR_ENDONLY     $ not to match newline at end
    -  PCRE_DOTALL             . matches anything including NL
    -  PCRE_DUPNAMES           Allow duplicate names for subpatterns
    -  PCRE_EXTENDED           Ignore white space and # comments
    -  PCRE_EXTRA              PCRE extra features
    -                            (not much use currently)
    -  PCRE_FIRSTLINE          Force matching to be before newline
    -  PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
    -  PCRE_MULTILINE          ^ and $ match newlines within data
    -  PCRE_NEVER_UTF          Lock out UTF, e.g. via (*UTF)
    -  PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
    -  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline
    -                            sequences
    -  PCRE_NEWLINE_CR         Set CR as the newline sequence
    -  PCRE_NEWLINE_CRLF       Set CRLF as the newline sequence
    -  PCRE_NEWLINE_LF         Set LF as the newline sequence
    -  PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
    -                            theses (named ones available)
    -  PCRE_NO_AUTO_POSSESS    Disable auto-possessification
    -  PCRE_NO_START_OPTIMIZE  Disable match-time start optimizations
    -  PCRE_NO_UTF16_CHECK     Do not check the pattern for UTF-16
    -                            validity (only relevant if
    -                            PCRE_UTF16 is set)
    -  PCRE_NO_UTF32_CHECK     Do not check the pattern for UTF-32
    -                            validity (only relevant if
    -                            PCRE_UTF32 is set)
    -  PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
    -                            validity (only relevant if
    -                            PCRE_UTF8 is set)
    -  PCRE_UCP                Use Unicode properties for \d, \w, etc.
    -  PCRE_UNGREEDY           Invert greediness of quantifiers
    -  PCRE_UTF16              Run pcre16_compile() in UTF-16 mode
    -  PCRE_UTF32              Run pcre32_compile() in UTF-32 mode
    -  PCRE_UTF8               Run pcre_compile() in UTF-8 mode
    -
    -PCRE must be built with UTF support in order to use PCRE_UTF8/16/32 and -PCRE_NO_UTF8/16/32_CHECK, and with UCP support if PCRE_UCP is used. -

    -

    -The yield of the function is a pointer to a private data structure that -contains the compiled pattern, or NULL if an error was detected. Note that -compiling regular expressions with one version of PCRE for use with a different -version is not guaranteed to work and may cause crashes. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_compile2.html b/src/pcre/doc/html/pcre_compile2.html deleted file mode 100644 index 9cd56a23..00000000 --- a/src/pcre/doc/html/pcre_compile2.html +++ /dev/null @@ -1,115 +0,0 @@ - - -pcre_compile2 specification - - -

    pcre_compile2 man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -pcre *pcre_compile2(const char *pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre16 *pcre16_compile2(PCRE_SPTR16 pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre32 *pcre32_compile2(PCRE_SPTR32 pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -

    -
    -DESCRIPTION -
    -

    -This function compiles a regular expression into an internal form. It is the -same as pcre[16|32]_compile(), except for the addition of the -errorcodeptr argument. The arguments are: -

    -  pattern       A zero-terminated string containing the
    -                  regular expression to be compiled
    -  options       Zero or more option bits
    -  errorcodeptr  Where to put an error code
    -  errptr        Where to put an error message
    -  erroffset     Offset in pattern where error was found
    -  tableptr      Pointer to character tables, or NULL to
    -                  use the built-in default
    -
    -The option bits are: -
    -  PCRE_ANCHORED           Force pattern anchoring
    -  PCRE_AUTO_CALLOUT       Compile automatic callouts
    -  PCRE_BSR_ANYCRLF        \R matches only CR, LF, or CRLF
    -  PCRE_BSR_UNICODE        \R matches all Unicode line endings
    -  PCRE_CASELESS           Do caseless matching
    -  PCRE_DOLLAR_ENDONLY     $ not to match newline at end
    -  PCRE_DOTALL             . matches anything including NL
    -  PCRE_DUPNAMES           Allow duplicate names for subpatterns
    -  PCRE_EXTENDED           Ignore white space and # comments
    -  PCRE_EXTRA              PCRE extra features
    -                            (not much use currently)
    -  PCRE_FIRSTLINE          Force matching to be before newline
    -  PCRE_JAVASCRIPT_COMPAT  JavaScript compatibility
    -  PCRE_MULTILINE          ^ and $ match newlines within data
    -  PCRE_NEVER_UTF          Lock out UTF, e.g. via (*UTF)
    -  PCRE_NEWLINE_ANY        Recognize any Unicode newline sequence
    -  PCRE_NEWLINE_ANYCRLF    Recognize CR, LF, and CRLF as newline
    -                            sequences
    -  PCRE_NEWLINE_CR         Set CR as the newline sequence
    -  PCRE_NEWLINE_CRLF       Set CRLF as the newline sequence
    -  PCRE_NEWLINE_LF         Set LF as the newline sequence
    -  PCRE_NO_AUTO_CAPTURE    Disable numbered capturing paren-
    -                            theses (named ones available)
    -  PCRE_NO_AUTO_POSSESS    Disable auto-possessification
    -  PCRE_NO_START_OPTIMIZE  Disable match-time start optimizations
    -  PCRE_NO_UTF16_CHECK     Do not check the pattern for UTF-16
    -                            validity (only relevant if
    -                            PCRE_UTF16 is set)
    -  PCRE_NO_UTF32_CHECK     Do not check the pattern for UTF-32
    -                            validity (only relevant if
    -                            PCRE_UTF32 is set)
    -  PCRE_NO_UTF8_CHECK      Do not check the pattern for UTF-8
    -                            validity (only relevant if
    -                            PCRE_UTF8 is set)
    -  PCRE_UCP                Use Unicode properties for \d, \w, etc.
    -  PCRE_UNGREEDY           Invert greediness of quantifiers
    -  PCRE_UTF16              Run pcre16_compile() in UTF-16 mode
    -  PCRE_UTF32              Run pcre32_compile() in UTF-32 mode
    -  PCRE_UTF8               Run pcre_compile() in UTF-8 mode
    -
    -PCRE must be built with UTF support in order to use PCRE_UTF8/16/32 and -PCRE_NO_UTF8/16/32_CHECK, and with UCP support if PCRE_UCP is used. -

    -

    -The yield of the function is a pointer to a private data structure that -contains the compiled pattern, or NULL if an error was detected. Note that -compiling regular expressions with one version of PCRE for use with a different -version is not guaranteed to work and may cause crashes. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_config.html b/src/pcre/doc/html/pcre_config.html deleted file mode 100644 index 72fb9caa..00000000 --- a/src/pcre/doc/html/pcre_config.html +++ /dev/null @@ -1,94 +0,0 @@ - - -pcre_config specification - - -

    pcre_config man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_config(int what, void *where); -

    -

    -int pcre16_config(int what, void *where); -

    -

    -int pcre32_config(int what, void *where); -

    -
    -DESCRIPTION -
    -

    -This function makes it possible for a client program to find out which optional -features are available in the version of the PCRE library it is using. The -arguments are as follows: -

    -  what     A code specifying what information is required
    -  where    Points to where to put the data
    -
    -The where argument must point to an integer variable, except for -PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and -PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer, -and for PCRE_CONFIG_JITTARGET, when it must point to a const char*. -The available codes are: -
    -  PCRE_CONFIG_JIT           Availability of just-in-time compiler
    -                              support (1=yes 0=no)
    -  PCRE_CONFIG_JITTARGET     String containing information about the
    -                              target architecture for the JIT compiler,
    -                              or NULL if there is no JIT support
    -  PCRE_CONFIG_LINK_SIZE     Internal link size: 2, 3, or 4
    -  PCRE_CONFIG_PARENS_LIMIT  Parentheses nesting limit
    -  PCRE_CONFIG_MATCH_LIMIT   Internal resource limit
    -  PCRE_CONFIG_MATCH_LIMIT_RECURSION
    -                            Internal recursion depth limit
    -  PCRE_CONFIG_NEWLINE       Value of the default newline sequence:
    -                                13 (0x000d)    for CR
    -                                10 (0x000a)    for LF
    -                              3338 (0x0d0a)    for CRLF
    -                                -2             for ANYCRLF
    -                                -1             for ANY
    -  PCRE_CONFIG_BSR           Indicates what \R matches by default:
    -                                 0             all Unicode line endings
    -                                 1             CR, LF, or CRLF only
    -  PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
    -                            Threshold of return slots, above which
    -                              malloc() is used by the POSIX API
    -  PCRE_CONFIG_STACKRECURSE  Recursion implementation (1=stack 0=heap)
    -  PCRE_CONFIG_UTF16         Availability of UTF-16 support (1=yes
    -                               0=no); option for pcre16_config()
    -  PCRE_CONFIG_UTF32         Availability of UTF-32 support (1=yes
    -                               0=no); option for pcre32_config()
    -  PCRE_CONFIG_UTF8          Availability of UTF-8 support (1=yes 0=no);
    -                              option for pcre_config()
    -  PCRE_CONFIG_UNICODE_PROPERTIES
    -                            Availability of Unicode property support
    -                              (1=yes 0=no)
    -
    -The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise. That error -is also given if PCRE_CONFIG_UTF16 or PCRE_CONFIG_UTF32 is passed to -pcre_config(), if PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 is passed to -pcre16_config(), or if PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 is passed to -pcre32_config(). -
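The PCRE_CONFIG_NEWLINE values in the table above follow a simple pattern: a two-character sequence is reported as the first character code shifted left by 8 bits, combined with the second. A minimal sketch of that arithmetic, using hypothetical helper names that are not part of PCRE:

```c
/* Pack a one- or two-character newline sequence into a single value.
   Pass second = 0 for a single-character sequence. */
int pack_newline(int first, int second)
{
    return second ? (first << 8) | second : first;
}

/* Map a value from the table above to a short label. */
const char *describe_newline(int value)
{
    switch (value) {
    case 13:             return "CR";
    case 10:             return "LF";
    case (13 << 8) | 10: return "CRLF";   /* 0x0d0a, decimal 3338 */
    case -2:             return "ANYCRLF";
    case -1:             return "ANY";
    default:             return "unknown";
    }
}
```

Packing CR (13) and LF (10) this way gives 0x0d0a, which is the 3338 shown for CRLF in the table.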

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_copy_named_substring.html b/src/pcre/doc/html/pcre_copy_named_substring.html deleted file mode 100644 index 77b48043..00000000 --- a/src/pcre/doc/html/pcre_copy_named_substring.html +++ /dev/null @@ -1,65 +0,0 @@ - - -pcre_copy_named_substring specification - - -

    pcre_copy_named_substring man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_copy_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - char *buffer, int buffersize); -
    -
    -int pcre16_copy_named_substring(const pcre16 *code, - PCRE_SPTR16 subject, int *ovector, - int stringcount, PCRE_SPTR16 stringname, - PCRE_UCHAR16 *buffer, int buffersize); -
    -
    -int pcre32_copy_named_substring(const pcre32 *code, - PCRE_SPTR32 subject, int *ovector, - int stringcount, PCRE_SPTR32 stringname, - PCRE_UCHAR32 *buffer, int buffersize); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for extracting a captured substring, identified -by name, into a given buffer. The arguments are: -

    -  code          Pattern that was successfully matched
    -  subject       Subject that has been successfully matched
    -  ovector       Offset vector that pcre[16|32]_exec() used
    -  stringcount   Value returned by pcre[16|32]_exec()
    -  stringname    Name of the required substring
    -  buffer        Buffer to receive the string
    -  buffersize    Size of buffer
    -
    -The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was -too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_copy_substring.html b/src/pcre/doc/html/pcre_copy_substring.html deleted file mode 100644 index ecaebe85..00000000 --- a/src/pcre/doc/html/pcre_copy_substring.html +++ /dev/null @@ -1,61 +0,0 @@ - - -pcre_copy_substring specification - - -

    pcre_copy_substring man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_copy_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, char *buffer, - int buffersize); -
    -
    -int pcre16_copy_substring(PCRE_SPTR16 subject, int *ovector, - int stringcount, int stringnumber, PCRE_UCHAR16 *buffer, - int buffersize); -
    -
    -int pcre32_copy_substring(PCRE_SPTR32 subject, int *ovector, - int stringcount, int stringnumber, PCRE_UCHAR32 *buffer, - int buffersize); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for extracting a captured substring into a given -buffer. The arguments are: -

    -  subject       Subject that has been successfully matched
    -  ovector       Offset vector that pcre[16|32]_exec() used
    -  stringcount   Value returned by pcre[16|32]_exec()
    -  stringnumber  Number of the required substring
    -  buffer        Buffer to receive the string
    -  buffersize    Size of buffer
    -
    -The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was -too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid. -
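The behaviour described above can be sketched without the library, as a rough illustration for the 8-bit case only. The function name `copy_substring_sketch` and the `SKETCH_ERROR_*` constants are hypothetical stand-ins, not part of PCRE; the real error codes are defined in pcre.h. The sketch assumes that captured substring n occupies the offset pair `ovector[2n]`, `ovector[2n+1]`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING;
   real code should use the constants from pcre.h. */
enum { SKETCH_ERROR_NOMEMORY = -6, SKETCH_ERROR_NOSUBSTRING = -7 };

/* Copy captured substring `stringnumber` into `buffer`, using the offset
   pairs that a successful match left in `ovector`. Returns the substring
   length, or one of the error stand-ins above. */
static int copy_substring_sketch(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    assert(subject != NULL && ovector != NULL && buffer != NULL);

    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    int start = ovector[2 * stringnumber];
    int end = ovector[2 * stringnumber + 1];

    /* A group that did not take part in the match has negative offsets;
       treat it as an empty string. */
    if (start < 0 || end < start) {
        start = 0;
        end = 0;
    }

    int length = end - start;
    if (buffersize < length + 1)      /* room for the terminating zero */
        return SKETCH_ERROR_NOMEMORY;

    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

The buffer needs one element more than the substring length, because the copy is zero-terminated. That is why a buffer exactly as long as the substring is reported as too small.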

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_dfa_exec.html b/src/pcre/doc/html/pcre_dfa_exec.html deleted file mode 100644 index 5fff6a7e..00000000 --- a/src/pcre/doc/html/pcre_dfa_exec.html +++ /dev/null @@ -1,129 +0,0 @@ - - -pcre_dfa_exec specification - - -

    pcre_dfa_exec man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -
    -
    -int pcre16_dfa_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -
    -
    -int pcre32_dfa_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -

    -
    -DESCRIPTION -
    -

    -This function matches a compiled regular expression against a given subject -string, using an alternative matching algorithm that scans the subject string -just once (not Perl-compatible). Note that the main, Perl-compatible, -matching function is pcre[16|32]_exec(). The arguments for this function -are: -

    -  code         Points to the compiled pattern
    -  extra        Points to an associated pcre[16|32]_extra structure,
    -                 or is NULL
    -  subject      Points to the subject string
    -  length       Length of the subject string
    -  startoffset  Offset in the subject at which to start matching
    -  options      Option bits
    -  ovector      Points to a vector of ints for result offsets
    -  ovecsize     Number of elements in the vector
    -  workspace    Points to a vector of ints used as working space
    -  wscount      Number of elements in the vector
    -
    -The units for length and startoffset are bytes for -pcre_dfa_exec(), 16-bit data items for pcre16_dfa_exec(), and 32-bit items -for pcre32_dfa_exec(). The options are: -
    -  PCRE_ANCHORED          Match only at the first position
    -  PCRE_BSR_ANYCRLF       \R matches only CR, LF, or CRLF
    -  PCRE_BSR_UNICODE       \R matches all Unicode line endings
    -  PCRE_NEWLINE_ANY       Recognize any Unicode newline sequence
    -  PCRE_NEWLINE_ANYCRLF   Recognize CR, LF, & CRLF as newline sequences
    -  PCRE_NEWLINE_CR        Recognize CR as the only newline sequence
    -  PCRE_NEWLINE_CRLF      Recognize CRLF as the only newline sequence
    -  PCRE_NEWLINE_LF        Recognize LF as the only newline sequence
    -  PCRE_NOTBOL            Subject is not the beginning of a line
    -  PCRE_NOTEOL            Subject is not the end of a line
    -  PCRE_NOTEMPTY          An empty string is not a valid match
    -  PCRE_NOTEMPTY_ATSTART  An empty string at the start of the subject
    -                           is not a valid match
    -  PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
    -  PCRE_NO_UTF16_CHECK    Do not check the subject for UTF-16
    -                           validity (only relevant if PCRE_UTF16
    -                           was set at compile time)
    -  PCRE_NO_UTF32_CHECK    Do not check the subject for UTF-32
    -                           validity (only relevant if PCRE_UTF32
    -                           was set at compile time)
    -  PCRE_NO_UTF8_CHECK     Do not check the subject for UTF-8
    -                           validity (only relevant if PCRE_UTF8
    -                           was set at compile time)
    -  PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
    -  PCRE_PARTIAL_SOFT      )   match if no full matches are found
    -  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
    -                           even if there is a full match as well
    -  PCRE_DFA_SHORTEST      Return only the shortest match
    -  PCRE_DFA_RESTART       Restart after a partial match
    -
    -There are restrictions on what may appear in a pattern when using this matching -function. Details are given in the -pcrematching -documentation. For details of partial matching, see the -pcrepartial -page. -

    -

    -A pcre[16|32]_extra structure contains the following fields: -

    -  flags            Bits indicating which fields are set
    -  study_data       Opaque data from pcre[16|32]_study()
    -  match_limit      Limit on internal resource use
    -  match_limit_recursion  Limit on internal recursion depth
    -  callout_data     Opaque data passed back to callouts
    -  tables           Points to character tables or is NULL
    -  mark             For passing back a *MARK pointer
    -  executable_jit   Opaque data from JIT compilation
    -
    -The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT, -PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, -PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. For this -matching function, the match_limit and match_limit_recursion fields -are not used, and must not be set. The PCRE_EXTRA_EXECUTABLE_JIT flag and -the corresponding variable are ignored. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_exec.html b/src/pcre/doc/html/pcre_exec.html deleted file mode 100644 index 18e1a13f..00000000 --- a/src/pcre/doc/html/pcre_exec.html +++ /dev/null @@ -1,111 +0,0 @@ - - -pcre_exec specification - - -

    pcre_exec man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -
    -
    -int pcre16_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -
    -
    -int pcre32_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -

    -
    -DESCRIPTION -
    -

    -This function matches a compiled regular expression against a given subject -string, using a matching algorithm that is similar to Perl's. It returns -offsets to captured substrings. Its arguments are: -

    -  code         Points to the compiled pattern
    -  extra        Points to an associated pcre[16|32]_extra structure,
    -                 or is NULL
    -  subject      Points to the subject string
    -  length       Length of the subject string
    -  startoffset  Offset in the subject at which to start matching
    -  options      Option bits
    -  ovector      Points to a vector of ints for result offsets
    -  ovecsize     Number of elements in the vector (a multiple of 3)
    -
    -The units for length and startoffset are bytes for -pcre_exec(), 16-bit data items for pcre16_exec(), and 32-bit items -for pcre32_exec(). The options are: -
    -  PCRE_ANCHORED          Match only at the first position
    -  PCRE_BSR_ANYCRLF       \R matches only CR, LF, or CRLF
    -  PCRE_BSR_UNICODE       \R matches all Unicode line endings
    -  PCRE_NEWLINE_ANY       Recognize any Unicode newline sequence
    -  PCRE_NEWLINE_ANYCRLF   Recognize CR, LF, & CRLF as newline sequences
    -  PCRE_NEWLINE_CR        Recognize CR as the only newline sequence
    -  PCRE_NEWLINE_CRLF      Recognize CRLF as the only newline sequence
    -  PCRE_NEWLINE_LF        Recognize LF as the only newline sequence
    -  PCRE_NOTBOL            Subject string is not the beginning of a line
    -  PCRE_NOTEOL            Subject string is not the end of a line
    -  PCRE_NOTEMPTY          An empty string is not a valid match
    -  PCRE_NOTEMPTY_ATSTART  An empty string at the start of the subject
    -                           is not a valid match
    -  PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations
    -  PCRE_NO_UTF16_CHECK    Do not check the subject for UTF-16
    -                           validity (only relevant if PCRE_UTF16
    -                           was set at compile time)
    -  PCRE_NO_UTF32_CHECK    Do not check the subject for UTF-32
    -                           validity (only relevant if PCRE_UTF32
    -                           was set at compile time)
    -  PCRE_NO_UTF8_CHECK     Do not check the subject for UTF-8
    -                           validity (only relevant if PCRE_UTF8
    -                           was set at compile time)
    -  PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
    -  PCRE_PARTIAL_SOFT      )   match if no full matches are found
    -  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
    -                           if that is found before a full match
    -
    -For details of partial matching, see the -pcrepartial -page. A pcre[16|32]_extra structure contains the following fields: -
    -  flags            Bits indicating which fields are set
    -  study_data       Opaque data from pcre[16|32]_study()
    -  match_limit      Limit on internal resource use
    -  match_limit_recursion  Limit on internal recursion depth
    -  callout_data     Opaque data passed back to callouts
    -  tables           Points to character tables or is NULL
    -  mark             For passing back a *MARK pointer
    -  executable_jit   Opaque data from JIT compilation
    -
    -The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT, -PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, -PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. -
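The argument table above notes that ovecsize should be a multiple of 3. According to the pcreapi page, the first two-thirds of the vector return offset pairs and the last third is workspace, so a pattern with n capturing groups needs (n + 1) * 3 ints to return every group. The helpers below are a small sketch of that arithmetic; the names `ovector_size_for` and `ovector_pair_capacity` are hypothetical and not part of PCRE.

```c
#include <assert.h>

/* Number of ints to allocate for an ovector that can return the whole
   match (pair 0) plus `capture_count` capturing groups. */
static int ovector_size_for(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Number of offset pairs an ovector of `ovecsize` ints can return.
   A size that is not a multiple of 3 is effectively rounded down. */
static int ovector_pair_capacity(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

In practice the capture count would typically come from pcre_fullinfo() with PCRE_INFO_CAPTURECOUNT, so that the vector can be sized once per compiled pattern.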

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_free_study.html b/src/pcre/doc/html/pcre_free_study.html deleted file mode 100644 index 7f9e10e8..00000000 --- a/src/pcre/doc/html/pcre_free_study.html +++ /dev/null @@ -1,46 +0,0 @@ - - -pcre_free_study specification - - -

    pcre_free_study man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -void pcre_free_study(pcre_extra *extra); -

    -

    -void pcre16_free_study(pcre16_extra *extra); -

    -

    -void pcre32_free_study(pcre32_extra *extra); -

    -
    -DESCRIPTION -
    -

    -This function is used to free the memory used for the data generated by a call -to pcre[16|32]_study() when it is no longer needed. The argument must be the -result of such a call. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_free_substring.html b/src/pcre/doc/html/pcre_free_substring.html deleted file mode 100644 index 1fe66107..00000000 --- a/src/pcre/doc/html/pcre_free_substring.html +++ /dev/null @@ -1,46 +0,0 @@ - - -pcre_free_substring specification - - -

    pcre_free_substring man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -void pcre_free_substring(const char *stringptr); -

    -

    -void pcre16_free_substring(PCRE_SPTR16 stringptr); -

    -

    -void pcre32_free_substring(PCRE_SPTR32 stringptr); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for freeing the store obtained by a previous -call to pcre[16|32]_get_substring() or pcre[16|32]_get_named_substring(). -Its only argument is a pointer to the string. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_free_substring_list.html b/src/pcre/doc/html/pcre_free_substring_list.html deleted file mode 100644 index c0861780..00000000 --- a/src/pcre/doc/html/pcre_free_substring_list.html +++ /dev/null @@ -1,46 +0,0 @@ - - -pcre_free_substring_list specification - - -

    pcre_free_substring_list man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -void pcre_free_substring_list(const char **stringptr); -

    -

    -void pcre16_free_substring_list(PCRE_SPTR16 *stringptr); -

    -

    -void pcre32_free_substring_list(PCRE_SPTR32 *stringptr); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for freeing the store obtained by a previous -call to pcre[16|32]_get_substring_list(). Its only argument is a pointer to -the list of string pointers. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_fullinfo.html b/src/pcre/doc/html/pcre_fullinfo.html deleted file mode 100644 index 2b7c72b3..00000000 --- a/src/pcre/doc/html/pcre_fullinfo.html +++ /dev/null @@ -1,118 +0,0 @@ - - -pcre_fullinfo specification - - -

    pcre_fullinfo man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_fullinfo(const pcre *code, const pcre_extra *extra, - int what, void *where); -
    -
    -int pcre16_fullinfo(const pcre16 *code, const pcre16_extra *extra, - int what, void *where); -
    -
    -int pcre32_fullinfo(const pcre32 *code, const pcre32_extra *extra, - int what, void *where); -

    -
    -DESCRIPTION -
    -

    -This function returns information about a compiled pattern. Its arguments are: -

    -  code                      Compiled regular expression
    -  extra                     Result of pcre[16|32]_study() or NULL
    -  what                      What information is required
    -  where                     Where to put the information
    -
    -The following information is available: -
    -  PCRE_INFO_BACKREFMAX      Number of highest back reference
    -  PCRE_INFO_CAPTURECOUNT    Number of capturing subpatterns
    -  PCRE_INFO_DEFAULT_TABLES  Pointer to default tables
    -  PCRE_INFO_FIRSTBYTE       Fixed first data unit for a match, or
    -                              -1 for start of string
    -                                 or after newline, or
    -                              -2 otherwise
    -  PCRE_INFO_FIRSTTABLE      Table of first data units (after studying)
    -  PCRE_INFO_HASCRORLF       Return 1 if explicit CR or LF matches exist
    -  PCRE_INFO_JCHANGED        Return 1 if (?J) or (?-J) was used
    -  PCRE_INFO_JIT             Return 1 after successful JIT compilation
    -  PCRE_INFO_JITSIZE         Size of JIT compiled code
    -  PCRE_INFO_LASTLITERAL     Literal last data unit required
    -  PCRE_INFO_MINLENGTH       Lower bound length of matching strings
    -  PCRE_INFO_MATCHEMPTY      Return 1 if the pattern can match an empty string,
    -                               0 otherwise
    -  PCRE_INFO_MATCHLIMIT      Match limit if set, otherwise PCRE_ERROR_UNSET
    -  PCRE_INFO_MAXLOOKBEHIND   Length (in characters) of the longest lookbehind assertion
    -  PCRE_INFO_NAMECOUNT       Number of named subpatterns
    -  PCRE_INFO_NAMEENTRYSIZE   Size of name table entry
    -  PCRE_INFO_NAMETABLE       Pointer to name table
    -  PCRE_INFO_OKPARTIAL       Return 1 if partial matching can be tried
    -                              (always returns 1 after release 8.00)
    -  PCRE_INFO_OPTIONS         Option bits used for compilation
    -  PCRE_INFO_SIZE            Size of compiled pattern
    -  PCRE_INFO_STUDYSIZE       Size of study data
    -  PCRE_INFO_FIRSTCHARACTER      Fixed first data unit for a match
    -  PCRE_INFO_FIRSTCHARACTERFLAGS Returns
    -                                  1 if there is a first data character set, which can
    -                                    then be retrieved using PCRE_INFO_FIRSTCHARACTER,
    -                                  2 if the first character is at the start of the data
    -                                    string or after a newline, and
    -                                  0 otherwise
    -  PCRE_INFO_RECURSIONLIMIT    Recursion limit if set, otherwise PCRE_ERROR_UNSET
    -  PCRE_INFO_REQUIREDCHAR      Literal last data unit required
    -  PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then
    -                              be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise
    -
    -The where argument must point to an integer variable, except for the -following what values: -
    -  PCRE_INFO_DEFAULT_TABLES  const uint8_t *
    -  PCRE_INFO_FIRSTCHARACTER  uint32_t
    -  PCRE_INFO_FIRSTTABLE      const uint8_t *
    -  PCRE_INFO_JITSIZE         size_t
    -  PCRE_INFO_MATCHLIMIT      uint32_t
    -  PCRE_INFO_NAMETABLE       PCRE_SPTR16           (16-bit library)
    -  PCRE_INFO_NAMETABLE       PCRE_SPTR32           (32-bit library)
    -  PCRE_INFO_NAMETABLE       const unsigned char * (8-bit library)
    -  PCRE_INFO_OPTIONS         unsigned long int
    -  PCRE_INFO_SIZE            size_t
    -  PCRE_INFO_STUDYSIZE       size_t
    -  PCRE_INFO_RECURSIONLIMIT  uint32_t
    -  PCRE_INFO_REQUIREDCHAR    uint32_t
    -
    -The yield of the function is zero on success or: -
    -  PCRE_ERROR_NULL           the argument code was NULL
    -                            the argument where was NULL
    -  PCRE_ERROR_BADMAGIC       the "magic number" was not found
    -  PCRE_ERROR_BADOPTION      the value of what was invalid
    -  PCRE_ERROR_UNSET          the option was not set
    -
    -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_get_named_substring.html b/src/pcre/doc/html/pcre_get_named_substring.html deleted file mode 100644 index 72924d9b..00000000 --- a/src/pcre/doc/html/pcre_get_named_substring.html +++ /dev/null @@ -1,68 +0,0 @@ - - -pcre_get_named_substring specification - - -

    pcre_get_named_substring man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_get_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - const char **stringptr); -
    -
    -int pcre16_get_named_substring(const pcre16 *code, - PCRE_SPTR16 subject, int *ovector, - int stringcount, PCRE_SPTR16 stringname, - PCRE_SPTR16 *stringptr); -
    -
    -int pcre32_get_named_substring(const pcre32 *code, - PCRE_SPTR32 subject, int *ovector, - int stringcount, PCRE_SPTR32 stringname, - PCRE_SPTR32 *stringptr); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for extracting a captured substring by name. The -arguments are: -

    -  code          Compiled pattern
    -  subject       Subject that has been successfully matched
    -  ovector       Offset vector that pcre[16|32]_exec() used
    -  stringcount   Value returned by pcre[16|32]_exec()
    -  stringname    Name of the required substring
    -  stringptr     Where to put the string pointer
    -
    -The memory in which the substring is placed is obtained by calling -pcre[16|32]_malloc(). The convenience function -pcre[16|32]_free_substring() can be used to free it when it is no longer -needed. The yield of the function is the length of the extracted substring, -PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or -PCRE_ERROR_NOSUBSTRING if the string name is invalid. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_get_stringnumber.html b/src/pcre/doc/html/pcre_get_stringnumber.html deleted file mode 100644 index 7324d782..00000000 --- a/src/pcre/doc/html/pcre_get_stringnumber.html +++ /dev/null @@ -1,57 +0,0 @@ - - -pcre_get_stringnumber specification - - -

    pcre_get_stringnumber man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_get_stringnumber(const pcre *code, - const char *name); -
    -
    -int pcre16_get_stringnumber(const pcre16 *code, - PCRE_SPTR16 name); -
    -
    -int pcre32_get_stringnumber(const pcre32 *code, - PCRE_SPTR32 name); -

    -
    -DESCRIPTION -
    -

    -This convenience function finds the number of a named substring capturing -parenthesis in a compiled pattern. Its arguments are: -

    -  code    Compiled regular expression
    -  name    Name whose number is required
    -
    -The yield of the function is the number of the parenthesis if the name is -found, or PCRE_ERROR_NOSUBSTRING otherwise. When duplicate names are allowed -(PCRE_DUPNAMES is set), it is not defined which of the numbers is returned by -pcre[16|32]_get_stringnumber(). You can obtain the complete list by calling -pcre[16|32]_get_stringtable_entries(). -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_get_stringtable_entries.html b/src/pcre/doc/html/pcre_get_stringtable_entries.html deleted file mode 100644 index 79906798..00000000 --- a/src/pcre/doc/html/pcre_get_stringtable_entries.html +++ /dev/null @@ -1,60 +0,0 @@ - - -pcre_get_stringtable_entries specification - - -

    pcre_get_stringtable_entries man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_get_stringtable_entries(const pcre *code, - const char *name, char **first, char **last); -
    -
    -int pcre16_get_stringtable_entries(const pcre16 *code, - PCRE_SPTR16 name, PCRE_UCHAR16 **first, PCRE_UCHAR16 **last); -
    -
    -int pcre32_get_stringtable_entries(const pcre32 *code, - PCRE_SPTR32 name, PCRE_UCHAR32 **first, PCRE_UCHAR32 **last); -

    -
    -DESCRIPTION -
    -

    -This convenience function finds, for a compiled pattern, the first and last -entries for a given name in the table that translates capturing parenthesis -names into numbers. When names are required to be unique (PCRE_DUPNAMES is -not set), it is usually easier to use pcre[16|32]_get_stringnumber() -instead. -

    -  code    Compiled regular expression
    -  name    Name whose entries are required
    -  first   Where to return a pointer to the first entry
    -  last    Where to return a pointer to the last entry
    -
    -The yield of the function is the length of each entry, or -PCRE_ERROR_NOSUBSTRING if none are found. -

    -

    -There is a complete description of the PCRE native API, including the format of -the table entries, in the -pcreapi -page, and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_get_substring.html b/src/pcre/doc/html/pcre_get_substring.html deleted file mode 100644 index 1a8e4f5a..00000000 --- a/src/pcre/doc/html/pcre_get_substring.html +++ /dev/null @@ -1,64 +0,0 @@ - - -pcre_get_substring specification - - -

    pcre_get_substring man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_get_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, - const char **stringptr); -
    -
    -int pcre16_get_substring(PCRE_SPTR16 subject, int *ovector, - int stringcount, int stringnumber, - PCRE_SPTR16 *stringptr); -
    -
    -int pcre32_get_substring(PCRE_SPTR32 subject, int *ovector, - int stringcount, int stringnumber, - PCRE_SPTR32 *stringptr); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for extracting a captured substring. The -arguments are: -

    -  subject       Subject that has been successfully matched
    -  ovector       Offset vector that pcre[16|32]_exec() used
    -  stringcount   Value returned by pcre[16|32]_exec()
    -  stringnumber  Number of the required substring
    -  stringptr     Where to put the string pointer
    -
    -The memory in which the substring is placed is obtained by calling -pcre[16|32]_malloc(). The convenience function -pcre[16|32]_free_substring() can be used to free it when it is no longer -needed. The yield of the function is the length of the substring, -PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or -PCRE_ERROR_NOSUBSTRING if the string number is invalid. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcre_get_substring_list.html b/src/pcre/doc/html/pcre_get_substring_list.html deleted file mode 100644 index 7e8c6bc8..00000000 --- a/src/pcre/doc/html/pcre_get_substring_list.html +++ /dev/null @@ -1,61 +0,0 @@ - - -pcre_get_substring_list specification - - -

    pcre_get_substring_list man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_get_substring_list(const char *subject, - int *ovector, int stringcount, const char ***listptr); -
    -
    -int pcre16_get_substring_list(PCRE_SPTR16 subject, - int *ovector, int stringcount, PCRE_SPTR16 **listptr); -
    -
    -int pcre32_get_substring_list(PCRE_SPTR32 subject, - int *ovector, int stringcount, PCRE_SPTR32 **listptr); -

    -
    -DESCRIPTION -
    -

    -This is a convenience function for extracting a list of all the captured -substrings. The arguments are: -

    -  subject       Subject that has been successfully matched
    -  ovector       Offset vector that pcre[16|32]_exec used
    -  stringcount   Value returned by pcre[16|32]_exec
    -  listptr       Where to put a pointer to the list
    -
    -The memory in which the substrings and the list are placed is obtained by -calling pcre[16|32]_malloc(). The convenience function -pcre[16|32]_free_substring_list() can be used to free it when it is no -longer needed. A pointer to a list of pointers is put in the variable whose -address is in listptr. The list is terminated by a NULL pointer. The -yield of the function is zero on success or PCRE_ERROR_NOMEMORY if sufficient -memory could not be obtained. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_jit_exec.html b/src/pcre/doc/html/pcre_jit_exec.html deleted file mode 100644 index 4ebb0cbc..00000000 --- a/src/pcre/doc/html/pcre_jit_exec.html +++ /dev/null @@ -1,108 +0,0 @@ - - -pcre_jit_exec specification - - -

    pcre_jit_exec man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_jit_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - pcre_jit_stack *jstack); -
    -
    -int pcre16_jit_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - pcre_jit_stack *jstack); -
    -
    -int pcre32_jit_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - pcre_jit_stack *jstack); -

    -
    -DESCRIPTION -
    -

    -This function matches a compiled regular expression that has been successfully -studied with one of the JIT options against a given subject string, using a -matching algorithm that is similar to Perl's. It is a "fast path" interface to -JIT, and it bypasses some of the sanity checks that pcre_exec() applies. -It returns offsets to captured substrings. Its arguments are: -

    -  code         Points to the compiled pattern
    -  extra        Points to an associated pcre[16|32]_extra structure,
    -                 or is NULL
    -  subject      Points to the subject string
    -  length       Length of the subject string, in bytes
    -  startoffset  Offset in bytes in the subject at which to
    -                 start matching
    -  options      Option bits
    -  ovector      Points to a vector of ints for result offsets
    -  ovecsize     Number of elements in the vector (a multiple of 3)
    -  jstack       Pointer to a JIT stack
    -
    -The allowed options are: -
    -  PCRE_NOTBOL            Subject string is not the beginning of a line
    -  PCRE_NOTEOL            Subject string is not the end of a line
    -  PCRE_NOTEMPTY          An empty string is not a valid match
    -  PCRE_NOTEMPTY_ATSTART  An empty string at the start of the subject
    -                           is not a valid match
    -  PCRE_NO_UTF16_CHECK    Do not check the subject for UTF-16
    -                           validity (only relevant if PCRE_UTF16
    -                           was set at compile time)
    -  PCRE_NO_UTF32_CHECK    Do not check the subject for UTF-32
    -                           validity (only relevant if PCRE_UTF32
    -                           was set at compile time)
    -  PCRE_NO_UTF8_CHECK     Do not check the subject for UTF-8
    -                           validity (only relevant if PCRE_UTF8
    -                           was set at compile time)
    -  PCRE_PARTIAL           ) Return PCRE_ERROR_PARTIAL for a partial
    -  PCRE_PARTIAL_SOFT      )   match if no full matches are found
    -  PCRE_PARTIAL_HARD      Return PCRE_ERROR_PARTIAL for a partial match
    -                           if that is found before a full match
    -
    -However, the PCRE_NO_UTF[8|16|32]_CHECK options have no effect, as this check -is never applied. For details of partial matching, see the -pcrepartial -page. A pcre_extra structure contains the following fields: -
    -  flags            Bits indicating which fields are set
    -  study_data       Opaque data from pcre[16|32]_study()
    -  match_limit      Limit on internal resource use
    -  match_limit_recursion  Limit on internal recursion depth
    -  callout_data     Opaque data passed back to callouts
    -  tables           Points to character tables or is NULL
    -  mark             For passing back a *MARK pointer
    -  executable_jit   Opaque data from JIT compilation
    -
    -The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT, -PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, -PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the JIT API in the -pcrejit -page. -


    diff --git a/src/pcre/doc/html/pcre_jit_stack_alloc.html b/src/pcre/doc/html/pcre_jit_stack_alloc.html deleted file mode 100644 index 23ba4507..00000000 --- a/src/pcre/doc/html/pcre_jit_stack_alloc.html +++ /dev/null @@ -1,55 +0,0 @@ - - -pcre_jit_stack_alloc specification - - -

    pcre_jit_stack_alloc man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -pcre_jit_stack *pcre_jit_stack_alloc(int startsize, - int maxsize); -
    -
    -pcre16_jit_stack *pcre16_jit_stack_alloc(int startsize, - int maxsize); -
    -
    -pcre32_jit_stack *pcre32_jit_stack_alloc(int startsize, - int maxsize); -

    -
    -DESCRIPTION -
    -

    -This function is used to create a stack for use by the code compiled by the JIT -optimization of pcre[16|32]_study(). The arguments are a starting size for -the stack, and a maximum size to which it is allowed to grow. The result can be -passed to the JIT run-time code by pcre[16|32]_assign_jit_stack(), or that -function can set up a callback for obtaining a stack. A maximum stack size of -512K to 1M should be more than enough for any pattern. For more details, see -the -pcrejit -page. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_jit_stack_free.html b/src/pcre/doc/html/pcre_jit_stack_free.html deleted file mode 100644 index 8bd06e46..00000000 --- a/src/pcre/doc/html/pcre_jit_stack_free.html +++ /dev/null @@ -1,48 +0,0 @@ - - -pcre_jit_stack_free specification - - -

    pcre_jit_stack_free man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -void pcre_jit_stack_free(pcre_jit_stack *stack); -

    -

    -void pcre16_jit_stack_free(pcre16_jit_stack *stack); -

    -

    -void pcre32_jit_stack_free(pcre32_jit_stack *stack); -

    -
    -DESCRIPTION -
    -

    -This function is used to free a JIT stack that was created by -pcre[16|32]_jit_stack_alloc() when it is no longer needed. For more details, -see the -pcrejit -page. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_maketables.html b/src/pcre/doc/html/pcre_maketables.html deleted file mode 100644 index 3a7b5ebc..00000000 --- a/src/pcre/doc/html/pcre_maketables.html +++ /dev/null @@ -1,48 +0,0 @@ - - -pcre_maketables specification - - -

    pcre_maketables man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -const unsigned char *pcre_maketables(void); -

    -

    -const unsigned char *pcre16_maketables(void); -

    -

    -const unsigned char *pcre32_maketables(void); -

    -
    -DESCRIPTION -
    -

    -This function builds a set of character tables for character values less than -256. These can be passed to pcre[16|32]_compile() to override PCRE's -internal, built-in tables (which were made by pcre[16|32]_maketables() when -PCRE was compiled). You might want to do this if you are using a non-standard -locale. The function yields a pointer to the tables. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_pattern_to_host_byte_order.html b/src/pcre/doc/html/pcre_pattern_to_host_byte_order.html deleted file mode 100644 index 1b1c8037..00000000 --- a/src/pcre/doc/html/pcre_pattern_to_host_byte_order.html +++ /dev/null @@ -1,58 +0,0 @@ - - -pcre_pattern_to_host_byte_order specification - - -

    pcre_pattern_to_host_byte_order man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_pattern_to_host_byte_order(pcre *code, - pcre_extra *extra, const unsigned char *tables); -
    -
    -int pcre16_pattern_to_host_byte_order(pcre16 *code, - pcre16_extra *extra, const unsigned char *tables); -
    -
    -int pcre32_pattern_to_host_byte_order(pcre32 *code, - pcre32_extra *extra, const unsigned char *tables); -

    -
    -DESCRIPTION -
    -

    -This function ensures that the bytes in 2-byte and 4-byte values in a compiled -pattern are in the correct order for the current host. It is useful when a -pattern that has been compiled on one host is transferred to another that might -have different endianness. The arguments are: -

    -  code         A compiled regular expression
    -  extra        Points to an associated pcre[16|32]_extra structure,
    -                 or is NULL
    -  tables       Pointer to character tables, or NULL to
    -                 set the built-in default
    -
    -The result is 0 for success, a negative PCRE_ERROR_xxx value otherwise. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_refcount.html b/src/pcre/doc/html/pcre_refcount.html deleted file mode 100644 index bfb92e6d..00000000 --- a/src/pcre/doc/html/pcre_refcount.html +++ /dev/null @@ -1,51 +0,0 @@ - - -pcre_refcount specification - - -

    pcre_refcount man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre_refcount(pcre *code, int adjust); -

    -

    -int pcre16_refcount(pcre16 *code, int adjust); -

    -

    -int pcre32_refcount(pcre32 *code, int adjust); -

    -
    -DESCRIPTION -
    -

    -This function is used to maintain a reference count inside a data block that -contains a compiled pattern. Its arguments are: -

    -  code                      Compiled regular expression
    -  adjust                    Adjustment to reference value
    -
    -The yield of the function is the adjusted reference value, which is constrained -to lie between 0 and 65535. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_study.html b/src/pcre/doc/html/pcre_study.html deleted file mode 100644 index af82f114..00000000 --- a/src/pcre/doc/html/pcre_study.html +++ /dev/null @@ -1,68 +0,0 @@ - - -pcre_study specification - - -

    pcre_study man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -pcre_extra *pcre_study(const pcre *code, int options, - const char **errptr); -
    -
    -pcre16_extra *pcre16_study(const pcre16 *code, int options, - const char **errptr); -
    -
    -pcre32_extra *pcre32_study(const pcre32 *code, int options, - const char **errptr); -

    -
    -DESCRIPTION -
    -

    -This function studies a compiled pattern, to see if additional information can -be extracted that might speed up matching. Its arguments are: -

    -  code       A compiled regular expression
    -  options    Options for pcre[16|32]_study()
    -  errptr     Where to put an error message
    -
    -If the function succeeds, it returns a value that can be passed to -pcre[16|32]_exec() or pcre[16|32]_dfa_exec() via their extra -arguments. -

    -

    -If the function returns NULL, either it could not find any additional -information, or there was an error. You can tell the difference by looking at -the error value. It is NULL in the first case. -

    -

    -The only option is PCRE_STUDY_JIT_COMPILE. It requests just-in-time compilation -if possible. If PCRE has been compiled without JIT support, this option is -ignored. See the -pcrejit -page for further details. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_utf16_to_host_byte_order.html b/src/pcre/doc/html/pcre_utf16_to_host_byte_order.html deleted file mode 100644 index 18e7788f..00000000 --- a/src/pcre/doc/html/pcre_utf16_to_host_byte_order.html +++ /dev/null @@ -1,57 +0,0 @@ - - -pcre_utf16_to_host_byte_order specification - - -

    pcre_utf16_to_host_byte_order man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *output, - PCRE_SPTR16 input, int length, int *host_byte_order, - int keep_boms); -

    -
    -DESCRIPTION -
    -

    -This function, which exists only in the 16-bit library, converts a UTF-16 -string to the correct order for the current host, taking account of any byte -order marks (BOMs) within the string. Its arguments are: -

    -  output           pointer to output buffer, may be the same as input
    -  input            pointer to input buffer
    -  length           number of 16-bit units in the input, or negative for
    -                     a zero-terminated string
    -  host_byte_order  a NULL value or a non-zero value pointed to means
    -                     start in host byte order
    -  keep_boms        if non-zero, BOMs are copied to the output string
    -
    -The result of the function is the number of 16-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -

    -

    -If host_byte_order is not NULL, it is set to indicate the byte order that -is current at the end of the string. -
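The conversion described above might be sketched as follows. This is an illustration under stated assumptions rather than the library's code: sketch_utf16_to_host() is a hypothetical name, uint16_t stands in for PCRE_UCHAR16, and a unit read as 0xFFFE is taken to be a byte-swapped BOM that switches subsequent units to swapped order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts `length` 16-bit units (or a zero-terminated string when length is
   negative) to host byte order. Returns the number of units written. */
int sketch_utf16_to_host(uint16_t *output, const uint16_t *input, int length,
                         int *host_byte_order, int keep_boms)
{
    assert(output != NULL && input != NULL);

    /* NULL, or a pointed-to non-zero value, means "start in host order". */
    int host_order = (host_byte_order != NULL) ? (*host_byte_order != 0) : 1;

    if (length < 0) {
        /* Count the units up to and including the zero terminator. */
        length = 0;
        while (input[length] != 0)
            length++;
        length++;
    }

    int written = 0;
    for (int i = 0; i < length; i++) {
        uint16_t unit = input[i];   /* read first, so output may alias input */
        if (unit == 0xFEFF || unit == 0xFFFE) {
            /* 0xFEFF is a BOM in host order; 0xFFFE is one in swapped order. */
            host_order = (unit == 0xFEFF);
            if (keep_boms)
                output[written++] = 0xFEFF;
        } else if (host_order) {
            output[written++] = unit;
        } else {
            output[written++] = (uint16_t)((unit >> 8) | (unit << 8));
        }
    }

    /* Report the byte order in force at the end of the string. */
    if (host_byte_order != NULL)
        *host_byte_order = host_order;
    return written;
}
```

Because each unit is read before anything is written and the output never runs ahead of the input, the sketch can convert a buffer in place, matching the note that output may be the same as input.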

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_utf32_to_host_byte_order.html b/src/pcre/doc/html/pcre_utf32_to_host_byte_order.html deleted file mode 100644 index 772ae40c..00000000 --- a/src/pcre/doc/html/pcre_utf32_to_host_byte_order.html +++ /dev/null @@ -1,57 +0,0 @@ - - -pcre_utf32_to_host_byte_order specification - - -

    pcre_utf32_to_host_byte_order man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *output, - PCRE_SPTR32 input, int length, int *host_byte_order, - int keep_boms); -

    -
    -DESCRIPTION -
    -

    -This function, which exists only in the 32-bit library, converts a UTF-32 -string to the correct order for the current host, taking account of any byte -order marks (BOMs) within the string. Its arguments are: -

    -  output           pointer to output buffer, may be the same as input
    -  input            pointer to input buffer
    -  length           number of 32-bit units in the input, or negative for
    -                     a zero-terminated string
    -  host_byte_order  a NULL value or a non-zero value pointed to means
    -                     start in host byte order
    -  keep_boms        if non-zero, BOMs are copied to the output string
    -
    -The result of the function is the number of 32-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -

    -

    -If host_byte_order is not NULL, it is set to indicate the byte order that -is current at the end of the string. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcre_version.html b/src/pcre/doc/html/pcre_version.html deleted file mode 100644 index d33e7189..00000000 --- a/src/pcre/doc/html/pcre_version.html +++ /dev/null @@ -1,46 +0,0 @@ - - -pcre_version specification - - -

    pcre_version man page

    -

    -
    -SYNOPSIS -
    -

    -#include <pcre.h> -

    -

    -const char *pcre_version(void); -

    -

    -const char *pcre16_version(void); -

    -

    -const char *pcre32_version(void); -

    -
    -DESCRIPTION -
    -

    -This function (even in the 16-bit and 32-bit libraries) returns a -zero-terminated, 8-bit character string that gives the version number of the -PCRE library and the date of its release. -

    -

    -There is a complete description of the PCRE native API in the -pcreapi -page and a description of the POSIX API in the -pcreposix -page. -


    diff --git a/src/pcre/doc/html/pcreapi.html b/src/pcre/doc/html/pcreapi.html deleted file mode 100644 index 2d7adf18..00000000 --- a/src/pcre/doc/html/pcreapi.html +++ /dev/null @@ -1,2921 +0,0 @@ - - -pcreapi specification - - -

    pcreapi man page

    -

    -

    -

    -#include <pcre.h> -

    -
    PCRE NATIVE API BASIC FUNCTIONS
    -

    -pcre *pcre_compile(const char *pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre *pcre_compile2(const char *pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre_extra *pcre_study(const pcre *code, int options, - const char **errptr); -
    -
    -void pcre_free_study(pcre_extra *extra); -
    -
    -int pcre_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -
    -
    -int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -

    -
    PCRE NATIVE API STRING EXTRACTION FUNCTIONS
    -

    -int pcre_copy_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - char *buffer, int buffersize); -
    -
    -int pcre_copy_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, char *buffer, - int buffersize); -
    -
    -int pcre_get_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - const char **stringptr); -
    -
    -int pcre_get_stringnumber(const pcre *code, - const char *name); -
    -
    -int pcre_get_stringtable_entries(const pcre *code, - const char *name, char **first, char **last); -
    -
    -int pcre_get_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, - const char **stringptr); -
    -
    -int pcre_get_substring_list(const char *subject, - int *ovector, int stringcount, const char ***listptr); -
    -
    -void pcre_free_substring(const char *stringptr); -
    -
    -void pcre_free_substring_list(const char **stringptr); -

    -
    PCRE NATIVE API AUXILIARY FUNCTIONS
    -

    -int pcre_jit_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - pcre_jit_stack *jstack); -
    -
    -pcre_jit_stack *pcre_jit_stack_alloc(int startsize, int maxsize); -
    -
    -void pcre_jit_stack_free(pcre_jit_stack *stack); -
    -
    -void pcre_assign_jit_stack(pcre_extra *extra, - pcre_jit_callback callback, void *data); -
    -
    -const unsigned char *pcre_maketables(void); -
    -
    -int pcre_fullinfo(const pcre *code, const pcre_extra *extra, - int what, void *where); -
    -
    -int pcre_refcount(pcre *code, int adjust); -
    -
    -int pcre_config(int what, void *where); -
    -
    -const char *pcre_version(void); -
    -
    -int pcre_pattern_to_host_byte_order(pcre *code, - pcre_extra *extra, const unsigned char *tables); -

    -
    PCRE NATIVE API INDIRECTED FUNCTIONS
    -

    -void *(*pcre_malloc)(size_t); -
    -
    -void (*pcre_free)(void *); -
    -
    -void *(*pcre_stack_malloc)(size_t); -
    -
    -void (*pcre_stack_free)(void *); -
    -
    -int (*pcre_callout)(pcre_callout_block *); -
    -
    -int (*pcre_stack_guard)(void); -

    -
    PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
    -

    -As well as support for 8-bit character strings, PCRE also supports 16-bit -strings (from release 8.30) and 32-bit strings (from release 8.32), by means of -two additional libraries. They can be built as well as, or instead of, the -8-bit library. To avoid too much complication, this document describes the -8-bit versions of the functions, with only occasional references to the 16-bit -and 32-bit libraries. -

    -

    -The 16-bit and 32-bit functions operate in the same way as their 8-bit -counterparts; they just use different data types for their arguments and -results, and their names start with pcre16_ or pcre32_ instead of -pcre_. For every option that has UTF8 in its name (for example, -PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced -by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the -16-bit and 32-bit option names define the same bit values. -

    -

    -References to bytes and UTF-8 in this document should be read as references to -16-bit data units and UTF-16 when using the 16-bit library, or 32-bit data -units and UTF-32 when using the 32-bit library, unless specified otherwise. -More details of the specific differences for the 16-bit and 32-bit libraries -are given in the -pcre16 -and -pcre32 -pages. -

    -
    PCRE API OVERVIEW
    -

    -PCRE has its own native API, which is described in this document. There are -also some wrapper functions (for the 8-bit library only) that correspond to the -POSIX regular expression API, but they do not give access to all the -functionality. They are described in the -pcreposix -documentation. Both of these APIs define a set of C function calls. A C++ -wrapper (again for the 8-bit library only) is also distributed with PCRE. It is -documented in the -pcrecpp -page. -

    -

    -The native API C function prototypes are defined in the header file -pcre.h, and on Unix-like systems the (8-bit) library itself is called -libpcre. It can normally be accessed by adding -lpcre to the -command for linking an application that uses PCRE. The header file defines the -macros PCRE_MAJOR and PCRE_MINOR to contain the major and minor release numbers -for the library. Applications can use these to include support for different -releases of PCRE. -

    -

    -In a Windows environment, if you want to statically link an application program -against a non-dll pcre.a file, you must define PCRE_STATIC before -including pcre.h or pcrecpp.h, because otherwise the -pcre_malloc() and pcre_free() exported functions will be declared -__declspec(dllimport), with unwanted results. -

    -

    -The functions pcre_compile(), pcre_compile2(), pcre_study(), -and pcre_exec() are used for compiling and matching regular expressions -in a Perl-compatible manner. A sample program that demonstrates the simplest -way of using them is provided in the file called pcredemo.c in the PCRE -source distribution. A listing of this program is given in the -pcredemo -documentation, and the -pcresample -documentation describes how to compile and run it. -

    -

    -Just-in-time compiler support is an optional feature of PCRE that can be built -in appropriate hardware environments. It greatly speeds up the matching -performance of many patterns. Simple programs can easily request that it be -used if available, by setting an option that is ignored when it is not -relevant. More complicated programs might need to make use of the functions -pcre_jit_stack_alloc(), pcre_jit_stack_free(), and -pcre_assign_jit_stack() in order to control the JIT code's memory usage. -

    -

    -From release 8.32 there is also a direct interface for JIT execution, which -gives improved performance. The JIT-specific functions are discussed in the -pcrejit -documentation. -

    -

    -A second matching function, pcre_dfa_exec(), which is not -Perl-compatible, is also provided. This uses a different algorithm for the -matching. The alternative algorithm finds all possible matches (at a given -point in the subject), and scans the subject just once (unless there are -lookbehind assertions). However, this algorithm does not return captured -substrings. A description of the two matching algorithms and their advantages -and disadvantages is given in the -pcrematching -documentation. -

    -

    -In addition to the main compiling and matching functions, there are convenience -functions for extracting captured substrings from a subject string that is -matched by pcre_exec(). They are: -

    -  pcre_copy_substring()
    -  pcre_copy_named_substring()
    -  pcre_get_substring()
    -  pcre_get_named_substring()
    -  pcre_get_substring_list()
    -  pcre_get_stringnumber()
    -  pcre_get_stringtable_entries()
    -
    -pcre_free_substring() and pcre_free_substring_list() are also -provided, to free the memory used for extracted strings. -

    -

    -The function pcre_maketables() is used to build a set of character tables -in the current locale for passing to pcre_compile(), pcre_exec(), -or pcre_dfa_exec(). This is an optional facility that is provided for -specialist use. Most commonly, no special tables are passed, in which case -internal tables that are generated when PCRE is built are used. -

    -

    -The function pcre_fullinfo() is used to find out information about a -compiled pattern. The function pcre_version() returns a pointer to a -string containing the version of PCRE and its date of release. -

    -

    -The function pcre_refcount() maintains a reference count in a data block -containing a compiled pattern. This is provided for the benefit of -object-oriented applications. -

    -

    -The global variables pcre_malloc and pcre_free initially contain -the entry points of the standard malloc() and free() functions, -respectively. PCRE calls the memory management functions via these variables, -so a calling program can replace them if it wishes to intercept the calls. This -should be done before calling any PCRE functions. -
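The indirection pattern described above can be shown in isolation. In this sketch every name (sketch_malloc, sketch_free, sketch_make_buffer, sketch_release_buffer, counting_malloc) is hypothetical; the two function pointers merely play the role that pcre_malloc and pcre_free play in the library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical globals playing the role of pcre_malloc and pcre_free.
   They start out pointing at the standard allocator. */
void *(*sketch_malloc)(size_t) = malloc;
void (*sketch_free)(void *) = free;

/* Stand-ins for library routines: every allocation goes through the pointers. */
char *sketch_make_buffer(size_t size)
{
    assert(sketch_malloc != NULL);
    return sketch_malloc(size);
}

void sketch_release_buffer(char *buffer)
{
    assert(sketch_free != NULL);
    sketch_free(buffer);
}

/* Replacement allocator that a calling program might install in order to
   intercept (here, simply count) the allocation calls. */
size_t sketch_alloc_calls = 0;

void *counting_malloc(size_t size)
{
    sketch_alloc_calls++;
    return malloc(size);
}
```

A caller would assign sketch_malloc = counting_malloc before using any of the routines, which mirrors the advice to replace the pointers before calling any PCRE functions so that no block is allocated by one allocator and released by another.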

    -

    -The global variables pcre_stack_malloc and pcre_stack_free are also -indirections to memory management functions. These special functions are used -only when PCRE is compiled to use the heap for remembering data, instead of -recursive function calls, when running the pcre_exec() function. See the -pcrebuild -documentation for details of how to do this. It is a non-standard way of -building PCRE, for use in environments that have limited stacks. Because of the -greater use of memory management, it runs more slowly. Separate functions are -provided so that special-purpose external code can be used for this case. When -used, these functions always allocate memory blocks of the same size. There is -a discussion about PCRE's stack usage in the -pcrestack -documentation. -

    -

    -The global variable pcre_callout initially contains NULL. It can be set -by the caller to a "callout" function, which PCRE will then call at specified -points during a matching operation. Details are given in the -pcrecallout -documentation. -

    -

    -The global variable pcre_stack_guard initially contains NULL. It can be -set by the caller to a function that is called by PCRE whenever it starts -to compile a parenthesized part of a pattern. When parentheses are nested, PCRE -uses recursive function calls, which use up the system stack. This function is -provided so that applications with restricted stacks can force a compilation -error if the stack runs out. The function should return zero if all is well, or -non-zero to force an error. -

    -
    NEWLINES
    -

    -PCRE supports five different conventions for indicating line breaks in -strings: a single CR (carriage return) character, a single LF (linefeed) -character, the two-character sequence CRLF, any of the three preceding, or any -Unicode newline sequence. The Unicode newline sequences are the three just -mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed, -U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS -(paragraph separator, U+2029). -

    -

    -Each of the first three conventions is used by at least one operating system as -its standard newline sequence. When PCRE is built, a default can be specified. -If none is specified, the default is LF, which is the Unix standard. When PCRE is run, the -default can be overridden, either when a pattern is compiled, or when it is -matched. -

    -

    -At compile time, the newline convention can be specified by the options -argument of pcre_compile(), or it can be specified by special text at the -start of the pattern itself; this overrides any other settings. See the -pcrepattern -page for details of the special character sequences. -

    -

    -In the PCRE documentation the word "newline" is used to mean "the character or -pair of characters that indicate a line break". The choice of newline -convention affects the handling of the dot, circumflex, and dollar -metacharacters, the handling of #-comments in /x mode, and, when CRLF is a -recognized line ending sequence, the match position advancement for a -non-anchored pattern. There is more detail about this in the -section on pcre_exec() options -below. -

    -

    -The choice of newline convention does not affect the interpretation of -the \n or \r escape sequences, nor does it affect what \R matches, which is -controlled in a similar way, but by separate options. -

    -
    MULTITHREADING
    -

    -The PCRE functions can be used in multi-threading applications, with the -proviso that the memory management functions pointed to by pcre_malloc, -pcre_free, pcre_stack_malloc, and pcre_stack_free, and the -callout and stack-checking functions pointed to by pcre_callout and -pcre_stack_guard, are shared by all threads. -

    -

    -The compiled form of a regular expression is not altered during matching, so -the same compiled pattern can safely be used by several threads at once. -

    -

    -If the just-in-time optimization feature is being used, it needs separate -memory stack areas for each thread. See the -pcrejit -documentation for more details. -

    -
    SAVING PRECOMPILED PATTERNS FOR LATER USE
    -

    -The compiled form of a regular expression can be saved and re-used at a later -time, possibly by a different program, and even on a host other than the one on -which it was compiled. Details are given in the -pcreprecompile -documentation, which includes a description of the -pcre_pattern_to_host_byte_order() function. However, compiling a regular -expression with one version of PCRE for use with a different version is not -guaranteed to work and may cause crashes. -

    -
    CHECKING BUILD-TIME OPTIONS
    -

    -int pcre_config(int what, void *where); -

    -

    -The function pcre_config() makes it possible for a PCRE client to -discover which optional features have been compiled into the PCRE library. The -pcrebuild -documentation has more details about these optional features. -

    -

    -The first argument for pcre_config() is an integer, specifying which -information is required; the second argument is a pointer to a variable into -which the information is placed. The returned value is zero on success, or the -negative error code PCRE_ERROR_BADOPTION if the value in the first argument is -not recognized. The following information is available: -

    -  PCRE_CONFIG_UTF8
    -
    -The output is an integer that is set to one if UTF-8 support is available; -otherwise it is set to zero. This value should normally be given to the 8-bit -version of this function, pcre_config(). If it is given to the 16-bit -or 32-bit version of this function, the result is PCRE_ERROR_BADOPTION. -
    -  PCRE_CONFIG_UTF16
    -
    -The output is an integer that is set to one if UTF-16 support is available; -otherwise it is set to zero. This value should normally be given to the 16-bit -version of this function, pcre16_config(). If it is given to the 8-bit -or 32-bit version of this function, the result is PCRE_ERROR_BADOPTION. -
    -  PCRE_CONFIG_UTF32
    -
    -The output is an integer that is set to one if UTF-32 support is available; -otherwise it is set to zero. This value should normally be given to the 32-bit -version of this function, pcre32_config(). If it is given to the 8-bit -or 16-bit version of this function, the result is PCRE_ERROR_BADOPTION. -
    -  PCRE_CONFIG_UNICODE_PROPERTIES
    -
    -The output is an integer that is set to one if support for Unicode character -properties is available; otherwise it is set to zero. -
    -  PCRE_CONFIG_JIT
    -
    -The output is an integer that is set to one if support for just-in-time -compiling is available; otherwise it is set to zero. -
    -  PCRE_CONFIG_JITTARGET
    -
    -The output is a pointer to a zero-terminated "const char *" string. If JIT -support is available, the string contains the name of the architecture for -which the JIT compiler is configured, for example "x86 32bit (little endian + -unaligned)". If JIT support is not available, the result is NULL. -
    -  PCRE_CONFIG_NEWLINE
    -
    -The output is an integer whose value specifies the default character sequence -that is recognized as meaning "newline". The values that are supported in -ASCII/Unicode environments are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for -ANYCRLF, and -1 for ANY. In EBCDIC environments, CR, ANYCRLF, and ANY yield the -same values. However, the value for LF is normally 21, though some EBCDIC -environments use 37. The corresponding values for CRLF are 3349 and 3365. The -default should normally correspond to the standard sequence for your operating -system. -
    -  PCRE_CONFIG_BSR
    -
    -The output is an integer whose value indicates what character sequences the \R -escape sequence matches by default. A value of 0 means that \R matches any -Unicode line ending sequence; a value of 1 means that \R matches only CR, LF, -or CRLF. The default can be overridden when a pattern is compiled or matched. -
    -  PCRE_CONFIG_LINK_SIZE
    -
    -The output is an integer that contains the number of bytes used for internal -linkage in compiled regular expressions. For the 8-bit library, the value can -be 2, 3, or 4. For the 16-bit and 32-bit libraries, the value is either 2 or 4 -and is still a number of bytes. The default value of 2 is sufficient for all but the -most massive patterns, since it allows the compiled pattern to be up to 64K in -size. Larger values allow larger regular expressions to be compiled, at the -expense of slower matching. -
    -  PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
    -
    -The output is an integer that contains the threshold above which the POSIX -interface uses malloc() for output vectors. Further details are given in -the -pcreposix -documentation. -
    -  PCRE_CONFIG_PARENS_LIMIT
    -
    -The output is a long integer that gives the maximum depth of nesting of -parentheses (of any kind) in a pattern. This limit is imposed to cap the amount -of system stack used when a pattern is compiled. It is specified when PCRE is -built; the default is 250. This limit does not take into account the stack that -may already be used by the calling application. For finer control over -compilation stack usage, you can set a pointer to an external checking function -in pcre_stack_guard. -
    -  PCRE_CONFIG_MATCH_LIMIT
    -
    -The output is a long integer that gives the default limit for the number of -internal matching function calls in a pcre_exec() execution. Further -details are given with pcre_exec() below. -
    -  PCRE_CONFIG_MATCH_LIMIT_RECURSION
    -
    -The output is a long integer that gives the default limit for the depth of -recursion when calling the internal matching function in a pcre_exec() -execution. Further details are given with pcre_exec() below. -
    -  PCRE_CONFIG_STACKRECURSE
    -
    -The output is an integer that is set to one if internal recursion when running -pcre_exec() is implemented by recursive function calls that use the stack -to remember their state. This is the usual way that PCRE is compiled. The -output is zero if PCRE was compiled to use blocks of data on the heap instead -of recursive function calls. In this case, pcre_stack_malloc and -pcre_stack_free are called to manage memory blocks on the heap, thus -avoiding the use of the stack. -
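The PCRE_CONFIG_NEWLINE values listed above are consistent with a simple encoding: a single-character newline is reported as its character code, and CRLF as the code for CR times 256 plus the code for LF (13*256 + 10 = 3338). The C sketch below turns such a value into a readable name. The helper describe_newline_config is hypothetical, not a PCRE function, and it covers only the values listed in this section.

```c
/* Hypothetical helper, not a PCRE function.
   Maps the integer produced by PCRE_CONFIG_NEWLINE to a short name.
   Only the values named in the documentation above are handled. */
const char *describe_newline_config(int value)
{
    switch (value) {
    case -1:   return "ANY";
    case -2:   return "ANYCRLF";
    case 13:   return "CR";       /* same code in ASCII and EBCDIC */
    case 10:                      /* LF in ASCII/Unicode */
    case 21:                      /* LF in most EBCDIC environments */
    case 37:   return "LF";       /* LF in some EBCDIC environments */
    case 3338:                    /* 13*256 + 10 */
    case 3349:                    /* 13*256 + 21 */
    case 3365: return "CRLF";     /* 13*256 + 37 */
    default:   return "unknown";
    }
}
```

In an application, the value passed in would be the integer that pcre_config() stores through its second argument when called with PCRE_CONFIG_NEWLINE.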

    -
    COMPILING A PATTERN
    -

    -pcre *pcre_compile(const char *pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -
    -
    -pcre *pcre_compile2(const char *pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); -

    -

    -Either of the functions pcre_compile() or pcre_compile2() can be -called to compile a pattern into an internal form. The only difference between -the two interfaces is that pcre_compile2() has an additional argument, -errorcodeptr, via which a numerical error code can be returned. To avoid -too much repetition, we refer just to pcre_compile() below, but the -information applies equally to pcre_compile2(). -

    -

    -The pattern is a C string terminated by a binary zero, and is passed in the -pattern argument. A pointer to a single block of memory that is obtained -via pcre_malloc is returned. This contains the compiled code and related -data. The pcre type is defined for the returned block; this is a typedef -for a structure whose contents are not externally defined. It is up to the -caller to free the memory (via pcre_free) when it is no longer required. -

    -

    -Although the compiled code of a PCRE regex is relocatable, that is, it does not -depend on memory location, the complete pcre data block is not -fully relocatable, because it may contain a copy of the tableptr -argument, which is an address (see below). -

    -

    -The options argument contains various bit settings that affect the -compilation. It should be zero if no options are required. The available -options are described below. Some of them (in particular, those that are -compatible with Perl, but some others as well) can also be set and unset from -within the pattern (see the detailed description in the -pcrepattern -documentation). For those options that can be different in different parts of -the pattern, the contents of the options argument specifies their -settings at the start of compilation and execution. The PCRE_ANCHORED, -PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK, and -PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at -compile time. -

    -

    -If errptr is NULL, pcre_compile() returns NULL immediately. -Otherwise, if compilation of a pattern fails, pcre_compile() returns -NULL, and sets the variable pointed to by errptr to point to a textual -error message. This is a static string that is part of the library. You must -not try to free it. Normally, the offset from the start of the pattern to the -data unit that was being processed when the error was discovered is placed in -the variable pointed to by erroffset, which must not be NULL (if it is, -an immediate error is given). However, for an invalid UTF-8 or UTF-16 string, -the offset is that of the first data unit of the failing character. -

    -

    -Some errors are not detected until the whole pattern has been scanned; in these -cases, the offset passed back is the length of the pattern. Note that the -offset is in data units, not characters, even in a UTF mode. It may sometimes -point into the middle of a UTF-8 or UTF-16 character. -
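Because the offset counts data units, an application that wants to display the error position in a UTF-8 pattern may prefer to move it back to the start of the character that contains it. A minimal sketch in C follows, assuming the pattern is otherwise valid UTF-8; the helper name utf8_char_start is hypothetical and not part of PCRE.

```c
/* Hypothetical helper, not a PCRE function.
   Given a byte offset into a UTF-8 pattern, step backwards over
   continuation bytes (bit pattern 10xxxxxx) until the offset points
   at the first byte of a character. An offset equal to the pattern
   length (pointing at the terminating zero) is returned unchanged. */
int utf8_char_start(const char *pattern, int offset)
{
    while (offset > 0 &&
           ((unsigned char)pattern[offset] & 0xC0) == 0x80)
        offset--;
    return offset;
}
```

The adjusted offset can then be used to split the pattern for an error display without cutting a multi-byte character in half.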

    -

    -If pcre_compile2() is used instead of pcre_compile(), and the -errorcodeptr argument is not NULL, a non-zero error code number is -returned via this argument in the event of an error. This is in addition to the -textual error message. Error codes and messages are listed below. -

    -

    -If the final argument, tableptr, is NULL, PCRE uses a default set of -character tables that are built when PCRE is compiled, using the default C -locale. Otherwise, tableptr must be an address that is the result of a -call to pcre_maketables(). This value is stored with the compiled -pattern, and used again by pcre_exec() and pcre_dfa_exec() when the -pattern is matched. For more discussion, see the section on locale support -below. -

    -

    -This code fragment shows a typical straightforward call to pcre_compile(): -

    -  pcre *re;
    -  const char *error;
    -  int erroffset;
    -  re = pcre_compile(
    -    "^A.*Z",          /* the pattern */
    -    0,                /* default options */
    -    &error,           /* for error message */
    -    &erroffset,       /* for error offset */
    -    NULL);            /* use default character tables */
    -
    -The following names for option bits are defined in the pcre.h header -file: -
    -  PCRE_ANCHORED
    -
    -If this bit is set, the pattern is forced to be "anchored", that is, it is -constrained to match only at the first matching point in the string that is -being searched (the "subject string"). This effect can also be achieved by -appropriate constructs in the pattern itself, which is the only way to do it in -Perl. -
    -  PCRE_AUTO_CALLOUT
    -
    -If this bit is set, pcre_compile() automatically inserts callout items, -all with number 255, before each pattern item. For discussion of the callout -facility, see the -pcrecallout -documentation. -
    -  PCRE_BSR_ANYCRLF
    -  PCRE_BSR_UNICODE
    -
    -These options (which are mutually exclusive) control what the \R escape -sequence matches. The choice is either to match only CR, LF, or CRLF, or to -match any Unicode newline sequence. The default is specified when PCRE is -built. It can be overridden from within the pattern, or by setting an option -when a compiled pattern is matched. -
    -  PCRE_CASELESS
    -
    -If this bit is set, letters in the pattern match both upper and lower case -letters. It is equivalent to Perl's /i option, and it can be changed within a -pattern by a (?i) option setting. In UTF-8 mode, PCRE always understands the -concept of case for characters whose values are less than 128, so caseless -matching is always possible. For characters with higher values, the concept of -case is supported if PCRE is compiled with Unicode property support, but not -otherwise. If you want to use caseless matching for characters 128 and above, -you must ensure that PCRE is compiled with Unicode property support as well as -with UTF-8 support. -
    -  PCRE_DOLLAR_ENDONLY
    -
    -If this bit is set, a dollar metacharacter in the pattern matches only at the -end of the subject string. Without this option, a dollar also matches -immediately before a newline at the end of the string (but not before any other -newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set. -There is no equivalent to this option in Perl, and no way to set it within a -pattern. -
    -  PCRE_DOTALL
    -
    -If this bit is set, a dot metacharacter in the pattern matches a character of -any value, including one that indicates a newline. However, it only ever -matches one character, even if newlines are coded as CRLF. Without this option, -a dot does not match when the current position is at a newline. This option is -equivalent to Perl's /s option, and it can be changed within a pattern by a -(?s) option setting. A negative class such as [^a] always matches newline -characters, independent of the setting of this option. -
    -  PCRE_DUPNAMES
    -
    -If this bit is set, names used to identify capturing subpatterns need not be -unique. This can be helpful for certain types of pattern when it is known that -only one instance of the named subpattern can ever be matched. There are more -details of named subpatterns below; see also the -pcrepattern -documentation. -
    -  PCRE_EXTENDED
    -
    -If this bit is set, most white space characters in the pattern are totally -ignored except when escaped or inside a character class. However, white space -is not allowed within sequences such as (?> that introduce various -parenthesized subpatterns, nor within a numerical quantifier such as {1,3}. -However, ignorable white space is permitted between an item and a following -quantifier and between a quantifier and a following + that indicates -possessiveness. -

    -

    -White space formerly did not include the VT character (code 11), because Perl -did not treat this character as white space. However, Perl changed at release -5.18, so PCRE followed at release 8.34, and VT is now treated as white space. -

    -

    -PCRE_EXTENDED also causes characters between an unescaped # outside a character -class and the next newline, inclusive, to be ignored. PCRE_EXTENDED is -equivalent to Perl's /x option, and it can be changed within a pattern by a -(?x) option setting. -

    -

    -Which characters are interpreted as newlines is controlled by the options -passed to pcre_compile() or by a special sequence at the start of the -pattern, as described in the section entitled -"Newline conventions" -in the pcrepattern documentation. Note that the end of this type of -comment is a literal newline sequence in the pattern; escape sequences that -happen to represent a newline do not count. -

    -

    -This option makes it possible to include comments inside complicated patterns. -Note, however, that this applies only to data characters. White space characters -may never appear within special character sequences in a pattern, for example -within the sequence (?( that introduces a conditional subpattern. -

    -  PCRE_EXTRA
    -
    -This option was invented in order to turn on additional functionality of PCRE -that is incompatible with Perl, but it is currently of very little use. When -set, any backslash in a pattern that is followed by a letter that has no -special meaning causes an error, thus reserving these combinations for future -expansion. By default, as in Perl, a backslash followed by a letter with no -special meaning is treated as a literal. (Perl can, however, be persuaded to -give an error for this, by running it with the -w option.) There are at present -no other features controlled by this option. It can also be set by a (?X) -option setting within a pattern. -
    -  PCRE_FIRSTLINE
    -
    -If this option is set, an unanchored pattern is required to match before or at -the first newline in the subject string, though the matched text may continue -over the newline. -
    -  PCRE_JAVASCRIPT_COMPAT
    -
    -If this option is set, PCRE's behaviour is changed in some ways so that it is -compatible with JavaScript rather than Perl. The changes are as follows: -

    -

    -(1) A lone closing square bracket in a pattern causes a compile-time error, -because this is illegal in JavaScript (by default it is treated as a data -character). Thus, the pattern AB]CD becomes illegal when this option is set. -

    -

    -(2) At run time, a back reference to an unset subpattern group matches an empty -string (by default this causes the current matching alternative to fail). A -pattern such as (\1)(a) succeeds when this option is set (assuming it can find -an "a" in the subject), whereas it fails by default, for Perl compatibility. -

    -

    -(3) \U matches an upper case "U" character; by default \U causes a compile -time error (Perl uses \U to upper case subsequent characters). -

    -

    -(4) \u matches a lower case "u" character unless it is followed by four -hexadecimal digits, in which case the hexadecimal number defines the code point -to match. By default, \u causes a compile time error (Perl uses it to upper -case the following character). -

    -

    -(5) \x matches a lower case "x" character unless it is followed by two -hexadecimal digits, in which case the hexadecimal number defines the code point -to match. By default, as in Perl, a hexadecimal number is always expected after -\x, but it may have zero, one, or two digits (so, for example, \xz matches a -binary zero character followed by z). -

    -  PCRE_MULTILINE
    -
    -By default, for the purposes of matching "start of line" and "end of line", -PCRE treats the subject string as consisting of a single line of characters, -even if it actually contains newlines. The "start of line" metacharacter (^) -matches only at the start of the string, and the "end of line" metacharacter -($) matches only at the end of the string, or before a terminating newline -(except when PCRE_DOLLAR_ENDONLY is set). Note, however, that unless -PCRE_DOTALL is set, the "any character" metacharacter (.) does not match at a -newline. This behaviour (for ^, $, and dot) is the same as Perl. -

    -

    -When PCRE_MULTILINE is set, the "start of line" and "end of line" constructs -match immediately following or immediately before internal newlines in the -subject string, respectively, as well as at the very start and end. This is -equivalent to Perl's /m option, and it can be changed within a pattern by a -(?m) option setting. If there are no newlines in a subject string, or no -occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect. -

    -  PCRE_NEVER_UTF
    -
    -This option locks out interpretation of the pattern as UTF-8 (or UTF-16 or -UTF-32 in the 16-bit and 32-bit libraries). In particular, it prevents the -creator of the pattern from switching to UTF interpretation by starting the -pattern with (*UTF). This may be useful in applications that process patterns -from external sources. The combination of PCRE_UTF8 and PCRE_NEVER_UTF also -causes an error. -
    -  PCRE_NEWLINE_CR
    -  PCRE_NEWLINE_LF
    -  PCRE_NEWLINE_CRLF
    -  PCRE_NEWLINE_ANYCRLF
    -  PCRE_NEWLINE_ANY
    -
    -These options override the default newline definition that was chosen when PCRE -was built. Setting the first or the second specifies that a newline is -indicated by a single character (CR or LF, respectively). Setting -PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character -CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three -preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies -that any Unicode newline sequence should be recognized. -

    -

    -In an ASCII/Unicode environment, the Unicode newline sequences are the three -just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form -feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS -(paragraph separator, U+2029). For the 8-bit library, the last two are -recognized only in UTF-8 mode. -

    -

    -When PCRE is compiled to run in an EBCDIC (mainframe) environment, the code for -CR is 0x0d, the same as ASCII. However, the character code for LF is normally -0x15, though in some EBCDIC environments 0x25 is used. Whichever of these is -not LF is made to correspond to Unicode's NEL character. EBCDIC codes are all -less than 256. For more details, see the -pcrebuild -documentation. -

    -

    -The newline setting in the options word uses three bits that are treated -as a number, giving eight possibilities. Currently only six are used (default -plus the five values above). This means that if you set more than one newline -option, the combination may or may not be sensible. For example, -PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but -other combinations may yield unused numbers and cause an error. -

    -

    -The only time that a line break in a pattern is specially recognized when -compiling is when PCRE_EXTENDED is set. CR and LF are white space characters, -and so are ignored in this mode. Also, an unescaped # outside a character class -indicates a comment that lasts until after the next line break sequence. In -other circumstances, line break sequences in patterns are treated as literal -data. -

    -

    -The newline option that is set at compile time becomes the default that is used -for pcre_exec() and pcre_dfa_exec(), but it can be overridden. -

    -  PCRE_NO_AUTO_CAPTURE
    -
    -If this option is set, it disables the use of numbered capturing parentheses in -the pattern. Any opening parenthesis that is not followed by ? behaves as if it -were followed by ?: but named parentheses can still be used for capturing (and -they acquire numbers in the usual way). There is no equivalent of this option -in Perl. -
    -  PCRE_NO_AUTO_POSSESS
    -
    -If this option is set, it disables "auto-possessification". This is an -optimization that, for example, turns a+b into a++b in order to avoid -backtracks into a+ that can never be successful. However, if callouts are in -use, auto-possessification means that some of them are never taken. You can set -this option if you want the matching functions to do a full unoptimized search -and run all the callouts, but it is mainly provided for testing purposes. -
    -  PCRE_NO_START_OPTIMIZE
    -
    -This is an option that acts at matching time; that is, it is really an option -for pcre_exec() or pcre_dfa_exec(). If it is set at compile time, -it is remembered with the compiled pattern and assumed at matching time. This -is necessary if you want to use JIT execution, because the JIT compiler needs -to know whether or not this option is set. For details see the discussion of -PCRE_NO_START_OPTIMIZE -below. -
    -  PCRE_UCP
    -
    -This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, -\w, and some of the POSIX character classes. By default, only ASCII characters -are recognized, but if PCRE_UCP is set, Unicode properties are used instead to -classify characters. More details are given in the section on -generic character types -in the -pcrepattern -page. If you set PCRE_UCP, matching one of the items it affects takes much -longer. The option is available only if PCRE has been compiled with Unicode -property support. -
    -  PCRE_UNGREEDY
    -
    -This option inverts the "greediness" of the quantifiers so that they are not -greedy by default, but become greedy if followed by "?". It is not compatible -with Perl. It can also be set by a (?U) option setting within the pattern. -
    -  PCRE_UTF8
    -
    -This option causes PCRE to regard both the pattern and the subject as strings -of UTF-8 characters instead of single-byte strings. However, it is available -only when PCRE is built to include UTF support. If not, the use of this option -provokes an error. Details of how this option changes the behaviour of PCRE are -given in the -pcreunicode -page. -
    -  PCRE_NO_UTF8_CHECK
    -
    -When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is -automatically checked. There is a discussion about the -validity of UTF-8 strings -in the -pcreunicode -page. If an invalid UTF-8 sequence is found, pcre_compile() returns an -error. If you already know that your pattern is valid, and you want to skip -this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option. -When it is set, the effect of passing an invalid UTF-8 string as a pattern is -undefined. It may cause your program to crash or loop. Note that this option -can also be passed to pcre_exec() and pcre_dfa_exec(), to suppress -the validity checking of subject strings only. If the same string is being -matched many times, the option can be safely set for the second and subsequent -matchings to improve performance. -
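The interaction of PCRE_MULTILINE and PCRE_DOLLAR_ENDONLY described in the option list above can be summarised by a small predicate. The sketch below is not PCRE code; dollar_matches_at is a hypothetical helper that assumes the LF newline convention and an 8-bit subject, and it only restates the documented rules for where a dollar metacharacter may match.

```c
#include <stddef.h>

/* Hypothetical helper, not a PCRE function.
   Returns 1 if a dollar metacharacter may match at offset pos in
   subject[0..len), and 0 otherwise. Assumes LF is the newline.
   multiline and dollar_endonly stand for PCRE_MULTILINE and
   PCRE_DOLLAR_ENDONLY being set (non-zero) or unset (zero). */
int dollar_matches_at(const char *subject, size_t len, size_t pos,
                      int multiline, int dollar_endonly)
{
    if (pos > len)
        return 0;
    if (pos == len)
        return 1;                  /* very end of the subject */
    if (subject[pos] != '\n')
        return 0;                  /* not immediately before a newline */
    if (multiline)
        return 1;                  /* before any newline; ENDONLY is ignored */
    if (dollar_endonly)
        return 0;                  /* only the very end is allowed */
    return pos + 1 == len;         /* before a terminating newline only */
}
```

For the subject "ab\ncd\n", for example, offset 5 qualifies by default and with PCRE_MULTILINE but not with PCRE_DOLLAR_ENDONLY alone, while offset 2 qualifies only when PCRE_MULTILINE is set.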

    -
    COMPILATION ERROR CODES
    -

    -The following table lists the error codes that may be returned by -pcre_compile2(), along with the error messages that may be returned by -both compiling functions. Note that error messages are always 8-bit ASCII -strings, even in 16-bit or 32-bit mode. As PCRE has developed, some error codes -have fallen out of use. To avoid confusion, they have not been re-used. -

    -   0  no error
    -   1  \ at end of pattern
    -   2  \c at end of pattern
    -   3  unrecognized character follows \
    -   4  numbers out of order in {} quantifier
    -   5  number too big in {} quantifier
    -   6  missing terminating ] for character class
    -   7  invalid escape sequence in character class
    -   8  range out of order in character class
    -   9  nothing to repeat
    -  10  [this code is not in use]
    -  11  internal error: unexpected repeat
    -  12  unrecognized character after (? or (?-
    -  13  POSIX named classes are supported only within a class
    -  14  missing )
    -  15  reference to non-existent subpattern
    -  16  erroffset passed as NULL
    -  17  unknown option bit(s) set
    -  18  missing ) after comment
    -  19  [this code is not in use]
    -  20  regular expression is too large
    -  21  failed to get memory
    -  22  unmatched parentheses
    -  23  internal error: code overflow
    -  24  unrecognized character after (?<
    -  25  lookbehind assertion is not fixed length
    -  26  malformed number or name after (?(
    -  27  conditional group contains more than two branches
    -  28  assertion expected after (?(
    -  29  (?R or (?[+-]digits must be followed by )
    -  30  unknown POSIX class name
    -  31  POSIX collating elements are not supported
    -  32  this version of PCRE is compiled without UTF support
    -  33  [this code is not in use]
    -  34  character value in \x{} or \o{} is too large
    -  35  invalid condition (?(0)
    -  36  \C not allowed in lookbehind assertion
    -  37  PCRE does not support \L, \l, \N{name}, \U, or \u
    -  38  number after (?C is > 255
    -  39  closing ) for (?C expected
    -  40  recursive call could loop indefinitely
    -  41  unrecognized character after (?P
    -  42  syntax error in subpattern name (missing terminator)
    -  43  two named subpatterns have the same name
    -  44  invalid UTF-8 string (specifically UTF-8)
    -  45  support for \P, \p, and \X has not been compiled
    -  46  malformed \P or \p sequence
    -  47  unknown property name after \P or \p
    -  48  subpattern name is too long (maximum 32 characters)
    -  49  too many named subpatterns (maximum 10000)
    -  50  [this code is not in use]
    -  51  octal value is greater than \377 in 8-bit non-UTF-8 mode
    -  52  internal error: overran compiling workspace
    -  53  internal error: previously-checked referenced subpattern
    -        not found
    -  54  DEFINE group contains more than one branch
    -  55  repeating a DEFINE group is not allowed
    -  56  inconsistent NEWLINE options
    -  57  \g is not followed by a braced, angle-bracketed, or quoted
    -        name/number or by a plain number
    -  58  a numbered reference must not be zero
    -  59  an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)
    -  60  (*VERB) not recognized or malformed
    -  61  number is too big
    -  62  subpattern name expected
    -  63  digit expected after (?+
    -  64  ] is an invalid data character in JavaScript compatibility mode
    -  65  different names for subpatterns of the same number are
    -        not allowed
    -  66  (*MARK) must have an argument
    -  67  this version of PCRE is not compiled with Unicode property
    -        support
    -  68  \c must be followed by an ASCII character
    -  69  \k is not followed by a braced, angle-bracketed, or quoted name
    -  70  internal error: unknown opcode in find_fixedlength()
    -  71  \N is not supported in a class
    -  72  too many forward references
    -  73  disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
    -  74  invalid UTF-16 string (specifically UTF-16)
    -  75  name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
    -  76  character value in \u.... sequence is too large
    -  77  invalid UTF-32 string (specifically UTF-32)
    -  78  setting UTF is disabled by the application
    -  79  non-hex character in \x{} (closing brace missing?)
    -  80  non-octal character in \o{} (closing brace missing?)
    -  81  missing opening brace after \o
    -  82  parentheses are too deeply nested
    -  83  invalid range in character class
    -  84  group name must start with a non-digit
    -  85  parentheses are too deeply nested (stack check)
    -
    -The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may -be used if the limits were changed when PCRE was built. -

    -
    STUDYING A PATTERN
    -

    -pcre_extra *pcre_study(const pcre *code, int options, - const char **errptr); -

    -

    -If a compiled pattern is going to be used several times, it is worth spending -more time analyzing it in order to speed up the time taken for matching. The -function pcre_study() takes a pointer to a compiled pattern as its first -argument. If studying the pattern produces additional information that will -help speed up matching, pcre_study() returns a pointer to a -pcre_extra block, in which the study_data field points to the -results of the study. -

    -

    -The returned value from pcre_study() can be passed directly to -pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block -also contains other fields that can be set by the caller before the block is -passed; these are described -below -in the section on matching a pattern. -

    -

    -If studying the pattern does not produce any useful information, -pcre_study() returns NULL by default. In that circumstance, if the -calling program wants to pass any of the other fields to pcre_exec() or -pcre_dfa_exec(), it must set up its own pcre_extra block. However, -if pcre_study() is called with the PCRE_STUDY_EXTRA_NEEDED option, it -returns a pcre_extra block even if studying did not find any additional -information. It may still return NULL, however, if an error occurs in -pcre_study(). -

    -

    -The second argument of pcre_study() contains option bits. There are three -further options in addition to PCRE_STUDY_EXTRA_NEEDED: -

    -  PCRE_STUDY_JIT_COMPILE
    -  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
    -  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
    -
    -If any of these are set, and the just-in-time compiler is available, the -pattern is further compiled into machine code that executes much faster than -the pcre_exec() interpretive matching function. If the just-in-time -compiler is not available, these options are ignored. All undefined bits in the -options argument must be zero. -

    -

    -JIT compilation is a heavyweight optimization. It can take some time for -patterns to be analyzed, and for one-off matches and simple patterns the -benefit of faster execution might be offset by a much slower study time. -Not all patterns can be optimized by the JIT compiler. For those that cannot be -handled, matching automatically falls back to the pcre_exec() -interpreter. For more details, see the -pcrejit -documentation. -

    -

    -The third argument for pcre_study() is a pointer for an error message. If -studying succeeds (even if no data is returned), the variable it points to is -set to NULL. Otherwise it is set to point to a textual error message. This is a -static string that is part of the library. You must not try to free it. You -should test the error pointer for NULL after calling pcre_study(), to be -sure that it has run successfully. -

    -

    -When you are finished with a pattern, you can free the memory used for the -study data by calling pcre_free_study(). This function was added to the -API for release 8.20. For earlier versions, the memory could be freed with -pcre_free(), just like the pattern itself. This will still work in cases -where JIT optimization is not used, but it is advisable to change to the new -function when convenient. -

    -

    -This is a typical way in which pcre_study() is used (except that in a -real application there should be tests for errors): -

    -  int rc;
    -  int erroroffset;
    -  int ovector[30];
    -  const char *error;
    -  pcre *re;
    -  pcre_extra *sd;
    -  re = pcre_compile("pattern", 0, &error, &erroroffset, NULL);
    -  sd = pcre_study(
    -    re,             /* result of pcre_compile() */
    -    0,              /* no options */
    -    &error);        /* set to NULL or points to a message */
    -  rc = pcre_exec(   /* see below for details of pcre_exec() options */
    -    re, sd, "subject", 7, 0, 0, ovector, 30);
    -  ...
    -  pcre_free_study(sd);
    -  pcre_free(re);
    -
    -Studying a pattern does two things: first, a lower bound for the length of -subject string that is needed to match the pattern is computed. This does not -mean that there are any strings of that length that match, but it does -guarantee that no shorter strings match. The value is used to avoid wasting -time by trying to match strings that are shorter than the lower bound. You can -find out the value in a calling program via the pcre_fullinfo() function. -

    -

    -Studying a pattern is also useful for non-anchored patterns that do not have a -single fixed starting character. A bitmap of possible starting bytes is -created. This speeds up finding a position in the subject at which to start -matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256. -In 32-bit mode, the bitmap is used for 32-bit values less than 256.) -

    -

    -These two optimizations apply to both pcre_exec() and -pcre_dfa_exec(), and the information is also used by the JIT compiler. -The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option. -You might want to do this if your pattern contains callouts or (*MARK) and you -want to make use of these facilities in cases where matching fails. -

    -

    -PCRE_NO_START_OPTIMIZE can be specified at either compile time or execution -time. However, if PCRE_NO_START_OPTIMIZE is passed to pcre_exec(), (that -is, after any JIT compilation has happened) JIT execution is disabled. For JIT -execution to work with PCRE_NO_START_OPTIMIZE, the option must be set at -compile time. -

    -

    -There is a longer discussion of PCRE_NO_START_OPTIMIZE -below. -

    -
    LOCALE SUPPORT
    -

    -PCRE handles caseless matching, and determines whether characters are letters, -digits, or whatever, by reference to a set of tables, indexed by character -code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this -applies only to characters with code points less than 256. By default, -higher-valued code points never match escapes such as \w or \d. However, if -PCRE is built with Unicode property support, all characters can be tested with -\p and \P, or, alternatively, the PCRE_UCP option can be set when a pattern -is compiled; this causes \w and friends to use Unicode property support -instead of the built-in tables. -

    -

    -The use of locales with Unicode is discouraged. If you are handling characters -with code points greater than 128, you should either use Unicode support, or -use locales, but not try to mix the two. -

    -

    -PCRE contains an internal set of tables that are used when the final argument -of pcre_compile() is NULL. These are sufficient for many applications. -Normally, the internal tables recognize only ASCII characters. However, when -PCRE is built, it is possible to cause the internal tables to be rebuilt in the -default "C" locale of the local system, which may cause them to be different. -

    -

    -The internal tables can always be overridden by tables supplied by the -application that calls PCRE. These may be created in a different locale from -the default. As more and more applications change to using Unicode, the need -for this locale support is expected to die away. -

    -

    -External tables are built by calling the pcre_maketables() function, -which has no arguments, in the relevant locale. The result can then be passed -to pcre_compile() as often as necessary. For example, to build and use -tables that are appropriate for the French locale (where accented characters -with values greater than 128 are treated as letters), the following code could -be used: -

    -  setlocale(LC_CTYPE, "fr_FR");
    -  tables = pcre_maketables();
    -  re = pcre_compile(..., tables);
    -
    -The locale name "fr_FR" is used on Linux and other Unix-like systems; if you -are using Windows, the name for the French locale is "french". -

    -

    -When pcre_maketables() runs, the tables are built in memory that is -obtained via pcre_malloc. It is the caller's responsibility to ensure -that the memory containing the tables remains available for as long as it is -needed. -

    -

    -The pointer that is passed to pcre_compile() is saved with the compiled -pattern, and the same tables are used via this pointer by pcre_study() -and also by pcre_exec() and pcre_dfa_exec(). Thus, for any single -pattern, compilation, studying and matching all happen in the same locale, but -different patterns can be processed in different locales. -

    -

    -It is possible to pass a table pointer or NULL (indicating the use of the -internal tables) to pcre_exec() or pcre_dfa_exec() (see the -discussion below in the section on matching a pattern). This facility is -provided for use with pre-compiled patterns that have been saved and reloaded. -Character tables are not saved with patterns, so if a non-standard table was -used at compile time, it must be provided again when the reloaded pattern is -matched. Attempting to use this facility to match a pattern in a different -locale from the one in which it was compiled is likely to lead to anomalous -(usually incorrect) results. -

    -
    INFORMATION ABOUT A PATTERN
    -

    -int pcre_fullinfo(const pcre *code, const pcre_extra *extra, - int what, void *where); -

    -

    -The pcre_fullinfo() function returns information about a compiled -pattern. It replaces the pcre_info() function, which was removed from the -library at version 8.30, after more than 10 years of obsolescence. -

    -

    -The first argument for pcre_fullinfo() is a pointer to the compiled -pattern. The second argument is the result of pcre_study(), or NULL if -the pattern was not studied. The third argument specifies which piece of -information is required, and the fourth argument is a pointer to a variable -to receive the data. The yield of the function is zero for success, or one of -the following negative numbers: -

    -  PCRE_ERROR_NULL           the argument code was NULL
    -                            the argument where was NULL
    -  PCRE_ERROR_BADMAGIC       the "magic number" was not found
    -  PCRE_ERROR_BADENDIANNESS  the pattern was compiled with different
    -                            endianness
    -  PCRE_ERROR_BADOPTION      the value of what was invalid
    -  PCRE_ERROR_UNSET          the requested field is not set
    -
    -The "magic number" is placed at the start of each compiled pattern as a simple -check against passing an arbitrary memory pointer. The endianness error can -occur if a compiled pattern is saved and reloaded on a different host. Here is -a typical call of pcre_fullinfo(), to obtain the length of the compiled -pattern: -
    -  int rc;
    -  size_t length;
    -  rc = pcre_fullinfo(
    -    re,               /* result of pcre_compile() */
    -    sd,               /* result of pcre_study(), or NULL */
    -    PCRE_INFO_SIZE,   /* what is required */
    -    &length);         /* where to put the data */
    -
    -The possible values for the third argument are defined in pcre.h, and are -as follows: -
    -  PCRE_INFO_BACKREFMAX
    -
    -Return the number of the highest back reference in the pattern. The fourth -argument should point to an int variable. Zero is returned if there are -no back references. -
    -  PCRE_INFO_CAPTURECOUNT
    -
    -Return the number of capturing subpatterns in the pattern. The fourth argument -should point to an int variable. -
    -  PCRE_INFO_DEFAULT_TABLES
    -
    -Return a pointer to the internal default character tables within PCRE. The -fourth argument should point to an unsigned char * variable. This -information call is provided for internal use by the pcre_study() -function. External callers can cause PCRE to use its internal tables by passing -a NULL table pointer. -
    -  PCRE_INFO_FIRSTBYTE (deprecated)
    -
    -Return information about the first data unit of any matched string, for a -non-anchored pattern. The name of this option refers to the 8-bit library, -where data units are bytes. The fourth argument should point to an int -variable. Negative values are used for special cases. However, this means that -when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of -characters cannot be returned. For this reason, this value is deprecated; use -PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead. -

    -

    -If there is a fixed first value, for example, the letter "c" from a pattern -such as (cat|cow|coyote), its value is returned. In the 8-bit library, the -value is always less than 256. In the 16-bit library the value can be up to -0xffff. In the 32-bit library the value can be up to 0x10ffff. -

    -

    -If there is no fixed first value, and if either -
    -
    -(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch -starts with "^", or -
    -
    -(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set -(if it were set, the pattern would be anchored), -
    -
    --1 is returned, indicating that the pattern matches only at the start of a -subject string or after any newline within the string. Otherwise -2 is -returned. For anchored patterns, -2 is returned. -

    -  PCRE_INFO_FIRSTCHARACTER
    -
    -Return the value of the first data unit (non-UTF character) of any matched -string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; -otherwise return 0. The fourth argument should point to a uint32_t -variable. -

    -

    -In the 8-bit library, the value is always less than 256. In the 16-bit library -the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value -can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode. -

    -  PCRE_INFO_FIRSTCHARACTERFLAGS
    -
    -Return information about the first data unit of any matched string, for a -non-anchored pattern. The fourth argument should point to an int -variable. -

    -

    -If there is a fixed first value, for example, the letter "c" from a pattern -such as (cat|cow|coyote), 1 is returned, and the character value can be -retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and -if either -
    -
    -(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch -starts with "^", or -
    -
    -(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set -(if it were set, the pattern would be anchored), -
    -
    -2 is returned, indicating that the pattern matches only at the start of a -subject string or after any newline within the string. Otherwise 0 is -returned. For anchored patterns, 0 is returned. -

    -  PCRE_INFO_FIRSTTABLE
    -
    -If the pattern was studied, and this resulted in the construction of a 256-bit -table indicating a fixed set of values for the first data unit in any matching -string, a pointer to the table is returned. Otherwise NULL is returned. The -fourth argument should point to an unsigned char * variable. -
    -  PCRE_INFO_HASCRORLF
    -
    -Return 1 if the pattern contains any explicit matches for CR or LF characters, -otherwise 0. The fourth argument should point to an int variable. An -explicit match is either a literal CR or LF character, or \r or \n. -
    -  PCRE_INFO_JCHANGED
    -
    -Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise -0. The fourth argument should point to an int variable. (?J) and -(?-J) set and unset the local PCRE_DUPNAMES option, respectively. -
    -  PCRE_INFO_JIT
    -
    -Return 1 if the pattern was studied with one of the JIT options, and -just-in-time compiling was successful. The fourth argument should point to an -int variable. A return value of 0 means that JIT support is not available -in this version of PCRE, or that the pattern was not studied with a JIT option, -or that the JIT compiler could not handle this particular pattern. See the -pcrejit -documentation for details of what can and cannot be handled. -
    -  PCRE_INFO_JITSIZE
    -
    -If the pattern was successfully studied with a JIT option, return the size of -the JIT compiled code, otherwise return zero. The fourth argument should point -to a size_t variable. -
    -  PCRE_INFO_LASTLITERAL
    -
    -Return the value of the rightmost literal data unit that must exist in any -matched string, other than at its start, if such a value has been recorded. The -fourth argument should point to an int variable. If there is no such -value, -1 is returned. For anchored patterns, a last literal value is recorded -only if it follows something of variable length. For example, for the pattern -/^a\d+z\d+/ the returned value is "z", but for /^a\dz\d/ the returned value -is -1. -

    -

    -Because this value cannot represent the full 32-bit range of characters when -the 32-bit library is used in non-UTF-32 mode, it is deprecated; use -PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR instead. -

    -  PCRE_INFO_MATCH_EMPTY
    -
    -Return 1 if the pattern can match an empty string, otherwise 0. The fourth -argument should point to an int variable. -
    -  PCRE_INFO_MATCHLIMIT
    -
    -If the pattern set a match limit by including an item of the form -(*LIMIT_MATCH=nnnn) at the start, the value is returned. The fourth argument -should point to an unsigned 32-bit integer. If no such value has been set, the -call to pcre_fullinfo() returns the error PCRE_ERROR_UNSET. -
    -  PCRE_INFO_MAXLOOKBEHIND
    -
    -Return the number of characters (NB not data units) in the longest lookbehind -assertion in the pattern. This information is useful when doing multi-segment -matching using the partial matching facilities. Note that the simple assertions -\b and \B require a one-character lookbehind. \A also registers a -one-character lookbehind, though it does not actually inspect the previous -character. This is to ensure that at least one character from the old segment -is retained when a new segment is processed. Otherwise, if there are no -lookbehinds in the pattern, \A might match incorrectly at the start of a new -segment. -
    -  PCRE_INFO_MINLENGTH
    -
    -If the pattern was studied and a minimum length for matching subject strings -was computed, its value is returned. Otherwise the returned value is -1. The -value is a number of characters, which in UTF mode may be different from the -number of data units. The fourth argument should point to an int -variable. A non-negative value is a lower bound to the length of any matching -string. There may not be any strings of that length that do actually match, but -every string that does match is at least that long. -
    -  PCRE_INFO_NAMECOUNT
    -  PCRE_INFO_NAMEENTRYSIZE
    -  PCRE_INFO_NAMETABLE
    -
    -PCRE supports the use of named as well as numbered capturing parentheses. The -names are just an additional way of identifying the parentheses, which still -acquire numbers. Several convenience functions such as -pcre_get_named_substring() are provided for extracting captured -substrings by name. It is also possible to extract the data directly, by first -converting the name to a number in order to access the correct pointers in the -output vector (described with pcre_exec() below). To do the conversion, -you need to use the name-to-number map, which is described by these three -values. -

    -

    -The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives -the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each -entry; both of these return an int value. The entry size depends on the -length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first -entry of the table. This is a pointer to char in the 8-bit library, where -the first two bytes of each entry are the number of the capturing parenthesis, -most significant byte first. In the 16-bit library, the pointer points to -16-bit data units, the first of which contains the parenthesis number. In the -32-bit library, the pointer points to 32-bit data units, the first of which -contains the parenthesis number. The rest of the entry is the corresponding -name, zero terminated. -

    -

    -The names are in alphabetical order. If (?| is used to create multiple groups -with the same number, as described in the -section on duplicate subpattern numbers -in the -pcrepattern -page, the groups may be given the same name, but there is only one entry in the -table. Different names for groups of the same number are not permitted. -Duplicate names for subpatterns with different numbers are permitted, -but only if PCRE_DUPNAMES is set. They appear in the table in the order in -which they were found in the pattern. In the absence of (?| this is the order -of increasing number; when (?| is used this is not necessarily the case because -later subpatterns may have lower numbers. -

    -

    -As a simple example of the name/number table, consider the following pattern -after compilation by the 8-bit library (assume PCRE_EXTENDED is set, so white -space - including newlines - is ignored): -

    -  (?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) )
    -
    -There are four named subpatterns, so the table has four entries, and each entry -in the table is eight bytes long. The table is as follows, with non-printing -bytes shown in hexadecimal, and undefined bytes shown as ??: -
    -  00 01 d  a  t  e  00 ??
    -  00 05 d  a  y  00 ?? ??
    -  00 04 m  o  n  t  h  00
    -  00 02 y  e  a  r  00 ??
    -
    -When writing code to extract data from named subpatterns using the -name-to-number map, remember that the length of the entries is likely to be -different for each compiled pattern. -
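The 8-bit table layout described above can be walked with a few lines of C. The following is only a sketch: lookup_group_number() and example_table are hypothetical names, not part of PCRE, and the table is the four-entry example with its undefined bytes set to zero. In a real program the table pointer, entry count, and entry size would come from PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <string.h>

/* Hypothetical helper (not a PCRE function): scan an 8-bit name table and
   return the capturing group number for `name`, or -1 if it is absent. */
int lookup_group_number(const unsigned char *table, int name_count,
                        int entry_size, const char *name)
{
    for (int i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * entry_size;
        /* First two bytes: group number, most significant byte first. */
        int number = (entry[0] << 8) | entry[1];
        /* Rest of the entry: the zero-terminated name. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return -1;
}

/* The example table: four entries of eight bytes each.
   Bytes shown as ?? in the text are set to zero here. */
static const unsigned char example_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0,   0,
    0, 5, 'd', 'a', 'y', 0,   0,   0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0,   0,
};
```

Because the entry size is passed in rather than hard-coded, the same loop works for any compiled pattern, whatever the length of its longest name.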
    -  PCRE_INFO_OKPARTIAL
    -
    -Return 1 if the pattern can be used for partial matching with -pcre_exec(), otherwise 0. The fourth argument should point to an -int variable. From release 8.00, this always returns 1, because the -restrictions that previously applied to partial matching have been lifted. The -pcrepartial -documentation gives details of partial matching. -
    -  PCRE_INFO_OPTIONS
    -
    -Return a copy of the options with which the pattern was compiled. The fourth -argument should point to an unsigned long int variable. These option bits -are those specified in the call to pcre_compile(), modified by any -top-level option settings at the start of the pattern itself. In other words, -they are the options that will be in force when matching starts. For example, -if the pattern /(?im)abc(?-i)d/ is compiled with the PCRE_EXTENDED option, the -result is PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED. -

    -

    -A pattern is automatically anchored by PCRE if all of its top-level -alternatives begin with one of the following: -

    -  ^     unless PCRE_MULTILINE is set
    -  \A    always
    -  \G    always
    -  .*    if PCRE_DOTALL is set and there are no back references to the subpattern in which .* appears
    -
    -For such patterns, the PCRE_ANCHORED bit is set in the options returned by -pcre_fullinfo(). -
    -  PCRE_INFO_RECURSIONLIMIT
    -
    -If the pattern set a recursion limit by including an item of the form -(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The fourth -argument should point to an unsigned 32-bit integer. If no such value has been -set, the call to pcre_fullinfo() returns the error PCRE_ERROR_UNSET. -
    -  PCRE_INFO_SIZE
    -
    -Return the size of the compiled pattern in bytes (for all three libraries). The -fourth argument should point to a size_t variable. This value does not -include the size of the pcre structure that is returned by -pcre_compile(). The value that is passed as the argument to -pcre_malloc() when pcre_compile() is getting memory in which to -place the compiled data is the value returned by this option plus the size of -the pcre structure. Studying a compiled pattern, with or without JIT, -does not alter the value returned by this option. -
    -  PCRE_INFO_STUDYSIZE
    -
    -Return the size in bytes (for all three libraries) of the data block pointed to -by the study_data field in a pcre_extra block. If pcre_extra -is NULL, or there is no study data, zero is returned. The fourth argument -should point to a size_t variable. The study_data field is set by -pcre_study() to record information that will speed up matching (see the -section entitled -"Studying a pattern" -above). The format of the study_data block is private, but its length -is made available via this option so that it can be saved and restored (see the -pcreprecompile -documentation for details). -
    -  PCRE_INFO_REQUIREDCHARFLAGS
    -
    -Returns 1 if there is a rightmost literal data unit that must exist in any -matched string, other than at its start. The fourth argument should point to -an int variable. If there is no such value, 0 is returned. If returning -1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR. -

    -

    -For anchored patterns, a last literal value is recorded only if it follows -something of variable length. For example, for the pattern /^a\d+z\d+/ the -returned value is 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for -/^a\dz\d/ the returned value is 0. -

    -  PCRE_INFO_REQUIREDCHAR
    -
    -Return the value of the rightmost literal data unit that must exist in any -matched string, other than at its start, if such a value has been recorded. The -fourth argument should point to a uint32_t variable. If there is no such -value, 0 is returned. -

    -
    REFERENCE COUNTS
    -

    -int pcre_refcount(pcre *code, int adjust); -

    -

    -The pcre_refcount() function is used to maintain a reference count in the -data block that contains a compiled pattern. It is provided for the benefit of -applications that operate in an object-oriented manner, where different parts -of the application may be using the same compiled pattern, but you want to free -the block when they are all done. -

    -

    -When a pattern is compiled, the reference count field is initialized to zero. -It is changed only by calling this function, whose action is to add the -adjust value (which may be positive or negative) to it. The yield of the -function is the new value. However, the value of the count is constrained to -lie between 0 and 65535, inclusive. If the new value is outside these limits, -it is forced to the appropriate limit value. -
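The clamping rule just described can be modelled in plain C. This is an illustrative sketch only: adjusted_refcount() is a hypothetical function that restates the documented behaviour, not the code inside pcre_refcount().

```c
/* Hypothetical model of the documented rule (not PCRE's own code):
   add `adjust` to the current count and force the result into 0..65535. */
int adjusted_refcount(int current, int adjust)
{
    long value = (long)current + adjust;
    if (value < 0)
        value = 0;        /* forced to the lower limit */
    if (value > 65535)
        value = 65535;    /* forced to the upper limit */
    return (int)value;
}
```

Under this model, decrementing a count that is already zero leaves it at zero, so a caller cannot detect an unbalanced release from the returned value alone.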

    -

    -Except when it is zero, the reference count is not correctly preserved if a -pattern is compiled on one host and then transferred to a host whose byte-order -is different. (This seems a highly unlikely scenario.) -

    -
    MATCHING A PATTERN: THE TRADITIONAL FUNCTION
    -

    -int pcre_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize); -

    -

    -The function pcre_exec() is called to match a subject string against a -compiled pattern, which is passed in the code argument. If the -pattern was studied, the result of the study should be passed in the -extra argument. You can call pcre_exec() with the same code -and extra arguments as many times as you like, in order to match -different subject strings with the same pattern. -

    -

    -This function is the main matching facility of the library, and it operates in -a Perl-like manner. For specialist use there is also an alternative matching -function, which is described -below -in the section about the pcre_dfa_exec() function. -

    -

    -In most applications, the pattern will have been compiled (and optionally -studied) in the same process that calls pcre_exec(). However, it is -possible to save compiled patterns and study data, and then use them later -in different processes, possibly even on different hosts. For a discussion -about this, see the -pcreprecompile -documentation. -

    -

    -Here is an example of a simple call to pcre_exec(): -

    -  int rc;
    -  int ovector[30];
    -  rc = pcre_exec(
    -    re,             /* result of pcre_compile() */
    -    NULL,           /* we didn't study the pattern */
    -    "some string",  /* the subject string */
    -    11,             /* the length of the subject string */
    -    0,              /* start at offset 0 in the subject */
    -    0,              /* default options */
    -    ovector,        /* vector of integers for substring information */
    -    30);            /* number of elements (NOT size in bytes) */
    -
    -

    -
    -Extra data for pcre_exec() -
    -

    -If the extra argument is not NULL, it must point to a pcre_extra -data block. The pcre_study() function returns such a block (when it -doesn't return NULL), but you can also create one for yourself, and pass -additional information in it. The pcre_extra block contains the following -fields (not necessarily in this order): -

    -  unsigned long int flags;
    -  void *study_data;
    -  void *executable_jit;
    -  unsigned long int match_limit;
    -  unsigned long int match_limit_recursion;
    -  void *callout_data;
    -  const unsigned char *tables;
    -  unsigned char **mark;
    -
    -In the 16-bit version of this structure, the mark field has type -"PCRE_UCHAR16 **". -
    -
    -In the 32-bit version of this structure, the mark field has type -"PCRE_UCHAR32 **". -

    -

    -The flags field is used to specify which of the other fields are set. The -flag bits are: -

    -  PCRE_EXTRA_CALLOUT_DATA
    -  PCRE_EXTRA_EXECUTABLE_JIT
    -  PCRE_EXTRA_MARK
    -  PCRE_EXTRA_MATCH_LIMIT
    -  PCRE_EXTRA_MATCH_LIMIT_RECURSION
    -  PCRE_EXTRA_STUDY_DATA
    -  PCRE_EXTRA_TABLES
    -
    -Other flag bits should be set to zero. The study_data field and sometimes -the executable_jit field are set in the pcre_extra block that is -returned by pcre_study(), together with the appropriate flag bits. You -should not set these yourself, but you may add to the block by setting other -fields and their corresponding flag bits. -

    -

    -The match_limit field provides a means of preventing PCRE from using up a -vast amount of resources when running patterns that are not going to match, -but which have a very large number of possibilities in their search trees. The -classic example is a pattern that uses nested unlimited repeats. -

    -

    -Internally, pcre_exec() uses a function called match(), which it -calls repeatedly (sometimes recursively). The limit set by match_limit is -imposed on the number of times this function is called during a match, which -has the effect of limiting the amount of backtracking that can take place. For -patterns that are not anchored, the count restarts from zero for each position -in the subject string. -

    -

    -When pcre_exec() is called with a pattern that was successfully studied -with a JIT option, the way that the matching is executed is entirely different. -However, there is still the possibility of runaway matching that goes on for a -very long time, and so the match_limit value is also used in this case -(but in a different way) to limit how long the matching can continue. -

    -

    -The default value for the limit can be set when PCRE is built; the default -default is 10 million, which handles all but the most extreme cases. You can -override the default by supplying pcre_exec() with a pcre_extra -block in which match_limit is set, and PCRE_EXTRA_MATCH_LIMIT is set in -the flags field. If the limit is exceeded, pcre_exec() returns -PCRE_ERROR_MATCHLIMIT. -

    -

    -A value for the match limit may also be supplied by an item at the start of a -pattern of the form -

    -  (*LIMIT_MATCH=d)
    -
    -where d is a decimal number. However, such a setting is ignored unless d is -less than the limit set by the caller of pcre_exec() or, if no such limit -is set, less than the default. -
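The precedence between the three sources of a match limit can be restated as a small function. This is a sketch under stated assumptions: effective_match_limit() and its parameters are hypothetical and only model the rule in the text; they are not part of the PCRE API.

```c
/* Hypothetical model (not a PCRE function) of which match limit applies.
   caller_set / pattern_set are non-zero when that source supplied a value. */
unsigned long effective_match_limit(int caller_set, unsigned long caller_limit,
                                    unsigned long default_limit,
                                    int pattern_set, unsigned long pattern_limit)
{
    /* The caller's limit, if given, replaces the build-time default. */
    unsigned long limit = caller_set ? caller_limit : default_limit;

    /* A (*LIMIT_MATCH=d) item is honoured only if it lowers the limit. */
    if (pattern_set && pattern_limit < limit)
        limit = pattern_limit;

    return limit;
}
```

The consequence is that a pattern can tighten the limit but never loosen it, which matters when patterns come from an untrusted source.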

    -

    -The match_limit_recursion field is similar to match_limit, but -instead of limiting the total number of times that match() is called, it -limits the depth of recursion. The recursion depth is a smaller number than the -total number of calls, because not all calls to match() are recursive. -This limit is of use only if it is set smaller than match_limit. -

    -

    -Limiting the recursion depth limits the amount of machine stack that can be -used, or, when PCRE has been compiled to use memory on the heap instead of the -stack, the amount of heap memory that can be used. This limit is not relevant, -and is ignored, when matching is done using JIT compiled code. -

    -

    -The default value for match_limit_recursion can be set when PCRE is -built; the default default is the same value as the default for -match_limit. You can override the default by supplying pcre_exec() -with a pcre_extra block in which match_limit_recursion is set, and -PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the limit -is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. -

    -

    -A value for the recursion limit may also be supplied by an item at the start of -a pattern of the form -

    -  (*LIMIT_RECURSION=d)
    -
    -where d is a decimal number. However, such a setting is ignored unless d is -less than the limit set by the caller of pcre_exec() or, if no such limit -is set, less than the default. -

    -

    -The callout_data field is used in conjunction with the "callout" feature, -and is described in the -pcrecallout -documentation. -

    -

    -The tables field is provided for use with patterns that have been -pre-compiled using custom character tables, saved to disc or elsewhere, and -then reloaded, because the tables that were used to compile a pattern are not -saved with it. See the -pcreprecompile -documentation for a discussion of saving compiled patterns for later use. If -NULL is passed using this mechanism, it forces PCRE's internal tables to be -used. -

    -

    -Warning: The tables that pcre_exec() uses must be the same as those -that were used when the pattern was compiled. If this is not the case, the -behaviour of pcre_exec() is undefined. Therefore, when a pattern is -compiled and matched in the same process, this field should never be set. In -this (the most common) case, the correct table pointer is automatically passed -with the compiled pattern from pcre_compile() to pcre_exec(). -

    -

    -If PCRE_EXTRA_MARK is set in the flags field, the mark field must -be set to point to a suitable variable. If the pattern contains any -backtracking control verbs such as (*MARK:NAME), and the execution ends up with -a name to pass back, a pointer to the name string (zero terminated) is placed -in the variable pointed to by the mark field. The names are within the -compiled pattern; if you wish to retain such a name you must copy it before -freeing the memory of a compiled pattern. If there is no name to pass back, the -variable pointed to by the mark field is set to NULL. For details of the -backtracking control verbs, see the section entitled -"Backtracking control" -in the -pcrepattern -documentation. -

    -
    -Option bits for pcre_exec() -
    -

    -The unused bits of the options argument for pcre_exec() must be -zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, -PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, -PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and -PCRE_PARTIAL_SOFT. -

    -

    -If the pattern was successfully studied with one of the just-in-time (JIT) -compile options, the only supported options for JIT execution are -PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, -PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an -unsupported option is used, JIT execution is disabled and the normal -interpretive code in pcre_exec() is run. -

    -  PCRE_ANCHORED
    -
    -The PCRE_ANCHORED option limits pcre_exec() to matching at the first -matching position. If a pattern was compiled with PCRE_ANCHORED, or turned out -to be anchored by virtue of its contents, it cannot be made unanchored at -matching time. -
    -  PCRE_BSR_ANYCRLF
    -  PCRE_BSR_UNICODE
    -
    -These options (which are mutually exclusive) control what the \R escape -sequence matches. The choice is either to match only CR, LF, or CRLF, or to -match any Unicode newline sequence. These options override the choice that was -made or defaulted when the pattern was compiled. -
    -  PCRE_NEWLINE_CR
    -  PCRE_NEWLINE_LF
    -  PCRE_NEWLINE_CRLF
    -  PCRE_NEWLINE_ANYCRLF
    -  PCRE_NEWLINE_ANY
    -
    -These options override the newline definition that was chosen or defaulted when -the pattern was compiled. For details, see the description of -pcre_compile() above. During matching, the newline choice affects the -behaviour of the dot, circumflex, and dollar metacharacters. It may also alter -the way the match position is advanced after a match failure for an unanchored -pattern. -

    -

    -When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a -match attempt for an unanchored pattern fails when the current position is at a -CRLF sequence, and the pattern contains no explicit matches for CR or LF -characters, the match position is advanced by two characters instead of one, in -other words, to after the CRLF. -

    -

    -The above rule is a compromise that makes the most common cases work as -expected. For example, if the pattern is .+A (and the PCRE_DOTALL option is not -set), it does not match the string "\r\nA" because, after failing at the -start, it skips both the CR and the LF before retrying. However, the pattern -[\r\n]A does match that string, because it contains an explicit CR or LF -reference, and so advances only by one character after the first failure. -

    -

    -An explicit match for CR or LF is either a literal appearance of one of those -characters, or one of the \r or \n escape sequences. Implicit matches such as -[^X] do not count, nor does \s (which includes CR and LF in the characters -that it matches). -

    -

    -Notwithstanding the above, anomalous effects may still occur when CRLF is a -valid newline sequence and explicit \r or \n escapes appear in the pattern. -

    -  PCRE_NOTBOL
    -
    -This option specifies that the first character of the subject string is not the -beginning of a line, so the circumflex metacharacter should not match before -it. Setting this without PCRE_MULTILINE (at compile time) causes circumflex -never to match. This option affects only the behaviour of the circumflex -metacharacter. It does not affect \A. -
    -  PCRE_NOTEOL
    -
    -This option specifies that the end of the subject string is not the end of a -line, so the dollar metacharacter should not match it nor (except in multiline -mode) a newline immediately before it. Setting this without PCRE_MULTILINE (at -compile time) causes dollar never to match. This option affects only the -behaviour of the dollar metacharacter. It does not affect \Z or \z. -
    -  PCRE_NOTEMPTY
    -
    -An empty string is not considered to be a valid match if this option is set. If -there are alternatives in the pattern, they are tried. If all the alternatives -match the empty string, the entire match fails. For example, if the pattern -
    -  a?b?
    -
    -is applied to a string not beginning with "a" or "b", it matches an empty -string at the start of the subject. With PCRE_NOTEMPTY set, this match is not -valid, so PCRE searches further into the string for occurrences of "a" or "b". -
    -  PCRE_NOTEMPTY_ATSTART
    -
    -This is like PCRE_NOTEMPTY, except that an empty string match that is not at -the start of the subject is permitted. If the pattern is anchored, such a match -can occur only if the pattern contains \K. -

    -

    -Perl has no direct equivalent of PCRE_NOTEMPTY or PCRE_NOTEMPTY_ATSTART, but it -does make a special case of a pattern match of the empty string within its -split() function, and when using the /g modifier. It is possible to -emulate Perl's behaviour after matching a null string by first trying the match -again at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then -if that fails, by advancing the starting offset (see below) and trying an -ordinary match again. There is some code that demonstrates how to do this in -the -pcredemo -sample program. In the most general case, you have to check to see if the -newline convention recognizes CRLF as a newline, and if so, and the current -character is CR followed by LF, advance the starting offset by two characters -instead of one. -

    -  PCRE_NO_START_OPTIMIZE
    -
    -There are a number of optimizations that pcre_exec() uses at the start of -a match, in order to speed up the process. For example, if it is known that an -unanchored match must start with a specific character, it searches the subject -for that character, and fails immediately if it cannot find it, without -actually running the main matching function. This means that a special item -such as (*COMMIT) at the start of a pattern is not considered until after a -suitable starting point for the match has been found. Also, when callouts or -(*MARK) items are in use, these "start-up" optimizations can cause them to be -skipped if the pattern is never actually used. The start-up optimizations are -in effect a pre-scan of the subject that takes place before the pattern is run. -

    -

    -The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly -causing performance to suffer, but ensuring that in cases where the result is -"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK) -are considered at every possible starting position in the subject string. If -PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching -time. The use of PCRE_NO_START_OPTIMIZE at matching time (that is, passing it -to pcre_exec()) disables JIT execution; in this situation, matching is -always done interpretively. -

    -

    -Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation. -Consider the pattern -

    -  (*COMMIT)ABC
    -
    -When this is compiled, PCRE records the fact that a match must start with the -character "A". Suppose the subject string is "DEFABC". The start-up -optimization scans along the subject, finds "A" and runs the first match -attempt from there. The (*COMMIT) item means that the pattern must match the -current starting position, which in this case, it does. However, if the same -match is run with PCRE_NO_START_OPTIMIZE set, the initial scan along the -subject string does not happen. The first match attempt is run starting from -"D" and when this fails, (*COMMIT) prevents any further matches being tried, so -the overall result is "no match". If the pattern is studied, more start-up -optimizations may be used. For example, a minimum length for the subject may be -recorded. Consider the pattern -
    -  (*MARK:A)(X|Y)
    -
    -The minimum length for a match is one character. If the subject is "ABC", there -will be attempts to match "ABC", "BC", "C", and then finally an empty string. -If the pattern is studied, the final attempt does not take place, because PCRE -knows that the subject is too short, and so the (*MARK) is never encountered. -In this case, studying the pattern does not affect the overall match result, -which is still "no match", but it does affect the auxiliary information that is -returned. -
    -  PCRE_NO_UTF8_CHECK
    -
    -When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 -string is automatically checked when pcre_exec() is subsequently called. -The entire string is checked before any other processing takes place. The value -of startoffset is also checked to ensure that it points to the start of a -UTF-8 character. There is a discussion about the -validity of UTF-8 strings -in the -pcreunicode -page. If an invalid sequence of bytes is found, pcre_exec() returns the -error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a -truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In both -cases, information about the precise nature of the error may also be returned -(see the descriptions of these errors in the section entitled -"Error return values from pcre_exec()" -below). -If startoffset contains a value that does not point to the start of a -UTF-8 character (or to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is -returned. -

    -

    -If you already know that your subject is valid, and you want to skip these -checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when -calling pcre_exec(). You might want to do this for the second and -subsequent calls to pcre_exec() if you are making repeated calls to find -all the matches in a single subject string. However, you should be sure that -the value of startoffset points to the start of a character (or the end -of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an -invalid string as a subject or an invalid value of startoffset is -undefined. Your program may crash or loop. -

    -  PCRE_PARTIAL_HARD
    -  PCRE_PARTIAL_SOFT
    -
    -These options turn on the partial matching feature. For backwards -compatibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial match -occurs if the end of the subject string is reached successfully, but there are -not enough subject characters to complete the match. If this happens when -PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set, matching continues by -testing any remaining alternatives. Only if no complete match can be found is -PCRE_ERROR_PARTIAL returned instead of PCRE_ERROR_NOMATCH. In other words, -PCRE_PARTIAL_SOFT says that the caller is prepared to handle a partial match, -but only if no complete match can be found. -

    -

    -If PCRE_PARTIAL_HARD is set, it overrides PCRE_PARTIAL_SOFT. In this case, if a -partial match is found, pcre_exec() immediately returns -PCRE_ERROR_PARTIAL, without considering any other alternatives. In other words, -when PCRE_PARTIAL_HARD is set, a partial match is considered to be more -important than an alternative complete match. -

    -

    -In both cases, the portion of the string that was inspected when the partial -match was found is set as the first matching string. There is a more detailed -discussion of partial and multi-segment matching, with examples, in the -pcrepartial -documentation. -

    -
    -The string to be matched by pcre_exec() -
    -

    -The subject string is passed to pcre_exec() as a pointer in -subject, a length in length, and a starting offset in -startoffset. The units for length and startoffset are bytes -for the 8-bit library, 16-bit data items for the 16-bit library, and 32-bit -data items for the 32-bit library. -

    -

    -If startoffset is negative or greater than the length of the subject, -pcre_exec() returns PCRE_ERROR_BADOFFSET. When the starting offset is -zero, the search for a match starts at the beginning of the subject, and this -is by far the most common case. In UTF-8 or UTF-16 mode, the offset must point -to the start of a character, or the end of the subject (in UTF-32 mode, one -data unit equals one character, so all offsets are valid). Unlike the pattern -string, the subject may contain binary zeroes. -

    -

    -A non-zero starting offset is useful when searching for another match in the -same subject by calling pcre_exec() again after a previous success. -Setting startoffset differs from just passing over a shortened string and -setting PCRE_NOTBOL in the case of a pattern that begins with any kind of -lookbehind. For example, consider the pattern -

    -  \Biss\B
    -
    -which finds occurrences of "iss" in the middle of words. (\B matches only if -the current position in the subject is not a word boundary.) When applied to -the string "Mississippi" the first call to pcre_exec() finds the first -occurrence. If pcre_exec() is called again with just the remainder of the -subject, namely "issippi", it does not match, because \B is always false at the -start of the subject, which is deemed to be a word boundary. However, if -pcre_exec() is passed the entire string again, but with startoffset -set to 4, it finds the second occurrence of "iss" because it is able to look -behind the starting point to discover that it is preceded by a letter. -

    -

    -Finding all the matches in a subject is tricky when the pattern can match an -empty string. It is possible to emulate Perl's /g behaviour by first trying the -match again at the same offset, with the PCRE_NOTEMPTY_ATSTART and -PCRE_ANCHORED options, and then if that fails, advancing the starting offset -and trying an ordinary match again. There is some code that demonstrates how to -do this in the -pcredemo -sample program. In the most general case, you have to check to see if the -newline convention recognizes CRLF as a newline, and if so, and the current -character is CR followed by LF, advance the starting offset by two characters -instead of one. -
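The offset-advancing step described above can be sketched in plain C. This is only an illustration: advance_after_empty_match() and its crlf_is_newline parameter are hypothetical names, not part of PCRE, and the sketch assumes an 8-bit, non-UTF subject (in UTF-8 mode a whole character would have to be skipped rather than one byte).

```c
#include <assert.h>

/* Hypothetical helper, not a PCRE function. Call it after an empty match
   at `offset` when the retry with PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED
   has failed. It returns the offset at which to try an ordinary match.
   crlf_is_newline is non-zero when the newline convention recognizes
   CRLF as a newline. Assumes a non-UTF, 8-bit subject. */
int advance_after_empty_match(const char *subject, int length,
                              int offset, int crlf_is_newline)
{
    /* At the very end of the subject there is nothing left to try. */
    assert(offset >= 0 && offset < length);

    if (crlf_is_newline &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;   /* step over the whole CRLF */

    return offset + 1;       /* step over a single data unit */
}
```

A caller would stop the loop once the previous empty match was already at the end of the subject, and otherwise pass the returned value as the next startoffset.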

    -

    -If a non-zero starting offset is passed when the pattern is anchored, one -attempt to match at the given offset is made. This can only succeed if the -pattern does not require the match to be at the start of the subject. -

    -
    -How pcre_exec() returns captured substrings -
    -

    -In general, a pattern matches a certain portion of the subject, and in -addition, further substrings from the subject may be picked out by parts of the -pattern. Following the usage in Jeffrey Friedl's book, this is called -"capturing" in what follows, and the phrase "capturing subpattern" is used for -a fragment of a pattern that picks out a substring. PCRE supports several other -kinds of parenthesized subpattern that do not cause substrings to be captured. -

    -

    -Captured substrings are returned to the caller via a vector of integers whose -address is passed in ovector. The number of elements in the vector is -passed in ovecsize, which must be a non-negative number. Note: this -argument is NOT the size of ovector in bytes. -

    -

    -The first two-thirds of the vector is used to pass back captured substrings, -each substring using a pair of integers. The remaining third of the vector is -used as workspace by pcre_exec() while matching capturing subpatterns, -and is not available for passing back information. The number passed in -ovecsize should always be a multiple of three. If it is not, it is -rounded down. -

    -

    -When a match is successful, information about captured substrings is returned -in pairs of integers, starting at the beginning of ovector, and -continuing up to two-thirds of its length at the most. The first element of -each pair is set to the offset of the first character in a substring, and the -second is set to the offset of the first character after the end of a -substring. These values are always data unit offsets, even in UTF mode. They -are byte offsets in the 8-bit library, 16-bit data item offsets in the 16-bit -library, and 32-bit data item offsets in the 32-bit library. Note: they -are not character counts. -

    -

    -The first pair of integers, ovector[0] and ovector[1], identify the -portion of the subject string matched by the entire pattern. The next pair is -used for the first capturing subpattern, and so on. The value returned by -pcre_exec() is one more than the highest numbered pair that has been set. -For example, if two substrings have been captured, the returned value is 3. If -there are no capturing subpatterns, the return value from a successful match is -1, indicating that just the first pair of offsets has been set. -

    -

    -If a capturing subpattern is matched repeatedly, it is the last portion of the -string that it matched that is returned. -

    -

    -If the vector is too small to hold all the captured substring offsets, it is -used as far as possible (up to two-thirds of its length), and the function -returns a value of zero. If neither the actual string matched nor any captured -substrings are of interest, pcre_exec() may be called with ovector -passed as NULL and ovecsize as zero. However, if the pattern contains -back references and the ovector is not big enough to remember the related -substrings, PCRE has to get additional memory for use during matching. Thus it -is usually advisable to supply an ovector of reasonable size. -
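As a rough illustration of the return-value rules above, the following sketch works out how many offset pairs hold results after a call. pairs_filled() is a hypothetical helper name, not a PCRE function, and it looks only at the return value and ovecsize.

```c
#include <assert.h>

/* Hypothetical helper, not a PCRE function.
   rc > 0  : rc pairs were set (pair 0 is the whole match).
   rc == 0 : the vector overflowed; it was used as far as possible,
             which is ovecsize / 3 pairs.
   rc < 0  : no match or an error; no match offsets are available. */
int pairs_filled(int rc, int ovecsize)
{
    assert(ovecsize >= 0);
    if (rc > 0)
        return rc;
    if (rc == 0)
        return ovecsize / 3;
    return 0;
}
```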

    -

    -There are some cases where zero is returned (indicating vector overflow) when -in fact the vector is exactly the right size for the final match. For example, -consider the pattern -

    -  (a)(?:(b)c|bd)
    -
    -If a vector of 6 elements (allowing for only 1 captured substring) is given -with subject string "abd", pcre_exec() will try to set the second -captured string, thereby recording a vector overflow, before failing to match -"c" and backing up to try the second alternative. The zero return, however, -does correctly indicate that the maximum number of slots (namely 2) have been -filled. In similar cases where there is temporary overflow, but the final -number of used slots is actually less than the maximum, a non-zero value is -returned. -

    -

    -The pcre_fullinfo() function can be used to find out how many capturing -subpatterns there are in a compiled pattern. The smallest size for -ovector that will allow for n captured substrings, in addition to -the offsets of the substring matched by the whole pattern, is (n+1)*3. -
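The sizing rule above is simple arithmetic, sketched here with two hypothetical helpers (neither is part of PCRE). In real code the capture count would come from pcre_fullinfo().

```c
#include <assert.h>

/* Smallest ovecsize that holds the whole-match pair plus n captured
   substrings: (n+1)*3 elements. Hypothetical helper. */
int ovector_size_for(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Number of offset pairs that can be passed back for a given ovecsize.
   Only the first two-thirds of the vector carries pairs, and a size
   that is not a multiple of three is rounded down. Hypothetical helper. */
int usable_pairs(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```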

    -

    -It is possible for capturing subpattern number n+1 to match some part of -the subject when subpattern n has not been used at all. For example, if -the string "abc" is matched against the pattern (a|(z))(bc) the return from the -function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this -happens, both values in the offset pairs corresponding to unused subpatterns -are set to -1. -

    -

    -Offset values that correspond to unused subpatterns at the end of the -expression are also set to -1. For example, if the string "abc" is matched -against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. The -return from the function is 2, because the highest used capturing subpattern -number is 1, and the offsets for the second and third capturing subpatterns -(assuming the vector is large enough, of course) are set to -1. -
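A minimal sketch of reading one capture directly from the offset pairs, including the -1 case for unused subpatterns. copy_capture() is a hypothetical helper written for this illustration; it is not the library's pcre_copy_substring(), and it assumes the 8-bit library and a positive return value from pcre_exec().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not a PCRE function. Copies capture number n
   from subject into buf as a zero-terminated string, using the offset
   pairs in ovector. rc is the positive value returned by pcre_exec().
   Returns the copied length, or -1 if n is out of range, the
   subpattern was unused (offsets are -1), or buf is too small. */
int copy_capture(const char *subject, const int *ovector, int rc,
                 int n, char *buf, int bufsize)
{
    int start, end, len;

    assert(rc > 0);
    if (n < 0 || n >= rc)
        return -1;

    start = ovector[2 * n];       /* offset of first character */
    end = ovector[2 * n + 1];     /* offset just past the end */
    if (start < 0 || end < start)
        return -1;                /* unused subpattern */

    len = end - start;
    if (len + 1 > bufsize)
        return -1;

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

With the subject "abc" and the pattern (a|(z))(bc), the offsets would be {0,3, 0,1, -1,-1, 1,3}, so capture 2 is reported as unused while capture 3 yields "bc".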

    -

    -Note: Elements in the first two-thirds of ovector that do not -correspond to capturing parentheses in the pattern are never changed. That is, -if a pattern contains n capturing parentheses, no more than -ovector[0] to ovector[2n+1] are set by pcre_exec(). The other -elements (in the first two-thirds) retain whatever values they previously had. -

    -

    -Some convenience functions are provided for extracting the captured substrings -as separate strings. These are described below. -

    -
    -Error return values from pcre_exec() -
    -

    -If pcre_exec() fails, it returns a negative number. The following are -defined in the header file: -

    -  PCRE_ERROR_NOMATCH        (-1)
    -
    -The subject string did not match the pattern. -
    -  PCRE_ERROR_NULL           (-2)
    -
    -Either code or subject was passed as NULL, or ovector was -NULL and ovecsize was not zero. -
    -  PCRE_ERROR_BADOPTION      (-3)
    -
    -An unrecognized bit was set in the options argument. -
    -  PCRE_ERROR_BADMAGIC       (-4)
    -
    -PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch -the case when it is passed a junk pointer and to detect when a pattern that was -compiled in an environment of one endianness is run in an environment with the -other endianness. This is the error that PCRE gives when the magic number is -not present. -
    -  PCRE_ERROR_UNKNOWN_OPCODE (-5)
    -
    -While running the pattern match, an unknown item was encountered in the -compiled pattern. This error could be caused by a bug in PCRE or by overwriting -of the compiled pattern. -
    -  PCRE_ERROR_NOMEMORY       (-6)
    -
    -If a pattern contains back references, but the ovector that is passed to -pcre_exec() is not big enough to remember the referenced substrings, PCRE -gets a block of memory at the start of matching to use for this purpose. If the -call via pcre_malloc() fails, this error is given. The memory is -automatically freed at the end of matching. -

    -

    -This error is also given if pcre_stack_malloc() fails in -pcre_exec(). This can happen only when PCRE has been compiled with ---disable-stack-for-recursion. -

    -  PCRE_ERROR_NOSUBSTRING    (-7)
    -
    -This error is used by the pcre_copy_substring(), -pcre_get_substring(), and pcre_get_substring_list() functions (see -below). It is never returned by pcre_exec(). -
    -  PCRE_ERROR_MATCHLIMIT     (-8)
    -
    -The backtracking limit, as specified by the match_limit field in a -pcre_extra structure (or defaulted) was reached. See the description -above. -
    -  PCRE_ERROR_CALLOUT        (-9)
    -
    -This error is never generated by pcre_exec() itself. It is provided for -use by callout functions that want to yield a distinctive error code. See the -pcrecallout -documentation for details. -
    -  PCRE_ERROR_BADUTF8        (-10)
    -
    -A string that contains an invalid UTF-8 byte sequence was passed as a subject, -and the PCRE_NO_UTF8_CHECK option was not set. If the size of the output vector -(ovecsize) is at least 2, the byte offset to the start of the invalid -UTF-8 character is placed in the first element, and a reason code is placed in -the second element. The reason codes are listed in the -following section. -For backward compatibility, if PCRE_PARTIAL_HARD is set and the problem is a -truncated UTF-8 character at the end of the subject (reason codes 1 to 5), -PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8. -
    -  PCRE_ERROR_BADUTF8_OFFSET (-11)
    -
    -The UTF-8 byte sequence that was passed as a subject was checked and found to -be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of -startoffset did not point to the beginning of a UTF-8 character or the -end of the subject. -
    -  PCRE_ERROR_PARTIAL        (-12)
    -
    -The subject string did not match, but it did match partially. See the -pcrepartial -documentation for details of partial matching. -
    -  PCRE_ERROR_BADPARTIAL     (-13)
    -
    -This code is no longer in use. It was formerly returned when the PCRE_PARTIAL -option was used with a compiled pattern containing items that were not -supported for partial matching. From release 8.00 onwards, there are no -restrictions on partial matching. -
    -  PCRE_ERROR_INTERNAL       (-14)
    -
    -An unexpected internal error has occurred. This error could be caused by a bug -in PCRE or by overwriting of the compiled pattern. -
    -  PCRE_ERROR_BADCOUNT       (-15)
    -
    -This error is given if the value of the ovecsize argument is negative. -
    -  PCRE_ERROR_RECURSIONLIMIT (-21)
    -
    -The internal recursion limit, as specified by the match_limit_recursion -field in a pcre_extra structure (or defaulted) was reached. See the -description above. -
    -  PCRE_ERROR_BADNEWLINE     (-23)
    -
    -An invalid combination of PCRE_NEWLINE_xxx options was given. -
    -  PCRE_ERROR_BADOFFSET      (-24)
    -
    -The value of startoffset was negative or greater than the length of the -subject, that is, the value in length. -
    -  PCRE_ERROR_SHORTUTF8      (-25)
    -
    -This error is returned instead of PCRE_ERROR_BADUTF8 when the subject string -ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD option is set. -Information about the failure is returned as for PCRE_ERROR_BADUTF8. That -information is in fact sufficient to detect this case, but this special error -code for PCRE_PARTIAL_HARD predates the implementation of returned information; -it is retained for backwards compatibility. -
    -  PCRE_ERROR_RECURSELOOP    (-26)
    -
    -This error is returned when pcre_exec() detects a recursion loop within -the pattern. Specifically, it means that either the whole pattern or a -subpattern has been called recursively for the second time at the same position -in the subject string. Some simple patterns that might do this are detected and -faulted at compile time, but more complicated cases, in particular mutual -recursions between two different subpatterns, cannot be detected until run -time. -
    -  PCRE_ERROR_JIT_STACKLIMIT (-27)
    -
    -This error is returned when a pattern that was successfully studied using a -JIT compile option is being matched, but the memory available for the -just-in-time processing stack is not large enough. See the -pcrejit -documentation for more details. -
    -  PCRE_ERROR_BADMODE        (-28)
    -
    -This error is given if a pattern that was compiled by the 8-bit library is -passed to a 16-bit or 32-bit library function, or vice versa. -
    -  PCRE_ERROR_BADENDIANNESS  (-29)
    -
    -This error is given if a pattern that was compiled and saved is reloaded on a -host with different endianness. The utility function -pcre_pattern_to_host_byte_order() can be used to convert such a pattern -so that it runs on the new host. -
    -  PCRE_ERROR_JIT_BADOPTION  (-31)
    -
    -This error is returned when a pattern that was successfully studied using a JIT -compile option is being matched, but the matching mode (partial or complete -match) does not correspond to any JIT compilation mode. When the JIT fast path -function is used, this error may be also given for invalid options. See the -pcrejit -documentation for more details. -
    -  PCRE_ERROR_BADLENGTH      (-32)
    -
    -This error is given if pcre_exec() is called with a negative value for -the length argument. -

    -

    -Error numbers -16 to -20, -22, and -30 are not used by pcre_exec(). -
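For logging, the numeric codes listed above can be turned back into names. The sketch below is self-contained and uses literal values so that it compiles without pcre.h; exec_error_name() is a hypothetical helper, and real code would use the macros from the header instead of the literals. The value -31 for PCRE_ERROR_JIT_BADOPTION is taken from pcre.h.

```c
#include <assert.h>
#include <string.h>   /* for NULL */

/* Hypothetical helper, not a PCRE function. Maps a negative return
   value from pcre_exec() to the name of the error code, or NULL for
   numbers that pcre_exec() does not use. */
const char *exec_error_name(int rc)
{
    assert(rc < 0);   /* failures are always negative */
    switch (rc) {
    case -1:  return "PCRE_ERROR_NOMATCH";
    case -2:  return "PCRE_ERROR_NULL";
    case -3:  return "PCRE_ERROR_BADOPTION";
    case -4:  return "PCRE_ERROR_BADMAGIC";
    case -5:  return "PCRE_ERROR_UNKNOWN_OPCODE";
    case -6:  return "PCRE_ERROR_NOMEMORY";
    case -7:  return "PCRE_ERROR_NOSUBSTRING";
    case -8:  return "PCRE_ERROR_MATCHLIMIT";
    case -9:  return "PCRE_ERROR_CALLOUT";
    case -10: return "PCRE_ERROR_BADUTF8";
    case -11: return "PCRE_ERROR_BADUTF8_OFFSET";
    case -12: return "PCRE_ERROR_PARTIAL";
    case -13: return "PCRE_ERROR_BADPARTIAL";
    case -14: return "PCRE_ERROR_INTERNAL";
    case -15: return "PCRE_ERROR_BADCOUNT";
    case -21: return "PCRE_ERROR_RECURSIONLIMIT";
    case -23: return "PCRE_ERROR_BADNEWLINE";
    case -24: return "PCRE_ERROR_BADOFFSET";
    case -25: return "PCRE_ERROR_SHORTUTF8";
    case -26: return "PCRE_ERROR_RECURSELOOP";
    case -27: return "PCRE_ERROR_JIT_STACKLIMIT";
    case -28: return "PCRE_ERROR_BADMODE";
    case -29: return "PCRE_ERROR_BADENDIANNESS";
    case -31: return "PCRE_ERROR_JIT_BADOPTION";
    case -32: return "PCRE_ERROR_BADLENGTH";
    default:  return NULL;   /* not used by pcre_exec() */
    }
}
```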

    -
    -Reason codes for invalid UTF-8 strings -
    -

    -This section applies only to the 8-bit library. The corresponding information -for the 16-bit and 32-bit libraries is given in the -pcre16 -and -pcre32 -pages. -

    -

    -When pcre_exec() returns either PCRE_ERROR_BADUTF8 or -PCRE_ERROR_SHORTUTF8, and the size of the output vector (ovecsize) is at -least 2, the offset of the start of the invalid UTF-8 character is placed in -the first output vector element (ovector[0]) and a reason code is placed -in the second element (ovector[1]). The reason codes are given names in -the pcre.h header file: -

    -  PCRE_UTF8_ERR1
    -  PCRE_UTF8_ERR2
    -  PCRE_UTF8_ERR3
    -  PCRE_UTF8_ERR4
    -  PCRE_UTF8_ERR5
    -
    -The string ends with a truncated UTF-8 character; the code specifies how many -bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be -no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279) -allows for up to 6 bytes, and this is checked first; hence the possibility of -4 or 5 missing bytes. -
    -  PCRE_UTF8_ERR6
    -  PCRE_UTF8_ERR7
    -  PCRE_UTF8_ERR8
    -  PCRE_UTF8_ERR9
    -  PCRE_UTF8_ERR10
    -
    -The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the -character do not have the binary value 0b10 (that is, either the most -significant bit is 0, or the next bit is 1). -
    -  PCRE_UTF8_ERR11
    -  PCRE_UTF8_ERR12
    -
    -A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long; -these code points are excluded by RFC 3629. -
    -  PCRE_UTF8_ERR13
    -
    -A 4-byte character has a value greater than 0x10ffff; these code points are -excluded by RFC 3629. -
    -  PCRE_UTF8_ERR14
    -
    -A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of -code points is reserved by RFC 3629 for use with UTF-16, and so is excluded -from UTF-8. -
    -  PCRE_UTF8_ERR15
    -  PCRE_UTF8_ERR16
    -  PCRE_UTF8_ERR17
    -  PCRE_UTF8_ERR18
    -  PCRE_UTF8_ERR19
    -
    -A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a -value that can be represented by fewer bytes, which is invalid. For example, -the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just -one byte. -
    -  PCRE_UTF8_ERR20
    -
    -The two most significant bits of the first byte of a character have the binary -value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a -byte can only validly occur as the second or subsequent byte of a multi-byte -character. -
    -  PCRE_UTF8_ERR21
    -
    -The first byte of a character has the value 0xfe or 0xff. These values can -never occur in a valid UTF-8 string. -
    -  PCRE_UTF8_ERR22
    -
    -This error code was formerly used when the presence of a so-called -"non-character" caused an error. Unicode corrigendum #9 makes it clear that -such characters should not cause a string to be rejected, and so this code is -no longer in use and is never returned. -
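
The "overlong" and stray-continuation-byte cases above come down to a few bit tests. The following is a minimal sketch in plain C, assuming hypothetical helper names (decode_utf8_2, is_overlong_2, is_continuation_byte) that are not part of PCRE; it reproduces the 0xc0, 0xae example from the text and is not PCRE's own validity checker.

```c
#include <assert.h>

/* Decode a 2-byte UTF-8 sequence (110xxxxx 10yyyyyy) into its value.
   Hypothetical helper for illustration; not a PCRE function. */
unsigned int decode_utf8_2(unsigned char b0, unsigned char b1)
{
    assert((b0 & 0xe0) == 0xc0);  /* lead byte of a 2-byte sequence */
    assert((b1 & 0xc0) == 0x80);  /* continuation byte: top bits are 0b10 */
    return ((unsigned int)(b0 & 0x1f) << 6) | (unsigned int)(b1 & 0x3f);
}

/* A 2-byte sequence is overlong when its value would fit in one byte. */
int is_overlong_2(unsigned char b0, unsigned char b1)
{
    return decode_utf8_2(b0, b1) < 0x80;
}

/* A byte whose top two bits are 0b10 can only be a second or later byte,
   so it must not appear where a character is expected to start. */
int is_continuation_byte(unsigned char b)
{
    return (b & 0xc0) == 0x80;
}
```

With these definitions, the pair 0xc0, 0xae decodes to 0x2e and is flagged as overlong, whereas 0xc3, 0xa9 decodes to 0xe9 and is not.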

    -
    EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
    -

    -int pcre_copy_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, char *buffer, - int buffersize); -
    -
    -int pcre_get_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, - const char **stringptr); -
    -
    -int pcre_get_substring_list(const char *subject, - int *ovector, int stringcount, const char ***listptr); -

    -

    -Captured substrings can be accessed directly by using the offsets returned by -pcre_exec() in ovector. For convenience, the functions -pcre_copy_substring(), pcre_get_substring(), and -pcre_get_substring_list() are provided for extracting captured substrings -as new, separate, zero-terminated strings. These functions identify substrings -by number. The next section describes functions for extracting named -substrings. -

    -

    -A substring that contains a binary zero is correctly extracted and has a -further zero added on the end, but the result is not, of course, a C string. -However, you can process such a string by referring to the length that is -returned by pcre_copy_substring() and pcre_get_substring(). -Unfortunately, the interface to pcre_get_substring_list() is not adequate -for handling strings containing binary zeros, because the end of the final -string is not independently indicated. -

    -

    -The first three arguments are the same for all three of these functions: -subject is the subject string that has just been successfully matched, -ovector is a pointer to the vector of integer offsets that was passed to -pcre_exec(), and stringcount is the number of substrings that were -captured by the match, including the substring that matched the entire regular -expression. This is the value returned by pcre_exec() if it is greater -than zero. If pcre_exec() returned zero, indicating that it ran out of -space in ovector, the value passed as stringcount should be the -number of elements in the vector divided by three. -

    -

    -The functions pcre_copy_substring() and pcre_get_substring() -extract a single substring, whose number is given as stringnumber. A -value of zero extracts the substring that matched the entire pattern, whereas -higher values extract the captured substrings. For pcre_copy_substring(), -the string is placed in buffer, whose length is given by -buffersize, while for pcre_get_substring() a new block of memory is -obtained via pcre_malloc, and its address is returned via -stringptr. The yield of the function is the length of the string, not -including the terminating zero, or one of these error codes: -

    -  PCRE_ERROR_NOMEMORY       (-6)
    -
    -The buffer was too small for pcre_copy_substring(), or the attempt to get -memory failed for pcre_get_substring(). -
    -  PCRE_ERROR_NOSUBSTRING    (-7)
    -
    -There is no substring whose number is stringnumber. -

    -

    -The pcre_get_substring_list() function extracts all available substrings -and builds a list of pointers to them. All this is done in a single block of -memory that is obtained via pcre_malloc. The address of the memory block -is returned via listptr, which is also the start of the list of string -pointers. The end of the list is marked by a NULL pointer. The yield of the -function is zero if all went well, or the error code -

    -  PCRE_ERROR_NOMEMORY       (-6)
    -
    -if the attempt to get the memory block failed. -

    -

    -When any of these functions encounter a substring that is unset, which can -happen when capturing subpattern number n+1 matches some part of the -subject, but subpattern n has not been used at all, they return an empty -string. This can be distinguished from a genuine zero-length substring by -inspecting the appropriate offset in ovector, which is negative for unset -substrings. -
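
Direct access through ovector, including the test for an unset substring, might look roughly like the sketch below. It is plain C that only mimics the behaviour described above: copy_capture(), effective_stringcount(), and the CAP_* codes are hypothetical names introduced for illustration, not PCRE functions or error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not PCRE error codes). */
enum { CAP_UNSET = -1, CAP_NOSPACE = -2, CAP_NOSUCH = -3 };

/* If pcre_exec() returned zero, ovector was too small; the text above says
   to pass the number of elements in the vector divided by three. */
int effective_stringcount(int rc, int ovecsize)
{
    return rc > 0 ? rc : ovecsize / 3;
}

/* Copy capture number n out of subject, using the offset pair
   ovector[2*n], ovector[2*n+1]. Returns the substring length, or a
   negative CAP_* code. */
int copy_capture(const char *subject, const int *ovector, int stringcount,
                 int n, char *buffer, int buffersize)
{
    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (n < 0 || n >= stringcount) return CAP_NOSUCH;

    int start = ovector[2 * n];
    int end = ovector[2 * n + 1];
    if (start < 0 || end < 0) return CAP_UNSET;   /* negative offset: unset */

    int len = end - start;
    if (len + 1 > buffersize) return CAP_NOSPACE; /* room for the final zero */

    memcpy(buffer, subject + start, (size_t)len);
    buffer[len] = '\0';
    return len;
}
```

Unlike the PCRE convenience functions, this helper reports an unset substring with its own code instead of returning an empty string, which saves the caller a second look at ovector.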

    -

    -The two convenience functions pcre_free_substring() and -pcre_free_substring_list() can be used to free the memory returned by -a previous call of pcre_get_substring() or -pcre_get_substring_list(), respectively. They do nothing more than call -the function pointed to by pcre_free, which of course could be called -directly from a C program. However, PCRE is used in some situations where it is -linked via a special interface to another programming language that cannot use -pcre_free directly; it is for these cases that the functions are -provided. -

    -
    EXTRACTING CAPTURED SUBSTRINGS BY NAME
    -

    -int pcre_get_stringnumber(const pcre *code, - const char *name); -
    -
    -int pcre_copy_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - char *buffer, int buffersize); -
    -
    -int pcre_get_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - const char **stringptr); -

    -

    -To extract a substring by name, you first have to find the associated number. -For example, for this pattern -

    -  (a+)b(?<xxx>\d+)...
    -
    -the number of the subpattern called "xxx" is 2. If the name is known to be -unique (PCRE_DUPNAMES was not set), you can find the number from the name by -calling pcre_get_stringnumber(). The first argument is the compiled -pattern, and the second is the name. The yield of the function is the -subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of -that name. -

    -

    -Given the number, you can extract the substring directly, or use one of the -functions described in the previous section. For convenience, there are also -two functions that do the whole job. -

    -

    -Most of the arguments of pcre_copy_named_substring() and -pcre_get_named_substring() are the same as those for the similarly named -functions that extract by number. As these are described in the previous -section, they are not re-described here. There are just two differences: -

    -

    -First, instead of a substring number, a substring name is given. Second, there -is an extra argument, given at the start, which is a pointer to the compiled -pattern. This is needed in order to gain access to the name-to-number -translation table. -

    -

    -These functions call pcre_get_stringnumber(), and if it succeeds, they -then call pcre_copy_substring() or pcre_get_substring(), as -appropriate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, -the behaviour may not be what you want (see the next section). -

    -

    -Warning: If the pattern uses the (?| feature to set up multiple -subpatterns with the same number, as described in the -section on duplicate subpattern numbers -in the -pcrepattern -page, you cannot use names to distinguish the different subpatterns, because -names are not included in the compiled code. The matching process uses only -numbers. For this reason, the use of different names for subpatterns of the -same number causes an error at compile time. -

    -
    DUPLICATE SUBPATTERN NAMES
    -

    -int pcre_get_stringtable_entries(const pcre *code, - const char *name, char **first, char **last); -

    -

    -When a pattern is compiled with the PCRE_DUPNAMES option, names for subpatterns -are not required to be unique. (Duplicate names are always allowed for -subpatterns with the same number, created by using the (?| feature. Indeed, if -such subpatterns are named, they are required to use the same names.) -

    -

    -Normally, patterns with duplicate names are such that in any one match, only -one of the named subpatterns participates. An example is shown in the -pcrepattern -documentation. -

    -

    -When duplicates are present, pcre_copy_named_substring() and -pcre_get_named_substring() return the first substring corresponding to -the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING (-7) is -returned; no data is returned. The pcre_get_stringnumber() function -returns one of the numbers that are associated with the name, but it is not -defined which it is. -

    -

    -If you want to get full details of all captured substrings for a given name, -you must use the pcre_get_stringtable_entries() function. The first -argument is the compiled pattern, and the second is the name. The third and -fourth are pointers to variables which are updated by the function. After it -has run, they point to the first and last entries in the name-to-number table -for the given name. The function itself returns the length of each entry, or -PCRE_ERROR_NOSUBSTRING (-7) if there are none. The format of the table is -described in the section entitled Information about a pattern -above. -Given all the relevant entries for the name, you can extract each of their -numbers, and hence the captured data, if any. -
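
Walking the entries between first and last could be sketched as follows. This assumes the 8-bit table layout given in the section on information about a pattern (outside this excerpt): each entry starts with the two-byte subpattern number, most significant byte first, followed by the zero-terminated name. The helper collect_group_numbers() is a hypothetical name, not a PCRE function.

```c
#include <assert.h>
#include <stddef.h>

/* Collect the subpattern numbers stored in the name-table entries from
   first to last inclusive. entry_size is the value returned by
   pcre_get_stringtable_entries(). Returns how many numbers were stored. */
int collect_group_numbers(const char *first, const char *last,
                          int entry_size, int *numbers, int max_numbers)
{
    int n = 0;
    assert(first != NULL && last != NULL && numbers != NULL);
    assert(entry_size > 2);

    for (const char *entry = first; n < max_numbers; entry += entry_size) {
        const unsigned char *p = (const unsigned char *)entry;
        numbers[n++] = (p[0] << 8) | p[1];  /* big-endian 16-bit number */
        if (entry == last) break;           /* last entry has been handled */
    }
    return n;
}
```

Each number returned can then be used as an index into ovector (elements 2*number and 2*number+1) to find which of the same-named subpatterns were actually set.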

    -
    FINDING ALL POSSIBLE MATCHES
    -

    -The traditional matching function uses a similar algorithm to Perl, which stops -when it finds the first match, starting at a given point in the subject. If you -want to find all possible matches, or the longest possible match, consider -using the alternative matching function (see below) instead. If you cannot use -the alternative function, but still need to find all possible matches, you -can kludge it up by making use of the callout facility, which is described in -the -pcrecallout -documentation. -

    -

    -What you have to do is to insert a callout right at the end of the pattern. -When your callout function is called, extract and save the current matched -substring. Then return 1, which forces pcre_exec() to backtrack and try -other alternatives. Ultimately, when it runs out of matches, pcre_exec() -will yield PCRE_ERROR_NOMATCH. -

    -
    OBTAINING AN ESTIMATE OF STACK USAGE
    -

    -Matching certain patterns using pcre_exec() can use a lot of process -stack, which in certain environments can be rather limited in size. Some users -find it helpful to have an estimate of the amount of stack that is used by -pcre_exec(), to help them set recursion limits, as described in the -pcrestack -documentation. The estimate that is output by pcretest when called with -the -m and -C options is obtained by calling pcre_exec with -the values NULL, NULL, NULL, -999, and -999 for its first five arguments. -

    -

    -Normally, if its first argument is NULL, pcre_exec() immediately returns -the negative error code PCRE_ERROR_NULL, but with this special combination of -arguments, it returns instead a negative number whose absolute value is the -approximate stack frame size in bytes. (A negative number is used so that it is -clear that no match has happened.) The value is approximate because in some -cases, recursive calls to pcre_exec() occur when there are one or two -additional variables on the stack. -

    -

    -If PCRE has been compiled to use the heap instead of the stack for recursion, -the value returned is the size of each block that is obtained from the heap. -

    -
    MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
    -

    -int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); -

    -

    -The function pcre_dfa_exec() is called to match a subject string against -a compiled pattern, using a matching algorithm that scans the subject string -just once, and does not backtrack. This has different characteristics to the -normal algorithm, and is not compatible with Perl. Some of the features of PCRE -patterns are not supported. Nevertheless, there are times when this kind of -matching can be useful. For a discussion of the two matching algorithms, and a -list of features that pcre_dfa_exec() does not support, see the -pcrematching -documentation. -

    -

    -The arguments for the pcre_dfa_exec() function are the same as for -pcre_exec(), plus two extras. The ovector argument is used in a -different way, and this is described below. The other common arguments are used -in the same way as for pcre_exec(), so their description is not repeated -here. -

    -

    -The two additional arguments provide workspace for the function. The workspace -vector should contain at least 20 elements. It is used for keeping track of -multiple paths through the pattern tree. More workspace will be needed for -patterns and subjects where there are a lot of potential matches. -

    -

    -Here is an example of a simple call to pcre_dfa_exec(): -

    -  int rc;
    -  int ovector[10];
    -  int wspace[20];
    -  rc = pcre_dfa_exec(
    -    re,             /* result of pcre_compile() */
    -    NULL,           /* we didn't study the pattern */
    -    "some string",  /* the subject string */
    -    11,             /* the length of the subject string */
    -    0,              /* start at offset 0 in the subject */
    -    0,              /* default options */
    -    ovector,        /* vector of integers for substring information */
    -    10,             /* number of elements (NOT size in bytes) */
    -    wspace,         /* working space vector */
    -    20);            /* number of elements (NOT size in bytes) */
    -
    -

    -
    -Option bits for pcre_dfa_exec() -
    -

    -The unused bits of the options argument for pcre_dfa_exec() must be -zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, -PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, -PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, -PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. -All but the last four of these are exactly the same as for pcre_exec(), -so their description is not repeated here. -

    -  PCRE_PARTIAL_HARD
    -  PCRE_PARTIAL_SOFT
    -
    -These have the same general effect as they do for pcre_exec(), but the -details are slightly different. When PCRE_PARTIAL_HARD is set for -pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the subject -is reached and there is still at least one matching possibility that requires -additional characters. This happens even if some complete matches have also -been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH -is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached, -there have been no complete matches, but there is still at least one matching -possibility. The portion of the string that was inspected when the longest -partial match was found is set as the first matching string in both cases. -There is a more detailed discussion of partial and multi-segment matching, with -examples, in the -pcrepartial -documentation. -
    -  PCRE_DFA_SHORTEST
    -
    -Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as -soon as it has found one match. Because of the way the alternative algorithm -works, this is necessarily the shortest possible match at the first possible -matching point in the subject string. -
    -  PCRE_DFA_RESTART
    -
    -When pcre_dfa_exec() returns a partial match, it is possible to call it -again, with additional subject characters, and have it continue with the same -match. The PCRE_DFA_RESTART option requests this action; when it is set, the -workspace and wscount options must reference the same vector as -before because data about the match so far is left in them after a partial -match. There is more discussion of this facility in the -pcrepartial -documentation. -

    -
    -Successful returns from pcre_dfa_exec() -
    -

    -When pcre_dfa_exec() succeeds, it may have matched more than one -substring in the subject. Note, however, that all the matches from one run of -the function start at the same point in the subject. The shorter matches are -all initial substrings of the longer matches. For example, if the pattern -

    -  <.*>
    -
    -is matched against the string -
    -  This is <something> <something else> <something further> no more
    -
    -the three matched strings are -
    -  <something>
    -  <something> <something else>
    -  <something> <something else> <something further>
    -
    -On success, the yield of the function is a number greater than zero, which is -the number of matched substrings. The substrings themselves are returned in -ovector. Each string uses two elements; the first is the offset to the -start, and the second is the offset to the end. In fact, all the strings have -the same start offset. (Space could have been saved by giving this only once, -but it was decided to retain some compatibility with the way pcre_exec() -returns data, even though the meaning of the strings is different.) -

    -

    -The strings are returned in reverse order of length; that is, the longest -matching string is given first. If there were too many matches to fit into -ovector, the yield of the function is zero, and the vector is filled with -the longest matches. Unlike pcre_exec(), pcre_dfa_exec() can use -the entire ovector for returning matched strings. -
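
The layout of ovector after a successful pcre_dfa_exec() call can be read with a loop like the one sketched below. The helpers dfa_match_length() and print_dfa_matches() are hypothetical names for illustration; the only assumption taken from the text is that match i occupies ovector[2*i] and ovector[2*i+1], longest match first.

```c
#include <assert.h>
#include <stdio.h>

/* Length of the i-th match reported in ovector; count is the positive
   value returned by pcre_dfa_exec(). */
int dfa_match_length(const int *ovector, int count, int i)
{
    assert(ovector != NULL && i >= 0 && i < count);
    return ovector[2 * i + 1] - ovector[2 * i];
}

/* Print every match, one per line, in the order returned. */
void print_dfa_matches(const char *subject, const int *ovector, int count)
{
    for (int i = 0; i < count; i++) {
        printf("%2d: %.*s\n", i, dfa_match_length(ovector, count, i),
               subject + ovector[2 * i]);
    }
}
```

For the <.*> example above, the three matches all start at offset 8 and end at offsets 56, 36, and 19, so the reported lengths shrink from 48 to 28 to 11.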

    -

    -NOTE: PCRE's "auto-possessification" optimization usually applies to character -repeats at the end of a pattern (as well as internally). For example, the -pattern "a\d+" is compiled as if it were "a\d++" because there is no point -even considering the possibility of backtracking into the repeated digits. For -DFA matching, this means that only one possible match is found. If you really -do want multiple matches in such cases, either use an ungreedy repeat -("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling. -

    -
    -Error returns from pcre_dfa_exec() -
    -

    -The pcre_dfa_exec() function returns a negative number when it fails. -Many of the errors are the same as for pcre_exec(), and these are -described -above. -There are in addition the following errors that are specific to -pcre_dfa_exec(): -

    -  PCRE_ERROR_DFA_UITEM      (-16)
    -
    -This return is given if pcre_dfa_exec() encounters an item in the pattern -that it does not support, for instance, the use of \C or a back reference. -
    -  PCRE_ERROR_DFA_UCOND      (-17)
    -
    -This return is given if pcre_dfa_exec() encounters a condition item that -uses a back reference for the condition, or a test for recursion in a specific -group. These are not supported. -
    -  PCRE_ERROR_DFA_UMLIMIT    (-18)
    -
    -This return is given if pcre_dfa_exec() is called with an extra -block that contains a setting of the match_limit or -match_limit_recursion fields. This is not supported (these fields are -meaningless for DFA matching). -
    -  PCRE_ERROR_DFA_WSSIZE     (-19)
    -
    -This return is given if pcre_dfa_exec() runs out of space in the -workspace vector. -
    -  PCRE_ERROR_DFA_RECURSE    (-20)
    -
    -When a recursive subpattern is processed, the matching function calls itself -recursively, using private vectors for ovector and workspace. This -error is given if the output vector is not large enough. This should be -extremely rare, as a vector of size 1000 is used. -
    -  PCRE_ERROR_DFA_BADRESTART (-30)
    -
    -When pcre_dfa_exec() is called with the PCRE_DFA_RESTART option, -some plausibility checks are made on the contents of the workspace, which -should contain data about the previous partial match. If any of these checks -fail, this error is given. -

    -
    SEE ALSO
    -

    -pcre16(3), pcre32(3), pcrebuild(3), pcrecallout(3), -pcrecpp(3), pcrematching(3), pcrepartial(3), -pcreposix(3), pcreprecompile(3), pcresample(3), -pcrestack(3). -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 18 December 2015 -
    -Copyright © 1997-2015 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrebuild.html b/src/pcre/doc/html/pcrebuild.html deleted file mode 100644 index 03c8cbe0..00000000 --- a/src/pcre/doc/html/pcrebuild.html +++ /dev/null @@ -1,534 +0,0 @@ - - -pcrebuild specification - - -

    pcrebuild man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    BUILDING PCRE
    -

    -PCRE is distributed with a configure script that can be used to build the -library in Unix-like environments using the applications known as Autotools. -Also in the distribution are files to support building using CMake -instead of configure. The text file -README -contains general information about building with Autotools (some of which is -repeated below), and also has some comments about building on various operating -systems. There is a lot more information about building PCRE without using -Autotools (including information about using CMake and building "by -hand") in the text file called -NON-AUTOTOOLS-BUILD. -You should consult this file as well as the -README -file if you are building in a non-Unix-like environment. -

    -
    PCRE BUILD-TIME OPTIONS
    -

    -The rest of this document describes the optional features of PCRE that can be -selected when the library is compiled. It assumes use of the configure -script, where the optional features are selected or deselected by providing -options to configure before running the make command. However, the -same options can be selected in both Unix-like and non-Unix-like environments -using the GUI facility of cmake-gui if you are using CMake instead -of configure to build PCRE. -

    -

    -If you are not using Autotools or CMake, option selection can be done by -editing the config.h file, or by passing parameter settings to the -compiler, as described in -NON-AUTOTOOLS-BUILD. -

    -

    -The complete list of options for configure (which includes the standard -ones such as the selection of the installation directory) can be obtained by -running -

    -  ./configure --help
    -
    -The following sections include descriptions of options whose names begin with ---enable or --disable. These settings specify changes to the defaults for the -configure command. Because of the way that configure works, ---enable and --disable always come in pairs, so the complementary option always -exists as well, but as it specifies the default, it is not described. -

    -
    BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
    -

    -By default, a library called libpcre is built, containing functions that -take string arguments contained in vectors of bytes, either as single-byte -characters, or interpreted as UTF-8 strings. You can also build a separate -library, called libpcre16, in which strings are contained in vectors of -16-bit data units and interpreted either as single-unit characters or UTF-16 -strings, by adding -

    -  --enable-pcre16
    -
    -to the configure command. You can also build yet another separate -library, called libpcre32, in which strings are contained in vectors of -32-bit data units and interpreted either as single-unit characters or UTF-32 -strings, by adding -
    -  --enable-pcre32
    -
    -to the configure command. If you do not want the 8-bit library, add -
    -  --disable-pcre8
    -
    -as well. At least one of the three libraries must be built. Note that the C++ -and POSIX wrappers are for the 8-bit library only, and that pcregrep is -an 8-bit program. None of these are built if you select only the 16-bit or -32-bit libraries. -

    -
    BUILDING SHARED AND STATIC LIBRARIES
    -

    -The Autotools PCRE building process uses libtool to build both shared and -static libraries by default. You can suppress one of these by adding one of -

    -  --disable-shared
    -  --disable-static
    -
    -to the configure command, as required. -

    -
    C++ SUPPORT
    -

    -By default, if the 8-bit library is being built, the configure script -will search for a C++ compiler and C++ header files. If it finds them, it -automatically builds the C++ wrapper library (which supports only 8-bit -strings). You can disable this by adding -

    -  --disable-cpp
    -
    -to the configure command. -

    -
    UTF-8, UTF-16 AND UTF-32 SUPPORT
    -

    -To build PCRE with support for UTF Unicode character strings, add -

    -  --enable-utf
    -
    -to the configure command. This setting applies to all three libraries, -adding support for UTF-8 to the 8-bit library, support for UTF-16 to the 16-bit -library, and support for UTF-32 to the 32-bit library. There are no -separate options for enabling UTF-8, UTF-16 and UTF-32 independently because -that would allow ridiculous settings such as requesting UTF-16 support while -building only the 8-bit library. It is not possible to build one library with -UTF support and another without in the same configuration. (For backwards -compatibility, --enable-utf8 is a synonym of --enable-utf.) -

    -

    -Of itself, this setting does not make PCRE treat strings as UTF-8, UTF-16 or -UTF-32. As well as compiling PCRE with this option, you also have to set -the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as appropriate) when you call -one of the pattern compiling functions. -

    -

    -If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects -its input to be either ASCII or UTF-8 (depending on the run-time option). It is -not possible to support both EBCDIC and UTF-8 codes in the same version of the -library. Consequently, --enable-utf and --enable-ebcdic are mutually -exclusive. -

    -
    UNICODE CHARACTER PROPERTY SUPPORT
    -

    -UTF support allows the libraries to process character codepoints up to 0x10ffff -in the strings that they handle. On its own, however, it does not provide any -facilities for accessing the properties of such characters. If you want to be -able to use the pattern escapes \P, \p, and \X, which refer to Unicode -character properties, you must add -

    -  --enable-unicode-properties
    -
    -to the configure command. This implies UTF support, even if you have -not explicitly requested it. -

    -

    -Including Unicode property support adds around 30K of tables to the PCRE -library. Only the general category properties such as Lu and Nd are -supported. Details are given in the -pcrepattern -documentation. -

    -
    JUST-IN-TIME COMPILER SUPPORT
    -

    -Just-in-time compiler support is included in the build by specifying -

    -  --enable-jit
    -
    -This support is available only for certain hardware architectures. If this -option is set for an unsupported architecture, a compile time error occurs. -See the -pcrejit -documentation for a discussion of JIT usage. When JIT support is enabled, -pcregrep automatically makes use of it, unless you add -
    -  --disable-pcregrep-jit
    -
    -to the "configure" command. -

    -
    CODE VALUE OF NEWLINE
    -

    -By default, PCRE interprets the linefeed (LF) character as indicating the end -of a line. This is the normal newline character on Unix-like systems. You can -compile PCRE to use carriage return (CR) instead, by adding -

    -  --enable-newline-is-cr
    -
    -to the configure command. There is also a --enable-newline-is-lf option, -which explicitly specifies linefeed as the newline character. -
    -
    -Alternatively, you can specify that line endings are to be indicated by the two -character sequence CRLF. If you want this, add -
    -  --enable-newline-is-crlf
    -
    -to the configure command. There is a fourth option, specified by -
    -  --enable-newline-is-anycrlf
    -
    -which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as -indicating a line ending. Finally, a fifth option, specified by -
    -  --enable-newline-is-any
    -
    -causes PCRE to recognize any Unicode newline sequence. -

    -

    -Whatever line ending convention is selected when PCRE is built can be -overridden when the library functions are called. At build time it is -conventional to use the standard for your operating system. -

    -
    WHAT \R MATCHES
    -

    -By default, the sequence \R in a pattern matches any Unicode newline sequence, -whatever has been selected as the line ending sequence. If you specify -

    -  --enable-bsr-anycrlf
    -
    -the default is changed so that \R matches only CR, LF, or CRLF. Whatever is -selected when PCRE is built can be overridden when the library functions are -called. -

    -
    POSIX MALLOC USAGE
    -

    -When the 8-bit library is called through the POSIX interface (see the -pcreposix -documentation), additional working storage is required for holding the pointers -to capturing substrings, because PCRE requires three integers per substring, -whereas the POSIX interface provides only two. If the number of expected -substrings is small, the wrapper function uses space on the stack, because this -is faster than using malloc() for each call. The default threshold above -which the stack is no longer used is 10; it can be changed by adding a setting -such as -

    -  --with-posix-malloc-threshold=20
    -
    -to the configure command. -

    -
    HANDLING VERY LARGE PATTERNS
    -

    -Within a compiled pattern, offset values are used to point from one part to -another (for example, from an opening parenthesis to an alternation -metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values -are used for these offsets, leading to a maximum size for a compiled pattern of -around 64K. This is sufficient to handle all but the most gigantic patterns. -Nevertheless, some people do want to process truly enormous patterns, so it is -possible to compile PCRE to use three-byte or four-byte offsets by adding a -setting such as -

    -  --with-link-size=3
    -
    -to the configure command. The value given must be 2, 3, or 4. For the -16-bit library, a value of 3 is rounded up to 4. In these libraries, using -longer offsets slows down the operation of PCRE because it has to load -additional data when handling them. For the 32-bit library the value is always -4 and cannot be overridden; the value of --with-link-size is ignored. -

    -
    AVOIDING EXCESSIVE STACK USAGE
    -

    -When matching with the pcre_exec() function, PCRE implements backtracking -by making recursive calls to an internal function called match(). In -environments where the size of the stack is limited, this can severely limit -PCRE's operation. (The Unix environment does not usually suffer from this -problem, but it may sometimes be necessary to increase the maximum stack size. -There is a discussion in the -pcrestack -documentation.) An alternative approach to recursion that uses memory from the -heap to remember data, instead of using recursive function calls, has been -implemented to work round the problem of limited stack size. If you want to -build a version of PCRE that works this way, add -

    -  --disable-stack-for-recursion
    -
    -to the configure command. With this configuration, PCRE will use the -pcre_stack_malloc and pcre_stack_free variables to call memory -management functions. By default these point to malloc() and -free(), but you can replace the pointers so that your own functions are -used instead. -

    -

    -Separate functions are provided rather than using pcre_malloc and -pcre_free because the usage is very predictable: the block sizes -requested are always the same, and the blocks are always freed in reverse -order. A calling program might be able to implement optimized functions that -perform better than malloc() and free(). PCRE runs noticeably more -slowly when built in this way. This option affects only the pcre_exec() -function; it is not relevant for pcre_dfa_exec(). -
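Because blocks are requested in one size and freed in reverse order, a small cache of freed blocks can serve most requests without calling malloc(). The following is a minimal sketch of that idea, not code from PCRE: the demo_ names are hypothetical, the cache is a single global list (so it is not thread-safe), and cached blocks are returned to the system only when demo_stack_release() is called.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header stored in front of every block; the union keeps the user area
   suitably aligned for any type. */
typedef union demo_frame {
  struct {
    size_t capacity;          /* usable bytes after the header */
    union demo_frame *next;   /* next cached block, when on the free list */
  } info;
  max_align_t align;
} demo_frame;

static demo_frame *demo_cache = NULL;   /* most recently freed block first */

/* Reuse the most recently freed block if it is big enough,
   otherwise fall back to malloc(). */
void *demo_stack_malloc(size_t size)
{
  demo_frame *f = demo_cache;
  if (f != NULL && f->info.capacity >= size) {
    demo_cache = f->info.next;
  } else {
    f = malloc(sizeof(demo_frame) + size);
    if (f == NULL) return NULL;
    f->info.capacity = size;
  }
  return (void *)(f + 1);
}

/* Push the block onto the cache instead of calling free(). */
void demo_stack_free(void *block)
{
  demo_frame *f;
  if (block == NULL) return;
  f = (demo_frame *)block - 1;
  f->info.next = demo_cache;
  demo_cache = f;
}

/* Hand all cached blocks back to the system. */
void demo_stack_release(void)
{
  while (demo_cache != NULL) {
    demo_frame *next = demo_cache->info.next;
    free(demo_cache);
    demo_cache = next;
  }
}
```

In a program linked against a PCRE built with --disable-stack-for-recursion, functions of this shape would be installed by assigning them to the pcre_stack_malloc and pcre_stack_free variables before matching.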

    -
    LIMITING PCRE RESOURCE USAGE
    -

    -Internally, PCRE has a function called match(), which it calls repeatedly -(sometimes recursively) when matching a pattern with the pcre_exec() -function. By controlling the maximum number of times this function may be -called during a single matching operation, a limit can be placed on the -resources used by a single call to pcre_exec(). The limit can be changed -at run time, as described in the -pcreapi -documentation. The default is 10 million, but this can be changed by adding a -setting such as -

    -  --with-match-limit=500000
    -
    -to the configure command. This setting has no effect on the -pcre_dfa_exec() matching function. -

    -

    -In some environments it is desirable to limit the depth of recursive calls of -match() more strictly than the total number of calls, in order to -restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion -is specified) that is used. A second limit controls this; it defaults to the -value that is set for --with-match-limit, which imposes no additional -constraints. However, you can set a lower limit by adding, for example, -

    -  --with-match-limit-recursion=10000
    -
    -to the configure command. This value can also be overridden at run time. -

    -
    CREATING CHARACTER TABLES AT BUILD TIME
    -

    -PCRE uses fixed tables for processing characters whose code values are less -than 256. By default, PCRE is built with a set of tables that are distributed -in the file pcre_chartables.c.dist. These tables are for ASCII codes -only. If you add -

    -  --enable-rebuild-chartables
    -
    -to the configure command, the distributed tables are no longer used. -Instead, a program called dftables is compiled and run. This outputs the -source for a new set of tables, created in the default locale of your C run-time -system. (This method of replacing the tables does not work if you are cross -compiling, because dftables is run on the local host. If you need to -create alternative tables when cross compiling, you will have to do so "by -hand".) -

    -
    USING EBCDIC CODE
    -

    -PCRE assumes by default that it will run in an environment where the character -code is ASCII (or Unicode, which is a superset of ASCII). This is the case for -most computer operating systems. PCRE can, however, be compiled to run in an -EBCDIC environment by adding -

    -  --enable-ebcdic
    -
    -to the configure command. This setting implies ---enable-rebuild-chartables. You should only use it if you know that you are in -an EBCDIC environment (for example, an IBM mainframe operating system). The ---enable-ebcdic option is incompatible with --enable-utf. -

    -

    -The EBCDIC character that corresponds to an ASCII LF is assumed to have the -value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In -such an environment you should use -

    -  --enable-ebcdic-nl25
    -
    -as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the -same value as in ASCII, namely, 0x0d. Whichever of 0x15 and 0x25 is not -chosen as LF is made to correspond to the Unicode NEL character (which, in -Unicode, is 0x85). -

    -

    -The options that select newline behaviour, such as --enable-newline-is-cr, -and equivalent run-time options, refer to these character values in an EBCDIC -environment. -

    -
    PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT
    -

    -By default, pcregrep reads all files as plain text. You can build it so -that it recognizes files whose names end in .gz or .bz2, and reads -them with libz or libbz2, respectively, by adding one or both of -

    -  --enable-pcregrep-libz
    -  --enable-pcregrep-libbz2
    -
    -to the configure command. These options naturally require that the -relevant libraries are installed on your system. Configuration will fail if -they are not. -

    -
    PCREGREP BUFFER SIZE
    -

    -pcregrep uses an internal buffer to hold a "window" on the file it is -scanning, in order to be able to output "before" and "after" lines when it -finds a match. The size of the buffer is controlled by a parameter whose -default value is 20K. The buffer itself is three times this size, but because -of the way it is used for holding "before" lines, the longest line that is -guaranteed to be processable is the parameter size. You can change the default -parameter value by adding, for example, -

    -  --with-pcregrep-bufsize=50K
    -
    -to the configure command. The caller of pcregrep can, however, -override this value by specifying a run-time option. -

    -
    PCRETEST OPTION FOR LIBREADLINE SUPPORT
    -

    -If you add -

    -  --enable-pcretest-libreadline
    -
    -to the configure command, pcretest is linked with the -libreadline library, and when its input is from a terminal, it reads it -using the readline() function. This provides line-editing and history -facilities. Note that libreadline is GPL-licensed, so if you distribute a -binary of pcretest linked in this way, there may be licensing issues. -

    -

    -Setting this option causes the -lreadline option to be added to the -pcretest build. In many operating environments with a system-installed -libreadline this is sufficient. However, in some environments (e.g. -if an unmodified distribution version of readline is in use), some extra -configuration may be necessary. The INSTALL file for libreadline says -this: -

    -  "Readline uses the termcap functions, but does not link with the
    -  termcap or curses library itself, allowing applications which link
    -  with readline the to choose an appropriate library."
    -
    -If your environment has not been set up so that an appropriate library is -automatically included, you may need to add something like -
    -  LIBS="-lncurses"
    -
    -immediately before the configure command. -

    -
    DEBUGGING WITH VALGRIND SUPPORT
    -

    -By adding the -

    -  --enable-valgrind
    -
    -option to the configure command, PCRE will use valgrind annotations -to mark certain memory regions as unaddressable. This allows it to detect -invalid memory accesses, and is mostly useful for debugging PCRE itself. -

    -
    CODE COVERAGE REPORTING
    -

    -If your C compiler is gcc, you can build a version of PCRE that can generate a -code coverage report for its test suite. To enable this, you must install -lcov version 1.6 or above. Then specify -

    -  --enable-coverage
    -
    -to the configure command and build PCRE in the usual way. -

    -

    -Note that using ccache (a caching C compiler) is incompatible with code -coverage reporting. If you have configured ccache to run automatically -on your system, you must set the environment variable -

    -  CCACHE_DISABLE=1
    -
    -before running make to build PCRE, so that ccache is not used. -

    -

    -When --enable-coverage is used, the following additional targets are added to the -Makefile: -

    -  make coverage
    -
    -This creates a fresh coverage report for the PCRE test suite. It is equivalent -to running "make coverage-reset", "make coverage-baseline", "make check", and -then "make coverage-report". -
    -  make coverage-reset
    -
    -This zeroes the coverage counters, but does nothing else. -
    -  make coverage-baseline
    -
    -This captures baseline coverage information. -
    -  make coverage-report
    -
    -This creates the coverage report. -
    -  make coverage-clean-report
    -
    -This removes the generated coverage report without cleaning the coverage data -itself. -
    -  make coverage-clean-data
    -
    -This removes the captured coverage data without removing the coverage files -created at compile time (*.gcno). -
    -  make coverage-clean
    -
    -This cleans all coverage data including the generated coverage report. For more -information about code coverage, see the gcov and lcov -documentation. -

    -
    SEE ALSO
    -

    -pcreapi(3), pcre16, pcre32, pcre_config(3). -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 12 May 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrecallout.html b/src/pcre/doc/html/pcrecallout.html deleted file mode 100644 index 53a937f5..00000000 --- a/src/pcre/doc/html/pcrecallout.html +++ /dev/null @@ -1,286 +0,0 @@ - - -pcrecallout specification - - -

    pcrecallout man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SYNOPSIS
    -

    -#include <pcre.h> -

    -

    -int (*pcre_callout)(pcre_callout_block *); -

    -

    -int (*pcre16_callout)(pcre16_callout_block *); -

    -

    -int (*pcre32_callout)(pcre32_callout_block *); -

    -
    DESCRIPTION
    -

    -PCRE provides a feature called "callout", which is a means of temporarily -passing control to the caller of PCRE in the middle of pattern matching. The -caller of PCRE provides an external function by putting its entry point in the -global variable pcre_callout (pcre16_callout for the 16-bit -library, pcre32_callout for the 32-bit library). By default, this -variable contains NULL, which disables all calling out. -

    -

    -Within a regular expression, (?C) indicates the points at which the external -function is to be called. Different callout points can be identified by putting -a number less than 256 after the letter C. The default value is zero. -For example, this pattern has two callout points: -

    -  (?C1)abc(?C2)def
    -
    -If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE -automatically inserts callouts, all with number 255, before each item in the -pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern -
    -  A(\d{2}|--)
    -
    -it is processed as if it were -
    -
    -(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255) -
    -
    -Notice that there is a callout before and after each parenthesis and -alternation bar. If the pattern contains a conditional group whose condition is -an assertion, an automatic callout is inserted immediately before the -condition. Such a callout may also be inserted explicitly, for example: -
    -  (?(?C9)(?=a)ab|de)
    -
    -This applies only to assertion conditions (because they are themselves -independent groups). -

    -

    -Automatic callouts can be used for tracking the progress of pattern matching. -The -pcretest -program has a pattern qualifier (/C) that sets automatic callouts; when it is -used, the output indicates how the pattern is being matched. This is useful -information when you are trying to optimize the performance of a particular -pattern. -

    -
    MISSING CALLOUTS
    -

    -You should be aware that, because of optimizations in the way PCRE compiles and -matches patterns, callouts sometimes do not happen exactly as you might expect. -

    -

    -At compile time, PCRE "auto-possessifies" repeated items when it knows that -what follows cannot be part of the repeat. For example, a+[bc] is compiled as -if it were a++[bc]. The pcretest output when this pattern is anchored and -then applied with automatic callouts to the string "aaaa" is: -

    -  --->aaaa
    -   +0 ^        ^
    -   +1 ^        a+
    -   +3 ^   ^    [bc]
    -  No match
    -
    -This indicates that when matching [bc] fails, there is no backtracking into a+ -and therefore the callouts that would be taken for the backtracks do not occur. -You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS -to pcre_compile(), or starting the pattern with (*NO_AUTO_POSSESS). If -this is done in pcretest (using the /O qualifier), the output changes to -this: -
    -  --->aaaa
    -   +0 ^        ^
    -   +1 ^        a+
    -   +3 ^   ^    [bc]
    -   +3 ^  ^     [bc]
    -   +3 ^ ^      [bc]
    -   +3 ^^       [bc]
    -  No match
    -
    -This time, when matching [bc] fails, the matcher backtracks into a+ and tries -again, repeatedly, until a+ itself fails. -

    -

    -Other optimizations that provide fast "no match" results also affect callouts. -For example, if the pattern is -

    -  ab(?C4)cd
    -
    -PCRE knows that any matching string must contain the letter "d". If the subject -string is "abyz", the lack of "d" means that matching doesn't ever start, and -the callout is never reached. However, with "abyd", though the result is still -no match, the callout is obeyed. -

    -

    -If the pattern is studied, PCRE knows the minimum length of a matching string, -and will immediately give a "no match" return without actually running a match -if the subject is not long enough, or, for unanchored patterns, if it has -been scanned far enough. -

    -

    -You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE -option to the matching function, or by starting the pattern with -(*NO_START_OPT). This slows down the matching process, but does ensure that -callouts such as the example above are obeyed. -

    -
    THE CALLOUT INTERFACE
    -

    -During matching, when PCRE reaches a callout point, the external function -defined by pcre_callout or pcre[16|32]_callout is called (if it is -set). This applies to both normal and DFA matching. The only argument to the -callout function is a pointer to a pcre_callout or -pcre[16|32]_callout block. These structures contain the following -fields: -

    -  int           version;
    -  int           callout_number;
    -  int          *offset_vector;
    -  const char   *subject;           (8-bit version)
    -  PCRE_SPTR16   subject;           (16-bit version)
    -  PCRE_SPTR32   subject;           (32-bit version)
    -  int           subject_length;
    -  int           start_match;
    -  int           current_position;
    -  int           capture_top;
    -  int           capture_last;
    -  void         *callout_data;
    -  int           pattern_position;
    -  int           next_item_length;
    -  const unsigned char *mark;       (8-bit version)
    -  const PCRE_UCHAR16  *mark;       (16-bit version)
    -  const PCRE_UCHAR32  *mark;       (32-bit version)
    -
    -The version field is an integer containing the version number of the -block format. The initial version was 0; the current version is 2. The version -number will change again in future if additional fields are added, but the -intention is never to remove any of the existing fields. -

    -

    -The callout_number field contains the number of the callout, as compiled -into the pattern (that is, the number after ?C for manual callouts, and 255 for -automatically generated callouts). -

    -

    -The offset_vector field is a pointer to the vector of offsets that was -passed by the caller to the matching function. When pcre_exec() or -pcre[16|32]_exec() is used, the contents can be inspected, in order to -extract substrings that have been matched so far, in the same way as for -extracting substrings after a match has completed. For the DFA matching -functions, this field is not useful. -

    -

    -The subject and subject_length fields contain copies of the values -that were passed to the matching function. -

    -

    -The start_match field normally contains the offset within the subject at -which the current match attempt started. However, if the escape sequence \K -has been encountered, this value is changed to reflect the modified starting -point. If the pattern is not anchored, the callout function may be called -several times from the same point in the pattern for different starting points -in the subject. -

    -

    -The current_position field contains the offset within the subject of the -current match pointer. -

    -

    -When pcre_exec() or pcre[16|32]_exec() is used, the -capture_top field contains one more than the number of the highest -numbered captured substring so far. If no substrings have been captured, the -value of capture_top is one. This is always the case when the DFA -functions are used, because they do not support captured substrings. -
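As an illustration of reading offset_vector inside a callout, the sketch below copies one captured substring into a caller-supplied buffer. It assumes the layout used for extraction after a completed match, where entries 2n and 2n+1 hold the start and end offsets of substring n and unset substrings are marked with -1. The helper demo_copy_capture is hypothetical; it takes the relevant callout block fields as plain arguments so that it compiles without pcre.h.

```c
#include <stddef.h>
#include <string.h>

/* Copy captured substring n (1-based) into buf as a zero-terminated string.
   Returns its length, or -1 if the substring has not been captured so far
   or does not fit in buf. */
int demo_copy_capture(const char *subject, const int *offset_vector,
                      int capture_top, int n, char *buf, size_t buflen)
{
  int start, end;
  size_t len;

  if (n < 1 || n >= capture_top) return -1;   /* beyond highest capture */
  start = offset_vector[2 * n];
  end = offset_vector[2 * n + 1];
  if (start < 0 || end < start) return -1;    /* group is unset */
  len = (size_t)(end - start);
  if (len + 1 > buflen) return -1;            /* no room for terminator */
  memcpy(buf, subject + start, len);
  buf[len] = '\0';
  return (int)len;
}
```

Inside a real callout function the arguments would come from the subject, offset_vector, and capture_top fields of the block. Entry 0 (the whole match) is deliberately rejected, since the overall match is not complete while a callout is running.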

    -

    -The capture_last field contains the number of the most recently captured -substring. However, when a recursion exits, the value reverts to what it was -outside the recursion, as do the values of all captured substrings. If no -substrings have been captured, the value of capture_last is -1. This is -always the case for the DFA matching functions. -

    -

    -The callout_data field contains a value that is passed to a matching -function specifically so that it can be passed back in callouts. It is passed -in the callout_data field of a pcre_extra or pcre[16|32]_extra -data structure. If no such data was passed, the value of callout_data in -a callout block is NULL. There is a description of the pcre_extra -structure in the -pcreapi -documentation. -

    -

    -The pattern_position field is present from version 1 of the callout -structure. It contains the offset to the next item to be matched in the pattern -string. -

    -

    -The next_item_length field is present from version 1 of the callout -structure. It contains the length of the next item to be matched in the pattern -string. When the callout immediately precedes an alternation bar, a closing -parenthesis, or the end of the pattern, the length is zero. When the callout -precedes an opening parenthesis, the length is that of the entire subpattern. -

    -

    -The pattern_position and next_item_length fields are intended to -help in distinguishing between different automatic callouts, which all have the -same callout number. However, they are set for all callouts. -

    -

    -The mark field is present from version 2 of the callout structure. In -callouts from pcre_exec() or pcre[16|32]_exec() it contains a -pointer to the zero-terminated name of the most recently passed (*MARK), -(*PRUNE), or (*THEN) item in the match, or NULL if no such items have been -passed. Instances of (*PRUNE) or (*THEN) without a name do not obliterate a -previous (*MARK). In callouts from the DFA matching functions this field always -contains NULL. -

    -
    RETURN VALUES
    -

    -The external callout function returns an integer to PCRE. If the value is zero, -matching proceeds as normal. If the value is greater than zero, matching fails -at the current point, but the testing of other matching possibilities goes -ahead, just as if a lookahead assertion had failed. If the value is less than -zero, the match is abandoned and the matching function returns the negative value. -

    -

    -Negative values should normally be chosen from the set of PCRE_ERROR_xxx -values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure. -The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions; -it will never be used by PCRE itself. -
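The three kinds of return value can be illustrated with a small callout function. This is a sketch under stated assumptions: demo_callout_block is a stand-in that copies the 8-bit field list shown earlier so that the example compiles without pcre.h, and demo_budget, DEMO_ABANDON, and demo_callout are hypothetical names. Real code would include pcre.h, take a pcre_callout_block pointer, and normally return one of the PCRE_ERROR_xxx values for the negative case.

```c
#include <stddef.h>

/* Stand-in for the 8-bit callout block, fields as listed in the text. */
typedef struct demo_callout_block {
  int           version;
  int           callout_number;
  int          *offset_vector;
  const char   *subject;
  int           subject_length;
  int           start_match;
  int           current_position;
  int           capture_top;
  int           capture_last;
  void         *callout_data;
  int           pattern_position;
  int           next_item_length;
  const unsigned char *mark;
} demo_callout_block;

/* Hypothetical caller data handed back through callout_data. */
typedef struct demo_budget {
  int calls;       /* callouts seen so far */
  int max_calls;   /* give up after this many */
} demo_budget;

/* Arbitrary negative value chosen for this sketch only. */
#define DEMO_ABANDON (-1000)

int demo_callout(demo_callout_block *cb)
{
  demo_budget *budget = (demo_budget *)cb->callout_data;

  /* Negative: abandon the whole match once the budget is used up. */
  if (budget != NULL && ++budget->calls > budget->max_calls)
    return DEMO_ABANDON;

  /* Positive: at callout 2, refuse to continue if the next subject
     character is 'x'; other matching possibilities are still tried. */
  if (cb->callout_number == 2 &&
      cb->current_position < cb->subject_length &&
      cb->subject[cb->current_position] == 'x')
    return 1;

  /* Zero: matching proceeds as normal. */
  return 0;
}
```

With the real library, a function of this shape is activated by storing its entry point in the pcre_callout variable, and the budget structure would reach it through the callout_data field of a pcre_extra block.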

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 12 November 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrecompat.html b/src/pcre/doc/html/pcrecompat.html deleted file mode 100644 index d95570ef..00000000 --- a/src/pcre/doc/html/pcrecompat.html +++ /dev/null @@ -1,235 +0,0 @@ - - -pcrecompat specification - - -

    pcrecompat man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -DIFFERENCES BETWEEN PCRE AND PERL -
    -

    -This document describes the differences in the ways that PCRE and Perl handle -regular expressions. The differences described here are with respect to Perl -versions 5.10 and above. -

    -

    -1. PCRE has only a subset of Perl's Unicode support. Details of what it does -have are given in the -pcreunicode -page. -

    -

    -2. PCRE allows repeat quantifiers only on parenthesized assertions, but they do -not mean what you might think. For example, (?!a){3} does not assert that the -next three characters are not "a". It just asserts that the next character is -not "a" three times (in principle: PCRE optimizes this to run the assertion -just once). Perl allows repeat quantifiers on other assertions such as \b, but -these do not seem to have any use. -

    -

    -3. Capturing subpatterns that occur inside negative lookahead assertions are -counted, but their entries in the offsets vector are never set. Perl sometimes -(but not always) sets its numerical variables from inside negative assertions. -

    -

    -4. Though binary zero characters are supported in the subject string, they are -not allowed in a pattern string because it is passed as a normal C string, -terminated by zero. The escape sequence \0 can be used in the pattern to -represent a binary zero. -

    -

    -5. The following Perl escape sequences are not supported: \l, \u, \L, -\U, and \N when followed by a character name or Unicode value. (\N on its -own, matching a non-newline character, is supported.) In fact these are -implemented by Perl's general string-handling and are not part of its pattern -matching engine. If any of these are encountered by PCRE, an error is -generated by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set, -\U and \u are interpreted as JavaScript interprets them. -

    -

    -6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is -built with Unicode character property support. The properties that can be -tested with \p and \P are limited to the general category properties such as -Lu and Nd, script names such as Greek or Han, and the derived properties Any -and L&. PCRE does support the Cs (surrogate) property, which Perl does not; the -Perl documentation says "Because Perl hides the need for the user to understand -the internal representation of Unicode characters, there is no need to -implement the somewhat messy concept of surrogates." -

    -

    -7. PCRE does support the \Q...\E escape for quoting substrings. Characters in -between are treated as literals. This is slightly different from Perl in that $ -and @ are also handled as literals inside the quotes. In Perl, they cause -variable interpolation (but of course PCRE does not have variables). Note the -following examples: -

    -    Pattern            PCRE matches      Perl matches
    -
    -    \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
    -    \Qabc\$xyz\E       abc\$xyz          abc\$xyz
    -    \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
    -
    -The \Q...\E sequence is recognized both inside and outside character classes. -

    -

    -8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) -constructions. However, there is support for recursive patterns. This is not -available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" -feature allows an external function to be called during pattern matching. See -the -pcrecallout -documentation for details. -

    -

    -9. Subpatterns that are called as subroutines (whether or not recursively) are -always treated as atomic groups in PCRE. This is like Python, but unlike Perl. -Captured values that are set outside a subroutine call can be referenced from -inside in PCRE, but not in Perl. There is a discussion that explains these -differences in more detail in the -section on recursion differences from Perl -in the -pcrepattern -page. -

    -

    -10. If any of the backtracking control verbs are used in a subpattern that is -called as a subroutine (whether or not recursively), their effect is confined -to that subpattern; it does not extend to the surrounding pattern. This is not -always the case in Perl. In particular, if (*THEN) is present in a group that -is called as a subroutine, its action is limited to that group, even if the -group does not contain any | characters. Note that such subpatterns are -processed as anchored at the point where they are tested. -

    -

    -11. If a pattern contains more than one backtracking control verb, the first -one that is backtracked onto acts. For example, in the pattern -A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C -triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the -same as PCRE, but there are examples where it differs. -

    -

    -12. Most backtracking verbs in assertions have their normal actions. They are -not confined to the assertion. -

    -

    -13. There are some differences that are concerned with the settings of captured -strings when part of a pattern is repeated. For example, matching "aba" against -the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". -

    -

    -14. PCRE's handling of duplicate subpattern numbers and duplicate subpattern -names is not as general as Perl's. This is a consequence of the fact that PCRE -works internally just with numbers, using an external table to translate -between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), -where the two capturing parentheses have the same number but different names, -is not supported, and causes an error at compile time. If it were allowed, it -would not be possible to distinguish which parentheses matched, because both -names map to capturing subpattern number 1. To avoid this confusing situation, -an error is given at compile time. -

    -

    -15. Perl recognizes comments in some places that PCRE does not, for example, -between the ( and ? at the start of a subpattern. If the /x modifier is set, -Perl allows white space between ( and ? (though current Perls warn that this is -deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set. -

    -

    -16. Perl, when in warning mode, gives warnings for character classes such as -[A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no -warning features, so it gives an error in these cases because they are almost -certainly user mistakes. -

    -

    -17. In PCRE, the upper/lower case character properties Lu and Ll are not -affected when case-independent matching is specified. For example, \p{Lu} -always matches an upper case letter. I think Perl has changed in this respect; -in the release at the time of writing (5.16), \p{Lu} and \p{Ll} match all -letters, regardless of case, when case independence is specified. -

    -

    -18. PCRE provides some extensions to the Perl regular expression facilities. -Perl 5.10 includes new features that are not in earlier versions of Perl, some -of which (such as named parentheses) have been in PCRE for some time. This list -is with respect to Perl 5.10: -
    -
    -(a) Although lookbehind assertions in PCRE must match fixed length strings, -each alternative branch of a lookbehind assertion can match a different length -of string. Perl requires them all to have the same length. -
    -
    -(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $ -meta-character matches only at the very end of the string. -
    -
    -(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special -meaning is faulted. Otherwise, like Perl, the backslash is quietly ignored. -(Perl can be made to issue a warning.) -
    -
    -(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is -inverted, that is, by default they are not greedy, but if followed by a -question mark they are. -
    -
    -(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried -only at the first matching position in the subject string. -
    -
    -(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, and -PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no Perl equivalents. -
    -
    -(g) The \R escape sequence can be restricted to match only CR, LF, or CRLF -by the PCRE_BSR_ANYCRLF option. -
    -
    -(h) The callout facility is PCRE-specific. -
    -
    -(i) The partial matching facility is PCRE-specific. -
    -
    -(j) Patterns compiled by PCRE can be saved and re-used at a later time, even on -different hosts that have the other endianness. However, this does not apply to -optimized data created by the just-in-time compiler. -
    -
    -(k) The alternative matching functions (pcre_dfa_exec(), -pcre16_dfa_exec() and pcre32_dfa_exec()) match in a different way -and are not Perl-compatible. -
    -
    -(l) PCRE recognizes some special sequences such as (*CR) at the start of -a pattern that set overall options that cannot be changed within the pattern. -

    -
    -AUTHOR -
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    -REVISION -
    -

    -Last updated: 10 November 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrecpp.html b/src/pcre/doc/html/pcrecpp.html deleted file mode 100644 index b7eac3a3..00000000 --- a/src/pcre/doc/html/pcrecpp.html +++ /dev/null @@ -1,368 +0,0 @@ - - -pcrecpp specification - - -

    pcrecpp man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SYNOPSIS OF C++ WRAPPER
    -

    -#include <pcrecpp.h> -

    -
    DESCRIPTION
    -

    -The C++ wrapper for PCRE was provided by Google Inc. Some additional -functionality was added by Giuseppe Maxia. This brief man page was constructed -from the notes in the pcrecpp.h file, which should be consulted for -further details. Note that the C++ wrapper supports only the original 8-bit -PCRE library. There is no 16-bit or 32-bit support at present. -

    -
    MATCHING INTERFACE
    -

    -The "FullMatch" operation checks that supplied text matches a supplied pattern -exactly. If pointer arguments are supplied, it copies matched sub-strings that -match sub-patterns into them. -

    -  Example: successful match
    -     pcrecpp::RE re("h.*o");
    -     re.FullMatch("hello");
    -
    -  Example: unsuccessful match (requires full match):
    -     pcrecpp::RE re("e");
    -     !re.FullMatch("hello");
    -
    -  Example: creating a temporary RE object:
    -     pcrecpp::RE("h.*o").FullMatch("hello");
    -
    -You can pass in a "const char*" or a "string" for "text". The examples below -tend to use a const char*. You can, as in the different examples above, store -the RE object explicitly in a variable or use a temporary RE object. The -examples below use one mode or the other arbitrarily. Either could correctly be -used for any of these examples. -

    -

    -You must supply extra pointer arguments to extract matched subpieces. -

    -  Example: extracts "ruby" into "s" and 1234 into "i"
    -     int i;
    -     string s;
    -     pcrecpp::RE re("(\\w+):(\\d+)");
    -     re.FullMatch("ruby:1234", &s, &i);
    -
    -  Example: does not try to extract any extra sub-patterns
    -     re.FullMatch("ruby:1234", &s);
    -
    -  Example: does not try to extract into NULL
    -     re.FullMatch("ruby:1234", NULL, &i);
    -
    -  Example: integer overflow causes failure
    -     !re.FullMatch("ruby:1234567891234", NULL, &i);
    -
    -  Example: fails because there aren't enough sub-patterns:
    -     !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
    -
    -  Example: fails because string cannot be stored in integer
    -     !pcrecpp::RE("(.*)").FullMatch("ruby", &i);
    -
    -The provided pointer arguments can be pointers to any scalar numeric -type, or one of: -
    -   string        (matched piece is copied to string)
    -   StringPiece   (StringPiece is mutated to point to matched piece)
    -   T             (where "bool T::ParseFrom(const char*, int)" exists)
    -   NULL          (the corresponding matched sub-pattern is not copied)
    -
    -The function returns true iff all of the following conditions are satisfied: -
    -  a. "text" matches "pattern" exactly;
    -
    -  b. The number of matched sub-patterns is >= number of supplied
    -     pointers;
    -
    -  c. The "i"th argument has a suitable type for holding the
    -     string captured as the "i"th sub-pattern. If you pass in
    -     void * NULL for the "i"th argument, or a non-void * NULL
    -     of the correct type, or pass fewer arguments than the
    -     number of sub-patterns, the "i"th captured sub-pattern is
    -     ignored.
    -
    -CAVEAT: An optional sub-pattern that does not exist in the matched -string is assigned the empty string. Therefore, the following will -return false (because the empty string is not a valid number): -
    -   int number;
    -   pcrecpp::RE::FullMatch("abc", "[a-z]+(\\d+)?", &number);
    -
    -The matching interface supports at most 16 arguments per call. -If you need more, consider using the more general interface -pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for -DoMatch. -

    -

    -NOTE: Do not use no_arg, which is used internally to mark the end of a -list of optional arguments, as a placeholder for missing arguments, as this can -lead to segfaults. -

    -
    QUOTING METACHARACTERS
    -

    -You can use the "QuoteMeta" operation to insert backslashes before all -potentially meaningful characters in a string. The returned string, used as a -regular expression, will exactly match the original string. -

    -  Example:
    -     string quoted = RE::QuoteMeta(unquoted);
    -
    -Note that it's legal to escape a character even if it has no special meaning in -a regular expression -- so this function does that. (This also makes it -identical to the perl function of the same name; see "perldoc -f quotemeta".) -For example, "1.5-2.0?" becomes "1\.5\-2\.0\?". -
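The escaping rule described above can be sketched in plain C. This is only an illustration, not the pcrecpp implementation: `quote_meta_sketch` is a hypothetical name, and the rule it applies (backslash before every ASCII byte that is not a letter, digit, or underscore, with bytes of 0x80 and above copied unchanged) is an assumption chosen to reproduce the "1.5-2.0?" example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of pcrecpp. Returns a malloc'd copy of
   `src` with a backslash inserted before every ASCII byte that is not a
   letter, digit, or underscore. Bytes >= 0x80 are copied unchanged (an
   assumption, so that multi-byte UTF-8 sequences stay intact). The caller
   frees the result; NULL is returned if allocation fails. */
char *quote_meta_sketch(const char *src)
{
    size_t len = strlen(src);
    char *out = malloc(2 * len + 1);    /* worst case: every byte escaped */
    char *p = out;
    size_t i;

    if (out == NULL) return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x80 && !isalnum(c) && c != '_')
            *p++ = '\\';                /* escape, even if not strictly needed */
        *p++ = (char)c;
    }
    *p = '\0';
    return out;
}
```

Escaping more than strictly necessary keeps the rule simple: there is no need to track which characters are special in which context.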

    -
    PARTIAL MATCHES
    -

    -You can use the "PartialMatch" operation when you want the pattern -to match any substring of the text. -

    -  Example: simple search for a string:
    -     pcrecpp::RE("ell").PartialMatch("hello");
    -
    -  Example: find first number in a string:
    -     int number;
    -     pcrecpp::RE re("(\\d+)");
    -     re.PartialMatch("x*100 + 20", &number);
    -     assert(number == 100);
    -
    -

    -
    UTF-8 AND THE MATCHING INTERFACE
    -

    -By default, pattern and text are plain text, one byte per character. The UTF8 -flag, passed to the constructor, causes both pattern and string to be treated -as UTF-8 text, still a byte stream but potentially multiple bytes per -character. In practice, the text is likelier to be UTF-8 than the pattern, but -the match returned may depend on the UTF8 flag, so always use it when matching -UTF-8 text. For example, "." matches one byte normally, but with UTF8 set it -matches a whole character, which may occupy several bytes. -

    -  Example:
    -     pcrecpp::RE_Options options;
    -     options.set_utf8();
    -     pcrecpp::RE re(utf8_pattern, options);
    -     re.FullMatch(utf8_string);
    -
    -  Example: using the convenience function UTF8():
    -     pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
    -     re.FullMatch(utf8_string);
    -
    -NOTE: The UTF8 flag is ignored if pcre was not configured with the -
    -      --enable-utf8 flag.
    -
    -

    -
    PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE
    -

    -PCRE defines some modifiers to change the behavior of the regular expression -engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to -pass such modifiers to a RE class. Currently, the following modifiers are -supported: -

    -   modifier              description               Perl corresponding
    -
    -   PCRE_CASELESS         case insensitive match      /i
    -   PCRE_MULTILINE        multiple lines match        /m
    -   PCRE_DOTALL           dot matches newlines        /s
    -   PCRE_DOLLAR_ENDONLY   $ matches only at end       N/A
    -   PCRE_EXTRA            strict escape parsing       N/A
    -   PCRE_EXTENDED         ignore white spaces         /x
    -   PCRE_UTF8             handles UTF8 chars          built-in
    -   PCRE_UNGREEDY         reverses * and *?           N/A
    -   PCRE_NO_AUTO_CAPTURE  disables capturing parens   N/A (*)
    -
    -(*) Both Perl and PCRE allow non capturing parentheses by means of the -"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not -capture, while (ab|cd) does. -

    -

    -For a full account of how each modifier works, please check the -PCRE API reference page. -

    -

    -For each modifier, there are two member functions whose name is made -out of the modifier in lowercase, without the "PCRE_" prefix. For -instance, PCRE_CASELESS is handled by -

    -  bool caseless()
    -
    -which returns true if the modifier is set, and -
    -  RE_Options & set_caseless(bool)
    -
    -which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be -accessed through the set_match_limit() and match_limit() member -functions. Setting match_limit to a non-zero value will limit the -execution of pcre to keep it from doing bad things like blowing the stack or -taking an eternity to return a result. A value of 5000 is good enough to stop -stack blowup in a 2MB thread stack. Setting match_limit to zero disables -match limiting. Alternatively, you can call match_limit_recursion() -which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE -recurses. match_limit() limits the number of matches PCRE does; -match_limit_recursion() limits the depth of internal recursion, and -therefore the amount of stack that is used. -

    -

    -Normally, to pass one or more modifiers to a RE class, you declare -a RE_Options object, set the appropriate options, and pass this -object to a RE constructor. Example: -

    -   RE_Options opt;
    -   opt.set_caseless(true);
    -   if (RE("HELLO", opt).PartialMatch("hello world")) ...
    -
    -RE_Options has two constructors. The default constructor takes no arguments and -creates a set of flags that are off by default. The other takes a parameter, -option_flags, which eases the transfer of legacy code from C programs. -This lets you do -
    -   RE(pattern,
    -     RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
    -
    -However, new code is better off doing -
    -   RE(pattern,
    -     RE_Options().set_caseless(true).set_multiline(true))
    -       .PartialMatch(str);
    -
    -If you are going to pass one of the most used modifiers, there are some -convenience functions that return a RE_Options class with the -appropriate modifier already set: CASELESS(), UTF8(), -MULTILINE(), DOTALL(), and EXTENDED(). -

    -

    -If you need to set several options at once, and you don't want to go through -the pains of declaring a RE_Options object and setting several options, there -is a parallel method that gives you this ability on the fly. You can chain -several set_xxxxx() member functions, since each of them returns a -reference to its class object. For example, to pass PCRE_CASELESS, -PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write: -

    -   RE(" ^ xyz \\s+ .* blah$",
    -     RE_Options()
    -       .set_caseless(true)
    -       .set_extended(true)
    -       .set_multiline(true)).PartialMatch(sometext);
    -
    -
    -

    -
    SCANNING TEXT INCREMENTALLY
    -

    -The "Consume" operation may be useful if you want to repeatedly -match regular expressions at the front of a string and skip over -them as they match. This requires use of the "StringPiece" type, -which represents a sub-range of a real string. Like RE, StringPiece -is defined in the pcrecpp namespace. -

    -  Example: read lines of the form "var = value" from a string.
    -     string contents = ...;                 // Fill string somehow
    -     pcrecpp::StringPiece input(contents);  // Wrap in a StringPiece
    -
    -     string var;
    -     int value;
    -     pcrecpp::RE re("(\\w+) = (\\d+)\n");
    -     while (re.Consume(&input, &var, &value)) {
    -       ...;
    -     }
    -
    -Each successful call to "Consume" will set "var/value", and also -advance "input" so it points past the matched text. -
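The consume-and-advance behaviour described above can be imitated without PCRE. The sketch below is a stand-in written in C with `sscanf`, not the wrapper's code: `consume_assignment` is a hypothetical name, the 64-byte buffer is an arbitrary choice, and the matching is looser than the regular expression in the example (`sscanf` accepts any run of white space around the "=" and a signed number).

```c
#include <stdio.h>

/* Hypothetical stand-in for the Consume idea, using sscanf rather than
   PCRE. Tries to read "name = number" followed by a newline at the front
   of *input. On success it stores both pieces, moves *input past the
   consumed line, and returns 1. On failure it returns 0 and leaves *input
   unchanged (var and value may have been partly overwritten).
   The A-Z style ranges in the scanset are widely supported but formally
   implementation-defined. */
int consume_assignment(const char **input, char var[64], int *value)
{
    int consumed = 0;

    /* %n records how many bytes were read up to that point. */
    if (sscanf(*input, "%63[A-Za-z0-9_] = %d%n", var, value, &consumed) != 2)
        return 0;
    if ((*input)[consumed] != '\n')
        return 0;                       /* the line terminator is required */
    *input += consumed + 1;             /* step past the match and the newline */
    return 1;
}
```

Calling it in a `while` loop on a cursor into the text mirrors the loop in the example: each success yields one pair and leaves the cursor at the start of the next line.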

    -

    -The "FindAndConsume" operation is similar to "Consume" but does not -anchor your match at the beginning of the string. For example, you -could extract all words from a string by repeatedly calling -

    -  pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
    -
    -

    -
    PARSING HEX/OCTAL/C-RADIX NUMBERS
    -

    -By default, if you pass a pointer to a numeric value, the -corresponding text is interpreted as a base-10 number. You can -instead wrap the pointer with a call to one of the operators Hex(), -Octal(), or CRadix() to interpret the text in another base. The -CRadix operator interprets C-style "0" (base-8) and "0x" (base-16) -prefixes, but defaults to base-10. -

    -  Example:
    -    int a, b, c, d;
    -    pcrecpp::RE re("(.*) (.*) (.*) (.*)");
    -    re.FullMatch("100 40 0100 0x40",
    -                 pcrecpp::Octal(&a), pcrecpp::Hex(&b),
    -                 pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
    -
    -will leave 64 in a, b, c, and d. -
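The three radix modes correspond closely to the `base` argument of the standard C function `strtol`: 8, 16, and 0 (where 0 means "infer from a 0 or 0x prefix, else decimal"). The sketch below shows that correspondence only; `parse_int_radix` is a hypothetical helper, and whether the wrapper is built on `strtol` is not stated here. Unlike the wrapper, `strtol` also skips leading white space.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper, not part of pcrecpp. Parses all of `text` as an
   int in the given base (use 8, 16, or 10; base 0 gives C-style prefix
   detection). Returns 1 and stores the value on success. Returns 0 if the
   text is empty, has trailing characters, or does not fit in an int. */
int parse_int_radix(const char *text, int base, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, base);
    if (end == text || *end != '\0')
        return 0;                       /* no digits, or junk after them */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* overflow counts as failure */
    *out = (int)v;
    return 1;
}
```

With this helper, "100" in base 8, "40" in base 16, and "0100" and "0x40" in base 0 all yield 64, while an empty string or "1234567891234" in base 10 is rejected.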

    -
    REPLACING PARTS OF STRINGS
    -

    -You can replace the first match of "pattern" in "str" with "rewrite". -Within "rewrite", backslash-escaped digits (\1 to \9) can be -used to insert text matching corresponding parenthesized group -from the pattern. \0 in "rewrite" refers to the entire matching -text. For example: -

    -  string s = "yabba dabba doo";
    -  pcrecpp::RE("b+").Replace("d", &s);
    -
    -will leave "s" containing "yada dabba doo". The result is true if the pattern -matches and a replacement occurs, false otherwise. -

    -

    -GlobalReplace is like Replace except that it replaces all -occurrences of the pattern in the string with the rewrite. Replacements are -not subject to re-matching. For example: -

    -  string s = "yabba dabba doo";
    -  pcrecpp::RE("b+").GlobalReplace("d", &s);
    -
    -will leave "s" containing "yada dada doo". It returns the number of -replacements made. -

    -

    -Extract is like Replace, except that if the pattern matches, -"rewrite" is copied into "out" (an additional argument) with substitutions. -The non-matching portions of "text" are ignored. Returns true iff a match -occurred and the extraction happened successfully; if no match occurs, the -string is left unaffected. -

    -
    AUTHOR
    -

    -The C++ wrapper was contributed by Google Inc. -
    -Copyright © 2007 Google Inc. -
    -

    -
    REVISION
    -

    -Last updated: 08 January 2012 -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcredemo.html b/src/pcre/doc/html/pcredemo.html deleted file mode 100644 index 894a9308..00000000 --- a/src/pcre/doc/html/pcredemo.html +++ /dev/null @@ -1,426 +0,0 @@ - - -pcredemo specification - - -

    pcredemo man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

      -
    -
    -/*************************************************
    -*           PCRE DEMONSTRATION PROGRAM           *
    -*************************************************/
    -
    -/* This is a demonstration program to illustrate the most straightforward ways
    -of calling the PCRE regular expression library from a C program. See the
    -pcresample documentation for a short discussion ("man pcresample" if you have
    -the PCRE man pages installed).
    -
    -In Unix-like environments, if PCRE is installed in your standard system
    -libraries, you should be able to compile this program using this command:
    -
    -gcc -Wall pcredemo.c -lpcre -o pcredemo
    -
    -If PCRE is not installed in a standard place, it is likely to be installed with
    -support for the pkg-config mechanism. If you have pkg-config, you can compile
    -this program using this command:
    -
    -gcc -Wall pcredemo.c `pkg-config --cflags --libs libpcre` -o pcredemo
    -
    -If you do not have pkg-config, you may have to use this:
    -
    -gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \
    -  -R/usr/local/lib -lpcre -o pcredemo
    -
    -Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
    -library files for PCRE are installed on your system. Only some operating
    -systems (e.g. Solaris) use the -R option.
    -
    -Building under Windows:
    -
    -If you want to statically link this program against a non-dll .a file, you must
    -define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and
    -pcre_free() exported functions will be declared __declspec(dllimport), with
    -unwanted results. So in this environment, uncomment the following line. */
    -
    -/* #define PCRE_STATIC */
    -
    -#include <stdio.h>
    -#include <string.h>
    -#include <pcre.h>
    -
    -#define OVECCOUNT 30    /* should be a multiple of 3 */
    -
    -
    -int main(int argc, char **argv)
    -{
    -pcre *re;
    -const char *error;
    -char *pattern;
    -char *subject;
    -unsigned char *name_table;
    -unsigned int option_bits;
    -int erroffset;
    -int find_all;
    -int crlf_is_newline;
    -int namecount;
    -int name_entry_size;
    -int ovector[OVECCOUNT];
    -int subject_length;
    -int rc, i;
    -int utf8;
    -
    -
    -/**************************************************************************
    -* First, sort out the command line. There is only one possible option at  *
    -* the moment, "-g" to request repeated matching to find all occurrences,  *
    -* like Perl's /g option. We set the variable find_all to a non-zero value *
    -* if the -g option is present. Apart from that, there must be exactly two *
    -* arguments.                                                              *
    -**************************************************************************/
    -
    -find_all = 0;
    -for (i = 1; i < argc; i++)
    -  {
    -  if (strcmp(argv[i], "-g") == 0) find_all = 1;
    -    else break;
    -  }
    -
    -/* After the options, we require exactly two arguments, which are the pattern,
    -and the subject string. */
    -
    -if (argc - i != 2)
    -  {
    -  printf("Two arguments required: a regex and a subject string\n");
    -  return 1;
    -  }
    -
    -pattern = argv[i];
    -subject = argv[i+1];
    -subject_length = (int)strlen(subject);
    -
    -
    -/*************************************************************************
    -* Now we are going to compile the regular expression pattern, and handle *
    -* any errors that are detected.                                          *
    -*************************************************************************/
    -
    -re = pcre_compile(
    -  pattern,              /* the pattern */
    -  0,                    /* default options */
    -  &error,               /* for error message */
    -  &erroffset,           /* for error offset */
    -  NULL);                /* use default character tables */
    -
    -/* Compilation failed: print the error message and exit */
    -
    -if (re == NULL)
    -  {
    -  printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
    -  return 1;
    -  }
    -
    -
    -/*************************************************************************
    -* If the compilation succeeded, we call PCRE again, in order to do a     *
    -* pattern match against the subject string. This does just ONE match. If *
    -* further matching is needed, it will be done below.                     *
    -*************************************************************************/
    -
    -rc = pcre_exec(
    -  re,                   /* the compiled pattern */
    -  NULL,                 /* no extra data - we didn't study the pattern */
    -  subject,              /* the subject string */
    -  subject_length,       /* the length of the subject */
    -  0,                    /* start at offset 0 in the subject */
    -  0,                    /* default options */
    -  ovector,              /* output vector for substring information */
    -  OVECCOUNT);           /* number of elements in the output vector */
    -
    -/* Matching failed: handle error cases */
    -
    -if (rc < 0)
    -  {
    -  switch(rc)
    -    {
    -    case PCRE_ERROR_NOMATCH: printf("No match\n"); break;
    -    /*
    -    Handle other special cases if you like
    -    */
    -    default: printf("Matching error %d\n", rc); break;
    -    }
    -  pcre_free(re);     /* Release memory used for the compiled pattern */
    -  return 1;
    -  }
    -
    -/* Match succeeded */
    -
    -printf("\nMatch succeeded at offset %d\n", ovector[0]);
    -
    -
    -/*************************************************************************
    -* We have found the first match within the subject string. If the output *
    -* vector wasn't big enough, say so. Then output any substrings that were *
    -* captured.                                                              *
    -*************************************************************************/
    -
    -/* The output vector wasn't big enough */
    -
    -if (rc == 0)
    -  {
    -  rc = OVECCOUNT/3;
    -  printf("ovector only has room for %d captured substrings\n", rc - 1);
    -  }
    -
    -/* Show substrings stored in the output vector by number. Obviously, in a real
    -application you might want to do things other than print them. */
    -
    -for (i = 0; i < rc; i++)
    -  {
    -  char *substring_start = subject + ovector[2*i];
    -  int substring_length = ovector[2*i+1] - ovector[2*i];
    -  printf("%2d: %.*s\n", i, substring_length, substring_start);
    -  }
    -
    -
    -/**************************************************************************
    -* That concludes the basic part of this demonstration program. We have    *
    -* compiled a pattern, and performed a single match. The code that follows *
    -* shows first how to access named substrings, and then how to code for    *
    -* repeated matches on the same subject.                                   *
    -**************************************************************************/
    -
    -/* See if there are any named substrings, and if so, show them by name. First
    -we have to extract the count of named parentheses from the pattern. */
    -
    -(void)pcre_fullinfo(
    -  re,                   /* the compiled pattern */
    -  NULL,                 /* no extra data - we didn't study the pattern */
    -  PCRE_INFO_NAMECOUNT,  /* number of named substrings */
    -  &namecount);          /* where to put the answer */
    -
    -if (namecount <= 0) printf("No named substrings\n"); else
    -  {
    -  unsigned char *tabptr;
    -  printf("Named substrings\n");
    -
    -  /* Before we can access the substrings, we must extract the table for
    -  translating names to numbers, and the size of each entry in the table. */
    -
    -  (void)pcre_fullinfo(
    -    re,                       /* the compiled pattern */
    -    NULL,                     /* no extra data - we didn't study the pattern */
    -    PCRE_INFO_NAMETABLE,      /* address of the table */
    -    &name_table);             /* where to put the answer */
    -
    -  (void)pcre_fullinfo(
    -    re,                       /* the compiled pattern */
    -    NULL,                     /* no extra data - we didn't study the pattern */
    -    PCRE_INFO_NAMEENTRYSIZE,  /* size of each entry in the table */
    -    &name_entry_size);        /* where to put the answer */
    -
    -  /* Now we can scan the table and, for each entry, print the number, the name,
    -  and the substring itself. */
    -
    -  tabptr = name_table;
    -  for (i = 0; i < namecount; i++)
    -    {
    -    int n = (tabptr[0] << 8) | tabptr[1];
    -    printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
    -      ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
    -    tabptr += name_entry_size;
    -    }
    -  }
    -
    -
    -/*************************************************************************
    -* If the "-g" option was given on the command line, we want to continue  *
    -* to search for additional matches in the subject string, in a similar   *
    -* way to the /g option in Perl. This turns out to be trickier than you   *
    -* might think because of the possibility of matching an empty string.    *
    -* What happens is as follows:                                            *
    -*                                                                        *
    -* If the previous match was NOT for an empty string, we can just start   *
    -* the next match at the end of the previous one.                         *
    -*                                                                        *
    -* If the previous match WAS for an empty string, we can't do that, as it *
    -* would lead to an infinite loop. Instead, a special call of pcre_exec() *
    -* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set.    *
    -* The first of these tells PCRE that an empty string at the start of the *
    -* subject is not a valid match; other possibilities must be tried. The   *
    -* second flag restricts PCRE to one match attempt at the initial string  *
    -* position. If this match succeeds, an alternative to the empty string   *
    -* match has been found, and we can print it and proceed round the loop,  *
    -* advancing by the length of whatever was found. If this match does not  *
    -* succeed, we still stay in the loop, advancing by just one character.   *
    -* In UTF-8 mode, which can be set by (*UTF8) in the pattern, this may be *
    -* more than one byte.                                                    *
    -*                                                                        *
    -* However, there is a complication concerned with newlines. When the     *
    -* newline convention is such that CRLF is a valid newline, we must       *
    -* advance by two characters rather than one. The newline convention can  *
    -* be set in the regex by (*CR), etc.; if not, we must find the default.  *
    -*************************************************************************/
    -
    -if (!find_all)     /* Check for -g */
    -  {
    -  pcre_free(re);   /* Release the memory used for the compiled pattern */
    -  return 0;        /* Finish unless -g was given */
    -  }
    -
    -/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline
    -sequence. First, find the options with which the regex was compiled; extract
    -the UTF-8 state, and mask off all but the newline options. */
    -
    -(void)pcre_fullinfo(re, NULL, PCRE_INFO_OPTIONS, &option_bits);
    -utf8 = option_bits & PCRE_UTF8;
    -option_bits &= PCRE_NEWLINE_CR|PCRE_NEWLINE_LF|PCRE_NEWLINE_CRLF|
    -               PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF;
    -
    -/* If no newline options were set, find the default newline convention from the
    -build configuration. */
    -
    -if (option_bits == 0)
    -  {
    -  int d;
    -  (void)pcre_config(PCRE_CONFIG_NEWLINE, &d);
    -  /* Note that these values are always the ASCII ones, even in
    -  EBCDIC environments. CR = 13, NL = 10. */
    -  option_bits = (d == 13)? PCRE_NEWLINE_CR :
    -          (d == 10)? PCRE_NEWLINE_LF :
    -          (d == (13<<8 | 10))? PCRE_NEWLINE_CRLF :
    -          (d == -2)? PCRE_NEWLINE_ANYCRLF :
    -          (d == -1)? PCRE_NEWLINE_ANY : 0;
    -  }
    -
    -/* See if CRLF is a valid newline sequence. */
    -
    -crlf_is_newline =
    -     option_bits == PCRE_NEWLINE_ANY ||
    -     option_bits == PCRE_NEWLINE_CRLF ||
    -     option_bits == PCRE_NEWLINE_ANYCRLF;
    -
    -/* Loop for second and subsequent matches */
    -
    -for (;;)
    -  {
    -  int options = 0;                 /* Normally no options */
    -  int start_offset = ovector[1];   /* Start at end of previous match */
    -
    -  /* If the previous match was for an empty string, we are finished if we are
    -  at the end of the subject. Otherwise, arrange to run another match at the
    -  same point to see if a non-empty match can be found. */
    -
    -  if (ovector[0] == ovector[1])
    -    {
    -    if (ovector[0] == subject_length) break;
    -    options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED;
    -    }
    -
    -  /* Run the next matching operation */
    -
    -  rc = pcre_exec(
    -    re,                   /* the compiled pattern */
    -    NULL,                 /* no extra data - we didn't study the pattern */
    -    subject,              /* the subject string */
    -    subject_length,       /* the length of the subject */
    -    start_offset,         /* starting offset in the subject */
    -    options,              /* options */
    -    ovector,              /* output vector for substring information */
    -    OVECCOUNT);           /* number of elements in the output vector */
    -
    -  /* This time, a result of NOMATCH isn't an error. If the value in "options"
    -  is zero, it just means we have found all possible matches, so the loop ends.
    -  Otherwise, it means we have failed to find a non-empty-string match at a
    -  point where there was a previous empty-string match. In this case, we do what
    -  Perl does: advance the matching position by one character, and continue. We
    -  do this by setting the "end of previous match" offset, because that is picked
    -  up at the top of the loop as the point at which to start again.
    -
    -  There are two complications: (a) When CRLF is a valid newline sequence, and
    -  the current position is just before it, advance by an extra byte. (b)
    -  Otherwise we must ensure that we skip an entire UTF-8 character if we are in
    -  UTF-8 mode. */
    -
    -  if (rc == PCRE_ERROR_NOMATCH)
    -    {
    -    if (options == 0) break;                    /* All matches found */
    -    ovector[1] = start_offset + 1;              /* Advance one byte */
    -    if (crlf_is_newline &&                      /* If CRLF is newline & */
    -        start_offset < subject_length - 1 &&    /* we are at CRLF, */
    -        subject[start_offset] == '\r' &&
    -        subject[start_offset + 1] == '\n')
    -      ovector[1] += 1;                          /* Advance by one more. */
    -    else if (utf8)                              /* Otherwise, ensure we */
    -      {                                         /* advance a whole UTF-8 */
    -      while (ovector[1] < subject_length)       /* character. */
    -        {
    -        if ((subject[ovector[1]] & 0xc0) != 0x80) break;
    -        ovector[1] += 1;
    -        }
    -      }
    -    continue;    /* Go round the loop again */
    -    }
    -
    -  /* Other matching errors are not recoverable. */
    -
    -  if (rc < 0)
    -    {
    -    printf("Matching error %d\n", rc);
    -    pcre_free(re);    /* Release memory used for the compiled pattern */
    -    return 1;
    -    }
    -
    -  /* Match succeeded */
    -
    -  printf("\nMatch succeeded again at offset %d\n", ovector[0]);
    -
    -  /* The match succeeded, but the output vector wasn't big enough. */
    -
    -  if (rc == 0)
    -    {
    -    rc = OVECCOUNT/3;
    -    printf("ovector only has room for %d captured substrings\n", rc - 1);
    -    }
    -
    -  /* As before, show substrings stored in the output vector by number, and then
    -  also any named substrings. */
    -
    -  for (i = 0; i < rc; i++)
    -    {
    -    char *substring_start = subject + ovector[2*i];
    -    int substring_length = ovector[2*i+1] - ovector[2*i];
    -    printf("%2d: %.*s\n", i, substring_length, substring_start);
    -    }
    -
    -  if (namecount <= 0) printf("No named substrings\n"); else
    -    {
    -    unsigned char *tabptr = name_table;
    -    printf("Named substrings\n");
    -    for (i = 0; i < namecount; i++)
    -      {
    -      int n = (tabptr[0] << 8) | tabptr[1];
    -      printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
    -        ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]);
    -      tabptr += name_entry_size;
    -      }
    -    }
    -  }      /* End of loop to find second and subsequent matches */
    -
    -printf("\n");
    -pcre_free(re);       /* Release memory used for the compiled pattern */
    -return 0;
    -}
    -
    -/* End of pcredemo.c */
    -
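The demo prints captured substrings straight out of the subject using offset pairs in the output vector: group i starts at ovector[2*i] and ends just before ovector[2*i+1]. A minimal sketch of copying one capture into a caller's buffer follows. `copy_capture` is a hypothetical helper, not a PCRE function (the library has its own substring functions), and it needs no PCRE headers because it only reads the integer array. It relies on PCRE setting both offsets of an unset group to -1.

```c
#include <string.h>

/* Hypothetical helper. Copies capture group `i` into `buf` as a
   NUL-terminated string. `rc` is the positive value returned by the match
   call (highest reported group + 1). Returns the length copied, or -1 if
   the group was not reported, did not take part in the match, or does not
   fit in `bufsize` bytes. */
int copy_capture(const char *subject, const int *ovector, int rc,
                 int i, char *buf, int bufsize)
{
    int start, length;

    if (i < 0 || i >= rc) return -1;    /* group not reported */
    start = ovector[2*i];
    if (start < 0) return -1;           /* unset group: offsets are -1 */
    length = ovector[2*i + 1] - start;
    if (length >= bufsize) return -1;   /* no room for the terminator */
    memcpy(buf, subject + start, (size_t)length);
    buf[length] = '\0';
    return length;
}
```

For the subject "ruby:1234" matched by (\w+):(\d+), the vector begins 0, 9, 0, 4, 5, 9, so group 1 yields "ruby" and group 2 yields "1234".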

    -Return to the PCRE index page. -
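The trickiest step in the repeated-match loop of pcredemo is deciding where to restart after an empty match could not be bettered at the same position. That rule can be isolated as a pure function, sketched below in C. `advance_past_empty_match` is a hypothetical name and the function is a restatement for illustration, not code from PCRE. It advances one byte, or two when standing on a CRLF that counts as a newline, and otherwise skips UTF-8 continuation bytes (those of the form 10xxxxxx).

```c
/* Hypothetical helper restating the restart rule of the demo's loop.
   Returns the offset at which the next match attempt should begin after
   the attempt at `start_offset` found nothing but an empty string. */
int advance_past_empty_match(const char *subject, int subject_length,
                             int start_offset, int crlf_is_newline, int utf8)
{
    int next = start_offset + 1;                    /* advance one byte */

    if (crlf_is_newline &&
        start_offset < subject_length - 1 &&
        subject[start_offset] == '\r' &&
        subject[start_offset + 1] == '\n') {
        next += 1;                                  /* step over the LF too */
    } else if (utf8) {
        /* Skip continuation bytes so we land on a character boundary. */
        while (next < subject_length &&
               ((unsigned char)subject[next] & 0xc0) == 0x80)
            next += 1;
    }
    return next;
}
```

Keeping the rule free of side effects makes it easy to check on its own, for example that a two-byte character at offset 0 moves the restart point to offset 2 only when UTF-8 mode is on.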

    diff --git a/src/pcre/doc/html/pcregrep.html b/src/pcre/doc/html/pcregrep.html deleted file mode 100644 index dacbb499..00000000 --- a/src/pcre/doc/html/pcregrep.html +++ /dev/null @@ -1,759 +0,0 @@ - - -pcregrep specification - - -

    pcregrep man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SYNOPSIS
    -

    -pcregrep [options] [long options] [pattern] [path1 path2 ...] -

    -
    DESCRIPTION
    -

    -pcregrep searches files for character patterns, in the same way as other -grep commands do, but it uses the PCRE regular expression library to support -patterns that are compatible with the regular expressions of Perl 5. See -pcresyntax(3) -for a quick-reference summary of pattern syntax, or -pcrepattern(3) -for a full description of the syntax and semantics of the regular expressions -that PCRE supports. -

    -

    -Patterns, whether supplied on the command line or in a separate file, are given -without delimiters. For example: -

    -  pcregrep Thursday /etc/motd
    -
    -If you attempt to use delimiters (for example, by surrounding a pattern with -slashes, as is common in Perl scripts), they are interpreted as part of the -pattern. Quotes can of course be used to delimit patterns on the command line -because they are interpreted by the shell, and indeed quotes are required if a -pattern contains white space or shell metacharacters. -

    -

    -The first argument that follows any option settings is treated as the single -pattern to be matched when neither -e nor -f is present. -Conversely, when one or both of these options are used to specify patterns, all -arguments are treated as path names. At least one of -e, -f, or an -argument pattern must be provided. -

    -

    -If no files are specified, pcregrep reads the standard input. The -standard input can also be referenced by a name consisting of a single hyphen. -For example: -

    -  pcregrep some-pattern /file1 - /file3
    -
    -By default, each line that matches a pattern is copied to the standard -output, and if there is more than one file, the file name is output at the -start of each line, followed by a colon. However, there are options that can -change how pcregrep behaves. In particular, the -M option makes it -possible to search for patterns that span line boundaries. What defines a line -boundary is controlled by the -N (--newline) option. -

    -

    -The amount of memory used for buffering files that are being scanned is -controlled by a parameter that can be set by the --buffer-size option. -The default value for this parameter is specified when pcregrep is built, -with the default default being 20K. A block of memory three times this size is -used (to allow for buffering "before" and "after" lines). An error occurs if a -line overflows the buffer. -

    -

    -Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater. -BUFSIZ is defined in <stdio.h>. When there is more than one pattern -(specified by the use of -e and/or -f), each pattern is applied to -each line in the order in which they are defined, except that all the -e -patterns are tried before the -f patterns. -

    -

    -By default, as soon as one pattern matches a line, no further patterns are -considered. However, if --colour (or --color) is used to colour the -matching substrings, or if --only-matching, --file-offsets, or ---line-offsets is used to output only the part of the line that matched -(either shown literally, or as an offset), scanning resumes immediately -following the match, so that further matches on the same line can be found. If -there are multiple patterns, they are all tried on the remainder of the line, -but patterns that follow the one that matched are not tried on the earlier part -of the line. -

    -

    -This behaviour means that the order in which multiple patterns are specified -can affect the output when one of the above options is used. This is no longer -the same behaviour as GNU grep, which now manages to display earlier matches -for later patterns (as long as there is no overlap). -

    -

    -Patterns that can match an empty string are accepted, but empty string -matches are never recognized. An example is the pattern "(super)?(man)?", in -which all components are optional. This pattern finds all occurrences of both -"super" and "man"; the output differs from matching with "super|man" when only -the matching substrings are being shown. -
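The difference between "(super)?(man)?" and "super|man" can be seen in a small sketch. It uses Python's `re` module as a stand-in for PCRE (an assumption; the two engines agree on patterns this simple), and the helper name `nonempty_matches` is hypothetical. Empty matches are discarded, mirroring the statement above that empty string matches are never recognized.

```python
import re


def nonempty_matches(pattern, line):
    """Return every matched substring in line, dropping empty matches."""
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]


line = "superman saves a man"

# All components optional: "super" and "man" are matched together when adjacent.
optional = nonempty_matches(r"(super)?(man)?", line)

# Plain alternation: "super" and "man" are always reported separately.
alternation = nonempty_matches(r"super|man", line)

print(optional)
print(alternation)
```

Both patterns select the same lines; only the reported substrings differ, which matters when only the matching parts are shown.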

    -

    -If the LC_ALL or LC_CTYPE environment variable is set, -pcregrep uses the value to set a locale when calling the PCRE library. -The --locale option can be used to override this. -

    -
    SUPPORT FOR COMPRESSED FILES
    -

    -It is possible to compile pcregrep so that it uses libz or -libbz2 to read files whose names end in .gz or .bz2, -respectively. You can find out whether your binary has support for one or both -of these file types by running it with the --help option. If the -appropriate support is not present, files are treated as plain text. The -standard input is always so treated. -

    -
    BINARY FILES
    -

    -By default, a file that contains a binary zero byte within the first 1024 bytes -is identified as a binary file, and is processed specially. (GNU grep also -identifies binary files in this manner.) See the --binary-files option -for a means of changing the way binary files are handled. -
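A minimal sketch of the detection rule described above, written in Python rather than taken from the pcregrep source. The function name `looks_binary` and the `window` parameter are hypothetical; only the "zero byte within the first 1024 bytes" rule comes from the text.

```python
def looks_binary(data: bytes, window: int = 1024) -> bool:
    """Report whether a zero byte occurs within the first `window` bytes."""
    return b"\x00" in data[:window]


print(looks_binary(b"plain text\n"))
print(looks_binary(b"ELF\x00\x01\x02"))
# A zero byte beyond the first 1024 bytes does not trigger the rule.
print(looks_binary(b"a" * 2000 + b"\x00"))
```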

    -
    OPTIONS
    -

    -The order in which some of the options appear can affect the output. For -example, both the -h and -l options affect the printing of file -names. Whichever comes later in the command line will be the one that takes -effect. Similarly, except where noted below, if an option is given twice, the -later setting is used. Numerical values for options may be followed by K or M, -to signify multiplication by 1024 or 1024*1024 respectively. -
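The K and M suffixes can be illustrated with a short Python sketch. The helper `parse_number` is hypothetical and is not pcregrep's own parser; it handles only the upper-case suffixes named above, since the text does not say whether other spellings are accepted.

```python
def parse_number(text: str) -> int:
    """Convert an option value such as '20K' or '1M' to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)


print(parse_number("20K"))
print(parse_number("1M"))
print(parse_number("8"))
```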

    -

    --- -This terminates the list of options. It is useful if the next item on the -command line starts with a hyphen but is not an option. This allows for the -processing of patterns and filenames that start with hyphens. -

    -

    --A number, --after-context=number -Output number lines of context after each matching line. If filenames -and/or line numbers are being output, a hyphen separator is used instead of a -colon for the context lines. A line containing "--" is output between each -group of lines, unless they are in fact contiguous in the input file. The value -of number is expected to be relatively small. However, pcregrep -guarantees to have up to 8K of following text available for context output. -

    -

    --a, --text -Treat binary files as text. This is equivalent to ---binary-files=text. -

    -

    --B number, --before-context=number -Output number lines of context before each matching line. If filenames -and/or line numbers are being output, a hyphen separator is used instead of a -colon for the context lines. A line containing "--" is output between each -group of lines, unless they are in fact contiguous in the input file. The value -of number is expected to be relatively small. However, pcregrep -guarantees to have up to 8K of preceding text available for context output. -

    -

    ---binary-files=word -Specify how binary files are to be processed. If the word is "binary" (the -default), pattern matching is performed on binary files, but the only output is -"Binary file <name> matches" when a match succeeds. If the word is "text", -which is equivalent to the -a or --text option, binary files are -processed in the same way as any other file. In this case, when a match -succeeds, the output may be binary garbage, which can have nasty effects if -sent to a terminal. If the word is "without-match", which is equivalent to the --I option, binary files are not processed at all; they are assumed not to -be of interest. -

    -

    ---buffer-size=number -Set the parameter that controls how much memory is used for buffering files -that are being scanned. -

    -

    --C number, --context=number -Output number lines of context both before and after each matching line. -This is equivalent to setting both -A and -B to the same value. -

    -

    --c, --count -Do not output individual lines from the files that are being scanned; instead -output the number of lines that would otherwise have been shown. If no lines -are selected, the number zero is output. If several files are being -scanned, a count is output for each of them. However, if the ---files-with-matches option is also used, only those files whose counts -are greater than zero are listed. When -c is used, the -A, --B, and -C options are ignored. -

    -

    ---colour, --color -If this option is given without any data, it is equivalent to "--colour=auto". -If data is required, it must be given in the same shell item, separated by an -equals sign. -

    -

    ---colour=value, --color=value -This option specifies under what circumstances the parts of a line that matched -a pattern should be coloured in the output. By default, the output is not -coloured. The value (which is optional, see above) may be "never", "always", or -"auto". In the latter case, colouring happens only if the standard output is -connected to a terminal. More resources are used when colouring is enabled, -because pcregrep has to search for all possible matches in a line, not -just one, in order to colour them all. -
    -
    -The colour that is used can be specified by setting the environment variable -PCREGREP_COLOUR or PCREGREP_COLOR. The value of this variable should be a -string of two numbers, separated by a semicolon. They are copied directly into -the control string for setting colour on a terminal, so it is your -responsibility to ensure that they make sense. If neither of the environment -variables is set, the default is "1;31", which gives red. -

    -

    --D action, --devices=action -If an input path is not a regular file or a directory, "action" specifies how -it is to be processed. Valid values are "read" (the default) or "skip" -(silently skip the path). -

    -

    --d action, --directories=action -If an input path is a directory, "action" specifies how it is to be processed. -Valid values are "read" (the default in non-Windows environments, for -compatibility with GNU grep), "recurse" (equivalent to the -r option), or -"skip" (silently skip the path, the default in Windows environments). In the -"read" case, directories are read as if they were ordinary files. In some -operating systems the effect of reading a directory like this is an immediate -end-of-file; in others it may provoke an error. -

    -

    --e pattern, --regex=pattern, --regexp=pattern -Specify a pattern to be matched. This option can be used multiple times in -order to specify several patterns. It can also be used as a way of specifying a -single pattern that starts with a hyphen. When -e is used, no argument -pattern is taken from the command line; all arguments are treated as file -names. There is no limit to the number of patterns. They are applied to each -line in the order in which they are defined until one matches. -
    -
    -If -f is used with -e, the command line patterns are matched first, -followed by the patterns from the file(s), independent of the order in which -these options are specified. Note that multiple use of -e is not the same -as a single pattern with alternatives. For example, X|Y finds the first -character in a line that is X or Y, whereas if the two patterns are given -separately, with X first, pcregrep finds X if it is present, even if it -follows Y in the line. It finds Y only if there is no X in the line. This -matters only if you are using -o or --colo(u)r to show the part(s) -of the line that matched. -
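The X|Y example can be sketched in Python. This is only an illustration of the ordering rule, using the `re` module in place of PCRE (an assumption); `first_match_ordered` is a hypothetical helper that models just the first reported match on a line, not the resumed scanning that happens with -o or --colo(u)r.

```python
import re


def first_match_ordered(patterns, line):
    """Try each pattern in turn; return the match of the first one that hits."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group(0)
    return None


line = "a Y then an X"

# One pattern with alternatives: the earliest X or Y in the line wins.
single = re.search(r"X|Y", line).group(0)

# Two separate patterns, X first: X wins even though it follows Y.
separate = first_match_ordered(["X", "Y"], line)

print(single, separate)
```

Y is reported by the separate patterns only when the line contains no X at all.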

    -

    ---exclude=pattern -Files (but not directories) whose names match the pattern are skipped without -being processed. This applies to all files, whether listed on the command line, -obtained from --file-list, or by scanning a directory. The pattern is a -PCRE regular expression, and is matched against the final component of the file -name, not the entire path. The -F, -w, and -x options do not -apply to this pattern. The option may be given any number of times in order to -specify multiple patterns. If a file name matches both an --include -and an --exclude pattern, it is excluded. There is no short form for this -option. -

    -

    ---exclude-from=filename -Treat each non-empty line of the file as the data for an --exclude -option. What constitutes a newline when reading the file is the operating -system's default. The --newline option has no effect on this option. This -option may be given more than once in order to specify a number of files to -read. -

    -

    ---exclude-dir=pattern -Directories whose names match the pattern are skipped without being processed, -whatever the setting of the --recursive option. This applies to all -directories, whether listed on the command line, obtained from ---file-list, or by scanning a parent directory. The pattern is a PCRE -regular expression, and is matched against the final component of the directory -name, not the entire path. The -F, -w, and -x options do not -apply to this pattern. The option may be given any number of times in order to -specify more than one pattern. If a directory matches both --include-dir -and --exclude-dir, it is excluded. There is no short form for this -option. -

    -

    --F, --fixed-strings -Interpret each data-matching pattern as a list of fixed strings, separated by -newlines, instead of as a regular expression. What constitutes a newline for -this purpose is controlled by the --newline option. The -w (match -as a word) and -x (match whole line) options can be used with -F. -They apply to each of the fixed strings. A line is selected if any of the fixed -strings are found in it (subject to -w or -x, if present). This -option applies only to the patterns that are matched against the contents of -files; it does not apply to patterns specified by any of the --include or ---exclude options. -

    -

    --f filename, --file=filename -Read patterns from the file, one per line, and match them against -each line of input. What constitutes a newline when reading the file is the -operating system's default. The --newline option has no effect on this -option. Trailing white space is removed from each line, and blank lines are -ignored. An empty file contains no patterns and therefore matches nothing. See -also the comments about multiple patterns versus a single pattern with -alternatives in the description of -e above. -
    -
    -If this option is given more than once, all the specified files are -read. A data line is output if any of the patterns match it. A filename can -be given as "-" to refer to the standard input. When -f is used, patterns -specified on the command line using -e may also be present; they are -tested before the file's patterns. However, no other pattern is taken from the -command line; all arguments are treated as the names of paths to be searched. -

    -

    ---file-list=filename -Read a list of files and/or directories that are to be scanned from the given -file, one per line. Trailing white space is removed from each line, and blank -lines are ignored. These paths are processed before any that are listed on the -command line. The filename can be given as "-" to refer to the standard input. -If --file and --file-list are both specified as "-", patterns are -read first. This is useful only when the standard input is a terminal, from -which further lines (the list of files) can be read after an end-of-file -indication. If this option is given more than once, all the specified files are -read. -

    -

    ---file-offsets -Instead of showing lines or parts of lines that match, show each match as an -offset from the start of the file and a length, separated by a comma. In this -mode, no context is shown. That is, the -A, -B, and -C -options are ignored. If there is more than one match in a line, each of them is -shown separately. This option is mutually exclusive with --line-offsets -and --only-matching. -
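A rough sketch of the "offset,length" output, written in Python and not derived from the pcregrep source. It assumes LF line endings and single-byte characters, so that character offsets equal byte offsets; the helper `file_offsets` is hypothetical.

```python
import re


def file_offsets(pattern, text):
    """Return 'offset,length' for every non-empty match, scanning line by line."""
    results = []
    base = 0  # offset of the current line from the start of the file
    for line in text.split("\n"):
        for m in re.finditer(pattern, line):
            if m.group(0):
                results.append(f"{base + m.start()},{m.end() - m.start()}")
        base += len(line) + 1  # +1 for the LF that ended the line
    return results


print(file_offsets("ab", "xab\nabab\n"))
```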

    -

    --H, --with-filename -Force the inclusion of the filename at the start of output lines when searching -a single file. By default, the filename is not shown in this case. For matching -lines, the filename is followed by a colon; for context lines, a hyphen -separator is used. If a line number is also being output, it follows the file -name. -

    -

    --h, --no-filename -Suppress the output filenames when searching multiple files. By default, -filenames are shown when multiple files are searched. For matching lines, the -filename is followed by a colon; for context lines, a hyphen separator is used. -If a line number is also being output, it follows the file name. -

    -

    ---help -Output a help message, giving brief details of the command options and file -type support, and then exit. Anything else on the command line is -ignored. -

    -

    --I -Treat binary files as never matching. This is equivalent to ---binary-files=without-match. -

    -

    --i, --ignore-case -Ignore upper/lower case distinctions during comparisons. -

    -

    ---include=pattern -If any --include patterns are specified, the only files that are -processed are those that match one of the patterns (and do not match an ---exclude pattern). This option does not affect directories, but it -applies to all files, whether listed on the command line, obtained from ---file-list, or by scanning a directory. The pattern is a PCRE regular -expression, and is matched against the final component of the file name, not -the entire path. The -F, -w, and -x options do not apply to -this pattern. The option may be given any number of times. If a file name -matches both an --include and an --exclude pattern, it is excluded. -There is no short form for this option. -

    -

    ---include-from=filename -Treat each non-empty line of the file as the data for an --include -option. What constitutes a newline for this purpose is the operating system's -default. The --newline option has no effect on this option. This option -may be given any number of times; all the files are read. -

    -

    ---include-dir=pattern -If any --include-dir patterns are specified, the only directories that -are processed are those that match one of the patterns (and do not match an ---exclude-dir pattern). This applies to all directories, whether listed -on the command line, obtained from --file-list, or by scanning a parent -directory. The pattern is a PCRE regular expression, and is matched against the -final component of the directory name, not the entire path. The -F, --w, and -x options do not apply to this pattern. The option may be -given any number of times. If a directory matches both --include-dir and ---exclude-dir, it is excluded. There is no short form for this option. -

    -

    --L, --files-without-match -Instead of outputting lines from the files, just output the names of the files -that do not contain any lines that would have been output. Each file name is -output once, on a separate line. -

    -

    --l, --files-with-matches -Instead of outputting lines from the files, just output the names of the files -containing lines that would have been output. Each file name is output -once, on a separate line. Searching normally stops as soon as a matching line -is found in a file. However, if the -c (count) option is also used, -matching continues in order to obtain the correct count, and those files that -have at least one match are listed along with their counts. Using this option -with -c is a way of suppressing the listing of files with no matches. -

    -

    ---label=name -This option supplies a name to be used for the standard input when file names -are being output. If not supplied, "(standard input)" is used. There is no -short form for this option. -

    -

    ---line-buffered -When this option is given, input is read and processed line by line, and the -output is flushed after each write. By default, input is read in large chunks, -unless pcregrep can determine that it is reading from a terminal (which -is currently possible only in Unix-like environments). Output to terminal is -normally automatically flushed by the operating system. This option can be -useful when the input or output is attached to a pipe and you do not want -pcregrep to buffer up large amounts of data. However, its use will affect -performance, and the -M (multiline) option ceases to work. -

    -

    ---line-offsets -Instead of showing lines or parts of lines that match, show each match as a -line number, the offset from the start of the line, and a length. The line -number is terminated by a colon (as usual; see the -n option), and the -offset and length are separated by a comma. In this mode, no context is shown. -That is, the -A, -B, and -C options are ignored. If there is -more than one match in a line, each of them is shown separately. This option is -mutually exclusive with --file-offsets and --only-matching. -

    -

    ---locale=locale-name -This option specifies a locale to be used for pattern matching. It overrides -the value in the LC_ALL or LC_CTYPE environment variables. If no -locale is specified, the PCRE library's default (usually the "C" locale) is -used. There is no short form for this option. -

    -

    ---match-limit=number -Processing some regular expression patterns can require a very large amount of -memory, leading in some cases to a program crash if not enough is available. -Other patterns may take a very long time to search for all possible matching -strings. The pcre_exec() function that is called by pcregrep to do -the matching has two parameters that can limit the resources that it uses. -
    -
    -The --match-limit option provides a means of limiting resource usage -when processing patterns that are not going to match, but which have a very -large number of possibilities in their search trees. The classic example is a -pattern that uses nested unlimited repeats. Internally, PCRE uses a function -called match() which it calls repeatedly (sometimes recursively). The -limit set by --match-limit is imposed on the number of times this -function is called during a match, which has the effect of limiting the amount -of backtracking that can take place. -
    -
    -The --recursion-limit option is similar to --match-limit, but -instead of limiting the total number of times that match() is called, it -limits the depth of recursive calls, which in turn limits the amount of memory -that can be used. The recursion depth is a smaller number than the total number -of calls, because not all calls to match() are recursive. This limit is -of use only if it is set smaller than --match-limit. -
    -
    -There are no short forms for these options. The default settings are specified -when the PCRE library is compiled, with the default default being 10 million. -

    -

    --M, --multiline -Allow patterns to match more than one line. When this option is given, patterns -may usefully contain literal newline characters and internal occurrences of ^ -and $ characters. The output for a successful match may consist of more than -one line, the last of which is the one in which the match ended. If the matched -string ends with a newline sequence the output ends at the end of that line. -
    -
    -When this option is set, the PCRE library is called in "multiline" mode. -There is a limit to the number of lines that can be matched, imposed by the way -that pcregrep buffers the input file as it scans it. However, -pcregrep ensures that at least 8K characters or the rest of the document -(whichever is the shorter) are available for forward matching, and similarly -the previous 8K characters (or all the previous characters, if fewer than 8K) -are guaranteed to be available for lookbehind assertions. This option does not -work when input is read line by line (see --line-buffered). -

    -

    --N newline-type, --newline=newline-type -The PCRE library supports five different conventions for indicating -the ends of lines. They are the single-character sequences CR (carriage return) -and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention, -which recognizes any of the preceding three types, and an "any" convention, in -which any Unicode line ending sequence is assumed to end a line. The Unicode -sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF -(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and -PS (paragraph separator, U+2029). -
    -
    -When the PCRE library is built, a default line-ending sequence is specified. -This is normally the standard sequence for the operating system. Unless -otherwise specified by this option, pcregrep uses the library's default. -The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This -makes it possible to use pcregrep to scan files that have come from other -environments without having to modify their line endings. If the data that is -being scanned does not agree with the convention set by this option, -pcregrep may behave in strange ways. Note that this option does not -apply to files specified by the -f, --exclude-from, or ---include-from options, which are expected to use the operating system's -standard newline sequence. -
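The five conventions can be approximated with ordinary regular expressions. The sketch below is an illustration in Python, not PCRE's implementation; the table `NEWLINE_PATTERNS` and the helper `split_lines` are hypothetical names, and the "ANY" entry simply lists the Unicode sequences named above.

```python
import re

# One splitting pattern per convention; CRLF is tried first where it applies.
NEWLINE_PATTERNS = {
    "CR": r"\r",
    "LF": r"\n",
    "CRLF": r"\r\n",
    "ANYCRLF": r"\r\n|\r|\n",
    "ANY": r"\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]",
}


def split_lines(text, convention):
    """Split text into lines under the given newline convention."""
    return re.split(NEWLINE_PATTERNS[convention], text)


print(split_lines("a\r\nb\rc\nd", "ANYCRLF"))
# With the wrong convention, a stray CR stays attached to the line.
print(split_lines("a\r\nb", "LF"))
```

The second call shows the kind of mismatch that can make pcregrep "behave in strange ways" when the data does not agree with the chosen convention.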

    -

    --n, --line-number -Precede each output line by its line number in the file, followed by a colon -for matching lines or a hyphen for context lines. If the filename is also being -output, it precedes the line number. This option is forced if ---line-offsets is used. -

    -

    ---no-jit -If the PCRE library is built with support for just-in-time compiling (which -speeds up matching), pcregrep automatically makes use of this, unless it -was explicitly disabled at build time. This option can be used to disable the -use of JIT at run time. It is provided for testing and working round problems. -It should never be needed in normal use. -

    -

    --o, --only-matching -Show only the part of the line that matched a pattern instead of the whole -line. In this mode, no context is shown. That is, the -A, -B, and --C options are ignored. If there is more than one match in a line, each -of them is shown separately. If -o is combined with -v (invert the -sense of the match to find non-matching lines), no output is generated, but the -return code is set appropriately. If the matched portion of the line is empty, -nothing is output unless the file name or line number are being printed, in -which case they are shown on an otherwise empty line. This option is mutually -exclusive with --file-offsets and --line-offsets. -

    -

    --onumber, --only-matching=number -Show only the part of the line that matched the capturing parentheses of the -given number. Up to 32 capturing parentheses are supported, and -o0 is -equivalent to -o without a number. Because these options can be given -without an argument (see above), if an argument is present, it must be given in -the same shell item, for example, -o3 or --only-matching=2. The comments given -for the non-argument case above also apply to this case. If the specified -capturing parentheses do not exist in the pattern, or were not set in the -match, nothing is output unless the file name or line number are being printed. -
    -
    -If this option is given multiple times, multiple substrings are output, in the -order the options are given. For example, -o3 -o1 -o3 causes the substrings -matched by capturing parentheses 3 and 1 and then 3 again to be output. By -default, there is no separator (but see the next option). -

    -

    ---om-separator=text -Specify a separating string for multiple occurrences of -o. The default -is an empty string. Separating strings are never coloured. -

    -

    --q, --quiet -Work quietly, that is, display nothing except error messages. The exit -status indicates whether or not any matches were found. -

    -

    --r, --recursive -If any given path is a directory, recursively scan the files it contains, -taking note of any --include and --exclude settings. By default, a -directory is read as a normal file; in some operating systems this gives an -immediate end-of-file. This option is a shorthand for setting the -d -option to "recurse". -

    -

    ---recursion-limit=number -See --match-limit above. -

    -

    --s, --no-messages -Suppress error messages about non-existent or unreadable files. Such files are -quietly skipped. However, the return code is still 2, even if matches were -found in other files. -

    -

    --u, --utf-8 -Operate in UTF-8 mode. This option is available only if PCRE has been compiled -with UTF-8 support. All patterns (including those for any --exclude and ---include options) and all subject lines that are scanned must be valid -strings of UTF-8 characters. -

    -

    --V, --version -Write the version numbers of pcregrep and the PCRE library to the -standard output and then exit. Anything else on the command line is -ignored. -

    -

    --v, --invert-match -Invert the sense of the match, so that lines which do not match any of -the patterns are the ones that are found. -

    -

    --w, --word-regex, --word-regexp -Force the patterns to match only whole words. This is equivalent to having \b -at the start and end of the pattern. This option applies only to the patterns -that are matched against the contents of files; it does not apply to patterns -specified by any of the --include or --exclude options. -

    -

    --x, --line-regex, --line-regexp -Force the patterns to be anchored (each must start matching at the beginning of -a line) and in addition, require them to match entire lines. This is equivalent -to having ^ and $ characters at the start and end of each alternative branch in -every pattern. This option applies only to the patterns that are matched -against the contents of files; it does not apply to patterns specified by any -of the --include or --exclude options. -

    -
    ENVIRONMENT VARIABLES
    -

    -The environment variables LC_ALL and LC_CTYPE are examined, in that -order, for a locale. The first one that is set is used. This can be overridden -by the --locale option. If no locale is set, the PCRE library's default -(usually the "C" locale) is used. -

    -
    NEWLINES
    -

    -The -N (--newline) option allows pcregrep to scan files with -different newline conventions from the default. Any parts of the input files -that are written to the standard output are copied identically, with whatever -newline sequences they have in the input. However, the setting of this option -does not affect the interpretation of files specified by the -f, ---exclude-from, or --include-from options, which are assumed to use -the operating system's standard newline sequence, nor does it affect the way in -which pcregrep writes informational messages to the standard error and -output streams. For these it uses the string "\n" to indicate newlines, -relying on the C I/O library to convert this to an appropriate sequence. -

    -
    OPTIONS COMPATIBILITY
    -

    -Many of the short and long forms of pcregrep's options are the same -as in the GNU grep program. Any long option of the form ---xxx-regexp (GNU terminology) is also available as --xxx-regex -(PCRE terminology). However, the --file-list, --file-offsets, ---include-dir, --line-offsets, --locale, --match-limit, --M, --multiline, -N, --newline, --om-separator, ---recursion-limit, -u, and --utf-8 options are specific to -pcregrep, as is the use of the --only-matching option with a -capturing parentheses number. -

    -

    -Although most of the common options work the same way, a few are different in -pcregrep. For example, the --include option's argument is a glob -for GNU grep, but a regular expression for pcregrep. If both the --c and -l options are given, GNU grep lists only file names, -without counts, but pcregrep gives the counts. -

    -
    OPTIONS WITH DATA
    -

    -There are four different ways in which an option with data can be specified. -If a short form option is used, the data may follow immediately, or (with one -exception) in the next command line item. For example: -

    -  -f/some/file
    -  -f /some/file
    -
    -The exception is the -o option, which may appear with or without data. -Because of this, if data is present, it must follow immediately in the same -item, for example -o3. -

    -

    -If a long form option is used, the data may appear in the same command line -item, separated by an equals character, or (with two exceptions) it may appear -in the next command line item. For example: -

    -  --file=/some/file
    -  --file /some/file
    -
    -Note, however, that if you want to supply a file name beginning with ~ as data -in a shell command, and have the shell expand ~ to a home directory, you must -separate the file name from the option, because the shell does not treat ~ -specially unless it is at the start of an item. -

    -

    -The exceptions to the above are the --colour (or --color) and ---only-matching options, for which the data is optional. If one of these -options does have data, it must be given in the first form, using an equals -character. Otherwise pcregrep will assume that it has no data. -

    -
    MATCHING ERRORS
    -

    -It is possible to supply a regular expression that takes a very long time to -fail to match certain lines. Such patterns normally involve nested indefinite -repeats, for example: (a+)*\d when matched against a line of a's with no final -digit. The PCRE matching function has a resource limit that causes it to abort -in these circumstances. If this happens, pcregrep outputs an error -message and the line that caused the problem to the standard error stream. If -there are more than 20 such errors, pcregrep gives up. -

    -

    -The --match-limit option of pcregrep can be used to set the overall -resource limit; there is a second option called --recursion-limit that -sets a limit on the amount of memory (usually stack) that is used (see the -discussion of these options above). -

    -
    DIAGNOSTICS
    -

    -Exit status is 0 if any matches were found, 1 if no matches were found, and 2 -for syntax errors, overlong lines, non-existent or inaccessible files (even if -matches were found in other files) or too many matching errors. Using the --s option to suppress error messages about inaccessible files does not -affect the return code. -

    -
    SEE ALSO
    -

    -pcrepattern(3), pcresyntax(3), pcretest(1). -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 03 April 2014 -
    -Copyright © 1997-2014 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrejit.html b/src/pcre/doc/html/pcrejit.html deleted file mode 100644 index abb34252..00000000 --- a/src/pcre/doc/html/pcrejit.html +++ /dev/null @@ -1,499 +0,0 @@ - - -pcrejit specification - - -

    pcrejit man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    PCRE JUST-IN-TIME COMPILER SUPPORT
    -

    -Just-in-time compiling is a heavyweight optimization that can greatly speed up -pattern matching. However, it comes at the cost of extra processing before the -match is performed. Therefore, it is of most benefit when the same pattern is -going to be matched many times. This does not necessarily mean many calls of a -matching function; if the pattern is not anchored, matching attempts may take -place many times at various positions in the subject, even for a single call. -Therefore, if the subject string is very long, it may still pay to use JIT for -one-off matches. -

    -

    -JIT support applies only to the traditional Perl-compatible matching function. -It does not apply when the DFA matching function is being used. The code for -this support was written by Zoltan Herczeg. -

    -
    8-BIT, 16-BIT AND 32-BIT SUPPORT
    -

    -JIT support is available for all of the 8-bit, 16-bit and 32-bit PCRE -libraries. To keep this documentation simple, only the 8-bit interface is -described in what follows. If you are using the 16-bit library, substitute the -16-bit functions and 16-bit structures (for example, pcre16_jit_stack -instead of pcre_jit_stack). If you are using the 32-bit library, -substitute the 32-bit functions and 32-bit structures (for example, -pcre32_jit_stack instead of pcre_jit_stack). -

    -
    AVAILABILITY OF JIT SUPPORT
    -

    -JIT support is an optional feature of PCRE. The "configure" option --enable-jit -(or equivalent CMake option) must be set when PCRE is built if you want to use -JIT. The support is limited to the following hardware platforms: -

    -  ARM v5, v7, and Thumb2
    -  Intel x86 32-bit and 64-bit
    -  MIPS 32-bit
    -  Power PC 32-bit and 64-bit
    -  SPARC 32-bit (experimental)
    -
    -If --enable-jit is set on an unsupported platform, compilation fails. -

    -

    -A program that is linked with PCRE 8.20 or later can tell if JIT support is -available by calling pcre_config() with the PCRE_CONFIG_JIT option. The -result is 1 when JIT is available, and 0 otherwise. However, a simple program -does not need to check this in order to use JIT. The normal API is implemented -in a way that falls back to the interpretive code if JIT is not available. For -programs that need the best possible performance, there is also a "fast path" -API that is JIT-specific. -

    -

    -If your program may sometimes be linked with versions of PCRE that are older -than 8.20, but you want to use JIT when it is available, you can test the -values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such as -PCRE_CONFIG_JIT, for compile-time control of your code. Also beware that the -pcre_jit_exec() function was not available at all before 8.32, -and may not be available at all if PCRE isn't compiled with ---enable-jit. See the "JIT FAST PATH API" section below for details. -

    -
    SIMPLE USE OF JIT
    -

    -You have to do two things to make use of the JIT support in the simplest way: -

    -  (1) Call pcre_study() with the PCRE_STUDY_JIT_COMPILE option for
    -      each compiled pattern, and pass the resulting pcre_extra block to
    -      pcre_exec().
    -
    -  (2) Use pcre_free_study() to free the pcre_extra block when it is
    -      no longer needed, instead of just freeing it yourself. This ensures that
    -      any JIT data is also freed.
    -
    -For a program that may be linked with pre-8.20 versions of PCRE, you can insert -
    -  #ifndef PCRE_STUDY_JIT_COMPILE
    -  #define PCRE_STUDY_JIT_COMPILE 0
    -  #endif
    -
    -so that no option is passed to pcre_study(), and then use something like -this to free the study data: -
    -  #ifdef PCRE_CONFIG_JIT
    -      pcre_free_study(study_ptr);
    -  #else
    -      pcre_free(study_ptr);
    -  #endif
    -
    -PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete -matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or -PCRE_PARTIAL_SOFT options of pcre_exec(), you should set one or both of -the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE -when you call pcre_study(): -
    -  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
    -  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
    -
    -If using pcre_jit_exec() and supporting a pre-8.32 version of -PCRE, you can insert: -
    -   #if PCRE_MAJOR > 8 || (PCRE_MAJOR == 8 && PCRE_MINOR >= 32)
    -   pcre_jit_exec(...);
    -   #else
    -   pcre_exec(...);
    -   #endif
    -
    -but as described in the "JIT FAST PATH API" section below this assumes -version 8.32 and later are compiled with --enable-jit, which may -break. -
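A version test of this kind is easy to get subtly wrong, because a condition that compares the major and minor numbers independently rejects a later major release whose minor number is below 32. The following is a minimal sketch of a comparison that orders versions correctly. The names `PCRE_AT_LEAST`, `version_at_least` and `headers_declare_jit_exec` are hypothetical helpers introduced here for illustration, not part of PCRE, and the stand-in values for `PCRE_MAJOR` and `PCRE_MINOR` exist only so that the fragment compiles without `pcre.h`.

```c
/* Stand-in values so that this sketch compiles without <pcre.h>.
   In real code these macros come from the PCRE header. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 32
#endif

/* Hypothetical helper macro (not part of PCRE): true if the headers
   describe version maj.min or later. */
#define PCRE_AT_LEAST(maj, min) \
  (PCRE_MAJOR > (maj) || (PCRE_MAJOR == (maj) && PCRE_MINOR >= (min)))

/* The same ordering as a function, convenient for checking the logic. */
static int version_at_least(int major, int minor, int want_major, int want_minor)
{
  return major > want_major || (major == want_major && minor >= want_minor);
}

/* Hypothetical helper: 1 if the headers are new enough to declare
   pcre_jit_exec(). This says nothing about whether the library was
   built with --enable-jit. */
static int headers_declare_jit_exec(void)
{
#if PCRE_AT_LEAST(8, 32)
  return 1;
#else
  return 0;
#endif
}
```

As the text notes, a check like this only establishes that the headers are recent enough; it cannot tell whether the library itself was built with --enable-jit.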
    -
    -The JIT compiler generates different optimized code for each of the three -modes (normal, soft partial, hard partial). When pcre_exec() is called, -the appropriate code is run if it is available. Otherwise, the pattern is -matched using interpretive code. -

    -

    -In some circumstances you may need to call additional functions. These are -described in the section entitled -"Controlling the JIT stack" -below. -

    -

    -If JIT support is not available, PCRE_STUDY_JIT_COMPILE etc. are ignored, and -no JIT data is created. Otherwise, the compiled pattern is passed to the JIT -compiler, which turns it into machine code that executes much faster than the -normal interpretive code. When pcre_exec() is passed a pcre_extra -block containing a pointer to JIT code of the appropriate mode (normal or -hard/soft partial), it obeys that code instead of running the interpreter. The -result is identical, but the compiled JIT code runs much faster. -

    -

    -There are some pcre_exec() options that are not supported for JIT -execution. There are also some pattern items that JIT cannot handle. Details -are given below. In both cases, execution automatically falls back to the -interpretive code. If you want to know whether JIT was actually used for a -particular match, you should arrange for a JIT callback function to be set up -as described in the section entitled -"Controlling the JIT stack" -below, even if you do not need to supply a non-default JIT stack. Such a -callback function is called whenever JIT code is about to be obeyed. If the -execution options are not right for JIT execution, the callback function is not -obeyed. -

    -

    -If the JIT compiler finds an unsupported item, no JIT data is generated. You -can find out if JIT execution is available after studying a pattern by calling -pcre_fullinfo() with the PCRE_INFO_JIT option. A result of 1 means that -JIT compilation was successful. A result of 0 means that JIT support is not -available, or the pattern was not studied with PCRE_STUDY_JIT_COMPILE etc., or -the JIT compiler was not able to handle the pattern. -

    -

    -Once a pattern has been studied, with or without JIT, it can be used as many -times as you like for matching different subject strings. -

    -
    UNSUPPORTED OPTIONS AND PATTERN ITEMS
    -

    -The only pcre_exec() options that are supported for JIT execution are -PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NO_UTF32_CHECK, PCRE_NOTBOL, -PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and -PCRE_PARTIAL_SOFT. -

    -

    -The only unsupported pattern items are \C (match a single data unit) when -running in a UTF mode, and a callout immediately before an assertion condition -in a conditional group. -

    -
    RETURN VALUES FROM JIT EXECUTION
    -

    -When a pattern is matched using JIT execution, the return values are the same -as those given by the interpretive pcre_exec() code, with the addition of -one new error code: PCRE_ERROR_JIT_STACKLIMIT. This means that the memory used -for the JIT stack was insufficient. See -"Controlling the JIT stack" -below for a discussion of JIT stack usage. For compatibility with the -interpretive pcre_exec() code, no more than two-thirds of the -ovector argument is used for passing back captured substrings. -

    -

    -The error code PCRE_ERROR_MATCHLIMIT is returned by the JIT code if searching a -very large pattern tree goes on for too long, as it is in the same circumstance -when JIT is not used, but the details of exactly what is counted are not the -same. The PCRE_ERROR_RECURSIONLIMIT error code is never returned by JIT -execution. -

    -
    SAVING AND RESTORING COMPILED PATTERNS
    -

    -The code that is generated by the JIT compiler is architecture-specific, and is -also position dependent. For those reasons it cannot be saved (in a file or -database) and restored later like the bytecode and other data of a compiled -pattern. Saving and restoring compiled patterns is not something many people -do. More detail about this facility is given in the -pcreprecompile -documentation. It should be possible to run pcre_study() on a saved and -restored pattern, and thereby recreate the JIT data, but because JIT -compilation uses significant resources, it is probably not worth doing this; -you might as well recompile the original pattern. -

    -
    CONTROLLING THE JIT STACK
    -

    -When the compiled JIT code runs, it needs a block of memory to use as a stack. -By default, it uses 32K on the machine stack. However, some large or -complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT -is given when there is not enough stack. Three functions are provided for -managing blocks of memory for use as JIT stacks. There is further discussion -about the use of JIT stacks in the section entitled -"JIT stack FAQ" -below. -

    -

    -The pcre_jit_stack_alloc() function creates a JIT stack. Its arguments -are a starting size and a maximum size, and it returns a pointer to an opaque -structure of type pcre_jit_stack, or NULL if there is an error. The -pcre_jit_stack_free() function can be used to free a stack that is no -longer needed. (For the technically minded: the address space is allocated by -mmap or VirtualAlloc.) -

    -

    -JIT uses far less memory for recursion than the interpretive code, -and a maximum stack size of 512K to 1M should be more than enough for any -pattern. -

    -

    -The pcre_assign_jit_stack() function specifies which stack JIT code -should use. Its arguments are as follows: -

    -  pcre_extra         *extra
    -  pcre_jit_callback  callback
    -  void               *data
    -
    -The extra argument must be the result of studying a pattern with -PCRE_STUDY_JIT_COMPILE etc. There are three cases for the values of the other -two options: -
    -  (1) If callback is NULL and data is NULL, an internal 32K block
    -      on the machine stack is used.
    -
    -  (2) If callback is NULL and data is not NULL, data must be
    -      a valid JIT stack, the result of calling pcre_jit_stack_alloc().
    -
    -  (3) If callback is not NULL, it must point to a function that is
    -      called with data as an argument at the start of matching, in
    -      order to set up a JIT stack. If the return from the callback
    -      function is NULL, the internal 32K stack is used; otherwise the
    -      return value must be a valid JIT stack, the result of calling
    -      pcre_jit_stack_alloc().
    -
    -A callback function is obeyed whenever JIT code is about to be run; it is not -obeyed when pcre_exec() is called with options that are incompatible for -JIT execution. A callback function can therefore be used to determine whether a -match operation was executed by JIT or by the interpreter. -

    -

    -You may safely use the same JIT stack for more than one pattern (either by -assigning directly or by callback), as long as the patterns are all matched -sequentially in the same thread. In a multithread application, if you do not -specify a JIT stack, or if you assign or pass back NULL from a callback, that -is thread-safe, because each thread has its own machine stack. However, if you -assign or pass back a non-NULL JIT stack, this must be a different stack for -each thread so that the application is thread-safe. -

    -

    -Strictly speaking, even more is allowed. You can assign the same non-NULL stack -to any number of patterns as long as they are not used for matching by multiple -threads at the same time. For example, you can assign the same stack to all -compiled patterns, and use a global mutex in the callback to wait until the -stack is available for use. However, this is an inefficient solution, and not -recommended. -

    -

    -This is a suggestion for how a multithreaded program that needs to set up -non-default JIT stacks might operate: -

    -  During thread initialization
    -    thread_local_var = pcre_jit_stack_alloc(...)
    -
    -  During thread exit
    -    pcre_jit_stack_free(thread_local_var)
    -
    -  Use a one-line callback function
    -    return thread_local_var
    -
    -All the functions described in this section do nothing if JIT is not available, -and pcre_assign_jit_stack() does nothing unless the extra argument -is non-NULL and points to a pcre_extra block that is the result of a -successful study with PCRE_STUDY_JIT_COMPILE etc. -

    -
    JIT STACK FAQ
    -

    -(1) Why do we need JIT stacks? -
    -
    -PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where -the local data of the current node is pushed before checking its child nodes. -Allocating real machine stack on some platforms is difficult. For example, on -PowerPC the stack chain needs to be updated every time the stack is extended. -Although this is possible, the overhead of updating it decreases performance. So -we do the recursion in memory. -

    -

    -(2) Why don't we simply allocate blocks of memory with malloc()? -
    -
    -Modern operating systems have a nice feature: they can reserve an address space -instead of allocating memory. We can safely allocate memory pages inside this -address space, so the stack could grow without moving memory data (this is -important because of pointers). Thus we can allocate 1M address space, and use -only a single memory page (usually 4K) if that is enough. However, we can still -grow up to 1M anytime if needed. -

    -

    -(3) Who "owns" a JIT stack? -
    -
    -The owner of the stack is the user program, not the JIT studied pattern or -anything else. The user program must ensure that while a stack is being used by -pcre_exec() (that is, it is assigned to the pattern currently running), -that stack is not used by any other thread (to avoid overwriting the same -memory area). The best practice for multithreaded programs is to allocate a -stack for each thread, and return this stack through the JIT callback function. -

    -

    -(4) When should a JIT stack be freed? -
    -
    -You can free a JIT stack at any time, as long as it will not be used by -pcre_exec() again. When you assign the stack to a pattern, only a pointer -is set. There is no reference counting or any other magic. You can free the -patterns and stacks in any order, anytime. Just do not call -pcre_exec() with a pattern pointing to an already freed stack, as that -will cause a segmentation fault. (Also, do not free a stack currently used by -pcre_exec() in another thread.) You can also replace the stack for a -pattern at any time. You can even free the previous stack before assigning a -replacement. -

    -

    -(5) Should I allocate/free a stack every time before/after calling -pcre_exec()? -
    -
    -No, because this is too costly in terms of resources. However, you could -implement some clever scheme that releases the stack if it has not been used for, -say, two minutes. The JIT callback can help to achieve this without keeping a -list of the currently JIT studied patterns. -

    -

    -(6) OK, the stack is for long term memory allocation. But what happens if a -pattern causes stack overflow with a stack of 1M? Is that 1M kept until the -stack is freed? -
    -
    -Especially on embedded systems, it might be a good idea to release memory -sometimes without freeing the stack. There is no API for this at the moment. -Probably a function call that returns the currently allocated memory for -any stack, and another that allows releasing memory (shrinking the stack), would -be a good idea if someone needs this. -

    -

    -(7) This is too much of a headache. Isn't there any better solution for JIT -stack handling? -
    -
    -No, thanks to Windows. If POSIX threads were used everywhere, we could throw -out this complicated API. -

    -
    EXAMPLE CODE
    -

    -This is a single-threaded example that specifies a JIT stack without using a -callback. -

    -  int rc;
    -  int ovector[30];
    -  pcre *re;
    -  pcre_extra *extra;
    -  pcre_jit_stack *jit_stack;
    -
    -  re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
    -  /* Check for errors */
    -  extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &error);
    -  jit_stack = pcre_jit_stack_alloc(32*1024, 512*1024);
    -  /* Check for error (NULL) */
    -  pcre_assign_jit_stack(extra, NULL, jit_stack);
    -  rc = pcre_exec(re, extra, subject, length, 0, 0, ovector, 30);
    -  /* Check results */
    -  pcre_free(re);
    -  pcre_free_study(extra);
    -  pcre_jit_stack_free(jit_stack);
    -
    -
    -

    -
    JIT FAST PATH API
    -

    -Because the API described above falls back to interpreted execution when JIT is -not available, it is convenient for programs that are written for general use -in many environments. However, calling JIT via pcre_exec() does have a -performance impact. Programs that are written for use where JIT is known to be -available, and which need the best possible performance, can instead use a -"fast path" API to call JIT execution directly instead of calling -pcre_exec() (obviously only for patterns that have been successfully -studied by JIT). -

    -

    -The fast path function is called pcre_jit_exec(), and it takes exactly -the same arguments as pcre_exec(), plus one additional argument that -must point to a JIT stack. The JIT stack arrangements described above do not -apply. The return values are the same as for pcre_exec(). -

    -

    -When you call pcre_exec(), as well as testing for invalid options, a -number of other sanity checks are performed on the arguments. For example, if -the subject pointer is NULL, or its length is negative, an immediate error is -given. Also, unless PCRE_NO_UTF[8|16|32] is set, a UTF subject string is tested -for validity. In the interests of speed, these checks do not happen on the JIT -fast path, and if invalid data is passed, the result is undefined. -

    -

    -Bypassing the sanity checks and the pcre_exec() wrapping can give -speedups of more than 10%. -

    -

    -Note that the pcre_jit_exec() function is not available in versions of -PCRE before 8.32 (released in November 2012). If you need to support versions -that old you must either use the slower pcre_exec(), or switch between -the two codepaths by checking the values of PCRE_MAJOR and PCRE_MINOR. -

    -

    -Due to an unfortunate implementation oversight, even in versions 8.32 -and later there will be no pcre_jit_exec() stub function defined -when PCRE is compiled with --disable-jit, which is the default, and -there's no way to detect whether PCRE was compiled with --enable-jit -via a macro. -

    -

    -If you need to support versions older than 8.32, or versions that may -not build with --enable-jit, you must either use the slower -pcre_exec(), or switch between the two codepaths by checking the -values of PCRE_MAJOR and PCRE_MINOR. -

    -

    -Switching between the two by checking the version assumes that all the -versions being targeted are built with --enable-jit. To also support -builds that may use --disable-jit, there are three choices: always use -pcre_exec(); check for JIT at build time via pcre_config() (which -assumes the runtime environment will be the same as the build environment); -or, as the Git project decided to do, simply assume that pcre_jit_exec() is -present in 8.32 or later unless a compile-time flag is provided. See -the "grep: un-break building with PCRE >= 8.32 without --enable-jit" -commit in git.git for an example of that. -

    -
    SEE ALSO
    -

    -pcreapi(3) -

    -
    AUTHOR
    -

    -Philip Hazel (FAQ by Zoltan Herczeg) -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 05 July 2017 -
    -Copyright © 1997-2017 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrelimits.html b/src/pcre/doc/html/pcrelimits.html deleted file mode 100644 index ee5ebf03..00000000 --- a/src/pcre/doc/html/pcrelimits.html +++ /dev/null @@ -1,90 +0,0 @@ - - -pcrelimits specification - - -

    pcrelimits man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -SIZE AND OTHER LIMITATIONS -
    -

    -There are some size limitations in PCRE but it is hoped that they will never in -practice be relevant. -

    -

    -The maximum length of a compiled pattern is approximately 64K data units (bytes -for the 8-bit library, 16-bit units for the 16-bit library, and 32-bit units for -the 32-bit library) if PCRE is compiled with the default internal linkage size, -which is 2 bytes for the 8-bit and 16-bit libraries, and 4 bytes for the 32-bit -library. If you want to process regular expressions that are truly enormous, -you can compile PCRE with an internal linkage size of 3 or 4 (when building the -16-bit or 32-bit library, 3 is rounded up to 4). See the README file in -the source distribution and the -pcrebuild -documentation for details. In these cases the limit is substantially larger. -However, the speed of execution is slower. -

    -

    -All values in repeating quantifiers must be less than 65536. -

    -

    -There is no limit to the number of parenthesized subpatterns, but there can be -no more than 65535 capturing subpatterns. There is, however, a limit to the -depth of nesting of parenthesized subpatterns of all kinds. This is imposed in -order to limit the amount of system stack used at compile time. The limit can -be specified when PCRE is built; the default is 250. -

    -

    -There is a limit to the number of forward references to subsequent subpatterns -of around 200,000. Repeated forward references with fixed upper limits, for -example, (?2){0,100} when subpattern number 2 is to the right, are included in -the count. There is no limit to the number of backward references. -

    -

    -The maximum length of name for a named subpattern is 32 characters, and the -maximum number of named subpatterns is 10000. -

    -

    -The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb -is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries. -

    -

    -The maximum length of a subject string is the largest positive number that an -integer variable can hold. However, when using the traditional matching -function, PCRE uses recursion to handle subpatterns and indefinite repetition. -This means that the available stack space may limit the size of a subject -string that can be processed by certain patterns. For a discussion of stack -issues, see the -pcrestack -documentation. -

    -
    -AUTHOR -
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    -REVISION -
    -

    -Last updated: 05 November 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrepartial.html b/src/pcre/doc/html/pcrepartial.html deleted file mode 100644 index 4faeafcb..00000000 --- a/src/pcre/doc/html/pcrepartial.html +++ /dev/null @@ -1,509 +0,0 @@ - - -pcrepartial specification - - -

    pcrepartial man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    PARTIAL MATCHING IN PCRE
    -

    -In normal use of PCRE, if the subject string that is passed to a matching -function matches as far as it goes, but is too short to match the entire -pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where it might -be helpful to distinguish this case from other cases in which there is no -match. -

    -

    -Consider, for example, an application where a human is required to type in data -for a field with specific formatting requirements. An example might be a date -in the form ddmmmyy, defined by this pattern: -

    -  ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
    -
    -If the application sees the user's keystrokes one by one, and can check that -what has been typed so far is potentially valid, it is able to raise an error -as soon as a mistake is made, by beeping and not reflecting the character that -has been typed, for example. This immediate feedback is likely to be a better -user interface than a check that is delayed until the entire string has been -entered. Partial matching can also be useful when the subject string is very -long and is not all available at once. -

    -

    -PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and -PCRE_PARTIAL_HARD options, which can be set when calling any of the matching -functions. For backwards compatibility, PCRE_PARTIAL is a synonym for -PCRE_PARTIAL_SOFT. The essential difference between the two options is whether -or not a partial match is preferred to an alternative complete match, though -the details differ between the two types of matching function. If both options -are set, PCRE_PARTIAL_HARD takes precedence. -

    -

    -If you want to use partial matching with just-in-time optimized code, you must -call pcre_study(), pcre16_study() or pcre32_study() with one -or both of these options: -

    -  PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
    -  PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
    -
    -PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial -matches on the same pattern. If the appropriate JIT study mode has not been set -for a match, the interpretive matching code is used. -

    -

    -Setting a partial matching option disables two of PCRE's standard -optimizations. PCRE remembers the last literal data unit in a pattern, and -abandons matching immediately if it is not present in the subject string. This -optimization cannot be used for a subject string that might match only -partially. If the pattern was studied, PCRE knows the minimum length of a -matching string, and does not bother to run the matching function on shorter -strings. This optimization is also disabled for partial matching. -

    -
    PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec()
    -

    -A partial match occurs during a call to pcre_exec() or -pcre[16|32]_exec() when the end of the subject string is reached -successfully, but matching cannot continue because more characters are needed. -However, at least one character in the subject must have been inspected. This -character need not form part of the final matched string; lookbehind assertions -and the \K escape sequence provide ways of inspecting characters before the -start of a matched substring. The requirement for inspecting at least one -character exists because an empty string can always be matched; without such a -restriction there would always be a partial match of an empty string at the end -of the subject. -

    -

    -If there are at least two slots in the offsets vector when a partial match is -returned, the first slot is set to the offset of the earliest character that -was inspected. For convenience, the second offset points to the end of the -subject so that a substring can easily be identified. If there are at least -three slots in the offsets vector, the third slot is set to the offset of the -character where matching started. -

    -

    -For the majority of patterns, the contents of the first and third slots will be -the same. However, for patterns that contain lookbehind assertions, or begin -with \b or \B, characters before the one where matching started may have been -inspected while carrying out the match. For example, consider this pattern: -

    -  /(?<=abc)123/
    -
    -This pattern matches "123", but only if it is preceded by "abc". If the subject -string is "xyzabc12", the first two offsets after a partial match are for the -substring "abc12", because all these characters were inspected. However, the -third offset is set to 6, because that is the offset where matching began. -
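The offsets from this example can be turned into substrings with simple pointer arithmetic. The following is a minimal sketch that does not call PCRE at all: the three offsets (3, 8 and 6) are the values the text above gives for the subject "xyzabc12", and the helper names are hypothetical ones introduced here for illustration.

```c
/* Values taken from the example in the text: subject "xyzabc12" matched
   against (?<=abc)123 with a partial matching option set. */
static const char example_subject[] = "xyzabc12";
static const int example_ovector[3] = { 3, 8, 6 };

/* Hypothetical helper: start of the text that was inspected. */
static const char *inspected_start(const char *subject, const int *ovector)
{
  return subject + ovector[0];
}

/* Hypothetical helper: length of the inspected text, which runs from
   the first offset to the end of the subject (the second offset). */
static int inspected_length(const int *ovector)
{
  return ovector[1] - ovector[0];
}

/* Hypothetical helper: how many inspected characters precede the point
   where matching started (the third offset). */
static int lookbehind_length(const int *ovector)
{
  return ovector[2] - ovector[0];
}
```

For the example values, the inspected text is the five characters "abc12", of which the first three were examined only by the lookbehind. An application that retains text for a later call would need to keep everything from the first offset, not just from the third.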

    -

    -What happens when a partial match is identified depends on which of the two -partial matching options is set. -

    -
    -PCRE_PARTIAL_SOFT WITH pcre_exec() OR pcre[16|32]_exec() -
    -

    -If PCRE_PARTIAL_SOFT is set when pcre_exec() or pcre[16|32]_exec() -identifies a partial match, the partial match is remembered, but matching -continues as normal, and other alternatives in the pattern are tried. If no -complete match can be found, PCRE_ERROR_PARTIAL is returned instead of -PCRE_ERROR_NOMATCH. -

    -

    -This option is "soft" because it prefers a complete match over a partial match. -All the various matching items in a pattern behave as if the subject string is -potentially complete. For example, \z, \Z, and $ match at the end of the -subject, as normal, and for \b and \B the end of the subject is treated as a -non-alphanumeric. -

    -

    -If there is more than one partial match, the first one that was found provides -the data that is returned. Consider this pattern: -

    -  /123\w+X|dogY/
    -
    -If this is matched against the subject string "abc123dog", both -alternatives fail to match, but the end of the subject is reached during -matching, so PCRE_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, -identifying "123dog" as the first partial match that was found. (In this -example, there are two partial matches, because "dog" on its own partially -matches the second alternative.) -

    -
    -PCRE_PARTIAL_HARD WITH pcre_exec() OR pcre[16|32]_exec() -
    -

    -If PCRE_PARTIAL_HARD is set for pcre_exec() or pcre[16|32]_exec(), -PCRE_ERROR_PARTIAL is returned as soon as a partial match is found, without -continuing to search for possible complete matches. This option is "hard" -because it prefers an earlier partial match over a later complete match. For -this reason, the assumption is made that the end of the supplied subject string -may not be the true end of the available data, and so, if \z, \Z, \b, \B, -or $ are encountered at the end of the subject, the result is -PCRE_ERROR_PARTIAL, provided that at least one character in the subject has -been inspected. -

    -

    -Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16 -subject strings are checked for validity. Normally, an invalid sequence -causes the error PCRE_ERROR_BADUTF8 or PCRE_ERROR_BADUTF16. However, in the -special case of a truncated character at the end of the subject, -PCRE_ERROR_SHORTUTF8 or PCRE_ERROR_SHORTUTF16 is returned when -PCRE_PARTIAL_HARD is set. -

    -
    -Comparing hard and soft partial matching -
    -

    -The difference between the two partial matching options can be illustrated by a -pattern such as: -

    -  /dog(sbody)?/
    -
    -This matches either "dog" or "dogsbody", greedily (that is, it prefers the -longer string if possible). If it is matched against the string "dog" with -PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if -PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand, -if the pattern is made ungreedy the result is different: -
    -  /dog(sbody)??/
    -
    -In this case the result is always a complete match because that is found first, -and matching never continues after finding a complete match. It might be easier -to follow this explanation by thinking of the two patterns like this: -
    -  /dog(sbody)?/    is the same as  /dogsbody|dog/
    -  /dog(sbody)??/   is the same as  /dog|dogsbody/
    -
    -The second pattern will never match "dogsbody", because it will always find the -shorter match first. -
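The ordering argument above can be checked with a toy model. The sketch below is not PCRE: it assumes the pattern is a list of literal alternatives tried in order at the start of the subject, and the function name `match_alternatives` is hypothetical. It only illustrates how "soft" remembers a partial match and keeps going, while "hard" stops at the first partial match.

```python
def match_alternatives(alternatives, subject, mode):
    """Toy model: literal alternatives, tried in order, anchored at offset 0.

    mode is "soft" or "hard". Returns ("match", text), ("partial", text)
    or ("nomatch", None).
    """
    first_partial = None
    for alt in alternatives:
        if subject.startswith(alt):
            # A complete match always ends the search.
            return ("match", alt)
        if subject and alt.startswith(subject):
            # The subject ran out part-way through this alternative.
            if mode == "hard":
                return ("partial", subject)
            if first_partial is None:
                first_partial = subject  # soft: remember it and carry on
    if first_partial is not None:
        return ("partial", first_partial)
    return ("nomatch", None)


# /dog(sbody)?/ behaves like /dogsbody|dog/
print(match_alternatives(["dogsbody", "dog"], "dog", "soft"))
print(match_alternatives(["dogsbody", "dog"], "dog", "hard"))
# /dog(sbody)??/ behaves like /dog|dogsbody/
print(match_alternatives(["dog", "dogsbody"], "dog", "hard"))
```

In this model the greedy ordering gives a complete match under "soft" and a partial match under "hard", while the ungreedy ordering gives a complete match in both modes, as the text describes.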

    -
    PARTIAL MATCHING USING pcre_dfa_exec() OR pcre[16|32]_dfa_exec()
    -

    -The DFA functions move along the subject string character by character, without -backtracking, searching for all possible matches simultaneously. If the end of -the subject is reached before the end of the pattern, there is the possibility -of a partial match, again provided that at least one character has been -inspected. -

    -

    -When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there -have been no complete matches. Otherwise, the complete matches are returned. -However, if PCRE_PARTIAL_HARD is set, a partial match takes precedence over any -complete matches. The portion of the string that was inspected when the longest -partial match was found is set as the first matching string, provided there are -at least two slots in the offsets vector. -

    -

    -Because the DFA functions always search for all possible matches, and there is -no difference between greedy and ungreedy repetition, their behaviour is -different from the standard functions when PCRE_PARTIAL_HARD is set. Consider -the string "dog" matched against the ungreedy pattern shown above: -

    -  /dog(sbody)??/
    -
    -Whereas the standard functions stop as soon as they find the complete match for -"dog", the DFA functions also find the partial match for "dogsbody", and so -return that when PCRE_PARTIAL_HARD is set. -

    -
    PARTIAL MATCHING AND WORD BOUNDARIES
    -

    -If a pattern ends with one of the sequences \b or \B, which test for word -boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive -results. Consider this pattern: -

    -  /\bcat\b/
    -
    -This matches "cat", provided there is a word boundary at either end. If the -subject string is "the cat", the comparison of the final "t" with a following -character cannot take place, so a partial match is found. However, normal -matching carries on, and \b matches at the end of the subject when the last -character is a letter, so a complete match is found. The result, therefore, is -not PCRE_ERROR_PARTIAL. Using PCRE_PARTIAL_HARD in this case does yield -PCRE_ERROR_PARTIAL, because then the partial match takes precedence. -

    -
    FORMERLY RESTRICTED PATTERNS
    -

    -For releases of PCRE prior to 8.00, because of the way certain internal -optimizations were implemented in the pcre_exec() function, the -PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with -all patterns. From release 8.00 onwards, the restrictions no longer apply, and -partial matching can be requested for any pattern. -

    -

    -Items that were formerly restricted were repeated single characters and -repeated metasequences. If PCRE_PARTIAL was set for a pattern that did not -conform to the restrictions, pcre_exec() returned the error code -PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in use. The -PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if a compiled -pattern can be used for partial matching now always returns 1. -

    -
    EXAMPLE OF PARTIAL MATCHING USING PCRETEST
    -

    -If the escape sequence \P is present in a pcretest data line, the -PCRE_PARTIAL_SOFT option is used for the match. Here is a run of pcretest -that uses the date example quoted above: -

    -    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    -  data> 25jun04\P
    -   0: 25jun04
    -   1: jun
    -  data> 25dec3\P
    -  Partial match: 25dec3
    -  data> 3ju\P
    -  Partial match: 3ju
    -  data> 3juj\P
    -  No match
    -  data> j\P
    -  No match
    -
    -The first data string is matched completely, so pcretest shows the -matched substrings. The remaining four strings do not match the complete -pattern, but the first two are partial matches. Similar output is obtained -if DFA matching is used. -

    -

    -If the escape sequence \P is present more than once in a pcretest data -line, the PCRE_PARTIAL_HARD option is set for the match. -

    -
    MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16|32]_dfa_exec()
    -

    -When a partial match has been found using a DFA matching function, it is -possible to continue the match by providing additional subject data and calling -the function again with the same compiled regular expression, this time setting -the PCRE_DFA_RESTART option. You must pass the same working space as before, -because this is where details of the previous partial match are stored. Here is -an example using pcretest, using the \R escape sequence to set the -PCRE_DFA_RESTART option (\D specifies the use of the DFA matching function): -

    -    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    -  data> 23ja\P\D
    -  Partial match: 23ja
    -  data> n05\R\D
    -   0: n05
    -
    -The first call has "23ja" as the subject, and requests partial matching; the -second call has "n05" as the subject for the continued (restarted) match. -Notice that when the match is complete, only the last part is shown; PCRE does -not retain the previously partially-matched string. It is up to the calling -program to do that if it needs to. -

    -

    -That means that, for an unanchored pattern, if a continued match fails, it is -not possible to try again at a new starting point. All this facility is capable -of doing is continuing with the previous match attempt. In the previous -example, if the second set of data is "ug23" the result is no match, even -though there would be a match for "aug23" if the entire string were given at -once. Depending on the application, this may or may not be what you want. -The only way to allow for starting again at the next character is to retain the -matched part of the subject and try a new complete match. -

    -

    -You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with -PCRE_DFA_RESTART to continue partial matching over multiple segments. This -facility can be used to pass very long subject strings to the DFA matching -functions. -

    -
    MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre[16|32]_exec()
    -

    -From release 8.00, the standard matching functions can also be used to do -multi-segment matching. Unlike the DFA functions, it is not possible to -restart the previous match with a new segment of data. Instead, new data must -be added to the previous subject string, and the entire match re-run, starting -from the point where the partial match occurred. Earlier data can be discarded. -

    -

    -It is best to use PCRE_PARTIAL_HARD in this situation, because it does not -treat the end of a segment as the end of the subject when matching \z, \Z, -\b, \B, and $. Consider an unanchored pattern that matches dates: -

    -    re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
    -  data> The date is 23ja\P\P
    -  Partial match: 23ja
    -
    -At this stage, an application could discard the text preceding "23ja", add on -text from the next segment, and call the matching function again. Unlike the -DFA matching functions, the entire matching string must always be available, -and the complete matching process occurs for each call, so more memory and more -processing time is needed. -

    -

    -Note: If the pattern contains lookbehind assertions, or \K, or starts -with \b or \B, the string that is returned for a partial match includes -characters that precede the start of what would be returned for a complete -match, because it contains all the characters that were inspected during the -partial match. -

    -
    ISSUES WITH MULTI-SEGMENT MATCHING
    -

    -Certain types of pattern may give problems with multi-segment matching, -whichever matching function is used. -

    -

    -1. If the pattern contains a test for the beginning of a line, you need to pass -the PCRE_NOTBOL option when the subject string for any call does not start at the -beginning of a line. There is also a PCRE_NOTEOL option, but in practice when -doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which -includes the effect of PCRE_NOTEOL. -

    -

    -2. Lookbehind assertions that have already been obeyed are catered for in the -offsets that are returned for a partial match. However a lookbehind assertion -later in the pattern could require even earlier characters to be inspected. You -can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the -pcre_fullinfo() or pcre[16|32]_fullinfo() functions to obtain the -length of the longest lookbehind in the pattern. This length is given in -characters, not bytes. If you always retain at least that many characters -before the partially matched string, all should be well. (Of course, near the -start of the subject, fewer characters may be present; in that case all -characters should be retained.) -

    -

    -From release 8.33, there is a more accurate way of deciding which characters to -retain. Instead of subtracting the length of the longest lookbehind from the -earliest inspected character (offsets[0]), the match start position -(offsets[2]) should be used, and the next match attempt started at the -offsets[2] character by setting the startoffset argument of -pcre_exec() or pcre_dfa_exec(). -

    -

    -For example, if the pattern "(?<=123)abc" is partially -matched against the string "xx123a", the three offset values returned are 2, 6, -and 5. This indicates that the matching process that gave a partial match -started at offset 5, but the characters "123a" were all inspected. The maximum -lookbehind for that pattern is 3, so taking that away from 5 shows that we need -only keep "123a", and the next match attempt can be started at offset 3 of the -retained text (that is, at "a") when further characters have been added. When the -match start is not the earliest inspected character, pcretest shows it explicitly: -

    -    re> "(?<=123)abc"
    -  data> xx123a\P\P
    -  Partial match at offset 5: 123a
    -
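The retention arithmetic in this example can be written out as a small helper. This is a sketch under stated assumptions, not part of the PCRE API: `retain_for_next_segment` is a hypothetical name, `offsets` stands for the three values returned for a partial match (earliest inspected character, end of subject, match start), and `max_lookbehind` is the value obtained with PCRE_INFO_MAXLOOKBEHIND. Offsets are counted in characters, as the text specifies.

```python
def retain_for_next_segment(subject, offsets, max_lookbehind):
    """Return (text_to_keep, start_offset_for_next_attempt).

    offsets[2] is the match start; keep max_lookbehind characters before it,
    or everything from the start of the subject if fewer are available.
    """
    match_start = offsets[2]
    keep_from = max(0, match_start - max_lookbehind)
    kept = subject[keep_from:]
    # The next attempt starts at the old match start, re-expressed
    # relative to the retained text.
    return kept, match_start - keep_from


kept, start = retain_for_next_segment("xx123a", (2, 6, 5), 3)
print(kept, start)
```

The caller would append the next segment to `kept` and pass `start` as the startoffset argument of the matching function.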
    -

    -

    -3. Because a partial match must always contain at least one character, what -might be considered a partial match of an empty string actually gives a "no -match" result. For example: -

    -    re> /c(?<=abc)x/
    -  data> ab\P
    -  No match
    -
    -If the next segment begins "cx", a match should be found, but this will only -happen if characters from the previous segment are retained. For this reason, a -"no match" result should be interpreted as "partial match of an empty string" -when the pattern contains lookbehinds. -

    -

    -4. Matching a subject string that is split into multiple segments may not -always produce exactly the same result as matching over one single long string, -especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and -Word Boundaries" above describes an issue that arises if the pattern ends with -\b or \B. Another kind of difference may occur when there are multiple -matching possibilities, because (for PCRE_PARTIAL_SOFT) a partial match result -is given only when there are no completed matches. This means that as soon as -the shortest match has been found, continuation to a new subject segment is no -longer possible. Consider again this pcretest example: -

    -    re> /dog(sbody)?/
    -  data> dogsb\P
    -   0: dog
    -  data> do\P\D
    -  Partial match: do
    -  data> gsb\R\P\D
    -   0: g
    -  data> dogsbody\D
    -   0: dogsbody
    -   1: dog
    -
    -The first data line passes the string "dogsb" to a standard matching function, -setting the PCRE_PARTIAL_SOFT option. Although the string is a partial match -for "dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter -string "dog" is a complete match. Similarly, when the subject is presented to -a DFA matching function in several parts ("do" and "gsb" being the first two) -the match stops when "dog" has been found, and it is not possible to continue. -On the other hand, if "dogsbody" is presented as a single string, a DFA -matching function finds both matches. -

    -

    -Because of these problems, it is best to use PCRE_PARTIAL_HARD when matching -multi-segment data. The example above then behaves differently: -

    -    re> /dog(sbody)?/
    -  data> dogsb\P\P
    -  Partial match: dogsb
    -  data> do\P\D
    -  Partial match: do
    -  data> gsb\R\P\P\D
    -  Partial match: gsb
    -
    -5. Patterns that contain alternatives at the top level which do not all start -with the same pattern item may not work as expected when PCRE_DFA_RESTART is -used. For example, consider this pattern: -
    -  1234|3789
    -
    -If the first part of the subject is "ABC123", a partial match of the first -alternative is found at offset 3. There is no partial match for the second -alternative, because such a match does not start at the same point in the -subject string. Attempting to continue with the string "7890" does not yield a -match because only those alternatives that match at one point in the subject -are remembered. The problem arises because the start of the second alternative -matches within the first alternative. There is no problem with anchored -patterns or patterns such as: -
    -  1234|ABCD
    -
    -where no string can be a partial match for both alternatives. This is not a -problem if a standard matching function is used, because the entire match has -to be rerun each time: -
    -    re> /1234|3789/
    -  data> ABC123\P\P
    -  Partial match: 123
    -  data> 1237890
    -   0: 3789
    -
    -Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running -the entire match can also be used with the DFA matching functions. Another -possibility is to work with two buffers. If a partial match at offset n -in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on -the second buffer, you can then try a new match starting at offset n+1 in -the first buffer. -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 02 July 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrepattern.html b/src/pcre/doc/html/pcrepattern.html deleted file mode 100644 index 96fc7298..00000000 --- a/src/pcre/doc/html/pcrepattern.html +++ /dev/null @@ -1,3276 +0,0 @@ - - -pcrepattern specification - - -

    pcrepattern man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    PCRE REGULAR EXPRESSION DETAILS
    -

    -The syntax and semantics of the regular expressions that are supported by PCRE -are described in detail below. There is a quick-reference syntax summary in the -pcresyntax -page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE -also supports some alternative regular expression syntax (which does not -conflict with the Perl syntax) in order to provide some compatibility with -regular expressions in Python, .NET, and Oniguruma. -

    -

    -Perl's regular expressions are described in its own documentation, and -regular expressions in general are covered in a number of books, some of which -have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", -published by O'Reilly, covers regular expressions in great detail. This -description of PCRE's regular expressions is intended as reference material. -

    -

    -This document discusses the patterns that are supported by PCRE when one of its -main matching functions, pcre_exec() (8-bit) or pcre[16|32]_exec() -(16- or 32-bit), is used. PCRE also has alternative matching functions, -pcre_dfa_exec() and pcre[16|32]_dfa_exec(), which match using a -different algorithm that is not Perl-compatible. Some of the features discussed -below are not available when DFA matching is used. The advantages and -disadvantages of the alternative functions, and how they differ from the normal -functions, are discussed in the -pcrematching -page. -

    -
    SPECIAL START-OF-PATTERN ITEMS
    -

    -A number of options that can be passed to pcre_compile() can also be set -by special items at the start of a pattern. These are not Perl-compatible, but -are provided to make these options accessible to pattern writers who are not -able to change the program that processes the pattern. Any number of these -items may appear, but they must all be together right at the start of the -pattern string, and the letters must be in upper case. -

    -
    -UTF support -
    -

    -The original operation of PCRE was on strings of one-byte characters. However, -there is now also support for UTF-8 strings in the original library, an -extra library that supports 16-bit and UTF-16 character strings, and a -third library that supports 32-bit and UTF-32 character strings. To use these -features, PCRE must be built to include appropriate support. When using UTF -strings you must either call the compiling function with the PCRE_UTF8, -PCRE_UTF16, or PCRE_UTF32 option, or the pattern must start with one of -these special sequences: -

    -  (*UTF8)
    -  (*UTF16)
    -  (*UTF32)
    -  (*UTF)
    -
    -(*UTF) is a generic sequence that can be used with any of the libraries. -Starting a pattern with such a sequence is equivalent to setting the relevant -option. How setting a UTF mode affects pattern matching is mentioned in several -places below. There is also a summary of features in the -pcreunicode -page. -

    -

    -Some applications that allow their users to supply patterns may wish to -restrict them to non-UTF data for security reasons. If the PCRE_NEVER_UTF -option is set at compile time, (*UTF) etc. are not allowed, and their -appearance causes an error. -

    -
    -Unicode property support -
    -

    -Another special sequence that may appear at the start of a pattern is (*UCP). -This has the same effect as setting the PCRE_UCP option: it causes sequences -such as \d and \w to use Unicode properties to determine character types, -instead of recognizing only characters with codes less than 128 via a lookup -table. -

    -
    -Disabling auto-possessification -
    -

    -If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting -the PCRE_NO_AUTO_POSSESS option at compile time. This stops PCRE from making -quantifiers possessive when what follows cannot match the repeated item. For -example, by default a+b is treated as a++b. For more details, see the -pcreapi -documentation. -

    -
    -Disabling start-up optimizations -
    -

    -If a pattern starts with (*NO_START_OPT), it has the same effect as setting the -PCRE_NO_START_OPTIMIZE option either at compile or matching time. This disables -several optimizations for quickly reaching "no match" results. For more -details, see the -pcreapi -documentation. -

    -
    -Newline conventions -
    -

    -PCRE supports five different conventions for indicating line breaks in -strings: a single CR (carriage return) character, a single LF (linefeed) -character, the two-character sequence CRLF, any of the three preceding, or any -Unicode newline sequence. The -pcreapi -page has -further discussion -about newlines, and shows how to set the newline convention in the -options arguments for the compiling and matching functions. -

    -

    -It is also possible to specify a newline convention by starting a pattern -string with one of the following five sequences: -

    -  (*CR)        carriage return
    -  (*LF)        linefeed
    -  (*CRLF)      carriage return, followed by linefeed
    -  (*ANYCRLF)   any of the three above
    -  (*ANY)       all Unicode newline sequences
    -
    -These override the default and the options given to the compiling function. For -example, on a Unix system where LF is the default newline sequence, the pattern -
    -  (*CR)a.b
    -
    -changes the convention to CR. That pattern matches "a\nb" because LF is no -longer a newline. If more than one of these settings is present, the last one -is used. -

    -

    -The newline convention affects where the circumflex and dollar assertions are -true. It also affects the interpretation of the dot metacharacter when -PCRE_DOTALL is not set, and the behaviour of \N. However, it does not affect -what the \R escape sequence matches. By default, this is any Unicode newline -sequence, for Perl compatibility. However, this can be changed; see the -description of \R in the section entitled -"Newline sequences" -below. A change of \R setting can be combined with a change of newline -convention. -

    -
    -Setting match and recursion limits -
    -

    -The caller of pcre_exec() can set a limit on the number of times the -internal match() function is called and on the maximum depth of -recursive calls. These facilities are provided to catch runaway matches that -are provoked by patterns with huge matching trees (a typical example is a -pattern with nested unlimited repeats) and to avoid running out of system stack -by too much recursion. When one of these limits is reached, pcre_exec() -gives an error return. The limits can also be set by items at the start of the -pattern of the form -

    -  (*LIMIT_MATCH=d)
    -  (*LIMIT_RECURSION=d)
    -
    -where d is any number of decimal digits. However, the value of the setting must -be less than the value set (or defaulted) by the caller of pcre_exec() -for it to have any effect. In other words, the pattern writer can lower the -limits set by the programmer, but not raise them. If there is more than one -setting of one of these limits, the lower value is used. -
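The rule that a pattern can lower but never raise a limit amounts to taking a minimum. The helper below is only an illustration of that rule; `effective_limit` is a hypothetical name and is not a PCRE function.

```python
def effective_limit(caller_limit, pattern_settings):
    """Limit that applies after (*LIMIT_MATCH=d) or (*LIMIT_RECURSION=d) items.

    caller_limit is the value set (or defaulted) by the caller of pcre_exec();
    pattern_settings lists every d given in the pattern for the same limit.
    Settings that are not below the caller's value have no effect, and when
    several are present the lowest one wins.
    """
    return min([caller_limit, *pattern_settings])


print(effective_limit(1000, [500, 200]))
print(effective_limit(1000, [5000]))
```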

    -
    EBCDIC CHARACTER CODES
    -

    -PCRE can be compiled to run in an environment that uses EBCDIC as its character -code rather than ASCII or Unicode (typically a mainframe system). In the -sections below, character code values are ASCII or Unicode; in an EBCDIC -environment these characters may have different code values, and there are no -code points greater than 255. -

    -
    CHARACTERS AND METACHARACTERS
    -

    -A regular expression is a pattern that is matched against a subject string from -left to right. Most characters stand for themselves in a pattern, and match the -corresponding characters in the subject. As a trivial example, the pattern -

    -  The quick brown fox
    -
    -matches a portion of a subject string that is identical to itself. When -caseless matching is specified (the PCRE_CASELESS option), letters are matched -independently of case. In a UTF mode, PCRE always understands the concept of -case for characters whose values are less than 128, so caseless matching is -always possible. For characters with higher values, the concept of case is -supported if PCRE is compiled with Unicode property support, but not otherwise. -If you want to use caseless matching for characters 128 and above, you must -ensure that PCRE is compiled with Unicode property support as well as with -UTF support. -

    -

    -The power of regular expressions comes from the ability to include alternatives -and repetitions in the pattern. These are encoded in the pattern by the use of -metacharacters, which do not stand for themselves but instead are -interpreted in some special way. -

    -

    -There are two different sets of metacharacters: those that are recognized -anywhere in the pattern except within square brackets, and those that are -recognized within square brackets. Outside square brackets, the metacharacters -are as follows: -

    -  \      general escape character with several uses
    -  ^      assert start of string (or line, in multiline mode)
    -  $      assert end of string (or line, in multiline mode)
    -  .      match any character except newline (by default)
    -  [      start character class definition
    -  |      start of alternative branch
    -  (      start subpattern
    -  )      end subpattern
    -  ?      extends the meaning of (
    -         also 0 or 1 quantifier
    -         also quantifier minimizer
    -  *      0 or more quantifier
    -  +      1 or more quantifier
    -         also "possessive quantifier"
    -  {      start min/max quantifier
    -
    -Part of a pattern that is in square brackets is called a "character class". In -a character class the only metacharacters are: -
    -  \      general escape character
    -  ^      negate the class, but only if the first character
    -  -      indicates character range
    -  [      POSIX character class (only if followed by POSIX syntax)
    -  ]      terminates the character class
    -
    -The following sections describe the use of each of the metacharacters. -

    -
    BACKSLASH
    -

    -The backslash character has several uses. Firstly, if it is followed by a -character that is not a number or a letter, it takes away any special meaning -that character may have. This use of backslash as an escape character applies -both inside and outside character classes. -

    -

    -For example, if you want to match a * character, you write \* in the pattern. -This escaping action applies whether or not the following character would -otherwise be interpreted as a metacharacter, so it is always safe to precede a -non-alphanumeric with backslash to specify that it stands for itself. In -particular, if you want to match a backslash, you write \\. -

    -

    -In a UTF mode, only ASCII numbers and letters have any special meaning after a -backslash. All other characters (in particular, those whose codepoints are -greater than 127) are treated as literals. -

    -

    -If a pattern is compiled with the PCRE_EXTENDED option, most white space in the -pattern (other than in a character class), and characters between a # outside a -character class and the next newline, inclusive, are ignored. An escaping -backslash can be used to include a white space or # character as part of the -pattern. -

    -

    -If you want to remove the special meaning from a sequence of characters, you -can do so by putting them between \Q and \E. This is different from Perl in -that $ and @ are handled as literals in \Q...\E sequences in PCRE, whereas in -Perl, $ and @ cause variable interpolation. Note the following examples: -

    -  Pattern            PCRE matches   Perl matches
    -
    -  \Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
    -  \Qabc\$xyz\E       abc\$xyz       abc\$xyz
    -  \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
    -
    -The \Q...\E sequence is recognized both inside and outside character classes. -An isolated \E that is not preceded by \Q is ignored. If \Q is not followed -by \E later in the pattern, the literal interpretation continues to the end of -the pattern (that is, \E is assumed at the end). If the isolated \Q is inside -a character class, this causes an error, because the character class is not -terminated. -
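Python's `re` module has no \Q...\E, so the rules above can be modelled by rewriting a pattern before compiling it. The function below is a rough sketch with a hypothetical name, `expand_quoted`; it handles \Q...\E outside character classes only and relies on `re.escape` to make the quoted text literal.

```python
import re


def expand_quoted(pattern):
    """Rewrite \\Q...\\E runs as escaped literals (character classes not handled)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(r"\Q", i):
            end = pattern.find(r"\E", i + 2)
            if end == -1:
                end = len(pattern)  # no \E: literal text runs to the end
            out.append(re.escape(pattern[i + 2:end]))
            i = end + 2
        elif pattern.startswith(r"\E", i):
            i += 2  # an isolated \E is ignored
        elif pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep other escapes intact
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


for pat, subject in [(r"\Qabc$xyz\E", "abc$xyz"),
                     (r"\Qabc\$xyz\E", r"abc\$xyz"),
                     (r"\Qabc\E\$\Qxyz\E", "abc$xyz")]:
    print(pat, bool(re.fullmatch(expand_quoted(pat), subject)))
```

The three patterns are the rows of the table above, paired with the strings in its "PCRE matches" column.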

    -
    -Non-printing characters -
    -

    -A second use of backslash provides a way of encoding non-printing characters -in patterns in a visible manner. There is no restriction on the appearance of -non-printing characters, apart from the binary zero that terminates a pattern, -but when a pattern is being prepared by text editing, it is often easier to use -one of the following escape sequences than the binary character it represents. -In an ASCII or Unicode environment, these escapes are as follows: -

    -  \a        alarm, that is, the BEL character (hex 07)
    -  \cx       "control-x", where x is any ASCII character
    -  \e        escape (hex 1B)
    -  \f        form feed (hex 0C)
    -  \n        linefeed (hex 0A)
    -  \r        carriage return (hex 0D)
    -  \t        tab (hex 09)
    -  \0dd      character with octal code 0dd
    -  \ddd      character with octal code ddd, or back reference
    -  \o{ddd..} character with octal code ddd..
    -  \xhh      character with hex code hh
    -  \x{hhh..} character with hex code hhh.. (non-JavaScript mode)
    -  \uhhhh    character with hex code hhhh (JavaScript mode only)
    -
    -The precise effect of \cx on ASCII characters is as follows: if x is a lower -case letter, it is converted to upper case. Then bit 6 of the character (hex -40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A), -but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the -data item (byte or 16-bit value) following \c has a value greater than 127, a -compile-time error occurs. This locks out non-ASCII characters in all modes. -
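The \cx computation for ASCII can be reproduced in a few lines. The function name `control_escape` is hypothetical, and the sketch covers only the ASCII rule described above, not the EBCDIC behaviour.

```python
def control_escape(ch):
    """Code value produced by \\c followed by the single character ch (ASCII rule)."""
    code = ord(ch)
    if code > 127:
        raise ValueError("character after \\c must be ASCII")
    if "a" <= ch <= "z":
        code = ord(ch.upper())  # lower case letters are converted to upper case
    return code ^ 0x40          # then bit 6 (hex 40) is inverted


for ch in "AZ{;":
    print(ch, hex(control_escape(ch)))
```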

    -

    -When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t -generate the appropriate EBCDIC code values. The \c escape is processed -as specified for Perl in the perlebcdic document. The only characters -that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any -other character provokes a compile-time error. The sequence \c@ encodes -character code 0; after \c the letters (in either case) encode characters 1-26 -(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex -1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F). -

    -

    -Thus, apart from \c?, these escapes generate the same character code values as -they do in an ASCII environment, though the meanings of the values mostly -differ. For example, \cG always generates code value 7, which is BEL in ASCII -but DEL in EBCDIC. -

    -

    -The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but -because 127 is not a control character in EBCDIC, Perl makes it generate the -APC character. Unfortunately, there are several variants of EBCDIC. In most of -them the APC character has the value 255 (hex FF), but in the one Perl calls -POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC -values, PCRE makes \c? generate 95; otherwise it generates 255. -

    -

    -After \0 up to two further octal digits are read. If there are fewer than two -digits, just those that are present are used. Thus the sequence \0\x\015 -specifies two binary zeros followed by a CR character (code value 13). Make -sure you supply two digits after the initial zero if the pattern character that -follows is itself an octal digit. -

    -

    -The escape \o must be followed by a sequence of octal digits, enclosed in -braces. An error occurs if this is not the case. This escape is a recent -addition to Perl; it provides a way of specifying character code points as octal -numbers greater than 0777, and it also allows octal numbers and back references -to be unambiguously specified. -

    -

    -For greater clarity and unambiguity, it is best to avoid following \ by a -digit greater than zero. Instead, use \o{} or \x{} to specify character -numbers, and \g{} to specify back references. The following paragraphs -describe the old, ambiguous syntax. -

    -

    -The handling of a backslash followed by a digit other than 0 is complicated, -and Perl has changed in recent releases, causing PCRE also to change. Outside a -character class, PCRE reads the digit and any following digits as a decimal -number. If the number is less than 8, or if there have been at least that many -previous capturing left parentheses in the expression, the entire sequence is -taken as a back reference. A description of how this works is given -later, -following the discussion of -parenthesized subpatterns. -

    -

    -Inside a character class, or if the decimal number following \ is greater than -7 and there have not been that many capturing subpatterns, PCRE handles \8 and -\9 as the literal characters "8" and "9", and otherwise re-reads up to three -octal digits following the backslash, using them to generate a data character. -Any subsequent digits stand for themselves. For example: -

    -  \040   is another way of writing an ASCII space
    -  \40    is the same, provided there are fewer than 40 previous capturing subpatterns
    -  \7     is always a back reference
    -  \11    might be a back reference, or another way of writing a tab
    -  \011   is always a tab
    -  \0113  is a tab followed by the character "3"
    -  \113   might be a back reference, otherwise the character with octal code 113
    -  \377   might be a back reference, otherwise the value 255 (decimal)
    -  \81    is either a back reference, or the two characters "8" and "1"
    -
    -Note that octal values of 100 or greater that are specified using this syntax -must not be introduced by a leading zero, because no more than three octal -digits are ever read. -
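The octal values in the examples above can be checked with plain arithmetic. The following is a minimal sketch in Python, not PCRE: the helper name octal_value is hypothetical, and it models only the "read at most three octal digits" rule, not the back reference decision.

```python
def octal_value(digits):
    """Return the code point given by the first three octal digits.

    Any digits after the third are not consumed; under the old
    backslash-digit syntax they stand for themselves in the pattern.
    """
    return int(digits[:3], 8)


# Values taken from the table of examples.
space = octal_value("040")          # 32, an ASCII space
tab = octal_value("011")            # 9, a tab
tab_then_3 = octal_value("0113")    # still 9; the trailing "3" is literal
largest = octal_value("377")        # 255
```

This also illustrates why a value of octal 100 or more must not be written with a leading zero: the fourth digit would never be read.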

    -

    -By default, after \x that is not followed by {, from zero to two hexadecimal -digits are read (letters can be in upper or lower case). Any number of -hexadecimal digits may appear between \x{ and }. If a character other than -a hexadecimal digit appears between \x{ and }, or if there is no terminating -}, an error occurs. -

    -

    -If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x is -as just described only when it is followed by two hexadecimal digits. -Otherwise, it matches a literal "x" character. In JavaScript mode, support for -code points greater than 256 is provided by \u, which must be followed by -four hexadecimal digits; otherwise it matches a literal "u" character. -

    -

    -Characters whose value is less than 256 can be defined by either of the two -syntaxes for \x (or by \u in JavaScript mode). There is no difference in the -way they are handled. For example, \xdc is exactly the same as \x{dc} (or -\u00dc in JavaScript mode). -

    -
    -Constraints on character values -
    -

    -Characters that are specified using octal or hexadecimal numbers are -limited to certain values, as follows: -

    -  8-bit non-UTF mode    less than 0x100
    -  8-bit UTF-8 mode      less than 0x10ffff and a valid codepoint
    -  16-bit non-UTF mode   less than 0x10000
    -  16-bit UTF-16 mode    less than 0x10ffff and a valid codepoint
    -  32-bit non-UTF mode   less than 0x100000000
    -  32-bit UTF-32 mode    less than 0x10ffff and a valid codepoint
    -
    -Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called -"surrogate" codepoints), and 0xffef. -

    -
    -Escape sequences in character classes -
    -

    -All the sequences that define a single character value can be used both inside -and outside character classes. In addition, inside a character class, \b is -interpreted as the backspace character (hex 08). -

    -

    -\N is not allowed in a character class. \B, \R, and \X are not special -inside a character class. Like other unrecognized escape sequences, they are -treated as the literal characters "B", "R", and "X" by default, but cause an -error if the PCRE_EXTRA option is set. Outside a character class, these -sequences have different meanings. -

    -
    -Unsupported escape sequences -
    -

    -In Perl, the sequences \l, \L, \u, and \U are recognized by its string -handler and used to modify the case of following characters. By default, PCRE -does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT -option is set, \U matches a "U" character, and \u can be used to define a -character by code point, as described in the previous section. -

    -
    -Absolute and relative back references -
    -

    -The sequence \g followed by an unsigned or a negative number, optionally -enclosed in braces, is an absolute or relative back reference. A named back -reference can be coded as \g{name}. Back references are discussed -later, -following the discussion of -parenthesized subpatterns. -

    -
    -Absolute and relative subroutine calls -
    -

    -For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a "subroutine". Details are discussed -later. -Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not -synonymous. The former is a back reference; the latter is a -subroutine -call. -

    -
    -Generic character types -
    -

    -Another use of backslash is for specifying generic character types: -

    -  \d     any decimal digit
    -  \D     any character that is not a decimal digit
    -  \h     any horizontal white space character
    -  \H     any character that is not a horizontal white space character
    -  \s     any white space character
    -  \S     any character that is not a white space character
    -  \v     any vertical white space character
    -  \V     any character that is not a vertical white space character
    -  \w     any "word" character
    -  \W     any "non-word" character
    -
    -There is also the single sequence \N, which matches a non-newline character. -This is the same as -the "." metacharacter -when PCRE_DOTALL is not set. Perl also uses \N to match characters by name; -PCRE does not support this. -

    -

    -Each pair of lower and upper case escape sequences partitions the complete set -of characters into two disjoint sets. Any given character matches one, and only -one, of each pair. The sequences can appear both inside and outside character -classes. They each match one character of the appropriate type. If the current -matching point is at the end of the subject string, all of them fail, because -there is no character to match. -

    -

    -For compatibility with Perl, \s did not use to match the VT character (code -11), which made it different from the POSIX "space" class. However, Perl -added VT at release 5.18, and PCRE followed suit at release 8.34. The default -\s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space -(32), which are defined as white space in the "C" locale. This list may vary if -locale-specific matching is taking place. For example, in some locales the -"non-breaking space" character (\xA0) is recognized as white space, and in -others the VT character is not. -

    -

    -A "word" character is an underscore or any character that is a letter or digit. -By default, the definition of letters and digits is controlled by PCRE's -low-valued character tables, and may vary if locale-specific matching is taking -place (see -"Locale support" -in the -pcreapi -page). For example, in a French locale such as "fr_FR" in Unix-like systems, -or "french" in Windows, some character codes greater than 127 are used for -accented letters, and these are then matched by \w. The use of locales with -Unicode is discouraged. -

    -

    -By default, characters whose code points are greater than 127 never match \d, -\s, or \w, and always match \D, \S, and \W, although this may vary for -characters in the range 128-255 when locale-specific matching is happening. -These escape sequences retain their original meanings from before Unicode -support was available, mainly for efficiency reasons. If PCRE is compiled with -Unicode property support, and the PCRE_UCP option is set, the behaviour is -changed so that Unicode properties are used to determine character types, as -follows: -

    -  \d  any character that matches \p{Nd} (decimal digit)
    -  \s  any character that matches \p{Z} or \h or \v
    -  \w  any character that matches \p{L} or \p{N}, plus underscore
    -
    -The upper case escapes match the inverse sets of characters. Note that \d -matches only decimal digits, whereas \w matches any Unicode digit, as well as -any Unicode letter, and underscore. Note also that PCRE_UCP affects \b and -\B, because they are defined in terms of \w and \W. Matching these sequences -is noticeably slower when PCRE_UCP is set. -

    -

    -The sequences \h, \H, \v, and \V are features that were added to Perl at -release 5.10. In contrast to the other sequences, which match only ASCII -characters by default, these always match certain high-valued code points, -whether or not PCRE_UCP is set. The horizontal space characters are: -

    -  U+0009     Horizontal tab (HT)
    -  U+0020     Space
    -  U+00A0     Non-break space
    -  U+1680     Ogham space mark
    -  U+180E     Mongolian vowel separator
    -  U+2000     En quad
    -  U+2001     Em quad
    -  U+2002     En space
    -  U+2003     Em space
    -  U+2004     Three-per-em space
    -  U+2005     Four-per-em space
    -  U+2006     Six-per-em space
    -  U+2007     Figure space
    -  U+2008     Punctuation space
    -  U+2009     Thin space
    -  U+200A     Hair space
    -  U+202F     Narrow no-break space
    -  U+205F     Medium mathematical space
    -  U+3000     Ideographic space
    -
    -The vertical space characters are: -
    -  U+000A     Linefeed (LF)
    -  U+000B     Vertical tab (VT)
    -  U+000C     Form feed (FF)
    -  U+000D     Carriage return (CR)
    -  U+0085     Next line (NEL)
    -  U+2028     Line separator
    -  U+2029     Paragraph separator
    -
    -In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are -relevant. -

    -
    -Newline sequences -
    -

    -Outside a character class, by default, the escape sequence \R matches any -Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent to the -following: -

    -  (?>\r\n|\n|\x0b|\f|\r|\x85)
    -
    -This is an example of an "atomic group", details of which are given -below. -This particular group matches either the two-character sequence CR followed by -LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, -U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next -line, U+0085). The two-character sequence is treated as a single unit that -cannot be split. -
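As an illustration of the alternation above, here is a sketch using Python's re module rather than PCRE. It is an approximation under stated assumptions: the group is non-atomic (atomic groups are not available in every Python version), and the names NEWLINE and split_lines are hypothetical. Because CRLF is listed first, it is still tried before the single characters.

```python
import re

# Non-atomic approximation of the 8-bit, non-UTF-8 form of \R.
NEWLINE = re.compile(r"(?:\r\n|\n|\x0b|\f|\r|\x85)")


def split_lines(text):
    """Split text at each newline sequence recognised by NEWLINE."""
    return NEWLINE.split(text)


# CRLF is consumed as one unit, so no empty string appears between CR and LF.
parts = split_lines("a\r\nb\nc\rd\x85e")
```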

    -

    -In other modes, two additional characters whose codepoints are greater than 255 -are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). -Unicode character property support is not needed for these characters to be -recognized. -

    -

    -It is possible to restrict \R to match only CR, LF, or CRLF (instead of the -complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF -either at compile time or when the pattern is matched. (BSR is an abbreviation -for "backslash R".) This can be made the default when PCRE is built; if this is -the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option. -It is also possible to specify these settings by starting a pattern string with -one of the following sequences: -

    -  (*BSR_ANYCRLF)   CR, LF, or CRLF only
    -  (*BSR_UNICODE)   any Unicode newline sequence
    -
    -These override the default and the options given to the compiling function, but -they can themselves be overridden by options given to a matching function. Note -that these special settings, which are not Perl-compatible, are recognized only -at the very start of a pattern, and that they must be in upper case. If more -than one of them is present, the last one is used. They can be combined with a -change of newline convention; for example, a pattern can start with: -
    -  (*ANY)(*BSR_ANYCRLF)
    -
    -They can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF) or -(*UCP) special sequences. Inside a character class, \R is treated as an -unrecognized escape sequence, and so matches the letter "R" by default, but -causes an error if PCRE_EXTRA is set. -

    -
    -Unicode character properties -
    -

    -When PCRE is built with Unicode character property support, three additional -escape sequences that match characters with specific properties are available. -When in 8-bit non-UTF-8 mode, these sequences are of course limited to testing -characters whose codepoints are less than 256, but they do work in this mode. -The extra escape sequences are: -

    -  \p{xx}   a character with the xx property
    -  \P{xx}   a character without the xx property
    -  \X       a Unicode extended grapheme cluster
    -
    -The property names represented by xx above are limited to the Unicode -script names, the general category properties, "Any", which matches any -character (including newline), and some special PCRE properties (described -in the -next section). -Other Perl properties such as "InMusicalSymbols" are not currently supported by -PCRE. Note that \P{Any} does not match any characters, so always causes a -match failure. -

    -

    -Sets of Unicode characters are defined as belonging to certain scripts. A -character from one of these sets can be matched using a script name. For -example: -

    -  \p{Greek}
    -  \P{Han}
    -
    -Those that are not part of an identified script are lumped together as -"Common". The current list of scripts is: -

    -

    -Arabic, -Armenian, -Avestan, -Balinese, -Bamum, -Bassa_Vah, -Batak, -Bengali, -Bopomofo, -Brahmi, -Braille, -Buginese, -Buhid, -Canadian_Aboriginal, -Carian, -Caucasian_Albanian, -Chakma, -Cham, -Cherokee, -Common, -Coptic, -Cuneiform, -Cypriot, -Cyrillic, -Deseret, -Devanagari, -Duployan, -Egyptian_Hieroglyphs, -Elbasan, -Ethiopic, -Georgian, -Glagolitic, -Gothic, -Grantha, -Greek, -Gujarati, -Gurmukhi, -Han, -Hangul, -Hanunoo, -Hebrew, -Hiragana, -Imperial_Aramaic, -Inherited, -Inscriptional_Pahlavi, -Inscriptional_Parthian, -Javanese, -Kaithi, -Kannada, -Katakana, -Kayah_Li, -Kharoshthi, -Khmer, -Khojki, -Khudawadi, -Lao, -Latin, -Lepcha, -Limbu, -Linear_A, -Linear_B, -Lisu, -Lycian, -Lydian, -Mahajani, -Malayalam, -Mandaic, -Manichaean, -Meetei_Mayek, -Mende_Kikakui, -Meroitic_Cursive, -Meroitic_Hieroglyphs, -Miao, -Modi, -Mongolian, -Mro, -Myanmar, -Nabataean, -New_Tai_Lue, -Nko, -Ogham, -Ol_Chiki, -Old_Italic, -Old_North_Arabian, -Old_Permic, -Old_Persian, -Old_South_Arabian, -Old_Turkic, -Oriya, -Osmanya, -Pahawh_Hmong, -Palmyrene, -Pau_Cin_Hau, -Phags_Pa, -Phoenician, -Psalter_Pahlavi, -Rejang, -Runic, -Samaritan, -Saurashtra, -Sharada, -Shavian, -Siddham, -Sinhala, -Sora_Sompeng, -Sundanese, -Syloti_Nagri, -Syriac, -Tagalog, -Tagbanwa, -Tai_Le, -Tai_Tham, -Tai_Viet, -Takri, -Tamil, -Telugu, -Thaana, -Thai, -Tibetan, -Tifinagh, -Tirhuta, -Ugaritic, -Vai, -Warang_Citi, -Yi. -

    -

    -Each character has exactly one Unicode general category property, specified by -a two-letter abbreviation. For compatibility with Perl, negation can be -specified by including a circumflex between the opening brace and the property -name. For example, \p{^Lu} is the same as \P{Lu}. -

    -

    -If only one letter is specified with \p or \P, it includes all the general -category properties that start with that letter. In this case, in the absence -of negation, the curly brackets in the escape sequence are optional; these two -examples have the same effect: -

    -  \p{L}
    -  \pL
    -
    -The following general category property codes are supported: -
    -  C     Other
    -  Cc    Control
    -  Cf    Format
    -  Cn    Unassigned
    -  Co    Private use
    -  Cs    Surrogate
    -
    -  L     Letter
    -  Ll    Lower case letter
    -  Lm    Modifier letter
    -  Lo    Other letter
    -  Lt    Title case letter
    -  Lu    Upper case letter
    -
    -  M     Mark
    -  Mc    Spacing mark
    -  Me    Enclosing mark
    -  Mn    Non-spacing mark
    -
    -  N     Number
    -  Nd    Decimal number
    -  Nl    Letter number
    -  No    Other number
    -
    -  P     Punctuation
    -  Pc    Connector punctuation
    -  Pd    Dash punctuation
    -  Pe    Close punctuation
    -  Pf    Final punctuation
    -  Pi    Initial punctuation
    -  Po    Other punctuation
    -  Ps    Open punctuation
    -
    -  S     Symbol
    -  Sc    Currency symbol
    -  Sk    Modifier symbol
    -  Sm    Mathematical symbol
    -  So    Other symbol
    -
    -  Z     Separator
    -  Zl    Line separator
    -  Zp    Paragraph separator
    -  Zs    Space separator
    -
    -The special property L& is also supported: it matches a character that has -the Lu, Ll, or Lt property, in other words, a letter that is not classified as -a modifier or "other". -

    -

    -The Cs (Surrogate) property applies only to characters in the range U+D800 to -U+DFFF. Such characters are not valid in Unicode strings and so -cannot be tested by PCRE, unless UTF validity checking has been turned off -(see the discussion of PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK and -PCRE_NO_UTF32_CHECK in the -pcreapi -page). Perl does not support the Cs property. -

    -

    -The long synonyms for property names that Perl supports (such as \p{Letter}) -are not supported by PCRE, nor is it permitted to prefix any of these -properties with "Is". -

    -

    -No character that is in the Unicode table has the Cn (unassigned) property. -Instead, this property is assumed for any code point that is not in the -Unicode table. -

    -

    -Specifying caseless matching does not affect these escape sequences. For -example, \p{Lu} always matches only upper case letters. This is different from -the behaviour of current versions of Perl. -

    -

    -Matching characters by Unicode property is not fast, because PCRE has to do a -multistage table lookup in order to find a character's property. That is why -the traditional escape sequences such as \d and \w do not use Unicode -properties in PCRE by default, though you can make them do so by setting the -PCRE_UCP option or by starting the pattern with (*UCP). -

    -
    -Extended grapheme clusters -
    -

    -The \X escape matches any number of Unicode characters that form an "extended -grapheme cluster", and treats the sequence as an atomic group -(see below). -Up to and including release 8.31, PCRE matched an earlier, simpler definition -that was equivalent to -

    -  (?>\PM\pM*)
    -
    -That is, it matched a character without the "mark" property, followed by zero -or more characters with the "mark" property. Characters with the "mark" -property are typically non-spacing accents that affect the preceding character. -
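The earlier, simpler definition can be sketched without a regex engine. The following Python uses the standard unicodedata module to group each non-mark character with the mark characters that follow it. The function name legacy_clusters is hypothetical, and one behaviour is an assumption of this sketch: a mark at the very start of the text is returned as its own cluster, whereas the pattern above would not match at that position.

```python
import unicodedata


def legacy_clusters(text):
    """Group each character with any following "mark" characters."""
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me all start with "M".
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" followed by U+0301 (combining acute accent), then "a".
example = legacy_clusters("e\u0301a")
```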

    -

    -This simple definition was extended in Unicode to include more complicated -kinds of composite character by giving each character a grapheme breaking -property, and creating rules that use these properties to define the boundaries -of extended grapheme clusters. In releases of PCRE later than 8.31, \X matches -one of these clusters. -

    -

    -\X always matches at least one character. Then it decides whether to add -additional characters according to the following rules for ending a cluster: -

    -

    -1. End at the end of the subject string. -

    -

    -2. Do not end between CR and LF; otherwise end after any control character. -

    -

    -3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters -are of five types: L, V, T, LV, and LVT. An L character may be followed by an -L, V, LV, or LVT character; an LV or V character may be followed by a V or T -character; an LVT or T character may be followed only by a T character. -

    -

    -4. Do not end before extending characters or spacing marks. Characters with -the "mark" property always have the "extend" grapheme breaking property. -

    -

    -5. Do not end after prepend characters. -

    -

    -6. Otherwise, end the cluster. -

    -
    -PCRE's additional properties -
    -

    -As well as the standard Unicode properties described above, PCRE supports four -more that make it possible to convert traditional escape sequences such as \w -and \s to use Unicode properties. PCRE uses these non-standard, non-Perl -properties internally when PCRE_UCP is set. However, they may also be used -explicitly. These properties are: -

    -  Xan   Any alphanumeric character
    -  Xps   Any POSIX space character
    -  Xsp   Any Perl space character
    -  Xwd   Any Perl "word" character
    -
    -Xan matches characters that have either the L (letter) or the N (number) -property. Xps matches the characters tab, linefeed, vertical tab, form feed, or -carriage return, and any other character that has the Z (separator) property. -Xsp is the same as Xps; it used to exclude vertical tab, for Perl -compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd -matches the same characters as Xan, plus underscore. -

    -

    -There is another non-standard property, Xuc, which matches any character that -can be represented by a Universal Character Name in C++ and other programming -languages. These are the characters $, @, ` (grave accent), and all characters -with Unicode code points greater than or equal to U+00A0, except for the -surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are -excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH -where H is a hexadecimal digit. Note that the Xuc property does not match these -sequences but the characters that they represent.) -

    -
    -Resetting the match start -
    -

    -The escape sequence \K causes any previously matched characters not to be -included in the final matched sequence. For example, the pattern: -

    -  foo\Kbar
    -
    -matches "foobar", but reports that it has matched "bar". This feature is -similar to a lookbehind assertion -(described below). -However, in this case, the part of the subject before the real match does not -have to be of fixed length, as lookbehind assertions do. The use of \K does -not interfere with the setting of -captured substrings. -For example, when the pattern -
    -  (foo)\Kbar
    -
    -matches "foobar", the first substring is still set to "foo". -
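Python's re module does not provide \K, so the sketch below only imitates its effect: it matches a prefix followed by the wanted part and reports just the wanted part through a named group. The helper match_after_prefix and the group name "kept" are hypothetical. Unlike a lookbehind, the prefix may be of variable length, which is the property described above.

```python
import re


def match_after_prefix(prefix, rest, subject):
    """Match prefix followed by rest; report only the part matched by rest.

    Returns (start_offset, matched_text), or None if there is no match.
    """
    m = re.search("(?:" + prefix + ")(?P<kept>" + rest + ")", subject)
    if m is None:
        return None
    return (m.start("kept"), m.group("kept"))


fixed = match_after_prefix("foo", "bar", "xfoobar")
variable = match_after_prefix("fo+", "bar", "fooobar")
```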

    -

    -Perl documents that the use of \K within assertions is "not well defined". In -PCRE, \K is acted upon when it occurs inside positive assertions, but is -ignored in negative assertions. Note that when a pattern such as (?=ab\K) -matches, the reported start of the match can be greater than the end of the -match. -

    -
    -Simple assertions -
    -

    -The final use of backslash is for certain simple assertions. An assertion -specifies a condition that has to be met at a particular point in a match, -without consuming any characters from the subject string. The use of -subpatterns for more complicated assertions is described -below. -The backslashed assertions are: -

    -  \b     matches at a word boundary
    -  \B     matches when not at a word boundary
    -  \A     matches at the start of the subject
    -  \Z     matches at the end of the subject
    -          also matches before a newline at the end of the subject
    -  \z     matches only at the end of the subject
    -  \G     matches at the first matching position in the subject
    -
    -Inside a character class, \b has a different meaning; it matches the backspace -character. If any other of these assertions appears in a character class, by -default it matches the corresponding literal character (for example, \B -matches the letter B). However, if the PCRE_EXTRA option is set, an "invalid -escape sequence" error is generated instead. -

    -

    -A word boundary is a position in the subject string where the current character -and the previous character do not both match \w or \W (i.e. one matches -\w and the other matches \W), or the start or end of the string if the -first or last character matches \w, respectively. In a UTF mode, the meanings -of \w and \W can be changed by setting the PCRE_UCP option. When this is -done, it also affects \b and \B. Neither PCRE nor Perl has a separate "start -of word" or "end of word" metasequence. However, whatever follows \b normally -determines which it is. For example, the fragment \ba matches "a" at the start -of a word. -
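The "start of word" reading of \b can be seen in a short sketch. It uses Python's re module rather than PCRE, with the re.ASCII flag so that \w is restricted to ASCII letters, digits, and underscore, which is close to PCRE's default without PCRE_UCP.

```python
import re

# \ba matches "a" only where the preceding position is not a word character.
words_starting_with_a = re.findall(r"\ba\w*", "a banana ahead", re.ASCII)

# The "a" characters inside "banana" are preceded by word characters,
# so \b does not match before them.
inside_word = re.search(r"\ban", "banana", re.ASCII)
```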

    -

    -The \A, \Z, and \z assertions differ from the traditional circumflex and -dollar (described in the next section) in that they only ever match at the very -start and end of the subject string, whatever options are set. Thus, they are -independent of multiline mode. These three assertions are not affected by the -PCRE_NOTBOL or PCRE_NOTEOL options, which affect only the behaviour of the -circumflex and dollar metacharacters. However, if the startoffset -argument of pcre_exec() is non-zero, indicating that matching is to start -at a point other than the beginning of the subject, \A can never match. The -difference between \Z and \z is that \Z matches before a newline at the end -of the string as well as at the very end, whereas \z matches only at the end. -

    -

    -The \G assertion is true only when the current matching position is at the -start point of the match, as specified by the startoffset argument of -pcre_exec(). It differs from \A when the value of startoffset is -non-zero. By calling pcre_exec() multiple times with appropriate -arguments, you can mimic Perl's /g option, and it is in this kind of -implementation where \G can be useful. -
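The repeated-call technique can be sketched as follows. This is Python, not the PCRE C API: Pattern.match(subject, pos) plays the role of a pattern that begins with \G, because it succeeds only if a match starts exactly at pos. The function name match_all_contiguous is hypothetical, and stopping at an empty match is a simplification of this sketch.

```python
import re


def match_all_contiguous(pattern, subject):
    """Collect successive matches, each starting where the last one ended."""
    compiled = re.compile(pattern)
    pos = 0
    found = []
    while pos < len(subject):
        m = compiled.match(subject, pos)  # anchored at pos
        if m is None or m.end() == pos:
            break  # no match at this offset, or an empty match
        found.append(m.group(0))
        pos = m.end()
    return found


pairs = match_all_contiguous(r"\d\d", "1234ab56")
```

The scan stops at "ab": "56" is not reported because it does not begin at the offset where the previous match ended.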

    -

    -Note, however, that PCRE's interpretation of \G, as the start of the current -match, is subtly different from Perl's, which defines it as the end of the -previous match. In Perl, these can be different when the previously matched -string was empty. Because PCRE does just one match at a time, it cannot -reproduce this behaviour. -

    -

    -If all the alternatives of a pattern begin with \G, the expression is anchored -to the starting match position, and the "anchored" flag is set in the compiled -regular expression. -

    -
    CIRCUMFLEX AND DOLLAR
    -

    -The circumflex and dollar metacharacters are zero-width assertions. That is, -they test for a particular condition being true without consuming any -characters from the subject string. -

    -

    -Outside a character class, in the default matching mode, the circumflex -character is an assertion that is true only if the current matching point is at -the start of the subject string. If the startoffset argument of -pcre_exec() is non-zero, circumflex can never match if the PCRE_MULTILINE -option is unset. Inside a character class, circumflex has an entirely different -meaning -(see below). -

    -

    -Circumflex need not be the first character of the pattern if a number of -alternatives are involved, but it should be the first thing in each alternative -in which it appears if the pattern is ever to match that branch. If all -possible alternatives start with a circumflex, that is, if the pattern is -constrained to match only at the start of the subject, it is said to be an -"anchored" pattern. (There are also other constructs that can cause a pattern -to be anchored.) -

    -

    -The dollar character is an assertion that is true only if the current matching -point is at the end of the subject string, or immediately before a newline at -the end of the string (by default). Note, however, that it does not actually -match the newline. Dollar need not be the last character of the pattern if a -number of alternatives are involved, but it should be the last item in any -branch in which it appears. Dollar has no special meaning in a character class. -

    -

    -The meaning of dollar can be changed so that it matches only at the very end of -the string, by setting the PCRE_DOLLAR_ENDONLY option at compile time. This -does not affect the \Z assertion. -

    -

    -The meanings of the circumflex and dollar characters are changed if the -PCRE_MULTILINE option is set. When this is the case, a circumflex matches -immediately after internal newlines as well as at the start of the subject -string. It does not match after a newline that ends the string. A dollar -matches before any newlines in the string, as well as at the very end, when -PCRE_MULTILINE is set. When newline is specified as the two-character -sequence CRLF, isolated CR and LF characters do not indicate newlines. -

    -

    -For example, the pattern /^abc$/ matches the subject string "def\nabc" (where -\n represents a newline) in multiline mode, but not otherwise. Consequently, -patterns that are anchored in single line mode because all branches start with -^ are not anchored in multiline mode, and a match for circumflex is possible -when the startoffset argument of pcre_exec() is non-zero. The -PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set. -
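The example can be reproduced with Python's re module, whose re.MULTILINE flag behaves like PCRE_MULTILINE for this case. This is an illustration in a different engine, not a PCRE test.

```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the very start of the subject.
single = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches immediately after the internal newline.
multi = re.search(r"^abc$", subject, re.MULTILINE)

# In default mode, $ still matches before a newline that ends the subject.
before_final_newline = re.search(r"abc$", "abc\n")
```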

    -

    -Note that the sequences \A, \Z, and \z can be used to match the start and -end of the subject in both modes, and if all branches of a pattern start with -\A it is always anchored, whether or not PCRE_MULTILINE is set. -

    -
    FULL STOP (PERIOD, DOT) AND \N
    -

    -Outside a character class, a dot in the pattern matches any one character in -the subject string except (by default) a character that signifies the end of a -line. -

    -

    -When a line ending is defined as a single character, dot never matches that -character; when the two-character sequence CRLF is used, dot does not match CR -if it is immediately followed by LF, but otherwise it matches all characters -(including isolated CRs and LFs). When any Unicode line endings are being -recognized, dot does not match CR or LF or any of the other line ending -characters. -

    -

    -The behaviour of dot with regard to newlines can be changed. If the PCRE_DOTALL -option is set, a dot matches any one character, without exception. If the -two-character sequence CRLF is present in the subject string, it takes two dots -to match it. -
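A short sketch of these rules, again using Python's re module as a stand-in: re.DOTALL corresponds to PCRE_DOTALL here. Note that in Python the default dot excludes only LF, so this does not model PCRE's configurable newline conventions.

```python
import re

# Without DOTALL, dot does not match the newline.
default_dot = re.search(r"a.b", "a\nb")

# With DOTALL, dot matches any one character.
dotall = re.search(r"a.b", "a\nb", re.DOTALL)

# CRLF is two characters, so it takes two dots to match it.
crlf_one_dot = re.search(r"a.b", "a\r\nb", re.DOTALL)
crlf_two_dots = re.search(r"a..b", "a\r\nb", re.DOTALL)
```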

    -

    -The handling of dot is entirely independent of the handling of circumflex and -dollar, the only relationship being that they both involve newlines. Dot has no -special meaning in a character class. -

    -

    -The escape sequence \N behaves like a dot, except that it is not affected by -the PCRE_DOTALL option. In other words, it matches any character except one -that signifies the end of a line. Perl also uses \N to match characters by -name; PCRE does not support this. -

    -
    MATCHING A SINGLE DATA UNIT
    -

    -Outside a character class, the escape sequence \C matches any one data unit, -whether or not a UTF mode is set. In the 8-bit library, one data unit is one -byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is -a 32-bit unit. Unlike a dot, \C always -matches line-ending characters. The feature is provided in Perl in order to -match individual bytes in UTF-8 mode, but it is unclear how it can usefully be -used. Because \C breaks up characters into individual data units, matching one -unit with \C in a UTF mode means that the rest of the string may start with a -malformed UTF character. This has undefined results, because PCRE assumes that -it is dealing with valid UTF strings (and by default it checks this at the -start of processing unless the PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or -PCRE_NO_UTF32_CHECK option is used). -

    -

    -PCRE does not allow \C to appear in lookbehind assertions -(described below) -in a UTF mode, because this would make it impossible to calculate the length of -the lookbehind. -

    -

    -In general, the \C escape sequence is best avoided. However, one -way of using it that avoids the problem of malformed UTF characters is to use a -lookahead to check the length of the next character, as in this pattern, which -could be used with a UTF-8 string (ignore white space and line breaks): -

    -  (?| (?=[\x00-\x7f])(\C) |
    -      (?=[\x80-\x{7ff}])(\C)(\C) |
    -      (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
    -      (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
    -
    -A group that starts with (?| resets the capturing parentheses numbers in each -alternative (see -"Duplicate Subpattern Numbers" -below). The assertions at the start of each branch check the next UTF-8 -character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The -character's individual bytes are then captured by the appropriate number of -groups. -

    -
    SQUARE BRACKETS AND CHARACTER CLASSES
    -

    -An opening square bracket introduces a character class, terminated by a closing -square bracket. A closing square bracket on its own is not special by default. -However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square -bracket causes a compile-time error. If a closing square bracket is required as -a member of the class, it should be the first data character in the class -(after an initial circumflex, if present) or escaped with a backslash. -

    -

    -A character class matches a single character in the subject. In a UTF mode, the -character may be more than one data unit long. A matched character must be in -the set of characters defined by the class, unless the first character in the -class definition is a circumflex, in which case the subject character must not -be in the set defined by the class. If a circumflex is actually required as a -member of the class, ensure it is not the first character, or escape it with a -backslash. -

    -

    -For example, the character class [aeiou] matches any lower case vowel, while -[^aeiou] matches any character that is not a lower case vowel. Note that a -circumflex is just a convenient notation for specifying the characters that -are in the class by enumerating those that are not. A class that starts with a -circumflex is not an assertion; it still consumes a character from the subject -string, and therefore it fails if the current pointer is at the end of the -string. -

    -

    -In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255 (0xffff) -can be included in a class as a literal string of data units, or by using the -\x{ escaping mechanism. -

    -

    -When caseless matching is set, any letters in a class represent both their -upper case and lower case versions, so for example, a caseless [aeiou] matches -"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a -caseful version would. In a UTF mode, PCRE always understands the concept of -case for characters whose values are less than 128, so caseless matching is -always possible. For characters with higher values, the concept of case is -supported if PCRE is compiled with Unicode property support, but not otherwise. -If you want to use caseless matching in a UTF mode for characters 128 and -above, you must ensure that PCRE is compiled with Unicode property support as -well as with UTF support. -

    -

    -Characters that might indicate line breaks are never treated in any special way -when matching character classes, whatever line-ending sequence is in use, and -whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class -such as [^a] always matches one of these characters. -

    -

    -The minus (hyphen) character can be used to specify a range of characters in a -character class. For example, [d-m] matches any letter between d and m, -inclusive. If a minus character is required in a class, it must be escaped with -a backslash or appear in a position where it cannot be interpreted as -indicating a range, typically as the first or last character in the class, or -immediately after a range. For example, [b-d-z] matches letters in the range b -to d, a hyphen character, or z. -

    -

    -It is not possible to have the literal character "]" as the end character of a -range. A pattern such as [W-]46] is interpreted as a class of two characters -("W" and "-") followed by a literal string "46]", so it would match "W46]" or -"-46]". However, if the "]" is escaped with a backslash it is interpreted as -the end of range, so [W-\]46] is interpreted as a class containing a range -followed by two other characters. The octal or hexadecimal representation of -"]" can also be used to end a range. -

    -

    -An error is generated if a POSIX character class (see below) or an escape -sequence other than one that defines a single character appears at a point -where a range ending character is expected. For example, [z-\xff] is valid, -but [A-\d] and [A-[:digit:]] are not. -

    -

    -Ranges operate in the collating sequence of character values. They can also be -used for characters specified numerically, for example [\000-\037]. Ranges -can include any characters that are valid for the current mode. -

    -

    -If a range that includes letters is used when caseless matching is set, it -matches the letters in either case. For example, [W-c] is equivalent to -[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character -tables for a French locale are in use, [\xc8-\xcb] matches accented E -characters in both cases. In UTF modes, PCRE supports the concept of case for -characters with values greater than 127 only when it is compiled with Unicode -property support. -

    -

    -The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, -\V, \w, and \W may appear in a character class, and add the characters that -they match to the class. For example, [\dABCDEF] matches any hexadecimal -digit. In UTF modes, the PCRE_UCP option affects the meanings of \d, \s, \w -and their upper case partners, just as it does when they appear outside a -character class, as described in the section entitled -"Generic character types" -above. The escape sequence \b has a different meaning inside a character -class; it matches the backspace character. The sequences \B, \N, \R, and \X -are not special inside a character class. Like any other unrecognized escape -sequences, they are treated as the literal characters "B", "N", "R", and "X" by -default, but cause an error if the PCRE_EXTRA option is set. -

    -

    -A circumflex can conveniently be used with the upper case character types to -specify a more restricted set of characters than the matching lower case type. -For example, the class [^\W_] matches any letter or digit, but not underscore, -whereas [\w] includes underscore. A positive character class should be read as -"something OR something OR ..." and a negative class as "NOT something AND NOT -something AND NOT ...". -
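
As a rough illustration of the [^\W_] idiom described above, here is a minimal sketch. It uses Python's standard `re` module as a stand-in for PCRE (an assumption: the two engines are different implementations, but they agree on this construct for ASCII input). The variable names are hypothetical.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).
# [^\W_] reads as "NOT a non-word character AND NOT underscore",
# which leaves letters and digits only.
LETTER_OR_DIGIT = re.compile(r"[^\W_]")
# [\w] is a positive class and therefore still includes underscore.
WORD_CHAR = re.compile(r"[\w]")

sample = "a_1-"
letters_digits = LETTER_OR_DIGIT.findall(sample)
word_chars = WORD_CHAR.findall(sample)

print(letters_digits)  # letters and digits, no underscore
print(word_chars)      # letters, digits and underscore
```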

    -

    -The only metacharacters that are recognized in character classes are backslash, -hyphen (only where it can be interpreted as specifying a range), circumflex -(only at the start), opening square bracket (only when it can be interpreted as -introducing a POSIX class name, or for a special compatibility feature - see -the next two sections), and the terminating closing square bracket. However, -escaping other non-alphanumeric characters does no harm. -

    -
    POSIX CHARACTER CLASSES
    -

    -Perl supports the POSIX notation for character classes. This uses names -enclosed by [: and :] within the enclosing square brackets. PCRE also supports -this notation. For example, -

    -  [01[:alpha:]%]
    -
    -matches "0", "1", any alphabetic character, or "%". The supported class names -are: -
    -  alnum    letters and digits
    -  alpha    letters
    -  ascii    character codes 0 - 127
    -  blank    space or tab only
    -  cntrl    control characters
    -  digit    decimal digits (same as \d)
    -  graph    printing characters, excluding space
    -  lower    lower case letters
    -  print    printing characters, including space
    -  punct    printing characters, excluding letters and digits and space
    -  space    white space (the same as \s from PCRE 8.34)
    -  upper    upper case letters
    -  word     "word" characters (same as \w)
    -  xdigit   hexadecimal digits
    -
    -The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), -and space (32). If locale-specific matching is taking place, the list of space -characters may be different; there may be fewer or more of them. "Space" used -to be different to \s, which did not include VT, for Perl compatibility. -However, Perl changed at release 5.18, and PCRE followed at release 8.34. -"Space" and \s now match the same set of characters. -

    -

    -The name "word" is a Perl extension, and "blank" is a GNU extension from Perl -5.8. Another Perl extension is negation, which is indicated by a ^ character -after the colon. For example, -

    -  [12[:^digit:]]
    -
    -matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX -syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not -supported, and an error is given if they are encountered. -

    -

    -By default, characters with values greater than 127 do not match any of the -POSIX character classes. However, if the PCRE_UCP option is passed to -pcre_compile(), some of the classes are changed so that Unicode character -properties are used. This is achieved by replacing certain POSIX classes by -other sequences, as follows: -

    -  [:alnum:]  becomes  \p{Xan}
    -  [:alpha:]  becomes  \p{L}
    -  [:blank:]  becomes  \h
    -  [:digit:]  becomes  \p{Nd}
    -  [:lower:]  becomes  \p{Ll}
    -  [:space:]  becomes  \p{Xps}
    -  [:upper:]  becomes  \p{Lu}
    -  [:word:]   becomes  \p{Xwd}
    -
    -Negated versions, such as [:^alpha:] use \P instead of \p. Three other POSIX -classes are handled specially in UCP mode: -

    -

    -[:graph:] -This matches characters that have glyphs that mark the page when printed. In -Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf -properties, except for: -

    -  U+061C           Arabic Letter Mark
    -  U+180E           Mongolian Vowel Separator
    -  U+2066 - U+2069  Various "isolate"s
    -
    -
    -

    -

    -[:print:] -This matches the same characters as [:graph:] plus space characters that are -not controls, that is, characters with the Zs property. -

    -

    -[:punct:] -This matches all characters that have the Unicode P (punctuation) property, -plus those characters whose code points are less than 128 that have the S -(Symbol) property. -

    -

    -The other POSIX classes are unchanged, and match only characters with code -points less than 128. -

    -
    COMPATIBILITY FEATURE FOR WORD BOUNDARIES
    -

    -In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly -syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of -word". PCRE treats these items as follows: -

    -  [[:<:]]  is converted to  \b(?=\w)
    -  [[:>:]]  is converted to  \b(?<=\w)
    -
    -Only these exact character sequences are recognized. A sequence such as -[a[:<:]b] provokes an error for an unrecognized POSIX class name. This support is -not compatible with Perl. It is provided to help migrations from other -environments, and is best not used in any new patterns. Note that \b matches -at the start and the end of a word (see -"Simple assertions" -above), and in a Perl-style pattern the preceding or following character -normally shows which is wanted, without the need for the assertions that are -used above in order to give exactly the POSIX behaviour. -

    -
    VERTICAL BAR
    -

    -Vertical bar characters are used to separate alternative patterns. For example, -the pattern -

    -  gilbert|sullivan
    -
    -matches either "gilbert" or "sullivan". Any number of alternatives may appear, -and an empty alternative is permitted (matching the empty string). The matching -process tries each alternative in turn, from left to right, and the first one -that succeeds is used. If the alternatives are within a subpattern -(defined below), -"succeeds" means matching the rest of the main pattern as well as the -alternative in the subpattern. -

    -
    INTERNAL OPTION SETTING
    -

    -The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and -PCRE_EXTENDED options (which are Perl-compatible) can be changed from within -the pattern by a sequence of Perl option letters enclosed between "(?" and ")". -The option letters are -

    -  i  for PCRE_CASELESS
    -  m  for PCRE_MULTILINE
    -  s  for PCRE_DOTALL
    -  x  for PCRE_EXTENDED
    -
    -For example, (?im) sets caseless, multiline matching. It is also possible to -unset these options by preceding the letter with a hyphen, and a combined -setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and -PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also -permitted. If a letter appears both before and after the hyphen, the option is -unset. -

    -

    -The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be -changed in the same way as the Perl-compatible options by using the characters -J, U and X respectively. -

    -

    -When one of these option changes occurs at top level (that is, not inside -subpattern parentheses), the change applies to the remainder of the pattern -that follows. An option change within a subpattern (see below for a description -of subpatterns) affects only that part of the subpattern that follows it, so -

    -  (a(?i)b)c
    -
    -matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used). -By this means, options can be made to have different settings in different -parts of the pattern. Any changes made in one alternative do carry on -into subsequent branches within the same subpattern. For example, -
    -  (a(?i)b|c)
    -
    -matches "ab", "aB", "c", and "C", even though when matching "C" the first -branch is abandoned before the option setting. This is because the effects of -option settings happen at compile time. There would be some very weird -behaviour otherwise. -

    -

    -Note: There are other PCRE-specific options that can be set by the -application when the compiling or matching functions are called. In some cases -the pattern can contain special leading sequences such as (*CRLF) to override -what the application has set or what has been defaulted. Details are given in -the section entitled -"Newline sequences" -above. There are also the (*UTF8), (*UTF16), (*UTF32), and (*UCP) leading -sequences that can be used to set UTF and Unicode property modes; they are -equivalent to setting the PCRE_UTF8, PCRE_UTF16, PCRE_UTF32 and the PCRE_UCP -options, respectively. The (*UTF) sequence is a generic version that can be -used with any of the libraries. However, the application can set the -PCRE_NEVER_UTF option, which locks out the use of the (*UTF) sequences. -

    -
    SUBPATTERNS
    -

    -Subpatterns are delimited by parentheses (round brackets), which can be nested. -Turning part of a pattern into a subpattern does two things: -
    -
    -1. It localizes a set of alternatives. For example, the pattern -

    -  cat(aract|erpillar|)
    -
    -matches "cataract", "caterpillar", or "cat". Without the parentheses, it would -match "cataract", "erpillar" or an empty string. -
    -
    -2. It sets up the subpattern as a capturing subpattern. This means that, when -the whole pattern matches, that portion of the subject string that matched the -subpattern is passed back to the caller via the ovector argument of the -matching function. (This applies only to the traditional matching functions; -the DFA matching functions do not support capturing.) -

    -

    -Opening parentheses are counted from left to right (starting from 1) to obtain -numbers for the capturing subpatterns. For example, if the string "the red -king" is matched against the pattern -

    -  the ((red|white) (king|queen))
    -
    -the captured substrings are "red king", "red", and "king", and are numbered 1, -2, and 3, respectively. -

    -

    -The fact that plain parentheses fulfil two functions is not always helpful. -There are often times when a grouping subpattern is required without a -capturing requirement. If an opening parenthesis is followed by a question mark -and a colon, the subpattern does not do any capturing, and is not counted when -computing the number of any subsequent capturing subpatterns. For example, if -the string "the white queen" is matched against the pattern -

    -  the ((?:red|white) (king|queen))
    -
    -the captured substrings are "white queen" and "queen", and are numbered 1 and -2. The maximum number of capturing subpatterns is 65535. -
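
The numbering rules for capturing and non-capturing parentheses can be checked with a short sketch. It uses Python's standard `re` module rather than PCRE itself (an assumption made for the sake of a self-contained example; both engines count opening parentheses from left to right and skip (?: groups). The variable names are hypothetical.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).

# Three capturing groups, numbered by their opening parentheses.
capturing = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")

# (?:...) groups without capturing, so only two numbered groups remain.
non_capturing = re.fullmatch(
    r"the ((?:red|white) (king|queen))", "the white queen"
)

print(capturing.groups())      # groups 1, 2 and 3
print(non_capturing.groups())  # groups 1 and 2
```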

    -

    -As a convenient shorthand, if any option settings are required at the start of -a non-capturing subpattern, the option letters may appear between the "?" and -the ":". Thus the two patterns -

    -  (?i:saturday|sunday)
    -  (?:(?i)saturday|sunday)
    -
    -match exactly the same set of strings. Because alternative branches are tried -from left to right, and options are not reset until the end of the subpattern -is reached, an option setting in one branch does affect subsequent branches, so -the above patterns match "SUNDAY" as well as "Saturday". -

    -
    DUPLICATE SUBPATTERN NUMBERS
    -

    -Perl 5.10 introduced a feature whereby each alternative in a subpattern uses -the same numbers for its capturing parentheses. Such a subpattern starts with -(?| and is itself a non-capturing subpattern. For example, consider this -pattern: -

    -  (?|(Sat)ur|(Sun))day
    -
    -Because the two alternatives are inside a (?| group, both sets of capturing -parentheses are numbered one. Thus, when the pattern matches, you can look -at captured substring number one, whichever alternative matched. This construct -is useful when you want to capture part, but not all, of one of a number of -alternatives. Inside a (?| group, parentheses are numbered as usual, but the -number is reset at the start of each branch. The numbers of any capturing -parentheses that follow the subpattern start after the highest number used in -any branch. The following example is taken from the Perl documentation. The -numbers underneath show in which buffer the captured content will be stored. -
    -  # before  ---------------branch-reset----------- after
    -  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
    -  # 1            2         2  3        2     3     4
    -
    -A back reference to a numbered subpattern uses the most recent value that is -set for that number by any subpattern. The following pattern matches "abcabc" -or "defdef": -
    -  /(?|(abc)|(def))\1/
    -
    -In contrast, a subroutine call to a numbered subpattern always refers to the -first one in the pattern with the given number. The following pattern matches -"abcabc" or "defabc": -
    -  /(?|(abc)|(def))(?1)/
    -
    -If a -condition test -for a subpattern's having matched refers to a non-unique number, the test is -true if any of the subpatterns of that number have matched. -

    -

    -An alternative approach to using this "branch reset" feature is to use -duplicate named subpatterns, as described in the next section. -

    -
    NAMED SUBPATTERNS
    -

    -Identifying capturing parentheses by number is simple, but it can be very hard -to keep track of the numbers in complicated regular expressions. Furthermore, -if an expression is modified, the numbers may change. To help with this -difficulty, PCRE supports the naming of subpatterns. This feature was not -added to Perl until release 5.10. Python had the feature earlier, and PCRE -introduced it at release 4.0, using the Python syntax. PCRE now supports both -the Perl and the Python syntax. Perl allows identically numbered subpatterns to -have different names, but PCRE does not. -

    -

    -In PCRE, a subpattern can be named in one of three ways: (?<name>...) or -(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing -parentheses from other parts of the pattern, such as -back references, -recursion, -and -conditions, -can be made by name as well as by number. -

    -

    -Names consist of up to 32 alphanumeric characters and underscores, but must -start with a non-digit. Named capturing parentheses are still allocated numbers -as well as names, exactly as if the names were not present. The PCRE API -provides function calls for extracting the name-to-number translation table -from a compiled pattern. There is also a convenience function for extracting a -captured substring by name. -
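
The point that named parentheses still receive numbers can be sketched as follows. The example uses Python's standard `re` module, whose (?P<name>...) syntax is the Python form that PCRE also accepts; the `groupindex` attribute is Python's own name-to-number table and is not part of the PCRE API. The pattern and the group names `year` and `month` are hypothetical.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")

m = DATE.fullmatch("2014-04")

# A named group can be read by name or by the number it was allocated.
by_name = m.group("year")
by_number = m.group(1)

# Python exposes the name-to-number mapping on the compiled pattern.
name_table = dict(DATE.groupindex)

print(by_name, by_number, name_table)
```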

    -

    -By default, a name must be unique within a pattern, but it is possible to relax -this constraint by setting the PCRE_DUPNAMES option at compile time. (Duplicate -names are also always permitted for subpatterns with the same number, set up as -described in the previous section.) Duplicate names can be useful for patterns -where only one instance of the named parentheses can match. Suppose you want to -match the name of a weekday, either as a 3-letter abbreviation or as the full -name, and in both cases you want to extract the abbreviation. This pattern -(ignoring the line breaks) does the job: -

    -  (?<DN>Mon|Fri|Sun)(?:day)?|
    -  (?<DN>Tue)(?:sday)?|
    -  (?<DN>Wed)(?:nesday)?|
    -  (?<DN>Thu)(?:rsday)?|
    -  (?<DN>Sat)(?:urday)?
    -
    -There are five capturing substrings, but only one is ever set after a match. -(An alternative way of solving this problem is to use a "branch reset" -subpattern, as described in the previous section.) -

    -

    -The convenience function for extracting the data by name returns the substring -for the first (and in this example, the only) subpattern of that name that -matched. This saves searching to find which numbered subpattern it was. -

    -

    -If you make a back reference to a non-unique named subpattern from elsewhere in -the pattern, the subpatterns to which the name refers are checked in the order -in which they appear in the overall pattern. The first one that is set is used -for the reference. For example, this pattern matches both "foofoo" and -"barbar" but not "foobar" or "barfoo": -

    -  (?:(?<n>foo)|(?<n>bar))\k<n>
    -
    -
    -

    -

    -If you make a subroutine call to a non-unique named subpattern, the one that -corresponds to the first occurrence of the name is used. In the absence of -duplicate numbers (see the previous section) this is the one with the lowest -number. -

    -

    -If you use a named reference in a condition -test (see the -section about conditions -below), either to check whether a subpattern has matched, or to check for -recursion, all subpatterns with the same name are tested. If the condition is -true for any one of them, the overall condition is true. This is the same -behaviour as testing by number. For further details of the interfaces for -handling named subpatterns, see the -pcreapi -documentation. -

    -

    -Warning: You cannot use different names to distinguish between two -subpatterns with the same number because PCRE uses only the numbers when -matching. For this reason, an error is given at compile time if different names -are given to subpatterns with the same number. However, you can always give the -same name to subpatterns with the same number, even when PCRE_DUPNAMES is not -set. -

    -
    REPETITION
    -

    -Repetition is specified by quantifiers, which can follow any of the following -items: -

    -  a literal data character
    -  the dot metacharacter
    -  the \C escape sequence
    -  the \X escape sequence
    -  the \R escape sequence
    -  an escape such as \d or \pL that matches a single character
    -  a character class
    -  a back reference (see next section)
    -  a parenthesized subpattern (including assertions)
    -  a subroutine call to a subpattern (recursive or otherwise)
    -
    -The general repetition quantifier specifies a minimum and maximum number of -permitted matches, by giving the two numbers in curly brackets (braces), -separated by a comma. The numbers must be less than 65536, and the first must -be less than or equal to the second. For example: -
    -  z{2,4}
    -
    -matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special -character. If the second number is omitted, but the comma is present, there is -no upper limit; if the second number and the comma are both omitted, the -quantifier specifies an exact number of required matches. Thus -
    -  [aeiou]{3,}
    -
    -matches at least 3 successive vowels, but may match many more, while -
    -  \d{8}
    -
    -matches exactly 8 digits. An opening curly bracket that appears in a position -where a quantifier is not allowed, or one that does not match the syntax of a -quantifier, is taken as a literal character. For example, {,6} is not a -quantifier, but a literal string of four characters. -
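
The three quantifier forms above can be exercised with a minimal sketch. It uses Python's standard `re` module as a stand-in for PCRE (an assumption; the forms {n,m}, {n,} and {n} behave the same way in both, although the treatment of {,6} differs between the engines, so that case is deliberately left out). The sample strings are hypothetical.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).

# {2,4}: between two and four repetitions.
two_to_four = [bool(re.fullmatch(r"z{2,4}", s)) for s in ("z", "zz", "zzzz", "zzzzz")]

# {3,}: at least three, with no upper limit (greedy, so it takes the whole run).
vowel_run = re.search(r"[aeiou]{3,}", "queueing").group()

# {8}: exactly eight.
eight_digits = bool(re.fullmatch(r"\d{8}", "20140401"))
seven_digits = bool(re.fullmatch(r"\d{8}", "2014040"))

print(two_to_four, vowel_run, eight_digits, seven_digits)
```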

    -

    -In UTF modes, quantifiers apply to characters rather than to individual data -units. Thus, for example, \x{100}{2} matches two characters, each of -which is represented by a two-byte sequence in a UTF-8 string. Similarly, -\X{3} matches three Unicode extended grapheme clusters, each of which may be -several data units long (and they may be of different lengths). -

    -

    -The quantifier {0} is permitted, causing the expression to behave as if the -previous item and the quantifier were not present. This may be useful for -subpatterns that are referenced as -subroutines -from elsewhere in the pattern (but see also the section entitled -"Defining subpatterns for use by reference only" -below). Items other than subpatterns that have a {0} quantifier are omitted -from the compiled pattern. -

    -

    -For convenience, the three most common quantifiers have single-character -abbreviations: -

    -  *    is equivalent to {0,}
    -  +    is equivalent to {1,}
    -  ?    is equivalent to {0,1}
    -
    -It is possible to construct infinite loops by following a subpattern that can -match no characters with a quantifier that has no upper limit, for example: -
    -  (a?)*
    -
    -Earlier versions of Perl and PCRE used to give an error at compile time for -such patterns. However, because there are cases where this can be useful, such -patterns are now accepted, but if any repetition of the subpattern does in fact -match no characters, the loop is forcibly broken. -

    -

    -By default, the quantifiers are "greedy", that is, they match as much as -possible (up to the maximum number of permitted times), without causing the -rest of the pattern to fail. The classic example of where this gives problems -is in trying to match comments in C programs. These appear between /* and */ -and within the comment, individual * and / characters may appear. An attempt to -match C comments by applying the pattern -

    -  /\*.*\*/
    -
    -to the string -
    -  /* first comment */  not comment  /* second comment */
    -
    -fails, because it matches the entire string owing to the greediness of the .* -item. -

    -

    -However, if a quantifier is followed by a question mark, it ceases to be -greedy, and instead matches the minimum number of times possible, so the -pattern -

    -  /\*.*?\*/
    -
    -does the right thing with the C comments. The meaning of the various -quantifiers is not otherwise changed, just the preferred number of matches. -Do not confuse this use of question mark with its use as a quantifier in its -own right. Because it has two uses, it can sometimes appear doubled, as in -
    -  \d??\d
    -
    -which matches one digit by preference, but can match two if that is the only -way the rest of the pattern matches. -
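
The difference between greedy and minimizing quantifiers in the C comment example can be reproduced with a small sketch. It uses Python's standard `re` module in place of PCRE (an assumption; the greedy and lazy semantics shown here are the same in both). The variable names are hypothetical.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).
subject = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs to the last */ and swallows the whole string.
greedy = re.findall(r"/\*.*\*/", subject)

# Lazy .*? stops at the first */ that lets the pattern succeed.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d prefers one digit, but takes two when the rest of the match needs it.
one_digit = re.match(r"\d??\d", "12").group()
two_digits = re.fullmatch(r"\d??\d", "12").group()

print(greedy)
print(lazy)
print(one_digit, two_digits)
```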

    -

    -If the PCRE_UNGREEDY option is set (an option that is not available in Perl), -the quantifiers are not greedy by default, but individual ones can be made -greedy by following them with a question mark. In other words, it inverts the -default behaviour. -

    -

    -When a parenthesized subpattern is quantified with a minimum repeat count that -is greater than 1 or with a limited maximum, more memory is required for the -compiled pattern, in proportion to the size of the minimum or maximum. -

    -

    -If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent -to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is -implicitly anchored, because whatever follows will be tried against every -character position in the subject string, so there is no point in retrying the -overall match at any position after the first. PCRE normally treats such a -pattern as though it were preceded by \A. -

    -

    -In cases where it is known that the subject string contains no newlines, it is -worth setting PCRE_DOTALL in order to obtain this optimization, or -alternatively using ^ to indicate anchoring explicitly. -

    -

    -However, there are some cases where the optimization cannot be used. When .* -is inside capturing parentheses that are the subject of a back reference -elsewhere in the pattern, a match at the start may fail where a later one -succeeds. Consider, for example: -

    -  (.*)abc\1
    -
    -If the subject is "xyz123abc123" the match point is the fourth character. For -this reason, such a pattern is not implicitly anchored. -
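
The match point in this example can be confirmed with a short sketch. It uses Python's standard `re` module as a stand-in for PCRE (an assumption; it shows only where the match starts, not PCRE's internal anchoring optimization). Offsets in Python are zero-based, so the fourth character is offset 3.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).
m = re.search(r"(.*)abc\1", "xyz123abc123")

start = m.start()      # zero-based offset of the match
captured = m.group(1)  # what (.*) held when the back reference succeeded
matched = m.group()

print(start, captured, matched)
```

Attempts at offsets 0, 1 and 2 fail because the text captured by (.*) would have to reappear after "abc", which is why such a pattern cannot be treated as anchored.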

    -

    -Another case where implicit anchoring is not applied is when the leading .* is -inside an atomic group. Once again, a match at the start may fail where a later -one succeeds. Consider this pattern: -

    -  (?>.*?a)b
    -
    -It matches "ab" in the subject "aab". The use of the backtracking control verbs -(*PRUNE) and (*SKIP) also disables this optimization. -

    -

    -When a capturing subpattern is repeated, the value captured is the substring -that matched the final iteration. For example, after -

    -  (tweedle[dume]{3}\s*)+
    -
    -has matched "tweedledum tweedledee" the value of the captured substring is -"tweedledee". However, if there are nested capturing subpatterns, the -corresponding captured values may have been set in previous iterations. For -example, after -
    -  /(a|(b))+/
    -
    -matches "aba" the value of the second captured substring is "b". -
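
Both statements about repeated capturing subpatterns can be checked with a minimal sketch. It uses Python's standard `re` module rather than PCRE (an assumption; for these two patterns Python retains captured values across iterations in the same way as described above). The variable names are hypothetical.

```python
import re

# Stand-in for PCRE: Python's re module (assumption, not the PCRE API).

# A repeated group reports what its final iteration matched.
outer = re.fullmatch(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
last_iteration = outer.group(1)

# A nested group keeps the value set in an earlier iteration when the
# final iteration does not pass through it.
nested = re.fullmatch(r"(a|(b))+", "aba")
first_group = nested.group(1)   # from the final iteration
second_group = nested.group(2)  # from the middle iteration

print(last_iteration, first_group, second_group)
```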

    -
    ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
    -

    -With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") -repetition, failure of what follows normally causes the repeated item to be -re-evaluated to see if a different number of repeats allows the rest of the -pattern to match. Sometimes it is useful to prevent this, either to change the -nature of the match, or to cause it to fail earlier than it otherwise might, when -the author of the pattern knows there is no point in carrying on. -

    -

    -Consider, for example, the pattern \d+foo when applied to the subject line -

    -  123456bar
    -
    -After matching all 6 digits and then failing to match "foo", the normal -action of the matcher is to try again with only 5 digits matching the \d+ -item, and then with 4, and so on, before ultimately failing. "Atomic grouping" -(a term taken from Jeffrey Friedl's book) provides the means for specifying -that once a subpattern has matched, it is not to be re-evaluated in this way. -

    -

    -If we use atomic grouping for the previous example, the matcher gives up -immediately on failing to match "foo" the first time. The notation is a kind of -special parenthesis, starting with (?> as in this example: -

    -  (?>\d+)foo
    -
    -This kind of parenthesis "locks up" the part of the pattern it contains once -it has matched, and a failure further into the pattern is prevented from -backtracking into it. Backtracking past it to previous items, however, works as -normal. -

    -

    -An alternative description is that a subpattern of this type matches the string -of characters that an identical standalone pattern would match, if anchored at -the current point in the subject string. -

    -

    -Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as -the above example can be thought of as a maximizing repeat that must swallow -everything it can. So, while both \d+ and \d+? are prepared to adjust the -number of digits they match in order to make the rest of the pattern match, -(?>\d+) can only match an entire sequence of digits. -

    -

    -Atomic groups in general can of course contain arbitrarily complicated -subpatterns, and can be nested. However, when the subpattern for an atomic -group is just a single repeated item, as in the example above, a simpler -notation, called a "possessive quantifier" can be used. This consists of an -additional + character following a quantifier. Using this notation, the -previous example can be rewritten as -

    -  \d++foo
    -
    -Note that a possessive quantifier can be used with an entire group, for -example: -
    -  (abc|xyz){2,3}+
    -
    -Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY -option is ignored. They are a convenient notation for the simpler forms of -atomic group. However, there is no difference in the meaning of a possessive -quantifier and the equivalent atomic group, though there may be a performance -difference; possessive quantifiers should be slightly faster. -

    -

    -The possessive quantifier syntax is an extension to the Perl 5.8 syntax. -Jeffrey Friedl originated the idea (and the name) in the first edition of his -book. Mike McCloskey liked it, so implemented it when he built Sun's Java -package, and PCRE copied it from there. It ultimately found its way into Perl -at release 5.10. -

    -

    -PCRE has an optimization that automatically "possessifies" certain simple -pattern constructs. For example, the sequence A+B is treated as A++B because -there is no point in backtracking into a sequence of A's when B must follow. -

    -

    -When a pattern contains an unlimited repeat inside a subpattern that can itself -be repeated an unlimited number of times, the use of an atomic group is the -only way to avoid some failing matches taking a very long time indeed. The -pattern -

    -  (\D+|<\d+>)*[!?]
    -
    -matches an unlimited number of substrings that either consist of non-digits, or -digits enclosed in <>, followed by either ! or ?. When it matches, it runs -quickly. However, if it is applied to -
    -  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    -
    -it takes a long time before reporting failure. This is because the string can -be divided between the internal \D+ repeat and the external * repeat in a -large number of ways, and all have to be tried. (The example uses [!?] rather -than a single character at the end, because both PCRE and Perl have an -optimization that allows for fast failure when a single character is used. They -remember the last single character that is required for a match, and fail early -if it is not present in the string.) If the pattern is changed so that it uses -an atomic group, like this: -
    -  ((?>\D+)|<\d+>)*[!?]
    -
    -sequences of non-digits cannot be broken, and failure happens quickly. -

    -
    BACK REFERENCES
    -

    -Outside a character class, a backslash followed by a digit greater than 0 (and -possibly further digits) is a back reference to a capturing subpattern earlier -(that is, to its left) in the pattern, provided there have been that many -previous capturing left parentheses. -

    -

    -However, if the decimal number following the backslash is less than 10, it is -always taken as a back reference, and causes an error only if there are not -that many capturing left parentheses in the entire pattern. In other words, the -parentheses that are referenced need not be to the left of the reference for -numbers less than 10. A "forward back reference" of this type can make sense -when a repetition is involved and the subpattern to the right has participated -in an earlier iteration. -

    -

    -It is not possible to have a numerical "forward back reference" to a subpattern -whose number is 10 or more using this syntax because a sequence such as \50 is -interpreted as a character defined in octal. See the subsection entitled -"Non-printing characters" -above -for further details of the handling of digits following a backslash. There is -no such problem when named parentheses are used. A back reference to any -subpattern is possible using named parentheses (see below). -

    -

    -Another way of avoiding the ambiguity inherent in the use of digits following a -backslash is to use the \g escape sequence. This escape must be followed by an -unsigned number or a negative number, optionally enclosed in braces. These -examples are all identical: -

    -  (ring), \1
    -  (ring), \g1
    -  (ring), \g{1}
    -
    -An unsigned number specifies an absolute reference without the ambiguity that -is present in the older syntax. It is also useful when literal digits follow -the reference. A negative number is a relative reference. Consider this -example: -
    -  (abc(def)ghi)\g{-1}
    -
    -The sequence \g{-1} is a reference to the most recently started capturing -subpattern before \g, that is, it is equivalent to \2 in this example. -Similarly, \g{-2} would be equivalent to \1. The use of relative references -can be helpful in long patterns, and also in patterns that are created by -joining together fragments that contain references within themselves. -

    -

    -A back reference matches whatever actually matched the capturing subpattern in -the current subject string, rather than anything matching the subpattern -itself (see -"Subpatterns as subroutines" -below for a way of doing that). So the pattern -

    -  (sens|respons)e and \1ibility
    -
    -matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If caseful matching is in force at the time of the -back reference, the case of letters is relevant. For example, -
    -  ((?i)rah)\s+\1
    -
    -matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original -capturing subpattern is matched caselessly. -
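The point that a back reference repeats the matched text, not the subpattern, can be illustrated with the "sense and sensibility" pattern. This is a minimal sketch using Python's `re` module, which shares the `\1` syntax. The helper name `matches` is hypothetical, and the caseless "rah" example is left out because Python handles inline options differently.

```python
import re

# \1 must repeat exactly what group 1 captured in this particular match.
pattern = re.compile(r"(sens|respons)e and \1ibility")

def matches(subject):
    """Return True if the whole subject matches the pattern."""
    return pattern.fullmatch(subject) is not None

print(matches("sense and sensibility"))
print(matches("response and responsibility"))
print(matches("sense and responsibility"))
```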

    -

    -There are several different ways of writing back references to named -subpatterns. The .NET syntax \k{name} and the Perl syntax \k<name> or -\k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified -back reference syntax, in which \g can be used for both numeric and named -references, is also supported. We could rewrite the above example in any of -the following ways: -

    -  (?<p1>(?i)rah)\s+\k<p1>
    -  (?'p1'(?i)rah)\s+\k{p1}
    -  (?P<p1>(?i)rah)\s+(?P=p1)
    -  (?<p1>(?i)rah)\s+\g{p1}
    -
    -A subpattern that is referenced by name may appear in the pattern before or -after the reference. -
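Of the named forms listed above, the Python-style `(?P<name>...)` and `(?P=name)` pair can be tried directly in Python's own `re` module. The sketch below omits the `(?i)` option and is only an illustration of the named syntax, not of PCRE's caseless behaviour.

```python
import re

# Named group p1, then a back reference to it by name.
named = re.compile(r"(?P<p1>rah)\s+(?P=p1)")

hit = named.fullmatch("rah rah")
miss = named.fullmatch("rah RAH")  # the back reference compares case here

print(hit.group("p1"), miss)
```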

    -

    -There may be more than one back reference to the same subpattern. If a -subpattern has not actually been used in a particular match, any back -references to it always fail by default. For example, the pattern -

    -  (a|(bc))\2
    -
    -always fails if it starts to match "a" rather than "bc". However, if the -PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an -unset value matches an empty string. -
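The default behaviour for an unset group can be observed with the same pattern. The sketch assumes Python's `re` module, which also fails a back reference to a group that did not participate; it has no counterpart to PCRE_JAVASCRIPT_COMPAT, so only the default case is shown.

```python
import re

pattern = re.compile(r"(a|(bc))\2")

# Group 2 took part in the match, so \2 can repeat "bc".
via_bc = pattern.match("bcbc")

# The first alternative matched "a", group 2 is unset, and \2 fails.
via_a = pattern.match("aa")

print(via_bc.group(0) if via_bc else None, via_a)
```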

    -

    -Because there may be many capturing parentheses in a pattern, all digits -following a backslash are taken as part of a potential back reference number. -If the pattern continues with a digit character, some delimiter must be used to -terminate the back reference. If the PCRE_EXTENDED option is set, this can be -white space. Otherwise, the \g{ syntax or an empty comment (see -"Comments" -below) can be used. -

    -
    -Recursive back references -
    -

    -A back reference that occurs inside the parentheses to which it refers fails -when the subpattern is first used, so, for example, (a\1) never matches. -However, such references can be useful inside repeated subpatterns. For -example, the pattern -

    -  (a|b\1)+
    -
    -matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of -the subpattern, the back reference matches the character string corresponding -to the previous iteration. In order for this to work, the pattern must be such -that the first iteration does not need to match the back reference. This can be -done using alternation, as in the example above, or by a quantifier with a -minimum of zero. -

    -

    -Back references of this type cause the group that they reference to be treated -as an -atomic group. -Once the whole group has been matched, a subsequent matching failure cannot -cause backtracking into the middle of the group. -

    -
    ASSERTIONS
    -

    -An assertion is a test on the characters following or preceding the current -matching point that does not actually consume any characters. The simple -assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described -above. -

    -

    -More complicated assertions are coded as subpatterns. There are two kinds: -those that look ahead of the current position in the subject string, and those -that look behind it. An assertion subpattern is matched in the normal way, -except that it does not cause the current matching position to be changed. -

    -

    -Assertion subpatterns are not capturing subpatterns. If such an assertion -contains capturing subpatterns within it, these are counted for the purposes of -numbering the capturing subpatterns in the whole pattern. However, substring -capturing is carried out only for positive assertions. (Perl sometimes, but not -always, does do capturing in negative assertions.) -

    -

    -WARNING: If a positive assertion containing one or more capturing subpatterns -succeeds, but failure to match later in the pattern causes backtracking over -this assertion, the captures within the assertion are reset only if no higher -numbered captures are already set. This is, unfortunately, a fundamental -limitation of the current implementation, and as PCRE1 is now in -maintenance-only status, it is unlikely ever to change. -

    -

    -For compatibility with Perl, assertion subpatterns may be repeated; though -it makes no sense to assert the same thing several times, the side effect of -capturing parentheses may occasionally be useful. In practice, there are only three -cases: -
    -
    -(1) If the quantifier is {0}, the assertion is never obeyed during matching. -However, it may contain internal capturing parenthesized groups that are called -from elsewhere via the -subroutine mechanism. -
    -
    -(2) If the quantifier is {0,n} where n is greater than zero, it is treated as if it -were {0,1}. At run time, the rest of the pattern match is tried with and -without the assertion, the order depending on the greediness of the quantifier. -
    -
    -(3) If the minimum repetition is greater than zero, the quantifier is ignored. -The assertion is obeyed just once when encountered during matching. -

    -
    -Lookahead assertions -
    -

    -Lookahead assertions start with (?= for positive assertions and (?! for -negative assertions. For example, -

    -  \w+(?=;)
    -
    -matches a word followed by a semicolon, but does not include the semicolon in -the match, and -
    -  foo(?!bar)
    -
    -matches any occurrence of "foo" that is not followed by "bar". Note that the -apparently similar pattern -
    -  (?!foo)bar
    -
    -does not find an occurrence of "bar" that is preceded by something other than -"foo"; it finds any occurrence of "bar" whatsoever, because the assertion -(?!foo) is always true when the next three characters are "bar". A -lookbehind assertion is needed to achieve the other effect. -
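The three lookahead patterns above can be compared side by side. This is a small sketch using Python's `re` module, which uses the same `(?=` and `(?!` syntax. The subject strings and variable names are made up for illustration.

```python
import re

# Positive lookahead: the semicolon is required but not part of the match.
word = re.search(r"\w+(?=;)", "return value;").group(0)

# Negative lookahead: only the "foo" that is not followed by "bar" is found.
foo_starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar looks AHEAD from where "bar" starts, so it says nothing about
# what precedes "bar"; the "bar" inside "foobar" is still found.
bar_start = re.search(r"(?!foo)bar", "foobar").start()

print(word, foo_starts, bar_start)
```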

    -

    -If you want to force a matching failure at some point in a pattern, the most -convenient way to do it is with (?!) because an empty string always matches, so -an assertion that requires there not to be an empty string must always fail. -The backtracking control verb (*FAIL) or (*F) is a synonym for (?!). -

    -
    -Lookbehind assertions -
    -

    -Lookbehind assertions start with (?<= for positive assertions and (?<! for -negative assertions. For example, -

    -  (?<!foo)bar
    -
    -does find an occurrence of "bar" that is not preceded by "foo". The contents of -a lookbehind assertion are restricted such that all the strings it matches must -have a fixed length. However, if there are several top-level alternatives, they -do not all have to have the same fixed length. Thus -
    -  (?<=bullock|donkey)
    -
    -is permitted, but -
    -  (?<!dogs?|cats?)
    -
    -causes an error at compile time. Branches that match different length strings -are permitted only at the top level of a lookbehind assertion. This is an -extension compared with Perl, which requires all branches to match the same -length of string. An assertion such as -
    -  (?<=ab(c|de))
    -
    -is not permitted, because its single top-level branch can match two different -lengths, but it is acceptable to PCRE if rewritten to use two top-level -branches: -
    -  (?<=abc|abde)
    -
    -In some cases, the escape sequence \K -(see above) -can be used instead of a lookbehind assertion to get round the fixed-length -restriction. -
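A negative lookbehind can be sketched as follows, again using Python's `re` module. Note one assumption-relevant difference: Python requires every alternative in a lookbehind to have the same length, so the `(?<=bullock|donkey)` form that PCRE accepts is not shown here.

```python
import re

subject = "foobar bar xbar"

# Only occurrences of "bar" that are not immediately preceded by "foo".
starts = [m.start() for m in re.finditer(r"(?<!foo)bar", subject)]

# With fewer than three characters before "bar", the lookbehind body cannot
# match, so the negative assertion succeeds.
at_start = re.search(r"(?<!foo)bar", "bar").start()

print(starts, at_start)
```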

    -

    -The implementation of lookbehind assertions is, for each alternative, to -temporarily move the current position back by the fixed length and then try to -match. If there are insufficient characters before the current position, the -assertion fails. -

    -

    -In a UTF mode, PCRE does not allow the \C escape (which matches a single data -unit even in a UTF mode) to appear in lookbehind assertions, because it makes -it impossible to calculate the length of the lookbehind. The \X and \R -escapes, which can match different numbers of data units, are also not -permitted. -

    -

    -"Subroutine" -calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long -as the subpattern matches a fixed-length string. -Recursion, -however, is not supported. -

    -

    -Possessive quantifiers can be used in conjunction with lookbehind assertions to -specify efficient matching of fixed-length strings at the end of subject -strings. Consider a simple pattern such as -

    -  abcd$
    -
    -when applied to a long string that does not match. Because matching proceeds -from left to right, PCRE will look for each "a" in the subject and then see if -what follows matches the rest of the pattern. If the pattern is specified as -
    -  ^.*abcd$
    -
    -the initial .* matches the entire string at first, but when this fails (because -there is no following "a"), it backtracks to match all but the last character, -then all but the last two characters, and so on. Once again the search for "a" -covers the entire string, from right to left, so we are no better off. However, -if the pattern is written as -
    -  ^.*+(?<=abcd)
    -
    -there can be no backtracking for the .*+ item; it can match only the entire -string. The subsequent lookbehind assertion does a single test on the last four -characters. If it fails, the match fails immediately. For long strings, this -approach makes a significant difference to the processing time. -

    -
    -Using multiple assertions -
    -

    -Several assertions (of any sort) may occur in succession. For example, -

    -  (?<=\d{3})(?<!999)foo
    -
    -matches "foo" preceded by three digits that are not "999". Notice that each of -the assertions is applied independently at the same point in the subject -string. First there is a check that the previous three characters are all -digits, and then there is a check that the same three characters are not "999". -This pattern does not match "foo" preceded by six characters, the first three -of which are digits and the last three of which are not "999". For example, it -doesn't match "123abcfoo". A pattern to do that is -
    -  (?<=\d{3}...)(?<!999)foo
    -
    -This time the first assertion looks at the preceding six characters, checking -that the first three are digits, and then the second assertion checks that the -preceding three characters are not "999". -
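The difference between the two patterns is easy to see on a few subjects. The sketch below uses Python's `re` module, which accepts both lookbehinds because each has a single fixed length; the variable names are illustrative.

```python
import re

# Both assertions inspect the same three characters before "foo".
three_back = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first assertion inspects six characters, the second only the last three.
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")

results = {
    "123foo": three_back.search("123foo") is not None,
    "999foo": three_back.search("999foo") is not None,
    "123abcfoo (three_back)": three_back.search("123abcfoo") is not None,
    "123abcfoo (six_back)": six_back.search("123abcfoo") is not None,
    "123999foo (six_back)": six_back.search("123999foo") is not None,
}
print(results)
```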

    -

    -Assertions can be nested in any combination. For example, -

    -  (?<=(?<!foo)bar)baz
    -
    -matches an occurrence of "baz" that is preceded by "bar" which in turn is not -preceded by "foo", while -
    -  (?<=\d{3}(?!999)...)foo
    -
    -is another pattern that matches "foo" preceded by three digits and any three -characters that are not "999". -

    -
    CONDITIONAL SUBPATTERNS
    -

    -It is possible to cause the matching process to obey a subpattern -conditionally or to choose between two alternative subpatterns, depending on -the result of an assertion, or whether a specific capturing subpattern has -already been matched. The two possible forms of conditional subpattern are: -

    -  (?(condition)yes-pattern)
    -  (?(condition)yes-pattern|no-pattern)
    -
    -If the condition is satisfied, the yes-pattern is used; otherwise the -no-pattern (if present) is used. If there are more than two alternatives in the -subpattern, a compile-time error occurs. Each of the two alternatives may -itself contain nested subpatterns of any form, including conditional -subpatterns; the restriction to two alternatives applies only at the level of -the condition. This pattern fragment is an example where the alternatives are -complex: -
    -  (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
    -
    -
    -

    -

    -There are four kinds of condition: references to subpatterns, references to -recursion, a pseudo-condition called DEFINE, and assertions. -

    -
    -Checking for a used subpattern by number -
    -

    -If the text between the parentheses consists of a sequence of digits, the -condition is true if a capturing subpattern of that number has previously -matched. If there is more than one capturing subpattern with the same number -(see the earlier -section about duplicate subpattern numbers), -the condition is true if any of them have matched. An alternative notation is -to precede the digits with a plus or minus sign. In this case, the subpattern -number is relative rather than absolute. The most recently opened parentheses -can be referenced by (?(-1), the next most recent by (?(-2), and so on. Inside -loops it can also make sense to refer to subsequent groups. The next -parentheses to be opened can be referenced as (?(+1), and so on. (The value -zero in any of these forms is not used; it provokes a compile-time error.) -

    -

    -Consider the following pattern, which contains non-significant white space to -make it more readable (assume the PCRE_EXTENDED option) and to divide it into -three parts for ease of discussion: -

    -  ( \( )?    [^()]+    (?(1) \) )
    -
    -The first part matches an optional opening parenthesis, and if that -character is present, sets it as the first captured substring. The second part -matches one or more characters that are not parentheses. The third part is a -conditional subpattern that tests whether or not the first set of parentheses -matched. If they did, that is, if the subject started with an opening parenthesis, -the condition is true, and so the yes-pattern is executed and a closing -parenthesis is required. Otherwise, since no-pattern is not present, the -subpattern matches nothing. In other words, this pattern matches a sequence of -non-parentheses, optionally enclosed in parentheses. -
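The conditional pattern above can be exercised with a short sketch. It assumes Python's `re` module, where `re.VERBOSE` plays the role of PCRE_EXTENDED and the `(?(1)...)` form is also available. The name `optionally_parenthesized` is hypothetical.

```python
import re

optionally_parenthesized = re.compile(
    r"( \( )?    [^()]+    (?(1) \) )",
    re.VERBOSE,  # ignore the white space used to separate the three parts
)

def ok(subject):
    """Return True if the whole subject matches."""
    return optionally_parenthesized.fullmatch(subject) is not None

# A closing parenthesis is required only when an opening one was captured.
print(ok("(abc)"), ok("abc"), ok("(abc"), ok("abc)"))
```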

    -

    -If you were embedding this pattern in a larger one, you could use a relative -reference: -

    -  ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
    -
    -This makes the fragment independent of the parentheses in the larger pattern. -

    -
    -Checking for a used subpattern by name -
    -

    -Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used -subpattern by name. For compatibility with earlier versions of PCRE, which had -this facility before Perl, the syntax (?(name)...) is also recognized. -

    -

    -Rewriting the above example to use a named subpattern gives this: -

    -  (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
    -
    -If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them has -matched. -

    -
    -Checking for pattern recursion -
    -

    -If the condition is the string (R), and there is no subpattern with the name R, -the condition is true if a recursive call to the whole pattern or any -subpattern has been made. If digits or a name preceded by ampersand follow the -letter R, for example: -

    -  (?(R3)...) or (?(R&name)...)
    -
    -the condition is true if the most recent recursion is into a subpattern whose -number or name is given. This condition does not check the entire recursion -stack. If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them is -the most recent recursion. -

    -

    -At "top level", all these recursion test conditions are false. -The syntax for recursive patterns -is described below. -

    -
    -Defining subpatterns for use by reference only -
    -

    -If the condition is the string (DEFINE), and there is no subpattern with the -name DEFINE, the condition is always false. In this case, there may be only one -alternative in the subpattern. It is always skipped if control reaches this -point in the pattern; the idea of DEFINE is that it can be used to define -subroutines that can be referenced from elsewhere. (The use of -subroutines -is described below.) For example, a pattern to match an IPv4 address such as -"192.168.23.245" could be written like this (ignore white space and line -breaks): -

    -  (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
    -  \b (?&byte) (\.(?&byte)){3} \b
    -
    -The first part of the pattern is a DEFINE group inside which another group -named "byte" is defined. This matches an individual component of an IPv4 -address (a number less than 256). When matching takes place, this part of the -pattern is skipped because DEFINE acts like a false condition. The rest of the -pattern uses references to the named group to match the four dot-separated -components of an IPv4 address, insisting on a word boundary at each end. -
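Python's `re` module has neither DEFINE nor subroutine calls, so the following sketch swaps in a different technique: it reuses the "byte" fragment by plain string interpolation. It is only meant to show that the fragment accepts values below 256; the names `BYTE`, `IPV4` and `is_ipv4` are hypothetical.

```python
import re

# Same alternatives as the "byte" group in the DEFINE example.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# Four dot-separated bytes with a word boundary at each end.
# The doubled braces produce a literal {3} quantifier in the f-string.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

def is_ipv4(text):
    """Return True if the whole text is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4("192.168.23.245"), is_ipv4("256.1.1.1"), is_ipv4("1.2.3"))
```

Interpolation copies the fragment into the pattern four times, whereas the DEFINE form keeps a single definition that is called by name.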

    -
    -Assertion conditions -
    -

    -If the condition is not in any of the above formats, it must be an assertion. -This may be a positive or negative lookahead or lookbehind assertion. Consider -this pattern, again containing non-significant white space, and with the two -alternatives on the second line: -

    -  (?(?=[^a-z]*[a-z])
    -  \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
    -
    -The condition is a positive lookahead assertion that matches an optional -sequence of non-letters followed by a letter. In other words, it tests for the -presence of at least one letter in the subject. If a letter is found, the -subject is matched against the first alternative; otherwise it is matched -against the second. This pattern matches strings in one of the two forms -dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. -

    -
    COMMENTS
    -

    -There are two ways of including comments in patterns that are processed by -PCRE. In both cases, the start of the comment must not be in a character class, -nor in the middle of any other sequence of related characters such as (?: or a -subpattern name or number. The characters that make up a comment play no part -in the pattern matching. -

    -

    -The sequence (?# marks the start of a comment that continues up to the next -closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED -option is set, an unescaped # character also introduces a comment, which in -this case continues to immediately after the next newline character or -character sequence in the pattern. Which characters are interpreted as newlines -is controlled by the options passed to a compiling function or by a special -sequence at the start of the pattern, as described in the section entitled -"Newline conventions" -above. Note that the end of this type of comment is a literal newline sequence -in the pattern; escape sequences that happen to represent a newline do not -count. For example, consider this pattern when PCRE_EXTENDED is set, and the -default newline convention is in force: -

    -  abc #comment \n still comment
    -
    -On encountering the # character, pcre_compile() skips along, looking for -a newline in the pattern. The sequence \n is still literal at this stage, so -it does not terminate the comment. Only an actual character with the code value -0x0a (the default newline) does so. -

    -
    RECURSIVE PATTERNS
    -

    -Consider the problem of matching a string in parentheses, allowing for -unlimited nested parentheses. Without the use of recursion, the best that can -be done is to use a pattern that matches up to some fixed depth of nesting. It -is not possible to handle an arbitrary nesting depth. -

    -

    -For some time, Perl has provided a facility that allows regular expressions to -recurse (amongst other things). It does this by interpolating Perl code in the -expression at run time, and the code can refer to the expression itself. A Perl -pattern using code interpolation to solve the parentheses problem can be -created like this: -

    -  $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
    -
    -The (?p{...}) item interpolates Perl code at run time, and in this case refers -recursively to the pattern in which it appears. -

    -

    -Obviously, PCRE cannot support the interpolation of Perl code. Instead, it -supports special syntax for recursion of the entire pattern, and also for -individual subpattern recursion. After its introduction in PCRE and Python, -this kind of recursion was subsequently introduced into Perl at release 5.10. -

    -

    -A special item that consists of (? followed by a number greater than zero and a -closing parenthesis is a recursive subroutine call of the subpattern of the -given number, provided that it occurs inside that subpattern. (If not, it is a -non-recursive subroutine -call, which is described in the next section.) The special item (?R) or (?0) is -a recursive call of the entire regular expression. -

    -

    -This PCRE pattern solves the nested parentheses problem (assume the -PCRE_EXTENDED option is set so that white space is ignored): -

    -  \( ( [^()]++ | (?R) )* \)
    -
    -First it matches an opening parenthesis. Then it matches any number of -substrings which can either be a sequence of non-parentheses, or a recursive -match of the pattern itself (that is, a correctly parenthesized substring). -Finally there is a closing parenthesis. Note the use of a possessive quantifier -to avoid backtracking into sequences of non-parentheses. -

    -

    -If this were part of a larger pattern, you would not want to recurse the entire -pattern, so instead you could use this: -

    -  ( \( ( [^()]++ | (?1) )* \) )
    -
    -We have put the pattern into parentheses, and caused the recursion to refer to -them instead of the whole pattern. -

    -

    -In a larger pattern, keeping track of parenthesis numbers can be tricky. This -is made easier by the use of relative references. Instead of (?1) in the -pattern above you can write (?-2) to refer to the second most recently opened -parentheses preceding the recursion. In other words, a negative number counts -capturing parentheses leftwards from the point at which it is encountered. -

    -

    -It is also possible to refer to subsequently opened parentheses, by writing -references such as (?+2). However, these cannot be recursive because the -reference is not inside the parentheses that are referenced. They are always -non-recursive subroutine -calls, as described in the next section. -

    -

    -An alternative approach is to use named parentheses instead. The Perl syntax -for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We -could rewrite the above example as follows: -

    -  (?<pn> \( ( [^()]++ | (?&pn) )* \) )
    -
    -If there is more than one subpattern with the same name, the earliest one is -used. -

    -

    -This particular example pattern that we have been looking at contains nested -unlimited repeats, and so the use of a possessive quantifier for matching -strings of non-parentheses is important when applying the pattern to strings -that do not match. For example, when this pattern is applied to -

    -  (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
    -
    -it yields "no match" quickly. However, if a possessive quantifier is not used, -the match runs for a very long time indeed because there are so many different -ways the + and * repeats can carve up the subject, and all have to be tested -before failure can be reported. -

    -

    -At the end of a match, the values of capturing parentheses are those from -the outermost level. If you want to obtain intermediate values, a callout -function can be used (see below and the -pcrecallout -documentation). If the pattern above is matched against -

    -  (ab(cd)ef)
    -
    -the value for the inner capturing parentheses (numbered 2) is "ef", which is -the last value taken on at the top level. If a capturing subpattern is not -matched at the top level, its final captured value is unset, even if it was -(temporarily) set at a deeper level during the matching process. -

    -

    -If there are more than 15 capturing parentheses in a pattern, PCRE has to -obtain extra memory to store data during a recursion, which it does by using -pcre_malloc, freeing it via pcre_free afterwards. If no memory can -be obtained, the match fails with the PCRE_ERROR_NOMEMORY error. -

    -

    -Do not confuse the (?R) item with the condition (R), which tests for recursion. -Consider this pattern, which matches text in angle brackets, allowing for -arbitrary nesting. Only digits are allowed in nested brackets (that is, when -recursing), whereas any characters are permitted at the outer level. -

    -  < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
    -
    -In this pattern, (?(R) is the start of a conditional subpattern, with two -different alternatives for the recursive and non-recursive cases. The (?R) item -is the actual recursive call. -

    -
    -Differences in recursion processing between PCRE and Perl -
    -

    -Recursion processing in PCRE differs from Perl in two important ways. In PCRE -(like Python, but unlike Perl), a recursive subpattern call is always treated -as an atomic group. That is, once it has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. This can be illustrated by the following pattern, -which purports to match a palindromic string that contains an odd number of -characters (for example, "a", "aba", "abcba", "abcdcba"): -

    -  ^(.|(.)(?1)\2)$
    -
    -The idea is that it either matches a single character, or two identical -characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE -it does not if the subject string is longer than three characters. Consider the -subject string "abcba": -

    -

    -At the top level, the first character is matched, but as it is not at the end -of the string, the first alternative fails; the second alternative is taken -and the recursion kicks in. The recursive call to subpattern 1 successfully -matches the next character ("b"). (Note that the beginning and end of line -tests are not part of the recursion). -

    -

    -Back at the top level, the next character ("c") is compared with what -subpattern 2 matched, which was "a". This fails. Because the recursion is -treated as an atomic group, there are now no backtracking points, and so the -entire match fails. (Perl is able, at this point, to re-enter the recursion and -try the second alternative.) However, if the pattern is written with the -alternatives in the other order, things are different: -

    -  ^((.)(?1)\2|.)$
    -
    -This time, the recursing alternative is tried first, and continues to recurse -until it runs out of characters, at which point the recursion fails. But this -time we do have another alternative to try at the higher level. That is the big -difference: in the previous case the remaining alternative is at a deeper -recursion level, which PCRE cannot use. -

    -

    -To change the pattern so that it matches all palindromic strings, not just -those with an odd number of characters, it is tempting to change the pattern to -this: -

    -  ^((.)(?1)\2|.?)$
    -
    -Again, this works in Perl, but not in PCRE, and for the same reason. When a -deeper recursion has matched a single character, it cannot be entered again in -order to match an empty string. The solution is to separate the two cases, and -write out the odd and even cases as alternatives at the higher level: -
    -  ^(?:((.)(?1)\2|)|((.)(?3)\4|.))
    -
    -If you want to match typical palindromic phrases, the pattern has to ignore all -non-word characters, which can be done like this: -
    -  ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$
    -
    -If run with the PCRE_CASELESS option, this pattern matches phrases such as "A -man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note -the use of the possessive quantifier *+ to avoid backtracking into sequences of -non-word characters. Without this, PCRE takes a great deal longer (ten times or -more) to match typical phrases, and Perl takes so long that you think it has -gone into a loop. -

    -

    -WARNING: The palindrome-matching patterns above work only if the subject -string does not start with a palindrome that is shorter than the entire string. -For example, although "abcba" is correctly matched, if the subject is "ababa", -PCRE finds the palindrome "aba" at the start, then fails at top level because -the end of the string does not follow. Once again, it cannot jump back into the -recursion to try other alternatives, so the entire match fails. -

    -

    -The second way in which PCRE and Perl differ in their recursion processing is -in the handling of captured values. In Perl, when a subpattern is called -recursively or as a subroutine (see the next section), it has no access to any -values that were captured outside the recursion, whereas in PCRE these values -can be referenced. Consider this pattern: -

    -  ^(.)(\1|a(?2))
    -
    -In PCRE, this pattern matches "bab". The first capturing parentheses match "b", -then in the second group, when the back reference \1 fails to match "b", the -second alternative matches "a" and then recurses. In the recursion, \1 does -now match "b" and so the whole match succeeds. In Perl, the pattern fails to -match because inside the recursive call \1 cannot access the externally set -value. -

    -
    SUBPATTERNS AS SUBROUTINES
    -

    -If the syntax for a recursive subpattern call (either by number or by -name) is used outside the parentheses to which it refers, it operates like a -subroutine in a programming language. The called subpattern may be defined -before or after the reference. A numbered reference can be absolute or -relative, as in these examples: -

    -  (...(absolute)...)...(?2)...
    -  (...(relative)...)...(?-1)...
    -  (...(?+1)...(relative)...
    -
    -An earlier example pointed out that the pattern -
    -  (sens|respons)e and \1ibility
    -
    -matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If instead the pattern -
    -  (sens|respons)e and (?1)ibility
    -
    -is used, it does match "sense and responsibility" as well as the other two -strings. Another example is given in the discussion of DEFINE above. -
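The difference between the back reference and the subroutine call can be sketched in Python. Python's standard re module has no (?1) syntax, so this sketch imitates the call by reusing the subpattern's source text; the name STEM and the helper matches() are illustrative assumptions, and the atomic-group behaviour of a real PCRE subroutine call is not modelled.

```python
import re

# Hypothetical name: the text of subpattern 1, reused to imitate (?1).
STEM = r"(?:sens|respons)"

# \1 must repeat the exact text that group 1 captured.
backref = re.compile(r"(sens|respons)e and \1ibility")

# Reusing the subpattern text re-runs the pattern, so either stem may appear.
subroutine_like = re.compile(rf"({STEM})e and {STEM}ibility")


def matches(regex, text):
    """Return True if the whole of text is matched by regex."""
    return regex.fullmatch(text) is not None
```

With these definitions, "sense and responsibility" is rejected by backref but accepted by subroutine_like, while both accept "sense and sensibility".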

    -

    -All subroutine calls, whether recursive or not, are always treated as atomic -groups. That is, once a subroutine has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. Any capturing parentheses that are set during the -subroutine call revert to their previous values afterwards. -

    -

    -Processing options such as case-independence are fixed when a subpattern is -defined, so if it is used as a subroutine, such options cannot be changed for -different calls. For example, consider this pattern: -

    -  (abc)(?i:(?-1))
    -
    -It matches "abcabc". It does not match "abcABC" because the change of -processing option does not affect the called subpattern. -

    -
    ONIGURUMA SUBROUTINE SYNTAX
    -

    -For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a subroutine, possibly recursively. Here -are two of the examples used above, rewritten using this syntax: -

    -  (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
    -  (sens|respons)e and \g'1'ibility
    -
    -PCRE supports an extension to Oniguruma: if a number is preceded by a -plus or a minus sign it is taken as a relative reference. For example: -
    -  (abc)(?i:\g<-1>)
    -
    -Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not -synonymous. The former is a back reference; the latter is a subroutine call. -

    -
    CALLOUTS
    -

    -Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl -code to be obeyed in the middle of matching a regular expression. This makes it -possible, amongst other things, to extract different substrings that match the -same pair of parentheses when there is a repetition. -

    -

    -PCRE provides a similar feature, but of course it cannot obey arbitrary Perl -code. The feature is called "callout". The caller of PCRE provides an external -function by putting its entry point in the global variable pcre_callout -(8-bit library) or pcre[16|32]_callout (16-bit or 32-bit library). -By default, this variable contains NULL, which disables all calling out. -

    -

    -Within a regular expression, (?C) indicates the points at which the external -function is to be called. If you want to identify different callout points, you -can put a number less than 256 after the letter C. The default value is zero. -For example, this pattern has two callout points: -

    -  (?C1)abc(?C2)def
    -
    -If the PCRE_AUTO_CALLOUT flag is passed to a compiling function, callouts are -automatically installed before each item in the pattern. They are all numbered -255. If there is a conditional group in the pattern whose condition is an -assertion, an additional callout is inserted just before the condition. An -explicit callout may also be set at this position, as in this example: -
    -  (?(?C9)(?=a)abc|def)
    -
    -Note that this applies only to assertion conditions, not to other types of -condition. -

    -

    -During matching, when PCRE reaches a callout point, the external function is -called. It is provided with the number of the callout, the position in the -pattern, and, optionally, one item of data originally supplied by the caller of -the matching function. The callout function may cause matching to proceed, to -backtrack, or to fail altogether. -

    -

    -By default, PCRE implements a number of optimizations at compile time and -matching time, and one side-effect is that sometimes callouts are skipped. If -you need all possible callouts to happen, you need to set options that disable -the relevant optimizations. More details, and a complete description of the -interface to the callout function, are given in the -pcrecallout -documentation. -

    -
    BACKTRACKING CONTROL
    -

    -Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which -are still described in the Perl documentation as "experimental and subject to -change or removal in a future version of Perl". It goes on to say: "Their usage -in production code should be noted to avoid problems during upgrades." The same -remarks apply to the PCRE features described in this section. -

    -

    -The new verbs make use of what was previously invalid syntax: an opening -parenthesis followed by an asterisk. They are generally of the form -(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving -differently depending on whether or not a name is present. A name is any -sequence of characters that does not include a closing parenthesis. The maximum -length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit -libraries. If the name is empty, that is, if the closing parenthesis -immediately follows the colon, the effect is as if the colon were not there. -Any number of these verbs may occur in a pattern. -

    -

    -Since these verbs are specifically related to backtracking, most of them can be -used only when the pattern is to be matched using one of the traditional -matching functions, because these use a backtracking algorithm. With the -exception of (*FAIL), which behaves like a failing negative assertion, the -backtracking control verbs cause an error if encountered by a DFA matching -function. -

    -

    -The behaviour of these verbs in -repeated groups, -assertions, -and in -subpatterns called as subroutines -(whether or not recursively) is documented below. -

    -
    -Optimizations that affect backtracking verbs -
    -

    -PCRE contains some optimizations that are used to speed up matching by running -some checks at the start of each match attempt. For example, it may know the -minimum length of matching subject, or that a particular character must be -present. When one of these optimizations bypasses the running of a match, any -included backtracking verbs will not, of course, be processed. You can suppress -the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option -when calling pcre_compile() or pcre_exec(), or by starting the -pattern with (*NO_START_OPT). There is more discussion of this option in the -section entitled -"Option bits for pcre_exec()" -in the -pcreapi -documentation. -

    -

    -Experiments with Perl suggest that it too has similar optimizations, sometimes -leading to anomalous results. -

    -
    -Verbs that act immediately -
    -

    -The following verbs act as soon as they are encountered. They may not be -followed by a name. -

    -   (*ACCEPT)
    -
    -This verb causes the match to end successfully, skipping the remainder of the -pattern. However, when it is inside a subpattern that is called as a -subroutine, only that subpattern is ended successfully. Matching then continues -at the outer level. If (*ACCEPT) is triggered in a positive assertion, the -assertion succeeds; in a negative assertion, the assertion fails. -

    -

    -If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For -example: -

    -  A((?:A|B(*ACCEPT)|C)D)
    -
    -This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by -the outer parentheses. -
    -  (*FAIL) or (*F)
    -
    -This verb causes a matching failure, forcing backtracking to occur. It is -equivalent to (?!) but easier to read. The Perl documentation notes that it is -probably useful only when combined with (?{}) or (??{}). Those are, of course, -Perl features that are not present in PCRE. The nearest equivalent is the -callout feature, as for example in this pattern: -
    -  a+(?C)(*FAIL)
    -
    -A match with the string "aaaa" always fails, but the callout is taken before -each backtrack happens (in this example, 10 times). -
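The count of 10 can be checked with a small model in Python. This does not call PCRE; it only enumerates the states in which a+ has matched and the callout point is reached, assuming that no start-of-match optimization skips any attempt. The function name count_callouts is an illustrative assumption.

```python
def count_callouts(subject: str) -> int:
    """Count how often the callout in a+(?C)(*FAIL) would be reached."""
    calls = 0
    for start in range(len(subject)):
        # Length of the run of "a" characters beginning at this start point.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ first takes the whole run, then gives back one character at a
        # time; the callout is reached once for each length from run down to 1.
        calls += run
    return calls
```

For "aaaa" the four starting points contribute 4, 3, 2 and 1 callouts respectively.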

    -
    -Recording which path was taken -
    -

    -There is one verb whose main purpose is to track how a match was arrived at, -though it also has a secondary use in conjunction with advancing the match -starting point (see (*SKIP) below). -

    -  (*MARK:NAME) or (*:NAME)
    -
    -A name is always required with this verb. There may be as many instances of -(*MARK) as you like in a pattern, and their names do not have to be unique. -

    -

    -When a match succeeds, the name of the last-encountered (*MARK:NAME), -(*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the -caller as described in the section entitled -"Extra data for pcre_exec()" -in the -pcreapi -documentation. Here is an example of pcretest output, where the /K -modifier requests the retrieval and outputting of (*MARK) data: -

    -    re> /X(*MARK:A)Y|X(*MARK:B)Z/K
    -  data> XY
    -   0: XY
    -  MK: A
    -  XZ
    -   0: XZ
    -  MK: B
    -
    -The (*MARK) name is tagged with "MK:" in this output, and in this example it -indicates which of the two alternatives matched. This is a more efficient way -of obtaining this information than putting each alternative in its own -capturing parentheses. -
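For comparison, the capturing-parentheses approach mentioned above can be sketched in Python, whose standard re module has no (*MARK). The group names A and B mirror the mark names in the example, and which_alternative() is an illustrative helper, not part of any PCRE interface.

```python
import re

# Each alternative sits in its own named capturing group.
WHICH = re.compile(r"(?P<A>XY)|(?P<B>XZ)")


def which_alternative(subject):
    """Return the name of the alternative that matched, or None."""
    m = WHICH.search(subject)
    return m.lastgroup if m else None
```

Here which_alternative("XY") reports A and which_alternative("XZ") reports B, at the cost of one capturing group per alternative.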

    -

    -If a verb with a name is encountered in a positive assertion that is true, the -name is recorded and passed back if it is the last-encountered. This does not -happen for negative assertions or failing positive assertions. -

    -

    -After a partial match or a failed match, the last encountered name in the -entire match process is returned. For example: -

    -    re> /X(*MARK:A)Y|X(*MARK:B)Z/K
    -  data> XP
    -  No match, mark = B
    -
    -Note that in this unanchored example the mark is retained from the match -attempt that started at the letter "X" in the subject. Subsequent match -attempts starting at "P" and then with an empty string do not get as far as the -(*MARK) item, but nevertheless do not reset it. -

    -

    -If you are interested in (*MARK) values after failed matches, you should -probably set the PCRE_NO_START_OPTIMIZE option -(see above) -to ensure that the match is always attempted. -

    -
    -Verbs that act after backtracking -
    -

    -The following verbs do nothing when they are encountered. Matching continues -with what follows, but if there is no subsequent match, causing a backtrack to -the verb, a failure is forced. That is, backtracking cannot pass to the left of -the verb. However, when one of these verbs appears inside an atomic group or an -assertion that is true, its effect is confined to that group, because once the -group has been matched, there is never any backtracking into it. In this -situation, backtracking can "jump back" to the left of the entire atomic group -or assertion. (Remember also, as stated above, that this localization also -applies in subroutine calls.) -

    -

    -These verbs differ in exactly what kind of failure occurs when backtracking -reaches them. The behaviour described below is what happens when the verb is -not in a subroutine or an assertion. Subsequent sections cover these special -cases. -

    -  (*COMMIT)
    -
    -This verb, which may not be followed by a name, causes the whole match to fail -outright if there is a later matching failure that causes backtracking to reach -it. Even if the pattern is unanchored, no further attempts to find a match by -advancing the starting point take place. If (*COMMIT) is the only backtracking -verb that is encountered, once it has been passed pcre_exec() is -committed to finding a match at the current starting point, or not at all. For -example: -
    -  a+(*COMMIT)b
    -
    -This matches "xxaab" but not "aacaab". It can be thought of as a kind of -dynamic anchor, or "I've started, so I must finish." The name of the most -recently passed (*MARK) in the path is passed back when (*COMMIT) forces a -match failure. -
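The "dynamic anchor" behaviour can be modelled in a few lines of Python. This is a hand-written simulation of a+(*COMMIT)b only, not a call into PCRE, and it assumes start-of-match optimizations are not in play; commit_search is a hypothetical name.

```python
def commit_search(subject: str):
    """Model a+(*COMMIT)b: return the matched text, or None on failure."""
    for start in range(len(subject) + 1):
        end = start
        while end < len(subject) and subject[end] == "a":
            end += 1
        if end == start:
            # a+ failed before (*COMMIT) was reached, so bump along as usual.
            continue
        if end < len(subject) and subject[end] == "b":
            return subject[start:end + 1]
        # b failed after (*COMMIT) was passed: no further starting points.
        return None
    return None
```

In "xxaab" the first two attempts fail before reaching the verb, so the search continues and finds "aab". In "aacaab" the first attempt passes (*COMMIT) and then fails at "c", which ends the search.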

    -

    -If there is more than one backtracking verb in a pattern, a different one that -follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a -match does not always guarantee that a match must be at this starting point. -

    -

    -Note that (*COMMIT) at the start of a pattern is not the same as an anchor, -unless PCRE's start-of-match optimizations are turned off, as shown in this -output from pcretest: -

    -    re> /(*COMMIT)abc/
    -  data> xyzabc
    -   0: abc
    -  data> xyzabc\Y
    -  No match
    -
    -For this pattern, PCRE knows that any match must start with "a", so the -optimization skips along the subject to "a" before applying the pattern to the -first set of data. The match attempt then succeeds. In the second set of data, -the escape sequence \Y is interpreted by the pcretest program. It causes -the PCRE_NO_START_OPTIMIZE option to be set when pcre_exec() is called. -This disables the optimization that skips along to the first character. The -pattern is now applied starting at "x", and so the (*COMMIT) causes the match -to fail without trying any other starting points. -
    -  (*PRUNE) or (*PRUNE:NAME)
    -
    -This verb causes the match to fail at the current starting position in the -subject if there is a later matching failure that causes backtracking to reach -it. If the pattern is unanchored, the normal "bumpalong" advance to the next -starting character then happens. Backtracking can occur as usual to the left of -(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but -if there is no match to the right, backtracking cannot cross (*PRUNE). In -simple cases, the use of (*PRUNE) is just an alternative to an atomic group or -possessive quantifier, but there are some uses of (*PRUNE) that cannot be -expressed in any other way. In an anchored pattern (*PRUNE) has the same effect -as (*COMMIT). -

    -

    -The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK). -

    -  (*SKIP)
    -
    -This verb, when given without a name, is like (*PRUNE), except that if the -pattern is unanchored, the "bumpalong" advance is not to the next character, -but to the position in the subject where (*SKIP) was encountered. (*SKIP) -signifies that whatever text was matched leading up to it cannot be part of a -successful match. Consider: -
    -  a+(*SKIP)b
    -
    -If the subject is "aaaac...", after the first match attempt fails (starting at -the first character in the string), the starting point skips on to start the -next attempt at "c". Note that a possessive quantifier does not have the same -effect as this example; although it would suppress backtracking during the -first match attempt, the second attempt would start at the second character -instead of skipping on to "c". -
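The effect on the starting point can be illustrated with a Python model that records which offsets are tried. It simulates a+b and a+(*SKIP)b by hand rather than calling PCRE, and it assumes no start-of-match optimizations; start_positions is a hypothetical name.

```python
def start_positions(subject: str, skip: bool) -> list:
    """Offsets tried for a+b (skip=False) or a+(*SKIP)b (skip=True)."""
    tried = []
    pos = 0
    while pos <= len(subject):
        tried.append(pos)
        end = pos
        while end < len(subject) and subject[end] == "a":
            end += 1
        if end > pos and end < len(subject) and subject[end] == "b":
            return tried  # a match was found at this starting point
        if skip and end > pos:
            # (*SKIP) was passed at offset end; the next attempt starts there.
            pos = end
        else:
            # Normal bumpalong: advance by one character.
            pos += 1
    return tried
```

For the subject "aaaac", the plain pattern tries every offset in turn, whereas the (*SKIP) version goes from offset 0 directly to offset 4, where "c" is.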
    -  (*SKIP:NAME)
    -
    -When (*SKIP) has an associated name, its behaviour is modified. When it is -triggered, the previous path through the pattern is searched for the most -recent (*MARK) that has the same name. If one is found, the "bumpalong" advance -is to the subject position that corresponds to that (*MARK) instead of to where -(*SKIP) was encountered. If no (*MARK) with a matching name is found, the -(*SKIP) is ignored. -

    -

    -Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores -names that are set by (*PRUNE:NAME) or (*THEN:NAME). -

    -  (*THEN) or (*THEN:NAME)
    -
    -This verb causes a skip to the next innermost alternative when backtracking -reaches it. That is, it cancels any further backtracking within the current -alternative. Its name comes from the observation that it can be used for a -pattern-based if-then-else block: -
    -  ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
    -
    -If the COND1 pattern matches, FOO is tried (and possibly further items after -the end of the group if FOO succeeds); on failure, the matcher skips to the -second alternative and tries COND2, without backtracking into COND1. If that -succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no -more alternatives, so there is a backtrack to whatever came before the entire -group. If (*THEN) is not inside an alternation, it acts like (*PRUNE). -

    -

    -The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK). -

    -

    -A subpattern that does not contain a | character is just a part of the -enclosing alternative; it is not a nested alternation with only one -alternative. The effect of (*THEN) extends beyond such a subpattern to the -enclosing alternative. Consider this pattern, where A, B, etc. are complex -pattern fragments that do not contain any | characters at this level: -

    -  A (B(*THEN)C) | D
    -
    -If A and B are matched, but there is a failure in C, matching does not -backtrack into A; instead it moves to the next alternative, that is, D. -However, if the subpattern containing (*THEN) is given an alternative, it -behaves differently: -
    -  A (B(*THEN)C | (*FAIL)) | D
    -
    -The effect of (*THEN) is now confined to the inner subpattern. After a failure -in C, matching moves to (*FAIL), which causes the whole subpattern to fail -because there are no more alternatives to try. In this case, matching does now -backtrack into A. -

    -

    -Note that a conditional subpattern is not considered as having two -alternatives, because only one is ever used. In other words, the | character in -a conditional subpattern has a different meaning. Ignoring white space, -consider: -

    -  ^.*? (?(?=a) a | b(*THEN)c )
    -
    -If the subject is "ba", this pattern does not match. Because .*? is ungreedy, -it initially matches zero characters. The condition (?=a) then fails, the -character "b" is matched, but "c" is not. At this point, matching does not -backtrack to .*? as might perhaps be expected from the presence of the | -character. The conditional subpattern is part of the single alternative that -comprises the whole pattern, and so the match fails. (If there was a backtrack -into .*?, allowing it to match "b", the match would succeed.) -

    -

    -The verbs just described provide four different "strengths" of control when -subsequent matching fails. (*THEN) is the weakest, carrying on the match at the -next alternative. (*PRUNE) comes next, failing the match at the current -starting position, but allowing an advance to the next character (for an -unanchored pattern). (*SKIP) is similar, except that the advance may be more -than one character. (*COMMIT) is the strongest, causing the entire match to -fail. -

    -
    -More than one backtracking verb -
    -

    -If more than one backtracking verb is present in a pattern, the one that is -backtracked onto first acts. For example, consider this pattern, where A, B, -etc. are complex pattern fragments: -

    -  (A(*COMMIT)B(*THEN)C|ABD)
    -
    -If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to -fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes -the next alternative (ABD) to be tried. This behaviour is consistent, but is -not always the same as Perl's. It means that if two or more backtracking verbs -appear in succession, all but the last of them have no effect. Consider this -example: -
    -  ...(*COMMIT)(*PRUNE)...
    -
    -If there is a matching failure to the right, backtracking onto (*PRUNE) causes -it to be triggered, and its action is taken. There can never be a backtrack -onto (*COMMIT). -

    -
    -Backtracking verbs in repeated groups -
    -

    -PCRE differs from Perl in its handling of backtracking verbs in repeated -groups. For example, consider: -

    -  /(a(*COMMIT)b)+ac/
    -
    -If the subject is "abac", Perl matches, but PCRE fails because the (*COMMIT) in -the second repeat of the group acts. -

    -
    -Backtracking verbs in assertions -
    -

    -(*FAIL) in an assertion has its normal effect: it forces an immediate backtrack. -

    -

    -(*ACCEPT) in a positive assertion causes the assertion to succeed without any -further processing. In a negative assertion, (*ACCEPT) causes the assertion to -fail without any further processing. -

    -

    -The other backtracking verbs are not treated specially if they appear in a -positive assertion. In particular, (*THEN) skips to the next alternative in the -innermost enclosing group that has alternations, whether or not this is within -the assertion. -

    -

    -Negative assertions are, however, different, in order to ensure that changing a -positive assertion into a negative assertion changes its result. Backtracking -into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, -without considering any further alternative branches in the assertion. -Backtracking into (*THEN) causes it to skip to the next enclosing alternative -within the assertion (the normal behaviour), but if the assertion does not have -such an alternative, (*THEN) behaves like (*PRUNE). -

    -
    -Backtracking verbs in subroutines -
    -

    -These behaviours occur whether or not the subpattern is called recursively. -Perl's treatment of subroutines is different in some cases. -

    -

    -(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces -an immediate backtrack. -

    -

    -(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to -succeed without any further processing. Matching then continues after the -subroutine call. -

    -

    -(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause -the subroutine match to fail. -

    -

    -(*THEN) skips to the next alternative in the innermost enclosing group within -the subpattern that has alternatives. If there is no such group within the -subpattern, (*THEN) causes the subroutine match to fail. -

    -
    SEE ALSO
    -

    -pcreapi(3), pcrecallout(3), pcrematching(3), -pcresyntax(3), pcre(3), pcre16(3), pcre32(3). -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 23 October 2016 -
    -Copyright © 1997-2016 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcreperform.html b/src/pcre/doc/html/pcreperform.html deleted file mode 100644 index dda207f9..00000000 --- a/src/pcre/doc/html/pcreperform.html +++ /dev/null @@ -1,195 +0,0 @@ - - -pcreperform specification - - -

    pcreperform man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -PCRE PERFORMANCE -
    -

    -Two aspects of performance are discussed below: memory usage and processing -time. The way you express your pattern as a regular expression can affect both -of them. -

    -
    -COMPILED PATTERN MEMORY USAGE -
    -

    -Patterns are compiled by PCRE into a reasonably efficient interpretive code, so -that most simple patterns do not use much memory. However, there is one case -where the memory usage of a compiled pattern can be unexpectedly large. If a -parenthesized subpattern has a quantifier with a minimum greater than 1 and/or -a limited maximum, the whole subpattern is repeated in the compiled code. For -example, the pattern -

    -  (abc|def){2,4}
    -
    -is compiled as if it were -
    -  (abc|def)(abc|def)((abc|def)(abc|def)?)?
    -
    -(Technical aside: It is done this way so that backtrack points within each of -the repetitions can be independently maintained.) -
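That the expanded form accepts the same strings can be checked with Python's standard re module, used here only as a stand-in for PCRE. The helper same_language() is an illustrative assumption; it compares the two patterns on every concatenation of up to five "abc"/"def" blocks.

```python
import re
from itertools import product

compact = re.compile(r"(abc|def){2,4}")
expanded = re.compile(r"(abc|def)(abc|def)((abc|def)(abc|def)?)?")


def same_language(max_repeats: int = 5) -> bool:
    """Compare both patterns on all block sequences up to max_repeats long."""
    for n in range(max_repeats + 1):
        for parts in product(("abc", "def"), repeat=n):
            text = "".join(parts)
            if bool(compact.fullmatch(text)) != bool(expanded.fullmatch(text)):
                return False
    return True
```

The check says nothing about memory use; it only confirms that writing the repetition out by hand does not change what is matched.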

    -

    -For regular expressions whose quantifiers use only small numbers, this is not -usually a problem. However, if the numbers are large, and particularly if such -repetitions are nested, the memory usage can become an embarrassment. For -example, the very simple pattern -

    -  ((ab){1,1000}c){1,3}
    -
    -uses 51K bytes when compiled using the 8-bit library. When PCRE is compiled -with its default internal pointer size of two bytes, the size limit on a -compiled pattern is 64K data units, and this is reached with the above pattern -if the outer repetition is increased from 3 to 4. PCRE can be compiled to use -larger internal pointers and thus handle larger compiled patterns, but it is -better to try to rewrite your pattern to use less memory if you can. -

    -

    -One way of reducing the memory usage for such patterns is to make use of PCRE's -"subroutine" -facility. Re-writing the above pattern as -

    -  ((ab)(?2){0,999}c)(?1){0,2}
    -
    -reduces the memory requirements to 18K, and indeed it remains under 20K even -with the outer repetition increased to 100. However, this pattern is not -exactly equivalent, because the "subroutine" calls are treated as -atomic groups -into which there can be no backtracking if there is a subsequent matching -failure. Therefore, PCRE cannot do this kind of rewriting automatically. -Furthermore, there is a noticeable loss of speed when executing the modified -pattern. Nevertheless, if the atomic grouping is not a problem and the loss of -speed is acceptable, this kind of rewriting will allow you to process patterns -that PCRE cannot otherwise handle. -

    -
    -STACK USAGE AT RUN TIME -
    -

    -When pcre_exec() or pcre[16|32]_exec() is used for matching, certain -kinds of pattern can cause it to use large amounts of the process stack. In -some environments the default process stack is quite small, and if it runs out -the result is often SIGSEGV. This issue is probably the most frequently raised -problem with PCRE. Rewriting your pattern can often help. The -pcrestack -documentation discusses this issue in detail. -

    -
    -PROCESSING TIME -
    -

    -Certain items in regular expression patterns are processed more efficiently -than others. It is more efficient to use a character class like [aeiou] than a -set of single-character alternatives such as (a|e|i|o|u). In general, the -simplest construction that provides the required behaviour is usually the most -efficient. Jeffrey Friedl's book contains a lot of useful general discussion -about optimizing regular expressions for efficient performance. This document -contains a few observations about PCRE. -

    -

    -Using Unicode character properties (the \p, \P, and \X escapes) is slow, -because PCRE has to use a multi-stage table lookup whenever it needs a -character's property. If you can find an alternative pattern that does not use -character properties, it will probably be faster. -

    -

    -By default, the escape sequences \b, \d, \s, and \w, and the POSIX -character classes such as [:alpha:] do not use Unicode properties, partly for -backwards compatibility, and partly for performance reasons. However, you can -set PCRE_UCP if you want Unicode character properties to be used. This can -double the matching time for items such as \d, when matched with -a traditional matching function; the performance loss is less with -a DFA matching function, and in both cases there is not much difference for -\b. -

    -

    -When a pattern begins with .* not in parentheses, or in parentheses that are -not the subject of a backreference, and the PCRE_DOTALL option is set, the -pattern is implicitly anchored by PCRE, since it can match only at the start of -a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this -optimization, because the . metacharacter does not then match a newline, and if -the subject string contains newlines, the pattern may match from the character -immediately following one of them instead of from the very start. For example, -the pattern -

    -  .*second
    -
    -matches the subject "first\nand second" (where \n stands for a newline -character), with the match starting at the seventh character. In order to do -this, PCRE has to retry the match starting after every newline in the subject. -
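The same behaviour can be observed with Python's standard re module, which treats the dot and its DOTALL flag in the same way for this example. This is only an analogy; it does not show PCRE's internal anchoring optimization.

```python
import re

subject = "first\nand second"

# Without DOTALL the dot cannot cross the newline, so the match has to
# start after it, at offset 6 (the seventh character).
without_dotall = re.search(r".*second", subject)

# With DOTALL the dot matches the newline and the match starts at offset 0.
with_dotall = re.search(r".*second", subject, re.DOTALL)
```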

    -

    -If you are using such a pattern with subject strings that do not contain -newlines, the best performance is obtained by setting PCRE_DOTALL, or starting -the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE -from having to scan along the subject looking for a newline to restart at. -

    -

    -Beware of patterns that contain nested indefinite repeats. These can take a -long time to run when applied to a string that does not match. Consider the -pattern fragment -

    -  ^(a+)*
    -
    -This can match "aaaa" in 16 different ways, and this number increases very -rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4 -times, and for each of those cases other than 0 or 4, the + repeats can match -different numbers of times.) When the remainder of the pattern is such that the -entire match is going to fail, PCRE has in principle to try every possible -variation, and this can take an extremely long time, even for relatively short -strings. -

    -

    -An optimization catches some of the more simple cases such as -

    -  (a+)*b
    -
    -where a literal character follows. Before embarking on the standard matching -procedure, PCRE checks that there is a "b" later in the subject string, and if -there is not, it fails the match immediately. However, when there is no -following literal this optimization cannot be used. You can see the difference -by comparing the behaviour of -
    -  (a+)*\d
    -
    -with the pattern above. The pattern above, (a+)*b, gives a failure almost instantly when -applied to a whole line of "a" characters, whereas (a+)*\d takes an -appreciable time with strings longer than about 20 characters. -

    -

    -In many cases, the solution to this kind of performance issue is to use an -atomic group or a possessive quantifier. -

    -
    -AUTHOR -
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    -REVISION -
    -

    -Last updated: 25 August 2012 -
    -Copyright © 1997-2012 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcreposix.html b/src/pcre/doc/html/pcreposix.html deleted file mode 100644 index 18924cf7..00000000 --- a/src/pcre/doc/html/pcreposix.html +++ /dev/null @@ -1,290 +0,0 @@ - - -pcreposix specification - - -

    pcreposix man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SYNOPSIS
    -

    -#include <pcreposix.h> -

    -

    -int regcomp(regex_t *preg, const char *pattern, - int cflags); -
    -
    -int regexec(regex_t *preg, const char *string, - size_t nmatch, regmatch_t pmatch[], int eflags); -
    -
    -size_t regerror(int errcode, const regex_t *preg, - char *errbuf, size_t errbuf_size); -
    -
    -void regfree(regex_t *preg); -

    -
    DESCRIPTION
    -

    -This set of functions provides a POSIX-style API for the PCRE regular -expression 8-bit library. See the -pcreapi -documentation for a description of PCRE's native API, which contains much -additional functionality. There is no POSIX-style wrapper for PCRE's 16-bit -and 32-bit libraries. -

    -

    -The functions described here are just wrapper functions that ultimately call -the PCRE native API. Their prototypes are defined in the pcreposix.h -header file, and on Unix systems the library itself is called -pcreposix.a, so can be accessed by adding -lpcreposix to the -command for linking an application that uses them. Because the POSIX functions -call the native ones, it is also necessary to add -lpcre. -

    -

    -I have implemented only those POSIX option bits that can be reasonably mapped -to PCRE native options. In addition, the option REG_EXTENDED is defined with -the value zero. This has no effect, but since programs that are written to the -POSIX interface often use it, this makes it easier to slot in PCRE as a -replacement library. Other POSIX options are not even defined. -

    -

    -There are also some other options that are not defined by POSIX. These have -been added at the request of users who want to make use of certain -PCRE-specific features via the POSIX calling interface. -

    -

    -When PCRE is called via these functions, it is only the API that is POSIX-like -in style. The syntax and semantics of the regular expressions themselves are -still those of Perl, subject to the setting of various PCRE options, as -described below. "POSIX-like in style" means that the API approximates to the -POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding -domains it is probably even less compatible. -

    -

    -The header for these functions is supplied as pcreposix.h to avoid any -potential clash with other POSIX libraries. It can, of course, be renamed or -aliased as regex.h, which is the "correct" name. It provides two -structure types, regex_t for compiled internal forms, and -regmatch_t for returning captured substrings. It also defines some -constants whose names start with "REG_"; these are used for setting options and -identifying error codes. -

    -
    COMPILING A PATTERN
    -

    -The function regcomp() is called to compile a pattern into an -internal form. The pattern is a C string terminated by a binary zero, and -is passed in the argument pattern. The preg argument is a pointer -to a regex_t structure that is used as a base for storing information -about the compiled regular expression. -

    -

    -The argument cflags is either zero, or contains one or more of the bits -defined by the following macros: -

    -  REG_DOTALL
    -
    -The PCRE_DOTALL option is set when the regular expression is passed for -compilation to the native function. Note that REG_DOTALL is not part of the -POSIX standard. -
    -  REG_ICASE
    -
    -The PCRE_CASELESS option is set when the regular expression is passed for -compilation to the native function. -
    -  REG_NEWLINE
    -
    -The PCRE_MULTILINE option is set when the regular expression is passed for -compilation to the native function. Note that this does not mimic the -defined POSIX behaviour for REG_NEWLINE (see the following section). -
    -  REG_NOSUB
    -
    -The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed -for compilation to the native function. In addition, when a pattern that is -compiled with this flag is passed to regexec() for matching, the -nmatch and pmatch arguments are ignored, and no captured strings -are returned. -
    -  REG_UCP
    -
    -The PCRE_UCP option is set when the regular expression is passed for -compilation to the native function. This causes PCRE to use Unicode properties -when matching \d, \w, etc., instead of just recognizing ASCII values. Note -that REG_UCP is not part of the POSIX standard. -
    -  REG_UNGREEDY
    -
    -The PCRE_UNGREEDY option is set when the regular expression is passed for -compilation to the native function. Note that REG_UNGREEDY is not part of the -POSIX standard. -
    -  REG_UTF8
    -
    -The PCRE_UTF8 option is set when the regular expression is passed for -compilation to the native function. This causes the pattern itself and all data -strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8 -is not part of the POSIX standard. -

    -

    -In the absence of these flags, no options are passed to the native function. -This means the regex is compiled with PCRE default semantics. In -particular, the way it handles newline characters in the subject string is the -Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only -some of the effects specified for REG_NEWLINE. It does not affect the way -newlines are matched by . (they are not) or by a negative class such as [^a] -(they are). -

    -

    -The yield of regcomp() is zero on success, and non-zero otherwise. The -preg structure is filled in on success, and one member of the structure -is public: re_nsub contains the number of capturing subpatterns in -the regular expression. Various error codes are defined in the header file. -

    -

    -NOTE: If the yield of regcomp() is non-zero, you must not attempt to -use the contents of the preg structure. If, for example, you pass it to -regexec(), the result is undefined and your program is likely to crash. -

    -
    MATCHING NEWLINE CHARACTERS
    -

    -This area is not simple, because POSIX and Perl take different views of things. -It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never -intended to be a POSIX engine. The following table lists the different -possibilities for matching newline characters in PCRE: -

    -                          Default   Change with
    -
    -  . matches newline          no     PCRE_DOTALL
    -  newline matches [^a]       yes    not changeable
    -  $ matches \n at end        yes    PCRE_DOLLAR_ENDONLY
    -  $ matches \n in middle     no     PCRE_MULTILINE
    -  ^ matches \n in middle     no     PCRE_MULTILINE
    -
    -This is the equivalent table for POSIX: -
    -                          Default   Change with
    -
    -  . matches newline          yes    REG_NEWLINE
    -  newline matches [^a]       yes    REG_NEWLINE
    -  $ matches \n at end        no     REG_NEWLINE
    -  $ matches \n in middle     no     REG_NEWLINE
    -  ^ matches \n in middle     no     REG_NEWLINE
    -
    -PCRE's behaviour is the same as Perl's, except that there is no equivalent for -PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop -newline from matching [^a]. -

    -

    -The default POSIX newline handling can be obtained by setting PCRE_DOTALL and -PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the -REG_NEWLINE action. -

    -
    MATCHING A PATTERN
    -

    -The function regexec() is called to match a compiled pattern preg -against a given string, which is by default terminated by a zero byte -(but see REG_STARTEND below), subject to the options in eflags. These can -be: -

    -  REG_NOTBOL
    -
    -The PCRE_NOTBOL option is set when calling the underlying PCRE matching -function. -
    -  REG_NOTEMPTY
    -
    -The PCRE_NOTEMPTY option is set when calling the underlying PCRE matching -function. Note that REG_NOTEMPTY is not part of the POSIX standard. However, -setting this option can give more POSIX-like behaviour in some situations. -
    -  REG_NOTEOL
    -
    -The PCRE_NOTEOL option is set when calling the underlying PCRE matching -function. -
    -  REG_STARTEND
    -
    -The string is considered to start at string + pmatch[0].rm_so and -to have a terminating NUL located at string + pmatch[0].rm_eo -(there need not actually be a NUL at that location), regardless of the value of -nmatch. This is a BSD extension, compatible with but not specified by -IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software -intended to be portable to other systems. Note that a non-zero rm_so does -not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not -how it is matched. -

    -

    -If the pattern was compiled with the REG_NOSUB flag, no data about any matched -strings is returned. The nmatch and pmatch arguments of -regexec() are ignored. -

    -

    -If the value of nmatch is zero, or if the value pmatch is NULL, -no data about any matched strings is returned. -

    -

    -Otherwise, the portion of the string that was matched, and also any captured -substrings, are returned via the pmatch argument, which points to an -array of nmatch structures of type regmatch_t, containing the -members rm_so and rm_eo. These contain the offset to the first -character of each substring and the offset to the first character after the end -of each substring, respectively. The 0th element of the vector relates to the -entire portion of string that was matched; subsequent elements relate to -the capturing subpatterns of the regular expression. Unused entries in the -array have both structure members set to -1. -
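The offsets described above can be turned into substrings as in the following minimal sketch. It is an illustration only: copy_capture() and MAX_GROUPS are hypothetical names, and the code includes the system <regex.h> so that it builds without PCRE installed. With the PCRE wrapper you would include pcreposix.h instead and link with -lpcreposix -lpcre, as described earlier.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy capture group `group` of the first match of
 * `pattern` in `subject` into `out` (truncated to out_size - 1 bytes plus a
 * terminating zero). Returns 0 on success, -1 if the pattern does not
 * compile, does not match, or the group took no part in the match. */
int copy_capture(const char *pattern, const char *subject, size_t group,
                 char *out, size_t out_size)
{
    enum { MAX_GROUPS = 10 };          /* assumed upper bound for this sketch */
    regex_t re;
    regmatch_t pmatch[MAX_GROUPS];
    int result = -1;

    if (group >= MAX_GROUPS || out_size == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                     /* preg must not be used after a failure */

    if (regexec(&re, subject, MAX_GROUPS, pmatch, 0) == 0 &&
        pmatch[group].rm_so != -1) {   /* unused entries hold -1 */
        /* rm_so is the offset of the first character of the substring,
         * rm_eo the offset of the first character after its end. */
        size_t len = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);
        if (len >= out_size)
            len = out_size - 1;
        memcpy(out, subject + pmatch[group].rm_so, len);
        out[len] = '\0';
        result = 0;
    }
    regfree(&re);
    return result;
}
```

Group 0 yields the entire matched portion; higher group numbers yield the capturing subpatterns in order.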

    -

    -A successful match yields a zero return; various error codes are defined in the -header file, of which REG_NOMATCH is the "expected" failure code. -

    -
    ERROR MESSAGES
    -

    -The regerror() function maps a non-zero error code from either -regcomp() or regexec() to a printable message. If preg is not -NULL, the error should have arisen from the use of that structure. A message -terminated by a binary zero is placed in errbuf. The length of the -message, including the zero, is limited to errbuf_size. The yield of the -function is the size of the buffer needed to hold the whole message. -
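Because the yield is the buffer size needed, a caller can size the buffer with a first call and fill it with a second. The sketch below assumes the POSIX rule that a zero errbuf_size causes errbuf to be ignored; regex_error_message() is a hypothetical helper name, and the system <regex.h> is used here in place of pcreposix.h so that the example builds without PCRE.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the full message for
 * `errcode`, or NULL if memory cannot be obtained. The caller frees it. */
char *regex_error_message(int errcode, const regex_t *preg)
{
    /* First call: nothing is written, the yield is the size needed,
     * including the terminating zero. */
    size_t needed = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(needed);

    /* Second call: the buffer is now large enough for the whole message. */
    if (msg != NULL)
        regerror(errcode, preg, msg, needed);
    return msg;
}
```

A fixed-size buffer is simpler when a truncated message is acceptable; the two-call form is only worth it when the whole text must be preserved.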

    -
    MEMORY USAGE
    -

    -Compiling a regular expression causes memory to be allocated and associated -with the preg structure. The function regfree() frees all such -memory, after which preg may no longer be used as a compiled expression. -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 09 January 2012 -
    -Copyright © 1997-2012 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcreprecompile.html b/src/pcre/doc/html/pcreprecompile.html deleted file mode 100644 index decb1d6c..00000000 --- a/src/pcre/doc/html/pcreprecompile.html +++ /dev/null @@ -1,163 +0,0 @@ - - -pcreprecompile specification - - -

    pcreprecompile man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SAVING AND RE-USING PRECOMPILED PCRE PATTERNS
    -

    -If you are running an application that uses a large number of regular -expression patterns, it may be useful to store them in a precompiled form -instead of having to compile them every time the application is run. -If you are not using any private character tables (see the -pcre_maketables() -documentation), this is relatively straightforward. If you are using private -tables, it is a little bit more complicated. However, if you are using the -just-in-time optimization feature, it is not possible to save and reload the -JIT data. -

    -

    -If you save compiled patterns to a file, you can copy them to a different host -and run them there. If the two hosts have different endianness (byte order), -you should run the pcre[16|32]_pattern_to_host_byte_order() function on the -new host before trying to match the pattern. The matching functions return -PCRE_ERROR_BADENDIANNESS if they detect a pattern with the wrong endianness. -

    -

    -Compiling regular expressions with one version of PCRE for use with a different -version is not guaranteed to work and may cause crashes, and saving and -restoring a compiled pattern loses any JIT optimization data. -

    -
    SAVING A COMPILED PATTERN
    -

    -The value returned by pcre[16|32]_compile() points to a single block of -memory that holds the compiled pattern and associated data. You can find the -length of this block in bytes by calling pcre[16|32]_fullinfo() with an -argument of PCRE_INFO_SIZE. You can then save the data in any appropriate -manner. Here is sample code for the 8-bit library that compiles a pattern and -writes it to a file. It assumes that the variable fd refers to a file -that is open for output: -

    -  int erroroffset, rc;
    -  size_t size;
    -  const char *error;
    -  pcre *re;
    -
    -  re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
    -  if (re == NULL) { ... handle errors ... }
    -  rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
    -  if (rc < 0) { ... handle errors ... }
    -  if (fwrite(re, 1, size, fd) != size) { ... handle errors ... }
    -
    -In this example, the bytes that comprise the compiled pattern are copied -exactly. Note that this is binary data that may contain any of the 256 possible -byte values. On systems that make a distinction between binary and non-binary -data, be sure that the file is opened for binary output. -

    -

    -If you want to write more than one pattern to a file, you will have to devise a -way of separating them. For binary data, preceding each pattern with its length -is probably the most straightforward approach. Another possibility is to write -out the data in hexadecimal instead of binary, one pattern to a line. -
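The length-prefix approach might look like the following sketch. It treats each compiled pattern as an opaque block of bytes, so it does not call PCRE at all; write_record() and read_record() are hypothetical names, and the 4-byte prefix in host byte order is an assumption of this example, not something PCRE prescribes.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write one record: a 4-byte length (host byte order), then the bytes.
 * Returns 0 on success, -1 on a short write. */
int write_record(FILE *f, const void *data, uint32_t size)
{
    if (fwrite(&size, sizeof size, 1, f) != 1)
        return -1;
    if (size > 0 && fwrite(data, 1, size, f) != size)
        return -1;
    return 0;
}

/* Read the next record into a malloc'd block and store its length in
 * *size_out. Returns NULL at end of file or on error; the caller frees. */
void *read_record(FILE *f, uint32_t *size_out)
{
    uint32_t size;
    void *block;

    if (fread(&size, sizeof size, 1, f) != 1)
        return NULL;
    block = malloc(size ? size : 1);   /* avoid a zero-sized allocation */
    if (block == NULL)
        return NULL;
    if (size > 0 && fread(block, 1, size, f) != size) {
        free(block);
        return NULL;
    }
    *size_out = size;
    return block;
}
```

The file must be opened in binary mode on systems that distinguish binary from text. Since the prefix is written in host byte order, a file moved to a host of different endianness would need the prefix converted as well as the patterns.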

    -

    -Saving compiled patterns in a file is only one possible way of storing them for -later use. They could equally well be saved in a database, or in the memory of -some daemon process that passes them via sockets to the processes that want -them. -

    -

    -If the pattern has been studied, it is also possible to save the normal study -data in a similar way to the compiled pattern itself. However, if the -PCRE_STUDY_JIT_COMPILE option was used, the just-in-time data that is created cannot -be saved because it is too dependent on the current environment. When studying -generates additional information, pcre[16|32]_study() returns a pointer to a -pcre[16|32]_extra data block. Its format is defined in the -section on matching a pattern -in the -pcreapi -documentation. The study_data field points to the binary study data, and -this is what you must save (not the pcre[16|32]_extra block itself). The -length of the study data can be obtained by calling pcre[16|32]_fullinfo() -with an argument of PCRE_INFO_STUDYSIZE. Remember to check that -pcre[16|32]_study() did return a non-NULL value before trying to save the -study data. -

    -
    RE-USING A PRECOMPILED PATTERN
    -

    -Re-using a precompiled pattern is straightforward. Having reloaded it into main -memory, called pcre[16|32]_pattern_to_host_byte_order() if necessary, you -pass its pointer to pcre[16|32]_exec() or pcre[16|32]_dfa_exec() in -the usual way. -

    -

    -However, if you passed a pointer to custom character tables when the pattern -was compiled (the tableptr argument of pcre[16|32]_compile()), you -must now pass a similar pointer to pcre[16|32]_exec() or -pcre[16|32]_dfa_exec(), because the value saved with the compiled pattern -will obviously be nonsense. A field in a pcre[16|32]_extra block is used -to pass this data, as described in the -section on matching a pattern -in the -pcreapi -documentation. -

    -

    -Warning: The tables that pcre_exec() and pcre_dfa_exec() use -must be the same as those that were used when the pattern was compiled. If this -is not the case, the behaviour is undefined. -

    -

    -If you did not provide custom character tables when the pattern was compiled, -the pointer in the compiled pattern is NULL, which causes the matching -functions to use PCRE's internal tables. Thus, you do not need to take any -special action at run time in this case. -

    -

    -If you saved study data with the compiled pattern, you need to create your own -pcre[16|32]_extra data block and set the study_data field to point -to the reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in -the flags field to indicate that study data is present. Then pass the -pcre[16|32]_extra block to the matching function in the usual way. If the -pattern was studied for just-in-time optimization, that data cannot be saved, -and so is lost by a save/restore cycle. -

    -
    COMPATIBILITY WITH DIFFERENT PCRE RELEASES
    -

    -In general, it is safest to recompile all saved patterns when you update to a -new PCRE release, though not all updates actually require this. -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 12 November 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcresample.html b/src/pcre/doc/html/pcresample.html deleted file mode 100644 index aca9184e..00000000 --- a/src/pcre/doc/html/pcresample.html +++ /dev/null @@ -1,110 +0,0 @@ - - -pcresample specification - - -

    pcresample man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -PCRE SAMPLE PROGRAM -
    -

    -A simple, complete demonstration program, to get you started with using PCRE, -is supplied in the file pcredemo.c in the PCRE distribution. A listing of -this program is given in the -pcredemo -documentation. If you do not have a copy of the PCRE distribution, you can save -this listing to re-create pcredemo.c. -

    -

    -The demonstration program, which uses the original PCRE 8-bit library, compiles -the regular expression that is its first argument, and matches it against the -subject string in its second argument. No PCRE options are set, and default -character tables are used. If matching succeeds, the program outputs the -portion of the subject that matched, together with the contents of any captured -substrings. -

    -

    -If the -g option is given on the command line, the program then goes on to -check for further matches of the same regular expression in the same subject -string. The logic is a little bit tricky because of the possibility of matching -an empty string. Comments in the code explain what is going on. -

    -

    -If PCRE is installed in the standard include and library directories for your -operating system, you should be able to compile the demonstration program using -this command: -

    -  gcc -o pcredemo pcredemo.c -lpcre
    -
    -If PCRE is installed elsewhere, you may need to add additional options to the -command line. For example, on a Unix-like system that has PCRE installed in -/usr/local, you can compile the demonstration program using a command -like this: -
    -  gcc -o pcredemo -I/usr/local/include pcredemo.c -L/usr/local/lib -lpcre
    -
    -In a Windows environment, if you want to statically link the program against a -non-dll pcre.a file, you must uncomment the line that defines PCRE_STATIC -before including pcre.h, because otherwise the pcre_malloc() and -pcre_free() exported functions will be declared -__declspec(dllimport), with unwanted results. -

    -

    -Once you have compiled and linked the demonstration program, you can run simple -tests like this: -

    -  ./pcredemo 'cat|dog' 'the cat sat on the mat'
    -  ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
    -
    -Note that there is a much more comprehensive test program, called -pcretest, -which supports many more facilities for testing regular expressions and both -PCRE libraries. The -pcredemo -program is provided as a simple coding example. -

    -

    -If you try to run -pcredemo -when PCRE is not installed in the standard library directory, you may get an -error like this on some operating systems (e.g. Solaris): -

    -  ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
    -
    -This is caused by the way shared library support works on those systems. You -need to add -
    -  -R/usr/local/lib
    -
    -(for example) to the compile command to get round this problem. -

    -
    -AUTHOR -
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    -REVISION -
    -

    -Last updated: 10 January 2012 -
    -Copyright © 1997-2012 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcrestack.html b/src/pcre/doc/html/pcrestack.html deleted file mode 100644 index af6406d0..00000000 --- a/src/pcre/doc/html/pcrestack.html +++ /dev/null @@ -1,225 +0,0 @@ - - -pcrestack specification - - -

    pcrestack man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -PCRE DISCUSSION OF STACK USAGE -
    -

    -When you call pcre[16|32]_exec(), it makes use of an internal function -called match(). This calls itself recursively at branch points in the -pattern, in order to remember the state of the match so that it can back up and -try a different alternative if the first one fails. As matching proceeds deeper -and deeper into the tree of possibilities, the recursion depth increases. The -match() function is also called in other circumstances, for example, -whenever a parenthesized sub-pattern is entered, and in certain cases of -repetition. -

    -

    -Not all calls of match() increase the recursion depth; for an item such -as a* it may be called several times at the same level, after matching -different numbers of a's. Furthermore, in a number of cases where the result of -the recursive call would immediately be passed back as the result of the -current call (a "tail recursion"), the function is just restarted instead. -

    -

    -The above comments apply when pcre[16|32]_exec() is run in its normal -interpretive manner. If the pattern was studied with the -PCRE_STUDY_JIT_COMPILE option, and just-in-time compiling was successful, and -the options passed to pcre[16|32]_exec() were not incompatible, the matching -process uses the JIT-compiled code instead of the match() function. In -this case, the memory requirements are handled entirely differently. See the -pcrejit -documentation for details. -

    -

    -The pcre[16|32]_dfa_exec() function operates in an entirely different way, -and uses recursion only when there is a regular expression recursion or -subroutine call in the pattern. This includes the processing of assertion and -"once-only" subpatterns, which are handled like subroutine calls. Normally, -these are never very deep, and the limit on the complexity of -pcre[16|32]_dfa_exec() is controlled by the amount of workspace it is given. -However, it is possible to write patterns with runaway infinite recursions; -such patterns will cause pcre[16|32]_dfa_exec() to run out of stack. At -present, there is no protection against this. -

    -

    -The comments that follow do NOT apply to pcre[16|32]_dfa_exec(); they are -relevant only for pcre[16|32]_exec() without the JIT optimization. -

    -
    -Reducing pcre[16|32]_exec()'s stack usage -
    -

    -Each time that match() is actually called recursively, it uses memory -from the process stack. For certain kinds of pattern and data, very large -amounts of stack may be needed, despite the recognition of "tail recursion". -You can often reduce the amount of recursion, and therefore the amount of stack -used, by modifying the pattern that is being matched. Consider, for example, -this pattern: -

    -  ([^<]|<(?!inet))+
    -
    -It matches from wherever it starts until it encounters "<inet" or the end of -the data, and is the kind of pattern that might be used when processing an XML -file. Each iteration of the outer parentheses matches either one character that -is not "<" or a "<" that is not followed by "inet". However, each time a -parenthesis is processed, a recursion occurs, so this formulation uses a stack -frame for each matched character. For a long string, a lot of stack is -required. Consider now this rewritten pattern, which matches exactly the same -strings: -
    -  ([^<]++|<(?!inet))+
    -
    -This uses very much less stack, because runs of characters that do not contain -"<" are "swallowed" in one item inside the parentheses. Recursion happens only -when a "<" character that is not followed by "inet" is encountered (and we -assume this is relatively rare). A possessive quantifier is used to stop any -backtracking into the runs of non-"<" characters, but that is not related to -stack usage. -

    -

    -This example shows that one way of avoiding stack problems when matching long -subject strings is to write repeated parenthesized subpatterns to match more -than one character whenever possible. -

    -
    -Compiling PCRE to use heap instead of stack for pcre[16|32]_exec() -
    -

    -In environments where stack memory is constrained, you might want to compile -PCRE to use heap memory instead of stack for remembering back-up points when -pcre[16|32]_exec() is running. This makes it run a lot more slowly, however. -Details of how to do this are given in the -pcrebuild -documentation. When built in this way, instead of using the stack, PCRE obtains -and frees memory by calling the functions that are pointed to by the -pcre[16|32]_stack_malloc and pcre[16|32]_stack_free variables. By -default, these point to malloc() and free(), but you can replace -the pointers to cause PCRE to use your own functions. Since the block sizes are -always the same, and are always freed in reverse order, it may be possible to -implement customized memory handlers that are more efficient than the standard -functions. -

    -
    -Limiting pcre[16|32]_exec()'s stack usage -
    -

    -You can set limits on the number of times that match() is called, both in -total and recursively. If a limit is exceeded, pcre[16|32]_exec() returns an -error code. Setting suitable limits should prevent it from running out of -stack. The default values of the limits are very large, and unlikely ever to -operate. They can be changed when PCRE is built, and they can also be set when -pcre[16|32]_exec() is called. For details of these interfaces, see the -pcrebuild -documentation and the -section on extra data for pcre[16|32]_exec() -in the -pcreapi -documentation. -

    -

    -As a very rough rule of thumb, you should reckon on about 500 bytes per -recursion. Thus, if you want to limit your stack usage to 8Mb, you should set -the limit at 16000 recursions. A 64Mb stack, on the other hand, can support -around 128000 recursions. -
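The arithmetic behind this rule of thumb can be written as a small helper. This is only a sketch: recursion_limit() is a hypothetical name, and the frame size passed in is whatever estimate you have, whether the 500-byte figure above or a measured value.

```c
#include <stddef.h>

/* Hypothetical helper: the largest recursion depth whose frames fit in
 * stack_bytes, given an estimated frame size. Returns 0 if frame_bytes is 0. */
unsigned long recursion_limit(size_t stack_bytes, size_t frame_bytes)
{
    if (frame_bytes == 0)
        return 0;
    return (unsigned long)(stack_bytes / frame_bytes);
}
```

With 8 * 1024 * 1024 bytes and 500-byte frames this gives 16777, which the text rounds down to 16000. In practice you would choose a lower value still, because the rest of the program also needs stack.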

    -

    -In Unix-like environments, the pcretest test program has a command line -option (-S) that can be used to increase the size of its stack. As long -as the stack is large enough, another option (-M) can be used to find the -smallest limits that allow a particular pattern to match a given subject -string. This is done by calling pcre[16|32]_exec() repeatedly with different -limits. -

    -
    -Obtaining an estimate of stack usage -
    -

    -The actual amount of stack used per recursion can vary quite a lot, depending -on the compiler that was used to build PCRE and the optimization or debugging -options that were set for it. The rule of thumb value of 500 bytes mentioned -above may be larger or smaller than what is actually needed. A better -approximation can be obtained by running this command: -

    -  pcretest -m -C
    -
    -The -C option causes pcretest to output information about the -options with which PCRE was compiled. When -m is also given (before --C), information about stack use is given in a line like this: -
    -  Match recursion uses stack: approximate frame size = 640 bytes
    -
    -The value is approximate because some recursions need a bit more (up to perhaps -16 more bytes). -

    -

    -If the above command is given when PCRE is compiled to use the heap instead of -the stack for recursion, the value that is output is the size of each block -that is obtained from the heap. -

    -
    -Changing stack size in Unix-like systems -
    -

    -In Unix-like environments, there is not often a problem with the stack unless -very long strings are involved, though the default limit on stack size varies -from system to system. Values from 8Mb to 64Mb are common. You can find your -default limit by running the command: -

    -  ulimit -s
    -
    -Unfortunately, the effect of running out of stack is often SIGSEGV, though -sometimes a more explicit error message is given. You can normally increase the -limit on stack size by code such as this: -
    -  struct rlimit rlim;
    -  getrlimit(RLIMIT_STACK, &rlim);
    -  rlim.rlim_cur = 100*1024*1024;
    -  setrlimit(RLIMIT_STACK, &rlim);
    -
    -This reads the current limits (soft and hard) using getrlimit(), then -attempts to increase the soft limit to 100Mb using setrlimit(). You must -do this before calling pcre[16|32]_exec(). -
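
The fragment above ignores return values and the hard limit. A slightly more defensive variant might look like the following sketch. The helper name `raise_stack_limit` is hypothetical, and clamping the request to the hard limit is an assumption about what a caller would want, not something PCRE prescribes.

```c
#include <assert.h>
#include <sys/resource.h>

/* Try to make the soft stack limit at least `wanted` bytes.
   Returns 0 on success (or if the limit is already large enough),
   -1 if getrlimit() or setrlimit() fails. Hypothetical helper. */
static int raise_stack_limit(rlim_t wanted)
{
    struct rlimit rlim;

    assert(wanted > 0);
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;

    /* Nothing to do if the soft limit is unlimited or already enough. */
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;

    /* An unprivileged process cannot raise the soft limit above the
       hard limit, so clamp the request instead of failing outright. */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;

    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim);
}
```

A program would call `raise_stack_limit((rlim_t)100 * 1024 * 1024)` early in `main()`, before any call to pcre[16|32]_exec(), and treat a non-zero return as a warning rather than a fatal error.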

    -
    -Changing stack size in Mac OS X -
    -

    -Using setrlimit(), as described above, should also work on Mac OS X. It -is also possible to set a stack size when linking a program. There is a -discussion about stack sizes in Mac OS X at this web site: -http://developer.apple.com/qa/qa2005/qa1419.html. -

    -
    -AUTHOR -
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    -REVISION -
    -

    -Last updated: 24 June 2012 -
    -Copyright © 1997-2012 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcresyntax.html b/src/pcre/doc/html/pcresyntax.html deleted file mode 100644 index 5896b9e0..00000000 --- a/src/pcre/doc/html/pcresyntax.html +++ /dev/null @@ -1,561 +0,0 @@ - - -pcresyntax specification - - -

    pcresyntax man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    PCRE REGULAR EXPRESSION SYNTAX SUMMARY
    -

    -The full syntax and semantics of the regular expressions that are supported by -PCRE are described in the -pcrepattern -documentation. This document contains a quick-reference summary of the syntax. -

    -
    QUOTING
    -

    -

    -  \x         where x is non-alphanumeric is a literal x
    -  \Q...\E    treat enclosed characters as literal
    -
    -

    -
    CHARACTERS
    -

    -

    -  \a         alarm, that is, the BEL character (hex 07)
    -  \cx        "control-x", where x is any ASCII character
    -  \e         escape (hex 1B)
    -  \f         form feed (hex 0C)
    -  \n         newline (hex 0A)
    -  \r         carriage return (hex 0D)
    -  \t         tab (hex 09)
    -  \0dd       character with octal code 0dd
    -  \ddd       character with octal code ddd, or backreference
    -  \o{ddd..}  character with octal code ddd..
    -  \xhh       character with hex code hh
    -  \x{hhh..}  character with hex code hhh..
    -
    -Note that \0dd is always an octal code, and that \8 and \9 are the literal -characters "8" and "9". -

    -
    CHARACTER TYPES
    -

    -

    -  .          any character except newline;
    -               in dotall mode, any character whatsoever
    -  \C         one data unit, even in UTF mode (best avoided)
    -  \d         a decimal digit
    -  \D         a character that is not a decimal digit
    -  \h         a horizontal white space character
    -  \H         a character that is not a horizontal white space character
    -  \N         a character that is not a newline
    -  \p{xx}     a character with the xx property
    -  \P{xx}     a character without the xx property
    -  \R         a newline sequence
    -  \s         a white space character
    -  \S         a character that is not a white space character
    -  \v         a vertical white space character
    -  \V         a character that is not a vertical white space character
    -  \w         a "word" character
    -  \W         a "non-word" character
    -  \X         a Unicode extended grapheme cluster
    -
    -By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode -or in the 16-bit and 32-bit libraries. However, if locale-specific matching is -happening, \s and \w may also match characters with code points in the range -128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences -is changed to use Unicode properties and they match many more characters. -

    -
    GENERAL CATEGORY PROPERTIES FOR \p and \P
    -

    -

    -  C          Other
    -  Cc         Control
    -  Cf         Format
    -  Cn         Unassigned
    -  Co         Private use
    -  Cs         Surrogate
    -
    -  L          Letter
    -  Ll         Lower case letter
    -  Lm         Modifier letter
    -  Lo         Other letter
    -  Lt         Title case letter
    -  Lu         Upper case letter
    -  L&         Ll, Lu, or Lt
    -
    -  M          Mark
    -  Mc         Spacing mark
    -  Me         Enclosing mark
    -  Mn         Non-spacing mark
    -
    -  N          Number
    -  Nd         Decimal number
    -  Nl         Letter number
    -  No         Other number
    -
    -  P          Punctuation
    -  Pc         Connector punctuation
    -  Pd         Dash punctuation
    -  Pe         Close punctuation
    -  Pf         Final punctuation
    -  Pi         Initial punctuation
    -  Po         Other punctuation
    -  Ps         Open punctuation
    -
    -  S          Symbol
    -  Sc         Currency symbol
    -  Sk         Modifier symbol
    -  Sm         Mathematical symbol
    -  So         Other symbol
    -
    -  Z          Separator
    -  Zl         Line separator
    -  Zp         Paragraph separator
    -  Zs         Space separator
    -
    -

    -
    PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P
    -

    -

    -  Xan        Alphanumeric: union of properties L and N
    -  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
    -  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
    -  Xuc        Universally-named character: one that can be
    -               represented by a Universal Character Name
    -  Xwd        Perl word: property Xan or underscore
    -
    -Perl and POSIX space are now the same. Perl added VT to its space character set -at release 5.18 and PCRE changed at release 8.34. -

    -
    SCRIPT NAMES FOR \p AND \P
    -

    -Arabic, -Armenian, -Avestan, -Balinese, -Bamum, -Bassa_Vah, -Batak, -Bengali, -Bopomofo, -Brahmi, -Braille, -Buginese, -Buhid, -Canadian_Aboriginal, -Carian, -Caucasian_Albanian, -Chakma, -Cham, -Cherokee, -Common, -Coptic, -Cuneiform, -Cypriot, -Cyrillic, -Deseret, -Devanagari, -Duployan, -Egyptian_Hieroglyphs, -Elbasan, -Ethiopic, -Georgian, -Glagolitic, -Gothic, -Grantha, -Greek, -Gujarati, -Gurmukhi, -Han, -Hangul, -Hanunoo, -Hebrew, -Hiragana, -Imperial_Aramaic, -Inherited, -Inscriptional_Pahlavi, -Inscriptional_Parthian, -Javanese, -Kaithi, -Kannada, -Katakana, -Kayah_Li, -Kharoshthi, -Khmer, -Khojki, -Khudawadi, -Lao, -Latin, -Lepcha, -Limbu, -Linear_A, -Linear_B, -Lisu, -Lycian, -Lydian, -Mahajani, -Malayalam, -Mandaic, -Manichaean, -Meetei_Mayek, -Mende_Kikakui, -Meroitic_Cursive, -Meroitic_Hieroglyphs, -Miao, -Modi, -Mongolian, -Mro, -Myanmar, -Nabataean, -New_Tai_Lue, -Nko, -Ogham, -Ol_Chiki, -Old_Italic, -Old_North_Arabian, -Old_Permic, -Old_Persian, -Old_South_Arabian, -Old_Turkic, -Oriya, -Osmanya, -Pahawh_Hmong, -Palmyrene, -Pau_Cin_Hau, -Phags_Pa, -Phoenician, -Psalter_Pahlavi, -Rejang, -Runic, -Samaritan, -Saurashtra, -Sharada, -Shavian, -Siddham, -Sinhala, -Sora_Sompeng, -Sundanese, -Syloti_Nagri, -Syriac, -Tagalog, -Tagbanwa, -Tai_Le, -Tai_Tham, -Tai_Viet, -Takri, -Tamil, -Telugu, -Thaana, -Thai, -Tibetan, -Tifinagh, -Tirhuta, -Ugaritic, -Vai, -Warang_Citi, -Yi. -

    -
    CHARACTER CLASSES
    -

    -

    -  [...]       positive character class
    -  [^...]      negative character class
    -  [x-y]       range (can be used for hex characters)
    -  [[:xxx:]]   positive POSIX named set
    -  [[:^xxx:]]  negative POSIX named set
    -
    -  alnum       alphanumeric
    -  alpha       alphabetic
    -  ascii       0-127
    -  blank       space or tab
    -  cntrl       control character
    -  digit       decimal digit
    -  graph       printing, excluding space
    -  lower       lower case letter
    -  print       printing, including space
    -  punct       printing, excluding alphanumeric
    -  space       white space
    -  upper       upper case letter
    -  word        same as \w
    -  xdigit      hexadecimal digit
    -
    -In PCRE, POSIX character set names recognize only ASCII characters by default, -but some of them use Unicode properties if PCRE_UCP is set. You can use -\Q...\E inside a character class. -

    -
    QUANTIFIERS
    -

    -

    -  ?           0 or 1, greedy
    -  ?+          0 or 1, possessive
    -  ??          0 or 1, lazy
    -  *           0 or more, greedy
    -  *+          0 or more, possessive
    -  *?          0 or more, lazy
    -  +           1 or more, greedy
    -  ++          1 or more, possessive
    -  +?          1 or more, lazy
    -  {n}         exactly n
    -  {n,m}       at least n, no more than m, greedy
    -  {n,m}+      at least n, no more than m, possessive
    -  {n,m}?      at least n, no more than m, lazy
    -  {n,}        n or more, greedy
    -  {n,}+       n or more, possessive
    -  {n,}?       n or more, lazy
    -
    -

    -
    ANCHORS AND SIMPLE ASSERTIONS
    -

    -

    -  \b          word boundary
    -  \B          not a word boundary
    -  ^           start of subject
    -               also after internal newline in multiline mode
    -  \A          start of subject
    -  $           end of subject
    -               also before newline at end of subject
    -               also before internal newline in multiline mode
    -  \Z          end of subject
    -               also before newline at end of subject
    -  \z          end of subject
    -  \G          first matching position in subject
    -
    -

    -
    MATCH POINT RESET
    -

    -

    -  \K          reset start of match
    -
    -\K is honoured in positive assertions, but ignored in negative ones. -

    -
    ALTERNATION
    -

    -

    -  expr|expr|expr...
    -
    -

    -
    CAPTURING
    -

    -

    -  (...)           capturing group
    -  (?<name>...)    named capturing group (Perl)
    -  (?'name'...)    named capturing group (Perl)
    -  (?P<name>...)   named capturing group (Python)
    -  (?:...)         non-capturing group
    -  (?|...)         non-capturing group; reset group numbers for
    -                   capturing groups in each alternative
    -
    -

    -
    ATOMIC GROUPS
    -

    -

    -  (?>...)         atomic, non-capturing group
    -
    -

    -
    COMMENT
    -

    -

    -  (?#....)        comment (not nestable)
    -
    -

    -
    OPTION SETTING
    -

    -

    -  (?i)            caseless
    -  (?J)            allow duplicate names
    -  (?m)            multiline
    -  (?s)            single line (dotall)
    -  (?U)            default ungreedy (lazy)
    -  (?x)            extended (ignore white space)
    -  (?-...)         unset option(s)
    -
    -The following are recognized only at the very start of a pattern or after one -of the newline or \R options with similar syntax. More than one of them may -appear. -
    -  (*LIMIT_MATCH=d) set the match limit to d (decimal number)
    -  (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number)
    -  (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS)
    -  (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE)
    -  (*UTF8)         set UTF-8 mode: 8-bit library (PCRE_UTF8)
    -  (*UTF16)        set UTF-16 mode: 16-bit library (PCRE_UTF16)
    -  (*UTF32)        set UTF-32 mode: 32-bit library (PCRE_UTF32)
    -  (*UTF)          set appropriate UTF mode for the library in use
    -  (*UCP)          set PCRE_UCP (use Unicode properties for \d etc)
    -
    -Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the -limits set by the caller of pcre_exec(), not increase them. -

    -
    NEWLINE CONVENTION
    -

    -These are recognized only at the very start of the pattern or after option -settings with a similar syntax. -

    -  (*CR)           carriage return only
    -  (*LF)           linefeed only
    -  (*CRLF)         carriage return followed by linefeed
    -  (*ANYCRLF)      all three of the above
    -  (*ANY)          any Unicode newline sequence
    -
    -

    -
    WHAT \R MATCHES
    -

    -These are recognized only at the very start of the pattern or after option -setting with a similar syntax. -

    -  (*BSR_ANYCRLF)  CR, LF, or CRLF
    -  (*BSR_UNICODE)  any Unicode newline sequence
    -
    -

    -
    LOOKAHEAD AND LOOKBEHIND ASSERTIONS
    -

    -

    -  (?=...)         positive look ahead
    -  (?!...)         negative look ahead
    -  (?<=...)        positive look behind
    -  (?<!...)        negative look behind
    -
    -Each top-level branch of a look behind must be of a fixed length. -

    -
    BACKREFERENCES
    -

    -

    -  \n              reference by number (can be ambiguous)
    -  \gn             reference by number
    -  \g{n}           reference by number
    -  \g{-n}          relative reference by number
    -  \k<name>        reference by name (Perl)
    -  \k'name'        reference by name (Perl)
    -  \g{name}        reference by name (Perl)
    -  \k{name}        reference by name (.NET)
    -  (?P=name)       reference by name (Python)
    -
    -

    -
    SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
    -

    -

    -  (?R)            recurse whole pattern
    -  (?n)            call subpattern by absolute number
    -  (?+n)           call subpattern by relative number
    -  (?-n)           call subpattern by relative number
    -  (?&name)        call subpattern by name (Perl)
    -  (?P>name)       call subpattern by name (Python)
    -  \g<name>        call subpattern by name (Oniguruma)
    -  \g'name'        call subpattern by name (Oniguruma)
    -  \g<n>           call subpattern by absolute number (Oniguruma)
    -  \g'n'           call subpattern by absolute number (Oniguruma)
    -  \g<+n>          call subpattern by relative number (PCRE extension)
    -  \g'+n'          call subpattern by relative number (PCRE extension)
    -  \g<-n>          call subpattern by relative number (PCRE extension)
    -  \g'-n'          call subpattern by relative number (PCRE extension)
    -
    -

    -
    CONDITIONAL PATTERNS
    -

    -

    -  (?(condition)yes-pattern)
    -  (?(condition)yes-pattern|no-pattern)
    -
    -  (?(n)...        absolute reference condition
    -  (?(+n)...       relative reference condition
    -  (?(-n)...       relative reference condition
    -  (?(<name>)...   named reference condition (Perl)
    -  (?('name')...   named reference condition (Perl)
    -  (?(name)...     named reference condition (PCRE)
    -  (?(R)...        overall recursion condition
    -  (?(Rn)...       specific group recursion condition
    -  (?(R&name)...   specific recursion condition
    -  (?(DEFINE)...   define subpattern for reference
    -  (?(assert)...   assertion condition
    -
    -

    -
    BACKTRACKING CONTROL
    -

    -The following act immediately they are reached: -

    -  (*ACCEPT)       force successful match
    -  (*FAIL)         force backtrack; synonym (*F)
    -  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
    -
    -The following act only when a subsequent match failure causes a backtrack to -reach them. They all force a match failure, but they differ in what happens -afterwards. Those that advance the start-of-match point do so only if the -pattern is not anchored. -
    -  (*COMMIT)       overall failure, no advance of starting point
    -  (*PRUNE)        advance to next starting character
    -  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
    -  (*SKIP)         advance to current matching position
    -  (*SKIP:NAME)    advance to position corresponding to an earlier
    -                  (*MARK:NAME); if not found, the (*SKIP) is ignored
    -  (*THEN)         local failure, backtrack to next alternation
    -  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
    -
    -

    -
    CALLOUTS
    -

    -

    -  (?C)      callout
    -  (?Cn)     callout with data n
    -
    -

    -
    SEE ALSO
    -

    -pcrepattern(3), pcreapi(3), pcrecallout(3), -pcrematching(3), pcre(3). -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 08 January 2014 -
    -Copyright © 1997-2014 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcretest.html b/src/pcre/doc/html/pcretest.html deleted file mode 100644 index ba540d3c..00000000 --- a/src/pcre/doc/html/pcretest.html +++ /dev/null @@ -1,1163 +0,0 @@ - - -pcretest specification - - -

    pcretest man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -

    -
    SYNOPSIS
    -

    -pcretest [options] [input file [output file]] -
    -
    -pcretest was written as a test program for the PCRE regular expression -library itself, but it can also be used for experimenting with regular -expressions. This document describes the features of the test program; for -details of the regular expressions themselves, see the -pcrepattern -documentation. For details of the PCRE library function calls and their -options, see the -pcreapi -, -pcre16 -and -pcre32 -documentation. -

    -

    -The input for pcretest is a sequence of regular expression patterns and -strings to be matched, as described below. The output shows the result of each -match. Options on the command line and the patterns control PCRE options and -exactly what is output. -

    -

    -As PCRE has evolved, it has acquired many different features, and as a result, -pcretest now has rather a lot of obscure options for testing every -possible feature. Some of these options are specifically designed for use in -conjunction with the test script and data files that are distributed as part of -PCRE, and are unlikely to be of use otherwise. They are all documented here, -but without much justification. -

    -
    INPUT DATA FORMAT
    -

    -Input to pcretest is processed line by line, either by calling the C -library's fgets() function, or via the libreadline library (see -below). In Unix-like environments, fgets() treats any bytes other than -newline as data characters. However, in some Windows environments character 26 -(hex 1A) causes an immediate end of file, and no further data is read. For -maximum portability, therefore, it is safest to use only ASCII characters in -pcretest input files. -

    -

    -The input is processed using C's string functions, so must not -contain binary zeroes, even though in Unix-like environments, fgets() -treats any bytes other than newline as data characters. -

    -
    PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
    -

    -From release 8.30, two separate PCRE libraries can be built. The original one -supports 8-bit character strings, whereas the newer 16-bit library supports -character strings encoded in 16-bit units. From release 8.32, a third library -can be built, supporting character strings encoded in 32-bit units. The -pcretest program can be used to test all three libraries. However, it is -itself still an 8-bit program, reading 8-bit input and writing 8-bit output. -When testing the 16-bit or 32-bit library, the patterns and data strings are -converted to 16- or 32-bit format before being passed to the PCRE library -functions. Results are converted to 8-bit for output. -

    -

    -References to functions and structures of the form pcre[16|32]_xx below -mean "pcre_xx when using the 8-bit library, pcre16_xx when using -the 16-bit library, or pcre32_xx when using the 32-bit library". -

    -
    COMMAND LINE OPTIONS
    -

    --8 -If the 8-bit library has been built, this option causes the 8-bit library -to be used (which is the default); if the 8-bit library has not been built, -this option causes an error. -

    -

    --16 -If both the 8-bit or the 32-bit, and the 16-bit libraries have been built, this -option causes the 16-bit library to be used. If only the 16-bit library has been -built, this is the default (so has no effect). If only the 8-bit or the 32-bit -library has been built, this option causes an error. -

    -

    --32 -If both the 8-bit or the 16-bit, and the 32-bit libraries have been built, this -option causes the 32-bit library to be used. If only the 32-bit library has been -built, this is the default (so has no effect). If only the 8-bit or the 16-bit -library has been built, this option causes an error. -

    -

    --b -Behave as if each pattern has the /B (show byte code) modifier; the -internal form is output after compilation. -

    -

    --C -Output the version number of the PCRE library, and all available information -about the optional features that are included, and then exit with zero exit -code. All other options are ignored. -

    -

    --C option -Output information about a specific build-time option, then exit. This -functionality is intended for use in scripts such as RunTest. The -following options output the value and set the exit code as indicated: -

    -  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
    -               0x15 or 0x25
    -               0 if used in an ASCII environment
    -               exit code is always 0
    -  linksize   the configured internal link size (2, 3, or 4)
    -               exit code is set to the link size
    -  newline    the default newline setting:
    -               CR, LF, CRLF, ANYCRLF, or ANY
    -               exit code is always 0
    -  bsr        the default setting for what \R matches:
    -               ANYCRLF or ANY
    -               exit code is always 0
    -
    -The following options output 1 for true or 0 for false, and set the exit code -to the same value: -
    -  ebcdic     compiled for an EBCDIC environment
    -  jit        just-in-time support is available
    -  pcre16     the 16-bit library was built
    -  pcre32     the 32-bit library was built
    -  pcre8      the 8-bit library was built
    -  ucp        Unicode property support is available
    -  utf        UTF-8 and/or UTF-16 and/or UTF-32 support
    -               is available
    -
    -If an unknown option is given, an error message is output; the exit code is 0. -

    -

    --d -Behave as if each pattern has the /D (debug) modifier; the internal -form and information about the compiled pattern is output after compilation; --d is equivalent to -b -i. -

    -

    --dfa -Behave as if each data line contains the \D escape sequence; this causes the -alternative matching function, pcre[16|32]_dfa_exec(), to be used instead -of the standard pcre[16|32]_exec() function (more detail is given below). -

    -

    --help -Output a brief summary of these options and then exit. -

    -

    --i -Behave as if each pattern has the /I modifier; information about the -compiled pattern is given after compilation. -

    -

    --M -Behave as if each data line contains the \M escape sequence; this causes -PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by -calling pcre[16|32]_exec() repeatedly with different limits. -

    -

    --m -Output the size of each compiled pattern after it has been compiled. This is -equivalent to adding /M to each regular expression. The size is given in -bytes for both libraries. -

    -

    --O -Behave as if each pattern has the /O modifier, that is disable -auto-possessification for all patterns. -

    -

    --o osize -Set the number of elements in the output vector that is used when calling -pcre[16|32]_exec() or pcre[16|32]_dfa_exec() to be osize. The -default value is 45, which is enough for 14 capturing subexpressions for -pcre[16|32]_exec() or 22 different matches for -pcre[16|32]_dfa_exec(). -The vector size can be changed for individual matching calls by including \O -in the data line (see below). -
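
The figures quoted for the default vector size follow from how the two matching functions use the output vector: pcre[16|32]_exec() uses the first two-thirds of it for offset pairs, with pair 0 holding the whole match, while pcre[16|32]_dfa_exec() uses one pair per match. The sketch below reproduces that arithmetic. The function names are made up for illustration and are not part of pcretest or PCRE.

```c
#include <assert.h>

/* Number of capturing subexpressions whose offsets fit in an output
   vector of osize elements when calling pcre_exec(). Only a multiple
   of three is usable; two-thirds of that holds (start, end) pairs,
   and pair 0 is reserved for the overall match. */
static int exec_capture_capacity(int osize)
{
    int pairs;

    assert(osize >= 0);
    pairs = (osize - osize % 3) / 3;
    return pairs > 0 ? pairs - 1 : 0;
}

/* Number of distinct matches pcre_dfa_exec() can report in the same
   vector: each match takes one (start, end) pair. */
static int dfa_match_capacity(int osize)
{
    assert(osize >= 0);
    return osize / 2;
}
```

For the default of 45 elements this yields 14 capturing subexpressions and 22 matches, the values stated above.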

    -

    --p -Behave as if each pattern has the /P modifier; the POSIX wrapper API is -used to call PCRE. None of the other options has any effect when -p is -set. This option can be used only with the 8-bit library. -

    -

    --q -Do not output the version number of pcretest at the start of execution. -

    -

    --S size -On Unix-like systems, set the size of the run-time stack to size -megabytes. -

    -

    --s or -s+ -Behave as if each pattern has the /S modifier; in other words, force each -pattern to be studied. If -s+ is used, all the JIT compile options are -passed to pcre[16|32]_study(), causing just-in-time optimization to be set -up if it is available, for both full and partial matching. Specific JIT compile -options can be selected by following -s+ with a digit in the range 1 to -7, which selects the JIT compile modes as follows: -

    -  1  normal match only
    -  2  soft partial match only
    -  3  normal match and soft partial match
    -  4  hard partial match only
    -  6  soft and hard partial match
    -  7  all three modes (default)
    -
    -If -s++ is used instead of -s+ (with or without a following digit), -the text "(JIT)" is added to the first output line after a match or no match -when JIT-compiled code was actually used. -
    -
    -Note that there are pattern options that can override -s, either -specifying no studying at all, or suppressing JIT compilation. -
    -
    -If the /I or /D option is present on a pattern (requesting output -about the compiled pattern), information about the result of studying is not -included when studying is caused only by -s and neither -i nor --d is present on the command line. This behaviour means that the output -from tests that are run with and without -s should be identical, except -when options that output information about the actual running of a match are -set. -
    -
    -The -M, -t, and -tm options, which give information about -resources used, are likely to produce different output with and without --s. Output may also differ if the /C option is present on an -individual pattern. This uses callouts to trace the matching process, and -this may be different between studied and non-studied patterns. If the pattern -contains (*MARK) items there may also be differences, for the same reason. The --s command line option can be overridden for specific patterns that -should never be studied (see the /S pattern modifier below). -

    -

    --t -Run each compile, study, and match many times with a timer, and output the -resulting times per compile, study, or match (in milliseconds). Do not set --m with -t, because you will then get the size output a zillion -times, and the timing will be distorted. You can control the number of -iterations that are used for timing by following -t with a number (as a -separate item on the command line). For example, "-t 1000" iterates 1000 times. -The default is to iterate 500000 times. -

    -

    --tm -This is like -t except that it times only the matching phase, not the -compile or study phases. -

    -

    --T -TM -These behave like -t and -tm, but in addition, at the end of a run, -the total times for all compiles, studies, and matches are output. -

    -
    DESCRIPTION
    -

    -If pcretest is given two filename arguments, it reads from the first and -writes to the second. If it is given only one filename argument, it reads from -that file and writes to stdout. Otherwise, it reads from stdin and writes to -stdout, and prompts for each line of input, using "re>" to prompt for regular -expressions, and "data>" to prompt for data lines. -

    -

    -When pcretest is built, a configuration option can specify that it should -be linked with the libreadline library. When this is done, if the input -is from a terminal, it is read using the readline() function. This -provides line-editing and history facilities. The output from the -help -option states whether or not readline() will be used. -

    -

    -The program handles any number of sets of input on a single input file. Each -set starts with a regular expression, and continues with any number of data -lines to be matched against that pattern. -

    -

    -Each data line is matched separately and independently. If you want to do -multi-line matches, you have to use the \n escape sequence (or \r or \r\n, -etc., depending on the newline setting) in a single line of input to encode the -newline sequences. There is no limit on the length of data lines; the input -buffer is automatically extended if it is too small. -

    -

    -An empty line signals the end of the data lines, at which point a new regular -expression is read. The regular expressions are given enclosed in any -non-alphanumeric delimiters other than backslash, for example: -

    -  /(a|bc)x+yz/
    -
    -White space before the initial delimiter is ignored. A regular expression may -be continued over several input lines, in which case the newline characters are -included within it. It is possible to include the delimiter within the pattern -by escaping it, for example -
    -  /abc\/def/
    -
    -If you do so, the escape and the delimiter form part of the pattern, but since -delimiters are always non-alphanumeric, this does not affect its interpretation. -If the terminating delimiter is immediately followed by a backslash, for -example, -
    -  /abc/\
    -
    -then a backslash is added to the end of the pattern. This is done to provide a -way of testing the error condition that arises if a pattern finishes with a -backslash, because -
    -  /abc\/
    -
    -is interpreted as the first line of a pattern that starts with "abc/", causing -pcretest to read the next line as a continuation of the regular expression. -

    -
    PATTERN MODIFIERS
    -

    -A pattern may be followed by any number of modifiers, which are mostly single -characters, though some of these can be qualified by further characters. -Following Perl usage, these are referred to below as, for example, "the -/i modifier", even though the delimiter of the pattern need not always be -a slash, and no slash is used when writing modifiers. White space may appear -between the final pattern delimiter and the first modifier, and between the -modifiers themselves. For reference, here is a complete list of modifiers. They -fall into several groups that are described in detail in the following -sections. -

    -  /8              set UTF mode
    -  /9              set PCRE_NEVER_UTF (locks out UTF mode)
    -  /?              disable UTF validity check
    -  /+              show remainder of subject after match
    -  /=              show all captures (not just those that are set)
    -
    -  /A              set PCRE_ANCHORED
    -  /B              show compiled code
    -  /C              set PCRE_AUTO_CALLOUT
    -  /D              same as /B plus /I
    -  /E              set PCRE_DOLLAR_ENDONLY
    -  /F              flip byte order in compiled pattern
    -  /f              set PCRE_FIRSTLINE
    -  /G              find all matches (shorten string)
    -  /g              find all matches (use startoffset)
    -  /I              show information about pattern
    -  /i              set PCRE_CASELESS
    -  /J              set PCRE_DUPNAMES
    -  /K              show backtracking control names
    -  /L              set locale
    -  /M              show compiled memory size
    -  /m              set PCRE_MULTILINE
    -  /N              set PCRE_NO_AUTO_CAPTURE
    -  /O              set PCRE_NO_AUTO_POSSESS
    -  /P              use the POSIX wrapper
    -  /Q              test external stack check function
    -  /S              study the pattern after compilation
    -  /s              set PCRE_DOTALL
    -  /T              select character tables
    -  /U              set PCRE_UNGREEDY
    -  /W              set PCRE_UCP
    -  /X              set PCRE_EXTRA
    -  /x              set PCRE_EXTENDED
    -  /Y              set PCRE_NO_START_OPTIMIZE
    -  /Z              don't show lengths in /B output
    -
    -  /<any>          set PCRE_NEWLINE_ANY
    -  /<anycrlf>      set PCRE_NEWLINE_ANYCRLF
    -  /<cr>           set PCRE_NEWLINE_CR
    -  /<crlf>         set PCRE_NEWLINE_CRLF
    -  /<lf>           set PCRE_NEWLINE_LF
    -  /<bsr_anycrlf>  set PCRE_BSR_ANYCRLF
    -  /<bsr_unicode>  set PCRE_BSR_UNICODE
    -  /<JS>           set PCRE_JAVASCRIPT_COMPAT
    -
    -
    -

    -
    -Perl-compatible modifiers -
    -

    -The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, -PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when -pcre[16|32]_compile() is called. These four modifier letters have the same -effect as they do in Perl. For example: -

    -  /caseless/i
    -
    -
    -

    -
    -Modifiers for other PCRE options -
    -

    -The following table shows additional modifiers for setting PCRE compile-time -options that do not correspond to anything in Perl: -

    -  /8              PCRE_UTF8           ) when using the 8-bit
    -  /?              PCRE_NO_UTF8_CHECK  )   library
    -
    -  /8              PCRE_UTF16          ) when using the 16-bit
    -  /?              PCRE_NO_UTF16_CHECK )   library
    -
    -  /8              PCRE_UTF32          ) when using the 32-bit
    -  /?              PCRE_NO_UTF32_CHECK )   library
    -
    -  /9              PCRE_NEVER_UTF
    -  /A              PCRE_ANCHORED
    -  /C              PCRE_AUTO_CALLOUT
    -  /E              PCRE_DOLLAR_ENDONLY
    -  /f              PCRE_FIRSTLINE
    -  /J              PCRE_DUPNAMES
    -  /N              PCRE_NO_AUTO_CAPTURE
    -  /O              PCRE_NO_AUTO_POSSESS
    -  /U              PCRE_UNGREEDY
    -  /W              PCRE_UCP
    -  /X              PCRE_EXTRA
    -  /Y              PCRE_NO_START_OPTIMIZE
    -  /<any>          PCRE_NEWLINE_ANY
    -  /<anycrlf>      PCRE_NEWLINE_ANYCRLF
    -  /<cr>           PCRE_NEWLINE_CR
    -  /<crlf>         PCRE_NEWLINE_CRLF
    -  /<lf>           PCRE_NEWLINE_LF
    -  /<bsr_anycrlf>  PCRE_BSR_ANYCRLF
    -  /<bsr_unicode>  PCRE_BSR_UNICODE
    -  /<JS>           PCRE_JAVASCRIPT_COMPAT
    -
    -The modifiers that are enclosed in angle brackets are literal strings as shown, -including the angle brackets, but the letters within can be in either case. -This example sets multiline matching with CRLF as the line ending sequence: -
    -  /^abc/m<CRLF>
    -
    -As well as turning on the PCRE_UTF8/16/32 option, the /8 modifier causes -all non-printing characters in output strings to be printed using the -\x{hh...} notation. Otherwise, those less than 0x100 are output in hex without -the curly brackets. -

    -

    -Full details of the PCRE options are given in the -pcreapi -documentation. -

    -
    -Finding all matches in a string -
    -

    -Searching for all possible matches within each subject string can be requested -by the /g or /G modifier. After finding a match, PCRE is called -again to search the remainder of the subject string. The difference between -/g and /G is that the former uses the startoffset argument to -pcre[16|32]_exec() to start searching at a new point within the entire -string (which is in effect what Perl does), whereas the latter passes over a -shortened substring. This makes a difference to the matching process if the -pattern begins with a lookbehind assertion (including \b or \B). -
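The difference between the two modifiers can be sketched in Python. This is an illustration only: Python's re module stands in for PCRE, the function names find_all_startoffset and find_all_shortened are hypothetical, and an empty match is handled by stepping one character, which is simpler than what pcretest does.

```python
import re


def find_all_startoffset(pattern, subject):
    """Like /g: keep the whole subject and move the start offset."""
    regex = re.compile(pattern)
    results, offset = [], 0
    while offset <= len(subject):
        m = regex.search(subject, offset)
        if m is None:
            break
        results.append(m.group(0))
        # Step past an empty match so the loop always makes progress.
        offset = m.end() if m.end() > m.start() else m.end() + 1
    return results


def find_all_shortened(pattern, subject):
    """Like /G: cut off the matched part and search only the remainder."""
    regex = re.compile(pattern)
    results, rest = [], subject
    while True:
        m = regex.search(rest)
        if m is None:
            break
        results.append(m.group(0))
        cut = m.end() if m.end() > m.start() else m.end() + 1
        if cut > len(rest):
            break
        # The text before the cut is no longer visible to \b, \B or lookbehinds.
        rest = rest[cut:]
    return results
```

With the pattern \Bi(\w\w) and the subject Mississippi, the first function returns ['iss', 'iss', 'ipp'] and the second returns ['iss', 'ipp']. The second "iss" is lost in the shortened-string version because \B cannot match at the very start of the remaining text.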

    -

    -If any call to pcre[16|32]_exec() in a /g or /G sequence matches -an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and -PCRE_ANCHORED flags set in order to search for another, non-empty, match at the -same point. If this second match fails, the start offset is advanced, and the -normal match is retried. This imitates the way Perl handles such cases when -using the /g modifier or the split() function. Normally, the start -offset is advanced by one character, but if the newline convention recognizes -CRLF as a newline, and the current character is CR followed by LF, an advance -of two is used. -
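The advance rule in the last sentence can be written down as a short Python sketch. The function name next_start_offset is hypothetical, the subject is assumed to be a bytes object, and the sketch treats one character as one byte, which holds only outside UTF mode.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Return the offset to retry at after the non-empty retry has failed."""
    # Skip CR and LF together when the newline convention recognizes CRLF.
    if crlf_is_newline and subject[offset:offset + 2] == b"\r\n":
        return offset + 2
    # Otherwise advance by a single character (one byte in this sketch).
    return offset + 1
```

For example, at offset 1 in b"a\r\nb" the next offset is 3 when CRLF is recognized as a newline and 2 when it is not.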

    -
    -Other modifiers -
    -

    -There are yet more modifiers for controlling the way pcretest -operates. -

    -

    -The /+ modifier requests that as well as outputting the substring that -matched the entire pattern, pcretest should in addition output the -remainder of the subject string. This is useful for tests where the subject -contains multiple copies of the same substring. If the + modifier appears -twice, the same action is taken for captured substrings. In each case the -remainder is output on the following line with a plus character following the -capture number. Note that this modifier must not immediately follow the /S -modifier because /S+ and /S++ have other meanings. -

    -

    -The /= modifier requests that the values of all potential captured -parentheses be output after a match. By default, only those up to the highest -one actually used in the match are output (corresponding to the return code -from pcre[16|32]_exec()). Values in the offsets vector corresponding to -higher numbers should be set to -1, and these are output as "<unset>". This -modifier gives a way of checking that this is happening. -

    -

    -The /B modifier is a debugging feature. It requests that pcretest -output a representation of the compiled code after compilation. Normally this -information contains length and offset values; however, if /Z is also -present, this data is replaced by spaces. This is a special feature for use in -the automatic test scripts; it ensures that the same output is generated for -different internal link sizes. -

    -

    -The /D modifier is a PCRE debugging feature, and is equivalent to -/BI, that is, both the /B and the /I modifiers. -

    -

    -The /F modifier causes pcretest to flip the byte order of the -2-byte and 4-byte fields in the compiled pattern. This facility is for testing -the feature in PCRE that allows it to execute patterns that were compiled on a -host with a different endianness. This feature is not available when the POSIX -interface to PCRE is being used, that is, when the /P pattern modifier is -specified. See also the section about saving and reloading compiled patterns -below. -

    -

    -The /I modifier requests that pcretest output information about the -compiled pattern (whether it is anchored, has a fixed first character, and -so on). It does this by calling pcre[16|32]_fullinfo() after compiling a -pattern. If the pattern is studied, the results of that are also output. In -this output, the word "char" means a non-UTF character, that is, the value of a -single data item (8-bit, 16-bit, or 32-bit, depending on the library that is -being tested). -

    -

    -The /K modifier requests pcretest to show names from backtracking -control verbs that are returned from calls to pcre[16|32]_exec(). It causes -pcretest to create a pcre[16|32]_extra block if one has not already -been created by a call to pcre[16|32]_study(), and to set the -PCRE_EXTRA_MARK flag and the mark field within it, every time that -pcre[16|32]_exec() is called. If the variable that the mark field -points to is non-NULL for a match, non-match, or partial match, pcretest -prints the string to which it points. For a match, this is shown on a line by -itself, tagged with "MK:". For a non-match it is added to the message. -

    -

    -The /L modifier must be followed directly by the name of a locale, for -example, -

    -  /pattern/Lfr_FR
    -
    -For this reason, it must be the last modifier. The given locale is set, -pcre[16|32]_maketables() is called to build a set of character tables for -the locale, and this is then passed to pcre[16|32]_compile() when compiling -the regular expression. Without an /L (or /T) modifier, NULL is -passed as the tables pointer; that is, /L applies only to the expression -on which it appears. -

    -

    -The /M modifier causes the size in bytes of the memory block used to hold -the compiled pattern to be output. This does not include the size of the -pcre[16|32] block; it is just the actual compiled data. If the pattern is -successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the -JIT compiled code is also output. -

    -

    -The /Q modifier is used to test the use of pcre_stack_guard. It -must be followed by '0' or '1', specifying the return code to be given from an -external function that is passed to PCRE and used for stack checking during -compilation (see the -pcreapi -documentation for details). -

    -

    -The /S modifier causes pcre[16|32]_study() to be called after the -expression has been compiled, and the results used when the expression is -matched. There are a number of qualifying characters that may follow /S. -They may appear in any order. -

    -

    -If /S is followed by an exclamation mark, pcre[16|32]_study() is -called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a -pcre_extra block, even when studying discovers no useful information. -

    -

    -If /S is followed by a second S character, it suppresses studying, even -if it was requested externally by the -s command line option. This makes -it possible to specify that certain patterns are always studied, and others are -never studied, independently of -s. This feature is used in the test -files in a few cases where the output is different when the pattern is studied. -

    -

    -If the /S modifier is followed by a + character, the call to -pcre[16|32]_study() is made with all the JIT study options, requesting -just-in-time optimization support if it is available, for both normal and -partial matching. If you want to restrict the JIT compiling modes, you can -follow /S+ with a digit in the range 1 to 7: -

    -  1  normal match only
    -  2  soft partial match only
    -  3  normal match and soft partial match
    -  4  hard partial match only
    -  5  normal match and hard partial match
    -  6  soft and hard partial match
    -  7  all three modes (default)
    -
    -If /S++ is used instead of /S+ (with or without a following digit), -the text "(JIT)" is added to the first output line after a match or no match -when JIT-compiled code was actually used. -
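The digit that may follow /S+ can be read as a bitmask of three modes. The following Python sketch decodes it; the bit assignment (1, 2, 4) is inferred from the values in the table above, and the names JIT_MODES and jit_modes are hypothetical.

```python
# Bit values inferred from the table: 1 = normal, 2 = soft partial, 4 = hard partial.
JIT_MODES = (
    (1, "normal match"),
    (2, "soft partial match"),
    (4, "hard partial match"),
)


def jit_modes(digit):
    """Return the matching modes selected by the digit after /S+."""
    if not 1 <= digit <= 7:
        raise ValueError("digit must be in the range 1 to 7")
    return [name for bit, name in JIT_MODES if digit & bit]
```

For example, jit_modes(6) gives the soft and hard partial match modes, and jit_modes(7) gives all three.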

    -

    -Note that there is also an independent /+ modifier; it must not be given -immediately after /S or /S+ because this will be misinterpreted. -

    -

    -If JIT studying is successful, the compiled JIT code will automatically be used -when pcre[16|32]_exec() is run, except when incompatible run-time options -are specified. For more details, see the -pcrejit -documentation. See also the \J escape sequence below for a way of -setting the size of the JIT stack. -

    -

    -Finally, if /S is followed by a minus character, JIT compilation is -suppressed, even if it was requested externally by the -s command line -option. This makes it possible to specify that JIT is never to be used for -certain patterns. -

    -

    -The /T modifier must be followed by a single digit. It causes a specific -set of built-in character tables to be passed to pcre[16|32]_compile(). It -is used in the standard PCRE tests to check behaviour with different character -tables. The digit specifies the tables as follows: -

    -  0   the default ASCII tables, as distributed in
    -        pcre_chartables.c.dist
    -  1   a set of tables defining ISO 8859 characters
    -
    -In table 1, some characters whose codes are greater than 128 are identified as -letters, digits, spaces, etc. -

    -
    -Using the POSIX wrapper API -
    -

    -The /P modifier causes pcretest to call PCRE via the POSIX wrapper -API rather than its native API. This supports only the 8-bit library. When -/P is set, the following modifiers set options for the regcomp() -function: -

    -  /i    REG_ICASE
    -  /m    REG_NEWLINE
    -  /N    REG_NOSUB
    -  /s    REG_DOTALL     )
    -  /U    REG_UNGREEDY   ) These options are not part of
    -  /W    REG_UCP        )   the POSIX standard
    -  /8    REG_UTF8       )
    -
    -The /+ modifier works as described above. All other modifiers are -ignored. -

    -
    -Locking out certain modifiers -
    -

    -PCRE can be compiled with or without support for certain features such as -UTF-8/16/32 or Unicode properties. Accordingly, the standard tests are split up -into a number of different files that are selected for running depending on -which features are available. When updating the tests, it is all too easy to -put a new test into the wrong file by mistake; for example, to put a test that -requires UTF support into a file that is used when it is not available. To help -detect such mistakes as early as possible, there is a facility for locking out -specific modifiers. If an input line for pcretest starts with the string -"< forbid " the following sequence of characters is taken as a list of -forbidden modifiers. For example, in the test files that must not use UTF or -Unicode property support, this line appears: -

    -  < forbid 8W
    -
    -This locks out the /8 and /W modifiers. An immediate error is given if they are -subsequently encountered. If the character string contains < but not >, all the -multi-character modifiers that begin with < are locked out. Otherwise, such -modifiers must be explicitly listed, for example: -
    -  < forbid <JS><cr>
    -
    -There must be a single space between < and "forbid" for this feature to be -recognised. If there is not, the line is interpreted either as a request to -re-load a pre-compiled pattern (see "SAVING AND RELOADING COMPILED PATTERNS" -below) or, if there is another < character, as a pattern that uses < as its -delimiter. -
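The rules for a forbid line can be summarised in a small Python sketch. It is not the pcretest implementation: the function name parse_forbid and its return shape are hypothetical, and lower-casing the bracketed names assumes that the either-case rule for angle-bracket modifiers applies here too.

```python
def parse_forbid(line):
    """Parse a "< forbid " line.

    Returns (singles, bracketed, all_bracketed), or None if the line is not
    a forbid line. singles is a set of one-character modifiers, bracketed a
    set of explicitly listed <...> modifiers, and all_bracketed is True when
    every modifier that begins with < is locked out.
    """
    prefix = "< forbid "  # exactly one space between < and "forbid"
    if not line.startswith(prefix):
        return None
    spec = line[len(prefix):].strip()

    # A "<" with no ">" locks out all the multi-character modifiers.
    if "<" in spec and ">" not in spec:
        return set(spec.replace("<", "")), set(), True

    singles, bracketed = set(), set()
    i = 0
    while i < len(spec):
        if spec[i] == "<":
            end = spec.index(">", i)  # raises ValueError if unterminated
            bracketed.add(spec[i:end + 1].lower())
            i = end + 1
        else:
            singles.add(spec[i])
            i += 1
    return singles, bracketed, False
```

With the two examples above, "< forbid 8W" yields the single-character set {"8", "W"}, and "< forbid <JS><cr>" yields the bracketed set {"<js>", "<cr>"}.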

    -
    DATA LINES
    -

    -Before each data line is passed to pcre[16|32]_exec(), leading and trailing -white space is removed, and it is then scanned for \ escapes. Some of these -are pretty esoteric features, intended for checking out some of the more -complicated features of PCRE. If you are just testing "ordinary" regular -expressions, you probably don't need any of these. The following escapes are -recognized: -

    -  \a         alarm (BEL, \x07)
    -  \b         backspace (\x08)
    -  \e         escape (\x1b)
    -  \f         form feed (\x0c)
    -  \n         newline (\x0a)
    -  \qdd       set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
    -  \r         carriage return (\x0d)
    -  \t         tab (\x09)
    -  \v         vertical tab (\x0b)
    -  \nnn       octal character (up to 3 octal digits); always
    -               a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
    -  \o{dd...}  octal character (any number of octal digits)
    -  \xhh       hexadecimal byte (up to 2 hex digits)
    -  \x{hh...}  hexadecimal character (any number of hex digits)
    -  \A         pass the PCRE_ANCHORED option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \B         pass the PCRE_NOTBOL option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \Cdd       call pcre[16|32]_copy_substring() for substring dd after a successful match (number less than 32)
    -  \Cname     call pcre[16|32]_copy_named_substring() for substring "name" after a successful match (name
    -               terminated by next non-alphanumeric character)
    -  \C+        show the current captured substrings at callout time
    -  \C-        do not supply a callout function
    -  \C!n       return 1 instead of 0 when callout number n is reached
    -  \C!n!m     return 1 instead of 0 when callout number n is reached for the mth time
    -  \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value
    -  \D         use the pcre[16|32]_dfa_exec() match function
    -  \F         only shortest match for pcre[16|32]_dfa_exec()
    -  \Gdd       call pcre[16|32]_get_substring() for substring dd after a successful match (number less than 32)
    -  \Gname     call pcre[16|32]_get_named_substring() for substring "name" after a successful match (name
    -               terminated by next non-alphanumeric character)
    -  \Jdd       set up a JIT stack of dd kilobytes maximum (any number of digits)
    -  \L         call pcre[16|32]_get_substringlist() after a successful match
    -  \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
    -  \N         pass the PCRE_NOTEMPTY option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec(); if used twice, pass the
    -               PCRE_NOTEMPTY_ATSTART option
    -  \Odd       set the size of the output vector passed to pcre[16|32]_exec() to dd (any number of digits)
    -  \P         pass the PCRE_PARTIAL_SOFT option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec(); if used twice, pass the
    -               PCRE_PARTIAL_HARD option
    -  \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
    -  \R         pass the PCRE_DFA_RESTART option to pcre[16|32]_dfa_exec()
    -  \S         output details of memory get/free calls during matching
    -  \Y         pass the PCRE_NO_START_OPTIMIZE option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \Z         pass the PCRE_NOTEOL option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \?         pass the PCRE_NO_UTF[8|16|32]_CHECK option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \>dd       start the match at offset dd (optional "-"; then any number of digits); this sets the startoffset
    -               argument for pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \<cr>      pass the PCRE_NEWLINE_CR option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \<lf>      pass the PCRE_NEWLINE_LF option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \<crlf>    pass the PCRE_NEWLINE_CRLF option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -  \<any>     pass the PCRE_NEWLINE_ANY option to pcre[16|32]_exec() or pcre[16|32]_dfa_exec()
    -
    -The use of \x{hh...} is not dependent on the use of the /8 modifier on -the pattern. It is recognized always. There may be any number of hexadecimal -digits inside the braces; invalid values provoke error messages. -

    -

    -Note that \xhh specifies one byte rather than one character in UTF-8 mode; -this makes it possible to construct invalid UTF-8 sequences for testing -purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in -UTF-8 mode, generating more than one byte if the value is greater than 127. -When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte -for values less than 256, and causes an error for greater values. -
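The byte-level effect of the two escape forms in the 8-bit library can be illustrated with a Python sketch. The function names are hypothetical, and Python's UTF-8 encoder stands in for the encoding that pcretest performs.

```python
def bytes_for_xhh(value):
    # \xhh always yields exactly one byte, even in UTF-8 mode.
    if not 0 <= value <= 0xFF:
        raise ValueError("\\xhh takes at most two hex digits")
    return bytes([value])


def bytes_for_braced(value, utf8_mode):
    # \x{hh...} yields one character: UTF-8 encoded in UTF-8 mode,
    # a single byte otherwise.
    if utf8_mode:
        return chr(value).encode("utf-8")
    if value > 0xFF:
        raise ValueError("value greater than 255 outside UTF-8 mode")
    return bytes([value])
```

In UTF-8 mode, bytes_for_xhh(0xE9) gives the single byte e9, which is not valid UTF-8 on its own, while bytes_for_braced(0xE9, True) gives the two bytes c3 a9.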

    -

    -In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it -possible to construct invalid UTF-16 sequences for testing purposes. -

    -

    -In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it -possible to construct invalid UTF-32 sequences for testing purposes. -

    -

    -The escapes that specify line ending sequences are literal strings, exactly as -shown. No more than one newline setting should be present in any data line. -

    -

    -A backslash followed by anything else just escapes the anything else. If -the very last character is a backslash, it is ignored. This gives a way of -passing an empty line as data, since a real empty line terminates the data -input. -

    -

    -The \J escape provides a way of setting the maximum stack size that is -used by the just-in-time optimization code. It is ignored if JIT optimization -is not being used. Providing a stack that is larger than the default 32K is -necessary only for very complicated patterns. -

    -

    -If \M is present, pcretest calls pcre[16|32]_exec() several times, -with different values in the match_limit and match_limit_recursion -fields of the pcre[16|32]_extra data structure, until it finds the minimum -numbers for each parameter that allow pcre[16|32]_exec() to complete without -error. Because this is testing a specific feature of the normal interpretive -pcre[16|32]_exec() execution, the use of any JIT optimization that might -have been set up by the /S+ qualifier or -s+ option is disabled. -

    -

    -The match_limit number is a measure of the amount of backtracking -that takes place, and checking it out can be instructive. For most simple -matches, the number is quite small, but for patterns with very large numbers of -matching possibilities, it can become large very quickly with increasing length -of subject string. The match_limit_recursion number is a measure of how -much stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is -needed to complete the match attempt. -

    -

    -When \O is used, the value specified may be higher or lower than the size set -by the -O command line option (or defaulted to 45); \O applies only to -the call of pcre[16|32]_exec() for the line in which it appears. -

    -

    -If the /P modifier was present on the pattern, causing the POSIX wrapper -API to be used, the only option-setting sequences that have any effect are \B, -\N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, -to be passed to regexec(). -

    -
    THE ALTERNATIVE MATCHING FUNCTION
    -

    -By default, pcretest uses the standard PCRE matching function, -pcre[16|32]_exec(), to match each data line. PCRE also supports an -alternative matching function, pcre[16|32]_dfa_exec(), which operates in a -different way, and has some restrictions. The differences between the two -functions are described in the -pcrematching -documentation. -

    -

    -If a data line contains the \D escape sequence, or if the command line -contains the -dfa option, the alternative matching function is used. -This function finds all possible matches at a given point. If, however, the \F -escape sequence is present in the data line, it stops after the first match is -found. This is always the shortest possible match. -

    -
    DEFAULT OUTPUT FROM PCRETEST
    -

    -This section describes the output when the normal matching function, -pcre[16|32]_exec(), is being used. -

    -

    -When a match succeeds, pcretest outputs the list of captured substrings -that pcre[16|32]_exec() returns, starting with number 0 for the string that -matched the whole pattern. Otherwise, it outputs "No match" when the return is -PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching -substring when pcre[16|32]_exec() returns PCRE_ERROR_PARTIAL. (Note that -this is the entire substring that was inspected during the partial match; it -may include characters before the actual match start if a lookbehind assertion, -\K, \b, or \B was involved.) For any other return, pcretest outputs -the PCRE negative error number and a short descriptive phrase. If the error is -a failed UTF string check, the offset of the start of the failing character and -the reason code are also output, provided that the size of the output vector is -at least two. Here is an example of an interactive pcretest run. -

    -  $ pcretest
    -  PCRE version 8.13 2011-04-30
    -
    -    re> /^abc(\d+)/
    -  data> abc123
    -   0: abc123
    -   1: 123
    -  data> xyz
    -  No match
    -
    -Unset capturing substrings that are not followed by one that is set are not -returned by pcre[16|32]_exec(), and are not shown by pcretest. In the -following example, there are two capturing substrings, but when the first data -line is matched, the second, unset substring is not shown. An "internal" unset -substring is shown as "<unset>", as for the second data line. -
    -    re> /(a)|(b)/
    -  data> a
    -   0: a
    -   1: a
    -  data> b
    -   0: b
    -   1: <unset>
    -   2: b
    -
    -If the strings contain any non-printing characters, they are output as \xhh -escapes if the value is less than 256 and UTF mode is not set. Otherwise they -are output as \x{hh...} escapes. See below for the definition of non-printing -characters. If the pattern has the /+ modifier, the output for substring -0 is followed by the rest of the subject string, identified by "0+" like -this: -
    -    re> /cat/+
    -  data> cataract
    -   0: cat
    -   0+ aract
    -
    -If the pattern has the /g or /G modifier, the results of successive -matching attempts are output in sequence, like this: -
    -    re> /\Bi(\w\w)/g
    -  data> Mississippi
    -   0: iss
    -   1: ss
    -   0: iss
    -   1: ss
    -   0: ipp
    -   1: pp
    -
    -"No match" is output only if the first match attempt fails. Here is an example -of a failure message (the offset 4 that is specified by \>4 is past the end of -the subject string): -
    -    re> /xyz/
    -  data> xyz\>4
    -  Error -24 (bad offset value)
    -
    -

    -

    -If any of the sequences \C, \G, or \L are present in a -data line that is successfully matched, the substrings extracted by the -convenience functions are output with C, G, or L after the string number -instead of a colon. This is in addition to the normal full list. The string -length (that is, the return from the extraction function) is given in -parentheses after each string for \C and \G. -

    -

    -Note that whereas patterns can be continued over several lines (a plain ">" -prompt is used for continuations), data lines may not. However, newlines can be -included in data by means of the \n escape (or \r, \r\n, etc., depending on -the newline sequence setting). -

    -
    OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
    -

    -When the alternative matching function, pcre[16|32]_dfa_exec(), is used (by -means of the \D escape sequence or the -dfa command line option), the -output consists of a list of all the matches that start at the first point in -the subject where there is at least one match. For example: -

    -    re> /(tang|tangerine|tan)/
    -  data> yellow tangerine\D
    -   0: tangerine
    -   1: tang
    -   2: tan
    -
    -(Using the normal matching function on this data finds only "tang".) The -longest matching string is always given first (and numbered zero). After a -PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the -partially matching substring. (Note that this is the entire substring that was -inspected during the partial match; it may include characters before the actual -match start if a lookbehind assertion, \K, \b, or \B was involved.) -
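The shape of this output can be imitated with a Python sketch that handles only literal alternatives. It is not a DFA matcher and does not reflect how pcre[16|32]_dfa_exec() works internally; the function name all_matches_at_first_point is hypothetical.

```python
def all_matches_at_first_point(alternatives, subject):
    """Return every literal alternative that matches at the first position
    in the subject where at least one matches, longest first."""
    for start in range(len(subject)):
        found = {alt for alt in alternatives if subject.startswith(alt, start)}
        if found:
            # Alternatives matching at the same start have distinct lengths,
            # so sorting by length gives a stable order.
            return sorted(found, key=len, reverse=True)
    return []
```

Called with the alternatives tang, tangerine and tan on the subject "yellow tangerine", it returns ['tangerine', 'tang', 'tan'], in the same order as the numbered lines above.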

    -

    -If /g is present on the pattern, the search for further matches resumes -at the end of the longest match. For example: -

    -    re> /(tang|tangerine|tan)/g
    -  data> yellow tangerine and tangy sultana\D
    -   0: tangerine
    -   1: tang
    -   2: tan
    -   0: tang
    -   1: tan
    -   0: tan
    -
    -Since the matching function does not support substring capture, the escape -sequences that are concerned with captured substrings are not relevant. -

    -
    RESTARTING AFTER A PARTIAL MATCH
    -

    -When the alternative matching function has given the PCRE_ERROR_PARTIAL return, -indicating that the subject partially matched the pattern, you can restart the -match with additional subject data by means of the \R escape sequence. For -example: -

    -    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    -  data> 23ja\P\D
    -  Partial match: 23ja
    -  data> n05\R\D
    -   0: n05
    -
    -For further information about partial matching, see the -pcrepartial -documentation. -

    -
    CALLOUTS
    -

    -If the pattern contains any callout requests, pcretest's callout function -is called during matching. This works with both matching functions. By default, -the called function displays the callout number, the start and current -positions in the text at the callout time, and the next pattern item to be -tested. For example: -

    -  --->pqrabcdef
    -    0    ^  ^     \d
    -
    -This output indicates that callout number 0 occurred for a match attempt -starting at the fourth character of the subject string, when the pointer was at -the seventh character of the data, and when the next pattern item was \d. Just -one circumflex is output if the start and current positions are the same. -

    -

    -Callouts numbered 255 are assumed to be automatic callouts, inserted as a -result of the /C pattern modifier. In this case, instead of showing the -callout number, the offset in the pattern, preceded by a plus, is output. For -example: -

    -    re> /\d?[A-E]\*/C
    -  data> E*
    -  --->E*
    -   +0 ^      \d?
    -   +3 ^      [A-E]
    -   +8 ^^     \*
    -  +10 ^ ^
    -   0: E*
    -
    -If a pattern contains (*MARK) items, an additional line is output whenever -a change of latest mark is passed to the callout function. For example: -
    -    re> /a(*MARK:X)bc/C
    -  data> abc
    -  --->abc
    -   +0 ^       a
    -   +1 ^^      (*MARK:X)
    -  +10 ^^      b
    -  Latest Mark: X
    -  +11 ^ ^     c
    -  +12 ^  ^
    -   0: abc
    -
    -The mark changes between matching "a" and "b", but stays the same for the rest -of the match, so nothing more is output. If, as a result of backtracking, the -mark reverts to being unset, the text "<unset>" is output. -

    -

    -The callout function in pcretest returns zero (carry on matching) by -default, but you can use a \C item in a data line (as described above) to -change this and other parameters of the callout. -

    -

    -Inserting callouts can be helpful when using pcretest to check -complicated regular expressions. For further information about callouts, see -the -pcrecallout -documentation. -

    -
    NON-PRINTING CHARACTERS
    -

    -When pcretest is outputting text in the compiled version of a pattern, -bytes other than 32-126 are always treated as non-printing characters and are -therefore shown as hex escapes. -

    -

    -When pcretest is outputting text that is a matched part of a subject -string, it behaves in the same way, unless a different locale has been set for -the pattern (using the /L modifier). In this case, the isprint() -function is used to distinguish printing and non-printing characters. -

    -
    SAVING AND RELOADING COMPILED PATTERNS
    -

    -The facilities described in this section are not available when the POSIX -interface to PCRE is being used, that is, when the /P pattern modifier is -specified. -

    -

    -When the POSIX interface is not in use, you can cause pcretest to write a -compiled pattern to a file, by following the modifiers with > and a file name. -For example: -

    -  /pattern/im >/some/file
    -
    -See the -pcreprecompile -documentation for a discussion about saving and re-using compiled patterns. -Note that if the pattern was successfully studied with JIT optimization, the -JIT data cannot be saved. -

    -

    -The data that is written is binary. The first eight bytes are the length of the -compiled pattern data followed by the length of the optional study data, each -written as four bytes in big-endian order (most significant byte first). If -there is no study data (either the pattern was not studied, or studying did not -return any data), the second length is zero. The lengths are followed by an -exact copy of the compiled pattern. If there is additional study data, this -(excluding any JIT data) follows immediately after the compiled pattern. After -writing the file, pcretest expects to read a new pattern. -
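The layout described above can be read back with a few lines of Python. This is a sketch based only on that description; the function name split_saved_pattern is hypothetical, and the returned blobs are opaque bytes that are useful only to PCRE itself.

```python
import struct


def split_saved_pattern(data):
    """Split the contents of a saved-pattern file into (pattern, study) bytes."""
    if len(data) < 8:
        raise ValueError("file too short for the two length fields")
    # Two 4-byte big-endian lengths: compiled pattern, then study data.
    pattern_len, study_len = struct.unpack(">II", data[:8])
    body = data[8:]
    if len(body) < pattern_len + study_len:
        raise ValueError("file shorter than the lengths it declares")
    pattern = body[:pattern_len]
    study = body[pattern_len:pattern_len + study_len]  # empty if not studied
    return pattern, study
```

A file written for an unstudied pattern has a second length of zero, so the study part comes back as an empty bytes object.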

    -

    -A saved pattern can be reloaded into pcretest by specifying < and a file -name instead of a pattern. There must be no space between < and the file name, -which must not contain a < character, as otherwise pcretest will -interpret the line as a pattern delimited by < characters. For example: -

    -   re> </some/file
    -  Compiled pattern loaded from /some/file
    -  No study data
    -
    -If the pattern was previously studied with the JIT optimization, the JIT -information cannot be saved and restored, and so is lost. When the pattern has -been loaded, pcretest proceeds to read data lines in the usual way. -

    -

    -You can copy a file written by pcretest to a different host and reload it -there, even if the new host has opposite endianness to the one on which the -pattern was compiled. For example, you can compile on an i86 machine and run on -a SPARC machine. When a pattern is reloaded on a host with different -endianness, the confirmation message is changed to: -

    -  Compiled pattern (byte-inverted) loaded from /some/file
    -
    -The test suite contains some saved pre-compiled patterns with different -endianness. These are reloaded using "<!" instead of just "<". This suppresses -the "(byte-inverted)" text so that the output is the same on all hosts. It also -forces debugging output once the pattern has been reloaded. -

    -

    -File names for saving and reloading can be absolute or relative, but note that -the shell facility of expanding a file name that starts with a tilde (~) is not -available. -

    -

    -The ability to save and reload files in pcretest is intended for testing -and experimentation. It is not intended for production use because only a -single pattern can be written to a file. Furthermore, there is no facility for -supplying custom character tables for use with a reloaded pattern. If the -original pattern was compiled with custom tables, an attempt to match a subject -string using a reloaded pattern is likely to cause pcretest to crash. -Finally, if you attempt to load a file that is not in the correct format, the -result is undefined. -

    -
    SEE ALSO
    -

    -pcre(3), pcre16(3), pcre32(3), pcreapi(3), -pcrecallout(3), -pcrejit(3), pcrematching(3), pcrepartial(3), -pcrepattern(3), pcreprecompile(3). -

    -
    AUTHOR
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    REVISION
    -

    -Last updated: 23 February 2017 -
    -Copyright © 1997-2017 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/html/pcreunicode.html b/src/pcre/doc/html/pcreunicode.html deleted file mode 100644 index ab36bc61..00000000 --- a/src/pcre/doc/html/pcreunicode.html +++ /dev/null @@ -1,262 +0,0 @@ - - -pcreunicode specification - - -

    pcreunicode man page

    -

    -Return to the PCRE index page. -

    -

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. -
    -
    -UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT -
    -

    -As well as UTF-8 support, PCRE also supports UTF-16 (from release 8.30) and -UTF-32 (from release 8.32), by means of two additional libraries. They can be -built as well as, or instead of, the 8-bit library. -

    -
    -UTF-8 SUPPORT -
    -

    -In order to process UTF-8 strings, you must build PCRE's 8-bit library with UTF -support, and, in addition, you must call -pcre_compile() -with the PCRE_UTF8 option flag, or the pattern must start with the sequence -(*UTF8) or (*UTF). When either of these is the case, both the pattern and any -subject strings that are matched against it are treated as UTF-8 strings -instead of strings of individual 1-byte characters. -

    -
    -UTF-16 AND UTF-32 SUPPORT -
    -

    -In order to process UTF-16 or UTF-32 strings, you must build PCRE's 16-bit or -32-bit library with UTF support, and, in addition, you must call -pcre16_compile() -or -pcre32_compile() -with the PCRE_UTF16 or PCRE_UTF32 option flag, as appropriate. Alternatively, -the pattern must start with the sequence (*UTF16), (*UTF32), as appropriate, or -(*UTF), which can be used with either library. When UTF mode is set, both the -pattern and any subject strings that are matched against it are treated as -UTF-16 or UTF-32 strings instead of strings of individual 16-bit or 32-bit -characters. -

    -
    -UTF SUPPORT OVERHEAD -
    -

    -If you compile PCRE with UTF support, but do not use it at run time, the -library will be a bit bigger, but the additional run time overhead is limited -to testing the PCRE_UTF[8|16|32] flag occasionally, so should not be very big. -

    -
    -UNICODE PROPERTY SUPPORT -
    -

    -If PCRE is built with Unicode character property support (which implies UTF -support), the escape sequences \p{..}, \P{..}, and \X can be used. -The available properties that can be tested are limited to the general -category properties such as Lu for an upper case letter or Nd for a decimal -number, the Unicode script names such as Arabic or Han, and the derived -properties Any and L&. Full lists are given in the -pcrepattern -and -pcresyntax -documentation. Only the short names for properties are supported. For example, -\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported. -Furthermore, in Perl, many properties may optionally be prefixed by "Is", for -compatibility with Perl 5.6. PCRE does not support this. -

    -
    -Validity of UTF-8 strings -
    -

    -When you set the PCRE_UTF8 flag, the byte strings passed as patterns and -subjects are (by default) checked for validity on entry to the relevant -functions. The entire string is checked before any other processing takes -place. From release 7.3 of PCRE, the check is according to the rules of RFC 3629, -which are themselves derived from the Unicode specification. Earlier releases -of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit -values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0 -to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called -"non-character" code points are no longer excluded because Unicode corrigendum -#9 makes it clear that they should not be.) -

    -

    -Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16, -where they are used in pairs to encode codepoints with values greater than -0xFFFF. The code points that are encoded by UTF-16 pairs are available -independently in the UTF-8 and UTF-32 encodings. (In other words, the whole -surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and -UTF-32.) -

    -

    -If an invalid UTF-8 string is passed to PCRE, an error return is given. At -compile time, the only additional information is the offset to the first byte -of the failing character. The run-time functions pcre_exec() and -pcre_dfa_exec() also pass back this information, as well as a more -detailed reason code if the caller has provided memory in which to do this. -

    -

    -In some situations, you may already know that your strings are valid, and -therefore want to skip these checks in order to improve performance, for -example in the case of a long subject string that is being scanned repeatedly. -If you set the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE -assumes that the pattern or subject it is given (respectively) contains only -valid UTF-8 codes. In this case, it does not diagnose an invalid UTF-8 string. -

    -

    -Note that passing PCRE_NO_UTF8_CHECK to pcre_compile() just disables the -check for the pattern; it does not also apply to subject strings. If you want -to disable the check for a subject string you must pass this option to -pcre_exec() or pcre_dfa_exec(). -

    -

    -If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, the result -is undefined and your program may crash. -
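For applications that want to confirm validity themselves before setting PCRE_NO_UTF8_CHECK, the rules described in this section can be sketched in C as follows. This is an independent illustration, not PCRE's internal checker: the function name utf8_first_invalid is hypothetical, and the sketch assumes the RFC 3629 rules as summarized above (code points U+0 to U+10FFFF, no surrogates, non-characters allowed), plus the RFC 3629 requirement that overlong encodings are rejected.

```c
#include <stddef.h>

/* Hypothetical helper: returns -1 if the buffer is valid UTF-8, otherwise
   the byte offset of the first byte of the failing character. */
static long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded value and smallest legal value */

        if (c < 0x80) { i++; continue; }   /* plain ASCII byte */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return (long)i;    /* stray continuation byte or 5/6-byte lead */

        if (len - i - 1 < need) return (long)i;          /* truncated */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }
        if (cp < min) return (long)i;                    /* overlong form */
        if (cp > 0x10FFFF) return (long)i;               /* out of range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return (long)i; /* surrogate */
        i += need + 1;
    }
    return -1;
}
```

The return convention mirrors the offset that PCRE reports for an invalid string. Non-character code points such as U+FFFF are accepted, in line with the behaviour described for release 8.33 onwards.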

    -
    -Validity of UTF-16 strings -
    -

    -When you set the PCRE_UTF16 flag, the strings of 16-bit data units that are -passed as patterns and subjects are (by default) checked for validity on entry -to the relevant functions. Values other than those in the surrogate range -U+D800 to U+DFFF are independent code points. Values in the surrogate range -must be used in pairs in the correct manner. -

    -

    -If an invalid UTF-16 string is passed to PCRE, an error return is given. At -compile time, the only additional information is the offset to the first data -unit of the failing character. The run-time functions pcre16_exec() and -pcre16_dfa_exec() also pass back this information, as well as a more -detailed reason code if the caller has provided memory in which to do this. -

    -

    -In some situations, you may already know that your strings are valid, and -therefore want to skip these checks in order to improve performance. If you set -the PCRE_NO_UTF16_CHECK flag at compile time or at run time, PCRE assumes that -the pattern or subject it is given (respectively) contains only valid UTF-16 -sequences. In this case, it does not diagnose an invalid UTF-16 string. -However, if an invalid string is passed, the result is undefined. -
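The pairing rule for surrogates described in this section can be sketched in C as below. The function name utf16_first_invalid is hypothetical and this is not PCRE's own checker; it assumes the data units are already in host byte order and that "the correct manner" means a high surrogate (U+D800 to U+DBFF) immediately followed by a low surrogate (U+DC00 to U+DFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns -1 if the buffer is valid UTF-16, otherwise
   the offset (in 16-bit data units) of the first unit of the failing
   character. */
static long utf16_first_invalid(const uint16_t *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint16_t u = s[i];
        if (u < 0xD800 || u > 0xDFFF) { i++; continue; } /* independent code point */
        if (u >= 0xDC00) return (long)i;    /* low surrogate with no preceding high */
        if (i + 1 >= len) return (long)i;   /* high surrogate at end of string */
        if (s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
            return (long)i;                 /* high surrogate not followed by low */
        i += 2;                             /* well-formed surrogate pair */
    }
    return -1;
}
```

A check of this kind could be run once on a long subject before repeated matching with PCRE_NO_UTF16_CHECK set, since passing an invalid string with that flag gives an undefined result.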

    -
    -Validity of UTF-32 strings -
    -

    -When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are -passed as patterns and subjects are (by default) checked for validity on entry -to the relevant functions. This check allows only values in the range U+0 -to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF. -

    -

    -If an invalid UTF-32 string is passed to PCRE, an error return is given. At -compile time, the only additional information is the offset to the first data -unit of the failing character. The run-time functions pcre32_exec() and -pcre32_dfa_exec() also pass back this information, as well as a more -detailed reason code if the caller has provided memory in which to do this. -

    -

    -In some situations, you may already know that your strings are valid, and -therefore want to skip these checks in order to improve performance. If you set -the PCRE_NO_UTF32_CHECK flag at compile time or at run time, PCRE assumes that -the pattern or subject it is given (respectively) contains only valid UTF-32 -sequences. In this case, it does not diagnose an invalid UTF-32 string. -However, if an invalid string is passed, the result is undefined. -

    -
    -General comments about UTF modes -
    -

    -1. Codepoints less than 256 can be specified in patterns by either braced or -unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger -values have to use braced sequences. -

    -

    -2. Octal numbers up to \777 are recognized, and in UTF-8 mode they match -two-byte characters for values greater than \177. -

    -

    -3. Repeat quantifiers apply to complete UTF characters, not to individual -data units, for example: \x{100}{3}. -

    -

    -4. The dot metacharacter matches one UTF character instead of a single data -unit. -

    -

    -5. The escape sequence \C can be used to match a single byte in UTF-8 mode, or -a single 16-bit data unit in UTF-16 mode, or a single 32-bit data unit in -UTF-32 mode, but its use can lead to some strange effects because it breaks up -multi-unit characters (see the description of \C in the -pcrepattern -documentation). The use of \C is not supported in the alternative matching -function pcre[16|32]_dfa_exec(), nor is it supported in UTF mode by the -JIT optimization of pcre[16|32]_exec(). If JIT optimization is requested -for a UTF pattern that contains \C, it will not succeed, and so the matching -will be carried out by the normal interpretive function. -

    -

    -6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly -test characters of any code value, but, by default, the characters that PCRE -recognizes as digits, spaces, or word characters remain the same set as in -non-UTF mode, all with values less than 256. This remains true even when PCRE -is built to include Unicode property support, because to do otherwise would -slow down PCRE in many common cases. Note in particular that this applies to -\b and \B, because they are defined in terms of \w and \W. If you really -want to test for a wider sense of, say, "digit", you can use explicit Unicode -property tests such as \p{Nd}. Alternatively, if you set the PCRE_UCP option, -the way that the character escapes work is changed so that Unicode properties -are used to determine which characters match. There are more details in the -section on -generic character types -in the -pcrepattern -documentation. -

    -

    -7. Similarly, characters that match the POSIX named character classes are all -low-valued characters, unless the PCRE_UCP option is set. -

    -

    -8. However, the horizontal and vertical white space matching escapes (\h, \H, -\v, and \V) do match all the appropriate Unicode characters, whether or not -PCRE_UCP is set. -

    -

    -9. Case-insensitive matching applies only to characters whose values are less -than 128, unless PCRE is built with Unicode property support. A few Unicode -characters such as Greek sigma have more than two codepoints that are -case-equivalent. Up to and including PCRE release 8.31, only one-to-one case -mappings were supported, but later releases (with Unicode property support) do -treat as case-equivalent all versions of characters such as Greek sigma. -

    -
    -AUTHOR -
    -

    -Philip Hazel -
    -University Computing Service -
    -Cambridge CB2 3QH, England. -
    -

    -
    -REVISION -
    -

    -Last updated: 27 February 2013 -
    -Copyright © 1997-2013 University of Cambridge. -
    -

    -Return to the PCRE index page. -

    diff --git a/src/pcre/doc/index.html.src b/src/pcre/doc/index.html.src deleted file mode 100644 index 887f4d75..00000000 --- a/src/pcre/doc/index.html.src +++ /dev/null @@ -1,185 +0,0 @@ - - - -PCRE specification - - -

    Perl-compatible Regular Expressions (PCRE)

    -

    -The HTML documentation for PCRE consists of a number of pages that are listed -below in alphabetical order. If you are new to PCRE, please read the first one -first. -

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    pcre  Introductory page
    pcre-config  Information about the installation configuration
    pcre16  Discussion of the 16-bit PCRE library
    pcre32  Discussion of the 32-bit PCRE library
    pcreapi  PCRE's native API
    pcrebuild  Building PCRE
    pcrecallout  The callout facility
    pcrecompat  Compatibility with Perl
    pcrecpp  The C++ wrapper for the PCRE library
    pcredemo  A demonstration C program that uses the PCRE library
    pcregrep  The pcregrep command
    pcrejit  Discussion of the just-in-time optimization support
    pcrelimits  Details of size and other limits
    pcrematching  Discussion of the two matching algorithms
    pcrepartial  Using PCRE for partial matching
    pcrepattern  Specification of the regular expressions supported by PCRE
    pcreperform  Some comments on performance
    pcreposix  The POSIX API to the PCRE 8-bit library
    pcreprecompile  How to save and re-use compiled patterns
    pcresample  Discussion of the pcredemo program
    pcrestack  Discussion of PCRE's stack usage
    pcresyntax  Syntax quick-reference summary
    pcretest  The pcretest command for testing PCRE
    pcreunicode  Discussion of Unicode and UTF-8/UTF-16/UTF-32 support
    - -

    -There are also individual pages that summarize the interface for each function -in the library. There is a single page for each triple of 8-bit/16-bit/32-bit -functions. -

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    pcre_assign_jit_stack  Assign stack for JIT matching
    pcre_compile  Compile a regular expression
    pcre_compile2  Compile a regular expression (alternate interface)
    pcre_config  Show build-time configuration options
    pcre_copy_named_substring  Extract named substring into given buffer
    pcre_copy_substring  Extract numbered substring into given buffer
    pcre_dfa_exec  Match a compiled pattern to a subject string - (DFA algorithm; not Perl compatible)
    pcre_exec  Match a compiled pattern to a subject string - (Perl compatible)
    pcre_free_study  Free study data
    pcre_free_substring  Free extracted substring
    pcre_free_substring_list  Free list of extracted substrings
    pcre_fullinfo  Extract information about a pattern
    pcre_get_named_substring  Extract named substring into new memory
    pcre_get_stringnumber  Convert captured string name to number
    pcre_get_stringtable_entries  Find table entries for given string name
    pcre_get_substring  Extract numbered substring into new memory
    pcre_get_substring_list  Extract all substrings into new memory
    pcre_jit_exec  Fast path interface to JIT matching
    pcre_jit_stack_alloc  Create a stack for JIT matching
    pcre_jit_stack_free  Free a JIT matching stack
    pcre_maketables  Build character tables in current locale
    pcre_pattern_to_host_byte_order  Convert compiled pattern to host byte order if necessary
    pcre_refcount  Maintain reference count in compiled pattern
    pcre_study  Study a compiled pattern
    pcre_utf16_to_host_byte_order  Convert UTF-16 string to host byte order if necessary
    pcre_utf32_to_host_byte_order  Convert UTF-32 string to host byte order if necessary
    pcre_version  Return PCRE version and release date
    - - diff --git a/src/pcre/doc/pcre-config.1 b/src/pcre/doc/pcre-config.1 deleted file mode 100644 index 52eb4fb2..00000000 --- a/src/pcre/doc/pcre-config.1 +++ /dev/null @@ -1,92 +0,0 @@ -.TH PCRE-CONFIG 1 "01 January 2012" "PCRE 8.30" -.SH NAME -pcre-config - program to return PCRE configuration -.SH SYNOPSIS -.rs -.sp -.nf -.B pcre-config [--prefix] [--exec-prefix] [--version] [--libs] -.B " [--libs16] [--libs32] [--libs-cpp] [--libs-posix]" -.B " [--cflags] [--cflags-posix]" -.fi -. -. -.SH DESCRIPTION -.rs -.sp -\fBpcre-config\fP returns the configuration of the installed PCRE -libraries and the options required to compile a program to use them. Some of -the options apply only to the 8-bit, or 16-bit, or 32-bit libraries, -respectively, and are -not available if only one of those libraries has been built. If an unavailable -option is encountered, the "usage" information is output. -. -. -.SH OPTIONS -.rs -.TP 10 -\fB--prefix\fP -Writes the directory prefix used in the PCRE installation for architecture -independent files (\fI/usr\fP on many systems, \fI/usr/local\fP on some -systems) to the standard output. -.TP 10 -\fB--exec-prefix\fP -Writes the directory prefix used in the PCRE installation for architecture -dependent files (normally the same as \fB--prefix\fP) to the standard output. -.TP 10 -\fB--version\fP -Writes the version number of the installed PCRE libraries to the standard -output. -.TP 10 -\fB--libs\fP -Writes to the standard output the command line options required to link -with the 8-bit PCRE library (\fB-lpcre\fP on many systems). -.TP 10 -\fB--libs16\fP -Writes to the standard output the command line options required to link -with the 16-bit PCRE library (\fB-lpcre16\fP on many systems). -.TP 10 -\fB--libs32\fP -Writes to the standard output the command line options required to link -with the 32-bit PCRE library (\fB-lpcre32\fP on many systems). 
-.TP 10 -\fB--libs-cpp\fP -Writes to the standard output the command line options required to link with -PCRE's C++ wrapper library (\fB-lpcrecpp\fP \fB-lpcre\fP on many -systems). -.TP 10 -\fB--libs-posix\fP -Writes to the standard output the command line options required to link with -PCRE's POSIX API wrapper library (\fB-lpcreposix\fP \fB-lpcre\fP on many -systems). -.TP 10 -\fB--cflags\fP -Writes to the standard output the command line options required to compile -files that use PCRE (this may include some \fB-I\fP options, but is blank on -many systems). -.TP 10 -\fB--cflags-posix\fP -Writes to the standard output the command line options required to compile -files that use PCRE's POSIX API wrapper library (this may include some \fB-I\fP -options, but is blank on many systems). -. -. -.SH "SEE ALSO" -.rs -.sp -\fBpcre(3)\fP -. -. -.SH AUTHOR -.rs -.sp -This manual page was originally written by Mark Baker for the Debian GNU/Linux -system. It has been subsequently revised as a generic PCRE man page. -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 24 June 2012 -.fi diff --git a/src/pcre/doc/pcre-config.txt b/src/pcre/doc/pcre-config.txt deleted file mode 100644 index 8503ab0e..00000000 --- a/src/pcre/doc/pcre-config.txt +++ /dev/null @@ -1,86 +0,0 @@ -PCRE-CONFIG(1) General Commands Manual PCRE-CONFIG(1) - - - -NAME - pcre-config - program to return PCRE configuration - -SYNOPSIS - - pcre-config [--prefix] [--exec-prefix] [--version] [--libs] - [--libs16] [--libs32] [--libs-cpp] [--libs-posix] - [--cflags] [--cflags-posix] - - -DESCRIPTION - - pcre-config returns the configuration of the installed PCRE libraries - and the options required to compile a program to use them. Some of the - options apply only to the 8-bit, or 16-bit, or 32-bit libraries, - respectively, and are not available if only one of those libraries has - been built. If an unavailable option is encountered, the "usage" infor- - mation is output. 
- - -OPTIONS - - --prefix Writes the directory prefix used in the PCRE installation for - architecture independent files (/usr on many systems, - /usr/local on some systems) to the standard output. - - --exec-prefix - Writes the directory prefix used in the PCRE installation for - architecture dependent files (normally the same as --prefix) - to the standard output. - - --version Writes the version number of the installed PCRE libraries to - the standard output. - - --libs Writes to the standard output the command line options - required to link with the 8-bit PCRE library (-lpcre on many - systems). - - --libs16 Writes to the standard output the command line options - required to link with the 16-bit PCRE library (-lpcre16 on - many systems). - - --libs32 Writes to the standard output the command line options - required to link with the 32-bit PCRE library (-lpcre32 on - many systems). - - --libs-cpp - Writes to the standard output the command line options - required to link with PCRE's C++ wrapper library (-lpcrecpp - -lpcre on many systems). - - --libs-posix - Writes to the standard output the command line options - required to link with PCRE's POSIX API wrapper library - (-lpcreposix -lpcre on many systems). - - --cflags Writes to the standard output the command line options - required to compile files that use PCRE (this may include - some -I options, but is blank on many systems). - - --cflags-posix - Writes to the standard output the command line options - required to compile files that use PCRE's POSIX API wrapper - library (this may include some -I options, but is blank on - many systems). - - -SEE ALSO - - pcre(3) - - -AUTHOR - - This manual page was originally written by Mark Baker for the Debian - GNU/Linux system. It has been subsequently revised as a generic PCRE - man page. 
- - -REVISION - - Last updated: 24 June 2012 diff --git a/src/pcre/doc/pcre.3 b/src/pcre/doc/pcre.3 deleted file mode 100644 index 0f2837e7..00000000 --- a/src/pcre/doc/pcre.3 +++ /dev/null @@ -1,230 +0,0 @@ -.TH PCRE 3 "10 February 2015" "PCRE 8.37" -.SH NAME -PCRE - Perl-compatible regular expressions (original API) -.SH "PLEASE TAKE NOTE" -.rs -.sp -This document relates to PCRE releases that use the original API, -with library names libpcre, libpcre16, and libpcre32. January 2015 saw the -first release of a new API, known as PCRE2, with release numbers starting at -10.00 and library names libpcre2-8, libpcre2-16, and libpcre2-32. The old -libraries (now called PCRE1) are still being maintained for bug fixes, but -there will be no new development. New projects are advised to use the new PCRE2 -libraries. -. -. -.SH INTRODUCTION -.rs -.sp -The PCRE library is a set of functions that implement regular expression -pattern matching using the same syntax and semantics as Perl, with just a few -differences. Some features that appeared in Python and PCRE before they -appeared in Perl are also available using the Python syntax, there is some -support for one or two .NET and Oniguruma syntax items, and there is an option -for requesting some minor changes that give better JavaScript compatibility. -.P -Starting with release 8.30, it is possible to compile two separate PCRE -libraries: the original, which supports 8-bit character strings (including -UTF-8 strings), and a second library that supports 16-bit character strings -(including UTF-16 strings). The build process allows either one or both to be -built. The majority of the work to make this possible was done by Zoltan -Herczeg. -.P -Starting with release 8.32 it is possible to compile a third separate PCRE -library that supports 32-bit character strings (including UTF-32 strings). The -build process allows any combination of the 8-, 16- and 32-bit libraries. 
The -work to make this possible was done by Christian Persch. -.P -The three libraries contain identical sets of functions, except that the names -in the 16-bit library start with \fBpcre16_\fP instead of \fBpcre_\fP, and the -names in the 32-bit library start with \fBpcre32_\fP instead of \fBpcre_\fP. To -avoid over-complication and reduce the documentation maintenance load, most of -the documentation describes the 8-bit library, with the differences for the -16-bit and 32-bit libraries described separately in the -.\" HREF -\fBpcre16\fP -and -.\" HREF -\fBpcre32\fP -.\" -pages. References to functions or structures of the form \fIpcre[16|32]_xxx\fP -should be read as meaning "\fIpcre_xxx\fP when using the 8-bit library, -\fIpcre16_xxx\fP when using the 16-bit library, or \fIpcre32_xxx\fP when using -the 32-bit library". -.P -The current implementation of PCRE corresponds approximately with Perl 5.12, -including support for UTF-8/16/32 encoded strings and Unicode general category -properties. However, UTF-8/16/32 and Unicode support has to be explicitly -enabled; it is not the default. The Unicode tables correspond to Unicode -release 6.3.0. -.P -In addition to the Perl-compatible matching function, PCRE contains an -alternative function that matches the same compiled patterns in a different -way. In certain circumstances, the alternative function has some advantages. -For a discussion of the two matching algorithms, see the -.\" HREF -\fBpcrematching\fP -.\" -page. -.P -PCRE is written in C and released as a C library. A number of people have -written wrappers and interfaces of various kinds. In particular, Google Inc. -have provided a comprehensive C++ wrapper for the 8-bit library. This is now -included as part of the PCRE distribution. The -.\" HREF -\fBpcrecpp\fP -.\" -page has details of this interface. 
Other people's contributions can be found -in the \fIContrib\fP directory at the primary FTP site, which is: -.sp -.\" HTML -.\" -ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre -.\" -.P -Details of exactly which Perl regular expression features are and are not -supported by PCRE are given in separate documents. See the -.\" HREF -\fBpcrepattern\fP -.\" -and -.\" HREF -\fBpcrecompat\fP -.\" -pages. There is a syntax summary in the -.\" HREF -\fBpcresyntax\fP -.\" -page. -.P -Some features of PCRE can be included, excluded, or changed when the library is -built. The -.\" HREF -\fBpcre_config()\fP -.\" -function makes it possible for a client to discover which features are -available. The features themselves are described in the -.\" HREF -\fBpcrebuild\fP -.\" -page. Documentation about building PCRE for various operating systems can be -found in the -.\" HTML -.\" -\fBREADME\fP -.\" -and -.\" HTML -.\" -\fBNON-AUTOTOOLS_BUILD\fP -.\" -files in the source distribution. -.P -The libraries contain a number of undocumented internal functions and data -tables that are used by more than one of the exported external functions, but -which are not intended for use by external callers. Their names all begin with -"_pcre_" or "_pcre16_" or "_pcre32_", which hopefully will not provoke any name -clashes. In some environments, it is possible to control which external symbols -are exported when a shared library is built, and in these cases the -undocumented symbols are not exported. -. -. -.SH "SECURITY CONSIDERATIONS" -.rs -.sp -If you are using PCRE in a non-UTF application that permits users to supply -arbitrary patterns for compilation, you should be aware of a feature that -allows users to turn on UTF support from within a pattern, provided that PCRE -was built with UTF support.
For example, an 8-bit pattern that begins with -"(*UTF8)" or "(*UTF)" turns on UTF-8 mode, which interprets patterns and -subjects as strings of UTF-8 characters instead of individual 8-bit characters. -This causes both the pattern and any data against which it is matched to be -checked for UTF-8 validity. If the data string is very long, such a check might -use sufficiently many resources as to cause your application to lose -performance. -.P -One way of guarding against this possibility is to use the -\fBpcre_fullinfo()\fP function to check the compiled pattern's options for UTF. -Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF option at -compile time. This causes a compile time error if a pattern contains a -UTF-setting sequence. -.P -If your application is one that supports UTF, be aware that validity checking -can take time. If the same data string is to be matched many times, you can use -the PCRE_NO_UTF[8|16|32]_CHECK option for the second and subsequent matches to -save redundant checks. -.P -Another way that performance can be hit is by running a pattern that has a very -large search tree against a string that will never match. Nested unlimited -repeats in a pattern are a common example. PCRE provides some protection -against this: see the PCRE_EXTRA_MATCH_LIMIT feature in the -.\" HREF -\fBpcreapi\fP -.\" -page. -. -. -.SH "USER DOCUMENTATION" -.rs -.sp -The user documentation for PCRE comprises a number of different sections. In -the "man" format, each of these is a separate "man page". In the HTML format, -each is a separate page, linked from the index page. In the plain text format, -the descriptions of the \fBpcregrep\fP and \fBpcretest\fP programs are in files -called \fBpcregrep.txt\fP and \fBpcretest.txt\fP, respectively. The remaining -sections, except for the \fBpcredemo\fP section (which is a program listing), -are concatenated in \fBpcre.txt\fP, for ease of searching.
The sections are as -follows: -.sp - pcre this document - pcre-config show PCRE installation configuration information - pcre16 details of the 16-bit library - pcre32 details of the 32-bit library - pcreapi details of PCRE's native C API - pcrebuild building PCRE - pcrecallout details of the callout feature - pcrecompat discussion of Perl compatibility - pcrecpp details of the C++ wrapper for the 8-bit library - pcredemo a demonstration C program that uses PCRE - pcregrep description of the \fBpcregrep\fP command (8-bit only) - pcrejit discussion of the just-in-time optimization support - pcrelimits details of size and other limits - pcrematching discussion of the two matching algorithms - pcrepartial details of the partial matching facility -.\" JOIN - pcrepattern syntax and semantics of supported - regular expressions - pcreperform discussion of performance issues - pcreposix the POSIX-compatible C API for the 8-bit library - pcreprecompile details of saving and re-using precompiled patterns - pcresample discussion of the pcredemo program - pcrestack discussion of stack usage - pcresyntax quick syntax reference - pcretest description of the \fBpcretest\fP testing command - pcreunicode discussion of Unicode and UTF-8/16/32 support -.sp -In the "man" and HTML formats, there is also a short page for each C library -function, listing its arguments and results. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -.P -Putting an actual email address here seems to have been a spam magnet, so I've -taken it away. If you want to email me, use my two initials, followed by the -two digits 10, at the domain cam.ac.uk. -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 10 February 2015 -Copyright (c) 1997-2015 University of Cambridge. 
-.fi diff --git a/src/pcre/doc/pcre.txt b/src/pcre/doc/pcre.txt deleted file mode 100644 index c027538f..00000000 --- a/src/pcre/doc/pcre.txt +++ /dev/null @@ -1,10502 +0,0 @@ ------------------------------------------------------------------------------ -This file contains a concatenation of the PCRE man pages, converted to plain -text format for ease of searching with a text editor, or for use on systems -that do not have a man page processor. The small individual files that give -synopses of each function in the library have not been included. Neither has -the pcredemo program. There are separate text files for the pcregrep and -pcretest commands. ------------------------------------------------------------------------------ - - -PCRE(3) Library Functions Manual PCRE(3) - - - -NAME - PCRE - Perl-compatible regular expressions (original API) - -PLEASE TAKE NOTE - - This document relates to PCRE releases that use the original API, with - library names libpcre, libpcre16, and libpcre32. January 2015 saw the - first release of a new API, known as PCRE2, with release numbers start- - ing at 10.00 and library names libpcre2-8, libpcre2-16, and - libpcre2-32. The old libraries (now called PCRE1) are still being main- - tained for bug fixes, but there will be no new development. New - projects are advised to use the new PCRE2 libraries. - - -INTRODUCTION - - The PCRE library is a set of functions that implement regular expres- - sion pattern matching using the same syntax and semantics as Perl, with - just a few differences. Some features that appeared in Python and PCRE - before they appeared in Perl are also available using the Python syn- - tax, there is some support for one or two .NET and Oniguruma syntax - items, and there is an option for requesting some minor changes that - give better JavaScript compatibility. 
- - Starting with release 8.30, it is possible to compile two separate PCRE - libraries: the original, which supports 8-bit character strings - (including UTF-8 strings), and a second library that supports 16-bit - character strings (including UTF-16 strings). The build process allows - either one or both to be built. The majority of the work to make this - possible was done by Zoltan Herczeg. - - Starting with release 8.32 it is possible to compile a third separate - PCRE library that supports 32-bit character strings (including UTF-32 - strings). The build process allows any combination of the 8-, 16- and - 32-bit libraries. The work to make this possible was done by Christian - Persch. - - The three libraries contain identical sets of functions, except that - the names in the 16-bit library start with pcre16_ instead of pcre_, - and the names in the 32-bit library start with pcre32_ instead of - pcre_. To avoid over-complication and reduce the documentation mainte- - nance load, most of the documentation describes the 8-bit library, with - the differences for the 16-bit and 32-bit libraries described sepa- - rately in the pcre16 and pcre32 pages. References to functions or - structures of the form pcre[16|32]_xxx should be read as meaning - "pcre_xxx when using the 8-bit library, pcre16_xxx when using the - 16-bit library, or pcre32_xxx when using the 32-bit library". - - The current implementation of PCRE corresponds approximately with Perl - 5.12, including support for UTF-8/16/32 encoded strings and Unicode - general category properties. However, UTF-8/16/32 and Unicode support - has to be explicitly enabled; it is not the default. The Unicode tables - correspond to Unicode release 6.3.0. - - In addition to the Perl-compatible matching function, PCRE contains an - alternative function that matches the same compiled patterns in a dif- - ferent way. In certain circumstances, the alternative function has some - advantages. 
For a discussion of the two matching algorithms, see the - pcrematching page. - - PCRE is written in C and released as a C library. A number of people - have written wrappers and interfaces of various kinds. In particular, - Google Inc. have provided a comprehensive C++ wrapper for the 8-bit - library. This is now included as part of the PCRE distribution. The - pcrecpp page has details of this interface. Other people's contribu- - tions can be found in the Contrib directory at the primary FTP site, - which is: - - ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre - - Details of exactly which Perl regular expression features are and are - not supported by PCRE are given in separate documents. See the pcrepat- - tern and pcrecompat pages. There is a syntax summary in the pcresyntax - page. - - Some features of PCRE can be included, excluded, or changed when the - library is built. The pcre_config() function makes it possible for a - client to discover which features are available. The features them- - selves are described in the pcrebuild page. Documentation about build- - ing PCRE for various operating systems can be found in the README and - NON-AUTOTOOLS-BUILD files in the source distribution. - - The libraries contain a number of undocumented internal functions and - data tables that are used by more than one of the exported external - functions, but which are not intended for use by external callers. - Their names all begin with "_pcre_" or "_pcre16_" or "_pcre32_", which - hopefully will not provoke any name clashes. In some environments, it - is possible to control which external symbols are exported when a - shared library is built, and in these cases the undocumented symbols - are not exported.
- - -SECURITY CONSIDERATIONS - - If you are using PCRE in a non-UTF application that permits users to - supply arbitrary patterns for compilation, you should be aware of a - feature that allows users to turn on UTF support from within a pattern, - provided that PCRE was built with UTF support. For example, an 8-bit - pattern that begins with "(*UTF8)" or "(*UTF)" turns on UTF-8 mode, - which interprets patterns and subjects as strings of UTF-8 characters - instead of individual 8-bit characters. This causes both the pattern - and any data against which it is matched to be checked for UTF-8 valid- - ity. If the data string is very long, such a check might use suffi- - ciently many resources as to cause your application to lose perfor- - mance. - - One way of guarding against this possibility is to use the - pcre_fullinfo() function to check the compiled pattern's options for - UTF. Alternatively, from release 8.33, you can set the PCRE_NEVER_UTF - option at compile time. This causes a compile-time error if a pattern - contains a UTF-setting sequence. - - If your application is one that supports UTF, be aware that validity - checking can take time. If the same data string is to be matched many - times, you can use the PCRE_NO_UTF[8|16|32]_CHECK option for the second - and subsequent matches to save redundant checks. - - Another way that performance can be hit is by running a pattern that - has a very large search tree against a string that will never match. - Nested unlimited repeats in a pattern are a common example. PCRE pro- - vides some protection against this: see the PCRE_EXTRA_MATCH_LIMIT fea- - ture in the pcreapi page. - - -USER DOCUMENTATION - - The user documentation for PCRE comprises a number of different sec- - tions. In the "man" format, each of these is a separate "man page". In - the HTML format, each is a separate page, linked from the index page.
- In the plain text format, the descriptions of the pcregrep and pcretest - programs are in files called pcregrep.txt and pcretest.txt, respec- - tively. The remaining sections, except for the pcredemo section (which - is a program listing), are concatenated in pcre.txt, for ease of - searching. The sections are as follows: - - pcre this document - pcre-config show PCRE installation configuration information - pcre16 details of the 16-bit library - pcre32 details of the 32-bit library - pcreapi details of PCRE's native C API - pcrebuild building PCRE - pcrecallout details of the callout feature - pcrecompat discussion of Perl compatibility - pcrecpp details of the C++ wrapper for the 8-bit library - pcredemo a demonstration C program that uses PCRE - pcregrep description of the pcregrep command (8-bit only) - pcrejit discussion of the just-in-time optimization support - pcrelimits details of size and other limits - pcrematching discussion of the two matching algorithms - pcrepartial details of the partial matching facility - pcrepattern syntax and semantics of supported - regular expressions - pcreperform discussion of performance issues - pcreposix the POSIX-compatible C API for the 8-bit library - pcreprecompile details of saving and re-using precompiled patterns - pcresample discussion of the pcredemo program - pcrestack discussion of stack usage - pcresyntax quick syntax reference - pcretest description of the pcretest testing command - pcreunicode discussion of Unicode and UTF-8/16/32 support - - In the "man" and HTML formats, there is also a short page for each C - library function, listing its arguments and results. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - Putting an actual email address here seems to have been a spam magnet, - so I've taken it away. If you want to email me, use my two initials, - followed by the two digits 10, at the domain cam.ac.uk. 
- - -REVISION - - Last updated: 10 February 2015 - Copyright (c) 1997-2015 University of Cambridge. ------------------------------------------------------------------------------- - - -PCRE(3) Library Functions Manual PCRE(3) - - - -NAME - PCRE - Perl-compatible regular expressions - - #include <pcre.h> - - -PCRE 16-BIT API BASIC FUNCTIONS - - pcre16 *pcre16_compile(PCRE_SPTR16 pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre16 *pcre16_compile2(PCRE_SPTR16 pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre16_extra *pcre16_study(const pcre16 *code, int options, - const char **errptr); - - void pcre16_free_study(pcre16_extra *extra); - - int pcre16_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize); - - int pcre16_dfa_exec(const pcre16 *code, const pcre16_extra *extra, - PCRE_SPTR16 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); - - -PCRE 16-BIT API STRING EXTRACTION FUNCTIONS - - int pcre16_copy_named_substring(const pcre16 *code, - PCRE_SPTR16 subject, int *ovector, - int stringcount, PCRE_SPTR16 stringname, - PCRE_UCHAR16 *buffer, int buffersize); - - int pcre16_copy_substring(PCRE_SPTR16 subject, int *ovector, - int stringcount, int stringnumber, PCRE_UCHAR16 *buffer, - int buffersize); - - int pcre16_get_named_substring(const pcre16 *code, - PCRE_SPTR16 subject, int *ovector, - int stringcount, PCRE_SPTR16 stringname, - PCRE_SPTR16 *stringptr); - - int pcre16_get_stringnumber(const pcre16 *code, - PCRE_SPTR16 name); - - int pcre16_get_stringtable_entries(const pcre16 *code, - PCRE_SPTR16 name, PCRE_UCHAR16 **first, PCRE_UCHAR16 **last); - - int pcre16_get_substring(PCRE_SPTR16 subject, int *ovector, - int stringcount, int stringnumber, - PCRE_SPTR16 *stringptr); - - int
pcre16_get_substring_list(PCRE_SPTR16 subject, - int *ovector, int stringcount, PCRE_SPTR16 **listptr); - - void pcre16_free_substring(PCRE_SPTR16 stringptr); - - void pcre16_free_substring_list(PCRE_SPTR16 *stringptr); - - -PCRE 16-BIT API AUXILIARY FUNCTIONS - - pcre16_jit_stack *pcre16_jit_stack_alloc(int startsize, int maxsize); - - void pcre16_jit_stack_free(pcre16_jit_stack *stack); - - void pcre16_assign_jit_stack(pcre16_extra *extra, - pcre16_jit_callback callback, void *data); - - const unsigned char *pcre16_maketables(void); - - int pcre16_fullinfo(const pcre16 *code, const pcre16_extra *extra, - int what, void *where); - - int pcre16_refcount(pcre16 *code, int adjust); - - int pcre16_config(int what, void *where); - - const char *pcre16_version(void); - - int pcre16_pattern_to_host_byte_order(pcre16 *code, - pcre16_extra *extra, const unsigned char *tables); - - -PCRE 16-BIT API INDIRECTED FUNCTIONS - - void *(*pcre16_malloc)(size_t); - - void (*pcre16_free)(void *); - - void *(*pcre16_stack_malloc)(size_t); - - void (*pcre16_stack_free)(void *); - - int (*pcre16_callout)(pcre16_callout_block *); - - -PCRE 16-BIT API 16-BIT-ONLY FUNCTION - - int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *output, - PCRE_SPTR16 input, int length, int *byte_order, - int keep_boms); - - -THE PCRE 16-BIT LIBRARY - - Starting with release 8.30, it is possible to compile a PCRE library - that supports 16-bit character strings, including UTF-16 strings, as - well as or instead of the original 8-bit library. The majority of the - work to make this possible was done by Zoltan Herczeg. The two - libraries contain identical sets of functions, used in exactly the same - way. Only the names of the functions and the data types of their argu- - ments and results are different. To avoid over-complication and reduce - the documentation maintenance load, most of the PCRE documentation - describes the 8-bit library, with only occasional references to the - 16-bit library. 
This page describes what is different when you use the - 16-bit library. - - WARNING: A single application can be linked with both libraries, but - you must take care when processing any particular pattern to use func- - tions from just one library. For example, if you want to study a pat- - tern that was compiled with pcre16_compile(), you must do so with - pcre16_study(), not pcre_study(), and you must free the study data with - pcre16_free_study(). - - -THE HEADER FILE - - There is only one header file, pcre.h. It contains prototypes for all - the functions in all libraries, as well as definitions of flags, struc- - tures, error codes, etc. - - -THE LIBRARY NAME - - In Unix-like systems, the 16-bit library is called libpcre16, and can - normally be accessed by adding -lpcre16 to the command for linking an - application that uses PCRE. - - -STRING TYPES - - In the 8-bit library, strings are passed to PCRE library functions as - vectors of bytes with the C type "char *". In the 16-bit library, - strings are passed as vectors of unsigned 16-bit quantities. The macro - PCRE_UCHAR16 specifies an appropriate data type, and PCRE_SPTR16 is - defined as "const PCRE_UCHAR16 *". In very many environments, "short - int" is a 16-bit data type. When PCRE is built, it defines PCRE_UCHAR16 - as "unsigned short int", but checks that it really is a 16-bit data - type. If it is not, the build fails with an error message telling the - maintainer to modify the definition appropriately. - - -STRUCTURE TYPES - - The types of the opaque structures that are used for compiled 16-bit - patterns and JIT stacks are pcre16 and pcre16_jit_stack respectively. - The type of the user-accessible structure that is returned by - pcre16_study() is pcre16_extra, and the type of the structure that is - used for passing data to a callout function is pcre16_callout_block. - These structures contain the same fields, with the same names, as their - 8-bit counterparts.
The only difference is that pointers to character - strings are 16-bit instead of 8-bit types. - - -16-BIT FUNCTIONS - - For every function in the 8-bit library there is a corresponding func- - tion in the 16-bit library with a name that starts with pcre16_ instead - of pcre_. The prototypes are listed above. In addition, there is one - extra function, pcre16_utf16_to_host_byte_order(). This is a utility - function that converts a UTF-16 character string to host byte order if - necessary. The other 16-bit functions expect the strings they are - passed to be in host byte order. - - The input and output arguments of pcre16_utf16_to_host_byte_order() may - point to the same address, that is, conversion in place is supported. - The output buffer must be at least as long as the input. - - The length argument specifies the number of 16-bit data units in the - input string; a negative value specifies a zero-terminated string. - - If byte_order is NULL, it is assumed that the string starts off in host - byte order. This may be changed by byte-order marks (BOMs) anywhere in - the string (commonly as the first character). - - If byte_order is not NULL, a non-zero value of the integer to which it - points means that the input starts off in host byte order, otherwise - the opposite order is assumed. Again, BOMs in the string can change - this. The final byte order is passed back at the end of processing. - - If keep_boms is not zero, byte-order mark characters (0xfeff) are - copied into the output string. Otherwise they are discarded. - - The result of the function is the number of 16-bit units placed into - the output buffer, including the zero terminator if the string was - zero-terminated. - - -SUBJECT STRING OFFSETS - - The lengths and starting offsets of subject strings must be specified - in 16-bit data units, and the offsets within subject strings that are - returned by the matching functions are also in 16-bit units rather than - bytes.
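The byte-order rules described above for pcre16_utf16_to_host_byte_order() can be sketched in plain C. This is only an illustration under stated assumptions, not the library's implementation: the name utf16_to_host_sketch is hypothetical, uint16_t stands in for PCRE_UCHAR16, and a BOM that reads as 0xfffe is taken to mean that the following units are byte-swapped relative to the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE: copies `length` 16-bit units from
   `input` to `output`, swapping bytes whenever the data is not in host order.
   A negative length means the input is zero-terminated. */
int utf16_to_host_sketch(uint16_t *output, const uint16_t *input,
                         int length, int *byte_order, int keep_boms)
{
    /* Non-zero means "currently in host byte order"; NULL defaults to host. */
    int host_order = (byte_order != NULL) ? *byte_order : 1;
    int written = 0;
    int i;

    if (length < 0) {
        /* Count units up to and including the zero terminator. */
        length = 0;
        while (input[length] != 0) length++;
        length++;
    }

    for (i = 0; i < length; i++) {
        uint16_t c = input[i];
        if (c == 0xfeff || c == 0xfffe) {
            /* A BOM read as 0xfeff is in host order; 0xfffe is swapped. */
            host_order = (c == 0xfeff);
            if (keep_boms) output[written++] = 0xfeff;
        } else {
            output[written++] = host_order ? c
                                           : (uint16_t)((c >> 8) | (c << 8));
        }
    }

    /* Report the final byte order back to the caller, if requested. */
    if (byte_order != NULL) *byte_order = host_order;
    return written;
}
```

Passing the same pointer for input and output converts in place, because the sketch never writes ahead of the unit it has just read.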
- - -NAMED SUBPATTERNS - - The name-to-number translation table that is maintained for named sub- - patterns uses 16-bit characters. The pcre16_get_stringtable_entries() - function returns the length of each entry in the table as the number of - 16-bit data units. - - -OPTION NAMES - - There are two new general option names, PCRE_UTF16 and - PCRE_NO_UTF16_CHECK, which correspond to PCRE_UTF8 and - PCRE_NO_UTF8_CHECK in the 8-bit library. In fact, these new options - define the same bits in the options word. There is a discussion about - the validity of UTF-16 strings in the pcreunicode page. - - For the pcre16_config() function there is an option PCRE_CONFIG_UTF16 - that returns 1 if UTF-16 support is configured, otherwise 0. If this - option is given to pcre_config() or pcre32_config(), or if the - PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 option is given to pcre16_con- - fig(), the result is the PCRE_ERROR_BADOPTION error. - - -CHARACTER CODES - - In 16-bit mode, when PCRE_UTF16 is not set, character values are - treated in the same way as in 8-bit, non UTF-8 mode, except, of course, - that they can range from 0 to 0xffff instead of 0 to 0xff. Character - types for characters less than 0xff can therefore be influenced by the - locale in the same way as before. Characters greater than 0xff have - only one case, and no "type" (such as letter or digit). - - In UTF-16 mode, the character code is Unicode, in the range 0 to - 0x10ffff, with the exception of values in the range 0xd800 to 0xdfff - because those are "surrogate" values that are used in pairs to encode - values greater than 0xffff. - - A UTF-16 string can indicate its endianness by a special code known as a - byte-order mark (BOM). The PCRE functions do not handle this, expecting - strings to be in host byte order. A utility function called - pcre16_utf16_to_host_byte_order() is provided to help with this (see - above).
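To make the surrogate mechanism mentioned above concrete, the sketch below combines a high/low surrogate pair into a single code point. It is not a PCRE function: the name utf16_decode_sketch and its return convention are assumptions made for this example, and uint16_t stands in for PCRE_UCHAR16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of PCRE. Decodes one character starting at
   s[0], where `avail` is the number of 16-bit units available (at least 1).
   Stores the code point in *cp and returns the number of units consumed
   (1 or 2), or 0 if the surrogates are not correctly paired. */
int utf16_decode_sketch(const uint16_t *s, size_t avail, uint32_t *cp)
{
    uint16_t hi = s[0];

    if (hi < 0xd800 || hi > 0xdfff) {    /* ordinary character, one unit */
        *cp = hi;
        return 1;
    }
    if (hi >= 0xdc00) return 0;          /* isolated low surrogate */
    if (avail < 2) return 0;             /* high surrogate at end of string */
    if (s[1] < 0xdc00 || s[1] > 0xdfff)  /* high surrogate not followed by low */
        return 0;

    /* Each surrogate carries 10 bits; the pair encodes 0x10000..0x10ffff. */
    *cp = 0x10000u + (((uint32_t)hi - 0xd800u) << 10)
                   + ((uint32_t)s[1] - 0xdc00u);
    return 2;
}
```

For instance, the pair 0xd83d 0xde00 decodes to U+1F600. The three rejection branches cover the ways a surrogate sequence can be malformed: a low surrogate on its own, a high surrogate at the end of the string, and a high surrogate followed by something other than a low surrogate.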
- - -ERROR NAMES - - The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 corre- - spond to their 8-bit counterparts. The error PCRE_ERROR_BADMODE is - given when a compiled pattern is passed to a function that processes - patterns in the other mode, for example, if a pattern compiled with - pcre_compile() is passed to pcre16_exec(). - - There are new error codes whose names begin with PCRE_UTF16_ERR for - invalid UTF-16 strings, corresponding to the PCRE_UTF8_ERR codes for - UTF-8 strings that are described in the section entitled "Reason codes - for invalid UTF-8 strings" in the main pcreapi page. The UTF-16 errors - are: - - PCRE_UTF16_ERR1 Missing low surrogate at end of string - PCRE_UTF16_ERR2 Invalid low surrogate follows high surrogate - PCRE_UTF16_ERR3 Isolated low surrogate - PCRE_UTF16_ERR4 Non-character - - -ERROR TEXTS - - If there is an error while compiling a pattern, the error text that is - passed back by pcre16_compile() or pcre16_compile2() is still an 8-bit - character string, zero-terminated. - - -CALLOUTS - - The subject and mark fields in the callout block that is passed to a - callout function point to 16-bit vectors. - - -TESTING - - The pcretest program continues to operate with 8-bit input and output - files, but it can be used for testing the 16-bit library. If it is run - with the command line option -16, patterns and subject strings are con- - verted from 8-bit to 16-bit before being passed to PCRE, and the 16-bit - library functions are used instead of the 8-bit ones. Returned 16-bit - strings are converted to 8-bit for output. If both the 8-bit and the - 32-bit libraries were not compiled, pcretest defaults to 16-bit and the - -16 option is ignored. - - When PCRE is being built, the RunTest script that is called by "make - check" uses the pcretest -C option to discover which of the 8-bit, - 16-bit and 32-bit libraries has been built, and runs the tests appro- - priately. 
- - -NOT SUPPORTED IN 16-BIT MODE - - Not all the features of the 8-bit library are available with the 16-bit - library. The C++ and POSIX wrapper functions support only the 8-bit - library, and the pcregrep program is at present 8-bit only. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 12 May 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCRE(3) Library Functions Manual PCRE(3) - - - -NAME - PCRE - Perl-compatible regular expressions - - #include <pcre.h> - - -PCRE 32-BIT API BASIC FUNCTIONS - - pcre32 *pcre32_compile(PCRE_SPTR32 pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre32 *pcre32_compile2(PCRE_SPTR32 pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre32_extra *pcre32_study(const pcre32 *code, int options, - const char **errptr); - - void pcre32_free_study(pcre32_extra *extra); - - int pcre32_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize); - - int pcre32_dfa_exec(const pcre32 *code, const pcre32_extra *extra, - PCRE_SPTR32 subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); - - -PCRE 32-BIT API STRING EXTRACTION FUNCTIONS - - int pcre32_copy_named_substring(const pcre32 *code, - PCRE_SPTR32 subject, int *ovector, - int stringcount, PCRE_SPTR32 stringname, - PCRE_UCHAR32 *buffer, int buffersize); - - int pcre32_copy_substring(PCRE_SPTR32 subject, int *ovector, - int stringcount, int stringnumber, PCRE_UCHAR32 *buffer, - int buffersize); - - int pcre32_get_named_substring(const pcre32 *code, - PCRE_SPTR32 subject, int *ovector, - int stringcount, PCRE_SPTR32 stringname, - PCRE_SPTR32 *stringptr); - - int pcre32_get_stringnumber(const pcre32 *code, -
PCRE_SPTR32 name); - - int pcre32_get_stringtable_entries(const pcre32 *code, - PCRE_SPTR32 name, PCRE_UCHAR32 **first, PCRE_UCHAR32 **last); - - int pcre32_get_substring(PCRE_SPTR32 subject, int *ovector, - int stringcount, int stringnumber, - PCRE_SPTR32 *stringptr); - - int pcre32_get_substring_list(PCRE_SPTR32 subject, - int *ovector, int stringcount, PCRE_SPTR32 **listptr); - - void pcre32_free_substring(PCRE_SPTR32 stringptr); - - void pcre32_free_substring_list(PCRE_SPTR32 *stringptr); - - -PCRE 32-BIT API AUXILIARY FUNCTIONS - - pcre32_jit_stack *pcre32_jit_stack_alloc(int startsize, int maxsize); - - void pcre32_jit_stack_free(pcre32_jit_stack *stack); - - void pcre32_assign_jit_stack(pcre32_extra *extra, - pcre32_jit_callback callback, void *data); - - const unsigned char *pcre32_maketables(void); - - int pcre32_fullinfo(const pcre32 *code, const pcre32_extra *extra, - int what, void *where); - - int pcre32_refcount(pcre32 *code, int adjust); - - int pcre32_config(int what, void *where); - - const char *pcre32_version(void); - - int pcre32_pattern_to_host_byte_order(pcre32 *code, - pcre32_extra *extra, const unsigned char *tables); - - -PCRE 32-BIT API INDIRECTED FUNCTIONS - - void *(*pcre32_malloc)(size_t); - - void (*pcre32_free)(void *); - - void *(*pcre32_stack_malloc)(size_t); - - void (*pcre32_stack_free)(void *); - - int (*pcre32_callout)(pcre32_callout_block *); - - -PCRE 32-BIT API 32-BIT-ONLY FUNCTION - - int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *output, - PCRE_SPTR32 input, int length, int *byte_order, - int keep_boms); - - -THE PCRE 32-BIT LIBRARY - - Starting with release 8.32, it is possible to compile a PCRE library - that supports 32-bit character strings, including UTF-32 strings, as - well as or instead of the original 8-bit library. This work was done by - Christian Persch, based on the work done by Zoltan Herczeg for the - 16-bit library. All three libraries contain identical sets of func- - tions, used in exactly the same way. 
Only the names of the functions - and the data types of their arguments and results are different. To - avoid over-complication and reduce the documentation maintenance load, - most of the PCRE documentation describes the 8-bit library, with only - occasional references to the 16-bit and 32-bit libraries. This page - describes what is different when you use the 32-bit library. - - WARNING: A single application can be linked with all or any of the - three libraries, but you must take care when processing any particular - pattern to use functions from just one library. For example, if you - want to study a pattern that was compiled with pcre32_compile(), you - must do so with pcre32_study(), not pcre_study(), and you must free the - study data with pcre32_free_study(). - - -THE HEADER FILE - - There is only one header file, pcre.h. It contains prototypes for all - the functions in all libraries, as well as definitions of flags, struc- - tures, error codes, etc. - - -THE LIBRARY NAME - - In Unix-like systems, the 32-bit library is called libpcre32, and can - normally be accessed by adding -lpcre32 to the command for linking an - application that uses PCRE. - - -STRING TYPES - - In the 8-bit library, strings are passed to PCRE library functions as - vectors of bytes with the C type "char *". In the 32-bit library, - strings are passed as vectors of unsigned 32-bit quantities. The macro - PCRE_UCHAR32 specifies an appropriate data type, and PCRE_SPTR32 is - defined as "const PCRE_UCHAR32 *". In very many environments, "unsigned - int" is a 32-bit data type. When PCRE is built, it defines PCRE_UCHAR32 - as "unsigned int", but checks that it really is a 32-bit data type. If - it is not, the build fails with an error message telling the maintainer - to modify the definition appropriately. - - -STRUCTURE TYPES - - The types of the opaque structures that are used for compiled 32-bit - patterns and JIT stacks are pcre32 and pcre32_jit_stack respectively.
- The type of the user-accessible structure that is returned by - pcre32_study() is pcre32_extra, and the type of the structure that is - used for passing data to a callout function is pcre32_callout_block. - These structures contain the same fields, with the same names, as their - 8-bit counterparts. The only difference is that pointers to character - strings are 32-bit instead of 8-bit types. - - -32-BIT FUNCTIONS - - For every function in the 8-bit library there is a corresponding func- - tion in the 32-bit library with a name that starts with pcre32_ instead - of pcre_. The prototypes are listed above. In addition, there is one - extra function, pcre32_utf32_to_host_byte_order(). This is a utility - function that converts a UTF-32 character string to host byte order if - necessary. The other 32-bit functions expect the strings they are - passed to be in host byte order. - - The input and output arguments of pcre32_utf32_to_host_byte_order() may - point to the same address, that is, conversion in place is supported. - The output buffer must be at least as long as the input. - - The length argument specifies the number of 32-bit data units in the - input string; a negative value specifies a zero-terminated string. - - If byte_order is NULL, it is assumed that the string starts off in host - byte order. This may be changed by byte-order marks (BOMs) anywhere in - the string (commonly as the first character). - - If byte_order is not NULL, a non-zero value of the integer to which it - points means that the input starts off in host byte order, otherwise - the opposite order is assumed. Again, BOMs in the string can change - this. The final byte order is passed back at the end of processing. - - If keep_boms is not zero, byte-order mark characters (0xfeff) are - copied into the output string. Otherwise they are discarded. 
- - The result of the function is the number of 32-bit units placed into - the output buffer, including the zero terminator if the string was - zero-terminated. - - -SUBJECT STRING OFFSETS - - The lengths and starting offsets of subject strings must be specified - in 32-bit data units, and the offsets within subject strings that are - returned by the matching functions are also in 32-bit units rather than - bytes. - - -NAMED SUBPATTERNS - - The name-to-number translation table that is maintained for named sub- - patterns uses 32-bit characters. The pcre32_get_stringtable_entries() - function returns the length of each entry in the table as the number of - 32-bit data units. - - -OPTION NAMES - - There are two new general option names, PCRE_UTF32 and - PCRE_NO_UTF32_CHECK, which correspond to PCRE_UTF8 and - PCRE_NO_UTF8_CHECK in the 8-bit library. In fact, these new options - define the same bits in the options word. There is a discussion about - the validity of UTF-32 strings in the pcreunicode page. - - For the pcre32_config() function there is an option PCRE_CONFIG_UTF32 - that returns 1 if UTF-32 support is configured, otherwise 0. If this - option is given to pcre_config() or pcre16_config(), or if the - PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 option is given to pcre32_con- - fig(), the result is the PCRE_ERROR_BADOPTION error. - - -CHARACTER CODES - - In 32-bit mode, when PCRE_UTF32 is not set, character values are - treated in the same way as in 8-bit, non UTF-8 mode, except, of course, - that they can range from 0 to 0x7fffffff instead of 0 to 0xff. Charac- - ter types for characters less than 0xff can therefore be influenced by - the locale in the same way as before. Characters greater than 0xff - have only one case, and no "type" (such as letter or digit).
- - In UTF-32 mode, the character code is Unicode, in the range 0 to - 0x10ffff, with the exception of values in the range 0xd800 to 0xdfff - because those are "surrogate" values that are ill-formed in UTF-32. - - A UTF-32 string can indicate its endianness by a special code known as a - byte-order mark (BOM). The PCRE functions do not handle this, expecting - strings to be in host byte order. A utility function called - pcre32_utf32_to_host_byte_order() is provided to help with this (see - above). - - -ERROR NAMES - - The error PCRE_ERROR_BADUTF32 corresponds to its 8-bit counterpart. - The error PCRE_ERROR_BADMODE is given when a compiled pattern is passed - to a function that processes patterns in the other mode, for example, - if a pattern compiled with pcre_compile() is passed to pcre32_exec(). - - There are new error codes whose names begin with PCRE_UTF32_ERR for - invalid UTF-32 strings, corresponding to the PCRE_UTF8_ERR codes for - UTF-8 strings that are described in the section entitled "Reason codes - for invalid UTF-8 strings" in the main pcreapi page. The UTF-32 errors - are: - - PCRE_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff) - PCRE_UTF32_ERR2 Non-character - PCRE_UTF32_ERR3 Character > 0x10ffff - - -ERROR TEXTS - - If there is an error while compiling a pattern, the error text that is - passed back by pcre32_compile() or pcre32_compile2() is still an 8-bit - character string, zero-terminated. - - -CALLOUTS - - The subject and mark fields in the callout block that is passed to a - callout function point to 32-bit vectors. - - -TESTING - - The pcretest program continues to operate with 8-bit input and output - files, but it can be used for testing the 32-bit library. If it is run - with the command line option -32, patterns and subject strings are con- - verted from 8-bit to 32-bit before being passed to PCRE, and the 32-bit - library functions are used instead of the 8-bit ones.
Returned 32-bit - strings are converted to 8-bit for output. If both the 8-bit and the - 16-bit libraries were not compiled, pcretest defaults to 32-bit and the - -32 option is ignored. - - When PCRE is being built, the RunTest script that is called by "make - check" uses the pcretest -C option to discover which of the 8-bit, - 16-bit and 32-bit libraries has been built, and runs the tests appro- - priately. - - -NOT SUPPORTED IN 32-BIT MODE - - Not all the features of the 8-bit library are available with the 32-bit - library. The C++ and POSIX wrapper functions support only the 8-bit - library, and the pcregrep program is at present 8-bit only. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 12 May 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREBUILD(3) Library Functions Manual PCREBUILD(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -BUILDING PCRE - - PCRE is distributed with a configure script that can be used to build - the library in Unix-like environments using the applications known as - Autotools. Also in the distribution are files to support building - using CMake instead of configure. The text file README contains general - information about building with Autotools (some of which is repeated - below), and also has some comments about building on various operating - systems. There is a lot more information about building PCRE without - using Autotools (including information about using CMake and building - "by hand") in the text file called NON-AUTOTOOLS-BUILD. You should - consult this file as well as the README file if you are building in a - non-Unix-like environment. - - -PCRE BUILD-TIME OPTIONS - - The rest of this document describes the optional features of PCRE that - can be selected when the library is compiled. 
It assumes use of the - configure script, where the optional features are selected or dese- - lected by providing options to configure before running the make com- - mand. However, the same options can be selected in both Unix-like and - non-Unix-like environments using the GUI facility of cmake-gui if you - are using CMake instead of configure to build PCRE. - - If you are not using Autotools or CMake, option selection can be done - by editing the config.h file, or by passing parameter settings to the - compiler, as described in NON-AUTOTOOLS-BUILD. - - The complete list of options for configure (which includes the standard - ones such as the selection of the installation directory) can be - obtained by running - - ./configure --help - - The following sections include descriptions of options whose names - begin with --enable or --disable. These settings specify changes to the - defaults for the configure command. Because of the way that configure - works, --enable and --disable always come in pairs, so the complemen- - tary option always exists as well, but as it specifies the default, it - is not described. - - -BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES - - By default, a library called libpcre is built, containing functions - that take string arguments contained in vectors of bytes, either as - single-byte characters, or interpreted as UTF-8 strings. You can also - build a separate library, called libpcre16, in which strings are con- - tained in vectors of 16-bit data units and interpreted either as sin- - gle-unit characters or UTF-16 strings, by adding - - --enable-pcre16 - - to the configure command. You can also build yet another separate - library, called libpcre32, in which strings are contained in vectors of - 32-bit data units and interpreted either as single-unit characters or - UTF-32 strings, by adding - - --enable-pcre32 - - to the configure command. If you do not want the 8-bit library, add - - --disable-pcre8 - - as well. 
At least one of the three libraries must be built. Note that - the C++ and POSIX wrappers are for the 8-bit library only, and that - pcregrep is an 8-bit program. None of these are built if you select - only the 16-bit or 32-bit libraries. - - -BUILDING SHARED AND STATIC LIBRARIES - - The Autotools PCRE building process uses libtool to build both shared - and static libraries by default. You can suppress one of these by - adding one of - - --disable-shared - --disable-static - - to the configure command, as required. - - -C++ SUPPORT - - By default, if the 8-bit library is being built, the configure script - will search for a C++ compiler and C++ header files. If it finds them, - it automatically builds the C++ wrapper library (which supports only - 8-bit strings). You can disable this by adding - - --disable-cpp - - to the configure command. - - -UTF-8, UTF-16 AND UTF-32 SUPPORT - - To build PCRE with support for UTF Unicode character strings, add - - --enable-utf - - to the configure command. This setting applies to all three libraries, - adding support for UTF-8 to the 8-bit library, support for UTF-16 to - the 16-bit library, and support for UTF-32 to the to the 32-bit - library. There are no separate options for enabling UTF-8, UTF-16 and - UTF-32 independently because that would allow ridiculous settings such - as requesting UTF-16 support while building only the 8-bit library. It - is not possible to build one library with UTF support and another with- - out in the same configuration. (For backwards compatibility, --enable- - utf8 is a synonym of --enable-utf.) - - Of itself, this setting does not make PCRE treat strings as UTF-8, - UTF-16 or UTF-32. As well as compiling PCRE with this option, you also - have have to set the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as - appropriate) when you call one of the pattern compiling functions. 
- - If you set --enable-utf when compiling in an EBCDIC environment, PCRE - expects its input to be either ASCII or UTF-8 (depending on the run- - time option). It is not possible to support both EBCDIC and UTF-8 codes - in the same version of the library. Consequently, --enable-utf and - --enable-ebcdic are mutually exclusive. - - -UNICODE CHARACTER PROPERTY SUPPORT - - UTF support allows the libraries to process character codepoints up to - 0x10ffff in the strings that they handle. On its own, however, it does - not provide any facilities for accessing the properties of such charac- - ters. If you want to be able to use the pattern escapes \P, \p, and \X, - which refer to Unicode character properties, you must add - - --enable-unicode-properties - - to the configure command. This implies UTF support, even if you have - not explicitly requested it. - - Including Unicode property support adds around 30K of tables to the - PCRE library. Only the general category properties such as Lu and Nd - are supported. Details are given in the pcrepattern documentation. - - -JUST-IN-TIME COMPILER SUPPORT - - Just-in-time compiler support is included in the build by specifying - - --enable-jit - - This support is available only for certain hardware architectures. If - this option is set for an unsupported architecture, a compile time - error occurs. See the pcrejit documentation for a discussion of JIT - usage. When JIT support is enabled, pcregrep automatically makes use of - it, unless you add - - --disable-pcregrep-jit - - to the "configure" command. - - -CODE VALUE OF NEWLINE - - By default, PCRE interprets the linefeed (LF) character as indicating - the end of a line. This is the normal newline character on Unix-like - systems. You can compile PCRE to use carriage return (CR) instead, by - adding - - --enable-newline-is-cr - - to the configure command. There is also a --enable-newline-is-lf - option, which explicitly specifies linefeed as the newline character. 
- - Alternatively, you can specify that line endings are to be indicated by - the two character sequence CRLF. If you want this, add - - --enable-newline-is-crlf - - to the configure command. There is a fourth option, specified by - - --enable-newline-is-anycrlf - - which causes PCRE to recognize any of the three sequences CR, LF, or - CRLF as indicating a line ending. Finally, a fifth option, specified by - - --enable-newline-is-any - - causes PCRE to recognize any Unicode newline sequence. - - Whatever line ending convention is selected when PCRE is built can be - overridden when the library functions are called. At build time it is - conventional to use the standard for your operating system. - - -WHAT \R MATCHES - - By default, the sequence \R in a pattern matches any Unicode newline - sequence, whatever has been selected as the line ending sequence. If - you specify - - --enable-bsr-anycrlf - - the default is changed so that \R matches only CR, LF, or CRLF. What- - ever is selected when PCRE is built can be overridden when the library - functions are called. - - -POSIX MALLOC USAGE - - When the 8-bit library is called through the POSIX interface (see the - pcreposix documentation), additional working storage is required for - holding the pointers to capturing substrings, because PCRE requires - three integers per substring, whereas the POSIX interface provides only - two. If the number of expected substrings is small, the wrapper func- - tion uses space on the stack, because this is faster than using mal- - loc() for each call. The default threshold above which the stack is no - longer used is 10; it can be changed by adding a setting such as - - --with-posix-malloc-threshold=20 - - to the configure command. - - -HANDLING VERY LARGE PATTERNS - - Within a compiled pattern, offset values are used to point from one - part to another (for example, from an opening parenthesis to an alter- - nation metacharacter). 
By default, in the 8-bit and 16-bit libraries, - two-byte values are used for these offsets, leading to a maximum size - for a compiled pattern of around 64K. This is sufficient to handle all - but the most gigantic patterns. Nevertheless, some people do want to - process truly enormous patterns, so it is possible to compile PCRE to - use three-byte or four-byte offsets by adding a setting such as - - --with-link-size=3 - - to the configure command. The value given must be 2, 3, or 4. For the - 16-bit library, a value of 3 is rounded up to 4. In these libraries, - using longer offsets slows down the operation of PCRE because it has to - load additional data when handling them. For the 32-bit library the - value is always 4 and cannot be overridden; the value of --with-link- - size is ignored. - - -AVOIDING EXCESSIVE STACK USAGE - - When matching with the pcre_exec() function, PCRE implements backtrack- - ing by making recursive calls to an internal function called match(). - In environments where the size of the stack is limited, this can se- - verely limit PCRE's operation. (The Unix environment does not usually - suffer from this problem, but it may sometimes be necessary to increase - the maximum stack size. There is a discussion in the pcrestack docu- - mentation.) An alternative approach to recursion that uses memory from - the heap to remember data, instead of using recursive function calls, - has been implemented to work round the problem of limited stack size. - If you want to build a version of PCRE that works this way, add - - --disable-stack-for-recursion - - to the configure command. With this configuration, PCRE will use the - pcre_stack_malloc and pcre_stack_free variables to call memory manage- - ment functions. By default these point to malloc() and free(), but you - can replace the pointers so that your own functions are used instead. 
- - Separate functions are provided rather than using pcre_malloc and - pcre_free because the usage is very predictable: the block sizes - requested are always the same, and the blocks are always freed in - reverse order. A calling program might be able to implement optimized - functions that perform better than malloc() and free(). PCRE runs - noticeably more slowly when built in this way. This option affects only - the pcre_exec() function; it is not relevant for pcre_dfa_exec(). - - -LIMITING PCRE RESOURCE USAGE - - Internally, PCRE has a function called match(), which it calls repeat- - edly (sometimes recursively) when matching a pattern with the - pcre_exec() function. By controlling the maximum number of times this - function may be called during a single matching operation, a limit can - be placed on the resources used by a single call to pcre_exec(). The - limit can be changed at run time, as described in the pcreapi documen- - tation. The default is 10 million, but this can be changed by adding a - setting such as - - --with-match-limit=500000 - - to the configure command. This setting has no effect on the - pcre_dfa_exec() matching function. - - In some environments it is desirable to limit the depth of recursive - calls of match() more strictly than the total number of calls, in order - to restrict the maximum amount of stack (or heap, if --disable-stack- - for-recursion is specified) that is used. A second limit controls this; - it defaults to the value that is set for --with-match-limit, which - imposes no additional constraints. However, you can set a lower limit - by adding, for example, - - --with-match-limit-recursion=10000 - - to the configure command. This value can also be overridden at run - time. - - -CREATING CHARACTER TABLES AT BUILD TIME - - PCRE uses fixed tables for processing characters whose code values are - less than 256. By default, PCRE is built with a set of tables that are - distributed in the file pcre_chartables.c.dist. 
These tables are for - ASCII codes only. If you add - - --enable-rebuild-chartables - - to the configure command, the distributed tables are no longer used. - Instead, a program called dftables is compiled and run. This outputs - the source for new set of tables, created in the default locale of your - C run-time system. (This method of replacing the tables does not work - if you are cross compiling, because dftables is run on the local host. - If you need to create alternative tables when cross compiling, you will - have to do so "by hand".) - - -USING EBCDIC CODE - - PCRE assumes by default that it will run in an environment where the - character code is ASCII (or Unicode, which is a superset of ASCII). - This is the case for most computer operating systems. PCRE can, how- - ever, be compiled to run in an EBCDIC environment by adding - - --enable-ebcdic - - to the configure command. This setting implies --enable-rebuild-charta- - bles. You should only use it if you know that you are in an EBCDIC - environment (for example, an IBM mainframe operating system). The - --enable-ebcdic option is incompatible with --enable-utf. - - The EBCDIC character that corresponds to an ASCII LF is assumed to have - the value 0x15 by default. However, in some EBCDIC environments, 0x25 - is used. In such an environment you should use - - --enable-ebcdic-nl25 - - as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR - has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and - 0x25 is not chosen as LF is made to correspond to the Unicode NEL char- - acter (which, in Unicode, is 0x85). - - The options that select newline behaviour, such as --enable-newline-is- - cr, and equivalent run-time options, refer to these character values in - an EBCDIC environment. - - -PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT - - By default, pcregrep reads all files as plain text. 
You can build it so - that it recognizes files whose names end in .gz or .bz2, and reads them - with libz or libbz2, respectively, by adding one or both of - - --enable-pcregrep-libz - --enable-pcregrep-libbz2 - - to the configure command. These options naturally require that the rel- - evant libraries are installed on your system. Configuration will fail - if they are not. - - -PCREGREP BUFFER SIZE - - pcregrep uses an internal buffer to hold a "window" on the file it is - scanning, in order to be able to output "before" and "after" lines when - it finds a match. The size of the buffer is controlled by a parameter - whose default value is 20K. The buffer itself is three times this size, - but because of the way it is used for holding "before" lines, the long- - est line that is guaranteed to be processable is the parameter size. - You can change the default parameter value by adding, for example, - - --with-pcregrep-bufsize=50K - - to the configure command. The caller of pcregrep can, however, override - this value by specifying a run-time option. - - -PCRETEST OPTION FOR LIBREADLINE SUPPORT - - If you add - - --enable-pcretest-libreadline - - to the configure command, pcretest is linked with the libreadline - library, and when its input is from a terminal, it reads it using the - readline() function. This provides line-editing and history facilities. - Note that libreadline is GPL-licensed, so if you distribute a binary of - pcretest linked in this way, there may be licensing issues. - - Setting this option causes the -lreadline option to be added to the - pcretest build. In many operating environments with a sytem-installed - libreadline this is sufficient. However, in some environments (e.g. if - an unmodified distribution version of readline is in use), some extra - configuration may be necessary. 
The INSTALL file for libreadline says - this: - - "Readline uses the termcap functions, but does not link with the - termcap or curses library itself, allowing applications which link - with readline the to choose an appropriate library." - - If your environment has not been set up so that an appropriate library - is automatically included, you may need to add something like - - LIBS="-ncurses" - - immediately before the configure command. - - -DEBUGGING WITH VALGRIND SUPPORT - - By adding the - - --enable-valgrind - - option to to the configure command, PCRE will use valgrind annotations - to mark certain memory regions as unaddressable. This allows it to - detect invalid memory accesses, and is mostly useful for debugging PCRE - itself. - - -CODE COVERAGE REPORTING - - If your C compiler is gcc, you can build a version of PCRE that can - generate a code coverage report for its test suite. To enable this, you - must install lcov version 1.6 or above. Then specify - - --enable-coverage - - to the configure command and build PCRE in the usual way. - - Note that using ccache (a caching C compiler) is incompatible with code - coverage reporting. If you have configured ccache to run automatically - on your system, you must set the environment variable - - CCACHE_DISABLE=1 - - before running make to build PCRE, so that ccache is not used. - - When --enable-coverage is used, the following addition targets are - added to the Makefile: - - make coverage - - This creates a fresh coverage report for the PCRE test suite. It is - equivalent to running "make coverage-reset", "make coverage-baseline", - "make check", and then "make coverage-report". - - make coverage-reset - - This zeroes the coverage counters, but does nothing else. - - make coverage-baseline - - This captures baseline coverage information. - - make coverage-report - - This creates the coverage report. 
- - make coverage-clean-report - - This removes the generated coverage report without cleaning the cover- - age data itself. - - make coverage-clean-data - - This removes the captured coverage data without removing the coverage - files created at compile time (*.gcno). - - make coverage-clean - - This cleans all coverage data including the generated coverage report. - For more information about code coverage, see the gcov and lcov docu- - mentation. - - -SEE ALSO - - pcreapi(3), pcre16, pcre32, pcre_config(3). - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 12 May 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREMATCHING(3) Library Functions Manual PCREMATCHING(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE MATCHING ALGORITHMS - - This document describes the two different algorithms that are available - in PCRE for matching a compiled regular expression against a given sub- - ject string. The "standard" algorithm is the one provided by the - pcre_exec(), pcre16_exec() and pcre32_exec() functions. These work in - the same as as Perl's matching function, and provide a Perl-compatible - matching operation. The just-in-time (JIT) optimization that is - described in the pcrejit documentation is compatible with these func- - tions. - - An alternative algorithm is provided by the pcre_dfa_exec(), - pcre16_dfa_exec() and pcre32_dfa_exec() functions; they operate in a - different way, and are not Perl-compatible. This alternative has advan- - tages and disadvantages compared with the standard algorithm, and these - are described below. - - When there is only one possible way in which a given subject string can - match a pattern, the two algorithms give the same answer. A difference - arises, however, when there are multiple possibilities. 
For example, if - the pattern - - ^<.*> - - is matched against the string - - <something> <something else> <something further> - - there are three possible answers. The standard algorithm finds only one - of them, whereas the alternative algorithm finds all three. - - -REGULAR EXPRESSIONS AS TREES - - The set of strings that are matched by a regular expression can be rep- - resented as a tree structure. An unlimited repetition in the pattern - makes the tree of infinite size, but it is still a tree. Matching the - pattern to a given subject string (from a given starting point) can be - thought of as a search of the tree. There are two ways to search a - tree: depth-first and breadth-first, and these correspond to the two - matching algorithms provided by PCRE. - - -THE STANDARD MATCHING ALGORITHM - - In the terminology of Jeffrey Friedl's book "Mastering Regular Expres- - sions", the standard algorithm is an "NFA algorithm". It conducts a - depth-first search of the pattern tree. That is, it proceeds along a - single path through the tree, checking that the subject matches what is - required. When there is a mismatch, the algorithm tries any alterna- - tives at the current point, and if they all fail, it backs up to the - previous branch point in the tree, and tries the next alternative - branch at that level. This often involves backing up (moving to the - left) in the subject string as well. The order in which repetition - branches are tried is controlled by the greedy or ungreedy nature of - the quantifier. - - If a leaf node is reached, a matching string has been found, and at - that point the algorithm stops. Thus, if there is more than one possi- - ble match, this algorithm returns the first one that it finds. Whether - this is the shortest, the longest, or some intermediate length depends - on the way the greedy and ungreedy repetition quantifiers are specified - in the pattern.
- - Because it ends up with a single path through the tree, it is rela- - tively straightforward for this algorithm to keep track of the sub- - strings that are matched by portions of the pattern in parentheses. - This provides support for capturing parentheses and back references. - - -THE ALTERNATIVE MATCHING ALGORITHM - - This algorithm conducts a breadth-first search of the tree. Starting - from the first matching point in the subject, it scans the subject - string from left to right, once, character by character, and as it does - this, it remembers all the paths through the tree that represent valid - matches. In Friedl's terminology, this is a kind of "DFA algorithm", - though it is not implemented as a traditional finite state machine (it - keeps multiple states active simultaneously). - - Although the general principle of this matching algorithm is that it - scans the subject string only once, without backtracking, there is one - exception: when a lookaround assertion is encountered, the characters - following or preceding the current point have to be independently - inspected. - - The scan continues until either the end of the subject is reached, or - there are no more unterminated paths. At this point, terminated paths - represent the different matching possibilities (if there are none, the - match has failed). Thus, if there is more than one possible match, - this algorithm finds all of them, and in particular, it finds the long- - est. The matches are returned in decreasing order of length. There is - an option to stop the algorithm after the first match (which is neces- - sarily the shortest) is found. - - Note that all the matches that are found start at the same point in the - subject. If the pattern - - cat(er(pillar)?)? - - is matched against the string "the caterpillar catchment", the result - will be the three strings "caterpillar", "cater", and "cat" that start - at the fifth character of the subject. 
The algorithm does not automati- - cally move on to find matches that start at later positions. - - PCRE's "auto-possessification" optimization usually applies to charac- - ter repeats at the end of a pattern (as well as internally). For exam- - ple, the pattern "a\d+" is compiled as if it were "a\d++" because there - is no point even considering the possibility of backtracking into the - repeated digits. For DFA matching, this means that only one possible - match is found. If you really do want multiple matches in such cases, - either use an ungreedy repeat ("a\d+?") or set the PCRE_NO_AUTO_POSSESS - option when compiling. - - There are a number of features of PCRE regular expressions that are not - supported by the alternative matching algorithm. They are as follows: - - 1. Because the algorithm finds all possible matches, the greedy or - ungreedy nature of repetition quantifiers is not relevant. Greedy and - ungreedy quantifiers are treated in exactly the same way. However, pos- - sessive quantifiers can make a difference when what follows could also - match what is quantified, for example in a pattern like this: - - ^a++\w! - - This pattern matches "aaab!" but not "aaa!", which would be matched by - a non-possessive quantifier. Similarly, if an atomic group is present, - it is matched as if it were a standalone pattern at the current point, - and the longest match is then "locked in" for the rest of the overall - pattern. - - 2. When dealing with multiple paths through the tree simultaneously, it - is not straightforward to keep track of captured substrings for the - different matching possibilities, and PCRE's implementation of this - algorithm does not attempt to do this. This means that no captured sub- - strings are available. - - 3. Because no substrings are captured, back references within the pat- - tern are not supported, and cause errors if encountered. - - 4. 
For the same reason, conditional expressions that use a backrefer- - ence as the condition or test for a specific group recursion are not - supported. - - 5. Because many paths through the tree may be active, the \K escape - sequence, which resets the start of the match when encountered (but may - be on some paths and not on others), is not supported. It causes an - error if encountered. - - 6. Callouts are supported, but the value of the capture_top field is - always 1, and the value of the capture_last field is always -1. - - 7. The \C escape sequence, which (in the standard algorithm) always - matches a single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is - not supported in these modes, because the alternative algorithm moves - through the subject string one character (not data unit) at a time, for - all active paths through the tree. - - 8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) - are not supported. (*FAIL) is supported, and behaves like a failing - negative assertion. - - -ADVANTAGES OF THE ALTERNATIVE ALGORITHM - - Using the alternative matching algorithm provides the following advan- - tages: - - 1. All possible matches (at a single point in the subject) are automat- - ically found, and in particular, the longest match is found. To find - more than one match using the standard algorithm, you have to do kludgy - things with callouts. - - 2. Because the alternative algorithm scans the subject string just - once, and never needs to backtrack (except for lookbehinds), it is pos- - sible to pass very long subject strings to the matching function in - several pieces, checking for partial matching each time. Although it is - possible to do multi-segment matching using the standard algorithm by - retaining partially matched substrings, it is more complicated. The - pcrepartial documentation gives details of partial matching and dis- - cusses multi-segment matching. 
- - -DISADVANTAGES OF THE ALTERNATIVE ALGORITHM - - The alternative algorithm suffers from a number of disadvantages: - - 1. It is substantially slower than the standard algorithm. This is - partly because it has to search for all possible matches, but is also - because it is less susceptible to optimization. - - 2. Capturing parentheses and back references are not supported. - - 3. Although atomic groups are supported, their use does not provide the - performance advantage that it does for the standard algorithm. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 12 November 2013 - Copyright (c) 1997-2012 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREAPI(3) Library Functions Manual PCREAPI(3) - - - -NAME - PCRE - Perl-compatible regular expressions - - #include <pcre.h> - - -PCRE NATIVE API BASIC FUNCTIONS - - pcre *pcre_compile(const char *pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre *pcre_compile2(const char *pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre_extra *pcre_study(const pcre *code, int options, - const char **errptr); - - void pcre_free_study(pcre_extra *extra); - - int pcre_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize); - - int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); - - -PCRE NATIVE API STRING EXTRACTION FUNCTIONS - - int pcre_copy_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - char *buffer, int buffersize); - - int pcre_copy_substring(const char *subject, int *ovector, - int
stringcount, int stringnumber, char *buffer, - int buffersize); - - int pcre_get_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - const char **stringptr); - - int pcre_get_stringnumber(const pcre *code, - const char *name); - - int pcre_get_stringtable_entries(const pcre *code, - const char *name, char **first, char **last); - - int pcre_get_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, - const char **stringptr); - - int pcre_get_substring_list(const char *subject, - int *ovector, int stringcount, const char ***listptr); - - void pcre_free_substring(const char *stringptr); - - void pcre_free_substring_list(const char **stringptr); - - -PCRE NATIVE API AUXILIARY FUNCTIONS - - int pcre_jit_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - pcre_jit_stack *jstack); - - pcre_jit_stack *pcre_jit_stack_alloc(int startsize, int maxsize); - - void pcre_jit_stack_free(pcre_jit_stack *stack); - - void pcre_assign_jit_stack(pcre_extra *extra, - pcre_jit_callback callback, void *data); - - const unsigned char *pcre_maketables(void); - - int pcre_fullinfo(const pcre *code, const pcre_extra *extra, - int what, void *where); - - int pcre_refcount(pcre *code, int adjust); - - int pcre_config(int what, void *where); - - const char *pcre_version(void); - - int pcre_pattern_to_host_byte_order(pcre *code, - pcre_extra *extra, const unsigned char *tables); - - -PCRE NATIVE API INDIRECTED FUNCTIONS - - void *(*pcre_malloc)(size_t); - - void (*pcre_free)(void *); - - void *(*pcre_stack_malloc)(size_t); - - void (*pcre_stack_free)(void *); - - int (*pcre_callout)(pcre_callout_block *); - - int (*pcre_stack_guard)(void); - - -PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES - - As well as support for 8-bit character strings, PCRE also supports - 16-bit strings (from release 8.30) and 32-bit strings (from 
release - 8.32), by means of two additional libraries. They can be built as well - as, or instead of, the 8-bit library. To avoid too much complication, - this document describes the 8-bit versions of the functions, with only - occasional references to the 16-bit and 32-bit libraries. - - The 16-bit and 32-bit functions operate in the same way as their 8-bit - counterparts; they just use different data types for their arguments - and results, and their names start with pcre16_ or pcre32_ instead of - pcre_. For every option that has UTF8 in its name (for example, - PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 - replaced by UTF16 or UTF32, respectively. This facility is in fact just - cosmetic; the 16-bit and 32-bit option names define the same bit val- - ues. - - References to bytes and UTF-8 in this document should be read as refer- - ences to 16-bit data units and UTF-16 when using the 16-bit library, or - 32-bit data units and UTF-32 when using the 32-bit library, unless - specified otherwise. More details of the specific differences for the - 16-bit and 32-bit libraries are given in the pcre16 and pcre32 pages. - - -PCRE API OVERVIEW - - PCRE has its own native API, which is described in this document. There - are also some wrapper functions (for the 8-bit library only) that cor- - respond to the POSIX regular expression API, but they do not give - access to all the functionality. They are described in the pcreposix - documentation. Both of these APIs define a set of C function calls. A - C++ wrapper (again for the 8-bit library only) is also distributed with - PCRE. It is documented in the pcrecpp page. - - The native API C function prototypes are defined in the header file - pcre.h, and on Unix-like systems the (8-bit) library itself is called - libpcre. It can normally be accessed by adding -lpcre to the command - for linking an application that uses PCRE. 
The header file defines the - macros PCRE_MAJOR and PCRE_MINOR to contain the major and minor release - numbers for the library. Applications can use these to include support - for different releases of PCRE. - - In a Windows environment, if you want to statically link an application - program against a non-dll pcre.a file, you must define PCRE_STATIC - before including pcre.h or pcrecpp.h, because otherwise the pcre_mal- - loc() and pcre_free() exported functions will be declared - __declspec(dllimport), with unwanted results. - - The functions pcre_compile(), pcre_compile2(), pcre_study(), and - pcre_exec() are used for compiling and matching regular expressions in - a Perl-compatible manner. A sample program that demonstrates the sim- - plest way of using them is provided in the file called pcredemo.c in - the PCRE source distribution. A listing of this program is given in the - pcredemo documentation, and the pcresample documentation describes how - to compile and run it. - - Just-in-time compiler support is an optional feature of PCRE that can - be built in appropriate hardware environments. It greatly speeds up the - matching performance of many patterns. Simple programs can easily - request that it be used if available, by setting an option that is - ignored when it is not relevant. More complicated programs might need - to make use of the functions pcre_jit_stack_alloc(), - pcre_jit_stack_free(), and pcre_assign_jit_stack() in order to control - the JIT code's memory usage. - - From release 8.32 there is also a direct interface for JIT execution, - which gives improved performance. The JIT-specific functions are dis- - cussed in the pcrejit documentation. - - A second matching function, pcre_dfa_exec(), which is not Perl-compati- - ble, is also provided. This uses a different algorithm for the match- - ing. 
The alternative algorithm finds all possible matches (at a given - point in the subject), and scans the subject just once (unless there - are lookbehind assertions). However, this algorithm does not return - captured substrings. A description of the two matching algorithms and - their advantages and disadvantages is given in the pcrematching docu- - mentation. - - In addition to the main compiling and matching functions, there are - convenience functions for extracting captured substrings from a subject - string that is matched by pcre_exec(). They are: - - pcre_copy_substring() - pcre_copy_named_substring() - pcre_get_substring() - pcre_get_named_substring() - pcre_get_substring_list() - pcre_get_stringnumber() - pcre_get_stringtable_entries() - - pcre_free_substring() and pcre_free_substring_list() are also provided, - to free the memory used for extracted strings. - - The function pcre_maketables() is used to build a set of character - tables in the current locale for passing to pcre_compile(), - pcre_exec(), or pcre_dfa_exec(). This is an optional facility that is - provided for specialist use. Most commonly, no special tables are - passed, in which case internal tables that are generated when PCRE is - built are used. - - The function pcre_fullinfo() is used to find out information about a - compiled pattern. The function pcre_version() returns a pointer to a - string containing the version of PCRE and its date of release. - - The function pcre_refcount() maintains a reference count in a data - block containing a compiled pattern. This is provided for the benefit - of object-oriented applications. - - The global variables pcre_malloc and pcre_free initially contain the - entry points of the standard malloc() and free() functions, respec- - tively. PCRE calls the memory management functions via these variables, - so a calling program can replace them if it wishes to intercept the - calls. This should be done before calling any PCRE functions. 
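The indirection through pcre_malloc and pcre_free described above can be sketched as follows. This is a minimal illustration only: demo_malloc and demo_free are stand-in globals declared here so that the fragment builds without the library, and counting_malloc, counting_free, live_blocks, and install_counting_allocator are hypothetical names, not part of PCRE. In a real program the assignments would target pcre_malloc and pcre_free after including pcre.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for PCRE's pcre_malloc and pcre_free globals (assumption:
   declared locally so the sketch builds without pcre.h). Like the real
   variables, they start out pointing at malloc() and free(). */
void *(*demo_malloc)(size_t) = malloc;
void (*demo_free)(void *) = free;

/* Hypothetical bookkeeping: number of blocks currently outstanding. */
long live_blocks = 0;

/* Replacement allocator that counts successful allocations. */
void *counting_malloc(size_t size)
{
    void *block = malloc(size);
    if (block != NULL)
        live_blocks++;
    return block;
}

/* Replacement deallocator that keeps the count in step. */
void counting_free(void *block)
{
    if (block != NULL) {
        assert(live_blocks > 0);
        live_blocks--;
    }
    free(block);
}

/* Install the replacements. As the text notes, this should happen
   before any library function is called. */
void install_counting_allocator(void)
{
    demo_malloc = counting_malloc;
    demo_free = counting_free;
}
```

After install_counting_allocator() has run, every allocation made through the function pointers passes through the counting wrappers, so live_blocks returns to zero once all blocks have been released.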
- - The global variables pcre_stack_malloc and pcre_stack_free are also - indirections to memory management functions. These special functions - are used only when PCRE is compiled to use the heap for remembering - data, instead of recursive function calls, when running the pcre_exec() - function. See the pcrebuild documentation for details of how to do - this. It is a non-standard way of building PCRE, for use in environ- - ments that have limited stacks. Because of the greater use of memory - management, it runs more slowly. Separate functions are provided so - that special-purpose external code can be used for this case. When - used, these functions always allocate memory blocks of the same size. - There is a discussion about PCRE's stack usage in the pcrestack docu- - mentation. - - The global variable pcre_callout initially contains NULL. It can be set - by the caller to a "callout" function, which PCRE will then call at - specified points during a matching operation. Details are given in the - pcrecallout documentation. - - The global variable pcre_stack_guard initially contains NULL. It can be - set by the caller to a function that is called by PCRE whenever it - starts to compile a parenthesized part of a pattern. When parentheses - are nested, PCRE uses recursive function calls, which use up the system - stack. This function is provided so that applications with restricted - stacks can force a compilation error if the stack runs out. The func- - tion should return zero if all is well, or non-zero to force an error. - - -NEWLINES - - PCRE supports five different conventions for indicating line breaks in - strings: a single CR (carriage return) character, a single LF (line- - feed) character, the two-character sequence CRLF, any of the three pre- - ceding, or any Unicode newline sequence. 
The Unicode newline sequences - are the three just mentioned, plus the single characters VT (vertical - tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line - separator, U+2028), and PS (paragraph separator, U+2029). - - Each of the first three conventions is used by at least one operating - system as its standard newline sequence. When PCRE is built, a default - can be specified. The default default is LF, which is the Unix stan- - dard. When PCRE is run, the default can be overridden, either when a - pattern is compiled, or when it is matched. - - At compile time, the newline convention can be specified by the options - argument of pcre_compile(), or it can be specified by special text at - the start of the pattern itself; this overrides any other settings. See - the pcrepattern page for details of the special character sequences. - - In the PCRE documentation the word "newline" is used to mean "the char- - acter or pair of characters that indicate a line break". The choice of - newline convention affects the handling of the dot, circumflex, and - dollar metacharacters, the handling of #-comments in /x mode, and, when - CRLF is a recognized line ending sequence, the match position advance- - ment for a non-anchored pattern. There is more detail about this in the - section on pcre_exec() options below. - - The choice of newline convention does not affect the interpretation of - the \n or \r escape sequences, nor does it affect what \R matches, - which is controlled in a similar way, but by separate options. - - -MULTITHREADING - - The PCRE functions can be used in multi-threading applications, with - the proviso that the memory management functions pointed to by - pcre_malloc, pcre_free, pcre_stack_malloc, and pcre_stack_free, and the - callout and stack-checking functions pointed to by pcre_callout and - pcre_stack_guard, are shared by all threads. 
- - The compiled form of a regular expression is not altered during match- - ing, so the same compiled pattern can safely be used by several threads - at once. - - If the just-in-time optimization feature is being used, it needs sepa- - rate memory stack areas for each thread. See the pcrejit documentation - for more details. - - -SAVING PRECOMPILED PATTERNS FOR LATER USE - - The compiled form of a regular expression can be saved and re-used at a - later time, possibly by a different program, and even on a host other - than the one on which it was compiled. Details are given in the - pcreprecompile documentation, which includes a description of the - pcre_pattern_to_host_byte_order() function. However, compiling a regu- - lar expression with one version of PCRE for use with a different ver- - sion is not guaranteed to work and may cause crashes. - - -CHECKING BUILD-TIME OPTIONS - - int pcre_config(int what, void *where); - - The function pcre_config() makes it possible for a PCRE client to dis- - cover which optional features have been compiled into the PCRE library. - The pcrebuild documentation has more details about these optional fea- - tures. - - The first argument for pcre_config() is an integer, specifying which - information is required; the second argument is a pointer to a variable - into which the information is placed. The returned value is zero on - success, or the negative error code PCRE_ERROR_BADOPTION if the value - in the first argument is not recognized. The following information is - available: - - PCRE_CONFIG_UTF8 - - The output is an integer that is set to one if UTF-8 support is avail- - able; otherwise it is set to zero. This value should normally be given - to the 8-bit version of this function, pcre_config(). If it is given to - the 16-bit or 32-bit version of this function, the result is - PCRE_ERROR_BADOPTION. 
- - PCRE_CONFIG_UTF16 - - The output is an integer that is set to one if UTF-16 support is avail- - able; otherwise it is set to zero. This value should normally be given - to the 16-bit version of this function, pcre16_config(). If it is given - to the 8-bit or 32-bit version of this function, the result is - PCRE_ERROR_BADOPTION. - - PCRE_CONFIG_UTF32 - - The output is an integer that is set to one if UTF-32 support is avail- - able; otherwise it is set to zero. This value should normally be given - to the 32-bit version of this function, pcre32_config(). If it is given - to the 8-bit or 16-bit version of this function, the result is - PCRE_ERROR_BADOPTION. - - PCRE_CONFIG_UNICODE_PROPERTIES - - The output is an integer that is set to one if support for Unicode - character properties is available; otherwise it is set to zero. - - PCRE_CONFIG_JIT - - The output is an integer that is set to one if support for just-in-time - compiling is available; otherwise it is set to zero. - - PCRE_CONFIG_JITTARGET - - The output is a pointer to a zero-terminated "const char *" string. If - JIT support is available, the string contains the name of the architec- - ture for which the JIT compiler is configured, for example "x86 32bit - (little endian + unaligned)". If JIT support is not available, the - result is NULL. - - PCRE_CONFIG_NEWLINE - - The output is an integer whose value specifies the default character - sequence that is recognized as meaning "newline". The values that are - supported in ASCII/Unicode environments are: 10 for LF, 13 for CR, 3338 - for CRLF, -2 for ANYCRLF, and -1 for ANY. In EBCDIC environments, CR, - ANYCRLF, and ANY yield the same values. However, the value for LF is - normally 21, though some EBCDIC environments use 37. The corresponding - values for CRLF are 3349 and 3365. The default should normally corre- - spond to the standard sequence for your operating system. 
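The two-character values listed above are consistent with the first character code being held in the high byte and the second in the low byte (13 * 256 + 10 = 3338, 13 * 256 + 21 = 3349, 13 * 256 + 37 = 3365). The sketch below decodes a value on that assumption. The helper names newline_length, newline_first, and newline_second are hypothetical and not part of the PCRE API; the value is assumed to have come from pcre_config(PCRE_CONFIG_NEWLINE, &value).

```c
#include <assert.h>

/* Hypothetical helpers, not part of PCRE. They assume "value" is the
   integer reported for PCRE_CONFIG_NEWLINE. */

/* Number of characters in a fixed newline sequence, or 0 for the
   ANY (-1) and ANYCRLF (-2) conventions, which have no single sequence. */
int newline_length(int value)
{
    assert(value >= -2);
    if (value < 0)
        return 0;
    return (value > 255) ? 2 : 1;
}

/* First character code of a fixed newline sequence. */
int newline_first(int value)
{
    assert(value >= 0);
    return (value > 255) ? (value >> 8) & 0xff : value;
}

/* Second character code of a two-character sequence, or -1 if the
   sequence has only one character. */
int newline_second(int value)
{
    assert(value >= 0);
    return (value > 255) ? value & 0xff : -1;
}
```

For the ASCII CRLF value 3338, newline_first() gives 13 (CR) and newline_second() gives 10 (LF); for the EBCDIC value 3349 the second character is 21.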
- - PCRE_CONFIG_BSR - - The output is an integer whose value indicates what character sequences - the \R escape sequence matches by default. A value of 0 means that \R - matches any Unicode line ending sequence; a value of 1 means that \R - matches only CR, LF, or CRLF. The default can be overridden when a pat- - tern is compiled or matched. - - PCRE_CONFIG_LINK_SIZE - - The output is an integer that contains the number of bytes used for - internal linkage in compiled regular expressions. For the 8-bit - library, the value can be 2, 3, or 4. For the 16-bit library, the value - is either 2 or 4 and is still a number of bytes. For the 32-bit - library, the value is either 2 or 4 and is still a number of bytes. The - default value of 2 is sufficient for all but the most massive patterns, - since it allows the compiled pattern to be up to 64K in size. Larger - values allow larger regular expressions to be compiled, at the expense - of slower matching. - - PCRE_CONFIG_POSIX_MALLOC_THRESHOLD - - The output is an integer that contains the threshold above which the - POSIX interface uses malloc() for output vectors. Further details are - given in the pcreposix documentation. - - PCRE_CONFIG_PARENS_LIMIT - - The output is a long integer that gives the maximum depth of nesting of - parentheses (of any kind) in a pattern. This limit is imposed to cap - the amount of system stack used when a pattern is compiled. It is spec- - ified when PCRE is built; the default is 250. This limit does not take - into account the stack that may already be used by the calling applica- - tion. For finer control over compilation stack usage, you can set a - pointer to an external checking function in pcre_stack_guard. - - PCRE_CONFIG_MATCH_LIMIT - - The output is a long integer that gives the default limit for the num- - ber of internal matching function calls in a pcre_exec() execution. - Further details are given with pcre_exec() below. 
- - PCRE_CONFIG_MATCH_LIMIT_RECURSION - - The output is a long integer that gives the default limit for the depth - of recursion when calling the internal matching function in a - pcre_exec() execution. Further details are given with pcre_exec() - below. - - PCRE_CONFIG_STACKRECURSE - - The output is an integer that is set to one if internal recursion when - running pcre_exec() is implemented by recursive function calls that use - the stack to remember their state. This is the usual way that PCRE is - compiled. The output is zero if PCRE was compiled to use blocks of data - on the heap instead of recursive function calls. In this case, - pcre_stack_malloc and pcre_stack_free are called to manage memory - blocks on the heap, thus avoiding the use of the stack. - - -COMPILING A PATTERN - - pcre *pcre_compile(const char *pattern, int options, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - pcre *pcre_compile2(const char *pattern, int options, - int *errorcodeptr, - const char **errptr, int *erroffset, - const unsigned char *tableptr); - - Either of the functions pcre_compile() or pcre_compile2() can be called - to compile a pattern into an internal form. The only difference between - the two interfaces is that pcre_compile2() has an additional argument, - errorcodeptr, via which a numerical error code can be returned. To - avoid too much repetition, we refer just to pcre_compile() below, but - the information applies equally to pcre_compile2(). - - The pattern is a C string terminated by a binary zero, and is passed in - the pattern argument. A pointer to a single block of memory that is - obtained via pcre_malloc is returned. This contains the compiled code - and related data. The pcre type is defined for the returned block; this - is a typedef for a structure whose contents are not externally defined. - It is up to the caller to free the memory (via pcre_free) when it is no - longer required. 
- - Although the compiled code of a PCRE regex is relocatable, that is, it - does not depend on memory location, the complete pcre data block is not - fully relocatable, because it may contain a copy of the tableptr argu- - ment, which is an address (see below). - - The options argument contains various bit settings that affect the com- - pilation. It should be zero if no options are required. The available - options are described below. Some of them (in particular, those that - are compatible with Perl, but some others as well) can also be set and - unset from within the pattern (see the detailed description in the - pcrepattern documentation). For those options that can be different in - different parts of the pattern, the contents of the options argument - specifies their settings at the start of compilation and execution. The - PCRE_ANCHORED, PCRE_BSR_xxx, PCRE_NEWLINE_xxx, PCRE_NO_UTF8_CHECK, and - PCRE_NO_START_OPTIMIZE options can be set at the time of matching as - well as at compile time. - - If errptr is NULL, pcre_compile() returns NULL immediately. Otherwise, - if compilation of a pattern fails, pcre_compile() returns NULL, and - sets the variable pointed to by errptr to point to a textual error mes- - sage. This is a static string that is part of the library. You must not - try to free it. Normally, the offset from the start of the pattern to - the data unit that was being processed when the error was discovered is - placed in the variable pointed to by erroffset, which must not be NULL - (if it is, an immediate error is given). However, for an invalid UTF-8 - or UTF-16 string, the offset is that of the first data unit of the - failing character. - - Some errors are not detected until the whole pattern has been scanned; - in these cases, the offset passed back is the length of the pattern. - Note that the offset is in data units, not characters, even in a UTF - mode. It may sometimes point into the middle of a UTF-8 or UTF-16 char- - acter. 
- - If pcre_compile2() is used instead of pcre_compile(), and the error- - codeptr argument is not NULL, a non-zero error code number is returned - via this argument in the event of an error. This is in addition to the - textual error message. Error codes and messages are listed below. - - If the final argument, tableptr, is NULL, PCRE uses a default set of - character tables that are built when PCRE is compiled, using the - default C locale. Otherwise, tableptr must be an address that is the - result of a call to pcre_maketables(). This value is stored with the - compiled pattern, and used again by pcre_exec() and pcre_dfa_exec() - when the pattern is matched. For more discussion, see the section on - locale support below. - - This code fragment shows a typical straightforward call to pcre_com- - pile(): - - pcre *re; - const char *error; - int erroffset; - re = pcre_compile( - "^A.*Z", /* the pattern */ - 0, /* default options */ - &error, /* for error message */ - &erroffset, /* for error offset */ - NULL); /* use default character tables */ - - The following names for option bits are defined in the pcre.h header - file: - - PCRE_ANCHORED - - If this bit is set, the pattern is forced to be "anchored", that is, it - is constrained to match only at the first matching point in the string - that is being searched (the "subject string"). This effect can also be - achieved by appropriate constructs in the pattern itself, which is the - only way to do it in Perl. - - PCRE_AUTO_CALLOUT - - If this bit is set, pcre_compile() automatically inserts callout items, - all with number 255, before each pattern item. For discussion of the - callout facility, see the pcrecallout documentation. - - PCRE_BSR_ANYCRLF - PCRE_BSR_UNICODE - - These options (which are mutually exclusive) control what the \R escape - sequence matches. The choice is either to match only CR, LF, or CRLF, - or to match any Unicode newline sequence. The default is specified when - PCRE is built. 
It can be overridden from within the pattern, or by set- - ting an option when a compiled pattern is matched. - - PCRE_CASELESS - - If this bit is set, letters in the pattern match both upper and lower - case letters. It is equivalent to Perl's /i option, and it can be - changed within a pattern by a (?i) option setting. In UTF-8 mode, PCRE - always understands the concept of case for characters whose values are - less than 128, so caseless matching is always possible. For characters - with higher values, the concept of case is supported if PCRE is com- - piled with Unicode property support, but not otherwise. If you want to - use caseless matching for characters 128 and above, you must ensure - that PCRE is compiled with Unicode property support as well as with - UTF-8 support. - - PCRE_DOLLAR_ENDONLY - - If this bit is set, a dollar metacharacter in the pattern matches only - at the end of the subject string. Without this option, a dollar also - matches immediately before a newline at the end of the string (but not - before any other newlines). The PCRE_DOLLAR_ENDONLY option is ignored - if PCRE_MULTILINE is set. There is no equivalent to this option in - Perl, and no way to set it within a pattern. - - PCRE_DOTALL - - If this bit is set, a dot metacharacter in the pattern matches a char- - acter of any value, including one that indicates a newline. However, it - only ever matches one character, even if newlines are coded as CRLF. - Without this option, a dot does not match when the current position is - at a newline. This option is equivalent to Perl's /s option, and it can - be changed within a pattern by a (?s) option setting. A negative class - such as [^a] always matches newline characters, independent of the set- - ting of this option. - - PCRE_DUPNAMES - - If this bit is set, names used to identify capturing subpatterns need - not be unique. 
This can be helpful for certain types of pattern when it - is known that only one instance of the named subpattern can ever be - matched. There are more details of named subpatterns below; see also - the pcrepattern documentation. - - PCRE_EXTENDED - - If this bit is set, most white space characters in the pattern are - totally ignored except when escaped or inside a character class. How- - ever, white space is not allowed within sequences such as (?> that - introduce various parenthesized subpatterns, nor within a numerical - quantifier such as {1,3}. However, ignorable white space is permitted - between an item and a following quantifier and between a quantifier and - a following + that indicates possessiveness. - - White space formerly did not include the VT character (code 11), because - Perl did not treat this character as white space. However, Perl changed - at release 5.18, so PCRE followed at release 8.34, and VT is now - treated as white space. - - PCRE_EXTENDED also causes characters between an unescaped # outside a - character class and the next newline, inclusive, to be ignored. - PCRE_EXTENDED is equivalent to Perl's /x option, and it can be changed - within a pattern by a (?x) option setting. - - Which characters are interpreted as newlines is controlled by the - options passed to pcre_compile() or by a special sequence at the start - of the pattern, as described in the section entitled "Newline conven- - tions" in the pcrepattern documentation. Note that the end of this type - of comment is a literal newline sequence in the pattern; escape - sequences that happen to represent a newline do not count. - - This option makes it possible to include comments inside complicated - patterns. Note, however, that this applies only to data characters. - White space characters may never appear within special character - sequences in a pattern, for example within the sequence (?( that intro- - duces a conditional subpattern.
- - PCRE_EXTRA - - This option was invented in order to turn on additional functionality - of PCRE that is incompatible with Perl, but it is currently of very - little use. When set, any backslash in a pattern that is followed by a - letter that has no special meaning causes an error, thus reserving - these combinations for future expansion. By default, as in Perl, a - backslash followed by a letter with no special meaning is treated as a - literal. (Perl can, however, be persuaded to give an error for this, by - running it with the -w option.) There are at present no other features - controlled by this option. It can also be set by a (?X) option setting - within a pattern. - - PCRE_FIRSTLINE - - If this option is set, an unanchored pattern is required to match - before or at the first newline in the subject string, though the - matched text may continue over the newline. - - PCRE_JAVASCRIPT_COMPAT - - If this option is set, PCRE's behaviour is changed in some ways so that - it is compatible with JavaScript rather than Perl. The changes are as - follows: - - (1) A lone closing square bracket in a pattern causes a compile-time - error, because this is illegal in JavaScript (by default it is treated - as a data character). Thus, the pattern AB]CD becomes illegal when this - option is set. - - (2) At run time, a back reference to an unset subpattern group matches - an empty string (by default this causes the current matching alterna- - tive to fail). A pattern such as (\1)(a) succeeds when this option is - set (assuming it can find an "a" in the subject), whereas it fails by - default, for Perl compatibility. - - (3) \U matches an upper case "U" character; by default \U causes a com- - pile time error (Perl uses \U to upper case subsequent characters). - - (4) \u matches a lower case "u" character unless it is followed by four - hexadecimal digits, in which case the hexadecimal number defines the - code point to match. 
By default, \u causes a compile time error (Perl - uses it to upper case the following character). - - (5) \x matches a lower case "x" character unless it is followed by two - hexadecimal digits, in which case the hexadecimal number defines the - code point to match. By default, as in Perl, a hexadecimal number is - always expected after \x, but it may have zero, one, or two digits (so, - for example, \xz matches a binary zero character followed by z). - - PCRE_MULTILINE - - By default, for the purposes of matching "start of line" and "end of - line", PCRE treats the subject string as consisting of a single line of - characters, even if it actually contains newlines. The "start of line" - metacharacter (^) matches only at the start of the string, and the "end - of line" metacharacter ($) matches only at the end of the string, or - before a terminating newline (except when PCRE_DOLLAR_ENDONLY is set). - Note, however, that unless PCRE_DOTALL is set, the "any character" - metacharacter (.) does not match at a newline. This behaviour (for ^, - $, and dot) is the same as Perl. - - When PCRE_MULTILINE is set, the "start of line" and "end of line" - constructs match immediately following or immediately before internal - newlines in the subject string, respectively, as well as at the very - start and end. This is equivalent to Perl's /m option, and it can be - changed within a pattern by a (?m) option setting. If there are no new- - lines in a subject string, or no occurrences of ^ or $ in a pattern, - setting PCRE_MULTILINE has no effect. - - PCRE_NEVER_UTF - - This option locks out interpretation of the pattern as UTF-8 (or UTF-16 - or UTF-32 in the 16-bit and 32-bit libraries). In particular, it pre- - vents the creator of the pattern from switching to UTF interpretation - by starting the pattern with (*UTF). This may be useful in applications - that process patterns from external sources. The combination of - PCRE_UTF8 and PCRE_NEVER_UTF also causes an error.
- - PCRE_NEWLINE_CR - PCRE_NEWLINE_LF - PCRE_NEWLINE_CRLF - PCRE_NEWLINE_ANYCRLF - PCRE_NEWLINE_ANY - - These options override the default newline definition that was chosen - when PCRE was built. Setting the first or the second specifies that a - newline is indicated by a single character (CR or LF, respectively). - Setting PCRE_NEWLINE_CRLF specifies that a newline is indicated by the - two-character CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies - that any of the three preceding sequences should be recognized. Setting - PCRE_NEWLINE_ANY specifies that any Unicode newline sequence should be - recognized. - - In an ASCII/Unicode environment, the Unicode newline sequences are the - three just mentioned, plus the single characters VT (vertical tab, - U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line sep- - arator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit - library, the last two are recognized only in UTF-8 mode. - - When PCRE is compiled to run in an EBCDIC (mainframe) environment, the - code for CR is 0x0d, the same as ASCII. However, the character code for - LF is normally 0x15, though in some EBCDIC environments 0x25 is used. - Whichever of these is not LF is made to correspond to Unicode's NEL - character. EBCDIC codes are all less than 256. For more details, see - the pcrebuild documentation. - - The newline setting in the options word uses three bits that are - treated as a number, giving eight possibilities. Currently only six are - used (default plus the five values above). This means that if you set - more than one newline option, the combination may or may not be sensi- - ble. For example, PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to - PCRE_NEWLINE_CRLF, but other combinations may yield unused numbers and - cause an error. - - The only time that a line break in a pattern is specially recognized - when compiling is when PCRE_EXTENDED is set. 
CR and LF are white space - characters, and so are ignored in this mode. Also, an unescaped # out- - side a character class indicates a comment that lasts until after the - next line break sequence. In other circumstances, line break sequences - in patterns are treated as literal data. - - The newline option that is set at compile time becomes the default that - is used for pcre_exec() and pcre_dfa_exec(), but it can be overridden. - - PCRE_NO_AUTO_CAPTURE - - If this option is set, it disables the use of numbered capturing paren- - theses in the pattern. Any opening parenthesis that is not followed by - ? behaves as if it were followed by ?: but named parentheses can still - be used for capturing (and they acquire numbers in the usual way). - There is no equivalent of this option in Perl. - - PCRE_NO_AUTO_POSSESS - - If this option is set, it disables "auto-possessification". This is an - optimization that, for example, turns a+b into a++b in order to avoid - backtracks into a+ that can never be successful. However, if callouts - are in use, auto-possessification means that some of them are never - taken. You can set this option if you want the matching functions to do - a full unoptimized search and run all the callouts, but it is mainly - provided for testing purposes. - - PCRE_NO_START_OPTIMIZE - - This is an option that acts at matching time; that is, it is really an - option for pcre_exec() or pcre_dfa_exec(). If it is set at compile - time, it is remembered with the compiled pattern and assumed at match- - ing time. This is necessary if you want to use JIT execution, because - the JIT compiler needs to know whether or not this option is set. For - details see the discussion of PCRE_NO_START_OPTIMIZE below. - - PCRE_UCP - - This option changes the way PCRE processes \B, \b, \D, \d, \S, \s, \W, - \w, and some of the POSIX character classes. 
By default, only ASCII - characters are recognized, but if PCRE_UCP is set, Unicode properties - are used instead to classify characters. More details are given in the - section on generic character types in the pcrepattern page. If you set - PCRE_UCP, matching one of the items it affects takes much longer. The - option is available only if PCRE has been compiled with Unicode prop- - erty support. - - PCRE_UNGREEDY - - This option inverts the "greediness" of the quantifiers so that they - are not greedy by default, but become greedy if followed by "?". It is - not compatible with Perl. It can also be set by a (?U) option setting - within the pattern. - - PCRE_UTF8 - - This option causes PCRE to regard both the pattern and the subject as - strings of UTF-8 characters instead of single-byte strings. However, it - is available only when PCRE is built to include UTF support. If not, - the use of this option provokes an error. Details of how this option - changes the behaviour of PCRE are given in the pcreunicode page. - - PCRE_NO_UTF8_CHECK - - When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is - automatically checked. There is a discussion about the validity of - UTF-8 strings in the pcreunicode page. If an invalid UTF-8 sequence is - found, pcre_compile() returns an error. If you already know that your - pattern is valid, and you want to skip this check for performance rea- - sons, you can set the PCRE_NO_UTF8_CHECK option. When it is set, the - effect of passing an invalid UTF-8 string as a pattern is undefined. It - may cause your program to crash or loop. Note that this option can also - be passed to pcre_exec() and pcre_dfa_exec(), to suppress the validity - checking of subject strings only. If the same string is being matched - many times, the option can be safely set for the second and subsequent - matchings to improve performance. 
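The newline options described earlier in this section occupy three bits that are read as one number, which is why some combinations alias a valid setting and others do not. The following is a minimal sketch of that arithmetic, not PCRE's own code. The NL_* names and both helper functions are hypothetical; the numeric values are assumed to match the PCRE_NEWLINE_* macros in pcre.h for PCRE 8.x and should be checked against the installed header.

```c
/* Hypothetical stand-ins for the PCRE_NEWLINE_* macros, redefined here only
   so that the sketch compiles without the library. The values are assumed
   to match pcre.h in PCRE 8.x; real code should use the header's macros. */
enum {
    NL_CR      = 0x00100000,
    NL_LF      = 0x00200000,
    NL_CRLF    = 0x00300000,
    NL_ANY     = 0x00400000,
    NL_ANYCRLF = 0x00500000,
    NL_MASK    = 0x00700000   /* the three bits that are read as one number */
};

/* Extract the newline field from an options word. */
unsigned long newline_bits(unsigned long options)
{
    return options & NL_MASK;
}

/* Return 1 if the newline field is one of the six numbers in use
   (default plus the five named values), otherwise 0. */
int newline_bits_valid(unsigned long options)
{
    switch (newline_bits(options)) {
    case 0:           /* default chosen when PCRE was built */
    case NL_CR:
    case NL_LF:
    case NL_CRLF:
    case NL_ANY:
    case NL_ANYCRLF:
        return 1;
    default:          /* an unused number */
        return 0;
    }
}
```

Under these assumed values, NL_CR | NL_LF yields the same number as NL_CRLF, matching the equivalence stated in the text, while NL_LF | NL_ANY yields an unused number.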
- - -COMPILATION ERROR CODES - - The following table lists the error codes that may be returned by - pcre_compile2(), along with the error messages that may be returned by - both compiling functions. Note that error messages are always 8-bit - ASCII strings, even in 16-bit or 32-bit mode. As PCRE has developed, - some error codes have fallen out of use. To avoid confusion, they have - not been re-used. - - 0 no error - 1 \ at end of pattern - 2 \c at end of pattern - 3 unrecognized character follows \ - 4 numbers out of order in {} quantifier - 5 number too big in {} quantifier - 6 missing terminating ] for character class - 7 invalid escape sequence in character class - 8 range out of order in character class - 9 nothing to repeat - 10 [this code is not in use] - 11 internal error: unexpected repeat - 12 unrecognized character after (? or (?- - 13 POSIX named classes are supported only within a class - 14 missing ) - 15 reference to non-existent subpattern - 16 erroffset passed as NULL - 17 unknown option bit(s) set - 18 missing ) after comment - 19 [this code is not in use] - 20 regular expression is too large - 21 failed to get memory - 22 unmatched parentheses - 23 internal error: code overflow - 24 unrecognized character after (?< - 25 lookbehind assertion is not fixed length - 26 malformed number or name after (?( - 27 conditional group contains more than two branches - 28 assertion expected after (?( - 29 (?R or (?[+-]digits must be followed by ) - 30 unknown POSIX class name - 31 POSIX collating elements are not supported - 32 this version of PCRE is compiled without UTF support - 33 [this code is not in use] - 34 character value in \x{} or \o{} is too large - 35 invalid condition (?(0) - 36 \C not allowed in lookbehind assertion - 37 PCRE does not support \L, \l, \N{name}, \U, or \u - 38 number after (?C is > 255 - 39 closing ) for (?C expected - 40 recursive call could loop indefinitely - 41 unrecognized character after (?P - 42 syntax error in subpattern
name (missing terminator) - 43 two named subpatterns have the same name - 44 invalid UTF-8 string (specifically UTF-8) - 45 support for \P, \p, and \X has not been compiled - 46 malformed \P or \p sequence - 47 unknown property name after \P or \p - 48 subpattern name is too long (maximum 32 characters) - 49 too many named subpatterns (maximum 10000) - 50 [this code is not in use] - 51 octal value is greater than \377 in 8-bit non-UTF-8 mode - 52 internal error: overran compiling workspace - 53 internal error: previously-checked referenced subpattern - not found - 54 DEFINE group contains more than one branch - 55 repeating a DEFINE group is not allowed - 56 inconsistent NEWLINE options - 57 \g is not followed by a braced, angle-bracketed, or quoted - name/number or by a plain number - 58 a numbered reference must not be zero - 59 an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) - 60 (*VERB) not recognized or malformed - 61 number is too big - 62 subpattern name expected - 63 digit expected after (?+ - 64 ] is an invalid data character in JavaScript compatibility mode - 65 different names for subpatterns of the same number are - not allowed - 66 (*MARK) must have an argument - 67 this version of PCRE is not compiled with Unicode property - support - 68 \c must be followed by an ASCII character - 69 \k is not followed by a braced, angle-bracketed, or quoted name - 70 internal error: unknown opcode in find_fixedlength() - 71 \N is not supported in a class - 72 too many forward references - 73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff) - 74 invalid UTF-16 string (specifically UTF-16) - 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) - 76 character value in \u.... sequence is too large - 77 invalid UTF-32 string (specifically UTF-32) - 78 setting UTF is disabled by the application - 79 non-hex character in \x{} (closing brace missing?) - 80 non-octal character in \o{} (closing brace missing?) 
- 81 missing opening brace after \o - 82 parentheses are too deeply nested - 83 invalid range in character class - 84 group name must start with a non-digit - 85 parentheses are too deeply nested (stack check) - - The numbers 32 and 10000 in errors 48 and 49 are defaults; different - values may be used if the limits were changed when PCRE was built. - - -STUDYING A PATTERN - - pcre_extra *pcre_study(const pcre *code, int options, - const char **errptr); - - If a compiled pattern is going to be used several times, it is worth - spending more time analyzing it in order to speed up the time taken for - matching. The function pcre_study() takes a pointer to a compiled pat- - tern as its first argument. If studying the pattern produces additional - information that will help speed up matching, pcre_study() returns a - pointer to a pcre_extra block, in which the study_data field points to - the results of the study. - - The returned value from pcre_study() can be passed directly to - pcre_exec() or pcre_dfa_exec(). However, a pcre_extra block also con- - tains other fields that can be set by the caller before the block is - passed; these are described below in the section on matching a pattern. - - If studying the pattern does not produce any useful information, - pcre_study() returns NULL by default. In that circumstance, if the - calling program wants to pass any of the other fields to pcre_exec() or - pcre_dfa_exec(), it must set up its own pcre_extra block. However, if - pcre_study() is called with the PCRE_STUDY_EXTRA_NEEDED option, it - returns a pcre_extra block even if studying did not find any additional - information. It may still return NULL, however, if an error occurs in - pcre_study(). - - The second argument of pcre_study() contains option bits. 
There are - three further options in addition to PCRE_STUDY_EXTRA_NEEDED: - - PCRE_STUDY_JIT_COMPILE - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE - - If any of these are set, and the just-in-time compiler is available, - the pattern is further compiled into machine code that executes much - faster than the pcre_exec() interpretive matching function. If the - just-in-time compiler is not available, these options are ignored. All - undefined bits in the options argument must be zero. - - JIT compilation is a heavyweight optimization. It can take some time - for patterns to be analyzed, and for one-off matches and simple pat- - terns the benefit of faster execution might be offset by a much slower - study time. Not all patterns can be optimized by the JIT compiler. For - those that cannot be handled, matching automatically falls back to the - pcre_exec() interpreter. For more details, see the pcrejit documenta- - tion. - - The third argument for pcre_study() is a pointer for an error message. - If studying succeeds (even if no data is returned), the variable it - points to is set to NULL. Otherwise it is set to point to a textual - error message. This is a static string that is part of the library. You - must not try to free it. You should test the error pointer for NULL - after calling pcre_study(), to be sure that it has run successfully. - - When you are finished with a pattern, you can free the memory used for - the study data by calling pcre_free_study(). This function was added to - the API for release 8.20. For earlier versions, the memory could be - freed with pcre_free(), just like the pattern itself. This will still - work in cases where JIT optimization is not used, but it is advisable - to change to the new function when convenient. 
- - This is a typical way in which pcre_study() is used (except that in a - real application there should be tests for errors): - - int rc; - pcre *re; - pcre_extra *sd; - re = pcre_compile("pattern", 0, &error, &erroroffset, NULL); - sd = pcre_study( - re, /* result of pcre_compile() */ - 0, /* no options */ - &error); /* set to NULL or points to a message */ - rc = pcre_exec( /* see below for details of pcre_exec() options */ - re, sd, "subject", 7, 0, 0, ovector, 30); - ... - pcre_free_study(sd); - pcre_free(re); - - Studying a pattern does two things: first, a lower bound for the length - of subject string that is needed to match the pattern is computed. This - does not mean that there are any strings of that length that match, but - it does guarantee that no shorter strings match. The value is used to - avoid wasting time by trying to match strings that are shorter than the - lower bound. You can find out the value in a calling program via the - pcre_fullinfo() function. - - Studying a pattern is also useful for non-anchored patterns that do not - have a single fixed starting character. A bitmap of possible starting - bytes is created. This speeds up finding a position in the subject at - which to start matching. (In 16-bit mode, the bitmap is used for 16-bit - values less than 256. In 32-bit mode, the bitmap is used for 32-bit - values less than 256.) - - These two optimizations apply to both pcre_exec() and pcre_dfa_exec(), - and the information is also used by the JIT compiler. The optimiza- - tions can be disabled by setting the PCRE_NO_START_OPTIMIZE option. - You might want to do this if your pattern contains callouts or (*MARK) - and you want to make use of these facilities in cases where matching - fails. - - PCRE_NO_START_OPTIMIZE can be specified at either compile time or exe- - cution time. However, if PCRE_NO_START_OPTIMIZE is passed to - pcre_exec(), (that is, after any JIT compilation has happened) JIT exe- - cution is disabled. 
For JIT execution to work with PCRE_NO_START_OPTI- - MIZE, the option must be set at compile time. - - There is a longer discussion of PCRE_NO_START_OPTIMIZE below. - - -LOCALE SUPPORT - - PCRE handles caseless matching, and determines whether characters are - letters, digits, or whatever, by reference to a set of tables, indexed - by character code point. When running in UTF-8 mode, or in the 16- or - 32-bit libraries, this applies only to characters with code points less - than 256. By default, higher-valued code points never match escapes - such as \w or \d. However, if PCRE is built with Unicode property sup- - port, all characters can be tested with \p and \P, or, alternatively, - the PCRE_UCP option can be set when a pattern is compiled; this causes - \w and friends to use Unicode property support instead of the built-in - tables. - - The use of locales with Unicode is discouraged. If you are handling - characters with code points greater than 128, you should either use - Unicode support, or use locales, but not try to mix the two. - - PCRE contains an internal set of tables that are used when the final - argument of pcre_compile() is NULL. These are sufficient for many - applications. Normally, the internal tables recognize only ASCII char- - acters. However, when PCRE is built, it is possible to cause the inter- - nal tables to be rebuilt in the default "C" locale of the local system, - which may cause them to be different. - - The internal tables can always be overridden by tables supplied by the - application that calls PCRE. These may be created in a different locale - from the default. As more and more applications change to using Uni- - code, the need for this locale support is expected to die away. - - External tables are built by calling the pcre_maketables() function, - which has no arguments, in the relevant locale. The result can then be - passed to pcre_compile() as often as necessary. 
For example, to build - and use tables that are appropriate for the French locale (where - accented characters with values greater than 128 are treated as let- - ters), the following code could be used: - - setlocale(LC_CTYPE, "fr_FR"); - tables = pcre_maketables(); - re = pcre_compile(..., tables); - - The locale name "fr_FR" is used on Linux and other Unix-like systems; - if you are using Windows, the name for the French locale is "french". - - When pcre_maketables() runs, the tables are built in memory that is - obtained via pcre_malloc. It is the caller's responsibility to ensure - that the memory containing the tables remains available for as long as - it is needed. - - The pointer that is passed to pcre_compile() is saved with the compiled - pattern, and the same tables are used via this pointer by pcre_study() - and also by pcre_exec() and pcre_dfa_exec(). Thus, for any single pat- - tern, compilation, studying and matching all happen in the same locale, - but different patterns can be processed in different locales. - - It is possible to pass a table pointer or NULL (indicating the use of - the internal tables) to pcre_exec() or pcre_dfa_exec() (see the discus- - sion below in the section on matching a pattern). This facility is pro- - vided for use with pre-compiled patterns that have been saved and - reloaded. Character tables are not saved with patterns, so if a non- - standard table was used at compile time, it must be provided again when - the reloaded pattern is matched. Attempting to use this facility to - match a pattern in a different locale from the one in which it was com- - piled is likely to lead to anomalous (usually incorrect) results. - - -INFORMATION ABOUT A PATTERN - - int pcre_fullinfo(const pcre *code, const pcre_extra *extra, - int what, void *where); - - The pcre_fullinfo() function returns information about a compiled pat- - tern. 
It replaces the pcre_info() function, which was removed from the - library at version 8.30, after more than 10 years of obsolescence. - - The first argument for pcre_fullinfo() is a pointer to the compiled - pattern. The second argument is the result of pcre_study(), or NULL if - the pattern was not studied. The third argument specifies which piece - of information is required, and the fourth argument is a pointer to a - variable to receive the data. The yield of the function is zero for - success, or one of the following negative numbers: - - PCRE_ERROR_NULL the argument code was NULL - the argument where was NULL - PCRE_ERROR_BADMAGIC the "magic number" was not found - PCRE_ERROR_BADENDIANNESS the pattern was compiled with different - endianness - PCRE_ERROR_BADOPTION the value of what was invalid - PCRE_ERROR_UNSET the requested field is not set - - The "magic number" is placed at the start of each compiled pattern as - a simple check against passing an arbitrary memory pointer. The endi- - anness error can occur if a compiled pattern is saved and reloaded on a - different host. Here is a typical call of pcre_fullinfo(), to obtain - the length of the compiled pattern: - - int rc; - size_t length; - rc = pcre_fullinfo( - re, /* result of pcre_compile() */ - sd, /* result of pcre_study(), or NULL */ - PCRE_INFO_SIZE, /* what is required */ - &length); /* where to put the data */ - - The possible values for the third argument are defined in pcre.h, and - are as follows: - - PCRE_INFO_BACKREFMAX - - Return the number of the highest back reference in the pattern. The - fourth argument should point to an int variable. Zero is returned if - there are no back references. - - PCRE_INFO_CAPTURECOUNT - - Return the number of capturing subpatterns in the pattern. The fourth - argument should point to an int variable. - - PCRE_INFO_DEFAULT_TABLES - - Return a pointer to the internal default character tables within PCRE.
- The fourth argument should point to an unsigned char * variable. This - information call is provided for internal use by the pcre_study() func- - tion. External callers can cause PCRE to use its internal tables by - passing a NULL table pointer. - - PCRE_INFO_FIRSTBYTE (deprecated) - - Return information about the first data unit of any matched string, for - a non-anchored pattern. The name of this option refers to the 8-bit - library, where data units are bytes. The fourth argument should point - to an int variable. Negative values are used for special cases. How- - ever, this means that when the 32-bit library is in non-UTF-32 mode, - the full 32-bit range of characters cannot be returned. For this rea- - son, this value is deprecated; use PCRE_INFO_FIRSTCHARACTERFLAGS and - PCRE_INFO_FIRSTCHARACTER instead. - - If there is a fixed first value, for example, the letter "c" from a - pattern such as (cat|cow|coyote), its value is returned. In the 8-bit - library, the value is always less than 256. In the 16-bit library the - value can be up to 0xffff. In the 32-bit library the value can be up to - 0x10ffff. - - If there is no fixed first value, and if either - - (a) the pattern was compiled with the PCRE_MULTILINE option, and every - branch starts with "^", or - - (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not - set (if it were set, the pattern would be anchored), - - -1 is returned, indicating that the pattern matches only at the start - of a subject string or after any newline within the string. Otherwise - -2 is returned. For anchored patterns, -2 is returned. - - PCRE_INFO_FIRSTCHARACTER - - Return the value of the first data unit (non-UTF character) of any - matched string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS - returns 1; otherwise return 0. The fourth argument should point to a - uint32_t variable. - - In the 8-bit library, the value is always less than 256. In the 16-bit - library the value can be up to 0xffff.
In the 32-bit library in UTF-32 - mode the value can be up to 0x10ffff, and up to 0xffffffff when not - using UTF-32 mode. - - PCRE_INFO_FIRSTCHARACTERFLAGS - - Return information about the first data unit of any matched string, for - a non-anchored pattern. The fourth argument should point to an int - variable. - - If there is a fixed first value, for example, the letter "c" from a - pattern such as (cat|cow|coyote), 1 is returned, and the character - value can be retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no - fixed first value, and if either - - (a) the pattern was compiled with the PCRE_MULTILINE option, and every - branch starts with "^", or - - (b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not - set (if it were set, the pattern would be anchored), - - 2 is returned, indicating that the pattern matches only at the start of - a subject string or after any newline within the string. Otherwise 0 is - returned. For anchored patterns, 0 is returned. - - PCRE_INFO_FIRSTTABLE - - If the pattern was studied, and this resulted in the construction of a - 256-bit table indicating a fixed set of values for the first data unit - in any matching string, a pointer to the table is returned. Otherwise - NULL is returned. The fourth argument should point to an unsigned char - * variable. - - PCRE_INFO_HASCRORLF - - Return 1 if the pattern contains any explicit matches for CR or LF - characters, otherwise 0. The fourth argument should point to an int - variable. An explicit match is either a literal CR or LF character, or - \r or \n. - - PCRE_INFO_JCHANGED - - Return 1 if the (?J) or (?-J) option setting is used in the pattern, - otherwise 0. The fourth argument should point to an int variable. (?J) - and (?-J) set and unset the local PCRE_DUPNAMES option, respectively. - - PCRE_INFO_JIT - - Return 1 if the pattern was studied with one of the JIT options, and - just-in-time compiling was successful. 
The fourth argument should point - to an int variable. A return value of 0 means that JIT support is not - available in this version of PCRE, or that the pattern was not studied - with a JIT option, or that the JIT compiler could not handle this par- - ticular pattern. See the pcrejit documentation for details of what can - and cannot be handled. - - PCRE_INFO_JITSIZE - - If the pattern was successfully studied with a JIT option, return the - size of the JIT compiled code, otherwise return zero. The fourth argu- - ment should point to a size_t variable. - - PCRE_INFO_LASTLITERAL - - Return the value of the rightmost literal data unit that must exist in - any matched string, other than at its start, if such a value has been - recorded. The fourth argument should point to an int variable. If there - is no such value, -1 is returned. For anchored patterns, a last literal - value is recorded only if it follows something of variable length. For - example, for the pattern /^a\d+z\d+/ the returned value is "z", but for - /^a\dz\d/ the returned value is -1. - - Since for the 32-bit library using the non-UTF-32 mode, this function - is unable to return the full 32-bit range of characters, this value is - deprecated; instead the PCRE_INFO_REQUIREDCHARFLAGS and - PCRE_INFO_REQUIREDCHAR values should be used. - - PCRE_INFO_MATCH_EMPTY - - Return 1 if the pattern can match an empty string, otherwise 0. The - fourth argument should point to an int variable. - - PCRE_INFO_MATCHLIMIT - - If the pattern set a match limit by including an item of the form - (*LIMIT_MATCH=nnnn) at the start, the value is returned. The fourth - argument should point to an unsigned 32-bit integer. If no such value - has been set, the call to pcre_fullinfo() returns the error - PCRE_ERROR_UNSET. - - PCRE_INFO_MAXLOOKBEHIND - - Return the number of characters (NB not data units) in the longest - lookbehind assertion in the pattern. 
This information is useful when - doing multi-segment matching using the partial matching facilities. - Note that the simple assertions \b and \B require a one-character look- - behind. \A also registers a one-character lookbehind, though it does - not actually inspect the previous character. This is to ensure that at - least one character from the old segment is retained when a new segment - is processed. Otherwise, if there are no lookbehinds in the pattern, \A - might match incorrectly at the start of a new segment. - - PCRE_INFO_MINLENGTH - - If the pattern was studied and a minimum length for matching subject - strings was computed, its value is returned. Otherwise the returned - value is -1. The value is a number of characters, which in UTF mode may - be different from the number of data units. The fourth argument should - point to an int variable. A non-negative value is a lower bound to the - length of any matching string. There may not be any strings of that - length that do actually match, but every string that does match is at - least that long. - - PCRE_INFO_NAMECOUNT - PCRE_INFO_NAMEENTRYSIZE - PCRE_INFO_NAMETABLE - - PCRE supports the use of named as well as numbered capturing parenthe- - ses. The names are just an additional way of identifying the parenthe- - ses, which still acquire numbers. Several convenience functions such as - pcre_get_named_substring() are provided for extracting captured sub- - strings by name. It is also possible to extract the data directly, by - first converting the name to a number in order to access the correct - pointers in the output vector (described with pcre_exec() below). To do - the conversion, you need to use the name-to-number map, which is - described by these three values. - - The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT - gives the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size - of each entry; both of these return an int value. 
The entry size - depends on the length of the longest name. PCRE_INFO_NAMETABLE returns - a pointer to the first entry of the table. This is a pointer to char in - the 8-bit library, where the first two bytes of each entry are the num- - ber of the capturing parenthesis, most significant byte first. In the - 16-bit library, the pointer points to 16-bit data units, the first of - which contains the parenthesis number. In the 32-bit library, the - pointer points to 32-bit data units, the first of which contains the - parenthesis number. The rest of the entry is the corresponding name, - zero terminated. - - The names are in alphabetical order. If (?| is used to create multiple - groups with the same number, as described in the section on duplicate - subpattern numbers in the pcrepattern page, the groups may be given the - same name, but there is only one entry in the table. Different names - for groups of the same number are not permitted. Duplicate names for - subpatterns with different numbers are permitted, but only if PCRE_DUP- - NAMES is set. They appear in the table in the order in which they were - found in the pattern. In the absence of (?| this is the order of - increasing number; when (?| is used this is not necessarily the case - because later subpatterns may have lower numbers. - - As a simple example of the name/number table, consider the following - pattern after compilation by the 8-bit library (assume PCRE_EXTENDED is - set, so white space - including newlines - is ignored): - - (?<date> (?<year>(\d\d)?\d\d) - - (?<month>\d\d) - (?<day>\d\d) ) - - There are four named subpatterns, so the table has four entries, and - each entry in the table is eight bytes long. The table is as follows, - with non-printing bytes shown in hexadecimal, and undefined bytes shown - as ??: - - 00 01 d a t e 00 ?? - 00 05 d a y 00 ?? ?? - 00 04 m o n t h 00 - 00 02 y e a r 00 ??
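The layout of the 8-bit name table can be made concrete with a short lookup routine. This is a sketch under stated assumptions, not code from PCRE: group_number_for_name() and example_table are hypothetical names, and the undefined bytes (shown as ?? in the table) are filled with zeros here. In a real program the table pointer, entry count, and entry size would come from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <string.h>

/* Search an 8-bit name table of `count` entries, each `entry_size` bytes.
   The first two bytes of an entry hold the group number, most significant
   byte first; the zero-terminated name follows. Returns the group number,
   or -1 if the name is not present. (Hypothetical helper.) */
int group_number_for_name(const unsigned char *table, int count,
                          int entry_size, const char *name)
{
    int i;
    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* The four-entry, eight-bytes-per-entry table from the example above,
   with the undefined bytes set to zero for the purposes of this sketch. */
const unsigned char example_table[4 * 8] = {
    0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
    0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
    0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
    0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00
};
```

Calling group_number_for_name(example_table, 4, 8, "month") returns 4. A linear scan keeps the sketch short; because the names are stored in alphabetical order, a binary search would also work. The library's own pcre_get_stringnumber() function performs this conversion and is normally the better choice.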
- - When writing code to extract data from named subpatterns using the - name-to-number map, remember that the length of the entries is likely - to be different for each compiled pattern. - - PCRE_INFO_OKPARTIAL - - Return 1 if the pattern can be used for partial matching with - pcre_exec(), otherwise 0. The fourth argument should point to an int - variable. From release 8.00, this always returns 1, because the - restrictions that previously applied to partial matching have been - lifted. The pcrepartial documentation gives details of partial match- - ing. - - PCRE_INFO_OPTIONS - - Return a copy of the options with which the pattern was compiled. The - fourth argument should point to an unsigned long int variable. These - option bits are those specified in the call to pcre_compile(), modified - by any top-level option settings at the start of the pattern itself. In - other words, they are the options that will be in force when matching - starts. For example, if the pattern /(?im)abc(?-i)d/ is compiled with - the PCRE_EXTENDED option, the result is PCRE_CASELESS, PCRE_MULTILINE, - and PCRE_EXTENDED. - - A pattern is automatically anchored by PCRE if all of its top-level - alternatives begin with one of the following: - - ^ unless PCRE_MULTILINE is set - \A always - \G always - .* if PCRE_DOTALL is set and there are no back - references to the subpattern in which .* appears - - For such patterns, the PCRE_ANCHORED bit is set in the options returned - by pcre_fullinfo(). - - PCRE_INFO_RECURSIONLIMIT - - If the pattern set a recursion limit by including an item of the form - (*LIMIT_RECURSION=nnnn) at the start, the value is returned. The fourth - argument should point to an unsigned 32-bit integer. If no such value - has been set, the call to pcre_fullinfo() returns the error - PCRE_ERROR_UNSET. - - PCRE_INFO_SIZE - - Return the size of the compiled pattern in bytes (for all three - libraries). The fourth argument should point to a size_t variable. 
This - value does not include the size of the pcre structure that is returned - by pcre_compile(). The value that is passed as the argument to - pcre_malloc() when pcre_compile() is getting memory in which to place - the compiled data is the value returned by this option plus the size of - the pcre structure. Studying a compiled pattern, with or without JIT, - does not alter the value returned by this option. - - PCRE_INFO_STUDYSIZE - - Return the size in bytes (for all three libraries) of the data block - pointed to by the study_data field in a pcre_extra block. If pcre_extra - is NULL, or there is no study data, zero is returned. The fourth argu- - ment should point to a size_t variable. The study_data field is set by - pcre_study() to record information that will speed up matching (see the - section entitled "Studying a pattern" above). The format of the - study_data block is private, but its length is made available via this - option so that it can be saved and restored (see the pcreprecompile - documentation for details). - - PCRE_INFO_REQUIREDCHARFLAGS - - Return 1 if there is a rightmost literal data unit that must exist in - any matched string, other than at its start. The fourth argument should - point to an int variable. If there is no such value, 0 is returned. If - returning 1, the character value itself can be retrieved using - PCRE_INFO_REQUIREDCHAR. - - For anchored patterns, a last literal value is recorded only if it fol- - lows something of variable length. For example, for the pattern - /^a\d+z\d+/ the returned value is 1 (with "z" returned from - PCRE_INFO_REQUIREDCHAR), but for /^a\dz\d/ the returned value is 0. - - PCRE_INFO_REQUIREDCHAR - - Return the value of the rightmost literal data unit that must exist in - any matched string, other than at its start, if such a value has been - recorded. The fourth argument should point to a uint32_t variable. If - there is no such value, 0 is returned.
- - -REFERENCE COUNTS - - int pcre_refcount(pcre *code, int adjust); - - The pcre_refcount() function is used to maintain a reference count in - the data block that contains a compiled pattern. It is provided for the - benefit of applications that operate in an object-oriented manner, - where different parts of the application may be using the same compiled - pattern, but you want to free the block when they are all done. - - When a pattern is compiled, the reference count field is initialized to - zero. It is changed only by calling this function, whose action is to - add the adjust value (which may be positive or negative) to it. The - yield of the function is the new value. However, the value of the count - is constrained to lie between 0 and 65535, inclusive. If the new value - is outside these limits, it is forced to the appropriate limit value. - - Except when it is zero, the reference count is not correctly preserved - if a pattern is compiled on one host and then transferred to a host - whose byte-order is different. (This seems a highly unlikely scenario.) - - -MATCHING A PATTERN: THE TRADITIONAL FUNCTION - - int pcre_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize); - - The function pcre_exec() is called to match a subject string against a - compiled pattern, which is passed in the code argument. If the pattern - was studied, the result of the study should be passed in the extra - argument. You can call pcre_exec() with the same code and extra argu- - ments as many times as you like, in order to match different subject - strings with the same pattern. - - This function is the main matching facility of the library, and it - operates in a Perl-like manner. For specialist use there is also an - alternative matching function, which is described below in the section - about the pcre_dfa_exec() function. 
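The saturating arithmetic described under REFERENCE COUNTS above can be modeled in a few lines. This is only an illustration of the stated rule (add the adjustment, then force the result into the range 0 to 65535). next_refcount() is a hypothetical helper, not a PCRE function, and it says nothing about how the library stores the count internally.

```c
/* Model of the rule described for pcre_refcount(): add `adjust` to the
   current count and clamp the result to 0..65535. Assumes `current` is
   already within that range, as the text guarantees. The comparisons are
   arranged so that no intermediate sum can overflow an int. */
int next_refcount(int current, int adjust)
{
    if (adjust > 65535 - current)
        return 65535;          /* would exceed the upper limit */
    if (adjust < -current)
        return 0;              /* would drop below zero */
    return current + adjust;
}
```

For example, next_refcount(1, -5) gives 0 and next_refcount(65530, 10) gives 65535, so a caller that over-releases or over-acquires never sees a value outside the documented range.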
- - In most applications, the pattern will have been compiled (and option- - ally studied) in the same process that calls pcre_exec(). However, it - is possible to save compiled patterns and study data, and then use them - later in different processes, possibly even on different hosts. For a - discussion about this, see the pcreprecompile documentation. - - Here is an example of a simple call to pcre_exec(): - - int rc; - int ovector[30]; - rc = pcre_exec( - re, /* result of pcre_compile() */ - NULL, /* we didn't study the pattern */ - "some string", /* the subject string */ - 11, /* the length of the subject string */ - 0, /* start at offset 0 in the subject */ - 0, /* default options */ - ovector, /* vector of integers for substring information */ - 30); /* number of elements (NOT size in bytes) */ - - Extra data for pcre_exec() - - If the extra argument is not NULL, it must point to a pcre_extra data - block. The pcre_study() function returns such a block (when it doesn't - return NULL), but you can also create one for yourself, and pass addi- - tional information in it. The pcre_extra block contains the following - fields (not necessarily in this order): - - unsigned long int flags; - void *study_data; - void *executable_jit; - unsigned long int match_limit; - unsigned long int match_limit_recursion; - void *callout_data; - const unsigned char *tables; - unsigned char **mark; - - In the 16-bit version of this structure, the mark field has type - "PCRE_UCHAR16 **". - - In the 32-bit version of this structure, the mark field has type - "PCRE_UCHAR32 **". - - The flags field is used to specify which of the other fields are set. - The flag bits are: - - PCRE_EXTRA_CALLOUT_DATA - PCRE_EXTRA_EXECUTABLE_JIT - PCRE_EXTRA_MARK - PCRE_EXTRA_MATCH_LIMIT - PCRE_EXTRA_MATCH_LIMIT_RECURSION - PCRE_EXTRA_STUDY_DATA - PCRE_EXTRA_TABLES - - Other flag bits should be set to zero. 
The study_data field and some- - times the executable_jit field are set in the pcre_extra block that is - returned by pcre_study(), together with the appropriate flag bits. You - should not set these yourself, but you may add to the block by setting - other fields and their corresponding flag bits. - - The match_limit field provides a means of preventing PCRE from using up - a vast amount of resources when running patterns that are not going to - match, but which have a very large number of possibilities in their - search trees. The classic example is a pattern that uses nested unlim- - ited repeats. - - Internally, pcre_exec() uses a function called match(), which it calls - repeatedly (sometimes recursively). The limit set by match_limit is - imposed on the number of times this function is called during a match, - which has the effect of limiting the amount of backtracking that can - take place. For patterns that are not anchored, the count restarts from - zero for each position in the subject string. - - When pcre_exec() is called with a pattern that was successfully studied - with a JIT option, the way that the matching is executed is entirely - different. However, there is still the possibility of runaway matching - that goes on for a very long time, and so the match_limit value is also - used in this case (but in a different way) to limit how long the match- - ing can continue. - - The default value for the limit can be set when PCRE is built; the - default default is 10 million, which handles all but the most extreme - cases. You can override the default by supplying pcre_exec() with a - pcre_extra block in which match_limit is set, and - PCRE_EXTRA_MATCH_LIMIT is set in the flags field. If the limit is - exceeded, pcre_exec() returns PCRE_ERROR_MATCHLIMIT. - - A value for the match limit may also be supplied by an item at the - start of a pattern of the form - - (*LIMIT_MATCH=d) - - where d is a decimal number.
However, such a setting is ignored unless - d is less than the limit set by the caller of pcre_exec() or, if no - such limit is set, less than the default. - - The match_limit_recursion field is similar to match_limit, but instead - of limiting the total number of times that match() is called, it limits - the depth of recursion. The recursion depth is a smaller number than - the total number of calls, because not all calls to match() are recur- - sive. This limit is of use only if it is set smaller than match_limit. - - Limiting the recursion depth limits the amount of machine stack that - can be used, or, when PCRE has been compiled to use memory on the heap - instead of the stack, the amount of heap memory that can be used. This - limit is not relevant, and is ignored, when matching is done using JIT - compiled code. - - The default value for match_limit_recursion can be set when PCRE is - built; the default default is the same value as the default for - match_limit. You can override the default by supplying pcre_exec() with - a pcre_extra block in which match_limit_recursion is set, and - PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the flags field. If the - limit is exceeded, pcre_exec() returns PCRE_ERROR_RECURSIONLIMIT. - - A value for the recursion limit may also be supplied by an item at the - start of a pattern of the form - - (*LIMIT_RECURSION=d) - - where d is a decimal number. However, such a setting is ignored unless - d is less than the limit set by the caller of pcre_exec() or, if no - such limit is set, less than the default. - - The callout_data field is used in conjunction with the "callout" fea- - ture, and is described in the pcrecallout documentation. - - The tables field is provided for use with patterns that have been pre- - compiled using custom character tables, saved to disc or elsewhere, and - then reloaded, because the tables that were used to compile a pattern - are not saved with it.
See the pcreprecompile documentation for a dis- - cussion of saving compiled patterns for later use. If NULL is passed - using this mechanism, it forces PCRE's internal tables to be used. - - Warning: The tables that pcre_exec() uses must be the same as those - that were used when the pattern was compiled. If this is not the case, - the behaviour of pcre_exec() is undefined. Therefore, when a pattern is - compiled and matched in the same process, this field should never be - set. In this (the most common) case, the correct table pointer is auto- - matically passed with the compiled pattern from pcre_compile() to - pcre_exec(). - - If PCRE_EXTRA_MARK is set in the flags field, the mark field must be - set to point to a suitable variable. If the pattern contains any back- - tracking control verbs such as (*MARK:NAME), and the execution ends up - with a name to pass back, a pointer to the name string (zero termi- - nated) is placed in the variable pointed to by the mark field. The - names are within the compiled pattern; if you wish to retain such a - name you must copy it before freeing the memory of a compiled pattern. - If there is no name to pass back, the variable pointed to by the mark - field is set to NULL. For details of the backtracking control verbs, - see the section entitled "Backtracking control" in the pcrepattern doc- - umentation. - - Option bits for pcre_exec() - - The unused bits of the options argument for pcre_exec() must be zero. - The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_xxx, - PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, - PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and - PCRE_PARTIAL_SOFT. - - If the pattern was successfully studied with one of the just-in-time - (JIT) compile options, the only supported options for JIT execution are - PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, - PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. 
If an - unsupported option is used, JIT execution is disabled and the normal - interpretive code in pcre_exec() is run. - - PCRE_ANCHORED - - The PCRE_ANCHORED option limits pcre_exec() to matching at the first - matching position. If a pattern was compiled with PCRE_ANCHORED, or - turned out to be anchored by virtue of its contents, it cannot be made - unanchored at matching time. - - PCRE_BSR_ANYCRLF - PCRE_BSR_UNICODE - - These options (which are mutually exclusive) control what the \R escape - sequence matches. The choice is either to match only CR, LF, or CRLF, - or to match any Unicode newline sequence. These options override the - choice that was made or defaulted when the pattern was compiled. - - PCRE_NEWLINE_CR - PCRE_NEWLINE_LF - PCRE_NEWLINE_CRLF - PCRE_NEWLINE_ANYCRLF - PCRE_NEWLINE_ANY - - These options override the newline definition that was chosen or - defaulted when the pattern was compiled. For details, see the descrip- - tion of pcre_compile() above. During matching, the newline choice - affects the behaviour of the dot, circumflex, and dollar metacharac- - ters. It may also alter the way the match position is advanced after a - match failure for an unanchored pattern. - - When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is - set, and a match attempt for an unanchored pattern fails when the cur- - rent position is at a CRLF sequence, and the pattern contains no - explicit matches for CR or LF characters, the match position is - advanced by two characters instead of one, in other words, to after the - CRLF. - - The above rule is a compromise that makes the most common cases work as - expected. For example, if the pattern is .+A (and the PCRE_DOTALL - option is not set), it does not match the string "\r\nA" because, after - failing at the start, it skips both the CR and the LF before retrying.
- However, the pattern [\r\n]A does match that string, because it con- - tains an explicit CR or LF reference, and so advances only by one char- - acter after the first failure. - - An explicit match for CR or LF is either a literal appearance of one of - those characters, or one of the \r or \n escape sequences. Implicit - matches such as [^X] do not count, nor does \s (which includes CR and - LF in the characters that it matches). - - Notwithstanding the above, anomalous effects may still occur when CRLF - is a valid newline sequence and explicit \r or \n escapes appear in the - pattern. - - PCRE_NOTBOL - - This option specifies that the first character of the subject string is not - the beginning of a line, so the circumflex metacharacter should not - match before it. Setting this without PCRE_MULTILINE (at compile time) - causes circumflex never to match. This option affects only the behav- - iour of the circumflex metacharacter. It does not affect \A. - - PCRE_NOTEOL - - This option specifies that the end of the subject string is not the end - of a line, so the dollar metacharacter should not match it nor (except - in multiline mode) a newline immediately before it. Setting this with- - out PCRE_MULTILINE (at compile time) causes dollar never to match. This - option affects only the behaviour of the dollar metacharacter. It does - not affect \Z or \z. - - PCRE_NOTEMPTY - - An empty string is not considered to be a valid match if this option is - set. If there are alternatives in the pattern, they are tried. If all - the alternatives match the empty string, the entire match fails. For - example, if the pattern - - a?b? - - is applied to a string not beginning with "a" or "b", it matches an - empty string at the start of the subject. With PCRE_NOTEMPTY set, this - match is not valid, so PCRE searches further into the string for occur- - rences of "a" or "b".
- - PCRE_NOTEMPTY_ATSTART - - This is like PCRE_NOTEMPTY, except that an empty string match that is - not at the start of the subject is permitted. If the pattern is - anchored, such a match can occur only if the pattern contains \K. - - Perl has no direct equivalent of PCRE_NOTEMPTY or - PCRE_NOTEMPTY_ATSTART, but it does make a special case of a pattern - match of the empty string within its split() function, and when using - the /g modifier. It is possible to emulate Perl's behaviour after - matching a null string by first trying the match again at the same off- - set with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then if that - fails, by advancing the starting offset (see below) and trying an ordi- - nary match again. There is some code that demonstrates how to do this - in the pcredemo sample program. In the most general case, you have to - check to see if the newline convention recognizes CRLF as a newline, - and if so, and the current character is CR followed by LF, advance the - starting offset by two characters instead of one. - - PCRE_NO_START_OPTIMIZE - - There are a number of optimizations that pcre_exec() uses at the start - of a match, in order to speed up the process. For example, if it is - known that an unanchored match must start with a specific character, it - searches the subject for that character, and fails immediately if it - cannot find it, without actually running the main matching function. - This means that a special item such as (*COMMIT) at the start of a pat- - tern is not considered until after a suitable starting point for the - match has been found. Also, when callouts or (*MARK) items are in use, - these "start-up" optimizations can cause them to be skipped if the pat- - tern is never actually used. The start-up optimizations are in effect a - pre-scan of the subject that takes place before the pattern is run. 
- - The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, - possibly causing performance to suffer, but ensuring that in cases - where the result is "no match", the callouts do occur, and that items - such as (*COMMIT) and (*MARK) are considered at every possible starting - position in the subject string. If PCRE_NO_START_OPTIMIZE is set at - compile time, it cannot be unset at matching time. The use of - PCRE_NO_START_OPTIMIZE at matching time (that is, passing it to - pcre_exec()) disables JIT execution; in this situation, matching is - always done interpretively. - - Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching - operation. Consider the pattern - - (*COMMIT)ABC - - When this is compiled, PCRE records the fact that a match must start - with the character "A". Suppose the subject string is "DEFABC". The - start-up optimization scans along the subject, finds "A" and runs the - first match attempt from there. The (*COMMIT) item means that the pat- - tern must match the current starting position, which in this case, it - does. However, if the same match is run with PCRE_NO_START_OPTIMIZE - set, the initial scan along the subject string does not happen. The - first match attempt is run starting from "D" and when this fails, - (*COMMIT) prevents any further matches being tried, so the overall - result is "no match". If the pattern is studied, more start-up opti- - mizations may be used. For example, a minimum length for the subject - may be recorded. Consider the pattern - - (*MARK:A)(X|Y) - - The minimum length for a match is one character. If the subject is - "ABC", there will be attempts to match "ABC", "BC", "C", and then - finally an empty string. If the pattern is studied, the final attempt - does not take place, because PCRE knows that the subject is too short, - and so the (*MARK) is never encountered.
In this case, studying the - pattern does not affect the overall match result, which is still "no - match", but it does affect the auxiliary information that is returned. - - PCRE_NO_UTF8_CHECK - - When PCRE_UTF8 is set at compile time, the validity of the subject as a - UTF-8 string is automatically checked when pcre_exec() is subsequently - called. The entire string is checked before any other processing takes - place. The value of startoffset is also checked to ensure that it - points to the start of a UTF-8 character. There is a discussion about - the validity of UTF-8 strings in the pcreunicode page. If an invalid - sequence of bytes is found, pcre_exec() returns the error - PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a - truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In - both cases, information about the precise nature of the error may also - be returned (see the descriptions of these errors in the section enti- - tled Error return values from pcre_exec() below). If startoffset con- - tains a value that does not point to the start of a UTF-8 character (or - to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is returned. - - If you already know that your subject is valid, and you want to skip - these checks for performance reasons, you can set the - PCRE_NO_UTF8_CHECK option when calling pcre_exec(). You might want to - do this for the second and subsequent calls to pcre_exec() if you are - making repeated calls to find all the matches in a single subject - string. However, you should be sure that the value of startoffset - points to the start of a character (or the end of the subject). When - PCRE_NO_UTF8_CHECK is set, the effect of passing an invalid string as a - subject or an invalid value of startoffset is undefined. Your program - may crash or loop. - - PCRE_PARTIAL_HARD - PCRE_PARTIAL_SOFT - - These options turn on the partial matching feature. 
For backwards com- - patibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial - match occurs if the end of the subject string is reached successfully, - but there are not enough subject characters to complete the match. If - this happens when PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set, - matching continues by testing any remaining alternatives. Only if no - complete match can be found is PCRE_ERROR_PARTIAL returned instead of - PCRE_ERROR_NOMATCH. In other words, PCRE_PARTIAL_SOFT says that the - caller is prepared to handle a partial match, but only if no complete - match can be found. - - If PCRE_PARTIAL_HARD is set, it overrides PCRE_PARTIAL_SOFT. In this - case, if a partial match is found, pcre_exec() immediately returns - PCRE_ERROR_PARTIAL, without considering any other alternatives. In - other words, when PCRE_PARTIAL_HARD is set, a partial match is consid- - ered to be more important than an alternative complete match. - - In both cases, the portion of the string that was inspected when the - partial match was found is set as the first matching string. There is a - more detailed discussion of partial and multi-segment matching, with - examples, in the pcrepartial documentation. - - The string to be matched by pcre_exec() - - The subject string is passed to pcre_exec() as a pointer in subject, a - length in length, and a starting offset in startoffset. The units for - length and startoffset are bytes for the 8-bit library, 16-bit data - items for the 16-bit library, and 32-bit data items for the 32-bit - library. - - If startoffset is negative or greater than the length of the subject, - pcre_exec() returns PCRE_ERROR_BADOFFSET. When the starting offset is - zero, the search for a match starts at the beginning of the subject, - and this is by far the most common case.
In UTF-8 or UTF-16 mode, the - offset must point to the start of a character, or the end of the sub- - ject (in UTF-32 mode, one data unit equals one character, so all off- - sets are valid). Unlike the pattern string, the subject may contain - binary zeroes. - - A non-zero starting offset is useful when searching for another match - in the same subject by calling pcre_exec() again after a previous suc- - cess. Setting startoffset differs from just passing over a shortened - string and setting PCRE_NOTBOL in the case of a pattern that begins - with any kind of lookbehind. For example, consider the pattern - - \Biss\B - - which finds occurrences of "iss" in the middle of words. (\B matches - only if the current position in the subject is not a word boundary.) - When applied to the string "Mississippi" the first call to pcre_exec() - finds the first occurrence. If pcre_exec() is called again with just - the remainder of the subject, namely "issippi", it does not match, - because \B is always false at the start of the subject, which is deemed - to be a word boundary. However, if pcre_exec() is passed the entire - string again, but with startoffset set to 4, it finds the second occur- - rence of "iss" because it is able to look behind the starting point to - discover that it is preceded by a letter. - - Finding all the matches in a subject is tricky when the pattern can - match an empty string. It is possible to emulate Perl's /g behaviour by - first trying the match again at the same offset, with the - PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED options, and then if that - fails, advancing the starting offset and trying an ordinary match - again. There is some code that demonstrates how to do this in the pcre- - demo sample program. In the most general case, you have to check to see - if the newline convention recognizes CRLF as a newline, and if so, and - the current character is CR followed by LF, advance the starting offset - by two characters instead of one.
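The offset-advancing step described above (one unit normally, two over a CRLF) can be sketched on its own, without calling the library. This is only an illustration under stated assumptions: next_start_offset() is a hypothetical helper, not a PCRE function, and the caller is assumed to know already whether the newline convention in force recognizes CRLF and whether the subject is UTF-8. The UTF-8 branch simply keeps the new offset at the start of a character, as the text requires for startoffset.

```c
#include <assert.h>

/* Hypothetical helper (not part of the PCRE API).
 * Call it after the anchored, non-empty retry at `offset` has failed, to get
 * the offset for the next ordinary match attempt.
 *   crlf_is_newline: nonzero if the newline convention recognizes CRLF
 *   utf8:            nonzero if the subject is a valid UTF-8 string
 * The caller must ensure 0 <= offset < length. */
static int next_start_offset(const char *subject, int length, int offset,
                             int crlf_is_newline, int utf8)
{
    assert(offset >= 0 && offset < length);

    int next = offset + 1;

    if (crlf_is_newline && next < length &&
        subject[offset] == '\r' && subject[next] == '\n') {
        /* Current position is at CRLF: step over both characters. */
        next++;
    } else if (utf8) {
        /* Skip continuation bytes (0b10xxxxxx) so that the new offset
         * points to the start of a character. */
        while (next < length &&
               ((unsigned char)subject[next] & 0xc0) == 0x80)
            next++;
    }
    return next;
}
```

For the subject "a\r\nb" with offset 1, the helper yields 3 when CRLF is a recognized newline and 2 otherwise.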
- - If a non-zero starting offset is passed when the pattern is anchored, - one attempt to match at the given offset is made. This can only succeed - if the pattern does not require the match to be at the start of the - subject. - - How pcre_exec() returns captured substrings - - In general, a pattern matches a certain portion of the subject, and in - addition, further substrings from the subject may be picked out by - parts of the pattern. Following the usage in Jeffrey Friedl's book, - this is called "capturing" in what follows, and the phrase "capturing - subpattern" is used for a fragment of a pattern that picks out a sub- - string. PCRE supports several other kinds of parenthesized subpattern - that do not cause substrings to be captured. - - Captured substrings are returned to the caller via a vector of integers - whose address is passed in ovector. The number of elements in the vec- - tor is passed in ovecsize, which must be a non-negative number. Note: - this argument is NOT the size of ovector in bytes. - - The first two-thirds of the vector is used to pass back captured sub- - strings, each substring using a pair of integers. The remaining third - of the vector is used as workspace by pcre_exec() while matching cap- - turing subpatterns, and is not available for passing back information. - The number passed in ovecsize should always be a multiple of three. If - it is not, it is rounded down. - - When a match is successful, information about captured substrings is - returned in pairs of integers, starting at the beginning of ovector, - and continuing up to two-thirds of its length at the most. The first - element of each pair is set to the offset of the first character in a - substring, and the second is set to the offset of the first character - after the end of a substring. These values are always data unit off- - sets, even in UTF mode. 
They are byte offsets in the 8-bit library, - 16-bit data item offsets in the 16-bit library, and 32-bit data item - offsets in the 32-bit library. Note: they are not character counts. - - The first pair of integers, ovector[0] and ovector[1], identify the - portion of the subject string matched by the entire pattern. The next - pair is used for the first capturing subpattern, and so on. The value - returned by pcre_exec() is one more than the highest numbered pair that - has been set. For example, if two substrings have been captured, the - returned value is 3. If there are no capturing subpatterns, the return - value from a successful match is 1, indicating that just the first pair - of offsets has been set. - - If a capturing subpattern is matched repeatedly, it is the last portion - of the string that it matched that is returned. - - If the vector is too small to hold all the captured substring offsets, - it is used as far as possible (up to two-thirds of its length), and the - function returns a value of zero. If neither the actual string matched - nor any captured substrings are of interest, pcre_exec() may be called - with ovector passed as NULL and ovecsize as zero. However, if the pat- - tern contains back references and the ovector is not big enough to - remember the related substrings, PCRE has to get additional memory for - use during matching. Thus it is usually advisable to supply an ovector - of reasonable size. - - There are some cases where zero is returned (indicating vector over- - flow) when in fact the vector is exactly the right size for the final - match. For example, consider the pattern - - (a)(?:(b)c|bd) - - If a vector of 6 elements (allowing for only 1 captured substring) is - given with subject string "abd", pcre_exec() will try to set the second - captured string, thereby recording a vector overflow, before failing to - match "c" and backing up to try the second alternative. 
The zero - return, however, does correctly indicate that the maximum number of - slots (namely 2) have been filled. In similar cases where there is tem- - porary overflow, but the final number of used slots is actually less - than the maximum, a non-zero value is returned. - - The pcre_fullinfo() function can be used to find out how many capturing - subpatterns there are in a compiled pattern. The smallest size for - ovector that will allow for n captured substrings, in addition to the - offsets of the substring matched by the whole pattern, is (n+1)*3. - - It is possible for capturing subpattern number n+1 to match some part - of the subject when subpattern n has not been used at all. For example, - if the string "abc" is matched against the pattern (a|(z))(bc) the - return from the function is 4, and subpatterns 1 and 3 are matched, but - 2 is not. When this happens, both values in the offset pairs corre- - sponding to unused subpatterns are set to -1. - - Offset values that correspond to unused subpatterns at the end of the - expression are also set to -1. For example, if the string "abc" is - matched against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not - matched. The return from the function is 2, because the highest used - capturing subpattern number is 1, and the offsets for the second - and third capturing subpatterns (assuming the vector is large enough, - of course) are set to -1. - - Note: Elements in the first two-thirds of ovector that do not corre- - spond to capturing parentheses in the pattern are never changed. That - is, if a pattern contains n capturing parentheses, no more than ovec- - tor[0] to ovector[2n+1] are set by pcre_exec(). The other elements (in - the first two-thirds) retain whatever values they previously had. - - Some convenience functions are provided for extracting the captured - substrings as separate strings. These are described below.
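As a rough illustration of the ovector layout described above, the following sketch computes the (n+1)*3 size and reads one offset pair, treating -1 as "unset". Both function names are hypothetical and are not the library's own convenience functions; the sketch works on the 8-bit library's byte offsets and assumes rc is a positive return value from pcre_exec().

```c
#include <assert.h>
#include <string.h>

/* Smallest ovector length that allows for n captured substrings plus the
 * whole-match pair: (n+1)*3. Hypothetical helper name. */
static int ovector_size_for(int n_captures)
{
    assert(n_captures >= 0);
    return (n_captures + 1) * 3;
}

/* Copy capture `group` (0 = whole match) into buf as a zero-terminated
 * string. Returns the length copied, or -1 if the group is beyond the
 * highest pair that was set, is unset (-1 offsets), or does not fit.
 * Hypothetical helper; rc is the positive value returned by pcre_exec(). */
static int copy_group(const char *subject, const int *ovector, int rc,
                      int group, char *buf, int bufsize)
{
    if (group < 0 || group >= rc)
        return -1;                       /* pair was not set by this match */

    int start = ovector[2 * group];      /* offset of first character */
    int end   = ovector[2 * group + 1];  /* offset just past the last one */

    if (start < 0 || end < start)
        return -1;                       /* unused subpattern: both are -1 */

    int len = end - start;
    if (len >= bufsize)
        return -1;                       /* no room for the terminator */

    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

For the "abc" against (a|(z))(bc) example, the return value is 4 and the first eight elements would be 0,3, 0,1, -1,-1, 1,3, so group 2 reports unset while group 3 yields "bc".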
- - Error return values from pcre_exec() - - If pcre_exec() fails, it returns a negative number. The following are - defined in the header file: - - PCRE_ERROR_NOMATCH (-1) - - The subject string did not match the pattern. - - PCRE_ERROR_NULL (-2) - - Either code or subject was passed as NULL, or ovector was NULL and - ovecsize was not zero. - - PCRE_ERROR_BADOPTION (-3) - - An unrecognized bit was set in the options argument. - - PCRE_ERROR_BADMAGIC (-4) - - PCRE stores a 4-byte "magic number" at the start of the compiled code, - to catch the case when it is passed a junk pointer and to detect when a - pattern that was compiled in an environment of one endianness is run in - an environment with the other endianness. This is the error that PCRE - gives when the magic number is not present. - - PCRE_ERROR_UNKNOWN_OPCODE (-5) - - While running the pattern match, an unknown item was encountered in the - compiled pattern. This error could be caused by a bug in PCRE or by - overwriting of the compiled pattern. - - PCRE_ERROR_NOMEMORY (-6) - - If a pattern contains back references, but the ovector that is passed - to pcre_exec() is not big enough to remember the referenced substrings, - PCRE gets a block of memory at the start of matching to use for this - purpose. If the call via pcre_malloc() fails, this error is given. The - memory is automatically freed at the end of matching. - - This error is also given if pcre_stack_malloc() fails in pcre_exec(). - This can happen only when PCRE has been compiled with --disable-stack- - for-recursion. - - PCRE_ERROR_NOSUBSTRING (-7) - - This error is used by the pcre_copy_substring(), pcre_get_substring(), - and pcre_get_substring_list() functions (see below). It is never - returned by pcre_exec(). - - PCRE_ERROR_MATCHLIMIT (-8) - - The backtracking limit, as specified by the match_limit field in a - pcre_extra structure (or defaulted) was reached. See the description - above. 
- - PCRE_ERROR_CALLOUT (-9) - - This error is never generated by pcre_exec() itself. It is provided for - use by callout functions that want to yield a distinctive error code. - See the pcrecallout documentation for details. - - PCRE_ERROR_BADUTF8 (-10) - - A string that contains an invalid UTF-8 byte sequence was passed as a - subject, and the PCRE_NO_UTF8_CHECK option was not set. If the size of - the output vector (ovecsize) is at least 2, the byte offset to the - start of the invalid UTF-8 character is placed in the first ele- - ment, and a reason code is placed in the second element. The reason - codes are listed in the following section. For backward compatibility, - if PCRE_PARTIAL_HARD is set and the problem is a truncated UTF-8 char- - acter at the end of the subject (reason codes 1 to 5), - PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8. - - PCRE_ERROR_BADUTF8_OFFSET (-11) - - The UTF-8 byte sequence that was passed as a subject was checked and - found to be valid (the PCRE_NO_UTF8_CHECK option was not set), but the - value of startoffset did not point to the beginning of a UTF-8 charac- - ter or the end of the subject. - - PCRE_ERROR_PARTIAL (-12) - - The subject string did not match, but it did match partially. See the - pcrepartial documentation for details of partial matching. - - PCRE_ERROR_BADPARTIAL (-13) - - This code is no longer in use. It was formerly returned when the - PCRE_PARTIAL option was used with a compiled pattern containing items - that were not supported for partial matching. From release 8.00 - onwards, there are no restrictions on partial matching. - - PCRE_ERROR_INTERNAL (-14) - - An unexpected internal error has occurred. This error could be caused - by a bug in PCRE or by overwriting of the compiled pattern. - - PCRE_ERROR_BADCOUNT (-15) - - This error is given if the value of the ovecsize argument is negative.
- - PCRE_ERROR_RECURSIONLIMIT (-21) - - The internal recursion limit, as specified by the match_limit_recursion - field in a pcre_extra structure (or defaulted) was reached. See the - description above. - - PCRE_ERROR_BADNEWLINE (-23) - - An invalid combination of PCRE_NEWLINE_xxx options was given. - - PCRE_ERROR_BADOFFSET (-24) - - The value of startoffset was negative or greater than the length of the - subject, that is, the value in length. - - PCRE_ERROR_SHORTUTF8 (-25) - - This error is returned instead of PCRE_ERROR_BADUTF8 when the subject - string ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD - option is set. Information about the failure is returned as for - PCRE_ERROR_BADUTF8. It is in fact sufficient to detect this case, but - this special error code for PCRE_PARTIAL_HARD precedes the implementa- - tion of returned information; it is retained for backwards compatibil- - ity. - - PCRE_ERROR_RECURSELOOP (-26) - - This error is returned when pcre_exec() detects a recursion loop within - the pattern. Specifically, it means that either the whole pattern or a - subpattern has been called recursively for the second time at the same - position in the subject string. Some simple patterns that might do this - are detected and faulted at compile time, but more complicated cases, - in particular mutual recursions between two different subpatterns, can- - not be detected until run time. - - PCRE_ERROR_JIT_STACKLIMIT (-27) - - This error is returned when a pattern that was successfully studied - using a JIT compile option is being matched, but the memory available - for the just-in-time processing stack is not large enough. See the - pcrejit documentation for more details. - - PCRE_ERROR_BADMODE (-28) - - This error is given if a pattern that was compiled by the 8-bit library - is passed to a 16-bit or 32-bit library function, or vice versa. 
- - PCRE_ERROR_BADENDIANNESS (-29) - - This error is given if a pattern that was compiled and saved is - reloaded on a host with different endianness. The utility function - pcre_pattern_to_host_byte_order() can be used to convert such a pattern - so that it runs on the new host. - - PCRE_ERROR_JIT_BADOPTION (-31) - - This error is returned when a pattern that was successfully studied - using a JIT compile option is being matched, but the matching mode - (partial or complete match) does not correspond to any JIT compilation - mode. When the JIT fast path function is used, this error may also be - given for invalid options. See the pcrejit documentation for more - details. - - PCRE_ERROR_BADLENGTH (-32) - - This error is given if pcre_exec() is called with a negative value for - the length argument. - - Error numbers -16 to -20, -22, and -30 are not used by pcre_exec(). - - Reason codes for invalid UTF-8 strings - - This section applies only to the 8-bit library. The corresponding - information for the 16-bit and 32-bit libraries is given in the pcre16 - and pcre32 pages. - - When pcre_exec() returns either PCRE_ERROR_BADUTF8 or PCRE_ERROR_SHORT- - UTF8, and the size of the output vector (ovecsize) is at least 2, the - offset of the start of the invalid UTF-8 character is placed in the - first output vector element (ovector[0]) and a reason code is placed in - the second element (ovector[1]). The reason codes are given names in - the pcre.h header file: - - PCRE_UTF8_ERR1 - PCRE_UTF8_ERR2 - PCRE_UTF8_ERR3 - PCRE_UTF8_ERR4 - PCRE_UTF8_ERR5 - - The string ends with a truncated UTF-8 character; the code specifies - how many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 - characters to be no longer than 4 bytes, the encoding scheme (origi- - nally defined by RFC 2279) allows for up to 6 bytes, and this is - checked first; hence the possibility of 4 or 5 missing bytes.
- - PCRE_UTF8_ERR6 - PCRE_UTF8_ERR7 - PCRE_UTF8_ERR8 - PCRE_UTF8_ERR9 - PCRE_UTF8_ERR10 - - The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of - the character do not have the binary value 0b10 (that is, either the - most significant bit is 0, or the next bit is 1). - - PCRE_UTF8_ERR11 - PCRE_UTF8_ERR12 - - A character that is valid by the RFC 2279 rules is either 5 or 6 bytes - long; these code points are excluded by RFC 3629. - - PCRE_UTF8_ERR13 - - A 4-byte character has a value greater than 0x10ffff; these code points - are excluded by RFC 3629. - - PCRE_UTF8_ERR14 - - A 3-byte character has a value in the range 0xd800 to 0xdfff; this - range of code points is reserved by RFC 3629 for use with UTF-16, and - so is excluded from UTF-8. - - PCRE_UTF8_ERR15 - PCRE_UTF8_ERR16 - PCRE_UTF8_ERR17 - PCRE_UTF8_ERR18 - PCRE_UTF8_ERR19 - - A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes - for a value that can be represented by fewer bytes, which is invalid. - For example, the two bytes 0xc0, 0xae give the value 0x2e, whose cor- - rect coding uses just one byte. - - PCRE_UTF8_ERR20 - - The two most significant bits of the first byte of a character have the - binary value 0b10 (that is, the most significant bit is 1 and the sec- - ond is 0). Such a byte can only validly occur as the second or subse- - quent byte of a multi-byte character. - - PCRE_UTF8_ERR21 - - The first byte of a character has the value 0xfe or 0xff. These values - can never occur in a valid UTF-8 string. - - PCRE_UTF8_ERR22 - - This error code was formerly used when the presence of a so-called - "non-character" caused an error. Unicode corrigendum #9 makes it clear - that such characters should not cause a string to be rejected, and so - this code is no longer in use and is never returned.
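The byte-level rules behind some of the reason codes above can be illustrated with a small, self-contained sketch. This is not PCRE's own validation code; the function names and the DEMO_* return values are hypothetical and exist only for this illustration. It classifies a lead byte under the RFC 2279 scheme and shows why the two bytes 0xc0, 0xae count as overlong.

```c
#include <stddef.h>

/* Hypothetical markers for this sketch only; they are not PCRE constants. */
#define DEMO_STRAY_CONTINUATION (-20)  /* situation described for PCRE_UTF8_ERR20 */
#define DEMO_INVALID_LEAD       (-21)  /* situation described for PCRE_UTF8_ERR21 */

/* Return the total sequence length (1 to 6, RFC 2279 scheme) announced by
   a lead byte, or one of the DEMO_* markers if the byte cannot start a
   character. */
int utf8_lead_length(unsigned char c)
{
    if (c < 0x80) return 1;                       /* 0xxxxxxx: single byte */
    if ((c & 0xc0) == 0x80) return DEMO_STRAY_CONTINUATION; /* 10xxxxxx */
    if (c >= 0xfe) return DEMO_INVALID_LEAD;      /* 0xfe and 0xff */
    if ((c & 0xe0) == 0xc0) return 2;             /* 110xxxxx */
    if ((c & 0xf0) == 0xe0) return 3;             /* 1110xxxx */
    if ((c & 0xf8) == 0xf0) return 4;             /* 11110xxx */
    if ((c & 0xfc) == 0xf8) return 5;             /* 111110xx */
    return 6;                                     /* 1111110x */
}

/* Decode a 2-byte sequence without any validation. */
unsigned decode2(unsigned char b0, unsigned char b1)
{
    return ((unsigned)(b0 & 0x1f) << 6) | (unsigned)(b1 & 0x3f);
}

/* A 2-byte sequence is overlong when its value would fit in one byte. */
int is_overlong2(unsigned char b0, unsigned char b1)
{
    return decode2(b0, b1) < 0x80;
}
```

With these helpers, decode2(0xc0, 0xae) yields 0x2e, so is_overlong2() reports the pair as overlong, whereas 0xc3, 0xa9 decodes to 0xe9 and is accepted.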
- - -EXTRACTING CAPTURED SUBSTRINGS BY NUMBER - - int pcre_copy_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, char *buffer, - int buffersize); - - int pcre_get_substring(const char *subject, int *ovector, - int stringcount, int stringnumber, - const char **stringptr); - - int pcre_get_substring_list(const char *subject, - int *ovector, int stringcount, const char ***listptr); - - Captured substrings can be accessed directly by using the offsets - returned by pcre_exec() in ovector. For convenience, the functions - pcre_copy_substring(), pcre_get_substring(), and pcre_get_sub- - string_list() are provided for extracting captured substrings as new, - separate, zero-terminated strings. These functions identify substrings - by number. The next section describes functions for extracting named - substrings. - - A substring that contains a binary zero is correctly extracted and has - a further zero added on the end, but the result is not, of course, a C - string. However, you can process such a string by referring to the - length that is returned by pcre_copy_substring() and pcre_get_sub- - string(). Unfortunately, the interface to pcre_get_substring_list() is - not adequate for handling strings containing binary zeros, because the - end of the final string is not independently indicated. - - The first three arguments are the same for all three of these func- - tions: subject is the subject string that has just been successfully - matched, ovector is a pointer to the vector of integer offsets that was - passed to pcre_exec(), and stringcount is the number of substrings that - were captured by the match, including the substring that matched the - entire regular expression. This is the value returned by pcre_exec() if - it is greater than zero. If pcre_exec() returned zero, indicating that - it ran out of space in ovector, the value passed as stringcount should - be the number of elements in the vector divided by three. 
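Direct access through ovector, as mentioned above, can be sketched without calling the convenience functions. The helper below is hypothetical (copy_capture and its return codes are not part of PCRE). It assumes the usual layout in which capture n occupies ovector[2*n] and ovector[2*n+1], with a negative start offset marking an unset group.

```c
#include <string.h>

/* Hypothetical helper, not a PCRE function: copy capture n from subject
   into buf as a zero-terminated string.
   Returns the length copied, -1 if n is out of range, or -2 if buf is
   too small. An unset group (negative offset) yields an empty string. */
int copy_capture(const char *subject, const int *ovector, int stringcount,
                 int n, char *buf, int bufsize)
{
    int start, end, len;

    if (n < 0 || n >= stringcount) return -1;
    start = ovector[2 * n];        /* offset of first character */
    end = ovector[2 * n + 1];      /* offset just past the last character */
    len = (start < 0) ? 0 : end - start;
    if (len + 1 > bufsize) return -2;
    if (len > 0) memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

For the subject "key=value" and an ovector holding {0, 9, 0, 3, 4, 9} with a stringcount of 3, capture 1 is "key" and capture 2 is "value". Because the length is returned separately, a caller can still process a substring that contains a binary zero.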
- - The functions pcre_copy_substring() and pcre_get_substring() extract a - single substring, whose number is given as stringnumber. A value of - zero extracts the substring that matched the entire pattern, whereas - higher values extract the captured substrings. For pcre_copy_sub- - string(), the string is placed in buffer, whose length is given by - buffersize, while for pcre_get_substring() a new block of memory is - obtained via pcre_malloc, and its address is returned via stringptr. - The yield of the function is the length of the string, not including - the terminating zero, or one of these error codes: - - PCRE_ERROR_NOMEMORY (-6) - - The buffer was too small for pcre_copy_substring(), or the attempt to - get memory failed for pcre_get_substring(). - - PCRE_ERROR_NOSUBSTRING (-7) - - There is no substring whose number is stringnumber. - - The pcre_get_substring_list() function extracts all available sub- - strings and builds a list of pointers to them. All this is done in a - single block of memory that is obtained via pcre_malloc. The address of - the memory block is returned via listptr, which is also the start of - the list of string pointers. The end of the list is marked by a NULL - pointer. The yield of the function is zero if all went well, or the - error code - - PCRE_ERROR_NOMEMORY (-6) - - if the attempt to get the memory block failed. - - When any of these functions encounter a substring that is unset, which - can happen when capturing subpattern number n+1 matches some part of - the subject, but subpattern n has not been used at all, they return an - empty string. This can be distinguished from a genuine zero-length sub- - string by inspecting the appropriate offset in ovector, which is nega- - tive for unset substrings. - - The two convenience functions pcre_free_substring() and pcre_free_sub- - string_list() can be used to free the memory returned by a previous - call of pcre_get_substring() or pcre_get_substring_list(), respec- - tively. 
They do nothing more than call the function pointed to by - pcre_free, which of course could be called directly from a C program. - However, PCRE is used in some situations where it is linked via a spe- - cial interface to another programming language that cannot use - pcre_free directly; it is for these cases that the functions are pro- - vided. - - -EXTRACTING CAPTURED SUBSTRINGS BY NAME - - int pcre_get_stringnumber(const pcre *code, - const char *name); - - int pcre_copy_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - char *buffer, int buffersize); - - int pcre_get_named_substring(const pcre *code, - const char *subject, int *ovector, - int stringcount, const char *stringname, - const char **stringptr); - - To extract a substring by name, you first have to find the associated num- - ber. For example, for this pattern - - (a+)b(?<xxx>\d+)... - - the number of the subpattern called "xxx" is 2. If the name is known to - be unique (PCRE_DUPNAMES was not set), you can find the number from the - name by calling pcre_get_stringnumber(). The first argument is the com- - piled pattern, and the second is the name. The yield of the function is - the subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no - subpattern of that name. - - Given the number, you can extract the substring directly, or use one of - the functions described in the previous section. For convenience, there - are also two functions that do the whole job. - - Most of the arguments of pcre_copy_named_substring() and - pcre_get_named_substring() are the same as those for the similarly - named functions that extract by number. As these are described in the - previous section, they are not re-described here. There are just two - differences: - - First, instead of a substring number, a substring name is given. Sec- - ond, there is an extra argument, given at the start, which is a pointer - to the compiled pattern.
This is needed in order to gain access to the - name-to-number translation table. - - These functions call pcre_get_stringnumber(), and if it succeeds, they - then call pcre_copy_substring() or pcre_get_substring(), as appropri- - ate. NOTE: If PCRE_DUPNAMES is set and there are duplicate names, the - behaviour may not be what you want (see the next section). - - Warning: If the pattern uses the (?| feature to set up multiple subpat- - terns with the same number, as described in the section on duplicate - subpattern numbers in the pcrepattern page, you cannot use names to - distinguish the different subpatterns, because names are not included - in the compiled code. The matching process uses only numbers. For this - reason, the use of different names for subpatterns of the same number - causes an error at compile time. - - -DUPLICATE SUBPATTERN NAMES - - int pcre_get_stringtable_entries(const pcre *code, - const char *name, char **first, char **last); - - When a pattern is compiled with the PCRE_DUPNAMES option, names for - subpatterns are not required to be unique. (Duplicate names are always - allowed for subpatterns with the same number, created by using the (?| - feature. Indeed, if such subpatterns are named, they are required to - use the same names.) - - Normally, patterns with duplicate names are such that in any one match, - only one of the named subpatterns participates. An example is shown in - the pcrepattern documentation. - - When duplicates are present, pcre_copy_named_substring() and - pcre_get_named_substring() return the first substring corresponding to - the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING - (-7) is returned; no data is returned. The pcre_get_stringnumber() - function returns one of the numbers that are associated with the name, - but it is not defined which it is. - - If you want to get full details of all captured substrings for a given - name, you must use the pcre_get_stringtable_entries() function. 
The - first argument is the compiled pattern, and the second is the name. The - third and fourth are pointers to variables which are updated by the - function. After it has run, they point to the first and last entries in - the name-to-number table for the given name. The function itself - returns the length of each entry, or PCRE_ERROR_NOSUBSTRING (-7) if - there are none. The format of the table is described above in the sec- - tion entitled Information about a pattern. Given all the rele- - vant entries for the name, you can extract each of their numbers, and - hence the captured data, if any. - - -FINDING ALL POSSIBLE MATCHES - - The traditional matching function uses a similar algorithm to Perl, - which stops when it finds the first match, starting at a given point in - the subject. If you want to find all possible matches, or the longest - possible match, consider using the alternative matching function (see - below) instead. If you cannot use the alternative function, but still - need to find all possible matches, you can kludge it up by making use - of the callout facility, which is described in the pcrecallout documen- - tation. - - What you have to do is to insert a callout right at the end of the pat- - tern. When your callout function is called, extract and save the cur- - rent matched substring. Then return 1, which forces pcre_exec() to - backtrack and try other alternatives. Ultimately, when it runs out of - matches, pcre_exec() will yield PCRE_ERROR_NOMATCH. - - -OBTAINING AN ESTIMATE OF STACK USAGE - - Matching certain patterns using pcre_exec() can use a lot of process - stack, which in certain environments can be rather limited in size. - Some users find it helpful to have an estimate of the amount of stack - that is used by pcre_exec(), to help them set recursion limits, as - described in the pcrestack documentation.
The estimate that is output - by pcretest when called with the -m and -C options is obtained by call- - ing pcre_exec with the values NULL, NULL, NULL, -999, and -999 for its - first five arguments. - - Normally, if its first argument is NULL, pcre_exec() immediately - returns the negative error code PCRE_ERROR_NULL, but with this special - combination of arguments, it returns instead a negative number whose - absolute value is the approximate stack frame size in bytes. (A nega- - tive number is used so that it is clear that no match has happened.) - The value is approximate because in some cases, recursive calls to - pcre_exec() occur when there are one or two additional variables on the - stack. - - If PCRE has been compiled to use the heap instead of the stack for - recursion, the value returned is the size of each block that is - obtained from the heap. - - -MATCHING A PATTERN: THE ALTERNATIVE FUNCTION - - int pcre_dfa_exec(const pcre *code, const pcre_extra *extra, - const char *subject, int length, int startoffset, - int options, int *ovector, int ovecsize, - int *workspace, int wscount); - - The function pcre_dfa_exec() is called to match a subject string - against a compiled pattern, using a matching algorithm that scans the - subject string just once, and does not backtrack. This has different - characteristics to the normal algorithm, and is not compatible with - Perl. Some of the features of PCRE patterns are not supported. Never- - theless, there are times when this kind of matching can be useful. For - a discussion of the two matching algorithms, and a list of features - that pcre_dfa_exec() does not support, see the pcrematching documenta- - tion. - - The arguments for the pcre_dfa_exec() function are the same as for - pcre_exec(), plus two extras. The ovector argument is used in a differ- - ent way, and this is described below. The other common arguments are - used in the same way as for pcre_exec(), so their description is not - repeated here. 
- - The two additional arguments provide workspace for the function. The - workspace vector should contain at least 20 elements. It is used for - keeping track of multiple paths through the pattern tree. More - workspace will be needed for patterns and subjects where there are a - lot of potential matches. - - Here is an example of a simple call to pcre_dfa_exec(): - - int rc; - int ovector[10]; - int wspace[20]; - rc = pcre_dfa_exec( - re, /* result of pcre_compile() */ - NULL, /* we didn't study the pattern */ - "some string", /* the subject string */ - 11, /* the length of the subject string */ - 0, /* start at offset 0 in the subject */ - 0, /* default options */ - ovector, /* vector of integers for substring information */ - 10, /* number of elements (NOT size in bytes) */ - wspace, /* working space vector */ - 20); /* number of elements (NOT size in bytes) */ - - Option bits for pcre_dfa_exec() - - The unused bits of the options argument for pcre_dfa_exec() must be - zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEW- - LINE_xxx, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, - PCRE_NOTEMPTY_ATSTART, PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, - PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, PCRE_PARTIAL_HARD, PCRE_PAR- - TIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. All but the last - four of these are exactly the same as for pcre_exec(), so their - description is not repeated here. - - PCRE_PARTIAL_HARD - PCRE_PARTIAL_SOFT - - These have the same general effect as they do for pcre_exec(), but the - details are slightly different. When PCRE_PARTIAL_HARD is set for - pcre_dfa_exec(), it returns PCRE_ERROR_PARTIAL if the end of the sub- - ject is reached and there is still at least one matching possibility - that requires additional characters. This happens even if some complete - matches have also been found. 
When PCRE_PARTIAL_SOFT is set, the return - code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end - of the subject is reached, there have been no complete matches, but - there is still at least one matching possibility. The portion of the - string that was inspected when the longest partial match was found is - set as the first matching string in both cases. There is a more - detailed discussion of partial and multi-segment matching, with exam- - ples, in the pcrepartial documentation. - - PCRE_DFA_SHORTEST - - Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to - stop as soon as it has found one match. Because of the way the alterna- - tive algorithm works, this is necessarily the shortest possible match - at the first possible matching point in the subject string. - - PCRE_DFA_RESTART - - When pcre_dfa_exec() returns a partial match, it is possible to call it - again, with additional subject characters, and have it continue with - the same match. The PCRE_DFA_RESTART option requests this action; when - it is set, the workspace and wscount options must reference the same - vector as before because data about the match so far is left in them - after a partial match. There is more discussion of this facility in the - pcrepartial documentation. - - Successful returns from pcre_dfa_exec() - - When pcre_dfa_exec() succeeds, it may have matched more than one sub- - string in the subject. Note, however, that all the matches from one run - of the function start at the same point in the subject. The shorter - matches are all initial substrings of the longer matches. For example, - if the pattern - - <.*> - - is matched against the string - - This is <something> <something else> <something further> no more - - the three matched strings are - - <something> - <something> <something else> - <something> <something else> <something further> - - On success, the yield of the function is a number greater than zero, - which is the number of matched substrings. The substrings themselves - are returned in ovector.
Each string uses two elements; the first is - the offset to the start, and the second is the offset to the end. In - fact, all the strings have the same start offset. (Space could have - been saved by giving this only once, but it was decided to retain some - compatibility with the way pcre_exec() returns data, even though the - meaning of the strings is different.) - - The strings are returned in reverse order of length; that is, the long- - est matching string is given first. If there were too many matches to - fit into ovector, the yield of the function is zero, and the vector is - filled with the longest matches. Unlike pcre_exec(), pcre_dfa_exec() - can use the entire ovector for returning matched strings. - - NOTE: PCRE's "auto-possessification" optimization usually applies to - character repeats at the end of a pattern (as well as internally). For - example, the pattern "a\d+" is compiled as if it were "a\d++" because - there is no point even considering the possibility of backtracking into - the repeated digits. For DFA matching, this means that only one possi- - ble match is found. If you really do want multiple matches in such - cases, either use an ungreedy repeat ("a\d+?") or set the - PCRE_NO_AUTO_POSSESS option when compiling. - - Error returns from pcre_dfa_exec() - - The pcre_dfa_exec() function returns a negative number when it fails. - Many of the errors are the same as for pcre_exec(), and these are - described above. There are in addition the following errors that are - specific to pcre_dfa_exec(): - - PCRE_ERROR_DFA_UITEM (-16) - - This return is given if pcre_dfa_exec() encounters an item in the pat- - tern that it does not support, for instance, the use of \C or a back - reference. - - PCRE_ERROR_DFA_UCOND (-17) - - This return is given if pcre_dfa_exec() encounters a condition item - that uses a back reference for the condition, or a test for recursion - in a specific group. These are not supported. 
- - PCRE_ERROR_DFA_UMLIMIT (-18) - - This return is given if pcre_dfa_exec() is called with an extra block - that contains a setting of the match_limit or match_limit_recursion - fields. This is not supported (these fields are meaningless for DFA - matching). - - PCRE_ERROR_DFA_WSSIZE (-19) - - This return is given if pcre_dfa_exec() runs out of space in the - workspace vector. - - PCRE_ERROR_DFA_RECURSE (-20) - - When a recursive subpattern is processed, the matching function calls - itself recursively, using private vectors for ovector and workspace. - This error is given if the output vector is not large enough. This - should be extremely rare, as a vector of size 1000 is used. - - PCRE_ERROR_DFA_BADRESTART (-30) - - When pcre_dfa_exec() is called with the PCRE_DFA_RESTART option, some - plausibility checks are made on the contents of the workspace, which - should contain data about the previous partial match. If any of these - checks fail, this error is given. - - -SEE ALSO - - pcre16(3), pcre32(3), pcrebuild(3), pcrecallout(3), pcrecpp(3), - pcrematching(3), pcrepartial(3), pcreposix(3), pcreprecompile(3), pcre- - sample(3), pcrestack(3). - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 18 December 2015 - Copyright (c) 1997-2015 University of Cambridge. ------------------------------------------------------------------------------- - - -PCRECALLOUT(3) Library Functions Manual PCRECALLOUT(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -SYNOPSIS - - #include <pcre.h> - - int (*pcre_callout)(pcre_callout_block *); - - int (*pcre16_callout)(pcre16_callout_block *); - - int (*pcre32_callout)(pcre32_callout_block *); - - -DESCRIPTION - - PCRE provides a feature called "callout", which is a means of temporar- - ily passing control to the caller of PCRE in the middle of pattern - matching.
The caller of PCRE provides an external function by putting - its entry point in the global variable pcre_callout (pcre16_callout for - the 16-bit library, pcre32_callout for the 32-bit library). By default, - this variable contains NULL, which disables all calling out. - - Within a regular expression, (?C) indicates the points at which the - external function is to be called. Different callout points can be - identified by putting a number less than 256 after the letter C. The - default value is zero. For example, this pattern has two callout - points: - - (?C1)abc(?C2)def - - If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, - PCRE automatically inserts callouts, all with number 255, before each - item in the pattern. For example, if PCRE_AUTO_CALLOUT is used with the - pattern - - A(\d{2}|--) - - it is processed as if it were - - (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255) - - Notice that there is a callout before and after each parenthesis and - alternation bar. If the pattern contains a conditional group whose con- - dition is an assertion, an automatic callout is inserted immediately - before the condition. Such a callout may also be inserted explicitly, - for example: - - (?(?C9)(?=a)ab|de) - - This applies only to assertion conditions (because they are themselves - independent groups). - - Automatic callouts can be used for tracking the progress of pattern - matching. The pcretest program has a pattern qualifier (/C) that sets - automatic callouts; when it is used, the output indicates how the pat- - tern is being matched. This is useful information when you are trying - to optimize the performance of a particular pattern. - - -MISSING CALLOUTS - - You should be aware that, because of optimizations in the way PCRE com- - piles and matches patterns, callouts sometimes do not happen exactly as - you might expect. 
- - At compile time, PCRE "auto-possessifies" repeated items when it knows - that what follows cannot be part of the repeat. For example, a+[bc] is - compiled as if it were a++[bc]. The pcretest output when this pattern - is anchored and then applied with automatic callouts to the string - "aaaa" is: - - --->aaaa - +0 ^ ^ - +1 ^ a+ - +3 ^ ^ [bc] - No match - - This indicates that when matching [bc] fails, there is no backtracking - into a+ and therefore the callouts that would be taken for the back- - tracks do not occur. You can disable the auto-possessify feature by - passing PCRE_NO_AUTO_POSSESS to pcre_compile(), or starting the pattern - with (*NO_AUTO_POSSESS). If this is done in pcretest (using the /O - qualifier), the output changes to this: - - --->aaaa - +0 ^ ^ - +1 ^ a+ - +3 ^ ^ [bc] - +3 ^ ^ [bc] - +3 ^ ^ [bc] - +3 ^^ [bc] - No match - - This time, when matching [bc] fails, the matcher backtracks into a+ and - tries again, repeatedly, until a+ itself fails. - - Other optimizations that provide fast "no match" results also affect - callouts. For example, if the pattern is - - ab(?C4)cd - - PCRE knows that any matching string must contain the letter "d". If the - subject string is "abyz", the lack of "d" means that matching doesn't - ever start, and the callout is never reached. However, with "abyd", - though the result is still no match, the callout is obeyed. - - If the pattern is studied, PCRE knows the minimum length of a matching - string, and will immediately give a "no match" return without actually - running a match if the subject is not long enough, or, for unanchored - patterns, if it has been scanned far enough. - - You can disable these optimizations by passing the PCRE_NO_START_OPTI- - MIZE option to the matching function, or by starting the pattern with - (*NO_START_OPT). This slows down the matching process, but does ensure - that callouts such as the example above are obeyed. 
- - -THE CALLOUT INTERFACE - - During matching, when PCRE reaches a callout point, the external func- - tion defined by pcre_callout or pcre[16|32]_callout is called (if it is - set). This applies to both normal and DFA matching. The only argument - to the callout function is a pointer to a pcre_callout or - pcre[16|32]_callout block. These structures contain the following - fields: - - int version; - int callout_number; - int *offset_vector; - const char *subject; (8-bit version) - PCRE_SPTR16 subject; (16-bit version) - PCRE_SPTR32 subject; (32-bit version) - int subject_length; - int start_match; - int current_position; - int capture_top; - int capture_last; - void *callout_data; - int pattern_position; - int next_item_length; - const unsigned char *mark; (8-bit version) - const PCRE_UCHAR16 *mark; (16-bit version) - const PCRE_UCHAR32 *mark; (32-bit version) - - The version field is an integer containing the version number of the - block format. The initial version was 0; the current version is 2. The - version number will change again in future if additional fields are - added, but the intention is never to remove any of the existing fields. - - The callout_number field contains the number of the callout, as com- - piled into the pattern (that is, the number after ?C for manual call- - outs, and 255 for automatically generated callouts). - - The offset_vector field is a pointer to the vector of offsets that was - passed by the caller to the matching function. When pcre_exec() or - pcre[16|32]_exec() is used, the contents can be inspected, in order to - extract substrings that have been matched so far, in the same way as - for extracting substrings after a match has completed. For the DFA - matching functions, this field is not useful. - - The subject and subject_length fields contain copies of the values that - were passed to the matching function.
- - The start_match field normally contains the offset within the subject - at which the current match attempt started. However, if the escape - sequence \K has been encountered, this value is changed to reflect the - modified starting point. If the pattern is not anchored, the callout - function may be called several times from the same point in the pattern - for different starting points in the subject. - - The current_position field contains the offset within the subject of - the current match pointer. - - When the pcre_exec() or pcre[16|32]_exec() is used, the capture_top - field contains one more than the number of the highest numbered cap- - tured substring so far. If no substrings have been captured, the value - of capture_top is one. This is always the case when the DFA functions - are used, because they do not support captured substrings. - - The capture_last field contains the number of the most recently cap- - tured substring. However, when a recursion exits, the value reverts to - what it was outside the recursion, as do the values of all captured - substrings. If no substrings have been captured, the value of cap- - ture_last is -1. This is always the case for the DFA matching func- - tions. - - The callout_data field contains a value that is passed to a matching - function specifically so that it can be passed back in callouts. It is - passed in the callout_data field of a pcre_extra or pcre[16|32]_extra - data structure. If no such data was passed, the value of callout_data - in a callout block is NULL. There is a description of the pcre_extra - structure in the pcreapi documentation. - - The pattern_position field is present from version 1 of the callout - structure. It contains the offset to the next item to be matched in the - pattern string. - - The next_item_length field is present from version 1 of the callout - structure. It contains the length of the next item to be matched in the - pattern string. 
When the callout immediately precedes an alternation - bar, a closing parenthesis, or the end of the pattern, the length is - zero. When the callout precedes an opening parenthesis, the length is - that of the entire subpattern. - - The pattern_position and next_item_length fields are intended to help - in distinguishing between different automatic callouts, which all have - the same callout number. However, they are set for all callouts. - - The mark field is present from version 2 of the callout structure. In - callouts from pcre_exec() or pcre[16|32]_exec() it contains a pointer - to the zero-terminated name of the most recently passed (*MARK), - (*PRUNE), or (*THEN) item in the match, or NULL if no such items have - been passed. Instances of (*PRUNE) or (*THEN) without a name do not - obliterate a previous (*MARK). In callouts from the DFA matching func- - tions this field always contains NULL. - - -RETURN VALUES - - The external callout function returns an integer to PCRE. If the value - is zero, matching proceeds as normal. If the value is greater than - zero, matching fails at the current point, but the testing of other - matching possibilities goes ahead, just as if a lookahead assertion had - failed. If the value is less than zero, the match is abandoned, and the - matching function returns the negative value. - - Negative values should normally be chosen from the set of - PCRE_ERROR_xxx values. In particular, PCRE_ERROR_NOMATCH forces a stan- - dard "no match" failure. The error number PCRE_ERROR_CALLOUT is - reserved for use by callout functions; it will never be used by PCRE - itself. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 12 November 2013 - Copyright (c) 1997-2013 University of Cambridge.
------------------------------------------------------------------------------- - - -PCRECOMPAT(3) Library Functions Manual PCRECOMPAT(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -DIFFERENCES BETWEEN PCRE AND PERL - - This document describes the differences in the ways that PCRE and Perl - handle regular expressions. The differences described here are with - respect to Perl versions 5.10 and above. - - 1. PCRE has only a subset of Perl's Unicode support. Details of what it - does have are given in the pcreunicode page. - - 2. PCRE allows repeat quantifiers only on parenthesized assertions, but - they do not mean what you might think. For example, (?!a){3} does not - assert that the next three characters are not "a". It just asserts that - the next character is not "a" three times (in principle: PCRE optimizes - this to run the assertion just once). Perl allows repeat quantifiers on - other assertions such as \b, but these do not seem to have any use. - - 3. Capturing subpatterns that occur inside negative lookahead asser- - tions are counted, but their entries in the offsets vector are never - set. Perl sometimes (but not always) sets its numerical variables from - inside negative assertions. - - 4. Though binary zero characters are supported in the subject string, - they are not allowed in a pattern string because it is passed as a nor- - mal C string, terminated by zero. The escape sequence \0 can be used in - the pattern to represent a binary zero. - - 5. The following Perl escape sequences are not supported: \l, \u, \L, - \U, and \N when followed by a character name or Unicode value. (\N on - its own, matching a non-newline character, is supported.) In fact these - are implemented by Perl's general string-handling and are not part of - its pattern matching engine. If any of these are encountered by PCRE, - an error is generated by default. 
However, if the PCRE_JAVASCRIPT_COM- - PAT option is set, \U and \u are interpreted as JavaScript interprets - them. - - 6. The Perl escape sequences \p, \P, and \X are supported only if PCRE - is built with Unicode character property support. The properties that - can be tested with \p and \P are limited to the general category prop- - erties such as Lu and Nd, script names such as Greek or Han, and the - derived properties Any and L&. PCRE does support the Cs (surrogate) - property, which Perl does not; the Perl documentation says "Because - Perl hides the need for the user to understand the internal representa- - tion of Unicode characters, there is no need to implement the somewhat - messy concept of surrogates." - - 7. PCRE does support the \Q...\E escape for quoting substrings. Charac- - ters in between are treated as literals. This is slightly different - from Perl in that $ and @ are also handled as literals inside the - quotes. In Perl, they cause variable interpolation (but of course PCRE - does not have variables). Note the following examples: - - Pattern PCRE matches Perl matches - - \Qabc$xyz\E abc$xyz abc followed by the - contents of $xyz - \Qabc\$xyz\E abc\$xyz abc\$xyz - \Qabc\E\$\Qxyz\E abc$xyz abc$xyz - - The \Q...\E sequence is recognized both inside and outside character - classes. - - 8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) - constructions. However, there is support for recursive patterns. This - is not available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE - "callout" feature allows an external function to be called during pat- - tern matching. See the pcrecallout documentation for details. - - 9. Subpatterns that are called as subroutines (whether or not recur- - sively) are always treated as atomic groups in PCRE. This is like - Python, but unlike Perl. Captured values that are set outside a sub- - routine call can be referenced from inside in PCRE, but not in Perl.
- There is a discussion that explains these differences in more detail in - the section on recursion differences from Perl in the pcrepattern page. - - 10. If any of the backtracking control verbs are used in a subpattern - that is called as a subroutine (whether or not recursively), their - effect is confined to that subpattern; it does not extend to the sur- - rounding pattern. This is not always the case in Perl. In particular, - if (*THEN) is present in a group that is called as a subroutine, its - action is limited to that group, even if the group does not contain any - | characters. Note that such subpatterns are processed as anchored at - the point where they are tested. - - 11. If a pattern contains more than one backtracking control verb, the - first one that is backtracked onto acts. For example, in the pattern - A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure - in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases - it is the same as PCRE, but there are examples where it differs. - - 12. Most backtracking verbs in assertions have their normal actions. - They are not confined to the assertion. - - 13. There are some differences that are concerned with the settings of - captured strings when part of a pattern is repeated. For example, - matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2 - unset, but in PCRE it is set to "b". - - 14. PCRE's handling of duplicate subpattern numbers and duplicate sub- - pattern names is not as general as Perl's. This is a consequence of the - fact that PCRE works internally just with numbers, using an external ta- - ble to translate between numbers and names. In particular, a pattern - such as (?|(?<a>A)|(?<b>B), where the two capturing parentheses have - the same number but different names, is not supported, and causes an - error at compile time.
If it were allowed, it would not be possible to - distinguish which parentheses matched, because both names map to cap- - turing subpattern number 1. To avoid this confusing situation, an error - is given at compile time. - - 15. Perl recognizes comments in some places that PCRE does not, for - example, between the ( and ? at the start of a subpattern. If the /x - modifier is set, Perl allows white space between ( and ? (though cur- - rent Perls warn that this is deprecated) but PCRE never does, even if - the PCRE_EXTENDED option is set. - - 16. Perl, when in warning mode, gives warnings for character classes - such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter- - als. PCRE has no warning features, so it gives an error in these cases - because they are almost certainly user mistakes. - - 17. In PCRE, the upper/lower case character properties Lu and Ll are - not affected when case-independent matching is specified. For example, - \p{Lu} always matches an upper case letter. I think Perl has changed in - this respect; in the release at the time of writing (5.16), \p{Lu} and - \p{Ll} match all letters, regardless of case, when case independence is - specified. - - 18. PCRE provides some extensions to the Perl regular expression facil- - ities. Perl 5.10 includes new features that are not in earlier ver- - sions of Perl, some of which (such as named parentheses) have been in - PCRE for some time. This list is with respect to Perl 5.10: - - (a) Although lookbehind assertions in PCRE must match fixed length - strings, each alternative branch of a lookbehind assertion can match a - different length of string. Perl requires them all to have the same - length. - - (b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $ - meta-character matches only at the very end of the string. - - (c) If PCRE_EXTRA is set, a backslash followed by a letter with no spe- - cial meaning is faulted. Otherwise, like Perl, the backslash is quietly - ignored. 
(Perl can be made to issue a warning.) - - (d) If PCRE_UNGREEDY is set, the greediness of the repetition quanti- - fiers is inverted, that is, by default they are not greedy, but if fol- - lowed by a question mark they are. - - (e) PCRE_ANCHORED can be used at matching time to force a pattern to be - tried only at the first matching position in the subject string. - - (f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, - and PCRE_NO_AUTO_CAPTURE options for pcre_exec() have no Perl equiva- - lents. - - (g) The \R escape sequence can be restricted to match only CR, LF, or - CRLF by the PCRE_BSR_ANYCRLF option. - - (h) The callout facility is PCRE-specific. - - (i) The partial matching facility is PCRE-specific. - - (j) Patterns compiled by PCRE can be saved and re-used at a later time, - even on different hosts that have the other endianness. However, this - does not apply to optimized data created by the just-in-time compiler. - - (k) The alternative matching functions (pcre_dfa_exec(), - pcre16_dfa_exec() and pcre32_dfa_exec()) match in a different way and - are not Perl-compatible. - - (l) PCRE recognizes some special sequences such as (*CR) at the start - of a pattern that set overall options that cannot be changed within the - pattern. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 10 November 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREPATTERN(3) Library Functions Manual PCREPATTERN(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE REGULAR EXPRESSION DETAILS - - The syntax and semantics of the regular expressions that are supported - by PCRE are described in detail below. There is a quick-reference syn- - tax summary in the pcresyntax page. PCRE tries to match Perl syntax and - semantics as closely as it can.
PCRE also supports some alternative - regular expression syntax (which does not conflict with the Perl syn- - tax) in order to provide some compatibility with regular expressions in - Python, .NET, and Oniguruma. - - Perl's regular expressions are described in its own documentation, and - regular expressions in general are covered in a number of books, some - of which have copious examples. Jeffrey Friedl's "Mastering Regular - Expressions", published by O'Reilly, covers regular expressions in - great detail. This description of PCRE's regular expressions is - intended as reference material. - - This document discusses the patterns that are supported by PCRE when - one of its main matching functions, pcre_exec() (8-bit) or - pcre[16|32]_exec() (16- or 32-bit), is used. PCRE also has alternative - matching functions, pcre_dfa_exec() and pcre[16|32]_dfa_exec(), which - match using a different algorithm that is not Perl-compatible. Some of - the features discussed below are not available when DFA matching is - used. The advantages and disadvantages of the alternative functions, - and how they differ from the normal functions, are discussed in the - pcrematching page. - - -SPECIAL START-OF-PATTERN ITEMS - - A number of options that can be passed to pcre_compile() can also be - set by special items at the start of a pattern. These are not Perl-com- - patible, but are provided to make these options accessible to pattern - writers who are not able to change the program that processes the pat- - tern. Any number of these items may appear, but they must all be - together right at the start of the pattern string, and the letters must - be in upper case. - - UTF support - - The original operation of PCRE was on strings of one-byte characters. - However, there is now also support for UTF-8 strings in the original - library, an extra library that supports 16-bit and UTF-16 character - strings, and a third library that supports 32-bit and UTF-32 character - strings.
To use these features, PCRE must be built to include appropri- - ate support. When using UTF strings you must either call the compiling - function with the PCRE_UTF8, PCRE_UTF16, or PCRE_UTF32 option, or the - pattern must start with one of these special sequences: - - (*UTF8) - (*UTF16) - (*UTF32) - (*UTF) - - (*UTF) is a generic sequence that can be used with any of the - libraries. Starting a pattern with such a sequence is equivalent to - setting the relevant option. How setting a UTF mode affects pattern - matching is mentioned in several places below. There is also a summary - of features in the pcreunicode page. - - Some applications that allow their users to supply patterns may wish to - restrict them to non-UTF data for security reasons. If the - PCRE_NEVER_UTF option is set at compile time, (*UTF) etc. are not - allowed, and their appearance causes an error. - - Unicode property support - - Another special sequence that may appear at the start of a pattern is - (*UCP). This has the same effect as setting the PCRE_UCP option: it - causes sequences such as \d and \w to use Unicode properties to deter- - mine character types, instead of recognizing only characters with codes - less than 128 via a lookup table. - - Disabling auto-possessification - - If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as - setting the PCRE_NO_AUTO_POSSESS option at compile time. This stops - PCRE from making quantifiers possessive when what follows cannot match - the repeated item. For example, by default a+b is treated as a++b. For - more details, see the pcreapi documentation. - - Disabling start-up optimizations - - If a pattern starts with (*NO_START_OPT), it has the same effect as - setting the PCRE_NO_START_OPTIMIZE option either at compile or matching - time. This disables several optimizations for quickly reaching "no - match" results. For more details, see the pcreapi documentation. 
- - Newline conventions - - PCRE supports five different conventions for indicating line breaks in - strings: a single CR (carriage return) character, a single LF (line- - feed) character, the two-character sequence CRLF, any of the three pre- - ceding, or any Unicode newline sequence. The pcreapi page has further - discussion about newlines, and shows how to set the newline convention - in the options arguments for the compiling and matching functions. - - It is also possible to specify a newline convention by starting a pat- - tern string with one of the following five sequences: - - (*CR) carriage return - (*LF) linefeed - (*CRLF) carriage return, followed by linefeed - (*ANYCRLF) any of the three above - (*ANY) all Unicode newline sequences - - These override the default and the options given to the compiling func- - tion. For example, on a Unix system where LF is the default newline - sequence, the pattern - - (*CR)a.b - - changes the convention to CR. That pattern matches "a\nb" because LF is - no longer a newline. If more than one of these settings is present, the - last one is used. - - The newline convention affects where the circumflex and dollar asser- - tions are true. It also affects the interpretation of the dot metachar- - acter when PCRE_DOTALL is not set, and the behaviour of \N. However, it - does not affect what the \R escape sequence matches. By default, this - is any Unicode newline sequence, for Perl compatibility. However, this - can be changed; see the description of \R in the section entitled "New- - line sequences" below. A change of \R setting can be combined with a - change of newline convention. - - Setting match and recursion limits - - The caller of pcre_exec() can set a limit on the number of times the - internal match() function is called and on the maximum depth of recur- - sive calls. 
These facilities are provided to catch runaway matches that - are provoked by patterns with huge matching trees (a typical example is - a pattern with nested unlimited repeats) and to avoid running out of - system stack by too much recursion. When one of these limits is - reached, pcre_exec() gives an error return. The limits can also be set - by items at the start of the pattern of the form - - (*LIMIT_MATCH=d) - (*LIMIT_RECURSION=d) - - where d is any number of decimal digits. However, the value of the set- - ting must be less than the value set (or defaulted) by the caller of - pcre_exec() for it to have any effect. In other words, the pattern - writer can lower the limits set by the programmer, but not raise them. - If there is more than one setting of one of these limits, the lower - value is used. - - -EBCDIC CHARACTER CODES - - PCRE can be compiled to run in an environment that uses EBCDIC as its - character code rather than ASCII or Unicode (typically a mainframe sys- - tem). In the sections below, character code values are ASCII or Uni- - code; in an EBCDIC environment these characters may have different code - values, and there are no code points greater than 255. - - -CHARACTERS AND METACHARACTERS - - A regular expression is a pattern that is matched against a subject - string from left to right. Most characters stand for themselves in a - pattern, and match the corresponding characters in the subject. As a - trivial example, the pattern - - The quick brown fox - - matches a portion of a subject string that is identical to itself. When - caseless matching is specified (the PCRE_CASELESS option), letters are - matched independently of case. In a UTF mode, PCRE always understands - the concept of case for characters whose values are less than 128, so - caseless matching is always possible. For characters with higher val- - ues, the concept of case is supported if PCRE is compiled with Unicode - property support, but not otherwise. 
If you want to use caseless - matching for characters 128 and above, you must ensure that PCRE is - compiled with Unicode property support as well as with UTF support. - - The power of regular expressions comes from the ability to include - alternatives and repetitions in the pattern. These are encoded in the - pattern by the use of metacharacters, which do not stand for themselves - but instead are interpreted in some special way. - - There are two different sets of metacharacters: those that are recog- - nized anywhere in the pattern except within square brackets, and those - that are recognized within square brackets. Outside square brackets, - the metacharacters are as follows: - - \ general escape character with several uses - ^ assert start of string (or line, in multiline mode) - $ assert end of string (or line, in multiline mode) - . match any character except newline (by default) - [ start character class definition - | start of alternative branch - ( start subpattern - ) end subpattern - ? extends the meaning of ( - also 0 or 1 quantifier - also quantifier minimizer - * 0 or more quantifier - + 1 or more quantifier - also "possessive quantifier" - { start min/max quantifier - - Part of a pattern that is in square brackets is called a "character - class". In a character class the only metacharacters are: - - \ general escape character - ^ negate the class, but only if the first character - - indicates character range - [ POSIX character class (only if followed by POSIX - syntax) - ] terminates the character class - - The following sections describe the use of each of the metacharacters. - - -BACKSLASH - - The backslash character has several uses. Firstly, if it is followed by - a character that is not a number or a letter, it takes away any special - meaning that character may have. This use of backslash as an escape - character applies both inside and outside character classes. 
- - For example, if you want to match a * character, you write \* in the - pattern. This escaping action applies whether or not the following - character would otherwise be interpreted as a metacharacter, so it is - always safe to precede a non-alphanumeric with backslash to specify - that it stands for itself. In particular, if you want to match a back- - slash, you write \\. - - In a UTF mode, only ASCII numbers and letters have any special meaning - after a backslash. All other characters (in particular, those whose - codepoints are greater than 127) are treated as literals. - - If a pattern is compiled with the PCRE_EXTENDED option, most white - space in the pattern (other than in a character class), and characters - between a # outside a character class and the next newline, inclusive, - are ignored. An escaping backslash can be used to include a white space - or # character as part of the pattern. - - If you want to remove the special meaning from a sequence of charac- - ters, you can do so by putting them between \Q and \E. This is differ- - ent from Perl in that $ and @ are handled as literals in \Q...\E - sequences in PCRE, whereas in Perl, $ and @ cause variable interpola- - tion. Note the following examples: - - Pattern PCRE matches Perl matches - - \Qabc$xyz\E abc$xyz abc followed by the - contents of $xyz - \Qabc\$xyz\E abc\$xyz abc\$xyz - \Qabc\E\$\Qxyz\E abc$xyz abc$xyz - - The \Q...\E sequence is recognized both inside and outside character - classes. An isolated \E that is not preceded by \Q is ignored. If \Q - is not followed by \E later in the pattern, the literal interpretation - continues to the end of the pattern (that is, \E is assumed at the - end). If the isolated \Q is inside a character class, this causes an - error, because the character class is not terminated. - - Non-printing characters - - A second use of backslash provides a way of encoding non-printing char- - acters in patterns in a visible manner. 
There is no restriction on the - appearance of non-printing characters, apart from the binary zero that - terminates a pattern, but when a pattern is being prepared by text - editing, it is often easier to use one of the following escape - sequences than the binary character it represents. In an ASCII or Uni- - code environment, these escapes are as follows: - - \a alarm, that is, the BEL character (hex 07) - \cx "control-x", where x is any ASCII character - \e escape (hex 1B) - \f form feed (hex 0C) - \n linefeed (hex 0A) - \r carriage return (hex 0D) - \t tab (hex 09) - \0dd character with octal code 0dd - \ddd character with octal code ddd, or back reference - \o{ddd..} character with octal code ddd.. - \xhh character with hex code hh - \x{hhh..} character with hex code hhh.. (non-JavaScript mode) - \uhhhh character with hex code hhhh (JavaScript mode only) - - The precise effect of \cx on ASCII characters is as follows: if x is a - lower case letter, it is converted to upper case. Then bit 6 of the - character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A - (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes - hex 7B (; is 3B). If the data item (byte or 16-bit value) following \c - has a value greater than 127, a compile-time error occurs. This locks - out non-ASCII characters in all modes. - - When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t gener- - ate the appropriate EBCDIC code values. The \c escape is processed as - specified for Perl in the perlebcdic document. The only characters that - are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. - Any other character provokes a compile-time error. The sequence \c@ - encodes character code 0; after \c the letters (in either case) encode - characters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters - 27-31 (hex 1B to hex 1F), and \c? becomes either 255 (hex FF) or 95 - (hex 5F). 
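The ASCII rule for \cx given above (a lower case letter is converted to upper case, then bit 6 of the character, hex 40, is inverted) can be sketched in a few lines of Python. This only illustrates the arithmetic and is not PCRE's own code. The function name control_escape is hypothetical, and the sketch raises ValueError where PCRE would report a compile-time error.

```python
def control_escape(x: str) -> int:
    """Return the code value that \\cx yields in an ASCII environment.

    Hypothetical helper for illustration only; it is not part of PCRE.
    """
    code = ord(x)
    if code > 127:
        # PCRE gives a compile-time error here; this sketch raises instead.
        raise ValueError("\\c must be followed by an ASCII character")
    if "a" <= x <= "z":
        code = ord(x.upper())  # lower case is converted to upper case first
    return code ^ 0x40         # then bit 6 (hex 40) is inverted


# Print the values for the characters used as examples in the text.
for ch in "AZ{;?":
    print(f"\\c{ch} -> hex {control_escape(ch):02X}")
```

The same function reproduces the ASCII value of \c? (hex 7F, DEL) mentioned in the text. It makes no attempt to model the EBCDIC rules.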
- - Thus, apart from \c?, these escapes generate the same character code - values as they do in an ASCII environment, though the meanings of the - values mostly differ. For example, \cG always generates code value 7, - which is BEL in ASCII but DEL in EBCDIC. - - The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, - but because 127 is not a control character in EBCDIC, Perl makes it - generate the APC character. Unfortunately, there are several variants - of EBCDIC. In most of them the APC character has the value 255 (hex - FF), but in the one Perl calls POSIX-BC its value is 95 (hex 5F). If - certain other characters have POSIX-BC values, PCRE makes \c? generate - 95; otherwise it generates 255. - - After \0 up to two further octal digits are read. If there are fewer - than two digits, just those that are present are used. Thus the - sequence \0\x\015 specifies two binary zeros followed by a CR character - (code value 13). Make sure you supply two digits after the initial zero - if the pattern character that follows is itself an octal digit. - - The escape \o must be followed by a sequence of octal digits, enclosed - in braces. An error occurs if this is not the case. This escape is a - recent addition to Perl; it provides a way of specifying character code - points as octal numbers greater than 0777, and it also allows octal - numbers and back references to be unambiguously specified. - - For greater clarity and unambiguity, it is best to avoid following \ by - a digit greater than zero. Instead, use \o{} or \x{} to specify charac- - ter numbers, and \g{} to specify back references. The following para- - graphs describe the old, ambiguous syntax. - - The handling of a backslash followed by a digit other than 0 is compli- - cated, and Perl has changed in recent releases, causing PCRE also to - change. Outside a character class, PCRE reads the digit and any follow- - ing digits as a decimal number.
If the number is less than 8, or if - there have been at least that many previous capturing left parentheses - in the expression, the entire sequence is taken as a back reference. A - description of how this works is given later, following the discussion - of parenthesized subpatterns. - - Inside a character class, or if the decimal number following \ is - greater than 7 and there have not been that many capturing subpatterns, - PCRE handles \8 and \9 as the literal characters "8" and "9", and oth- - erwise re-reads up to three octal digits following the backslash, using - them to generate a data character. Any subsequent digits stand for - themselves. For example: - - \040 is another way of writing an ASCII space - \40 is the same, provided there are fewer than 40 - previous capturing subpatterns - \7 is always a back reference - \11 might be a back reference, or another way of - writing a tab - \011 is always a tab - \0113 is a tab followed by the character "3" - \113 might be a back reference, otherwise the - character with octal code 113 - \377 might be a back reference, otherwise - the value 255 (decimal) - \81 is either a back reference, or the two - characters "8" and "1" - - Note that octal values of 100 or greater that are specified using this - syntax must not be introduced by a leading zero, because no more than - three octal digits are ever read. - - By default, after \x that is not followed by {, from zero to two hexa- - decimal digits are read (letters can be in upper or lower case). Any - number of hexadecimal digits may appear between \x{ and }. If a charac- - ter other than a hexadecimal digit appears between \x{ and }, or if - there is no terminating }, an error occurs. - - If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \x - is as just described only when it is followed by two hexadecimal dig- - its. Otherwise, it matches a literal "x" character. 
In JavaScript - mode, support for code points greater than 256 is provided by \u, which - must be followed by four hexadecimal digits; otherwise it matches a - literal "u" character. - - Characters whose value is less than 256 can be defined by either of the - two syntaxes for \x (or by \u in JavaScript mode). There is no differ- - ence in the way they are handled. For example, \xdc is exactly the same - as \x{dc} (or \u00dc in JavaScript mode). - - Constraints on character values - - Characters that are specified using octal or hexadecimal numbers are - limited to certain values, as follows: - - 8-bit non-UTF mode less than 0x100 - 8-bit UTF-8 mode less than 0x10ffff and a valid codepoint - 16-bit non-UTF mode less than 0x10000 - 16-bit UTF-16 mode less than 0x10ffff and a valid codepoint - 32-bit non-UTF mode less than 0x100000000 - 32-bit UTF-32 mode less than 0x10ffff and a valid codepoint - - Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so- - called "surrogate" codepoints), and 0xffef. - - Escape sequences in character classes - - All the sequences that define a single character value can be used both - inside and outside character classes. In addition, inside a character - class, \b is interpreted as the backspace character (hex 08). - - \N is not allowed in a character class. \B, \R, and \X are not special - inside a character class. Like other unrecognized escape sequences, - they are treated as the literal characters "B", "R", and "X" by - default, but cause an error if the PCRE_EXTRA option is set. Outside a - character class, these sequences have different meanings. - - Unsupported escape sequences - - In Perl, the sequences \l, \L, \u, and \U are recognized by its string - handler and used to modify the case of following characters. By - default, PCRE does not support these escape sequences. 
However, if the - PCRE_JAVASCRIPT_COMPAT option is set, \U matches a "U" character, and - \u can be used to define a character by code point, as described in the - previous section. - - Absolute and relative back references - - The sequence \g followed by an unsigned or a negative number, option- - ally enclosed in braces, is an absolute or relative back reference. A - named back reference can be coded as \g{name}. Back references are dis- - cussed later, following the discussion of parenthesized subpatterns. - - Absolute and relative subroutine calls - - For compatibility with Oniguruma, the non-Perl syntax \g followed by a - name or a number enclosed either in angle brackets or single quotes, is - an alternative syntax for referencing a subpattern as a "subroutine". - Details are discussed later. Note that \g{...} (Perl syntax) and - \g<...> (Oniguruma syntax) are not synonymous. The former is a back - reference; the latter is a subroutine call. - - Generic character types - - Another use of backslash is for specifying generic character types: - - \d any decimal digit - \D any character that is not a decimal digit - \h any horizontal white space character - \H any character that is not a horizontal white space character - \s any white space character - \S any character that is not a white space character - \v any vertical white space character - \V any character that is not a vertical white space character - \w any "word" character - \W any "non-word" character - - There is also the single sequence \N, which matches a non-newline char- - acter. This is the same as the "." metacharacter when PCRE_DOTALL is - not set. Perl also uses \N to match characters by name; PCRE does not - support this. - - Each pair of lower and upper case escape sequences partitions the com- - plete set of characters into two disjoint sets. Any given character - matches one, and only one, of each pair. The sequences can appear both - inside and outside character classes. 
They each match one character of - the appropriate type. If the current matching point is at the end of - the subject string, all of them fail, because there is no character to - match. - - For compatibility with Perl, \s formerly did not match the VT character - (code 11), which made it different from the POSIX "space" class. - However, Perl added VT at release 5.18, and PCRE followed suit at - release 8.34. The default \s characters are now HT (9), LF (10), VT - (11), FF (12), CR (13), and space (32), which are defined as white - space in the "C" locale. This list may vary if locale-specific matching - is taking place. For example, in some locales the "non-breaking space" - character (\xA0) is recognized as white space, and in others the VT - character is not. - - A "word" character is an underscore or any character that is a letter - or digit. By default, the definition of letters and digits is con- - trolled by PCRE's low-valued character tables, and may vary if locale- - specific matching is taking place (see "Locale support" in the pcreapi - page). For example, in a French locale such as "fr_FR" in Unix-like - systems, or "french" in Windows, some character codes greater than 127 - are used for accented letters, and these are then matched by \w. The - use of locales with Unicode is discouraged. - - By default, characters whose code points are greater than 127 never - match \d, \s, or \w, and always match \D, \S, and \W, although this may - vary for characters in the range 128-255 when locale-specific matching - is happening. These escape sequences retain their original meanings - from before Unicode support was available, mainly for efficiency rea- - sons.
If PCRE is compiled with Unicode property support, and the - PCRE_UCP option is set, the behaviour is changed so that Unicode prop- - erties are used to determine character types, as follows: - - \d any character that matches \p{Nd} (decimal digit) - \s any character that matches \p{Z} or \h or \v - \w any character that matches \p{L} or \p{N}, plus underscore - - The upper case escapes match the inverse sets of characters. Note that - \d matches only decimal digits, whereas \w matches any Unicode digit, - as well as any Unicode letter, and underscore. Note also that PCRE_UCP - affects \b, and \B because they are defined in terms of \w and \W. - Matching these sequences is noticeably slower when PCRE_UCP is set. - - The sequences \h, \H, \v, and \V are features that were added to Perl - at release 5.10. In contrast to the other sequences, which match only - ASCII characters by default, these always match certain high-valued - code points, whether or not PCRE_UCP is set. The horizontal space char- - acters are: - - U+0009 Horizontal tab (HT) - U+0020 Space - U+00A0 Non-break space - U+1680 Ogham space mark - U+180E Mongolian vowel separator - U+2000 En quad - U+2001 Em quad - U+2002 En space - U+2003 Em space - U+2004 Three-per-em space - U+2005 Four-per-em space - U+2006 Six-per-em space - U+2007 Figure space - U+2008 Punctuation space - U+2009 Thin space - U+200A Hair space - U+202F Narrow no-break space - U+205F Medium mathematical space - U+3000 Ideographic space - - The vertical space characters are: - - U+000A Linefeed (LF) - U+000B Vertical tab (VT) - U+000C Form feed (FF) - U+000D Carriage return (CR) - U+0085 Next line (NEL) - U+2028 Line separator - U+2029 Paragraph separator - - In 8-bit, non-UTF-8 mode, only the characters with codepoints less than - 256 are relevant. - - Newline sequences - - Outside a character class, by default, the escape sequence \R matches - any Unicode newline sequence. 
In 8-bit non-UTF-8 mode \R is equivalent - to the following: - - (?>\r\n|\n|\x0b|\f|\r|\x85) - - This is an example of an "atomic group", details of which are given - below. This particular group matches either the two-character sequence - CR followed by LF, or one of the single characters LF (linefeed, - U+000A), VT (vertical tab, U+000B), FF (form feed, U+000C), CR (car- - riage return, U+000D), or NEL (next line, U+0085). The two-character - sequence is treated as a single unit that cannot be split. - - In other modes, two additional characters whose codepoints are greater - than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa- - rator, U+2029). Unicode character property support is not needed for - these characters to be recognized. - - It is possible to restrict \R to match only CR, LF, or CRLF (instead of - the complete set of Unicode line endings) by setting the option - PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched. - (BSR is an abbreviation for "backslash R".) This can be made the default - when PCRE is built; if this is the case, the other behaviour can be - requested via the PCRE_BSR_UNICODE option. It is also possible to - specify these settings by starting a pattern string with one of the - following sequences: - - (*BSR_ANYCRLF) CR, LF, or CRLF only - (*BSR_UNICODE) any Unicode newline sequence - - These override the default and the options given to the compiling func- - tion, but they can themselves be overridden by options given to a - matching function. Note that these special settings, which are not - Perl-compatible, are recognized only at the very start of a pattern, - and that they must be in upper case. If more than one of them is - present, the last one is used. They can be combined with a change of - newline convention; for example, a pattern can start with: - - (*ANY)(*BSR_ANYCRLF) - - They can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF) - or (*UCP) special sequences.
Inside a character class, \R is treated as - an unrecognized escape sequence, and so matches the letter "R" by - default, but causes an error if PCRE_EXTRA is set. - - Unicode character properties - - When PCRE is built with Unicode character property support, three addi- - tional escape sequences that match characters with specific properties - are available. When in 8-bit non-UTF-8 mode, these sequences are of - course limited to testing characters whose codepoints are less than - 256, but they do work in this mode. The extra escape sequences are: - - \p{xx} a character with the xx property - \P{xx} a character without the xx property - \X a Unicode extended grapheme cluster - - The property names represented by xx above are limited to the Unicode - script names, the general category properties, "Any", which matches any - character (including newline), and some special PCRE properties - (described in the next section). Other Perl properties such as "InMu- - sicalSymbols" are not currently supported by PCRE. Note that \P{Any} - does not match any characters, so always causes a match failure. - - Sets of Unicode characters are defined as belonging to certain scripts. - A character from one of these sets can be matched using a script name. - For example: - - \p{Greek} - \P{Han} - - Those that are not part of an identified script are lumped together as - "Common". 
The current list of scripts is: - - Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali, - Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car- - ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei- - form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero- - glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, - Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, - Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip- - tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, - Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin- - ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic, - Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, - Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean, - New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian, - Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya, - Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, - Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha- - vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, - Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, - Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, - Yi. - - Each character has exactly one Unicode general category property, spec- - ified by a two-letter abbreviation. For compatibility with Perl, nega- - tion can be specified by including a circumflex between the opening - brace and the property name. For example, \p{^Lu} is the same as - \P{Lu}. - - If only one letter is specified with \p or \P, it includes all the gen- - eral category properties that start with that letter. 
In this case, in - the absence of negation, the curly brackets in the escape sequence are - optional; these two examples have the same effect: - - \p{L} - \pL - - The following general category property codes are supported: - - C Other - Cc Control - Cf Format - Cn Unassigned - Co Private use - Cs Surrogate - - L Letter - Ll Lower case letter - Lm Modifier letter - Lo Other letter - Lt Title case letter - Lu Upper case letter - - M Mark - Mc Spacing mark - Me Enclosing mark - Mn Non-spacing mark - - N Number - Nd Decimal number - Nl Letter number - No Other number - - P Punctuation - Pc Connector punctuation - Pd Dash punctuation - Pe Close punctuation - Pf Final punctuation - Pi Initial punctuation - Po Other punctuation - Ps Open punctuation - - S Symbol - Sc Currency symbol - Sk Modifier symbol - Sm Mathematical symbol - So Other symbol - - Z Separator - Zl Line separator - Zp Paragraph separator - Zs Space separator - - The special property L& is also supported: it matches a character that - has the Lu, Ll, or Lt property, in other words, a letter that is not - classified as a modifier or "other". - - The Cs (Surrogate) property applies only to characters in the range - U+D800 to U+DFFF. Such characters are not valid in Unicode strings and - so cannot be tested by PCRE, unless UTF validity checking has been - turned off (see the discussion of PCRE_NO_UTF8_CHECK, - PCRE_NO_UTF16_CHECK and PCRE_NO_UTF32_CHECK in the pcreapi page). Perl - does not support the Cs property. - - The long synonyms for property names that Perl supports (such as - \p{Letter}) are not supported by PCRE, nor is it permitted to prefix - any of these properties with "Is". - - No character that is in the Unicode table has the Cn (unassigned) prop- - erty. Instead, this property is assumed for any code point that is not - in the Unicode table. - - Specifying caseless matching does not affect these escape sequences. - For example, \p{Lu} always matches only upper case letters. 
This is - different from the behaviour of current versions of Perl. - - Matching characters by Unicode property is not fast, because PCRE has - to do a multistage table lookup in order to find a character's prop- - erty. That is why the traditional escape sequences such as \d and \w do - not use Unicode properties in PCRE by default, though you can make them - do so by setting the PCRE_UCP option or by starting the pattern with - (*UCP). - - Extended grapheme clusters - - The \X escape matches any number of Unicode characters that form an - "extended grapheme cluster", and treats the sequence as an atomic group - (see below). Up to and including release 8.31, PCRE matched an ear- - lier, simpler definition that was equivalent to - - (?>\PM\pM*) - - That is, it matched a character without the "mark" property, followed - by zero or more characters with the "mark" property. Characters with - the "mark" property are typically non-spacing accents that affect the - preceding character. - - This simple definition was extended in Unicode to include more compli- - cated kinds of composite character by giving each character a grapheme - breaking property, and creating rules that use these properties to - define the boundaries of extended grapheme clusters. In releases of - PCRE later than 8.31, \X matches one of these clusters. - - \X always matches at least one character. Then it decides whether to - add additional characters according to the following rules for ending a - cluster: - - 1. End at the end of the subject string. - - 2. Do not end between CR and LF; otherwise end after any control char- - acter. - - 3. Do not break Hangul (a Korean script) syllable sequences. Hangul - characters are of five types: L, V, T, LV, and LVT. An L character may - be followed by an L, V, LV, or LVT character; an LV or V character may - be followed by a V or T character; an LVT or T character may be followed - only by a T character. - - 4.
Do not end before extending characters or spacing marks. Characters - with the "mark" property always have the "extend" grapheme breaking - property. - - 5. Do not end after prepend characters. - - 6. Otherwise, end the cluster. - - PCRE's additional properties - - As well as the standard Unicode properties described above, PCRE sup- - ports four more that make it possible to convert traditional escape - sequences such as \w and \s to use Unicode properties. PCRE uses these - non-standard, non-Perl properties internally when PCRE_UCP is set. How- - ever, they may also be used explicitly. These properties are: - - Xan Any alphanumeric character - Xps Any POSIX space character - Xsp Any Perl space character - Xwd Any Perl "word" character - - Xan matches characters that have either the L (letter) or the N (num- - ber) property. Xps matches the characters tab, linefeed, vertical tab, - form feed, or carriage return, and any other character that has the Z - (separator) property. Xsp is the same as Xps; it used to exclude ver- - tical tab, for Perl compatibility, but Perl changed, and so PCRE fol- - lowed at release 8.34. Xwd matches the same characters as Xan, plus - underscore. - - There is another non-standard property, Xuc, which matches any charac- - ter that can be represented by a Universal Character Name in C++ and - other programming languages. These are the characters $, @, ` (grave - accent), and all characters with Unicode code points greater than or - equal to U+00A0, except for the surrogates U+D800 to U+DFFF. Note that - most base (ASCII) characters are excluded. (Universal Character Names - are of the form \uHHHH or \UHHHHHHHH where H is a hexadecimal digit. - Note that the Xuc property does not match these sequences but the char- - acters that they represent.) - - Resetting the match start - - The escape sequence \K causes any previously matched characters not to - be included in the final matched sequence. 
For example, the pattern: - - foo\Kbar - - matches "foobar", but reports that it has matched "bar". This feature - is similar to a lookbehind assertion (described below). However, in - this case, the part of the subject before the real match does not have - to be of fixed length, as lookbehind assertions do. The use of \K does - not interfere with the setting of captured substrings. For example, - when the pattern - - (foo)\Kbar - - matches "foobar", the first substring is still set to "foo". - - Perl documents that the use of \K within assertions is "not well - defined". In PCRE, \K is acted upon when it occurs inside positive - assertions, but is ignored in negative assertions. Note that when a - pattern such as (?=ab\K) matches, the reported start of the match can - be greater than the end of the match. - - Simple assertions - - The final use of backslash is for certain simple assertions. An asser- - tion specifies a condition that has to be met at a particular point in - a match, without consuming any characters from the subject string. The - use of subpatterns for more complicated assertions is described below. - The backslashed assertions are: - - \b matches at a word boundary - \B matches when not at a word boundary - \A matches at the start of the subject - \Z matches at the end of the subject - also matches before a newline at the end of the subject - \z matches only at the end of the subject - \G matches at the first matching position in the subject - - Inside a character class, \b has a different meaning; it matches the - backspace character. If any other of these assertions appears in a - character class, by default it matches the corresponding literal char- - acter (for example, \B matches the letter B). However, if the - PCRE_EXTRA option is set, an "invalid escape sequence" error is gener- - ated instead. - - A word boundary is a position in the subject string where the current - character and the previous character do not both match \w or \W (i.e. 
- one matches \w and the other matches \W), or the start or end of the - string if the first or last character matches \w, respectively. In a - UTF mode, the meanings of \w and \W can be changed by setting the - PCRE_UCP option. When this is done, it also affects \b and \B. Neither - PCRE nor Perl has a separate "start of word" or "end of word" metase- - quence. However, whatever follows \b normally determines which it is. - For example, the fragment \ba matches "a" at the start of a word. - - The \A, \Z, and \z assertions differ from the traditional circumflex - and dollar (described in the next section) in that they only ever match - at the very start and end of the subject string, whatever options are - set. Thus, they are independent of multiline mode. These three asser- - tions are not affected by the PCRE_NOTBOL or PCRE_NOTEOL options, which - affect only the behaviour of the circumflex and dollar metacharacters. - However, if the startoffset argument of pcre_exec() is non-zero, indi- - cating that matching is to start at a point other than the beginning of - the subject, \A can never match. The difference between \Z and \z is - that \Z matches before a newline at the end of the string as well as at - the very end, whereas \z matches only at the end. - - The \G assertion is true only when the current matching position is at - the start point of the match, as specified by the startoffset argument - of pcre_exec(). It differs from \A when the value of startoffset is - non-zero. By calling pcre_exec() multiple times with appropriate argu- - ments, you can mimic Perl's /g option, and it is in this kind of imple- - mentation where \G can be useful. - - Note, however, that PCRE's interpretation of \G, as the start of the - current match, is subtly different from Perl's, which defines it as the - end of the previous match. In Perl, these can be different when the - previously matched string was empty. 
Because PCRE does just one match - at a time, it cannot reproduce this behaviour. - - If all the alternatives of a pattern begin with \G, the expression is - anchored to the starting match position, and the "anchored" flag is set - in the compiled regular expression. - - -CIRCUMFLEX AND DOLLAR - - The circumflex and dollar metacharacters are zero-width assertions. - That is, they test for a particular condition being true without con- - suming any characters from the subject string. - - Outside a character class, in the default matching mode, the circumflex - character is an assertion that is true only if the current matching - point is at the start of the subject string. If the startoffset argu- - ment of pcre_exec() is non-zero, circumflex can never match if the - PCRE_MULTILINE option is unset. Inside a character class, circumflex - has an entirely different meaning (see below). - - Circumflex need not be the first character of the pattern if a number - of alternatives are involved, but it should be the first thing in each - alternative in which it appears if the pattern is ever to match that - branch. If all possible alternatives start with a circumflex, that is, - if the pattern is constrained to match only at the start of the sub- - ject, it is said to be an "anchored" pattern. (There are also other - constructs that can cause a pattern to be anchored.) - - The dollar character is an assertion that is true only if the current - matching point is at the end of the subject string, or immediately - before a newline at the end of the string (by default). Note, however, - that it does not actually match the newline. Dollar need not be the - last character of the pattern if a number of alternatives are involved, - but it should be the last item in any branch in which it appears. Dol- - lar has no special meaning in a character class. 
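The default behaviour of circumflex and dollar described above can be sketched with a short script. This is only an illustration: it uses Python's standard re module rather than PCRE, on the assumption that the two agree for these default, single-line cases, and the helper name dollar_positions is hypothetical.

```python
import re


def dollar_positions(subject):
    """Return every offset at which a bare dollar assertion succeeds."""
    return [m.start() for m in re.finditer(r"$", subject)]


# Dollar is true at the very end and immediately before a final newline.
positions = dollar_positions("abc\n")

# Dollar is zero-width: the match for abc$ ends before the newline.
end_of_match = re.search(r"abc$", "abc\n").end()

# Without multiline mode, circumflex is true only at the start of the subject.
circumflex_after_newline = re.search(r"^abc", "def\nabc")
```

Here positions holds two offsets for "abc\n" (before the final newline and at the very end), while the search for ^abc in "def\nabc" finds nothing because multiline mode is not set.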
- - The meaning of dollar can be changed so that it matches only at the - very end of the string, by setting the PCRE_DOLLAR_ENDONLY option at - compile time. This does not affect the \Z assertion. - - The meanings of the circumflex and dollar characters are changed if the - PCRE_MULTILINE option is set. When this is the case, a circumflex - matches immediately after internal newlines as well as at the start of - the subject string. It does not match after a newline that ends the - string. A dollar matches before any newlines in the string, as well as - at the very end, when PCRE_MULTILINE is set. When newline is specified - as the two-character sequence CRLF, isolated CR and LF characters do - not indicate newlines. - - For example, the pattern /^abc$/ matches the subject string "def\nabc" - (where \n represents a newline) in multiline mode, but not otherwise. - Consequently, patterns that are anchored in single line mode because - all branches start with ^ are not anchored in multiline mode, and a - match for circumflex is possible when the startoffset argument of - pcre_exec() is non-zero. The PCRE_DOLLAR_ENDONLY option is ignored if - PCRE_MULTILINE is set. - - Note that the sequences \A, \Z, and \z can be used to match the start - and end of the subject in both modes, and if all branches of a pattern - start with \A it is always anchored, whether or not PCRE_MULTILINE is - set. - - -FULL STOP (PERIOD, DOT) AND \N - - Outside a character class, a dot in the pattern matches any one charac- - ter in the subject string except (by default) a character that signi- - fies the end of a line. - - When a line ending is defined as a single character, dot never matches - that character; when the two-character sequence CRLF is used, dot does - not match CR if it is immediately followed by LF, but otherwise it - matches all characters (including isolated CRs and LFs). 
When any Uni- - code line endings are being recognized, dot does not match CR or LF or - any of the other line ending characters. - - The behaviour of dot with regard to newlines can be changed. If the - PCRE_DOTALL option is set, a dot matches any one character, without - exception. If the two-character sequence CRLF is present in the subject - string, it takes two dots to match it. - - The handling of dot is entirely independent of the handling of circum- - flex and dollar, the only relationship being that they both involve - newlines. Dot has no special meaning in a character class. - - The escape sequence \N behaves like a dot, except that it is not - affected by the PCRE_DOTALL option. In other words, it matches any - character except one that signifies the end of a line. Perl also uses - \N to match characters by name; PCRE does not support this. - - -MATCHING A SINGLE DATA UNIT - - Outside a character class, the escape sequence \C matches any one data - unit, whether or not a UTF mode is set. In the 8-bit library, one data - unit is one byte; in the 16-bit library it is a 16-bit unit; in the - 32-bit library it is a 32-bit unit. Unlike a dot, \C always matches - line-ending characters. The feature is provided in Perl in order to - match individual bytes in UTF-8 mode, but it is unclear how it can use- - fully be used. Because \C breaks up characters into individual data - units, matching one unit with \C in a UTF mode means that the rest of - the string may start with a malformed UTF character. This has undefined - results, because PCRE assumes that it is dealing with valid UTF strings - (and by default it checks this at the start of processing unless the - PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or PCRE_NO_UTF32_CHECK option - is used). - - PCRE does not allow \C to appear in lookbehind assertions (described - below) in a UTF mode, because this would make it impossible to calcu- - late the length of the lookbehind. 
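The notion of a data unit, and the reason a single \C can leave a malformed remainder in a UTF mode, can be sketched as follows. This is an illustration in Python and does not call PCRE; the helper names data_units and remainder_is_valid_utf8 are hypothetical, and Python's codecs stand in for the 8-bit, 16-bit and 32-bit libraries.

```python
def data_units(ch):
    """Count the data units one character needs in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),            # 8-bit units (bytes)
        len(ch.encode("utf-16-le")) // 2,   # 16-bit units
        len(ch.encode("utf-32-le")) // 4,   # 32-bit units
    )


def remainder_is_valid_utf8(text, skipped_units):
    """Check whether the UTF-8 string is still well formed after skipping
    the given number of bytes, as a run of \\C items would do."""
    rest = text.encode("utf-8")[skipped_units:]
    try:
        rest.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


e_acute = data_units("\u00e9")          # two bytes in UTF-8
emoji = data_units("\U0001F600")        # four bytes, two 16-bit units
after_one_unit = remainder_is_valid_utf8("\u00e9a", 1)
after_two_units = remainder_is_valid_utf8("\u00e9a", 2)
```

Skipping one byte of a two-byte character leaves a string that starts with a continuation byte, which is the malformed case the text warns about; skipping both bytes leaves a valid string.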
- - In general, the \C escape sequence is best avoided. However, one way of - using it that avoids the problem of malformed UTF characters is to use - a lookahead to check the length of the next character, as in this pat- - tern, which could be used with a UTF-8 string (ignore white space and - line breaks): - - (?| (?=[\x00-\x7f])(\C) | - (?=[\x80-\x{7ff}])(\C)(\C) | - (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) | - (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C)) - - A group that starts with (?| resets the capturing parentheses numbers - in each alternative (see "Duplicate Subpattern Numbers" below). The - assertions at the start of each branch check the next UTF-8 character - for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The - character's individual bytes are then captured by the appropriate num- - ber of groups. - - -SQUARE BRACKETS AND CHARACTER CLASSES - - An opening square bracket introduces a character class, terminated by a - closing square bracket. A closing square bracket on its own is not spe- - cial by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set, - a lone closing square bracket causes a compile-time error. If a closing - square bracket is required as a member of the class, it should be the - first data character in the class (after an initial circumflex, if - present) or escaped with a backslash. - - A character class matches a single character in the subject. In a UTF - mode, the character may be more than one data unit long. A matched - character must be in the set of characters defined by the class, unless - the first character in the class definition is a circumflex, in which - case the subject character must not be in the set defined by the class. - If a circumflex is actually required as a member of the class, ensure - it is not the first character, or escape it with a backslash. - - For example, the character class [aeiou] matches any lower case vowel, - while [^aeiou] matches any character that is not a lower case vowel. 
- Note that a circumflex is just a convenient notation for specifying the - characters that are in the class by enumerating those that are not. A - class that starts with a circumflex is not an assertion; it still con- - sumes a character from the subject string, and therefore it fails if - the current pointer is at the end of the string. - - In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255 - (0xffff) can be included in a class as a literal string of data units, - or by using the \x{ escaping mechanism. - - When caseless matching is set, any letters in a class represent both - their upper case and lower case versions, so for example, a caseless - [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not - match "A", whereas a caseful version would. In a UTF mode, PCRE always - understands the concept of case for characters whose values are less - than 128, so caseless matching is always possible. For characters with - higher values, the concept of case is supported if PCRE is compiled - with Unicode property support, but not otherwise. If you want to use - caseless matching in a UTF mode for characters 128 and above, you must - ensure that PCRE is compiled with Unicode property support as well as - with UTF support. - - Characters that might indicate line breaks are never treated in any - special way when matching character classes, whatever line-ending - sequence is in use, and whatever setting of the PCRE_DOTALL and - PCRE_MULTILINE options is used. A class such as [^a] always matches one - of these characters. - - The minus (hyphen) character can be used to specify a range of charac- - ters in a character class. For example, [d-m] matches any letter - between d and m, inclusive. If a minus character is required in a - class, it must be escaped with a backslash or appear in a position - where it cannot be interpreted as indicating a range, typically as the - first or last character in the class, or immediately after a range. 
For - example, [b-d-z] matches letters in the range b to d, a hyphen charac- - ter, or z. - - It is not possible to have the literal character "]" as the end charac- - ter of a range. A pattern such as [W-]46] is interpreted as a class of - two characters ("W" and "-") followed by a literal string "46]", so it - would match "W46]" or "-46]". However, if the "]" is escaped with a - backslash it is interpreted as the end of range, so [W-\]46] is inter- - preted as a class containing a range followed by two other characters. - The octal or hexadecimal representation of "]" can also be used to end - a range. - - An error is generated if a POSIX character class (see below) or an - escape sequence other than one that defines a single character appears - at a point where a range ending character is expected. For example, - [z-\xff] is valid, but [A-\d] and [A-[:digit:]] are not. - - Ranges operate in the collating sequence of character values. They can - also be used for characters specified numerically, for example - [\000-\037]. Ranges can include any characters that are valid for the - current mode. - - If a range that includes letters is used when caseless matching is set, - it matches the letters in either case. For example, [W-c] is equivalent - to [][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if - character tables for a French locale are in use, [\xc8-\xcb] matches - accented E characters in both cases. In UTF modes, PCRE supports the - concept of case for characters with values greater than 128 only when - it is compiled with Unicode property support. - - The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v, \V, - \w, and \W may appear in a character class, and add the characters that - they match to the class. For example, [\dABCDEF] matches any hexadeci- - mal digit. 
In UTF modes, the PCRE_UCP option affects the meanings of - \d, \s, \w and their upper case partners, just as it does when they - appear outside a character class, as described in the section entitled - "Generic character types" above. The escape sequence \b has a different - meaning inside a character class; it matches the backspace character. - The sequences \B, \N, \R, and \X are not special inside a character - class. Like any other unrecognized escape sequences, they are treated - as the literal characters "B", "N", "R", and "X" by default, but cause - an error if the PCRE_EXTRA option is set. - - A circumflex can conveniently be used with the upper case character - types to specify a more restricted set of characters than the matching - lower case type. For example, the class [^\W_] matches any letter or - digit, but not underscore, whereas [\w] includes underscore. A positive - character class should be read as "something OR something OR ..." and a - negative class as "NOT something AND NOT something AND NOT ...". - - The only metacharacters that are recognized in character classes are - backslash, hyphen (only where it can be interpreted as specifying a - range), circumflex (only at the start), opening square bracket (only - when it can be interpreted as introducing a POSIX class name, or for a - special compatibility feature - see the next two sections), and the - terminating closing square bracket. However, escaping other non- - alphanumeric characters does no harm. - - -POSIX CHARACTER CLASSES - - Perl supports the POSIX notation for character classes. This uses names - enclosed by [: and :] within the enclosing square brackets. PCRE also - supports this notation. For example, - - [01[:alpha:]%] - - matches "0", "1", any alphabetic character, or "%". 
The supported class - names are: - - alnum letters and digits - alpha letters - ascii character codes 0 - 127 - blank space or tab only - cntrl control characters - digit decimal digits (same as \d) - graph printing characters, excluding space - lower lower case letters - print printing characters, including space - punct printing characters, excluding letters and digits and space - space white space (the same as \s from PCRE 8.34) - upper upper case letters - word "word" characters (same as \w) - xdigit hexadecimal digits - - The default "space" characters are HT (9), LF (10), VT (11), FF (12), - CR (13), and space (32). If locale-specific matching is taking place, - the list of space characters may be different; there may be fewer or - more of them. "Space" used to be different to \s, which did not include - VT, for Perl compatibility. However, Perl changed at release 5.18, and - PCRE followed at release 8.34. "Space" and \s now match the same set - of characters. - - The name "word" is a Perl extension, and "blank" is a GNU extension - from Perl 5.8. Another Perl extension is negation, which is indicated - by a ^ character after the colon. For example, - - [12[:^digit:]] - - matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the - POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but - these are not supported, and an error is given if they are encountered. - - By default, characters with values greater than 128 do not match any of - the POSIX character classes. However, if the PCRE_UCP option is passed - to pcre_compile(), some of the classes are changed so that Unicode - character properties are used. 
This is achieved by replacing certain - POSIX classes by other sequences, as follows: - - [:alnum:] becomes \p{Xan} - [:alpha:] becomes \p{L} - [:blank:] becomes \h - [:digit:] becomes \p{Nd} - [:lower:] becomes \p{Ll} - [:space:] becomes \p{Xps} - [:upper:] becomes \p{Lu} - [:word:] becomes \p{Xwd} - - Negated versions, such as [:^alpha:] use \P instead of \p. Three other - POSIX classes are handled specially in UCP mode: - - [:graph:] This matches characters that have glyphs that mark the page - when printed. In Unicode property terms, it matches all char- - acters with the L, M, N, P, S, or Cf properties, except for: - - U+061C Arabic Letter Mark - U+180E Mongolian Vowel Separator - U+2066 - U+2069 Various "isolate"s - - - [:print:] This matches the same characters as [:graph:] plus space - characters that are not controls, that is, characters with - the Zs property. - - [:punct:] This matches all characters that have the Unicode P (punctua- - tion) property, plus those characters whose code points are - less than 128 that have the S (Symbol) property. - - The other POSIX classes are unchanged, and match only characters with - code points less than 128. - - -COMPATIBILITY FEATURE FOR WORD BOUNDARIES - - In the POSIX.2 compliant library that was included in 4.4BSD Unix, the - ugly syntax [[:<:]] and [[:>:]] is used for matching "start of word" - and "end of word". PCRE treats these items as follows: - - [[:<:]] is converted to \b(?=\w) - [[:>:]] is converted to \b(?<=\w) - - Only these exact character sequences are recognized. A sequence such as - [a[:<:]b] provokes error for an unrecognized POSIX class name. This - support is not compatible with Perl. It is provided to help migrations - from other environments, and is best not used in any new patterns. 
Note - that \b matches at the start and the end of a word (see "Simple asser- - tions" above), and in a Perl-style pattern the preceding or following - character normally shows which is wanted, without the need for the - assertions that are used above in order to give exactly the POSIX be- - haviour. - - -VERTICAL BAR - - Vertical bar characters are used to separate alternative patterns. For - example, the pattern - - gilbert|sullivan - - matches either "gilbert" or "sullivan". Any number of alternatives may - appear, and an empty alternative is permitted (matching the empty - string). The matching process tries each alternative in turn, from left - to right, and the first one that succeeds is used. If the alternatives - are within a subpattern (defined below), "succeeds" means matching the - rest of the main pattern as well as the alternative in the subpattern. - - -INTERNAL OPTION SETTING - - The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and - PCRE_EXTENDED options (which are Perl-compatible) can be changed from - within the pattern by a sequence of Perl option letters enclosed - between "(?" and ")". The option letters are - - i for PCRE_CASELESS - m for PCRE_MULTILINE - s for PCRE_DOTALL - x for PCRE_EXTENDED - - For example, (?im) sets caseless, multiline matching. It is also possi- - ble to unset these options by preceding the letter with a hyphen, and a - combined setting and unsetting such as (?im-sx), which sets PCRE_CASE- - LESS and PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, - is also permitted. If a letter appears both before and after the - hyphen, the option is unset. - - The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA - can be changed in the same way as the Perl-compatible options by using - the characters J, U and X respectively. 
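As a rough illustration of the option letters listed above, the sketch below uses Python's re module as a stand-in for PCRE. This is an assumption: re is a different engine, but it accepts the same i and m letters in a leading (?im) group. The sketch compares an inline setting with the equivalent options passed by the application. Unsetting with a hyphen is not shown, because Python restricts that form to scoped groups.

```python
import re

# Option letters written inside the pattern, at top level...
inline = re.compile(r"(?im)^hello$")
# ...and the same two options supplied by the calling application.
flagged = re.compile(r"^hello$", re.IGNORECASE | re.MULTILINE)

subject = "say\nHELLO\nbye"
inline_hit = inline.search(subject).group()
flagged_hit = flagged.search(subject).group()
print(inline_hit, flagged_hit)
```

Both compiled patterns find the middle line, which needs caseless and multiline matching together.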
- - When one of these option changes occurs at top level (that is, not - inside subpattern parentheses), the change applies to the remainder of - the pattern that follows. An option change within a subpattern (see - below for a description of subpatterns) affects only that part of the - subpattern that follows it, so - - (a(?i)b)c - - matches abc and aBc and no other strings (assuming PCRE_CASELESS is not - used). By this means, options can be made to have different settings - in different parts of the pattern. Any changes made in one alternative - do carry on into subsequent branches within the same subpattern. For - example, - - (a(?i)b|c) - - matches "ab", "aB", "c", and "C", even though when matching "C" the - first branch is abandoned before the option setting. This is because - the effects of option settings happen at compile time. There would be - some very weird behaviour otherwise. - - Note: There are other PCRE-specific options that can be set by the - application when the compiling or matching functions are called. In - some cases the pattern can contain special leading sequences such as - (*CRLF) to override what the application has set or what has been - defaulted. Details are given in the section entitled "Newline - sequences" above. There are also the (*UTF8), (*UTF16),(*UTF32), and - (*UCP) leading sequences that can be used to set UTF and Unicode prop- - erty modes; they are equivalent to setting the PCRE_UTF8, PCRE_UTF16, - PCRE_UTF32 and the PCRE_UCP options, respectively. The (*UTF) sequence - is a generic version that can be used with any of the libraries. How- - ever, the application can set the PCRE_NEVER_UTF option, which locks - out the use of the (*UTF) sequences. - - -SUBPATTERNS - - Subpatterns are delimited by parentheses (round brackets), which can be - nested. Turning part of a pattern into a subpattern does two things: - - 1. It localizes a set of alternatives. 
For example, the pattern - - cat(aract|erpillar|) - - matches "cataract", "caterpillar", or "cat". Without the parentheses, - it would match "cataract", "erpillar" or an empty string. - - 2. It sets up the subpattern as a capturing subpattern. This means - that, when the whole pattern matches, that portion of the subject - string that matched the subpattern is passed back to the caller via the - ovector argument of the matching function. (This applies only to the - traditional matching functions; the DFA matching functions do not sup- - port capturing.) - - Opening parentheses are counted from left to right (starting from 1) to - obtain numbers for the capturing subpatterns. For example, if the - string "the red king" is matched against the pattern - - the ((red|white) (king|queen)) - - the captured substrings are "red king", "red", and "king", and are num- - bered 1, 2, and 3, respectively. - - The fact that plain parentheses fulfil two functions is not always - helpful. There are often times when a grouping subpattern is required - without a capturing requirement. If an opening parenthesis is followed - by a question mark and a colon, the subpattern does not do any captur- - ing, and is not counted when computing the number of any subsequent - capturing subpatterns. For example, if the string "the white queen" is - matched against the pattern - - the ((?:red|white) (king|queen)) - - the captured substrings are "white queen" and "queen", and are numbered - 1 and 2. The maximum number of capturing subpatterns is 65535. - - As a convenient shorthand, if any option settings are required at the - start of a non-capturing subpattern, the option letters may appear - between the "?" and the ":". Thus the two patterns - - (?i:saturday|sunday) - (?:(?i)saturday|sunday) - - match exactly the same set of strings. 
Because alternative branches are - tried from left to right, and options are not reset until the end of - the subpattern is reached, an option setting in one branch does affect - subsequent branches, so the above patterns match "SUNDAY" as well as - "Saturday". - - -DUPLICATE SUBPATTERN NUMBERS - - Perl 5.10 introduced a feature whereby each alternative in a subpattern - uses the same numbers for its capturing parentheses. Such a subpattern - starts with (?| and is itself a non-capturing subpattern. For example, - consider this pattern: - - (?|(Sat)ur|(Sun))day - - Because the two alternatives are inside a (?| group, both sets of cap- - turing parentheses are numbered one. Thus, when the pattern matches, - you can look at captured substring number one, whichever alternative - matched. This construct is useful when you want to capture part, but - not all, of one of a number of alternatives. Inside a (?| group, paren- - theses are numbered as usual, but the number is reset at the start of - each branch. The numbers of any capturing parentheses that follow the - subpattern start after the highest number used in any branch. The fol- - lowing example is taken from the Perl documentation. The numbers under- - neath show in which buffer the captured content will be stored. - - # before ---------------branch-reset----------- after - / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x - # 1 2 2 3 2 3 4 - - A back reference to a numbered subpattern uses the most recent value - that is set for that number by any subpattern. The following pattern - matches "abcabc" or "defdef": - - /(?|(abc)|(def))\1/ - - In contrast, a subroutine call to a numbered subpattern always refers - to the first one in the pattern with the given number. 
The following - pattern matches "abcabc" or "defabc": - - /(?|(abc)|(def))(?1)/ - - If a condition test for a subpattern's having matched refers to a non- - unique number, the test is true if any of the subpatterns of that num- - ber have matched. - - An alternative approach to using this "branch reset" feature is to use - duplicate named subpatterns, as described in the next section. - - -NAMED SUBPATTERNS - - Identifying capturing parentheses by number is simple, but it can be - very hard to keep track of the numbers in complicated regular expres- - sions. Furthermore, if an expression is modified, the numbers may - change. To help with this difficulty, PCRE supports the naming of sub- - patterns. This feature was not added to Perl until release 5.10. Python - had the feature earlier, and PCRE introduced it at release 4.0, using - the Python syntax. PCRE now supports both the Perl and the Python syn- - tax. Perl allows identically numbered subpatterns to have different - names, but PCRE does not. - - In PCRE, a subpattern can be named in one of three ways: (?<name>...) - or (?'name'...) as in Perl, or (?P<name>...) as in Python. References - to capturing parentheses from other parts of the pattern, such as back - references, recursion, and conditions, can be made by name as well as - by number. - - Names consist of up to 32 alphanumeric characters and underscores, but - must start with a non-digit. Named capturing parentheses are still - allocated numbers as well as names, exactly as if the names were not - present. The PCRE API provides function calls for extracting the name- - to-number translation table from a compiled pattern. There is also a - convenience function for extracting a captured substring by name. - - By default, a name must be unique within a pattern, but it is possible - to relax this constraint by setting the PCRE_DUPNAMES option at compile - time.
(Duplicate names are also always permitted for subpatterns with - the same number, set up as described in the previous section.) Dupli- - cate names can be useful for patterns where only one instance of the - named parentheses can match. Suppose you want to match the name of a - weekday, either as a 3-letter abbreviation or as the full name, and in - both cases you want to extract the abbreviation. This pattern (ignoring - the line breaks) does the job: - - (?<DN>Mon|Fri|Sun)(?:day)?| - (?<DN>Tue)(?:sday)?| - (?<DN>Wed)(?:nesday)?| - (?<DN>Thu)(?:rsday)?| - (?<DN>Sat)(?:urday)? - - There are five capturing substrings, but only one is ever set after a - match. (An alternative way of solving this problem is to use a "branch - reset" subpattern, as described in the previous section.) - - The convenience function for extracting the data by name returns the - substring for the first (and in this example, the only) subpattern of - that name that matched. This saves searching to find which numbered - subpattern it was. - - If you make a back reference to a non-unique named subpattern from - elsewhere in the pattern, the subpatterns to which the name refers are - checked in the order in which they appear in the overall pattern. The - first one that is set is used for the reference. For example, this pat- - tern matches both "foofoo" and "barbar" but not "foobar" or "barfoo": - - (?:(?<n>foo)|(?<n>bar))\k<n> - - - If you make a subroutine call to a non-unique named subpattern, the one - that corresponds to the first occurrence of the name is used. In the - absence of duplicate numbers (see the previous section) this is the one - with the lowest number. - - If you use a named reference in a condition test (see the section about - conditions below), either to check whether a subpattern has matched, or - to check for recursion, all subpatterns with the same name are tested. - If the condition is true for any one of them, the overall condition is - true. This is the same behaviour as testing by number.
For further - details of the interfaces for handling named subpatterns, see the - pcreapi documentation. - - Warning: You cannot use different names to distinguish between two sub- - patterns with the same number because PCRE uses only the numbers when - matching. For this reason, an error is given at compile time if differ- - ent names are given to subpatterns with the same number. However, you - can always give the same name to subpatterns with the same number, even - when PCRE_DUPNAMES is not set. - - -REPETITION - - Repetition is specified by quantifiers, which can follow any of the - following items: - - a literal data character - the dot metacharacter - the \C escape sequence - the \X escape sequence - the \R escape sequence - an escape such as \d or \pL that matches a single character - a character class - a back reference (see next section) - a parenthesized subpattern (including assertions) - a subroutine call to a subpattern (recursive or otherwise) - - The general repetition quantifier specifies a minimum and maximum num- - ber of permitted matches, by giving the two numbers in curly brackets - (braces), separated by a comma. The numbers must be less than 65536, - and the first must be less than or equal to the second. For example: - - z{2,4} - - matches "zz", "zzz", or "zzzz". A closing brace on its own is not a - special character. If the second number is omitted, but the comma is - present, there is no upper limit; if the second number and the comma - are both omitted, the quantifier specifies an exact number of required - matches. Thus - - [aeiou]{3,} - - matches at least 3 successive vowels, but may match many more, while - - \d{8} - - matches exactly 8 digits. An opening curly bracket that appears in a - position where a quantifier is not allowed, or one that does not match - the syntax of a quantifier, is taken as a literal character. For exam- - ple, {,6} is not a quantifier, but a literal string of four characters. 
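The three quantifier forms above can be checked with a short sketch. It uses Python's re module rather than PCRE itself (an assumption made so the example is self-contained); the brace quantifiers shown behave the same way in both. One difference worth noting: Python treats {,6} as a quantifier with a minimum of zero, so the literal-string case described above is not reproduced here.

```python
import re

# {2,4}: between two and four repetitions of the preceding item.
candidates = ("z", "zz", "zzz", "zzzz", "zzzzz")
two_to_four = [s for s in candidates if re.fullmatch(r"z{2,4}", s)]

# {3,}: at least three repetitions, with no upper limit.
vowel_run = re.search(r"[aeiou]{3,}", "queueing").group()

# {8}: exactly eight repetitions.
eight_digits = re.fullmatch(r"\d{8}", "20240131") is not None
seven_digits = re.fullmatch(r"\d{8}", "2024013") is not None

print(two_to_four, vowel_run, eight_digits, seven_digits)
```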
- - In UTF modes, quantifiers apply to characters rather than to individual - data units. Thus, for example, \x{100}{2} matches two characters, each - of which is represented by a two-byte sequence in a UTF-8 string. Simi- - larly, \X{3} matches three Unicode extended grapheme clusters, each of - which may be several data units long (and they may be of different - lengths). - - The quantifier {0} is permitted, causing the expression to behave as if - the previous item and the quantifier were not present. This may be use- - ful for subpatterns that are referenced as subroutines from elsewhere - in the pattern (but see also the section entitled "Defining subpatterns - for use by reference only" below). Items other than subpatterns that - have a {0} quantifier are omitted from the compiled pattern. - - For convenience, the three most common quantifiers have single-charac- - ter abbreviations: - - * is equivalent to {0,} - + is equivalent to {1,} - ? is equivalent to {0,1} - - It is possible to construct infinite loops by following a subpattern - that can match no characters with a quantifier that has no upper limit, - for example: - - (a?)* - - Earlier versions of Perl and PCRE used to give an error at compile time - for such patterns. However, because there are cases where this can be - useful, such patterns are now accepted, but if any repetition of the - subpattern does in fact match no characters, the loop is forcibly bro- - ken. - - By default, the quantifiers are "greedy", that is, they match as much - as possible (up to the maximum number of permitted times), without - causing the rest of the pattern to fail. The classic example of where - this gives problems is in trying to match comments in C programs. These - appear between /* and */ and within the comment, individual * and / - characters may appear. 
An attempt to match C comments by applying the - pattern - - /\*.*\*/ - - to the string - - /* first comment */ not comment /* second comment */ - - fails, because it matches the entire string owing to the greediness of - the .* item. - - However, if a quantifier is followed by a question mark, it ceases to - be greedy, and instead matches the minimum number of times possible, so - the pattern - - /\*.*?\*/ - - does the right thing with the C comments. The meaning of the various - quantifiers is not otherwise changed, just the preferred number of - matches. Do not confuse this use of question mark with its use as a - quantifier in its own right. Because it has two uses, it can sometimes - appear doubled, as in - - \d??\d - - which matches one digit by preference, but can match two if that is the - only way the rest of the pattern matches. - - If the PCRE_UNGREEDY option is set (an option that is not available in - Perl), the quantifiers are not greedy by default, but individual ones - can be made greedy by following them with a question mark. In other - words, it inverts the default behaviour. - - When a parenthesized subpattern is quantified with a minimum repeat - count that is greater than 1 or with a limited maximum, more memory is - required for the compiled pattern, in proportion to the size of the - minimum or maximum. - - If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equiv- - alent to Perl's /s) is set, thus allowing the dot to match newlines, - the pattern is implicitly anchored, because whatever follows will be - tried against every character position in the subject string, so there - is no point in retrying the overall match at any position after the - first. PCRE normally treats such a pattern as though it were preceded - by \A. 
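The difference between greedy and minimizing quantifiers described above can be seen directly on the C comment example. The sketch uses Python's re module as a stand-in for PCRE (an assumption; the greedy and lazy semantics of .* and .*? are the same in both engines).

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# Greedy .* runs to the last */ in the subject, swallowing everything.
greedy = re.search(r"/\*.*\*/", subject).group()

# Minimizing .*? stops at the first */ it can, so each comment is separate.
lazy = re.findall(r"/\*.*?\*/", subject)

# A doubled question mark: an optional digit that prefers to match nothing.
one_digit = re.match(r"\d??\d", "12").group()

print(greedy == subject, lazy, one_digit)
```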
- - In cases where it is known that the subject string contains no new- - lines, it is worth setting PCRE_DOTALL in order to obtain this opti- - mization, or alternatively using ^ to indicate anchoring explicitly. - - However, there are some cases where the optimization cannot be used. - When .* is inside capturing parentheses that are the subject of a back - reference elsewhere in the pattern, a match at the start may fail where - a later one succeeds. Consider, for example: - - (.*)abc\1 - - If the subject is "xyz123abc123" the match point is the fourth charac- - ter. For this reason, such a pattern is not implicitly anchored. - - Another case where implicit anchoring is not applied is when the lead- - ing .* is inside an atomic group. Once again, a match at the start may - fail where a later one succeeds. Consider this pattern: - - (?>.*?a)b - - It matches "ab" in the subject "aab". The use of the backtracking con- - trol verbs (*PRUNE) and (*SKIP) also disable this optimization. - - When a capturing subpattern is repeated, the value captured is the sub- - string that matched the final iteration. For example, after - - (tweedle[dume]{3}\s*)+ - - has matched "tweedledum tweedledee" the value of the captured substring - is "tweedledee". However, if there are nested capturing subpatterns, - the corresponding captured values may have been set in previous itera- - tions. For example, after - - /(a|(b))+/ - - matches "aba" the value of the second captured substring is "b". - - -ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS - - With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") - repetition, failure of what follows normally causes the repeated item - to be re-evaluated to see if a different number of repeats allows the - rest of the pattern to match. 
Sometimes it is useful to prevent this, - either to change the nature of the match, or to cause it to fail earlier - than it otherwise might, when the author of the pattern knows there is - no point in carrying on. - - Consider, for example, the pattern \d+foo when applied to the subject - line - - 123456bar - - After matching all 6 digits and then failing to match "foo", the normal - action of the matcher is to try again with only 5 digits matching the - \d+ item, and then with 4, and so on, before ultimately failing. - "Atomic grouping" (a term taken from Jeffrey Friedl's book) provides - the means for specifying that once a subpattern has matched, it is not - to be re-evaluated in this way. - - If we use atomic grouping for the previous example, the matcher gives - up immediately on failing to match "foo" the first time. The notation - is a kind of special parenthesis, starting with (?> as in this example: - - (?>\d+)foo - - This kind of parenthesis "locks up" the part of the pattern it con- - tains once it has matched, and a failure further into the pattern is - prevented from backtracking into it. Backtracking past it to previous - items, however, works as normal. - - An alternative description is that a subpattern of this type matches - the string of characters that an identical standalone pattern would - match, if anchored at the current point in the subject string. - - Atomic grouping subpatterns are not capturing subpatterns. Simple cases - such as the above example can be thought of as a maximizing repeat that - must swallow everything it can. So, while both \d+ and \d+? are pre- - pared to adjust the number of digits they match in order to make the - rest of the pattern match, (?>\d+) can only match an entire sequence of - digits. - - Atomic groups in general can of course contain arbitrarily complicated - subpatterns, and can be nested.
However, when the subpattern for an - atomic group is just a single repeated item, as in the example above, a - simpler notation, called a "possessive quantifier" can be used. This - consists of an additional + character following a quantifier. Using - this notation, the previous example can be rewritten as - - \d++foo - - Note that a possessive quantifier can be used with an entire group, for - example: - - (abc|xyz){2,3}+ - - Possessive quantifiers are always greedy; the setting of the - PCRE_UNGREEDY option is ignored. They are a convenient notation for the - simpler forms of atomic group. However, there is no difference in the - meaning of a possessive quantifier and the equivalent atomic group, - though there may be a performance difference; possessive quantifiers - should be slightly faster. - - The possessive quantifier syntax is an extension to the Perl 5.8 syn- - tax. Jeffrey Friedl originated the idea (and the name) in the first - edition of his book. Mike McCloskey liked it, so implemented it when he - built Sun's Java package, and PCRE copied it from there. It ultimately - found its way into Perl at release 5.10. - - PCRE has an optimization that automatically "possessifies" certain sim- - ple pattern constructs. For example, the sequence A+B is treated as - A++B because there is no point in backtracking into a sequence of A's - when B must follow. - - When a pattern contains an unlimited repeat inside a subpattern that - can itself be repeated an unlimited number of times, the use of an - atomic group is the only way to avoid some failing matches taking a - very long time indeed. The pattern - - (\D+|<\d+>)*[!?] - - matches an unlimited number of substrings that either consist of non- - digits, or digits enclosed in <>, followed by either ! or ?. When it - matches, it runs quickly. However, if it is applied to - - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - - it takes a long time before reporting failure. 
This is because the - string can be divided between the internal \D+ repeat and the external - * repeat in a large number of ways, and all have to be tried. (The - example uses [!?] rather than a single character at the end, because - both PCRE and Perl have an optimization that allows for fast failure - when a single character is used. They remember the last single charac- - ter that is required for a match, and fail early if it is not present - in the string.) If the pattern is changed so that it uses an atomic - group, like this: - - ((?>\D+)|<\d+>)*[!?] - - sequences of non-digits cannot be broken, and failure happens quickly. - - -BACK REFERENCES - - Outside a character class, a backslash followed by a digit greater than - 0 (and possibly further digits) is a back reference to a capturing sub- - pattern earlier (that is, to its left) in the pattern, provided there - have been that many previous capturing left parentheses. - - However, if the decimal number following the backslash is less than 10, - it is always taken as a back reference, and causes an error only if - there are not that many capturing left parentheses in the entire pat- - tern. In other words, the parentheses that are referenced need not be - to the left of the reference for numbers less than 10. A "forward back - reference" of this type can make sense when a repetition is involved - and the subpattern to the right has participated in an earlier itera- - tion. - - It is not possible to have a numerical "forward back reference" to a - subpattern whose number is 10 or more using this syntax because a - sequence such as \50 is interpreted as a character defined in octal. - See the subsection entitled "Non-printing characters" above for further - details of the handling of digits following a backslash. There is no - such problem when named parentheses are used. A back reference to any - subpattern is possible using named parentheses (see below). 
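A minimal sketch of an ordinary numbered back reference, again using Python's re module as a stand-in for PCRE (an assumption). Only the common case is shown, where the referenced group is to the left of the reference; "forward back references" and the octal interpretation discussed above are PCRE behaviours that Python's re does not share.

```python
import re

# \1 must match exactly the text that group 1 captured, not just any word.
doubled = re.compile(r"\b(\w+) \1\b")

hit = doubled.search("this is is a test")
miss = doubled.search("this is a test")

print(hit.group(), miss)
```

The first subject contains the repeated word "is", so the back reference succeeds there and nowhere in the second subject.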
- - Another way of avoiding the ambiguity inherent in the use of digits - following a backslash is to use the \g escape sequence. This escape - must be followed by an unsigned number or a negative number, optionally - enclosed in braces. These examples are all identical: - - (ring), \1 - (ring), \g1 - (ring), \g{1} - - An unsigned number specifies an absolute reference without the ambigu- - ity that is present in the older syntax. It is also useful when literal - digits follow the reference. A negative number is a relative reference. - Consider this example: - - (abc(def)ghi)\g{-1} - - The sequence \g{-1} is a reference to the most recently started captur- - ing subpattern before \g, that is, it is equivalent to \2 in this exam- - ple. Similarly, \g{-2} would be equivalent to \1. The use of relative - references can be helpful in long patterns, and also in patterns that - are created by joining together fragments that contain references - within themselves. - - A back reference matches whatever actually matched the capturing sub- - pattern in the current subject string, rather than anything matching - the subpattern itself (see "Subpatterns as subroutines" below for a way - of doing that). So the pattern - - (sens|respons)e and \1ibility - - matches "sense and sensibility" and "response and responsibility", but - not "sense and responsibility". If caseful matching is in force at the - time of the back reference, the case of letters is relevant. For exam- - ple, - - ((?i)rah)\s+\1 - - matches "rah rah" and "RAH RAH", but not "RAH rah", even though the - original capturing subpattern is matched caselessly. - - There are several different ways of writing back references to named - subpatterns. The .NET syntax \k{name} and the Perl syntax \k<name> or - \k'name' are supported, as is the Python syntax (?P=name). Perl 5.10's - unified back reference syntax, in which \g can be used for both numeric - and named references, is also supported.
We could rewrite the above - example in any of the following ways: - - (?<p1>(?i)rah)\s+\k<p1> - (?'p1'(?i)rah)\s+\k{p1} - (?P<p1>(?i)rah)\s+(?P=p1) - (?<p1>(?i)rah)\s+\g{p1} - - A subpattern that is referenced by name may appear in the pattern - before or after the reference. - - There may be more than one back reference to the same subpattern. If a - subpattern has not actually been used in a particular match, any back - references to it always fail by default. For example, the pattern - - (a|(bc))\2 - - always fails if it starts to match "a" rather than "bc". However, if - the PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back refer- - ence to an unset value matches an empty string. - - Because there may be many capturing parentheses in a pattern, all dig- - its following a backslash are taken as part of a potential back refer- - ence number. If the pattern continues with a digit character, some - delimiter must be used to terminate the back reference. If the - PCRE_EXTENDED option is set, this can be white space. Otherwise, the - \g{ syntax or an empty comment (see "Comments" below) can be used. - - Recursive back references - - A back reference that occurs inside the parentheses to which it refers - fails when the subpattern is first used, so, for example, (a\1) never - matches. However, such references can be useful inside repeated sub- - patterns. For example, the pattern - - (a|b\1)+ - - matches any number of "a"s and also "aba", "ababbaa" etc. At each iter- - ation of the subpattern, the back reference matches the character - string corresponding to the previous iteration. In order for this to - work, the pattern must be such that the first iteration does not need - to match the back reference. This can be done using alternation, as in - the example above, or by a quantifier with a minimum of zero. - - Back references of this type cause the group that they reference to be - treated as an atomic group.
Once the whole group has been matched, a - subsequent matching failure cannot cause backtracking into the middle - of the group. - - -ASSERTIONS - - An assertion is a test on the characters following or preceding the - current matching point that does not actually consume any characters. - The simple assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are - described above. - - More complicated assertions are coded as subpatterns. There are two - kinds: those that look ahead of the current position in the subject - string, and those that look behind it. An assertion subpattern is - matched in the normal way, except that it does not cause the current - matching position to be changed. - - Assertion subpatterns are not capturing subpatterns. If such an asser- - tion contains capturing subpatterns within it, these are counted for - the purposes of numbering the capturing subpatterns in the whole pat- - tern. However, substring capturing is carried out only for positive - assertions. (Perl sometimes, but not always, does do capturing in nega- - tive assertions.) - - WARNING: If a positive assertion containing one or more capturing sub- - patterns succeeds, but failure to match later in the pattern causes - backtracking over this assertion, the captures within the assertion are - reset only if no higher numbered captures are already set. This is, - unfortunately, a fundamental limitation of the current implementation, - and as PCRE1 is now in maintenance-only status, it is unlikely ever to - change. - - For compatibility with Perl, assertion subpatterns may be repeated; - though it makes no sense to assert the same thing several times, the - side effect of capturing parentheses may occasionally be useful. In - practice, there are only three cases: - - (1) If the quantifier is {0}, the assertion is never obeyed during - matching. However, it may contain internal capturing parenthesized - groups that are called from elsewhere via the subroutine mechanism.
- - (2) If the quantifier is {0,n} where n is greater than zero, it is treated - as if it were {0,1}. At run time, the rest of the pattern match is - tried with and without the assertion, the order depending on the greed- - iness of the quantifier. - - (3) If the minimum repetition is greater than zero, the quantifier is - ignored. The assertion is obeyed just once when encountered during - matching. - - Lookahead assertions - - Lookahead assertions start with (?= for positive assertions and (?! for - negative assertions. For example, - - \w+(?=;) - - matches a word followed by a semicolon, but does not include the semi- - colon in the match, and - - foo(?!bar) - - matches any occurrence of "foo" that is not followed by "bar". Note - that the apparently similar pattern - - (?!foo)bar - - does not find an occurrence of "bar" that is preceded by something - other than "foo"; it finds any occurrence of "bar" whatsoever, because - the assertion (?!foo) is always true when the next three characters are - "bar". A lookbehind assertion is needed to achieve the other effect. - - If you want to force a matching failure at some point in a pattern, the - most convenient way to do it is with (?!) because an empty string - always matches, so an assertion that requires there not to be an empty - string must always fail. The backtracking control verb (*FAIL) or (*F) - is a synonym for (?!). - - Lookbehind assertions - - Lookbehind assertions start with (?<= for positive assertions and (?<! - for negative assertions. [...] - - Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a - used subpattern by name. For compatibility with earlier versions of - PCRE, which had this facility before Perl, the syntax (?(name)...) is - also recognized. - - Rewriting the above example to use a named subpattern gives this: - - (?<OPEN> \( )? [^()]+ (?(<OPEN>) \) ) - - If the name used in a condition of this kind is a duplicate, the test - is applied to all subpatterns of the same name, and is true if any one - of them has matched.
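A condition that tests a named subpattern can be tried with the older (?(name)...) form mentioned above, which Python's re module also accepts. The sketch is a stand-in under that assumption, not PCRE itself; the group name OPEN is chosen here for illustration, and Python's (?P<name>...) naming syntax is used. The closing parenthesis is required only when the optional opening one matched.

```python
import re

# If group OPEN took part in the match, a closing parenthesis must follow.
balanced = re.compile(r"(?P<OPEN>\()?[^()]+(?(OPEN)\))")

subjects = ("abc", "(abc)", "(abc", "abc)")
results = {s: balanced.fullmatch(s) is not None for s in subjects}
print(results)
```

Using fullmatch makes the unbalanced cases fail outright instead of matching a shorter substring.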
- - Checking for pattern recursion - - If the condition is the string (R), and there is no subpattern with the - name R, the condition is true if a recursive call to the whole pattern - or any subpattern has been made. If digits or a name preceded by amper- - sand follow the letter R, for example: - - (?(R3)...) or (?(R&name)...) - - the condition is true if the most recent recursion is into a subpattern - whose number or name is given. This condition does not check the entire - recursion stack. If the name used in a condition of this kind is a - duplicate, the test is applied to all subpatterns of the same name, and - is true if any one of them is the most recent recursion. - - At "top level", all these recursion test conditions are false. The - syntax for recursive patterns is described below. - - Defining subpatterns for use by reference only - - If the condition is the string (DEFINE), and there is no subpattern - with the name DEFINE, the condition is always false. In this case, - there may be only one alternative in the subpattern. It is always - skipped if control reaches this point in the pattern; the idea of - DEFINE is that it can be used to define subroutines that can be refer- - enced from elsewhere. (The use of subroutines is described below.) For - example, a pattern to match an IPv4 address such as "192.168.23.245" - could be written like this (ignore white space and line breaks): - - (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) - \b (?&byte) (\.(?&byte)){3} \b - - The first part of the pattern is a DEFINE group inside which another - group named "byte" is defined. This matches an individual component of - an IPv4 address (a number less than 256). When matching takes place, - this part of the pattern is skipped because DEFINE acts like a false - condition. The rest of the pattern uses references to the named group - to match the four dot-separated components of an IPv4 address, insist- - ing on a word boundary at each end.
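The idea of defining the "byte" fragment once and reusing it four times can be approximated in engines that lack DEFINE. The sketch below assumes Python's re module, which has neither (?(DEFINE)...) nor (?&byte), so the fragment is kept in a string and substituted where PCRE would make a subroutine call. The names BYTE, IPV4 and find_ipv4 are hypothetical.

```python
import re

# Stand-in for the DEFINE group: one component of an IPv4 address (0 to 255).
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# Stand-in for \b (?&byte) (\.(?&byte)){3} \b, built by string substitution.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")


def find_ipv4(text):
    """Return the first IPv4-looking substring of text, or None."""
    m = IPV4.search(text)
    return m.group(0) if m else None
```

Substitution copies the pattern text at compile time, whereas a PCRE subroutine call reuses the compiled group; for this example the matching result is the same.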
- - Assertion conditions - - If the condition is not in any of the above formats, it must be an - assertion. This may be a positive or negative lookahead or lookbehind - assertion. Consider this pattern, again containing non-significant - white space, and with the two alternatives on the second line: - - (?(?=[^a-z]*[a-z]) - \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) - - The condition is a positive lookahead assertion that matches an - optional sequence of non-letters followed by a letter. In other words, - it tests for the presence of at least one letter in the subject. If a - letter is found, the subject is matched against the first alternative; - otherwise it is matched against the second. This pattern matches - strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are - letters and dd are digits. - - -COMMENTS - - There are two ways of including comments in patterns that are processed - by PCRE. In both cases, the start of the comment must not be in a char- - acter class, nor in the middle of any other sequence of related charac- - ters such as (?: or a subpattern name or number. The characters that - make up a comment play no part in the pattern matching. - - The sequence (?# marks the start of a comment that continues up to the - next closing parenthesis. Nested parentheses are not permitted. If the - PCRE_EXTENDED option is set, an unescaped # character also introduces a - comment, which in this case continues to immediately after the next - newline character or character sequence in the pattern. Which charac- - ters are interpreted as newlines is controlled by the options passed to - a compiling function or by a special sequence at the start of the pat- - tern, as described in the section entitled "Newline conventions" above. - Note that the end of this type of comment is a literal newline sequence - in the pattern; escape sequences that happen to represent a newline do - not count. 
For example, consider this pattern when PCRE_EXTENDED is - set, and the default newline convention is in force: - - abc #comment \n still comment - - On encountering the # character, pcre_compile() skips along, looking - for a newline in the pattern. The sequence \n is still literal at this - stage, so it does not terminate the comment. Only an actual character - with the code value 0x0a (the default newline) does so. - - -RECURSIVE PATTERNS - - Consider the problem of matching a string in parentheses, allowing for - unlimited nested parentheses. Without the use of recursion, the best - that can be done is to use a pattern that matches up to some fixed - depth of nesting. It is not possible to handle an arbitrary nesting - depth. - - For some time, Perl has provided a facility that allows regular expres- - sions to recurse (amongst other things). It does this by interpolating - Perl code in the expression at run time, and the code can refer to the - expression itself. A Perl pattern using code interpolation to solve the - parentheses problem can be created like this: - - $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x; - - The (?p{...}) item interpolates Perl code at run time, and in this case - refers recursively to the pattern in which it appears. - - Obviously, PCRE cannot support the interpolation of Perl code. Instead, - it supports special syntax for recursion of the entire pattern, and - also for individual subpattern recursion. After its introduction in - PCRE and Python, this kind of recursion was subsequently introduced - into Perl at release 5.10. - - A special item that consists of (? followed by a number greater than - zero and a closing parenthesis is a recursive subroutine call of the - subpattern of the given number, provided that it occurs inside that - subpattern. (If not, it is a non-recursive subroutine call, which is - described in the next section.) The special item (?R) or (?0) is a - recursive call of the entire regular expression. 
- - This PCRE pattern solves the nested parentheses problem (assume the - PCRE_EXTENDED option is set so that white space is ignored): - - \( ( [^()]++ | (?R) )* \) - - First it matches an opening parenthesis. Then it matches any number of - substrings which can either be a sequence of non-parentheses, or a - recursive match of the pattern itself (that is, a correctly parenthe- - sized substring). Finally there is a closing parenthesis. Note the use - of a possessive quantifier to avoid backtracking into sequences of non- - parentheses. - - If this were part of a larger pattern, you would not want to recurse - the entire pattern, so instead you could use this: - - ( \( ( [^()]++ | (?1) )* \) ) - - We have put the pattern into parentheses, and caused the recursion to - refer to them instead of the whole pattern. - - In a larger pattern, keeping track of parenthesis numbers can be - tricky. This is made easier by the use of relative references. Instead - of (?1) in the pattern above you can write (?-2) to refer to the second - most recently opened parentheses preceding the recursion. In other - words, a negative number counts capturing parentheses leftwards from - the point at which it is encountered. - - It is also possible to refer to subsequently opened parentheses, by - writing references such as (?+2). However, these cannot be recursive - because the reference is not inside the parentheses that are refer- - enced. They are always non-recursive subroutine calls, as described in - the next section. - - An alternative approach is to use named parentheses instead. The Perl - syntax for this is (?&name); PCRE's earlier syntax (?P>name) is also - supported. We could rewrite the above example as follows: - - (?<pn> \( ( [^()]++ | (?&pn) )* \) ) - - If there is more than one subpattern with the same name, the earliest - one is used.
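What the recursive pattern does can be spelled out by hand. The sketch below is plain Python with no regular expression engine involved; it is an illustration of the behaviour of \( ( [^()]++ | (?R) )* \) rather than PCRE's implementation, and the function name match_parens is hypothetical.

```python
def match_parens(s, pos=0):
    # Hand-written mirror of the recursive pattern for nested parentheses.
    # Returns the index just past the matching ")" when a balanced group
    # starts at s[pos]; returns None when there is no match at that position.
    if pos >= len(s) or s[pos] != "(":
        return None
    i = pos + 1
    while i < len(s):
        if s[i] == "(":
            # The (?R) branch: match a nested group recursively.
            end = match_parens(s, i)
            if end is None:
                return None
            i = end
        elif s[i] == ")":
            # The final closing parenthesis of this level.
            return i + 1
        else:
            # The [^()]++ branch: consume a non-parenthesis character.
            i += 1
    return None  # subject ended before the group was closed
```

Because a run of non-parenthesis characters is never given back once consumed, the function fails on an unbalanced subject after a single pass, which is the effect the possessive quantifier has in the pattern.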
- - This particular example pattern that we have been looking at contains - nested unlimited repeats, and so the use of a possessive quantifier for - matching strings of non-parentheses is important when applying the pat- - tern to strings that do not match. For example, when this pattern is - applied to - - (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() - - it yields "no match" quickly. However, if a possessive quantifier is - not used, the match runs for a very long time indeed because there are - so many different ways the + and * repeats can carve up the subject, - and all have to be tested before failure can be reported. - - At the end of a match, the values of capturing parentheses are those - from the outermost level. If you want to obtain intermediate values, a - callout function can be used (see below and the pcrecallout documenta- - tion). If the pattern above is matched against - - (ab(cd)ef) - - the value for the inner capturing parentheses (numbered 2) is "ef", - which is the last value taken on at the top level. If a capturing sub- - pattern is not matched at the top level, its final captured value is - unset, even if it was (temporarily) set at a deeper level during the - matching process. - - If there are more than 15 capturing parentheses in a pattern, PCRE has - to obtain extra memory to store data during a recursion, which it does - by using pcre_malloc, freeing it via pcre_free afterwards. If no memory - can be obtained, the match fails with the PCRE_ERROR_NOMEMORY error. - - Do not confuse the (?R) item with the condition (R), which tests for - recursion. Consider this pattern, which matches text in angle brack- - ets, allowing for arbitrary nesting. Only digits are allowed in nested - brackets (that is, when recursing), whereas any characters are permit- - ted at the outer level. 
- - < (?: (?(R) \d++ | [^<>]*+) | (?R)) * > - - In this pattern, (?(R) is the start of a conditional subpattern, with - two different alternatives for the recursive and non-recursive cases. - The (?R) item is the actual recursive call. - - Differences in recursion processing between PCRE and Perl - - Recursion processing in PCRE differs from Perl in two important ways. - In PCRE (like Python, but unlike Perl), a recursive subpattern call is - always treated as an atomic group. That is, once it has matched some of - the subject string, it is never re-entered, even if it contains untried - alternatives and there is a subsequent matching failure. This can be - illustrated by the following pattern, which purports to match a palin- - dromic string that contains an odd number of characters (for example, - "a", "aba", "abcba", "abcdcba"): - - ^(.|(.)(?1)\2)$ - - The idea is that it either matches a single character, or two identical - characters surrounding a sub-palindrome. In Perl, this pattern works; - in PCRE it does not if the subject string is longer than three characters. - Consider the subject string "abcba": - - At the top level, the first character is matched, but as it is not at - the end of the string, the first alternative fails; the second alterna- - tive is taken and the recursion kicks in. The recursive call to subpat- - tern 1 successfully matches the next character ("b"). (Note that the - beginning and end of line tests are not part of the recursion). - - Back at the top level, the next character ("c") is compared with what - subpattern 2 matched, which was "a". This fails. Because the recursion - is treated as an atomic group, there are now no backtracking points, - and so the entire match fails. (Perl is able, at this point, to re- - enter the recursion and try the second alternative.)
However, if the - pattern is written with the alternatives in the other order, things are - different: - - ^((.)(?1)\2|.)$ - - This time, the recursing alternative is tried first, and continues to - recurse until it runs out of characters, at which point the recursion - fails. But this time we do have another alternative to try at the - higher level. That is the big difference: in the previous case the - remaining alternative is at a deeper recursion level, which PCRE cannot - use. - - To change the pattern so that it matches all palindromic strings, not - just those with an odd number of characters, it is tempting to change - the pattern to this: - - ^((.)(?1)\2|.?)$ - - Again, this works in Perl, but not in PCRE, and for the same reason. - When a deeper recursion has matched a single character, it cannot be - entered again in order to match an empty string. The solution is to - separate the two cases, and write out the odd and even cases as alter- - natives at the higher level: - - ^(?:((.)(?1)\2|)|((.)(?3)\4|.)) - - If you want to match typical palindromic phrases, the pattern has to - ignore all non-word characters, which can be done like this: - - ^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$ - - If run with the PCRE_CASELESS option, this pattern matches phrases such - as "A man, a plan, a canal: Panama!" and it works well in both PCRE and - Perl. Note the use of the possessive quantifier *+ to avoid backtrack- - ing into sequences of non-word characters. Without this, PCRE takes a - great deal longer (ten times or more) to match typical phrases, and - Perl takes so long that you think it has gone into a loop. - - WARNING: The palindrome-matching patterns above work only if the sub- - ject string does not start with a palindrome that is shorter than the - entire string. 
For example, although "abcba" is correctly matched, if - the subject is "ababa", PCRE finds the palindrome "aba" at the start, - then fails at top level because the end of the string does not follow. - Once again, it cannot jump back into the recursion to try other alter- - natives, so the entire match fails. - - The second way in which PCRE and Perl differ in their recursion pro- - cessing is in the handling of captured values. In Perl, when a subpat- - tern is called recursively or as a subroutine (see the next section), - it has no access to any values that were captured outside the recur- - sion, whereas in PCRE these values can be referenced. Consider this - pattern: - - ^(.)(\1|a(?2)) - - In PCRE, this pattern matches "bab". The first capturing parentheses - match "b", then in the second group, when the back reference \1 fails - to match "b", the second alternative matches "a" and then recurses. In - the recursion, \1 does now match "b" and so the whole match succeeds. - In Perl, the pattern fails to match because inside the recursive call - \1 cannot access the externally set value. - - -SUBPATTERNS AS SUBROUTINES - - If the syntax for a recursive subpattern call (either by number or by - name) is used outside the parentheses to which it refers, it operates - like a subroutine in a programming language. The called subpattern may - be defined before or after the reference. A numbered reference can be - absolute or relative, as in these examples: - - (...(absolute)...)...(?2)... - (...(relative)...)...(?-1)... - (...(?+1)...(relative)... - - An earlier example pointed out that the pattern - - (sens|respons)e and \1ibility - - matches "sense and sensibility" and "response and responsibility", but - not "sense and responsibility". If instead the pattern - - (sens|respons)e and (?1)ibility - - is used, it does match "sense and responsibility" as well as the other - two strings. Another example is given in the discussion of DEFINE - above.
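The difference between a back reference and a subroutine call can be seen with the two patterns above. The sketch below assumes Python's re module, which supports \1 but has no (?1); the subroutine call is approximated by repeating the source text of group 1. The names GROUP, BACKREF and SUBROUTINE are hypothetical.

```python
import re

GROUP = r"sens|respons"

# A back reference (\1) must repeat the text that group 1 actually matched.
BACKREF = re.compile(rf"({GROUP})e and \1ibility")

# Approximation of (?1): the subpattern itself is applied again, so either
# alternative may match the second time, whatever group 1 captured.
SUBROUTINE = re.compile(rf"({GROUP})e and (?:{GROUP})ibility")
```

With these definitions, BACKREF rejects "sense and responsibility" while SUBROUTINE accepts it, and both accept "sense and sensibility".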
- - All subroutine calls, whether recursive or not, are always treated as - atomic groups. That is, once a subroutine has matched some of the sub- - ject string, it is never re-entered, even if it contains untried alter- - natives and there is a subsequent matching failure. Any capturing - parentheses that are set during the subroutine call revert to their - previous values afterwards. - - Processing options such as case-independence are fixed when a subpat- - tern is defined, so if it is used as a subroutine, such options cannot - be changed for different calls. For example, consider this pattern: - - (abc)(?i:(?-1)) - - It matches "abcabc". It does not match "abcABC" because the change of - processing option does not affect the called subpattern. - - -ONIGURUMA SUBROUTINE SYNTAX - - For compatibility with Oniguruma, the non-Perl syntax \g followed by a - name or a number enclosed either in angle brackets or single quotes, is - an alternative syntax for referencing a subpattern as a subroutine, - possibly recursively. Here are two of the examples used above, rewrit- - ten using this syntax: - - (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) ) - (sens|respons)e and \g'1'ibility - - PCRE supports an extension to Oniguruma: if a number is preceded by a - plus or a minus sign it is taken as a relative reference. For example: - - (abc)(?i:\g<-1>) - - Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not - synonymous. The former is a back reference; the latter is a subroutine - call. - - -CALLOUTS - - Perl has a feature whereby using the sequence (?{...}) causes arbitrary - Perl code to be obeyed in the middle of matching a regular expression. - This makes it possible, amongst other things, to extract different sub- - strings that match the same pair of parentheses when there is a repeti- - tion. - - PCRE provides a similar feature, but of course it cannot obey arbitrary - Perl code. The feature is called "callout".
The caller of PCRE provides - an external function by putting its entry point in the global variable - pcre_callout (8-bit library) or pcre[16|32]_callout (16-bit or 32-bit - library). By default, this variable contains NULL, which disables all - calling out. - - Within a regular expression, (?C) indicates the points at which the - external function is to be called. If you want to identify different - callout points, you can put a number less than 256 after the letter C. - The default value is zero. For example, this pattern has two callout - points: - - (?C1)abc(?C2)def - - If the PCRE_AUTO_CALLOUT flag is passed to a compiling function, call- - outs are automatically installed before each item in the pattern. They - are all numbered 255. If there is a conditional group in the pattern - whose condition is an assertion, an additional callout is inserted just - before the condition. An explicit callout may also be set at this posi- - tion, as in this example: - - (?(?C9)(?=a)abc|def) - - Note that this applies only to assertion conditions, not to other types - of condition. - - During matching, when PCRE reaches a callout point, the external func- - tion is called. It is provided with the number of the callout, the - position in the pattern, and, optionally, one item of data originally - supplied by the caller of the matching function. The callout function - may cause matching to proceed, to backtrack, or to fail altogether. - - By default, PCRE implements a number of optimizations at compile time - and matching time, and one side-effect is that sometimes callouts are - skipped. If you need all possible callouts to happen, you need to set - options that disable the relevant optimizations. More details, and a - complete description of the interface to the callout function, are - given in the pcrecallout documentation. 
- - -BACKTRACKING CONTROL - - Perl 5.10 introduced a number of "Special Backtracking Control Verbs", - which are still described in the Perl documentation as "experimental - and subject to change or removal in a future version of Perl". It goes - on to say: "Their usage in production code should be noted to avoid - problems during upgrades." The same remarks apply to the PCRE features - described in this section. - - The new verbs make use of what was previously invalid syntax: an open- - ing parenthesis followed by an asterisk. They are generally of the form - (*VERB) or (*VERB:NAME). Some may take either form, possibly behaving - differently depending on whether or not a name is present. A name is - any sequence of characters that does not include a closing parenthesis. - The maximum length of name is 255 in the 8-bit library and 65535 in the - 16-bit and 32-bit libraries. If the name is empty, that is, if the - closing parenthesis immediately follows the colon, the effect is as if - the colon were not there. Any number of these verbs may occur in a - pattern. - - Since these verbs are specifically related to backtracking, most of - them can be used only when the pattern is to be matched using one of - the traditional matching functions, because these use a backtracking - algorithm. With the exception of (*FAIL), which behaves like a failing - negative assertion, the backtracking control verbs cause an error if - encountered by a DFA matching function. - - The behaviour of these verbs in repeated groups, assertions, and in - subpatterns called as subroutines (whether or not recursively) is docu- - mented below. - - Optimizations that affect backtracking verbs - - PCRE contains some optimizations that are used to speed up matching by - running some checks at the start of each match attempt. For example, it - may know the minimum length of matching subject, or that a particular - character must be present. 
When one of these optimizations bypasses the - running of a match, any included backtracking verbs will not, of - course, be processed. You can suppress the start-of-match optimizations - by setting the PCRE_NO_START_OPTIMIZE option when calling pcre_com- - pile() or pcre_exec(), or by starting the pattern with (*NO_START_OPT). - There is more discussion of this option in the section entitled "Option - bits for pcre_exec()" in the pcreapi documentation. - - Experiments with Perl suggest that it too has similar optimizations, - sometimes leading to anomalous results. - - Verbs that act immediately - - The following verbs act as soon as they are encountered. They may not - be followed by a name. - - (*ACCEPT) - - This verb causes the match to end successfully, skipping the remainder - of the pattern. However, when it is inside a subpattern that is called - as a subroutine, only that subpattern is ended successfully. Matching - then continues at the outer level. If (*ACCEPT) is triggered in a posi- - tive assertion, the assertion succeeds; in a negative assertion, the - assertion fails. - - If (*ACCEPT) is inside capturing parentheses, the data so far is cap- - tured. For example: - - A((?:A|B(*ACCEPT)|C)D) - - This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap- - tured by the outer parentheses. - - (*FAIL) or (*F) - - This verb causes a matching failure, forcing backtracking to occur. It - is equivalent to (?!) but easier to read. The Perl documentation notes - that it is probably useful only when combined with (?{}) or (??{}). - Those are, of course, Perl features that are not present in PCRE. The - nearest equivalent is the callout feature, as for example in this pat- - tern: - - a+(?C)(*FAIL) - - A match with the string "aaaa" always fails, but the callout is taken - before each backtrack happens (in this example, 10 times).
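The figure of 10 callouts for "aaaa" can be checked by counting. The sketch below is plain Python and only simulates the backtracking of a+(?C)(*FAIL); it assumes an unanchored match in which no optimization skips a starting position, and the function name count_callouts is hypothetical.

```python
def count_callouts(subject):
    # Count how often the callout point is reached for a+(?C)(*FAIL).
    calls = 0
    for start in range(len(subject)):
        # Longest run of "a" beginning at this starting position.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ first takes the whole run, then gives back one character at a
        # time; the callout is reached once for each length from run down to 1.
        calls += run
    return calls
```

For "aaaa" the four starting positions contribute 4, 3, 2 and 1 callouts, which sum to 10.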
- - Recording which path was taken - - There is one verb whose main purpose is to track how a match was - arrived at, though it also has a secondary use in conjunction with - advancing the match starting point (see (*SKIP) below). - - (*MARK:NAME) or (*:NAME) - - A name is always required with this verb. There may be as many - instances of (*MARK) as you like in a pattern, and their names do not - have to be unique. - - When a match succeeds, the name of the last-encountered (*MARK:NAME), - (*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to - the caller as described in the section entitled "Extra data for - pcre_exec()" in the pcreapi documentation. Here is an example of - pcretest output, where the /K modifier requests the retrieval and out- - putting of (*MARK) data: - - re> /X(*MARK:A)Y|X(*MARK:B)Z/K - data> XY - 0: XY - MK: A - XZ - 0: XZ - MK: B - - The (*MARK) name is tagged with "MK:" in this output, and in this exam- - ple it indicates which of the two alternatives matched. This is a more - efficient way of obtaining this information than putting each alterna- - tive in its own capturing parentheses. - - If a verb with a name is encountered in a positive assertion that is - true, the name is recorded and passed back if it is the last-encoun- - tered. This does not happen for negative assertions or failing positive - assertions. - - After a partial match or a failed match, the last encountered name in - the entire match process is returned. For example: - - re> /X(*MARK:A)Y|X(*MARK:B)Z/K - data> XP - No match, mark = B - - Note that in this unanchored example the mark is retained from the - match attempt that started at the letter "X" in the subject. Subsequent - match attempts starting at "P" and then with an empty string do not get - as far as the (*MARK) item, but nevertheless do not reset it. 
- - If you are interested in (*MARK) values after failed matches, you - should probably set the PCRE_NO_START_OPTIMIZE option (see above) to - ensure that the match is always attempted. - - Verbs that act after backtracking - - The following verbs do nothing when they are encountered. Matching con- - tinues with what follows, but if there is no subsequent match, causing - a backtrack to the verb, a failure is forced. That is, backtracking - cannot pass to the left of the verb. However, when one of these verbs - appears inside an atomic group or an assertion that is true, its effect - is confined to that group, because once the group has been matched, - there is never any backtracking into it. In this situation, backtrack- - ing can "jump back" to the left of the entire atomic group or asser- - tion. (Remember also, as stated above, that this localization also - applies in subroutine calls.) - - These verbs differ in exactly what kind of failure occurs when back- - tracking reaches them. The behaviour described below is what happens - when the verb is not in a subroutine or an assertion. Subsequent sec- - tions cover these special cases. - - (*COMMIT) - - This verb, which may not be followed by a name, causes the whole match - to fail outright if there is a later matching failure that causes back- - tracking to reach it. Even if the pattern is unanchored, no further - attempts to find a match by advancing the starting point take place. If - (*COMMIT) is the only backtracking verb that is encountered, once it - has been passed pcre_exec() is committed to finding a match at the cur- - rent starting point, or not at all. For example: - - a+(*COMMIT)b - - This matches "xxaab" but not "aacaab". It can be thought of as a kind - of dynamic anchor, or "I've started, so I must finish." The name of the - most recently passed (*MARK) in the path is passed back when (*COMMIT) - forces a match failure. 
- - If there is more than one backtracking verb in a pattern, a different - one that follows (*COMMIT) may be triggered first, so merely passing - (*COMMIT) during a match does not always guarantee that a match must be - at this starting point. - - Note that (*COMMIT) at the start of a pattern is not the same as an - anchor, unless PCRE's start-of-match optimizations are turned off, as - shown in this output from pcretest: - - re> /(*COMMIT)abc/ - data> xyzabc - 0: abc - data> xyzabc\Y - No match - - For this pattern, PCRE knows that any match must start with "a", so the - optimization skips along the subject to "a" before applying the pattern - to the first set of data. The match attempt then succeeds. In the sec- - ond set of data, the escape sequence \Y is interpreted by the pcretest - program. It causes the PCRE_NO_START_OPTIMIZE option to be set when - pcre_exec() is called. This disables the optimization that skips along - to the first character. The pattern is now applied starting at "x", and - so the (*COMMIT) causes the match to fail without trying any other - starting points. - - (*PRUNE) or (*PRUNE:NAME) - - This verb causes the match to fail at the current starting position in - the subject if there is a later matching failure that causes backtrack- - ing to reach it. If the pattern is unanchored, the normal "bumpalong" - advance to the next starting character then happens. Backtracking can - occur as usual to the left of (*PRUNE), before it is reached, or when - matching to the right of (*PRUNE), but if there is no match to the - right, backtracking cannot cross (*PRUNE). In simple cases, the use of - (*PRUNE) is just an alternative to an atomic group or possessive quan- - tifier, but there are some uses of (*PRUNE) that cannot be expressed in - any other way. In an anchored pattern (*PRUNE) has the same effect as - (*COMMIT). - - The behaviour of (*PRUNE:NAME) is not the same as - (*MARK:NAME)(*PRUNE).
It is like (*MARK:NAME) in that the name is - remembered for passing back to the caller. However, (*SKIP:NAME) - searches only for names set with (*MARK). - - (*SKIP) - - This verb, when given without a name, is like (*PRUNE), except that if - the pattern is unanchored, the "bumpalong" advance is not to the next - character, but to the position in the subject where (*SKIP) was encoun- - tered. (*SKIP) signifies that whatever text was matched leading up to - it cannot be part of a successful match. Consider: - - a+(*SKIP)b - - If the subject is "aaaac...", after the first match attempt fails - (starting at the first character in the string), the starting point - skips on to start the next attempt at "c". Note that a possessive quan- - tifier does not have the same effect as this example; although it would - suppress backtracking during the first match attempt, the second - attempt would start at the second character instead of skipping on to - "c". - - (*SKIP:NAME) - - When (*SKIP) has an associated name, its behaviour is modified. When it - is triggered, the previous path through the pattern is searched for the - most recent (*MARK) that has the same name. If one is found, the - "bumpalong" advance is to the subject position that corresponds to that - (*MARK) instead of to where (*SKIP) was encountered. If no (*MARK) with - a matching name is found, the (*SKIP) is ignored. - - Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It - ignores names that are set by (*PRUNE:NAME) or (*THEN:NAME). - - (*THEN) or (*THEN:NAME) - - This verb causes a skip to the next innermost alternative when back- - tracking reaches it. That is, it cancels any further backtracking - within the current alternative. Its name comes from the observation - that it can be used for a pattern-based if-then-else block: - - ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
- - If the COND1 pattern matches, FOO is tried (and possibly further items - after the end of the group if FOO succeeds); on failure, the matcher - skips to the second alternative and tries COND2, without backtracking - into COND1. If that succeeds and BAR fails, COND3 is tried. If subse- - quently BAZ fails, there are no more alternatives, so there is a back- - track to whatever came before the entire group. If (*THEN) is not - inside an alternation, it acts like (*PRUNE). - - The behaviour of (*THEN:NAME) is not the same as - (*MARK:NAME)(*THEN). It is like (*MARK:NAME) in that the name is - remembered for passing back to the caller. However, (*SKIP:NAME) - searches only for names set with (*MARK). - - A subpattern that does not contain a | character is just a part of the - enclosing alternative; it is not a nested alternation with only one - alternative. The effect of (*THEN) extends beyond such a subpattern to - the enclosing alternative. Consider this pattern, where A, B, etc. are - complex pattern fragments that do not contain any | characters at this - level: - - A (B(*THEN)C) | D - - If A and B are matched, but there is a failure in C, matching does not - backtrack into A; instead it moves to the next alternative, that is, D. - However, if the subpattern containing (*THEN) is given an alternative, - it behaves differently: - - A (B(*THEN)C | (*FAIL)) | D - - The effect of (*THEN) is now confined to the inner subpattern. After a - failure in C, matching moves to (*FAIL), which causes the whole subpat- - tern to fail because there are no more alternatives to try. In this - case, matching does now backtrack into A. - - Note that a conditional subpattern is not considered as having two - alternatives, because only one is ever used. In other words, the | - character in a conditional subpattern has a different meaning. Ignoring - white space, consider: - - ^.*? (?(?=a) a | b(*THEN)c ) - - If the subject is "ba", this pattern does not match. Because .*?
is - ungreedy, it initially matches zero characters. The condition (?=a) - then fails, the character "b" is matched, but "c" is not. At this - point, matching does not backtrack to .*? as might perhaps be expected - from the presence of the | character. The conditional subpattern is - part of the single alternative that comprises the whole pattern, and so - the match fails. (If there was a backtrack into .*?, allowing it to - match "b", the match would succeed.) - - The verbs just described provide four different "strengths" of control - when subsequent matching fails. (*THEN) is the weakest, carrying on the - match at the next alternative. (*PRUNE) comes next, failing the match - at the current starting position, but allowing an advance to the next - character (for an unanchored pattern). (*SKIP) is similar, except that - the advance may be more than one character. (*COMMIT) is the strongest, - causing the entire match to fail. - - More than one backtracking verb - - If more than one backtracking verb is present in a pattern, the one - that is backtracked onto first acts. For example, consider this pat- - tern, where A, B, etc. are complex pattern fragments: - - (A(*COMMIT)B(*THEN)C|ABD) - - If A matches but B fails, the backtrack to (*COMMIT) causes the entire - match to fail. However, if A and B match, but C fails, the backtrack to - (*THEN) causes the next alternative (ABD) to be tried. This behaviour - is consistent, but is not always the same as Perl's. It means that if - two or more backtracking verbs appear in succession, all but the last - of them have no effect. Consider this example: - - ...(*COMMIT)(*PRUNE)... - - If there is a matching failure to the right, backtracking onto (*PRUNE) - causes it to be triggered, and its action is taken. There can never be - a backtrack onto (*COMMIT). - - Backtracking verbs in repeated groups - - PCRE differs from Perl in its handling of backtracking verbs in - repeated groups.
For example, consider: - - /(a(*COMMIT)b)+ac/ - - If the subject is "abac", Perl matches, but PCRE fails because the - (*COMMIT) in the second repeat of the group acts. - - Backtracking verbs in assertions - - (*FAIL) in an assertion has its normal effect: it forces an immediate - backtrack. - - (*ACCEPT) in a positive assertion causes the assertion to succeed with- - out any further processing. In a negative assertion, (*ACCEPT) causes - the assertion to fail without any further processing. - - The other backtracking verbs are not treated specially if they appear - in a positive assertion. In particular, (*THEN) skips to the next - alternative in the innermost enclosing group that has alternations, - whether or not this is within the assertion. - - Negative assertions are, however, different, in order to ensure that - changing a positive assertion into a negative assertion changes its - result. Backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes a neg- - ative assertion to be true, without considering any further alternative - branches in the assertion. Backtracking into (*THEN) causes it to skip - to the next enclosing alternative within the assertion (the normal be- - haviour), but if the assertion does not have such an alternative, - (*THEN) behaves like (*PRUNE). - - Backtracking verbs in subroutines - - These behaviours occur whether or not the subpattern is called recur- - sively. Perl's treatment of subroutines is different in some cases. - - (*FAIL) in a subpattern called as a subroutine has its normal effect: - it forces an immediate backtrack. - - (*ACCEPT) in a subpattern called as a subroutine causes the subroutine - match to succeed without any further processing. Matching then contin- - ues after the subroutine call. - - (*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine - cause the subroutine match to fail. 
- - (*THEN) skips to the next alternative in the innermost enclosing group - within the subpattern that has alternatives. If there is no such group - within the subpattern, (*THEN) causes the subroutine match to fail. - - -SEE ALSO - - pcreapi(3), pcrecallout(3), pcrematching(3), pcresyntax(3), pcre(3), - pcre16(3), pcre32(3). - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 23 October 2016 - Copyright (c) 1997-2016 University of Cambridge. ------------------------------------------------------------------------------- - - -PCRESYNTAX(3) Library Functions Manual PCRESYNTAX(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE REGULAR EXPRESSION SYNTAX SUMMARY - - The full syntax and semantics of the regular expressions that are sup- - ported by PCRE are described in the pcrepattern documentation. This - document contains a quick-reference summary of the syntax. - - -QUOTING - - \x where x is non-alphanumeric is a literal x - \Q...\E treat enclosed characters as literal - - -CHARACTERS - - \a alarm, that is, the BEL character (hex 07) - \cx "control-x", where x is any ASCII character - \e escape (hex 1B) - \f form feed (hex 0C) - \n newline (hex 0A) - \r carriage return (hex 0D) - \t tab (hex 09) - \0dd character with octal code 0dd - \ddd character with octal code ddd, or backreference - \o{ddd..} character with octal code ddd.. - \xhh character with hex code hh - \x{hhh..} character with hex code hhh.. - - Note that \0dd is always an octal code, and that \8 and \9 are the lit- - eral characters "8" and "9". - - -CHARACTER TYPES - - . 
any character except newline; - in dotall mode, any character whatsoever - \C one data unit, even in UTF mode (best avoided) - \d a decimal digit - \D a character that is not a decimal digit - \h a horizontal white space character - \H a character that is not a horizontal white space character - \N a character that is not a newline - \p{xx} a character with the xx property - \P{xx} a character without the xx property - \R a newline sequence - \s a white space character - \S a character that is not a white space character - \v a vertical white space character - \V a character that is not a vertical white space character - \w a "word" character - \W a "non-word" character - \X a Unicode extended grapheme cluster - - By default, \d, \s, and \w match only ASCII characters, even in UTF-8 - mode or in the 16-bit and 32-bit libraries. However, if locale-spe- - cific matching is happening, \s and \w may also match characters with - code points in the range 128-255. If the PCRE_UCP option is set, the - behaviour of these escape sequences is changed to use Unicode proper- - ties and they match many more characters.
- - -GENERAL CATEGORY PROPERTIES FOR \p and \P - - C Other - Cc Control - Cf Format - Cn Unassigned - Co Private use - Cs Surrogate - - L Letter - Ll Lower case letter - Lm Modifier letter - Lo Other letter - Lt Title case letter - Lu Upper case letter - L& Ll, Lu, or Lt - - M Mark - Mc Spacing mark - Me Enclosing mark - Mn Non-spacing mark - - N Number - Nd Decimal number - Nl Letter number - No Other number - - P Punctuation - Pc Connector punctuation - Pd Dash punctuation - Pe Close punctuation - Pf Final punctuation - Pi Initial punctuation - Po Other punctuation - Ps Open punctuation - - S Symbol - Sc Currency symbol - Sk Modifier symbol - Sm Mathematical symbol - So Other symbol - - Z Separator - Zl Line separator - Zp Paragraph separator - Zs Space separator - - -PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P - - Xan Alphanumeric: union of properties L and N - Xps POSIX space: property Z or tab, NL, VT, FF, CR - Xsp Perl space: property Z or tab, NL, VT, FF, CR - Xuc Universally-named character: one that can be - represented by a Universal Character Name - Xwd Perl word: property Xan or underscore - - Perl and POSIX space are now the same. Perl added VT to its space char- - acter set at release 5.18 and PCRE changed at release 8.34.
- - -SCRIPT NAMES FOR \p AND \P - - Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali, - Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Car- - ian, Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cunei- - form, Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hiero- - glyphs, Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, - Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, - Imperial_Aramaic, Inherited, Inscriptional_Pahlavi, Inscrip- - tional_Parthian, Javanese, Kaithi, Kannada, Katakana, Kayah_Li, - Kharoshthi, Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Lin- - ear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani, Malayalam, Mandaic, - Manichaean, Meetei_Mayek, Mende_Kikakui, Meroitic_Cursive, - Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar, Nabataean, - New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian, - Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya, - Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, - Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha- - vian, Siddham, Sinhala, Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, - Tagalog, Tagbanwa, Tai_Le, Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, - Thaana, Thai, Tibetan, Tifinagh, Tirhuta, Ugaritic, Vai, Warang_Citi, - Yi. - - -CHARACTER CLASSES - - [...] positive character class - [^...] 
negative character class - [x-y] range (can be used for hex characters) - [[:xxx:]] positive POSIX named set - [[:^xxx:]] negative POSIX named set - - alnum alphanumeric - alpha alphabetic - ascii 0-127 - blank space or tab - cntrl control character - digit decimal digit - graph printing, excluding space - lower lower case letter - print printing, including space - punct printing, excluding alphanumeric - space white space - upper upper case letter - word same as \w - xdigit hexadecimal digit - - In PCRE, POSIX character set names recognize only ASCII characters by - default, but some of them use Unicode properties if PCRE_UCP is set. - You can use \Q...\E inside a character class. - - -QUANTIFIERS - - ? 0 or 1, greedy - ?+ 0 or 1, possessive - ?? 0 or 1, lazy - * 0 or more, greedy - *+ 0 or more, possessive - *? 0 or more, lazy - + 1 or more, greedy - ++ 1 or more, possessive - +? 1 or more, lazy - {n} exactly n - {n,m} at least n, no more than m, greedy - {n,m}+ at least n, no more than m, possessive - {n,m}? at least n, no more than m, lazy - {n,} n or more, greedy - {n,}+ n or more, possessive - {n,}? n or more, lazy - - -ANCHORS AND SIMPLE ASSERTIONS - - \b word boundary - \B not a word boundary - ^ start of subject - also after internal newline in multiline mode - \A start of subject - $ end of subject - also before newline at end of subject - also before internal newline in multiline mode - \Z end of subject - also before newline at end of subject - \z end of subject - \G first matching position in subject - - -MATCH POINT RESET - - \K reset start of match - - \K is honoured in positive assertions, but ignored in negative ones. - - -ALTERNATION - - expr|expr|expr... - - -CAPTURING - - (...) capturing group - (?<name>...) named capturing group (Perl) - (?'name'...) named capturing group (Perl) - (?P<name>...) named capturing group (Python) - (?:...) non-capturing group - (?|...)
non-capturing group; reset group numbers for - capturing groups in each alternative - - -ATOMIC GROUPS - - (?>...) atomic, non-capturing group - - -COMMENT - - (?#....) comment (not nestable) - - -OPTION SETTING - - (?i) caseless - (?J) allow duplicate names - (?m) multiline - (?s) single line (dotall) - (?U) default ungreedy (lazy) - (?x) extended (ignore white space) - (?-...) unset option(s) - - The following are recognized only at the very start of a pattern or - after one of the newline or \R options with similar syntax. More than - one of them may appear. - - (*LIMIT_MATCH=d) set the match limit to d (decimal number) - (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) - (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS) - (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) - (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) - (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) - (*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32) - (*UTF) set appropriate UTF mode for the library in use - (*UCP) set PCRE_UCP (use Unicode properties for \d etc) - - Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of - the limits set by the caller of pcre_exec(), not increase them. - - -NEWLINE CONVENTION - - These are recognized only at the very start of the pattern or after - option settings with a similar syntax. - - (*CR) carriage return only - (*LF) linefeed only - (*CRLF) carriage return followed by linefeed - (*ANYCRLF) all three of the above - (*ANY) any Unicode newline sequence - - -WHAT \R MATCHES - - These are recognized only at the very start of the pattern or after - option setting with a similar syntax. - - (*BSR_ANYCRLF) CR, LF, or CRLF - (*BSR_UNICODE) any Unicode newline sequence - - -LOOKAHEAD AND LOOKBEHIND ASSERTIONS - - (?=...) positive look ahead - (?!...) negative look ahead - (?<=...) positive look behind - (? 
reference by name (Perl) - \k'name' reference by name (Perl) - \g{name} reference by name (Perl) - \k{name} reference by name (.NET) - (?P=name) reference by name (Python) - - -SUBROUTINE REFERENCES (POSSIBLY RECURSIVE) - - (?R) recurse whole pattern - (?n) call subpattern by absolute number - (?+n) call subpattern by relative number - (?-n) call subpattern by relative number - (?&name) call subpattern by name (Perl) - (?P>name) call subpattern by name (Python) - \g call subpattern by name (Oniguruma) - \g'name' call subpattern by name (Oniguruma) - \g call subpattern by absolute number (Oniguruma) - \g'n' call subpattern by absolute number (Oniguruma) - \g<+n> call subpattern by relative number (PCRE extension) - \g'+n' call subpattern by relative number (PCRE extension) - \g<-n> call subpattern by relative number (PCRE extension) - \g'-n' call subpattern by relative number (PCRE extension) - - -CONDITIONAL PATTERNS - - (?(condition)yes-pattern) - (?(condition)yes-pattern|no-pattern) - - (?(n)... absolute reference condition - (?(+n)... relative reference condition - (?(-n)... relative reference condition - (?()... named reference condition (Perl) - (?('name')... named reference condition (Perl) - (?(name)... named reference condition (PCRE) - (?(R)... overall recursion condition - (?(Rn)... specific group recursion condition - (?(R&name)... specific recursion condition - (?(DEFINE)... define subpattern for reference - (?(assert)... assertion condition - - -BACKTRACKING CONTROL - - The following act immediately they are reached: - - (*ACCEPT) force successful match - (*FAIL) force backtrack; synonym (*F) - (*MARK:NAME) set name to be passed back; synonym (*:NAME) - - The following act only when a subsequent match failure causes a back- - track to reach them. They all force a match failure, but they differ in - what happens afterwards. Those that advance the start-of-match point do - so only if the pattern is not anchored. 
- - (*COMMIT) overall failure, no advance of starting point - (*PRUNE) advance to next starting character - (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE) - (*SKIP) advance to current matching position - (*SKIP:NAME) advance to position corresponding to an earlier - (*MARK:NAME); if not found, the (*SKIP) is ignored - (*THEN) local failure, backtrack to next alternation - (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) - - -CALLOUTS - - (?C) callout - (?Cn) callout with data n - - -SEE ALSO - - pcrepattern(3), pcreapi(3), pcrecallout(3), pcrematching(3), pcre(3). - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 08 January 2014 - Copyright (c) 1997-2014 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREUNICODE(3) Library Functions Manual PCREUNICODE(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -UTF-8, UTF-16, UTF-32, AND UNICODE PROPERTY SUPPORT - - As well as UTF-8 support, PCRE also supports UTF-16 (from release 8.30) - and UTF-32 (from release 8.32), by means of two additional libraries. - They can be built as well as, or instead of, the 8-bit library. - - -UTF-8 SUPPORT - - In order to process UTF-8 strings, you must build PCRE's 8-bit library - with UTF support, and, in addition, you must call pcre_compile() with - the PCRE_UTF8 option flag, or the pattern must start with the sequence - (*UTF8) or (*UTF). When either of these is the case, both the pattern - and any subject strings that are matched against it are treated as - UTF-8 strings instead of strings of individual 1-byte characters. - - -UTF-16 AND UTF-32 SUPPORT - - In order to process UTF-16 or UTF-32 strings, you must build PCRE's 16-bit - or 32-bit library with UTF support, and, in addition, you must call - pcre16_compile() or pcre32_compile() with the PCRE_UTF16 or PCRE_UTF32 - option flag, as appropriate.
Alternatively, the pattern must start with - the sequence (*UTF16), (*UTF32), as appropriate, or (*UTF), which can - be used with either library. When UTF mode is set, both the pattern and - any subject strings that are matched against it are treated as UTF-16 - or UTF-32 strings instead of strings of individual 16-bit or 32-bit - characters. - - -UTF SUPPORT OVERHEAD - - If you compile PCRE with UTF support, but do not use it at run time, - the library will be a bit bigger, but the additional run time overhead - is limited to testing the PCRE_UTF[8|16|32] flag occasionally, so - should not be very big. - - -UNICODE PROPERTY SUPPORT - - If PCRE is built with Unicode character property support (which implies - UTF support), the escape sequences \p{..}, \P{..}, and \X can be used. - The available properties that can be tested are limited to the general - category properties such as Lu for an upper case letter or Nd for a - decimal number, the Unicode script names such as Arabic or Han, and the - derived properties Any and L&. Full lists are given in the pcrepattern - and pcresyntax documentation. Only the short names for properties are - supported. For example, \p{L} matches a letter. Its Perl synonym, - \p{Letter}, is not supported. Furthermore, in Perl, many properties - may optionally be prefixed by "Is", for compatibility with Perl 5.6. - PCRE does not support this. - - Validity of UTF-8 strings - - When you set the PCRE_UTF8 flag, the byte strings passed as patterns - and subjects are (by default) checked for validity on entry to the rel- - evant functions. The entire string is checked before any other process- - ing takes place. From release 7.3 of PCRE, the check is according to the - rules of RFC 3629, which are themselves derived from the Unicode speci- - fication. Earlier releases of PCRE followed the rules of RFC 2279, - which allows the full range of 31-bit values (0 to 0x7FFFFFFF).
The - current check allows only values in the range U+0 to U+10FFFF, exclud- - ing the surrogate area. (From release 8.33 the so-called "non-charac- - ter" code points are no longer excluded because Unicode corrigendum #9 - makes it clear that they should not be.) - - Characters in the "Surrogate Area" of Unicode are reserved for use by - UTF-16, where they are used in pairs to encode codepoints with values - greater than 0xFFFF. The code points that are encoded by UTF-16 pairs - are available independently in the UTF-8 and UTF-32 encodings. (In - other words, the whole surrogate thing is a fudge for UTF-16 which - unfortunately messes up UTF-8 and UTF-32.) - - If an invalid UTF-8 string is passed to PCRE, an error return is given. - At compile time, the only additional information is the offset to the - first byte of the failing character. The run-time functions pcre_exec() - and pcre_dfa_exec() also pass back this information, as well as a more - detailed reason code if the caller has provided memory in which to do - this. - - In some situations, you may already know that your strings are valid, - and therefore want to skip these checks in order to improve perfor- - mance, for example in the case of a long subject string that is being - scanned repeatedly. If you set the PCRE_NO_UTF8_CHECK flag at compile - time or at run time, PCRE assumes that the pattern or subject it is - given (respectively) contains only valid UTF-8 codes. In this case, it - does not diagnose an invalid UTF-8 string. - - Note that passing PCRE_NO_UTF8_CHECK to pcre_compile() just disables - the check for the pattern; it does not also apply to subject strings. - If you want to disable the check for a subject string you must pass - this option to pcre_exec() or pcre_dfa_exec(). - - If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, the - result is undefined and your program may crash. 
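As a rough illustration of the RFC 3629 rules described above, the following C sketch checks a byte string and reports the offset of the first byte of the failing character. The name utf8_first_invalid is hypothetical and is not part of the PCRE API; PCRE's own check may differ in detail and also returns a more specific reason code, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT a PCRE function.
 * Returns -1 if the first len bytes of s are valid UTF-8 under the
 * RFC 3629 rules, otherwise the offset of the first byte of the
 * failing character. */
long utf8_first_invalid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;       /* continuation bytes expected */
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest value allowed for this length */
        size_t k;

        if (c < 0x80) { i++; continue; }          /* one-byte character */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return (long)i;   /* stray continuation byte or 5/6-byte lead */

        if (len - i <= extra) return (long)i;     /* truncated character */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return (long)i;
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }
        if (cp < min) return (long)i;             /* overlong encoding */
        if (cp > 0x10FFFF) return (long)i;        /* beyond U+10FFFF */
        if (cp >= 0xD800 && cp <= 0xDFFF) return (long)i; /* surrogate area */
        i += extra + 1;
    }
    return -1;
}
```

A caller could run a check of this kind once on a long subject and, only if it succeeds, pass PCRE_NO_UTF8_CHECK on the repeated pcre_exec() calls. Non-character code points such as U+FFFF are deliberately accepted, matching the behaviour described for release 8.33 onwards.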
- - Validity of UTF-16 strings - - When you set the PCRE_UTF16 flag, the strings of 16-bit data units that - are passed as patterns and subjects are (by default) checked for valid- - ity on entry to the relevant functions. Values other than those in the - surrogate range U+D800 to U+DFFF are independent code points. Values in - the surrogate range must be used in pairs in the correct manner. - - If an invalid UTF-16 string is passed to PCRE, an error return is - given. At compile time, the only additional information is the offset - to the first data unit of the failing character. The run-time functions - pcre16_exec() and pcre16_dfa_exec() also pass back this information, as - well as a more detailed reason code if the caller has provided memory - in which to do this. - - In some situations, you may already know that your strings are valid, - and therefore want to skip these checks in order to improve perfor- - mance. If you set the PCRE_NO_UTF16_CHECK flag at compile time or at - run time, PCRE assumes that the pattern or subject it is given (respec- - tively) contains only valid UTF-16 sequences. In this case, it does not - diagnose an invalid UTF-16 string. However, if an invalid string is - passed, the result is undefined. - - Validity of UTF-32 strings - - When you set the PCRE_UTF32 flag, the strings of 32-bit data units that - are passed as patterns and subjects are (by default) checked for valid- - ity on entry to the relevant functions. This check allows only values - in the range U+0 to U+10FFFF, excluding the surrogate area U+D800 to - U+DFFF. - - If an invalid UTF-32 string is passed to PCRE, an error return is - given. At compile time, the only additional information is the offset - to the first data unit of the failing character. The run-time functions - pcre32_exec() and pcre32_dfa_exec() also pass back this information, as - well as a more detailed reason code if the caller has provided memory - in which to do this. 
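The UTF-16 pairing rule and the UTF-32 range rule described above can be sketched in C as follows. Both function names are hypothetical and are not part of the PCRE API; the sketch only mirrors the documented rules (surrogates must form a high/low pair in UTF-16, and UTF-32 values must lie in U+0 to U+10FFFF outside U+D800 to U+DFFF) and returns the offset of the first data unit of the failing character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT a PCRE function.
 * Returns -1 if valid, otherwise the offset (in 16-bit units) of the
 * first data unit of the failing character. */
long utf16_first_invalid(const uint16_t *s, size_t len)
{
    size_t i;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        uint16_t u = s[i];
        if (u < 0xD800 || u > 0xDFFF) continue;  /* independent code point */
        if (u >= 0xDC00) return (long)i;         /* low surrogate with no high one before it */
        if (i + 1 >= len) return (long)i;        /* high surrogate at end of string */
        if (s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
            return (long)i;                      /* high surrogate not followed by a low one */
        i++;                                     /* skip the low half of the pair */
    }
    return -1;
}

/* Hypothetical helper, NOT a PCRE function.
 * Returns -1 if valid, otherwise the offset (in 32-bit units) of the
 * failing data unit. */
long utf32_first_invalid(const uint32_t *s, size_t len)
{
    size_t i;

    assert(s != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (s[i] > 0x10FFFF) return (long)i;                  /* beyond U+10FFFF */
        if (s[i] >= 0xD800 && s[i] <= 0xDFFF) return (long)i; /* surrogate area */
    }
    return -1;
}
```

Because every UTF-32 data unit is a complete character, the UTF-32 check needs no look-ahead, whereas the UTF-16 check must inspect the following unit whenever it meets a high surrogate.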
- - In some situations, you may already know that your strings are valid, - and therefore want to skip these checks in order to improve perfor- - mance. If you set the PCRE_NO_UTF32_CHECK flag at compile time or at - run time, PCRE assumes that the pattern or subject it is given (respec- - tively) contains only valid UTF-32 sequences. In this case, it does not - diagnose an invalid UTF-32 string. However, if an invalid string is - passed, the result is undefined. - - General comments about UTF modes - - 1. Codepoints less than 256 can be specified in patterns by either - braced or unbraced hexadecimal escape sequences (for example, \x{b3} or - \xb3). Larger values have to use braced sequences. - - 2. Octal numbers up to \777 are recognized, and in UTF-8 mode they - match two-byte characters for values greater than \177. - - 3. Repeat quantifiers apply to complete UTF characters, not to individ- - ual data units, for example: \x{100}{3}. - - 4. The dot metacharacter matches one UTF character instead of a single - data unit. - - 5. The escape sequence \C can be used to match a single byte in UTF-8 - mode, or a single 16-bit data unit in UTF-16 mode, or a single 32-bit - data unit in UTF-32 mode, but its use can lead to some strange effects - because it breaks up multi-unit characters (see the description of \C - in the pcrepattern documentation). The use of \C is not supported in - the alternative matching function pcre[16|32]_dfa_exec(), nor is it - supported in UTF mode by the JIT optimization of pcre[16|32]_exec(). If - JIT optimization is requested for a UTF pattern that contains \C, it - will not succeed, and so the matching will be carried out by the normal - interpretive function. - - 6. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly - test characters of any code value, but, by default, the characters that - PCRE recognizes as digits, spaces, or word characters remain the same - set as in non-UTF mode, all with values less than 256. 
This remains - true even when PCRE is built to include Unicode property support, - because to do otherwise would slow down PCRE in many common cases. Note - in particular that this applies to \b and \B, because they are defined - in terms of \w and \W. If you really want to test for a wider sense of, - say, "digit", you can use explicit Unicode property tests such as - \p{Nd}. Alternatively, if you set the PCRE_UCP option, the way that the - character escapes work is changed so that Unicode properties are used - to determine which characters match. There are more details in the sec- - tion on generic character types in the pcrepattern documentation. - - 7. Similarly, characters that match the POSIX named character classes - are all low-valued characters, unless the PCRE_UCP option is set. - - 8. However, the horizontal and vertical white space matching escapes - (\h, \H, \v, and \V) do match all the appropriate Unicode characters, - whether or not PCRE_UCP is set. - - 9. Case-insensitive matching applies only to characters whose values - are less than 128, unless PCRE is built with Unicode property support. - A few Unicode characters such as Greek sigma have more than two code- - points that are case-equivalent. Up to and including PCRE release 8.31, - only one-to-one case mappings were supported, but later releases (with - Unicode property support) do treat as case-equivalent all versions of - characters such as Greek sigma. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 27 February 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREJIT(3) Library Functions Manual PCREJIT(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE JUST-IN-TIME COMPILER SUPPORT - - Just-in-time compiling is a heavyweight optimization that can greatly - speed up pattern matching. 
However, it comes at the cost of extra pro- - cessing before the match is performed. Therefore, it is of most benefit - when the same pattern is going to be matched many times. This does not - necessarily mean many calls of a matching function; if the pattern is - not anchored, matching attempts may take place many times at various - positions in the subject, even for a single call. Therefore, if the - subject string is very long, it may still pay to use JIT for one-off - matches. - - JIT support applies only to the traditional Perl-compatible matching - function. It does not apply when the DFA matching function is being - used. The code for this support was written by Zoltan Herczeg. - - -8-BIT, 16-BIT AND 32-BIT SUPPORT - - JIT support is available for all of the 8-bit, 16-bit and 32-bit PCRE - libraries. To keep this documentation simple, only the 8-bit interface - is described in what follows. If you are using the 16-bit library, sub- - stitute the 16-bit functions and 16-bit structures (for example, - pcre16_jit_stack instead of pcre_jit_stack). If you are using the - 32-bit library, substitute the 32-bit functions and 32-bit structures - (for example, pcre32_jit_stack instead of pcre_jit_stack). - - -AVAILABILITY OF JIT SUPPORT - - JIT support is an optional feature of PCRE. The "configure" option - --enable-jit (or equivalent CMake option) must be set when PCRE is - built if you want to use JIT. The support is limited to the following - hardware platforms: - - ARM v5, v7, and Thumb2 - Intel x86 32-bit and 64-bit - MIPS 32-bit - Power PC 32-bit and 64-bit - SPARC 32-bit (experimental) - - If --enable-jit is set on an unsupported platform, compilation fails. - - A program that is linked with PCRE 8.20 or later can tell if JIT sup- - port is available by calling pcre_config() with the PCRE_CONFIG_JIT - option. The result is 1 when JIT is available, and 0 otherwise. How- - ever, a simple program does not need to check this in order to use JIT. 
- The normal API is implemented in a way that falls back to the interpre- - tive code if JIT is not available. For programs that need the best pos- - sible performance, there is also a "fast path" API that is JIT-spe- - cific. - - If your program may sometimes be linked with versions of PCRE that are - older than 8.20, but you want to use JIT when it is available, you can - test the values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT - macro such as PCRE_CONFIG_JIT, for compile-time control of your code. - Also beware that the pcre_jit_exec() function was not available at all - before 8.32, and may not be available at all if PCRE isn't compiled - with --enable-jit. See the "JIT FAST PATH API" section below for - details. - - -SIMPLE USE OF JIT - - You have to do two things to make use of the JIT support in the sim- - plest way: - - (1) Call pcre_study() with the PCRE_STUDY_JIT_COMPILE option for - each compiled pattern, and pass the resulting pcre_extra block to - pcre_exec(). - - (2) Use pcre_free_study() to free the pcre_extra block when it is - no longer needed, instead of just freeing it yourself. This - ensures that - any JIT data is also freed. - - For a program that may be linked with pre-8.20 versions of PCRE, you - can insert - - #ifndef PCRE_STUDY_JIT_COMPILE - #define PCRE_STUDY_JIT_COMPILE 0 - #endif - - so that no option is passed to pcre_study(), and then use something - like this to free the study data: - - #ifdef PCRE_CONFIG_JIT - pcre_free_study(study_ptr); - #else - pcre_free(study_ptr); - #endif - - PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for - complete matches. 
If you want to run partial matches using the - PCRE_PARTIAL_HARD or PCRE_PARTIAL_SOFT options of pcre_exec(), you - should set one or both of the following options in addition to, or - instead of, PCRE_STUDY_JIT_COMPILE when you call pcre_study(): - - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE - - If using pcre_jit_exec() and supporting a pre-8.32 version of PCRE, you - can insert: - - #if PCRE_MAJOR > 8 || (PCRE_MAJOR == 8 && PCRE_MINOR >= 32) - pcre_jit_exec(...); - #else - pcre_exec(...); - #endif - - but as described in the "JIT FAST PATH API" section below this assumes - version 8.32 and later are compiled with --enable-jit, which may break. - - The JIT compiler generates different optimized code for each of the - three modes (normal, soft partial, hard partial). When pcre_exec() is - called, the appropriate code is run if it is available. Otherwise, the - pattern is matched using interpretive code. - - In some circumstances you may need to call additional functions. These - are described in the section entitled "Controlling the JIT stack" - below. - - If JIT support is not available, PCRE_STUDY_JIT_COMPILE etc. are - ignored, and no JIT data is created. Otherwise, the compiled pattern is - passed to the JIT compiler, which turns it into machine code that exe- - cutes much faster than the normal interpretive code. When pcre_exec() - is passed a pcre_extra block containing a pointer to JIT code of the - appropriate mode (normal or hard/soft partial), it obeys that code - instead of running the interpreter. The result is identical, but the - compiled JIT code runs much faster. - - There are some pcre_exec() options that are not supported for JIT exe- - cution. There are also some pattern items that JIT cannot handle. - Details are given below. In both cases, execution automatically falls - back to the interpretive code.
If you want to know whether JIT was - actually used for a particular match, you should arrange for a JIT - callback function to be set up as described in the section entitled - "Controlling the JIT stack" below, even if you do not need to supply a - non-default JIT stack. Such a callback function is called whenever JIT - code is about to be obeyed. If the execution options are not right for - JIT execution, the callback function is not obeyed. - - If the JIT compiler finds an unsupported item, no JIT data is gener- - ated. You can find out if JIT execution is available after studying a - pattern by calling pcre_fullinfo() with the PCRE_INFO_JIT option. A - result of 1 means that JIT compilation was successful. A result of 0 - means that JIT support is not available, or the pattern was not studied - with PCRE_STUDY_JIT_COMPILE etc., or the JIT compiler was not able to - handle the pattern. - - Once a pattern has been studied, with or without JIT, it can be used as - many times as you like for matching different subject strings. - - -UNSUPPORTED OPTIONS AND PATTERN ITEMS - - The only pcre_exec() options that are supported for JIT execution are - PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NO_UTF32_CHECK, PCRE_NOT- - BOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PAR- - TIAL_HARD, and PCRE_PARTIAL_SOFT. - - The only unsupported pattern items are \C (match a single data unit) - when running in a UTF mode, and a callout immediately before an asser- - tion condition in a conditional group. - - -RETURN VALUES FROM JIT EXECUTION - - When a pattern is matched using JIT execution, the return values are - the same as those given by the interpretive pcre_exec() code, with the - addition of one new error code: PCRE_ERROR_JIT_STACKLIMIT. This means - that the memory used for the JIT stack was insufficient. See "Control- - ling the JIT stack" below for a discussion of JIT stack usage. 
For com- - patibility with the interpretive pcre_exec() code, no more than two- - thirds of the ovector argument is used for passing back captured sub- - strings. - - The error code PCRE_ERROR_MATCHLIMIT is returned by the JIT code if - searching a very large pattern tree goes on for too long, as it is in - the same circumstance when JIT is not used, but the details of exactly - what is counted are not the same. The PCRE_ERROR_RECURSIONLIMIT error - code is never returned by JIT execution. - - -SAVING AND RESTORING COMPILED PATTERNS - - The code that is generated by the JIT compiler is architecture-spe- - cific, and is also position dependent. For those reasons it cannot be - saved (in a file or database) and restored later like the bytecode and - other data of a compiled pattern. Saving and restoring compiled pat- - terns is not something many people do. More detail about this facility - is given in the pcreprecompile documentation. It should be possible to - run pcre_study() on a saved and restored pattern, and thereby recreate - the JIT data, but because JIT compilation uses significant resources, - it is probably not worth doing this; you might as well recompile the - original pattern. - - -CONTROLLING THE JIT STACK - - When the compiled JIT code runs, it needs a block of memory to use as a - stack. By default, it uses 32K on the machine stack. However, some - large or complicated patterns need more than this. The error - PCRE_ERROR_JIT_STACKLIMIT is given when there is not enough stack. - Three functions are provided for managing blocks of memory for use as - JIT stacks. There is further discussion about the use of JIT stacks in - the section entitled "JIT stack FAQ" below. - - The pcre_jit_stack_alloc() function creates a JIT stack. Its arguments - are a starting size and a maximum size, and it returns a pointer to an - opaque structure of type pcre_jit_stack, or NULL if there is an error. 
- The pcre_jit_stack_free() function can be used to free a stack that is - no longer needed. (For the technically minded: the address space is - allocated by mmap or VirtualAlloc.) - - JIT uses far less memory for recursion than the interpretive code, and - a maximum stack size of 512K to 1M should be more than enough for any - pattern. - - The pcre_assign_jit_stack() function specifies which stack JIT code - should use. Its arguments are as follows: - - pcre_extra *extra - pcre_jit_callback callback - void *data - - The extra argument must be the result of studying a pattern with - PCRE_STUDY_JIT_COMPILE etc. There are three cases for the values of the - other two options: - - (1) If callback is NULL and data is NULL, an internal 32K block - on the machine stack is used. - - (2) If callback is NULL and data is not NULL, data must be - a valid JIT stack, the result of calling pcre_jit_stack_alloc(). - - (3) If callback is not NULL, it must point to a function that is - called with data as an argument at the start of matching, in - order to set up a JIT stack. If the return from the callback - function is NULL, the internal 32K stack is used; otherwise the - return value must be a valid JIT stack, the result of calling - pcre_jit_stack_alloc(). - - A callback function is obeyed whenever JIT code is about to be run; it - is not obeyed when pcre_exec() is called with options that are incom- - patible for JIT execution. A callback function can therefore be used to - determine whether a match operation was executed by JIT or by the - interpreter. - - You may safely use the same JIT stack for more than one pattern (either - by assigning directly or by callback), as long as the patterns are all - matched sequentially in the same thread. In a multithread application, - if you do not specify a JIT stack, or if you assign or pass back NULL - from a callback, that is thread-safe, because each thread has its own - machine stack. 
However, if you assign or pass back a non-NULL JIT - stack, this must be a different stack for each thread so that the - application is thread-safe. - - Strictly speaking, even more is allowed. You can assign the same non- - NULL stack to any number of patterns as long as they are not used for - matching by multiple threads at the same time. For example, you can - assign the same stack to all compiled patterns, and use a global mutex - in the callback to wait until the stack is available for use. However, - this is an inefficient solution, and not recommended. - - This is a suggestion for how a multithreaded program that needs to set - up non-default JIT stacks might operate: - - During thread initialization - thread_local_var = pcre_jit_stack_alloc(...) - - During thread exit - pcre_jit_stack_free(thread_local_var) - - Use a one-line callback function - return thread_local_var - - All the functions described in this section do nothing if JIT is not - available, and pcre_assign_jit_stack() does nothing unless the extra - argument is non-NULL and points to a pcre_extra block that is the - result of a successful study with PCRE_STUDY_JIT_COMPILE etc. - - -JIT STACK FAQ - - (1) Why do we need JIT stacks? - - PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack - where the local data of the current node is pushed before checking its - child nodes. Allocating real machine stack on some platforms is diffi- - cult. For example, the stack chain needs to be updated every time if we - extend the stack on PowerPC. Although it is possible, its updating - time overhead decreases performance. So we do the recursion in memory. - - (2) Why don't we simply allocate blocks of memory with malloc()? - - Modern operating systems have a nice feature: they can reserve an - address space instead of allocating memory.
We can safely allocate mem- - ory pages inside this address space, so the stack could grow without - moving memory data (this is important because of pointers). Thus we can - allocate 1M address space, and use only a single memory page (usually - 4K) if that is enough. However, we can still grow up to 1M anytime if - needed. - - (3) Who "owns" a JIT stack? - - The owner of the stack is the user program, not the JIT studied pattern - or anything else. The user program must ensure that if a stack is used - by pcre_exec(), (that is, it is assigned to the pattern currently run- - ning), that stack must not be used by any other threads (to avoid over- - writing the same memory area). The best practice for multithreaded pro- - grams is to allocate a stack for each thread, and return this stack - through the JIT callback function. - - (4) When should a JIT stack be freed? - - You can free a JIT stack at any time, as long as it will not be used by - pcre_exec() again. When you assign the stack to a pattern, only a - pointer is set. There is no reference counting or any other magic. You - can free the patterns and stacks in any order, anytime. Just do not - call pcre_exec() with a pattern pointing to an already freed stack, as - that will cause SEGFAULT. (Also, do not free a stack currently used by - pcre_exec() in another thread). You can also replace the stack for a - pattern at any time. You can even free the previous stack before - assigning a replacement. - - (5) Should I allocate/free a stack every time before/after calling - pcre_exec()? - - No, because this is too costly in terms of resources. However, you - could implement some clever idea which releases the stack if it is not - used in let's say two minutes. The JIT callback can help to achieve - this without keeping a list of the currently JIT studied patterns. - - (6) OK, the stack is for long term memory allocation. But what happens - if a pattern causes stack overflow with a stack of 1M?
Is that 1M kept - until the stack is freed? - - Especially on embedded systems, it might be a good idea to release mem- - ory sometimes without freeing the stack. There is no API for this at - the moment. Probably a function call which returns with the currently - allocated memory for any stack and another which allows releasing mem- - ory (shrinking the stack) would be a good idea if someone needs this. - - (7) This is too much of a headache. Isn't there any better solution for - JIT stack handling? - - No, thanks to Windows. If POSIX threads were used everywhere, we could - throw out this complicated API. - - -EXAMPLE CODE - - This is a single-threaded example that specifies a JIT stack without - using a callback. - - int rc; - int ovector[30]; - pcre *re; - pcre_extra *extra; - pcre_jit_stack *jit_stack; - - re = pcre_compile(pattern, 0, &error, &erroffset, NULL); - /* Check for errors */ - extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &error); - jit_stack = pcre_jit_stack_alloc(32*1024, 512*1024); - /* Check for error (NULL) */ - pcre_assign_jit_stack(extra, NULL, jit_stack); - rc = pcre_exec(re, extra, subject, length, 0, 0, ovector, 30); - /* Check results */ - pcre_free(re); - pcre_free_study(extra); - pcre_jit_stack_free(jit_stack); - - -JIT FAST PATH API - - Because the API described above falls back to interpreted execution - when JIT is not available, it is convenient for programs that are writ- - ten for general use in many environments. However, calling JIT via - pcre_exec() does have a performance impact. Programs that are written - for use where JIT is known to be available, and which need the best - possible performance, can instead use a "fast path" API to call JIT - execution directly instead of calling pcre_exec() (obviously only for - patterns that have been successfully studied by JIT).
- - The fast path function is called pcre_jit_exec(), and it takes exactly - the same arguments as pcre_exec(), plus one additional argument that - must point to a JIT stack. The JIT stack arrangements described above - do not apply. The return values are the same as for pcre_exec(). - - When you call pcre_exec(), as well as testing for invalid options, a - number of other sanity checks are performed on the arguments. For exam- - ple, if the subject pointer is NULL, or its length is negative, an - immediate error is given. Also, unless PCRE_NO_UTF[8|16|32] is set, a - UTF subject string is tested for validity. In the interests of speed, - these checks do not happen on the JIT fast path, and if invalid data is - passed, the result is undefined. - - Bypassing the sanity checks and the pcre_exec() wrapping can give - speedups of more than 10%. - - Note that the pcre_jit_exec() function is not available in versions of - PCRE before 8.32 (released in November 2012). If you need to support - versions that old you must either use the slower pcre_exec(), or switch - between the two codepaths by checking the values of PCRE_MAJOR and - PCRE_MINOR. - - Due to an unfortunate implementation oversight, even in versions 8.32 - and later there will be no pcre_jit_exec() stub function defined when - PCRE is compiled with --disable-jit, which is the default, and there's - no way to detect whether PCRE was compiled with --enable-jit via a - macro. - - If you need to support versions older than 8.32, or versions that may - not build with --enable-jit, you must either use the slower - pcre_exec(), or switch between the two codepaths by checking the values - of PCRE_MAJOR and PCRE_MINOR. - - Switching between the two by checking the version assumes that all the - versions being targeted are built with --enable-jit. 
To also support - builds that may use --disable-jit either pcre_exec() must be used, or a - compile-time check for JIT via pcre_config() (which assumes the runtime - environment will be the same), or as the Git project decided to do, - simply assume that pcre_jit_exec() is present in 8.32 or later unless a - compile-time flag is provided, see the "grep: un-break building with - PCRE >= 8.32 without --enable-jit" commit in git.git for an example of - that. - - -SEE ALSO - - pcreapi(3) - - -AUTHOR - - Philip Hazel (FAQ by Zoltan Herczeg) - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 05 July 2017 - Copyright (c) 1997-2017 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREPARTIAL(3) Library Functions Manual PCREPARTIAL(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PARTIAL MATCHING IN PCRE - - In normal use of PCRE, if the subject string that is passed to a match- - ing function matches as far as it goes, but is too short to match the - entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances - where it might be helpful to distinguish this case from other cases in - which there is no match. - - Consider, for example, an application where a human is required to type - in data for a field with specific formatting requirements. An example - might be a date in the form ddmmmyy, defined by this pattern: - - ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$ - - If the application sees the user's keystrokes one by one, and can check - that what has been typed so far is potentially valid, it is able to - raise an error as soon as a mistake is made, by beeping and not - reflecting the character that has been typed, for example. This immedi- - ate feedback is likely to be a better user interface than a check that - is delayed until the entire string has been entered. 
Partial matching - can also be useful when the subject string is very long and is not all - available at once. - - PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and - PCRE_PARTIAL_HARD options, which can be set when calling any of the - matching functions. For backwards compatibility, PCRE_PARTIAL is a syn- - onym for PCRE_PARTIAL_SOFT. The essential difference between the two - options is whether or not a partial match is preferred to an alterna- - tive complete match, though the details differ between the two types of - matching function. If both options are set, PCRE_PARTIAL_HARD takes - precedence. - - If you want to use partial matching with just-in-time optimized code, - you must call pcre_study(), pcre16_study() or pcre32_study() with one - or both of these options: - - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE - - PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non- - partial matches on the same pattern. If the appropriate JIT study mode - has not been set for a match, the interpretive matching code is used. - - Setting a partial matching option disables two of PCRE's standard opti- - mizations. PCRE remembers the last literal data unit in a pattern, and - abandons matching immediately if it is not present in the subject - string. This optimization cannot be used for a subject string that - might match only partially. If the pattern was studied, PCRE knows the - minimum length of a matching string, and does not bother to run the - matching function on shorter strings. This optimization is also dis- - abled for partial matching. - - -PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec() - - A partial match occurs during a call to pcre_exec() or - pcre[16|32]_exec() when the end of the subject string is reached suc- - cessfully, but matching cannot continue because more characters are - needed. However, at least one character in the subject must have been - inspected. 
This character need not form part of the final matched - string; lookbehind assertions and the \K escape sequence provide ways - of inspecting characters before the start of a matched substring. The - requirement for inspecting at least one character exists because an - empty string can always be matched; without such a restriction there - would always be a partial match of an empty string at the end of the - subject. - - If there are at least two slots in the offsets vector when a partial - match is returned, the first slot is set to the offset of the earliest - character that was inspected. For convenience, the second offset points - to the end of the subject so that a substring can easily be identified. - If there are at least three slots in the offsets vector, the third slot - is set to the offset of the character where matching started. - - For the majority of patterns, the contents of the first and third slots - will be the same. However, for patterns that contain lookbehind asser- - tions, or begin with \b or \B, characters before the one where matching - started may have been inspected while carrying out the match. For exam- - ple, consider this pattern: - - /(?<=abc)123/ - - This pattern matches "123", but only if it is preceded by "abc". If the - subject string is "xyzabc12", the first two offsets after a partial - match are for the substring "abc12", because all these characters were - inspected. However, the third offset is set to 6, because that is the - offset where matching began. - - What happens when a partial match is identified depends on which of the - two partial matching options are set. - - PCRE_PARTIAL_SOFT WITH pcre_exec() OR pcre[16|32]_exec() - - If PCRE_PARTIAL_SOFT is set when pcre_exec() or pcre[16|32]_exec() - identifies a partial match, the partial match is remembered, but match- - ing continues as normal, and other alternatives in the pattern are - tried. 
If no complete match can be found, PCRE_ERROR_PARTIAL is - returned instead of PCRE_ERROR_NOMATCH. - - This option is "soft" because it prefers a complete match over a par- - tial match. All the various matching items in a pattern behave as if - the subject string is potentially complete. For example, \z, \Z, and $ - match at the end of the subject, as normal, and for \b and \B the end - of the subject is treated as a non-alphanumeric. - - If there is more than one partial match, the first one that was found - provides the data that is returned. Consider this pattern: - - /123\w+X|dogY/ - - If this is matched against the subject string "abc123dog", both alter- - natives fail to match, but the end of the subject is reached during - matching, so PCRE_ERROR_PARTIAL is returned. The offsets are set to 3 - and 9, identifying "123dog" as the first partial match that was found. - (In this example, there are two partial matches, because "dog" on its - own partially matches the second alternative.) - - PCRE_PARTIAL_HARD WITH pcre_exec() OR pcre[16|32]_exec() - - If PCRE_PARTIAL_HARD is set for pcre_exec() or pcre[16|32]_exec(), - PCRE_ERROR_PARTIAL is returned as soon as a partial match is found, - without continuing to search for possible complete matches. This option - is "hard" because it prefers an earlier partial match over a later com- - plete match. For this reason, the assumption is made that the end of - the supplied subject string may not be the true end of the available - data, and so, if \z, \Z, \b, \B, or $ are encountered at the end of the - subject, the result is PCRE_ERROR_PARTIAL, provided that at least one - character in the subject has been inspected. - - Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16 subject - strings are checked for validity. Normally, an invalid sequence causes - the error PCRE_ERROR_BADUTF8 or PCRE_ERROR_BADUTF16. 
However, in the - special case of a truncated character at the end of the subject, - PCRE_ERROR_SHORTUTF8 or PCRE_ERROR_SHORTUTF16 is returned when - PCRE_PARTIAL_HARD is set. - - Comparing hard and soft partial matching - - The difference between the two partial matching options can be illus- - trated by a pattern such as: - - /dog(sbody)?/ - - This matches either "dog" or "dogsbody", greedily (that is, it prefers - the longer string if possible). If it is matched against the string - "dog" with PCRE_PARTIAL_SOFT, it yields a complete match for "dog". - However, if PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. - On the other hand, if the pattern is made ungreedy the result is dif- - ferent: - - /dog(sbody)??/ - - In this case the result is always a complete match because that is - found first, and matching never continues after finding a complete - match. It might be easier to follow this explanation by thinking of the - two patterns like this: - - /dog(sbody)?/ is the same as /dogsbody|dog/ - /dog(sbody)??/ is the same as /dog|dogsbody/ - - The second pattern will never match "dogsbody", because it will always - find the shorter match first. - - -PARTIAL MATCHING USING pcre_dfa_exec() OR pcre[16|32]_dfa_exec() - - The DFA functions move along the subject string character by character, - without backtracking, searching for all possible matches simultane- - ously. If the end of the subject is reached before the end of the pat- - tern, there is the possibility of a partial match, again provided that - at least one character has been inspected. - - When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if - there have been no complete matches. Otherwise, the complete matches - are returned. However, if PCRE_PARTIAL_HARD is set, a partial match - takes precedence over any complete matches. 
The portion of the string - that was inspected when the longest partial match was found is set as - the first matching string, provided there are at least two slots in the - offsets vector. - - Because the DFA functions always search for all possible matches, and - there is no difference between greedy and ungreedy repetition, their - behaviour is different from the standard functions when PCRE_PAR- - TIAL_HARD is set. Consider the string "dog" matched against the - ungreedy pattern shown above: - - /dog(sbody)??/ - - Whereas the standard functions stop as soon as they find the complete - match for "dog", the DFA functions also find the partial match for - "dogsbody", and so return that when PCRE_PARTIAL_HARD is set. - - -PARTIAL MATCHING AND WORD BOUNDARIES - - If a pattern ends with one of sequences \b or \B, which test for word - boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter- - intuitive results. Consider this pattern: - - /\bcat\b/ - - This matches "cat", provided there is a word boundary at either end. If - the subject string is "the cat", the comparison of the final "t" with a - following character cannot take place, so a partial match is found. - However, normal matching carries on, and \b matches at the end of the - subject when the last character is a letter, so a complete match is - found. The result, therefore, is not PCRE_ERROR_PARTIAL. Using - PCRE_PARTIAL_HARD in this case does yield PCRE_ERROR_PARTIAL, because - then the partial match takes precedence. - - -FORMERLY RESTRICTED PATTERNS - - For releases of PCRE prior to 8.00, because of the way certain internal - optimizations were implemented in the pcre_exec() function, the - PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be - used with all patterns. From release 8.00 onwards, the restrictions no - longer apply, and partial matching can be requested for any pat- - tern.
- - Items that were formerly restricted were repeated single characters and - repeated metasequences. If PCRE_PARTIAL was set for a pattern that did - not conform to the restrictions, pcre_exec() returned the error code - PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in use. The - PCRE_INFO_OKPARTIAL call to pcre_fullinfo() to find out if a compiled - pattern can be used for partial matching now always returns 1. - - -EXAMPLE OF PARTIAL MATCHING USING PCRETEST - - If the escape sequence \P is present in a pcretest data line, the - PCRE_PARTIAL_SOFT option is used for the match. Here is a run of - pcretest that uses the date example quoted above: - - re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ - data> 25jun04\P - 0: 25jun04 - 1: jun - data> 25dec3\P - Partial match: 25dec3 - data> 3ju\P - Partial match: 3ju - data> 3juj\P - No match - data> j\P - No match - - The first data string is matched completely, so pcretest shows the - matched substrings. The remaining four strings do not match the com- - plete pattern, but the first two are partial matches. Similar output is - obtained if DFA matching is used. - - If the escape sequence \P is present more than once in a pcretest data - line, the PCRE_PARTIAL_HARD option is set for the match. - - -MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16|32]_dfa_exec() - - When a partial match has been found using a DFA matching function, it - is possible to continue the match by providing additional subject data - and calling the function again with the same compiled regular expres- - sion, this time setting the PCRE_DFA_RESTART option. You must pass the - same working space as before, because this is where details of the pre- - vious partial match are stored.
Here is an example using pcretest, - using the \R escape sequence to set the PCRE_DFA_RESTART option (\D - specifies the use of the DFA matching function): - - re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ - data> 23ja\P\D - Partial match: 23ja - data> n05\R\D - 0: n05 - - The first call has "23ja" as the subject, and requests partial match- - ing; the second call has "n05" as the subject for the continued - (restarted) match. Notice that when the match is complete, only the - last part is shown; PCRE does not retain the previously partially- - matched string. It is up to the calling program to do that if it needs - to. - - That means that, for an unanchored pattern, if a continued match fails, - it is not possible to try again at a new starting point. All this - facility is capable of doing is continuing with the previous match - attempt. In the previous example, if the second set of data is "ug23" - the result is no match, even though there would be a match for "aug23" - if the entire string were given at once. Depending on the application, - this may or may not be what you want. The only way to allow for start- - ing again at the next character is to retain the matched part of the - subject and try a new complete match. - - You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with - PCRE_DFA_RESTART to continue partial matching over multiple segments. - This facility can be used to pass very long subject strings to the DFA - matching functions. - - -MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre[16|32]_exec() - - From release 8.00, the standard matching functions can also be used to - do multi-segment matching. Unlike the DFA functions, it is not possible - to restart the previous match with a new segment of data. Instead, new - data must be added to the previous subject string, and the entire match - re-run, starting from the point where the partial match occurred. Ear- - lier data can be discarded. 
- - It is best to use PCRE_PARTIAL_HARD in this situation, because it does - not treat the end of a segment as the end of the subject when matching - \z, \Z, \b, \B, and $. Consider an unanchored pattern that matches - dates: - - re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/ - data> The date is 23ja\P\P - Partial match: 23ja - - At this stage, an application could discard the text preceding "23ja", - add on text from the next segment, and call the matching function - again. Unlike the DFA matching functions, the entire matching string - must always be available, and the complete matching process occurs for - each call, so more memory and more processing time is needed. - - Note: If the pattern contains lookbehind assertions, or \K, or starts - with \b or \B, the string that is returned for a partial match includes - characters that precede the start of what would be returned for a com- - plete match, because it contains all the characters that were inspected - during the partial match. - - -ISSUES WITH MULTI-SEGMENT MATCHING - - Certain types of pattern may give problems with multi-segment matching, - whichever matching function is used. - - 1. If the pattern contains a test for the beginning of a line, you need - to pass the PCRE_NOTBOL option when the subject string for any call - does not start at the beginning of a line. There is also a PCRE_NOTEOL - option, but in practice when doing multi-segment matching you should be - using PCRE_PARTIAL_HARD, which includes the effect of PCRE_NOTEOL. - - 2. Lookbehind assertions that have already been obeyed are catered for - in the offsets that are returned for a partial match. However a lookbe- - hind assertion later in the pattern could require even earlier charac- - ters to be inspected. You can handle this case by using the - PCRE_INFO_MAXLOOKBEHIND option of the pcre_fullinfo() or - pcre[16|32]_fullinfo() functions to obtain the length of the longest - lookbehind in the pattern.
This length is given in characters, not - bytes. If you always retain at least that many characters before the - partially matched string, all should be well. (Of course, near the - start of the subject, fewer characters may be present; in that case all - characters should be retained.) - - From release 8.33, there is a more accurate way of deciding which char- - acters to retain. Instead of subtracting the length of the longest - lookbehind from the earliest inspected character (offsets[0]), the - match start position (offsets[2]) should be used, and the next match - attempt started at the offsets[2] character by setting the startoffset - argument of pcre_exec() or pcre_dfa_exec(). - - For example, if the pattern "(?<=123)abc" is partially matched against - the string "xx123a", the three offset values returned are 2, 6, and 5. - This indicates that the matching process that gave a partial match - started at offset 5, but the characters "123a" were all inspected. The - maximum lookbehind for that pattern is 3, so taking that away from 5 - shows that we need only keep "123a", and the next match attempt can be - started at offset 3 (that is, at "a") when further characters have been - added. When the match start is not the earliest inspected character, - pcretest shows it explicitly: - - re> "(?<=123)abc" - data> xx123a\P\P - Partial match at offset 5: 123a - - 3. Because a partial match must always contain at least one character, - what might be considered a partial match of an empty string actually - gives a "no match" result. For example: - - re> /c(?<=abc)x/ - data> ab\P - No match - - If the next segment begins "cx", a match should be found, but this will - only happen if characters from the previous segment are retained. For - this reason, a "no match" result should be interpreted as "partial - match of an empty string" when the pattern contains lookbehinds. - - 4. 
Matching a subject string that is split into multiple segments may - not always produce exactly the same result as matching over one single - long string, especially when PCRE_PARTIAL_SOFT is used. The section - "Partial Matching and Word Boundaries" above describes an issue that - arises if the pattern ends with \b or \B. Another kind of difference - may occur when there are multiple matching possibilities, because (for - PCRE_PARTIAL_SOFT) a partial match result is given only when there are - no completed matches. This means that as soon as the shortest match has - been found, continuation to a new subject segment is no longer possi- - ble. Consider again this pcretest example: - - re> /dog(sbody)?/ - data> dogsb\P - 0: dog - data> do\P\D - Partial match: do - data> gsb\R\P\D - 0: g - data> dogsbody\D - 0: dogsbody - 1: dog - - The first data line passes the string "dogsb" to a standard matching - function, setting the PCRE_PARTIAL_SOFT option. Although the string is - a partial match for "dogsbody", the result is not PCRE_ERROR_PARTIAL, - because the shorter string "dog" is a complete match. Similarly, when - the subject is presented to a DFA matching function in several parts - ("do" and "gsb" being the first two) the match stops when "dog" has - been found, and it is not possible to continue. On the other hand, if - "dogsbody" is presented as a single string, a DFA matching function - finds both matches. - - Because of these problems, it is best to use PCRE_PARTIAL_HARD when - matching multi-segment data. The example above then behaves differ- - ently: - - re> /dog(sbody)?/ - data> dogsb\P\P - Partial match: dogsb - data> do\P\D - Partial match: do - data> gsb\R\P\P\D - Partial match: gsb - - 5. Patterns that contain alternatives at the top level which do not all - start with the same pattern item may not work as expected when - PCRE_DFA_RESTART is used. 
For example, consider this pattern: - - 1234|3789 - - If the first part of the subject is "ABC123", a partial match of the - first alternative is found at offset 3. There is no partial match for - the second alternative, because such a match does not start at the same - point in the subject string. Attempting to continue with the string - "7890" does not yield a match because only those alternatives that - match at one point in the subject are remembered. The problem arises - because the start of the second alternative matches within the first - alternative. There is no problem with anchored patterns or patterns - such as: - - 1234|ABCD - - where no string can be a partial match for both alternatives. This is - not a problem if a standard matching function is used, because the - entire match has to be rerun each time: - - re> /1234|3789/ - data> ABC123\P\P - Partial match: 123 - data> 1237890 - 0: 3789 - - Of course, instead of using PCRE_DFA_RESTART, the same technique of re- - running the entire match can also be used with the DFA matching func- - tions. Another possibility is to work with two buffers. If a partial - match at offset n in the first buffer is followed by "no match" when - PCRE_DFA_RESTART is used on the second buffer, you can then try a new - match starting at offset n+1 in the first buffer. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 02 July 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREPRECOMPILE(3) Library Functions Manual PCREPRECOMPILE(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -SAVING AND RE-USING PRECOMPILED PCRE PATTERNS - - If you are running an application that uses a large number of regular - expression patterns, it may be useful to store them in a precompiled - form instead of having to compile them every time the application is - run. 
If you are not using any private character tables (see the - pcre_maketables() documentation), this is relatively straightforward. - If you are using private tables, it is a little bit more complicated. - However, if you are using the just-in-time optimization feature, it is - not possible to save and reload the JIT data. - - If you save compiled patterns to a file, you can copy them to a differ- - ent host and run them there. If the two hosts have different endianness - (byte order), you should run the pcre[16|32]_pat- - tern_to_host_byte_order() function on the new host before trying to - match the pattern. The matching functions return PCRE_ERROR_BADENDIAN- - NESS if they detect a pattern with the wrong endianness. - - Compiling regular expressions with one version of PCRE for use with a - different version is not guaranteed to work and may cause crashes, and - saving and restoring a compiled pattern loses any JIT optimization - data. - - -SAVING A COMPILED PATTERN - - The value returned by pcre[16|32]_compile() points to a single block of - memory that holds the compiled pattern and associated data. You can - find the length of this block in bytes by calling - pcre[16|32]_fullinfo() with an argument of PCRE_INFO_SIZE. You can then - save the data in any appropriate manner. Here is sample code for the - 8-bit library that compiles a pattern and writes it to a file. It - assumes that the variable fd refers to a file that is open for output: - - int erroroffset, rc, size; - char *error; - pcre *re; - - re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL); - if (re == NULL) { ... handle errors ... } - rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size); - if (rc < 0) { ... handle errors ... } - rc = fwrite(re, 1, size, fd); - if (rc != size) { ... handle errors ... } - - In this example, the bytes that comprise the compiled pattern are - copied exactly. Note that this is binary data that may contain any of - the 256 possible byte values. 
On systems that make a distinction - between binary and non-binary data, be sure that the file is opened for - binary output. - - If you want to write more than one pattern to a file, you will have to - devise a way of separating them. For binary data, preceding each pat- - tern with its length is probably the most straightforward approach. - Another possibility is to write out the data in hexadecimal instead of - binary, one pattern to a line. - - Saving compiled patterns in a file is only one possible way of storing - them for later use. They could equally well be saved in a database, or - in the memory of some daemon process that passes them via sockets to - the processes that want them. - - If the pattern has been studied, it is also possible to save the normal - study data in a similar way to the compiled pattern itself. However, if - the PCRE_STUDY_JIT_COMPILE was used, the just-in-time data that is cre- - ated cannot be saved because it is too dependent on the current envi- - ronment. When studying generates additional information, - pcre[16|32]_study() returns a pointer to a pcre[16|32]_extra data - block. Its format is defined in the section on matching a pattern in - the pcreapi documentation. The study_data field points to the binary - study data, and this is what you must save (not the pcre[16|32]_extra - block itself). The length of the study data can be obtained by calling - pcre[16|32]_fullinfo() with an argument of PCRE_INFO_STUDYSIZE. Remem- - ber to check that pcre[16|32]_study() did return a non-NULL value - before trying to save the study data. - - -RE-USING A PRECOMPILED PATTERN - - Re-using a precompiled pattern is straightforward. Having reloaded it - into main memory, called pcre[16|32]_pattern_to_host_byte_order() if - necessary, you pass its pointer to pcre[16|32]_exec() or - pcre[16|32]_dfa_exec() in the usual way. 
- - However, if you passed a pointer to custom character tables when the - pattern was compiled (the tableptr argument of pcre[16|32]_compile()), - you must now pass a similar pointer to pcre[16|32]_exec() or - pcre[16|32]_dfa_exec(), because the value saved with the compiled pat- - tern will obviously be nonsense. A field in a pcre[16|32]_extra() block - is used to pass this data, as described in the section on matching a - pattern in the pcreapi documentation. - - Warning: The tables that pcre_exec() and pcre_dfa_exec() use must be - the same as those that were used when the pattern was compiled. If this - is not the case, the behaviour is undefined. - - If you did not provide custom character tables when the pattern was - compiled, the pointer in the compiled pattern is NULL, which causes the - matching functions to use PCRE's internal tables. Thus, you do not need - to take any special action at run time in this case. - - If you saved study data with the compiled pattern, you need to create - your own pcre[16|32]_extra data block and set the study_data field to - point to the reloaded study data. You must also set the - PCRE_EXTRA_STUDY_DATA bit in the flags field to indicate that study - data is present. Then pass the pcre[16|32]_extra block to the matching - function in the usual way. If the pattern was studied for just-in-time - optimization, that data cannot be saved, and so is lost by a - save/restore cycle. - - -COMPATIBILITY WITH DIFFERENT PCRE RELEASES - - In general, it is safest to recompile all saved patterns when you - update to a new PCRE release, though not all updates actually require - this. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 12 November 2013 - Copyright (c) 1997-2013 University of Cambridge. 
------------------------------------------------------------------------------- - - -PCREPERFORM(3) Library Functions Manual PCREPERFORM(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE PERFORMANCE - - Two aspects of performance are discussed below: memory usage and pro- - cessing time. The way you express your pattern as a regular expression - can affect both of them. - - -COMPILED PATTERN MEMORY USAGE - - Patterns are compiled by PCRE into a reasonably efficient interpretive - code, so that most simple patterns do not use much memory. However, - there is one case where the memory usage of a compiled pattern can be - unexpectedly large. If a parenthesized subpattern has a quantifier with - a minimum greater than 1 and/or a limited maximum, the whole subpattern - is repeated in the compiled code. For example, the pattern - - (abc|def){2,4} - - is compiled as if it were - - (abc|def)(abc|def)((abc|def)(abc|def)?)? - - (Technical aside: It is done this way so that backtrack points within - each of the repetitions can be independently maintained.) - - For regular expressions whose quantifiers use only small numbers, this - is not usually a problem. However, if the numbers are large, and par- - ticularly if such repetitions are nested, the memory usage can become - an embarrassment. For example, the very simple pattern - - ((ab){1,1000}c){1,3} - - uses 51K bytes when compiled using the 8-bit library. When PCRE is com- - piled with its default internal pointer size of two bytes, the size - limit on a compiled pattern is 64K data units, and this is reached with - the above pattern if the outer repetition is increased from 3 to 4. - PCRE can be compiled to use larger internal pointers and thus handle - larger compiled patterns, but it is better to try to rewrite your pat- - tern to use less memory if you can. - - One way of reducing the memory usage for such patterns is to make use - of PCRE's "subroutine" facility. 
Re-writing the above pattern as - - ((ab)(?2){0,999}c)(?1){0,2} - - reduces the memory requirements to 18K, and indeed it remains under 20K - even with the outer repetition increased to 100. However, this pattern - is not exactly equivalent, because the "subroutine" calls are treated - as atomic groups into which there can be no backtracking if there is a - subsequent matching failure. Therefore, PCRE cannot do this kind of - rewriting automatically. Furthermore, there is a noticeable loss of - speed when executing the modified pattern. Nevertheless, if the atomic - grouping is not a problem and the loss of speed is acceptable, this - kind of rewriting will allow you to process patterns that PCRE cannot - otherwise handle. - - -STACK USAGE AT RUN TIME - - When pcre_exec() or pcre[16|32]_exec() is used for matching, certain - kinds of pattern can cause it to use large amounts of the process - stack. In some environments the default process stack is quite small, - and if it runs out the result is often SIGSEGV. This issue is probably - the most frequently raised problem with PCRE. Rewriting your pattern - can often help. The pcrestack documentation discusses this issue in - detail. - - -PROCESSING TIME - - Certain items in regular expression patterns are processed more effi- - ciently than others. It is more efficient to use a character class like - [aeiou] than a set of single-character alternatives such as - (a|e|i|o|u). In general, the simplest construction that provides the - required behaviour is usually the most efficient. Jeffrey Friedl's book - contains a lot of useful general discussion about optimizing regular - expressions for efficient performance. This document contains a few - observations about PCRE. - - Using Unicode character properties (the \p, \P, and \X escapes) is - slow, because PCRE has to use a multi-stage table lookup whenever it - needs a character's property. 
If you can find an alternative pattern - that does not use character properties, it will probably be faster. - - By default, the escape sequences \b, \d, \s, and \w, and the POSIX - character classes such as [:alpha:] do not use Unicode properties, - partly for backwards compatibility, and partly for performance reasons. - However, you can set PCRE_UCP if you want Unicode character properties - to be used. This can double the matching time for items such as \d, - when matched with a traditional matching function; the performance loss - is less with a DFA matching function, and in both cases there is not - much difference for \b. - - When a pattern begins with .* not in parentheses, or in parentheses - that are not the subject of a backreference, and the PCRE_DOTALL option - is set, the pattern is implicitly anchored by PCRE, since it can match - only at the start of a subject string. However, if PCRE_DOTALL is not - set, PCRE cannot make this optimization, because the . metacharacter - does not then match a newline, and if the subject string contains new- - lines, the pattern may match from the character immediately following - one of them instead of from the very start. For example, the pattern - - .*second - - matches the subject "first\nand second" (where \n stands for a newline - character), with the match starting at the seventh character. In order - to do this, PCRE has to retry the match starting after every newline in - the subject. - - If you are using such a pattern with subject strings that do not con- - tain newlines, the best performance is obtained by setting PCRE_DOTALL, - or starting the pattern with ^.* or ^.*? to indicate explicit anchor- - ing. That saves PCRE from having to scan along the subject looking for - a newline to restart at. - - Beware of patterns that contain nested indefinite repeats. These can - take a long time to run when applied to a string that does not match. 
- Consider the pattern fragment - - ^(a+)* - - This can match "aaaa" in 16 different ways, and this number increases - very rapidly as the string gets longer. (The * repeat can match 0, 1, - 2, 3, or 4 times, and for each of those cases other than 0 or 4, the + - repeats can match different numbers of times.) When the remainder of - the pattern is such that the entire match is going to fail, PCRE has in - principle to try every possible variation, and this can take an - extremely long time, even for relatively short strings. - - An optimization catches some of the more simple cases such as - - (a+)*b - - where a literal character follows. Before embarking on the standard - matching procedure, PCRE checks that there is a "b" later in the sub- - ject string, and if there is not, it fails the match immediately. How- - ever, when there is no following literal this optimization cannot be - used. You can see the difference by comparing the behaviour of - - (a+)*\d - - with the pattern above. The former gives a failure almost instantly - when applied to a whole line of "a" characters, whereas the latter - takes an appreciable time with strings longer than about 20 characters. - - In many cases, the solution to this kind of performance issue is to use - an atomic group or a possessive quantifier. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 25 August 2012 - Copyright (c) 1997-2012 University of Cambridge. ------------------------------------------------------------------------------- - - -PCREPOSIX(3) Library Functions Manual PCREPOSIX(3) - - - -NAME - PCRE - Perl-compatible regular expressions. 
- -SYNOPSIS - - #include <pcreposix.h> - - int regcomp(regex_t *preg, const char *pattern, - int cflags); - - int regexec(regex_t *preg, const char *string, - size_t nmatch, regmatch_t pmatch[], int eflags); - size_t regerror(int errcode, const regex_t *preg, - char *errbuf, size_t errbuf_size); - - void regfree(regex_t *preg); - - -DESCRIPTION - - This set of functions provides a POSIX-style API for the PCRE regular - expression 8-bit library. See the pcreapi documentation for a descrip- - tion of PCRE's native API, which contains much additional functional- - ity. There is no POSIX-style wrapper for PCRE's 16-bit and 32-bit - library. - - The functions described here are just wrapper functions that ultimately - call the PCRE native API. Their prototypes are defined in the - pcreposix.h header file, and on Unix systems the library itself is - called pcreposix.a, so can be accessed by adding -lpcreposix to the - command for linking an application that uses them. Because the POSIX - functions call the native ones, it is also necessary to add -lpcre. - - I have implemented only those POSIX option bits that can be reasonably - mapped to PCRE native options. In addition, the option REG_EXTENDED is - defined with the value zero. This has no effect, but since programs - that are written to the POSIX interface often use it, this makes it - easier to slot in PCRE as a replacement library. Other POSIX options - are not even defined. - - There are also some other options that are not defined by POSIX. These - have been added at the request of users who want to make use of certain - PCRE-specific features via the POSIX calling interface. - - When PCRE is called via these functions, it is only the API that is - POSIX-like in style. The syntax and semantics of the regular expres- - sions themselves are still those of Perl, subject to the setting of - various PCRE options, as described below. 
"POSIX-like in style" means - that the API approximates to the POSIX definition; it is not fully - POSIX-compatible, and in multi-byte encoding domains it is probably - even less compatible. - - The header for these functions is supplied as pcreposix.h to avoid any - potential clash with other POSIX libraries. It can, of course, be - renamed or aliased as regex.h, which is the "correct" name. It provides - two structure types, regex_t for compiled internal forms, and reg- - match_t for returning captured substrings. It also defines some con- - stants whose names start with "REG_"; these are used for setting - options and identifying error codes. - - -COMPILING A PATTERN - - The function regcomp() is called to compile a pattern into an internal - form. The pattern is a C string terminated by a binary zero, and is - passed in the argument pattern. The preg argument is a pointer to a - regex_t structure that is used as a base for storing information about - the compiled regular expression. - - The argument cflags is either zero, or contains one or more of the bits - defined by the following macros: - - REG_DOTALL - - The PCRE_DOTALL option is set when the regular expression is passed for - compilation to the native function. Note that REG_DOTALL is not part of - the POSIX standard. - - REG_ICASE - - The PCRE_CASELESS option is set when the regular expression is passed - for compilation to the native function. - - REG_NEWLINE - - The PCRE_MULTILINE option is set when the regular expression is passed - for compilation to the native function. Note that this does not mimic - the defined POSIX behaviour for REG_NEWLINE (see the following sec- - tion). - - REG_NOSUB - - The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is - passed for compilation to the native function. In addition, when a pat- - tern that is compiled with this flag is passed to regexec() for match- - ing, the nmatch and pmatch arguments are ignored, and no captured - strings are returned. 
- - REG_UCP - - The PCRE_UCP option is set when the regular expression is passed for - compilation to the native function. This causes PCRE to use Unicode - properties when matching \d, \w, etc., instead of just recognizing - ASCII values. Note that REG_UCP is not part of the POSIX standard. - - REG_UNGREEDY - - The PCRE_UNGREEDY option is set when the regular expression is passed - for compilation to the native function. Note that REG_UNGREEDY is not - part of the POSIX standard. - - REG_UTF8 - - The PCRE_UTF8 option is set when the regular expression is passed for - compilation to the native function. This causes the pattern itself and - all data strings used for matching it to be treated as UTF-8 strings. - Note that REG_UTF8 is not part of the POSIX standard. - - In the absence of these flags, no options are passed to the native - function. This means that the regex is compiled with PCRE default - semantics. In particular, the way it handles newline characters in the - subject string is the Perl way, not the POSIX way. Note that setting - PCRE_MULTILINE has only some of the effects specified for REG_NEWLINE. - It does not affect the way newlines are matched by . (they are not) or - by a negative class such as [^a] (they are). - - The yield of regcomp() is zero on success, and non-zero otherwise. The - preg structure is filled in on success, and one member of the structure - is public: re_nsub contains the number of capturing subpatterns in the - regular expression. Various error codes are defined in the header file. - - NOTE: If the yield of regcomp() is non-zero, you must not attempt to - use the contents of the preg structure. If, for example, you pass it to - regexec(), the result is undefined and your program is likely to crash. - - -MATCHING NEWLINE CHARACTERS - - This area is not simple, because POSIX and Perl take different views of - things. It is not possible to get PCRE to obey POSIX semantics, but - then PCRE was never intended to be a POSIX engine. 
The following table - lists the different possibilities for matching newline characters in - PCRE: - - Default Change with - - . matches newline no PCRE_DOTALL - newline matches [^a] yes not changeable - $ matches \n at end yes PCRE_DOLLAR_ENDONLY - $ matches \n in middle no PCRE_MULTILINE - ^ matches \n in middle no PCRE_MULTILINE - - This is the equivalent table for POSIX: - - Default Change with - - . matches newline yes REG_NEWLINE - newline matches [^a] yes REG_NEWLINE - $ matches \n at end no REG_NEWLINE - $ matches \n in middle no REG_NEWLINE - ^ matches \n in middle no REG_NEWLINE - - PCRE's behaviour is the same as Perl's, except that there is no equiva- - lent for PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is - no way to stop newline from matching [^a]. - - The default POSIX newline handling can be obtained by setting - PCRE_DOTALL and PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE - behave exactly as for the REG_NEWLINE action. - - -MATCHING A PATTERN - - The function regexec() is called to match a compiled pattern preg - against a given string, which is by default terminated by a zero byte - (but see REG_STARTEND below), subject to the options in eflags. These - can be: - - REG_NOTBOL - - The PCRE_NOTBOL option is set when calling the underlying PCRE matching - function. - - REG_NOTEMPTY - - The PCRE_NOTEMPTY option is set when calling the underlying PCRE match- - ing function. Note that REG_NOTEMPTY is not part of the POSIX standard. - However, setting this option can give more POSIX-like behaviour in some - situations. - - REG_NOTEOL - - The PCRE_NOTEOL option is set when calling the underlying PCRE matching - function. - - REG_STARTEND - - The string is considered to start at string + pmatch[0].rm_so and to - have a terminating NUL located at string + pmatch[0].rm_eo (there need - not actually be a NUL at that location), regardless of the value of - nmatch. 
This is a BSD extension, compatible with but not specified by - IEEE Standard 1003.2 (POSIX.2), and should be used with caution in - software intended to be portable to other systems. Note that a non-zero - rm_so does not imply REG_NOTBOL; REG_STARTEND affects only the location - of the string, not how it is matched. - - If the pattern was compiled with the REG_NOSUB flag, no data about any - matched strings is returned. The nmatch and pmatch arguments of - regexec() are ignored. - - If the value of nmatch is zero, or if the value of pmatch is NULL, no data - about any matched strings is returned. - - Otherwise, the portion of the string that was matched, and also any cap- - tured substrings, are returned via the pmatch argument, which points to - an array of nmatch structures of type regmatch_t, containing the mem- - bers rm_so and rm_eo. These contain the offset to the first character - of each substring and the offset to the first character after the end - of each substring, respectively. The 0th element of the vector relates - to the entire portion of string that was matched; subsequent elements - relate to the capturing subpatterns of the regular expression. Unused - entries in the array have both structure members set to -1. - - A successful match yields a zero return; various error codes are - defined in the header file, of which REG_NOMATCH is the "expected" - failure code. - - -ERROR MESSAGES - - The regerror() function maps a non-zero errorcode from either regcomp() - or regexec() to a printable message. If preg is not NULL, the error - should have arisen from the use of that structure. A message terminated - by a binary zero is placed in errbuf. The length of the message, - including the zero, is limited to errbuf_size. The yield of the func- - tion is the size of buffer needed to hold the whole message. - - -MEMORY USAGE - - Compiling a regular expression causes memory to be allocated and asso- - ciated with the preg structure. 
The function regfree() frees all such - memory, after which preg may no longer be used as a compiled expres- - sion. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 09 January 2012 - Copyright (c) 1997-2012 University of Cambridge. ------------------------------------------------------------------------------- - - -PCRECPP(3) Library Functions Manual PCRECPP(3) - - - -NAME - PCRE - Perl-compatible regular expressions. - -SYNOPSIS OF C++ WRAPPER - - #include <pcrecpp.h> - - -DESCRIPTION - - The C++ wrapper for PCRE was provided by Google Inc. Some additional - functionality was added by Giuseppe Maxia. This brief man page was con- - structed from the notes in the pcrecpp.h file, which should be con- - sulted for further details. Note that the C++ wrapper supports only the - original 8-bit PCRE library. There is no 16-bit or 32-bit support at - present. - - -MATCHING INTERFACE - - The "FullMatch" operation checks that supplied text matches a supplied - pattern exactly. If pointer arguments are supplied, it copies matched - sub-strings that match sub-patterns into them. - - Example: successful match - pcrecpp::RE re("h.*o"); - re.FullMatch("hello"); - - Example: unsuccessful match (requires full match): - pcrecpp::RE re("e"); - !re.FullMatch("hello"); - - Example: creating a temporary RE object: - pcrecpp::RE("h.*o").FullMatch("hello"); - - You can pass in a "const char*" or a "string" for "text". The examples - below tend to use a const char*. You can, as in the different examples - above, store the RE object explicitly in a variable or use a temporary - RE object. The examples below use one mode or the other arbitrarily. - Either could correctly be used for any of these examples. - - You must supply extra pointer arguments to extract matched subpieces. 
- - Example: extracts "ruby" into "s" and 1234 into "i" - int i; - string s; - pcrecpp::RE re("(\\w+):(\\d+)"); - re.FullMatch("ruby:1234", &s, &i); - - Example: does not try to extract any extra sub-patterns - re.FullMatch("ruby:1234", &s); - - Example: does not try to extract into NULL - re.FullMatch("ruby:1234", NULL, &i); - - Example: integer overflow causes failure - !re.FullMatch("ruby:1234567891234", NULL, &i); - - Example: fails because there aren't enough sub-patterns: - !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s); - - Example: fails because string cannot be stored in integer - !pcrecpp::RE("(.*)").FullMatch("ruby", &i); - - The provided pointer arguments can be pointers to any scalar numeric - type, or one of: - - string (matched piece is copied to string) - StringPiece (StringPiece is mutated to point to matched piece) - T (where "bool T::ParseFrom(const char*, int)" exists) - NULL (the corresponding matched sub-pattern is not copied) - - The function returns true iff all of the following conditions are sat- - isfied: - - a. "text" matches "pattern" exactly; - - b. The number of matched sub-patterns is >= number of supplied - pointers; - - c. The "i"th argument has a suitable type for holding the - string captured as the "i"th sub-pattern. If you pass in - void * NULL for the "i"th argument, or a non-void * NULL - of the correct type, or pass fewer arguments than the - number of sub-patterns, "i"th captured sub-pattern is - ignored. - - CAVEAT: An optional sub-pattern that does not exist in the matched - string is assigned the empty string. Therefore, the following will - return false (because the empty string is not a valid number): - - int number; - pcrecpp::RE::FullMatch("abc", "[a-z]+(\\d+)?", &number); - - The matching interface supports at most 16 arguments per call. If you - need more, consider using the more general interface - pcrecpp::RE::DoMatch. See pcrecpp.h for the signature for DoMatch. 
- - NOTE: Do not use no_arg, which is used internally to mark the end of a - list of optional arguments, as a placeholder for missing arguments, as - this can lead to segfaults. - - -QUOTING METACHARACTERS - - You can use the "QuoteMeta" operation to insert backslashes before all - potentially meaningful characters in a string. The returned string, - used as a regular expression, will exactly match the original string. - - Example: - string quoted = RE::QuoteMeta(unquoted); - - Note that it's legal to escape a character even if it has no special - meaning in a regular expression -- so this function does that. (This - also makes it identical to the perl function of the same name; see - "perldoc -f quotemeta".) For example, "1.5-2.0?" becomes - "1\.5\-2\.0\?". - - -PARTIAL MATCHES - - You can use the "PartialMatch" operation when you want the pattern to - match any substring of the text. - - Example: simple search for a string: - pcrecpp::RE("ell").PartialMatch("hello"); - - Example: find first number in a string: - int number; - pcrecpp::RE re("(\\d+)"); - re.PartialMatch("x*100 + 20", &number); - assert(number == 100); - - -UTF-8 AND THE MATCHING INTERFACE - - By default, pattern and text are plain text, one byte per character. - The UTF8 flag, passed to the constructor, causes both pattern and - string to be treated as UTF-8 text, still a byte stream but potentially - multiple bytes per character. In practice, the text is likelier to be - UTF-8 than the pattern, but the match returned may depend on the UTF8 - flag, so always use it when matching UTF8 text. For example, "." will - match one byte normally but with UTF8 set may match up to three bytes - of a multi-byte character. 
- - Example: - pcrecpp::RE_Options options; - options.set_utf8(); - pcrecpp::RE re(utf8_pattern, options); - re.FullMatch(utf8_string); - - Example: using the convenience function UTF8(): - pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8()); - re.FullMatch(utf8_string); - - NOTE: The UTF8 flag is ignored if pcre was not configured with the - --enable-utf8 flag. - - -PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE - - PCRE defines some modifiers to change the behavior of the regular - expression engine. The C++ wrapper defines an auxiliary class, - RE_Options, as a vehicle to pass such modifiers to a RE class. Cur- - rently, the following modifiers are supported: - - modifier description Perl corresponding - - PCRE_CASELESS case insensitive match /i - PCRE_MULTILINE multiple lines match /m - PCRE_DOTALL dot matches newlines /s - PCRE_DOLLAR_ENDONLY $ matches only at end N/A - PCRE_EXTRA strict escape parsing N/A - PCRE_EXTENDED ignore white spaces /x - PCRE_UTF8 handles UTF8 chars built-in - PCRE_UNGREEDY reverses * and *? N/A - PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*) - - (*) Both Perl and PCRE allow non capturing parentheses by means of the - "?:" modifier within the pattern itself. e.g. (?:ab|cd) does not cap- - ture, while (ab|cd) does. - - For a full account on how each modifier works, please check the PCRE - API reference page. - - For each modifier, there are two member functions whose name is made - out of the modifier in lowercase, without the "PCRE_" prefix. For - instance, PCRE_CASELESS is handled by - - bool caseless() - - which returns true if the modifier is set, and - - RE_Options & set_caseless(bool) - - which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can - be accessed through the set_match_limit() and match_limit() member - functions. Setting match_limit to a non-zero value will limit the exe- - cution of pcre to keep it from doing bad things like blowing the stack - or taking an eternity to return a result. 
A value of 5000 is good - enough to stop stack blowup in a 2MB thread stack. Setting match_limit - to zero disables match limiting. Alternatively, you can call - match_limit_recursion() which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to - limit how much PCRE recurses. match_limit() limits the number of - matches PCRE does; match_limit_recursion() limits the depth of internal - recursion, and therefore the amount of stack that is used. - - Normally, to pass one or more modifiers to a RE class, you declare a - RE_Options object, set the appropriate options, and pass this object to - a RE constructor. Example: - - RE_Options opt; - opt.set_caseless(true); - if (RE("HELLO", opt).PartialMatch("hello world")) ... - - RE_Options has two constructors. The default constructor takes no argu- - ments and creates a set of flags that are off by default. The optional - parameter option_flags is to facilitate transfer of legacy code from C - programs. This lets you do - - RE(pattern, - RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); - - However, new code is better off doing - - RE(pattern, - RE_Options().set_caseless(true).set_multiline(true)) - .PartialMatch(str); - - If you are going to pass one of the most used modifiers, there are some - convenience functions that return a RE_Options class with the appropri- - ate modifier already set: CASELESS(), UTF8(), MULTILINE(), DOTALL(), - and EXTENDED(). - - If you need to set several options at once, and you don't want to go - through the pains of declaring a RE_Options object and setting several - options, there is a parallel method that gives you such ability on the - fly. You can concatenate several set_xxxxx() member functions, since - each of them returns a reference to its class object.
For example, to - pass PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one - statement, you may write: - - RE(" ^ xyz \\s+ .* blah$", - RE_Options() - .set_caseless(true) - .set_extended(true) - .set_multiline(true)).PartialMatch(sometext); - - -SCANNING TEXT INCREMENTALLY - - The "Consume" operation may be useful if you want to repeatedly match - regular expressions at the front of a string and skip over them as they - match. This requires use of the "StringPiece" type, which represents a - sub-range of a real string. Like RE, StringPiece is defined in the - pcrecpp namespace. - - Example: read lines of the form "var = value" from a string. - string contents = ...; // Fill string somehow - pcrecpp::StringPiece input(contents); // Wrap in a StringPiece - - string var; - int value; - pcrecpp::RE re("(\\w+) = (\\d+)\n"); - while (re.Consume(&input, &var, &value)) { - ...; - } - - Each successful call to "Consume" will set "var/value", and also - advance "input" so it points past the matched text. - - The "FindAndConsume" operation is similar to "Consume" but does not - anchor your match at the beginning of the string. For example, you - could extract all words from a string by repeatedly calling - - pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) - - -PARSING HEX/OCTAL/C-RADIX NUMBERS - - By default, if you pass a pointer to a numeric value, the corresponding - text is interpreted as a base-10 number. You can instead wrap the - pointer with a call to one of the operators Hex(), Octal(), or CRadix() - to interpret the text in another base. The CRadix operator interprets - C-style "0" (base-8) and "0x" (base-16) prefixes, but defaults to - base-10. - - Example: - int a, b, c, d; - pcrecpp::RE re("(.*) (.*) (.*) (.*)"); - re.FullMatch("100 40 0100 0x40", - pcrecpp::Octal(&a), pcrecpp::Hex(&b), - pcrecpp::CRadix(&c), pcrecpp::CRadix(&d)); - - will leave 64 in a, b, c, and d. 
- - -REPLACING PARTS OF STRINGS - - You can replace the first match of "pattern" in "str" with "rewrite". - Within "rewrite", backslash-escaped digits (\1 to \9) can be used to - insert text matching corresponding parenthesized group from the pat- - tern. \0 in "rewrite" refers to the entire matching text. For example: - - string s = "yabba dabba doo"; - pcrecpp::RE("b+").Replace("d", &s); - - will leave "s" containing "yada dabba doo". The result is true if the - pattern matches and a replacement occurs, false otherwise. - - GlobalReplace is like Replace except that it replaces all occurrences - of the pattern in the string with the rewrite. Replacements are not - subject to re-matching. For example: - - string s = "yabba dabba doo"; - pcrecpp::RE("b+").GlobalReplace("d", &s); - - will leave "s" containing "yada dada doo". It returns the number of - replacements made. - - Extract is like Replace, except that if the pattern matches, "rewrite" - is copied into "out" (an additional argument) with substitutions. The - non-matching portions of "text" are ignored. Returns true iff a match - occurred and the extraction happened successfully; if no match occurs, - the string is left unaffected. - - -AUTHOR - - The C++ wrapper was contributed by Google Inc. - Copyright (c) 2007 Google Inc. - - -REVISION - - Last updated: 08 January 2012 ------------------------------------------------------------------------------- - - -PCRESAMPLE(3) Library Functions Manual PCRESAMPLE(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE SAMPLE PROGRAM - - A simple, complete demonstration program, to get you started with using - PCRE, is supplied in the file pcredemo.c in the PCRE distribution. A - listing of this program is given in the pcredemo documentation. If you - do not have a copy of the PCRE distribution, you can save this listing - to re-create pcredemo.c. 
- - The demonstration program, which uses the original PCRE 8-bit library, - compiles the regular expression that is its first argument, and matches - it against the subject string in its second argument. No PCRE options - are set, and default character tables are used. If matching succeeds, - the program outputs the portion of the subject that matched, together - with the contents of any captured substrings. - - If the -g option is given on the command line, the program then goes on - to check for further matches of the same regular expression in the same - subject string. The logic is a little bit tricky because of the possi- - bility of matching an empty string. Comments in the code explain what - is going on. - - If PCRE is installed in the standard include and library directories - for your operating system, you should be able to compile the demonstra- - tion program using this command: - - gcc -o pcredemo pcredemo.c -lpcre - - If PCRE is installed elsewhere, you may need to add additional options - to the command line. For example, on a Unix-like system that has PCRE - installed in /usr/local, you can compile the demonstration program - using a command like this: - - gcc -o pcredemo -I/usr/local/include pcredemo.c \ - -L/usr/local/lib -lpcre - - In a Windows environment, if you want to statically link the program - against a non-dll pcre.a file, you must uncomment the line that defines - PCRE_STATIC before including pcre.h, because otherwise the pcre_mal- - loc() and pcre_free() exported functions will be declared - __declspec(dllimport), with unwanted results. - - Once you have compiled and linked the demonstration program, you can - run simple tests like this: - - ./pcredemo 'cat|dog' 'the cat sat on the mat' - ./pcredemo -g 'cat|dog' 'the dog sat on the cat' - - Note that there is a much more comprehensive test program, called - pcretest, which supports many more facilities for testing regular - expressions and both PCRE libraries. 
The pcredemo program is provided - as a simple coding example. - - If you try to run pcredemo when PCRE is not installed in the standard - library directory, you may get an error like this on some operating - systems (e.g. Solaris): - - ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or - directory - - This is caused by the way shared library support works on those sys- - tems. You need to add - - -R/usr/local/lib - - (for example) to the compile command to get round this problem. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 10 January 2012 - Copyright (c) 1997-2012 University of Cambridge. ------------------------------------------------------------------------------- -PCRELIMITS(3) Library Functions Manual PCRELIMITS(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -SIZE AND OTHER LIMITATIONS - - There are some size limitations in PCRE but it is hoped that they will - never in practice be relevant. - - The maximum length of a compiled pattern is approximately 64K data - units (bytes for the 8-bit library, 16-bit units for the 16-bit - library, and 32-bit units for the 32-bit library) if PCRE is compiled - with the default internal linkage size, which is 2 bytes for the 8-bit - and 16-bit libraries, and 4 bytes for the 32-bit library. If you want - to process regular expressions that are truly enormous, you can compile - PCRE with an internal linkage size of 3 or 4 (when building the 16-bit - or 32-bit library, 3 is rounded up to 4). See the README file in the - source distribution and the pcrebuild documentation for details. In - these cases the limit is substantially larger. However, the speed of - execution is slower. - - All values in repeating quantifiers must be less than 65536. - - There is no limit to the number of parenthesized subpatterns, but there - can be no more than 65535 capturing subpatterns. 
There is, however, a - limit to the depth of nesting of parenthesized subpatterns of all - kinds. This is imposed in order to limit the amount of system stack - used at compile time. The limit can be specified when PCRE is built; - the default is 250. - - There is a limit to the number of forward references to subsequent sub- - patterns of around 200,000. Repeated forward references with fixed - upper limits, for example, (?2){0,100} when subpattern number 2 is to - the right, are included in the count. There is no limit to the number - of backward references. - - The maximum length of name for a named subpattern is 32 characters, and - the maximum number of named subpatterns is 10000. - - The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or - (*THEN) verb is 255 for the 8-bit library and 65535 for the 16-bit and - 32-bit libraries. - - The maximum length of a subject string is the largest positive number - that an integer variable can hold. However, when using the traditional - matching function, PCRE uses recursion to handle subpatterns and indef- - inite repetition. This means that the available stack space may limit - the size of a subject string that can be processed by certain patterns. - For a discussion of stack issues, see the pcrestack documentation. - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 05 November 2013 - Copyright (c) 1997-2013 University of Cambridge. ------------------------------------------------------------------------------- - - -PCRESTACK(3) Library Functions Manual PCRESTACK(3) - - - -NAME - PCRE - Perl-compatible regular expressions - -PCRE DISCUSSION OF STACK USAGE - - When you call pcre[16|32]_exec(), it makes use of an internal function - called match(). This calls itself recursively at branch points in the - pattern, in order to remember the state of the match so that it can - back up and try a different alternative if the first one fails. 
As - matching proceeds deeper and deeper into the tree of possibilities, the - recursion depth increases. The match() function is also called in other - circumstances, for example, whenever a parenthesized sub-pattern is - entered, and in certain cases of repetition. - - Not all calls of match() increase the recursion depth; for an item such - as a* it may be called several times at the same level, after matching - different numbers of a's. Furthermore, in a number of cases where the - result of the recursive call would immediately be passed back as the - result of the current call (a "tail recursion"), the function is just - restarted instead. - - The above comments apply when pcre[16|32]_exec() is run in its normal - interpretive manner. If the pattern was studied with the - PCRE_STUDY_JIT_COMPILE option, and just-in-time compiling was success- - ful, and the options passed to pcre[16|32]_exec() were not incompati- - ble, the matching process uses the JIT-compiled code instead of the - match() function. In this case, the memory requirements are handled - entirely differently. See the pcrejit documentation for details. - - The pcre[16|32]_dfa_exec() function operates in an entirely different - way, and uses recursion only when there is a regular expression recur- - sion or subroutine call in the pattern. This includes the processing of - assertion and "once-only" subpatterns, which are handled like subrou- - tine calls. Normally, these are never very deep, and the limit on the - complexity of pcre[16|32]_dfa_exec() is controlled by the amount of - workspace it is given. However, it is possible to write patterns with - runaway infinite recursions; such patterns will cause - pcre[16|32]_dfa_exec() to run out of stack. At present, there is no - protection against this. - - The comments that follow do NOT apply to pcre[16|32]_dfa_exec(); they - are relevant only for pcre[16|32]_exec() without the JIT optimization. 
- - Reducing pcre[16|32]_exec()'s stack usage - - Each time that match() is actually called recursively, it uses memory - from the process stack. For certain kinds of pattern and data, very - large amounts of stack may be needed, despite the recognition of "tail - recursion". You can often reduce the amount of recursion, and there- - fore the amount of stack used, by modifying the pattern that is being - matched. Consider, for example, this pattern: - - ([^<]|<(?!inet))+ - - It matches from wherever it starts until it encounters " -. -. -.SH "PCRE 16-BIT API BASIC FUNCTIONS" -.rs -.sp -.nf -.B pcre16 *pcre16_compile(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre16 *pcre16_compile2(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre16_extra *pcre16_study(const pcre16 *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.sp -.B void pcre16_free_study(pcre16_extra *\fIextra\fP); -.sp -.B int pcre16_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.sp -.B int pcre16_dfa_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.fi -. -. 
-.SH "PCRE 16-BIT API STRING EXTRACTION FUNCTIONS" -.rs -.sp -.nf -.B int pcre16_copy_named_substring(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP," -.B " PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);" -.sp -.B int pcre16_copy_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.sp -.B int pcre16_get_named_substring(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP," -.B " PCRE_SPTR16 *\fIstringptr\fP);" -.sp -.B int pcre16_get_stringnumber(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIname\fP); -.sp -.B int pcre16_get_stringtable_entries(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);" -.sp -.B int pcre16_get_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " PCRE_SPTR16 *\fIstringptr\fP);" -.sp -.B int pcre16_get_substring_list(PCRE_SPTR16 \fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR16 **\fIlistptr\fP);" -.sp -.B void pcre16_free_substring(PCRE_SPTR16 \fIstringptr\fP); -.sp -.B void pcre16_free_substring_list(PCRE_SPTR16 *\fIstringptr\fP); -.fi -. -. 
-.SH "PCRE 16-BIT API AUXILIARY FUNCTIONS" -.rs -.sp -.nf -.B pcre16_jit_stack *pcre16_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP); -.sp -.B void pcre16_jit_stack_free(pcre16_jit_stack *\fIstack\fP); -.sp -.B void pcre16_assign_jit_stack(pcre16_extra *\fIextra\fP, -.B " pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);" -.sp -.B const unsigned char *pcre16_maketables(void); -.sp -.B int pcre16_fullinfo(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.sp -.B int pcre16_refcount(pcre16 *\fIcode\fP, int \fIadjust\fP); -.sp -.B int pcre16_config(int \fIwhat\fP, void *\fIwhere\fP); -.sp -.B const char *pcre16_version(void); -.sp -.B int pcre16_pattern_to_host_byte_order(pcre16 *\fIcode\fP, -.B " pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);" -.fi -. -. -.SH "PCRE 16-BIT API INDIRECTED FUNCTIONS" -.rs -.sp -.nf -.B void *(*pcre16_malloc)(size_t); -.sp -.B void (*pcre16_free)(void *); -.sp -.B void *(*pcre16_stack_malloc)(size_t); -.sp -.B void (*pcre16_stack_free)(void *); -.sp -.B int (*pcre16_callout)(pcre16_callout_block *); -.fi -. -. -.SH "PCRE 16-BIT API 16-BIT-ONLY FUNCTION" -.rs -.sp -.nf -.B int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *\fIoutput\fP, -.B " PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP," -.B " int \fIkeep_boms\fP);" -.fi -. -. -.SH "THE PCRE 16-BIT LIBRARY" -.rs -.sp -Starting with release 8.30, it is possible to compile a PCRE library that -supports 16-bit character strings, including UTF-16 strings, as well as or -instead of the original 8-bit library. The majority of the work to make this -possible was done by Zoltan Herczeg. The two libraries contain identical sets -of functions, used in exactly the same way. Only the names of the functions and -the data types of their arguments and results are different. 
To avoid -over-complication and reduce the documentation maintenance load, most of the -PCRE documentation describes the 8-bit library, with only occasional references -to the 16-bit library. This page describes what is different when you use the -16-bit library. -.P -WARNING: A single application can be linked with both libraries, but you must -take care when processing any particular pattern to use functions from just one -library. For example, if you want to study a pattern that was compiled with -\fBpcre16_compile()\fP, you must do so with \fBpcre16_study()\fP, not -\fBpcre_study()\fP, and you must free the study data with -\fBpcre16_free_study()\fP. -. -. -.SH "THE HEADER FILE" -.rs -.sp -There is only one header file, \fBpcre.h\fP. It contains prototypes for all the -functions in all libraries, as well as definitions of flags, structures, error -codes, etc. -. -. -.SH "THE LIBRARY NAME" -.rs -.sp -In Unix-like systems, the 16-bit library is called \fBlibpcre16\fP, and can -normally be accessed by adding \fB-lpcre16\fP to the command for linking an -application that uses PCRE. -. -. -.SH "STRING TYPES" -.rs -.sp -In the 8-bit library, strings are passed to PCRE library functions as vectors -of bytes with the C type "char *". In the 16-bit library, strings are passed as -vectors of unsigned 16-bit quantities. The macro PCRE_UCHAR16 specifies an -appropriate data type, and PCRE_SPTR16 is defined as "const PCRE_UCHAR16 *". In -very many environments, "short int" is a 16-bit data type. When PCRE is built, -it defines PCRE_UCHAR16 as "unsigned short int", but checks that it really is a -16-bit data type. If it is not, the build fails with an error message telling -the maintainer to modify the definition appropriately. -. -. -.SH "STRUCTURE TYPES" -.rs -.sp -The types of the opaque structures that are used for compiled 16-bit patterns -and JIT stacks are \fBpcre16\fP and \fBpcre16_jit_stack\fP respectively.
The -type of the user-accessible structure that is returned by \fBpcre16_study()\fP -is \fBpcre16_extra\fP, and the type of the structure that is used for passing -data to a callout function is \fBpcre16_callout_block\fP. These structures -contain the same fields, with the same names, as their 8-bit counterparts. The -only difference is that pointers to character strings are 16-bit instead of -8-bit types. -. -. -.SH "16-BIT FUNCTIONS" -.rs -.sp -For every function in the 8-bit library there is a corresponding function in -the 16-bit library with a name that starts with \fBpcre16_\fP instead of -\fBpcre_\fP. The prototypes are listed above. In addition, there is one extra -function, \fBpcre16_utf16_to_host_byte_order()\fP. This is a utility function -that converts a UTF-16 character string to host byte order if necessary. The -other 16-bit functions expect the strings they are passed to be in host byte -order. -.P -The \fIinput\fP and \fIoutput\fP arguments of -\fBpcre16_utf16_to_host_byte_order()\fP may point to the same address, that is, -conversion in place is supported. The output buffer must be at least as long as -the input. -.P -The \fIlength\fP argument specifies the number of 16-bit data units in the -input string; a negative value specifies a zero-terminated string. -.P -If \fIbyte_order\fP is NULL, it is assumed that the string starts off in host -byte order. This may be changed by byte-order marks (BOMs) anywhere in the -string (commonly as the first character). -.P -If \fIbyte_order\fP is not NULL, a non-zero value of the integer to which it -points means that the input starts off in host byte order, otherwise the -opposite order is assumed. Again, BOMs in the string can change this. The final -byte order is passed back at the end of processing. -.P -If \fIkeep_boms\fP is not zero, byte-order mark characters (0xfeff) are copied -into the output string. Otherwise they are discarded. 
-.P -The result of the function is the number of 16-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -. -. -.SH "SUBJECT STRING OFFSETS" -.rs -.sp -The lengths and starting offsets of subject strings must be specified in 16-bit -data units, and the offsets within subject strings that are returned by the -matching functions are also in 16-bit units rather than bytes. -. -. -.SH "NAMED SUBPATTERNS" -.rs -.sp -The name-to-number translation table that is maintained for named subpatterns -uses 16-bit characters. The \fBpcre16_get_stringtable_entries()\fP function -returns the length of each entry in the table as the number of 16-bit data -units. -. -. -.SH "OPTION NAMES" -.rs -.sp -There are two new general option names, PCRE_UTF16 and PCRE_NO_UTF16_CHECK, -which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In -fact, these new options define the same bits in the options word. There is a -discussion about the -.\" HTML -.\" -validity of UTF-16 strings -.\" -in the -.\" HREF -\fBpcreunicode\fP -.\" -page. -.P -For the \fBpcre16_config()\fP function there is an option PCRE_CONFIG_UTF16 -that returns 1 if UTF-16 support is configured, otherwise 0. If this option is -given to \fBpcre_config()\fP or \fBpcre32_config()\fP, or if the -PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 option is given to \fBpcre16_config()\fP, -the result is the PCRE_ERROR_BADOPTION error. -. -. -.SH "CHARACTER CODES" -.rs -.sp -In 16-bit mode, when PCRE_UTF16 is not set, character values are treated in the -same way as in 8-bit, non UTF-8 mode, except, of course, that they can range -from 0 to 0xffff instead of 0 to 0xff. Character types for characters less than -0xff can therefore be influenced by the locale in the same way as before. -Characters greater than 0xff have only one case, and no "type" (such as letter -or digit).
-.P -In UTF-16 mode, the character code is Unicode, in the range 0 to 0x10ffff, with -the exception of values in the range 0xd800 to 0xdfff because those are -"surrogate" values that are used in pairs to encode values greater than 0xffff. -.P -A UTF-16 string can indicate its endianness by a special code known as a -byte-order mark (BOM). The PCRE functions do not handle this, expecting strings -to be in host byte order. A utility function called -\fBpcre16_utf16_to_host_byte_order()\fP is provided to help with this (see -above). -. -. -.SH "ERROR NAMES" -.rs -.sp -The errors PCRE_ERROR_BADUTF16_OFFSET and PCRE_ERROR_SHORTUTF16 correspond to -their 8-bit counterparts. The error PCRE_ERROR_BADMODE is given when a compiled -pattern is passed to a function that processes patterns in the other -mode, for example, if a pattern compiled with \fBpcre_compile()\fP is passed to -\fBpcre16_exec()\fP. -.P -There are new error codes whose names begin with PCRE_UTF16_ERR for invalid -UTF-16 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that -are described in the section entitled -.\" HTML -.\" -"Reason codes for invalid UTF-8 strings" -.\" -in the main -.\" HREF -\fBpcreapi\fP -.\" -page. The UTF-16 errors are: -.sp - PCRE_UTF16_ERR1 Missing low surrogate at end of string - PCRE_UTF16_ERR2 Invalid low surrogate follows high surrogate - PCRE_UTF16_ERR3 Isolated low surrogate - PCRE_UTF16_ERR4 Non-character -. -. -.SH "ERROR TEXTS" -.rs -.sp -If there is an error while compiling a pattern, the error text that is passed -back by \fBpcre16_compile()\fP or \fBpcre16_compile2()\fP is still an 8-bit -character string, zero-terminated. -. -. -.SH "CALLOUTS" -.rs -.sp -The \fIsubject\fP and \fImark\fP fields in the callout block that is passed to -a callout function point to 16-bit vectors. -. -. -.SH "TESTING" -.rs -.sp -The \fBpcretest\fP program continues to operate with 8-bit input and output -files, but it can be used for testing the 16-bit library.
If it is run with the -command line option \fB-16\fP, patterns and subject strings are converted from -8-bit to 16-bit before being passed to PCRE, and the 16-bit library functions -are used instead of the 8-bit ones. Returned 16-bit strings are converted to -8-bit for output. If both the 8-bit and the 32-bit libraries were not compiled, -\fBpcretest\fP defaults to 16-bit and the \fB-16\fP option is ignored. -.P -When PCRE is being built, the \fBRunTest\fP script that is called by "make -check" uses the \fBpcretest\fP \fB-C\fP option to discover which of the 8-bit, -16-bit and 32-bit libraries has been built, and runs the tests appropriately. -. -. -.SH "NOT SUPPORTED IN 16-BIT MODE" -.rs -.sp -Not all the features of the 8-bit library are available with the 16-bit -library. The C++ and POSIX wrapper functions support only the 8-bit library, -and the \fBpcregrep\fP program is at present 8-bit only. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 12 May 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcre32.3 b/src/pcre/doc/pcre32.3 deleted file mode 100644 index 7cde8c08..00000000 --- a/src/pcre/doc/pcre32.3 +++ /dev/null @@ -1,369 +0,0 @@ -.TH PCRE 3 "12 May 2013" "PCRE 8.33" -.SH NAME -PCRE - Perl-compatible regular expressions -.sp -.B #include -. -. 
-.SH "PCRE 32-BIT API BASIC FUNCTIONS" -.rs -.sp -.nf -.B pcre32 *pcre32_compile(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre32 *pcre32_compile2(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre32_extra *pcre32_study(const pcre32 *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.sp -.B void pcre32_free_study(pcre32_extra *\fIextra\fP); -.sp -.B int pcre32_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.sp -.B int pcre32_dfa_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.fi -. -. 
-.SH "PCRE 32-BIT API STRING EXTRACTION FUNCTIONS" -.rs -.sp -.nf -.B int pcre32_copy_named_substring(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP," -.B " PCRE_UCHAR32 *\fIbuffer\fP, int \fIbuffersize\fP);" -.sp -.B int pcre32_copy_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR32 *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.sp -.B int pcre32_get_named_substring(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP," -.B " PCRE_SPTR32 *\fIstringptr\fP);" -.sp -.B int pcre32_get_stringnumber(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIname\fP);" -.sp -.B int pcre32_get_stringtable_entries(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIname\fP, PCRE_UCHAR32 **\fIfirst\fP, PCRE_UCHAR32 **\fIlast\fP);" -.sp -.B int pcre32_get_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " PCRE_SPTR32 *\fIstringptr\fP);" -.sp -.B int pcre32_get_substring_list(PCRE_SPTR32 \fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR32 **\fIlistptr\fP);" -.sp -.B void pcre32_free_substring(PCRE_SPTR32 \fIstringptr\fP); -.sp -.B void pcre32_free_substring_list(PCRE_SPTR32 *\fIstringptr\fP); -.fi -. -. 
-.SH "PCRE 32-BIT API AUXILIARY FUNCTIONS" -.rs -.sp -.nf -.B pcre32_jit_stack *pcre32_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP); -.sp -.B void pcre32_jit_stack_free(pcre32_jit_stack *\fIstack\fP); -.sp -.B void pcre32_assign_jit_stack(pcre32_extra *\fIextra\fP, -.B " pcre32_jit_callback \fIcallback\fP, void *\fIdata\fP);" -.sp -.B const unsigned char *pcre32_maketables(void); -.sp -.B int pcre32_fullinfo(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.sp -.B int pcre32_refcount(pcre32 *\fIcode\fP, int \fIadjust\fP); -.sp -.B int pcre32_config(int \fIwhat\fP, void *\fIwhere\fP); -.sp -.B const char *pcre32_version(void); -.sp -.B int pcre32_pattern_to_host_byte_order(pcre32 *\fIcode\fP, -.B " pcre32_extra *\fIextra\fP, const unsigned char *\fItables\fP);" -.fi -. -. -.SH "PCRE 32-BIT API INDIRECTED FUNCTIONS" -.rs -.sp -.nf -.B void *(*pcre32_malloc)(size_t); -.sp -.B void (*pcre32_free)(void *); -.sp -.B void *(*pcre32_stack_malloc)(size_t); -.sp -.B void (*pcre32_stack_free)(void *); -.sp -.B int (*pcre32_callout)(pcre32_callout_block *); -.fi -. -. -.SH "PCRE 32-BIT API 32-BIT-ONLY FUNCTION" -.rs -.sp -.nf -.B int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *\fIoutput\fP, -.B " PCRE_SPTR32 \fIinput\fP, int \fIlength\fP, int *\fIbyte_order\fP," -.B " int \fIkeep_boms\fP);" -.fi -. -. -.SH "THE PCRE 32-BIT LIBRARY" -.rs -.sp -Starting with release 8.32, it is possible to compile a PCRE library that -supports 32-bit character strings, including UTF-32 strings, as well as or -instead of the original 8-bit library. This work was done by Christian Persch, -based on the work done by Zoltan Herczeg for the 16-bit library. All three -libraries contain identical sets of functions, used in exactly the same way. -Only the names of the functions and the data types of their arguments and -results are different. 
To avoid over-complication and reduce the documentation -maintenance load, most of the PCRE documentation describes the 8-bit library, -with only occasional references to the 16-bit and 32-bit libraries. This page -describes what is different when you use the 32-bit library. -.P -WARNING: A single application can be linked with all or any of the three -libraries, but you must take care when processing any particular pattern -to use functions from just one library. For example, if you want to study -a pattern that was compiled with \fBpcre32_compile()\fP, you must do so -with \fBpcre32_study()\fP, not \fBpcre_study()\fP, and you must free the -study data with \fBpcre32_free_study()\fP. -. -. -.SH "THE HEADER FILE" -.rs -.sp -There is only one header file, \fBpcre.h\fP. It contains prototypes for all the -functions in all libraries, as well as definitions of flags, structures, error -codes, etc. -. -. -.SH "THE LIBRARY NAME" -.rs -.sp -In Unix-like systems, the 32-bit library is called \fBlibpcre32\fP, and can -normally be accessed by adding \fB-lpcre32\fP to the command for linking an -application that uses PCRE. -. -. -.SH "STRING TYPES" -.rs -.sp -In the 8-bit library, strings are passed to PCRE library functions as vectors -of bytes with the C type "char *". In the 32-bit library, strings are passed as -vectors of unsigned 32-bit quantities. The macro PCRE_UCHAR32 specifies an -appropriate data type, and PCRE_SPTR32 is defined as "const PCRE_UCHAR32 *". In -very many environments, "unsigned int" is a 32-bit data type. When PCRE is -built, it defines PCRE_UCHAR32 as "unsigned int", but checks that it really is -a 32-bit data type. If it is not, the build fails with an error message telling -the maintainer to modify the definition appropriately. -. -. -.SH "STRUCTURE TYPES" -.rs -.sp -The types of the opaque structures that are used for compiled 32-bit patterns -and JIT stacks are \fBpcre32\fP and \fBpcre32_jit_stack\fP respectively.
The -type of the user-accessible structure that is returned by \fBpcre32_study()\fP -is \fBpcre32_extra\fP, and the type of the structure that is used for passing -data to a callout function is \fBpcre32_callout_block\fP. These structures -contain the same fields, with the same names, as their 8-bit counterparts. The -only difference is that pointers to character strings are 32-bit instead of -8-bit types. -. -. -.SH "32-BIT FUNCTIONS" -.rs -.sp -For every function in the 8-bit library there is a corresponding function in -the 32-bit library with a name that starts with \fBpcre32_\fP instead of -\fBpcre_\fP. The prototypes are listed above. In addition, there is one extra -function, \fBpcre32_utf32_to_host_byte_order()\fP. This is a utility function -that converts a UTF-32 character string to host byte order if necessary. The -other 32-bit functions expect the strings they are passed to be in host byte -order. -.P -The \fIinput\fP and \fIoutput\fP arguments of -\fBpcre32_utf32_to_host_byte_order()\fP may point to the same address, that is, -conversion in place is supported. The output buffer must be at least as long as -the input. -.P -The \fIlength\fP argument specifies the number of 32-bit data units in the -input string; a negative value specifies a zero-terminated string. -.P -If \fIbyte_order\fP is NULL, it is assumed that the string starts off in host -byte order. This may be changed by byte-order marks (BOMs) anywhere in the -string (commonly as the first character). -.P -If \fIbyte_order\fP is not NULL, a non-zero value of the integer to which it -points means that the input starts off in host byte order, otherwise the -opposite order is assumed. Again, BOMs in the string can change this. The final -byte order is passed back at the end of processing. -.P -If \fIkeep_boms\fP is not zero, byte-order mark characters (0xfeff) are copied -into the output string. Otherwise they are discarded. 
-.P -The result of the function is the number of 32-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -. -. -.SH "SUBJECT STRING OFFSETS" -.rs -.sp -The lengths and starting offsets of subject strings must be specified in 32-bit -data units, and the offsets within subject strings that are returned by the -matching functions are also in 32-bit units rather than bytes. -. -. -.SH "NAMED SUBPATTERNS" -.rs -.sp -The name-to-number translation table that is maintained for named subpatterns -uses 32-bit characters. The \fBpcre32_get_stringtable_entries()\fP function -returns the length of each entry in the table as the number of 32-bit data -units. -. -. -.SH "OPTION NAMES" -.rs -.sp -There are two new general option names, PCRE_UTF32 and PCRE_NO_UTF32_CHECK, -which correspond to PCRE_UTF8 and PCRE_NO_UTF8_CHECK in the 8-bit library. In -fact, these new options define the same bits in the options word. There is a -discussion about the -.\" HTML -.\" -validity of UTF-32 strings -.\" -in the -.\" HREF -\fBpcreunicode\fP -.\" -page. -.P -For the \fBpcre32_config()\fP function there is an option PCRE_CONFIG_UTF32 -that returns 1 if UTF-32 support is configured, otherwise 0. If this option is -given to \fBpcre_config()\fP or \fBpcre16_config()\fP, or if the -PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 option is given to \fBpcre32_config()\fP, -the result is the PCRE_ERROR_BADOPTION error. -. -. -.SH "CHARACTER CODES" -.rs -.sp -In 32-bit mode, when PCRE_UTF32 is not set, character values are treated in the -same way as in 8-bit, non UTF-8 mode, except, of course, that they can range -from 0 to 0x7fffffff instead of 0 to 0xff. Character types for characters less -than 0xff can therefore be influenced by the locale in the same way as before. -Characters greater than 0xff have only one case, and no "type" (such as letter -or digit).
-.P -In UTF-32 mode, the character code is Unicode, in the range 0 to 0x10ffff, with -the exception of values in the range 0xd800 to 0xdfff because those are -"surrogate" values that are ill-formed in UTF-32. -.P -A UTF-32 string can indicate its endianness by a special code known as a -byte-order mark (BOM). The PCRE functions do not handle this, expecting strings -to be in host byte order. A utility function called -\fBpcre32_utf32_to_host_byte_order()\fP is provided to help with this (see -above). -. -. -.SH "ERROR NAMES" -.rs -.sp -The error PCRE_ERROR_BADUTF32 corresponds to its 8-bit counterpart. -The error PCRE_ERROR_BADMODE is given when a compiled -pattern is passed to a function that processes patterns in the other -mode, for example, if a pattern compiled with \fBpcre_compile()\fP is passed to -\fBpcre32_exec()\fP. -.P -There are new error codes whose names begin with PCRE_UTF32_ERR for invalid -UTF-32 strings, corresponding to the PCRE_UTF8_ERR codes for UTF-8 strings that -are described in the section entitled -.\" HTML -.\" -"Reason codes for invalid UTF-8 strings" -.\" -in the main -.\" HREF -\fBpcreapi\fP -.\" -page. The UTF-32 errors are: -.sp - PCRE_UTF32_ERR1 Surrogate character (range from 0xd800 to 0xdfff) - PCRE_UTF32_ERR2 Non-character - PCRE_UTF32_ERR3 Character > 0x10ffff -. -. -.SH "ERROR TEXTS" -.rs -.sp -If there is an error while compiling a pattern, the error text that is passed -back by \fBpcre32_compile()\fP or \fBpcre32_compile2()\fP is still an 8-bit -character string, zero-terminated. -. -. -.SH "CALLOUTS" -.rs -.sp -The \fIsubject\fP and \fImark\fP fields in the callout block that is passed to -a callout function point to 32-bit vectors. -. -. -.SH "TESTING" -.rs -.sp -The \fBpcretest\fP program continues to operate with 8-bit input and output -files, but it can be used for testing the 32-bit library.
If it is run with the -command line option \fB-32\fP, patterns and subject strings are converted from -8-bit to 32-bit before being passed to PCRE, and the 32-bit library functions -are used instead of the 8-bit ones. Returned 32-bit strings are converted to -8-bit for output. If both the 8-bit and the 16-bit libraries were not compiled, -\fBpcretest\fP defaults to 32-bit and the \fB-32\fP option is ignored. -.P -When PCRE is being built, the \fBRunTest\fP script that is called by "make -check" uses the \fBpcretest\fP \fB-C\fP option to discover which of the 8-bit, -16-bit and 32-bit libraries has been built, and runs the tests appropriately. -. -. -.SH "NOT SUPPORTED IN 32-BIT MODE" -.rs -.sp -Not all the features of the 8-bit library are available with the 32-bit -library. The C++ and POSIX wrapper functions support only the 8-bit library, -and the \fBpcregrep\fP program is at present 8-bit only. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 12 May 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcre_assign_jit_stack.3 b/src/pcre/doc/pcre_assign_jit_stack.3 deleted file mode 100644 index 0ecf6f2c..00000000 --- a/src/pcre/doc/pcre_assign_jit_stack.3 +++ /dev/null @@ -1,59 +0,0 @@ -.TH PCRE_ASSIGN_JIT_STACK 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B void pcre_assign_jit_stack(pcre_extra *\fIextra\fP, -.B " pcre_jit_callback \fIcallback\fP, void *\fIdata\fP);" -.sp -.B void pcre16_assign_jit_stack(pcre16_extra *\fIextra\fP, -.B " pcre16_jit_callback \fIcallback\fP, void *\fIdata\fP);" -.sp -.B void pcre32_assign_jit_stack(pcre32_extra *\fIextra\fP, -.B " pcre32_jit_callback \fIcallback\fP, void *\fIdata\fP);" -.fi -. 
-.SH DESCRIPTION -.rs -.sp -This function provides control over the memory used as a stack at run-time by a -call to \fBpcre[16|32]_exec()\fP with a pattern that has been successfully -compiled with JIT optimization. The arguments are: -.sp - extra the data pointer returned by \fBpcre[16|32]_study()\fP - callback a callback function - data a JIT stack or a value to be passed to the callback - function -.P -If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block on -the machine stack is used. -.P -If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must -be a valid JIT stack, the result of calling \fBpcre[16|32]_jit_stack_alloc()\fP. -.P -If \fIcallback\fP is not NULL, it is called with \fIdata\fP as an argument at -the start of matching, in order to set up a JIT stack. If the result is NULL, -the internal 32K stack is used; otherwise the return value must be a valid JIT -stack, the result of calling \fBpcre[16|32]_jit_stack_alloc()\fP. -.P -You may safely assign the same JIT stack to multiple patterns, as long as they -are all matched in the same thread. In a multithread application, each thread -must use its own JIT stack. For more details, see the -.\" HREF -\fBpcrejit\fP -.\" -page. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page.
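Reviewer note on the removed pcre_assign_jit_stack page: its three rules for choosing a JIT stack can be modelled in a few lines of C. This is a minimal sketch and not PCRE code. The names jit_callback_sketch, select_jit_stack_sketch and the two example callbacks are hypothetical, and a plain void pointer stands in for the opaque JIT stack type.

```c
#include <stddef.h>

/* Hypothetical stand-in for the JIT callback type: it receives the user data
   pointer and returns a JIT stack, or NULL to request the internal stack. */
typedef void *(*jit_callback_sketch)(void *data);

/* Returns the stack that matching would use under the three rules in the
   DESCRIPTION, or NULL when the internal 32K block on the machine stack
   would be used. This models the rules only; it calls nothing in PCRE. */
void *select_jit_stack_sketch(jit_callback_sketch callback, void *data)
{
    if (callback == NULL) {
        /* Rules 1 and 2: data is either NULL (internal stack) or the stack. */
        return data;
    }
    /* Rule 3: the callback decides; a NULL result means the internal stack. */
    return callback(data);
}

/* Example callbacks, present only to exercise the sketch. */
void *callback_returning_data(void *data) { return data; }
void *callback_returning_null(void *data) { (void)data; return NULL; }
```

A callback is mainly useful when the stack has to be chosen at match time, for example one stack per thread, which fits the page's requirement that each thread use its own JIT stack.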
diff --git a/src/pcre/doc/pcre_compile.3 b/src/pcre/doc/pcre_compile.3 deleted file mode 100644 index 5c16ebe2..00000000 --- a/src/pcre/doc/pcre_compile.3 +++ /dev/null @@ -1,96 +0,0 @@ -.TH PCRE_COMPILE 3 "01 October 2013" "PCRE 8.34" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre16 *pcre16_compile(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre32 *pcre32_compile(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function compiles a regular expression into an internal form. It is the -same as \fBpcre[16|32]_compile2()\fP, except for the absence of the -\fIerrorcodeptr\fP argument. Its arguments are: -.sp - \fIpattern\fP A zero-terminated string containing the - regular expression to be compiled - \fIoptions\fP Zero or more option bits - \fIerrptr\fP Where to put an error message - \fIerroffset\fP Offset in pattern where error was found - \fItableptr\fP Pointer to character tables, or NULL to - use the built-in default -.sp -The option bits are: -.sp - PCRE_ANCHORED Force pattern anchoring - PCRE_AUTO_CALLOUT Compile automatic callouts - PCRE_BSR_ANYCRLF \eR matches only CR, LF, or CRLF - PCRE_BSR_UNICODE \eR matches all Unicode line endings - PCRE_CASELESS Do caseless matching - PCRE_DOLLAR_ENDONLY $ not to match newline at end - PCRE_DOTALL . 
matches anything including NL - PCRE_DUPNAMES Allow duplicate names for subpatterns - PCRE_EXTENDED Ignore white space and # comments - PCRE_EXTRA PCRE extra features - (not much use currently) - PCRE_FIRSTLINE Force matching to be before newline - PCRE_JAVASCRIPT_COMPAT JavaScript compatibility - PCRE_MULTILINE ^ and $ match newlines within data - PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF) - PCRE_NEWLINE_ANY Recognize any Unicode newline sequence - PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline - sequences - PCRE_NEWLINE_CR Set CR as the newline sequence - PCRE_NEWLINE_CRLF Set CRLF as the newline sequence - PCRE_NEWLINE_LF Set LF as the newline sequence - PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren- - theses (named ones available) - PCRE_NO_AUTO_POSSESS Disable auto-possessification - PCRE_NO_START_OPTIMIZE Disable match-time start optimizations - PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16 - validity (only relevant if - PCRE_UTF16 is set) - PCRE_NO_UTF32_CHECK Do not check the pattern for UTF-32 - validity (only relevant if - PCRE_UTF32 is set) - PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8 - validity (only relevant if - PCRE_UTF8 is set) - PCRE_UCP Use Unicode properties for \ed, \ew, etc. - PCRE_UNGREEDY Invert greediness of quantifiers - PCRE_UTF16 Run in \fBpcre16_compile()\fP UTF-16 mode - PCRE_UTF32 Run in \fBpcre32_compile()\fP UTF-32 mode - PCRE_UTF8 Run in \fBpcre_compile()\fP UTF-8 mode -.sp -PCRE must be built with UTF support in order to use PCRE_UTF8/16/32 and -PCRE_NO_UTF8/16/32_CHECK, and with UCP support if PCRE_UCP is used. -.P -The yield of the function is a pointer to a private data structure that -contains the compiled pattern, or NULL if an error was detected. Note that -compiling regular expressions with one version of PCRE for use with a different -version is not guaranteed to work and may cause crashes. 
-.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_compile2.3 b/src/pcre/doc/pcre_compile2.3 deleted file mode 100644 index 37742018..00000000 --- a/src/pcre/doc/pcre_compile2.3 +++ /dev/null @@ -1,101 +0,0 @@ -.TH PCRE_COMPILE2 3 "01 October 2013" "PCRE 8.34" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include <pcre.h> -.PP -.nf -.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre16 *pcre16_compile2(PCRE_SPTR16 \fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre32 *pcre32_compile2(PCRE_SPTR32 \fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function compiles a regular expression into an internal form. It is the -same as \fBpcre[16|32]_compile()\fP, except for the addition of the -\fIerrorcodeptr\fP argument. The arguments are: -.
-.sp - \fIpattern\fP A zero-terminated string containing the - regular expression to be compiled - \fIoptions\fP Zero or more option bits - \fIerrorcodeptr\fP Where to put an error code - \fIerrptr\fP Where to put an error message - \fIerroffset\fP Offset in pattern where error was found - \fItableptr\fP Pointer to character tables, or NULL to - use the built-in default -.sp -The option bits are: -.sp - PCRE_ANCHORED Force pattern anchoring - PCRE_AUTO_CALLOUT Compile automatic callouts - PCRE_BSR_ANYCRLF \eR matches only CR, LF, or CRLF - PCRE_BSR_UNICODE \eR matches all Unicode line endings - PCRE_CASELESS Do caseless matching - PCRE_DOLLAR_ENDONLY $ not to match newline at end - PCRE_DOTALL . matches anything including NL - PCRE_DUPNAMES Allow duplicate names for subpatterns - PCRE_EXTENDED Ignore white space and # comments - PCRE_EXTRA PCRE extra features - (not much use currently) - PCRE_FIRSTLINE Force matching to be before newline - PCRE_JAVASCRIPT_COMPAT JavaScript compatibility - PCRE_MULTILINE ^ and $ match newlines within data - PCRE_NEVER_UTF Lock out UTF, e.g. via (*UTF) - PCRE_NEWLINE_ANY Recognize any Unicode newline sequence - PCRE_NEWLINE_ANYCRLF Recognize CR, LF, and CRLF as newline - sequences - PCRE_NEWLINE_CR Set CR as the newline sequence - PCRE_NEWLINE_CRLF Set CRLF as the newline sequence - PCRE_NEWLINE_LF Set LF as the newline sequence - PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren- - theses (named ones available) - PCRE_NO_AUTO_POSSESS Disable auto-possessification - PCRE_NO_START_OPTIMIZE Disable match-time start optimizations - PCRE_NO_UTF16_CHECK Do not check the pattern for UTF-16 - validity (only relevant if - PCRE_UTF16 is set) - PCRE_NO_UTF32_CHECK Do not check the pattern for UTF-32 - validity (only relevant if - PCRE_UTF32 is set) - PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8 - validity (only relevant if - PCRE_UTF8 is set) - PCRE_UCP Use Unicode properties for \ed, \ew, etc. 
- PCRE_UNGREEDY Invert greediness of quantifiers - PCRE_UTF16 Run \fBpcre16_compile()\fP in UTF-16 mode - PCRE_UTF32 Run \fBpcre32_compile()\fP in UTF-32 mode - PCRE_UTF8 Run \fBpcre_compile()\fP in UTF-8 mode -.sp -PCRE must be built with UTF support in order to use PCRE_UTF8/16/32 and -PCRE_NO_UTF8/16/32_CHECK, and with UCP support if PCRE_UCP is used. -.P -The yield of the function is a pointer to a private data structure that -contains the compiled pattern, or NULL if an error was detected. Note that -compiling regular expressions with one version of PCRE for use with a different -version is not guaranteed to work and may cause crashes. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_config.3 b/src/pcre/doc/pcre_config.3 deleted file mode 100644 index d14ffdad..00000000 --- a/src/pcre/doc/pcre_config.3 +++ /dev/null @@ -1,79 +0,0 @@ -.TH PCRE_CONFIG 3 "20 April 2014" "PCRE 8.36" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP); -.PP -.B int pcre16_config(int \fIwhat\fP, void *\fIwhere\fP); -.PP -.B int pcre32_config(int \fIwhat\fP, void *\fIwhere\fP); -. -.SH DESCRIPTION -.rs -.sp -This function makes it possible for a client program to find out which optional -features are available in the version of the PCRE library it is using. The -arguments are as follows: -.sp - \fIwhat\fP A code specifying what information is required - \fIwhere\fP Points to where to put the data -.sp -The \fIwhere\fP argument must point to an integer variable, except for -PCRE_CONFIG_MATCH_LIMIT, PCRE_CONFIG_MATCH_LIMIT_RECURSION, and -PCRE_CONFIG_PARENS_LIMIT, when it must point to an unsigned long integer, -and for PCRE_CONFIG_JITTARGET, when it must point to a const char*. 
-The available codes are: -.sp - PCRE_CONFIG_JIT Availability of just-in-time compiler - support (1=yes 0=no) - PCRE_CONFIG_JITTARGET String containing information about the - target architecture for the JIT compiler, - or NULL if there is no JIT support - PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4 - PCRE_CONFIG_PARENS_LIMIT Parentheses nesting limit - PCRE_CONFIG_MATCH_LIMIT Internal resource limit - PCRE_CONFIG_MATCH_LIMIT_RECURSION - Internal recursion depth limit - PCRE_CONFIG_NEWLINE Value of the default newline sequence: - 13 (0x000d) for CR - 10 (0x000a) for LF - 3338 (0x0d0a) for CRLF - -2 for ANYCRLF - -1 for ANY - PCRE_CONFIG_BSR Indicates what \eR matches by default: - 0 all Unicode line endings - 1 CR, LF, or CRLF only - PCRE_CONFIG_POSIX_MALLOC_THRESHOLD - Threshold of return slots, above which - \fBmalloc()\fP is used by the POSIX API - PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap) - PCRE_CONFIG_UTF16 Availability of UTF-16 support (1=yes - 0=no); option for \fBpcre16_config()\fP - PCRE_CONFIG_UTF32 Availability of UTF-32 support (1=yes - 0=no); option for \fBpcre32_config()\fP - PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no); - option for \fBpcre_config()\fP - PCRE_CONFIG_UNICODE_PROPERTIES - Availability of Unicode property support - (1=yes 0=no) -.sp -The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise. That error -is also given if PCRE_CONFIG_UTF16 or PCRE_CONFIG_UTF32 is passed to -\fBpcre_config()\fP, if PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF32 is passed to -\fBpcre16_config()\fP, or if PCRE_CONFIG_UTF8 or PCRE_CONFIG_UTF16 is passed to -\fBpcre32_config()\fP. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
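Reviewer note on the removed pcre_config page: the PCRE_CONFIG_NEWLINE values in its table are easier to read once decoded. The helper below is a small sketch, assuming the integer has already been obtained by the caller. The name newline_name_sketch is hypothetical and not part of PCRE.

```c
#include <stddef.h>

/* Maps the integer reported for PCRE_CONFIG_NEWLINE to a readable name,
   following the value table in the man page. Hypothetical helper. */
const char *newline_name_sketch(int value)
{
    switch (value) {
    case 13:   return "CR";      /* 0x000d */
    case 10:   return "LF";      /* 0x000a */
    case 3338: return "CRLF";    /* 0x0d0a: CR in the high byte, LF in the low */
    case -2:   return "ANYCRLF";
    case -1:   return "ANY";
    default:   return "unknown"; /* a value the table does not list */
    }
}
```

In a program linked against the library, the value would come from a call such as pcre_config(PCRE_CONFIG_NEWLINE, &value) with an int variable, and the call's own result should be checked for 0 before the value is used.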
diff --git a/src/pcre/doc/pcre_copy_named_substring.3 b/src/pcre/doc/pcre_copy_named_substring.3 deleted file mode 100644 index 52582aec..00000000 --- a/src/pcre/doc/pcre_copy_named_substring.3 +++ /dev/null @@ -1,51 +0,0 @@ -.TH PCRE_COPY_NAMED_SUBSTRING 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include <pcre.h> -.PP -.nf -.B int pcre_copy_named_substring(const pcre *\fIcode\fP, -.B " const char *\fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, const char *\fIstringname\fP," -.B " char *\fIbuffer\fP, int \fIbuffersize\fP);" -.sp -.B int pcre16_copy_named_substring(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP," -.B " PCRE_UCHAR16 *\fIbuffer\fP, int \fIbuffersize\fP);" -.sp -.B int pcre32_copy_named_substring(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP," -.B " PCRE_UCHAR32 *\fIbuffer\fP, int \fIbuffersize\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for extracting a captured substring, identified -by name, into a given buffer. The arguments are: -.sp - \fIcode\fP Pattern that was successfully matched - \fIsubject\fP Subject that has been successfully matched - \fIovector\fP Offset vector that \fBpcre[16|32]_exec()\fP used - \fIstringcount\fP Value returned by \fBpcre[16|32]_exec()\fP - \fIstringname\fP Name of the required substring - \fIbuffer\fP Buffer to receive the string - \fIbuffersize\fP Size of buffer -.sp -The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was -too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page.
diff --git a/src/pcre/doc/pcre_copy_substring.3 b/src/pcre/doc/pcre_copy_substring.3 deleted file mode 100644 index 83af6e80..00000000 --- a/src/pcre/doc/pcre_copy_substring.3 +++ /dev/null @@ -1,47 +0,0 @@ -.TH PCRE_COPY_SUBSTRING 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include <pcre.h> -.PP -.nf -.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.sp -.B int pcre16_copy_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR16 *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.sp -.B int pcre32_copy_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, PCRE_UCHAR32 *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for extracting a captured substring into a given -buffer. The arguments are: -.sp - \fIsubject\fP Subject that has been successfully matched - \fIovector\fP Offset vector that \fBpcre[16|32]_exec()\fP used - \fIstringcount\fP Value returned by \fBpcre[16|32]_exec()\fP - \fIstringnumber\fP Number of the required substring - \fIbuffer\fP Buffer to receive the string - \fIbuffersize\fP Size of buffer -.sp -The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was -too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page.
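Reviewer note on the removed pcre_copy_substring page: the contract it describes (length on success, one error for a short buffer, another for a bad substring number) can be illustrated without linking against PCRE. The sketch below is a re-implementation for illustration only, covering the 8-bit case. It assumes the offset vector stores each substring as a start/end pair at indices 2n and 2n+1, which this page does not state. The function name and the SKETCH_ constants are hypothetical, and the numeric error values are assumptions; real code should use the definitions in pcre.h.

```c
#include <string.h>

/* Local stand-ins for the error codes named in the man page. The numeric
   values are assumptions made for this sketch. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Copies captured substring number stringnumber into buffer and
   zero-terminates it. Returns the substring length, or one of the two
   error codes above. */
int copy_substring_sketch(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;

    /* Only substrings 0 .. stringcount-1 were set by the match. */
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    /* Assumed layout: start offset at 2n, end offset at 2n+1. */
    start  = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;

    /* One extra element is needed for the terminating zero. */
    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;

    if (length > 0)
        memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

The buffer size check includes the terminator, so a three-character capture needs a buffer of at least four elements.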
diff --git a/src/pcre/doc/pcre_dfa_exec.3 b/src/pcre/doc/pcre_dfa_exec.3 deleted file mode 100644 index 39c2e836..00000000 --- a/src/pcre/doc/pcre_dfa_exec.3 +++ /dev/null @@ -1,118 +0,0 @@ -.TH PCRE_DFA_EXEC 3 "12 May 2013" "PCRE 8.33" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.sp -.B int pcre16_dfa_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.sp -.B int pcre32_dfa_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function matches a compiled regular expression against a given subject -string, using an alternative matching algorithm that scans the subject string -just once (\fInot\fP Perl-compatible). Note that the main, Perl-compatible, -matching function is \fBpcre[16|32]_exec()\fP. 
The arguments for this function -are: -.sp - \fIcode\fP Points to the compiled pattern - \fIextra\fP Points to an associated \fBpcre[16|32]_extra\fP structure, - or is NULL - \fIsubject\fP Points to the subject string - \fIlength\fP Length of the subject string - \fIstartoffset\fP Offset in the subject at which to start matching - \fIoptions\fP Option bits - \fIovector\fP Points to a vector of ints for result offsets - \fIovecsize\fP Number of elements in the vector - \fIworkspace\fP Points to a vector of ints used as working space - \fIwscount\fP Number of elements in the vector -.sp -The units for \fIlength\fP and \fIstartoffset\fP are bytes for -\fBpcre_exec()\fP, 16-bit data items for \fBpcre16_exec()\fP, and 32-bit items -for \fBpcre32_exec()\fP. The options are: -.sp - PCRE_ANCHORED Match only at the first position - PCRE_BSR_ANYCRLF \eR matches only CR, LF, or CRLF - PCRE_BSR_UNICODE \eR matches all Unicode line endings - PCRE_NEWLINE_ANY Recognize any Unicode newline sequence - PCRE_NEWLINE_ANYCRLF Recognize CR, LF, & CRLF as newline sequences - PCRE_NEWLINE_CR Recognize CR as the only newline sequence - PCRE_NEWLINE_CRLF Recognize CRLF as the only newline sequence - PCRE_NEWLINE_LF Recognize LF as the only newline sequence - PCRE_NOTBOL Subject is not the beginning of a line - PCRE_NOTEOL Subject is not the end of a line - PCRE_NOTEMPTY An empty string is not a valid match - PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject - is not a valid match - PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations - PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16 - validity (only relevant if PCRE_UTF16 - was set at compile time) - PCRE_NO_UTF32_CHECK Do not check the subject for UTF-32 - validity (only relevant if PCRE_UTF32 - was set at compile time) - PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8 - validity (only relevant if PCRE_UTF8 - was set at compile time) - PCRE_PARTIAL ) Return PCRE_ERROR_PARTIAL for a partial - 
PCRE_PARTIAL_SOFT ) match if no full matches are found - PCRE_PARTIAL_HARD Return PCRE_ERROR_PARTIAL for a partial match - even if there is a full match as well - PCRE_DFA_SHORTEST Return only the shortest match - PCRE_DFA_RESTART Restart after a partial match -.sp -There are restrictions on what may appear in a pattern when using this matching -function. Details are given in the -.\" HREF -\fBpcrematching\fP -.\" -documentation. For details of partial matching, see the -.\" HREF -\fBpcrepartial\fP -.\" -page. -.P -A \fBpcre[16|32]_extra\fP structure contains the following fields: -.sp - \fIflags\fP Bits indicating which fields are set - \fIstudy_data\fP Opaque data from \fBpcre[16|32]_study()\fP - \fImatch_limit\fP Limit on internal resource use - \fImatch_limit_recursion\fP Limit on internal recursion depth - \fIcallout_data\fP Opaque data passed back to callouts - \fItables\fP Points to character tables or is NULL - \fImark\fP For passing back a *MARK pointer - \fIexecutable_jit\fP Opaque data from JIT compilation -.sp -The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT, -PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, -PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. For this -matching function, the \fImatch_limit\fP and \fImatch_limit_recursion\fP fields -are not used, and must not be set. The PCRE_EXTRA_EXECUTABLE_JIT flag and -the corresponding variable are ignored. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
diff --git a/src/pcre/doc/pcre_exec.3 b/src/pcre/doc/pcre_exec.3 deleted file mode 100644 index 4686bd6d..00000000 --- a/src/pcre/doc/pcre_exec.3 +++ /dev/null @@ -1,99 +0,0 @@ -.TH PCRE_EXEC 3 "12 May 2013" "PCRE 8.33" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.sp -.B int pcre16_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.sp -.B int pcre32_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function matches a compiled regular expression against a given subject -string, using a matching algorithm that is similar to Perl's. It returns -offsets to captured substrings. Its arguments are: -.sp - \fIcode\fP Points to the compiled pattern - \fIextra\fP Points to an associated \fBpcre[16|32]_extra\fP structure, - or is NULL - \fIsubject\fP Points to the subject string - \fIlength\fP Length of the subject string - \fIstartoffset\fP Offset in the subject at which to start matching - \fIoptions\fP Option bits - \fIovector\fP Points to a vector of ints for result offsets - \fIovecsize\fP Number of elements in the vector (a multiple of 3) -.sp -The units for \fIlength\fP and \fIstartoffset\fP are bytes for -\fBpcre_exec()\fP, 16-bit data items for \fBpcre16_exec()\fP, and 32-bit items -for \fBpcre32_exec()\fP. 
The options are: -.sp - PCRE_ANCHORED Match only at the first position - PCRE_BSR_ANYCRLF \eR matches only CR, LF, or CRLF - PCRE_BSR_UNICODE \eR matches all Unicode line endings - PCRE_NEWLINE_ANY Recognize any Unicode newline sequence - PCRE_NEWLINE_ANYCRLF Recognize CR, LF, & CRLF as newline sequences - PCRE_NEWLINE_CR Recognize CR as the only newline sequence - PCRE_NEWLINE_CRLF Recognize CRLF as the only newline sequence - PCRE_NEWLINE_LF Recognize LF as the only newline sequence - PCRE_NOTBOL Subject string is not the beginning of a line - PCRE_NOTEOL Subject string is not the end of a line - PCRE_NOTEMPTY An empty string is not a valid match - PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject - is not a valid match - PCRE_NO_START_OPTIMIZE Do not do "start-match" optimizations - PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16 - validity (only relevant if PCRE_UTF16 - was set at compile time) - PCRE_NO_UTF32_CHECK Do not check the subject for UTF-32 - validity (only relevant if PCRE_UTF32 - was set at compile time) - PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8 - validity (only relevant if PCRE_UTF8 - was set at compile time) - PCRE_PARTIAL ) Return PCRE_ERROR_PARTIAL for a partial - PCRE_PARTIAL_SOFT ) match if no full matches are found - PCRE_PARTIAL_HARD Return PCRE_ERROR_PARTIAL for a partial match - if that is found before a full match -.sp -For details of partial matching, see the -.\" HREF -\fBpcrepartial\fP -.\" -page. 
A \fBpcre_extra\fP structure contains the following fields: -.sp - \fIflags\fP Bits indicating which fields are set - \fIstudy_data\fP Opaque data from \fBpcre[16|32]_study()\fP - \fImatch_limit\fP Limit on internal resource use - \fImatch_limit_recursion\fP Limit on internal recursion depth - \fIcallout_data\fP Opaque data passed back to callouts - \fItables\fP Points to character tables or is NULL - \fImark\fP For passing back a *MARK pointer - \fIexecutable_jit\fP Opaque data from JIT compilation -.sp -The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT, -PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, -PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_free_study.3 b/src/pcre/doc/pcre_free_study.3 deleted file mode 100644 index 8826b735..00000000 --- a/src/pcre/doc/pcre_free_study.3 +++ /dev/null @@ -1,31 +0,0 @@ -.TH PCRE_FREE_STUDY 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B void pcre_free_study(pcre_extra *\fIextra\fP); -.PP -.B void pcre16_free_study(pcre16_extra *\fIextra\fP); -.PP -.B void pcre32_free_study(pcre32_extra *\fIextra\fP); -. -.SH DESCRIPTION -.rs -.sp -This function is used to free the memory used for the data generated by a call -to \fBpcre[16|32]_study()\fP when it is no longer needed. The argument must be the -result of such a call. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
diff --git a/src/pcre/doc/pcre_free_substring.3 b/src/pcre/doc/pcre_free_substring.3 deleted file mode 100644 index 88c04019..00000000 --- a/src/pcre/doc/pcre_free_substring.3 +++ /dev/null @@ -1,31 +0,0 @@ -.TH PCRE_FREE_SUBSTRING 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B void pcre_free_substring(const char *\fIstringptr\fP); -.PP -.B void pcre16_free_substring(PCRE_SPTR16 \fIstringptr\fP); -.PP -.B void pcre32_free_substring(PCRE_SPTR32 \fIstringptr\fP); -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for freeing the store obtained by a previous -call to \fBpcre[16|32]_get_substring()\fP or \fBpcre[16|32]_get_named_substring()\fP. -Its only argument is a pointer to the string. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_free_substring_list.3 b/src/pcre/doc/pcre_free_substring_list.3 deleted file mode 100644 index 248b4bd0..00000000 --- a/src/pcre/doc/pcre_free_substring_list.3 +++ /dev/null @@ -1,31 +0,0 @@ -.TH PCRE_FREE_SUBSTRING_LIST 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B void pcre_free_substring_list(const char **\fIstringptr\fP); -.PP -.B void pcre16_free_substring_list(PCRE_SPTR16 *\fIstringptr\fP); -.PP -.B void pcre32_free_substring_list(PCRE_SPTR32 *\fIstringptr\fP); -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for freeing the store obtained by a previous -call to \fBpcre[16|32]_get_substring_list()\fP. Its only argument is a pointer to -the list of string pointers. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
diff --git a/src/pcre/doc/pcre_fullinfo.3 b/src/pcre/doc/pcre_fullinfo.3 deleted file mode 100644 index c9b2c656..00000000 --- a/src/pcre/doc/pcre_fullinfo.3 +++ /dev/null @@ -1,103 +0,0 @@ -.TH PCRE_FULLINFO 3 "21 April 2014" "PCRE 8.36" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.sp -.B int pcre16_fullinfo(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.sp -.B int pcre32_fullinfo(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function returns information about a compiled pattern. Its arguments are: -.sp - \fIcode\fP Compiled regular expression - \fIextra\fP Result of \fBpcre[16|32]_study()\fP or NULL - \fIwhat\fP What information is required - \fIwhere\fP Where to put the information -.sp -The following information is available: -.sp - PCRE_INFO_BACKREFMAX Number of highest back reference - PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns - PCRE_INFO_DEFAULT_TABLES Pointer to default tables - PCRE_INFO_FIRSTBYTE Fixed first data unit for a match, or - -1 for start of string - or after newline, or - -2 otherwise - PCRE_INFO_FIRSTTABLE Table of first data units (after studying) - PCRE_INFO_HASCRORLF Return 1 if explicit CR or LF matches exist - PCRE_INFO_JCHANGED Return 1 if (?J) or (?-J) was used - PCRE_INFO_JIT Return 1 after successful JIT compilation - PCRE_INFO_JITSIZE Size of JIT compiled code - PCRE_INFO_LASTLITERAL Literal last data unit required - PCRE_INFO_MINLENGTH Lower bound length of matching strings - PCRE_INFO_MATCHEMPTY Return 1 if the pattern can match an empty string, - 0 otherwise - PCRE_INFO_MATCHLIMIT Match limit if set, otherwise PCRE_RROR_UNSET - PCRE_INFO_MAXLOOKBEHIND Length (in characters) of the 
longest lookbehind assertion - PCRE_INFO_NAMECOUNT Number of named subpatterns - PCRE_INFO_NAMEENTRYSIZE Size of name table entry - PCRE_INFO_NAMETABLE Pointer to name table - PCRE_INFO_OKPARTIAL Return 1 if partial matching can be tried - (always returns 1 after release 8.00) - PCRE_INFO_OPTIONS Option bits used for compilation - PCRE_INFO_SIZE Size of compiled pattern - PCRE_INFO_STUDYSIZE Size of study data - PCRE_INFO_FIRSTCHARACTER Fixed first data unit for a match - PCRE_INFO_FIRSTCHARACTERFLAGS Returns - 1 if there is a first data character set, which can - then be retrieved using PCRE_INFO_FIRSTCHARACTER, - 2 if the first character is at the start of the data - string or after a newline, and - 0 otherwise - PCRE_INFO_RECURSIONLIMIT Recursion limit if set, otherwise PCRE_ERROR_UNSET - PCRE_INFO_REQUIREDCHAR Literal last data unit required - PCRE_INFO_REQUIREDCHARFLAGS Returns 1 if the last data character is set (which can then - be retrieved using PCRE_INFO_REQUIREDCHAR); 0 otherwise -.sp -The \fIwhere\fP argument must point to an integer variable, except for the -following \fIwhat\fP values: -.sp - PCRE_INFO_DEFAULT_TABLES const uint8_t * - PCRE_INFO_FIRSTCHARACTER uint32_t - PCRE_INFO_FIRSTTABLE const uint8_t * - PCRE_INFO_JITSIZE size_t - PCRE_INFO_MATCHLIMIT uint32_t - PCRE_INFO_NAMETABLE PCRE_SPTR16 (16-bit library) - PCRE_INFO_NAMETABLE PCRE_SPTR32 (32-bit library) - PCRE_INFO_NAMETABLE const unsigned char * (8-bit library) - PCRE_INFO_OPTIONS unsigned long int - PCRE_INFO_SIZE size_t - PCRE_INFO_STUDYSIZE size_t - PCRE_INFO_RECURSIONLIMIT uint32_t - PCRE_INFO_REQUIREDCHAR uint32_t -.sp -The yield of the function is zero on success or: -.sp - PCRE_ERROR_NULL the argument \fIcode\fP was NULL - the argument \fIwhere\fP was NULL - PCRE_ERROR_BADMAGIC the "magic number" was not found - PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid - PCRE_ERROR_UNSET the option was not set -.P -There is a complete description of the PCRE native API in the -.\" 
HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_get_named_substring.3 b/src/pcre/doc/pcre_get_named_substring.3 deleted file mode 100644 index 84d4ee7d..00000000 --- a/src/pcre/doc/pcre_get_named_substring.3 +++ /dev/null @@ -1,54 +0,0 @@ -.TH PCRE_GET_NAMED_SUBSTRING 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_get_named_substring(const pcre *\fIcode\fP, -.B " const char *\fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, const char *\fIstringname\fP," -.B " const char **\fIstringptr\fP);" -.sp -.B int pcre16_get_named_substring(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR16 \fIstringname\fP," -.B " PCRE_SPTR16 *\fIstringptr\fP);" -.sp -.B int pcre32_get_named_substring(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, PCRE_SPTR32 \fIstringname\fP," -.B " PCRE_SPTR32 *\fIstringptr\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for extracting a captured substring by name. The -arguments are: -.sp - \fIcode\fP Compiled pattern - \fIsubject\fP Subject that has been successfully matched - \fIovector\fP Offset vector that \fBpcre[16|32]_exec()\fP used - \fIstringcount\fP Value returned by \fBpcre[16|32]_exec()\fP - \fIstringname\fP Name of the required substring - \fIstringptr\fP Where to put the string pointer -.sp -The memory in which the substring is placed is obtained by calling -\fBpcre[16|32]_malloc()\fP. The convenience function -\fBpcre[16|32]_free_substring()\fP can be used to free it when it is no longer -needed. The yield of the function is the length of the extracted substring, -PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or -PCRE_ERROR_NOSUBSTRING if the string name is invalid. 
-.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_get_stringnumber.3 b/src/pcre/doc/pcre_get_stringnumber.3 deleted file mode 100644 index 9fc5291d..00000000 --- a/src/pcre/doc/pcre_get_stringnumber.3 +++ /dev/null @@ -1,43 +0,0 @@ -.TH PCRE_GET_STRINGNUMBER 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_get_stringnumber(const pcre *\fIcode\fP, -.B " const char *\fIname\fP);" -.sp -.B int pcre16_get_stringnumber(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIname\fP);" -.sp -.B int pcre32_get_stringnumber(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIname\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This convenience function finds the number of a named substring capturing -parenthesis in a compiled pattern. Its arguments are: -.sp - \fIcode\fP Compiled regular expression - \fIname\fP Name whose number is required -.sp -The yield of the function is the number of the parenthesis if the name is -found, or PCRE_ERROR_NOSUBSTRING otherwise. When duplicate names are allowed -(PCRE_DUPNAMES is set), it is not defined which of the numbers is returned by -\fBpcre[16|32]_get_stringnumber()\fP. You can obtain the complete list by calling -\fBpcre[16|32]_get_stringtable_entries()\fP. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
diff --git a/src/pcre/doc/pcre_get_stringtable_entries.3 b/src/pcre/doc/pcre_get_stringtable_entries.3 deleted file mode 100644 index 5c58c90c..00000000 --- a/src/pcre/doc/pcre_get_stringtable_entries.3 +++ /dev/null @@ -1,46 +0,0 @@ -.TH PCRE_GET_STRINGTABLE_ENTRIES 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP, -.B " const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);" -.sp -.B int pcre16_get_stringtable_entries(const pcre16 *\fIcode\fP, -.B " PCRE_SPTR16 \fIname\fP, PCRE_UCHAR16 **\fIfirst\fP, PCRE_UCHAR16 **\fIlast\fP);" -.sp -.B int pcre32_get_stringtable_entries(const pcre32 *\fIcode\fP, -.B " PCRE_SPTR32 \fIname\fP, PCRE_UCHAR32 **\fIfirst\fP, PCRE_UCHAR32 **\fIlast\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This convenience function finds, for a compiled pattern, the first and last -entries for a given name in the table that translates capturing parenthesis -names into numbers. When names are required to be unique (PCRE_DUPNAMES is -\fInot\fP set), it is usually easier to use \fBpcre[16|32]_get_stringnumber()\fP -instead. -.sp - \fIcode\fP Compiled regular expression - \fIname\fP Name whose entries required - \fIfirst\fP Where to return a pointer to the first entry - \fIlast\fP Where to return a pointer to the last entry -.sp -The yield of the function is the length of each entry, or -PCRE_ERROR_NOSUBSTRING if none are found. -.P -There is a complete description of the PCRE native API, including the format of -the table entries, in the -.\" HREF -\fBpcreapi\fP -.\" -page, and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
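The table that pcre_get_stringtable_entries() searches can be pictured with a hand-built example. This is a minimal sketch, assuming the 8-bit library layout described in the pcreapi page: fixed-size entries, each holding the group number in two bytes (most significant first) followed by the zero-terminated name. The table contents and the helper names `entry_number` and `find_name_range` are hypothetical, and the code does not call PCRE itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name table for a pattern such as
   (?<day>..)(?<mon>..)(?<mon>..) compiled with PCRE_DUPNAMES.
   Assumed 8-bit layout: each entry is ENTRY_SIZE bytes, two bytes of
   group number (most significant first), then the zero-terminated name.
   Entries are kept in name order. */
enum { ENTRY_SIZE = 6, ENTRY_COUNT = 3 };

const unsigned char name_table[ENTRY_COUNT * ENTRY_SIZE] = {
    0, 1, 'd', 'a', 'y', 0,
    0, 2, 'm', 'o', 'n', 0,
    0, 3, 'm', 'o', 'n', 0
};

/* Decode the group number stored at the front of one entry. */
int entry_number(const unsigned char *entry)
{
    return (entry[0] << 8) | entry[1];
}

/* Locate the first and last entries whose name equals `name`.
   Returns the entry size on success, or -1 if the name is absent
   (a stand-in for PCRE_ERROR_NOSUBSTRING in this sketch). */
int find_name_range(const char *name,
                    const unsigned char **first,
                    const unsigned char **last)
{
    const unsigned char *found_first = NULL;
    const unsigned char *found_last = NULL;
    int i;

    for (i = 0; i < ENTRY_COUNT; i++) {
        const unsigned char *entry = name_table + i * ENTRY_SIZE;
        if (strcmp((const char *)(entry + 2), name) == 0) {
            if (found_first == NULL)
                found_first = entry;
            found_last = entry;
        }
    }
    if (found_first == NULL)
        return -1;
    *first = found_first;
    *last = found_last;
    return ENTRY_SIZE;
}
```

A caller would step from `first` to `last` in increments of the returned entry size, decoding one group number at each stop. That is the reason the real function reports the entry length rather than a count.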
diff --git a/src/pcre/doc/pcre_get_substring.3 b/src/pcre/doc/pcre_get_substring.3 deleted file mode 100644 index 1e62b2c0..00000000 --- a/src/pcre/doc/pcre_get_substring.3 +++ /dev/null @@ -1,50 +0,0 @@ -.TH PCRE_GET_SUBSTRING 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " const char **\fIstringptr\fP);" -.sp -.B int pcre16_get_substring(PCRE_SPTR16 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " PCRE_SPTR16 *\fIstringptr\fP);" -.sp -.B int pcre32_get_substring(PCRE_SPTR32 \fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " PCRE_SPTR32 *\fIstringptr\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for extracting a captured substring. The -arguments are: -.sp - \fIsubject\fP Subject that has been successfully matched - \fIovector\fP Offset vector that \fBpcre[16|32]_exec()\fP used - \fIstringcount\fP Value returned by \fBpcre[16|32]_exec()\fP - \fIstringnumber\fP Number of the required substring - \fIstringptr\fP Where to put the string pointer -.sp -The memory in which the substring is placed is obtained by calling -\fBpcre[16|32]_malloc()\fP. The convenience function -\fBpcre[16|32]_free_substring()\fP can be used to free it when it is no longer -needed. The yield of the function is the length of the substring, -PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or -PCRE_ERROR_NOSUBSTRING if the string number is invalid. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
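What this convenience function does with the offset vector can be sketched without the library. The helper below, `copy_capture` (a hypothetical name), assumes the usual pcre_exec() convention that ovector[2*n] and ovector[2*n+1] hold the start and end offsets of substring n, and it uses malloc() where PCRE would call pcre_malloc(). The error values are placeholders for this sketch, not the real PCRE_ERROR_xxx constants.

```c
#include <stdlib.h>
#include <string.h>

/* Placeholder error codes for this sketch; the real library returns
   PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING. */
enum { SKETCH_NOMEMORY = -1, SKETCH_NOSUBSTRING = -2 };

/* Copy captured substring `number` out of `subject` into fresh memory.
   `count` plays the role of stringcount (the value pcre_exec() returned).
   On success the length is returned and *out points to a zero-terminated
   copy that the caller must release with free(). */
int copy_capture(const char *subject, const int *ovector, int count,
                 int number, char **out)
{
    int start, length;
    char *copy;

    if (number < 0 || number >= count)
        return SKETCH_NOSUBSTRING;

    start = ovector[2 * number];
    length = ovector[2 * number + 1] - start;

    /* An unset group has both offsets at -1; treat it as empty. */
    if (start < 0) {
        start = 0;
        length = 0;
    }

    copy = malloc((size_t)length + 1);
    if (copy == NULL)
        return SKETCH_NOMEMORY;

    memcpy(copy, subject + start, (size_t)length);
    copy[length] = '\0';
    *out = copy;
    return length;
}
```

With a subject of "on 2024-05 ok" and an offset vector of {3, 10, 3, 7, 8, 10}, asking for substring 1 would yield the four characters "2024". Because the copy lives in separately allocated memory, it stays valid after the subject buffer is reused, which is the main reason to prefer this style over indexing the subject directly.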
diff --git a/src/pcre/doc/pcre_get_substring_list.3 b/src/pcre/doc/pcre_get_substring_list.3 deleted file mode 100644 index 511a4a39..00000000 --- a/src/pcre/doc/pcre_get_substring_list.3 +++ /dev/null @@ -1,47 +0,0 @@ -.TH PCRE_GET_SUBSTRING_LIST 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_get_substring_list(const char *\fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, const char ***\fIlistptr\fP);" -.sp -.B int pcre16_get_substring_list(PCRE_SPTR16 \fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR16 **\fIlistptr\fP);" -.sp -.B int pcre32_get_substring_list(PCRE_SPTR32 \fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, PCRE_SPTR32 **\fIlistptr\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This is a convenience function for extracting a list of all the captured -substrings. The arguments are: -.sp - \fIsubject\fP Subject that has been successfully matched - \fIovector\fP Offset vector that \fBpcre[16|32]_exec\fP used - \fIstringcount\fP Value returned by \fBpcre[16|32]_exec\fP - \fIlistptr\fP Where to put a pointer to the list -.sp -The memory in which the substrings and the list are placed is obtained by -calling \fBpcre[16|32]_malloc()\fP. The convenience function -\fBpcre[16|32]_free_substring_list()\fP can be used to free it when it is no -longer needed. A pointer to a list of pointers is put in the variable whose -address is in \fIlistptr\fP. The list is terminated by a NULL pointer. The -yield of the function is zero on success or PCRE_ERROR_NOMEMORY if sufficient -memory could not be obtained. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
diff --git a/src/pcre/doc/pcre_jit_exec.3 b/src/pcre/doc/pcre_jit_exec.3 deleted file mode 100644 index ba851681..00000000 --- a/src/pcre/doc/pcre_jit_exec.3 +++ /dev/null @@ -1,96 +0,0 @@ -.TH PCRE_EXEC 3 "31 October 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_jit_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " pcre_jit_stack *\fIjstack\fP);" -.sp -.B int pcre16_jit_exec(const pcre16 *\fIcode\fP, "const pcre16_extra *\fIextra\fP," -.B " PCRE_SPTR16 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " pcre_jit_stack *\fIjstack\fP);" -.sp -.B int pcre32_jit_exec(const pcre32 *\fIcode\fP, "const pcre32_extra *\fIextra\fP," -.B " PCRE_SPTR32 \fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " pcre_jit_stack *\fIjstack\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function matches a compiled regular expression that has been successfully -studied with one of the JIT options against a given subject string, using a -matching algorithm that is similar to Perl's. It is a "fast path" interface to -JIT, and it bypasses some of the sanity checks that \fBpcre_exec()\fP applies. -It returns offsets to captured substrings. 
Its arguments are: -.sp - \fIcode\fP Points to the compiled pattern - \fIextra\fP Points to an associated \fBpcre[16|32]_extra\fP structure, - or is NULL - \fIsubject\fP Points to the subject string - \fIlength\fP Length of the subject string, in bytes - \fIstartoffset\fP Offset in bytes in the subject at which to - start matching - \fIoptions\fP Option bits - \fIovector\fP Points to a vector of ints for result offsets - \fIovecsize\fP Number of elements in the vector (a multiple of 3) - \fIjstack\fP Pointer to a JIT stack -.sp -The allowed options are: -.sp - PCRE_NOTBOL Subject string is not the beginning of a line - PCRE_NOTEOL Subject string is not the end of a line - PCRE_NOTEMPTY An empty string is not a valid match - PCRE_NOTEMPTY_ATSTART An empty string at the start of the subject - is not a valid match - PCRE_NO_UTF16_CHECK Do not check the subject for UTF-16 - validity (only relevant if PCRE_UTF16 - was set at compile time) - PCRE_NO_UTF32_CHECK Do not check the subject for UTF-32 - validity (only relevant if PCRE_UTF32 - was set at compile time) - PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8 - validity (only relevant if PCRE_UTF8 - was set at compile time) - PCRE_PARTIAL ) Return PCRE_ERROR_PARTIAL for a partial - PCRE_PARTIAL_SOFT ) match if no full matches are found - PCRE_PARTIAL_HARD Return PCRE_ERROR_PARTIAL for a partial match - if that is found before a full match -.sp -However, the PCRE_NO_UTF[8|16|32]_CHECK options have no effect, as this check -is never applied. For details of partial matching, see the -.\" HREF -\fBpcrepartial\fP -.\" -page. 
A \fBpcre_extra\fP structure contains the following fields: -.sp - \fIflags\fP Bits indicating which fields are set - \fIstudy_data\fP Opaque data from \fBpcre[16|32]_study()\fP - \fImatch_limit\fP Limit on internal resource use - \fImatch_limit_recursion\fP Limit on internal recursion depth - \fIcallout_data\fP Opaque data passed back to callouts - \fItables\fP Points to character tables or is NULL - \fImark\fP For passing back a *MARK pointer - \fIexecutable_jit\fP Opaque data from JIT compilation -.sp -The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT, -PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, -PCRE_EXTRA_TABLES, PCRE_EXTRA_MARK and PCRE_EXTRA_EXECUTABLE_JIT. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the JIT API in the -.\" HREF -\fBpcrejit\fP -.\" -page. diff --git a/src/pcre/doc/pcre_jit_stack_alloc.3 b/src/pcre/doc/pcre_jit_stack_alloc.3 deleted file mode 100644 index 11c97a0f..00000000 --- a/src/pcre/doc/pcre_jit_stack_alloc.3 +++ /dev/null @@ -1,43 +0,0 @@ -.TH PCRE_JIT_STACK_ALLOC 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, -.B " int \fImaxsize\fP);" -.sp -.B pcre16_jit_stack *pcre16_jit_stack_alloc(int \fIstartsize\fP, -.B " int \fImaxsize\fP);" -.sp -.B pcre32_jit_stack *pcre32_jit_stack_alloc(int \fIstartsize\fP, -.B " int \fImaxsize\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function is used to create a stack for use by the code compiled by the JIT -optimization of \fBpcre[16|32]_study()\fP. The arguments are a starting size for -the stack, and a maximum size to which it is allowed to grow. The result can be -passed to the JIT run-time code by \fBpcre[16|32]_assign_jit_stack()\fP, or that -function can set up a callback for obtaining a stack. 
A maximum stack size of -512K to 1M should be more than enough for any pattern. For more details, see -the -.\" HREF -\fBpcrejit\fP -.\" -page. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_jit_stack_free.3 b/src/pcre/doc/pcre_jit_stack_free.3 deleted file mode 100644 index 494724e8..00000000 --- a/src/pcre/doc/pcre_jit_stack_free.3 +++ /dev/null @@ -1,35 +0,0 @@ -.TH PCRE_JIT_STACK_FREE 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP); -.PP -.B void pcre16_jit_stack_free(pcre16_jit_stack *\fIstack\fP); -.PP -.B void pcre32_jit_stack_free(pcre32_jit_stack *\fIstack\fP); -. -.SH DESCRIPTION -.rs -.sp -This function is used to free a JIT stack that was created by -\fBpcre[16|32]_jit_stack_alloc()\fP when it is no longer needed. For more details, -see the -.\" HREF -\fBpcrejit\fP -.\" -page. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_maketables.3 b/src/pcre/doc/pcre_maketables.3 deleted file mode 100644 index b2c3d23a..00000000 --- a/src/pcre/doc/pcre_maketables.3 +++ /dev/null @@ -1,33 +0,0 @@ -.TH PCRE_MAKETABLES 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B const unsigned char *pcre_maketables(void); -.PP -.B const unsigned char *pcre16_maketables(void); -.PP -.B const unsigned char *pcre32_maketables(void); -. -.SH DESCRIPTION -.rs -.sp -This function builds a set of character tables for character values less than -256. 
These can be passed to \fBpcre[16|32]_compile()\fP to override PCRE's -internal, built-in tables (which were made by \fBpcre[16|32]_maketables()\fP when -PCRE was compiled). You might want to do this if you are using a non-standard -locale. The function yields a pointer to the tables. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_pattern_to_host_byte_order.3 b/src/pcre/doc/pcre_pattern_to_host_byte_order.3 deleted file mode 100644 index b0c41c38..00000000 --- a/src/pcre/doc/pcre_pattern_to_host_byte_order.3 +++ /dev/null @@ -1,44 +0,0 @@ -.TH PCRE_PATTERN_TO_HOST_BYTE_ORDER 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre_pattern_to_host_byte_order(pcre *\fIcode\fP, -.B " pcre_extra *\fIextra\fP, const unsigned char *\fItables\fP);" -.sp -.B int pcre16_pattern_to_host_byte_order(pcre16 *\fIcode\fP, -.B " pcre16_extra *\fIextra\fP, const unsigned char *\fItables\fP);" -.sp -.B int pcre32_pattern_to_host_byte_order(pcre32 *\fIcode\fP, -.B " pcre32_extra *\fIextra\fP, const unsigned char *\fItables\fP);" -.fi -. -.SH DESCRIPTION -.rs -.sp -This function ensures that the bytes in 2-byte and 4-byte values in a compiled -pattern are in the correct order for the current host. It is useful when a -pattern that has been compiled on one host is transferred to another that might -have different endianness. The arguments are: -.sp - \fIcode\fP A compiled regular expression - \fIextra\fP Points to an associated \fBpcre[16|32]_extra\fP structure, - or is NULL - \fItables\fP Pointer to character tables, or NULL to - set the built-in default -.sp -The result is 0 for success, a negative PCRE_ERROR_xxx value otherwise. 
-.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_refcount.3 b/src/pcre/doc/pcre_refcount.3 deleted file mode 100644 index 45a41fef..00000000 --- a/src/pcre/doc/pcre_refcount.3 +++ /dev/null @@ -1,36 +0,0 @@ -.TH PCRE_REFCOUNT 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP); -.PP -.B int pcre16_refcount(pcre16 *\fIcode\fP, int \fIadjust\fP); -.PP -.B int pcre32_refcount(pcre32 *\fIcode\fP, int \fIadjust\fP); -. -.SH DESCRIPTION -.rs -.sp -This function is used to maintain a reference count inside a data block that -contains a compiled pattern. Its arguments are: -.sp - \fIcode\fP Compiled regular expression - \fIadjust\fP Adjustment to reference value -.sp -The yield of the function is the adjusted reference value, which is constrained -to lie between 0 and 65535. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_study.3 b/src/pcre/doc/pcre_study.3 deleted file mode 100644 index 1200e0a6..00000000 --- a/src/pcre/doc/pcre_study.3 +++ /dev/null @@ -1,54 +0,0 @@ -.TH PCRE_STUDY 3 " 24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.sp -.B pcre16_extra *pcre16_study(const pcre16 *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.sp -.B pcre32_extra *pcre32_study(const pcre32 *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.fi -. 
-.SH DESCRIPTION -.rs -.sp -This function studies a compiled pattern, to see if additional information can -be extracted that might speed up matching. Its arguments are: -.sp - \fIcode\fP A compiled regular expression - \fIoptions\fP Options for \fBpcre[16|32]_study()\fP - \fIerrptr\fP Where to put an error message -.sp -If the function succeeds, it returns a value that can be passed to -\fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP via their \fIextra\fP -arguments. -.P -If the function returns NULL, either it could not find any additional -information, or there was an error. You can tell the difference by looking at -the error value. It is NULL in first case. -.P -The only option is PCRE_STUDY_JIT_COMPILE. It requests just-in-time compilation -if possible. If PCRE has been compiled without JIT support, this option is -ignored. See the -.\" HREF -\fBpcrejit\fP -.\" -page for further details. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_utf16_to_host_byte_order.3 b/src/pcre/doc/pcre_utf16_to_host_byte_order.3 deleted file mode 100644 index 1851b619..00000000 --- a/src/pcre/doc/pcre_utf16_to_host_byte_order.3 +++ /dev/null @@ -1,45 +0,0 @@ -.TH PCRE_UTF16_TO_HOST_BYTE_ORDER 3 "21 January 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *\fIoutput\fP, -.B " PCRE_SPTR16 \fIinput\fP, int \fIlength\fP, int *\fIhost_byte_order\fP," -.B " int \fIkeep_boms\fP);" -.fi -. -. -.SH DESCRIPTION -.rs -.sp -This function, which exists only in the 16-bit library, converts a UTF-16 -string to the correct order for the current host, taking account of any byte -order marks (BOMs) within the string. 
Its arguments are: -.sp - \fIoutput\fP pointer to output buffer, may be the same as \fIinput\fP - \fIinput\fP pointer to input buffer - \fIlength\fP number of 16-bit units in the input, or negative for - a zero-terminated string - \fIhost_byte_order\fP a NULL value or a non-zero value pointed to means - start in host byte order - \fIkeep_boms\fP if non-zero, BOMs are copied to the output string -.sp -The result of the function is the number of 16-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -.P -If \fIhost_byte_order\fP is not NULL, it is set to indicate the byte order that -is current at the end of the string. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_utf32_to_host_byte_order.3 b/src/pcre/doc/pcre_utf32_to_host_byte_order.3 deleted file mode 100644 index a415dcf5..00000000 --- a/src/pcre/doc/pcre_utf32_to_host_byte_order.3 +++ /dev/null @@ -1,45 +0,0 @@ -.TH PCRE_UTF32_TO_HOST_BYTE_ORDER 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.nf -.B int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *\fIoutput\fP, -.B " PCRE_SPTR32 \fIinput\fP, int \fIlength\fP, int *\fIhost_byte_order\fP," -.B " int \fIkeep_boms\fP);" -.fi -. -. -.SH DESCRIPTION -.rs -.sp -This function, which exists only in the 32-bit library, converts a UTF-32 -string to the correct order for the current host, taking account of any byte -order marks (BOMs) within the string. 
Its arguments are: -.sp - \fIoutput\fP pointer to output buffer, may be the same as \fIinput\fP - \fIinput\fP pointer to input buffer - \fIlength\fP number of 32-bit units in the input, or negative for - a zero-terminated string - \fIhost_byte_order\fP a NULL value or a non-zero value pointed to means - start in host byte order - \fIkeep_boms\fP if non-zero, BOMs are copied to the output string -.sp -The result of the function is the number of 32-bit units placed into the output -buffer, including the zero terminator if the string was zero-terminated. -.P -If \fIhost_byte_order\fP is not NULL, it is set to indicate the byte order that -is current at the end of the string. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. diff --git a/src/pcre/doc/pcre_version.3 b/src/pcre/doc/pcre_version.3 deleted file mode 100644 index 0f4973f9..00000000 --- a/src/pcre/doc/pcre_version.3 +++ /dev/null @@ -1,31 +0,0 @@ -.TH PCRE_VERSION 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include -.PP -.SM -.B const char *pcre_version(void); -.PP -.B const char *pcre16_version(void); -.PP -.B const char *pcre32_version(void); -. -.SH DESCRIPTION -.rs -.sp -This function (even in the 16-bit and 32-bit libraries) returns a -zero-terminated, 8-bit character string that gives the version number of the -PCRE library and the date of its release. -.P -There is a complete description of the PCRE native API in the -.\" HREF -\fBpcreapi\fP -.\" -page and a description of the POSIX API in the -.\" HREF -\fBpcreposix\fP -.\" -page. 
diff --git a/src/pcre/doc/pcreapi.3 b/src/pcre/doc/pcreapi.3 deleted file mode 100644 index 6e7c7c6e..00000000 --- a/src/pcre/doc/pcreapi.3 +++ /dev/null @@ -1,2918 +0,0 @@ -.TH PCREAPI 3 "18 December 2015" "PCRE 8.39" -.SH NAME -PCRE - Perl-compatible regular expressions -.sp -.B #include <pcre.h> -. -. -.SH "PCRE NATIVE API BASIC FUNCTIONS" -.rs -.sp -.nf -.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.sp -.B void pcre_free_study(pcre_extra *\fIextra\fP); -.sp -.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.sp -.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.fi -. -.
-.SH "PCRE NATIVE API STRING EXTRACTION FUNCTIONS" -.rs -.sp -.nf -.B int pcre_copy_named_substring(const pcre *\fIcode\fP, -.B " const char *\fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, const char *\fIstringname\fP," -.B " char *\fIbuffer\fP, int \fIbuffersize\fP);" -.sp -.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.sp -.B int pcre_get_named_substring(const pcre *\fIcode\fP, -.B " const char *\fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, const char *\fIstringname\fP," -.B " const char **\fIstringptr\fP);" -.sp -.B int pcre_get_stringnumber(const pcre *\fIcode\fP, -.B " const char *\fIname\fP);" -.sp -.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP, -.B " const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);" -.sp -.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " const char **\fIstringptr\fP);" -.sp -.B int pcre_get_substring_list(const char *\fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, const char ***\fIlistptr\fP);" -.sp -.B void pcre_free_substring(const char *\fIstringptr\fP); -.sp -.B void pcre_free_substring_list(const char **\fIstringptr\fP); -.fi -. -. 
-.SH "PCRE NATIVE API AUXILIARY FUNCTIONS" -.rs -.sp -.nf -.B int pcre_jit_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " pcre_jit_stack *\fIjstack\fP);" -.sp -.B pcre_jit_stack *pcre_jit_stack_alloc(int \fIstartsize\fP, int \fImaxsize\fP); -.sp -.B void pcre_jit_stack_free(pcre_jit_stack *\fIstack\fP); -.sp -.B void pcre_assign_jit_stack(pcre_extra *\fIextra\fP, -.B " pcre_jit_callback \fIcallback\fP, void *\fIdata\fP);" -.sp -.B const unsigned char *pcre_maketables(void); -.sp -.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.sp -.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP); -.sp -.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP); -.sp -.B const char *pcre_version(void); -.sp -.B int pcre_pattern_to_host_byte_order(pcre *\fIcode\fP, -.B " pcre_extra *\fIextra\fP, const unsigned char *\fItables\fP);" -.fi -. -. -.SH "PCRE NATIVE API INDIRECTED FUNCTIONS" -.rs -.sp -.nf -.B void *(*pcre_malloc)(size_t); -.sp -.B void (*pcre_free)(void *); -.sp -.B void *(*pcre_stack_malloc)(size_t); -.sp -.B void (*pcre_stack_free)(void *); -.sp -.B int (*pcre_callout)(pcre_callout_block *); -.sp -.B int (*pcre_stack_guard)(void); -.fi -. -. -.SH "PCRE 8-BIT, 16-BIT, AND 32-BIT LIBRARIES" -.rs -.sp -As well as support for 8-bit character strings, PCRE also supports 16-bit -strings (from release 8.30) and 32-bit strings (from release 8.32), by means of -two additional libraries. They can be built as well as, or instead of, the -8-bit library. To avoid too much complication, this document describes the -8-bit versions of the functions, with only occasional references to the 16-bit -and 32-bit libraries. 
-.P -The 16-bit and 32-bit functions operate in the same way as their 8-bit -counterparts; they just use different data types for their arguments and -results, and their names start with \fBpcre16_\fP or \fBpcre32_\fP instead of -\fBpcre_\fP. For every option that has UTF8 in its name (for example, -PCRE_UTF8), there are corresponding 16-bit and 32-bit names with UTF8 replaced -by UTF16 or UTF32, respectively. This facility is in fact just cosmetic; the -16-bit and 32-bit option names define the same bit values. -.P -References to bytes and UTF-8 in this document should be read as references to -16-bit data units and UTF-16 when using the 16-bit library, or 32-bit data -units and UTF-32 when using the 32-bit library, unless specified otherwise. -More details of the specific differences for the 16-bit and 32-bit libraries -are given in the -.\" HREF -\fBpcre16\fP -.\" -and -.\" HREF -\fBpcre32\fP -.\" -pages. -. -. -.SH "PCRE API OVERVIEW" -.rs -.sp -PCRE has its own native API, which is described in this document. There are -also some wrapper functions (for the 8-bit library only) that correspond to the -POSIX regular expression API, but they do not give access to all the -functionality. They are described in the -.\" HREF -\fBpcreposix\fP -.\" -documentation. Both of these APIs define a set of C function calls. A C++ -wrapper (again for the 8-bit library only) is also distributed with PCRE. It is -documented in the -.\" HREF -\fBpcrecpp\fP -.\" -page. -.P -The native API C function prototypes are defined in the header file -\fBpcre.h\fP, and on Unix-like systems the (8-bit) library itself is called -\fBlibpcre\fP. It can normally be accessed by adding \fB-lpcre\fP to the -command for linking an application that uses PCRE. The header file defines the -macros PCRE_MAJOR and PCRE_MINOR to contain the major and minor release numbers -for the library. Applications can use these to include support for different -releases of PCRE. 
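The release macros mentioned above can gate optional code paths at compile time. The following is a minimal sketch, not taken from the PCRE distribution: PCRE_AT_LEAST and HAVE_PCRE_JIT_EXEC are hypothetical names, and the fallback definitions of PCRE_MAJOR and PCRE_MINOR are stand-in values that exist only so the fragment compiles without pcre.h.

```c
/* Stand-in values (assumption) so the sketch compiles without pcre.h.
   When the real header is included first, PCRE_MAJOR and PCRE_MINOR are
   already defined and this block is skipped. */
#ifndef PCRE_MAJOR
#define PCRE_MAJOR 8
#define PCRE_MINOR 39
#endif

/* Hypothetical helper: true when the header describes release
   major.minor or a later one. */
#define PCRE_AT_LEAST(major, minor) \
    (PCRE_MAJOR > (major) || \
     (PCRE_MAJOR == (major) && PCRE_MINOR >= (minor)))

/* The direct JIT execution interface is documented as available
   from release 8.32, so only use it when the header is new enough. */
#if PCRE_AT_LEAST(8, 32)
#define HAVE_PCRE_JIT_EXEC 1
#else
#define HAVE_PCRE_JIT_EXEC 0
#endif
```

An application would then wrap calls that need the newer interface in `#if HAVE_PCRE_JIT_EXEC` and fall back to the ordinary matching function otherwise.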
-.P -In a Windows environment, if you want to statically link an application program -against a non-dll \fBpcre.a\fP file, you must define PCRE_STATIC before -including \fBpcre.h\fP or \fBpcrecpp.h\fP, because otherwise the -\fBpcre_malloc()\fP and \fBpcre_free()\fP exported functions will be declared -\fB__declspec(dllimport)\fP, with unwanted results. -.P -The functions \fBpcre_compile()\fP, \fBpcre_compile2()\fP, \fBpcre_study()\fP, -and \fBpcre_exec()\fP are used for compiling and matching regular expressions -in a Perl-compatible manner. A sample program that demonstrates the simplest -way of using them is provided in the file called \fIpcredemo.c\fP in the PCRE -source distribution. A listing of this program is given in the -.\" HREF -\fBpcredemo\fP -.\" -documentation, and the -.\" HREF -\fBpcresample\fP -.\" -documentation describes how to compile and run it. -.P -Just-in-time compiler support is an optional feature of PCRE that can be built -in appropriate hardware environments. It greatly speeds up the matching -performance of many patterns. Simple programs can easily request that it be -used if available, by setting an option that is ignored when it is not -relevant. More complicated programs might need to make use of the functions -\fBpcre_jit_stack_alloc()\fP, \fBpcre_jit_stack_free()\fP, and -\fBpcre_assign_jit_stack()\fP in order to control the JIT code's memory usage. -.P -From release 8.32 there is also a direct interface for JIT execution, which -gives improved performance. The JIT-specific functions are discussed in the -.\" HREF -\fBpcrejit\fP -.\" -documentation. -.P -A second matching function, \fBpcre_dfa_exec()\fP, which is not -Perl-compatible, is also provided. This uses a different algorithm for the -matching. The alternative algorithm finds all possible matches (at a given -point in the subject), and scans the subject just once (unless there are -lookbehind assertions). However, this algorithm does not return captured -substrings. 
A description of the two matching algorithms and their advantages -and disadvantages is given in the -.\" HREF -\fBpcrematching\fP -.\" -documentation. -.P -In addition to the main compiling and matching functions, there are convenience -functions for extracting captured substrings from a subject string that is -matched by \fBpcre_exec()\fP. They are: -.sp - \fBpcre_copy_substring()\fP - \fBpcre_copy_named_substring()\fP - \fBpcre_get_substring()\fP - \fBpcre_get_named_substring()\fP - \fBpcre_get_substring_list()\fP - \fBpcre_get_stringnumber()\fP - \fBpcre_get_stringtable_entries()\fP -.sp -\fBpcre_free_substring()\fP and \fBpcre_free_substring_list()\fP are also -provided, to free the memory used for extracted strings. -.P -The function \fBpcre_maketables()\fP is used to build a set of character tables -in the current locale for passing to \fBpcre_compile()\fP, \fBpcre_exec()\fP, -or \fBpcre_dfa_exec()\fP. This is an optional facility that is provided for -specialist use. Most commonly, no special tables are passed, in which case -internal tables that are generated when PCRE is built are used. -.P -The function \fBpcre_fullinfo()\fP is used to find out information about a -compiled pattern. The function \fBpcre_version()\fP returns a pointer to a -string containing the version of PCRE and its date of release. -.P -The function \fBpcre_refcount()\fP maintains a reference count in a data block -containing a compiled pattern. This is provided for the benefit of -object-oriented applications. -.P -The global variables \fBpcre_malloc\fP and \fBpcre_free\fP initially contain -the entry points of the standard \fBmalloc()\fP and \fBfree()\fP functions, -respectively. PCRE calls the memory management functions via these variables, -so a calling program can replace them if it wishes to intercept the calls. This -should be done before calling any PCRE functions. 
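The indirection through global function pointers described above can be sketched as follows. This is an illustration of the technique only: hook_malloc and hook_free are hypothetical stand-ins for the real pcre_malloc and pcre_free variables declared in pcre.h, and the counting allocator is an invented example of a replacement.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins (hypothetical names) for the library's global hooks. Like the
   real variables, they start out pointing at malloc() and free(). */
static void *(*hook_malloc)(size_t) = malloc;
static void (*hook_free)(void *) = free;

/* Number of blocks handed out and not yet released. */
static size_t live_blocks = 0;

/* Replacement allocator: forwards to malloc() and counts successes. */
static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* Replacement deallocator: forwards to free() and updates the count. */
static void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Install the replacements. This should run before the library has
   allocated anything, so every block is released by the matching function. */
static void install_counting_allocator(void)
{
    hook_malloc = counting_malloc;
    hook_free = counting_free;
}
```

With the real library the assignments would target pcre_malloc and pcre_free instead. Because those variables are shared by the whole process, a replacement used from several threads would need its own synchronization.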
-.P -The global variables \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP are also -indirections to memory management functions. These special functions are used -only when PCRE is compiled to use the heap for remembering data, instead of -recursive function calls, when running the \fBpcre_exec()\fP function. See the -.\" HREF -\fBpcrebuild\fP -.\" -documentation for details of how to do this. It is a non-standard way of -building PCRE, for use in environments that have limited stacks. Because of the -greater use of memory management, it runs more slowly. Separate functions are -provided so that special-purpose external code can be used for this case. When -used, these functions always allocate memory blocks of the same size. There is -a discussion about PCRE's stack usage in the -.\" HREF -\fBpcrestack\fP -.\" -documentation. -.P -The global variable \fBpcre_callout\fP initially contains NULL. It can be set -by the caller to a "callout" function, which PCRE will then call at specified -points during a matching operation. Details are given in the -.\" HREF -\fBpcrecallout\fP -.\" -documentation. -.P -The global variable \fBpcre_stack_guard\fP initially contains NULL. It can be -set by the caller to a function that is called by PCRE whenever it starts -to compile a parenthesized part of a pattern. When parentheses are nested, PCRE -uses recursive function calls, which use up the system stack. This function is -provided so that applications with restricted stacks can force a compilation -error if the stack runs out. The function should return zero if all is well, or -non-zero to force an error. -. -. -.\" HTML -.SH NEWLINES -.rs -.sp -PCRE supports five different conventions for indicating line breaks in -strings: a single CR (carriage return) character, a single LF (linefeed) -character, the two-character sequence CRLF, any of the three preceding, or any -Unicode newline sequence. 
The Unicode newline sequences are the three just -mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed, -U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS -(paragraph separator, U+2029). -.P -Each of the first three conventions is used by at least one operating system as -its standard newline sequence. When PCRE is built, a default can be specified. -The default default is LF, which is the Unix standard. When PCRE is run, the -default can be overridden, either when a pattern is compiled, or when it is -matched. -.P -At compile time, the newline convention can be specified by the \fIoptions\fP -argument of \fBpcre_compile()\fP, or it can be specified by special text at the -start of the pattern itself; this overrides any other settings. See the -.\" HREF -\fBpcrepattern\fP -.\" -page for details of the special character sequences. -.P -In the PCRE documentation the word "newline" is used to mean "the character or -pair of characters that indicate a line break". The choice of newline -convention affects the handling of the dot, circumflex, and dollar -metacharacters, the handling of #-comments in /x mode, and, when CRLF is a -recognized line ending sequence, the match position advancement for a -non-anchored pattern. There is more detail about this in the -.\" HTML -.\" -section on \fBpcre_exec()\fP options -.\" -below. -.P -The choice of newline convention does not affect the interpretation of -the \en or \er escape sequences, nor does it affect what \eR matches, which is -controlled in a similar way, but by separate options. -. -. -.SH MULTITHREADING -.rs -.sp -The PCRE functions can be used in multi-threading applications, with the -proviso that the memory management functions pointed to by \fBpcre_malloc\fP, -\fBpcre_free\fP, \fBpcre_stack_malloc\fP, and \fBpcre_stack_free\fP, and the -callout and stack-checking functions pointed to by \fBpcre_callout\fP and -\fBpcre_stack_guard\fP, are shared by all threads. 
-.P -The compiled form of a regular expression is not altered during matching, so -the same compiled pattern can safely be used by several threads at once. -.P -If the just-in-time optimization feature is being used, it needs separate -memory stack areas for each thread. See the -.\" HREF -\fBpcrejit\fP -.\" -documentation for more details. -. -. -.SH "SAVING PRECOMPILED PATTERNS FOR LATER USE" -.rs -.sp -The compiled form of a regular expression can be saved and re-used at a later -time, possibly by a different program, and even on a host other than the one on -which it was compiled. Details are given in the -.\" HREF -\fBpcreprecompile\fP -.\" -documentation, which includes a description of the -\fBpcre_pattern_to_host_byte_order()\fP function. However, compiling a regular -expression with one version of PCRE for use with a different version is not -guaranteed to work and may cause crashes. -. -. -.SH "CHECKING BUILD-TIME OPTIONS" -.rs -.sp -.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP); -.PP -The function \fBpcre_config()\fP makes it possible for a PCRE client to -discover which optional features have been compiled into the PCRE library. The -.\" HREF -\fBpcrebuild\fP -.\" -documentation has more details about these optional features. -.P -The first argument for \fBpcre_config()\fP is an integer, specifying which -information is required; the second argument is a pointer to a variable into -which the information is placed. The returned value is zero on success, or the -negative error code PCRE_ERROR_BADOPTION if the value in the first argument is -not recognized. The following information is available: -.sp - PCRE_CONFIG_UTF8 -.sp -The output is an integer that is set to one if UTF-8 support is available; -otherwise it is set to zero. This value should normally be given to the 8-bit -version of this function, \fBpcre_config()\fP. If it is given to the 16-bit -or 32-bit version of this function, the result is PCRE_ERROR_BADOPTION. 
-.sp - PCRE_CONFIG_UTF16 -.sp -The output is an integer that is set to one if UTF-16 support is available; -otherwise it is set to zero. This value should normally be given to the 16-bit -version of this function, \fBpcre16_config()\fP. If it is given to the 8-bit -or 32-bit version of this function, the result is PCRE_ERROR_BADOPTION. -.sp - PCRE_CONFIG_UTF32 -.sp -The output is an integer that is set to one if UTF-32 support is available; -otherwise it is set to zero. This value should normally be given to the 32-bit -version of this function, \fBpcre32_config()\fP. If it is given to the 8-bit -or 16-bit version of this function, the result is PCRE_ERROR_BADOPTION. -.sp - PCRE_CONFIG_UNICODE_PROPERTIES -.sp -The output is an integer that is set to one if support for Unicode character -properties is available; otherwise it is set to zero. -.sp - PCRE_CONFIG_JIT -.sp -The output is an integer that is set to one if support for just-in-time -compiling is available; otherwise it is set to zero. -.sp - PCRE_CONFIG_JITTARGET -.sp -The output is a pointer to a zero-terminated "const char *" string. If JIT -support is available, the string contains the name of the architecture for -which the JIT compiler is configured, for example "x86 32bit (little endian + -unaligned)". If JIT support is not available, the result is NULL. -.sp - PCRE_CONFIG_NEWLINE -.sp -The output is an integer whose value specifies the default character sequence -that is recognized as meaning "newline". The values that are supported in -ASCII/Unicode environments are: 10 for LF, 13 for CR, 3338 for CRLF, -2 for -ANYCRLF, and -1 for ANY. In EBCDIC environments, CR, ANYCRLF, and ANY yield the -same values. However, the value for LF is normally 21, though some EBCDIC -environments use 37. The corresponding values for CRLF are 3349 and 3365. The -default should normally correspond to the standard sequence for your operating -system. 
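The two-character values quoted above are consistent with packing the sequence as (first << 8) | second: 3338 is 13 * 256 + 10, and the EBCDIC values 3349 and 3365 are 13 * 256 + 21 and 13 * 256 + 37. The sketch below decodes a PCRE_CONFIG_NEWLINE result on that assumption; decode_newline() is a hypothetical helper, not a PCRE function.

```c
#include <stddef.h>

/* Classify a PCRE_CONFIG_NEWLINE value and split it into character codes.
   Assumption: a two-character sequence is packed as (first << 8) | second,
   which matches the documented values 3338, 3349 and 3365.
   *first and *second are set to -1 when they do not apply. */
static const char *decode_newline(int value, int *first, int *second)
{
    *first = -1;
    *second = -1;

    if (value == -1)
        return "ANY";
    if (value == -2)
        return "ANYCRLF";

    if (value > 255) {
        /* Two-character sequence such as CRLF. */
        *first = (value >> 8) & 0xff;
        *second = value & 0xff;
        return "two-character sequence";
    }

    /* Single character such as LF or CR. */
    *first = value;
    return "single character";
}
```

For example, passing 3338 yields the codes 13 and 10, that is, CR followed by LF in an ASCII/Unicode environment.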
-.sp - PCRE_CONFIG_BSR -.sp -The output is an integer whose value indicates what character sequences the \eR -escape sequence matches by default. A value of 0 means that \eR matches any -Unicode line ending sequence; a value of 1 means that \eR matches only CR, LF, -or CRLF. The default can be overridden when a pattern is compiled or matched. -.sp - PCRE_CONFIG_LINK_SIZE -.sp -The output is an integer that contains the number of bytes used for internal -linkage in compiled regular expressions. For the 8-bit library, the value can -be 2, 3, or 4. For the 16-bit library, the value is either 2 or 4 and is still -a number of bytes. For the 32-bit library, the value is either 2 or 4 and is -still a number of bytes. The default value of 2 is sufficient for all but the -most massive patterns, since it allows the compiled pattern to be up to 64K in -size. Larger values allow larger regular expressions to be compiled, at the -expense of slower matching. -.sp - PCRE_CONFIG_POSIX_MALLOC_THRESHOLD -.sp -The output is an integer that contains the threshold above which the POSIX -interface uses \fBmalloc()\fP for output vectors. Further details are given in -the -.\" HREF -\fBpcreposix\fP -.\" -documentation. -.sp - PCRE_CONFIG_PARENS_LIMIT -.sp -The output is a long integer that gives the maximum depth of nesting of -parentheses (of any kind) in a pattern. This limit is imposed to cap the amount -of system stack used when a pattern is compiled. It is specified when PCRE is -built; the default is 250. This limit does not take into account the stack that -may already be used by the calling application. For finer control over -compilation stack usage, you can set a pointer to an external checking function -in \fBpcre_stack_guard\fP. -.sp - PCRE_CONFIG_MATCH_LIMIT -.sp -The output is a long integer that gives the default limit for the number of -internal matching function calls in a \fBpcre_exec()\fP execution. Further -details are given with \fBpcre_exec()\fP below. 
-.sp - PCRE_CONFIG_MATCH_LIMIT_RECURSION -.sp -The output is a long integer that gives the default limit for the depth of -recursion when calling the internal matching function in a \fBpcre_exec()\fP -execution. Further details are given with \fBpcre_exec()\fP below. -.sp - PCRE_CONFIG_STACKRECURSE -.sp -The output is an integer that is set to one if internal recursion when running -\fBpcre_exec()\fP is implemented by recursive function calls that use the stack -to remember their state. This is the usual way that PCRE is compiled. The -output is zero if PCRE was compiled to use blocks of data on the heap instead -of recursive function calls. In this case, \fBpcre_stack_malloc\fP and -\fBpcre_stack_free\fP are called to manage memory blocks on the heap, thus -avoiding the use of the stack. -. -. -.SH "COMPILING A PATTERN" -.rs -.sp -.nf -.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.sp -.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP, -.B " int *\fIerrorcodeptr\fP," -.B " const char **\fIerrptr\fP, int *\fIerroffset\fP," -.B " const unsigned char *\fItableptr\fP);" -.fi -.P -Either of the functions \fBpcre_compile()\fP or \fBpcre_compile2()\fP can be -called to compile a pattern into an internal form. The only difference between -the two interfaces is that \fBpcre_compile2()\fP has an additional argument, -\fIerrorcodeptr\fP, via which a numerical error code can be returned. To avoid -too much repetition, we refer just to \fBpcre_compile()\fP below, but the -information applies equally to \fBpcre_compile2()\fP. -.P -The pattern is a C string terminated by a binary zero, and is passed in the -\fIpattern\fP argument. A pointer to a single block of memory that is obtained -via \fBpcre_malloc\fP is returned. This contains the compiled code and related -data. 
The \fBpcre\fP type is defined for the returned block; this is a typedef -for a structure whose contents are not externally defined. It is up to the -caller to free the memory (via \fBpcre_free\fP) when it is no longer required. -.P -Although the compiled code of a PCRE regex is relocatable, that is, it does not -depend on memory location, the complete \fBpcre\fP data block is not -fully relocatable, because it may contain a copy of the \fItableptr\fP -argument, which is an address (see below). -.P -The \fIoptions\fP argument contains various bit settings that affect the -compilation. It should be zero if no options are required. The available -options are described below. Some of them (in particular, those that are -compatible with Perl, but some others as well) can also be set and unset from -within the pattern (see the detailed description in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation). For those options that can be different in different parts of -the pattern, the contents of the \fIoptions\fP argument specifies their -settings at the start of compilation and execution. The PCRE_ANCHORED, -PCRE_BSR_\fIxxx\fP, PCRE_NEWLINE_\fIxxx\fP, PCRE_NO_UTF8_CHECK, and -PCRE_NO_START_OPTIMIZE options can be set at the time of matching as well as at -compile time. -.P -If \fIerrptr\fP is NULL, \fBpcre_compile()\fP returns NULL immediately. -Otherwise, if compilation of a pattern fails, \fBpcre_compile()\fP returns -NULL, and sets the variable pointed to by \fIerrptr\fP to point to a textual -error message. This is a static string that is part of the library. You must -not try to free it. Normally, the offset from the start of the pattern to the -data unit that was being processed when the error was discovered is placed in -the variable pointed to by \fIerroffset\fP, which must not be NULL (if it is, -an immediate error is given). However, for an invalid UTF-8 or UTF-16 string, -the offset is that of the first data unit of the failing character. 
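One way to present the message and offset to a user is to print the pattern with a marker under the failing position. The following is a minimal sketch under stated assumptions: report_compile_error() is a hypothetical helper, it takes the values that \fBpcre_compile()\fP would have stored through \fIerrptr\fP and \fIerroffset\fP, and the marker lines up only when each data unit occupies one display column.

```c
#include <stdio.h>
#include <string.h>

/* Format a two-line diagnostic into buf: the pattern, then a caret under
   the data unit at erroffset, followed by the error text.
   Returns the value of snprintf(). Hypothetical helper, not part of PCRE. */
static int report_compile_error(char *buf, size_t bufsize,
                                const char *pattern, const char *error,
                                int erroffset)
{
    size_t len = strlen(pattern);
    size_t col = erroffset < 0 ? 0 : (size_t)erroffset;

    /* The offset can equal the pattern length; never point past that. */
    if (col > len)
        col = len;

    /* "%*s" with an empty string emits col spaces of padding. */
    return snprintf(buf, bufsize, "%s\n%*s^ %s (offset %d)\n",
                    pattern, (int)col, "", error, erroffset);
}
```

Because the offset counts data units rather than characters, a caller working in a UTF mode would need to convert it to a display column before relying on the caret position.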
-.P -Some errors are not detected until the whole pattern has been scanned; in these -cases, the offset passed back is the length of the pattern. Note that the -offset is in data units, not characters, even in a UTF mode. It may sometimes -point into the middle of a UTF-8 or UTF-16 character. -.P -If \fBpcre_compile2()\fP is used instead of \fBpcre_compile()\fP, and the -\fIerrorcodeptr\fP argument is not NULL, a non-zero error code number is -returned via this argument in the event of an error. This is in addition to the -textual error message. Error codes and messages are listed below. -.P -If the final argument, \fItableptr\fP, is NULL, PCRE uses a default set of -character tables that are built when PCRE is compiled, using the default C -locale. Otherwise, \fItableptr\fP must be an address that is the result of a -call to \fBpcre_maketables()\fP. This value is stored with the compiled -pattern, and used again by \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP when the -pattern is matched. For more discussion, see the section on locale support -below. -.P -This code fragment shows a typical straightforward call to \fBpcre_compile()\fP: -.sp - pcre *re; - const char *error; - int erroffset; - re = pcre_compile( - "^A.*Z", /* the pattern */ - 0, /* default options */ - &error, /* for error message */ - &erroffset, /* for error offset */ - NULL); /* use default character tables */ -.sp -The following names for option bits are defined in the \fBpcre.h\fP header -file: -.sp - PCRE_ANCHORED -.sp -If this bit is set, the pattern is forced to be "anchored", that is, it is -constrained to match only at the first matching point in the string that is -being searched (the "subject string"). This effect can also be achieved by -appropriate constructs in the pattern itself, which is the only way to do it in -Perl. -.sp - PCRE_AUTO_CALLOUT -.sp -If this bit is set, \fBpcre_compile()\fP automatically inserts callout items, -all with number 255, before each pattern item. 
For discussion of the callout -facility, see the -.\" HREF -\fBpcrecallout\fP -.\" -documentation. -.sp - PCRE_BSR_ANYCRLF - PCRE_BSR_UNICODE -.sp -These options (which are mutually exclusive) control what the \eR escape -sequence matches. The choice is either to match only CR, LF, or CRLF, or to -match any Unicode newline sequence. The default is specified when PCRE is -built. It can be overridden from within the pattern, or by setting an option -when a compiled pattern is matched. -.sp - PCRE_CASELESS -.sp -If this bit is set, letters in the pattern match both upper and lower case -letters. It is equivalent to Perl's /i option, and it can be changed within a -pattern by a (?i) option setting. In UTF-8 mode, PCRE always understands the -concept of case for characters whose values are less than 128, so caseless -matching is always possible. For characters with higher values, the concept of -case is supported if PCRE is compiled with Unicode property support, but not -otherwise. If you want to use caseless matching for characters 128 and above, -you must ensure that PCRE is compiled with Unicode property support as well as -with UTF-8 support. -.sp - PCRE_DOLLAR_ENDONLY -.sp -If this bit is set, a dollar metacharacter in the pattern matches only at the -end of the subject string. Without this option, a dollar also matches -immediately before a newline at the end of the string (but not before any other -newlines). The PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set. -There is no equivalent to this option in Perl, and no way to set it within a -pattern. -.sp - PCRE_DOTALL -.sp -If this bit is set, a dot metacharacter in the pattern matches a character of -any value, including one that indicates a newline. However, it only ever -matches one character, even if newlines are coded as CRLF. Without this option, -a dot does not match when the current position is at a newline. 
This option is -equivalent to Perl's /s option, and it can be changed within a pattern by a -(?s) option setting. A negative class such as [^a] always matches newline -characters, independent of the setting of this option. -.sp - PCRE_DUPNAMES -.sp -If this bit is set, names used to identify capturing subpatterns need not be -unique. This can be helpful for certain types of pattern when it is known that -only one instance of the named subpattern can ever be matched. There are more -details of named subpatterns below; see also the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. -.sp - PCRE_EXTENDED -.sp -If this bit is set, most white space characters in the pattern are totally -ignored except when escaped or inside a character class. However, white space -is not allowed within sequences such as (?> that introduce various -parenthesized subpatterns, nor within a numerical quantifier such as {1,3}. -However, ignorable white space is permitted between an item and a following -quantifier and between a quantifier and a following + that indicates -possessiveness. -.P -White space formerly did not include the VT character (code 11), because Perl -did not treat this character as white space. However, Perl changed at release -5.18, so PCRE followed at release 8.34, and VT is now treated as white space. -.P -PCRE_EXTENDED also causes characters between an unescaped # outside a character -class and the next newline, inclusive, to be ignored. PCRE_EXTENDED is -equivalent to Perl's /x option, and it can be changed within a pattern by a -(?x) option setting. -.P -Which characters are interpreted as newlines is controlled by the options -passed to \fBpcre_compile()\fP or by a special sequence at the start of the -pattern, as described in the section entitled -.\" HTML -.\" -"Newline conventions" -.\" -in the \fBpcrepattern\fP documentation.
Note that the end of this type of -comment is a literal newline sequence in the pattern; escape sequences that -happen to represent a newline do not count. -.P -This option makes it possible to include comments inside complicated patterns. -Note, however, that this applies only to data characters. White space characters -may never appear within special character sequences in a pattern, for example -within the sequence (?( that introduces a conditional subpattern. -.sp - PCRE_EXTRA -.sp -This option was invented in order to turn on additional functionality of PCRE -that is incompatible with Perl, but it is currently of very little use. When -set, any backslash in a pattern that is followed by a letter that has no -special meaning causes an error, thus reserving these combinations for future -expansion. By default, as in Perl, a backslash followed by a letter with no -special meaning is treated as a literal. (Perl can, however, be persuaded to -give an error for this, by running it with the -w option.) There are at present -no other features controlled by this option. It can also be set by a (?X) -option setting within a pattern. -.sp - PCRE_FIRSTLINE -.sp -If this option is set, an unanchored pattern is required to match before or at -the first newline in the subject string, though the matched text may continue -over the newline. -.sp - PCRE_JAVASCRIPT_COMPAT -.sp -If this option is set, PCRE's behaviour is changed in some ways so that it is -compatible with JavaScript rather than Perl. The changes are as follows: -.P -(1) A lone closing square bracket in a pattern causes a compile-time error, -because this is illegal in JavaScript (by default it is treated as a data -character). Thus, the pattern AB]CD becomes illegal when this option is set. -.P -(2) At run time, a back reference to an unset subpattern group matches an empty -string (by default this causes the current matching alternative to fail). 
A -pattern such as (\e1)(a) succeeds when this option is set (assuming it can find -an "a" in the subject), whereas it fails by default, for Perl compatibility. -.P -(3) \eU matches an upper case "U" character; by default \eU causes a compile -time error (Perl uses \eU to upper case subsequent characters). -.P -(4) \eu matches a lower case "u" character unless it is followed by four -hexadecimal digits, in which case the hexadecimal number defines the code point -to match. By default, \eu causes a compile time error (Perl uses it to upper -case the following character). -.P -(5) \ex matches a lower case "x" character unless it is followed by two -hexadecimal digits, in which case the hexadecimal number defines the code point -to match. By default, as in Perl, a hexadecimal number is always expected after -\ex, but it may have zero, one, or two digits (so, for example, \exz matches a -binary zero character followed by z). -.sp - PCRE_MULTILINE -.sp -By default, for the purposes of matching "start of line" and "end of line", -PCRE treats the subject string as consisting of a single line of characters, -even if it actually contains newlines. The "start of line" metacharacter (^) -matches only at the start of the string, and the "end of line" metacharacter -($) matches only at the end of the string, or before a terminating newline -(except when PCRE_DOLLAR_ENDONLY is set). Note, however, that unless -PCRE_DOTALL is set, the "any character" metacharacter (.) does not match at a -newline. This behaviour (for ^, $, and dot) is the same as Perl. -.P -When PCRE_MULTILINE it is set, the "start of line" and "end of line" constructs -match immediately following or immediately before internal newlines in the -subject string, respectively, as well as at the very start and end. This is -equivalent to Perl's /m option, and it can be changed within a pattern by a -(?m) option setting. 
If there are no newlines in a subject string, or no -occurrences of ^ or $ in a pattern, setting PCRE_MULTILINE has no effect. -.sp - PCRE_NEVER_UTF -.sp -This option locks out interpretation of the pattern as UTF-8 (or UTF-16 or -UTF-32 in the 16-bit and 32-bit libraries). In particular, it prevents the -creator of the pattern from switching to UTF interpretation by starting the -pattern with (*UTF). This may be useful in applications that process patterns -from external sources. The combination of PCRE_UTF8 and PCRE_NEVER_UTF also -causes an error. -.sp - PCRE_NEWLINE_CR - PCRE_NEWLINE_LF - PCRE_NEWLINE_CRLF - PCRE_NEWLINE_ANYCRLF - PCRE_NEWLINE_ANY -.sp -These options override the default newline definition that was chosen when PCRE -was built. Setting the first or the second specifies that a newline is -indicated by a single character (CR or LF, respectively). Setting -PCRE_NEWLINE_CRLF specifies that a newline is indicated by the two-character -CRLF sequence. Setting PCRE_NEWLINE_ANYCRLF specifies that any of the three -preceding sequences should be recognized. Setting PCRE_NEWLINE_ANY specifies -that any Unicode newline sequence should be recognized. -.P -In an ASCII/Unicode environment, the Unicode newline sequences are the three -just mentioned, plus the single characters VT (vertical tab, U+000B), FF (form -feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS -(paragraph separator, U+2029). For the 8-bit library, the last two are -recognized only in UTF-8 mode. -.P -When PCRE is compiled to run in an EBCDIC (mainframe) environment, the code for -CR is 0x0d, the same as ASCII. However, the character code for LF is normally -0x15, though in some EBCDIC environments 0x25 is used. Whichever of these is -not LF is made to correspond to Unicode's NEL character. EBCDIC codes are all -less than 256. For more details, see the -.\" HREF -\fBpcrebuild\fP -.\" -documentation. 
-.P -The newline setting in the options word uses three bits that are treated -as a number, giving eight possibilities. Currently only six are used (default -plus the five values above). This means that if you set more than one newline -option, the combination may or may not be sensible. For example, -PCRE_NEWLINE_CR with PCRE_NEWLINE_LF is equivalent to PCRE_NEWLINE_CRLF, but -other combinations may yield unused numbers and cause an error. -.P -The only time that a line break in a pattern is specially recognized when -compiling is when PCRE_EXTENDED is set. CR and LF are white space characters, -and so are ignored in this mode. Also, an unescaped # outside a character class -indicates a comment that lasts until after the next line break sequence. In -other circumstances, line break sequences in patterns are treated as literal -data. -.P -The newline option that is set at compile time becomes the default that is used -for \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, but it can be overridden. -.sp - PCRE_NO_AUTO_CAPTURE -.sp -If this option is set, it disables the use of numbered capturing parentheses in -the pattern. Any opening parenthesis that is not followed by ? behaves as if it -were followed by ?: but named parentheses can still be used for capturing (and -they acquire numbers in the usual way). There is no equivalent of this option -in Perl. -.sp - PCRE_NO_AUTO_POSSESS -.sp -If this option is set, it disables "auto-possessification". This is an -optimization that, for example, turns a+b into a++b in order to avoid -backtracks into a+ that can never be successful. However, if callouts are in -use, auto-possessification means that some of them are never taken. You can set -this option if you want the matching functions to do a full unoptimized search -and run all the callouts, but it is mainly provided for testing purposes. 
-.sp - PCRE_NO_START_OPTIMIZE -.sp -This is an option that acts at matching time; that is, it is really an option -for \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. If it is set at compile time, -it is remembered with the compiled pattern and assumed at matching time. This -is necessary if you want to use JIT execution, because the JIT compiler needs -to know whether or not this option is set. For details see the discussion of -PCRE_NO_START_OPTIMIZE -.\" HTML -.\" -below. -.\" -.sp - PCRE_UCP -.sp -This option changes the way PCRE processes \eB, \eb, \eD, \ed, \eS, \es, \eW, -\ew, and some of the POSIX character classes. By default, only ASCII characters -are recognized, but if PCRE_UCP is set, Unicode properties are used instead to -classify characters. More details are given in the section on -.\" HTML -.\" -generic character types -.\" -in the -.\" HREF -\fBpcrepattern\fP -.\" -page. If you set PCRE_UCP, matching one of the items it affects takes much -longer. The option is available only if PCRE has been compiled with Unicode -property support. -.sp - PCRE_UNGREEDY -.sp -This option inverts the "greediness" of the quantifiers so that they are not -greedy by default, but become greedy if followed by "?". It is not compatible -with Perl. It can also be set by a (?U) option setting within the pattern. -.sp - PCRE_UTF8 -.sp -This option causes PCRE to regard both the pattern and the subject as strings -of UTF-8 characters instead of single-byte strings. However, it is available -only when PCRE is built to include UTF support. If not, the use of this option -provokes an error. Details of how this option changes the behaviour of PCRE are -given in the -.\" HREF -\fBpcreunicode\fP -.\" -page. -.sp - PCRE_NO_UTF8_CHECK -.sp -When PCRE_UTF8 is set, the validity of the pattern as a UTF-8 string is -automatically checked. There is a discussion about the -.\" HTML -.\" -validity of UTF-8 strings -.\" -in the -.\" HREF -\fBpcreunicode\fP -.\" -page. 
If an invalid UTF-8 sequence is found, \fBpcre_compile()\fP returns an -error. If you already know that your pattern is valid, and you want to skip -this check for performance reasons, you can set the PCRE_NO_UTF8_CHECK option. -When it is set, the effect of passing an invalid UTF-8 string as a pattern is -undefined. It may cause your program to crash or loop. Note that this option -can also be passed to \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP, to suppress -the validity checking of subject strings only. If the same string is being -matched many times, the option can be safely set for the second and subsequent -matchings to improve performance. -. -. -.SH "COMPILATION ERROR CODES" -.rs -.sp -The following table lists the error codes than may be returned by -\fBpcre_compile2()\fP, along with the error messages that may be returned by -both compiling functions. Note that error messages are always 8-bit ASCII -strings, even in 16-bit or 32-bit mode. As PCRE has developed, some error codes -have fallen out of use. To avoid confusion, they have not been re-used. -.sp - 0 no error - 1 \e at end of pattern - 2 \ec at end of pattern - 3 unrecognized character follows \e - 4 numbers out of order in {} quantifier - 5 number too big in {} quantifier - 6 missing terminating ] for character class - 7 invalid escape sequence in character class - 8 range out of order in character class - 9 nothing to repeat - 10 [this code is not in use] - 11 internal error: unexpected repeat - 12 unrecognized character after (? 
or (?- - 13 POSIX named classes are supported only within a class - 14 missing ) - 15 reference to non-existent subpattern - 16 erroffset passed as NULL - 17 unknown option bit(s) set - 18 missing ) after comment - 19 [this code is not in use] - 20 regular expression is too large - 21 failed to get memory - 22 unmatched parentheses - 23 internal error: code overflow - 24 unrecognized character after (?< - 25 lookbehind assertion is not fixed length - 26 malformed number or name after (?( - 27 conditional group contains more than two branches - 28 assertion expected after (?( - 29 (?R or (?[+-]digits must be followed by ) - 30 unknown POSIX class name - 31 POSIX collating elements are not supported - 32 this version of PCRE is compiled without UTF support - 33 [this code is not in use] - 34 character value in \ex{} or \eo{} is too large - 35 invalid condition (?(0) - 36 \eC not allowed in lookbehind assertion - 37 PCRE does not support \eL, \el, \eN{name}, \eU, or \eu - 38 number after (?C is > 255 - 39 closing ) for (?C expected - 40 recursive call could loop indefinitely - 41 unrecognized character after (?P - 42 syntax error in subpattern name (missing terminator) - 43 two named subpatterns have the same name - 44 invalid UTF-8 string (specifically UTF-8) - 45 support for \eP, \ep, and \eX has not been compiled - 46 malformed \eP or \ep sequence - 47 unknown property name after \eP or \ep - 48 subpattern name is too long (maximum 32 characters) - 49 too many named subpatterns (maximum 10000) - 50 [this code is not in use] - 51 octal value is greater than \e377 in 8-bit non-UTF-8 mode - 52 internal error: overran compiling workspace - 53 internal error: previously-checked referenced subpattern - not found - 54 DEFINE group contains more than one branch - 55 repeating a DEFINE group is not allowed - 56 inconsistent NEWLINE options - 57 \eg is not followed by a braced, angle-bracketed, or quoted - name/number or by a plain number - 58 a numbered reference must not 
be zero - 59 an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) - 60 (*VERB) not recognized or malformed - 61 number is too big - 62 subpattern name expected - 63 digit expected after (?+ - 64 ] is an invalid data character in JavaScript compatibility mode - 65 different names for subpatterns of the same number are - not allowed - 66 (*MARK) must have an argument - 67 this version of PCRE is not compiled with Unicode property - support - 68 \ec must be followed by an ASCII character - 69 \ek is not followed by a braced, angle-bracketed, or quoted name - 70 internal error: unknown opcode in find_fixedlength() - 71 \eN is not supported in a class - 72 too many forward references - 73 disallowed Unicode code point (>= 0xd800 && <= 0xdfff) - 74 invalid UTF-16 string (specifically UTF-16) - 75 name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) - 76 character value in \eu.... sequence is too large - 77 invalid UTF-32 string (specifically UTF-32) - 78 setting UTF is disabled by the application - 79 non-hex character in \ex{} (closing brace missing?) - 80 non-octal character in \eo{} (closing brace missing?) - 81 missing opening brace after \eo - 82 parentheses are too deeply nested - 83 invalid range in character class - 84 group name must start with a non-digit - 85 parentheses are too deeply nested (stack check) -.sp -The numbers 32 and 10000 in errors 48 and 49 are defaults; different values may -be used if the limits were changed when PCRE was built. -. -. -.\" HTML -.SH "STUDYING A PATTERN" -.rs -.sp -.nf -.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP, -.B " const char **\fIerrptr\fP);" -.fi -.PP -If a compiled pattern is going to be used several times, it is worth spending -more time analyzing it in order to speed up the time taken for matching. The -function \fBpcre_study()\fP takes a pointer to a compiled pattern as its first -argument. 
If studying the pattern produces additional information that will -help speed up matching, \fBpcre_study()\fP returns a pointer to a -\fBpcre_extra\fP block, in which the \fIstudy_data\fP field points to the -results of the study. -.P -The returned value from \fBpcre_study()\fP can be passed directly to -\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. However, a \fBpcre_extra\fP block -also contains other fields that can be set by the caller before the block is -passed; these are described -.\" HTML -.\" -below -.\" -in the section on matching a pattern. -.P -If studying the pattern does not produce any useful information, -\fBpcre_study()\fP returns NULL by default. In that circumstance, if the -calling program wants to pass any of the other fields to \fBpcre_exec()\fP or -\fBpcre_dfa_exec()\fP, it must set up its own \fBpcre_extra\fP block. However, -if \fBpcre_study()\fP is called with the PCRE_STUDY_EXTRA_NEEDED option, it -returns a \fBpcre_extra\fP block even if studying did not find any additional -information. It may still return NULL, however, if an error occurs in -\fBpcre_study()\fP. -.P -The second argument of \fBpcre_study()\fP contains option bits. There are three -further options in addition to PCRE_STUDY_EXTRA_NEEDED: -.sp - PCRE_STUDY_JIT_COMPILE - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE -.sp -If any of these are set, and the just-in-time compiler is available, the -pattern is further compiled into machine code that executes much faster than -the \fBpcre_exec()\fP interpretive matching function. If the just-in-time -compiler is not available, these options are ignored. All undefined bits in the -\fIoptions\fP argument must be zero. -.P -JIT compilation is a heavyweight optimization. It can take some time for -patterns to be analyzed, and for one-off matches and simple patterns the -benefit of faster execution might be offset by a much slower study time. -Not all patterns can be optimized by the JIT compiler. 
For those that cannot be -handled, matching automatically falls back to the \fBpcre_exec()\fP -interpreter. For more details, see the -.\" HREF -\fBpcrejit\fP -.\" -documentation. -.P -The third argument for \fBpcre_study()\fP is a pointer for an error message. If -studying succeeds (even if no data is returned), the variable it points to is -set to NULL. Otherwise it is set to point to a textual error message. This is a -static string that is part of the library. You must not try to free it. You -should test the error pointer for NULL after calling \fBpcre_study()\fP, to be -sure that it has run successfully. -.P -When you are finished with a pattern, you can free the memory used for the -study data by calling \fBpcre_free_study()\fP. This function was added to the -API for release 8.20. For earlier versions, the memory could be freed with -\fBpcre_free()\fP, just like the pattern itself. This will still work in cases -where JIT optimization is not used, but it is advisable to change to the new -function when convenient. -.P -This is a typical way in which \fBpcre_study\fP() is used (except that in a -real application there should be tests for errors): -.sp - int rc; - pcre *re; - pcre_extra *sd; - re = pcre_compile("pattern", 0, &error, &erroroffset, NULL); - sd = pcre_study( - re, /* result of pcre_compile() */ - 0, /* no options */ - &error); /* set to NULL or points to a message */ - rc = pcre_exec( /* see below for details of pcre_exec() options */ - re, sd, "subject", 7, 0, 0, ovector, 30); - ... - pcre_free_study(sd); - pcre_free(re); -.sp -Studying a pattern does two things: first, a lower bound for the length of -subject string that is needed to match the pattern is computed. This does not -mean that there are any strings of that length that match, but it does -guarantee that no shorter strings match. The value is used to avoid wasting -time by trying to match strings that are shorter than the lower bound. 
You can -find out the value in a calling program via the \fBpcre_fullinfo()\fP function. -.P -Studying a pattern is also useful for non-anchored patterns that do not have a -single fixed starting character. A bitmap of possible starting bytes is -created. This speeds up finding a position in the subject at which to start -matching. (In 16-bit mode, the bitmap is used for 16-bit values less than 256. -In 32-bit mode, the bitmap is used for 32-bit values less than 256.) -.P -These two optimizations apply to both \fBpcre_exec()\fP and -\fBpcre_dfa_exec()\fP, and the information is also used by the JIT compiler. -The optimizations can be disabled by setting the PCRE_NO_START_OPTIMIZE option. -You might want to do this if your pattern contains callouts or (*MARK) and you -want to make use of these facilities in cases where matching fails. -.P -PCRE_NO_START_OPTIMIZE can be specified at either compile time or execution -time. However, if PCRE_NO_START_OPTIMIZE is passed to \fBpcre_exec()\fP, (that -is, after any JIT compilation has happened) JIT execution is disabled. For JIT -execution to work with PCRE_NO_START_OPTIMIZE, the option must be set at -compile time. -.P -There is a longer discussion of PCRE_NO_START_OPTIMIZE -.\" HTML -.\" -below. -.\" -. -. -.\" HTML -.SH "LOCALE SUPPORT" -.rs -.sp -PCRE handles caseless matching, and determines whether characters are letters, -digits, or whatever, by reference to a set of tables, indexed by character -code point. When running in UTF-8 mode, or in the 16- or 32-bit libraries, this -applies only to characters with code points less than 256. By default, -higher-valued code points never match escapes such as \ew or \ed. However, if -PCRE is built with Unicode property support, all characters can be tested with -\ep and \eP, or, alternatively, the PCRE_UCP option can be set when a pattern -is compiled; this causes \ew and friends to use Unicode property support -instead of the built-in tables. 
-.P -The use of locales with Unicode is discouraged. If you are handling characters -with code points greater than 128, you should either use Unicode support, or -use locales, but not try to mix the two. -.P -PCRE contains an internal set of tables that are used when the final argument -of \fBpcre_compile()\fP is NULL. These are sufficient for many applications. -Normally, the internal tables recognize only ASCII characters. However, when -PCRE is built, it is possible to cause the internal tables to be rebuilt in the -default "C" locale of the local system, which may cause them to be different. -.P -The internal tables can always be overridden by tables supplied by the -application that calls PCRE. These may be created in a different locale from -the default. As more and more applications change to using Unicode, the need -for this locale support is expected to die away. -.P -External tables are built by calling the \fBpcre_maketables()\fP function, -which has no arguments, in the relevant locale. The result can then be passed -to \fBpcre_compile()\fP as often as necessary. For example, to build and use -tables that are appropriate for the French locale (where accented characters -with values greater than 128 are treated as letters), the following code could -be used: -.sp - setlocale(LC_CTYPE, "fr_FR"); - tables = pcre_maketables(); - re = pcre_compile(..., tables); -.sp -The locale name "fr_FR" is used on Linux and other Unix-like systems; if you -are using Windows, the name for the French locale is "french". -.P -When \fBpcre_maketables()\fP runs, the tables are built in memory that is -obtained via \fBpcre_malloc\fP. It is the caller's responsibility to ensure -that the memory containing the tables remains available for as long as it is -needed. -.P -The pointer that is passed to \fBpcre_compile()\fP is saved with the compiled -pattern, and the same tables are used via this pointer by \fBpcre_study()\fP -and also by \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP. 
Thus, for any single -pattern, compilation, studying and matching all happen in the same locale, but -different patterns can be processed in different locales. -.P -It is possible to pass a table pointer or NULL (indicating the use of the -internal tables) to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP (see the -discussion below in the section on matching a pattern). This facility is -provided for use with pre-compiled patterns that have been saved and reloaded. -Character tables are not saved with patterns, so if a non-standard table was -used at compile time, it must be provided again when the reloaded pattern is -matched. Attempting to use this facility to match a pattern in a different -locale from the one in which it was compiled is likely to lead to anomalous -(usually incorrect) results. -. -. -.\" HTML -.SH "INFORMATION ABOUT A PATTERN" -.rs -.sp -.nf -.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " int \fIwhat\fP, void *\fIwhere\fP);" -.fi -.PP -The \fBpcre_fullinfo()\fP function returns information about a compiled -pattern. It replaces the \fBpcre_info()\fP function, which was removed from the -library at version 8.30, after more than 10 years of obsolescence. -.P -The first argument for \fBpcre_fullinfo()\fP is a pointer to the compiled -pattern. The second argument is the result of \fBpcre_study()\fP, or NULL if -the pattern was not studied. The third argument specifies which piece of -information is required, and the fourth argument is a pointer to a variable -to receive the data. 
The yield of the function is zero for success, or one of -the following negative numbers: -.sp - PCRE_ERROR_NULL the argument \fIcode\fP was NULL - the argument \fIwhere\fP was NULL - PCRE_ERROR_BADMAGIC the "magic number" was not found - PCRE_ERROR_BADENDIANNESS the pattern was compiled with different - endianness - PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid - PCRE_ERROR_UNSET the requested field is not set -.sp -The "magic number" is placed at the start of each compiled pattern as an simple -check against passing an arbitrary memory pointer. The endianness error can -occur if a compiled pattern is saved and reloaded on a different host. Here is -a typical call of \fBpcre_fullinfo()\fP, to obtain the length of the compiled -pattern: -.sp - int rc; - size_t length; - rc = pcre_fullinfo( - re, /* result of pcre_compile() */ - sd, /* result of pcre_study(), or NULL */ - PCRE_INFO_SIZE, /* what is required */ - &length); /* where to put the data */ -.sp -The possible values for the third argument are defined in \fBpcre.h\fP, and are -as follows: -.sp - PCRE_INFO_BACKREFMAX -.sp -Return the number of the highest back reference in the pattern. The fourth -argument should point to an \fBint\fP variable. Zero is returned if there are -no back references. -.sp - PCRE_INFO_CAPTURECOUNT -.sp -Return the number of capturing subpatterns in the pattern. The fourth argument -should point to an \fBint\fP variable. -.sp - PCRE_INFO_DEFAULT_TABLES -.sp -Return a pointer to the internal default character tables within PCRE. The -fourth argument should point to an \fBunsigned char *\fP variable. This -information call is provided for internal use by the \fBpcre_study()\fP -function. External callers can cause PCRE to use its internal tables by passing -a NULL table pointer. -.sp - PCRE_INFO_FIRSTBYTE (deprecated) -.sp -Return information about the first data unit of any matched string, for a -non-anchored pattern. 
The name of this option refers to the 8-bit library, -where data units are bytes. The fourth argument should point to an \fBint\fP -variable. Negative values are used for special cases. However, this means that -when the 32-bit library is in non-UTF-32 mode, the full 32-bit range of -characters cannot be returned. For this reason, this value is deprecated; use -PCRE_INFO_FIRSTCHARACTERFLAGS and PCRE_INFO_FIRSTCHARACTER instead. -.P -If there is a fixed first value, for example, the letter "c" from a pattern -such as (cat|cow|coyote), its value is returned. In the 8-bit library, the -value is always less than 256. In the 16-bit library the value can be up to -0xffff. In the 32-bit library the value can be up to 0x10ffff. -.P -If there is no fixed first value, and if either -.sp -(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch -starts with "^", or -.sp -(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set -(if it were set, the pattern would be anchored), -.sp --1 is returned, indicating that the pattern matches only at the start of a -subject string or after any newline within the string. Otherwise -2 is -returned. For anchored patterns, -2 is returned. -.sp - PCRE_INFO_FIRSTCHARACTER -.sp -Return the value of the first data unit (non-UTF character) of any matched -string in the situation where PCRE_INFO_FIRSTCHARACTERFLAGS returns 1; -otherwise return 0. The fourth argument should point to an \fBuint_t\fP -variable. -.P -In the 8-bit library, the value is always less than 256. In the 16-bit library -the value can be up to 0xffff. In the 32-bit library in UTF-32 mode the value -can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 mode. -.sp - PCRE_INFO_FIRSTCHARACTERFLAGS -.sp -Return information about the first data unit of any matched string, for a -non-anchored pattern. The fourth argument should point to an \fBint\fP -variable. 
-.P -If there is a fixed first value, for example, the letter "c" from a pattern -such as (cat|cow|coyote), 1 is returned, and the character value can be -retrieved using PCRE_INFO_FIRSTCHARACTER. If there is no fixed first value, and -if either -.sp -(a) the pattern was compiled with the PCRE_MULTILINE option, and every branch -starts with "^", or -.sp -(b) every branch of the pattern starts with ".*" and PCRE_DOTALL is not set -(if it were set, the pattern would be anchored), -.sp -2 is returned, indicating that the pattern matches only at the start of a -subject string or after any newline within the string. Otherwise 0 is -returned. For anchored patterns, 0 is returned. -.sp - PCRE_INFO_FIRSTTABLE -.sp -If the pattern was studied, and this resulted in the construction of a 256-bit -table indicating a fixed set of values for the first data unit in any matching -string, a pointer to the table is returned. Otherwise NULL is returned. The -fourth argument should point to an \fBunsigned char *\fP variable. -.sp - PCRE_INFO_HASCRORLF -.sp -Return 1 if the pattern contains any explicit matches for CR or LF characters, -otherwise 0. The fourth argument should point to an \fBint\fP variable. An -explicit match is either a literal CR or LF character, or \er or \en. -.sp - PCRE_INFO_JCHANGED -.sp -Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise -0. The fourth argument should point to an \fBint\fP variable. (?J) and -(?-J) set and unset the local PCRE_DUPNAMES option, respectively. -.sp - PCRE_INFO_JIT -.sp -Return 1 if the pattern was studied with one of the JIT options, and -just-in-time compiling was successful. The fourth argument should point to an -\fBint\fP variable. A return value of 0 means that JIT support is not available -in this version of PCRE, or that the pattern was not studied with a JIT option, -or that the JIT compiler could not handle this particular pattern. 
See the -.\" HREF -\fBpcrejit\fP -.\" -documentation for details of what can and cannot be handled. -.sp - PCRE_INFO_JITSIZE -.sp -If the pattern was successfully studied with a JIT option, return the size of -the JIT compiled code, otherwise return zero. The fourth argument should point -to a \fBsize_t\fP variable. -.sp - PCRE_INFO_LASTLITERAL -.sp -Return the value of the rightmost literal data unit that must exist in any -matched string, other than at its start, if such a value has been recorded. The -fourth argument should point to an \fBint\fP variable. If there is no such -value, -1 is returned. For anchored patterns, a last literal value is recorded -only if it follows something of variable length. For example, for the pattern -/^a\ed+z\ed+/ the returned value is "z", but for /^a\edz\ed/ the returned value -is -1. -.P -Since for the 32-bit library using the non-UTF-32 mode, this function is unable -to return the full 32-bit range of characters, this value is deprecated; -instead the PCRE_INFO_REQUIREDCHARFLAGS and PCRE_INFO_REQUIREDCHAR values should -be used. -.sp - PCRE_INFO_MATCH_EMPTY -.sp -Return 1 if the pattern can match an empty string, otherwise 0. The fourth -argument should point to an \fBint\fP variable. -.sp - PCRE_INFO_MATCHLIMIT -.sp -If the pattern set a match limit by including an item of the form -(*LIMIT_MATCH=nnnn) at the start, the value is returned. The fourth argument -should point to an unsigned 32-bit integer. If no such value has been set, the -call to \fBpcre_fullinfo()\fP returns the error PCRE_ERROR_UNSET. -.sp - PCRE_INFO_MAXLOOKBEHIND -.sp -Return the number of characters (NB not data units) in the longest lookbehind -assertion in the pattern. This information is useful when doing multi-segment -matching using the partial matching facilities. Note that the simple assertions -\eb and \eB require a one-character lookbehind. 
\eA also registers a -one-character lookbehind, though it does not actually inspect the previous -character. This is to ensure that at least one character from the old segment -is retained when a new segment is processed. Otherwise, if there are no -lookbehinds in the pattern, \eA might match incorrectly at the start of a new -segment. -.sp - PCRE_INFO_MINLENGTH -.sp -If the pattern was studied and a minimum length for matching subject strings -was computed, its value is returned. Otherwise the returned value is -1. The -value is a number of characters, which in UTF mode may be different from the -number of data units. The fourth argument should point to an \fBint\fP -variable. A non-negative value is a lower bound to the length of any matching -string. There may not be any strings of that length that do actually match, but -every string that does match is at least that long. -.sp - PCRE_INFO_NAMECOUNT - PCRE_INFO_NAMEENTRYSIZE - PCRE_INFO_NAMETABLE -.sp -PCRE supports the use of named as well as numbered capturing parentheses. The -names are just an additional way of identifying the parentheses, which still -acquire numbers. Several convenience functions such as -\fBpcre_get_named_substring()\fP are provided for extracting captured -substrings by name. It is also possible to extract the data directly, by first -converting the name to a number in order to access the correct pointers in the -output vector (described with \fBpcre_exec()\fP below). To do the conversion, -you need to use the name-to-number map, which is described by these three -values. -.P -The map consists of a number of fixed-size entries. PCRE_INFO_NAMECOUNT gives -the number of entries, and PCRE_INFO_NAMEENTRYSIZE gives the size of each -entry; both of these return an \fBint\fP value. The entry size depends on the -length of the longest name. PCRE_INFO_NAMETABLE returns a pointer to the first -entry of the table. 
This is a pointer to \fBchar\fP in the 8-bit library, where -the first two bytes of each entry are the number of the capturing parenthesis, -most significant byte first. In the 16-bit library, the pointer points to -16-bit data units, the first of which contains the parenthesis number. In the -32-bit library, the pointer points to 32-bit data units, the first of which -contains the parenthesis number. The rest of the entry is the corresponding -name, zero terminated. -.P -The names are in alphabetical order. If (?| is used to create multiple groups -with the same number, as described in the -.\" HTML -.\" -section on duplicate subpattern numbers -.\" -in the -.\" HREF -\fBpcrepattern\fP -.\" -page, the groups may be given the same name, but there is only one entry in the -table. Different names for groups of the same number are not permitted. -Duplicate names for subpatterns with different numbers are permitted, -but only if PCRE_DUPNAMES is set. They appear in the table in the order in -which they were found in the pattern. In the absence of (?| this is the order -of increasing number; when (?| is used this is not necessarily the case because -later subpatterns may have lower numbers. -.P -As a simple example of the name/number table, consider the following pattern -after compilation by the 8-bit library (assume PCRE_EXTENDED is set, so white -space - including newlines - is ignored): -.sp -.\" JOIN - (?<date> (?<year>(\ed\ed)?\ed\ed) - - (?<month>\ed\ed) - (?<day>\ed\ed) ) -.sp -There are four named subpatterns, so the table has four entries, and each entry -in the table is eight bytes long. The table is as follows, with non-printing -bytes shown in hexadecimal, and undefined bytes shown as ??: -.sp - 00 01 d a t e 00 ?? - 00 05 d a y 00 ?? ?? - 00 04 m o n t h 00 - 00 02 y e a r 00 ?? -.sp -When writing code to extract data from named subpatterns using the -name-to-number map, remember that the length of the entries is likely to be -different for each compiled pattern.
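As a rough illustration of walking the name-to-number map in the 8-bit library, the sketch below looks up a name in a table laid out as described above. The function name lookup_group_number is hypothetical and is not part of PCRE. In a real program the table pointer, entry count, and entry size would be obtained from pcre_fullinfo() with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE function): find a group number by name
   in an 8-bit name table. Each of the `count` entries is `entry_size`
   bytes long: bytes 0-1 hold the group number, most significant byte
   first, and the zero-terminated name follows. Returns the number of the
   first entry whose name matches, or -1 if the name is not in the table. */
int lookup_group_number(const unsigned char *table, int count,
                        int entry_size, const char *name)
{
    assert(table != NULL && name != NULL);
    assert(count >= 0 && entry_size > 2);

    for (int i = 0; i < count; i++) {
        /* The entry size differs between compiled patterns, so always
           step by the value reported for this pattern. */
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

With the four-entry table from the example (entry size 8), looking up "month" yields 4 and a name that is not present yields -1. If PCRE_DUPNAMES is in use, this sketch reports only the first entry with the given name.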
-.sp - PCRE_INFO_OKPARTIAL -.sp -Return 1 if the pattern can be used for partial matching with -\fBpcre_exec()\fP, otherwise 0. The fourth argument should point to an -\fBint\fP variable. From release 8.00, this always returns 1, because the -restrictions that previously applied to partial matching have been lifted. The -.\" HREF -\fBpcrepartial\fP -.\" -documentation gives details of partial matching. -.sp - PCRE_INFO_OPTIONS -.sp -Return a copy of the options with which the pattern was compiled. The fourth -argument should point to an \fBunsigned long int\fP variable. These option bits -are those specified in the call to \fBpcre_compile()\fP, modified by any -top-level option settings at the start of the pattern itself. In other words, -they are the options that will be in force when matching starts. For example, -if the pattern /(?im)abc(?-i)d/ is compiled with the PCRE_EXTENDED option, the -result is PCRE_CASELESS, PCRE_MULTILINE, and PCRE_EXTENDED. -.P -A pattern is automatically anchored by PCRE if all of its top-level -alternatives begin with one of the following: -.sp - ^ unless PCRE_MULTILINE is set - \eA always - \eG always -.\" JOIN - .* if PCRE_DOTALL is set and there are no back - references to the subpattern in which .* appears -.sp -For such patterns, the PCRE_ANCHORED bit is set in the options returned by -\fBpcre_fullinfo()\fP. -.sp - PCRE_INFO_RECURSIONLIMIT -.sp -If the pattern set a recursion limit by including an item of the form -(*LIMIT_RECURSION=nnnn) at the start, the value is returned. The fourth -argument should point to an unsigned 32-bit integer. If no such value has been -set, the call to \fBpcre_fullinfo()\fP returns the error PCRE_ERROR_UNSET. -.sp - PCRE_INFO_SIZE -.sp -Return the size of the compiled pattern in bytes (for all three libraries). The -fourth argument should point to a \fBsize_t\fP variable. This value does not -include the size of the \fBpcre\fP structure that is returned by -\fBpcre_compile()\fP. 
The value that is passed as the argument to -\fBpcre_malloc()\fP when \fBpcre_compile()\fP is getting memory in which to -place the compiled data is the value returned by this option plus the size of -the \fBpcre\fP structure. Studying a compiled pattern, with or without JIT, -does not alter the value returned by this option. -.sp - PCRE_INFO_STUDYSIZE -.sp -Return the size in bytes (for all three libraries) of the data block pointed to -by the \fIstudy_data\fP field in a \fBpcre_extra\fP block. If \fBpcre_extra\fP -is NULL, or there is no study data, zero is returned. The fourth argument -should point to a \fBsize_t\fP variable. The \fIstudy_data\fP field is set by -\fBpcre_study()\fP to record information that will speed up matching (see the -section entitled -.\" HTML -.\" -"Studying a pattern" -.\" -above). The format of the \fIstudy_data\fP block is private, but its length -is made available via this option so that it can be saved and restored (see the -.\" HREF -\fBpcreprecompile\fP -.\" -documentation for details). -.sp - PCRE_INFO_REQUIREDCHARFLAGS -.sp -Returns 1 if there is a rightmost literal data unit that must exist in any -matched string, other than at its start. The fourth argument should point to -an \fBint\fP variable. If there is no such value, 0 is returned. If returning -1, the character value itself can be retrieved using PCRE_INFO_REQUIREDCHAR. -.P -For anchored patterns, a last literal value is recorded only if it follows -something of variable length. For example, for the pattern /^a\ed+z\ed+/ the -returned value is 1 (with "z" returned from PCRE_INFO_REQUIREDCHAR), but for -/^a\edz\ed/ the returned value is 0. -.sp - PCRE_INFO_REQUIREDCHAR -.sp -Return the value of the rightmost literal data unit that must exist in any -matched string, other than at its start, if such a value has been recorded. The -fourth argument should point to an \fBuint32_t\fP variable. If there is no such -value, 0 is returned. -. -.
-.SH "REFERENCE COUNTS" -.rs -.sp -.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP); -.PP -The \fBpcre_refcount()\fP function is used to maintain a reference count in the -data block that contains a compiled pattern. It is provided for the benefit of -applications that operate in an object-oriented manner, where different parts -of the application may be using the same compiled pattern, but you want to free -the block when they are all done. -.P -When a pattern is compiled, the reference count field is initialized to zero. -It is changed only by calling this function, whose action is to add the -\fIadjust\fP value (which may be positive or negative) to it. The yield of the -function is the new value. However, the value of the count is constrained to -lie between 0 and 65535, inclusive. If the new value is outside these limits, -it is forced to the appropriate limit value. -.P -Except when it is zero, the reference count is not correctly preserved if a -pattern is compiled on one host and then transferred to a host whose byte-order -is different. (This seems a highly unlikely scenario.) -. -. -.SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION" -.rs -.sp -.nf -.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP, -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);" -.fi -.P -The function \fBpcre_exec()\fP is called to match a subject string against a -compiled pattern, which is passed in the \fIcode\fP argument. If the -pattern was studied, the result of the study should be passed in the -\fIextra\fP argument. You can call \fBpcre_exec()\fP with the same \fIcode\fP -and \fIextra\fP arguments as many times as you like, in order to match -different subject strings with the same pattern. -.P -This function is the main matching facility of the library, and it operates in -a Perl-like manner. 
For specialist use there is also an alternative matching -function, which is described -.\" HTML -.\" -below -.\" -in the section about the \fBpcre_dfa_exec()\fP function. -.P -In most applications, the pattern will have been compiled (and optionally -studied) in the same process that calls \fBpcre_exec()\fP. However, it is -possible to save compiled patterns and study data, and then use them later -in different processes, possibly even on different hosts. For a discussion -about this, see the -.\" HREF -\fBpcreprecompile\fP -.\" -documentation. -.P -Here is an example of a simple call to \fBpcre_exec()\fP: -.sp - int rc; - int ovector[30]; - rc = pcre_exec( - re, /* result of pcre_compile() */ - NULL, /* we didn't study the pattern */ - "some string", /* the subject string */ - 11, /* the length of the subject string */ - 0, /* start at offset 0 in the subject */ - 0, /* default options */ - ovector, /* vector of integers for substring information */ - 30); /* number of elements (NOT size in bytes) */ -. -. -.\" HTML -.SS "Extra data for \fBpcre_exec()\fR" -.rs -.sp -If the \fIextra\fP argument is not NULL, it must point to a \fBpcre_extra\fP -data block. The \fBpcre_study()\fP function returns such a block (when it -doesn't return NULL), but you can also create one for yourself, and pass -additional information in it. The \fBpcre_extra\fP block contains the following -fields (not necessarily in this order): -.sp - unsigned long int \fIflags\fP; - void *\fIstudy_data\fP; - void *\fIexecutable_jit\fP; - unsigned long int \fImatch_limit\fP; - unsigned long int \fImatch_limit_recursion\fP; - void *\fIcallout_data\fP; - const unsigned char *\fItables\fP; - unsigned char **\fImark\fP; -.sp -In the 16-bit version of this structure, the \fImark\fP field has type -"PCRE_UCHAR16 **". -.sp -In the 32-bit version of this structure, the \fImark\fP field has type -"PCRE_UCHAR32 **". -.P -The \fIflags\fP field is used to specify which of the other fields are set. 
The -flag bits are: -.sp - PCRE_EXTRA_CALLOUT_DATA - PCRE_EXTRA_EXECUTABLE_JIT - PCRE_EXTRA_MARK - PCRE_EXTRA_MATCH_LIMIT - PCRE_EXTRA_MATCH_LIMIT_RECURSION - PCRE_EXTRA_STUDY_DATA - PCRE_EXTRA_TABLES -.sp -Other flag bits should be set to zero. The \fIstudy_data\fP field and sometimes -the \fIexecutable_jit\fP field are set in the \fBpcre_extra\fP block that is -returned by \fBpcre_study()\fP, together with the appropriate flag bits. You -should not set these yourself, but you may add to the block by setting other -fields and their corresponding flag bits. -.P -The \fImatch_limit\fP field provides a means of preventing PCRE from using up a -vast amount of resources when running patterns that are not going to match, -but which have a very large number of possibilities in their search trees. The -classic example is a pattern that uses nested unlimited repeats. -.P -Internally, \fBpcre_exec()\fP uses a function called \fBmatch()\fP, which it -calls repeatedly (sometimes recursively). The limit set by \fImatch_limit\fP is -imposed on the number of times this function is called during a match, which -has the effect of limiting the amount of backtracking that can take place. For -patterns that are not anchored, the count restarts from zero for each position -in the subject string. -.P -When \fBpcre_exec()\fP is called with a pattern that was successfully studied -with a JIT option, the way that the matching is executed is entirely different. -However, there is still the possibility of runaway matching that goes on for a -very long time, and so the \fImatch_limit\fP value is also used in this case -(but in a different way) to limit how long the matching can continue. -.P -The default value for the limit can be set when PCRE is built; the default -default is 10 million, which handles all but the most extreme cases. 
You can -override the default by supplying \fBpcre_exec()\fP with a \fBpcre_extra\fP -block in which \fImatch_limit\fP is set, and PCRE_EXTRA_MATCH_LIMIT is set in -the \fIflags\fP field. If the limit is exceeded, \fBpcre_exec()\fP returns -PCRE_ERROR_MATCHLIMIT. -.P -A value for the match limit may also be supplied by an item at the start of a -pattern of the form -.sp - (*LIMIT_MATCH=d) -.sp -where d is a decimal number. However, such a setting is ignored unless d is -less than the limit set by the caller of \fBpcre_exec()\fP or, if no such limit -is set, less than the default. -.P -The \fImatch_limit_recursion\fP field is similar to \fImatch_limit\fP, but -instead of limiting the total number of times that \fBmatch()\fP is called, it -limits the depth of recursion. The recursion depth is a smaller number than the -total number of calls, because not all calls to \fBmatch()\fP are recursive. -This limit is of use only if it is set smaller than \fImatch_limit\fP. -.P -Limiting the recursion depth limits the amount of machine stack that can be -used, or, when PCRE has been compiled to use memory on the heap instead of the -stack, the amount of heap memory that can be used. This limit is not relevant, -and is ignored, when matching is done using JIT compiled code. -.P -The default value for \fImatch_limit_recursion\fP can be set when PCRE is -built; the default default is the same value as the default for -\fImatch_limit\fP. You can override the default by supplying \fBpcre_exec()\fP -with a \fBpcre_extra\fP block in which \fImatch_limit_recursion\fP is set, and -PCRE_EXTRA_MATCH_LIMIT_RECURSION is set in the \fIflags\fP field. If the limit -is exceeded, \fBpcre_exec()\fP returns PCRE_ERROR_RECURSIONLIMIT. -.P -A value for the recursion limit may also be supplied by an item at the start of -a pattern of the form -.sp - (*LIMIT_RECURSION=d) -.sp -where d is a decimal number.
However, such a setting is ignored unless d is -less than the limit set by the caller of \fBpcre_exec()\fP or, if no such limit -is set, less than the default. -.P -The \fIcallout_data\fP field is used in conjunction with the "callout" feature, -and is described in the -.\" HREF -\fBpcrecallout\fP -.\" -documentation. -.P -The \fItables\fP field is provided for use with patterns that have been -pre-compiled using custom character tables, saved to disc or elsewhere, and -then reloaded, because the tables that were used to compile a pattern are not -saved with it. See the -.\" HREF -\fBpcreprecompile\fP -.\" -documentation for a discussion of saving compiled patterns for later use. If -NULL is passed using this mechanism, it forces PCRE's internal tables to be -used. -.P -\fBWarning:\fP The tables that \fBpcre_exec()\fP uses must be the same as those -that were used when the pattern was compiled. If this is not the case, the -behaviour of \fBpcre_exec()\fP is undefined. Therefore, when a pattern is -compiled and matched in the same process, this field should never be set. In -this (the most common) case, the correct table pointer is automatically passed -with the compiled pattern from \fBpcre_compile()\fP to \fBpcre_exec()\fP. -.P -If PCRE_EXTRA_MARK is set in the \fIflags\fP field, the \fImark\fP field must -be set to point to a suitable variable. If the pattern contains any -backtracking control verbs such as (*MARK:NAME), and the execution ends up with -a name to pass back, a pointer to the name string (zero terminated) is placed -in the variable pointed to by the \fImark\fP field. The names are within the -compiled pattern; if you wish to retain such a name you must copy it before -freeing the memory of a compiled pattern. If there is no name to pass back, the -variable pointed to by the \fImark\fP field is set to NULL. 
For details of the -backtracking control verbs, see the section entitled -.\" HTML -.\" -"Backtracking control" -.\" -in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. -. -. -.\" HTML -.SS "Option bits for \fBpcre_exec()\fP" -.rs -.sp -The unused bits of the \fIoptions\fP argument for \fBpcre_exec()\fP must be -zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP, -PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, -PCRE_NO_START_OPTIMIZE, PCRE_NO_UTF8_CHECK, PCRE_PARTIAL_HARD, and -PCRE_PARTIAL_SOFT. -.P -If the pattern was successfully studied with one of the just-in-time (JIT) -compile options, the only supported options for JIT execution are -PCRE_NO_UTF8_CHECK, PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, -PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and PCRE_PARTIAL_SOFT. If an -unsupported option is used, JIT execution is disabled and the normal -interpretive code in \fBpcre_exec()\fP is run. -.sp - PCRE_ANCHORED -.sp -The PCRE_ANCHORED option limits \fBpcre_exec()\fP to matching at the first -matching position. If a pattern was compiled with PCRE_ANCHORED, or turned out -to be anchored by virtue of its contents, it cannot be made unanchored at -matching time. -.sp - PCRE_BSR_ANYCRLF - PCRE_BSR_UNICODE -.sp -These options (which are mutually exclusive) control what the \eR escape -sequence matches. The choice is either to match only CR, LF, or CRLF, or to -match any Unicode newline sequence. These options override the choice that was -made or defaulted when the pattern was compiled. -.sp - PCRE_NEWLINE_CR - PCRE_NEWLINE_LF - PCRE_NEWLINE_CRLF - PCRE_NEWLINE_ANYCRLF - PCRE_NEWLINE_ANY -.sp -These options override the newline definition that was chosen or defaulted when -the pattern was compiled. For details, see the description of -\fBpcre_compile()\fP above. During matching, the newline choice affects the -behaviour of the dot, circumflex, and dollar metacharacters.
It may also alter -the way the match position is advanced after a match failure for an unanchored -pattern. -.P -When PCRE_NEWLINE_CRLF, PCRE_NEWLINE_ANYCRLF, or PCRE_NEWLINE_ANY is set, and a -match attempt for an unanchored pattern fails when the current position is at a -CRLF sequence, and the pattern contains no explicit matches for CR or LF -characters, the match position is advanced by two characters instead of one, in -other words, to after the CRLF. -.P -The above rule is a compromise that makes the most common cases work as -expected. For example, if the pattern is .+A (and the PCRE_DOTALL option is not -set), it does not match the string "\er\enA" because, after failing at the -start, it skips both the CR and the LF before retrying. However, the pattern -[\er\en]A does match that string, because it contains an explicit CR or LF -reference, and so advances only by one character after the first failure. -.P -An explicit match for CR or LF is either a literal appearance of one of those -characters, or one of the \er or \en escape sequences. Implicit matches such as -[^X] do not count, nor does \es (which includes CR and LF in the characters -that it matches). -.P -Notwithstanding the above, anomalous effects may still occur when CRLF is a -valid newline sequence and explicit \er or \en escapes appear in the pattern. -.sp - PCRE_NOTBOL -.sp -This option specifies that the first character of the subject string is not the -beginning of a line, so the circumflex metacharacter should not match before -it. Setting this without PCRE_MULTILINE (at compile time) causes circumflex -never to match. This option affects only the behaviour of the circumflex -metacharacter. It does not affect \eA. -.sp - PCRE_NOTEOL -.sp -This option specifies that the end of the subject string is not the end of a -line, so the dollar metacharacter should not match it nor (except in multiline -mode) a newline immediately before it.
Setting this without PCRE_MULTILINE (at -compile time) causes dollar never to match. This option affects only the -behaviour of the dollar metacharacter. It does not affect \eZ or \ez. -.sp - PCRE_NOTEMPTY -.sp -An empty string is not considered to be a valid match if this option is set. If -there are alternatives in the pattern, they are tried. If all the alternatives -match the empty string, the entire match fails. For example, if the pattern -.sp - a?b? -.sp -is applied to a string not beginning with "a" or "b", it matches an empty -string at the start of the subject. With PCRE_NOTEMPTY set, this match is not -valid, so PCRE searches further into the string for occurrences of "a" or "b". -.sp - PCRE_NOTEMPTY_ATSTART -.sp -This is like PCRE_NOTEMPTY, except that an empty string match that is not at -the start of the subject is permitted. If the pattern is anchored, such a match -can occur only if the pattern contains \eK. -.P -Perl has no direct equivalent of PCRE_NOTEMPTY or PCRE_NOTEMPTY_ATSTART, but it -does make a special case of a pattern match of the empty string within its -\fBsplit()\fP function, and when using the /g modifier. It is possible to -emulate Perl's behaviour after matching a null string by first trying the match -again at the same offset with PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED, and then -if that fails, by advancing the starting offset (see below) and trying an -ordinary match again. There is some code that demonstrates how to do this in -the -.\" HREF -\fBpcredemo\fP -.\" -sample program. In the most general case, you have to check to see if the -newline convention recognizes CRLF as a newline, and if so, and the current -character is CR followed by LF, advance the starting offset by two characters -instead of one. -.sp - PCRE_NO_START_OPTIMIZE -.sp -There are a number of optimizations that \fBpcre_exec()\fP uses at the start of -a match, in order to speed up the process. 
For example, if it is known that an -unanchored match must start with a specific character, it searches the subject -for that character, and fails immediately if it cannot find it, without -actually running the main matching function. This means that a special item -such as (*COMMIT) at the start of a pattern is not considered until after a -suitable starting point for the match has been found. Also, when callouts or -(*MARK) items are in use, these "start-up" optimizations can cause them to be -skipped if the pattern is never actually used. The start-up optimizations are -in effect a pre-scan of the subject that takes place before the pattern is run. -.P -The PCRE_NO_START_OPTIMIZE option disables the start-up optimizations, possibly -causing performance to suffer, but ensuring that in cases where the result is -"no match", the callouts do occur, and that items such as (*COMMIT) and (*MARK) -are considered at every possible starting position in the subject string. If -PCRE_NO_START_OPTIMIZE is set at compile time, it cannot be unset at matching -time. The use of PCRE_NO_START_OPTIMIZE at matching time (that is, passing it -to \fBpcre_exec()\fP) disables JIT execution; in this situation, matching is -always done interpretively. -.P -Setting PCRE_NO_START_OPTIMIZE can change the outcome of a matching operation. -Consider the pattern -.sp - (*COMMIT)ABC -.sp -When this is compiled, PCRE records the fact that a match must start with the -character "A". Suppose the subject string is "DEFABC". The start-up -optimization scans along the subject, finds "A" and runs the first match -attempt from there. The (*COMMIT) item means that the pattern must match the -current starting position, which in this case, it does. However, if the same -match is run with PCRE_NO_START_OPTIMIZE set, the initial scan along the -subject string does not happen.
The first match attempt is run starting from -"D" and when this fails, (*COMMIT) prevents any further matches being tried, so -the overall result is "no match". If the pattern is studied, more start-up -optimizations may be used. For example, a minimum length for the subject may be -recorded. Consider the pattern -.sp - (*MARK:A)(X|Y) -.sp -The minimum length for a match is one character. If the subject is "ABC", there -will be attempts to match "ABC", "BC", "C", and then finally an empty string. -If the pattern is studied, the final attempt does not take place, because PCRE -knows that the subject is too short, and so the (*MARK) is never encountered. -In this case, studying the pattern does not affect the overall match result, -which is still "no match", but it does affect the auxiliary information that is -returned. -.sp - PCRE_NO_UTF8_CHECK -.sp -When PCRE_UTF8 is set at compile time, the validity of the subject as a UTF-8 -string is automatically checked when \fBpcre_exec()\fP is subsequently called. -The entire string is checked before any other processing takes place. The value -of \fIstartoffset\fP is also checked to ensure that it points to the start of a -UTF-8 character. There is a discussion about the -.\" HTML -.\" -validity of UTF-8 strings -.\" -in the -.\" HREF -\fBpcreunicode\fP -.\" -page. If an invalid sequence of bytes is found, \fBpcre_exec()\fP returns the -error PCRE_ERROR_BADUTF8 or, if PCRE_PARTIAL_HARD is set and the problem is a -truncated character at the end of the subject, PCRE_ERROR_SHORTUTF8. In both -cases, information about the precise nature of the error may also be returned -(see the descriptions of these errors in the section entitled \fIError return -values from\fP \fBpcre_exec()\fP -.\" HTML -.\" -below). -.\" -If \fIstartoffset\fP contains a value that does not point to the start of a -UTF-8 character (or to the end of the subject), PCRE_ERROR_BADUTF8_OFFSET is -returned. 
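The requirement on \fIstartoffset\fP can also be checked on the caller's side before PCRE_NO_UTF8_CHECK is used. The following is a minimal sketch, not PCRE's own implementation; the function name is hypothetical. It relies only on the fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, and it does not validate the subject as a whole.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical caller-side check (not a PCRE function): return 1 if
   `offset` is the end of the subject or points at a byte that can start
   a UTF-8 character, otherwise 0. `length` and `offset` are in bytes,
   as for the 8-bit library. */
int offset_is_utf8_char_start(const char *subject, int length, int offset)
{
    assert(subject != NULL && length >= 0);

    if (offset < 0 || offset > length)
        return 0;                 /* outside the subject altogether */
    if (offset == length)
        return 1;                 /* the end of the subject is allowed */

    /* Continuation bytes look like 10xxxxxx; anything else may begin
       a character. */
    return ((unsigned char)subject[offset] & 0xC0) != 0x80;
}
```

For the 4-byte subject "a", U+00E9 (bytes C3 A9), "b", offsets 0, 1, 3 and 4 are accepted, while offset 2 falls in the middle of the two-byte character and is rejected.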
-.P -If you already know that your subject is valid, and you want to skip these -checks for performance reasons, you can set the PCRE_NO_UTF8_CHECK option when -calling \fBpcre_exec()\fP. You might want to do this for the second and -subsequent calls to \fBpcre_exec()\fP if you are making repeated calls to find -all the matches in a single subject string. However, you should be sure that -the value of \fIstartoffset\fP points to the start of a character (or the end -of the subject). When PCRE_NO_UTF8_CHECK is set, the effect of passing an -invalid string as a subject or an invalid value of \fIstartoffset\fP is -undefined. Your program may crash or loop. -.sp - PCRE_PARTIAL_HARD - PCRE_PARTIAL_SOFT -.sp -These options turn on the partial matching feature. For backwards -compatibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. A partial match -occurs if the end of the subject string is reached successfully, but there are -not enough subject characters to complete the match. If this happens when -PCRE_PARTIAL_SOFT (but not PCRE_PARTIAL_HARD) is set, matching continues by -testing any remaining alternatives. Only if no complete match can be found is -PCRE_ERROR_PARTIAL returned instead of PCRE_ERROR_NOMATCH. In other words, -PCRE_PARTIAL_SOFT says that the caller is prepared to handle a partial match, -but only if no complete match can be found. -.P -If PCRE_PARTIAL_HARD is set, it overrides PCRE_PARTIAL_SOFT. In this case, if a -partial match is found, \fBpcre_exec()\fP immediately returns -PCRE_ERROR_PARTIAL, without considering any other alternatives. In other words, -when PCRE_PARTIAL_HARD is set, a partial match is considered to be more -important than an alternative complete match. -.P -In both cases, the portion of the string that was inspected when the partial -match was found is set as the first matching string.
There is a more detailed -discussion of partial and multi-segment matching, with examples, in the -.\" HREF -\fBpcrepartial\fP -.\" -documentation. -. -. -.SS "The string to be matched by \fBpcre_exec()\fP" -.rs -.sp -The subject string is passed to \fBpcre_exec()\fP as a pointer in -\fIsubject\fP, a length in \fIlength\fP, and a starting offset in -\fIstartoffset\fP. The units for \fIlength\fP and \fIstartoffset\fP are bytes -for the 8-bit library, 16-bit data items for the 16-bit library, and 32-bit -data items for the 32-bit library. -.P -If \fIstartoffset\fP is negative or greater than the length of the subject, -\fBpcre_exec()\fP returns PCRE_ERROR_BADOFFSET. When the starting offset is -zero, the search for a match starts at the beginning of the subject, and this -is by far the most common case. In UTF-8 or UTF-16 mode, the offset must point -to the start of a character, or the end of the subject (in UTF-32 mode, one -data unit equals one character, so all offsets are valid). Unlike the pattern -string, the subject may contain binary zeroes. -.P -A non-zero starting offset is useful when searching for another match in the -same subject by calling \fBpcre_exec()\fP again after a previous success. -Setting \fIstartoffset\fP differs from just passing over a shortened string and -setting PCRE_NOTBOL in the case of a pattern that begins with any kind of -lookbehind. For example, consider the pattern -.sp - \eBiss\eB -.sp -which finds occurrences of "iss" in the middle of words. (\eB matches only if -the current position in the subject is not a word boundary.) When applied to -the string "Mississippi" the first call to \fBpcre_exec()\fP finds the first -occurrence. If \fBpcre_exec()\fP is called again with just the remainder of the -subject, namely "issippi", it does not match, because \eB is always false at the -start of the subject, which is deemed to be a word boundary.
However, if -\fBpcre_exec()\fP is passed the entire string again, but with \fIstartoffset\fP -set to 4, it finds the second occurrence of "iss" because it is able to look -behind the starting point to discover that it is preceded by a letter. -.P -Finding all the matches in a subject is tricky when the pattern can match an -empty string. It is possible to emulate Perl's /g behaviour by first trying the -match again at the same offset, with the PCRE_NOTEMPTY_ATSTART and -PCRE_ANCHORED options, and then if that fails, advancing the starting offset -and trying an ordinary match again. There is some code that demonstrates how to -do this in the -.\" HREF -\fBpcredemo\fP -.\" -sample program. In the most general case, you have to check to see if the -newline convention recognizes CRLF as a newline, and if so, and the current -character is CR followed by LF, advance the starting offset by two characters -instead of one. -.P -If a non-zero starting offset is passed when the pattern is anchored, one -attempt to match at the given offset is made. This can only succeed if the -pattern does not require the match to be at the start of the subject. -. -. -.SS "How \fBpcre_exec()\fP returns captured substrings" -.rs -.sp -In general, a pattern matches a certain portion of the subject, and in -addition, further substrings from the subject may be picked out by parts of the -pattern. Following the usage in Jeffrey Friedl's book, this is called -"capturing" in what follows, and the phrase "capturing subpattern" is used for -a fragment of a pattern that picks out a substring. PCRE supports several other -kinds of parenthesized subpattern that do not cause substrings to be captured. -.P -Captured substrings are returned to the caller via a vector of integers whose -address is passed in \fIovector\fP. The number of elements in the vector is -passed in \fIovecsize\fP, which must be a non-negative number. \fBNote\fP: this -argument is NOT the size of \fIovector\fP in bytes. 
-.P -The first two-thirds of the vector is used to pass back captured substrings, -each substring using a pair of integers. The remaining third of the vector is -used as workspace by \fBpcre_exec()\fP while matching capturing subpatterns, -and is not available for passing back information. The number passed in -\fIovecsize\fP should always be a multiple of three. If it is not, it is -rounded down. -.P -When a match is successful, information about captured substrings is returned -in pairs of integers, starting at the beginning of \fIovector\fP, and -continuing up to two-thirds of its length at the most. The first element of -each pair is set to the offset of the first character in a substring, and the -second is set to the offset of the first character after the end of a -substring. These values are always data unit offsets, even in UTF mode. They -are byte offsets in the 8-bit library, 16-bit data item offsets in the 16-bit -library, and 32-bit data item offsets in the 32-bit library. \fBNote\fP: they -are not character counts. -.P -The first pair of integers, \fIovector[0]\fP and \fIovector[1]\fP, identify the -portion of the subject string matched by the entire pattern. The next pair is -used for the first capturing subpattern, and so on. The value returned by -\fBpcre_exec()\fP is one more than the highest numbered pair that has been set. -For example, if two substrings have been captured, the returned value is 3. If -there are no capturing subpatterns, the return value from a successful match is -1, indicating that just the first pair of offsets has been set. -.P -If a capturing subpattern is matched repeatedly, it is the last portion of the -string that it matched that is returned. -.P -If the vector is too small to hold all the captured substring offsets, it is -used as far as possible (up to two-thirds of its length), and the function -returns a value of zero. 
If neither the actual string matched nor any captured -substrings are of interest, \fBpcre_exec()\fP may be called with \fIovector\fP -passed as NULL and \fIovecsize\fP as zero. However, if the pattern contains -back references and the \fIovector\fP is not big enough to remember the related -substrings, PCRE has to get additional memory for use during matching. Thus it -is usually advisable to supply an \fIovector\fP of reasonable size. -.P -There are some cases where zero is returned (indicating vector overflow) when -in fact the vector is exactly the right size for the final match. For example, -consider the pattern -.sp - (a)(?:(b)c|bd) -.sp -If a vector of 6 elements (allowing for only 1 captured substring) is given -with subject string "abd", \fBpcre_exec()\fP will try to set the second -captured string, thereby recording a vector overflow, before failing to match -"c" and backing up to try the second alternative. The zero return, however, -does correctly indicate that the maximum number of slots (namely 2) have been -filled. In similar cases where there is temporary overflow, but the final -number of used slots is actually less than the maximum, a non-zero value is -returned. -.P -The \fBpcre_fullinfo()\fP function can be used to find out how many capturing -subpatterns there are in a compiled pattern. The smallest size for -\fIovector\fP that will allow for \fIn\fP captured substrings, in addition to -the offsets of the substring matched by the whole pattern, is (\fIn\fP+1)*3. -.P -It is possible for capturing subpattern number \fIn+1\fP to match some part of -the subject when subpattern \fIn\fP has not been used at all. For example, if -the string "abc" is matched against the pattern (a|(z))(bc) the return from the -function is 4, and subpatterns 1 and 3 are matched, but 2 is not. When this -happens, both values in the offset pairs corresponding to unused subpatterns -are set to -1. 
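The sizing rule and the -1 convention for unused subpatterns can be sketched in C without calling the library at all. This is only an illustration under stated assumptions: the helper names ovector_size_for, group_is_set, and group_length are hypothetical and are not part of PCRE; they merely read a vector laid out the way pcre_exec() is described as filling it, with rc being a positive return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest ovector element count that can hold the
   whole-match pair plus n_captures captured substrings, i.e. (n+1)*3. */
int ovector_size_for(int n_captures)
{
    assert(n_captures >= 0);
    return (n_captures + 1) * 3;
}

/* Hypothetical helper: returns 1 if pair number `group` was set by the
   match, 0 if it is unset (-1, -1) or lies beyond the pairs covered by
   the return value rc (rc is one more than the highest pair set). */
int group_is_set(const int *ovector, int rc, int group)
{
    assert(ovector != NULL);
    if (group < 0 || group >= rc)
        return 0;
    return ovector[2 * group] >= 0 && ovector[2 * group + 1] >= 0;
}

/* Hypothetical helper: length of a captured substring in data units
   (not characters), or -1 if the group is unset. */
int group_length(const int *ovector, int rc, int group)
{
    if (!group_is_set(ovector, rc, group))
        return -1;
    return ovector[2 * group + 1] - ovector[2 * group];
}
```

For the (a|(z))(bc) example above, a vector of ovector_size_for(3) elements is enough, and with a return value of 4 the helper group_is_set reports pair 2 as unset while pairs 1 and 3 are set.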
-.P -Offset values that correspond to unused subpatterns at the end of the -expression are also set to -1. For example, if the string "abc" is matched -against the pattern (abc)(x(yz)?)? subpatterns 2 and 3 are not matched. The -return from the function is 2, because the highest used capturing subpattern -number is 1, and the offsets for the second and third capturing subpatterns -(assuming the vector is large enough, of course) are set to -1. -.P -\fBNote\fP: Elements in the first two-thirds of \fIovector\fP that do not -correspond to capturing parentheses in the pattern are never changed. That is, -if a pattern contains \fIn\fP capturing parentheses, no more than -\fIovector[0]\fP to \fIovector[2n+1]\fP are set by \fBpcre_exec()\fP. The other -elements (in the first two-thirds) retain whatever values they previously had. -.P -Some convenience functions are provided for extracting the captured substrings -as separate strings. These are described below. -. -. -.\" HTML -.SS "Error return values from \fBpcre_exec()\fP" -.rs -.sp -If \fBpcre_exec()\fP fails, it returns a negative number. The following are -defined in the header file: -.sp - PCRE_ERROR_NOMATCH (-1) -.sp -The subject string did not match the pattern. -.sp - PCRE_ERROR_NULL (-2) -.sp -Either \fIcode\fP or \fIsubject\fP was passed as NULL, or \fIovector\fP was -NULL and \fIovecsize\fP was not zero. -.sp - PCRE_ERROR_BADOPTION (-3) -.sp -An unrecognized bit was set in the \fIoptions\fP argument. -.sp - PCRE_ERROR_BADMAGIC (-4) -.sp -PCRE stores a 4-byte "magic number" at the start of the compiled code, to catch -the case when it is passed a junk pointer and to detect when a pattern that was -compiled in an environment of one endianness is run in an environment with the -other endianness. This is the error that PCRE gives when the magic number is -not present. -.sp - PCRE_ERROR_UNKNOWN_OPCODE (-5) -.sp -While running the pattern match, an unknown item was encountered in the -compiled pattern.
This error could be caused by a bug in PCRE or by overwriting -of the compiled pattern. -.sp - PCRE_ERROR_NOMEMORY (-6) -.sp -If a pattern contains back references, but the \fIovector\fP that is passed to -\fBpcre_exec()\fP is not big enough to remember the referenced substrings, PCRE -gets a block of memory at the start of matching to use for this purpose. If the -call via \fBpcre_malloc()\fP fails, this error is given. The memory is -automatically freed at the end of matching. -.P -This error is also given if \fBpcre_stack_malloc()\fP fails in -\fBpcre_exec()\fP. This can happen only when PCRE has been compiled with -\fB--disable-stack-for-recursion\fP. -.sp - PCRE_ERROR_NOSUBSTRING (-7) -.sp -This error is used by the \fBpcre_copy_substring()\fP, -\fBpcre_get_substring()\fP, and \fBpcre_get_substring_list()\fP functions (see -below). It is never returned by \fBpcre_exec()\fP. -.sp - PCRE_ERROR_MATCHLIMIT (-8) -.sp -The backtracking limit, as specified by the \fImatch_limit\fP field in a -\fBpcre_extra\fP structure (or defaulted) was reached. See the description -above. -.sp - PCRE_ERROR_CALLOUT (-9) -.sp -This error is never generated by \fBpcre_exec()\fP itself. It is provided for -use by callout functions that want to yield a distinctive error code. See the -.\" HREF -\fBpcrecallout\fP -.\" -documentation for details. -.sp - PCRE_ERROR_BADUTF8 (-10) -.sp -A string that contains an invalid UTF-8 byte sequence was passed as a subject, -and the PCRE_NO_UTF8_CHECK option was not set. If the size of the output vector -(\fIovecsize\fP) is at least 2, the byte offset to the start of the invalid -UTF-8 character is placed in the first element, and a reason code is placed in -the second element. The reason codes are listed in the -.\" HTML -.\" -following section.
-.\" -For backward compatibility, if PCRE_PARTIAL_HARD is set and the problem is a -truncated UTF-8 character at the end of the subject (reason codes 1 to 5), -PCRE_ERROR_SHORTUTF8 is returned instead of PCRE_ERROR_BADUTF8. -.sp - PCRE_ERROR_BADUTF8_OFFSET (-11) -.sp -The UTF-8 byte sequence that was passed as a subject was checked and found to -be valid (the PCRE_NO_UTF8_CHECK option was not set), but the value of -\fIstartoffset\fP did not point to the beginning of a UTF-8 character or the -end of the subject. -.sp - PCRE_ERROR_PARTIAL (-12) -.sp -The subject string did not match, but it did match partially. See the -.\" HREF -\fBpcrepartial\fP -.\" -documentation for details of partial matching. -.sp - PCRE_ERROR_BADPARTIAL (-13) -.sp -This code is no longer in use. It was formerly returned when the PCRE_PARTIAL -option was used with a compiled pattern containing items that were not -supported for partial matching. From release 8.00 onwards, there are no -restrictions on partial matching. -.sp - PCRE_ERROR_INTERNAL (-14) -.sp -An unexpected internal error has occurred. This error could be caused by a bug -in PCRE or by overwriting of the compiled pattern. -.sp - PCRE_ERROR_BADCOUNT (-15) -.sp -This error is given if the value of the \fIovecsize\fP argument is negative. -.sp - PCRE_ERROR_RECURSIONLIMIT (-21) -.sp -The internal recursion limit, as specified by the \fImatch_limit_recursion\fP -field in a \fBpcre_extra\fP structure (or defaulted) was reached. See the -description above. -.sp - PCRE_ERROR_BADNEWLINE (-23) -.sp -An invalid combination of PCRE_NEWLINE_\fIxxx\fP options was given. -.sp - PCRE_ERROR_BADOFFSET (-24) -.sp -The value of \fIstartoffset\fP was negative or greater than the length of the -subject, that is, the value in \fIlength\fP. -.sp - PCRE_ERROR_SHORTUTF8 (-25) -.sp -This error is returned instead of PCRE_ERROR_BADUTF8 when the subject string -ends with a truncated UTF-8 character and the PCRE_PARTIAL_HARD option is set. 
-Information about the failure is returned as for PCRE_ERROR_BADUTF8. It is in -fact sufficient to detect this case, but this special error code for -PCRE_PARTIAL_HARD precedes the implementation of returned information; it is -retained for backwards compatibility. -.sp - PCRE_ERROR_RECURSELOOP (-26) -.sp -This error is returned when \fBpcre_exec()\fP detects a recursion loop within -the pattern. Specifically, it means that either the whole pattern or a -subpattern has been called recursively for the second time at the same position -in the subject string. Some simple patterns that might do this are detected and -faulted at compile time, but more complicated cases, in particular mutual -recursions between two different subpatterns, cannot be detected until run -time. -.sp - PCRE_ERROR_JIT_STACKLIMIT (-27) -.sp -This error is returned when a pattern that was successfully studied using a -JIT compile option is being matched, but the memory available for the -just-in-time processing stack is not large enough. See the -.\" HREF -\fBpcrejit\fP -.\" -documentation for more details. -.sp - PCRE_ERROR_BADMODE (-28) -.sp -This error is given if a pattern that was compiled by the 8-bit library is -passed to a 16-bit or 32-bit library function, or vice versa. -.sp - PCRE_ERROR_BADENDIANNESS (-29) -.sp -This error is given if a pattern that was compiled and saved is reloaded on a -host with different endianness. The utility function -\fBpcre_pattern_to_host_byte_order()\fP can be used to convert such a pattern -so that it runs on the new host. -.sp - PCRE_ERROR_JIT_BADOPTION -.sp -This error is returned when a pattern that was successfully studied using a JIT -compile option is being matched, but the matching mode (partial or complete -match) does not correspond to any JIT compilation mode. When the JIT fast path -function is used, this error may be also given for invalid options. See the -.\" HREF -\fBpcrejit\fP -.\" -documentation for more details. 
-.sp - PCRE_ERROR_BADLENGTH (-32) -.sp -This error is given if \fBpcre_exec()\fP is called with a negative value for -the \fIlength\fP argument. -.P -Error numbers -16 to -20, -22, and -30 are not used by \fBpcre_exec()\fP. -. -. -.\" HTML -.SS "Reason codes for invalid UTF-8 strings" -.rs -.sp -This section applies only to the 8-bit library. The corresponding information -for the 16-bit and 32-bit libraries is given in the -.\" HREF -\fBpcre16\fP -.\" -and -.\" HREF -\fBpcre32\fP -.\" -pages. -.P -When \fBpcre_exec()\fP returns either PCRE_ERROR_BADUTF8 or -PCRE_ERROR_SHORTUTF8, and the size of the output vector (\fIovecsize\fP) is at -least 2, the offset of the start of the invalid UTF-8 character is placed in -the first output vector element (\fIovector[0]\fP) and a reason code is placed -in the second element (\fIovector[1]\fP). The reason codes are given names in -the \fBpcre.h\fP header file: -.sp - PCRE_UTF8_ERR1 - PCRE_UTF8_ERR2 - PCRE_UTF8_ERR3 - PCRE_UTF8_ERR4 - PCRE_UTF8_ERR5 -.sp -The string ends with a truncated UTF-8 character; the code specifies how many -bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be -no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279) -allows for up to 6 bytes, and this is checked first; hence the possibility of -4 or 5 missing bytes. -.sp - PCRE_UTF8_ERR6 - PCRE_UTF8_ERR7 - PCRE_UTF8_ERR8 - PCRE_UTF8_ERR9 - PCRE_UTF8_ERR10 -.sp -The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the -character do not have the binary value 0b10 (that is, either the most -significant bit is 0, or the next bit is 1). -.sp - PCRE_UTF8_ERR11 - PCRE_UTF8_ERR12 -.sp -A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long; -these code points are excluded by RFC 3629. -.sp - PCRE_UTF8_ERR13 -.sp -A 4-byte character has a value greater than 0x10ffff; these code points are -excluded by RFC 3629.
-.sp - PCRE_UTF8_ERR14 -.sp -A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of -code points are reserved by RFC 3629 for use with UTF-16, and so are excluded -from UTF-8. -.sp - PCRE_UTF8_ERR15 - PCRE_UTF8_ERR16 - PCRE_UTF8_ERR17 - PCRE_UTF8_ERR18 - PCRE_UTF8_ERR19 -.sp -A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a -value that can be represented by fewer bytes, which is invalid. For example, -the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just -one byte. -.sp - PCRE_UTF8_ERR20 -.sp -The two most significant bits of the first byte of a character have the binary -value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a -byte can only validly occur as the second or subsequent byte of a multi-byte -character. -.sp - PCRE_UTF8_ERR21 -.sp -The first byte of a character has the value 0xfe or 0xff. These values can -never occur in a valid UTF-8 string. -.sp - PCRE_UTF8_ERR22 -.sp -This error code was formerly used when the presence of a so-called -"non-character" caused an error. Unicode corrigendum #9 makes it clear that -such characters should not cause a string to be rejected, and so this code is -no longer in use and is never returned. -. -. -.SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER" -.rs -.sp -.nf -.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP," -.B " int \fIbuffersize\fP);" -.sp -.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP, -.B " int \fIstringcount\fP, int \fIstringnumber\fP," -.B " const char **\fIstringptr\fP);" -.sp -.B int pcre_get_substring_list(const char *\fIsubject\fP, -.B " int *\fIovector\fP, int \fIstringcount\fP, const char ***\fIlistptr\fP);" -.fi -.PP -Captured substrings can be accessed directly by using the offsets returned by -\fBpcre_exec()\fP in \fIovector\fP. 
For convenience, the functions -\fBpcre_copy_substring()\fP, \fBpcre_get_substring()\fP, and -\fBpcre_get_substring_list()\fP are provided for extracting captured substrings -as new, separate, zero-terminated strings. These functions identify substrings -by number. The next section describes functions for extracting named -substrings. -.P -A substring that contains a binary zero is correctly extracted and has a -further zero added on the end, but the result is not, of course, a C string. -However, you can process such a string by referring to the length that is -returned by \fBpcre_copy_substring()\fP and \fBpcre_get_substring()\fP. -Unfortunately, the interface to \fBpcre_get_substring_list()\fP is not adequate -for handling strings containing binary zeros, because the end of the final -string is not independently indicated. -.P -The first three arguments are the same for all three of these functions: -\fIsubject\fP is the subject string that has just been successfully matched, -\fIovector\fP is a pointer to the vector of integer offsets that was passed to -\fBpcre_exec()\fP, and \fIstringcount\fP is the number of substrings that were -captured by the match, including the substring that matched the entire regular -expression. This is the value returned by \fBpcre_exec()\fP if it is greater -than zero. If \fBpcre_exec()\fP returned zero, indicating that it ran out of -space in \fIovector\fP, the value passed as \fIstringcount\fP should be the -number of elements in the vector divided by three. -.P -The functions \fBpcre_copy_substring()\fP and \fBpcre_get_substring()\fP -extract a single substring, whose number is given as \fIstringnumber\fP. A -value of zero extracts the substring that matched the entire pattern, whereas -higher values extract the captured substrings. 
For \fBpcre_copy_substring()\fP, -the string is placed in \fIbuffer\fP, whose length is given by -\fIbuffersize\fP, while for \fBpcre_get_substring()\fP a new block of memory is -obtained via \fBpcre_malloc\fP, and its address is returned via -\fIstringptr\fP. The yield of the function is the length of the string, not -including the terminating zero, or one of these error codes: -.sp - PCRE_ERROR_NOMEMORY (-6) -.sp -The buffer was too small for \fBpcre_copy_substring()\fP, or the attempt to get -memory failed for \fBpcre_get_substring()\fP. -.sp - PCRE_ERROR_NOSUBSTRING (-7) -.sp -There is no substring whose number is \fIstringnumber\fP. -.P -The \fBpcre_get_substring_list()\fP function extracts all available substrings -and builds a list of pointers to them. All this is done in a single block of -memory that is obtained via \fBpcre_malloc\fP. The address of the memory block -is returned via \fIlistptr\fP, which is also the start of the list of string -pointers. The end of the list is marked by a NULL pointer. The yield of the -function is zero if all went well, or the error code -.sp - PCRE_ERROR_NOMEMORY (-6) -.sp -if the attempt to get the memory block failed. -.P -When any of these functions encounter a substring that is unset, which can -happen when capturing subpattern number \fIn+1\fP matches some part of the -subject, but subpattern \fIn\fP has not been used at all, they return an empty -string. This can be distinguished from a genuine zero-length substring by -inspecting the appropriate offset in \fIovector\fP, which is negative for unset -substrings. -.P -The two convenience functions \fBpcre_free_substring()\fP and -\fBpcre_free_substring_list()\fP can be used to free the memory returned by -a previous call of \fBpcre_get_substring()\fP or -\fBpcre_get_substring_list()\fP, respectively. They do nothing more than call -the function pointed to by \fBpcre_free\fP, which of course could be called -directly from a C program. 
However, PCRE is used in some situations where it is -linked via a special interface to another programming language that cannot use -\fBpcre_free\fP directly; it is for these cases that the functions are -provided. -. -. -.SH "EXTRACTING CAPTURED SUBSTRINGS BY NAME" -.rs -.sp -.nf -.B int pcre_get_stringnumber(const pcre *\fIcode\fP, -.B " const char *\fIname\fP);" -.sp -.B int pcre_copy_named_substring(const pcre *\fIcode\fP, -.B " const char *\fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, const char *\fIstringname\fP," -.B " char *\fIbuffer\fP, int \fIbuffersize\fP);" -.sp -.B int pcre_get_named_substring(const pcre *\fIcode\fP, -.B " const char *\fIsubject\fP, int *\fIovector\fP," -.B " int \fIstringcount\fP, const char *\fIstringname\fP," -.B " const char **\fIstringptr\fP);" -.fi -.PP -To extract a substring by name, you first have to find its associated number. -For example, for this pattern -.sp - (a+)b(?<xxx>\ed+)... -.sp -the number of the subpattern called "xxx" is 2. If the name is known to be -unique (PCRE_DUPNAMES was not set), you can find the number from the name by -calling \fBpcre_get_stringnumber()\fP. The first argument is the compiled -pattern, and the second is the name. The yield of the function is the -subpattern number, or PCRE_ERROR_NOSUBSTRING (-7) if there is no subpattern of -that name. -.P -Given the number, you can extract the substring directly, or use one of the -functions described in the previous section. For convenience, there are also -two functions that do the whole job. -.P -Most of the arguments of \fBpcre_copy_named_substring()\fP and -\fBpcre_get_named_substring()\fP are the same as those for the similarly named -functions that extract by number. As these are described in the previous -section, they are not re-described here. There are just two differences: -.P -First, instead of a substring number, a substring name is given.
Second, there -is an extra argument, given at the start, which is a pointer to the compiled -pattern. This is needed in order to gain access to the name-to-number -translation table. -.P -These functions call \fBpcre_get_stringnumber()\fP, and if it succeeds, they -then call \fBpcre_copy_substring()\fP or \fBpcre_get_substring()\fP, as -appropriate. \fBNOTE:\fP If PCRE_DUPNAMES is set and there are duplicate names, -the behaviour may not be what you want (see the next section). -.P -\fBWarning:\fP If the pattern uses the (?| feature to set up multiple -subpatterns with the same number, as described in the -.\" HTML -.\" -section on duplicate subpattern numbers -.\" -in the -.\" HREF -\fBpcrepattern\fP -.\" -page, you cannot use names to distinguish the different subpatterns, because -names are not included in the compiled code. The matching process uses only -numbers. For this reason, the use of different names for subpatterns of the -same number causes an error at compile time. -. -. -.SH "DUPLICATE SUBPATTERN NAMES" -.rs -.sp -.nf -.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP, -.B " const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);" -.fi -.PP -When a pattern is compiled with the PCRE_DUPNAMES option, names for subpatterns -are not required to be unique. (Duplicate names are always allowed for -subpatterns with the same number, created by using the (?| feature. Indeed, if -such subpatterns are named, they are required to use the same names.) -.P -Normally, patterns with duplicate names are such that in any one match, only -one of the named subpatterns participates. An example is shown in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. -.P -When duplicates are present, \fBpcre_copy_named_substring()\fP and -\fBpcre_get_named_substring()\fP return the first substring corresponding to -the given name that is set. If none are set, PCRE_ERROR_NOSUBSTRING (-7) is -returned; no data is returned. 
The \fBpcre_get_stringnumber()\fP function -returns one of the numbers that are associated with the name, but it is not -defined which it is. -.P -If you want to get full details of all captured substrings for a given name, -you must use the \fBpcre_get_stringtable_entries()\fP function. The first -argument is the compiled pattern, and the second is the name. The third and -fourth are pointers to variables which are updated by the function. After it -has run, they point to the first and last entries in the name-to-number table -for the given name. The function itself returns the length of each entry, or -PCRE_ERROR_NOSUBSTRING (-7) if there are none. The format of the table is -described above in the section entitled \fIInformation about a pattern\fP -.\" HTML -.\" -above. -.\" -Given all the relevant entries for the name, you can extract each of their -numbers, and hence the captured data, if any. -. -. -.SH "FINDING ALL POSSIBLE MATCHES" -.rs -.sp -The traditional matching function uses a similar algorithm to Perl, which stops -when it finds the first match, starting at a given point in the subject. If you -want to find all possible matches, or the longest possible match, consider -using the alternative matching function (see below) instead. If you cannot use -the alternative function, but still need to find all possible matches, you -can kludge it up by making use of the callout facility, which is described in -the -.\" HREF -\fBpcrecallout\fP -.\" -documentation. -.P -What you have to do is to insert a callout right at the end of the pattern. -When your callout function is called, extract and save the current matched -substring. Then return 1, which forces \fBpcre_exec()\fP to backtrack and try -other alternatives. Ultimately, when it runs out of matches, \fBpcre_exec()\fP -will yield PCRE_ERROR_NOMATCH. -. -. 
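Returning to duplicate subpattern names: once pcre_get_stringtable_entries() has supplied the first and last entry pointers and the entry length, the group numbers can be read out with a short loop. The sketch below is not library code. The function name collect_group_numbers is hypothetical, and it assumes the 8-bit library's entry layout as given in the section on information about a pattern: each entry starts with a two-byte group number, most significant byte first, followed by the zero-terminated name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks the name-table entries from `first` to
   `last` inclusive, stepping by entry_size bytes, and stores each
   entry's group number in out[]. Returns how many numbers were stored
   (at most max_out). Assumes the 8-bit entry layout: two bytes of
   group number, high byte first, then the name. */
int collect_group_numbers(const char *first, const char *last,
                          int entry_size, int *out, int max_out)
{
    const unsigned char *p = (const unsigned char *)first;
    const unsigned char *end = (const unsigned char *)last;
    int n = 0;

    assert(first != NULL && last != NULL && out != NULL);
    assert(entry_size > 2);

    while (p <= end && n < max_out) {
        out[n++] = (p[0] << 8) | p[1];
        p += entry_size;
    }
    return n;
}
```

Each number obtained this way can then be checked against the offset vector to see which of the like-named groups, if any, took part in the match.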
-.SH "OBTAINING AN ESTIMATE OF STACK USAGE" -.rs -.sp -Matching certain patterns using \fBpcre_exec()\fP can use a lot of process -stack, which in certain environments can be rather limited in size. Some users -find it helpful to have an estimate of the amount of stack that is used by -\fBpcre_exec()\fP, to help them set recursion limits, as described in the -.\" HREF -\fBpcrestack\fP -.\" -documentation. The estimate that is output by \fBpcretest\fP when called with -the \fB-m\fP and \fB-C\fP options is obtained by calling \fBpcre_exec\fP with -the values NULL, NULL, NULL, -999, and -999 for its first five arguments. -.P -Normally, if its first argument is NULL, \fBpcre_exec()\fP immediately returns -the negative error code PCRE_ERROR_NULL, but with this special combination of -arguments, it returns instead a negative number whose absolute value is the -approximate stack frame size in bytes. (A negative number is used so that it is -clear that no match has happened.) The value is approximate because in some -cases, recursive calls to \fBpcre_exec()\fP occur when there are one or two -additional variables on the stack. -.P -If PCRE has been compiled to use the heap instead of the stack for recursion, -the value returned is the size of each block that is obtained from the heap. -. -. -.\" HTML -.SH "MATCHING A PATTERN: THE ALTERNATIVE FUNCTION" -.rs -.sp -.nf -.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP," -.B " const char *\fIsubject\fP, int \fIlength\fP, int \fIstartoffset\fP," -.B " int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP," -.B " int *\fIworkspace\fP, int \fIwscount\fP);" -.fi -.P -The function \fBpcre_dfa_exec()\fP is called to match a subject string against -a compiled pattern, using a matching algorithm that scans the subject string -just once, and does not backtrack. This has different characteristics to the -normal algorithm, and is not compatible with Perl. 
Some of the features of PCRE -patterns are not supported. Nevertheless, there are times when this kind of -matching can be useful. For a discussion of the two matching algorithms, and a -list of features that \fBpcre_dfa_exec()\fP does not support, see the -.\" HREF -\fBpcrematching\fP -.\" -documentation. -.P -The arguments for the \fBpcre_dfa_exec()\fP function are the same as for -\fBpcre_exec()\fP, plus two extras. The \fIovector\fP argument is used in a -different way, and this is described below. The other common arguments are used -in the same way as for \fBpcre_exec()\fP, so their description is not repeated -here. -.P -The two additional arguments provide workspace for the function. The workspace -vector should contain at least 20 elements. It is used for keeping track of -multiple paths through the pattern tree. More workspace will be needed for -patterns and subjects where there are a lot of potential matches. -.P -Here is an example of a simple call to \fBpcre_dfa_exec()\fP: -.sp - int rc; - int ovector[10]; - int wspace[20]; - rc = pcre_dfa_exec( - re, /* result of pcre_compile() */ - NULL, /* we didn't study the pattern */ - "some string", /* the subject string */ - 11, /* the length of the subject string */ - 0, /* start at offset 0 in the subject */ - 0, /* default options */ - ovector, /* vector of integers for substring information */ - 10, /* number of elements (NOT size in bytes) */ - wspace, /* working space vector */ - 20); /* number of elements (NOT size in bytes) */ -. -.SS "Option bits for \fBpcre_dfa_exec()\fP" -.rs -.sp -The unused bits of the \fIoptions\fP argument for \fBpcre_dfa_exec()\fP must be -zero. The only bits that may be set are PCRE_ANCHORED, PCRE_NEWLINE_\fIxxx\fP, -PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, -PCRE_NO_UTF8_CHECK, PCRE_BSR_ANYCRLF, PCRE_BSR_UNICODE, PCRE_NO_START_OPTIMIZE, -PCRE_PARTIAL_HARD, PCRE_PARTIAL_SOFT, PCRE_DFA_SHORTEST, and PCRE_DFA_RESTART. 
-All but the last four of these are exactly the same as for \fBpcre_exec()\fP, -so their description is not repeated here. -.sp - PCRE_PARTIAL_HARD - PCRE_PARTIAL_SOFT -.sp -These have the same general effect as they do for \fBpcre_exec()\fP, but the -details are slightly different. When PCRE_PARTIAL_HARD is set for -\fBpcre_dfa_exec()\fP, it returns PCRE_ERROR_PARTIAL if the end of the subject -is reached and there is still at least one matching possibility that requires -additional characters. This happens even if some complete matches have also -been found. When PCRE_PARTIAL_SOFT is set, the return code PCRE_ERROR_NOMATCH -is converted into PCRE_ERROR_PARTIAL if the end of the subject is reached, -there have been no complete matches, but there is still at least one matching -possibility. The portion of the string that was inspected when the longest -partial match was found is set as the first matching string in both cases. -There is a more detailed discussion of partial and multi-segment matching, with -examples, in the -.\" HREF -\fBpcrepartial\fP -.\" -documentation. -.sp - PCRE_DFA_SHORTEST -.sp -Setting the PCRE_DFA_SHORTEST option causes the matching algorithm to stop as -soon as it has found one match. Because of the way the alternative algorithm -works, this is necessarily the shortest possible match at the first possible -matching point in the subject string. -.sp - PCRE_DFA_RESTART -.sp -When \fBpcre_dfa_exec()\fP returns a partial match, it is possible to call it -again, with additional subject characters, and have it continue with the same -match. The PCRE_DFA_RESTART option requests this action; when it is set, the -\fIworkspace\fP and \fIwscount\fP options must reference the same vector as -before because data about the match so far is left in them after a partial -match. There is more discussion of this facility in the -.\" HREF -\fBpcrepartial\fP -.\" -documentation. -. -. 
-.SS "Successful returns from \fBpcre_dfa_exec()\fP" -.rs -.sp -When \fBpcre_dfa_exec()\fP succeeds, it may have matched more than one -substring in the subject. Note, however, that all the matches from one run of -the function start at the same point in the subject. The shorter matches are -all initial substrings of the longer matches. For example, if the pattern -.sp - <.*> -.sp -is matched against the string -.sp - This is <something> <something else> <something further> no more -.sp -the three matched strings are -.sp - <something> - <something> <something else> - <something> <something else> <something further> -.sp -On success, the yield of the function is a number greater than zero, which is -the number of matched substrings. The substrings themselves are returned in -\fIovector\fP. Each string uses two elements; the first is the offset to the -start, and the second is the offset to the end. In fact, all the strings have -the same start offset. (Space could have been saved by giving this only once, -but it was decided to retain some compatibility with the way \fBpcre_exec()\fP -returns data, even though the meaning of the strings is different.) -.P -The strings are returned in reverse order of length; that is, the longest -matching string is given first. If there were too many matches to fit into -\fIovector\fP, the yield of the function is zero, and the vector is filled with -the longest matches. Unlike \fBpcre_exec()\fP, \fBpcre_dfa_exec()\fP can use -the entire \fIovector\fP for returning matched strings. -.P -NOTE: PCRE's "auto-possessification" optimization usually applies to character -repeats at the end of a pattern (as well as internally). For example, the -pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point -even considering the possibility of backtracking into the repeated digits. For -DFA matching, this means that only one possible match is found. If you really -do want multiple matches in such cases, either use an ungreedy repeat -("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling. -. -.
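The layout of the results described in this subsection can be read back with a few lines of C. The following is a sketch rather than library code: the dfa_match type and the dfa_collect function are hypothetical names introduced here, and the treatment of a zero return (every pair in the vector holds a match, so ovecsize / 2 pairs are read) is an interpretation of the statement that the whole vector is available for results.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one match reported by the DFA matcher. */
typedef struct {
    int start;   /* offset of the match start, in data units */
    int length;  /* end offset minus start offset */
} dfa_match;

/* Hypothetical helper: copies the matches described by ovector into
   out[], longest first (the order in which they are returned).
   rc > 0  : rc pairs are valid.
   rc == 0 : the vector overflowed; assume all ovecsize / 2 pairs hold
             the longest matches.
   rc < 0  : error or no match; nothing is copied.
   Returns the number of matches written (at most max_out). */
int dfa_collect(const int *ovector, int ovecsize, int rc,
                dfa_match *out, int max_out)
{
    int count, i;

    assert(ovector != NULL && out != NULL);
    if (rc < 0)
        return 0;
    count = (rc == 0) ? ovecsize / 2 : rc;
    if (count > max_out)
        count = max_out;
    for (i = 0; i < count; i++) {
        out[i].start = ovector[2 * i];
        out[i].length = ovector[2 * i + 1] - ovector[2 * i];
    }
    return count;
}
```

Because every pair shares the same start offset, a caller that only needs the longest match can stop after the first pair.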
-.SS "Error returns from \fBpcre_dfa_exec()\fP" -.rs -.sp -The \fBpcre_dfa_exec()\fP function returns a negative number when it fails. -Many of the errors are the same as for \fBpcre_exec()\fP, and these are -described -.\" HTML -.\" -above. -.\" -There are in addition the following errors that are specific to -\fBpcre_dfa_exec()\fP: -.sp - PCRE_ERROR_DFA_UITEM (-16) -.sp -This return is given if \fBpcre_dfa_exec()\fP encounters an item in the pattern -that it does not support, for instance, the use of \eC or a back reference. -.sp - PCRE_ERROR_DFA_UCOND (-17) -.sp -This return is given if \fBpcre_dfa_exec()\fP encounters a condition item that -uses a back reference for the condition, or a test for recursion in a specific -group. These are not supported. -.sp - PCRE_ERROR_DFA_UMLIMIT (-18) -.sp -This return is given if \fBpcre_dfa_exec()\fP is called with an \fIextra\fP -block that contains a setting of the \fImatch_limit\fP or -\fImatch_limit_recursion\fP fields. This is not supported (these fields are -meaningless for DFA matching). -.sp - PCRE_ERROR_DFA_WSSIZE (-19) -.sp -This return is given if \fBpcre_dfa_exec()\fP runs out of space in the -\fIworkspace\fP vector. -.sp - PCRE_ERROR_DFA_RECURSE (-20) -.sp -When a recursive subpattern is processed, the matching function calls itself -recursively, using private vectors for \fIovector\fP and \fIworkspace\fP. This -error is given if the output vector is not large enough. This should be -extremely rare, as a vector of size 1000 is used. -.sp - PCRE_ERROR_DFA_BADRESTART (-30) -.sp -When \fBpcre_dfa_exec()\fP is called with the \fBPCRE_DFA_RESTART\fP option, -some plausibility checks are made on the contents of the workspace, which -should contain data about the previous partial match. If any of these checks -fail, this error is given. -. -. 
-.SH "SEE ALSO" -.rs -.sp -\fBpcre16\fP(3), \fBpcre32\fP(3), \fBpcrebuild\fP(3), \fBpcrecallout\fP(3), -\fBpcrecpp\fP(3), \fBpcrematching\fP(3), \fBpcrepartial\fP(3), -\fBpcreposix\fP(3), \fBpcreprecompile\fP(3), \fBpcresample\fP(3), -\fBpcrestack\fP(3). -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 18 December 2015 -Copyright (c) 1997-2015 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcrebuild.3 b/src/pcre/doc/pcrebuild.3 deleted file mode 100644 index 403f2ae3..00000000 --- a/src/pcre/doc/pcrebuild.3 +++ /dev/null @@ -1,550 +0,0 @@ -.TH PCREBUILD 3 "12 May 2013" "PCRE 8.33" -.SH NAME -PCRE - Perl-compatible regular expressions -. -. -.SH "BUILDING PCRE" -.rs -.sp -PCRE is distributed with a \fBconfigure\fP script that can be used to build the -library in Unix-like environments using the applications known as Autotools. -Also in the distribution are files to support building using \fBCMake\fP -instead of \fBconfigure\fP. The text file -.\" HTML -.\" -\fBREADME\fP -.\" -contains general information about building with Autotools (some of which is -repeated below), and also has some comments about building on various operating -systems. There is a lot more information about building PCRE without using -Autotools (including information about using \fBCMake\fP and building "by -hand") in the text file called -.\" HTML -.\" -\fBNON-AUTOTOOLS-BUILD\fP. -.\" -You should consult this file as well as the -.\" HTML -.\" -\fBREADME\fP -.\" -file if you are building in a non-Unix-like environment. -. -. -.SH "PCRE BUILD-TIME OPTIONS" -.rs -.sp -The rest of this document describes the optional features of PCRE that can be -selected when the library is compiled. It assumes use of the \fBconfigure\fP -script, where the optional features are selected or deselected by providing -options to \fBconfigure\fP before running the \fBmake\fP command.
However, the -same options can be selected in both Unix-like and non-Unix-like environments -using the GUI facility of \fBcmake-gui\fP if you are using \fBCMake\fP instead -of \fBconfigure\fP to build PCRE. -.P -If you are not using Autotools or \fBCMake\fP, option selection can be done by -editing the \fBconfig.h\fP file, or by passing parameter settings to the -compiler, as described in -.\" HTML -.\" -\fBNON-AUTOTOOLS-BUILD\fP. -.\" -.P -The complete list of options for \fBconfigure\fP (which includes the standard -ones such as the selection of the installation directory) can be obtained by -running -.sp - ./configure --help -.sp -The following sections include descriptions of options whose names begin with ---enable or --disable. These settings specify changes to the defaults for the -\fBconfigure\fP command. Because of the way that \fBconfigure\fP works, ---enable and --disable always come in pairs, so the complementary option always -exists as well, but as it specifies the default, it is not described. -. -. -.SH "BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES" -.rs -.sp -By default, a library called \fBlibpcre\fP is built, containing functions that -take string arguments contained in vectors of bytes, either as single-byte -characters, or interpreted as UTF-8 strings. You can also build a separate -library, called \fBlibpcre16\fP, in which strings are contained in vectors of -16-bit data units and interpreted either as single-unit characters or UTF-16 -strings, by adding -.sp - --enable-pcre16 -.sp -to the \fBconfigure\fP command. You can also build yet another separate -library, called \fBlibpcre32\fP, in which strings are contained in vectors of -32-bit data units and interpreted either as single-unit characters or UTF-32 -strings, by adding -.sp - --enable-pcre32 -.sp -to the \fBconfigure\fP command. If you do not want the 8-bit library, add -.sp - --disable-pcre8 -.sp -as well. At least one of the three libraries must be built. 
Note that the C++ -and POSIX wrappers are for the 8-bit library only, and that \fBpcregrep\fP is -an 8-bit program. None of these are built if you select only the 16-bit or -32-bit libraries. -. -. -.SH "BUILDING SHARED AND STATIC LIBRARIES" -.rs -.sp -The Autotools PCRE building process uses \fBlibtool\fP to build both shared and -static libraries by default. You can suppress one of these by adding one of -.sp - --disable-shared - --disable-static -.sp -to the \fBconfigure\fP command, as required. -. -. -.SH "C++ SUPPORT" -.rs -.sp -By default, if the 8-bit library is being built, the \fBconfigure\fP script -will search for a C++ compiler and C++ header files. If it finds them, it -automatically builds the C++ wrapper library (which supports only 8-bit -strings). You can disable this by adding -.sp - --disable-cpp -.sp -to the \fBconfigure\fP command. -. -. -.SH "UTF-8, UTF-16 AND UTF-32 SUPPORT" -.rs -.sp -To build PCRE with support for UTF Unicode character strings, add -.sp - --enable-utf -.sp -to the \fBconfigure\fP command. This setting applies to all three libraries, -adding support for UTF-8 to the 8-bit library, support for UTF-16 to the 16-bit -library, and support for UTF-32 to the 32-bit library. There are no -separate options for enabling UTF-8, UTF-16 and UTF-32 independently because -that would allow ridiculous settings such as requesting UTF-16 support while -building only the 8-bit library. It is not possible to build one library with -UTF support and another without in the same configuration. (For backwards -compatibility, --enable-utf8 is a synonym of --enable-utf.) -.P -Of itself, this setting does not make PCRE treat strings as UTF-8, UTF-16 or -UTF-32. As well as compiling PCRE with this option, you also have to set -the PCRE_UTF8, PCRE_UTF16 or PCRE_UTF32 option (as appropriate) when you call -one of the pattern compiling functions.
-.P -If you set --enable-utf when compiling in an EBCDIC environment, PCRE expects -its input to be either ASCII or UTF-8 (depending on the run-time option). It is -not possible to support both EBCDIC and UTF-8 codes in the same version of the -library. Consequently, --enable-utf and --enable-ebcdic are mutually -exclusive. -. -. -.SH "UNICODE CHARACTER PROPERTY SUPPORT" -.rs -.sp -UTF support allows the libraries to process character codepoints up to 0x10ffff -in the strings that they handle. On its own, however, it does not provide any -facilities for accessing the properties of such characters. If you want to be -able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode -character properties, you must add -.sp - --enable-unicode-properties -.sp -to the \fBconfigure\fP command. This implies UTF support, even if you have -not explicitly requested it. -.P -Including Unicode property support adds around 30K of tables to the PCRE -library. Only the general category properties such as \fILu\fP and \fINd\fP are -supported. Details are given in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. -. -. -.SH "JUST-IN-TIME COMPILER SUPPORT" -.rs -.sp -Just-in-time compiler support is included in the build by specifying -.sp - --enable-jit -.sp -This support is available only for certain hardware architectures. If this -option is set for an unsupported architecture, a compile time error occurs. -See the -.\" HREF -\fBpcrejit\fP -.\" -documentation for a discussion of JIT usage. When JIT support is enabled, -pcregrep automatically makes use of it, unless you add -.sp - --disable-pcregrep-jit -.sp -to the "configure" command. -. -. -.SH "CODE VALUE OF NEWLINE" -.rs -.sp -By default, PCRE interprets the linefeed (LF) character as indicating the end -of a line. This is the normal newline character on Unix-like systems. You can -compile PCRE to use carriage return (CR) instead, by adding -.sp - --enable-newline-is-cr -.sp -to the \fBconfigure\fP command. 
There is also a --enable-newline-is-lf option, -which explicitly specifies linefeed as the newline character. -.sp -Alternatively, you can specify that line endings are to be indicated by the two -character sequence CRLF. If you want this, add -.sp - --enable-newline-is-crlf -.sp -to the \fBconfigure\fP command. There is a fourth option, specified by -.sp - --enable-newline-is-anycrlf -.sp -which causes PCRE to recognize any of the three sequences CR, LF, or CRLF as -indicating a line ending. Finally, a fifth option, specified by -.sp - --enable-newline-is-any -.sp -causes PCRE to recognize any Unicode newline sequence. -.P -Whatever line ending convention is selected when PCRE is built can be -overridden when the library functions are called. At build time it is -conventional to use the standard for your operating system. -. -. -.SH "WHAT \eR MATCHES" -.rs -.sp -By default, the sequence \eR in a pattern matches any Unicode newline sequence, -whatever has been selected as the line ending sequence. If you specify -.sp - --enable-bsr-anycrlf -.sp -the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is -selected when PCRE is built can be overridden when the library functions are -called. -. -. -.SH "POSIX MALLOC USAGE" -.rs -.sp -When the 8-bit library is called through the POSIX interface (see the -.\" HREF -\fBpcreposix\fP -.\" -documentation), additional working storage is required for holding the pointers -to capturing substrings, because PCRE requires three integers per substring, -whereas the POSIX interface provides only two. If the number of expected -substrings is small, the wrapper function uses space on the stack, because this -is faster than using \fBmalloc()\fP for each call. The default threshold above -which the stack is no longer used is 10; it can be changed by adding a setting -such as -.sp - --with-posix-malloc-threshold=20 -.sp -to the \fBconfigure\fP command. -. -. 
-.SH "HANDLING VERY LARGE PATTERNS" -.rs -.sp -Within a compiled pattern, offset values are used to point from one part to -another (for example, from an opening parenthesis to an alternation -metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values -are used for these offsets, leading to a maximum size for a compiled pattern of -around 64K. This is sufficient to handle all but the most gigantic patterns. -Nevertheless, some people do want to process truly enormous patterns, so it is -possible to compile PCRE to use three-byte or four-byte offsets by adding a -setting such as -.sp - --with-link-size=3 -.sp -to the \fBconfigure\fP command. The value given must be 2, 3, or 4. For the -16-bit library, a value of 3 is rounded up to 4. In these libraries, using -longer offsets slows down the operation of PCRE because it has to load -additional data when handling them. For the 32-bit library the value is always -4 and cannot be overridden; the value of --with-link-size is ignored. -. -. -.SH "AVOIDING EXCESSIVE STACK USAGE" -.rs -.sp -When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking -by making recursive calls to an internal function called \fBmatch()\fP. In -environments where the size of the stack is limited, this can severely limit -PCRE's operation. (The Unix environment does not usually suffer from this -problem, but it may sometimes be necessary to increase the maximum stack size. -There is a discussion in the -.\" HREF -\fBpcrestack\fP -.\" -documentation.) An alternative approach to recursion that uses memory from the -heap to remember data, instead of using recursive function calls, has been -implemented to work round the problem of limited stack size. If you want to -build a version of PCRE that works this way, add -.sp - --disable-stack-for-recursion -.sp -to the \fBconfigure\fP command. 
With this configuration, PCRE will use the -\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory -management functions. By default these point to \fBmalloc()\fP and -\fBfree()\fP, but you can replace the pointers so that your own functions are -used instead. -.P -Separate functions are provided rather than using \fBpcre_malloc\fP and -\fBpcre_free\fP because the usage is very predictable: the block sizes -requested are always the same, and the blocks are always freed in reverse -order. A calling program might be able to implement optimized functions that -perform better than \fBmalloc()\fP and \fBfree()\fP. PCRE runs noticeably more -slowly when built in this way. This option affects only the \fBpcre_exec()\fP -function; it is not relevant for \fBpcre_dfa_exec()\fP. -. -. -.SH "LIMITING PCRE RESOURCE USAGE" -.rs -.sp -Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly -(sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP -function. By controlling the maximum number of times this function may be -called during a single matching operation, a limit can be placed on the -resources used by a single call to \fBpcre_exec()\fP. The limit can be changed -at run time, as described in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. The default is 10 million, but this can be changed by adding a -setting such as -.sp - --with-match-limit=500000 -.sp -to the \fBconfigure\fP command. This setting has no effect on the -\fBpcre_dfa_exec()\fP matching function. -.P -In some environments it is desirable to limit the depth of recursive calls of -\fBmatch()\fP more strictly than the total number of calls, in order to -restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion -is specified) that is used. A second limit controls this; it defaults to the -value that is set for --with-match-limit, which imposes no additional -constraints. 
However, you can set a lower limit by adding, for example, -.sp - --with-match-limit-recursion=10000 -.sp -to the \fBconfigure\fP command. This value can also be overridden at run time. -. -. -.SH "CREATING CHARACTER TABLES AT BUILD TIME" -.rs -.sp -PCRE uses fixed tables for processing characters whose code values are less -than 256. By default, PCRE is built with a set of tables that are distributed -in the file \fIpcre_chartables.c.dist\fP. These tables are for ASCII codes -only. If you add -.sp - --enable-rebuild-chartables -.sp -to the \fBconfigure\fP command, the distributed tables are no longer used. -Instead, a program called \fBdftables\fP is compiled and run. This outputs the -source for new set of tables, created in the default locale of your C run-time -system. (This method of replacing the tables does not work if you are cross -compiling, because \fBdftables\fP is run on the local host. If you need to -create alternative tables when cross compiling, you will have to do so "by -hand".) -. -. -.SH "USING EBCDIC CODE" -.rs -.sp -PCRE assumes by default that it will run in an environment where the character -code is ASCII (or Unicode, which is a superset of ASCII). This is the case for -most computer operating systems. PCRE can, however, be compiled to run in an -EBCDIC environment by adding -.sp - --enable-ebcdic -.sp -to the \fBconfigure\fP command. This setting implies ---enable-rebuild-chartables. You should only use it if you know that you are in -an EBCDIC environment (for example, an IBM mainframe operating system). The ---enable-ebcdic option is incompatible with --enable-utf. -.P -The EBCDIC character that corresponds to an ASCII LF is assumed to have the -value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In -such an environment you should use -.sp - --enable-ebcdic-nl25 -.sp -as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the -same value as in ASCII, namely, 0x0d. 
Whichever of 0x15 and 0x25 is \fInot\fP -chosen as LF is made to correspond to the Unicode NEL character (which, in -Unicode, is 0x85). -.P -The options that select newline behaviour, such as --enable-newline-is-cr, -and equivalent run-time options, refer to these character values in an EBCDIC -environment. -. -. -.SH "PCREGREP OPTIONS FOR COMPRESSED FILE SUPPORT" -.rs -.sp -By default, \fBpcregrep\fP reads all files as plain text. You can build it so -that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads -them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of -.sp - --enable-pcregrep-libz - --enable-pcregrep-libbz2 -.sp -to the \fBconfigure\fP command. These options naturally require that the -relevant libraries are installed on your system. Configuration will fail if -they are not. -. -. -.SH "PCREGREP BUFFER SIZE" -.rs -.sp -\fBpcregrep\fP uses an internal buffer to hold a "window" on the file it is -scanning, in order to be able to output "before" and "after" lines when it -finds a match. The size of the buffer is controlled by a parameter whose -default value is 20K. The buffer itself is three times this size, but because -of the way it is used for holding "before" lines, the longest line that is -guaranteed to be processable is the parameter size. You can change the default -parameter value by adding, for example, -.sp - --with-pcregrep-bufsize=50K -.sp -to the \fBconfigure\fP command. The caller of \fBpcregrep\fP can, however, -override this value by specifying a run-time option. -. -. -.SH "PCRETEST OPTION FOR LIBREADLINE SUPPORT" -.rs -.sp -If you add -.sp - --enable-pcretest-libreadline -.sp -to the \fBconfigure\fP command, \fBpcretest\fP is linked with the -\fBlibreadline\fP library, and when its input is from a terminal, it reads it -using the \fBreadline()\fP function. This provides line-editing and history -facilities.
Note that \fBlibreadline\fP is GPL-licensed, so if you distribute a -binary of \fBpcretest\fP linked in this way, there may be licensing issues. -.P -Setting this option causes the \fB-lreadline\fP option to be added to the -\fBpcretest\fP build. In many operating environments with a system-installed -\fBlibreadline\fP this is sufficient. However, in some environments (e.g. -if an unmodified distribution version of readline is in use), some extra -configuration may be necessary. The INSTALL file for \fBlibreadline\fP says -this: -.sp - "Readline uses the termcap functions, but does not link with the - termcap or curses library itself, allowing applications which link - with readline the to choose an appropriate library." -.sp -If your environment has not been set up so that an appropriate library is -automatically included, you may need to add something like -.sp - LIBS="-lncurses" -.sp -immediately before the \fBconfigure\fP command. -. -. -.SH "DEBUGGING WITH VALGRIND SUPPORT" -.rs -.sp -By adding the -.sp - --enable-valgrind -.sp -option to the \fBconfigure\fP command, PCRE will use valgrind annotations -to mark certain memory regions as unaddressable. This allows it to detect -invalid memory accesses, and is mostly useful for debugging PCRE itself. -. -. -.SH "CODE COVERAGE REPORTING" -.rs -.sp -If your C compiler is gcc, you can build a version of PCRE that can generate a -code coverage report for its test suite. To enable this, you must install -\fBlcov\fP version 1.6 or above. Then specify -.sp - --enable-coverage -.sp -to the \fBconfigure\fP command and build PCRE in the usual way. -.P -Note that using \fBccache\fP (a caching C compiler) is incompatible with code -coverage reporting. If you have configured \fBccache\fP to run automatically -on your system, you must set the environment variable -.sp - CCACHE_DISABLE=1 -.sp -before running \fBmake\fP to build PCRE, so that \fBccache\fP is not used.
-.P -When --enable-coverage is used, the following additional targets are added to the -\fIMakefile\fP: -.sp - make coverage -.sp -This creates a fresh coverage report for the PCRE test suite. It is equivalent -to running "make coverage-reset", "make coverage-baseline", "make check", and -then "make coverage-report". -.sp - make coverage-reset -.sp -This zeroes the coverage counters, but does nothing else. -.sp - make coverage-baseline -.sp -This captures baseline coverage information. -.sp - make coverage-report -.sp -This creates the coverage report. -.sp - make coverage-clean-report -.sp -This removes the generated coverage report without cleaning the coverage data -itself. -.sp - make coverage-clean-data -.sp -This removes the captured coverage data without removing the coverage files -created at compile time (*.gcno). -.sp - make coverage-clean -.sp -This cleans all coverage data including the generated coverage report. For more -information about code coverage, see the \fBgcov\fP and \fBlcov\fP -documentation. -. -. -.SH "SEE ALSO" -.rs -.sp -\fBpcreapi\fP(3), \fBpcre16\fP, \fBpcre32\fP, \fBpcre_config\fP(3). -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 12 May 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcrecallout.3 b/src/pcre/doc/pcrecallout.3 deleted file mode 100644 index 8ebc9959..00000000 --- a/src/pcre/doc/pcrecallout.3 +++ /dev/null @@ -1,255 +0,0 @@ -.TH PCRECALLOUT 3 "12 November 2013" "PCRE 8.34" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH SYNOPSIS -.rs -.sp -.B #include <pcre.h> -.PP -.SM -.B int (*pcre_callout)(pcre_callout_block *); -.PP -.B int (*pcre16_callout)(pcre16_callout_block *); -.PP -.B int (*pcre32_callout)(pcre32_callout_block *); -.
-.SH DESCRIPTION -.rs -.sp -PCRE provides a feature called "callout", which is a means of temporarily -passing control to the caller of PCRE in the middle of pattern matching. The -caller of PCRE provides an external function by putting its entry point in the -global variable \fIpcre_callout\fP (\fIpcre16_callout\fP for the 16-bit -library, \fIpcre32_callout\fP for the 32-bit library). By default, this -variable contains NULL, which disables all calling out. -.P -Within a regular expression, (?C) indicates the points at which the external -function is to be called. Different callout points can be identified by putting -a number less than 256 after the letter C. The default value is zero. -For example, this pattern has two callout points: -.sp - (?C1)abc(?C2)def -.sp -If the PCRE_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE -automatically inserts callouts, all with number 255, before each item in the -pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern -.sp - A(\ed{2}|--) -.sp -it is processed as if it were -.sp -(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255) -.sp -Notice that there is a callout before and after each parenthesis and -alternation bar. If the pattern contains a conditional group whose condition is -an assertion, an automatic callout is inserted immediately before the -condition. Such a callout may also be inserted explicitly, for example: -.sp - (?(?C9)(?=a)ab|de) -.sp -This applies only to assertion conditions (because they are themselves -independent groups). -.P -Automatic callouts can be used for tracking the progress of pattern matching. -The -.\" HREF -\fBpcretest\fP -.\" -program has a pattern qualifier (/C) that sets automatic callouts; when it is -used, the output indicates how the pattern is being matched. This is useful -information when you are trying to optimize the performance of a particular -pattern. -. -. 
-.SH "MISSING CALLOUTS" -.rs -.sp -You should be aware that, because of optimizations in the way PCRE compiles and -matches patterns, callouts sometimes do not happen exactly as you might expect. -.P -At compile time, PCRE "auto-possessifies" repeated items when it knows that -what follows cannot be part of the repeat. For example, a+[bc] is compiled as -if it were a++[bc]. The \fBpcretest\fP output when this pattern is anchored and -then applied with automatic callouts to the string "aaaa" is: -.sp - --->aaaa - +0 ^ ^ - +1 ^ a+ - +3 ^ ^ [bc] - No match -.sp -This indicates that when matching [bc] fails, there is no backtracking into a+ -and therefore the callouts that would be taken for the backtracks do not occur. -You can disable the auto-possessify feature by passing PCRE_NO_AUTO_POSSESS -to \fBpcre_compile()\fP, or starting the pattern with (*NO_AUTO_POSSESS). If -this is done in \fBpcretest\fP (using the /O qualifier), the output changes to -this: -.sp - --->aaaa - +0 ^ ^ - +1 ^ a+ - +3 ^ ^ [bc] - +3 ^ ^ [bc] - +3 ^ ^ [bc] - +3 ^^ [bc] - No match -.sp -This time, when matching [bc] fails, the matcher backtracks into a+ and tries -again, repeatedly, until a+ itself fails. -.P -Other optimizations that provide fast "no match" results also affect callouts. -For example, if the pattern is -.sp - ab(?C4)cd -.sp -PCRE knows that any matching string must contain the letter "d". If the subject -string is "abyz", the lack of "d" means that matching doesn't ever start, and -the callout is never reached. However, with "abyd", though the result is still -no match, the callout is obeyed. -.P -If the pattern is studied, PCRE knows the minimum length of a matching string, -and will immediately give a "no match" return without actually running a match -if the subject is not long enough, or, for unanchored patterns, if it has -been scanned far enough. 
-.P -You can disable these optimizations by passing the PCRE_NO_START_OPTIMIZE -option to the matching function, or by starting the pattern with -(*NO_START_OPT). This slows down the matching process, but does ensure that -callouts such as the example above are obeyed. -. -. -.SH "THE CALLOUT INTERFACE" -.rs -.sp -During matching, when PCRE reaches a callout point, the external function -defined by \fIpcre_callout\fP or \fIpcre[16|32]_callout\fP is called (if it is -set). This applies to both normal and DFA matching. The only argument to the -callout function is a pointer to a \fBpcre_callout\fP or -\fBpcre[16|32]_callout\fP block. These structures contain the following -fields: -.sp - int \fIversion\fP; - int \fIcallout_number\fP; - int *\fIoffset_vector\fP; - const char *\fIsubject\fP; (8-bit version) - PCRE_SPTR16 \fIsubject\fP; (16-bit version) - PCRE_SPTR32 \fIsubject\fP; (32-bit version) - int \fIsubject_length\fP; - int \fIstart_match\fP; - int \fIcurrent_position\fP; - int \fIcapture_top\fP; - int \fIcapture_last\fP; - void *\fIcallout_data\fP; - int \fIpattern_position\fP; - int \fInext_item_length\fP; - const unsigned char *\fImark\fP; (8-bit version) - const PCRE_UCHAR16 *\fImark\fP; (16-bit version) - const PCRE_UCHAR32 *\fImark\fP; (32-bit version) -.sp -The \fIversion\fP field is an integer containing the version number of the -block format. The initial version was 0; the current version is 2. The version -number will change again in future if additional fields are added, but the -intention is never to remove any of the existing fields. -.P -The \fIcallout_number\fP field contains the number of the callout, as compiled -into the pattern (that is, the number after ?C for manual callouts, and 255 for -automatically generated callouts). -.P -The \fIoffset_vector\fP field is a pointer to the vector of offsets that was -passed by the caller to the matching function.
When \fBpcre_exec()\fP or -\fBpcre[16|32]_exec()\fP is used, the contents can be inspected, in order to -extract substrings that have been matched so far, in the same way as for -extracting substrings after a match has completed. For the DFA matching -functions, this field is not useful. -.P -The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values -that were passed to the matching function. -.P -The \fIstart_match\fP field normally contains the offset within the subject at -which the current match attempt started. However, if the escape sequence \eK -has been encountered, this value is changed to reflect the modified starting -point. If the pattern is not anchored, the callout function may be called -several times from the same point in the pattern for different starting points -in the subject. -.P -The \fIcurrent_position\fP field contains the offset within the subject of the -current match pointer. -.P -When the \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP is used, the -\fIcapture_top\fP field contains one more than the number of the highest -numbered captured substring so far. If no substrings have been captured, the -value of \fIcapture_top\fP is one. This is always the case when the DFA -functions are used, because they do not support captured substrings. -.P -The \fIcapture_last\fP field contains the number of the most recently captured -substring. However, when a recursion exits, the value reverts to what it was -outside the recursion, as do the values of all captured substrings. If no -substrings have been captured, the value of \fIcapture_last\fP is -1. This is -always the case for the DFA matching functions. -.P -The \fIcallout_data\fP field contains a value that is passed to a matching -function specifically so that it can be passed back in callouts. It is passed -in the \fIcallout_data\fP field of a \fBpcre_extra\fP or \fBpcre[16|32]_extra\fP -data structure. 
If no such data was passed, the value of \fIcallout_data\fP in -a callout block is NULL. There is a description of the \fBpcre_extra\fP -structure in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -.P -The \fIpattern_position\fP field is present from version 1 of the callout -structure. It contains the offset to the next item to be matched in the pattern -string. -.P -The \fInext_item_length\fP field is present from version 1 of the callout -structure. It contains the length of the next item to be matched in the pattern -string. When the callout immediately precedes an alternation bar, a closing -parenthesis, or the end of the pattern, the length is zero. When the callout -precedes an opening parenthesis, the length is that of the entire subpattern. -.P -The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to -help in distinguishing between different automatic callouts, which all have the -same callout number. However, they are set for all callouts. -.P -The \fImark\fP field is present from version 2 of the callout structure. In -callouts from \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP it contains a -pointer to the zero-terminated name of the most recently passed (*MARK), -(*PRUNE), or (*THEN) item in the match, or NULL if no such items have been -passed. Instances of (*PRUNE) or (*THEN) without a name do not obliterate a -previous (*MARK). In callouts from the DFA matching functions this field always -contains NULL. -. -. -.SH "RETURN VALUES" -.rs -.sp -The external callout function returns an integer to PCRE. If the value is zero, -matching proceeds as normal. If the value is greater than zero, matching fails -at the current point, but the testing of other matching possibilities goes -ahead, just as if a lookahead assertion had failed. If the value is less than -zero, the match is abandoned, and the matching function returns the negative value. -.P -Negative values should normally be chosen from the set of PCRE_ERROR_xxx -values.
In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure. -The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions; -it will never be used by PCRE itself. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 12 November 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcrecompat.3 b/src/pcre/doc/pcrecompat.3 deleted file mode 100644 index 6156e776..00000000 --- a/src/pcre/doc/pcrecompat.3 +++ /dev/null @@ -1,200 +0,0 @@ -.TH PCRECOMPAT 3 "10 November 2013" "PCRE 8.34" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "DIFFERENCES BETWEEN PCRE AND PERL" -.rs -.sp -This document describes the differences in the ways that PCRE and Perl handle -regular expressions. The differences described here are with respect to Perl -versions 5.10 and above. -.P -1. PCRE has only a subset of Perl's Unicode support. Details of what it does -have are given in the -.\" HREF -\fBpcreunicode\fP -.\" -page. -.P -2. PCRE allows repeat quantifiers only on parenthesized assertions, but they do -not mean what you might think. For example, (?!a){3} does not assert that the -next three characters are not "a". It just asserts that the next character is -not "a" three times (in principle: PCRE optimizes this to run the assertion -just once). Perl allows repeat quantifiers on other assertions such as \eb, but -these do not seem to have any use. -.P -3. Capturing subpatterns that occur inside negative lookahead assertions are -counted, but their entries in the offsets vector are never set. Perl sometimes -(but not always) sets its numerical variables from inside negative assertions. -.P -4. Though binary zero characters are supported in the subject string, they are -not allowed in a pattern string because it is passed as a normal C string, -terminated by zero. 
The escape sequence \e0 can be used in the pattern to -represent a binary zero. -.P -5. The following Perl escape sequences are not supported: \el, \eu, \eL, -\eU, and \eN when followed by a character name or Unicode value. (\eN on its -own, matching a non-newline character, is supported.) In fact these are -implemented by Perl's general string-handling and are not part of its pattern -matching engine. If any of these are encountered by PCRE, an error is -generated by default. However, if the PCRE_JAVASCRIPT_COMPAT option is set, -\eU and \eu are interpreted as JavaScript interprets them. -.P -6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE is -built with Unicode character property support. The properties that can be -tested with \ep and \eP are limited to the general category properties such as -Lu and Nd, script names such as Greek or Han, and the derived properties Any -and L&. PCRE does support the Cs (surrogate) property, which Perl does not; the -Perl documentation says "Because Perl hides the need for the user to understand -the internal representation of Unicode characters, there is no need to -implement the somewhat messy concept of surrogates." -.P -7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in -between are treated as literals. This is slightly different from Perl in that $ -and @ are also handled as literals inside the quotes. In Perl, they cause -variable interpolation (but of course PCRE does not have variables). Note the -following examples: -.sp - Pattern PCRE matches Perl matches -.sp -.\" JOIN - \eQabc$xyz\eE abc$xyz abc followed by the - contents of $xyz - \eQabc\e$xyz\eE abc\e$xyz abc\e$xyz - \eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz -.sp -The \eQ...\eE sequence is recognized both inside and outside character classes. -.P -8. Fairly obviously, PCRE does not support the (?{code}) and (??{code}) -constructions. However, there is support for recursive patterns. 
This is not -available in Perl 5.8, but it is in Perl 5.10. Also, the PCRE "callout" -feature allows an external function to be called during pattern matching. See -the -.\" HREF -\fBpcrecallout\fP -.\" -documentation for details. -.P -9. Subpatterns that are called as subroutines (whether or not recursively) are -always treated as atomic groups in PCRE. This is like Python, but unlike Perl. -Captured values that are set outside a subroutine call can be reference from -inside in PCRE, but not in Perl. There is a discussion that explains these -differences in more detail in the -.\" HTML -.\" -section on recursion differences from Perl -.\" -in the -.\" HREF -\fBpcrepattern\fP -.\" -page. -.P -10. If any of the backtracking control verbs are used in a subpattern that is -called as a subroutine (whether or not recursively), their effect is confined -to that subpattern; it does not extend to the surrounding pattern. This is not -always the case in Perl. In particular, if (*THEN) is present in a group that -is called as a subroutine, its action is limited to that group, even if the -group does not contain any | characters. Note that such subpatterns are -processed as anchored at the point where they are tested. -.P -11. If a pattern contains more than one backtracking control verb, the first -one that is backtracked onto acts. For example, in the pattern -A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C -triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the -same as PCRE, but there are examples where it differs. -.P -12. Most backtracking verbs in assertions have their normal actions. They are -not confined to the assertion. -.P -13. There are some differences that are concerned with the settings of captured -strings when part of a pattern is repeated. For example, matching "aba" against -the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b". -.P -14. 
PCRE's handling of duplicate subpattern numbers and duplicate subpattern -names is not as general as Perl's. This is a consequence of the fact the PCRE -works internally just with numbers, using an external table to translate -between numbers and names. In particular, a pattern such as (?|(?A)|(?B), -where the two capturing parentheses have the same number but different names, -is not supported, and causes an error at compile time. If it were allowed, it -would not be possible to distinguish which parentheses matched, because both -names map to capturing subpattern number 1. To avoid this confusing situation, -an error is given at compile time. -.P -15. Perl recognizes comments in some places that PCRE does not, for example, -between the ( and ? at the start of a subpattern. If the /x modifier is set, -Perl allows white space between ( and ? (though current Perls warn that this is -deprecated) but PCRE never does, even if the PCRE_EXTENDED option is set. -.P -16. Perl, when in warning mode, gives warnings for character classes such as -[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE has no -warning features, so it gives an error in these cases because they are almost -certainly user mistakes. -.P -17. In PCRE, the upper/lower case character properties Lu and Ll are not -affected when case-independent matching is specified. For example, \ep{Lu} -always matches an upper case letter. I think Perl has changed in this respect; -in the release at the time of writing (5.16), \ep{Lu} and \ep{Ll} match all -letters, regardless of case, when case independence is specified. -.P -18. PCRE provides some extensions to the Perl regular expression facilities. -Perl 5.10 includes new features that are not in earlier versions of Perl, some -of which (such as named parentheses) have been in PCRE for some time. 
This list -is with respect to Perl 5.10: -.sp -(a) Although lookbehind assertions in PCRE must match fixed length strings, -each alternative branch of a lookbehind assertion can match a different length -of string. Perl requires them all to have the same length. -.sp -(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $ -meta-character matches only at the very end of the string. -.sp -(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special -meaning is faulted. Otherwise, like Perl, the backslash is quietly ignored. -(Perl can be made to issue a warning.) -.sp -(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is -inverted, that is, by default they are not greedy, but if followed by a -question mark they are. -.sp -(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried -only at the first matching position in the subject string. -.sp -(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, and -PCRE_NO_AUTO_CAPTURE options for \fBpcre_exec()\fP have no Perl equivalents. -.sp -(g) The \eR escape sequence can be restricted to match only CR, LF, or CRLF -by the PCRE_BSR_ANYCRLF option. -.sp -(h) The callout facility is PCRE-specific. -.sp -(i) The partial matching facility is PCRE-specific. -.sp -(j) Patterns compiled by PCRE can be saved and re-used at a later time, even on -different hosts that have the other endianness. However, this does not apply to -optimized data created by the just-in-time compiler. -.sp -(k) The alternative matching functions (\fBpcre_dfa_exec()\fP, -\fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP,) match in a different way -and are not Perl-compatible. -.sp -(l) PCRE recognizes some special sequences such as (*CR) at the start of -a pattern that set overall options that cannot be changed within the pattern. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. 
-.SH REVISION -.rs -.sp -.nf -Last updated: 10 November 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcrecpp.3 b/src/pcre/doc/pcrecpp.3 deleted file mode 100644 index fbddd86a..00000000 --- a/src/pcre/doc/pcrecpp.3 +++ /dev/null @@ -1,348 +0,0 @@ -.TH PCRECPP 3 "08 January 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions. -.SH "SYNOPSIS OF C++ WRAPPER" -.rs -.sp -.B #include <pcrecpp.h> -. -.SH DESCRIPTION -.rs -.sp -The C++ wrapper for PCRE was provided by Google Inc. Some additional -functionality was added by Giuseppe Maxia. This brief man page was constructed -from the notes in the \fIpcrecpp.h\fP file, which should be consulted for -further details. Note that the C++ wrapper supports only the original 8-bit -PCRE library. There is no 16-bit or 32-bit support at present. -. -. -.SH "MATCHING INTERFACE" -.rs -.sp -The "FullMatch" operation checks that supplied text matches a supplied pattern -exactly. If pointer arguments are supplied, it copies matched sub-strings that -match sub-patterns into them. -.sp - Example: successful match - pcrecpp::RE re("h.*o"); - re.FullMatch("hello"); -.sp - Example: unsuccessful match (requires full match): - pcrecpp::RE re("e"); - !re.FullMatch("hello"); -.sp - Example: creating a temporary RE object: - pcrecpp::RE("h.*o").FullMatch("hello"); -.sp -You can pass in a "const char*" or a "string" for "text". The examples below -tend to use a const char*. You can, as in the different examples above, store -the RE object explicitly in a variable or use a temporary RE object. The -examples below use one mode or the other arbitrarily. Either could correctly be -used for any of these examples. -.P -You must supply extra pointer arguments to extract matched subpieces.
-.sp - Example: extracts "ruby" into "s" and 1234 into "i" - int i; - string s; - pcrecpp::RE re("(\e\ew+):(\e\ed+)"); - re.FullMatch("ruby:1234", &s, &i); -.sp - Example: does not try to extract any extra sub-patterns - re.FullMatch("ruby:1234", &s); -.sp - Example: does not try to extract into NULL - re.FullMatch("ruby:1234", NULL, &i); -.sp - Example: integer overflow causes failure - !re.FullMatch("ruby:1234567891234", NULL, &i); -.sp - Example: fails because there aren't enough sub-patterns: - !pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s); -.sp - Example: fails because string cannot be stored in integer - !pcrecpp::RE("(.*)").FullMatch("ruby", &i); -.sp -The provided pointer arguments can be pointers to any scalar numeric -type, or one of: -.sp - string (matched piece is copied to string) - StringPiece (StringPiece is mutated to point to matched piece) - T (where "bool T::ParseFrom(const char*, int)" exists) - NULL (the corresponding matched sub-pattern is not copied) -.sp -The function returns true iff all of the following conditions are satisfied: -.sp - a. "text" matches "pattern" exactly; -.sp - b. The number of matched sub-patterns is >= number of supplied - pointers; -.sp - c. The "i"th argument has a suitable type for holding the - string captured as the "i"th sub-pattern. If you pass in - void * NULL for the "i"th argument, or a non-void * NULL - of the correct type, or pass fewer arguments than the - number of sub-patterns, "i"th captured sub-pattern is - ignored. -.sp -CAVEAT: An optional sub-pattern that does not exist in the matched -string is assigned the empty string. Therefore, the following will -return false (because the empty string is not a valid number): -.sp - int number; - pcrecpp::RE::FullMatch("abc", "[a-z]+(\e\ed+)?", &number); -.sp -The matching interface supports at most 16 arguments per call. -If you need more, consider using the more general interface -\fBpcrecpp::RE::DoMatch\fP. 
See \fBpcrecpp.h\fP for the signature for -\fBDoMatch\fP. -.P -NOTE: Do not use \fBno_arg\fP, which is used internally to mark the end of a -list of optional arguments, as a placeholder for missing arguments, as this can -lead to segfaults. -. -. -.SH "QUOTING METACHARACTERS" -.rs -.sp -You can use the "QuoteMeta" operation to insert backslashes before all -potentially meaningful characters in a string. The returned string, used as a -regular expression, will exactly match the original string. -.sp - Example: - string quoted = RE::QuoteMeta(unquoted); -.sp -Note that it's legal to escape a character even if it has no special meaning in -a regular expression -- so this function does that. (This also makes it -identical to the perl function of the same name; see "perldoc -f quotemeta".) -For example, "1.5-2.0?" becomes "1\e.5\e-2\e.0\e?". -. -.SH "PARTIAL MATCHES" -.rs -.sp -You can use the "PartialMatch" operation when you want the pattern -to match any substring of the text. -.sp - Example: simple search for a string: - pcrecpp::RE("ell").PartialMatch("hello"); -.sp - Example: find first number in a string: - int number; - pcrecpp::RE re("(\e\ed+)"); - re.PartialMatch("x*100 + 20", &number); - assert(number == 100); -. -. -.SH "UTF-8 AND THE MATCHING INTERFACE" -.rs -.sp -By default, pattern and text are plain text, one byte per character. The UTF8 -flag, passed to the constructor, causes both pattern and string to be treated -as UTF-8 text, still a byte stream but potentially multiple bytes per -character. In practice, the text is likelier to be UTF-8 than the pattern, but -the match returned may depend on the UTF8 flag, so always use it when matching -UTF8 text. For example, "." will match one byte normally but with UTF8 set may -match up to three bytes of a multi-byte character. 
-.sp - Example: - pcrecpp::RE_Options options; - options.set_utf8(); - pcrecpp::RE re(utf8_pattern, options); - re.FullMatch(utf8_string); -.sp - Example: using the convenience function UTF8(): - pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8()); - re.FullMatch(utf8_string); -.sp -NOTE: The UTF8 flag is ignored if pcre was not configured with the - --enable-utf8 flag. -. -. -.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE" -.rs -.sp -PCRE defines some modifiers to change the behavior of the regular expression -engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to -pass such modifiers to a RE class. Currently, the following modifiers are -supported: -.sp - modifier description Perl corresponding -.sp - PCRE_CASELESS case insensitive match /i - PCRE_MULTILINE multiple lines match /m - PCRE_DOTALL dot matches newlines /s - PCRE_DOLLAR_ENDONLY $ matches only at end N/A - PCRE_EXTRA strict escape parsing N/A - PCRE_EXTENDED ignore white spaces /x - PCRE_UTF8 handles UTF8 chars built-in - PCRE_UNGREEDY reverses * and *? N/A - PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*) -.sp -(*) Both Perl and PCRE allow non capturing parentheses by means of the -"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not -capture, while (ab|cd) does. -.P -For a full account on how each modifier works, please check the -PCRE API reference page. -.P -For each modifier, there are two member functions whose name is made -out of the modifier in lowercase, without the "PCRE_" prefix. For -instance, PCRE_CASELESS is handled by -.sp - bool caseless() -.sp -which returns true if the modifier is set, and -.sp - RE_Options & set_caseless(bool) -.sp -which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be -accessed through the \fBset_match_limit()\fP and \fBmatch_limit()\fP member -functions. 
Setting \fImatch_limit\fP to a non-zero value will limit the -execution of pcre to keep it from doing bad things like blowing the stack or -taking an eternity to return a result. A value of 5000 is good enough to stop -stack blowup in a 2MB thread stack. Setting \fImatch_limit\fP to zero disables -match limiting. Alternatively, you can call \fBmatch_limit_recursion()\fP -which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE -recurses. \fBmatch_limit()\fP limits the number of matches PCRE does; -\fBmatch_limit_recursion()\fP limits the depth of internal recursion, and -therefore the amount of stack that is used. -.P -Normally, to pass one or more modifiers to a RE class, you declare -a \fIRE_Options\fP object, set the appropriate options, and pass this -object to a RE constructor. Example: -.sp - RE_Options opt; - opt.set_caseless(true); - if (RE("HELLO", opt).PartialMatch("hello world")) ... -.sp -RE_options has two constructors. The default constructor takes no arguments and -creates a set of flags that are off by default. The optional parameter -\fIoption_flags\fP is to facilitate transfer of legacy code from C programs. -This lets you do -.sp - RE(pattern, - RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); -.sp -However, new code is better off doing -.sp - RE(pattern, - RE_Options().set_caseless(true).set_multiline(true)) - .PartialMatch(str); -.sp -If you are going to pass one of the most used modifiers, there are some -convenience functions that return a RE_Options class with the -appropriate modifier already set: \fBCASELESS()\fP, \fBUTF8()\fP, -\fBMULTILINE()\fP, \fBDOTALL\fP(), and \fBEXTENDED()\fP. -.P -If you need to set several options at once, and you don't want to go through -the pains of declaring a RE_Options object and setting several options, there -is a parallel method that give you such ability on the fly. 
You can concatenate -several \fBset_xxxxx()\fP member functions, since each of them returns a -reference to its class object. For example, to pass PCRE_CASELESS, -PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write: -.sp - RE(" ^ xyz \e\es+ .* blah$", - RE_Options() - .set_caseless(true) - .set_extended(true) - .set_multiline(true)).PartialMatch(sometext); -.sp -. -. -.SH "SCANNING TEXT INCREMENTALLY" -.rs -.sp -The "Consume" operation may be useful if you want to repeatedly -match regular expressions at the front of a string and skip over -them as they match. This requires use of the "StringPiece" type, -which represents a sub-range of a real string. Like RE, StringPiece -is defined in the pcrecpp namespace. -.sp - Example: read lines of the form "var = value" from a string. - string contents = ...; // Fill string somehow - pcrecpp::StringPiece input(contents); // Wrap in a StringPiece -.sp - string var; - int value; - pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en"); - while (re.Consume(&input, &var, &value)) { - ...; - } -.sp -Each successful call to "Consume" will set "var/value", and also -advance "input" so it points past the matched text. -.P -The "FindAndConsume" operation is similar to "Consume" but does not -anchor your match at the beginning of the string. For example, you -could extract all words from a string by repeatedly calling -.sp - pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word) -. -. -.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS" -.rs -.sp -By default, if you pass a pointer to a numeric value, the -corresponding text is interpreted as a base-10 number. You can -instead wrap the pointer with a call to one of the operators Hex(), -Octal(), or CRadix() to interpret the text in another base. The -CRadix operator interprets C-style "0" (base-8) and "0x" (base-16) -prefixes, but defaults to base-10. 
-.sp - Example: - int a, b, c, d; - pcrecpp::RE re("(.*) (.*) (.*) (.*)"); - re.FullMatch("100 40 0100 0x40", - pcrecpp::Octal(&a), pcrecpp::Hex(&b), - pcrecpp::CRadix(&c), pcrecpp::CRadix(&d)); -.sp -will leave 64 in a, b, c, and d. -. -. -.SH "REPLACING PARTS OF STRINGS" -.rs -.sp -You can replace the first match of "pattern" in "str" with "rewrite". -Within "rewrite", backslash-escaped digits (\e1 to \e9) can be -used to insert text matching corresponding parenthesized group -from the pattern. \e0 in "rewrite" refers to the entire matching -text. For example: -.sp - string s = "yabba dabba doo"; - pcrecpp::RE("b+").Replace("d", &s); -.sp -will leave "s" containing "yada dabba doo". The result is true if the pattern -matches and a replacement occurs, false otherwise. -.P -\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all -occurrences of the pattern in the string with the rewrite. Replacements are -not subject to re-matching. For example: -.sp - string s = "yabba dabba doo"; - pcrecpp::RE("b+").GlobalReplace("d", &s); -.sp -will leave "s" containing "yada dada doo". It returns the number of -replacements made. -.P -\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches, -"rewrite" is copied into "out" (an additional argument) with substitutions. -The non-matching portions of "text" are ignored. Returns true iff a match -occurred and the extraction happened successfully; if no match occurs, the -string is left unaffected. -. -. -.SH AUTHOR -.rs -.sp -.nf -The C++ wrapper was contributed by Google Inc. -Copyright (c) 2007 Google Inc. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 08 January 2012 -.fi diff --git a/src/pcre/doc/pcredemo.3 b/src/pcre/doc/pcredemo.3 deleted file mode 100644 index 194629b1..00000000 --- a/src/pcre/doc/pcredemo.3 +++ /dev/null @@ -1,424 +0,0 @@ -.\" Start example. -.de EX -. nr mE \\n(.f -. nf -. nh -. ft CW -.. -. -. -.\" End example. -.de EE -. ft \\n(mE -. fi -. hy \\n(HY -.. -. 
-.EX -/************************************************* -* PCRE DEMONSTRATION PROGRAM * -*************************************************/ - -/* This is a demonstration program to illustrate the most straightforward ways -of calling the PCRE regular expression library from a C program. See the -pcresample documentation for a short discussion ("man pcresample" if you have -the PCRE man pages installed). - -In Unix-like environments, if PCRE is installed in your standard system -libraries, you should be able to compile this program using this command: - -gcc -Wall pcredemo.c -lpcre -o pcredemo - -If PCRE is not installed in a standard place, it is likely to be installed with -support for the pkg-config mechanism. If you have pkg-config, you can compile -this program using this command: - -gcc -Wall pcredemo.c `pkg-config --cflags --libs libpcre` -o pcredemo - -If you do not have pkg-config, you may have to use this: - -gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \e - -R/usr/local/lib -lpcre -o pcredemo - -Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and -library files for PCRE are installed on your system. Only some operating -systems (e.g. Solaris) use the -R option. - -Building under Windows: - -If you want to statically link this program against a non-dll .a file, you must -define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and -pcre_free() exported functions will be declared __declspec(dllimport), with -unwanted results. So in this environment, uncomment the following line. 
*/ - -/* #define PCRE_STATIC */ - -#include <stdio.h> -#include <string.h> -#include <pcre.h> - -#define OVECCOUNT 30 /* should be a multiple of 3 */ - - -int main(int argc, char **argv) -{ -pcre *re; -const char *error; -char *pattern; -char *subject; -unsigned char *name_table; -unsigned int option_bits; -int erroffset; -int find_all; -int crlf_is_newline; -int namecount; -int name_entry_size; -int ovector[OVECCOUNT]; -int subject_length; -int rc, i; -int utf8; - - -/************************************************************************** -* First, sort out the command line. There is only one possible option at * -* the moment, "-g" to request repeated matching to find all occurrences, * -* like Perl's /g option. We set the variable find_all to a non-zero value * -* if the -g option is present. Apart from that, there must be exactly two * -* arguments. * -**************************************************************************/ - -find_all = 0; -for (i = 1; i < argc; i++) - { - if (strcmp(argv[i], "-g") == 0) find_all = 1; - else break; - } - -/* After the options, we require exactly two arguments, which are the pattern, -and the subject string. */ - -if (argc - i != 2) - { - printf("Two arguments required: a regex and a subject string\en"); - return 1; - } - -pattern = argv[i]; -subject = argv[i+1]; -subject_length = (int)strlen(subject); - - -/************************************************************************* -* Now we are going to compile the regular expression pattern, and handle * -* and errors that are detected.
* -*************************************************************************/ - -re = pcre_compile( - pattern, /* the pattern */ - 0, /* default options */ - &error, /* for error message */ - &erroffset, /* for error offset */ - NULL); /* use default character tables */ - -/* Compilation failed: print the error message and exit */ - -if (re == NULL) - { - printf("PCRE compilation failed at offset %d: %s\en", erroffset, error); - return 1; - } - - -/************************************************************************* -* If the compilation succeeded, we call PCRE again, in order to do a * -* pattern match against the subject string. This does just ONE match. If * -* further matching is needed, it will be done below. * -*************************************************************************/ - -rc = pcre_exec( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - subject, /* the subject string */ - subject_length, /* the length of the subject */ - 0, /* start at offset 0 in the subject */ - 0, /* default options */ - ovector, /* output vector for substring information */ - OVECCOUNT); /* number of elements in the output vector */ - -/* Matching failed: handle error cases */ - -if (rc < 0) - { - switch(rc) - { - case PCRE_ERROR_NOMATCH: printf("No match\en"); break; - /* - Handle other special cases if you like - */ - default: printf("Matching error %d\en", rc); break; - } - pcre_free(re); /* Release memory used for the compiled pattern */ - return 1; - } - -/* Match succeded */ - -printf("\enMatch succeeded at offset %d\en", ovector[0]); - - -/************************************************************************* -* We have found the first match within the subject string. If the output * -* vector wasn't big enough, say so. Then output any substrings that were * -* captured. 
* -*************************************************************************/ - -/* The output vector wasn't big enough */ - -if (rc == 0) - { - rc = OVECCOUNT/3; - printf("ovector only has room for %d captured substrings\en", rc - 1); - } - -/* Show substrings stored in the output vector by number. Obviously, in a real -application you might want to do things other than print them. */ - -for (i = 0; i < rc; i++) - { - char *substring_start = subject + ovector[2*i]; - int substring_length = ovector[2*i+1] - ovector[2*i]; - printf("%2d: %.*s\en", i, substring_length, substring_start); - } - - -/************************************************************************** -* That concludes the basic part of this demonstration program. We have * -* compiled a pattern, and performed a single match. The code that follows * -* shows first how to access named substrings, and then how to code for * -* repeated matches on the same subject. * -**************************************************************************/ - -/* See if there are any named substrings, and if so, show them by name. First -we have to extract the count of named parentheses from the pattern. */ - -(void)pcre_fullinfo( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - PCRE_INFO_NAMECOUNT, /* number of named substrings */ - &namecount); /* where to put the answer */ - -if (namecount <= 0) printf("No named substrings\en"); else - { - unsigned char *tabptr; - printf("Named substrings\en"); - - /* Before we can access the substrings, we must extract the table for - translating names to numbers, and the size of each entry in the table. 
*/ - - (void)pcre_fullinfo( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - PCRE_INFO_NAMETABLE, /* address of the table */ - &name_table); /* where to put the answer */ - - (void)pcre_fullinfo( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - PCRE_INFO_NAMEENTRYSIZE, /* size of each entry in the table */ - &name_entry_size); /* where to put the answer */ - - /* Now we can scan the table and, for each entry, print the number, the name, - and the substring itself. */ - - tabptr = name_table; - for (i = 0; i < namecount; i++) - { - int n = (tabptr[0] << 8) | tabptr[1]; - printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2, - ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]); - tabptr += name_entry_size; - } - } - - -/************************************************************************* -* If the "-g" option was given on the command line, we want to continue * -* to search for additional matches in the subject string, in a similar * -* way to the /g option in Perl. This turns out to be trickier than you * -* might think because of the possibility of matching an empty string. * -* What happens is as follows: * -* * -* If the previous match was NOT for an empty string, we can just start * -* the next match at the end of the previous one. * -* * -* If the previous match WAS for an empty string, we can't do that, as it * -* would lead to an infinite loop. Instead, a special call of pcre_exec() * -* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set. * -* The first of these tells PCRE that an empty string at the start of the * -* subject is not a valid match; other possibilities must be tried. The * -* second flag restricts PCRE to one match attempt at the initial string * -* position. 
If this match succeeds, an alternative to the empty string * -* match has been found, and we can print it and proceed round the loop, * -* advancing by the length of whatever was found. If this match does not * -* succeed, we still stay in the loop, advancing by just one character. * -* In UTF-8 mode, which can be set by (*UTF8) in the pattern, this may be * -* more than one byte. * -* * -* However, there is a complication concerned with newlines. When the * -* newline convention is such that CRLF is a valid newline, we must * -* advance by two characters rather than one. The newline convention can * -* be set in the regex by (*CR), etc.; if not, we must find the default. * -*************************************************************************/ - -if (!find_all) /* Check for -g */ - { - pcre_free(re); /* Release the memory used for the compiled pattern */ - return 0; /* Finish unless -g was given */ - } - -/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline -sequence. First, find the options with which the regex was compiled; extract -the UTF-8 state, and mask off all but the newline options. */ - -(void)pcre_fullinfo(re, NULL, PCRE_INFO_OPTIONS, &option_bits); -utf8 = option_bits & PCRE_UTF8; -option_bits &= PCRE_NEWLINE_CR|PCRE_NEWLINE_LF|PCRE_NEWLINE_CRLF| - PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF; - -/* If no newline options were set, find the default newline convention from the -build configuration. */ - -if (option_bits == 0) - { - int d; - (void)pcre_config(PCRE_CONFIG_NEWLINE, &d); - /* Note that these values are always the ASCII ones, even in - EBCDIC environments. CR = 13, NL = 10. */ - option_bits = (d == 13)? PCRE_NEWLINE_CR : - (d == 10)? PCRE_NEWLINE_LF : - (d == (13<<8 | 10))? PCRE_NEWLINE_CRLF : - (d == -2)? PCRE_NEWLINE_ANYCRLF : - (d == -1)? PCRE_NEWLINE_ANY : 0; - } - -/* See if CRLF is a valid newline sequence. 
*/ - -crlf_is_newline = - option_bits == PCRE_NEWLINE_ANY || - option_bits == PCRE_NEWLINE_CRLF || - option_bits == PCRE_NEWLINE_ANYCRLF; - -/* Loop for second and subsequent matches */ - -for (;;) - { - int options = 0; /* Normally no options */ - int start_offset = ovector[1]; /* Start at end of previous match */ - - /* If the previous match was for an empty string, we are finished if we are - at the end of the subject. Otherwise, arrange to run another match at the - same point to see if a non-empty match can be found. */ - - if (ovector[0] == ovector[1]) - { - if (ovector[0] == subject_length) break; - options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED; - } - - /* Run the next matching operation */ - - rc = pcre_exec( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - subject, /* the subject string */ - subject_length, /* the length of the subject */ - start_offset, /* starting offset in the subject */ - options, /* options */ - ovector, /* output vector for substring information */ - OVECCOUNT); /* number of elements in the output vector */ - - /* This time, a result of NOMATCH isn't an error. If the value in "options" - is zero, it just means we have found all possible matches, so the loop ends. - Otherwise, it means we have failed to find a non-empty-string match at a - point where there was a previous empty-string match. In this case, we do what - Perl does: advance the matching position by one character, and continue. We - do this by setting the "end of previous match" offset, because that is picked - up at the top of the loop as the point at which to start again. - - There are two complications: (a) When CRLF is a valid newline sequence, and - the current position is just before it, advance by an extra byte. (b) - Otherwise we must ensure that we skip an entire UTF-8 character if we are in - UTF-8 mode. 
*/ - - if (rc == PCRE_ERROR_NOMATCH) - { - if (options == 0) break; /* All matches found */ - ovector[1] = start_offset + 1; /* Advance one byte */ - if (crlf_is_newline && /* If CRLF is newline & */ - start_offset < subject_length - 1 && /* we are at CRLF, */ - subject[start_offset] == '\er' && - subject[start_offset + 1] == '\en') - ovector[1] += 1; /* Advance by one more. */ - else if (utf8) /* Otherwise, ensure we */ - { /* advance a whole UTF-8 */ - while (ovector[1] < subject_length) /* character. */ - { - if ((subject[ovector[1]] & 0xc0) != 0x80) break; - ovector[1] += 1; - } - } - continue; /* Go round the loop again */ - } - - /* Other matching errors are not recoverable. */ - - if (rc < 0) - { - printf("Matching error %d\en", rc); - pcre_free(re); /* Release memory used for the compiled pattern */ - return 1; - } - - /* Match succeeded */ - - printf("\enMatch succeeded again at offset %d\en", ovector[0]); - - /* The match succeeded, but the output vector wasn't big enough. */ - - if (rc == 0) - { - rc = OVECCOUNT/3; - printf("ovector only has room for %d captured substrings\en", rc - 1); - } - - /* As before, show substrings stored in the output vector by number, and then - also any named substrings. 
*/ - - for (i = 0; i < rc; i++) - { - char *substring_start = subject + ovector[2*i]; - int substring_length = ovector[2*i+1] - ovector[2*i]; - printf("%2d: %.*s\en", i, substring_length, substring_start); - } - - if (namecount <= 0) printf("No named substrings\en"); else - { - unsigned char *tabptr = name_table; - printf("Named substrings\en"); - for (i = 0; i < namecount; i++) - { - int n = (tabptr[0] << 8) | tabptr[1]; - printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2, - ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]); - tabptr += name_entry_size; - } - } - } /* End of loop to find second and subsequent matches */ - -printf("\en"); -pcre_free(re); /* Release memory used for the compiled pattern */ -return 0; -} - -/* End of pcredemo.c */ -.EE diff --git a/src/pcre/doc/pcregrep.1 b/src/pcre/doc/pcregrep.1 deleted file mode 100644 index 98866754..00000000 --- a/src/pcre/doc/pcregrep.1 +++ /dev/null @@ -1,683 +0,0 @@ -.TH PCREGREP 1 "03 April 2014" "PCRE 8.35" -.SH NAME -pcregrep - a grep with Perl-compatible regular expressions. -.SH SYNOPSIS -.B pcregrep [options] [long options] [pattern] [path1 path2 ...] -. -.SH DESCRIPTION -.rs -.sp -\fBpcregrep\fP searches files for character patterns, in the same way as other -grep commands do, but it uses the PCRE regular expression library to support -patterns that are compatible with the regular expressions of Perl 5. See -.\" HREF -\fBpcresyntax\fP(3) -.\" -for a quick-reference summary of pattern syntax, or -.\" HREF -\fBpcrepattern\fP(3) -.\" -for a full description of the syntax and semantics of the regular expressions -that PCRE supports. -.P -Patterns, whether supplied on the command line or in a separate file, are given -without delimiters. For example: -.sp - pcregrep Thursday /etc/motd -.sp -If you attempt to use delimiters (for example, by surrounding a pattern with -slashes, as is common in Perl scripts), they are interpreted as part of the -pattern. 
Quotes can of course be used to delimit patterns on the command line -because they are interpreted by the shell, and indeed quotes are required if a -pattern contains white space or shell metacharacters. -.P -The first argument that follows any option settings is treated as the single -pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present. -Conversely, when one or both of these options are used to specify patterns, all -arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an -argument pattern must be provided. -.P -If no files are specified, \fBpcregrep\fP reads the standard input. The -standard input can also be referenced by a name consisting of a single hyphen. -For example: -.sp - pcregrep some-pattern /file1 - /file3 -.sp -By default, each line that matches a pattern is copied to the standard -output, and if there is more than one file, the file name is output at the -start of each line, followed by a colon. However, there are options that can -change how \fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it -possible to search for patterns that span line boundaries. What defines a line -boundary is controlled by the \fB-N\fP (\fB--newline\fP) option. -.P -The amount of memory used for buffering files that are being scanned is -controlled by a parameter that can be set by the \fB--buffer-size\fP option. -The default value for this parameter is specified when \fBpcregrep\fP is built, -with the default default being 20K. A block of memory three times this size is -used (to allow for buffering "before" and "after" lines). An error occurs if a -line overflows the buffer. -.P -Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater. -BUFSIZ is defined in \fB<stdio.h>\fP. 
When there is more than one pattern -(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to -each line in the order in which they are defined, except that all the \fB-e\fP -patterns are tried before the \fB-f\fP patterns. -.P -By default, as soon as one pattern matches a line, no further patterns are -considered. However, if \fB--colour\fP (or \fB--color\fP) is used to colour the -matching substrings, or if \fB--only-matching\fP, \fB--file-offsets\fP, or -\fB--line-offsets\fP is used to output only the part of the line that matched -(either shown literally, or as an offset), scanning resumes immediately -following the match, so that further matches on the same line can be found. If -there are multiple patterns, they are all tried on the remainder of the line, -but patterns that follow the one that matched are not tried on the earlier part -of the line. -.P -This behaviour means that the order in which multiple patterns are specified -can affect the output when one of the above options is used. This is no longer -the same behaviour as GNU grep, which now manages to display earlier matches -for later patterns (as long as there is no overlap). -.P -Patterns that can match an empty string are accepted, but empty string -matches are never recognized. An example is the pattern "(super)?(man)?", in -which all components are optional. This pattern finds all occurrences of both -"super" and "man"; the output differs from matching with "super|man" when only -the matching substrings are being shown. -.P -If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set, -\fBpcregrep\fP uses the value to set a locale when calling the PCRE library. -The \fB--locale\fP option can be used to override this. -. -. -.SH "SUPPORT FOR COMPRESSED FILES" -.rs -.sp -It is possible to compile \fBpcregrep\fP so that it uses \fBlibz\fP or -\fBlibbz2\fP to read files whose names end in \fB.gz\fP or \fB.bz2\fP, -respectively. 
You can find out whether your binary has support for one or both -of these file types by running it with the \fB--help\fP option. If the -appropriate support is not present, files are treated as plain text. The -standard input is always so treated. -. -. -.SH "BINARY FILES" -.rs -.sp -By default, a file that contains a binary zero byte within the first 1024 bytes -is identified as a binary file, and is processed specially. (GNU grep also -identifies binary files in this manner.) See the \fB--binary-files\fP option -for a means of changing the way binary files are handled. -. -. -.SH OPTIONS -.rs -.sp -The order in which some of the options appear can affect the output. For -example, both the \fB-h\fP and \fB-l\fP options affect the printing of file -names. Whichever comes later in the command line will be the one that takes -effect. Similarly, except where noted below, if an option is given twice, the -later setting is used. Numerical values for options may be followed by K or M, -to signify multiplication by 1024 or 1024*1024 respectively. -.TP 10 -\fB--\fP -This terminates the list of options. It is useful if the next item on the -command line starts with a hyphen but is not an option. This allows for the -processing of patterns and filenames that start with hyphens. -.TP -\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP -Output \fInumber\fP lines of context after each matching line. If filenames -and/or line numbers are being output, a hyphen separator is used instead of a -colon for the context lines. A line containing "--" is output between each -group of lines, unless they are in fact contiguous in the input file. The value -of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP -guarantees to have up to 8K of following text available for context output. -.TP -\fB-a\fP, \fB--text\fP -Treat binary files as text. This is equivalent to -\fB--binary-files\fP=\fItext\fP. 
-.TP -\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP -Output \fInumber\fP lines of context before each matching line. If filenames -and/or line numbers are being output, a hyphen separator is used instead of a -colon for the context lines. A line containing "--" is output between each -group of lines, unless they are in fact contiguous in the input file. The value -of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP -guarantees to have up to 8K of preceding text available for context output. -.TP -\fB--binary-files=\fP\fIword\fP -Specify how binary files are to be processed. If the word is "binary" (the -default), pattern matching is performed on binary files, but the only output is -"Binary file <name> matches" when a match succeeds. If the word is "text", -which is equivalent to the \fB-a\fP or \fB--text\fP option, binary files are -processed in the same way as any other file. In this case, when a match -succeeds, the output may be binary garbage, which can have nasty effects if -sent to a terminal. If the word is "without-match", which is equivalent to the -\fB-I\fP option, binary files are not processed at all; they are assumed not to -be of interest. -.TP -\fB--buffer-size=\fP\fInumber\fP -Set the parameter that controls how much memory is used for buffering files -that are being scanned. -.TP -\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP -Output \fInumber\fP lines of context both before and after each matching line. -This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value. -.TP -\fB-c\fP, \fB--count\fP -Do not output individual lines from the files that are being scanned; instead -output the number of lines that would otherwise have been shown. If no lines -are selected, the number zero is output. If several files are being -scanned, a count is output for each of them. However, if the -\fB--files-with-matches\fP option is also used, only those files whose counts -are greater than zero are listed. 
When \fB-c\fP is used, the \fB-A\fP, -\fB-B\fP, and \fB-C\fP options are ignored. -.TP -\fB--colour\fP, \fB--color\fP -If this option is given without any data, it is equivalent to "--colour=auto". -If data is required, it must be given in the same shell item, separated by an -equals sign. -.TP -\fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP -This option specifies under what circumstances the parts of a line that matched -a pattern should be coloured in the output. By default, the output is not -coloured. The value (which is optional, see above) may be "never", "always", or -"auto". In the latter case, colouring happens only if the standard output is -connected to a terminal. More resources are used when colouring is enabled, -because \fBpcregrep\fP has to search for all possible matches in a line, not -just one, in order to colour them all. -.sp -The colour that is used can be specified by setting the environment variable -PCREGREP_COLOUR or PCREGREP_COLOR. The value of this variable should be a -string of two numbers, separated by a semicolon. They are copied directly into -the control string for setting colour on a terminal, so it is your -responsibility to ensure that they make sense. If neither of the environment -variables is set, the default is "1;31", which gives red. -.TP -\fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP -If an input path is not a regular file or a directory, "action" specifies how -it is to be processed. Valid values are "read" (the default) or "skip" -(silently skip the path). -.TP -\fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP -If an input path is a directory, "action" specifies how it is to be processed. -Valid values are "read" (the default in non-Windows environments, for -compatibility with GNU grep), "recurse" (equivalent to the \fB-r\fP option), or -"skip" (silently skip the path, the default in Windows environments). In the -"read" case, directories are read as if they were ordinary files. 
In some -operating systems the effect of reading a directory like this is an immediate -end-of-file; in others it may provoke an error. -.TP -\fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP, \fB--regexp=\fP\fIpattern\fP -Specify a pattern to be matched. This option can be used multiple times in -order to specify several patterns. It can also be used as a way of specifying a -single pattern that starts with a hyphen. When \fB-e\fP is used, no argument -pattern is taken from the command line; all arguments are treated as file -names. There is no limit to the number of patterns. They are applied to each -line in the order in which they are defined until one matches. -.sp -If \fB-f\fP is used with \fB-e\fP, the command line patterns are matched first, -followed by the patterns from the file(s), independent of the order in which -these options are specified. Note that multiple use of \fB-e\fP is not the same -as a single pattern with alternatives. For example, X|Y finds the first -character in a line that is X or Y, whereas if the two patterns are given -separately, with X first, \fBpcregrep\fP finds X if it is present, even if it -follows Y in the line. It finds Y only if there is no X in the line. This -matters only if you are using \fB-o\fP or \fB--colo(u)r\fP to show the part(s) -of the line that matched. -.TP -\fB--exclude\fP=\fIpattern\fP -Files (but not directories) whose names match the pattern are skipped without -being processed. This applies to all files, whether listed on the command line, -obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a -PCRE regular expression, and is matched against the final component of the file -name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not -apply to this pattern. The option may be given any number of times in order to -specify multiple patterns. If a file name matches both an \fB--include\fP -and an \fB--exclude\fP pattern, it is excluded. 
There is no short form for this -option. -.TP -\fB--exclude-from=\fP\fIfilename\fP -Treat each non-empty line of the file as the data for an \fB--exclude\fP -option. What constitutes a newline when reading the file is the operating -system's default. The \fB--newline\fP option has no effect on this option. This -option may be given more than once in order to specify a number of files to -read. -.TP -\fB--exclude-dir\fP=\fIpattern\fP -Directories whose names match the pattern are skipped without being processed, -whatever the setting of the \fB--recursive\fP option. This applies to all -directories, whether listed on the command line, obtained from -\fB--file-list\fP, or by scanning a parent directory. The pattern is a PCRE -regular expression, and is matched against the final component of the directory -name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not -apply to this pattern. The option may be given any number of times in order to -specify more than one pattern. If a directory matches both \fB--include-dir\fP -and \fB--exclude-dir\fP, it is excluded. There is no short form for this -option. -.TP -\fB-F\fP, \fB--fixed-strings\fP -Interpret each data-matching pattern as a list of fixed strings, separated by -newlines, instead of as a regular expression. What constitutes a newline for -this purpose is controlled by the \fB--newline\fP option. The \fB-w\fP (match -as a word) and \fB-x\fP (match whole line) options can be used with \fB-F\fP. -They apply to each of the fixed strings. A line is selected if any of the fixed -strings are found in it (subject to \fB-w\fP or \fB-x\fP, if present). This -option applies only to the patterns that are matched against the contents of -files; it does not apply to patterns specified by any of the \fB--include\fP or -\fB--exclude\fP options. -.TP -\fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP -Read patterns from the file, one per line, and match them against -each line of input. 
What constitutes a newline when reading the file is the -operating system's default. The \fB--newline\fP option has no effect on this -option. Trailing white space is removed from each line, and blank lines are -ignored. An empty file contains no patterns and therefore matches nothing. See -also the comments about multiple patterns versus a single pattern with -alternatives in the description of \fB-e\fP above. -.sp -If this option is given more than once, all the specified files are -read. A data line is output if any of the patterns match it. A filename can -be given as "-" to refer to the standard input. When \fB-f\fP is used, patterns -specified on the command line using \fB-e\fP may also be present; they are -tested before the file's patterns. However, no other pattern is taken from the -command line; all arguments are treated as the names of paths to be searched. -.TP -\fB--file-list\fP=\fIfilename\fP -Read a list of files and/or directories that are to be scanned from the given -file, one per line. Trailing white space is removed from each line, and blank -lines are ignored. These paths are processed before any that are listed on the -command line. The filename can be given as "-" to refer to the standard input. -If \fB--file\fP and \fB--file-list\fP are both specified as "-", patterns are -read first. This is useful only when the standard input is a terminal, from -which further lines (the list of files) can be read after an end-of-file -indication. If this option is given more than once, all the specified files are -read. -.TP -\fB--file-offsets\fP -Instead of showing lines or parts of lines that match, show each match as an -offset from the start of the file and a length, separated by a comma. In this -mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP -options are ignored. If there is more than one match in a line, each of them is -shown separately. 
This option is mutually exclusive with \fB--line-offsets\fP -and \fB--only-matching\fP. -.TP -\fB-H\fP, \fB--with-filename\fP -Force the inclusion of the filename at the start of output lines when searching -a single file. By default, the filename is not shown in this case. For matching -lines, the filename is followed by a colon; for context lines, a hyphen -separator is used. If a line number is also being output, it follows the file -name. -.TP -\fB-h\fP, \fB--no-filename\fP -Suppress the output filenames when searching multiple files. By default, -filenames are shown when multiple files are searched. For matching lines, the -filename is followed by a colon; for context lines, a hyphen separator is used. -If a line number is also being output, it follows the file name. -.TP -\fB--help\fP -Output a help message, giving brief details of the command options and file -type support, and then exit. Anything else on the command line is -ignored. -.TP -\fB-I\fP -Treat binary files as never matching. This is equivalent to -\fB--binary-files\fP=\fIwithout-match\fP. -.TP -\fB-i\fP, \fB--ignore-case\fP -Ignore upper/lower case distinctions during comparisons. -.TP -\fB--include\fP=\fIpattern\fP -If any \fB--include\fP patterns are specified, the only files that are -processed are those that match one of the patterns (and do not match an -\fB--exclude\fP pattern). This option does not affect directories, but it -applies to all files, whether listed on the command line, obtained from -\fB--file-list\fP, or by scanning a directory. The pattern is a PCRE regular -expression, and is matched against the final component of the file name, not -the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not apply to -this pattern. The option may be given any number of times. If a file name -matches both an \fB--include\fP and an \fB--exclude\fP pattern, it is excluded. -There is no short form for this option. 
-.TP -\fB--include-from=\fP\fIfilename\fP -Treat each non-empty line of the file as the data for an \fB--include\fP -option. What constitutes a newline for this purpose is the operating system's -default. The \fB--newline\fP option has no effect on this option. This option -may be given any number of times; all the files are read. -.TP -\fB--include-dir\fP=\fIpattern\fP -If any \fB--include-dir\fP patterns are specified, the only directories that -are processed are those that match one of the patterns (and do not match an -\fB--exclude-dir\fP pattern). This applies to all directories, whether listed -on the command line, obtained from \fB--file-list\fP, or by scanning a parent -directory. The pattern is a PCRE regular expression, and is matched against the -final component of the directory name, not the entire path. The \fB-F\fP, -\fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be -given any number of times. If a directory matches both \fB--include-dir\fP and -\fB--exclude-dir\fP, it is excluded. There is no short form for this option. -.TP -\fB-L\fP, \fB--files-without-match\fP -Instead of outputting lines from the files, just output the names of the files -that do not contain any lines that would have been output. Each file name is -output once, on a separate line. -.TP -\fB-l\fP, \fB--files-with-matches\fP -Instead of outputting lines from the files, just output the names of the files -containing lines that would have been output. Each file name is output -once, on a separate line. Searching normally stops as soon as a matching line -is found in a file. However, if the \fB-c\fP (count) option is also used, -matching continues in order to obtain the correct count, and those files that -have at least one match are listed along with their counts. Using this option -with \fB-c\fP is a way of suppressing the listing of files with no matches. 
-.TP -\fB--label\fP=\fIname\fP -This option supplies a name to be used for the standard input when file names -are being output. If not supplied, "(standard input)" is used. There is no -short form for this option. -.TP -\fB--line-buffered\fP -When this option is given, input is read and processed line by line, and the -output is flushed after each write. By default, input is read in large chunks, -unless \fBpcregrep\fP can determine that it is reading from a terminal (which -is currently possible only in Unix-like environments). Output to terminal is -normally automatically flushed by the operating system. This option can be -useful when the input or output is attached to a pipe and you do not want -\fBpcregrep\fP to buffer up large amounts of data. However, its use will affect -performance, and the \fB-M\fP (multiline) option ceases to work. -.TP -\fB--line-offsets\fP -Instead of showing lines or parts of lines that match, show each match as a -line number, the offset from the start of the line, and a length. The line -number is terminated by a colon (as usual; see the \fB-n\fP option), and the -offset and length are separated by a comma. In this mode, no context is shown. -That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are ignored. If there is -more than one match in a line, each of them is shown separately. This option is -mutually exclusive with \fB--file-offsets\fP and \fB--only-matching\fP. -.TP -\fB--locale\fP=\fIlocale-name\fP -This option specifies a locale to be used for pattern matching. It overrides -the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no -locale is specified, the PCRE library's default (usually the "C" locale) is -used. There is no short form for this option. -.TP -\fB--match-limit\fP=\fInumber\fP -Processing some regular expression patterns can require a very large amount of -memory, leading in some cases to a program crash if not enough is available. 
-Other patterns may take a very long time to search for all possible matching -strings. The \fBpcre_exec()\fP function that is called by \fBpcregrep\fP to do -the matching has two parameters that can limit the resources that it uses. -.sp -The \fB--match-limit\fP option provides a means of limiting resource usage -when processing patterns that are not going to match, but which have a very -large number of possibilities in their search trees. The classic example is a -pattern that uses nested unlimited repeats. Internally, PCRE uses a function -called \fBmatch()\fP which it calls repeatedly (sometimes recursively). The -limit set by \fB--match-limit\fP is imposed on the number of times this -function is called during a match, which has the effect of limiting the amount -of backtracking that can take place. -.sp -The \fB--recursion-limit\fP option is similar to \fB--match-limit\fP, but -instead of limiting the total number of times that \fBmatch()\fP is called, it -limits the depth of recursive calls, which in turn limits the amount of memory -that can be used. The recursion depth is a smaller number than the total number -of calls, because not all calls to \fBmatch()\fP are recursive. This limit is -of use only if it is set smaller than \fB--match-limit\fP. -.sp -There are no short forms for these options. The default settings are specified -when the PCRE library is compiled, with the default default being 10 million. -.TP -\fB-M\fP, \fB--multiline\fP -Allow patterns to match more than one line. When this option is given, patterns -may usefully contain literal newline characters and internal occurrences of ^ -and $ characters. The output for a successful match may consist of more than -one line, the last of which is the one in which the match ended. If the matched -string ends with a newline sequence the output ends at the end of that line. -.sp -When this option is set, the PCRE library is called in "multiline" mode. 
-There is a limit to the number of lines that can be matched, imposed by the way -that \fBpcregrep\fP buffers the input file as it scans it. However, -\fBpcregrep\fP ensures that at least 8K characters or the rest of the document -(whichever is the shorter) are available for forward matching, and similarly -the previous 8K characters (or all the previous characters, if fewer than 8K) -are guaranteed to be available for lookbehind assertions. This option does not -work when input is read line by line (see \fB--line-buffered\fP). -.TP -\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP -The PCRE library supports five different conventions for indicating -the ends of lines. They are the single-character sequences CR (carriage return) -and LF (linefeed), the two-character sequence CRLF, an "anycrlf" convention, -which recognizes any of the preceding three types, and an "any" convention, in -which any Unicode line ending sequence is assumed to end a line. The Unicode -sequences are the three just mentioned, plus VT (vertical tab, U+000B), FF -(form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and -PS (paragraph separator, U+2029). -.sp -When the PCRE library is built, a default line-ending sequence is specified. -This is normally the standard sequence for the operating system. Unless -otherwise specified by this option, \fBpcregrep\fP uses the library's default. -The possible values for this option are CR, LF, CRLF, ANYCRLF, or ANY. This -makes it possible to use \fBpcregrep\fP to scan files that have come from other -environments without having to modify their line endings. If the data that is -being scanned does not agree with the convention set by this option, -\fBpcregrep\fP may behave in strange ways. Note that this option does not -apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or -\fB--include-from\fP options, which are expected to use the operating system's -standard newline sequence. 
-.TP -\fB-n\fP, \fB--line-number\fP -Precede each output line by its line number in the file, followed by a colon -for matching lines or a hyphen for context lines. If the filename is also being -output, it precedes the line number. This option is forced if -\fB--line-offsets\fP is used. -.TP -\fB--no-jit\fP -If the PCRE library is built with support for just-in-time compiling (which -speeds up matching), \fBpcregrep\fP automatically makes use of this, unless it -was explicitly disabled at build time. This option can be used to disable the -use of JIT at run time. It is provided for testing and working round problems. -It should never be needed in normal use. -.TP -\fB-o\fP, \fB--only-matching\fP -Show only the part of the line that matched a pattern instead of the whole -line. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and -\fB-C\fP options are ignored. If there is more than one match in a line, each -of them is shown separately. If \fB-o\fP is combined with \fB-v\fP (invert the -sense of the match to find non-matching lines), no output is generated, but the -return code is set appropriately. If the matched portion of the line is empty, -nothing is output unless the file name or line number are being printed, in -which case they are shown on an otherwise empty line. This option is mutually -exclusive with \fB--file-offsets\fP and \fB--line-offsets\fP. -.TP -\fB-o\fP\fInumber\fP, \fB--only-matching\fP=\fInumber\fP -Show only the part of the line that matched the capturing parentheses of the -given number. Up to 32 capturing parentheses are supported, and -o0 is -equivalent to \fB-o\fP without a number. Because these options can be given -without an argument (see above), if an argument is present, it must be given in -the same shell item, for example, -o3 or --only-matching=2. The comments given -for the non-argument case above also apply to this case. 
If the specified -capturing parentheses do not exist in the pattern, or were not set in the -match, nothing is output unless the file name or line number are being printed. -.sp -If this option is given multiple times, multiple substrings are output, in the -order the options are given. For example, -o3 -o1 -o3 causes the substrings -matched by capturing parentheses 3 and 1 and then 3 again to be output. By -default, there is no separator (but see the next option). -.TP -\fB--om-separator\fP=\fItext\fP -Specify a separating string for multiple occurrences of \fB-o\fP. The default -is an empty string. Separating strings are never coloured. -.TP -\fB-q\fP, \fB--quiet\fP -Work quietly, that is, display nothing except error messages. The exit -status indicates whether or not any matches were found. -.TP -\fB-r\fP, \fB--recursive\fP -If any given path is a directory, recursively scan the files it contains, -taking note of any \fB--include\fP and \fB--exclude\fP settings. By default, a -directory is read as a normal file; in some operating systems this gives an -immediate end-of-file. This option is a shorthand for setting the \fB-d\fP -option to "recurse". -.TP -\fB--recursion-limit\fP=\fInumber\fP -See \fB--match-limit\fP above. -.TP -\fB-s\fP, \fB--no-messages\fP -Suppress error messages about non-existent or unreadable files. Such files are -quietly skipped. However, the return code is still 2, even if matches were -found in other files. -.TP -\fB-u\fP, \fB--utf-8\fP -Operate in UTF-8 mode. This option is available only if PCRE has been compiled -with UTF-8 support. All patterns (including those for any \fB--exclude\fP and -\fB--include\fP options) and all subject lines that are scanned must be valid -strings of UTF-8 characters. -.TP -\fB-V\fP, \fB--version\fP -Write the version numbers of \fBpcregrep\fP and the PCRE library to the -standard output and then exit. Anything else on the command line is -ignored. 
-.TP -\fB-v\fP, \fB--invert-match\fP -Invert the sense of the match, so that lines which do \fInot\fP match any of -the patterns are the ones that are found. -.TP -\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP -Force the patterns to match only whole words. This is equivalent to having \eb -at the start and end of the pattern. This option applies only to the patterns -that are matched against the contents of files; it does not apply to patterns -specified by any of the \fB--include\fP or \fB--exclude\fP options. -.TP -\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP -Force the patterns to be anchored (each must start matching at the beginning of -a line) and in addition, require them to match entire lines. This is equivalent -to having ^ and $ characters at the start and end of each alternative branch in -every pattern. This option applies only to the patterns that are matched -against the contents of files; it does not apply to patterns specified by any -of the \fB--include\fP or \fB--exclude\fP options. -. -. -.SH "ENVIRONMENT VARIABLES" -.rs -.sp -The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that -order, for a locale. The first one that is set is used. This can be overridden -by the \fB--locale\fP option. If no locale is set, the PCRE library's default -(usually the "C" locale) is used. -. -. -.SH "NEWLINES" -.rs -.sp -The \fB-N\fP (\fB--newline\fP) option allows \fBpcregrep\fP to scan files with -different newline conventions from the default. Any parts of the input files -that are written to the standard output are copied identically, with whatever -newline sequences they have in the input. 
However, the setting of this option -does not affect the interpretation of files specified by the \fB-f\fP, -\fB--exclude-from\fP, or \fB--include-from\fP options, which are assumed to use -the operating system's standard newline sequence, nor does it affect the way in -which \fBpcregrep\fP writes informational messages to the standard error and -output streams. For these it uses the string "\en" to indicate newlines, -relying on the C I/O library to convert this to an appropriate sequence. -. -. -.SH "OPTIONS COMPATIBILITY" -.rs -.sp -Many of the short and long forms of \fBpcregrep\fP's options are the same -as in the GNU \fBgrep\fP program. Any long option of the form -\fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP -(PCRE terminology). However, the \fB--file-list\fP, \fB--file-offsets\fP, -\fB--include-dir\fP, \fB--line-offsets\fP, \fB--locale\fP, \fB--match-limit\fP, -\fB-M\fP, \fB--multiline\fP, \fB-N\fP, \fB--newline\fP, \fB--om-separator\fP, -\fB--recursion-limit\fP, \fB-u\fP, and \fB--utf-8\fP options are specific to -\fBpcregrep\fP, as is the use of the \fB--only-matching\fP option with a -capturing parentheses number. -.P -Although most of the common options work the same way, a few are different in -\fBpcregrep\fP. For example, the \fB--include\fP option's argument is a glob -for GNU \fBgrep\fP, but a regular expression for \fBpcregrep\fP. If both the -\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names, -without counts, but \fBpcregrep\fP gives the counts. -. -. -.SH "OPTIONS WITH DATA" -.rs -.sp -There are four different ways in which an option with data can be specified. -If a short form option is used, the data may follow immediately, or (with one -exception) in the next command line item. For example: -.sp - -f/some/file - -f /some/file -.sp -The exception is the \fB-o\fP option, which may appear with or without data. 
-Because of this, if data is present, it must follow immediately in the same -item, for example -o3. -.P -If a long form option is used, the data may appear in the same command line -item, separated by an equals character, or (with two exceptions) it may appear -in the next command line item. For example: -.sp - --file=/some/file - --file /some/file -.sp -Note, however, that if you want to supply a file name beginning with ~ as data -in a shell command, and have the shell expand ~ to a home directory, you must -separate the file name from the option, because the shell does not treat ~ -specially unless it is at the start of an item. -.P -The exceptions to the above are the \fB--colour\fP (or \fB--color\fP) and -\fB--only-matching\fP options, for which the data is optional. If one of these -options does have data, it must be given in the first form, using an equals -character. Otherwise \fBpcregrep\fP will assume that it has no data. -. -. -.SH "MATCHING ERRORS" -.rs -.sp -It is possible to supply a regular expression that takes a very long time to -fail to match certain lines. Such patterns normally involve nested indefinite -repeats, for example: (a+)*\ed when matched against a line of a's with no final -digit. The PCRE matching function has a resource limit that causes it to abort -in these circumstances. If this happens, \fBpcregrep\fP outputs an error -message and the line that caused the problem to the standard error stream. If -there are more than 20 such errors, \fBpcregrep\fP gives up. -.P -The \fB--match-limit\fP option of \fBpcregrep\fP can be used to set the overall -resource limit; there is a second option called \fB--recursion-limit\fP that -sets a limit on the amount of memory (usually stack) that is used (see the -discussion of these options above). -. -. 
-.SH DIAGNOSTICS -.rs -.sp -Exit status is 0 if any matches were found, 1 if no matches were found, and 2 -for syntax errors, overlong lines, non-existent or inaccessible files (even if -matches were found in other files) or too many matching errors. Using the -\fB-s\fP option to suppress error messages about inaccessible files does not -affect the return code. -. -. -.SH "SEE ALSO" -.rs -.sp -\fBpcrepattern\fP(3), \fBpcresyntax\fP(3), \fBpcretest\fP(1). -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 03 April 2014 -Copyright (c) 1997-2014 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcregrep.txt b/src/pcre/doc/pcregrep.txt deleted file mode 100644 index 97d9a7bd..00000000 --- a/src/pcre/doc/pcregrep.txt +++ /dev/null @@ -1,741 +0,0 @@ -PCREGREP(1) General Commands Manual PCREGREP(1) - - - -NAME - pcregrep - a grep with Perl-compatible regular expressions. - -SYNOPSIS - pcregrep [options] [long options] [pattern] [path1 path2 ...] - - -DESCRIPTION - - pcregrep searches files for character patterns, in the same way as - other grep commands do, but it uses the PCRE regular expression library - to support patterns that are compatible with the regular expressions of - Perl 5. See pcresyntax(3) for a quick-reference summary of pattern syn- - tax, or pcrepattern(3) for a full description of the syntax and seman- - tics of the regular expressions that PCRE supports. - - Patterns, whether supplied on the command line or in a separate file, - are given without delimiters. For example: - - pcregrep Thursday /etc/motd - - If you attempt to use delimiters (for example, by surrounding a pattern - with slashes, as is common in Perl scripts), they are interpreted as - part of the pattern. 
Quotes can of course be used to delimit patterns - on the command line because they are interpreted by the shell, and - indeed quotes are required if a pattern contains white space or shell - metacharacters. - - The first argument that follows any option settings is treated as the - single pattern to be matched when neither -e nor -f is present. Con- - versely, when one or both of these options are used to specify pat- - terns, all arguments are treated as path names. At least one of -e, -f, - or an argument pattern must be provided. - - If no files are specified, pcregrep reads the standard input. The stan- - dard input can also be referenced by a name consisting of a single - hyphen. For example: - - pcregrep some-pattern /file1 - /file3 - - By default, each line that matches a pattern is copied to the standard - output, and if there is more than one file, the file name is output at - the start of each line, followed by a colon. However, there are options - that can change how pcregrep behaves. In particular, the -M option - makes it possible to search for patterns that span line boundaries. - What defines a line boundary is controlled by the -N (--newline) - option. - - The amount of memory used for buffering files that are being scanned is - controlled by a parameter that can be set by the --buffer-size option. - The default value for this parameter is specified when pcregrep is - built, with the default default being 20K. A block of memory three - times this size is used (to allow for buffering "before" and "after" - lines). An error occurs if a line overflows the buffer. - - Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the - greater. BUFSIZ is defined in <stdio.h>. When there is more than one - pattern (specified by the use of -e and/or -f), each pattern is applied - to each line in the order in which they are defined, except that all - the -e patterns are tried before the -f patterns.
- - By default, as soon as one pattern matches a line, no further patterns - are considered. However, if --colour (or --color) is used to colour the - matching substrings, or if --only-matching, --file-offsets, or --line- - offsets is used to output only the part of the line that matched - (either shown literally, or as an offset), scanning resumes immediately - following the match, so that further matches on the same line can be - found. If there are multiple patterns, they are all tried on the - remainder of the line, but patterns that follow the one that matched - are not tried on the earlier part of the line. - - This behaviour means that the order in which multiple patterns are - specified can affect the output when one of the above options is used. - This is no longer the same behaviour as GNU grep, which now manages to - display earlier matches for later patterns (as long as there is no - overlap). - - Patterns that can match an empty string are accepted, but empty string - matches are never recognized. An example is the pattern - "(super)?(man)?", in which all components are optional. This pattern - finds all occurrences of both "super" and "man"; the output differs - from matching with "super|man" when only the matching substrings are - being shown. - - If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses - the value to set a locale when calling the PCRE library. The --locale - option can be used to override this. - - -SUPPORT FOR COMPRESSED FILES - - It is possible to compile pcregrep so that it uses libz or libbz2 to - read files whose names end in .gz or .bz2, respectively. You can find - out whether your binary has support for one or both of these file types - by running it with the --help option. If the appropriate support is not - present, files are treated as plain text. The standard input is always - so treated. 
- - -BINARY FILES - - By default, a file that contains a binary zero byte within the first - 1024 bytes is identified as a binary file, and is processed specially. - (GNU grep also identifies binary files in this manner.) See the - --binary-files option for a means of changing the way binary files are - handled. - - -OPTIONS - - The order in which some of the options appear can affect the output. - For example, both the -h and -l options affect the printing of file - names. Whichever comes later in the command line will be the one that - takes effect. Similarly, except where noted below, if an option is - given twice, the later setting is used. Numerical values for options - may be followed by K or M, to signify multiplication by 1024 or - 1024*1024 respectively. - - -- This terminates the list of options. It is useful if the next - item on the command line starts with a hyphen but is not an - option. This allows for the processing of patterns and file- - names that start with hyphens. - - -A number, --after-context=number - Output number lines of context after each matching line. If - filenames and/or line numbers are being output, a hyphen sep- - arator is used instead of a colon for the context lines. A - line containing "--" is output between each group of lines, - unless they are in fact contiguous in the input file. The - value of number is expected to be relatively small. However, - pcregrep guarantees to have up to 8K of following text avail- - able for context output. - - -a, --text - Treat binary files as text. This is equivalent to --binary- - files=text. - - -B number, --before-context=number - Output number lines of context before each matching line. If - filenames and/or line numbers are being output, a hyphen sep- - arator is used instead of a colon for the context lines. A - line containing "--" is output between each group of lines, - unless they are in fact contiguous in the input file. The - value of number is expected to be relatively small. 
However, - pcregrep guarantees to have up to 8K of preceding text avail- - able for context output. - - --binary-files=word - Specify how binary files are to be processed. If the word is - "binary" (the default), pattern matching is performed on - binary files, but the only output is "Binary file <name> - matches" when a match succeeds. If the word is "text", which - is equivalent to the -a or --text option, binary files are - processed in the same way as any other file. In this case, - when a match succeeds, the output may be binary garbage, - which can have nasty effects if sent to a terminal. If the - word is "without-match", which is equivalent to the -I - option, binary files are not processed at all; they are - assumed not to be of interest. - - --buffer-size=number - Set the parameter that controls how much memory is used for - buffering files that are being scanned. - - -C number, --context=number - Output number lines of context both before and after each - matching line. This is equivalent to setting both -A and -B - to the same value. - - -c, --count - Do not output individual lines from the files that are being - scanned; instead output the number of lines that would other- - wise have been shown. If no lines are selected, the number - zero is output. If several files are are being scanned, a - count is output for each of them. However, if the --files- - with-matches option is also used, only those files whose - counts are greater than zero are listed. When -c is used, the - -A, -B, and -C options are ignored. - - --colour, --color - If this option is given without any data, it is equivalent to - "--colour=auto". If data is required, it must be given in - the same shell item, separated by an equals sign. - - --colour=value, --color=value - This option specifies under what circumstances the parts of a - line that matched a pattern should be coloured in the output. - By default, the output is not coloured.
The value (which is - optional, see above) may be "never", "always", or "auto". In - the latter case, colouring happens only if the standard out- - put is connected to a terminal. More resources are used when - colouring is enabled, because pcregrep has to search for all - possible matches in a line, not just one, in order to colour - them all. - - The colour that is used can be specified by setting the envi- - ronment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value - of this variable should be a string of two numbers, separated - by a semicolon. They are copied directly into the control - string for setting colour on a terminal, so it is your - responsibility to ensure that they make sense. If neither of - the environment variables is set, the default is "1;31", - which gives red. - - -D action, --devices=action - If an input path is not a regular file or a directory, - "action" specifies how it is to be processed. Valid values - are "read" (the default) or "skip" (silently skip the path). - - -d action, --directories=action - If an input path is a directory, "action" specifies how it is - to be processed. Valid values are "read" (the default in - non-Windows environments, for compatibility with GNU grep), - "recurse" (equivalent to the -r option), or "skip" (silently - skip the path, the default in Windows environments). In the - "read" case, directories are read as if they were ordinary - files. In some operating systems the effect of reading a - directory like this is an immediate end-of-file; in others it - may provoke an error. - - -e pattern, --regex=pattern, --regexp=pattern - Specify a pattern to be matched. This option can be used mul- - tiple times in order to specify several patterns. It can also - be used as a way of specifying a single pattern that starts - with a hyphen. When -e is used, no argument pattern is taken - from the command line; all arguments are treated as file - names. There is no limit to the number of patterns. 
They are - applied to each line in the order in which they are defined - until one matches. - - If -f is used with -e, the command line patterns are matched - first, followed by the patterns from the file(s), independent - of the order in which these options are specified. Note that - multiple use of -e is not the same as a single pattern with - alternatives. For example, X|Y finds the first character in a - line that is X or Y, whereas if the two patterns are given - separately, with X first, pcregrep finds X if it is present, - even if it follows Y in the line. It finds Y only if there is - no X in the line. This matters only if you are using -o or - --colo(u)r to show the part(s) of the line that matched. - - --exclude=pattern - Files (but not directories) whose names match the pattern are - skipped without being processed. This applies to all files, - whether listed on the command line, obtained from --file- - list, or by scanning a directory. The pattern is a PCRE regu- - lar expression, and is matched against the final component of - the file name, not the entire path. The -F, -w, and -x - options do not apply to this pattern. The option may be given - any number of times in order to specify multiple patterns. If - a file name matches both an --include and an --exclude pat- - tern, it is excluded. There is no short form for this option. - - --exclude-from=filename - Treat each non-empty line of the file as the data for an - --exclude option. What constitutes a newline when reading the - file is the operating system's default. The --newline option - has no effect on this option. This option may be given more - than once in order to specify a number of files to read. - - --exclude-dir=pattern - Directories whose names match the pattern are skipped without - being processed, whatever the setting of the --recursive - option. This applies to all directories, whether listed on - the command line, obtained from --file-list, or by scanning a - parent directory. 
The pattern is a PCRE regular expression, - and is matched against the final component of the directory - name, not the entire path. The -F, -w, and -x options do not - apply to this pattern. The option may be given any number of - times in order to specify more than one pattern. If a direc- - tory matches both --include-dir and --exclude-dir, it is - excluded. There is no short form for this option. - - -F, --fixed-strings - Interpret each data-matching pattern as a list of fixed - strings, separated by newlines, instead of as a regular - expression. What constitutes a newline for this purpose is - controlled by the --newline option. The -w (match as a word) - and -x (match whole line) options can be used with -F. They - apply to each of the fixed strings. A line is selected if any - of the fixed strings are found in it (subject to -w or -x, if - present). This option applies only to the patterns that are - matched against the contents of files; it does not apply to - patterns specified by any of the --include or --exclude - options. - - -f filename, --file=filename - Read patterns from the file, one per line, and match them - against each line of input. What constitutes a newline when - reading the file is the operating system's default. The - --newline option has no effect on this option. Trailing white - space is removed from each line, and blank lines are ignored. - An empty file contains no patterns and therefore matches - nothing. See also the comments about multiple patterns versus - a single pattern with alternatives in the description of -e - above. - - If this option is given more than once, all the specified - files are read. A data line is output if any of the patterns - match it. A filename can be given as "-" to refer to the - standard input. When -f is used, patterns specified on the - command line using -e may also be present; they are tested - before the file's patterns. 
However, no other pattern is - taken from the command line; all arguments are treated as the - names of paths to be searched. - - --file-list=filename - Read a list of files and/or directories that are to be - scanned from the given file, one per line. Trailing white - space is removed from each line, and blank lines are ignored. - These paths are processed before any that are listed on the - command line. The filename can be given as "-" to refer to - the standard input. If --file and --file-list are both spec- - ified as "-", patterns are read first. This is useful only - when the standard input is a terminal, from which further - lines (the list of files) can be read after an end-of-file - indication. If this option is given more than once, all the - specified files are read. - - --file-offsets - Instead of showing lines or parts of lines that match, show - each match as an offset from the start of the file and a - length, separated by a comma. In this mode, no context is - shown. That is, the -A, -B, and -C options are ignored. If - there is more than one match in a line, each of them is shown - separately. This option is mutually exclusive with --line- - offsets and --only-matching. - - -H, --with-filename - Force the inclusion of the filename at the start of output - lines when searching a single file. By default, the filename - is not shown in this case. For matching lines, the filename - is followed by a colon; for context lines, a hyphen separator - is used. If a line number is also being output, it follows - the file name. - - -h, --no-filename - Suppress the output filenames when searching multiple files. - By default, filenames are shown when multiple files are - searched. For matching lines, the filename is followed by a - colon; for context lines, a hyphen separator is used. If a - line number is also being output, it follows the file name. 
- - --help Output a help message, giving brief details of the command - options and file type support, and then exit. Anything else - on the command line is ignored. - - -I Treat binary files as never matching. This is equivalent to - --binary-files=without-match. - - -i, --ignore-case - Ignore upper/lower case distinctions during comparisons. - - --include=pattern - If any --include patterns are specified, the only files that - are processed are those that match one of the patterns (and - do not match an --exclude pattern). This option does not - affect directories, but it applies to all files, whether - listed on the command line, obtained from --file-list, or by - scanning a directory. The pattern is a PCRE regular expres- - sion, and is matched against the final component of the file - name, not the entire path. The -F, -w, and -x options do not - apply to this pattern. The option may be given any number of - times. If a file name matches both an --include and an - --exclude pattern, it is excluded. There is no short form - for this option. - - --include-from=filename - Treat each non-empty line of the file as the data for an - --include option. What constitutes a newline for this purpose - is the operating system's default. The --newline option has - no effect on this option. This option may be given any number - of times; all the files are read. - - --include-dir=pattern - If any --include-dir patterns are specified, the only direc- - tories that are processed are those that match one of the - patterns (and do not match an --exclude-dir pattern). This - applies to all directories, whether listed on the command - line, obtained from --file-list, or by scanning a parent - directory. The pattern is a PCRE regular expression, and is - matched against the final component of the directory name, - not the entire path. The -F, -w, and -x options do not apply - to this pattern. The option may be given any number of times. 
- If a directory matches both --include-dir and --exclude-dir, - it is excluded. There is no short form for this option. - - -L, --files-without-match - Instead of outputting lines from the files, just output the - names of the files that do not contain any lines that would - have been output. Each file name is output once, on a sepa- - rate line. - - -l, --files-with-matches - Instead of outputting lines from the files, just output the - names of the files containing lines that would have been out- - put. Each file name is output once, on a separate line. - Searching normally stops as soon as a matching line is found - in a file. However, if the -c (count) option is also used, - matching continues in order to obtain the correct count, and - those files that have at least one match are listed along - with their counts. Using this option with -c is a way of sup- - pressing the listing of files with no matches. - - --label=name - This option supplies a name to be used for the standard input - when file names are being output. If not supplied, "(standard - input)" is used. There is no short form for this option. - - --line-buffered - When this option is given, input is read and processed line - by line, and the output is flushed after each write. By - default, input is read in large chunks, unless pcregrep can - determine that it is reading from a terminal (which is cur- - rently possible only in Unix-like environments). Output to - terminal is normally automatically flushed by the operating - system. This option can be useful when the input or output is - attached to a pipe and you do not want pcregrep to buffer up - large amounts of data. However, its use will affect perfor- - mance, and the -M (multiline) option ceases to work. - - --line-offsets - Instead of showing lines or parts of lines that match, show - each match as a line number, the offset from the start of the - line, and a length. 
The line number is terminated by a colon - (as usual; see the -n option), and the offset and length are - separated by a comma. In this mode, no context is shown. - That is, the -A, -B, and -C options are ignored. If there is - more than one match in a line, each of them is shown sepa- - rately. This option is mutually exclusive with --file-offsets - and --only-matching. - - --locale=locale-name - This option specifies a locale to be used for pattern match- - ing. It overrides the value in the LC_ALL or LC_CTYPE envi- - ronment variables. If no locale is specified, the PCRE - library's default (usually the "C" locale) is used. There is - no short form for this option. - - --match-limit=number - Processing some regular expression patterns can require a - very large amount of memory, leading in some cases to a pro- - gram crash if not enough is available. Other patterns may - take a very long time to search for all possible matching - strings. The pcre_exec() function that is called by pcregrep - to do the matching has two parameters that can limit the - resources that it uses. - - The --match-limit option provides a means of limiting - resource usage when processing patterns that are not going to - match, but which have a very large number of possibilities in - their search trees. The classic example is a pattern that - uses nested unlimited repeats. Internally, PCRE uses a func- - tion called match() which it calls repeatedly (sometimes - recursively). The limit set by --match-limit is imposed on - the number of times this function is called during a match, - which has the effect of limiting the amount of backtracking - that can take place. - - The --recursion-limit option is similar to --match-limit, but - instead of limiting the total number of times that match() is - called, it limits the depth of recursive calls, which in turn - limits the amount of memory that can be used. 
The recursion - depth is a smaller number than the total number of calls, - because not all calls to match() are recursive. This limit is - of use only if it is set smaller than --match-limit. - - There are no short forms for these options. The default set- - tings are specified when the PCRE library is compiled, with - the default default being 10 million. - - -M, --multiline - Allow patterns to match more than one line. When this option - is given, patterns may usefully contain literal newline char- - acters and internal occurrences of ^ and $ characters. The - output for a successful match may consist of more than one - line, the last of which is the one in which the match ended. - If the matched string ends with a newline sequence the output - ends at the end of that line. - - When this option is set, the PCRE library is called in "mul- - tiline" mode. There is a limit to the number of lines that - can be matched, imposed by the way that pcregrep buffers the - input file as it scans it. However, pcregrep ensures that at - least 8K characters or the rest of the document (whichever is - the shorter) are available for forward matching, and simi- - larly the previous 8K characters (or all the previous charac- - ters, if fewer than 8K) are guaranteed to be available for - lookbehind assertions. This option does not work when input - is read line by line (see --line-buffered.) - - -N newline-type, --newline=newline-type - The PCRE library supports five different conventions for - indicating the ends of lines. They are the single-character - sequences CR (carriage return) and LF (linefeed), the two- - character sequence CRLF, an "anycrlf" convention, which rec- - ognizes any of the preceding three types, and an "any" con- - vention, in which any Unicode line ending sequence is assumed - to end a line. 
The Unicode sequences are the three just men- - tioned, plus VT (vertical tab, U+000B), FF (form feed, - U+000C), NEL (next line, U+0085), LS (line separator, - U+2028), and PS (paragraph separator, U+2029). - - When the PCRE library is built, a default line-ending - sequence is specified. This is normally the standard - sequence for the operating system. Unless otherwise specified - by this option, pcregrep uses the library's default. The - possible values for this option are CR, LF, CRLF, ANYCRLF, or - ANY. This makes it possible to use pcregrep to scan files - that have come from other environments without having to mod- - ify their line endings. If the data that is being scanned - does not agree with the convention set by this option, pcre- - grep may behave in strange ways. Note that this option does - not apply to files specified by the -f, --exclude-from, or - --include-from options, which are expected to use the operat- - ing system's standard newline sequence. - - -n, --line-number - Precede each output line by its line number in the file, fol- - lowed by a colon for matching lines or a hyphen for context - lines. If the filename is also being output, it precedes the - line number. This option is forced if --line-offsets is used. - - --no-jit If the PCRE library is built with support for just-in-time - compiling (which speeds up matching), pcregrep automatically - makes use of this, unless it was explicitly disabled at build - time. This option can be used to disable the use of JIT at - run time. It is provided for testing and working round prob- - lems. It should never be needed in normal use. - - -o, --only-matching - Show only the part of the line that matched a pattern instead - of the whole line. In this mode, no context is shown. That - is, the -A, -B, and -C options are ignored. If there is more - than one match in a line, each of them is shown separately. 
- If -o is combined with -v (invert the sense of the match to - find non-matching lines), no output is generated, but the - return code is set appropriately. If the matched portion of - the line is empty, nothing is output unless the file name or - line number are being printed, in which case they are shown - on an otherwise empty line. This option is mutually exclusive - with --file-offsets and --line-offsets. - - -onumber, --only-matching=number - Show only the part of the line that matched the capturing - parentheses of the given number. Up to 32 capturing parenthe- - ses are supported, and -o0 is equivalent to -o without a num- - ber. Because these options can be given without an argument - (see above), if an argument is present, it must be given in - the same shell item, for example, -o3 or --only-matching=2. - The comments given for the non-argument case above also apply - to this case. If the specified capturing parentheses do not - exist in the pattern, or were not set in the match, nothing - is output unless the file name or line number are being - printed. - - If this option is given multiple times, multiple substrings - are output, in the order the options are given. For example, - -o3 -o1 -o3 causes the substrings matched by capturing paren- - theses 3 and 1 and then 3 again to be output. By default, - there is no separator (but see the next option). - - --om-separator=text - Specify a separating string for multiple occurrences of -o. - The default is an empty string. Separating strings are never - coloured. - - -q, --quiet - Work quietly, that is, display nothing except error messages. - The exit status indicates whether or not any matches were - found. - - -r, --recursive - If any given path is a directory, recursively scan the files - it contains, taking note of any --include and --exclude set- - tings. By default, a directory is read as a normal file; in - some operating systems this gives an immediate end-of-file. 
- This option is a shorthand for setting the -d option to - "recurse". - - --recursion-limit=number - See --match-limit above. - - -s, --no-messages - Suppress error messages about non-existent or unreadable - files. Such files are quietly skipped. However, the return - code is still 2, even if matches were found in other files. - - -u, --utf-8 - Operate in UTF-8 mode. This option is available only if PCRE - has been compiled with UTF-8 support. All patterns (including - those for any --exclude and --include options) and all sub- - ject lines that are scanned must be valid strings of UTF-8 - characters. - - -V, --version - Write the version numbers of pcregrep and the PCRE library to - the standard output and then exit. Anything else on the com- - mand line is ignored. - - -v, --invert-match - Invert the sense of the match, so that lines which do not - match any of the patterns are the ones that are found. - - -w, --word-regex, --word-regexp - Force the patterns to match only whole words. This is equiva- - lent to having \b at the start and end of the pattern. This - option applies only to the patterns that are matched against - the contents of files; it does not apply to patterns speci- - fied by any of the --include or --exclude options. - - -x, --line-regex, --line-regexp - Force the patterns to be anchored (each must start matching - at the beginning of a line) and in addition, require them to - match entire lines. This is equivalent to having ^ and $ - characters at the start and end of each alternative branch in - every pattern. This option applies only to the patterns that - are matched against the contents of files; it does not apply - to patterns specified by any of the --include or --exclude - options. - - -ENVIRONMENT VARIABLES - - The environment variables LC_ALL and LC_CTYPE are examined, in that - order, for a locale. The first one that is set is used. This can be - overridden by the --locale option. 
If no locale is set, the PCRE - library's default (usually the "C" locale) is used. - - -NEWLINES - - The -N (--newline) option allows pcregrep to scan files with different - newline conventions from the default. Any parts of the input files that - are written to the standard output are copied identically, with what- - ever newline sequences they have in the input. However, the setting of - this option does not affect the interpretation of files specified by - the -f, --exclude-from, or --include-from options, which are assumed to - use the operating system's standard newline sequence, nor does it - affect the way in which pcregrep writes informational messages to the - standard error and output streams. For these it uses the string "\n" to - indicate newlines, relying on the C I/O library to convert this to an - appropriate sequence. - - -OPTIONS COMPATIBILITY - - Many of the short and long forms of pcregrep's options are the same as - in the GNU grep program. Any long option of the form --xxx-regexp (GNU - terminology) is also available as --xxx-regex (PCRE terminology). How- - ever, the --file-list, --file-offsets, --include-dir, --line-offsets, - --locale, --match-limit, -M, --multiline, -N, --newline, --om-separa- - tor, --recursion-limit, -u, and --utf-8 options are specific to pcre- - grep, as is the use of the --only-matching option with a capturing - parentheses number. - - Although most of the common options work the same way, a few are dif- - ferent in pcregrep. For example, the --include option's argument is a - glob for GNU grep, but a regular expression for pcregrep. If both the - -c and -l options are given, GNU grep lists only file names, without - counts, but pcregrep gives the counts. - - -OPTIONS WITH DATA - - There are four different ways in which an option with data can be spec- - ified. If a short form option is used, the data may follow immedi- - ately, or (with one exception) in the next command line item. 
For exam- - ple: - - -f/some/file - -f /some/file - - The exception is the -o option, which may appear with or without data. - Because of this, if data is present, it must follow immediately in the - same item, for example -o3. - - If a long form option is used, the data may appear in the same command - line item, separated by an equals character, or (with two exceptions) - it may appear in the next command line item. For example: - - --file=/some/file - --file /some/file - - Note, however, that if you want to supply a file name beginning with ~ - as data in a shell command, and have the shell expand ~ to a home - directory, you must separate the file name from the option, because the - shell does not treat ~ specially unless it is at the start of an item. - - The exceptions to the above are the --colour (or --color) and --only- - matching options, for which the data is optional. If one of these - options does have data, it must be given in the first form, using an - equals character. Otherwise pcregrep will assume that it has no data. - - -MATCHING ERRORS - - It is possible to supply a regular expression that takes a very long - time to fail to match certain lines. Such patterns normally involve - nested indefinite repeats, for example: (a+)*\d when matched against a - line of a's with no final digit. The PCRE matching function has a - resource limit that causes it to abort in these circumstances. If this - happens, pcregrep outputs an error message and the line that caused the - problem to the standard error stream. If there are more than 20 such - errors, pcregrep gives up. - - The --match-limit option of pcregrep can be used to set the overall - resource limit; there is a second option called --recursion-limit that - sets a limit on the amount of memory (usually stack) that is used (see - the discussion of these options above). 
- - -DIAGNOSTICS - - Exit status is 0 if any matches were found, 1 if no matches were found, - and 2 for syntax errors, overlong lines, non-existent or inaccessible - files (even if matches were found in other files) or too many matching - errors. Using the -s option to suppress error messages about inaccessi- - ble files does not affect the return code. - - -SEE ALSO - - pcrepattern(3), pcresyntax(3), pcretest(1). - - -AUTHOR - - Philip Hazel - University Computing Service - Cambridge CB2 3QH, England. - - -REVISION - - Last updated: 03 April 2014 - Copyright (c) 1997-2014 University of Cambridge. diff --git a/src/pcre/doc/pcrejit.3 b/src/pcre/doc/pcrejit.3 deleted file mode 100644 index 3b785f0f..00000000 --- a/src/pcre/doc/pcrejit.3 +++ /dev/null @@ -1,473 +0,0 @@ -.TH PCREJIT 3 "05 July 2017" "PCRE 8.41" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE JUST-IN-TIME COMPILER SUPPORT" -.rs -.sp -Just-in-time compiling is a heavyweight optimization that can greatly speed up -pattern matching. However, it comes at the cost of extra processing before the -match is performed. Therefore, it is of most benefit when the same pattern is -going to be matched many times. This does not necessarily mean many calls of a -matching function; if the pattern is not anchored, matching attempts may take -place many times at various positions in the subject, even for a single call. -Therefore, if the subject string is very long, it may still pay to use JIT for -one-off matches. -.P -JIT support applies only to the traditional Perl-compatible matching function. -It does not apply when the DFA matching function is being used. The code for -this support was written by Zoltan Herczeg. -. -. -.SH "8-BIT, 16-BIT AND 32-BIT SUPPORT" -.rs -.sp -JIT support is available for all of the 8-bit, 16-bit and 32-bit PCRE -libraries. To keep this documentation simple, only the 8-bit interface is -described in what follows. 
If you are using the 16-bit library, substitute the
-16-bit functions and 16-bit structures (for example, \fIpcre16_jit_stack\fP
-instead of \fIpcre_jit_stack\fP). If you are using the 32-bit library,
-substitute the 32-bit functions and 32-bit structures (for example,
-\fIpcre32_jit_stack\fP instead of \fIpcre_jit_stack\fP).
-.
-.
-.SH "AVAILABILITY OF JIT SUPPORT"
-.rs
-.sp
-JIT support is an optional feature of PCRE. The "configure" option --enable-jit
-(or equivalent CMake option) must be set when PCRE is built if you want to use
-JIT. The support is limited to the following hardware platforms:
-.sp
- ARM v5, v7, and Thumb2
- Intel x86 32-bit and 64-bit
- MIPS 32-bit
- Power PC 32-bit and 64-bit
- SPARC 32-bit (experimental)
-.sp
-If --enable-jit is set on an unsupported platform, compilation fails.
-.P
-A program that is linked with PCRE 8.20 or later can tell if JIT support is
-available by calling \fBpcre_config()\fP with the PCRE_CONFIG_JIT option. The
-result is 1 when JIT is available, and 0 otherwise. However, a simple program
-does not need to check this in order to use JIT. The normal API is implemented
-in a way that falls back to the interpretive code if JIT is not available. For
-programs that need the best possible performance, there is also a "fast path"
-API that is JIT-specific.
-.P
-If your program may sometimes be linked with versions of PCRE that are older
-than 8.20, but you want to use JIT when it is available, you can test the
-values of PCRE_MAJOR and PCRE_MINOR, or the existence of a JIT macro such as
-PCRE_CONFIG_JIT, for compile-time control of your code. Also beware that the
-\fBpcre_jit_exec()\fP function was not available at all before 8.32,
-and may not be available at all if PCRE isn't compiled with
---enable-jit. See the "JIT FAST PATH API" section below for details.
-.
-.
-.SH "SIMPLE USE OF JIT"
-.rs
-.sp
-You have to do two things to make use of the JIT support in the simplest way:
-.sp
- (1) Call \fBpcre_study()\fP with the PCRE_STUDY_JIT_COMPILE option for
- each compiled pattern, and pass the resulting \fBpcre_extra\fP block to
- \fBpcre_exec()\fP.
-.sp
- (2) Use \fBpcre_free_study()\fP to free the \fBpcre_extra\fP block when it is
- no longer needed, instead of just freeing it yourself. This ensures that
- any JIT data is also freed.
-.sp
-For a program that may be linked with pre-8.20 versions of PCRE, you can insert
-.sp
- #ifndef PCRE_STUDY_JIT_COMPILE
- #define PCRE_STUDY_JIT_COMPILE 0
- #endif
-.sp
-so that no option is passed to \fBpcre_study()\fP, and then use something like
-this to free the study data:
-.sp
- #ifdef PCRE_CONFIG_JIT
- pcre_free_study(study_ptr);
- #else
- pcre_free(study_ptr);
- #endif
-.sp
-PCRE_STUDY_JIT_COMPILE requests the JIT compiler to generate code for complete
-matches. If you want to run partial matches using the PCRE_PARTIAL_HARD or
-PCRE_PARTIAL_SOFT options of \fBpcre_exec()\fP, you should set one or both of
-the following options in addition to, or instead of, PCRE_STUDY_JIT_COMPILE
-when you call \fBpcre_study()\fP:
-.sp
- PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE
- PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE
-.sp
-If using \fBpcre_jit_exec()\fP and supporting a pre-8.32 version of
-PCRE, you can insert:
-.sp
- #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32
- pcre_jit_exec(...);
- #else
- pcre_exec(...)
- #endif
-.sp
-but as described in the "JIT FAST PATH API" section below this assumes
-version 8.32 and later are compiled with --enable-jit, which may
-break.
-.sp
-The JIT compiler generates different optimized code for each of the three
-modes (normal, soft partial, hard partial). When \fBpcre_exec()\fP is called,
-the appropriate code is run if it is available. Otherwise, the pattern is
-matched using interpretive code.
-.P
-In some circumstances you may need to call additional functions. These are
-described in the section entitled
-.\" HTML
-.\"
-"Controlling the JIT stack"
-.\"
-below.
-.P
-If JIT support is not available, PCRE_STUDY_JIT_COMPILE etc. are ignored, and
-no JIT data is created. Otherwise, the compiled pattern is passed to the JIT
-compiler, which turns it into machine code that executes much faster than the
-normal interpretive code. When \fBpcre_exec()\fP is passed a \fBpcre_extra\fP
-block containing a pointer to JIT code of the appropriate mode (normal or
-hard/soft partial), it obeys that code instead of running the interpreter. The
-result is identical, but the compiled JIT code runs much faster.
-.P
-There are some \fBpcre_exec()\fP options that are not supported for JIT
-execution. There are also some pattern items that JIT cannot handle. Details
-are given below. In both cases, execution automatically falls back to the
-interpretive code. If you want to know whether JIT was actually used for a
-particular match, you should arrange for a JIT callback function to be set up
-as described in the section entitled
-.\" HTML
-.\"
-"Controlling the JIT stack"
-.\"
-below, even if you do not need to supply a non-default JIT stack. Such a
-callback function is called whenever JIT code is about to be obeyed. If the
-execution options are not right for JIT execution, the callback function is not
-obeyed.
-.P
-If the JIT compiler finds an unsupported item, no JIT data is generated. You
-can find out if JIT execution is available after studying a pattern by calling
-\fBpcre_fullinfo()\fP with the PCRE_INFO_JIT option. A result of 1 means that
-JIT compilation was successful. A result of 0 means that JIT support is not
-available, or the pattern was not studied with PCRE_STUDY_JIT_COMPILE etc., or
-the JIT compiler was not able to handle the pattern.
-.P
-Once a pattern has been studied, with or without JIT, it can be used as many
-times as you like for matching different subject strings.
-.
-.
-.SH "UNSUPPORTED OPTIONS AND PATTERN ITEMS"
-.rs
-.sp
-The only \fBpcre_exec()\fP options that are supported for JIT execution are
-PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK, PCRE_NO_UTF32_CHECK, PCRE_NOTBOL,
-PCRE_NOTEOL, PCRE_NOTEMPTY, PCRE_NOTEMPTY_ATSTART, PCRE_PARTIAL_HARD, and
-PCRE_PARTIAL_SOFT.
-.P
-The only unsupported pattern items are \eC (match a single data unit) when
-running in a UTF mode, and a callout immediately before an assertion condition
-in a conditional group.
-.
-.
-.SH "RETURN VALUES FROM JIT EXECUTION"
-.rs
-.sp
-When a pattern is matched using JIT execution, the return values are the same
-as those given by the interpretive \fBpcre_exec()\fP code, with the addition of
-one new error code: PCRE_ERROR_JIT_STACKLIMIT. This means that the memory used
-for the JIT stack was insufficient. See
-.\" HTML
-.\"
-"Controlling the JIT stack"
-.\"
-below for a discussion of JIT stack usage. For compatibility with the
-interpretive \fBpcre_exec()\fP code, no more than two-thirds of the
-\fIovector\fP argument is used for passing back captured substrings.
-.P
-The error code PCRE_ERROR_MATCHLIMIT is returned by the JIT code if searching a
-very large pattern tree goes on for too long, as it is in the same circumstance
-when JIT is not used, but the details of exactly what is counted are not the
-same. The PCRE_ERROR_RECURSIONLIMIT error code is never returned by JIT
-execution.
-.
-.
-.SH "SAVING AND RESTORING COMPILED PATTERNS"
-.rs
-.sp
-The code that is generated by the JIT compiler is architecture-specific, and is
-also position dependent. For those reasons it cannot be saved (in a file or
-database) and restored later like the bytecode and other data of a compiled
-pattern. Saving and restoring compiled patterns is not something many people
-do. More detail about this facility is given in the
-.\" HREF
-\fBpcreprecompile\fP
-.\"
-documentation. It should be possible to run \fBpcre_study()\fP on a saved and
-restored pattern, and thereby recreate the JIT data, but because JIT
-compilation uses significant resources, it is probably not worth doing this;
-you might as well recompile the original pattern.
-.
-.
-.\" HTML
-.SH "CONTROLLING THE JIT STACK"
-.rs
-.sp
-When the compiled JIT code runs, it needs a block of memory to use as a stack.
-By default, it uses 32K on the machine stack. However, some large or
-complicated patterns need more than this. The error PCRE_ERROR_JIT_STACKLIMIT
-is given when there is not enough stack. Three functions are provided for
-managing blocks of memory for use as JIT stacks. There is further discussion
-about the use of JIT stacks in the section entitled
-.\" HTML
-.\"
-"JIT stack FAQ"
-.\"
-below.
-.P
-The \fBpcre_jit_stack_alloc()\fP function creates a JIT stack. Its arguments
-are a starting size and a maximum size, and it returns a pointer to an opaque
-structure of type \fBpcre_jit_stack\fP, or NULL if there is an error. The
-\fBpcre_jit_stack_free()\fP function can be used to free a stack that is no
-longer needed. (For the technically minded: the address space is allocated by
-mmap or VirtualAlloc.)
-.P
-JIT uses far less memory for recursion than the interpretive code,
-and a maximum stack size of 512K to 1M should be more than enough for any
-pattern.
-.P
-The \fBpcre_assign_jit_stack()\fP function specifies which stack JIT code
-should use. Its arguments are as follows:
-.sp
- pcre_extra *extra
- pcre_jit_callback callback
- void *data
-.sp
-The \fIextra\fP argument must be the result of studying a pattern with
-PCRE_STUDY_JIT_COMPILE etc. There are three cases for the values of the other
-two options:
-.sp
- (1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32K block
- on the machine stack is used.
-.sp
- (2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be
- a valid JIT stack, the result of calling \fBpcre_jit_stack_alloc()\fP.
-.sp
- (3) If \fIcallback\fP is not NULL, it must point to a function that is
- called with \fIdata\fP as an argument at the start of matching, in
- order to set up a JIT stack. If the return from the callback
- function is NULL, the internal 32K stack is used; otherwise the
- return value must be a valid JIT stack, the result of calling
- \fBpcre_jit_stack_alloc()\fP.
-.sp
-A callback function is obeyed whenever JIT code is about to be run; it is not
-obeyed when \fBpcre_exec()\fP is called with options that are incompatible for
-JIT execution. A callback function can therefore be used to determine whether a
-match operation was executed by JIT or by the interpreter.
-.P
-You may safely use the same JIT stack for more than one pattern (either by
-assigning directly or by callback), as long as the patterns are all matched
-sequentially in the same thread. In a multithread application, if you do not
-specify a JIT stack, or if you assign or pass back NULL from a callback, that
-is thread-safe, because each thread has its own machine stack. However, if you
-assign or pass back a non-NULL JIT stack, this must be a different stack for
-each thread so that the application is thread-safe.
-.P
-Strictly speaking, even more is allowed. You can assign the same non-NULL stack
-to any number of patterns as long as they are not used for matching by multiple
-threads at the same time. For example, you can assign the same stack to all
-compiled patterns, and use a global mutex in the callback to wait until the
-stack is available for use. However, this is an inefficient solution, and not
-recommended.
-.P
-This is a suggestion for how a multithreaded program that needs to set up
-non-default JIT stacks might operate:
-.sp
- During thread initalization
- thread_local_var = pcre_jit_stack_alloc(...)
-.sp
- During thread exit
- pcre_jit_stack_free(thread_local_var)
-.sp
- Use a one-line callback function
- return thread_local_var
-.sp
-All the functions described in this section do nothing if JIT is not available,
-and \fBpcre_assign_jit_stack()\fP does nothing unless the \fBextra\fP argument
-is non-NULL and points to a \fBpcre_extra\fP block that is the result of a
-successful study with PCRE_STUDY_JIT_COMPILE etc.
-.
-.
-.\" HTML
-.SH "JIT STACK FAQ"
-.rs
-.sp
-(1) Why do we need JIT stacks?
-.sp
-PCRE (and JIT) is a recursive, depth-first engine, so it needs a stack where
-the local data of the current node is pushed before checking its child nodes.
-Allocating real machine stack on some platforms is difficult. For example, the
-stack chain needs to be updated every time if we extend the stack on PowerPC.
-Although it is possible, its updating time overhead decreases performance. So
-we do the recursion in memory.
-.P
-(2) Why don't we simply allocate blocks of memory with \fBmalloc()\fP?
-.sp
-Modern operating systems have a nice feature: they can reserve an address space
-instead of allocating memory. We can safely allocate memory pages inside this
-address space, so the stack could grow without moving memory data (this is
-important because of pointers). Thus we can allocate 1M address space, and use
-only a single memory page (usually 4K) if that is enough. However, we can still
-grow up to 1M anytime if needed.
-.P
-(3) Who "owns" a JIT stack?
-.sp
-The owner of the stack is the user program, not the JIT studied pattern or
-anything else. The user program must ensure that if a stack is used by
-\fBpcre_exec()\fP, (that is, it is assigned to the pattern currently running),
-that stack must not be used by any other threads (to avoid overwriting the same
-memory area). The best practice for multithreaded programs is to allocate a
-stack for each thread, and return this stack through the JIT callback function.
-.P
-(4) When should a JIT stack be freed?
-.sp
-You can free a JIT stack at any time, as long as it will not be used by
-\fBpcre_exec()\fP again. When you assign the stack to a pattern, only a pointer
-is set. There is no reference counting or any other magic. You can free the
-patterns and stacks in any order, anytime. Just \fIdo not\fP call
-\fBpcre_exec()\fP with a pattern pointing to an already freed stack, as that
-will cause SEGFAULT. (Also, do not free a stack currently used by
-\fBpcre_exec()\fP in another thread). You can also replace the stack for a
-pattern at any time. You can even free the previous stack before assigning a
-replacement.
-.P
-(5) Should I allocate/free a stack every time before/after calling
-\fBpcre_exec()\fP?
-.sp
-No, because this is too costly in terms of resources. However, you could
-implement some clever idea which release the stack if it is not used in let's
-say two minutes. The JIT callback can help to achieve this without keeping a
-list of the currently JIT studied patterns.
-.P
-(6) OK, the stack is for long term memory allocation. But what happens if a
-pattern causes stack overflow with a stack of 1M? Is that 1M kept until the
-stack is freed?
-.sp
-Especially on embedded sytems, it might be a good idea to release memory
-sometimes without freeing the stack. There is no API for this at the moment.
-Probably a function call which returns with the currently allocated memory for
-any stack and another which allows releasing memory (shrinking the stack) would
-be a good idea if someone needs this.
-.P
-(7) This is too much of a headache. Isn't there any better solution for JIT
-stack handling?
-.sp
-No, thanks to Windows. If POSIX threads were used everywhere, we could throw
-out this complicated API.
-.
-.
-.SH "EXAMPLE CODE"
-.rs
-.sp
-This is a single-threaded example that specifies a JIT stack without using a
-callback.
-.sp
- int rc;
- int ovector[30];
- pcre *re;
- pcre_extra *extra;
- pcre_jit_stack *jit_stack;
-.sp
- re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
- /* Check for errors */
- extra = pcre_study(re, PCRE_STUDY_JIT_COMPILE, &error);
- jit_stack = pcre_jit_stack_alloc(32*1024, 512*1024);
- /* Check for error (NULL) */
- pcre_assign_jit_stack(extra, NULL, jit_stack);
- rc = pcre_exec(re, extra, subject, length, 0, 0, ovector, 30);
- /* Check results */
- pcre_free(re);
- pcre_free_study(extra);
- pcre_jit_stack_free(jit_stack);
-.sp
-.
-.
-.SH "JIT FAST PATH API"
-.rs
-.sp
-Because the API described above falls back to interpreted execution when JIT is
-not available, it is convenient for programs that are written for general use
-in many environments. However, calling JIT via \fBpcre_exec()\fP does have a
-performance impact. Programs that are written for use where JIT is known to be
-available, and which need the best possible performance, can instead use a
-"fast path" API to call JIT execution directly instead of calling
-\fBpcre_exec()\fP (obviously only for patterns that have been successfully
-studied by JIT).
-.P
-The fast path function is called \fBpcre_jit_exec()\fP, and it takes exactly
-the same arguments as \fBpcre_exec()\fP, plus one additional argument that
-must point to a JIT stack. The JIT stack arrangements described above do not
-apply. The return values are the same as for \fBpcre_exec()\fP.
-.P
-When you call \fBpcre_exec()\fP, as well as testing for invalid options, a
-number of other sanity checks are performed on the arguments. For example, if
-the subject pointer is NULL, or its length is negative, an immediate error is
-given. Also, unless PCRE_NO_UTF[8|16|32] is set, a UTF subject string is tested
-for validity. In the interests of speed, these checks do not happen on the JIT
-fast path, and if invalid data is passed, the result is undefined.
-.P
-Bypassing the sanity checks and the \fBpcre_exec()\fP wrapping can give
-speedups of more than 10%.
-.P
-Note that the \fBpcre_jit_exec()\fP function is not available in versions of
-PCRE before 8.32 (released in November 2012). If you need to support versions
-that old you must either use the slower \fBpcre_exec()\fP, or switch between
-the two codepaths by checking the values of PCRE_MAJOR and PCRE_MINOR.
-.P
-Due to an unfortunate implementation oversight, even in versions 8.32
-and later there will be no \fBpcre_jit_exec()\fP stub function defined
-when PCRE is compiled with --disable-jit, which is the default, and
-there's no way to detect whether PCRE was compiled with --enable-jit
-via a macro.
-.P
-If you need to support versions older than 8.32, or versions that may
-not build with --enable-jit, you must either use the slower
-\fBpcre_exec()\fP, or switch between the two codepaths by checking the
-values of PCRE_MAJOR and PCRE_MINOR.
-.P
-Switching between the two by checking the version assumes that all the
-versions being targeted are built with --enable-jit. To also support
-builds that may use --disable-jit either \fBpcre_exec()\fP must be
-used, or a compile-time check for JIT via \fBpcre_config()\fP (which
-assumes the runtime environment will be the same), or as the Git
-project decided to do, simply assume that \fBpcre_jit_exec()\fP is
-present in 8.32 or later unless a compile-time flag is provided, see
-the "grep: un-break building with PCRE >= 8.32 without --enable-jit"
-commit in git.git for an example of that.
-.
-.
-.SH "SEE ALSO"
-.rs
-.sp
-\fBpcreapi\fP(3)
-.
-.
-.SH AUTHOR
-.rs
-.sp
-.nf
-Philip Hazel (FAQ by Zoltan Herczeg)
-University Computing Service
-Cambridge CB2 3QH, England.
-.fi
-.
-.
-.SH REVISION
-.rs
-.sp
-.nf
-Last updated: 05 July 2017
-Copyright (c) 1997-2017 University of Cambridge.
-.fi
diff --git a/src/pcre/doc/pcrelimits.3 b/src/pcre/doc/pcrelimits.3
deleted file mode 100644
index 423d6a27..00000000
--- a/src/pcre/doc/pcrelimits.3
+++ /dev/null
@@ -1,71 +0,0 @@
-.TH PCRELIMITS 3 "05 November 2013" "PCRE 8.34"
-.SH NAME
-PCRE - Perl-compatible regular expressions
-.SH "SIZE AND OTHER LIMITATIONS"
-.rs
-.sp
-There are some size limitations in PCRE but it is hoped that they will never in
-practice be relevant.
-.P
-The maximum length of a compiled pattern is approximately 64K data units (bytes
-for the 8-bit library, 16-bit units for the 16-bit library, and 32-bit units for
-the 32-bit library) if PCRE is compiled with the default internal linkage size,
-which is 2 bytes for the 8-bit and 16-bit libraries, and 4 bytes for the 32-bit
-library. If you want to process regular expressions that are truly enormous,
-you can compile PCRE with an internal linkage size of 3 or 4 (when building the
-16-bit or 32-bit library, 3 is rounded up to 4). See the \fBREADME\fP file in
-the source distribution and the
-.\" HREF
-\fBpcrebuild\fP
-.\"
-documentation for details. In these cases the limit is substantially larger.
-However, the speed of execution is slower.
-.P
-All values in repeating quantifiers must be less than 65536.
-.P
-There is no limit to the number of parenthesized subpatterns, but there can be
-no more than 65535 capturing subpatterns. There is, however, a limit to the
-depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
-order to limit the amount of system stack used at compile time. The limit can
-be specified when PCRE is built; the default is 250.
-.P
-There is a limit to the number of forward references to subsequent subpatterns
-of around 200,000. Repeated forward references with fixed upper limits, for
-example, (?2){0,100} when subpattern number 2 is to the right, are included in
-the count. There is no limit to the number of backward references.
-.P -The maximum length of name for a named subpattern is 32 characters, and the -maximum number of named subpatterns is 10000. -.P -The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb -is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries. -.P -The maximum length of a subject string is the largest positive number that an -integer variable can hold. However, when using the traditional matching -function, PCRE uses recursion to handle subpatterns and indefinite repetition. -This means that the available stack space may limit the size of a subject -string that can be processed by certain patterns. For a discussion of stack -issues, see the -.\" HREF -\fBpcrestack\fP -.\" -documentation. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 05 November 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcrepartial.3 b/src/pcre/doc/pcrepartial.3 deleted file mode 100644 index 14d0124f..00000000 --- a/src/pcre/doc/pcrepartial.3 +++ /dev/null @@ -1,476 +0,0 @@ -.TH PCREPARTIAL 3 "02 July 2013" "PCRE 8.34" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PARTIAL MATCHING IN PCRE" -.rs -.sp -In normal use of PCRE, if the subject string that is passed to a matching -function matches as far as it goes, but is too short to match the entire -pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where it might -be helpful to distinguish this case from other cases in which there is no -match. -.P -Consider, for example, an application where a human is required to type in data -for a field with specific formatting requirements. 
An example might be a date -in the form \fIddmmmyy\fP, defined by this pattern: -.sp - ^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$ -.sp -If the application sees the user's keystrokes one by one, and can check that -what has been typed so far is potentially valid, it is able to raise an error -as soon as a mistake is made, by beeping and not reflecting the character that -has been typed, for example. This immediate feedback is likely to be a better -user interface than a check that is delayed until the entire string has been -entered. Partial matching can also be useful when the subject string is very -long and is not all available at once. -.P -PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and -PCRE_PARTIAL_HARD options, which can be set when calling any of the matching -functions. For backwards compatibility, PCRE_PARTIAL is a synonym for -PCRE_PARTIAL_SOFT. The essential difference between the two options is whether -or not a partial match is preferred to an alternative complete match, though -the details differ between the two types of matching function. If both options -are set, PCRE_PARTIAL_HARD takes precedence. -.P -If you want to use partial matching with just-in-time optimized code, you must -call \fBpcre_study()\fP, \fBpcre16_study()\fP or \fBpcre32_study()\fP with one -or both of these options: -.sp - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE -.sp -PCRE_STUDY_JIT_COMPILE should also be set if you are going to run non-partial -matches on the same pattern. If the appropriate JIT study mode has not been set -for a match, the interpretive matching code is used. -.P -Setting a partial matching option disables two of PCRE's standard -optimizations. PCRE remembers the last literal data unit in a pattern, and -abandons matching immediately if it is not present in the subject string. This -optimization cannot be used for a subject string that might match only -partially. 
If the pattern was studied, PCRE knows the minimum length of a -matching string, and does not bother to run the matching function on shorter -strings. This optimization is also disabled for partial matching. -. -. -.SH "PARTIAL MATCHING USING pcre_exec() OR pcre[16|32]_exec()" -.rs -.sp -A partial match occurs during a call to \fBpcre_exec()\fP or -\fBpcre[16|32]_exec()\fP when the end of the subject string is reached -successfully, but matching cannot continue because more characters are needed. -However, at least one character in the subject must have been inspected. This -character need not form part of the final matched string; lookbehind assertions -and the \eK escape sequence provide ways of inspecting characters before the -start of a matched substring. The requirement for inspecting at least one -character exists because an empty string can always be matched; without such a -restriction there would always be a partial match of an empty string at the end -of the subject. -.P -If there are at least two slots in the offsets vector when a partial match is -returned, the first slot is set to the offset of the earliest character that -was inspected. For convenience, the second offset points to the end of the -subject so that a substring can easily be identified. If there are at least -three slots in the offsets vector, the third slot is set to the offset of the -character where matching started. -.P -For the majority of patterns, the contents of the first and third slots will be -the same. However, for patterns that contain lookbehind assertions, or begin -with \eb or \eB, characters before the one where matching started may have been -inspected while carrying out the match. For example, consider this pattern: -.sp - /(?<=abc)123/ -.sp -This pattern matches "123", but only if it is preceded by "abc". If the subject -string is "xyzabc12", the first two offsets after a partial match are for the -substring "abc12", because all these characters were inspected. 
However, the -third offset is set to 6, because that is the offset where matching began. -.P -What happens when a partial match is identified depends on which of the two -partial matching options are set. -. -. -.SS "PCRE_PARTIAL_SOFT WITH pcre_exec() OR pcre[16|32]_exec()" -.rs -.sp -If PCRE_PARTIAL_SOFT is set when \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP -identifies a partial match, the partial match is remembered, but matching -continues as normal, and other alternatives in the pattern are tried. If no -complete match can be found, PCRE_ERROR_PARTIAL is returned instead of -PCRE_ERROR_NOMATCH. -.P -This option is "soft" because it prefers a complete match over a partial match. -All the various matching items in a pattern behave as if the subject string is -potentially complete. For example, \ez, \eZ, and $ match at the end of the -subject, as normal, and for \eb and \eB the end of the subject is treated as a -non-alphanumeric. -.P -If there is more than one partial match, the first one that was found provides -the data that is returned. Consider this pattern: -.sp - /123\ew+X|dogY/ -.sp -If this is matched against the subject string "abc123dog", both -alternatives fail to match, but the end of the subject is reached during -matching, so PCRE_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, -identifying "123dog" as the first partial match that was found. (In this -example, there are two partial matches, because "dog" on its own partially -matches the second alternative.) -. -. -.SS "PCRE_PARTIAL_HARD WITH pcre_exec() OR pcre[16|32]_exec()" -.rs -.sp -If PCRE_PARTIAL_HARD is set for \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP, -PCRE_ERROR_PARTIAL is returned as soon as a partial match is found, without -continuing to search for possible complete matches. This option is "hard" -because it prefers an earlier partial match over a later complete match. 
For -this reason, the assumption is made that the end of the supplied subject string -may not be the true end of the available data, and so, if \ez, \eZ, \eb, \eB, -or $ are encountered at the end of the subject, the result is -PCRE_ERROR_PARTIAL, provided that at least one character in the subject has -been inspected. -.P -Setting PCRE_PARTIAL_HARD also affects the way UTF-8 and UTF-16 -subject strings are checked for validity. Normally, an invalid sequence -causes the error PCRE_ERROR_BADUTF8 or PCRE_ERROR_BADUTF16. However, in the -special case of a truncated character at the end of the subject, -PCRE_ERROR_SHORTUTF8 or PCRE_ERROR_SHORTUTF16 is returned when -PCRE_PARTIAL_HARD is set. -. -. -.SS "Comparing hard and soft partial matching" -.rs -.sp -The difference between the two partial matching options can be illustrated by a -pattern such as: -.sp - /dog(sbody)?/ -.sp -This matches either "dog" or "dogsbody", greedily (that is, it prefers the -longer string if possible). If it is matched against the string "dog" with -PCRE_PARTIAL_SOFT, it yields a complete match for "dog". However, if -PCRE_PARTIAL_HARD is set, the result is PCRE_ERROR_PARTIAL. On the other hand, -if the pattern is made ungreedy the result is different: -.sp - /dog(sbody)??/ -.sp -In this case the result is always a complete match because that is found first, -and matching never continues after finding a complete match. It might be easier -to follow this explanation by thinking of the two patterns like this: -.sp - /dog(sbody)?/ is the same as /dogsbody|dog/ - /dog(sbody)??/ is the same as /dog|dogsbody/ -.sp -The second pattern will never match "dogsbody", because it will always find the -shorter match first. -. -. -.SH "PARTIAL MATCHING USING pcre_dfa_exec() OR pcre[16|32]_dfa_exec()" -.rs -.sp -The DFA functions move along the subject string character by character, without -backtracking, searching for all possible matches simultaneously. 
If the end of -the subject is reached before the end of the pattern, there is the possibility -of a partial match, again provided that at least one character has been -inspected. -.P -When PCRE_PARTIAL_SOFT is set, PCRE_ERROR_PARTIAL is returned only if there -have been no complete matches. Otherwise, the complete matches are returned. -However, if PCRE_PARTIAL_HARD is set, a partial match takes precedence over any -complete matches. The portion of the string that was inspected when the longest -partial match was found is set as the first matching string, provided there are -at least two slots in the offsets vector. -.P -Because the DFA functions always search for all possible matches, and there is -no difference between greedy and ungreedy repetition, their behaviour is -different from the standard functions when PCRE_PARTIAL_HARD is set. Consider -the string "dog" matched against the ungreedy pattern shown above: -.sp - /dog(sbody)??/ -.sp -Whereas the standard functions stop as soon as they find the complete match for -"dog", the DFA functions also find the partial match for "dogsbody", and so -return that when PCRE_PARTIAL_HARD is set. -. -. -.SH "PARTIAL MATCHING AND WORD BOUNDARIES" -.rs -.sp -If a pattern ends with one of the sequences \eb or \eB, which test for word -boundaries, partial matching with PCRE_PARTIAL_SOFT can give counter-intuitive -results. Consider this pattern: -.sp - /\ebcat\eb/ -.sp -This matches "cat", provided there is a word boundary at either end. If the -subject string is "the cat", the comparison of the final "t" with a following -character cannot take place, so a partial match is found. However, normal -matching carries on, and \eb matches at the end of the subject when the last -character is a letter, so a complete match is found. The result, therefore, is -\fInot\fP PCRE_ERROR_PARTIAL. Using PCRE_PARTIAL_HARD in this case does yield -PCRE_ERROR_PARTIAL, because then the partial match takes precedence. -. -.
-.SH "FORMERLY RESTRICTED PATTERNS" -.rs -.sp -For releases of PCRE prior to 8.00, because of the way certain internal -optimizations were implemented in the \fBpcre_exec()\fP function, the -PCRE_PARTIAL option (predecessor of PCRE_PARTIAL_SOFT) could not be used with -all patterns. From release 8.00 onwards, the restrictions no longer apply, and -partial matching can be requested for any pattern. -.P -Items that were formerly restricted were repeated single characters and -repeated metasequences. If PCRE_PARTIAL was set for a pattern that did not -conform to the restrictions, \fBpcre_exec()\fP returned the error code -PCRE_ERROR_BADPARTIAL (-13). This error code is no longer in use. The -PCRE_INFO_OKPARTIAL call to \fBpcre_fullinfo()\fP to find out if a compiled -pattern can be used for partial matching now always returns 1. -. -. -.SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST" -.rs -.sp -If the escape sequence \eP is present in a \fBpcretest\fP data line, the -PCRE_PARTIAL_SOFT option is used for the match. Here is a run of \fBpcretest\fP -that uses the date example quoted above: -.sp - re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ - data> 25jun04\eP - 0: 25jun04 - 1: jun - data> 25dec3\eP - Partial match: 25dec3 - data> 3ju\eP - Partial match: 3ju - data> 3juj\eP - No match - data> j\eP - No match -.sp -The first data string is matched completely, so \fBpcretest\fP shows the -matched substrings. The remaining four strings do not match the complete -pattern, but the first two are partial matches. Similar output is obtained -if DFA matching is used. -.P -If the escape sequence \eP is present more than once in a \fBpcretest\fP data -line, the PCRE_PARTIAL_HARD option is set for the match. -. -.
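The three outcomes in the pcretest run above (complete match, partial match, no match) can be illustrated without PCRE. The following is a minimal sketch in Python, assuming the anchored date pattern from the example and the PCRE_PARTIAL_SOFT preference for a complete match. The function name classify() is hypothetical, and the prefix test is hand-written for this one pattern. It is not how PCRE implements partial matching.

```python
import re

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")

# The date pattern from the example, written for Python's re module.
DATE = re.compile(r"[0-9]?[0-9](?:%s)[0-9][0-9]" % "|".join(MONTHS))


def classify(subject):
    """Return 'complete', 'partial' or 'no match' for the date pattern.

    Hypothetical helper: a complete match is preferred (as with
    PCRE_PARTIAL_SOFT); otherwise the subject counts as a partial match
    if it is a non-empty prefix of some string the pattern would match.
    """
    if DATE.fullmatch(subject):
        return "complete"
    day = re.match(r"[0-9]{1,2}", subject)
    if day is None:
        return "no match"  # also covers the empty subject
    rest = subject[day.end():]
    if len(rest) <= 3:
        # Still inside the month name: is it a prefix of any month?
        ok = any(month.startswith(rest) for month in MONTHS)
    else:
        # The month is complete. Exactly one year digit may follow,
        # because two digits would already have been a complete match.
        ok = (rest[:3] in MONTHS and len(rest) == 4
              and rest[3] in "0123456789")
    return "partial" if ok else "no match"


if __name__ == "__main__":
    for data in ("25jun04", "25dec3", "3ju", "3juj", "j"):
        print(data, "->", classify(data))
```

An application that checks keystrokes one by one could call such a function after each character and reject the keystroke as soon as the result is "no match".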
-.SH "MULTI-SEGMENT MATCHING WITH pcre_dfa_exec() OR pcre[16|32]_dfa_exec()" -.rs -.sp -When a partial match has been found using a DFA matching function, it is -possible to continue the match by providing additional subject data and calling -the function again with the same compiled regular expression, this time setting -the PCRE_DFA_RESTART option. You must pass the same working space as before, -because this is where details of the previous partial match are stored. Here is -an example using \fBpcretest\fP, using the \eR escape sequence to set the -PCRE_DFA_RESTART option (\eD specifies the use of the DFA matching function): -.sp - re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ - data> 23ja\eP\eD - Partial match: 23ja - data> n05\eR\eD - 0: n05 -.sp -The first call has "23ja" as the subject, and requests partial matching; the -second call has "n05" as the subject for the continued (restarted) match. -Notice that when the match is complete, only the last part is shown; PCRE does -not retain the previously partially-matched string. It is up to the calling -program to do that if it needs to. -.P -That means that, for an unanchored pattern, if a continued match fails, it is -not possible to try again at a new starting point. All this facility is capable -of doing is continuing with the previous match attempt. In the previous -example, if the second set of data is "ug23" the result is no match, even -though there would be a match for "aug23" if the entire string were given at -once. Depending on the application, this may or may not be what you want. -The only way to allow for starting again at the next character is to retain the -matched part of the subject and try a new complete match. -.P -You can set the PCRE_PARTIAL_SOFT or PCRE_PARTIAL_HARD options with -PCRE_DFA_RESTART to continue partial matching over multiple segments. This -facility can be used to pass very long subject strings to the DFA matching -functions. -. -. 
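The restart behaviour described above can be sketched as a small state-set simulation. This is an illustration in Python under stated assumptions, not PCRE code: the names TEMPLATES and feed() are hypothetical, the set of live states stands in for the working space that PCRE_DFA_RESTART relies on, and the pattern is the anchored date example. As in PCRE, the previously consumed text is not kept, only the matching state.

```python
DIGIT = "0123456789"
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")

# Every way the date pattern can be spelled out, one accepted character
# set per position: one or two day digits, a month name, two year digits.
TEMPLATES = [[DIGIT] * ndigits + list(month) + [DIGIT, DIGIT]
             for ndigits in (1, 2) for month in MONTHS]


def feed(segment, states=None):
    """Advance all live alternatives over one segment of the subject.

    `states` is the saved state from an earlier partial match, or None
    to start a fresh match. Returns (result, states), where result is
    'complete', 'partial' or 'no match', and states is only meaningful
    after a partial match.
    """
    if states is None:
        if not segment:
            return "no match", None  # nothing has been inspected
        states = {(t, 0) for t in range(len(TEMPLATES))}
    for ch in segment:
        # Keep only the alternatives that accept this character.
        states = {(t, pos + 1) for (t, pos) in states
                  if pos < len(TEMPLATES[t]) and ch in TEMPLATES[t][pos]}
        if not states:
            return "no match", None
    if any(pos == len(TEMPLATES[t]) for (t, pos) in states):
        return "complete", None
    return "partial", states


if __name__ == "__main__":
    result, saved = feed("23ja")
    print(result)                  # first segment
    print(feed("n05", saved)[0])   # continued match
    print(feed("ug23", saved)[0])  # continuation that cannot succeed
```

Because only the saved state is carried forward, a failed continuation cannot be retried from a later starting point. This is the limitation the text describes for unanchored patterns.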
-.SH "MULTI-SEGMENT MATCHING WITH pcre_exec() OR pcre[16|32]_exec()" -.rs -.sp -From release 8.00, the standard matching functions can also be used to do -multi-segment matching. Unlike the DFA functions, it is not possible to -restart the previous match with a new segment of data. Instead, new data must -be added to the previous subject string, and the entire match re-run, starting -from the point where the partial match occurred. Earlier data can be discarded. -.P -It is best to use PCRE_PARTIAL_HARD in this situation, because it does not -treat the end of a segment as the end of the subject when matching \ez, \eZ, -\eb, \eB, and $. Consider an unanchored pattern that matches dates: -.sp - re> /\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed/ - data> The date is 23ja\eP\eP - Partial match: 23ja -.sp -At this stage, an application could discard the text preceding "23ja", add on -text from the next segment, and call the matching function again. Unlike the -DFA matching functions, the entire matching string must always be available, -and the complete matching process occurs for each call, so more memory and more -processing time are needed. -.P -\fBNote:\fP If the pattern contains lookbehind assertions, or \eK, or starts -with \eb or \eB, the string that is returned for a partial match includes -characters that precede the start of what would be returned for a complete -match, because it contains all the characters that were inspected during the -partial match. -. -. -.SH "ISSUES WITH MULTI-SEGMENT MATCHING" -.rs -.sp -Certain types of pattern may give problems with multi-segment matching, -whichever matching function is used. -.P -1. If the pattern contains a test for the beginning of a line, you need to pass -the PCRE_NOTBOL option when the subject string for any call does not start at the -beginning of a line.
There is also a PCRE_NOTEOL option, but in practice when -doing multi-segment matching you should be using PCRE_PARTIAL_HARD, which -includes the effect of PCRE_NOTEOL. -.P -2. Lookbehind assertions that have already been obeyed are catered for in the -offsets that are returned for a partial match. However a lookbehind assertion -later in the pattern could require even earlier characters to be inspected. You -can handle this case by using the PCRE_INFO_MAXLOOKBEHIND option of the -\fBpcre_fullinfo()\fP or \fBpcre[16|32]_fullinfo()\fP functions to obtain the -length of the longest lookbehind in the pattern. This length is given in -characters, not bytes. If you always retain at least that many characters -before the partially matched string, all should be well. (Of course, near the -start of the subject, fewer characters may be present; in that case all -characters should be retained.) -.P -From release 8.33, there is a more accurate way of deciding which characters to -retain. Instead of subtracting the length of the longest lookbehind from the -earliest inspected character (\fIoffsets[0]\fP), the match start position -(\fIoffsets[2]\fP) should be used, and the next match attempt started at the -\fIoffsets[2]\fP character by setting the \fIstartoffset\fP argument of -\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. -.P -For example, if the pattern "(?<=123)abc" is partially -matched against the string "xx123a", the three offset values returned are 2, 6, -and 5. This indicates that the matching process that gave a partial match -started at offset 5, but the characters "123a" were all inspected. The maximum -lookbehind for that pattern is 3, so taking that away from 5 shows that we need -only keep "123a", and the next match attempt can be started at offset 3 (that -is, at "a") when further characters have been added. 
When the match start is -not the earliest inspected character, \fBpcretest\fP shows it explicitly: -.sp - re> "(?<=123)abc" - data> xx123a\eP\eP - Partial match at offset 5: 123a -.P -3. Because a partial match must always contain at least one character, what -might be considered a partial match of an empty string actually gives a "no -match" result. For example: -.sp - re> /c(?<=abc)x/ - data> ab\eP - No match -.sp -If the next segment begins "cx", a match should be found, but this will only -happen if characters from the previous segment are retained. For this reason, a -"no match" result should be interpreted as "partial match of an empty string" -when the pattern contains lookbehinds. -.P -4. Matching a subject string that is split into multiple segments may not -always produce exactly the same result as matching over one single long string, -especially when PCRE_PARTIAL_SOFT is used. The section "Partial Matching and -Word Boundaries" above describes an issue that arises if the pattern ends with -\eb or \eB. Another kind of difference may occur when there are multiple -matching possibilities, because (for PCRE_PARTIAL_SOFT) a partial match result -is given only when there are no completed matches. This means that as soon as -the shortest match has been found, continuation to a new subject segment is no -longer possible. Consider again this \fBpcretest\fP example: -.sp - re> /dog(sbody)?/ - data> dogsb\eP - 0: dog - data> do\eP\eD - Partial match: do - data> gsb\eR\eP\eD - 0: g - data> dogsbody\eD - 0: dogsbody - 1: dog -.sp -The first data line passes the string "dogsb" to a standard matching function, -setting the PCRE_PARTIAL_SOFT option. Although the string is a partial match -for "dogsbody", the result is not PCRE_ERROR_PARTIAL, because the shorter -string "dog" is a complete match. 
Similarly, when the subject is presented to -a DFA matching function in several parts ("do" and "gsb" being the first two) -the match stops when "dog" has been found, and it is not possible to continue. -On the other hand, if "dogsbody" is presented as a single string, a DFA -matching function finds both matches. -.P -Because of these problems, it is best to use PCRE_PARTIAL_HARD when matching -multi-segment data. The example above then behaves differently: -.sp - re> /dog(sbody)?/ - data> dogsb\eP\eP - Partial match: dogsb - data> do\eP\eD - Partial match: do - data> gsb\eR\eP\eP\eD - Partial match: gsb -.sp -5. Patterns that contain alternatives at the top level which do not all start -with the same pattern item may not work as expected when PCRE_DFA_RESTART is -used. For example, consider this pattern: -.sp - 1234|3789 -.sp -If the first part of the subject is "ABC123", a partial match of the first -alternative is found at offset 3. There is no partial match for the second -alternative, because such a match does not start at the same point in the -subject string. Attempting to continue with the string "7890" does not yield a -match because only those alternatives that match at one point in the subject -are remembered. The problem arises because the start of the second alternative -matches within the first alternative. There is no problem with anchored -patterns or patterns such as: -.sp - 1234|ABCD -.sp -where no string can be a partial match for both alternatives. This is not a -problem if a standard matching function is used, because the entire match has -to be rerun each time: -.sp - re> /1234|3789/ - data> ABC123\eP\eP - Partial match: 123 - data> 1237890 - 0: 3789 -.sp -Of course, instead of using PCRE_DFA_RESTART, the same technique of re-running -the entire match can also be used with the DFA matching functions. Another -possibility is to work with two buffers. 
If a partial match at offset \fIn\fP -in the first buffer is followed by "no match" when PCRE_DFA_RESTART is used on -the second buffer, you can then try a new match starting at offset \fIn+1\fP in -the first buffer. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 02 July 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcrepattern.3 b/src/pcre/doc/pcrepattern.3 deleted file mode 100644 index 97df217f..00000000 --- a/src/pcre/doc/pcrepattern.3 +++ /dev/null @@ -1,3304 +0,0 @@ -.TH PCREPATTERN 3 "23 October 2016" "PCRE 8.40" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE REGULAR EXPRESSION DETAILS" -.rs -.sp -The syntax and semantics of the regular expressions that are supported by PCRE -are described in detail below. There is a quick-reference syntax summary in the -.\" HREF -\fBpcresyntax\fP -.\" -page. PCRE tries to match Perl syntax and semantics as closely as it can. PCRE -also supports some alternative regular expression syntax (which does not -conflict with the Perl syntax) in order to provide some compatibility with -regular expressions in Python, .NET, and Oniguruma. -.P -Perl's regular expressions are described in its own documentation, and -regular expressions in general are covered in a number of books, some of which -have copious examples. Jeffrey Friedl's "Mastering Regular Expressions", -published by O'Reilly, covers regular expressions in great detail. This -description of PCRE's regular expressions is intended as reference material. -.P -This document discusses the patterns that are supported by PCRE when one of its -main matching functions, \fBpcre_exec()\fP (8-bit) or \fBpcre[16|32]_exec()\fP -(16- or 32-bit), is used.
PCRE also has alternative matching functions, -\fBpcre_dfa_exec()\fP and \fBpcre[16|32]_dfa_exec()\fP, which match using a -different algorithm that is not Perl-compatible. Some of the features discussed -below are not available when DFA matching is used. The advantages and -disadvantages of the alternative functions, and how they differ from the normal -functions, are discussed in the -.\" HREF -\fBpcrematching\fP -.\" -page. -. -. -.SH "SPECIAL START-OF-PATTERN ITEMS" -.rs -.sp -A number of options that can be passed to \fBpcre_compile()\fP can also be set -by special items at the start of a pattern. These are not Perl-compatible, but -are provided to make these options accessible to pattern writers who are not -able to change the program that processes the pattern. Any number of these -items may appear, but they must all be together right at the start of the -pattern string, and the letters must be in upper case. -. -. -.SS "UTF support" -.rs -.sp -The original operation of PCRE was on strings of one-byte characters. However, -there is now also support for UTF-8 strings in the original library, an -extra library that supports 16-bit and UTF-16 character strings, and a -third library that supports 32-bit and UTF-32 character strings. To use these -features, PCRE must be built to include appropriate support. When using UTF -strings you must either call the compiling function with the PCRE_UTF8, -PCRE_UTF16, or PCRE_UTF32 option, or the pattern must start with one of -these special sequences: -.sp - (*UTF8) - (*UTF16) - (*UTF32) - (*UTF) -.sp -(*UTF) is a generic sequence that can be used with any of the libraries. -Starting a pattern with such a sequence is equivalent to setting the relevant -option. How setting a UTF mode affects pattern matching is mentioned in several -places below. There is also a summary of features in the -.\" HREF -\fBpcreunicode\fP -.\" -page.
-.P -Some applications that allow their users to supply patterns may wish to -restrict them to non-UTF data for security reasons. If the PCRE_NEVER_UTF -option is set at compile time, (*UTF) etc. are not allowed, and their -appearance causes an error. -. -. -.SS "Unicode property support" -.rs -.sp -Another special sequence that may appear at the start of a pattern is (*UCP). -This has the same effect as setting the PCRE_UCP option: it causes sequences -such as \ed and \ew to use Unicode properties to determine character types, -instead of recognizing only characters with codes less than 128 via a lookup -table. -. -. -.SS "Disabling auto-possessification" -.rs -.sp -If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting -the PCRE_NO_AUTO_POSSESS option at compile time. This stops PCRE from making -quantifiers possessive when what follows cannot match the repeated item. For -example, by default a+b is treated as a++b. For more details, see the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -. -. -.SS "Disabling start-up optimizations" -.rs -.sp -If a pattern starts with (*NO_START_OPT), it has the same effect as setting the -PCRE_NO_START_OPTIMIZE option either at compile or matching time. This disables -several optimizations for quickly reaching "no match" results. For more -details, see the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -. -. -.\" HTML -.SS "Newline conventions" -.rs -.sp -PCRE supports five different conventions for indicating line breaks in -strings: a single CR (carriage return) character, a single LF (linefeed) -character, the two-character sequence CRLF, any of the three preceding, or any -Unicode newline sequence. The -.\" HREF -\fBpcreapi\fP -.\" -page has -.\" HTML -.\" -further discussion -.\" -about newlines, and shows how to set the newline convention in the -\fIoptions\fP arguments for the compiling and matching functions. 
-.P -It is also possible to specify a newline convention by starting a pattern -string with one of the following five sequences: -.sp - (*CR) carriage return - (*LF) linefeed - (*CRLF) carriage return, followed by linefeed - (*ANYCRLF) any of the three above - (*ANY) all Unicode newline sequences -.sp -These override the default and the options given to the compiling function. For -example, on a Unix system where LF is the default newline sequence, the pattern -.sp - (*CR)a.b -.sp -changes the convention to CR. That pattern matches "a\enb" because LF is no -longer a newline. If more than one of these settings is present, the last one -is used. -.P -The newline convention affects where the circumflex and dollar assertions are -true. It also affects the interpretation of the dot metacharacter when -PCRE_DOTALL is not set, and the behaviour of \eN. However, it does not affect -what the \eR escape sequence matches. By default, this is any Unicode newline -sequence, for Perl compatibility. However, this can be changed; see the -description of \eR in the section entitled -.\" HTML -.\" -"Newline sequences" -.\" -below. A change of \eR setting can be combined with a change of newline -convention. -. -. -.SS "Setting match and recursion limits" -.rs -.sp -The caller of \fBpcre_exec()\fP can set a limit on the number of times the -internal \fBmatch()\fP function is called and on the maximum depth of -recursive calls. These facilities are provided to catch runaway matches that -are provoked by patterns with huge matching trees (a typical example is a -pattern with nested unlimited repeats) and to avoid running out of system stack -by too much recursion. When one of these limits is reached, \fBpcre_exec()\fP -gives an error return. The limits can also be set by items at the start of the -pattern of the form -.sp - (*LIMIT_MATCH=d) - (*LIMIT_RECURSION=d) -.sp -where d is any number of decimal digits. 
However, the value of the setting must -be less than the value set (or defaulted) by the caller of \fBpcre_exec()\fP -for it to have any effect. In other words, the pattern writer can lower the -limits set by the programmer, but not raise them. If there is more than one -setting of one of these limits, the lower value is used. -. -. -.SH "EBCDIC CHARACTER CODES" -.rs -.sp -PCRE can be compiled to run in an environment that uses EBCDIC as its character -code rather than ASCII or Unicode (typically a mainframe system). In the -sections below, character code values are ASCII or Unicode; in an EBCDIC -environment these characters may have different code values, and there are no -code points greater than 255. -. -. -.SH "CHARACTERS AND METACHARACTERS" -.rs -.sp -A regular expression is a pattern that is matched against a subject string from -left to right. Most characters stand for themselves in a pattern, and match the -corresponding characters in the subject. As a trivial example, the pattern -.sp - The quick brown fox -.sp -matches a portion of a subject string that is identical to itself. When -caseless matching is specified (the PCRE_CASELESS option), letters are matched -independently of case. In a UTF mode, PCRE always understands the concept of -case for characters whose values are less than 128, so caseless matching is -always possible. For characters with higher values, the concept of case is -supported if PCRE is compiled with Unicode property support, but not otherwise. -If you want to use caseless matching for characters 128 and above, you must -ensure that PCRE is compiled with Unicode property support as well as with -UTF support. -.P -The power of regular expressions comes from the ability to include alternatives -and repetitions in the pattern. These are encoded in the pattern by the use of -\fImetacharacters\fP, which do not stand for themselves but instead are -interpreted in some special way. 
-.P -There are two different sets of metacharacters: those that are recognized -anywhere in the pattern except within square brackets, and those that are -recognized within square brackets. Outside square brackets, the metacharacters -are as follows: -.sp - \e general escape character with several uses - ^ assert start of string (or line, in multiline mode) - $ assert end of string (or line, in multiline mode) - . match any character except newline (by default) - [ start character class definition - | start of alternative branch - ( start subpattern - ) end subpattern - ? extends the meaning of ( - also 0 or 1 quantifier - also quantifier minimizer - * 0 or more quantifier - + 1 or more quantifier - also "possessive quantifier" - { start min/max quantifier -.sp -Part of a pattern that is in square brackets is called a "character class". In -a character class the only metacharacters are: -.sp - \e general escape character - ^ negate the class, but only if the first character - - indicates character range -.\" JOIN - [ POSIX character class (only if followed by POSIX - syntax) - ] terminates the character class -.sp -The following sections describe the use of each of the metacharacters. -. -. -.SH BACKSLASH -.rs -.sp -The backslash character has several uses. Firstly, if it is followed by a -character that is not a number or a letter, it takes away any special meaning -that character may have. This use of backslash as an escape character applies -both inside and outside character classes. -.P -For example, if you want to match a * character, you write \e* in the pattern. -This escaping action applies whether or not the following character would -otherwise be interpreted as a metacharacter, so it is always safe to precede a -non-alphanumeric with backslash to specify that it stands for itself. In -particular, if you want to match a backslash, you write \e\e. -.P -In a UTF mode, only ASCII numbers and letters have any special meaning after a -backslash. 
All other characters (in particular, those whose codepoints are -greater than 127) are treated as literals. -.P -If a pattern is compiled with the PCRE_EXTENDED option, most white space in the -pattern (other than in a character class), and characters between a # outside a -character class and the next newline, inclusive, are ignored. An escaping -backslash can be used to include a white space or # character as part of the -pattern. -.P -If you want to remove the special meaning from a sequence of characters, you -can do so by putting them between \eQ and \eE. This is different from Perl in -that $ and @ are handled as literals in \eQ...\eE sequences in PCRE, whereas in -Perl, $ and @ cause variable interpolation. Note the following examples: -.sp - Pattern PCRE matches Perl matches -.sp -.\" JOIN - \eQabc$xyz\eE abc$xyz abc followed by the - contents of $xyz - \eQabc\e$xyz\eE abc\e$xyz abc\e$xyz - \eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz -.sp -The \eQ...\eE sequence is recognized both inside and outside character classes. -An isolated \eE that is not preceded by \eQ is ignored. If \eQ is not followed -by \eE later in the pattern, the literal interpretation continues to the end of -the pattern (that is, \eE is assumed at the end). If the isolated \eQ is inside -a character class, this causes an error, because the character class is not -terminated. -. -. -.\" HTML -.SS "Non-printing characters" -.rs -.sp -A second use of backslash provides a way of encoding non-printing characters -in patterns in a visible manner. There is no restriction on the appearance of -non-printing characters, apart from the binary zero that terminates a pattern, -but when a pattern is being prepared by text editing, it is often easier to use -one of the following escape sequences than the binary character it represents. 
-In an ASCII or Unicode environment, these escapes are as follows: -.sp - \ea alarm, that is, the BEL character (hex 07) - \ecx "control-x", where x is any ASCII character - \ee escape (hex 1B) - \ef form feed (hex 0C) - \en linefeed (hex 0A) - \er carriage return (hex 0D) - \et tab (hex 09) - \e0dd character with octal code 0dd - \eddd character with octal code ddd, or back reference - \eo{ddd..} character with octal code ddd.. - \exhh character with hex code hh - \ex{hhh..} character with hex code hhh.. (non-JavaScript mode) - \euhhhh character with hex code hhhh (JavaScript mode only) -.sp -The precise effect of \ecx on ASCII characters is as follows: if x is a lower -case letter, it is converted to upper case. Then bit 6 of the character (hex -40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A), -but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the -data item (byte or 16-bit value) following \ec has a value greater than 127, a -compile-time error occurs. This locks out non-ASCII characters in all modes. -.P -When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et -generate the appropriate EBCDIC code values. The \ec escape is processed -as specified for Perl in the \fBperlebcdic\fP document. The only characters -that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any -other character provokes a compile-time error. The sequence \ec@ encodes -character code 0; after \ec the letters (in either case) encode characters 1-26 -(hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex -1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F). -.P -Thus, apart from \ec?, these escapes generate the same character code values as -they do in an ASCII environment, though the meanings of the values mostly -differ. For example, \ecG always generates code value 7, which is BEL in ASCII -but DEL in EBCDIC. -.P -The sequence \ec? 
generates DEL (127, hex 7F) in an ASCII environment, but -because 127 is not a control character in EBCDIC, Perl makes it generate the -APC character. Unfortunately, there are several variants of EBCDIC. In most of -them the APC character has the value 255 (hex FF), but in the one Perl calls -POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC -values, PCRE makes \ec? generate 95; otherwise it generates 255. -.P -After \e0 up to two further octal digits are read. If there are fewer than two -digits, just those that are present are used. Thus the sequence \e0\ex\e015 -specifies two binary zeros followed by a CR character (code value 13). Make -sure you supply two digits after the initial zero if the pattern character that -follows is itself an octal digit. -.P -The escape \eo must be followed by a sequence of octal digits, enclosed in -braces. An error occurs if this is not the case. This escape is a recent -addition to Perl; it provides way of specifying character code points as octal -numbers greater than 0777, and it also allows octal numbers and back references -to be unambiguously specified. -.P -For greater clarity and unambiguity, it is best to avoid following \e by a -digit greater than zero. Instead, use \eo{} or \ex{} to specify character -numbers, and \eg{} to specify back references. The following paragraphs -describe the old, ambiguous syntax. -.P -The handling of a backslash followed by a digit other than 0 is complicated, -and Perl has changed in recent releases, causing PCRE also to change. Outside a -character class, PCRE reads the digit and any following digits as a decimal -number. If the number is less than 8, or if there have been at least that many -previous capturing left parentheses in the expression, the entire sequence is -taken as a \fIback reference\fP. A description of how this works is given -.\" HTML -.\" -later, -.\" -following the discussion of -.\" HTML -.\" -parenthesized subpatterns. 
-.\" -.P -Inside a character class, or if the decimal number following \e is greater than -7 and there have not been that many capturing subpatterns, PCRE handles \e8 and -\e9 as the literal characters "8" and "9", and otherwise re-reads up to three -octal digits following the backslash, using them to generate a data character. -Any subsequent digits stand for themselves. For example: -.sp - \e040 is another way of writing an ASCII space -.\" JOIN - \e40 is the same, provided there are fewer than 40 - previous capturing subpatterns - \e7 is always a back reference -.\" JOIN - \e11 might be a back reference, or another way of - writing a tab - \e011 is always a tab - \e0113 is a tab followed by the character "3" -.\" JOIN - \e113 might be a back reference, otherwise the - character with octal code 113 -.\" JOIN - \e377 might be a back reference, otherwise - the value 255 (decimal) -.\" JOIN - \e81 is either a back reference, or the two - characters "8" and "1" -.sp -Note that octal values of 100 or greater that are specified using this syntax -must not be introduced by a leading zero, because no more than three octal -digits are ever read. -.P -By default, after \ex that is not followed by {, from zero to two hexadecimal -digits are read (letters can be in upper or lower case). Any number of -hexadecimal digits may appear between \ex{ and }. If a character other than -a hexadecimal digit appears between \ex{ and }, or if there is no terminating -}, an error occurs. -.P -If the PCRE_JAVASCRIPT_COMPAT option is set, the interpretation of \ex is -as just described only when it is followed by two hexadecimal digits. -Otherwise, it matches a literal "x" character. In JavaScript mode, support for -code points greater than 256 is provided by \eu, which must be followed by -four hexadecimal digits; otherwise it matches a literal "u" character. -.P -Characters whose value is less than 256 can be defined by either of the two -syntaxes for \ex (or by \eu in JavaScript mode). 
There is no difference in the -way they are handled. For example, \exdc is exactly the same as \ex{dc} (or -\eu00dc in JavaScript mode). -. -. -.SS "Constraints on character values" -.rs -.sp -Characters that are specified using octal or hexadecimal numbers are -limited to certain values, as follows: -.sp - 8-bit non-UTF mode less than 0x100 - 8-bit UTF-8 mode less than 0x10ffff and a valid codepoint - 16-bit non-UTF mode less than 0x10000 - 16-bit UTF-16 mode less than 0x10ffff and a valid codepoint - 32-bit non-UTF mode less than 0x100000000 - 32-bit UTF-32 mode less than 0x10ffff and a valid codepoint -.sp -Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the so-called -"surrogate" codepoints), and 0xffef. -. -. -.SS "Escape sequences in character classes" -.rs -.sp -All the sequences that define a single character value can be used both inside -and outside character classes. In addition, inside a character class, \eb is -interpreted as the backspace character (hex 08). -.P -\eN is not allowed in a character class. \eB, \eR, and \eX are not special -inside a character class. Like other unrecognized escape sequences, they are -treated as the literal characters "B", "R", and "X" by default, but cause an -error if the PCRE_EXTRA option is set. Outside a character class, these -sequences have different meanings. -. -. -.SS "Unsupported escape sequences" -.rs -.sp -In Perl, the sequences \el, \eL, \eu, and \eU are recognized by its string -handler and used to modify the case of following characters. By default, PCRE -does not support these escape sequences. However, if the PCRE_JAVASCRIPT_COMPAT -option is set, \eU matches a "U" character, and \eu can be used to define a -character by code point, as described in the previous section. -. -. -.SS "Absolute and relative back references" -.rs -.sp -The sequence \eg followed by an unsigned or a negative number, optionally -enclosed in braces, is an absolute or relative back reference. 
A named back -reference can be coded as \eg{name}. Back references are discussed -.\" HTML -.\" -later, -.\" -following the discussion of -.\" HTML -.\" -parenthesized subpatterns. -.\" -. -. -.SS "Absolute and relative subroutine calls" -.rs -.sp -For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a "subroutine". Details are discussed -.\" HTML -.\" -later. -.\" -Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP -synonymous. The former is a back reference; the latter is a -.\" HTML -.\" -subroutine -.\" -call. -. -. -.\" HTML -.SS "Generic character types" -.rs -.sp -Another use of backslash is for specifying generic character types: -.sp - \ed any decimal digit - \eD any character that is not a decimal digit - \eh any horizontal white space character - \eH any character that is not a horizontal white space character - \es any white space character - \eS any character that is not a white space character - \ev any vertical white space character - \eV any character that is not a vertical white space character - \ew any "word" character - \eW any "non-word" character -.sp -There is also the single sequence \eN, which matches a non-newline character. -This is the same as -.\" HTML -.\" -the "." metacharacter -.\" -when PCRE_DOTALL is not set. Perl also uses \eN to match characters by name; -PCRE does not support this. -.P -Each pair of lower and upper case escape sequences partitions the complete set -of characters into two disjoint sets. Any given character matches one, and only -one, of each pair. The sequences can appear both inside and outside character -classes. They each match one character of the appropriate type. If the current -matching point is at the end of the subject string, all of them fail, because -there is no character to match. 
-.P -For compatibility with Perl, \es did not used to match the VT character (code -11), which made it different from the the POSIX "space" class. However, Perl -added VT at release 5.18, and PCRE followed suit at release 8.34. The default -\es characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space -(32), which are defined as white space in the "C" locale. This list may vary if -locale-specific matching is taking place. For example, in some locales the -"non-breaking space" character (\exA0) is recognized as white space, and in -others the VT character is not. -.P -A "word" character is an underscore or any character that is a letter or digit. -By default, the definition of letters and digits is controlled by PCRE's -low-valued character tables, and may vary if locale-specific matching is taking -place (see -.\" HTML -.\" -"Locale support" -.\" -in the -.\" HREF -\fBpcreapi\fP -.\" -page). For example, in a French locale such as "fr_FR" in Unix-like systems, -or "french" in Windows, some character codes greater than 127 are used for -accented letters, and these are then matched by \ew. The use of locales with -Unicode is discouraged. -.P -By default, characters whose code points are greater than 127 never match \ed, -\es, or \ew, and always match \eD, \eS, and \eW, although this may vary for -characters in the range 128-255 when locale-specific matching is happening. -These escape sequences retain their original meanings from before Unicode -support was available, mainly for efficiency reasons. If PCRE is compiled with -Unicode property support, and the PCRE_UCP option is set, the behaviour is -changed so that Unicode properties are used to determine character types, as -follows: -.sp - \ed any character that matches \ep{Nd} (decimal digit) - \es any character that matches \ep{Z} or \eh or \ev - \ew any character that matches \ep{L} or \ep{N}, plus underscore -.sp -The upper case escapes match the inverse sets of characters. 
Note that \ed -matches only decimal digits, whereas \ew matches any Unicode digit, as well as -any Unicode letter, and underscore. Note also that PCRE_UCP affects \eb, and -\eB because they are defined in terms of \ew and \eW. Matching these sequences -is noticeably slower when PCRE_UCP is set. -.P -The sequences \eh, \eH, \ev, and \eV are features that were added to Perl at -release 5.10. In contrast to the other sequences, which match only ASCII -characters by default, these always match certain high-valued code points, -whether or not PCRE_UCP is set. The horizontal space characters are: -.sp - U+0009 Horizontal tab (HT) - U+0020 Space - U+00A0 Non-break space - U+1680 Ogham space mark - U+180E Mongolian vowel separator - U+2000 En quad - U+2001 Em quad - U+2002 En space - U+2003 Em space - U+2004 Three-per-em space - U+2005 Four-per-em space - U+2006 Six-per-em space - U+2007 Figure space - U+2008 Punctuation space - U+2009 Thin space - U+200A Hair space - U+202F Narrow no-break space - U+205F Medium mathematical space - U+3000 Ideographic space -.sp -The vertical space characters are: -.sp - U+000A Linefeed (LF) - U+000B Vertical tab (VT) - U+000C Form feed (FF) - U+000D Carriage return (CR) - U+0085 Next line (NEL) - U+2028 Line separator - U+2029 Paragraph separator -.sp -In 8-bit, non-UTF-8 mode, only the characters with codepoints less than 256 are -relevant. -. -. -.\" HTML -.SS "Newline sequences" -.rs -.sp -Outside a character class, by default, the escape sequence \eR matches any -Unicode newline sequence. In 8-bit non-UTF-8 mode \eR is equivalent to the -following: -.sp - (?>\er\en|\en|\ex0b|\ef|\er|\ex85) -.sp -This is an example of an "atomic group", details of which are given -.\" HTML -.\" -below. 
-.\" -This particular group matches either the two-character sequence CR followed by -LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, -U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next -line, U+0085). The two-character sequence is treated as a single unit that -cannot be split. -.P -In other modes, two additional characters whose codepoints are greater than 255 -are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). -Unicode character property support is not needed for these characters to be -recognized. -.P -It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the -complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF -either at compile time or when the pattern is matched. (BSR is an abbrevation -for "backslash R".) This can be made the default when PCRE is built; if this is -the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option. -It is also possible to specify these settings by starting a pattern string with -one of the following sequences: -.sp - (*BSR_ANYCRLF) CR, LF, or CRLF only - (*BSR_UNICODE) any Unicode newline sequence -.sp -These override the default and the options given to the compiling function, but -they can themselves be overridden by options given to a matching function. Note -that these special settings, which are not Perl-compatible, are recognized only -at the very start of a pattern, and that they must be in upper case. If more -than one of them is present, the last one is used. They can be combined with a -change of newline convention; for example, a pattern can start with: -.sp - (*ANY)(*BSR_ANYCRLF) -.sp -They can also be combined with the (*UTF8), (*UTF16), (*UTF32), (*UTF) or -(*UCP) special sequences. Inside a character class, \eR is treated as an -unrecognized escape sequence, and so matches the letter "R" by default, but -causes an error if PCRE_EXTRA is set. -. -. 
-.\" HTML -.SS Unicode character properties -.rs -.sp -When PCRE is built with Unicode character property support, three additional -escape sequences that match characters with specific properties are available. -When in 8-bit non-UTF-8 mode, these sequences are of course limited to testing -characters whose codepoints are less than 256, but they do work in this mode. -The extra escape sequences are: -.sp - \ep{\fIxx\fP} a character with the \fIxx\fP property - \eP{\fIxx\fP} a character without the \fIxx\fP property - \eX a Unicode extended grapheme cluster -.sp -The property names represented by \fIxx\fP above are limited to the Unicode -script names, the general category properties, "Any", which matches any -character (including newline), and some special PCRE properties (described -in the -.\" HTML -.\" -next section). -.\" -Other Perl properties such as "InMusicalSymbols" are not currently supported by -PCRE. Note that \eP{Any} does not match any characters, so always causes a -match failure. -.P -Sets of Unicode characters are defined as belonging to certain scripts. A -character from one of these sets can be matched using a script name. For -example: -.sp - \ep{Greek} - \eP{Han} -.sp -Those that are not part of an identified script are lumped together as -"Common". 
The current list of scripts is: -.P -Arabic, -Armenian, -Avestan, -Balinese, -Bamum, -Bassa_Vah, -Batak, -Bengali, -Bopomofo, -Brahmi, -Braille, -Buginese, -Buhid, -Canadian_Aboriginal, -Carian, -Caucasian_Albanian, -Chakma, -Cham, -Cherokee, -Common, -Coptic, -Cuneiform, -Cypriot, -Cyrillic, -Deseret, -Devanagari, -Duployan, -Egyptian_Hieroglyphs, -Elbasan, -Ethiopic, -Georgian, -Glagolitic, -Gothic, -Grantha, -Greek, -Gujarati, -Gurmukhi, -Han, -Hangul, -Hanunoo, -Hebrew, -Hiragana, -Imperial_Aramaic, -Inherited, -Inscriptional_Pahlavi, -Inscriptional_Parthian, -Javanese, -Kaithi, -Kannada, -Katakana, -Kayah_Li, -Kharoshthi, -Khmer, -Khojki, -Khudawadi, -Lao, -Latin, -Lepcha, -Limbu, -Linear_A, -Linear_B, -Lisu, -Lycian, -Lydian, -Mahajani, -Malayalam, -Mandaic, -Manichaean, -Meetei_Mayek, -Mende_Kikakui, -Meroitic_Cursive, -Meroitic_Hieroglyphs, -Miao, -Modi, -Mongolian, -Mro, -Myanmar, -Nabataean, -New_Tai_Lue, -Nko, -Ogham, -Ol_Chiki, -Old_Italic, -Old_North_Arabian, -Old_Permic, -Old_Persian, -Old_South_Arabian, -Old_Turkic, -Oriya, -Osmanya, -Pahawh_Hmong, -Palmyrene, -Pau_Cin_Hau, -Phags_Pa, -Phoenician, -Psalter_Pahlavi, -Rejang, -Runic, -Samaritan, -Saurashtra, -Sharada, -Shavian, -Siddham, -Sinhala, -Sora_Sompeng, -Sundanese, -Syloti_Nagri, -Syriac, -Tagalog, -Tagbanwa, -Tai_Le, -Tai_Tham, -Tai_Viet, -Takri, -Tamil, -Telugu, -Thaana, -Thai, -Tibetan, -Tifinagh, -Tirhuta, -Ugaritic, -Vai, -Warang_Citi, -Yi. -.P -Each character has exactly one Unicode general category property, specified by -a two-letter abbreviation. For compatibility with Perl, negation can be -specified by including a circumflex between the opening brace and the property -name. For example, \ep{^Lu} is the same as \eP{Lu}. -.P -If only one letter is specified with \ep or \eP, it includes all the general -category properties that start with that letter. 
In this case, in the absence -of negation, the curly brackets in the escape sequence are optional; these two -examples have the same effect: -.sp - \ep{L} - \epL -.sp -The following general category property codes are supported: -.sp - C Other - Cc Control - Cf Format - Cn Unassigned - Co Private use - Cs Surrogate -.sp - L Letter - Ll Lower case letter - Lm Modifier letter - Lo Other letter - Lt Title case letter - Lu Upper case letter -.sp - M Mark - Mc Spacing mark - Me Enclosing mark - Mn Non-spacing mark -.sp - N Number - Nd Decimal number - Nl Letter number - No Other number -.sp - P Punctuation - Pc Connector punctuation - Pd Dash punctuation - Pe Close punctuation - Pf Final punctuation - Pi Initial punctuation - Po Other punctuation - Ps Open punctuation -.sp - S Symbol - Sc Currency symbol - Sk Modifier symbol - Sm Mathematical symbol - So Other symbol -.sp - Z Separator - Zl Line separator - Zp Paragraph separator - Zs Space separator -.sp -The special property L& is also supported: it matches a character that has -the Lu, Ll, or Lt property, in other words, a letter that is not classified as -a modifier or "other". -.P -The Cs (Surrogate) property applies only to characters in the range U+D800 to -U+DFFF. Such characters are not valid in Unicode strings and so -cannot be tested by PCRE, unless UTF validity checking has been turned off -(see the discussion of PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK and -PCRE_NO_UTF32_CHECK in the -.\" HREF -\fBpcreapi\fP -.\" -page). Perl does not support the Cs property. -.P -The long synonyms for property names that Perl supports (such as \ep{Letter}) -are not supported by PCRE, nor is it permitted to prefix any of these -properties with "Is". -.P -No character that is in the Unicode table has the Cn (unassigned) property. -Instead, this property is assumed for any code point that is not in the -Unicode table. -.P -Specifying caseless matching does not affect these escape sequences. 
For -example, \ep{Lu} always matches only upper case letters. This is different from -the behaviour of current versions of Perl. -.P -Matching characters by Unicode property is not fast, because PCRE has to do a -multistage table lookup in order to find a character's property. That is why -the traditional escape sequences such as \ed and \ew do not use Unicode -properties in PCRE by default, though you can make them do so by setting the -PCRE_UCP option or by starting the pattern with (*UCP). -. -. -.SS Extended grapheme clusters -.rs -.sp -The \eX escape matches any number of Unicode characters that form an "extended -grapheme cluster", and treats the sequence as an atomic group -.\" HTML -.\" -(see below). -.\" -Up to and including release 8.31, PCRE matched an earlier, simpler definition -that was equivalent to -.sp - (?>\ePM\epM*) -.sp -That is, it matched a character without the "mark" property, followed by zero -or more characters with the "mark" property. Characters with the "mark" -property are typically non-spacing accents that affect the preceding character. -.P -This simple definition was extended in Unicode to include more complicated -kinds of composite character by giving each character a grapheme breaking -property, and creating rules that use these properties to define the boundaries -of extended grapheme clusters. In releases of PCRE later than 8.31, \eX matches -one of these clusters. -.P -\eX always matches at least one character. Then it decides whether to add -additional characters according to the following rules for ending a cluster: -.P -1. End at the end of the subject string. -.P -2. Do not end between CR and LF; otherwise end after any control character. -.P -3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters -are of five types: L, V, T, LV, and LVT. 
An L character may be followed by an -L, V, LV, or LVT character; an LV or V character may be followed by a V or T -character; an LVT or T character may be follwed only by a T character. -.P -4. Do not end before extending characters or spacing marks. Characters with -the "mark" property always have the "extend" grapheme breaking property. -.P -5. Do not end after prepend characters. -.P -6. Otherwise, end the cluster. -. -. -.\" HTML -.SS PCRE's additional properties -.rs -.sp -As well as the standard Unicode properties described above, PCRE supports four -more that make it possible to convert traditional escape sequences such as \ew -and \es to use Unicode properties. PCRE uses these non-standard, non-Perl -properties internally when PCRE_UCP is set. However, they may also be used -explicitly. These properties are: -.sp - Xan Any alphanumeric character - Xps Any POSIX space character - Xsp Any Perl space character - Xwd Any Perl "word" character -.sp -Xan matches characters that have either the L (letter) or the N (number) -property. Xps matches the characters tab, linefeed, vertical tab, form feed, or -carriage return, and any other character that has the Z (separator) property. -Xsp is the same as Xps; it used to exclude vertical tab, for Perl -compatibility, but Perl changed, and so PCRE followed at release 8.34. Xwd -matches the same characters as Xan, plus underscore. -.P -There is another non-standard property, Xuc, which matches any character that -can be represented by a Universal Character Name in C++ and other programming -languages. These are the characters $, @, ` (grave accent), and all characters -with Unicode code points greater than or equal to U+00A0, except for the -surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are -excluded. (Universal Character Names are of the form \euHHHH or \eUHHHHHHHH -where H is a hexadecimal digit. Note that the Xuc property does not match these -sequences but the characters that they represent.) -. 
-. -.\" HTML -.SS "Resetting the match start" -.rs -.sp -The escape sequence \eK causes any previously matched characters not to be -included in the final matched sequence. For example, the pattern: -.sp - foo\eKbar -.sp -matches "foobar", but reports that it has matched "bar". This feature is -similar to a lookbehind assertion -.\" HTML -.\" -(described below). -.\" -However, in this case, the part of the subject before the real match does not -have to be of fixed length, as lookbehind assertions do. The use of \eK does -not interfere with the setting of -.\" HTML -.\" -captured substrings. -.\" -For example, when the pattern -.sp - (foo)\eKbar -.sp -matches "foobar", the first substring is still set to "foo". -.P -Perl documents that the use of \eK within assertions is "not well defined". In -PCRE, \eK is acted upon when it occurs inside positive assertions, but is -ignored in negative assertions. Note that when a pattern such as (?=ab\eK) -matches, the reported start of the match can be greater than the end of the -match. -. -. -.\" HTML -.SS "Simple assertions" -.rs -.sp -The final use of backslash is for certain simple assertions. An assertion -specifies a condition that has to be met at a particular point in a match, -without consuming any characters from the subject string. The use of -subpatterns for more complicated assertions is described -.\" HTML -.\" -below. -.\" -The backslashed assertions are: -.sp - \eb matches at a word boundary - \eB matches when not at a word boundary - \eA matches at the start of the subject - \eZ matches at the end of the subject - also matches before a newline at the end of the subject - \ez matches only at the end of the subject - \eG matches at the first matching position in the subject -.sp -Inside a character class, \eb has a different meaning; it matches the backspace -character. 
If any other of these assertions appears in a character class, by -default it matches the corresponding literal character (for example, \eB -matches the letter B). However, if the PCRE_EXTRA option is set, an "invalid -escape sequence" error is generated instead. -.P -A word boundary is a position in the subject string where the current character -and the previous character do not both match \ew or \eW (i.e. one matches -\ew and the other matches \eW), or the start or end of the string if the -first or last character matches \ew, respectively. In a UTF mode, the meanings -of \ew and \eW can be changed by setting the PCRE_UCP option. When this is -done, it also affects \eb and \eB. Neither PCRE nor Perl has a separate "start -of word" or "end of word" metasequence. However, whatever follows \eb normally -determines which it is. For example, the fragment \eba matches "a" at the start -of a word. -.P -The \eA, \eZ, and \ez assertions differ from the traditional circumflex and -dollar (described in the next section) in that they only ever match at the very -start and end of the subject string, whatever options are set. Thus, they are -independent of multiline mode. These three assertions are not affected by the -PCRE_NOTBOL or PCRE_NOTEOL options, which affect only the behaviour of the -circumflex and dollar metacharacters. However, if the \fIstartoffset\fP -argument of \fBpcre_exec()\fP is non-zero, indicating that matching is to start -at a point other than the beginning of the subject, \eA can never match. The -difference between \eZ and \ez is that \eZ matches before a newline at the end -of the string as well as at the very end, whereas \ez matches only at the end. -.P -The \eG assertion is true only when the current matching position is at the -start point of the match, as specified by the \fIstartoffset\fP argument of -\fBpcre_exec()\fP. It differs from \eA when the value of \fIstartoffset\fP is -non-zero. 
By calling \fBpcre_exec()\fP multiple times with appropriate -arguments, you can mimic Perl's /g option, and it is in this kind of -implementation where \eG can be useful. -.P -Note, however, that PCRE's interpretation of \eG, as the start of the current -match, is subtly different from Perl's, which defines it as the end of the -previous match. In Perl, these can be different when the previously matched -string was empty. Because PCRE does just one match at a time, it cannot -reproduce this behaviour. -.P -If all the alternatives of a pattern begin with \eG, the expression is anchored -to the starting match position, and the "anchored" flag is set in the compiled -regular expression. -. -. -.SH "CIRCUMFLEX AND DOLLAR" -.rs -.sp -The circumflex and dollar metacharacters are zero-width assertions. That is, -they test for a particular condition being true without consuming any -characters from the subject string. -.P -Outside a character class, in the default matching mode, the circumflex -character is an assertion that is true only if the current matching point is at -the start of the subject string. If the \fIstartoffset\fP argument of -\fBpcre_exec()\fP is non-zero, circumflex can never match if the PCRE_MULTILINE -option is unset. Inside a character class, circumflex has an entirely different -meaning -.\" HTML -.\" -(see below). -.\" -.P -Circumflex need not be the first character of the pattern if a number of -alternatives are involved, but it should be the first thing in each alternative -in which it appears if the pattern is ever to match that branch. If all -possible alternatives start with a circumflex, that is, if the pattern is -constrained to match only at the start of the subject, it is said to be an -"anchored" pattern. (There are also other constructs that can cause a pattern -to be anchored.) 
-.P -The dollar character is an assertion that is true only if the current matching -point is at the end of the subject string, or immediately before a newline at -the end of the string (by default). Note, however, that it does not actually -match the newline. Dollar need not be the last character of the pattern if a -number of alternatives are involved, but it should be the last item in any -branch in which it appears. Dollar has no special meaning in a character class. -.P -The meaning of dollar can be changed so that it matches only at the very end of -the string, by setting the PCRE_DOLLAR_ENDONLY option at compile time. This -does not affect the \eZ assertion. -.P -The meanings of the circumflex and dollar characters are changed if the -PCRE_MULTILINE option is set. When this is the case, a circumflex matches -immediately after internal newlines as well as at the start of the subject -string. It does not match after a newline that ends the string. A dollar -matches before any newlines in the string, as well as at the very end, when -PCRE_MULTILINE is set. When newline is specified as the two-character -sequence CRLF, isolated CR and LF characters do not indicate newlines. -.P -For example, the pattern /^abc$/ matches the subject string "def\enabc" (where -\en represents a newline) in multiline mode, but not otherwise. Consequently, -patterns that are anchored in single line mode because all branches start with -^ are not anchored in multiline mode, and a match for circumflex is possible -when the \fIstartoffset\fP argument of \fBpcre_exec()\fP is non-zero. The -PCRE_DOLLAR_ENDONLY option is ignored if PCRE_MULTILINE is set. -.P -Note that the sequences \eA, \eZ, and \ez can be used to match the start and -end of the subject in both modes, and if all branches of a pattern start with -\eA it is always anchored, whether or not PCRE_MULTILINE is set. -. -. 
-.\" HTML -.SH "FULL STOP (PERIOD, DOT) AND \eN" -.rs -.sp -Outside a character class, a dot in the pattern matches any one character in -the subject string except (by default) a character that signifies the end of a -line. -.P -When a line ending is defined as a single character, dot never matches that -character; when the two-character sequence CRLF is used, dot does not match CR -if it is immediately followed by LF, but otherwise it matches all characters -(including isolated CRs and LFs). When any Unicode line endings are being -recognized, dot does not match CR or LF or any of the other line ending -characters. -.P -The behaviour of dot with regard to newlines can be changed. If the PCRE_DOTALL -option is set, a dot matches any one character, without exception. If the -two-character sequence CRLF is present in the subject string, it takes two dots -to match it. -.P -The handling of dot is entirely independent of the handling of circumflex and -dollar, the only relationship being that they both involve newlines. Dot has no -special meaning in a character class. -.P -The escape sequence \eN behaves like a dot, except that it is not affected by -the PCRE_DOTALL option. In other words, it matches any character except one -that signifies the end of a line. Perl also uses \eN to match characters by -name; PCRE does not support this. -. -. -.SH "MATCHING A SINGLE DATA UNIT" -.rs -.sp -Outside a character class, the escape sequence \eC matches any one data unit, -whether or not a UTF mode is set. In the 8-bit library, one data unit is one -byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is -a 32-bit unit. Unlike a dot, \eC always -matches line-ending characters. The feature is provided in Perl in order to -match individual bytes in UTF-8 mode, but it is unclear how it can usefully be -used. 
Because \eC breaks up characters into individual data units, matching one -unit with \eC in a UTF mode means that the rest of the string may start with a -malformed UTF character. This has undefined results, because PCRE assumes that -it is dealing with valid UTF strings (and by default it checks this at the -start of processing unless the PCRE_NO_UTF8_CHECK, PCRE_NO_UTF16_CHECK or -PCRE_NO_UTF32_CHECK option is used). -.P -PCRE does not allow \eC to appear in lookbehind assertions -.\" HTML -.\" -(described below) -.\" -in a UTF mode, because this would make it impossible to calculate the length of -the lookbehind. -.P -In general, the \eC escape sequence is best avoided. However, one -way of using it that avoids the problem of malformed UTF characters is to use a -lookahead to check the length of the next character, as in this pattern, which -could be used with a UTF-8 string (ignore white space and line breaks): -.sp - (?| (?=[\ex00-\ex7f])(\eC) | - (?=[\ex80-\ex{7ff}])(\eC)(\eC) | - (?=[\ex{800}-\ex{ffff}])(\eC)(\eC)(\eC) | - (?=[\ex{10000}-\ex{1fffff}])(\eC)(\eC)(\eC)(\eC)) -.sp -A group that starts with (?| resets the capturing parentheses numbers in each -alternative (see -.\" HTML -.\" -"Duplicate Subpattern Numbers" -.\" -below). The assertions at the start of each branch check the next UTF-8 -character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The -character's individual bytes are then captured by the appropriate number of -groups. -. -. -.\" HTML -.SH "SQUARE BRACKETS AND CHARACTER CLASSES" -.rs -.sp -An opening square bracket introduces a character class, terminated by a closing -square bracket. A closing square bracket on its own is not special by default. -However, if the PCRE_JAVASCRIPT_COMPAT option is set, a lone closing square -bracket causes a compile-time error. 
If a closing square bracket is required as -a member of the class, it should be the first data character in the class -(after an initial circumflex, if present) or escaped with a backslash. -.P -A character class matches a single character in the subject. In a UTF mode, the -character may be more than one data unit long. A matched character must be in -the set of characters defined by the class, unless the first character in the -class definition is a circumflex, in which case the subject character must not -be in the set defined by the class. If a circumflex is actually required as a -member of the class, ensure it is not the first character, or escape it with a -backslash. -.P -For example, the character class [aeiou] matches any lower case vowel, while -[^aeiou] matches any character that is not a lower case vowel. Note that a -circumflex is just a convenient notation for specifying the characters that -are in the class by enumerating those that are not. A class that starts with a -circumflex is not an assertion; it still consumes a character from the subject -string, and therefore it fails if the current pointer is at the end of the -string. -.P -In UTF-8 (UTF-16, UTF-32) mode, characters with values greater than 255 (0xffff) -can be included in a class as a literal string of data units, or by using the -\ex{ escaping mechanism. -.P -When caseless matching is set, any letters in a class represent both their -upper case and lower case versions, so for example, a caseless [aeiou] matches -"A" as well as "a", and a caseless [^aeiou] does not match "A", whereas a -caseful version would. In a UTF mode, PCRE always understands the concept of -case for characters whose values are less than 128, so caseless matching is -always possible. For characters with higher values, the concept of case is -supported if PCRE is compiled with Unicode property support, but not otherwise. 
-If you want to use caseless matching in a UTF mode for characters 128 and -above, you must ensure that PCRE is compiled with Unicode property support as -well as with UTF support. -.P -Characters that might indicate line breaks are never treated in any special way -when matching character classes, whatever line-ending sequence is in use, and -whatever setting of the PCRE_DOTALL and PCRE_MULTILINE options is used. A class -such as [^a] always matches one of these characters. -.P -The minus (hyphen) character can be used to specify a range of characters in a -character class. For example, [d-m] matches any letter between d and m, -inclusive. If a minus character is required in a class, it must be escaped with -a backslash or appear in a position where it cannot be interpreted as -indicating a range, typically as the first or last character in the class, or -immediately after a range. For example, [b-d-z] matches letters in the range b -to d, a hyphen character, or z. -.P -It is not possible to have the literal character "]" as the end character of a -range. A pattern such as [W-]46] is interpreted as a class of two characters -("W" and "-") followed by a literal string "46]", so it would match "W46]" or -"-46]". However, if the "]" is escaped with a backslash it is interpreted as -the end of range, so [W-\e]46] is interpreted as a class containing a range -followed by two other characters. The octal or hexadecimal representation of -"]" can also be used to end a range. -.P -An error is generated if a POSIX character class (see below) or an escape -sequence other than one that defines a single character appears at a point -where a range ending character is expected. For example, [z-\exff] is valid, -but [A-\ed] and [A-[:digit:]] are not. -.P -Ranges operate in the collating sequence of character values. They can also be -used for characters specified numerically, for example [\e000-\e037]. Ranges -can include any characters that are valid for the current mode. 
-.P -If a range that includes letters is used when caseless matching is set, it -matches the letters in either case. For example, [W-c] is equivalent to -[][\e\e^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character -tables for a French locale are in use, [\exc8-\excb] matches accented E -characters in both cases. In UTF modes, PCRE supports the concept of case for -characters with values greater than 128 only when it is compiled with Unicode -property support. -.P -The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev, -\eV, \ew, and \eW may appear in a character class, and add the characters that -they match to the class. For example, [\edABCDEF] matches any hexadecimal -digit. In UTF modes, the PCRE_UCP option affects the meanings of \ed, \es, \ew -and their upper case partners, just as it does when they appear outside a -character class, as described in the section entitled -.\" HTML -.\" -"Generic character types" -.\" -above. The escape sequence \eb has a different meaning inside a character -class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX -are not special inside a character class. Like any other unrecognized escape -sequences, they are treated as the literal characters "B", "N", "R", and "X" by -default, but cause an error if the PCRE_EXTRA option is set. -.P -A circumflex can conveniently be used with the upper case character types to -specify a more restricted set of characters than the matching lower case type. -For example, the class [^\eW_] matches any letter or digit, but not underscore, -whereas [\ew] includes underscore. A positive character class should be read as -"something OR something OR ..." and a negative class as "NOT something AND NOT -something AND NOT ...". 
-.P -The only metacharacters that are recognized in character classes are backslash, -hyphen (only where it can be interpreted as specifying a range), circumflex -(only at the start), opening square bracket (only when it can be interpreted as -introducing a POSIX class name, or for a special compatibility feature - see -the next two sections), and the terminating closing square bracket. However, -escaping other non-alphanumeric characters does no harm. -. -. -.SH "POSIX CHARACTER CLASSES" -.rs -.sp -Perl supports the POSIX notation for character classes. This uses names -enclosed by [: and :] within the enclosing square brackets. PCRE also supports -this notation. For example, -.sp - [01[:alpha:]%] -.sp -matches "0", "1", any alphabetic character, or "%". The supported class names -are: -.sp - alnum letters and digits - alpha letters - ascii character codes 0 - 127 - blank space or tab only - cntrl control characters - digit decimal digits (same as \ed) - graph printing characters, excluding space - lower lower case letters - print printing characters, including space - punct printing characters, excluding letters and digits and space - space white space (the same as \es from PCRE 8.34) - upper upper case letters - word "word" characters (same as \ew) - xdigit hexadecimal digits -.sp -The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), -and space (32). If locale-specific matching is taking place, the list of space -characters may be different; there may be fewer or more of them. "Space" used -to be different to \es, which did not include VT, for Perl compatibility. -However, Perl changed at release 5.18, and PCRE followed at release 8.34. -"Space" and \es now match the same set of characters. -.P -The name "word" is a Perl extension, and "blank" is a GNU extension from Perl -5.8. Another Perl extension is negation, which is indicated by a ^ character -after the colon. 
For example, -.sp - [12[:^digit:]] -.sp -matches "1", "2", or any non-digit. PCRE (and Perl) also recognize the POSIX -syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not -supported, and an error is given if they are encountered. -.P -By default, characters with values greater than 128 do not match any of the -POSIX character classes. However, if the PCRE_UCP option is passed to -\fBpcre_compile()\fP, some of the classes are changed so that Unicode character -properties are used. This is achieved by replacing certain POSIX classes by -other sequences, as follows: -.sp - [:alnum:] becomes \ep{Xan} - [:alpha:] becomes \ep{L} - [:blank:] becomes \eh - [:digit:] becomes \ep{Nd} - [:lower:] becomes \ep{Ll} - [:space:] becomes \ep{Xps} - [:upper:] becomes \ep{Lu} - [:word:] becomes \ep{Xwd} -.sp -Negated versions, such as [:^alpha:] use \eP instead of \ep. Three other POSIX -classes are handled specially in UCP mode: -.TP 10 -[:graph:] -This matches characters that have glyphs that mark the page when printed. In -Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf -properties, except for: -.sp - U+061C Arabic Letter Mark - U+180E Mongolian Vowel Separator - U+2066 - U+2069 Various "isolate"s -.sp -.TP 10 -[:print:] -This matches the same characters as [:graph:] plus space characters that are -not controls, that is, characters with the Zs property. -.TP 10 -[:punct:] -This matches all characters that have the Unicode P (punctuation) property, -plus those characters whose code points are less than 128 that have the S -(Symbol) property. -.P -The other POSIX classes are unchanged, and match only characters with code -points less than 128. -. -. -.SH "COMPATIBILITY FEATURE FOR WORD BOUNDARIES" -.rs -.sp -In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly -syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of -word". 
PCRE treats these items as follows: -.sp - [[:<:]] is converted to \eb(?=\ew) - [[:>:]] is converted to \eb(?<=\ew) -.sp -Only these exact character sequences are recognized. A sequence such as -[a[:<:]b] provokes error for an unrecognized POSIX class name. This support is -not compatible with Perl. It is provided to help migrations from other -environments, and is best not used in any new patterns. Note that \eb matches -at the start and the end of a word (see -.\" HTML -.\" -"Simple assertions" -.\" -above), and in a Perl-style pattern the preceding or following character -normally shows which is wanted, without the need for the assertions that are -used above in order to give exactly the POSIX behaviour. -. -. -.SH "VERTICAL BAR" -.rs -.sp -Vertical bar characters are used to separate alternative patterns. For example, -the pattern -.sp - gilbert|sullivan -.sp -matches either "gilbert" or "sullivan". Any number of alternatives may appear, -and an empty alternative is permitted (matching the empty string). The matching -process tries each alternative in turn, from left to right, and the first one -that succeeds is used. If the alternatives are within a subpattern -.\" HTML -.\" -(defined below), -.\" -"succeeds" means matching the rest of the main pattern as well as the -alternative in the subpattern. -. -. -.SH "INTERNAL OPTION SETTING" -.rs -.sp -The settings of the PCRE_CASELESS, PCRE_MULTILINE, PCRE_DOTALL, and -PCRE_EXTENDED options (which are Perl-compatible) can be changed from within -the pattern by a sequence of Perl option letters enclosed between "(?" and ")". -The option letters are -.sp - i for PCRE_CASELESS - m for PCRE_MULTILINE - s for PCRE_DOTALL - x for PCRE_EXTENDED -.sp -For example, (?im) sets caseless, multiline matching. 
It is also possible to -unset these options by preceding the letter with a hyphen, and a combined -setting and unsetting such as (?im-sx), which sets PCRE_CASELESS and -PCRE_MULTILINE while unsetting PCRE_DOTALL and PCRE_EXTENDED, is also -permitted. If a letter appears both before and after the hyphen, the option is -unset. -.P -The PCRE-specific options PCRE_DUPNAMES, PCRE_UNGREEDY, and PCRE_EXTRA can be -changed in the same way as the Perl-compatible options by using the characters -J, U and X respectively. -.P -When one of these option changes occurs at top level (that is, not inside -subpattern parentheses), the change applies to the remainder of the pattern -that follows. An option change within a subpattern (see below for a description -of subpatterns) affects only that part of the subpattern that follows it, so -.sp - (a(?i)b)c -.sp -matches abc and aBc and no other strings (assuming PCRE_CASELESS is not used). -By this means, options can be made to have different settings in different -parts of the pattern. Any changes made in one alternative do carry on -into subsequent branches within the same subpattern. For example, -.sp - (a(?i)b|c) -.sp -matches "ab", "aB", "c", and "C", even though when matching "C" the first -branch is abandoned before the option setting. This is because the effects of -option settings happen at compile time. There would be some very weird -behaviour otherwise. -.P -\fBNote:\fP There are other PCRE-specific options that can be set by the -application when the compiling or matching functions are called. In some cases -the pattern can contain special leading sequences such as (*CRLF) to override -what the application has set or what has been defaulted. Details are given in -the section entitled -.\" HTML -.\" -"Newline sequences" -.\" -above. 
There are also the (*UTF8), (*UTF16),(*UTF32), and (*UCP) leading -sequences that can be used to set UTF and Unicode property modes; they are -equivalent to setting the PCRE_UTF8, PCRE_UTF16, PCRE_UTF32 and the PCRE_UCP -options, respectively. The (*UTF) sequence is a generic version that can be -used with any of the libraries. However, the application can set the -PCRE_NEVER_UTF option, which locks out the use of the (*UTF) sequences. -. -. -.\" HTML -.SH SUBPATTERNS -.rs -.sp -Subpatterns are delimited by parentheses (round brackets), which can be nested. -Turning part of a pattern into a subpattern does two things: -.sp -1. It localizes a set of alternatives. For example, the pattern -.sp - cat(aract|erpillar|) -.sp -matches "cataract", "caterpillar", or "cat". Without the parentheses, it would -match "cataract", "erpillar" or an empty string. -.sp -2. It sets up the subpattern as a capturing subpattern. This means that, when -the whole pattern matches, that portion of the subject string that matched the -subpattern is passed back to the caller via the \fIovector\fP argument of the -matching function. (This applies only to the traditional matching functions; -the DFA matching functions do not support capturing.) -.P -Opening parentheses are counted from left to right (starting from 1) to obtain -numbers for the capturing subpatterns. For example, if the string "the red -king" is matched against the pattern -.sp - the ((red|white) (king|queen)) -.sp -the captured substrings are "red king", "red", and "king", and are numbered 1, -2, and 3, respectively. -.P -The fact that plain parentheses fulfil two functions is not always helpful. -There are often times when a grouping subpattern is required without a -capturing requirement. If an opening parenthesis is followed by a question mark -and a colon, the subpattern does not do any capturing, and is not counted when -computing the number of any subsequent capturing subpatterns. 
For example, if -the string "the white queen" is matched against the pattern -.sp - the ((?:red|white) (king|queen)) -.sp -the captured substrings are "white queen" and "queen", and are numbered 1 and -2. The maximum number of capturing subpatterns is 65535. -.P -As a convenient shorthand, if any option settings are required at the start of -a non-capturing subpattern, the option letters may appear between the "?" and -the ":". Thus the two patterns -.sp - (?i:saturday|sunday) - (?:(?i)saturday|sunday) -.sp -match exactly the same set of strings. Because alternative branches are tried -from left to right, and options are not reset until the end of the subpattern -is reached, an option setting in one branch does affect subsequent branches, so -the above patterns match "SUNDAY" as well as "Saturday". -. -. -.\" HTML -.SH "DUPLICATE SUBPATTERN NUMBERS" -.rs -.sp -Perl 5.10 introduced a feature whereby each alternative in a subpattern uses -the same numbers for its capturing parentheses. Such a subpattern starts with -(?| and is itself a non-capturing subpattern. For example, consider this -pattern: -.sp - (?|(Sat)ur|(Sun))day -.sp -Because the two alternatives are inside a (?| group, both sets of capturing -parentheses are numbered one. Thus, when the pattern matches, you can look -at captured substring number one, whichever alternative matched. This construct -is useful when you want to capture part, but not all, of one of a number of -alternatives. Inside a (?| group, parentheses are numbered as usual, but the -number is reset at the start of each branch. The numbers of any capturing -parentheses that follow the subpattern start after the highest number used in -any branch. The following example is taken from the Perl documentation. The -numbers underneath show in which buffer the captured content will be stored. 
-.sp - # before ---------------branch-reset----------- after - / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x - # 1 2 2 3 2 3 4 -.sp -A back reference to a numbered subpattern uses the most recent value that is -set for that number by any subpattern. The following pattern matches "abcabc" -or "defdef": -.sp - /(?|(abc)|(def))\e1/ -.sp -In contrast, a subroutine call to a numbered subpattern always refers to the -first one in the pattern with the given number. The following pattern matches -"abcabc" or "defabc": -.sp - /(?|(abc)|(def))(?1)/ -.sp -If a -.\" HTML -.\" -condition test -.\" -for a subpattern's having matched refers to a non-unique number, the test is -true if any of the subpatterns of that number have matched. -.P -An alternative approach to using this "branch reset" feature is to use -duplicate named subpatterns, as described in the next section. -. -. -.SH "NAMED SUBPATTERNS" -.rs -.sp -Identifying capturing parentheses by number is simple, but it can be very hard -to keep track of the numbers in complicated regular expressions. Furthermore, -if an expression is modified, the numbers may change. To help with this -difficulty, PCRE supports the naming of subpatterns. This feature was not -added to Perl until release 5.10. Python had the feature earlier, and PCRE -introduced it at release 4.0, using the Python syntax. PCRE now supports both -the Perl and the Python syntax. Perl allows identically numbered subpatterns to -have different names, but PCRE does not. -.P -In PCRE, a subpattern can be named in one of three ways: (?<name>...) or -(?'name'...) as in Perl, or (?P<name>...) as in Python. References to capturing -parentheses from other parts of the pattern, such as -.\" HTML -.\" -back references, -.\" -.\" HTML -.\" -recursion, -.\" -and -.\" HTML -.\" -conditions, -.\" -can be made by name as well as by number. -.P -Names consist of up to 32 alphanumeric characters and underscores, but must -start with a non-digit.
Named capturing parentheses are still allocated numbers -as well as names, exactly as if the names were not present. The PCRE API -provides function calls for extracting the name-to-number translation table -from a compiled pattern. There is also a convenience function for extracting a -captured substring by name. -.P -By default, a name must be unique within a pattern, but it is possible to relax -this constraint by setting the PCRE_DUPNAMES option at compile time. (Duplicate -names are also always permitted for subpatterns with the same number, set up as -described in the previous section.) Duplicate names can be useful for patterns -where only one instance of the named parentheses can match. Suppose you want to -match the name of a weekday, either as a 3-letter abbreviation or as the full -name, and in both cases you want to extract the abbreviation. This pattern -(ignoring the line breaks) does the job: -.sp - (?<DN>Mon|Fri|Sun)(?:day)?| - (?<DN>Tue)(?:sday)?| - (?<DN>Wed)(?:nesday)?| - (?<DN>Thu)(?:rsday)?| - (?<DN>Sat)(?:urday)? -.sp -There are five capturing substrings, but only one is ever set after a match. -(An alternative way of solving this problem is to use a "branch reset" -subpattern, as described in the previous section.) -.P -The convenience function for extracting the data by name returns the substring -for the first (and in this example, the only) subpattern of that name that -matched. This saves searching to find which numbered subpattern it was. -.P -If you make a back reference to a non-unique named subpattern from elsewhere in -the pattern, the subpatterns to which the name refers are checked in the order -in which they appear in the overall pattern. The first one that is set is used -for the reference.
For example, this pattern matches both "foofoo" and -"barbar" but not "foobar" or "barfoo": -.sp - (?:(?<n>foo)|(?<n>bar))\ek<n> -.sp -.P -If you make a subroutine call to a non-unique named subpattern, the one that -corresponds to the first occurrence of the name is used. In the absence of -duplicate numbers (see the previous section) this is the one with the lowest -number. -.P -If you use a named reference in a condition -test (see the -.\" -.\" HTML -.\" -section about conditions -.\" -below), either to check whether a subpattern has matched, or to check for -recursion, all subpatterns with the same name are tested. If the condition is -true for any one of them, the overall condition is true. This is the same -behaviour as testing by number. For further details of the interfaces for -handling named subpatterns, see the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -.P -\fBWarning:\fP You cannot use different names to distinguish between two -subpatterns with the same number because PCRE uses only the numbers when -matching. For this reason, an error is given at compile time if different names -are given to subpatterns with the same number. However, you can always give the -same name to subpatterns with the same number, even when PCRE_DUPNAMES is not -set. -. -. -.SH REPETITION -.rs -.sp -Repetition is specified by quantifiers, which can follow any of the following -items: -.sp - a literal data character - the dot metacharacter - the \eC escape sequence - the \eX escape sequence - the \eR escape sequence - an escape such as \ed or \epL that matches a single character - a character class - a back reference (see next section) - a parenthesized subpattern (including assertions) - a subroutine call to a subpattern (recursive or otherwise) -.sp -The general repetition quantifier specifies a minimum and maximum number of -permitted matches, by giving the two numbers in curly brackets (braces), -separated by a comma.
The numbers must be less than 65536, and the first must -be less than or equal to the second. For example: -.sp - z{2,4} -.sp -matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special -character. If the second number is omitted, but the comma is present, there is -no upper limit; if the second number and the comma are both omitted, the -quantifier specifies an exact number of required matches. Thus -.sp - [aeiou]{3,} -.sp -matches at least 3 successive vowels, but may match many more, while -.sp - \ed{8} -.sp -matches exactly 8 digits. An opening curly bracket that appears in a position -where a quantifier is not allowed, or one that does not match the syntax of a -quantifier, is taken as a literal character. For example, {,6} is not a -quantifier, but a literal string of four characters. -.P -In UTF modes, quantifiers apply to characters rather than to individual data -units. Thus, for example, \ex{100}{2} matches two characters, each of -which is represented by a two-byte sequence in a UTF-8 string. Similarly, -\eX{3} matches three Unicode extended grapheme clusters, each of which may be -several data units long (and they may be of different lengths). -.P -The quantifier {0} is permitted, causing the expression to behave as if the -previous item and the quantifier were not present. This may be useful for -subpatterns that are referenced as -.\" HTML -.\" -subroutines -.\" -from elsewhere in the pattern (but see also the section entitled -.\" HTML -.\" -"Defining subpatterns for use by reference only" -.\" -below). Items other than subpatterns that have a {0} quantifier are omitted -from the compiled pattern. -.P -For convenience, the three most common quantifiers have single-character -abbreviations: -.sp - * is equivalent to {0,} - + is equivalent to {1,} - ? 
is equivalent to {0,1} -.sp -It is possible to construct infinite loops by following a subpattern that can -match no characters with a quantifier that has no upper limit, for example: -.sp - (a?)* -.sp -Earlier versions of Perl and PCRE used to give an error at compile time for -such patterns. However, because there are cases where this can be useful, such -patterns are now accepted, but if any repetition of the subpattern does in fact -match no characters, the loop is forcibly broken. -.P -By default, the quantifiers are "greedy", that is, they match as much as -possible (up to the maximum number of permitted times), without causing the -rest of the pattern to fail. The classic example of where this gives problems -is in trying to match comments in C programs. These appear between /* and */ -and within the comment, individual * and / characters may appear. An attempt to -match C comments by applying the pattern -.sp - /\e*.*\e*/ -.sp -to the string -.sp - /* first comment */ not comment /* second comment */ -.sp -fails, because it matches the entire string owing to the greediness of the .* -item. -.P -However, if a quantifier is followed by a question mark, it ceases to be -greedy, and instead matches the minimum number of times possible, so the -pattern -.sp - /\e*.*?\e*/ -.sp -does the right thing with the C comments. The meaning of the various -quantifiers is not otherwise changed, just the preferred number of matches. -Do not confuse this use of question mark with its use as a quantifier in its -own right. Because it has two uses, it can sometimes appear doubled, as in -.sp - \ed??\ed -.sp -which matches one digit by preference, but can match two if that is the only -way the rest of the pattern matches. -.P -If the PCRE_UNGREEDY option is set (an option that is not available in Perl), -the quantifiers are not greedy by default, but individual ones can be made -greedy by following them with a question mark. In other words, it inverts the -default behaviour. 
-.P -When a parenthesized subpattern is quantified with a minimum repeat count that -is greater than 1 or with a limited maximum, more memory is required for the -compiled pattern, in proportion to the size of the minimum or maximum. -.P -If a pattern starts with .* or .{0,} and the PCRE_DOTALL option (equivalent -to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is -implicitly anchored, because whatever follows will be tried against every -character position in the subject string, so there is no point in retrying the -overall match at any position after the first. PCRE normally treats such a -pattern as though it were preceded by \eA. -.P -In cases where it is known that the subject string contains no newlines, it is -worth setting PCRE_DOTALL in order to obtain this optimization, or -alternatively using ^ to indicate anchoring explicitly. -.P -However, there are some cases where the optimization cannot be used. When .* -is inside capturing parentheses that are the subject of a back reference -elsewhere in the pattern, a match at the start may fail where a later one -succeeds. Consider, for example: -.sp - (.*)abc\e1 -.sp -If the subject is "xyz123abc123" the match point is the fourth character. For -this reason, such a pattern is not implicitly anchored. -.P -Another case where implicit anchoring is not applied is when the leading .* is -inside an atomic group. Once again, a match at the start may fail where a later -one succeeds. Consider this pattern: -.sp - (?>.*?a)b -.sp -It matches "ab" in the subject "aab". The use of the backtracking control verbs -(*PRUNE) and (*SKIP) also disable this optimization. -.P -When a capturing subpattern is repeated, the value captured is the substring -that matched the final iteration. For example, after -.sp - (tweedle[dume]{3}\es*)+ -.sp -has matched "tweedledum tweedledee" the value of the captured substring is -"tweedledee". 
However, if there are nested capturing subpatterns, the -corresponding captured values may have been set in previous iterations. For -example, after -.sp - /(a|(b))+/ -.sp -matches "aba" the value of the second captured substring is "b". -. -. -.\" HTML -.SH "ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS" -.rs -.sp -With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") -repetition, failure of what follows normally causes the repeated item to be -re-evaluated to see if a different number of repeats allows the rest of the -pattern to match. Sometimes it is useful to prevent this, either to change the -nature of the match, or to cause it to fail earlier than it otherwise might, when -the author of the pattern knows there is no point in carrying on. -.P -Consider, for example, the pattern \ed+foo when applied to the subject line -.sp - 123456bar -.sp -After matching all 6 digits and then failing to match "foo", the normal -action of the matcher is to try again with only 5 digits matching the \ed+ -item, and then with 4, and so on, before ultimately failing. "Atomic grouping" -(a term taken from Jeffrey Friedl's book) provides the means for specifying -that once a subpattern has matched, it is not to be re-evaluated in this way. -.P -If we use atomic grouping for the previous example, the matcher gives up -immediately on failing to match "foo" the first time. The notation is a kind of -special parenthesis, starting with (?> as in this example: -.sp - (?>\ed+)foo -.sp -This kind of parenthesis "locks up" the part of the pattern it contains once -it has matched, and a failure further into the pattern is prevented from -backtracking into it. Backtracking past it to previous items, however, works as -normal. -.P -An alternative description is that a subpattern of this type matches the string -of characters that an identical standalone pattern would match, if anchored at -the current point in the subject string. 
-.P -Atomic grouping subpatterns are not capturing subpatterns. Simple cases such as -the above example can be thought of as a maximizing repeat that must swallow -everything it can. So, while both \ed+ and \ed+? are prepared to adjust the -number of digits they match in order to make the rest of the pattern match, -(?>\ed+) can only match an entire sequence of digits. -.P -Atomic groups in general can of course contain arbitrarily complicated -subpatterns, and can be nested. However, when the subpattern for an atomic -group is just a single repeated item, as in the example above, a simpler -notation, called a "possessive quantifier" can be used. This consists of an -additional + character following a quantifier. Using this notation, the -previous example can be rewritten as -.sp - \ed++foo -.sp -Note that a possessive quantifier can be used with an entire group, for -example: -.sp - (abc|xyz){2,3}+ -.sp -Possessive quantifiers are always greedy; the setting of the PCRE_UNGREEDY -option is ignored. They are a convenient notation for the simpler forms of -atomic group. However, there is no difference in the meaning of a possessive -quantifier and the equivalent atomic group, though there may be a performance -difference; possessive quantifiers should be slightly faster. -.P -The possessive quantifier syntax is an extension to the Perl 5.8 syntax. -Jeffrey Friedl originated the idea (and the name) in the first edition of his -book. Mike McCloskey liked it, so implemented it when he built Sun's Java -package, and PCRE copied it from there. It ultimately found its way into Perl -at release 5.10. -.P -PCRE has an optimization that automatically "possessifies" certain simple -pattern constructs. For example, the sequence A+B is treated as A++B because -there is no point in backtracking into a sequence of A's when B must follow. 
-.P -When a pattern contains an unlimited repeat inside a subpattern that can itself -be repeated an unlimited number of times, the use of an atomic group is the -only way to avoid some failing matches taking a very long time indeed. The -pattern -.sp - (\eD+|<\ed+>)*[!?] -.sp -matches an unlimited number of substrings that either consist of non-digits, or -digits enclosed in <>, followed by either ! or ?. When it matches, it runs -quickly. However, if it is applied to -.sp - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -.sp -it takes a long time before reporting failure. This is because the string can -be divided between the internal \eD+ repeat and the external * repeat in a -large number of ways, and all have to be tried. (The example uses [!?] rather -than a single character at the end, because both PCRE and Perl have an -optimization that allows for fast failure when a single character is used. They -remember the last single character that is required for a match, and fail early -if it is not present in the string.) If the pattern is changed so that it uses -an atomic group, like this: -.sp - ((?>\eD+)|<\ed+>)*[!?] -.sp -sequences of non-digits cannot be broken, and failure happens quickly. -. -. -.\" HTML -.SH "BACK REFERENCES" -.rs -.sp -Outside a character class, a backslash followed by a digit greater than 0 (and -possibly further digits) is a back reference to a capturing subpattern earlier -(that is, to its left) in the pattern, provided there have been that many -previous capturing left parentheses. -.P -However, if the decimal number following the backslash is less than 10, it is -always taken as a back reference, and causes an error only if there are not -that many capturing left parentheses in the entire pattern. In other words, the -parentheses that are referenced need not be to the left of the reference for -numbers less than 10. 
A "forward back reference" of this type can make sense -when a repetition is involved and the subpattern to the right has participated -in an earlier iteration. -.P -It is not possible to have a numerical "forward back reference" to a subpattern -whose number is 10 or more using this syntax because a sequence such as \e50 is -interpreted as a character defined in octal. See the subsection entitled -"Non-printing characters" -.\" HTML -.\" -above -.\" -for further details of the handling of digits following a backslash. There is -no such problem when named parentheses are used. A back reference to any -subpattern is possible using named parentheses (see below). -.P -Another way of avoiding the ambiguity inherent in the use of digits following a -backslash is to use the \eg escape sequence. This escape must be followed by an -unsigned number or a negative number, optionally enclosed in braces. These -examples are all identical: -.sp - (ring), \e1 - (ring), \eg1 - (ring), \eg{1} -.sp -An unsigned number specifies an absolute reference without the ambiguity that -is present in the older syntax. It is also useful when literal digits follow -the reference. A negative number is a relative reference. Consider this -example: -.sp - (abc(def)ghi)\eg{-1} -.sp -The sequence \eg{-1} is a reference to the most recently started capturing -subpattern before \eg, that is, it is equivalent to \e2 in this example. -Similarly, \eg{-2} would be equivalent to \e1. The use of relative references -can be helpful in long patterns, and also in patterns that are created by -joining together fragments that contain references within themselves. -.P -A back reference matches whatever actually matched the capturing subpattern in -the current subject string, rather than anything matching the subpattern -itself (see -.\" HTML -.\" -"Subpatterns as subroutines" -.\" -below for a way of doing that). 
So the pattern -.sp - (sens|respons)e and \e1ibility -.sp -matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If caseful matching is in force at the time of the -back reference, the case of letters is relevant. For example, -.sp - ((?i)rah)\es+\e1 -.sp -matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original -capturing subpattern is matched caselessly. -.P -There are several different ways of writing back references to named -subpatterns. The .NET syntax \ek{name} and the Perl syntax \ek<name> or -\ek'name' are supported, as is the Python syntax (?P=name). Perl 5.10's unified -back reference syntax, in which \eg can be used for both numeric and named -references, is also supported. We could rewrite the above example in any of -the following ways: -.sp - (?<p1>(?i)rah)\es+\ek<p1> - (?'p1'(?i)rah)\es+\ek{p1} - (?P<p1>(?i)rah)\es+(?P=p1) - (?<p1>(?i)rah)\es+\eg{p1} -.sp -A subpattern that is referenced by name may appear in the pattern before or -after the reference. -.P -There may be more than one back reference to the same subpattern. If a -subpattern has not actually been used in a particular match, any back -references to it always fail by default. For example, the pattern -.sp - (a|(bc))\e2 -.sp -always fails if it starts to match "a" rather than "bc". However, if the -PCRE_JAVASCRIPT_COMPAT option is set at compile time, a back reference to an -unset value matches an empty string. -.P -Because there may be many capturing parentheses in a pattern, all digits -following a backslash are taken as part of a potential back reference number. -If the pattern continues with a digit character, some delimiter must be used to -terminate the back reference. If the PCRE_EXTENDED option is set, this can be -white space. Otherwise, the \eg{ syntax or an empty comment (see -.\" HTML -.\" -"Comments" -.\" -below) can be used. -. 
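The back reference rules above can be tried out with a small script. This is a sketch only, and it assumes Python's built-in re module as a stand-in for PCRE: the numbered form and the Python named form (?P=name) are shared by both engines, while \eg{...} and PCRE_JAVASCRIPT_COMPAT are not available in this setting. All variable names are illustrative.

```python
import re

# A back reference repeats the text that the group actually matched,
# not anything else the group's subpattern could have matched.
austen = re.compile(r"(sens|respons)e and \1ibility")
same_word = austen.fullmatch("sense and sensibility") is not None
mixed_words = austen.fullmatch("sense and responsibility") is not None

# Named form, using the Python syntax (?P<name>...) and (?P=name).
named = re.compile(r"(?P<p1>rah)\s+(?P=p1)")
named_match = named.fullmatch("rah rah") is not None

# A back reference to a group that took no part in the match fails.
unset = re.compile(r"(a|(bc))\2")
unset_fails = unset.match("aa") is None
set_matches = unset.match("bcbc") is not None

print(same_word, mixed_words, named_match, unset_fails, set_matches)
```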
-.SS "Recursive back references" -.rs -.sp -A back reference that occurs inside the parentheses to which it refers fails -when the subpattern is first used, so, for example, (a\e1) never matches. -However, such references can be useful inside repeated subpatterns. For -example, the pattern -.sp - (a|b\e1)+ -.sp -matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of -the subpattern, the back reference matches the character string corresponding -to the previous iteration. In order for this to work, the pattern must be such -that the first iteration does not need to match the back reference. This can be -done using alternation, as in the example above, or by a quantifier with a -minimum of zero. -.P -Back references of this type cause the group that they reference to be treated -as an -.\" HTML -.\" -atomic group. -.\" -Once the whole group has been matched, a subsequent matching failure cannot -cause backtracking into the middle of the group. -. -. -.\" HTML -.SH ASSERTIONS -.rs -.sp -An assertion is a test on the characters following or preceding the current -matching point that does not actually consume any characters. The simple -assertions coded as \eb, \eB, \eA, \eG, \eZ, \ez, ^ and $ are described -.\" HTML -.\" -above. -.\" -.P -More complicated assertions are coded as subpatterns. There are two kinds: -those that look ahead of the current position in the subject string, and those -that look behind it. An assertion subpattern is matched in the normal way, -except that it does not cause the current matching position to be changed. -.P -Assertion subpatterns are not capturing subpatterns. If such an assertion -contains capturing subpatterns within it, these are counted for the purposes of -numbering the capturing subpatterns in the whole pattern. However, substring -capturing is carried out only for positive assertions. (Perl sometimes, but not -always, does do capturing in negative assertions.) 
-.P -WARNING: If a positive assertion containing one or more capturing subpatterns -succeeds, but failure to match later in the pattern causes backtracking over -this assertion, the captures within the assertion are reset only if no higher -numbered captures are already set. This is, unfortunately, a fundamental -limitation of the current implementation, and as PCRE1 is now in -maintenance-only status, it is unlikely ever to change. -.P -For compatibility with Perl, assertion subpatterns may be repeated; though -it makes no sense to assert the same thing several times, the side effect of -capturing parentheses may occasionally be useful. In practice, there are only three -cases: -.sp -(1) If the quantifier is {0}, the assertion is never obeyed during matching. -However, it may contain internal capturing parenthesized groups that are called -from elsewhere via the -.\" HTML -.\" -subroutine mechanism. -.\" -.sp -(2) If the quantifier is {0,n} where n is greater than zero, it is treated as if it -were {0,1}. At run time, the rest of the pattern match is tried with and -without the assertion, the order depending on the greediness of the quantifier. -.sp -(3) If the minimum repetition is greater than zero, the quantifier is ignored. -The assertion is obeyed just once when encountered during matching. -. -. -.SS "Lookahead assertions" -.rs -.sp -Lookahead assertions start with (?= for positive assertions and (?! for -negative assertions. For example, -.sp - \ew+(?=;) -.sp -matches a word followed by a semicolon, but does not include the semicolon in -the match, and -.sp - foo(?!bar) -.sp -matches any occurrence of "foo" that is not followed by "bar". Note that the -apparently similar pattern -.sp - (?!foo)bar -.sp -does not find an occurrence of "bar" that is preceded by something other than -"foo"; it finds any occurrence of "bar" whatsoever, because the assertion -(?!foo) is always true when the next three characters are "bar". 
A -lookbehind assertion is needed to achieve the other effect. -.P -If you want to force a matching failure at some point in a pattern, the most -convenient way to do it is with (?!) because an empty string always matches, so -an assertion that requires there not to be an empty string must always fail. -The backtracking control verb (*FAIL) or (*F) is a synonym for (?!). -. -. -.\" HTML -.SS "Lookbehind assertions" -.rs -.sp -Lookbehind assertions start with (?<= for positive assertions and (? -.\" -(see above) -.\" -can be used instead of a lookbehind assertion to get round the fixed-length -restriction. -.P -The implementation of lookbehind assertions is, for each alternative, to -temporarily move the current position back by the fixed length and then try to -match. If there are insufficient characters before the current position, the -assertion fails. -.P -In a UTF mode, PCRE does not allow the \eC escape (which matches a single data -unit even in a UTF mode) to appear in lookbehind assertions, because it makes -it impossible to calculate the length of the lookbehind. The \eX and \eR -escapes, which can match different numbers of data units, are also not -permitted. -.P -.\" HTML -.\" -"Subroutine" -.\" -calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long -as the subpattern matches a fixed-length string. -.\" HTML -.\" -Recursion, -.\" -however, is not supported. -.P -Possessive quantifiers can be used in conjunction with lookbehind assertions to -specify efficient matching of fixed-length strings at the end of subject -strings. Consider a simple pattern such as -.sp - abcd$ -.sp -when applied to a long string that does not match. Because matching proceeds -from left to right, PCRE will look for each "a" in the subject and then see if -what follows matches the rest of the pattern. 
If the pattern is specified as -.sp - ^.*abcd$ -.sp -the initial .* matches the entire string at first, but when this fails (because -there is no following "a"), it backtracks to match all but the last character, -then all but the last two characters, and so on. Once again the search for "a" -covers the entire string, from right to left, so we are no better off. However, -if the pattern is written as -.sp - ^.*+(?<=abcd) -.sp -there can be no backtracking for the .*+ item; it can match only the entire -string. The subsequent lookbehind assertion does a single test on the last four -characters. If it fails, the match fails immediately. For long strings, this -approach makes a significant difference to the processing time. -. -. -.SS "Using multiple assertions" -.rs -.sp -Several assertions (of any sort) may occur in succession. For example, -.sp - (?<=\ed{3})(? -.SH "CONDITIONAL SUBPATTERNS" -.rs -.sp -It is possible to cause the matching process to obey a subpattern -conditionally or to choose between two alternative subpatterns, depending on -the result of an assertion, or whether a specific capturing subpattern has -already been matched. The two possible forms of conditional subpattern are: -.sp - (?(condition)yes-pattern) - (?(condition)yes-pattern|no-pattern) -.sp -If the condition is satisfied, the yes-pattern is used; otherwise the -no-pattern (if present) is used. If there are more than two alternatives in the -subpattern, a compile-time error occurs. Each of the two alternatives may -itself contain nested subpatterns of any form, including conditional -subpatterns; the restriction to two alternatives applies only at the level of -the condition. This pattern fragment is an example where the alternatives are -complex: -.sp - (?(1) (A|B|C) | (D | (?(2)E|F) | E) ) -.sp -.P -There are four kinds of condition: references to subpatterns, references to -recursion, a pseudo-condition called DEFINE, and assertions. -. 
-.SS "Checking for a used subpattern by number" -.rs -.sp -If the text between the parentheses consists of a sequence of digits, the -condition is true if a capturing subpattern of that number has previously -matched. If there is more than one capturing subpattern with the same number -(see the earlier -.\" -.\" HTML -.\" -section about duplicate subpattern numbers), -.\" -the condition is true if any of them have matched. An alternative notation is -to precede the digits with a plus or minus sign. In this case, the subpattern -number is relative rather than absolute. The most recently opened parentheses -can be referenced by (?(-1), the next most recent by (?(-2), and so on. Inside -loops it can also make sense to refer to subsequent groups. The next -parentheses to be opened can be referenced as (?(+1), and so on. (The value -zero in any of these forms is not used; it provokes a compile-time error.) -.P -Consider the following pattern, which contains non-significant white space to -make it more readable (assume the PCRE_EXTENDED option) and to divide it into -three parts for ease of discussion: -.sp - ( \e( )? [^()]+ (?(1) \e) ) -.sp -The first part matches an optional opening parenthesis, and if that -character is present, sets it as the first captured substring. The second part -matches one or more characters that are not parentheses. The third part is a -conditional subpattern that tests whether or not the first set of parentheses -matched. If they did, that is, if subject started with an opening parenthesis, -the condition is true, and so the yes-pattern is executed and a closing -parenthesis is required. Otherwise, since no-pattern is not present, the -subpattern matches nothing. In other words, this pattern matches a sequence of -non-parentheses, optionally enclosed in parentheses. -.P -If you were embedding this pattern in a larger one, you could use a relative -reference: -.sp - ...other stuff... ( \e( )? [^()]+ (?(-1) \e) ) ... 
-.sp -This makes the fragment independent of the parentheses in the larger pattern. -. -.SS "Checking for a used subpattern by name" -.rs -.sp -Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used -subpattern by name. For compatibility with earlier versions of PCRE, which had -this facility before Perl, the syntax (?(name)...) is also recognized. -.P -Rewriting the above example to use a named subpattern gives this: -.sp - (?<OPEN> \e( )? [^()]+ (?(<OPEN>) \e) ) -.sp -If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them has -matched. -. -.SS "Checking for pattern recursion" -.rs -.sp -If the condition is the string (R), and there is no subpattern with the name R, -the condition is true if a recursive call to the whole pattern or any -subpattern has been made. If digits or a name preceded by ampersand follow the -letter R, for example: -.sp - (?(R3)...) or (?(R&name)...) -.sp -the condition is true if the most recent recursion is into a subpattern whose -number or name is given. This condition does not check the entire recursion -stack. If the name used in a condition of this kind is a duplicate, the test is -applied to all subpatterns of the same name, and is true if any one of them is -the most recent recursion. -.P -At "top level", all these recursion test conditions are false. -.\" HTML -.\" -The syntax for recursive patterns -.\" -is described below. -. -.\" HTML -.SS "Defining subpatterns for use by reference only" -.rs -.sp -If the condition is the string (DEFINE), and there is no subpattern with the -name DEFINE, the condition is always false. In this case, there may be only one -alternative in the subpattern. It is always skipped if control reaches this -point in the pattern; the idea of DEFINE is that it can be used to define -subroutines that can be referenced from elsewhere. (The use of -.\" HTML -.\" -subroutines -.\" -is described below.) 
For example, a pattern to match an IPv4 address such as -"192.168.23.245" could be written like this (ignore white space and line -breaks): -.sp - (?(DEFINE) (?<byte> 2[0-4]\ed | 25[0-5] | 1\ed\ed | [1-9]?\ed) ) - \eb (?&byte) (\e.(?&byte)){3} \eb -.sp -The first part of the pattern is a DEFINE group inside which another group -named "byte" is defined. This matches an individual component of an IPv4 -address (a number less than 256). When matching takes place, this part of the -pattern is skipped because DEFINE acts like a false condition. The rest of the -pattern uses references to the named group to match the four dot-separated -components of an IPv4 address, insisting on a word boundary at each end. -. -.SS "Assertion conditions" -.rs -.sp -If the condition is not in any of the above formats, it must be an assertion. -This may be a positive or negative lookahead or lookbehind assertion. Consider -this pattern, again containing non-significant white space, and with the two -alternatives on the second line: -.sp - (?(?=[^a-z]*[a-z]) - \ed{2}-[a-z]{3}-\ed{2} | \ed{2}-\ed{2}-\ed{2} ) -.sp -The condition is a positive lookahead assertion that matches an optional -sequence of non-letters followed by a letter. In other words, it tests for the -presence of at least one letter in the subject. If a letter is found, the -subject is matched against the first alternative; otherwise it is matched -against the second. This pattern matches strings in one of the two forms -dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. -. -. -.\" HTML -.SH COMMENTS -.rs -.sp -There are two ways of including comments in patterns that are processed by -PCRE. In both cases, the start of the comment must not be in a character class, -nor in the middle of any other sequence of related characters such as (?: or a -subpattern name or number. The characters that make up a comment play no part -in the pattern matching. 
-.P -The sequence (?# marks the start of a comment that continues up to the next -closing parenthesis. Nested parentheses are not permitted. If the PCRE_EXTENDED -option is set, an unescaped # character also introduces a comment, which in -this case continues to immediately after the next newline character or -character sequence in the pattern. Which characters are interpreted as newlines -is controlled by the options passed to a compiling function or by a special -sequence at the start of the pattern, as described in the section entitled -.\" HTML -.\" -"Newline conventions" -.\" -above. Note that the end of this type of comment is a literal newline sequence -in the pattern; escape sequences that happen to represent a newline do not -count. For example, consider this pattern when PCRE_EXTENDED is set, and the -default newline convention is in force: -.sp - abc #comment \en still comment -.sp -On encountering the # character, \fBpcre_compile()\fP skips along, looking for -a newline in the pattern. The sequence \en is still literal at this stage, so -it does not terminate the comment. Only an actual character with the code value -0x0a (the default newline) does so. -. -. -.\" HTML -.SH "RECURSIVE PATTERNS" -.rs -.sp -Consider the problem of matching a string in parentheses, allowing for -unlimited nested parentheses. Without the use of recursion, the best that can -be done is to use a pattern that matches up to some fixed depth of nesting. It -is not possible to handle an arbitrary nesting depth. -.P -For some time, Perl has provided a facility that allows regular expressions to -recurse (amongst other things). It does this by interpolating Perl code in the -expression at run time, and the code can refer to the expression itself. 
A Perl -pattern using code interpolation to solve the parentheses problem can be -created like this: -.sp - $re = qr{\e( (?: (?>[^()]+) | (?p{$re}) )* \e)}x; -.sp -The (?p{...}) item interpolates Perl code at run time, and in this case refers -recursively to the pattern in which it appears. -.P -Obviously, PCRE cannot support the interpolation of Perl code. Instead, it -supports special syntax for recursion of the entire pattern, and also for -individual subpattern recursion. After its introduction in PCRE and Python, -this kind of recursion was subsequently introduced into Perl at release 5.10. -.P -A special item that consists of (? followed by a number greater than zero and a -closing parenthesis is a recursive subroutine call of the subpattern of the -given number, provided that it occurs inside that subpattern. (If not, it is a -.\" HTML -.\" -non-recursive subroutine -.\" -call, which is described in the next section.) The special item (?R) or (?0) is -a recursive call of the entire regular expression. -.P -This PCRE pattern solves the nested parentheses problem (assume the -PCRE_EXTENDED option is set so that white space is ignored): -.sp - \e( ( [^()]++ | (?R) )* \e) -.sp -First it matches an opening parenthesis. Then it matches any number of -substrings which can either be a sequence of non-parentheses, or a recursive -match of the pattern itself (that is, a correctly parenthesized substring). -Finally there is a closing parenthesis. Note the use of a possessive quantifier -to avoid backtracking into sequences of non-parentheses. -.P -If this were part of a larger pattern, you would not want to recurse the entire -pattern, so instead you could use this: -.sp - ( \e( ( [^()]++ | (?1) )* \e) ) -.sp -We have put the pattern into parentheses, and caused the recursion to refer to -them instead of the whole pattern. -.P -In a larger pattern, keeping track of parenthesis numbers can be tricky. This -is made easier by the use of relative references. 
Instead of (?1) in the -pattern above you can write (?-2) to refer to the second most recently opened -parentheses preceding the recursion. In other words, a negative number counts -capturing parentheses leftwards from the point at which it is encountered. -.P -It is also possible to refer to subsequently opened parentheses, by writing -references such as (?+2). However, these cannot be recursive because the -reference is not inside the parentheses that are referenced. They are always -.\" HTML -.\" -non-recursive subroutine -.\" -calls, as described in the next section. -.P -An alternative approach is to use named parentheses instead. The Perl syntax -for this is (?&name); PCRE's earlier syntax (?P>name) is also supported. We -could rewrite the above example as follows: -.sp - (?<pn> \e( ( [^()]++ | (?&pn) )* \e) ) -.sp -If there is more than one subpattern with the same name, the earliest one is -used. -.P -This particular example pattern that we have been looking at contains nested -unlimited repeats, and so the use of a possessive quantifier for matching -strings of non-parentheses is important when applying the pattern to strings -that do not match. For example, when this pattern is applied to -.sp - (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() -.sp -it yields "no match" quickly. However, if a possessive quantifier is not used, -the match runs for a very long time indeed because there are so many different -ways the + and * repeats can carve up the subject, and all have to be tested -before failure can be reported. -.P -At the end of a match, the values of capturing parentheses are those from -the outermost level. If you want to obtain intermediate values, a callout -function can be used (see below and the -.\" HREF -\fBpcrecallout\fP -.\" -documentation). If the pattern above is matched against -.sp - (ab(cd)ef) -.sp -the value for the inner capturing parentheses (numbered 2) is "ef", which is -the last value taken on at the top level. 
If a capturing subpattern is not -matched at the top level, its final captured value is unset, even if it was -(temporarily) set at a deeper level during the matching process. -.P -If there are more than 15 capturing parentheses in a pattern, PCRE has to -obtain extra memory to store data during a recursion, which it does by using -\fBpcre_malloc\fP, freeing it via \fBpcre_free\fP afterwards. If no memory can -be obtained, the match fails with the PCRE_ERROR_NOMEMORY error. -.P -Do not confuse the (?R) item with the condition (R), which tests for recursion. -Consider this pattern, which matches text in angle brackets, allowing for -arbitrary nesting. Only digits are allowed in nested brackets (that is, when -recursing), whereas any characters are permitted at the outer level. -.sp - < (?: (?(R) \ed++ | [^<>]*+) | (?R)) * > -.sp -In this pattern, (?(R) is the start of a conditional subpattern, with two -different alternatives for the recursive and non-recursive cases. The (?R) item -is the actual recursive call. -. -. -.\" HTML -.SS "Differences in recursion processing between PCRE and Perl" -.rs -.sp -Recursion processing in PCRE differs from Perl in two important ways. In PCRE -(like Python, but unlike Perl), a recursive subpattern call is always treated -as an atomic group. That is, once it has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. This can be illustrated by the following pattern, -which purports to match a palindromic string that contains an odd number of -characters (for example, "a", "aba", "abcba", "abcdcba"): -.sp - ^(.|(.)(?1)\e2)$ -.sp -The idea is that it either matches a single character, or two identical -characters surrounding a sub-palindrome. In Perl, this pattern works; in PCRE -it does not if the pattern is longer than three characters. 
Consider the -subject string "abcba": -.P -At the top level, the first character is matched, but as it is not at the end -of the string, the first alternative fails; the second alternative is taken -and the recursion kicks in. The recursive call to subpattern 1 successfully -matches the next character ("b"). (Note that the beginning and end of line -tests are not part of the recursion). -.P -Back at the top level, the next character ("c") is compared with what -subpattern 2 matched, which was "a". This fails. Because the recursion is -treated as an atomic group, there are now no backtracking points, and so the -entire match fails. (Perl is able, at this point, to re-enter the recursion and -try the second alternative.) However, if the pattern is written with the -alternatives in the other order, things are different: -.sp - ^((.)(?1)\e2|.)$ -.sp -This time, the recursing alternative is tried first, and continues to recurse -until it runs out of characters, at which point the recursion fails. But this -time we do have another alternative to try at the higher level. That is the big -difference: in the previous case the remaining alternative is at a deeper -recursion level, which PCRE cannot use. -.P -To change the pattern so that it matches all palindromic strings, not just -those with an odd number of characters, it is tempting to change the pattern to -this: -.sp - ^((.)(?1)\e2|.?)$ -.sp -Again, this works in Perl, but not in PCRE, and for the same reason. When a -deeper recursion has matched a single character, it cannot be entered again in -order to match an empty string. 
The solution is to separate the two cases, and -write out the odd and even cases as alternatives at the higher level: -.sp - ^(?:((.)(?1)\e2|)|((.)(?3)\e4|.)) -.sp -If you want to match typical palindromic phrases, the pattern has to ignore all -non-word characters, which can be done like this: -.sp - ^\eW*+(?:((.)\eW*+(?1)\eW*+\e2|)|((.)\eW*+(?3)\eW*+\e4|\eW*+.\eW*+))\eW*+$ -.sp -If run with the PCRE_CASELESS option, this pattern matches phrases such as "A -man, a plan, a canal: Panama!" and it works well in both PCRE and Perl. Note -the use of the possessive quantifier *+ to avoid backtracking into sequences of -non-word characters. Without this, PCRE takes a great deal longer (ten times or -more) to match typical phrases, and Perl takes so long that you think it has -gone into a loop. -.P -\fBWARNING\fP: The palindrome-matching patterns above work only if the subject -string does not start with a palindrome that is shorter than the entire string. -For example, although "abcba" is correctly matched, if the subject is "ababa", -PCRE finds the palindrome "aba" at the start, then fails at top level because -the end of the string does not follow. Once again, it cannot jump back into the -recursion to try other alternatives, so the entire match fails. -.P -The second way in which PCRE and Perl differ in their recursion processing is -in the handling of captured values. In Perl, when a subpattern is called -recursively or as a subroutine (see the next section), it has no access to any -values that were captured outside the recursion, whereas in PCRE these values -can be referenced. Consider this pattern: -.sp - ^(.)(\e1|a(?2)) -.sp -In PCRE, this pattern matches "bab". The first capturing parentheses match "b", -then in the second group, when the back reference \e1 fails to match "b", the -second alternative matches "a" and then recurses. In the recursion, \e1 does -now match "b" and so the whole match succeeds. 
In Perl, the pattern fails to -match because inside the recursive call \e1 cannot access the externally set -value. -. -. -.\" HTML -.SH "SUBPATTERNS AS SUBROUTINES" -.rs -.sp -If the syntax for a recursive subpattern call (either by number or by -name) is used outside the parentheses to which it refers, it operates like a -subroutine in a programming language. The called subpattern may be defined -before or after the reference. A numbered reference can be absolute or -relative, as in these examples: -.sp - (...(absolute)...)...(?2)... - (...(relative)...)...(?-1)... - (...(?+1)...(relative)... -.sp -An earlier example pointed out that the pattern -.sp - (sens|respons)e and \e1ibility -.sp -matches "sense and sensibility" and "response and responsibility", but not -"sense and responsibility". If instead the pattern -.sp - (sens|respons)e and (?1)ibility -.sp -is used, it does match "sense and responsibility" as well as the other two -strings. Another example is given in the discussion of DEFINE above. -.P -All subroutine calls, whether recursive or not, are always treated as atomic -groups. That is, once a subroutine has matched some of the subject string, it -is never re-entered, even if it contains untried alternatives and there is a -subsequent matching failure. Any capturing parentheses that are set during the -subroutine call revert to their previous values afterwards. -.P -Processing options such as case-independence are fixed when a subpattern is -defined, so if it is used as a subroutine, such options cannot be changed for -different calls. For example, consider this pattern: -.sp - (abc)(?i:(?-1)) -.sp -It matches "abcabc". It does not match "abcABC" because the change of -processing option does not affect the called subpattern. -. -. 
-.\" HTML -.SH "ONIGURUMA SUBROUTINE SYNTAX" -.rs -.sp -For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or -a number enclosed either in angle brackets or single quotes, is an alternative -syntax for referencing a subpattern as a subroutine, possibly recursively. Here -are two of the examples used above, rewritten using this syntax: -.sp - (?<pn> \e( ( (?>[^()]+) | \eg<pn> )* \e) ) - (sens|respons)e and \eg'1'ibility -.sp -PCRE supports an extension to Oniguruma: if a number is preceded by a -plus or a minus sign it is taken as a relative reference. For example: -.sp - (abc)(?i:\eg<-1>) -.sp -Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP -synonymous. The former is a back reference; the latter is a subroutine call. -. -. -.SH CALLOUTS -.rs -.sp -Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl -code to be obeyed in the middle of matching a regular expression. This makes it -possible, amongst other things, to extract different substrings that match the -same pair of parentheses when there is a repetition. -.P -PCRE provides a similar feature, but of course it cannot obey arbitrary Perl -code. The feature is called "callout". The caller of PCRE provides an external -function by putting its entry point in the global variable \fIpcre_callout\fP -(8-bit library) or \fIpcre[16|32]_callout\fP (16-bit or 32-bit library). -By default, this variable contains NULL, which disables all calling out. -.P -Within a regular expression, (?C) indicates the points at which the external -function is to be called. If you want to identify different callout points, you -can put a number less than 256 after the letter C. The default value is zero. -For example, this pattern has two callout points: -.sp - (?C1)abc(?C2)def -.sp -If the PCRE_AUTO_CALLOUT flag is passed to a compiling function, callouts are -automatically installed before each item in the pattern. They are all numbered -255. 
If there is a conditional group in the pattern whose condition is an -assertion, an additional callout is inserted just before the condition. An -explicit callout may also be set at this position, as in this example: -.sp - (?(?C9)(?=a)abc|def) -.sp -Note that this applies only to assertion conditions, not to other types of -condition. -.P -During matching, when PCRE reaches a callout point, the external function is -called. It is provided with the number of the callout, the position in the -pattern, and, optionally, one item of data originally supplied by the caller of -the matching function. The callout function may cause matching to proceed, to -backtrack, or to fail altogether. -.P -By default, PCRE implements a number of optimizations at compile time and -matching time, and one side-effect is that sometimes callouts are skipped. If -you need all possible callouts to happen, you need to set options that disable -the relevant optimizations. More details, and a complete description of the -interface to the callout function, are given in the -.\" HREF -\fBpcrecallout\fP -.\" -documentation. -. -. -.\" HTML -.SH "BACKTRACKING CONTROL" -.rs -.sp -Perl 5.10 introduced a number of "Special Backtracking Control Verbs", which -are still described in the Perl documentation as "experimental and subject to -change or removal in a future version of Perl". It goes on to say: "Their usage -in production code should be noted to avoid problems during upgrades." The same -remarks apply to the PCRE features described in this section. -.P -The new verbs make use of what was previously invalid syntax: an opening -parenthesis followed by an asterisk. They are generally of the form -(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving -differently depending on whether or not a name is present. A name is any -sequence of characters that does not include a closing parenthesis. 
The maximum -length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit -libraries. If the name is empty, that is, if the closing parenthesis -immediately follows the colon, the effect is as if the colon were not there. -Any number of these verbs may occur in a pattern. -.P -Since these verbs are specifically related to backtracking, most of them can be -used only when the pattern is to be matched using one of the traditional -matching functions, because these use a backtracking algorithm. With the -exception of (*FAIL), which behaves like a failing negative assertion, the -backtracking control verbs cause an error if encountered by a DFA matching -function. -.P -The behaviour of these verbs in -.\" HTML -.\" -repeated groups, -.\" -.\" HTML -.\" -assertions, -.\" -and in -.\" HTML -.\" -subpatterns called as subroutines -.\" -(whether or not recursively) is documented below. -. -. -.\" HTML -.SS "Optimizations that affect backtracking verbs" -.rs -.sp -PCRE contains some optimizations that are used to speed up matching by running -some checks at the start of each match attempt. For example, it may know the -minimum length of matching subject, or that a particular character must be -present. When one of these optimizations bypasses the running of a match, any -included backtracking verbs will not, of course, be processed. You can suppress -the start-of-match optimizations by setting the PCRE_NO_START_OPTIMIZE option -when calling \fBpcre_compile()\fP or \fBpcre_exec()\fP, or by starting the -pattern with (*NO_START_OPT). There is more discussion of this option in the -section entitled -.\" HTML -.\" -"Option bits for \fBpcre_exec()\fP" -.\" -in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -.P -Experiments with Perl suggest that it too has similar optimizations, sometimes -leading to anomalous results. -. -. -.SS "Verbs that act immediately" -.rs -.sp -The following verbs act as soon as they are encountered. 
They may not be -followed by a name. -.sp - (*ACCEPT) -.sp -This verb causes the match to end successfully, skipping the remainder of the -pattern. However, when it is inside a subpattern that is called as a -subroutine, only that subpattern is ended successfully. Matching then continues -at the outer level. If (*ACCEPT) is triggered in a positive assertion, the -assertion succeeds; in a negative assertion, the assertion fails. -.P -If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For -example: -.sp - A((?:A|B(*ACCEPT)|C)D) -.sp -This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by -the outer parentheses. -.sp - (*FAIL) or (*F) -.sp -This verb causes a matching failure, forcing backtracking to occur. It is -equivalent to (?!) but easier to read. The Perl documentation notes that it is -probably useful only when combined with (?{}) or (??{}). Those are, of course, -Perl features that are not present in PCRE. The nearest equivalent is the -callout feature, as for example in this pattern: -.sp - a+(?C)(*FAIL) -.sp -A match with the string "aaaa" always fails, but the callout is taken before -each backtrack happens (in this example, 10 times). -. -. -.SS "Recording which path was taken" -.rs -.sp -There is one verb whose main purpose is to track how a match was arrived at, -though it also has a secondary use in conjunction with advancing the match -starting point (see (*SKIP) below). -.sp - (*MARK:NAME) or (*:NAME) -.sp -A name is always required with this verb. There may be as many instances of -(*MARK) as you like in a pattern, and their names do not have to be unique. -.P -When a match succeeds, the name of the last-encountered (*MARK:NAME), -(*PRUNE:NAME), or (*THEN:NAME) on the matching path is passed back to the -caller as described in the section entitled -.\" HTML -.\" -"Extra data for \fBpcre_exec()\fP" -.\" -in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. 
Here is an example of \fBpcretest\fP output, where the /K -modifier requests the retrieval and outputting of (*MARK) data: -.sp - re> /X(*MARK:A)Y|X(*MARK:B)Z/K - data> XY - 0: XY - MK: A - XZ - 0: XZ - MK: B -.sp -The (*MARK) name is tagged with "MK:" in this output, and in this example it -indicates which of the two alternatives matched. This is a more efficient way -of obtaining this information than putting each alternative in its own -capturing parentheses. -.P -If a verb with a name is encountered in a positive assertion that is true, the -name is recorded and passed back if it is the last-encountered. This does not -happen for negative assertions or failing positive assertions. -.P -After a partial match or a failed match, the last encountered name in the -entire match process is returned. For example: -.sp - re> /X(*MARK:A)Y|X(*MARK:B)Z/K - data> XP - No match, mark = B -.sp -Note that in this unanchored example the mark is retained from the match -attempt that started at the letter "X" in the subject. Subsequent match -attempts starting at "P" and then with an empty string do not get as far as the -(*MARK) item, but nevertheless do not reset it. -.P -If you are interested in (*MARK) values after failed matches, you should -probably set the PCRE_NO_START_OPTIMIZE option -.\" HTML -.\" -(see above) -.\" -to ensure that the match is always attempted. -. -. -.SS "Verbs that act after backtracking" -.rs -.sp -The following verbs do nothing when they are encountered. Matching continues -with what follows, but if there is no subsequent match, causing a backtrack to -the verb, a failure is forced. That is, backtracking cannot pass to the left of -the verb. However, when one of these verbs appears inside an atomic group or an -assertion that is true, its effect is confined to that group, because once the -group has been matched, there is never any backtracking into it. 
In this -situation, backtracking can "jump back" to the left of the entire atomic group -or assertion. (Remember also, as stated above, that this localization also -applies in subroutine calls.) -.P -These verbs differ in exactly what kind of failure occurs when backtracking -reaches them. The behaviour described below is what happens when the verb is -not in a subroutine or an assertion. Subsequent sections cover these special -cases. -.sp - (*COMMIT) -.sp -This verb, which may not be followed by a name, causes the whole match to fail -outright if there is a later matching failure that causes backtracking to reach -it. Even if the pattern is unanchored, no further attempts to find a match by -advancing the starting point take place. If (*COMMIT) is the only backtracking -verb that is encountered, once it has been passed \fBpcre_exec()\fP is -committed to finding a match at the current starting point, or not at all. For -example: -.sp - a+(*COMMIT)b -.sp -This matches "xxaab" but not "aacaab". It can be thought of as a kind of -dynamic anchor, or "I've started, so I must finish." The name of the most -recently passed (*MARK) in the path is passed back when (*COMMIT) forces a -match failure. -.P -If there is more than one backtracking verb in a pattern, a different one that -follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a -match does not always guarantee that a match must be at this starting point. -.P -Note that (*COMMIT) at the start of a pattern is not the same as an anchor, -unless PCRE's start-of-match optimizations are turned off, as shown in this -output from \fBpcretest\fP: -.sp - re> /(*COMMIT)abc/ - data> xyzabc - 0: abc - data> xyzabc\eY - No match -.sp -For this pattern, PCRE knows that any match must start with "a", so the -optimization skips along the subject to "a" before applying the pattern to the -first set of data. The match attempt then succeeds. 
In the second set of data, -the escape sequence \eY is interpreted by the \fBpcretest\fP program. It causes -the PCRE_NO_START_OPTIMIZE option to be set when \fBpcre_exec()\fP is called. -This disables the optimization that skips along to the first character. The -pattern is now applied starting at "x", and so the (*COMMIT) causes the match -to fail without trying any other starting points. -.sp - (*PRUNE) or (*PRUNE:NAME) -.sp -This verb causes the match to fail at the current starting position in the -subject if there is a later matching failure that causes backtracking to reach -it. If the pattern is unanchored, the normal "bumpalong" advance to the next -starting character then happens. Backtracking can occur as usual to the left of -(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but -if there is no match to the right, backtracking cannot cross (*PRUNE). In -simple cases, the use of (*PRUNE) is just an alternative to an atomic group or -possessive quantifier, but there are some uses of (*PRUNE) that cannot be -expressed in any other way. In an anchored pattern (*PRUNE) has the same effect -as (*COMMIT). -.P -The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK). -.sp - (*SKIP) -.sp -This verb, when given without a name, is like (*PRUNE), except that if the -pattern is unanchored, the "bumpalong" advance is not to the next character, -but to the position in the subject where (*SKIP) was encountered. (*SKIP) -signifies that whatever text was matched leading up to it cannot be part of a -successful match. Consider: -.sp - a+(*SKIP)b -.sp -If the subject is "aaaac...", after the first match attempt fails (starting at -the first character in the string), the starting point skips on to start the -next attempt at "c". 
Note that a possessive quantifier does not have the same -effect as this example; although it would suppress backtracking during the -first match attempt, the second attempt would start at the second character -instead of skipping on to "c". -.sp - (*SKIP:NAME) -.sp -When (*SKIP) has an associated name, its behaviour is modified. When it is -triggered, the previous path through the pattern is searched for the most -recent (*MARK) that has the same name. If one is found, the "bumpalong" advance -is to the subject position that corresponds to that (*MARK) instead of to where -(*SKIP) was encountered. If no (*MARK) with a matching name is found, the -(*SKIP) is ignored. -.P -Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores -names that are set by (*PRUNE:NAME) or (*THEN:NAME). -.sp - (*THEN) or (*THEN:NAME) -.sp -This verb causes a skip to the next innermost alternative when backtracking -reaches it. That is, it cancels any further backtracking within the current -alternative. Its name comes from the observation that it can be used for a -pattern-based if-then-else block: -.sp - ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... -.sp -If the COND1 pattern matches, FOO is tried (and possibly further items after -the end of the group if FOO succeeds); on failure, the matcher skips to the -second alternative and tries COND2, without backtracking into COND1. If that -succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no -more alternatives, so there is a backtrack to whatever came before the entire -group. If (*THEN) is not inside an alternation, it acts like (*PRUNE). -.P -The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). -It is like (*MARK:NAME) in that the name is remembered for passing back to the -caller. However, (*SKIP:NAME) searches only for names set with (*MARK). 
-.P -A subpattern that does not contain a | character is just a part of the -enclosing alternative; it is not a nested alternation with only one -alternative. The effect of (*THEN) extends beyond such a subpattern to the -enclosing alternative. Consider this pattern, where A, B, etc. are complex -pattern fragments that do not contain any | characters at this level: -.sp - A (B(*THEN)C) | D -.sp -If A and B are matched, but there is a failure in C, matching does not -backtrack into A; instead it moves to the next alternative, that is, D. -However, if the subpattern containing (*THEN) is given an alternative, it -behaves differently: -.sp - A (B(*THEN)C | (*FAIL)) | D -.sp -The effect of (*THEN) is now confined to the inner subpattern. After a failure -in C, matching moves to (*FAIL), which causes the whole subpattern to fail -because there are no more alternatives to try. In this case, matching does now -backtrack into A. -.P -Note that a conditional subpattern is not considered as having two -alternatives, because only one is ever used. In other words, the | character in -a conditional subpattern has a different meaning. Ignoring white space, -consider: -.sp - ^.*? (?(?=a) a | b(*THEN)c ) -.sp -If the subject is "ba", this pattern does not match. Because .*? is ungreedy, -it initially matches zero characters. The condition (?=a) then fails, the -character "b" is matched, but "c" is not. At this point, matching does not -backtrack to .*? as might perhaps be expected from the presence of the | -character. The conditional subpattern is part of the single alternative that -comprises the whole pattern, and so the match fails. (If there was a backtrack -into .*?, allowing it to match "b", the match would succeed.) -.P -The verbs just described provide four different "strengths" of control when -subsequent matching fails. (*THEN) is the weakest, carrying on the match at the -next alternative. 
(*PRUNE) comes next, failing the match at the current -starting position, but allowing an advance to the next character (for an -unanchored pattern). (*SKIP) is similar, except that the advance may be more -than one character. (*COMMIT) is the strongest, causing the entire match to -fail. -. -. -.SS "More than one backtracking verb" -.rs -.sp -If more than one backtracking verb is present in a pattern, the one that is -backtracked onto first acts. For example, consider this pattern, where A, B, -etc. are complex pattern fragments: -.sp - (A(*COMMIT)B(*THEN)C|ABD) -.sp -If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to -fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes -the next alternative (ABD) to be tried. This behaviour is consistent, but is -not always the same as Perl's. It means that if two or more backtracking verbs -appear in succession, all but the last of them have no effect. Consider this -example: -.sp - ...(*COMMIT)(*PRUNE)... -.sp -If there is a matching failure to the right, backtracking onto (*PRUNE) causes -it to be triggered, and its action is taken. There can never be a backtrack -onto (*COMMIT). -. -. -.\" HTML -.SS "Backtracking verbs in repeated groups" -.rs -.sp -PCRE differs from Perl in its handling of backtracking verbs in repeated -groups. For example, consider: -.sp - /(a(*COMMIT)b)+ac/ -.sp -If the subject is "abac", Perl matches, but PCRE fails because the (*COMMIT) in -the second repeat of the group acts. -. -. -.\" HTML -.SS "Backtracking verbs in assertions" -.rs -.sp -(*FAIL) in an assertion has its normal effect: it forces an immediate backtrack. -.P -(*ACCEPT) in a positive assertion causes the assertion to succeed without any -further processing. In a negative assertion, (*ACCEPT) causes the assertion to -fail without any further processing. -.P -The other backtracking verbs are not treated specially if they appear in a -positive assertion. 
In particular, (*THEN) skips to the next alternative in the -innermost enclosing group that has alternations, whether or not this is within -the assertion. -.P -Negative assertions are, however, different, in order to ensure that changing a -positive assertion into a negative assertion changes its result. Backtracking -into (*COMMIT), (*SKIP), or (*PRUNE) causes a negative assertion to be true, -without considering any further alternative branches in the assertion. -Backtracking into (*THEN) causes it to skip to the next enclosing alternative -within the assertion (the normal behaviour), but if the assertion does not have -such an alternative, (*THEN) behaves like (*PRUNE). -. -. -.\" HTML -.SS "Backtracking verbs in subroutines" -.rs -.sp -These behaviours occur whether or not the subpattern is called recursively. -Perl's treatment of subroutines is different in some cases. -.P -(*FAIL) in a subpattern called as a subroutine has its normal effect: it forces -an immediate backtrack. -.P -(*ACCEPT) in a subpattern called as a subroutine causes the subroutine match to -succeed without any further processing. Matching then continues after the -subroutine call. -.P -(*COMMIT), (*SKIP), and (*PRUNE) in a subpattern called as a subroutine cause -the subroutine match to fail. -.P -(*THEN) skips to the next alternative in the innermost enclosing group within -the subpattern that has alternatives. If there is no such group within the -subpattern, (*THEN) causes the subroutine match to fail. -. -. -.SH "SEE ALSO" -.rs -.sp -\fBpcreapi\fP(3), \fBpcrecallout\fP(3), \fBpcrematching\fP(3), -\fBpcresyntax\fP(3), \fBpcre\fP(3), \fBpcre16(3)\fP, \fBpcre32(3)\fP. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 23 October 2016 -Copyright (c) 1997-2016 University of Cambridge. 
-.fi diff --git a/src/pcre/doc/pcreperform.3 b/src/pcre/doc/pcreperform.3 deleted file mode 100644 index fb2aa959..00000000 --- a/src/pcre/doc/pcreperform.3 +++ /dev/null @@ -1,177 +0,0 @@ -.TH PCREPERFORM 3 "09 January 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE PERFORMANCE" -.rs -.sp -Two aspects of performance are discussed below: memory usage and processing -time. The way you express your pattern as a regular expression can affect both -of them. -. -.SH "COMPILED PATTERN MEMORY USAGE" -.rs -.sp -Patterns are compiled by PCRE into a reasonably efficient interpretive code, so -that most simple patterns do not use much memory. However, there is one case -where the memory usage of a compiled pattern can be unexpectedly large. If a -parenthesized subpattern has a quantifier with a minimum greater than 1 and/or -a limited maximum, the whole subpattern is repeated in the compiled code. For -example, the pattern -.sp - (abc|def){2,4} -.sp -is compiled as if it were -.sp - (abc|def)(abc|def)((abc|def)(abc|def)?)? -.sp -(Technical aside: It is done this way so that backtrack points within each of -the repetitions can be independently maintained.) -.P -For regular expressions whose quantifiers use only small numbers, this is not -usually a problem. However, if the numbers are large, and particularly if such -repetitions are nested, the memory usage can become an embarrassment. For -example, the very simple pattern -.sp - ((ab){1,1000}c){1,3} -.sp -uses 51K bytes when compiled using the 8-bit library. When PCRE is compiled -with its default internal pointer size of two bytes, the size limit on a -compiled pattern is 64K data units, and this is reached with the above pattern -if the outer repetition is increased from 3 to 4. PCRE can be compiled to use -larger internal pointers and thus handle larger compiled patterns, but it is -better to try to rewrite your pattern to use less memory if you can. 
-.P -One way of reducing the memory usage for such patterns is to make use of PCRE's -.\" HTML -.\" -"subroutine" -.\" -facility. Re-writing the above pattern as -.sp - ((ab)(?2){0,999}c)(?1){0,2} -.sp -reduces the memory requirements to 18K, and indeed it remains under 20K even -with the outer repetition increased to 100. However, this pattern is not -exactly equivalent, because the "subroutine" calls are treated as -.\" HTML -.\" -atomic groups -.\" -into which there can be no backtracking if there is a subsequent matching -failure. Therefore, PCRE cannot do this kind of rewriting automatically. -Furthermore, there is a noticeable loss of speed when executing the modified -pattern. Nevertheless, if the atomic grouping is not a problem and the loss of -speed is acceptable, this kind of rewriting will allow you to process patterns -that PCRE cannot otherwise handle. -. -. -.SH "STACK USAGE AT RUN TIME" -.rs -.sp -When \fBpcre_exec()\fP or \fBpcre[16|32]_exec()\fP is used for matching, certain -kinds of pattern can cause it to use large amounts of the process stack. In -some environments the default process stack is quite small, and if it runs out -the result is often SIGSEGV. This issue is probably the most frequently raised -problem with PCRE. Rewriting your pattern can often help. The -.\" HREF -\fBpcrestack\fP -.\" -documentation discusses this issue in detail. -. -. -.SH "PROCESSING TIME" -.rs -.sp -Certain items in regular expression patterns are processed more efficiently -than others. It is more efficient to use a character class like [aeiou] than a -set of single-character alternatives such as (a|e|i|o|u). In general, the -simplest construction that provides the required behaviour is usually the most -efficient. Jeffrey Friedl's book contains a lot of useful general discussion -about optimizing regular expressions for efficient performance. This document -contains a few observations about PCRE. 
-.P -Using Unicode character properties (the \ep, \eP, and \eX escapes) is slow, -because PCRE has to use a multi-stage table lookup whenever it needs a -character's property. If you can find an alternative pattern that does not use -character properties, it will probably be faster. -.P -By default, the escape sequences \eb, \ed, \es, and \ew, and the POSIX -character classes such as [:alpha:] do not use Unicode properties, partly for -backwards compatibility, and partly for performance reasons. However, you can -set PCRE_UCP if you want Unicode character properties to be used. This can -double the matching time for items such as \ed, when matched with -a traditional matching function; the performance loss is less with -a DFA matching function, and in both cases there is not much difference for -\eb. -.P -When a pattern begins with .* not in parentheses, or in parentheses that are -not the subject of a backreference, and the PCRE_DOTALL option is set, the -pattern is implicitly anchored by PCRE, since it can match only at the start of -a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this -optimization, because the . metacharacter does not then match a newline, and if -the subject string contains newlines, the pattern may match from the character -immediately following one of them instead of from the very start. For example, -the pattern -.sp - .*second -.sp -matches the subject "first\enand second" (where \en stands for a newline -character), with the match starting at the seventh character. In order to do -this, PCRE has to retry the match starting after every newline in the subject. -.P -If you are using such a pattern with subject strings that do not contain -newlines, the best performance is obtained by setting PCRE_DOTALL, or starting -the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE -from having to scan along the subject looking for a newline to restart at. 
-.P -Beware of patterns that contain nested indefinite repeats. These can take a -long time to run when applied to a string that does not match. Consider the -pattern fragment -.sp - ^(a+)* -.sp -This can match "aaaa" in 16 different ways, and this number increases very -rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4 -times, and for each of those cases other than 0 or 4, the + repeats can match -different numbers of times.) When the remainder of the pattern is such that the -entire match is going to fail, PCRE has in principle to try every possible -variation, and this can take an extremely long time, even for relatively short -strings. -.P -An optimization catches some of the more simple cases such as -.sp - (a+)*b -.sp -where a literal character follows. Before embarking on the standard matching -procedure, PCRE checks that there is a "b" later in the subject string, and if -there is not, it fails the match immediately. However, when there is no -following literal this optimization cannot be used. You can see the difference -by comparing the behaviour of -.sp - (a+)*\ed -.sp -with the pattern above. The former gives a failure almost instantly when -applied to a whole line of "a" characters, whereas the latter takes an -appreciable time with strings longer than about 20 characters. -.P -In many cases, the solution to this kind of performance issue is to use an -atomic group or a possessive quantifier. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 25 August 2012 -Copyright (c) 1997-2012 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcreposix.3 b/src/pcre/doc/pcreposix.3 deleted file mode 100644 index 77890f36..00000000 --- a/src/pcre/doc/pcreposix.3 +++ /dev/null @@ -1,267 +0,0 @@ -.TH PCREPOSIX 3 "09 January 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions. 
-.SH "SYNOPSIS" -.rs -.sp -.B #include <pcreposix.h> -.PP -.nf -.B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP, -.B " int \fIcflags\fP);" -.sp -.B int regexec(regex_t *\fIpreg\fP, const char *\fIstring\fP, -.B " size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);" -.B " size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP," -.B " char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);" -.sp -.B void regfree(regex_t *\fIpreg\fP); -.fi -. -.SH DESCRIPTION -.rs -.sp -This set of functions provides a POSIX-style API for the PCRE regular -expression 8-bit library. See the -.\" HREF -\fBpcreapi\fP -.\" -documentation for a description of PCRE's native API, which contains much -additional functionality. There is no POSIX-style wrapper for PCRE's 16-bit -and 32-bit libraries. -.P -The functions described here are just wrapper functions that ultimately call -the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fP -header file, and on Unix systems the library itself is called -\fBpcreposix.a\fP, so can be accessed by adding \fB-lpcreposix\fP to the -command for linking an application that uses them. Because the POSIX functions -call the native ones, it is also necessary to add \fB-lpcre\fP. -.P -I have implemented only those POSIX option bits that can be reasonably mapped -to PCRE native options. In addition, the option REG_EXTENDED is defined with -the value zero. This has no effect, but since programs that are written to the -POSIX interface often use it, this makes it easier to slot in PCRE as a -replacement library. Other POSIX options are not even defined. -.P -There are also some other options that are not defined by POSIX. These have -been added at the request of users who want to make use of certain -PCRE-specific features via the POSIX calling interface. -.P -When PCRE is called via these functions, it is only the API that is POSIX-like -in style.
The syntax and semantics of the regular expressions themselves are -still those of Perl, subject to the setting of various PCRE options, as -described below. "POSIX-like in style" means that the API approximates to the -POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding -domains it is probably even less compatible. -.P -The header for these functions is supplied as \fBpcreposix.h\fP to avoid any -potential clash with other POSIX libraries. It can, of course, be renamed or -aliased as \fBregex.h\fP, which is the "correct" name. It provides two -structure types, \fIregex_t\fP for compiled internal forms, and -\fIregmatch_t\fP for returning captured substrings. It also defines some -constants whose names start with "REG_"; these are used for setting options and -identifying error codes. -. -. -.SH "COMPILING A PATTERN" -.rs -.sp -The function \fBregcomp()\fP is called to compile a pattern into an -internal form. The pattern is a C string terminated by a binary zero, and -is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer -to a \fBregex_t\fP structure that is used as a base for storing information -about the compiled regular expression. -.P -The argument \fIcflags\fP is either zero, or contains one or more of the bits -defined by the following macros: -.sp - REG_DOTALL -.sp -The PCRE_DOTALL option is set when the regular expression is passed for -compilation to the native function. Note that REG_DOTALL is not part of the -POSIX standard. -.sp - REG_ICASE -.sp -The PCRE_CASELESS option is set when the regular expression is passed for -compilation to the native function. -.sp - REG_NEWLINE -.sp -The PCRE_MULTILINE option is set when the regular expression is passed for -compilation to the native function. Note that this does \fInot\fP mimic the -defined POSIX behaviour for REG_NEWLINE (see the following section). 
-.sp - REG_NOSUB -.sp -The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed -for compilation to the native function. In addition, when a pattern that is -compiled with this flag is passed to \fBregexec()\fP for matching, the -\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings -are returned. -.sp - REG_UCP -.sp -The PCRE_UCP option is set when the regular expression is passed for -compilation to the native function. This causes PCRE to use Unicode properties -when matching \ed, \ew, etc., instead of just recognizing ASCII values. Note -that REG_UCP is not part of the POSIX standard. -.sp - REG_UNGREEDY -.sp -The PCRE_UNGREEDY option is set when the regular expression is passed for -compilation to the native function. Note that REG_UNGREEDY is not part of the -POSIX standard. -.sp - REG_UTF8 -.sp -The PCRE_UTF8 option is set when the regular expression is passed for -compilation to the native function. This causes the pattern itself and all data -strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8 -is not part of the POSIX standard. -.P -In the absence of these flags, no options are passed to the native function. -This means that the regex is compiled with PCRE default semantics. In -particular, the way it handles newline characters in the subject string is the -Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only -\fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way -newlines are matched by . (they are not) or by a negative class such as [^a] -(they are). -.P -The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The -\fIpreg\fP structure is filled in on success, and one member of the structure -is public: \fIre_nsub\fP contains the number of capturing subpatterns in -the regular expression. Various error codes are defined in the header file.
-.P -NOTE: If the yield of \fBregcomp()\fP is non-zero, you must not attempt to -use the contents of the \fIpreg\fP structure. If, for example, you pass it to -\fBregexec()\fP, the result is undefined and your program is likely to crash. -. -. -.SH "MATCHING NEWLINE CHARACTERS" -.rs -.sp -This area is not simple, because POSIX and Perl take different views of things. -It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never -intended to be a POSIX engine. The following table lists the different -possibilities for matching newline characters in PCRE: -.sp - Default Change with -.sp - . matches newline no PCRE_DOTALL - newline matches [^a] yes not changeable - $ matches \en at end yes PCRE_DOLLAR_ENDONLY - $ matches \en in middle no PCRE_MULTILINE - ^ matches \en in middle no PCRE_MULTILINE -.sp -This is the equivalent table for POSIX: -.sp - Default Change with -.sp - . matches newline yes REG_NEWLINE - newline matches [^a] yes REG_NEWLINE - $ matches \en at end no REG_NEWLINE - $ matches \en in middle no REG_NEWLINE - ^ matches \en in middle no REG_NEWLINE -.sp -PCRE's behaviour is the same as Perl's, except that there is no equivalent for -PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop -newline from matching [^a]. -.P -The default POSIX newline handling can be obtained by setting PCRE_DOTALL and -PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the -REG_NEWLINE action. -. -. -.SH "MATCHING A PATTERN" -.rs -.sp -The function \fBregexec()\fP is called to match a compiled pattern \fIpreg\fP -against a given \fIstring\fP, which is by default terminated by a zero byte -(but see REG_STARTEND below), subject to the options in \fIeflags\fP. These can -be: -.sp - REG_NOTBOL -.sp -The PCRE_NOTBOL option is set when calling the underlying PCRE matching -function. -.sp - REG_NOTEMPTY -.sp -The PCRE_NOTEMPTY option is set when calling the underlying PCRE matching -function.
Note that REG_NOTEMPTY is not part of the POSIX standard. However, -setting this option can give more POSIX-like behaviour in some situations. -.sp - REG_NOTEOL -.sp -The PCRE_NOTEOL option is set when calling the underlying PCRE matching -function. -.sp - REG_STARTEND -.sp -The string is considered to start at \fIstring\fP + \fIpmatch[0].rm_so\fP and -to have a terminating NUL located at \fIstring\fP + \fIpmatch[0].rm_eo\fP -(there need not actually be a NUL at that location), regardless of the value of -\fInmatch\fP. This is a BSD extension, compatible with but not specified by -IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software -intended to be portable to other systems. Note that a non-zero \fIrm_so\fP does -not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not -how it is matched. -.P -If the pattern was compiled with the REG_NOSUB flag, no data about any matched -strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of -\fBregexec()\fP are ignored. -.P -If the value of \fInmatch\fP is zero, or if the value of \fIpmatch\fP is NULL, -no data about any matched strings is returned. -.P -Otherwise, the portion of the string that was matched, and also any captured -substrings, are returned via the \fIpmatch\fP argument, which points to an -array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the -members \fIrm_so\fP and \fIrm_eo\fP. These contain the offset to the first -character of each substring and the offset to the first character after the end -of each substring, respectively. The 0th element of the vector relates to the -entire portion of \fIstring\fP that was matched; subsequent elements relate to -the capturing subpatterns of the regular expression. Unused entries in the -array have both structure members set to -1. -.P -A successful match yields a zero return; various error codes are defined in the -header file, of which REG_NOMATCH is the "expected" failure code. -. -.
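The compile-then-match sequence described above can be sketched as follows. This is a minimal illustration, not code from the PCRE distribution: the helper name first_capture() is hypothetical, and it includes the system <regex.h> so that it builds without PCRE installed. With the PCRE wrapper you would include pcreposix.h instead and link with -lpcreposix -lpcre; REG_EXTENDED is then defined as zero and has no effect, whereas the system library needs it for alternation.

```c
#include <regex.h>    /* assumption: system POSIX header; pcreposix.h with PCRE */
#include <string.h>

#define MAX_GROUPS 10  /* arbitrary size chosen for this sketch */

/* Hypothetical helper: compile `pattern`, match it against `subject`, and
   copy capture `group` (0 = the whole match) into `out`.
   Returns 0 on a match, REG_NOMATCH when nothing matched, another non-zero
   regcomp() code when the pattern is bad, or -1 for bad arguments. */
int first_capture(const char *pattern, const char *subject, size_t group,
                  char *out, size_t out_size)
{
    regex_t re;
    regmatch_t pmatch[MAX_GROUPS];
    int rc;

    if (out == NULL || out_size == 0 || group >= MAX_GROUPS)
        return -1;
    out[0] = '\0';

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;    /* preg must not be used after a failed regcomp() */

    rc = regexec(&re, subject, MAX_GROUPS, pmatch, 0);
    if (rc == 0 && group <= re.re_nsub && pmatch[group].rm_so != -1) {
        /* rm_so is the offset of the first character of the substring,
           rm_eo the offset of the first character after its end. */
        size_t len = (size_t)(pmatch[group].rm_eo - pmatch[group].rm_so);
        if (len >= out_size)
            len = out_size - 1;    /* truncate to fit the buffer */
        memcpy(out, subject + pmatch[group].rm_so, len);
        out[len] = '\0';
    }

    regfree(&re);     /* release the memory associated with preg */
    return rc;
}
```

A caller would pass a buffer and test the return value against zero and REG_NOMATCH; an unused capture leaves the buffer as an empty string.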
-.SH "ERROR MESSAGES" -.rs -.sp -The \fBregerror()\fP function maps a non-zero errorcode from either -\fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not -NULL, the error should have arisen from the use of that structure. A message -terminated by a binary zero is placed in \fIerrbuf\fP. The length of the -message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the -function is the size of buffer needed to hold the whole message. -. -. -.SH MEMORY USAGE -.rs -.sp -Compiling a regular expression causes memory to be allocated and associated -with the \fIpreg\fP structure. The function \fBregfree()\fP frees all such -memory, after which \fIpreg\fP may no longer be used as a compiled expression. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 09 January 2012 -Copyright (c) 1997-2012 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcreprecompile.3 b/src/pcre/doc/pcreprecompile.3 deleted file mode 100644 index 40f257a9..00000000 --- a/src/pcre/doc/pcreprecompile.3 +++ /dev/null @@ -1,155 +0,0 @@ -.TH PCREPRECOMPILE 3 "12 November 2013" "PCRE 8.34" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "SAVING AND RE-USING PRECOMPILED PCRE PATTERNS" -.rs -.sp -If you are running an application that uses a large number of regular -expression patterns, it may be useful to store them in a precompiled form -instead of having to compile them every time the application is run. -If you are not using any private character tables (see the -.\" HREF -\fBpcre_maketables()\fP -.\" -documentation), this is relatively straightforward. If you are using private -tables, it is a little bit more complicated. However, if you are using the -just-in-time optimization feature, it is not possible to save and reload the -JIT data. 
-.P -If you save compiled patterns to a file, you can copy them to a different host -and run them there. If the two hosts have different endianness (byte order), -you should run the \fBpcre[16|32]_pattern_to_host_byte_order()\fP function on the -new host before trying to match the pattern. The matching functions return -PCRE_ERROR_BADENDIANNESS if they detect a pattern with the wrong endianness. -.P -Compiling regular expressions with one version of PCRE for use with a different -version is not guaranteed to work and may cause crashes, and saving and -restoring a compiled pattern loses any JIT optimization data. -. -. -.SH "SAVING A COMPILED PATTERN" -.rs -.sp -The value returned by \fBpcre[16|32]_compile()\fP points to a single block of -memory that holds the compiled pattern and associated data. You can find the -length of this block in bytes by calling \fBpcre[16|32]_fullinfo()\fP with an -argument of PCRE_INFO_SIZE. You can then save the data in any appropriate -manner. Here is sample code for the 8-bit library that compiles a pattern and -writes it to a file. It assumes that the variable \fIfd\fP is a FILE pointer -for a stream that is open for output: -.sp - int erroroffset, rc; - size_t size, written; - const char *error; - pcre *re; -.sp - re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL); - if (re == NULL) { ... handle errors ... } - rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size); - if (rc < 0) { ... handle errors ... } - written = fwrite(re, 1, size, fd); - if (written != size) { ... handle errors ... } -.sp -In this example, the bytes that make up the compiled pattern are copied -exactly. Note that this is binary data that may contain any of the 256 possible -byte values. On systems that make a distinction between binary and non-binary -data, be sure that the file is opened for binary output. -.P -If you want to write more than one pattern to a file, you will have to devise a -way of separating them.
For binary data, preceding each pattern with its length -is probably the most straightforward approach. Another possibility is to write -out the data in hexadecimal instead of binary, one pattern to a line. -.P -Saving compiled patterns in a file is only one possible way of storing them for -later use. They could equally well be saved in a database, or in the memory of -some daemon process that passes them via sockets to the processes that want -them. -.P -If the pattern has been studied, it is also possible to save the normal study -data in a similar way to the compiled pattern itself. However, if the -PCRE_STUDY_JIT_COMPILE was used, the just-in-time data that is created cannot -be saved because it is too dependent on the current environment. When studying -generates additional information, \fBpcre[16|32]_study()\fP returns a pointer to a -\fBpcre[16|32]_extra\fP data block. Its format is defined in the -.\" HTML -.\" -section on matching a pattern -.\" -in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. The \fIstudy_data\fP field points to the binary study data, and -this is what you must save (not the \fBpcre[16|32]_extra\fP block itself). The -length of the study data can be obtained by calling \fBpcre[16|32]_fullinfo()\fP -with an argument of PCRE_INFO_STUDYSIZE. Remember to check that -\fBpcre[16|32]_study()\fP did return a non-NULL value before trying to save the -study data. -. -. -.SH "RE-USING A PRECOMPILED PATTERN" -.rs -.sp -Re-using a precompiled pattern is straightforward. Having reloaded it into main -memory, called \fBpcre[16|32]_pattern_to_host_byte_order()\fP if necessary, you -pass its pointer to \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP in -the usual way. 
-.P -However, if you passed a pointer to custom character tables when the pattern -was compiled (the \fItableptr\fP argument of \fBpcre[16|32]_compile()\fP), you -must now pass a similar pointer to \fBpcre[16|32]_exec()\fP or -\fBpcre[16|32]_dfa_exec()\fP, because the value saved with the compiled pattern -will obviously be nonsense. A field in a \fBpcre[16|32]_extra()\fP block is used -to pass this data, as described in the -.\" HTML -.\" -section on matching a pattern -.\" -in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -.P -\fBWarning:\fP The tables that \fBpcre_exec()\fP and \fBpcre_dfa_exec()\fP use -must be the same as those that were used when the pattern was compiled. If this -is not the case, the behaviour is undefined. -.P -If you did not provide custom character tables when the pattern was compiled, -the pointer in the compiled pattern is NULL, which causes the matching -functions to use PCRE's internal tables. Thus, you do not need to take any -special action at run time in this case. -.P -If you saved study data with the compiled pattern, you need to create your own -\fBpcre[16|32]_extra\fP data block and set the \fIstudy_data\fP field to point -to the reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in -the \fIflags\fP field to indicate that study data is present. Then pass the -\fBpcre[16|32]_extra\fP block to the matching function in the usual way. If the -pattern was studied for just-in-time optimization, that data cannot be saved, -and so is lost by a save/restore cycle. -. -. -.SH "COMPATIBILITY WITH DIFFERENT PCRE RELEASES" -.rs -.sp -In general, it is safest to recompile all saved patterns when you update to a -new PCRE release, though not all updates actually require this. -. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 12 November 2013 -Copyright (c) 1997-2013 University of Cambridge. 
-.fi diff --git a/src/pcre/doc/pcresample.3 b/src/pcre/doc/pcresample.3 deleted file mode 100644 index d7fe7ec5..00000000 --- a/src/pcre/doc/pcresample.3 +++ /dev/null @@ -1,99 +0,0 @@ -.TH PCRESAMPLE 3 "10 January 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE SAMPLE PROGRAM" -.rs -.sp -A simple, complete demonstration program, to get you started with using PCRE, -is supplied in the file \fIpcredemo.c\fP in the PCRE distribution. A listing of -this program is given in the -.\" HREF -\fBpcredemo\fP -.\" -documentation. If you do not have a copy of the PCRE distribution, you can save -this listing to re-create \fIpcredemo.c\fP. -.P -The demonstration program, which uses the original PCRE 8-bit library, compiles -the regular expression that is its first argument, and matches it against the -subject string in its second argument. No PCRE options are set, and default -character tables are used. If matching succeeds, the program outputs the -portion of the subject that matched, together with the contents of any captured -substrings. -.P -If the -g option is given on the command line, the program then goes on to -check for further matches of the same regular expression in the same subject -string. The logic is a little bit tricky because of the possibility of matching -an empty string. Comments in the code explain what is going on. -.P -If PCRE is installed in the standard include and library directories for your -operating system, you should be able to compile the demonstration program using -this command: -.sp - gcc -o pcredemo pcredemo.c -lpcre -.sp -If PCRE is installed elsewhere, you may need to add additional options to the -command line. 
For example, on a Unix-like system that has PCRE installed in -\fI/usr/local\fP, you can compile the demonstration program using a command -like this: -.sp -.\" JOINSH - gcc -o pcredemo -I/usr/local/include pcredemo.c \e - -L/usr/local/lib -lpcre -.sp -In a Windows environment, if you want to statically link the program against a -non-dll \fBpcre.a\fP file, you must uncomment the line that defines PCRE_STATIC -before including \fBpcre.h\fP, because otherwise the \fBpcre_malloc()\fP and -\fBpcre_free()\fP exported functions will be declared -\fB__declspec(dllimport)\fP, with unwanted results. -.P -Once you have compiled and linked the demonstration program, you can run simple -tests like this: -.sp - ./pcredemo 'cat|dog' 'the cat sat on the mat' - ./pcredemo -g 'cat|dog' 'the dog sat on the cat' -.sp -Note that there is a much more comprehensive test program, called -.\" HREF -\fBpcretest\fP, -.\" -which supports many more facilities for testing regular expressions and both -PCRE libraries. The -.\" HREF -\fBpcredemo\fP -.\" -program is provided as a simple coding example. -.P -If you try to run -.\" HREF -\fBpcredemo\fP -.\" -when PCRE is not installed in the standard library directory, you may get an -error like this on some operating systems (e.g. Solaris): -.sp - ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory -.sp -This is caused by the way shared library support works on those systems. You -need to add -.sp - -R/usr/local/lib -.sp -(for example) to the compile command to get round this problem. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 10 January 2012 -Copyright (c) 1997-2012 University of Cambridge. 
-.fi diff --git a/src/pcre/doc/pcrestack.3 b/src/pcre/doc/pcrestack.3 deleted file mode 100644 index 798f0bca..00000000 --- a/src/pcre/doc/pcrestack.3 +++ /dev/null @@ -1,215 +0,0 @@ -.TH PCRESTACK 3 "24 June 2012" "PCRE 8.30" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE DISCUSSION OF STACK USAGE" -.rs -.sp -When you call \fBpcre[16|32]_exec()\fP, it makes use of an internal function -called \fBmatch()\fP. This calls itself recursively at branch points in the -pattern, in order to remember the state of the match so that it can back up and -try a different alternative if the first one fails. As matching proceeds deeper -and deeper into the tree of possibilities, the recursion depth increases. The -\fBmatch()\fP function is also called in other circumstances, for example, -whenever a parenthesized sub-pattern is entered, and in certain cases of -repetition. -.P -Not all calls of \fBmatch()\fP increase the recursion depth; for an item such -as a* it may be called several times at the same level, after matching -different numbers of a's. Furthermore, in a number of cases where the result of -the recursive call would immediately be passed back as the result of the -current call (a "tail recursion"), the function is just restarted instead. -.P -The above comments apply when \fBpcre[16|32]_exec()\fP is run in its normal -interpretive manner. If the pattern was studied with the -PCRE_STUDY_JIT_COMPILE option, and just-in-time compiling was successful, and -the options passed to \fBpcre[16|32]_exec()\fP were not incompatible, the matching -process uses the JIT-compiled code instead of the \fBmatch()\fP function. In -this case, the memory requirements are handled entirely differently. See the -.\" HREF -\fBpcrejit\fP -.\" -documentation for details. -.P -The \fBpcre[16|32]_dfa_exec()\fP function operates in an entirely different way, -and uses recursion only when there is a regular expression recursion or -subroutine call in the pattern. 
This includes the processing of assertion and -"once-only" subpatterns, which are handled like subroutine calls. Normally, -these are never very deep, and the limit on the complexity of -\fBpcre[16|32]_dfa_exec()\fP is controlled by the amount of workspace it is given. -However, it is possible to write patterns with runaway infinite recursions; -such patterns will cause \fBpcre[16|32]_dfa_exec()\fP to run out of stack. At -present, there is no protection against this. -.P -The comments that follow do NOT apply to \fBpcre[16|32]_dfa_exec()\fP; they are -relevant only for \fBpcre[16|32]_exec()\fP without the JIT optimization. -. -. -.SS "Reducing \fBpcre[16|32]_exec()\fP's stack usage" -.rs -.sp -Each time that \fBmatch()\fP is actually called recursively, it uses memory -from the process stack. For certain kinds of pattern and data, very large -amounts of stack may be needed, despite the recognition of "tail recursion". -You can often reduce the amount of recursion, and therefore the amount of stack -used, by modifying the pattern that is being matched. Consider, for example, -this pattern: -.sp - ([^<]|<(?!inet))+ -.sp -It matches from wherever it starts until it encounters " -.\" -section on extra data for \fBpcre[16|32]_exec()\fP -.\" -in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -.P -As a very rough rule of thumb, you should reckon on about 500 bytes per -recursion. Thus, if you want to limit your stack usage to 8Mb, you should set -the limit at 16000 recursions. A 64Mb stack, on the other hand, can support -around 128000 recursions. -.P -In Unix-like environments, the \fBpcretest\fP test program has a command line -option (\fB-S\fP) that can be used to increase the size of its stack. As long -as the stack is large enough, another option (\fB-M\fP) can be used to find the -smallest limits that allow a particular pattern to match a given subject -string. This is done by calling \fBpcre[16|32]_exec()\fP repeatedly with different -limits. -. -. 
-.SS "Obtaining an estimate of stack usage" -.rs -.sp -The actual amount of stack used per recursion can vary quite a lot, depending -on the compiler that was used to build PCRE and the optimization or debugging -options that were set for it. The rule of thumb value of 500 bytes mentioned -above may be larger or smaller than what is actually needed. A better -approximation can be obtained by running this command: -.sp - pcretest -m -C -.sp -The \fB-C\fP option causes \fBpcretest\fP to output information about the -options with which PCRE was compiled. When \fB-m\fP is also given (before -\fB-C\fP), information about stack use is given in a line like this: -.sp - Match recursion uses stack: approximate frame size = 640 bytes -.sp -The value is approximate because some recursions need a bit more (up to perhaps -16 more bytes). -.P -If the above command is given when PCRE is compiled to use the heap instead of -the stack for recursion, the value that is output is the size of each block -that is obtained from the heap. -. -. -.SS "Changing stack size in Unix-like systems" -.rs -.sp -In Unix-like environments, there is not often a problem with the stack unless -very long strings are involved, though the default limit on stack size varies -from system to system. Values from 8Mb to 64Mb are common. You can find your -default limit by running the command: -.sp - ulimit -s -.sp -Unfortunately, the effect of running out of stack is often SIGSEGV, though -sometimes a more explicit error message is given. You can normally increase the -limit on stack size by code such as this: -.sp - struct rlimit rlim; - getrlimit(RLIMIT_STACK, &rlim); - rlim.rlim_cur = 100*1024*1024; - setrlimit(RLIMIT_STACK, &rlim); -.sp -This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then -attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must -do this before calling \fBpcre[16|32]_exec()\fP. -. -. 
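The arithmetic behind the rule of thumb, and a more defensive version of the limit-raising code above, might look like the sketch below. Both helper names are hypothetical. The frame size is assumed to come from the "pcretest -m -C" output, and the recursion estimate is plain integer division with no safety margin, so the round figures quoted earlier (16000 and 128000) are the same calculation rounded down.

```c
#include <sys/resource.h>

/* Hypothetical helper: roughly how many match() recursions fit in
   stack_bytes when each recursion needs about frame_bytes of stack.
   Returns 0 if frame_bytes is 0. */
unsigned long estimate_recursion_limit(unsigned long stack_bytes,
                                       unsigned long frame_bytes)
{
    if (frame_bytes == 0)
        return 0;
    return stack_bytes / frame_bytes;
}

/* Hypothetical helper: raise the soft stack limit towards `wanted` bytes.
   The request is capped at the hard limit, and a soft limit that is
   already large enough is left alone.
   Returns 0 on success and -1 if getrlimit() or setrlimit() fails. */
int raise_stack_soft_limit(rlim_t wanted)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;                      /* nothing to do */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;        /* cannot exceed the hard limit */
    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim) == 0 ? 0 : -1;
}
```

As with the shorter fragment above, the limit has to be raised before the matching function is called. The estimate is only a starting point for choosing a recursion limit, since some recursions use slightly more than the reported frame size.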
-.SS "Changing stack size in Mac OS X" -.rs -.sp -Using \fBsetrlimit()\fP, as described above, should also work on Mac OS X. It -is also possible to set a stack size when linking a program. There is a -discussion about stack sizes in Mac OS X at this web site: -.\" HTML -.\" -http://developer.apple.com/qa/qa2005/qa1419.html. -.\" -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 24 June 2012 -Copyright (c) 1997-2012 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcresyntax.3 b/src/pcre/doc/pcresyntax.3 deleted file mode 100644 index 0850369f..00000000 --- a/src/pcre/doc/pcresyntax.3 +++ /dev/null @@ -1,540 +0,0 @@ -.TH PCRESYNTAX 3 "08 January 2014" "PCRE 8.35" -.SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE REGULAR EXPRESSION SYNTAX SUMMARY" -.rs -.sp -The full syntax and semantics of the regular expressions that are supported by -PCRE are described in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. This document contains a quick-reference summary of the syntax. -. -. -.SH "QUOTING" -.rs -.sp - \ex where x is non-alphanumeric is a literal x - \eQ...\eE treat enclosed characters as literal -. -. -.SH "CHARACTERS" -.rs -.sp - \ea alarm, that is, the BEL character (hex 07) - \ecx "control-x", where x is any ASCII character - \ee escape (hex 1B) - \ef form feed (hex 0C) - \en newline (hex 0A) - \er carriage return (hex 0D) - \et tab (hex 09) - \e0dd character with octal code 0dd - \eddd character with octal code ddd, or backreference - \eo{ddd..} character with octal code ddd.. - \exhh character with hex code hh - \ex{hhh..} character with hex code hhh.. -.sp -Note that \e0dd is always an octal code, and that \e8 and \e9 are the literal -characters "8" and "9". -. -. -.SH "CHARACTER TYPES" -.rs -.sp - . 
any character except newline; - in dotall mode, any character whatsoever - \eC one data unit, even in UTF mode (best avoided) - \ed a decimal digit - \eD a character that is not a decimal digit - \eh a horizontal white space character - \eH a character that is not a horizontal white space character - \eN a character that is not a newline - \ep{\fIxx\fP} a character with the \fIxx\fP property - \eP{\fIxx\fP} a character without the \fIxx\fP property - \eR a newline sequence - \es a white space character - \eS a character that is not a white space character - \ev a vertical white space character - \eV a character that is not a vertical white space character - \ew a "word" character - \eW a "non-word" character - \eX a Unicode extended grapheme cluster -.sp -By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode -or in the 16-bit and 32-bit libraries. However, if locale-specific matching is -happening, \es and \ew may also match characters with code points in the range -128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences -is changed to use Unicode properties and they match many more characters. -. -. -.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP" -.rs -.sp - C Other - Cc Control - Cf Format - Cn Unassigned - Co Private use - Cs Surrogate -.sp - L Letter - Ll Lower case letter - Lm Modifier letter - Lo Other letter - Lt Title case letter - Lu Upper case letter - L& Ll, Lu, or Lt -.sp - M Mark - Mc Spacing mark - Me Enclosing mark - Mn Non-spacing mark -.sp - N Number - Nd Decimal number - Nl Letter number - No Other number -.sp - P Punctuation - Pc Connector punctuation - Pd Dash punctuation - Pe Close punctuation - Pf Final punctuation - Pi Initial punctuation - Po Other punctuation - Ps Open punctuation -.sp - S Symbol - Sc Currency symbol - Sk Modifier symbol - Sm Mathematical symbol - So Other symbol -.sp - Z Separator - Zl Line separator - Zp Paragraph separator - Zs Space separator -. -.
-.SH "PCRE SPECIAL CATEGORY PROPERTIES FOR \ep and \eP" -.rs -.sp - Xan Alphanumeric: union of properties L and N - Xps POSIX space: property Z or tab, NL, VT, FF, CR - Xsp Perl space: property Z or tab, NL, VT, FF, CR - Xuc Univerally-named character: one that can be - represented by a Universal Character Name - Xwd Perl word: property Xan or underscore -.sp -Perl and POSIX space are now the same. Perl added VT to its space character set -at release 5.18 and PCRE changed at release 8.34. -. -. -.SH "SCRIPT NAMES FOR \ep AND \eP" -.rs -.sp -Arabic, -Armenian, -Avestan, -Balinese, -Bamum, -Bassa_Vah, -Batak, -Bengali, -Bopomofo, -Brahmi, -Braille, -Buginese, -Buhid, -Canadian_Aboriginal, -Carian, -Caucasian_Albanian, -Chakma, -Cham, -Cherokee, -Common, -Coptic, -Cuneiform, -Cypriot, -Cyrillic, -Deseret, -Devanagari, -Duployan, -Egyptian_Hieroglyphs, -Elbasan, -Ethiopic, -Georgian, -Glagolitic, -Gothic, -Grantha, -Greek, -Gujarati, -Gurmukhi, -Han, -Hangul, -Hanunoo, -Hebrew, -Hiragana, -Imperial_Aramaic, -Inherited, -Inscriptional_Pahlavi, -Inscriptional_Parthian, -Javanese, -Kaithi, -Kannada, -Katakana, -Kayah_Li, -Kharoshthi, -Khmer, -Khojki, -Khudawadi, -Lao, -Latin, -Lepcha, -Limbu, -Linear_A, -Linear_B, -Lisu, -Lycian, -Lydian, -Mahajani, -Malayalam, -Mandaic, -Manichaean, -Meetei_Mayek, -Mende_Kikakui, -Meroitic_Cursive, -Meroitic_Hieroglyphs, -Miao, -Modi, -Mongolian, -Mro, -Myanmar, -Nabataean, -New_Tai_Lue, -Nko, -Ogham, -Ol_Chiki, -Old_Italic, -Old_North_Arabian, -Old_Permic, -Old_Persian, -Old_South_Arabian, -Old_Turkic, -Oriya, -Osmanya, -Pahawh_Hmong, -Palmyrene, -Pau_Cin_Hau, -Phags_Pa, -Phoenician, -Psalter_Pahlavi, -Rejang, -Runic, -Samaritan, -Saurashtra, -Sharada, -Shavian, -Siddham, -Sinhala, -Sora_Sompeng, -Sundanese, -Syloti_Nagri, -Syriac, -Tagalog, -Tagbanwa, -Tai_Le, -Tai_Tham, -Tai_Viet, -Takri, -Tamil, -Telugu, -Thaana, -Thai, -Tibetan, -Tifinagh, -Tirhuta, -Ugaritic, -Vai, -Warang_Citi, -Yi. -. -. -.SH "CHARACTER CLASSES" -.rs -.sp - [...] 
positive character class - [^...] negative character class - [x-y] range (can be used for hex characters) - [[:xxx:]] positive POSIX named set - [[:^xxx:]] negative POSIX named set -.sp - alnum alphanumeric - alpha alphabetic - ascii 0-127 - blank space or tab - cntrl control character - digit decimal digit - graph printing, excluding space - lower lower case letter - print printing, including space - punct printing, excluding alphanumeric - space white space - upper upper case letter - word same as \ew - xdigit hexadecimal digit -.sp -In PCRE, POSIX character set names recognize only ASCII characters by default, -but some of them use Unicode properties if PCRE_UCP is set. You can use -\eQ...\eE inside a character class. -. -. -.SH "QUANTIFIERS" -.rs -.sp - ? 0 or 1, greedy - ?+ 0 or 1, possessive - ?? 0 or 1, lazy - * 0 or more, greedy - *+ 0 or more, possessive - *? 0 or more, lazy - + 1 or more, greedy - ++ 1 or more, possessive - +? 1 or more, lazy - {n} exactly n - {n,m} at least n, no more than m, greedy - {n,m}+ at least n, no more than m, possessive - {n,m}? at least n, no more than m, lazy - {n,} n or more, greedy - {n,}+ n or more, possessive - {n,}? n or more, lazy -. -. -.SH "ANCHORS AND SIMPLE ASSERTIONS" -.rs -.sp - \eb word boundary - \eB not a word boundary - ^ start of subject - also after internal newline in multiline mode - \eA start of subject - $ end of subject - also before newline at end of subject - also before internal newline in multiline mode - \eZ end of subject - also before newline at end of subject - \ez end of subject - \eG first matching position in subject -. -. -.SH "MATCH POINT RESET" -.rs -.sp - \eK reset start of match -.sp -\eK is honoured in positive assertions, but ignored in negative ones. -. -. -.SH "ALTERNATION" -.rs -.sp - expr|expr|expr... -. -. -.SH "CAPTURING" -.rs -.sp - (...) capturing group - (?<name>...) named capturing group (Perl) - (?'name'...) named capturing group (Perl) - (?P<name>...)
named capturing group (Python) - (?:...) non-capturing group - (?|...) non-capturing group; reset group numbers for - capturing groups in each alternative -. -. -.SH "ATOMIC GROUPS" -.rs -.sp - (?>...) atomic, non-capturing group -. -. -. -. -.SH "COMMENT" -.rs -.sp - (?#....) comment (not nestable) -. -. -.SH "OPTION SETTING" -.rs -.sp - (?i) caseless - (?J) allow duplicate names - (?m) multiline - (?s) single line (dotall) - (?U) default ungreedy (lazy) - (?x) extended (ignore white space) - (?-...) unset option(s) -.sp -The following are recognized only at the very start of a pattern or after one -of the newline or \eR options with similar syntax. More than one of them may -appear. -.sp - (*LIMIT_MATCH=d) set the match limit to d (decimal number) - (*LIMIT_RECURSION=d) set the recursion limit to d (decimal number) - (*NO_AUTO_POSSESS) no auto-possessification (PCRE_NO_AUTO_POSSESS) - (*NO_START_OPT) no start-match optimization (PCRE_NO_START_OPTIMIZE) - (*UTF8) set UTF-8 mode: 8-bit library (PCRE_UTF8) - (*UTF16) set UTF-16 mode: 16-bit library (PCRE_UTF16) - (*UTF32) set UTF-32 mode: 32-bit library (PCRE_UTF32) - (*UTF) set appropriate UTF mode for the library in use - (*UCP) set PCRE_UCP (use Unicode properties for \ed etc) -.sp -Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the -limits set by the caller of pcre_exec(), not increase them. -. -. -.SH "NEWLINE CONVENTION" -.rs -.sp -These are recognized only at the very start of the pattern or after option -settings with a similar syntax. -.sp - (*CR) carriage return only - (*LF) linefeed only - (*CRLF) carriage return followed by linefeed - (*ANYCRLF) all three of the above - (*ANY) any Unicode newline sequence -. -. -.SH "WHAT \eR MATCHES" -.rs -.sp -These are recognized only at the very start of the pattern or after option -setting with a similar syntax. -.sp - (*BSR_ANYCRLF) CR, LF, or CRLF - (*BSR_UNICODE) any Unicode newline sequence -. -. 
-.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS" -.rs -.sp - (?=...) positive look ahead - (?!...) negative look ahead - (?<=...) positive look behind - (?<!...) negative look behind -.sp -Each top-level branch of a look behind must be of a fixed length. -. -. -.SH "BACKREFERENCES" -.rs -.sp - \en reference by number (can be ambiguous) - \egn reference by number - \eg{n} reference by number - \eg{-n} relative reference by number - \ek<name> reference by name (Perl) - \ek'name' reference by name (Perl) - \eg{name} reference by name (Perl) - \ek{name} reference by name (.NET) - (?P=name) reference by name (Python) -. -. -.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)" -.rs -.sp - (?R) recurse whole pattern - (?n) call subpattern by absolute number - (?+n) call subpattern by relative number - (?-n) call subpattern by relative number - (?&name) call subpattern by name (Perl) - (?P>name) call subpattern by name (Python) - \eg<name> call subpattern by name (Oniguruma) - \eg'name' call subpattern by name (Oniguruma) - \eg<n> call subpattern by absolute number (Oniguruma) - \eg'n' call subpattern by absolute number (Oniguruma) - \eg<+n> call subpattern by relative number (PCRE extension) - \eg'+n' call subpattern by relative number (PCRE extension) - \eg<-n> call subpattern by relative number (PCRE extension) - \eg'-n' call subpattern by relative number (PCRE extension) -. -. -.SH "CONDITIONAL PATTERNS" -.rs -.sp - (?(condition)yes-pattern) - (?(condition)yes-pattern|no-pattern) -.sp - (?(n)... absolute reference condition - (?(+n)... relative reference condition - (?(-n)... relative reference condition - (?(<name>)... named reference condition (Perl) - (?('name')... named reference condition (Perl) - (?(name)... named reference condition (PCRE) - (?(R)... overall recursion condition - (?(Rn)... specific group recursion condition - (?(R&name)... specific recursion condition - (?(DEFINE)... define subpattern for reference - (?(assert)... assertion condition -. -.
-.SH "BACKTRACKING CONTROL" -.rs -.sp -The following act immediately they are reached: -.sp - (*ACCEPT) force successful match - (*FAIL) force backtrack; synonym (*F) - (*MARK:NAME) set name to be passed back; synonym (*:NAME) -.sp -The following act only when a subsequent match failure causes a backtrack to -reach them. They all force a match failure, but they differ in what happens -afterwards. Those that advance the start-of-match point do so only if the -pattern is not anchored. -.sp - (*COMMIT) overall failure, no advance of starting point - (*PRUNE) advance to next starting character - (*PRUNE:NAME) equivalent to (*MARK:NAME)(*PRUNE) - (*SKIP) advance to current matching position - (*SKIP:NAME) advance to position corresponding to an earlier - (*MARK:NAME); if not found, the (*SKIP) is ignored - (*THEN) local failure, backtrack to next alternation - (*THEN:NAME) equivalent to (*MARK:NAME)(*THEN) -. -. -.SH "CALLOUTS" -.rs -.sp - (?C) callout - (?Cn) callout with data n -. -. -.SH "SEE ALSO" -.rs -.sp -\fBpcrepattern\fP(3), \fBpcreapi\fP(3), \fBpcrecallout\fP(3), -\fBpcrematching\fP(3), \fBpcre\fP(3). -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 08 January 2014 -Copyright (c) 1997-2014 University of Cambridge. -.fi diff --git a/src/pcre/doc/pcretest.1 b/src/pcre/doc/pcretest.1 deleted file mode 100644 index ea7457c0..00000000 --- a/src/pcre/doc/pcretest.1 +++ /dev/null @@ -1,1160 +0,0 @@ -.TH PCRETEST 1 "23 February 2017" "PCRE 8.41" -.SH NAME -pcretest - a program for testing Perl-compatible regular expressions. -.SH SYNOPSIS -.rs -.sp -.B pcretest "[options] [input file [output file]]" -.sp -\fBpcretest\fP was written as a test program for the PCRE regular expression -library itself, but it can also be used for experimenting with regular -expressions. 
This document describes the features of the test program; for -details of the regular expressions themselves, see the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. For details of the PCRE library function calls and their -options, see the -.\" HREF -\fBpcreapi\fP -.\" -, -.\" HREF -\fBpcre16\fP -and -.\" HREF -\fBpcre32\fP -.\" -documentation. -.P -The input for \fBpcretest\fP is a sequence of regular expression patterns and -strings to be matched, as described below. The output shows the result of each -match. Options on the command line and the patterns control PCRE options and -exactly what is output. -.P -As PCRE has evolved, it has acquired many different features, and as a result, -\fBpcretest\fP now has rather a lot of obscure options for testing every -possible feature. Some of these options are specifically designed for use in -conjunction with the test script and data files that are distributed as part of -PCRE, and are unlikely to be of use otherwise. They are all documented here, -but without much justification. -. -. -.SH "INPUT DATA FORMAT" -.rs -.sp -Input to \fBpcretest\fP is processed line by line, either by calling the C -library's \fBfgets()\fP function, or via the \fBlibreadline\fP library (see -below). In Unix-like environments, \fBfgets()\fP treats any bytes other than -newline as data characters. However, in some Windows environments character 26 -(hex 1A) causes an immediate end of file, and no further data is read. For -maximum portability, therefore, it is safest to use only ASCII characters in -\fBpcretest\fP input files. -.P -The input is processed using using C's string functions, so must not -contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP -treats any bytes other than newline as data characters. -. -. -.SH "PCRE's 8-BIT, 16-BIT AND 32-BIT LIBRARIES" -.rs -.sp -From release 8.30, two separate PCRE libraries can be built. 
The original one -supports 8-bit character strings, whereas the newer 16-bit library supports -character strings encoded in 16-bit units. From release 8.32, a third library -can be built, supporting character strings encoded in 32-bit units. The -\fBpcretest\fP program can be used to test all three libraries. However, it is -itself still an 8-bit program, reading 8-bit input and writing 8-bit output. -When testing the 16-bit or 32-bit library, the patterns and data strings are -converted to 16- or 32-bit format before being passed to the PCRE library -functions. Results are converted to 8-bit for output. -.P -References to functions and structures of the form \fBpcre[16|32]_xx\fP below -mean "\fBpcre_xx\fP when using the 8-bit library, \fBpcre16_xx\fP when using -the 16-bit library, or \fBpcre32_xx\fP when using the 32-bit library". -. -. -.SH "COMMAND LINE OPTIONS" -.rs -.TP 10 -\fB-8\fP -If both the 8-bit library has been built, this option causes the 8-bit library -to be used (which is the default); if the 8-bit library has not been built, -this option causes an error. -.TP 10 -\fB-16\fP -If both the 8-bit or the 32-bit, and the 16-bit libraries have been built, this -option causes the 16-bit library to be used. If only the 16-bit library has been -built, this is the default (so has no effect). If only the 8-bit or the 32-bit -library has been built, this option causes an error. -.TP 10 -\fB-32\fP -If both the 8-bit or the 16-bit, and the 32-bit libraries have been built, this -option causes the 32-bit library to be used. If only the 32-bit library has been -built, this is the default (so has no effect). If only the 8-bit or the 16-bit -library has been built, this option causes an error. -.TP 10 -\fB-b\fP -Behave as if each pattern has the \fB/B\fP (show byte code) modifier; the -internal form is output after compilation. 
-.TP 10 -\fB-C\fP -Output the version number of the PCRE library, and all available information -about the optional features that are included, and then exit with zero exit -code. All other options are ignored. -.TP 10 -\fB-C\fP \fIoption\fP -Output information about a specific build-time option, then exit. This -functionality is intended for use in scripts such as \fBRunTest\fP. The -following options output the value and set the exit code as indicated: -.sp - ebcdic-nl the code for LF (= NL) in an EBCDIC environment: - 0x15 or 0x25 - 0 if used in an ASCII environment - exit code is always 0 - linksize the configured internal link size (2, 3, or 4) - exit code is set to the link size - newline the default newline setting: - CR, LF, CRLF, ANYCRLF, or ANY - exit code is always 0 - bsr the default setting for what \eR matches: - ANYCRLF or ANY - exit code is always 0 -.sp -The following options output 1 for true or 0 for false, and set the exit code -to the same value: -.sp - ebcdic compiled for an EBCDIC environment - jit just-in-time support is available - pcre16 the 16-bit library was built - pcre32 the 32-bit library was built - pcre8 the 8-bit library was built - ucp Unicode property support is available - utf UTF-8 and/or UTF-16 and/or UTF-32 support - is available -.sp -If an unknown option is given, an error message is output; the exit code is 0. -.TP 10 -\fB-d\fP -Behave as if each pattern has the \fB/D\fP (debug) modifier; the internal -form and information about the compiled pattern is output after compilation; -\fB-d\fP is equivalent to \fB-b -i\fP. -.TP 10 -\fB-dfa\fP -Behave as if each data line contains the \eD escape sequence; this causes the -alternative matching function, \fBpcre[16|32]_dfa_exec()\fP, to be used instead -of the standard \fBpcre[16|32]_exec()\fP function (more detail is given below). -.TP 10 -\fB-help\fP -Output a brief summary these options and then exit. 
-.TP 10 -\fB-i\fP -Behave as if each pattern has the \fB/I\fP modifier; information about the -compiled pattern is given after compilation. -.TP 10 -\fB-M\fP -Behave as if each data line contains the \eM escape sequence; this causes -PCRE to discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings by -calling \fBpcre[16|32]_exec()\fP repeatedly with different limits. -.TP 10 -\fB-m\fP -Output the size of each compiled pattern after it has been compiled. This is -equivalent to adding \fB/M\fP to each regular expression. The size is given in -bytes for both libraries. -.TP 10 -\fB-O\fP -Behave as if each pattern has the \fB/O\fP modifier, that is disable -auto-possessification for all patterns. -.TP 10 -\fB-o\fP \fIosize\fP -Set the number of elements in the output vector that is used when calling -\fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP to be \fIosize\fP. The -default value is 45, which is enough for 14 capturing subexpressions for -\fBpcre[16|32]_exec()\fP or 22 different matches for -\fBpcre[16|32]_dfa_exec()\fP. -The vector size can be changed for individual matching calls by including \eO -in the data line (see below). -.TP 10 -\fB-p\fP -Behave as if each pattern has the \fB/P\fP modifier; the POSIX wrapper API is -used to call PCRE. None of the other options has any effect when \fB-p\fP is -set. This option can be used only with the 8-bit library. -.TP 10 -\fB-q\fP -Do not output the version number of \fBpcretest\fP at the start of execution. -.TP 10 -\fB-S\fP \fIsize\fP -On Unix-like systems, set the size of the run-time stack to \fIsize\fP -megabytes. -.TP 10 -\fB-s\fP or \fB-s+\fP -Behave as if each pattern has the \fB/S\fP modifier; in other words, force each -pattern to be studied. If \fB-s+\fP is used, all the JIT compile options are -passed to \fBpcre[16|32]_study()\fP, causing just-in-time optimization to be set -up if it is available, for both full and partial matching. 
Specific JIT compile -options can be selected by following \fB-s+\fP with a digit in the range 1 to -7, which selects the JIT compile modes as follows: -.sp - 1 normal match only - 2 soft partial match only - 3 normal match and soft partial match - 4 hard partial match only - 6 soft and hard partial match - 7 all three modes (default) -.sp -If \fB-s++\fP is used instead of \fB-s+\fP (with or without a following digit), -the text "(JIT)" is added to the first output line after a match or no match -when JIT-compiled code was actually used. -.sp -Note that there are pattern options that can override \fB-s\fP, either -specifying no studying at all, or suppressing JIT compilation. -.sp -If the \fB/I\fP or \fB/D\fP option is present on a pattern (requesting output -about the compiled pattern), information about the result of studying is not -included when studying is caused only by \fB-s\fP and neither \fB-i\fP nor -\fB-d\fP is present on the command line. This behaviour means that the output -from tests that are run with and without \fB-s\fP should be identical, except -when options that output information about the actual running of a match are -set. -.sp -The \fB-M\fP, \fB-t\fP, and \fB-tm\fP options, which give information about -resources used, are likely to produce different output with and without -\fB-s\fP. Output may also differ if the \fB/C\fP option is present on an -individual pattern. This uses callouts to trace the the matching process, and -this may be different between studied and non-studied patterns. If the pattern -contains (*MARK) items there may also be differences, for the same reason. The -\fB-s\fP command line option can be overridden for specific patterns that -should never be studied (see the \fB/S\fP pattern modifier below). -.TP 10 -\fB-t\fP -Run each compile, study, and match many times with a timer, and output the -resulting times per compile, study, or match (in milliseconds). 
Do not set -\fB-m\fP with \fB-t\fP, because you will then get the size output a zillion -times, and the timing will be distorted. You can control the number of -iterations that are used for timing by following \fB-t\fP with a number (as a -separate item on the command line). For example, "-t 1000" iterates 1000 times. -The default is to iterate 500000 times. -.TP 10 -\fB-tm\fP -This is like \fB-t\fP except that it times only the matching phase, not the -compile or study phases. -.TP 10 -\fB-T\fP \fB-TM\fP -These behave like \fB-t\fP and \fB-tm\fP, but in addition, at the end of a run, -the total times for all compiles, studies, and matches are output. -. -. -.SH DESCRIPTION -.rs -.sp -If \fBpcretest\fP is given two filename arguments, it reads from the first and -writes to the second. If it is given only one filename argument, it reads from -that file and writes to stdout. Otherwise, it reads from stdin and writes to -stdout, and prompts for each line of input, using "re>" to prompt for regular -expressions, and "data>" to prompt for data lines. -.P -When \fBpcretest\fP is built, a configuration option can specify that it should -be linked with the \fBlibreadline\fP library. When this is done, if the input -is from a terminal, it is read using the \fBreadline()\fP function. This -provides line-editing and history facilities. The output from the \fB-help\fP -option states whether or not \fBreadline()\fP will be used. -.P -The program handles any number of sets of input on a single input file. Each -set starts with a regular expression, and continues with any number of data -lines to be matched against that pattern. -.P -Each data line is matched separately and independently. If you want to do -multi-line matches, you have to use the \en escape sequence (or \er or \er\en, -etc., depending on the newline setting) in a single line of input to encode the -newline sequences. 
There is no limit on the length of data lines; the input -buffer is automatically extended if it is too small. -.P -An empty line signals the end of the data lines, at which point a new regular -expression is read. The regular expressions are given enclosed in any -non-alphanumeric delimiters other than backslash, for example: -.sp - /(a|bc)x+yz/ -.sp -White space before the initial delimiter is ignored. A regular expression may -be continued over several input lines, in which case the newline characters are -included within it. It is possible to include the delimiter within the pattern -by escaping it, for example -.sp - /abc\e/def/ -.sp -If you do so, the escape and the delimiter form part of the pattern, but since -delimiters are always non-alphanumeric, this does not affect its interpretation. -If the terminating delimiter is immediately followed by a backslash, for -example, -.sp - /abc/\e -.sp -then a backslash is added to the end of the pattern. This is done to provide a -way of testing the error condition that arises if a pattern finishes with a -backslash, because -.sp - /abc\e/ -.sp -is interpreted as the first line of a pattern that starts with "abc/", causing -pcretest to read the next line as a continuation of the regular expression. -. -. -.SH "PATTERN MODIFIERS" -.rs -.sp -A pattern may be followed by any number of modifiers, which are mostly single -characters, though some of these can be qualified by further characters. -Following Perl usage, these are referred to below as, for example, "the -\fB/i\fP modifier", even though the delimiter of the pattern need not always be -a slash, and no slash is used when writing modifiers. White space may appear -between the final pattern delimiter and the first modifier, and between the -modifiers themselves. For reference, here is a complete list of modifiers. They -fall into several groups that are described in detail in the following -sections. 
-.sp - \fB/8\fP set UTF mode - \fB/9\fP set PCRE_NEVER_UTF (locks out UTF mode) - \fB/?\fP disable UTF validity check - \fB/+\fP show remainder of subject after match - \fB/=\fP show all captures (not just those that are set) -.sp - \fB/A\fP set PCRE_ANCHORED - \fB/B\fP show compiled code - \fB/C\fP set PCRE_AUTO_CALLOUT - \fB/D\fP same as \fB/B\fP plus \fB/I\fP - \fB/E\fP set PCRE_DOLLAR_ENDONLY - \fB/F\fP flip byte order in compiled pattern - \fB/f\fP set PCRE_FIRSTLINE - \fB/G\fP find all matches (shorten string) - \fB/g\fP find all matches (use startoffset) - \fB/I\fP show information about pattern - \fB/i\fP set PCRE_CASELESS - \fB/J\fP set PCRE_DUPNAMES - \fB/K\fP show backtracking control names - \fB/L\fP set locale - \fB/M\fP show compiled memory size - \fB/m\fP set PCRE_MULTILINE - \fB/N\fP set PCRE_NO_AUTO_CAPTURE - \fB/O\fP set PCRE_NO_AUTO_POSSESS - \fB/P\fP use the POSIX wrapper - \fB/Q\fP test external stack check function - \fB/S\fP study the pattern after compilation - \fB/s\fP set PCRE_DOTALL - \fB/T\fP select character tables - \fB/U\fP set PCRE_UNGREEDY - \fB/W\fP set PCRE_UCP - \fB/X\fP set PCRE_EXTRA - \fB/x\fP set PCRE_EXTENDED - \fB/Y\fP set PCRE_NO_START_OPTIMIZE - \fB/Z\fP don't show lengths in \fB/B\fP output -.sp - \fB/<any>\fP set PCRE_NEWLINE_ANY - \fB/<anycrlf>\fP set PCRE_NEWLINE_ANYCRLF - \fB/<cr>\fP set PCRE_NEWLINE_CR - \fB/<crlf>\fP set PCRE_NEWLINE_CRLF - \fB/<lf>\fP set PCRE_NEWLINE_LF - \fB/<bsr_anycrlf>\fP set PCRE_BSR_ANYCRLF - \fB/<bsr_unicode>\fP set PCRE_BSR_UNICODE - \fB/<JS>\fP set PCRE_JAVASCRIPT_COMPAT -.sp -. -. -.SS "Perl-compatible modifiers" -.rs -.sp -The \fB/i\fP, \fB/m\fP, \fB/s\fP, and \fB/x\fP modifiers set the PCRE_CASELESS, -PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when -\fBpcre[16|32]_compile()\fP is called. These four modifier letters have the same -effect as they do in Perl. For example: -.sp - /caseless/i -.sp -. -.
-.SS "Modifiers for other PCRE options" -.rs -.sp -The following table shows additional modifiers for setting PCRE compile-time -options that do not correspond to anything in Perl: -.sp - \fB/8\fP PCRE_UTF8 ) when using the 8-bit - \fB/?\fP PCRE_NO_UTF8_CHECK ) library -.sp - \fB/8\fP PCRE_UTF16 ) when using the 16-bit - \fB/?\fP PCRE_NO_UTF16_CHECK ) library -.sp - \fB/8\fP PCRE_UTF32 ) when using the 32-bit - \fB/?\fP PCRE_NO_UTF32_CHECK ) library -.sp - \fB/9\fP PCRE_NEVER_UTF - \fB/A\fP PCRE_ANCHORED - \fB/C\fP PCRE_AUTO_CALLOUT - \fB/E\fP PCRE_DOLLAR_ENDONLY - \fB/f\fP PCRE_FIRSTLINE - \fB/J\fP PCRE_DUPNAMES - \fB/N\fP PCRE_NO_AUTO_CAPTURE - \fB/O\fP PCRE_NO_AUTO_POSSESS - \fB/U\fP PCRE_UNGREEDY - \fB/W\fP PCRE_UCP - \fB/X\fP PCRE_EXTRA - \fB/Y\fP PCRE_NO_START_OPTIMIZE - \fB/<any>\fP PCRE_NEWLINE_ANY - \fB/<anycrlf>\fP PCRE_NEWLINE_ANYCRLF - \fB/<cr>\fP PCRE_NEWLINE_CR - \fB/<crlf>\fP PCRE_NEWLINE_CRLF - \fB/<lf>\fP PCRE_NEWLINE_LF - \fB/<bsr_anycrlf>\fP PCRE_BSR_ANYCRLF - \fB/<bsr_unicode>\fP PCRE_BSR_UNICODE - \fB/<JS>\fP PCRE_JAVASCRIPT_COMPAT -.sp -The modifiers that are enclosed in angle brackets are literal strings as shown, -including the angle brackets, but the letters within can be in either case. -This example sets multiline matching with CRLF as the line ending sequence: -.sp - /^abc/m<CRLF> -.sp -As well as turning on the PCRE_UTF8/16/32 option, the \fB/8\fP modifier causes -all non-printing characters in output strings to be printed using the -\ex{hh...} notation. Otherwise, those less than 0x100 are output in hex without -the curly brackets. -.P -Full details of the PCRE options are given in the -.\" HREF -\fBpcreapi\fP -.\" -documentation. -. -. -.SS "Finding all matches in a string" -.rs -.sp -Searching for all possible matches within each subject string can be requested -by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called -again to search the remainder of the subject string.
The difference between -\fB/g\fP and \fB/G\fP is that the former uses the \fIstartoffset\fP argument to -\fBpcre[16|32]_exec()\fP to start searching at a new point within the entire -string (which is in effect what Perl does), whereas the latter passes over a -shortened substring. This makes a difference to the matching process if the -pattern begins with a lookbehind assertion (including \eb or \eB). -.P -If any call to \fBpcre[16|32]_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches -an empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and -PCRE_ANCHORED flags set in order to search for another, non-empty, match at the -same point. If this second match fails, the start offset is advanced, and the -normal match is retried. This imitates the way Perl handles such cases when -using the \fB/g\fP modifier or the \fBsplit()\fP function. Normally, the start -offset is advanced by one character, but if the newline convention recognizes -CRLF as a newline, and the current character is CR followed by LF, an advance -of two is used. -. -. -.SS "Other modifiers" -.rs -.sp -There are yet more modifiers for controlling the way \fBpcretest\fP -operates. -.P -The \fB/+\fP modifier requests that as well as outputting the substring that -matched the entire pattern, \fBpcretest\fP should in addition output the -remainder of the subject string. This is useful for tests where the subject -contains multiple copies of the same substring. If the \fB+\fP modifier appears -twice, the same action is taken for captured substrings. In each case the -remainder is output on the following line with a plus character following the -capture number. Note that this modifier must not immediately follow the /S -modifier because /S+ and /S++ have other meanings. -.P -The \fB/=\fP modifier requests that the values of all potential captured -parentheses be output after a match. 
By default, only those up to the highest -one actually used in the match are output (corresponding to the return code -from \fBpcre[16|32]_exec()\fP). Values in the offsets vector corresponding to -higher numbers should be set to -1, and these are output as "<unset>". This -modifier gives a way of checking that this is happening. -.P -The \fB/B\fP modifier is a debugging feature. It requests that \fBpcretest\fP -output a representation of the compiled code after compilation. Normally this -information contains length and offset values; however, if \fB/Z\fP is also -present, this data is replaced by spaces. This is a special feature for use in -the automatic test scripts; it ensures that the same output is generated for -different internal link sizes. -.P -The \fB/D\fP modifier is a PCRE debugging feature, and is equivalent to -\fB/BI\fP, that is, both the \fB/B\fP and the \fB/I\fP modifiers. -.P -The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the -2-byte and 4-byte fields in the compiled pattern. This facility is for testing -the feature in PCRE that allows it to execute patterns that were compiled on a -host with a different endianness. This feature is not available when the POSIX -interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is -specified. See also the section about saving and reloading compiled patterns -below. -.P -The \fB/I\fP modifier requests that \fBpcretest\fP output information about the -compiled pattern (whether it is anchored, has a fixed first character, and -so on). It does this by calling \fBpcre[16|32]_fullinfo()\fP after compiling a -pattern. If the pattern is studied, the results of that are also output. In -this output, the word "char" means a non-UTF character, that is, the value of a -single data item (8-bit, 16-bit, or 32-bit, depending on the library that is -being tested).
-.P -The \fB/K\fP modifier requests \fBpcretest\fP to show names from backtracking -control verbs that are returned from calls to \fBpcre[16|32]_exec()\fP. It causes -\fBpcretest\fP to create a \fBpcre[16|32]_extra\fP block if one has not already -been created by a call to \fBpcre[16|32]_study()\fP, and to set the -PCRE_EXTRA_MARK flag and the \fBmark\fP field within it, every time that -\fBpcre[16|32]_exec()\fP is called. If the variable that the \fBmark\fP field -points to is non-NULL for a match, non-match, or partial match, \fBpcretest\fP -prints the string to which it points. For a match, this is shown on a line by -itself, tagged with "MK:". For a non-match it is added to the message. -.P -The \fB/L\fP modifier must be followed directly by the name of a locale, for -example, -.sp - /pattern/Lfr_FR -.sp -For this reason, it must be the last modifier. The given locale is set, -\fBpcre[16|32]_maketables()\fP is called to build a set of character tables for -the locale, and this is then passed to \fBpcre[16|32]_compile()\fP when compiling -the regular expression. Without an \fB/L\fP (or \fB/T\fP) modifier, NULL is -passed as the tables pointer; that is, \fB/L\fP applies only to the expression -on which it appears. -.P -The \fB/M\fP modifier causes the size in bytes of the memory block used to hold -the compiled pattern to be output. This does not include the size of the -\fBpcre[16|32]\fP block; it is just the actual compiled data. If the pattern is -successfully studied with the PCRE_STUDY_JIT_COMPILE option, the size of the -JIT compiled code is also output. -.P -The \fB/Q\fP modifier is used to test the use of \fBpcre_stack_guard\fP. It -must be followed by '0' or '1', specifying the return code to be given from an -external function that is passed to PCRE and used for stack checking during -compilation (see the -.\" HREF -\fBpcreapi\fP -.\" -documentation for details). 
-.P -The \fB/S\fP modifier causes \fBpcre[16|32]_study()\fP to be called after the -expression has been compiled, and the results used when the expression is -matched. There are a number of qualifying characters that may follow \fB/S\fP. -They may appear in any order. -.P -If \fB/S\fP is followed by an exclamation mark, \fBpcre[16|32]_study()\fP is -called with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a -\fBpcre_extra\fP block, even when studying discovers no useful information. -.P -If \fB/S\fP is followed by a second S character, it suppresses studying, even -if it was requested externally by the \fB-s\fP command line option. This makes -it possible to specify that certain patterns are always studied, and others are -never studied, independently of \fB-s\fP. This feature is used in the test -files in a few cases where the output is different when the pattern is studied. -.P -If the \fB/S\fP modifier is followed by a + character, the call to -\fBpcre[16|32]_study()\fP is made with all the JIT study options, requesting -just-in-time optimization support if it is available, for both normal and -partial matching. If you want to restrict the JIT compiling modes, you can -follow \fB/S+\fP with a digit in the range 1 to 7: -.sp - 1 normal match only - 2 soft partial match only - 3 normal match and soft partial match - 4 hard partial match only - 6 soft and hard partial match - 7 all three modes (default) -.sp -If \fB/S++\fP is used instead of \fB/S+\fP (with or without a following digit), -the text "(JIT)" is added to the first output line after a match or no match -when JIT-compiled code was actually used. -.P -Note that there is also an independent \fB/+\fP modifier; it must not be given -immediately after \fB/S\fP or \fB/S+\fP because this will be misinterpreted. -.P -If JIT studying is successful, the compiled JIT code will automatically be used -when \fBpcre[16|32]_exec()\fP is run, except when incompatible run-time options -are specified. 
For more details, see the -.\" HREF -\fBpcrejit\fP -.\" -documentation. See also the \fB\eJ\fP escape sequence below for a way of -setting the size of the JIT stack. -.P -Finally, if \fB/S\fP is followed by a minus character, JIT compilation is -suppressed, even if it was requested externally by the \fB-s\fP command line -option. This makes it possible to specify that JIT is never to be used for -certain patterns. -.P -The \fB/T\fP modifier must be followed by a single digit. It causes a specific -set of built-in character tables to be passed to \fBpcre[16|32]_compile()\fP. It -is used in the standard PCRE tests to check behaviour with different character -tables. The digit specifies the tables as follows: -.sp - 0 the default ASCII tables, as distributed in - pcre_chartables.c.dist - 1 a set of tables defining ISO 8859 characters -.sp -In table 1, some characters whose codes are greater than 128 are identified as -letters, digits, spaces, etc. -. -. -.SS "Using the POSIX wrapper API" -.rs -.sp -The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper -API rather than its native API. This supports only the 8-bit library. When -\fB/P\fP is set, the following modifiers set options for the \fBregcomp()\fP -function: -.sp - /i REG_ICASE - /m REG_NEWLINE - /N REG_NOSUB - /s REG_DOTALL ) - /U REG_UNGREEDY ) These options are not part of - /W REG_UCP ) the POSIX standard - /8 REG_UTF8 ) -.sp -The \fB/+\fP modifier works as described above. All other modifiers are -ignored. -. -. -.SS "Locking out certain modifiers" -.rs -.sp -PCRE can be compiled with or without support for certain features such as -UTF-8/16/32 or Unicode properties. Accordingly, the standard tests are split up -into a number of different files that are selected for running depending on -which features are available. 
When updating the tests, it is all too easy to -put a new test into the wrong file by mistake; for example, to put a test that -requires UTF support into a file that is used when it is not available. To help -detect such mistakes as early as possible, there is a facility for locking out -specific modifiers. If an input line for \fBpcretest\fP starts with the string -"< forbid " the following sequence of characters is taken as a list of -forbidden modifiers. For example, in the test files that must not use UTF or -Unicode property support, this line appears: -.sp - < forbid 8W -.sp -This locks out the /8 and /W modifiers. An immediate error is given if they are -subsequently encountered. If the character string contains < but not >, all the -multi-character modifiers that begin with < are locked out. Otherwise, such -modifiers must be explicitly listed, for example: -.sp - < forbid -.sp -There must be a single space between < and "forbid" for this feature to be -recognised. If there is not, the line is interpreted either as a request to -re-load a pre-compiled pattern (see "SAVING AND RELOADING COMPILED PATTERNS" -below) or, if there is a another < character, as a pattern that uses < as its -delimiter. -. -. -.SH "DATA LINES" -.rs -.sp -Before each data line is passed to \fBpcre[16|32]_exec()\fP, leading and trailing -white space is removed, and it is then scanned for \e escapes. Some of these -are pretty esoteric features, intended for checking out some of the more -complicated features of PCRE. If you are just testing "ordinary" regular -expressions, you probably don't need any of these. 
The following escapes are -recognized: -.sp - \ea alarm (BEL, \ex07) - \eb backspace (\ex08) - \ee escape (\ex27) - \ef form feed (\ex0c) - \en newline (\ex0a) -.\" JOIN - \eqdd set the PCRE_MATCH_LIMIT limit to dd - (any number of digits) - \er carriage return (\ex0d) - \et tab (\ex09) - \ev vertical tab (\ex0b) - \ennn octal character (up to 3 octal digits); always - a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode - \eo{dd...} octal character (any number of octal digits} - \exhh hexadecimal byte (up to 2 hex digits) - \ex{hh...} hexadecimal character (any number of hex digits) -.\" JOIN - \eA pass the PCRE_ANCHORED option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \eB pass the PCRE_NOTBOL option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \eCdd call pcre[16|32]_copy_substring() for substring dd - after a successful match (number less than 32) -.\" JOIN - \eCname call pcre[16|32]_copy_named_substring() for substring - "name" after a successful match (name termin- - ated by next non alphanumeric character) -.\" JOIN - \eC+ show the current captured substrings at callout - time - \eC- do not supply a callout function -.\" JOIN - \eC!n return 1 instead of 0 when callout number n is - reached -.\" JOIN - \eC!n!m return 1 instead of 0 when callout number n is - reached for the nth time -.\" JOIN - \eC*n pass the number n (may be negative) as callout - data; this is used as the callout return value - \eD use the \fBpcre[16|32]_dfa_exec()\fP match function - \eF only shortest match for \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \eGdd call pcre[16|32]_get_substring() for substring dd - after a successful match (number less than 32) -.\" JOIN - \eGname call pcre[16|32]_get_named_substring() for substring - "name" after a successful match (name termin- - ated by next non-alphanumeric character) -.\" JOIN - \eJdd set up a JIT stack of dd kilobytes maximum (any - number of digits) -.\" JOIN - \eL call 
pcre[16|32]_get_substringlist() after a - successful match -.\" JOIN - \eM discover the minimum MATCH_LIMIT and - MATCH_LIMIT_RECURSION settings -.\" JOIN - \eN pass the PCRE_NOTEMPTY option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP; if used twice, pass the - PCRE_NOTEMPTY_ATSTART option -.\" JOIN - \eOdd set the size of the output vector passed to - \fBpcre[16|32]_exec()\fP to dd (any number of digits) -.\" JOIN - \eP pass the PCRE_PARTIAL_SOFT option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP; if used twice, pass the - PCRE_PARTIAL_HARD option -.\" JOIN - \eQdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd - (any number of digits) - \eR pass the PCRE_DFA_RESTART option to \fBpcre[16|32]_dfa_exec()\fP - \eS output details of memory get/free calls during matching -.\" JOIN - \eY pass the PCRE_NO_START_OPTIMIZE option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \eZ pass the PCRE_NOTEOL option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e? 
pass the PCRE_NO_UTF[8|16|32]_CHECK option to - \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e>dd start the match at offset dd (optional "-"; then - any number of digits); this sets the \fIstartoffset\fP - argument for \fBpcre[16|32]_exec()\fP or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e<cr> pass the PCRE_NEWLINE_CR option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e<lf> pass the PCRE_NEWLINE_LF option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e<crlf> pass the PCRE_NEWLINE_CRLF option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.\" JOIN - \e<any> pass the PCRE_NEWLINE_ANY option to \fBpcre[16|32]_exec()\fP - or \fBpcre[16|32]_dfa_exec()\fP -.sp -The use of \ex{hh...} is not dependent on the use of the \fB/8\fP modifier on -the pattern. It is recognized always. There may be any number of hexadecimal -digits inside the braces; invalid values provoke error messages. -.P -Note that \exhh specifies one byte rather than one character in UTF-8 mode; -this makes it possible to construct invalid UTF-8 sequences for testing -purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in -UTF-8 mode, generating more than one byte if the value is greater than 127. -When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte -for values less than 256, and causes an error for greater values. -.P -In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. This makes it -possible to construct invalid UTF-16 sequences for testing purposes. -.P -In UTF-32 mode, all 4- to 8-digit \ex{...} values are accepted. This makes it -possible to construct invalid UTF-32 sequences for testing purposes. -.P -The escapes that specify line ending sequences are literal strings, exactly as -shown. No more than one newline setting should be present in any data line.
-.P -A backslash followed by anything else just escapes the anything else. If -the very last character is a backslash, it is ignored. This gives a way of -passing an empty line as data, since a real empty line terminates the data -input. -.P -The \fB\eJ\fP escape provides a way of setting the maximum stack size that is -used by the just-in-time optimization code. It is ignored if JIT optimization -is not being used. Providing a stack that is larger than the default 32K is -necessary only for very complicated patterns. -.P -If \eM is present, \fBpcretest\fP calls \fBpcre[16|32]_exec()\fP several times, -with different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP -fields of the \fBpcre[16|32]_extra\fP data structure, until it finds the minimum -numbers for each parameter that allow \fBpcre[16|32]_exec()\fP to complete without -error. Because this is testing a specific feature of the normal interpretive -\fBpcre[16|32]_exec()\fP execution, the use of any JIT optimization that might -have been set up by the \fB/S+\fP qualifier of \fB-s+\fP option is disabled. -.P -The \fImatch_limit\fP number is a measure of the amount of backtracking -that takes place, and checking it out can be instructive. For most simple -matches, the number is quite small, but for patterns with very large numbers of -matching possibilities, it can become large very quickly with increasing length -of subject string. The \fImatch_limit_recursion\fP number is a measure of how -much stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is -needed to complete the match attempt. -.P -When \eO is used, the value specified may be higher or lower than the size set -by the \fB-O\fP command line option (or defaulted to 45); \eO applies only to -the call of \fBpcre[16|32]_exec()\fP for the line in which it appears. 
-.P -If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper -API to be used, the only option-setting sequences that have any effect are \eB, -\eN, and \eZ, causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, -to be passed to \fBregexec()\fP. -. -. -.SH "THE ALTERNATIVE MATCHING FUNCTION" -.rs -.sp -By default, \fBpcretest\fP uses the standard PCRE matching function, -\fBpcre[16|32]_exec()\fP to match each data line. PCRE also supports an -alternative matching function, \fBpcre[16|32]_dfa_test()\fP, which operates in a -different way, and has some restrictions. The differences between the two -functions are described in the -.\" HREF -\fBpcrematching\fP -.\" -documentation. -.P -If a data line contains the \eD escape sequence, or if the command line -contains the \fB-dfa\fP option, the alternative matching function is used. -This function finds all possible matches at a given point. If, however, the \eF -escape sequence is present in the data line, it stops after the first match is -found. This is always the shortest possible match. -. -. -.SH "DEFAULT OUTPUT FROM PCRETEST" -.rs -.sp -This section describes the output when the normal matching function, -\fBpcre[16|32]_exec()\fP, is being used. -.P -When a match succeeds, \fBpcretest\fP outputs the list of captured substrings -that \fBpcre[16|32]_exec()\fP returns, starting with number 0 for the string that -matched the whole pattern. Otherwise, it outputs "No match" when the return is -PCRE_ERROR_NOMATCH, and "Partial match:" followed by the partially matching -substring when \fBpcre[16|32]_exec()\fP returns PCRE_ERROR_PARTIAL. (Note that -this is the entire substring that was inspected during the partial match; it -may include characters before the actual match start if a lookbehind assertion, -\eK, \eb, or \eB was involved.) For any other return, \fBpcretest\fP outputs -the PCRE negative error number and a short descriptive phrase. 
If the error is -a failed UTF string check, the offset of the start of the failing character and -the reason code are also output, provided that the size of the output vector is -at least two. Here is an example of an interactive \fBpcretest\fP run. -.sp - $ pcretest - PCRE version 8.13 2011-04-30 -.sp - re> /^abc(\ed+)/ - data> abc123 - 0: abc123 - 1: 123 - data> xyz - No match -.sp -Unset capturing substrings that are not followed by one that is set are not -returned by \fBpcre[16|32]_exec()\fP, and are not shown by \fBpcretest\fP. In the -following example, there are two capturing substrings, but when the first data -line is matched, the second, unset substring is not shown. An "internal" unset -substring is shown as "", as for the second data line. -.sp - re> /(a)|(b)/ - data> a - 0: a - 1: a - data> b - 0: b - 1: - 2: b -.sp -If the strings contain any non-printing characters, they are output as \exhh -escapes if the value is less than 256 and UTF mode is not set. Otherwise they -are output as \ex{hh...} escapes. See below for the definition of non-printing -characters. If the pattern has the \fB/+\fP modifier, the output for substring -0 is followed by the the rest of the subject string, identified by "0+" like -this: -.sp - re> /cat/+ - data> cataract - 0: cat - 0+ aract -.sp -If the pattern has the \fB/g\fP or \fB/G\fP modifier, the results of successive -matching attempts are output in sequence, like this: -.sp - re> /\eBi(\ew\ew)/g - data> Mississippi - 0: iss - 1: ss - 0: iss - 1: ss - 0: ipp - 1: pp -.sp -"No match" is output only if the first match attempt fails. 
Here is an example -of a failure message (the offset 4 that is specified by \e>4 is past the end of -the subject string): -.sp - re> /xyz/ - data> xyz\e>4 - Error -24 (bad offset value) -.P -If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a -data line that is successfully matched, the substrings extracted by the -convenience functions are output with C, G, or L after the string number -instead of a colon. This is in addition to the normal full list. The string -length (that is, the return from the extraction function) is given in -parentheses after each string for \fB\eC\fP and \fB\eG\fP. -.P -Note that whereas patterns can be continued over several lines (a plain ">" -prompt is used for continuations), data lines may not. However newlines can be -included in data by means of the \en escape (or \er, \er\en, etc., depending on -the newline sequence setting). -. -. -. -.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION" -.rs -.sp -When the alternative matching function, \fBpcre[16|32]_dfa_exec()\fP, is used (by -means of the \eD escape sequence or the \fB-dfa\fP command line option), the -output consists of a list of all the matches that start at the first point in -the subject where there is at least one match. For example: -.sp - re> /(tang|tangerine|tan)/ - data> yellow tangerine\eD - 0: tangerine - 1: tang - 2: tan -.sp -(Using the normal matching function on this data finds only "tang".) The -longest matching string is always given first (and numbered zero). After a -PCRE_ERROR_PARTIAL return, the output is "Partial match:", followed by the -partially matching substring. (Note that this is the entire substring that was -inspected during the partial match; it may include characters before the actual -match start if a lookbehind assertion, \eK, \eb, or \eB was involved.) -.P -If \fB/g\fP is present on the pattern, the search for further matches resumes -at the end of the longest match. 
For example: -.sp - re> /(tang|tangerine|tan)/g - data> yellow tangerine and tangy sultana\eD - 0: tangerine - 1: tang - 2: tan - 0: tang - 1: tan - 0: tan -.sp -Since the matching function does not support substring capture, the escape -sequences that are concerned with captured substrings are not relevant. -. -. -.SH "RESTARTING AFTER A PARTIAL MATCH" -.rs -.sp -When the alternative matching function has given the PCRE_ERROR_PARTIAL return, -indicating that the subject partially matched the pattern, you can restart the -match with additional subject data by means of the \eR escape sequence. For -example: -.sp - re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ - data> 23ja\eP\eD - Partial match: 23ja - data> n05\eR\eD - 0: n05 -.sp -For further information about partial matching, see the -.\" HREF -\fBpcrepartial\fP -.\" -documentation. -. -. -.SH CALLOUTS -.rs -.sp -If the pattern contains any callout requests, \fBpcretest\fP's callout function -is called during matching. This works with both matching functions. By default, -the called function displays the callout number, the start and current -positions in the text at the callout time, and the next pattern item to be -tested. For example: -.sp - --->pqrabcdef - 0 ^ ^ \ed -.sp -This output indicates that callout number 0 occurred for a match attempt -starting at the fourth character of the subject string, when the pointer was at -the seventh character of the data, and when the next pattern item was \ed. Just -one circumflex is output if the start and current positions are the same. -.P -Callouts numbered 255 are assumed to be automatic callouts, inserted as a -result of the \fB/C\fP pattern modifier. In this case, instead of showing the -callout number, the offset in the pattern, preceded by a plus, is output. For -example: -.sp - re> /\ed?[A-E]\e*/C - data> E* - --->E* - +0 ^ \ed? 
- +3 ^ [A-E] - +8 ^^ \e* - +10 ^ ^ - 0: E* -.sp -If a pattern contains (*MARK) items, an additional line is output whenever -a change of latest mark is passed to the callout function. For example: -.sp - re> /a(*MARK:X)bc/C - data> abc - --->abc - +0 ^ a - +1 ^^ (*MARK:X) - +10 ^^ b - Latest Mark: X - +11 ^ ^ c - +12 ^ ^ - 0: abc -.sp -The mark changes between matching "a" and "b", but stays the same for the rest -of the match, so nothing more is output. If, as a result of backtracking, the -mark reverts to being unset, the text "" is output. -.P -The callout function in \fBpcretest\fP returns zero (carry on matching) by -default, but you can use a \eC item in a data line (as described above) to -change this and other parameters of the callout. -.P -Inserting callouts can be helpful when using \fBpcretest\fP to check -complicated regular expressions. For further information about callouts, see -the -.\" HREF -\fBpcrecallout\fP -.\" -documentation. -. -. -. -.SH "NON-PRINTING CHARACTERS" -.rs -.sp -When \fBpcretest\fP is outputting text in the compiled version of a pattern, -bytes other than 32-126 are always treated as non-printing characters are are -therefore shown as hex escapes. -.P -When \fBpcretest\fP is outputting text that is a matched part of a subject -string, it behaves in the same way, unless a different locale has been set for -the pattern (using the \fB/L\fP modifier). In this case, the \fBisprint()\fP -function to distinguish printing and non-printing characters. -. -. -. -.SH "SAVING AND RELOADING COMPILED PATTERNS" -.rs -.sp -The facilities described in this section are not available when the POSIX -interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is -specified. -.P -When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a -compiled pattern to a file, by following the modifiers with > and a file name. 
-For example: -.sp - /pattern/im >/some/file -.sp -See the -.\" HREF -\fBpcreprecompile\fP -.\" -documentation for a discussion about saving and re-using compiled patterns. -Note that if the pattern was successfully studied with JIT optimization, the -JIT data cannot be saved. -.P -The data that is written is binary. The first eight bytes are the length of the -compiled pattern data followed by the length of the optional study data, each -written as four bytes in big-endian order (most significant byte first). If -there is no study data (either the pattern was not studied, or studying did not -return any data), the second length is zero. The lengths are followed by an -exact copy of the compiled pattern. If there is additional study data, this -(excluding any JIT data) follows immediately after the compiled pattern. After -writing the file, \fBpcretest\fP expects to read a new pattern. -.P -A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file -name instead of a pattern. There must be no space between < and the file name, -which must not contain a < character, as otherwise \fBpcretest\fP will -interpret the line as a pattern delimited by < characters. For example: -.sp - re> " to prompt for regular expressions, and "data>" to prompt for data - lines. - - When pcretest is built, a configuration option can specify that it - should be linked with the libreadline library. When this is done, if - the input is from a terminal, it is read using the readline() function. - This provides line-editing and history facilities. The output from the - -help option states whether or not readline() will be used. - - The program handles any number of sets of input on a single input file. - Each set starts with a regular expression, and continues with any num- - ber of data lines to be matched against that pattern. - - Each data line is matched separately and independently. 
If you want to - do multi-line matches, you have to use the \n escape sequence (or \r or - \r\n, etc., depending on the newline setting) in a single line of input - to encode the newline sequences. There is no limit on the length of - data lines; the input buffer is automatically extended if it is too - small. - - An empty line signals the end of the data lines, at which point a new - regular expression is read. The regular expressions are given enclosed - in any non-alphanumeric delimiters other than backslash, for example: - - /(a|bc)x+yz/ - - White space before the initial delimiter is ignored. A regular expres- - sion may be continued over several input lines, in which case the new- - line characters are included within it. It is possible to include the - delimiter within the pattern by escaping it, for example - - /abc\/def/ - - If you do so, the escape and the delimiter form part of the pattern, - but since delimiters are always non-alphanumeric, this does not affect - its interpretation. If the terminating delimiter is immediately fol- - lowed by a backslash, for example, - - /abc/\ - - then a backslash is added to the end of the pattern. This is done to - provide a way of testing the error condition that arises if a pattern - finishes with a backslash, because - - /abc\/ - - is interpreted as the first line of a pattern that starts with "abc/", - causing pcretest to read the next line as a continuation of the regular - expression. - - -PATTERN MODIFIERS - - A pattern may be followed by any number of modifiers, which are mostly - single characters, though some of these can be qualified by further - characters. Following Perl usage, these are referred to below as, for - example, "the /i modifier", even though the delimiter of the pattern - need not always be a slash, and no slash is used when writing modi- - fiers. White space may appear between the final pattern delimiter and - the first modifier, and between the modifiers themselves. 
For refer- - ence, here is a complete list of modifiers. They fall into several - groups that are described in detail in the following sections. - - /8 set UTF mode - /9 set PCRE_NEVER_UTF (locks out UTF mode) - /? disable UTF validity check - /+ show remainder of subject after match - /= show all captures (not just those that are set) - - /A set PCRE_ANCHORED - /B show compiled code - /C set PCRE_AUTO_CALLOUT - /D same as /B plus /I - /E set PCRE_DOLLAR_ENDONLY - /F flip byte order in compiled pattern - /f set PCRE_FIRSTLINE - /G find all matches (shorten string) - /g find all matches (use startoffset) - /I show information about pattern - /i set PCRE_CASELESS - /J set PCRE_DUPNAMES - /K show backtracking control names - /L set locale - /M show compiled memory size - /m set PCRE_MULTILINE - /N set PCRE_NO_AUTO_CAPTURE - /O set PCRE_NO_AUTO_POSSESS - /P use the POSIX wrapper - /Q test external stack check function - /S study the pattern after compilation - /s set PCRE_DOTALL - /T select character tables - /U set PCRE_UNGREEDY - /W set PCRE_UCP - /X set PCRE_EXTRA - /x set PCRE_EXTENDED - /Y set PCRE_NO_START_OPTIMIZE - /Z don't show lengths in /B output - - / set PCRE_NEWLINE_ANY - / set PCRE_NEWLINE_ANYCRLF - / set PCRE_NEWLINE_CR - / set PCRE_NEWLINE_CRLF - / set PCRE_NEWLINE_LF - / set PCRE_BSR_ANYCRLF - / set PCRE_BSR_UNICODE - / set PCRE_JAVASCRIPT_COMPAT - - - Perl-compatible modifiers - - The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE, - PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when - pcre[16|32]_compile() is called. These four modifier letters have the - same effect as they do in Perl. For example: - - /caseless/i - - - Modifiers for other PCRE options - - The following table shows additional modifiers for setting PCRE com- - pile-time options that do not correspond to anything in Perl: - - /8 PCRE_UTF8 ) when using the 8-bit - /? PCRE_NO_UTF8_CHECK ) library - - /8 PCRE_UTF16 ) when using the 16-bit - /? 
PCRE_NO_UTF16_CHECK ) library - - /8 PCRE_UTF32 ) when using the 32-bit - /? PCRE_NO_UTF32_CHECK ) library - - /9 PCRE_NEVER_UTF - /A PCRE_ANCHORED - /C PCRE_AUTO_CALLOUT - /E PCRE_DOLLAR_ENDONLY - /f PCRE_FIRSTLINE - /J PCRE_DUPNAMES - /N PCRE_NO_AUTO_CAPTURE - /O PCRE_NO_AUTO_POSSESS - /U PCRE_UNGREEDY - /W PCRE_UCP - /X PCRE_EXTRA - /Y PCRE_NO_START_OPTIMIZE - / PCRE_NEWLINE_ANY - / PCRE_NEWLINE_ANYCRLF - / PCRE_NEWLINE_CR - / PCRE_NEWLINE_CRLF - / PCRE_NEWLINE_LF - / PCRE_BSR_ANYCRLF - / PCRE_BSR_UNICODE - / PCRE_JAVASCRIPT_COMPAT - - The modifiers that are enclosed in angle brackets are literal strings - as shown, including the angle brackets, but the letters within can be - in either case. This example sets multiline matching with CRLF as the - line ending sequence: - - /^abc/m - - As well as turning on the PCRE_UTF8/16/32 option, the /8 modifier - causes all non-printing characters in output strings to be printed - using the \x{hh...} notation. Otherwise, those less than 0x100 are out- - put in hex without the curly brackets. - - Full details of the PCRE options are given in the pcreapi documenta- - tion. - - Finding all matches in a string - - Searching for all possible matches within each subject string can be - requested by the /g or /G modifier. After finding a match, PCRE is - called again to search the remainder of the subject string. The differ- - ence between /g and /G is that the former uses the startoffset argument - to pcre[16|32]_exec() to start searching at a new point within the - entire string (which is in effect what Perl does), whereas the latter - passes over a shortened substring. This makes a difference to the - matching process if the pattern begins with a lookbehind assertion - (including \b or \B). - - If any call to pcre[16|32]_exec() in a /g or /G sequence matches an - empty string, the next call is done with the PCRE_NOTEMPTY_ATSTART and - PCRE_ANCHORED flags set in order to search for another, non-empty, - match at the same point. 
If this second match fails, the start offset - is advanced, and the normal match is retried. This imitates the way - Perl handles such cases when using the /g modifier or the split() func- - tion. Normally, the start offset is advanced by one character, but if - the newline convention recognizes CRLF as a newline, and the current - character is CR followed by LF, an advance of two is used. - - Other modifiers - - There are yet more modifiers for controlling the way pcretest operates. - - The /+ modifier requests that as well as outputting the substring that - matched the entire pattern, pcretest should in addition output the - remainder of the subject string. This is useful for tests where the - subject contains multiple copies of the same substring. If the + modi- - fier appears twice, the same action is taken for captured substrings. - In each case the remainder is output on the following line with a plus - character following the capture number. Note that this modifier must - not immediately follow the /S modifier because /S+ and /S++ have other - meanings. - - The /= modifier requests that the values of all potential captured - parentheses be output after a match. By default, only those up to the - highest one actually used in the match are output (corresponding to the - return code from pcre[16|32]_exec()). Values in the offsets vector cor- - responding to higher numbers should be set to -1, and these are output - as "". This modifier gives a way of checking that this is hap- - pening. - - The /B modifier is a debugging feature. It requests that pcretest out- - put a representation of the compiled code after compilation. Normally - this information contains length and offset values; however, if /Z is - also present, this data is replaced by spaces. This is a special fea- - ture for use in the automatic test scripts; it ensures that the same - output is generated for different internal link sizes. 
- - The /D modifier is a PCRE debugging feature, and is equivalent to /BI, - that is, both the /B and the /I modifiers. - - The /F modifier causes pcretest to flip the byte order of the 2-byte - and 4-byte fields in the compiled pattern. This facility is for testing - the feature in PCRE that allows it to execute patterns that were com- - piled on a host with a different endianness. This feature is not avail- - able when the POSIX interface to PCRE is being used, that is, when the - /P pattern modifier is specified. See also the section about saving and - reloading compiled patterns below. - - The /I modifier requests that pcretest output information about the - compiled pattern (whether it is anchored, has a fixed first character, - and so on). It does this by calling pcre[16|32]_fullinfo() after com- - piling a pattern. If the pattern is studied, the results of that are - also output. In this output, the word "char" means a non-UTF character, - that is, the value of a single data item (8-bit, 16-bit, or 32-bit, - depending on the library that is being tested). - - The /K modifier requests pcretest to show names from backtracking con- - trol verbs that are returned from calls to pcre[16|32]_exec(). It - causes pcretest to create a pcre[16|32]_extra block if one has not - already been created by a call to pcre[16|32]_study(), and to set the - PCRE_EXTRA_MARK flag and the mark field within it, every time that - pcre[16|32]_exec() is called. If the variable that the mark field - points to is non-NULL for a match, non-match, or partial match, - pcretest prints the string to which it points. For a match, this is - shown on a line by itself, tagged with "MK:". For a non-match it is - added to the message. - - The /L modifier must be followed directly by the name of a locale, for - example, - - /pattern/Lfr_FR - - For this reason, it must be the last modifier. 
The given locale is set, - pcre[16|32]_maketables() is called to build a set of character tables - for the locale, and this is then passed to pcre[16|32]_compile() when - compiling the regular expression. Without an /L (or /T) modifier, NULL - is passed as the tables pointer; that is, /L applies only to the - expression on which it appears. - - The /M modifier causes the size in bytes of the memory block used to - hold the compiled pattern to be output. This does not include the size - of the pcre[16|32] block; it is just the actual compiled data. If the - pattern is successfully studied with the PCRE_STUDY_JIT_COMPILE option, - the size of the JIT compiled code is also output. - - The /Q modifier is used to test the use of pcre_stack_guard. It must be - followed by '0' or '1', specifying the return code to be given from an - external function that is passed to PCRE and used for stack checking - during compilation (see the pcreapi documentation for details). - - The /S modifier causes pcre[16|32]_study() to be called after the - expression has been compiled, and the results used when the expression - is matched. There are a number of qualifying characters that may follow - /S. They may appear in any order. - - If /S is followed by an exclamation mark, pcre[16|32]_study() is called - with the PCRE_STUDY_EXTRA_NEEDED option, causing it always to return a - pcre_extra block, even when studying discovers no useful information. - - If /S is followed by a second S character, it suppresses studying, even - if it was requested externally by the -s command line option. This - makes it possible to specify that certain patterns are always studied, - and others are never studied, independently of -s. This feature is used - in the test files in a few cases where the output is different when the - pattern is studied. 
- - If the /S modifier is followed by a + character, the call to - pcre[16|32]_study() is made with all the JIT study options, requesting - just-in-time optimization support if it is available, for both normal - and partial matching. If you want to restrict the JIT compiling modes, - you can follow /S+ with a digit in the range 1 to 7: - - 1 normal match only - 2 soft partial match only - 3 normal match and soft partial match - 4 hard partial match only - 6 soft and hard partial match - 7 all three modes (default) - - If /S++ is used instead of /S+ (with or without a following digit), the - text "(JIT)" is added to the first output line after a match or no - match when JIT-compiled code was actually used. - - Note that there is also an independent /+ modifier; it must not be - given immediately after /S or /S+ because this will be misinterpreted. - - If JIT studying is successful, the compiled JIT code will automatically - be used when pcre[16|32]_exec() is run, except when incompatible run- - time options are specified. For more details, see the pcrejit documen- - tation. See also the \J escape sequence below for a way of setting the - size of the JIT stack. - - Finally, if /S is followed by a minus character, JIT compilation is - suppressed, even if it was requested externally by the -s command line - option. This makes it possible to specify that JIT is never to be used - for certain patterns. - - The /T modifier must be followed by a single digit. It causes a spe- - cific set of built-in character tables to be passed to pcre[16|32]_com- - pile(). It is used in the standard PCRE tests to check behaviour with - different character tables. The digit specifies the tables as follows: - - 0 the default ASCII tables, as distributed in - pcre_chartables.c.dist - 1 a set of tables defining ISO 8859 characters - - In table 1, some characters whose codes are greater than 128 are iden- - tified as letters, digits, spaces, etc. 
- - Using the POSIX wrapper API - - The /P modifier causes pcretest to call PCRE via the POSIX wrapper API - rather than its native API. This supports only the 8-bit library. When - /P is set, the following modifiers set options for the regcomp() func- - tion: - - /i REG_ICASE - /m REG_NEWLINE - /N REG_NOSUB - /s REG_DOTALL ) - /U REG_UNGREEDY ) These options are not part of - /W REG_UCP ) the POSIX standard - /8 REG_UTF8 ) - - The /+ modifier works as described above. All other modifiers are - ignored. - - Locking out certain modifiers - - PCRE can be compiled with or without support for certain features such - as UTF-8/16/32 or Unicode properties. Accordingly, the standard tests - are split up into a number of different files that are selected for - running depending on which features are available. When updating the - tests, it is all too easy to put a new test into the wrong file by mis- - take; for example, to put a test that requires UTF support into a file - that is used when it is not available. To help detect such mistakes as - early as possible, there is a facility for locking out specific modi- - fiers. If an input line for pcretest starts with the string "< forbid " - the following sequence of characters is taken as a list of forbidden - modifiers. For example, in the test files that must not use UTF or Uni- - code property support, this line appears: - - < forbid 8W - - This locks out the /8 and /W modifiers. An immediate error is given if - they are subsequently encountered. If the character string contains < - but not >, all the multi-character modifiers that begin with < are - locked out. Otherwise, such modifiers must be explicitly listed, for - example: - - < forbid - - There must be a single space between < and "forbid" for this feature to - be recognised. 
If there is not, the line is interpreted either as a - request to re-load a pre-compiled pattern (see "SAVING AND RELOADING - COMPILED PATTERNS" below) or, if there is another < character, as a - pattern that uses < as its delimiter. - - -DATA LINES - - Before each data line is passed to pcre[16|32]_exec(), leading and - trailing white space is removed, and it is then scanned for \ escapes. - Some of these are pretty esoteric features, intended for checking out - some of the more complicated features of PCRE. If you are just testing - "ordinary" regular expressions, you probably don't need any of these. - The following escapes are recognized: - - \a alarm (BEL, \x07) - \b backspace (\x08) - \e escape (\x1b) - \f form feed (\x0c) - \n newline (\x0a) - \qdd set the PCRE_MATCH_LIMIT limit to dd - (any number of digits) - \r carriage return (\x0d) - \t tab (\x09) - \v vertical tab (\x0b) - \nnn octal character (up to 3 octal digits); always - a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode - \o{dd...} octal character (any number of octal digits) - \xhh hexadecimal byte (up to 2 hex digits) - \x{hh...} hexadecimal character (any number of hex digits) - \A pass the PCRE_ANCHORED option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \B pass the PCRE_NOTBOL option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \Cdd call pcre[16|32]_copy_substring() for substring dd - after a successful match (number less than 32) - \Cname call pcre[16|32]_copy_named_substring() for substring - "name" after a successful match (name termin- - ated by next non-alphanumeric character) - \C+ show the current captured substrings at callout - time - \C- do not supply a callout function - \C!n return 1 instead of 0 when callout number n is - reached - \C!n!m return 1 instead of 0 when callout number n is - reached for the mth time - \C*n pass the number n (may be negative) as callout - data; this is used as the callout return value - \D use the pcre[16|32]_dfa_exec() match
function - \F only shortest match for pcre[16|32]_dfa_exec() - \Gdd call pcre[16|32]_get_substring() for substring dd - after a successful match (number less than 32) - \Gname call pcre[16|32]_get_named_substring() for substring - "name" after a successful match (name termin- - ated by next non-alphanumeric character) - \Jdd set up a JIT stack of dd kilobytes maximum (any - number of digits) - \L call pcre[16|32]_get_substringlist() after a - successful match - \M discover the minimum MATCH_LIMIT and - MATCH_LIMIT_RECURSION settings - \N pass the PCRE_NOTEMPTY option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec(); if used twice, pass the - PCRE_NOTEMPTY_ATSTART option - \Odd set the size of the output vector passed to - pcre[16|32]_exec() to dd (any number of digits) - \P pass the PCRE_PARTIAL_SOFT option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec(); if used twice, pass the - PCRE_PARTIAL_HARD option - \Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd - (any number of digits) - \R pass the PCRE_DFA_RESTART option to pcre[16|32]_dfa_exec() - \S output details of memory get/free calls during matching - \Y pass the PCRE_NO_START_OPTIMIZE option to - pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \Z pass the PCRE_NOTEOL option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \? 
pass the PCRE_NO_UTF[8|16|32]_CHECK option to - pcre[16|32]_exec() or pcre[16|32]_dfa_exec() - \>dd start the match at offset dd (optional "-"; then - any number of digits); this sets the startoffset - argument for pcre[16|32]_exec() or - pcre[16|32]_dfa_exec() - \<cr> pass the PCRE_NEWLINE_CR option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \<lf> pass the PCRE_NEWLINE_LF option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \<crlf> pass the PCRE_NEWLINE_CRLF option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \<anycrlf> pass the PCRE_NEWLINE_ANYCRLF option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - \<any> pass the PCRE_NEWLINE_ANY option to pcre[16|32]_exec() - or pcre[16|32]_dfa_exec() - - The use of \x{hh...} is not dependent on the use of the /8 modifier on - the pattern. It is recognized always. There may be any number of hexa- - decimal digits inside the braces; invalid values provoke error mes- - sages. - - Note that \xhh specifies one byte rather than one character in UTF-8 - mode; this makes it possible to construct invalid UTF-8 sequences for - testing purposes. On the other hand, \x{hh} is interpreted as a UTF-8 - character in UTF-8 mode, generating more than one byte if the value is - greater than 127. When testing the 8-bit library not in UTF-8 mode, - \x{hh} generates one byte for values less than 256, and causes an error - for greater values. - - In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it - possible to construct invalid UTF-16 sequences for testing purposes. - - In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This - makes it possible to construct invalid UTF-32 sequences for testing - purposes. - - The escapes that specify line ending sequences are literal strings, - exactly as shown. No more than one newline setting should be present in - any data line. - - A backslash followed by anything else just escapes the anything else. - If the very last character is a backslash, it is ignored.
This gives a - way of passing an empty line as data, since a real empty line termi- - nates the data input. - - The \J escape provides a way of setting the maximum stack size that is - used by the just-in-time optimization code. It is ignored if JIT opti- - mization is not being used. Providing a stack that is larger than the - default 32K is necessary only for very complicated patterns. - - If \M is present, pcretest calls pcre[16|32]_exec() several times, with - different values in the match_limit and match_limit_recursion fields of - the pcre[16|32]_extra data structure, until it finds the minimum num- - bers for each parameter that allow pcre[16|32]_exec() to complete with- - out error. Because this is testing a specific feature of the normal - interpretive pcre[16|32]_exec() execution, the use of any JIT optimiza- - tion that might have been set up by the /S+ qualifier or -s+ option is - disabled. - - The match_limit number is a measure of the amount of backtracking that - takes place, and checking it out can be instructive. For most simple - matches, the number is quite small, but for patterns with very large - numbers of matching possibilities, it can become large very quickly - with increasing length of subject string. The match_limit_recursion - number is a measure of how much stack (or, if PCRE is compiled with - NO_RECURSE, how much heap) memory is needed to complete the match - attempt. - - When \O is used, the value specified may be higher or lower than the - size set by the -O command line option (or defaulted to 45); \O applies - only to the call of pcre[16|32]_exec() for the line in which it - appears. - - If the /P modifier was present on the pattern, causing the POSIX wrap- - per API to be used, the only option-setting sequences that have any - effect are \B, \N, and \Z, causing REG_NOTBOL, REG_NOTEMPTY, and - REG_NOTEOL, respectively, to be passed to regexec().
- - -THE ALTERNATIVE MATCHING FUNCTION - - By default, pcretest uses the standard PCRE matching function, - pcre[16|32]_exec() to match each data line. PCRE also supports an - alternative matching function, pcre[16|32]_dfa_exec(), which operates - in a different way, and has some restrictions. The differences between - the two functions are described in the pcrematching documentation. - - If a data line contains the \D escape sequence, or if the command line - contains the -dfa option, the alternative matching function is used. - This function finds all possible matches at a given point. If, however, - the \F escape sequence is present in the data line, it stops after the - first match is found. This is always the shortest possible match. - - -DEFAULT OUTPUT FROM PCRETEST - - This section describes the output when the normal matching function, - pcre[16|32]_exec(), is being used. - - When a match succeeds, pcretest outputs the list of captured substrings - that pcre[16|32]_exec() returns, starting with number 0 for the string - that matched the whole pattern. Otherwise, it outputs "No match" when - the return is PCRE_ERROR_NOMATCH, and "Partial match:" followed by the - partially matching substring when pcre[16|32]_exec() returns - PCRE_ERROR_PARTIAL. (Note that this is the entire substring that was - inspected during the partial match; it may include characters before - the actual match start if a lookbehind assertion, \K, \b, or \B was - involved.) For any other return, pcretest outputs the PCRE negative - error number and a short descriptive phrase. If the error is a failed - UTF string check, the offset of the start of the failing character and - the reason code are also output, provided that the size of the output - vector is at least two. Here is an example of an interactive pcretest - run.
- - $ pcretest - PCRE version 8.13 2011-04-30 - - re> /^abc(\d+)/ - data> abc123 - 0: abc123 - 1: 123 - data> xyz - No match - - Unset capturing substrings that are not followed by one that is set are - not returned by pcre[16|32]_exec(), and are not shown by pcretest. In - the following example, there are two capturing substrings, but when the - first data line is matched, the second, unset substring is not shown. - An "internal" unset substring is shown as "<unset>", as for the second - data line. - - re> /(a)|(b)/ - data> a - 0: a - 1: a - data> b - 0: b - 1: <unset> - 2: b - - If the strings contain any non-printing characters, they are output as - \xhh escapes if the value is less than 256 and UTF mode is not set. - Otherwise they are output as \x{hh...} escapes. See below for the defi- - nition of non-printing characters. If the pattern has the /+ modifier, - the output for substring 0 is followed by the rest of the subject - string, identified by "0+" like this: - - re> /cat/+ - data> cataract - 0: cat - 0+ aract - - If the pattern has the /g or /G modifier, the results of successive - matching attempts are output in sequence, like this: - - re> /\Bi(\w\w)/g - data> Mississippi - 0: iss - 1: ss - 0: iss - 1: ss - 0: ipp - 1: pp - - "No match" is output only if the first match attempt fails. Here is an - example of a failure message (the offset 4 that is specified by \>4 is - past the end of the subject string): - - re> /xyz/ - data> xyz\>4 - Error -24 (bad offset value) - - If any of the sequences \C, \G, or \L are present in a data line that - is successfully matched, the substrings extracted by the convenience - functions are output with C, G, or L after the string number instead of - a colon. This is in addition to the normal full list. The string length - (that is, the return from the extraction function) is given in paren- - theses after each string for \C and \G.
- - Note that whereas patterns can be continued over several lines (a plain - ">" prompt is used for continuations), data lines may not. However new- - lines can be included in data by means of the \n escape (or \r, \r\n, - etc., depending on the newline sequence setting). - - -OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION - - When the alternative matching function, pcre[16|32]_dfa_exec(), is used - (by means of the \D escape sequence or the -dfa command line option), - the output consists of a list of all the matches that start at the - first point in the subject where there is at least one match. For exam- - ple: - - re> /(tang|tangerine|tan)/ - data> yellow tangerine\D - 0: tangerine - 1: tang - 2: tan - - (Using the normal matching function on this data finds only "tang".) - The longest matching string is always given first (and numbered zero). - After a PCRE_ERROR_PARTIAL return, the output is "Partial match:", fol- - lowed by the partially matching substring. (Note that this is the - entire substring that was inspected during the partial match; it may - include characters before the actual match start if a lookbehind asser- - tion, \K, \b, or \B was involved.) - - If /g is present on the pattern, the search for further matches resumes - at the end of the longest match. For example: - - re> /(tang|tangerine|tan)/g - data> yellow tangerine and tangy sultana\D - 0: tangerine - 1: tang - 2: tan - 0: tang - 1: tan - 0: tan - - Since the matching function does not support substring capture, the - escape sequences that are concerned with captured substrings are not - relevant. - - -RESTARTING AFTER A PARTIAL MATCH - - When the alternative matching function has given the PCRE_ERROR_PARTIAL - return, indicating that the subject partially matched the pattern, you - can restart the match with additional subject data by means of the \R - escape sequence. 
For example: - - re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ - data> 23ja\P\D - Partial match: 23ja - data> n05\R\D - 0: n05 - - For further information about partial matching, see the pcrepartial - documentation. - - -CALLOUTS - - If the pattern contains any callout requests, pcretest's callout func- - tion is called during matching. This works with both matching func- - tions. By default, the called function displays the callout number, the - start and current positions in the text at the callout time, and the - next pattern item to be tested. For example: - - --->pqrabcdef - 0 ^ ^ \d - - This output indicates that callout number 0 occurred for a match - attempt starting at the fourth character of the subject string, when - the pointer was at the seventh character of the data, and when the next - pattern item was \d. Just one circumflex is output if the start and - current positions are the same. - - Callouts numbered 255 are assumed to be automatic callouts, inserted as - a result of the /C pattern modifier. In this case, instead of showing - the callout number, the offset in the pattern, preceded by a plus, is - output. For example: - - re> /\d?[A-E]\*/C - data> E* - --->E* - +0 ^ \d? - +3 ^ [A-E] - +8 ^^ \* - +10 ^ ^ - 0: E* - - If a pattern contains (*MARK) items, an additional line is output when- - ever a change of latest mark is passed to the callout function. For - example: - - re> /a(*MARK:X)bc/C - data> abc - --->abc - +0 ^ a - +1 ^^ (*MARK:X) - +10 ^^ b - Latest Mark: X - +11 ^ ^ c - +12 ^ ^ - 0: abc - - The mark changes between matching "a" and "b", but stays the same for - the rest of the match, so nothing more is output. If, as a result of - backtracking, the mark reverts to being unset, the text "<unset>" is - output. - - The callout function in pcretest returns zero (carry on matching) by - default, but you can use a \C item in a data line (as described above) - to change this and other parameters of the callout.
- - Inserting callouts can be helpful when using pcretest to check compli- - cated regular expressions. For further information about callouts, see - the pcrecallout documentation. - - -NON-PRINTING CHARACTERS - - When pcretest is outputting text in the compiled version of a pattern, - bytes other than 32-126 are always treated as non-printing characters - and are therefore shown as hex escapes. - - When pcretest is outputting text that is a matched part of a subject - string, it behaves in the same way, unless a different locale has been - set for the pattern (using the /L modifier). In this case, the - isprint() function is used to distinguish printing and non-printing characters. - - -SAVING AND RELOADING COMPILED PATTERNS - - The facilities described in this section are not available when the - POSIX interface to PCRE is being used, that is, when the /P pattern - modifier is specified. - - When the POSIX interface is not in use, you can cause pcretest to write - a compiled pattern to a file, by following the modifiers with > and a - file name. For example: - - /pattern/im >/some/file - - See the pcreprecompile documentation for a discussion about saving and - re-using compiled patterns. Note that if the pattern was successfully - studied with JIT optimization, the JIT data cannot be saved. - - The data that is written is binary. The first eight bytes are the - length of the compiled pattern data followed by the length of the - optional study data, each written as four bytes in big-endian order - (most significant byte first). If there is no study data (either the - pattern was not studied, or studying did not return any data), the sec- - ond length is zero. The lengths are followed by an exact copy of the - compiled pattern. If there is additional study data, this (excluding - any JIT data) follows immediately after the compiled pattern. After - writing the file, pcretest expects to read a new pattern.
- - A saved pattern can be reloaded into pcretest by specifying < and a - file name instead of a pattern. There must be no space between < and - the file name, which must not contain a < character, as otherwise - pcretest will interpret the line as a pattern delimited by < charac- - ters. For example: - - re> -.SS "Validity of UTF-8 strings" -.rs -.sp -When you set the PCRE_UTF8 flag, the byte strings passed as patterns and -subjects are (by default) checked for validity on entry to the relevant -functions. The entire string is checked before any other processing takes -place. From release 7.3 of PCRE, the check is according the rules of RFC 3629, -which are themselves derived from the Unicode specification. Earlier releases -of PCRE followed the rules of RFC 2279, which allows the full range of 31-bit -values (0 to 0x7FFFFFFF). The current check allows only values in the range U+0 -to U+10FFFF, excluding the surrogate area. (From release 8.33 the so-called -"non-character" code points are no longer excluded because Unicode corrigendum -#9 makes it clear that they should not be.) -.P -Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16, -where they are used in pairs to encode codepoints with values greater than -0xFFFF. The code points that are encoded by UTF-16 pairs are available -independently in the UTF-8 and UTF-32 encodings. (In other words, the whole -surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and -UTF-32.) -.P -If an invalid UTF-8 string is passed to PCRE, an error return is given. At -compile time, the only additional information is the offset to the first byte -of the failing character. The run-time functions \fBpcre_exec()\fP and -\fBpcre_dfa_exec()\fP also pass back this information, as well as a more -detailed reason code if the caller has provided memory in which to do this. 
-.P -In some situations, you may already know that your strings are valid, and -therefore want to skip these checks in order to improve performance, for -example in the case of a long subject string that is being scanned repeatedly. -If you set the PCRE_NO_UTF8_CHECK flag at compile time or at run time, PCRE -assumes that the pattern or subject it is given (respectively) contains only -valid UTF-8 codes. In this case, it does not diagnose an invalid UTF-8 string. -.P -Note that passing PCRE_NO_UTF8_CHECK to \fBpcre_compile()\fP just disables the -check for the pattern; it does not also apply to subject strings. If you want -to disable the check for a subject string you must pass this option to -\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. -.P -If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, the result -is undefined and your program may crash. -. -. -.\" HTML -.SS "Validity of UTF-16 strings" -.rs -.sp -When you set the PCRE_UTF16 flag, the strings of 16-bit data units that are -passed as patterns and subjects are (by default) checked for validity on entry -to the relevant functions. Values other than those in the surrogate range -U+D800 to U+DFFF are independent code points. Values in the surrogate range -must be used in pairs in the correct manner. -.P -If an invalid UTF-16 string is passed to PCRE, an error return is given. At -compile time, the only additional information is the offset to the first data -unit of the failing character. The run-time functions \fBpcre16_exec()\fP and -\fBpcre16_dfa_exec()\fP also pass back this information, as well as a more -detailed reason code if the caller has provided memory in which to do this. -.P -In some situations, you may already know that your strings are valid, and -therefore want to skip these checks in order to improve performance. 
If you set -the PCRE_NO_UTF16_CHECK flag at compile time or at run time, PCRE assumes that -the pattern or subject it is given (respectively) contains only valid UTF-16 -sequences. In this case, it does not diagnose an invalid UTF-16 string. -However, if an invalid string is passed, the result is undefined. -. -. -.\" HTML -.SS "Validity of UTF-32 strings" -.rs -.sp -When you set the PCRE_UTF32 flag, the strings of 32-bit data units that are -passed as patterns and subjects are (by default) checked for validity on entry -to the relevant functions. This check allows only values in the range U+0 -to U+10FFFF, excluding the surrogate area U+D800 to U+DFFF. -.P -If an invalid UTF-32 string is passed to PCRE, an error return is given. At -compile time, the only additional information is the offset to the first data -unit of the failing character. The run-time functions \fBpcre32_exec()\fP and -\fBpcre32_dfa_exec()\fP also pass back this information, as well as a more -detailed reason code if the caller has provided memory in which to do this. -.P -In some situations, you may already know that your strings are valid, and -therefore want to skip these checks in order to improve performance. If you set -the PCRE_NO_UTF32_CHECK flag at compile time or at run time, PCRE assumes that -the pattern or subject it is given (respectively) contains only valid UTF-32 -sequences. In this case, it does not diagnose an invalid UTF-32 string. -However, if an invalid string is passed, the result is undefined. -. -. -.SS "General comments about UTF modes" -.rs -.sp -1. Codepoints less than 256 can be specified in patterns by either braced or -unbraced hexadecimal escape sequences (for example, \ex{b3} or \exb3). Larger -values have to use braced sequences. -.P -2. Octal numbers up to \e777 are recognized, and in UTF-8 mode they match -two-byte characters for values greater than \e177. -.P -3. 
Repeat quantifiers apply to complete UTF characters, not to individual -data units, for example: \ex{100}{3}. -.P -4. The dot metacharacter matches one UTF character instead of a single data -unit. -.P -5. The escape sequence \eC can be used to match a single byte in UTF-8 mode, or -a single 16-bit data unit in UTF-16 mode, or a single 32-bit data unit in -UTF-32 mode, but its use can lead to some strange effects because it breaks up -multi-unit characters (see the description of \eC in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation). The use of \eC is not supported in the alternative matching -function \fBpcre[16|32]_dfa_exec()\fP, nor is it supported in UTF mode by the -JIT optimization of \fBpcre[16|32]_exec()\fP. If JIT optimization is requested -for a UTF pattern that contains \eC, it will not succeed, and so the matching -will be carried out by the normal interpretive function. -.P -6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly -test characters of any code value, but, by default, the characters that PCRE -recognizes as digits, spaces, or word characters remain the same set as in -non-UTF mode, all with values less than 256. This remains true even when PCRE -is built to include Unicode property support, because to do otherwise would -slow down PCRE in many common cases. Note in particular that this applies to -\eb and \eB, because they are defined in terms of \ew and \eW. If you really -want to test for a wider sense of, say, "digit", you can use explicit Unicode -property tests such as \ep{Nd}. Alternatively, if you set the PCRE_UCP option, -the way that the character escapes work is changed so that Unicode properties -are used to determine which characters match. There are more details in the -section on -.\" HTML -.\" -generic character types -.\" -in the -.\" HREF -\fBpcrepattern\fP -.\" -documentation. -.P -7. 
Similarly, characters that match the POSIX named character classes are all -low-valued characters, unless the PCRE_UCP option is set. -.P -8. However, the horizontal and vertical white space matching escapes (\eh, \eH, -\ev, and \eV) do match all the appropriate Unicode characters, whether or not -PCRE_UCP is set. -.P -9. Case-insensitive matching applies only to characters whose values are less -than 128, unless PCRE is built with Unicode property support. A few Unicode -characters such as Greek sigma have more than two codepoints that are -case-equivalent. Up to and including PCRE release 8.31, only one-to-one case -mappings were supported, but later releases (with Unicode property support) do -treat as case-equivalent all versions of characters such as Greek sigma. -. -. -.SH AUTHOR -.rs -.sp -.nf -Philip Hazel -University Computing Service -Cambridge CB2 3QH, England. -.fi -. -. -.SH REVISION -.rs -.sp -.nf -Last updated: 27 February 2013 -Copyright (c) 1997-2013 University of Cambridge. -.fi diff --git a/src/pcre/doc/perltest.txt b/src/pcre/doc/perltest.txt deleted file mode 100644 index bb1a52a4..00000000 --- a/src/pcre/doc/perltest.txt +++ /dev/null @@ -1,42 +0,0 @@ -The perltest program --------------------- - -The perltest.pl script tests Perl's regular expressions; it has the same -specification as pcretest, and so can be given identical input, except that -input patterns can be followed only by Perl's lower case modifiers and certain -other pcretest modifiers that are either handled or ignored: - - /+ recognized and handled by perltest - /++ the second + is ignored - /8 recognized and handled by perltest - /J ignored - /K ignored - /W ignored - /S ignored - /SS ignored - /Y ignored - -The pcretest \Y escape in data lines is removed before matching. The data lines -are processed as Perl double-quoted strings, so if they contain " $ or @ -characters, these have to be escaped. 
For this reason, all such characters in -the Perl-compatible testinput1 file are escaped so that they can be used for -perltest as well as for pcretest. The special upper case pattern modifiers such -as /A that pcretest recognizes, and its special data line escapes, are not used -in the Perl-compatible test file. The output should be identical, apart from -the initial identifying banner. - -The perltest.pl script can also test UTF-8 features. It recognizes the special -modifier /8 that pcretest uses to invoke UTF-8 functionality. The testinput4 -and testinput6 files can be fed to perltest to run compatible UTF-8 tests. -However, it is necessary to add "use utf8; require Encode" to the script to -make this work correctly. I have not managed to find a way to handle this -automatically. - -The other testinput files are not suitable for feeding to perltest.pl, since -they make use of the special upper case modifiers and escapes that pcretest -uses to test certain features of PCRE. Some of these files also contain -malformed regular expressions, in order to check that PCRE diagnoses them -correctly. 
- -Philip Hazel -January 2012 diff --git a/src/pcre/libpcre.pc.in b/src/pcre/libpcre.pc.in deleted file mode 100644 index 0a35da87..00000000 --- a/src/pcre/libpcre.pc.in +++ /dev/null @@ -1,13 +0,0 @@ -# Package Information for pkg-config - -prefix=@prefix@ -exec_prefix=@exec_prefix@ -libdir=@libdir@ -includedir=@includedir@ - -Name: libpcre -Description: PCRE - Perl compatible regular expressions C library with 8 bit character support -Version: @PACKAGE_VERSION@ -Libs: -L${libdir} -lpcre -Libs.private: @PTHREAD_CFLAGS@ @PTHREAD_LIBS@ -Cflags: -I${includedir} @PCRE_STATIC_CFLAG@ diff --git a/src/pcre/libpcre16.pc.in b/src/pcre/libpcre16.pc.in deleted file mode 100644 index 080c9dcf..00000000 --- a/src/pcre/libpcre16.pc.in +++ /dev/null @@ -1,13 +0,0 @@ -# Package Information for pkg-config - -prefix=@prefix@ -exec_prefix=@exec_prefix@ -libdir=@libdir@ -includedir=@includedir@ - -Name: libpcre16 -Description: PCRE - Perl compatible regular expressions C library with 16 bit character support -Version: @PACKAGE_VERSION@ -Libs: -L${libdir} -lpcre16 -Libs.private: @PTHREAD_CFLAGS@ @PTHREAD_LIBS@ -Cflags: -I${includedir} @PCRE_STATIC_CFLAG@ diff --git a/src/pcre/libpcre32.pc.in b/src/pcre/libpcre32.pc.in deleted file mode 100644 index a3ae0e11..00000000 --- a/src/pcre/libpcre32.pc.in +++ /dev/null @@ -1,13 +0,0 @@ -# Package Information for pkg-config - -prefix=@prefix@ -exec_prefix=@exec_prefix@ -libdir=@libdir@ -includedir=@includedir@ - -Name: libpcre32 -Description: PCRE - Perl compatible regular expressions C library with 32 bit character support -Version: @PACKAGE_VERSION@ -Libs: -L${libdir} -lpcre32 -Libs.private: @PTHREAD_CFLAGS@ @PTHREAD_LIBS@ -Cflags: -I${includedir} @PCRE_STATIC_CFLAG@ diff --git a/src/pcre/libpcrecpp.pc.in b/src/pcre/libpcrecpp.pc.in deleted file mode 100644 index ef006fe4..00000000 --- a/src/pcre/libpcrecpp.pc.in +++ /dev/null @@ -1,12 +0,0 @@ -# Package Information for pkg-config - -prefix=@prefix@ -exec_prefix=@exec_prefix@ 
-libdir=@libdir@ -includedir=@includedir@ - -Name: libpcrecpp -Description: PCRECPP - C++ wrapper for PCRE -Version: @PACKAGE_VERSION@ -Libs: -L${libdir} -lpcre -lpcrecpp -Cflags: -I${includedir} @PCRE_STATIC_CFLAG@ diff --git a/src/pcre/libpcreposix.pc.in b/src/pcre/libpcreposix.pc.in deleted file mode 100644 index c6c0b0c6..00000000 --- a/src/pcre/libpcreposix.pc.in +++ /dev/null @@ -1,13 +0,0 @@ -# Package Information for pkg-config - -prefix=@prefix@ -exec_prefix=@exec_prefix@ -libdir=@libdir@ -includedir=@includedir@ - -Name: libpcreposix -Description: PCREPosix - Posix compatible interface to libpcre -Version: @PACKAGE_VERSION@ -Libs: -L${libdir} -lpcreposix -Cflags: -I${includedir} @PCRE_STATIC_CFLAG@ -Requires.private: libpcre diff --git a/src/pcre/makevp.bat b/src/pcre/makevp.bat deleted file mode 100644 index 5f795487..00000000 --- a/src/pcre/makevp.bat +++ /dev/null @@ -1,66 +0,0 @@ -:: AH 20-12-06 modified for new PCRE-7.0 and VP/BCC -:: PH 19-03-07 renamed !compile.txt and !linklib.txt as makevp-compile.txt and -:: makevp-linklib.txt -:: PH 26-03-07 re-renamed !compile.txt and !linklib.txt as makevp-c.txt and -:: makevp-l.txt -:: PH 29-03-07 hopefully the final rename to makevp_c and makevp_l -:: AH 27.08.08 updated for new PCRE-7.7 -:: required PCRE.H and CONFIG.H will be generated if not existing - -@echo off -echo. -echo Compiling PCRE with BORLAND C++ for VIRTUAL PASCAL -echo. - -REM This file was contributed by Alexander Tokarev for building PCRE for use -REM with Virtual Pascal. It has not been tested with the latest PCRE release. - -REM This file has been modified and extended to compile with newer PCRE releases -REM by Stefan Weber (Angels Holocaust). 
- -REM CHANGE THIS FOR YOUR BORLAND C++ COMPILER PATH -SET BORLAND=f:\bcc -REM location of the TASM binaries, if compiling with the -B BCC switch -SET TASM=f:\tasm - -SET PATH=%PATH%;%BORLAND%\bin;%TASM%\bin -SET PCRE_VER=77 -SET COMPILE_DEFAULTS=-DHAVE_CONFIG_H -DPCRE_STATIC -I%BORLAND%\include - -del pcre%PCRE_VER%.lib >nul 2>nul - -:: sh configure - -:: check for needed header files -if not exist pcre.h copy pcre.h.generic pcre.h -if not exist config.h copy config.h.generic config.h - -bcc32 -DDFTABLES %COMPILE_DEFAULTS% -L%BORLAND%\lib dftables.c -IF ERRORLEVEL 1 GOTO ERROR - -:: dftables > chartables.c -dftables pcre_chartables.c - -REM compile and link the PCRE library into lib: option -B for ASM compile works too -bcc32 -a4 -c -RT- -y- -v- -u- -R- -Q- -X -d -fp -ff -P- -O2 -Oc -Ov -3 -w-8004 -w-8064 -w-8065 -w-8012 -UDFTABLES -DVPCOMPAT %COMPILE_DEFAULTS% @makevp_c.txt -IF ERRORLEVEL 1 GOTO ERROR - -tlib %BORLAND%\lib\cw32.lib *calloc *del *strncmp *memcpy *memmove *memset *memcmp *strlen -IF ERRORLEVEL 1 GOTO ERROR -tlib pcre%PCRE_VER%.lib @makevp_l.txt +calloc.obj +del.obj +strncmp.obj +memcpy.obj +memmove.obj +memset.obj +memcmp.obj +strlen.obj -IF ERRORLEVEL 1 GOTO ERROR - -del *.obj *.tds *.bak >nul 2>nul - -echo --- -echo Now the library should be complete. Please check all messages above. -echo Don't care for warnings, it's OK. -goto END - -:ERROR -echo --- -echo Error while compiling PCRE. Aborting... 
-pause -goto END - -:END diff --git a/src/pcre/makevp_c.txt b/src/pcre/makevp_c.txt deleted file mode 100644 index 56481154..00000000 --- a/src/pcre/makevp_c.txt +++ /dev/null @@ -1,21 +0,0 @@ -pcre_byte_order.c -pcre_chartables.c -pcre_compile.c -pcre_config.c -pcre_dfa_exec.c -pcre_exec.c -pcre_fullinfo.c -pcre_get.c -pcre_globals.c -pcre_jit_compile.c -pcre_maketables.c -pcre_newline.c -pcre_ord2utf8.c -pcre_refcount.c -pcre_string_utils.c -pcre_study.c -pcre_tables.c -pcre_ucd.c -pcre_valid_utf8.c -pcre_version.c -pcre_xclass.c diff --git a/src/pcre/makevp_l.txt b/src/pcre/makevp_l.txt deleted file mode 100644 index 9b071e0f..00000000 --- a/src/pcre/makevp_l.txt +++ /dev/null @@ -1,21 +0,0 @@ -+pcre_byte_order.obj & -+pcre_chartables.obj & -+pcre_compile.obj & -+pcre_config.obj & -+pcre_dfa_exec.obj & -+pcre_exec.obj & -+pcre_fullinfo.obj & -+pcre_get.obj & -+pcre_globals.obj & -+pcre_jit_compile.obj & -+pcre_maketables.obj & -+pcre_newline.obj & -+pcre_ord2utf8.obj & -+pcre_refcount.obj & -+pcre_string_utils.obj & -+pcre_study.obj & -+pcre_tables.obj & -+pcre_ucd.obj & -+pcre_valid_utf8.obj & -+pcre_version.obj & -+pcre_xclass.obj diff --git a/src/pcre/pcre.h.generic b/src/pcre/pcre.h.generic deleted file mode 100644 index b578eb11..00000000 --- a/src/pcre/pcre.h.generic +++ /dev/null @@ -1,677 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* This is the public header file for the PCRE library, to be #included by -applications that call the PCRE functions. 
- - Copyright (c) 1997-2014 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -#ifndef _PCRE_H -#define _PCRE_H - -/* The current PCRE version information. */ - -#define PCRE_MAJOR 8 -#define PCRE_MINOR 43 -#define PCRE_PRERELEASE -#define PCRE_DATE 2019-02-23 - -/* When an application links to a PCRE DLL in Windows, the symbols that are -imported have to be identified as such. 
When building PCRE, the appropriate -export setting is defined in pcre_internal.h, which includes this file. So we -don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL. */ - -#if defined(_WIN32) && !defined(PCRE_STATIC) -# ifndef PCRE_EXP_DECL -# define PCRE_EXP_DECL extern __declspec(dllimport) -# endif -# ifdef __cplusplus -# ifndef PCRECPP_EXP_DECL -# define PCRECPP_EXP_DECL extern __declspec(dllimport) -# endif -# ifndef PCRECPP_EXP_DEFN -# define PCRECPP_EXP_DEFN __declspec(dllimport) -# endif -# endif -#endif - -/* By default, we use the standard "extern" declarations. */ - -#ifndef PCRE_EXP_DECL -# ifdef __cplusplus -# define PCRE_EXP_DECL extern "C" -# else -# define PCRE_EXP_DECL extern -# endif -#endif - -#ifdef __cplusplus -# ifndef PCRECPP_EXP_DECL -# define PCRECPP_EXP_DECL extern -# endif -# ifndef PCRECPP_EXP_DEFN -# define PCRECPP_EXP_DEFN -# endif -#endif - -/* Have to include stdlib.h in order to ensure that size_t is defined; -it is needed here for malloc. */ - -#include <stdlib.h> - -/* Allow for C++ users */ - -#ifdef __cplusplus -extern "C" { -#endif - -/* Public options. Some are compile-time only, some are run-time only, and some -are both. Most of the compile-time options are saved with the compiled regex so -that they can be inspected during studying (and therefore JIT compiling). Note -that pcre_study() has its own set of options. Originally, all the options -defined here used distinct bits. However, almost all the bits in a 32-bit word -are now used, so in order to conserve them, option bits that were previously -only recognized at matching time (i.e. by pcre_exec() or pcre_dfa_exec()) may -also be used for compile-time options that affect only compiling and are not -relevant for studying or JIT compiling. - -Some options for pcre_compile() change its behaviour but do not affect the -behaviour of the execution functions. 
Other options are passed through to the -execution functions and affect their behaviour, with or without affecting the -behaviour of pcre_compile(). - -Options that can be passed to pcre_compile() are tagged Cx below, with these -variants: - -C1 Affects compile only -C2 Does not affect compile; affects exec, dfa_exec -C3 Affects compile, exec, dfa_exec -C4 Affects compile, exec, dfa_exec, study -C5 Affects compile, exec, study - -Options that can be set for pcre_exec() and/or pcre_dfa_exec() are flagged with -E and D, respectively. They take precedence over C3, C4, and C5 settings passed -from pcre_compile(). Those that are compatible with JIT execution are flagged -with J. */ - -#define PCRE_CASELESS 0x00000001 /* C1 */ -#define PCRE_MULTILINE 0x00000002 /* C1 */ -#define PCRE_DOTALL 0x00000004 /* C1 */ -#define PCRE_EXTENDED 0x00000008 /* C1 */ -#define PCRE_ANCHORED 0x00000010 /* C4 E D */ -#define PCRE_DOLLAR_ENDONLY 0x00000020 /* C2 */ -#define PCRE_EXTRA 0x00000040 /* C1 */ -#define PCRE_NOTBOL 0x00000080 /* E D J */ -#define PCRE_NOTEOL 0x00000100 /* E D J */ -#define PCRE_UNGREEDY 0x00000200 /* C1 */ -#define PCRE_NOTEMPTY 0x00000400 /* E D J */ -#define PCRE_UTF8 0x00000800 /* C4 ) */ -#define PCRE_UTF16 0x00000800 /* C4 ) Synonyms */ -#define PCRE_UTF32 0x00000800 /* C4 ) */ -#define PCRE_NO_AUTO_CAPTURE 0x00001000 /* C1 */ -#define PCRE_NO_UTF8_CHECK 0x00002000 /* C1 E D J ) */ -#define PCRE_NO_UTF16_CHECK 0x00002000 /* C1 E D J ) Synonyms */ -#define PCRE_NO_UTF32_CHECK 0x00002000 /* C1 E D J ) */ -#define PCRE_AUTO_CALLOUT 0x00004000 /* C1 */ -#define PCRE_PARTIAL_SOFT 0x00008000 /* E D J ) Synonyms */ -#define PCRE_PARTIAL 0x00008000 /* E D J ) */ - -/* This pair use the same bit. */ -#define PCRE_NEVER_UTF 0x00010000 /* C1 ) Overlaid */ -#define PCRE_DFA_SHORTEST 0x00010000 /* D ) Overlaid */ - -/* This pair use the same bit. 
*/ -#define PCRE_NO_AUTO_POSSESS 0x00020000 /* C1 ) Overlaid */ -#define PCRE_DFA_RESTART 0x00020000 /* D ) Overlaid */ - -#define PCRE_FIRSTLINE 0x00040000 /* C3 */ -#define PCRE_DUPNAMES 0x00080000 /* C1 */ -#define PCRE_NEWLINE_CR 0x00100000 /* C3 E D */ -#define PCRE_NEWLINE_LF 0x00200000 /* C3 E D */ -#define PCRE_NEWLINE_CRLF 0x00300000 /* C3 E D */ -#define PCRE_NEWLINE_ANY 0x00400000 /* C3 E D */ -#define PCRE_NEWLINE_ANYCRLF 0x00500000 /* C3 E D */ -#define PCRE_BSR_ANYCRLF 0x00800000 /* C3 E D */ -#define PCRE_BSR_UNICODE 0x01000000 /* C3 E D */ -#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* C5 */ -#define PCRE_NO_START_OPTIMIZE 0x04000000 /* C2 E D ) Synonyms */ -#define PCRE_NO_START_OPTIMISE 0x04000000 /* C2 E D ) */ -#define PCRE_PARTIAL_HARD 0x08000000 /* E D J */ -#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* E D J */ -#define PCRE_UCP 0x20000000 /* C3 */ - -/* Exec-time and get/set-time error codes */ - -#define PCRE_ERROR_NOMATCH (-1) -#define PCRE_ERROR_NULL (-2) -#define PCRE_ERROR_BADOPTION (-3) -#define PCRE_ERROR_BADMAGIC (-4) -#define PCRE_ERROR_UNKNOWN_OPCODE (-5) -#define PCRE_ERROR_UNKNOWN_NODE (-5) /* For backward compatibility */ -#define PCRE_ERROR_NOMEMORY (-6) -#define PCRE_ERROR_NOSUBSTRING (-7) -#define PCRE_ERROR_MATCHLIMIT (-8) -#define PCRE_ERROR_CALLOUT (-9) /* Never used by PCRE itself */ -#define PCRE_ERROR_BADUTF8 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF16 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF32 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF8_OFFSET (-11) /* Same for 8/16 */ -#define PCRE_ERROR_BADUTF16_OFFSET (-11) /* Same for 8/16 */ -#define PCRE_ERROR_PARTIAL (-12) -#define PCRE_ERROR_BADPARTIAL (-13) -#define PCRE_ERROR_INTERNAL (-14) -#define PCRE_ERROR_BADCOUNT (-15) -#define PCRE_ERROR_DFA_UITEM (-16) -#define PCRE_ERROR_DFA_UCOND (-17) -#define PCRE_ERROR_DFA_UMLIMIT (-18) -#define PCRE_ERROR_DFA_WSSIZE (-19) -#define PCRE_ERROR_DFA_RECURSE (-20) -#define 
PCRE_ERROR_RECURSIONLIMIT (-21) -#define PCRE_ERROR_NULLWSLIMIT (-22) /* No longer actually used */ -#define PCRE_ERROR_BADNEWLINE (-23) -#define PCRE_ERROR_BADOFFSET (-24) -#define PCRE_ERROR_SHORTUTF8 (-25) -#define PCRE_ERROR_SHORTUTF16 (-25) /* Same for 8/16 */ -#define PCRE_ERROR_RECURSELOOP (-26) -#define PCRE_ERROR_JIT_STACKLIMIT (-27) -#define PCRE_ERROR_BADMODE (-28) -#define PCRE_ERROR_BADENDIANNESS (-29) -#define PCRE_ERROR_DFA_BADRESTART (-30) -#define PCRE_ERROR_JIT_BADOPTION (-31) -#define PCRE_ERROR_BADLENGTH (-32) -#define PCRE_ERROR_UNSET (-33) - -/* Specific error codes for UTF-8 validity checks */ - -#define PCRE_UTF8_ERR0 0 -#define PCRE_UTF8_ERR1 1 -#define PCRE_UTF8_ERR2 2 -#define PCRE_UTF8_ERR3 3 -#define PCRE_UTF8_ERR4 4 -#define PCRE_UTF8_ERR5 5 -#define PCRE_UTF8_ERR6 6 -#define PCRE_UTF8_ERR7 7 -#define PCRE_UTF8_ERR8 8 -#define PCRE_UTF8_ERR9 9 -#define PCRE_UTF8_ERR10 10 -#define PCRE_UTF8_ERR11 11 -#define PCRE_UTF8_ERR12 12 -#define PCRE_UTF8_ERR13 13 -#define PCRE_UTF8_ERR14 14 -#define PCRE_UTF8_ERR15 15 -#define PCRE_UTF8_ERR16 16 -#define PCRE_UTF8_ERR17 17 -#define PCRE_UTF8_ERR18 18 -#define PCRE_UTF8_ERR19 19 -#define PCRE_UTF8_ERR20 20 -#define PCRE_UTF8_ERR21 21 -#define PCRE_UTF8_ERR22 22 /* Unused (was non-character) */ - -/* Specific error codes for UTF-16 validity checks */ - -#define PCRE_UTF16_ERR0 0 -#define PCRE_UTF16_ERR1 1 -#define PCRE_UTF16_ERR2 2 -#define PCRE_UTF16_ERR3 3 -#define PCRE_UTF16_ERR4 4 /* Unused (was non-character) */ - -/* Specific error codes for UTF-32 validity checks */ - -#define PCRE_UTF32_ERR0 0 -#define PCRE_UTF32_ERR1 1 -#define PCRE_UTF32_ERR2 2 /* Unused (was non-character) */ -#define PCRE_UTF32_ERR3 3 - -/* Request types for pcre_fullinfo() */ - -#define PCRE_INFO_OPTIONS 0 -#define PCRE_INFO_SIZE 1 -#define PCRE_INFO_CAPTURECOUNT 2 -#define PCRE_INFO_BACKREFMAX 3 -#define PCRE_INFO_FIRSTBYTE 4 -#define PCRE_INFO_FIRSTCHAR 4 /* For backwards compatibility */ -#define 
PCRE_INFO_FIRSTTABLE 5 -#define PCRE_INFO_LASTLITERAL 6 -#define PCRE_INFO_NAMEENTRYSIZE 7 -#define PCRE_INFO_NAMECOUNT 8 -#define PCRE_INFO_NAMETABLE 9 -#define PCRE_INFO_STUDYSIZE 10 -#define PCRE_INFO_DEFAULT_TABLES 11 -#define PCRE_INFO_OKPARTIAL 12 -#define PCRE_INFO_JCHANGED 13 -#define PCRE_INFO_HASCRORLF 14 -#define PCRE_INFO_MINLENGTH 15 -#define PCRE_INFO_JIT 16 -#define PCRE_INFO_JITSIZE 17 -#define PCRE_INFO_MAXLOOKBEHIND 18 -#define PCRE_INFO_FIRSTCHARACTER 19 -#define PCRE_INFO_FIRSTCHARACTERFLAGS 20 -#define PCRE_INFO_REQUIREDCHAR 21 -#define PCRE_INFO_REQUIREDCHARFLAGS 22 -#define PCRE_INFO_MATCHLIMIT 23 -#define PCRE_INFO_RECURSIONLIMIT 24 -#define PCRE_INFO_MATCH_EMPTY 25 - -/* Request types for pcre_config(). Do not re-arrange, in order to remain -compatible. */ - -#define PCRE_CONFIG_UTF8 0 -#define PCRE_CONFIG_NEWLINE 1 -#define PCRE_CONFIG_LINK_SIZE 2 -#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD 3 -#define PCRE_CONFIG_MATCH_LIMIT 4 -#define PCRE_CONFIG_STACKRECURSE 5 -#define PCRE_CONFIG_UNICODE_PROPERTIES 6 -#define PCRE_CONFIG_MATCH_LIMIT_RECURSION 7 -#define PCRE_CONFIG_BSR 8 -#define PCRE_CONFIG_JIT 9 -#define PCRE_CONFIG_UTF16 10 -#define PCRE_CONFIG_JITTARGET 11 -#define PCRE_CONFIG_UTF32 12 -#define PCRE_CONFIG_PARENS_LIMIT 13 - -/* Request types for pcre_study(). Do not re-arrange, in order to remain -compatible. */ - -#define PCRE_STUDY_JIT_COMPILE 0x0001 -#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE 0x0002 -#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE 0x0004 -#define PCRE_STUDY_EXTRA_NEEDED 0x0008 - -/* Bit flags for the pcre[16|32]_extra structure. Do not re-arrange or redefine -these bits, just add new ones on the end, in order to remain compatible. 
*/ - -#define PCRE_EXTRA_STUDY_DATA 0x0001 -#define PCRE_EXTRA_MATCH_LIMIT 0x0002 -#define PCRE_EXTRA_CALLOUT_DATA 0x0004 -#define PCRE_EXTRA_TABLES 0x0008 -#define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0x0010 -#define PCRE_EXTRA_MARK 0x0020 -#define PCRE_EXTRA_EXECUTABLE_JIT 0x0040 - -/* Types */ - -struct real_pcre8_or_16; /* declaration; the definition is private */ -typedef struct real_pcre8_or_16 pcre; - -struct real_pcre8_or_16; /* declaration; the definition is private */ -typedef struct real_pcre8_or_16 pcre16; - -struct real_pcre32; /* declaration; the definition is private */ -typedef struct real_pcre32 pcre32; - -struct real_pcre_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre_jit_stack pcre_jit_stack; - -struct real_pcre16_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre16_jit_stack pcre16_jit_stack; - -struct real_pcre32_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre32_jit_stack pcre32_jit_stack; - -/* If PCRE is compiled with 16 bit character support, PCRE_UCHAR16 must contain -a 16 bit wide signed data type. Otherwise it can be a dummy data type since -pcre16 functions are not implemented. There is a check for this in pcre_internal.h. */ -#ifndef PCRE_UCHAR16 -#define PCRE_UCHAR16 unsigned short -#endif - -#ifndef PCRE_SPTR16 -#define PCRE_SPTR16 const PCRE_UCHAR16 * -#endif - -/* If PCRE is compiled with 32 bit character support, PCRE_UCHAR32 must contain -a 32 bit wide signed data type. Otherwise it can be a dummy data type since -pcre32 functions are not implemented. There is a check for this in pcre_internal.h. */ -#ifndef PCRE_UCHAR32 -#define PCRE_UCHAR32 unsigned int -#endif - -#ifndef PCRE_SPTR32 -#define PCRE_SPTR32 const PCRE_UCHAR32 * -#endif - -/* When PCRE is compiled as a C++ library, the subject pointer type can be -replaced with a custom type. For conventional use, the public interface is a -const char *. 
*/ - -#ifndef PCRE_SPTR -#define PCRE_SPTR const char * -#endif - -/* The structure for passing additional data to pcre_exec(). This is defined in -such as way as to be extensible. Always add new fields at the end, in order to -remain compatible. */ - -typedef struct pcre_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - unsigned char **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre_extra; - -/* Same structure as above, but with 16 bit char pointers. */ - -typedef struct pcre16_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - PCRE_UCHAR16 **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre16_extra; - -/* Same structure as above, but with 32 bit char pointers. 
*/ - -typedef struct pcre32_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - PCRE_UCHAR32 **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre32_extra; - -/* The structure for passing out data via the pcre_callout_function. We use a -structure so that new fields can be added on the end in future versions, -without changing the API of the function, thereby allowing old clients to work -without modification. */ - -typedef struct pcre_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const unsigned char *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre_callout_block; - -/* Same structure as above, but with 16 bit char 
pointers. */ - -typedef struct pcre16_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR16 subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const PCRE_UCHAR16 *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre16_callout_block; - -/* Same structure as above, but with 32 bit char pointers. 
*/ - -typedef struct pcre32_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR32 subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const PCRE_UCHAR32 *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre32_callout_block; - -/* Indirection for store get and free functions. These can be set to -alternative malloc/free functions if required. Special ones are used in the -non-recursive case for "frames". There is also an optional callout function -that is triggered by the (?) regex item. For Virtual Pascal, these definitions -have to take another form. 
*/ - -#ifndef VPCOMPAT -PCRE_EXP_DECL void *(*pcre_malloc)(size_t); -PCRE_EXP_DECL void (*pcre_free)(void *); -PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre_stack_free)(void *); -PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *); -PCRE_EXP_DECL int (*pcre_stack_guard)(void); - -PCRE_EXP_DECL void *(*pcre16_malloc)(size_t); -PCRE_EXP_DECL void (*pcre16_free)(void *); -PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre16_stack_free)(void *); -PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *); -PCRE_EXP_DECL int (*pcre16_stack_guard)(void); - -PCRE_EXP_DECL void *(*pcre32_malloc)(size_t); -PCRE_EXP_DECL void (*pcre32_free)(void *); -PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre32_stack_free)(void *); -PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *); -PCRE_EXP_DECL int (*pcre32_stack_guard)(void); -#else /* VPCOMPAT */ -PCRE_EXP_DECL void *pcre_malloc(size_t); -PCRE_EXP_DECL void pcre_free(void *); -PCRE_EXP_DECL void *pcre_stack_malloc(size_t); -PCRE_EXP_DECL void pcre_stack_free(void *); -PCRE_EXP_DECL int pcre_callout(pcre_callout_block *); -PCRE_EXP_DECL int pcre_stack_guard(void); - -PCRE_EXP_DECL void *pcre16_malloc(size_t); -PCRE_EXP_DECL void pcre16_free(void *); -PCRE_EXP_DECL void *pcre16_stack_malloc(size_t); -PCRE_EXP_DECL void pcre16_stack_free(void *); -PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *); -PCRE_EXP_DECL int pcre16_stack_guard(void); - -PCRE_EXP_DECL void *pcre32_malloc(size_t); -PCRE_EXP_DECL void pcre32_free(void *); -PCRE_EXP_DECL void *pcre32_stack_malloc(size_t); -PCRE_EXP_DECL void pcre32_stack_free(void *); -PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *); -PCRE_EXP_DECL int pcre32_stack_guard(void); -#endif /* VPCOMPAT */ - -/* User defined callback which provides a stack just before the match starts. 
*/ - -typedef pcre_jit_stack *(*pcre_jit_callback)(void *); -typedef pcre16_jit_stack *(*pcre16_jit_callback)(void *); -typedef pcre32_jit_stack *(*pcre32_jit_callback)(void *); - -/* Exported PCRE functions */ - -PCRE_EXP_DECL pcre *pcre_compile(const char *, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre16 *pcre16_compile(PCRE_SPTR16, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre32 *pcre32_compile(PCRE_SPTR32, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre *pcre_compile2(const char *, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL pcre16 *pcre16_compile2(PCRE_SPTR16, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL pcre32 *pcre32_compile2(PCRE_SPTR32, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL int pcre_config(int, void *); -PCRE_EXP_DECL int pcre16_config(int, void *); -PCRE_EXP_DECL int pcre32_config(int, void *); -PCRE_EXP_DECL int pcre_copy_named_substring(const pcre *, const char *, - int *, int, const char *, char *, int); -PCRE_EXP_DECL int pcre16_copy_named_substring(const pcre16 *, PCRE_SPTR16, - int *, int, PCRE_SPTR16, PCRE_UCHAR16 *, int); -PCRE_EXP_DECL int pcre32_copy_named_substring(const pcre32 *, PCRE_SPTR32, - int *, int, PCRE_SPTR32, PCRE_UCHAR32 *, int); -PCRE_EXP_DECL int pcre_copy_substring(const char *, int *, int, int, - char *, int); -PCRE_EXP_DECL int pcre16_copy_substring(PCRE_SPTR16, int *, int, int, - PCRE_UCHAR16 *, int); -PCRE_EXP_DECL int pcre32_copy_substring(PCRE_SPTR32, int *, int, int, - PCRE_UCHAR32 *, int); -PCRE_EXP_DECL int pcre_dfa_exec(const pcre *, const pcre_extra *, - const char *, int, int, int, int *, int , int *, int); -PCRE_EXP_DECL int pcre16_dfa_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int , int *, int); -PCRE_EXP_DECL int pcre32_dfa_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, 
int , int *, int); -PCRE_EXP_DECL int pcre_exec(const pcre *, const pcre_extra *, PCRE_SPTR, - int, int, int, int *, int); -PCRE_EXP_DECL int pcre16_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int); -PCRE_EXP_DECL int pcre32_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, int); -PCRE_EXP_DECL int pcre_jit_exec(const pcre *, const pcre_extra *, - PCRE_SPTR, int, int, int, int *, int, - pcre_jit_stack *); -PCRE_EXP_DECL int pcre16_jit_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int, - pcre16_jit_stack *); -PCRE_EXP_DECL int pcre32_jit_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, int, - pcre32_jit_stack *); -PCRE_EXP_DECL void pcre_free_substring(const char *); -PCRE_EXP_DECL void pcre16_free_substring(PCRE_SPTR16); -PCRE_EXP_DECL void pcre32_free_substring(PCRE_SPTR32); -PCRE_EXP_DECL void pcre_free_substring_list(const char **); -PCRE_EXP_DECL void pcre16_free_substring_list(PCRE_SPTR16 *); -PCRE_EXP_DECL void pcre32_free_substring_list(PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_fullinfo(const pcre *, const pcre_extra *, int, - void *); -PCRE_EXP_DECL int pcre16_fullinfo(const pcre16 *, const pcre16_extra *, int, - void *); -PCRE_EXP_DECL int pcre32_fullinfo(const pcre32 *, const pcre32_extra *, int, - void *); -PCRE_EXP_DECL int pcre_get_named_substring(const pcre *, const char *, - int *, int, const char *, const char **); -PCRE_EXP_DECL int pcre16_get_named_substring(const pcre16 *, PCRE_SPTR16, - int *, int, PCRE_SPTR16, PCRE_SPTR16 *); -PCRE_EXP_DECL int pcre32_get_named_substring(const pcre32 *, PCRE_SPTR32, - int *, int, PCRE_SPTR32, PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_get_stringnumber(const pcre *, const char *); -PCRE_EXP_DECL int pcre16_get_stringnumber(const pcre16 *, PCRE_SPTR16); -PCRE_EXP_DECL int pcre32_get_stringnumber(const pcre32 *, PCRE_SPTR32); -PCRE_EXP_DECL int pcre_get_stringtable_entries(const pcre *, 
const char *, - char **, char **); -PCRE_EXP_DECL int pcre16_get_stringtable_entries(const pcre16 *, PCRE_SPTR16, - PCRE_UCHAR16 **, PCRE_UCHAR16 **); -PCRE_EXP_DECL int pcre32_get_stringtable_entries(const pcre32 *, PCRE_SPTR32, - PCRE_UCHAR32 **, PCRE_UCHAR32 **); -PCRE_EXP_DECL int pcre_get_substring(const char *, int *, int, int, - const char **); -PCRE_EXP_DECL int pcre16_get_substring(PCRE_SPTR16, int *, int, int, - PCRE_SPTR16 *); -PCRE_EXP_DECL int pcre32_get_substring(PCRE_SPTR32, int *, int, int, - PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_get_substring_list(const char *, int *, int, - const char ***); -PCRE_EXP_DECL int pcre16_get_substring_list(PCRE_SPTR16, int *, int, - PCRE_SPTR16 **); -PCRE_EXP_DECL int pcre32_get_substring_list(PCRE_SPTR32, int *, int, - PCRE_SPTR32 **); -PCRE_EXP_DECL const unsigned char *pcre_maketables(void); -PCRE_EXP_DECL const unsigned char *pcre16_maketables(void); -PCRE_EXP_DECL const unsigned char *pcre32_maketables(void); -PCRE_EXP_DECL int pcre_refcount(pcre *, int); -PCRE_EXP_DECL int pcre16_refcount(pcre16 *, int); -PCRE_EXP_DECL int pcre32_refcount(pcre32 *, int); -PCRE_EXP_DECL pcre_extra *pcre_study(const pcre *, int, const char **); -PCRE_EXP_DECL pcre16_extra *pcre16_study(const pcre16 *, int, const char **); -PCRE_EXP_DECL pcre32_extra *pcre32_study(const pcre32 *, int, const char **); -PCRE_EXP_DECL void pcre_free_study(pcre_extra *); -PCRE_EXP_DECL void pcre16_free_study(pcre16_extra *); -PCRE_EXP_DECL void pcre32_free_study(pcre32_extra *); -PCRE_EXP_DECL const char *pcre_version(void); -PCRE_EXP_DECL const char *pcre16_version(void); -PCRE_EXP_DECL const char *pcre32_version(void); - -/* Utility functions for byte order swaps. 
*/ -PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *, pcre_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *, pcre16_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *, pcre32_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *, - PCRE_SPTR16, int, int *, int); -PCRE_EXP_DECL int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *, - PCRE_SPTR32, int, int *, int); - -/* JIT compiler related functions. */ - -PCRE_EXP_DECL pcre_jit_stack *pcre_jit_stack_alloc(int, int); -PCRE_EXP_DECL pcre16_jit_stack *pcre16_jit_stack_alloc(int, int); -PCRE_EXP_DECL pcre32_jit_stack *pcre32_jit_stack_alloc(int, int); -PCRE_EXP_DECL void pcre_jit_stack_free(pcre_jit_stack *); -PCRE_EXP_DECL void pcre16_jit_stack_free(pcre16_jit_stack *); -PCRE_EXP_DECL void pcre32_jit_stack_free(pcre32_jit_stack *); -PCRE_EXP_DECL void pcre_assign_jit_stack(pcre_extra *, - pcre_jit_callback, void *); -PCRE_EXP_DECL void pcre16_assign_jit_stack(pcre16_extra *, - pcre16_jit_callback, void *); -PCRE_EXP_DECL void pcre32_assign_jit_stack(pcre32_extra *, - pcre32_jit_callback, void *); -PCRE_EXP_DECL void pcre_jit_free_unused_memory(void); -PCRE_EXP_DECL void pcre16_jit_free_unused_memory(void); -PCRE_EXP_DECL void pcre32_jit_free_unused_memory(void); - -#ifdef __cplusplus -} /* extern "C" */ -#endif - -#endif /* End of pcre.h */ diff --git a/src/pcre/pcre.h.in b/src/pcre/pcre.h.in deleted file mode 100644 index d4d78926..00000000 --- a/src/pcre/pcre.h.in +++ /dev/null @@ -1,677 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* This is the public header file for the PCRE library, to be #included by -applications that call the PCRE functions. 
- - Copyright (c) 1997-2014 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -#ifndef _PCRE_H -#define _PCRE_H - -/* The current PCRE version information. 
*/ - -#define PCRE_MAJOR @PCRE_MAJOR@ -#define PCRE_MINOR @PCRE_MINOR@ -#define PCRE_PRERELEASE @PCRE_PRERELEASE@ -#define PCRE_DATE @PCRE_DATE@ - -/* When an application links to a PCRE DLL in Windows, the symbols that are -imported have to be identified as such. When building PCRE, the appropriate -export setting is defined in pcre_internal.h, which includes this file. So we -don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL. */ - -#if defined(_WIN32) && !defined(PCRE_STATIC) -# ifndef PCRE_EXP_DECL -# define PCRE_EXP_DECL extern __declspec(dllimport) -# endif -# ifdef __cplusplus -# ifndef PCRECPP_EXP_DECL -# define PCRECPP_EXP_DECL extern __declspec(dllimport) -# endif -# ifndef PCRECPP_EXP_DEFN -# define PCRECPP_EXP_DEFN __declspec(dllimport) -# endif -# endif -#endif - -/* By default, we use the standard "extern" declarations. */ - -#ifndef PCRE_EXP_DECL -# ifdef __cplusplus -# define PCRE_EXP_DECL extern "C" -# else -# define PCRE_EXP_DECL extern -# endif -#endif - -#ifdef __cplusplus -# ifndef PCRECPP_EXP_DECL -# define PCRECPP_EXP_DECL extern -# endif -# ifndef PCRECPP_EXP_DEFN -# define PCRECPP_EXP_DEFN -# endif -#endif - -/* Have to include stdlib.h in order to ensure that size_t is defined; -it is needed here for malloc. */ - -#include <stdlib.h> - -/* Allow for C++ users */ - -#ifdef __cplusplus -extern "C" { -#endif - -/* Public options. Some are compile-time only, some are run-time only, and some -are both. Most of the compile-time options are saved with the compiled regex so -that they can be inspected during studying (and therefore JIT compiling). Note -that pcre_study() has its own set of options. Originally, all the options -defined here used distinct bits. However, almost all the bits in a 32-bit word -are now used, so in order to conserve them, option bits that were previously -only recognized at matching time (i.e. 
by pcre_exec() or pcre_dfa_exec()) may -also be used for compile-time options that affect only compiling and are not -relevant for studying or JIT compiling. - -Some options for pcre_compile() change its behaviour but do not affect the -behaviour of the execution functions. Other options are passed through to the -execution functions and affect their behaviour, with or without affecting the -behaviour of pcre_compile(). - -Options that can be passed to pcre_compile() are tagged Cx below, with these -variants: - -C1 Affects compile only -C2 Does not affect compile; affects exec, dfa_exec -C3 Affects compile, exec, dfa_exec -C4 Affects compile, exec, dfa_exec, study -C5 Affects compile, exec, study - -Options that can be set for pcre_exec() and/or pcre_dfa_exec() are flagged with -E and D, respectively. They take precedence over C3, C4, and C5 settings passed -from pcre_compile(). Those that are compatible with JIT execution are flagged -with J. */ - -#define PCRE_CASELESS 0x00000001 /* C1 */ -#define PCRE_MULTILINE 0x00000002 /* C1 */ -#define PCRE_DOTALL 0x00000004 /* C1 */ -#define PCRE_EXTENDED 0x00000008 /* C1 */ -#define PCRE_ANCHORED 0x00000010 /* C4 E D */ -#define PCRE_DOLLAR_ENDONLY 0x00000020 /* C2 */ -#define PCRE_EXTRA 0x00000040 /* C1 */ -#define PCRE_NOTBOL 0x00000080 /* E D J */ -#define PCRE_NOTEOL 0x00000100 /* E D J */ -#define PCRE_UNGREEDY 0x00000200 /* C1 */ -#define PCRE_NOTEMPTY 0x00000400 /* E D J */ -#define PCRE_UTF8 0x00000800 /* C4 ) */ -#define PCRE_UTF16 0x00000800 /* C4 ) Synonyms */ -#define PCRE_UTF32 0x00000800 /* C4 ) */ -#define PCRE_NO_AUTO_CAPTURE 0x00001000 /* C1 */ -#define PCRE_NO_UTF8_CHECK 0x00002000 /* C1 E D J ) */ -#define PCRE_NO_UTF16_CHECK 0x00002000 /* C1 E D J ) Synonyms */ -#define PCRE_NO_UTF32_CHECK 0x00002000 /* C1 E D J ) */ -#define PCRE_AUTO_CALLOUT 0x00004000 /* C1 */ -#define PCRE_PARTIAL_SOFT 0x00008000 /* E D J ) Synonyms */ -#define PCRE_PARTIAL 0x00008000 /* E D J ) */ - -/* This pair use the same bit. 
*/ -#define PCRE_NEVER_UTF 0x00010000 /* C1 ) Overlaid */ -#define PCRE_DFA_SHORTEST 0x00010000 /* D ) Overlaid */ - -/* This pair use the same bit. */ -#define PCRE_NO_AUTO_POSSESS 0x00020000 /* C1 ) Overlaid */ -#define PCRE_DFA_RESTART 0x00020000 /* D ) Overlaid */ - -#define PCRE_FIRSTLINE 0x00040000 /* C3 */ -#define PCRE_DUPNAMES 0x00080000 /* C1 */ -#define PCRE_NEWLINE_CR 0x00100000 /* C3 E D */ -#define PCRE_NEWLINE_LF 0x00200000 /* C3 E D */ -#define PCRE_NEWLINE_CRLF 0x00300000 /* C3 E D */ -#define PCRE_NEWLINE_ANY 0x00400000 /* C3 E D */ -#define PCRE_NEWLINE_ANYCRLF 0x00500000 /* C3 E D */ -#define PCRE_BSR_ANYCRLF 0x00800000 /* C3 E D */ -#define PCRE_BSR_UNICODE 0x01000000 /* C3 E D */ -#define PCRE_JAVASCRIPT_COMPAT 0x02000000 /* C5 */ -#define PCRE_NO_START_OPTIMIZE 0x04000000 /* C2 E D ) Synonyms */ -#define PCRE_NO_START_OPTIMISE 0x04000000 /* C2 E D ) */ -#define PCRE_PARTIAL_HARD 0x08000000 /* E D J */ -#define PCRE_NOTEMPTY_ATSTART 0x10000000 /* E D J */ -#define PCRE_UCP 0x20000000 /* C3 */ - -/* Exec-time and get/set-time error codes */ - -#define PCRE_ERROR_NOMATCH (-1) -#define PCRE_ERROR_NULL (-2) -#define PCRE_ERROR_BADOPTION (-3) -#define PCRE_ERROR_BADMAGIC (-4) -#define PCRE_ERROR_UNKNOWN_OPCODE (-5) -#define PCRE_ERROR_UNKNOWN_NODE (-5) /* For backward compatibility */ -#define PCRE_ERROR_NOMEMORY (-6) -#define PCRE_ERROR_NOSUBSTRING (-7) -#define PCRE_ERROR_MATCHLIMIT (-8) -#define PCRE_ERROR_CALLOUT (-9) /* Never used by PCRE itself */ -#define PCRE_ERROR_BADUTF8 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF16 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF32 (-10) /* Same for 8/16/32 */ -#define PCRE_ERROR_BADUTF8_OFFSET (-11) /* Same for 8/16 */ -#define PCRE_ERROR_BADUTF16_OFFSET (-11) /* Same for 8/16 */ -#define PCRE_ERROR_PARTIAL (-12) -#define PCRE_ERROR_BADPARTIAL (-13) -#define PCRE_ERROR_INTERNAL (-14) -#define PCRE_ERROR_BADCOUNT (-15) -#define PCRE_ERROR_DFA_UITEM (-16) -#define PCRE_ERROR_DFA_UCOND 
(-17) -#define PCRE_ERROR_DFA_UMLIMIT (-18) -#define PCRE_ERROR_DFA_WSSIZE (-19) -#define PCRE_ERROR_DFA_RECURSE (-20) -#define PCRE_ERROR_RECURSIONLIMIT (-21) -#define PCRE_ERROR_NULLWSLIMIT (-22) /* No longer actually used */ -#define PCRE_ERROR_BADNEWLINE (-23) -#define PCRE_ERROR_BADOFFSET (-24) -#define PCRE_ERROR_SHORTUTF8 (-25) -#define PCRE_ERROR_SHORTUTF16 (-25) /* Same for 8/16 */ -#define PCRE_ERROR_RECURSELOOP (-26) -#define PCRE_ERROR_JIT_STACKLIMIT (-27) -#define PCRE_ERROR_BADMODE (-28) -#define PCRE_ERROR_BADENDIANNESS (-29) -#define PCRE_ERROR_DFA_BADRESTART (-30) -#define PCRE_ERROR_JIT_BADOPTION (-31) -#define PCRE_ERROR_BADLENGTH (-32) -#define PCRE_ERROR_UNSET (-33) - -/* Specific error codes for UTF-8 validity checks */ - -#define PCRE_UTF8_ERR0 0 -#define PCRE_UTF8_ERR1 1 -#define PCRE_UTF8_ERR2 2 -#define PCRE_UTF8_ERR3 3 -#define PCRE_UTF8_ERR4 4 -#define PCRE_UTF8_ERR5 5 -#define PCRE_UTF8_ERR6 6 -#define PCRE_UTF8_ERR7 7 -#define PCRE_UTF8_ERR8 8 -#define PCRE_UTF8_ERR9 9 -#define PCRE_UTF8_ERR10 10 -#define PCRE_UTF8_ERR11 11 -#define PCRE_UTF8_ERR12 12 -#define PCRE_UTF8_ERR13 13 -#define PCRE_UTF8_ERR14 14 -#define PCRE_UTF8_ERR15 15 -#define PCRE_UTF8_ERR16 16 -#define PCRE_UTF8_ERR17 17 -#define PCRE_UTF8_ERR18 18 -#define PCRE_UTF8_ERR19 19 -#define PCRE_UTF8_ERR20 20 -#define PCRE_UTF8_ERR21 21 -#define PCRE_UTF8_ERR22 22 /* Unused (was non-character) */ - -/* Specific error codes for UTF-16 validity checks */ - -#define PCRE_UTF16_ERR0 0 -#define PCRE_UTF16_ERR1 1 -#define PCRE_UTF16_ERR2 2 -#define PCRE_UTF16_ERR3 3 -#define PCRE_UTF16_ERR4 4 /* Unused (was non-character) */ - -/* Specific error codes for UTF-32 validity checks */ - -#define PCRE_UTF32_ERR0 0 -#define PCRE_UTF32_ERR1 1 -#define PCRE_UTF32_ERR2 2 /* Unused (was non-character) */ -#define PCRE_UTF32_ERR3 3 - -/* Request types for pcre_fullinfo() */ - -#define PCRE_INFO_OPTIONS 0 -#define PCRE_INFO_SIZE 1 -#define PCRE_INFO_CAPTURECOUNT 2 -#define 
PCRE_INFO_BACKREFMAX 3 -#define PCRE_INFO_FIRSTBYTE 4 -#define PCRE_INFO_FIRSTCHAR 4 /* For backwards compatibility */ -#define PCRE_INFO_FIRSTTABLE 5 -#define PCRE_INFO_LASTLITERAL 6 -#define PCRE_INFO_NAMEENTRYSIZE 7 -#define PCRE_INFO_NAMECOUNT 8 -#define PCRE_INFO_NAMETABLE 9 -#define PCRE_INFO_STUDYSIZE 10 -#define PCRE_INFO_DEFAULT_TABLES 11 -#define PCRE_INFO_OKPARTIAL 12 -#define PCRE_INFO_JCHANGED 13 -#define PCRE_INFO_HASCRORLF 14 -#define PCRE_INFO_MINLENGTH 15 -#define PCRE_INFO_JIT 16 -#define PCRE_INFO_JITSIZE 17 -#define PCRE_INFO_MAXLOOKBEHIND 18 -#define PCRE_INFO_FIRSTCHARACTER 19 -#define PCRE_INFO_FIRSTCHARACTERFLAGS 20 -#define PCRE_INFO_REQUIREDCHAR 21 -#define PCRE_INFO_REQUIREDCHARFLAGS 22 -#define PCRE_INFO_MATCHLIMIT 23 -#define PCRE_INFO_RECURSIONLIMIT 24 -#define PCRE_INFO_MATCH_EMPTY 25 - -/* Request types for pcre_config(). Do not re-arrange, in order to remain -compatible. */ - -#define PCRE_CONFIG_UTF8 0 -#define PCRE_CONFIG_NEWLINE 1 -#define PCRE_CONFIG_LINK_SIZE 2 -#define PCRE_CONFIG_POSIX_MALLOC_THRESHOLD 3 -#define PCRE_CONFIG_MATCH_LIMIT 4 -#define PCRE_CONFIG_STACKRECURSE 5 -#define PCRE_CONFIG_UNICODE_PROPERTIES 6 -#define PCRE_CONFIG_MATCH_LIMIT_RECURSION 7 -#define PCRE_CONFIG_BSR 8 -#define PCRE_CONFIG_JIT 9 -#define PCRE_CONFIG_UTF16 10 -#define PCRE_CONFIG_JITTARGET 11 -#define PCRE_CONFIG_UTF32 12 -#define PCRE_CONFIG_PARENS_LIMIT 13 - -/* Request types for pcre_study(). Do not re-arrange, in order to remain -compatible. */ - -#define PCRE_STUDY_JIT_COMPILE 0x0001 -#define PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE 0x0002 -#define PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE 0x0004 -#define PCRE_STUDY_EXTRA_NEEDED 0x0008 - -/* Bit flags for the pcre[16|32]_extra structure. Do not re-arrange or redefine -these bits, just add new ones on the end, in order to remain compatible. 
*/ - -#define PCRE_EXTRA_STUDY_DATA 0x0001 -#define PCRE_EXTRA_MATCH_LIMIT 0x0002 -#define PCRE_EXTRA_CALLOUT_DATA 0x0004 -#define PCRE_EXTRA_TABLES 0x0008 -#define PCRE_EXTRA_MATCH_LIMIT_RECURSION 0x0010 -#define PCRE_EXTRA_MARK 0x0020 -#define PCRE_EXTRA_EXECUTABLE_JIT 0x0040 - -/* Types */ - -struct real_pcre8_or_16; /* declaration; the definition is private */ -typedef struct real_pcre8_or_16 pcre; - -struct real_pcre8_or_16; /* declaration; the definition is private */ -typedef struct real_pcre8_or_16 pcre16; - -struct real_pcre32; /* declaration; the definition is private */ -typedef struct real_pcre32 pcre32; - -struct real_pcre_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre_jit_stack pcre_jit_stack; - -struct real_pcre16_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre16_jit_stack pcre16_jit_stack; - -struct real_pcre32_jit_stack; /* declaration; the definition is private */ -typedef struct real_pcre32_jit_stack pcre32_jit_stack; - -/* If PCRE is compiled with 16 bit character support, PCRE_UCHAR16 must contain -a 16 bit wide signed data type. Otherwise it can be a dummy data type since -pcre16 functions are not implemented. There is a check for this in pcre_internal.h. */ -#ifndef PCRE_UCHAR16 -#define PCRE_UCHAR16 unsigned short -#endif - -#ifndef PCRE_SPTR16 -#define PCRE_SPTR16 const PCRE_UCHAR16 * -#endif - -/* If PCRE is compiled with 32 bit character support, PCRE_UCHAR32 must contain -a 32 bit wide signed data type. Otherwise it can be a dummy data type since -pcre32 functions are not implemented. There is a check for this in pcre_internal.h. */ -#ifndef PCRE_UCHAR32 -#define PCRE_UCHAR32 unsigned int -#endif - -#ifndef PCRE_SPTR32 -#define PCRE_SPTR32 const PCRE_UCHAR32 * -#endif - -/* When PCRE is compiled as a C++ library, the subject pointer type can be -replaced with a custom type. For conventional use, the public interface is a -const char *. 
*/ - -#ifndef PCRE_SPTR -#define PCRE_SPTR const char * -#endif - -/* The structure for passing additional data to pcre_exec(). This is defined in -such as way as to be extensible. Always add new fields at the end, in order to -remain compatible. */ - -typedef struct pcre_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - unsigned char **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre_extra; - -/* Same structure as above, but with 16 bit char pointers. */ - -typedef struct pcre16_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - PCRE_UCHAR16 **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre16_extra; - -/* Same structure as above, but with 32 bit char pointers. 
*/ - -typedef struct pcre32_extra { - unsigned long int flags; /* Bits for which fields are set */ - void *study_data; /* Opaque data from pcre_study() */ - unsigned long int match_limit; /* Maximum number of calls to match() */ - void *callout_data; /* Data passed back in callouts */ - const unsigned char *tables; /* Pointer to character tables */ - unsigned long int match_limit_recursion; /* Max recursive calls to match() */ - PCRE_UCHAR32 **mark; /* For passing back a mark pointer */ - void *executable_jit; /* Contains a pointer to a compiled jit code */ -} pcre32_extra; - -/* The structure for passing out data via the pcre_callout_function. We use a -structure so that new fields can be added on the end in future versions, -without changing the API of the function, thereby allowing old clients to work -without modification. */ - -typedef struct pcre_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const unsigned char *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre_callout_block; - -/* Same structure as above, but with 16 bit char 
pointers. */ - -typedef struct pcre16_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR16 subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const PCRE_UCHAR16 *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre16_callout_block; - -/* Same structure as above, but with 32 bit char pointers. 
*/ - -typedef struct pcre32_callout_block { - int version; /* Identifies version of block */ - /* ------------------------ Version 0 ------------------------------- */ - int callout_number; /* Number compiled into pattern */ - int *offset_vector; /* The offset vector */ - PCRE_SPTR32 subject; /* The subject being matched */ - int subject_length; /* The length of the subject */ - int start_match; /* Offset to start of this match attempt */ - int current_position; /* Where we currently are in the subject */ - int capture_top; /* Max current capture */ - int capture_last; /* Most recently closed capture */ - void *callout_data; /* Data passed in with the call */ - /* ------------------- Added for Version 1 -------------------------- */ - int pattern_position; /* Offset to next item in the pattern */ - int next_item_length; /* Length of next item in the pattern */ - /* ------------------- Added for Version 2 -------------------------- */ - const PCRE_UCHAR32 *mark; /* Pointer to current mark or NULL */ - /* ------------------------------------------------------------------ */ -} pcre32_callout_block; - -/* Indirection for store get and free functions. These can be set to -alternative malloc/free functions if required. Special ones are used in the -non-recursive case for "frames". There is also an optional callout function -that is triggered by the (?) regex item. For Virtual Pascal, these definitions -have to take another form. 
*/ - -#ifndef VPCOMPAT -PCRE_EXP_DECL void *(*pcre_malloc)(size_t); -PCRE_EXP_DECL void (*pcre_free)(void *); -PCRE_EXP_DECL void *(*pcre_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre_stack_free)(void *); -PCRE_EXP_DECL int (*pcre_callout)(pcre_callout_block *); -PCRE_EXP_DECL int (*pcre_stack_guard)(void); - -PCRE_EXP_DECL void *(*pcre16_malloc)(size_t); -PCRE_EXP_DECL void (*pcre16_free)(void *); -PCRE_EXP_DECL void *(*pcre16_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre16_stack_free)(void *); -PCRE_EXP_DECL int (*pcre16_callout)(pcre16_callout_block *); -PCRE_EXP_DECL int (*pcre16_stack_guard)(void); - -PCRE_EXP_DECL void *(*pcre32_malloc)(size_t); -PCRE_EXP_DECL void (*pcre32_free)(void *); -PCRE_EXP_DECL void *(*pcre32_stack_malloc)(size_t); -PCRE_EXP_DECL void (*pcre32_stack_free)(void *); -PCRE_EXP_DECL int (*pcre32_callout)(pcre32_callout_block *); -PCRE_EXP_DECL int (*pcre32_stack_guard)(void); -#else /* VPCOMPAT */ -PCRE_EXP_DECL void *pcre_malloc(size_t); -PCRE_EXP_DECL void pcre_free(void *); -PCRE_EXP_DECL void *pcre_stack_malloc(size_t); -PCRE_EXP_DECL void pcre_stack_free(void *); -PCRE_EXP_DECL int pcre_callout(pcre_callout_block *); -PCRE_EXP_DECL int pcre_stack_guard(void); - -PCRE_EXP_DECL void *pcre16_malloc(size_t); -PCRE_EXP_DECL void pcre16_free(void *); -PCRE_EXP_DECL void *pcre16_stack_malloc(size_t); -PCRE_EXP_DECL void pcre16_stack_free(void *); -PCRE_EXP_DECL int pcre16_callout(pcre16_callout_block *); -PCRE_EXP_DECL int pcre16_stack_guard(void); - -PCRE_EXP_DECL void *pcre32_malloc(size_t); -PCRE_EXP_DECL void pcre32_free(void *); -PCRE_EXP_DECL void *pcre32_stack_malloc(size_t); -PCRE_EXP_DECL void pcre32_stack_free(void *); -PCRE_EXP_DECL int pcre32_callout(pcre32_callout_block *); -PCRE_EXP_DECL int pcre32_stack_guard(void); -#endif /* VPCOMPAT */ - -/* User defined callback which provides a stack just before the match starts. 
*/ - -typedef pcre_jit_stack *(*pcre_jit_callback)(void *); -typedef pcre16_jit_stack *(*pcre16_jit_callback)(void *); -typedef pcre32_jit_stack *(*pcre32_jit_callback)(void *); - -/* Exported PCRE functions */ - -PCRE_EXP_DECL pcre *pcre_compile(const char *, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre16 *pcre16_compile(PCRE_SPTR16, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre32 *pcre32_compile(PCRE_SPTR32, int, const char **, int *, - const unsigned char *); -PCRE_EXP_DECL pcre *pcre_compile2(const char *, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL pcre16 *pcre16_compile2(PCRE_SPTR16, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL pcre32 *pcre32_compile2(PCRE_SPTR32, int, int *, const char **, - int *, const unsigned char *); -PCRE_EXP_DECL int pcre_config(int, void *); -PCRE_EXP_DECL int pcre16_config(int, void *); -PCRE_EXP_DECL int pcre32_config(int, void *); -PCRE_EXP_DECL int pcre_copy_named_substring(const pcre *, const char *, - int *, int, const char *, char *, int); -PCRE_EXP_DECL int pcre16_copy_named_substring(const pcre16 *, PCRE_SPTR16, - int *, int, PCRE_SPTR16, PCRE_UCHAR16 *, int); -PCRE_EXP_DECL int pcre32_copy_named_substring(const pcre32 *, PCRE_SPTR32, - int *, int, PCRE_SPTR32, PCRE_UCHAR32 *, int); -PCRE_EXP_DECL int pcre_copy_substring(const char *, int *, int, int, - char *, int); -PCRE_EXP_DECL int pcre16_copy_substring(PCRE_SPTR16, int *, int, int, - PCRE_UCHAR16 *, int); -PCRE_EXP_DECL int pcre32_copy_substring(PCRE_SPTR32, int *, int, int, - PCRE_UCHAR32 *, int); -PCRE_EXP_DECL int pcre_dfa_exec(const pcre *, const pcre_extra *, - const char *, int, int, int, int *, int , int *, int); -PCRE_EXP_DECL int pcre16_dfa_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int , int *, int); -PCRE_EXP_DECL int pcre32_dfa_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, 
int , int *, int); -PCRE_EXP_DECL int pcre_exec(const pcre *, const pcre_extra *, PCRE_SPTR, - int, int, int, int *, int); -PCRE_EXP_DECL int pcre16_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int); -PCRE_EXP_DECL int pcre32_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, int); -PCRE_EXP_DECL int pcre_jit_exec(const pcre *, const pcre_extra *, - PCRE_SPTR, int, int, int, int *, int, - pcre_jit_stack *); -PCRE_EXP_DECL int pcre16_jit_exec(const pcre16 *, const pcre16_extra *, - PCRE_SPTR16, int, int, int, int *, int, - pcre16_jit_stack *); -PCRE_EXP_DECL int pcre32_jit_exec(const pcre32 *, const pcre32_extra *, - PCRE_SPTR32, int, int, int, int *, int, - pcre32_jit_stack *); -PCRE_EXP_DECL void pcre_free_substring(const char *); -PCRE_EXP_DECL void pcre16_free_substring(PCRE_SPTR16); -PCRE_EXP_DECL void pcre32_free_substring(PCRE_SPTR32); -PCRE_EXP_DECL void pcre_free_substring_list(const char **); -PCRE_EXP_DECL void pcre16_free_substring_list(PCRE_SPTR16 *); -PCRE_EXP_DECL void pcre32_free_substring_list(PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_fullinfo(const pcre *, const pcre_extra *, int, - void *); -PCRE_EXP_DECL int pcre16_fullinfo(const pcre16 *, const pcre16_extra *, int, - void *); -PCRE_EXP_DECL int pcre32_fullinfo(const pcre32 *, const pcre32_extra *, int, - void *); -PCRE_EXP_DECL int pcre_get_named_substring(const pcre *, const char *, - int *, int, const char *, const char **); -PCRE_EXP_DECL int pcre16_get_named_substring(const pcre16 *, PCRE_SPTR16, - int *, int, PCRE_SPTR16, PCRE_SPTR16 *); -PCRE_EXP_DECL int pcre32_get_named_substring(const pcre32 *, PCRE_SPTR32, - int *, int, PCRE_SPTR32, PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_get_stringnumber(const pcre *, const char *); -PCRE_EXP_DECL int pcre16_get_stringnumber(const pcre16 *, PCRE_SPTR16); -PCRE_EXP_DECL int pcre32_get_stringnumber(const pcre32 *, PCRE_SPTR32); -PCRE_EXP_DECL int pcre_get_stringtable_entries(const pcre *, 
const char *, - char **, char **); -PCRE_EXP_DECL int pcre16_get_stringtable_entries(const pcre16 *, PCRE_SPTR16, - PCRE_UCHAR16 **, PCRE_UCHAR16 **); -PCRE_EXP_DECL int pcre32_get_stringtable_entries(const pcre32 *, PCRE_SPTR32, - PCRE_UCHAR32 **, PCRE_UCHAR32 **); -PCRE_EXP_DECL int pcre_get_substring(const char *, int *, int, int, - const char **); -PCRE_EXP_DECL int pcre16_get_substring(PCRE_SPTR16, int *, int, int, - PCRE_SPTR16 *); -PCRE_EXP_DECL int pcre32_get_substring(PCRE_SPTR32, int *, int, int, - PCRE_SPTR32 *); -PCRE_EXP_DECL int pcre_get_substring_list(const char *, int *, int, - const char ***); -PCRE_EXP_DECL int pcre16_get_substring_list(PCRE_SPTR16, int *, int, - PCRE_SPTR16 **); -PCRE_EXP_DECL int pcre32_get_substring_list(PCRE_SPTR32, int *, int, - PCRE_SPTR32 **); -PCRE_EXP_DECL const unsigned char *pcre_maketables(void); -PCRE_EXP_DECL const unsigned char *pcre16_maketables(void); -PCRE_EXP_DECL const unsigned char *pcre32_maketables(void); -PCRE_EXP_DECL int pcre_refcount(pcre *, int); -PCRE_EXP_DECL int pcre16_refcount(pcre16 *, int); -PCRE_EXP_DECL int pcre32_refcount(pcre32 *, int); -PCRE_EXP_DECL pcre_extra *pcre_study(const pcre *, int, const char **); -PCRE_EXP_DECL pcre16_extra *pcre16_study(const pcre16 *, int, const char **); -PCRE_EXP_DECL pcre32_extra *pcre32_study(const pcre32 *, int, const char **); -PCRE_EXP_DECL void pcre_free_study(pcre_extra *); -PCRE_EXP_DECL void pcre16_free_study(pcre16_extra *); -PCRE_EXP_DECL void pcre32_free_study(pcre32_extra *); -PCRE_EXP_DECL const char *pcre_version(void); -PCRE_EXP_DECL const char *pcre16_version(void); -PCRE_EXP_DECL const char *pcre32_version(void); - -/* Utility functions for byte order swaps. 
*/ -PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *, pcre_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *, pcre16_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *, pcre32_extra *, - const unsigned char *); -PCRE_EXP_DECL int pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *, - PCRE_SPTR16, int, int *, int); -PCRE_EXP_DECL int pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *, - PCRE_SPTR32, int, int *, int); - -/* JIT compiler related functions. */ - -PCRE_EXP_DECL pcre_jit_stack *pcre_jit_stack_alloc(int, int); -PCRE_EXP_DECL pcre16_jit_stack *pcre16_jit_stack_alloc(int, int); -PCRE_EXP_DECL pcre32_jit_stack *pcre32_jit_stack_alloc(int, int); -PCRE_EXP_DECL void pcre_jit_stack_free(pcre_jit_stack *); -PCRE_EXP_DECL void pcre16_jit_stack_free(pcre16_jit_stack *); -PCRE_EXP_DECL void pcre32_jit_stack_free(pcre32_jit_stack *); -PCRE_EXP_DECL void pcre_assign_jit_stack(pcre_extra *, - pcre_jit_callback, void *); -PCRE_EXP_DECL void pcre16_assign_jit_stack(pcre16_extra *, - pcre16_jit_callback, void *); -PCRE_EXP_DECL void pcre32_assign_jit_stack(pcre32_extra *, - pcre32_jit_callback, void *); -PCRE_EXP_DECL void pcre_jit_free_unused_memory(void); -PCRE_EXP_DECL void pcre16_jit_free_unused_memory(void); -PCRE_EXP_DECL void pcre32_jit_free_unused_memory(void); - -#ifdef __cplusplus -} /* extern "C" */ -#endif - -#endif /* End of pcre.h */ diff --git a/src/pcre/pcre16_byte_order.c b/src/pcre/pcre16_byte_order.c deleted file mode 100644 index 11d2973a..00000000 --- a/src/pcre/pcre16_byte_order.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_byte_order.c" - -/* End of pcre16_byte_order.c */ diff --git a/src/pcre/pcre16_chartables.c b/src/pcre/pcre16_chartables.c deleted file mode 100644 index 7c0ff35f..00000000 --- a/src/pcre/pcre16_chartables.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_chartables.c" - -/* End of pcre16_chartables.c */ diff --git a/src/pcre/pcre16_compile.c b/src/pcre/pcre16_compile.c deleted file mode 100644 index e499b670..00000000 --- a/src/pcre/pcre16_compile.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_compile.c" - -/* End of pcre16_compile.c */ diff --git a/src/pcre/pcre16_config.c b/src/pcre/pcre16_config.c deleted file mode 100644 index b5213876..00000000 --- a/src/pcre/pcre16_config.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_config.c" - -/* End of pcre16_config.c */ diff --git a/src/pcre/pcre16_dfa_exec.c b/src/pcre/pcre16_dfa_exec.c deleted file mode 100644 index 2ba740e9..00000000 --- a/src/pcre/pcre16_dfa_exec.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_dfa_exec.c" - -/* End of pcre16_dfa_exec.c */ diff --git a/src/pcre/pcre16_exec.c b/src/pcre/pcre16_exec.c deleted file mode 100644 index 7417b177..00000000 --- a/src/pcre/pcre16_exec.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_exec.c" - -/* End of pcre16_exec.c */ diff --git a/src/pcre/pcre16_fullinfo.c b/src/pcre/pcre16_fullinfo.c deleted file mode 100644 index 544dca6e..00000000 --- a/src/pcre/pcre16_fullinfo.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_fullinfo.c" - -/* End of pcre16_fullinfo.c */ diff --git a/src/pcre/pcre16_get.c b/src/pcre/pcre16_get.c deleted file mode 100644 index 3ded08c6..00000000 --- a/src/pcre/pcre16_get.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_get.c" - -/* End of pcre16_get.c */ diff --git a/src/pcre/pcre16_globals.c b/src/pcre/pcre16_globals.c deleted file mode 100644 index a136b3d8..00000000 --- a/src/pcre/pcre16_globals.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_globals.c" - -/* End of pcre16_globals.c */ diff --git a/src/pcre/pcre16_jit_compile.c b/src/pcre/pcre16_jit_compile.c deleted file mode 100644 index ab0cacd7..00000000 --- a/src/pcre/pcre16_jit_compile.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_jit_compile.c" - -/* End of pcre16_jit_compile.c */ diff --git a/src/pcre/pcre16_maketables.c b/src/pcre/pcre16_maketables.c deleted file mode 100644 index b1cd1c57..00000000 --- a/src/pcre/pcre16_maketables.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_maketables.c" - -/* End of pcre16_maketables.c */ diff --git a/src/pcre/pcre16_newline.c b/src/pcre/pcre16_newline.c deleted file mode 100644 index 7fe20140..00000000 --- a/src/pcre/pcre16_newline.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_newline.c" - -/* End of pcre16_newline.c */ diff --git a/src/pcre/pcre16_ord2utf16.c b/src/pcre/pcre16_ord2utf16.c deleted file mode 100644 index 8e2ce5ea..00000000 --- a/src/pcre/pcre16_ord2utf16.c +++ /dev/null @@ -1,90 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This file contains a private PCRE function that converts an ordinal -character value into a UTF16 string. */ - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_internal.h" - -/************************************************* -* Convert character value to UTF-16 * -*************************************************/ - -/* This function takes an integer value in the range 0 - 0x10ffff -and encodes it as a UTF-16 character in 1 to 2 pcre_uchars. - -Arguments: - cvalue the character value - buffer pointer to buffer for result - at least 2 pcre_uchars long - -Returns: number of characters placed in the buffer -*/ - -unsigned int -PRIV(ord2utf)(pcre_uint32 cvalue, pcre_uchar *buffer) -{ -#ifdef SUPPORT_UTF - -if (cvalue <= 0xffff) - { - *buffer = (pcre_uchar)cvalue; - return 1; - } - -cvalue -= 0x10000; -*buffer++ = 0xd800 | (cvalue >> 10); -*buffer = 0xdc00 | (cvalue & 0x3ff); -return 2; - -#else /* SUPPORT_UTF */ -(void)(cvalue); /* Keep compiler happy; this function won't ever be */ -(void)(buffer); /* called when SUPPORT_UTF is not defined. */ -return 0; -#endif /* SUPPORT_UTF */ -} - -/* End of pcre16_ord2utf16.c */ diff --git a/src/pcre/pcre16_printint.c b/src/pcre/pcre16_printint.c deleted file mode 100644 index 33d8c340..00000000 --- a/src/pcre/pcre16_printint.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. 
- - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_printint.c" - -/* End of pcre16_printint.c */ diff --git a/src/pcre/pcre16_refcount.c b/src/pcre/pcre16_refcount.c deleted file mode 100644 index d3d15439..00000000 --- a/src/pcre/pcre16_refcount.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_refcount.c" - -/* End of pcre16_refcount.c */ diff --git a/src/pcre/pcre16_string_utils.c b/src/pcre/pcre16_string_utils.c deleted file mode 100644 index 382c4079..00000000 --- a/src/pcre/pcre16_string_utils.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_string_utils.c" - -/* End of pcre16_string_utils.c */ diff --git a/src/pcre/pcre16_study.c b/src/pcre/pcre16_study.c deleted file mode 100644 index f87de081..00000000 --- a/src/pcre/pcre16_study.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_study.c" - -/* End of pcre16_study.c */ diff --git a/src/pcre/pcre16_tables.c b/src/pcre/pcre16_tables.c deleted file mode 100644 index d8429709..00000000 --- a/src/pcre/pcre16_tables.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. 
*/ -#define COMPILE_PCRE16 - -#include "pcre_tables.c" - -/* End of pcre16_tables.c */ diff --git a/src/pcre/pcre16_ucd.c b/src/pcre/pcre16_ucd.c deleted file mode 100644 index ee23439a..00000000 --- a/src/pcre/pcre16_ucd.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_ucd.c" - -/* End of pcre16_ucd.c */ diff --git a/src/pcre/pcre16_utf16_utils.c b/src/pcre/pcre16_utf16_utils.c deleted file mode 100644 index 49ced0c0..00000000 --- a/src/pcre/pcre16_utf16_utils.c +++ /dev/null @@ -1,130 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains a function for converting any UTF-16 character -strings to host byte order. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_internal.h" - -/************************************************* -* Convert any UTF-16 string to host byte order * -*************************************************/ - -/* This function takes an UTF-16 string and converts -it to host byte order. The length can be explicitly set, -or automatically detected for zero terminated strings. -BOMs can be kept or discarded during the conversion. -Conversion can be done in place (output == input). 
- -Arguments: - output the output buffer, its size must be greater - or equal than the input string - input any UTF-16 string - length the number of 16-bit units in the input string - can be less than zero for zero terminated strings - host_byte_order - A non-zero value means the input is in host byte - order, which can be dynamically changed by BOMs later. - Initially it contains the starting byte order and returns - with the last byte order so it can be used for stream - processing. It can be NULL, which set the host byte - order mode by default. - keep_boms for a non-zero value, the BOM (0xfeff) characters - are copied as well - -Returns: the number of 16-bit units placed into the output buffer, - including the zero-terminator -*/ - -int -pcre16_utf16_to_host_byte_order(PCRE_UCHAR16 *output, PCRE_SPTR16 input, - int length, int *host_byte_order, int keep_boms) -{ -#ifdef SUPPORT_UTF -/* This function converts any UTF-16 string to host byte order and optionally -removes any Byte Order Marks (BOMS). Returns with the remainig length. */ -int host_bo = host_byte_order != NULL ? *host_byte_order : 1; -pcre_uchar *optr = (pcre_uchar *)output; -const pcre_uchar *iptr = (const pcre_uchar *)input; -const pcre_uchar *end; -/* The c variable must be unsigned. */ -register pcre_uchar c; - -if (length < 0) - length = STRLEN_UC(iptr) + 1; -end = iptr + length; - -while (iptr < end) - { - c = *iptr++; - if (c == 0xfeff || c == 0xfffe) - { - /* Detecting the byte order of the machine is unnecessary, it is - enough to know that the UTF-16 string has the same byte order or not. */ - host_bo = c == 0xfeff; - if (keep_boms != 0) - *optr++ = 0xfeff; - else - length--; - } - else - *optr++ = host_bo ? c : ((c >> 8) | (c << 8)); /* Flip bytes if needed. 
*/ - } -if (host_byte_order != NULL) - *host_byte_order = host_bo; - -#else /* Not SUPPORT_UTF */ -(void)(output); /* Keep picky compilers happy */ -(void)(input); -(void)(keep_boms); -(void)(host_byte_order); -#endif /* SUPPORT_UTF */ -return length; -} - -/* End of pcre16_utf16_utils.c */ diff --git a/src/pcre/pcre16_valid_utf16.c b/src/pcre/pcre16_valid_utf16.c deleted file mode 100644 index 09076539..00000000 --- a/src/pcre/pcre16_valid_utf16.c +++ /dev/null @@ -1,137 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2013 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains an internal function for validating UTF-16 character -strings. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_internal.h" - - -/************************************************* -* Validate a UTF-16 string * -*************************************************/ - -/* This function is called (optionally) at the start of compile or match, to -check that a supposed UTF-16 string is actually valid. The early check means -that subsequent code can assume it is dealing with a valid string. The check -can be turned off for maximum performance, but the consequences of supplying an -invalid string are then undefined. 
- -From release 8.21 more information about the details of the error are passed -back in the returned value: - -PCRE_UTF16_ERR0 No error -PCRE_UTF16_ERR1 Missing low surrogate at the end of the string -PCRE_UTF16_ERR2 Invalid low surrogate -PCRE_UTF16_ERR3 Isolated low surrogate -PCRE_UTF16_ERR4 Unused (was non-character) - -Arguments: - string points to the string - length length of string, or -1 if the string is zero-terminated - errp pointer to an error position offset variable - -Returns: = 0 if the string is a valid UTF-16 string - > 0 otherwise, setting the offset of the bad character -*/ - -int -PRIV(valid_utf)(PCRE_PUCHAR string, int length, int *erroroffset) -{ -#ifdef SUPPORT_UTF -register PCRE_PUCHAR p; -register pcre_uint32 c; - -if (length < 0) - { - for (p = string; *p != 0; p++); - length = p - string; - } - -for (p = string; length-- > 0; p++) - { - c = *p; - - if ((c & 0xf800) != 0xd800) - { - /* Normal UTF-16 code point. Neither high nor low surrogate. */ - } - else if ((c & 0x0400) == 0) - { - /* High surrogate. Must be a followed by a low surrogate. */ - if (length == 0) - { - *erroroffset = p - string; - return PCRE_UTF16_ERR1; - } - p++; - length--; - if ((*p & 0xfc00) != 0xdc00) - { - *erroroffset = p - string; - return PCRE_UTF16_ERR2; - } - } - else - { - /* Isolated low surrogate. Always an error. 
*/ - *erroroffset = p - string; - return PCRE_UTF16_ERR3; - } - } - -#else /* SUPPORT_UTF */ -(void)(string); /* Keep picky compilers happy */ -(void)(length); -(void)(erroroffset); -#endif /* SUPPORT_UTF */ - -return PCRE_UTF16_ERR0; /* This indicates success */ -} - -/* End of pcre16_valid_utf16.c */ diff --git a/src/pcre/pcre16_version.c b/src/pcre/pcre16_version.c deleted file mode 100644 index e991b1a8..00000000 --- a/src/pcre/pcre16_version.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_version.c" - -/* End of pcre16_version.c */ diff --git a/src/pcre/pcre16_xclass.c b/src/pcre/pcre16_xclass.c deleted file mode 100644 index 5aac2a36..00000000 --- a/src/pcre/pcre16_xclass.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 16 bit character support. */ -#define COMPILE_PCRE16 - -#include "pcre_xclass.c" - -/* End of pcre16_xclass.c */ diff --git a/src/pcre/pcre32_byte_order.c b/src/pcre/pcre32_byte_order.c deleted file mode 100644 index 9cf53627..00000000 --- a/src/pcre/pcre32_byte_order.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_byte_order.c" - -/* End of pcre32_byte_order.c */ diff --git a/src/pcre/pcre32_chartables.c b/src/pcre/pcre32_chartables.c deleted file mode 100644 index b5d8c23d..00000000 --- a/src/pcre/pcre32_chartables.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_chartables.c" - -/* End of pcre32_chartables.c */ diff --git a/src/pcre/pcre32_compile.c b/src/pcre/pcre32_compile.c deleted file mode 100644 index d781eb37..00000000 --- a/src/pcre/pcre32_compile.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_compile.c" - -/* End of pcre32_compile.c */ diff --git a/src/pcre/pcre32_config.c b/src/pcre/pcre32_config.c deleted file mode 100644 index d63f3e9e..00000000 --- a/src/pcre/pcre32_config.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_config.c" - -/* End of pcre32_config.c */ diff --git a/src/pcre/pcre32_dfa_exec.c b/src/pcre/pcre32_dfa_exec.c deleted file mode 100644 index b0bfd34f..00000000 --- a/src/pcre/pcre32_dfa_exec.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_dfa_exec.c" - -/* End of pcre32_dfa_exec.c */ diff --git a/src/pcre/pcre32_exec.c b/src/pcre/pcre32_exec.c deleted file mode 100644 index 8170ed77..00000000 --- a/src/pcre/pcre32_exec.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_exec.c" - -/* End of pcre32_exec.c */ diff --git a/src/pcre/pcre32_fullinfo.c b/src/pcre/pcre32_fullinfo.c deleted file mode 100644 index 6ecc5209..00000000 --- a/src/pcre/pcre32_fullinfo.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_fullinfo.c" - -/* End of pcre32_fullinfo.c */ diff --git a/src/pcre/pcre32_get.c b/src/pcre/pcre32_get.c deleted file mode 100644 index d35deee0..00000000 --- a/src/pcre/pcre32_get.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_get.c" - -/* End of pcre32_get.c */ diff --git a/src/pcre/pcre32_globals.c b/src/pcre/pcre32_globals.c deleted file mode 100644 index 32e0914c..00000000 --- a/src/pcre/pcre32_globals.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_globals.c" - -/* End of pcre32_globals.c */ diff --git a/src/pcre/pcre32_jit_compile.c b/src/pcre/pcre32_jit_compile.c deleted file mode 100644 index 2e7c6f97..00000000 --- a/src/pcre/pcre32_jit_compile.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_jit_compile.c" - -/* End of pcre32_jit_compile.c */ diff --git a/src/pcre/pcre32_maketables.c b/src/pcre/pcre32_maketables.c deleted file mode 100644 index 5d1b1c64..00000000 --- a/src/pcre/pcre32_maketables.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_maketables.c" - -/* End of pcre32_maketables.c */ diff --git a/src/pcre/pcre32_newline.c b/src/pcre/pcre32_newline.c deleted file mode 100644 index 7f8d5360..00000000 --- a/src/pcre/pcre32_newline.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_newline.c" - -/* End of pcre32_newline.c */ diff --git a/src/pcre/pcre32_ord2utf32.c b/src/pcre/pcre32_ord2utf32.c deleted file mode 100644 index 606bcb3d..00000000 --- a/src/pcre/pcre32_ord2utf32.c +++ /dev/null @@ -1,82 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This file contains a private PCRE function that converts an ordinal -character value into a UTF32 string. */ - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_internal.h" - -/************************************************* -* Convert character value to UTF-32 * -*************************************************/ - -/* This function takes an integer value in the range 0 - 0x10ffff -and encodes it as a UTF-32 character in 1 pcre_uchars. - -Arguments: - cvalue the character value - buffer pointer to buffer for result - at least 1 pcre_uchars long - -Returns: number of characters placed in the buffer -*/ - -unsigned int -PRIV(ord2utf)(pcre_uint32 cvalue, pcre_uchar *buffer) -{ -#ifdef SUPPORT_UTF - -*buffer = (pcre_uchar)cvalue; -return 1; - -#else /* SUPPORT_UTF */ -(void)(cvalue); /* Keep compiler happy; this function won't ever be */ -(void)(buffer); /* called when SUPPORT_UTF is not defined. */ -return 0; -#endif /* SUPPORT_UTF */ -} - -/* End of pcre32_ord2utf32.c */ diff --git a/src/pcre/pcre32_printint.c b/src/pcre/pcre32_printint.c deleted file mode 100644 index f3fd7b25..00000000 --- a/src/pcre/pcre32_printint.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. 
- - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_printint.c" - -/* End of pcre32_printint.c */ diff --git a/src/pcre/pcre32_refcount.c b/src/pcre/pcre32_refcount.c deleted file mode 100644 index dbdf432d..00000000 --- a/src/pcre/pcre32_refcount.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_refcount.c" - -/* End of pcre32_refcount.c */ diff --git a/src/pcre/pcre32_string_utils.c b/src/pcre/pcre32_string_utils.c deleted file mode 100644 index e37b3d48..00000000 --- a/src/pcre/pcre32_string_utils.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_string_utils.c" - -/* End of pcre32_string_utils.c */ diff --git a/src/pcre/pcre32_study.c b/src/pcre/pcre32_study.c deleted file mode 100644 index d3a3afed..00000000 --- a/src/pcre/pcre32_study.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_study.c" - -/* End of pcre32_study.c */ diff --git a/src/pcre/pcre32_tables.c b/src/pcre/pcre32_tables.c deleted file mode 100644 index 3d94cca3..00000000 --- a/src/pcre/pcre32_tables.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_tables.c" - -/* End of pcre32_tables.c */ diff --git a/src/pcre/pcre32_ucd.c b/src/pcre/pcre32_ucd.c deleted file mode 100644 index befe22d3..00000000 --- a/src/pcre/pcre32_ucd.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_ucd.c" - -/* End of pcre32_ucd.c */ diff --git a/src/pcre/pcre32_utf32_utils.c b/src/pcre/pcre32_utf32_utils.c deleted file mode 100644 index f844e237..00000000 --- a/src/pcre/pcre32_utf32_utils.c +++ /dev/null @@ -1,141 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains a function for converting any UTF-32 character -strings to host byte order. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_internal.h" - -#ifdef SUPPORT_UTF -static pcre_uint32 -swap_uint32(pcre_uint32 value) -{ -return ((value & 0x000000ff) << 24) | - ((value & 0x0000ff00) << 8) | - ((value & 0x00ff0000) >> 8) | - (value >> 24); -} -#endif - - -/************************************************* -* Convert any UTF-32 string to host byte order * -*************************************************/ - -/* This function takes an UTF-32 string and converts -it to host byte order. The length can be explicitly set, -or automatically detected for zero terminated strings. -BOMs can be kept or discarded during the conversion. -Conversion can be done in place (output == input). 
- -Arguments: - output the output buffer, its size must be greater - or equal than the input string - input any UTF-32 string - length the number of 32-bit units in the input string - can be less than zero for zero terminated strings - host_byte_order - A non-zero value means the input is in host byte - order, which can be dynamically changed by BOMs later. - Initially it contains the starting byte order and returns - with the last byte order so it can be used for stream - processing. It can be NULL, which set the host byte - order mode by default. - keep_boms for a non-zero value, the BOM (0xfeff) characters - are copied as well - -Returns: the number of 32-bit units placed into the output buffer, - including the zero-terminator -*/ - -int -pcre32_utf32_to_host_byte_order(PCRE_UCHAR32 *output, PCRE_SPTR32 input, - int length, int *host_byte_order, int keep_boms) -{ -#ifdef SUPPORT_UTF -/* This function converts any UTF-32 string to host byte order and optionally -removes any Byte Order Marks (BOMS). Returns with the remainig length. */ -int host_bo = host_byte_order != NULL ? *host_byte_order : 1; -pcre_uchar *optr = (pcre_uchar *)output; -const pcre_uchar *iptr = (const pcre_uchar *)input; -const pcre_uchar *end; -/* The c variable must be unsigned. */ -register pcre_uchar c; - -if (length < 0) - end = iptr + STRLEN_UC(iptr) + 1; -else - end = iptr + length; - -while (iptr < end) - { - c = *iptr++; - if (c == 0x0000feffu || c == 0xfffe0000u) - { - /* Detecting the byte order of the machine is unnecessary, it is - enough to know that the UTF-32 string has the same byte order or not. */ - host_bo = c == 0x0000feffu; - if (keep_boms != 0) - *optr++ = 0x0000feffu; - } - else - *optr++ = host_bo ? 
c : swap_uint32(c); - } -if (host_byte_order != NULL) - *host_byte_order = host_bo; - -#else /* SUPPORT_UTF */ -(void)(output); /* Keep picky compilers happy */ -(void)(input); -(void)(keep_boms); -(void)(host_byte_order); -#endif /* SUPPORT_UTF */ -return length; -} - -/* End of pcre32_utf32_utils.c */ diff --git a/src/pcre/pcre32_valid_utf32.c b/src/pcre/pcre32_valid_utf32.c deleted file mode 100644 index 94cda1a2..00000000 --- a/src/pcre/pcre32_valid_utf32.c +++ /dev/null @@ -1,124 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2013 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains an internal function for validating UTF-32 character -strings. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_internal.h" - -/************************************************* -* Validate a UTF-32 string * -*************************************************/ - -/* This function is called (optionally) at the start of compile or match, to -check that a supposed UTF-32 string is actually valid. The early check means -that subsequent code can assume it is dealing with a valid string. The check -can be turned off for maximum performance, but the consequences of supplying an -invalid string are then undefined. 
- -More information about the details of the error are passed -back in the returned value: - -PCRE_UTF32_ERR0 No error -PCRE_UTF32_ERR1 Surrogate character -PCRE_UTF32_ERR2 Unused (was non-character) -PCRE_UTF32_ERR3 Character > 0x10ffff - -Arguments: - string points to the string - length length of string, or -1 if the string is zero-terminated - errp pointer to an error position offset variable - -Returns: = 0 if the string is a valid UTF-32 string - > 0 otherwise, setting the offset of the bad character -*/ - -int -PRIV(valid_utf)(PCRE_PUCHAR string, int length, int *erroroffset) -{ -#ifdef SUPPORT_UTF -register PCRE_PUCHAR p; -register pcre_uchar c; - -if (length < 0) - { - for (p = string; *p != 0; p++); - length = p - string; - } - -for (p = string; length-- > 0; p++) - { - c = *p; - - if ((c & 0xfffff800u) != 0xd800u) - { - /* Normal UTF-32 code point. Neither high nor low surrogate. */ - if (c > 0x10ffffu) - { - *erroroffset = p - string; - return PCRE_UTF32_ERR3; - } - } - else - { - /* A surrogate */ - *erroroffset = p - string; - return PCRE_UTF32_ERR1; - } - } - -#else /* SUPPORT_UTF */ -(void)(string); /* Keep picky compilers happy */ -(void)(length); -(void)(erroroffset); -#endif /* SUPPORT_UTF */ - -return PCRE_UTF32_ERR0; /* This indicates success */ -} - -/* End of pcre32_valid_utf32.c */ diff --git a/src/pcre/pcre32_version.c b/src/pcre/pcre32_version.c deleted file mode 100644 index fdaad9b0..00000000 --- a/src/pcre/pcre32_version.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. 
*/ -#define COMPILE_PCRE32 - -#include "pcre_version.c" - -/* End of pcre32_version.c */ diff --git a/src/pcre/pcre32_xclass.c b/src/pcre/pcre32_xclass.c deleted file mode 100644 index 5662408a..00000000 --- a/src/pcre/pcre32_xclass.c +++ /dev/null @@ -1,45 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Generate code with 32 bit character support. */ -#define COMPILE_PCRE32 - -#include "pcre_xclass.c" - -/* End of pcre32_xclass.c */ diff --git a/src/pcre/pcre_byte_order.c b/src/pcre/pcre_byte_order.c deleted file mode 100644 index cf5f12b0..00000000 --- a/src/pcre/pcre_byte_order.c +++ /dev/null @@ -1,319 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2014 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains an internal function that tests a compiled pattern to -see if it was compiled with the opposite endianness. If so, it uses an -auxiliary local function to flip the appropriate bytes. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - - -/************************************************* -* Swap byte functions * -*************************************************/ - -/* The following functions swap the bytes of a pcre_uint16 -and pcre_uint32 value. 
- -Arguments: - value any number - -Returns: the byte swapped value -*/ - -static pcre_uint32 -swap_uint32(pcre_uint32 value) -{ -return ((value & 0x000000ff) << 24) | - ((value & 0x0000ff00) << 8) | - ((value & 0x00ff0000) >> 8) | - (value >> 24); -} - -static pcre_uint16 -swap_uint16(pcre_uint16 value) -{ -return (value >> 8) | (value << 8); -} - - -/************************************************* -* Test for a byte-flipped compiled regex * -*************************************************/ - -/* This function swaps the bytes of a compiled pattern usually -loaded form the disk. It also sets the tables pointer, which -is likely an invalid pointer after reload. - -Arguments: - argument_re points to the compiled expression - extra_data points to extra data or is NULL - tables points to the character tables or NULL - -Returns: 0 if the swap is successful, negative on error -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL int pcre_pattern_to_host_byte_order(pcre *argument_re, - pcre_extra *extra_data, const unsigned char *tables) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL int pcre16_pattern_to_host_byte_order(pcre16 *argument_re, - pcre16_extra *extra_data, const unsigned char *tables) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL int pcre32_pattern_to_host_byte_order(pcre32 *argument_re, - pcre32_extra *extra_data, const unsigned char *tables) -#endif -{ -REAL_PCRE *re = (REAL_PCRE *)argument_re; -pcre_study_data *study; -#ifndef COMPILE_PCRE8 -pcre_uchar *ptr; -int length; -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 -BOOL utf; -BOOL utf16_char; -#endif /* SUPPORT_UTF && COMPILE_PCRE16 */ -#endif /* !COMPILE_PCRE8 */ - -if (re == NULL) return PCRE_ERROR_NULL; -if (re->magic_number == MAGIC_NUMBER) - { - if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE; - re->tables = tables; - return 0; - } - -if (re->magic_number != REVERSED_MAGIC_NUMBER) return PCRE_ERROR_BADMAGIC; -if ((swap_uint32(re->flags) & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE; - 
-re->magic_number = MAGIC_NUMBER; -re->size = swap_uint32(re->size); -re->options = swap_uint32(re->options); -re->flags = swap_uint32(re->flags); -re->limit_match = swap_uint32(re->limit_match); -re->limit_recursion = swap_uint32(re->limit_recursion); - -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16 -re->first_char = swap_uint16(re->first_char); -re->req_char = swap_uint16(re->req_char); -#elif defined COMPILE_PCRE32 -re->first_char = swap_uint32(re->first_char); -re->req_char = swap_uint32(re->req_char); -#endif - -re->max_lookbehind = swap_uint16(re->max_lookbehind); -re->top_bracket = swap_uint16(re->top_bracket); -re->top_backref = swap_uint16(re->top_backref); -re->name_table_offset = swap_uint16(re->name_table_offset); -re->name_entry_size = swap_uint16(re->name_entry_size); -re->name_count = swap_uint16(re->name_count); -re->ref_count = swap_uint16(re->ref_count); -re->tables = tables; - -if (extra_data != NULL && (extra_data->flags & PCRE_EXTRA_STUDY_DATA) != 0) - { - study = (pcre_study_data *)extra_data->study_data; - study->size = swap_uint32(study->size); - study->flags = swap_uint32(study->flags); - study->minlength = swap_uint32(study->minlength); - } - -#ifndef COMPILE_PCRE8 -ptr = (pcre_uchar *)re + re->name_table_offset; -length = re->name_count * re->name_entry_size; -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 -utf = (re->options & PCRE_UTF16) != 0; -utf16_char = FALSE; -#endif /* SUPPORT_UTF && COMPILE_PCRE16 */ - -while(TRUE) - { - /* Swap previous characters. */ - while (length-- > 0) - { -#if defined COMPILE_PCRE16 - *ptr = swap_uint16(*ptr); -#elif defined COMPILE_PCRE32 - *ptr = swap_uint32(*ptr); -#endif - ptr++; - } -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 - if (utf16_char) - { - if (HAS_EXTRALEN(ptr[-1])) - { - /* We know that there is only one extra character in UTF-16. */ - *ptr = swap_uint16(*ptr); - ptr++; - } - } - utf16_char = FALSE; -#endif /* SUPPORT_UTF */ - - /* Get next opcode. 
*/ - length = 0; -#if defined COMPILE_PCRE16 - *ptr = swap_uint16(*ptr); -#elif defined COMPILE_PCRE32 - *ptr = swap_uint32(*ptr); -#endif - switch (*ptr) - { - case OP_END: - return 0; - -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_STAR: - case OP_MINSTAR: - case OP_PLUS: - case OP_MINPLUS: - case OP_QUERY: - case OP_MINQUERY: - case OP_UPTO: - case OP_MINUPTO: - case OP_EXACT: - case OP_POSSTAR: - case OP_POSPLUS: - case OP_POSQUERY: - case OP_POSUPTO: - case OP_STARI: - case OP_MINSTARI: - case OP_PLUSI: - case OP_MINPLUSI: - case OP_QUERYI: - case OP_MINQUERYI: - case OP_UPTOI: - case OP_MINUPTOI: - case OP_EXACTI: - case OP_POSSTARI: - case OP_POSPLUSI: - case OP_POSQUERYI: - case OP_POSUPTOI: - case OP_NOTSTAR: - case OP_NOTMINSTAR: - case OP_NOTPLUS: - case OP_NOTMINPLUS: - case OP_NOTQUERY: - case OP_NOTMINQUERY: - case OP_NOTUPTO: - case OP_NOTMINUPTO: - case OP_NOTEXACT: - case OP_NOTPOSSTAR: - case OP_NOTPOSPLUS: - case OP_NOTPOSQUERY: - case OP_NOTPOSUPTO: - case OP_NOTSTARI: - case OP_NOTMINSTARI: - case OP_NOTPLUSI: - case OP_NOTMINPLUSI: - case OP_NOTQUERYI: - case OP_NOTMINQUERYI: - case OP_NOTUPTOI: - case OP_NOTMINUPTOI: - case OP_NOTEXACTI: - case OP_NOTPOSSTARI: - case OP_NOTPOSPLUSI: - case OP_NOTPOSQUERYI: - case OP_NOTPOSUPTOI: - if (utf) utf16_char = TRUE; -#endif - /* Fall through. */ - - default: - length = PRIV(OP_lengths)[*ptr] - 1; - break; - - case OP_CLASS: - case OP_NCLASS: - /* Skip the character bit map. */ - ptr += 32/sizeof(pcre_uchar); - length = 0; - break; - - case OP_XCLASS: - /* Reverse the size of the XCLASS instance. */ - ptr++; -#if defined COMPILE_PCRE16 - *ptr = swap_uint16(*ptr); -#elif defined COMPILE_PCRE32 - *ptr = swap_uint32(*ptr); -#endif -#ifndef COMPILE_PCRE32 - if (LINK_SIZE > 1) - { - /* LINK_SIZE can be 1 or 2 in 16 bit mode. 
*/ - ptr++; - *ptr = swap_uint16(*ptr); - } -#endif - ptr++; - length = (GET(ptr, -LINK_SIZE)) - (1 + LINK_SIZE + 1); -#if defined COMPILE_PCRE16 - *ptr = swap_uint16(*ptr); -#elif defined COMPILE_PCRE32 - *ptr = swap_uint32(*ptr); -#endif - if ((*ptr & XCL_MAP) != 0) - { - /* Skip the character bit map. */ - ptr += 32/sizeof(pcre_uchar); - length -= 32/sizeof(pcre_uchar); - } - break; - } - ptr++; - } -/* Control should never reach here in 16/32 bit mode. */ -#else /* In 8-bit mode, the pattern does not need to be processed. */ -return 0; -#endif /* !COMPILE_PCRE8 */ -} - -/* End of pcre_byte_order.c */ diff --git a/src/pcre/pcre_compile.c b/src/pcre/pcre_compile.c deleted file mode 100644 index 079d30aa..00000000 --- a/src/pcre/pcre_compile.c +++ /dev/null @@ -1,9805 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2018 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains the external function pcre_compile(), along with -supporting internal functions that are not used by other modules. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#define NLBLOCK cd /* Block containing newline information */ -#define PSSTART start_pattern /* Field containing pattern start */ -#define PSEND end_pattern /* Field containing pattern end */ - -#include "pcre_internal.h" - - -/* When PCRE_DEBUG is defined, we need the pcre(16|32)_printint() function, which -is also used by pcretest. PCRE_DEBUG is not defined when building a production -library. We do not need to select pcre16_printint.c specially, because the -COMPILE_PCREx macro will already be appropriately set. */ - -#ifdef PCRE_DEBUG -/* pcre_printint.c should not include any headers */ -#define PCRE_INCLUDED -#include "pcre_printint.c" -#undef PCRE_INCLUDED -#endif - - -/* Macro for setting individual bits in class bitmaps. */ - -#define SETBIT(a,b) a[(b)/8] |= (1 << ((b)&7)) - -/* Maximum length value to check against when making sure that the integer that -holds the compiled pattern length does not overflow. 
We make it a bit less than -INT_MAX to allow for adding in group terminating bytes, so that we don't have -to check them every time. */ - -#define OFLOW_MAX (INT_MAX - 20) - -/* Definitions to allow mutual recursion */ - -static int - add_list_to_class(pcre_uint8 *, pcre_uchar **, int, compile_data *, - const pcre_uint32 *, unsigned int); - -static BOOL - compile_regex(int, pcre_uchar **, const pcre_uchar **, int *, BOOL, BOOL, int, int, - pcre_uint32 *, pcre_int32 *, pcre_uint32 *, pcre_int32 *, branch_chain *, - compile_data *, int *); - - - -/************************************************* -* Code parameters and static tables * -*************************************************/ - -/* This value specifies the size of stack workspace that is used during the -first pre-compile phase that determines how much memory is required. The regex -is partly compiled into this space, but the compiled parts are discarded as -soon as they can be, so that hopefully there will never be an overrun. The code -does, however, check for an overrun. The largest amount I've seen used is 218, -so this number is very generous. - -The same workspace is used during the second, actual compile phase for -remembering forward references to groups so that they can be filled in at the -end. Each entry in this list occupies LINK_SIZE bytes, so even when LINK_SIZE -is 4 there is plenty of room for most patterns. However, the memory can get -filled up by repetitions of forward references, for example patterns like -/(?1){0,1999}(b)/, and one user did hit the limit. The code has been changed so -that the workspace is expanded using malloc() in this situation. The value -below is therefore a minimum, and we put a maximum on it for safety. The -minimum is now also defined in terms of LINK_SIZE so that the use of malloc() -kicks in at the same number of forward references in all cases. 
*/ - -#define COMPILE_WORK_SIZE (2048*LINK_SIZE) -#define COMPILE_WORK_SIZE_MAX (100*COMPILE_WORK_SIZE) - -/* This value determines the size of the initial vector that is used for -remembering named groups during the pre-compile. It is allocated on the stack, -but if it is too small, it is expanded using malloc(), in a similar way to the -workspace. The value is the number of slots in the list. */ - -#define NAMED_GROUP_LIST_SIZE 20 - -/* The overrun tests check for a slightly smaller size so that they detect the -overrun before it actually does run off the end of the data block. */ - -#define WORK_SIZE_SAFETY_MARGIN (100) - -/* Private flags added to firstchar and reqchar. */ - -#define REQ_CASELESS (1 << 0) /* Indicates caselessness */ -#define REQ_VARY (1 << 1) /* Reqchar followed non-literal item */ -/* Negative values for the firstchar and reqchar flags */ -#define REQ_UNSET (-2) -#define REQ_NONE (-1) - -/* Repeated character flags. */ - -#define UTF_LENGTH 0x10000000l /* The char contains its length. */ - -/* Table for handling escaped characters in the range '0'-'z'. Positive returns -are simple data values; negative values are for special things like \d and so -on. Zero means further processing is needed (for things like \x), or the escape -is invalid. */ - -#ifndef EBCDIC - -/* This is the "normal" table for ASCII systems or for EBCDIC systems running -in UTF-8 mode. 
*/ - -static const short int escapes[] = { - 0, 0, - 0, 0, - 0, 0, - 0, 0, - 0, 0, - CHAR_COLON, CHAR_SEMICOLON, - CHAR_LESS_THAN_SIGN, CHAR_EQUALS_SIGN, - CHAR_GREATER_THAN_SIGN, CHAR_QUESTION_MARK, - CHAR_COMMERCIAL_AT, -ESC_A, - -ESC_B, -ESC_C, - -ESC_D, -ESC_E, - 0, -ESC_G, - -ESC_H, 0, - 0, -ESC_K, - 0, 0, - -ESC_N, 0, - -ESC_P, -ESC_Q, - -ESC_R, -ESC_S, - 0, 0, - -ESC_V, -ESC_W, - -ESC_X, 0, - -ESC_Z, CHAR_LEFT_SQUARE_BRACKET, - CHAR_BACKSLASH, CHAR_RIGHT_SQUARE_BRACKET, - CHAR_CIRCUMFLEX_ACCENT, CHAR_UNDERSCORE, - CHAR_GRAVE_ACCENT, ESC_a, - -ESC_b, 0, - -ESC_d, ESC_e, - ESC_f, 0, - -ESC_h, 0, - 0, -ESC_k, - 0, 0, - ESC_n, 0, - -ESC_p, 0, - ESC_r, -ESC_s, - ESC_tee, 0, - -ESC_v, -ESC_w, - 0, 0, - -ESC_z -}; - -#else - -/* This is the "abnormal" table for EBCDIC systems without UTF-8 support. */ - -static const short int escapes[] = { -/* 48 */ 0, 0, 0, '.', '<', '(', '+', '|', -/* 50 */ '&', 0, 0, 0, 0, 0, 0, 0, -/* 58 */ 0, 0, '!', '$', '*', ')', ';', '~', -/* 60 */ '-', '/', 0, 0, 0, 0, 0, 0, -/* 68 */ 0, 0, '|', ',', '%', '_', '>', '?', -/* 70 */ 0, 0, 0, 0, 0, 0, 0, 0, -/* 78 */ 0, '`', ':', '#', '@', '\'', '=', '"', -/* 80 */ 0, ESC_a, -ESC_b, 0, -ESC_d, ESC_e, ESC_f, 0, -/* 88 */-ESC_h, 0, 0, '{', 0, 0, 0, 0, -/* 90 */ 0, 0, -ESC_k, 0, 0, ESC_n, 0, -ESC_p, -/* 98 */ 0, ESC_r, 0, '}', 0, 0, 0, 0, -/* A0 */ 0, '~', -ESC_s, ESC_tee, 0,-ESC_v, -ESC_w, 0, -/* A8 */ 0,-ESC_z, 0, 0, 0, '[', 0, 0, -/* B0 */ 0, 0, 0, 0, 0, 0, 0, 0, -/* B8 */ 0, 0, 0, 0, 0, ']', '=', '-', -/* C0 */ '{',-ESC_A, -ESC_B, -ESC_C, -ESC_D,-ESC_E, 0, -ESC_G, -/* C8 */-ESC_H, 0, 0, 0, 0, 0, 0, 0, -/* D0 */ '}', 0, -ESC_K, 0, 0,-ESC_N, 0, -ESC_P, -/* D8 */-ESC_Q,-ESC_R, 0, 0, 0, 0, 0, 0, -/* E0 */ '\\', 0, -ESC_S, 0, 0,-ESC_V, -ESC_W, -ESC_X, -/* E8 */ 0,-ESC_Z, 0, 0, 0, 0, 0, 0, -/* F0 */ 0, 0, 0, 0, 0, 0, 0, 0, -/* F8 */ 0, 0, 0, 0, 0, 0, 0, 0 -}; - -/* We also need a table of characters that may follow \c in an EBCDIC -environment for characters 0-31. 
*/ - -static unsigned char ebcdic_escape_c[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"; - -#endif - - -/* Table of special "verbs" like (*PRUNE). This is a short table, so it is -searched linearly. Put all the names into a single string, in order to reduce -the number of relocations when a shared library is dynamically linked. The -string is built from string macros so that it works in UTF-8 mode on EBCDIC -platforms. */ - -typedef struct verbitem { - int len; /* Length of verb name */ - int op; /* Op when no arg, or -1 if arg mandatory */ - int op_arg; /* Op when arg present, or -1 if not allowed */ -} verbitem; - -static const char verbnames[] = - "\0" /* Empty name is a shorthand for MARK */ - STRING_MARK0 - STRING_ACCEPT0 - STRING_COMMIT0 - STRING_F0 - STRING_FAIL0 - STRING_PRUNE0 - STRING_SKIP0 - STRING_THEN; - -static const verbitem verbs[] = { - { 0, -1, OP_MARK }, - { 4, -1, OP_MARK }, - { 6, OP_ACCEPT, -1 }, - { 6, OP_COMMIT, -1 }, - { 1, OP_FAIL, -1 }, - { 4, OP_FAIL, -1 }, - { 5, OP_PRUNE, OP_PRUNE_ARG }, - { 4, OP_SKIP, OP_SKIP_ARG }, - { 4, OP_THEN, OP_THEN_ARG } -}; - -static const int verbcount = sizeof(verbs)/sizeof(verbitem); - - -/* Substitutes for [[:<:]] and [[:>:]], which mean start and end of word in -another regex library. */ - -static const pcre_uchar sub_start_of_word[] = { - CHAR_BACKSLASH, CHAR_b, CHAR_LEFT_PARENTHESIS, CHAR_QUESTION_MARK, - CHAR_EQUALS_SIGN, CHAR_BACKSLASH, CHAR_w, CHAR_RIGHT_PARENTHESIS, '\0' }; - -static const pcre_uchar sub_end_of_word[] = { - CHAR_BACKSLASH, CHAR_b, CHAR_LEFT_PARENTHESIS, CHAR_QUESTION_MARK, - CHAR_LESS_THAN_SIGN, CHAR_EQUALS_SIGN, CHAR_BACKSLASH, CHAR_w, - CHAR_RIGHT_PARENTHESIS, '\0' }; - - -/* Tables of names of POSIX character classes and their lengths. The names are -now all in a single string, to reduce the number of relocations when a shared -library is dynamically loaded. The list of lengths is terminated by a zero -length entry. 
The first three must be alpha, lower, upper, as this is assumed -for handling case independence. The indices for graph, print, and punct are -needed, so identify them. */ - -static const char posix_names[] = - STRING_alpha0 STRING_lower0 STRING_upper0 STRING_alnum0 - STRING_ascii0 STRING_blank0 STRING_cntrl0 STRING_digit0 - STRING_graph0 STRING_print0 STRING_punct0 STRING_space0 - STRING_word0 STRING_xdigit; - -static const pcre_uint8 posix_name_lengths[] = { - 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 6, 0 }; - -#define PC_GRAPH 8 -#define PC_PRINT 9 -#define PC_PUNCT 10 - - -/* Table of class bit maps for each POSIX class. Each class is formed from a -base map, with an optional addition or removal of another map. Then, for some -classes, there is some additional tweaking: for [:blank:] the vertical space -characters are removed, and for [:alpha:] and [:alnum:] the underscore -character is removed. The triples in the table consist of the base map offset, -second map offset or -1 if no second map, and a non-negative value for map -addition or a negative value for map subtraction (if there are two maps). The -absolute value of the third field has these meanings: 0 => no tweaking, 1 => -remove vertical space characters, 2 => remove underscore. */ - -static const int posix_class_maps[] = { - cbit_word, cbit_digit, -2, /* alpha */ - cbit_lower, -1, 0, /* lower */ - cbit_upper, -1, 0, /* upper */ - cbit_word, -1, 2, /* alnum - word without underscore */ - cbit_print, cbit_cntrl, 0, /* ascii */ - cbit_space, -1, 1, /* blank - a GNU extension */ - cbit_cntrl, -1, 0, /* cntrl */ - cbit_digit, -1, 0, /* digit */ - cbit_graph, -1, 0, /* graph */ - cbit_print, -1, 0, /* print */ - cbit_punct, -1, 0, /* punct */ - cbit_space, -1, 0, /* space */ - cbit_word, -1, 0, /* word - a Perl extension */ - cbit_xdigit,-1, 0 /* xdigit */ -}; - -/* Table of substitutes for \d etc when PCRE_UCP is set. They are replaced by -Unicode property escapes. 
*/ - -#ifdef SUPPORT_UCP -static const pcre_uchar string_PNd[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_N, CHAR_d, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_pNd[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_N, CHAR_d, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_PXsp[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_s, CHAR_p, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_pXsp[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_s, CHAR_p, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_PXwd[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_w, CHAR_d, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_pXwd[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_w, CHAR_d, CHAR_RIGHT_CURLY_BRACKET, '\0' }; - -static const pcre_uchar *substitutes[] = { - string_PNd, /* \D */ - string_pNd, /* \d */ - string_PXsp, /* \S */ /* Xsp is Perl space, but from 8.34, Perl */ - string_pXsp, /* \s */ /* space and POSIX space are the same. */ - string_PXwd, /* \W */ - string_pXwd /* \w */ -}; - -/* The POSIX class substitutes must be in the order of the POSIX class names, -defined above, and there are both positive and negative cases. NULL means no -general substitute of a Unicode property escape (\p or \P). However, for some -POSIX classes (e.g. graph, print, punct) a special property code is compiled -directly. 
*/ - -static const pcre_uchar string_pL[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_L, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_pLl[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_L, CHAR_l, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_pLu[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_L, CHAR_u, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_pXan[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_a, CHAR_n, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_h[] = { - CHAR_BACKSLASH, CHAR_h, '\0' }; -static const pcre_uchar string_pXps[] = { - CHAR_BACKSLASH, CHAR_p, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_p, CHAR_s, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_PL[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_L, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_PLl[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_L, CHAR_l, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_PLu[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_L, CHAR_u, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_PXan[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_a, CHAR_n, CHAR_RIGHT_CURLY_BRACKET, '\0' }; -static const pcre_uchar string_H[] = { - CHAR_BACKSLASH, CHAR_H, '\0' }; -static const pcre_uchar string_PXps[] = { - CHAR_BACKSLASH, CHAR_P, CHAR_LEFT_CURLY_BRACKET, - CHAR_X, CHAR_p, CHAR_s, CHAR_RIGHT_CURLY_BRACKET, '\0' }; - -static const pcre_uchar *posix_substitutes[] = { - string_pL, /* alpha */ - string_pLl, /* lower */ - string_pLu, /* upper */ - string_pXan, /* alnum */ - NULL, /* ascii */ - string_h, /* blank */ - NULL, /* cntrl */ - string_pNd, /* digit */ - NULL, /* graph */ - NULL, /* print */ - NULL, /* punct */ - string_pXps, /* space */ /* Xps is POSIX space, but from 8.34 */ - 
string_pXwd, /* word */ /* Perl and POSIX space are the same */ - NULL, /* xdigit */ - /* Negated cases */ - string_PL, /* ^alpha */ - string_PLl, /* ^lower */ - string_PLu, /* ^upper */ - string_PXan, /* ^alnum */ - NULL, /* ^ascii */ - string_H, /* ^blank */ - NULL, /* ^cntrl */ - string_PNd, /* ^digit */ - NULL, /* ^graph */ - NULL, /* ^print */ - NULL, /* ^punct */ - string_PXps, /* ^space */ /* Xps is POSIX space, but from 8.34 */ - string_PXwd, /* ^word */ /* Perl and POSIX space are the same */ - NULL /* ^xdigit */ -}; -#define POSIX_SUBSIZE (sizeof(posix_substitutes) / sizeof(pcre_uchar *)) -#endif - -#define STRING(a) # a -#define XSTRING(s) STRING(s) - -/* The texts of compile-time error messages. These are "char *" because they -are passed to the outside world. Do not ever re-use any error number, because -they are documented. Always add a new error instead. Messages marked DEAD below -are no longer used. This used to be a table of strings, but in order to reduce -the number of relocations needed when a shared library is loaded dynamically, -it is now one long string. We cannot use a table of offsets, because the -lengths of inserts such as XSTRING(MAX_NAME_SIZE) are not known. Instead, we -simply count through to the one we want - this isn't a performance issue -because these strings are used only when there is a compilation error. - -Each substring ends with \0 to insert a null character. This includes the final -substring, so that the whole string ends with \0\0, which can be detected when -counting through. 
*/ - -static const char error_texts[] = - "no error\0" - "\\ at end of pattern\0" - "\\c at end of pattern\0" - "unrecognized character follows \\\0" - "numbers out of order in {} quantifier\0" - /* 5 */ - "number too big in {} quantifier\0" - "missing terminating ] for character class\0" - "invalid escape sequence in character class\0" - "range out of order in character class\0" - "nothing to repeat\0" - /* 10 */ - "internal error: invalid forward reference offset\0" - "internal error: unexpected repeat\0" - "unrecognized character after (? or (?-\0" - "POSIX named classes are supported only within a class\0" - "missing )\0" - /* 15 */ - "reference to non-existent subpattern\0" - "erroffset passed as NULL\0" - "unknown option bit(s) set\0" - "missing ) after comment\0" - "parentheses nested too deeply\0" /** DEAD **/ - /* 20 */ - "regular expression is too large\0" - "failed to get memory\0" - "unmatched parentheses\0" - "internal error: code overflow\0" - "unrecognized character after (?<\0" - /* 25 */ - "lookbehind assertion is not fixed length\0" - "malformed number or name after (?(\0" - "conditional group contains more than two branches\0" - "assertion expected after (?( or (?(?C)\0" - "(?R or (?[+-]digits must be followed by )\0" - /* 30 */ - "unknown POSIX class name\0" - "POSIX collating elements are not supported\0" - "this version of PCRE is compiled without UTF support\0" - "spare error\0" /** DEAD **/ - "character value in \\x{} or \\o{} is too large\0" - /* 35 */ - "invalid condition (?(0)\0" - "\\C not allowed in lookbehind assertion\0" - "PCRE does not support \\L, \\l, \\N{name}, \\U, or \\u\0" - "number after (?C is > 255\0" - "closing ) for (?C expected\0" - /* 40 */ - "recursive call could loop indefinitely\0" - "unrecognized character after (?P\0" - "syntax error in subpattern name (missing terminator)\0" - "two named subpatterns have the same name\0" - "invalid UTF-8 string\0" - /* 45 */ - "support for \\P, \\p, and \\X has not been 
compiled\0" - "malformed \\P or \\p sequence\0" - "unknown property name after \\P or \\p\0" - "subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " characters)\0" - "too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0" - /* 50 */ - "repeated subpattern is too long\0" /** DEAD **/ - "octal value is greater than \\377 in 8-bit non-UTF-8 mode\0" - "internal error: overran compiling workspace\0" - "internal error: previously-checked referenced subpattern not found\0" - "DEFINE group contains more than one branch\0" - /* 55 */ - "repeating a DEFINE group is not allowed\0" /** DEAD **/ - "inconsistent NEWLINE options\0" - "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0" - "a numbered reference must not be zero\0" - "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" - /* 60 */ - "(*VERB) not recognized or malformed\0" - "number is too big\0" - "subpattern name expected\0" - "digit expected after (?+\0" - "] is an invalid data character in JavaScript compatibility mode\0" - /* 65 */ - "different names for subpatterns of the same number are not allowed\0" - "(*MARK) must have an argument\0" - "this version of PCRE is not compiled with Unicode property support\0" -#ifndef EBCDIC - "\\c must be followed by an ASCII character\0" -#else - "\\c must be followed by a letter or one of [\\]^_?\0" -#endif - "\\k is not followed by a braced, angle-bracketed, or quoted name\0" - /* 70 */ - "internal error: unknown opcode in find_fixedlength()\0" - "\\N is not supported in a class\0" - "too many forward references\0" - "disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0" - "invalid UTF-16 string\0" - /* 75 */ - "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0" - "character value in \\u.... 
sequence is too large\0" - "invalid UTF-32 string\0" - "setting UTF is disabled by the application\0" - "non-hex character in \\x{} (closing brace missing?)\0" - /* 80 */ - "non-octal character in \\o{} (closing brace missing?)\0" - "missing opening brace after \\o\0" - "parentheses are too deeply nested\0" - "invalid range in character class\0" - "group name must start with a non-digit\0" - /* 85 */ - "parentheses are too deeply nested (stack check)\0" - "digits missing in \\x{} or \\o{}\0" - "regular expression is too complicated\0" - ; - -/* Table to identify digits and hex digits. This is used when compiling -patterns. Note that the tables in chartables are dependent on the locale, and -may mark arbitrary characters as digits - but the PCRE compiling code expects -to handle only 0-9, a-z, and A-Z as digits when compiling. That is why we have -a private table here. It costs 256 bytes, but it is a lot faster than doing -character value tests (at least in some simple cases I timed), and in some -applications one wants PCRE to compile efficiently as well as match -efficiently. - -For convenience, we use the same bit definitions as in chartables: - - 0x04 decimal digit - 0x08 hexadecimal digit - -Then we can use ctype_digit and ctype_xdigit in the code. */ - -/* Using a simple comparison for decimal numbers rather than a memory read -is much faster, and the resulting code is simpler (the compiler turns it -into a subtraction and unsigned comparison). */ - -#define IS_DIGIT(x) ((x) >= CHAR_0 && (x) <= CHAR_9) - -#ifndef EBCDIC - -/* This is the "normal" case, for ASCII systems, and EBCDIC systems running in -UTF-8 mode. 
*/ - -static const pcre_uint8 digitab[] = - { - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 8- 15 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - ' */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ( - / */ - 0x0c,0x0c,0x0c,0x0c,0x0c,0x0c,0x0c,0x0c, /* 0 - 7 */ - 0x0c,0x0c,0x00,0x00,0x00,0x00,0x00,0x00, /* 8 - ? */ - 0x00,0x08,0x08,0x08,0x08,0x08,0x08,0x00, /* @ - G */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* H - O */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* P - W */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* X - _ */ - 0x00,0x08,0x08,0x08,0x08,0x08,0x08,0x00, /* ` - g */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* h - o */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* p - w */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* x -127 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 152-159 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160-167 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 168-175 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 176-183 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 192-199 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 200-207 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 208-215 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 216-223 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 224-231 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 232-239 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */ - -#else - -/* This is the "abnormal" case, for EBCDIC systems not running in UTF-8 mode. 
*/ - -static const pcre_uint8 digitab[] = - { - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 0 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 8- 15 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 10 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 32- 39 20 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 40- 47 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 48- 55 30 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 56- 63 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - 71 40 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 72- | */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* & - 87 50 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 88- 95 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - -103 60 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 104- ? */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 112-119 70 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 120- " */ - 0x00,0x08,0x08,0x08,0x08,0x08,0x08,0x00, /* 128- g 80 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* h -143 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144- p 90 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* q -159 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160- x A0 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* y -175 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ^ -183 B0 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */ - 0x00,0x08,0x08,0x08,0x08,0x08,0x08,0x00, /* { - G C0 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* H -207 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* } - P D0 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* Q -223 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* \ - X E0 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* Y -239 */ - 0x0c,0x0c,0x0c,0x0c,0x0c,0x0c,0x0c,0x0c, /* 0 - 7 F0 */ - 0x0c,0x0c,0x00,0x00,0x00,0x00,0x00,0x00};/* 8 -255 */ - -static const pcre_uint8 ebcdic_chartab[] = { /* chartable partial dup */ - 0x80,0x00,0x00,0x00,0x00,0x01,0x00,0x00, /* 
0- 7 */ - 0x00,0x00,0x00,0x00,0x01,0x01,0x00,0x00, /* 8- 15 */ - 0x00,0x00,0x00,0x00,0x00,0x01,0x00,0x00, /* 16- 23 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */ - 0x00,0x00,0x00,0x00,0x00,0x01,0x00,0x00, /* 32- 39 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 40- 47 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 48- 55 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 56- 63 */ - 0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - 71 */ - 0x00,0x00,0x00,0x80,0x00,0x80,0x80,0x80, /* 72- | */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* & - 87 */ - 0x00,0x00,0x00,0x80,0x80,0x80,0x00,0x00, /* 88- 95 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - -103 */ - 0x00,0x00,0x00,0x00,0x00,0x10,0x00,0x80, /* 104- ? */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 112-119 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 120- " */ - 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* 128- g */ - 0x12,0x12,0x00,0x00,0x00,0x00,0x00,0x00, /* h -143 */ - 0x00,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* 144- p */ - 0x12,0x12,0x00,0x00,0x00,0x00,0x00,0x00, /* q -159 */ - 0x00,0x00,0x12,0x12,0x12,0x12,0x12,0x12, /* 160- x */ - 0x12,0x12,0x00,0x00,0x00,0x00,0x00,0x00, /* y -175 */ - 0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ^ -183 */ - 0x00,0x00,0x80,0x00,0x00,0x00,0x00,0x00, /* 184-191 */ - 0x80,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* { - G */ - 0x12,0x12,0x00,0x00,0x00,0x00,0x00,0x00, /* H -207 */ - 0x00,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* } - P */ - 0x12,0x12,0x00,0x00,0x00,0x00,0x00,0x00, /* Q -223 */ - 0x00,0x00,0x12,0x12,0x12,0x12,0x12,0x12, /* \ - X */ - 0x12,0x12,0x00,0x00,0x00,0x00,0x00,0x00, /* Y -239 */ - 0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */ - 0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x00};/* 8 -255 */ -#endif - - -/* This table is used to check whether auto-possessification is possible -between adjacent character-type opcodes. The left-hand (repeated) opcode is -used to select the row, and the right-hand opcode is used to select the column.
-A value of 1 means that auto-possessification is OK. For example, the second -value in the first row means that \D+\d can be turned into \D++\d. - -The Unicode property types (\P and \p) have to be present to fill out the table -because of what their opcode values are, but the table values should always be -zero because property types are handled separately in the code. The last four -columns apply to items that cannot be repeated, so there is no need to have -rows for them. Note that OP_DIGIT etc. are generated only when PCRE_UCP is -*not* set. When it is set, \d etc. are converted into OP_(NOT_)PROP codes. */ - -#define APTROWS (LAST_AUTOTAB_LEFT_OP - FIRST_AUTOTAB_OP + 1) -#define APTCOLS (LAST_AUTOTAB_RIGHT_OP - FIRST_AUTOTAB_OP + 1) - -static const pcre_uint8 autoposstab[APTROWS][APTCOLS] = { -/* \D \d \S \s \W \w . .+ \C \P \p \R \H \h \V \v \X \Z \z $ $M */ - { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \D */ - { 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1 }, /* \d */ - { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1 }, /* \S */ - { 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \s */ - { 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \W */ - { 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1 }, /* \w */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* . 
*/ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* .+ */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \C */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* \P */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* \p */ - { 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0 }, /* \R */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0 }, /* \H */ - { 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0 }, /* \h */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0 }, /* \V */ - { 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0 }, /* \v */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 } /* \X */ -}; - - -/* This table is used to check whether auto-possessification is possible -between adjacent Unicode property opcodes (OP_PROP and OP_NOTPROP). The -left-hand (repeated) opcode is used to select the row, and the right-hand -opcode is used to select the column. 
The values are as follows: - - 0 Always return FALSE (never auto-possessify) - 1 Character groups are distinct (possessify if both are OP_PROP) - 2 Check character categories in the same group (general or particular) - 3 TRUE if the two opcodes are not the same (PROP vs NOTPROP) - - 4 Check left general category vs right particular category - 5 Check right general category vs left particular category - - 6 Left alphanum vs right general category - 7 Left space vs right general category - 8 Left word vs right general category - - 9 Right alphanum vs left general category - 10 Right space vs left general category - 11 Right word vs left general category - - 12 Left alphanum vs right particular category - 13 Left space vs right particular category - 14 Left word vs right particular category - - 15 Right alphanum vs left particular category - 16 Right space vs left particular category - 17 Right word vs left particular category -*/ - -static const pcre_uint8 propposstab[PT_TABSIZE][PT_TABSIZE] = { -/* ANY LAMP GC PC SC ALNUM SPACE PXSPACE WORD CLIST UCNC */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_ANY */ - { 0, 3, 0, 0, 0, 3, 1, 1, 0, 0, 0 }, /* PT_LAMP */ - { 0, 0, 2, 4, 0, 9, 10, 10, 11, 0, 0 }, /* PT_GC */ - { 0, 0, 5, 2, 0, 15, 16, 16, 17, 0, 0 }, /* PT_PC */ - { 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0 }, /* PT_SC */ - { 0, 3, 6, 12, 0, 3, 1, 1, 0, 0, 0 }, /* PT_ALNUM */ - { 0, 1, 7, 13, 0, 1, 3, 3, 1, 0, 0 }, /* PT_SPACE */ - { 0, 1, 7, 13, 0, 1, 3, 3, 1, 0, 0 }, /* PT_PXSPACE */ - { 0, 0, 8, 14, 0, 0, 1, 1, 3, 0, 0 }, /* PT_WORD */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_CLIST */ - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 } /* PT_UCNC */ -}; - -/* This table is used to check whether auto-possessification is possible -between adjacent Unicode property opcodes (OP_PROP and OP_NOTPROP) when one -specifies a general category and the other specifies a particular category. The -row is selected by the general category and the column by the particular -category. 
The value is 1 if the particular category is not part of the general -category. */ - -static const pcre_uint8 catposstab[7][30] = { -/* Cc Cf Cn Co Cs Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No Pc Pd Pe Pf Pi Po Ps Sc Sk Sm So Zl Zp Zs */ - { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* C */ - { 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* L */ - { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* M */ - { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* N */ - { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 }, /* P */ - { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1 }, /* S */ - { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0 } /* Z */ -}; - -/* This table is used when checking ALNUM, (PX)SPACE, SPACE, and WORD against -a general or particular category. The properties in each row are those -that apply to the character set in question. Duplication means that a little -unnecessary work is done when checking, but this keeps things much simpler -because they can all use the same code. For more details see the comment where -this table is used. - -Note: SPACE and PXSPACE used to be different because Perl excluded VT from -"space", but from Perl 5.18 it's included, so both categories are treated the -same here. */ - -static const pcre_uint8 posspropstab[3][4] = { - { ucp_L, ucp_N, ucp_N, ucp_Nl }, /* ALNUM, 3rd and 4th values redundant */ - { ucp_Z, ucp_Z, ucp_C, ucp_Cc }, /* SPACE and PXSPACE, 2nd value redundant */ - { ucp_L, ucp_N, ucp_P, ucp_Po } /* WORD */ -}; - -/* This table is used when converting repeating opcodes into possessified -versions as a result of an explicit possessive quantifier such as ++. 
A zero -value means there is no possessified version - in those cases the item in -question must be wrapped in ONCE brackets. The table is truncated at OP_CALLOUT -because all relevant opcodes are less than that. */ - -static const pcre_uint8 opcode_possessify[] = { - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0 - 15 */ - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 16 - 31 */ - - 0, /* NOTI */ - OP_POSSTAR, 0, /* STAR, MINSTAR */ - OP_POSPLUS, 0, /* PLUS, MINPLUS */ - OP_POSQUERY, 0, /* QUERY, MINQUERY */ - OP_POSUPTO, 0, /* UPTO, MINUPTO */ - 0, /* EXACT */ - 0, 0, 0, 0, /* POS{STAR,PLUS,QUERY,UPTO} */ - - OP_POSSTARI, 0, /* STARI, MINSTARI */ - OP_POSPLUSI, 0, /* PLUSI, MINPLUSI */ - OP_POSQUERYI, 0, /* QUERYI, MINQUERYI */ - OP_POSUPTOI, 0, /* UPTOI, MINUPTOI */ - 0, /* EXACTI */ - 0, 0, 0, 0, /* POS{STARI,PLUSI,QUERYI,UPTOI} */ - - OP_NOTPOSSTAR, 0, /* NOTSTAR, NOTMINSTAR */ - OP_NOTPOSPLUS, 0, /* NOTPLUS, NOTMINPLUS */ - OP_NOTPOSQUERY, 0, /* NOTQUERY, NOTMINQUERY */ - OP_NOTPOSUPTO, 0, /* NOTUPTO, NOTMINUPTO */ - 0, /* NOTEXACT */ - 0, 0, 0, 0, /* NOTPOS{STAR,PLUS,QUERY,UPTO} */ - - OP_NOTPOSSTARI, 0, /* NOTSTARI, NOTMINSTARI */ - OP_NOTPOSPLUSI, 0, /* NOTPLUSI, NOTMINPLUSI */ - OP_NOTPOSQUERYI, 0, /* NOTQUERYI, NOTMINQUERYI */ - OP_NOTPOSUPTOI, 0, /* NOTUPTOI, NOTMINUPTOI */ - 0, /* NOTEXACTI */ - 0, 0, 0, 0, /* NOTPOS{STARI,PLUSI,QUERYI,UPTOI} */ - - OP_TYPEPOSSTAR, 0, /* TYPESTAR, TYPEMINSTAR */ - OP_TYPEPOSPLUS, 0, /* TYPEPLUS, TYPEMINPLUS */ - OP_TYPEPOSQUERY, 0, /* TYPEQUERY, TYPEMINQUERY */ - OP_TYPEPOSUPTO, 0, /* TYPEUPTO, TYPEMINUPTO */ - 0, /* TYPEEXACT */ - 0, 0, 0, 0, /* TYPEPOS{STAR,PLUS,QUERY,UPTO} */ - - OP_CRPOSSTAR, 0, /* CRSTAR, CRMINSTAR */ - OP_CRPOSPLUS, 0, /* CRPLUS, CRMINPLUS */ - OP_CRPOSQUERY, 0, /* CRQUERY, CRMINQUERY */ - OP_CRPOSRANGE, 0, /* CRRANGE, CRMINRANGE */ - 0, 0, 0, 0, /* CRPOS{STAR,PLUS,QUERY,RANGE} */ - - 0, 0, 0, /* CLASS, NCLASS, XCLASS */ - 0, 0, /* REF, REFI */ - 0, 0, /* DNREF, DNREFI */ - 0, 0 /* 
RECURSE, CALLOUT */ -}; - - - -/************************************************* -* Find an error text * -*************************************************/ - -/* The error texts are now all in one long string, to save on relocations. As -some of the text is of unknown length, we can't use a table of offsets. -Instead, just count through the strings. This is not a performance issue -because it happens only when there has been a compilation error. - -Argument: the error number -Returns: pointer to the error string -*/ - -static const char * -find_error_text(int n) -{ -const char *s = error_texts; -for (; n > 0; n--) - { - while (*s++ != CHAR_NULL) {}; - if (*s == CHAR_NULL) return "Error text not found (please report)"; - } -return s; -} - - - -/************************************************* -* Expand the workspace * -*************************************************/ - -/* This function is called during the second compiling phase, if the number of -forward references fills the existing workspace, which is originally a block on -the stack. A larger block is obtained from malloc() unless the ultimate limit -has been reached or the increase will be rather small. 
- -Argument: pointer to the compile data block -Returns: 0 if all went well, else an error number -*/ - -static int -expand_workspace(compile_data *cd) -{ -pcre_uchar *newspace; -int newsize = cd->workspace_size * 2; - -if (newsize > COMPILE_WORK_SIZE_MAX) newsize = COMPILE_WORK_SIZE_MAX; -if (cd->workspace_size >= COMPILE_WORK_SIZE_MAX || - newsize - cd->workspace_size < WORK_SIZE_SAFETY_MARGIN) - return ERR72; - -newspace = (PUBL(malloc))(IN_UCHARS(newsize)); -if (newspace == NULL) return ERR21; -memcpy(newspace, cd->start_workspace, cd->workspace_size * sizeof(pcre_uchar)); -cd->hwm = (pcre_uchar *)newspace + (cd->hwm - cd->start_workspace); -if (cd->workspace_size > COMPILE_WORK_SIZE) - (PUBL(free))((void *)cd->start_workspace); -cd->start_workspace = newspace; -cd->workspace_size = newsize; -return 0; -} - - - -/************************************************* -* Check for counted repeat * -*************************************************/ - -/* This function is called when a '{' is encountered in a place where it might -start a quantifier. It looks ahead to see if it really is a quantifier or not. -It is only a quantifier if it is one of the forms {ddd} {ddd,} or {ddd,ddd} -where the ddds are digits. - -Arguments: - p pointer to the first char after '{' - -Returns: TRUE or FALSE -*/ - -static BOOL -is_counted_repeat(const pcre_uchar *p) -{ -if (!IS_DIGIT(*p)) return FALSE; -p++; -while (IS_DIGIT(*p)) p++; -if (*p == CHAR_RIGHT_CURLY_BRACKET) return TRUE; - -if (*p++ != CHAR_COMMA) return FALSE; -if (*p == CHAR_RIGHT_CURLY_BRACKET) return TRUE; - -if (!IS_DIGIT(*p)) return FALSE; -p++; -while (IS_DIGIT(*p)) p++; - -return (*p == CHAR_RIGHT_CURLY_BRACKET); -} - - - -/************************************************* -* Handle escapes * -*************************************************/ - -/* This function is called when a \ has been encountered. 
It either returns a -positive value for a simple escape such as \n, or 0 for a data character which -will be placed in chptr. A backreference to group n is returned as negative n. -When UTF-8 is enabled, a positive value greater than 255 may be returned in -chptr. On entry, ptr is pointing at the \. On exit, it is on the final -character of the escape sequence. - -Arguments: - ptrptr points to the pattern position pointer - chptr points to a returned data character - errorcodeptr points to the errorcode variable - bracount number of previous extracting brackets - options the options bits - isclass TRUE if inside a character class - -Returns: zero => a data character - positive => a special escape sequence - negative => a back reference - on error, errorcodeptr is set -*/ - -static int -check_escape(const pcre_uchar **ptrptr, pcre_uint32 *chptr, int *errorcodeptr, - int bracount, int options, BOOL isclass) -{ -/* PCRE_UTF16 has the same value as PCRE_UTF8. */ -BOOL utf = (options & PCRE_UTF8) != 0; -const pcre_uchar *ptr = *ptrptr + 1; -pcre_uint32 c; -int escape = 0; -int i; - -GETCHARINCTEST(c, ptr); /* Get character value, increment pointer */ -ptr--; /* Set pointer back to the last byte */ - -/* If backslash is at the end of the pattern, it's an error. */ - -if (c == CHAR_NULL) *errorcodeptr = ERR1; - -/* Non-alphanumerics are literals. For digits or letters, do an initial lookup -in a table. A non-zero result is something that can be returned immediately. -Otherwise further processing may be required. 
*/ - -#ifndef EBCDIC /* ASCII/UTF-8 coding */ -/* Not alphanumeric */ -else if (c < CHAR_0 || c > CHAR_z) {} -else if ((i = escapes[c - CHAR_0]) != 0) - { if (i > 0) c = (pcre_uint32)i; else escape = -i; } - -#else /* EBCDIC coding */ -/* Not alphanumeric */ -else if (c < CHAR_a || (!MAX_255(c) || (ebcdic_chartab[c] & 0x0E) == 0)) {} -else if ((i = escapes[c - 0x48]) != 0) { if (i > 0) c = (pcre_uint32)i; else escape = -i; } -#endif - -/* Escapes that need further processing, or are illegal. */ - -else - { - const pcre_uchar *oldptr; - BOOL braced, negated, overflow; - int s; - - switch (c) - { - /* A number of Perl escapes are not handled by PCRE. We give an explicit - error. */ - - case CHAR_l: - case CHAR_L: - *errorcodeptr = ERR37; - break; - - case CHAR_u: - if ((options & PCRE_JAVASCRIPT_COMPAT) != 0) - { - /* In JavaScript, \u must be followed by four hexadecimal numbers. - Otherwise it is a lowercase u letter. */ - if (MAX_255(ptr[1]) && (digitab[ptr[1]] & ctype_xdigit) != 0 - && MAX_255(ptr[2]) && (digitab[ptr[2]] & ctype_xdigit) != 0 - && MAX_255(ptr[3]) && (digitab[ptr[3]] & ctype_xdigit) != 0 - && MAX_255(ptr[4]) && (digitab[ptr[4]] & ctype_xdigit) != 0) - { - c = 0; - for (i = 0; i < 4; ++i) - { - register pcre_uint32 cc = *(++ptr); -#ifndef EBCDIC /* ASCII/UTF-8 coding */ - if (cc >= CHAR_a) cc -= 32; /* Convert to upper case */ - c = (c << 4) + cc - ((cc < CHAR_A)? CHAR_0 : (CHAR_A - 10)); -#else /* EBCDIC coding */ - if (cc >= CHAR_a && cc <= CHAR_z) cc += 64; /* Convert to upper case */ - c = (c << 4) + cc - ((cc >= CHAR_0)? CHAR_0 : (CHAR_A - 10)); -#endif - } - -#if defined COMPILE_PCRE8 - if (c > (utf ? 0x10ffffU : 0xffU)) -#elif defined COMPILE_PCRE16 - if (c > (utf ? 
0x10ffffU : 0xffffU)) -#elif defined COMPILE_PCRE32 - if (utf && c > 0x10ffffU) -#endif - { - *errorcodeptr = ERR76; - } - else if (utf && c >= 0xd800 && c <= 0xdfff) *errorcodeptr = ERR73; - } - } - else - *errorcodeptr = ERR37; - break; - - case CHAR_U: - /* In JavaScript, \U is an uppercase U letter. */ - if ((options & PCRE_JAVASCRIPT_COMPAT) == 0) *errorcodeptr = ERR37; - break; - - /* In a character class, \g is just a literal "g". Outside a character - class, \g must be followed by one of a number of specific things: - - (1) A number, either plain or braced. If positive, it is an absolute - backreference. If negative, it is a relative backreference. This is a Perl - 5.10 feature. - - (2) Perl 5.10 also supports \g{name} as a reference to a named group. This - is part of Perl's movement towards a unified syntax for back references. As - this is synonymous with \k{name}, we fudge it up by pretending it really - was \k. - - (3) For Oniguruma compatibility we also support \g followed by a name or a - number either in angle brackets or in single quotes. However, these are - (possibly recursive) subroutine calls, _not_ backreferences. Just return - the ESC_g code (cf \k). */ - - case CHAR_g: - if (isclass) break; - if (ptr[1] == CHAR_LESS_THAN_SIGN || ptr[1] == CHAR_APOSTROPHE) - { - escape = ESC_g; - break; - } - - /* Handle the Perl-compatible cases */ - - if (ptr[1] == CHAR_LEFT_CURLY_BRACKET) - { - const pcre_uchar *p; - for (p = ptr+2; *p != CHAR_NULL && *p != CHAR_RIGHT_CURLY_BRACKET; p++) - if (*p != CHAR_MINUS && !IS_DIGIT(*p)) break; - if (*p != CHAR_NULL && *p != CHAR_RIGHT_CURLY_BRACKET) - { - escape = ESC_k; - break; - } - braced = TRUE; - ptr++; - } - else braced = FALSE; - - if (ptr[1] == CHAR_MINUS) - { - negated = TRUE; - ptr++; - } - else negated = FALSE; - - /* The integer range is limited by the machine's int representation. 
*/ - s = 0; - overflow = FALSE; - while (IS_DIGIT(ptr[1])) - { - if (s > INT_MAX / 10 - 1) /* Integer overflow */ - { - overflow = TRUE; - break; - } - s = s * 10 + (int)(*(++ptr) - CHAR_0); - } - if (overflow) /* Integer overflow */ - { - while (IS_DIGIT(ptr[1])) - ptr++; - *errorcodeptr = ERR61; - break; - } - - if (braced && *(++ptr) != CHAR_RIGHT_CURLY_BRACKET) - { - *errorcodeptr = ERR57; - break; - } - - if (s == 0) - { - *errorcodeptr = ERR58; - break; - } - - if (negated) - { - if (s > bracount) - { - *errorcodeptr = ERR15; - break; - } - s = bracount - (s - 1); - } - - escape = -s; - break; - - /* The handling of escape sequences consisting of a string of digits - starting with one that is not zero is not straightforward. Perl has changed - over the years. Nowadays \g{} for backreferences and \o{} for octal are - recommended to avoid the ambiguities in the old syntax. - - Outside a character class, the digits are read as a decimal number. If the - number is less than 8 (used to be 10), or if there are that many previous - extracting left brackets, then it is a back reference. Otherwise, up to - three octal digits are read to form an escaped byte. Thus \123 is likely to - be octal 123 (cf \0123, which is octal 012 followed by the literal 3). If - the octal value is greater than 377, the least significant 8 bits are - taken. \8 and \9 are treated as the literal characters 8 and 9. - - Inside a character class, \ followed by a digit is always either a literal - 8 or 9 or an octal number. */ - - case CHAR_1: case CHAR_2: case CHAR_3: case CHAR_4: case CHAR_5: - case CHAR_6: case CHAR_7: case CHAR_8: case CHAR_9: - - if (!isclass) - { - oldptr = ptr; - /* The integer range is limited by the machine's int representation. 
*/ - s = (int)(c -CHAR_0); - overflow = FALSE; - while (IS_DIGIT(ptr[1])) - { - if (s > INT_MAX / 10 - 1) /* Integer overflow */ - { - overflow = TRUE; - break; - } - s = s * 10 + (int)(*(++ptr) - CHAR_0); - } - if (overflow) /* Integer overflow */ - { - while (IS_DIGIT(ptr[1])) - ptr++; - *errorcodeptr = ERR61; - break; - } - if (s < 8 || s <= bracount) /* Check for back reference */ - { - escape = -s; - break; - } - ptr = oldptr; /* Put the pointer back and fall through */ - } - - /* Handle a digit following \ when the number is not a back reference. If - the first digit is 8 or 9, Perl used to generate a binary zero byte and - then treat the digit as a following literal. At least by Perl 5.18 this - changed so as not to insert the binary zero. */ - - if ((c = *ptr) >= CHAR_8) break; - - /* Fall through with a digit less than 8 */ - - /* \0 always starts an octal number, but we may drop through to here with a - larger first octal digit. The original code used just to take the least - significant 8 bits of octal numbers (I think this is what early Perls used - to do). Nowadays we allow for larger numbers in UTF-8 mode and 16-bit mode, - but no more than 3 octal digits. */ - - case CHAR_0: - c -= CHAR_0; - while(i++ < 2 && ptr[1] >= CHAR_0 && ptr[1] <= CHAR_7) - c = c * 8 + *(++ptr) - CHAR_0; -#ifdef COMPILE_PCRE8 - if (!utf && c > 0xff) *errorcodeptr = ERR51; -#endif - break; - - /* \o is a relatively new Perl feature, supporting a more general way of - specifying character codes in octal. The only supported form is \o{ddd}. 
*/ - - case CHAR_o: - if (ptr[1] != CHAR_LEFT_CURLY_BRACKET) *errorcodeptr = ERR81; else - if (ptr[2] == CHAR_RIGHT_CURLY_BRACKET) *errorcodeptr = ERR86; else - { - ptr += 2; - c = 0; - overflow = FALSE; - while (*ptr >= CHAR_0 && *ptr <= CHAR_7) - { - register pcre_uint32 cc = *ptr++; - if (c == 0 && cc == CHAR_0) continue; /* Leading zeroes */ -#ifdef COMPILE_PCRE32 - if (c >= 0x20000000l) { overflow = TRUE; break; } -#endif - c = (c << 3) + cc - CHAR_0 ; -#if defined COMPILE_PCRE8 - if (c > (utf ? 0x10ffffU : 0xffU)) { overflow = TRUE; break; } -#elif defined COMPILE_PCRE16 - if (c > (utf ? 0x10ffffU : 0xffffU)) { overflow = TRUE; break; } -#elif defined COMPILE_PCRE32 - if (utf && c > 0x10ffffU) { overflow = TRUE; break; } -#endif - } - if (overflow) - { - while (*ptr >= CHAR_0 && *ptr <= CHAR_7) ptr++; - *errorcodeptr = ERR34; - } - else if (*ptr == CHAR_RIGHT_CURLY_BRACKET) - { - if (utf && c >= 0xd800 && c <= 0xdfff) *errorcodeptr = ERR73; - } - else *errorcodeptr = ERR80; - } - break; - - /* \x is complicated. In JavaScript, \x must be followed by two hexadecimal - numbers. Otherwise it is a lowercase x letter. */ - - case CHAR_x: - if ((options & PCRE_JAVASCRIPT_COMPAT) != 0) - { - if (MAX_255(ptr[1]) && (digitab[ptr[1]] & ctype_xdigit) != 0 - && MAX_255(ptr[2]) && (digitab[ptr[2]] & ctype_xdigit) != 0) - { - c = 0; - for (i = 0; i < 2; ++i) - { - register pcre_uint32 cc = *(++ptr); -#ifndef EBCDIC /* ASCII/UTF-8 coding */ - if (cc >= CHAR_a) cc -= 32; /* Convert to upper case */ - c = (c << 4) + cc - ((cc < CHAR_A)? CHAR_0 : (CHAR_A - 10)); -#else /* EBCDIC coding */ - if (cc >= CHAR_a && cc <= CHAR_z) cc += 64; /* Convert to upper case */ - c = (c << 4) + cc - ((cc >= CHAR_0)? CHAR_0 : (CHAR_A - 10)); -#endif - } - } - } /* End JavaScript handling */ - - /* Handle \x in Perl's style. \x{ddd} is a character number which can be - greater than 0xff in utf or non-8bit mode, but only if the ddd are hex - digits. 
If not, { used to be treated as a data character. However, Perl - seems to read hex digits up to the first non-such, and ignore the rest, so - that, for example \x{zz} matches a binary zero. This seems crazy, so PCRE - now gives an error. */ - - else - { - if (ptr[1] == CHAR_LEFT_CURLY_BRACKET) - { - ptr += 2; - if (*ptr == CHAR_RIGHT_CURLY_BRACKET) - { - *errorcodeptr = ERR86; - break; - } - c = 0; - overflow = FALSE; - while (MAX_255(*ptr) && (digitab[*ptr] & ctype_xdigit) != 0) - { - register pcre_uint32 cc = *ptr++; - if (c == 0 && cc == CHAR_0) continue; /* Leading zeroes */ - -#ifdef COMPILE_PCRE32 - if (c >= 0x10000000l) { overflow = TRUE; break; } -#endif - -#ifndef EBCDIC /* ASCII/UTF-8 coding */ - if (cc >= CHAR_a) cc -= 32; /* Convert to upper case */ - c = (c << 4) + cc - ((cc < CHAR_A)? CHAR_0 : (CHAR_A - 10)); -#else /* EBCDIC coding */ - if (cc >= CHAR_a && cc <= CHAR_z) cc += 64; /* Convert to upper case */ - c = (c << 4) + cc - ((cc >= CHAR_0)? CHAR_0 : (CHAR_A - 10)); -#endif - -#if defined COMPILE_PCRE8 - if (c > (utf ? 0x10ffffU : 0xffU)) { overflow = TRUE; break; } -#elif defined COMPILE_PCRE16 - if (c > (utf ? 0x10ffffU : 0xffffU)) { overflow = TRUE; break; } -#elif defined COMPILE_PCRE32 - if (utf && c > 0x10ffffU) { overflow = TRUE; break; } -#endif - } - - if (overflow) - { - while (MAX_255(*ptr) && (digitab[*ptr] & ctype_xdigit) != 0) ptr++; - *errorcodeptr = ERR34; - } - - else if (*ptr == CHAR_RIGHT_CURLY_BRACKET) - { - if (utf && c >= 0xd800 && c <= 0xdfff) *errorcodeptr = ERR73; - } - - /* If the sequence of hex digits does not end with '}', give an error. - We used just to recognize this construct and fall through to the normal - \x handling, but nowadays Perl gives an error, which seems much more - sensible, so we do too. 
*/ - - else *errorcodeptr = ERR79; - } /* End of \x{} processing */ - - /* Read a single-byte hex-defined char (up to two hex digits after \x) */ - - else - { - c = 0; - while (i++ < 2 && MAX_255(ptr[1]) && (digitab[ptr[1]] & ctype_xdigit) != 0) - { - pcre_uint32 cc; /* Some compilers don't like */ - cc = *(++ptr); /* ++ in initializers */ -#ifndef EBCDIC /* ASCII/UTF-8 coding */ - if (cc >= CHAR_a) cc -= 32; /* Convert to upper case */ - c = c * 16 + cc - ((cc < CHAR_A)? CHAR_0 : (CHAR_A - 10)); -#else /* EBCDIC coding */ - if (cc <= CHAR_z) cc += 64; /* Convert to upper case */ - c = c * 16 + cc - ((cc >= CHAR_0)? CHAR_0 : (CHAR_A - 10)); -#endif - } - } /* End of \xdd handling */ - } /* End of Perl-style \x handling */ - break; - - /* For \c, a following letter is upper-cased; then the 0x40 bit is flipped. - An error is given if the byte following \c is not an ASCII character. This - coding is ASCII-specific, but then the whole concept of \cx is - ASCII-specific. (However, an EBCDIC equivalent has now been added.) */ - - case CHAR_c: - c = *(++ptr); - if (c == CHAR_NULL) - { - *errorcodeptr = ERR2; - break; - } -#ifndef EBCDIC /* ASCII/UTF-8 coding */ - if (c > 127) /* Excludes all non-ASCII in either mode */ - { - *errorcodeptr = ERR68; - break; - } - if (c >= CHAR_a && c <= CHAR_z) c -= 32; - c ^= 0x40; -#else /* EBCDIC coding */ - if (c >= CHAR_a && c <= CHAR_z) c += 64; - if (c == CHAR_QUESTION_MARK) - c = ('\\' == 188 && '`' == 74)? 0x5f : 0xff; - else - { - for (i = 0; i < 32; i++) - { - if (c == ebcdic_escape_c[i]) break; - } - if (i < 32) c = i; else *errorcodeptr = ERR68; - } -#endif - break; - - /* PCRE_EXTRA enables extensions to Perl in the matter of escapes. Any - other alphanumeric following \ is an error if PCRE_EXTRA was set; - otherwise, for Perl compatibility, it is a literal. This code looks a bit - odd, but there used to be some cases other than the default, and there may - be again in future, so I haven't "optimized" it. 
*/ - - default: - if ((options & PCRE_EXTRA) != 0) switch(c) - { - default: - *errorcodeptr = ERR3; - break; - } - break; - } - } - -/* Perl supports \N{name} for character names, as well as plain \N for "not -newline". PCRE does not support \N{name}. However, it does support -quantification such as \N{2,3}. */ - -if (escape == ESC_N && ptr[1] == CHAR_LEFT_CURLY_BRACKET && - !is_counted_repeat(ptr+2)) - *errorcodeptr = ERR37; - -/* If PCRE_UCP is set, we change the values for \d etc. */ - -if ((options & PCRE_UCP) != 0 && escape >= ESC_D && escape <= ESC_w) - escape += (ESC_DU - ESC_D); - -/* Set the pointer to the final character before returning. */ - -*ptrptr = ptr; -*chptr = c; -return escape; -} - - - -#ifdef SUPPORT_UCP -/************************************************* -* Handle \P and \p * -*************************************************/ - -/* This function is called after \P or \p has been encountered, provided that -PCRE is compiled with support for Unicode properties. On entry, ptrptr is -pointing at the P or p. On exit, it is pointing at the final character of the -escape sequence. - -Argument: - ptrptr points to the pattern position pointer - negptr points to a boolean that is set TRUE for negation else FALSE - ptypeptr points to an unsigned int that is set to the type value - pdataptr points to an unsigned int that is set to the detailed property value - errorcodeptr points to the error code variable - -Returns: TRUE if the type value was found, or FALSE for an invalid type -*/ - -static BOOL -get_ucp(const pcre_uchar **ptrptr, BOOL *negptr, unsigned int *ptypeptr, - unsigned int *pdataptr, int *errorcodeptr) -{ -pcre_uchar c; -int i, bot, top; -const pcre_uchar *ptr = *ptrptr; -pcre_uchar name[32]; - -c = *(++ptr); -if (c == CHAR_NULL) goto ERROR_RETURN; - -*negptr = FALSE; - -/* \P or \p can be followed by a name in {}, optionally preceded by ^ for -negation. 
*/ - -if (c == CHAR_LEFT_CURLY_BRACKET) - { - if (ptr[1] == CHAR_CIRCUMFLEX_ACCENT) - { - *negptr = TRUE; - ptr++; - } - for (i = 0; i < (int)(sizeof(name) / sizeof(pcre_uchar)) - 1; i++) - { - c = *(++ptr); - if (c == CHAR_NULL) goto ERROR_RETURN; - if (c == CHAR_RIGHT_CURLY_BRACKET) break; - name[i] = c; - } - if (c != CHAR_RIGHT_CURLY_BRACKET) goto ERROR_RETURN; - name[i] = 0; - } - -/* Otherwise there is just one following character */ - -else - { - name[0] = c; - name[1] = 0; - } - -*ptrptr = ptr; - -/* Search for a recognized property name using binary chop */ - -bot = 0; -top = PRIV(utt_size); - -while (bot < top) - { - int r; - i = (bot + top) >> 1; - r = STRCMP_UC_C8(name, PRIV(utt_names) + PRIV(utt)[i].name_offset); - if (r == 0) - { - *ptypeptr = PRIV(utt)[i].type; - *pdataptr = PRIV(utt)[i].value; - return TRUE; - } - if (r > 0) bot = i + 1; else top = i; - } - -*errorcodeptr = ERR47; -*ptrptr = ptr; -return FALSE; - -ERROR_RETURN: -*errorcodeptr = ERR46; -*ptrptr = ptr; -return FALSE; -} -#endif - - - -/************************************************* -* Read repeat counts * -*************************************************/ - -/* Read an item of the form {n,m} and return the values. This is called only -after is_counted_repeat() has confirmed that a repeat-count quantifier exists, -so the syntax is guaranteed to be correct, but we need to check the values. 
- -Arguments: - p pointer to first char after '{' - minp pointer to int for min - maxp pointer to int for max - returned as -1 if no max - errorcodeptr points to error code variable - -Returns: pointer to '}' on success; - current ptr on error, with errorcodeptr set non-zero -*/ - -static const pcre_uchar * -read_repeat_counts(const pcre_uchar *p, int *minp, int *maxp, int *errorcodeptr) -{ -int min = 0; -int max = -1; - -while (IS_DIGIT(*p)) - { - min = min * 10 + (int)(*p++ - CHAR_0); - if (min > 65535) - { - *errorcodeptr = ERR5; - return p; - } - } - -if (*p == CHAR_RIGHT_CURLY_BRACKET) max = min; else - { - if (*(++p) != CHAR_RIGHT_CURLY_BRACKET) - { - max = 0; - while(IS_DIGIT(*p)) - { - max = max * 10 + (int)(*p++ - CHAR_0); - if (max > 65535) - { - *errorcodeptr = ERR5; - return p; - } - } - if (max < min) - { - *errorcodeptr = ERR4; - return p; - } - } - } - -*minp = min; -*maxp = max; -return p; -} - - - -/************************************************* -* Find first significant op code * -*************************************************/ - -/* This is called by several functions that scan a compiled expression looking -for a fixed first character, or an anchoring op code etc. It skips over things -that do not influence this. For some calls, it makes sense to skip negative -forward and all backward assertions, and also the \b assertion; for others it -does not. 
- -Arguments: - code pointer to the start of the group - skipassert TRUE if certain assertions are to be skipped - -Returns: pointer to the first significant opcode -*/ - -static const pcre_uchar* -first_significant_code(const pcre_uchar *code, BOOL skipassert) -{ -for (;;) - { - switch ((int)*code) - { - case OP_ASSERT_NOT: - case OP_ASSERTBACK: - case OP_ASSERTBACK_NOT: - if (!skipassert) return code; - do code += GET(code, 1); while (*code == OP_ALT); - code += PRIV(OP_lengths)[*code]; - break; - - case OP_WORD_BOUNDARY: - case OP_NOT_WORD_BOUNDARY: - if (!skipassert) return code; - /* Fall through */ - - case OP_CALLOUT: - case OP_CREF: - case OP_DNCREF: - case OP_RREF: - case OP_DNRREF: - case OP_DEF: - code += PRIV(OP_lengths)[*code]; - break; - - default: - return code; - } - } -/* Control never reaches here */ -} - - - -/************************************************* -* Find the fixed length of a branch * -*************************************************/ - -/* Scan a branch and compute the fixed length of subject that will match it, -if the length is fixed. This is needed for dealing with backward assertions. -In UTF8 mode, the result is in characters rather than bytes. The branch is -temporarily terminated with OP_END when this function is called. - -This function is called when a backward assertion is encountered, so that if it -fails, the error message can point to the correct place in the pattern. -However, we cannot do this when the assertion contains subroutine calls, -because they can be forward references. We solve this by remembering this case -and doing the check at the end; a flag specifies which mode we are running in. 
- -Arguments: - code points to the start of the pattern (the bracket) - utf TRUE in UTF-8 / UTF-16 / UTF-32 mode - atend TRUE if called when the pattern is complete - cd the "compile data" structure - recurses chain of recurse_check to catch mutual recursion - -Returns: the fixed length, - or -1 if there is no fixed length, - or -2 if \C was encountered (in UTF-8 mode only) - or -3 if an OP_RECURSE item was encountered and atend is FALSE - or -4 if an unknown opcode was encountered (internal error) -*/ - -static int -find_fixedlength(pcre_uchar *code, BOOL utf, BOOL atend, compile_data *cd, - recurse_check *recurses) -{ -int length = -1; -recurse_check this_recurse; -register int branchlength = 0; -register pcre_uchar *cc = code + 1 + LINK_SIZE; - -/* Scan along the opcodes for this branch. If we get to the end of the -branch, check the length against that of the other branches. */ - -for (;;) - { - int d; - pcre_uchar *ce, *cs; - register pcre_uchar op = *cc; - - switch (op) - { - /* We only need to continue for OP_CBRA (normal capturing bracket) and - OP_BRA (normal non-capturing bracket) because the other variants of these - opcodes are all concerned with unlimited repeated groups, which of course - are not of fixed length. */ - - case OP_CBRA: - case OP_BRA: - case OP_ONCE: - case OP_ONCE_NC: - case OP_COND: - d = find_fixedlength(cc + ((op == OP_CBRA)? IMM2_SIZE : 0), utf, atend, cd, - recurses); - if (d < 0) return d; - branchlength += d; - do cc += GET(cc, 1); while (*cc == OP_ALT); - cc += 1 + LINK_SIZE; - break; - - /* Reached end of a branch; if it's a ket it is the end of a nested call. - If it's ALT it is an alternation in a nested call. An ACCEPT is effectively - an ALT. If it is END it's the end of the outer call. All can be handled by - the same code. Note that we must not include the OP_KETRxxx opcodes here, - because they all imply an unlimited repeat. 
*/ - - case OP_ALT: - case OP_KET: - case OP_END: - case OP_ACCEPT: - case OP_ASSERT_ACCEPT: - if (length < 0) length = branchlength; - else if (length != branchlength) return -1; - if (*cc != OP_ALT) return length; - cc += 1 + LINK_SIZE; - branchlength = 0; - break; - - /* A true recursion implies not fixed length, but a subroutine call may - be OK. If the subroutine is a forward reference, we can't deal with - it until the end of the pattern, so return -3. */ - - case OP_RECURSE: - if (!atend) return -3; - cs = ce = (pcre_uchar *)cd->start_code + GET(cc, 1); /* Start subpattern */ - do ce += GET(ce, 1); while (*ce == OP_ALT); /* End subpattern */ - if (cc > cs && cc < ce) return -1; /* Recursion */ - else /* Check for mutual recursion */ - { - recurse_check *r = recurses; - for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break; - if (r != NULL) return -1; /* Mutual recursion */ - } - this_recurse.prev = recurses; - this_recurse.group = cs; - d = find_fixedlength(cs + IMM2_SIZE, utf, atend, cd, &this_recurse); - if (d < 0) return d; - branchlength += d; - cc += 1 + LINK_SIZE; - break; - - /* Skip over assertive subpatterns */ - - case OP_ASSERT: - case OP_ASSERT_NOT: - case OP_ASSERTBACK: - case OP_ASSERTBACK_NOT: - do cc += GET(cc, 1); while (*cc == OP_ALT); - cc += 1 + LINK_SIZE; - break; - - /* Skip over things that don't match chars */ - - case OP_MARK: - case OP_PRUNE_ARG: - case OP_SKIP_ARG: - case OP_THEN_ARG: - cc += cc[1] + PRIV(OP_lengths)[*cc]; - break; - - case OP_CALLOUT: - case OP_CIRC: - case OP_CIRCM: - case OP_CLOSE: - case OP_COMMIT: - case OP_CREF: - case OP_DEF: - case OP_DNCREF: - case OP_DNRREF: - case OP_DOLL: - case OP_DOLLM: - case OP_EOD: - case OP_EODN: - case OP_FAIL: - case OP_NOT_WORD_BOUNDARY: - case OP_PRUNE: - case OP_REVERSE: - case OP_RREF: - case OP_SET_SOM: - case OP_SKIP: - case OP_SOD: - case OP_SOM: - case OP_THEN: - case OP_WORD_BOUNDARY: - cc += PRIV(OP_lengths)[*cc]; - break; - - /* Handle literal 
characters */ - - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - branchlength++; - cc += 2; -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); -#endif - break; - - /* Handle exact repetitions. The count is already in characters, but we - need to skip over a multibyte character in UTF8 mode. */ - - case OP_EXACT: - case OP_EXACTI: - case OP_NOTEXACT: - case OP_NOTEXACTI: - branchlength += (int)GET2(cc,1); - cc += 2 + IMM2_SIZE; -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); -#endif - break; - - case OP_TYPEEXACT: - branchlength += GET2(cc,1); - if (cc[1 + IMM2_SIZE] == OP_PROP || cc[1 + IMM2_SIZE] == OP_NOTPROP) - cc += 2; - cc += 1 + IMM2_SIZE + 1; - break; - - /* Handle single-char matchers */ - - case OP_PROP: - case OP_NOTPROP: - cc += 2; - /* Fall through */ - - case OP_HSPACE: - case OP_VSPACE: - case OP_NOT_HSPACE: - case OP_NOT_VSPACE: - case OP_NOT_DIGIT: - case OP_DIGIT: - case OP_NOT_WHITESPACE: - case OP_WHITESPACE: - case OP_NOT_WORDCHAR: - case OP_WORDCHAR: - case OP_ANY: - case OP_ALLANY: - branchlength++; - cc++; - break; - - /* The single-byte matcher isn't allowed. This only happens in UTF-8 mode; - otherwise \C is coded as OP_ALLANY. */ - - case OP_ANYBYTE: - return -2; - - /* Check a class for variable quantification */ - - case OP_CLASS: - case OP_NCLASS: -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - case OP_XCLASS: - /* The original code caused an unsigned overflow in 64 bit systems, - so now we use a conditional statement. 
*/ - if (op == OP_XCLASS) - cc += GET(cc, 1); - else - cc += PRIV(OP_lengths)[OP_CLASS]; -#else - cc += PRIV(OP_lengths)[OP_CLASS]; -#endif - - switch (*cc) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSPLUS: - case OP_CRPOSQUERY: - return -1; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - if (GET2(cc,1) != GET2(cc,1+IMM2_SIZE)) return -1; - branchlength += (int)GET2(cc,1); - cc += 1 + 2 * IMM2_SIZE; - break; - - default: - branchlength++; - } - break; - - /* Anything else is variable length */ - - case OP_ANYNL: - case OP_BRAMINZERO: - case OP_BRAPOS: - case OP_BRAPOSZERO: - case OP_BRAZERO: - case OP_CBRAPOS: - case OP_EXTUNI: - case OP_KETRMAX: - case OP_KETRMIN: - case OP_KETRPOS: - case OP_MINPLUS: - case OP_MINPLUSI: - case OP_MINQUERY: - case OP_MINQUERYI: - case OP_MINSTAR: - case OP_MINSTARI: - case OP_MINUPTO: - case OP_MINUPTOI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - case OP_NOTQUERY: - case OP_NOTQUERYI: - case OP_NOTSTAR: - case OP_NOTSTARI: - case OP_NOTUPTO: - case OP_NOTUPTOI: - case OP_PLUS: - case OP_PLUSI: - case OP_POSPLUS: - case OP_POSPLUSI: - case OP_POSQUERY: - case OP_POSQUERYI: - case OP_POSSTAR: - case OP_POSSTARI: - case OP_POSUPTO: - case OP_POSUPTOI: - case OP_QUERY: - case OP_QUERYI: - case OP_REF: - case OP_REFI: - case OP_DNREF: - case OP_DNREFI: - case OP_SBRA: - case OP_SBRAPOS: - case OP_SCBRA: - case OP_SCBRAPOS: - case OP_SCOND: - case OP_SKIPZERO: - case OP_STAR: - case OP_STARI: - case OP_TYPEMINPLUS: - case OP_TYPEMINQUERY: - case 
OP_TYPEMINSTAR: - case OP_TYPEMINUPTO: - case OP_TYPEPLUS: - case OP_TYPEPOSPLUS: - case OP_TYPEPOSQUERY: - case OP_TYPEPOSSTAR: - case OP_TYPEPOSUPTO: - case OP_TYPEQUERY: - case OP_TYPESTAR: - case OP_TYPEUPTO: - case OP_UPTO: - case OP_UPTOI: - return -1; - - /* Catch unrecognized opcodes so that when new ones are added they - are not forgotten, as has happened in the past. */ - - default: - return -4; - } - } -/* Control never gets here */ -} - - - -/************************************************* -* Scan compiled regex for specific bracket * -*************************************************/ - -/* This little function scans through a compiled pattern until it finds a -capturing bracket with the given number, or, if the number is negative, an -instance of OP_REVERSE for a lookbehind. The function is global in the C sense -so that it can be called from pcre_study() when finding the minimum matching -length. - -Arguments: - code points to start of expression - utf TRUE in UTF-8 / UTF-16 / UTF-32 mode - number the required bracket number or negative to find a lookbehind - -Returns: pointer to the opcode for the bracket, or NULL if not found -*/ - -const pcre_uchar * -PRIV(find_bracket)(const pcre_uchar *code, BOOL utf, int number) -{ -for (;;) - { - register pcre_uchar c = *code; - - if (c == OP_END) return NULL; - - /* XCLASS is used for classes that cannot be represented just by a bit - map. This includes negated single high-valued characters. The length in - the table is zero; the actual length is stored in the compiled code. 
*/ - - if (c == OP_XCLASS) code += GET(code, 1); - - /* Handle recursion */ - - else if (c == OP_REVERSE) - { - if (number < 0) return (pcre_uchar *)code; - code += PRIV(OP_lengths)[c]; - } - - /* Handle capturing bracket */ - - else if (c == OP_CBRA || c == OP_SCBRA || - c == OP_CBRAPOS || c == OP_SCBRAPOS) - { - int n = (int)GET2(code, 1+LINK_SIZE); - if (n == number) return (pcre_uchar *)code; - code += PRIV(OP_lengths)[c]; - } - - /* Otherwise, we can get the item's length from the table, except that for - repeated character types, we have to test for \p and \P, which have an extra - two bytes of parameters, and for MARK/PRUNE/SKIP/THEN with an argument, we - must add in its length. */ - - else - { - switch(c) - { - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - case OP_TYPEPOSSTAR: - case OP_TYPEPOSPLUS: - case OP_TYPEPOSQUERY: - if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; - break; - - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - case OP_TYPEEXACT: - case OP_TYPEPOSUPTO: - if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) - code += 2; - break; - - case OP_MARK: - case OP_PRUNE_ARG: - case OP_SKIP_ARG: - case OP_THEN_ARG: - code += code[1]; - break; - } - - /* Add in the fixed length from the table */ - - code += PRIV(OP_lengths)[c]; - - /* In UTF-8 mode, opcodes that are followed by a character may be followed by - a multi-byte character. The length in the table is a minimum, so we have to - arrange to skip the extra bytes. 
*/ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf) switch(c) - { - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_EXACT: - case OP_EXACTI: - case OP_NOTEXACT: - case OP_NOTEXACTI: - case OP_UPTO: - case OP_UPTOI: - case OP_NOTUPTO: - case OP_NOTUPTOI: - case OP_MINUPTO: - case OP_MINUPTOI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - case OP_POSUPTO: - case OP_POSUPTOI: - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - case OP_STAR: - case OP_STARI: - case OP_NOTSTAR: - case OP_NOTSTARI: - case OP_MINSTAR: - case OP_MINSTARI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - case OP_POSSTAR: - case OP_POSSTARI: - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - case OP_PLUS: - case OP_PLUSI: - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_MINPLUS: - case OP_MINPLUSI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - case OP_POSPLUS: - case OP_POSPLUSI: - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - case OP_QUERY: - case OP_QUERYI: - case OP_NOTQUERY: - case OP_NOTQUERYI: - case OP_MINQUERY: - case OP_MINQUERYI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - case OP_POSQUERY: - case OP_POSQUERYI: - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]); - break; - } -#else - (void)(utf); /* Keep compiler happy by referencing function argument */ -#endif - } - } -} - - - -/************************************************* -* Scan compiled regex for recursion reference * -*************************************************/ - -/* This little function scans through a compiled pattern until it finds an -instance of OP_RECURSE. 
- -Arguments: - code points to start of expression - utf TRUE in UTF-8 / UTF-16 / UTF-32 mode - -Returns: pointer to the opcode for OP_RECURSE, or NULL if not found -*/ - -static const pcre_uchar * -find_recurse(const pcre_uchar *code, BOOL utf) -{ -for (;;) - { - register pcre_uchar c = *code; - if (c == OP_END) return NULL; - if (c == OP_RECURSE) return code; - - /* XCLASS is used for classes that cannot be represented just by a bit - map. This includes negated single high-valued characters. The length in - the table is zero; the actual length is stored in the compiled code. */ - - if (c == OP_XCLASS) code += GET(code, 1); - - /* Otherwise, we can get the item's length from the table, except that for - repeated character types, we have to test for \p and \P, which have an extra - two bytes of parameters, and for MARK/PRUNE/SKIP/THEN with an argument, we - must add in its length. */ - - else - { - switch(c) - { - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - case OP_TYPEPOSSTAR: - case OP_TYPEPOSPLUS: - case OP_TYPEPOSQUERY: - if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; - break; - - case OP_TYPEPOSUPTO: - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - case OP_TYPEEXACT: - if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) - code += 2; - break; - - case OP_MARK: - case OP_PRUNE_ARG: - case OP_SKIP_ARG: - case OP_THEN_ARG: - code += code[1]; - break; - } - - /* Add in the fixed length from the table */ - - code += PRIV(OP_lengths)[c]; - - /* In UTF-8 mode, opcodes that are followed by a character may be followed - by a multi-byte character. The length in the table is a minimum, so we have - to arrange to skip the extra bytes. 
*/ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf) switch(c) - { - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_EXACT: - case OP_EXACTI: - case OP_NOTEXACT: - case OP_NOTEXACTI: - case OP_UPTO: - case OP_UPTOI: - case OP_NOTUPTO: - case OP_NOTUPTOI: - case OP_MINUPTO: - case OP_MINUPTOI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - case OP_POSUPTO: - case OP_POSUPTOI: - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - case OP_STAR: - case OP_STARI: - case OP_NOTSTAR: - case OP_NOTSTARI: - case OP_MINSTAR: - case OP_MINSTARI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - case OP_POSSTAR: - case OP_POSSTARI: - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - case OP_PLUS: - case OP_PLUSI: - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_MINPLUS: - case OP_MINPLUSI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - case OP_POSPLUS: - case OP_POSPLUSI: - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - case OP_QUERY: - case OP_QUERYI: - case OP_NOTQUERY: - case OP_NOTQUERYI: - case OP_MINQUERY: - case OP_MINQUERYI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - case OP_POSQUERY: - case OP_POSQUERYI: - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]); - break; - } -#else - (void)(utf); /* Keep compiler happy by referencing function argument */ -#endif - } - } -} - - - -/************************************************* -* Scan compiled branch for non-emptiness * -*************************************************/ - -/* This function scans through a branch of a compiled pattern to see whether it -can match the empty string or not. It is called from could_be_empty() -below and from compile_branch() when checking for an unlimited repeat of a -group that can match nothing. Note that first_significant_code() skips over -backward and negative forward assertions when its final argument is TRUE. 
If we -hit an unclosed bracket, we return "empty" - this means we've struck an inner -bracket whose current branch will already have been scanned. - -Arguments: - code points to start of search - endcode points to where to stop - utf TRUE if in UTF-8 / UTF-16 / UTF-32 mode - cd contains pointers to tables etc. - recurses chain of recurse_check to catch mutual recursion - -Returns: TRUE if what is matched could be empty -*/ - -static BOOL -could_be_empty_branch(const pcre_uchar *code, const pcre_uchar *endcode, - BOOL utf, compile_data *cd, recurse_check *recurses) -{ -register pcre_uchar c; -recurse_check this_recurse; - -for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE); - code < endcode; - code = first_significant_code(code + PRIV(OP_lengths)[c], TRUE)) - { - const pcre_uchar *ccode; - - c = *code; - - /* Skip over forward assertions; the other assertions are skipped by - first_significant_code() with a TRUE final argument. */ - - if (c == OP_ASSERT) - { - do code += GET(code, 1); while (*code == OP_ALT); - c = *code; - continue; - } - - /* For a recursion/subroutine call, if its end has been reached, which - implies a backward reference subroutine call, we can scan it. If it's a - forward reference subroutine call, we can't. To detect forward reference - we have to scan up the list that is kept in the workspace. This function is - called only when doing the real compile, not during the pre-compile that - measures the size of the compiled pattern. */ - - if (c == OP_RECURSE) - { - const pcre_uchar *scode = cd->start_code + GET(code, 1); - const pcre_uchar *endgroup = scode; - BOOL empty_branch; - - /* Test for forward reference or uncompleted reference. This is disabled - when called to scan a completed pattern by setting cd->start_workspace to - NULL. 
*/ - - if (cd->start_workspace != NULL) - { - const pcre_uchar *tcode; - for (tcode = cd->start_workspace; tcode < cd->hwm; tcode += LINK_SIZE) - if ((int)GET(tcode, 0) == (int)(code + 1 - cd->start_code)) return TRUE; - if (GET(scode, 1) == 0) return TRUE; /* Unclosed */ - } - - /* If the reference is to a completed group, we need to detect whether this - is a recursive call, as otherwise there will be an infinite loop. If it is - a recursion, just skip over it. Simple recursions are easily detected. For - mutual recursions we keep a chain on the stack. */ - - do endgroup += GET(endgroup, 1); while (*endgroup == OP_ALT); - if (code >= scode && code <= endgroup) continue; /* Simple recursion */ - else - { - recurse_check *r = recurses; - for (r = recurses; r != NULL; r = r->prev) - if (r->group == scode) break; - if (r != NULL) continue; /* Mutual recursion */ - } - - /* Completed reference; scan the referenced group, remembering it on the - stack chain to detect mutual recursions. */ - - empty_branch = FALSE; - this_recurse.prev = recurses; - this_recurse.group = scode; - - do - { - if (could_be_empty_branch(scode, endcode, utf, cd, &this_recurse)) - { - empty_branch = TRUE; - break; - } - scode += GET(scode, 1); - } - while (*scode == OP_ALT); - - if (!empty_branch) return FALSE; /* All branches are non-empty */ - continue; - } - - /* Groups with zero repeats can of course be empty; skip them. */ - - if (c == OP_BRAZERO || c == OP_BRAMINZERO || c == OP_SKIPZERO || - c == OP_BRAPOSZERO) - { - code += PRIV(OP_lengths)[c]; - do code += GET(code, 1); while (*code == OP_ALT); - c = *code; - continue; - } - - /* A nested group that is already marked as "could be empty" can just be - skipped. */ - - if (c == OP_SBRA || c == OP_SBRAPOS || - c == OP_SCBRA || c == OP_SCBRAPOS) - { - do code += GET(code, 1); while (*code == OP_ALT); - c = *code; - continue; - } - - /* For other groups, scan the branches. 
*/ - - if (c == OP_BRA || c == OP_BRAPOS || - c == OP_CBRA || c == OP_CBRAPOS || - c == OP_ONCE || c == OP_ONCE_NC || - c == OP_COND || c == OP_SCOND) - { - BOOL empty_branch; - if (GET(code, 1) == 0) return TRUE; /* Hit unclosed bracket */ - - /* If a conditional group has only one branch, there is a second, implied, - empty branch, so just skip over the conditional, because it could be empty. - Otherwise, scan the individual branches of the group. */ - - if (c == OP_COND && code[GET(code, 1)] != OP_ALT) - code += GET(code, 1); - else - { - empty_branch = FALSE; - do - { - if (!empty_branch && could_be_empty_branch(code, endcode, utf, cd, - recurses)) empty_branch = TRUE; - code += GET(code, 1); - } - while (*code == OP_ALT); - if (!empty_branch) return FALSE; /* All branches are non-empty */ - } - - c = *code; - continue; - } - - /* Handle the other opcodes */ - - switch (c) - { - /* Check for quantifiers after a class. XCLASS is used for classes that - cannot be represented just by a bit map. This includes negated single - high-valued characters. The length in PRIV(OP_lengths)[] is zero; the - actual length is stored in the compiled code, so we must update "code" - here. 
*/ - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - ccode = code += GET(code, 1); - goto CHECK_CLASS_REPEAT; -#endif - - case OP_CLASS: - case OP_NCLASS: - ccode = code + PRIV(OP_lengths)[OP_CLASS]; - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - CHECK_CLASS_REPEAT: -#endif - - switch (*ccode) - { - case OP_CRSTAR: /* These could be empty; continue */ - case OP_CRMINSTAR: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSQUERY: - break; - - default: /* Non-repeat => class must match */ - case OP_CRPLUS: /* These repeats aren't empty */ - case OP_CRMINPLUS: - case OP_CRPOSPLUS: - return FALSE; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - if (GET2(ccode, 1) > 0) return FALSE; /* Minimum > 0 */ - break; - } - break; - - /* Opcodes that must match a character */ - - case OP_ANY: - case OP_ALLANY: - case OP_ANYBYTE: - - case OP_PROP: - case OP_NOTPROP: - case OP_ANYNL: - - case OP_NOT_HSPACE: - case OP_HSPACE: - case OP_NOT_VSPACE: - case OP_VSPACE: - case OP_EXTUNI: - - case OP_NOT_DIGIT: - case OP_DIGIT: - case OP_NOT_WHITESPACE: - case OP_WHITESPACE: - case OP_NOT_WORDCHAR: - case OP_WORDCHAR: - - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - - case OP_PLUS: - case OP_PLUSI: - case OP_MINPLUS: - case OP_MINPLUSI: - - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - - case OP_POSPLUS: - case OP_POSPLUSI: - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - - case OP_EXACT: - case OP_EXACTI: - case OP_NOTEXACT: - case OP_NOTEXACTI: - - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEPOSPLUS: - case OP_TYPEEXACT: - - return FALSE; - - /* These are going to continue, as they may be empty, but we have to - fudge the length for the \p and \P cases. 
*/ - - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPOSSTAR: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - case OP_TYPEPOSQUERY: - if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; - break; - - /* Same for these */ - - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - case OP_TYPEPOSUPTO: - if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) - code += 2; - break; - - /* End of branch */ - - case OP_KET: - case OP_KETRMAX: - case OP_KETRMIN: - case OP_KETRPOS: - case OP_ALT: - return TRUE; - - /* In UTF-8 mode, STAR, MINSTAR, POSSTAR, QUERY, MINQUERY, POSQUERY, UPTO, - MINUPTO, and POSUPTO and their caseless and negative versions may be - followed by a multibyte character. */ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - case OP_STAR: - case OP_STARI: - case OP_NOTSTAR: - case OP_NOTSTARI: - - case OP_MINSTAR: - case OP_MINSTARI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - - case OP_POSSTAR: - case OP_POSSTARI: - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - - case OP_QUERY: - case OP_QUERYI: - case OP_NOTQUERY: - case OP_NOTQUERYI: - - case OP_MINQUERY: - case OP_MINQUERYI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - - case OP_POSQUERY: - case OP_POSQUERYI: - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - - if (utf && HAS_EXTRALEN(code[1])) code += GET_EXTRALEN(code[1]); - break; - - case OP_UPTO: - case OP_UPTOI: - case OP_NOTUPTO: - case OP_NOTUPTOI: - - case OP_MINUPTO: - case OP_MINUPTOI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - - case OP_POSUPTO: - case OP_POSUPTOI: - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - - if (utf && HAS_EXTRALEN(code[1 + IMM2_SIZE])) code += GET_EXTRALEN(code[1 + IMM2_SIZE]); - break; -#endif - - /* MARK, and PRUNE/SKIP/THEN with an argument must skip over the argument - string. */ - - case OP_MARK: - case OP_PRUNE_ARG: - case OP_SKIP_ARG: - case OP_THEN_ARG: - code += code[1]; - break; - - /* None of the remaining opcodes are required to match a character. 
*/ - - default: - break; - } - } - -return TRUE; -} - - - -/************************************************* -* Scan compiled regex for non-emptiness * -*************************************************/ - -/* This function is called to check for left recursive calls. We want to check -the current branch of the current pattern to see if it could match the empty -string. If it could, we must look outwards for branches at other levels, -stopping when we pass beyond the bracket which is the subject of the recursion. -This function is called only during the real compile, not during the -pre-compile. - -Arguments: - code points to start of the recursion - endcode points to where to stop (current RECURSE item) - bcptr points to the chain of current (unclosed) branch starts - utf TRUE if in UTF-8 / UTF-16 / UTF-32 mode - cd pointers to tables etc - -Returns: TRUE if what is matched could be empty -*/ - -static BOOL -could_be_empty(const pcre_uchar *code, const pcre_uchar *endcode, - branch_chain *bcptr, BOOL utf, compile_data *cd) -{ -while (bcptr != NULL && bcptr->current_branch >= code) - { - if (!could_be_empty_branch(bcptr->current_branch, endcode, utf, cd, NULL)) - return FALSE; - bcptr = bcptr->outer; - } -return TRUE; -} - - - -/************************************************* -* Base opcode of repeated opcodes * -*************************************************/ - -/* Returns the base opcode for repeated single character type opcodes. If the -opcode is not a repeated character type, it returns with the original value. - -Arguments: c opcode -Returns: base opcode for the type -*/ - -static pcre_uchar -get_repeat_base(pcre_uchar c) -{ -return (c > OP_TYPEPOSUPTO)? c : - (c >= OP_TYPESTAR)? OP_TYPESTAR : - (c >= OP_NOTSTARI)? OP_NOTSTARI : - (c >= OP_NOTSTAR)? OP_NOTSTAR : - (c >= OP_STARI)? 
OP_STARI : - OP_STAR; -} - - - -#ifdef SUPPORT_UCP -/************************************************* -* Check a character and a property * -*************************************************/ - -/* This function is called by check_auto_possessive() when a property item -is adjacent to a fixed character. - -Arguments: - c the character - ptype the property type - pdata the data for the type - negated TRUE if it's a negated property (\P or \p{^) - -Returns: TRUE if auto-possessifying is OK -*/ - -static BOOL -check_char_prop(pcre_uint32 c, unsigned int ptype, unsigned int pdata, - BOOL negated) -{ -const pcre_uint32 *p; -const ucd_record *prop = GET_UCD(c); - -switch(ptype) - { - case PT_LAMP: - return (prop->chartype == ucp_Lu || - prop->chartype == ucp_Ll || - prop->chartype == ucp_Lt) == negated; - - case PT_GC: - return (pdata == PRIV(ucp_gentype)[prop->chartype]) == negated; - - case PT_PC: - return (pdata == prop->chartype) == negated; - - case PT_SC: - return (pdata == prop->script) == negated; - - /* These are specials */ - - case PT_ALNUM: - return (PRIV(ucp_gentype)[prop->chartype] == ucp_L || - PRIV(ucp_gentype)[prop->chartype] == ucp_N) == negated; - - /* Perl space used to exclude VT, but from Perl 5.18 it is included, which - means that Perl space and POSIX space are now identical. PCRE was changed - at release 8.34. 
*/ - - case PT_SPACE: /* Perl space */ - case PT_PXSPACE: /* POSIX space */ - switch(c) - { - HSPACE_CASES: - VSPACE_CASES: - return negated; - - default: - return (PRIV(ucp_gentype)[prop->chartype] == ucp_Z) == negated; - } - break; /* Control never reaches here */ - - case PT_WORD: - return (PRIV(ucp_gentype)[prop->chartype] == ucp_L || - PRIV(ucp_gentype)[prop->chartype] == ucp_N || - c == CHAR_UNDERSCORE) == negated; - - case PT_CLIST: - p = PRIV(ucd_caseless_sets) + prop->caseset; - for (;;) - { - if (c < *p) return !negated; - if (c == *p++) return negated; - } - break; /* Control never reaches here */ - } - -return FALSE; -} -#endif /* SUPPORT_UCP */ - - - -/************************************************* -* Fill the character property list * -*************************************************/ - -/* Checks whether the code points to an opcode that can take part in auto- -possessification, and if so, fills a list with its properties. - -Arguments: - code points to start of expression - utf TRUE if in UTF-8 / UTF-16 / UTF-32 mode - fcc points to case-flipping table - list points to output list - list[0] will be filled with the opcode - list[1] will be non-zero if this opcode - can match an empty character string - list[2..7] depends on the opcode - -Returns: points to the start of the next opcode if *code is accepted - NULL if *code is not accepted -*/ - -static const pcre_uchar * -get_chr_property_list(const pcre_uchar *code, BOOL utf, - const pcre_uint8 *fcc, pcre_uint32 *list) -{ -pcre_uchar c = *code; -pcre_uchar base; -const pcre_uchar *end; -pcre_uint32 chr; - -#ifdef SUPPORT_UCP -pcre_uint32 *clist_dest; -const pcre_uint32 *clist_src; -#else -utf = utf; /* Suppress "unused parameter" compiler warning */ -#endif - -list[0] = c; -list[1] = FALSE; -code++; - -if (c >= OP_STAR && c <= OP_TYPEPOSUPTO) - { - base = get_repeat_base(c); - c -= (base - OP_STAR); - - if (c == OP_UPTO || c == OP_MINUPTO || c == OP_EXACT || c == OP_POSUPTO) - code += IMM2_SIZE; - 
- list[1] = (c != OP_PLUS && c != OP_MINPLUS && c != OP_EXACT && c != OP_POSPLUS); - - switch(base) - { - case OP_STAR: - list[0] = OP_CHAR; - break; - - case OP_STARI: - list[0] = OP_CHARI; - break; - - case OP_NOTSTAR: - list[0] = OP_NOT; - break; - - case OP_NOTSTARI: - list[0] = OP_NOTI; - break; - - case OP_TYPESTAR: - list[0] = *code; - code++; - break; - } - c = list[0]; - } - -switch(c) - { - case OP_NOT_DIGIT: - case OP_DIGIT: - case OP_NOT_WHITESPACE: - case OP_WHITESPACE: - case OP_NOT_WORDCHAR: - case OP_WORDCHAR: - case OP_ANY: - case OP_ALLANY: - case OP_ANYNL: - case OP_NOT_HSPACE: - case OP_HSPACE: - case OP_NOT_VSPACE: - case OP_VSPACE: - case OP_EXTUNI: - case OP_EODN: - case OP_EOD: - case OP_DOLL: - case OP_DOLLM: - return code; - - case OP_CHAR: - case OP_NOT: - GETCHARINCTEST(chr, code); - list[2] = chr; - list[3] = NOTACHAR; - return code; - - case OP_CHARI: - case OP_NOTI: - list[0] = (c == OP_CHARI) ? OP_CHAR : OP_NOT; - GETCHARINCTEST(chr, code); - list[2] = chr; - -#ifdef SUPPORT_UCP - if (chr < 128 || (chr < 256 && !utf)) - list[3] = fcc[chr]; - else - list[3] = UCD_OTHERCASE(chr); -#elif defined SUPPORT_UTF || !defined COMPILE_PCRE8 - list[3] = (chr < 256) ? fcc[chr] : chr; -#else - list[3] = fcc[chr]; -#endif - - /* The othercase might be the same value. */ - - if (chr == list[3]) - list[3] = NOTACHAR; - else - list[4] = NOTACHAR; - return code; - -#ifdef SUPPORT_UCP - case OP_PROP: - case OP_NOTPROP: - if (code[0] != PT_CLIST) - { - list[2] = code[0]; - list[3] = code[1]; - return code + 2; - } - - /* Convert only if we have enough space. */ - - clist_src = PRIV(ucd_caseless_sets) + code[1]; - clist_dest = list + 2; - code += 2; - - do { - if (clist_dest >= list + 8) - { - /* Early return if there is not enough space. This should never - happen, since all clists are shorter than 5 character now. 
*/ - list[2] = code[0]; - list[3] = code[1]; - return code; - } - *clist_dest++ = *clist_src; - } - while(*clist_src++ != NOTACHAR); - - /* All characters are stored. The terminating NOTACHAR - is copied form the clist itself. */ - - list[0] = (c == OP_PROP) ? OP_CHAR : OP_NOT; - return code; -#endif - - case OP_NCLASS: - case OP_CLASS: -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - if (c == OP_XCLASS) - end = code + GET(code, 0) - 1; - else -#endif - end = code + 32 / sizeof(pcre_uchar); - - switch(*end) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSQUERY: - list[1] = TRUE; - end++; - break; - - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRPOSPLUS: - end++; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - list[1] = (GET2(end, 1) == 0); - end += 1 + 2 * IMM2_SIZE; - break; - } - list[2] = (pcre_uint32)(end - code); - return end; - } -return NULL; /* Opcode not accepted */ -} - - - -/************************************************* -* Scan further character sets for match * -*************************************************/ - -/* Checks whether the base and the current opcode have a common character, in -which case the base cannot be possessified. 
- -Arguments: - code points to the byte code - utf TRUE in UTF-8 / UTF-16 / UTF-32 mode - cd static compile data - base_list the data list of the base opcode - -Returns: TRUE if the auto-possessification is possible -*/ - -static BOOL -compare_opcodes(const pcre_uchar *code, BOOL utf, const compile_data *cd, - const pcre_uint32 *base_list, const pcre_uchar *base_end, int *rec_limit) -{ -pcre_uchar c; -pcre_uint32 list[8]; -const pcre_uint32 *chr_ptr; -const pcre_uint32 *ochr_ptr; -const pcre_uint32 *list_ptr; -const pcre_uchar *next_code; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 -const pcre_uchar *xclass_flags; -#endif -const pcre_uint8 *class_bitset; -const pcre_uint8 *set1, *set2, *set_end; -pcre_uint32 chr; -BOOL accepted, invert_bits; -BOOL entered_a_group = FALSE; - -if (*rec_limit == 0) return FALSE; ---(*rec_limit); - -/* Note: the base_list[1] contains whether the current opcode has greedy -(represented by a non-zero value) quantifier. This is a different from -other character type lists, which stores here that the character iterator -matches to an empty string (also represented by a non-zero value). */ - -for(;;) - { - /* All operations move the code pointer forward. - Therefore infinite recursions are not possible. */ - - c = *code; - - /* Skip over callouts */ - - if (c == OP_CALLOUT) - { - code += PRIV(OP_lengths)[c]; - continue; - } - - if (c == OP_ALT) - { - do code += GET(code, 1); while (*code == OP_ALT); - c = *code; - } - - switch(c) - { - case OP_END: - case OP_KETRPOS: - /* TRUE only in greedy case. The non-greedy case could be replaced by - an OP_EXACT, but it is probably not worth it. (And note that OP_EXACT - uses more memory, which we cannot get at this stage.) */ - - return base_list[1] != 0; - - case OP_KET: - /* If the bracket is capturing, and referenced by an OP_RECURSE, or - it is an atomic sub-pattern (assert, once, etc.) the non-greedy case - cannot be converted to a possessive form. 
*/ - - if (base_list[1] == 0) return FALSE; - - switch(*(code - GET(code, 1))) - { - case OP_ASSERT: - case OP_ASSERT_NOT: - case OP_ASSERTBACK: - case OP_ASSERTBACK_NOT: - case OP_ONCE: - case OP_ONCE_NC: - /* Atomic sub-patterns and assertions can always auto-possessify their - last iterator. However, if the group was entered as a result of checking - a previous iterator, this is not possible. */ - - return !entered_a_group; - } - - code += PRIV(OP_lengths)[c]; - continue; - - case OP_ONCE: - case OP_ONCE_NC: - case OP_BRA: - case OP_CBRA: - next_code = code + GET(code, 1); - code += PRIV(OP_lengths)[c]; - - while (*next_code == OP_ALT) - { - if (!compare_opcodes(code, utf, cd, base_list, base_end, rec_limit)) - return FALSE; - code = next_code + 1 + LINK_SIZE; - next_code += GET(next_code, 1); - } - - entered_a_group = TRUE; - continue; - - case OP_BRAZERO: - case OP_BRAMINZERO: - - next_code = code + 1; - if (*next_code != OP_BRA && *next_code != OP_CBRA - && *next_code != OP_ONCE && *next_code != OP_ONCE_NC) return FALSE; - - do next_code += GET(next_code, 1); while (*next_code == OP_ALT); - - /* The bracket content will be checked by the - OP_BRA/OP_CBRA case above. */ - next_code += 1 + LINK_SIZE; - if (!compare_opcodes(next_code, utf, cd, base_list, base_end, rec_limit)) - return FALSE; - - code += PRIV(OP_lengths)[c]; - continue; - - default: - break; - } - - /* Check for a supported opcode, and load its properties. */ - - code = get_chr_property_list(code, utf, cd->fcc, list); - if (code == NULL) return FALSE; /* Unsupported */ - - /* If either opcode is a small character list, set pointers for comparing - characters from that list with another list, or with a property. */ - - if (base_list[0] == OP_CHAR) - { - chr_ptr = base_list + 2; - list_ptr = list; - } - else if (list[0] == OP_CHAR) - { - chr_ptr = list + 2; - list_ptr = base_list; - } - - /* Character bitsets can also be compared to certain opcodes. 
*/ - - else if (base_list[0] == OP_CLASS || list[0] == OP_CLASS -#ifdef COMPILE_PCRE8 - /* In 8 bit, non-UTF mode, OP_CLASS and OP_NCLASS are the same. */ - || (!utf && (base_list[0] == OP_NCLASS || list[0] == OP_NCLASS)) -#endif - ) - { -#ifdef COMPILE_PCRE8 - if (base_list[0] == OP_CLASS || (!utf && base_list[0] == OP_NCLASS)) -#else - if (base_list[0] == OP_CLASS) -#endif - { - set1 = (pcre_uint8 *)(base_end - base_list[2]); - list_ptr = list; - } - else - { - set1 = (pcre_uint8 *)(code - list[2]); - list_ptr = base_list; - } - - invert_bits = FALSE; - switch(list_ptr[0]) - { - case OP_CLASS: - case OP_NCLASS: - set2 = (pcre_uint8 *) - ((list_ptr == list ? code : base_end) - list_ptr[2]); - break; - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - xclass_flags = (list_ptr == list ? code : base_end) - list_ptr[2] + LINK_SIZE; - if ((*xclass_flags & XCL_HASPROP) != 0) return FALSE; - if ((*xclass_flags & XCL_MAP) == 0) - { - /* No bits are set for characters < 256. */ - if (list[1] == 0) return (*xclass_flags & XCL_NOT) == 0; - /* Might be an empty repeat. */ - continue; - } - set2 = (pcre_uint8 *)(xclass_flags + 1); - break; -#endif - - case OP_NOT_DIGIT: - invert_bits = TRUE; - /* Fall through */ - case OP_DIGIT: - set2 = (pcre_uint8 *)(cd->cbits + cbit_digit); - break; - - case OP_NOT_WHITESPACE: - invert_bits = TRUE; - /* Fall through */ - case OP_WHITESPACE: - set2 = (pcre_uint8 *)(cd->cbits + cbit_space); - break; - - case OP_NOT_WORDCHAR: - invert_bits = TRUE; - /* Fall through */ - case OP_WORDCHAR: - set2 = (pcre_uint8 *)(cd->cbits + cbit_word); - break; - - default: - return FALSE; - } - - /* Because the sets are unaligned, we need - to perform byte comparison here. 
*/ - set_end = set1 + 32; - if (invert_bits) - { - do - { - if ((*set1++ & ~(*set2++)) != 0) return FALSE; - } - while (set1 < set_end); - } - else - { - do - { - if ((*set1++ & *set2++) != 0) return FALSE; - } - while (set1 < set_end); - } - - if (list[1] == 0) return TRUE; - /* Might be an empty repeat. */ - continue; - } - - /* Some property combinations also acceptable. Unicode property opcodes are - processed specially; the rest can be handled with a lookup table. */ - - else - { - pcre_uint32 leftop, rightop; - - leftop = base_list[0]; - rightop = list[0]; - -#ifdef SUPPORT_UCP - accepted = FALSE; /* Always set in non-unicode case. */ - if (leftop == OP_PROP || leftop == OP_NOTPROP) - { - if (rightop == OP_EOD) - accepted = TRUE; - else if (rightop == OP_PROP || rightop == OP_NOTPROP) - { - int n; - const pcre_uint8 *p; - BOOL same = leftop == rightop; - BOOL lisprop = leftop == OP_PROP; - BOOL risprop = rightop == OP_PROP; - BOOL bothprop = lisprop && risprop; - - /* There's a table that specifies how each combination is to be - processed: - 0 Always return FALSE (never auto-possessify) - 1 Character groups are distinct (possessify if both are OP_PROP) - 2 Check character categories in the same group (general or particular) - 3 Return TRUE if the two opcodes are not the same - ... see comments below - */ - - n = propposstab[base_list[2]][list[2]]; - switch(n) - { - case 0: break; - case 1: accepted = bothprop; break; - case 2: accepted = (base_list[3] == list[3]) != same; break; - case 3: accepted = !same; break; - - case 4: /* Left general category, right particular category */ - accepted = risprop && catposstab[base_list[3]][list[3]] == same; - break; - - case 5: /* Right general category, left particular category */ - accepted = lisprop && catposstab[list[3]][base_list[3]] == same; - break; - - /* This code is logically tricky. Think hard before fiddling with it. - The posspropstab table has four entries per row. 
Each row relates to - one of PCRE's special properties such as ALNUM or SPACE or WORD. - Only WORD actually needs all four entries, but using repeats for the - others means they can all use the same code below. - - The first two entries in each row are Unicode general categories, and - apply always, because all the characters they include are part of the - PCRE character set. The third and fourth entries are a general and a - particular category, respectively, that include one or more relevant - characters. One or the other is used, depending on whether the check - is for a general or a particular category. However, in both cases the - category contains more characters than the specials that are defined - for the property being tested against. Therefore, it cannot be used - in a NOTPROP case. - - Example: the row for WORD contains ucp_L, ucp_N, ucp_P, ucp_Po. - Underscore is covered by ucp_P or ucp_Po. */ - - case 6: /* Left alphanum vs right general category */ - case 7: /* Left space vs right general category */ - case 8: /* Left word vs right general category */ - p = posspropstab[n-6]; - accepted = risprop && lisprop == - (list[3] != p[0] && - list[3] != p[1] && - (list[3] != p[2] || !lisprop)); - break; - - case 9: /* Right alphanum vs left general category */ - case 10: /* Right space vs left general category */ - case 11: /* Right word vs left general category */ - p = posspropstab[n-9]; - accepted = lisprop && risprop == - (base_list[3] != p[0] && - base_list[3] != p[1] && - (base_list[3] != p[2] || !risprop)); - break; - - case 12: /* Left alphanum vs right particular category */ - case 13: /* Left space vs right particular category */ - case 14: /* Left word vs right particular category */ - p = posspropstab[n-12]; - accepted = risprop && lisprop == - (catposstab[p[0]][list[3]] && - catposstab[p[1]][list[3]] && - (list[3] != p[3] || !lisprop)); - break; - - case 15: /* Right alphanum vs left particular category */ - case 16: /* Right space vs left 
particular category */ - case 17: /* Right word vs left particular category */ - p = posspropstab[n-15]; - accepted = lisprop && risprop == - (catposstab[p[0]][base_list[3]] && - catposstab[p[1]][base_list[3]] && - (base_list[3] != p[3] || !risprop)); - break; - } - } - } - - else -#endif /* SUPPORT_UCP */ - - accepted = leftop >= FIRST_AUTOTAB_OP && leftop <= LAST_AUTOTAB_LEFT_OP && - rightop >= FIRST_AUTOTAB_OP && rightop <= LAST_AUTOTAB_RIGHT_OP && - autoposstab[leftop - FIRST_AUTOTAB_OP][rightop - FIRST_AUTOTAB_OP]; - - if (!accepted) return FALSE; - - if (list[1] == 0) return TRUE; - /* Might be an empty repeat. */ - continue; - } - - /* Control reaches here only if one of the items is a small character list. - All characters are checked against the other side. */ - - do - { - chr = *chr_ptr; - - switch(list_ptr[0]) - { - case OP_CHAR: - ochr_ptr = list_ptr + 2; - do - { - if (chr == *ochr_ptr) return FALSE; - ochr_ptr++; - } - while(*ochr_ptr != NOTACHAR); - break; - - case OP_NOT: - ochr_ptr = list_ptr + 2; - do - { - if (chr == *ochr_ptr) - break; - ochr_ptr++; - } - while(*ochr_ptr != NOTACHAR); - if (*ochr_ptr == NOTACHAR) return FALSE; /* Not found */ - break; - - /* Note that OP_DIGIT etc. are generated only when PCRE_UCP is *not* - set. When it is set, \d etc. are converted into OP_(NOT_)PROP codes. 
*/ - - case OP_DIGIT: - if (chr < 256 && (cd->ctypes[chr] & ctype_digit) != 0) return FALSE; - break; - - case OP_NOT_DIGIT: - if (chr > 255 || (cd->ctypes[chr] & ctype_digit) == 0) return FALSE; - break; - - case OP_WHITESPACE: - if (chr < 256 && (cd->ctypes[chr] & ctype_space) != 0) return FALSE; - break; - - case OP_NOT_WHITESPACE: - if (chr > 255 || (cd->ctypes[chr] & ctype_space) == 0) return FALSE; - break; - - case OP_WORDCHAR: - if (chr < 255 && (cd->ctypes[chr] & ctype_word) != 0) return FALSE; - break; - - case OP_NOT_WORDCHAR: - if (chr > 255 || (cd->ctypes[chr] & ctype_word) == 0) return FALSE; - break; - - case OP_HSPACE: - switch(chr) - { - HSPACE_CASES: return FALSE; - default: break; - } - break; - - case OP_NOT_HSPACE: - switch(chr) - { - HSPACE_CASES: break; - default: return FALSE; - } - break; - - case OP_ANYNL: - case OP_VSPACE: - switch(chr) - { - VSPACE_CASES: return FALSE; - default: break; - } - break; - - case OP_NOT_VSPACE: - switch(chr) - { - VSPACE_CASES: break; - default: return FALSE; - } - break; - - case OP_DOLL: - case OP_EODN: - switch (chr) - { - case CHAR_CR: - case CHAR_LF: - case CHAR_VT: - case CHAR_FF: - case CHAR_NEL: -#ifndef EBCDIC - case 0x2028: - case 0x2029: -#endif /* Not EBCDIC */ - return FALSE; - } - break; - - case OP_EOD: /* Can always possessify before \z */ - break; - -#ifdef SUPPORT_UCP - case OP_PROP: - case OP_NOTPROP: - if (!check_char_prop(chr, list_ptr[2], list_ptr[3], - list_ptr[0] == OP_NOTPROP)) - return FALSE; - break; -#endif - - case OP_NCLASS: - if (chr > 255) return FALSE; - /* Fall through */ - - case OP_CLASS: - if (chr > 255) break; - class_bitset = (pcre_uint8 *) - ((list_ptr == list ? code : base_end) - list_ptr[2]); - if ((class_bitset[chr >> 3] & (1 << (chr & 7))) != 0) return FALSE; - break; - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - if (PRIV(xclass)(chr, (list_ptr == list ? 
code : base_end) - - list_ptr[2] + LINK_SIZE, utf)) return FALSE; - break; -#endif - - default: - return FALSE; - } - - chr_ptr++; - } - while(*chr_ptr != NOTACHAR); - - /* At least one character must be matched from this opcode. */ - - if (list[1] == 0) return TRUE; - } - -/* Control never reaches here. There used to be a fail-save return FALSE; here, -but some compilers complain about an unreachable statement. */ - -} - - - -/************************************************* -* Scan compiled regex for auto-possession * -*************************************************/ - -/* Replaces single character iterations with their possessive alternatives -if appropriate. This function modifies the compiled opcode! - -Arguments: - code points to start of the byte code - utf TRUE in UTF-8 / UTF-16 / UTF-32 mode - cd static compile data - -Returns: nothing -*/ - -static void -auto_possessify(pcre_uchar *code, BOOL utf, const compile_data *cd) -{ -register pcre_uchar c; -const pcre_uchar *end; -pcre_uchar *repeat_opcode; -pcre_uint32 list[8]; -int rec_limit; - -for (;;) - { - c = *code; - - /* When a pattern with bad UTF-8 encoding is compiled with NO_UTF_CHECK, - it may compile without complaining, but may get into a loop here if the code - pointer points to a bad value. This is, of course a documentated possibility, - when NO_UTF_CHECK is set, so it isn't a bug, but we can detect this case and - just give up on this optimization. */ - - if (c >= OP_TABLE_LENGTH) return; - - if (c >= OP_STAR && c <= OP_TYPEPOSUPTO) - { - c -= get_repeat_base(c) - OP_STAR; - end = (c <= OP_MINUPTO) ? 
- get_chr_property_list(code, utf, cd->fcc, list) : NULL; - list[1] = c == OP_STAR || c == OP_PLUS || c == OP_QUERY || c == OP_UPTO; - - rec_limit = 1000; - if (end != NULL && compare_opcodes(end, utf, cd, list, end, &rec_limit)) - { - switch(c) - { - case OP_STAR: - *code += OP_POSSTAR - OP_STAR; - break; - - case OP_MINSTAR: - *code += OP_POSSTAR - OP_MINSTAR; - break; - - case OP_PLUS: - *code += OP_POSPLUS - OP_PLUS; - break; - - case OP_MINPLUS: - *code += OP_POSPLUS - OP_MINPLUS; - break; - - case OP_QUERY: - *code += OP_POSQUERY - OP_QUERY; - break; - - case OP_MINQUERY: - *code += OP_POSQUERY - OP_MINQUERY; - break; - - case OP_UPTO: - *code += OP_POSUPTO - OP_UPTO; - break; - - case OP_MINUPTO: - *code += OP_POSUPTO - OP_MINUPTO; - break; - } - } - c = *code; - } - else if (c == OP_CLASS || c == OP_NCLASS || c == OP_XCLASS) - { -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - if (c == OP_XCLASS) - repeat_opcode = code + GET(code, 1); - else -#endif - repeat_opcode = code + 1 + (32 / sizeof(pcre_uchar)); - - c = *repeat_opcode; - if (c >= OP_CRSTAR && c <= OP_CRMINRANGE) - { - /* end must not be NULL. 
*/ - end = get_chr_property_list(code, utf, cd->fcc, list); - - list[1] = (c & 1) == 0; - - rec_limit = 1000; - if (compare_opcodes(end, utf, cd, list, end, &rec_limit)) - { - switch (c) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - *repeat_opcode = OP_CRPOSSTAR; - break; - - case OP_CRPLUS: - case OP_CRMINPLUS: - *repeat_opcode = OP_CRPOSPLUS; - break; - - case OP_CRQUERY: - case OP_CRMINQUERY: - *repeat_opcode = OP_CRPOSQUERY; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - *repeat_opcode = OP_CRPOSRANGE; - break; - } - } - } - c = *code; - } - - switch(c) - { - case OP_END: - return; - - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - case OP_TYPEPOSSTAR: - case OP_TYPEPOSPLUS: - case OP_TYPEPOSQUERY: - if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; - break; - - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - case OP_TYPEEXACT: - case OP_TYPEPOSUPTO: - if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) - code += 2; - break; - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - code += GET(code, 1); - break; -#endif - - case OP_MARK: - case OP_PRUNE_ARG: - case OP_SKIP_ARG: - case OP_THEN_ARG: - code += code[1]; - break; - } - - /* Add in the fixed length from the table */ - - code += PRIV(OP_lengths)[c]; - - /* In UTF-8 mode, opcodes that are followed by a character may be followed by - a multi-byte character. The length in the table is a minimum, so we have to - arrange to skip the extra bytes. 
*/ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf) switch(c) - { - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_STAR: - case OP_MINSTAR: - case OP_PLUS: - case OP_MINPLUS: - case OP_QUERY: - case OP_MINQUERY: - case OP_UPTO: - case OP_MINUPTO: - case OP_EXACT: - case OP_POSSTAR: - case OP_POSPLUS: - case OP_POSQUERY: - case OP_POSUPTO: - case OP_STARI: - case OP_MINSTARI: - case OP_PLUSI: - case OP_MINPLUSI: - case OP_QUERYI: - case OP_MINQUERYI: - case OP_UPTOI: - case OP_MINUPTOI: - case OP_EXACTI: - case OP_POSSTARI: - case OP_POSPLUSI: - case OP_POSQUERYI: - case OP_POSUPTOI: - case OP_NOTSTAR: - case OP_NOTMINSTAR: - case OP_NOTPLUS: - case OP_NOTMINPLUS: - case OP_NOTQUERY: - case OP_NOTMINQUERY: - case OP_NOTUPTO: - case OP_NOTMINUPTO: - case OP_NOTEXACT: - case OP_NOTPOSSTAR: - case OP_NOTPOSPLUS: - case OP_NOTPOSQUERY: - case OP_NOTPOSUPTO: - case OP_NOTSTARI: - case OP_NOTMINSTARI: - case OP_NOTPLUSI: - case OP_NOTMINPLUSI: - case OP_NOTQUERYI: - case OP_NOTMINQUERYI: - case OP_NOTUPTOI: - case OP_NOTMINUPTOI: - case OP_NOTEXACTI: - case OP_NOTPOSSTARI: - case OP_NOTPOSPLUSI: - case OP_NOTPOSQUERYI: - case OP_NOTPOSUPTOI: - if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]); - break; - } -#else - (void)(utf); /* Keep compiler happy by referencing function argument */ -#endif - } -} - - - -/************************************************* -* Check for POSIX class syntax * -*************************************************/ - -/* This function is called when the sequence "[:" or "[." or "[=" is -encountered in a character class. It checks whether this is followed by a -sequence of characters terminated by a matching ":]" or ".]" or "=]". If we -reach an unescaped ']' without the special preceding character, return FALSE. 
- -Originally, this function only recognized a sequence of letters between the -terminators, but it seems that Perl recognizes any sequence of characters, -though of course unknown POSIX names are subsequently rejected. Perl gives an -"Unknown POSIX class" error for [:f\oo:] for example, where previously PCRE -didn't consider this to be a POSIX class. Likewise for [:1234:]. - -The problem in trying to be exactly like Perl is in the handling of escapes. We -have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX -class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code -below handles the special cases \\ and \], but does not try to do any other -escape processing. This makes it different from Perl for cases such as -[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does -not recognize "l\ower". This is a lesser evil than not diagnosing bad classes -when Perl does, I think. - -A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not. -It seems that the appearance of a nested POSIX class supersedes an apparent -external class. For example, [:a[:digit:]b:] matches "a", "b", ":", or -a digit. - -In Perl, unescaped square brackets may also appear as part of class names. For -example, [:a[:abc]b:] gives unknown POSIX class "[:abc]b:]". However, for -[:a[:abc]b][b:] it gives unknown POSIX class "[:abc]b][b:]", which does not -seem right at all. PCRE does not allow closing square brackets in POSIX class -names. - -Arguments: - ptr pointer to the initial [ - endptr where to return the end pointer - -Returns: TRUE or FALSE -*/ - -static BOOL -check_posix_syntax(const pcre_uchar *ptr, const pcre_uchar **endptr) -{ -pcre_uchar terminator; /* Don't combine these lines; the Solaris cc */ -terminator = *(++ptr); /* compiler warns about "non-constant" initializer. 
*/ -for (++ptr; *ptr != CHAR_NULL; ptr++) - { - if (*ptr == CHAR_BACKSLASH && - (ptr[1] == CHAR_RIGHT_SQUARE_BRACKET || - ptr[1] == CHAR_BACKSLASH)) - ptr++; - else if ((*ptr == CHAR_LEFT_SQUARE_BRACKET && ptr[1] == terminator) || - *ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE; - else if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET) - { - *endptr = ptr; - return TRUE; - } - } -return FALSE; -} - - - - -/************************************************* -* Check POSIX class name * -*************************************************/ - -/* This function is called to check the name given in a POSIX-style class entry -such as [:alnum:]. - -Arguments: - ptr points to the first letter - len the length of the name - -Returns: a value representing the name, or -1 if unknown -*/ - -static int -check_posix_name(const pcre_uchar *ptr, int len) -{ -const char *pn = posix_names; -register int yield = 0; -while (posix_name_lengths[yield] != 0) - { - if (len == posix_name_lengths[yield] && - STRNCMP_UC_C8(ptr, pn, (unsigned int)len) == 0) return yield; - pn += posix_name_lengths[yield] + 1; - yield++; - } -return -1; -} - - -/************************************************* -* Adjust OP_RECURSE items in repeated group * -*************************************************/ - -/* OP_RECURSE items contain an offset from the start of the regex to the group -that is referenced. This means that groups can be replicated for fixed -repetition simply by copying (because the recursion is allowed to refer to -earlier groups that are outside the current group). However, when a group is -optional (i.e. the minimum quantifier is zero), OP_BRAZERO or OP_SKIPZERO is -inserted before it, after it has been compiled. This means that any OP_RECURSE -items within it that refer to the group itself or any contained groups have to -have their offsets adjusted. That one of the jobs of this function. 
Before it -is called, the partially compiled regex must be temporarily terminated with -OP_END. - -This function has been extended to cope with forward references for recursions -and subroutine calls. It must check the list of such references for the -group we are dealing with. If it finds that one of the recursions in the -current group is on this list, it does not adjust the value in the reference -(which is a group number). After the group has been scanned, all the offsets in -the forward reference list for the group are adjusted. - -Arguments: - group points to the start of the group - adjust the amount by which the group is to be moved - utf TRUE in UTF-8 / UTF-16 / UTF-32 mode - cd contains pointers to tables etc. - save_hwm_offset the hwm forward reference offset at the start of the group - -Returns: nothing -*/ - -static void -adjust_recurse(pcre_uchar *group, int adjust, BOOL utf, compile_data *cd, - size_t save_hwm_offset) -{ -int offset; -pcre_uchar *hc; -pcre_uchar *ptr = group; - -while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL) - { - for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm; - hc += LINK_SIZE) - { - offset = (int)GET(hc, 0); - if (cd->start_code + offset == ptr + 1) break; - } - - /* If we have not found this recursion on the forward reference list, adjust - the recursion's offset if it's after the start of this group. */ - - if (hc >= cd->hwm) - { - offset = (int)GET(ptr, 1); - if (cd->start_code + offset >= group) PUT(ptr, 1, offset + adjust); - } - - ptr += 1 + LINK_SIZE; - } - -/* Now adjust all forward reference offsets for the group. 
*/ - -for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm; - hc += LINK_SIZE) - { - offset = (int)GET(hc, 0); - PUT(hc, 0, offset + adjust); - } -} - - - -/************************************************* -* Insert an automatic callout point * -*************************************************/ - -/* This function is called when the PCRE_AUTO_CALLOUT option is set, to insert -callout points before each pattern item. - -Arguments: - code current code pointer - ptr current pattern pointer - cd pointers to tables etc - -Returns: new code pointer -*/ - -static pcre_uchar * -auto_callout(pcre_uchar *code, const pcre_uchar *ptr, compile_data *cd) -{ -*code++ = OP_CALLOUT; -*code++ = 255; -PUT(code, 0, (int)(ptr - cd->start_pattern)); /* Pattern offset */ -PUT(code, LINK_SIZE, 0); /* Default length */ -return code + 2 * LINK_SIZE; -} - - - -/************************************************* -* Complete a callout item * -*************************************************/ - -/* A callout item contains the length of the next item in the pattern, which -we can't fill in till after we have reached the relevant point. This is used -for both automatic and manual callouts. - -Arguments: - previous_callout points to previous callout item - ptr current pattern pointer - cd pointers to tables etc - -Returns: nothing -*/ - -static void -complete_callout(pcre_uchar *previous_callout, const pcre_uchar *ptr, compile_data *cd) -{ -int length = (int)(ptr - cd->start_pattern - GET(previous_callout, 2)); -PUT(previous_callout, 2 + LINK_SIZE, length); -} - - - -#ifdef SUPPORT_UCP -/************************************************* -* Get othercase range * -*************************************************/ - -/* This function is passed the start and end of a class range, in UTF-8 mode -with UCP support. It searches up the characters, looking for ranges of -characters in the "other" case. Each call returns the next one, updating the -start address. 
A character with multiple other cases is returned on its own -with a special return value. - -Arguments: - cptr points to starting character value; updated - d end value - ocptr where to put start of othercase range - odptr where to put end of othercase range - -Yield: -1 when no more - 0 when a range is returned - >0 the CASESET offset for char with multiple other cases - in this case, ocptr contains the original -*/ - -static int -get_othercase_range(pcre_uint32 *cptr, pcre_uint32 d, pcre_uint32 *ocptr, - pcre_uint32 *odptr) -{ -pcre_uint32 c, othercase, next; -unsigned int co; - -/* Find the first character that has an other case. If it has multiple other -cases, return its case offset value. */ - -for (c = *cptr; c <= d; c++) - { - if ((co = UCD_CASESET(c)) != 0) - { - *ocptr = c++; /* Character that has the set */ - *cptr = c; /* Rest of input range */ - return (int)co; - } - if ((othercase = UCD_OTHERCASE(c)) != c) break; - } - -if (c > d) return -1; /* Reached end of range */ - -/* Found a character that has a single other case. Search for the end of the -range, which is either the end of the input range, or a character that has zero -or more than one other cases. */ - -*ocptr = othercase; -next = othercase + 1; - -for (++c; c <= d; c++) - { - if ((co = UCD_CASESET(c)) != 0 || UCD_OTHERCASE(c) != next) break; - next++; - } - -*odptr = next - 1; /* End of othercase range */ -*cptr = c; /* Rest of input range */ -return 0; -} -#endif /* SUPPORT_UCP */ - - - -/************************************************* -* Add a character or range to a class * -*************************************************/ - -/* This function packages up the logic of adding a character or range of -characters to a class. The character values in the arguments will be within the -valid values for the current mode (8-bit, 16-bit, UTF, etc). This function is -mutually recursive with the function immediately below. 
- -Arguments: - classbits the bit map for characters < 256 - uchardptr points to the pointer for extra data - options the options word - cd contains pointers to tables etc. - start start of range character - end end of range character - -Returns: the number of < 256 characters added - the pointer to extra data is updated -*/ - -static int -add_to_class(pcre_uint8 *classbits, pcre_uchar **uchardptr, int options, - compile_data *cd, pcre_uint32 start, pcre_uint32 end) -{ -pcre_uint32 c; -pcre_uint32 classbits_end = (end <= 0xff ? end : 0xff); -int n8 = 0; - -/* If caseless matching is required, scan the range and process alternate -cases. In Unicode, there are 8-bit characters that have alternate cases that -are greater than 255 and vice-versa. Sometimes we can just extend the original -range. */ - -if ((options & PCRE_CASELESS) != 0) - { -#ifdef SUPPORT_UCP - if ((options & PCRE_UTF8) != 0) - { - int rc; - pcre_uint32 oc, od; - - options &= ~PCRE_CASELESS; /* Remove for recursive calls */ - c = start; - - while ((rc = get_othercase_range(&c, end, &oc, &od)) >= 0) - { - /* Handle a single character that has more than one other case. */ - - if (rc > 0) n8 += add_list_to_class(classbits, uchardptr, options, cd, - PRIV(ucd_caseless_sets) + rc, oc); - - /* Do nothing if the other case range is within the original range. */ - - else if (oc >= start && od <= end) continue; - - /* Extend the original range if there is overlap, noting that if oc < c, we - can't have od > end because a subrange is always shorter than the basic - range. Otherwise, use a recursive call to add the additional range. */ - - else if (oc < start && od >= start - 1) start = oc; /* Extend downwards */ - else if (od > end && oc <= end + 1) - { - end = od; /* Extend upwards */ - if (end > classbits_end) classbits_end = (end <= 0xff ? 
end : 0xff); - } - else n8 += add_to_class(classbits, uchardptr, options, cd, oc, od); - } - } - else -#endif /* SUPPORT_UCP */ - - /* Not UTF-mode, or no UCP */ - - for (c = start; c <= classbits_end; c++) - { - SETBIT(classbits, cd->fcc[c]); - n8++; - } - } - -/* Now handle the original range. Adjust the final value according to the bit -length - this means that the same lists of (e.g.) horizontal spaces can be used -in all cases. */ - -#if defined COMPILE_PCRE8 -#ifdef SUPPORT_UTF - if ((options & PCRE_UTF8) == 0) -#endif - if (end > 0xff) end = 0xff; - -#elif defined COMPILE_PCRE16 -#ifdef SUPPORT_UTF - if ((options & PCRE_UTF16) == 0) -#endif - if (end > 0xffff) end = 0xffff; - -#endif /* COMPILE_PCRE[8|16] */ - -/* Use the bitmap for characters < 256. Otherwise use extra data.*/ - -for (c = start; c <= classbits_end; c++) - { - /* Regardless of start, c will always be <= 255. */ - SETBIT(classbits, c); - n8++; - } - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 -if (start <= 0xff) start = 0xff + 1; - -if (end >= start) - { - pcre_uchar *uchardata = *uchardptr; -#ifdef SUPPORT_UTF - if ((options & PCRE_UTF8) != 0) /* All UTFs use the same flag bit */ - { - if (start < end) - { - *uchardata++ = XCL_RANGE; - uchardata += PRIV(ord2utf)(start, uchardata); - uchardata += PRIV(ord2utf)(end, uchardata); - } - else if (start == end) - { - *uchardata++ = XCL_SINGLE; - uchardata += PRIV(ord2utf)(start, uchardata); - } - } - else -#endif /* SUPPORT_UTF */ - - /* Without UTF support, character values are constrained by the bit length, - and can only be > 256 for 16-bit and 32-bit libraries. 
*/ - -#ifdef COMPILE_PCRE8 - {} -#else - if (start < end) - { - *uchardata++ = XCL_RANGE; - *uchardata++ = start; - *uchardata++ = end; - } - else if (start == end) - { - *uchardata++ = XCL_SINGLE; - *uchardata++ = start; - } -#endif - - *uchardptr = uchardata; /* Updata extra data pointer */ - } -#endif /* SUPPORT_UTF || !COMPILE_PCRE8 */ - -return n8; /* Number of 8-bit characters */ -} - - - - -/************************************************* -* Add a list of characters to a class * -*************************************************/ - -/* This function is used for adding a list of case-equivalent characters to a -class, and also for adding a list of horizontal or vertical whitespace. If the -list is in order (which it should be), ranges of characters are detected and -handled appropriately. This function is mutually recursive with the function -above. - -Arguments: - classbits the bit map for characters < 256 - uchardptr points to the pointer for extra data - options the options word - cd contains pointers to tables etc. - p points to row of 32-bit values, terminated by NOTACHAR - except character to omit; this is used when adding lists of - case-equivalent characters to avoid including the one we - already know about - -Returns: the number of < 256 characters added - the pointer to extra data is updated -*/ - -static int -add_list_to_class(pcre_uint8 *classbits, pcre_uchar **uchardptr, int options, - compile_data *cd, const pcre_uint32 *p, unsigned int except) -{ -int n8 = 0; -while (p[0] < NOTACHAR) - { - int n = 0; - if (p[0] != except) - { - while(p[n+1] == p[0] + n + 1) n++; - n8 += add_to_class(classbits, uchardptr, options, cd, p[0], p[n]); - } - p += n + 1; - } -return n8; -} - - - -/************************************************* -* Add characters not in a list to a class * -*************************************************/ - -/* This function is used for adding the complement of a list of horizontal or -vertical whitespace to a class. 
The list must be in order. - -Arguments: - classbits the bit map for characters < 256 - uchardptr points to the pointer for extra data - options the options word - cd contains pointers to tables etc. - p points to row of 32-bit values, terminated by NOTACHAR - -Returns: the number of < 256 characters added - the pointer to extra data is updated -*/ - -static int -add_not_list_to_class(pcre_uint8 *classbits, pcre_uchar **uchardptr, - int options, compile_data *cd, const pcre_uint32 *p) -{ -BOOL utf = (options & PCRE_UTF8) != 0; -int n8 = 0; -if (p[0] > 0) - n8 += add_to_class(classbits, uchardptr, options, cd, 0, p[0] - 1); -while (p[0] < NOTACHAR) - { - while (p[1] == p[0] + 1) p++; - n8 += add_to_class(classbits, uchardptr, options, cd, p[0] + 1, - (p[1] == NOTACHAR) ? (utf ? 0x10ffffu : 0xffffffffu) : p[1] - 1); - p++; - } -return n8; -} - - - -/************************************************* -* Compile one branch * -*************************************************/ - -/* Scan the pattern, compiling it into the a vector. If the options are -changed during the branch, the pointer is used to change the external options -bits. This function is used during the pre-compile phase when we are trying -to find out the amount of memory needed, as well as during the real compile -phase. The value of lengthptr distinguishes the two phases. - -Arguments: - optionsptr pointer to the option bits - codeptr points to the pointer to the current code point - ptrptr points to the current pattern pointer - errorcodeptr points to error code variable - firstcharptr place to put the first required character - firstcharflagsptr place to put the first character flags, or a negative number - reqcharptr place to put the last required character - reqcharflagsptr place to put the last required character flags, or a negative number - bcptr points to current branch chain - cond_depth conditional nesting depth - cd contains pointers to tables etc. 
- lengthptr NULL during the real compile phase - points to length accumulator during pre-compile phase - -Returns: TRUE on success - FALSE, with *errorcodeptr set non-zero on error -*/ - -static BOOL -compile_branch(int *optionsptr, pcre_uchar **codeptr, - const pcre_uchar **ptrptr, int *errorcodeptr, - pcre_uint32 *firstcharptr, pcre_int32 *firstcharflagsptr, - pcre_uint32 *reqcharptr, pcre_int32 *reqcharflagsptr, - branch_chain *bcptr, int cond_depth, - compile_data *cd, int *lengthptr) -{ -int repeat_type, op_type; -int repeat_min = 0, repeat_max = 0; /* To please picky compilers */ -int bravalue = 0; -int greedy_default, greedy_non_default; -pcre_uint32 firstchar, reqchar; -pcre_int32 firstcharflags, reqcharflags; -pcre_uint32 zeroreqchar, zerofirstchar; -pcre_int32 zeroreqcharflags, zerofirstcharflags; -pcre_int32 req_caseopt, reqvary, tempreqvary; -int options = *optionsptr; /* May change dynamically */ -int after_manual_callout = 0; -int length_prevgroup = 0; -register pcre_uint32 c; -int escape; -register pcre_uchar *code = *codeptr; -pcre_uchar *last_code = code; -pcre_uchar *orig_code = code; -pcre_uchar *tempcode; -BOOL inescq = FALSE; -BOOL groupsetfirstchar = FALSE; -const pcre_uchar *ptr = *ptrptr; -const pcre_uchar *tempptr; -const pcre_uchar *nestptr = NULL; -pcre_uchar *previous = NULL; -pcre_uchar *previous_callout = NULL; -size_t item_hwm_offset = 0; -pcre_uint8 classbits[32]; - -/* We can fish out the UTF-8 setting once and for all into a BOOL, but we -must not do this for other options (e.g. PCRE_EXTENDED) because they may change -dynamically as we process the pattern. */ - -#ifdef SUPPORT_UTF -/* PCRE_UTF[16|32] have the same value as PCRE_UTF8. */ -BOOL utf = (options & PCRE_UTF8) != 0; -#ifndef COMPILE_PCRE32 -pcre_uchar utf_chars[6]; -#endif -#else -BOOL utf = FALSE; -#endif - -/* Helper variables for OP_XCLASS opcode (for characters > 255). 
We define -class_uchardata always so that it can be passed to add_to_class() always, -though it will not be used in non-UTF 8-bit cases. This avoids having to supply -alternative calls for the different cases. */ - -pcre_uchar *class_uchardata; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 -BOOL xclass; -pcre_uchar *class_uchardata_base; -#endif - -#ifdef PCRE_DEBUG -if (lengthptr != NULL) DPRINTF((">> start branch\n")); -#endif - -/* Set up the default and non-default settings for greediness */ - -greedy_default = ((options & PCRE_UNGREEDY) != 0); -greedy_non_default = greedy_default ^ 1; - -/* Initialize no first byte, no required byte. REQ_UNSET means "no char -matching encountered yet". It gets changed to REQ_NONE if we hit something that -matches a non-fixed char first char; reqchar just remains unset if we never -find one. - -When we hit a repeat whose minimum is zero, we may have to adjust these values -to take the zero repeat into account. This is implemented by setting them to -zerofirstbyte and zeroreqchar when such a repeat is encountered. The individual -item types that can be repeated set these backoff variables appropriately. */ - -firstchar = reqchar = zerofirstchar = zeroreqchar = 0; -firstcharflags = reqcharflags = zerofirstcharflags = zeroreqcharflags = REQ_UNSET; - -/* The variable req_caseopt contains either the REQ_CASELESS value -or zero, according to the current setting of the caseless flag. The -REQ_CASELESS leaves the lower 28 bit empty. It is added into the -firstchar or reqchar variables to record the case status of the -value. This is used only for ASCII characters. */ - -req_caseopt = ((options & PCRE_CASELESS) != 0)? 
REQ_CASELESS:0; - -/* Switch on next character until the end of the branch */ - -for (;; ptr++) - { - BOOL negate_class; - BOOL should_flip_negation; - BOOL possessive_quantifier; - BOOL is_quantifier; - BOOL is_recurse; - BOOL reset_bracount; - int class_has_8bitchar; - int class_one_char; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - BOOL xclass_has_prop; -#endif - int newoptions; - int recno; - int refsign; - int skipbytes; - pcre_uint32 subreqchar, subfirstchar; - pcre_int32 subreqcharflags, subfirstcharflags; - int terminator; - unsigned int mclength; - unsigned int tempbracount; - pcre_uint32 ec; - pcre_uchar mcbuffer[8]; - - /* Come here to restart the loop without advancing the pointer. */ - - REDO_LOOP: - - /* Get next character in the pattern */ - - c = *ptr; - - /* If we are at the end of a nested substitution, revert to the outer level - string. Nesting only happens one level deep. */ - - if (c == CHAR_NULL && nestptr != NULL) - { - ptr = nestptr; - nestptr = NULL; - c = *ptr; - } - - /* If we are in the pre-compile phase, accumulate the length used for the - previous cycle of this loop. */ - - if (lengthptr != NULL) - { -#ifdef PCRE_DEBUG - if (code > cd->hwm) cd->hwm = code; /* High water info */ -#endif - if (code > cd->start_workspace + cd->workspace_size - - WORK_SIZE_SAFETY_MARGIN) /* Check for overrun */ - { - *errorcodeptr = (code >= cd->start_workspace + cd->workspace_size)? - ERR52 : ERR87; - goto FAILED; - } - - /* There is at least one situation where code goes backwards: this is the - case of a zero quantifier after a class (e.g. [ab]{0}). At compile time, - the class is simply eliminated. However, it is created first, so we have to - allow memory for it. Therefore, don't ever reduce the length at this point. 
- */ - - if (code < last_code) code = last_code; - - /* Paranoid check for integer overflow */ - - if (OFLOW_MAX - *lengthptr < code - last_code) - { - *errorcodeptr = ERR20; - goto FAILED; - } - - *lengthptr += (int)(code - last_code); - DPRINTF(("length=%d added %d c=%c (0x%x)\n", *lengthptr, - (int)(code - last_code), c, c)); - - /* If "previous" is set and it is not at the start of the work space, move - it back to there, in order to avoid filling up the work space. Otherwise, - if "previous" is NULL, reset the current code pointer to the start. */ - - if (previous != NULL) - { - if (previous > orig_code) - { - memmove(orig_code, previous, IN_UCHARS(code - previous)); - code -= previous - orig_code; - previous = orig_code; - } - } - else code = orig_code; - - /* Remember where this code item starts so we can pick up the length - next time round. */ - - last_code = code; - } - - /* In the real compile phase, just check the workspace used by the forward - reference list. */ - - else if (cd->hwm > cd->start_workspace + cd->workspace_size) - { - *errorcodeptr = ERR52; - goto FAILED; - } - - /* If in \Q...\E, check for the end; if not, we have a literal. Otherwise an - isolated \E is ignored. */ - - if (c != CHAR_NULL) - { - if (c == CHAR_BACKSLASH && ptr[1] == CHAR_E) - { - inescq = FALSE; - ptr++; - continue; - } - else if (inescq) - { - if (previous_callout != NULL) - { - if (lengthptr == NULL) /* Don't attempt in pre-compile phase */ - complete_callout(previous_callout, ptr, cd); - previous_callout = NULL; - } - if ((options & PCRE_AUTO_CALLOUT) != 0) - { - previous_callout = code; - code = auto_callout(code, ptr, cd); - } - goto NORMAL_CHAR; - } - - /* Check for the start of a \Q...\E sequence. We must do this here rather - than later in case it is immediately followed by \E, which turns it into a - "do nothing" sequence. 
*/ - - if (c == CHAR_BACKSLASH && ptr[1] == CHAR_Q) - { - inescq = TRUE; - ptr++; - continue; - } - } - - /* In extended mode, skip white space and comments. */ - - if ((options & PCRE_EXTENDED) != 0) - { - const pcre_uchar *wscptr = ptr; - while (MAX_255(c) && (cd->ctypes[c] & ctype_space) != 0) c = *(++ptr); - if (c == CHAR_NUMBER_SIGN) - { - ptr++; - while (*ptr != CHAR_NULL) - { - if (IS_NEWLINE(ptr)) /* For non-fixed-length newline cases, */ - { /* IS_NEWLINE sets cd->nllen. */ - ptr += cd->nllen; - break; - } - ptr++; -#ifdef SUPPORT_UTF - if (utf) FORWARDCHAR(ptr); -#endif - } - } - - /* If we skipped any characters, restart the loop. Otherwise, we didn't see - a comment. */ - - if (ptr > wscptr) goto REDO_LOOP; - } - - /* Skip over (?# comments. We need to do this here because we want to know if - the next thing is a quantifier, and these comments may come between an item - and its quantifier. */ - - if (c == CHAR_LEFT_PARENTHESIS && ptr[1] == CHAR_QUESTION_MARK && - ptr[2] == CHAR_NUMBER_SIGN) - { - ptr += 3; - while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++; - if (*ptr == CHAR_NULL) - { - *errorcodeptr = ERR18; - goto FAILED; - } - continue; - } - - /* See if the next thing is a quantifier. */ - - is_quantifier = - c == CHAR_ASTERISK || c == CHAR_PLUS || c == CHAR_QUESTION_MARK || - (c == CHAR_LEFT_CURLY_BRACKET && is_counted_repeat(ptr+1)); - - /* Fill in length of a previous callout, except when the next thing is a - quantifier or when processing a property substitution string in UCP mode. */ - - if (!is_quantifier && previous_callout != NULL && nestptr == NULL && - after_manual_callout-- <= 0) - { - if (lengthptr == NULL) /* Don't attempt in pre-compile phase */ - complete_callout(previous_callout, ptr, cd); - previous_callout = NULL; - } - - /* Create auto callout, except for quantifiers, or while processing property - strings that are substituted for \w etc in UCP mode. 
*/ - - if ((options & PCRE_AUTO_CALLOUT) != 0 && !is_quantifier && nestptr == NULL) - { - previous_callout = code; - code = auto_callout(code, ptr, cd); - } - - /* Process the next pattern item. */ - - switch(c) - { - /* ===================================================================*/ - case CHAR_NULL: /* The branch terminates at string end */ - case CHAR_VERTICAL_LINE: /* or | or ) */ - case CHAR_RIGHT_PARENTHESIS: - *firstcharptr = firstchar; - *firstcharflagsptr = firstcharflags; - *reqcharptr = reqchar; - *reqcharflagsptr = reqcharflags; - *codeptr = code; - *ptrptr = ptr; - if (lengthptr != NULL) - { - if (OFLOW_MAX - *lengthptr < code - last_code) - { - *errorcodeptr = ERR20; - goto FAILED; - } - *lengthptr += (int)(code - last_code); /* To include callout length */ - DPRINTF((">> end branch\n")); - } - return TRUE; - - - /* ===================================================================*/ - /* Handle single-character metacharacters. In multiline mode, ^ disables - the setting of any following char as a first character. */ - - case CHAR_CIRCUMFLEX_ACCENT: - previous = NULL; - if ((options & PCRE_MULTILINE) != 0) - { - if (firstcharflags == REQ_UNSET) - zerofirstcharflags = firstcharflags = REQ_NONE; - *code++ = OP_CIRCM; - } - else *code++ = OP_CIRC; - break; - - case CHAR_DOLLAR_SIGN: - previous = NULL; - *code++ = ((options & PCRE_MULTILINE) != 0)? OP_DOLLM : OP_DOLL; - break; - - /* There can never be a first char if '.' is first, whatever happens about - repeats. The value of reqchar doesn't change either. */ - - case CHAR_DOT: - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - *code++ = ((options & PCRE_DOTALL) != 0)? 
OP_ALLANY: OP_ANY; - break; - - - /* ===================================================================*/ - /* Character classes. If the included characters are all < 256, we build a - 32-byte bitmap of the permitted characters, except in the special case - where there is only one such character. For negated classes, we build the - map as usual, then invert it at the end. However, we use a different opcode - so that data characters > 255 can be handled correctly. - - If the class contains characters outside the 0-255 range, a different - opcode is compiled. It may optionally have a bit map for characters < 256, - but those above are are explicitly listed afterwards. A flag byte tells - whether the bitmap is present, and whether this is a negated class or not. - - In JavaScript compatibility mode, an isolated ']' causes an error. In - default (Perl) mode, it is treated as a data character. */ - - case CHAR_RIGHT_SQUARE_BRACKET: - if ((cd->external_options & PCRE_JAVASCRIPT_COMPAT) != 0) - { - *errorcodeptr = ERR64; - goto FAILED; - } - goto NORMAL_CHAR; - - /* In another (POSIX) regex library, the ugly syntax [[:<:]] and [[:>:]] is - used for "start of word" and "end of word". As these are otherwise illegal - sequences, we don't break anything by recognizing them. They are replaced - by \b(?=\w) and \b(?<=\w) respectively. Sequences like [a[:<:]] are - erroneous and are handled by the normal code below. */ - - case CHAR_LEFT_SQUARE_BRACKET: - if (STRNCMP_UC_C8(ptr+1, STRING_WEIRD_STARTWORD, 6) == 0) - { - nestptr = ptr + 7; - ptr = sub_start_of_word; - goto REDO_LOOP; - } - - if (STRNCMP_UC_C8(ptr+1, STRING_WEIRD_ENDWORD, 6) == 0) - { - nestptr = ptr + 7; - ptr = sub_end_of_word; - goto REDO_LOOP; - } - - /* Handle a real character class. */ - - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - - /* PCRE supports POSIX class stuff inside a class. Perl gives an error if - they are encountered at the top level, so we'll do that too. 
*/ - - if ((ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT || - ptr[1] == CHAR_EQUALS_SIGN) && - check_posix_syntax(ptr, &tempptr)) - { - *errorcodeptr = (ptr[1] == CHAR_COLON)? ERR13 : ERR31; - goto FAILED; - } - - /* If the first character is '^', set the negation flag and skip it. Also, - if the first few characters (either before or after ^) are \Q\E or \E we - skip them too. This makes for compatibility with Perl. */ - - negate_class = FALSE; - for (;;) - { - c = *(++ptr); - if (c == CHAR_BACKSLASH) - { - if (ptr[1] == CHAR_E) - ptr++; - else if (STRNCMP_UC_C8(ptr + 1, STR_Q STR_BACKSLASH STR_E, 3) == 0) - ptr += 3; - else - break; - } - else if (!negate_class && c == CHAR_CIRCUMFLEX_ACCENT) - negate_class = TRUE; - else break; - } - - /* Empty classes are allowed in JavaScript compatibility mode. Otherwise, - an initial ']' is taken as a data character -- the code below handles - that. In JS mode, [] must always fail, so generate OP_FAIL, whereas - [^] must match any character, so generate OP_ALLANY. */ - - if (c == CHAR_RIGHT_SQUARE_BRACKET && - (cd->external_options & PCRE_JAVASCRIPT_COMPAT) != 0) - { - *code++ = negate_class? OP_ALLANY : OP_FAIL; - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - break; - } - - /* If a class contains a negative special such as \S, we need to flip the - negation flag at the end, so that support for characters > 255 works - correctly (they are all included in the class). */ - - should_flip_negation = FALSE; - - /* Extended class (xclass) will be used when characters > 255 - might match. 
*/ - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - xclass = FALSE; - class_uchardata = code + LINK_SIZE + 2; /* For XCLASS items */ - class_uchardata_base = class_uchardata; /* Save the start */ -#endif - - /* For optimization purposes, we track some properties of the class: - class_has_8bitchar will be non-zero if the class contains at least one < - 256 character; class_one_char will be 1 if the class contains just one - character; xclass_has_prop will be TRUE if unicode property checks - are present in the class. */ - - class_has_8bitchar = 0; - class_one_char = 0; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - xclass_has_prop = FALSE; -#endif - - /* Initialize the 32-char bit map to all zeros. We build the map in a - temporary bit of memory, in case the class contains fewer than two - 8-bit characters because in that case the compiled code doesn't use the bit - map. */ - - memset(classbits, 0, 32 * sizeof(pcre_uint8)); - - /* Process characters until ] is reached. By writing this as a "do" it - means that an initial ] is taken as a data character. At the start of the - loop, c contains the first byte of the character. */ - - if (c != CHAR_NULL) do - { - const pcre_uchar *oldptr; - -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(c)) - { /* Braces are required because the */ - GETCHARLEN(c, ptr, ptr); /* macro generates multiple statements */ - } -#endif - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - /* In the pre-compile phase, accumulate the length of any extra - data and reset the pointer. This is so that very large classes that - contain a zillion > 255 characters no longer overwrite the work space - (which is on the stack). We have to remember that there was XCLASS data, - however. 
*/ - - if (class_uchardata > class_uchardata_base) xclass = TRUE; - - if (lengthptr != NULL && class_uchardata > class_uchardata_base) - { - *lengthptr += (int)(class_uchardata - class_uchardata_base); - class_uchardata = class_uchardata_base; - } -#endif - - /* Inside \Q...\E everything is literal except \E */ - - if (inescq) - { - if (c == CHAR_BACKSLASH && ptr[1] == CHAR_E) /* If we are at \E */ - { - inescq = FALSE; /* Reset literal state */ - ptr++; /* Skip the 'E' */ - continue; /* Carry on with next */ - } - goto CHECK_RANGE; /* Could be range if \E follows */ - } - - /* Handle POSIX class names. Perl allows a negation extension of the - form [:^name:]. A square bracket that doesn't match the syntax is - treated as a literal. We also recognize the POSIX constructions - [.ch.] and [=ch=] ("collating elements") and fault them, as Perl - 5.6 and 5.8 do. */ - - if (c == CHAR_LEFT_SQUARE_BRACKET && - (ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT || - ptr[1] == CHAR_EQUALS_SIGN) && check_posix_syntax(ptr, &tempptr)) - { - BOOL local_negate = FALSE; - int posix_class, taboffset, tabopt; - register const pcre_uint8 *cbits = cd->cbits; - pcre_uint8 pbits[32]; - - if (ptr[1] != CHAR_COLON) - { - *errorcodeptr = ERR31; - goto FAILED; - } - - ptr += 2; - if (*ptr == CHAR_CIRCUMFLEX_ACCENT) - { - local_negate = TRUE; - should_flip_negation = TRUE; /* Note negative special */ - ptr++; - } - - posix_class = check_posix_name(ptr, (int)(tempptr - ptr)); - if (posix_class < 0) - { - *errorcodeptr = ERR30; - goto FAILED; - } - - /* If matching is caseless, upper and lower are converted to - alpha. This relies on the fact that the class table starts with - alpha, lower, upper as the first 3 entries. */ - - if ((options & PCRE_CASELESS) != 0 && posix_class <= 2) - posix_class = 0; - - /* When PCRE_UCP is set, some of the POSIX classes are converted to - different escape sequences that use Unicode properties \p or \P. 
Others - that are not available via \p or \P generate XCL_PROP/XCL_NOTPROP - directly. */ - -#ifdef SUPPORT_UCP - if ((options & PCRE_UCP) != 0) - { - unsigned int ptype = 0; - int pc = posix_class + ((local_negate)? POSIX_SUBSIZE/2 : 0); - - /* The posix_substitutes table specifies which POSIX classes can be - converted to \p or \P items. */ - - if (posix_substitutes[pc] != NULL) - { - nestptr = tempptr + 1; - ptr = posix_substitutes[pc] - 1; - continue; - } - - /* There are three other classes that generate special property calls - that are recognized only in an XCLASS. */ - - else switch(posix_class) - { - case PC_GRAPH: - ptype = PT_PXGRAPH; - /* Fall through */ - case PC_PRINT: - if (ptype == 0) ptype = PT_PXPRINT; - /* Fall through */ - case PC_PUNCT: - if (ptype == 0) ptype = PT_PXPUNCT; - *class_uchardata++ = local_negate? XCL_NOTPROP : XCL_PROP; - *class_uchardata++ = ptype; - *class_uchardata++ = 0; - xclass_has_prop = TRUE; - ptr = tempptr + 1; - continue; - - /* For the other POSIX classes (ascii, cntrl, xdigit) we are going - to fall through to the non-UCP case and build a bit map for - characters with code points less than 256. If we are in a negated - POSIX class, characters with code points greater than 255 must - either all match or all not match. In the special case where we - have not yet generated any xclass data, and this is the final item - in the overall class, we need do nothing: later on, the opcode - OP_NCLASS will be used to indicate that characters greater than 255 - are acceptable. If we have already seen an xclass item or one may - follow (we have to assume that it might if this is not the end of - the class), explicitly list all wide codepoints, which will then - either not match or match, depending on whether the class is or is - not negated. 
*/ - - default: - if (local_negate && - (xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET)) - { - *class_uchardata++ = XCL_RANGE; - class_uchardata += PRIV(ord2utf)(0x100, class_uchardata); - class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata); - } - break; - } - } -#endif - /* In the non-UCP case, or when UCP makes no difference, we build the - bit map for the POSIX class in a chunk of local store because we may be - adding and subtracting from it, and we don't want to subtract bits that - may be in the main map already. At the end we or the result into the - bit map that is being built. */ - - posix_class *= 3; - - /* Copy in the first table (always present) */ - - memcpy(pbits, cbits + posix_class_maps[posix_class], - 32 * sizeof(pcre_uint8)); - - /* If there is a second table, add or remove it as required. */ - - taboffset = posix_class_maps[posix_class + 1]; - tabopt = posix_class_maps[posix_class + 2]; - - if (taboffset >= 0) - { - if (tabopt >= 0) - for (c = 0; c < 32; c++) pbits[c] |= cbits[c + taboffset]; - else - for (c = 0; c < 32; c++) pbits[c] &= ~cbits[c + taboffset]; - } - - /* Now see if we need to remove any special characters. An option - value of 1 removes vertical space and 2 removes underscore. */ - - if (tabopt < 0) tabopt = -tabopt; - if (tabopt == 1) pbits[1] &= ~0x3c; - else if (tabopt == 2) pbits[11] &= 0x7f; - - /* Add the POSIX table or its complement into the main table that is - being built and we are done. */ - - if (local_negate) - for (c = 0; c < 32; c++) classbits[c] |= ~pbits[c]; - else - for (c = 0; c < 32; c++) classbits[c] |= pbits[c]; - - ptr = tempptr + 1; - /* Every class contains at least one < 256 character. */ - class_has_8bitchar = 1; - /* Every class contains at least two characters. */ - class_one_char = 2; - continue; /* End of POSIX syntax handling */ - } - - /* Backslash may introduce a single character, or it may introduce one - of the specials, which just set a flag. The sequence \b is a special - case. 
Inside a class (and only there) it is treated as backspace. We - assume that other escapes have more than one character in them, so - speculatively set both class_has_8bitchar and class_one_char bigger - than one. Unrecognized escapes fall through and are either treated - as literal characters (by default), or are faulted if - PCRE_EXTRA is set. */ - - if (c == CHAR_BACKSLASH) - { - escape = check_escape(&ptr, &ec, errorcodeptr, cd->bracount, options, - TRUE); - if (*errorcodeptr != 0) goto FAILED; - if (escape == 0) c = ec; - else if (escape == ESC_b) c = CHAR_BS; /* \b is backspace in a class */ - else if (escape == ESC_N) /* \N is not supported in a class */ - { - *errorcodeptr = ERR71; - goto FAILED; - } - else if (escape == ESC_Q) /* Handle start of quoted string */ - { - if (ptr[1] == CHAR_BACKSLASH && ptr[2] == CHAR_E) - { - ptr += 2; /* avoid empty string */ - } - else inescq = TRUE; - continue; - } - else if (escape == ESC_E) continue; /* Ignore orphan \E */ - - else - { - register const pcre_uint8 *cbits = cd->cbits; - /* Every class contains at least two < 256 characters. */ - class_has_8bitchar++; - /* Every class contains at least two characters. */ - class_one_char += 2; - - switch (escape) - { -#ifdef SUPPORT_UCP - case ESC_du: /* These are the values given for \d etc */ - case ESC_DU: /* when PCRE_UCP is set. We replace the */ - case ESC_wu: /* escape sequence with an appropriate \p */ - case ESC_WU: /* or \P to test Unicode properties instead */ - case ESC_su: /* of the default ASCII testing. */ - case ESC_SU: - nestptr = ptr; - ptr = substitutes[escape - ESC_DU] - 1; /* Just before substitute */ - class_has_8bitchar--; /* Undo! 
*/ - continue; -#endif - case ESC_d: - for (c = 0; c < 32; c++) classbits[c] |= cbits[c+cbit_digit]; - continue; - - case ESC_D: - should_flip_negation = TRUE; - for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_digit]; - continue; - - case ESC_w: - for (c = 0; c < 32; c++) classbits[c] |= cbits[c+cbit_word]; - continue; - - case ESC_W: - should_flip_negation = TRUE; - for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_word]; - continue; - - /* Perl 5.004 onwards omitted VT from \s, but restored it at Perl - 5.18. Before PCRE 8.34, we had to preserve the VT bit if it was - previously set by something earlier in the character class. - Luckily, the value of CHAR_VT is 0x0b in both ASCII and EBCDIC, so - we could just adjust the appropriate bit. From PCRE 8.34 we no - longer treat \s and \S specially. */ - - case ESC_s: - for (c = 0; c < 32; c++) classbits[c] |= cbits[c+cbit_space]; - continue; - - case ESC_S: - should_flip_negation = TRUE; - for (c = 0; c < 32; c++) classbits[c] |= ~cbits[c+cbit_space]; - continue; - - /* The rest apply in both UCP and non-UCP cases. */ - - case ESC_h: - (void)add_list_to_class(classbits, &class_uchardata, options, cd, - PRIV(hspace_list), NOTACHAR); - continue; - - case ESC_H: - (void)add_not_list_to_class(classbits, &class_uchardata, options, - cd, PRIV(hspace_list)); - continue; - - case ESC_v: - (void)add_list_to_class(classbits, &class_uchardata, options, cd, - PRIV(vspace_list), NOTACHAR); - continue; - - case ESC_V: - (void)add_not_list_to_class(classbits, &class_uchardata, options, - cd, PRIV(vspace_list)); - continue; - - case ESC_p: - case ESC_P: -#ifdef SUPPORT_UCP - { - BOOL negated; - unsigned int ptype = 0, pdata = 0; - if (!get_ucp(&ptr, &negated, &ptype, &pdata, errorcodeptr)) - goto FAILED; - *class_uchardata++ = ((escape == ESC_p) != negated)? - XCL_PROP : XCL_NOTPROP; - *class_uchardata++ = ptype; - *class_uchardata++ = pdata; - xclass_has_prop = TRUE; - class_has_8bitchar--; /* Undo! 
*/ - continue; - } -#else - *errorcodeptr = ERR45; - goto FAILED; -#endif - /* Unrecognized escapes are faulted if PCRE is running in its - strict mode. By default, for compatibility with Perl, they are - treated as literals. */ - - default: - if ((options & PCRE_EXTRA) != 0) - { - *errorcodeptr = ERR7; - goto FAILED; - } - class_has_8bitchar--; /* Undo the speculative increase. */ - class_one_char -= 2; /* Undo the speculative increase. */ - c = *ptr; /* Get the final character and fall through */ - break; - } - } - - /* Fall through if the escape just defined a single character (c >= 0). - This may be greater than 256. */ - - escape = 0; - - } /* End of backslash handling */ - - /* A character may be followed by '-' to form a range. However, Perl does - not permit ']' to be the end of the range. A '-' character at the end is - treated as a literal. Perl ignores orphaned \E sequences entirely. The - code for handling \Q and \E is messy. */ - - CHECK_RANGE: - while (ptr[1] == CHAR_BACKSLASH && ptr[2] == CHAR_E) - { - inescq = FALSE; - ptr += 2; - } - oldptr = ptr; - - /* Remember if \r or \n were explicitly used */ - - if (c == CHAR_CR || c == CHAR_NL) cd->external_flags |= PCRE_HASCRORLF; - - /* Check for range */ - - if (!inescq && ptr[1] == CHAR_MINUS) - { - pcre_uint32 d; - ptr += 2; - while (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_E) ptr += 2; - - /* If we hit \Q (not followed by \E) at this point, go into escaped - mode. */ - - while (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_Q) - { - ptr += 2; - if (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_E) - { ptr += 2; continue; } - inescq = TRUE; - break; - } - - /* Minus (hyphen) at the end of a class is treated as a literal, so put - back the pointer and jump to handle the character that preceded it. 
*/ - - if (*ptr == CHAR_NULL || (!inescq && *ptr == CHAR_RIGHT_SQUARE_BRACKET)) - { - ptr = oldptr; - goto CLASS_SINGLE_CHARACTER; - } - - /* Otherwise, we have a potential range; pick up the next character */ - -#ifdef SUPPORT_UTF - if (utf) - { /* Braces are required because the */ - GETCHARLEN(d, ptr, ptr); /* macro generates multiple statements */ - } - else -#endif - d = *ptr; /* Not UTF-8 mode */ - - /* The second part of a range can be a single-character escape - sequence, but not any of the other escapes. Perl treats a hyphen as a - literal in such circumstances. However, in Perl's warning mode, a - warning is given, so PCRE now faults it as it is almost certainly a - mistake on the user's part. */ - - if (!inescq) - { - if (d == CHAR_BACKSLASH) - { - int descape; - descape = check_escape(&ptr, &d, errorcodeptr, cd->bracount, options, TRUE); - if (*errorcodeptr != 0) goto FAILED; - - /* 0 means a character was put into d; \b is backspace; any other - special causes an error. */ - - if (descape != 0) - { - if (descape == ESC_b) d = CHAR_BS; else - { - *errorcodeptr = ERR83; - goto FAILED; - } - } - } - - /* A hyphen followed by a POSIX class is treated in the same way. */ - - else if (d == CHAR_LEFT_SQUARE_BRACKET && - (ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT || - ptr[1] == CHAR_EQUALS_SIGN) && - check_posix_syntax(ptr, &tempptr)) - { - *errorcodeptr = ERR83; - goto FAILED; - } - } - - /* Check that the two values are in the correct order. Optimize - one-character ranges. */ - - if (d < c) - { - *errorcodeptr = ERR8; - goto FAILED; - } - if (d == c) goto CLASS_SINGLE_CHARACTER; /* A few lines below */ - - /* We have found a character range, so single character optimizations - cannot be done anymore. Any value greater than 1 indicates that there - is more than one character. */ - - class_one_char = 2; - - /* Remember an explicit \r or \n, and add the range to the class. 
*/ - - if (d == CHAR_CR || d == CHAR_NL) cd->external_flags |= PCRE_HASCRORLF; - - class_has_8bitchar += - add_to_class(classbits, &class_uchardata, options, cd, c, d); - - continue; /* Go get the next char in the class */ - } - - /* Handle a single character - we can get here for a normal non-escape - char, or after \ that introduces a single character or for an apparent - range that isn't. Only the value 1 matters for class_one_char, so don't - increase it if it is already 2 or more ... just in case there's a class - with a zillion characters in it. */ - - CLASS_SINGLE_CHARACTER: - if (class_one_char < 2) class_one_char++; - - /* If xclass_has_prop is false and class_one_char is 1, we have the first - single character in the class, and there have been no prior ranges, or - XCLASS items generated by escapes. If this is the final character in the - class, we can optimize by turning the item into a 1-character OP_CHAR[I] - if it's positive, or OP_NOT[I] if it's negative. In the positive case, it - can cause firstchar to be set. Otherwise, there can be no first char if - this item is first, whatever repeat count may follow. In the case of - reqchar, save the previous value for reinstating. */ - - if (!inescq && -#ifdef SUPPORT_UCP - !xclass_has_prop && -#endif - class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET) - { - ptr++; - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - - if (negate_class) - { -#ifdef SUPPORT_UCP - int d; -#endif - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - - /* For caseless UTF-8 mode when UCP support is available, check - whether this character has more than one other case. If so, generate - a special OP_NOTPROP item instead of OP_NOTI. 
*/ - -#ifdef SUPPORT_UCP - if (utf && (options & PCRE_CASELESS) != 0 && - (d = UCD_CASESET(c)) != 0) - { - *code++ = OP_NOTPROP; - *code++ = PT_CLIST; - *code++ = d; - } - else -#endif - /* Char has only one other case, or UCP not available */ - - { - *code++ = ((options & PCRE_CASELESS) != 0)? OP_NOTI: OP_NOT; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && c > MAX_VALUE_FOR_SINGLE_CHAR) - code += PRIV(ord2utf)(c, code); - else -#endif - *code++ = c; - } - - /* We are finished with this character class */ - - goto END_CLASS; - } - - /* For a single, positive character, get the value into mcbuffer, and - then we can handle this with the normal one-character code. */ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && c > MAX_VALUE_FOR_SINGLE_CHAR) - mclength = PRIV(ord2utf)(c, mcbuffer); - else -#endif - { - mcbuffer[0] = c; - mclength = 1; - } - goto ONE_CHAR; - } /* End of 1-char optimization */ - - /* There is more than one character in the class, or an XCLASS item - has been generated. Add this character to the class. */ - - class_has_8bitchar += - add_to_class(classbits, &class_uchardata, options, cd, c, c); - } - - /* Loop until ']' reached. This "while" is the end of the "do" far above. - If we are at the end of an internal nested string, revert to the outer - string. */ - - while (((c = *(++ptr)) != CHAR_NULL || - (nestptr != NULL && - (ptr = nestptr, nestptr = NULL, c = *(++ptr)) != CHAR_NULL)) && - (c != CHAR_RIGHT_SQUARE_BRACKET || inescq)); - - /* Check for missing terminating ']' */ - - if (c == CHAR_NULL) - { - *errorcodeptr = ERR6; - goto FAILED; - } - - /* We will need an XCLASS if data has been placed in class_uchardata. In - the second phase this is a sufficient test. However, in the pre-compile - phase, class_uchardata gets emptied to prevent workspace overflow, so it - only if the very last character in the class needs XCLASS will it contain - anything at this point. 
For this reason, xclass gets set TRUE above when - uchar_classdata is emptied, and that's why this code is the way it is here - instead of just doing a test on class_uchardata below. */ - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - if (class_uchardata > class_uchardata_base) xclass = TRUE; -#endif - - /* If this is the first thing in the branch, there can be no first char - setting, whatever the repeat count. Any reqchar setting must remain - unchanged after any kind of repeat. */ - - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - - /* If there are characters with values > 255, we have to compile an - extended class, with its own opcode, unless there was a negated special - such as \S in the class, and PCRE_UCP is not set, because in that case all - characters > 255 are in the class, so any that were explicitly given as - well can be ignored. If (when there are explicit characters > 255 that must - be listed) there are no characters < 256, we can omit the bitmap in the - actual compiled code. */ - -#ifdef SUPPORT_UTF - if (xclass && (xclass_has_prop || !should_flip_negation || - (options & PCRE_UCP) != 0)) -#elif !defined COMPILE_PCRE8 - if (xclass && (xclass_has_prop || !should_flip_negation)) -#endif -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - { - /* For non-UCP wide characters, in a non-negative class containing \S or - similar (should_flip_negation is set), all characters greater than 255 - must be in the class. 
*/ - - if ( -#if defined COMPILE_PCRE8 - utf && -#endif - should_flip_negation && !negate_class && (options & PCRE_UCP) == 0) - { - *class_uchardata++ = XCL_RANGE; - if (utf) /* Will always be utf in the 8-bit library */ - { - class_uchardata += PRIV(ord2utf)(0x100, class_uchardata); - class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata); - } - else /* Can only happen for the 16-bit & 32-bit libraries */ - { -#if defined COMPILE_PCRE16 - *class_uchardata++ = 0x100; - *class_uchardata++ = 0xffffu; -#elif defined COMPILE_PCRE32 - *class_uchardata++ = 0x100; - *class_uchardata++ = 0xffffffffu; -#endif - } - } - - *class_uchardata++ = XCL_END; /* Marks the end of extra data */ - *code++ = OP_XCLASS; - code += LINK_SIZE; - *code = negate_class? XCL_NOT:0; - if (xclass_has_prop) *code |= XCL_HASPROP; - - /* If the map is required, move up the extra data to make room for it; - otherwise just move the code pointer to the end of the extra data. */ - - if (class_has_8bitchar > 0) - { - *code++ |= XCL_MAP; - memmove(code + (32 / sizeof(pcre_uchar)), code, - IN_UCHARS(class_uchardata - code)); - if (negate_class && !xclass_has_prop) - for (c = 0; c < 32; c++) classbits[c] = ~classbits[c]; - memcpy(code, classbits, 32); - code = class_uchardata + (32 / sizeof(pcre_uchar)); - } - else code = class_uchardata; - - /* Now fill in the complete length of the item */ - - PUT(previous, 1, (int)(code - previous)); - break; /* End of class handling */ - } - - /* Even though any XCLASS list is now discarded, we must allow for - its memory. */ - - if (lengthptr != NULL) - *lengthptr += (int)(class_uchardata - class_uchardata_base); -#endif - - /* If there are no characters > 255, or they are all to be included or - excluded, set the opcode to OP_CLASS or OP_NCLASS, depending on whether the - whole class was negated and whether there were negative specials such as \S - (non-UCP) in the class. Then copy the 32-byte map into the code vector, - negating it if necessary. 
*/ - - *code++ = (negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS; - if (lengthptr == NULL) /* Save time in the pre-compile phase */ - { - if (negate_class) - for (c = 0; c < 32; c++) classbits[c] = ~classbits[c]; - memcpy(code, classbits, 32); - } - code += 32 / sizeof(pcre_uchar); - - END_CLASS: - break; - - - /* ===================================================================*/ - /* Various kinds of repeat; '{' is not necessarily a quantifier, but this - has been tested above. */ - - case CHAR_LEFT_CURLY_BRACKET: - if (!is_quantifier) goto NORMAL_CHAR; - ptr = read_repeat_counts(ptr+1, &repeat_min, &repeat_max, errorcodeptr); - if (*errorcodeptr != 0) goto FAILED; - goto REPEAT; - - case CHAR_ASTERISK: - repeat_min = 0; - repeat_max = -1; - goto REPEAT; - - case CHAR_PLUS: - repeat_min = 1; - repeat_max = -1; - goto REPEAT; - - case CHAR_QUESTION_MARK: - repeat_min = 0; - repeat_max = 1; - - REPEAT: - if (previous == NULL) - { - *errorcodeptr = ERR9; - goto FAILED; - } - - if (repeat_min == 0) - { - firstchar = zerofirstchar; /* Adjust for zero repeat */ - firstcharflags = zerofirstcharflags; - reqchar = zeroreqchar; /* Ditto */ - reqcharflags = zeroreqcharflags; - } - - /* Remember whether this is a variable length repeat */ - - reqvary = (repeat_min == repeat_max)? 0 : REQ_VARY; - - op_type = 0; /* Default single-char op codes */ - possessive_quantifier = FALSE; /* Default not possessive quantifier */ - - /* Save start of previous item, in case we have to move it up in order to - insert something before it. */ - - tempcode = previous; - - /* Before checking for a possessive quantifier, we must skip over - whitespace and comments in extended mode because Perl allows white space at - this point. 
*/ - - if ((options & PCRE_EXTENDED) != 0) - { - const pcre_uchar *p = ptr + 1; - for (;;) - { - while (MAX_255(*p) && (cd->ctypes[*p] & ctype_space) != 0) p++; - if (*p != CHAR_NUMBER_SIGN) break; - p++; - while (*p != CHAR_NULL) - { - if (IS_NEWLINE(p)) /* For non-fixed-length newline cases, */ - { /* IS_NEWLINE sets cd->nllen. */ - p += cd->nllen; - break; - } - p++; -#ifdef SUPPORT_UTF - if (utf) FORWARDCHAR(p); -#endif - } /* Loop for comment characters */ - } /* Loop for multiple comments */ - ptr = p - 1; /* Character before the next significant one. */ - } - - /* We also need to skip over (?# comments, which are not dependent on - extended mode. */ - - if (ptr[1] == CHAR_LEFT_PARENTHESIS && ptr[2] == CHAR_QUESTION_MARK && - ptr[3] == CHAR_NUMBER_SIGN) - { - ptr += 4; - while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++; - if (*ptr == CHAR_NULL) - { - *errorcodeptr = ERR18; - goto FAILED; - } - } - - /* If the next character is '+', we have a possessive quantifier. This - implies greediness, whatever the setting of the PCRE_UNGREEDY option. - If the next character is '?' this is a minimizing repeat, by default, - but if PCRE_UNGREEDY is set, it works the other way round. We change the - repeat type to the non-default. */ - - if (ptr[1] == CHAR_PLUS) - { - repeat_type = 0; /* Force greedy */ - possessive_quantifier = TRUE; - ptr++; - } - else if (ptr[1] == CHAR_QUESTION_MARK) - { - repeat_type = greedy_non_default; - ptr++; - } - else repeat_type = greedy_default; - - /* If previous was a recursion call, wrap it in atomic brackets so that - previous becomes the atomic group. All recursions were so wrapped in the - past, but it no longer happens for non-repeated recursions. In fact, the - repeated ones could be re-implemented independently so as not to need this, - but for the moment we rely on the code for repeating groups. 
*/ - - if (*previous == OP_RECURSE) - { - memmove(previous + 1 + LINK_SIZE, previous, IN_UCHARS(1 + LINK_SIZE)); - *previous = OP_ONCE; - PUT(previous, 1, 2 + 2*LINK_SIZE); - previous[2 + 2*LINK_SIZE] = OP_KET; - PUT(previous, 3 + 2*LINK_SIZE, 2 + 2*LINK_SIZE); - code += 2 + 2 * LINK_SIZE; - length_prevgroup = 3 + 3*LINK_SIZE; - - /* When actually compiling, we need to check whether this was a forward - reference, and if so, adjust the offset. */ - - if (lengthptr == NULL && cd->hwm >= cd->start_workspace + LINK_SIZE) - { - int offset = GET(cd->hwm, -LINK_SIZE); - if (offset == previous + 1 - cd->start_code) - PUT(cd->hwm, -LINK_SIZE, offset + 1 + LINK_SIZE); - } - } - - /* Now handle repetition for the different types of item. */ - - /* If previous was a character or negated character match, abolish the item - and generate a repeat item instead. If a char item has a minimum of more - than one, ensure that it is set in reqchar - it might not be if a sequence - such as x{3} is the first thing in a branch because the x will have gone - into firstchar instead. */ - - if (*previous == OP_CHAR || *previous == OP_CHARI - || *previous == OP_NOT || *previous == OP_NOTI) - { - switch (*previous) - { - default: /* Make compiler happy. */ - case OP_CHAR: op_type = OP_STAR - OP_STAR; break; - case OP_CHARI: op_type = OP_STARI - OP_STAR; break; - case OP_NOT: op_type = OP_NOTSTAR - OP_STAR; break; - case OP_NOTI: op_type = OP_NOTSTARI - OP_STAR; break; - } - - /* Deal with UTF characters that take up more than one character. It's - easier to write this out separately than try to macrify it. Use c to - hold the length of the character in bytes, plus UTF_LENGTH to flag that - it's a length rather than a small character. 
*/ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && NOT_FIRSTCHAR(code[-1])) - { - pcre_uchar *lastchar = code - 1; - BACKCHAR(lastchar); - c = (int)(code - lastchar); /* Length of UTF-8 character */ - memcpy(utf_chars, lastchar, IN_UCHARS(c)); /* Save the char */ - c |= UTF_LENGTH; /* Flag c as a length */ - } - else -#endif /* SUPPORT_UTF */ - - /* Handle the case of a single charater - either with no UTF support, or - with UTF disabled, or for a single character UTF character. */ - { - c = code[-1]; - if (*previous <= OP_CHARI && repeat_min > 1) - { - reqchar = c; - reqcharflags = req_caseopt | cd->req_varyopt; - } - } - - goto OUTPUT_SINGLE_REPEAT; /* Code shared with single character types */ - } - - /* If previous was a character type match (\d or similar), abolish it and - create a suitable repeat item. The code is shared with single-character - repeats by setting op_type to add a suitable offset into repeat_type. Note - the the Unicode property types will be present only when SUPPORT_UCP is - defined, but we don't wrap the little bits of code here because it just - makes it horribly messy. */ - - else if (*previous < OP_EODN) - { - pcre_uchar *oldcode; - int prop_type, prop_value; - op_type = OP_TYPESTAR - OP_STAR; /* Use type opcodes */ - c = *previous; - - OUTPUT_SINGLE_REPEAT: - if (*previous == OP_PROP || *previous == OP_NOTPROP) - { - prop_type = previous[1]; - prop_value = previous[2]; - } - else prop_type = prop_value = -1; - - oldcode = code; - code = previous; /* Usually overwrite previous item */ - - /* If the maximum is zero then the minimum must also be zero; Perl allows - this case, so we do too - by simply omitting the item altogether. */ - - if (repeat_max == 0) goto END_REPEAT; - - /* Combine the op_type with the repeat_type */ - - repeat_type += op_type; - - /* A minimum of zero is handled either as the special case * or ?, or as - an UPTO, with the maximum given. 
*/ - - if (repeat_min == 0) - { - if (repeat_max == -1) *code++ = OP_STAR + repeat_type; - else if (repeat_max == 1) *code++ = OP_QUERY + repeat_type; - else - { - *code++ = OP_UPTO + repeat_type; - PUT2INC(code, 0, repeat_max); - } - } - - /* A repeat minimum of 1 is optimized into some special cases. If the - maximum is unlimited, we use OP_PLUS. Otherwise, the original item is - left in place and, if the maximum is greater than 1, we use OP_UPTO with - one less than the maximum. */ - - else if (repeat_min == 1) - { - if (repeat_max == -1) - *code++ = OP_PLUS + repeat_type; - else - { - code = oldcode; /* leave previous item in place */ - if (repeat_max == 1) goto END_REPEAT; - *code++ = OP_UPTO + repeat_type; - PUT2INC(code, 0, repeat_max - 1); - } - } - - /* The case {n,n} is just an EXACT, while the general case {n,m} is - handled as an EXACT followed by an UPTO. */ - - else - { - *code++ = OP_EXACT + op_type; /* NB EXACT doesn't have repeat_type */ - PUT2INC(code, 0, repeat_min); - - /* If the maximum is unlimited, insert an OP_STAR. Before doing so, - we have to insert the character for the previous code. For a repeated - Unicode property match, there are two extra bytes that define the - required property. In UTF-8 mode, long characters have their length in - c, with the UTF_LENGTH bit as a flag. */ - - if (repeat_max < 0) - { -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && (c & UTF_LENGTH) != 0) - { - memcpy(code, utf_chars, IN_UCHARS(c & 7)); - code += c & 7; - } - else -#endif - { - *code++ = c; - if (prop_type >= 0) - { - *code++ = prop_type; - *code++ = prop_value; - } - } - *code++ = OP_STAR + repeat_type; - } - - /* Else insert an UPTO if the max is greater than the min, again - preceded by the character, for the previously inserted code. If the - UPTO is just for 1 instance, we can use QUERY instead. 
*/ - - else if (repeat_max != repeat_min) - { -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && (c & UTF_LENGTH) != 0) - { - memcpy(code, utf_chars, IN_UCHARS(c & 7)); - code += c & 7; - } - else -#endif - *code++ = c; - if (prop_type >= 0) - { - *code++ = prop_type; - *code++ = prop_value; - } - repeat_max -= repeat_min; - - if (repeat_max == 1) - { - *code++ = OP_QUERY + repeat_type; - } - else - { - *code++ = OP_UPTO + repeat_type; - PUT2INC(code, 0, repeat_max); - } - } - } - - /* The character or character type itself comes last in all cases. */ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && (c & UTF_LENGTH) != 0) - { - memcpy(code, utf_chars, IN_UCHARS(c & 7)); - code += c & 7; - } - else -#endif - *code++ = c; - - /* For a repeated Unicode property match, there are two extra bytes that - define the required property. */ - -#ifdef SUPPORT_UCP - if (prop_type >= 0) - { - *code++ = prop_type; - *code++ = prop_value; - } -#endif - } - - /* If previous was a character class or a back reference, we put the repeat - stuff after it, but just skip the item if the repeat was {0,0}. */ - - else if (*previous == OP_CLASS || *previous == OP_NCLASS || -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - *previous == OP_XCLASS || -#endif - *previous == OP_REF || *previous == OP_REFI || - *previous == OP_DNREF || *previous == OP_DNREFI) - { - if (repeat_max == 0) - { - code = previous; - goto END_REPEAT; - } - - if (repeat_min == 0 && repeat_max == -1) - *code++ = OP_CRSTAR + repeat_type; - else if (repeat_min == 1 && repeat_max == -1) - *code++ = OP_CRPLUS + repeat_type; - else if (repeat_min == 0 && repeat_max == 1) - *code++ = OP_CRQUERY + repeat_type; - else - { - *code++ = OP_CRRANGE + repeat_type; - PUT2INC(code, 0, repeat_min); - if (repeat_max == -1) repeat_max = 0; /* 2-byte encoding for max */ - PUT2INC(code, 0, repeat_max); - } - } - - /* If previous was a bracket group, we may have to replicate it in certain - cases. 
Note that at this point we can encounter only the "basic" bracket - opcodes such as BRA and CBRA, as this is the place where they get converted - into the more special varieties such as BRAPOS and SBRA. A test for >= - OP_ASSERT and <= OP_COND includes ASSERT, ASSERT_NOT, ASSERTBACK, - ASSERTBACK_NOT, ONCE, ONCE_NC, BRA, BRAPOS, CBRA, CBRAPOS, and COND. - Originally, PCRE did not allow repetition of assertions, but now it does, - for Perl compatibility. */ - - else if (*previous >= OP_ASSERT && *previous <= OP_COND) - { - register int i; - int len = (int)(code - previous); - size_t base_hwm_offset = item_hwm_offset; - pcre_uchar *bralink = NULL; - pcre_uchar *brazeroptr = NULL; - - /* Repeating a DEFINE group is pointless, but Perl allows the syntax, so - we just ignore the repeat. */ - - if (*previous == OP_COND && previous[LINK_SIZE+1] == OP_DEF) - goto END_REPEAT; - - /* There is no sense in actually repeating assertions. The only potential - use of repetition is in cases when the assertion is optional. Therefore, - if the minimum is greater than zero, just ignore the repeat. If the - maximum is not zero or one, set it to 1. */ - - if (*previous < OP_ONCE) /* Assertion */ - { - if (repeat_min > 0) goto END_REPEAT; - if (repeat_max < 0 || repeat_max > 1) repeat_max = 1; - } - - /* The case of a zero minimum is special because of the need to stick - OP_BRAZERO in front of it, and because the group appears once in the - data, whereas in other cases it appears the minimum number of times. For - this reason, it is simplest to treat this case separately, as otherwise - the code gets far too messy. There are several special subcases when the - minimum is zero. 
*/ - - if (repeat_min == 0) - { - /* If the maximum is also zero, we used to just omit the group from the - output altogether, like this: - - ** if (repeat_max == 0) - ** { - ** code = previous; - ** goto END_REPEAT; - ** } - - However, that fails when a group or a subgroup within it is referenced - as a subroutine from elsewhere in the pattern, so now we stick in - OP_SKIPZERO in front of it so that it is skipped on execution. As we - don't have a list of which groups are referenced, we cannot do this - selectively. - - If the maximum is 1 or unlimited, we just have to stick in the BRAZERO - and do no more at this point. However, we do need to adjust any - OP_RECURSE calls inside the group that refer to the group itself or any - internal or forward referenced group, because the offset is from the - start of the whole regex. Temporarily terminate the pattern while doing - this. */ - - if (repeat_max <= 1) /* Covers 0, 1, and unlimited */ - { - *code = OP_END; - adjust_recurse(previous, 1, utf, cd, item_hwm_offset); - memmove(previous + 1, previous, IN_UCHARS(len)); - code++; - if (repeat_max == 0) - { - *previous++ = OP_SKIPZERO; - goto END_REPEAT; - } - brazeroptr = previous; /* Save for possessive optimizing */ - *previous++ = OP_BRAZERO + repeat_type; - } - - /* If the maximum is greater than 1 and limited, we have to replicate - in a nested fashion, sticking OP_BRAZERO before each set of brackets. - The first one has to be handled carefully because it's the original - copy, which has to be moved up. The remainder can be handled by code - that is common with the non-zero minimum case below. We have to - adjust the value or repeat_max, since one less copy is required. Once - again, we may have to adjust any OP_RECURSE calls inside the group. 
*/ - - else - { - int offset; - *code = OP_END; - adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, item_hwm_offset); - memmove(previous + 2 + LINK_SIZE, previous, IN_UCHARS(len)); - code += 2 + LINK_SIZE; - *previous++ = OP_BRAZERO + repeat_type; - *previous++ = OP_BRA; - - /* We chain together the bracket offset fields that have to be - filled in later when the ends of the brackets are reached. */ - - offset = (bralink == NULL)? 0 : (int)(previous - bralink); - bralink = previous; - PUTINC(previous, 0, offset); - } - - repeat_max--; - } - - /* If the minimum is greater than zero, replicate the group as many - times as necessary, and adjust the maximum to the number of subsequent - copies that we need. If we set a first char from the group, and didn't - set a required char, copy the latter from the former. If there are any - forward reference subroutine calls in the group, there will be entries on - the workspace list; replicate these with an appropriate increment. */ - - else - { - if (repeat_min > 1) - { - /* In the pre-compile phase, we don't actually do the replication. We - just adjust the length as if we had. Do some paranoid checks for - potential integer overflow. The INT64_OR_DOUBLE type is a 64-bit - integer type when available, otherwise double. */ - - if (lengthptr != NULL) - { - int delta = (repeat_min - 1)*length_prevgroup; - if ((INT64_OR_DOUBLE)(repeat_min - 1)* - (INT64_OR_DOUBLE)length_prevgroup > - (INT64_OR_DOUBLE)INT_MAX || - OFLOW_MAX - *lengthptr < delta) - { - *errorcodeptr = ERR20; - goto FAILED; - } - *lengthptr += delta; - } - - /* This is compiling for real. If there is a set first byte for - the group, and we have not yet set a "required byte", set it. Make - sure there is enough workspace for copying forward references before - doing the copy. 
*/ - - else - { - if (groupsetfirstchar && reqcharflags < 0) - { - reqchar = firstchar; - reqcharflags = firstcharflags; - } - - for (i = 1; i < repeat_min; i++) - { - pcre_uchar *hc; - size_t this_hwm_offset = cd->hwm - cd->start_workspace; - memcpy(code, previous, IN_UCHARS(len)); - - while (cd->hwm > cd->start_workspace + cd->workspace_size - - WORK_SIZE_SAFETY_MARGIN - - (this_hwm_offset - base_hwm_offset)) - { - *errorcodeptr = expand_workspace(cd); - if (*errorcodeptr != 0) goto FAILED; - } - - for (hc = (pcre_uchar *)cd->start_workspace + base_hwm_offset; - hc < (pcre_uchar *)cd->start_workspace + this_hwm_offset; - hc += LINK_SIZE) - { - PUT(cd->hwm, 0, GET(hc, 0) + len); - cd->hwm += LINK_SIZE; - } - base_hwm_offset = this_hwm_offset; - code += len; - } - } - } - - if (repeat_max > 0) repeat_max -= repeat_min; - } - - /* This code is common to both the zero and non-zero minimum cases. If - the maximum is limited, it replicates the group in a nested fashion, - remembering the bracket starts on a stack. In the case of a zero minimum, - the first one was set up above. In all cases the repeat_max now specifies - the number of additional copies needed. Again, we must remember to - replicate entries on the forward reference list. */ - - if (repeat_max >= 0) - { - /* In the pre-compile phase, we don't actually do the replication. We - just adjust the length as if we had. For each repetition we must add 1 - to the length for BRAZERO and for all but the last repetition we must - add 2 + 2*LINKSIZE to allow for the nesting that occurs. Do some - paranoid checks to avoid integer overflow. The INT64_OR_DOUBLE type is - a 64-bit integer type when available, otherwise double. 
*/ - - if (lengthptr != NULL && repeat_max > 0) - { - int delta = repeat_max * (length_prevgroup + 1 + 2 + 2*LINK_SIZE) - - 2 - 2*LINK_SIZE; /* Last one doesn't nest */ - if ((INT64_OR_DOUBLE)repeat_max * - (INT64_OR_DOUBLE)(length_prevgroup + 1 + 2 + 2*LINK_SIZE) - > (INT64_OR_DOUBLE)INT_MAX || - OFLOW_MAX - *lengthptr < delta) - { - *errorcodeptr = ERR20; - goto FAILED; - } - *lengthptr += delta; - } - - /* This is compiling for real */ - - else for (i = repeat_max - 1; i >= 0; i--) - { - pcre_uchar *hc; - size_t this_hwm_offset = cd->hwm - cd->start_workspace; - - *code++ = OP_BRAZERO + repeat_type; - - /* All but the final copy start a new nesting, maintaining the - chain of brackets outstanding. */ - - if (i != 0) - { - int offset; - *code++ = OP_BRA; - offset = (bralink == NULL)? 0 : (int)(code - bralink); - bralink = code; - PUTINC(code, 0, offset); - } - - memcpy(code, previous, IN_UCHARS(len)); - - /* Ensure there is enough workspace for forward references before - copying them. */ - - while (cd->hwm > cd->start_workspace + cd->workspace_size - - WORK_SIZE_SAFETY_MARGIN - - (this_hwm_offset - base_hwm_offset)) - { - *errorcodeptr = expand_workspace(cd); - if (*errorcodeptr != 0) goto FAILED; - } - - for (hc = (pcre_uchar *)cd->start_workspace + base_hwm_offset; - hc < (pcre_uchar *)cd->start_workspace + this_hwm_offset; - hc += LINK_SIZE) - { - PUT(cd->hwm, 0, GET(hc, 0) + len + ((i != 0)? 2+LINK_SIZE : 1)); - cd->hwm += LINK_SIZE; - } - base_hwm_offset = this_hwm_offset; - code += len; - } - - /* Now chain through the pending brackets, and fill in their length - fields (which are holding the chain links pro tem). */ - - while (bralink != NULL) - { - int oldlinkoffset; - int offset = (int)(code - bralink + 1); - pcre_uchar *bra = code - offset; - oldlinkoffset = GET(bra, 1); - bralink = (oldlinkoffset == 0)? 
NULL : bralink - oldlinkoffset; - *code++ = OP_KET; - PUTINC(code, 0, offset); - PUT(bra, 1, offset); - } - } - - /* If the maximum is unlimited, set a repeater in the final copy. For - ONCE brackets, that's all we need to do. However, possessively repeated - ONCE brackets can be converted into non-capturing brackets, as the - behaviour of (?:xx)++ is the same as (?>xx)++ and this saves having to - deal with possessive ONCEs specially. - - Otherwise, when we are doing the actual compile phase, check to see - whether this group is one that could match an empty string. If so, - convert the initial operator to the S form (e.g. OP_BRA -> OP_SBRA) so - that runtime checking can be done. [This check is also applied to ONCE - groups at runtime, but in a different way.] - - Then, if the quantifier was possessive and the bracket is not a - conditional, we convert the BRA code to the POS form, and the KET code to - KETRPOS. (It turns out to be convenient at runtime to detect this kind of - subpattern at both the start and at the end.) The use of special opcodes - makes it possible to reduce greatly the stack usage in pcre_exec(). If - the group is preceded by OP_BRAZERO, convert this to OP_BRAPOSZERO. - - Then, if the minimum number of matches is 1 or 0, cancel the possessive - flag so that the default action below, of wrapping everything inside - atomic brackets, does not happen. When the minimum is greater than 1, - there will be earlier copies of the group, and so we still have to wrap - the whole thing. */ - - else - { - pcre_uchar *ketcode = code - 1 - LINK_SIZE; - pcre_uchar *bracode = ketcode - GET(ketcode, 1); - - /* Convert possessive ONCE brackets to non-capturing */ - - if ((*bracode == OP_ONCE || *bracode == OP_ONCE_NC) && - possessive_quantifier) *bracode = OP_BRA; - - /* For non-possessive ONCE brackets, all we need to do is to - set the KET. 
*/ - - if (*bracode == OP_ONCE || *bracode == OP_ONCE_NC) - *ketcode = OP_KETRMAX + repeat_type; - - /* Handle non-ONCE brackets and possessive ONCEs (which have been - converted to non-capturing above). */ - - else - { - /* In the compile phase, check for empty string matching. */ - - if (lengthptr == NULL) - { - pcre_uchar *scode = bracode; - do - { - if (could_be_empty_branch(scode, ketcode, utf, cd, NULL)) - { - *bracode += OP_SBRA - OP_BRA; - break; - } - scode += GET(scode, 1); - } - while (*scode == OP_ALT); - } - - /* A conditional group with only one branch has an implicit empty - alternative branch. */ - - if (*bracode == OP_COND && bracode[GET(bracode,1)] != OP_ALT) - *bracode = OP_SCOND; - - /* Handle possessive quantifiers. */ - - if (possessive_quantifier) - { - /* For COND brackets, we wrap the whole thing in a possessively - repeated non-capturing bracket, because we have not invented POS - versions of the COND opcodes. Because we are moving code along, we - must ensure that any pending recursive references are updated. */ - - if (*bracode == OP_COND || *bracode == OP_SCOND) - { - int nlen = (int)(code - bracode); - *code = OP_END; - adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, item_hwm_offset); - memmove(bracode + 1 + LINK_SIZE, bracode, IN_UCHARS(nlen)); - code += 1 + LINK_SIZE; - nlen += 1 + LINK_SIZE; - *bracode = (*bracode == OP_COND)? OP_BRAPOS : OP_SBRAPOS; - *code++ = OP_KETRPOS; - PUTINC(code, 0, nlen); - PUT(bracode, 1, nlen); - } - - /* For non-COND brackets, we modify the BRA code and use KETRPOS. */ - - else - { - *bracode += 1; /* Switch to xxxPOS opcodes */ - *ketcode = OP_KETRPOS; - } - - /* If the minimum is zero, mark it as possessive, then unset the - possessive flag when the minimum is 0 or 1. 
*/ - - if (brazeroptr != NULL) *brazeroptr = OP_BRAPOSZERO; - if (repeat_min < 2) possessive_quantifier = FALSE; - } - - /* Non-possessive quantifier */ - - else *ketcode = OP_KETRMAX + repeat_type; - } - } - } - - /* If previous is OP_FAIL, it was generated by an empty class [] in - JavaScript mode. The other ways in which OP_FAIL can be generated, that is - by (*FAIL) or (?!) set previous to NULL, which gives a "nothing to repeat" - error above. We can just ignore the repeat in JS case. */ - - else if (*previous == OP_FAIL) goto END_REPEAT; - - /* Else there's some kind of shambles */ - - else - { - *errorcodeptr = ERR11; - goto FAILED; - } - - /* If the character following a repeat is '+', possessive_quantifier is - TRUE. For some opcodes, there are special alternative opcodes for this - case. For anything else, we wrap the entire repeated item inside OP_ONCE - brackets. Logically, the '+' notation is just syntactic sugar, taken from - Sun's Java package, but the special opcodes can optimize it. - - Some (but not all) possessively repeated subpatterns have already been - completely handled in the code just above. For them, possessive_quantifier - is always FALSE at this stage. Note that the repeated item starts at - tempcode, not at previous, which might be the first part of a string whose - (former) last char we repeated. */ - - if (possessive_quantifier) - { - int len; - - /* Possessifying an EXACT quantifier has no effect, so we can ignore it. - However, QUERY, STAR, or UPTO may follow (for quantifiers such as {5,6}, - {5,}, or {5,10}). We skip over an EXACT item; if the length of what - remains is greater than zero, there's a further opcode that can be - handled. If not, do nothing, leaving the EXACT alone. */ - - switch(*tempcode) - { - case OP_TYPEEXACT: - tempcode += PRIV(OP_lengths)[*tempcode] + - ((tempcode[1 + IMM2_SIZE] == OP_PROP - || tempcode[1 + IMM2_SIZE] == OP_NOTPROP)? 2 : 0); - break; - - /* CHAR opcodes are used for exacts whose count is 1. 
*/ - - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_EXACT: - case OP_EXACTI: - case OP_NOTEXACT: - case OP_NOTEXACTI: - tempcode += PRIV(OP_lengths)[*tempcode]; -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(tempcode[-1])) - tempcode += GET_EXTRALEN(tempcode[-1]); -#endif - break; - - /* For the class opcodes, the repeat operator appears at the end; - adjust tempcode to point to it. */ - - case OP_CLASS: - case OP_NCLASS: - tempcode += 1 + 32/sizeof(pcre_uchar); - break; - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - tempcode += GET(tempcode, 1); - break; -#endif - } - - /* If tempcode is equal to code (which points to the end of the repeated - item), it means we have skipped an EXACT item but there is no following - QUERY, STAR, or UPTO; the value of len will be 0, and we do nothing. In - all other cases, tempcode will be pointing to the repeat opcode, and will - be less than code, so the value of len will be greater than 0. */ - - len = (int)(code - tempcode); - if (len > 0) - { - unsigned int repcode = *tempcode; - - /* There is a table for possessifying opcodes, all of which are less - than OP_CALLOUT. A zero entry means there is no possessified version. - */ - - if (repcode < OP_CALLOUT && opcode_possessify[repcode] > 0) - *tempcode = opcode_possessify[repcode]; - - /* For opcode without a special possessified version, wrap the item in - ONCE brackets. Because we are moving code along, we must ensure that any - pending recursive references are updated. 
*/ - - else - { - *code = OP_END; - adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset); - memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len)); - code += 1 + LINK_SIZE; - len += 1 + LINK_SIZE; - tempcode[0] = OP_ONCE; - *code++ = OP_KET; - PUTINC(code, 0, len); - PUT(tempcode, 1, len); - } - } - -#ifdef NEVER - if (len > 0) switch (*tempcode) - { - case OP_STAR: *tempcode = OP_POSSTAR; break; - case OP_PLUS: *tempcode = OP_POSPLUS; break; - case OP_QUERY: *tempcode = OP_POSQUERY; break; - case OP_UPTO: *tempcode = OP_POSUPTO; break; - - case OP_STARI: *tempcode = OP_POSSTARI; break; - case OP_PLUSI: *tempcode = OP_POSPLUSI; break; - case OP_QUERYI: *tempcode = OP_POSQUERYI; break; - case OP_UPTOI: *tempcode = OP_POSUPTOI; break; - - case OP_NOTSTAR: *tempcode = OP_NOTPOSSTAR; break; - case OP_NOTPLUS: *tempcode = OP_NOTPOSPLUS; break; - case OP_NOTQUERY: *tempcode = OP_NOTPOSQUERY; break; - case OP_NOTUPTO: *tempcode = OP_NOTPOSUPTO; break; - - case OP_NOTSTARI: *tempcode = OP_NOTPOSSTARI; break; - case OP_NOTPLUSI: *tempcode = OP_NOTPOSPLUSI; break; - case OP_NOTQUERYI: *tempcode = OP_NOTPOSQUERYI; break; - case OP_NOTUPTOI: *tempcode = OP_NOTPOSUPTOI; break; - - case OP_TYPESTAR: *tempcode = OP_TYPEPOSSTAR; break; - case OP_TYPEPLUS: *tempcode = OP_TYPEPOSPLUS; break; - case OP_TYPEQUERY: *tempcode = OP_TYPEPOSQUERY; break; - case OP_TYPEUPTO: *tempcode = OP_TYPEPOSUPTO; break; - - case OP_CRSTAR: *tempcode = OP_CRPOSSTAR; break; - case OP_CRPLUS: *tempcode = OP_CRPOSPLUS; break; - case OP_CRQUERY: *tempcode = OP_CRPOSQUERY; break; - case OP_CRRANGE: *tempcode = OP_CRPOSRANGE; break; - - /* Because we are moving code along, we must ensure that any - pending recursive references are updated. 
*/ - - default: - *code = OP_END; - adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset); - memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len)); - code += 1 + LINK_SIZE; - len += 1 + LINK_SIZE; - tempcode[0] = OP_ONCE; - *code++ = OP_KET; - PUTINC(code, 0, len); - PUT(tempcode, 1, len); - break; - } -#endif - } - - /* In all case we no longer have a previous item. We also set the - "follows varying string" flag for subsequently encountered reqchars if - it isn't already set and we have just passed a varying length item. */ - - END_REPEAT: - previous = NULL; - cd->req_varyopt |= reqvary; - break; - - - /* ===================================================================*/ - /* Start of nested parenthesized sub-expression, or comment or lookahead or - lookbehind or option setting or condition or all the other extended - parenthesis forms. */ - - case CHAR_LEFT_PARENTHESIS: - ptr++; - - /* Now deal with various "verbs" that can be introduced by '*'. */ - - if (ptr[0] == CHAR_ASTERISK && (ptr[1] == ':' - || (MAX_255(ptr[1]) && ((cd->ctypes[ptr[1]] & ctype_letter) != 0)))) - { - int i, namelen; - int arglen = 0; - const char *vn = verbnames; - const pcre_uchar *name = ptr + 1; - const pcre_uchar *arg = NULL; - previous = NULL; - ptr++; - while (MAX_255(*ptr) && (cd->ctypes[*ptr] & ctype_letter) != 0) ptr++; - namelen = (int)(ptr - name); - - /* It appears that Perl allows any characters whatsoever, other than - a closing parenthesis, to appear in arguments, so we no longer insist on - letters, digits, and underscores. 
*/ - - if (*ptr == CHAR_COLON) - { - arg = ++ptr; - while (*ptr != CHAR_NULL && *ptr != CHAR_RIGHT_PARENTHESIS) ptr++; - arglen = (int)(ptr - arg); - if ((unsigned int)arglen > MAX_MARK) - { - *errorcodeptr = ERR75; - goto FAILED; - } - } - - if (*ptr != CHAR_RIGHT_PARENTHESIS) - { - *errorcodeptr = ERR60; - goto FAILED; - } - - /* Scan the table of verb names */ - - for (i = 0; i < verbcount; i++) - { - if (namelen == verbs[i].len && - STRNCMP_UC_C8(name, vn, namelen) == 0) - { - int setverb; - - /* Check for open captures before ACCEPT and convert it to - ASSERT_ACCEPT if in an assertion. */ - - if (verbs[i].op == OP_ACCEPT) - { - open_capitem *oc; - if (arglen != 0) - { - *errorcodeptr = ERR59; - goto FAILED; - } - cd->had_accept = TRUE; - for (oc = cd->open_caps; oc != NULL; oc = oc->next) - { - if (lengthptr != NULL) - { -#ifdef COMPILE_PCRE8 - *lengthptr += 1 + IMM2_SIZE; -#elif defined COMPILE_PCRE16 - *lengthptr += 2 + IMM2_SIZE; -#elif defined COMPILE_PCRE32 - *lengthptr += 4 + IMM2_SIZE; -#endif - } - else - { - *code++ = OP_CLOSE; - PUT2INC(code, 0, oc->number); - } - } - setverb = *code++ = - (cd->assert_depth > 0)? OP_ASSERT_ACCEPT : OP_ACCEPT; - - /* Do not set firstchar after *ACCEPT */ - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - } - - /* Handle other cases with/without an argument */ - - else if (arglen == 0) - { - if (verbs[i].op < 0) /* Argument is mandatory */ - { - *errorcodeptr = ERR66; - goto FAILED; - } - setverb = *code++ = verbs[i].op; - } - - else - { - if (verbs[i].op_arg < 0) /* Argument is forbidden */ - { - *errorcodeptr = ERR59; - goto FAILED; - } - setverb = *code++ = verbs[i].op_arg; - if (lengthptr != NULL) /* In pass 1 just add in the length */ - { /* to avoid potential workspace */ - *lengthptr += arglen; /* overflow. 
*/ - *code++ = 0; - } - else - { - *code++ = arglen; - memcpy(code, arg, IN_UCHARS(arglen)); - code += arglen; - } - *code++ = 0; - } - - switch (setverb) - { - case OP_THEN: - case OP_THEN_ARG: - cd->external_flags |= PCRE_HASTHEN; - break; - - case OP_PRUNE: - case OP_PRUNE_ARG: - case OP_SKIP: - case OP_SKIP_ARG: - cd->had_pruneorskip = TRUE; - break; - } - - break; /* Found verb, exit loop */ - } - - vn += verbs[i].len + 1; - } - - if (i < verbcount) continue; /* Successfully handled a verb */ - *errorcodeptr = ERR60; /* Verb not recognized */ - goto FAILED; - } - - /* Initialize for "real" parentheses */ - - newoptions = options; - skipbytes = 0; - bravalue = OP_CBRA; - item_hwm_offset = cd->hwm - cd->start_workspace; - reset_bracount = FALSE; - - /* Deal with the extended parentheses; all are introduced by '?', and the - appearance of any of them means that this is not a capturing group. */ - - if (*ptr == CHAR_QUESTION_MARK) - { - int i, set, unset, namelen; - int *optset; - const pcre_uchar *name; - pcre_uchar *slot; - - switch (*(++ptr)) - { - /* ------------------------------------------------------------ */ - case CHAR_VERTICAL_LINE: /* Reset capture count for each branch */ - reset_bracount = TRUE; - cd->dupgroups = TRUE; /* Record (?| encountered */ - /* Fall through */ - - /* ------------------------------------------------------------ */ - case CHAR_COLON: /* Non-capturing bracket */ - bravalue = OP_BRA; - ptr++; - break; - - - /* ------------------------------------------------------------ */ - case CHAR_LEFT_PARENTHESIS: - bravalue = OP_COND; /* Conditional group */ - tempptr = ptr; - - /* A condition can be an assertion, a number (referring to a numbered - group's having been set), a name (referring to a named group), or 'R', - referring to recursion. R<digits> and R&name are also permitted for - recursion tests. - - There are ways of testing a named group: (?(name)) is used by Python; - Perl 5.10 onwards uses (?(<name>) or (?('name')). 
- - There is one unfortunate ambiguity, caused by history. 'R' can be the - recursive thing or the name 'R' (and similarly for 'R' followed by - digits). We look for a name first; if not found, we try the other case. - - For compatibility with auto-callouts, we allow a callout to be - specified before a condition that is an assertion. First, check for the - syntax of a callout; if found, adjust the temporary pointer that is - used to check for an assertion condition. That's all that is needed! */ - - if (ptr[1] == CHAR_QUESTION_MARK && ptr[2] == CHAR_C) - { - for (i = 3;; i++) if (!IS_DIGIT(ptr[i])) break; - if (ptr[i] == CHAR_RIGHT_PARENTHESIS) - tempptr += i + 1; - - /* tempptr should now be pointing to the opening parenthesis of the - assertion condition. */ - - if (*tempptr != CHAR_LEFT_PARENTHESIS) - { - *errorcodeptr = ERR28; - goto FAILED; - } - } - - /* For conditions that are assertions, check the syntax, and then exit - the switch. This will take control down to where bracketed groups, - including assertions, are processed. */ - - if (tempptr[1] == CHAR_QUESTION_MARK && - (tempptr[2] == CHAR_EQUALS_SIGN || - tempptr[2] == CHAR_EXCLAMATION_MARK || - (tempptr[2] == CHAR_LESS_THAN_SIGN && - (tempptr[3] == CHAR_EQUALS_SIGN || - tempptr[3] == CHAR_EXCLAMATION_MARK)))) - { - cd->iscondassert = TRUE; - break; - } - - /* Other conditions use OP_CREF/OP_DNCREF/OP_RREF/OP_DNRREF, and all - need to skip at least 1+IMM2_SIZE bytes at the start of the group. */ - - code[1+LINK_SIZE] = OP_CREF; - skipbytes = 1+IMM2_SIZE; - refsign = -1; /* => not a number */ - namelen = -1; /* => not a name; must set to avoid warning */ - name = NULL; /* Always set to avoid warning */ - recno = 0; /* Always set to avoid warning */ - - /* Check for a test for recursion in a named group. 
*/ - - ptr++; - if (*ptr == CHAR_R && ptr[1] == CHAR_AMPERSAND) - { - terminator = -1; - ptr += 2; - code[1+LINK_SIZE] = OP_RREF; /* Change the type of test */ - } - - /* Check for a test for a named group's having been set, using the Perl - syntax (?() or (?('name'), and also allow for the original PCRE - syntax of (?(name) or for (?(+n), (?(-n), and just (?(n). */ - - else if (*ptr == CHAR_LESS_THAN_SIGN) - { - terminator = CHAR_GREATER_THAN_SIGN; - ptr++; - } - else if (*ptr == CHAR_APOSTROPHE) - { - terminator = CHAR_APOSTROPHE; - ptr++; - } - else - { - terminator = CHAR_NULL; - if (*ptr == CHAR_MINUS || *ptr == CHAR_PLUS) refsign = *ptr++; - else if (IS_DIGIT(*ptr)) refsign = 0; - } - - /* Handle a number */ - - if (refsign >= 0) - { - while (IS_DIGIT(*ptr)) - { - if (recno > INT_MAX / 10 - 1) /* Integer overflow */ - { - while (IS_DIGIT(*ptr)) ptr++; - *errorcodeptr = ERR61; - goto FAILED; - } - recno = recno * 10 + (int)(*ptr - CHAR_0); - ptr++; - } - } - - /* Otherwise we expect to read a name; anything else is an error. When - a name is one of a number of duplicates, a different opcode is used and - it needs more memory. Unfortunately we cannot tell whether a name is a - duplicate in the first pass, so we have to allow for more memory. */ - - else - { - if (IS_DIGIT(*ptr)) - { - *errorcodeptr = ERR84; - goto FAILED; - } - if (!MAX_255(*ptr) || (cd->ctypes[*ptr] & ctype_word) == 0) - { - *errorcodeptr = ERR28; /* Assertion expected */ - goto FAILED; - } - name = ptr++; - while (MAX_255(*ptr) && (cd->ctypes[*ptr] & ctype_word) != 0) - { - ptr++; - } - namelen = (int)(ptr - name); - if (lengthptr != NULL) skipbytes += IMM2_SIZE; - } - - /* Check the terminator */ - - if ((terminator > 0 && *ptr++ != (pcre_uchar)terminator) || - *ptr++ != CHAR_RIGHT_PARENTHESIS) - { - ptr--; /* Error offset */ - *errorcodeptr = ERR26; /* Malformed number or name */ - goto FAILED; - } - - /* Do no further checking in the pre-compile phase. 
*/ - - if (lengthptr != NULL) break; - - /* In the real compile we do the work of looking for the actual - reference. If refsign is not negative, it means we have a number in - recno. */ - - if (refsign >= 0) - { - if (recno <= 0) - { - *errorcodeptr = ERR35; - goto FAILED; - } - if (refsign != 0) recno = (refsign == CHAR_MINUS)? - cd->bracount - recno + 1 : recno + cd->bracount; - if (recno <= 0 || recno > cd->final_bracount) - { - *errorcodeptr = ERR15; - goto FAILED; - } - PUT2(code, 2+LINK_SIZE, recno); - if (recno > cd->top_backref) cd->top_backref = recno; - break; - } - - /* Otherwise look for the name. */ - - slot = cd->name_table; - for (i = 0; i < cd->names_found; i++) - { - if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) == 0 && - slot[IMM2_SIZE+namelen] == 0) break; - slot += cd->name_entry_size; - } - - /* Found the named subpattern. If the name is duplicated, add one to - the opcode to change CREF/RREF into DNCREF/DNRREF and insert - appropriate data values. Otherwise, just insert the unique subpattern - number. */ - - if (i < cd->names_found) - { - int offset = i++; - int count = 1; - recno = GET2(slot, 0); /* Number from first found */ - if (recno > cd->top_backref) cd->top_backref = recno; - for (; i < cd->names_found; i++) - { - slot += cd->name_entry_size; - if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) != 0 || - (slot+IMM2_SIZE)[namelen] != 0) break; - count++; - } - - if (count > 1) - { - PUT2(code, 2+LINK_SIZE, offset); - PUT2(code, 2+LINK_SIZE+IMM2_SIZE, count); - skipbytes += IMM2_SIZE; - code[1+LINK_SIZE]++; - } - else /* Not a duplicated name */ - { - PUT2(code, 2+LINK_SIZE, recno); - } - } - - /* If terminator == CHAR_NULL it means that the name followed directly - after the opening parenthesis [e.g. (?(abc)...] and in this case there - are some further alternatives to try. For the cases where terminator != - CHAR_NULL [things like (?(... or (?('name')... or (?(R&name)... 
] - we have now checked all the possibilities, so give an error. */ - - else if (terminator != CHAR_NULL) - { - *errorcodeptr = ERR15; - goto FAILED; - } - - /* Check for (?(R) for recursion. Allow digits after R to specify a - specific group number. */ - - else if (*name == CHAR_R) - { - recno = 0; - for (i = 1; i < namelen; i++) - { - if (!IS_DIGIT(name[i])) - { - *errorcodeptr = ERR15; - goto FAILED; - } - if (recno > INT_MAX / 10 - 1) /* Integer overflow */ - { - *errorcodeptr = ERR61; - goto FAILED; - } - recno = recno * 10 + name[i] - CHAR_0; - } - if (recno == 0) recno = RREF_ANY; - code[1+LINK_SIZE] = OP_RREF; /* Change test type */ - PUT2(code, 2+LINK_SIZE, recno); - } - - /* Similarly, check for the (?(DEFINE) "condition", which is always - false. */ - - else if (namelen == 6 && STRNCMP_UC_C8(name, STRING_DEFINE, 6) == 0) - { - code[1+LINK_SIZE] = OP_DEF; - skipbytes = 1; - } - - /* Reference to an unidentified subpattern. */ - - else - { - *errorcodeptr = ERR15; - goto FAILED; - } - break; - - - /* ------------------------------------------------------------ */ - case CHAR_EQUALS_SIGN: /* Positive lookahead */ - bravalue = OP_ASSERT; - cd->assert_depth += 1; - ptr++; - break; - - /* Optimize (?!) to (*FAIL) unless it is quantified - which is a weird - thing to do, but Perl allows all assertions to be quantified, and when - they contain capturing parentheses there may be a potential use for - this feature. Not that that applies to a quantified (?!) but we allow - it for uniformity. 
*/ - - /* ------------------------------------------------------------ */ - case CHAR_EXCLAMATION_MARK: /* Negative lookahead */ - ptr++; - if (*ptr == CHAR_RIGHT_PARENTHESIS && ptr[1] != CHAR_ASTERISK && - ptr[1] != CHAR_PLUS && ptr[1] != CHAR_QUESTION_MARK && - (ptr[1] != CHAR_LEFT_CURLY_BRACKET || !is_counted_repeat(ptr+2))) - { - *code++ = OP_FAIL; - previous = NULL; - continue; - } - bravalue = OP_ASSERT_NOT; - cd->assert_depth += 1; - break; - - - /* ------------------------------------------------------------ */ - case CHAR_LESS_THAN_SIGN: /* Lookbehind or named define */ - switch (ptr[1]) - { - case CHAR_EQUALS_SIGN: /* Positive lookbehind */ - bravalue = OP_ASSERTBACK; - cd->assert_depth += 1; - ptr += 2; - break; - - case CHAR_EXCLAMATION_MARK: /* Negative lookbehind */ - bravalue = OP_ASSERTBACK_NOT; - cd->assert_depth += 1; - ptr += 2; - break; - - default: /* Could be name define, else bad */ - if (MAX_255(ptr[1]) && (cd->ctypes[ptr[1]] & ctype_word) != 0) - goto DEFINE_NAME; - ptr++; /* Correct offset for error */ - *errorcodeptr = ERR24; - goto FAILED; - } - break; - - - /* ------------------------------------------------------------ */ - case CHAR_GREATER_THAN_SIGN: /* One-time brackets */ - bravalue = OP_ONCE; - ptr++; - break; - - - /* ------------------------------------------------------------ */ - case CHAR_C: /* Callout - may be followed by digits; */ - previous_callout = code; /* Save for later completion */ - after_manual_callout = 1; /* Skip one item before completing */ - *code++ = OP_CALLOUT; - { - int n = 0; - ptr++; - while(IS_DIGIT(*ptr)) - n = n * 10 + *ptr++ - CHAR_0; - if (*ptr != CHAR_RIGHT_PARENTHESIS) - { - *errorcodeptr = ERR39; - goto FAILED; - } - if (n > 255) - { - *errorcodeptr = ERR38; - goto FAILED; - } - *code++ = n; - PUT(code, 0, (int)(ptr - cd->start_pattern + 1)); /* Pattern offset */ - PUT(code, LINK_SIZE, 0); /* Default length */ - code += 2 * LINK_SIZE; - } - previous = NULL; - continue; - - - /* 
------------------------------------------------------------ */ - case CHAR_P: /* Python-style named subpattern handling */ - if (*(++ptr) == CHAR_EQUALS_SIGN || - *ptr == CHAR_GREATER_THAN_SIGN) /* Reference or recursion */ - { - is_recurse = *ptr == CHAR_GREATER_THAN_SIGN; - terminator = CHAR_RIGHT_PARENTHESIS; - goto NAMED_REF_OR_RECURSE; - } - else if (*ptr != CHAR_LESS_THAN_SIGN) /* Test for Python-style defn */ - { - *errorcodeptr = ERR41; - goto FAILED; - } - /* Fall through to handle (?P< as (?< is handled */ - - - /* ------------------------------------------------------------ */ - DEFINE_NAME: /* Come here from (?< handling */ - case CHAR_APOSTROPHE: - terminator = (*ptr == CHAR_LESS_THAN_SIGN)? - CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE; - name = ++ptr; - if (IS_DIGIT(*ptr)) - { - *errorcodeptr = ERR84; /* Group name must start with non-digit */ - goto FAILED; - } - while (MAX_255(*ptr) && (cd->ctypes[*ptr] & ctype_word) != 0) ptr++; - namelen = (int)(ptr - name); - - /* In the pre-compile phase, do a syntax check, remember the longest - name, and then remember the group in a vector, expanding it if - necessary. Duplicates for the same number are skipped; other duplicates - are checked for validity. In the actual compile, there is nothing to - do. */ - - if (lengthptr != NULL) - { - named_group *ng; - pcre_uint32 number = cd->bracount + 1; - - if (*ptr != (pcre_uchar)terminator) - { - *errorcodeptr = ERR42; - goto FAILED; - } - - if (cd->names_found >= MAX_NAME_COUNT) - { - *errorcodeptr = ERR49; - goto FAILED; - } - - if (namelen + IMM2_SIZE + 1 > cd->name_entry_size) - { - cd->name_entry_size = namelen + IMM2_SIZE + 1; - if (namelen > MAX_NAME_SIZE) - { - *errorcodeptr = ERR48; - goto FAILED; - } - } - - /* Scan the list to check for duplicates. For duplicate names, if the - number is the same, break the loop, which causes the name to be - discarded; otherwise, if DUPNAMES is not set, give an error. 
- If it is set, allow the name with a different number, but continue - scanning in case this is a duplicate with the same number. For - non-duplicate names, give an error if the number is duplicated. */ - - ng = cd->named_groups; - for (i = 0; i < cd->names_found; i++, ng++) - { - if (namelen == ng->length && - STRNCMP_UC_UC(name, ng->name, namelen) == 0) - { - if (ng->number == number) break; - if ((options & PCRE_DUPNAMES) == 0) - { - *errorcodeptr = ERR43; - goto FAILED; - } - cd->dupnames = TRUE; /* Duplicate names exist */ - } - else if (ng->number == number) - { - *errorcodeptr = ERR65; - goto FAILED; - } - } - - if (i >= cd->names_found) /* Not a duplicate with same number */ - { - /* Increase the list size if necessary */ - - if (cd->names_found >= cd->named_group_list_size) - { - int newsize = cd->named_group_list_size * 2; - named_group *newspace = (PUBL(malloc)) - (newsize * sizeof(named_group)); - - if (newspace == NULL) - { - *errorcodeptr = ERR21; - goto FAILED; - } - - memcpy(newspace, cd->named_groups, - cd->named_group_list_size * sizeof(named_group)); - if (cd->named_group_list_size > NAMED_GROUP_LIST_SIZE) - (PUBL(free))((void *)cd->named_groups); - cd->named_groups = newspace; - cd->named_group_list_size = newsize; - } - - cd->named_groups[cd->names_found].name = name; - cd->named_groups[cd->names_found].length = namelen; - cd->named_groups[cd->names_found].number = number; - cd->names_found++; - } - } - - ptr++; /* Move past > or ' in both passes. */ - goto NUMBERED_GROUP; - - - /* ------------------------------------------------------------ */ - case CHAR_AMPERSAND: /* Perl recursion/subroutine syntax */ - terminator = CHAR_RIGHT_PARENTHESIS; - is_recurse = TRUE; - /* Fall through */ - - /* We come here from the Python syntax above that handles both - references (?P=name) and recursion (?P>name), as well as falling - through from the Perl recursion syntax (?&name). 
We also come here from - the Perl \k or \k'name' back reference syntax and the \k{name} - .NET syntax, and the Oniguruma \g<...> and \g'...' subroutine syntax. */ - - NAMED_REF_OR_RECURSE: - name = ++ptr; - if (IS_DIGIT(*ptr)) - { - *errorcodeptr = ERR84; /* Group name must start with non-digit */ - goto FAILED; - } - while (MAX_255(*ptr) && (cd->ctypes[*ptr] & ctype_word) != 0) ptr++; - namelen = (int)(ptr - name); - - /* In the pre-compile phase, do a syntax check. We used to just set - a dummy reference number, because it was not used in the first pass. - However, with the change of recursive back references to be atomic, - we have to look for the number so that this state can be identified, as - otherwise the incorrect length is computed. If it's not a backwards - reference, the dummy number will do. */ - - if (lengthptr != NULL) - { - named_group *ng; - recno = 0; - - if (namelen == 0) - { - *errorcodeptr = ERR62; - goto FAILED; - } - if (*ptr != (pcre_uchar)terminator) - { - *errorcodeptr = ERR42; - goto FAILED; - } - if (namelen > MAX_NAME_SIZE) - { - *errorcodeptr = ERR48; - goto FAILED; - } - - /* Count named back references. */ - - if (!is_recurse) cd->namedrefcount++; - - /* We have to allow for a named reference to a duplicated name (this - cannot be determined until the second pass). This needs an extra - 16-bit data item. */ - - *lengthptr += IMM2_SIZE; - - /* If this is a forward reference and we are within a (?|...) group, - the reference may end up as the number of a group which we are - currently inside, that is, it could be a recursive reference. In the - real compile this will be picked up and the reference wrapped with - OP_ONCE to make it atomic, so we must space in case this occurs. */ - - /* In fact, this can happen for a non-forward reference because - another group with the same number might be created later. This - issue is fixed "properly" in PCRE2. 
As PCRE1 is now in maintenance - only mode, we finesse the bug by allowing more memory always. */ - - *lengthptr += 4 + 4*LINK_SIZE; - - /* It is even worse than that. The current reference may be to an - existing named group with a different number (so apparently not - recursive) but which later on is also attached to a group with the - current number. This can only happen if $(| has been previous - encountered. In that case, we allow yet more memory, just in case. - (Again, this is fixed "properly" in PCRE2. */ - - if (cd->dupgroups) *lengthptr += 4 + 4*LINK_SIZE; - - /* Otherwise, check for recursion here. The name table does not exist - in the first pass; instead we must scan the list of names encountered - so far in order to get the number. If the name is not found, leave - the value of recno as 0 for a forward reference. */ - - /* This patch (removing "else") fixes a problem when a reference is - to multiple identically named nested groups from within the nest. - Once again, it is not the "proper" fix, and it results in an - over-allocation of memory. */ - - /* else */ - { - ng = cd->named_groups; - for (i = 0; i < cd->names_found; i++, ng++) - { - if (namelen == ng->length && - STRNCMP_UC_UC(name, ng->name, namelen) == 0) - { - open_capitem *oc; - recno = ng->number; - if (is_recurse) break; - for (oc = cd->open_caps; oc != NULL; oc = oc->next) - { - if (oc->number == recno) - { - oc->flag = TRUE; - break; - } - } - } - } - } - } - - /* In the real compile, search the name table. We check the name - first, and then check that we have reached the end of the name in the - table. That way, if the name is longer than any in the table, the - comparison will fail without reading beyond the table entry. 
*/ - - else - { - slot = cd->name_table; - for (i = 0; i < cd->names_found; i++) - { - if (STRNCMP_UC_UC(name, slot+IMM2_SIZE, namelen) == 0 && - slot[IMM2_SIZE+namelen] == 0) - break; - slot += cd->name_entry_size; - } - - if (i < cd->names_found) - { - recno = GET2(slot, 0); - } - else - { - *errorcodeptr = ERR15; - goto FAILED; - } - } - - /* In both phases, for recursions, we can now go to the code than - handles numerical recursion. */ - - if (is_recurse) goto HANDLE_RECURSION; - - /* In the second pass we must see if the name is duplicated. If so, we - generate a different opcode. */ - - if (lengthptr == NULL && cd->dupnames) - { - int count = 1; - unsigned int index = i; - pcre_uchar *cslot = slot + cd->name_entry_size; - - for (i++; i < cd->names_found; i++) - { - if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break; - count++; - cslot += cd->name_entry_size; - } - - if (count > 1) - { - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - *code++ = ((options & PCRE_CASELESS) != 0)? OP_DNREFI : OP_DNREF; - PUT2INC(code, 0, index); - PUT2INC(code, 0, count); - - /* Process each potentially referenced group. */ - - for (; slot < cslot; slot += cd->name_entry_size) - { - open_capitem *oc; - recno = GET2(slot, 0); - cd->backref_map |= (recno < 32)? (1 << recno) : 1; - if (recno > cd->top_backref) cd->top_backref = recno; - - /* Check to see if this back reference is recursive, that it, it - is inside the group that it references. A flag is set so that the - group can be made atomic. */ - - for (oc = cd->open_caps; oc != NULL; oc = oc->next) - { - if (oc->number == recno) - { - oc->flag = TRUE; - break; - } - } - } - - continue; /* End of back ref handling */ - } - } - - /* First pass, or a non-duplicated name. 
*/ - - goto HANDLE_REFERENCE; - - - /* ------------------------------------------------------------ */ - case CHAR_R: /* Recursion, same as (?0) */ - recno = 0; - if (*(++ptr) != CHAR_RIGHT_PARENTHESIS) - { - *errorcodeptr = ERR29; - goto FAILED; - } - goto HANDLE_RECURSION; - - - /* ------------------------------------------------------------ */ - case CHAR_MINUS: case CHAR_PLUS: /* Recursion or subroutine */ - case CHAR_0: case CHAR_1: case CHAR_2: case CHAR_3: case CHAR_4: - case CHAR_5: case CHAR_6: case CHAR_7: case CHAR_8: case CHAR_9: - { - const pcre_uchar *called; - terminator = CHAR_RIGHT_PARENTHESIS; - - /* Come here from the \g<...> and \g'...' code (Oniguruma - compatibility). However, the syntax has been checked to ensure that - the ... are a (signed) number, so that neither ERR63 nor ERR29 will - be called on this path, nor with the jump to OTHER_CHAR_AFTER_QUERY - ever be taken. */ - - HANDLE_NUMERICAL_RECURSION: - - if ((refsign = *ptr) == CHAR_PLUS) - { - ptr++; - if (!IS_DIGIT(*ptr)) - { - *errorcodeptr = ERR63; - goto FAILED; - } - } - else if (refsign == CHAR_MINUS) - { - if (!IS_DIGIT(ptr[1])) - goto OTHER_CHAR_AFTER_QUERY; - ptr++; - } - - recno = 0; - while(IS_DIGIT(*ptr)) - { - if (recno > INT_MAX / 10 - 1) /* Integer overflow */ - { - while (IS_DIGIT(*ptr)) ptr++; - *errorcodeptr = ERR61; - goto FAILED; - } - recno = recno * 10 + *ptr++ - CHAR_0; - } - - if (*ptr != (pcre_uchar)terminator) - { - *errorcodeptr = ERR29; - goto FAILED; - } - - if (refsign == CHAR_MINUS) - { - if (recno == 0) - { - *errorcodeptr = ERR58; - goto FAILED; - } - recno = cd->bracount - recno + 1; - if (recno <= 0) - { - *errorcodeptr = ERR15; - goto FAILED; - } - } - else if (refsign == CHAR_PLUS) - { - if (recno == 0) - { - *errorcodeptr = ERR58; - goto FAILED; - } - recno += cd->bracount; - } - - /* Come here from code above that handles a named recursion */ - - HANDLE_RECURSION: - - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - called = 
cd->start_code; - - /* When we are actually compiling, find the bracket that is being - referenced. Temporarily end the regex in case it doesn't exist before - this point. If we end up with a forward reference, first check that - the bracket does occur later so we can give the error (and position) - now. Then remember this forward reference in the workspace so it can - be filled in at the end. */ - - if (lengthptr == NULL) - { - *code = OP_END; - if (recno != 0) - called = PRIV(find_bracket)(cd->start_code, utf, recno); - - /* Forward reference */ - - if (called == NULL) - { - if (recno > cd->final_bracount) - { - *errorcodeptr = ERR15; - goto FAILED; - } - - /* Fudge the value of "called" so that when it is inserted as an - offset below, what it actually inserted is the reference number - of the group. Then remember the forward reference. */ - - called = cd->start_code + recno; - if (cd->hwm >= cd->start_workspace + cd->workspace_size - - WORK_SIZE_SAFETY_MARGIN) - { - *errorcodeptr = expand_workspace(cd); - if (*errorcodeptr != 0) goto FAILED; - } - PUTINC(cd->hwm, 0, (int)(code + 1 - cd->start_code)); - } - - /* If not a forward reference, and the subpattern is still open, - this is a recursive call. We check to see if this is a left - recursion that could loop for ever, and diagnose that case. We - must not, however, do this check if we are in a conditional - subpattern because the condition might be testing for recursion in - a pattern such as /(?(R)a+|(?R)b)/, which is perfectly valid. - Forever loops are also detected at runtime, so those that occur in - conditional subpatterns will be picked up then. */ - - else if (GET(called, 1) == 0 && cond_depth <= 0 && - could_be_empty(called, code, bcptr, utf, cd)) - { - *errorcodeptr = ERR40; - goto FAILED; - } - } - - /* Insert the recursion/subroutine item. It does not have a set first - character (relevant if it is repeated, because it will then be - wrapped with ONCE brackets). 
*/ - - *code = OP_RECURSE; - PUT(code, 1, (int)(called - cd->start_code)); - code += 1 + LINK_SIZE; - groupsetfirstchar = FALSE; - } - - /* Can't determine a first byte now */ - - if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE; - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - continue; - - - /* ------------------------------------------------------------ */ - default: /* Other characters: check option setting */ - OTHER_CHAR_AFTER_QUERY: - set = unset = 0; - optset = &set; - - while (*ptr != CHAR_RIGHT_PARENTHESIS && *ptr != CHAR_COLON) - { - switch (*ptr++) - { - case CHAR_MINUS: optset = &unset; break; - - case CHAR_J: /* Record that it changed in the external options */ - *optset |= PCRE_DUPNAMES; - cd->external_flags |= PCRE_JCHANGED; - break; - - case CHAR_i: *optset |= PCRE_CASELESS; break; - case CHAR_m: *optset |= PCRE_MULTILINE; break; - case CHAR_s: *optset |= PCRE_DOTALL; break; - case CHAR_x: *optset |= PCRE_EXTENDED; break; - case CHAR_U: *optset |= PCRE_UNGREEDY; break; - case CHAR_X: *optset |= PCRE_EXTRA; break; - - default: *errorcodeptr = ERR12; - ptr--; /* Correct the offset */ - goto FAILED; - } - } - - /* Set up the changed option bits, but don't change anything yet. */ - - newoptions = (options | set) & (~unset); - - /* If the options ended with ')' this is not the start of a nested - group with option changes, so the options change at this level. - If we are not at the pattern start, reset the greedy defaults and the - case value for firstchar and reqchar. */ - - if (*ptr == CHAR_RIGHT_PARENTHESIS) - { - greedy_default = ((newoptions & PCRE_UNGREEDY) != 0); - greedy_non_default = greedy_default ^ 1; - req_caseopt = ((newoptions & PCRE_CASELESS) != 0)? REQ_CASELESS:0; - - /* Change options at this level, and pass them back for use - in subsequent branches. 
*/ - - *optionsptr = options = newoptions; - previous = NULL; /* This item can't be repeated */ - continue; /* It is complete */ - } - - /* If the options ended with ':' we are heading into a nested group - with possible change of options. Such groups are non-capturing and are - not assertions of any kind. All we need to do is skip over the ':'; - the newoptions value is handled below. */ - - bravalue = OP_BRA; - ptr++; - } /* End of switch for character following (? */ - } /* End of (? handling */ - - /* Opening parenthesis not followed by '*' or '?'. If PCRE_NO_AUTO_CAPTURE - is set, all unadorned brackets become non-capturing and behave like (?:...) - brackets. */ - - else if ((options & PCRE_NO_AUTO_CAPTURE) != 0) - { - bravalue = OP_BRA; - } - - /* Else we have a capturing group. */ - - else - { - NUMBERED_GROUP: - cd->bracount += 1; - PUT2(code, 1+LINK_SIZE, cd->bracount); - skipbytes = IMM2_SIZE; - } - - /* Process nested bracketed regex. First check for parentheses nested too - deeply. */ - - if ((cd->parens_depth += 1) > PARENS_NEST_LIMIT) - { - *errorcodeptr = ERR82; - goto FAILED; - } - - /* All assertions used not to be repeatable, but this was changed for Perl - compatibility. All kinds can now be repeated except for assertions that are - conditions (Perl also forbids these to be repeated). We copy code into a - non-register variable (tempcode) in order to be able to pass its address - because some compilers complain otherwise. At the start of a conditional - group whose condition is an assertion, cd->iscondassert is set. We unset it - here so as to allow assertions later in the group to be quantified. 
*/ - - if (bravalue >= OP_ASSERT && bravalue <= OP_ASSERTBACK_NOT && - cd->iscondassert) - { - previous = NULL; - cd->iscondassert = FALSE; - } - else - { - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - } - - *code = bravalue; - tempcode = code; - tempreqvary = cd->req_varyopt; /* Save value before bracket */ - tempbracount = cd->bracount; /* Save value before bracket */ - length_prevgroup = 0; /* Initialize for pre-compile phase */ - - if (!compile_regex( - newoptions, /* The complete new option state */ - &tempcode, /* Where to put code (updated) */ - &ptr, /* Input pointer (updated) */ - errorcodeptr, /* Where to put an error message */ - (bravalue == OP_ASSERTBACK || - bravalue == OP_ASSERTBACK_NOT), /* TRUE if back assert */ - reset_bracount, /* True if (?| group */ - skipbytes, /* Skip over bracket number */ - cond_depth + - ((bravalue == OP_COND)?1:0), /* Depth of condition subpatterns */ - &subfirstchar, /* For possible first char */ - &subfirstcharflags, - &subreqchar, /* For possible last char */ - &subreqcharflags, - bcptr, /* Current branch chain */ - cd, /* Tables block */ - (lengthptr == NULL)? NULL : /* Actual compile phase */ - &length_prevgroup /* Pre-compile phase */ - )) - goto FAILED; - - cd->parens_depth -= 1; - - /* If this was an atomic group and there are no capturing groups within it, - generate OP_ONCE_NC instead of OP_ONCE. */ - - if (bravalue == OP_ONCE && cd->bracount <= tempbracount) - *code = OP_ONCE_NC; - - if (bravalue >= OP_ASSERT && bravalue <= OP_ASSERTBACK_NOT) - cd->assert_depth -= 1; - - /* At the end of compiling, code is still pointing to the start of the - group, while tempcode has been updated to point past the end of the group. - The pattern pointer (ptr) is on the bracket. - - If this is a conditional bracket, check that there are no more than - two branches in the group, or just one if it's a DEFINE group. 
We do this - in the real compile phase, not in the pre-pass, where the whole group may - not be available. */ - - if (bravalue == OP_COND && lengthptr == NULL) - { - pcre_uchar *tc = code; - int condcount = 0; - - do { - condcount++; - tc += GET(tc,1); - } - while (*tc != OP_KET); - - /* A DEFINE group is never obeyed inline (the "condition" is always - false). It must have only one branch. */ - - if (code[LINK_SIZE+1] == OP_DEF) - { - if (condcount > 1) - { - *errorcodeptr = ERR54; - goto FAILED; - } - bravalue = OP_DEF; /* Just a flag to suppress char handling below */ - } - - /* A "normal" conditional group. If there is just one branch, we must not - make use of its firstchar or reqchar, because this is equivalent to an - empty second branch. */ - - else - { - if (condcount > 2) - { - *errorcodeptr = ERR27; - goto FAILED; - } - if (condcount == 1) subfirstcharflags = subreqcharflags = REQ_NONE; - } - } - - /* Error if hit end of pattern */ - - if (*ptr != CHAR_RIGHT_PARENTHESIS) - { - *errorcodeptr = ERR14; - goto FAILED; - } - - /* In the pre-compile phase, update the length by the length of the group, - less the brackets at either end. Then reduce the compiled code to just a - set of non-capturing brackets so that it doesn't use much memory if it is - duplicated by a quantifier.*/ - - if (lengthptr != NULL) - { - if (OFLOW_MAX - *lengthptr < length_prevgroup - 2 - 2*LINK_SIZE) - { - *errorcodeptr = ERR20; - goto FAILED; - } - *lengthptr += length_prevgroup - 2 - 2*LINK_SIZE; - code++; /* This already contains bravalue */ - PUTINC(code, 0, 1 + LINK_SIZE); - *code++ = OP_KET; - PUTINC(code, 0, 1 + LINK_SIZE); - break; /* No need to waste time with special character handling */ - } - - /* Otherwise update the main code pointer to the end of the group. */ - - code = tempcode; - - /* For a DEFINE group, required and first character settings are not - relevant. 
*/ - - if (bravalue == OP_DEF) break; - - /* Handle updating of the required and first characters for other types of - group. Update for normal brackets of all kinds, and conditions with two - branches (see code above). If the bracket is followed by a quantifier with - zero repeat, we have to back off. Hence the definition of zeroreqchar and - zerofirstchar outside the main loop so that they can be accessed for the - back off. */ - - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - groupsetfirstchar = FALSE; - - if (bravalue >= OP_ONCE) - { - /* If we have not yet set a firstchar in this branch, take it from the - subpattern, remembering that it was set here so that a repeat of more - than one can replicate it as reqchar if necessary. If the subpattern has - no firstchar, set "none" for the whole branch. In both cases, a zero - repeat forces firstchar to "none". */ - - if (firstcharflags == REQ_UNSET) - { - if (subfirstcharflags >= 0) - { - firstchar = subfirstchar; - firstcharflags = subfirstcharflags; - groupsetfirstchar = TRUE; - } - else firstcharflags = REQ_NONE; - zerofirstcharflags = REQ_NONE; - } - - /* If firstchar was previously set, convert the subpattern's firstchar - into reqchar if there wasn't one, using the vary flag that was in - existence beforehand. */ - - else if (subfirstcharflags >= 0 && subreqcharflags < 0) - { - subreqchar = subfirstchar; - subreqcharflags = subfirstcharflags | tempreqvary; - } - - /* If the subpattern set a required byte (or set a first byte that isn't - really the first byte - see above), set it. */ - - if (subreqcharflags >= 0) - { - reqchar = subreqchar; - reqcharflags = subreqcharflags; - } - } - - /* For a forward assertion, we take the reqchar, if set, provided that the - group has also set a first char. This can be helpful if the pattern that - follows the assertion doesn't set a different char. 
For example, it's - useful for /(?=abcde).+/. We can't set firstchar for an assertion, however - because it leads to incorrect effect for patterns such as /(?=a)a.+/ when - the "real" "a" would then become a reqchar instead of a firstchar. This is - overcome by a scan at the end if there's no firstchar, looking for an - asserted first char. */ - - else if (bravalue == OP_ASSERT && subreqcharflags >= 0 && - subfirstcharflags >= 0) - { - reqchar = subreqchar; - reqcharflags = subreqcharflags; - } - break; /* End of processing '(' */ - - - /* ===================================================================*/ - /* Handle metasequences introduced by \. For ones like \d, the ESC_ values - are arranged to be the negation of the corresponding OP_values in the - default case when PCRE_UCP is not set. For the back references, the values - are negative the reference number. Only back references and those types - that consume a character may be repeated. We can test for values between - ESC_b and ESC_Z for the latter; this may have to change if any new ones are - ever created. */ - - case CHAR_BACKSLASH: - tempptr = ptr; - escape = check_escape(&ptr, &ec, errorcodeptr, cd->bracount, options, FALSE); - if (*errorcodeptr != 0) goto FAILED; - - if (escape == 0) /* The escape coded a single character */ - c = ec; - else - { - /* For metasequences that actually match a character, we disable the - setting of a first character if it hasn't already been set. */ - - if (firstcharflags == REQ_UNSET && escape > ESC_b && escape < ESC_Z) - firstcharflags = REQ_NONE; - - /* Set values to reset to if this is followed by a zero repeat. */ - - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - - /* \g or \g'name' is a subroutine call by name and \g or \g'n' - is a subroutine call by number (Oniguruma syntax). In fact, the value - ESC_g is returned only for these cases. 
So we don't need to check for < - or ' if the value is ESC_g. For the Perl syntax \g{n} the value is - -n, and for the Perl syntax \g{name} the result is ESC_k (as - that is a synonym for a named back reference). */ - - if (escape == ESC_g) - { - const pcre_uchar *p; - pcre_uint32 cf; - - item_hwm_offset = cd->hwm - cd->start_workspace; /* Normally this is set when '(' is read */ - terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)? - CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE; - - /* These two statements stop the compiler for warning about possibly - unset variables caused by the jump to HANDLE_NUMERICAL_RECURSION. In - fact, because we do the check for a number below, the paths that - would actually be in error are never taken. */ - - skipbytes = 0; - reset_bracount = FALSE; - - /* If it's not a signed or unsigned number, treat it as a name. */ - - cf = ptr[1]; - if (cf != CHAR_PLUS && cf != CHAR_MINUS && !IS_DIGIT(cf)) - { - is_recurse = TRUE; - goto NAMED_REF_OR_RECURSE; - } - - /* Signed or unsigned number (cf = ptr[1]) is known to be plus or minus - or a digit. */ - - p = ptr + 2; - while (IS_DIGIT(*p)) p++; - if (*p != (pcre_uchar)terminator) - { - *errorcodeptr = ERR57; - goto FAILED; - } - ptr++; - goto HANDLE_NUMERICAL_RECURSION; - } - - /* \k or \k'name' is a back reference by name (Perl syntax). - We also support \k{name} (.NET syntax). */ - - if (escape == ESC_k) - { - if ((ptr[1] != CHAR_LESS_THAN_SIGN && - ptr[1] != CHAR_APOSTROPHE && ptr[1] != CHAR_LEFT_CURLY_BRACKET)) - { - *errorcodeptr = ERR69; - goto FAILED; - } - is_recurse = FALSE; - terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)? - CHAR_GREATER_THAN_SIGN : (*ptr == CHAR_APOSTROPHE)? - CHAR_APOSTROPHE : CHAR_RIGHT_CURLY_BRACKET; - goto NAMED_REF_OR_RECURSE; - } - - /* Back references are handled specially; must disable firstchar if - not set to cope with cases like (?=(\w+))\1: which would otherwise set - ':' later. 
*/ - - if (escape < 0) - { - open_capitem *oc; - recno = -escape; - - /* Come here from named backref handling when the reference is to a - single group (i.e. not to a duplicated name. */ - - HANDLE_REFERENCE: - if (firstcharflags == REQ_UNSET) zerofirstcharflags = firstcharflags = REQ_NONE; - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - *code++ = ((options & PCRE_CASELESS) != 0)? OP_REFI : OP_REF; - PUT2INC(code, 0, recno); - cd->backref_map |= (recno < 32)? (1 << recno) : 1; - if (recno > cd->top_backref) cd->top_backref = recno; - - /* Check to see if this back reference is recursive, that it, it - is inside the group that it references. A flag is set so that the - group can be made atomic. */ - - for (oc = cd->open_caps; oc != NULL; oc = oc->next) - { - if (oc->number == recno) - { - oc->flag = TRUE; - break; - } - } - } - - /* So are Unicode property matches, if supported. */ - -#ifdef SUPPORT_UCP - else if (escape == ESC_P || escape == ESC_p) - { - BOOL negated; - unsigned int ptype = 0, pdata = 0; - if (!get_ucp(&ptr, &negated, &ptype, &pdata, errorcodeptr)) - goto FAILED; - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - *code++ = ((escape == ESC_p) != negated)? OP_PROP : OP_NOTPROP; - *code++ = ptype; - *code++ = pdata; - } -#else - - /* If Unicode properties are not supported, \X, \P, and \p are not - allowed. */ - - else if (escape == ESC_X || escape == ESC_P || escape == ESC_p) - { - *errorcodeptr = ERR45; - goto FAILED; - } -#endif - - /* For the rest (including \X when Unicode properties are supported), we - can obtain the OP value by negating the escape value in the default - situation when PCRE_UCP is not set. When it *is* set, we substitute - Unicode property tests. Note that \b and \B do a one-character - lookbehind, and \A also behaves as if it does. 
*/ - - else - { - if ((escape == ESC_b || escape == ESC_B || escape == ESC_A) && - cd->max_lookbehind == 0) - cd->max_lookbehind = 1; -#ifdef SUPPORT_UCP - if (escape >= ESC_DU && escape <= ESC_wu) - { - nestptr = ptr + 1; /* Where to resume */ - ptr = substitutes[escape - ESC_DU] - 1; /* Just before substitute */ - } - else -#endif - /* In non-UTF-8 mode, we turn \C into OP_ALLANY instead of OP_ANYBYTE - so that it works in DFA mode and in lookbehinds. */ - - { - previous = (escape > ESC_b && escape < ESC_Z)? code : NULL; - item_hwm_offset = cd->hwm - cd->start_workspace; - *code++ = (!utf && escape == ESC_C)? OP_ALLANY : escape; - } - } - continue; - } - - /* We have a data character whose value is in c. In UTF-8 mode it may have - a value > 127. We set its representation in the length/buffer, and then - handle it as a data character. */ - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf && c > MAX_VALUE_FOR_SINGLE_CHAR) - mclength = PRIV(ord2utf)(c, mcbuffer); - else -#endif - - { - mcbuffer[0] = c; - mclength = 1; - } - goto ONE_CHAR; - - - /* ===================================================================*/ - /* Handle a literal character. It is guaranteed not to be whitespace or # - when the extended flag is set. If we are in a UTF mode, it may be a - multi-unit literal character. */ - - default: - NORMAL_CHAR: - mclength = 1; - mcbuffer[0] = c; - -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(c)) - ACROSSCHAR(TRUE, ptr[1], mcbuffer[mclength++] = *(++ptr)); -#endif - - /* At this point we have the character's bytes in mcbuffer, and the length - in mclength. When not in UTF-8 mode, the length is always 1. */ - - ONE_CHAR: - previous = code; - item_hwm_offset = cd->hwm - cd->start_workspace; - - /* For caseless UTF-8 mode when UCP support is available, check whether - this character has more than one other case. If so, generate a special - OP_PROP item instead of OP_CHARI. 
*/ - -#ifdef SUPPORT_UCP - if (utf && (options & PCRE_CASELESS) != 0) - { - GETCHAR(c, mcbuffer); - if ((c = UCD_CASESET(c)) != 0) - { - *code++ = OP_PROP; - *code++ = PT_CLIST; - *code++ = c; - if (firstcharflags == REQ_UNSET) - firstcharflags = zerofirstcharflags = REQ_NONE; - break; - } - } -#endif - - /* Caseful matches, or not one of the multicase characters. */ - - *code++ = ((options & PCRE_CASELESS) != 0)? OP_CHARI : OP_CHAR; - for (c = 0; c < mclength; c++) *code++ = mcbuffer[c]; - - /* Remember if \r or \n were seen */ - - if (mcbuffer[0] == CHAR_CR || mcbuffer[0] == CHAR_NL) - cd->external_flags |= PCRE_HASCRORLF; - - /* Set the first and required bytes appropriately. If no previous first - byte, set it from this character, but revert to none on a zero repeat. - Otherwise, leave the firstchar value alone, and don't change it on a zero - repeat. */ - - if (firstcharflags == REQ_UNSET) - { - zerofirstcharflags = REQ_NONE; - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - - /* If the character is more than one byte long, we can set firstchar - only if it is not to be matched caselessly. */ - - if (mclength == 1 || req_caseopt == 0) - { - firstchar = mcbuffer[0]; - firstcharflags = req_caseopt; - - if (mclength != 1) - { - reqchar = code[-1]; - reqcharflags = cd->req_varyopt; - } - } - else firstcharflags = reqcharflags = REQ_NONE; - } - - /* firstchar was previously set; we can set reqchar only if the length is - 1 or the matching is caseful. */ - - else - { - zerofirstchar = firstchar; - zerofirstcharflags = firstcharflags; - zeroreqchar = reqchar; - zeroreqcharflags = reqcharflags; - if (mclength == 1 || req_caseopt == 0) - { - reqchar = code[-1]; - reqcharflags = req_caseopt | cd->req_varyopt; - } - } - - break; /* End of literal character handling */ - } - } /* end of big loop */ - - -/* Control never reaches here by falling through, only by a goto for all the -error states. 
Pass back the position in the pattern so that it can be displayed -to the user for diagnosing the error. */ - -FAILED: -*ptrptr = ptr; -return FALSE; -} - - - -/************************************************* -* Compile sequence of alternatives * -*************************************************/ - -/* On entry, ptr is pointing past the bracket character, but on return it -points to the closing bracket, or vertical bar, or end of string. The code -variable is pointing at the byte into which the BRA operator has been stored. -This function is used during the pre-compile phase when we are trying to find -out the amount of memory needed, as well as during the real compile phase. The -value of lengthptr distinguishes the two phases. - -Arguments: - options option bits, including any changes for this subpattern - codeptr -> the address of the current code pointer - ptrptr -> the address of the current pattern pointer - errorcodeptr -> pointer to error code variable - lookbehind TRUE if this is a lookbehind assertion - reset_bracount TRUE to reset the count for each branch - skipbytes skip this many bytes at start (for brackets and OP_COND) - cond_depth depth of nesting for conditional subpatterns - firstcharptr place to put the first required character - firstcharflagsptr place to put the first character flags, or a negative number - reqcharptr place to put the last required character - reqcharflagsptr place to put the last required character flags, or a negative number - bcptr pointer to the chain of currently open branches - cd points to the data block with tables pointers etc. 
- lengthptr NULL during the real compile phase - points to length accumulator during pre-compile phase - -Returns: TRUE on success -*/ - -static BOOL -compile_regex(int options, pcre_uchar **codeptr, const pcre_uchar **ptrptr, - int *errorcodeptr, BOOL lookbehind, BOOL reset_bracount, int skipbytes, - int cond_depth, - pcre_uint32 *firstcharptr, pcre_int32 *firstcharflagsptr, - pcre_uint32 *reqcharptr, pcre_int32 *reqcharflagsptr, - branch_chain *bcptr, compile_data *cd, int *lengthptr) -{ -const pcre_uchar *ptr = *ptrptr; -pcre_uchar *code = *codeptr; -pcre_uchar *last_branch = code; -pcre_uchar *start_bracket = code; -pcre_uchar *reverse_count = NULL; -open_capitem capitem; -int capnumber = 0; -pcre_uint32 firstchar, reqchar; -pcre_int32 firstcharflags, reqcharflags; -pcre_uint32 branchfirstchar, branchreqchar; -pcre_int32 branchfirstcharflags, branchreqcharflags; -int length; -unsigned int orig_bracount; -unsigned int max_bracount; -branch_chain bc; -size_t save_hwm_offset; - -/* If set, call the external function that checks for stack availability. */ - -if (PUBL(stack_guard) != NULL && PUBL(stack_guard)()) - { - *errorcodeptr= ERR85; - return FALSE; - } - -/* Miscellaneous initialization */ - -bc.outer = bcptr; -bc.current_branch = code; - -firstchar = reqchar = 0; -firstcharflags = reqcharflags = REQ_UNSET; - -save_hwm_offset = cd->hwm - cd->start_workspace; - -/* Accumulate the length for use in the pre-compile phase. Start with the -length of the BRA and KET and any extra bytes that are required at the -beginning. We accumulate in a local variable to save frequent testing of -lenthptr for NULL. We cannot do this by looking at the value of code at the -start and end of each alternative, because compiled items are discarded during -the pre-compile phase so that the work space is not exceeded. 
*/ - -length = 2 + 2*LINK_SIZE + skipbytes; - -/* WARNING: If the above line is changed for any reason, you must also change -the code that abstracts option settings at the start of the pattern and makes -them global. It tests the value of length for (2 + 2*LINK_SIZE) in the -pre-compile phase to find out whether anything has yet been compiled or not. */ - -/* If this is a capturing subpattern, add to the chain of open capturing items -so that we can detect them if (*ACCEPT) is encountered. This is also used to -detect groups that contain recursive back references to themselves. Note that -only OP_CBRA need be tested here; changing this opcode to one of its variants, -e.g. OP_SCBRAPOS, happens later, after the group has been compiled. */ - -if (*code == OP_CBRA) - { - capnumber = GET2(code, 1 + LINK_SIZE); - capitem.number = capnumber; - capitem.next = cd->open_caps; - capitem.flag = FALSE; - cd->open_caps = &capitem; - } - -/* Offset is set zero to mark that this bracket is still open */ - -PUT(code, 1, 0); -code += 1 + LINK_SIZE + skipbytes; - -/* Loop for each alternative branch */ - -orig_bracount = max_bracount = cd->bracount; -for (;;) - { - /* For a (?| group, reset the capturing bracket count so that each branch - uses the same numbers. */ - - if (reset_bracount) cd->bracount = orig_bracount; - - /* Set up dummy OP_REVERSE if lookbehind assertion */ - - if (lookbehind) - { - *code++ = OP_REVERSE; - reverse_count = code; - PUTINC(code, 0, 0); - length += 1 + LINK_SIZE; - } - - /* Now compile the branch; in the pre-compile phase its length gets added - into the length. */ - - if (!compile_branch(&options, &code, &ptr, errorcodeptr, &branchfirstchar, - &branchfirstcharflags, &branchreqchar, &branchreqcharflags, &bc, - cond_depth, cd, (lengthptr == NULL)? NULL : &length)) - { - *ptrptr = ptr; - return FALSE; - } - - /* Keep the highest bracket count in case (?| was used and some branch - has fewer than the rest. 
*/ - - if (cd->bracount > max_bracount) max_bracount = cd->bracount; - - /* In the real compile phase, there is some post-processing to be done. */ - - if (lengthptr == NULL) - { - /* If this is the first branch, the firstchar and reqchar values for the - branch become the values for the regex. */ - - if (*last_branch != OP_ALT) - { - firstchar = branchfirstchar; - firstcharflags = branchfirstcharflags; - reqchar = branchreqchar; - reqcharflags = branchreqcharflags; - } - - /* If this is not the first branch, the first char and reqchar have to - match the values from all the previous branches, except that if the - previous value for reqchar didn't have REQ_VARY set, it can still match, - and we set REQ_VARY for the regex. */ - - else - { - /* If we previously had a firstchar, but it doesn't match the new branch, - we have to abandon the firstchar for the regex, but if there was - previously no reqchar, it takes on the value of the old firstchar. */ - - if (firstcharflags >= 0 && - (firstcharflags != branchfirstcharflags || firstchar != branchfirstchar)) - { - if (reqcharflags < 0) - { - reqchar = firstchar; - reqcharflags = firstcharflags; - } - firstcharflags = REQ_NONE; - } - - /* If we (now or from before) have no firstchar, a firstchar from the - branch becomes a reqchar if there isn't a branch reqchar. */ - - if (firstcharflags < 0 && branchfirstcharflags >= 0 && branchreqcharflags < 0) - { - branchreqchar = branchfirstchar; - branchreqcharflags = branchfirstcharflags; - } - - /* Now ensure that the reqchars match */ - - if (((reqcharflags & ~REQ_VARY) != (branchreqcharflags & ~REQ_VARY)) || - reqchar != branchreqchar) - reqcharflags = REQ_NONE; - else - { - reqchar = branchreqchar; - reqcharflags |= branchreqcharflags; /* To "or" REQ_VARY */ - } - } - - /* If lookbehind, check that this branch matches a fixed-length string, and - put the length into the OP_REVERSE item. Temporarily mark the end of the - branch with OP_END. 
If the branch contains OP_RECURSE, the result is -3 - because there may be forward references that we can't check here. Set a - flag to cause another lookbehind check at the end. Why not do it all at the - end? Because common, erroneous checks are picked up here and the offset of - the problem can be shown. */ - - if (lookbehind) - { - int fixed_length; - *code = OP_END; - fixed_length = find_fixedlength(last_branch, (options & PCRE_UTF8) != 0, - FALSE, cd, NULL); - DPRINTF(("fixed length = %d\n", fixed_length)); - if (fixed_length == -3) - { - cd->check_lookbehind = TRUE; - } - else if (fixed_length < 0) - { - *errorcodeptr = (fixed_length == -2)? ERR36 : - (fixed_length == -4)? ERR70: ERR25; - *ptrptr = ptr; - return FALSE; - } - else - { - if (fixed_length > cd->max_lookbehind) - cd->max_lookbehind = fixed_length; - PUT(reverse_count, 0, fixed_length); - } - } - } - - /* Reached end of expression, either ')' or end of pattern. In the real - compile phase, go back through the alternative branches and reverse the chain - of offsets, with the field in the BRA item now becoming an offset to the - first alternative. If there are no alternatives, it points to the end of the - group. The length in the terminating ket is always the length of the whole - bracketed item. Return leaving the pointer at the terminating char. */ - - if (*ptr != CHAR_VERTICAL_LINE) - { - if (lengthptr == NULL) - { - int branch_length = (int)(code - last_branch); - do - { - int prev_length = GET(last_branch, 1); - PUT(last_branch, 1, branch_length); - branch_length = prev_length; - last_branch -= branch_length; - } - while (branch_length > 0); - } - - /* Fill in the ket */ - - *code = OP_KET; - PUT(code, 1, (int)(code - start_bracket)); - code += 1 + LINK_SIZE; - - /* If it was a capturing subpattern, check to see if it contained any - recursive back references. If so, we must wrap it in atomic brackets. 
- Because we are moving code along, we must ensure that any pending recursive - references are updated. In any event, remove the block from the chain. */ - - if (capnumber > 0) - { - if (cd->open_caps->flag) - { - *code = OP_END; - adjust_recurse(start_bracket, 1 + LINK_SIZE, - (options & PCRE_UTF8) != 0, cd, save_hwm_offset); - memmove(start_bracket + 1 + LINK_SIZE, start_bracket, - IN_UCHARS(code - start_bracket)); - *start_bracket = OP_ONCE; - code += 1 + LINK_SIZE; - PUT(start_bracket, 1, (int)(code - start_bracket)); - *code = OP_KET; - PUT(code, 1, (int)(code - start_bracket)); - code += 1 + LINK_SIZE; - length += 2 + 2*LINK_SIZE; - } - cd->open_caps = cd->open_caps->next; - } - - /* Retain the highest bracket number, in case resetting was used. */ - - cd->bracount = max_bracount; - - /* Set values to pass back */ - - *codeptr = code; - *ptrptr = ptr; - *firstcharptr = firstchar; - *firstcharflagsptr = firstcharflags; - *reqcharptr = reqchar; - *reqcharflagsptr = reqcharflags; - if (lengthptr != NULL) - { - if (OFLOW_MAX - *lengthptr < length) - { - *errorcodeptr = ERR20; - return FALSE; - } - *lengthptr += length; - } - return TRUE; - } - - /* Another branch follows. In the pre-compile phase, we can move the code - pointer back to where it was for the start of the first branch. (That is, - pretend that each branch is the only one.) - - In the real compile phase, insert an ALT node. Its length field points back - to the previous branch while the bracket remains open. At the end the chain - is reversed. It's done like this so that the start of the bracket has a - zero offset until it is closed, making it possible to detect recursion. 
*/ - - if (lengthptr != NULL) - { - code = *codeptr + 1 + LINK_SIZE + skipbytes; - length += 1 + LINK_SIZE; - } - else - { - *code = OP_ALT; - PUT(code, 1, (int)(code - last_branch)); - bc.current_branch = last_branch = code; - code += 1 + LINK_SIZE; - } - - ptr++; - } -/* Control never reaches here */ -} - - - - -/************************************************* -* Check for anchored expression * -*************************************************/ - -/* Try to find out if this is an anchored regular expression. Consider each -alternative branch. If they all start with OP_SOD or OP_CIRC, or with a bracket -all of whose alternatives start with OP_SOD or OP_CIRC (recurse ad lib), then -it's anchored. However, if this is a multiline pattern, then only OP_SOD will -be found, because ^ generates OP_CIRCM in that mode. - -We can also consider a regex to be anchored if OP_SOM starts all its branches. -This is the code for \G, which means "match at start of match position, taking -into account the match offset". - -A branch is also implicitly anchored if it starts with .* and DOTALL is set, -because that will try the rest of the pattern at all possible matching points, -so there is no point trying again.... er .... - -.... except when the .* appears inside capturing parentheses, and there is a -subsequent back reference to those parentheses. We haven't enough information -to catch that case precisely. - -At first, the best we could do was to detect when .* was in capturing brackets -and the highest back reference was greater than or equal to that level. -However, by keeping a bitmap of the first 31 back references, we can catch some -of the more common cases more precisely. - -... A second exception is when the .* appears inside an atomic group, because -this prevents the number of characters it matches from being adjusted. 
- -Arguments: - code points to start of expression (the bracket) - bracket_map a bitmap of which brackets we are inside while testing; this - handles up to substring 31; after that we just have to take - the less precise approach - cd points to the compile data block - atomcount atomic group level - -Returns: TRUE or FALSE -*/ - -static BOOL -is_anchored(register const pcre_uchar *code, unsigned int bracket_map, - compile_data *cd, int atomcount) -{ -do { - const pcre_uchar *scode = first_significant_code( - code + PRIV(OP_lengths)[*code], FALSE); - register int op = *scode; - - /* Non-capturing brackets */ - - if (op == OP_BRA || op == OP_BRAPOS || - op == OP_SBRA || op == OP_SBRAPOS) - { - if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE; - } - - /* Capturing brackets */ - - else if (op == OP_CBRA || op == OP_CBRAPOS || - op == OP_SCBRA || op == OP_SCBRAPOS) - { - int n = GET2(scode, 1+LINK_SIZE); - int new_map = bracket_map | ((n < 32)? (1 << n) : 1); - if (!is_anchored(scode, new_map, cd, atomcount)) return FALSE; - } - - /* Positive forward assertion */ - - else if (op == OP_ASSERT) - { - if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE; - } - - /* Condition; not anchored if no second branch */ - - else if (op == OP_COND) - { - if (scode[GET(scode,1)] != OP_ALT) return FALSE; - if (!is_anchored(scode, bracket_map, cd, atomcount)) return FALSE; - } - - /* Atomic groups */ - - else if (op == OP_ONCE || op == OP_ONCE_NC) - { - if (!is_anchored(scode, bracket_map, cd, atomcount + 1)) - return FALSE; - } - - /* .* is not anchored unless DOTALL is set (which generates OP_ALLANY) and - it isn't in brackets that are or may be referenced or inside an atomic - group. 
*/ - - else if ((op == OP_TYPESTAR || op == OP_TYPEMINSTAR || - op == OP_TYPEPOSSTAR)) - { - if (scode[1] != OP_ALLANY || (bracket_map & cd->backref_map) != 0 || - atomcount > 0 || cd->had_pruneorskip) - return FALSE; - } - - /* Check for explicit anchoring */ - - else if (op != OP_SOD && op != OP_SOM && op != OP_CIRC) return FALSE; - - code += GET(code, 1); - } -while (*code == OP_ALT); /* Loop for each alternative */ -return TRUE; -} - - - -/************************************************* -* Check for starting with ^ or .* * -*************************************************/ - -/* This is called to find out if every branch starts with ^ or .* so that -"first char" processing can be done to speed things up in multiline -matching and for non-DOTALL patterns that start with .* (which must start at -the beginning or after \n). As in the case of is_anchored() (see above), we -have to take account of back references to capturing brackets that contain .* -because in that case we can't make the assumption. Also, the appearance of .* -inside atomic brackets or in an assertion, or in a pattern that contains *PRUNE -or *SKIP does not count, because once again the assumption no longer holds. - -Arguments: - code points to start of expression (the bracket) - bracket_map a bitmap of which brackets we are inside while testing; this - handles up to substring 31; after that we just have to take - the less precise approach - cd points to the compile data - atomcount atomic group level - inassert TRUE if in an assertion - -Returns: TRUE or FALSE -*/ - -static BOOL -is_startline(const pcre_uchar *code, unsigned int bracket_map, - compile_data *cd, int atomcount, BOOL inassert) -{ -do { - const pcre_uchar *scode = first_significant_code( - code + PRIV(OP_lengths)[*code], FALSE); - register int op = *scode; - - /* If we are at the start of a conditional assertion group, *both* the - conditional assertion *and* what follows the condition must satisfy the test - for start of line. 
Other kinds of condition fail. Note that there may be an - auto-callout at the start of a condition. */ - - if (op == OP_COND) - { - scode += 1 + LINK_SIZE; - if (*scode == OP_CALLOUT) scode += PRIV(OP_lengths)[OP_CALLOUT]; - switch (*scode) - { - case OP_CREF: - case OP_DNCREF: - case OP_RREF: - case OP_DNRREF: - case OP_DEF: - case OP_FAIL: - return FALSE; - - default: /* Assertion */ - if (!is_startline(scode, bracket_map, cd, atomcount, TRUE)) return FALSE; - do scode += GET(scode, 1); while (*scode == OP_ALT); - scode += 1 + LINK_SIZE; - break; - } - scode = first_significant_code(scode, FALSE); - op = *scode; - } - - /* Non-capturing brackets */ - - if (op == OP_BRA || op == OP_BRAPOS || - op == OP_SBRA || op == OP_SBRAPOS) - { - if (!is_startline(scode, bracket_map, cd, atomcount, inassert)) return FALSE; - } - - /* Capturing brackets */ - - else if (op == OP_CBRA || op == OP_CBRAPOS || - op == OP_SCBRA || op == OP_SCBRAPOS) - { - int n = GET2(scode, 1+LINK_SIZE); - int new_map = bracket_map | ((n < 32)? (1 << n) : 1); - if (!is_startline(scode, new_map, cd, atomcount, inassert)) return FALSE; - } - - /* Positive forward assertions */ - - else if (op == OP_ASSERT) - { - if (!is_startline(scode, bracket_map, cd, atomcount, TRUE)) return FALSE; - } - - /* Atomic brackets */ - - else if (op == OP_ONCE || op == OP_ONCE_NC) - { - if (!is_startline(scode, bracket_map, cd, atomcount + 1, inassert)) return FALSE; - } - - /* .* means "start at start or after \n" if it isn't in atomic brackets or - brackets that may be referenced or an assertion, as long as the pattern does - not contain *PRUNE or *SKIP, because these break the feature. Consider, for - example, /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", i.e. - not at the start of a line. 
*/ - - else if (op == OP_TYPESTAR || op == OP_TYPEMINSTAR || op == OP_TYPEPOSSTAR) - { - if (scode[1] != OP_ANY || (bracket_map & cd->backref_map) != 0 || - atomcount > 0 || cd->had_pruneorskip || inassert) - return FALSE; - } - - /* Check for explicit circumflex; anything else gives a FALSE result. Note - in particular that this includes atomic brackets OP_ONCE and OP_ONCE_NC - because the number of characters matched by .* cannot be adjusted inside - them. */ - - else if (op != OP_CIRC && op != OP_CIRCM) return FALSE; - - /* Move on to the next alternative */ - - code += GET(code, 1); - } -while (*code == OP_ALT); /* Loop for each alternative */ -return TRUE; -} - - - -/************************************************* -* Check for asserted fixed first char * -*************************************************/ - -/* During compilation, the "first char" settings from forward assertions are -discarded, because they can cause conflicts with actual literals that follow. -However, if we end up without a first char setting for an unanchored pattern, -it is worth scanning the regex to see if there is an initial asserted first -char. If all branches start with the same asserted char, or with a -non-conditional bracket all of whose alternatives start with the same asserted -char (recurse ad lib), then we return that char, with the flags set to zero or -REQ_CASELESS; otherwise return zero with REQ_NONE in the flags. - -Arguments: - code points to start of expression (the bracket) - flags points to the first char flags, or to REQ_NONE - inassert TRUE if in an assertion - -Returns: the fixed first char, or 0 with REQ_NONE in flags -*/ - -static pcre_uint32 -find_firstassertedchar(const pcre_uchar *code, pcre_int32 *flags, - BOOL inassert) -{ -register pcre_uint32 c = 0; -int cflags = REQ_NONE; - -*flags = REQ_NONE; -do { - pcre_uint32 d; - int dflags; - int xl = (*code == OP_CBRA || *code == OP_SCBRA || - *code == OP_CBRAPOS || *code == OP_SCBRAPOS)? 
IMM2_SIZE:0; - const pcre_uchar *scode = first_significant_code(code + 1+LINK_SIZE + xl, - TRUE); - register pcre_uchar op = *scode; - - switch(op) - { - default: - return 0; - - case OP_BRA: - case OP_BRAPOS: - case OP_CBRA: - case OP_SCBRA: - case OP_CBRAPOS: - case OP_SCBRAPOS: - case OP_ASSERT: - case OP_ONCE: - case OP_ONCE_NC: - d = find_firstassertedchar(scode, &dflags, op == OP_ASSERT); - if (dflags < 0) - return 0; - if (cflags < 0) { c = d; cflags = dflags; } else if (c != d || cflags != dflags) return 0; - break; - - case OP_EXACT: - scode += IMM2_SIZE; - /* Fall through */ - - case OP_CHAR: - case OP_PLUS: - case OP_MINPLUS: - case OP_POSPLUS: - if (!inassert) return 0; - if (cflags < 0) { c = scode[1]; cflags = 0; } - else if (c != scode[1]) return 0; - break; - - case OP_EXACTI: - scode += IMM2_SIZE; - /* Fall through */ - - case OP_CHARI: - case OP_PLUSI: - case OP_MINPLUSI: - case OP_POSPLUSI: - if (!inassert) return 0; - if (cflags < 0) { c = scode[1]; cflags = REQ_CASELESS; } - else if (c != scode[1]) return 0; - break; - } - - code += GET(code, 1); - } -while (*code == OP_ALT); - -*flags = cflags; -return c; -} - - - -/************************************************* -* Add an entry to the name/number table * -*************************************************/ - -/* This function is called between compiling passes to add an entry to the -name/number table, maintaining alphabetical order. Checking for permitted -and forbidden duplicates has already been done. 
- -Arguments: - cd the compile data block - name the name to add - length the length of the name - groupno the group number - -Returns: nothing -*/ - -static void -add_name(compile_data *cd, const pcre_uchar *name, int length, - unsigned int groupno) -{ -int i; -pcre_uchar *slot = cd->name_table; - -for (i = 0; i < cd->names_found; i++) - { - int crc = memcmp(name, slot+IMM2_SIZE, IN_UCHARS(length)); - if (crc == 0 && slot[IMM2_SIZE+length] != 0) - crc = -1; /* Current name is a substring */ - - /* Make space in the table and break the loop for an earlier name. For a - duplicate or later name, carry on. We do this for duplicates so that in the - simple case (when ?(| is not used) they are in order of their numbers. In all - cases they are in the order in which they appear in the pattern. */ - - if (crc < 0) - { - memmove(slot + cd->name_entry_size, slot, - IN_UCHARS((cd->names_found - i) * cd->name_entry_size)); - break; - } - - /* Continue the loop for a later or duplicate name */ - - slot += cd->name_entry_size; - } - -PUT2(slot, 0, groupno); -memcpy(slot + IMM2_SIZE, name, IN_UCHARS(length)); -slot[IMM2_SIZE + length] = 0; -cd->names_found++; -} - - - -/************************************************* -* Compile a Regular Expression * -*************************************************/ - -/* This function takes a string and returns a pointer to a block of store -holding a compiled version of the expression. The original API for this -function had no error code return variable; it is retained for backwards -compatibility. The new function is given a new name. 
- -Arguments: - pattern the regular expression - options various option bits - errorcodeptr pointer to error code variable (pcre_compile2() only) - can be NULL if you don't want a code value - errorptr pointer to pointer to error text - erroroffset ptr offset in pattern where error was detected - tables pointer to character tables or NULL - -Returns: pointer to compiled data block, or NULL on error, - with errorptr and erroroffset set -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN pcre * PCRE_CALL_CONVENTION -pcre_compile(const char *pattern, int options, const char **errorptr, - int *erroroffset, const unsigned char *tables) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN pcre16 * PCRE_CALL_CONVENTION -pcre16_compile(PCRE_SPTR16 pattern, int options, const char **errorptr, - int *erroroffset, const unsigned char *tables) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN pcre32 * PCRE_CALL_CONVENTION -pcre32_compile(PCRE_SPTR32 pattern, int options, const char **errorptr, - int *erroroffset, const unsigned char *tables) -#endif -{ -#if defined COMPILE_PCRE8 -return pcre_compile2(pattern, options, NULL, errorptr, erroroffset, tables); -#elif defined COMPILE_PCRE16 -return pcre16_compile2(pattern, options, NULL, errorptr, erroroffset, tables); -#elif defined COMPILE_PCRE32 -return pcre32_compile2(pattern, options, NULL, errorptr, erroroffset, tables); -#endif -} - - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN pcre * PCRE_CALL_CONVENTION -pcre_compile2(const char *pattern, int options, int *errorcodeptr, - const char **errorptr, int *erroroffset, const unsigned char *tables) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN pcre16 * PCRE_CALL_CONVENTION -pcre16_compile2(PCRE_SPTR16 pattern, int options, int *errorcodeptr, - const char **errorptr, int *erroroffset, const unsigned char *tables) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN pcre32 * PCRE_CALL_CONVENTION -pcre32_compile2(PCRE_SPTR32 pattern, int options, int *errorcodeptr, - const char **errorptr, int *erroroffset, 
const unsigned char *tables) -#endif -{ -REAL_PCRE *re; -int length = 1; /* For final END opcode */ -pcre_int32 firstcharflags, reqcharflags; -pcre_uint32 firstchar, reqchar; -pcre_uint32 limit_match = PCRE_UINT32_MAX; -pcre_uint32 limit_recursion = PCRE_UINT32_MAX; -int newline; -int errorcode = 0; -int skipatstart = 0; -BOOL utf; -BOOL never_utf = FALSE; -size_t size; -pcre_uchar *code; -const pcre_uchar *codestart; -const pcre_uchar *ptr; -compile_data compile_block; -compile_data *cd = &compile_block; - -/* This space is used for "compiling" into during the first phase, when we are -computing the amount of memory that is needed. Compiled items are thrown away -as soon as possible, so that a fairly large buffer should be sufficient for -this purpose. The same space is used in the second phase for remembering where -to fill in forward references to subpatterns. That may overflow, in which case -new memory is obtained from malloc(). */ - -pcre_uchar cworkspace[COMPILE_WORK_SIZE]; - -/* This vector is used for remembering name groups during the pre-compile. In a -similar way to cworkspace, it can be expanded using malloc() if necessary. */ - -named_group named_groups[NAMED_GROUP_LIST_SIZE]; - -/* Set this early so that early errors get offset 0. */ - -ptr = (const pcre_uchar *)pattern; - -/* We can't pass back an error message if errorptr is NULL; I guess the best we -can do is just return NULL, but we can set a code value if there is a code -pointer. 
*/ - -if (errorptr == NULL) - { - if (errorcodeptr != NULL) *errorcodeptr = 99; - return NULL; - } - -*errorptr = NULL; -if (errorcodeptr != NULL) *errorcodeptr = ERR0; - -/* However, we can give a message for this error */ - -if (erroroffset == NULL) - { - errorcode = ERR16; - goto PCRE_EARLY_ERROR_RETURN2; - } - -*erroroffset = 0; - -/* Set up pointers to the individual character tables */ - -if (tables == NULL) tables = PRIV(default_tables); -cd->lcc = tables + lcc_offset; -cd->fcc = tables + fcc_offset; -cd->cbits = tables + cbits_offset; -cd->ctypes = tables + ctypes_offset; - -/* Check that all undefined public option bits are zero */ - -if ((options & ~PUBLIC_COMPILE_OPTIONS) != 0) - { - errorcode = ERR17; - goto PCRE_EARLY_ERROR_RETURN; - } - -/* If PCRE_NEVER_UTF is set, remember it. */ - -if ((options & PCRE_NEVER_UTF) != 0) never_utf = TRUE; - -/* Check for global one-time settings at the start of the pattern, and remember -the offset for later. */ - -cd->external_flags = 0; /* Initialize here for LIMIT_MATCH/RECURSION */ - -while (ptr[skipatstart] == CHAR_LEFT_PARENTHESIS && - ptr[skipatstart+1] == CHAR_ASTERISK) - { - int newnl = 0; - int newbsr = 0; - -/* For completeness and backward compatibility, (*UTFn) is supported in the -relevant libraries, but (*UTF) is generic and always supported. Note that -PCRE_UTF8 == PCRE_UTF16 == PCRE_UTF32. 
*/ - -#ifdef COMPILE_PCRE8 - if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_UTF8_RIGHTPAR, 5) == 0) - { skipatstart += 7; options |= PCRE_UTF8; continue; } -#endif -#ifdef COMPILE_PCRE16 - if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_UTF16_RIGHTPAR, 6) == 0) - { skipatstart += 8; options |= PCRE_UTF16; continue; } -#endif -#ifdef COMPILE_PCRE32 - if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_UTF32_RIGHTPAR, 6) == 0) - { skipatstart += 8; options |= PCRE_UTF32; continue; } -#endif - - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_UTF_RIGHTPAR, 4) == 0) - { skipatstart += 6; options |= PCRE_UTF8; continue; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_UCP_RIGHTPAR, 4) == 0) - { skipatstart += 6; options |= PCRE_UCP; continue; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_NO_AUTO_POSSESS_RIGHTPAR, 16) == 0) - { skipatstart += 18; options |= PCRE_NO_AUTO_POSSESS; continue; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_NO_START_OPT_RIGHTPAR, 13) == 0) - { skipatstart += 15; options |= PCRE_NO_START_OPTIMIZE; continue; } - - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_LIMIT_MATCH_EQ, 12) == 0) - { - pcre_uint32 c = 0; - int p = skipatstart + 14; - while (isdigit(ptr[p])) - { - if (c > PCRE_UINT32_MAX / 10 - 1) break; /* Integer overflow */ - c = c*10 + ptr[p++] - CHAR_0; - } - if (ptr[p++] != CHAR_RIGHT_PARENTHESIS) break; - if (c < limit_match) - { - limit_match = c; - cd->external_flags |= PCRE_MLSET; - } - skipatstart = p; - continue; - } - - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_LIMIT_RECURSION_EQ, 16) == 0) - { - pcre_uint32 c = 0; - int p = skipatstart + 18; - while (isdigit(ptr[p])) - { - if (c > PCRE_UINT32_MAX / 10 - 1) break; /* Integer overflow check */ - c = c*10 + ptr[p++] - CHAR_0; - } - if (ptr[p++] != CHAR_RIGHT_PARENTHESIS) break; - if (c < limit_recursion) - { - limit_recursion = c; - cd->external_flags |= PCRE_RLSET; - } - skipatstart = p; - continue; - } - - if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_CR_RIGHTPAR, 3) 
== 0) - { skipatstart += 5; newnl = PCRE_NEWLINE_CR; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_LF_RIGHTPAR, 3) == 0) - { skipatstart += 5; newnl = PCRE_NEWLINE_LF; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_CRLF_RIGHTPAR, 5) == 0) - { skipatstart += 7; newnl = PCRE_NEWLINE_CR + PCRE_NEWLINE_LF; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_ANY_RIGHTPAR, 4) == 0) - { skipatstart += 6; newnl = PCRE_NEWLINE_ANY; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_ANYCRLF_RIGHTPAR, 8) == 0) - { skipatstart += 10; newnl = PCRE_NEWLINE_ANYCRLF; } - - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_BSR_ANYCRLF_RIGHTPAR, 12) == 0) - { skipatstart += 14; newbsr = PCRE_BSR_ANYCRLF; } - else if (STRNCMP_UC_C8(ptr+skipatstart+2, STRING_BSR_UNICODE_RIGHTPAR, 12) == 0) - { skipatstart += 14; newbsr = PCRE_BSR_UNICODE; } - - if (newnl != 0) - options = (options & ~PCRE_NEWLINE_BITS) | newnl; - else if (newbsr != 0) - options = (options & ~(PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) | newbsr; - else break; - } - -/* PCRE_UTF(16|32) have the same value as PCRE_UTF8. */ -utf = (options & PCRE_UTF8) != 0; -if (utf && never_utf) - { - errorcode = ERR78; - goto PCRE_EARLY_ERROR_RETURN2; - } - -/* Can't support UTF unless PCRE has been compiled to include the code. The -return of an error code from PRIV(valid_utf)() is a new feature, introduced in -release 8.13. It is passed back from pcre_[dfa_]exec(), but at the moment is -not used here. */ - -#ifdef SUPPORT_UTF -if (utf && (options & PCRE_NO_UTF8_CHECK) == 0 && - (errorcode = PRIV(valid_utf)((PCRE_PUCHAR)pattern, -1, erroroffset)) != 0) - { -#if defined COMPILE_PCRE8 - errorcode = ERR44; -#elif defined COMPILE_PCRE16 - errorcode = ERR74; -#elif defined COMPILE_PCRE32 - errorcode = ERR77; -#endif - goto PCRE_EARLY_ERROR_RETURN2; - } -#else -if (utf) - { - errorcode = ERR32; - goto PCRE_EARLY_ERROR_RETURN; - } -#endif - -/* Can't support UCP unless PCRE has been compiled to include the code. 
*/ - -#ifndef SUPPORT_UCP -if ((options & PCRE_UCP) != 0) - { - errorcode = ERR67; - goto PCRE_EARLY_ERROR_RETURN; - } -#endif - -/* Check validity of \R options. */ - -if ((options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) == - (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) - { - errorcode = ERR56; - goto PCRE_EARLY_ERROR_RETURN; - } - -/* Handle different types of newline. The three bits give seven cases. The -current code allows for fixed one- or two-byte sequences, plus "any" and -"anycrlf". */ - -switch (options & PCRE_NEWLINE_BITS) - { - case 0: newline = NEWLINE; break; /* Build-time default */ - case PCRE_NEWLINE_CR: newline = CHAR_CR; break; - case PCRE_NEWLINE_LF: newline = CHAR_NL; break; - case PCRE_NEWLINE_CR+ - PCRE_NEWLINE_LF: newline = (CHAR_CR << 8) | CHAR_NL; break; - case PCRE_NEWLINE_ANY: newline = -1; break; - case PCRE_NEWLINE_ANYCRLF: newline = -2; break; - default: errorcode = ERR56; goto PCRE_EARLY_ERROR_RETURN; - } - -if (newline == -2) - { - cd->nltype = NLTYPE_ANYCRLF; - } -else if (newline < 0) - { - cd->nltype = NLTYPE_ANY; - } -else - { - cd->nltype = NLTYPE_FIXED; - if (newline > 255) - { - cd->nllen = 2; - cd->nl[0] = (newline >> 8) & 255; - cd->nl[1] = newline & 255; - } - else - { - cd->nllen = 1; - cd->nl[0] = newline; - } - } - -/* Maximum back reference and backref bitmap. The bitmap records up to 31 back -references to help in deciding whether (.*) can be treated as anchored or not. -*/ - -cd->top_backref = 0; -cd->backref_map = 0; - -/* Reflect pattern for debugging output */ - -DPRINTF(("------------------------------------------------------------------\n")); -#ifdef PCRE_DEBUG -print_puchar(stdout, (PCRE_PUCHAR)pattern); -#endif -DPRINTF(("\n")); - -/* Pretend to compile the pattern while actually just accumulating the length -of memory required. This behaviour is triggered by passing a non-NULL final -argument to compile_regex(). 
We pass a block of workspace (cworkspace) for it -to compile parts of the pattern into; the compiled code is discarded when it is -no longer needed, so hopefully this workspace will never overflow, though there -is a test for its doing so. */ - -cd->bracount = cd->final_bracount = 0; -cd->names_found = 0; -cd->name_entry_size = 0; -cd->name_table = NULL; -cd->dupnames = FALSE; -cd->dupgroups = FALSE; -cd->namedrefcount = 0; -cd->start_code = cworkspace; -cd->hwm = cworkspace; -cd->iscondassert = FALSE; -cd->start_workspace = cworkspace; -cd->workspace_size = COMPILE_WORK_SIZE; -cd->named_groups = named_groups; -cd->named_group_list_size = NAMED_GROUP_LIST_SIZE; -cd->start_pattern = (const pcre_uchar *)pattern; -cd->end_pattern = (const pcre_uchar *)(pattern + STRLEN_UC((const pcre_uchar *)pattern)); -cd->req_varyopt = 0; -cd->parens_depth = 0; -cd->assert_depth = 0; -cd->max_lookbehind = 0; -cd->external_options = options; -cd->open_caps = NULL; - -/* Now do the pre-compile. On error, errorcode will be set non-zero, so we -don't need to look at the result of the function here. The initial options have -been put into the cd block so that they can be changed if an option setting is -found within the regex right at the beginning. Bringing initial option settings -outside can help speed up starting point checks. */ - -ptr += skipatstart; -code = cworkspace; -*code = OP_BRA; - -(void)compile_regex(cd->external_options, &code, &ptr, &errorcode, FALSE, - FALSE, 0, 0, &firstchar, &firstcharflags, &reqchar, &reqcharflags, NULL, - cd, &length); -if (errorcode != 0) goto PCRE_EARLY_ERROR_RETURN; - -DPRINTF(("end pre-compile: length=%d workspace=%d\n", length, - (int)(cd->hwm - cworkspace))); - -if (length > MAX_PATTERN_SIZE) - { - errorcode = ERR20; - goto PCRE_EARLY_ERROR_RETURN; - } - -/* Compute the size of the data block for storing the compiled pattern. 
Integer -overflow should no longer be possible because nowadays we limit the maximum -value of cd->names_found and cd->name_entry_size. */ - -size = sizeof(REAL_PCRE) + - (length + cd->names_found * cd->name_entry_size) * sizeof(pcre_uchar); - -/* Get the memory. */ - -re = (REAL_PCRE *)(PUBL(malloc))(size); -if (re == NULL) - { - errorcode = ERR21; - goto PCRE_EARLY_ERROR_RETURN; - } - -/* Put in the magic number, and save the sizes, initial options, internal -flags, and character table pointer. NULL is used for the default character -tables. The nullpad field is at the end; it's there to help in the case when a -regex compiled on a system with 4-byte pointers is run on another with 8-byte -pointers. */ - -re->magic_number = MAGIC_NUMBER; -re->size = (int)size; -re->options = cd->external_options; -re->flags = cd->external_flags; -re->limit_match = limit_match; -re->limit_recursion = limit_recursion; -re->first_char = 0; -re->req_char = 0; -re->name_table_offset = sizeof(REAL_PCRE) / sizeof(pcre_uchar); -re->name_entry_size = cd->name_entry_size; -re->name_count = cd->names_found; -re->ref_count = 0; -re->tables = (tables == PRIV(default_tables))? NULL : tables; -re->nullpad = NULL; -#ifdef COMPILE_PCRE32 -re->dummy = 0; -#else -re->dummy1 = re->dummy2 = re->dummy3 = 0; -#endif - -/* The starting points of the name/number translation table and of the code are -passed around in the compile data block. The start/end pattern and initial -options are already set from the pre-compile phase, as is the name_entry_size -field. Reset the bracket count and the names_found field. Also reset the hwm -field; this time it's used for remembering forward references to subpatterns. 
-*/ - -cd->final_bracount = cd->bracount; /* Save for checking forward references */ -cd->parens_depth = 0; -cd->assert_depth = 0; -cd->bracount = 0; -cd->max_lookbehind = 0; -cd->name_table = (pcre_uchar *)re + re->name_table_offset; -codestart = cd->name_table + re->name_entry_size * re->name_count; -cd->start_code = codestart; -cd->hwm = (pcre_uchar *)(cd->start_workspace); -cd->iscondassert = FALSE; -cd->req_varyopt = 0; -cd->had_accept = FALSE; -cd->had_pruneorskip = FALSE; -cd->check_lookbehind = FALSE; -cd->open_caps = NULL; - -/* If any named groups were found, create the name/number table from the list -created in the first pass. */ - -if (cd->names_found > 0) - { - int i = cd->names_found; - named_group *ng = cd->named_groups; - cd->names_found = 0; - for (; i > 0; i--, ng++) - add_name(cd, ng->name, ng->length, ng->number); - if (cd->named_group_list_size > NAMED_GROUP_LIST_SIZE) - (PUBL(free))((void *)cd->named_groups); - } - -/* Set up a starting, non-extracting bracket, then compile the expression. On -error, errorcode will be set non-zero, so we don't need to look at the result -of the function here. */ - -ptr = (const pcre_uchar *)pattern + skipatstart; -code = (pcre_uchar *)codestart; -*code = OP_BRA; -(void)compile_regex(re->options, &code, &ptr, &errorcode, FALSE, FALSE, 0, 0, - &firstchar, &firstcharflags, &reqchar, &reqcharflags, NULL, cd, NULL); -re->top_bracket = cd->bracount; -re->top_backref = cd->top_backref; -re->max_lookbehind = cd->max_lookbehind; -re->flags = cd->external_flags | PCRE_MODE; - -if (cd->had_accept) - { - reqchar = 0; /* Must disable after (*ACCEPT) */ - reqcharflags = REQ_NONE; - } - -/* If not reached end of pattern on success, there's an excess bracket. */ - -if (errorcode == 0 && *ptr != CHAR_NULL) errorcode = ERR22; - -/* Fill in the terminating state and check for disastrous overflow, but -if debugging, leave the test till after things are printed out. 
*/ - -*code++ = OP_END; - -#ifndef PCRE_DEBUG -if (code - codestart > length) errorcode = ERR23; -#endif - -#ifdef SUPPORT_VALGRIND -/* If the estimated length exceeds the really used length, mark the extra -allocated memory as unaddressable, so that any out-of-bound reads can be -detected. */ -VALGRIND_MAKE_MEM_NOACCESS(code, (length - (code - codestart)) * sizeof(pcre_uchar)); -#endif - -/* Fill in any forward references that are required. There may be repeated -references; optimize for them, as searching a large regex takes time. */ - -if (cd->hwm > cd->start_workspace) - { - int prev_recno = -1; - const pcre_uchar *groupptr = NULL; - while (errorcode == 0 && cd->hwm > cd->start_workspace) - { - int offset, recno; - cd->hwm -= LINK_SIZE; - offset = GET(cd->hwm, 0); - - /* Check that the hwm handling hasn't gone wrong. This whole area is - rewritten in PCRE2 because there are some obscure cases. */ - - if (offset == 0 || codestart[offset-1] != OP_RECURSE) - { - errorcode = ERR10; - break; - } - - recno = GET(codestart, offset); - if (recno != prev_recno) - { - groupptr = PRIV(find_bracket)(codestart, utf, recno); - prev_recno = recno; - } - if (groupptr == NULL) errorcode = ERR53; - else PUT(((pcre_uchar *)codestart), offset, (int)(groupptr - codestart)); - } - } - -/* If the workspace had to be expanded, free the new memory. Set the pointer to -NULL to indicate that forward references have been filled in. */ - -if (cd->workspace_size > COMPILE_WORK_SIZE) - (PUBL(free))((void *)cd->start_workspace); -cd->start_workspace = NULL; - -/* Give an error if there's back reference to a non-existent capturing -subpattern. */ - -if (errorcode == 0 && re->top_backref > re->top_bracket) errorcode = ERR15; - -/* Unless disabled, check whether any single character iterators can be -auto-possessified. The function overwrites the appropriate opcode values, so -the type of the pointer must be cast. 
NOTE: the intermediate variable "temp" is -used in this code because at least one compiler gives a warning about loss of -"const" attribute if the cast (pcre_uchar *)codestart is used directly in the -function call. */ - -if (errorcode == 0 && (options & PCRE_NO_AUTO_POSSESS) == 0) - { - pcre_uchar *temp = (pcre_uchar *)codestart; - auto_possessify(temp, utf, cd); - } - -/* If there were any lookbehind assertions that contained OP_RECURSE -(recursions or subroutine calls), a flag is set for them to be checked here, -because they may contain forward references. Actual recursions cannot be fixed -length, but subroutine calls can. It is done like this so that those without -OP_RECURSE that are not fixed length get a diagnosic with a useful offset. The -exceptional ones forgo this. We scan the pattern to check that they are fixed -length, and set their lengths. */ - -if (errorcode == 0 && cd->check_lookbehind) - { - pcre_uchar *cc = (pcre_uchar *)codestart; - - /* Loop, searching for OP_REVERSE items, and process those that do not have - their length set. (Actually, it will also re-process any that have a length - of zero, but that is a pathological case, and it does no harm.) When we find - one, we temporarily terminate the branch it is in while we scan it. */ - - for (cc = (pcre_uchar *)PRIV(find_bracket)(codestart, utf, -1); - cc != NULL; - cc = (pcre_uchar *)PRIV(find_bracket)(cc, utf, -1)) - { - if (GET(cc, 1) == 0) - { - int fixed_length; - pcre_uchar *be = cc - 1 - LINK_SIZE + GET(cc, -LINK_SIZE); - int end_op = *be; - *be = OP_END; - fixed_length = find_fixedlength(cc, (re->options & PCRE_UTF8) != 0, TRUE, - cd, NULL); - *be = end_op; - DPRINTF(("fixed length = %d\n", fixed_length)); - if (fixed_length < 0) - { - errorcode = (fixed_length == -2)? ERR36 : - (fixed_length == -4)? 
ERR70 : ERR25; - break; - } - if (fixed_length > cd->max_lookbehind) cd->max_lookbehind = fixed_length; - PUT(cc, 1, fixed_length); - } - cc += 1 + LINK_SIZE; - } - } - -/* Failed to compile, or error while post-processing */ - -if (errorcode != 0) - { - (PUBL(free))(re); - PCRE_EARLY_ERROR_RETURN: - *erroroffset = (int)(ptr - (const pcre_uchar *)pattern); - PCRE_EARLY_ERROR_RETURN2: - *errorptr = find_error_text(errorcode); - if (errorcodeptr != NULL) *errorcodeptr = errorcode; - return NULL; - } - -/* If the anchored option was not passed, set the flag if we can determine that -the pattern is anchored by virtue of ^ characters or \A or anything else, such -as starting with non-atomic .* when DOTALL is set and there are no occurrences -of *PRUNE or *SKIP. - -Otherwise, if we know what the first byte has to be, save it, because that -speeds up unanchored matches no end. If not, see if we can set the -PCRE_STARTLINE flag. This is helpful for multiline matches when all branches -start with ^. and also when all branches start with non-atomic .* for -non-DOTALL matches when *PRUNE and SKIP are not present. */ - -if ((re->options & PCRE_ANCHORED) == 0) - { - if (is_anchored(codestart, 0, cd, 0)) re->options |= PCRE_ANCHORED; - else - { - if (firstcharflags < 0) - firstchar = find_firstassertedchar(codestart, &firstcharflags, FALSE); - if (firstcharflags >= 0) /* Remove caseless flag for non-caseable chars */ - { -#if defined COMPILE_PCRE8 - re->first_char = firstchar & 0xff; -#elif defined COMPILE_PCRE16 - re->first_char = firstchar & 0xffff; -#elif defined COMPILE_PCRE32 - re->first_char = firstchar; -#endif - if ((firstcharflags & REQ_CASELESS) != 0) - { -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - /* We ignore non-ASCII first chars in 8 bit mode. 
*/ - if (utf) - { - if (re->first_char < 128) - { - if (cd->fcc[re->first_char] != re->first_char) - re->flags |= PCRE_FCH_CASELESS; - } - else if (UCD_OTHERCASE(re->first_char) != re->first_char) - re->flags |= PCRE_FCH_CASELESS; - } - else -#endif - if (MAX_255(re->first_char) - && cd->fcc[re->first_char] != re->first_char) - re->flags |= PCRE_FCH_CASELESS; - } - - re->flags |= PCRE_FIRSTSET; - } - - else if (is_startline(codestart, 0, cd, 0, FALSE)) re->flags |= PCRE_STARTLINE; - } - } - -/* For an anchored pattern, we use the "required byte" only if it follows a -variable length item in the regex. Remove the caseless flag for non-caseable -bytes. */ - -if (reqcharflags >= 0 && - ((re->options & PCRE_ANCHORED) == 0 || (reqcharflags & REQ_VARY) != 0)) - { -#if defined COMPILE_PCRE8 - re->req_char = reqchar & 0xff; -#elif defined COMPILE_PCRE16 - re->req_char = reqchar & 0xffff; -#elif defined COMPILE_PCRE32 - re->req_char = reqchar; -#endif - if ((reqcharflags & REQ_CASELESS) != 0) - { -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - /* We ignore non-ASCII first chars in 8 bit mode. */ - if (utf) - { - if (re->req_char < 128) - { - if (cd->fcc[re->req_char] != re->req_char) - re->flags |= PCRE_RCH_CASELESS; - } - else if (UCD_OTHERCASE(re->req_char) != re->req_char) - re->flags |= PCRE_RCH_CASELESS; - } - else -#endif - if (MAX_255(re->req_char) && cd->fcc[re->req_char] != re->req_char) - re->flags |= PCRE_RCH_CASELESS; - } - - re->flags |= PCRE_REQCHSET; - } - -/* Print out the compiled data if debugging is enabled. This is never the -case when building a production library. */ - -#ifdef PCRE_DEBUG -printf("Length = %d top_bracket = %d top_backref = %d\n", - length, re->top_bracket, re->top_backref); - -printf("Options=%08x\n", re->options); - -if ((re->flags & PCRE_FIRSTSET) != 0) - { - pcre_uchar ch = re->first_char; - const char *caseless = - ((re->flags & PCRE_FCH_CASELESS) == 0)? 
"" : " (caseless)"; - if (PRINTABLE(ch)) printf("First char = %c%s\n", ch, caseless); - else printf("First char = \\x%02x%s\n", ch, caseless); - } - -if ((re->flags & PCRE_REQCHSET) != 0) - { - pcre_uchar ch = re->req_char; - const char *caseless = - ((re->flags & PCRE_RCH_CASELESS) == 0)? "" : " (caseless)"; - if (PRINTABLE(ch)) printf("Req char = %c%s\n", ch, caseless); - else printf("Req char = \\x%02x%s\n", ch, caseless); - } - -#if defined COMPILE_PCRE8 -pcre_printint((pcre *)re, stdout, TRUE); -#elif defined COMPILE_PCRE16 -pcre16_printint((pcre *)re, stdout, TRUE); -#elif defined COMPILE_PCRE32 -pcre32_printint((pcre *)re, stdout, TRUE); -#endif - -/* This check is done here in the debugging case so that the code that -was compiled can be seen. */ - -if (code - codestart > length) - { - (PUBL(free))(re); - *errorptr = find_error_text(ERR23); - *erroroffset = ptr - (pcre_uchar *)pattern; - if (errorcodeptr != NULL) *errorcodeptr = ERR23; - return NULL; - } -#endif /* PCRE_DEBUG */ - -/* Check for a pattern than can match an empty string, so that this information -can be provided to applications. */ - -do - { - if (could_be_empty_branch(codestart, code, utf, cd, NULL)) - { - re->flags |= PCRE_MATCH_EMPTY; - break; - } - codestart += GET(codestart, 1); - } -while (*codestart == OP_ALT); - -#if defined COMPILE_PCRE8 -return (pcre *)re; -#elif defined COMPILE_PCRE16 -return (pcre16 *)re; -#elif defined COMPILE_PCRE32 -return (pcre32 *)re; -#endif -} - -/* End of pcre_compile.c */ diff --git a/src/pcre/pcre_config.c b/src/pcre/pcre_config.c deleted file mode 100644 index 1cbdd9c9..00000000 --- a/src/pcre/pcre_config.c +++ /dev/null @@ -1,190 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains the external function pcre_config(). */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -/* Keep the original link size. 
*/ -static int real_link_size = LINK_SIZE; - -#include "pcre_internal.h" - - -/************************************************* -* Return info about what features are configured * -*************************************************/ - -/* This function has an extensible interface so that additional items can be -added compatibly. - -Arguments: - what what information is required - where where to put the information - -Returns: 0 if data returned, negative on error -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_config(int what, void *where) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_config(int what, void *where) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_config(int what, void *where) -#endif -{ -switch (what) - { - case PCRE_CONFIG_UTF8: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - *((int *)where) = 0; - return PCRE_ERROR_BADOPTION; -#else -#if defined SUPPORT_UTF - *((int *)where) = 1; -#else - *((int *)where) = 0; -#endif - break; -#endif - - case PCRE_CONFIG_UTF16: -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE32 - *((int *)where) = 0; - return PCRE_ERROR_BADOPTION; -#else -#if defined SUPPORT_UTF - *((int *)where) = 1; -#else - *((int *)where) = 0; -#endif - break; -#endif - - case PCRE_CONFIG_UTF32: -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16 - *((int *)where) = 0; - return PCRE_ERROR_BADOPTION; -#else -#if defined SUPPORT_UTF - *((int *)where) = 1; -#else - *((int *)where) = 0; -#endif - break; -#endif - - case PCRE_CONFIG_UNICODE_PROPERTIES: -#ifdef SUPPORT_UCP - *((int *)where) = 1; -#else - *((int *)where) = 0; -#endif - break; - - case PCRE_CONFIG_JIT: -#ifdef SUPPORT_JIT - *((int *)where) = 1; -#else - *((int *)where) = 0; -#endif - break; - - case PCRE_CONFIG_JITTARGET: -#ifdef SUPPORT_JIT - *((const char **)where) = PRIV(jit_get_target)(); -#else - *((const char **)where) = NULL; -#endif - break; - - case PCRE_CONFIG_NEWLINE: - 
*((int *)where) = NEWLINE; - break; - - case PCRE_CONFIG_BSR: -#ifdef BSR_ANYCRLF - *((int *)where) = 1; -#else - *((int *)where) = 0; -#endif - break; - - case PCRE_CONFIG_LINK_SIZE: - *((int *)where) = real_link_size; - break; - - case PCRE_CONFIG_POSIX_MALLOC_THRESHOLD: - *((int *)where) = POSIX_MALLOC_THRESHOLD; - break; - - case PCRE_CONFIG_PARENS_LIMIT: - *((unsigned long int *)where) = PARENS_NEST_LIMIT; - break; - - case PCRE_CONFIG_MATCH_LIMIT: - *((unsigned long int *)where) = MATCH_LIMIT; - break; - - case PCRE_CONFIG_MATCH_LIMIT_RECURSION: - *((unsigned long int *)where) = MATCH_LIMIT_RECURSION; - break; - - case PCRE_CONFIG_STACKRECURSE: -#ifdef NO_RECURSE - *((int *)where) = 0; -#else - *((int *)where) = 1; -#endif - break; - - default: return PCRE_ERROR_BADOPTION; - } - -return 0; -} - -/* End of pcre_config.c */ diff --git a/src/pcre/pcre_exec.c b/src/pcre/pcre_exec.c deleted file mode 100644 index 3fd58cbe..00000000 --- a/src/pcre/pcre_exec.c +++ /dev/null @@ -1,7173 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2018 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* This module contains pcre_exec(), the externally visible function that does -pattern matching using an NFA algorithm, trying to mimic Perl as closely as -possible. There are also some static supporting functions. */ - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#define NLBLOCK md /* Block containing newline information */ -#define PSSTART start_subject /* Field containing processed string start */ -#define PSEND end_subject /* Field containing processed string end */ - -#include "pcre_internal.h" - -/* Undefine some potentially clashing cpp symbols */ - -#undef min -#undef max - -/* The md->capture_last field uses the lower 16 bits for the last captured -substring (which can never be greater than 65535) and a bit in the top half -to mean "capture vector overflowed". 
This odd way of doing things was -implemented when it was realized that preserving and restoring the overflow bit -whenever the last capture number was saved/restored made for a neater -interface, and doing it this way saved on (a) another variable, which would -have increased the stack frame size (a big NO-NO in PCRE) and (b) another -separate set of save/restore instructions. The following defines are used in -implementing this. */ - -#define CAPLMASK 0x0000ffff /* The bits used for last_capture */ -#define OVFLMASK 0xffff0000 /* The bits used for the overflow flag */ -#define OVFLBIT 0x00010000 /* The bit that is set for overflow */ - -/* Values for setting in md->match_function_type to indicate two special types -of call to match(). We do it this way to save on using another stack variable, -as stack usage is to be discouraged. */ - -#define MATCH_CONDASSERT 1 /* Called to check a condition assertion */ -#define MATCH_CBEGROUP 2 /* Could-be-empty unlimited repeat group */ - -/* Non-error returns from the match() function. Error returns are externally -defined PCRE_ERROR_xxx codes, which are all negative. */ - -#define MATCH_MATCH 1 -#define MATCH_NOMATCH 0 - -/* Special internal returns from the match() function. Make them sufficiently -negative to avoid the external error codes. */ - -#define MATCH_ACCEPT (-999) -#define MATCH_KETRPOS (-998) -#define MATCH_ONCE (-997) -/* The next 5 must be kept together and in sequence so that a test that checks -for any one of them can use a range. */ -#define MATCH_COMMIT (-996) -#define MATCH_PRUNE (-995) -#define MATCH_SKIP (-994) -#define MATCH_SKIP_ARG (-993) -#define MATCH_THEN (-992) -#define MATCH_BACKTRACK_MAX MATCH_THEN -#define MATCH_BACKTRACK_MIN MATCH_COMMIT - -/* Maximum number of ints of offset to save on the stack for recursive calls. -If the offset vector is bigger, malloc is used. This should be a multiple of 3, -because the offset vector is always a multiple of 3 long. 
*/ - -#define REC_STACK_SAVE_MAX 30 - -/* Min and max values for the common repeats; for the maxima, 0 => infinity */ - -static const char rep_min[] = { 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, }; -static const char rep_max[] = { 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, }; - -#ifdef PCRE_DEBUG -/************************************************* -* Debugging function to print chars * -*************************************************/ - -/* Print a sequence of chars in printable format, stopping at the end of the -subject if the requested. - -Arguments: - p points to characters - length number to print - is_subject TRUE if printing from within md->start_subject - md pointer to matching data block, if is_subject is TRUE - -Returns: nothing -*/ - -static void -pchars(const pcre_uchar *p, int length, BOOL is_subject, match_data *md) -{ -pcre_uint32 c; -BOOL utf = md->utf; -if (is_subject && length > md->end_subject - p) length = md->end_subject - p; -while (length-- > 0) - if (isprint(c = UCHAR21INCTEST(p))) printf("%c", (char)c); else printf("\\x{%02x}", c); -} -#endif - - - -/************************************************* -* Match a back-reference * -*************************************************/ - -/* Normally, if a back reference hasn't been set, the length that is passed is -negative, so the match always fails. However, in JavaScript compatibility mode, -the length passed is zero. Note that in caseless UTF-8 mode, the number of -subject bytes matched may be different to the number of reference bytes. 
- -Arguments: - offset index into the offset vector - eptr pointer into the subject - length length of reference to be matched (number of bytes) - md points to match data block - caseless TRUE if caseless - -Returns: >= 0 the number of subject bytes matched - -1 no match - -2 partial match; always given if at end subject -*/ - -static int -match_ref(int offset, register PCRE_PUCHAR eptr, int length, match_data *md, - BOOL caseless) -{ -PCRE_PUCHAR eptr_start = eptr; -register PCRE_PUCHAR p = md->start_subject + md->offset_vector[offset]; -#if defined SUPPORT_UTF && defined SUPPORT_UCP -BOOL utf = md->utf; -#endif - -#ifdef PCRE_DEBUG -if (eptr >= md->end_subject) - printf("matching subject "); -else - { - printf("matching subject "); - pchars(eptr, length, TRUE, md); - } -printf(" against backref "); -pchars(p, length, FALSE, md); -printf("\n"); -#endif - -/* Always fail if reference not set (and not JavaScript compatible - in that -case the length is passed as zero). */ - -if (length < 0) return -1; - -/* Separate the caseless case for speed. In UTF-8 mode we can only do this -properly if Unicode properties are supported. Otherwise, we can check only -ASCII characters. */ - -if (caseless) - { -#if defined SUPPORT_UTF && defined SUPPORT_UCP - if (utf) - { - /* Match characters up to the end of the reference. NOTE: the number of - data units matched may differ, because in UTF-8 there are some characters - whose upper and lower case versions code have different numbers of bytes. - For example, U+023A (2 bytes in UTF-8) is the upper case version of U+2C65 - (3 bytes in UTF-8); a sequence of 3 of the former uses 6 bytes, as does a - sequence of two of the latter. It is important, therefore, to check the - length along the reference, not along the subject (earlier code did this - wrong). 
*/ - - PCRE_PUCHAR endptr = p + length; - while (p < endptr) - { - pcre_uint32 c, d; - const ucd_record *ur; - if (eptr >= md->end_subject) return -2; /* Partial match */ - GETCHARINC(c, eptr); - GETCHARINC(d, p); - ur = GET_UCD(d); - if (c != d && c != d + ur->other_case) - { - const pcre_uint32 *pp = PRIV(ucd_caseless_sets) + ur->caseset; - for (;;) - { - if (c < *pp) return -1; - if (c == *pp++) break; - } - } - } - } - else -#endif - - /* The same code works when not in UTF-8 mode and in UTF-8 mode when there - is no UCP support. */ - { - while (length-- > 0) - { - pcre_uint32 cc, cp; - if (eptr >= md->end_subject) return -2; /* Partial match */ - cc = UCHAR21TEST(eptr); - cp = UCHAR21TEST(p); - if (TABLE_GET(cp, md->lcc, cp) != TABLE_GET(cc, md->lcc, cc)) return -1; - p++; - eptr++; - } - } - } - -/* In the caseful case, we can just compare the bytes, whether or not we -are in UTF-8 mode. */ - -else - { - while (length-- > 0) - { - if (eptr >= md->end_subject) return -2; /* Partial match */ - if (UCHAR21INCTEST(p) != UCHAR21INCTEST(eptr)) return -1; - } - } - -return (int)(eptr - eptr_start); -} - - - -/*************************************************************************** -**************************************************************************** - RECURSION IN THE match() FUNCTION - -The match() function is highly recursive, though not every recursive call -increases the recursive depth. Nevertheless, some regular expressions can cause -it to recurse to a great depth. I was writing for Unix, so I just let it call -itself recursively. This uses the stack for saving everything that has to be -saved for a recursive call. On Unix, the stack can be large, and this works -fine. - -It turns out that on some non-Unix-like systems there are problems with -programs that use a lot of stack. (This despite the fact that every last chip -has oodles of memory these days, and techniques for extending the stack have -been known for decades.) So.... 
- -There is a fudge, triggered by defining NO_RECURSE, which avoids recursive -calls by keeping local variables that need to be preserved in blocks of memory -obtained from malloc() instead instead of on the stack. Macros are used to -achieve this so that the actual code doesn't look very different to what it -always used to. - -The original heap-recursive code used longjmp(). However, it seems that this -can be very slow on some operating systems. Following a suggestion from Stan -Switzer, the use of longjmp() has been abolished, at the cost of having to -provide a unique number for each call to RMATCH. There is no way of generating -a sequence of numbers at compile time in C. I have given them names, to make -them stand out more clearly. - -Crude tests on x86 Linux show a small speedup of around 5-8%. However, on -FreeBSD, avoiding longjmp() more than halves the time taken to run the standard -tests. Furthermore, not using longjmp() means that local dynamic variables -don't have indeterminate values; this has meant that the frame size can be -reduced because the result can be "passed back" by straight setting of the -variable instead of being passed in the frame. -**************************************************************************** -***************************************************************************/ - -/* Numbers for RMATCH calls. When this list is changed, the code at HEAP_RETURN -below must be updated in sync. */ - -enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10, - RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20, - RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30, - RM31, RM32, RM33, RM34, RM35, RM36, RM37, RM38, RM39, RM40, - RM41, RM42, RM43, RM44, RM45, RM46, RM47, RM48, RM49, RM50, - RM51, RM52, RM53, RM54, RM55, RM56, RM57, RM58, RM59, RM60, - RM61, RM62, RM63, RM64, RM65, RM66, RM67 }; - -/* These versions of the macros use the stack, as normal. There are debugging -versions and production versions. 
Note that the "rw" argument of RMATCH isn't -actually used in this definition. */ - -#ifndef NO_RECURSE -#define REGISTER register - -#ifdef PCRE_DEBUG -#define RMATCH(ra,rb,rc,rd,re,rw) \ - { \ - printf("match() called in line %d\n", __LINE__); \ - rrc = match(ra,rb,mstart,rc,rd,re,rdepth+1); \ - printf("to line %d\n", __LINE__); \ - } -#define RRETURN(ra) \ - { \ - printf("match() returned %d from line %d\n", ra, __LINE__); \ - return ra; \ - } -#else -#define RMATCH(ra,rb,rc,rd,re,rw) \ - rrc = match(ra,rb,mstart,rc,rd,re,rdepth+1) -#define RRETURN(ra) return ra -#endif - -#else - - -/* These versions of the macros manage a private stack on the heap. Note that -the "rd" argument of RMATCH isn't actually used in this definition. It's the md -argument of match(), which never changes. */ - -#define REGISTER - -#define RMATCH(ra,rb,rc,rd,re,rw)\ - {\ - heapframe *newframe = frame->Xnextframe;\ - if (newframe == NULL)\ - {\ - newframe = (heapframe *)(PUBL(stack_malloc))(sizeof(heapframe));\ - if (newframe == NULL) RRETURN(PCRE_ERROR_NOMEMORY);\ - newframe->Xnextframe = NULL;\ - frame->Xnextframe = newframe;\ - }\ - frame->Xwhere = rw;\ - newframe->Xeptr = ra;\ - newframe->Xecode = rb;\ - newframe->Xmstart = mstart;\ - newframe->Xoffset_top = rc;\ - newframe->Xeptrb = re;\ - newframe->Xrdepth = frame->Xrdepth + 1;\ - newframe->Xprevframe = frame;\ - frame = newframe;\ - DPRINTF(("restarting from line %d\n", __LINE__));\ - goto HEAP_RECURSE;\ - L_##rw:\ - DPRINTF(("jumped back to line %d\n", __LINE__));\ - } - -#define RRETURN(ra)\ - {\ - heapframe *oldframe = frame;\ - frame = oldframe->Xprevframe;\ - if (frame != NULL)\ - {\ - rrc = ra;\ - goto HEAP_RETURN;\ - }\ - return ra;\ - } - - -/* Structure for remembering the local variables in a private frame */ - -typedef struct heapframe { - struct heapframe *Xprevframe; - struct heapframe *Xnextframe; - - /* Function arguments that may change */ - - PCRE_PUCHAR Xeptr; - const pcre_uchar *Xecode; - PCRE_PUCHAR Xmstart; - 
int Xoffset_top; - eptrblock *Xeptrb; - unsigned int Xrdepth; - - /* Function local variables */ - - PCRE_PUCHAR Xcallpat; -#ifdef SUPPORT_UTF - PCRE_PUCHAR Xcharptr; -#endif - PCRE_PUCHAR Xdata; - PCRE_PUCHAR Xnext; - PCRE_PUCHAR Xpp; - PCRE_PUCHAR Xprev; - PCRE_PUCHAR Xsaved_eptr; - - recursion_info Xnew_recursive; - - BOOL Xcur_is_word; - BOOL Xcondition; - BOOL Xprev_is_word; - -#ifdef SUPPORT_UCP - int Xprop_type; - unsigned int Xprop_value; - int Xprop_fail_result; - int Xoclength; - pcre_uchar Xocchars[6]; -#endif - - int Xcodelink; - int Xctype; - unsigned int Xfc; - int Xfi; - int Xlength; - int Xmax; - int Xmin; - unsigned int Xnumber; - int Xoffset; - unsigned int Xop; - pcre_int32 Xsave_capture_last; - int Xsave_offset1, Xsave_offset2, Xsave_offset3; - int Xstacksave[REC_STACK_SAVE_MAX]; - - eptrblock Xnewptrb; - - /* Where to jump back to */ - - int Xwhere; - -} heapframe; - -#endif - - -/*************************************************************************** -***************************************************************************/ - - - -/************************************************* -* Match from current position * -*************************************************/ - -/* This function is called recursively in many circumstances. Whenever it -returns a negative (error) response, the outer incarnation must also return the -same response. */ - -/* These macros pack up tests that are used for partial matching, and which -appear several times in the code. We set the "hit end" flag if the pointer is -at the end of the subject and also past the start of the subject (i.e. -something has been matched). For hard partial matching, we then return -immediately. The second one is used when we already know we are past the end of -the subject. 
*/ - -#define CHECK_PARTIAL()\ - if (md->partial != 0 && eptr >= md->end_subject && \ - eptr > md->start_used_ptr) \ - { \ - md->hitend = TRUE; \ - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); \ - } - -#define SCHECK_PARTIAL()\ - if (md->partial != 0 && eptr > md->start_used_ptr) \ - { \ - md->hitend = TRUE; \ - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); \ - } - - -/* Performance note: It might be tempting to extract commonly used fields from -the md structure (e.g. utf, end_subject) into individual variables to improve -performance. Tests using gcc on a SPARC disproved this; in the first case, it -made performance worse. - -Arguments: - eptr pointer to current character in subject - ecode pointer to current position in compiled code - mstart pointer to the current match start position (can be modified - by encountering \K) - offset_top current top pointer - md pointer to "static" info for the match - eptrb pointer to chain of blocks containing eptr at start of - brackets - for testing for empty matches - rdepth the recursion depth - -Returns: MATCH_MATCH if matched ) these values are >= 0 - MATCH_NOMATCH if failed to match ) - a negative MATCH_xxx value for PRUNE, SKIP, etc - a negative PCRE_ERROR_xxx value if aborted by an error condition - (e.g. stopped by repeated call or recursion limit) -*/ - -static int -match(REGISTER PCRE_PUCHAR eptr, REGISTER const pcre_uchar *ecode, - PCRE_PUCHAR mstart, int offset_top, match_data *md, eptrblock *eptrb, - unsigned int rdepth) -{ -/* These variables do not need to be preserved over recursion in this function, -so they can be ordinary variables in all cases. Mark some of them with -"register" because they are used a lot in loops. 
*/ - -register int rrc; /* Returns from recursive calls */ -register int i; /* Used for loops not involving calls to RMATCH() */ -register pcre_uint32 c; /* Character values not kept over RMATCH() calls */ -register BOOL utf; /* Local copy of UTF flag for speed */ - -BOOL minimize, possessive; /* Quantifier options */ -BOOL caseless; -int condcode; - -/* When recursion is not being used, all "local" variables that have to be -preserved over calls to RMATCH() are part of a "frame". We set up the top-level -frame on the stack here; subsequent instantiations are obtained from the heap -whenever RMATCH() does a "recursion". See the macro definitions above. Putting -the top-level on the stack rather than malloc-ing them all gives a performance -boost in many cases where there is not much "recursion". */ - -#ifdef NO_RECURSE -heapframe *frame = (heapframe *)md->match_frames_base; - -/* Copy in the original argument variables */ - -frame->Xeptr = eptr; -frame->Xecode = ecode; -frame->Xmstart = mstart; -frame->Xoffset_top = offset_top; -frame->Xeptrb = eptrb; -frame->Xrdepth = rdepth; - -/* This is where control jumps back to to effect "recursion" */ - -HEAP_RECURSE: - -/* Macros make the argument variables come from the current frame */ - -#define eptr frame->Xeptr -#define ecode frame->Xecode -#define mstart frame->Xmstart -#define offset_top frame->Xoffset_top -#define eptrb frame->Xeptrb -#define rdepth frame->Xrdepth - -/* Ditto for the local variables */ - -#ifdef SUPPORT_UTF -#define charptr frame->Xcharptr -#endif -#define callpat frame->Xcallpat -#define codelink frame->Xcodelink -#define data frame->Xdata -#define next frame->Xnext -#define pp frame->Xpp -#define prev frame->Xprev -#define saved_eptr frame->Xsaved_eptr - -#define new_recursive frame->Xnew_recursive - -#define cur_is_word frame->Xcur_is_word -#define condition frame->Xcondition -#define prev_is_word frame->Xprev_is_word - -#ifdef SUPPORT_UCP -#define prop_type frame->Xprop_type -#define prop_value 
frame->Xprop_value -#define prop_fail_result frame->Xprop_fail_result -#define oclength frame->Xoclength -#define occhars frame->Xocchars -#endif - -#define ctype frame->Xctype -#define fc frame->Xfc -#define fi frame->Xfi -#define length frame->Xlength -#define max frame->Xmax -#define min frame->Xmin -#define number frame->Xnumber -#define offset frame->Xoffset -#define op frame->Xop -#define save_capture_last frame->Xsave_capture_last -#define save_offset1 frame->Xsave_offset1 -#define save_offset2 frame->Xsave_offset2 -#define save_offset3 frame->Xsave_offset3 -#define stacksave frame->Xstacksave - -#define newptrb frame->Xnewptrb - -/* When recursion is being used, local variables are allocated on the stack and -get preserved during recursion in the normal way. In this environment, fi and -i, and fc and c, can be the same variables. */ - -#else /* NO_RECURSE not defined */ -#define fi i -#define fc c - -/* Many of the following variables are used only in small blocks of the code. -My normal style of coding would have declared them within each of those blocks. -However, in order to accommodate the version of this code that uses an external -"stack" implemented on the heap, it is easier to declare them all here, so the -declarations can be cut out in a block. The only declarations within blocks -below are for variables that do not have to be preserved over a recursive call -to RMATCH(). 
*/ - -#ifdef SUPPORT_UTF -const pcre_uchar *charptr; -#endif -const pcre_uchar *callpat; -const pcre_uchar *data; -const pcre_uchar *next; -PCRE_PUCHAR pp; -const pcre_uchar *prev; -PCRE_PUCHAR saved_eptr; - -recursion_info new_recursive; - -BOOL cur_is_word; -BOOL condition; -BOOL prev_is_word; - -#ifdef SUPPORT_UCP -int prop_type; -unsigned int prop_value; -int prop_fail_result; -int oclength; -pcre_uchar occhars[6]; -#endif - -int codelink; -int ctype; -int length; -int max; -int min; -unsigned int number; -int offset; -unsigned int op; -pcre_int32 save_capture_last; -int save_offset1, save_offset2, save_offset3; -int stacksave[REC_STACK_SAVE_MAX]; - -eptrblock newptrb; - -/* There is a special fudge for calling match() in a way that causes it to -measure the size of its basic stack frame when the stack is being used for -recursion. The second argument (ecode) being NULL triggers this behaviour. It -cannot normally ever be NULL. The return is the negated value of the frame -size. */ - -if (ecode == NULL) - { - if (rdepth == 0) - return match((PCRE_PUCHAR)&rdepth, NULL, NULL, 0, NULL, NULL, 1); - else - { - int len = (int)((char *)&rdepth - (char *)eptr); - return (len > 0)? -len : len; - } - } -#endif /* NO_RECURSE */ - -/* To save space on the stack and in the heap frame, I have doubled up on some -of the local variables that are used only in localised parts of the code, but -still need to be preserved over recursive calls of match(). These macros define -the alternative names that are used. */ - -#define allow_zero cur_is_word -#define cbegroup condition -#define code_offset codelink -#define condassert condition -#define matched_once prev_is_word -#define foc number -#define save_mark data - -/* These statements are here to stop the compiler complaining about unitialized -variables. 
*/ - -#ifdef SUPPORT_UCP -prop_value = 0; -prop_fail_result = 0; -#endif - - -/* This label is used for tail recursion, which is used in a few cases even -when NO_RECURSE is not defined, in order to reduce the amount of stack that is -used. Thanks to Ian Taylor for noticing this possibility and sending the -original patch. */ - -TAIL_RECURSE: - -/* OK, now we can get on with the real code of the function. Recursive calls -are specified by the macro RMATCH and RRETURN is used to return. When -NO_RECURSE is *not* defined, these just turn into a recursive call to match() -and a "return", respectively (possibly with some debugging if PCRE_DEBUG is -defined). However, RMATCH isn't like a function call because it's quite a -complicated macro. It has to be used in one particular way. This shouldn't, -however, impact performance when true recursion is being used. */ - -#ifdef SUPPORT_UTF -utf = md->utf; /* Local copy of the flag */ -#else -utf = FALSE; -#endif - -/* First check that we haven't called match() too many times, or that we -haven't exceeded the recursive call limit. */ - -if (md->match_call_count++ >= md->match_limit) RRETURN(PCRE_ERROR_MATCHLIMIT); -if (rdepth >= md->match_limit_recursion) RRETURN(PCRE_ERROR_RECURSIONLIMIT); - -/* At the start of a group with an unlimited repeat that may match an empty -string, the variable md->match_function_type is set to MATCH_CBEGROUP. It is -done this way to save having to use another function argument, which would take -up space on the stack. See also MATCH_CONDASSERT below. - -When MATCH_CBEGROUP is set, add the current subject pointer to the chain of -such remembered pointers, to be checked when we hit the closing ket, in order -to break infinite loops that match no characters. When match() is called in -other circumstances, don't add to the chain. The MATCH_CBEGROUP feature must -NOT be used with tail recursion, because the memory block that is used is on -the stack, so a new one may be required for each match(). 
*/ - -if (md->match_function_type == MATCH_CBEGROUP) - { - newptrb.epb_saved_eptr = eptr; - newptrb.epb_prev = eptrb; - eptrb = &newptrb; - md->match_function_type = 0; - } - -/* Now start processing the opcodes. */ - -for (;;) - { - minimize = possessive = FALSE; - op = *ecode; - - switch(op) - { - case OP_MARK: - md->nomatch_mark = ecode + 2; - md->mark = NULL; /* In case previously set by assertion */ - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode] + ecode[1], offset_top, md, - eptrb, RM55); - if ((rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) && - md->mark == NULL) md->mark = ecode + 2; - - /* A return of MATCH_SKIP_ARG means that matching failed at SKIP with an - argument, and we must check whether that argument matches this MARK's - argument. It is passed back in md->start_match_ptr (an overloading of that - variable). If it does match, we reset that variable to the current subject - position and return MATCH_SKIP. Otherwise, pass back the return code - unaltered. */ - - else if (rrc == MATCH_SKIP_ARG && - STRCMP_UC_UC_TEST(ecode + 2, md->start_match_ptr) == 0) - { - md->start_match_ptr = eptr; - RRETURN(MATCH_SKIP); - } - RRETURN(rrc); - - case OP_FAIL: - RRETURN(MATCH_NOMATCH); - - case OP_COMMIT: - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, RM52); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - RRETURN(MATCH_COMMIT); - - case OP_PRUNE: - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, RM51); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - RRETURN(MATCH_PRUNE); - - case OP_PRUNE_ARG: - md->nomatch_mark = ecode + 2; - md->mark = NULL; /* In case previously set by assertion */ - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode] + ecode[1], offset_top, md, - eptrb, RM56); - if ((rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) && - md->mark == NULL) md->mark = ecode + 2; - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - RRETURN(MATCH_PRUNE); - - case OP_SKIP: - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, 
RM53); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->start_match_ptr = eptr; /* Pass back current position */ - RRETURN(MATCH_SKIP); - - /* Note that, for Perl compatibility, SKIP with an argument does NOT set - nomatch_mark. When a pattern match ends with a SKIP_ARG for which there was - not a matching mark, we have to re-run the match, ignoring the SKIP_ARG - that failed and any that precede it (either they also failed, or were not - triggered). To do this, we maintain a count of executed SKIP_ARGs. If a - SKIP_ARG gets to top level, the match is re-run with md->ignore_skip_arg - set to the count of the one that failed. */ - - case OP_SKIP_ARG: - md->skip_arg_count++; - if (md->skip_arg_count <= md->ignore_skip_arg) - { - ecode += PRIV(OP_lengths)[*ecode] + ecode[1]; - break; - } - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode] + ecode[1], offset_top, md, - eptrb, RM57); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - - /* Pass back the current skip name by overloading md->start_match_ptr and - returning the special MATCH_SKIP_ARG return code. This will either be - caught by a matching MARK, or get to the top, where it causes a rematch - with md->ignore_skip_arg set to the value of md->skip_arg_count. */ - - md->start_match_ptr = ecode + 2; - RRETURN(MATCH_SKIP_ARG); - - /* For THEN (and THEN_ARG) we pass back the address of the opcode, so that - the branch in which it occurs can be determined. Overload the start of - match pointer to do this. 
*/ - - case OP_THEN: - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, RM54); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->start_match_ptr = ecode; - RRETURN(MATCH_THEN); - - case OP_THEN_ARG: - md->nomatch_mark = ecode + 2; - md->mark = NULL; /* In case previously set by assertion */ - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode] + ecode[1], offset_top, - md, eptrb, RM58); - if ((rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) && - md->mark == NULL) md->mark = ecode + 2; - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->start_match_ptr = ecode; - RRETURN(MATCH_THEN); - - /* Handle an atomic group that does not contain any capturing parentheses. - This can be handled like an assertion. Prior to 8.13, all atomic groups - were handled this way. In 8.13, the code was changed as below for ONCE, so - that backups pass through the group and thereby reset captured values. - However, this uses a lot more stack, so in 8.20, atomic groups that do not - contain any captures generate OP_ONCE_NC, which can be handled in the old, - less stack intensive way. - - Check the alternative branches in turn - the matching won't pass the KET - for this kind of subpattern. If any one branch matches, we carry on as at - the end of a normal bracket, leaving the subject pointer, but resetting - the start-of-match value in case it was changed by \K. 
*/ - - case OP_ONCE_NC: - prev = ecode; - saved_eptr = eptr; - save_mark = md->mark; - do - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, eptrb, RM64); - if (rrc == MATCH_MATCH) /* Note: _not_ MATCH_ACCEPT */ - { - mstart = md->start_match_ptr; - break; - } - if (rrc == MATCH_THEN) - { - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - rrc = MATCH_NOMATCH; - } - - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - ecode += GET(ecode,1); - md->mark = save_mark; - } - while (*ecode == OP_ALT); - - /* If hit the end of the group (which could be repeated), fail */ - - if (*ecode != OP_ONCE_NC && *ecode != OP_ALT) RRETURN(MATCH_NOMATCH); - - /* Continue as from after the group, updating the offsets high water - mark, since extracts may have been taken. */ - - do ecode += GET(ecode, 1); while (*ecode == OP_ALT); - - offset_top = md->end_offset_top; - eptr = md->end_match_ptr; - - /* For a non-repeating ket, just continue at this level. This also - happens for a repeating ket if no characters were matched in the group. - This is the forcible breaking of infinite loops as implemented in Perl - 5.005. */ - - if (*ecode == OP_KET || eptr == saved_eptr) - { - ecode += 1+LINK_SIZE; - break; - } - - /* The repeating kets try the rest of the pattern or restart from the - preceding bracket, in the appropriate order. The second "call" of match() - uses tail recursion, to avoid using another stack frame. */ - - if (*ecode == OP_KETRMIN) - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, eptrb, RM65); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - ecode = prev; - goto TAIL_RECURSE; - } - else /* OP_KETRMAX */ - { - RMATCH(eptr, prev, offset_top, md, eptrb, RM66); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - ecode += 1 + LINK_SIZE; - goto TAIL_RECURSE; - } - /* Control never gets here */ - - /* Handle a capturing bracket, other than those that are possessive with an - unlimited repeat. 
If there is space in the offset vector, save the current - subject position in the working slot at the top of the vector. We mustn't - change the current values of the data slot, because they may be set from a - previous iteration of this group, and be referred to by a reference inside - the group. A failure to match might occur after the group has succeeded, - if something later on doesn't match. For this reason, we need to restore - the working value and also the values of the final offsets, in case they - were set by a previous iteration of the same bracket. - - If there isn't enough space in the offset vector, treat this as if it were - a non-capturing bracket. Don't worry about setting the flag for the error - case here; that is handled in the code for KET. */ - - case OP_CBRA: - case OP_SCBRA: - number = GET2(ecode, 1+LINK_SIZE); - offset = number << 1; - -#ifdef PCRE_DEBUG - printf("start bracket %d\n", number); - printf("subject="); - pchars(eptr, 16, TRUE, md); - printf("\n"); -#endif - - if (offset < md->offset_max) - { - save_offset1 = md->offset_vector[offset]; - save_offset2 = md->offset_vector[offset+1]; - save_offset3 = md->offset_vector[md->offset_end - number]; - save_capture_last = md->capture_last; - save_mark = md->mark; - - DPRINTF(("saving %d %d %d\n", save_offset1, save_offset2, save_offset3)); - md->offset_vector[md->offset_end - number] = - (int)(eptr - md->start_subject); - - for (;;) - { - if (op >= OP_SBRA) md->match_function_type = MATCH_CBEGROUP; - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, RM1); - if (rrc == MATCH_ONCE) break; /* Backing up through an atomic group */ - - /* If we backed up to a THEN, check whether it is within the current - branch by comparing the address of the THEN that is passed back with - the end of the branch. 
If it is within the current branch, and the - branch is one of two or more alternatives (it either starts or ends - with OP_ALT), we have reached the limit of THEN's action, so convert - the return code to NOMATCH, which will cause normal backtracking to - happen from now on. Otherwise, THEN is passed back to an outer - alternative. This implements Perl's treatment of parenthesized groups, - where a group not containing | does not affect the current alternative, - that is, (X) is NOT the same as (X|(*F)). */ - - if (rrc == MATCH_THEN) - { - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - rrc = MATCH_NOMATCH; - } - - /* Anything other than NOMATCH is passed back. */ - - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->capture_last = save_capture_last; - ecode += GET(ecode, 1); - md->mark = save_mark; - if (*ecode != OP_ALT) break; - } - - DPRINTF(("bracket %d failed\n", number)); - md->offset_vector[offset] = save_offset1; - md->offset_vector[offset+1] = save_offset2; - md->offset_vector[md->offset_end - number] = save_offset3; - - /* At this point, rrc will be one of MATCH_ONCE or MATCH_NOMATCH. */ - - RRETURN(rrc); - } - - /* FALL THROUGH ... Insufficient room for saving captured contents. Treat - as a non-capturing bracket. */ - - /* VVVVVVVVVVVVVVVVVVVVVVVVV */ - /* VVVVVVVVVVVVVVVVVVVVVVVVV */ - - DPRINTF(("insufficient capture room: treat as non-capturing\n")); - - /* VVVVVVVVVVVVVVVVVVVVVVVVV */ - /* VVVVVVVVVVVVVVVVVVVVVVVVV */ - - /* Non-capturing or atomic group, except for possessive with unlimited - repeat and ONCE group with no captures. Loop for all the alternatives. - - When we get to the final alternative within the brackets, we used to return - the result of a recursive call to match() whatever happened so it was - possible to reduce stack usage by turning this into a tail recursion, - except in the case of a possibly empty group. 
However, now that there is - the possiblity of (*THEN) occurring in the final alternative, this - optimization is no longer always possible. - - We can optimize if we know there are no (*THEN)s in the pattern; at present - this is the best that can be done. - - MATCH_ONCE is returned when the end of an atomic group is successfully - reached, but subsequent matching fails. It passes back up the tree (causing - captured values to be reset) until the original atomic group level is - reached. This is tested by comparing md->once_target with the start of the - group. At this point, the return is converted into MATCH_NOMATCH so that - previous backup points can be taken. */ - - case OP_ONCE: - case OP_BRA: - case OP_SBRA: - DPRINTF(("start non-capturing bracket\n")); - - for (;;) - { - if (op >= OP_SBRA || op == OP_ONCE) - md->match_function_type = MATCH_CBEGROUP; - - /* If this is not a possibly empty group, and there are no (*THEN)s in - the pattern, and this is the final alternative, optimize as described - above. */ - - else if (!md->hasthen && ecode[GET(ecode, 1)] != OP_ALT) - { - ecode += PRIV(OP_lengths)[*ecode]; - goto TAIL_RECURSE; - } - - /* In all other cases, we have to make another call to match(). */ - - save_mark = md->mark; - save_capture_last = md->capture_last; - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, eptrb, - RM2); - - /* See comment in the code for capturing groups above about handling - THEN. 
*/ - - if (rrc == MATCH_THEN) - { - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - rrc = MATCH_NOMATCH; - } - - if (rrc != MATCH_NOMATCH) - { - if (rrc == MATCH_ONCE) - { - const pcre_uchar *scode = ecode; - if (*scode != OP_ONCE) /* If not at start, find it */ - { - while (*scode == OP_ALT) scode += GET(scode, 1); - scode -= GET(scode, 1); - } - if (md->once_target == scode) rrc = MATCH_NOMATCH; - } - RRETURN(rrc); - } - ecode += GET(ecode, 1); - md->mark = save_mark; - if (*ecode != OP_ALT) break; - md->capture_last = save_capture_last; - } - - RRETURN(MATCH_NOMATCH); - - /* Handle possessive capturing brackets with an unlimited repeat. We come - here from BRAZERO with allow_zero set TRUE. The offset_vector values are - handled similarly to the normal case above. However, the matching is - different. The end of these brackets will always be OP_KETRPOS, which - returns MATCH_KETRPOS without going further in the pattern. By this means - we can handle the group by iteration rather than recursion, thereby - reducing the amount of stack needed. */ - - case OP_CBRAPOS: - case OP_SCBRAPOS: - allow_zero = FALSE; - - POSSESSIVE_CAPTURE: - number = GET2(ecode, 1+LINK_SIZE); - offset = number << 1; - -#ifdef PCRE_DEBUG - printf("start possessive bracket %d\n", number); - printf("subject="); - pchars(eptr, 16, TRUE, md); - printf("\n"); -#endif - - if (offset >= md->offset_max) goto POSSESSIVE_NON_CAPTURE; - - matched_once = FALSE; - code_offset = (int)(ecode - md->start_code); - - save_offset1 = md->offset_vector[offset]; - save_offset2 = md->offset_vector[offset+1]; - save_offset3 = md->offset_vector[md->offset_end - number]; - save_capture_last = md->capture_last; - - DPRINTF(("saving %d %d %d\n", save_offset1, save_offset2, save_offset3)); - - /* Each time round the loop, save the current subject position for use - when the group matches. 
For MATCH_MATCH, the group has matched, so we - restart it with a new subject starting position, remembering that we had - at least one match. For MATCH_NOMATCH, carry on with the alternatives, as - usual. If we haven't matched any alternatives in any iteration, check to - see if a previous iteration matched. If so, the group has matched; - continue from afterwards. Otherwise it has failed; restore the previous - capture values before returning NOMATCH. */ - - for (;;) - { - md->offset_vector[md->offset_end - number] = - (int)(eptr - md->start_subject); - if (op >= OP_SBRA) md->match_function_type = MATCH_CBEGROUP; - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, RM63); - if (rrc == MATCH_KETRPOS) - { - offset_top = md->end_offset_top; - ecode = md->start_code + code_offset; - save_capture_last = md->capture_last; - matched_once = TRUE; - mstart = md->start_match_ptr; /* In case \K changed it */ - if (eptr == md->end_match_ptr) /* Matched an empty string */ - { - do ecode += GET(ecode, 1); while (*ecode == OP_ALT); - break; - } - eptr = md->end_match_ptr; - continue; - } - - /* See comment in the code for capturing groups above about handling - THEN. */ - - if (rrc == MATCH_THEN) - { - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - rrc = MATCH_NOMATCH; - } - - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->capture_last = save_capture_last; - ecode += GET(ecode, 1); - if (*ecode != OP_ALT) break; - } - - if (!matched_once) - { - md->offset_vector[offset] = save_offset1; - md->offset_vector[offset+1] = save_offset2; - md->offset_vector[md->offset_end - number] = save_offset3; - } - - if (allow_zero || matched_once) - { - ecode += 1 + LINK_SIZE; - break; - } - - RRETURN(MATCH_NOMATCH); - - /* Non-capturing possessive bracket with unlimited repeat. We come here - from BRAZERO with allow_zero = TRUE. The code is similar to the above, - without the capturing complication. 
It is written out separately for speed - and cleanliness. */ - - case OP_BRAPOS: - case OP_SBRAPOS: - allow_zero = FALSE; - - POSSESSIVE_NON_CAPTURE: - matched_once = FALSE; - code_offset = (int)(ecode - md->start_code); - save_capture_last = md->capture_last; - - for (;;) - { - if (op >= OP_SBRA) md->match_function_type = MATCH_CBEGROUP; - RMATCH(eptr, ecode + PRIV(OP_lengths)[*ecode], offset_top, md, - eptrb, RM48); - if (rrc == MATCH_KETRPOS) - { - offset_top = md->end_offset_top; - ecode = md->start_code + code_offset; - matched_once = TRUE; - mstart = md->start_match_ptr; /* In case \K reset it */ - if (eptr == md->end_match_ptr) /* Matched an empty string */ - { - do ecode += GET(ecode, 1); while (*ecode == OP_ALT); - break; - } - eptr = md->end_match_ptr; - continue; - } - - /* See comment in the code for capturing groups above about handling - THEN. */ - - if (rrc == MATCH_THEN) - { - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - rrc = MATCH_NOMATCH; - } - - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - ecode += GET(ecode, 1); - if (*ecode != OP_ALT) break; - md->capture_last = save_capture_last; - } - - if (matched_once || allow_zero) - { - ecode += 1 + LINK_SIZE; - break; - } - RRETURN(MATCH_NOMATCH); - - /* Control never reaches here. */ - - /* Conditional group: compilation checked that there are no more than two - branches. If the condition is false, skipping the first branch takes us - past the end of the item if there is only one branch, but that's exactly - what we want. */ - - case OP_COND: - case OP_SCOND: - - /* The variable codelink will be added to ecode when the condition is - false, to get to the second branch. Setting it to the offset to the ALT - or KET, then incrementing ecode achieves this effect. We now have ecode - pointing to the condition or callout. 
*/ - - codelink = GET(ecode, 1); /* Offset to the second branch */ - ecode += 1 + LINK_SIZE; /* From this opcode */ - - /* Because of the way auto-callout works during compile, a callout item is - inserted between OP_COND and an assertion condition. */ - - if (*ecode == OP_CALLOUT) - { - if (PUBL(callout) != NULL) - { - PUBL(callout_block) cb; - cb.version = 2; /* Version 1 of the callout block */ - cb.callout_number = ecode[1]; - cb.offset_vector = md->offset_vector; -#if defined COMPILE_PCRE8 - cb.subject = (PCRE_SPTR)md->start_subject; -#elif defined COMPILE_PCRE16 - cb.subject = (PCRE_SPTR16)md->start_subject; -#elif defined COMPILE_PCRE32 - cb.subject = (PCRE_SPTR32)md->start_subject; -#endif - cb.subject_length = (int)(md->end_subject - md->start_subject); - cb.start_match = (int)(mstart - md->start_subject); - cb.current_position = (int)(eptr - md->start_subject); - cb.pattern_position = GET(ecode, 2); - cb.next_item_length = GET(ecode, 2 + LINK_SIZE); - cb.capture_top = offset_top/2; - cb.capture_last = md->capture_last & CAPLMASK; - /* Internal change requires this for API compatibility. */ - if (cb.capture_last == 0) cb.capture_last = -1; - cb.callout_data = md->callout_data; - cb.mark = md->nomatch_mark; - if ((rrc = (*PUBL(callout))(&cb)) > 0) RRETURN(MATCH_NOMATCH); - if (rrc < 0) RRETURN(rrc); - } - - /* Advance ecode past the callout, so it now points to the condition. We - must adjust codelink so that the value of ecode+codelink is unchanged. 
*/ - - ecode += PRIV(OP_lengths)[OP_CALLOUT]; - codelink -= PRIV(OP_lengths)[OP_CALLOUT]; - } - - /* Test the various possible conditions */ - - condition = FALSE; - switch(condcode = *ecode) - { - case OP_RREF: /* Numbered group recursion test */ - if (md->recursive != NULL) /* Not recursing => FALSE */ - { - unsigned int recno = GET2(ecode, 1); /* Recursion group number*/ - condition = (recno == RREF_ANY || recno == md->recursive->group_num); - } - break; - - case OP_DNRREF: /* Duplicate named group recursion test */ - if (md->recursive != NULL) - { - int count = GET2(ecode, 1 + IMM2_SIZE); - pcre_uchar *slot = md->name_table + GET2(ecode, 1) * md->name_entry_size; - while (count-- > 0) - { - unsigned int recno = GET2(slot, 0); - condition = recno == md->recursive->group_num; - if (condition) break; - slot += md->name_entry_size; - } - } - break; - - case OP_CREF: /* Numbered group used test */ - offset = GET2(ecode, 1) << 1; /* Doubled ref number */ - condition = offset < offset_top && md->offset_vector[offset] >= 0; - break; - - case OP_DNCREF: /* Duplicate named group used test */ - { - int count = GET2(ecode, 1 + IMM2_SIZE); - pcre_uchar *slot = md->name_table + GET2(ecode, 1) * md->name_entry_size; - while (count-- > 0) - { - offset = GET2(slot, 0) << 1; - condition = offset < offset_top && md->offset_vector[offset] >= 0; - if (condition) break; - slot += md->name_entry_size; - } - } - break; - - case OP_DEF: /* DEFINE - always false */ - case OP_FAIL: /* From optimized (?!) condition */ - break; - - /* The condition is an assertion. Call match() to evaluate it - setting - md->match_function_type to MATCH_CONDASSERT causes it to stop at the end - of an assertion. 
*/ - - default: - md->match_function_type = MATCH_CONDASSERT; - RMATCH(eptr, ecode, offset_top, md, NULL, RM3); - if (rrc == MATCH_MATCH) - { - if (md->end_offset_top > offset_top) - offset_top = md->end_offset_top; /* Captures may have happened */ - condition = TRUE; - - /* Advance ecode past the assertion to the start of the first branch, - but adjust it so that the general choosing code below works. If the - assertion has a quantifier that allows zero repeats we must skip over - the BRAZERO. This is a lunatic thing to do, but somebody did! */ - - if (*ecode == OP_BRAZERO) ecode++; - ecode += GET(ecode, 1); - while (*ecode == OP_ALT) ecode += GET(ecode, 1); - ecode += 1 + LINK_SIZE - PRIV(OP_lengths)[condcode]; - } - - /* PCRE doesn't allow the effect of (*THEN) to escape beyond an - assertion; it is therefore treated as NOMATCH. Any other return is an - error. */ - - else if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) - { - RRETURN(rrc); /* Need braces because of following else */ - } - break; - } - - /* Choose branch according to the condition */ - - ecode += condition? PRIV(OP_lengths)[condcode] : codelink; - - /* We are now at the branch that is to be obeyed. As there is only one, we - can use tail recursion to avoid using another stack frame, except when - there is unlimited repeat of a possibly empty group. In the latter case, a - recursive call to match() is always required, unless the second alternative - doesn't exist, in which case we can just plough on. Note that, for - compatibility with Perl, the | in a conditional group is NOT treated as - creating two alternatives. If a THEN is encountered in the branch, it - propagates out to the enclosing alternative (unless nested in a deeper set - of alternatives, of course). 
*/ - - if (condition || ecode[-(1+LINK_SIZE)] == OP_ALT) - { - if (op != OP_SCOND) - { - goto TAIL_RECURSE; - } - - md->match_function_type = MATCH_CBEGROUP; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM49); - RRETURN(rrc); - } - - /* Condition false & no alternative; continue after the group. */ - - else - { - } - break; - - - /* Before OP_ACCEPT there may be any number of OP_CLOSE opcodes, - to close any currently open capturing brackets. */ - - case OP_CLOSE: - number = GET2(ecode, 1); /* Must be less than 65536 */ - offset = number << 1; - -#ifdef PCRE_DEBUG - printf("end bracket %d at *ACCEPT", number); - printf("\n"); -#endif - - md->capture_last = (md->capture_last & OVFLMASK) | number; - if (offset >= md->offset_max) md->capture_last |= OVFLBIT; else - { - md->offset_vector[offset] = - md->offset_vector[md->offset_end - number]; - md->offset_vector[offset+1] = (int)(eptr - md->start_subject); - - /* If this group is at or above the current highwater mark, ensure that - any groups between the current high water mark and this group are marked - unset and then update the high water mark. */ - - if (offset >= offset_top) - { - register int *iptr = md->offset_vector + offset_top; - register int *iend = md->offset_vector + offset; - while (iptr < iend) *iptr++ = -1; - offset_top = offset + 2; - } - } - ecode += 1 + IMM2_SIZE; - break; - - - /* End of the pattern, either real or forced. */ - - case OP_END: - case OP_ACCEPT: - case OP_ASSERT_ACCEPT: - - /* If we have matched an empty string, fail if not in an assertion and not - in a recursion if either PCRE_NOTEMPTY is set, or if PCRE_NOTEMPTY_ATSTART - is set and we have matched at the start of the subject. In both cases, - backtracking will then try other alternatives, if any. 
*/ - - if (eptr == mstart && op != OP_ASSERT_ACCEPT && - md->recursive == NULL && - (md->notempty || - (md->notempty_atstart && - mstart == md->start_subject + md->start_offset))) - RRETURN(MATCH_NOMATCH); - - /* Otherwise, we have a match. */ - - md->end_match_ptr = eptr; /* Record where we ended */ - md->end_offset_top = offset_top; /* and how many extracts were taken */ - md->start_match_ptr = mstart; /* and the start (\K can modify) */ - - /* For some reason, the macros don't work properly if an expression is - given as the argument to RRETURN when the heap is in use. */ - - rrc = (op == OP_END)? MATCH_MATCH : MATCH_ACCEPT; - RRETURN(rrc); - - /* Assertion brackets. Check the alternative branches in turn - the - matching won't pass the KET for an assertion. If any one branch matches, - the assertion is true. Lookbehind assertions have an OP_REVERSE item at the - start of each branch to move the current point backwards, so the code at - this level is identical to the lookahead case. When the assertion is part - of a condition, we want to return immediately afterwards. The caller of - this incarnation of the match() function will have set MATCH_CONDASSERT in - md->match_function type, and one of these opcodes will be the first opcode - that is processed. We use a local variable that is preserved over calls to - match() to remember this case. */ - - case OP_ASSERT: - case OP_ASSERTBACK: - save_mark = md->mark; - if (md->match_function_type == MATCH_CONDASSERT) - { - condassert = TRUE; - md->match_function_type = 0; - } - else condassert = FALSE; - - /* Loop for each branch */ - - do - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM4); - - /* A match means that the assertion is true; break out of the loop - that matches its alternatives. */ - - if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) - { - mstart = md->start_match_ptr; /* In case \K reset it */ - break; - } - - /* If not matched, restore the previous mark setting. 
*/ - - md->mark = save_mark; - - /* See comment in the code for capturing groups above about handling - THEN. */ - - if (rrc == MATCH_THEN) - { - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - rrc = MATCH_NOMATCH; - } - - /* Anything other than NOMATCH causes the entire assertion to fail, - passing back the return code. This includes COMMIT, SKIP, PRUNE and an - uncaptured THEN, which means they take their normal effect. This - consistent approach does not always have exactly the same effect as in - Perl. */ - - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - ecode += GET(ecode, 1); - } - while (*ecode == OP_ALT); /* Continue for next alternative */ - - /* If we have tried all the alternative branches, the assertion has - failed. If not, we broke out after a match. */ - - if (*ecode == OP_KET) RRETURN(MATCH_NOMATCH); - - /* If checking an assertion for a condition, return MATCH_MATCH. */ - - if (condassert) RRETURN(MATCH_MATCH); - - /* Continue from after a successful assertion, updating the offsets high - water mark, since extracts may have been taken during the assertion. */ - - do ecode += GET(ecode,1); while (*ecode == OP_ALT); - ecode += 1 + LINK_SIZE; - offset_top = md->end_offset_top; - continue; - - /* Negative assertion: all branches must fail to match for the assertion to - succeed. */ - - case OP_ASSERT_NOT: - case OP_ASSERTBACK_NOT: - save_mark = md->mark; - if (md->match_function_type == MATCH_CONDASSERT) - { - condassert = TRUE; - md->match_function_type = 0; - } - else condassert = FALSE; - - /* Loop for each alternative branch. */ - - do - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, NULL, RM5); - md->mark = save_mark; /* Always restore the mark setting */ - - switch(rrc) - { - case MATCH_MATCH: /* A successful match means */ - case MATCH_ACCEPT: /* the assertion has failed. 
*/ - RRETURN(MATCH_NOMATCH); - - case MATCH_NOMATCH: /* Carry on with next branch */ - break; - - /* See comment in the code for capturing groups above about handling - THEN. */ - - case MATCH_THEN: - next = ecode + GET(ecode,1); - if (md->start_match_ptr < next && - (*ecode == OP_ALT || *next == OP_ALT)) - { - rrc = MATCH_NOMATCH; - break; - } - /* Otherwise fall through. */ - - /* COMMIT, SKIP, PRUNE, and an uncaptured THEN cause the whole - assertion to fail to match, without considering any more alternatives. - Failing to match means the assertion is true. This is a consistent - approach, but does not always have the same effect as in Perl. */ - - case MATCH_COMMIT: - case MATCH_SKIP: - case MATCH_SKIP_ARG: - case MATCH_PRUNE: - do ecode += GET(ecode,1); while (*ecode == OP_ALT); - goto NEG_ASSERT_TRUE; /* Break out of alternation loop */ - - /* Anything else is an error */ - - default: - RRETURN(rrc); - } - - /* Continue with next branch */ - - ecode += GET(ecode,1); - } - while (*ecode == OP_ALT); - - /* All branches in the assertion failed to match. */ - - NEG_ASSERT_TRUE: - if (condassert) RRETURN(MATCH_MATCH); /* Condition assertion */ - ecode += 1 + LINK_SIZE; /* Continue with current branch */ - continue; - - /* Move the subject pointer back. This occurs only at the start of - each branch of a lookbehind assertion. If we are too close to the start to - move back, this match function fails. When working with UTF-8 we move - back a number of characters, not bytes. 
*/ - - case OP_REVERSE: -#ifdef SUPPORT_UTF - if (utf) - { - i = GET(ecode, 1); - while (i-- > 0) - { - eptr--; - if (eptr < md->start_subject) RRETURN(MATCH_NOMATCH); - BACKCHAR(eptr); - } - } - else -#endif - - /* No UTF-8 support, or not in UTF-8 mode: count is byte count */ - - { - eptr -= GET(ecode, 1); - if (eptr < md->start_subject) RRETURN(MATCH_NOMATCH); - } - - /* Save the earliest consulted character, then skip to next op code */ - - if (eptr < md->start_used_ptr) md->start_used_ptr = eptr; - ecode += 1 + LINK_SIZE; - break; - - /* The callout item calls an external function, if one is provided, passing - details of the match so far. This is mainly for debugging, though the - function is able to force a failure. */ - - case OP_CALLOUT: - if (PUBL(callout) != NULL) - { - PUBL(callout_block) cb; - cb.version = 2; /* Version 1 of the callout block */ - cb.callout_number = ecode[1]; - cb.offset_vector = md->offset_vector; -#if defined COMPILE_PCRE8 - cb.subject = (PCRE_SPTR)md->start_subject; -#elif defined COMPILE_PCRE16 - cb.subject = (PCRE_SPTR16)md->start_subject; -#elif defined COMPILE_PCRE32 - cb.subject = (PCRE_SPTR32)md->start_subject; -#endif - cb.subject_length = (int)(md->end_subject - md->start_subject); - cb.start_match = (int)(mstart - md->start_subject); - cb.current_position = (int)(eptr - md->start_subject); - cb.pattern_position = GET(ecode, 2); - cb.next_item_length = GET(ecode, 2 + LINK_SIZE); - cb.capture_top = offset_top/2; - cb.capture_last = md->capture_last & CAPLMASK; - /* Internal change requires this for API compatibility. */ - if (cb.capture_last == 0) cb.capture_last = -1; - cb.callout_data = md->callout_data; - cb.mark = md->nomatch_mark; - if ((rrc = (*PUBL(callout))(&cb)) > 0) RRETURN(MATCH_NOMATCH); - if (rrc < 0) RRETURN(rrc); - } - ecode += 2 + 2*LINK_SIZE; - break; - - /* Recursion either matches the current regex, or some subexpression. 
The - offset data is the offset to the starting bracket from the start of the - whole pattern. (This is so that it works from duplicated subpatterns.) - - The state of the capturing groups is preserved over recursion, and - re-instated afterwards. We don't know how many are started and not yet - finished (offset_top records the completed total) so we just have to save - all the potential data. There may be up to 65535 such values, which is too - large to put on the stack, but using malloc for small numbers seems - expensive. As a compromise, the stack is used when there are no more than - REC_STACK_SAVE_MAX values to store; otherwise malloc is used. - - There are also other values that have to be saved. We use a chained - sequence of blocks that actually live on the stack. Thanks to Robin Houston - for the original version of this logic. It has, however, been hacked around - a lot, so he is not to blame for the current way it works. */ - - case OP_RECURSE: - { - recursion_info *ri; - unsigned int recno; - - callpat = md->start_code + GET(ecode, 1); - recno = (callpat == md->start_code)? 0 : - GET2(callpat, 1 + LINK_SIZE); - - /* Check for repeating a recursion without advancing the subject pointer. - This should catch convoluted mutual recursions. (Some simple cases are - caught at compile time.) 
*/ - - for (ri = md->recursive; ri != NULL; ri = ri->prevrec) - if (recno == ri->group_num && eptr == ri->subject_position) - RRETURN(PCRE_ERROR_RECURSELOOP); - - /* Add to "recursing stack" */ - - new_recursive.group_num = recno; - new_recursive.saved_capture_last = md->capture_last; - new_recursive.subject_position = eptr; - new_recursive.prevrec = md->recursive; - md->recursive = &new_recursive; - - /* Where to continue from afterwards */ - - ecode += 1 + LINK_SIZE; - - /* Now save the offset data */ - - new_recursive.saved_max = md->offset_end; - if (new_recursive.saved_max <= REC_STACK_SAVE_MAX) - new_recursive.offset_save = stacksave; - else - { - new_recursive.offset_save = - (int *)(PUBL(malloc))(new_recursive.saved_max * sizeof(int)); - if (new_recursive.offset_save == NULL) RRETURN(PCRE_ERROR_NOMEMORY); - } - memcpy(new_recursive.offset_save, md->offset_vector, - new_recursive.saved_max * sizeof(int)); - - /* OK, now we can do the recursion. After processing each alternative, - restore the offset data and the last captured value. If there were nested - recursions, md->recursive might be changed, so reset it before looping. - */ - - DPRINTF(("Recursing into group %d\n", new_recursive.group_num)); - cbegroup = (*callpat >= OP_SBRA); - do - { - if (cbegroup) md->match_function_type = MATCH_CBEGROUP; - RMATCH(eptr, callpat + PRIV(OP_lengths)[*callpat], offset_top, - md, eptrb, RM6); - memcpy(md->offset_vector, new_recursive.offset_save, - new_recursive.saved_max * sizeof(int)); - md->capture_last = new_recursive.saved_capture_last; - md->recursive = new_recursive.prevrec; - if (rrc == MATCH_MATCH || rrc == MATCH_ACCEPT) - { - DPRINTF(("Recursion matched\n")); - if (new_recursive.offset_save != stacksave) - (PUBL(free))(new_recursive.offset_save); - - /* Set where we got to in the subject, and reset the start in case - it was changed by \K. This *is* propagated back out of a recursion, - for Perl compatibility. 
*/ - - eptr = md->end_match_ptr; - mstart = md->start_match_ptr; - goto RECURSION_MATCHED; /* Exit loop; end processing */ - } - - /* PCRE does not allow THEN, SKIP, PRUNE or COMMIT to escape beyond a - recursion; they cause a NOMATCH for the entire recursion. These codes - are defined in a range that can be tested for. */ - - if (rrc >= MATCH_BACKTRACK_MIN && rrc <= MATCH_BACKTRACK_MAX) - { - if (new_recursive.offset_save != stacksave) - (PUBL(free))(new_recursive.offset_save); - RRETURN(MATCH_NOMATCH); - } - - /* Any return code other than NOMATCH is an error. */ - - if (rrc != MATCH_NOMATCH) - { - DPRINTF(("Recursion gave error %d\n", rrc)); - if (new_recursive.offset_save != stacksave) - (PUBL(free))(new_recursive.offset_save); - RRETURN(rrc); - } - - md->recursive = &new_recursive; - callpat += GET(callpat, 1); - } - while (*callpat == OP_ALT); - - DPRINTF(("Recursion didn't match\n")); - md->recursive = new_recursive.prevrec; - if (new_recursive.offset_save != stacksave) - (PUBL(free))(new_recursive.offset_save); - RRETURN(MATCH_NOMATCH); - } - - RECURSION_MATCHED: - break; - - /* An alternation is the end of a branch; scan along to find the end of the - bracketed group and go to there. */ - - case OP_ALT: - do ecode += GET(ecode,1); while (*ecode == OP_ALT); - break; - - /* BRAZERO, BRAMINZERO and SKIPZERO occur just before a bracket group, - indicating that it may occur zero times. It may repeat infinitely, or not - at all - i.e. it could be ()* or ()? or even (){0} in the pattern. Brackets - with fixed upper repeat limits are compiled as a number of copies, with the - optional ones preceded by BRAZERO or BRAMINZERO. 
*/ - - case OP_BRAZERO: - next = ecode + 1; - RMATCH(eptr, next, offset_top, md, eptrb, RM10); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - do next += GET(next, 1); while (*next == OP_ALT); - ecode = next + 1 + LINK_SIZE; - break; - - case OP_BRAMINZERO: - next = ecode + 1; - do next += GET(next, 1); while (*next == OP_ALT); - RMATCH(eptr, next + 1+LINK_SIZE, offset_top, md, eptrb, RM11); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - ecode++; - break; - - case OP_SKIPZERO: - next = ecode+1; - do next += GET(next,1); while (*next == OP_ALT); - ecode = next + 1 + LINK_SIZE; - break; - - /* BRAPOSZERO occurs before a possessive bracket group. Don't do anything - here; just jump to the group, with allow_zero set TRUE. */ - - case OP_BRAPOSZERO: - op = *(++ecode); - allow_zero = TRUE; - if (op == OP_CBRAPOS || op == OP_SCBRAPOS) goto POSSESSIVE_CAPTURE; - goto POSSESSIVE_NON_CAPTURE; - - /* End of a group, repeated or non-repeating. */ - - case OP_KET: - case OP_KETRMIN: - case OP_KETRMAX: - case OP_KETRPOS: - prev = ecode - GET(ecode, 1); - - /* If this was a group that remembered the subject start, in order to break - infinite repeats of empty string matches, retrieve the subject start from - the chain. Otherwise, set it NULL. */ - - if (*prev >= OP_SBRA || *prev == OP_ONCE) - { - saved_eptr = eptrb->epb_saved_eptr; /* Value at start of group */ - eptrb = eptrb->epb_prev; /* Backup to previous group */ - } - else saved_eptr = NULL; - - /* If we are at the end of an assertion group or a non-capturing atomic - group, stop matching and return MATCH_MATCH, but record the current high - water mark for use by positive assertions. We also need to record the match - start in case it was changed by \K. 
*/ - - if ((*prev >= OP_ASSERT && *prev <= OP_ASSERTBACK_NOT) || - *prev == OP_ONCE_NC) - { - md->end_match_ptr = eptr; /* For ONCE_NC */ - md->end_offset_top = offset_top; - md->start_match_ptr = mstart; - RRETURN(MATCH_MATCH); /* Sets md->mark */ - } - - /* For capturing groups we have to check the group number back at the start - and if necessary complete handling an extraction by setting the offsets and - bumping the high water mark. Whole-pattern recursion is coded as a recurse - into group 0, so it won't be picked up here. Instead, we catch it when the - OP_END is reached. Other recursion is handled here. We just have to record - the current subject position and start match pointer and give a MATCH - return. */ - - if (*prev == OP_CBRA || *prev == OP_SCBRA || - *prev == OP_CBRAPOS || *prev == OP_SCBRAPOS) - { - number = GET2(prev, 1+LINK_SIZE); - offset = number << 1; - -#ifdef PCRE_DEBUG - printf("end bracket %d", number); - printf("\n"); -#endif - - /* Handle a recursively called group. */ - - if (md->recursive != NULL && md->recursive->group_num == number) - { - md->end_match_ptr = eptr; - md->start_match_ptr = mstart; - RRETURN(MATCH_MATCH); - } - - /* Deal with capturing */ - - md->capture_last = (md->capture_last & OVFLMASK) | number; - if (offset >= md->offset_max) md->capture_last |= OVFLBIT; else - { - /* If offset is greater than offset_top, it means that we are - "skipping" a capturing group, and that group's offsets must be marked - unset. In earlier versions of PCRE, all the offsets were unset at the - start of matching, but this doesn't work because atomic groups and - assertions can cause a value to be set that should later be unset. - Example: matching /(?>(a))b|(a)c/ against "ac". This sets group 1 as - part of the atomic group, but this is not on the final matching path, - so must be unset when 2 is set. (If there is no group 2, there is no - problem, because offset_top will then be 2, indicating no capture.) 
*/ - - if (offset > offset_top) - { - register int *iptr = md->offset_vector + offset_top; - register int *iend = md->offset_vector + offset; - while (iptr < iend) *iptr++ = -1; - } - - /* Now make the extraction */ - - md->offset_vector[offset] = - md->offset_vector[md->offset_end - number]; - md->offset_vector[offset+1] = (int)(eptr - md->start_subject); - if (offset_top <= offset) offset_top = offset + 2; - } - } - - /* OP_KETRPOS is a possessive repeating ket. Remember the current position, - and return the MATCH_KETRPOS. This makes it possible to do the repeats one - at a time from the outer level, thus saving stack. This must precede the - empty string test - in this case that test is done at the outer level. */ - - if (*ecode == OP_KETRPOS) - { - md->start_match_ptr = mstart; /* In case \K reset it */ - md->end_match_ptr = eptr; - md->end_offset_top = offset_top; - RRETURN(MATCH_KETRPOS); - } - - /* For an ordinary non-repeating ket, just continue at this level. This - also happens for a repeating ket if no characters were matched in the - group. This is the forcible breaking of infinite loops as implemented in - Perl 5.005. For a non-repeating atomic group that includes captures, - establish a backup point by processing the rest of the pattern at a lower - level. If this results in a NOMATCH return, pass MATCH_ONCE back to the - original OP_ONCE level, thereby bypassing intermediate backup points, but - resetting any captures that happened along the way. */ - - if (*ecode == OP_KET || eptr == saved_eptr) - { - if (*prev == OP_ONCE) - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, eptrb, RM12); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->once_target = prev; /* Level at which to change to MATCH_NOMATCH */ - RRETURN(MATCH_ONCE); - } - ecode += 1 + LINK_SIZE; /* Carry on at this level */ - break; - } - - /* The normal repeating kets try the rest of the pattern or restart from - the preceding bracket, in the appropriate order. 
In the second case, we can - use tail recursion to avoid using another stack frame, unless we have an - an atomic group or an unlimited repeat of a group that can match an empty - string. */ - - if (*ecode == OP_KETRMIN) - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, eptrb, RM7); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (*prev == OP_ONCE) - { - RMATCH(eptr, prev, offset_top, md, eptrb, RM8); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->once_target = prev; /* Level at which to change to MATCH_NOMATCH */ - RRETURN(MATCH_ONCE); - } - if (*prev >= OP_SBRA) /* Could match an empty string */ - { - RMATCH(eptr, prev, offset_top, md, eptrb, RM50); - RRETURN(rrc); - } - ecode = prev; - goto TAIL_RECURSE; - } - else /* OP_KETRMAX */ - { - RMATCH(eptr, prev, offset_top, md, eptrb, RM13); - if (rrc == MATCH_ONCE && md->once_target == prev) rrc = MATCH_NOMATCH; - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (*prev == OP_ONCE) - { - RMATCH(eptr, ecode + 1 + LINK_SIZE, offset_top, md, eptrb, RM9); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - md->once_target = prev; - RRETURN(MATCH_ONCE); - } - ecode += 1 + LINK_SIZE; - goto TAIL_RECURSE; - } - /* Control never gets here */ - - /* Not multiline mode: start of subject assertion, unless notbol. */ - - case OP_CIRC: - if (md->notbol && eptr == md->start_subject) RRETURN(MATCH_NOMATCH); - - /* Start of subject assertion */ - - case OP_SOD: - if (eptr != md->start_subject) RRETURN(MATCH_NOMATCH); - ecode++; - break; - - /* Multiline mode: start of subject unless notbol, or after any newline. 
*/ - - case OP_CIRCM: - if (md->notbol && eptr == md->start_subject) RRETURN(MATCH_NOMATCH); - if (eptr != md->start_subject && - (eptr == md->end_subject || !WAS_NEWLINE(eptr))) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - /* Start of match assertion */ - - case OP_SOM: - if (eptr != md->start_subject + md->start_offset) RRETURN(MATCH_NOMATCH); - ecode++; - break; - - /* Reset the start of match point */ - - case OP_SET_SOM: - mstart = eptr; - ecode++; - break; - - /* Multiline mode: assert before any newline, or before end of subject - unless noteol is set. */ - - case OP_DOLLM: - if (eptr < md->end_subject) - { - if (!IS_NEWLINE(eptr)) - { - if (md->partial != 0 && - eptr + 1 >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - UCHAR21TEST(eptr) == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - RRETURN(MATCH_NOMATCH); - } - } - else - { - if (md->noteol) RRETURN(MATCH_NOMATCH); - SCHECK_PARTIAL(); - } - ecode++; - break; - - /* Not multiline mode: assert before a terminating newline or before end of - subject unless noteol is set. */ - - case OP_DOLL: - if (md->noteol) RRETURN(MATCH_NOMATCH); - if (!md->endonly) goto ASSERT_NL_OR_EOS; - - /* ... else fall through for endonly */ - - /* End of subject assertion (\z) */ - - case OP_EOD: - if (eptr < md->end_subject) RRETURN(MATCH_NOMATCH); - SCHECK_PARTIAL(); - ecode++; - break; - - /* End of subject or ending \n assertion (\Z) */ - - case OP_EODN: - ASSERT_NL_OR_EOS: - if (eptr < md->end_subject && - (!IS_NEWLINE(eptr) || eptr != md->end_subject - md->nllen)) - { - if (md->partial != 0 && - eptr + 1 >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - UCHAR21TEST(eptr) == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - RRETURN(MATCH_NOMATCH); - } - - /* Either at end of string or \n before end. 
*/ - - SCHECK_PARTIAL(); - ecode++; - break; - - /* Word boundary assertions */ - - case OP_NOT_WORD_BOUNDARY: - case OP_WORD_BOUNDARY: - { - - /* Find out if the previous and current characters are "word" characters. - It takes a bit more work in UTF-8 mode. Characters > 255 are assumed to - be "non-word" characters. Remember the earliest consulted character for - partial matching. */ - -#ifdef SUPPORT_UTF - if (utf) - { - /* Get status of previous character */ - - if (eptr == md->start_subject) prev_is_word = FALSE; else - { - PCRE_PUCHAR lastptr = eptr - 1; - BACKCHAR(lastptr); - if (lastptr < md->start_used_ptr) md->start_used_ptr = lastptr; - GETCHAR(c, lastptr); -#ifdef SUPPORT_UCP - if (md->use_ucp) - { - if (c == '_') prev_is_word = TRUE; else - { - int cat = UCD_CATEGORY(c); - prev_is_word = (cat == ucp_L || cat == ucp_N); - } - } - else -#endif - prev_is_word = c < 256 && (md->ctypes[c] & ctype_word) != 0; - } - - /* Get status of next character */ - - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - cur_is_word = FALSE; - } - else - { - GETCHAR(c, eptr); -#ifdef SUPPORT_UCP - if (md->use_ucp) - { - if (c == '_') cur_is_word = TRUE; else - { - int cat = UCD_CATEGORY(c); - cur_is_word = (cat == ucp_L || cat == ucp_N); - } - } - else -#endif - cur_is_word = c < 256 && (md->ctypes[c] & ctype_word) != 0; - } - } - else -#endif - - /* Not in UTF-8 mode, but we may still have PCRE_UCP set, and for - consistency with the behaviour of \w we do use it in this case. 
*/ - - { - /* Get status of previous character */ - - if (eptr == md->start_subject) prev_is_word = FALSE; else - { - if (eptr <= md->start_used_ptr) md->start_used_ptr = eptr - 1; -#ifdef SUPPORT_UCP - if (md->use_ucp) - { - c = eptr[-1]; - if (c == '_') prev_is_word = TRUE; else - { - int cat = UCD_CATEGORY(c); - prev_is_word = (cat == ucp_L || cat == ucp_N); - } - } - else -#endif - prev_is_word = MAX_255(eptr[-1]) - && ((md->ctypes[eptr[-1]] & ctype_word) != 0); - } - - /* Get status of next character */ - - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - cur_is_word = FALSE; - } - else -#ifdef SUPPORT_UCP - if (md->use_ucp) - { - c = *eptr; - if (c == '_') cur_is_word = TRUE; else - { - int cat = UCD_CATEGORY(c); - cur_is_word = (cat == ucp_L || cat == ucp_N); - } - } - else -#endif - cur_is_word = MAX_255(*eptr) - && ((md->ctypes[*eptr] & ctype_word) != 0); - } - - /* Now see if the situation is what we want */ - - if ((*ecode++ == OP_WORD_BOUNDARY)? - cur_is_word == prev_is_word : cur_is_word != prev_is_word) - RRETURN(MATCH_NOMATCH); - } - break; - - /* Match any single character type except newline; have to take care with - CRLF newlines and partial matching. */ - - case OP_ANY: - if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH); - if (md->partial != 0 && - eptr == md->end_subject - 1 && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - UCHAR21TEST(eptr) == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - - /* Fall through */ - - /* Match any single character whatsoever. */ - - case OP_ALLANY: - if (eptr >= md->end_subject) /* DO NOT merge the eptr++ here; it must */ - { /* not be updated before SCHECK_PARTIAL. */ - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr++; -#ifdef SUPPORT_UTF - if (utf) ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); -#endif - ecode++; - break; - - /* Match a single byte, even in UTF-8 mode. 
This opcode really does match - any byte, even newline, independent of the setting of PCRE_DOTALL. */ - - case OP_ANYBYTE: - if (eptr >= md->end_subject) /* DO NOT merge the eptr++ here; it must */ - { /* not be updated before SCHECK_PARTIAL. */ - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr++; - ecode++; - break; - - case OP_NOT_DIGIT: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ( -#if defined SUPPORT_UTF || !(defined COMPILE_PCRE8) - c < 256 && -#endif - (md->ctypes[c] & ctype_digit) != 0 - ) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - case OP_DIGIT: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ( -#if defined SUPPORT_UTF || !(defined COMPILE_PCRE8) - c > 255 || -#endif - (md->ctypes[c] & ctype_digit) == 0 - ) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - case OP_NOT_WHITESPACE: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ( -#if defined SUPPORT_UTF || !(defined COMPILE_PCRE8) - c < 256 && -#endif - (md->ctypes[c] & ctype_space) != 0 - ) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - case OP_WHITESPACE: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ( -#if defined SUPPORT_UTF || !(defined COMPILE_PCRE8) - c > 255 || -#endif - (md->ctypes[c] & ctype_space) == 0 - ) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - case OP_NOT_WORDCHAR: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ( -#if defined SUPPORT_UTF || !(defined COMPILE_PCRE8) - c < 256 && -#endif - (md->ctypes[c] & ctype_word) != 0 - ) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - case OP_WORDCHAR: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ( -#if defined 
SUPPORT_UTF || !(defined COMPILE_PCRE8) - c > 255 || -#endif - (md->ctypes[c] & ctype_word) == 0 - ) - RRETURN(MATCH_NOMATCH); - ecode++; - break; - - case OP_ANYNL: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - default: RRETURN(MATCH_NOMATCH); - - case CHAR_CR: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - } - else if (UCHAR21TEST(eptr) == CHAR_LF) eptr++; - break; - - case CHAR_LF: - break; - - case CHAR_VT: - case CHAR_FF: - case CHAR_NEL: -#ifndef EBCDIC - case 0x2028: - case 0x2029: -#endif /* Not EBCDIC */ - if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH); - break; - } - ecode++; - break; - - case OP_NOT_HSPACE: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - HSPACE_CASES: RRETURN(MATCH_NOMATCH); /* Byte and multibyte cases */ - default: break; - } - ecode++; - break; - - case OP_HSPACE: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - HSPACE_CASES: break; /* Byte and multibyte cases */ - default: RRETURN(MATCH_NOMATCH); - } - ecode++; - break; - - case OP_NOT_VSPACE: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - VSPACE_CASES: RRETURN(MATCH_NOMATCH); - default: break; - } - ecode++; - break; - - case OP_VSPACE: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - VSPACE_CASES: break; - default: RRETURN(MATCH_NOMATCH); - } - ecode++; - break; - -#ifdef SUPPORT_UCP - /* Check the next character by Unicode property. We will get here only - if the support is in the binary; otherwise a compile-time error occurs. 
*/ - - case OP_PROP: - case OP_NOTPROP: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - { - const pcre_uint32 *cp; - const ucd_record *prop = GET_UCD(c); - - switch(ecode[1]) - { - case PT_ANY: - if (op == OP_NOTPROP) RRETURN(MATCH_NOMATCH); - break; - - case PT_LAMP: - if ((prop->chartype == ucp_Lu || - prop->chartype == ucp_Ll || - prop->chartype == ucp_Lt) == (op == OP_NOTPROP)) - RRETURN(MATCH_NOMATCH); - break; - - case PT_GC: - if ((ecode[2] != PRIV(ucp_gentype)[prop->chartype]) == (op == OP_PROP)) - RRETURN(MATCH_NOMATCH); - break; - - case PT_PC: - if ((ecode[2] != prop->chartype) == (op == OP_PROP)) - RRETURN(MATCH_NOMATCH); - break; - - case PT_SC: - if ((ecode[2] != prop->script) == (op == OP_PROP)) - RRETURN(MATCH_NOMATCH); - break; - - /* These are specials */ - - case PT_ALNUM: - if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L || - PRIV(ucp_gentype)[prop->chartype] == ucp_N) == (op == OP_NOTPROP)) - RRETURN(MATCH_NOMATCH); - break; - - /* Perl space used to exclude VT, but from Perl 5.18 it is included, - which means that Perl space and POSIX space are now identical. PCRE - was changed at release 8.34. 
*/ - - case PT_SPACE: /* Perl space */ - case PT_PXSPACE: /* POSIX space */ - switch(c) - { - HSPACE_CASES: - VSPACE_CASES: - if (op == OP_NOTPROP) RRETURN(MATCH_NOMATCH); - break; - - default: - if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z) == - (op == OP_NOTPROP)) RRETURN(MATCH_NOMATCH); - break; - } - break; - - case PT_WORD: - if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L || - PRIV(ucp_gentype)[prop->chartype] == ucp_N || - c == CHAR_UNDERSCORE) == (op == OP_NOTPROP)) - RRETURN(MATCH_NOMATCH); - break; - - case PT_CLIST: - cp = PRIV(ucd_caseless_sets) + ecode[2]; - for (;;) - { - if (c < *cp) - { if (op == OP_PROP) { RRETURN(MATCH_NOMATCH); } else break; } - if (c == *cp++) - { if (op == OP_PROP) break; else { RRETURN(MATCH_NOMATCH); } } - } - break; - - case PT_UCNC: - if ((c == CHAR_DOLLAR_SIGN || c == CHAR_COMMERCIAL_AT || - c == CHAR_GRAVE_ACCENT || (c >= 0xa0 && c <= 0xd7ff) || - c >= 0xe000) == (op == OP_NOTPROP)) - RRETURN(MATCH_NOMATCH); - break; - - /* This should never occur */ - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - - ecode += 3; - } - break; - - /* Match an extended Unicode sequence. We will get here only if the support - is in the binary; otherwise a compile-time error occurs. */ - - case OP_EXTUNI: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - else - { - int lgb, rgb; - GETCHARINCTEST(c, eptr); - lgb = UCD_GRAPHBREAK(c); - while (eptr < md->end_subject) - { - int len = 1; - if (!utf) c = *eptr; else { GETCHARLEN(c, eptr, len); } - rgb = UCD_GRAPHBREAK(c); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - lgb = rgb; - eptr += len; - } - } - CHECK_PARTIAL(); - ecode++; - break; -#endif /* SUPPORT_UCP */ - - - /* Match a back reference, possibly repeatedly. Look past the end of the - item to see if there is repeat information following. The code is similar - to that for character classes, but repeated for efficiency. 
Then obey - similar code to character type repeats - written out again for speed. - However, if the referenced string is the empty string, always treat - it as matched, any number of times (otherwise there could be infinite - loops). If the reference is unset, there are two possibilities: - - (a) In the default, Perl-compatible state, set the length negative; - this ensures that every attempt at a match fails. We can't just fail - here, because of the possibility of quantifiers with zero minima. - - (b) If the JavaScript compatibility flag is set, set the length to zero - so that the back reference matches an empty string. - - Otherwise, set the length to the length of what was matched by the - referenced subpattern. - - The OP_REF and OP_REFI opcodes are used for a reference to a numbered group - or to a non-duplicated named group. For a duplicated named group, OP_DNREF - and OP_DNREFI are used. In this case we must scan the list of groups to - which the name refers, and use the first one that is set. */ - - case OP_DNREF: - case OP_DNREFI: - caseless = op == OP_DNREFI; - { - int count = GET2(ecode, 1+IMM2_SIZE); - pcre_uchar *slot = md->name_table + GET2(ecode, 1) * md->name_entry_size; - ecode += 1 + 2*IMM2_SIZE; - - /* Setting the default length first and initializing 'offset' avoids - compiler warnings in the REF_REPEAT code. */ - - length = (md->jscript_compat)? 0 : -1; - offset = 0; - - while (count-- > 0) - { - offset = GET2(slot, 0) << 1; - if (offset < offset_top && md->offset_vector[offset] >= 0) - { - length = md->offset_vector[offset+1] - md->offset_vector[offset]; - break; - } - slot += md->name_entry_size; - } - } - goto REF_REPEAT; - - case OP_REF: - case OP_REFI: - caseless = op == OP_REFI; - offset = GET2(ecode, 1) << 1; /* Doubled ref number */ - ecode += 1 + IMM2_SIZE; - if (offset >= offset_top || md->offset_vector[offset] < 0) - length = (md->jscript_compat)? 
0 : -1; - else - length = md->offset_vector[offset+1] - md->offset_vector[offset]; - - /* Set up for repetition, or handle the non-repeated case */ - - REF_REPEAT: - switch (*ecode) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRQUERY: - case OP_CRMINQUERY: - c = *ecode++ - OP_CRSTAR; - minimize = (c & 1) != 0; - min = rep_min[c]; /* Pick up values from tables; */ - max = rep_max[c]; /* zero for max => infinity */ - if (max == 0) max = INT_MAX; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - minimize = (*ecode == OP_CRMINRANGE); - min = GET2(ecode, 1); - max = GET2(ecode, 1 + IMM2_SIZE); - if (max == 0) max = INT_MAX; - ecode += 1 + 2 * IMM2_SIZE; - break; - - default: /* No repeat follows */ - if ((length = match_ref(offset, eptr, length, md, caseless)) < 0) - { - if (length == -2) eptr = md->end_subject; /* Partial match */ - CHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr += length; - continue; /* With the main loop */ - } - - /* Handle repeated back references. If the length of the reference is - zero, just continue with the main loop. If the length is negative, it - means the reference is unset in non-Java-compatible mode. If the minimum is - zero, we can continue at the same level without recursion. For any other - minimum, carrying on will result in NOMATCH. */ - - if (length == 0) continue; - if (length < 0 && min == 0) continue; - - /* First, ensure the minimum number of matches are present. We get back - the length of the reference string explicitly rather than passing the - address of eptr, so that eptr can be a register variable. */ - - for (i = 1; i <= min; i++) - { - int slength; - if ((slength = match_ref(offset, eptr, length, md, caseless)) < 0) - { - if (slength == -2) eptr = md->end_subject; /* Partial match */ - CHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr += slength; - } - - /* If min = max, continue at the same level without recursion. - They are not both allowed to be zero. 
*/ - - if (min == max) continue; - - /* If minimizing, keep trying and advancing the pointer */ - - if (minimize) - { - for (fi = min;; fi++) - { - int slength; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM14); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if ((slength = match_ref(offset, eptr, length, md, caseless)) < 0) - { - if (slength == -2) eptr = md->end_subject; /* Partial match */ - CHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr += slength; - } - /* Control never gets here */ - } - - /* If maximizing, find the longest string and work backwards */ - - else - { - pp = eptr; - for (i = min; i < max; i++) - { - int slength; - if ((slength = match_ref(offset, eptr, length, md, caseless)) < 0) - { - /* Can't use CHECK_PARTIAL because we don't want to update eptr in - the soft partial matching case. */ - - if (slength == -2 && md->partial != 0 && - md->end_subject > md->start_used_ptr) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - break; - } - eptr += slength; - } - - while (eptr >= pp) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM15); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr -= length; - } - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - /* Match a bit-mapped character class, possibly repeatedly. This op code is - used when all the characters in the class have values in the range 0-255, - and either the matching is caseful, or the characters are in the range - 0-127 when UTF-8 processing is enabled. The only difference between - OP_CLASS and OP_NCLASS occurs when a data character outside the range is - encountered. - - First, look past the end of the item to see if there is repeat information - following. Then obey similar code to character type repeats - written out - again for speed. */ - - case OP_NCLASS: - case OP_CLASS: - { - /* The data variable is saved across frames, so the byte map needs to - be stored there. 
*/ -#define BYTE_MAP ((pcre_uint8 *)data) - data = ecode + 1; /* Save for matching */ - ecode += 1 + (32 / sizeof(pcre_uchar)); /* Advance past the item */ - - switch (*ecode) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSPLUS: - case OP_CRPOSQUERY: - c = *ecode++ - OP_CRSTAR; - if (c < OP_CRPOSSTAR - OP_CRSTAR) minimize = (c & 1) != 0; - else possessive = TRUE; - min = rep_min[c]; /* Pick up values from tables; */ - max = rep_max[c]; /* zero for max => infinity */ - if (max == 0) max = INT_MAX; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - minimize = (*ecode == OP_CRMINRANGE); - possessive = (*ecode == OP_CRPOSRANGE); - min = GET2(ecode, 1); - max = GET2(ecode, 1 + IMM2_SIZE); - if (max == 0) max = INT_MAX; - ecode += 1 + 2 * IMM2_SIZE; - break; - - default: /* No repeat follows */ - min = max = 1; - break; - } - - /* First, ensure the minimum number of matches are present. */ - -#ifdef SUPPORT_UTF - if (utf) - { - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - if (c > 255) - { - if (op == OP_CLASS) RRETURN(MATCH_NOMATCH); - } - else - if ((BYTE_MAP[c/8] & (1 << (c&7))) == 0) RRETURN(MATCH_NOMATCH); - } - } - else -#endif - /* Not UTF mode */ - { - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - c = *eptr++; -#ifndef COMPILE_PCRE8 - if (c > 255) - { - if (op == OP_CLASS) RRETURN(MATCH_NOMATCH); - } - else -#endif - if ((BYTE_MAP[c/8] & (1 << (c&7))) == 0) RRETURN(MATCH_NOMATCH); - } - } - - /* If max == min we can continue with the main loop without the - need to recurse. */ - - if (min == max) continue; - - /* If minimizing, keep testing the rest of the expression and advancing - the pointer while it matches the class. 
*/ - - if (minimize) - { -#ifdef SUPPORT_UTF - if (utf) - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM16); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - if (c > 255) - { - if (op == OP_CLASS) RRETURN(MATCH_NOMATCH); - } - else - if ((BYTE_MAP[c/8] & (1 << (c&7))) == 0) RRETURN(MATCH_NOMATCH); - } - } - else -#endif - /* Not UTF mode */ - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM17); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - c = *eptr++; -#ifndef COMPILE_PCRE8 - if (c > 255) - { - if (op == OP_CLASS) RRETURN(MATCH_NOMATCH); - } - else -#endif - if ((BYTE_MAP[c/8] & (1 << (c&7))) == 0) RRETURN(MATCH_NOMATCH); - } - } - /* Control never gets here */ - } - - /* If maximizing, find the longest possible run, then work backwards. 
*/ - - else - { - pp = eptr; - -#ifdef SUPPORT_UTF - if (utf) - { - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c > 255) - { - if (op == OP_CLASS) break; - } - else - if ((BYTE_MAP[c/8] & (1 << (c&7))) == 0) break; - eptr += len; - } - - if (possessive) continue; /* No backtracking */ - - for (;;) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM18); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (eptr-- <= pp) break; /* Stop if tried at original pos */ - BACKCHAR(eptr); - } - } - else -#endif - /* Not UTF mode */ - { - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - c = *eptr; -#ifndef COMPILE_PCRE8 - if (c > 255) - { - if (op == OP_CLASS) break; - } - else -#endif - if ((BYTE_MAP[c/8] & (1 << (c&7))) == 0) break; - eptr++; - } - - if (possessive) continue; /* No backtracking */ - - while (eptr >= pp) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM19); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - } - } - - RRETURN(MATCH_NOMATCH); - } -#undef BYTE_MAP - } - /* Control never gets here */ - - - /* Match an extended character class. In the 8-bit library, this opcode is - encountered only when UTF-8 mode mode is supported. In the 16-bit and - 32-bit libraries, codepoints greater than 255 may be encountered even when - UTF is not supported. 
*/ - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - { - data = ecode + 1 + LINK_SIZE; /* Save for matching */ - ecode += GET(ecode, 1); /* Advance past the item */ - - switch (*ecode) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSPLUS: - case OP_CRPOSQUERY: - c = *ecode++ - OP_CRSTAR; - if (c < OP_CRPOSSTAR - OP_CRSTAR) minimize = (c & 1) != 0; - else possessive = TRUE; - min = rep_min[c]; /* Pick up values from tables; */ - max = rep_max[c]; /* zero for max => infinity */ - if (max == 0) max = INT_MAX; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - minimize = (*ecode == OP_CRMINRANGE); - possessive = (*ecode == OP_CRPOSRANGE); - min = GET2(ecode, 1); - max = GET2(ecode, 1 + IMM2_SIZE); - if (max == 0) max = INT_MAX; - ecode += 1 + 2 * IMM2_SIZE; - break; - - default: /* No repeat follows */ - min = max = 1; - break; - } - - /* First, ensure the minimum number of matches are present. */ - - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if (!PRIV(xclass)(c, data, utf)) RRETURN(MATCH_NOMATCH); - } - - /* If max == min we can continue with the main loop without the - need to recurse. */ - - if (min == max) continue; - - /* If minimizing, keep testing the rest of the expression and advancing - the pointer while it matches the class. */ - - if (minimize) - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM20); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if (!PRIV(xclass)(c, data, utf)) RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - } - - /* If maximizing, find the longest possible run, then work backwards. 
*/ - - else - { - pp = eptr; - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } -#ifdef SUPPORT_UTF - GETCHARLENTEST(c, eptr, len); -#else - c = *eptr; -#endif - if (!PRIV(xclass)(c, data, utf)) break; - eptr += len; - } - - if (possessive) continue; /* No backtracking */ - - for(;;) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM21); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (eptr-- <= pp) break; /* Stop if tried at original pos */ -#ifdef SUPPORT_UTF - if (utf) BACKCHAR(eptr); -#endif - } - RRETURN(MATCH_NOMATCH); - } - - /* Control never gets here */ - } -#endif /* End of XCLASS */ - - /* Match a single character, casefully */ - - case OP_CHAR: -#ifdef SUPPORT_UTF - if (utf) - { - length = 1; - ecode++; - GETCHARLEN(fc, ecode, length); - if (length > md->end_subject - eptr) - { - CHECK_PARTIAL(); /* Not SCHECK_PARTIAL() */ - RRETURN(MATCH_NOMATCH); - } - while (length-- > 0) if (*ecode++ != UCHAR21INC(eptr)) RRETURN(MATCH_NOMATCH); - } - else -#endif - /* Not UTF mode */ - { - if (md->end_subject - eptr < 1) - { - SCHECK_PARTIAL(); /* This one can use SCHECK_PARTIAL() */ - RRETURN(MATCH_NOMATCH); - } - if (ecode[1] != *eptr++) RRETURN(MATCH_NOMATCH); - ecode += 2; - } - break; - - /* Match a single character, caselessly. If we are at the end of the - subject, give up immediately. */ - - case OP_CHARI: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - -#ifdef SUPPORT_UTF - if (utf) - { - length = 1; - ecode++; - GETCHARLEN(fc, ecode, length); - - /* If the pattern character's value is < 128, we have only one byte, and - we know that its other case must also be one byte long, so we can use the - fast lookup table. We know that there is at least one byte left in the - subject. 
*/ - - if (fc < 128) - { - pcre_uint32 cc = UCHAR21(eptr); - if (md->lcc[fc] != TABLE_GET(cc, md->lcc, cc)) RRETURN(MATCH_NOMATCH); - ecode++; - eptr++; - } - - /* Otherwise we must pick up the subject character. Note that we cannot - use the value of "length" to check for sufficient bytes left, because the - other case of the character may have more or fewer bytes. */ - - else - { - pcre_uint32 dc; - GETCHARINC(dc, eptr); - ecode += length; - - /* If we have Unicode property support, we can use it to test the other - case of the character, if there is one. */ - - if (fc != dc) - { -#ifdef SUPPORT_UCP - if (dc != UCD_OTHERCASE(fc)) -#endif - RRETURN(MATCH_NOMATCH); - } - } - } - else -#endif /* SUPPORT_UTF */ - - /* Not UTF mode */ - { - if (TABLE_GET(ecode[1], md->lcc, ecode[1]) - != TABLE_GET(*eptr, md->lcc, *eptr)) RRETURN(MATCH_NOMATCH); - eptr++; - ecode += 2; - } - break; - - /* Match a single character repeatedly. */ - - case OP_EXACT: - case OP_EXACTI: - min = max = GET2(ecode, 1); - ecode += 1 + IMM2_SIZE; - goto REPEATCHAR; - - case OP_POSUPTO: - case OP_POSUPTOI: - possessive = TRUE; - /* Fall through */ - - case OP_UPTO: - case OP_UPTOI: - case OP_MINUPTO: - case OP_MINUPTOI: - min = 0; - max = GET2(ecode, 1); - minimize = *ecode == OP_MINUPTO || *ecode == OP_MINUPTOI; - ecode += 1 + IMM2_SIZE; - goto REPEATCHAR; - - case OP_POSSTAR: - case OP_POSSTARI: - possessive = TRUE; - min = 0; - max = INT_MAX; - ecode++; - goto REPEATCHAR; - - case OP_POSPLUS: - case OP_POSPLUSI: - possessive = TRUE; - min = 1; - max = INT_MAX; - ecode++; - goto REPEATCHAR; - - case OP_POSQUERY: - case OP_POSQUERYI: - possessive = TRUE; - min = 0; - max = 1; - ecode++; - goto REPEATCHAR; - - case OP_STAR: - case OP_STARI: - case OP_MINSTAR: - case OP_MINSTARI: - case OP_PLUS: - case OP_PLUSI: - case OP_MINPLUS: - case OP_MINPLUSI: - case OP_QUERY: - case OP_QUERYI: - case OP_MINQUERY: - case OP_MINQUERYI: - c = *ecode++ - ((op < OP_STARI)? 
OP_STAR : OP_STARI); - minimize = (c & 1) != 0; - min = rep_min[c]; /* Pick up values from tables; */ - max = rep_max[c]; /* zero for max => infinity */ - if (max == 0) max = INT_MAX; - - /* Common code for all repeated single-character matches. We first check - for the minimum number of characters. If the minimum equals the maximum, we - are done. Otherwise, if minimizing, check the rest of the pattern for a - match; if there isn't one, advance up to the maximum, one character at a - time. - - If maximizing, advance up to the maximum number of matching characters, - until eptr is past the end of the maximum run. If possessive, we are - then done (no backing up). Otherwise, match at this position; anything - other than no match is immediately returned. For nomatch, back up one - character, unless we are matching \R and the last thing matched was - \r\n, in which case, back up two bytes. When we reach the first optional - character position, we can save stack by doing a tail recurse. - - The various UTF/non-UTF and caseful/caseless cases are handled separately, - for speed. */ - - REPEATCHAR: -#ifdef SUPPORT_UTF - if (utf) - { - length = 1; - charptr = ecode; - GETCHARLEN(fc, ecode, length); - ecode += length; - - /* Handle multibyte character matching specially here. There is - support for caseless matching if UCP support is present. 
*/ - - if (length > 1) - { -#ifdef SUPPORT_UCP - pcre_uint32 othercase; - if (op >= OP_STARI && /* Caseless */ - (othercase = UCD_OTHERCASE(fc)) != fc) - oclength = PRIV(ord2utf)(othercase, occhars); - else oclength = 0; -#endif /* SUPPORT_UCP */ - - for (i = 1; i <= min; i++) - { - if (eptr <= md->end_subject - length && - memcmp(eptr, charptr, IN_UCHARS(length)) == 0) eptr += length; -#ifdef SUPPORT_UCP - else if (oclength > 0 && - eptr <= md->end_subject - oclength && - memcmp(eptr, occhars, IN_UCHARS(oclength)) == 0) eptr += oclength; -#endif /* SUPPORT_UCP */ - else - { - CHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - } - - if (min == max) continue; - - if (minimize) - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM22); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr <= md->end_subject - length && - memcmp(eptr, charptr, IN_UCHARS(length)) == 0) eptr += length; -#ifdef SUPPORT_UCP - else if (oclength > 0 && - eptr <= md->end_subject - oclength && - memcmp(eptr, occhars, IN_UCHARS(oclength)) == 0) eptr += oclength; -#endif /* SUPPORT_UCP */ - else - { - CHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - } - /* Control never gets here */ - } - - else /* Maximize */ - { - pp = eptr; - for (i = min; i < max; i++) - { - if (eptr <= md->end_subject - length && - memcmp(eptr, charptr, IN_UCHARS(length)) == 0) eptr += length; -#ifdef SUPPORT_UCP - else if (oclength > 0 && - eptr <= md->end_subject - oclength && - memcmp(eptr, occhars, IN_UCHARS(oclength)) == 0) eptr += oclength; -#endif /* SUPPORT_UCP */ - else - { - CHECK_PARTIAL(); - break; - } - } - - if (possessive) continue; /* No backtracking */ - for(;;) - { - if (eptr <= pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM23); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); -#ifdef SUPPORT_UCP - eptr--; - BACKCHAR(eptr); -#else /* without SUPPORT_UCP */ - eptr -= length; -#endif /* SUPPORT_UCP */ - } - } - /* Control never 
gets here */ - } - - /* If the length of a UTF-8 character is 1, we fall through here, and - obey the code as for non-UTF-8 characters below, though in this case the - value of fc will always be < 128. */ - } - else -#endif /* SUPPORT_UTF */ - /* When not in UTF-8 mode, load a single-byte character. */ - fc = *ecode++; - - /* The value of fc at this point is always one character, though we may - or may not be in UTF mode. The code is duplicated for the caseless and - caseful cases, for speed, since matching characters is likely to be quite - common. First, ensure the minimum number of matches are present. If min = - max, continue at the same level without recursing. Otherwise, if - minimizing, keep trying the rest of the expression and advancing one - matching character if failing, up to the maximum. Alternatively, if - maximizing, find the maximum number of characters and work backwards. */ - - DPRINTF(("matching %c{%d,%d} against subject %.*s\n", fc, min, max, - max, (char *)eptr)); - - if (op >= OP_STARI) /* Caseless */ - { -#ifdef COMPILE_PCRE8 - /* fc must be < 128 if UTF is enabled. 
*/ - foc = md->fcc[fc]; -#else -#ifdef SUPPORT_UTF -#ifdef SUPPORT_UCP - if (utf && fc > 127) - foc = UCD_OTHERCASE(fc); -#else - if (utf && fc > 127) - foc = fc; -#endif /* SUPPORT_UCP */ - else -#endif /* SUPPORT_UTF */ - foc = TABLE_GET(fc, md->fcc, fc); -#endif /* COMPILE_PCRE8 */ - - for (i = 1; i <= min; i++) - { - pcre_uint32 cc; /* Faster than pcre_uchar */ - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21TEST(eptr); - if (fc != cc && foc != cc) RRETURN(MATCH_NOMATCH); - eptr++; - } - if (min == max) continue; - if (minimize) - { - for (fi = min;; fi++) - { - pcre_uint32 cc; /* Faster than pcre_uchar */ - RMATCH(eptr, ecode, offset_top, md, eptrb, RM24); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21TEST(eptr); - if (fc != cc && foc != cc) RRETURN(MATCH_NOMATCH); - eptr++; - } - /* Control never gets here */ - } - else /* Maximize */ - { - pp = eptr; - for (i = min; i < max; i++) - { - pcre_uint32 cc; /* Faster than pcre_uchar */ - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - cc = UCHAR21TEST(eptr); - if (fc != cc && foc != cc) break; - eptr++; - } - if (possessive) continue; /* No backtracking */ - for (;;) - { - if (eptr == pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM25); - eptr--; - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - } - /* Control never gets here */ - } - } - - /* Caseful comparisons (includes all multi-byte characters) */ - - else - { - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (fc != UCHAR21INCTEST(eptr)) RRETURN(MATCH_NOMATCH); - } - - if (min == max) continue; - - if (minimize) - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM26); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) 
RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (fc != UCHAR21INCTEST(eptr)) RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - } - else /* Maximize */ - { - pp = eptr; - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (fc != UCHAR21TEST(eptr)) break; - eptr++; - } - if (possessive) continue; /* No backtracking */ - for (;;) - { - if (eptr == pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM27); - eptr--; - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - } - /* Control never gets here */ - } - } - /* Control never gets here */ - - /* Match a negated single one-byte character. The character we are - checking can be multibyte. */ - - case OP_NOT: - case OP_NOTI: - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 ch, och; - - ecode++; - GETCHARINC(ch, ecode); - GETCHARINC(c, eptr); - - if (op == OP_NOT) - { - if (ch == c) RRETURN(MATCH_NOMATCH); - } - else - { -#ifdef SUPPORT_UCP - if (ch > 127) - och = UCD_OTHERCASE(ch); -#else - if (ch > 127) - och = ch; -#endif /* SUPPORT_UCP */ - else - och = TABLE_GET(ch, md->fcc, ch); - if (ch == c || och == c) RRETURN(MATCH_NOMATCH); - } - } - else -#endif - { - register pcre_uint32 ch = ecode[1]; - c = *eptr++; - if (ch == c || (op == OP_NOTI && TABLE_GET(ch, md->fcc, ch) == c)) - RRETURN(MATCH_NOMATCH); - ecode += 2; - } - break; - - /* Match a negated single one-byte character repeatedly. This is almost a - repeat of the code for a repeated single character, but I haven't found a - nice way of commoning these up that doesn't require a test of the - positive/negative option for each character match. Maybe that wouldn't add - very much to the time taken, but character matching *is* what this is all - about... 
*/ - - case OP_NOTEXACT: - case OP_NOTEXACTI: - min = max = GET2(ecode, 1); - ecode += 1 + IMM2_SIZE; - goto REPEATNOTCHAR; - - case OP_NOTUPTO: - case OP_NOTUPTOI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - min = 0; - max = GET2(ecode, 1); - minimize = *ecode == OP_NOTMINUPTO || *ecode == OP_NOTMINUPTOI; - ecode += 1 + IMM2_SIZE; - goto REPEATNOTCHAR; - - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - possessive = TRUE; - min = 0; - max = INT_MAX; - ecode++; - goto REPEATNOTCHAR; - - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - possessive = TRUE; - min = 1; - max = INT_MAX; - ecode++; - goto REPEATNOTCHAR; - - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - possessive = TRUE; - min = 0; - max = 1; - ecode++; - goto REPEATNOTCHAR; - - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - possessive = TRUE; - min = 0; - max = GET2(ecode, 1); - ecode += 1 + IMM2_SIZE; - goto REPEATNOTCHAR; - - case OP_NOTSTAR: - case OP_NOTSTARI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - case OP_NOTQUERY: - case OP_NOTQUERYI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - c = *ecode++ - ((op >= OP_NOTSTARI)? OP_NOTSTARI: OP_NOTSTAR); - minimize = (c & 1) != 0; - min = rep_min[c]; /* Pick up values from tables; */ - max = rep_max[c]; /* zero for max => infinity */ - if (max == 0) max = INT_MAX; - - /* Common code for all repeated single-byte matches. */ - - REPEATNOTCHAR: - GETCHARINCTEST(fc, ecode); - - /* The code is duplicated for the caseless and caseful cases, for speed, - since matching characters is likely to be quite common. First, ensure the - minimum number of matches are present. If min = max, continue at the same - level without recursing. Otherwise, if minimizing, keep trying the rest of - the expression and advancing one matching character if failing, up to the - maximum. Alternatively, if maximizing, find the maximum number of - characters and work backwards. 
*/ - - DPRINTF(("negative matching %c{%d,%d} against subject %.*s\n", fc, min, max, - max, (char *)eptr)); - - if (op >= OP_NOTSTARI) /* Caseless */ - { -#ifdef SUPPORT_UTF -#ifdef SUPPORT_UCP - if (utf && fc > 127) - foc = UCD_OTHERCASE(fc); -#else - if (utf && fc > 127) - foc = fc; -#endif /* SUPPORT_UCP */ - else -#endif /* SUPPORT_UTF */ - foc = TABLE_GET(fc, md->fcc, fc); - -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 d; - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(d, eptr); - if (fc == d || (unsigned int)foc == d) RRETURN(MATCH_NOMATCH); - } - } - else -#endif /* SUPPORT_UTF */ - /* Not UTF mode */ - { - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (fc == *eptr || foc == *eptr) RRETURN(MATCH_NOMATCH); - eptr++; - } - } - - if (min == max) continue; - - if (minimize) - { -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 d; - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM28); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(d, eptr); - if (fc == d || (unsigned int)foc == d) RRETURN(MATCH_NOMATCH); - } - } - else -#endif /*SUPPORT_UTF */ - /* Not UTF mode */ - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM29); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (fc == *eptr || foc == *eptr) RRETURN(MATCH_NOMATCH); - eptr++; - } - } - /* Control never gets here */ - } - - /* Maximize case */ - - else - { - pp = eptr; - -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 d; - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - 
SCHECK_PARTIAL(); - break; - } - GETCHARLEN(d, eptr, len); - if (fc == d || (unsigned int)foc == d) break; - eptr += len; - } - if (possessive) continue; /* No backtracking */ - for(;;) - { - if (eptr <= pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM30); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - BACKCHAR(eptr); - } - } - else -#endif /* SUPPORT_UTF */ - /* Not UTF mode */ - { - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (fc == *eptr || foc == *eptr) break; - eptr++; - } - if (possessive) continue; /* No backtracking */ - for (;;) - { - if (eptr == pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM31); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - } - } - /* Control never gets here */ - } - } - - /* Caseful comparisons */ - - else - { -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 d; - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(d, eptr); - if (fc == d) RRETURN(MATCH_NOMATCH); - } - } - else -#endif - /* Not UTF mode */ - { - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (fc == *eptr++) RRETURN(MATCH_NOMATCH); - } - } - - if (min == max) continue; - - if (minimize) - { -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 d; - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM32); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(d, eptr); - if (fc == d) RRETURN(MATCH_NOMATCH); - } - } - else -#endif - /* Not UTF mode */ - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM33); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { 
- SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (fc == *eptr++) RRETURN(MATCH_NOMATCH); - } - } - /* Control never gets here */ - } - - /* Maximize case */ - - else - { - pp = eptr; - -#ifdef SUPPORT_UTF - if (utf) - { - register pcre_uint32 d; - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(d, eptr, len); - if (fc == d) break; - eptr += len; - } - if (possessive) continue; /* No backtracking */ - for(;;) - { - if (eptr <= pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM34); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - BACKCHAR(eptr); - } - } - else -#endif - /* Not UTF mode */ - { - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (fc == *eptr) break; - eptr++; - } - if (possessive) continue; /* No backtracking */ - for (;;) - { - if (eptr == pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM35); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - } - } - /* Control never gets here */ - } - } - /* Control never gets here */ - - /* Match a single character type repeatedly; several different opcodes - share code. This is very similar to the code for single characters, but we - repeat it in the interests of efficiency. 
*/ - - case OP_TYPEEXACT: - min = max = GET2(ecode, 1); - minimize = TRUE; - ecode += 1 + IMM2_SIZE; - goto REPEATTYPE; - - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - min = 0; - max = GET2(ecode, 1); - minimize = *ecode == OP_TYPEMINUPTO; - ecode += 1 + IMM2_SIZE; - goto REPEATTYPE; - - case OP_TYPEPOSSTAR: - possessive = TRUE; - min = 0; - max = INT_MAX; - ecode++; - goto REPEATTYPE; - - case OP_TYPEPOSPLUS: - possessive = TRUE; - min = 1; - max = INT_MAX; - ecode++; - goto REPEATTYPE; - - case OP_TYPEPOSQUERY: - possessive = TRUE; - min = 0; - max = 1; - ecode++; - goto REPEATTYPE; - - case OP_TYPEPOSUPTO: - possessive = TRUE; - min = 0; - max = GET2(ecode, 1); - ecode += 1 + IMM2_SIZE; - goto REPEATTYPE; - - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - c = *ecode++ - OP_TYPESTAR; - minimize = (c & 1) != 0; - min = rep_min[c]; /* Pick up values from tables; */ - max = rep_max[c]; /* zero for max => infinity */ - if (max == 0) max = INT_MAX; - - /* Common code for all repeated single character type matches. Note that - in UTF-8 mode, '.' matches a character of any length, but for the other - character types, the valid characters are all one-byte long. */ - - REPEATTYPE: - ctype = *ecode++; /* Code for the character type */ - -#ifdef SUPPORT_UCP - if (ctype == OP_PROP || ctype == OP_NOTPROP) - { - prop_fail_result = ctype == OP_NOTPROP; - prop_type = *ecode++; - prop_value = *ecode++; - } - else prop_type = -1; -#endif - - /* First, ensure the minimum number of matches are present. Use inline - code for maximizing the speed, and do the type test once at the start - (i.e. keep it out of the loop). Separate the UTF-8 code completely as that - is tidier. Also separate the UCP code, which can be the same for both UTF-8 - and single-bytes. 
*/ - - if (min > 0) - { -#ifdef SUPPORT_UCP - if (prop_type >= 0) - { - switch(prop_type) - { - case PT_ANY: - if (prop_fail_result) RRETURN(MATCH_NOMATCH); - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - } - break; - - case PT_LAMP: - for (i = 1; i <= min; i++) - { - int chartype; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - chartype = UCD_CHARTYPE(c); - if ((chartype == ucp_Lu || - chartype == ucp_Ll || - chartype == ucp_Lt) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - case PT_GC: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((UCD_CATEGORY(c) == prop_value) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - case PT_PC: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((UCD_CHARTYPE(c) == prop_value) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - case PT_SC: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((UCD_SCRIPT(c) == prop_value) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - case PT_ALNUM: - for (i = 1; i <= min; i++) - { - int category; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - category = UCD_CATEGORY(c); - if ((category == ucp_L || category == ucp_N) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - /* Perl space used to exclude VT, but from Perl 5.18 it is included, - which means that Perl space and POSIX space are now identical. PCRE - was changed at release 8.34. 
*/ - - case PT_SPACE: /* Perl space */ - case PT_PXSPACE: /* POSIX space */ - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - HSPACE_CASES: - VSPACE_CASES: - if (prop_fail_result) RRETURN(MATCH_NOMATCH); - break; - - default: - if ((UCD_CATEGORY(c) == ucp_Z) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - break; - } - } - break; - - case PT_WORD: - for (i = 1; i <= min; i++) - { - int category; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - category = UCD_CATEGORY(c); - if ((category == ucp_L || category == ucp_N || c == CHAR_UNDERSCORE) - == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - case PT_CLIST: - for (i = 1; i <= min; i++) - { - const pcre_uint32 *cp; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - cp = PRIV(ucd_caseless_sets) + prop_value; - for (;;) - { - if (c < *cp) - { if (prop_fail_result) break; else { RRETURN(MATCH_NOMATCH); } } - if (c == *cp++) - { if (prop_fail_result) { RRETURN(MATCH_NOMATCH); } else break; } - } - } - break; - - case PT_UCNC: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((c == CHAR_DOLLAR_SIGN || c == CHAR_COMMERCIAL_AT || - c == CHAR_GRAVE_ACCENT || (c >= 0xa0 && c <= 0xd7ff) || - c >= 0xe000) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - break; - - /* This should not occur */ - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - } - - /* Match extended Unicode sequences. We will get here only if the - support is in the binary; otherwise a compile-time error occurs. 
*/ - - else if (ctype == OP_EXTUNI) - { - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - else - { - int lgb, rgb; - GETCHARINCTEST(c, eptr); - lgb = UCD_GRAPHBREAK(c); - while (eptr < md->end_subject) - { - int len = 1; - if (!utf) c = *eptr; else { GETCHARLEN(c, eptr, len); } - rgb = UCD_GRAPHBREAK(c); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - lgb = rgb; - eptr += len; - } - } - CHECK_PARTIAL(); - } - } - - else -#endif /* SUPPORT_UCP */ - -/* Handle all other cases when the coding is UTF-8 */ - -#ifdef SUPPORT_UTF - if (utf) switch(ctype) - { - case OP_ANY: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH); - if (md->partial != 0 && - eptr + 1 >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - UCHAR21(eptr) == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - eptr++; - ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); - } - break; - - case OP_ALLANY: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr++; - ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); - } - break; - - case OP_ANYBYTE: - if (eptr > md->end_subject - min) RRETURN(MATCH_NOMATCH); - eptr += min; - break; - - case OP_ANYNL: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - switch(c) - { - default: RRETURN(MATCH_NOMATCH); - - case CHAR_CR: - if (eptr < md->end_subject && UCHAR21(eptr) == CHAR_LF) eptr++; - break; - - case CHAR_LF: - break; - - case CHAR_VT: - case CHAR_FF: - case CHAR_NEL: -#ifndef EBCDIC - case 0x2028: - case 0x2029: -#endif /* Not EBCDIC */ - if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH); - break; - } - } - break; - - case 
OP_NOT_HSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - switch(c) - { - HSPACE_CASES: RRETURN(MATCH_NOMATCH); /* Byte and multibyte cases */ - default: break; - } - } - break; - - case OP_HSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - switch(c) - { - HSPACE_CASES: break; /* Byte and multibyte cases */ - default: RRETURN(MATCH_NOMATCH); - } - } - break; - - case OP_NOT_VSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - switch(c) - { - VSPACE_CASES: RRETURN(MATCH_NOMATCH); - default: break; - } - } - break; - - case OP_VSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - switch(c) - { - VSPACE_CASES: break; - default: RRETURN(MATCH_NOMATCH); - } - } - break; - - case OP_NOT_DIGIT: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINC(c, eptr); - if (c < 128 && (md->ctypes[c] & ctype_digit) != 0) - RRETURN(MATCH_NOMATCH); - } - break; - - case OP_DIGIT: - for (i = 1; i <= min; i++) - { - pcre_uint32 cc; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21(eptr); - if (cc >= 128 || (md->ctypes[cc] & ctype_digit) == 0) - RRETURN(MATCH_NOMATCH); - eptr++; - /* No need to skip more bytes - we know it's a 1-byte character */ - } - break; - - case OP_NOT_WHITESPACE: - for (i = 1; i <= min; i++) - { - pcre_uint32 cc; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21(eptr); - if (cc < 128 && (md->ctypes[cc] & ctype_space) != 0) - RRETURN(MATCH_NOMATCH); - eptr++; - ACROSSCHAR(eptr < md->end_subject, *eptr, 
eptr++); - } - break; - - case OP_WHITESPACE: - for (i = 1; i <= min; i++) - { - pcre_uint32 cc; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21(eptr); - if (cc >= 128 || (md->ctypes[cc] & ctype_space) == 0) - RRETURN(MATCH_NOMATCH); - eptr++; - /* No need to skip more bytes - we know it's a 1-byte character */ - } - break; - - case OP_NOT_WORDCHAR: - for (i = 1; i <= min; i++) - { - pcre_uint32 cc; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21(eptr); - if (cc < 128 && (md->ctypes[cc] & ctype_word) != 0) - RRETURN(MATCH_NOMATCH); - eptr++; - ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); - } - break; - - case OP_WORDCHAR: - for (i = 1; i <= min; i++) - { - pcre_uint32 cc; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - cc = UCHAR21(eptr); - if (cc >= 128 || (md->ctypes[cc] & ctype_word) == 0) - RRETURN(MATCH_NOMATCH); - eptr++; - /* No need to skip more bytes - we know it's a 1-byte character */ - } - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } /* End switch(ctype) */ - - else -#endif /* SUPPORT_UTF */ - - /* Code for the non-UTF-8 case for minimum matching of operators other - than OP_PROP and OP_NOTPROP. 
*/ - - switch(ctype) - { - case OP_ANY: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (IS_NEWLINE(eptr)) RRETURN(MATCH_NOMATCH); - if (md->partial != 0 && - eptr + 1 >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - *eptr == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - eptr++; - } - break; - - case OP_ALLANY: - if (eptr > md->end_subject - min) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr += min; - break; - - case OP_ANYBYTE: - if (eptr > md->end_subject - min) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - eptr += min; - break; - - case OP_ANYNL: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - switch(*eptr++) - { - default: RRETURN(MATCH_NOMATCH); - - case CHAR_CR: - if (eptr < md->end_subject && *eptr == CHAR_LF) eptr++; - break; - - case CHAR_LF: - break; - - case CHAR_VT: - case CHAR_FF: - case CHAR_NEL: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - case 0x2028: - case 0x2029: -#endif - if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH); - break; - } - } - break; - - case OP_NOT_HSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - switch(*eptr++) - { - default: break; - HSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - HSPACE_MULTIBYTE_CASES: -#endif - RRETURN(MATCH_NOMATCH); - } - } - break; - - case OP_HSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - switch(*eptr++) - { - default: RRETURN(MATCH_NOMATCH); - HSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - HSPACE_MULTIBYTE_CASES: -#endif - break; - } - } - break; - - case OP_NOT_VSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= 
md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - switch(*eptr++) - { - VSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - VSPACE_MULTIBYTE_CASES: -#endif - RRETURN(MATCH_NOMATCH); - default: break; - } - } - break; - - case OP_VSPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - switch(*eptr++) - { - default: RRETURN(MATCH_NOMATCH); - VSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - VSPACE_MULTIBYTE_CASES: -#endif - break; - } - } - break; - - case OP_NOT_DIGIT: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (MAX_255(*eptr) && (md->ctypes[*eptr] & ctype_digit) != 0) - RRETURN(MATCH_NOMATCH); - eptr++; - } - break; - - case OP_DIGIT: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (!MAX_255(*eptr) || (md->ctypes[*eptr] & ctype_digit) == 0) - RRETURN(MATCH_NOMATCH); - eptr++; - } - break; - - case OP_NOT_WHITESPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (MAX_255(*eptr) && (md->ctypes[*eptr] & ctype_space) != 0) - RRETURN(MATCH_NOMATCH); - eptr++; - } - break; - - case OP_WHITESPACE: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (!MAX_255(*eptr) || (md->ctypes[*eptr] & ctype_space) == 0) - RRETURN(MATCH_NOMATCH); - eptr++; - } - break; - - case OP_NOT_WORDCHAR: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (MAX_255(*eptr) && (md->ctypes[*eptr] & ctype_word) != 0) - RRETURN(MATCH_NOMATCH); - eptr++; - } - break; - - case OP_WORDCHAR: - for (i = 1; i <= min; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - 
RRETURN(MATCH_NOMATCH); - } - if (!MAX_255(*eptr) || (md->ctypes[*eptr] & ctype_word) == 0) - RRETURN(MATCH_NOMATCH); - eptr++; - } - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - } - - /* If min = max, continue at the same level without recursing */ - - if (min == max) continue; - - /* If minimizing, we have to test the rest of the pattern before each - subsequent match. Again, separate the UTF-8 case for speed, and also - separate the UCP cases. */ - - if (minimize) - { -#ifdef SUPPORT_UCP - if (prop_type >= 0) - { - switch(prop_type) - { - case PT_ANY: - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM36); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if (prop_fail_result) RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - case PT_LAMP: - for (fi = min;; fi++) - { - int chartype; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM37); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - chartype = UCD_CHARTYPE(c); - if ((chartype == ucp_Lu || - chartype == ucp_Ll || - chartype == ucp_Lt) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - case PT_GC: - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM38); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((UCD_CATEGORY(c) == prop_value) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - case PT_PC: - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM39); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) 
RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((UCD_CHARTYPE(c) == prop_value) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - case PT_SC: - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM40); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((UCD_SCRIPT(c) == prop_value) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - case PT_ALNUM: - for (fi = min;; fi++) - { - int category; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM59); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - category = UCD_CATEGORY(c); - if ((category == ucp_L || category == ucp_N) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - /* Perl space used to exclude VT, but from Perl 5.18 it is included, - which means that Perl space and POSIX space are now identical. PCRE - was changed at release 8.34. 
*/ - - case PT_SPACE: /* Perl space */ - case PT_PXSPACE: /* POSIX space */ - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM61); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - switch(c) - { - HSPACE_CASES: - VSPACE_CASES: - if (prop_fail_result) RRETURN(MATCH_NOMATCH); - break; - - default: - if ((UCD_CATEGORY(c) == ucp_Z) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - break; - } - } - /* Control never gets here */ - - case PT_WORD: - for (fi = min;; fi++) - { - int category; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM62); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - category = UCD_CATEGORY(c); - if ((category == ucp_L || - category == ucp_N || - c == CHAR_UNDERSCORE) - == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - case PT_CLIST: - for (fi = min;; fi++) - { - const pcre_uint32 *cp; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM67); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - cp = PRIV(ucd_caseless_sets) + prop_value; - for (;;) - { - if (c < *cp) - { if (prop_fail_result) break; else { RRETURN(MATCH_NOMATCH); } } - if (c == *cp++) - { if (prop_fail_result) { RRETURN(MATCH_NOMATCH); } else break; } - } - } - /* Control never gets here */ - - case PT_UCNC: - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM60); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - GETCHARINCTEST(c, eptr); - if ((c == 
CHAR_DOLLAR_SIGN || c == CHAR_COMMERCIAL_AT || - c == CHAR_GRAVE_ACCENT || (c >= 0xa0 && c <= 0xd7ff) || - c >= 0xe000) == prop_fail_result) - RRETURN(MATCH_NOMATCH); - } - /* Control never gets here */ - - /* This should never occur */ - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - } - - /* Match extended Unicode sequences. We will get here only if the - support is in the binary; otherwise a compile-time error occurs. */ - - else if (ctype == OP_EXTUNI) - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM41); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - else - { - int lgb, rgb; - GETCHARINCTEST(c, eptr); - lgb = UCD_GRAPHBREAK(c); - while (eptr < md->end_subject) - { - int len = 1; - if (!utf) c = *eptr; else { GETCHARLEN(c, eptr, len); } - rgb = UCD_GRAPHBREAK(c); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - lgb = rgb; - eptr += len; - } - } - CHECK_PARTIAL(); - } - } - else -#endif /* SUPPORT_UCP */ - -#ifdef SUPPORT_UTF - if (utf) - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM42); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (ctype == OP_ANY && IS_NEWLINE(eptr)) - RRETURN(MATCH_NOMATCH); - GETCHARINC(c, eptr); - switch(ctype) - { - case OP_ANY: /* This is the non-NL case */ - if (md->partial != 0 && /* Take care with CRLF partial */ - eptr >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - c == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - break; - - case OP_ALLANY: - case OP_ANYBYTE: - break; - - case OP_ANYNL: - switch(c) - { - default: RRETURN(MATCH_NOMATCH); - case CHAR_CR: - if (eptr < md->end_subject && UCHAR21(eptr) == CHAR_LF) eptr++; - break; 
- - case CHAR_LF: - break; - - case CHAR_VT: - case CHAR_FF: - case CHAR_NEL: -#ifndef EBCDIC - case 0x2028: - case 0x2029: -#endif /* Not EBCDIC */ - if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH); - break; - } - break; - - case OP_NOT_HSPACE: - switch(c) - { - HSPACE_CASES: RRETURN(MATCH_NOMATCH); - default: break; - } - break; - - case OP_HSPACE: - switch(c) - { - HSPACE_CASES: break; - default: RRETURN(MATCH_NOMATCH); - } - break; - - case OP_NOT_VSPACE: - switch(c) - { - VSPACE_CASES: RRETURN(MATCH_NOMATCH); - default: break; - } - break; - - case OP_VSPACE: - switch(c) - { - VSPACE_CASES: break; - default: RRETURN(MATCH_NOMATCH); - } - break; - - case OP_NOT_DIGIT: - if (c < 256 && (md->ctypes[c] & ctype_digit) != 0) - RRETURN(MATCH_NOMATCH); - break; - - case OP_DIGIT: - if (c >= 256 || (md->ctypes[c] & ctype_digit) == 0) - RRETURN(MATCH_NOMATCH); - break; - - case OP_NOT_WHITESPACE: - if (c < 256 && (md->ctypes[c] & ctype_space) != 0) - RRETURN(MATCH_NOMATCH); - break; - - case OP_WHITESPACE: - if (c >= 256 || (md->ctypes[c] & ctype_space) == 0) - RRETURN(MATCH_NOMATCH); - break; - - case OP_NOT_WORDCHAR: - if (c < 256 && (md->ctypes[c] & ctype_word) != 0) - RRETURN(MATCH_NOMATCH); - break; - - case OP_WORDCHAR: - if (c >= 256 || (md->ctypes[c] & ctype_word) == 0) - RRETURN(MATCH_NOMATCH); - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - } - } - else -#endif - /* Not UTF mode */ - { - for (fi = min;; fi++) - { - RMATCH(eptr, ecode, offset_top, md, eptrb, RM43); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - if (fi >= max) RRETURN(MATCH_NOMATCH); - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - RRETURN(MATCH_NOMATCH); - } - if (ctype == OP_ANY && IS_NEWLINE(eptr)) - RRETURN(MATCH_NOMATCH); - c = *eptr++; - switch(ctype) - { - case OP_ANY: /* This is the non-NL case */ - if (md->partial != 0 && /* Take care with CRLF partial */ - eptr >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - c == NLBLOCK->nl[0]) - { - 
md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - break; - - case OP_ALLANY: - case OP_ANYBYTE: - break; - - case OP_ANYNL: - switch(c) - { - default: RRETURN(MATCH_NOMATCH); - case CHAR_CR: - if (eptr < md->end_subject && *eptr == CHAR_LF) eptr++; - break; - - case CHAR_LF: - break; - - case CHAR_VT: - case CHAR_FF: - case CHAR_NEL: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - case 0x2028: - case 0x2029: -#endif - if (md->bsr_anycrlf) RRETURN(MATCH_NOMATCH); - break; - } - break; - - case OP_NOT_HSPACE: - switch(c) - { - default: break; - HSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - HSPACE_MULTIBYTE_CASES: -#endif - RRETURN(MATCH_NOMATCH); - } - break; - - case OP_HSPACE: - switch(c) - { - default: RRETURN(MATCH_NOMATCH); - HSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - HSPACE_MULTIBYTE_CASES: -#endif - break; - } - break; - - case OP_NOT_VSPACE: - switch(c) - { - default: break; - VSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - VSPACE_MULTIBYTE_CASES: -#endif - RRETURN(MATCH_NOMATCH); - } - break; - - case OP_VSPACE: - switch(c) - { - default: RRETURN(MATCH_NOMATCH); - VSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - VSPACE_MULTIBYTE_CASES: -#endif - break; - } - break; - - case OP_NOT_DIGIT: - if (MAX_255(c) && (md->ctypes[c] & ctype_digit) != 0) RRETURN(MATCH_NOMATCH); - break; - - case OP_DIGIT: - if (!MAX_255(c) || (md->ctypes[c] & ctype_digit) == 0) RRETURN(MATCH_NOMATCH); - break; - - case OP_NOT_WHITESPACE: - if (MAX_255(c) && (md->ctypes[c] & ctype_space) != 0) RRETURN(MATCH_NOMATCH); - break; - - case OP_WHITESPACE: - if (!MAX_255(c) || (md->ctypes[c] & ctype_space) == 0) RRETURN(MATCH_NOMATCH); - break; - - case OP_NOT_WORDCHAR: - if (MAX_255(c) && (md->ctypes[c] & ctype_word) != 0) RRETURN(MATCH_NOMATCH); - break; - - case OP_WORDCHAR: - if (!MAX_255(c) || (md->ctypes[c] & ctype_word) == 0) 
RRETURN(MATCH_NOMATCH); - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - } - } - /* Control never gets here */ - } - - /* If maximizing, it is worth using inline code for speed, doing the type - test once at the start (i.e. keep it out of the loop). Again, keep the - UTF-8 and UCP stuff separate. */ - - else - { - pp = eptr; /* Remember where we started */ - -#ifdef SUPPORT_UCP - if (prop_type >= 0) - { - switch(prop_type) - { - case PT_ANY: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - if (prop_fail_result) break; - eptr+= len; - } - break; - - case PT_LAMP: - for (i = min; i < max; i++) - { - int chartype; - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - chartype = UCD_CHARTYPE(c); - if ((chartype == ucp_Lu || - chartype == ucp_Ll || - chartype == ucp_Lt) == prop_fail_result) - break; - eptr+= len; - } - break; - - case PT_GC: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - if ((UCD_CATEGORY(c) == prop_value) == prop_fail_result) break; - eptr+= len; - } - break; - - case PT_PC: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - if ((UCD_CHARTYPE(c) == prop_value) == prop_fail_result) break; - eptr+= len; - } - break; - - case PT_SC: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - if ((UCD_SCRIPT(c) == prop_value) == prop_fail_result) break; - eptr+= len; - } - break; - - case PT_ALNUM: - for (i = min; i < max; i++) - { - int category; - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - category = UCD_CATEGORY(c); - if 
((category == ucp_L || category == ucp_N) == prop_fail_result) - break; - eptr+= len; - } - break; - - /* Perl space used to exclude VT, but from Perl 5.18 it is included, - which means that Perl space and POSIX space are now identical. PCRE - was changed at release 8.34. */ - - case PT_SPACE: /* Perl space */ - case PT_PXSPACE: /* POSIX space */ - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - switch(c) - { - HSPACE_CASES: - VSPACE_CASES: - if (prop_fail_result) goto ENDLOOP99; /* Break the loop */ - break; - - default: - if ((UCD_CATEGORY(c) == ucp_Z) == prop_fail_result) - goto ENDLOOP99; /* Break the loop */ - break; - } - eptr+= len; - } - ENDLOOP99: - break; - - case PT_WORD: - for (i = min; i < max; i++) - { - int category; - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - category = UCD_CATEGORY(c); - if ((category == ucp_L || category == ucp_N || - c == CHAR_UNDERSCORE) == prop_fail_result) - break; - eptr+= len; - } - break; - - case PT_CLIST: - for (i = min; i < max; i++) - { - const pcre_uint32 *cp; - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - cp = PRIV(ucd_caseless_sets) + prop_value; - for (;;) - { - if (c < *cp) - { if (prop_fail_result) break; else goto GOT_MAX; } - if (c == *cp++) - { if (prop_fail_result) goto GOT_MAX; else break; } - } - eptr += len; - } - GOT_MAX: - break; - - case PT_UCNC: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLENTEST(c, eptr, len); - if ((c == CHAR_DOLLAR_SIGN || c == CHAR_COMMERCIAL_AT || - c == CHAR_GRAVE_ACCENT || (c >= 0xa0 && c <= 0xd7ff) || - c >= 0xe000) == prop_fail_result) - break; - eptr += len; - } - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - - /* eptr is now past the end of the 
maximum run */ - - if (possessive) continue; /* No backtracking */ - for(;;) - { - if (eptr <= pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM44); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - if (utf) BACKCHAR(eptr); - } - } - - /* Match extended Unicode grapheme clusters. We will get here only if the - support is in the binary; otherwise a compile-time error occurs. */ - - else if (ctype == OP_EXTUNI) - { - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - else - { - int lgb, rgb; - GETCHARINCTEST(c, eptr); - lgb = UCD_GRAPHBREAK(c); - while (eptr < md->end_subject) - { - int len = 1; - if (!utf) c = *eptr; else { GETCHARLEN(c, eptr, len); } - rgb = UCD_GRAPHBREAK(c); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - lgb = rgb; - eptr += len; - } - } - CHECK_PARTIAL(); - } - - /* eptr is now past the end of the maximum run */ - - if (possessive) continue; /* No backtracking */ - - /* We use <= pp rather than == pp to detect the start of the run while - backtracking because the use of \C in UTF mode can cause BACKCHAR to - move back past pp. This is just palliative; the use of \C in UTF mode - is fraught with danger. */ - - for(;;) - { - int lgb, rgb; - PCRE_PUCHAR fptr; - - if (eptr <= pp) goto TAIL_RECURSE; /* At start of char run */ - RMATCH(eptr, ecode, offset_top, md, eptrb, RM45); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - - /* Backtracking over an extended grapheme cluster involves inspecting - the previous two characters (if present) to see if a break is - permitted between them. 
*/ - - eptr--; - if (!utf) c = *eptr; else - { - BACKCHAR(eptr); - GETCHAR(c, eptr); - } - rgb = UCD_GRAPHBREAK(c); - - for (;;) - { - if (eptr <= pp) goto TAIL_RECURSE; /* At start of char run */ - fptr = eptr - 1; - if (!utf) c = *fptr; else - { - BACKCHAR(fptr); - GETCHAR(c, fptr); - } - lgb = UCD_GRAPHBREAK(c); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - eptr = fptr; - rgb = lgb; - } - } - } - - else -#endif /* SUPPORT_UCP */ - -#ifdef SUPPORT_UTF - if (utf) - { - switch(ctype) - { - case OP_ANY: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (IS_NEWLINE(eptr)) break; - if (md->partial != 0 && /* Take care with CRLF partial */ - eptr + 1 >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - UCHAR21(eptr) == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - eptr++; - ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); - } - break; - - case OP_ALLANY: - if (max < INT_MAX) - { - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - eptr++; - ACROSSCHAR(eptr < md->end_subject, *eptr, eptr++); - } - } - else - { - eptr = md->end_subject; /* Unlimited UTF-8 repeat */ - SCHECK_PARTIAL(); - } - break; - - /* The byte case is the same as non-UTF8 */ - - case OP_ANYBYTE: - c = max - min; - if (c > (unsigned int)(md->end_subject - eptr)) - { - eptr = md->end_subject; - SCHECK_PARTIAL(); - } - else eptr += c; - break; - - case OP_ANYNL: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c == CHAR_CR) - { - if (++eptr >= md->end_subject) break; - if (UCHAR21(eptr) == CHAR_LF) eptr++; - } - else - { - if (c != CHAR_LF && - (md->bsr_anycrlf || - (c != CHAR_VT && c != CHAR_FF && c != CHAR_NEL -#ifndef EBCDIC - && c != 0x2028 && c != 0x2029 -#endif /* Not EBCDIC */ - ))) - break; 
- eptr += len; - } - } - break; - - case OP_NOT_HSPACE: - case OP_HSPACE: - for (i = min; i < max; i++) - { - BOOL gotspace; - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - switch(c) - { - HSPACE_CASES: gotspace = TRUE; break; - default: gotspace = FALSE; break; - } - if (gotspace == (ctype == OP_NOT_HSPACE)) break; - eptr += len; - } - break; - - case OP_NOT_VSPACE: - case OP_VSPACE: - for (i = min; i < max; i++) - { - BOOL gotspace; - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - switch(c) - { - VSPACE_CASES: gotspace = TRUE; break; - default: gotspace = FALSE; break; - } - if (gotspace == (ctype == OP_NOT_VSPACE)) break; - eptr += len; - } - break; - - case OP_NOT_DIGIT: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c < 256 && (md->ctypes[c] & ctype_digit) != 0) break; - eptr+= len; - } - break; - - case OP_DIGIT: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c >= 256 ||(md->ctypes[c] & ctype_digit) == 0) break; - eptr+= len; - } - break; - - case OP_NOT_WHITESPACE: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c < 256 && (md->ctypes[c] & ctype_space) != 0) break; - eptr+= len; - } - break; - - case OP_WHITESPACE: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c >= 256 ||(md->ctypes[c] & ctype_space) == 0) break; - eptr+= len; - } - break; - - case OP_NOT_WORDCHAR: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c < 256 && 
(md->ctypes[c] & ctype_word) != 0) break; - eptr+= len; - } - break; - - case OP_WORDCHAR: - for (i = min; i < max; i++) - { - int len = 1; - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - GETCHARLEN(c, eptr, len); - if (c >= 256 || (md->ctypes[c] & ctype_word) == 0) break; - eptr+= len; - } - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - - if (possessive) continue; /* No backtracking */ - for(;;) - { - if (eptr <= pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM46); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - BACKCHAR(eptr); - if (ctype == OP_ANYNL && eptr > pp && UCHAR21(eptr) == CHAR_NL && - UCHAR21(eptr - 1) == CHAR_CR) eptr--; - } - } - else -#endif /* SUPPORT_UTF */ - /* Not UTF mode */ - { - switch(ctype) - { - case OP_ANY: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (IS_NEWLINE(eptr)) break; - if (md->partial != 0 && /* Take care with CRLF partial */ - eptr + 1 >= md->end_subject && - NLBLOCK->nltype == NLTYPE_FIXED && - NLBLOCK->nllen == 2 && - *eptr == NLBLOCK->nl[0]) - { - md->hitend = TRUE; - if (md->partial > 1) RRETURN(PCRE_ERROR_PARTIAL); - } - eptr++; - } - break; - - case OP_ALLANY: - case OP_ANYBYTE: - c = max - min; - if (c > (unsigned int)(md->end_subject - eptr)) - { - eptr = md->end_subject; - SCHECK_PARTIAL(); - } - else eptr += c; - break; - - case OP_ANYNL: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - c = *eptr; - if (c == CHAR_CR) - { - if (++eptr >= md->end_subject) break; - if (*eptr == CHAR_LF) eptr++; - } - else - { - if (c != CHAR_LF && (md->bsr_anycrlf || - (c != CHAR_VT && c != CHAR_FF && c != CHAR_NEL -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - && c != 0x2028 && c != 0x2029 -#endif - ))) break; - eptr++; - } - } - break; - - case OP_NOT_HSPACE: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - 
break; - } - switch(*eptr) - { - default: eptr++; break; - HSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - HSPACE_MULTIBYTE_CASES: -#endif - goto ENDLOOP00; - } - } - ENDLOOP00: - break; - - case OP_HSPACE: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - switch(*eptr) - { - default: goto ENDLOOP01; - HSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - HSPACE_MULTIBYTE_CASES: -#endif - eptr++; break; - } - } - ENDLOOP01: - break; - - case OP_NOT_VSPACE: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - switch(*eptr) - { - default: eptr++; break; - VSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - VSPACE_MULTIBYTE_CASES: -#endif - goto ENDLOOP02; - } - } - ENDLOOP02: - break; - - case OP_VSPACE: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - switch(*eptr) - { - default: goto ENDLOOP03; - VSPACE_BYTE_CASES: -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - VSPACE_MULTIBYTE_CASES: -#endif - eptr++; break; - } - } - ENDLOOP03: - break; - - case OP_NOT_DIGIT: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (MAX_255(*eptr) && (md->ctypes[*eptr] & ctype_digit) != 0) break; - eptr++; - } - break; - - case OP_DIGIT: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (!MAX_255(*eptr) || (md->ctypes[*eptr] & ctype_digit) == 0) break; - eptr++; - } - break; - - case OP_NOT_WHITESPACE: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (MAX_255(*eptr) && (md->ctypes[*eptr] & ctype_space) != 0) break; - eptr++; - } - break; - - case OP_WHITESPACE: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (!MAX_255(*eptr) 
|| (md->ctypes[*eptr] & ctype_space) == 0) break; - eptr++; - } - break; - - case OP_NOT_WORDCHAR: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (MAX_255(*eptr) && (md->ctypes[*eptr] & ctype_word) != 0) break; - eptr++; - } - break; - - case OP_WORDCHAR: - for (i = min; i < max; i++) - { - if (eptr >= md->end_subject) - { - SCHECK_PARTIAL(); - break; - } - if (!MAX_255(*eptr) || (md->ctypes[*eptr] & ctype_word) == 0) break; - eptr++; - } - break; - - default: - RRETURN(PCRE_ERROR_INTERNAL); - } - - if (possessive) continue; /* No backtracking */ - for (;;) - { - if (eptr == pp) goto TAIL_RECURSE; - RMATCH(eptr, ecode, offset_top, md, eptrb, RM47); - if (rrc != MATCH_NOMATCH) RRETURN(rrc); - eptr--; - if (ctype == OP_ANYNL && eptr > pp && *eptr == CHAR_LF && - eptr[-1] == CHAR_CR) eptr--; - } - } - - /* Control never gets here */ - } - - /* There's been some horrible disaster. Arrival here can only mean there is - something seriously wrong in the code above or the OP_xxx definitions. */ - - default: - DPRINTF(("Unknown opcode %d\n", *ecode)); - RRETURN(PCRE_ERROR_UNKNOWN_OPCODE); - } - - /* Do not stick any code in here without much thought; it is assumed - that "continue" in the code above comes out to here to repeat the main - loop. */ - - } /* End of main loop */ -/* Control never reaches here */ - - -/* When compiling to use the heap rather than the stack for recursive calls to -match(), the RRETURN() macro jumps here. The number that is saved in -frame->Xwhere indicates which label we actually want to return to. 
*/ - -#ifdef NO_RECURSE -#define LBL(val) case val: goto L_RM##val; -HEAP_RETURN: -switch (frame->Xwhere) - { - LBL( 1) LBL( 2) LBL( 3) LBL( 4) LBL( 5) LBL( 6) LBL( 7) LBL( 8) - LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(17) - LBL(19) LBL(24) LBL(25) LBL(26) LBL(27) LBL(29) LBL(31) LBL(33) - LBL(35) LBL(43) LBL(47) LBL(48) LBL(49) LBL(50) LBL(51) LBL(52) - LBL(53) LBL(54) LBL(55) LBL(56) LBL(57) LBL(58) LBL(63) LBL(64) - LBL(65) LBL(66) -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - LBL(20) LBL(21) -#endif -#ifdef SUPPORT_UTF - LBL(16) LBL(18) - LBL(22) LBL(23) LBL(28) LBL(30) - LBL(32) LBL(34) LBL(42) LBL(46) -#ifdef SUPPORT_UCP - LBL(36) LBL(37) LBL(38) LBL(39) LBL(40) LBL(41) LBL(44) LBL(45) - LBL(59) LBL(60) LBL(61) LBL(62) LBL(67) -#endif /* SUPPORT_UCP */ -#endif /* SUPPORT_UTF */ - default: - DPRINTF(("jump error in pcre match: label %d non-existent\n", frame->Xwhere)); - return PCRE_ERROR_INTERNAL; - } -#undef LBL -#endif /* NO_RECURSE */ -} - - -/*************************************************************************** -**************************************************************************** - RECURSION IN THE match() FUNCTION - -Undefine all the macros that were defined above to handle this. 
*/ - -#ifdef NO_RECURSE -#undef eptr -#undef ecode -#undef mstart -#undef offset_top -#undef eptrb -#undef flags - -#undef callpat -#undef charptr -#undef data -#undef next -#undef pp -#undef prev -#undef saved_eptr - -#undef new_recursive - -#undef cur_is_word -#undef condition -#undef prev_is_word - -#undef ctype -#undef length -#undef max -#undef min -#undef number -#undef offset -#undef op -#undef save_capture_last -#undef save_offset1 -#undef save_offset2 -#undef save_offset3 -#undef stacksave - -#undef newptrb - -#endif - -/* These two are defined as macros in both cases */ - -#undef fc -#undef fi - -/*************************************************************************** -***************************************************************************/ - - -#ifdef NO_RECURSE -/************************************************* -* Release allocated heap frames * -*************************************************/ - -/* This function releases all the allocated frames. The base frame is on the -machine stack, and so must not be freed. - -Argument: the address of the base frame -Returns: nothing -*/ - -static void -release_match_heapframes (heapframe *frame_base) -{ -heapframe *nextframe = frame_base->Xnextframe; -while (nextframe != NULL) - { - heapframe *oldframe = nextframe; - nextframe = nextframe->Xnextframe; - (PUBL(stack_free))(oldframe); - } -} -#endif - - -/************************************************* -* Execute a Regular Expression * -*************************************************/ - -/* This function applies a compiled re to a subject string and picks out -portions of the string if it matches. Two elements in the vector are set for -each substring: the offsets to the start and end of the substring. 
- -Arguments: - argument_re points to the compiled expression - extra_data points to extra data or is NULL - subject points to the subject string - length length of subject string (may contain binary zeros) - start_offset where to start in the subject string - options option bits - offsets points to a vector of ints to be filled in with offsets - offsetcount the number of elements in the vector - -Returns: > 0 => success; value is the number of elements filled in - = 0 => success, but offsets is not big enough - -1 => failed to match - < -1 => some kind of unexpected problem -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_exec(const pcre *argument_re, const pcre_extra *extra_data, - PCRE_SPTR subject, int length, int start_offset, int options, int *offsets, - int offsetcount) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_exec(const pcre16 *argument_re, const pcre16_extra *extra_data, - PCRE_SPTR16 subject, int length, int start_offset, int options, int *offsets, - int offsetcount) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_exec(const pcre32 *argument_re, const pcre32_extra *extra_data, - PCRE_SPTR32 subject, int length, int start_offset, int options, int *offsets, - int offsetcount) -#endif -{ -int rc, ocount, arg_offset_max; -int newline; -BOOL using_temporary_offsets = FALSE; -BOOL anchored; -BOOL startline; -BOOL firstline; -BOOL utf; -BOOL has_first_char = FALSE; -BOOL has_req_char = FALSE; -pcre_uchar first_char = 0; -pcre_uchar first_char2 = 0; -pcre_uchar req_char = 0; -pcre_uchar req_char2 = 0; -match_data match_block; -match_data *md = &match_block; -const pcre_uint8 *tables; -const pcre_uint8 *start_bits = NULL; -PCRE_PUCHAR start_match = (PCRE_PUCHAR)subject + start_offset; -PCRE_PUCHAR end_subject; -PCRE_PUCHAR start_partial = NULL; -PCRE_PUCHAR match_partial = NULL; -PCRE_PUCHAR req_char_ptr = start_match - 1; - -const pcre_study_data *study; -const 
REAL_PCRE *re = (const REAL_PCRE *)argument_re; - -#ifdef NO_RECURSE -heapframe frame_zero; -frame_zero.Xprevframe = NULL; /* Marks the top level */ -frame_zero.Xnextframe = NULL; /* None are allocated yet */ -md->match_frames_base = &frame_zero; -#endif - -/* Check for the special magic call that measures the size of the stack used -per recursive call of match(). Without the funny casting for sizeof, a Windows -compiler gave this error: "unary minus operator applied to unsigned type, -result still unsigned". Hopefully the cast fixes that. */ - -if (re == NULL && extra_data == NULL && subject == NULL && length == -999 && - start_offset == -999) -#ifdef NO_RECURSE - return -((int)sizeof(heapframe)); -#else - return match(NULL, NULL, NULL, 0, NULL, NULL, 0); -#endif - -/* Plausibility checks */ - -if ((options & ~PUBLIC_EXEC_OPTIONS) != 0) return PCRE_ERROR_BADOPTION; -if (re == NULL || subject == NULL || (offsets == NULL && offsetcount > 0)) - return PCRE_ERROR_NULL; -if (offsetcount < 0) return PCRE_ERROR_BADCOUNT; -if (length < 0) return PCRE_ERROR_BADLENGTH; -if (start_offset < 0 || start_offset > length) return PCRE_ERROR_BADOFFSET; - -/* Check that the first field in the block is the magic number. If it is not, -return with PCRE_ERROR_BADMAGIC. However, if the magic number is equal to -REVERSED_MAGIC_NUMBER we return with PCRE_ERROR_BADENDIANNESS, which -means that the pattern is likely compiled with different endianness. */ - -if (re->magic_number != MAGIC_NUMBER) - return re->magic_number == REVERSED_MAGIC_NUMBER? - PCRE_ERROR_BADENDIANNESS:PCRE_ERROR_BADMAGIC; -if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE; - -/* These two settings are used in the code for checking a UTF-8 string that -follows immediately afterwards. Other values in the md block are used only -during "normal" pcre_exec() processing, not when the JIT support is in use, -so they are set up later. */ - -/* PCRE_UTF16 has the same value as PCRE_UTF8. 
*/ -utf = md->utf = (re->options & PCRE_UTF8) != 0; -md->partial = ((options & PCRE_PARTIAL_HARD) != 0)? 2 : - ((options & PCRE_PARTIAL_SOFT) != 0)? 1 : 0; - -/* Check a UTF-8 string if required. Pass back the character offset and error -code for an invalid string if a results vector is available. */ - -#ifdef SUPPORT_UTF -if (utf && (options & PCRE_NO_UTF8_CHECK) == 0) - { - int erroroffset; - int errorcode = PRIV(valid_utf)((PCRE_PUCHAR)subject, length, &erroroffset); - if (errorcode != 0) - { - if (offsetcount >= 2) - { - offsets[0] = erroroffset; - offsets[1] = errorcode; - } -#if defined COMPILE_PCRE8 - return (errorcode <= PCRE_UTF8_ERR5 && md->partial > 1)? - PCRE_ERROR_SHORTUTF8 : PCRE_ERROR_BADUTF8; -#elif defined COMPILE_PCRE16 - return (errorcode <= PCRE_UTF16_ERR1 && md->partial > 1)? - PCRE_ERROR_SHORTUTF16 : PCRE_ERROR_BADUTF16; -#elif defined COMPILE_PCRE32 - return PCRE_ERROR_BADUTF32; -#endif - } -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16 - /* Check that a start_offset points to the start of a UTF character. */ - if (start_offset > 0 && start_offset < length && - NOT_FIRSTCHAR(((PCRE_PUCHAR)subject)[start_offset])) - return PCRE_ERROR_BADUTF8_OFFSET; -#endif - } -#endif - -/* If the pattern was successfully studied with JIT support, run the JIT -executable instead of the rest of this function. Most options must be set at -compile time for the JIT code to be usable. Fallback to the normal code path if -an unsupported flag is set. */ - -#ifdef SUPPORT_JIT -if (extra_data != NULL - && (extra_data->flags & (PCRE_EXTRA_EXECUTABLE_JIT | - PCRE_EXTRA_TABLES)) == PCRE_EXTRA_EXECUTABLE_JIT - && extra_data->executable_jit != NULL - && (options & ~PUBLIC_JIT_EXEC_OPTIONS) == 0) - { - rc = PRIV(jit_exec)(extra_data, (const pcre_uchar *)subject, length, - start_offset, options, offsets, offsetcount); - - /* PCRE_ERROR_NULL means that the selected normal or partial matching - mode is not compiled. In this case we simply fallback to interpreter. 
*/ - - if (rc != PCRE_ERROR_JIT_BADOPTION) return rc; - } -#endif - -/* Carry on with non-JIT matching. This information is for finding all the -numbers associated with a given name, for condition testing. */ - -md->name_table = (pcre_uchar *)re + re->name_table_offset; -md->name_count = re->name_count; -md->name_entry_size = re->name_entry_size; - -/* Fish out the optional data from the extra_data structure, first setting -the default values. */ - -study = NULL; -md->match_limit = MATCH_LIMIT; -md->match_limit_recursion = MATCH_LIMIT_RECURSION; -md->callout_data = NULL; - -/* The table pointer is always in native byte order. */ - -tables = re->tables; - -/* The two limit values override the defaults, whatever their value. */ - -if (extra_data != NULL) - { - unsigned long int flags = extra_data->flags; - if ((flags & PCRE_EXTRA_STUDY_DATA) != 0) - study = (const pcre_study_data *)extra_data->study_data; - if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) - md->match_limit = extra_data->match_limit; - if ((flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION) != 0) - md->match_limit_recursion = extra_data->match_limit_recursion; - if ((flags & PCRE_EXTRA_CALLOUT_DATA) != 0) - md->callout_data = extra_data->callout_data; - if ((flags & PCRE_EXTRA_TABLES) != 0) tables = extra_data->tables; - } - -/* Limits in the regex override only if they are smaller. */ - -if ((re->flags & PCRE_MLSET) != 0 && re->limit_match < md->match_limit) - md->match_limit = re->limit_match; - -if ((re->flags & PCRE_RLSET) != 0 && - re->limit_recursion < md->match_limit_recursion) - md->match_limit_recursion = re->limit_recursion; - -/* If the exec call supplied NULL for tables, use the inbuilt ones. This -is a feature that makes it possible to save compiled regex and re-use them -in other programs later. 
*/ - -if (tables == NULL) tables = PRIV(default_tables); - -/* Set up other data */ - -anchored = ((re->options | options) & PCRE_ANCHORED) != 0; -startline = (re->flags & PCRE_STARTLINE) != 0; -firstline = (re->options & PCRE_FIRSTLINE) != 0; - -/* The code starts after the real_pcre block and the capture name table. */ - -md->start_code = (const pcre_uchar *)re + re->name_table_offset + - re->name_count * re->name_entry_size; - -md->start_subject = (PCRE_PUCHAR)subject; -md->start_offset = start_offset; -md->end_subject = md->start_subject + length; -end_subject = md->end_subject; - -md->endonly = (re->options & PCRE_DOLLAR_ENDONLY) != 0; -md->use_ucp = (re->options & PCRE_UCP) != 0; -md->jscript_compat = (re->options & PCRE_JAVASCRIPT_COMPAT) != 0; -md->ignore_skip_arg = 0; - -/* Some options are unpacked into BOOL variables in the hope that testing -them will be faster than individual option bits. */ - -md->notbol = (options & PCRE_NOTBOL) != 0; -md->noteol = (options & PCRE_NOTEOL) != 0; -md->notempty = (options & PCRE_NOTEMPTY) != 0; -md->notempty_atstart = (options & PCRE_NOTEMPTY_ATSTART) != 0; - -md->hitend = FALSE; -md->mark = md->nomatch_mark = NULL; /* In case never set */ - -md->recursive = NULL; /* No recursion at top level */ -md->hasthen = (re->flags & PCRE_HASTHEN) != 0; - -md->lcc = tables + lcc_offset; -md->fcc = tables + fcc_offset; -md->ctypes = tables + ctypes_offset; - -/* Handle different \R options. */ - -switch (options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) - { - case 0: - if ((re->options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) != 0) - md->bsr_anycrlf = (re->options & PCRE_BSR_ANYCRLF) != 0; - else -#ifdef BSR_ANYCRLF - md->bsr_anycrlf = TRUE; -#else - md->bsr_anycrlf = FALSE; -#endif - break; - - case PCRE_BSR_ANYCRLF: - md->bsr_anycrlf = TRUE; - break; - - case PCRE_BSR_UNICODE: - md->bsr_anycrlf = FALSE; - break; - - default: return PCRE_ERROR_BADNEWLINE; - } - -/* Handle different types of newline. The three bits give eight cases. 
If -nothing is set at run time, whatever was used at compile time applies. */ - -switch ((((options & PCRE_NEWLINE_BITS) == 0)? re->options : - (pcre_uint32)options) & PCRE_NEWLINE_BITS) - { - case 0: newline = NEWLINE; break; /* Compile-time default */ - case PCRE_NEWLINE_CR: newline = CHAR_CR; break; - case PCRE_NEWLINE_LF: newline = CHAR_NL; break; - case PCRE_NEWLINE_CR+ - PCRE_NEWLINE_LF: newline = (CHAR_CR << 8) | CHAR_NL; break; - case PCRE_NEWLINE_ANY: newline = -1; break; - case PCRE_NEWLINE_ANYCRLF: newline = -2; break; - default: return PCRE_ERROR_BADNEWLINE; - } - -if (newline == -2) - { - md->nltype = NLTYPE_ANYCRLF; - } -else if (newline < 0) - { - md->nltype = NLTYPE_ANY; - } -else - { - md->nltype = NLTYPE_FIXED; - if (newline > 255) - { - md->nllen = 2; - md->nl[0] = (newline >> 8) & 255; - md->nl[1] = newline & 255; - } - else - { - md->nllen = 1; - md->nl[0] = newline; - } - } - -/* Partial matching was originally supported only for a restricted set of -regexes; from release 8.00 there are no restrictions, but the bits are still -defined (though never set). So there's no harm in leaving this code. */ - -if (md->partial && (re->flags & PCRE_NOPARTIAL) != 0) - return PCRE_ERROR_BADPARTIAL; - -/* If the expression has got more back references than the offsets supplied can -hold, we get a temporary chunk of working store to use during the matching. -Otherwise, we can use the vector supplied, rounding down its size to a multiple -of 3. 
*/ - -ocount = offsetcount - (offsetcount % 3); -arg_offset_max = (2*ocount)/3; - -if (re->top_backref > 0 && re->top_backref >= ocount/3) - { - ocount = re->top_backref * 3 + 3; - md->offset_vector = (int *)(PUBL(malloc))(ocount * sizeof(int)); - if (md->offset_vector == NULL) return PCRE_ERROR_NOMEMORY; - using_temporary_offsets = TRUE; - DPRINTF(("Got memory to hold back references\n")); - } -else md->offset_vector = offsets; -md->offset_end = ocount; -md->offset_max = (2*ocount)/3; -md->capture_last = 0; - -/* Reset the working variable associated with each extraction. These should -never be used unless previously set, but they get saved and restored, and so we -initialize them to avoid reading uninitialized locations. Also, unset the -offsets for the matched string. This is really just for tidiness with callouts, -in case they inspect these fields. */ - -if (md->offset_vector != NULL) - { - register int *iptr = md->offset_vector + ocount; - register int *iend = iptr - re->top_bracket; - if (iend < md->offset_vector + 2) iend = md->offset_vector + 2; - while (--iptr >= iend) *iptr = -1; - if (offsetcount > 0) md->offset_vector[0] = -1; - if (offsetcount > 1) md->offset_vector[1] = -1; - } - -/* Set up the first character to match, if available. The first_char value is -never set for an anchored regular expression, but the anchoring may be forced -at run time, so we have to test for anchoring. The first char may be unset for -an unanchored pattern, of course. If there's no first char and the pattern was -studied, there may be a bitmap of possible first characters. 
*/ - -if (!anchored) - { - if ((re->flags & PCRE_FIRSTSET) != 0) - { - has_first_char = TRUE; - first_char = first_char2 = (pcre_uchar)(re->first_char); - if ((re->flags & PCRE_FCH_CASELESS) != 0) - { - first_char2 = TABLE_GET(first_char, md->fcc, first_char); -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - if (utf && first_char > 127) - first_char2 = UCD_OTHERCASE(first_char); -#endif - } - } - else - if (!startline && study != NULL && - (study->flags & PCRE_STUDY_MAPPED) != 0) - start_bits = study->start_bits; - } - -/* For anchored or unanchored matches, there may be a "last known required -character" set. */ - -if ((re->flags & PCRE_REQCHSET) != 0) - { - has_req_char = TRUE; - req_char = req_char2 = (pcre_uchar)(re->req_char); - if ((re->flags & PCRE_RCH_CASELESS) != 0) - { - req_char2 = TABLE_GET(req_char, md->fcc, req_char); -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - if (utf && req_char > 127) - req_char2 = UCD_OTHERCASE(req_char); -#endif - } - } - - -/* ==========================================================================*/ - -/* Loop for handling unanchored repeated matching attempts; for anchored regexs -the loop runs just once. */ - -for(;;) - { - PCRE_PUCHAR save_end_subject = end_subject; - PCRE_PUCHAR new_start_match; - - /* If firstline is TRUE, the start of the match is constrained to the first - line of a multiline string. That is, the match must be before or at the first - newline. Implement this by temporarily adjusting end_subject so that we stop - scanning at a newline. If the match fails at the newline, later code breaks - this loop. 
*/ - - if (firstline) - { - PCRE_PUCHAR t = start_match; -#ifdef SUPPORT_UTF - if (utf) - { - while (t < md->end_subject && !IS_NEWLINE(t)) - { - t++; - ACROSSCHAR(t < end_subject, *t, t++); - } - } - else -#endif - while (t < md->end_subject && !IS_NEWLINE(t)) t++; - end_subject = t; - } - - /* There are some optimizations that avoid running the match if a known - starting point is not found, or if a known later character is not present. - However, there is an option that disables these, for testing and for ensuring - that all callouts do actually occur. The option can be set in the regex by - (*NO_START_OPT) or passed in match-time options. */ - - if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0) - { - /* Advance to a unique first char if there is one. */ - - if (has_first_char) - { - pcre_uchar smc; - - if (first_char != first_char2) - while (start_match < end_subject && - (smc = UCHAR21TEST(start_match)) != first_char && smc != first_char2) - start_match++; - else - while (start_match < end_subject && UCHAR21TEST(start_match) != first_char) - start_match++; - } - - /* Or to just after a linebreak for a multiline match */ - - else if (startline) - { - if (start_match > md->start_subject + start_offset) - { -#ifdef SUPPORT_UTF - if (utf) - { - while (start_match < end_subject && !WAS_NEWLINE(start_match)) - { - start_match++; - ACROSSCHAR(start_match < end_subject, *start_match, - start_match++); - } - } - else -#endif - while (start_match < end_subject && !WAS_NEWLINE(start_match)) - start_match++; - - /* If we have just passed a CR and the newline option is ANY or ANYCRLF, - and we are now at a LF, advance the match position by one more character. 
- */ - - if (start_match[-1] == CHAR_CR && - (md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) && - start_match < end_subject && - UCHAR21TEST(start_match) == CHAR_NL) - start_match++; - } - } - - /* Or to a non-unique first byte after study */ - - else if (start_bits != NULL) - { - while (start_match < end_subject) - { - register pcre_uint32 c = UCHAR21TEST(start_match); -#ifndef COMPILE_PCRE8 - if (c > 255) c = 255; -#endif - if ((start_bits[c/8] & (1 << (c&7))) != 0) break; - start_match++; - } - } - } /* Starting optimizations */ - - /* Restore fudged end_subject */ - - end_subject = save_end_subject; - - /* The following two optimizations are disabled for partial matching or if - disabling is explicitly requested. */ - - if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0 && !md->partial) - { - /* If the pattern was studied, a minimum subject length may be set. This is - a lower bound; no actual string of that length may actually match the - pattern. Although the value is, strictly, in characters, we treat it as - bytes to avoid spending too much time in this optimization. */ - - if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 && - (pcre_uint32)(end_subject - start_match) < study->minlength) - { - rc = MATCH_NOMATCH; - break; - } - - /* If req_char is set, we know that that character must appear in the - subject for the match to succeed. If the first character is set, req_char - must be later in the subject; otherwise the test starts at the match point. - This optimization can save a huge amount of backtracking in patterns with - nested unlimited repeats that aren't going to match. Writing separate code - for cased/caseless versions makes it go faster, as does using an - autoincrement and backing off on a match. - - HOWEVER: when the subject string is very, very long, searching to its end - can take a long time, and give bad performance on quite ordinary patterns. 
- This showed up when somebody was matching something like /^\d+C/ on a - 32-megabyte string... so we don't do this when the string is sufficiently - long. */ - - if (has_req_char && end_subject - start_match < REQ_BYTE_MAX) - { - register PCRE_PUCHAR p = start_match + (has_first_char? 1:0); - - /* We don't need to repeat the search if we haven't yet reached the - place we found it at last time. */ - - if (p > req_char_ptr) - { - if (req_char != req_char2) - { - while (p < end_subject) - { - register pcre_uint32 pp = UCHAR21INCTEST(p); - if (pp == req_char || pp == req_char2) { p--; break; } - } - } - else - { - while (p < end_subject) - { - if (UCHAR21INCTEST(p) == req_char) { p--; break; } - } - } - - /* If we can't find the required character, break the matching loop, - forcing a match failure. */ - - if (p >= end_subject) - { - rc = MATCH_NOMATCH; - break; - } - - /* If we have found the required character, save the point where we - found it, so that we don't search again next time round the loop if - the start hasn't passed this character yet. */ - - req_char_ptr = p; - } - } - } - -#ifdef PCRE_DEBUG /* Sigh. Some compilers never learn. */ - printf(">>>> Match against: "); - pchars(start_match, end_subject - start_match, TRUE, md); - printf("\n"); -#endif - - /* OK, we can now run the match. If "hitend" is set afterwards, remember the - first starting point for which a partial match was found. */ - - md->start_match_ptr = start_match; - md->start_used_ptr = start_match; - md->match_call_count = 0; - md->match_function_type = 0; - md->end_offset_top = 0; - md->skip_arg_count = 0; - rc = match(start_match, md->start_code, start_match, 2, md, NULL, 0); - if (md->hitend && start_partial == NULL) - { - start_partial = md->start_used_ptr; - match_partial = start_match; - } - - switch(rc) - { - /* If MATCH_SKIP_ARG reaches this level it means that a MARK that matched - the SKIP's arg was not found. In this circumstance, Perl ignores the SKIP - entirely. 
The only way we can do that is to re-do the match at the same - point, with a flag to force SKIP with an argument to be ignored. Just - treating this case as NOMATCH does not work because it does not check other - alternatives in patterns such as A(*SKIP:A)B|AC when the subject is AC. */ - - case MATCH_SKIP_ARG: - new_start_match = start_match; - md->ignore_skip_arg = md->skip_arg_count; - break; - - /* SKIP passes back the next starting point explicitly, but if it is no - greater than the match we have just done, treat it as NOMATCH. */ - - case MATCH_SKIP: - if (md->start_match_ptr > start_match) - { - new_start_match = md->start_match_ptr; - break; - } - /* Fall through */ - - /* NOMATCH and PRUNE advance by one character. THEN at this level acts - exactly like PRUNE. Unset ignore SKIP-with-argument. */ - - case MATCH_NOMATCH: - case MATCH_PRUNE: - case MATCH_THEN: - md->ignore_skip_arg = 0; - new_start_match = start_match + 1; -#ifdef SUPPORT_UTF - if (utf) - ACROSSCHAR(new_start_match < end_subject, *new_start_match, - new_start_match++); -#endif - break; - - /* COMMIT disables the bumpalong, but otherwise behaves as NOMATCH. */ - - case MATCH_COMMIT: - rc = MATCH_NOMATCH; - goto ENDLOOP; - - /* Any other return is either a match, or some kind of error. */ - - default: - goto ENDLOOP; - } - - /* Control reaches here for the various types of "no match at this point" - result. Reset the code to MATCH_NOMATCH for subsequent checking. */ - - rc = MATCH_NOMATCH; - - /* If PCRE_FIRSTLINE is set, the match must happen before or at the first - newline in the subject (though it may continue over the newline). Therefore, - if we have just failed to match, starting at a newline, do not continue. */ - - if (firstline && IS_NEWLINE(start_match)) break; - - /* Advance to new matching position */ - - start_match = new_start_match; - - /* Break the loop if the pattern is anchored or if we have passed the end of - the subject. 
*/ - - if (anchored || start_match > end_subject) break; - - /* If we have just passed a CR and we are now at a LF, and the pattern does - not contain any explicit matches for \r or \n, and the newline option is CRLF - or ANY or ANYCRLF, advance the match position by one more character. In - normal matching start_match will aways be greater than the first position at - this stage, but a failed *SKIP can cause a return at the same point, which is - why the first test exists. */ - - if (start_match > (PCRE_PUCHAR)subject + start_offset && - start_match[-1] == CHAR_CR && - start_match < end_subject && - *start_match == CHAR_NL && - (re->flags & PCRE_HASCRORLF) == 0 && - (md->nltype == NLTYPE_ANY || - md->nltype == NLTYPE_ANYCRLF || - md->nllen == 2)) - start_match++; - - md->mark = NULL; /* Reset for start of next match attempt */ - } /* End of for(;;) "bumpalong" loop */ - -/* ==========================================================================*/ - -/* We reach here when rc is not MATCH_NOMATCH, or if one of the stopping -conditions is true: - -(1) The pattern is anchored or the match was failed by (*COMMIT); - -(2) We are past the end of the subject; - -(3) PCRE_FIRSTLINE is set and we have failed to match at a newline, because - this option requests that a match occur at or before the first newline in - the subject. - -When we have a match and the offset vector is big enough to deal with any -backreferences, captured substring offsets will already be set up. In the case -where we had to get some local store to hold offsets for backreference -processing, copy those that we can. In this case there need not be overflow if -certain parts of the pattern were not used, even though there are more -capturing parentheses than vector slots. 
*/ - -ENDLOOP: - -if (rc == MATCH_MATCH || rc == MATCH_ACCEPT) - { - if (using_temporary_offsets) - { - if (arg_offset_max >= 4) - { - memcpy(offsets + 2, md->offset_vector + 2, - (arg_offset_max - 2) * sizeof(int)); - DPRINTF(("Copied offsets from temporary memory\n")); - } - if (md->end_offset_top > arg_offset_max) md->capture_last |= OVFLBIT; - DPRINTF(("Freeing temporary memory\n")); - (PUBL(free))(md->offset_vector); - } - - /* Set the return code to the number of captured strings, or 0 if there were - too many to fit into the vector. */ - - rc = ((md->capture_last & OVFLBIT) != 0 && - md->end_offset_top >= arg_offset_max)? - 0 : md->end_offset_top/2; - - /* If there is space in the offset vector, set any unused pairs at the end of - the pattern to -1 for backwards compatibility. It is documented that this - happens. In earlier versions, the whole set of potential capturing offsets - was set to -1 each time round the loop, but this is handled differently now. - "Gaps" are set to -1 dynamically instead (this fixes a bug). Thus, it is only - those at the end that need unsetting here. We can't just unset them all at - the start of the whole thing because they may get set in one branch that is - not the final matching branch. */ - - if (md->end_offset_top/2 <= re->top_bracket && offsets != NULL) - { - register int *iptr, *iend; - int resetcount = 2 + re->top_bracket * 2; - if (resetcount > offsetcount) resetcount = offsetcount; - iptr = offsets + md->end_offset_top; - iend = offsets + resetcount; - while (iptr < iend) *iptr++ = -1; - } - - /* If there is space, set up the whole thing as substring 0. The value of - md->start_match_ptr might be modified if \K was encountered on the success - matching path. 
*/ - - if (offsetcount < 2) rc = 0; else - { - offsets[0] = (int)(md->start_match_ptr - md->start_subject); - offsets[1] = (int)(md->end_match_ptr - md->start_subject); - } - - /* Return MARK data if requested */ - - if (extra_data != NULL && (extra_data->flags & PCRE_EXTRA_MARK) != 0) - *(extra_data->mark) = (pcre_uchar *)md->mark; - DPRINTF((">>>> returning %d\n", rc)); -#ifdef NO_RECURSE - release_match_heapframes(&frame_zero); -#endif - return rc; - } - -/* Control gets here if there has been an error, or if the overall match -attempt has failed at all permitted starting positions. */ - -if (using_temporary_offsets) - { - DPRINTF(("Freeing temporary memory\n")); - (PUBL(free))(md->offset_vector); - } - -/* For anything other than nomatch or partial match, just return the code. */ - -if (rc != MATCH_NOMATCH && rc != PCRE_ERROR_PARTIAL) - { - DPRINTF((">>>> error: returning %d\n", rc)); -#ifdef NO_RECURSE - release_match_heapframes(&frame_zero); -#endif - return rc; - } - -/* Handle partial matches - disable any mark data */ - -if (match_partial != NULL) - { - DPRINTF((">>>> returning PCRE_ERROR_PARTIAL\n")); - md->mark = NULL; - if (offsetcount > 1) - { - offsets[0] = (int)(start_partial - (PCRE_PUCHAR)subject); - offsets[1] = (int)(end_subject - (PCRE_PUCHAR)subject); - if (offsetcount > 2) - offsets[2] = (int)(match_partial - (PCRE_PUCHAR)subject); - } - rc = PCRE_ERROR_PARTIAL; - } - -/* This is the classic nomatch case */ - -else - { - DPRINTF((">>>> returning PCRE_ERROR_NOMATCH\n")); - rc = PCRE_ERROR_NOMATCH; - } - -/* Return the MARK data if it has been requested. 
*/ - -if (extra_data != NULL && (extra_data->flags & PCRE_EXTRA_MARK) != 0) - *(extra_data->mark) = (pcre_uchar *)md->nomatch_mark; -#ifdef NO_RECURSE - release_match_heapframes(&frame_zero); -#endif -return rc; -} - -/* End of pcre_exec.c */ diff --git a/src/pcre/pcre_fullinfo.c b/src/pcre/pcre_fullinfo.c deleted file mode 100644 index a6c2ece6..00000000 --- a/src/pcre/pcre_fullinfo.c +++ /dev/null @@ -1,245 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2013 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains the external function pcre_fullinfo(), which returns -information about a compiled pattern. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - - -/************************************************* -* Return info about compiled pattern * -*************************************************/ - -/* This is a newer "info" function which has an extensible interface so -that additional items can be added compatibly. 
- -Arguments: - argument_re points to compiled code - extra_data points extra data, or NULL - what what information is required - where where to put the information - -Returns: 0 if data returned, negative on error -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_fullinfo(const pcre *argument_re, const pcre_extra *extra_data, - int what, void *where) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_fullinfo(const pcre16 *argument_re, const pcre16_extra *extra_data, - int what, void *where) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_fullinfo(const pcre32 *argument_re, const pcre32_extra *extra_data, - int what, void *where) -#endif -{ -const REAL_PCRE *re = (const REAL_PCRE *)argument_re; -const pcre_study_data *study = NULL; - -if (re == NULL || where == NULL) return PCRE_ERROR_NULL; - -if (extra_data != NULL && (extra_data->flags & PCRE_EXTRA_STUDY_DATA) != 0) - study = (const pcre_study_data *)extra_data->study_data; - -/* Check that the first field in the block is the magic number. If it is not, -return with PCRE_ERROR_BADMAGIC. However, if the magic number is equal to -REVERSED_MAGIC_NUMBER we return with PCRE_ERROR_BADENDIANNESS, which -means that the pattern is likely compiled with different endianness. */ - -if (re->magic_number != MAGIC_NUMBER) - return re->magic_number == REVERSED_MAGIC_NUMBER? - PCRE_ERROR_BADENDIANNESS:PCRE_ERROR_BADMAGIC; - -/* Check that this pattern was compiled in the correct bit mode */ - -if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE; - -switch (what) - { - case PCRE_INFO_OPTIONS: - *((unsigned long int *)where) = re->options & PUBLIC_COMPILE_OPTIONS; - break; - - case PCRE_INFO_SIZE: - *((size_t *)where) = re->size; - break; - - case PCRE_INFO_STUDYSIZE: - *((size_t *)where) = (study == NULL)? 
0 : study->size; - break; - - case PCRE_INFO_JITSIZE: -#ifdef SUPPORT_JIT - *((size_t *)where) = - (extra_data != NULL && - (extra_data->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 && - extra_data->executable_jit != NULL)? - PRIV(jit_get_size)(extra_data->executable_jit) : 0; -#else - *((size_t *)where) = 0; -#endif - break; - - case PCRE_INFO_CAPTURECOUNT: - *((int *)where) = re->top_bracket; - break; - - case PCRE_INFO_BACKREFMAX: - *((int *)where) = re->top_backref; - break; - - case PCRE_INFO_FIRSTBYTE: - *((int *)where) = - ((re->flags & PCRE_FIRSTSET) != 0)? (int)re->first_char : - ((re->flags & PCRE_STARTLINE) != 0)? -1 : -2; - break; - - case PCRE_INFO_FIRSTCHARACTER: - *((pcre_uint32 *)where) = - (re->flags & PCRE_FIRSTSET) != 0 ? re->first_char : 0; - break; - - case PCRE_INFO_FIRSTCHARACTERFLAGS: - *((int *)where) = - ((re->flags & PCRE_FIRSTSET) != 0) ? 1 : - ((re->flags & PCRE_STARTLINE) != 0) ? 2 : 0; - break; - - /* Make sure we pass back the pointer to the bit vector in the external - block, not the internal copy (with flipped integer fields). */ - - case PCRE_INFO_FIRSTTABLE: - *((const pcre_uint8 **)where) = - (study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0)? - ((const pcre_study_data *)extra_data->study_data)->start_bits : NULL; - break; - - case PCRE_INFO_MINLENGTH: - *((int *)where) = - (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0)? - (int)(study->minlength) : -1; - break; - - case PCRE_INFO_JIT: - *((int *)where) = extra_data != NULL && - (extra_data->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 && - extra_data->executable_jit != NULL; - break; - - case PCRE_INFO_LASTLITERAL: - *((int *)where) = - ((re->flags & PCRE_REQCHSET) != 0)? (int)re->req_char : -1; - break; - - case PCRE_INFO_REQUIREDCHAR: - *((pcre_uint32 *)where) = - ((re->flags & PCRE_REQCHSET) != 0) ? 
re->req_char : 0; - break; - - case PCRE_INFO_REQUIREDCHARFLAGS: - *((int *)where) = - ((re->flags & PCRE_REQCHSET) != 0); - break; - - case PCRE_INFO_NAMEENTRYSIZE: - *((int *)where) = re->name_entry_size; - break; - - case PCRE_INFO_NAMECOUNT: - *((int *)where) = re->name_count; - break; - - case PCRE_INFO_NAMETABLE: - *((const pcre_uchar **)where) = (const pcre_uchar *)re + re->name_table_offset; - break; - - case PCRE_INFO_DEFAULT_TABLES: - *((const pcre_uint8 **)where) = (const pcre_uint8 *)(PRIV(default_tables)); - break; - - /* From release 8.00 this will always return TRUE because NOPARTIAL is - no longer ever set (the restrictions have been removed). */ - - case PCRE_INFO_OKPARTIAL: - *((int *)where) = (re->flags & PCRE_NOPARTIAL) == 0; - break; - - case PCRE_INFO_JCHANGED: - *((int *)where) = (re->flags & PCRE_JCHANGED) != 0; - break; - - case PCRE_INFO_HASCRORLF: - *((int *)where) = (re->flags & PCRE_HASCRORLF) != 0; - break; - - case PCRE_INFO_MAXLOOKBEHIND: - *((int *)where) = re->max_lookbehind; - break; - - case PCRE_INFO_MATCHLIMIT: - if ((re->flags & PCRE_MLSET) == 0) return PCRE_ERROR_UNSET; - *((pcre_uint32 *)where) = re->limit_match; - break; - - case PCRE_INFO_RECURSIONLIMIT: - if ((re->flags & PCRE_RLSET) == 0) return PCRE_ERROR_UNSET; - *((pcre_uint32 *)where) = re->limit_recursion; - break; - - case PCRE_INFO_MATCH_EMPTY: - *((int *)where) = (re->flags & PCRE_MATCH_EMPTY) != 0; - break; - - default: return PCRE_ERROR_BADOPTION; - } - -return 0; -} - -/* End of pcre_fullinfo.c */ diff --git a/src/pcre/pcre_get.c b/src/pcre/pcre_get.c deleted file mode 100644 index 9475d5e8..00000000 --- a/src/pcre/pcre_get.c +++ /dev/null @@ -1,669 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains some convenience functions for extracting substrings -from the subject string after a regex match has succeeded. The original idea -for these functions came from Scott Wimer. 
*/ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - - -/************************************************* -* Find number for named string * -*************************************************/ - -/* This function is used by the get_first_set() function below, as well -as being generally available. It assumes that names are unique. - -Arguments: - code the compiled regex - stringname the name whose number is required - -Returns: the number of the named parentheses, or a negative number - (PCRE_ERROR_NOSUBSTRING) if not found -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_get_stringnumber(const pcre *code, const char *stringname) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_get_stringnumber(const pcre16 *code, PCRE_SPTR16 stringname) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_get_stringnumber(const pcre32 *code, PCRE_SPTR32 stringname) -#endif -{ -int rc; -int entrysize; -int top, bot; -pcre_uchar *nametable; - -#ifdef COMPILE_PCRE8 -if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0) - return rc; -if (top <= 0) return PCRE_ERROR_NOSUBSTRING; - -if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0) - return rc; -if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0) - return rc; -#endif -#ifdef COMPILE_PCRE16 -if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0) - return rc; -if (top <= 0) return PCRE_ERROR_NOSUBSTRING; - -if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0) - return rc; -if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0) - return rc; -#endif -#ifdef COMPILE_PCRE32 -if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0) - return rc; -if (top <= 0) return PCRE_ERROR_NOSUBSTRING; - -if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0) 
- return rc; -if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0) - return rc; -#endif - -bot = 0; -while (top > bot) - { - int mid = (top + bot) / 2; - pcre_uchar *entry = nametable + entrysize*mid; - int c = STRCMP_UC_UC((pcre_uchar *)stringname, - (pcre_uchar *)(entry + IMM2_SIZE)); - if (c == 0) return GET2(entry, 0); - if (c > 0) bot = mid + 1; else top = mid; - } - -return PCRE_ERROR_NOSUBSTRING; -} - - - -/************************************************* -* Find (multiple) entries for named string * -*************************************************/ - -/* This is used by the get_first_set() function below, as well as being -generally available. It is used when duplicated names are permitted. - -Arguments: - code the compiled regex - stringname the name whose entries required - firstptr where to put the pointer to the first entry - lastptr where to put the pointer to the last entry - -Returns: the length of each entry, or a negative number - (PCRE_ERROR_NOSUBSTRING) if not found -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_get_stringtable_entries(const pcre *code, const char *stringname, - char **firstptr, char **lastptr) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_get_stringtable_entries(const pcre16 *code, PCRE_SPTR16 stringname, - PCRE_UCHAR16 **firstptr, PCRE_UCHAR16 **lastptr) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_get_stringtable_entries(const pcre32 *code, PCRE_SPTR32 stringname, - PCRE_UCHAR32 **firstptr, PCRE_UCHAR32 **lastptr) -#endif -{ -int rc; -int entrysize; -int top, bot; -pcre_uchar *nametable, *lastentry; - -#ifdef COMPILE_PCRE8 -if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0) - return rc; -if (top <= 0) return PCRE_ERROR_NOSUBSTRING; - -if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0) - return rc; -if ((rc = pcre_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, 
&nametable)) != 0) - return rc; -#endif -#ifdef COMPILE_PCRE16 -if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0) - return rc; -if (top <= 0) return PCRE_ERROR_NOSUBSTRING; - -if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0) - return rc; -if ((rc = pcre16_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0) - return rc; -#endif -#ifdef COMPILE_PCRE32 -if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMECOUNT, &top)) != 0) - return rc; -if (top <= 0) return PCRE_ERROR_NOSUBSTRING; - -if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMEENTRYSIZE, &entrysize)) != 0) - return rc; -if ((rc = pcre32_fullinfo(code, NULL, PCRE_INFO_NAMETABLE, &nametable)) != 0) - return rc; -#endif - -lastentry = nametable + entrysize * (top - 1); -bot = 0; -while (top > bot) - { - int mid = (top + bot) / 2; - pcre_uchar *entry = nametable + entrysize*mid; - int c = STRCMP_UC_UC((pcre_uchar *)stringname, - (pcre_uchar *)(entry + IMM2_SIZE)); - if (c == 0) - { - pcre_uchar *first = entry; - pcre_uchar *last = entry; - while (first > nametable) - { - if (STRCMP_UC_UC((pcre_uchar *)stringname, - (pcre_uchar *)(first - entrysize + IMM2_SIZE)) != 0) break; - first -= entrysize; - } - while (last < lastentry) - { - if (STRCMP_UC_UC((pcre_uchar *)stringname, - (pcre_uchar *)(last + entrysize + IMM2_SIZE)) != 0) break; - last += entrysize; - } -#if defined COMPILE_PCRE8 - *firstptr = (char *)first; - *lastptr = (char *)last; -#elif defined COMPILE_PCRE16 - *firstptr = (PCRE_UCHAR16 *)first; - *lastptr = (PCRE_UCHAR16 *)last; -#elif defined COMPILE_PCRE32 - *firstptr = (PCRE_UCHAR32 *)first; - *lastptr = (PCRE_UCHAR32 *)last; -#endif - return entrysize; - } - if (c > 0) bot = mid + 1; else top = mid; - } - -return PCRE_ERROR_NOSUBSTRING; -} - - - -/************************************************* -* Find first set of multiple named strings * -*************************************************/ - -/* This function allows for duplicate 
names in the table of named substrings. -It returns the number of the first one that was set in a pattern match. - -Arguments: - code the compiled regex - stringname the name of the capturing substring - ovector the vector of matched substrings - stringcount number of captured substrings - -Returns: the number of the first that is set, - or the number of the last one if none are set, - or a negative number on error -*/ - -#if defined COMPILE_PCRE8 -static int -get_first_set(const pcre *code, const char *stringname, int *ovector, - int stringcount) -#elif defined COMPILE_PCRE16 -static int -get_first_set(const pcre16 *code, PCRE_SPTR16 stringname, int *ovector, - int stringcount) -#elif defined COMPILE_PCRE32 -static int -get_first_set(const pcre32 *code, PCRE_SPTR32 stringname, int *ovector, - int stringcount) -#endif -{ -const REAL_PCRE *re = (const REAL_PCRE *)code; -int entrysize; -pcre_uchar *entry; -#if defined COMPILE_PCRE8 -char *first, *last; -#elif defined COMPILE_PCRE16 -PCRE_UCHAR16 *first, *last; -#elif defined COMPILE_PCRE32 -PCRE_UCHAR32 *first, *last; -#endif - -#if defined COMPILE_PCRE8 -if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0) - return pcre_get_stringnumber(code, stringname); -entrysize = pcre_get_stringtable_entries(code, stringname, &first, &last); -#elif defined COMPILE_PCRE16 -if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0) - return pcre16_get_stringnumber(code, stringname); -entrysize = pcre16_get_stringtable_entries(code, stringname, &first, &last); -#elif defined COMPILE_PCRE32 -if ((re->options & PCRE_DUPNAMES) == 0 && (re->flags & PCRE_JCHANGED) == 0) - return pcre32_get_stringnumber(code, stringname); -entrysize = pcre32_get_stringtable_entries(code, stringname, &first, &last); -#endif -if (entrysize <= 0) return entrysize; -for (entry = (pcre_uchar *)first; entry <= (pcre_uchar *)last; entry += entrysize) - { - int n = GET2(entry, 0); - if (n < stringcount && ovector[n*2] >= 
0) return n; - } -return GET2(entry, 0); -} - - - - -/************************************************* -* Copy captured string to given buffer * -*************************************************/ - -/* This function copies a single captured substring into a given buffer. -Note that we use memcpy() rather than strncpy() in case there are binary zeros -in the string. - -Arguments: - subject the subject string that was matched - ovector pointer to the offsets table - stringcount the number of substrings that were captured - (i.e. the yield of the pcre_exec call, unless - that was zero, in which case it should be 1/3 - of the offset table size) - stringnumber the number of the required substring - buffer where to put the substring - size the size of the buffer - -Returns: if successful: - the length of the copied string, not including the zero - that is put on the end; can be zero - if not successful: - PCRE_ERROR_NOMEMORY (-6) buffer too small - PCRE_ERROR_NOSUBSTRING (-7) no such captured substring -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_copy_substring(const char *subject, int *ovector, int stringcount, - int stringnumber, char *buffer, int size) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_copy_substring(PCRE_SPTR16 subject, int *ovector, int stringcount, - int stringnumber, PCRE_UCHAR16 *buffer, int size) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_copy_substring(PCRE_SPTR32 subject, int *ovector, int stringcount, - int stringnumber, PCRE_UCHAR32 *buffer, int size) -#endif -{ -int yield; -if (stringnumber < 0 || stringnumber >= stringcount) - return PCRE_ERROR_NOSUBSTRING; -stringnumber *= 2; -yield = ovector[stringnumber+1] - ovector[stringnumber]; -if (size < yield + 1) return PCRE_ERROR_NOMEMORY; -memcpy(buffer, subject + ovector[stringnumber], IN_UCHARS(yield)); -buffer[yield] = 0; -return yield; -} - - - 
-/************************************************* -* Copy named captured string to given buffer * -*************************************************/ - -/* This function copies a single captured substring into a given buffer, -identifying it by name. If the regex permits duplicate names, the first -substring that is set is chosen. - -Arguments: - code the compiled regex - subject the subject string that was matched - ovector pointer to the offsets table - stringcount the number of substrings that were captured - (i.e. the yield of the pcre_exec call, unless - that was zero, in which case it should be 1/3 - of the offset table size) - stringname the name of the required substring - buffer where to put the substring - size the size of the buffer - -Returns: if successful: - the length of the copied string, not including the zero - that is put on the end; can be zero - if not successful: - PCRE_ERROR_NOMEMORY (-6) buffer too small - PCRE_ERROR_NOSUBSTRING (-7) no such captured substring -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_copy_named_substring(const pcre *code, const char *subject, - int *ovector, int stringcount, const char *stringname, - char *buffer, int size) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_copy_named_substring(const pcre16 *code, PCRE_SPTR16 subject, - int *ovector, int stringcount, PCRE_SPTR16 stringname, - PCRE_UCHAR16 *buffer, int size) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_copy_named_substring(const pcre32 *code, PCRE_SPTR32 subject, - int *ovector, int stringcount, PCRE_SPTR32 stringname, - PCRE_UCHAR32 *buffer, int size) -#endif -{ -int n = get_first_set(code, stringname, ovector, stringcount); -if (n <= 0) return n; -#if defined COMPILE_PCRE8 -return pcre_copy_substring(subject, ovector, stringcount, n, buffer, size); -#elif defined COMPILE_PCRE16 -return pcre16_copy_substring(subject, ovector, stringcount, n, buffer, size); 
-#elif defined COMPILE_PCRE32 -return pcre32_copy_substring(subject, ovector, stringcount, n, buffer, size); -#endif -} - - - -/************************************************* -* Copy all captured strings to new store * -*************************************************/ - -/* This function gets one chunk of store and builds a list of pointers and all -of the captured substrings in it. A NULL pointer is put on the end of the list. - -Arguments: - subject the subject string that was matched - ovector pointer to the offsets table - stringcount the number of substrings that were captured - (i.e. the yield of the pcre_exec call, unless - that was zero, in which case it should be 1/3 - of the offset table size) - listptr set to point to the list of pointers - -Returns: if successful: 0 - if not successful: - PCRE_ERROR_NOMEMORY (-6) failed to get store -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_get_substring_list(const char *subject, int *ovector, int stringcount, - const char ***listptr) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_get_substring_list(PCRE_SPTR16 subject, int *ovector, int stringcount, - PCRE_SPTR16 **listptr) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_get_substring_list(PCRE_SPTR32 subject, int *ovector, int stringcount, - PCRE_SPTR32 **listptr) -#endif -{ -int i; -int size = sizeof(pcre_uchar *); -int double_count = stringcount * 2; -pcre_uchar **stringlist; -pcre_uchar *p; - -for (i = 0; i < double_count; i += 2) - { - size += sizeof(pcre_uchar *) + IN_UCHARS(1); - if (ovector[i+1] > ovector[i]) size += IN_UCHARS(ovector[i+1] - ovector[i]); - } - -stringlist = (pcre_uchar **)(PUBL(malloc))(size); -if (stringlist == NULL) return PCRE_ERROR_NOMEMORY; - -#if defined COMPILE_PCRE8 -*listptr = (const char **)stringlist; -#elif defined COMPILE_PCRE16 -*listptr = (PCRE_SPTR16 *)stringlist; -#elif defined COMPILE_PCRE32 -*listptr = (PCRE_SPTR32 
*)stringlist; -#endif -p = (pcre_uchar *)(stringlist + stringcount + 1); - -for (i = 0; i < double_count; i += 2) - { - int len = (ovector[i+1] > ovector[i])? (ovector[i+1] - ovector[i]) : 0; - memcpy(p, subject + ovector[i], IN_UCHARS(len)); - *stringlist++ = p; - p += len; - *p++ = 0; - } - -*stringlist = NULL; -return 0; -} - - - -/************************************************* -* Free store obtained by get_substring_list * -*************************************************/ - -/* This function exists for the benefit of people calling PCRE from non-C -programs that can call its functions, but not free() or (PUBL(free))() -directly. - -Argument: the result of a previous pcre_get_substring_list() -Returns: nothing -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN void PCRE_CALL_CONVENTION -pcre_free_substring_list(const char **pointer) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN void PCRE_CALL_CONVENTION -pcre16_free_substring_list(PCRE_SPTR16 *pointer) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN void PCRE_CALL_CONVENTION -pcre32_free_substring_list(PCRE_SPTR32 *pointer) -#endif -{ -(PUBL(free))((void *)pointer); -} - - - -/************************************************* -* Copy captured string to new store * -*************************************************/ - -/* This function copies a single captured substring into a piece of new -store - -Arguments: - subject the subject string that was matched - ovector pointer to the offsets table - stringcount the number of substrings that were captured - (i.e. 
the yield of the pcre_exec call, unless - that was zero, in which case it should be 1/3 - of the offset table size) - stringnumber the number of the required substring - stringptr where to put a pointer to the substring - -Returns: if successful: - the length of the string, not including the zero that - is put on the end; can be zero - if not successful: - PCRE_ERROR_NOMEMORY (-6) failed to get store - PCRE_ERROR_NOSUBSTRING (-7) substring not present -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_get_substring(const char *subject, int *ovector, int stringcount, - int stringnumber, const char **stringptr) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_get_substring(PCRE_SPTR16 subject, int *ovector, int stringcount, - int stringnumber, PCRE_SPTR16 *stringptr) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_get_substring(PCRE_SPTR32 subject, int *ovector, int stringcount, - int stringnumber, PCRE_SPTR32 *stringptr) -#endif -{ -int yield; -pcre_uchar *substring; -if (stringnumber < 0 || stringnumber >= stringcount) - return PCRE_ERROR_NOSUBSTRING; -stringnumber *= 2; -yield = ovector[stringnumber+1] - ovector[stringnumber]; -substring = (pcre_uchar *)(PUBL(malloc))(IN_UCHARS(yield + 1)); -if (substring == NULL) return PCRE_ERROR_NOMEMORY; -memcpy(substring, subject + ovector[stringnumber], IN_UCHARS(yield)); -substring[yield] = 0; -#if defined COMPILE_PCRE8 -*stringptr = (const char *)substring; -#elif defined COMPILE_PCRE16 -*stringptr = (PCRE_SPTR16)substring; -#elif defined COMPILE_PCRE32 -*stringptr = (PCRE_SPTR32)substring; -#endif -return yield; -} - - - -/************************************************* -* Copy named captured string to new store * -*************************************************/ - -/* This function copies a single captured substring, identified by name, into -new store. 
If the regex permits duplicate names, the first substring that is -set is chosen. - -Arguments: - code the compiled regex - subject the subject string that was matched - ovector pointer to the offsets table - stringcount the number of substrings that were captured - (i.e. the yield of the pcre_exec call, unless - that was zero, in which case it should be 1/3 - of the offset table size) - stringname the name of the required substring - stringptr where to put the pointer - -Returns: if successful: - the length of the copied string, not including the zero - that is put on the end; can be zero - if not successful: - PCRE_ERROR_NOMEMORY (-6) couldn't get memory - PCRE_ERROR_NOSUBSTRING (-7) no such captured substring -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_get_named_substring(const pcre *code, const char *subject, - int *ovector, int stringcount, const char *stringname, - const char **stringptr) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_get_named_substring(const pcre16 *code, PCRE_SPTR16 subject, - int *ovector, int stringcount, PCRE_SPTR16 stringname, - PCRE_SPTR16 *stringptr) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_get_named_substring(const pcre32 *code, PCRE_SPTR32 subject, - int *ovector, int stringcount, PCRE_SPTR32 stringname, - PCRE_SPTR32 *stringptr) -#endif -{ -int n = get_first_set(code, stringname, ovector, stringcount); -if (n <= 0) return n; -#if defined COMPILE_PCRE8 -return pcre_get_substring(subject, ovector, stringcount, n, stringptr); -#elif defined COMPILE_PCRE16 -return pcre16_get_substring(subject, ovector, stringcount, n, stringptr); -#elif defined COMPILE_PCRE32 -return pcre32_get_substring(subject, ovector, stringcount, n, stringptr); -#endif -} - - - - -/************************************************* -* Free store obtained by get_substring * -*************************************************/ - -/* This function exists for the 
benefit of people calling PCRE from non-C -programs that can call its functions, but not free() or (PUBL(free))() -directly. - -Argument: the result of a previous pcre_get_substring() -Returns: nothing -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN void PCRE_CALL_CONVENTION -pcre_free_substring(const char *pointer) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN void PCRE_CALL_CONVENTION -pcre16_free_substring(PCRE_SPTR16 pointer) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN void PCRE_CALL_CONVENTION -pcre32_free_substring(PCRE_SPTR32 pointer) -#endif -{ -(PUBL(free))((void *)pointer); -} - -/* End of pcre_get.c */ diff --git a/src/pcre/pcre_globals.c b/src/pcre/pcre_globals.c deleted file mode 100644 index 0f106aa9..00000000 --- a/src/pcre/pcre_globals.c +++ /dev/null @@ -1,86 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2014 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains global variables that are exported by the PCRE library. -PCRE is thread-clean and doesn't use any global variables in the normal sense. -However, it calls memory allocation and freeing functions via the four -indirections below, and it can optionally do callouts, using the fifth -indirection. These values can be changed by the caller, but are shared between -all threads. - -For MS Visual Studio and Symbian OS, there are problems in initializing these -variables to non-local functions. In these cases, therefore, an indirection via -a local function is used. - -Also, when compiling for Virtual Pascal, things are done differently, and -global variables are not used. 
*/ - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - -#if defined _MSC_VER || defined __SYMBIAN32__ -static void* LocalPcreMalloc(size_t aSize) - { - return malloc(aSize); - } -static void LocalPcreFree(void* aPtr) - { - free(aPtr); - } -PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = LocalPcreMalloc; -PCRE_EXP_DATA_DEFN void (*PUBL(free))(void *) = LocalPcreFree; -PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = LocalPcreMalloc; -PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = LocalPcreFree; -PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL; -PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL; - -#elif !defined VPCOMPAT -PCRE_EXP_DATA_DEFN void *(*PUBL(malloc))(size_t) = malloc; -PCRE_EXP_DATA_DEFN void (*PUBL(free))(void *) = free; -PCRE_EXP_DATA_DEFN void *(*PUBL(stack_malloc))(size_t) = malloc; -PCRE_EXP_DATA_DEFN void (*PUBL(stack_free))(void *) = free; -PCRE_EXP_DATA_DEFN int (*PUBL(callout))(PUBL(callout_block) *) = NULL; -PCRE_EXP_DATA_DEFN int (*PUBL(stack_guard))(void) = NULL; -#endif - -/* End of pcre_globals.c */ diff --git a/src/pcre/pcre_internal.h b/src/pcre/pcre_internal.h deleted file mode 100644 index 97ff55d0..00000000 --- a/src/pcre/pcre_internal.h +++ /dev/null @@ -1,2807 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2016 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* This header contains definitions that are shared between the different -modules, but which are not relevant to the exported API. This includes some -functions whose names all begin with "_pcre_", "_pcre16_" or "_pcre32_" -depending on the PRIV macro. 
*/ - -#ifndef PCRE_INTERNAL_H -#define PCRE_INTERNAL_H - -/* Define PCRE_DEBUG to get debugging output on stdout. */ - -#if 0 -#define PCRE_DEBUG -#endif - -/* PCRE is compiled as an 8 bit library if it is not requested otherwise. */ - -#if !defined COMPILE_PCRE16 && !defined COMPILE_PCRE32 -#define COMPILE_PCRE8 -#endif - -/* If SUPPORT_UCP is defined, SUPPORT_UTF must also be defined. The -"configure" script ensures this, but not everybody uses "configure". */ - -#if defined SUPPORT_UCP && !(defined SUPPORT_UTF) -#define SUPPORT_UTF 1 -#endif - -/* We define SUPPORT_UTF if SUPPORT_UTF8 is enabled for compatibility -reasons with existing code. */ - -#if defined SUPPORT_UTF8 && !(defined SUPPORT_UTF) -#define SUPPORT_UTF 1 -#endif - -/* Fixme: SUPPORT_UTF8 should be eventually disappear from the code. -Until then we define it if SUPPORT_UTF is defined. */ - -#if defined SUPPORT_UTF && !(defined SUPPORT_UTF8) -#define SUPPORT_UTF8 1 -#endif - -/* We do not support both EBCDIC and UTF-8/16/32 at the same time. The "configure" -script prevents both being selected, but not everybody uses "configure". */ - -#if defined EBCDIC && defined SUPPORT_UTF -#error The use of both EBCDIC and SUPPORT_UTF is not supported. -#endif - -/* Use a macro for debugging printing, 'cause that eliminates the use of #ifdef -inline, and there are *still* stupid compilers about that don't like indented -pre-processor statements, or at least there were when I first wrote this. After -all, it had only been about 10 years then... - -It turns out that the Mac Debugging.h header also defines the macro DPRINTF, so -be absolutely sure we get our version. */ - -#undef DPRINTF -#ifdef PCRE_DEBUG -#define DPRINTF(p) printf p -#else -#define DPRINTF(p) /* Nothing */ -#endif - - -/* Standard C headers plus the external interface definition. The only time -setjmp and stdarg are used is when NO_RECURSE is set. 
*/ - -#include <ctype.h> -#include <limits.h> -#include <stddef.h> -#include <stdio.h> -#include <stdlib.h> -#include <string.h> - -/* Valgrind (memcheck) support */ - -#ifdef SUPPORT_VALGRIND -#include <valgrind/memcheck.h> -#endif - -/* When compiling a DLL for Windows, the exported symbols have to be declared -using some MS magic. I found some useful information on this web page: -http://msdn2.microsoft.com/en-us/library/y4h7bcy6(VS.80).aspx. According to the -information there, using __declspec(dllexport) without "extern" we have a -definition; with "extern" we have a declaration. The settings here override the -setting in pcre.h (which is included below); it defines only PCRE_EXP_DECL, -which is all that is needed for applications (they just import the symbols). We -use: - - PCRE_EXP_DECL for declarations - PCRE_EXP_DEFN for definitions of exported functions - PCRE_EXP_DATA_DEFN for definitions of exported variables - -The reason for the two DEFN macros is that in non-Windows environments, one -does not want to have "extern" before variable definitions because it leads to -compiler warnings. So we distinguish between functions and variables. In -Windows, the two should always be the same. - -The reason for wrapping this in #ifndef PCRE_EXP_DECL is so that pcretest, -which is an application, but needs to import this file in order to "peek" at -internals, can #include pcre.h first to get an application's-eye view. - -In principle, people compiling for non-Windows, non-Unix-like (i.e. uncommon, -special-purpose environments) might want to stick other stuff in front of -exported symbols. That's why, in the non-Windows case, we set PCRE_EXP_DEFN and -PCRE_EXP_DATA_DEFN only if they are not already set. 
*/ - -#ifndef PCRE_EXP_DECL -# ifdef _WIN32 -# ifndef PCRE_STATIC -# define PCRE_EXP_DECL extern __declspec(dllexport) -# define PCRE_EXP_DEFN __declspec(dllexport) -# define PCRE_EXP_DATA_DEFN __declspec(dllexport) -# else -# define PCRE_EXP_DECL extern -# define PCRE_EXP_DEFN -# define PCRE_EXP_DATA_DEFN -# endif -# else -# ifdef __cplusplus -# define PCRE_EXP_DECL extern "C" -# else -# define PCRE_EXP_DECL extern -# endif -# ifndef PCRE_EXP_DEFN -# define PCRE_EXP_DEFN PCRE_EXP_DECL -# endif -# ifndef PCRE_EXP_DATA_DEFN -# define PCRE_EXP_DATA_DEFN -# endif -# endif -#endif - -/* When compiling with the MSVC compiler, it is sometimes necessary to include -a "calling convention" before exported function names. (This is secondhand -information; I know nothing about MSVC myself). For example, something like - - void __cdecl function(....) - -might be needed. In order so make this easy, all the exported functions have -PCRE_CALL_CONVENTION just before their names. It is rarely needed; if not -set, we ensure here that it has no effect. */ - -#ifndef PCRE_CALL_CONVENTION -#define PCRE_CALL_CONVENTION -#endif - -/* We need to have types that specify unsigned 8, 16 and 32-bit integers. We -cannot determine these outside the compilation (e.g. by running a program as -part of "configure") because PCRE is often cross-compiled for use on other -systems. Instead we make use of the maximum sizes that are available at -preprocessor time in standard C environments. 
*/ - -typedef unsigned char pcre_uint8; - -#if USHRT_MAX == 65535 -typedef unsigned short pcre_uint16; -typedef short pcre_int16; -#define PCRE_UINT16_MAX USHRT_MAX -#define PCRE_INT16_MAX SHRT_MAX -#elif UINT_MAX == 65535 -typedef unsigned int pcre_uint16; -typedef int pcre_int16; -#define PCRE_UINT16_MAX UINT_MAX -#define PCRE_INT16_MAX INT_MAX -#else -#error Cannot determine a type for 16-bit integers -#endif - -#if UINT_MAX == 4294967295U -typedef unsigned int pcre_uint32; -typedef int pcre_int32; -#define PCRE_UINT32_MAX UINT_MAX -#define PCRE_INT32_MAX INT_MAX -#elif ULONG_MAX == 4294967295UL -typedef unsigned long int pcre_uint32; -typedef long int pcre_int32; -#define PCRE_UINT32_MAX ULONG_MAX -#define PCRE_INT32_MAX LONG_MAX -#else -#error Cannot determine a type for 32-bit integers -#endif - -/* When checking for integer overflow in pcre_compile(), we need to handle -large integers. If a 64-bit integer type is available, we can use that. -Otherwise we have to cast to double, which of course requires floating point -arithmetic. Handle this by defining a macro for the appropriate type. If -stdint.h is available, include it; it may define INT64_MAX. Systems that do not -have stdint.h (e.g. Solaris) may have inttypes.h. The macro int64_t may be set -by "configure". */ - -#if defined HAVE_STDINT_H -#include <stdint.h> -#elif defined HAVE_INTTYPES_H -#include <inttypes.h> -#endif - -#if defined INT64_MAX || defined int64_t -#define INT64_OR_DOUBLE int64_t -#else -#define INT64_OR_DOUBLE double -#endif - -/* All character handling must be done as unsigned characters. Otherwise there -are problems with top-bit-set characters and functions such as isspace(). -However, we leave the interface to the outside world as char * or short *, -because that should make things easier for callers. This character type is -called pcre_uchar. - -The IN_UCHARS macro multiply its argument with the byte size of the current -pcre_uchar type. 
Useful for memcpy and such operations, whose require the -byte size of their input/output buffers. - -The MAX_255 macro checks whether its pcre_uchar input is less than 256. - -The TABLE_GET macro is designed for accessing elements of tables whose contain -exactly 256 items. When the character is able to contain more than 256 -items, some check is needed before accessing these tables. -*/ - -#if defined COMPILE_PCRE8 - -typedef unsigned char pcre_uchar; -#define IN_UCHARS(x) (x) -#define MAX_255(c) 1 -#define TABLE_GET(c, table, default) ((table)[c]) - -#elif defined COMPILE_PCRE16 - -#if USHRT_MAX != 65535 -/* This is a warning message. Change PCRE_UCHAR16 to a 16 bit data type in -pcre.h(.in) and disable (comment out) this message. */ -#error Warning: PCRE_UCHAR16 is not a 16 bit data type. -#endif - -typedef pcre_uint16 pcre_uchar; -#define UCHAR_SHIFT (1) -#define IN_UCHARS(x) ((x) * 2) -#define MAX_255(c) ((c) <= 255u) -#define TABLE_GET(c, table, default) (MAX_255(c)? ((table)[c]):(default)) - -#elif defined COMPILE_PCRE32 - -typedef pcre_uint32 pcre_uchar; -#define UCHAR_SHIFT (2) -#define IN_UCHARS(x) ((x) * 4) -#define MAX_255(c) ((c) <= 255u) -#define TABLE_GET(c, table, default) (MAX_255(c)? ((table)[c]):(default)) - -#else -#error Unsupported compiling mode -#endif /* COMPILE_PCRE[8|16|32] */ - -/* This is an unsigned int value that no character can ever have. UTF-8 -characters only go up to 0x7fffffff (though Unicode doesn't go beyond -0x0010ffff). */ - -#define NOTACHAR 0xffffffff - -/* PCRE is able to support several different kinds of newline (CR, LF, CRLF, -"any" and "anycrlf" at present). The following macros are used to package up -testing for newlines. NLBLOCK, PSSTART, and PSEND are defined in the various -modules to indicate in which datablock the parameters exist, and what the -start/end of string field names are. 
*/ - -#define NLTYPE_FIXED 0 /* Newline is a fixed length string */ -#define NLTYPE_ANY 1 /* Newline is any Unicode line ending */ -#define NLTYPE_ANYCRLF 2 /* Newline is CR, LF, or CRLF */ - -/* This macro checks for a newline at the given position */ - -#define IS_NEWLINE(p) \ - ((NLBLOCK->nltype != NLTYPE_FIXED)? \ - ((p) < NLBLOCK->PSEND && \ - PRIV(is_newline)((p), NLBLOCK->nltype, NLBLOCK->PSEND, \ - &(NLBLOCK->nllen), utf)) \ - : \ - ((p) <= NLBLOCK->PSEND - NLBLOCK->nllen && \ - UCHAR21TEST(p) == NLBLOCK->nl[0] && \ - (NLBLOCK->nllen == 1 || UCHAR21TEST(p+1) == NLBLOCK->nl[1]) \ - ) \ - ) - -/* This macro checks for a newline immediately preceding the given position */ - -#define WAS_NEWLINE(p) \ - ((NLBLOCK->nltype != NLTYPE_FIXED)? \ - ((p) > NLBLOCK->PSSTART && \ - PRIV(was_newline)((p), NLBLOCK->nltype, NLBLOCK->PSSTART, \ - &(NLBLOCK->nllen), utf)) \ - : \ - ((p) >= NLBLOCK->PSSTART + NLBLOCK->nllen && \ - UCHAR21TEST(p - NLBLOCK->nllen) == NLBLOCK->nl[0] && \ - (NLBLOCK->nllen == 1 || UCHAR21TEST(p - NLBLOCK->nllen + 1) == NLBLOCK->nl[1]) \ - ) \ - ) - -/* When PCRE is compiled as a C++ library, the subject pointer can be replaced -with a custom type. This makes it possible, for example, to allow pcre_exec() -to process subject strings that are discontinuous by using a smart pointer -class. It must always be possible to inspect all of the subject string in -pcre_exec() because of the way it backtracks. Two macros are required in the -normal case, for sign-unspecified and unsigned char pointers. The former is -used for the external interface and appears in pcre.h, which is why its name -must begin with PCRE_. */ - -#ifdef CUSTOM_SUBJECT_PTR -#define PCRE_PUCHAR CUSTOM_SUBJECT_PTR -#else -#define PCRE_PUCHAR const pcre_uchar * -#endif - -/* Include the public PCRE header and the definitions of UCP character property -values. 
*/ - -#include "pcre.h" -#include "ucp.h" - -#ifdef COMPILE_PCRE32 -/* Assert that the public PCRE_UCHAR32 is a 32-bit type */ -typedef int __assert_pcre_uchar32_size[sizeof(PCRE_UCHAR32) == 4 ? 1 : -1]; -#endif - -/* When compiling for use with the Virtual Pascal compiler, these functions -need to have their names changed. PCRE must be compiled with the -DVPCOMPAT -option on the command line. */ - -#ifdef VPCOMPAT -#define strlen(s) _strlen(s) -#define strncmp(s1,s2,m) _strncmp(s1,s2,m) -#define memcmp(s,c,n) _memcmp(s,c,n) -#define memcpy(d,s,n) _memcpy(d,s,n) -#define memmove(d,s,n) _memmove(d,s,n) -#define memset(s,c,n) _memset(s,c,n) -#else /* VPCOMPAT */ - -/* To cope with SunOS4 and other systems that lack memmove() but have bcopy(), -define a macro for memmove() if HAVE_MEMMOVE is false, provided that HAVE_BCOPY -is set. Otherwise, include an emulating function for those systems that have -neither (there some non-Unix environments where this is the case). */ - -#ifndef HAVE_MEMMOVE -#undef memmove /* some systems may have a macro */ -#ifdef HAVE_BCOPY -#define memmove(a, b, c) bcopy(b, a, c) -#else /* HAVE_BCOPY */ -static void * -pcre_memmove(void *d, const void *s, size_t n) -{ -size_t i; -unsigned char *dest = (unsigned char *)d; -const unsigned char *src = (const unsigned char *)s; -if (dest > src) - { - dest += n; - src += n; - for (i = 0; i < n; ++i) *(--dest) = *(--src); - return (void *)dest; - } -else - { - for (i = 0; i < n; ++i) *dest++ = *src++; - return (void *)(dest - n); - } -} -#define memmove(a, b, c) pcre_memmove(a, b, c) -#endif /* not HAVE_BCOPY */ -#endif /* not HAVE_MEMMOVE */ -#endif /* not VPCOMPAT */ - - -/* PCRE keeps offsets in its compiled code as 2-byte quantities (always stored -in big-endian order) by default. These are used, for example, to link from the -start of a subpattern to its alternatives and its end. 
The use of 2 bytes per -offset limits the size of the compiled regex to around 64K, which is big enough -for almost everybody. However, I received a request for an even bigger limit. -For this reason, and also to make the code easier to maintain, the storing and -loading of offsets from the byte string is now handled by the macros that are -defined here. - -The macros are controlled by the value of LINK_SIZE. This defaults to 2 in -the config.h file, but can be overridden by using -D on the command line. This -is automated on Unix systems via the "configure" command. */ - -#if defined COMPILE_PCRE8 - -#if LINK_SIZE == 2 - -#define PUT(a,n,d) \ - (a[n] = (d) >> 8), \ - (a[(n)+1] = (d) & 255) - -#define GET(a,n) \ - (((a)[n] << 8) | (a)[(n)+1]) - -#define MAX_PATTERN_SIZE (1 << 16) - - -#elif LINK_SIZE == 3 - -#define PUT(a,n,d) \ - (a[n] = (d) >> 16), \ - (a[(n)+1] = (d) >> 8), \ - (a[(n)+2] = (d) & 255) - -#define GET(a,n) \ - (((a)[n] << 16) | ((a)[(n)+1] << 8) | (a)[(n)+2]) - -#define MAX_PATTERN_SIZE (1 << 24) - - -#elif LINK_SIZE == 4 - -#define PUT(a,n,d) \ - (a[n] = (d) >> 24), \ - (a[(n)+1] = (d) >> 16), \ - (a[(n)+2] = (d) >> 8), \ - (a[(n)+3] = (d) & 255) - -#define GET(a,n) \ - (((a)[n] << 24) | ((a)[(n)+1] << 16) | ((a)[(n)+2] << 8) | (a)[(n)+3]) - -/* Keep it positive */ -#define MAX_PATTERN_SIZE (1 << 30) - -#else -#error LINK_SIZE must be either 2, 3, or 4 -#endif - -#elif defined COMPILE_PCRE16 - -#if LINK_SIZE == 2 - -/* Redefine LINK_SIZE as a multiple of sizeof(pcre_uchar) */ -#undef LINK_SIZE -#define LINK_SIZE 1 - -#define PUT(a,n,d) \ - (a[n] = (d)) - -#define GET(a,n) \ - (a[n]) - -#define MAX_PATTERN_SIZE (1 << 16) - -#elif LINK_SIZE == 3 || LINK_SIZE == 4 - -/* Redefine LINK_SIZE as a multiple of sizeof(pcre_uchar) */ -#undef LINK_SIZE -#define LINK_SIZE 2 - -#define PUT(a,n,d) \ - (a[n] = (d) >> 16), \ - (a[(n)+1] = (d) & 65535) - -#define GET(a,n) \ - (((a)[n] << 16) | (a)[(n)+1]) - -/* Keep it positive */ -#define MAX_PATTERN_SIZE (1 << 
30) - -#else -#error LINK_SIZE must be either 2, 3, or 4 -#endif - -#elif defined COMPILE_PCRE32 - -/* Only supported LINK_SIZE is 4 */ -/* Redefine LINK_SIZE as a multiple of sizeof(pcre_uchar) */ -#undef LINK_SIZE -#define LINK_SIZE 1 - -#define PUT(a,n,d) \ - (a[n] = (d)) - -#define GET(a,n) \ - (a[n]) - -/* Keep it positive */ -#define MAX_PATTERN_SIZE (1 << 30) - -#else -#error Unsupported compiling mode -#endif /* COMPILE_PCRE[8|16|32] */ - -/* Convenience macro defined in terms of the others */ - -#define PUTINC(a,n,d) PUT(a,n,d), a += LINK_SIZE - - -/* PCRE uses some other 2-byte quantities that do not change when the size of -offsets changes. There are used for repeat counts and for other things such as -capturing parenthesis numbers in back references. */ - -#if defined COMPILE_PCRE8 - -#define IMM2_SIZE 2 - -#define PUT2(a,n,d) \ - a[n] = (d) >> 8; \ - a[(n)+1] = (d) & 255 - -/* For reasons that I do not understand, the expression in this GET2 macro is -treated by gcc as a signed expression, even when a is declared as unsigned. It -seems that any kind of arithmetic results in a signed value. */ - -#define GET2(a,n) \ - (unsigned int)(((a)[n] << 8) | (a)[(n)+1]) - -#elif defined COMPILE_PCRE16 - -#define IMM2_SIZE 1 - -#define PUT2(a,n,d) \ - a[n] = d - -#define GET2(a,n) \ - a[n] - -#elif defined COMPILE_PCRE32 - -#define IMM2_SIZE 1 - -#define PUT2(a,n,d) \ - a[n] = d - -#define GET2(a,n) \ - a[n] - -#else -#error Unsupported compiling mode -#endif /* COMPILE_PCRE[8|16|32] */ - -#define PUT2INC(a,n,d) PUT2(a,n,d), a += IMM2_SIZE - -/* The maximum length of a MARK name is currently one data unit; it may be -changed in future to be a fixed number of bytes or to depend on LINK_SIZE. 
*/ - -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -#define MAX_MARK ((1u << 16) - 1) -#else -#define MAX_MARK ((1u << 8) - 1) -#endif - -/* There is a proposed future special "UTF-21" mode, in which only the lowest -21 bits of a 32-bit character are interpreted as UTF, with the remaining 11 -high-order bits available to the application for other uses. In preparation for -the future implementation of this mode, there are macros that load a data item -and, if in this special mode, mask it to 21 bits. These macros all have names -starting with UCHAR21. In all other modes, including the normal 32-bit -library, the macros all have the same simple definitions. When the new mode is -implemented, it is expected that these definitions will be varied appropriately -using #ifdef when compiling the library that supports the special mode. */ - -#define UCHAR21(eptr) (*(eptr)) -#define UCHAR21TEST(eptr) (*(eptr)) -#define UCHAR21INC(eptr) (*(eptr)++) -#define UCHAR21INCTEST(eptr) (*(eptr)++) - -/* When UTF encoding is being used, a character is no longer just a single -byte in 8-bit mode or a single short in 16-bit mode. The macros for character -handling generate simple sequences when used in the basic mode, and more -complicated ones for UTF characters. GETCHARLENTEST and other macros are not -used when UTF is not supported. To make sure they can never even appear when -UTF support is omitted, we don't even define them. 
*/ - -#ifndef SUPPORT_UTF - -/* #define MAX_VALUE_FOR_SINGLE_CHAR */ -/* #define HAS_EXTRALEN(c) */ -/* #define GET_EXTRALEN(c) */ -/* #define NOT_FIRSTCHAR(c) */ -#define GETCHAR(c, eptr) c = *eptr; -#define GETCHARTEST(c, eptr) c = *eptr; -#define GETCHARINC(c, eptr) c = *eptr++; -#define GETCHARINCTEST(c, eptr) c = *eptr++; -#define GETCHARLEN(c, eptr, len) c = *eptr; -/* #define GETCHARLENTEST(c, eptr, len) */ -/* #define BACKCHAR(eptr) */ -/* #define FORWARDCHAR(eptr) */ -/* #define ACROSSCHAR(condition, eptr, action) */ - -#else /* SUPPORT_UTF */ - -/* Tests whether the code point needs extra characters to decode. */ - -#define HASUTF8EXTRALEN(c) ((c) >= 0xc0) - -/* Base macro to pick up the remaining bytes of a UTF-8 character, not -advancing the pointer. */ - -#define GETUTF8(c, eptr) \ - { \ - if ((c & 0x20) == 0) \ - c = ((c & 0x1f) << 6) | (eptr[1] & 0x3f); \ - else if ((c & 0x10) == 0) \ - c = ((c & 0x0f) << 12) | ((eptr[1] & 0x3f) << 6) | (eptr[2] & 0x3f); \ - else if ((c & 0x08) == 0) \ - c = ((c & 0x07) << 18) | ((eptr[1] & 0x3f) << 12) | \ - ((eptr[2] & 0x3f) << 6) | (eptr[3] & 0x3f); \ - else if ((c & 0x04) == 0) \ - c = ((c & 0x03) << 24) | ((eptr[1] & 0x3f) << 18) | \ - ((eptr[2] & 0x3f) << 12) | ((eptr[3] & 0x3f) << 6) | \ - (eptr[4] & 0x3f); \ - else \ - c = ((c & 0x01) << 30) | ((eptr[1] & 0x3f) << 24) | \ - ((eptr[2] & 0x3f) << 18) | ((eptr[3] & 0x3f) << 12) | \ - ((eptr[4] & 0x3f) << 6) | (eptr[5] & 0x3f); \ - } - -/* Base macro to pick up the remaining bytes of a UTF-8 character, advancing -the pointer. 
*/ - -#define GETUTF8INC(c, eptr) \ - { \ - if ((c & 0x20) == 0) \ - c = ((c & 0x1f) << 6) | (*eptr++ & 0x3f); \ - else if ((c & 0x10) == 0) \ - { \ - c = ((c & 0x0f) << 12) | ((*eptr & 0x3f) << 6) | (eptr[1] & 0x3f); \ - eptr += 2; \ - } \ - else if ((c & 0x08) == 0) \ - { \ - c = ((c & 0x07) << 18) | ((*eptr & 0x3f) << 12) | \ - ((eptr[1] & 0x3f) << 6) | (eptr[2] & 0x3f); \ - eptr += 3; \ - } \ - else if ((c & 0x04) == 0) \ - { \ - c = ((c & 0x03) << 24) | ((*eptr & 0x3f) << 18) | \ - ((eptr[1] & 0x3f) << 12) | ((eptr[2] & 0x3f) << 6) | \ - (eptr[3] & 0x3f); \ - eptr += 4; \ - } \ - else \ - { \ - c = ((c & 0x01) << 30) | ((*eptr & 0x3f) << 24) | \ - ((eptr[1] & 0x3f) << 18) | ((eptr[2] & 0x3f) << 12) | \ - ((eptr[3] & 0x3f) << 6) | (eptr[4] & 0x3f); \ - eptr += 5; \ - } \ - } - -#if defined COMPILE_PCRE8 - -/* These macros were originally written in the form of loops that used data -from the tables whose names start with PRIV(utf8_table). They were rewritten by -a user so as not to use loops, because in some environments this gives a -significant performance advantage, and it seems never to do any harm. */ - -/* Tells the biggest code point which can be encoded as a single character. */ - -#define MAX_VALUE_FOR_SINGLE_CHAR 127 - -/* Tests whether the code point needs extra characters to decode. */ - -#define HAS_EXTRALEN(c) ((c) >= 0xc0) - -/* Returns with the additional number of characters if IS_MULTICHAR(c) is TRUE. -Otherwise it has an undefined behaviour. */ - -#define GET_EXTRALEN(c) (PRIV(utf8_table4)[(c) & 0x3f]) - -/* Returns TRUE, if the given character is not the first character -of a UTF sequence. */ - -#define NOT_FIRSTCHAR(c) (((c) & 0xc0) == 0x80) - -/* Get the next UTF-8 character, not advancing the pointer. This is called when -we know we are in UTF-8 mode. */ - -#define GETCHAR(c, eptr) \ - c = *eptr; \ - if (c >= 0xc0) GETUTF8(c, eptr); - -/* Get the next UTF-8 character, testing for UTF-8 mode, and not advancing the -pointer. 
*/ - -#define GETCHARTEST(c, eptr) \ - c = *eptr; \ - if (utf && c >= 0xc0) GETUTF8(c, eptr); - -/* Get the next UTF-8 character, advancing the pointer. This is called when we -know we are in UTF-8 mode. */ - -#define GETCHARINC(c, eptr) \ - c = *eptr++; \ - if (c >= 0xc0) GETUTF8INC(c, eptr); - -/* Get the next character, testing for UTF-8 mode, and advancing the pointer. -This is called when we don't know if we are in UTF-8 mode. */ - -#define GETCHARINCTEST(c, eptr) \ - c = *eptr++; \ - if (utf && c >= 0xc0) GETUTF8INC(c, eptr); - -/* Base macro to pick up the remaining bytes of a UTF-8 character, not -advancing the pointer, incrementing the length. */ - -#define GETUTF8LEN(c, eptr, len) \ - { \ - if ((c & 0x20) == 0) \ - { \ - c = ((c & 0x1f) << 6) | (eptr[1] & 0x3f); \ - len++; \ - } \ - else if ((c & 0x10) == 0) \ - { \ - c = ((c & 0x0f) << 12) | ((eptr[1] & 0x3f) << 6) | (eptr[2] & 0x3f); \ - len += 2; \ - } \ - else if ((c & 0x08) == 0) \ - {\ - c = ((c & 0x07) << 18) | ((eptr[1] & 0x3f) << 12) | \ - ((eptr[2] & 0x3f) << 6) | (eptr[3] & 0x3f); \ - len += 3; \ - } \ - else if ((c & 0x04) == 0) \ - { \ - c = ((c & 0x03) << 24) | ((eptr[1] & 0x3f) << 18) | \ - ((eptr[2] & 0x3f) << 12) | ((eptr[3] & 0x3f) << 6) | \ - (eptr[4] & 0x3f); \ - len += 4; \ - } \ - else \ - {\ - c = ((c & 0x01) << 30) | ((eptr[1] & 0x3f) << 24) | \ - ((eptr[2] & 0x3f) << 18) | ((eptr[3] & 0x3f) << 12) | \ - ((eptr[4] & 0x3f) << 6) | (eptr[5] & 0x3f); \ - len += 5; \ - } \ - } - -/* Get the next UTF-8 character, not advancing the pointer, incrementing length -if there are extra bytes. This is called when we know we are in UTF-8 mode. */ - -#define GETCHARLEN(c, eptr, len) \ - c = *eptr; \ - if (c >= 0xc0) GETUTF8LEN(c, eptr, len); - -/* Get the next UTF-8 character, testing for UTF-8 mode, not advancing the -pointer, incrementing length if there are extra bytes. This is called when we -do not know if we are in UTF-8 mode. 
*/ - -#define GETCHARLENTEST(c, eptr, len) \ - c = *eptr; \ - if (utf && c >= 0xc0) GETUTF8LEN(c, eptr, len); - -/* If the pointer is not at the start of a character, move it back until -it is. This is called only in UTF-8 mode - we don't put a test within the macro -because almost all calls are already within a block of UTF-8 only code. */ - -#define BACKCHAR(eptr) while((*eptr & 0xc0) == 0x80) eptr-- - -/* Same as above, just in the other direction. */ -#define FORWARDCHAR(eptr) while((*eptr & 0xc0) == 0x80) eptr++ - -/* Same as above, but it allows a fully customizable form. */ -#define ACROSSCHAR(condition, eptr, action) \ - while((condition) && ((eptr) & 0xc0) == 0x80) action - -#elif defined COMPILE_PCRE16 - -/* Tells the biggest code point which can be encoded as a single character. */ - -#define MAX_VALUE_FOR_SINGLE_CHAR 65535 - -/* Tests whether the code point needs extra characters to decode. */ - -#define HAS_EXTRALEN(c) (((c) & 0xfc00) == 0xd800) - -/* Returns with the additional number of characters if IS_MULTICHAR(c) is TRUE. -Otherwise it has an undefined behaviour. */ - -#define GET_EXTRALEN(c) 1 - -/* Returns TRUE, if the given character is not the first character -of a UTF sequence. */ - -#define NOT_FIRSTCHAR(c) (((c) & 0xfc00) == 0xdc00) - -/* Base macro to pick up the low surrogate of a UTF-16 character, not -advancing the pointer. */ - -#define GETUTF16(c, eptr) \ - { c = (((c & 0x3ff) << 10) | (eptr[1] & 0x3ff)) + 0x10000; } - -/* Get the next UTF-16 character, not advancing the pointer. This is called when -we know we are in UTF-16 mode. */ - -#define GETCHAR(c, eptr) \ - c = *eptr; \ - if ((c & 0xfc00) == 0xd800) GETUTF16(c, eptr); - -/* Get the next UTF-16 character, testing for UTF-16 mode, and not advancing the -pointer. */ - -#define GETCHARTEST(c, eptr) \ - c = *eptr; \ - if (utf && (c & 0xfc00) == 0xd800) GETUTF16(c, eptr); - -/* Base macro to pick up the low surrogate of a UTF-16 character, advancing -the pointer. 
*/ - -#define GETUTF16INC(c, eptr) \ - { c = (((c & 0x3ff) << 10) | (*eptr++ & 0x3ff)) + 0x10000; } - -/* Get the next UTF-16 character, advancing the pointer. This is called when we -know we are in UTF-16 mode. */ - -#define GETCHARINC(c, eptr) \ - c = *eptr++; \ - if ((c & 0xfc00) == 0xd800) GETUTF16INC(c, eptr); - -/* Get the next character, testing for UTF-16 mode, and advancing the pointer. -This is called when we don't know if we are in UTF-16 mode. */ - -#define GETCHARINCTEST(c, eptr) \ - c = *eptr++; \ - if (utf && (c & 0xfc00) == 0xd800) GETUTF16INC(c, eptr); - -/* Base macro to pick up the low surrogate of a UTF-16 character, not -advancing the pointer, incrementing the length. */ - -#define GETUTF16LEN(c, eptr, len) \ - { c = (((c & 0x3ff) << 10) | (eptr[1] & 0x3ff)) + 0x10000; len++; } - -/* Get the next UTF-16 character, not advancing the pointer, incrementing -length if there is a low surrogate. This is called when we know we are in -UTF-16 mode. */ - -#define GETCHARLEN(c, eptr, len) \ - c = *eptr; \ - if ((c & 0xfc00) == 0xd800) GETUTF16LEN(c, eptr, len); - -/* Get the next UTF-816character, testing for UTF-16 mode, not advancing the -pointer, incrementing length if there is a low surrogate. This is called when -we do not know if we are in UTF-16 mode. */ - -#define GETCHARLENTEST(c, eptr, len) \ - c = *eptr; \ - if (utf && (c & 0xfc00) == 0xd800) GETUTF16LEN(c, eptr, len); - -/* If the pointer is not at the start of a character, move it back until -it is. This is called only in UTF-16 mode - we don't put a test within the -macro because almost all calls are already within a block of UTF-16 only -code. */ - -#define BACKCHAR(eptr) if ((*eptr & 0xfc00) == 0xdc00) eptr-- - -/* Same as above, just in the other direction. */ -#define FORWARDCHAR(eptr) if ((*eptr & 0xfc00) == 0xdc00) eptr++ - -/* Same as above, but it allows a fully customizable form. 
*/ -#define ACROSSCHAR(condition, eptr, action) \ - if ((condition) && ((eptr) & 0xfc00) == 0xdc00) action - -#elif defined COMPILE_PCRE32 - -/* These are trivial for the 32-bit library, since all UTF-32 characters fit -into one pcre_uchar unit. */ -#define MAX_VALUE_FOR_SINGLE_CHAR (0x10ffffu) -#define HAS_EXTRALEN(c) (0) -#define GET_EXTRALEN(c) (0) -#define NOT_FIRSTCHAR(c) (0) - -/* Get the next UTF-32 character, not advancing the pointer. This is called when -we know we are in UTF-32 mode. */ - -#define GETCHAR(c, eptr) \ - c = *(eptr); - -/* Get the next UTF-32 character, testing for UTF-32 mode, and not advancing the -pointer. */ - -#define GETCHARTEST(c, eptr) \ - c = *(eptr); - -/* Get the next UTF-32 character, advancing the pointer. This is called when we -know we are in UTF-32 mode. */ - -#define GETCHARINC(c, eptr) \ - c = *((eptr)++); - -/* Get the next character, testing for UTF-32 mode, and advancing the pointer. -This is called when we don't know if we are in UTF-32 mode. */ - -#define GETCHARINCTEST(c, eptr) \ - c = *((eptr)++); - -/* Get the next UTF-32 character, not advancing the pointer, not incrementing -length (since all UTF-32 is of length 1). This is called when we know we are in -UTF-32 mode. */ - -#define GETCHARLEN(c, eptr, len) \ - GETCHAR(c, eptr) - -/* Get the next UTF-32character, testing for UTF-32 mode, not advancing the -pointer, not incrementing the length (since all UTF-32 is of length 1). -This is called when we do not know if we are in UTF-32 mode. */ - -#define GETCHARLENTEST(c, eptr, len) \ - GETCHARTEST(c, eptr) - -/* If the pointer is not at the start of a character, move it back until -it is. This is called only in UTF-32 mode - we don't put a test within the -macro because almost all calls are already within a block of UTF-32 only -code. -These are all no-ops since all UTF-32 characters fit into one pcre_uchar. */ - -#define BACKCHAR(eptr) do { } while (0) - -/* Same as above, just in the other direction. 
*/ -#define FORWARDCHAR(eptr) do { } while (0) - -/* Same as above, but it allows a fully customizable form. */ -#define ACROSSCHAR(condition, eptr, action) do { } while (0) - -#else -#error Unsupported compiling mode -#endif /* COMPILE_PCRE[8|16|32] */ - -#endif /* SUPPORT_UTF */ - -/* Tests for Unicode horizontal and vertical whitespace characters must check a -number of different values. Using a switch statement for this generates the -fastest code (no loop, no memory access), and there are several places in the -interpreter code where this happens. In order to ensure that all the case lists -remain in step, we use macros so that there is only one place where the lists -are defined. - -These values are also required as lists in pcre_compile.c when processing \h, -\H, \v and \V in a character class. The lists are defined in pcre_tables.c, but -macros that define the values are here so that all the definitions are -together. The lists must be in ascending character order, terminated by -NOTACHAR (which is 0xffffffff). - -Any changes should ensure that the various macros are kept in step with each -other. NOTE: The values also appear in pcre_jit_compile.c. 
*/ - -/* ------ ASCII/Unicode environments ------ */ - -#ifndef EBCDIC - -#define HSPACE_LIST \ - CHAR_HT, CHAR_SPACE, CHAR_NBSP, \ - 0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \ - 0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \ - NOTACHAR - -#define HSPACE_MULTIBYTE_CASES \ - case 0x1680: /* OGHAM SPACE MARK */ \ - case 0x180e: /* MONGOLIAN VOWEL SEPARATOR */ \ - case 0x2000: /* EN QUAD */ \ - case 0x2001: /* EM QUAD */ \ - case 0x2002: /* EN SPACE */ \ - case 0x2003: /* EM SPACE */ \ - case 0x2004: /* THREE-PER-EM SPACE */ \ - case 0x2005: /* FOUR-PER-EM SPACE */ \ - case 0x2006: /* SIX-PER-EM SPACE */ \ - case 0x2007: /* FIGURE SPACE */ \ - case 0x2008: /* PUNCTUATION SPACE */ \ - case 0x2009: /* THIN SPACE */ \ - case 0x200A: /* HAIR SPACE */ \ - case 0x202f: /* NARROW NO-BREAK SPACE */ \ - case 0x205f: /* MEDIUM MATHEMATICAL SPACE */ \ - case 0x3000 /* IDEOGRAPHIC SPACE */ - -#define HSPACE_BYTE_CASES \ - case CHAR_HT: \ - case CHAR_SPACE: \ - case CHAR_NBSP - -#define HSPACE_CASES \ - HSPACE_BYTE_CASES: \ - HSPACE_MULTIBYTE_CASES - -#define VSPACE_LIST \ - CHAR_LF, CHAR_VT, CHAR_FF, CHAR_CR, CHAR_NEL, 0x2028, 0x2029, NOTACHAR - -#define VSPACE_MULTIBYTE_CASES \ - case 0x2028: /* LINE SEPARATOR */ \ - case 0x2029 /* PARAGRAPH SEPARATOR */ - -#define VSPACE_BYTE_CASES \ - case CHAR_LF: \ - case CHAR_VT: \ - case CHAR_FF: \ - case CHAR_CR: \ - case CHAR_NEL - -#define VSPACE_CASES \ - VSPACE_BYTE_CASES: \ - VSPACE_MULTIBYTE_CASES - -/* ------ EBCDIC environments ------ */ - -#else -#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR - -#define HSPACE_BYTE_CASES \ - case CHAR_HT: \ - case CHAR_SPACE: \ - case CHAR_NBSP - -#define HSPACE_CASES HSPACE_BYTE_CASES - -#ifdef EBCDIC_NL25 -#define VSPACE_LIST \ - CHAR_VT, CHAR_FF, CHAR_CR, CHAR_NEL, CHAR_LF, NOTACHAR -#else -#define VSPACE_LIST \ - CHAR_VT, CHAR_FF, CHAR_CR, CHAR_LF, CHAR_NEL, NOTACHAR -#endif - -#define VSPACE_BYTE_CASES \ - case CHAR_LF: \ - case 
CHAR_VT: \ - case CHAR_FF: \ - case CHAR_CR: \ - case CHAR_NEL - -#define VSPACE_CASES VSPACE_BYTE_CASES -#endif /* EBCDIC */ - -/* ------ End of whitespace macros ------ */ - - - -/* Private flags containing information about the compiled regex. They used to -live at the top end of the options word, but that got almost full, so they were -moved to a 16-bit flags word - which got almost full, so now they are in a -32-bit flags word. From release 8.00, PCRE_NOPARTIAL is unused, as the -restrictions on partial matching have been lifted. It remains for backwards -compatibility. */ - -#define PCRE_MODE8 0x00000001 /* compiled in 8 bit mode */ -#define PCRE_MODE16 0x00000002 /* compiled in 16 bit mode */ -#define PCRE_MODE32 0x00000004 /* compiled in 32 bit mode */ -#define PCRE_FIRSTSET 0x00000010 /* first_char is set */ -#define PCRE_FCH_CASELESS 0x00000020 /* caseless first char */ -#define PCRE_REQCHSET 0x00000040 /* req_byte is set */ -#define PCRE_RCH_CASELESS 0x00000080 /* caseless requested char */ -#define PCRE_STARTLINE 0x00000100 /* start after \n for multiline */ -#define PCRE_NOPARTIAL 0x00000200 /* can't use partial with this regex */ -#define PCRE_JCHANGED 0x00000400 /* j option used in regex */ -#define PCRE_HASCRORLF 0x00000800 /* explicit \r or \n in pattern */ -#define PCRE_HASTHEN 0x00001000 /* pattern contains (*THEN) */ -#define PCRE_MLSET 0x00002000 /* match limit set by regex */ -#define PCRE_RLSET 0x00004000 /* recursion limit set by regex */ -#define PCRE_MATCH_EMPTY 0x00008000 /* pattern can match empty string */ - -#if defined COMPILE_PCRE8 -#define PCRE_MODE PCRE_MODE8 -#elif defined COMPILE_PCRE16 -#define PCRE_MODE PCRE_MODE16 -#elif defined COMPILE_PCRE32 -#define PCRE_MODE PCRE_MODE32 -#endif -#define PCRE_MODE_MASK (PCRE_MODE8 | PCRE_MODE16 | PCRE_MODE32) - -/* Flags for the "extra" block produced by pcre_study(). 
*/ - -#define PCRE_STUDY_MAPPED 0x0001 /* a map of starting chars exists */ -#define PCRE_STUDY_MINLEN 0x0002 /* a minimum length field exists */ - -/* Masks for identifying the public options that are permitted at compile -time, run time, or study time, respectively. */ - -#define PCRE_NEWLINE_BITS (PCRE_NEWLINE_CR|PCRE_NEWLINE_LF|PCRE_NEWLINE_ANY| \ - PCRE_NEWLINE_ANYCRLF) - -#define PUBLIC_COMPILE_OPTIONS \ - (PCRE_CASELESS|PCRE_EXTENDED|PCRE_ANCHORED|PCRE_MULTILINE| \ - PCRE_DOTALL|PCRE_DOLLAR_ENDONLY|PCRE_EXTRA|PCRE_UNGREEDY|PCRE_UTF8| \ - PCRE_NO_AUTO_CAPTURE|PCRE_NO_AUTO_POSSESS| \ - PCRE_NO_UTF8_CHECK|PCRE_AUTO_CALLOUT|PCRE_FIRSTLINE| \ - PCRE_DUPNAMES|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \ - PCRE_JAVASCRIPT_COMPAT|PCRE_UCP|PCRE_NO_START_OPTIMIZE|PCRE_NEVER_UTF) - -#define PUBLIC_EXEC_OPTIONS \ - (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \ - PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_NEWLINE_BITS| \ - PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE|PCRE_NO_START_OPTIMIZE) - -#define PUBLIC_DFA_EXEC_OPTIONS \ - (PCRE_ANCHORED|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|PCRE_NOTEMPTY_ATSTART| \ - PCRE_NO_UTF8_CHECK|PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT|PCRE_DFA_SHORTEST| \ - PCRE_DFA_RESTART|PCRE_NEWLINE_BITS|PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE| \ - PCRE_NO_START_OPTIMIZE) - -#define PUBLIC_STUDY_OPTIONS \ - (PCRE_STUDY_JIT_COMPILE|PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE| \ - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE|PCRE_STUDY_EXTRA_NEEDED) - -#define PUBLIC_JIT_EXEC_OPTIONS \ - (PCRE_NO_UTF8_CHECK|PCRE_NOTBOL|PCRE_NOTEOL|PCRE_NOTEMPTY|\ - PCRE_NOTEMPTY_ATSTART|PCRE_PARTIAL_SOFT|PCRE_PARTIAL_HARD) - -/* Magic number to provide a small check against being handed junk. */ - -#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */ - -/* This variable is used to detect a loaded regular expression -in different endianness. 
*/ - -#define REVERSED_MAGIC_NUMBER 0x45524350UL /* 'ERCP' */ - -/* The maximum remaining length of subject we are prepared to search for a -req_byte match. */ - -#define REQ_BYTE_MAX 1000 - -/* Miscellaneous definitions. The #ifndef is to pacify compiler warnings in -environments where these macros are defined elsewhere. Unfortunately, there -is no way to do the same for the typedef. */ - -typedef int BOOL; - -#ifndef FALSE -#define FALSE 0 -#define TRUE 1 -#endif - -/* If PCRE is to support UTF-8 on EBCDIC platforms, we cannot use normal -character constants like '*' because the compiler would emit their EBCDIC code, -which is different from their ASCII/UTF-8 code. Instead we define macros for -the characters so that they always use the ASCII/UTF-8 code when UTF-8 support -is enabled. When UTF-8 support is not enabled, the definitions use character -literals. Both character and string versions of each character are needed, and -there are some longer strings as well. - -This means that, on EBCDIC platforms, the PCRE library can handle either -EBCDIC, or UTF-8, but not both. To support both in the same compiled library -would need different lookups depending on whether PCRE_UTF8 was set or not. -This would make it impossible to use characters in switch/case statements, -which would reduce performance. For a theoretical use (which nobody has asked -for) in a minority area (EBCDIC platforms), this is not sensible. Any -application that did need both could compile two versions of the library, using -macros to give the functions distinct names. */ - -#ifndef SUPPORT_UTF - -/* UTF-8 support is not enabled; use the platform-dependent character literals -so that PCRE works in both ASCII and EBCDIC environments, but only in non-UTF -mode. Newline characters are problematic in EBCDIC. Though it has CR and LF -characters, a common practice has been to use its NL (0x15) character as the -line terminator in C-like processing environments. 
However, sometimes the LF -(0x25) character is used instead, according to this Unicode document: - -http://unicode.org/standard/reports/tr13/tr13-5.html - -PCRE defaults EBCDIC NL to 0x15, but has a build-time option to select 0x25 -instead. Whichever is *not* chosen is defined as NEL. - -In both ASCII and EBCDIC environments, CHAR_NL and CHAR_LF are synonyms for the -same code point. */ - -#ifdef EBCDIC - -#ifndef EBCDIC_NL25 -#define CHAR_NL '\x15' -#define CHAR_NEL '\x25' -#define STR_NL "\x15" -#define STR_NEL "\x25" -#else -#define CHAR_NL '\x25' -#define CHAR_NEL '\x15' -#define STR_NL "\x25" -#define STR_NEL "\x15" -#endif - -#define CHAR_LF CHAR_NL -#define STR_LF STR_NL - -#define CHAR_ESC '\047' -#define CHAR_DEL '\007' -#define CHAR_NBSP '\x41' -#define STR_ESC "\047" -#define STR_DEL "\007" - -#else /* Not EBCDIC */ - -/* In ASCII/Unicode, linefeed is '\n' and we equate this to NL for -compatibility. NEL is the Unicode newline character; make sure it is -a positive value. */ - -#define CHAR_LF '\n' -#define CHAR_NL CHAR_LF -#define CHAR_NEL ((unsigned char)'\x85') -#define CHAR_ESC '\033' -#define CHAR_DEL '\177' -#define CHAR_NBSP ((unsigned char)'\xa0') - -#define STR_LF "\n" -#define STR_NL STR_LF -#define STR_NEL "\x85" -#define STR_ESC "\033" -#define STR_DEL "\177" - -#endif /* EBCDIC */ - -/* The remaining definitions work in both environments. */ - -#define CHAR_NULL '\0' -#define CHAR_HT '\t' -#define CHAR_VT '\v' -#define CHAR_FF '\f' -#define CHAR_CR '\r' -#define CHAR_BS '\b' -#define CHAR_BEL '\a' - -#define CHAR_SPACE ' ' -#define CHAR_EXCLAMATION_MARK '!' -#define CHAR_QUOTATION_MARK '"' -#define CHAR_NUMBER_SIGN '#' -#define CHAR_DOLLAR_SIGN '$' -#define CHAR_PERCENT_SIGN '%' -#define CHAR_AMPERSAND '&' -#define CHAR_APOSTROPHE '\'' -#define CHAR_LEFT_PARENTHESIS '(' -#define CHAR_RIGHT_PARENTHESIS ')' -#define CHAR_ASTERISK '*' -#define CHAR_PLUS '+' -#define CHAR_COMMA ',' -#define CHAR_MINUS '-' -#define CHAR_DOT '.' 
-#define CHAR_SLASH '/' -#define CHAR_0 '0' -#define CHAR_1 '1' -#define CHAR_2 '2' -#define CHAR_3 '3' -#define CHAR_4 '4' -#define CHAR_5 '5' -#define CHAR_6 '6' -#define CHAR_7 '7' -#define CHAR_8 '8' -#define CHAR_9 '9' -#define CHAR_COLON ':' -#define CHAR_SEMICOLON ';' -#define CHAR_LESS_THAN_SIGN '<' -#define CHAR_EQUALS_SIGN '=' -#define CHAR_GREATER_THAN_SIGN '>' -#define CHAR_QUESTION_MARK '?' -#define CHAR_COMMERCIAL_AT '@' -#define CHAR_A 'A' -#define CHAR_B 'B' -#define CHAR_C 'C' -#define CHAR_D 'D' -#define CHAR_E 'E' -#define CHAR_F 'F' -#define CHAR_G 'G' -#define CHAR_H 'H' -#define CHAR_I 'I' -#define CHAR_J 'J' -#define CHAR_K 'K' -#define CHAR_L 'L' -#define CHAR_M 'M' -#define CHAR_N 'N' -#define CHAR_O 'O' -#define CHAR_P 'P' -#define CHAR_Q 'Q' -#define CHAR_R 'R' -#define CHAR_S 'S' -#define CHAR_T 'T' -#define CHAR_U 'U' -#define CHAR_V 'V' -#define CHAR_W 'W' -#define CHAR_X 'X' -#define CHAR_Y 'Y' -#define CHAR_Z 'Z' -#define CHAR_LEFT_SQUARE_BRACKET '[' -#define CHAR_BACKSLASH '\\' -#define CHAR_RIGHT_SQUARE_BRACKET ']' -#define CHAR_CIRCUMFLEX_ACCENT '^' -#define CHAR_UNDERSCORE '_' -#define CHAR_GRAVE_ACCENT '`' -#define CHAR_a 'a' -#define CHAR_b 'b' -#define CHAR_c 'c' -#define CHAR_d 'd' -#define CHAR_e 'e' -#define CHAR_f 'f' -#define CHAR_g 'g' -#define CHAR_h 'h' -#define CHAR_i 'i' -#define CHAR_j 'j' -#define CHAR_k 'k' -#define CHAR_l 'l' -#define CHAR_m 'm' -#define CHAR_n 'n' -#define CHAR_o 'o' -#define CHAR_p 'p' -#define CHAR_q 'q' -#define CHAR_r 'r' -#define CHAR_s 's' -#define CHAR_t 't' -#define CHAR_u 'u' -#define CHAR_v 'v' -#define CHAR_w 'w' -#define CHAR_x 'x' -#define CHAR_y 'y' -#define CHAR_z 'z' -#define CHAR_LEFT_CURLY_BRACKET '{' -#define CHAR_VERTICAL_LINE '|' -#define CHAR_RIGHT_CURLY_BRACKET '}' -#define CHAR_TILDE '~' - -#define STR_HT "\t" -#define STR_VT "\v" -#define STR_FF "\f" -#define STR_CR "\r" -#define STR_BS "\b" -#define STR_BEL "\a" - -#define STR_SPACE " " -#define STR_EXCLAMATION_MARK "!" 
-#define STR_QUOTATION_MARK "\"" -#define STR_NUMBER_SIGN "#" -#define STR_DOLLAR_SIGN "$" -#define STR_PERCENT_SIGN "%" -#define STR_AMPERSAND "&" -#define STR_APOSTROPHE "'" -#define STR_LEFT_PARENTHESIS "(" -#define STR_RIGHT_PARENTHESIS ")" -#define STR_ASTERISK "*" -#define STR_PLUS "+" -#define STR_COMMA "," -#define STR_MINUS "-" -#define STR_DOT "." -#define STR_SLASH "/" -#define STR_0 "0" -#define STR_1 "1" -#define STR_2 "2" -#define STR_3 "3" -#define STR_4 "4" -#define STR_5 "5" -#define STR_6 "6" -#define STR_7 "7" -#define STR_8 "8" -#define STR_9 "9" -#define STR_COLON ":" -#define STR_SEMICOLON ";" -#define STR_LESS_THAN_SIGN "<" -#define STR_EQUALS_SIGN "=" -#define STR_GREATER_THAN_SIGN ">" -#define STR_QUESTION_MARK "?" -#define STR_COMMERCIAL_AT "@" -#define STR_A "A" -#define STR_B "B" -#define STR_C "C" -#define STR_D "D" -#define STR_E "E" -#define STR_F "F" -#define STR_G "G" -#define STR_H "H" -#define STR_I "I" -#define STR_J "J" -#define STR_K "K" -#define STR_L "L" -#define STR_M "M" -#define STR_N "N" -#define STR_O "O" -#define STR_P "P" -#define STR_Q "Q" -#define STR_R "R" -#define STR_S "S" -#define STR_T "T" -#define STR_U "U" -#define STR_V "V" -#define STR_W "W" -#define STR_X "X" -#define STR_Y "Y" -#define STR_Z "Z" -#define STR_LEFT_SQUARE_BRACKET "[" -#define STR_BACKSLASH "\\" -#define STR_RIGHT_SQUARE_BRACKET "]" -#define STR_CIRCUMFLEX_ACCENT "^" -#define STR_UNDERSCORE "_" -#define STR_GRAVE_ACCENT "`" -#define STR_a "a" -#define STR_b "b" -#define STR_c "c" -#define STR_d "d" -#define STR_e "e" -#define STR_f "f" -#define STR_g "g" -#define STR_h "h" -#define STR_i "i" -#define STR_j "j" -#define STR_k "k" -#define STR_l "l" -#define STR_m "m" -#define STR_n "n" -#define STR_o "o" -#define STR_p "p" -#define STR_q "q" -#define STR_r "r" -#define STR_s "s" -#define STR_t "t" -#define STR_u "u" -#define STR_v "v" -#define STR_w "w" -#define STR_x "x" -#define STR_y "y" -#define STR_z "z" -#define STR_LEFT_CURLY_BRACKET 
"{" -#define STR_VERTICAL_LINE "|" -#define STR_RIGHT_CURLY_BRACKET "}" -#define STR_TILDE "~" - -#define STRING_ACCEPT0 "ACCEPT\0" -#define STRING_COMMIT0 "COMMIT\0" -#define STRING_F0 "F\0" -#define STRING_FAIL0 "FAIL\0" -#define STRING_MARK0 "MARK\0" -#define STRING_PRUNE0 "PRUNE\0" -#define STRING_SKIP0 "SKIP\0" -#define STRING_THEN "THEN" - -#define STRING_alpha0 "alpha\0" -#define STRING_lower0 "lower\0" -#define STRING_upper0 "upper\0" -#define STRING_alnum0 "alnum\0" -#define STRING_ascii0 "ascii\0" -#define STRING_blank0 "blank\0" -#define STRING_cntrl0 "cntrl\0" -#define STRING_digit0 "digit\0" -#define STRING_graph0 "graph\0" -#define STRING_print0 "print\0" -#define STRING_punct0 "punct\0" -#define STRING_space0 "space\0" -#define STRING_word0 "word\0" -#define STRING_xdigit "xdigit" - -#define STRING_DEFINE "DEFINE" -#define STRING_WEIRD_STARTWORD "[:<:]]" -#define STRING_WEIRD_ENDWORD "[:>:]]" - -#define STRING_CR_RIGHTPAR "CR)" -#define STRING_LF_RIGHTPAR "LF)" -#define STRING_CRLF_RIGHTPAR "CRLF)" -#define STRING_ANY_RIGHTPAR "ANY)" -#define STRING_ANYCRLF_RIGHTPAR "ANYCRLF)" -#define STRING_BSR_ANYCRLF_RIGHTPAR "BSR_ANYCRLF)" -#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)" -#define STRING_UTF8_RIGHTPAR "UTF8)" -#define STRING_UTF16_RIGHTPAR "UTF16)" -#define STRING_UTF32_RIGHTPAR "UTF32)" -#define STRING_UTF_RIGHTPAR "UTF)" -#define STRING_UCP_RIGHTPAR "UCP)" -#define STRING_NO_AUTO_POSSESS_RIGHTPAR "NO_AUTO_POSSESS)" -#define STRING_NO_START_OPT_RIGHTPAR "NO_START_OPT)" -#define STRING_LIMIT_MATCH_EQ "LIMIT_MATCH=" -#define STRING_LIMIT_RECURSION_EQ "LIMIT_RECURSION=" - -#else /* SUPPORT_UTF */ - -/* UTF-8 support is enabled; always use UTF-8 (=ASCII) character codes. This -works in both modes non-EBCDIC platforms, and on EBCDIC platforms in UTF-8 mode -only. 
*/ - -#define CHAR_HT '\011' -#define CHAR_VT '\013' -#define CHAR_FF '\014' -#define CHAR_CR '\015' -#define CHAR_LF '\012' -#define CHAR_NL CHAR_LF -#define CHAR_NEL ((unsigned char)'\x85') -#define CHAR_BS '\010' -#define CHAR_BEL '\007' -#define CHAR_ESC '\033' -#define CHAR_DEL '\177' - -#define CHAR_NULL '\0' -#define CHAR_SPACE '\040' -#define CHAR_EXCLAMATION_MARK '\041' -#define CHAR_QUOTATION_MARK '\042' -#define CHAR_NUMBER_SIGN '\043' -#define CHAR_DOLLAR_SIGN '\044' -#define CHAR_PERCENT_SIGN '\045' -#define CHAR_AMPERSAND '\046' -#define CHAR_APOSTROPHE '\047' -#define CHAR_LEFT_PARENTHESIS '\050' -#define CHAR_RIGHT_PARENTHESIS '\051' -#define CHAR_ASTERISK '\052' -#define CHAR_PLUS '\053' -#define CHAR_COMMA '\054' -#define CHAR_MINUS '\055' -#define CHAR_DOT '\056' -#define CHAR_SLASH '\057' -#define CHAR_0 '\060' -#define CHAR_1 '\061' -#define CHAR_2 '\062' -#define CHAR_3 '\063' -#define CHAR_4 '\064' -#define CHAR_5 '\065' -#define CHAR_6 '\066' -#define CHAR_7 '\067' -#define CHAR_8 '\070' -#define CHAR_9 '\071' -#define CHAR_COLON '\072' -#define CHAR_SEMICOLON '\073' -#define CHAR_LESS_THAN_SIGN '\074' -#define CHAR_EQUALS_SIGN '\075' -#define CHAR_GREATER_THAN_SIGN '\076' -#define CHAR_QUESTION_MARK '\077' -#define CHAR_COMMERCIAL_AT '\100' -#define CHAR_A '\101' -#define CHAR_B '\102' -#define CHAR_C '\103' -#define CHAR_D '\104' -#define CHAR_E '\105' -#define CHAR_F '\106' -#define CHAR_G '\107' -#define CHAR_H '\110' -#define CHAR_I '\111' -#define CHAR_J '\112' -#define CHAR_K '\113' -#define CHAR_L '\114' -#define CHAR_M '\115' -#define CHAR_N '\116' -#define CHAR_O '\117' -#define CHAR_P '\120' -#define CHAR_Q '\121' -#define CHAR_R '\122' -#define CHAR_S '\123' -#define CHAR_T '\124' -#define CHAR_U '\125' -#define CHAR_V '\126' -#define CHAR_W '\127' -#define CHAR_X '\130' -#define CHAR_Y '\131' -#define CHAR_Z '\132' -#define CHAR_LEFT_SQUARE_BRACKET '\133' -#define CHAR_BACKSLASH '\134' -#define CHAR_RIGHT_SQUARE_BRACKET '\135' 
-#define CHAR_CIRCUMFLEX_ACCENT '\136' -#define CHAR_UNDERSCORE '\137' -#define CHAR_GRAVE_ACCENT '\140' -#define CHAR_a '\141' -#define CHAR_b '\142' -#define CHAR_c '\143' -#define CHAR_d '\144' -#define CHAR_e '\145' -#define CHAR_f '\146' -#define CHAR_g '\147' -#define CHAR_h '\150' -#define CHAR_i '\151' -#define CHAR_j '\152' -#define CHAR_k '\153' -#define CHAR_l '\154' -#define CHAR_m '\155' -#define CHAR_n '\156' -#define CHAR_o '\157' -#define CHAR_p '\160' -#define CHAR_q '\161' -#define CHAR_r '\162' -#define CHAR_s '\163' -#define CHAR_t '\164' -#define CHAR_u '\165' -#define CHAR_v '\166' -#define CHAR_w '\167' -#define CHAR_x '\170' -#define CHAR_y '\171' -#define CHAR_z '\172' -#define CHAR_LEFT_CURLY_BRACKET '\173' -#define CHAR_VERTICAL_LINE '\174' -#define CHAR_RIGHT_CURLY_BRACKET '\175' -#define CHAR_TILDE '\176' -#define CHAR_NBSP ((unsigned char)'\xa0') - -#define STR_HT "\011" -#define STR_VT "\013" -#define STR_FF "\014" -#define STR_CR "\015" -#define STR_NL "\012" -#define STR_BS "\010" -#define STR_BEL "\007" -#define STR_ESC "\033" -#define STR_DEL "\177" - -#define STR_SPACE "\040" -#define STR_EXCLAMATION_MARK "\041" -#define STR_QUOTATION_MARK "\042" -#define STR_NUMBER_SIGN "\043" -#define STR_DOLLAR_SIGN "\044" -#define STR_PERCENT_SIGN "\045" -#define STR_AMPERSAND "\046" -#define STR_APOSTROPHE "\047" -#define STR_LEFT_PARENTHESIS "\050" -#define STR_RIGHT_PARENTHESIS "\051" -#define STR_ASTERISK "\052" -#define STR_PLUS "\053" -#define STR_COMMA "\054" -#define STR_MINUS "\055" -#define STR_DOT "\056" -#define STR_SLASH "\057" -#define STR_0 "\060" -#define STR_1 "\061" -#define STR_2 "\062" -#define STR_3 "\063" -#define STR_4 "\064" -#define STR_5 "\065" -#define STR_6 "\066" -#define STR_7 "\067" -#define STR_8 "\070" -#define STR_9 "\071" -#define STR_COLON "\072" -#define STR_SEMICOLON "\073" -#define STR_LESS_THAN_SIGN "\074" -#define STR_EQUALS_SIGN "\075" -#define STR_GREATER_THAN_SIGN "\076" -#define STR_QUESTION_MARK 
"\077" -#define STR_COMMERCIAL_AT "\100" -#define STR_A "\101" -#define STR_B "\102" -#define STR_C "\103" -#define STR_D "\104" -#define STR_E "\105" -#define STR_F "\106" -#define STR_G "\107" -#define STR_H "\110" -#define STR_I "\111" -#define STR_J "\112" -#define STR_K "\113" -#define STR_L "\114" -#define STR_M "\115" -#define STR_N "\116" -#define STR_O "\117" -#define STR_P "\120" -#define STR_Q "\121" -#define STR_R "\122" -#define STR_S "\123" -#define STR_T "\124" -#define STR_U "\125" -#define STR_V "\126" -#define STR_W "\127" -#define STR_X "\130" -#define STR_Y "\131" -#define STR_Z "\132" -#define STR_LEFT_SQUARE_BRACKET "\133" -#define STR_BACKSLASH "\134" -#define STR_RIGHT_SQUARE_BRACKET "\135" -#define STR_CIRCUMFLEX_ACCENT "\136" -#define STR_UNDERSCORE "\137" -#define STR_GRAVE_ACCENT "\140" -#define STR_a "\141" -#define STR_b "\142" -#define STR_c "\143" -#define STR_d "\144" -#define STR_e "\145" -#define STR_f "\146" -#define STR_g "\147" -#define STR_h "\150" -#define STR_i "\151" -#define STR_j "\152" -#define STR_k "\153" -#define STR_l "\154" -#define STR_m "\155" -#define STR_n "\156" -#define STR_o "\157" -#define STR_p "\160" -#define STR_q "\161" -#define STR_r "\162" -#define STR_s "\163" -#define STR_t "\164" -#define STR_u "\165" -#define STR_v "\166" -#define STR_w "\167" -#define STR_x "\170" -#define STR_y "\171" -#define STR_z "\172" -#define STR_LEFT_CURLY_BRACKET "\173" -#define STR_VERTICAL_LINE "\174" -#define STR_RIGHT_CURLY_BRACKET "\175" -#define STR_TILDE "\176" - -#define STRING_ACCEPT0 STR_A STR_C STR_C STR_E STR_P STR_T "\0" -#define STRING_COMMIT0 STR_C STR_O STR_M STR_M STR_I STR_T "\0" -#define STRING_F0 STR_F "\0" -#define STRING_FAIL0 STR_F STR_A STR_I STR_L "\0" -#define STRING_MARK0 STR_M STR_A STR_R STR_K "\0" -#define STRING_PRUNE0 STR_P STR_R STR_U STR_N STR_E "\0" -#define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0" -#define STRING_THEN STR_T STR_H STR_E STR_N - -#define STRING_alpha0 STR_a STR_l STR_p 
STR_h STR_a "\0" -#define STRING_lower0 STR_l STR_o STR_w STR_e STR_r "\0" -#define STRING_upper0 STR_u STR_p STR_p STR_e STR_r "\0" -#define STRING_alnum0 STR_a STR_l STR_n STR_u STR_m "\0" -#define STRING_ascii0 STR_a STR_s STR_c STR_i STR_i "\0" -#define STRING_blank0 STR_b STR_l STR_a STR_n STR_k "\0" -#define STRING_cntrl0 STR_c STR_n STR_t STR_r STR_l "\0" -#define STRING_digit0 STR_d STR_i STR_g STR_i STR_t "\0" -#define STRING_graph0 STR_g STR_r STR_a STR_p STR_h "\0" -#define STRING_print0 STR_p STR_r STR_i STR_n STR_t "\0" -#define STRING_punct0 STR_p STR_u STR_n STR_c STR_t "\0" -#define STRING_space0 STR_s STR_p STR_a STR_c STR_e "\0" -#define STRING_word0 STR_w STR_o STR_r STR_d "\0" -#define STRING_xdigit STR_x STR_d STR_i STR_g STR_i STR_t - -#define STRING_DEFINE STR_D STR_E STR_F STR_I STR_N STR_E -#define STRING_WEIRD_STARTWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_LESS_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET -#define STRING_WEIRD_ENDWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_GREATER_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET - -#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS -#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS -#define STRING_CRLF_RIGHTPAR STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS -#define STRING_ANY_RIGHTPAR STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS -#define STRING_ANYCRLF_RIGHTPAR STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS -#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS -#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS -#define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS -#define STRING_UTF16_RIGHTPAR STR_U STR_T STR_F STR_1 STR_6 STR_RIGHT_PARENTHESIS -#define STRING_UTF32_RIGHTPAR STR_U STR_T STR_F STR_3 STR_2 STR_RIGHT_PARENTHESIS 
-#define STRING_UTF_RIGHTPAR STR_U STR_T STR_F STR_RIGHT_PARENTHESIS -#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS -#define STRING_NO_AUTO_POSSESS_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_A STR_U STR_T STR_O STR_UNDERSCORE STR_P STR_O STR_S STR_S STR_E STR_S STR_S STR_RIGHT_PARENTHESIS -#define STRING_NO_START_OPT_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_S STR_T STR_A STR_R STR_T STR_UNDERSCORE STR_O STR_P STR_T STR_RIGHT_PARENTHESIS -#define STRING_LIMIT_MATCH_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_M STR_A STR_T STR_C STR_H STR_EQUALS_SIGN -#define STRING_LIMIT_RECURSION_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_R STR_E STR_C STR_U STR_R STR_S STR_I STR_O STR_N STR_EQUALS_SIGN - -#endif /* SUPPORT_UTF */ - -/* Escape items that are just an encoding of a particular data value. */ - -#ifndef ESC_a -#define ESC_a CHAR_BEL -#endif - -#ifndef ESC_e -#define ESC_e CHAR_ESC -#endif - -#ifndef ESC_f -#define ESC_f CHAR_FF -#endif - -#ifndef ESC_n -#define ESC_n CHAR_LF -#endif - -#ifndef ESC_r -#define ESC_r CHAR_CR -#endif - -/* We can't officially use ESC_t because it is a POSIX reserved identifier -(presumably because of all the others like size_t). */ - -#ifndef ESC_tee -#define ESC_tee CHAR_HT -#endif - -/* Codes for different types of Unicode property */ - -#define PT_ANY 0 /* Any property - matches all chars */ -#define PT_LAMP 1 /* L& - the union of Lu, Ll, Lt */ -#define PT_GC 2 /* Specified general characteristic (e.g. L) */ -#define PT_PC 3 /* Specified particular characteristic (e.g. Lu) */ -#define PT_SC 4 /* Script (e.g. 
Han) */ -#define PT_ALNUM 5 /* Alphanumeric - the union of L and N */ -#define PT_SPACE 6 /* Perl space - Z plus 9,10,12,13 */ -#define PT_PXSPACE 7 /* POSIX space - Z plus 9,10,11,12,13 */ -#define PT_WORD 8 /* Word - L plus N plus underscore */ -#define PT_CLIST 9 /* Pseudo-property: match character list */ -#define PT_UCNC 10 /* Universal Character nameable character */ -#define PT_TABSIZE 11 /* Size of square table for autopossessify tests */ - -/* The following special properties are used only in XCLASS items, when POSIX -classes are specified and PCRE_UCP is set - in other words, for Unicode -handling of these classes. They are not available via the \p or \P escapes like -those in the above list, and so they do not take part in the autopossessifying -table. */ - -#define PT_PXGRAPH 11 /* [:graph:] - characters that mark the paper */ -#define PT_PXPRINT 12 /* [:print:] - [:graph:] plus non-control spaces */ -#define PT_PXPUNCT 13 /* [:punct:] - punctuation characters */ - -/* Flag bits and data types for the extended class (OP_XCLASS) for classes that -contain characters with values greater than 255. */ - -#define XCL_NOT 0x01 /* Flag: this is a negative class */ -#define XCL_MAP 0x02 /* Flag: a 32-byte map is present */ -#define XCL_HASPROP 0x04 /* Flag: property checks are present. */ - -#define XCL_END 0 /* Marks end of individual items */ -#define XCL_SINGLE 1 /* Single item (one multibyte char) follows */ -#define XCL_RANGE 2 /* A range (two multibyte chars) follows */ -#define XCL_PROP 3 /* Unicode property (2-byte property code follows) */ -#define XCL_NOTPROP 4 /* Unicode inverted property (ditto) */ - -/* These are escaped items that aren't just an encoding of a particular data -value such as \n. They must have non-zero values, as check_escape() returns 0 -for a data character. Also, they must appear in the same order as in the -opcode definitions below, up to ESC_z. There's a dummy for OP_ALLANY because it -corresponds to "." 
in DOTALL mode rather than an escape sequence. It is also -used for [^] in JavaScript compatibility mode, and for \C in non-utf mode. In -non-DOTALL mode, "." behaves like \N. - -The special values ESC_DU, ESC_du, etc. are used instead of ESC_D, ESC_d, etc. -when PCRE_UCP is set and replacement of \d etc by \p sequences is required. -They must be contiguous, and remain in order so that the replacements can be -looked up from a table. - -Negative numbers are used to encode a backreference (\1, \2, \3, etc.) in -check_escape(). There are two tests in the code for an escape -greater than ESC_b and less than ESC_Z to detect the types that may be -repeated. These are the types that consume characters. If any new escapes are -put in between that don't consume a character, that code will have to change. -*/ - -enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s, - ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H, - ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z, - ESC_E, ESC_Q, ESC_g, ESC_k, - ESC_DU, ESC_du, ESC_SU, ESC_su, ESC_WU, ESC_wu }; - - -/********************** Opcode definitions ******************/ - -/****** NOTE NOTE NOTE ****** - -Starting from 1 (i.e. after OP_END), the values up to OP_EOD must correspond in -order to the list of escapes immediately above. Furthermore, values up to -OP_DOLLM must not be changed without adjusting the table called autoposstab in -pcre_compile.c - -Whenever this list is updated, the two macro definitions that follow must be -updated to match. The possessification table called "opcode_possessify" in -pcre_compile.c must also be updated, and also the tables called "coptable" -and "poptable" in pcre_dfa_exec.c. - -****** NOTE NOTE NOTE ******/ - - -/* The values between FIRST_AUTOTAB_OP and LAST_AUTOTAB_RIGHT_OP, inclusive, -are used in a table for deciding whether a repeated character type can be -auto-possessified. 
*/ - -#define FIRST_AUTOTAB_OP OP_NOT_DIGIT -#define LAST_AUTOTAB_LEFT_OP OP_EXTUNI -#define LAST_AUTOTAB_RIGHT_OP OP_DOLLM - -enum { - OP_END, /* 0 End of pattern */ - - /* Values corresponding to backslashed metacharacters */ - - OP_SOD, /* 1 Start of data: \A */ - OP_SOM, /* 2 Start of match (subject + offset): \G */ - OP_SET_SOM, /* 3 Set start of match (\K) */ - OP_NOT_WORD_BOUNDARY, /* 4 \B */ - OP_WORD_BOUNDARY, /* 5 \b */ - OP_NOT_DIGIT, /* 6 \D */ - OP_DIGIT, /* 7 \d */ - OP_NOT_WHITESPACE, /* 8 \S */ - OP_WHITESPACE, /* 9 \s */ - OP_NOT_WORDCHAR, /* 10 \W */ - OP_WORDCHAR, /* 11 \w */ - - OP_ANY, /* 12 Match any character except newline (\N) */ - OP_ALLANY, /* 13 Match any character */ - OP_ANYBYTE, /* 14 Match any byte (\C); different to OP_ANY for UTF-8 */ - OP_NOTPROP, /* 15 \P (not Unicode property) */ - OP_PROP, /* 16 \p (Unicode property) */ - OP_ANYNL, /* 17 \R (any newline sequence) */ - OP_NOT_HSPACE, /* 18 \H (not horizontal whitespace) */ - OP_HSPACE, /* 19 \h (horizontal whitespace) */ - OP_NOT_VSPACE, /* 20 \V (not vertical whitespace) */ - OP_VSPACE, /* 21 \v (vertical whitespace) */ - OP_EXTUNI, /* 22 \X (extended Unicode sequence */ - OP_EODN, /* 23 End of data or \n at end of data (\Z) */ - OP_EOD, /* 24 End of data (\z) */ - - /* Line end assertions */ - - OP_DOLL, /* 25 End of line - not multiline */ - OP_DOLLM, /* 26 End of line - multiline */ - OP_CIRC, /* 27 Start of line - not multiline */ - OP_CIRCM, /* 28 Start of line - multiline */ - - /* Single characters; caseful must precede the caseless ones */ - - OP_CHAR, /* 29 Match one character, casefully */ - OP_CHARI, /* 30 Match one character, caselessly */ - OP_NOT, /* 31 Match one character, not the given one, casefully */ - OP_NOTI, /* 32 Match one character, not the given one, caselessly */ - - /* The following sets of 13 opcodes must always be kept in step because - the offset from the first one is used to generate the others. 
*/ - - /* Repeated characters; caseful must precede the caseless ones */ - - OP_STAR, /* 33 The maximizing and minimizing versions of */ - OP_MINSTAR, /* 34 these six opcodes must come in pairs, with */ - OP_PLUS, /* 35 the minimizing one second. */ - OP_MINPLUS, /* 36 */ - OP_QUERY, /* 37 */ - OP_MINQUERY, /* 38 */ - - OP_UPTO, /* 39 From 0 to n matches of one character, caseful*/ - OP_MINUPTO, /* 40 */ - OP_EXACT, /* 41 Exactly n matches */ - - OP_POSSTAR, /* 42 Possessified star, caseful */ - OP_POSPLUS, /* 43 Possessified plus, caseful */ - OP_POSQUERY, /* 44 Posesssified query, caseful */ - OP_POSUPTO, /* 45 Possessified upto, caseful */ - - /* Repeated characters; caseless must follow the caseful ones */ - - OP_STARI, /* 46 */ - OP_MINSTARI, /* 47 */ - OP_PLUSI, /* 48 */ - OP_MINPLUSI, /* 49 */ - OP_QUERYI, /* 50 */ - OP_MINQUERYI, /* 51 */ - - OP_UPTOI, /* 52 From 0 to n matches of one character, caseless */ - OP_MINUPTOI, /* 53 */ - OP_EXACTI, /* 54 */ - - OP_POSSTARI, /* 55 Possessified star, caseless */ - OP_POSPLUSI, /* 56 Possessified plus, caseless */ - OP_POSQUERYI, /* 57 Posesssified query, caseless */ - OP_POSUPTOI, /* 58 Possessified upto, caseless */ - - /* The negated ones must follow the non-negated ones, and match them */ - /* Negated repeated character, caseful; must precede the caseless ones */ - - OP_NOTSTAR, /* 59 The maximizing and minimizing versions of */ - OP_NOTMINSTAR, /* 60 these six opcodes must come in pairs, with */ - OP_NOTPLUS, /* 61 the minimizing one second. They must be in */ - OP_NOTMINPLUS, /* 62 exactly the same order as those above. 
*/ - OP_NOTQUERY, /* 63 */ - OP_NOTMINQUERY, /* 64 */ - - OP_NOTUPTO, /* 65 From 0 to n matches, caseful */ - OP_NOTMINUPTO, /* 66 */ - OP_NOTEXACT, /* 67 Exactly n matches */ - - OP_NOTPOSSTAR, /* 68 Possessified versions, caseful */ - OP_NOTPOSPLUS, /* 69 */ - OP_NOTPOSQUERY, /* 70 */ - OP_NOTPOSUPTO, /* 71 */ - - /* Negated repeated character, caseless; must follow the caseful ones */ - - OP_NOTSTARI, /* 72 */ - OP_NOTMINSTARI, /* 73 */ - OP_NOTPLUSI, /* 74 */ - OP_NOTMINPLUSI, /* 75 */ - OP_NOTQUERYI, /* 76 */ - OP_NOTMINQUERYI, /* 77 */ - - OP_NOTUPTOI, /* 78 From 0 to n matches, caseless */ - OP_NOTMINUPTOI, /* 79 */ - OP_NOTEXACTI, /* 80 Exactly n matches */ - - OP_NOTPOSSTARI, /* 81 Possessified versions, caseless */ - OP_NOTPOSPLUSI, /* 82 */ - OP_NOTPOSQUERYI, /* 83 */ - OP_NOTPOSUPTOI, /* 84 */ - - /* Character types */ - - OP_TYPESTAR, /* 85 The maximizing and minimizing versions of */ - OP_TYPEMINSTAR, /* 86 these six opcodes must come in pairs, with */ - OP_TYPEPLUS, /* 87 the minimizing one second. These codes must */ - OP_TYPEMINPLUS, /* 88 be in exactly the same order as those above. */ - OP_TYPEQUERY, /* 89 */ - OP_TYPEMINQUERY, /* 90 */ - - OP_TYPEUPTO, /* 91 From 0 to n matches */ - OP_TYPEMINUPTO, /* 92 */ - OP_TYPEEXACT, /* 93 Exactly n matches */ - - OP_TYPEPOSSTAR, /* 94 Possessified versions */ - OP_TYPEPOSPLUS, /* 95 */ - OP_TYPEPOSQUERY, /* 96 */ - OP_TYPEPOSUPTO, /* 97 */ - - /* These are used for character classes and back references; only the - first six are the same as the sets above. */ - - OP_CRSTAR, /* 98 The maximizing and minimizing versions of */ - OP_CRMINSTAR, /* 99 all these opcodes must come in pairs, with */ - OP_CRPLUS, /* 100 the minimizing one second. These codes must */ - OP_CRMINPLUS, /* 101 be in exactly the same order as those above. */ - OP_CRQUERY, /* 102 */ - OP_CRMINQUERY, /* 103 */ - - OP_CRRANGE, /* 104 These are different to the three sets above. 
*/ - OP_CRMINRANGE, /* 105 */ - - OP_CRPOSSTAR, /* 106 Possessified versions */ - OP_CRPOSPLUS, /* 107 */ - OP_CRPOSQUERY, /* 108 */ - OP_CRPOSRANGE, /* 109 */ - - /* End of quantifier opcodes */ - - OP_CLASS, /* 110 Match a character class, chars < 256 only */ - OP_NCLASS, /* 111 Same, but the bitmap was created from a negative - class - the difference is relevant only when a - character > 255 is encountered. */ - OP_XCLASS, /* 112 Extended class for handling > 255 chars within the - class. This does both positive and negative. */ - OP_REF, /* 113 Match a back reference, casefully */ - OP_REFI, /* 114 Match a back reference, caselessly */ - OP_DNREF, /* 115 Match a duplicate name backref, casefully */ - OP_DNREFI, /* 116 Match a duplicate name backref, caselessly */ - OP_RECURSE, /* 117 Match a numbered subpattern (possibly recursive) */ - OP_CALLOUT, /* 118 Call out to external function if provided */ - - OP_ALT, /* 119 Start of alternation */ - OP_KET, /* 120 End of group that doesn't have an unbounded repeat */ - OP_KETRMAX, /* 121 These two must remain together and in this */ - OP_KETRMIN, /* 122 order. They are for groups the repeat for ever. */ - OP_KETRPOS, /* 123 Possessive unlimited repeat. */ - - /* The assertions must come before BRA, CBRA, ONCE, and COND, and the four - asserts must remain in order. */ - - OP_REVERSE, /* 124 Move pointer back - used in lookbehind assertions */ - OP_ASSERT, /* 125 Positive lookahead */ - OP_ASSERT_NOT, /* 126 Negative lookahead */ - OP_ASSERTBACK, /* 127 Positive lookbehind */ - OP_ASSERTBACK_NOT, /* 128 Negative lookbehind */ - - /* ONCE, ONCE_NC, BRA, BRAPOS, CBRA, CBRAPOS, and COND must come immediately - after the assertions, with ONCE first, as there's a test for >= ONCE for a - subpattern that isn't an assertion. The POS versions must immediately follow - the non-POS versions in each case. 
*/ - - OP_ONCE, /* 129 Atomic group, contains captures */ - OP_ONCE_NC, /* 130 Atomic group containing no captures */ - OP_BRA, /* 131 Start of non-capturing bracket */ - OP_BRAPOS, /* 132 Ditto, with unlimited, possessive repeat */ - OP_CBRA, /* 133 Start of capturing bracket */ - OP_CBRAPOS, /* 134 Ditto, with unlimited, possessive repeat */ - OP_COND, /* 135 Conditional group */ - - /* These five must follow the previous five, in the same order. There's a - check for >= SBRA to distinguish the two sets. */ - - OP_SBRA, /* 136 Start of non-capturing bracket, check empty */ - OP_SBRAPOS, /* 137 Ditto, with unlimited, possessive repeat */ - OP_SCBRA, /* 138 Start of capturing bracket, check empty */ - OP_SCBRAPOS, /* 139 Ditto, with unlimited, possessive repeat */ - OP_SCOND, /* 140 Conditional group, check empty */ - - /* The next two pairs must (respectively) be kept together. */ - - OP_CREF, /* 141 Used to hold a capture number as condition */ - OP_DNCREF, /* 142 Used to point to duplicate names as a condition */ - OP_RREF, /* 143 Used to hold a recursion number as condition */ - OP_DNRREF, /* 144 Used to point to duplicate names as a condition */ - OP_DEF, /* 145 The DEFINE condition */ - - OP_BRAZERO, /* 146 These two must remain together and in this */ - OP_BRAMINZERO, /* 147 order. 
*/ - OP_BRAPOSZERO, /* 148 */ - - /* These are backtracking control verbs */ - - OP_MARK, /* 149 always has an argument */ - OP_PRUNE, /* 150 */ - OP_PRUNE_ARG, /* 151 same, but with argument */ - OP_SKIP, /* 152 */ - OP_SKIP_ARG, /* 153 same, but with argument */ - OP_THEN, /* 154 */ - OP_THEN_ARG, /* 155 same, but with argument */ - OP_COMMIT, /* 156 */ - - /* These are forced failure and success verbs */ - - OP_FAIL, /* 157 */ - OP_ACCEPT, /* 158 */ - OP_ASSERT_ACCEPT, /* 159 Used inside assertions */ - OP_CLOSE, /* 160 Used before OP_ACCEPT to close open captures */ - - /* This is used to skip a subpattern with a {0} quantifier */ - - OP_SKIPZERO, /* 161 */ - - /* This is not an opcode, but is used to check that tables indexed by opcode - are the correct length, in order to catch updating errors - there have been - some in the past. */ - - OP_TABLE_LENGTH -}; - -/* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro -definitions that follow must also be updated to match. There are also tables -called "opcode_possessify" in pcre_compile.c and "coptable" and "poptable" in -pcre_dfa_exec.c that must be updated. */ - - -/* This macro defines textual names for all the opcodes. These are used only -for debugging, and some of them are only partial names. The macro is referenced -only in pcre_printint.c, which fills out the full names in many cases (and in -some cases doesn't actually use these names at all). 
*/ - -#define OP_NAME_LIST \ - "End", "\\A", "\\G", "\\K", "\\B", "\\b", "\\D", "\\d", \ - "\\S", "\\s", "\\W", "\\w", "Any", "AllAny", "Anybyte", \ - "notprop", "prop", "\\R", "\\H", "\\h", "\\V", "\\v", \ - "extuni", "\\Z", "\\z", \ - "$", "$", "^", "^", "char", "chari", "not", "noti", \ - "*", "*?", "+", "+?", "?", "??", \ - "{", "{", "{", \ - "*+","++", "?+", "{", \ - "*", "*?", "+", "+?", "?", "??", \ - "{", "{", "{", \ - "*+","++", "?+", "{", \ - "*", "*?", "+", "+?", "?", "??", \ - "{", "{", "{", \ - "*+","++", "?+", "{", \ - "*", "*?", "+", "+?", "?", "??", \ - "{", "{", "{", \ - "*+","++", "?+", "{", \ - "*", "*?", "+", "+?", "?", "??", "{", "{", "{", \ - "*+","++", "?+", "{", \ - "*", "*?", "+", "+?", "?", "??", "{", "{", \ - "*+","++", "?+", "{", \ - "class", "nclass", "xclass", "Ref", "Refi", "DnRef", "DnRefi", \ - "Recurse", "Callout", \ - "Alt", "Ket", "KetRmax", "KetRmin", "KetRpos", \ - "Reverse", "Assert", "Assert not", "AssertB", "AssertB not", \ - "Once", "Once_NC", \ - "Bra", "BraPos", "CBra", "CBraPos", \ - "Cond", \ - "SBra", "SBraPos", "SCBra", "SCBraPos", \ - "SCond", \ - "Cond ref", "Cond dnref", "Cond rec", "Cond dnrec", "Cond def", \ - "Brazero", "Braminzero", "Braposzero", \ - "*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \ - "*THEN", "*THEN", "*COMMIT", "*FAIL", \ - "*ACCEPT", "*ASSERT_ACCEPT", \ - "Close", "Skip zero" - - -/* This macro defines the length of fixed length operations in the compiled -regex. The lengths are used when searching for specific things, and also in the -debugging printing of a compiled regex. We use a macro so that it can be -defined close to the definitions of the opcodes themselves. - -As things have been extended, some of these are no longer fixed lenths, but are -minima instead. For example, the length of a single-character repeat may vary -in UTF-8 mode. The code that uses this table must know about such things. 
*/ - -#define OP_LENGTHS \ - 1, /* End */ \ - 1, 1, 1, 1, 1, /* \A, \G, \K, \B, \b */ \ - 1, 1, 1, 1, 1, 1, /* \D, \d, \S, \s, \W, \w */ \ - 1, 1, 1, /* Any, AllAny, Anybyte */ \ - 3, 3, /* \P, \p */ \ - 1, 1, 1, 1, 1, /* \R, \H, \h, \V, \v */ \ - 1, /* \X */ \ - 1, 1, 1, 1, 1, 1, /* \Z, \z, $, $M ^, ^M */ \ - 2, /* Char - the minimum length */ \ - 2, /* Chari - the minimum length */ \ - 2, /* not */ \ - 2, /* noti */ \ - /* Positive single-char repeats ** These are */ \ - 2, 2, 2, 2, 2, 2, /* *, *?, +, +?, ?, ?? ** minima in */ \ - 2+IMM2_SIZE, 2+IMM2_SIZE, /* upto, minupto ** mode */ \ - 2+IMM2_SIZE, /* exact */ \ - 2, 2, 2, 2+IMM2_SIZE, /* *+, ++, ?+, upto+ */ \ - 2, 2, 2, 2, 2, 2, /* *I, *?I, +I, +?I, ?I, ??I ** UTF-8 */ \ - 2+IMM2_SIZE, 2+IMM2_SIZE, /* upto I, minupto I */ \ - 2+IMM2_SIZE, /* exact I */ \ - 2, 2, 2, 2+IMM2_SIZE, /* *+I, ++I, ?+I, upto+I */ \ - /* Negative single-char repeats - only for chars < 256 */ \ - 2, 2, 2, 2, 2, 2, /* NOT *, *?, +, +?, ?, ?? */ \ - 2+IMM2_SIZE, 2+IMM2_SIZE, /* NOT upto, minupto */ \ - 2+IMM2_SIZE, /* NOT exact */ \ - 2, 2, 2, 2+IMM2_SIZE, /* Possessive NOT *, +, ?, upto */ \ - 2, 2, 2, 2, 2, 2, /* NOT *I, *?I, +I, +?I, ?I, ??I */ \ - 2+IMM2_SIZE, 2+IMM2_SIZE, /* NOT upto I, minupto I */ \ - 2+IMM2_SIZE, /* NOT exact I */ \ - 2, 2, 2, 2+IMM2_SIZE, /* Possessive NOT *I, +I, ?I, upto I */ \ - /* Positive type repeats */ \ - 2, 2, 2, 2, 2, 2, /* Type *, *?, +, +?, ?, ?? */ \ - 2+IMM2_SIZE, 2+IMM2_SIZE, /* Type upto, minupto */ \ - 2+IMM2_SIZE, /* Type exact */ \ - 2, 2, 2, 2+IMM2_SIZE, /* Possessive *+, ++, ?+, upto+ */ \ - /* Character class & ref repeats */ \ - 1, 1, 1, 1, 1, 1, /* *, *?, +, +?, ?, ?? 
*/ \ - 1+2*IMM2_SIZE, 1+2*IMM2_SIZE, /* CRRANGE, CRMINRANGE */ \ - 1, 1, 1, 1+2*IMM2_SIZE, /* Possessive *+, ++, ?+, CRPOSRANGE */ \ - 1+(32/sizeof(pcre_uchar)), /* CLASS */ \ - 1+(32/sizeof(pcre_uchar)), /* NCLASS */ \ - 0, /* XCLASS - variable length */ \ - 1+IMM2_SIZE, /* REF */ \ - 1+IMM2_SIZE, /* REFI */ \ - 1+2*IMM2_SIZE, /* DNREF */ \ - 1+2*IMM2_SIZE, /* DNREFI */ \ - 1+LINK_SIZE, /* RECURSE */ \ - 2+2*LINK_SIZE, /* CALLOUT */ \ - 1+LINK_SIZE, /* Alt */ \ - 1+LINK_SIZE, /* Ket */ \ - 1+LINK_SIZE, /* KetRmax */ \ - 1+LINK_SIZE, /* KetRmin */ \ - 1+LINK_SIZE, /* KetRpos */ \ - 1+LINK_SIZE, /* Reverse */ \ - 1+LINK_SIZE, /* Assert */ \ - 1+LINK_SIZE, /* Assert not */ \ - 1+LINK_SIZE, /* Assert behind */ \ - 1+LINK_SIZE, /* Assert behind not */ \ - 1+LINK_SIZE, /* ONCE */ \ - 1+LINK_SIZE, /* ONCE_NC */ \ - 1+LINK_SIZE, /* BRA */ \ - 1+LINK_SIZE, /* BRAPOS */ \ - 1+LINK_SIZE+IMM2_SIZE, /* CBRA */ \ - 1+LINK_SIZE+IMM2_SIZE, /* CBRAPOS */ \ - 1+LINK_SIZE, /* COND */ \ - 1+LINK_SIZE, /* SBRA */ \ - 1+LINK_SIZE, /* SBRAPOS */ \ - 1+LINK_SIZE+IMM2_SIZE, /* SCBRA */ \ - 1+LINK_SIZE+IMM2_SIZE, /* SCBRAPOS */ \ - 1+LINK_SIZE, /* SCOND */ \ - 1+IMM2_SIZE, 1+2*IMM2_SIZE, /* CREF, DNCREF */ \ - 1+IMM2_SIZE, 1+2*IMM2_SIZE, /* RREF, DNRREF */ \ - 1, /* DEF */ \ - 1, 1, 1, /* BRAZERO, BRAMINZERO, BRAPOSZERO */ \ - 3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \ - 1, 3, /* SKIP, SKIP_ARG */ \ - 1, 3, /* THEN, THEN_ARG */ \ - 1, 1, 1, 1, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ \ - 1+IMM2_SIZE, 1 /* CLOSE, SKIPZERO */ - -/* A magic value for OP_RREF to indicate the "any recursion" condition. */ - -#define RREF_ANY 0xffff - -/* Compile time error code numbers. They are given names so that they can more -easily be tracked. When a new number is added, the table called eint in -pcreposix.c must be updated. 
*/ - -enum { ERR0, ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9, - ERR10, ERR11, ERR12, ERR13, ERR14, ERR15, ERR16, ERR17, ERR18, ERR19, - ERR20, ERR21, ERR22, ERR23, ERR24, ERR25, ERR26, ERR27, ERR28, ERR29, - ERR30, ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39, - ERR40, ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49, - ERR50, ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59, - ERR60, ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, - ERR70, ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, - ERR80, ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERRCOUNT }; - -/* JIT compiling modes. The function list is indexed by them. */ - -enum { JIT_COMPILE, JIT_PARTIAL_SOFT_COMPILE, JIT_PARTIAL_HARD_COMPILE, - JIT_NUMBER_OF_COMPILE_MODES }; - -/* The real format of the start of the pcre block; the index of names and the -code vector run on as long as necessary after the end. We store an explicit -offset to the name table so that if a regex is compiled on one host, saved, and -then run on another where the size of pointers is different, all might still -be well. - -The size of the structure must be a multiple of 8 bytes. For the case of -compiled-on-4 and run-on-8, we include an extra pointer that is always NULL so -that there are an even number of pointers which therefore are a multiple of 8 -bytes. - -It is necessary to fork the struct for the 32 bit library, since it needs to -use pcre_uint32 for first_char and req_char. We can't put an ifdef inside the -typedef because pcretest needs access to the struct of the 8-, 16- and 32-bit -variants. - -*** WARNING *** -When new fields are added to these structures, remember to adjust the code in -pcre_byte_order.c that is concerned with swapping the byte order of the fields -when a compiled regex is reloaded on a host with different endianness. 
-*** WARNING *** -There is also similar byte-flipping code in pcretest.c, which is used for -testing the byte-flipping features. It must also be kept in step. -*** WARNING *** -*/ - -typedef struct real_pcre8_or_16 { - pcre_uint32 magic_number; - pcre_uint32 size; /* Total that was malloced */ - pcre_uint32 options; /* Public options */ - pcre_uint32 flags; /* Private flags */ - pcre_uint32 limit_match; /* Limit set from regex */ - pcre_uint32 limit_recursion; /* Limit set from regex */ - pcre_uint16 first_char; /* Starting character */ - pcre_uint16 req_char; /* This character must be seen */ - pcre_uint16 max_lookbehind; /* Longest lookbehind (characters) */ - pcre_uint16 top_bracket; /* Highest numbered group */ - pcre_uint16 top_backref; /* Highest numbered back reference */ - pcre_uint16 name_table_offset; /* Offset to name table that follows */ - pcre_uint16 name_entry_size; /* Size of any name items */ - pcre_uint16 name_count; /* Number of name items */ - pcre_uint16 ref_count; /* Reference count */ - pcre_uint16 dummy1; /* To ensure size is a multiple of 8 */ - pcre_uint16 dummy2; /* To ensure size is a multiple of 8 */ - pcre_uint16 dummy3; /* To ensure size is a multiple of 8 */ - const pcre_uint8 *tables; /* Pointer to tables or NULL for std */ - void *nullpad; /* NULL padding */ -} real_pcre8_or_16; - -typedef struct real_pcre8_or_16 real_pcre; -typedef struct real_pcre8_or_16 real_pcre16; - -typedef struct real_pcre32 { - pcre_uint32 magic_number; - pcre_uint32 size; /* Total that was malloced */ - pcre_uint32 options; /* Public options */ - pcre_uint32 flags; /* Private flags */ - pcre_uint32 limit_match; /* Limit set from regex */ - pcre_uint32 limit_recursion; /* Limit set from regex */ - pcre_uint32 first_char; /* Starting character */ - pcre_uint32 req_char; /* This character must be seen */ - pcre_uint16 max_lookbehind; /* Longest lookbehind (characters) */ - pcre_uint16 top_bracket; /* Highest numbered group */ - pcre_uint16 top_backref; /* 
Highest numbered back reference */ - pcre_uint16 name_table_offset; /* Offset to name table that follows */ - pcre_uint16 name_entry_size; /* Size of any name items */ - pcre_uint16 name_count; /* Number of name items */ - pcre_uint16 ref_count; /* Reference count */ - pcre_uint16 dummy; /* To ensure size is a multiple of 8 */ - const pcre_uint8 *tables; /* Pointer to tables or NULL for std */ - void *nullpad; /* NULL padding */ -} real_pcre32; - -#if defined COMPILE_PCRE8 -#define REAL_PCRE real_pcre -#elif defined COMPILE_PCRE16 -#define REAL_PCRE real_pcre16 -#elif defined COMPILE_PCRE32 -#define REAL_PCRE real_pcre32 -#endif - -/* Assert that the size of REAL_PCRE is divisible by 8 */ -typedef int __assert_real_pcre_size_divisible_8[(sizeof(REAL_PCRE) % 8) == 0 ? 1 : -1]; - -/* Needed in pcretest to access some fields in the real_pcre* structures - * directly. They're unified for 8/16/32 bits since the structs only differ - * after these fields; if that ever changes, need to fork those defines into - * 8/16 and 32 bit versions. */ -#define REAL_PCRE_MAGIC(re) (((REAL_PCRE*)re)->magic_number) -#define REAL_PCRE_SIZE(re) (((REAL_PCRE*)re)->size) -#define REAL_PCRE_OPTIONS(re) (((REAL_PCRE*)re)->options) -#define REAL_PCRE_FLAGS(re) (((REAL_PCRE*)re)->flags) - -/* The format of the block used to store data from pcre_study(). The same -remark (see NOTE above) about extending this structure applies. */ - -typedef struct pcre_study_data { - pcre_uint32 size; /* Total that was malloced */ - pcre_uint32 flags; /* Private flags */ - pcre_uint8 start_bits[32]; /* Starting char bits */ - pcre_uint32 minlength; /* Minimum subject length */ -} pcre_study_data; - -/* Structure for building a chain of open capturing subpatterns during -compiling, so that instructions to close them can be compiled when (*ACCEPT) is -encountered. This is also used to identify subpatterns that contain recursive -back references to themselves, so that they can be made atomic. 
*/ - -typedef struct open_capitem { - struct open_capitem *next; /* Chain link */ - pcre_uint16 number; /* Capture number */ - pcre_uint16 flag; /* Set TRUE if recursive back ref */ -} open_capitem; - -/* Structure for building a list of named groups during the first pass of -compiling. */ - -typedef struct named_group { - const pcre_uchar *name; /* Points to the name in the pattern */ - int length; /* Length of the name */ - pcre_uint32 number; /* Group number */ -} named_group; - -/* Structure for passing "static" information around between the functions -doing the compiling, so that they are thread-safe. */ - -typedef struct compile_data { - const pcre_uint8 *lcc; /* Points to lower casing table */ - const pcre_uint8 *fcc; /* Points to case-flipping table */ - const pcre_uint8 *cbits; /* Points to character type table */ - const pcre_uint8 *ctypes; /* Points to table of type maps */ - const pcre_uchar *start_workspace;/* The start of working space */ - const pcre_uchar *start_code; /* The start of the compiled code */ - const pcre_uchar *start_pattern; /* The start of the pattern */ - const pcre_uchar *end_pattern; /* The end of the pattern */ - pcre_uchar *hwm; /* High watermark of workspace */ - open_capitem *open_caps; /* Chain of open capture items */ - named_group *named_groups; /* Points to vector in pre-compile */ - pcre_uchar *name_table; /* The name/number table */ - int names_found; /* Number of entries so far */ - int name_entry_size; /* Size of each entry */ - int named_group_list_size; /* Number of entries in the list */ - int workspace_size; /* Size of workspace */ - unsigned int bracount; /* Count of capturing parens as we compile */ - int final_bracount; /* Saved value after first pass */ - int max_lookbehind; /* Maximum lookbehind (characters) */ - int top_backref; /* Maximum back reference */ - unsigned int backref_map; /* Bitmap of low back refs */ - unsigned int namedrefcount; /* Number of backreferences by name */ - int parens_depth; /* 
Depth of nested parentheses */ - int assert_depth; /* Depth of nested assertions */ - pcre_uint32 external_options; /* External (initial) options */ - pcre_uint32 external_flags; /* External flag bits to be set */ - int req_varyopt; /* "After variable item" flag for reqbyte */ - BOOL had_accept; /* (*ACCEPT) encountered */ - BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */ - BOOL check_lookbehind; /* Lookbehinds need later checking */ - BOOL dupnames; /* Duplicate names exist */ - BOOL dupgroups; /* Duplicate groups exist: (?| found */ - BOOL iscondassert; /* Next assert is a condition */ - int nltype; /* Newline type */ - int nllen; /* Newline string length */ - pcre_uchar nl[4]; /* Newline string when fixed length */ -} compile_data; - -/* Structure for maintaining a chain of pointers to the currently incomplete -branches, for testing for left recursion while compiling. */ - -typedef struct branch_chain { - struct branch_chain *outer; - pcre_uchar *current_branch; -} branch_chain; - -/* Structure for mutual recursion detection. */ - -typedef struct recurse_check { - struct recurse_check *prev; - const pcre_uchar *group; -} recurse_check; - -/* Structure for items in a linked list that represents an explicit recursive -call within the pattern; used by pcre_exec(). */ - -typedef struct recursion_info { - struct recursion_info *prevrec; /* Previous recursion record (or NULL) */ - unsigned int group_num; /* Number of group that was called */ - int *offset_save; /* Pointer to start of saved offsets */ - int saved_max; /* Number of saved offsets */ - int saved_capture_last; /* Last capture number */ - PCRE_PUCHAR subject_position; /* Position at start of recursion */ -} recursion_info; - -/* A similar structure for pcre_dfa_exec(). 
*/ - -typedef struct dfa_recursion_info { - struct dfa_recursion_info *prevrec; - int group_num; - PCRE_PUCHAR subject_position; -} dfa_recursion_info; - -/* Structure for building a chain of data for holding the values of the subject -pointer at the start of each subpattern, so as to detect when an empty string -has been matched by a subpattern - to break infinite loops; used by -pcre_exec(). */ - -typedef struct eptrblock { - struct eptrblock *epb_prev; - PCRE_PUCHAR epb_saved_eptr; -} eptrblock; - - -/* Structure for passing "static" information around between the functions -doing traditional NFA matching, so that they are thread-safe. */ - -typedef struct match_data { - unsigned long int match_call_count; /* As it says */ - unsigned long int match_limit; /* As it says */ - unsigned long int match_limit_recursion; /* As it says */ - int *offset_vector; /* Offset vector */ - int offset_end; /* One past the end */ - int offset_max; /* The maximum usable for return data */ - int nltype; /* Newline type */ - int nllen; /* Newline string length */ - int name_count; /* Number of names in name table */ - int name_entry_size; /* Size of entry in names table */ - unsigned int skip_arg_count; /* For counting SKIP_ARGs */ - unsigned int ignore_skip_arg; /* For re-run when SKIP arg name not found */ - pcre_uchar *name_table; /* Table of names */ - pcre_uchar nl[4]; /* Newline string when fixed */ - const pcre_uint8 *lcc; /* Points to lower casing table */ - const pcre_uint8 *fcc; /* Points to case-flipping table */ - const pcre_uint8 *ctypes; /* Points to table of type maps */ - BOOL notbol; /* NOTBOL flag */ - BOOL noteol; /* NOTEOL flag */ - BOOL utf; /* UTF-8 / UTF-16 flag */ - BOOL jscript_compat; /* JAVASCRIPT_COMPAT flag */ - BOOL use_ucp; /* PCRE_UCP flag */ - BOOL endonly; /* Dollar not before final \n */ - BOOL notempty; /* Empty string match not wanted */ - BOOL notempty_atstart; /* Empty string match at start not wanted */ - BOOL hitend; /* Hit the end of the 
subject at some point */ - BOOL bsr_anycrlf; /* \R is just any CRLF, not full Unicode */ - BOOL hasthen; /* Pattern contains (*THEN) */ - const pcre_uchar *start_code; /* For use when recursing */ - PCRE_PUCHAR start_subject; /* Start of the subject string */ - PCRE_PUCHAR end_subject; /* End of the subject string */ - PCRE_PUCHAR start_match_ptr; /* Start of matched string */ - PCRE_PUCHAR end_match_ptr; /* Subject position at end match */ - PCRE_PUCHAR start_used_ptr; /* Earliest consulted character */ - int partial; /* PARTIAL options */ - int end_offset_top; /* Highwater mark at end of match */ - pcre_int32 capture_last; /* Most recent capture number + overflow flag */ - int start_offset; /* The start offset value */ - int match_function_type; /* Set for certain special calls of MATCH() */ - eptrblock *eptrchain; /* Chain of eptrblocks for tail recursions */ - int eptrn; /* Next free eptrblock */ - recursion_info *recursive; /* Linked list of recursion data */ - void *callout_data; /* To pass back to callouts */ - const pcre_uchar *mark; /* Mark pointer to pass back on success */ - const pcre_uchar *nomatch_mark;/* Mark pointer to pass back on failure */ - const pcre_uchar *once_target; /* Where to back up to for atomic groups */ -#ifdef NO_RECURSE - void *match_frames_base; /* For remembering malloc'd frames */ -#endif -} match_data; - -/* A similar structure is used for the same purpose by the DFA matching -functions. 
*/ - -typedef struct dfa_match_data { - const pcre_uchar *start_code; /* Start of the compiled pattern */ - const pcre_uchar *start_subject ; /* Start of the subject string */ - const pcre_uchar *end_subject; /* End of subject string */ - const pcre_uchar *start_used_ptr; /* Earliest consulted character */ - const pcre_uint8 *tables; /* Character tables */ - int start_offset; /* The start offset value */ - int moptions; /* Match options */ - int poptions; /* Pattern options */ - int nltype; /* Newline type */ - int nllen; /* Newline string length */ - pcre_uchar nl[4]; /* Newline string when fixed */ - void *callout_data; /* To pass back to callouts */ - dfa_recursion_info *recursive; /* Linked list of recursion data */ -} dfa_match_data; - -/* Bit definitions for entries in the pcre_ctypes table. */ - -#define ctype_space 0x01 -#define ctype_letter 0x02 -#define ctype_digit 0x04 -#define ctype_xdigit 0x08 -#define ctype_word 0x10 /* alphanumeric or '_' */ -#define ctype_meta 0x80 /* regexp meta char or zero (end pattern) */ - -/* Offsets for the bitmap tables in pcre_cbits. Each table contains a set -of bits for a class map. Some classes are built by combining these tables. */ - -#define cbit_space 0 /* [:space:] or \s */ -#define cbit_xdigit 32 /* [:xdigit:] */ -#define cbit_digit 64 /* [:digit:] or \d */ -#define cbit_upper 96 /* [:upper:] */ -#define cbit_lower 128 /* [:lower:] */ -#define cbit_word 160 /* [:word:] or \w */ -#define cbit_graph 192 /* [:graph:] */ -#define cbit_print 224 /* [:print:] */ -#define cbit_punct 256 /* [:punct:] */ -#define cbit_cntrl 288 /* [:cntrl:] */ -#define cbit_length 320 /* Length of the cbits table */ - -/* Offsets of the various tables from the base tables pointer, and -total length. */ - -#define lcc_offset 0 -#define fcc_offset 256 -#define cbits_offset 512 -#define ctypes_offset (cbits_offset + cbit_length) -#define tables_length (ctypes_offset + 256) - -/* Internal function and data prefixes. 
*/ - -#if defined COMPILE_PCRE8 -#ifndef PUBL -#define PUBL(name) pcre_##name -#endif -#ifndef PRIV -#define PRIV(name) _pcre_##name -#endif -#elif defined COMPILE_PCRE16 -#ifndef PUBL -#define PUBL(name) pcre16_##name -#endif -#ifndef PRIV -#define PRIV(name) _pcre16_##name -#endif -#elif defined COMPILE_PCRE32 -#ifndef PUBL -#define PUBL(name) pcre32_##name -#endif -#ifndef PRIV -#define PRIV(name) _pcre32_##name -#endif -#else -#error Unsupported compiling mode -#endif /* COMPILE_PCRE[8|16|32] */ - -/* Layout of the UCP type table that translates property names into types and -codes. Each entry used to point directly to a name, but to reduce the number of -relocations in shared libraries, it now has an offset into a single string -instead. */ - -typedef struct { - pcre_uint16 name_offset; - pcre_uint16 type; - pcre_uint16 value; -} ucp_type_table; - - -/* Internal shared data tables. These are tables that are used by more than one -of the exported public functions. They have to be "external" in the C sense, -but are not part of the PCRE public API. The data for these tables is in the -pcre_tables.c module. */ - -#ifdef COMPILE_PCRE8 -extern const int PRIV(utf8_table1)[]; -extern const int PRIV(utf8_table1_size); -extern const int PRIV(utf8_table2)[]; -extern const int PRIV(utf8_table3)[]; -extern const pcre_uint8 PRIV(utf8_table4)[]; -#endif /* COMPILE_PCRE8 */ - -extern const char PRIV(utt_names)[]; -extern const ucp_type_table PRIV(utt)[]; -extern const int PRIV(utt_size); - -extern const pcre_uint8 PRIV(OP_lengths)[]; -extern const pcre_uint8 PRIV(default_tables)[]; - -extern const pcre_uint32 PRIV(hspace_list)[]; -extern const pcre_uint32 PRIV(vspace_list)[]; - - -/* Internal shared functions. These are functions that are used by more than -one of the exported public functions. They have to be "external" in the C -sense, but are not part of the PCRE public API. */ - -/* String comparison functions. 
*/ -#if defined COMPILE_PCRE8 - -#define STRCMP_UC_UC(str1, str2) \ - strcmp((char *)(str1), (char *)(str2)) -#define STRCMP_UC_C8(str1, str2) \ - strcmp((char *)(str1), (str2)) -#define STRNCMP_UC_UC(str1, str2, num) \ - strncmp((char *)(str1), (char *)(str2), (num)) -#define STRNCMP_UC_C8(str1, str2, num) \ - strncmp((char *)(str1), (str2), (num)) -#define STRLEN_UC(str) strlen((const char *)str) - -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - -extern int PRIV(strcmp_uc_uc)(const pcre_uchar *, - const pcre_uchar *); -extern int PRIV(strcmp_uc_c8)(const pcre_uchar *, - const char *); -extern int PRIV(strncmp_uc_uc)(const pcre_uchar *, - const pcre_uchar *, unsigned int num); -extern int PRIV(strncmp_uc_c8)(const pcre_uchar *, - const char *, unsigned int num); -extern unsigned int PRIV(strlen_uc)(const pcre_uchar *str); - -#define STRCMP_UC_UC(str1, str2) \ - PRIV(strcmp_uc_uc)((str1), (str2)) -#define STRCMP_UC_C8(str1, str2) \ - PRIV(strcmp_uc_c8)((str1), (str2)) -#define STRNCMP_UC_UC(str1, str2, num) \ - PRIV(strncmp_uc_uc)((str1), (str2), (num)) -#define STRNCMP_UC_C8(str1, str2, num) \ - PRIV(strncmp_uc_c8)((str1), (str2), (num)) -#define STRLEN_UC(str) PRIV(strlen_uc)(str) - -#endif /* COMPILE_PCRE[8|16|32] */ - -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16 - -#define STRCMP_UC_UC_TEST(str1, str2) STRCMP_UC_UC(str1, str2) -#define STRCMP_UC_C8_TEST(str1, str2) STRCMP_UC_C8(str1, str2) - -#elif defined COMPILE_PCRE32 - -extern int PRIV(strcmp_uc_uc_utf)(const pcre_uchar *, - const pcre_uchar *); -extern int PRIV(strcmp_uc_c8_utf)(const pcre_uchar *, - const char *); - -#define STRCMP_UC_UC_TEST(str1, str2) \ - (utf ? PRIV(strcmp_uc_uc_utf)((str1), (str2)) : PRIV(strcmp_uc_uc)((str1), (str2))) -#define STRCMP_UC_C8_TEST(str1, str2) \ - (utf ? 
PRIV(strcmp_uc_c8_utf)((str1), (str2)) : PRIV(strcmp_uc_c8)((str1), (str2))) - -#endif /* COMPILE_PCRE[8|16|32] */ - -extern const pcre_uchar *PRIV(find_bracket)(const pcre_uchar *, BOOL, int); -extern BOOL PRIV(is_newline)(PCRE_PUCHAR, int, PCRE_PUCHAR, - int *, BOOL); -extern unsigned int PRIV(ord2utf)(pcre_uint32, pcre_uchar *); -extern int PRIV(valid_utf)(PCRE_PUCHAR, int, int *); -extern BOOL PRIV(was_newline)(PCRE_PUCHAR, int, PCRE_PUCHAR, - int *, BOOL); -extern BOOL PRIV(xclass)(pcre_uint32, const pcre_uchar *, BOOL); - -#ifdef SUPPORT_JIT -extern void PRIV(jit_compile)(const REAL_PCRE *, - PUBL(extra) *, int); -extern int PRIV(jit_exec)(const PUBL(extra) *, - const pcre_uchar *, int, int, int, int *, int); -extern void PRIV(jit_free)(void *); -extern int PRIV(jit_get_size)(void *); -extern const char* PRIV(jit_get_target)(void); -#endif - -/* Unicode character database (UCD) */ - -typedef struct { - pcre_uint8 script; /* ucp_Arabic, etc. */ - pcre_uint8 chartype; /* ucp_Cc, etc. (general categories) */ - pcre_uint8 gbprop; /* ucp_gbControl, etc. (grapheme break property) */ - pcre_uint8 caseset; /* offset to multichar other cases or zero */ - pcre_int32 other_case; /* offset to other case, or zero if none */ -} ucd_record; - -extern const pcre_uint32 PRIV(ucd_caseless_sets)[]; -extern const ucd_record PRIV(ucd_records)[]; -extern const pcre_uint8 PRIV(ucd_stage1)[]; -extern const pcre_uint16 PRIV(ucd_stage2)[]; -extern const pcre_uint32 PRIV(ucp_gentype)[]; -extern const pcre_uint32 PRIV(ucp_gbtable)[]; -#ifdef COMPILE_PCRE32 -extern const ucd_record PRIV(dummy_ucd_record)[]; -#endif -#ifdef SUPPORT_JIT -extern const int PRIV(ucp_typerange)[]; -#endif - -#ifdef SUPPORT_UCP -/* UCD access macros */ - -#define UCD_BLOCK_SIZE 128 -#define REAL_GET_UCD(ch) (PRIV(ucd_records) + \ - PRIV(ucd_stage2)[PRIV(ucd_stage1)[(int)(ch) / UCD_BLOCK_SIZE] * \ - UCD_BLOCK_SIZE + (int)(ch) % UCD_BLOCK_SIZE]) - -#ifdef COMPILE_PCRE32 -#define GET_UCD(ch) ((ch > 0x10ffff)? 
PRIV(dummy_ucd_record) : REAL_GET_UCD(ch)) -#else -#define GET_UCD(ch) REAL_GET_UCD(ch) -#endif - -#define UCD_CHARTYPE(ch) GET_UCD(ch)->chartype -#define UCD_SCRIPT(ch) GET_UCD(ch)->script -#define UCD_CATEGORY(ch) PRIV(ucp_gentype)[UCD_CHARTYPE(ch)] -#define UCD_GRAPHBREAK(ch) GET_UCD(ch)->gbprop -#define UCD_CASESET(ch) GET_UCD(ch)->caseset -#define UCD_OTHERCASE(ch) ((pcre_uint32)((int)ch + (int)(GET_UCD(ch)->other_case))) - -#endif /* SUPPORT_UCP */ - -#endif - -/* End of pcre_internal.h */ diff --git a/src/pcre/pcre_jit_test.c b/src/pcre/pcre_jit_test.c deleted file mode 100644 index 034cb526..00000000 --- a/src/pcre/pcre_jit_test.c +++ /dev/null @@ -1,1765 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Main Library written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - - This JIT compiler regression test program was written by Zoltan Herczeg - Copyright (c) 2010-2012 - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include -#include "pcre.h" - - -#include "pcre_internal.h" - -/* - Letter characters: - \xe6\x92\xad = 0x64ad = 25773 (kanji) - Non-letter characters: - \xc2\xa1 = 0xa1 = (Inverted Exclamation Mark) - \xf3\xa9\xb7\x80 = 0xe9dc0 = 957888 - \xed\xa0\x80 = 55296 = 0xd800 (Invalid UTF character) - \xed\xb0\x80 = 56320 = 0xdc00 (Invalid UTF character) - Newlines: - \xc2\x85 = 0x85 = 133 (NExt Line = NEL) - \xe2\x80\xa8 = 0x2028 = 8232 (Line Separator) - Othercase pairs: - \xc3\xa9 = 0xe9 = 233 (e') - \xc3\x89 = 0xc9 = 201 (E') - \xc3\xa1 = 0xe1 = 225 (a') - \xc3\x81 = 0xc1 = 193 (A') - \x53 = 0x53 = S - \x73 = 0x73 = s - \xc5\xbf = 0x17f = 383 (long S) - \xc8\xba = 0x23a = 570 - \xe2\xb1\xa5 = 0x2c65 = 11365 - \xe1\xbd\xb8 = 0x1f78 = 8056 - \xe1\xbf\xb8 = 0x1ff8 = 8184 - \xf0\x90\x90\x80 = 0x10400 = 66560 - \xf0\x90\x90\xa8 = 0x10428 = 66600 - \xc7\x84 = 0x1c4 = 452 - \xc7\x85 = 0x1c5 = 453 - \xc7\x86 = 0x1c6 = 454 - Caseless sets: - ucp_Armenian - \x{531}-\x{556} -> \x{561}-\x{586} - ucp_Coptic - \x{2c80}-\x{2ce3} -> caseless: XOR 0x1 - ucp_Latin - \x{ff21}-\x{ff3a} -> 
\x{ff41]-\x{ff5a} - - Mark property: - \xcc\x8d = 0x30d = 781 - Special: - \xc2\x80 = 0x80 = 128 (lowest 2 byte character) - \xdf\xbf = 0x7ff = 2047 (highest 2 byte character) - \xe0\xa0\x80 = 0x800 = 2048 (lowest 2 byte character) - \xef\xbf\xbf = 0xffff = 65535 (highest 3 byte character) - \xf0\x90\x80\x80 = 0x10000 = 65536 (lowest 4 byte character) - \xf4\x8f\xbf\xbf = 0x10ffff = 1114111 (highest allowed utf character) -*/ - -static int regression_tests(void); - -int main(void) -{ - int jit = 0; -#if defined SUPPORT_PCRE8 - pcre_config(PCRE_CONFIG_JIT, &jit); -#elif defined SUPPORT_PCRE16 - pcre16_config(PCRE_CONFIG_JIT, &jit); -#elif defined SUPPORT_PCRE32 - pcre32_config(PCRE_CONFIG_JIT, &jit); -#endif - if (!jit) { - printf("JIT must be enabled to run pcre_jit_test\n"); - return 1; - } - return regression_tests(); -} - -/* --------------------------------------------------------------------------------------- */ - -#if !(defined SUPPORT_PCRE8) && !(defined SUPPORT_PCRE16) && !(defined SUPPORT_PCRE32) -#error SUPPORT_PCRE8 or SUPPORT_PCRE16 or SUPPORT_PCRE32 must be defined -#endif - -#define MUA (PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANYCRLF) -#define MUAP (PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANYCRLF | PCRE_UCP) -#define CMUA (PCRE_CASELESS | PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANYCRLF) -#define CMUAP (PCRE_CASELESS | PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANYCRLF | PCRE_UCP) -#define MA (PCRE_MULTILINE | PCRE_NEWLINE_ANYCRLF) -#define MAP (PCRE_MULTILINE | PCRE_NEWLINE_ANYCRLF | PCRE_UCP) -#define CMA (PCRE_CASELESS | PCRE_MULTILINE | PCRE_NEWLINE_ANYCRLF) - -#define OFFSET_MASK 0x00ffff -#define F_NO8 0x010000 -#define F_NO16 0x020000 -#define F_NO32 0x020000 -#define F_NOMATCH 0x040000 -#define F_DIFF 0x080000 -#define F_FORCECONV 0x100000 -#define F_PROPERTY 0x200000 -#define F_STUDY 0x400000 - -struct regression_test_case { - int flags; - int start_offset; - const char *pattern; - const char *input; -}; - -static struct 
regression_test_case regression_test_cases[] = { - /* Constant strings. */ - { MUA, 0, "AbC", "AbAbC" }, - { MUA, 0, "ACCEPT", "AACACCACCEACCEPACCEPTACCEPTT" }, - { CMUA, 0, "aA#\xc3\xa9\xc3\x81", "aA#Aa#\xc3\x89\xc3\xa1" }, - { MA, 0, "[^a]", "aAbB" }, - { CMA, 0, "[^m]", "mMnN" }, - { MA, 0, "a[^b][^#]", "abacd" }, - { CMA, 0, "A[^B][^E]", "abacd" }, - { CMUA, 0, "[^x][^#]", "XxBll" }, - { MUA, 0, "[^a]", "aaa\xc3\xa1#Ab" }, - { CMUA, 0, "[^A]", "aA\xe6\x92\xad" }, - { MUA, 0, "\\W(\\W)?\\w", "\r\n+bc" }, - { MUA, 0, "\\W(\\W)?\\w", "\n\r+bc" }, - { MUA, 0, "\\W(\\W)?\\w", "\r\r+bc" }, - { MUA, 0, "\\W(\\W)?\\w", "\n\n+bc" }, - { MUA, 0, "[axd]", "sAXd" }, - { CMUA, 0, "[axd]", "sAXd" }, - { CMUA, 0 | F_NOMATCH, "[^axd]", "DxA" }, - { MUA, 0, "[a-dA-C]", "\xe6\x92\xad\xc3\xa9.B" }, - { MUA, 0, "[^a-dA-C]", "\xe6\x92\xad\xc3\xa9" }, - { CMUA, 0, "[^\xc3\xa9]", "\xc3\xa9\xc3\x89." }, - { MUA, 0, "[^\xc3\xa9]", "\xc3\xa9\xc3\x89." }, - { MUA, 0, "[^a]", "\xc2\x80[]" }, - { CMUA, 0, "\xf0\x90\x90\xa7", "\xf0\x90\x91\x8f" }, - { CMA, 0, "1a2b3c4", "1a2B3c51A2B3C4" }, - { PCRE_CASELESS, 0, "\xff#a", "\xff#\xff\xfe##\xff#A" }, - { PCRE_CASELESS, 0, "\xfe", "\xff\xfc#\xfe\xfe" }, - { PCRE_CASELESS, 0, "a1", "Aa1" }, - { MA, 0, "\\Ca", "cda" }, - { CMA, 0, "\\Ca", "CDA" }, - { MA, 0 | F_NOMATCH, "\\Cx", "cda" }, - { CMA, 0 | F_NOMATCH, "\\Cx", "CDA" }, - { CMUAP, 0, "\xf0\x90\x90\x80\xf0\x90\x90\xa8", "\xf0\x90\x90\xa8\xf0\x90\x90\x80" }, - { CMUAP, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" }, - { CMUAP, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" }, - { CMUAP, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" }, - { MA, 0, "[3-57-9]", "5" }, - - /* Assertions. */ - { MUA, 0, "\\b[^A]", "A_B#" }, - { MA, 0 | F_NOMATCH, "\\b\\W", "\n*" }, - { MUA, 0, "\\B[^,]\\b[^s]\\b", "#X" }, - { MAP, 0, "\\B", "_\xa1" }, - { MAP, 0, "\\b_\\b[,A]\\B", "_," }, - { MUAP, 0, "\\b", "\xe6\x92\xad!" 
}, - { MUAP, 0, "\\B", "_\xc2\xa1\xc3\xa1\xc2\x85" }, - { MUAP, 0, "\\b[^A]\\B[^c]\\b[^_]\\B", "_\xc3\xa1\xe2\x80\xa8" }, - { MUAP, 0, "\\b\\w+\\B", "\xc3\x89\xc2\xa1\xe6\x92\xad\xc3\x81\xc3\xa1" }, - { MUA, 0 | F_NOMATCH, "\\b.", "\xcd\xbe" }, - { CMUAP, 0, "\\By", "\xf0\x90\x90\xa8y" }, - { MA, 0 | F_NOMATCH, "\\R^", "\n" }, - { MA, 1 | F_NOMATCH, "^", "\n" }, - { 0, 0, "^ab", "ab" }, - { 0, 0 | F_NOMATCH, "^ab", "aab" }, - { PCRE_MULTILINE | PCRE_NEWLINE_CRLF, 0, "^a", "\r\raa\n\naa\r\naa" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANYCRLF, 0, "^-", "\xe2\x80\xa8--\xc2\x85-\r\n-" }, - { PCRE_MULTILINE | PCRE_NEWLINE_ANY, 0, "^-", "a--b--\x85--" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANY, 0, "^-", "a--\xe2\x80\xa8--" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANY, 0, "^-", "a--\xc2\x85--" }, - { 0, 0, "ab$", "ab" }, - { 0, 0 | F_NOMATCH, "ab$", "abab\n\n" }, - { PCRE_DOLLAR_ENDONLY, 0 | F_NOMATCH, "ab$", "abab\r\n" }, - { PCRE_MULTILINE | PCRE_NEWLINE_CRLF, 0, "a$", "\r\raa\n\naa\r\naa" }, - { PCRE_MULTILINE | PCRE_NEWLINE_ANY, 0, "a$", "aaa" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANYCRLF, 0, "#$", "#\xc2\x85###\r#" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANY, 0, "#$", "#\xe2\x80\xa9" }, - { PCRE_NOTBOL | PCRE_NEWLINE_ANY, 0 | F_NOMATCH, "^a", "aa\naa" }, - { PCRE_NOTBOL | PCRE_MULTILINE | PCRE_NEWLINE_ANY, 0, "^a", "aa\naa" }, - { PCRE_NOTEOL | PCRE_NEWLINE_ANY, 0 | F_NOMATCH, "a$", "aa\naa" }, - { PCRE_NOTEOL | PCRE_NEWLINE_ANY, 0 | F_NOMATCH, "a$", "aa\r\n" }, - { PCRE_UTF8 | PCRE_DOLLAR_ENDONLY | PCRE_NEWLINE_ANY, 0 | F_PROPERTY, "\\p{Any}{2,}$", "aa\r\n" }, - { PCRE_NOTEOL | PCRE_MULTILINE | PCRE_NEWLINE_ANY, 0, "a$", "aa\naa" }, - { PCRE_NEWLINE_CR, 0, ".\\Z", "aaa" }, - { PCRE_NEWLINE_CR | PCRE_UTF8, 0, "a\\Z", "aaa\r" }, - { PCRE_NEWLINE_CR, 0, ".\\Z", "aaa\n" }, - { PCRE_NEWLINE_CRLF, 0, ".\\Z", "aaa\r" }, - { PCRE_NEWLINE_CRLF | PCRE_UTF8, 0, ".\\Z", "aaa\n" }, - { PCRE_NEWLINE_CRLF, 0, ".\\Z", "aaa\r\n" }, - { 
PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\r" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\n" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\r\n" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\xe2\x80\xa8" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\r" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\n" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".\\Z", "aaa\r\n" }, - { PCRE_NEWLINE_ANY | PCRE_UTF8, 0, ".\\Z", "aaa\xc2\x85" }, - { PCRE_NEWLINE_ANY | PCRE_UTF8, 0, ".\\Z", "aaa\xe2\x80\xa8" }, - { MA, 0, "\\Aa", "aaa" }, - { MA, 1 | F_NOMATCH, "\\Aa", "aaa" }, - { MA, 1, "\\Ga", "aaa" }, - { MA, 1 | F_NOMATCH, "\\Ga", "aba" }, - { MA, 0, "a\\z", "aaa" }, - { MA, 0 | F_NOMATCH, "a\\z", "aab" }, - - /* Brackets and alternatives. */ - { MUA, 0, "(ab|bb|cd)", "bacde" }, - { MUA, 0, "(?:ab|a)(bc|c)", "ababc" }, - { MUA, 0, "((ab|(cc))|(bb)|(?:cd|efg))", "abac" }, - { CMUA, 0, "((aB|(Cc))|(bB)|(?:cd|EFg))", "AcCe" }, - { MUA, 0, "((ab|(cc))|(bb)|(?:cd|ebg))", "acebebg" }, - { MUA, 0, "(?:(a)|(?:b))(cc|(?:d|e))(a|b)k", "accabdbbccbk" }, - { MUA, 0, "\xc7\x82|\xc6\x82", "\xf1\x83\x82\x82\xc7\x82\xc7\x83" }, - { MUA, 0, "=\xc7\x82|#\xc6\x82", "\xf1\x83\x82\x82=\xc7\x82\xc7\x83" }, - { MUA, 0, "\xc7\x82\xc7\x83|\xc6\x82\xc6\x82", "\xf1\x83\x82\x82\xc7\x82\xc7\x83" }, - { MUA, 0, "\xc6\x82\xc6\x82|\xc7\x83\xc7\x83|\xc8\x84\xc8\x84", "\xf1\x83\x82\x82\xc8\x84\xc8\x84" }, - - /* Greedy and non-greedy ? operators. */ - { MUA, 0, "(?:a)?a", "laab" }, - { CMUA, 0, "(A)?A", "llaab" }, - { MUA, 0, "(a)?\?a", "aab" }, /* ?? is the prefix of trygraphs in GCC. 
*/ - { MUA, 0, "(a)?a", "manm" }, - { CMUA, 0, "(a|b)?\?d((?:e)?)", "ABABdx" }, - { MUA, 0, "(a|b)?\?d((?:e)?)", "abcde" }, - { MUA, 0, "((?:ab)?\?g|b(?:g(nn|d)?\?)?)?\?(?:n)?m", "abgnbgnnbgdnmm" }, - - /* Greedy and non-greedy + operators */ - { MUA, 0, "(aa)+aa", "aaaaaaa" }, - { MUA, 0, "(aa)+?aa", "aaaaaaa" }, - { MUA, 0, "(?:aba|ab|a)+l", "ababamababal" }, - { MUA, 0, "(?:aba|ab|a)+?l", "ababamababal" }, - { MUA, 0, "(a(?:bc|cb|b|c)+?|ss)+e", "accssabccbcacbccbbXaccssabccbcacbccbbe" }, - { MUA, 0, "(a(?:bc|cb|b|c)+|ss)+?e", "accssabccbcacbccbbXaccssabccbcacbccbbe" }, - { MUA, 0, "(?:(b(c)+?)+)?\?(?:(bc)+|(cb)+)+(?:m)+", "bccbcccbcbccbcbPbccbcccbcbccbcbmmn" }, - - /* Greedy and non-greedy * operators */ - { CMUA, 0, "(?:AA)*AB", "aaaaaaamaaaaaaab" }, - { MUA, 0, "(?:aa)*?ab", "aaaaaaamaaaaaaab" }, - { MUA, 0, "(aa|ab)*ab", "aaabaaab" }, - { CMUA, 0, "(aa|Ab)*?aB", "aaabaaab" }, - { MUA, 0, "(a|b)*(?:a)*(?:b)*m", "abbbaaababanabbbaaababamm" }, - { MUA, 0, "(a|b)*?(?:a)*?(?:b)*?m", "abbbaaababanabbbaaababamm" }, - { MA, 0, "a(a(\\1*)a|(b)b+){0}a", "aa" }, - { MA, 0, "((?:a|)*){0}a", "a" }, - - /* Combining ? + * operators */ - { MUA, 0, "((bm)+)?\?(?:a)*(bm)+n|((am)+?)?(?:a)+(am)*n", "bmbmabmamaaamambmaman" }, - { MUA, 0, "(((ab)?cd)*ef)+g", "abcdcdefcdefefmabcdcdefcdefefgg" }, - { MUA, 0, "(((ab)?\?cd)*?ef)+?g", "abcdcdefcdefefmabcdcdefcdefefgg" }, - { MUA, 0, "(?:(ab)?c|(?:ab)+?d)*g", "ababcdccababddg" }, - { MUA, 0, "(?:(?:ab)?\?c|(ab)+d)*?g", "ababcdccababddg" }, - - /* Single character iterators. 
*/ - { MUA, 0, "(a+aab)+aaaab", "aaaabcaaaabaabcaabcaaabaaaab" }, - { MUA, 0, "(a*a*aab)+x", "aaaaabaabaaabmaabx" }, - { MUA, 0, "(a*?(b|ab)a*?)+x", "aaaabcxbbaabaacbaaabaabax" }, - { MUA, 0, "(a+(ab|ad)a+)+x", "aaabaaaadaabaaabaaaadaaax" }, - { MUA, 0, "(a?(a)a?)+(aaa)", "abaaabaaaaaaaa" }, - { MUA, 0, "(a?\?(a)a?\?)+(b)", "aaaacaaacaacacbaaab" }, - { MUA, 0, "(a{0,4}(b))+d", "aaaaaabaabcaaaaabaaaaabd" }, - { MUA, 0, "(a{0,4}?[^b])+d+(a{0,4}[^b])d+", "aaaaadaaaacaadddaaddd" }, - { MUA, 0, "(ba{2})+c", "baabaaabacbaabaac" }, - { MUA, 0, "(a*+bc++)+", "aaabbcaaabcccab" }, - { MUA, 0, "(a?+[^b])+", "babaacacb" }, - { MUA, 0, "(a{0,3}+b)(a{0,3}+b)(a{0,3}+)[^c]", "abaabaaacbaabaaaac" }, - { CMUA, 0, "([a-c]+[d-f]+?)+?g", "aBdacdehAbDaFgA" }, - { CMUA, 0, "[c-f]+k", "DemmFke" }, - { MUA, 0, "([DGH]{0,4}M)+", "GGDGHDGMMHMDHHGHM" }, - { MUA, 0, "([a-c]{4,}s)+", "abasabbasbbaabsbba" }, - { CMUA, 0, "[ace]{3,7}", "AcbDAcEEcEd" }, - { CMUA, 0, "[ace]{3,7}?", "AcbDAcEEcEd" }, - { CMUA, 0, "[ace]{3,}", "AcbDAcEEcEd" }, - { CMUA, 0, "[ace]{3,}?", "AcbDAcEEcEd" }, - { MUA, 0, "[ckl]{2,}?g", "cdkkmlglglkcg" }, - { CMUA, 0, "[ace]{5}?", "AcCebDAcEEcEd" }, - { MUA, 0, "([AbC]{3,5}?d)+", "BACaAbbAEAACCbdCCbdCCAAbb" }, - { MUA, 0, "([^ab]{0,}s){2}", "abaabcdsABamsDDs" }, - { MUA, 0, "\\b\\w+\\B", "x,a_cd" }, - { MUAP, 0, "\\b[^\xc2\xa1]+\\B", "\xc3\x89\xc2\xa1\xe6\x92\xad\xc3\x81\xc3\xa1" }, - { CMUA, 0, "[^b]+(a*)([^c]?d{3})", "aaaaddd" }, - { CMUAP, 0, "\xe1\xbd\xb8{2}", "\xe1\xbf\xb8#\xe1\xbf\xb8\xe1\xbd\xb8" }, - { CMUA, 0, "[^\xf0\x90\x90\x80]{2,4}@", "\xf0\x90\x90\xa8\xf0\x90\x90\x80###\xf0\x90\x90\x80@@@" }, - { CMUA, 0, "[^\xe1\xbd\xb8][^\xc3\xa9]", "\xe1\xbd\xb8\xe1\xbf\xb8\xc3\xa9\xc3\x89#" }, - { MUA, 0, "[^\xe1\xbd\xb8][^\xc3\xa9]", "\xe1\xbd\xb8\xe1\xbf\xb8\xc3\xa9\xc3\x89#" }, - { MUA, 0, "[^\xe1\xbd\xb8]{3,}?", "##\xe1\xbd\xb8#\xe1\xbd\xb8#\xc3\x89#\xe1\xbd\xb8" }, - { MUA, 0, "\\d+123", "987654321,01234" }, - { MUA, 0, "abcd*|\\w+xy", "aaaaa,abxyz" }, - { MUA, 0, 
"(?:abc|((?:amc|\\b\\w*xy)))", "aaaaa,abxyz" }, - { MUA, 0, "a(?R)|([a-z]++)#", ".abcd.abcd#."}, - { MUA, 0, "a(?R)|([a-z]++)#", ".abcd.mbcd#."}, - { MUA, 0, ".[ab]*.", "xx" }, - { MUA, 0, ".[ab]*a", "xxa" }, - { MUA, 0, ".[ab]?.", "xx" }, - - /* Bracket repeats with limit. */ - { MUA, 0, "(?:(ab){2}){5}M", "abababababababababababM" }, - { MUA, 0, "(?:ab|abab){1,5}M", "abababababababababababM" }, - { MUA, 0, "(?>ab|abab){1,5}M", "abababababababababababM" }, - { MUA, 0, "(?:ab|abab){1,5}?M", "abababababababababababM" }, - { MUA, 0, "(?>ab|abab){1,5}?M", "abababababababababababM" }, - { MUA, 0, "(?:(ab){1,4}?){1,3}?M", "abababababababababababababM" }, - { MUA, 0, "(?:(ab){1,4}){1,3}abababababababababababM", "ababababababababababababM" }, - { MUA, 0 | F_NOMATCH, "(?:(ab){1,4}){1,3}abababababababababababM", "abababababababababababM" }, - { MUA, 0, "(ab){4,6}?M", "abababababababM" }, - - /* Basic character sets. */ - { MUA, 0, "(?:\\s)+(?:\\S)+", "ab \t\xc3\xa9\xe6\x92\xad " }, - { MUA, 0, "(\\w)*(k)(\\W)?\?", "abcdef abck11" }, - { MUA, 0, "\\((\\d)+\\)\\D", "a() (83 (8)2 (9)ab" }, - { MUA, 0, "\\w(\\s|(?:\\d)*,)+\\w\\wb", "a 5, 4,, bb 5, 4,, aab" }, - { MUA, 0, "(\\v+)(\\V+)", "\x0e\xc2\x85\xe2\x80\xa8\x0b\x09\xe2\x80\xa9" }, - { MUA, 0, "(\\h+)(\\H+)", "\xe2\x80\xa8\xe2\x80\x80\x20\xe2\x80\x8a\xe2\x81\x9f\xe3\x80\x80\x09\x20\xc2\xa0\x0a" }, - { MUA, 0, "x[bcef]+", "xaxdxecbfg" }, - { MUA, 0, "x[bcdghij]+", "xaxexfxdgbjk" }, - { MUA, 0, "x[^befg]+", "xbxexacdhg" }, - { MUA, 0, "x[^bcdl]+", "xlxbxaekmd" }, - { MUA, 0, "x[^bcdghi]+", "xbxdxgxaefji" }, - { MUA, 0, "x[B-Fb-f]+", "xaxAxgxbfBFG" }, - { CMUA, 0, "\\x{e9}+", "#\xf0\x90\x90\xa8\xc3\xa8\xc3\xa9\xc3\x89\xc3\x88" }, - { CMUA, 0, "[^\\x{e9}]+", "\xc3\xa9#\xf0\x90\x90\xa8\xc3\xa8\xc3\x88\xc3\x89" }, - { MUA, 0, "[\\x02\\x7e]+", "\xc3\x81\xe1\xbf\xb8\xf0\x90\x90\xa8\x01\x02\x7e\x7f" }, - { MUA, 0, "[^\\x02\\x7e]+", "\x02\xc3\x81\xe1\xbf\xb8\xf0\x90\x90\xa8\x01\x7f\x7e" }, - { MUA, 0, "[\\x{81}-\\x{7fe}]+", 
"#\xe1\xbf\xb8\xf0\x90\x90\xa8\xc2\x80\xc2\x81\xdf\xbe\xdf\xbf" }, - { MUA, 0, "[^\\x{81}-\\x{7fe}]+", "\xc2\x81#\xe1\xbf\xb8\xf0\x90\x90\xa8\xc2\x80\xdf\xbf\xdf\xbe" }, - { MUA, 0, "[\\x{801}-\\x{fffe}]+", "#\xc3\xa9\xf0\x90\x90\x80\xe0\xa0\x80\xe0\xa0\x81\xef\xbf\xbe\xef\xbf\xbf" }, - { MUA, 0, "[^\\x{801}-\\x{fffe}]+", "\xe0\xa0\x81#\xc3\xa9\xf0\x90\x90\x80\xe0\xa0\x80\xef\xbf\xbf\xef\xbf\xbe" }, - { MUA, 0, "[\\x{10001}-\\x{10fffe}]+", "#\xc3\xa9\xe2\xb1\xa5\xf0\x90\x80\x80\xf0\x90\x80\x81\xf4\x8f\xbf\xbe\xf4\x8f\xbf\xbf" }, - { MUA, 0, "[^\\x{10001}-\\x{10fffe}]+", "\xf0\x90\x80\x81#\xc3\xa9\xe2\xb1\xa5\xf0\x90\x80\x80\xf4\x8f\xbf\xbf\xf4\x8f\xbf\xbe" }, - - /* Unicode properties. */ - { MUAP, 0, "[1-5\xc3\xa9\\w]", "\xc3\xa1_" }, - { MUAP, 0 | F_PROPERTY, "[\xc3\x81\\p{Ll}]", "A_\xc3\x89\xc3\xa1" }, - { MUAP, 0, "[\\Wd-h_x-z]+", "a\xc2\xa1#_yhzdxi" }, - { MUAP, 0 | F_NOMATCH | F_PROPERTY, "[\\P{Any}]", "abc" }, - { MUAP, 0 | F_NOMATCH | F_PROPERTY, "[^\\p{Any}]", "abc" }, - { MUAP, 0 | F_NOMATCH | F_PROPERTY, "[\\P{Any}\xc3\xa1-\xc3\xa8]", "abc" }, - { MUAP, 0 | F_NOMATCH | F_PROPERTY, "[^\\p{Any}\xc3\xa1-\xc3\xa8]", "abc" }, - { MUAP, 0 | F_NOMATCH | F_PROPERTY, "[\xc3\xa1-\xc3\xa8\\P{Any}]", "abc" }, - { MUAP, 0 | F_NOMATCH | F_PROPERTY, "[^\xc3\xa1-\xc3\xa8\\p{Any}]", "abc" }, - { MUAP, 0 | F_PROPERTY, "[\xc3\xa1-\xc3\xa8\\p{Any}]", "abc" }, - { MUAP, 0 | F_PROPERTY, "[^\xc3\xa1-\xc3\xa8\\P{Any}]", "abc" }, - { MUAP, 0, "[b-\xc3\xa9\\s]", "a\xc\xe6\x92\xad" }, - { CMUAP, 0, "[\xc2\x85-\xc2\x89\xc3\x89]", "\xc2\x84\xc3\xa9" }, - { MUAP, 0, "[^b-d^&\\s]{3,}", "db^ !a\xe2\x80\xa8_ae" }, - { MUAP, 0 | F_PROPERTY, "[^\\S\\P{Any}][\\sN]{1,3}[\\P{N}]{4}", "\xe2\x80\xaa\xa N\x9\xc3\xa9_0" }, - { MUA, 0 | F_PROPERTY, "[^\\P{L}\x9!D-F\xa]{2,3}", "\x9,.DF\xa.CG\xc3\x81" }, - { CMUAP, 0, "[\xc3\xa1-\xc3\xa9_\xe2\x80\xa0-\xe2\x80\xaf]{1,5}[^\xe2\x80\xa0-\xe2\x80\xaf]", "\xc2\xa1\xc3\x89\xc3\x89\xe2\x80\xaf_\xe2\x80\xa0" }, - { MUAP, 0 | F_PROPERTY, 
"[\xc3\xa2-\xc3\xa6\xc3\x81-\xc3\x84\xe2\x80\xa8-\xe2\x80\xa9\xe6\x92\xad\\p{Zs}]{2,}", "\xe2\x80\xa7\xe2\x80\xa9\xe6\x92\xad \xe6\x92\xae" }, - { MUAP, 0 | F_PROPERTY, "[\\P{L&}]{2}[^\xc2\x85-\xc2\x89\\p{Ll}\\p{Lu}]{2}", "\xc3\xa9\xe6\x92\xad.a\xe6\x92\xad|\xc2\x8a#" }, - { PCRE_UCP, 0, "[a-b\\s]{2,5}[^a]", "AB baaa" }, - - /* Possible empty brackets. */ - { MUA, 0, "(?:|ab||bc|a)+d", "abcxabcabd" }, - { MUA, 0, "(|ab||bc|a)+d", "abcxabcabd" }, - { MUA, 0, "(?:|ab||bc|a)*d", "abcxabcabd" }, - { MUA, 0, "(|ab||bc|a)*d", "abcxabcabd" }, - { MUA, 0, "(?:|ab||bc|a)+?d", "abcxabcabd" }, - { MUA, 0, "(|ab||bc|a)+?d", "abcxabcabd" }, - { MUA, 0, "(?:|ab||bc|a)*?d", "abcxabcabd" }, - { MUA, 0, "(|ab||bc|a)*?d", "abcxabcabd" }, - { MUA, 0, "(((a)*?|(?:ba)+)+?|(?:|c|ca)*)*m", "abaacaccabacabalabaacaccabacabamm" }, - { MUA, 0, "(?:((?:a)*|(ba)+?)+|(|c|ca)*?)*?m", "abaacaccabacabalabaacaccabacabamm" }, - - /* Start offset. */ - { MUA, 3, "(\\d|(?:\\w)*\\w)+", "0ac01Hb" }, - { MUA, 4 | F_NOMATCH, "(\\w\\W\\w)+", "ab#d" }, - { MUA, 2 | F_NOMATCH, "(\\w\\W\\w)+", "ab#d" }, - { MUA, 1, "(\\w\\W\\w)+", "ab#d" }, - - /* Newline. */ - { PCRE_MULTILINE | PCRE_NEWLINE_CRLF, 0, "\\W{0,2}[^#]{3}", "\r\n#....." }, - { PCRE_MULTILINE | PCRE_NEWLINE_CR, 0, "\\W{0,2}[^#]{3}", "\r\n#....." }, - { PCRE_MULTILINE | PCRE_NEWLINE_CRLF, 0, "\\W{1,3}[^#]", "\r\n##...." }, - { MUA | PCRE_NO_UTF8_CHECK, 1, "^.a", "\n\x80\nxa" }, - { MUA, 1, "^", "\r\n" }, - { PCRE_MULTILINE | PCRE_NEWLINE_CRLF, 1 | F_NOMATCH, "^", "\r\n" }, - { PCRE_MULTILINE | PCRE_NEWLINE_CRLF, 1, "^", "\r\na" }, - - /* Any character except newline or any newline. 
*/ - { PCRE_NEWLINE_CRLF, 0, ".", "\r" }, - { PCRE_NEWLINE_CRLF | PCRE_UTF8, 0, ".(.).", "a\xc3\xa1\r\n\n\r\r" }, - { PCRE_NEWLINE_ANYCRLF, 0, ".(.)", "a\rb\nc\r\n\xc2\x85\xe2\x80\xa8" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0, ".(.)", "a\rb\nc\r\n\xc2\x85\xe2\x80\xa8" }, - { PCRE_NEWLINE_ANY | PCRE_UTF8, 0, "(.).", "a\rb\nc\r\n\xc2\x85\xe2\x80\xa9$de" }, - { PCRE_NEWLINE_ANYCRLF | PCRE_UTF8, 0 | F_NOMATCH, ".(.).", "\xe2\x80\xa8\nb\r" }, - { PCRE_NEWLINE_ANY, 0, "(.)(.)", "#\x85#\r#\n#\r\n#\x84" }, - { PCRE_NEWLINE_ANY | PCRE_UTF8, 0, "(.+)#", "#\rMn\xc2\x85#\n###" }, - { PCRE_BSR_ANYCRLF, 0, "\\R", "\r" }, - { PCRE_BSR_ANYCRLF, 0, "\\R", "\x85#\r\n#" }, - { PCRE_BSR_UNICODE | PCRE_UTF8, 0, "\\R", "ab\xe2\x80\xa8#c" }, - { PCRE_BSR_UNICODE | PCRE_UTF8, 0, "\\R", "ab\r\nc" }, - { PCRE_NEWLINE_CRLF | PCRE_BSR_UNICODE | PCRE_UTF8, 0, "(\\R.)+", "\xc2\x85\r\n#\xe2\x80\xa8\n\r\n\r" }, - { MUA, 0 | F_NOMATCH, "\\R+", "ab" }, - { MUA, 0, "\\R+", "ab\r\n\r" }, - { MUA, 0, "\\R*", "ab\r\n\r" }, - { MUA, 0, "\\R*", "\r\n\r" }, - { MUA, 0, "\\R{2,4}", "\r\nab\r\r" }, - { MUA, 0, "\\R{2,4}", "\r\nab\n\n\n\r\r\r" }, - { MUA, 0, "\\R{2,}", "\r\nab\n\n\n\r\r\r" }, - { MUA, 0, "\\R{0,3}", "\r\n\r\n\r\n\r\n\r\n" }, - { MUA, 0 | F_NOMATCH, "\\R+\\R\\R", "\r\n\r\n" }, - { MUA, 0, "\\R+\\R\\R", "\r\r\r" }, - { MUA, 0, "\\R*\\R\\R", "\n\r" }, - { MUA, 0 | F_NOMATCH, "\\R{2,4}\\R\\R", "\r\r\r" }, - { MUA, 0, "\\R{2,4}\\R\\R", "\r\r\r\r" }, - - /* Atomic groups (no fallback from "next" direction). 
*/ - { MUA, 0 | F_NOMATCH, "(?>ab)ab", "bab" }, - { MUA, 0 | F_NOMATCH, "(?>(ab))ab", "bab" }, - { MUA, 0, "(?>ab)+abc(?>de)*def(?>gh)?ghe(?>ij)+?k(?>lm)*?n(?>op)?\?op", - "bababcdedefgheijijklmlmnop" }, - { MUA, 0, "(?>a(b)+a|(ab)?\?(b))an", "abban" }, - { MUA, 0, "(?>ab+a|(?:ab)?\?b)an", "abban" }, - { MUA, 0, "((?>ab|ad|)*?)(?>|c)*abad", "abababcababad" }, - { MUA, 0, "(?>(aa|b|)*+(?>(##)|###)*d|(aa)(?>(baa)?)m)", "aabaa#####da" }, - { MUA, 0, "((?>a|)+?)b", "aaacaaab" }, - { MUA, 0, "(?>x|)*$", "aaa" }, - { MUA, 0, "(?>(x)|)*$", "aaa" }, - { MUA, 0, "(?>x|())*$", "aaa" }, - { MUA, 0, "((?>[cxy]a|[a-d])*?)b", "aaa+ aaab" }, - { MUA, 0, "((?>[cxy](a)|[a-d])*?)b", "aaa+ aaab" }, - { MUA, 0, "(?>((?>(a+))))bab|(?>((?>(a+))))bb", "aaaabaaabaabab" }, - { MUA, 0, "(?>(?>a+))bab|(?>(?>a+))bb", "aaaabaaabaabab" }, - { MUA, 0, "(?>(a)c|(?>(c)|(a))a)b*?bab", "aaaabaaabaabab" }, - { MUA, 0, "(?>ac|(?>c|a)a)b*?bab", "aaaabaaabaabab" }, - { MUA, 0, "(?>(b)b|(a))*b(?>(c)|d)?x", "ababcaaabdbx" }, - { MUA, 0, "(?>bb|a)*b(?>c|d)?x", "ababcaaabdbx" }, - { MUA, 0, "(?>(bb)|a)*b(?>c|(d))?x", "ababcaaabdbx" }, - { MUA, 0, "(?>(a))*?(?>(a))+?(?>(a))??x", "aaaaaacccaaaaabax" }, - { MUA, 0, "(?>a)*?(?>a)+?(?>a)??x", "aaaaaacccaaaaabax" }, - { MUA, 0, "(?>(a)|)*?(?>(a)|)+?(?>(a)|)??x", "aaaaaacccaaaaabax" }, - { MUA, 0, "(?>a|)*?(?>a|)+?(?>a|)??x", "aaaaaacccaaaaabax" }, - { MUA, 0, "(?>a(?>(a{0,2}))*?b|aac)+b", "aaaaaaacaaaabaaaaacaaaabaacaaabb" }, - { CMA, 0, "(?>((?>a{32}|b+|(a*))?(?>c+|d*)?\?)+e)+?f", "aaccebbdde bbdaaaccebbdee bbdaaaccebbdeef" }, - { MUA, 0, "(?>(?:(?>aa|a||x)+?b|(?>aa|a||(x))+?c)?(?>[ad]{0,2})*?d)+d", "aaacdbaabdcabdbaaacd aacaabdbdcdcaaaadaabcbaadd" }, - { MUA, 0, "(?>(?:(?>aa|a||(x))+?b|(?>aa|a||x)+?c)?(?>[ad]{0,2})*?d)+d", "aaacdbaabdcabdbaaacd aacaabdbdcdcaaaadaabcbaadd" }, - { MUA, 0 | F_PROPERTY, "\\X", "\xcc\x8d\xcc\x8d" }, - { MUA, 0 | F_PROPERTY, "\\X", "\xcc\x8d\xcc\x8d#\xcc\x8d\xcc\x8d" }, - { MUA, 0 | F_PROPERTY, "\\X+..", 
"\xcc\x8d#\xcc\x8d#\xcc\x8d\xcc\x8d" }, - { MUA, 0 | F_PROPERTY, "\\X{2,4}", "abcdef" }, - { MUA, 0 | F_PROPERTY, "\\X{2,4}?", "abcdef" }, - { MUA, 0 | F_NOMATCH | F_PROPERTY, "\\X{2,4}..", "#\xcc\x8d##" }, - { MUA, 0 | F_PROPERTY, "\\X{2,4}..", "#\xcc\x8d#\xcc\x8d##" }, - { MUA, 0, "(c(ab)?+ab)+", "cabcababcab" }, - { MUA, 0, "(?>(a+)b)+aabab", "aaaabaaabaabab" }, - - /* Possessive quantifiers. */ - { MUA, 0, "(?:a|b)++m", "mababbaaxababbaam" }, - { MUA, 0, "(?:a|b)*+m", "mababbaaxababbaam" }, - { MUA, 0, "(?:a|b)*+m", "ababbaaxababbaam" }, - { MUA, 0, "(a|b)++m", "mababbaaxababbaam" }, - { MUA, 0, "(a|b)*+m", "mababbaaxababbaam" }, - { MUA, 0, "(a|b)*+m", "ababbaaxababbaam" }, - { MUA, 0, "(a|b(*ACCEPT))++m", "maaxab" }, - { MUA, 0, "(?:b*)++m", "bxbbxbbbxm" }, - { MUA, 0, "(?:b*)++m", "bxbbxbbbxbbm" }, - { MUA, 0, "(?:b*)*+m", "bxbbxbbbxm" }, - { MUA, 0, "(?:b*)*+m", "bxbbxbbbxbbm" }, - { MUA, 0, "(b*)++m", "bxbbxbbbxm" }, - { MUA, 0, "(b*)++m", "bxbbxbbbxbbm" }, - { MUA, 0, "(b*)*+m", "bxbbxbbbxm" }, - { MUA, 0, "(b*)*+m", "bxbbxbbbxbbm" }, - { MUA, 0, "(?:a|(b))++m", "mababbaaxababbaam" }, - { MUA, 0, "(?:(a)|b)*+m", "mababbaaxababbaam" }, - { MUA, 0, "(?:(a)|(b))*+m", "ababbaaxababbaam" }, - { MUA, 0, "(a|(b))++m", "mababbaaxababbaam" }, - { MUA, 0, "((a)|b)*+m", "mababbaaxababbaam" }, - { MUA, 0, "((a)|(b))*+m", "ababbaaxababbaam" }, - { MUA, 0, "(a|(b)(*ACCEPT))++m", "maaxab" }, - { MUA, 0, "(?:(b*))++m", "bxbbxbbbxm" }, - { MUA, 0, "(?:(b*))++m", "bxbbxbbbxbbm" }, - { MUA, 0, "(?:(b*))*+m", "bxbbxbbbxm" }, - { MUA, 0, "(?:(b*))*+m", "bxbbxbbbxbbm" }, - { MUA, 0, "((b*))++m", "bxbbxbbbxm" }, - { MUA, 0, "((b*))++m", "bxbbxbbbxbbm" }, - { MUA, 0, "((b*))*+m", "bxbbxbbbxm" }, - { MUA, 0, "((b*))*+m", "bxbbxbbbxbbm" }, - { MUA, 0 | F_NOMATCH, "(?>(b{2,4}))(?:(?:(aa|c))++m|(?:(aa|c))+n)", "bbaacaaccaaaacxbbbmbn" }, - { MUA, 0, "((?:b)++a)+(cd)*+m", "bbababbacdcdnbbababbacdcdm" }, - { MUA, 0, "((?:(b))++a)+((c)d)*+m", "bbababbacdcdnbbababbacdcdm" }, - { MUA, 0, 
"(?:(?:(?:ab)*+k)++(?:n(?:cd)++)*+)*+m", "ababkkXababkkabkncXababkkabkncdcdncdXababkkabkncdcdncdkkabkncdXababkkabkncdcdncdkkabkncdm" }, - { MUA, 0, "(?:((ab)*+(k))++(n(?:c(d))++)*+)*+m", "ababkkXababkkabkncXababkkabkncdcdncdXababkkabkncdcdncdkkabkncdXababkkabkncdcdncdkkabkncdm" }, - - /* Back references. */ - { MUA, 0, "(aa|bb)(\\1*)(ll|)(\\3*)bbbbbbc", "aaaaaabbbbbbbbc" }, - { CMUA, 0, "(aa|bb)(\\1+)(ll|)(\\3+)bbbbbbc", "bBbbBbCbBbbbBbbcbbBbbbBBbbC" }, - { CMA, 0, "(a{2,4})\\1", "AaAaaAaA" }, - { MUA, 0, "(aa|bb)(\\1?)aa(\\1?)(ll|)(\\4+)bbc", "aaaaaaaabbaabbbbaabbbbc" }, - { MUA, 0, "(aa|bb)(\\1{0,5})(ll|)(\\3{0,5})cc", "bbxxbbbbxxaaaaaaaaaaaaaaaacc" }, - { MUA, 0, "(aa|bb)(\\1{3,5})(ll|)(\\3{3,5})cc", "bbbbbbbbbbbbaaaaaaccbbbbbbbbbbbbbbcc" }, - { MUA, 0, "(aa|bb)(\\1{3,})(ll|)(\\3{3,})cc", "bbbbbbbbbbbbaaaaaaccbbbbbbbbbbbbbbcc" }, - { MUA, 0, "(\\w+)b(\\1+)c", "GabGaGaDbGaDGaDc" }, - { MUA, 0, "(?:(aa)|b)\\1?b", "bb" }, - { CMUA, 0, "(aa|bb)(\\1*?)aa(\\1+?)", "bBBbaaAAaaAAaa" }, - { MUA, 0, "(aa|bb)(\\1*?)(dd|)cc(\\3+?)", "aaaaaccdd" }, - { CMUA, 0, "(?:(aa|bb)(\\1?\?)cc){2}(\\1?\?)", "aAaABBbbAAaAcCaAcCaA" }, - { MUA, 0, "(?:(aa|bb)(\\1{3,5}?)){2}(dd|)(\\3{3,5}?)", "aaaaaabbbbbbbbbbaaaaaaaaaaaaaa" }, - { CMA, 0, "(?:(aa|bb)(\\1{3,}?)){2}(dd|)(\\3{3,}?)", "aaaaaabbbbbbbbbbaaaaaaaaaaaaaa" }, - { MUA, 0, "(?:(aa|bb)(\\1{0,3}?)){2}(dd|)(\\3{0,3}?)b(\\1{0,3}?)(\\1{0,3})", "aaaaaaaaaaaaaaabaaaaa" }, - { MUA, 0, "(a(?:\\1|)a){3}b", "aaaaaaaaaaab" }, - { MA, 0, "(a?)b(\\1\\1*\\1+\\1?\\1*?\\1+?\\1??\\1*+\\1++\\1?+\\1{4}\\1{3,5}\\1{4,}\\1{0,5}\\1{3,5}?\\1{4,}?\\1{0,5}?\\1{3,5}+\\1{4,}+\\1{0,5}+#){2}d", "bb#b##d" }, - { MUAP, 0 | F_PROPERTY, "(\\P{N})\\1{2,}", ".www." }, - { MUAP, 0 | F_PROPERTY, "(\\P{N})\\1{0,2}", "wwwww." }, - { MUAP, 0 | F_PROPERTY, "(\\P{N})\\1{1,2}ww", "wwww" }, - { MUAP, 0 | F_PROPERTY, "(\\P{N})\\1{1,2}ww", "wwwww" }, - { PCRE_UCP, 0 | F_PROPERTY, "(\\P{N})\\1{2,}", ".www." 
}, - { CMUAP, 0, "(\xf0\x90\x90\x80)\\1", "\xf0\x90\x90\xa8\xf0\x90\x90\xa8" }, - { MUA | PCRE_DUPNAMES, 0 | F_NOMATCH, "\\k{1,3}(?aa)(?bb)", "aabb" }, - { MUA | PCRE_DUPNAMES | PCRE_JAVASCRIPT_COMPAT, 0, "\\k{1,3}(?aa)(?bb)", "aabb" }, - { MUA | PCRE_DUPNAMES | PCRE_JAVASCRIPT_COMPAT, 0, "\\k*(?aa)(?bb)", "aabb" }, - { MUA | PCRE_DUPNAMES, 0, "(?aa)(?bb)\\k{0,3}aaaaaa", "aabbaaaaaa" }, - { MUA | PCRE_DUPNAMES, 0, "(?aa)(?bb)\\k{2,5}bb", "aabbaaaabb" }, - { MUA | PCRE_DUPNAMES, 0, "(?:(?aa)|(?bb))\\k{0,3}m", "aaaaaaaabbbbaabbbbm" }, - { MUA | PCRE_DUPNAMES, 0 | F_NOMATCH, "\\k{1,3}?(?aa)(?bb)", "aabb" }, - { MUA | PCRE_DUPNAMES | PCRE_JAVASCRIPT_COMPAT, 0, "\\k{1,3}?(?aa)(?bb)", "aabb" }, - { MUA | PCRE_DUPNAMES, 0, "\\k*?(?aa)(?bb)", "aabb" }, - { MUA | PCRE_DUPNAMES, 0, "(?:(?aa)|(?bb))\\k{0,3}?m", "aaaaaabbbbbbaabbbbbbbbbbm" }, - { MUA | PCRE_DUPNAMES, 0, "(?:(?aa)|(?bb))\\k*?m", "aaaaaabbbbbbaabbbbbbbbbbm" }, - { MUA | PCRE_DUPNAMES, 0, "(?:(?aa)|(?bb))\\k{2,3}?", "aaaabbbbaaaabbbbbbbbbb" }, - { CMUA | PCRE_DUPNAMES, 0, "(?:(?AA)|(?BB))\\k{0,3}M", "aaaaaaaabbbbaabbbbm" }, - { CMUA | PCRE_DUPNAMES, 0, "(?:(?AA)|(?BB))\\k{1,3}M", "aaaaaaaabbbbaabbbbm" }, - { CMUA | PCRE_DUPNAMES, 0, "(?:(?AA)|(?BB))\\k{0,3}?M", "aaaaaabbbbbbaabbbbbbbbbbm" }, - { CMUA | PCRE_DUPNAMES, 0, "(?:(?AA)|(?BB))\\k{2,3}?", "aaaabbbbaaaabbbbbbbbbb" }, - - /* Assertions. 
*/ - { MUA, 0, "(?=xx|yy|zz)\\w{4}", "abczzdefg" }, - { MUA, 0, "(?=((\\w+)b){3}|ab)", "dbbbb ab" }, - { MUA, 0, "(?!ab|bc|cd)[a-z]{2}", "Xabcdef" }, - { MUA, 0, "(?<=aaa|aa|a)a", "aaa" }, - { MUA, 2, "(?<=aaa|aa|a)a", "aaa" }, - { MA, 0, "(?<=aaa|aa|a)a", "aaa" }, - { MA, 2, "(?<=aaa|aa|a)a", "aaa" }, - { MUA, 0, "(\\d{2})(?!\\w+c|(((\\w?)m){2}n)+|\\1)", "x5656" }, - { MUA, 0, "((?=((\\d{2,6}\\w){2,}))\\w{5,20}K){2,}", "567v09708K12l00M00 567v09708K12l00M00K45K" }, - { MUA, 0, "(?=(?:(?=\\S+a)\\w*(b)){3})\\w+\\d", "bba bbab nbbkba nbbkba0kl" }, - { MUA, 0, "(?>a(?>(b+))a(?=(..)))*?k", "acabbcabbaabacabaabbakk" }, - { MUA, 0, "((?(?=(a))a)+k)", "bbak" }, - { MUA, 0, "((?(?=a)a)+k)", "bbak" }, - { MUA, 0 | F_NOMATCH, "(?=(?>(a))m)amk", "a k" }, - { MUA, 0 | F_NOMATCH, "(?!(?>(a))m)amk", "a k" }, - { MUA, 0 | F_NOMATCH, "(?>(?=(a))am)amk", "a k" }, - { MUA, 0, "(?=(?>a|(?=(?>(b+))a|c)[a-c]+)*?m)[a-cm]+k", "aaam bbam baaambaam abbabba baaambaamk" }, - { MUA, 0, "(?> ?\?\\b(?(?=\\w{1,4}(a))m)\\w{0,8}bc){2,}?", "bca ssbc mabd ssbc mabc" }, - { MUA, 0, "(?:(?=ab)?[^n][^n])+m", "ababcdabcdcdabnababcdabcdcdabm" }, - { MUA, 0, "(?:(?=a(b))?[^n][^n])+m", "ababcdabcdcdabnababcdabcdcdabm" }, - { MUA, 0, "(?:(?=.(.))??\\1.)+m", "aabbbcbacccanaabbbcbacccam" }, - { MUA, 0, "(?:(?=.)??[a-c])+m", "abacdcbacacdcaccam" }, - { MUA, 0, "((?!a)?(?!([^a]))?)+$", "acbab" }, - { MUA, 0, "((?!a)?\?(?!([^a]))?\?)+$", "acbab" }, - { MUA, 0, "a(?=(?C)\\B)b", "ab" }, - { MUA, 0, "a(?!(?C)\\B)bb|ab", "abb" }, - { MUA, 0, "a(?=\\b|(?C)\\B)b", "ab" }, - { MUA, 0, "a(?!\\b|(?C)\\B)bb|ab", "abb" }, - { MUA, 0, "c(?(?=(?C)\\B)ab|a)", "cab" }, - { MUA, 0, "c(?(?!(?C)\\B)ab|a)", "cab" }, - { MUA, 0, "c(?(?=\\b|(?C)\\B)ab|a)", "cab" }, - { MUA, 0, "c(?(?!\\b|(?C)\\B)ab|a)", "cab" }, - { MUA, 0, "a(?=)b", "ab" }, - { MUA, 0 | F_NOMATCH, "a(?!)b", "ab" }, - - /* Not empty, ACCEPT, FAIL */ - { MUA | PCRE_NOTEMPTY, 0 | F_NOMATCH, "a*", "bcx" }, - { MUA | PCRE_NOTEMPTY, 0, "a*", "bcaad" }, - { MUA | 
PCRE_NOTEMPTY, 0, "a*?", "bcaad" }, - { MUA | PCRE_NOTEMPTY_ATSTART, 0, "a*", "bcaad" }, - { MUA, 0, "a(*ACCEPT)b", "ab" }, - { MUA | PCRE_NOTEMPTY, 0 | F_NOMATCH, "a*(*ACCEPT)b", "bcx" }, - { MUA | PCRE_NOTEMPTY, 0, "a*(*ACCEPT)b", "bcaad" }, - { MUA | PCRE_NOTEMPTY, 0, "a*?(*ACCEPT)b", "bcaad" }, - { MUA | PCRE_NOTEMPTY, 0 | F_NOMATCH, "(?:z|a*(*ACCEPT)b)", "bcx" }, - { MUA | PCRE_NOTEMPTY, 0, "(?:z|a*(*ACCEPT)b)", "bcaad" }, - { MUA | PCRE_NOTEMPTY, 0, "(?:z|a*?(*ACCEPT)b)", "bcaad" }, - { MUA | PCRE_NOTEMPTY_ATSTART, 0, "a*(*ACCEPT)b", "bcx" }, - { MUA | PCRE_NOTEMPTY_ATSTART, 0 | F_NOMATCH, "a*(*ACCEPT)b", "" }, - { MUA, 0, "((a(*ACCEPT)b))", "ab" }, - { MUA, 0, "(a(*FAIL)a|a)", "aaa" }, - { MUA, 0, "(?=ab(*ACCEPT)b)a", "ab" }, - { MUA, 0, "(?=(?:x|ab(*ACCEPT)b))", "ab" }, - { MUA, 0, "(?=(a(b(*ACCEPT)b)))a", "ab" }, - { MUA | PCRE_NOTEMPTY, 0, "(?=a*(*ACCEPT))c", "c" }, - - /* Conditional blocks. */ - { MUA, 0, "(?(?=(a))a|b)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?!(b))a|b)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?=a)a|b)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?!b)a|b)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?=(a))a*|b*)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?!(b))a*|b*)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?!(b))(?:aaaaaa|a)|(?:bbbbbb|b))+aaaak", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb aaaaaaak" }, - { MUA, 0, "(?(?!b)(?:aaaaaa|a)|(?:bbbbbb|b))+aaaak", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb aaaaaaak" }, - { MUA, 0 | F_DIFF, "(?(?!(b))(?:aaaaaa|a)|(?:bbbbbb|b))+bbbbk", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb bbbbbbbk" }, - { MUA, 0, "(?(?!b)(?:aaaaaa|a)|(?:bbbbbb|b))+bbbbk", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb bbbbbbbk" }, - { MUA, 0, "(?(?=a)a*|b*)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?!b)a*|b*)+k", "ababbalbbadabak" }, - { MUA, 0, "(?(?=a)ab)", "a" }, - { MUA, 0, "(?(?a)?(?Pb)?(?(Name)c|d)*l", "bc ddd abccabccl" }, - { MUA, 0, "(?Pa)?(?Pb)?(?(Name)c|d)+?dd", "bcabcacdb bdddd" }, - { MUA, 0, "(?Pa)?(?Pb)?(?(Name)c|d)+l", "ababccddabdbccd abcccl" }, - { MUA, 0, 
"((?:a|aa)(?(1)aaa))x", "aax" }, - { MUA, 0, "(?(?!)a|b)", "ab" }, - { MUA, 0, "(?(?!)a)", "ab" }, - { MUA, 0 | F_NOMATCH, "(?(?!)a|b)", "ac" }, - - /* Set start of match. */ - { MUA, 0, "(?:\\Ka)*aaaab", "aaaaaaaa aaaaaaabb" }, - { MUA, 0, "(?>\\Ka\\Ka)*aaaab", "aaaaaaaa aaaaaaaaaabb" }, - { MUA, 0, "a+\\K(?<=\\Gaa)a", "aaaaaa" }, - { MUA | PCRE_NOTEMPTY, 0 | F_NOMATCH, "a\\K(*ACCEPT)b", "aa" }, - { MUA | PCRE_NOTEMPTY_ATSTART, 0, "a\\K(*ACCEPT)b", "aa" }, - - /* First line. */ - { MUA | PCRE_FIRSTLINE, 0 | F_PROPERTY, "\\p{Any}a", "bb\naaa" }, - { MUA | PCRE_FIRSTLINE, 0 | F_NOMATCH | F_PROPERTY, "\\p{Any}a", "bb\r\naaa" }, - { MUA | PCRE_FIRSTLINE, 0, "(?<=a)", "a" }, - { MUA | PCRE_FIRSTLINE, 0 | F_NOMATCH, "[^a][^b]", "ab" }, - { MUA | PCRE_FIRSTLINE, 0 | F_NOMATCH, "a", "\na" }, - { MUA | PCRE_FIRSTLINE, 0 | F_NOMATCH, "[abc]", "\na" }, - { MUA | PCRE_FIRSTLINE, 0 | F_NOMATCH, "^a", "\na" }, - { MUA | PCRE_FIRSTLINE, 0 | F_NOMATCH, "^(?<=\n)", "\na" }, - { MUA | PCRE_FIRSTLINE, 0, "\xf0\x90\x90\x80", "\xf0\x90\x90\x80" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANY | PCRE_FIRSTLINE, 0 | F_NOMATCH, "#", "\xc2\x85#" }, - { PCRE_MULTILINE | PCRE_NEWLINE_ANY | PCRE_FIRSTLINE, 0 | F_NOMATCH, "#", "\x85#" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_ANY | PCRE_FIRSTLINE, 0 | F_NOMATCH, "^#", "\xe2\x80\xa8#" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 0 | F_PROPERTY, "\\p{Any}", "\r\na" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 0, ".", "\r" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 0, "a", "\ra" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 0 | F_NOMATCH, "ba", "bbb\r\nba" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 0 | F_NOMATCH | F_PROPERTY, "\\p{Any}{4}|a", "\r\na" }, - { PCRE_MULTILINE | PCRE_UTF8 | PCRE_NEWLINE_CRLF | PCRE_FIRSTLINE, 1, ".", "\r\n" }, - { PCRE_FIRSTLINE | PCRE_NEWLINE_LF | PCRE_DOTALL, 0 | 
F_NOMATCH, "ab.", "ab" }, - { MUA | PCRE_FIRSTLINE, 1 | F_NOMATCH, "^[a-d0-9]", "\nxx\nd" }, - { PCRE_NEWLINE_ANY | PCRE_FIRSTLINE | PCRE_DOTALL, 0, "....a", "012\n0a" }, - { MUA | PCRE_FIRSTLINE, 0, "[aC]", "a" }, - - /* Recurse. */ - { MUA, 0, "(a)(?1)", "aa" }, - { MUA, 0, "((a))(?1)", "aa" }, - { MUA, 0, "(b|a)(?1)", "aa" }, - { MUA, 0, "(b|(a))(?1)", "aa" }, - { MUA, 0 | F_NOMATCH, "((a)(b)(?:a*))(?1)", "aba" }, - { MUA, 0, "((a)(b)(?:a*))(?1)", "abab" }, - { MUA, 0, "((a+)c(?2))b(?1)", "aacaabaca" }, - { MUA, 0, "((?2)b|(a)){2}(?1)", "aabab" }, - { MUA, 0, "(?1)(a)*+(?2)(b(?1))", "aababa" }, - { MUA, 0, "(?1)(((a(*ACCEPT)))b)", "axaa" }, - { MUA, 0, "(?1)(?(DEFINE) (((ac(*ACCEPT)))b) )", "akaac" }, - { MUA, 0, "(a+)b(?1)b\\1", "abaaabaaaaa" }, - { MUA, 0 | F_NOMATCH, "(?(DEFINE)(aa|a))(?1)ab", "aab" }, - { MUA, 0, "(?(DEFINE)(a\\Kb))(?1)+ababc", "abababxabababc" }, - { MUA, 0, "(a\\Kb)(?1)+ababc", "abababxababababc" }, - { MUA, 0 | F_NOMATCH, "(a\\Kb)(?1)+ababc", "abababxababababxc" }, - { MUA, 0, "b|<(?R)*>", "<" }, - { MUA, 0, "(a\\K){0}(?:(?1)b|ac)", "ac" }, - { MUA, 0, "(?(DEFINE)(a(?2)|b)(b(?1)|(a)))(?:(?1)|(?2))m", "ababababnababababaam" }, - { MUA, 0, "(a)((?(R)a|b))(?2)", "aabbabaa" }, - { MUA, 0, "(a)((?(R2)a|b))(?2)", "aabbabaa" }, - { MUA, 0, "(a)((?(R1)a|b))(?2)", "ababba" }, - { MUA, 0, "(?(R0)aa|bb(?R))", "abba aabb bbaa" }, - { MUA, 0, "((?(R)(?:aaaa|a)|(?:(aaaa)|(a)))+)(?1)$", "aaaaaaaaaa aaaa" }, - { MUA, 0, "(?Pa(?(R&Name)a|b))(?1)", "aab abb abaa" }, - { MUA, 0, "((?(R)a|(?1)){3})", "XaaaaaaaaaX" }, - { MUA, 0, "((?:(?(R)a|(?1))){3})", "XaaaaaaaaaX" }, - { MUA, 0, "((?(R)a|(?1)){1,3})aaaaaa", "aaaaaaaaXaaaaaaaaa" }, - { MUA, 0, "((?(R)a|(?1)){1,3}?)M", "aaaM" }, - - /* 16 bit specific tests. 
*/ - { CMA, 0 | F_FORCECONV, "\xc3\xa1", "\xc3\x81\xc3\xa1" }, - { CMA, 0 | F_FORCECONV, "\xe1\xbd\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" }, - { CMA, 0 | F_FORCECONV, "[\xc3\xa1]", "\xc3\x81\xc3\xa1" }, - { CMA, 0 | F_FORCECONV, "[\xe1\xbd\xb8]", "\xe1\xbf\xb8\xe1\xbd\xb8" }, - { CMA, 0 | F_FORCECONV, "[a-\xed\xb0\x80]", "A" }, - { CMA, 0 | F_NO8 | F_FORCECONV, "[a-\\x{dc00}]", "B" }, - { CMA, 0 | F_NO8 | F_NOMATCH | F_FORCECONV, "[b-\\x{dc00}]", "a" }, - { CMA, 0 | F_NO8 | F_FORCECONV, "\xed\xa0\x80\\x{d800}\xed\xb0\x80\\x{dc00}", "\xed\xa0\x80\xed\xa0\x80\xed\xb0\x80\xed\xb0\x80" }, - { CMA, 0 | F_NO8 | F_FORCECONV, "[\xed\xa0\x80\\x{d800}]{1,2}?[\xed\xb0\x80\\x{dc00}]{1,2}?#", "\xed\xa0\x80\xed\xa0\x80\xed\xb0\x80\xed\xb0\x80#" }, - { CMA, 0 | F_FORCECONV, "[\xed\xa0\x80\xed\xb0\x80#]{0,3}(?<=\xed\xb0\x80.)", "\xed\xa0\x80#\xed\xa0\x80##\xed\xb0\x80\xed\xa0\x80" }, - { CMA, 0 | F_FORCECONV, "[\xed\xa0\x80-\xed\xb3\xbf]", "\xed\x9f\xbf\xed\xa0\x83" }, - { CMA, 0 | F_FORCECONV, "[\xed\xa0\x80-\xed\xb3\xbf]", "\xed\xb4\x80\xed\xb3\xb0" }, - { CMA, 0 | F_NO8 | F_FORCECONV, "[\\x{d800}-\\x{dcff}]", "\xed\x9f\xbf\xed\xa0\x83" }, - { CMA, 0 | F_NO8 | F_FORCECONV, "[\\x{d800}-\\x{dcff}]", "\xed\xb4\x80\xed\xb3\xb0" }, - { CMA, 0 | F_FORCECONV, "[\xed\xa0\x80-\xef\xbf\xbf]+[\x1-\xed\xb0\x80]+#", "\xed\xa0\x85\xc3\x81\xed\xa0\x85\xef\xbf\xb0\xc2\x85\xed\xa9\x89#" }, - { CMA, 0 | F_FORCECONV, "[\xed\xa0\x80][\xed\xb0\x80]{2,}", "\xed\xa0\x80\xed\xb0\x80\xed\xa0\x80\xed\xb0\x80\xed\xb0\x80\xed\xb0\x80" }, - { MA, 0 | F_FORCECONV, "[^\xed\xb0\x80]{3,}?", "##\xed\xb0\x80#\xed\xb0\x80#\xc3\x89#\xed\xb0\x80" }, - { MA, 0 | F_NO8 | F_FORCECONV, "[^\\x{dc00}]{3,}?", "##\xed\xb0\x80#\xed\xb0\x80#\xc3\x89#\xed\xb0\x80" }, - { CMA, 0 | F_FORCECONV, ".\\B.", "\xed\xa0\x80\xed\xb0\x80" }, - { CMA, 0 | F_FORCECONV, "\\D+(?:\\d+|.)\\S+(?:\\s+|.)\\W+(?:\\w+|.)\xed\xa0\x80\xed\xa0\x80", "\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80" }, - { 
CMA, 0 | F_FORCECONV, "\\d*\\s*\\w*\xed\xa0\x80\xed\xa0\x80", "\xed\xa0\x80\xed\xa0\x80" }, - { CMA, 0 | F_FORCECONV | F_NOMATCH, "\\d*?\\D*?\\s*?\\S*?\\w*?\\W*?##", "\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80#" }, - { CMA | PCRE_EXTENDED, 0 | F_FORCECONV, "\xed\xa0\x80 \xed\xb0\x80 !", "\xed\xa0\x80\xed\xb0\x80!" }, - { CMA, 0 | F_FORCECONV, "\xed\xa0\x80+#[^#]+\xed\xa0\x80", "\xed\xa0\x80#a\xed\xa0\x80" }, - { CMA, 0 | F_FORCECONV, "(\xed\xa0\x80+)#\\1", "\xed\xa0\x80\xed\xa0\x80#\xed\xa0\x80\xed\xa0\x80" }, - { PCRE_MULTILINE | PCRE_NEWLINE_ANY, 0 | F_NO8 | F_FORCECONV, "^-", "a--\xe2\x80\xa8--" }, - { PCRE_BSR_UNICODE, 0 | F_NO8 | F_FORCECONV, "\\R", "ab\xe2\x80\xa8" }, - { 0, 0 | F_NO8 | F_FORCECONV, "\\v", "ab\xe2\x80\xa9" }, - { 0, 0 | F_NO8 | F_FORCECONV, "\\h", "ab\xe1\xa0\x8e" }, - { 0, 0 | F_NO8 | F_FORCECONV, "\\v+?\\V+?#", "\xe2\x80\xa9\xe2\x80\xa9\xef\xbf\xbf\xef\xbf\xbf#" }, - { 0, 0 | F_NO8 | F_FORCECONV, "\\h+?\\H+?#", "\xe1\xa0\x8e\xe1\xa0\x8e\xef\xbf\xbf\xef\xbf\xbf#" }, - - /* Partial matching. */ - { MUA | PCRE_PARTIAL_SOFT, 0, "ab", "a" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "ab|a", "a" }, - { MUA | PCRE_PARTIAL_HARD, 0, "ab|a", "a" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "\\b#", "a" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "(?<=a)b", "a" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "abc|(?<=xxa)bc", "xxab" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "a\\B", "a" }, - { MUA | PCRE_PARTIAL_HARD, 0, "a\\b", "a" }, - - /* (*MARK) verb. 
*/ - { MUA, 0, "a(*MARK:aa)a", "ababaa" }, - { MUA, 0 | F_NOMATCH, "a(*:aa)a", "abab" }, - { MUA, 0, "a(*:aa)(b(*:bb)b|bc)", "abc" }, - { MUA, 0 | F_NOMATCH, "a(*:1)x|b(*:2)y", "abc" }, - { MUA, 0, "(?>a(*:aa))b|ac", "ac" }, - { MUA, 0, "(?(DEFINE)(a(*:aa)))(?1)", "a" }, - { MUA, 0 | F_NOMATCH, "(?(DEFINE)((a)(*:aa)))(?1)b", "aa" }, - { MUA, 0, "(?(DEFINE)(a(*:aa)))a(?1)b|aac", "aac" }, - { MUA, 0, "(a(*:aa)){0}(?:b(?1)b|c)+c", "babbab cc" }, - { MUA, 0, "(a(*:aa)){0}(?:b(?1)b)+", "babba" }, - { MUA, 0 | F_NOMATCH | F_STUDY, "(a(*:aa)){0}(?:b(?1)b)+", "ba" }, - { MUA, 0, "(a\\K(*:aa)){0}(?:b(?1)b|c)+c", "babbab cc" }, - { MUA, 0, "(a\\K(*:aa)){0}(?:b(?1)b)+", "babba" }, - { MUA, 0 | F_NOMATCH | F_STUDY, "(a\\K(*:aa)){0}(?:b(?1)b)+", "ba" }, - { MUA, 0 | F_NOMATCH | F_STUDY, "(*:mark)m", "a" }, - - /* (*COMMIT) verb. */ - { MUA, 0 | F_NOMATCH, "a(*COMMIT)b", "ac" }, - { MUA, 0, "aa(*COMMIT)b", "xaxaab" }, - { MUA, 0 | F_NOMATCH, "a(*COMMIT)(*:msg)b|ac", "ac" }, - { MUA, 0 | F_NOMATCH, "(a(*COMMIT)b)++", "abac" }, - { MUA, 0 | F_NOMATCH, "((a)(*COMMIT)b)++", "abac" }, - { MUA, 0 | F_NOMATCH, "(?=a(*COMMIT)b)ab|ad", "ad" }, - - /* (*PRUNE) verb. 
*/ - { MUA, 0, "aa\\K(*PRUNE)b", "aaab" }, - { MUA, 0, "aa(*PRUNE:bb)b|a", "aa" }, - { MUA, 0, "(a)(a)(*PRUNE)b|(a)", "aa" }, - { MUA, 0, "(a)(a)(a)(a)(a)(a)(a)(a)(*PRUNE)b|(a)", "aaaaaaaa" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "a(*PRUNE)a|", "a" }, - { MUA | PCRE_PARTIAL_SOFT, 0, "a(*PRUNE)a|m", "a" }, - { MUA, 0 | F_NOMATCH, "(?=a(*PRUNE)b)ab|ad", "ad" }, - { MUA, 0, "a(*COMMIT)(*PRUNE)d|bc", "abc" }, - { MUA, 0, "(?=a(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?=a(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, - { MUA, 0, "(?=(a)(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?=(a)(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, - { MUA, 0, "(a(*COMMIT)b){0}a(?1)(*PRUNE)c|bc", "abc" }, - { MUA, 0 | F_NOMATCH, "(a(*COMMIT)b){0}a(*COMMIT)(?1)(*PRUNE)c|bc", "abc" }, - { MUA, 0, "(a(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(a(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, - { MUA, 0, "((a)(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)((a)(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, - { MUA, 0, "(?>a(*COMMIT)b)*abab(*PRUNE)d|ba", "ababab" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)*abab(*PRUNE)d|ba", "ababab" }, - { MUA, 0, "(?>a(*COMMIT)b)+abab(*PRUNE)d|ba", "ababab" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)+abab(*PRUNE)d|ba", "ababab" }, - { MUA, 0, "(?>a(*COMMIT)b)?ab(*PRUNE)d|ba", "aba" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)?ab(*PRUNE)d|ba", "aba" }, - { MUA, 0, "(?>a(*COMMIT)b)*?n(*PRUNE)d|ba", "abababn" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)*?n(*PRUNE)d|ba", "abababn" }, - { MUA, 0, "(?>a(*COMMIT)b)+?n(*PRUNE)d|ba", "abababn" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)+?n(*PRUNE)d|ba", "abababn" }, - { MUA, 0, "(?>a(*COMMIT)b)??n(*PRUNE)d|bn", "abn" }, - { MUA, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)??n(*PRUNE)d|bn", "abn" }, - - /* (*SKIP) verb. 
*/ - { MUA, 0 | F_NOMATCH, "(?=a(*SKIP)b)ab|ad", "ad" }, - { MUA, 0, "(\\w+(*SKIP)#)", "abcd,xyz#," }, - { MUA, 0, "\\w+(*SKIP)#|mm", "abcd,xyz#," }, - { MUA, 0 | F_NOMATCH, "b+(?<=(*SKIP)#c)|b+", "#bbb" }, - - /* (*THEN) verb. */ - { MUA, 0, "((?:a(*THEN)|aab)(*THEN)c|a+)+m", "aabcaabcaabcaabcnacm" }, - { MUA, 0 | F_NOMATCH, "((?:a(*THEN)|aab)(*THEN)c|a+)+m", "aabcm" }, - { MUA, 0, "((?:a(*THEN)|aab)c|a+)+m", "aabcaabcnmaabcaabcm" }, - { MUA, 0, "((?:a|aab)(*THEN)c|a+)+m", "aam" }, - { MUA, 0, "((?:a(*COMMIT)|aab)(*THEN)c|a+)+m", "aam" }, - { MUA, 0, "(?(?=a(*THEN)b)ab|ad)", "ad" }, - { MUA, 0, "(?(?!a(*THEN)b)ad|add)", "add" }, - { MUA, 0 | F_NOMATCH, "(?(?=a)a(*THEN)b|ad)", "ad" }, - { MUA, 0, "(?!(?(?=a)ab|b(*THEN)d))bn|bnn", "bnn" }, - - /* Deep recursion. */ - { MUA, 0, "((((?:(?:(?:\\w)+)?)*|(?>\\w)+?)+|(?>\\w)?\?)*)?\\s", "aaaaa+ " }, - { MUA, 0, "(?:((?:(?:(?:\\w*?)+)??|(?>\\w)?|\\w*+)*)+)+?\\s", "aa+ " }, - { MUA, 0, "((a?)+)+b", "aaaaaaaaaaaa b" }, - - /* Deep recursion: Stack limit reached. */ - { MA, 0 | F_NOMATCH, "a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaa" }, - { MA, 0 | F_NOMATCH, "(?:a+)+b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, - { MA, 0 | F_NOMATCH, "(?:a+?)+?b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, - { MA, 0 | F_NOMATCH, "(?:a*)*b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, - { MA, 0 | F_NOMATCH, "(?:a*?)*?b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, - - { 0, 0, NULL, NULL } -}; - -static const unsigned char *tables(int mode) -{ - /* The purpose of this function to allow valgrind - for reporting invalid reads and writes. 
*/ - static unsigned char *tables_copy; - const char *errorptr; - int erroroffset; - unsigned char *default_tables; -#if defined SUPPORT_PCRE8 - pcre *regex; - char null_str[1] = { 0 }; -#elif defined SUPPORT_PCRE16 - pcre16 *regex; - PCRE_UCHAR16 null_str[1] = { 0 }; -#elif defined SUPPORT_PCRE32 - pcre32 *regex; - PCRE_UCHAR32 null_str[1] = { 0 }; -#endif - - if (mode) { - if (tables_copy) - free(tables_copy); - tables_copy = NULL; - return NULL; - } - - if (tables_copy) - return tables_copy; - - default_tables = NULL; -#if defined SUPPORT_PCRE8 - regex = pcre_compile(null_str, 0, &errorptr, &erroroffset, NULL); - if (regex) { - pcre_fullinfo(regex, NULL, PCRE_INFO_DEFAULT_TABLES, &default_tables); - pcre_free(regex); - } -#elif defined SUPPORT_PCRE16 - regex = pcre16_compile(null_str, 0, &errorptr, &erroroffset, NULL); - if (regex) { - pcre16_fullinfo(regex, NULL, PCRE_INFO_DEFAULT_TABLES, &default_tables); - pcre16_free(regex); - } -#elif defined SUPPORT_PCRE32 - regex = pcre32_compile(null_str, 0, &errorptr, &erroroffset, NULL); - if (regex) { - pcre32_fullinfo(regex, NULL, PCRE_INFO_DEFAULT_TABLES, &default_tables); - pcre32_free(regex); - } -#endif - /* Shouldn't ever happen. */ - if (!default_tables) - return NULL; - - /* Unfortunately this value cannot get from pcre_fullinfo. - Since this is a test program, this is acceptable at the moment. 
*/ - tables_copy = (unsigned char *)malloc(1088); - if (!tables_copy) - return NULL; - - memcpy(tables_copy, default_tables, 1088); - return tables_copy; -} - -#ifdef SUPPORT_PCRE8 -static pcre_jit_stack* callback8(void *arg) -{ - return (pcre_jit_stack *)arg; -} -#endif - -#ifdef SUPPORT_PCRE16 -static pcre16_jit_stack* callback16(void *arg) -{ - return (pcre16_jit_stack *)arg; -} -#endif - -#ifdef SUPPORT_PCRE32 -static pcre32_jit_stack* callback32(void *arg) -{ - return (pcre32_jit_stack *)arg; -} -#endif - -#ifdef SUPPORT_PCRE8 -static pcre_jit_stack *stack8; - -static pcre_jit_stack *getstack8(void) -{ - if (!stack8) - stack8 = pcre_jit_stack_alloc(1, 1024 * 1024); - return stack8; -} - -static void setstack8(pcre_extra *extra) -{ - if (!extra) { - if (stack8) - pcre_jit_stack_free(stack8); - stack8 = NULL; - return; - } - - pcre_assign_jit_stack(extra, callback8, getstack8()); -} -#endif /* SUPPORT_PCRE8 */ - -#ifdef SUPPORT_PCRE16 -static pcre16_jit_stack *stack16; - -static pcre16_jit_stack *getstack16(void) -{ - if (!stack16) - stack16 = pcre16_jit_stack_alloc(1, 1024 * 1024); - return stack16; -} - -static void setstack16(pcre16_extra *extra) -{ - if (!extra) { - if (stack16) - pcre16_jit_stack_free(stack16); - stack16 = NULL; - return; - } - - pcre16_assign_jit_stack(extra, callback16, getstack16()); -} -#endif /* SUPPORT_PCRE16 */ - -#ifdef SUPPORT_PCRE32 -static pcre32_jit_stack *stack32; - -static pcre32_jit_stack *getstack32(void) -{ - if (!stack32) - stack32 = pcre32_jit_stack_alloc(1, 1024 * 1024); - return stack32; -} - -static void setstack32(pcre32_extra *extra) -{ - if (!extra) { - if (stack32) - pcre32_jit_stack_free(stack32); - stack32 = NULL; - return; - } - - pcre32_assign_jit_stack(extra, callback32, getstack32()); -} -#endif /* SUPPORT_PCRE32 */ - -#ifdef SUPPORT_PCRE16 - -static int convert_utf8_to_utf16(const char *input, PCRE_UCHAR16 *output, int *offsetmap, int max_length) -{ - unsigned char *iptr = (unsigned char*)input; - 
PCRE_UCHAR16 *optr = output; - unsigned int c; - - if (max_length == 0) - return 0; - - while (*iptr && max_length > 1) { - c = 0; - if (offsetmap) - *offsetmap++ = (int)(iptr - (unsigned char*)input); - - if (*iptr < 0xc0) - c = *iptr++; - else if (!(*iptr & 0x20)) { - c = ((iptr[0] & 0x1f) << 6) | (iptr[1] & 0x3f); - iptr += 2; - } else if (!(*iptr & 0x10)) { - c = ((iptr[0] & 0x0f) << 12) | ((iptr[1] & 0x3f) << 6) | (iptr[2] & 0x3f); - iptr += 3; - } else if (!(*iptr & 0x08)) { - c = ((iptr[0] & 0x07) << 18) | ((iptr[1] & 0x3f) << 12) | ((iptr[2] & 0x3f) << 6) | (iptr[3] & 0x3f); - iptr += 4; - } - - if (c < 65536) { - *optr++ = c; - max_length--; - } else if (max_length <= 2) { - *optr = '\0'; - return (int)(optr - output); - } else { - c -= 0x10000; - *optr++ = 0xd800 | ((c >> 10) & 0x3ff); - *optr++ = 0xdc00 | (c & 0x3ff); - max_length -= 2; - if (offsetmap) - offsetmap++; - } - } - if (offsetmap) - *offsetmap = (int)(iptr - (unsigned char*)input); - *optr = '\0'; - return (int)(optr - output); -} - -static int copy_char8_to_char16(const char *input, PCRE_UCHAR16 *output, int max_length) -{ - unsigned char *iptr = (unsigned char*)input; - PCRE_UCHAR16 *optr = output; - - if (max_length == 0) - return 0; - - while (*iptr && max_length > 1) { - *optr++ = *iptr++; - max_length--; - } - *optr = '\0'; - return (int)(optr - output); -} - -#define REGTEST_MAX_LENGTH16 4096 -static PCRE_UCHAR16 regtest_buf16[REGTEST_MAX_LENGTH16]; -static int regtest_offsetmap16[REGTEST_MAX_LENGTH16]; - -#endif /* SUPPORT_PCRE16 */ - -#ifdef SUPPORT_PCRE32 - -static int convert_utf8_to_utf32(const char *input, PCRE_UCHAR32 *output, int *offsetmap, int max_length) -{ - unsigned char *iptr = (unsigned char*)input; - PCRE_UCHAR32 *optr = output; - unsigned int c; - - if (max_length == 0) - return 0; - - while (*iptr && max_length > 1) { - c = 0; - if (offsetmap) - *offsetmap++ = (int)(iptr - (unsigned char*)input); - - if (*iptr < 0xc0) - c = *iptr++; - else if (!(*iptr & 0x20)) { - c = 
((iptr[0] & 0x1f) << 6) | (iptr[1] & 0x3f); - iptr += 2; - } else if (!(*iptr & 0x10)) { - c = ((iptr[0] & 0x0f) << 12) | ((iptr[1] & 0x3f) << 6) | (iptr[2] & 0x3f); - iptr += 3; - } else if (!(*iptr & 0x08)) { - c = ((iptr[0] & 0x07) << 18) | ((iptr[1] & 0x3f) << 12) | ((iptr[2] & 0x3f) << 6) | (iptr[3] & 0x3f); - iptr += 4; - } - - *optr++ = c; - max_length--; - } - if (offsetmap) - *offsetmap = (int)(iptr - (unsigned char*)input); - *optr = 0; - return (int)(optr - output); -} - -static int copy_char8_to_char32(const char *input, PCRE_UCHAR32 *output, int max_length) -{ - unsigned char *iptr = (unsigned char*)input; - PCRE_UCHAR32 *optr = output; - - if (max_length == 0) - return 0; - - while (*iptr && max_length > 1) { - *optr++ = *iptr++; - max_length--; - } - *optr = '\0'; - return (int)(optr - output); -} - -#define REGTEST_MAX_LENGTH32 4096 -static PCRE_UCHAR32 regtest_buf32[REGTEST_MAX_LENGTH32]; -static int regtest_offsetmap32[REGTEST_MAX_LENGTH32]; - -#endif /* SUPPORT_PCRE32 */ - -static int check_ascii(const char *input) -{ - const unsigned char *ptr = (unsigned char *)input; - while (*ptr) { - if (*ptr > 127) - return 0; - ptr++; - } - return 1; -} - -static int regression_tests(void) -{ - struct regression_test_case *current = regression_test_cases; - const char *error; - char *cpu_info; - int i, err_offs; - int is_successful, is_ascii; - int total = 0; - int successful = 0; - int successful_row = 0; - int counter = 0; - int study_mode; - int utf = 0, ucp = 0; - int disabled_flags = 0; -#ifdef SUPPORT_PCRE8 - pcre *re8; - pcre_extra *extra8; - pcre_extra dummy_extra8; - int ovector8_1[32]; - int ovector8_2[32]; - int return_value8[2]; - unsigned char *mark8_1, *mark8_2; -#endif -#ifdef SUPPORT_PCRE16 - pcre16 *re16; - pcre16_extra *extra16; - pcre16_extra dummy_extra16; - int ovector16_1[32]; - int ovector16_2[32]; - int return_value16[2]; - PCRE_UCHAR16 *mark16_1, *mark16_2; - int length16; -#endif -#ifdef SUPPORT_PCRE32 - pcre32 *re32; - 
pcre32_extra *extra32; - pcre32_extra dummy_extra32; - int ovector32_1[32]; - int ovector32_2[32]; - int return_value32[2]; - PCRE_UCHAR32 *mark32_1, *mark32_2; - int length32; -#endif - - /* This test compares the behaviour of interpreter and JIT. Although disabling - utf or ucp may make tests fail, if the pcre_exec result is the SAME, it is - still considered successful from pcre_jit_test point of view. */ - -#if defined SUPPORT_PCRE8 - pcre_config(PCRE_CONFIG_JITTARGET, &cpu_info); -#elif defined SUPPORT_PCRE16 - pcre16_config(PCRE_CONFIG_JITTARGET, &cpu_info); -#elif defined SUPPORT_PCRE32 - pcre32_config(PCRE_CONFIG_JITTARGET, &cpu_info); -#endif - - printf("Running JIT regression tests\n"); - printf(" target CPU of SLJIT compiler: %s\n", cpu_info); - -#if defined SUPPORT_PCRE8 - pcre_config(PCRE_CONFIG_UTF8, &utf); - pcre_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp); -#elif defined SUPPORT_PCRE16 - pcre16_config(PCRE_CONFIG_UTF16, &utf); - pcre16_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp); -#elif defined SUPPORT_PCRE32 - pcre32_config(PCRE_CONFIG_UTF32, &utf); - pcre32_config(PCRE_CONFIG_UNICODE_PROPERTIES, &ucp); -#endif - - if (!utf) - disabled_flags |= PCRE_UTF8 | PCRE_UTF16 | PCRE_UTF32; - if (!ucp) - disabled_flags |= PCRE_UCP; -#ifdef SUPPORT_PCRE8 - printf(" in 8 bit mode with UTF-8 %s and ucp %s:\n", utf ? "enabled" : "disabled", ucp ? "enabled" : "disabled"); -#endif -#ifdef SUPPORT_PCRE16 - printf(" in 16 bit mode with UTF-16 %s and ucp %s:\n", utf ? "enabled" : "disabled", ucp ? "enabled" : "disabled"); -#endif -#ifdef SUPPORT_PCRE32 - printf(" in 32 bit mode with UTF-32 %s and ucp %s:\n", utf ? "enabled" : "disabled", ucp ? 
"enabled" : "disabled"); -#endif - - while (current->pattern) { - /* printf("\nPattern: %s :\n", current->pattern); */ - total++; - is_ascii = 0; - if (!(current->start_offset & F_PROPERTY)) - is_ascii = check_ascii(current->pattern) && check_ascii(current->input); - - if (current->flags & PCRE_PARTIAL_SOFT) - study_mode = PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE; - else if (current->flags & PCRE_PARTIAL_HARD) - study_mode = PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE; - else - study_mode = PCRE_STUDY_JIT_COMPILE; - error = NULL; -#ifdef SUPPORT_PCRE8 - re8 = NULL; - if (!(current->start_offset & F_NO8)) - re8 = pcre_compile(current->pattern, - current->flags & ~(PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | disabled_flags), - &error, &err_offs, tables(0)); - - extra8 = NULL; - if (re8) { - error = NULL; - extra8 = pcre_study(re8, study_mode, &error); - if (!extra8) { - printf("\n8 bit: Cannot study pattern: %s\n", current->pattern); - pcre_free(re8); - re8 = NULL; - } - else if (!(extra8->flags & PCRE_EXTRA_EXECUTABLE_JIT)) { - printf("\n8 bit: JIT compiler does not support: %s\n", current->pattern); - pcre_free_study(extra8); - pcre_free(re8); - re8 = NULL; - } - extra8->flags |= PCRE_EXTRA_MARK; - } else if (((utf && ucp) || is_ascii) && !(current->start_offset & F_NO8)) - printf("\n8 bit: Cannot compile pattern \"%s\": %s\n", current->pattern, error); -#endif -#ifdef SUPPORT_PCRE16 - if ((current->flags & PCRE_UTF16) || (current->start_offset & F_FORCECONV)) - convert_utf8_to_utf16(current->pattern, regtest_buf16, NULL, REGTEST_MAX_LENGTH16); - else - copy_char8_to_char16(current->pattern, regtest_buf16, REGTEST_MAX_LENGTH16); - - re16 = NULL; - if (!(current->start_offset & F_NO16)) - re16 = pcre16_compile(regtest_buf16, - current->flags & ~(PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | disabled_flags), - &error, &err_offs, tables(0)); - - extra16 = 
NULL; - if (re16) { - error = NULL; - extra16 = pcre16_study(re16, study_mode, &error); - if (!extra16) { - printf("\n16 bit: Cannot study pattern: %s\n", current->pattern); - pcre16_free(re16); - re16 = NULL; - } - else if (!(extra16->flags & PCRE_EXTRA_EXECUTABLE_JIT)) { - printf("\n16 bit: JIT compiler does not support: %s\n", current->pattern); - pcre16_free_study(extra16); - pcre16_free(re16); - re16 = NULL; - } - extra16->flags |= PCRE_EXTRA_MARK; - } else if (((utf && ucp) || is_ascii) && !(current->start_offset & F_NO16)) - printf("\n16 bit: Cannot compile pattern \"%s\": %s\n", current->pattern, error); -#endif -#ifdef SUPPORT_PCRE32 - if ((current->flags & PCRE_UTF32) || (current->start_offset & F_FORCECONV)) - convert_utf8_to_utf32(current->pattern, regtest_buf32, NULL, REGTEST_MAX_LENGTH32); - else - copy_char8_to_char32(current->pattern, regtest_buf32, REGTEST_MAX_LENGTH32); - - re32 = NULL; - if (!(current->start_offset & F_NO32)) - re32 = pcre32_compile(regtest_buf32, - current->flags & ~(PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | disabled_flags), - &error, &err_offs, tables(0)); - - extra32 = NULL; - if (re32) { - error = NULL; - extra32 = pcre32_study(re32, study_mode, &error); - if (!extra32) { - printf("\n32 bit: Cannot study pattern: %s\n", current->pattern); - pcre32_free(re32); - re32 = NULL; - } - if (!(extra32->flags & PCRE_EXTRA_EXECUTABLE_JIT)) { - printf("\n32 bit: JIT compiler does not support: %s\n", current->pattern); - pcre32_free_study(extra32); - pcre32_free(re32); - re32 = NULL; - } - extra32->flags |= PCRE_EXTRA_MARK; - } else if (((utf && ucp) || is_ascii) && !(current->start_offset & F_NO32)) - printf("\n32 bit: Cannot compile pattern \"%s\": %s\n", current->pattern, error); -#endif - - counter++; - if ((counter & 0x3) != 0) { -#ifdef SUPPORT_PCRE8 - setstack8(NULL); -#endif -#ifdef SUPPORT_PCRE16 - setstack16(NULL); -#endif -#ifdef SUPPORT_PCRE32 - 
setstack32(NULL); -#endif - } - -#ifdef SUPPORT_PCRE8 - return_value8[0] = -1000; - return_value8[1] = -1000; - for (i = 0; i < 32; ++i) - ovector8_1[i] = -2; - for (i = 0; i < 32; ++i) - ovector8_2[i] = -2; - if (re8) { - mark8_1 = NULL; - mark8_2 = NULL; - extra8->mark = &mark8_1; - - if ((counter & 0x1) != 0) { - setstack8(extra8); - return_value8[0] = pcre_exec(re8, extra8, current->input, strlen(current->input), current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector8_1, 32); - } else - return_value8[0] = pcre_jit_exec(re8, extra8, current->input, strlen(current->input), current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector8_1, 32, getstack8()); - memset(&dummy_extra8, 0, sizeof(pcre_extra)); - dummy_extra8.flags = PCRE_EXTRA_MARK; - if (current->start_offset & F_STUDY) { - dummy_extra8.flags |= PCRE_EXTRA_STUDY_DATA; - dummy_extra8.study_data = extra8->study_data; - } - dummy_extra8.mark = &mark8_2; - return_value8[1] = pcre_exec(re8, &dummy_extra8, current->input, strlen(current->input), current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector8_2, 32); - } -#endif - -#ifdef SUPPORT_PCRE16 - return_value16[0] = -1000; - return_value16[1] = -1000; - for (i = 0; i < 32; ++i) - ovector16_1[i] = -2; - for (i = 0; i < 32; ++i) - ovector16_2[i] = -2; - if (re16) { - mark16_1 = NULL; - mark16_2 = NULL; - if ((current->flags & PCRE_UTF16) || (current->start_offset & F_FORCECONV)) - length16 = convert_utf8_to_utf16(current->input, regtest_buf16, regtest_offsetmap16, REGTEST_MAX_LENGTH16); - else - length16 = copy_char8_to_char16(current->input, regtest_buf16, 
REGTEST_MAX_LENGTH16); - extra16->mark = &mark16_1; - if ((counter & 0x1) != 0) { - setstack16(extra16); - return_value16[0] = pcre16_exec(re16, extra16, regtest_buf16, length16, current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector16_1, 32); - } else - return_value16[0] = pcre16_jit_exec(re16, extra16, regtest_buf16, length16, current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector16_1, 32, getstack16()); - memset(&dummy_extra16, 0, sizeof(pcre16_extra)); - dummy_extra16.flags = PCRE_EXTRA_MARK; - if (current->start_offset & F_STUDY) { - dummy_extra16.flags |= PCRE_EXTRA_STUDY_DATA; - dummy_extra16.study_data = extra16->study_data; - } - dummy_extra16.mark = &mark16_2; - return_value16[1] = pcre16_exec(re16, &dummy_extra16, regtest_buf16, length16, current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector16_2, 32); - } -#endif - -#ifdef SUPPORT_PCRE32 - return_value32[0] = -1000; - return_value32[1] = -1000; - for (i = 0; i < 32; ++i) - ovector32_1[i] = -2; - for (i = 0; i < 32; ++i) - ovector32_2[i] = -2; - if (re32) { - mark32_1 = NULL; - mark32_2 = NULL; - if ((current->flags & PCRE_UTF32) || (current->start_offset & F_FORCECONV)) - length32 = convert_utf8_to_utf32(current->input, regtest_buf32, regtest_offsetmap32, REGTEST_MAX_LENGTH32); - else - length32 = copy_char8_to_char32(current->input, regtest_buf32, REGTEST_MAX_LENGTH32); - extra32->mark = &mark32_1; - if ((counter & 0x1) != 0) { - setstack32(extra32); - return_value32[0] = pcre32_exec(re32, extra32, regtest_buf32, length32, current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | 
PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector32_1, 32); - } else - return_value32[0] = pcre32_jit_exec(re32, extra32, regtest_buf32, length32, current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector32_1, 32, getstack32()); - memset(&dummy_extra32, 0, sizeof(pcre32_extra)); - dummy_extra32.flags = PCRE_EXTRA_MARK; - if (current->start_offset & F_STUDY) { - dummy_extra32.flags |= PCRE_EXTRA_STUDY_DATA; - dummy_extra32.study_data = extra32->study_data; - } - dummy_extra32.mark = &mark32_2; - return_value32[1] = pcre32_exec(re32, &dummy_extra32, regtest_buf32, length32, current->start_offset & OFFSET_MASK, - current->flags & (PCRE_NOTBOL | PCRE_NOTEOL | PCRE_NOTEMPTY | PCRE_NOTEMPTY_ATSTART | PCRE_PARTIAL_SOFT | PCRE_PARTIAL_HARD | PCRE_NO_UTF8_CHECK), ovector32_2, 32); - } -#endif - - /* printf("[%d-%d-%d|%d-%d|%d-%d|%d-%d]%s", - return_value8[0], return_value16[0], return_value32[0], - ovector8_1[0], ovector8_1[1], - ovector16_1[0], ovector16_1[1], - ovector32_1[0], ovector32_1[1], - (current->flags & PCRE_CASELESS) ? "C" : ""); */ - - /* If F_DIFF is set, just run the test, but do not compare the results. - Segfaults can still be captured. */ - - is_successful = 1; - if (!(current->start_offset & F_DIFF)) { -#if defined SUPPORT_UTF && ((defined(SUPPORT_PCRE8) + defined(SUPPORT_PCRE16) + defined(SUPPORT_PCRE32)) >= 2) - if (!(current->start_offset & F_FORCECONV)) { - int return_value; - - /* All results must be the same. 
*/ -#ifdef SUPPORT_PCRE8 - if ((return_value = return_value8[0]) != return_value8[1]) { - printf("\n8 bit: Return value differs(J8:%d,I8:%d): [%d] '%s' @ '%s'\n", - return_value8[0], return_value8[1], total, current->pattern, current->input); - is_successful = 0; - } else -#endif -#ifdef SUPPORT_PCRE16 - if ((return_value = return_value16[0]) != return_value16[1]) { - printf("\n16 bit: Return value differs(J16:%d,I16:%d): [%d] '%s' @ '%s'\n", - return_value16[0], return_value16[1], total, current->pattern, current->input); - is_successful = 0; - } else -#endif -#ifdef SUPPORT_PCRE32 - if ((return_value = return_value32[0]) != return_value32[1]) { - printf("\n32 bit: Return value differs(J32:%d,I32:%d): [%d] '%s' @ '%s'\n", - return_value32[0], return_value32[1], total, current->pattern, current->input); - is_successful = 0; - } else -#endif -#if defined SUPPORT_PCRE8 && defined SUPPORT_PCRE16 - if (return_value8[0] != return_value16[0]) { - printf("\n8 and 16 bit: Return value differs(J8:%d,J16:%d): [%d] '%s' @ '%s'\n", - return_value8[0], return_value16[0], - total, current->pattern, current->input); - is_successful = 0; - } else -#endif -#if defined SUPPORT_PCRE8 && defined SUPPORT_PCRE32 - if (return_value8[0] != return_value32[0]) { - printf("\n8 and 32 bit: Return value differs(J8:%d,J32:%d): [%d] '%s' @ '%s'\n", - return_value8[0], return_value32[0], - total, current->pattern, current->input); - is_successful = 0; - } else -#endif -#if defined SUPPORT_PCRE16 && defined SUPPORT_PCRE32 - if (return_value16[0] != return_value32[0]) { - printf("\n16 and 32 bit: Return value differs(J16:%d,J32:%d): [%d] '%s' @ '%s'\n", - return_value16[0], return_value32[0], - total, current->pattern, current->input); - is_successful = 0; - } else -#endif - if (return_value >= 0 || return_value == PCRE_ERROR_PARTIAL) { - if (return_value == PCRE_ERROR_PARTIAL) { - return_value = 2; - } else { - return_value *= 2; - } -#ifdef SUPPORT_PCRE8 - return_value8[0] = return_value; -#endif 
-#ifdef SUPPORT_PCRE16 - return_value16[0] = return_value; -#endif -#ifdef SUPPORT_PCRE32 - return_value32[0] = return_value; -#endif - /* Transform back the results. */ - if (current->flags & PCRE_UTF8) { -#ifdef SUPPORT_PCRE16 - for (i = 0; i < return_value; ++i) { - if (ovector16_1[i] >= 0) - ovector16_1[i] = regtest_offsetmap16[ovector16_1[i]]; - if (ovector16_2[i] >= 0) - ovector16_2[i] = regtest_offsetmap16[ovector16_2[i]]; - } -#endif -#ifdef SUPPORT_PCRE32 - for (i = 0; i < return_value; ++i) { - if (ovector32_1[i] >= 0) - ovector32_1[i] = regtest_offsetmap32[ovector32_1[i]]; - if (ovector32_2[i] >= 0) - ovector32_2[i] = regtest_offsetmap32[ovector32_2[i]]; - } -#endif - } - - for (i = 0; i < return_value; ++i) { -#if defined SUPPORT_PCRE8 && defined SUPPORT_PCRE16 - if (ovector8_1[i] != ovector8_2[i] || ovector8_1[i] != ovector16_1[i] || ovector8_1[i] != ovector16_2[i]) { - printf("\n8 and 16 bit: Ovector[%d] value differs(J8:%d,I8:%d,J16:%d,I16:%d): [%d] '%s' @ '%s' \n", - i, ovector8_1[i], ovector8_2[i], ovector16_1[i], ovector16_2[i], - total, current->pattern, current->input); - is_successful = 0; - } -#endif -#if defined SUPPORT_PCRE8 && defined SUPPORT_PCRE32 - if (ovector8_1[i] != ovector8_2[i] || ovector8_1[i] != ovector32_1[i] || ovector8_1[i] != ovector32_2[i]) { - printf("\n8 and 32 bit: Ovector[%d] value differs(J8:%d,I8:%d,J32:%d,I32:%d): [%d] '%s' @ '%s' \n", - i, ovector8_1[i], ovector8_2[i], ovector32_1[i], ovector32_2[i], - total, current->pattern, current->input); - is_successful = 0; - } -#endif -#if defined SUPPORT_PCRE16 && defined SUPPORT_PCRE32 - if (ovector16_1[i] != ovector16_2[i] || ovector16_1[i] != ovector32_1[i] || ovector16_1[i] != ovector32_2[i]) { - printf("\n16 and 32 bit: Ovector[%d] value differs(J16:%d,I16:%d,J32:%d,I32:%d): [%d] '%s' @ '%s' \n", - i, ovector16_1[i], ovector16_2[i], ovector32_1[i], ovector32_2[i], - total, current->pattern, current->input); - is_successful = 0; - } -#endif - } - } - } else -#endif /* 
more than one of SUPPORT_PCRE8, SUPPORT_PCRE16 and SUPPORT_PCRE32 */ - { - /* Only the 8 bit and 16 bit results must be equal. */ -#ifdef SUPPORT_PCRE8 - if (return_value8[0] != return_value8[1]) { - printf("\n8 bit: Return value differs(%d:%d): [%d] '%s' @ '%s'\n", - return_value8[0], return_value8[1], total, current->pattern, current->input); - is_successful = 0; - } else if (return_value8[0] >= 0 || return_value8[0] == PCRE_ERROR_PARTIAL) { - if (return_value8[0] == PCRE_ERROR_PARTIAL) - return_value8[0] = 2; - else - return_value8[0] *= 2; - - for (i = 0; i < return_value8[0]; ++i) - if (ovector8_1[i] != ovector8_2[i]) { - printf("\n8 bit: Ovector[%d] value differs(%d:%d): [%d] '%s' @ '%s'\n", - i, ovector8_1[i], ovector8_2[i], total, current->pattern, current->input); - is_successful = 0; - } - } -#endif - -#ifdef SUPPORT_PCRE16 - if (return_value16[0] != return_value16[1]) { - printf("\n16 bit: Return value differs(%d:%d): [%d] '%s' @ '%s'\n", - return_value16[0], return_value16[1], total, current->pattern, current->input); - is_successful = 0; - } else if (return_value16[0] >= 0 || return_value16[0] == PCRE_ERROR_PARTIAL) { - if (return_value16[0] == PCRE_ERROR_PARTIAL) - return_value16[0] = 2; - else - return_value16[0] *= 2; - - for (i = 0; i < return_value16[0]; ++i) - if (ovector16_1[i] != ovector16_2[i]) { - printf("\n16 bit: Ovector[%d] value differs(%d:%d): [%d] '%s' @ '%s'\n", - i, ovector16_1[i], ovector16_2[i], total, current->pattern, current->input); - is_successful = 0; - } - } -#endif - -#ifdef SUPPORT_PCRE32 - if (return_value32[0] != return_value32[1]) { - printf("\n32 bit: Return value differs(%d:%d): [%d] '%s' @ '%s'\n", - return_value32[0], return_value32[1], total, current->pattern, current->input); - is_successful = 0; - } else if (return_value32[0] >= 0 || return_value32[0] == PCRE_ERROR_PARTIAL) { - if (return_value32[0] == PCRE_ERROR_PARTIAL) - return_value32[0] = 2; - else - return_value32[0] *= 2; - - for (i = 0; i < 
return_value32[0]; ++i) - if (ovector32_1[i] != ovector32_2[i]) { - printf("\n32 bit: Ovector[%d] value differs(%d:%d): [%d] '%s' @ '%s'\n", - i, ovector32_1[i], ovector32_2[i], total, current->pattern, current->input); - is_successful = 0; - } - } -#endif - } - } - - if (is_successful) { -#ifdef SUPPORT_PCRE8 - if (!(current->start_offset & F_NO8) && ((utf && ucp) || is_ascii)) { - if (return_value8[0] < 0 && !(current->start_offset & F_NOMATCH)) { - printf("8 bit: Test should match: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } - - if (return_value8[0] >= 0 && (current->start_offset & F_NOMATCH)) { - printf("8 bit: Test should not match: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } - } -#endif -#ifdef SUPPORT_PCRE16 - if (!(current->start_offset & F_NO16) && ((utf && ucp) || is_ascii)) { - if (return_value16[0] < 0 && !(current->start_offset & F_NOMATCH)) { - printf("16 bit: Test should match: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } - - if (return_value16[0] >= 0 && (current->start_offset & F_NOMATCH)) { - printf("16 bit: Test should not match: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } - } -#endif -#ifdef SUPPORT_PCRE32 - if (!(current->start_offset & F_NO32) && ((utf && ucp) || is_ascii)) { - if (return_value32[0] < 0 && !(current->start_offset & F_NOMATCH)) { - printf("32 bit: Test should match: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } - - if (return_value32[0] >= 0 && (current->start_offset & F_NOMATCH)) { - printf("32 bit: Test should not match: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } - } -#endif - } - - if (is_successful) { -#ifdef SUPPORT_PCRE8 - if (mark8_1 != mark8_2) { - printf("8 bit: Mark value mismatch: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - 
is_successful = 0; - } -#endif -#ifdef SUPPORT_PCRE16 - if (mark16_1 != mark16_2) { - printf("16 bit: Mark value mismatch: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } -#endif -#ifdef SUPPORT_PCRE32 - if (mark32_1 != mark32_2) { - printf("32 bit: Mark value mismatch: [%d] '%s' @ '%s'\n", - total, current->pattern, current->input); - is_successful = 0; - } -#endif - } - -#ifdef SUPPORT_PCRE8 - if (re8) { - pcre_free_study(extra8); - pcre_free(re8); - } -#endif -#ifdef SUPPORT_PCRE16 - if (re16) { - pcre16_free_study(extra16); - pcre16_free(re16); - } -#endif -#ifdef SUPPORT_PCRE32 - if (re32) { - pcre32_free_study(extra32); - pcre32_free(re32); - } -#endif - - if (is_successful) { - successful++; - successful_row++; - printf("."); - if (successful_row >= 60) { - successful_row = 0; - printf("\n"); - } - } else - successful_row = 0; - - fflush(stdout); - current++; - } - tables(1); -#ifdef SUPPORT_PCRE8 - setstack8(NULL); -#endif -#ifdef SUPPORT_PCRE16 - setstack16(NULL); -#endif -#ifdef SUPPORT_PCRE32 - setstack32(NULL); -#endif - - if (total == successful) { - printf("\nAll JIT regression tests are successfully passed.\n"); - return 0; - } else { - printf("\nSuccessful test ratio: %d%% (%d failed)\n", successful * 100 / total, total - successful); - return 1; - } -} - -/* End of pcre_jit_test.c */ diff --git a/src/pcre/pcre_refcount.c b/src/pcre/pcre_refcount.c deleted file mode 100644 index 79efa90f..00000000 --- a/src/pcre/pcre_refcount.c +++ /dev/null @@ -1,92 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains the external function pcre_refcount(), which is an -auxiliary function that can be used to maintain a reference count in a compiled -pattern data block. This might be helpful in applications where the block is -shared by different users. 
*/ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - - -/************************************************* -* Maintain reference count * -*************************************************/ - -/* The reference count is a 16-bit field, initialized to zero. It is not -possible to transfer a non-zero count from one host to a different host that -has a different byte order - though I can't see why anyone in their right mind -would ever want to do that! - -Arguments: - argument_re points to compiled code - adjust value to add to the count - -Returns: the (possibly updated) count value (a non-negative number), or - a negative error number -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_refcount(pcre *argument_re, int adjust) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_refcount(pcre16 *argument_re, int adjust) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_refcount(pcre32 *argument_re, int adjust) -#endif -{ -REAL_PCRE *re = (REAL_PCRE *)argument_re; -if (re == NULL) return PCRE_ERROR_NULL; -if (re->magic_number != MAGIC_NUMBER) return PCRE_ERROR_BADMAGIC; -if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE; -re->ref_count = (-adjust > re->ref_count)? 0 : - (adjust + re->ref_count > 65535)? 65535 : - re->ref_count + adjust; -return re->ref_count; -} - -/* End of pcre_refcount.c */ diff --git a/src/pcre/pcre_scanner.cc b/src/pcre/pcre_scanner.cc deleted file mode 100644 index 6be2be68..00000000 --- a/src/pcre/pcre_scanner.cc +++ /dev/null @@ -1,199 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. 
-// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-// -// Author: Sanjay Ghemawat - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include - -#include "pcrecpp_internal.h" -#include "pcre_scanner.h" - -using std::vector; - -namespace pcrecpp { - -Scanner::Scanner() - : data_(), - input_(data_), - skip_(NULL), - should_skip_(false), - skip_repeat_(false), - save_comments_(false), - comments_(NULL), - comments_offset_(0) { -} - -Scanner::Scanner(const string& in) - : data_(in), - input_(data_), - skip_(NULL), - should_skip_(false), - skip_repeat_(false), - save_comments_(false), - comments_(NULL), - comments_offset_(0) { -} - -Scanner::~Scanner() { - delete skip_; - delete comments_; -} - -void Scanner::SetSkipExpression(const char* re) { - delete skip_; - if (re != NULL) { - skip_ = new RE(re); - should_skip_ = true; - skip_repeat_ = true; - ConsumeSkip(); - } else { - skip_ = NULL; - should_skip_ = false; - skip_repeat_ = false; - } -} - -void Scanner::Skip(const char* re) { - delete skip_; - if (re != NULL) { - skip_ = new RE(re); - should_skip_ = true; - skip_repeat_ = false; - ConsumeSkip(); - } else { - skip_ = NULL; - should_skip_ = false; - skip_repeat_ = false; - } -} - -void Scanner::DisableSkip() { - assert(skip_ != NULL); - should_skip_ = false; -} - -void Scanner::EnableSkip() { - assert(skip_ != NULL); - should_skip_ = true; - ConsumeSkip(); -} - -int Scanner::LineNumber() const { - // TODO: Make it more efficient by keeping track of the last point - // where we computed line numbers and counting newlines since then. - // We could use std:count, but not all systems have it. 
:-( - int count = 1; - for (const char* p = data_.data(); p < input_.data(); ++p) - if (*p == '\n') - ++count; - return count; -} - -int Scanner::Offset() const { - return (int)(input_.data() - data_.c_str()); -} - -bool Scanner::LookingAt(const RE& re) const { - int consumed; - return re.DoMatch(input_, RE::ANCHOR_START, &consumed, 0, 0); -} - - -bool Scanner::Consume(const RE& re, - const Arg& arg0, - const Arg& arg1, - const Arg& arg2) { - const bool result = re.Consume(&input_, arg0, arg1, arg2); - if (result && should_skip_) ConsumeSkip(); - return result; -} - -// helper function to consume *skip_ and honour save_comments_ -void Scanner::ConsumeSkip() { - const char* start_data = input_.data(); - while (skip_->Consume(&input_)) { - if (!skip_repeat_) { - // Only one skip allowed. - break; - } - } - if (save_comments_) { - if (comments_ == NULL) { - comments_ = new vector; - } - // already pointing one past end, so no need to +1 - int length = (int)(input_.data() - start_data); - if (length > 0) { - comments_->push_back(StringPiece(start_data, length)); - } - } -} - - -void Scanner::GetComments(int start, int end, vector *ranges) { - // short circuit out if we've not yet initialized comments_ - // (e.g., when save_comments is false) - if (!comments_) { - return; - } - // TODO: if we guarantee that comments_ will contain StringPieces - // that are ordered by their start, then we can do a binary search - // for the first StringPiece at or past start and then scan for the - // ones contained in the range, quit early (use equal_range or - // lower_bound) - for (vector::const_iterator it = comments_->begin(); - it != comments_->end(); ++it) { - if ((it->data() >= data_.c_str() + start && - it->data() + it->size() <= data_.c_str() + end)) { - ranges->push_back(*it); - } - } -} - - -void Scanner::GetNextComments(vector *ranges) { - // short circuit out if we've not yet initialized comments_ - // (e.g., when save_comments is false) - if (!comments_) { - return; - } - 
for (vector::const_iterator it = - comments_->begin() + comments_offset_; - it != comments_->end(); ++it) { - ranges->push_back(*it); - ++comments_offset_; - } -} - -} // namespace pcrecpp diff --git a/src/pcre/pcre_scanner.h b/src/pcre/pcre_scanner.h deleted file mode 100644 index 5617e451..00000000 --- a/src/pcre/pcre_scanner.h +++ /dev/null @@ -1,172 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-// -// Author: Sanjay Ghemawat -// -// Regular-expression based scanner for parsing an input stream. -// -// Example 1: parse a sequence of "var = number" entries from input: -// -// Scanner scanner(input); -// string var; -// int number; -// scanner.SetSkipExpression("\\s+"); // Skip any white space we encounter -// while (scanner.Consume("(\\w+) = (\\d+)", &var, &number)) { -// ...; -// } - -#ifndef _PCRE_SCANNER_H -#define _PCRE_SCANNER_H - -#include -#include -#include - -#include -#include - -namespace pcrecpp { - -class PCRECPP_EXP_DEFN Scanner { - public: - Scanner(); - explicit Scanner(const std::string& input); - ~Scanner(); - - // Return current line number. The returned line-number is - // one-based. I.e. it returns 1 + the number of consumed newlines. - // - // Note: this method may be slow. It may take time proportional to - // the size of the input. - int LineNumber() const; - - // Return the byte-offset that the scanner is looking in the - // input data; - int Offset() const; - - // Return true iff the start of the remaining input matches "re" - bool LookingAt(const RE& re) const; - - // Return true iff all of the following are true - // a. the start of the remaining input matches "re", - // b. if any arguments are supplied, matched sub-patterns can be - // parsed and stored into the arguments. - // If it returns true, it skips over the matched input and any - // following input that matches the "skip" regular expression. - bool Consume(const RE& re, - const Arg& arg0 = RE::no_arg, - const Arg& arg1 = RE::no_arg, - const Arg& arg2 = RE::no_arg - // TODO: Allow more arguments? - ); - - // Set the "skip" regular expression. If after consuming some data, - // a prefix of the input matches this RE, it is automatically - // skipped. For example, a programming language scanner would use - // a skip RE that matches white space and comments. - // - // scanner.SetSkipExpression("\\s+|//.*|/[*](.|\n)*?[*]/"); - // - // Skipping repeats as long as it succeeds. 
We used to let people do - // this by writing "(...)*" in the regular expression, but that added - // up to lots of recursive calls within the pcre library, so now we - // control repetition explicitly via the function call API. - // - // You can pass NULL for "re" if you do not want any data to be skipped. - void Skip(const char* re); // DEPRECATED; does *not* repeat - void SetSkipExpression(const char* re); - - // Temporarily pause "skip"ing. This - // Skip("Foo"); code ; DisableSkip(); code; EnableSkip() - // is similar to - // Skip("Foo"); code ; Skip(NULL); code ; Skip("Foo"); - // but avoids creating/deleting new RE objects. - void DisableSkip(); - - // Reenable previously paused skipping. Any prefix of the input - // that matches the skip pattern is immediately dropped. - void EnableSkip(); - - /***** Special wrappers around SetSkip() for some common idioms *****/ - - // Arranges to skip whitespace, C comments, C++ comments. - // The overall RE is a disjunction of the following REs: - // \\s whitespace - // //.*\n C++ comment - // /[*](.|\n)*?[*]/ C comment (x*? means minimal repetitions of x) - // We get repetition via the semantics of SetSkipExpression, not by using * - void SkipCXXComments() { - SetSkipExpression("\\s|//.*\n|/[*](?:\n|.)*?[*]/"); - } - - void set_save_comments(bool comments) { - save_comments_ = comments; - } - - bool save_comments() { - return save_comments_; - } - - // Append to vector ranges the comments found in the - // byte range [start,end] (inclusive) of the input data. - // Only comments that were extracted entirely within that - // range are returned: no range splitting of atomically-extracted - // comments is performed. - void GetComments(int start, int end, std::vector *ranges); - - // Append to vector ranges the comments added - // since the last time this was called. This - // functionality is provided for efficiency when - // interleaving scanning with parsing. 
- void GetNextComments(std::vector<StringPiece> *ranges); - - private: - std::string data_; // All the input data - StringPiece input_; // Unprocessed input - RE* skip_; // If non-NULL, RE for skipping input - bool should_skip_; // If true, use skip_ - bool skip_repeat_; // If true, repeat skip_ as long as it works - bool save_comments_; // If true, aggregate the skip expression - - // the skipped comments - // TODO: later consider requiring that the StringPieces be added - // in order by their start position - std::vector<StringPiece> *comments_; - - // the offset into comments_ that has been returned by GetNextComments - int comments_offset_; - - // helper function to consume *skip_ and honour - // save_comments_ - void ConsumeSkip(); -}; - -} // namespace pcrecpp - -#endif /* _PCRE_SCANNER_H */ diff --git a/src/pcre/pcre_scanner_unittest.cc b/src/pcre/pcre_scanner_unittest.cc deleted file mode 100644 index 623e2afd..00000000 --- a/src/pcre/pcre_scanner_unittest.cc +++ /dev/null @@ -1,162 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission.
-// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -// Author: Greg J. Badros -// -// Unittest for scanner, especially GetNextComments and GetComments() -// functionality. - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include /* for strchr */ -#include -#include - -#include "pcrecpp.h" -#include "pcre_stringpiece.h" -#include "pcre_scanner.h" - -#define FLAGS_unittest_stack_size 49152 - -// Dies with a fatal error if the two values are not equal. 
-#define CHECK_EQ(a, b) do { \ - if ( (a) != (b) ) { \ - fprintf(stderr, "%s:%d: Check failed because %s != %s\n", \ - __FILE__, __LINE__, #a, #b); \ - exit(1); \ - } \ -} while (0) - -using std::vector; -using std::string; -using pcrecpp::StringPiece; -using pcrecpp::Scanner; - -static void TestScanner() { - const char input[] = "\n" - "alpha = 1; // this sets alpha\n" - "bravo = 2; // bravo is set here\n" - "gamma = 33; /* and here is gamma */\n"; - - const char *re = "(\\w+) = (\\d+);"; - - Scanner s(input); - string var; - int number; - s.SkipCXXComments(); - s.set_save_comments(true); - vector<StringPiece> comments; - - s.Consume(re, &var, &number); - CHECK_EQ(var, "alpha"); - CHECK_EQ(number, 1); - CHECK_EQ(s.LineNumber(), 3); - s.GetNextComments(&comments); - CHECK_EQ(comments.size(), 1); - CHECK_EQ(comments[0].as_string(), " // this sets alpha\n"); - comments.resize(0); - - s.Consume(re, &var, &number); - CHECK_EQ(var, "bravo"); - CHECK_EQ(number, 2); - s.GetNextComments(&comments); - CHECK_EQ(comments.size(), 1); - CHECK_EQ(comments[0].as_string(), " // bravo is set here\n"); - comments.resize(0); - - s.Consume(re, &var, &number); - CHECK_EQ(var, "gamma"); - CHECK_EQ(number, 33); - s.GetNextComments(&comments); - CHECK_EQ(comments.size(), 1); - CHECK_EQ(comments[0].as_string(), " /* and here is gamma */\n"); - comments.resize(0); - - s.GetComments(0, sizeof(input), &comments); - CHECK_EQ(comments.size(), 3); - CHECK_EQ(comments[0].as_string(), " // this sets alpha\n"); - CHECK_EQ(comments[1].as_string(), " // bravo is set here\n"); - CHECK_EQ(comments[2].as_string(), " /* and here is gamma */\n"); - comments.resize(0); - - s.GetComments(0, (int)(strchr(input, '/') - input), &comments); - CHECK_EQ(comments.size(), 0); - comments.resize(0); - - s.GetComments((int)(strchr(input, '/') - input - 1), sizeof(input), - &comments); - CHECK_EQ(comments.size(), 3); - CHECK_EQ(comments[0].as_string(), " // this sets alpha\n"); - CHECK_EQ(comments[1].as_string(), " // bravo is set 
here\n"); - CHECK_EQ(comments[2].as_string(), " /* and here is gamma */\n"); - comments.resize(0); - - s.GetComments((int)(strchr(input, '/') - input - 1), - (int)(strchr(input + 1, '\n') - input + 1), &comments); - CHECK_EQ(comments.size(), 1); - CHECK_EQ(comments[0].as_string(), " // this sets alpha\n"); - comments.resize(0); -} - -static void TestBigComment() { - string input; - for (int i = 0; i < 1024; ++i) { - char buf[1024]; // definitely big enough - sprintf(buf, " # Comment %d\n", i); - input += buf; - } - input += "name = value;\n"; - - Scanner s(input.c_str()); - s.SetSkipExpression("\\s+|#.*\n"); - - string name; - string value; - s.Consume("(\\w+) = (\\w+);", &name, &value); - CHECK_EQ(name, "name"); - CHECK_EQ(value, "value"); -} - -// TODO: also test scanner and big-comment in a thread with a -// small stack size - -int main(int argc, char** argv) { - (void)argc; - (void)argv; - TestScanner(); - TestBigComment(); - - // Done - printf("OK\n"); - - return 0; -} diff --git a/src/pcre/pcre_string_utils.c b/src/pcre/pcre_string_utils.c deleted file mode 100644 index 25eacc85..00000000 --- a/src/pcre/pcre_string_utils.c +++ /dev/null @@ -1,211 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2014 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. 
- - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains internal functions for comparing and finding the length -of strings for different data item sizes. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - -#ifndef COMPILE_PCRE8 - -/************************************************* -* Compare string utilities * -*************************************************/ - -/* The following two functions compares two strings. Basically a strcmp -for non 8 bit characters. 
- -Arguments: - str1 first string - str2 second string - -Returns: 0 if both string are equal (like strcmp), 1 otherwise -*/ - -int -PRIV(strcmp_uc_uc)(const pcre_uchar *str1, const pcre_uchar *str2) -{ -pcre_uchar c1; -pcre_uchar c2; - -while (*str1 != '\0' || *str2 != '\0') - { - c1 = *str1++; - c2 = *str2++; - if (c1 != c2) - return ((c1 > c2) << 1) - 1; - } -/* Both length and characters must be equal. */ -return 0; -} - -#ifdef COMPILE_PCRE32 - -int -PRIV(strcmp_uc_uc_utf)(const pcre_uchar *str1, const pcre_uchar *str2) -{ -pcre_uchar c1; -pcre_uchar c2; - -while (*str1 != '\0' || *str2 != '\0') - { - c1 = UCHAR21INC(str1); - c2 = UCHAR21INC(str2); - if (c1 != c2) - return ((c1 > c2) << 1) - 1; - } -/* Both length and characters must be equal. */ -return 0; -} - -#endif /* COMPILE_PCRE32 */ - -int -PRIV(strcmp_uc_c8)(const pcre_uchar *str1, const char *str2) -{ -const pcre_uint8 *ustr2 = (pcre_uint8 *)str2; -pcre_uchar c1; -pcre_uchar c2; - -while (*str1 != '\0' || *ustr2 != '\0') - { - c1 = *str1++; - c2 = (pcre_uchar)*ustr2++; - if (c1 != c2) - return ((c1 > c2) << 1) - 1; - } -/* Both length and characters must be equal. */ -return 0; -} - -#ifdef COMPILE_PCRE32 - -int -PRIV(strcmp_uc_c8_utf)(const pcre_uchar *str1, const char *str2) -{ -const pcre_uint8 *ustr2 = (pcre_uint8 *)str2; -pcre_uchar c1; -pcre_uchar c2; - -while (*str1 != '\0' || *ustr2 != '\0') - { - c1 = UCHAR21INC(str1); - c2 = (pcre_uchar)*ustr2++; - if (c1 != c2) - return ((c1 > c2) << 1) - 1; - } -/* Both length and characters must be equal. */ -return 0; -} - -#endif /* COMPILE_PCRE32 */ - -/* The following two functions compares two, fixed length -strings. Basically an strncmp for non 8 bit characters. 
- -Arguments: - str1 first string - str2 second string - num size of the string - -Returns: 0 if both string are equal (like strcmp), 1 otherwise -*/ - -int -PRIV(strncmp_uc_uc)(const pcre_uchar *str1, const pcre_uchar *str2, unsigned int num) -{ -pcre_uchar c1; -pcre_uchar c2; - -while (num-- > 0) - { - c1 = *str1++; - c2 = *str2++; - if (c1 != c2) - return ((c1 > c2) << 1) - 1; - } -/* Both length and characters must be equal. */ -return 0; -} - -int -PRIV(strncmp_uc_c8)(const pcre_uchar *str1, const char *str2, unsigned int num) -{ -const pcre_uint8 *ustr2 = (pcre_uint8 *)str2; -pcre_uchar c1; -pcre_uchar c2; - -while (num-- > 0) - { - c1 = *str1++; - c2 = (pcre_uchar)*ustr2++; - if (c1 != c2) - return ((c1 > c2) << 1) - 1; - } -/* Both length and characters must be equal. */ -return 0; -} - -/* The following function returns with the length of -a zero terminated string. Basically an strlen for non 8 bit characters. - -Arguments: - str string - -Returns: length of the string -*/ - -unsigned int -PRIV(strlen_uc)(const pcre_uchar *str) -{ -unsigned int len = 0; -while (*str++ != 0) - len++; -return len; -} - -#endif /* !COMPILE_PCRE8 */ - -/* End of pcre_string_utils.c */ diff --git a/src/pcre/pcre_stringpiece.cc b/src/pcre/pcre_stringpiece.cc deleted file mode 100644 index 67c0f1fc..00000000 --- a/src/pcre/pcre_stringpiece.cc +++ /dev/null @@ -1,43 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. 
nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -// Author: wilsonh@google.com (Wilson Hsieh) -// - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include "pcrecpp_internal.h" -#include "pcre_stringpiece.h" - -std::ostream& operator<<(std::ostream& o, const pcrecpp::StringPiece& piece) { - return (o << piece.as_string()); -} diff --git a/src/pcre/pcre_stringpiece.h.in b/src/pcre/pcre_stringpiece.h.in deleted file mode 100644 index f54f3f3b..00000000 --- a/src/pcre/pcre_stringpiece.h.in +++ /dev/null @@ -1,180 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. 
-// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -// Author: Sanjay Ghemawat -// -// A string like object that points into another piece of memory. -// Useful for providing an interface that allows clients to easily -// pass in either a "const char*" or a "string". -// -// Arghh! I wish C++ literals were automatically of type "string". 
- -#ifndef _PCRE_STRINGPIECE_H -#define _PCRE_STRINGPIECE_H - -#include -#include -#include // for ostream forward-declaration - -#if @pcre_have_type_traits@ -#define HAVE_TYPE_TRAITS -#include -#elif @pcre_have_bits_type_traits@ -#define HAVE_TYPE_TRAITS -#include -#endif - -#include - -namespace pcrecpp { - -using std::memcmp; -using std::strlen; -using std::string; - -class PCRECPP_EXP_DEFN StringPiece { - private: - const char* ptr_; - int length_; - - public: - // We provide non-explicit singleton constructors so users can pass - // in a "const char*" or a "string" wherever a "StringPiece" is - // expected. - StringPiece() - : ptr_(NULL), length_(0) { } - StringPiece(const char* str) - : ptr_(str), length_(static_cast<int>(strlen(ptr_))) { } - StringPiece(const unsigned char* str) - : ptr_(reinterpret_cast<const char*>(str)), - length_(static_cast<int>(strlen(ptr_))) { } - StringPiece(const string& str) - : ptr_(str.data()), length_(static_cast<int>(str.size())) { } - StringPiece(const char* offset, int len) - : ptr_(offset), length_(len) { } - - // data() may return a pointer to a buffer with embedded NULs, and the - // returned buffer may or may not be null terminated. Therefore it is - // typically a mistake to pass data() to a routine that expects a NUL - // terminated string. Use "as_string().c_str()" if you really need to do - // this. Or better yet, change your routine so it does not rely on NUL - // termination.
- const char* data() const { return ptr_; } - int size() const { return length_; } - bool empty() const { return length_ == 0; } - - void clear() { ptr_ = NULL; length_ = 0; } - void set(const char* buffer, int len) { ptr_ = buffer; length_ = len; } - void set(const char* str) { - ptr_ = str; - length_ = static_cast<int>(strlen(str)); - } - void set(const void* buffer, int len) { - ptr_ = reinterpret_cast<const char*>(buffer); - length_ = len; - } - - char operator[](int i) const { return ptr_[i]; } - - void remove_prefix(int n) { - ptr_ += n; - length_ -= n; - } - - void remove_suffix(int n) { - length_ -= n; - } - - bool operator==(const StringPiece& x) const { - return ((length_ == x.length_) && - (memcmp(ptr_, x.ptr_, length_) == 0)); - } - bool operator!=(const StringPiece& x) const { - return !(*this == x); - } - -#define STRINGPIECE_BINARY_PREDICATE(cmp,auxcmp) \ - bool operator cmp (const StringPiece& x) const { \ - int r = memcmp(ptr_, x.ptr_, length_ < x.length_ ? length_ : x.length_); \ - return ((r auxcmp 0) || ((r == 0) && (length_ cmp x.length_))); \ - } - STRINGPIECE_BINARY_PREDICATE(<, <); - STRINGPIECE_BINARY_PREDICATE(<=, <); - STRINGPIECE_BINARY_PREDICATE(>=, >); - STRINGPIECE_BINARY_PREDICATE(>, >); -#undef STRINGPIECE_BINARY_PREDICATE - - int compare(const StringPiece& x) const { - int r = memcmp(ptr_, x.ptr_, length_ < x.length_ ? 
length_ : x.length_); - if (r == 0) { - if (length_ < x.length_) r = -1; - else if (length_ > x.length_) r = +1; - } - return r; - } - - string as_string() const { - return string(data(), size()); - } - - void CopyToString(string* target) const { - target->assign(ptr_, length_); - } - - // Does "this" start with "x" - bool starts_with(const StringPiece& x) const { - return ((length_ >= x.length_) && (memcmp(ptr_, x.ptr_, x.length_) == 0)); - } -}; - -} // namespace pcrecpp - -// ------------------------------------------------------------------ -// Functions used to create STL containers that use StringPiece -// Remember that a StringPiece's lifetime had better be less than -// that of the underlying string or char*. If it is not, then you -// cannot safely store a StringPiece into an STL container -// ------------------------------------------------------------------ - -#ifdef HAVE_TYPE_TRAITS -// This makes vector really fast for some STL implementations -template<> struct __type_traits { - typedef __true_type has_trivial_default_constructor; - typedef __true_type has_trivial_copy_constructor; - typedef __true_type has_trivial_assignment_operator; - typedef __true_type has_trivial_destructor; - typedef __true_type is_POD_type; -}; -#endif - -// allow StringPiece to be logged -PCRECPP_EXP_DECL std::ostream& operator<<(std::ostream& o, - const pcrecpp::StringPiece& piece); - -#endif /* _PCRE_STRINGPIECE_H */ diff --git a/src/pcre/pcre_stringpiece_unittest.cc b/src/pcre/pcre_stringpiece_unittest.cc deleted file mode 100644 index 88e73a1f..00000000 --- a/src/pcre/pcre_stringpiece_unittest.cc +++ /dev/null @@ -1,153 +0,0 @@ -// Copyright 2003 and onwards Google Inc. -// Author: Sanjay Ghemawat - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include -#include // for make_pair - -#include "pcrecpp.h" -#include "pcre_stringpiece.h" - -// CHECK dies with a fatal error if condition is not true. 
It is *not* -// controlled by NDEBUG, so the check will be executed regardless of -// compilation mode. Therefore, it is safe to do things like: -// CHECK(fp->Write(x) == 4) -#define CHECK(condition) do { \ - if (!(condition)) { \ - fprintf(stderr, "%s:%d: Check failed: %s\n", \ - __FILE__, __LINE__, #condition); \ - exit(1); \ - } \ -} while (0) - -using std::string; -using pcrecpp::StringPiece; - -static void CheckSTLComparator() { - string s1("foo"); - string s2("bar"); - string s3("baz"); - - StringPiece p1(s1); - StringPiece p2(s2); - StringPiece p3(s3); - - typedef std::map TestMap; - TestMap map; - - map.insert(std::make_pair(p1, 0)); - map.insert(std::make_pair(p2, 1)); - map.insert(std::make_pair(p3, 2)); - - CHECK(map.size() == 3); - - TestMap::const_iterator iter = map.begin(); - CHECK(iter->second == 1); - ++iter; - CHECK(iter->second == 2); - ++iter; - CHECK(iter->second == 0); - ++iter; - CHECK(iter == map.end()); - - TestMap::iterator new_iter = map.find("zot"); - CHECK(new_iter == map.end()); - - new_iter = map.find("bar"); - CHECK(new_iter != map.end()); - - map.erase(new_iter); - CHECK(map.size() == 2); - - iter = map.begin(); - CHECK(iter->second == 2); - ++iter; - CHECK(iter->second == 0); - ++iter; - CHECK(iter == map.end()); -} - -static void CheckComparisonOperators() { -#define CMP_Y(op, x, y) \ - CHECK( (StringPiece((x)) op StringPiece((y)))); \ - CHECK( (StringPiece((x)).compare(StringPiece((y))) op 0)) - -#define CMP_N(op, x, y) \ - CHECK(!(StringPiece((x)) op StringPiece((y)))); \ - CHECK(!(StringPiece((x)).compare(StringPiece((y))) op 0)) - - CMP_Y(==, "", ""); - CMP_Y(==, "a", "a"); - CMP_Y(==, "aa", "aa"); - CMP_N(==, "a", ""); - CMP_N(==, "", "a"); - CMP_N(==, "a", "b"); - CMP_N(==, "a", "aa"); - CMP_N(==, "aa", "a"); - - CMP_N(!=, "", ""); - CMP_N(!=, "a", "a"); - CMP_N(!=, "aa", "aa"); - CMP_Y(!=, "a", ""); - CMP_Y(!=, "", "a"); - CMP_Y(!=, "a", "b"); - CMP_Y(!=, "a", "aa"); - CMP_Y(!=, "aa", "a"); - - CMP_Y(<, "a", "b"); - 
CMP_Y(<, "a", "aa"); - CMP_Y(<, "aa", "b"); - CMP_Y(<, "aa", "bb"); - CMP_N(<, "a", "a"); - CMP_N(<, "b", "a"); - CMP_N(<, "aa", "a"); - CMP_N(<, "b", "aa"); - CMP_N(<, "bb", "aa"); - - CMP_Y(<=, "a", "a"); - CMP_Y(<=, "a", "b"); - CMP_Y(<=, "a", "aa"); - CMP_Y(<=, "aa", "b"); - CMP_Y(<=, "aa", "bb"); - CMP_N(<=, "b", "a"); - CMP_N(<=, "aa", "a"); - CMP_N(<=, "b", "aa"); - CMP_N(<=, "bb", "aa"); - - CMP_N(>=, "a", "b"); - CMP_N(>=, "a", "aa"); - CMP_N(>=, "aa", "b"); - CMP_N(>=, "aa", "bb"); - CMP_Y(>=, "a", "a"); - CMP_Y(>=, "b", "a"); - CMP_Y(>=, "aa", "a"); - CMP_Y(>=, "b", "aa"); - CMP_Y(>=, "bb", "aa"); - - CMP_N(>, "a", "a"); - CMP_N(>, "a", "b"); - CMP_N(>, "a", "aa"); - CMP_N(>, "aa", "b"); - CMP_N(>, "aa", "bb"); - CMP_Y(>, "b", "a"); - CMP_Y(>, "aa", "a"); - CMP_Y(>, "b", "aa"); - CMP_Y(>, "bb", "aa"); - -#undef CMP_Y -#undef CMP_N -} - -int main(int argc, char** argv) { - (void)argc; - (void)argv; - CheckComparisonOperators(); - CheckSTLComparator(); - - printf("OK\n"); - return 0; -} diff --git a/src/pcre/pcre_study.c b/src/pcre/pcre_study.c deleted file mode 100644 index d9d4960d..00000000 --- a/src/pcre/pcre_study.c +++ /dev/null @@ -1,1686 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. 
- - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains the external function pcre_study(), along with local -supporting functions. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - -#define SET_BIT(c) start_bits[c/8] |= (1 << (c&7)) - -/* Returns from set_start_bits() */ - -enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE, SSB_UNKNOWN }; - - - -/************************************************* -* Find the minimum subject length for a group * -*************************************************/ - -/* Scan a parenthesized group and compute the minimum length of subject that -is needed to match it. This is a lower bound; it does not mean there is a -string of that length that matches. 
In UTF8 mode, the result is in characters -rather than bytes. - -Arguments: - re compiled pattern block - code pointer to start of group (the bracket) - startcode pointer to start of the whole pattern's code - options the compiling options - recurses chain of recurse_check to catch mutual recursion - countptr pointer to call count (to catch over complexity) - -Returns: the minimum length - -1 if \C in UTF-8 mode or (*ACCEPT) was encountered - -2 internal error (missing capturing bracket) - -3 internal error (opcode not listed) -*/ - -static int -find_minlength(const REAL_PCRE *re, const pcre_uchar *code, - const pcre_uchar *startcode, int options, recurse_check *recurses, - int *countptr) -{ -int length = -1; -/* PCRE_UTF16 has the same value as PCRE_UTF8. */ -BOOL utf = (options & PCRE_UTF8) != 0; -BOOL had_recurse = FALSE; -recurse_check this_recurse; -register int branchlength = 0; -register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE; - -if ((*countptr)++ > 1000) return -1; /* too complex */ - -if (*code == OP_CBRA || *code == OP_SCBRA || - *code == OP_CBRAPOS || *code == OP_SCBRAPOS) cc += IMM2_SIZE; - -/* Scan along the opcodes for this branch. If we get to the end of the -branch, check the length against that of the other branches. */ - -for (;;) - { - int d, min; - pcre_uchar *cs, *ce; - register pcre_uchar op = *cc; - - switch (op) - { - case OP_COND: - case OP_SCOND: - - /* If there is only one branch in a condition, the implied branch has zero - length, so we don't add anything. This covers the DEFINE "condition" - automatically. */ - - cs = cc + GET(cc, 1); - if (*cs != OP_ALT) - { - cc = cs + 1 + LINK_SIZE; - break; - } - - /* Otherwise we can fall through and treat it the same as any other - subpattern. 
*/ - - case OP_CBRA: - case OP_SCBRA: - case OP_BRA: - case OP_SBRA: - case OP_CBRAPOS: - case OP_SCBRAPOS: - case OP_BRAPOS: - case OP_SBRAPOS: - case OP_ONCE: - case OP_ONCE_NC: - d = find_minlength(re, cc, startcode, options, recurses, countptr); - if (d < 0) return d; - branchlength += d; - do cc += GET(cc, 1); while (*cc == OP_ALT); - cc += 1 + LINK_SIZE; - break; - - /* ACCEPT makes things far too complicated; we have to give up. */ - - case OP_ACCEPT: - case OP_ASSERT_ACCEPT: - return -1; - - /* Reached end of a branch; if it's a ket it is the end of a nested - call. If it's ALT it is an alternation in a nested call. If it is END it's - the end of the outer call. All can be handled by the same code. If an - ACCEPT was previously encountered, use the length that was in force at that - time, and pass back the shortest ACCEPT length. */ - - case OP_ALT: - case OP_KET: - case OP_KETRMAX: - case OP_KETRMIN: - case OP_KETRPOS: - case OP_END: - if (length < 0 || (!had_recurse && branchlength < length)) - length = branchlength; - if (op != OP_ALT) return length; - cc += 1 + LINK_SIZE; - branchlength = 0; - had_recurse = FALSE; - break; - - /* Skip over assertive subpatterns */ - - case OP_ASSERT: - case OP_ASSERT_NOT: - case OP_ASSERTBACK: - case OP_ASSERTBACK_NOT: - do cc += GET(cc, 1); while (*cc == OP_ALT); - /* Fall through */ - - /* Skip over things that don't match chars */ - - case OP_REVERSE: - case OP_CREF: - case OP_DNCREF: - case OP_RREF: - case OP_DNRREF: - case OP_DEF: - case OP_CALLOUT: - case OP_SOD: - case OP_SOM: - case OP_EOD: - case OP_EODN: - case OP_CIRC: - case OP_CIRCM: - case OP_DOLL: - case OP_DOLLM: - case OP_NOT_WORD_BOUNDARY: - case OP_WORD_BOUNDARY: - cc += PRIV(OP_lengths)[*cc]; - break; - - /* Skip over a subpattern that has a {0} or {0,x} quantifier */ - - case OP_BRAZERO: - case OP_BRAMINZERO: - case OP_BRAPOSZERO: - case OP_SKIPZERO: - cc += PRIV(OP_lengths)[*cc]; - do cc += GET(cc, 1); while (*cc == OP_ALT); - cc += 1 + LINK_SIZE; 
- break; - - /* Handle literal characters and + repetitions */ - - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_PLUS: - case OP_PLUSI: - case OP_MINPLUS: - case OP_MINPLUSI: - case OP_POSPLUS: - case OP_POSPLUSI: - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - branchlength++; - cc += 2; -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); -#endif - break; - - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEPOSPLUS: - branchlength++; - cc += (cc[1] == OP_PROP || cc[1] == OP_NOTPROP)? 4 : 2; - break; - - /* Handle exact repetitions. The count is already in characters, but we - need to skip over a multibyte character in UTF8 mode. */ - - case OP_EXACT: - case OP_EXACTI: - case OP_NOTEXACT: - case OP_NOTEXACTI: - branchlength += GET2(cc,1); - cc += 2 + IMM2_SIZE; -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); -#endif - break; - - case OP_TYPEEXACT: - branchlength += GET2(cc,1); - cc += 2 + IMM2_SIZE + ((cc[1 + IMM2_SIZE] == OP_PROP - || cc[1 + IMM2_SIZE] == OP_NOTPROP)? 2 : 0); - break; - - /* Handle single-char non-literal matchers */ - - case OP_PROP: - case OP_NOTPROP: - cc += 2; - /* Fall through */ - - case OP_NOT_DIGIT: - case OP_DIGIT: - case OP_NOT_WHITESPACE: - case OP_WHITESPACE: - case OP_NOT_WORDCHAR: - case OP_WORDCHAR: - case OP_ANY: - case OP_ALLANY: - case OP_EXTUNI: - case OP_HSPACE: - case OP_NOT_HSPACE: - case OP_VSPACE: - case OP_NOT_VSPACE: - branchlength++; - cc++; - break; - - /* "Any newline" might match two characters, but it also might match just - one. */ - - case OP_ANYNL: - branchlength += 1; - cc++; - break; - - /* The single-byte matcher means we can't proceed in UTF-8 mode. (In - non-UTF-8 mode \C will actually be turned into OP_ALLANY, so won't ever - appear, but leave the code, just in case.) 
*/ - - case OP_ANYBYTE: -#ifdef SUPPORT_UTF - if (utf) return -1; -#endif - branchlength++; - cc++; - break; - - /* For repeated character types, we have to test for \p and \P, which have - an extra two bytes of parameters. */ - - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - case OP_TYPEPOSSTAR: - case OP_TYPEPOSQUERY: - if (cc[1] == OP_PROP || cc[1] == OP_NOTPROP) cc += 2; - cc += PRIV(OP_lengths)[op]; - break; - - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - case OP_TYPEPOSUPTO: - if (cc[1 + IMM2_SIZE] == OP_PROP - || cc[1 + IMM2_SIZE] == OP_NOTPROP) cc += 2; - cc += PRIV(OP_lengths)[op]; - break; - - /* Check a class for variable quantification */ - - case OP_CLASS: - case OP_NCLASS: -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - case OP_XCLASS: - /* The original code caused an unsigned overflow in 64 bit systems, - so now we use a conditional statement. */ - if (op == OP_XCLASS) - cc += GET(cc, 1); - else - cc += PRIV(OP_lengths)[OP_CLASS]; -#else - cc += PRIV(OP_lengths)[OP_CLASS]; -#endif - - switch (*cc) - { - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRPOSPLUS: - branchlength++; - /* Fall through */ - - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSQUERY: - cc++; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - branchlength += GET2(cc,1); - cc += 1 + 2 * IMM2_SIZE; - break; - - default: - branchlength++; - break; - } - break; - - /* Backreferences and subroutine calls are treated in the same way: we find - the minimum length for the subpattern. A recursion, however, causes an - a flag to be set that causes the length of this branch to be ignored. The - logic is that a recursion can only make sense if there is another - alternation that stops the recursing. That will provide the minimum length - (when no recursion happens). 
A backreference within the group that it is - referencing behaves in the same way. - - If PCRE_JAVASCRIPT_COMPAT is set, a backreference to an unset bracket - matches an empty string (by default it causes a matching failure), so in - that case we must set the minimum length to zero. */ - - case OP_DNREF: /* Duplicate named pattern back reference */ - case OP_DNREFI: - if ((options & PCRE_JAVASCRIPT_COMPAT) == 0) - { - int count = GET2(cc, 1+IMM2_SIZE); - pcre_uchar *slot = (pcre_uchar *)re + - re->name_table_offset + GET2(cc, 1) * re->name_entry_size; - d = INT_MAX; - while (count-- > 0) - { - ce = cs = (pcre_uchar *)PRIV(find_bracket)(startcode, utf, GET2(slot, 0)); - if (cs == NULL) return -2; - do ce += GET(ce, 1); while (*ce == OP_ALT); - if (cc > cs && cc < ce) /* Simple recursion */ - { - d = 0; - had_recurse = TRUE; - break; - } - else - { - recurse_check *r = recurses; - for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break; - if (r != NULL) /* Mutual recursion */ - { - d = 0; - had_recurse = TRUE; - break; - } - else - { - int dd; - this_recurse.prev = recurses; - this_recurse.group = cs; - dd = find_minlength(re, cs, startcode, options, &this_recurse, - countptr); - if (dd < d) d = dd; - } - } - slot += re->name_entry_size; - } - } - else d = 0; - cc += 1 + 2*IMM2_SIZE; - goto REPEAT_BACK_REFERENCE; - - case OP_REF: /* Single back reference */ - case OP_REFI: - if ((options & PCRE_JAVASCRIPT_COMPAT) == 0) - { - ce = cs = (pcre_uchar *)PRIV(find_bracket)(startcode, utf, GET2(cc, 1)); - if (cs == NULL) return -2; - do ce += GET(ce, 1); while (*ce == OP_ALT); - if (cc > cs && cc < ce) /* Simple recursion */ - { - d = 0; - had_recurse = TRUE; - } - else - { - recurse_check *r = recurses; - for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break; - if (r != NULL) /* Mutual recursion */ - { - d = 0; - had_recurse = TRUE; - } - else - { - this_recurse.prev = recurses; - this_recurse.group = cs; - d = find_minlength(re, cs, startcode, 
options, &this_recurse, - countptr); - } - } - } - else d = 0; - cc += 1 + IMM2_SIZE; - - /* Handle repeated back references */ - - REPEAT_BACK_REFERENCE: - switch (*cc) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSQUERY: - min = 0; - cc++; - break; - - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRPOSPLUS: - min = 1; - cc++; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - min = GET2(cc, 1); - cc += 1 + 2 * IMM2_SIZE; - break; - - default: - min = 1; - break; - } - - branchlength += min * d; - break; - - /* We can easily detect direct recursion, but not mutual recursion. This is - caught by a recursion depth count. */ - - case OP_RECURSE: - cs = ce = (pcre_uchar *)startcode + GET(cc, 1); - do ce += GET(ce, 1); while (*ce == OP_ALT); - if (cc > cs && cc < ce) /* Simple recursion */ - had_recurse = TRUE; - else - { - recurse_check *r = recurses; - for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break; - if (r != NULL) /* Mutual recursion */ - had_recurse = TRUE; - else - { - this_recurse.prev = recurses; - this_recurse.group = cs; - branchlength += find_minlength(re, cs, startcode, options, - &this_recurse, countptr); - } - } - cc += 1 + LINK_SIZE; - break; - - /* Anything else does not or need not match a character. We can get the - item's length from the table, but for those that can match zero occurrences - of a character, we must take special action for UTF-8 characters. As it - happens, the "NOT" versions of these opcodes are used at present only for - ASCII characters, so they could be omitted from this list. However, in - future that may change, so we include them here so as not to leave a - gotcha for a future maintainer. 
*/ - - case OP_UPTO: - case OP_UPTOI: - case OP_NOTUPTO: - case OP_NOTUPTOI: - case OP_MINUPTO: - case OP_MINUPTOI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - case OP_POSUPTO: - case OP_POSUPTOI: - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - - case OP_STAR: - case OP_STARI: - case OP_NOTSTAR: - case OP_NOTSTARI: - case OP_MINSTAR: - case OP_MINSTARI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - case OP_POSSTAR: - case OP_POSSTARI: - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - - case OP_QUERY: - case OP_QUERYI: - case OP_NOTQUERY: - case OP_NOTQUERYI: - case OP_MINQUERY: - case OP_MINQUERYI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - case OP_POSQUERY: - case OP_POSQUERYI: - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - - cc += PRIV(OP_lengths)[op]; -#ifdef SUPPORT_UTF - if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); -#endif - break; - - /* Skip these, but we need to add in the name length. */ - - case OP_MARK: - case OP_PRUNE_ARG: - case OP_SKIP_ARG: - case OP_THEN_ARG: - cc += PRIV(OP_lengths)[op] + cc[1]; - break; - - /* The remaining opcodes are just skipped over. */ - - case OP_CLOSE: - case OP_COMMIT: - case OP_FAIL: - case OP_PRUNE: - case OP_SET_SOM: - case OP_SKIP: - case OP_THEN: - cc += PRIV(OP_lengths)[op]; - break; - - /* This should not occur: we list all opcodes explicitly so that when - new ones get added they are properly considered. */ - - default: - return -3; - } - } -/* Control never gets here */ -} - - - -/************************************************* -* Set a bit and maybe its alternate case * -*************************************************/ - -/* Given a character, set its first byte's bit in the table, and also the -corresponding bit for the other version of a letter if we are caseless. In -UTF-8 mode, for characters greater than 127, we can only do the caseless thing -when Unicode property support is available. 
- -Arguments: - start_bits points to the bit map - p points to the character - caseless the caseless flag - cd the block with char table pointers - utf TRUE for UTF-8 / UTF-16 / UTF-32 mode - -Returns: pointer after the character -*/ - -static const pcre_uchar * -set_table_bit(pcre_uint8 *start_bits, const pcre_uchar *p, BOOL caseless, - compile_data *cd, BOOL utf) -{ -pcre_uint32 c = *p; - -#ifdef COMPILE_PCRE8 -SET_BIT(c); - -#ifdef SUPPORT_UTF -if (utf && c > 127) - { - GETCHARINC(c, p); -#ifdef SUPPORT_UCP - if (caseless) - { - pcre_uchar buff[6]; - c = UCD_OTHERCASE(c); - (void)PRIV(ord2utf)(c, buff); - SET_BIT(buff[0]); - } -#endif /* Not SUPPORT_UCP */ - return p; - } -#else /* Not SUPPORT_UTF */ -(void)(utf); /* Stops warning for unused parameter */ -#endif /* SUPPORT_UTF */ - -/* Not UTF-8 mode, or character is less than 127. */ - -if (caseless && (cd->ctypes[c] & ctype_letter) != 0) SET_BIT(cd->fcc[c]); -return p + 1; -#endif /* COMPILE_PCRE8 */ - -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -if (c > 0xff) - { - c = 0xff; - caseless = FALSE; - } -SET_BIT(c); - -#ifdef SUPPORT_UTF -if (utf && c > 127) - { - GETCHARINC(c, p); -#ifdef SUPPORT_UCP - if (caseless) - { - c = UCD_OTHERCASE(c); - if (c > 0xff) - c = 0xff; - SET_BIT(c); - } -#endif /* SUPPORT_UCP */ - return p; - } -#else /* Not SUPPORT_UTF */ -(void)(utf); /* Stops warning for unused parameter */ -#endif /* SUPPORT_UTF */ - -if (caseless && (cd->ctypes[c] & ctype_letter) != 0) SET_BIT(cd->fcc[c]); -return p + 1; -#endif -} - - - -/************************************************* -* Set bits for a positive character type * -*************************************************/ - -/* This function sets starting bits for a character type. In UTF-8 mode, we can -only do a direct setting for bytes less than 128, as otherwise there can be -confusion with bytes in the middle of UTF-8 characters. 
In a "traditional" -environment, the tables will only recognize ASCII characters anyway, but in at -least one Windows environment, some higher bytes bits were set in the tables. -So we deal with that case by considering the UTF-8 encoding. - -Arguments: - start_bits the starting bitmap - cbit type the type of character wanted - table_limit 32 for non-UTF-8; 16 for UTF-8 - cd the block with char table pointers - -Returns: nothing -*/ - -static void -set_type_bits(pcre_uint8 *start_bits, int cbit_type, unsigned int table_limit, - compile_data *cd) -{ -register pcre_uint32 c; -for (c = 0; c < table_limit; c++) start_bits[c] |= cd->cbits[c+cbit_type]; -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 -if (table_limit == 32) return; -for (c = 128; c < 256; c++) - { - if ((cd->cbits[c/8] & (1 << (c&7))) != 0) - { - pcre_uchar buff[6]; - (void)PRIV(ord2utf)(c, buff); - SET_BIT(buff[0]); - } - } -#endif -} - - -/************************************************* -* Set bits for a negative character type * -*************************************************/ - -/* This function sets starting bits for a negative character type such as \D. -In UTF-8 mode, we can only do a direct setting for bytes less than 128, as -otherwise there can be confusion with bytes in the middle of UTF-8 characters. -Unlike in the positive case, where we can set appropriate starting bits for -specific high-valued UTF-8 characters, in this case we have to set the bits for -all high-valued characters. The lowest is 0xc2, but we overkill by starting at -0xc0 (192) for simplicity. 
- -Arguments: - start_bits the starting bitmap - cbit type the type of character wanted - table_limit 32 for non-UTF-8; 16 for UTF-8 - cd the block with char table pointers - -Returns: nothing -*/ - -static void -set_nottype_bits(pcre_uint8 *start_bits, int cbit_type, unsigned int table_limit, - compile_data *cd) -{ -register pcre_uint32 c; -for (c = 0; c < table_limit; c++) start_bits[c] |= ~cd->cbits[c+cbit_type]; -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 -if (table_limit != 32) for (c = 24; c < 32; c++) start_bits[c] = 0xff; -#endif -} - - - -/************************************************* -* Create bitmap of starting bytes * -*************************************************/ - -/* This function scans a compiled unanchored expression recursively and -attempts to build a bitmap of the set of possible starting bytes. As time goes -by, we may be able to get more clever at doing this. The SSB_CONTINUE return is -useful for parenthesized groups in patterns such as (a*)b where the group -provides some optional starting bytes but scanning must continue at the outer -level to find at least one mandatory byte. At the outermost level, this -function fails unless the result is SSB_DONE. - -Arguments: - code points to an expression - start_bits points to a 32-byte table, initialized to 0 - utf TRUE if in UTF-8 / UTF-16 / UTF-32 mode - cd the block with char table pointers - -Returns: SSB_FAIL => Failed to find any starting bytes - SSB_DONE => Found mandatory starting bytes - SSB_CONTINUE => Found optional starting bytes - SSB_UNKNOWN => Hit an unrecognized opcode -*/ - -static int -set_start_bits(const pcre_uchar *code, pcre_uint8 *start_bits, BOOL utf, - compile_data *cd) -{ -register pcre_uint32 c; -int yield = SSB_DONE; -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 -int table_limit = utf? 
16:32; -#else -int table_limit = 32; -#endif - -#if 0 -/* ========================================================================= */ -/* The following comment and code was inserted in January 1999. In May 2006, -when it was observed to cause compiler warnings about unused values, I took it -out again. If anybody is still using OS/2, they will have to put it back -manually. */ - -/* This next statement and the later reference to dummy are here in order to -trick the optimizer of the IBM C compiler for OS/2 into generating correct -code. Apparently IBM isn't going to fix the problem, and we would rather not -disable optimization (in this module it actually makes a big difference, and -the pcre module can use all the optimization it can get). */ - -volatile int dummy; -/* ========================================================================= */ -#endif - -do - { - BOOL try_next = TRUE; - const pcre_uchar *tcode = code + 1 + LINK_SIZE; - - if (*code == OP_CBRA || *code == OP_SCBRA || - *code == OP_CBRAPOS || *code == OP_SCBRAPOS) tcode += IMM2_SIZE; - - while (try_next) /* Loop for items in this branch */ - { - int rc; - - switch(*tcode) - { - /* If we reach something we don't understand, it means a new opcode has - been created that hasn't been added to this code. Hopefully this problem - will be discovered during testing. */ - - default: - return SSB_UNKNOWN; - - /* Fail for a valid opcode that implies no starting bits. 
*/ - - case OP_ACCEPT: - case OP_ASSERT_ACCEPT: - case OP_ALLANY: - case OP_ANY: - case OP_ANYBYTE: - case OP_CIRC: - case OP_CIRCM: - case OP_CLOSE: - case OP_COMMIT: - case OP_COND: - case OP_CREF: - case OP_DEF: - case OP_DNCREF: - case OP_DNREF: - case OP_DNREFI: - case OP_DNRREF: - case OP_DOLL: - case OP_DOLLM: - case OP_END: - case OP_EOD: - case OP_EODN: - case OP_EXTUNI: - case OP_FAIL: - case OP_MARK: - case OP_NOT: - case OP_NOTEXACT: - case OP_NOTEXACTI: - case OP_NOTI: - case OP_NOTMINPLUS: - case OP_NOTMINPLUSI: - case OP_NOTMINQUERY: - case OP_NOTMINQUERYI: - case OP_NOTMINSTAR: - case OP_NOTMINSTARI: - case OP_NOTMINUPTO: - case OP_NOTMINUPTOI: - case OP_NOTPLUS: - case OP_NOTPLUSI: - case OP_NOTPOSPLUS: - case OP_NOTPOSPLUSI: - case OP_NOTPOSQUERY: - case OP_NOTPOSQUERYI: - case OP_NOTPOSSTAR: - case OP_NOTPOSSTARI: - case OP_NOTPOSUPTO: - case OP_NOTPOSUPTOI: - case OP_NOTPROP: - case OP_NOTQUERY: - case OP_NOTQUERYI: - case OP_NOTSTAR: - case OP_NOTSTARI: - case OP_NOTUPTO: - case OP_NOTUPTOI: - case OP_NOT_HSPACE: - case OP_NOT_VSPACE: - case OP_PRUNE: - case OP_PRUNE_ARG: - case OP_RECURSE: - case OP_REF: - case OP_REFI: - case OP_REVERSE: - case OP_RREF: - case OP_SCOND: - case OP_SET_SOM: - case OP_SKIP: - case OP_SKIP_ARG: - case OP_SOD: - case OP_SOM: - case OP_THEN: - case OP_THEN_ARG: - return SSB_FAIL; - - /* A "real" property test implies no starting bits, but the fake property - PT_CLIST identifies a list of characters. These lists are short, as they - are used for characters with more than one "other case", so there is no - point in recognizing them for OP_NOTPROP. 
*/ - - case OP_PROP: - if (tcode[1] != PT_CLIST) return SSB_FAIL; - { - const pcre_uint32 *p = PRIV(ucd_caseless_sets) + tcode[2]; - while ((c = *p++) < NOTACHAR) - { -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 - if (utf) - { - pcre_uchar buff[6]; - (void)PRIV(ord2utf)(c, buff); - c = buff[0]; - } -#endif - if (c > 0xff) SET_BIT(0xff); else SET_BIT(c); - } - } - try_next = FALSE; - break; - - /* We can ignore word boundary tests. */ - - case OP_WORD_BOUNDARY: - case OP_NOT_WORD_BOUNDARY: - tcode++; - break; - - /* If we hit a bracket or a positive lookahead assertion, recurse to set - bits from within the subpattern. If it can't find anything, we have to - give up. If it finds some mandatory character(s), we are done for this - branch. Otherwise, carry on scanning after the subpattern. */ - - case OP_BRA: - case OP_SBRA: - case OP_CBRA: - case OP_SCBRA: - case OP_BRAPOS: - case OP_SBRAPOS: - case OP_CBRAPOS: - case OP_SCBRAPOS: - case OP_ONCE: - case OP_ONCE_NC: - case OP_ASSERT: - rc = set_start_bits(tcode, start_bits, utf, cd); - if (rc == SSB_FAIL || rc == SSB_UNKNOWN) return rc; - if (rc == SSB_DONE) try_next = FALSE; else - { - do tcode += GET(tcode, 1); while (*tcode == OP_ALT); - tcode += 1 + LINK_SIZE; - } - break; - - /* If we hit ALT or KET, it means we haven't found anything mandatory in - this branch, though we might have found something optional. For ALT, we - continue with the next alternative, but we have to arrange that the final - result from subpattern is SSB_CONTINUE rather than SSB_DONE. For KET, - return SSB_CONTINUE: if this is the top level, that indicates failure, - but after a nested subpattern, it causes scanning to continue. 
*/ - - case OP_ALT: - yield = SSB_CONTINUE; - try_next = FALSE; - break; - - case OP_KET: - case OP_KETRMAX: - case OP_KETRMIN: - case OP_KETRPOS: - return SSB_CONTINUE; - - /* Skip over callout */ - - case OP_CALLOUT: - tcode += 2 + 2*LINK_SIZE; - break; - - /* Skip over lookbehind and negative lookahead assertions */ - - case OP_ASSERT_NOT: - case OP_ASSERTBACK: - case OP_ASSERTBACK_NOT: - do tcode += GET(tcode, 1); while (*tcode == OP_ALT); - tcode += 1 + LINK_SIZE; - break; - - /* BRAZERO does the bracket, but carries on. */ - - case OP_BRAZERO: - case OP_BRAMINZERO: - case OP_BRAPOSZERO: - rc = set_start_bits(++tcode, start_bits, utf, cd); - if (rc == SSB_FAIL || rc == SSB_UNKNOWN) return rc; -/* ========================================================================= - See the comment at the head of this function concerning the next line, - which was an old fudge for the benefit of OS/2. - dummy = 1; - ========================================================================= */ - do tcode += GET(tcode,1); while (*tcode == OP_ALT); - tcode += 1 + LINK_SIZE; - break; - - /* SKIPZERO skips the bracket. */ - - case OP_SKIPZERO: - tcode++; - do tcode += GET(tcode,1); while (*tcode == OP_ALT); - tcode += 1 + LINK_SIZE; - break; - - /* Single-char * or ? 
sets the bit and tries the next item */ - - case OP_STAR: - case OP_MINSTAR: - case OP_POSSTAR: - case OP_QUERY: - case OP_MINQUERY: - case OP_POSQUERY: - tcode = set_table_bit(start_bits, tcode + 1, FALSE, cd, utf); - break; - - case OP_STARI: - case OP_MINSTARI: - case OP_POSSTARI: - case OP_QUERYI: - case OP_MINQUERYI: - case OP_POSQUERYI: - tcode = set_table_bit(start_bits, tcode + 1, TRUE, cd, utf); - break; - - /* Single-char upto sets the bit and tries the next */ - - case OP_UPTO: - case OP_MINUPTO: - case OP_POSUPTO: - tcode = set_table_bit(start_bits, tcode + 1 + IMM2_SIZE, FALSE, cd, utf); - break; - - case OP_UPTOI: - case OP_MINUPTOI: - case OP_POSUPTOI: - tcode = set_table_bit(start_bits, tcode + 1 + IMM2_SIZE, TRUE, cd, utf); - break; - - /* At least one single char sets the bit and stops */ - - case OP_EXACT: - tcode += IMM2_SIZE; - /* Fall through */ - case OP_CHAR: - case OP_PLUS: - case OP_MINPLUS: - case OP_POSPLUS: - (void)set_table_bit(start_bits, tcode + 1, FALSE, cd, utf); - try_next = FALSE; - break; - - case OP_EXACTI: - tcode += IMM2_SIZE; - /* Fall through */ - case OP_CHARI: - case OP_PLUSI: - case OP_MINPLUSI: - case OP_POSPLUSI: - (void)set_table_bit(start_bits, tcode + 1, TRUE, cd, utf); - try_next = FALSE; - break; - - /* Special spacing and line-terminating items. These recognize specific - lists of characters. The difference between VSPACE and ANYNL is that the - latter can match the two-character CRLF sequence, but that is not - relevant for finding the first character, so their code here is - identical. 
*/ - - case OP_HSPACE: - SET_BIT(CHAR_HT); - SET_BIT(CHAR_SPACE); -#ifdef SUPPORT_UTF - if (utf) - { -#ifdef COMPILE_PCRE8 - SET_BIT(0xC2); /* For U+00A0 */ - SET_BIT(0xE1); /* For U+1680, U+180E */ - SET_BIT(0xE2); /* For U+2000 - U+200A, U+202F, U+205F */ - SET_BIT(0xE3); /* For U+3000 */ -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SET_BIT(0xA0); - SET_BIT(0xFF); /* For characters > 255 */ -#endif /* COMPILE_PCRE[8|16|32] */ - } - else -#endif /* SUPPORT_UTF */ - { -#ifndef EBCDIC - SET_BIT(0xA0); -#endif /* Not EBCDIC */ -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SET_BIT(0xFF); /* For characters > 255 */ -#endif /* COMPILE_PCRE[16|32] */ - } - try_next = FALSE; - break; - - case OP_ANYNL: - case OP_VSPACE: - SET_BIT(CHAR_LF); - SET_BIT(CHAR_VT); - SET_BIT(CHAR_FF); - SET_BIT(CHAR_CR); -#ifdef SUPPORT_UTF - if (utf) - { -#ifdef COMPILE_PCRE8 - SET_BIT(0xC2); /* For U+0085 */ - SET_BIT(0xE2); /* For U+2028, U+2029 */ -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SET_BIT(CHAR_NEL); - SET_BIT(0xFF); /* For characters > 255 */ -#endif /* COMPILE_PCRE[8|16|32] */ - } - else -#endif /* SUPPORT_UTF */ - { - SET_BIT(CHAR_NEL); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SET_BIT(0xFF); /* For characters > 255 */ -#endif - } - try_next = FALSE; - break; - - /* Single character types set the bits and stop. Note that if PCRE_UCP - is set, we do not see these op codes because \d etc are converted to - properties. Therefore, these apply in the case when only characters less - than 256 are recognized to match the types. */ - - case OP_NOT_DIGIT: - set_nottype_bits(start_bits, cbit_digit, table_limit, cd); - try_next = FALSE; - break; - - case OP_DIGIT: - set_type_bits(start_bits, cbit_digit, table_limit, cd); - try_next = FALSE; - break; - - /* The cbit_space table has vertical tab as whitespace; we no longer - have to play fancy tricks because Perl added VT to its whitespace at - release 5.18. PCRE added it at release 8.34. 
*/ - - case OP_NOT_WHITESPACE: - set_nottype_bits(start_bits, cbit_space, table_limit, cd); - try_next = FALSE; - break; - - case OP_WHITESPACE: - set_type_bits(start_bits, cbit_space, table_limit, cd); - try_next = FALSE; - break; - - case OP_NOT_WORDCHAR: - set_nottype_bits(start_bits, cbit_word, table_limit, cd); - try_next = FALSE; - break; - - case OP_WORDCHAR: - set_type_bits(start_bits, cbit_word, table_limit, cd); - try_next = FALSE; - break; - - /* One or more character type fudges the pointer and restarts, knowing - it will hit a single character type and stop there. */ - - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEPOSPLUS: - tcode++; - break; - - case OP_TYPEEXACT: - tcode += 1 + IMM2_SIZE; - break; - - /* Zero or more repeats of character types set the bits and then - try again. */ - - case OP_TYPEUPTO: - case OP_TYPEMINUPTO: - case OP_TYPEPOSUPTO: - tcode += IMM2_SIZE; /* Fall through */ - - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPOSSTAR: - case OP_TYPEQUERY: - case OP_TYPEMINQUERY: - case OP_TYPEPOSQUERY: - switch(tcode[1]) - { - default: - case OP_ANY: - case OP_ALLANY: - return SSB_FAIL; - - case OP_HSPACE: - SET_BIT(CHAR_HT); - SET_BIT(CHAR_SPACE); -#ifdef SUPPORT_UTF - if (utf) - { -#ifdef COMPILE_PCRE8 - SET_BIT(0xC2); /* For U+00A0 */ - SET_BIT(0xE1); /* For U+1680, U+180E */ - SET_BIT(0xE2); /* For U+2000 - U+200A, U+202F, U+205F */ - SET_BIT(0xE3); /* For U+3000 */ -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SET_BIT(0xA0); - SET_BIT(0xFF); /* For characters > 255 */ -#endif /* COMPILE_PCRE[8|16|32] */ - } - else -#endif /* SUPPORT_UTF */ -#ifndef EBCDIC - SET_BIT(0xA0); -#endif /* Not EBCDIC */ - break; - - case OP_ANYNL: - case OP_VSPACE: - SET_BIT(CHAR_LF); - SET_BIT(CHAR_VT); - SET_BIT(CHAR_FF); - SET_BIT(CHAR_CR); -#ifdef SUPPORT_UTF - if (utf) - { -#ifdef COMPILE_PCRE8 - SET_BIT(0xC2); /* For U+0085 */ - SET_BIT(0xE2); /* For U+2028, U+2029 */ -#elif defined COMPILE_PCRE16 || defined 
COMPILE_PCRE32 - SET_BIT(CHAR_NEL); - SET_BIT(0xFF); /* For characters > 255 */ -#endif /* COMPILE_PCRE16 */ - } - else -#endif /* SUPPORT_UTF */ - SET_BIT(CHAR_NEL); - break; - - case OP_NOT_DIGIT: - set_nottype_bits(start_bits, cbit_digit, table_limit, cd); - break; - - case OP_DIGIT: - set_type_bits(start_bits, cbit_digit, table_limit, cd); - break; - - /* The cbit_space table has vertical tab as whitespace; we no longer - have to play fancy tricks because Perl added VT to its whitespace at - release 5.18. PCRE added it at release 8.34. */ - - case OP_NOT_WHITESPACE: - set_nottype_bits(start_bits, cbit_space, table_limit, cd); - break; - - case OP_WHITESPACE: - set_type_bits(start_bits, cbit_space, table_limit, cd); - break; - - case OP_NOT_WORDCHAR: - set_nottype_bits(start_bits, cbit_word, table_limit, cd); - break; - - case OP_WORDCHAR: - set_type_bits(start_bits, cbit_word, table_limit, cd); - break; - } - - tcode += 2; - break; - - /* Character class where all the information is in a bit map: set the - bits and either carry on or not, according to the repeat count. If it was - a negative class, and we are operating with UTF-8 characters, any byte - with a value >= 0xc4 is a potentially valid starter because it starts a - character with a value > 255. */ - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - if ((tcode[1 + LINK_SIZE] & XCL_HASPROP) != 0) - return SSB_FAIL; - /* All bits are set. 
*/ - if ((tcode[1 + LINK_SIZE] & XCL_MAP) == 0 && (tcode[1 + LINK_SIZE] & XCL_NOT) != 0) - return SSB_FAIL; -#endif - /* Fall through */ - - case OP_NCLASS: -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 - if (utf) - { - start_bits[24] |= 0xf0; /* Bits for 0xc4 - 0xc8 */ - memset(start_bits+25, 0xff, 7); /* Bits for 0xc9 - 0xff */ - } -#endif -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SET_BIT(0xFF); /* For characters > 255 */ -#endif - /* Fall through */ - - case OP_CLASS: - { - pcre_uint8 *map; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - map = NULL; - if (*tcode == OP_XCLASS) - { - if ((tcode[1 + LINK_SIZE] & XCL_MAP) != 0) - map = (pcre_uint8 *)(tcode + 1 + LINK_SIZE + 1); - tcode += GET(tcode, 1); - } - else -#endif - { - tcode++; - map = (pcre_uint8 *)tcode; - tcode += 32 / sizeof(pcre_uchar); - } - - /* In UTF-8 mode, the bits in a bit map correspond to character - values, not to byte values. However, the bit map we are constructing is - for byte values. So we have to do a conversion for characters whose - value is > 127. In fact, there are only two possible starting bytes for - characters in the range 128 - 255. */ - -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - if (map != NULL) -#endif - { -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 - if (utf) - { - for (c = 0; c < 16; c++) start_bits[c] |= map[c]; - for (c = 128; c < 256; c++) - { - if ((map[c/8] & (1 << (c&7))) != 0) - { - int d = (c >> 6) | 0xc0; /* Set bit for this starter */ - start_bits[d/8] |= (1 << (d&7)); /* and then skip on to the */ - c = (c & 0xc0) + 0x40 - 1; /* next relevant character. */ - } - } - } - else -#endif - { - /* In non-UTF-8 mode, the two bit maps are completely compatible. */ - for (c = 0; c < 32; c++) start_bits[c] |= map[c]; - } - } - - /* Advance past the bit map, and act on what follows. For a zero - minimum repeat, continue; otherwise stop processing. 
*/ - - switch (*tcode) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRQUERY: - case OP_CRMINQUERY: - case OP_CRPOSSTAR: - case OP_CRPOSQUERY: - tcode++; - break; - - case OP_CRRANGE: - case OP_CRMINRANGE: - case OP_CRPOSRANGE: - if (GET2(tcode, 1) == 0) tcode += 1 + 2 * IMM2_SIZE; - else try_next = FALSE; - break; - - default: - try_next = FALSE; - break; - } - } - break; /* End of bitmap class handling */ - - } /* End of switch */ - } /* End of try_next loop */ - - code += GET(code, 1); /* Advance to next branch */ - } -while (*code == OP_ALT); -return yield; -} - - - - - -/************************************************* -* Study a compiled expression * -*************************************************/ - -/* This function is handed a compiled expression that it must study to produce -information that will speed up the matching. It returns a pcre[16]_extra block -which then gets handed back to pcre_exec(). - -Arguments: - re points to the compiled expression - options contains option bits - errorptr points to where to place error messages; - set NULL unless error - -Returns: pointer to a pcre[16]_extra block, with study_data filled in and - the appropriate flags set; - NULL on error or if no optimization possible -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN pcre_extra * PCRE_CALL_CONVENTION -pcre_study(const pcre *external_re, int options, const char **errorptr) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN pcre16_extra * PCRE_CALL_CONVENTION -pcre16_study(const pcre16 *external_re, int options, const char **errorptr) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN pcre32_extra * PCRE_CALL_CONVENTION -pcre32_study(const pcre32 *external_re, int options, const char **errorptr) -#endif -{ -int min; -int count = 0; -BOOL bits_set = FALSE; -pcre_uint8 start_bits[32]; -PUBL(extra) *extra = NULL; -pcre_study_data *study; -const pcre_uint8 *tables; -pcre_uchar *code; -compile_data compile_block; -const REAL_PCRE *re = (const REAL_PCRE *)external_re; - - 
-*errorptr = NULL; - -if (re == NULL || re->magic_number != MAGIC_NUMBER) - { - *errorptr = "argument is not a compiled regular expression"; - return NULL; - } - -if ((re->flags & PCRE_MODE) == 0) - { -#if defined COMPILE_PCRE8 - *errorptr = "argument not compiled in 8 bit mode"; -#elif defined COMPILE_PCRE16 - *errorptr = "argument not compiled in 16 bit mode"; -#elif defined COMPILE_PCRE32 - *errorptr = "argument not compiled in 32 bit mode"; -#endif - return NULL; - } - -if ((options & ~PUBLIC_STUDY_OPTIONS) != 0) - { - *errorptr = "unknown or incorrect option bit(s) set"; - return NULL; - } - -code = (pcre_uchar *)re + re->name_table_offset + - (re->name_count * re->name_entry_size); - -/* For an anchored pattern, or an unanchored pattern that has a first char, or -a multiline pattern that matches only at "line starts", there is no point in -seeking a list of starting bytes. */ - -if ((re->options & PCRE_ANCHORED) == 0 && - (re->flags & (PCRE_FIRSTSET|PCRE_STARTLINE)) == 0) - { - int rc; - - /* Set the character tables in the block that is passed around */ - - tables = re->tables; - -#if defined COMPILE_PCRE8 - if (tables == NULL) - (void)pcre_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES, - (void *)(&tables)); -#elif defined COMPILE_PCRE16 - if (tables == NULL) - (void)pcre16_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES, - (void *)(&tables)); -#elif defined COMPILE_PCRE32 - if (tables == NULL) - (void)pcre32_fullinfo(external_re, NULL, PCRE_INFO_DEFAULT_TABLES, - (void *)(&tables)); -#endif - - compile_block.lcc = tables + lcc_offset; - compile_block.fcc = tables + fcc_offset; - compile_block.cbits = tables + cbits_offset; - compile_block.ctypes = tables + ctypes_offset; - - /* See if we can find a fixed set of initial characters for the pattern. 
*/ - - memset(start_bits, 0, 32 * sizeof(pcre_uint8)); - rc = set_start_bits(code, start_bits, (re->options & PCRE_UTF8) != 0, - &compile_block); - bits_set = rc == SSB_DONE; - if (rc == SSB_UNKNOWN) - { - *errorptr = "internal error: opcode not recognized"; - return NULL; - } - } - -/* Find the minimum length of subject string. */ - -switch(min = find_minlength(re, code, code, re->options, NULL, &count)) - { - case -2: *errorptr = "internal error: missing capturing bracket"; return NULL; - case -3: *errorptr = "internal error: opcode not recognized"; return NULL; - default: break; - } - -/* If a set of starting bytes has been identified, or if the minimum length is -greater than zero, or if JIT optimization has been requested, or if -PCRE_STUDY_EXTRA_NEEDED is set, get a pcre[16]_extra block and a -pcre_study_data block. The study data is put in the latter, which is pointed to -by the former, which may also get additional data set later by the calling -program. At the moment, the size of pcre_study_data is fixed. We nevertheless -save it in a field for returning via the pcre_fullinfo() function so that if it -becomes variable in the future, we don't have to change that code. */ - -if (bits_set || min > 0 || (options & ( -#ifdef SUPPORT_JIT - PCRE_STUDY_JIT_COMPILE | PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE | - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE | -#endif - PCRE_STUDY_EXTRA_NEEDED)) != 0) - { - extra = (PUBL(extra) *)(PUBL(malloc)) - (sizeof(PUBL(extra)) + sizeof(pcre_study_data)); - if (extra == NULL) - { - *errorptr = "failed to get memory"; - return NULL; - } - - study = (pcre_study_data *)((char *)extra + sizeof(PUBL(extra))); - extra->flags = PCRE_EXTRA_STUDY_DATA; - extra->study_data = study; - - study->size = sizeof(pcre_study_data); - study->flags = 0; - - /* Set the start bits always, to avoid unset memory errors if the - study data is written to a file, but set the flag only if any of the bits - are set, to save time looking when none are. 
*/ - - if (bits_set) - { - study->flags |= PCRE_STUDY_MAPPED; - memcpy(study->start_bits, start_bits, sizeof(start_bits)); - } - else memset(study->start_bits, 0, 32 * sizeof(pcre_uint8)); - -#ifdef PCRE_DEBUG - if (bits_set) - { - pcre_uint8 *ptr = start_bits; - int i; - - printf("Start bits:\n"); - for (i = 0; i < 32; i++) - printf("%3d: %02x%s", i * 8, *ptr++, ((i + 1) & 0x7) != 0? " " : "\n"); - } -#endif - - /* Always set the minlength value in the block, because the JIT compiler - makes use of it. However, don't set the bit unless the length is greater than - zero - the interpretive pcre_exec() and pcre_dfa_exec() needn't waste time - checking the zero case. */ - - if (min > 0) - { - study->flags |= PCRE_STUDY_MINLEN; - study->minlength = min; - } - else study->minlength = 0; - - /* If JIT support was compiled and requested, attempt the JIT compilation. - If no starting bytes were found, and the minimum length is zero, and JIT - compilation fails, abandon the extra block and return NULL, unless - PCRE_STUDY_EXTRA_NEEDED is set. */ - -#ifdef SUPPORT_JIT - extra->executable_jit = NULL; - if ((options & PCRE_STUDY_JIT_COMPILE) != 0) - PRIV(jit_compile)(re, extra, JIT_COMPILE); - if ((options & PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE) != 0) - PRIV(jit_compile)(re, extra, JIT_PARTIAL_SOFT_COMPILE); - if ((options & PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE) != 0) - PRIV(jit_compile)(re, extra, JIT_PARTIAL_HARD_COMPILE); - - if (study->flags == 0 && (extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) == 0 && - (options & PCRE_STUDY_EXTRA_NEEDED) == 0) - { -#if defined COMPILE_PCRE8 - pcre_free_study(extra); -#elif defined COMPILE_PCRE16 - pcre16_free_study(extra); -#elif defined COMPILE_PCRE32 - pcre32_free_study(extra); -#endif - extra = NULL; - } -#endif - } - -return extra; -} - - -/************************************************* -* Free the study data * -*************************************************/ - -/* This function frees the memory that was obtained by pcre_study(). 
- -Argument: a pointer to the pcre[16]_extra block -Returns: nothing -*/ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN void -pcre_free_study(pcre_extra *extra) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN void -pcre16_free_study(pcre16_extra *extra) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN void -pcre32_free_study(pcre32_extra *extra) -#endif -{ -if (extra == NULL) - return; -#ifdef SUPPORT_JIT -if ((extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 && - extra->executable_jit != NULL) - PRIV(jit_free)(extra->executable_jit); -#endif -PUBL(free)(extra); -} - -/* End of pcre_study.c */ diff --git a/src/pcre/pcre_ucd.c b/src/pcre/pcre_ucd.c deleted file mode 100644 index f22f826c..00000000 --- a/src/pcre/pcre_ucd.c +++ /dev/null @@ -1,3644 +0,0 @@ -/* This module is generated by the maint/MultiStage2.py script. -Do not modify it by hand. Instead modify the script and run it -to regenerate this code. - -As well as being part of the PCRE library, this module is #included -by the pcretest program, which redefines the PRIV macro to change -table names from _pcre_xxx to xxxx, thereby avoiding name clashes -with the library. At present, just one of these tables is actually -needed. */ - -#ifndef PCRE_INCLUDED - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - -#endif /* PCRE_INCLUDED */ - -/* Unicode character database. */ -/* This file was autogenerated by the MultiStage2.py script. */ -/* Total size: 72576 bytes, block size: 128. */ - -/* The tables herein are needed only when UCP support is built -into PCRE. This module should not be referenced otherwise, so -it should not matter whether it is compiled or not. However -a comment was received about space saving - maybe the guy linked -all the modules rather than using a library - so we include a -condition to cut out the tables when not needed. But don't leave -a totally empty module because some compilers barf at that. -Instead, just supply small dummy tables. 
*/ - -#ifndef SUPPORT_UCP -const ucd_record PRIV(ucd_records)[] = {{0,0,0,0,0 }}; -const pcre_uint8 PRIV(ucd_stage1)[] = {0}; -const pcre_uint16 PRIV(ucd_stage2)[] = {0}; -const pcre_uint32 PRIV(ucd_caseless_sets)[] = {0}; -#else - -/* If the 32-bit library is run in non-32-bit mode, character values -greater than 0x10ffff may be encountered. For these we set up a -special record. */ - -#ifdef COMPILE_PCRE32 -const ucd_record PRIV(dummy_ucd_record)[] = {{ - ucp_Common, /* script */ - ucp_Cn, /* type unassigned */ - ucp_gbOther, /* grapheme break property */ - 0, /* case set */ - 0, /* other case */ - }}; -#endif - -/* When recompiling tables with a new Unicode version, please check the -types in this structure definition from pcre_internal.h (the actual -field names will be different): - -typedef struct { -pcre_uint8 property_0; -pcre_uint8 property_1; -pcre_uint8 property_2; -pcre_uint8 property_3; -pcre_int32 property_4; -} ucd_record; -*/ - - -const pcre_uint32 PRIV(ucd_caseless_sets)[] = { - NOTACHAR, - 0x0053, 0x0073, 0x017f, NOTACHAR, - 0x01c4, 0x01c5, 0x01c6, NOTACHAR, - 0x01c7, 0x01c8, 0x01c9, NOTACHAR, - 0x01ca, 0x01cb, 0x01cc, NOTACHAR, - 0x01f1, 0x01f2, 0x01f3, NOTACHAR, - 0x0345, 0x0399, 0x03b9, 0x1fbe, NOTACHAR, - 0x00b5, 0x039c, 0x03bc, NOTACHAR, - 0x03a3, 0x03c2, 0x03c3, NOTACHAR, - 0x0392, 0x03b2, 0x03d0, NOTACHAR, - 0x0398, 0x03b8, 0x03d1, 0x03f4, NOTACHAR, - 0x03a6, 0x03c6, 0x03d5, NOTACHAR, - 0x03a0, 0x03c0, 0x03d6, NOTACHAR, - 0x039a, 0x03ba, 0x03f0, NOTACHAR, - 0x03a1, 0x03c1, 0x03f1, NOTACHAR, - 0x0395, 0x03b5, 0x03f5, NOTACHAR, - 0x1e60, 0x1e61, 0x1e9b, NOTACHAR, - 0x03a9, 0x03c9, 0x2126, NOTACHAR, - 0x004b, 0x006b, 0x212a, NOTACHAR, - 0x00c5, 0x00e5, 0x212b, NOTACHAR, -}; - -/* When #included in pcretest, we don't need this large table. 
*/ - -#ifndef PCRE_INCLUDED - -const ucd_record PRIV(ucd_records)[] = { /* 5760 bytes, record size 8 */ - { 9, 0, 2, 0, 0, }, /* 0 */ - { 9, 0, 1, 0, 0, }, /* 1 */ - { 9, 0, 0, 0, 0, }, /* 2 */ - { 9, 29, 12, 0, 0, }, /* 3 */ - { 9, 21, 12, 0, 0, }, /* 4 */ - { 9, 23, 12, 0, 0, }, /* 5 */ - { 9, 22, 12, 0, 0, }, /* 6 */ - { 9, 18, 12, 0, 0, }, /* 7 */ - { 9, 25, 12, 0, 0, }, /* 8 */ - { 9, 17, 12, 0, 0, }, /* 9 */ - { 9, 13, 12, 0, 0, }, /* 10 */ - { 33, 9, 12, 0, 32, }, /* 11 */ - { 33, 9, 12, 71, 32, }, /* 12 */ - { 33, 9, 12, 1, 32, }, /* 13 */ - { 9, 24, 12, 0, 0, }, /* 14 */ - { 9, 16, 12, 0, 0, }, /* 15 */ - { 33, 5, 12, 0, -32, }, /* 16 */ - { 33, 5, 12, 71, -32, }, /* 17 */ - { 33, 5, 12, 1, -32, }, /* 18 */ - { 9, 26, 12, 0, 0, }, /* 19 */ - { 33, 7, 12, 0, 0, }, /* 20 */ - { 9, 20, 12, 0, 0, }, /* 21 */ - { 9, 1, 2, 0, 0, }, /* 22 */ - { 9, 15, 12, 0, 0, }, /* 23 */ - { 9, 5, 12, 26, 775, }, /* 24 */ - { 9, 19, 12, 0, 0, }, /* 25 */ - { 33, 9, 12, 75, 32, }, /* 26 */ - { 33, 5, 12, 0, 7615, }, /* 27 */ - { 33, 5, 12, 75, -32, }, /* 28 */ - { 33, 5, 12, 0, 121, }, /* 29 */ - { 33, 9, 12, 0, 1, }, /* 30 */ - { 33, 5, 12, 0, -1, }, /* 31 */ - { 33, 9, 12, 0, 0, }, /* 32 */ - { 33, 5, 12, 0, 0, }, /* 33 */ - { 33, 9, 12, 0, -121, }, /* 34 */ - { 33, 5, 12, 1, -268, }, /* 35 */ - { 33, 5, 12, 0, 195, }, /* 36 */ - { 33, 9, 12, 0, 210, }, /* 37 */ - { 33, 9, 12, 0, 206, }, /* 38 */ - { 33, 9, 12, 0, 205, }, /* 39 */ - { 33, 9, 12, 0, 79, }, /* 40 */ - { 33, 9, 12, 0, 202, }, /* 41 */ - { 33, 9, 12, 0, 203, }, /* 42 */ - { 33, 9, 12, 0, 207, }, /* 43 */ - { 33, 5, 12, 0, 97, }, /* 44 */ - { 33, 9, 12, 0, 211, }, /* 45 */ - { 33, 9, 12, 0, 209, }, /* 46 */ - { 33, 5, 12, 0, 163, }, /* 47 */ - { 33, 9, 12, 0, 213, }, /* 48 */ - { 33, 5, 12, 0, 130, }, /* 49 */ - { 33, 9, 12, 0, 214, }, /* 50 */ - { 33, 9, 12, 0, 218, }, /* 51 */ - { 33, 9, 12, 0, 217, }, /* 52 */ - { 33, 9, 12, 0, 219, }, /* 53 */ - { 33, 5, 12, 0, 56, }, /* 54 */ - { 33, 9, 12, 5, 2, }, /* 55 */ 
- { 33, 8, 12, 5, 1, }, /* 56 */ - { 33, 5, 12, 5, -2, }, /* 57 */ - { 33, 9, 12, 9, 2, }, /* 58 */ - { 33, 8, 12, 9, 1, }, /* 59 */ - { 33, 5, 12, 9, -2, }, /* 60 */ - { 33, 9, 12, 13, 2, }, /* 61 */ - { 33, 8, 12, 13, 1, }, /* 62 */ - { 33, 5, 12, 13, -2, }, /* 63 */ - { 33, 5, 12, 0, -79, }, /* 64 */ - { 33, 9, 12, 17, 2, }, /* 65 */ - { 33, 8, 12, 17, 1, }, /* 66 */ - { 33, 5, 12, 17, -2, }, /* 67 */ - { 33, 9, 12, 0, -97, }, /* 68 */ - { 33, 9, 12, 0, -56, }, /* 69 */ - { 33, 9, 12, 0, -130, }, /* 70 */ - { 33, 9, 12, 0, 10795, }, /* 71 */ - { 33, 9, 12, 0, -163, }, /* 72 */ - { 33, 9, 12, 0, 10792, }, /* 73 */ - { 33, 5, 12, 0, 10815, }, /* 74 */ - { 33, 9, 12, 0, -195, }, /* 75 */ - { 33, 9, 12, 0, 69, }, /* 76 */ - { 33, 9, 12, 0, 71, }, /* 77 */ - { 33, 5, 12, 0, 10783, }, /* 78 */ - { 33, 5, 12, 0, 10780, }, /* 79 */ - { 33, 5, 12, 0, 10782, }, /* 80 */ - { 33, 5, 12, 0, -210, }, /* 81 */ - { 33, 5, 12, 0, -206, }, /* 82 */ - { 33, 5, 12, 0, -205, }, /* 83 */ - { 33, 5, 12, 0, -202, }, /* 84 */ - { 33, 5, 12, 0, -203, }, /* 85 */ - { 33, 5, 12, 0, 42319, }, /* 86 */ - { 33, 5, 12, 0, 42315, }, /* 87 */ - { 33, 5, 12, 0, -207, }, /* 88 */ - { 33, 5, 12, 0, 42280, }, /* 89 */ - { 33, 5, 12, 0, 42308, }, /* 90 */ - { 33, 5, 12, 0, -209, }, /* 91 */ - { 33, 5, 12, 0, -211, }, /* 92 */ - { 33, 5, 12, 0, 10743, }, /* 93 */ - { 33, 5, 12, 0, 42305, }, /* 94 */ - { 33, 5, 12, 0, 10749, }, /* 95 */ - { 33, 5, 12, 0, -213, }, /* 96 */ - { 33, 5, 12, 0, -214, }, /* 97 */ - { 33, 5, 12, 0, 10727, }, /* 98 */ - { 33, 5, 12, 0, -218, }, /* 99 */ - { 33, 5, 12, 0, 42282, }, /* 100 */ - { 33, 5, 12, 0, -69, }, /* 101 */ - { 33, 5, 12, 0, -217, }, /* 102 */ - { 33, 5, 12, 0, -71, }, /* 103 */ - { 33, 5, 12, 0, -219, }, /* 104 */ - { 33, 5, 12, 0, 42258, }, /* 105 */ - { 33, 6, 12, 0, 0, }, /* 106 */ - { 9, 6, 12, 0, 0, }, /* 107 */ - { 3, 24, 12, 0, 0, }, /* 108 */ - { 27, 12, 3, 0, 0, }, /* 109 */ - { 27, 12, 3, 21, 116, }, /* 110 */ - { 19, 9, 12, 0, 1, }, /* 111 */ - { 
19, 5, 12, 0, -1, }, /* 112 */ - { 19, 24, 12, 0, 0, }, /* 113 */ - { 9, 2, 12, 0, 0, }, /* 114 */ - { 19, 6, 12, 0, 0, }, /* 115 */ - { 19, 5, 12, 0, 130, }, /* 116 */ - { 19, 9, 12, 0, 116, }, /* 117 */ - { 19, 9, 12, 0, 38, }, /* 118 */ - { 19, 9, 12, 0, 37, }, /* 119 */ - { 19, 9, 12, 0, 64, }, /* 120 */ - { 19, 9, 12, 0, 63, }, /* 121 */ - { 19, 5, 12, 0, 0, }, /* 122 */ - { 19, 9, 12, 0, 32, }, /* 123 */ - { 19, 9, 12, 34, 32, }, /* 124 */ - { 19, 9, 12, 59, 32, }, /* 125 */ - { 19, 9, 12, 38, 32, }, /* 126 */ - { 19, 9, 12, 21, 32, }, /* 127 */ - { 19, 9, 12, 51, 32, }, /* 128 */ - { 19, 9, 12, 26, 32, }, /* 129 */ - { 19, 9, 12, 47, 32, }, /* 130 */ - { 19, 9, 12, 55, 32, }, /* 131 */ - { 19, 9, 12, 30, 32, }, /* 132 */ - { 19, 9, 12, 43, 32, }, /* 133 */ - { 19, 9, 12, 67, 32, }, /* 134 */ - { 19, 5, 12, 0, -38, }, /* 135 */ - { 19, 5, 12, 0, -37, }, /* 136 */ - { 19, 5, 12, 0, -32, }, /* 137 */ - { 19, 5, 12, 34, -32, }, /* 138 */ - { 19, 5, 12, 59, -32, }, /* 139 */ - { 19, 5, 12, 38, -32, }, /* 140 */ - { 19, 5, 12, 21, -116, }, /* 141 */ - { 19, 5, 12, 51, -32, }, /* 142 */ - { 19, 5, 12, 26, -775, }, /* 143 */ - { 19, 5, 12, 47, -32, }, /* 144 */ - { 19, 5, 12, 55, -32, }, /* 145 */ - { 19, 5, 12, 30, 1, }, /* 146 */ - { 19, 5, 12, 30, -32, }, /* 147 */ - { 19, 5, 12, 43, -32, }, /* 148 */ - { 19, 5, 12, 67, -32, }, /* 149 */ - { 19, 5, 12, 0, -64, }, /* 150 */ - { 19, 5, 12, 0, -63, }, /* 151 */ - { 19, 9, 12, 0, 8, }, /* 152 */ - { 19, 5, 12, 34, -30, }, /* 153 */ - { 19, 5, 12, 38, -25, }, /* 154 */ - { 19, 9, 12, 0, 0, }, /* 155 */ - { 19, 5, 12, 43, -15, }, /* 156 */ - { 19, 5, 12, 47, -22, }, /* 157 */ - { 19, 5, 12, 0, -8, }, /* 158 */ - { 10, 9, 12, 0, 1, }, /* 159 */ - { 10, 5, 12, 0, -1, }, /* 160 */ - { 19, 5, 12, 51, -54, }, /* 161 */ - { 19, 5, 12, 55, -48, }, /* 162 */ - { 19, 5, 12, 0, 7, }, /* 163 */ - { 19, 5, 12, 0, -116, }, /* 164 */ - { 19, 9, 12, 38, -60, }, /* 165 */ - { 19, 5, 12, 59, -64, }, /* 166 */ - { 19, 25, 12, 0, 0, }, 
/* 167 */ - { 19, 9, 12, 0, -7, }, /* 168 */ - { 19, 9, 12, 0, -130, }, /* 169 */ - { 12, 9, 12, 0, 80, }, /* 170 */ - { 12, 9, 12, 0, 32, }, /* 171 */ - { 12, 5, 12, 0, -32, }, /* 172 */ - { 12, 5, 12, 0, -80, }, /* 173 */ - { 12, 9, 12, 0, 1, }, /* 174 */ - { 12, 5, 12, 0, -1, }, /* 175 */ - { 12, 26, 12, 0, 0, }, /* 176 */ - { 12, 12, 3, 0, 0, }, /* 177 */ - { 12, 11, 3, 0, 0, }, /* 178 */ - { 12, 9, 12, 0, 15, }, /* 179 */ - { 12, 5, 12, 0, -15, }, /* 180 */ - { 1, 9, 12, 0, 48, }, /* 181 */ - { 1, 6, 12, 0, 0, }, /* 182 */ - { 1, 21, 12, 0, 0, }, /* 183 */ - { 1, 5, 12, 0, -48, }, /* 184 */ - { 1, 5, 12, 0, 0, }, /* 185 */ - { 1, 17, 12, 0, 0, }, /* 186 */ - { 1, 26, 12, 0, 0, }, /* 187 */ - { 1, 23, 12, 0, 0, }, /* 188 */ - { 25, 12, 3, 0, 0, }, /* 189 */ - { 25, 17, 12, 0, 0, }, /* 190 */ - { 25, 21, 12, 0, 0, }, /* 191 */ - { 25, 7, 12, 0, 0, }, /* 192 */ - { 0, 1, 2, 0, 0, }, /* 193 */ - { 0, 25, 12, 0, 0, }, /* 194 */ - { 0, 21, 12, 0, 0, }, /* 195 */ - { 0, 23, 12, 0, 0, }, /* 196 */ - { 0, 26, 12, 0, 0, }, /* 197 */ - { 0, 12, 3, 0, 0, }, /* 198 */ - { 0, 7, 12, 0, 0, }, /* 199 */ - { 0, 6, 12, 0, 0, }, /* 200 */ - { 0, 13, 12, 0, 0, }, /* 201 */ - { 49, 21, 12, 0, 0, }, /* 202 */ - { 49, 1, 2, 0, 0, }, /* 203 */ - { 49, 7, 12, 0, 0, }, /* 204 */ - { 49, 12, 3, 0, 0, }, /* 205 */ - { 55, 7, 12, 0, 0, }, /* 206 */ - { 55, 12, 3, 0, 0, }, /* 207 */ - { 63, 13, 12, 0, 0, }, /* 208 */ - { 63, 7, 12, 0, 0, }, /* 209 */ - { 63, 12, 3, 0, 0, }, /* 210 */ - { 63, 6, 12, 0, 0, }, /* 211 */ - { 63, 26, 12, 0, 0, }, /* 212 */ - { 63, 21, 12, 0, 0, }, /* 213 */ - { 89, 7, 12, 0, 0, }, /* 214 */ - { 89, 12, 3, 0, 0, }, /* 215 */ - { 89, 6, 12, 0, 0, }, /* 216 */ - { 89, 21, 12, 0, 0, }, /* 217 */ - { 94, 7, 12, 0, 0, }, /* 218 */ - { 94, 12, 3, 0, 0, }, /* 219 */ - { 94, 21, 12, 0, 0, }, /* 220 */ - { 14, 12, 3, 0, 0, }, /* 221 */ - { 14, 10, 5, 0, 0, }, /* 222 */ - { 14, 7, 12, 0, 0, }, /* 223 */ - { 14, 13, 12, 0, 0, }, /* 224 */ - { 14, 21, 12, 0, 0, }, /* 225 */ 
- { 14, 6, 12, 0, 0, }, /* 226 */ - { 2, 7, 12, 0, 0, }, /* 227 */ - { 2, 12, 3, 0, 0, }, /* 228 */ - { 2, 10, 5, 0, 0, }, /* 229 */ - { 2, 10, 3, 0, 0, }, /* 230 */ - { 2, 13, 12, 0, 0, }, /* 231 */ - { 2, 23, 12, 0, 0, }, /* 232 */ - { 2, 15, 12, 0, 0, }, /* 233 */ - { 2, 26, 12, 0, 0, }, /* 234 */ - { 21, 12, 3, 0, 0, }, /* 235 */ - { 21, 10, 5, 0, 0, }, /* 236 */ - { 21, 7, 12, 0, 0, }, /* 237 */ - { 21, 13, 12, 0, 0, }, /* 238 */ - { 20, 12, 3, 0, 0, }, /* 239 */ - { 20, 10, 5, 0, 0, }, /* 240 */ - { 20, 7, 12, 0, 0, }, /* 241 */ - { 20, 13, 12, 0, 0, }, /* 242 */ - { 20, 21, 12, 0, 0, }, /* 243 */ - { 20, 23, 12, 0, 0, }, /* 244 */ - { 43, 12, 3, 0, 0, }, /* 245 */ - { 43, 10, 5, 0, 0, }, /* 246 */ - { 43, 7, 12, 0, 0, }, /* 247 */ - { 43, 10, 3, 0, 0, }, /* 248 */ - { 43, 13, 12, 0, 0, }, /* 249 */ - { 43, 26, 12, 0, 0, }, /* 250 */ - { 43, 15, 12, 0, 0, }, /* 251 */ - { 53, 12, 3, 0, 0, }, /* 252 */ - { 53, 7, 12, 0, 0, }, /* 253 */ - { 53, 10, 3, 0, 0, }, /* 254 */ - { 53, 10, 5, 0, 0, }, /* 255 */ - { 53, 13, 12, 0, 0, }, /* 256 */ - { 53, 15, 12, 0, 0, }, /* 257 */ - { 53, 26, 12, 0, 0, }, /* 258 */ - { 53, 23, 12, 0, 0, }, /* 259 */ - { 54, 12, 3, 0, 0, }, /* 260 */ - { 54, 10, 5, 0, 0, }, /* 261 */ - { 54, 7, 12, 0, 0, }, /* 262 */ - { 54, 13, 12, 0, 0, }, /* 263 */ - { 54, 15, 12, 0, 0, }, /* 264 */ - { 54, 26, 12, 0, 0, }, /* 265 */ - { 28, 12, 3, 0, 0, }, /* 266 */ - { 28, 10, 5, 0, 0, }, /* 267 */ - { 28, 7, 12, 0, 0, }, /* 268 */ - { 28, 10, 3, 0, 0, }, /* 269 */ - { 28, 13, 12, 0, 0, }, /* 270 */ - { 36, 12, 3, 0, 0, }, /* 271 */ - { 36, 10, 5, 0, 0, }, /* 272 */ - { 36, 7, 12, 0, 0, }, /* 273 */ - { 36, 10, 3, 0, 0, }, /* 274 */ - { 36, 13, 12, 0, 0, }, /* 275 */ - { 36, 15, 12, 0, 0, }, /* 276 */ - { 36, 26, 12, 0, 0, }, /* 277 */ - { 47, 10, 5, 0, 0, }, /* 278 */ - { 47, 7, 12, 0, 0, }, /* 279 */ - { 47, 12, 3, 0, 0, }, /* 280 */ - { 47, 10, 3, 0, 0, }, /* 281 */ - { 47, 13, 12, 0, 0, }, /* 282 */ - { 47, 21, 12, 0, 0, }, /* 283 */ - { 56, 7, 
12, 0, 0, }, /* 284 */ - { 56, 12, 3, 0, 0, }, /* 285 */ - { 56, 7, 5, 0, 0, }, /* 286 */ - { 56, 6, 12, 0, 0, }, /* 287 */ - { 56, 21, 12, 0, 0, }, /* 288 */ - { 56, 13, 12, 0, 0, }, /* 289 */ - { 32, 7, 12, 0, 0, }, /* 290 */ - { 32, 12, 3, 0, 0, }, /* 291 */ - { 32, 7, 5, 0, 0, }, /* 292 */ - { 32, 6, 12, 0, 0, }, /* 293 */ - { 32, 13, 12, 0, 0, }, /* 294 */ - { 57, 7, 12, 0, 0, }, /* 295 */ - { 57, 26, 12, 0, 0, }, /* 296 */ - { 57, 21, 12, 0, 0, }, /* 297 */ - { 57, 12, 3, 0, 0, }, /* 298 */ - { 57, 13, 12, 0, 0, }, /* 299 */ - { 57, 15, 12, 0, 0, }, /* 300 */ - { 57, 22, 12, 0, 0, }, /* 301 */ - { 57, 18, 12, 0, 0, }, /* 302 */ - { 57, 10, 5, 0, 0, }, /* 303 */ - { 38, 7, 12, 0, 0, }, /* 304 */ - { 38, 10, 12, 0, 0, }, /* 305 */ - { 38, 12, 3, 0, 0, }, /* 306 */ - { 38, 10, 5, 0, 0, }, /* 307 */ - { 38, 13, 12, 0, 0, }, /* 308 */ - { 38, 21, 12, 0, 0, }, /* 309 */ - { 38, 26, 12, 0, 0, }, /* 310 */ - { 16, 9, 12, 0, 7264, }, /* 311 */ - { 16, 7, 12, 0, 0, }, /* 312 */ - { 16, 6, 12, 0, 0, }, /* 313 */ - { 23, 7, 6, 0, 0, }, /* 314 */ - { 23, 7, 7, 0, 0, }, /* 315 */ - { 23, 7, 8, 0, 0, }, /* 316 */ - { 15, 7, 12, 0, 0, }, /* 317 */ - { 15, 12, 3, 0, 0, }, /* 318 */ - { 15, 21, 12, 0, 0, }, /* 319 */ - { 15, 15, 12, 0, 0, }, /* 320 */ - { 15, 26, 12, 0, 0, }, /* 321 */ - { 8, 7, 12, 0, 0, }, /* 322 */ - { 7, 17, 12, 0, 0, }, /* 323 */ - { 7, 7, 12, 0, 0, }, /* 324 */ - { 7, 21, 12, 0, 0, }, /* 325 */ - { 40, 29, 12, 0, 0, }, /* 326 */ - { 40, 7, 12, 0, 0, }, /* 327 */ - { 40, 22, 12, 0, 0, }, /* 328 */ - { 40, 18, 12, 0, 0, }, /* 329 */ - { 45, 7, 12, 0, 0, }, /* 330 */ - { 45, 14, 12, 0, 0, }, /* 331 */ - { 50, 7, 12, 0, 0, }, /* 332 */ - { 50, 12, 3, 0, 0, }, /* 333 */ - { 24, 7, 12, 0, 0, }, /* 334 */ - { 24, 12, 3, 0, 0, }, /* 335 */ - { 6, 7, 12, 0, 0, }, /* 336 */ - { 6, 12, 3, 0, 0, }, /* 337 */ - { 51, 7, 12, 0, 0, }, /* 338 */ - { 51, 12, 3, 0, 0, }, /* 339 */ - { 31, 7, 12, 0, 0, }, /* 340 */ - { 31, 12, 3, 0, 0, }, /* 341 */ - { 31, 10, 5, 0, 0, }, 
/* 342 */ - { 31, 21, 12, 0, 0, }, /* 343 */ - { 31, 6, 12, 0, 0, }, /* 344 */ - { 31, 23, 12, 0, 0, }, /* 345 */ - { 31, 13, 12, 0, 0, }, /* 346 */ - { 31, 15, 12, 0, 0, }, /* 347 */ - { 37, 21, 12, 0, 0, }, /* 348 */ - { 37, 17, 12, 0, 0, }, /* 349 */ - { 37, 12, 3, 0, 0, }, /* 350 */ - { 37, 1, 2, 0, 0, }, /* 351 */ - { 37, 13, 12, 0, 0, }, /* 352 */ - { 37, 7, 12, 0, 0, }, /* 353 */ - { 37, 6, 12, 0, 0, }, /* 354 */ - { 34, 7, 12, 0, 0, }, /* 355 */ - { 34, 12, 3, 0, 0, }, /* 356 */ - { 34, 10, 5, 0, 0, }, /* 357 */ - { 34, 26, 12, 0, 0, }, /* 358 */ - { 34, 21, 12, 0, 0, }, /* 359 */ - { 34, 13, 12, 0, 0, }, /* 360 */ - { 52, 7, 12, 0, 0, }, /* 361 */ - { 39, 7, 12, 0, 0, }, /* 362 */ - { 39, 10, 12, 0, 0, }, /* 363 */ - { 39, 10, 5, 0, 0, }, /* 364 */ - { 39, 13, 12, 0, 0, }, /* 365 */ - { 39, 15, 12, 0, 0, }, /* 366 */ - { 39, 26, 12, 0, 0, }, /* 367 */ - { 31, 26, 12, 0, 0, }, /* 368 */ - { 5, 7, 12, 0, 0, }, /* 369 */ - { 5, 12, 3, 0, 0, }, /* 370 */ - { 5, 10, 5, 0, 0, }, /* 371 */ - { 5, 21, 12, 0, 0, }, /* 372 */ - { 90, 7, 12, 0, 0, }, /* 373 */ - { 90, 10, 5, 0, 0, }, /* 374 */ - { 90, 12, 3, 0, 0, }, /* 375 */ - { 90, 10, 12, 0, 0, }, /* 376 */ - { 90, 13, 12, 0, 0, }, /* 377 */ - { 90, 21, 12, 0, 0, }, /* 378 */ - { 90, 6, 12, 0, 0, }, /* 379 */ - { 27, 11, 3, 0, 0, }, /* 380 */ - { 61, 12, 3, 0, 0, }, /* 381 */ - { 61, 10, 5, 0, 0, }, /* 382 */ - { 61, 7, 12, 0, 0, }, /* 383 */ - { 61, 13, 12, 0, 0, }, /* 384 */ - { 61, 21, 12, 0, 0, }, /* 385 */ - { 61, 26, 12, 0, 0, }, /* 386 */ - { 75, 12, 3, 0, 0, }, /* 387 */ - { 75, 10, 5, 0, 0, }, /* 388 */ - { 75, 7, 12, 0, 0, }, /* 389 */ - { 75, 13, 12, 0, 0, }, /* 390 */ - { 92, 7, 12, 0, 0, }, /* 391 */ - { 92, 12, 3, 0, 0, }, /* 392 */ - { 92, 10, 5, 0, 0, }, /* 393 */ - { 92, 21, 12, 0, 0, }, /* 394 */ - { 69, 7, 12, 0, 0, }, /* 395 */ - { 69, 10, 5, 0, 0, }, /* 396 */ - { 69, 12, 3, 0, 0, }, /* 397 */ - { 69, 21, 12, 0, 0, }, /* 398 */ - { 69, 13, 12, 0, 0, }, /* 399 */ - { 72, 13, 12, 0, 0, }, /* 
400 */ - { 72, 7, 12, 0, 0, }, /* 401 */ - { 72, 6, 12, 0, 0, }, /* 402 */ - { 72, 21, 12, 0, 0, }, /* 403 */ - { 75, 21, 12, 0, 0, }, /* 404 */ - { 9, 10, 5, 0, 0, }, /* 405 */ - { 9, 7, 12, 0, 0, }, /* 406 */ - { 12, 5, 12, 0, 0, }, /* 407 */ - { 12, 6, 12, 0, 0, }, /* 408 */ - { 33, 5, 12, 0, 35332, }, /* 409 */ - { 33, 5, 12, 0, 3814, }, /* 410 */ - { 33, 9, 12, 63, 1, }, /* 411 */ - { 33, 5, 12, 63, -1, }, /* 412 */ - { 33, 5, 12, 63, -58, }, /* 413 */ - { 33, 9, 12, 0, -7615, }, /* 414 */ - { 19, 5, 12, 0, 8, }, /* 415 */ - { 19, 9, 12, 0, -8, }, /* 416 */ - { 19, 5, 12, 0, 74, }, /* 417 */ - { 19, 5, 12, 0, 86, }, /* 418 */ - { 19, 5, 12, 0, 100, }, /* 419 */ - { 19, 5, 12, 0, 128, }, /* 420 */ - { 19, 5, 12, 0, 112, }, /* 421 */ - { 19, 5, 12, 0, 126, }, /* 422 */ - { 19, 8, 12, 0, -8, }, /* 423 */ - { 19, 5, 12, 0, 9, }, /* 424 */ - { 19, 9, 12, 0, -74, }, /* 425 */ - { 19, 8, 12, 0, -9, }, /* 426 */ - { 19, 5, 12, 21, -7173, }, /* 427 */ - { 19, 9, 12, 0, -86, }, /* 428 */ - { 19, 9, 12, 0, -100, }, /* 429 */ - { 19, 9, 12, 0, -112, }, /* 430 */ - { 19, 9, 12, 0, -128, }, /* 431 */ - { 19, 9, 12, 0, -126, }, /* 432 */ - { 27, 1, 3, 0, 0, }, /* 433 */ - { 9, 27, 2, 0, 0, }, /* 434 */ - { 9, 28, 2, 0, 0, }, /* 435 */ - { 9, 2, 2, 0, 0, }, /* 436 */ - { 9, 9, 12, 0, 0, }, /* 437 */ - { 9, 5, 12, 0, 0, }, /* 438 */ - { 19, 9, 12, 67, -7517, }, /* 439 */ - { 33, 9, 12, 71, -8383, }, /* 440 */ - { 33, 9, 12, 75, -8262, }, /* 441 */ - { 33, 9, 12, 0, 28, }, /* 442 */ - { 33, 5, 12, 0, -28, }, /* 443 */ - { 33, 14, 12, 0, 16, }, /* 444 */ - { 33, 14, 12, 0, -16, }, /* 445 */ - { 33, 14, 12, 0, 0, }, /* 446 */ - { 9, 26, 12, 0, 26, }, /* 447 */ - { 9, 26, 12, 0, -26, }, /* 448 */ - { 4, 26, 12, 0, 0, }, /* 449 */ - { 17, 9, 12, 0, 48, }, /* 450 */ - { 17, 5, 12, 0, -48, }, /* 451 */ - { 33, 9, 12, 0, -10743, }, /* 452 */ - { 33, 9, 12, 0, -3814, }, /* 453 */ - { 33, 9, 12, 0, -10727, }, /* 454 */ - { 33, 5, 12, 0, -10795, }, /* 455 */ - { 33, 5, 12, 0, -10792, }, 
/* 456 */ - { 33, 9, 12, 0, -10780, }, /* 457 */ - { 33, 9, 12, 0, -10749, }, /* 458 */ - { 33, 9, 12, 0, -10783, }, /* 459 */ - { 33, 9, 12, 0, -10782, }, /* 460 */ - { 33, 9, 12, 0, -10815, }, /* 461 */ - { 10, 5, 12, 0, 0, }, /* 462 */ - { 10, 26, 12, 0, 0, }, /* 463 */ - { 10, 12, 3, 0, 0, }, /* 464 */ - { 10, 21, 12, 0, 0, }, /* 465 */ - { 10, 15, 12, 0, 0, }, /* 466 */ - { 16, 5, 12, 0, -7264, }, /* 467 */ - { 58, 7, 12, 0, 0, }, /* 468 */ - { 58, 6, 12, 0, 0, }, /* 469 */ - { 58, 21, 12, 0, 0, }, /* 470 */ - { 58, 12, 3, 0, 0, }, /* 471 */ - { 22, 26, 12, 0, 0, }, /* 472 */ - { 22, 6, 12, 0, 0, }, /* 473 */ - { 22, 14, 12, 0, 0, }, /* 474 */ - { 23, 10, 3, 0, 0, }, /* 475 */ - { 26, 7, 12, 0, 0, }, /* 476 */ - { 26, 6, 12, 0, 0, }, /* 477 */ - { 29, 7, 12, 0, 0, }, /* 478 */ - { 29, 6, 12, 0, 0, }, /* 479 */ - { 3, 7, 12, 0, 0, }, /* 480 */ - { 23, 7, 12, 0, 0, }, /* 481 */ - { 23, 26, 12, 0, 0, }, /* 482 */ - { 29, 26, 12, 0, 0, }, /* 483 */ - { 22, 7, 12, 0, 0, }, /* 484 */ - { 60, 7, 12, 0, 0, }, /* 485 */ - { 60, 6, 12, 0, 0, }, /* 486 */ - { 60, 26, 12, 0, 0, }, /* 487 */ - { 85, 7, 12, 0, 0, }, /* 488 */ - { 85, 6, 12, 0, 0, }, /* 489 */ - { 85, 21, 12, 0, 0, }, /* 490 */ - { 76, 7, 12, 0, 0, }, /* 491 */ - { 76, 6, 12, 0, 0, }, /* 492 */ - { 76, 21, 12, 0, 0, }, /* 493 */ - { 76, 13, 12, 0, 0, }, /* 494 */ - { 12, 7, 12, 0, 0, }, /* 495 */ - { 12, 21, 12, 0, 0, }, /* 496 */ - { 78, 7, 12, 0, 0, }, /* 497 */ - { 78, 14, 12, 0, 0, }, /* 498 */ - { 78, 12, 3, 0, 0, }, /* 499 */ - { 78, 21, 12, 0, 0, }, /* 500 */ - { 33, 9, 12, 0, -35332, }, /* 501 */ - { 33, 9, 12, 0, -42280, }, /* 502 */ - { 33, 9, 12, 0, -42308, }, /* 503 */ - { 33, 9, 12, 0, -42319, }, /* 504 */ - { 33, 9, 12, 0, -42315, }, /* 505 */ - { 33, 9, 12, 0, -42305, }, /* 506 */ - { 33, 9, 12, 0, -42258, }, /* 507 */ - { 33, 9, 12, 0, -42282, }, /* 508 */ - { 48, 7, 12, 0, 0, }, /* 509 */ - { 48, 12, 3, 0, 0, }, /* 510 */ - { 48, 10, 5, 0, 0, }, /* 511 */ - { 48, 26, 12, 0, 0, }, /* 512 */ - 
{ 64, 7, 12, 0, 0, }, /* 513 */ - { 64, 21, 12, 0, 0, }, /* 514 */ - { 74, 10, 5, 0, 0, }, /* 515 */ - { 74, 7, 12, 0, 0, }, /* 516 */ - { 74, 12, 3, 0, 0, }, /* 517 */ - { 74, 21, 12, 0, 0, }, /* 518 */ - { 74, 13, 12, 0, 0, }, /* 519 */ - { 68, 13, 12, 0, 0, }, /* 520 */ - { 68, 7, 12, 0, 0, }, /* 521 */ - { 68, 12, 3, 0, 0, }, /* 522 */ - { 68, 21, 12, 0, 0, }, /* 523 */ - { 73, 7, 12, 0, 0, }, /* 524 */ - { 73, 12, 3, 0, 0, }, /* 525 */ - { 73, 10, 5, 0, 0, }, /* 526 */ - { 73, 21, 12, 0, 0, }, /* 527 */ - { 83, 12, 3, 0, 0, }, /* 528 */ - { 83, 10, 5, 0, 0, }, /* 529 */ - { 83, 7, 12, 0, 0, }, /* 530 */ - { 83, 21, 12, 0, 0, }, /* 531 */ - { 83, 13, 12, 0, 0, }, /* 532 */ - { 38, 6, 12, 0, 0, }, /* 533 */ - { 67, 7, 12, 0, 0, }, /* 534 */ - { 67, 12, 3, 0, 0, }, /* 535 */ - { 67, 10, 5, 0, 0, }, /* 536 */ - { 67, 13, 12, 0, 0, }, /* 537 */ - { 67, 21, 12, 0, 0, }, /* 538 */ - { 91, 7, 12, 0, 0, }, /* 539 */ - { 91, 12, 3, 0, 0, }, /* 540 */ - { 91, 6, 12, 0, 0, }, /* 541 */ - { 91, 21, 12, 0, 0, }, /* 542 */ - { 86, 7, 12, 0, 0, }, /* 543 */ - { 86, 10, 5, 0, 0, }, /* 544 */ - { 86, 12, 3, 0, 0, }, /* 545 */ - { 86, 21, 12, 0, 0, }, /* 546 */ - { 86, 6, 12, 0, 0, }, /* 547 */ - { 86, 13, 12, 0, 0, }, /* 548 */ - { 23, 7, 9, 0, 0, }, /* 549 */ - { 23, 7, 10, 0, 0, }, /* 550 */ - { 9, 4, 2, 0, 0, }, /* 551 */ - { 9, 3, 12, 0, 0, }, /* 552 */ - { 25, 25, 12, 0, 0, }, /* 553 */ - { 0, 24, 12, 0, 0, }, /* 554 */ - { 9, 6, 3, 0, 0, }, /* 555 */ - { 35, 7, 12, 0, 0, }, /* 556 */ - { 19, 14, 12, 0, 0, }, /* 557 */ - { 19, 15, 12, 0, 0, }, /* 558 */ - { 19, 26, 12, 0, 0, }, /* 559 */ - { 70, 7, 12, 0, 0, }, /* 560 */ - { 66, 7, 12, 0, 0, }, /* 561 */ - { 41, 7, 12, 0, 0, }, /* 562 */ - { 41, 15, 12, 0, 0, }, /* 563 */ - { 18, 7, 12, 0, 0, }, /* 564 */ - { 18, 14, 12, 0, 0, }, /* 565 */ - { 117, 7, 12, 0, 0, }, /* 566 */ - { 117, 12, 3, 0, 0, }, /* 567 */ - { 59, 7, 12, 0, 0, }, /* 568 */ - { 59, 21, 12, 0, 0, }, /* 569 */ - { 42, 7, 12, 0, 0, }, /* 570 */ - { 42, 21, 
12, 0, 0, }, /* 571 */ - { 42, 14, 12, 0, 0, }, /* 572 */ - { 13, 9, 12, 0, 40, }, /* 573 */ - { 13, 5, 12, 0, -40, }, /* 574 */ - { 46, 7, 12, 0, 0, }, /* 575 */ - { 44, 7, 12, 0, 0, }, /* 576 */ - { 44, 13, 12, 0, 0, }, /* 577 */ - { 105, 7, 12, 0, 0, }, /* 578 */ - { 103, 7, 12, 0, 0, }, /* 579 */ - { 103, 21, 12, 0, 0, }, /* 580 */ - { 109, 7, 12, 0, 0, }, /* 581 */ - { 11, 7, 12, 0, 0, }, /* 582 */ - { 80, 7, 12, 0, 0, }, /* 583 */ - { 80, 21, 12, 0, 0, }, /* 584 */ - { 80, 15, 12, 0, 0, }, /* 585 */ - { 119, 7, 12, 0, 0, }, /* 586 */ - { 119, 26, 12, 0, 0, }, /* 587 */ - { 119, 15, 12, 0, 0, }, /* 588 */ - { 115, 7, 12, 0, 0, }, /* 589 */ - { 115, 15, 12, 0, 0, }, /* 590 */ - { 65, 7, 12, 0, 0, }, /* 591 */ - { 65, 15, 12, 0, 0, }, /* 592 */ - { 65, 21, 12, 0, 0, }, /* 593 */ - { 71, 7, 12, 0, 0, }, /* 594 */ - { 71, 21, 12, 0, 0, }, /* 595 */ - { 97, 7, 12, 0, 0, }, /* 596 */ - { 96, 7, 12, 0, 0, }, /* 597 */ - { 30, 7, 12, 0, 0, }, /* 598 */ - { 30, 12, 3, 0, 0, }, /* 599 */ - { 30, 15, 12, 0, 0, }, /* 600 */ - { 30, 21, 12, 0, 0, }, /* 601 */ - { 87, 7, 12, 0, 0, }, /* 602 */ - { 87, 15, 12, 0, 0, }, /* 603 */ - { 87, 21, 12, 0, 0, }, /* 604 */ - { 116, 7, 12, 0, 0, }, /* 605 */ - { 116, 15, 12, 0, 0, }, /* 606 */ - { 111, 7, 12, 0, 0, }, /* 607 */ - { 111, 26, 12, 0, 0, }, /* 608 */ - { 111, 12, 3, 0, 0, }, /* 609 */ - { 111, 15, 12, 0, 0, }, /* 610 */ - { 111, 21, 12, 0, 0, }, /* 611 */ - { 77, 7, 12, 0, 0, }, /* 612 */ - { 77, 21, 12, 0, 0, }, /* 613 */ - { 82, 7, 12, 0, 0, }, /* 614 */ - { 82, 15, 12, 0, 0, }, /* 615 */ - { 81, 7, 12, 0, 0, }, /* 616 */ - { 81, 15, 12, 0, 0, }, /* 617 */ - { 120, 7, 12, 0, 0, }, /* 618 */ - { 120, 21, 12, 0, 0, }, /* 619 */ - { 120, 15, 12, 0, 0, }, /* 620 */ - { 88, 7, 12, 0, 0, }, /* 621 */ - { 0, 15, 12, 0, 0, }, /* 622 */ - { 93, 10, 5, 0, 0, }, /* 623 */ - { 93, 12, 3, 0, 0, }, /* 624 */ - { 93, 7, 12, 0, 0, }, /* 625 */ - { 93, 21, 12, 0, 0, }, /* 626 */ - { 93, 15, 12, 0, 0, }, /* 627 */ - { 93, 13, 12, 0, 0, }, 
/* 628 */ - { 84, 12, 3, 0, 0, }, /* 629 */ - { 84, 10, 5, 0, 0, }, /* 630 */ - { 84, 7, 12, 0, 0, }, /* 631 */ - { 84, 21, 12, 0, 0, }, /* 632 */ - { 84, 1, 2, 0, 0, }, /* 633 */ - { 100, 7, 12, 0, 0, }, /* 634 */ - { 100, 13, 12, 0, 0, }, /* 635 */ - { 95, 12, 3, 0, 0, }, /* 636 */ - { 95, 7, 12, 0, 0, }, /* 637 */ - { 95, 10, 5, 0, 0, }, /* 638 */ - { 95, 13, 12, 0, 0, }, /* 639 */ - { 95, 21, 12, 0, 0, }, /* 640 */ - { 110, 7, 12, 0, 0, }, /* 641 */ - { 110, 12, 3, 0, 0, }, /* 642 */ - { 110, 21, 12, 0, 0, }, /* 643 */ - { 99, 12, 3, 0, 0, }, /* 644 */ - { 99, 10, 5, 0, 0, }, /* 645 */ - { 99, 7, 12, 0, 0, }, /* 646 */ - { 99, 21, 12, 0, 0, }, /* 647 */ - { 99, 13, 12, 0, 0, }, /* 648 */ - { 47, 15, 12, 0, 0, }, /* 649 */ - { 107, 7, 12, 0, 0, }, /* 650 */ - { 107, 10, 5, 0, 0, }, /* 651 */ - { 107, 12, 3, 0, 0, }, /* 652 */ - { 107, 21, 12, 0, 0, }, /* 653 */ - { 108, 7, 12, 0, 0, }, /* 654 */ - { 108, 12, 3, 0, 0, }, /* 655 */ - { 108, 10, 5, 0, 0, }, /* 656 */ - { 108, 13, 12, 0, 0, }, /* 657 */ - { 106, 12, 3, 0, 0, }, /* 658 */ - { 106, 10, 5, 0, 0, }, /* 659 */ - { 106, 7, 12, 0, 0, }, /* 660 */ - { 106, 10, 3, 0, 0, }, /* 661 */ - { 123, 7, 12, 0, 0, }, /* 662 */ - { 123, 10, 3, 0, 0, }, /* 663 */ - { 123, 10, 5, 0, 0, }, /* 664 */ - { 123, 12, 3, 0, 0, }, /* 665 */ - { 123, 21, 12, 0, 0, }, /* 666 */ - { 123, 13, 12, 0, 0, }, /* 667 */ - { 122, 7, 12, 0, 0, }, /* 668 */ - { 122, 10, 3, 0, 0, }, /* 669 */ - { 122, 10, 5, 0, 0, }, /* 670 */ - { 122, 12, 3, 0, 0, }, /* 671 */ - { 122, 21, 12, 0, 0, }, /* 672 */ - { 113, 7, 12, 0, 0, }, /* 673 */ - { 113, 10, 5, 0, 0, }, /* 674 */ - { 113, 12, 3, 0, 0, }, /* 675 */ - { 113, 21, 12, 0, 0, }, /* 676 */ - { 113, 13, 12, 0, 0, }, /* 677 */ - { 101, 7, 12, 0, 0, }, /* 678 */ - { 101, 12, 3, 0, 0, }, /* 679 */ - { 101, 10, 5, 0, 0, }, /* 680 */ - { 101, 13, 12, 0, 0, }, /* 681 */ - { 124, 9, 12, 0, 32, }, /* 682 */ - { 124, 5, 12, 0, -32, }, /* 683 */ - { 124, 13, 12, 0, 0, }, /* 684 */ - { 124, 15, 12, 0, 0, }, 
/* 685 */ - { 124, 7, 12, 0, 0, }, /* 686 */ - { 121, 7, 12, 0, 0, }, /* 687 */ - { 62, 7, 12, 0, 0, }, /* 688 */ - { 62, 14, 12, 0, 0, }, /* 689 */ - { 62, 21, 12, 0, 0, }, /* 690 */ - { 79, 7, 12, 0, 0, }, /* 691 */ - { 114, 7, 12, 0, 0, }, /* 692 */ - { 114, 13, 12, 0, 0, }, /* 693 */ - { 114, 21, 12, 0, 0, }, /* 694 */ - { 102, 7, 12, 0, 0, }, /* 695 */ - { 102, 12, 3, 0, 0, }, /* 696 */ - { 102, 21, 12, 0, 0, }, /* 697 */ - { 118, 7, 12, 0, 0, }, /* 698 */ - { 118, 12, 3, 0, 0, }, /* 699 */ - { 118, 21, 12, 0, 0, }, /* 700 */ - { 118, 26, 12, 0, 0, }, /* 701 */ - { 118, 6, 12, 0, 0, }, /* 702 */ - { 118, 13, 12, 0, 0, }, /* 703 */ - { 118, 15, 12, 0, 0, }, /* 704 */ - { 98, 7, 12, 0, 0, }, /* 705 */ - { 98, 10, 5, 0, 0, }, /* 706 */ - { 98, 12, 3, 0, 0, }, /* 707 */ - { 98, 6, 12, 0, 0, }, /* 708 */ - { 104, 7, 12, 0, 0, }, /* 709 */ - { 104, 26, 12, 0, 0, }, /* 710 */ - { 104, 12, 3, 0, 0, }, /* 711 */ - { 104, 21, 12, 0, 0, }, /* 712 */ - { 9, 10, 3, 0, 0, }, /* 713 */ - { 19, 12, 3, 0, 0, }, /* 714 */ - { 112, 7, 12, 0, 0, }, /* 715 */ - { 112, 15, 12, 0, 0, }, /* 716 */ - { 112, 12, 3, 0, 0, }, /* 717 */ - { 9, 26, 11, 0, 0, }, /* 718 */ - { 26, 26, 12, 0, 0, }, /* 719 */ -}; - -const pcre_uint8 PRIV(ucd_stage1)[] = { /* 8704 bytes */ - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, /* U+0000 */ - 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, /* U+0800 */ - 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 41, 41, 42, 43, 44, 45, /* U+1000 */ - 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, /* U+1800 */ - 62, 63, 64, 65, 66, 66, 67, 68, 69, 70, 71, 72, 73, 71, 74, 75, /* U+2000 */ - 76, 76, 66, 77, 66, 66, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, /* U+2800 */ - 88, 89, 90, 91, 92, 93, 94, 71, 95, 95, 95, 95, 95, 95, 95, 95, /* U+3000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+3800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+4000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 
95, 95, 95, 96, 95, 95, 95, 95, /* U+4800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+5000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+5800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+6000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+6800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+7000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+7800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+8000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+8800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+9000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 97, /* U+9800 */ - 98, 99, 99, 99, 99, 99, 99, 99, 99,100,101,101,102,103,104,105, /* U+A000 */ -106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,114, /* U+A800 */ -115,116,117,118,119,120,114,115,116,117,118,119,120,114,115,116, /* U+B000 */ -117,118,119,120,114,115,116,117,118,119,120,114,115,116,117,118, /* U+B800 */ -119,120,114,115,116,117,118,119,120,114,115,116,117,118,119,120, /* U+C000 */ -114,115,116,117,118,119,120,114,115,116,117,118,119,120,114,115, /* U+C800 */ -116,117,118,119,120,114,115,116,117,118,119,120,114,115,116,121, /* U+D000 */ -122,122,122,122,122,122,122,122,122,122,122,122,122,122,122,122, /* U+D800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+E000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+E800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F000 */ -123,123, 95, 95,124,125,126,127,128,128,129,130,131,132,133,134, /* U+F800 */ -135,136,137,138,139,140,141,142,143,144,145,139,146,146,147,139, /* U+10000 */ -148,149,150,151,152,153,154,155,156,139,139,139,157,139,139,139, /* U+10800 */ 
-158,159,160,161,162,163,164,139,139,165,139,166,167,168,139,139, /* U+11000 */ -139,169,139,139,139,170,139,139,139,139,139,139,139,139,139,139, /* U+11800 */ -171,171,171,171,171,171,171,172,173,139,139,139,139,139,139,139, /* U+12000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+12800 */ -174,174,174,174,174,174,174,174,175,139,139,139,139,139,139,139, /* U+13000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+13800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+14000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+14800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+15000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+15800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+16000 */ -176,176,176,176,177,178,179,180,139,139,139,139,139,139,181,182, /* U+16800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+17000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+17800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+18000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+18800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+19000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+19800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1A800 */ -183,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1B000 */ -139,139,139,139,139,139,139,139,184,185,139,139,139,139,139,139, /* U+1B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1C800 */ - 71,186,187,188,189,139,190,139,191,192,193,194,195,196,197,198, /* U+1D000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1E000 */ -199,200,139,139,139,139,139,139,139,139,139,139,201,202,139,139, /* U+1E800 */ -203,204,205,206,207,139,208,209, 71,210,211,212,213,214,215,216, /* U+1F000 */ -217,218,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+1F800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+20000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+20800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+21000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+21800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+22000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+22800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+23000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+23800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+24000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+24800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+25000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+25800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+26000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+26800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+27000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+27800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+28000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+28800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+29000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+29800 */ 
- 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,219, 95, 95, /* U+2A000 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, /* U+2A800 */ - 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95,220, 95, /* U+2B000 */ -221,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+2F000 */ - 95, 95, 95, 95,221,139,139,139,139,139,139,139,139,139,139,139, /* U+2F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+30000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+30800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+31000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+31800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+32000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+32800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+33000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+33800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+34000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+34800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+35000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+35800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+36000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+36800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+37000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+37800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+38000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+38800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+39000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+39800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3F000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+3F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+40000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+40800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+41000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+41800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+42000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+42800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+43000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+43800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+44000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+44800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+45000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+45800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+46000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+46800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+47000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+47800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+48000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+48800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+49000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+49800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4F000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+4F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+50000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+50800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+51000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+51800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+52000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+52800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+53000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+53800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+54000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+54800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+55000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+55800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+56000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+56800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+57000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+57800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+58000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+58800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+59000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+59800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5B800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5F000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+5F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+60000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+60800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+61000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+61800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+62000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+62800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+63000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+63800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+64000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+64800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+65000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+65800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+66000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+66800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+67000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+67800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+68000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+68800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+69000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+69800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6F000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+6F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+70000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+70800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+71000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+71800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+72000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+72800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+73000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+73800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+74000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+74800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+75000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+75800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+76000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+76800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+77000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+77800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+78000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+78800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+79000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+79800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7F000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+7F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+80000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+80800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+81000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+81800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+82000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+82800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+83000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+83800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+84000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+84800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+85000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+85800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+86000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+86800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+87000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+87800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+88000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+88800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+89000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+89800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8A000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8D800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8F000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+8F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+90000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+90800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+91000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+91800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+92000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+92800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+93000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+93800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+94000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+94800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+95000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+95800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+96000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+96800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+97000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+97800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+98000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+98800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+99000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+99800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9A000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9A800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9B000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9B800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9C000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9C800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9D000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9D800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9E000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9E800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9F000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+9F800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A0000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A0800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A1000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A1800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A2000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A2800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A3000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A3800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A4000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A4800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A5000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A5800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A6000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A6800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A7000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A7800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A8000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A8800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A9000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+A9800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AA000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AA800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AB000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AB800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AC000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AC800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AD000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AD800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AE000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AE800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AF000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+AF800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B0000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B0800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B1000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B1800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B2000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B2800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B3000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B3800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B4000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B4800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B5000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B5800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B6000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B6800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B7000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B7800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B8000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B8800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B9000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+B9800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BA000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BA800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BB000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BB800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BC000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BC800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BD000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BD800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BE000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BE800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BF000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+BF800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C0000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C0800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C1000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C1800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C2000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C2800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C3000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C3800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C4000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C4800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C5000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C5800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C6000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C6800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C7000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C7800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C8000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C8800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C9000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+C9800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CA000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CA800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CB000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CB800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CC000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CC800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CD000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CD800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CE000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CE800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CF000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+CF800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D0000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D0800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D1000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D1800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D2000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D2800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D3000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D3800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D4000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D4800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D5000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D5800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D6000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D6800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D7000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D7800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D8000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D8800 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D9000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+D9800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DA000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DA800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DB000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DB800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DC000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DC800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DD000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DD800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DE000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DE800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DF000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+DF800 */ -222,223,224,225,223,223,223,223,223,223,223,223,223,223,223,223, /* U+E0000 */ -223,223,223,223,223,223,223,223,223,223,223,223,223,223,223,223, /* U+E0800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E1000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E1800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E2000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E2800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E3000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E3800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E4000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E4800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E5000 */ 
-139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E5800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E6000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E6800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E7000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E7800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E8000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E8800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E9000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+E9800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EA000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EA800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EB000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EB800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EC000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EC800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+ED000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+ED800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EE000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EE800 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EF000 */ -139,139,139,139,139,139,139,139,139,139,139,139,139,139,139,139, /* U+EF800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F0000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F0800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F1000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F1800 */ 
-123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F2000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F2800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F3000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F3800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F4000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F4800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F5000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F5800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F6000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F6800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F7000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F7800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F8000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F8800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F9000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+F9800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FA000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FA800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FB000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FB800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FC000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FC800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FD000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FD800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FE000 */ 
-123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FE800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+FF000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,226, /* U+FF800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+100000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+100800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+101000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+101800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+102000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+102800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+103000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+103800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+104000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+104800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+105000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+105800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+106000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+106800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+107000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+107800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+108000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+108800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+109000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+109800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10A000 */ 
-123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10A800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10B000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10B800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10C000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10C800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10D000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10D800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10E000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10E800 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,123, /* U+10F000 */ -123,123,123,123,123,123,123,123,123,123,123,123,123,123,123,226, /* U+10F800 */ -}; - -const pcre_uint16 PRIV(ucd_stage2)[] = { /* 58112 bytes, block = 128 */ -/* block 0 */ - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 3, 4, 4, 4, 5, 4, 4, 4, 6, 7, 4, 8, 4, 9, 4, 4, - 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 4, 4, 8, 8, 8, 4, - 4, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 12, 11, 11, 11, 11, - 11, 11, 11, 13, 11, 11, 11, 11, 11, 11, 11, 6, 4, 7, 14, 15, - 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 16, 16, 16, 16, - 16, 16, 16, 18, 16, 16, 16, 16, 16, 16, 16, 6, 8, 7, 8, 0, - -/* block 1 */ - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 3, 4, 5, 5, 5, 5, 19, 4, 14, 19, 20, 21, 8, 22, 19, 14, - 19, 8, 23, 23, 14, 24, 4, 4, 14, 23, 20, 25, 23, 23, 23, 4, - 11, 11, 11, 11, 11, 26, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, - 11, 11, 11, 11, 11, 11, 11, 8, 11, 11, 11, 11, 11, 11, 11, 27, - 16, 16, 16, 16, 16, 28, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, - 16, 16, 16, 16, 16, 16, 16, 8, 16, 16, 16, 16, 16, 16, 16, 29, - -/* block 2 */ - 30, 31, 30, 31, 30, 31, 30, 31, 
30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 32, 33, 30, 31, 30, 31, 30, 31, 33, 30, 31, 30, 31, 30, 31, 30, - 31, 30, 31, 30, 31, 30, 31, 30, 31, 33, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 34, 30, 31, 30, 31, 30, 31, 35, - -/* block 3 */ - 36, 37, 30, 31, 30, 31, 38, 30, 31, 39, 39, 30, 31, 33, 40, 41, - 42, 30, 31, 39, 43, 44, 45, 46, 30, 31, 47, 33, 45, 48, 49, 50, - 30, 31, 30, 31, 30, 31, 51, 30, 31, 51, 33, 33, 30, 31, 51, 30, - 31, 52, 52, 30, 31, 30, 31, 53, 30, 31, 33, 20, 30, 31, 33, 54, - 20, 20, 20, 20, 55, 56, 57, 58, 59, 60, 61, 62, 63, 30, 31, 30, - 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 64, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 33, 65, 66, 67, 30, 31, 68, 69, 30, 31, 30, 31, 30, 31, 30, 31, - -/* block 4 */ - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 70, 33, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 33, 33, 33, 33, 33, 33, 71, 30, 31, 72, 73, 74, - 74, 30, 31, 75, 76, 77, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 78, 79, 80, 81, 82, 33, 83, 83, 33, 84, 33, 85, 86, 33, 33, 33, - 83, 87, 33, 88, 33, 89, 90, 33, 91, 92, 33, 93, 94, 33, 33, 92, - 33, 95, 96, 33, 33, 97, 33, 33, 33, 33, 33, 33, 33, 98, 33, 33, - -/* block 5 */ - 99, 33, 33, 99, 33, 33, 33,100, 99,101,102,102,103, 33, 33, 33, - 33, 33,104, 33, 20, 33, 33, 33, 33, 33, 33, 33, 33, 33,105, 33, - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, -106,106,106,106,106,106,106,106,106,107,107,107,107,107,107,107, -107,107, 14, 14, 14, 14,107,107,107,107,107,107,107,107,107,107, -107,107, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 
-106,106,106,106,106, 14, 14, 14, 14, 14,108,108,107, 14,107, 14, - 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, - -/* block 6 */ -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,110,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -111,112,111,112,107,113,111,112,114,114,115,116,116,116, 4,117, - -/* block 7 */ -114,114,114,114,113, 14,118, 4,119,119,119,114,120,114,121,121, -122,123,124,123,123,125,123,123,126,127,128,123,129,123,123,123, -130,131,114,132,123,123,133,123,123,134,123,123,135,136,136,136, -122,137,138,137,137,139,137,137,140,141,142,137,143,137,137,137, -144,145,146,147,137,137,148,137,137,149,137,137,150,151,151,152, -153,154,155,155,155,156,157,158,111,112,111,112,111,112,111,112, -111,112,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -161,162,163,164,165,166,167,111,112,168,111,112,122,169,169,169, - -/* block 8 */ -170,170,170,170,170,170,170,170,170,170,170,170,170,170,170,170, -171,171,171,171,171,171,171,171,171,171,171,171,171,171,171,171, -171,171,171,171,171,171,171,171,171,171,171,171,171,171,171,171, -172,172,172,172,172,172,172,172,172,172,172,172,172,172,172,172, -172,172,172,172,172,172,172,172,172,172,172,172,172,172,172,172, -173,173,173,173,173,173,173,173,173,173,173,173,173,173,173,173, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, - -/* block 9 */ -174,175,176,177,177,109,109,177,178,178,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, 
-174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -179,174,175,174,175,174,175,174,175,174,175,174,175,174,175,180, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, - -/* block 10 */ -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -114,181,181,181,181,181,181,181,181,181,181,181,181,181,181,181, -181,181,181,181,181,181,181,181,181,181,181,181,181,181,181,181, -181,181,181,181,181,181,181,114,114,182,183,183,183,183,183,183, -114,184,184,184,184,184,184,184,184,184,184,184,184,184,184,184, -184,184,184,184,184,184,184,184,184,184,184,184,184,184,184,184, - -/* block 11 */ -184,184,184,184,184,184,184,185,114, 4,186,114,114,187,187,188, -114,189,189,189,189,189,189,189,189,189,189,189,189,189,189,189, -189,189,189,189,189,189,189,189,189,189,189,189,189,189,189,189, -189,189,189,189,189,189,189,189,189,189,189,189,189,189,190,189, -191,189,189,191,189,189,191,189,114,114,114,114,114,114,114,114, -192,192,192,192,192,192,192,192,192,192,192,192,192,192,192,192, -192,192,192,192,192,192,192,192,192,192,192,114,114,114,114,114, -192,192,192,191,191,114,114,114,114,114,114,114,114,114,114,114, - -/* block 12 */ -193,193,193,193,193, 22,194,194,194,195,195,196, 4,195,197,197, -198,198,198,198,198,198,198,198,198,198,198, 4, 22,114,195, 4, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -107,199,199,199,199,199,199,199,199,199,199,109,109,109,109,109, -109,109,109,109,109,109,198,198,198,198,198,198,198,198,198,198, - 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,195,195,195,195,199,199, -109,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, - -/* block 13 */ 
-199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,195,199,198,198,198,198,198,198,198, 22,197,198, -198,198,198,198,198,200,200,198,198,197,198,198,198,198,199,199, -201,201,201,201,201,201,201,201,201,201,199,199,199,197,197,199, - -/* block 14 */ -202,202,202,202,202,202,202,202,202,202,202,202,202,202,114,203, -204,205,204,204,204,204,204,204,204,204,204,204,204,204,204,204, -204,204,204,204,204,204,204,204,204,204,204,204,204,204,204,204, -205,205,205,205,205,205,205,205,205,205,205,205,205,205,205,205, -205,205,205,205,205,205,205,205,205,205,205,114,114,204,204,204, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, - -/* block 15 */ -206,206,206,206,206,206,206,206,206,206,206,206,206,206,206,206, -206,206,206,206,206,206,206,206,206,206,206,206,206,206,206,206, -206,206,206,206,206,206,207,207,207,207,207,207,207,207,207,207, -207,206,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -208,208,208,208,208,208,208,208,208,208,209,209,209,209,209,209, -209,209,209,209,209,209,209,209,209,209,209,209,209,209,209,209, -209,209,209,209,209,209,209,209,209,209,209,210,210,210,210,210, -210,210,210,210,211,211,212,213,213,213,211,114,114,114,114,114, - -/* block 16 */ -214,214,214,214,214,214,214,214,214,214,214,214,214,214,214,214, -214,214,214,214,214,214,215,215,215,215,216,215,215,215,215,215, -215,215,215,215,216,215,215,215,216,215,215,215,215,215,114,114, -217,217,217,217,217,217,217,217,217,217,217,217,217,217,217,114, -218,218,218,218,218,218,218,218,218,218,218,218,218,218,218,218, 
-218,218,218,218,218,218,218,218,218,219,219,219,114,114,220,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 17 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,198,198,198,198,198,198,198,198,198,198,198,198, -198,198,198,198,198,198,198,198,198,198,198,198,198,198,198,198, - -/* block 18 */ -221,221,221,222,223,223,223,223,223,223,223,223,223,223,223,223, -223,223,223,223,223,223,223,223,223,223,223,223,223,223,223,223, -223,223,223,223,223,223,223,223,223,223,223,223,223,223,223,223, -223,223,223,223,223,223,223,223,223,223,221,222,221,223,222,222, -222,221,221,221,221,221,221,221,221,222,222,222,222,221,222,222, -223,109,109,221,221,221,221,221,223,223,223,223,223,223,223,223, -223,223,221,221, 4, 4,224,224,224,224,224,224,224,224,224,224, -225,226,223,223,223,223,223,223,223,223,223,223,223,223,223,223, - -/* block 19 */ -227,228,229,229,114,227,227,227,227,227,227,227,227,114,114,227, -227,114,114,227,227,227,227,227,227,227,227,227,227,227,227,227, -227,227,227,227,227,227,227,227,227,114,227,227,227,227,227,227, -227,114,227,114,114,114,227,227,227,227,114,114,228,227,230,229, -229,228,228,228,228,114,114,229,229,114,114,229,229,228,227,114, -114,114,114,114,114,114,114,230,114,114,114,114,227,227,114,227, -227,227,228,228,114,114,231,231,231,231,231,231,231,231,231,231, -227,227,232,232,233,233,233,233,233,233,234,232,114,114,114,114, - -/* block 20 */ -114,235,235,236,114,237,237,237,237,237,237,114,114,114,114,237, -237,114,114,237,237,237,237,237,237,237,237,237,237,237,237,237, 
-237,237,237,237,237,237,237,237,237,114,237,237,237,237,237,237, -237,114,237,237,114,237,237,114,237,237,114,114,235,114,236,236, -236,235,235,114,114,114,114,235,235,114,114,235,235,235,114,114, -114,235,114,114,114,114,114,114,114,237,237,237,237,114,237,114, -114,114,114,114,114,114,238,238,238,238,238,238,238,238,238,238, -235,235,237,237,237,235,114,114,114,114,114,114,114,114,114,114, - -/* block 21 */ -114,239,239,240,114,241,241,241,241,241,241,241,241,241,114,241, -241,241,114,241,241,241,241,241,241,241,241,241,241,241,241,241, -241,241,241,241,241,241,241,241,241,114,241,241,241,241,241,241, -241,114,241,241,114,241,241,241,241,241,114,114,239,241,240,240, -240,239,239,239,239,239,114,239,239,240,114,240,240,239,114,114, -241,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -241,241,239,239,114,114,242,242,242,242,242,242,242,242,242,242, -243,244,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 22 */ -114,245,246,246,114,247,247,247,247,247,247,247,247,114,114,247, -247,114,114,247,247,247,247,247,247,247,247,247,247,247,247,247, -247,247,247,247,247,247,247,247,247,114,247,247,247,247,247,247, -247,114,247,247,114,247,247,247,247,247,114,114,245,247,248,245, -246,245,245,245,245,114,114,246,246,114,114,246,246,245,114,114, -114,114,114,114,114,114,245,248,114,114,114,114,247,247,114,247, -247,247,245,245,114,114,249,249,249,249,249,249,249,249,249,249, -250,247,251,251,251,251,251,251,114,114,114,114,114,114,114,114, - -/* block 23 */ -114,114,252,253,114,253,253,253,253,253,253,114,114,114,253,253, -253,114,253,253,253,253,114,114,114,253,253,114,253,114,253,253, -114,114,114,253,253,114,114,114,253,253,253,114,114,114,253,253, -253,253,253,253,253,253,253,253,253,253,114,114,114,114,254,255, -252,255,255,114,114,114,255,255,255,114,255,255,255,252,114,114, -253,114,114,114,114,114,114,254,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,256,256,256,256,256,256,256,256,256,256, 
-257,257,257,258,258,258,258,258,258,259,258,114,114,114,114,114, - -/* block 24 */ -260,261,261,261,114,262,262,262,262,262,262,262,262,114,262,262, -262,114,262,262,262,262,262,262,262,262,262,262,262,262,262,262, -262,262,262,262,262,262,262,262,262,114,262,262,262,262,262,262, -262,262,262,262,262,262,262,262,262,262,114,114,114,262,260,260, -260,261,261,261,261,114,260,260,260,114,260,260,260,260,114,114, -114,114,114,114,114,260,260,114,262,262,114,114,114,114,114,114, -262,262,260,260,114,114,263,263,263,263,263,263,263,263,263,263, -114,114,114,114,114,114,114,114,264,264,264,264,264,264,264,265, - -/* block 25 */ -114,266,267,267,114,268,268,268,268,268,268,268,268,114,268,268, -268,114,268,268,268,268,268,268,268,268,268,268,268,268,268,268, -268,268,268,268,268,268,268,268,268,114,268,268,268,268,268,268, -268,268,268,268,114,268,268,268,268,268,114,114,266,268,267,266, -267,267,269,267,267,114,266,267,267,114,267,267,266,266,114,114, -114,114,114,114,114,269,269,114,114,114,114,114,114,114,268,114, -268,268,266,266,114,114,270,270,270,270,270,270,270,270,270,270, -114,268,268,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 26 */ -114,271,272,272,114,273,273,273,273,273,273,273,273,114,273,273, -273,114,273,273,273,273,273,273,273,273,273,273,273,273,273,273, -273,273,273,273,273,273,273,273,273,273,273,273,273,273,273,273, -273,273,273,273,273,273,273,273,273,273,273,114,114,273,274,272, -272,271,271,271,271,114,272,272,272,114,272,272,272,271,273,114, -114,114,114,114,114,114,114,274,114,114,114,114,114,114,114,114, -273,273,271,271,114,114,275,275,275,275,275,275,275,275,275,275, -276,276,276,276,276,276,114,114,114,277,273,273,273,273,273,273, - -/* block 27 */ -114,114,278,278,114,279,279,279,279,279,279,279,279,279,279,279, -279,279,279,279,279,279,279,114,114,114,279,279,279,279,279,279, -279,279,279,279,279,279,279,279,279,279,279,279,279,279,279,279, -279,279,114,279,279,279,279,279,279,279,279,279,114,279,114,114, 
-279,279,279,279,279,279,279,114,114,114,280,114,114,114,114,281, -278,278,280,280,280,114,280,114,278,278,278,278,278,278,278,281, -114,114,114,114,114,114,282,282,282,282,282,282,282,282,282,282, -114,114,278,278,283,114,114,114,114,114,114,114,114,114,114,114, - -/* block 28 */ -114,284,284,284,284,284,284,284,284,284,284,284,284,284,284,284, -284,284,284,284,284,284,284,284,284,284,284,284,284,284,284,284, -284,284,284,284,284,284,284,284,284,284,284,284,284,284,284,284, -284,285,284,286,285,285,285,285,285,285,285,114,114,114,114, 5, -284,284,284,284,284,284,287,285,285,285,285,285,285,285,285,288, -289,289,289,289,289,289,289,289,289,289,288,288,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 29 */ -114,290,290,114,290,114,114,290,290,114,290,114,114,290,114,114, -114,114,114,114,290,290,290,290,114,290,290,290,290,290,290,290, -114,290,290,290,114,290,114,290,114,114,290,290,114,290,290,290, -290,291,290,292,291,291,291,291,291,291,114,291,291,290,114,114, -290,290,290,290,290,114,293,114,291,291,291,291,291,291,114,114, -294,294,294,294,294,294,294,294,294,294,114,114,290,290,290,290, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 30 */ -295,296,296,296,297,297,297,297,297,297,297,297,297,297,297,297, -297,297,297,296,297,296,296,296,298,298,296,296,296,296,296,296, -299,299,299,299,299,299,299,299,299,299,300,300,300,300,300,300, -300,300,300,300,296,298,296,298,296,298,301,302,301,302,303,303, -295,295,295,295,295,295,295,295,114,295,295,295,295,295,295,295, -295,295,295,295,295,295,295,295,295,295,295,295,295,295,295,295, -295,295,295,295,295,295,295,295,295,295,295,295,295,114,114,114, -114,298,298,298,298,298,298,298,298,298,298,298,298,298,298,303, - -/* block 31 */ -298,298,298,298,298,297,298,298,295,295,295,295,295,298,298,298, 
-298,298,298,298,298,298,298,298,114,298,298,298,298,298,298,298, -298,298,298,298,298,298,298,298,298,298,298,298,298,298,298,298, -298,298,298,298,298,298,298,298,298,298,298,298,298,114,296,296, -296,296,296,296,296,296,298,296,296,296,296,296,296,114,296,296, -297,297,297,297,297, 19, 19, 19, 19,297,297,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 32 */ -304,304,304,304,304,304,304,304,304,304,304,304,304,304,304,304, -304,304,304,304,304,304,304,304,304,304,304,304,304,304,304,304, -304,304,304,304,304,304,304,304,304,304,304,305,305,306,306,306, -306,307,306,306,306,306,306,306,305,306,306,307,307,306,306,304, -308,308,308,308,308,308,308,308,308,308,309,309,309,309,309,309, -304,304,304,304,304,304,307,307,306,306,304,304,304,304,306,306, -306,304,305,305,305,304,304,305,305,305,305,305,305,305,304,304, -304,306,306,306,306,304,304,304,304,304,304,304,304,304,304,304, - -/* block 33 */ -304,304,306,305,307,306,306,305,305,305,305,305,305,306,304,305, -308,308,308,308,308,308,308,308,308,308,305,305,305,306,310,310, -311,311,311,311,311,311,311,311,311,311,311,311,311,311,311,311, -311,311,311,311,311,311,311,311,311,311,311,311,311,311,311,311, -311,311,311,311,311,311,114,311,114,114,114,114,114,311,114,114, -312,312,312,312,312,312,312,312,312,312,312,312,312,312,312,312, -312,312,312,312,312,312,312,312,312,312,312,312,312,312,312,312, -312,312,312,312,312,312,312,312,312,312,312, 4,313,312,312,312, - -/* block 34 */ -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, 
-315,315,315,315,315,315,315,315,315,315,315,315,315,315,315,315, -315,315,315,315,315,315,315,315,315,315,315,315,315,315,315,315, - -/* block 35 */ -315,315,315,315,315,315,315,315,315,315,315,315,315,315,315,315, -315,315,315,315,315,315,315,315,315,315,315,315,315,315,315,315, -315,315,315,315,315,315,315,315,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, - -/* block 36 */ -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,114,317,317,317,317,114,114, -317,317,317,317,317,317,317,114,317,114,317,317,317,317,114,114, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, - -/* block 37 */ -317,317,317,317,317,317,317,317,317,114,317,317,317,317,114,114, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,114,317,317,317,317,114,114,317,317,317,317,317,317,317,114, -317,114,317,317,317,317,114,114,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,114,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, - -/* block 38 */ -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,114,317,317,317,317,114,114,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, 
-317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,317,317,317,317,114,114,318,318,318, -319,319,319,319,319,319,319,319,319,320,320,320,320,320,320,320, -320,320,320,320,320,320,320,320,320,320,320,320,320,114,114,114, - -/* block 39 */ -317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -321,321,321,321,321,321,321,321,321,321,114,114,114,114,114,114, -322,322,322,322,322,322,322,322,322,322,322,322,322,322,322,322, -322,322,322,322,322,322,322,322,322,322,322,322,322,322,322,322, -322,322,322,322,322,322,322,322,322,322,322,322,322,322,322,322, -322,322,322,322,322,322,322,322,322,322,322,322,322,322,322,322, -322,322,322,322,322,322,322,322,322,322,322,322,322,322,322,322, -322,322,322,322,322,114,114,114,114,114,114,114,114,114,114,114, - -/* block 40 */ -323,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, - -/* block 41 */ -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, - -/* block 42 */ 
-324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,325,325,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, - -/* block 43 */ -326,327,327,327,327,327,327,327,327,327,327,327,327,327,327,327, -327,327,327,327,327,327,327,327,327,327,327,328,329,114,114,114, -330,330,330,330,330,330,330,330,330,330,330,330,330,330,330,330, -330,330,330,330,330,330,330,330,330,330,330,330,330,330,330,330, -330,330,330,330,330,330,330,330,330,330,330,330,330,330,330,330, -330,330,330,330,330,330,330,330,330,330,330,330,330,330,330,330, -330,330,330,330,330,330,330,330,330,330,330, 4, 4, 4,331,331, -331,330,330,330,330,330,330,330,330,114,114,114,114,114,114,114, - -/* block 44 */ -332,332,332,332,332,332,332,332,332,332,332,332,332,114,332,332, -332,332,333,333,333,114,114,114,114,114,114,114,114,114,114,114, -334,334,334,334,334,334,334,334,334,334,334,334,334,334,334,334, -334,334,335,335,335, 4, 4,114,114,114,114,114,114,114,114,114, -336,336,336,336,336,336,336,336,336,336,336,336,336,336,336,336, -336,336,337,337,114,114,114,114,114,114,114,114,114,114,114,114, -338,338,338,338,338,338,338,338,338,338,338,338,338,114,338,338, -338,114,339,339,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 45 */ -340,340,340,340,340,340,340,340,340,340,340,340,340,340,340,340, -340,340,340,340,340,340,340,340,340,340,340,340,340,340,340,340, -340,340,340,340,340,340,340,340,340,340,340,340,340,340,340,340, -340,340,340,340,341,341,342,341,341,341,341,341,341,341,342,342, -342,342,342,342,342,342,341,342,342,341,341,341,341,341,341,341, 
-341,341,341,341,343,343,343,344,343,343,343,345,340,341,114,114, -346,346,346,346,346,346,346,346,346,346,114,114,114,114,114,114, -347,347,347,347,347,347,347,347,347,347,114,114,114,114,114,114, - -/* block 46 */ -348,348, 4, 4,348, 4,349,348,348,348,348,350,350,350,351,114, -352,352,352,352,352,352,352,352,352,352,114,114,114,114,114,114, -353,353,353,353,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,353,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,354,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,353,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,353,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,353,353,353,353,353,114,114,114,114,114,114,114,114, - -/* block 47 */ -353,353,353,353,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,353,353,353,353,353,353,353,353,353,353,353,353,353, -353,353,353,353,353,353,353,353,353,350,353,114,114,114,114,114, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,324,324,324,324,324,324,324,324,324,324, -324,324,324,324,324,324,114,114,114,114,114,114,114,114,114,114, - -/* block 48 */ -355,355,355,355,355,355,355,355,355,355,355,355,355,355,355,355, -355,355,355,355,355,355,355,355,355,355,355,355,355,355,355,114, -356,356,356,357,357,357,357,356,356,357,357,357,114,114,114,114, -357,357,356,357,357,357,357,357,357,356,356,356,114,114,114,114, -358,114,114,114,359,359,360,360,360,360,360,360,360,360,360,360, -361,361,361,361,361,361,361,361,361,361,361,361,361,361,361,361, -361,361,361,361,361,361,361,361,361,361,361,361,361,361,114,114, -361,361,361,361,361,114,114,114,114,114,114,114,114,114,114,114, - -/* block 49 */ -362,362,362,362,362,362,362,362,362,362,362,362,362,362,362,362, -362,362,362,362,362,362,362,362,362,362,362,362,362,362,362,362, 
-362,362,362,362,362,362,362,362,362,362,362,362,114,114,114,114, -363,363,363,363,363,364,364,364,363,363,364,363,363,363,363,363, -363,362,362,362,362,362,362,362,363,363,114,114,114,114,114,114, -365,365,365,365,365,365,365,365,365,365,366,114,114,114,367,367, -368,368,368,368,368,368,368,368,368,368,368,368,368,368,368,368, -368,368,368,368,368,368,368,368,368,368,368,368,368,368,368,368, - -/* block 50 */ -369,369,369,369,369,369,369,369,369,369,369,369,369,369,369,369, -369,369,369,369,369,369,369,370,370,371,371,370,114,114,372,372, -373,373,373,373,373,373,373,373,373,373,373,373,373,373,373,373, -373,373,373,373,373,373,373,373,373,373,373,373,373,373,373,373, -373,373,373,373,373,373,373,373,373,373,373,373,373,373,373,373, -373,373,373,373,373,374,375,374,375,375,375,375,375,375,375,114, -375,376,375,376,376,375,375,375,375,375,375,375,375,374,374,374, -374,374,374,375,375,375,375,375,375,375,375,375,375,114,114,375, - -/* block 51 */ -377,377,377,377,377,377,377,377,377,377,114,114,114,114,114,114, -377,377,377,377,377,377,377,377,377,377,114,114,114,114,114,114, -378,378,378,378,378,378,378,379,378,378,378,378,378,378,114,114, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,380,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 52 */ -381,381,381,381,382,383,383,383,383,383,383,383,383,383,383,383, -383,383,383,383,383,383,383,383,383,383,383,383,383,383,383,383, -383,383,383,383,383,383,383,383,383,383,383,383,383,383,383,383, -383,383,383,383,381,382,381,381,381,381,381,382,381,382,382,382, -382,382,381,382,382,383,383,383,383,383,383,383,114,114,114,114, -384,384,384,384,384,384,384,384,384,384,385,385,385,385,385,385, -385,386,386,386,386,386,386,386,386,386,386,381,381,381,381,381, 
-381,381,381,381,386,386,386,386,386,386,386,386,386,114,114,114, - -/* block 53 */ -387,387,388,389,389,389,389,389,389,389,389,389,389,389,389,389, -389,389,389,389,389,389,389,389,389,389,389,389,389,389,389,389, -389,388,387,387,387,387,388,388,387,387,388,387,387,387,389,389, -390,390,390,390,390,390,390,390,390,390,389,389,389,389,389,389, -391,391,391,391,391,391,391,391,391,391,391,391,391,391,391,391, -391,391,391,391,391,391,391,391,391,391,391,391,391,391,391,391, -391,391,391,391,391,391,392,393,392,392,393,393,393,392,393,392, -392,392,393,393,114,114,114,114,114,114,114,114,394,394,394,394, - -/* block 54 */ -395,395,395,395,395,395,395,395,395,395,395,395,395,395,395,395, -395,395,395,395,395,395,395,395,395,395,395,395,395,395,395,395, -395,395,395,395,396,396,396,396,396,396,396,396,397,397,397,397, -397,397,397,397,396,396,397,397,114,114,114,398,398,398,398,398, -399,399,399,399,399,399,399,399,399,399,114,114,114,395,395,395, -400,400,400,400,400,400,400,400,400,400,401,401,401,401,401,401, -401,401,401,401,401,401,401,401,401,401,401,401,401,401,401,401, -401,401,401,401,401,401,401,401,402,402,402,402,402,402,403,403, - -/* block 55 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -404,404,404,404,404,404,404,404,114,114,114,114,114,114,114,114, -109,109,109, 4,109,109,109,109,109,109,109,109,109,109,109,109, -109,405,109,109,109,109,109,109,109,406,406,406,406,109,406,406, -406,406,405,405,109,406,406,114,109,109,114,114,114,114,114,114, - -/* block 56 */ - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, - 33, 33, 33, 33, 33, 33,122,122,122,122,122,407,106,106,106,106, -106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106, 
-106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106, -106,106,106,106,106,106,106,106,106,106,106,106,106,115,115,115, -115,115,106,106,106,106,115,115,115,115,115, 33, 33, 33, 33, 33, - 33, 33, 33, 33, 33, 33, 33, 33,408,409, 33, 33, 33,410, 33, 33, - -/* block 57 */ - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33,106,106,106,106,106, -106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,106, -106,106,106,106,106,106,106,106,106,106,106,106,106,106,106,115, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,114,114,114,114,114,114,109,109,109,109, - -/* block 58 */ - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, -411,412, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - -/* block 59 */ - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 33, 33, 33, 33, 33,413, 33, 33,414, 33, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - -/* block 60 */ -415,415,415,415,415,415,415,415,416,416,416,416,416,416,416,416, 
-415,415,415,415,415,415,114,114,416,416,416,416,416,416,114,114, -415,415,415,415,415,415,415,415,416,416,416,416,416,416,416,416, -415,415,415,415,415,415,415,415,416,416,416,416,416,416,416,416, -415,415,415,415,415,415,114,114,416,416,416,416,416,416,114,114, -122,415,122,415,122,415,122,415,114,416,114,416,114,416,114,416, -415,415,415,415,415,415,415,415,416,416,416,416,416,416,416,416, -417,417,418,418,418,418,419,419,420,420,421,421,422,422,114,114, - -/* block 61 */ -415,415,415,415,415,415,415,415,423,423,423,423,423,423,423,423, -415,415,415,415,415,415,415,415,423,423,423,423,423,423,423,423, -415,415,415,415,415,415,415,415,423,423,423,423,423,423,423,423, -415,415,122,424,122,114,122,122,416,416,425,425,426,113,427,113, -113,113,122,424,122,114,122,122,428,428,428,428,426,113,113,113, -415,415,122,122,114,114,122,122,416,416,429,429,114,113,113,113, -415,415,122,122,122,163,122,122,416,416,430,430,168,113,113,113, -114,114,122,424,122,114,122,122,431,431,432,432,426,113,113,114, - -/* block 62 */ - 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 22,433,433, 22, 22, - 9, 9, 9, 9, 9, 9, 4, 4, 21, 25, 6, 21, 21, 25, 6, 21, - 4, 4, 4, 4, 4, 4, 4, 4,434,435, 22, 22, 22, 22, 22, 3, - 4, 4, 4, 4, 4, 4, 4, 4, 4, 21, 25, 4, 4, 4, 4, 15, - 15, 4, 4, 4, 8, 6, 7, 4, 4, 4, 4, 4, 4, 4, 4, 4, - 4, 4, 8, 4, 15, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, - 22, 22, 22, 22, 22,436, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - 23,106,114,114, 23, 23, 23, 23, 23, 23, 8, 8, 8, 6, 7,106, - -/* block 63 */ - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 8, 8, 8, 6, 7,114, -106,106,106,106,106,106,106,106,106,106,106,106,106,114,114,114, - 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, - 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -109,109,109,109,109,109,109,109,109,109,109,109,109,380,380,380, -380,109,380,380,380,109,109,109,109,109,109,109,109,109,109,109, -109,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 
64 */ - 19, 19,437, 19, 19, 19, 19,437, 19, 19,438,437,437,437,438,438, -437,437,437,438, 19,437, 19, 19, 8,437,437,437,437,437, 19, 19, - 19, 19, 19, 19,437, 19,439, 19,437, 19,440,441,437,437, 19,438, -437,437,442,437,438,406,406,406,406,438, 19, 19,438,438,437,437, - 8, 8, 8, 8, 8,437,438,438,438,438, 19, 8, 19, 19,443, 19, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, -444,444,444,444,444,444,444,444,444,444,444,444,444,444,444,444, -445,445,445,445,445,445,445,445,445,445,445,445,445,445,445,445, - -/* block 65 */ -446,446,446, 30, 31,446,446,446,446, 23,114,114,114,114,114,114, - 8, 8, 8, 8, 8, 19, 19, 19, 19, 19, 8, 8, 19, 19, 19, 19, - 8, 19, 19, 8, 19, 19, 8, 19, 19, 19, 19, 19, 19, 19, 8, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 8, 8, - 19, 19, 8, 19, 8, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - -/* block 66 */ - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - -/* block 67 */ - 19, 19, 19, 19, 19, 19, 19, 19, 6, 7, 6, 7, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 8, 8, 19, 19, 19, 19, 19, 19, 19, 6, 7, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 8, 19, 19, 
19, - -/* block 68 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 8, 8, 8, 8, - 8, 8, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114, - -/* block 69 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - -/* block 70 */ - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19,447,447,447,447,447,447,447,447,447,447, -447,447,447,447,447,447,447,447,447,447,447,447,447,447,447,447, -448,448,448,448,448,448,448,448,448,448,448,448,448,448,448,448, -448,448,448,448,448,448,448,448,448,448, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - -/* block 71 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 
19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 72 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 8, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 8, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 8, 8, 8, 8, 8, 8, 8, 8, - -/* block 73 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 8, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 74 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 6, 7, 6, 7, 6, 7, 6, 7, - 6, 7, 6, 7, 6, 7, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - -/* block 75 */ - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 
19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 8, 8, 8, 8, 8, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - -/* block 76 */ -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, -449,449,449,449,449,449,449,449,449,449,449,449,449,449,449,449, - -/* block 77 */ - 8, 8, 8, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, - 7, 6, 7, 6, 7, 6, 7, 6, 7, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 6, 7, 6, 7, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 6, 7, 8, 8, - -/* block 78 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, - 8, 8, 8, 8, 8, 19, 19, 8, 8, 8, 8, 8, 8, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19,114,114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 79 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19,114,114, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 
19,114,114,114, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19,114, 19, 19, 19, 19, 19, 19, - 19, 19,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 80 */ -450,450,450,450,450,450,450,450,450,450,450,450,450,450,450,450, -450,450,450,450,450,450,450,450,450,450,450,450,450,450,450,450, -450,450,450,450,450,450,450,450,450,450,450,450,450,450,450,114, -451,451,451,451,451,451,451,451,451,451,451,451,451,451,451,451, -451,451,451,451,451,451,451,451,451,451,451,451,451,451,451,451, -451,451,451,451,451,451,451,451,451,451,451,451,451,451,451,114, - 30, 31,452,453,454,455,456, 30, 31, 30, 31, 30, 31,457,458,459, -460, 33, 30, 31, 33, 30, 31, 33, 33, 33, 33, 33,106,106,461,461, - -/* block 81 */ -159,160,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -159,160,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -159,160,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -159,160,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -159,160,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -159,160,159,160,159,160,159,160,159,160,159,160,159,160,159,160, -159,160,159,160,462,463,463,463,463,463,463,159,160,159,160,464, -464,464,159,160,114,114,114,114,114,465,465,465,465,466,465,465, - -/* block 82 */ -467,467,467,467,467,467,467,467,467,467,467,467,467,467,467,467, -467,467,467,467,467,467,467,467,467,467,467,467,467,467,467,467, -467,467,467,467,467,467,114,467,114,114,114,114,114,467,114,114, -468,468,468,468,468,468,468,468,468,468,468,468,468,468,468,468, -468,468,468,468,468,468,468,468,468,468,468,468,468,468,468,468, -468,468,468,468,468,468,468,468,468,468,468,468,468,468,468,468, -468,468,468,468,468,468,468,468,114,114,114,114,114,114,114,469, -470,114,114,114,114,114,114,114,114,114,114,114,114,114,114,471, - -/* block 83 */ 
-317,317,317,317,317,317,317,317,317,317,317,317,317,317,317,317, -317,317,317,317,317,317,317,114,114,114,114,114,114,114,114,114, -317,317,317,317,317,317,317,114,317,317,317,317,317,317,317,114, -317,317,317,317,317,317,317,114,317,317,317,317,317,317,317,114, -317,317,317,317,317,317,317,114,317,317,317,317,317,317,317,114, -317,317,317,317,317,317,317,114,317,317,317,317,317,317,317,114, -177,177,177,177,177,177,177,177,177,177,177,177,177,177,177,177, -177,177,177,177,177,177,177,177,177,177,177,177,177,177,177,177, - -/* block 84 */ - 4, 4, 21, 25, 21, 25, 4, 4, 4, 21, 25, 4, 21, 25, 4, 4, - 4, 4, 4, 4, 4, 4, 4, 9, 4, 4, 9, 4, 21, 25, 4, 4, - 21, 25, 6, 7, 6, 7, 6, 7, 6, 7, 4, 4, 4, 4, 4,107, - 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 9, 9, 4, 4, 4, 4, - 9, 4, 6,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 85 */ -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,114,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 86 */ -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, 
-472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, - -/* block 87 */ -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,472,472,472,472,472,472,472,472,472,472, -472,472,472,472,472,472,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114, - -/* block 88 */ - 3, 4, 4, 4, 19,473,406,474, 6, 7, 6, 7, 6, 7, 6, 7, - 6, 7, 19, 19, 6, 7, 6, 7, 6, 7, 6, 7, 9, 6, 7, 7, - 19,474,474,474,474,474,474,474,474,474,109,109,109,109,475,475, - 9,107,107,107,107,107, 19, 19,474,474,474,473,406, 4, 19, 19, -114,476,476,476,476,476,476,476,476,476,476,476,476,476,476,476, -476,476,476,476,476,476,476,476,476,476,476,476,476,476,476,476, -476,476,476,476,476,476,476,476,476,476,476,476,476,476,476,476, -476,476,476,476,476,476,476,476,476,476,476,476,476,476,476,476, - -/* block 89 */ -476,476,476,476,476,476,476,476,476,476,476,476,476,476,476,476, -476,476,476,476,476,476,476,114,114,109,109, 14, 14,477,477,476, - 9,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, -478,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, -478,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, -478,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, -478,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, -478,478,478,478,478,478,478,478,478,478,478, 4,107,479,479,478, - -/* block 90 */ -114,114,114,114,114,480,480,480,480,480,480,480,480,480,480,480, -480,480,480,480,480,480,480,480,480,480,480,480,480,480,480,480, -480,480,480,480,480,480,480,480,480,480,480,480,480,480,114,114, 
-114,481,481,481,481,481,481,481,481,481,481,481,481,481,481,481, -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,481, -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,481, -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,481, -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,481, - -/* block 91 */ -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,114, - 19, 19, 23, 23, 23, 23, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, -480,480,480,480,480,480,480,480,480,480,480,480,480,480,480,480, -480,480,480,480,480,480,480,480,480,480,480,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114,114,114, -478,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, - -/* block 92 */ -482,482,482,482,482,482,482,482,482,482,482,482,482,482,482,482, -482,482,482,482,482,482,482,482,482,482,482,482,482,482,482,114, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 23, 23, 23, 23, 23, 23, 23, 23, - 19, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, -482,482,482,482,482,482,482,482,482,482,482,482,482,482,482,482, -482,482,482,482,482,482,482,482,482,482,482,482,482,482,482, 19, - -/* block 93 */ - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,114, - -/* block 94 */ 
-483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483,483,483,483,483,483,483,483,483, -483,483,483,483,483,483,483,483, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 95 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, - -/* block 96 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 97 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,114,114,114, 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 98 */ -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,486,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, - -/* block 99 */ -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, -485,485,485,485,485,485,485,485,485,485,485,485,485,485,485,485, - -/* block 100 */ -485,485,485,485,485,485,485,485,485,485,485,485,485,114,114,114, -487,487,487,487,487,487,487,487,487,487,487,487,487,487,487,487, -487,487,487,487,487,487,487,487,487,487,487,487,487,487,487,487, -487,487,487,487,487,487,487,487,487,487,487,487,487,487,487,487, -487,487,487,487,487,487,487,114,114,114,114,114,114,114,114,114, -488,488,488,488,488,488,488,488,488,488,488,488,488,488,488,488, -488,488,488,488,488,488,488,488,488,488,488,488,488,488,488,488, -488,488,488,488,488,488,488,488,489,489,489,489,489,489,490,490, - -/* block 101 */ -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, 
-491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, - -/* block 102 */ -491,491,491,491,491,491,491,491,491,491,491,491,492,493,493,493, -491,491,491,491,491,491,491,491,491,491,491,491,491,491,491,491, -494,494,494,494,494,494,494,494,494,494,491,491,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,174,175,495,177, -178,178,178,496,177,177,177,177,177,177,177,177,177,177,496,408, - -/* block 103 */ -174,175,174,175,174,175,174,175,174,175,174,175,174,175,174,175, -174,175,174,175,174,175,174,175,174,175,174,175,408,408,114,177, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,498,498,498,498,498,498,498,498,498,498, -499,499,500,500,500,500,500,500,114,114,114,114,114,114,114,114, - -/* block 104 */ - 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, - 14, 14, 14, 14, 14, 14, 14,107,107,107,107,107,107,107,107,107, - 14, 14, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 33, 33, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, -106, 33, 33, 33, 33, 33, 
33, 33, 33, 30, 31, 30, 31,501, 30, 31, - -/* block 105 */ - 30, 31, 30, 31, 30, 31, 30, 31,107, 14, 14, 30, 31,502, 33,114, - 30, 31, 30, 31, 33, 33, 30, 31, 30, 31, 30, 31, 30, 31, 30, 31, - 30, 31, 30, 31, 30, 31, 30, 31, 30, 31,503,504,505,506,114,114, -507,508,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114, 20,106,106, 33, 20, 20, 20, 20, 20, - -/* block 106 */ -509,509,510,509,509,509,510,509,509,509,509,510,509,509,509,509, -509,509,509,509,509,509,509,509,509,509,509,509,509,509,509,509, -509,509,509,511,511,510,510,511,512,512,512,512,114,114,114,114, - 23, 23, 23, 23, 23, 23, 19, 19, 5, 19,114,114,114,114,114,114, -513,513,513,513,513,513,513,513,513,513,513,513,513,513,513,513, -513,513,513,513,513,513,513,513,513,513,513,513,513,513,513,513, -513,513,513,513,513,513,513,513,513,513,513,513,513,513,513,513, -513,513,513,513,514,514,514,514,114,114,114,114,114,114,114,114, - -/* block 107 */ -515,515,516,516,516,516,516,516,516,516,516,516,516,516,516,516, -516,516,516,516,516,516,516,516,516,516,516,516,516,516,516,516, -516,516,516,516,516,516,516,516,516,516,516,516,516,516,516,516, -516,516,516,516,515,515,515,515,515,515,515,515,515,515,515,515, -515,515,515,515,517,114,114,114,114,114,114,114,114,114,518,518, -519,519,519,519,519,519,519,519,519,519,114,114,114,114,114,114, -221,221,221,221,221,221,221,221,221,221,221,221,221,221,221,221, -221,221,223,223,223,223,223,223,225,225,225,223,114,114,114,114, - -/* block 108 */ -520,520,520,520,520,520,520,520,520,520,521,521,521,521,521,521, -521,521,521,521,521,521,521,521,521,521,521,521,521,521,521,521, -521,521,521,521,521,521,522,522,522,522,522,522,522,522, 4,523, -524,524,524,524,524,524,524,524,524,524,524,524,524,524,524,524, 
-524,524,524,524,524,524,524,525,525,525,525,525,525,525,525,525, -525,525,526,526,114,114,114,114,114,114,114,114,114,114,114,527, -314,314,314,314,314,314,314,314,314,314,314,314,314,314,314,314, -314,314,314,314,314,314,314,314,314,314,314,314,314,114,114,114, - -/* block 109 */ -528,528,528,529,530,530,530,530,530,530,530,530,530,530,530,530, -530,530,530,530,530,530,530,530,530,530,530,530,530,530,530,530, -530,530,530,530,530,530,530,530,530,530,530,530,530,530,530,530, -530,530,530,528,529,529,528,528,528,528,529,529,528,529,529,529, -529,531,531,531,531,531,531,531,531,531,531,531,531,531,114,107, -532,532,532,532,532,532,532,532,532,532,114,114,114,114,531,531, -304,304,304,304,304,306,533,304,304,304,304,304,304,304,304,304, -308,308,308,308,308,308,308,308,308,308,304,304,304,304,304,114, - -/* block 110 */ -534,534,534,534,534,534,534,534,534,534,534,534,534,534,534,534, -534,534,534,534,534,534,534,534,534,534,534,534,534,534,534,534, -534,534,534,534,534,534,534,534,534,535,535,535,535,535,535,536, -536,535,535,536,536,535,535,114,114,114,114,114,114,114,114,114, -534,534,534,535,534,534,534,534,534,534,534,534,535,536,114,114, -537,537,537,537,537,537,537,537,537,537,114,114,538,538,538,538, -304,304,304,304,304,304,304,304,304,304,304,304,304,304,304,304, -533,304,304,304,304,304,304,310,310,310,304,305,306,305,304,304, - -/* block 111 */ -539,539,539,539,539,539,539,539,539,539,539,539,539,539,539,539, -539,539,539,539,539,539,539,539,539,539,539,539,539,539,539,539, -539,539,539,539,539,539,539,539,539,539,539,539,539,539,539,539, -540,539,540,540,540,539,539,540,540,539,539,539,539,539,540,540, -539,540,539,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,539,539,541,542,542, -543,543,543,543,543,543,543,543,543,543,543,544,545,545,544,544, -546,546,543,547,547,544,545,114,114,114,114,114,114,114,114,114, - -/* block 112 */ -114,317,317,317,317,317,317,114,114,317,317,317,317,317,317,114, 
-114,317,317,317,317,317,317,114,114,114,114,114,114,114,114,114, -317,317,317,317,317,317,317,114,317,317,317,317,317,317,317,114, - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, - 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 14,106,106,106,106, -114,114,114,114, 33,122,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 113 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -543,543,543,543,543,543,543,543,543,543,543,543,543,543,543,543, -543,543,543,543,543,543,543,543,543,543,543,543,543,543,543,543, -543,543,543,544,544,545,544,544,545,544,544,546,544,545,114,114, -548,548,548,548,548,548,548,548,548,548,114,114,114,114,114,114, - -/* block 114 */ -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, - -/* block 115 */ -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, 
-549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, - -/* block 116 */ -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, - -/* block 117 */ -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, - -/* block 118 */ -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, - -/* block 119 */ -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, 
-550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, - -/* block 120 */ -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -549,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,549,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,549,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, - -/* block 121 */ -550,550,550,550,550,550,550,550,549,550,550,550,550,550,550,550, -550,550,550,550,550,550,550,550,550,550,550,550,550,550,550,550, -550,550,550,550,114,114,114,114,114,114,114,114,114,114,114,114, -315,315,315,315,315,315,315,315,315,315,315,315,315,315,315,315, -315,315,315,315,315,315,315,114,114,114,114,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,316,316,316,316, -316,316,316,316,316,316,316,316,316,316,316,316,114,114,114,114, - -/* block 122 */ -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, -551,551,551,551,551,551,551,551,551,551,551,551,551,551,551,551, - -/* block 123 */ 
-552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, - -/* block 124 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,114,114, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, - -/* block 125 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 126 */ - 33, 33, 33, 33, 33, 33, 33,114,114,114,114,114,114,114,114,114, -114,114,114,185,185,185,185,185,114,114,114,114,114,192,189,192, -192,192,192,192,192,192,192,192,192,553,192,192,192,192,192,192, -192,192,192,192,192,192,192,114,192,192,192,192,192,114,192,114, -192,192,114,192,192,114,192,192,192,192,192,192,192,192,192,192, 
-199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, - -/* block 127 */ -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,554,554,554,554,554,554,554,554,554,554,554,554,554,554, -554,554,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, - -/* block 128 */ -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, - -/* block 129 */ -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199, 7, 6, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, - -/* block 130 */ -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -114,114,199,199,199,199,199,199,199,199,199,199,199,199,199,199, 
-199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -199,199,199,199,199,199,199,199,199,199,199,199,196,197,114,114, - -/* block 131 */ -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, - 4, 4, 4, 4, 4, 4, 4, 6, 7, 4,114,114,114,114,114,114, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,114,114, - 4, 9, 9, 15, 15, 6, 7, 6, 7, 6, 7, 6, 7, 6, 7, 6, - 7, 6, 7, 6, 7, 4, 4, 6, 7, 4, 4, 4, 4, 15, 15, 15, - 4, 4, 4,114, 4, 4, 4, 4, 9, 6, 7, 6, 7, 6, 7, 4, - 4, 4, 8, 9, 8, 8, 8,114, 4, 5, 4, 4,114,114,114,114, -199,199,199,199,199,114,199,199,199,199,199,199,199,199,199,199, - -/* block 132 */ -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,114,114, 22, - -/* block 133 */ -114, 4, 4, 4, 5, 4, 4, 4, 6, 7, 4, 8, 4, 9, 4, 4, - 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 4, 4, 8, 8, 8, 4, - 4, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, - 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 6, 4, 7, 14, 15, - 14, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, - 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 6, 8, 7, 8, 6, - 7, 4, 6, 7, 4, 4,478,478,478,478,478,478,478,478,478,478, -107,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, - -/* block 134 */ 
-478,478,478,478,478,478,478,478,478,478,478,478,478,478,478,478, -478,478,478,478,478,478,478,478,478,478,478,478,478,478,555,555, -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,481, -481,481,481,481,481,481,481,481,481,481,481,481,481,481,481,114, -114,114,481,481,481,481,481,481,114,114,481,481,481,481,481,481, -114,114,481,481,481,481,481,481,114,114,481,481,481,114,114,114, - 5, 5, 8, 14, 19, 5, 5,114, 19, 8, 8, 8, 8, 19, 19,114, -436,436,436,436,436,436,436,436,436, 22, 22, 22, 19, 19,114,114, - -/* block 135 */ -556,556,556,556,556,556,556,556,556,556,556,556,114,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,114,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,114,556,556,114,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,114,114, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 136 */ -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,556,556,556,556,556, -556,556,556,556,556,556,556,556,556,556,556,114,114,114,114,114, - -/* block 137 */ - 4, 4, 4,114,114,114,114, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23,114,114,114, 19, 19, 19, 19, 19, 19, 19, 19, 19, -557,557,557,557,557,557,557,557,557,557,557,557,557,557,557,557, 
-557,557,557,557,557,557,557,557,557,557,557,557,557,557,557,557, -557,557,557,557,557,557,557,557,557,557,557,557,557,557,557,557, -557,557,557,557,557,558,558,558,558,559,559,559,559,559,559,559, - -/* block 138 */ -559,559,559,559,559,559,559,559,559,559,558,558,559,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114, -559,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,109,114,114, - -/* block 139 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 140 */ -560,560,560,560,560,560,560,560,560,560,560,560,560,560,560,560, -560,560,560,560,560,560,560,560,560,560,560,560,560,114,114,114, -561,561,561,561,561,561,561,561,561,561,561,561,561,561,561,561, -561,561,561,561,561,561,561,561,561,561,561,561,561,561,561,561, -561,561,561,561,561,561,561,561,561,561,561,561,561,561,561,561, -561,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -109, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,114,114,114,114, - -/* block 141 */ -562,562,562,562,562,562,562,562,562,562,562,562,562,562,562,562, -562,562,562,562,562,562,562,562,562,562,562,562,562,562,562,562, 
-563,563,563,563,114,114,114,114,114,114,114,114,114,114,114,114, -564,564,564,564,564,564,564,564,564,564,564,564,564,564,564,564, -564,565,564,564,564,564,564,564,564,564,565,114,114,114,114,114, -566,566,566,566,566,566,566,566,566,566,566,566,566,566,566,566, -566,566,566,566,566,566,566,566,566,566,566,566,566,566,566,566, -566,566,566,566,566,566,567,567,567,567,567,114,114,114,114,114, - -/* block 142 */ -568,568,568,568,568,568,568,568,568,568,568,568,568,568,568,568, -568,568,568,568,568,568,568,568,568,568,568,568,568,568,114,569, -570,570,570,570,570,570,570,570,570,570,570,570,570,570,570,570, -570,570,570,570,570,570,570,570,570,570,570,570,570,570,570,570, -570,570,570,570,114,114,114,114,570,570,570,570,570,570,570,570, -571,572,572,572,572,572,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 143 */ -573,573,573,573,573,573,573,573,573,573,573,573,573,573,573,573, -573,573,573,573,573,573,573,573,573,573,573,573,573,573,573,573, -573,573,573,573,573,573,573,573,574,574,574,574,574,574,574,574, -574,574,574,574,574,574,574,574,574,574,574,574,574,574,574,574, -574,574,574,574,574,574,574,574,574,574,574,574,574,574,574,574, -575,575,575,575,575,575,575,575,575,575,575,575,575,575,575,575, -575,575,575,575,575,575,575,575,575,575,575,575,575,575,575,575, -575,575,575,575,575,575,575,575,575,575,575,575,575,575,575,575, - -/* block 144 */ -576,576,576,576,576,576,576,576,576,576,576,576,576,576,576,576, -576,576,576,576,576,576,576,576,576,576,576,576,576,576,114,114, -577,577,577,577,577,577,577,577,577,577,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 145 */ -578,578,578,578,578,578,578,578,578,578,578,578,578,578,578,578, -578,578,578,578,578,578,578,578,578,578,578,578,578,578,578,578, -578,578,578,578,578,578,578,578,114,114,114,114,114,114,114,114, -579,579,579,579,579,579,579,579,579,579,579,579,579,579,579,579, -579,579,579,579,579,579,579,579,579,579,579,579,579,579,579,579, -579,579,579,579,579,579,579,579,579,579,579,579,579,579,579,579, -579,579,579,579,114,114,114,114,114,114,114,114,114,114,114,580, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 146 */ -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, - -/* block 147 */ -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,581,114,114,114,114,114,114,114,114,114, -581,581,581,581,581,581,581,581,581,581,581,581,581,581,581,581, -581,581,581,581,581,581,114,114,114,114,114,114,114,114,114,114, -581,581,581,581,581,581,581,581,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 148 */ -582,582,582,582,582,582,114,114,582,114,582,582,582,582,582,582, -582,582,582,582,582,582,582,582,582,582,582,582,582,582,582,582, -582,582,582,582,582,582,582,582,582,582,582,582,582,582,582,582, -582,582,582,582,582,582,114,582,582,114,114,114,582,114,114,582, 
-583,583,583,583,583,583,583,583,583,583,583,583,583,583,583,583, -583,583,583,583,583,583,114,584,585,585,585,585,585,585,585,585, -586,586,586,586,586,586,586,586,586,586,586,586,586,586,586,586, -586,586,586,586,586,586,586,587,587,588,588,588,588,588,588,588, - -/* block 149 */ -589,589,589,589,589,589,589,589,589,589,589,589,589,589,589,589, -589,589,589,589,589,589,589,589,589,589,589,589,589,589,589,114, -114,114,114,114,114,114,114,590,590,590,590,590,590,590,590,590, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 150 */ -591,591,591,591,591,591,591,591,591,591,591,591,591,591,591,591, -591,591,591,591,591,591,592,592,592,592,592,592,114,114,114,593, -594,594,594,594,594,594,594,594,594,594,594,594,594,594,594,594, -594,594,594,594,594,594,594,594,594,594,114,114,114,114,114,595, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 151 */ -596,596,596,596,596,596,596,596,596,596,596,596,596,596,596,596, -596,596,596,596,596,596,596,596,596,596,596,596,596,596,596,596, -597,597,597,597,597,597,597,597,597,597,597,597,597,597,597,597, -597,597,597,597,597,597,597,597,114,114,114,114,114,114,597,597, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 152 */ -598,599,599,599,114,599,599,114,114,114,114,114,599,599,599,599, 
-598,598,598,598,114,598,598,598,114,598,598,598,598,598,598,598, -598,598,598,598,598,598,598,598,598,598,598,598,598,598,598,598, -598,598,598,598,114,114,114,114,599,599,599,114,114,114,114,599, -600,600,600,600,600,600,600,600,114,114,114,114,114,114,114,114, -601,601,601,601,601,601,601,601,601,114,114,114,114,114,114,114, -602,602,602,602,602,602,602,602,602,602,602,602,602,602,602,602, -602,602,602,602,602,602,602,602,602,602,602,602,602,603,603,604, - -/* block 153 */ -605,605,605,605,605,605,605,605,605,605,605,605,605,605,605,605, -605,605,605,605,605,605,605,605,605,605,605,605,605,606,606,606, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -607,607,607,607,607,607,607,607,608,607,607,607,607,607,607,607, -607,607,607,607,607,607,607,607,607,607,607,607,607,607,607,607, -607,607,607,607,607,609,609,114,114,114,114,610,610,610,610,610, -611,611,611,611,611,611,611,114,114,114,114,114,114,114,114,114, - -/* block 154 */ -612,612,612,612,612,612,612,612,612,612,612,612,612,612,612,612, -612,612,612,612,612,612,612,612,612,612,612,612,612,612,612,612, -612,612,612,612,612,612,612,612,612,612,612,612,612,612,612,612, -612,612,612,612,612,612,114,114,114,613,613,613,613,613,613,613, -614,614,614,614,614,614,614,614,614,614,614,614,614,614,614,614, -614,614,614,614,614,614,114,114,615,615,615,615,615,615,615,615, -616,616,616,616,616,616,616,616,616,616,616,616,616,616,616,616, -616,616,616,114,114,114,114,114,617,617,617,617,617,617,617,617, - -/* block 155 */ -618,618,618,618,618,618,618,618,618,618,618,618,618,618,618,618, -618,618,114,114,114,114,114,114,114,619,619,619,619,114,114,114, -114,114,114,114,114,114,114,114,114,620,620,620,620,620,620,620, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 156 */ -621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621, -621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621, -621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621, -621,621,621,621,621,621,621,621,621,621,621,621,621,621,621,621, -621,621,621,621,621,621,621,621,621,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 157 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -622,622,622,622,622,622,622,622,622,622,622,622,622,622,622,622, -622,622,622,622,622,622,622,622,622,622,622,622,622,622,622,114, - -/* block 158 */ -623,624,623,625,625,625,625,625,625,625,625,625,625,625,625,625, -625,625,625,625,625,625,625,625,625,625,625,625,625,625,625,625, -625,625,625,625,625,625,625,625,625,625,625,625,625,625,625,625, -625,625,625,625,625,625,625,625,624,624,624,624,624,624,624,624, -624,624,624,624,624,624,624,626,626,626,626,626,626,626,114,114, -114,114,627,627,627,627,627,627,627,627,627,627,627,627,627,627, -627,627,627,627,627,627,628,628,628,628,628,628,628,628,628,628, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,624, - -/* block 159 */ -629,629,630,631,631,631,631,631,631,631,631,631,631,631,631,631, -631,631,631,631,631,631,631,631,631,631,631,631,631,631,631,631, -631,631,631,631,631,631,631,631,631,631,631,631,631,631,631,631, 
-630,630,630,629,629,629,629,630,630,629,629,632,632,633,632,632, -632,632,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -634,634,634,634,634,634,634,634,634,634,634,634,634,634,634,634, -634,634,634,634,634,634,634,634,634,114,114,114,114,114,114,114, -635,635,635,635,635,635,635,635,635,635,114,114,114,114,114,114, - -/* block 160 */ -636,636,636,637,637,637,637,637,637,637,637,637,637,637,637,637, -637,637,637,637,637,637,637,637,637,637,637,637,637,637,637,637, -637,637,637,637,637,637,637,636,636,636,636,636,638,636,636,636, -636,636,636,636,636,114,639,639,639,639,639,639,639,639,639,639, -640,640,640,640,114,114,114,114,114,114,114,114,114,114,114,114, -641,641,641,641,641,641,641,641,641,641,641,641,641,641,641,641, -641,641,641,641,641,641,641,641,641,641,641,641,641,641,641,641, -641,641,641,642,643,643,641,114,114,114,114,114,114,114,114,114, - -/* block 161 */ -644,644,645,646,646,646,646,646,646,646,646,646,646,646,646,646, -646,646,646,646,646,646,646,646,646,646,646,646,646,646,646,646, -646,646,646,646,646,646,646,646,646,646,646,646,646,646,646,646, -646,646,646,645,645,645,644,644,644,644,644,644,644,644,644,645, -645,646,646,646,646,647,647,647,647,114,114,114,114,647,114,114, -648,648,648,648,648,648,648,648,648,648,646,114,114,114,114,114, -114,649,649,649,649,649,649,649,649,649,649,649,649,649,649,649, -649,649,649,649,649,114,114,114,114,114,114,114,114,114,114,114, - -/* block 162 */ -650,650,650,650,650,650,650,650,650,650,650,650,650,650,650,650, -650,650,114,650,650,650,650,650,650,650,650,650,650,650,650,650, -650,650,650,650,650,650,650,650,650,650,650,650,651,651,651,652, -652,652,651,651,652,651,652,652,653,653,653,653,653,653,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 163 */ 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -654,654,654,654,654,654,654,654,654,654,654,654,654,654,654,654, -654,654,654,654,654,654,654,654,654,654,654,654,654,654,654,654, -654,654,654,654,654,654,654,654,654,654,654,654,654,654,654,655, -656,656,656,655,655,655,655,655,655,655,655,114,114,114,114,114, -657,657,657,657,657,657,657,657,657,657,114,114,114,114,114,114, - -/* block 164 */ -114,658,659,659,114,660,660,660,660,660,660,660,660,114,114,660, -660,114,114,660,660,660,660,660,660,660,660,660,660,660,660,660, -660,660,660,660,660,660,660,660,660,114,660,660,660,660,660,660, -660,114,660,660,114,660,660,660,660,660,114,114,658,660,661,659, -658,659,659,659,659,114,114,659,659,114,114,659,659,659,114,114, -114,114,114,114,114,114,114,661,114,114,114,114,114,660,660,660, -660,660,659,659,114,114,658,658,658,658,658,658,658,114,114,114, -658,658,658,658,658,114,114,114,114,114,114,114,114,114,114,114, - -/* block 165 */ -662,662,662,662,662,662,662,662,662,662,662,662,662,662,662,662, -662,662,662,662,662,662,662,662,662,662,662,662,662,662,662,662, -662,662,662,662,662,662,662,662,662,662,662,662,662,662,662,662, -663,664,664,665,665,665,665,665,665,664,665,664,664,663,664,665, -665,664,665,665,662,662,666,662,114,114,114,114,114,114,114,114, -667,667,667,667,667,667,667,667,667,667,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 166 */ -668,668,668,668,668,668,668,668,668,668,668,668,668,668,668,668, -668,668,668,668,668,668,668,668,668,668,668,668,668,668,668,668, -668,668,668,668,668,668,668,668,668,668,668,668,668,668,668,669, -670,670,671,671,671,671,114,114,670,670,670,670,671,671,670,671, -671,672,672,672,672,672,672,672,672,672,114,114,114,114,114,114, 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 167 */ -673,673,673,673,673,673,673,673,673,673,673,673,673,673,673,673, -673,673,673,673,673,673,673,673,673,673,673,673,673,673,673,673, -673,673,673,673,673,673,673,673,673,673,673,673,673,673,673,673, -674,674,674,675,675,675,675,675,675,675,675,674,674,675,674,675, -675,676,676,676,673,114,114,114,114,114,114,114,114,114,114,114, -677,677,677,677,677,677,677,677,677,677,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 168 */ -678,678,678,678,678,678,678,678,678,678,678,678,678,678,678,678, -678,678,678,678,678,678,678,678,678,678,678,678,678,678,678,678, -678,678,678,678,678,678,678,678,678,678,678,679,680,679,680,680, -679,679,679,679,679,679,680,679,114,114,114,114,114,114,114,114, -681,681,681,681,681,681,681,681,681,681,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 169 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -682,682,682,682,682,682,682,682,682,682,682,682,682,682,682,682, -682,682,682,682,682,682,682,682,682,682,682,682,682,682,682,682, -683,683,683,683,683,683,683,683,683,683,683,683,683,683,683,683, -683,683,683,683,683,683,683,683,683,683,683,683,683,683,683,683, -684,684,684,684,684,684,684,684,684,684,685,685,685,685,685,685, -685,685,685,114,114,114,114,114,114,114,114,114,114,114,114,686, - -/* block 170 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -687,687,687,687,687,687,687,687,687,687,687,687,687,687,687,687, -687,687,687,687,687,687,687,687,687,687,687,687,687,687,687,687, -687,687,687,687,687,687,687,687,687,687,687,687,687,687,687,687, -687,687,687,687,687,687,687,687,687,114,114,114,114,114,114,114, - -/* block 171 */ -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, - -/* block 172 */ -688,688,688,688,688,688,688,688,688,688,688,688,688,688,688,688, -688,688,688,688,688,688,688,688,688,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 173 */ -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,689, -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,689, -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,689, -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,689, -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,689, -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,689, -689,689,689,689,689,689,689,689,689,689,689,689,689,689,689,114, 
-690,690,690,690,690,114,114,114,114,114,114,114,114,114,114,114, - -/* block 174 */ -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, - -/* block 175 */ -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,691, -691,691,691,691,691,691,691,691,691,691,691,691,691,691,691,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 176 */ -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, - -/* block 177 */ -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,497,497,497,497,497,497,497, -497,497,497,497,497,497,497,497,497,114,114,114,114,114,114,114, 
-692,692,692,692,692,692,692,692,692,692,692,692,692,692,692,692, -692,692,692,692,692,692,692,692,692,692,692,692,692,692,692,114, -693,693,693,693,693,693,693,693,693,693,114,114,114,114,694,694, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 178 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -695,695,695,695,695,695,695,695,695,695,695,695,695,695,695,695, -695,695,695,695,695,695,695,695,695,695,695,695,695,695,114,114, -696,696,696,696,696,697,114,114,114,114,114,114,114,114,114,114, - -/* block 179 */ -698,698,698,698,698,698,698,698,698,698,698,698,698,698,698,698, -698,698,698,698,698,698,698,698,698,698,698,698,698,698,698,698, -698,698,698,698,698,698,698,698,698,698,698,698,698,698,698,698, -699,699,699,699,699,699,699,700,700,700,700,700,701,701,701,701, -702,702,702,702,700,701,114,114,114,114,114,114,114,114,114,114, -703,703,703,703,703,703,703,703,703,703,114,704,704,704,704,704, -704,704,114,698,698,698,698,698,698,698,698,698,698,698,698,698, -698,698,698,698,698,698,698,698,114,114,114,114,114,698,698,698, - -/* block 180 */ -698,698,698,698,698,698,698,698,698,698,698,698,698,698,698,698, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 181 */ -705,705,705,705,705,705,705,705,705,705,705,705,705,705,705,705, 
-705,705,705,705,705,705,705,705,705,705,705,705,705,705,705,705, -705,705,705,705,705,705,705,705,705,705,705,705,705,705,705,705, -705,705,705,705,705,705,705,705,705,705,705,705,705,705,705,705, -705,705,705,705,705,114,114,114,114,114,114,114,114,114,114,114, -705,706,706,706,706,706,706,706,706,706,706,706,706,706,706,706, -706,706,706,706,706,706,706,706,706,706,706,706,706,706,706,706, -706,706,706,706,706,706,706,706,706,706,706,706,706,706,706,114, - -/* block 182 */ -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,707, -707,707,707,708,708,708,708,708,708,708,708,708,708,708,708,708, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 183 */ -478,476,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 184 */ -709,709,709,709,709,709,709,709,709,709,709,709,709,709,709,709, -709,709,709,709,709,709,709,709,709,709,709,709,709,709,709,709, -709,709,709,709,709,709,709,709,709,709,709,709,709,709,709,709, -709,709,709,709,709,709,709,709,709,709,709,709,709,709,709,709, -709,709,709,709,709,709,709,709,709,709,709,709,709,709,709,709, -709,709,709,709,709,709,709,709,709,709,709,709,709,709,709,709, 
-709,709,709,709,709,709,709,709,709,709,709,114,114,114,114,114, -709,709,709,709,709,709,709,709,709,709,709,709,709,114,114,114, - -/* block 185 */ -709,709,709,709,709,709,709,709,709,114,114,114,114,114,114,114, -709,709,709,709,709,709,709,709,709,709,114,114,710,711,711,712, - 22, 22, 22, 22,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 186 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114, - -/* block 187 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19,114,114, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19,713,405,109,109,109, 19, 19, 19,405,713,713, -713,713,713, 22, 22, 22, 22, 22, 22, 22, 22,109,109,109,109,109, - -/* block 188 */ -109,109,109, 19, 19,109,109,109,109,109,109,109, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,109,109,109,109, 19, 19, - 19, 19, 
19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 189 */ -559,559,559,559,559,559,559,559,559,559,559,559,559,559,559,559, -559,559,559,559,559,559,559,559,559,559,559,559,559,559,559,559, -559,559,559,559,559,559,559,559,559,559,559,559,559,559,559,559, -559,559,559,559,559,559,559,559,559,559,559,559,559,559,559,559, -559,559,714,714,714,559,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 190 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114,114, - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, - 23, 23,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 191 */ -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,438,438, -438,438,438,438,438,114,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, - -/* block 192 */ 
-437,437,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,437,114,437,437, -114,114,437,114,114,437,437,114,114,437,437,437,437,114,437,437, -437,437,437,437,437,437,438,438,438,438,114,438,114,438,438,438, -438,438,438,438,114,438,438,438,438,438,438,438,438,438,438,438, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, - -/* block 193 */ -438,438,438,438,437,437,114,437,437,437,437,114,114,437,437,437, -437,437,437,437,437,114,437,437,437,437,437,437,437,114,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,437,437,114,437,437,437,437,114, -437,437,437,437,437,114,437,114,114,114,437,437,437,437,437,437, -437,114,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, - -/* block 194 */ -437,437,437,437,437,437,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, - -/* block 195 */ -438,438,438,438,438,438,438,438,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, 
-437,437,437,437,437,437,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, - -/* block 196 */ -437,437,437,437,437,437,437,437,437,437,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,114,114,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437, 8,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438, 8,438,438,438,438, -438,438,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437, 8,438,438,438,438, - -/* block 197 */ -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438, 8,438,438,438,438,438,438,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437, 8,438,438,438,438,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, 8, -438,438,438,438,438,438,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, 8, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, - -/* block 198 */ -438,438,438,438,438,438,438,438,438, 8,438,438,438,438,438,438, -437,437,437,437,437,437,437,437,437,437,437,437,437,437,437,437, -437,437,437,437,437,437,437,437,437, 8,438,438,438,438,438,438, -438,438,438,438,438,438,438,438,438,438,438,438,438,438,438,438, -438,438,438, 8,438,438,438,438,438,438,437,438,114,114, 10, 10, - 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, - 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, - 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, - -/* block 199 */ -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, 
-715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, - -/* block 200 */ -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,715,715,715,715,715,715,715,715,715,715,715, -715,715,715,715,715,114,114,716,716,716,716,716,716,716,716,716, -717,717,717,717,717,717,717,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 201 */ -199,199,199,199,114,199,199,199,199,199,199,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,199,199,199,199, -114,199,199,114,199,114,114,199,114,199,199,199,199,199,199,199, -199,199,199,114,199,199,199,199,114,199,114,199,114,114,114,114, -114,114,199,114,114,114,114,199,114,199,114,199,114,199,199,199, -114,199,199,114,199,114,114,199,114,199,114,199,114,199,114,199, -114,199,199,114,199,114,114,199,199,199,199,114,199,199,199,199, -199,199,199,114,199,199,199,199,114,199,199,199,199,114,199,114, - -/* block 202 */ -199,199,199,199,199,199,199,199,199,199,114,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,114,114,114,114, -114,199,199,199,114,199,199,199,199,199,114,199,199,199,199,199, -199,199,199,199,199,199,199,199,199,199,199,199,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, 
-194,194,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 203 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 204 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114, -114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, -114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, -114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114, - -/* block 205 */ - 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 206 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, 
-114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,718,718,718,718,718,718,718,718,718,718, -718,718,718,718,718,718,718,718,718,718,718,718,718,718,718,718, - -/* block 207 */ -719, 19, 19,114,114,114,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114, - 19, 19,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 208 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114, - -/* block 209 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114, -114,114,114,114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114, - -/* block 210 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 
19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114, - -/* block 211 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114, 19, 19, 19, 19, 19, - -/* block 212 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19,114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 213 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19,114,114, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 
19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 214 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114, - 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 215 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 216 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 217 */ - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 
19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - -/* block 218 */ - 19, 19, 19, 19, 19, 19, 19, 19,114,114,114,114,114,114,114,114, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, - 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 219 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 220 */ -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,114,114,114,114,114,114,114,114,114,114,114, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, - -/* block 221 */ 
-484,484,484,484,484,484,484,484,484,484,484,484,484,484,484,484, -484,484,484,484,484,484,484,484,484,484,484,484,484,484,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, -114,114,114,114,114,114,114,114,114,114,114,114,114,114,114,114, - -/* block 222 */ -436, 22,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, - 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, 22, - -/* block 223 */ -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, - -/* block 224 */ -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, 
-109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, - -/* block 225 */ -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -109,109,109,109,109,109,109,109,109,109,109,109,109,109,109,109, -436,436,436,436,436,436,436,436,436,436,436,436,436,436,436,436, - -/* block 226 */ -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,552,552, -552,552,552,552,552,552,552,552,552,552,552,552,552,552,114,114, - -}; - -#if UCD_BLOCK_SIZE != 128 -#error Please correct UCD_BLOCK_SIZE in pcre_internal.h -#endif -#endif /* SUPPORT_UCP */ - -#endif /* PCRE_INCLUDED */ diff --git a/src/pcre/pcre_version.c b/src/pcre/pcre_version.c deleted file mode 100644 index ae86ff28..00000000 --- a/src/pcre/pcre_version.c +++ /dev/null @@ -1,98 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. 
- - Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module contains the external function pcre_version(), which returns a -string that identifies the PCRE version that is in use. 
*/ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include "pcre_internal.h" - - -/************************************************* -* Return version string * -*************************************************/ - -/* These macros are the standard way of turning unquoted text into C strings. -They allow macros like PCRE_MAJOR to be defined without quotes, which is -convenient for user programs that want to test its value. */ - -#define STRING(a) # a -#define XSTRING(s) STRING(s) - -/* A problem turned up with PCRE_PRERELEASE, which is defined empty for -production releases. Originally, it was used naively in this code: - - return XSTRING(PCRE_MAJOR) - "." XSTRING(PCRE_MINOR) - XSTRING(PCRE_PRERELEASE) - " " XSTRING(PCRE_DATE); - -However, when PCRE_PRERELEASE is empty, this leads to an attempted expansion of -STRING(). The C standard states: "If (before argument substitution) any -argument consists of no preprocessing tokens, the behavior is undefined." It -turns out the gcc treats this case as a single empty string - which is what we -really want - but Visual C grumbles about the lack of an argument for the -macro. Unfortunately, both are within their rights. To cope with both ways of -handling this, I had resort to some messy hackery that does a test at run time. -I could find no way of detecting that a macro is defined as an empty string at -pre-processor time. This hack uses a standard trick for avoiding calling -the STRING macro with an empty argument when doing the test. */ - -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN const char * PCRE_CALL_CONVENTION -pcre_version(void) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN const char * PCRE_CALL_CONVENTION -pcre16_version(void) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN const char * PCRE_CALL_CONVENTION -pcre32_version(void) -#endif -{ -return (XSTRING(Z PCRE_PRERELEASE)[1] == 0)? 
- XSTRING(PCRE_MAJOR.PCRE_MINOR PCRE_DATE) : - XSTRING(PCRE_MAJOR.PCRE_MINOR) XSTRING(PCRE_PRERELEASE PCRE_DATE); -} - -/* End of pcre_version.c */ diff --git a/src/pcre/pcrecpp.cc b/src/pcre/pcrecpp.cc deleted file mode 100644 index 77a2fedc..00000000 --- a/src/pcre/pcrecpp.cc +++ /dev/null @@ -1,984 +0,0 @@ -// Copyright (c) 2010, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-// -// Author: Sanjay Ghemawat - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include <stdlib.h> -#include <stdio.h> -#include <ctype.h> -#include <limits.h> /* for SHRT_MIN, USHRT_MAX, etc */ -#include <string.h> /* for memcpy */ -#include <assert.h> -#include <errno.h> -#include <string> -#include <algorithm> - -#include "pcrecpp_internal.h" -#include "pcre.h" -#include "pcrecpp.h" -#include "pcre_stringpiece.h" - - -namespace pcrecpp { - -// Maximum number of args we can set -static const int kMaxArgs = 16; -static const int kVecSize = (1 + kMaxArgs) * 3; // results + PCRE workspace - -// Special object that stands-in for no argument -Arg RE::no_arg((void*)NULL); - -// This is for ABI compatibility with old versions of pcre (pre-7.6), -// which defined a global no_arg variable instead of putting it in the -// RE class. This works on GCC >= 3, at least. It definitely works -// for ELF, but may not for other object formats (Mach-O, for -// instance, does not support aliases.) We could probably have a more -// inclusive test if we ever needed it. (Note that not only the -// __attribute__ syntax, but also __USER_LABEL_PREFIX__, are -// gnu-specific.) -#if defined(__GNUC__) && __GNUC__ >= 3 && defined(__ELF__) && !defined(__INTEL_COMPILER) -# define ULP_AS_STRING(x) ULP_AS_STRING_INTERNAL(x) -# define ULP_AS_STRING_INTERNAL(x) #x -# define USER_LABEL_PREFIX_STR ULP_AS_STRING(__USER_LABEL_PREFIX__) -extern Arg no_arg - __attribute__((alias(USER_LABEL_PREFIX_STR "_ZN7pcrecpp2RE6no_argE"))); -#endif - -// If a regular expression has no error, its error_ field points here -static const string empty_string; - -// If the user doesn't ask for any options, we just use this one -static RE_Options default_options; - -// Specials for the start of patterns. See comments where start_options is used -// below.
(PH June 2018) -static const char *start_options[] = { - "(*UTF8)", - "(*UTF)", - "(*UCP)", - "(*NO_START_OPT)", - "(*NO_AUTO_POSSESS)", - "(*LIMIT_RECURSION=", - "(*LIMIT_MATCH=", - "(*CRLF)", - "(*CR)", - "(*BSR_UNICODE)", - "(*BSR_ANYCRLF)", - "(*ANYCRLF)", - "(*ANY)", - "" }; - -void RE::Init(const string& pat, const RE_Options* options) { - pattern_ = pat; - if (options == NULL) { - options_ = default_options; - } else { - options_ = *options; - } - error_ = &empty_string; - re_full_ = NULL; - re_partial_ = NULL; - - re_partial_ = Compile(UNANCHORED); - if (re_partial_ != NULL) { - re_full_ = Compile(ANCHOR_BOTH); - } -} - -void RE::Cleanup() { - if (re_full_ != NULL) (*pcre_free)(re_full_); - if (re_partial_ != NULL) (*pcre_free)(re_partial_); - if (error_ != &empty_string) delete error_; -} - - -RE::~RE() { - Cleanup(); -} - - -pcre* RE::Compile(Anchor anchor) { - // First, convert RE_Options into pcre options - int pcre_options = 0; - pcre_options = options_.all_options(); - - // Special treatment for anchoring. This is needed because at - // runtime pcre only provides an option for anchoring at the - // beginning of a string (unless you use offset). - // - // There are three types of anchoring we want: - // UNANCHORED Compile the original pattern, and use - // a pcre unanchored match. - // ANCHOR_START Compile the original pattern, and use - // a pcre anchored match. - // ANCHOR_BOTH Tack a "\z" to the end of the original pattern - // and use a pcre anchored match. - - const char* compile_error; - int eoffset; - pcre* re; - if (anchor != ANCHOR_BOTH) { - re = pcre_compile(pattern_.c_str(), pcre_options, - &compile_error, &eoffset, NULL); - } else { - // Tack a '\z' at the end of RE. Parenthesize it first so that - // the '\z' applies to all top-level alternatives in the regexp. - - /* When this code was written (for PCRE 6.0) it was enough just to - parenthesize the entire pattern. 
Unfortunately, when the feature of - starting patterns with (*UTF8) or (*CR) etc. was added to PCRE patterns, - this code was never updated. This bug was not noticed till 2018, long after - PCRE became obsolescent and its maintainer no longer around. Since PCRE is - frozen, I have added a hack to check for all the existing "start of - pattern" specials - knowing that no new ones will ever be added. I am not a - C++ programmer, so the code style is no doubt crude. It is also - inefficient, but is only run when the pattern starts with "(*". - PH June 2018. */ - - string wrapped = ""; - - if (pattern_.c_str()[0] == '(' && pattern_.c_str()[1] == '*') { - int kk, klen, kmat; - for (;;) { // Loop for any number of leading items - - for (kk = 0; start_options[kk][0] != 0; kk++) { - klen = strlen(start_options[kk]); - kmat = strncmp(pattern_.c_str(), start_options[kk], klen); - if (kmat >= 0) break; - } - if (kmat != 0) break; // Not found - - // If the item ended in "=" we must copy digits up to ")". - - if (start_options[kk][klen-1] == '=') { - while (isdigit(pattern_.c_str()[klen])) klen++; - if (pattern_.c_str()[klen] != ')') break; // Syntax error - klen++; - } - - // Move the item from the pattern to the start of the wrapped string. - - wrapped += pattern_.substr(0, klen); - pattern_.erase(0, klen); - } - } - - // Wrap the rest of the pattern. 
- - wrapped += "(?:"; // A non-counting grouping operator - wrapped += pattern_; - wrapped += ")\\z"; - re = pcre_compile(wrapped.c_str(), pcre_options, - &compile_error, &eoffset, NULL); - } - if (re == NULL) { - if (error_ == &empty_string) error_ = new string(compile_error); - } - return re; -} - -/***** Matching interfaces *****/ - -bool RE::FullMatch(const StringPiece& text, - const Arg& ptr1, - const Arg& ptr2, - const Arg& ptr3, - const Arg& ptr4, - const Arg& ptr5, - const Arg& ptr6, - const Arg& ptr7, - const Arg& ptr8, - const Arg& ptr9, - const Arg& ptr10, - const Arg& ptr11, - const Arg& ptr12, - const Arg& ptr13, - const Arg& ptr14, - const Arg& ptr15, - const Arg& ptr16) const { - const Arg* args[kMaxArgs]; - int n = 0; - if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1; - if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2; - if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3; - if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4; - if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5; - if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6; - if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7; - if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8; - if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9; - if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10; - if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11; - if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12; - if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13; - if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14; - if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15; - if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16; - done: - - int consumed; - int vec[kVecSize]; - return DoMatchImpl(text, ANCHOR_BOTH, &consumed, args, n, vec, kVecSize); -} - -bool RE::PartialMatch(const StringPiece& text, - const Arg& ptr1, - const Arg& ptr2, - const Arg& ptr3, - const Arg& ptr4, - const Arg& ptr5, - const Arg& ptr6, - const Arg& ptr7, - 
const Arg& ptr8, - const Arg& ptr9, - const Arg& ptr10, - const Arg& ptr11, - const Arg& ptr12, - const Arg& ptr13, - const Arg& ptr14, - const Arg& ptr15, - const Arg& ptr16) const { - const Arg* args[kMaxArgs]; - int n = 0; - if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1; - if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2; - if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3; - if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4; - if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5; - if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6; - if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7; - if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8; - if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9; - if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10; - if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11; - if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12; - if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13; - if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14; - if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15; - if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16; - done: - - int consumed; - int vec[kVecSize]; - return DoMatchImpl(text, UNANCHORED, &consumed, args, n, vec, kVecSize); -} - -bool RE::Consume(StringPiece* input, - const Arg& ptr1, - const Arg& ptr2, - const Arg& ptr3, - const Arg& ptr4, - const Arg& ptr5, - const Arg& ptr6, - const Arg& ptr7, - const Arg& ptr8, - const Arg& ptr9, - const Arg& ptr10, - const Arg& ptr11, - const Arg& ptr12, - const Arg& ptr13, - const Arg& ptr14, - const Arg& ptr15, - const Arg& ptr16) const { - const Arg* args[kMaxArgs]; - int n = 0; - if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1; - if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2; - if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3; - if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4; - if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5; - if (&ptr6 == 
&no_arg) { goto done; } args[n++] = &ptr6; - if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7; - if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8; - if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9; - if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10; - if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11; - if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12; - if (&ptr13 == &no_arg) { goto done; } args[n++] = &ptr13; - if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14; - if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15; - if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16; - done: - - int consumed; - int vec[kVecSize]; - if (DoMatchImpl(*input, ANCHOR_START, &consumed, - args, n, vec, kVecSize)) { - input->remove_prefix(consumed); - return true; - } else { - return false; - } -} - -bool RE::FindAndConsume(StringPiece* input, - const Arg& ptr1, - const Arg& ptr2, - const Arg& ptr3, - const Arg& ptr4, - const Arg& ptr5, - const Arg& ptr6, - const Arg& ptr7, - const Arg& ptr8, - const Arg& ptr9, - const Arg& ptr10, - const Arg& ptr11, - const Arg& ptr12, - const Arg& ptr13, - const Arg& ptr14, - const Arg& ptr15, - const Arg& ptr16) const { - const Arg* args[kMaxArgs]; - int n = 0; - if (&ptr1 == &no_arg) { goto done; } args[n++] = &ptr1; - if (&ptr2 == &no_arg) { goto done; } args[n++] = &ptr2; - if (&ptr3 == &no_arg) { goto done; } args[n++] = &ptr3; - if (&ptr4 == &no_arg) { goto done; } args[n++] = &ptr4; - if (&ptr5 == &no_arg) { goto done; } args[n++] = &ptr5; - if (&ptr6 == &no_arg) { goto done; } args[n++] = &ptr6; - if (&ptr7 == &no_arg) { goto done; } args[n++] = &ptr7; - if (&ptr8 == &no_arg) { goto done; } args[n++] = &ptr8; - if (&ptr9 == &no_arg) { goto done; } args[n++] = &ptr9; - if (&ptr10 == &no_arg) { goto done; } args[n++] = &ptr10; - if (&ptr11 == &no_arg) { goto done; } args[n++] = &ptr11; - if (&ptr12 == &no_arg) { goto done; } args[n++] = &ptr12; - if (&ptr13 == &no_arg) { goto done; } 
args[n++] = &ptr13; - if (&ptr14 == &no_arg) { goto done; } args[n++] = &ptr14; - if (&ptr15 == &no_arg) { goto done; } args[n++] = &ptr15; - if (&ptr16 == &no_arg) { goto done; } args[n++] = &ptr16; - done: - - int consumed; - int vec[kVecSize]; - if (DoMatchImpl(*input, UNANCHORED, &consumed, - args, n, vec, kVecSize)) { - input->remove_prefix(consumed); - return true; - } else { - return false; - } -} - -bool RE::Replace(const StringPiece& rewrite, - string *str) const { - int vec[kVecSize]; - int matches = TryMatch(*str, 0, UNANCHORED, true, vec, kVecSize); - if (matches == 0) - return false; - - string s; - if (!Rewrite(&s, rewrite, *str, vec, matches)) - return false; - - assert(vec[0] >= 0); - assert(vec[1] >= 0); - str->replace(vec[0], vec[1] - vec[0], s); - return true; -} - -// Returns PCRE_NEWLINE_CRLF, PCRE_NEWLINE_CR, or PCRE_NEWLINE_LF. -// Note that PCRE_NEWLINE_CRLF is defined to be P_N_CR | P_N_LF. -// Modified by PH to add PCRE_NEWLINE_ANY and PCRE_NEWLINE_ANYCRLF. - -static int NewlineMode(int pcre_options) { - // TODO: if we can make it threadsafe, cache this var - int newline_mode = 0; - /* if (newline_mode) return newline_mode; */ // do this once it's cached - if (pcre_options & (PCRE_NEWLINE_CRLF|PCRE_NEWLINE_CR|PCRE_NEWLINE_LF| - PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF)) { - newline_mode = (pcre_options & - (PCRE_NEWLINE_CRLF|PCRE_NEWLINE_CR|PCRE_NEWLINE_LF| - PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF)); - } else { - int newline; - pcre_config(PCRE_CONFIG_NEWLINE, &newline); - if (newline == 10) - newline_mode = PCRE_NEWLINE_LF; - else if (newline == 13) - newline_mode = PCRE_NEWLINE_CR; - else if (newline == 3338) - newline_mode = PCRE_NEWLINE_CRLF; - else if (newline == -1) - newline_mode = PCRE_NEWLINE_ANY; - else if (newline == -2) - newline_mode = PCRE_NEWLINE_ANYCRLF; - else - assert(NULL == "Unexpected return value from pcre_config(NEWLINE)"); - } - return newline_mode; -} - -int RE::GlobalReplace(const StringPiece& rewrite, - string *str) 
const { - int count = 0; - int vec[kVecSize]; - string out; - int start = 0; - bool last_match_was_empty_string = false; - - while (start <= static_cast<int>(str->length())) { - // If the previous match was for the empty string, we shouldn't - // just match again: we'll match in the same way and get an - // infinite loop. Instead, we do the match in a special way: - // anchored -- to force another try at the same position -- - // and with a flag saying that this time, ignore empty matches. - // If this special match returns, that means there's a non-empty - // match at this position as well, and we can continue. If not, - // we do what perl does, and just advance by one. - // Notice that perl prints '@@@' for this; - // perl -le '$_ = "aa"; s/b*|aa/@/g; print' - int matches; - if (last_match_was_empty_string) { - matches = TryMatch(*str, start, ANCHOR_START, false, vec, kVecSize); - if (matches <= 0) { - int matchend = start + 1; // advance one character. - // If the current char is CR and we're in CRLF mode, skip LF too. - // Note it's better to call pcre_fullinfo() than to examine - // all_options(), since options_ could have changed bewteen - // compile-time and now, but this is simpler and safe enough. - // Modified by PH to add ANY and ANYCRLF. - if (matchend < static_cast<int>(str->length()) && - (*str)[start] == '\r' && (*str)[matchend] == '\n' && - (NewlineMode(options_.all_options()) == PCRE_NEWLINE_CRLF || - NewlineMode(options_.all_options()) == PCRE_NEWLINE_ANY || - NewlineMode(options_.all_options()) == PCRE_NEWLINE_ANYCRLF)) { - matchend++; - } - // We also need to advance more than one char if we're in utf8 mode.
-#ifdef SUPPORT_UTF - if (options_.utf8()) { - while (matchend < static_cast<int>(str->length()) && - ((*str)[matchend] & 0xc0) == 0x80) - matchend++; - } -#endif - if (start < static_cast<int>(str->length())) - out.append(*str, start, matchend - start); - start = matchend; - last_match_was_empty_string = false; - continue; - } - } else { - matches = TryMatch(*str, start, UNANCHORED, true, vec, kVecSize); - if (matches <= 0) - break; - } - int matchstart = vec[0], matchend = vec[1]; - assert(matchstart >= start); - assert(matchend >= matchstart); - out.append(*str, start, matchstart - start); - Rewrite(&out, rewrite, *str, vec, matches); - start = matchend; - count++; - last_match_was_empty_string = (matchstart == matchend); - } - - if (count == 0) - return 0; - - if (start < static_cast<int>(str->length())) - out.append(*str, start, str->length() - start); - swap(out, *str); - return count; -} - -bool RE::Extract(const StringPiece& rewrite, - const StringPiece& text, - string *out) const { - int vec[kVecSize]; - int matches = TryMatch(text, 0, UNANCHORED, true, vec, kVecSize); - if (matches == 0) - return false; - out->erase(); - return Rewrite(out, rewrite, text, vec, matches); -} - -/*static*/ string RE::QuoteMeta(const StringPiece& unquoted) { - string result; - - // Escape any ascii character not in [A-Za-z_0-9]. - // - // Note that it's legal to escape a character even if it has no - // special meaning in a regular expression -- so this function does - // that. (This also makes it identical to the perl function of the - // same name; see `perldoc -f quotemeta`.) The one exception is - // escaping NUL: rather than doing backslash + NUL, like perl does, - // we do '\0', because pcre itself doesn't take embedded NUL chars.
- for (int ii = 0; ii < unquoted.size(); ++ii) { - // Note that using 'isalnum' here raises the benchmark time from - // 32ns to 58ns: - if (unquoted[ii] == '\0') { - result += "\\0"; - } else if ((unquoted[ii] < 'a' || unquoted[ii] > 'z') && - (unquoted[ii] < 'A' || unquoted[ii] > 'Z') && - (unquoted[ii] < '0' || unquoted[ii] > '9') && - unquoted[ii] != '_' && - // If this is the part of a UTF8 or Latin1 character, we need - // to copy this byte without escaping. Experimentally this is - // what works correctly with the regexp library. - !(unquoted[ii] & 128)) { - result += '\\'; - result += unquoted[ii]; - } else { - result += unquoted[ii]; - } - } - - return result; -} - -/***** Actual matching and rewriting code *****/ - -int RE::TryMatch(const StringPiece& text, - int startpos, - Anchor anchor, - bool empty_ok, - int *vec, - int vecsize) const { - pcre* re = (anchor == ANCHOR_BOTH) ? re_full_ : re_partial_; - if (re == NULL) { - //fprintf(stderr, "Matching against invalid re: %s\n", error_->c_str()); - return 0; - } - - pcre_extra extra = { 0, 0, 0, 0, 0, 0, 0, 0 }; - if (options_.match_limit() > 0) { - extra.flags |= PCRE_EXTRA_MATCH_LIMIT; - extra.match_limit = options_.match_limit(); - } - if (options_.match_limit_recursion() > 0) { - extra.flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION; - extra.match_limit_recursion = options_.match_limit_recursion(); - } - - // int options = 0; - // Changed by PH as a result of bugzilla #1288 - int options = (options_.all_options() & PCRE_NO_UTF8_CHECK); - - if (anchor != UNANCHORED) - options |= PCRE_ANCHORED; - if (!empty_ok) - options |= PCRE_NOTEMPTY; - - int rc = pcre_exec(re, // The regular expression object - &extra, - (text.data() == NULL) ? 
"" : text.data(), - text.size(), - startpos, - options, - vec, - vecsize); - - // Handle errors - if (rc == PCRE_ERROR_NOMATCH) { - return 0; - } else if (rc < 0) { - //fprintf(stderr, "Unexpected return code: %d when matching '%s'\n", - // re, pattern_.c_str()); - return 0; - } else if (rc == 0) { - // pcre_exec() returns 0 as a special case when the number of - // capturing subpatterns exceeds the size of the vector. - // When this happens, there is a match and the output vector - // is filled, but we miss out on the positions of the extra subpatterns. - rc = vecsize / 2; - } - - return rc; -} - -bool RE::DoMatchImpl(const StringPiece& text, - Anchor anchor, - int* consumed, - const Arg* const* args, - int n, - int* vec, - int vecsize) const { - assert((1 + n) * 3 <= vecsize); // results + PCRE workspace - int matches = TryMatch(text, 0, anchor, true, vec, vecsize); - assert(matches >= 0); // TryMatch never returns negatives - if (matches == 0) - return false; - - *consumed = vec[1]; - - if (n == 0 || args == NULL) { - // We are not interested in results - return true; - } - - if (NumberOfCapturingGroups() < n) { - // RE has fewer capturing groups than number of arg pointers passed in - return false; - } - - // If we got here, we must have matched the whole pattern. - // We do not need (can not do) any more checks on the value of 'matches' here - // -- see the comment for TryMatch. - for (int i = 0; i < n; i++) { - const int start = vec[2*(i+1)]; - const int limit = vec[2*(i+1)+1]; - if (!args[i]->Parse(text.data() + start, limit-start)) { - // TODO: Should we indicate what the error was? - return false; - } - } - - return true; -} - -bool RE::DoMatch(const StringPiece& text, - Anchor anchor, - int* consumed, - const Arg* const args[], - int n) const { - assert(n >= 0); - size_t const vecsize = (1 + n) * 3; // results + PCRE workspace - // (as for kVecSize) - int space[21]; // use stack allocation for small vecsize (common case) - int* vec = vecsize <= 21 ? 
space : new int[vecsize]; - bool retval = DoMatchImpl(text, anchor, consumed, args, n, vec, (int)vecsize); - if (vec != space) delete [] vec; - return retval; -} - -bool RE::Rewrite(string *out, const StringPiece &rewrite, - const StringPiece &text, int *vec, int veclen) const { - for (const char *s = rewrite.data(), *end = s + rewrite.size(); - s < end; s++) { - int c = *s; - if (c == '\\') { - c = *++s; - if (isdigit(c)) { - int n = (c - '0'); - if (n >= veclen) { - //fprintf(stderr, requested group %d in regexp %.*s\n", - // n, rewrite.size(), rewrite.data()); - return false; - } - int start = vec[2 * n]; - if (start >= 0) - out->append(text.data() + start, vec[2 * n + 1] - start); - } else if (c == '\\') { - *out += '\\'; - } else { - //fprintf(stderr, "invalid rewrite pattern: %.*s\n", - // rewrite.size(), rewrite.data()); - return false; - } - } else { - *out += c; - } - } - return true; -} - -// Return the number of capturing subpatterns, or -1 if the -// regexp wasn't valid on construction. 
-int RE::NumberOfCapturingGroups() const { - if (re_partial_ == NULL) return -1; - - int result; - int pcre_retval = pcre_fullinfo(re_partial_, // The regular expression object - NULL, // We did not study the pattern - PCRE_INFO_CAPTURECOUNT, - &result); - assert(pcre_retval == 0); - return result; -} - -/***** Parsers for various types *****/ - -bool Arg::parse_null(const char* str, int n, void* dest) { - (void)str; - (void)n; - // We fail if somebody asked us to store into a non-NULL void* pointer - return (dest == NULL); -} - -bool Arg::parse_string(const char* str, int n, void* dest) { - if (dest == NULL) return true; - reinterpret_cast<string*>(dest)->assign(str, n); - return true; -} - -bool Arg::parse_stringpiece(const char* str, int n, void* dest) { - if (dest == NULL) return true; - reinterpret_cast<StringPiece*>(dest)->set(str, n); - return true; -} - -bool Arg::parse_char(const char* str, int n, void* dest) { - if (n != 1) return false; - if (dest == NULL) return true; - *(reinterpret_cast<char*>(dest)) = str[0]; - return true; -} - -bool Arg::parse_uchar(const char* str, int n, void* dest) { - if (n != 1) return false; - if (dest == NULL) return true; - *(reinterpret_cast<unsigned char*>(dest)) = str[0]; - return true; -} - -// Largest number spec that we are willing to parse -static const int kMaxNumberLength = 32; - -// REQUIRES "buf" must have length at least kMaxNumberLength+1 -// REQUIRES "n > 0" -// Copies "str" into "buf" and null-terminates if necessary. -// Returns one of: -// a. "str" if no termination is needed -// b. "buf" if the string was copied and null-terminated -// c. "" if the input was invalid and has no hope of being parsed -static const char* TerminateNumber(char* buf, const char* str, int n) { - if ((n > 0) && isspace(*str)) { - // We are less forgiving than the strtoxxx() routines and do not - // allow leading spaces. - return ""; - } - - // See if the character right after the input text may potentially - // look like a digit.
- if (isdigit(str[n]) || - ((str[n] >= 'a') && (str[n] <= 'f')) || - ((str[n] >= 'A') && (str[n] <= 'F'))) { - if (n > kMaxNumberLength) return ""; // Input too big to be a valid number - memcpy(buf, str, n); - buf[n] = '\0'; - return buf; - } else { - // We can parse right out of the supplied string, so return it. - return str; - } -} - -bool Arg::parse_long_radix(const char* str, - int n, - void* dest, - int radix) { - if (n == 0) return false; - char buf[kMaxNumberLength+1]; - str = TerminateNumber(buf, str, n); - char* end; - errno = 0; - long r = strtol(str, &end, radix); - if (end != str + n) return false; // Leftover junk - if (errno) return false; - if (dest == NULL) return true; - *(reinterpret_cast<long*>(dest)) = r; - return true; -} - -bool Arg::parse_ulong_radix(const char* str, - int n, - void* dest, - int radix) { - if (n == 0) return false; - char buf[kMaxNumberLength+1]; - str = TerminateNumber(buf, str, n); - if (str[0] == '-') return false; // strtoul() on a negative number?!
- char* end; - errno = 0; - unsigned long r = strtoul(str, &end, radix); - if (end != str + n) return false; // Leftover junk - if (errno) return false; - if (dest == NULL) return true; - *(reinterpret_cast<unsigned long*>(dest)) = r; - return true; -} - -bool Arg::parse_short_radix(const char* str, - int n, - void* dest, - int radix) { - long r; - if (!parse_long_radix(str, n, &r, radix)) return false; // Could not parse - if (r < SHRT_MIN || r > SHRT_MAX) return false; // Out of range - if (dest == NULL) return true; - *(reinterpret_cast<short*>(dest)) = static_cast<short>(r); - return true; -} - -bool Arg::parse_ushort_radix(const char* str, - int n, - void* dest, - int radix) { - unsigned long r; - if (!parse_ulong_radix(str, n, &r, radix)) return false; // Could not parse - if (r > USHRT_MAX) return false; // Out of range - if (dest == NULL) return true; - *(reinterpret_cast<unsigned short*>(dest)) = static_cast<unsigned short>(r); - return true; -} - -bool Arg::parse_int_radix(const char* str, - int n, - void* dest, - int radix) { - long r; - if (!parse_long_radix(str, n, &r, radix)) return false; // Could not parse - if (r < INT_MIN || r > INT_MAX) return false; // Out of range - if (dest == NULL) return true; - *(reinterpret_cast<int*>(dest)) = r; - return true; -} - -bool Arg::parse_uint_radix(const char* str, - int n, - void* dest, - int radix) { - unsigned long r; - if (!parse_ulong_radix(str, n, &r, radix)) return false; // Could not parse - if (r > UINT_MAX) return false; // Out of range - if (dest == NULL) return true; - *(reinterpret_cast<unsigned int*>(dest)) = r; - return true; -} - -bool Arg::parse_longlong_radix(const char* str, - int n, - void* dest, - int radix) { -#ifndef HAVE_LONG_LONG - return false; -#else - if (n == 0) return false; - char buf[kMaxNumberLength+1]; - str = TerminateNumber(buf, str, n); - char* end; - errno = 0; -#if defined HAVE_STRTOQ - long long r = strtoq(str, &end, radix); -#elif defined HAVE_STRTOLL - long long r = strtoll(str, &end, radix); -#elif defined HAVE__STRTOI64 - long long r = _strtoi64(str,
&end, radix); -#elif defined HAVE_STRTOIMAX - long long r = strtoimax(str, &end, radix); -#else -#error parse_longlong_radix: cannot convert input to a long-long -#endif - if (end != str + n) return false; // Leftover junk - if (errno) return false; - if (dest == NULL) return true; - *(reinterpret_cast<long long*>(dest)) = r; - return true; -#endif /* HAVE_LONG_LONG */ -} - -bool Arg::parse_ulonglong_radix(const char* str, - int n, - void* dest, - int radix) { -#ifndef HAVE_UNSIGNED_LONG_LONG - return false; -#else - if (n == 0) return false; - char buf[kMaxNumberLength+1]; - str = TerminateNumber(buf, str, n); - if (str[0] == '-') return false; // strtoull() on a negative number?! - char* end; - errno = 0; -#if defined HAVE_STRTOQ - unsigned long long r = strtouq(str, &end, radix); -#elif defined HAVE_STRTOLL - unsigned long long r = strtoull(str, &end, radix); -#elif defined HAVE__STRTOI64 - unsigned long long r = _strtoui64(str, &end, radix); -#elif defined HAVE_STRTOIMAX - unsigned long long r = strtoumax(str, &end, radix); -#else -#error parse_ulonglong_radix: cannot convert input to a long-long -#endif - if (end != str + n) return false; // Leftover junk - if (errno) return false; - if (dest == NULL) return true; - *(reinterpret_cast<unsigned long long*>(dest)) = r; - return true; -#endif /* HAVE_UNSIGNED_LONG_LONG */ -} - -bool Arg::parse_double(const char* str, int n, void* dest) { - if (n == 0) return false; - static const int kMaxLength = 200; - char buf[kMaxLength]; - if (n >= kMaxLength) return false; - memcpy(buf, str, n); - buf[n] = '\0'; - errno = 0; - char* end; - double r = strtod(buf, &end); - if (end != buf + n) return false; // Leftover junk - if (errno) return false; - if (dest == NULL) return true; - *(reinterpret_cast<double*>(dest)) = r; - return true; -} - -bool Arg::parse_float(const char* str, int n, void* dest) { - double r; - if (!parse_double(str, n, &r)) return false; - if (dest == NULL) return true; - *(reinterpret_cast<float*>(dest)) = static_cast<float>(r); - return true; -} - - -#define
DEFINE_INTEGER_PARSERS(name) \ - bool Arg::parse_##name(const char* str, int n, void* dest) { \ - return parse_##name##_radix(str, n, dest, 10); \ - } \ - bool Arg::parse_##name##_hex(const char* str, int n, void* dest) { \ - return parse_##name##_radix(str, n, dest, 16); \ - } \ - bool Arg::parse_##name##_octal(const char* str, int n, void* dest) { \ - return parse_##name##_radix(str, n, dest, 8); \ - } \ - bool Arg::parse_##name##_cradix(const char* str, int n, void* dest) { \ - return parse_##name##_radix(str, n, dest, 0); \ - } - -DEFINE_INTEGER_PARSERS(short) /* */ -DEFINE_INTEGER_PARSERS(ushort) /* */ -DEFINE_INTEGER_PARSERS(int) /* Don't use semicolons after these */ -DEFINE_INTEGER_PARSERS(uint) /* statements because they can cause */ -DEFINE_INTEGER_PARSERS(long) /* compiler warnings if the checking */ -DEFINE_INTEGER_PARSERS(ulong) /* level is turned up high enough. */ -DEFINE_INTEGER_PARSERS(longlong) /* */ -DEFINE_INTEGER_PARSERS(ulonglong) /* */ - -#undef DEFINE_INTEGER_PARSERS - -} // namespace pcrecpp diff --git a/src/pcre/pcrecpp.h b/src/pcre/pcrecpp.h deleted file mode 100644 index 3e594b0d..00000000 --- a/src/pcre/pcrecpp.h +++ /dev/null @@ -1,710 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. 
-// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -// Author: Sanjay Ghemawat -// Support for PCRE_XXX modifiers added by Giuseppe Maxia, July 2005 - -#ifndef _PCRECPP_H -#define _PCRECPP_H - -// C++ interface to the pcre regular-expression library. RE supports -// Perl-style regular expressions (with extensions like \d, \w, \s, -// ...). -// -// ----------------------------------------------------------------------- -// REGEXP SYNTAX: -// -// This module is part of the pcre library and hence supports its syntax -// for regular expressions. -// -// The syntax is pretty similar to Perl's. For those not familiar -// with Perl's regular expressions, here are some examples of the most -// commonly used extensions: -// -// "hello (\\w+) world" -- \w matches a "word" character -// "version (\\d+)" -- \d matches a digit -// "hello\\s+world" -- \s matches any whitespace character -// "\\b(\\w+)\\b" -- \b matches empty string at a word boundary -// "(?i)hello" -- (?i) turns on case-insensitive matching -// "/\\*(.*?)\\*/" -- .*? matches . minimum no. 
of times possible -// -// ----------------------------------------------------------------------- -// MATCHING INTERFACE: -// -// The "FullMatch" operation checks that supplied text matches a -// supplied pattern exactly. -// -// Example: successful match -// pcrecpp::RE re("h.*o"); -// re.FullMatch("hello"); -// -// Example: unsuccessful match (requires full match): -// pcrecpp::RE re("e"); -// !re.FullMatch("hello"); -// -// Example: creating a temporary RE object: -// pcrecpp::RE("h.*o").FullMatch("hello"); -// -// You can pass in a "const char*" or a "string" for "text". The -// examples below tend to use a const char*. -// -// You can, as in the different examples above, store the RE object -// explicitly in a variable or use a temporary RE object. The -// examples below use one mode or the other arbitrarily. Either -// could correctly be used for any of these examples. -// -// ----------------------------------------------------------------------- -// MATCHING WITH SUB-STRING EXTRACTION: -// -// You can supply extra pointer arguments to extract matched subpieces. 
-// -// Example: extracts "ruby" into "s" and 1234 into "i" -// int i; -// string s; -// pcrecpp::RE re("(\\w+):(\\d+)"); -// re.FullMatch("ruby:1234", &s, &i); -// -// Example: does not try to extract any extra sub-patterns -// re.FullMatch("ruby:1234", &s); -// -// Example: does not try to extract into NULL -// re.FullMatch("ruby:1234", NULL, &i); -// -// Example: integer overflow causes failure -// !re.FullMatch("ruby:1234567891234", NULL, &i); -// -// Example: fails because there aren't enough sub-patterns: -// !pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s); -// -// Example: fails because string cannot be stored in integer -// !pcrecpp::RE("(.*)").FullMatch("ruby", &i); -// -// The provided pointer arguments can be pointers to any scalar numeric -// type, or one of -// string (matched piece is copied to string) -// StringPiece (StringPiece is mutated to point to matched piece) -// T (where "bool T::ParseFrom(const char*, int)" exists) -// NULL (the corresponding matched sub-pattern is not copied) -// -// CAVEAT: An optional sub-pattern that does not exist in the matched -// string is assigned the empty string. Therefore, the following will -// return false (because the empty string is not a valid number): -// int number; -// pcrecpp::RE::FullMatch("abc", "[a-z]+(\\d+)?", &number); -// -// ----------------------------------------------------------------------- -// DO_MATCH -// -// The matching interface supports at most 16 arguments per call. -// If you need more, consider using the more general interface -// pcrecpp::RE::DoMatch(). See pcrecpp.h for the signature for DoMatch. -// -// ----------------------------------------------------------------------- -// PARTIAL MATCHES -// -// You can use the "PartialMatch" operation when you want the pattern -// to match any substring of the text. 
-// -// Example: simple search for a string: -// pcrecpp::RE("ell").PartialMatch("hello"); -// -// Example: find first number in a string: -// int number; -// pcrecpp::RE re("(\\d+)"); -// re.PartialMatch("x*100 + 20", &number); -// assert(number == 100); -// -// ----------------------------------------------------------------------- -// UTF-8 AND THE MATCHING INTERFACE: -// -// By default, pattern and text are plain text, one byte per character. -// The UTF8 flag, passed to the constructor, causes both pattern -// and string to be treated as UTF-8 text, still a byte stream but -// potentially multiple bytes per character. In practice, the text -// is likelier to be UTF-8 than the pattern, but the match returned -// may depend on the UTF8 flag, so always use it when matching -// UTF8 text. E.g., "." will match one byte normally but with UTF8 -// set may match up to three bytes of a multi-byte character. -// -// Example: -// pcrecpp::RE_Options options; -// options.set_utf8(); -// pcrecpp::RE re(utf8_pattern, options); -// re.FullMatch(utf8_string); -// -// Example: using the convenience function UTF8(): -// pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8()); -// re.FullMatch(utf8_string); -// -// NOTE: The UTF8 option is ignored if pcre was not configured with the -// --enable-utf8 flag. -// -// ----------------------------------------------------------------------- -// PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE -// -// PCRE defines some modifiers to change the behavior of the regular -// expression engine. -// The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle -// to pass such modifiers to a RE class. 
-// -// Currently, the following modifiers are supported -// -// modifier description Perl corresponding -// -// PCRE_CASELESS case insensitive match /i -// PCRE_MULTILINE multiple lines match /m -// PCRE_DOTALL dot matches newlines /s -// PCRE_DOLLAR_ENDONLY $ matches only at end N/A -// PCRE_EXTRA strict escape parsing N/A -// PCRE_EXTENDED ignore whitespaces /x -// PCRE_UTF8 handles UTF8 chars built-in -// PCRE_UNGREEDY reverses * and *? N/A -// PCRE_NO_AUTO_CAPTURE disables matching parens N/A (*) -// -// (For a full account on how each modifier works, please check the -// PCRE API reference manual). -// -// (*) Both Perl and PCRE allow non matching parentheses by means of the -// "?:" modifier within the pattern itself. e.g. (?:ab|cd) does not -// capture, while (ab|cd) does. -// -// For each modifier, there are two member functions whose name is made -// out of the modifier in lowercase, without the "PCRE_" prefix. For -// instance, PCRE_CASELESS is handled by -// bool caseless(), -// which returns true if the modifier is set, and -// RE_Options & set_caseless(bool), -// which sets or unsets the modifier. -// -// Moreover, PCRE_EXTRA_MATCH_LIMIT can be accessed through the -// set_match_limit() and match_limit() member functions. -// Setting match_limit to a non-zero value will limit the executation of -// pcre to keep it from doing bad things like blowing the stack or taking -// an eternity to return a result. A value of 5000 is good enough to stop -// stack blowup in a 2MB thread stack. Setting match_limit to zero will -// disable match limiting. Alternately, you can set match_limit_recursion() -// which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much pcre -// recurses. match_limit() caps the number of matches pcre does; -// match_limit_recrusion() caps the depth of recursion. -// -// Normally, to pass one or more modifiers to a RE class, you declare -// a RE_Options object, set the appropriate options, and pass this -// object to a RE constructor. 
Example: -// -// RE_options opt; -// opt.set_caseless(true); -// -// if (RE("HELLO", opt).PartialMatch("hello world")) ... -// -// RE_options has two constructors. The default constructor takes no -// arguments and creates a set of flags that are off by default. -// -// The optional parameter 'option_flags' is to facilitate transfer -// of legacy code from C programs. This lets you do -// RE(pattern, RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); -// -// But new code is better off doing -// RE(pattern, -// RE_Options().set_caseless(true).set_multiline(true)).PartialMatch(str); -// (See below) -// -// If you are going to pass one of the most used modifiers, there are some -// convenience functions that return a RE_Options class with the -// appropriate modifier already set: -// CASELESS(), UTF8(), MULTILINE(), DOTALL(), EXTENDED() -// -// If you need to set several options at once, and you don't want to go -// through the pains of declaring a RE_Options object and setting several -// options, there is a parallel method that give you such ability on the -// fly. You can concatenate several set_xxxxx member functions, since each -// of them returns a reference to its class object. e.g.: to pass -// PCRE_CASELESS, PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one -// statement, you may write -// -// RE(" ^ xyz \\s+ .* blah$", RE_Options() -// .set_caseless(true) -// .set_extended(true) -// .set_multiline(true)).PartialMatch(sometext); -// -// ----------------------------------------------------------------------- -// SCANNING TEXT INCREMENTALLY -// -// The "Consume" operation may be useful if you want to repeatedly -// match regular expressions at the front of a string and skip over -// them as they match. This requires use of the "StringPiece" type, -// which represents a sub-range of a real string. Like RE, StringPiece -// is defined in the pcrecpp namespace. -// -// Example: read lines of the form "var = value" from a string. 
-// string contents = ...; // Fill string somehow -// pcrecpp::StringPiece input(contents); // Wrap in a StringPiece -// -// string var; -// int value; -// pcrecpp::RE re("(\\w+) = (\\d+)\n"); -// while (re.Consume(&input, &var, &value)) { -// ...; -// } -// -// Each successful call to "Consume" will set "var/value", and also -// advance "input" so it points past the matched text. -// -// The "FindAndConsume" operation is similar to "Consume" but does not -// anchor your match at the beginning of the string. For example, you -// could extract all words from a string by repeatedly calling -// pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word) -// -// ----------------------------------------------------------------------- -// PARSING HEX/OCTAL/C-RADIX NUMBERS -// -// By default, if you pass a pointer to a numeric value, the -// corresponding text is interpreted as a base-10 number. You can -// instead wrap the pointer with a call to one of the operators Hex(), -// Octal(), or CRadix() to interpret the text in another base. The -// CRadix operator interprets C-style "0" (base-8) and "0x" (base-16) -// prefixes, but defaults to base-10. -// -// Example: -// int a, b, c, d; -// pcrecpp::RE re("(.*) (.*) (.*) (.*)"); -// re.FullMatch("100 40 0100 0x40", -// pcrecpp::Octal(&a), pcrecpp::Hex(&b), -// pcrecpp::CRadix(&c), pcrecpp::CRadix(&d)); -// will leave 64 in a, b, c, and d. -// -// ----------------------------------------------------------------------- -// REPLACING PARTS OF STRINGS -// -// You can replace the first match of "pattern" in "str" with -// "rewrite". Within "rewrite", backslash-escaped digits (\1 to \9) -// can be used to insert text matching corresponding parenthesized -// group from the pattern. \0 in "rewrite" refers to the entire -// matching text. E.g., -// -// string s = "yabba dabba doo"; -// pcrecpp::RE("b+").Replace("d", &s); -// -// will leave "s" containing "yada dabba doo". 
The result is true if -// the pattern matches and a replacement occurs, or false otherwise. -// -// GlobalReplace() is like Replace(), except that it replaces all -// occurrences of the pattern in the string with the rewrite. -// Replacements are not subject to re-matching. E.g., -// -// string s = "yabba dabba doo"; -// pcrecpp::RE("b+").GlobalReplace("d", &s); -// -// will leave "s" containing "yada dada doo". It returns the number -// of replacements made. -// -// Extract() is like Replace(), except that if the pattern matches, -// "rewrite" is copied into "out" (an additional argument) with -// substitutions. The non-matching portions of "text" are ignored. -// Returns true iff a match occurred and the extraction happened -// successfully. If no match occurs, the string is left unaffected. - - -#include -#include -#include // defines the Arg class -// This isn't technically needed here, but we include it -// anyway so folks who include pcrecpp.h don't have to. -#include - -namespace pcrecpp { - -#define PCRE_SET_OR_CLEAR(b, o) \ - if (b) all_options_ |= (o); else all_options_ &= ~(o); \ - return *this - -#define PCRE_IS_SET(o) \ - (all_options_ & o) == o - -/***** Compiling regular expressions: the RE class *****/ - -// RE_Options allow you to set options to be passed along to pcre, -// along with other options we put on top of pcre. -// Only 9 modifiers, plus match_limit and match_limit_recursion, -// are supported now. -class PCRECPP_EXP_DEFN RE_Options { - public: - // constructor - RE_Options() : match_limit_(0), match_limit_recursion_(0), all_options_(0) {} - - // alternative constructor. 
- // To facilitate transfer of legacy code from C programs - // - // This lets you do - // RE(pattern, RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str); - // But new code is better off doing - // RE(pattern, - // RE_Options().set_caseless(true).set_multiline(true)).PartialMatch(str); - RE_Options(int option_flags) : match_limit_(0), match_limit_recursion_(0), - all_options_(option_flags) {} - // we're fine with the default destructor, copy constructor, etc. - - // accessors and mutators - int match_limit() const { return match_limit_; }; - RE_Options &set_match_limit(int limit) { - match_limit_ = limit; - return *this; - } - - int match_limit_recursion() const { return match_limit_recursion_; }; - RE_Options &set_match_limit_recursion(int limit) { - match_limit_recursion_ = limit; - return *this; - } - - bool caseless() const { - return PCRE_IS_SET(PCRE_CASELESS); - } - RE_Options &set_caseless(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_CASELESS); - } - - bool multiline() const { - return PCRE_IS_SET(PCRE_MULTILINE); - } - RE_Options &set_multiline(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_MULTILINE); - } - - bool dotall() const { - return PCRE_IS_SET(PCRE_DOTALL); - } - RE_Options &set_dotall(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_DOTALL); - } - - bool extended() const { - return PCRE_IS_SET(PCRE_EXTENDED); - } - RE_Options &set_extended(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_EXTENDED); - } - - bool dollar_endonly() const { - return PCRE_IS_SET(PCRE_DOLLAR_ENDONLY); - } - RE_Options &set_dollar_endonly(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_DOLLAR_ENDONLY); - } - - bool extra() const { - return PCRE_IS_SET(PCRE_EXTRA); - } - RE_Options &set_extra(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_EXTRA); - } - - bool ungreedy() const { - return PCRE_IS_SET(PCRE_UNGREEDY); - } - RE_Options &set_ungreedy(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_UNGREEDY); - } - - bool utf8() const { - return PCRE_IS_SET(PCRE_UTF8); - } - RE_Options &set_utf8(bool x) { - PCRE_SET_OR_CLEAR(x, 
PCRE_UTF8); - } - - bool no_auto_capture() const { - return PCRE_IS_SET(PCRE_NO_AUTO_CAPTURE); - } - RE_Options &set_no_auto_capture(bool x) { - PCRE_SET_OR_CLEAR(x, PCRE_NO_AUTO_CAPTURE); - } - - RE_Options &set_all_options(int opt) { - all_options_ = opt; - return *this; - } - int all_options() const { - return all_options_ ; - } - - // TODO: add other pcre flags - - private: - int match_limit_; - int match_limit_recursion_; - int all_options_; -}; - -// These functions return some common RE_Options -static inline RE_Options UTF8() { - return RE_Options().set_utf8(true); -} - -static inline RE_Options CASELESS() { - return RE_Options().set_caseless(true); -} -static inline RE_Options MULTILINE() { - return RE_Options().set_multiline(true); -} - -static inline RE_Options DOTALL() { - return RE_Options().set_dotall(true); -} - -static inline RE_Options EXTENDED() { - return RE_Options().set_extended(true); -} - -// Interface for regular expression matching. Also corresponds to a -// pre-compiled regular expression. An "RE" object is safe for -// concurrent use by multiple threads. -class PCRECPP_EXP_DEFN RE { - public: - // We provide implicit conversions from strings so that users can - // pass in a string or a "const char*" wherever an "RE" is expected. - RE(const string& pat) { Init(pat, NULL); } - RE(const string& pat, const RE_Options& option) { Init(pat, &option); } - RE(const char* pat) { Init(pat, NULL); } - RE(const char* pat, const RE_Options& option) { Init(pat, &option); } - RE(const unsigned char* pat) { - Init(reinterpret_cast<const char*>(pat), NULL); - } - RE(const unsigned char* pat, const RE_Options& option) { - Init(reinterpret_cast<const char*>(pat), &option); - } - - // Copy constructor & assignment - note that these are expensive - // because they recompile the expression.
- RE(const RE& re) { Init(re.pattern_, &re.options_); } - const RE& operator=(const RE& re) { - if (this != &re) { - Cleanup(); - - // This is the code that originally came from Google - // Init(re.pattern_.c_str(), &re.options_); - - // This is the replacement from Ari Pollak - Init(re.pattern_, &re.options_); - } - return *this; - } - - - ~RE(); - - // The string specification for this RE. E.g. - // RE re("ab*c?d+"); - // re.pattern(); // "ab*c?d+" - const string& pattern() const { return pattern_; } - - // If RE could not be created properly, returns an error string. - // Else returns the empty string. - const string& error() const { return *error_; } - - /***** The useful part: the matching interface *****/ - - // This is provided so one can do pattern.ReplaceAll() just as - // easily as ReplaceAll(pattern-text, ....) - - bool FullMatch(const StringPiece& text, - const Arg& ptr1 = no_arg, - const Arg& ptr2 = no_arg, - const Arg& ptr3 = no_arg, - const Arg& ptr4 = no_arg, - const Arg& ptr5 = no_arg, - const Arg& ptr6 = no_arg, - const Arg& ptr7 = no_arg, - const Arg& ptr8 = no_arg, - const Arg& ptr9 = no_arg, - const Arg& ptr10 = no_arg, - const Arg& ptr11 = no_arg, - const Arg& ptr12 = no_arg, - const Arg& ptr13 = no_arg, - const Arg& ptr14 = no_arg, - const Arg& ptr15 = no_arg, - const Arg& ptr16 = no_arg) const; - - bool PartialMatch(const StringPiece& text, - const Arg& ptr1 = no_arg, - const Arg& ptr2 = no_arg, - const Arg& ptr3 = no_arg, - const Arg& ptr4 = no_arg, - const Arg& ptr5 = no_arg, - const Arg& ptr6 = no_arg, - const Arg& ptr7 = no_arg, - const Arg& ptr8 = no_arg, - const Arg& ptr9 = no_arg, - const Arg& ptr10 = no_arg, - const Arg& ptr11 = no_arg, - const Arg& ptr12 = no_arg, - const Arg& ptr13 = no_arg, - const Arg& ptr14 = no_arg, - const Arg& ptr15 = no_arg, - const Arg& ptr16 = no_arg) const; - - bool Consume(StringPiece* input, - const Arg& ptr1 = no_arg, - const Arg& ptr2 = no_arg, - const Arg& ptr3 = no_arg, - const Arg& ptr4 = no_arg, - 
const Arg& ptr5 = no_arg, - const Arg& ptr6 = no_arg, - const Arg& ptr7 = no_arg, - const Arg& ptr8 = no_arg, - const Arg& ptr9 = no_arg, - const Arg& ptr10 = no_arg, - const Arg& ptr11 = no_arg, - const Arg& ptr12 = no_arg, - const Arg& ptr13 = no_arg, - const Arg& ptr14 = no_arg, - const Arg& ptr15 = no_arg, - const Arg& ptr16 = no_arg) const; - - bool FindAndConsume(StringPiece* input, - const Arg& ptr1 = no_arg, - const Arg& ptr2 = no_arg, - const Arg& ptr3 = no_arg, - const Arg& ptr4 = no_arg, - const Arg& ptr5 = no_arg, - const Arg& ptr6 = no_arg, - const Arg& ptr7 = no_arg, - const Arg& ptr8 = no_arg, - const Arg& ptr9 = no_arg, - const Arg& ptr10 = no_arg, - const Arg& ptr11 = no_arg, - const Arg& ptr12 = no_arg, - const Arg& ptr13 = no_arg, - const Arg& ptr14 = no_arg, - const Arg& ptr15 = no_arg, - const Arg& ptr16 = no_arg) const; - - bool Replace(const StringPiece& rewrite, - string *str) const; - - int GlobalReplace(const StringPiece& rewrite, - string *str) const; - - bool Extract(const StringPiece &rewrite, - const StringPiece &text, - string *out) const; - - // Escapes all potentially meaningful regexp characters in - // 'unquoted'. The returned string, used as a regular expression, - // will exactly match the original string. For example, - // 1.5-2.0? - // may become: - // 1\.5\-2\.0\? - // Note QuoteMeta behaves the same as perl's QuoteMeta function, - // *except* that it escapes the NUL character (\0) as backslash + 0, - // rather than backslash + NUL. - static string QuoteMeta(const StringPiece& unquoted); - - - /***** Generic matching interface *****/ - - // Type of match (TODO: Should be restructured as part of RE_Options) - enum Anchor { - UNANCHORED, // No anchoring - ANCHOR_START, // Anchor at start only - ANCHOR_BOTH // Anchor at start and end - }; - - // General matching routine. Stores the length of the match in - // "*consumed" if successful. 
- bool DoMatch(const StringPiece& text, - Anchor anchor, - int* consumed, - const Arg* const* args, int n) const; - - // Return the number of capturing subpatterns, or -1 if the - // regexp wasn't valid on construction. - int NumberOfCapturingGroups() const; - - // The default value for an argument, to indicate the end of the argument - // list. This must be used only in optional argument defaults. It should NOT - // be passed explicitly. Some people have tried to use it like this: - // - // FullMatch(x, y, &z, no_arg, &w); - // - // This is a mistake, and will not work. - static Arg no_arg; - - private: - - void Init(const string& pattern, const RE_Options* options); - void Cleanup(); - - // Match against "text", filling in "vec" (up to "vecsize" * 2/3) with - // pairs of integers for the beginning and end positions of matched - // text. The first pair corresponds to the entire matched text; - // subsequent pairs correspond, in order, to parentheses-captured - // matches. Returns the number of pairs (one more than the number of - // the last subpattern with a match) if matching was successful - // and zero if the match failed. - // I.e. for RE("(foo)|(bar)|(baz)") it will return 2, 3, and 4 when matching - // against "foo", "bar", and "baz" respectively. - // When matching RE("(foo)|hello") against "hello", it will return 1. - // But the values for all subpattern are filled in into "vec". - int TryMatch(const StringPiece& text, - int startpos, - Anchor anchor, - bool empty_ok, - int *vec, - int vecsize) const; - - // Append the "rewrite" string, with backslash subsitutions from "text" - // and "vec", to string "out". 
- bool Rewrite(string *out, - const StringPiece& rewrite, - const StringPiece& text, - int *vec, - int veclen) const; - - // internal implementation for DoMatch - bool DoMatchImpl(const StringPiece& text, - Anchor anchor, - int* consumed, - const Arg* const args[], - int n, - int* vec, - int vecsize) const; - - // Compile the regexp for the specified anchoring mode - pcre* Compile(Anchor anchor); - - string pattern_; - RE_Options options_; - pcre* re_full_; // For full matches - pcre* re_partial_; // For partial matches - const string* error_; // Error indicator (or points to empty string) -}; - -} // namespace pcrecpp - -#endif /* _PCRECPP_H */ diff --git a/src/pcre/pcrecpp_internal.h b/src/pcre/pcrecpp_internal.h deleted file mode 100644 index 827f9e04..00000000 --- a/src/pcre/pcrecpp_internal.h +++ /dev/null @@ -1,71 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* -Copyright (c) 2005, Google Inc. -All rights reserved. - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. 
- -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -#ifndef PCRECPP_INTERNAL_H -#define PCRECPP_INTERNAL_H - -/* When compiling a DLL for Windows, the exported symbols have to be declared -using some MS magic. I found some useful information on this web page: -http://msdn2.microsoft.com/en-us/library/y4h7bcy6(VS.80).aspx. According to the -information there, using __declspec(dllexport) without "extern" we have a -definition; with "extern" we have a declaration. The settings here override the -setting in pcre.h. 
We use: - - PCRECPP_EXP_DECL for declarations - PCRECPP_EXP_DEFN for definitions of exported functions - -*/ - -#ifndef PCRECPP_EXP_DECL -# ifdef _WIN32 -# ifndef PCRE_STATIC -# define PCRECPP_EXP_DECL extern __declspec(dllexport) -# define PCRECPP_EXP_DEFN __declspec(dllexport) -# else -# define PCRECPP_EXP_DECL extern -# define PCRECPP_EXP_DEFN -# endif -# else -# define PCRECPP_EXP_DECL extern -# define PCRECPP_EXP_DEFN -# endif -#endif - -#endif /* PCRECPP_INTERNAL_H */ - -/* End of pcrecpp_internal.h */ diff --git a/src/pcre/pcrecpp_unittest.cc b/src/pcre/pcrecpp_unittest.cc deleted file mode 100644 index 1fc01a04..00000000 --- a/src/pcre/pcrecpp_unittest.cc +++ /dev/null @@ -1,1316 +0,0 @@ -// -*- coding: utf-8 -*- -// -// Copyright (c) 2005 - 2010, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -// Author: Sanjay Ghemawat -// -// TODO: Test extractions for PartialMatch/Consume - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include /* for memset and strcmp */ -#include -#include -#include "pcrecpp.h" - -using std::string; -using pcrecpp::StringPiece; -using pcrecpp::RE; -using pcrecpp::RE_Options; -using pcrecpp::Hex; -using pcrecpp::Octal; -using pcrecpp::CRadix; - -static bool VERBOSE_TEST = false; - -// CHECK dies with a fatal error if condition is not true. It is *not* -// controlled by NDEBUG, so the check will be executed regardless of -// compilation mode. 
Therefore, it is safe to do things like: -// CHECK_EQ(fp->Write(x), 4) -#define CHECK(condition) do { \ - if (!(condition)) { \ - fprintf(stderr, "%s:%d: Check failed: %s\n", \ - __FILE__, __LINE__, #condition); \ - exit(1); \ - } \ -} while (0) - -#define CHECK_EQ(a, b) CHECK(a == b) - -static void Timing1(int num_iters) { - // Same pattern lots of times - RE pattern("ruby:\\d+"); - StringPiece p("ruby:1234"); - for (int j = num_iters; j > 0; j--) { - CHECK(pattern.FullMatch(p)); - } -} - -static void Timing2(int num_iters) { - // Same pattern lots of times - RE pattern("ruby:(\\d+)"); - int i; - for (int j = num_iters; j > 0; j--) { - CHECK(pattern.FullMatch("ruby:1234", &i)); - CHECK_EQ(i, 1234); - } -} - -static void Timing3(int num_iters) { - string text_string; - for (int j = num_iters; j > 0; j--) { - text_string += "this is another line\n"; - } - - RE line_matcher(".*\n"); - string line; - StringPiece text(text_string); - int counter = 0; - while (line_matcher.Consume(&text)) { - counter++; - } - printf("Matched %d lines\n", counter); -} - -#if 0 // uncomment this if you have a way of defining VirtualProcessSize() - -static void LeakTest() { - // Check for memory leaks - unsigned long long initial_size = 0; - for (int i = 0; i < 100000; i++) { - if (i == 50000) { - initial_size = VirtualProcessSize(); - printf("Size after 50000: %llu\n", initial_size); - } - char buf[100]; // definitely big enough - sprintf(buf, "pat%09d", i); - RE newre(buf); - } - uint64 final_size = VirtualProcessSize(); - printf("Size after 100000: %llu\n", final_size); - const double growth = double(final_size - initial_size) / final_size; - printf("Growth: %0.2f%%", growth * 100); - CHECK(growth < 0.02); // Allow < 2% growth -} - -#endif - -static void RadixTests() { - printf("Testing hex\n"); - -#define CHECK_HEX(type, value) \ - do { \ - type v; \ - CHECK(RE("([0-9a-fA-F]+)[uUlL]*").FullMatch(#value, Hex(&v))); \ - CHECK_EQ(v, 0x ## value); \ - 
CHECK(RE("([0-9a-fA-FxX]+)[uUlL]*").FullMatch("0x" #value, CRadix(&v))); \ - CHECK_EQ(v, 0x ## value); \ - } while(0) - - CHECK_HEX(short, 2bad); - CHECK_HEX(unsigned short, 2badU); - CHECK_HEX(int, dead); - CHECK_HEX(unsigned int, deadU); - CHECK_HEX(long, 7eadbeefL); - CHECK_HEX(unsigned long, deadbeefUL); -#ifdef HAVE_LONG_LONG - CHECK_HEX(long long, 12345678deadbeefLL); -#endif -#ifdef HAVE_UNSIGNED_LONG_LONG - CHECK_HEX(unsigned long long, cafebabedeadbeefULL); -#endif - -#undef CHECK_HEX - - printf("Testing octal\n"); - -#define CHECK_OCTAL(type, value) \ - do { \ - type v; \ - CHECK(RE("([0-7]+)[uUlL]*").FullMatch(#value, Octal(&v))); \ - CHECK_EQ(v, 0 ## value); \ - CHECK(RE("([0-9a-fA-FxX]+)[uUlL]*").FullMatch("0" #value, CRadix(&v))); \ - CHECK_EQ(v, 0 ## value); \ - } while(0) - - CHECK_OCTAL(short, 77777); - CHECK_OCTAL(unsigned short, 177777U); - CHECK_OCTAL(int, 17777777777); - CHECK_OCTAL(unsigned int, 37777777777U); - CHECK_OCTAL(long, 17777777777L); - CHECK_OCTAL(unsigned long, 37777777777UL); -#ifdef HAVE_LONG_LONG - CHECK_OCTAL(long long, 777777777777777777777LL); -#endif -#ifdef HAVE_UNSIGNED_LONG_LONG - CHECK_OCTAL(unsigned long long, 1777777777777777777777ULL); -#endif - -#undef CHECK_OCTAL - - printf("Testing decimal\n"); - -#define CHECK_DECIMAL(type, value) \ - do { \ - type v; \ - CHECK(RE("(-?[0-9]+)[uUlL]*").FullMatch(#value, &v)); \ - CHECK_EQ(v, value); \ - CHECK(RE("(-?[0-9a-fA-FxX]+)[uUlL]*").FullMatch(#value, CRadix(&v))); \ - CHECK_EQ(v, value); \ - } while(0) - - CHECK_DECIMAL(short, -1); - CHECK_DECIMAL(unsigned short, 9999); - CHECK_DECIMAL(int, -1000); - CHECK_DECIMAL(unsigned int, 12345U); - CHECK_DECIMAL(long, -10000000L); - CHECK_DECIMAL(unsigned long, 3083324652U); -#ifdef HAVE_LONG_LONG - CHECK_DECIMAL(long long, -100000000000000LL); -#endif -#ifdef HAVE_UNSIGNED_LONG_LONG - CHECK_DECIMAL(unsigned long long, 1234567890987654321ULL); -#endif - -#undef CHECK_DECIMAL - -} - -static void TestReplace() { - printf("Testing 
Replace\n"); - - struct ReplaceTest { - const char *regexp; - const char *rewrite; - const char *original; - const char *single; - const char *global; - int global_count; // the expected return value from ReplaceAll - }; - static const ReplaceTest tests[] = { - { "(qu|[b-df-hj-np-tv-z]*)([a-z]+)", - "\\2\\1ay", - "the quick brown fox jumps over the lazy dogs.", - "ethay quick brown fox jumps over the lazy dogs.", - "ethay ickquay ownbray oxfay umpsjay overay ethay azylay ogsday.", - 9 }, - { "\\w+", - "\\0-NOSPAM", - "paul.haahr@google.com", - "paul-NOSPAM.haahr@google.com", - "paul-NOSPAM.haahr-NOSPAM@google-NOSPAM.com-NOSPAM", - 4 }, - { "^", - "(START)", - "foo", - "(START)foo", - "(START)foo", - 1 }, - { "^", - "(START)", - "", - "(START)", - "(START)", - 1 }, - { "$", - "(END)", - "", - "(END)", - "(END)", - 1 }, - { "b", - "bb", - "ababababab", - "abbabababab", - "abbabbabbabbabb", - 5 }, - { "b", - "bb", - "bbbbbb", - "bbbbbbb", - "bbbbbbbbbbbb", - 6 }, - { "b+", - "bb", - "bbbbbb", - "bb", - "bb", - 1 }, - { "b*", - "bb", - "bbbbbb", - "bb", - "bbbb", - 2 }, - { "b*", - "bb", - "aaaaa", - "bbaaaaa", - "bbabbabbabbabbabb", - 6 }, - { "b*", - "bb", - "aa\naa\n", - "bbaa\naa\n", - "bbabbabb\nbbabbabb\nbb", - 7 }, - { "b*", - "bb", - "aa\raa\r", - "bbaa\raa\r", - "bbabbabb\rbbabbabb\rbb", - 7 }, - { "b*", - "bb", - "aa\r\naa\r\n", - "bbaa\r\naa\r\n", - "bbabbabb\r\nbbabbabb\r\nbb", - 7 }, - // Check empty-string matching (it's tricky!) 
- { "aa|b*", - "@", - "aa", - "@", - "@@", - 2 }, - { "b*|aa", - "@", - "aa", - "@aa", - "@@@", - 3 }, -#ifdef SUPPORT_UTF - { "b*", - "bb", - "\xE3\x83\x9B\xE3\x83\xBC\xE3\x83\xA0\xE3\x81\xB8", // utf8 - "bb\xE3\x83\x9B\xE3\x83\xBC\xE3\x83\xA0\xE3\x81\xB8", - "bb\xE3\x83\x9B""bb""\xE3\x83\xBC""bb""\xE3\x83\xA0""bb""\xE3\x81\xB8""bb", - 5 }, - { "b*", - "bb", - "\xE3\x83\x9B\r\n\xE3\x83\xBC\r\xE3\x83\xA0\n\xE3\x81\xB8\r\n", // utf8 - "bb\xE3\x83\x9B\r\n\xE3\x83\xBC\r\xE3\x83\xA0\n\xE3\x81\xB8\r\n", - ("bb\xE3\x83\x9B""bb\r\nbb""\xE3\x83\xBC""bb\rbb""\xE3\x83\xA0" - "bb\nbb""\xE3\x81\xB8""bb\r\nbb"), - 9 }, -#endif - { "", NULL, NULL, NULL, NULL, 0 } - }; - -#ifdef SUPPORT_UTF - const bool support_utf8 = true; -#else - const bool support_utf8 = false; -#endif - - for (const ReplaceTest *t = tests; t->original != NULL; ++t) { - RE re(t->regexp, RE_Options(PCRE_NEWLINE_CRLF).set_utf8(support_utf8)); - assert(re.error().empty()); - string one(t->original); - CHECK(re.Replace(t->rewrite, &one)); - CHECK_EQ(one, t->single); - string all(t->original); - const int replace_count = re.GlobalReplace(t->rewrite, &all); - CHECK_EQ(all, t->global); - CHECK_EQ(replace_count, t->global_count); - } - - // One final test: test \r\n replacement when we're not in CRLF mode - { - RE re("b*", RE_Options(PCRE_NEWLINE_CR).set_utf8(support_utf8)); - assert(re.error().empty()); - string all("aa\r\naa\r\n"); - CHECK_EQ(re.GlobalReplace("bb", &all), 9); - CHECK_EQ(all, string("bbabbabb\rbb\nbbabbabb\rbb\nbb")); - } - { - RE re("b*", RE_Options(PCRE_NEWLINE_LF).set_utf8(support_utf8)); - assert(re.error().empty()); - string all("aa\r\naa\r\n"); - CHECK_EQ(re.GlobalReplace("bb", &all), 9); - CHECK_EQ(all, string("bbabbabb\rbb\nbbabbabb\rbb\nbb")); - } - // TODO: test what happens when no PCRE_NEWLINE_* flag is set. - // Alas, the answer depends on how pcre was compiled. 
-} - -static void TestExtract() { - printf("Testing Extract\n"); - - string s; - - CHECK(RE("(.*)@([^.]*)").Extract("\\2!\\1", "boris@kremvax.ru", &s)); - CHECK_EQ(s, "kremvax!boris"); - - // check the RE interface as well - CHECK(RE(".*").Extract("'\\0'", "foo", &s)); - CHECK_EQ(s, "'foo'"); - CHECK(!RE("bar").Extract("'\\0'", "baz", &s)); - CHECK_EQ(s, "'foo'"); -} - -static void TestConsume() { - printf("Testing Consume\n"); - - string word; - - string s(" aaa b!@#$@#$cccc"); - StringPiece input(s); - - RE r("\\s*(\\w+)"); // matches a word, possibly proceeded by whitespace - CHECK(r.Consume(&input, &word)); - CHECK_EQ(word, "aaa"); - CHECK(r.Consume(&input, &word)); - CHECK_EQ(word, "b"); - CHECK(! r.Consume(&input, &word)); -} - -static void TestFindAndConsume() { - printf("Testing FindAndConsume\n"); - - string word; - - string s(" aaa b!@#$@#$cccc"); - StringPiece input(s); - - RE r("(\\w+)"); // matches a word - CHECK(r.FindAndConsume(&input, &word)); - CHECK_EQ(word, "aaa"); - CHECK(r.FindAndConsume(&input, &word)); - CHECK_EQ(word, "b"); - CHECK(r.FindAndConsume(&input, &word)); - CHECK_EQ(word, "cccc"); - CHECK(! 
r.FindAndConsume(&input, &word)); -} - -static void TestMatchNumberPeculiarity() { - printf("Testing match-number peculiarity\n"); - - string word1; - string word2; - string word3; - - RE r("(foo)|(bar)|(baz)"); - CHECK(r.PartialMatch("foo", &word1, &word2, &word3)); - CHECK_EQ(word1, "foo"); - CHECK_EQ(word2, ""); - CHECK_EQ(word3, ""); - CHECK(r.PartialMatch("bar", &word1, &word2, &word3)); - CHECK_EQ(word1, ""); - CHECK_EQ(word2, "bar"); - CHECK_EQ(word3, ""); - CHECK(r.PartialMatch("baz", &word1, &word2, &word3)); - CHECK_EQ(word1, ""); - CHECK_EQ(word2, ""); - CHECK_EQ(word3, "baz"); - CHECK(!r.PartialMatch("f", &word1, &word2, &word3)); - - string a; - CHECK(RE("(foo)|hello").FullMatch("hello", &a)); - CHECK_EQ(a, ""); -} - -static void TestRecursion() { - printf("Testing recursion\n"); - - // Get one string that passes (sometimes), one that never does. - string text_good("abcdefghijk"); - string text_bad("acdefghijkl"); - - // According to pcretest, matching text_good against (\w+)*b - // requires match_limit of at least 8192, and match_recursion_limit - // of at least 37. 
- - RE_Options options_ml; - options_ml.set_match_limit(8192); - RE re("(\\w+)*b", options_ml); - CHECK(re.PartialMatch(text_good) == true); - CHECK(re.PartialMatch(text_bad) == false); - CHECK(re.FullMatch(text_good) == false); - CHECK(re.FullMatch(text_bad) == false); - - options_ml.set_match_limit(1024); - RE re2("(\\w+)*b", options_ml); - CHECK(re2.PartialMatch(text_good) == false); // because of match_limit - CHECK(re2.PartialMatch(text_bad) == false); - CHECK(re2.FullMatch(text_good) == false); - CHECK(re2.FullMatch(text_bad) == false); - - RE_Options options_mlr; - options_mlr.set_match_limit_recursion(50); - RE re3("(\\w+)*b", options_mlr); - CHECK(re3.PartialMatch(text_good) == true); - CHECK(re3.PartialMatch(text_bad) == false); - CHECK(re3.FullMatch(text_good) == false); - CHECK(re3.FullMatch(text_bad) == false); - - options_mlr.set_match_limit_recursion(10); - RE re4("(\\w+)*b", options_mlr); - CHECK(re4.PartialMatch(text_good) == false); - CHECK(re4.PartialMatch(text_bad) == false); - CHECK(re4.FullMatch(text_good) == false); - CHECK(re4.FullMatch(text_bad) == false); -} - -// A meta-quoted string, interpreted as a pattern, should always match -// the original unquoted string. -static void TestQuoteMeta(string unquoted, RE_Options options = RE_Options()) { - string quoted = RE::QuoteMeta(unquoted); - RE re(quoted, options); - CHECK(re.FullMatch(unquoted)); -} - -// A string containing meaningful regexp characters, which is then meta- -// quoted, should not generally match a string the unquoted string does. -static void NegativeTestQuoteMeta(string unquoted, string should_not_match, - RE_Options options = RE_Options()) { - string quoted = RE::QuoteMeta(unquoted); - RE re(quoted, options); - CHECK(!re.FullMatch(should_not_match)); -} - -// Tests that quoted meta characters match their original strings, -// and that a few things that shouldn't match indeed do not. 
-static void TestQuotaMetaSimple() { - TestQuoteMeta("foo"); - TestQuoteMeta("foo.bar"); - TestQuoteMeta("foo\\.bar"); - TestQuoteMeta("[1-9]"); - TestQuoteMeta("1.5-2.0?"); - TestQuoteMeta("\\d"); - TestQuoteMeta("Who doesn't like ice cream?"); - TestQuoteMeta("((a|b)c?d*e+[f-h]i)"); - TestQuoteMeta("((?!)xxx).*yyy"); - TestQuoteMeta("(["); - TestQuoteMeta(string("foo\0bar", 7)); -} - -static void TestQuoteMetaSimpleNegative() { - NegativeTestQuoteMeta("foo", "bar"); - NegativeTestQuoteMeta("...", "bar"); - NegativeTestQuoteMeta("\\.", "."); - NegativeTestQuoteMeta("\\.", ".."); - NegativeTestQuoteMeta("(a)", "a"); - NegativeTestQuoteMeta("(a|b)", "a"); - NegativeTestQuoteMeta("(a|b)", "(a)"); - NegativeTestQuoteMeta("(a|b)", "a|b"); - NegativeTestQuoteMeta("[0-9]", "0"); - NegativeTestQuoteMeta("[0-9]", "0-9"); - NegativeTestQuoteMeta("[0-9]", "[9]"); - NegativeTestQuoteMeta("((?!)xxx)", "xxx"); -} - -static void TestQuoteMetaLatin1() { - TestQuoteMeta("3\xb2 = 9"); -} - -static void TestQuoteMetaUtf8() { -#ifdef SUPPORT_UTF - TestQuoteMeta("Pl\xc3\xa1\x63ido Domingo", pcrecpp::UTF8()); - TestQuoteMeta("xyz", pcrecpp::UTF8()); // No fancy utf8 - TestQuoteMeta("\xc2\xb0", pcrecpp::UTF8()); // 2-byte utf8 (degree symbol) - TestQuoteMeta("27\xc2\xb0 degrees", pcrecpp::UTF8()); // As a middle character - TestQuoteMeta("\xe2\x80\xb3", pcrecpp::UTF8()); // 3-byte utf8 (double prime) - TestQuoteMeta("\xf0\x9d\x85\x9f", pcrecpp::UTF8()); // 4-byte utf8 (music note) - TestQuoteMeta("27\xc2\xb0"); // Interpreted as Latin-1, but should still work - NegativeTestQuoteMeta("27\xc2\xb0", // 2-byte utf (degree symbol) - "27\\\xc2\\\xb0", - pcrecpp::UTF8()); -#endif -} - -static void TestQuoteMetaAll() { - printf("Testing QuoteMeta\n"); - TestQuotaMetaSimple(); - TestQuoteMetaSimpleNegative(); - TestQuoteMetaLatin1(); - TestQuoteMetaUtf8(); -} - -// -// Options tests contributed by -// Giuseppe Maxia, CTO, Stardata s.r.l. 
-// July 2005 -// -static void GetOneOptionResult( - const char *option_name, - const char *regex, - const char *str, - RE_Options options, - bool full, - string expected) { - - printf("Testing Option <%s>\n", option_name); - if(VERBOSE_TEST) - printf("/%s/ finds \"%s\" within \"%s\" \n", - regex, - expected.c_str(), - str); - string captured(""); - if (full) - RE(regex,options).FullMatch(str, &captured); - else - RE(regex,options).PartialMatch(str, &captured); - CHECK_EQ(captured, expected); -} - -static void TestOneOption( - const char *option_name, - const char *regex, - const char *str, - RE_Options options, - bool full, - bool assertive = true) { - - printf("Testing Option <%s>\n", option_name); - if (VERBOSE_TEST) - printf("'%s' %s /%s/ \n", - str, - (assertive? "matches" : "doesn't match"), - regex); - if (assertive) { - if (full) - CHECK(RE(regex,options).FullMatch(str)); - else - CHECK(RE(regex,options).PartialMatch(str)); - } else { - if (full) - CHECK(!RE(regex,options).FullMatch(str)); - else - CHECK(!RE(regex,options).PartialMatch(str)); - } -} - -static void Test_CASELESS() { - RE_Options options; - RE_Options options2; - - options.set_caseless(true); - TestOneOption("CASELESS (class)", "HELLO", "hello", options, false); - TestOneOption("CASELESS (class2)", "HELLO", "hello", options2.set_caseless(true), false); - TestOneOption("CASELESS (class)", "^[A-Z]+$", "Hello", options, false); - - TestOneOption("CASELESS (function)", "HELLO", "hello", pcrecpp::CASELESS(), false); - TestOneOption("CASELESS (function)", "^[A-Z]+$", "Hello", pcrecpp::CASELESS(), false); - options.set_caseless(false); - TestOneOption("no CASELESS", "HELLO", "hello", options, false, false); -} - -static void Test_MULTILINE() { - RE_Options options; - RE_Options options2; - const char *str = "HELLO\n" "cruel\n" "world\n"; - - options.set_multiline(true); - TestOneOption("MULTILINE (class)", "^cruel$", str, options, false); - TestOneOption("MULTILINE (class2)", "^cruel$", str, 
options2.set_multiline(true), false); - TestOneOption("MULTILINE (function)", "^cruel$", str, pcrecpp::MULTILINE(), false); - options.set_multiline(false); - TestOneOption("no MULTILINE", "^cruel$", str, options, false, false); -} - -static void Test_DOTALL() { - RE_Options options; - RE_Options options2; - const char *str = "HELLO\n" "cruel\n" "world"; - - options.set_dotall(true); - TestOneOption("DOTALL (class)", "HELLO.*world", str, options, true); - TestOneOption("DOTALL (class2)", "HELLO.*world", str, options2.set_dotall(true), true); - TestOneOption("DOTALL (function)", "HELLO.*world", str, pcrecpp::DOTALL(), true); - options.set_dotall(false); - TestOneOption("no DOTALL", "HELLO.*world", str, options, true, false); -} - -static void Test_DOLLAR_ENDONLY() { - RE_Options options; - RE_Options options2; - const char *str = "HELLO world\n"; - - TestOneOption("no DOLLAR_ENDONLY", "world$", str, options, false); - options.set_dollar_endonly(true); - TestOneOption("DOLLAR_ENDONLY 1", "world$", str, options, false, false); - TestOneOption("DOLLAR_ENDONLY 2", "world$", str, options2.set_dollar_endonly(true), false, false); -} - -static void Test_EXTRA() { - RE_Options options; - const char *str = "HELLO"; - - options.set_extra(true); - TestOneOption("EXTRA 1", "\\HELL\\O", str, options, true, false ); - TestOneOption("EXTRA 2", "\\HELL\\O", str, RE_Options().set_extra(true), true, false ); - options.set_extra(false); - TestOneOption("no EXTRA", "\\HELL\\O", str, options, true ); -} - -static void Test_EXTENDED() { - RE_Options options; - RE_Options options2; - const char *str = "HELLO world"; - - options.set_extended(true); - TestOneOption("EXTENDED (class)", "HELLO world", str, options, false, false); - TestOneOption("EXTENDED (class2)", "HELLO world", str, options2.set_extended(true), false, false); - TestOneOption("EXTENDED (class)", - "^ HE L{2} O " - "\\s+ " - "\\w+ $ ", - str, - options, - false); - - TestOneOption("EXTENDED (function)", "HELLO world", str, 
pcrecpp::EXTENDED(), false, false); - TestOneOption("EXTENDED (function)", - "^ HE L{2} O " - "\\s+ " - "\\w+ $ ", - str, - pcrecpp::EXTENDED(), - false); - - options.set_extended(false); - TestOneOption("no EXTENDED", "HELLO world", str, options, false); -} - -static void Test_NO_AUTO_CAPTURE() { - RE_Options options; - const char *str = "HELLO world"; - string captured; - - printf("Testing Option \n"); - if (VERBOSE_TEST) - printf("parentheses capture text\n"); - RE re("(world|universe)$", options); - CHECK(re.Extract("\\1", str , &captured)); - CHECK_EQ(captured, "world"); - options.set_no_auto_capture(true); - printf("testing Option \n"); - if (VERBOSE_TEST) - printf("parentheses do not capture text\n"); - re.Extract("\\1",str, &captured ); - CHECK_EQ(captured, "world"); -} - -static void Test_UNGREEDY() { - RE_Options options; - const char *str = "HELLO, 'this' is the 'world'"; - - options.set_ungreedy(true); - GetOneOptionResult("UNGREEDY 1", "('.*')", str, options, false, "'this'" ); - GetOneOptionResult("UNGREEDY 2", "('.*')", str, RE_Options().set_ungreedy(true), false, "'this'" ); - GetOneOptionResult("UNGREEDY", "('.*?')", str, options, false, "'this' is the 'world'" ); - - options.set_ungreedy(false); - GetOneOptionResult("no UNGREEDY", "('.*')", str, options, false, "'this' is the 'world'" ); - GetOneOptionResult("no UNGREEDY", "('.*?')", str, options, false, "'this'" ); -} - -static void Test_all_options() { - const char *str = "HELLO\n" "cruel\n" "world"; - RE_Options options; - options.set_all_options(PCRE_CASELESS | PCRE_DOTALL); - - TestOneOption("all_options (CASELESS|DOTALL)", "^hello.*WORLD", str , options, false); - options.set_all_options(0); - TestOneOption("all_options (0)", "^hello.*WORLD", str , options, false, false); - options.set_all_options(PCRE_MULTILINE | PCRE_EXTENDED); - - TestOneOption("all_options (MULTILINE|EXTENDED)", " ^ c r u e l $ ", str, options, false); - TestOneOption("all_options (MULTILINE|EXTENDED) with constructor", 
- " ^ c r u e l $ ", - str, - RE_Options(PCRE_MULTILINE | PCRE_EXTENDED), - false); - - TestOneOption("all_options (MULTILINE|EXTENDED) with concatenation", - " ^ c r u e l $ ", - str, - RE_Options() - .set_multiline(true) - .set_extended(true), - false); - - options.set_all_options(0); - TestOneOption("all_options (0)", "^ c r u e l $", str, options, false, false); - -} - -static void TestOptions() { - printf("Testing Options\n"); - Test_CASELESS(); - Test_MULTILINE(); - Test_DOTALL(); - Test_DOLLAR_ENDONLY(); - Test_EXTENDED(); - Test_NO_AUTO_CAPTURE(); - Test_UNGREEDY(); - Test_EXTRA(); - Test_all_options(); -} - -static void TestConstructors() { - printf("Testing constructors\n"); - - RE_Options options; - options.set_dotall(true); - const char *str = "HELLO\n" "cruel\n" "world"; - - RE orig("HELLO.*world", options); - CHECK(orig.FullMatch(str)); - - RE copy1(orig); - CHECK(copy1.FullMatch(str)); - - RE copy2("not a match"); - CHECK(!copy2.FullMatch(str)); - copy2 = copy1; - CHECK(copy2.FullMatch(str)); - copy2 = orig; - CHECK(copy2.FullMatch(str)); - - // Make sure when we assign to ourselves, nothing bad happens - orig = orig; - copy1 = copy1; - copy2 = copy2; - CHECK(orig.FullMatch(str)); - CHECK(copy1.FullMatch(str)); - CHECK(copy2.FullMatch(str)); -} - -int main(int argc, char** argv) { - // Treat any flag as --help - if (argc > 1 && argv[1][0] == '-') { - printf("Usage: %s [timing1|timing2|timing3 num-iters]\n" - " If 'timingX ###' is specified, run the given timing test\n" - " with the given number of iterations, rather than running\n" - " the default corectness test.\n", argv[0]); - return 0; - } - - if (argc > 1) { - if ( argc == 2 || atoi(argv[2]) == 0) { - printf("timing mode needs a num-iters argument\n"); - return 1; - } - if (!strcmp(argv[1], "timing1")) - Timing1(atoi(argv[2])); - else if (!strcmp(argv[1], "timing2")) - Timing2(atoi(argv[2])); - else if (!strcmp(argv[1], "timing3")) - Timing3(atoi(argv[2])); - else - printf("Unknown argument 
'%s'\n", argv[1]); - return 0; - } - - printf("PCRE C++ wrapper tests\n"); - printf("Testing FullMatch\n"); - - int i; - string s; - - /***** FullMatch with no args *****/ - - CHECK(RE("h.*o").FullMatch("hello")); - CHECK(!RE("h.*o").FullMatch("othello")); // Must be anchored at front - CHECK(!RE("h.*o").FullMatch("hello!")); // Must be anchored at end - CHECK(RE("a*").FullMatch("aaaa")); // Fullmatch with normal op - CHECK(RE("a*?").FullMatch("aaaa")); // Fullmatch with nongreedy op - CHECK(RE("a*?\\z").FullMatch("aaaa")); // Two unusual ops - - /***** FullMatch with args *****/ - - // Zero-arg - CHECK(RE("\\d+").FullMatch("1001")); - - // Single-arg - CHECK(RE("(\\d+)").FullMatch("1001", &i)); - CHECK_EQ(i, 1001); - CHECK(RE("(-?\\d+)").FullMatch("-123", &i)); - CHECK_EQ(i, -123); - CHECK(!RE("()\\d+").FullMatch("10", &i)); - CHECK(!RE("(\\d+)").FullMatch("1234567890123456789012345678901234567890", - &i)); - - // Digits surrounding integer-arg - CHECK(RE("1(\\d*)4").FullMatch("1234", &i)); - CHECK_EQ(i, 23); - CHECK(RE("(\\d)\\d+").FullMatch("1234", &i)); - CHECK_EQ(i, 1); - CHECK(RE("(-\\d)\\d+").FullMatch("-1234", &i)); - CHECK_EQ(i, -1); - CHECK(RE("(\\d)").PartialMatch("1234", &i)); - CHECK_EQ(i, 1); - CHECK(RE("(-\\d)").PartialMatch("-1234", &i)); - CHECK_EQ(i, -1); - - // String-arg - CHECK(RE("h(.*)o").FullMatch("hello", &s)); - CHECK_EQ(s, string("ell")); - - // StringPiece-arg - StringPiece sp; - CHECK(RE("(\\w+):(\\d+)").FullMatch("ruby:1234", &sp, &i)); - CHECK_EQ(sp.size(), 4); - CHECK(memcmp(sp.data(), "ruby", 4) == 0); - CHECK_EQ(i, 1234); - - // Multi-arg - CHECK(RE("(\\w+):(\\d+)").FullMatch("ruby:1234", &s, &i)); - CHECK_EQ(s, string("ruby")); - CHECK_EQ(i, 1234); - - // Ignore non-void* NULL arg - CHECK(RE("he(.*)lo").FullMatch("hello", (char*)NULL)); - CHECK(RE("h(.*)o").FullMatch("hello", (string*)NULL)); - CHECK(RE("h(.*)o").FullMatch("hello", (StringPiece*)NULL)); - CHECK(RE("(.*)").FullMatch("1234", (int*)NULL)); -#ifdef HAVE_LONG_LONG - 
CHECK(RE("(.*)").FullMatch("1234567890123456", (long long*)NULL)); -#endif - CHECK(RE("(.*)").FullMatch("123.4567890123456", (double*)NULL)); - CHECK(RE("(.*)").FullMatch("123.4567890123456", (float*)NULL)); - - // Fail on non-void* NULL arg if the match doesn't parse for the given type. - CHECK(!RE("h(.*)lo").FullMatch("hello", &s, (char*)NULL)); - CHECK(!RE("(.*)").FullMatch("hello", (int*)NULL)); - CHECK(!RE("(.*)").FullMatch("1234567890123456", (int*)NULL)); - CHECK(!RE("(.*)").FullMatch("hello", (double*)NULL)); - CHECK(!RE("(.*)").FullMatch("hello", (float*)NULL)); - - // Ignored arg - CHECK(RE("(\\w+)(:)(\\d+)").FullMatch("ruby:1234", &s, (void*)NULL, &i)); - CHECK_EQ(s, string("ruby")); - CHECK_EQ(i, 1234); - - // Type tests - { - char c; - CHECK(RE("(H)ello").FullMatch("Hello", &c)); - CHECK_EQ(c, 'H'); - } - { - unsigned char c; - CHECK(RE("(H)ello").FullMatch("Hello", &c)); - CHECK_EQ(c, static_cast('H')); - } - { - short v; - CHECK(RE("(-?\\d+)").FullMatch("100", &v)); CHECK_EQ(v, 100); - CHECK(RE("(-?\\d+)").FullMatch("-100", &v)); CHECK_EQ(v, -100); - CHECK(RE("(-?\\d+)").FullMatch("32767", &v)); CHECK_EQ(v, 32767); - CHECK(RE("(-?\\d+)").FullMatch("-32768", &v)); CHECK_EQ(v, -32768); - CHECK(!RE("(-?\\d+)").FullMatch("-32769", &v)); - CHECK(!RE("(-?\\d+)").FullMatch("32768", &v)); - } - { - unsigned short v; - CHECK(RE("(\\d+)").FullMatch("100", &v)); CHECK_EQ(v, 100); - CHECK(RE("(\\d+)").FullMatch("32767", &v)); CHECK_EQ(v, 32767); - CHECK(RE("(\\d+)").FullMatch("65535", &v)); CHECK_EQ(v, 65535); - CHECK(!RE("(\\d+)").FullMatch("65536", &v)); - } - { - int v; - static const int max_value = 0x7fffffff; - static const int min_value = -max_value - 1; - CHECK(RE("(-?\\d+)").FullMatch("100", &v)); CHECK_EQ(v, 100); - CHECK(RE("(-?\\d+)").FullMatch("-100", &v)); CHECK_EQ(v, -100); - CHECK(RE("(-?\\d+)").FullMatch("2147483647", &v)); CHECK_EQ(v, max_value); - CHECK(RE("(-?\\d+)").FullMatch("-2147483648", &v)); CHECK_EQ(v, min_value); - 
CHECK(!RE("(-?\\d+)").FullMatch("-2147483649", &v)); - CHECK(!RE("(-?\\d+)").FullMatch("2147483648", &v)); - } - { - unsigned int v; - static const unsigned int max_value = 0xfffffffful; - CHECK(RE("(\\d+)").FullMatch("100", &v)); CHECK_EQ(v, 100); - CHECK(RE("(\\d+)").FullMatch("4294967295", &v)); CHECK_EQ(v, max_value); - CHECK(!RE("(\\d+)").FullMatch("4294967296", &v)); - } -#ifdef HAVE_LONG_LONG -# if defined(__MINGW__) || defined(__MINGW32__) -# define LLD "%I64d" -# define LLU "%I64u" -# else -# define LLD "%lld" -# define LLU "%llu" -# endif - { - long long v; - static const long long max_value = 0x7fffffffffffffffLL; - static const long long min_value = -max_value - 1; - char buf[32]; // definitely big enough for a long long - - CHECK(RE("(-?\\d+)").FullMatch("100", &v)); CHECK_EQ(v, 100); - CHECK(RE("(-?\\d+)").FullMatch("-100",&v)); CHECK_EQ(v, -100); - - sprintf(buf, LLD, max_value); - CHECK(RE("(-?\\d+)").FullMatch(buf,&v)); CHECK_EQ(v, max_value); - - sprintf(buf, LLD, min_value); - CHECK(RE("(-?\\d+)").FullMatch(buf,&v)); CHECK_EQ(v, min_value); - - sprintf(buf, LLD, max_value); - assert(buf[strlen(buf)-1] != '9'); - buf[strlen(buf)-1]++; - CHECK(!RE("(-?\\d+)").FullMatch(buf, &v)); - - sprintf(buf, LLD, min_value); - assert(buf[strlen(buf)-1] != '9'); - buf[strlen(buf)-1]++; - CHECK(!RE("(-?\\d+)").FullMatch(buf, &v)); - } -#endif -#if defined HAVE_UNSIGNED_LONG_LONG && defined HAVE_LONG_LONG - { - unsigned long long v; - long long v2; - static const unsigned long long max_value = 0xffffffffffffffffULL; - char buf[32]; // definitely big enough for a unsigned long long - - CHECK(RE("(-?\\d+)").FullMatch("100",&v)); CHECK_EQ(v, 100); - CHECK(RE("(-?\\d+)").FullMatch("-100",&v2)); CHECK_EQ(v2, -100); - - sprintf(buf, LLU, max_value); - CHECK(RE("(-?\\d+)").FullMatch(buf,&v)); CHECK_EQ(v, max_value); - - assert(buf[strlen(buf)-1] != '9'); - buf[strlen(buf)-1]++; - CHECK(!RE("(-?\\d+)").FullMatch(buf, &v)); - } -#endif - { - float v; - 
CHECK(RE("(.*)").FullMatch("100", &v)); - CHECK(RE("(.*)").FullMatch("-100.", &v)); - CHECK(RE("(.*)").FullMatch("1e23", &v)); - } - { - double v; - CHECK(RE("(.*)").FullMatch("100", &v)); - CHECK(RE("(.*)").FullMatch("-100.", &v)); - CHECK(RE("(.*)").FullMatch("1e23", &v)); - } - - // Check that matching is fully anchored - CHECK(!RE("(\\d+)").FullMatch("x1001", &i)); - CHECK(!RE("(\\d+)").FullMatch("1001x", &i)); - CHECK(RE("x(\\d+)").FullMatch("x1001", &i)); CHECK_EQ(i, 1001); - CHECK(RE("(\\d+)x").FullMatch("1001x", &i)); CHECK_EQ(i, 1001); - - // Braces - CHECK(RE("[0-9a-f+.-]{5,}").FullMatch("0abcd")); - CHECK(RE("[0-9a-f+.-]{5,}").FullMatch("0abcde")); - CHECK(!RE("[0-9a-f+.-]{5,}").FullMatch("0abc")); - - // Complicated RE - CHECK(RE("foo|bar|[A-Z]").FullMatch("foo")); - CHECK(RE("foo|bar|[A-Z]").FullMatch("bar")); - CHECK(RE("foo|bar|[A-Z]").FullMatch("X")); - CHECK(!RE("foo|bar|[A-Z]").FullMatch("XY")); - - // Check full-match handling (needs '$' tacked on internally) - CHECK(RE("fo|foo").FullMatch("fo")); - CHECK(RE("fo|foo").FullMatch("foo")); - CHECK(RE("fo|foo$").FullMatch("fo")); - CHECK(RE("fo|foo$").FullMatch("foo")); - CHECK(RE("foo$").FullMatch("foo")); - CHECK(!RE("foo\\$").FullMatch("foo$bar")); - CHECK(!RE("fo|bar").FullMatch("fox")); - - // Uncomment the following if we change the handling of '$' to - // prevent it from matching a trailing newline - if (false) { - // Check that we don't get bitten by pcre's special handling of a - // '\n' at the end of the string matching '$' - CHECK(!RE("foo$").PartialMatch("foo\n")); - } - - // Number of args - int a[16]; - CHECK(RE("").FullMatch("")); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d){1}").FullMatch("1", - &a[0])); - CHECK_EQ(a[0], 1); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)").FullMatch("12", - &a[0], &a[1])); - CHECK_EQ(a[0], 1); - CHECK_EQ(a[1], 2); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)(\\d)").FullMatch("123", - &a[0], &a[1], &a[2])); - CHECK_EQ(a[0], 1); - 
CHECK_EQ(a[1], 2); - CHECK_EQ(a[2], 3); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)(\\d)(\\d)").FullMatch("1234", - &a[0], &a[1], &a[2], &a[3])); - CHECK_EQ(a[0], 1); - CHECK_EQ(a[1], 2); - CHECK_EQ(a[2], 3); - CHECK_EQ(a[3], 4); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)(\\d)(\\d)(\\d)").FullMatch("12345", - &a[0], &a[1], &a[2], - &a[3], &a[4])); - CHECK_EQ(a[0], 1); - CHECK_EQ(a[1], 2); - CHECK_EQ(a[2], 3); - CHECK_EQ(a[3], 4); - CHECK_EQ(a[4], 5); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)").FullMatch("123456", - &a[0], &a[1], &a[2], - &a[3], &a[4], &a[5])); - CHECK_EQ(a[0], 1); - CHECK_EQ(a[1], 2); - CHECK_EQ(a[2], 3); - CHECK_EQ(a[3], 4); - CHECK_EQ(a[4], 5); - CHECK_EQ(a[5], 6); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)").FullMatch("1234567", - &a[0], &a[1], &a[2], &a[3], - &a[4], &a[5], &a[6])); - CHECK_EQ(a[0], 1); - CHECK_EQ(a[1], 2); - CHECK_EQ(a[2], 3); - CHECK_EQ(a[3], 4); - CHECK_EQ(a[4], 5); - CHECK_EQ(a[5], 6); - CHECK_EQ(a[6], 7); - - memset(a, 0, sizeof(0)); - CHECK(RE("(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)" - "(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)(\\d)").FullMatch( - "1234567890123456", - &a[0], &a[1], &a[2], &a[3], - &a[4], &a[5], &a[6], &a[7], - &a[8], &a[9], &a[10], &a[11], - &a[12], &a[13], &a[14], &a[15])); - CHECK_EQ(a[0], 1); - CHECK_EQ(a[1], 2); - CHECK_EQ(a[2], 3); - CHECK_EQ(a[3], 4); - CHECK_EQ(a[4], 5); - CHECK_EQ(a[5], 6); - CHECK_EQ(a[6], 7); - CHECK_EQ(a[7], 8); - CHECK_EQ(a[8], 9); - CHECK_EQ(a[9], 0); - CHECK_EQ(a[10], 1); - CHECK_EQ(a[11], 2); - CHECK_EQ(a[12], 3); - CHECK_EQ(a[13], 4); - CHECK_EQ(a[14], 5); - CHECK_EQ(a[15], 6); - - /***** PartialMatch *****/ - - printf("Testing PartialMatch\n"); - - CHECK(RE("h.*o").PartialMatch("hello")); - CHECK(RE("h.*o").PartialMatch("othello")); - CHECK(RE("h.*o").PartialMatch("hello!")); - CHECK(RE("((((((((((((((((((((x))))))))))))))))))))").PartialMatch("x")); - - /***** other tests *****/ - - 
RadixTests(); - TestReplace(); - TestExtract(); - TestConsume(); - TestFindAndConsume(); - TestQuoteMetaAll(); - TestMatchNumberPeculiarity(); - - // Check the pattern() accessor - { - const string kPattern = "http://([^/]+)/.*"; - const RE re(kPattern); - CHECK_EQ(kPattern, re.pattern()); - } - - // Check RE error field. - { - RE re("foo"); - CHECK(re.error().empty()); // Must have no error - } - -#ifdef SUPPORT_UTF - // Check UTF-8 handling - { - printf("Testing UTF-8 handling\n"); - - // Three Japanese characters (nihongo) - const unsigned char utf8_string[] = { - 0xe6, 0x97, 0xa5, // 65e5 - 0xe6, 0x9c, 0xac, // 627c - 0xe8, 0xaa, 0x9e, // 8a9e - 0 - }; - const unsigned char utf8_pattern[] = { - '.', - 0xe6, 0x9c, 0xac, // 627c - '.', - 0 - }; - - // Both should match in either mode, bytes or UTF-8 - RE re_test1("........."); - CHECK(re_test1.FullMatch(utf8_string)); - RE re_test2("...", pcrecpp::UTF8()); - CHECK(re_test2.FullMatch(utf8_string)); - - // PH added these tests for leading option settings - - RE re_testZ0("(*CR)(*NO_START_OPT)........."); - CHECK(re_testZ0.FullMatch(utf8_string)); - -#ifdef SUPPORT_UTF - RE re_testZ1("(*UTF8)..."); - CHECK(re_testZ1.FullMatch(utf8_string)); - - RE re_testZ2("(*UTF)..."); - CHECK(re_testZ2.FullMatch(utf8_string)); - -#ifdef SUPPORT_UCP - RE re_testZ3("(*UCP)(*UTF)..."); - CHECK(re_testZ3.FullMatch(utf8_string)); - - RE re_testZ4("(*UCP)(*LIMIT_MATCH=1000)(*UTF)..."); - CHECK(re_testZ4.FullMatch(utf8_string)); - - RE re_testZ5("(*UCP)(*LIMIT_MATCH=1000)(*ANY)(*UTF)..."); - CHECK(re_testZ5.FullMatch(utf8_string)); -#endif -#endif - - // Check that '.' matches one byte or UTF-8 character - // according to the mode. 
- string ss; - RE re_test3("(.)"); - CHECK(re_test3.PartialMatch(utf8_string, &ss)); - CHECK_EQ(ss, string("\xe6")); - RE re_test4("(.)", pcrecpp::UTF8()); - CHECK(re_test4.PartialMatch(utf8_string, &ss)); - CHECK_EQ(ss, string("\xe6\x97\xa5")); - - // Check that string matches itself in either mode - RE re_test5(utf8_string); - CHECK(re_test5.FullMatch(utf8_string)); - RE re_test6(utf8_string, pcrecpp::UTF8()); - CHECK(re_test6.FullMatch(utf8_string)); - - // Check that pattern matches string only in UTF8 mode - RE re_test7(utf8_pattern); - CHECK(!re_test7.FullMatch(utf8_string)); - RE re_test8(utf8_pattern, pcrecpp::UTF8()); - CHECK(re_test8.FullMatch(utf8_string)); - } - - // Check that ungreedy, UTF8 regular expressions don't match when they - // oughtn't -- see bug 82246. - { - // This code always worked. - const char* pattern = "\\w+X"; - const string target = "a aX"; - RE match_sentence(pattern); - RE match_sentence_re(pattern, pcrecpp::UTF8()); - - CHECK(!match_sentence.FullMatch(target)); - CHECK(!match_sentence_re.FullMatch(target)); - } - - { - const char* pattern = "(?U)\\w+X"; - const string target = "a aX"; - RE match_sentence(pattern); - RE match_sentence_re(pattern, pcrecpp::UTF8()); - - CHECK(!match_sentence.FullMatch(target)); - CHECK(!match_sentence_re.FullMatch(target)); - } -#endif /* def SUPPORT_UTF */ - - printf("Testing error reporting\n"); - - { RE re("a\\1"); CHECK(!re.error().empty()); } - { - RE re("a[x"); - CHECK(!re.error().empty()); - } - { - RE re("a[z-a]"); - CHECK(!re.error().empty()); - } - { - RE re("a[[:foobar:]]"); - CHECK(!re.error().empty()); - } - { - RE re("a(b"); - CHECK(!re.error().empty()); - } - { - RE re("a\\"); - CHECK(!re.error().empty()); - } - - // Test that recursion is stopped - TestRecursion(); - - // Test Options - if (getenv("VERBOSE_TEST") != NULL) - VERBOSE_TEST = true; - TestOptions(); - - // Test the constructors - TestConstructors(); - - // Done - printf("OK\n"); - - return 0; -} diff --git 
a/src/pcre/pcrecpparg.h.in b/src/pcre/pcrecpparg.h.in deleted file mode 100644 index 61bcab54..00000000 --- a/src/pcre/pcrecpparg.h.in +++ /dev/null @@ -1,174 +0,0 @@ -// Copyright (c) 2005, Google Inc. -// All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions are -// met: -// -// * Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// * Redistributions in binary form must reproduce the above -// copyright notice, this list of conditions and the following disclaimer -// in the documentation and/or other materials provided with the -// distribution. -// * Neither the name of Google Inc. nor the names of its -// contributors may be used to endorse or promote products derived from -// this software without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -// Author: Sanjay Ghemawat - -#ifndef _PCRECPPARG_H -#define _PCRECPPARG_H - -#include // for NULL -#include - -#include - -namespace pcrecpp { - -class StringPiece; - -// Hex/Octal/Binary? 
- -// Special class for parsing into objects that define a ParseFrom() method -template -class _RE_MatchObject { - public: - static inline bool Parse(const char* str, int n, void* dest) { - if (dest == NULL) return true; - T* object = reinterpret_cast(dest); - return object->ParseFrom(str, n); - } -}; - -class PCRECPP_EXP_DEFN Arg { - public: - // Empty constructor so we can declare arrays of Arg - Arg(); - - // Constructor specially designed for NULL arguments - Arg(void*); - - typedef bool (*Parser)(const char* str, int n, void* dest); - -// Type-specific parsers -#define PCRE_MAKE_PARSER(type,name) \ - Arg(type* p) : arg_(p), parser_(name) { } \ - Arg(type* p, Parser parser) : arg_(p), parser_(parser) { } - - - PCRE_MAKE_PARSER(char, parse_char); - PCRE_MAKE_PARSER(unsigned char, parse_uchar); - PCRE_MAKE_PARSER(short, parse_short); - PCRE_MAKE_PARSER(unsigned short, parse_ushort); - PCRE_MAKE_PARSER(int, parse_int); - PCRE_MAKE_PARSER(unsigned int, parse_uint); - PCRE_MAKE_PARSER(long, parse_long); - PCRE_MAKE_PARSER(unsigned long, parse_ulong); -#if @pcre_have_long_long@ - PCRE_MAKE_PARSER(long long, parse_longlong); -#endif -#if @pcre_have_ulong_long@ - PCRE_MAKE_PARSER(unsigned long long, parse_ulonglong); -#endif - PCRE_MAKE_PARSER(float, parse_float); - PCRE_MAKE_PARSER(double, parse_double); - PCRE_MAKE_PARSER(std::string, parse_string); - PCRE_MAKE_PARSER(StringPiece, parse_stringpiece); - -#undef PCRE_MAKE_PARSER - - // Generic constructor - template Arg(T*, Parser parser); - // Generic constructor template - template Arg(T* p) - : arg_(p), parser_(_RE_MatchObject::Parse) { - } - - // Parse the data - bool Parse(const char* str, int n) const; - - private: - void* arg_; - Parser parser_; - - static bool parse_null (const char* str, int n, void* dest); - static bool parse_char (const char* str, int n, void* dest); - static bool parse_uchar (const char* str, int n, void* dest); - static bool parse_float (const char* str, int n, void* dest); - static bool 
parse_double (const char* str, int n, void* dest); - static bool parse_string (const char* str, int n, void* dest); - static bool parse_stringpiece (const char* str, int n, void* dest); - -#define PCRE_DECLARE_INTEGER_PARSER(name) \ - private: \ - static bool parse_ ## name(const char* str, int n, void* dest); \ - static bool parse_ ## name ## _radix( \ - const char* str, int n, void* dest, int radix); \ - public: \ - static bool parse_ ## name ## _hex(const char* str, int n, void* dest); \ - static bool parse_ ## name ## _octal(const char* str, int n, void* dest); \ - static bool parse_ ## name ## _cradix(const char* str, int n, void* dest) - - PCRE_DECLARE_INTEGER_PARSER(short); - PCRE_DECLARE_INTEGER_PARSER(ushort); - PCRE_DECLARE_INTEGER_PARSER(int); - PCRE_DECLARE_INTEGER_PARSER(uint); - PCRE_DECLARE_INTEGER_PARSER(long); - PCRE_DECLARE_INTEGER_PARSER(ulong); - PCRE_DECLARE_INTEGER_PARSER(longlong); - PCRE_DECLARE_INTEGER_PARSER(ulonglong); - -#undef PCRE_DECLARE_INTEGER_PARSER -}; - -inline Arg::Arg() : arg_(NULL), parser_(parse_null) { } -inline Arg::Arg(void* p) : arg_(p), parser_(parse_null) { } - -inline bool Arg::Parse(const char* str, int n) const { - return (*parser_)(str, n, arg_); -} - -// This part of the parser, appropriate only for ints, deals with bases -#define MAKE_INTEGER_PARSER(type, name) \ - inline Arg Hex(type* ptr) { \ - return Arg(ptr, Arg::parse_ ## name ## _hex); } \ - inline Arg Octal(type* ptr) { \ - return Arg(ptr, Arg::parse_ ## name ## _octal); } \ - inline Arg CRadix(type* ptr) { \ - return Arg(ptr, Arg::parse_ ## name ## _cradix); } - -MAKE_INTEGER_PARSER(short, short) /* */ -MAKE_INTEGER_PARSER(unsigned short, ushort) /* */ -MAKE_INTEGER_PARSER(int, int) /* Don't use semicolons */ -MAKE_INTEGER_PARSER(unsigned int, uint) /* after these statement */ -MAKE_INTEGER_PARSER(long, long) /* because they can cause */ -MAKE_INTEGER_PARSER(unsigned long, ulong) /* compiler warnings if */ -#if @pcre_have_long_long@ /* the checking level 
is */ -MAKE_INTEGER_PARSER(long long, longlong) /* turned up high enough. */ -#endif /* */ -#if @pcre_have_ulong_long@ /* */ -MAKE_INTEGER_PARSER(unsigned long long, ulonglong) /* */ -#endif - -#undef PCRE_IS_SET -#undef PCRE_SET_OR_CLEAR -#undef MAKE_INTEGER_PARSER - -} // namespace pcrecpp - - -#endif /* _PCRECPPARG_H */ diff --git a/src/pcre/pcredemo.c b/src/pcre/pcredemo.c deleted file mode 100644 index 946aba45..00000000 --- a/src/pcre/pcredemo.c +++ /dev/null @@ -1,406 +0,0 @@ -/************************************************* -* PCRE DEMONSTRATION PROGRAM * -*************************************************/ - -/* This is a demonstration program to illustrate the most straightforward ways -of calling the PCRE regular expression library from a C program. See the -pcresample documentation for a short discussion ("man pcresample" if you have -the PCRE man pages installed). - -In Unix-like environments, if PCRE is installed in your standard system -libraries, you should be able to compile this program using this command: - -gcc -Wall pcredemo.c -lpcre -o pcredemo - -If PCRE is not installed in a standard place, it is likely to be installed with -support for the pkg-config mechanism. If you have pkg-config, you can compile -this program using this command: - -gcc -Wall pcredemo.c `pkg-config --cflags --libs libpcre` -o pcredemo - -If you do not have pkg-config, you may have to use this: - -gcc -Wall pcredemo.c -I/usr/local/include -L/usr/local/lib \ - -R/usr/local/lib -lpcre -o pcredemo - -Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and -library files for PCRE are installed on your system. Only some operating -systems (e.g. Solaris) use the -R option. - -Building under Windows: - -If you want to statically link this program against a non-dll .a file, you must -define PCRE_STATIC before including pcre.h, otherwise the pcre_malloc() and -pcre_free() exported functions will be declared __declspec(dllimport), with -unwanted results. 
So in this environment, uncomment the following line. */ - -/* #define PCRE_STATIC */ - -#include -#include -#include - -#define OVECCOUNT 30 /* should be a multiple of 3 */ - - -int main(int argc, char **argv) -{ -pcre *re; -const char *error; -char *pattern; -char *subject; -unsigned char *name_table; -unsigned int option_bits; -int erroffset; -int find_all; -int crlf_is_newline; -int namecount; -int name_entry_size; -int ovector[OVECCOUNT]; -int subject_length; -int rc, i; -int utf8; - - -/************************************************************************** -* First, sort out the command line. There is only one possible option at * -* the moment, "-g" to request repeated matching to find all occurrences, * -* like Perl's /g option. We set the variable find_all to a non-zero value * -* if the -g option is present. Apart from that, there must be exactly two * -* arguments. * -**************************************************************************/ - -find_all = 0; -for (i = 1; i < argc; i++) - { - if (strcmp(argv[i], "-g") == 0) find_all = 1; - else break; - } - -/* After the options, we require exactly two arguments, which are the pattern, -and the subject string. */ - -if (argc - i != 2) - { - printf("Two arguments required: a regex and a subject string\n"); - return 1; - } - -pattern = argv[i]; -subject = argv[i+1]; -subject_length = (int)strlen(subject); - - -/************************************************************************* -* Now we are going to compile the regular expression pattern, and handle * -* and errors that are detected. 
* -*************************************************************************/ - -re = pcre_compile( - pattern, /* the pattern */ - 0, /* default options */ - &error, /* for error message */ - &erroffset, /* for error offset */ - NULL); /* use default character tables */ - -/* Compilation failed: print the error message and exit */ - -if (re == NULL) - { - printf("PCRE compilation failed at offset %d: %s\n", erroffset, error); - return 1; - } - - -/************************************************************************* -* If the compilation succeeded, we call PCRE again, in order to do a * -* pattern match against the subject string. This does just ONE match. If * -* further matching is needed, it will be done below. * -*************************************************************************/ - -rc = pcre_exec( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - subject, /* the subject string */ - subject_length, /* the length of the subject */ - 0, /* start at offset 0 in the subject */ - 0, /* default options */ - ovector, /* output vector for substring information */ - OVECCOUNT); /* number of elements in the output vector */ - -/* Matching failed: handle error cases */ - -if (rc < 0) - { - switch(rc) - { - case PCRE_ERROR_NOMATCH: printf("No match\n"); break; - /* - Handle other special cases if you like - */ - default: printf("Matching error %d\n", rc); break; - } - pcre_free(re); /* Release memory used for the compiled pattern */ - return 1; - } - -/* Match succeded */ - -printf("\nMatch succeeded at offset %d\n", ovector[0]); - - -/************************************************************************* -* We have found the first match within the subject string. If the output * -* vector wasn't big enough, say so. Then output any substrings that were * -* captured. 
* -*************************************************************************/ - -/* The output vector wasn't big enough */ - -if (rc == 0) - { - rc = OVECCOUNT/3; - printf("ovector only has room for %d captured substrings\n", rc - 1); - } - -/* Show substrings stored in the output vector by number. Obviously, in a real -application you might want to do things other than print them. */ - -for (i = 0; i < rc; i++) - { - char *substring_start = subject + ovector[2*i]; - int substring_length = ovector[2*i+1] - ovector[2*i]; - printf("%2d: %.*s\n", i, substring_length, substring_start); - } - - -/************************************************************************** -* That concludes the basic part of this demonstration program. We have * -* compiled a pattern, and performed a single match. The code that follows * -* shows first how to access named substrings, and then how to code for * -* repeated matches on the same subject. * -**************************************************************************/ - -/* See if there are any named substrings, and if so, show them by name. First -we have to extract the count of named parentheses from the pattern. */ - -(void)pcre_fullinfo( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - PCRE_INFO_NAMECOUNT, /* number of named substrings */ - &namecount); /* where to put the answer */ - -if (namecount <= 0) printf("No named substrings\n"); else - { - unsigned char *tabptr; - printf("Named substrings\n"); - - /* Before we can access the substrings, we must extract the table for - translating names to numbers, and the size of each entry in the table. 
*/ - - (void)pcre_fullinfo( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - PCRE_INFO_NAMETABLE, /* address of the table */ - &name_table); /* where to put the answer */ - - (void)pcre_fullinfo( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - PCRE_INFO_NAMEENTRYSIZE, /* size of each entry in the table */ - &name_entry_size); /* where to put the answer */ - - /* Now we can scan the table and, for each entry, print the number, the name, - and the substring itself. */ - - tabptr = name_table; - for (i = 0; i < namecount; i++) - { - int n = (tabptr[0] << 8) | tabptr[1]; - printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2, - ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]); - tabptr += name_entry_size; - } - } - - -/************************************************************************* -* If the "-g" option was given on the command line, we want to continue * -* to search for additional matches in the subject string, in a similar * -* way to the /g option in Perl. This turns out to be trickier than you * -* might think because of the possibility of matching an empty string. * -* What happens is as follows: * -* * -* If the previous match was NOT for an empty string, we can just start * -* the next match at the end of the previous one. * -* * -* If the previous match WAS for an empty string, we can't do that, as it * -* would lead to an infinite loop. Instead, a special call of pcre_exec() * -* is made with the PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED flags set. * -* The first of these tells PCRE that an empty string at the start of the * -* subject is not a valid match; other possibilities must be tried. The * -* second flag restricts PCRE to one match attempt at the initial string * -* position. 
If this match succeeds, an alternative to the empty string * -* match has been found, and we can print it and proceed round the loop, * -* advancing by the length of whatever was found. If this match does not * -* succeed, we still stay in the loop, advancing by just one character. * -* In UTF-8 mode, which can be set by (*UTF8) in the pattern, this may be * -* more than one byte. * -* * -* However, there is a complication concerned with newlines. When the * -* newline convention is such that CRLF is a valid newline, we must * -* advance by two characters rather than one. The newline convention can * -* be set in the regex by (*CR), etc.; if not, we must find the default. * -*************************************************************************/ - -if (!find_all) /* Check for -g */ - { - pcre_free(re); /* Release the memory used for the compiled pattern */ - return 0; /* Finish unless -g was given */ - } - -/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline -sequence. First, find the options with which the regex was compiled; extract -the UTF-8 state, and mask off all but the newline options. */ - -(void)pcre_fullinfo(re, NULL, PCRE_INFO_OPTIONS, &option_bits); -utf8 = option_bits & PCRE_UTF8; -option_bits &= PCRE_NEWLINE_CR|PCRE_NEWLINE_LF|PCRE_NEWLINE_CRLF| - PCRE_NEWLINE_ANY|PCRE_NEWLINE_ANYCRLF; - -/* If no newline options were set, find the default newline convention from the -build configuration. */ - -if (option_bits == 0) - { - int d; - (void)pcre_config(PCRE_CONFIG_NEWLINE, &d); - /* Note that these values are always the ASCII ones, even in - EBCDIC environments. CR = 13, NL = 10. */ - option_bits = (d == 13)? PCRE_NEWLINE_CR : - (d == 10)? PCRE_NEWLINE_LF : - (d == (13<<8 | 10))? PCRE_NEWLINE_CRLF : - (d == -2)? PCRE_NEWLINE_ANYCRLF : - (d == -1)? PCRE_NEWLINE_ANY : 0; - } - -/* See if CRLF is a valid newline sequence. 
*/ - -crlf_is_newline = - option_bits == PCRE_NEWLINE_ANY || - option_bits == PCRE_NEWLINE_CRLF || - option_bits == PCRE_NEWLINE_ANYCRLF; - -/* Loop for second and subsequent matches */ - -for (;;) - { - int options = 0; /* Normally no options */ - int start_offset = ovector[1]; /* Start at end of previous match */ - - /* If the previous match was for an empty string, we are finished if we are - at the end of the subject. Otherwise, arrange to run another match at the - same point to see if a non-empty match can be found. */ - - if (ovector[0] == ovector[1]) - { - if (ovector[0] == subject_length) break; - options = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED; - } - - /* Run the next matching operation */ - - rc = pcre_exec( - re, /* the compiled pattern */ - NULL, /* no extra data - we didn't study the pattern */ - subject, /* the subject string */ - subject_length, /* the length of the subject */ - start_offset, /* starting offset in the subject */ - options, /* options */ - ovector, /* output vector for substring information */ - OVECCOUNT); /* number of elements in the output vector */ - - /* This time, a result of NOMATCH isn't an error. If the value in "options" - is zero, it just means we have found all possible matches, so the loop ends. - Otherwise, it means we have failed to find a non-empty-string match at a - point where there was a previous empty-string match. In this case, we do what - Perl does: advance the matching position by one character, and continue. We - do this by setting the "end of previous match" offset, because that is picked - up at the top of the loop as the point at which to start again. - - There are two complications: (a) When CRLF is a valid newline sequence, and - the current position is just before it, advance by an extra byte. (b) - Otherwise we must ensure that we skip an entire UTF-8 character if we are in - UTF-8 mode. 
*/ - - if (rc == PCRE_ERROR_NOMATCH) - { - if (options == 0) break; /* All matches found */ - ovector[1] = start_offset + 1; /* Advance one byte */ - if (crlf_is_newline && /* If CRLF is newline & */ - start_offset < subject_length - 1 && /* we are at CRLF, */ - subject[start_offset] == '\r' && - subject[start_offset + 1] == '\n') - ovector[1] += 1; /* Advance by one more. */ - else if (utf8) /* Otherwise, ensure we */ - { /* advance a whole UTF-8 */ - while (ovector[1] < subject_length) /* character. */ - { - if ((subject[ovector[1]] & 0xc0) != 0x80) break; - ovector[1] += 1; - } - } - continue; /* Go round the loop again */ - } - - /* Other matching errors are not recoverable. */ - - if (rc < 0) - { - printf("Matching error %d\n", rc); - pcre_free(re); /* Release memory used for the compiled pattern */ - return 1; - } - - /* Match succeded */ - - printf("\nMatch succeeded again at offset %d\n", ovector[0]); - - /* The match succeeded, but the output vector wasn't big enough. */ - - if (rc == 0) - { - rc = OVECCOUNT/3; - printf("ovector only has room for %d captured substrings\n", rc - 1); - } - - /* As before, show substrings stored in the output vector by number, and then - also any named substrings. 
*/ - - for (i = 0; i < rc; i++) - { - char *substring_start = subject + ovector[2*i]; - int substring_length = ovector[2*i+1] - ovector[2*i]; - printf("%2d: %.*s\n", i, substring_length, substring_start); - } - - if (namecount <= 0) printf("No named substrings\n"); else - { - unsigned char *tabptr = name_table; - printf("Named substrings\n"); - for (i = 0; i < namecount; i++) - { - int n = (tabptr[0] << 8) | tabptr[1]; - printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2, - ovector[2*n+1] - ovector[2*n], subject + ovector[2*n]); - tabptr += name_entry_size; - } - } - } /* End of loop to find second and subsequent matches */ - -printf("\n"); -pcre_free(re); /* Release memory used for the compiled pattern */ -return 0; -} - -/* End of pcredemo.c */ diff --git a/src/pcre/pcregexp.pas b/src/pcre/pcregexp.pas deleted file mode 100644 index bb2b3da8..00000000 --- a/src/pcre/pcregexp.pas +++ /dev/null @@ -1,845 +0,0 @@ -{ - pcRegExp - Perl compatible regular expressions for Virtual Pascal - (c) 2001 Peter S. Voronov aka Chem O'Dun - - Based on PCRE library interface unit for Virtual Pascal. - (c) 2001 Alexander Tokarev - - The current PCRE version is: 3.7 - - This software may be distributed under the terms of the modified BSD license - Copyright (c) 2001, Alexander Tokarev - All rights reserved. - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - * Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - * Neither the name of the nor the names of its contributors - may be used to endorse or promote products derived from this software without - specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND - ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED - WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE - DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE - FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL - DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR - SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER - CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, - OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - - The PCRE library is written by: Philip Hazel - Copyright (c) 1997-2004 University of Cambridge - - AngelsHolocaust 4-11-04 updated to use version v5.0 - (INFO: this is regex-directed, NFA) - AH: 9-11-04 - pcre_free: removed var, pcre already gives the ptr, now - everything works as it should (no more crashes) - -> removed CheckRegExp because pcre handles errors perfectly - 10-11-04 - added pcError (errorhandling), pcInit - 13-11-04 - removed the ErrorPos = 0 check -> always print erroroffset - 17-10-05 - support for \1-\9 backreferences in TpcRegExp.GetReplStr - 17-02-06 - added RunTimeOptions: caller can set options while searching - 19-02-06 - added SearchOfs(): let PCRE use the complete string and offset - into the string itself - 20-12-06 - support for version 7.0 - 27.08.08 - support for v7.7 -} - -{$H+} {$DEFINE PCRE_3_7} {$DEFINE PCRE_5_0} {$DEFINE PCRE_7_0} {$DEFINE PCRE_7_7} - -Unit pcregexp; - -Interface - -uses objects; - -Type - PpcRegExp = ^TpcRegExp; -// TpcRegExp = object - TpcRegExp = object(TObject) - MatchesCount: integer; - RegExpC, RegExpExt : Pointer; - Matches:Pointer; - RegExp: shortstring; - SourceLen: integer; - PartialMatch : boolean; - Error : boolean; - ErrorMsg : Pchar; 
- ErrorPos : integer; - RunTimeOptions: Integer; // options which can be set by the caller - constructor Init(const ARegExp : shortstring; AOptions : integer; ALocale : Pointer); - function Search(AStr: Pchar; ALen : longint) : boolean; virtual; - function SearchNext( AStr: Pchar; ALen : longint) : boolean; virtual; - function SearchOfs ( AStr: Pchar; ALen, AOfs : longint) : boolean; virtual; - function MatchSub(ANom: integer; var Pos, Len : longint) : boolean; virtual; - function MatchFull(var Pos, Len : longint) : boolean; virtual; - function GetSubStr(ANom: integer; AStr: Pchar) : string; virtual; - function GetFullStr(AStr: Pchar) : string; virtual; - function GetReplStr(AStr: Pchar; const ARepl: string) : string; virtual; - function GetPreSubStr(AStr: Pchar) : string; virtual; - function GetPostSubStr(AStr: Pchar) : string; virtual; - function ErrorStr : string; virtual; - destructor Done; virtual; - end; - - function pcGrepMatch(WildCard, aStr: string; AOptions:integer; ALocale : Pointer): Boolean; - function pcGrepSub(WildCard, aStr, aRepl: string; AOptions:integer; ALocale : Pointer): string; - - function pcFastGrepMatch(WildCard, aStr: string): Boolean; - function pcFastGrepSub(WildCard, aStr, aRepl: string): string; - -{$IFDEF PCRE_5_0} - function pcGetVersion : pchar; -{$ENDIF} - - function pcError (var pRegExp : Pointer) : Boolean; - function pcInit (const Pattern: Shortstring; CaseSens: Boolean) : Pointer; - -Const { Options } - PCRE_CASELESS = $0001; - PCRE_MULTILINE = $0002; - PCRE_DOTALL = $0004; - PCRE_EXTENDED = $0008; - PCRE_ANCHORED = $0010; - PCRE_DOLLAR_ENDONLY = $0020; - PCRE_EXTRA = $0040; - PCRE_NOTBOL = $0080; - PCRE_NOTEOL = $0100; - PCRE_UNGREEDY = $0200; - PCRE_NOTEMPTY = $0400; -{$IFDEF PCRE_5_0} - PCRE_UTF8 = $0800; - PCRE_NO_AUTO_CAPTURE = $1000; - PCRE_NO_UTF8_CHECK = $2000; - PCRE_AUTO_CALLOUT = $4000; - PCRE_PARTIAL = $8000; -{$ENDIF} -{$IFDEF PCRE_7_0} - PCRE_DFA_SHORTEST = $00010000; - PCRE_DFA_RESTART = $00020000; - 
PCRE_FIRSTLINE = $00040000; - PCRE_DUPNAMES = $00080000; - PCRE_NEWLINE_CR = $00100000; - PCRE_NEWLINE_LF = $00200000; - PCRE_NEWLINE_CRLF = $00300000; - PCRE_NEWLINE_ANY = $00400000; - PCRE_NEWLINE_ANYCRLF = $00500000; - - PCRE_NEWLINE_BITS = PCRE_NEWLINE_CR or PCRE_NEWLINE_LF or PCRE_NEWLINE_ANY; - -{$ENDIF} -{$IFDEF PCRE_7_7} - PCRE_BSR_ANYCRLF = $00800000; - PCRE_BSR_UNICODE = $01000000; - PCRE_JAVASCRIPT_COMPAT= $02000000; -{$ENDIF} - - PCRE_COMPILE_ALLOWED_OPTIONS = PCRE_ANCHORED + PCRE_AUTO_CALLOUT + PCRE_CASELESS + - PCRE_DOLLAR_ENDONLY + PCRE_DOTALL + PCRE_EXTENDED + - PCRE_EXTRA + PCRE_MULTILINE + PCRE_NO_AUTO_CAPTURE + - PCRE_UNGREEDY + PCRE_UTF8 + PCRE_NO_UTF8_CHECK - {$IFDEF PCRE_7_0} - + PCRE_DUPNAMES + PCRE_FIRSTLINE + PCRE_NEWLINE_BITS - {$ENDIF} - {$IFDEF PCRE_7_7} - + PCRE_BSR_ANYCRLF + PCRE_BSR_UNICODE + PCRE_JAVASCRIPT_COMPAT - {$ENDIF} - ; - - PCRE_EXEC_ALLOWED_OPTIONS = PCRE_ANCHORED + PCRE_NOTBOL + PCRE_NOTEOL + - PCRE_NOTEMPTY + PCRE_NO_UTF8_CHECK + PCRE_PARTIAL - {$IFDEF PCRE_7_0} - + PCRE_NEWLINE_BITS - {$ENDIF} - {$IFDEF PCRE_7_7} - + PCRE_BSR_ANYCRLF + PCRE_BSR_UNICODE - {$ENDIF} - ; - -{$IFDEF PCRE_7_0} - PCRE_DFA_EXEC_ALLOWED_OPTIONS = PCRE_ANCHORED + PCRE_NOTBOL + PCRE_NOTEOL + - PCRE_NOTEMPTY + PCRE_NO_UTF8_CHECK + PCRE_PARTIAL + - PCRE_DFA_SHORTEST + PCRE_DFA_RESTART + - PCRE_NEWLINE_BITS - {$IFDEF PCRE_7_7} - + PCRE_BSR_ANYCRLF + PCRE_BSR_UNICODE - {$ENDIF} - ; -{$ENDIF} - -{ Exec-time and get/set-time error codes } - PCRE_ERROR_NOMATCH = -1; - PCRE_ERROR_NULL = -2; - PCRE_ERROR_BADOPTION = -3; - PCRE_ERROR_BADMAGIC = -4; - PCRE_ERROR_UNKNOWN_MODE = -5; - PCRE_ERROR_NOMEMORY = -6; - PCRE_ERROR_NOSUBSTRING = -7; -{$IFDEF PCRE_5_0} - PCRE_ERROR_MATCHLIMIT = -8; - PCRE_ERROR_CALLOUT = -9; { Never used by PCRE itself } - PCRE_ERROR_BADUTF8 = -10; - PCRE_ERROR_BADUTF8_OFFSET = -11; - PCRE_ERROR_PARTIAL = -12; - PCRE_ERROR_BADPARTIAL = -13; - PCRE_ERROR_INTERNAL = -14; - PCRE_ERROR_BADCOUNT = -15; -{$ENDIF} -{$IFDEF PCRE_7_0} - 
PCRE_ERROR_DFA_UITEM = -16; - PCRE_ERROR_DFA_UCOND = -17; - PCRE_ERROR_DFA_UMLIMIT = -18; - PCRE_ERROR_DFA_WSSIZE = -19; - PCRE_ERROR_DFA_RECURSE = -20; - PCRE_ERROR_RECURSIONLIMIT = -21; - PCRE_ERROR_NULLWSLIMIT = -22; - PCRE_ERROR_BADNEWLINE = -23; -{$ENDIF} - -{ Request types for pcre_fullinfo() } - - PCRE_INFO_OPTIONS = 0; - PCRE_INFO_SIZE = 1; - PCRE_INFO_CAPTURECOUNT = 2; - PCRE_INFO_BACKREFMAX = 3; - PCRE_INFO_FIRSTBYTE = 4; - PCRE_INFO_FIRSTCHAR = 4; { For backwards compatibility } - PCRE_INFO_FIRSTTABLE = 5; -{$IFDEF PCRE_5_0} - PCRE_INFO_LASTLITERAL = 6; - PCRE_INFO_NAMEENTRYSIZE = 7; - PCRE_INFO_NAMECOUNT = 8; - PCRE_INFO_NAMETABLE = 9; - PCRE_INFO_STUDYSIZE = 10; - PCRE_INFO_DEFAULT_TABLES = 11; -{$ENDIF PCRE_5_0} -{$IFDEF PCRE_7_7} - PCRE_INFO_OKPARTIAL = 12; - PCRE_INFO_JCHANGED = 13; - PCRE_INFO_HASCRORLF = 14; -{$ENDIF} - -{ Request types for pcre_config() } -{$IFDEF PCRE_5_0} - PCRE_CONFIG_UTF8 = 0; - PCRE_CONFIG_NEWLINE = 1; - PCRE_CONFIG_LINK_SIZE = 2; - PCRE_CONFIG_POSIX_MALLOC_THRESHOLD = 3; - PCRE_CONFIG_MATCH_LIMIT = 4; - PCRE_CONFIG_STACKRECURSE = 5; - PCRE_CONFIG_UNICODE_PROPERTIES = 6; -{$ENDIF PCRE_5_0} -{$IFDEF PCRE_7_0} - PCRE_CONFIG_MATCH_LIMIT_RECURSION = 7; -{$ENDIF} -{$IFDEF PCRE_7_7} - PCRE_CONFIG_BSR = 8; -{$ENDIF} - -{ Bit flags for the pcre_extra structure } -{$IFDEF PCRE_5_0} - PCRE_EXTRA_STUDY_DATA = $0001; - PCRE_EXTRA_MATCH_LIMIT = $0002; - PCRE_EXTRA_CALLOUT_DATA = $0004; - PCRE_EXTRA_TABLES = $0008; -{$ENDIF PCRE_5_0} -{$IFDEF PCRE_7_0} - PCRE_EXTRA_MATCH_LIMIT_RECURSION = $0010; -{$ENDIF} - -Const -// DefaultOptions : integer = 0; - DefaultLocaleTable : pointer = nil; - -{$IFDEF PCRE_5_0} -{ The structure for passing additional data to pcre_exec(). This is defined in -such as way as to be extensible. Always add new fields at the end, in order to -remain compatible. 
} - -type ppcre_extra = ^tpcre_extra; - tpcre_extra = record - flags : longint; { Bits for which fields are set } - study_data : pointer; { Opaque data from pcre_study() } - match_limit : longint; { Maximum number of calls to match() } - callout_data : pointer; { Data passed back in callouts } - tables : pointer; { Pointer to character tables } - match_limit_recursion: longint; { Max recursive calls to match() } - end; - -type ppcre_callout_block = ^pcre_callout_block; - pcre_callout_block = record - version, - (* ------------------------ Version 0 ------------------------------- *) - callout_number : integer; - offset_vector : pointer; - subject : pchar; - subject_length, start_match, current_position, capture_top, - capture_last : integer; - callout_data : pointer; - (* ------------------- Added for Version 1 -------------------------- *) - pattern_position, next_item_length : integer; - end; -{$ENDIF PCRE_5_0} - -{$OrgName+} -{$IFDEF VIRTUALPASCAL} {&Cdecl+} {$ENDIF VIRTUALPASCAL} - - { local replacement of external pcre memory management functions } - function pcre_malloc( size : integer ) : pointer; - procedure pcre_free( {var} p : pointer ); -{$IFDEF PCRE_5_0} - const pcre_stack_malloc: function ( size : integer ): pointer = pcre_malloc; - pcre_stack_free: procedure ( {var} p : pointer ) = pcre_free; - function pcre_callout(var p : ppcre_callout_block) : integer; -{$ENDIF PCRE_5_0} -{$IFDEF VIRTUALPASCAL} {&Cdecl-} {$ENDIF VIRTUALPASCAL} - -Implementation - -Uses strings, collect, messages, dnapp, commands, advance0, stringsx - {$IFDEF VIRTUALPASCAL} ,vpsyslow {$ENDIF VIRTUALPASCAL}; - -Const - MAGIC_NUMBER = $50435245; { 'PCRE' } - MAX_MATCHES = 90; { changed in 3.5 version; should be divisible by 3, was 64} - -Type - PMatchArray = ^TMatchArray; - TMatchArray = array[0..( MAX_MATCHES * 3 )] of integer; - - PRegExpCollection = ^TRegExpCollection; - TRegExpCollection = object(TSortedCollection) - MaxRegExp : integer; - SearchRegExp : shortstring; - 
CompareModeInsert : boolean; - constructor Init(AMaxRegExp:integer); - procedure FreeItem(P: Pointer); virtual; - function Compare(P1, P2: Pointer): Integer; virtual; - function Find(ARegExp:shortstring;var P: PpcRegExp):boolean; virtual; - function CheckNew(ARegExp:shortstring):PpcRegExp;virtual; - end; - -Var - PRegExpCache : PRegExpCollection; - - -{$IFDEF VIRTUALPASCAL} {&Cdecl+} {$ENDIF VIRTUALPASCAL} - - { imported original pcre functions } - - function pcre_compile( const pattern : PChar; options : integer; - var errorptr : PChar; var erroroffset : integer; - const tables : PChar ) : pointer {pcre}; external; -{$IFDEF PCRE_7_0} - function pcre_compile2( const pattern : PChar; options : integer; - var errorcodeptr : Integer; - var errorptr : PChar; var erroroffset : integer; - const tables : PChar ) : pointer {pcre}; external; -{$ENDIF} -{$IFDEF PCRE_5_0} - function pcre_config( what : integer; where : pointer) : integer; external; - function pcre_copy_named_substring( const code : pointer {pcre}; - const subject : pchar; - var ovector : integer; - stringcount : integer; - const stringname : pchar; - var buffer : pchar; - size : integer) : integer; external; - function pcre_copy_substring( const subject : pchar; var ovector : integer; - stringcount, stringnumber : integer; - var buffer : pchar; size : integer ) - : integer; external; - function pcre_exec( const argument_re : pointer {pcre}; - const extra_data : pointer {pcre_extra}; -{$ELSE} - function pcre_exec( const external_re : pointer; - const external_extra : pointer; -{$ENDIF} - const subject : PChar; - length, start_offset, options : integer; - offsets : pointer; - offsetcount : integer ) : integer; external; -{$IFDEF PCRE_7_0} - function pcre_dfa_exec( const argument_re : pointer {pcre}; - const extra_data : pointer {pcre_extra}; - const subject : pchar; - length, start_offset, options : integer; - offsets : pointer; - offsetcount : integer; - workspace : pointer; - wscount : integer ) : integer; 
external; -{$ENDIF} -{$IFDEF PCRE_5_0} - procedure pcre_free_substring( const p : pchar ); external; - procedure pcre_free_substring_list( var p : pchar ); external; - function pcre_fullinfo( const argument_re : pointer {pcre}; - const extra_data : pointer {pcre_extra}; - what : integer; - where : pointer ) : integer; external; - function pcre_get_named_substring( const code : pointer {pcre}; - const subject : pchar; - var ovector : integer; - stringcount : integer; - const stringname : pchar; - var stringptr : pchar ) : integer; external; - function pcre_get_stringnumber( const code : pointer {pcre}; - const stringname : pchar ) : integer; external; - function pcre_get_stringtable_entries( const code : pointer {pcre}; - const stringname : pchar; - var firstptr, - lastptr : pchar ) : integer; external; - function pcre_get_substring( const subject : pchar; var ovector : integer; - stringcount, stringnumber : integer; - var stringptr : pchar ) : integer; external; - function pcre_get_substring_list( const subject : pchar; var ovector : integer; - stringcount : integer; - listptr : pointer {const char ***listptr}) : integer; external; - function pcre_info( const argument_re : pointer {pcre}; - var optptr : integer; - var first_byte : integer ) : integer; external; - function pcre_maketables : pchar; external; -{$ENDIF} -{$IFDEF PCRE_7_0} - function pcre_refcount( const argument_re : pointer {pcre}; - adjust : integer ) : pchar; external; -{$ENDIF} - function pcre_study( const external_re : pointer {pcre}; - options : integer; - var errorptr : PChar ) : pointer {pcre_extra}; external; -{$IFDEF PCRE_5_0} - function pcre_version : pchar; external; -{$ENDIF} - - function pcre_malloc( size : integer ) : pointer; - begin - GetMem( result, size ); - end; - - procedure pcre_free( {var} p : pointer ); - begin - if (p <> nil) then - FreeMem( p, 0 ); - {@p := nil;} - end; - -{$IFDEF PCRE_5_0} -(* Called from PCRE as a result of the (?C) item. 
We print out where we are in -the match. Yield zero unless more callouts than the fail count, or the callout -data is not zero. *) - - function pcre_callout; - begin - end; -{$ENDIF} - -{$IFDEF VIRTUALPASCAL} {&Cdecl-} {$ENDIF VIRTUALPASCAL} - -// Always include the newest version of the library -{$IFDEF PCRE_7_7} - {$L pcre77.lib} -{$ELSE} - {$IFDEF PCRE_7_0} - {$L pcre70.lib} - {$ELSE} - {$IFDEF PCRE_5_0} - {$L pcre50.lib} - {$ELSE} - {$IFDEF PCRE_3_7} - {$L pcre37.lib} - {$ENDIF PCRE_3_7} - {$ENDIF PCRE_5_0} - {$ENDIF PCRE_7_0} -{$ENDIF PCRE_7_7} - -{TpcRegExp} - - constructor TpcRegExp.Init(const ARegExp:shortstring; AOptions:integer; ALocale : Pointer); - var - pRegExp : PChar; - begin - RegExp:=ARegExp; - RegExpC:=nil; - RegExpExt:=nil; - Matches:=nil; - MatchesCount:=0; - Error:=true; - ErrorMsg:=nil; - ErrorPos:=0; - RunTimeOptions := 0; - if length(RegExp) < 255 then - begin - RegExp[length(RegExp)+1]:=#0; - pRegExp:=@RegExp[1]; - end - else - begin - GetMem(pRegExp,length(RegExp)+1); - pRegExp:=strpcopy(pRegExp,RegExp); - end; - RegExpC := pcre_compile( pRegExp, - AOptions and PCRE_COMPILE_ALLOWED_OPTIONS, - ErrorMsg, ErrorPos, ALocale); - if length(RegExp) = 255 then - StrDispose(pRegExp); - if RegExpC = nil then - exit; - ErrorMsg:=nil; - RegExpExt := pcre_study( RegExpC, 0, ErrorMsg ); - if (RegExpExt = nil) and (ErrorMsg <> nil) then - begin - pcre_free(RegExpC); - exit; - end; - GetMem(Matches,SizeOf(TMatchArray)); - Error:=false; - end; - - destructor TpcRegExp.Done; - begin - if RegExpC <> nil then - pcre_free(RegExpC); - if RegExpExt <> nil then - pcre_free(RegExpExt); - if Matches <> nil then - FreeMem(Matches,SizeOf(TMatchArray)); - end; - - function TpcRegExp.SearchNext( AStr: Pchar; ALen : longint ) : boolean; - var Options: Integer; - begin // must handle PCRE_ERROR_PARTIAL here - Options := (RunTimeOptions or startup.MiscMultiData.cfgRegEx.DefaultOptions) and - PCRE_EXEC_ALLOWED_OPTIONS; - if MatchesCount > 0 then - MatchesCount:=pcre_exec( 
RegExpC, RegExpExt, AStr, ALen, PMatchArray(Matches)^[1], - Options, Matches, MAX_MATCHES ) else - MatchesCount:=pcre_exec( RegExpC, RegExpExt, AStr, ALen, 0, - Options, Matches, MAX_MATCHES ); -{ if MatchesCount = 0 then - MatchesCount := MatchesCount div 3;} - PartialMatch := MatchesCount = PCRE_ERROR_PARTIAL; - SearchNext := MatchesCount > 0; - end; - - function TpcRegExp.Search( AStr: Pchar; ALen : longint):boolean; - begin - MatchesCount:=0; - Search:=SearchNext(AStr,ALen); - SourceLen:=ALen; - end; - - function TpcRegExp.SearchOfs( AStr: Pchar; ALen, AOfs: longint ) : boolean; - var Options: Integer; - begin - MatchesCount:=0; - Options := (RunTimeOptions or startup.MiscMultiData.cfgRegEx.DefaultOptions) and - PCRE_EXEC_ALLOWED_OPTIONS; - MatchesCount:=pcre_exec( RegExpC, RegExpExt, AStr, ALen, AOfs, - Options, Matches, MAX_MATCHES ); - PartialMatch := MatchesCount = PCRE_ERROR_PARTIAL; - SearchOfs := MatchesCount > 0; - SourceLen := ALen-AOfs; - end; - - function TpcRegExp.MatchSub(ANom:integer; var Pos,Len:longint):boolean; - begin - if (MatchesCount > 0) and (ANom <= (MatchesCount-1)) then - begin - ANom:=ANom*2; - Pos:=PMatchArray(Matches)^[ANom]; - Len:=PMatchArray(Matches)^[ANom+1]-Pos; - MatchSub:=true; - end - else - MatchSub:=false; - end; - - function TpcRegExp.MatchFull(var Pos,Len:longint):boolean; - begin - MatchFull:=MatchSub(0,Pos,Len); - end; - - function TpcRegExp.GetSubStr(ANom: integer; AStr: Pchar):string; - var - s: ansistring; - pos,len: longint; - begin - s:=''; - if MatchSub(ANom, pos, len) then - begin - setlength(s, len); - Move(AStr[pos], s[1], len); - end; - GetSubStr:=s; - end; - - function TpcRegExp.GetPreSubStr(AStr: Pchar):string; - var - s: ansistring; - l: longint; - begin - s:=''; - if (MatchesCount > 0) then - begin - l:=PMatchArray(Matches)^[0]-1; - if l > 0 then - begin - setlength(s,l); - Move(AStr[1],s[1],l); - end; - end; - GetPreSubStr:=s; - end; - - function TpcRegExp.GetPostSubStr(AStr: Pchar):string; - var - s: 
ansistring; - l: longint; - ANom: integer; - begin - s:=''; - if (MatchesCount > 0) then - begin - ANom:=(MatchesCount-1){*2} shl 1; - l:=SourceLen-PMatchArray(Matches)^[ANom+1]+1; - if l > 0 then - begin - setlength(s,l); - Move(AStr[PMatchArray(Matches)^[ANom+1]],s[1],l); - end; - end; - GetPostSubStr:=s; - end; - - - function TpcRegExp.GetFullStr(AStr: Pchar):string; - var - s: ansistring; - l: longint; - begin - GetFullStr:=GetSubStr(0,AStr); - end; - - function TpcRegExp.GetReplStr(AStr: Pchar; const ARepl: string):string; - var - s: ansistring; - l,i,lasti: longint; - begin - l:=length(ARepl); - i:=1; - lasti:=1; - s:=''; - while i <= l do - begin - case ARepl[i] of - '\' : - begin - if i < l then - begin - s:=s+copy(ARepl,lasti,i-lasti){+ARepl[i+1]}; - {AH 17-10-05 support for POSIX \1-\9 backreferences} - case ARepl[i+1] of - '0' : s:=s+GetFullStr(AStr); - '1'..'9' : s:=s+GetSubStr(ord(ARepl[i+1])-ord('0'),AStr); - else s:=s+ARepl[i+1]; // copy the escaped character - end; - end; - inc(i); - lasti:=i+1; - end; - '$' : - begin - if i < l then - begin - s:=s+copy(ARepl,lasti,i-lasti); - case ARepl[i+1] of - '&' : s:=s+GetFullStr(AStr); - '1'..'9' : s:=s+GetSubStr(ord(ARepl[i+1])-ord('0'),AStr); - '`' : s:=s+GetPreSubStr(AStr); - #39 : s:=s+GetPostSubStr(AStr); - end; - end; - inc(i); - lasti:=i+1; - end; - end; - inc(i); - end; - if lasti <= {AH 25-10-2004 added =, else l==1 won't work} l then - s:=s+copy(ARepl,lasti,l-lasti+1); - GetReplStr:=s; - end; - - function TpcRegExp.ErrorStr:string; - begin - ErrorStr:=StrPas(ErrorMsg); - end; - -{TRegExpCollection} - -constructor TRegExpCollection.Init(AMaxRegExp: integer); -begin - Inherited Init(1,1); - MaxRegExp:=AMaxRegExp; - CompareModeInsert:=true; -end; - -procedure TRegExpCollection.FreeItem(P: Pointer); -begin - if P <> nil then - begin - Dispose(PpcRegExp(P),Done); - end; -end; - -function TRegExpCollection.Compare(P1, P2: Pointer): Integer; -//var -// l,l1,l2,i : byte; -//// wPos: pchar; -begin - if 
CompareModeInsert then - begin -// l1:=length(PpcRegExp(P1)^.RegExp); -// l2:=length(PpcRegExp(P2)^.RegExp); -// if l1 > l2 then l:=l2 else -// l:=l1; -// for i:=1 to l do -// if PpcRegExp(P1).RegExp[i] <> PpcRegExp(P2).RegExp[i] then break; -// if i <=l then -// Compare:=ord(PpcRegExp(P1).RegExp[i])-ord(PpcRegExp(P2).RegExp[i]) else -// Compare:=l1-l2; - Compare := stringsx.PasStrCmp(PpcRegExp(P1).RegExp, PpcRegExp(P2).RegExp, False); - end - else - begin -// l1:=length(PpcRegExp(P1)^.RegExp); -// l2:=length(SearchRegExp); -// if l1 > l2 then l:=l2 else -// l:=l1; -// for i:=1 to l do -// if PpcRegExp(P1).RegExp[i] <> SearchRegExp[i] then -// begin -// Compare:=ord(PpcRegExp(P1).RegExp[i])-ord(SearchRegExp[i]); -// break; -// end; -// if i > l then Compare:=l1-l2; - Compare := stringsx.PasStrCmp(PpcRegExp(P1).RegExp, SearchRegExp, False); - end; -end; - -function TRegExpCollection.Find(ARegExp:shortstring;var P: PpcRegExp):boolean; -var I : integer; -begin - CompareModeInsert:=false; - SearchRegExp:=ARegExp; - if Search(nil,I) then - begin - P:=PpcRegExp(At(I)); - Find:=true; - end - else - begin - P:=nil; - Find:=false; - end; - CompareModeInsert:=true; -end; - -function TRegExpCollection.CheckNew(ARegExp:shortstring):PpcRegExp; -var - P : PpcRegExp; -begin - if not Find(ARegExp,P) then - begin - if Count = MaxRegExp then - AtFree(0); - P:=New(ppcRegExp,Init(ARegExp,PCRE_CASELESS,nil)); - Insert(P); - end; - CheckNew:=P; -end; - -function pcGrepMatch(WildCard, aStr: string; AOptions:integer; ALocale : Pointer): Boolean; -var - PpcRE:PpcRegExp; -begin - PpcRE:=New(ppcRegExp,Init(WildCard,AOptions,Alocale)); - pcGrepMatch:=PpcRE^.Search(pchar(AStr),Length(AStr)); - Dispose(PpcRE,Done); -end; - -function pcGrepSub(WildCard, aStr, aRepl: string; AOptions:integer; ALocale : Pointer): string; -var - PpcRE:PpcRegExp; -begin - PpcRE:=New(ppcRegExp,Init(WildCard,AOptions,Alocale)); - if PpcRE^.Search(pchar(AStr),Length(AStr)) then - 
pcGrepSub:=PpcRE^.GetReplStr(pchar(AStr),ARepl) - else - pcGrepSub:=''; - Dispose(PpcRE,Done); -end; - -function pcFastGrepMatch(WildCard, aStr: string): Boolean; -var - PpcRE:PpcRegExp; -begin - PpcRE:=PRegExpCache^.CheckNew(WildCard); - pcFastGrepMatch:=PpcRE^.Search(pchar(AStr),Length(AStr)); -end; - -function pcFastGrepSub(WildCard, aStr, aRepl: string): string; -var - PpcRE:PpcRegExp; -begin - PpcRE:=PRegExpCache^.CheckNew(WildCard); - if PpcRE^.Search(pchar(AStr),Length(AStr)) then - pcFastGrepSub:=PpcRE^.GetReplStr(pchar(AStr),ARepl) - else - pcFastGrepSub:=''; -end; - -{$IFDEF PCRE_5_0} -function pcGetVersion : pchar; assembler; {$FRAME-}{$USES none} -asm - call pcre_version -end; -{$ENDIF PCRE_5_0} - -function pcError; -var P: ppcRegExp absolute pRegExp; -begin - Result := (P = nil) or P^.Error; - If Result and (P <> nil) then - begin -{ if P^.ErrorPos = 0 then - MessageBox(GetString(erRegExpCompile)+'"'+P^.ErrorStr+'"', nil,mfConfirmation+mfOkButton) - else} - MessageBox(GetString(erRegExpCompile)+'"'+P^.ErrorStr+'"'+GetString(erRegExpCompPos), - @P^.ErrorPos,mfConfirmation+mfOkButton); - Dispose(P, Done); - P:=nil; - end; -end; - -function pcInit; -var Options : Integer; -begin - If CaseSens then Options := 0 else Options := PCRE_CASELESS; - Result := New( PpcRegExp, Init( Pattern, - {DefaultOptions} - startup.MiscMultiData.cfgRegEx.DefaultOptions or Options, - DefaultLocaleTable) ); -end; - -Initialization - PRegExpCache:=New(PRegExpCollection,Init(64)); -Finalization - Dispose(PRegExpCache,Done); -End. 
diff --git a/src/pcre/pcreposix.c b/src/pcre/pcreposix.c deleted file mode 100644 index a76d6bfc..00000000 --- a/src/pcre/pcreposix.c +++ /dev/null @@ -1,432 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -/* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language. - - Written by Philip Hazel - Copyright (c) 1997-2018 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - - -/* This module is a wrapper that provides a POSIX API to the underlying PCRE -functions. */ - - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - - -/* Ensure that the PCREPOSIX_EXP_xxx macros are set appropriately for -compiling these functions. This must come before including pcreposix.h, where -they are set for an application (using these functions) if they have not -previously been set. */ - -#if defined(_WIN32) && !defined(PCRE_STATIC) -# define PCREPOSIX_EXP_DECL extern __declspec(dllexport) -# define PCREPOSIX_EXP_DEFN __declspec(dllexport) -#endif - -/* We include pcre.h before pcre_internal.h so that the PCRE library functions -are declared as "import" for Windows by defining PCRE_EXP_DECL as "import". -This is needed even though pcre_internal.h itself includes pcre.h, because it -does so after it has set PCRE_EXP_DECL to "export" if it is not already set. */ - -#include "pcre.h" -#include "pcre_internal.h" -#include "pcreposix.h" - - -/* Table to translate PCRE compile time error codes into POSIX error codes. 
*/ - -static const int eint[] = { - 0, /* no error */ - REG_EESCAPE, /* \ at end of pattern */ - REG_EESCAPE, /* \c at end of pattern */ - REG_EESCAPE, /* unrecognized character follows \ */ - REG_BADBR, /* numbers out of order in {} quantifier */ - /* 5 */ - REG_BADBR, /* number too big in {} quantifier */ - REG_EBRACK, /* missing terminating ] for character class */ - REG_ECTYPE, /* invalid escape sequence in character class */ - REG_ERANGE, /* range out of order in character class */ - REG_BADRPT, /* nothing to repeat */ - /* 10 */ - REG_BADRPT, /* operand of unlimited repeat could match the empty string */ - REG_ASSERT, /* internal error: unexpected repeat */ - REG_BADPAT, /* unrecognized character after (? */ - REG_BADPAT, /* POSIX named classes are supported only within a class */ - REG_EPAREN, /* missing ) */ - /* 15 */ - REG_ESUBREG, /* reference to non-existent subpattern */ - REG_INVARG, /* erroffset passed as NULL */ - REG_INVARG, /* unknown option bit(s) set */ - REG_EPAREN, /* missing ) after comment */ - REG_ESIZE, /* parentheses nested too deeply */ - /* 20 */ - REG_ESIZE, /* regular expression too large */ - REG_ESPACE, /* failed to get memory */ - REG_EPAREN, /* unmatched parentheses */ - REG_ASSERT, /* internal error: code overflow */ - REG_BADPAT, /* unrecognized character after (?< */ - /* 25 */ - REG_BADPAT, /* lookbehind assertion is not fixed length */ - REG_BADPAT, /* malformed number or name after (?( */ - REG_BADPAT, /* conditional group contains more than two branches */ - REG_BADPAT, /* assertion expected after (?( */ - REG_BADPAT, /* (?R or (?[+-]digits must be followed by ) */ - /* 30 */ - REG_ECTYPE, /* unknown POSIX class name */ - REG_BADPAT, /* POSIX collating elements are not supported */ - REG_INVARG, /* this version of PCRE is not compiled with PCRE_UTF8 support */ - REG_BADPAT, /* spare error */ - REG_BADPAT, /* character value in \x{} or \o{} is too large */ - /* 35 */ - REG_BADPAT, /* invalid condition (?(0) */ - REG_BADPAT, 
/* \C not allowed in lookbehind assertion */ - REG_EESCAPE, /* PCRE does not support \L, \l, \N, \U, or \u */ - REG_BADPAT, /* number after (?C is > 255 */ - REG_BADPAT, /* closing ) for (?C expected */ - /* 40 */ - REG_BADPAT, /* recursive call could loop indefinitely */ - REG_BADPAT, /* unrecognized character after (?P */ - REG_BADPAT, /* syntax error in subpattern name (missing terminator) */ - REG_BADPAT, /* two named subpatterns have the same name */ - REG_BADPAT, /* invalid UTF-8 string */ - /* 45 */ - REG_BADPAT, /* support for \P, \p, and \X has not been compiled */ - REG_BADPAT, /* malformed \P or \p sequence */ - REG_BADPAT, /* unknown property name after \P or \p */ - REG_BADPAT, /* subpattern name is too long (maximum 32 characters) */ - REG_BADPAT, /* too many named subpatterns (maximum 10,000) */ - /* 50 */ - REG_BADPAT, /* repeated subpattern is too long */ - REG_BADPAT, /* octal value is greater than \377 (not in UTF-8 mode) */ - REG_BADPAT, /* internal error: overran compiling workspace */ - REG_BADPAT, /* internal error: previously-checked referenced subpattern not found */ - REG_BADPAT, /* DEFINE group contains more than one branch */ - /* 55 */ - REG_BADPAT, /* repeating a DEFINE group is not allowed */ - REG_INVARG, /* inconsistent NEWLINE options */ - REG_BADPAT, /* \g is not followed followed by an (optionally braced) non-zero number */ - REG_BADPAT, /* a numbered reference must not be zero */ - REG_BADPAT, /* an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) */ - /* 60 */ - REG_BADPAT, /* (*VERB) not recognized */ - REG_BADPAT, /* number is too big */ - REG_BADPAT, /* subpattern name expected */ - REG_BADPAT, /* digit expected after (?+ */ - REG_BADPAT, /* ] is an invalid data character in JavaScript compatibility mode */ - /* 65 */ - REG_BADPAT, /* different names for subpatterns of the same number are not allowed */ - REG_BADPAT, /* (*MARK) must have an argument */ - REG_INVARG, /* this version of PCRE is not compiled with 
PCRE_UCP support */ - REG_BADPAT, /* \c must be followed by an ASCII character */ - REG_BADPAT, /* \k is not followed by a braced, angle-bracketed, or quoted name */ - /* 70 */ - REG_BADPAT, /* internal error: unknown opcode in find_fixedlength() */ - REG_BADPAT, /* \N is not supported in a class */ - REG_BADPAT, /* too many forward references */ - REG_BADPAT, /* disallowed UTF-8/16/32 code point (>= 0xd800 && <= 0xdfff) */ - REG_BADPAT, /* invalid UTF-16 string (should not occur) */ - /* 75 */ - REG_BADPAT, /* overlong MARK name */ - REG_BADPAT, /* character value in \u.... sequence is too large */ - REG_BADPAT, /* invalid UTF-32 string (should not occur) */ - REG_BADPAT, /* setting UTF is disabled by the application */ - REG_BADPAT, /* non-hex character in \\x{} (closing brace missing?) */ - /* 80 */ - REG_BADPAT, /* non-octal character in \o{} (closing brace missing?) */ - REG_BADPAT, /* missing opening brace after \o */ - REG_BADPAT, /* parentheses too deeply nested */ - REG_BADPAT, /* invalid range in character class */ - REG_BADPAT, /* group name must start with a non-digit */ - /* 85 */ - REG_BADPAT, /* parentheses too deeply nested (stack check) */ - REG_BADPAT, /* missing digits in \x{} or \o{} */ - REG_BADPAT /* pattern too complicated */ -}; - -/* Table of texts corresponding to POSIX error codes */ - -static const char *const pstring[] = { - "", /* Dummy for value 0 */ - "internal error", /* REG_ASSERT */ - "invalid repeat counts in {}", /* BADBR */ - "pattern error", /* BADPAT */ - "? 
* + invalid", /* BADRPT */ - "unbalanced {}", /* EBRACE */ - "unbalanced []", /* EBRACK */ - "collation error - not relevant", /* ECOLLATE */ - "bad class", /* ECTYPE */ - "bad escape sequence", /* EESCAPE */ - "empty expression", /* EMPTY */ - "unbalanced ()", /* EPAREN */ - "bad range inside []", /* ERANGE */ - "expression too big", /* ESIZE */ - "failed to get memory", /* ESPACE */ - "bad back reference", /* ESUBREG */ - "bad argument", /* INVARG */ - "match failed" /* NOMATCH */ -}; - - - - -/************************************************* -* Translate error code to string * -*************************************************/ - -PCREPOSIX_EXP_DEFN size_t PCRE_CALL_CONVENTION -regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size) -{ -const char *message, *addmessage; -size_t length, addlength; - -message = (errcode >= (int)(sizeof(pstring)/sizeof(char *)))? - "unknown error code" : pstring[errcode]; -length = strlen(message) + 1; - -addmessage = " at offset "; -addlength = (preg != NULL && (int)preg->re_erroffset != -1)? 
- strlen(addmessage) + 6 : 0; - -if (errbuf_size > 0) - { - if (addlength > 0 && errbuf_size >= length + addlength) - sprintf(errbuf, "%s%s%-6d", message, addmessage, (int)preg->re_erroffset); - else - { - strncpy(errbuf, message, errbuf_size - 1); - errbuf[errbuf_size-1] = 0; - } - } - -return length + addlength; -} - - - - -/************************************************* -* Free store held by a regex * -*************************************************/ - -PCREPOSIX_EXP_DEFN void PCRE_CALL_CONVENTION -regfree(regex_t *preg) -{ -(PUBL(free))(preg->re_pcre); -} - - - - -/************************************************* -* Compile a regular expression * -*************************************************/ - -/* -Arguments: - preg points to a structure for recording the compiled expression - pattern the pattern to compile - cflags compilation flags - -Returns: 0 on success - various non-zero codes on failure -*/ - -PCREPOSIX_EXP_DEFN int PCRE_CALL_CONVENTION -regcomp(regex_t *preg, const char *pattern, int cflags) -{ -const char *errorptr; -int erroffset; -int errorcode; -int options = 0; -int re_nsub = 0; - -if ((cflags & REG_ICASE) != 0) options |= PCRE_CASELESS; -if ((cflags & REG_NEWLINE) != 0) options |= PCRE_MULTILINE; -if ((cflags & REG_DOTALL) != 0) options |= PCRE_DOTALL; -if ((cflags & REG_NOSUB) != 0) options |= PCRE_NO_AUTO_CAPTURE; -if ((cflags & REG_UTF8) != 0) options |= PCRE_UTF8; -if ((cflags & REG_UCP) != 0) options |= PCRE_UCP; -if ((cflags & REG_UNGREEDY) != 0) options |= PCRE_UNGREEDY; - -preg->re_pcre = pcre_compile2(pattern, options, &errorcode, &errorptr, - &erroffset, NULL); -preg->re_erroffset = erroffset; - -/* Safety: if the error code is too big for the translation vector (which -should not happen, but we all make mistakes), return REG_BADPAT. */ - -if (preg->re_pcre == NULL) - { - return (errorcode < (int)(sizeof(eint)/sizeof(const int)))? 
- eint[errorcode] : REG_BADPAT; - } - -(void)pcre_fullinfo((const pcre *)preg->re_pcre, NULL, PCRE_INFO_CAPTURECOUNT, - &re_nsub); -preg->re_nsub = (size_t)re_nsub; -return 0; -} - - - - -/************************************************* -* Match a regular expression * -*************************************************/ - -/* Unfortunately, PCRE requires 3 ints of working space for each captured -substring, so we have to get and release working store instead of just using -the POSIX structures as was done in earlier releases when PCRE needed only 2 -ints. However, if the number of possible capturing brackets is small, use a -block of store on the stack, to reduce the use of malloc/free. The threshold is -in a macro that can be changed at configure time. - -If REG_NOSUB was specified at compile time, the PCRE_NO_AUTO_CAPTURE flag will -be set. When this is the case, the nmatch and pmatch arguments are ignored, and -the only result is yes/no/error. */ - -PCREPOSIX_EXP_DEFN int PCRE_CALL_CONVENTION -regexec(const regex_t *preg, const char *string, size_t nmatch, - regmatch_t pmatch[], int eflags) -{ -int rc, so, eo; -int options = 0; -int *ovector = NULL; -int small_ovector[POSIX_MALLOC_THRESHOLD * 3]; -BOOL allocated_ovector = FALSE; -BOOL nosub = - (REAL_PCRE_OPTIONS((const pcre *)preg->re_pcre) & PCRE_NO_AUTO_CAPTURE) != 0; - -if ((eflags & REG_NOTBOL) != 0) options |= PCRE_NOTBOL; -if ((eflags & REG_NOTEOL) != 0) options |= PCRE_NOTEOL; -if ((eflags & REG_NOTEMPTY) != 0) options |= PCRE_NOTEMPTY; - -((regex_t *)preg)->re_erroffset = (size_t)(-1); /* Only has meaning after compile */ - -/* When no string data is being returned, or no vector has been passed in which -to put it, ensure that nmatch is zero. Otherwise, ensure the vector for holding -the return data is large enough. 
*/ - -if (nosub || pmatch == NULL) nmatch = 0; - -else if (nmatch > 0) - { - if (nmatch <= POSIX_MALLOC_THRESHOLD) - { - ovector = &(small_ovector[0]); - } - else - { - if (nmatch > INT_MAX/(sizeof(int) * 3)) return REG_ESPACE; - ovector = (int *)malloc(sizeof(int) * nmatch * 3); - if (ovector == NULL) return REG_ESPACE; - allocated_ovector = TRUE; - } - } - -/* REG_STARTEND is a BSD extension, to allow for non-NUL-terminated strings. -The man page from OS X says "REG_STARTEND affects only the location of the -string, not how it is matched". That is why the "so" value is used to bump the -start location rather than being passed as a PCRE "starting offset". */ - -if ((eflags & REG_STARTEND) != 0) - { - if (pmatch == NULL) return REG_INVARG; - so = pmatch[0].rm_so; - eo = pmatch[0].rm_eo; - } -else - { - so = 0; - eo = (int)strlen(string); - } - -rc = pcre_exec((const pcre *)preg->re_pcre, NULL, string + so, (eo - so), - 0, options, ovector, (int)(nmatch * 3)); - -if (rc == 0) rc = (int)nmatch; /* All captured slots were filled in */ - -/* Successful match */ - -if (rc >= 0) - { - size_t i; - if (!nosub) - { - for (i = 0; i < (size_t)rc; i++) - { - pmatch[i].rm_so = (ovector[i*2] < 0)? -1 : ovector[i*2] + so; - pmatch[i].rm_eo = (ovector[i*2+1] < 0)? -1: ovector[i*2+1] + so; - } - if (allocated_ovector) free(ovector); - for (; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1; - } - return 0; - } - -/* Unsuccessful match */ - -if (allocated_ovector) free(ovector); -switch(rc) - { -/* ========================================================================== */ - /* These cases are never obeyed. This is a fudge that causes a compile-time - error if the vector eint, which is indexed by compile-time error number, is - not the correct length. It seems to be the only way to do such a check at - compile time, as the sizeof() operator does not work in the C preprocessor. - As all the PCRE_ERROR_xxx values are negative, we can use 0 and 1. 
*/ - - case 0: - case (sizeof(eint)/sizeof(int) == ERRCOUNT): - return REG_ASSERT; -/* ========================================================================== */ - - case PCRE_ERROR_NOMATCH: return REG_NOMATCH; - case PCRE_ERROR_NULL: return REG_INVARG; - case PCRE_ERROR_BADOPTION: return REG_INVARG; - case PCRE_ERROR_BADMAGIC: return REG_INVARG; - case PCRE_ERROR_UNKNOWN_NODE: return REG_ASSERT; - case PCRE_ERROR_NOMEMORY: return REG_ESPACE; - case PCRE_ERROR_MATCHLIMIT: return REG_ESPACE; - case PCRE_ERROR_BADUTF8: return REG_INVARG; - case PCRE_ERROR_BADUTF8_OFFSET: return REG_INVARG; - case PCRE_ERROR_BADMODE: return REG_INVARG; - default: return REG_ASSERT; - } -} - -/* End of pcreposix.c */ diff --git a/src/pcre/pcreposix.h b/src/pcre/pcreposix.h deleted file mode 100644 index c77c0b05..00000000 --- a/src/pcre/pcreposix.h +++ /dev/null @@ -1,146 +0,0 @@ -/************************************************* -* Perl-Compatible Regular Expressions * -*************************************************/ - -#ifndef _PCREPOSIX_H -#define _PCREPOSIX_H - -/* This is the header for the POSIX wrapper interface to the PCRE Perl- -Compatible Regular Expression library. It defines the things POSIX says should -be there. I hope. - - Copyright (c) 1997-2012 University of Cambridge - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* Have to include stdlib.h in order to ensure that size_t is defined. */ - -#include <stdlib.h> - -/* Allow for C++ users */ - -#ifdef __cplusplus -extern "C" { -#endif - -/* Options, mostly defined by POSIX, but with some extras. 
*/ - -#define REG_ICASE 0x0001 /* Maps to PCRE_CASELESS */ -#define REG_NEWLINE 0x0002 /* Maps to PCRE_MULTILINE */ -#define REG_NOTBOL 0x0004 /* Maps to PCRE_NOTBOL */ -#define REG_NOTEOL 0x0008 /* Maps to PCRE_NOTEOL */ -#define REG_DOTALL 0x0010 /* NOT defined by POSIX; maps to PCRE_DOTALL */ -#define REG_NOSUB 0x0020 /* Maps to PCRE_NO_AUTO_CAPTURE */ -#define REG_UTF8 0x0040 /* NOT defined by POSIX; maps to PCRE_UTF8 */ -#define REG_STARTEND 0x0080 /* BSD feature: pass subject string by so,eo */ -#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE_NOTEMPTY */ -#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE_UNGREEDY */ -#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE_UCP */ - -/* This is not used by PCRE, but by defining it we make it easier -to slot PCRE into existing programs that make POSIX calls. */ - -#define REG_EXTENDED 0 - -/* Error values. Not all these are relevant or used by the wrapper. */ - -enum { - REG_ASSERT = 1, /* internal error ? */ - REG_BADBR, /* invalid repeat counts in {} */ - REG_BADPAT, /* pattern error */ - REG_BADRPT, /* ? * + invalid */ - REG_EBRACE, /* unbalanced {} */ - REG_EBRACK, /* unbalanced [] */ - REG_ECOLLATE, /* collation error - not relevant */ - REG_ECTYPE, /* bad class */ - REG_EESCAPE, /* bad escape sequence */ - REG_EMPTY, /* empty expression */ - REG_EPAREN, /* unbalanced () */ - REG_ERANGE, /* bad range inside [] */ - REG_ESIZE, /* expression too big */ - REG_ESPACE, /* failed to get memory */ - REG_ESUBREG, /* bad back reference */ - REG_INVARG, /* bad argument */ - REG_NOMATCH /* match failed */ -}; - - -/* The structure representing a compiled regular expression. */ - -typedef struct { - void *re_pcre; - size_t re_nsub; - size_t re_erroffset; -} regex_t; - -/* The structure in which a captured offset is returned. 
*/ - -typedef int regoff_t; - -typedef struct { - regoff_t rm_so; - regoff_t rm_eo; -} regmatch_t; - -/* When an application links to a PCRE DLL in Windows, the symbols that are -imported have to be identified as such. When building PCRE, the appropriate -export settings are needed, and are set in pcreposix.c before including this -file. */ - -#if defined(_WIN32) && !defined(PCRE_STATIC) && !defined(PCREPOSIX_EXP_DECL) -# define PCREPOSIX_EXP_DECL extern __declspec(dllimport) -# define PCREPOSIX_EXP_DEFN __declspec(dllimport) -#endif - -/* By default, we use the standard "extern" declarations. */ - -#ifndef PCREPOSIX_EXP_DECL -# ifdef __cplusplus -# define PCREPOSIX_EXP_DECL extern "C" -# define PCREPOSIX_EXP_DEFN extern "C" -# else -# define PCREPOSIX_EXP_DECL extern -# define PCREPOSIX_EXP_DEFN extern -# endif -#endif - -/* The functions */ - -PCREPOSIX_EXP_DECL int regcomp(regex_t *, const char *, int); -PCREPOSIX_EXP_DECL int regexec(const regex_t *, const char *, size_t, - regmatch_t *, int); -PCREPOSIX_EXP_DECL size_t regerror(int, const regex_t *, char *, size_t); -PCREPOSIX_EXP_DECL void regfree(regex_t *); - -#ifdef __cplusplus -} /* extern "C" */ -#endif - -#endif /* End of pcreposix.h */ diff --git a/src/pcre/pcretest.c b/src/pcre/pcretest.c deleted file mode 100644 index f1303037..00000000 --- a/src/pcre/pcretest.c +++ /dev/null @@ -1,5773 +0,0 @@ -/************************************************* -* PCRE testing program * -*************************************************/ - -/* This program was hacked up as a tester for PCRE. I really should have -written it more tidily in the first place. Will I ever learn? It has grown and -been extended and consequently is now rather, er, *very* untidy in places. The -addition of 16-bit support has made it even worse. 
:-( - ------------------------------------------------------------------------------ -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - - * Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - - * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - * Neither the name of the University of Cambridge nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE -POSSIBILITY OF SUCH DAMAGE. ------------------------------------------------------------------------------ -*/ - -/* This program now supports the testing of all of the 8-bit, 16-bit, and -32-bit PCRE libraries in a single program. This is different from the modules -such as pcre_compile.c in the library itself, which are compiled separately for -each mode. If two modes are enabled, for example, pcre_compile.c is compiled -twice. 
By contrast, pcretest.c is compiled only once. Therefore, it must not -make use of any of the macros from pcre_internal.h that depend on -COMPILE_PCRE8, COMPILE_PCRE16, or COMPILE_PCRE32. It does, however, make use of -SUPPORT_PCRE8, SUPPORT_PCRE16, and SUPPORT_PCRE32 to ensure that it calls only -supported library functions. */ - -#ifdef HAVE_CONFIG_H -#include "config.h" -#endif - -#include -#include -#include -#include -#include -#include -#include - -/* Both libreadline and libedit are optionally supported. The user-supplied -original patch uses readline/readline.h for libedit, but in at least one system -it is installed as editline/readline.h, so the configuration code now looks for -that first, falling back to readline/readline.h. */ - -#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) -#ifdef HAVE_UNISTD_H -#include -#endif -#if defined(SUPPORT_LIBREADLINE) -#include -#include -#else -#if defined(HAVE_EDITLINE_READLINE_H) -#include -#else -#include -#endif -#endif -#endif - -/* A number of things vary for Windows builds. Originally, pcretest opened its -input and output without "b"; then I was told that "b" was needed in some -environments, so it was added for release 5.0 to both the input and output. (It -makes no difference on Unix-like systems.) Later I was told that it is wrong -for the input on Windows. I've now abstracted the modes into two macros that -are set here, to make it easier to fiddle with them, and removed "b" from the -input mode under Windows. */ - -#if defined(_WIN32) || defined(WIN32) -#include /* For _setmode() */ -#include /* For _O_BINARY */ -#define INPUT_MODE "r" -#define OUTPUT_MODE "wb" - -#ifndef isatty -#define isatty _isatty /* This is what Windows calls them, I'm told, */ -#endif /* though in some environments they seem to */ - /* be already defined, hence the #ifndefs. */ -#ifndef fileno -#define fileno _fileno -#endif - -/* A user sent this fix for Borland Builder 5 under Windows. 
*/ - -#ifdef __BORLANDC__ -#define _setmode(handle, mode) setmode(handle, mode) -#endif - -/* Not Windows */ - -#else -#include /* These two includes are needed */ -#include /* for setrlimit(). */ -#if defined NATIVE_ZOS /* z/OS uses non-binary I/O */ -#define INPUT_MODE "r" -#define OUTPUT_MODE "w" -#else -#define INPUT_MODE "rb" -#define OUTPUT_MODE "wb" -#endif -#endif - -#ifdef __VMS -#include -void vms_setsymbol( char *, char *, int ); -#endif - - -#define PRIV(name) name - -/* We have to include pcre_internal.h because we need the internal info for -displaying the results of pcre_study() and we also need to know about the -internal macros, structures, and other internal data values; pcretest has -"inside information" compared to a program that strictly follows the PCRE API. - -Although pcre_internal.h does itself include pcre.h, we explicitly include it -here before pcre_internal.h so that the PCRE_EXP_xxx macros get set -appropriately for an application, not for building PCRE. */ - -#include "pcre.h" -#include "pcre_internal.h" - -/* The pcre_printint() function, which prints the internal form of a compiled -regex, is held in a separate file so that (a) it can be compiled in either -8-, 16- or 32-bit mode, and (b) it can be #included directly in pcre_compile.c -when that is compiled in debug mode. */ - -#ifdef SUPPORT_PCRE8 -void pcre_printint(pcre *external_re, FILE *f, BOOL print_lengths); -#endif -#ifdef SUPPORT_PCRE16 -void pcre16_printint(pcre *external_re, FILE *f, BOOL print_lengths); -#endif -#ifdef SUPPORT_PCRE32 -void pcre32_printint(pcre *external_re, FILE *f, BOOL print_lengths); -#endif - -/* We need access to some of the data tables that PCRE uses. So as not to have -to keep two copies, we include the source files here, changing the names of the -external symbols to prevent clashes. 
*/ - -#define PCRE_INCLUDED - -#include "pcre_tables.c" -#include "pcre_ucd.c" - -/* The definition of the macro PRINTABLE, which determines whether to print an -output character as-is or as a hex value when showing compiled patterns, is -the same as in the printint.src file. We uses it here in cases when the locale -has not been explicitly changed, so as to get consistent output from systems -that differ in their output from isprint() even in the "C" locale. */ - -#ifdef EBCDIC -#define PRINTABLE(c) ((c) >= 64 && (c) < 255) -#else -#define PRINTABLE(c) ((c) >= 32 && (c) < 127) -#endif - -#define PRINTOK(c) (locale_set? (((c) < 256) && isprint(c)) : PRINTABLE(c)) - -/* Posix support is disabled in 16 or 32 bit only mode. */ -#if !defined SUPPORT_PCRE8 && !defined NOPOSIX -#define NOPOSIX -#endif - -/* It is possible to compile this test program without including support for -testing the POSIX interface, though this is not available via the standard -Makefile. */ - -#if !defined NOPOSIX -#include "pcreposix.h" -#endif - -/* It is also possible, originally for the benefit of a version that was -imported into Exim, to build pcretest without support for UTF8 or UTF16 (define -NOUTF), without the interface to the DFA matcher (NODFA). In fact, we -automatically cut out the UTF support if PCRE is built without it. */ - -#ifndef SUPPORT_UTF -#ifndef NOUTF -#define NOUTF -#endif -#endif - -/* To make the code a bit tidier for 8/16/32-bit support, we define macros -for all the pcre[16]_xxx functions (except pcre16_fullinfo, which is called -only from one place and is handled differently). I couldn't dream up any way of -using a single macro to do this in a generic way, because of the many different -argument requirements. We know that at least one of SUPPORT_PCRE8 and -SUPPORT_PCRE16 must be set. First define macros for each individual mode; then -use these in the definitions of generic macros. 
- -**** Special note about the PCHARSxxx macros: the address of the string to be -printed is always given as two arguments: a base address followed by an offset. -The base address is cast to the correct data size for 8 or 16 bit data; the -offset is in units of this size. If the string were given as base+offset in one -argument, the casting might be incorrectly applied. */ - -#ifdef SUPPORT_PCRE8 - -#define PCHARS8(lv, p, offset, len, f) \ - lv = pchars((pcre_uint8 *)(p) + offset, len, f) - -#define PCHARSV8(p, offset, len, f) \ - (void)pchars((pcre_uint8 *)(p) + offset, len, f) - -#define READ_CAPTURE_NAME8(p, cn8, cn16, cn32, re) \ - p = read_capture_name8(p, cn8, re) - -#define STRLEN8(p) ((int)strlen((char *)p)) - -#define SET_PCRE_CALLOUT8(callout) \ - pcre_callout = callout - -#define SET_PCRE_STACK_GUARD8(stack_guard) \ - pcre_stack_guard = stack_guard - -#define PCRE_ASSIGN_JIT_STACK8(extra, callback, userdata) \ - pcre_assign_jit_stack(extra, callback, userdata) - -#define PCRE_COMPILE8(re, pat, options, error, erroffset, tables) \ - re = pcre_compile((char *)pat, options, error, erroffset, tables) - -#define PCRE_COPY_NAMED_SUBSTRING8(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) \ - rc = pcre_copy_named_substring(re, (char *)bptr, offsets, count, \ - (char *)namesptr, cbuffer, size) - -#define PCRE_COPY_SUBSTRING8(rc, bptr, offsets, count, i, cbuffer, size) \ - rc = pcre_copy_substring((char *)bptr, offsets, count, i, cbuffer, size) - -#define PCRE_DFA_EXEC8(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) \ - count = pcre_dfa_exec(re, extra, (char *)bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) - -#define PCRE_EXEC8(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets) \ - count = pcre_exec(re, extra, (char *)bptr, len, start_offset, options, \ - offsets, size_offsets) - -#define PCRE_FREE_STUDY8(extra) \ - 
pcre_free_study(extra) - -#define PCRE_FREE_SUBSTRING8(substring) \ - pcre_free_substring(substring) - -#define PCRE_FREE_SUBSTRING_LIST8(listptr) \ - pcre_free_substring_list(listptr) - -#define PCRE_GET_NAMED_SUBSTRING8(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) \ - rc = pcre_get_named_substring(re, (char *)bptr, offsets, count, \ - (char *)getnamesptr, subsptr) - -#define PCRE_GET_STRINGNUMBER8(n, rc, ptr) \ - n = pcre_get_stringnumber(re, (char *)ptr) - -#define PCRE_GET_SUBSTRING8(rc, bptr, offsets, count, i, subsptr) \ - rc = pcre_get_substring((char *)bptr, offsets, count, i, subsptr) - -#define PCRE_GET_SUBSTRING_LIST8(rc, bptr, offsets, count, listptr) \ - rc = pcre_get_substring_list((const char *)bptr, offsets, count, listptr) - -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER8(rc, re, extra, tables) \ - rc = pcre_pattern_to_host_byte_order(re, extra, tables) - -#define PCRE_PRINTINT8(re, outfile, debug_lengths) \ - pcre_printint(re, outfile, debug_lengths) - -#define PCRE_STUDY8(extra, re, options, error) \ - extra = pcre_study(re, options, error) - -#define PCRE_JIT_STACK_ALLOC8(startsize, maxsize) \ - pcre_jit_stack_alloc(startsize, maxsize) - -#define PCRE_JIT_STACK_FREE8(stack) \ - pcre_jit_stack_free(stack) - -#define pcre8_maketables pcre_maketables - -#endif /* SUPPORT_PCRE8 */ - -/* -----------------------------------------------------------*/ - -#ifdef SUPPORT_PCRE16 - -#define PCHARS16(lv, p, offset, len, f) \ - lv = pchars16((PCRE_SPTR16)(p) + offset, len, f) - -#define PCHARSV16(p, offset, len, f) \ - (void)pchars16((PCRE_SPTR16)(p) + offset, len, f) - -#define READ_CAPTURE_NAME16(p, cn8, cn16, cn32, re) \ - p = read_capture_name16(p, cn16, re) - -#define STRLEN16(p) ((int)strlen16((PCRE_SPTR16)p)) - -#define SET_PCRE_CALLOUT16(callout) \ - pcre16_callout = (int (*)(pcre16_callout_block *))callout - -#define SET_PCRE_STACK_GUARD16(stack_guard) \ - pcre16_stack_guard = (int (*)(void))stack_guard - -#define PCRE_ASSIGN_JIT_STACK16(extra, 
callback, userdata) \ - pcre16_assign_jit_stack((pcre16_extra *)extra, \ - (pcre16_jit_callback)callback, userdata) - -#define PCRE_COMPILE16(re, pat, options, error, erroffset, tables) \ - re = (pcre *)pcre16_compile((PCRE_SPTR16)pat, options, error, erroffset, \ - tables) - -#define PCRE_COPY_NAMED_SUBSTRING16(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) \ - rc = pcre16_copy_named_substring((pcre16 *)re, (PCRE_SPTR16)bptr, offsets, \ - count, (PCRE_SPTR16)namesptr, (PCRE_UCHAR16 *)cbuffer, size/2) - -#define PCRE_COPY_SUBSTRING16(rc, bptr, offsets, count, i, cbuffer, size) \ - rc = pcre16_copy_substring((PCRE_SPTR16)bptr, offsets, count, i, \ - (PCRE_UCHAR16 *)cbuffer, size/2) - -#define PCRE_DFA_EXEC16(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) \ - count = pcre16_dfa_exec((pcre16 *)re, (pcre16_extra *)extra, \ - (PCRE_SPTR16)bptr, len, start_offset, options, offsets, size_offsets, \ - workspace, size_workspace) - -#define PCRE_EXEC16(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets) \ - count = pcre16_exec((pcre16 *)re, (pcre16_extra *)extra, (PCRE_SPTR16)bptr, \ - len, start_offset, options, offsets, size_offsets) - -#define PCRE_FREE_STUDY16(extra) \ - pcre16_free_study((pcre16_extra *)extra) - -#define PCRE_FREE_SUBSTRING16(substring) \ - pcre16_free_substring((PCRE_SPTR16)substring) - -#define PCRE_FREE_SUBSTRING_LIST16(listptr) \ - pcre16_free_substring_list((PCRE_SPTR16 *)listptr) - -#define PCRE_GET_NAMED_SUBSTRING16(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) \ - rc = pcre16_get_named_substring((pcre16 *)re, (PCRE_SPTR16)bptr, offsets, \ - count, (PCRE_SPTR16)getnamesptr, (PCRE_SPTR16 *)(void*)subsptr) - -#define PCRE_GET_STRINGNUMBER16(n, rc, ptr) \ - n = pcre16_get_stringnumber(re, (PCRE_SPTR16)ptr) - -#define PCRE_GET_SUBSTRING16(rc, bptr, offsets, count, i, subsptr) \ - rc = pcre16_get_substring((PCRE_SPTR16)bptr, offsets, count, i, \ 
- (PCRE_SPTR16 *)(void*)subsptr) - -#define PCRE_GET_SUBSTRING_LIST16(rc, bptr, offsets, count, listptr) \ - rc = pcre16_get_substring_list((PCRE_SPTR16)bptr, offsets, count, \ - (PCRE_SPTR16 **)(void*)listptr) - -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER16(rc, re, extra, tables) \ - rc = pcre16_pattern_to_host_byte_order((pcre16 *)re, (pcre16_extra *)extra, \ - tables) - -#define PCRE_PRINTINT16(re, outfile, debug_lengths) \ - pcre16_printint(re, outfile, debug_lengths) - -#define PCRE_STUDY16(extra, re, options, error) \ - extra = (pcre_extra *)pcre16_study((pcre16 *)re, options, error) - -#define PCRE_JIT_STACK_ALLOC16(startsize, maxsize) \ - (pcre_jit_stack *)pcre16_jit_stack_alloc(startsize, maxsize) - -#define PCRE_JIT_STACK_FREE16(stack) \ - pcre16_jit_stack_free((pcre16_jit_stack *)stack) - -#endif /* SUPPORT_PCRE16 */ - -/* -----------------------------------------------------------*/ - -#ifdef SUPPORT_PCRE32 - -#define PCHARS32(lv, p, offset, len, f) \ - lv = pchars32((PCRE_SPTR32)(p) + offset, len, use_utf, f) - -#define PCHARSV32(p, offset, len, f) \ - (void)pchars32((PCRE_SPTR32)(p) + offset, len, use_utf, f) - -#define READ_CAPTURE_NAME32(p, cn8, cn16, cn32, re) \ - p = read_capture_name32(p, cn32, re) - -#define STRLEN32(p) ((int)strlen32((PCRE_SPTR32)p)) - -#define SET_PCRE_CALLOUT32(callout) \ - pcre32_callout = (int (*)(pcre32_callout_block *))callout - -#define SET_PCRE_STACK_GUARD32(stack_guard) \ - pcre32_stack_guard = (int (*)(void))stack_guard - -#define PCRE_ASSIGN_JIT_STACK32(extra, callback, userdata) \ - pcre32_assign_jit_stack((pcre32_extra *)extra, \ - (pcre32_jit_callback)callback, userdata) - -#define PCRE_COMPILE32(re, pat, options, error, erroffset, tables) \ - re = (pcre *)pcre32_compile((PCRE_SPTR32)pat, options, error, erroffset, \ - tables) - -#define PCRE_COPY_NAMED_SUBSTRING32(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) \ - rc = pcre32_copy_named_substring((pcre32 *)re, (PCRE_SPTR32)bptr, offsets, \ - count, 
(PCRE_SPTR32)namesptr, (PCRE_UCHAR32 *)cbuffer, size/4) - -#define PCRE_COPY_SUBSTRING32(rc, bptr, offsets, count, i, cbuffer, size) \ - rc = pcre32_copy_substring((PCRE_SPTR32)bptr, offsets, count, i, \ - (PCRE_UCHAR32 *)cbuffer, size/4) - -#define PCRE_DFA_EXEC32(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) \ - count = pcre32_dfa_exec((pcre32 *)re, (pcre32_extra *)extra, \ - (PCRE_SPTR32)bptr, len, start_offset, options, offsets, size_offsets, \ - workspace, size_workspace) - -#define PCRE_EXEC32(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets) \ - count = pcre32_exec((pcre32 *)re, (pcre32_extra *)extra, (PCRE_SPTR32)bptr, \ - len, start_offset, options, offsets, size_offsets) - -#define PCRE_FREE_STUDY32(extra) \ - pcre32_free_study((pcre32_extra *)extra) - -#define PCRE_FREE_SUBSTRING32(substring) \ - pcre32_free_substring((PCRE_SPTR32)substring) - -#define PCRE_FREE_SUBSTRING_LIST32(listptr) \ - pcre32_free_substring_list((PCRE_SPTR32 *)listptr) - -#define PCRE_GET_NAMED_SUBSTRING32(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) \ - rc = pcre32_get_named_substring((pcre32 *)re, (PCRE_SPTR32)bptr, offsets, \ - count, (PCRE_SPTR32)getnamesptr, (PCRE_SPTR32 *)(void*)subsptr) - -#define PCRE_GET_STRINGNUMBER32(n, rc, ptr) \ - n = pcre32_get_stringnumber(re, (PCRE_SPTR32)ptr) - -#define PCRE_GET_SUBSTRING32(rc, bptr, offsets, count, i, subsptr) \ - rc = pcre32_get_substring((PCRE_SPTR32)bptr, offsets, count, i, \ - (PCRE_SPTR32 *)(void*)subsptr) - -#define PCRE_GET_SUBSTRING_LIST32(rc, bptr, offsets, count, listptr) \ - rc = pcre32_get_substring_list((PCRE_SPTR32)bptr, offsets, count, \ - (PCRE_SPTR32 **)(void*)listptr) - -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER32(rc, re, extra, tables) \ - rc = pcre32_pattern_to_host_byte_order((pcre32 *)re, (pcre32_extra *)extra, \ - tables) - -#define PCRE_PRINTINT32(re, outfile, debug_lengths) \ - pcre32_printint(re, outfile, 
debug_lengths) - -#define PCRE_STUDY32(extra, re, options, error) \ - extra = (pcre_extra *)pcre32_study((pcre32 *)re, options, error) - -#define PCRE_JIT_STACK_ALLOC32(startsize, maxsize) \ - (pcre_jit_stack *)pcre32_jit_stack_alloc(startsize, maxsize) - -#define PCRE_JIT_STACK_FREE32(stack) \ - pcre32_jit_stack_free((pcre32_jit_stack *)stack) - -#endif /* SUPPORT_PCRE32 */ - - -/* ----- More than one mode is supported; a runtime test is needed, except for -pcre_config(), and the JIT stack functions, when it doesn't matter which -available version is called. ----- */ - -enum { - PCRE8_MODE, - PCRE16_MODE, - PCRE32_MODE -}; - -#if (defined (SUPPORT_PCRE8) + defined (SUPPORT_PCRE16) + \ - defined (SUPPORT_PCRE32)) >= 2 - -#define CHAR_SIZE (1 << pcre_mode) - -/* There doesn't seem to be an easy way of writing these macros that can cope -with the 3 pairs of bit sizes plus all three bit sizes. So just handle all the -cases separately. */ - -/* ----- All three modes supported ----- */ - -#if defined(SUPPORT_PCRE8) && defined(SUPPORT_PCRE16) && defined(SUPPORT_PCRE32) - -#define PCHARS(lv, p, offset, len, f) \ - if (pcre_mode == PCRE32_MODE) \ - PCHARS32(lv, p, offset, len, f); \ - else if (pcre_mode == PCRE16_MODE) \ - PCHARS16(lv, p, offset, len, f); \ - else \ - PCHARS8(lv, p, offset, len, f) - -#define PCHARSV(p, offset, len, f) \ - if (pcre_mode == PCRE32_MODE) \ - PCHARSV32(p, offset, len, f); \ - else if (pcre_mode == PCRE16_MODE) \ - PCHARSV16(p, offset, len, f); \ - else \ - PCHARSV8(p, offset, len, f) - -#define READ_CAPTURE_NAME(p, cn8, cn16, cn32, re) \ - if (pcre_mode == PCRE32_MODE) \ - READ_CAPTURE_NAME32(p, cn8, cn16, cn32, re); \ - else if (pcre_mode == PCRE16_MODE) \ - READ_CAPTURE_NAME16(p, cn8, cn16, cn32, re); \ - else \ - READ_CAPTURE_NAME8(p, cn8, cn16, cn32, re) - -#define SET_PCRE_CALLOUT(callout) \ - if (pcre_mode == PCRE32_MODE) \ - SET_PCRE_CALLOUT32(callout); \ - else if (pcre_mode == PCRE16_MODE) \ - SET_PCRE_CALLOUT16(callout); \ - else \ 
- SET_PCRE_CALLOUT8(callout) - -#define SET_PCRE_STACK_GUARD(stack_guard) \ - if (pcre_mode == PCRE32_MODE) \ - SET_PCRE_STACK_GUARD32(stack_guard); \ - else if (pcre_mode == PCRE16_MODE) \ - SET_PCRE_STACK_GUARD16(stack_guard); \ - else \ - SET_PCRE_STACK_GUARD8(stack_guard) - -#define STRLEN(p) (pcre_mode == PCRE32_MODE ? STRLEN32(p) : pcre_mode == PCRE16_MODE ? STRLEN16(p) : STRLEN8(p)) - -#define PCRE_ASSIGN_JIT_STACK(extra, callback, userdata) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_ASSIGN_JIT_STACK32(extra, callback, userdata); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_ASSIGN_JIT_STACK16(extra, callback, userdata); \ - else \ - PCRE_ASSIGN_JIT_STACK8(extra, callback, userdata) - -#define PCRE_COMPILE(re, pat, options, error, erroffset, tables) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_COMPILE32(re, pat, options, error, erroffset, tables); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_COMPILE16(re, pat, options, error, erroffset, tables); \ - else \ - PCRE_COMPILE8(re, pat, options, error, erroffset, tables) - -#define PCRE_CONFIG pcre_config - -#define PCRE_COPY_NAMED_SUBSTRING(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_COPY_NAMED_SUBSTRING32(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_COPY_NAMED_SUBSTRING16(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size); \ - else \ - PCRE_COPY_NAMED_SUBSTRING8(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) - -#define PCRE_COPY_SUBSTRING(rc, bptr, offsets, count, i, cbuffer, size) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_COPY_SUBSTRING32(rc, bptr, offsets, count, i, cbuffer, size); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_COPY_SUBSTRING16(rc, bptr, offsets, count, i, cbuffer, size); \ - else \ - PCRE_COPY_SUBSTRING8(rc, bptr, offsets, count, i, cbuffer, size) - -#define PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, 
workspace, size_workspace) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_DFA_EXEC32(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_DFA_EXEC16(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace); \ - else \ - PCRE_DFA_EXEC8(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) - -#define PCRE_EXEC(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_EXEC32(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_EXEC16(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets); \ - else \ - PCRE_EXEC8(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets) - -#define PCRE_FREE_STUDY(extra) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_FREE_STUDY32(extra); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_FREE_STUDY16(extra); \ - else \ - PCRE_FREE_STUDY8(extra) - -#define PCRE_FREE_SUBSTRING(substring) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_FREE_SUBSTRING32(substring); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_FREE_SUBSTRING16(substring); \ - else \ - PCRE_FREE_SUBSTRING8(substring) - -#define PCRE_FREE_SUBSTRING_LIST(listptr) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_FREE_SUBSTRING_LIST32(listptr); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_FREE_SUBSTRING_LIST16(listptr); \ - else \ - PCRE_FREE_SUBSTRING_LIST8(listptr) - -#define PCRE_GET_NAMED_SUBSTRING(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_GET_NAMED_SUBSTRING32(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_GET_NAMED_SUBSTRING16(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr); \ - else \ - 
PCRE_GET_NAMED_SUBSTRING8(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) - -#define PCRE_GET_STRINGNUMBER(n, rc, ptr) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_GET_STRINGNUMBER32(n, rc, ptr); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_GET_STRINGNUMBER16(n, rc, ptr); \ - else \ - PCRE_GET_STRINGNUMBER8(n, rc, ptr) - -#define PCRE_GET_SUBSTRING(rc, bptr, use_offsets, count, i, subsptr) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_GET_SUBSTRING32(rc, bptr, use_offsets, count, i, subsptr); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_GET_SUBSTRING16(rc, bptr, use_offsets, count, i, subsptr); \ - else \ - PCRE_GET_SUBSTRING8(rc, bptr, use_offsets, count, i, subsptr) - -#define PCRE_GET_SUBSTRING_LIST(rc, bptr, offsets, count, listptr) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_GET_SUBSTRING_LIST32(rc, bptr, offsets, count, listptr); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_GET_SUBSTRING_LIST16(rc, bptr, offsets, count, listptr); \ - else \ - PCRE_GET_SUBSTRING_LIST8(rc, bptr, offsets, count, listptr) - -#define PCRE_JIT_STACK_ALLOC(startsize, maxsize) \ - (pcre_mode == PCRE32_MODE ? \ - PCRE_JIT_STACK_ALLOC32(startsize, maxsize) \ - : pcre_mode == PCRE16_MODE ? \ - PCRE_JIT_STACK_ALLOC16(startsize, maxsize) \ - : PCRE_JIT_STACK_ALLOC8(startsize, maxsize)) - -#define PCRE_JIT_STACK_FREE(stack) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_JIT_STACK_FREE32(stack); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_JIT_STACK_FREE16(stack); \ - else \ - PCRE_JIT_STACK_FREE8(stack) - -#define PCRE_MAKETABLES \ - (pcre_mode == PCRE32_MODE ? pcre32_maketables() : pcre_mode == PCRE16_MODE ? 
pcre16_maketables() : pcre_maketables()) - -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER(rc, re, extra, tables) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_PATTERN_TO_HOST_BYTE_ORDER32(rc, re, extra, tables); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_PATTERN_TO_HOST_BYTE_ORDER16(rc, re, extra, tables); \ - else \ - PCRE_PATTERN_TO_HOST_BYTE_ORDER8(rc, re, extra, tables) - -#define PCRE_PRINTINT(re, outfile, debug_lengths) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_PRINTINT32(re, outfile, debug_lengths); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_PRINTINT16(re, outfile, debug_lengths); \ - else \ - PCRE_PRINTINT8(re, outfile, debug_lengths) - -#define PCRE_STUDY(extra, re, options, error) \ - if (pcre_mode == PCRE32_MODE) \ - PCRE_STUDY32(extra, re, options, error); \ - else if (pcre_mode == PCRE16_MODE) \ - PCRE_STUDY16(extra, re, options, error); \ - else \ - PCRE_STUDY8(extra, re, options, error) - - -/* ----- Two out of three modes are supported ----- */ - -#else - -/* We can use some macro trickery to make a single set of definitions work in -the three different cases. 
*/ - -/* ----- 32-bit and 16-bit but not 8-bit supported ----- */ - -#if defined(SUPPORT_PCRE32) && defined(SUPPORT_PCRE16) -#define BITONE 32 -#define BITTWO 16 - -/* ----- 32-bit and 8-bit but not 16-bit supported ----- */ - -#elif defined(SUPPORT_PCRE32) && defined(SUPPORT_PCRE8) -#define BITONE 32 -#define BITTWO 8 - -/* ----- 16-bit and 8-bit but not 32-bit supported ----- */ - -#else -#define BITONE 16 -#define BITTWO 8 -#endif - -#define glue(a,b) a##b -#define G(a,b) glue(a,b) - - -/* ----- Common macros for two-mode cases ----- */ - -#define PCHARS(lv, p, offset, len, f) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCHARS,BITONE)(lv, p, offset, len, f); \ - else \ - G(PCHARS,BITTWO)(lv, p, offset, len, f) - -#define PCHARSV(p, offset, len, f) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCHARSV,BITONE)(p, offset, len, f); \ - else \ - G(PCHARSV,BITTWO)(p, offset, len, f) - -#define READ_CAPTURE_NAME(p, cn8, cn16, cn32, re) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(READ_CAPTURE_NAME,BITONE)(p, cn8, cn16, cn32, re); \ - else \ - G(READ_CAPTURE_NAME,BITTWO)(p, cn8, cn16, cn32, re) - -#define SET_PCRE_CALLOUT(callout) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(SET_PCRE_CALLOUT,BITONE)(callout); \ - else \ - G(SET_PCRE_CALLOUT,BITTWO)(callout) - -#define SET_PCRE_STACK_GUARD(stack_guard) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(SET_PCRE_STACK_GUARD,BITONE)(stack_guard); \ - else \ - G(SET_PCRE_STACK_GUARD,BITTWO)(stack_guard) - -#define STRLEN(p) ((pcre_mode == G(G(PCRE,BITONE),_MODE)) ? 
\ - G(STRLEN,BITONE)(p) : G(STRLEN,BITTWO)(p)) - -#define PCRE_ASSIGN_JIT_STACK(extra, callback, userdata) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_ASSIGN_JIT_STACK,BITONE)(extra, callback, userdata); \ - else \ - G(PCRE_ASSIGN_JIT_STACK,BITTWO)(extra, callback, userdata) - -#define PCRE_COMPILE(re, pat, options, error, erroffset, tables) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_COMPILE,BITONE)(re, pat, options, error, erroffset, tables); \ - else \ - G(PCRE_COMPILE,BITTWO)(re, pat, options, error, erroffset, tables) - -#define PCRE_CONFIG G(G(pcre,BITONE),_config) - -#define PCRE_COPY_NAMED_SUBSTRING(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_COPY_NAMED_SUBSTRING,BITONE)(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size); \ - else \ - G(PCRE_COPY_NAMED_SUBSTRING,BITTWO)(rc, re, bptr, offsets, count, \ - namesptr, cbuffer, size) - -#define PCRE_COPY_SUBSTRING(rc, bptr, offsets, count, i, cbuffer, size) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_COPY_SUBSTRING,BITONE)(rc, bptr, offsets, count, i, cbuffer, size); \ - else \ - G(PCRE_COPY_SUBSTRING,BITTWO)(rc, bptr, offsets, count, i, cbuffer, size) - -#define PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_DFA_EXEC,BITONE)(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace); \ - else \ - G(PCRE_DFA_EXEC,BITTWO)(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets, workspace, size_workspace) - -#define PCRE_EXEC(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_EXEC,BITONE)(count, re, extra, bptr, len, start_offset, options, \ - offsets, size_offsets); \ - else \ - G(PCRE_EXEC,BITTWO)(count, re, extra, 
bptr, len, start_offset, options, \ - offsets, size_offsets) - -#define PCRE_FREE_STUDY(extra) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_FREE_STUDY,BITONE)(extra); \ - else \ - G(PCRE_FREE_STUDY,BITTWO)(extra) - -#define PCRE_FREE_SUBSTRING(substring) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_FREE_SUBSTRING,BITONE)(substring); \ - else \ - G(PCRE_FREE_SUBSTRING,BITTWO)(substring) - -#define PCRE_FREE_SUBSTRING_LIST(listptr) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_FREE_SUBSTRING_LIST,BITONE)(listptr); \ - else \ - G(PCRE_FREE_SUBSTRING_LIST,BITTWO)(listptr) - -#define PCRE_GET_NAMED_SUBSTRING(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_GET_NAMED_SUBSTRING,BITONE)(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr); \ - else \ - G(PCRE_GET_NAMED_SUBSTRING,BITTWO)(rc, re, bptr, offsets, count, \ - getnamesptr, subsptr) - -#define PCRE_GET_STRINGNUMBER(n, rc, ptr) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_GET_STRINGNUMBER,BITONE)(n, rc, ptr); \ - else \ - G(PCRE_GET_STRINGNUMBER,BITTWO)(n, rc, ptr) - -#define PCRE_GET_SUBSTRING(rc, bptr, use_offsets, count, i, subsptr) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_GET_SUBSTRING,BITONE)(rc, bptr, use_offsets, count, i, subsptr); \ - else \ - G(PCRE_GET_SUBSTRING,BITTWO)(rc, bptr, use_offsets, count, i, subsptr) - -#define PCRE_GET_SUBSTRING_LIST(rc, bptr, offsets, count, listptr) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_GET_SUBSTRING_LIST,BITONE)(rc, bptr, offsets, count, listptr); \ - else \ - G(PCRE_GET_SUBSTRING_LIST,BITTWO)(rc, bptr, offsets, count, listptr) - -#define PCRE_JIT_STACK_ALLOC(startsize, maxsize) \ - (pcre_mode == G(G(PCRE,BITONE),_MODE)) ? 
\ - G(PCRE_JIT_STACK_ALLOC,BITONE)(startsize, maxsize) \ - : G(PCRE_JIT_STACK_ALLOC,BITTWO)(startsize, maxsize) - -#define PCRE_JIT_STACK_FREE(stack) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_JIT_STACK_FREE,BITONE)(stack); \ - else \ - G(PCRE_JIT_STACK_FREE,BITTWO)(stack) - -#define PCRE_MAKETABLES \ - (pcre_mode == G(G(PCRE,BITONE),_MODE)) ? \ - G(G(pcre,BITONE),_maketables)() : G(G(pcre,BITTWO),_maketables)() - -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER(rc, re, extra, tables) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_PATTERN_TO_HOST_BYTE_ORDER,BITONE)(rc, re, extra, tables); \ - else \ - G(PCRE_PATTERN_TO_HOST_BYTE_ORDER,BITTWO)(rc, re, extra, tables) - -#define PCRE_PRINTINT(re, outfile, debug_lengths) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_PRINTINT,BITONE)(re, outfile, debug_lengths); \ - else \ - G(PCRE_PRINTINT,BITTWO)(re, outfile, debug_lengths) - -#define PCRE_STUDY(extra, re, options, error) \ - if (pcre_mode == G(G(PCRE,BITONE),_MODE)) \ - G(PCRE_STUDY,BITONE)(extra, re, options, error); \ - else \ - G(PCRE_STUDY,BITTWO)(extra, re, options, error) - -#endif /* Two out of three modes */ - -/* ----- End of cases where more than one mode is supported ----- */ - - -/* ----- Only 8-bit mode is supported ----- */ - -#elif defined SUPPORT_PCRE8 -#define CHAR_SIZE 1 -#define PCHARS PCHARS8 -#define PCHARSV PCHARSV8 -#define READ_CAPTURE_NAME READ_CAPTURE_NAME8 -#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT8 -#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD8 -#define STRLEN STRLEN8 -#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK8 -#define PCRE_COMPILE PCRE_COMPILE8 -#define PCRE_CONFIG pcre_config -#define PCRE_COPY_NAMED_SUBSTRING PCRE_COPY_NAMED_SUBSTRING8 -#define PCRE_COPY_SUBSTRING PCRE_COPY_SUBSTRING8 -#define PCRE_DFA_EXEC PCRE_DFA_EXEC8 -#define PCRE_EXEC PCRE_EXEC8 -#define PCRE_FREE_STUDY PCRE_FREE_STUDY8 -#define PCRE_FREE_SUBSTRING PCRE_FREE_SUBSTRING8 -#define PCRE_FREE_SUBSTRING_LIST 
PCRE_FREE_SUBSTRING_LIST8 -#define PCRE_GET_NAMED_SUBSTRING PCRE_GET_NAMED_SUBSTRING8 -#define PCRE_GET_STRINGNUMBER PCRE_GET_STRINGNUMBER8 -#define PCRE_GET_SUBSTRING PCRE_GET_SUBSTRING8 -#define PCRE_GET_SUBSTRING_LIST PCRE_GET_SUBSTRING_LIST8 -#define PCRE_JIT_STACK_ALLOC PCRE_JIT_STACK_ALLOC8 -#define PCRE_JIT_STACK_FREE PCRE_JIT_STACK_FREE8 -#define PCRE_MAKETABLES pcre_maketables() -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER PCRE_PATTERN_TO_HOST_BYTE_ORDER8 -#define PCRE_PRINTINT PCRE_PRINTINT8 -#define PCRE_STUDY PCRE_STUDY8 - -/* ----- Only 16-bit mode is supported ----- */ - -#elif defined SUPPORT_PCRE16 -#define CHAR_SIZE 2 -#define PCHARS PCHARS16 -#define PCHARSV PCHARSV16 -#define READ_CAPTURE_NAME READ_CAPTURE_NAME16 -#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT16 -#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD16 -#define STRLEN STRLEN16 -#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK16 -#define PCRE_COMPILE PCRE_COMPILE16 -#define PCRE_CONFIG pcre16_config -#define PCRE_COPY_NAMED_SUBSTRING PCRE_COPY_NAMED_SUBSTRING16 -#define PCRE_COPY_SUBSTRING PCRE_COPY_SUBSTRING16 -#define PCRE_DFA_EXEC PCRE_DFA_EXEC16 -#define PCRE_EXEC PCRE_EXEC16 -#define PCRE_FREE_STUDY PCRE_FREE_STUDY16 -#define PCRE_FREE_SUBSTRING PCRE_FREE_SUBSTRING16 -#define PCRE_FREE_SUBSTRING_LIST PCRE_FREE_SUBSTRING_LIST16 -#define PCRE_GET_NAMED_SUBSTRING PCRE_GET_NAMED_SUBSTRING16 -#define PCRE_GET_STRINGNUMBER PCRE_GET_STRINGNUMBER16 -#define PCRE_GET_SUBSTRING PCRE_GET_SUBSTRING16 -#define PCRE_GET_SUBSTRING_LIST PCRE_GET_SUBSTRING_LIST16 -#define PCRE_JIT_STACK_ALLOC PCRE_JIT_STACK_ALLOC16 -#define PCRE_JIT_STACK_FREE PCRE_JIT_STACK_FREE16 -#define PCRE_MAKETABLES pcre16_maketables() -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER PCRE_PATTERN_TO_HOST_BYTE_ORDER16 -#define PCRE_PRINTINT PCRE_PRINTINT16 -#define PCRE_STUDY PCRE_STUDY16 - -/* ----- Only 32-bit mode is supported ----- */ - -#elif defined SUPPORT_PCRE32 -#define CHAR_SIZE 4 -#define PCHARS PCHARS32 -#define PCHARSV 
PCHARSV32 -#define READ_CAPTURE_NAME READ_CAPTURE_NAME32 -#define SET_PCRE_CALLOUT SET_PCRE_CALLOUT32 -#define SET_PCRE_STACK_GUARD SET_PCRE_STACK_GUARD32 -#define STRLEN STRLEN32 -#define PCRE_ASSIGN_JIT_STACK PCRE_ASSIGN_JIT_STACK32 -#define PCRE_COMPILE PCRE_COMPILE32 -#define PCRE_CONFIG pcre32_config -#define PCRE_COPY_NAMED_SUBSTRING PCRE_COPY_NAMED_SUBSTRING32 -#define PCRE_COPY_SUBSTRING PCRE_COPY_SUBSTRING32 -#define PCRE_DFA_EXEC PCRE_DFA_EXEC32 -#define PCRE_EXEC PCRE_EXEC32 -#define PCRE_FREE_STUDY PCRE_FREE_STUDY32 -#define PCRE_FREE_SUBSTRING PCRE_FREE_SUBSTRING32 -#define PCRE_FREE_SUBSTRING_LIST PCRE_FREE_SUBSTRING_LIST32 -#define PCRE_GET_NAMED_SUBSTRING PCRE_GET_NAMED_SUBSTRING32 -#define PCRE_GET_STRINGNUMBER PCRE_GET_STRINGNUMBER32 -#define PCRE_GET_SUBSTRING PCRE_GET_SUBSTRING32 -#define PCRE_GET_SUBSTRING_LIST PCRE_GET_SUBSTRING_LIST32 -#define PCRE_JIT_STACK_ALLOC PCRE_JIT_STACK_ALLOC32 -#define PCRE_JIT_STACK_FREE PCRE_JIT_STACK_FREE32 -#define PCRE_MAKETABLES pcre32_maketables() -#define PCRE_PATTERN_TO_HOST_BYTE_ORDER PCRE_PATTERN_TO_HOST_BYTE_ORDER32 -#define PCRE_PRINTINT PCRE_PRINTINT32 -#define PCRE_STUDY PCRE_STUDY32 - -#endif - -/* ----- End of mode-specific function call macros ----- */ - - -/* Other parameters */ - -#ifndef CLOCKS_PER_SEC -#ifdef CLK_TCK -#define CLOCKS_PER_SEC CLK_TCK -#else -#define CLOCKS_PER_SEC 100 -#endif -#endif - -#if !defined NODFA -#define DFA_WS_DIMENSION 1000 -#endif - -/* This is the default loop count for timing. 
*/ - -#define LOOPREPEAT 500000 - -/* Static variables */ - -static FILE *outfile; -static int log_store = 0; -static int callout_count; -static int callout_extra; -static int callout_fail_count; -static int callout_fail_id; -static int debug_lengths; -static int first_callout; -static int jit_was_used; -static int locale_set = 0; -static int show_malloc; -static int stack_guard_return; -static int use_utf; -static const unsigned char *last_callout_mark = NULL; - -/* The buffers grow automatically if very long input lines are encountered. */ - -static int buffer_size = 50000; -static pcre_uint8 *buffer = NULL; -static pcre_uint8 *pbuffer = NULL; - -/* Just as a safety check, make sure that COMPILE_PCRE[16|32] are *not* set. */ - -#ifdef COMPILE_PCRE16 -#error COMPILE_PCRE16 must not be set when compiling pcretest.c -#endif - -#ifdef COMPILE_PCRE32 -#error COMPILE_PCRE32 must not be set when compiling pcretest.c -#endif - -/* We need buffers for building 16/32-bit strings, and the tables of operator -lengths that are used for 16/32-bit compiling, in order to swap bytes in a -pattern for saving/reloading testing. Luckily, the data for these tables is -defined as a macro. However, we must ensure that LINK_SIZE and IMM2_SIZE (which -are used in the tables) are adjusted appropriately for the 16/32-bit world. -LINK_SIZE is also used later in this program. 
*/ - -#ifdef SUPPORT_PCRE16 -#undef IMM2_SIZE -#define IMM2_SIZE 1 - -#if LINK_SIZE == 2 -#undef LINK_SIZE -#define LINK_SIZE 1 -#elif LINK_SIZE == 3 || LINK_SIZE == 4 -#undef LINK_SIZE -#define LINK_SIZE 2 -#else -#error LINK_SIZE must be either 2, 3, or 4 -#endif - -static int buffer16_size = 0; -static pcre_uint16 *buffer16 = NULL; -static const pcre_uint16 OP_lengths16[] = { OP_LENGTHS }; -#endif /* SUPPORT_PCRE16 */ - -#ifdef SUPPORT_PCRE32 -#undef IMM2_SIZE -#define IMM2_SIZE 1 -#undef LINK_SIZE -#define LINK_SIZE 1 - -static int buffer32_size = 0; -static pcre_uint32 *buffer32 = NULL; -static const pcre_uint32 OP_lengths32[] = { OP_LENGTHS }; -#endif /* SUPPORT_PCRE32 */ - -/* If we have 8-bit support, default to it; if there is also 16-or 32-bit -support, it can be changed by an option. If there is no 8-bit support, there -must be 16-or 32-bit support, so default it to 1. */ - -#if defined SUPPORT_PCRE8 -static int pcre_mode = PCRE8_MODE; -#elif defined SUPPORT_PCRE16 -static int pcre_mode = PCRE16_MODE; -#elif defined SUPPORT_PCRE32 -static int pcre_mode = PCRE32_MODE; -#endif - -/* JIT study options for -s+n and /S+n where '1' <= n <= '7'. 
*/ - -static int jit_study_bits[] = - { - PCRE_STUDY_JIT_COMPILE, - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE, - PCRE_STUDY_JIT_COMPILE + PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE, - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE, - PCRE_STUDY_JIT_COMPILE + PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE, - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE + PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE, - PCRE_STUDY_JIT_COMPILE + PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE + - PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE -}; - -#define PCRE_STUDY_ALLJIT (PCRE_STUDY_JIT_COMPILE | \ - PCRE_STUDY_JIT_PARTIAL_SOFT_COMPILE | PCRE_STUDY_JIT_PARTIAL_HARD_COMPILE) - -/* Textual explanations for runtime error codes */ - -static const char *errtexts[] = { - NULL, /* 0 is no error */ - NULL, /* NOMATCH is handled specially */ - "NULL argument passed", - "bad option value", - "magic number missing", - "unknown opcode - pattern overwritten?", - "no more memory", - NULL, /* never returned by pcre_exec() or pcre_dfa_exec() */ - "match limit exceeded", - "callout error code", - NULL, /* BADUTF8/16 is handled specially */ - NULL, /* BADUTF8/16 offset is handled specially */ - NULL, /* PARTIAL is handled specially */ - "not used - internal error", - "internal error - pattern overwritten?", - "bad count value", - "item unsupported for DFA matching", - "backreference condition or recursion test not supported for DFA matching", - "match limit not supported for DFA matching", - "workspace size exceeded in DFA matching", - "too much recursion for DFA matching", - "recursion limit exceeded", - "not used - internal error", - "invalid combination of newline options", - "bad offset value", - NULL, /* SHORTUTF8/16 is handled specially */ - "nested recursion at the same subject position", - "JIT stack limit reached", - "pattern compiled in wrong mode: 8-bit/16-bit error", - "pattern compiled with other endianness", - "invalid data in workspace for DFA restart", - "bad JIT option", - "bad length" -}; - - -/************************************************* -* Alternate 
character tables * -*************************************************/ - -/* By default, the "tables" pointer when calling PCRE is set to NULL, thereby -using the default tables of the library. However, the T option can be used to -select alternate sets of tables, for different kinds of testing. Note also that -the L (locale) option also adjusts the tables. */ - -/* This is the set of tables distributed as default with PCRE. It recognizes -only ASCII characters. */ - -static const pcre_uint8 tables0[] = { - -/* This table is a lower casing table. */ - - 0, 1, 2, 3, 4, 5, 6, 7, - 8, 9, 10, 11, 12, 13, 14, 15, - 16, 17, 18, 19, 20, 21, 22, 23, - 24, 25, 26, 27, 28, 29, 30, 31, - 32, 33, 34, 35, 36, 37, 38, 39, - 40, 41, 42, 43, 44, 45, 46, 47, - 48, 49, 50, 51, 52, 53, 54, 55, - 56, 57, 58, 59, 60, 61, 62, 63, - 64, 97, 98, 99,100,101,102,103, - 104,105,106,107,108,109,110,111, - 112,113,114,115,116,117,118,119, - 120,121,122, 91, 92, 93, 94, 95, - 96, 97, 98, 99,100,101,102,103, - 104,105,106,107,108,109,110,111, - 112,113,114,115,116,117,118,119, - 120,121,122,123,124,125,126,127, - 128,129,130,131,132,133,134,135, - 136,137,138,139,140,141,142,143, - 144,145,146,147,148,149,150,151, - 152,153,154,155,156,157,158,159, - 160,161,162,163,164,165,166,167, - 168,169,170,171,172,173,174,175, - 176,177,178,179,180,181,182,183, - 184,185,186,187,188,189,190,191, - 192,193,194,195,196,197,198,199, - 200,201,202,203,204,205,206,207, - 208,209,210,211,212,213,214,215, - 216,217,218,219,220,221,222,223, - 224,225,226,227,228,229,230,231, - 232,233,234,235,236,237,238,239, - 240,241,242,243,244,245,246,247, - 248,249,250,251,252,253,254,255, - -/* This table is a case flipping table. 
*/ - - 0, 1, 2, 3, 4, 5, 6, 7, - 8, 9, 10, 11, 12, 13, 14, 15, - 16, 17, 18, 19, 20, 21, 22, 23, - 24, 25, 26, 27, 28, 29, 30, 31, - 32, 33, 34, 35, 36, 37, 38, 39, - 40, 41, 42, 43, 44, 45, 46, 47, - 48, 49, 50, 51, 52, 53, 54, 55, - 56, 57, 58, 59, 60, 61, 62, 63, - 64, 97, 98, 99,100,101,102,103, - 104,105,106,107,108,109,110,111, - 112,113,114,115,116,117,118,119, - 120,121,122, 91, 92, 93, 94, 95, - 96, 65, 66, 67, 68, 69, 70, 71, - 72, 73, 74, 75, 76, 77, 78, 79, - 80, 81, 82, 83, 84, 85, 86, 87, - 88, 89, 90,123,124,125,126,127, - 128,129,130,131,132,133,134,135, - 136,137,138,139,140,141,142,143, - 144,145,146,147,148,149,150,151, - 152,153,154,155,156,157,158,159, - 160,161,162,163,164,165,166,167, - 168,169,170,171,172,173,174,175, - 176,177,178,179,180,181,182,183, - 184,185,186,187,188,189,190,191, - 192,193,194,195,196,197,198,199, - 200,201,202,203,204,205,206,207, - 208,209,210,211,212,213,214,215, - 216,217,218,219,220,221,222,223, - 224,225,226,227,228,229,230,231, - 232,233,234,235,236,237,238,239, - 240,241,242,243,244,245,246,247, - 248,249,250,251,252,253,254,255, - -/* This table contains bit maps for various character classes. Each map is 32 -bytes long and the bits run from the least significant end of each byte. The -classes that have their own maps are: space, xdigit, digit, upper, lower, word, -graph, print, punct, and cntrl. Other classes are built from combinations. 
*/ - - 0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, - 0x7e,0x00,0x00,0x00,0x7e,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0xfe,0xff,0xff,0x07,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0x07, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, - 0xfe,0xff,0xff,0x87,0xfe,0xff,0xff,0x07, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0xff, - 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0xff,0xff,0xff,0xff, - 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0x00,0x00,0x00,0x00,0xfe,0xff,0x00,0xfc, - 0x01,0x00,0x00,0xf8,0x01,0x00,0x00,0x78, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - - 0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - -/* This table identifies various classes of character by individual bits: - 0x01 white space character - 0x02 letter - 0x04 decimal digit - 0x08 hexadecimal digit - 0x10 alphanumeric or '_' - 0x80 regular expression metacharacter or binary zero -*/ - - 
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */ - 0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */ - 0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */ - 0x80,0x80,0x80,0x80,0x00,0x00,0x80,0x00, /* ( - / */ - 0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */ - 0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x80, /* 8 - ? */ - 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */ - 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */ - 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */ - 0x12,0x12,0x12,0x80,0x80,0x00,0x80,0x10, /* X - _ */ - 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */ - 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */ - 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */ - 0x12,0x12,0x12,0x80,0x80,0x00,0x00,0x00, /* x -127 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 152-159 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160-167 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 168-175 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 176-183 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 192-199 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 200-207 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 208-215 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 216-223 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 224-231 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 232-239 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */ - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */ - -/* This is a set of tables that came originally from a Windows user. It seems -to be at least an approximation of ISO 8859. 
In particular, there are -characters greater than 128 that are marked as spaces, letters, etc. */ - -static const pcre_uint8 tables1[] = { -0,1,2,3,4,5,6,7, -8,9,10,11,12,13,14,15, -16,17,18,19,20,21,22,23, -24,25,26,27,28,29,30,31, -32,33,34,35,36,37,38,39, -40,41,42,43,44,45,46,47, -48,49,50,51,52,53,54,55, -56,57,58,59,60,61,62,63, -64,97,98,99,100,101,102,103, -104,105,106,107,108,109,110,111, -112,113,114,115,116,117,118,119, -120,121,122,91,92,93,94,95, -96,97,98,99,100,101,102,103, -104,105,106,107,108,109,110,111, -112,113,114,115,116,117,118,119, -120,121,122,123,124,125,126,127, -128,129,130,131,132,133,134,135, -136,137,138,139,140,141,142,143, -144,145,146,147,148,149,150,151, -152,153,154,155,156,157,158,159, -160,161,162,163,164,165,166,167, -168,169,170,171,172,173,174,175, -176,177,178,179,180,181,182,183, -184,185,186,187,188,189,190,191, -224,225,226,227,228,229,230,231, -232,233,234,235,236,237,238,239, -240,241,242,243,244,245,246,215, -248,249,250,251,252,253,254,223, -224,225,226,227,228,229,230,231, -232,233,234,235,236,237,238,239, -240,241,242,243,244,245,246,247, -248,249,250,251,252,253,254,255, -0,1,2,3,4,5,6,7, -8,9,10,11,12,13,14,15, -16,17,18,19,20,21,22,23, -24,25,26,27,28,29,30,31, -32,33,34,35,36,37,38,39, -40,41,42,43,44,45,46,47, -48,49,50,51,52,53,54,55, -56,57,58,59,60,61,62,63, -64,97,98,99,100,101,102,103, -104,105,106,107,108,109,110,111, -112,113,114,115,116,117,118,119, -120,121,122,91,92,93,94,95, -96,65,66,67,68,69,70,71, -72,73,74,75,76,77,78,79, -80,81,82,83,84,85,86,87, -88,89,90,123,124,125,126,127, -128,129,130,131,132,133,134,135, -136,137,138,139,140,141,142,143, -144,145,146,147,148,149,150,151, -152,153,154,155,156,157,158,159, -160,161,162,163,164,165,166,167, -168,169,170,171,172,173,174,175, -176,177,178,179,180,181,182,183, -184,185,186,187,188,189,190,191, -224,225,226,227,228,229,230,231, -232,233,234,235,236,237,238,239, -240,241,242,243,244,245,246,215, -248,249,250,251,252,253,254,223, 
-192,193,194,195,196,197,198,199, -200,201,202,203,204,205,206,207, -208,209,210,211,212,213,214,247, -216,217,218,219,220,221,222,255, -0,62,0,0,1,0,0,0, -0,0,0,0,0,0,0,0, -32,0,0,0,1,0,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,255,3, -126,0,0,0,126,0,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,255,3, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,12,2, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,0,0, -254,255,255,7,0,0,0,0, -0,0,0,0,0,0,0,0, -255,255,127,127,0,0,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,254,255,255,7, -0,0,0,0,0,4,32,4, -0,0,0,128,255,255,127,255, -0,0,0,0,0,0,255,3, -254,255,255,135,254,255,255,7, -0,0,0,0,0,4,44,6, -255,255,127,255,255,255,127,255, -0,0,0,0,254,255,255,255, -255,255,255,255,255,255,255,127, -0,0,0,0,254,255,255,255, -255,255,255,255,255,255,255,255, -0,2,0,0,255,255,255,255, -255,255,255,255,255,255,255,127, -0,0,0,0,255,255,255,255, -255,255,255,255,255,255,255,255, -0,0,0,0,254,255,0,252, -1,0,0,248,1,0,0,120, -0,0,0,0,254,255,255,255, -0,0,128,0,0,0,128,0, -255,255,255,255,0,0,0,0, -0,0,0,0,0,0,0,128, -255,255,255,255,0,0,0,0, -0,0,0,0,0,0,0,0, -128,0,0,0,0,0,0,0, -0,1,1,0,1,1,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,0,0, -1,0,0,0,128,0,0,0, -128,128,128,128,0,0,128,0, -28,28,28,28,28,28,28,28, -28,28,0,0,0,0,0,128, -0,26,26,26,26,26,26,18, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,18, -18,18,18,128,128,0,128,16, -0,26,26,26,26,26,26,18, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,18, -18,18,18,128,128,0,0,0, -0,0,0,0,0,1,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,0,0, -0,0,0,0,0,0,0,0, -1,0,0,0,0,0,0,0, -0,0,18,0,0,0,0,0, -0,0,20,20,0,18,0,0, -0,20,18,0,0,0,0,0, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,0, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,18, -18,18,18,18,18,18,18,0, -18,18,18,18,18,18,18,18 -}; - - - - -#ifndef HAVE_STRERROR -/************************************************* -* Provide strerror() for non-ANSI libraries * -*************************************************/ - -/* Some 
old-fashioned systems still around (e.g. SunOS4) don't have strerror() -in their libraries, but can provide the same facility by this simple -alternative function. */ - -extern int sys_nerr; -extern char *sys_errlist[]; - -char * -strerror(int n) -{ -if (n < 0 || n >= sys_nerr) return "unknown error number"; -return sys_errlist[n]; -} -#endif /* HAVE_STRERROR */ - - - -/************************************************* -* Print newline configuration * -*************************************************/ - -/* -Arguments: - rc the return code from PCRE_CONFIG_NEWLINE - isc TRUE if called from "-C newline" -Returns: nothing -*/ - -static void -print_newline_config(int rc, BOOL isc) -{ -const char *s = NULL; -if (!isc) printf(" Newline sequence is "); -switch(rc) - { - case CHAR_CR: s = "CR"; break; - case CHAR_LF: s = "LF"; break; - case (CHAR_CR<<8 | CHAR_LF): s = "CRLF"; break; - case -1: s = "ANY"; break; - case -2: s = "ANYCRLF"; break; - - default: - printf("a non-standard value: 0x%04x\n", rc); - return; - } - -printf("%s\n", s); -} - - - -/************************************************* -* JIT memory callback * -*************************************************/ - -static pcre_jit_stack* jit_callback(void *arg) -{ -jit_was_used = TRUE; -return (pcre_jit_stack *)arg; -} - - -#if !defined NOUTF || defined SUPPORT_PCRE16 || defined SUPPORT_PCRE32 -/************************************************* -* Convert UTF-8 string to value * -*************************************************/ - -/* This function takes one or more bytes that represents a UTF-8 character, -and returns the value of the character. 
- -Argument: - utf8bytes a pointer to the byte vector - vptr a pointer to an int to receive the value - -Returns: > 0 => the number of bytes consumed - -6 to 0 => malformed UTF-8 character at offset = (-return) -*/ - -static int -utf82ord(pcre_uint8 *utf8bytes, pcre_uint32 *vptr) -{ -pcre_uint32 c = *utf8bytes++; -pcre_uint32 d = c; -int i, j, s; - -for (i = -1; i < 6; i++) /* i is number of additional bytes */ - { - if ((d & 0x80) == 0) break; - d <<= 1; - } - -if (i == -1) { *vptr = c; return 1; } /* ascii character */ -if (i == 0 || i == 6) return 0; /* invalid UTF-8 */ - -/* i now has a value in the range 1-5 */ - -s = 6*i; -d = (c & utf8_table3[i]) << s; - -for (j = 0; j < i; j++) - { - c = *utf8bytes++; - if ((c & 0xc0) != 0x80) return -(j+1); - s -= 6; - d |= (c & 0x3f) << s; - } - -/* Check that encoding was the correct unique one */ - -for (j = 0; j < utf8_table1_size; j++) - if (d <= (pcre_uint32)utf8_table1[j]) break; -if (j != i) return -(i+1); - -/* Valid value */ - -*vptr = d; -return i+1; -} -#endif /* NOUTF || SUPPORT_PCRE16 */ - - - -#if defined SUPPORT_PCRE8 && !defined NOUTF -/************************************************* -* Convert character value to UTF-8 * -*************************************************/ - -/* This function takes an integer value in the range 0 - 0x7fffffff -and encodes it as a UTF-8 character in 0 to 6 bytes. 
- -Arguments: - cvalue the character value - utf8bytes pointer to buffer for result - at least 6 bytes long - -Returns: number of characters placed in the buffer -*/ - -static int -ord2utf8(pcre_uint32 cvalue, pcre_uint8 *utf8bytes) -{ -register int i, j; -if (cvalue > 0x7fffffffu) - return -1; -for (i = 0; i < utf8_table1_size; i++) - if (cvalue <= (pcre_uint32)utf8_table1[i]) break; -utf8bytes += i; -for (j = i; j > 0; j--) - { - *utf8bytes-- = 0x80 | (cvalue & 0x3f); - cvalue >>= 6; - } -*utf8bytes = utf8_table2[i] | cvalue; -return i + 1; -} -#endif - - -#ifdef SUPPORT_PCRE16 -/************************************************* -* Convert a string to 16-bit * -*************************************************/ - -/* In non-UTF mode, the space needed for a 16-bit string is exactly double the -8-bit size. For a UTF-8 string, the size needed for UTF-16 is no more than -double, because up to 0xffff uses no more than 3 bytes in UTF-8 but possibly 4 -in UTF-16. Higher values use 4 bytes in UTF-8 and up to 4 bytes in UTF-16. The -result is always left in buffer16. - -Note that this function does not object to surrogate values. This is -deliberate; it makes it possible to construct UTF-16 strings that are invalid, -for the purpose of testing that they are correctly faulted. - -Patterns to be converted are either plain ASCII or UTF-8; data lines are always -in UTF-8 so that values greater than 255 can be handled. 
- -Arguments: - data TRUE if converting a data line; FALSE for a regex - p points to a byte string - utf true if UTF-8 (to be converted to UTF-16) - len number of bytes in the string (excluding trailing zero) - -Returns: number of 16-bit data items used (excluding trailing zero) - OR -1 if a UTF-8 string is malformed - OR -2 if a value > 0x10ffff is encountered - OR -3 if a value > 0xffff is encountered when not in UTF mode -*/ - -static int -to16(int data, pcre_uint8 *p, int utf, int len) -{ -pcre_uint16 *pp; - -if (buffer16_size < 2*len + 2) - { - if (buffer16 != NULL) free(buffer16); - buffer16_size = 2*len + 2; - buffer16 = (pcre_uint16 *)malloc(buffer16_size); - if (buffer16 == NULL) - { - fprintf(stderr, "pcretest: malloc(%d) failed for buffer16\n", buffer16_size); - exit(1); - } - } - -pp = buffer16; - -if (!utf && !data) - { - while (len-- > 0) *pp++ = *p++; - } - -else - { - pcre_uint32 c = 0; - while (len > 0) - { - int chlen = utf82ord(p, &c); - if (chlen <= 0) return -1; - if (c > 0x10ffff) return -2; - p += chlen; - len -= chlen; - if (c < 0x10000) *pp++ = c; else - { - if (!utf) return -3; - c -= 0x10000; - *pp++ = 0xD800 | (c >> 10); - *pp++ = 0xDC00 | (c & 0x3ff); - } - } - } - -*pp = 0; -return pp - buffer16; -} -#endif - -#ifdef SUPPORT_PCRE32 -/************************************************* -* Convert a string to 32-bit * -*************************************************/ - -/* In non-UTF mode, the space needed for a 32-bit string is exactly four times the -8-bit size. For a UTF-8 string, the size needed for UTF-32 is no more than four -times, because up to 0xffff uses no more than 3 bytes in UTF-8 but possibly 4 -in UTF-32. Higher values use 4 bytes in UTF-8 and up to 4 bytes in UTF-32. The -result is always left in buffer32. - -Note that this function does not object to surrogate values. This is -deliberate; it makes it possible to construct UTF-32 strings that are invalid, -for the purpose of testing that they are correctly faulted. 
- -Patterns to be converted are either plain ASCII or UTF-8; data lines are always -in UTF-8 so that values greater than 255 can be handled. - -Arguments: - data TRUE if converting a data line; FALSE for a regex - p points to a byte string - utf true if UTF-8 (to be converted to UTF-32) - len number of bytes in the string (excluding trailing zero) - -Returns: number of 32-bit data items used (excluding trailing zero) - OR -1 if a UTF-8 string is malformed - OR -2 if a value > 0x10ffff is encountered - OR -3 if an ill-formed value is encountered (i.e. a surrogate) -*/ - -static int -to32(int data, pcre_uint8 *p, int utf, int len) -{ -pcre_uint32 *pp; - -if (buffer32_size < 4*len + 4) - { - if (buffer32 != NULL) free(buffer32); - buffer32_size = 4*len + 4; - buffer32 = (pcre_uint32 *)malloc(buffer32_size); - if (buffer32 == NULL) - { - fprintf(stderr, "pcretest: malloc(%d) failed for buffer32\n", buffer32_size); - exit(1); - } - } - -pp = buffer32; - -if (!utf && !data) - { - while (len-- > 0) *pp++ = *p++; - } - -else - { - pcre_uint32 c = 0; - while (len > 0) - { - int chlen = utf82ord(p, &c); - if (chlen <= 0) return -1; - if (utf) - { - if (c > 0x10ffff) return -2; - if (!data && (c & 0xfffff800u) == 0xd800u) return -3; - } - - p += chlen; - len -= chlen; - *pp++ = c; - } - } - -*pp = 0; -return pp - buffer32; -} - -/* Check that a 32-bit character string is valid UTF-32. 
- -Arguments: - string points to the string - length length of string, or -1 if the string is zero-terminated - -Returns: TRUE if the string is a valid UTF-32 string - FALSE otherwise -*/ - -#ifdef NEVER /* Not used */ -#ifdef SUPPORT_UTF -static BOOL -valid_utf32(pcre_uint32 *string, int length) -{ -register pcre_uint32 *p; -register pcre_uint32 c; - -for (p = string; length-- > 0; p++) - { - c = *p; - if (c > 0x10ffffu) return FALSE; /* Too big */ - if ((c & 0xfffff800u) == 0xd800u) return FALSE; /* Surrogate */ - } - -return TRUE; -} -#endif /* SUPPORT_UTF */ -#endif /* NEVER */ -#endif /* SUPPORT_PCRE32 */ - - -/************************************************* -* Read or extend an input line * -*************************************************/ - -/* Input lines are read into buffer, but both patterns and data lines can be -continued over multiple input lines. In addition, if the buffer fills up, we -want to automatically expand it so as to be able to handle extremely large -lines that are needed for certain stress tests. When the input buffer is -expanded, the other two buffers must also be expanded likewise, and the -contents of pbuffer, which are a copy of the input for callouts, must be -preserved (for when expansion happens for a data line). This is not the most -optimal way of handling this, but hey, this is just a test program! - -Arguments: - f the file to read - start where in buffer to start (this *must* be within buffer) - prompt for stdin or readline() - -Returns: pointer to the start of new data - could be a copy of start, or could be moved - NULL if no data read and EOF reached -*/ - -static pcre_uint8 * -extend_inputline(FILE *f, pcre_uint8 *start, const char *prompt) -{ -pcre_uint8 *here = start; - -for (;;) - { - size_t rlen = (size_t)(buffer_size - (here - buffer)); - - if (rlen > 1000) - { - int dlen; - - /* If libreadline or libedit support is required, use readline() to read a - line if the input is a terminal. 
Note that readline() removes the trailing - newline, so we must put it back again, to be compatible with fgets(). */ - -#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) - if (isatty(fileno(f))) - { - size_t len; - char *s = readline(prompt); - if (s == NULL) return (here == start)? NULL : start; - len = strlen(s); - if (len > 0) add_history(s); - if (len > rlen - 1) len = rlen - 1; - memcpy(here, s, len); - here[len] = '\n'; - here[len+1] = 0; - free(s); - } - else -#endif - - /* Read the next line by normal means, prompting if the file is stdin. */ - - { - if (f == stdin) printf("%s", prompt); - if (fgets((char *)here, rlen, f) == NULL) - return (here == start)? NULL : start; - } - - dlen = (int)strlen((char *)here); - if (dlen > 0 && here[dlen - 1] == '\n') return start; - here += dlen; - } - - else - { - int new_buffer_size = 2*buffer_size; - pcre_uint8 *new_buffer = (pcre_uint8 *)malloc(new_buffer_size); - pcre_uint8 *new_pbuffer = (pcre_uint8 *)malloc(new_buffer_size); - - if (new_buffer == NULL || new_pbuffer == NULL) - { - fprintf(stderr, "pcretest: malloc(%d) failed\n", new_buffer_size); - exit(1); - } - - memcpy(new_buffer, buffer, buffer_size); - memcpy(new_pbuffer, pbuffer, buffer_size); - - buffer_size = new_buffer_size; - - start = new_buffer + (start - buffer); - here = new_buffer + (here - buffer); - - free(buffer); - free(pbuffer); - - buffer = new_buffer; - pbuffer = new_pbuffer; - } - } - -/* Control never gets here */ -} - - - -/************************************************* -* Read number from string * -*************************************************/ - -/* We don't use strtoul() because SunOS4 doesn't have it. Rather than mess -around with conditional compilation, just do the job by hand. It is only used -for unpicking arguments, so just keep it simple. 
- -Arguments: - str string to be converted - endptr where to put the end pointer - -Returns: the unsigned long -*/ - -static int -get_value(pcre_uint8 *str, pcre_uint8 **endptr) -{ -int result = 0; -while(*str != 0 && isspace(*str)) str++; -while (isdigit(*str)) result = result * 10 + (int)(*str++ - '0'); -*endptr = str; -return(result); -} - - - -/************************************************* -* Print one character * -*************************************************/ - -/* Print a single character either literally, or as a hex escape. */ - -static int pchar(pcre_uint32 c, FILE *f) -{ -int n = 0; -char tempbuffer[16]; -if (PRINTOK(c)) - { - if (f != NULL) fprintf(f, "%c", c); - return 1; - } - -if (c < 0x100) - { - if (use_utf) - { - if (f != NULL) fprintf(f, "\\x{%02x}", c); - return 6; - } - else - { - if (f != NULL) fprintf(f, "\\x%02x", c); - return 4; - } - } - -if (f != NULL) n = fprintf(f, "\\x{%02x}", c); - else n = sprintf(tempbuffer, "\\x{%02x}", c); - -return n >= 0 ? n : 0; -} - - - -#ifdef SUPPORT_PCRE8 -/************************************************* -* Print 8-bit character string * -*************************************************/ - -/* Must handle UTF-8 strings in utf8 mode. Yields number of characters printed. -If handed a NULL file, just counts chars without printing. 
*/ - -static int pchars(pcre_uint8 *p, int length, FILE *f) -{ -pcre_uint32 c = 0; -int yield = 0; - -if (length < 0) - length = strlen((char *)p); - -while (length-- > 0) - { -#if !defined NOUTF - if (use_utf) - { - int rc = utf82ord(p, &c); - if (rc > 0 && rc <= length + 1) /* Mustn't run over the end */ - { - length -= rc - 1; - p += rc; - yield += pchar(c, f); - continue; - } - } -#endif - c = *p++; - yield += pchar(c, f); - } - -return yield; -} -#endif - - - -#ifdef SUPPORT_PCRE16 -/************************************************* -* Find length of 0-terminated 16-bit string * -*************************************************/ - -static int strlen16(PCRE_SPTR16 p) -{ -PCRE_SPTR16 pp = p; -while (*pp != 0) pp++; -return (int)(pp - p); -} -#endif /* SUPPORT_PCRE16 */ - - - -#ifdef SUPPORT_PCRE32 -/************************************************* -* Find length of 0-terminated 32-bit string * -*************************************************/ - -static int strlen32(PCRE_SPTR32 p) -{ -PCRE_SPTR32 pp = p; -while (*pp != 0) pp++; -return (int)(pp - p); -} -#endif /* SUPPORT_PCRE32 */ - - - -#ifdef SUPPORT_PCRE16 -/************************************************* -* Print 16-bit character string * -*************************************************/ - -/* Must handle UTF-16 strings in utf mode. Yields number of characters printed. -If handed a NULL file, just counts chars without printing. 
*/ - -static int pchars16(PCRE_SPTR16 p, int length, FILE *f) -{ -int yield = 0; - -if (length < 0) - length = strlen16(p); - -while (length-- > 0) - { - pcre_uint32 c = *p++ & 0xffff; -#if !defined NOUTF - if (use_utf && c >= 0xD800 && c < 0xDC00 && length > 0) - { - int d = *p & 0xffff; - if (d >= 0xDC00 && d <= 0xDFFF) - { - c = ((c & 0x3ff) << 10) + (d & 0x3ff) + 0x10000; - length--; - p++; - } - } -#endif - yield += pchar(c, f); - } - -return yield; -} -#endif /* SUPPORT_PCRE16 */ - - - -#ifdef SUPPORT_PCRE32 -/************************************************* -* Print 32-bit character string * -*************************************************/ - -/* Must handle UTF-32 strings in utf mode. Yields number of characters printed. -If handed a NULL file, just counts chars without printing. */ - -static int pchars32(PCRE_SPTR32 p, int length, BOOL utf, FILE *f) -{ -int yield = 0; - -(void)(utf); /* Avoid compiler warning */ - -if (length < 0) - length = strlen32(p); - -while (length-- > 0) - { - pcre_uint32 c = *p++; - yield += pchar(c, f); - } - -return yield; -} -#endif /* SUPPORT_PCRE32 */ - - - -#ifdef SUPPORT_PCRE8 -/************************************************* -* Read a capture name (8-bit) and check it * -*************************************************/ - -static pcre_uint8 * -read_capture_name8(pcre_uint8 *p, pcre_uint8 **pp, pcre *re) -{ -pcre_uint8 *npp = *pp; -while (isalnum(*p)) *npp++ = *p++; -*npp++ = 0; -*npp = 0; -if (pcre_get_stringnumber(re, (char *)(*pp)) < 0) - { - fprintf(outfile, "no parentheses with name \""); - PCHARSV(*pp, 0, -1, outfile); - fprintf(outfile, "\"\n"); - } - -*pp = npp; -return p; -} -#endif /* SUPPORT_PCRE8 */ - - - -#ifdef SUPPORT_PCRE16 -/************************************************* -* Read a capture name (16-bit) and check it * -*************************************************/ - -/* Note that the text being read is 8-bit. 
*/ - -static pcre_uint8 * -read_capture_name16(pcre_uint8 *p, pcre_uint16 **pp, pcre *re) -{ -pcre_uint16 *npp = *pp; -while (isalnum(*p)) *npp++ = *p++; -*npp++ = 0; -*npp = 0; -if (pcre16_get_stringnumber((pcre16 *)re, (PCRE_SPTR16)(*pp)) < 0) - { - fprintf(outfile, "no parentheses with name \""); - PCHARSV(*pp, 0, -1, outfile); - fprintf(outfile, "\"\n"); - } -*pp = npp; -return p; -} -#endif /* SUPPORT_PCRE16 */ - - - -#ifdef SUPPORT_PCRE32 -/************************************************* -* Read a capture name (32-bit) and check it * -*************************************************/ - -/* Note that the text being read is 8-bit. */ - -static pcre_uint8 * -read_capture_name32(pcre_uint8 *p, pcre_uint32 **pp, pcre *re) -{ -pcre_uint32 *npp = *pp; -while (isalnum(*p)) *npp++ = *p++; -*npp++ = 0; -*npp = 0; -if (pcre32_get_stringnumber((pcre32 *)re, (PCRE_SPTR32)(*pp)) < 0) - { - fprintf(outfile, "no parentheses with name \""); - PCHARSV(*pp, 0, -1, outfile); - fprintf(outfile, "\"\n"); - } -*pp = npp; -return p; -} -#endif /* SUPPORT_PCRE32 */ - - - -/************************************************* -* Stack guard function * -*************************************************/ - -/* Called from PCRE when set in pcre_stack_guard. We give an error (non-zero) -return when a count overflows. */ - -static int stack_guard(void) -{ -return stack_guard_return; -} - -/************************************************* -* Callout function * -*************************************************/ - -/* Called from PCRE as a result of the (?C) item. We print out where we are in -the match. Yield zero unless more callouts than the fail count, or the callout -data is not zero. */ - -static int callout(pcre_callout_block *cb) -{ -FILE *f = (first_callout | callout_extra)? 
outfile : NULL; -int i, current_position, pre_start, post_start, subject_length; - -if (callout_extra) - { - fprintf(f, "Callout %d: last capture = %d\n", - cb->callout_number, cb->capture_last); - - if (cb->offset_vector != NULL) - { - for (i = 0; i < cb->capture_top * 2; i += 2) - { - if (cb->offset_vector[i] < 0) - fprintf(f, "%2d: \n", i/2); - else - { - fprintf(f, "%2d: ", i/2); - PCHARSV(cb->subject, cb->offset_vector[i], - cb->offset_vector[i+1] - cb->offset_vector[i], f); - fprintf(f, "\n"); - } - } - } - } - -/* Re-print the subject in canonical form, the first time or if giving full -datails. On subsequent calls in the same match, we use pchars just to find the -printed lengths of the substrings. */ - -if (f != NULL) fprintf(f, "--->"); - -/* If a lookbehind is involved, the current position may be earlier than the -match start. If so, use the match start instead. */ - -current_position = (cb->current_position >= cb->start_match)? - cb->current_position : cb->start_match; - -PCHARS(pre_start, cb->subject, 0, cb->start_match, f); -PCHARS(post_start, cb->subject, cb->start_match, - current_position - cb->start_match, f); - -PCHARS(subject_length, cb->subject, 0, cb->subject_length, NULL); - -PCHARSV(cb->subject, current_position, cb->subject_length - current_position, f); - -if (f != NULL) fprintf(f, "\n"); - -/* Always print appropriate indicators, with callout number if not already -shown. For automatic callouts, show the pattern offset. 
*/ - -if (cb->callout_number == 255) - { - fprintf(outfile, "%+3d ", cb->pattern_position); - if (cb->pattern_position > 99) fprintf(outfile, "\n "); - } -else - { - if (callout_extra) fprintf(outfile, " "); - else fprintf(outfile, "%3d ", cb->callout_number); - } - -for (i = 0; i < pre_start; i++) fprintf(outfile, " "); -fprintf(outfile, "^"); - -if (post_start > 0) - { - for (i = 0; i < post_start - 1; i++) fprintf(outfile, " "); - fprintf(outfile, "^"); - } - -for (i = 0; i < subject_length - pre_start - post_start + 4; i++) - fprintf(outfile, " "); - -fprintf(outfile, "%.*s", (cb->next_item_length == 0)? 1 : cb->next_item_length, - pbuffer + cb->pattern_position); - -fprintf(outfile, "\n"); -first_callout = 0; - -if (cb->mark != last_callout_mark) - { - if (cb->mark == NULL) - fprintf(outfile, "Latest Mark: \n"); - else - { - fprintf(outfile, "Latest Mark: "); - PCHARSV(cb->mark, 0, -1, outfile); - putc('\n', outfile); - } - last_callout_mark = cb->mark; - } - -if (cb->callout_data != NULL) - { - int callout_data = *((int *)(cb->callout_data)); - if (callout_data != 0) - { - fprintf(outfile, "Callout data = %d\n", callout_data); - return callout_data; - } - } - -return (cb->callout_number != callout_fail_id)? 0 : - (++callout_count >= callout_fail_count)? 1 : 0; -} - - -/************************************************* -* Local malloc functions * -*************************************************/ - -/* Alternative malloc function, to test functionality and save the size of a -compiled re, which is the first store request that pcre_compile() makes. The -show_malloc variable is set only during matching. 
*/ - -static void *new_malloc(size_t size) -{ -void *block = malloc(size); -if (show_malloc) - fprintf(outfile, "malloc %3d %p\n", (int)size, block); -return block; -} - -static void new_free(void *block) -{ -if (show_malloc) - fprintf(outfile, "free %p\n", block); -free(block); -} - -/* For recursion malloc/free, to test stacking calls */ - -static void *stack_malloc(size_t size) -{ -void *block = malloc(size); -if (show_malloc) - fprintf(outfile, "stack_malloc %3d %p\n", (int)size, block); -return block; -} - -static void stack_free(void *block) -{ -if (show_malloc) - fprintf(outfile, "stack_free %p\n", block); -free(block); -} - - -/************************************************* -* Call pcre_fullinfo() * -*************************************************/ - -/* Get one piece of information from the pcre_fullinfo() function. When only -one of 8-, 16- or 32-bit is supported, pcre_mode should always have the correct -value, but the code is defensive. - -Arguments: - re compiled regex - study study data - option PCRE_INFO_xxx option - ptr where to put the data - -Returns: 0 when OK, < 0 on error -*/ - -static int -new_info(pcre *re, pcre_extra *study, int option, void *ptr) -{ -int rc; - -if (pcre_mode == PCRE32_MODE) -#ifdef SUPPORT_PCRE32 - rc = pcre32_fullinfo((pcre32 *)re, (pcre32_extra *)study, option, ptr); -#else - rc = PCRE_ERROR_BADMODE; -#endif -else if (pcre_mode == PCRE16_MODE) -#ifdef SUPPORT_PCRE16 - rc = pcre16_fullinfo((pcre16 *)re, (pcre16_extra *)study, option, ptr); -#else - rc = PCRE_ERROR_BADMODE; -#endif -else -#ifdef SUPPORT_PCRE8 - rc = pcre_fullinfo(re, study, option, ptr); -#else - rc = PCRE_ERROR_BADMODE; -#endif - -if (rc < 0 && rc != PCRE_ERROR_UNSET) - { - fprintf(outfile, "Error %d from pcre%s_fullinfo(%d)\n", rc, - pcre_mode == PCRE32_MODE ? "32" : pcre_mode == PCRE16_MODE ? 
"16" : "", option); - if (rc == PCRE_ERROR_BADMODE) - fprintf(outfile, "Running in %d-bit mode but pattern was compiled in " - "%d-bit mode\n", 8 * CHAR_SIZE, - 8 * (REAL_PCRE_FLAGS(re) & PCRE_MODE_MASK)); - } - -return rc; -} - - - -/************************************************* -* Swap byte functions * -*************************************************/ - -/* The following functions swap the bytes of a pcre_uint16 and pcre_uint32 -value, respectively. - -Arguments: - value any number - -Returns: the byte swapped value -*/ - -static pcre_uint32 -swap_uint32(pcre_uint32 value) -{ -return ((value & 0x000000ff) << 24) | - ((value & 0x0000ff00) << 8) | - ((value & 0x00ff0000) >> 8) | - (value >> 24); -} - -static pcre_uint16 -swap_uint16(pcre_uint16 value) -{ -return (value >> 8) | (value << 8); -} - - - -/************************************************* -* Flip bytes in a compiled pattern * -*************************************************/ - -/* This function is called if the 'F' option was present on a pattern that is -to be written to a file. We flip the bytes of all the integer fields in the -regex data block and the study block. In 16-bit mode this also flips relevant -bytes in the pattern itself. This is to make it possible to test PCRE's -ability to reload byte-flipped patterns, e.g. those compiled on a different -architecture. */ - -#if defined SUPPORT_PCRE8 || defined SUPPORT_PCRE16 -static void -regexflip8_or_16(pcre *ere, pcre_extra *extra) -{ -real_pcre8_or_16 *re = (real_pcre8_or_16 *)ere; -#ifdef SUPPORT_PCRE16 -int op; -pcre_uint16 *ptr = (pcre_uint16 *)re + re->name_table_offset; -int length = re->name_count * re->name_entry_size; -#ifdef SUPPORT_UTF -BOOL utf = (re->options & PCRE_UTF16) != 0; -BOOL utf16_char = FALSE; -#endif /* SUPPORT_UTF */ -#endif /* SUPPORT_PCRE16 */ - -/* Always flip the bytes in the main data block and study blocks. 
*/ - -re->magic_number = REVERSED_MAGIC_NUMBER; -re->size = swap_uint32(re->size); -re->options = swap_uint32(re->options); -re->flags = swap_uint32(re->flags); -re->limit_match = swap_uint32(re->limit_match); -re->limit_recursion = swap_uint32(re->limit_recursion); -re->first_char = swap_uint16(re->first_char); -re->req_char = swap_uint16(re->req_char); -re->max_lookbehind = swap_uint16(re->max_lookbehind); -re->top_bracket = swap_uint16(re->top_bracket); -re->top_backref = swap_uint16(re->top_backref); -re->name_table_offset = swap_uint16(re->name_table_offset); -re->name_entry_size = swap_uint16(re->name_entry_size); -re->name_count = swap_uint16(re->name_count); -re->ref_count = swap_uint16(re->ref_count); - -if (extra != NULL && (extra->flags & PCRE_EXTRA_STUDY_DATA) != 0) - { - pcre_study_data *rsd = (pcre_study_data *)(extra->study_data); - rsd->size = swap_uint32(rsd->size); - rsd->flags = swap_uint32(rsd->flags); - rsd->minlength = swap_uint32(rsd->minlength); - } - -/* In 8-bit mode, that is all we need to do. In 16-bit mode we must swap bytes -in the name table, if present, and then in the pattern itself. */ - -#ifdef SUPPORT_PCRE16 -if (pcre_mode != PCRE16_MODE) return; - -while(TRUE) - { - /* Swap previous characters. */ - while (length-- > 0) - { - *ptr = swap_uint16(*ptr); - ptr++; - } -#ifdef SUPPORT_UTF - if (utf16_char) - { - if ((ptr[-1] & 0xfc00) == 0xd800) - { - /* We know that there is only one extra character in UTF-16. */ - *ptr = swap_uint16(*ptr); - ptr++; - } - } - utf16_char = FALSE; -#endif /* SUPPORT_UTF */ - - /* Get next opcode. 
*/ - - length = 0; - op = *ptr; - *ptr++ = swap_uint16(op); - - switch (op) - { - case OP_END: - return; - -#ifdef SUPPORT_UTF - case OP_CHAR: - case OP_CHARI: - case OP_NOT: - case OP_NOTI: - case OP_STAR: - case OP_MINSTAR: - case OP_PLUS: - case OP_MINPLUS: - case OP_QUERY: - case OP_MINQUERY: - case OP_UPTO: - case OP_MINUPTO: - case OP_EXACT: - case OP_POSSTAR: - case OP_POSPLUS: - case OP_POSQUERY: - case OP_POSUPTO: - case OP_STARI: - case OP_MINSTARI: - case OP_PLUSI: - case OP_MINPLUSI: - case OP_QUERYI: - case OP_MINQUERYI: - case OP_UPTOI: - case OP_MINUPTOI: - case OP_EXACTI: - case OP_POSSTARI: - case OP_POSPLUSI: - case OP_POSQUERYI: - case OP_POSUPTOI: - case OP_NOTSTAR: - case OP_NOTMINSTAR: - case OP_NOTPLUS: - case OP_NOTMINPLUS: - case OP_NOTQUERY: - case OP_NOTMINQUERY: - case OP_NOTUPTO: - case OP_NOTMINUPTO: - case OP_NOTEXACT: - case OP_NOTPOSSTAR: - case OP_NOTPOSPLUS: - case OP_NOTPOSQUERY: - case OP_NOTPOSUPTO: - case OP_NOTSTARI: - case OP_NOTMINSTARI: - case OP_NOTPLUSI: - case OP_NOTMINPLUSI: - case OP_NOTQUERYI: - case OP_NOTMINQUERYI: - case OP_NOTUPTOI: - case OP_NOTMINUPTOI: - case OP_NOTEXACTI: - case OP_NOTPOSSTARI: - case OP_NOTPOSPLUSI: - case OP_NOTPOSQUERYI: - case OP_NOTPOSUPTOI: - if (utf) utf16_char = TRUE; -#endif - /* Fall through. */ - - default: - length = OP_lengths16[op] - 1; - break; - - case OP_CLASS: - case OP_NCLASS: - /* Skip the character bit map. */ - ptr += 32/sizeof(pcre_uint16); - length = 0; - break; - - case OP_XCLASS: - /* LINK_SIZE can be 1 or 2 in 16 bit mode. */ - if (LINK_SIZE > 1) - length = (int)((((unsigned int)(ptr[0]) << 16) | (unsigned int)(ptr[1])) - - (1 + LINK_SIZE + 1)); - else - length = (int)((unsigned int)(ptr[0]) - (1 + LINK_SIZE + 1)); - - /* Reverse the size of the XCLASS instance. 
*/ - *ptr = swap_uint16(*ptr); - ptr++; - if (LINK_SIZE > 1) - { - *ptr = swap_uint16(*ptr); - ptr++; - } - - op = *ptr; - *ptr = swap_uint16(op); - ptr++; - if ((op & XCL_MAP) != 0) - { - /* Skip the character bit map. */ - ptr += 32/sizeof(pcre_uint16); - length -= 32/sizeof(pcre_uint16); - } - break; - } - } -/* Control should never reach here in 16 bit mode. */ -#endif /* SUPPORT_PCRE16 */ -} -#endif /* SUPPORT_PCRE[8|16] */ - - - -#if defined SUPPORT_PCRE32 -static void -regexflip_32(pcre *ere, pcre_extra *extra) -{ -real_pcre32 *re = (real_pcre32 *)ere; -int op; -pcre_uint32 *ptr = (pcre_uint32 *)re + re->name_table_offset; -int length = re->name_count * re->name_entry_size; - -/* Always flip the bytes in the main data block and study blocks. */ - -re->magic_number = REVERSED_MAGIC_NUMBER; -re->size = swap_uint32(re->size); -re->options = swap_uint32(re->options); -re->flags = swap_uint32(re->flags); -re->limit_match = swap_uint32(re->limit_match); -re->limit_recursion = swap_uint32(re->limit_recursion); -re->first_char = swap_uint32(re->first_char); -re->req_char = swap_uint32(re->req_char); -re->max_lookbehind = swap_uint16(re->max_lookbehind); -re->top_bracket = swap_uint16(re->top_bracket); -re->top_backref = swap_uint16(re->top_backref); -re->name_table_offset = swap_uint16(re->name_table_offset); -re->name_entry_size = swap_uint16(re->name_entry_size); -re->name_count = swap_uint16(re->name_count); -re->ref_count = swap_uint16(re->ref_count); - -if (extra != NULL && (extra->flags & PCRE_EXTRA_STUDY_DATA) != 0) - { - pcre_study_data *rsd = (pcre_study_data *)(extra->study_data); - rsd->size = swap_uint32(rsd->size); - rsd->flags = swap_uint32(rsd->flags); - rsd->minlength = swap_uint32(rsd->minlength); - } - -/* In 32-bit mode we must swap bytes in the name table, if present, and then in -the pattern itself. */ - -while(TRUE) - { - /* Swap previous characters. */ - while (length-- > 0) - { - *ptr = swap_uint32(*ptr); - ptr++; - } - - /* Get next opcode. 
*/ - - length = 0; - op = *ptr; - *ptr++ = swap_uint32(op); - - switch (op) - { - case OP_END: - return; - - default: - length = OP_lengths32[op] - 1; - break; - - case OP_CLASS: - case OP_NCLASS: - /* Skip the character bit map. */ - ptr += 32/sizeof(pcre_uint32); - length = 0; - break; - - case OP_XCLASS: - /* LINK_SIZE can only be 1 in 32-bit mode. */ - length = (int)((unsigned int)(ptr[0]) - (1 + LINK_SIZE + 1)); - - /* Reverse the size of the XCLASS instance. */ - *ptr = swap_uint32(*ptr); - ptr++; - - op = *ptr; - *ptr = swap_uint32(op); - ptr++; - if ((op & XCL_MAP) != 0) - { - /* Skip the character bit map. */ - ptr += 32/sizeof(pcre_uint32); - length -= 32/sizeof(pcre_uint32); - } - break; - } - } -/* Control should never reach here in 32 bit mode. */ -} - -#endif /* SUPPORT_PCRE32 */ - - - -static void -regexflip(pcre *ere, pcre_extra *extra) -{ -#if defined SUPPORT_PCRE32 - if (REAL_PCRE_FLAGS(ere) & PCRE_MODE32) - regexflip_32(ere, extra); -#endif -#if defined SUPPORT_PCRE8 || defined SUPPORT_PCRE16 - if (REAL_PCRE_FLAGS(ere) & (PCRE_MODE8 | PCRE_MODE16)) - regexflip8_or_16(ere, extra); -#endif -} - - - -/************************************************* -* Check match or recursion limit * -*************************************************/ - -static int -check_match_limit(pcre *re, pcre_extra *extra, pcre_uint8 *bptr, int len, - int start_offset, int options, int *use_offsets, int use_size_offsets, - int flag, unsigned long int *limit, int errnumber, const char *msg) -{ -int count; -int min = 0; -int mid = 64; -int max = -1; - -extra->flags |= flag; - -for (;;) - { - *limit = mid; - - PCRE_EXEC(count, re, extra, bptr, len, start_offset, options, - use_offsets, use_size_offsets); - - if (count == errnumber) - { - /* fprintf(outfile, "Testing %s limit = %d\n", msg, mid); */ - min = mid; - mid = (mid == max - 1)? max : (max > 0)? 
(min + max)/2 : mid*2; - } - - else if (count >= 0 || count == PCRE_ERROR_NOMATCH || - count == PCRE_ERROR_PARTIAL) - { - if (mid == min + 1) - { - fprintf(outfile, "Minimum %s limit = %d\n", msg, mid); - break; - } - /* fprintf(outfile, "Testing %s limit = %d\n", msg, mid); */ - max = mid; - mid = (min + mid)/2; - } - else break; /* Some other error */ - } - -extra->flags &= ~flag; -return count; -} - - - -/************************************************* -* Case-independent strncmp() function * -*************************************************/ - -/* -Arguments: - s first string - t second string - n number of characters to compare - -Returns: < 0, = 0, or > 0, according to the comparison -*/ - -static int -strncmpic(pcre_uint8 *s, pcre_uint8 *t, int n) -{ -while (n--) - { - int c = tolower(*s++) - tolower(*t++); - if (c) return c; - } -return 0; -} - - - -/************************************************* -* Check multicharacter option * -*************************************************/ - -/* This is used both at compile and run-time to check for escapes. Print -a message and return 0 if there is no match. 
- -Arguments: - p points after the leading '<' - f file for error message - nl TRUE to check only for newline settings - stype "modifier" or "escape sequence" - -Returns: appropriate PCRE_NEWLINE_xxx flags, or 0 -*/ - -static int -check_mc_option(pcre_uint8 *p, FILE *f, BOOL nl, const char *stype) -{ -if (strncmpic(p, (pcre_uint8 *)"cr>", 3) == 0) return PCRE_NEWLINE_CR; -if (strncmpic(p, (pcre_uint8 *)"lf>", 3) == 0) return PCRE_NEWLINE_LF; -if (strncmpic(p, (pcre_uint8 *)"crlf>", 5) == 0) return PCRE_NEWLINE_CRLF; -if (strncmpic(p, (pcre_uint8 *)"anycrlf>", 8) == 0) return PCRE_NEWLINE_ANYCRLF; -if (strncmpic(p, (pcre_uint8 *)"any>", 4) == 0) return PCRE_NEWLINE_ANY; -if (strncmpic(p, (pcre_uint8 *)"bsr_anycrlf>", 12) == 0) return PCRE_BSR_ANYCRLF; -if (strncmpic(p, (pcre_uint8 *)"bsr_unicode>", 12) == 0) return PCRE_BSR_UNICODE; - -if (!nl) - { - if (strncmpic(p, (pcre_uint8 *)"JS>", 3) == 0) return PCRE_JAVASCRIPT_COMPAT; - } - -fprintf(f, "Unknown %s at: <%s\n", stype, p); -return 0; -} - - - -/************************************************* -* Usage function * -*************************************************/ - -static void -usage(void) -{ -printf("Usage: pcretest [options] [ []]\n\n"); -printf("Input and output default to stdin and stdout.\n"); -#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) -printf("If input is a terminal, readline() is used to read from it.\n"); -#else -printf("This version of pcretest is not linked with readline().\n"); -#endif -printf("\nOptions:\n"); -#ifdef SUPPORT_PCRE16 -printf(" -16 use the 16-bit library\n"); -#endif -#ifdef SUPPORT_PCRE32 -printf(" -32 use the 32-bit library\n"); -#endif -printf(" -b show compiled code\n"); -printf(" -C show PCRE compile-time options and exit\n"); -printf(" -C arg show a specific compile-time option and exit\n"); -printf(" with its value if numeric (else 0). 
The arg can be:\n"); -printf(" linksize internal link size [2, 3, 4]\n"); -printf(" pcre8 8 bit library support enabled [0, 1]\n"); -printf(" pcre16 16 bit library support enabled [0, 1]\n"); -printf(" pcre32 32 bit library support enabled [0, 1]\n"); -printf(" utf Unicode Transformation Format supported [0, 1]\n"); -printf(" ucp Unicode Properties supported [0, 1]\n"); -printf(" jit Just-in-time compiler supported [0, 1]\n"); -printf(" newline Newline type [CR, LF, CRLF, ANYCRLF, ANY]\n"); -printf(" bsr \\R type [ANYCRLF, ANY]\n"); -printf(" -d debug: show compiled code and information (-b and -i)\n"); -#if !defined NODFA -printf(" -dfa force DFA matching for all subjects\n"); -#endif -printf(" -help show usage information\n"); -printf(" -i show information about compiled patterns\n" - " -M find MATCH_LIMIT minimum for each subject\n" - " -m output memory used information\n" - " -O set PCRE_NO_AUTO_POSSESS on each pattern\n" - " -o set size of offsets vector to \n"); -#if !defined NOPOSIX -printf(" -p use POSIX interface\n"); -#endif -printf(" -q quiet: do not output PCRE version number at start\n"); -printf(" -S set stack size to megabytes\n"); -printf(" -s force each pattern to be studied at basic level\n" - " -s+ force each pattern to be studied, using JIT if available\n" - " -s++ ditto, verifying when JIT was actually used\n" - " -s+n force each pattern to be studied, using JIT if available,\n" - " where 1 <= n <= 7 selects JIT options\n" - " -s++n ditto, verifying when JIT was actually used\n" - " -t time compilation and execution\n"); -printf(" -t time compilation and execution, repeating times\n"); -printf(" -tm time execution (matching) only\n"); -printf(" -tm time execution (matching) only, repeating times\n"); -printf(" -T same as -t, but show total times at the end\n"); -printf(" -TM same as -tm, but show total time at the end\n"); -} - - - -/************************************************* -* Main Program * 
-*************************************************/ - -/* Read lines from named file or stdin and write to named file or stdout; lines -consist of a regular expression, in delimiters and optionally followed by -options, followed by a set of test data, terminated by an empty line. */ - -int main(int argc, char **argv) -{ -FILE *infile = stdin; -const char *version; -int options = 0; -int study_options = 0; -int default_find_match_limit = FALSE; -pcre_uint32 default_options = 0; -int op = 1; -int timeit = 0; -int timeitm = 0; -int showtotaltimes = 0; -int showinfo = 0; -int showstore = 0; -int force_study = -1; -int force_study_options = 0; -int quiet = 0; -int size_offsets = 45; -int size_offsets_max; -int *offsets = NULL; -int debug = 0; -int done = 0; -int all_use_dfa = 0; -int verify_jit = 0; -int yield = 0; -int stack_size; -pcre_uint8 *dbuffer = NULL; -pcre_uint8 lockout[24] = { 0 }; -size_t dbuffer_size = 1u << 14; -clock_t total_compile_time = 0; -clock_t total_study_time = 0; -clock_t total_match_time = 0; - -#if !defined NOPOSIX -int posix = 0; -#endif -#if !defined NODFA -int *dfa_workspace = NULL; -#endif - -pcre_jit_stack *jit_stack = NULL; - -/* These vectors store, end-to-end, a list of zero-terminated captured -substring names, each list itself being terminated by an empty name. Assume -that 1024 is plenty long enough for the few names we'll be testing. It is -easiest to keep separate 8-, 16- and 32-bit versions, using the 32-bit version -for the actual memory, to ensure alignment. 
*/ - -pcre_uint32 copynames[1024]; -pcre_uint32 getnames[1024]; - -#ifdef SUPPORT_PCRE32 -pcre_uint32 *cn32ptr; -pcre_uint32 *gn32ptr; -#endif - -#ifdef SUPPORT_PCRE16 -pcre_uint16 *copynames16 = (pcre_uint16 *)copynames; -pcre_uint16 *getnames16 = (pcre_uint16 *)getnames; -pcre_uint16 *cn16ptr; -pcre_uint16 *gn16ptr; -#endif - -#ifdef SUPPORT_PCRE8 -pcre_uint8 *copynames8 = (pcre_uint8 *)copynames; -pcre_uint8 *getnames8 = (pcre_uint8 *)getnames; -pcre_uint8 *cn8ptr; -pcre_uint8 *gn8ptr; -#endif - -/* Get buffers from malloc() so that valgrind will check their misuse when -debugging. They grow automatically when very long lines are read. The 16- -and 32-bit buffers (buffer16, buffer32) are obtained only if needed. */ - -buffer = (pcre_uint8 *)malloc(buffer_size); -pbuffer = (pcre_uint8 *)malloc(buffer_size); - -/* The outfile variable is static so that new_malloc can use it. */ - -outfile = stdout; - -/* The following _setmode() stuff is some Windows magic that tells its runtime -library to translate CRLF into a single LF character. At least, that's what -I've been told: never having used Windows I take this all on trust. Originally -it set 0x8000, but then I was advised that _O_BINARY was better. */ - -#if defined(_WIN32) || defined(WIN32) -_setmode( _fileno( stdout ), _O_BINARY ); -#endif - -/* Get the version number: both pcre_version() and pcre16_version() give the -same answer. We just need to ensure that we call one that is available. 
*/ - -#if defined SUPPORT_PCRE8 -version = pcre_version(); -#elif defined SUPPORT_PCRE16 -version = pcre16_version(); -#elif defined SUPPORT_PCRE32 -version = pcre32_version(); -#endif - -/* Scan options */ - -while (argc > 1 && argv[op][0] == '-') - { - pcre_uint8 *endptr; - char *arg = argv[op]; - - if (strcmp(arg, "-m") == 0) showstore = 1; - else if (strcmp(arg, "-s") == 0) force_study = 0; - - else if (strncmp(arg, "-s+", 3) == 0) - { - arg += 3; - if (*arg == '+') { arg++; verify_jit = TRUE; } - force_study = 1; - if (*arg == 0) - force_study_options = jit_study_bits[6]; - else if (*arg >= '1' && *arg <= '7') - force_study_options = jit_study_bits[*arg - '1']; - else goto BAD_ARG; - } - else if (strcmp(arg, "-8") == 0) - { -#ifdef SUPPORT_PCRE8 - pcre_mode = PCRE8_MODE; -#else - printf("** This version of PCRE was built without 8-bit support\n"); - exit(1); -#endif - } - else if (strcmp(arg, "-16") == 0) - { -#ifdef SUPPORT_PCRE16 - pcre_mode = PCRE16_MODE; -#else - printf("** This version of PCRE was built without 16-bit support\n"); - exit(1); -#endif - } - else if (strcmp(arg, "-32") == 0) - { -#ifdef SUPPORT_PCRE32 - pcre_mode = PCRE32_MODE; -#else - printf("** This version of PCRE was built without 32-bit support\n"); - exit(1); -#endif - } - else if (strcmp(arg, "-q") == 0) quiet = 1; - else if (strcmp(arg, "-b") == 0) debug = 1; - else if (strcmp(arg, "-i") == 0) showinfo = 1; - else if (strcmp(arg, "-d") == 0) showinfo = debug = 1; - else if (strcmp(arg, "-M") == 0) default_find_match_limit = TRUE; - else if (strcmp(arg, "-O") == 0) default_options |= PCRE_NO_AUTO_POSSESS; -#if !defined NODFA - else if (strcmp(arg, "-dfa") == 0) all_use_dfa = 1; -#endif - else if (strcmp(arg, "-o") == 0 && argc > 2 && - ((size_offsets = get_value((pcre_uint8 *)argv[op+1], &endptr)), - *endptr == 0)) - { - op++; - argc--; - } - else if (strcmp(arg, "-t") == 0 || strcmp(arg, "-tm") == 0 || - strcmp(arg, "-T") == 0 || strcmp(arg, "-TM") == 0) - { - int temp; - int both = 
arg[2] == 0; - showtotaltimes = arg[1] == 'T'; - if (argc > 2 && (temp = get_value((pcre_uint8 *)argv[op+1], &endptr), - *endptr == 0)) - { - timeitm = temp; - op++; - argc--; - } - else timeitm = LOOPREPEAT; - if (both) timeit = timeitm; - } - else if (strcmp(arg, "-S") == 0 && argc > 2 && - ((stack_size = get_value((pcre_uint8 *)argv[op+1], &endptr)), - *endptr == 0)) - { -#if defined(_WIN32) || defined(WIN32) || defined(__minix) || defined(NATIVE_ZOS) || defined(__VMS) - printf("PCRE: -S not supported on this OS\n"); - exit(1); -#else - int rc; - struct rlimit rlim; - getrlimit(RLIMIT_STACK, &rlim); - rlim.rlim_cur = stack_size * 1024 * 1024; - rc = setrlimit(RLIMIT_STACK, &rlim); - if (rc != 0) - { - printf("PCRE: setrlimit() failed with error %d\n", rc); - exit(1); - } - op++; - argc--; -#endif - } -#if !defined NOPOSIX - else if (strcmp(arg, "-p") == 0) posix = 1; -#endif - else if (strcmp(arg, "-C") == 0) - { - int rc; - unsigned long int lrc; - - if (argc > 2) - { - if (strcmp(argv[op + 1], "linksize") == 0) - { - (void)PCRE_CONFIG(PCRE_CONFIG_LINK_SIZE, &rc); - printf("%d\n", rc); - yield = rc; - -#ifdef __VMS - vms_setsymbol("LINKSIZE",0,yield ); -#endif - } - else if (strcmp(argv[op + 1], "pcre8") == 0) - { -#ifdef SUPPORT_PCRE8 - printf("1\n"); - yield = 1; -#else - printf("0\n"); - yield = 0; -#endif -#ifdef __VMS - vms_setsymbol("PCRE8",0,yield ); -#endif - } - else if (strcmp(argv[op + 1], "pcre16") == 0) - { -#ifdef SUPPORT_PCRE16 - printf("1\n"); - yield = 1; -#else - printf("0\n"); - yield = 0; -#endif -#ifdef __VMS - vms_setsymbol("PCRE16",0,yield ); -#endif - } - else if (strcmp(argv[op + 1], "pcre32") == 0) - { -#ifdef SUPPORT_PCRE32 - printf("1\n"); - yield = 1; -#else - printf("0\n"); - yield = 0; -#endif -#ifdef __VMS - vms_setsymbol("PCRE32",0,yield ); -#endif - } - else if (strcmp(argv[op + 1], "utf") == 0) - { -#ifdef SUPPORT_PCRE8 - if (pcre_mode == PCRE8_MODE) - (void)pcre_config(PCRE_CONFIG_UTF8, &rc); -#endif -#ifdef SUPPORT_PCRE16 - 
if (pcre_mode == PCRE16_MODE) - (void)pcre16_config(PCRE_CONFIG_UTF16, &rc); -#endif -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - (void)pcre32_config(PCRE_CONFIG_UTF32, &rc); -#endif - printf("%d\n", rc); - yield = rc; -#ifdef __VMS - vms_setsymbol("UTF",0,yield ); -#endif - } - else if (strcmp(argv[op + 1], "ucp") == 0) - { - (void)PCRE_CONFIG(PCRE_CONFIG_UNICODE_PROPERTIES, &rc); - printf("%d\n", rc); - yield = rc; - } - else if (strcmp(argv[op + 1], "jit") == 0) - { - (void)PCRE_CONFIG(PCRE_CONFIG_JIT, &rc); - printf("%d\n", rc); - yield = rc; - } - else if (strcmp(argv[op + 1], "newline") == 0) - { - (void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &rc); - print_newline_config(rc, TRUE); - } - else if (strcmp(argv[op + 1], "bsr") == 0) - { - (void)PCRE_CONFIG(PCRE_CONFIG_BSR, &rc); - printf("%s\n", rc? "ANYCRLF" : "ANY"); - } - else if (strcmp(argv[op + 1], "ebcdic") == 0) - { -#ifdef EBCDIC - printf("1\n"); - yield = 1; -#else - printf("0\n"); -#endif - } - else if (strcmp(argv[op + 1], "ebcdic-nl") == 0) - { -#ifdef EBCDIC - printf("0x%02x\n", CHAR_LF); -#else - printf("0\n"); -#endif - } - else - { - printf("Unknown -C option: %s\n", argv[op + 1]); - } - goto EXIT; - } - - /* No argument for -C: output all configuration information. */ - - printf("PCRE version %s\n", version); - printf("Compiled with\n"); - -#ifdef EBCDIC - printf(" EBCDIC code support: LF is 0x%02x\n", CHAR_LF); -#endif - -/* At least one of SUPPORT_PCRE8 and SUPPORT_PCRE16 will be set. If both -are set, either both UTFs are supported or both are not supported. */ - -#ifdef SUPPORT_PCRE8 - printf(" 8-bit support\n"); - (void)pcre_config(PCRE_CONFIG_UTF8, &rc); - printf (" %sUTF-8 support\n", rc ? "" : "No "); -#endif -#ifdef SUPPORT_PCRE16 - printf(" 16-bit support\n"); - (void)pcre16_config(PCRE_CONFIG_UTF16, &rc); - printf (" %sUTF-16 support\n", rc ? 
"" : "No "); -#endif -#ifdef SUPPORT_PCRE32 - printf(" 32-bit support\n"); - (void)pcre32_config(PCRE_CONFIG_UTF32, &rc); - printf (" %sUTF-32 support\n", rc ? "" : "No "); -#endif - - (void)PCRE_CONFIG(PCRE_CONFIG_UNICODE_PROPERTIES, &rc); - printf(" %sUnicode properties support\n", rc? "" : "No "); - (void)PCRE_CONFIG(PCRE_CONFIG_JIT, &rc); - if (rc) - { - const char *arch; - (void)PCRE_CONFIG(PCRE_CONFIG_JITTARGET, (void *)(&arch)); - printf(" Just-in-time compiler support: %s\n", arch); - } - else - printf(" No just-in-time compiler support\n"); - (void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &rc); - print_newline_config(rc, FALSE); - (void)PCRE_CONFIG(PCRE_CONFIG_BSR, &rc); - printf(" \\R matches %s\n", rc? "CR, LF, or CRLF only" : - "all Unicode newlines"); - (void)PCRE_CONFIG(PCRE_CONFIG_LINK_SIZE, &rc); - printf(" Internal link size = %d\n", rc); - (void)PCRE_CONFIG(PCRE_CONFIG_POSIX_MALLOC_THRESHOLD, &rc); - printf(" POSIX malloc threshold = %d\n", rc); - (void)PCRE_CONFIG(PCRE_CONFIG_PARENS_LIMIT, &lrc); - printf(" Parentheses nest limit = %ld\n", lrc); - (void)PCRE_CONFIG(PCRE_CONFIG_MATCH_LIMIT, &lrc); - printf(" Default match limit = %ld\n", lrc); - (void)PCRE_CONFIG(PCRE_CONFIG_MATCH_LIMIT_RECURSION, &lrc); - printf(" Default recursion depth limit = %ld\n", lrc); - (void)PCRE_CONFIG(PCRE_CONFIG_STACKRECURSE, &rc); - printf(" Match recursion uses %s", rc? "stack" : "heap"); - if (showstore) - { - PCRE_EXEC(stack_size, NULL, NULL, NULL, -999, -999, 0, NULL, 0); - printf(": %sframe size = %d bytes", rc? 
"approximate " : "", -stack_size); - } - printf("\n"); - goto EXIT; - } - else if (strcmp(arg, "-help") == 0 || - strcmp(arg, "--help") == 0) - { - usage(); - goto EXIT; - } - else - { - BAD_ARG: - printf("** Unknown or malformed option %s\n", arg); - usage(); - yield = 1; - goto EXIT; - } - op++; - argc--; - } - -/* Get the store for the offsets vector, and remember what it was */ - -size_offsets_max = size_offsets; -offsets = (int *)malloc(size_offsets_max * sizeof(int)); -if (offsets == NULL) - { - printf("** Failed to get %d bytes of memory for offsets vector\n", - (int)(size_offsets_max * sizeof(int))); - yield = 1; - goto EXIT; - } - -/* Sort out the input and output files */ - -if (argc > 1) - { - infile = fopen(argv[op], INPUT_MODE); - if (infile == NULL) - { - printf("** Failed to open %s\n", argv[op]); - yield = 1; - goto EXIT; - } - } - -if (argc > 2) - { - outfile = fopen(argv[op+1], OUTPUT_MODE); - if (outfile == NULL) - { - printf("** Failed to open %s\n", argv[op+1]); - yield = 1; - goto EXIT; - } - } - -/* Set alternative malloc function */ - -#ifdef SUPPORT_PCRE8 -pcre_malloc = new_malloc; -pcre_free = new_free; -pcre_stack_malloc = stack_malloc; -pcre_stack_free = stack_free; -#endif - -#ifdef SUPPORT_PCRE16 -pcre16_malloc = new_malloc; -pcre16_free = new_free; -pcre16_stack_malloc = stack_malloc; -pcre16_stack_free = stack_free; -#endif - -#ifdef SUPPORT_PCRE32 -pcre32_malloc = new_malloc; -pcre32_free = new_free; -pcre32_stack_malloc = stack_malloc; -pcre32_stack_free = stack_free; -#endif - -/* Heading line unless quiet */ - -if (!quiet) fprintf(outfile, "PCRE version %s\n\n", version); - -/* Main loop */ - -while (!done) - { - pcre *re = NULL; - pcre_extra *extra = NULL; - -#if !defined NOPOSIX /* There are still compilers that require no indent */ - regex_t preg = { NULL, 0, 0} ; - int do_posix = 0; -#endif - - const char *error; - pcre_uint8 *markptr; - pcre_uint8 *p, *pp, *ppp; - pcre_uint8 *to_file = NULL; - const pcre_uint8 *tables = 
NULL; - unsigned long int get_options; - unsigned long int true_size, true_study_size = 0; - size_t size; - int do_allcaps = 0; - int do_mark = 0; - int do_study = 0; - int no_force_study = 0; - int do_debug = debug; - int do_G = 0; - int do_g = 0; - int do_showinfo = showinfo; - int do_showrest = 0; - int do_showcaprest = 0; - int do_flip = 0; - int erroroffset, len, delimiter, poffset; - -#if !defined NODFA - int dfa_matched = 0; -#endif - - use_utf = 0; - debug_lengths = 1; - SET_PCRE_STACK_GUARD(NULL); - - if (extend_inputline(infile, buffer, " re> ") == NULL) break; - if (infile != stdin) fprintf(outfile, "%s", (char *)buffer); - fflush(outfile); - - p = buffer; - while (isspace(*p)) p++; - if (*p == 0) continue; - - /* Handle option lock-out setting */ - - if (*p == '<' && p[1] == ' ') - { - p += 2; - while (isspace(*p)) p++; - if (strncmp((char *)p, "forbid ", 7) == 0) - { - p += 7; - while (isspace(*p)) p++; - pp = lockout; - while (!isspace(*p) && pp < lockout + sizeof(lockout) - 1) - *pp++ = *p++; - *pp = 0; - } - else - { - printf("** Unrecognized special command '%s'\n", p); - yield = 1; - goto EXIT; - } - continue; - } - - /* See if the pattern is to be loaded pre-compiled from a file. 
*/ - - if (*p == '<' && strchr((char *)(p+1), '<') == NULL) - { - pcre_uint32 magic; - pcre_uint8 sbuf[8]; - FILE *f; - - p++; - if (*p == '!') - { - do_debug = TRUE; - do_showinfo = TRUE; - p++; - } - - pp = p + (int)strlen((char *)p); - while (isspace(pp[-1])) pp--; - *pp = 0; - - f = fopen((char *)p, "rb"); - if (f == NULL) - { - fprintf(outfile, "Failed to open %s: %s\n", p, strerror(errno)); - continue; - } - if (fread(sbuf, 1, 8, f) != 8) goto FAIL_READ; - - true_size = - (sbuf[0] << 24) | (sbuf[1] << 16) | (sbuf[2] << 8) | sbuf[3]; - true_study_size = - (sbuf[4] << 24) | (sbuf[5] << 16) | (sbuf[6] << 8) | sbuf[7]; - - re = (pcre *)new_malloc(true_size); - if (re == NULL) - { - printf("** Failed to get %d bytes of memory for pcre object\n", - (int)true_size); - yield = 1; - goto EXIT; - } - if (fread(re, 1, true_size, f) != true_size) goto FAIL_READ; - - magic = REAL_PCRE_MAGIC(re); - if (magic != MAGIC_NUMBER) - { - if (swap_uint32(magic) == MAGIC_NUMBER) - { - do_flip = 1; - } - else - { - fprintf(outfile, "Data in %s is not a compiled PCRE regex\n", p); - new_free(re); - fclose(f); - continue; - } - } - - /* We hide the byte-invert info for little and big endian tests. */ - fprintf(outfile, "Compiled pattern%s loaded from %s\n", - do_flip && (p[-1] == '<') ? " (byte-inverted)" : "", p); - - /* Now see if there is any following study data. 
*/ - - if (true_study_size != 0) - { - pcre_study_data *psd; - - extra = (pcre_extra *)new_malloc(sizeof(pcre_extra) + true_study_size); - extra->flags = PCRE_EXTRA_STUDY_DATA; - - psd = (pcre_study_data *)(((char *)extra) + sizeof(pcre_extra)); - extra->study_data = psd; - - if (fread(psd, 1, true_study_size, f) != true_study_size) - { - FAIL_READ: - fprintf(outfile, "Failed to read data from %s\n", p); - if (extra != NULL) - { - PCRE_FREE_STUDY(extra); - } - new_free(re); - fclose(f); - continue; - } - fprintf(outfile, "Study data loaded from %s\n", p); - do_study = 1; /* To get the data output if requested */ - } - else fprintf(outfile, "No study data\n"); - - /* Flip the necessary bytes. */ - if (do_flip) - { - int rc; - PCRE_PATTERN_TO_HOST_BYTE_ORDER(rc, re, extra, NULL); - if (rc == PCRE_ERROR_BADMODE) - { - pcre_uint32 flags_in_host_byte_order; - if (REAL_PCRE_MAGIC(re) == MAGIC_NUMBER) - flags_in_host_byte_order = REAL_PCRE_FLAGS(re); - else - flags_in_host_byte_order = swap_uint32(REAL_PCRE_FLAGS(re)); - /* Simulate the result of the function call below. */ - fprintf(outfile, "Error %d from pcre%s_fullinfo(%d)\n", rc, - pcre_mode == PCRE32_MODE ? "32" : pcre_mode == PCRE16_MODE ? "16" : "", - PCRE_INFO_OPTIONS); - fprintf(outfile, "Running in %d-bit mode but pattern was compiled in " - "%d-bit mode\n", 8 * CHAR_SIZE, 8 * (flags_in_host_byte_order & PCRE_MODE_MASK)); - new_free(re); - fclose(f); - continue; - } - } - - /* Need to know if UTF-8 for printing data strings. */ - - if (new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options) < 0) - { - new_free(re); - fclose(f); - continue; - } - use_utf = (get_options & PCRE_UTF8) != 0; - - fclose(f); - goto SHOW_INFO; - } - - /* In-line pattern (the usual case). Get the delimiter and seek the end of - the pattern; if it isn't complete, read more. 
*/ - - delimiter = *p++; - - if (isalnum(delimiter) || delimiter == '\\') - { - fprintf(outfile, "** Delimiter must not be alphanumeric or \\\n"); - goto SKIP_DATA; - } - - pp = p; - poffset = (int)(p - buffer); - - for(;;) - { - while (*pp != 0) - { - if (*pp == '\\' && pp[1] != 0) pp++; - else if (*pp == delimiter) break; - pp++; - } - if (*pp != 0) break; - if ((pp = extend_inputline(infile, pp, " > ")) == NULL) - { - fprintf(outfile, "** Unexpected EOF\n"); - done = 1; - goto CONTINUE; - } - if (infile != stdin) fprintf(outfile, "%s", (char *)pp); - } - - /* The buffer may have moved while being extended; reset the start of data - pointer to the correct relative point in the buffer. */ - - p = buffer + poffset; - - /* If the first character after the delimiter is backslash, make - the pattern end with backslash. This is purely to provide a way - of testing for the error message when a pattern ends with backslash. */ - - if (pp[1] == '\\') *pp++ = '\\'; - - /* Terminate the pattern at the delimiter, and save a copy of the pattern - for callouts. */ - - *pp++ = 0; - strcpy((char *)pbuffer, (char *)p); - - /* Look for modifiers and options after the final delimiter. */ - - options = default_options; - study_options = force_study_options; - log_store = showstore; /* default from command line */ - - while (*pp != 0) - { - /* Check to see whether this modifier has been locked out for this file. - This is complicated for the multi-character options that begin with '<'. - If there is no '>' in the lockout string, all multi-character modifiers are - locked out. 
*/ - - if (strchr((char *)lockout, *pp) != NULL) - { - if (*pp == '<' && strchr((char *)lockout, '>') != NULL) - { - int x = check_mc_option(pp+1, outfile, FALSE, "modifier"); - if (x == 0) goto SKIP_DATA; - - for (ppp = lockout; *ppp != 0; ppp++) - { - if (*ppp == '<') - { - int y = check_mc_option(ppp+1, outfile, FALSE, "modifier"); - if (y == 0) - { - printf("** Error in modifier forbid data - giving up.\n"); - yield = 1; - goto EXIT; - } - if (x == y) - { - ppp = pp; - while (*ppp != '>') ppp++; - printf("** The %.*s modifier is locked out - giving up.\n", - (int)(ppp - pp + 1), pp); - yield = 1; - goto EXIT; - } - } - } - } - - /* The single-character modifiers are straightforward. */ - - else - { - printf("** The /%c modifier is locked out - giving up.\n", *pp); - yield = 1; - goto EXIT; - } - } - - /* The modifier is not locked out; handle it. */ - - switch (*pp++) - { - case 'f': options |= PCRE_FIRSTLINE; break; - case 'g': do_g = 1; break; - case 'i': options |= PCRE_CASELESS; break; - case 'm': options |= PCRE_MULTILINE; break; - case 's': options |= PCRE_DOTALL; break; - case 'x': options |= PCRE_EXTENDED; break; - - case '+': - if (do_showrest) do_showcaprest = 1; else do_showrest = 1; - break; - - case '=': do_allcaps = 1; break; - case 'A': options |= PCRE_ANCHORED; break; - case 'B': do_debug = 1; break; - case 'C': options |= PCRE_AUTO_CALLOUT; break; - case 'D': do_debug = do_showinfo = 1; break; - case 'E': options |= PCRE_DOLLAR_ENDONLY; break; - case 'F': do_flip = 1; break; - case 'G': do_G = 1; break; - case 'I': do_showinfo = 1; break; - case 'J': options |= PCRE_DUPNAMES; break; - case 'K': do_mark = 1; break; - case 'M': log_store = 1; break; - case 'N': options |= PCRE_NO_AUTO_CAPTURE; break; - case 'O': options |= PCRE_NO_AUTO_POSSESS; break; - -#if !defined NOPOSIX - case 'P': do_posix = 1; break; -#endif - - case 'Q': - switch (*pp) - { - case '0': - case '1': - stack_guard_return = *pp++ - '0'; - break; - - default: - fprintf(outfile, 
"** Missing 0 or 1 after /Q\n"); - goto SKIP_DATA; - } - SET_PCRE_STACK_GUARD(stack_guard); - break; - - case 'S': - do_study = 1; - for (;;) - { - switch (*pp++) - { - case 'S': - do_study = 0; - no_force_study = 1; - break; - - case '!': - study_options |= PCRE_STUDY_EXTRA_NEEDED; - break; - - case '+': - if (*pp == '+') - { - verify_jit = TRUE; - pp++; - } - if (*pp >= '1' && *pp <= '7') - study_options |= jit_study_bits[*pp++ - '1']; - else - study_options |= jit_study_bits[6]; - break; - - case '-': - study_options &= ~PCRE_STUDY_ALLJIT; - break; - - default: - pp--; - goto ENDLOOP; - } - } - ENDLOOP: - break; - - case 'U': options |= PCRE_UNGREEDY; break; - case 'W': options |= PCRE_UCP; break; - case 'X': options |= PCRE_EXTRA; break; - case 'Y': options |= PCRE_NO_START_OPTIMISE; break; - case 'Z': debug_lengths = 0; break; - case '8': options |= PCRE_UTF8; use_utf = 1; break; - case '9': options |= PCRE_NEVER_UTF; break; - case '?': options |= PCRE_NO_UTF8_CHECK; break; - - case 'T': - switch (*pp++) - { - case '0': tables = tables0; break; - case '1': tables = tables1; break; - - case '\r': - case '\n': - case ' ': - case 0: - fprintf(outfile, "** Missing table number after /T\n"); - goto SKIP_DATA; - - default: - fprintf(outfile, "** Bad table number \"%c\" after /T\n", pp[-1]); - goto SKIP_DATA; - } - break; - - case 'L': - ppp = pp; - /* The '\r' test here is so that it works on Windows. */ - /* The '0' test is just in case this is an unterminated line. 
*/ - while (*ppp != 0 && *ppp != '\n' && *ppp != '\r' && *ppp != ' ') ppp++; - *ppp = 0; - if (setlocale(LC_CTYPE, (const char *)pp) == NULL) - { - fprintf(outfile, "** Failed to set locale \"%s\"\n", pp); - goto SKIP_DATA; - } - locale_set = 1; - tables = PCRE_MAKETABLES; - pp = ppp; - break; - - case '>': - to_file = pp; - while (*pp != 0) pp++; - while (isspace(pp[-1])) pp--; - *pp = 0; - break; - - case '<': - { - int x = check_mc_option(pp, outfile, FALSE, "modifier"); - if (x == 0) goto SKIP_DATA; - options |= x; - while (*pp++ != '>'); - } - break; - - case '\r': /* So that it works in Windows */ - case '\n': - case ' ': - break; - - default: - fprintf(outfile, "** Unknown modifier '%c'\n", pp[-1]); - goto SKIP_DATA; - } - } - - /* Handle compiling via the POSIX interface, which doesn't support the - timing, showing, or debugging options, nor the ability to pass over - local character tables. Neither does it have 16-bit support. */ - -#if !defined NOPOSIX - if (posix || do_posix) - { - int rc; - int cflags = 0; - - if ((options & PCRE_CASELESS) != 0) cflags |= REG_ICASE; - if ((options & PCRE_MULTILINE) != 0) cflags |= REG_NEWLINE; - if ((options & PCRE_DOTALL) != 0) cflags |= REG_DOTALL; - if ((options & PCRE_NO_AUTO_CAPTURE) != 0) cflags |= REG_NOSUB; - if ((options & PCRE_UTF8) != 0) cflags |= REG_UTF8; - if ((options & PCRE_UCP) != 0) cflags |= REG_UCP; - if ((options & PCRE_UNGREEDY) != 0) cflags |= REG_UNGREEDY; - - rc = regcomp(&preg, (char *)p, cflags); - - /* Compilation failed; go back for another re, skipping to blank line - if non-interactive. */ - - if (rc != 0) - { - (void)regerror(rc, &preg, (char *)buffer, buffer_size); - fprintf(outfile, "Failed: POSIX code %d: %s\n", rc, buffer); - goto SKIP_DATA; - } - } - - /* Handle compiling via the native interface */ - - else -#endif /* !defined NOPOSIX */ - - { - /* In 16- or 32-bit mode, convert the input. 
*/ - -#ifdef SUPPORT_PCRE16 - if (pcre_mode == PCRE16_MODE) - { - switch(to16(FALSE, p, options & PCRE_UTF8, (int)strlen((char *)p))) - { - case -1: - fprintf(outfile, "**Failed: invalid UTF-8 string cannot be " - "converted to UTF-16\n"); - goto SKIP_DATA; - - case -2: - fprintf(outfile, "**Failed: character value greater than 0x10ffff " - "cannot be converted to UTF-16\n"); - goto SKIP_DATA; - - case -3: /* "Impossible error" when to16 is called arg1 FALSE */ - fprintf(outfile, "**Failed: character value greater than 0xffff " - "cannot be converted to 16-bit in non-UTF mode\n"); - goto SKIP_DATA; - - default: - break; - } - p = (pcre_uint8 *)buffer16; - } -#endif - -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - { - switch(to32(FALSE, p, options & PCRE_UTF32, (int)strlen((char *)p))) - { - case -1: - fprintf(outfile, "**Failed: invalid UTF-8 string cannot be " - "converted to UTF-32\n"); - goto SKIP_DATA; - - case -2: - fprintf(outfile, "**Failed: character value greater than 0x10ffff " - "cannot be converted to UTF-32\n"); - goto SKIP_DATA; - - case -3: - fprintf(outfile, "**Failed: character value is ill-formed UTF-32\n"); - goto SKIP_DATA; - - default: - break; - } - p = (pcre_uint8 *)buffer32; - } -#endif - - /* Compile many times when timing */ - - if (timeit > 0) - { - register int i; - clock_t time_taken; - clock_t start_time = clock(); - for (i = 0; i < timeit; i++) - { - PCRE_COMPILE(re, p, options, &error, &erroroffset, tables); - if (re != NULL) free(re); - } - total_compile_time += (time_taken = clock() - start_time); - fprintf(outfile, "Compile time %.4f milliseconds\n", - (((double)time_taken * 1000.0) / (double)timeit) / - (double)CLOCKS_PER_SEC); - } - - PCRE_COMPILE(re, p, options, &error, &erroroffset, tables); - - /* Compilation failed; go back for another re, skipping to blank line - if non-interactive. 
*/ - - if (re == NULL) - { - fprintf(outfile, "Failed: %s at offset %d\n", error, erroroffset); - SKIP_DATA: - if (infile != stdin) - { - for (;;) - { - if (extend_inputline(infile, buffer, NULL) == NULL) - { - done = 1; - goto CONTINUE; - } - len = (int)strlen((char *)buffer); - while (len > 0 && isspace(buffer[len-1])) len--; - if (len == 0) break; - } - fprintf(outfile, "\n"); - } - goto CONTINUE; - } - - /* Compilation succeeded. It is now possible to set the UTF-8 option from - within the regex; check for this so that we know how to process the data - lines. */ - - if (new_info(re, NULL, PCRE_INFO_OPTIONS, &get_options) < 0) - goto SKIP_DATA; - if ((get_options & PCRE_UTF8) != 0) use_utf = 1; - - /* Extract the size for possible writing before possibly flipping it, - and remember the store that was got. */ - - true_size = REAL_PCRE_SIZE(re); - - /* Output code size information if requested */ - - if (log_store) - { - int name_count, name_entry_size, real_pcre_size; - - new_info(re, NULL, PCRE_INFO_NAMECOUNT, &name_count); - new_info(re, NULL, PCRE_INFO_NAMEENTRYSIZE, &name_entry_size); - real_pcre_size = 0; -#ifdef SUPPORT_PCRE8 - if (REAL_PCRE_FLAGS(re) & PCRE_MODE8) - real_pcre_size = sizeof(real_pcre); -#endif -#ifdef SUPPORT_PCRE16 - if (REAL_PCRE_FLAGS(re) & PCRE_MODE16) - real_pcre_size = sizeof(real_pcre16); -#endif -#ifdef SUPPORT_PCRE32 - if (REAL_PCRE_FLAGS(re) & PCRE_MODE32) - real_pcre_size = sizeof(real_pcre32); -#endif - new_info(re, NULL, PCRE_INFO_SIZE, &size); - fprintf(outfile, "Memory allocation (code space): %d\n", - (int)(size - real_pcre_size - name_count * name_entry_size)); - } - - /* If -s or /S was present, study the regex to generate additional info to - help with the matching, unless the pattern has the SS option, which - suppresses the effect of /S (used for a few test patterns where studying is - never sensible). 
*/ - - if (do_study || (force_study >= 0 && !no_force_study)) - { - if (timeit > 0) - { - register int i; - clock_t time_taken; - clock_t start_time = clock(); - for (i = 0; i < timeit; i++) - { - PCRE_STUDY(extra, re, study_options, &error); - } - total_study_time = (time_taken = clock() - start_time); - if (extra != NULL) - { - PCRE_FREE_STUDY(extra); - } - fprintf(outfile, " Study time %.4f milliseconds\n", - (((double)time_taken * 1000.0) / (double)timeit) / - (double)CLOCKS_PER_SEC); - } - PCRE_STUDY(extra, re, study_options, &error); - if (error != NULL) - fprintf(outfile, "Failed to study: %s\n", error); - else if (extra != NULL) - { - true_study_size = ((pcre_study_data *)(extra->study_data))->size; - if (log_store) - { - size_t jitsize; - if (new_info(re, extra, PCRE_INFO_JITSIZE, &jitsize) == 0 && - jitsize != 0) - fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)jitsize); - } - } - } - - /* If /K was present, we set up for handling MARK data. */ - - if (do_mark) - { - if (extra == NULL) - { - extra = (pcre_extra *)malloc(sizeof(pcre_extra)); - extra->flags = 0; - } - extra->mark = &markptr; - extra->flags |= PCRE_EXTRA_MARK; - } - - /* Extract and display information from the compiled data if required. 
*/ - - SHOW_INFO: - - if (do_debug) - { - fprintf(outfile, "------------------------------------------------------------------\n"); - PCRE_PRINTINT(re, outfile, debug_lengths); - } - - /* We already have the options in get_options (see above) */ - - if (do_showinfo) - { - unsigned long int all_options; - pcre_uint32 first_char, need_char; - pcre_uint32 match_limit, recursion_limit; - int count, backrefmax, first_char_set, need_char_set, okpartial, jchanged, - hascrorlf, maxlookbehind, match_empty; - int nameentrysize, namecount; - const pcre_uint8 *nametable; - - if (new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count) + - new_info(re, NULL, PCRE_INFO_BACKREFMAX, &backrefmax) + - new_info(re, NULL, PCRE_INFO_FIRSTCHARACTER, &first_char) + - new_info(re, NULL, PCRE_INFO_FIRSTCHARACTERFLAGS, &first_char_set) + - new_info(re, NULL, PCRE_INFO_REQUIREDCHAR, &need_char) + - new_info(re, NULL, PCRE_INFO_REQUIREDCHARFLAGS, &need_char_set) + - new_info(re, NULL, PCRE_INFO_NAMEENTRYSIZE, &nameentrysize) + - new_info(re, NULL, PCRE_INFO_NAMECOUNT, &namecount) + - new_info(re, NULL, PCRE_INFO_NAMETABLE, (void *)&nametable) + - new_info(re, NULL, PCRE_INFO_OKPARTIAL, &okpartial) + - new_info(re, NULL, PCRE_INFO_JCHANGED, &jchanged) + - new_info(re, NULL, PCRE_INFO_HASCRORLF, &hascrorlf) + - new_info(re, NULL, PCRE_INFO_MATCH_EMPTY, &match_empty) + - new_info(re, NULL, PCRE_INFO_MAXLOOKBEHIND, &maxlookbehind) - != 0) - goto SKIP_DATA; - - fprintf(outfile, "Capturing subpattern count = %d\n", count); - - if (backrefmax > 0) - fprintf(outfile, "Max back reference = %d\n", backrefmax); - - if (maxlookbehind > 0) - fprintf(outfile, "Max lookbehind = %d\n", maxlookbehind); - - if (new_info(re, NULL, PCRE_INFO_MATCHLIMIT, &match_limit) == 0) - fprintf(outfile, "Match limit = %u\n", match_limit); - - if (new_info(re, NULL, PCRE_INFO_RECURSIONLIMIT, &recursion_limit) == 0) - fprintf(outfile, "Recursion limit = %u\n", recursion_limit); - - if (namecount > 0) - { - fprintf(outfile, "Named 
capturing subpatterns:\n"); - while (namecount-- > 0) - { - int imm2_size = pcre_mode == PCRE8_MODE ? 2 : 1; - int length = (int)STRLEN(nametable + imm2_size); - fprintf(outfile, " "); - PCHARSV(nametable, imm2_size, length, outfile); - while (length++ < nameentrysize - imm2_size) putc(' ', outfile); -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - fprintf(outfile, "%3d\n", (int)(((PCRE_SPTR32)nametable)[0])); -#endif -#ifdef SUPPORT_PCRE16 - if (pcre_mode == PCRE16_MODE) - fprintf(outfile, "%3d\n", (int)(((PCRE_SPTR16)nametable)[0])); -#endif -#ifdef SUPPORT_PCRE8 - if (pcre_mode == PCRE8_MODE) - fprintf(outfile, "%3d\n", ((int)nametable[0] << 8) | (int)nametable[1]); -#endif - nametable += nameentrysize * CHAR_SIZE; - } - } - - if (!okpartial) fprintf(outfile, "Partial matching not supported\n"); - if (hascrorlf) fprintf(outfile, "Contains explicit CR or LF match\n"); - if (match_empty) fprintf(outfile, "May match empty string\n"); - - all_options = REAL_PCRE_OPTIONS(re); - if (do_flip) all_options = swap_uint32(all_options); - - if (get_options == 0) fprintf(outfile, "No options\n"); - else fprintf(outfile, "Options:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", - ((get_options & PCRE_ANCHORED) != 0)? " anchored" : "", - ((get_options & PCRE_CASELESS) != 0)? " caseless" : "", - ((get_options & PCRE_EXTENDED) != 0)? " extended" : "", - ((get_options & PCRE_MULTILINE) != 0)? " multiline" : "", - ((get_options & PCRE_FIRSTLINE) != 0)? " firstline" : "", - ((get_options & PCRE_DOTALL) != 0)? " dotall" : "", - ((get_options & PCRE_BSR_ANYCRLF) != 0)? " bsr_anycrlf" : "", - ((get_options & PCRE_BSR_UNICODE) != 0)? " bsr_unicode" : "", - ((get_options & PCRE_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "", - ((get_options & PCRE_EXTRA) != 0)? " extra" : "", - ((get_options & PCRE_UNGREEDY) != 0)? " ungreedy" : "", - ((get_options & PCRE_NO_AUTO_CAPTURE) != 0)? " no_auto_capture" : "", - ((get_options & PCRE_NO_AUTO_POSSESS) != 0)? 
" no_auto_possessify" : "", - ((get_options & PCRE_UTF8) != 0)? " utf" : "", - ((get_options & PCRE_UCP) != 0)? " ucp" : "", - ((get_options & PCRE_NO_UTF8_CHECK) != 0)? " no_utf_check" : "", - ((get_options & PCRE_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "", - ((get_options & PCRE_DUPNAMES) != 0)? " dupnames" : "", - ((get_options & PCRE_NEVER_UTF) != 0)? " never_utf" : ""); - - if (jchanged) fprintf(outfile, "Duplicate name status changes\n"); - - switch (get_options & PCRE_NEWLINE_BITS) - { - case PCRE_NEWLINE_CR: - fprintf(outfile, "Forced newline sequence: CR\n"); - break; - - case PCRE_NEWLINE_LF: - fprintf(outfile, "Forced newline sequence: LF\n"); - break; - - case PCRE_NEWLINE_CRLF: - fprintf(outfile, "Forced newline sequence: CRLF\n"); - break; - - case PCRE_NEWLINE_ANYCRLF: - fprintf(outfile, "Forced newline sequence: ANYCRLF\n"); - break; - - case PCRE_NEWLINE_ANY: - fprintf(outfile, "Forced newline sequence: ANY\n"); - break; - - default: - break; - } - - if (first_char_set == 2) - { - fprintf(outfile, "First char at start or follows newline\n"); - } - else if (first_char_set == 1) - { - const char *caseless = - ((REAL_PCRE_FLAGS(re) & PCRE_FCH_CASELESS) == 0)? - "" : " (caseless)"; - - if (PRINTOK(first_char)) - fprintf(outfile, "First char = \'%c\'%s\n", first_char, caseless); - else - { - fprintf(outfile, "First char = "); - pchar(first_char, outfile); - fprintf(outfile, "%s\n", caseless); - } - } - else - { - fprintf(outfile, "No first char\n"); - } - - if (need_char_set == 0) - { - fprintf(outfile, "No need char\n"); - } - else - { - const char *caseless = - ((REAL_PCRE_FLAGS(re) & PCRE_RCH_CASELESS) == 0)? 
- "" : " (caseless)"; - - if (PRINTOK(need_char)) - fprintf(outfile, "Need char = \'%c\'%s\n", need_char, caseless); - else - { - fprintf(outfile, "Need char = "); - pchar(need_char, outfile); - fprintf(outfile, "%s\n", caseless); - } - } - - /* Don't output study size; at present it is in any case a fixed - value, but it varies, depending on the computer architecture, and - so messes up the test suite. (And with the /F option, it might be - flipped.) If study was forced by an external -s, don't show this - information unless -i or -d was also present. This means that, except - when auto-callouts are involved, the output from runs with and without - -s should be identical. */ - - if (do_study || (force_study >= 0 && showinfo && !no_force_study)) - { - if (extra == NULL) - fprintf(outfile, "Study returned NULL\n"); - else - { - pcre_uint8 *start_bits = NULL; - int minlength; - - if (new_info(re, extra, PCRE_INFO_MINLENGTH, &minlength) == 0) - fprintf(outfile, "Subject length lower bound = %d\n", minlength); - - if (new_info(re, extra, PCRE_INFO_FIRSTTABLE, &start_bits) == 0) - { - if (start_bits == NULL) - fprintf(outfile, "No starting char list\n"); - else - { - int i; - int c = 24; - fprintf(outfile, "Starting chars: "); - for (i = 0; i < 256; i++) - { - if ((start_bits[i/8] & (1<<(i&7))) != 0) - { - if (c > 75) - { - fprintf(outfile, "\n "); - c = 2; - } - if (PRINTOK(i) && i != ' ') - { - fprintf(outfile, "%c ", i); - c += 2; - } - else - { - fprintf(outfile, "\\x%02x ", i); - c += 5; - } - } - } - fprintf(outfile, "\n"); - } - } - } - - /* Show this only if the JIT was set by /S, not by -s. 
*/ - - if ((study_options & PCRE_STUDY_ALLJIT) != 0 && - (force_study_options & PCRE_STUDY_ALLJIT) == 0) - { - int jit; - if (new_info(re, extra, PCRE_INFO_JIT, &jit) == 0) - { - if (jit) - fprintf(outfile, "JIT study was successful\n"); - else -#ifdef SUPPORT_JIT - fprintf(outfile, "JIT study was not successful\n"); -#else - fprintf(outfile, "JIT support is not available in this version of PCRE\n"); -#endif - } - } - } - } - - /* If the '>' option was present, we write out the regex to a file, and - that is all. The first 8 bytes of the file are the regex length and then - the study length, in big-endian order. */ - - if (to_file != NULL) - { - FILE *f = fopen((char *)to_file, "wb"); - if (f == NULL) - { - fprintf(outfile, "Unable to open %s: %s\n", to_file, strerror(errno)); - } - else - { - pcre_uint8 sbuf[8]; - - if (do_flip) regexflip(re, extra); - sbuf[0] = (pcre_uint8)((true_size >> 24) & 255); - sbuf[1] = (pcre_uint8)((true_size >> 16) & 255); - sbuf[2] = (pcre_uint8)((true_size >> 8) & 255); - sbuf[3] = (pcre_uint8)((true_size) & 255); - sbuf[4] = (pcre_uint8)((true_study_size >> 24) & 255); - sbuf[5] = (pcre_uint8)((true_study_size >> 16) & 255); - sbuf[6] = (pcre_uint8)((true_study_size >> 8) & 255); - sbuf[7] = (pcre_uint8)((true_study_size) & 255); - - if (fwrite(sbuf, 1, 8, f) < 8 || - fwrite(re, 1, true_size, f) < true_size) - { - fprintf(outfile, "Write error on %s: %s\n", to_file, strerror(errno)); - } - else - { - fprintf(outfile, "Compiled pattern written to %s\n", to_file); - - /* If there is study data, write it. 
*/ - - if (extra != NULL) - { - if (fwrite(extra->study_data, 1, true_study_size, f) < - true_study_size) - { - fprintf(outfile, "Write error on %s: %s\n", to_file, - strerror(errno)); - } - else fprintf(outfile, "Study data written to %s\n", to_file); - } - } - fclose(f); - } - - new_free(re); - if (extra != NULL) - { - PCRE_FREE_STUDY(extra); - } - if (locale_set) - { - new_free((void *)tables); - setlocale(LC_CTYPE, "C"); - locale_set = 0; - } - continue; /* With next regex */ - } - } /* End of non-POSIX compile */ - - /* Read data lines and test them */ - - for (;;) - { -#ifdef SUPPORT_PCRE8 - pcre_uint8 *q8; -#endif -#ifdef SUPPORT_PCRE16 - pcre_uint16 *q16; -#endif -#ifdef SUPPORT_PCRE32 - pcre_uint32 *q32; -#endif - pcre_uint8 *bptr; - int *use_offsets = offsets; - int use_size_offsets = size_offsets; - int callout_data = 0; - int callout_data_set = 0; - int count; - pcre_uint32 c; - int copystrings = 0; - int find_match_limit = default_find_match_limit; - int getstrings = 0; - int getlist = 0; - int gmatched = 0; - int start_offset = 0; - int start_offset_sign = 1; - int g_notempty = 0; - int use_dfa = 0; - - *copynames = 0; - *getnames = 0; - -#ifdef SUPPORT_PCRE32 - cn32ptr = copynames; - gn32ptr = getnames; -#endif -#ifdef SUPPORT_PCRE16 - cn16ptr = copynames16; - gn16ptr = getnames16; -#endif -#ifdef SUPPORT_PCRE8 - cn8ptr = copynames8; - gn8ptr = getnames8; -#endif - - SET_PCRE_CALLOUT(callout); - first_callout = 1; - last_callout_mark = NULL; - callout_extra = 0; - callout_count = 0; - callout_fail_count = 999999; - callout_fail_id = -1; - show_malloc = 0; - options = 0; - - if (extra != NULL) extra->flags &= - ~(PCRE_EXTRA_MATCH_LIMIT|PCRE_EXTRA_MATCH_LIMIT_RECURSION); - - len = 0; - for (;;) - { - if (extend_inputline(infile, buffer + len, "data> ") == NULL) - { - if (len > 0) /* Reached EOF without hitting a newline */ - { - fprintf(outfile, "\n"); - break; - } - done = 1; - goto CONTINUE; - } - if (infile != stdin) fprintf(outfile, "%s", (char 
*)buffer); - len = (int)strlen((char *)buffer); - if (buffer[len-1] == '\n') break; - } - - while (len > 0 && isspace(buffer[len-1])) len--; - buffer[len] = 0; - if (len == 0) break; - - p = buffer; - while (isspace(*p)) p++; - -#ifndef NOUTF - /* Check that the data is well-formed UTF-8 if we're in UTF mode. To create - invalid input to pcre_exec, you must use \x?? or \x{} sequences. */ - - if (use_utf) - { - pcre_uint8 *q; - pcre_uint32 cc; - int n = 1; - - for (q = p; n > 0 && *q; q += n) n = utf82ord(q, &cc); - if (n <= 0) - { - fprintf(outfile, "**Failed: invalid UTF-8 string cannot be used as input in UTF mode\n"); - goto NEXT_DATA; - } - } -#endif - -#ifdef SUPPORT_VALGRIND - /* Mark the dbuffer as addressable but undefined again. */ - - if (dbuffer != NULL) - { - VALGRIND_MAKE_MEM_UNDEFINED(dbuffer, dbuffer_size * CHAR_SIZE); - } -#endif - - /* Allocate a buffer to hold the data line; len+1 is an upper bound on - the number of pcre_uchar units that will be needed. */ - - while (dbuffer == NULL || (size_t)len >= dbuffer_size) - { - dbuffer_size *= 2; - dbuffer = (pcre_uint8 *)realloc(dbuffer, dbuffer_size * CHAR_SIZE); - if (dbuffer == NULL) - { - fprintf(stderr, "pcretest: realloc(%d) failed\n", (int)dbuffer_size); - exit(1); - } - } - -#ifdef SUPPORT_PCRE8 - q8 = (pcre_uint8 *) dbuffer; -#endif -#ifdef SUPPORT_PCRE16 - q16 = (pcre_uint16 *) dbuffer; -#endif -#ifdef SUPPORT_PCRE32 - q32 = (pcre_uint32 *) dbuffer; -#endif - - while ((c = *p++) != 0) - { - int i = 0; - int n = 0; - - /* In UTF mode, input can be UTF-8, so just copy all non-backslash bytes. - In non-UTF mode, allow the value of the byte to fall through to later, - where values greater than 127 are turned into UTF-8 when running in - 16-bit or 32-bit mode. 
*/ - - if (c != '\\') - { -#ifndef NOUTF - if (use_utf && HASUTF8EXTRALEN(c)) { GETUTF8INC(c, p); } -#endif - } - - /* Handle backslash escapes */ - - else switch ((c = *p++)) - { - case 'a': c = CHAR_BEL; break; - case 'b': c = '\b'; break; - case 'e': c = CHAR_ESC; break; - case 'f': c = '\f'; break; - case 'n': c = '\n'; break; - case 'r': c = '\r'; break; - case 't': c = '\t'; break; - case 'v': c = '\v'; break; - - case '0': case '1': case '2': case '3': - case '4': case '5': case '6': case '7': - c -= '0'; - while (i++ < 2 && isdigit(*p) && *p != '8' && *p != '9') - c = c * 8 + *p++ - '0'; - break; - - case 'o': - if (*p == '{') - { - pcre_uint8 *pt = p; - c = 0; - for (pt++; isdigit(*pt) && *pt != '8' && *pt != '9'; pt++) - { - if (++i == 12) - fprintf(outfile, "** Too many octal digits in \\o{...} item; " - "using only the first twelve.\n"); - else c = c * 8 + *pt - '0'; - } - if (*pt == '}') p = pt + 1; - else fprintf(outfile, "** Missing } after \\o{ (assumed)\n"); - } - break; - - case 'x': - if (*p == '{') - { - pcre_uint8 *pt = p; - c = 0; - - /* We used to have "while (isxdigit(*(++pt)))" here, but it fails - when isxdigit() is a macro that refers to its argument more than - once. This is banned by the C Standard, but apparently happens in at - least one MacOS environment. */ - - for (pt++; isxdigit(*pt); pt++) - { - if (++i == 9) - fprintf(outfile, "** Too many hex digits in \\x{...} item; " - "using only the first eight.\n"); - else c = c * 16 + tolower(*pt) - ((isdigit(*pt))? '0' : 'a' - 10); - } - if (*pt == '}') - { - p = pt + 1; - break; - } - /* Not correct form for \x{...}; fall through */ - } - - /* \x without {} always defines just one byte in 8-bit mode. This - allows UTF-8 characters to be constructed byte by byte, and also allows - invalid UTF-8 sequences to be made. Just copy the byte in UTF mode. - Otherwise, pass it down to later code so that it can be turned into - UTF-8 when running in 16/32-bit mode. 
*/ - - c = 0; - while (i++ < 2 && isxdigit(*p)) - { - c = c * 16 + tolower(*p) - ((isdigit(*p))? '0' : 'a' - 10); - p++; - } -#if !defined NOUTF && defined SUPPORT_PCRE8 - if (use_utf && (pcre_mode == PCRE8_MODE)) - { - *q8++ = c; - continue; - } -#endif - break; - - case 0: /* \ followed by EOF allows for an empty line */ - p--; - continue; - - case '>': - if (*p == '-') - { - start_offset_sign = -1; - p++; - } - while(isdigit(*p)) start_offset = start_offset * 10 + *p++ - '0'; - start_offset *= start_offset_sign; - continue; - - case 'A': /* Option setting */ - options |= PCRE_ANCHORED; - continue; - - case 'B': - options |= PCRE_NOTBOL; - continue; - - case 'C': - if (isdigit(*p)) /* Set copy string */ - { - while(isdigit(*p)) n = n * 10 + *p++ - '0'; - copystrings |= 1 << n; - } - else if (isalnum(*p)) - { - READ_CAPTURE_NAME(p, &cn8ptr, &cn16ptr, &cn32ptr, re); - } - else if (*p == '+') - { - callout_extra = 1; - p++; - } - else if (*p == '-') - { - SET_PCRE_CALLOUT(NULL); - p++; - } - else if (*p == '!') - { - callout_fail_id = 0; - p++; - while(isdigit(*p)) - callout_fail_id = callout_fail_id * 10 + *p++ - '0'; - callout_fail_count = 0; - if (*p == '!') - { - p++; - while(isdigit(*p)) - callout_fail_count = callout_fail_count * 10 + *p++ - '0'; - } - } - else if (*p == '*') - { - int sign = 1; - callout_data = 0; - if (*(++p) == '-') { sign = -1; p++; } - while(isdigit(*p)) - callout_data = callout_data * 10 + *p++ - '0'; - callout_data *= sign; - callout_data_set = 1; - } - continue; - -#if !defined NODFA - case 'D': -#if !defined NOPOSIX - if (posix || do_posix) - printf("** Can't use dfa matching in POSIX mode: \\D ignored\n"); - else -#endif - use_dfa = 1; - continue; -#endif - -#if !defined NODFA - case 'F': - options |= PCRE_DFA_SHORTEST; - continue; -#endif - - case 'G': - if (isdigit(*p)) - { - while(isdigit(*p)) n = n * 10 + *p++ - '0'; - getstrings |= 1 << n; - } - else if (isalnum(*p)) - { - READ_CAPTURE_NAME(p, &gn8ptr, &gn16ptr, &gn32ptr, re); - 
} - continue; - - case 'J': - while(isdigit(*p)) n = n * 10 + *p++ - '0'; - if (extra != NULL - && (extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 - && extra->executable_jit != NULL) - { - if (jit_stack != NULL) { PCRE_JIT_STACK_FREE(jit_stack); } - jit_stack = PCRE_JIT_STACK_ALLOC(1, n * 1024); - PCRE_ASSIGN_JIT_STACK(extra, jit_callback, jit_stack); - } - continue; - - case 'L': - getlist = 1; - continue; - - case 'M': - find_match_limit = 1; - continue; - - case 'N': - if ((options & PCRE_NOTEMPTY) != 0) - options = (options & ~PCRE_NOTEMPTY) | PCRE_NOTEMPTY_ATSTART; - else - options |= PCRE_NOTEMPTY; - continue; - - case 'O': - while(isdigit(*p)) - { - if (n > (INT_MAX-10)/10) /* Hack to stop fuzzers */ - { - printf("** \\O argument is too big\n"); - yield = 1; - goto EXIT; - } - n = n * 10 + *p++ - '0'; - } - if (n > size_offsets_max) - { - size_offsets_max = n; - free(offsets); - use_offsets = offsets = (int *)malloc(size_offsets_max * sizeof(int)); - if (offsets == NULL) - { - printf("** Failed to get %d bytes of memory for offsets vector\n", - (int)(size_offsets_max * sizeof(int))); - yield = 1; - goto EXIT; - } - } - use_size_offsets = n; - if (n == 0) use_offsets = NULL; /* Ensures it can't write to it */ - else use_offsets = offsets + size_offsets_max - n; /* To catch overruns */ - continue; - - case 'P': - options |= ((options & PCRE_PARTIAL_SOFT) == 0)? 
- PCRE_PARTIAL_SOFT : PCRE_PARTIAL_HARD; - continue; - - case 'Q': - while(isdigit(*p)) n = n * 10 + *p++ - '0'; - if (extra == NULL) - { - extra = (pcre_extra *)malloc(sizeof(pcre_extra)); - extra->flags = 0; - } - extra->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION; - extra->match_limit_recursion = n; - continue; - - case 'q': - while(isdigit(*p)) n = n * 10 + *p++ - '0'; - if (extra == NULL) - { - extra = (pcre_extra *)malloc(sizeof(pcre_extra)); - extra->flags = 0; - } - extra->flags |= PCRE_EXTRA_MATCH_LIMIT; - extra->match_limit = n; - continue; - -#if !defined NODFA - case 'R': - options |= PCRE_DFA_RESTART; - continue; -#endif - - case 'S': - show_malloc = 1; - continue; - - case 'Y': - options |= PCRE_NO_START_OPTIMIZE; - continue; - - case 'Z': - options |= PCRE_NOTEOL; - continue; - - case '?': - options |= PCRE_NO_UTF8_CHECK; - continue; - - case '<': - { - int x = check_mc_option(p, outfile, TRUE, "escape sequence"); - if (x == 0) goto NEXT_DATA; - options |= x; - while (*p++ != '>'); - } - continue; - } - - /* We now have a character value in c that may be greater than 255. - In 8-bit mode we convert to UTF-8 if we are in UTF mode. Values greater - than 127 in UTF mode must have come from \x{...} or octal constructs - because values from \x.. get this far only in non-UTF mode. 
*/ - -#ifdef SUPPORT_PCRE8 - if (pcre_mode == PCRE8_MODE) - { -#ifndef NOUTF - if (use_utf) - { - if (c > 0x7fffffff) - { - fprintf(outfile, "** Character \\x{%x} is greater than 0x7fffffff " - "and so cannot be converted to UTF-8\n", c); - goto NEXT_DATA; - } - q8 += ord2utf8(c, q8); - } - else -#endif - { - if (c > 0xffu) - { - fprintf(outfile, "** Character \\x{%x} is greater than 255 " - "and UTF-8 mode is not enabled.\n", c); - fprintf(outfile, "** Truncation will probably give the wrong " - "result.\n"); - } - *q8++ = c; - } - } -#endif -#ifdef SUPPORT_PCRE16 - if (pcre_mode == PCRE16_MODE) - { -#ifndef NOUTF - if (use_utf) - { - if (c > 0x10ffffu) - { - fprintf(outfile, "** Failed: character \\x{%x} is greater than " - "0x10ffff and so cannot be converted to UTF-16\n", c); - goto NEXT_DATA; - } - else if (c >= 0x10000u) - { - c-= 0x10000u; - *q16++ = 0xD800 | (c >> 10); - *q16++ = 0xDC00 | (c & 0x3ff); - } - else - *q16++ = c; - } - else -#endif - { - if (c > 0xffffu) - { - fprintf(outfile, "** Character \\x{%x} is greater than 0xffff " - "and UTF-16 mode is not enabled.\n", c); - fprintf(outfile, "** Truncation will probably give the wrong " - "result.\n"); - } - - *q16++ = c; - } - } -#endif -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - { - *q32++ = c; - } -#endif - - } - - /* Reached end of subject string */ - -#ifdef SUPPORT_PCRE8 - if (pcre_mode == PCRE8_MODE) - { - *q8 = 0; - len = (int)(q8 - (pcre_uint8 *)dbuffer); - } -#endif -#ifdef SUPPORT_PCRE16 - if (pcre_mode == PCRE16_MODE) - { - *q16 = 0; - len = (int)(q16 - (pcre_uint16 *)dbuffer); - } -#endif -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - { - *q32 = 0; - len = (int)(q32 - (pcre_uint32 *)dbuffer); - } -#endif - - /* If we're compiling with explicit valgrind support, Mark the data from after - its end to the end of the buffer as unaddressable, so that a read over the end - of the buffer will be seen by valgrind, even if it doesn't cause a crash. 
- If we're not building with valgrind support, at least move the data to the end - of the buffer so that it might at least cause a crash. - If we are using the POSIX interface, we must include the terminating zero. */ - - bptr = dbuffer; - -#if !defined NOPOSIX - if (posix || do_posix) - { -#ifdef SUPPORT_VALGRIND - VALGRIND_MAKE_MEM_NOACCESS(dbuffer + len + 1, dbuffer_size - (len + 1)); -#else - memmove(bptr + dbuffer_size - len - 1, bptr, len + 1); - bptr += dbuffer_size - len - 1; -#endif - } - else -#endif - { -#ifdef SUPPORT_VALGRIND - VALGRIND_MAKE_MEM_NOACCESS(dbuffer + len * CHAR_SIZE, (dbuffer_size - len) * CHAR_SIZE); -#else - bptr = memmove(bptr + (dbuffer_size - len) * CHAR_SIZE, bptr, len * CHAR_SIZE); -#endif - } - - if ((all_use_dfa || use_dfa) && find_match_limit) - { - printf("** Match limit not relevant for DFA matching: ignored\n"); - find_match_limit = 0; - } - - /* Handle matching via the POSIX interface, which does not - support timing or playing with the match limit or callout data. 
*/ - -#if !defined NOPOSIX - if (posix || do_posix) - { - int rc; - int eflags = 0; - regmatch_t *pmatch = NULL; - if (use_size_offsets > 0) - pmatch = (regmatch_t *)malloc(sizeof(regmatch_t) * use_size_offsets); - if ((options & PCRE_NOTBOL) != 0) eflags |= REG_NOTBOL; - if ((options & PCRE_NOTEOL) != 0) eflags |= REG_NOTEOL; - if ((options & PCRE_NOTEMPTY) != 0) eflags |= REG_NOTEMPTY; - - rc = regexec(&preg, (const char *)bptr, use_size_offsets, pmatch, eflags); - - if (rc != 0) - { - (void)regerror(rc, &preg, (char *)buffer, buffer_size); - fprintf(outfile, "No match: POSIX code %d: %s\n", rc, buffer); - } - else if ((REAL_PCRE_OPTIONS(preg.re_pcre) & PCRE_NO_AUTO_CAPTURE) != 0) - { - fprintf(outfile, "Matched with REG_NOSUB\n"); - } - else - { - size_t i; - for (i = 0; i < (size_t)use_size_offsets; i++) - { - if (pmatch[i].rm_so >= 0) - { - fprintf(outfile, "%2d: ", (int)i); - PCHARSV(dbuffer, pmatch[i].rm_so, - pmatch[i].rm_eo - pmatch[i].rm_so, outfile); - fprintf(outfile, "\n"); - if (do_showcaprest || (i == 0 && do_showrest)) - { - fprintf(outfile, "%2d+ ", (int)i); - PCHARSV(dbuffer, pmatch[i].rm_eo, len - pmatch[i].rm_eo, - outfile); - fprintf(outfile, "\n"); - } - } - } - } - free(pmatch); - goto NEXT_DATA; - } - -#endif /* !defined NOPOSIX */ - - /* Handle matching via the native interface - repeats for /g and /G */ - - /* Ensure that there is a JIT callback if we want to verify that JIT was - actually used. If jit_stack == NULL, no stack has yet been assigned. 
*/ - - if (verify_jit && jit_stack == NULL && extra != NULL) - { PCRE_ASSIGN_JIT_STACK(extra, jit_callback, jit_stack); } - - for (;; gmatched++) /* Loop for /g or /G */ - { - markptr = NULL; - jit_was_used = FALSE; - - if (timeitm > 0) - { - register int i; - clock_t time_taken; - clock_t start_time = clock(); - -#if !defined NODFA - if (all_use_dfa || use_dfa) - { - if ((options & PCRE_DFA_RESTART) != 0) - { - fprintf(outfile, "Timing DFA restarts is not supported\n"); - break; - } - if (dfa_workspace == NULL) - dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int)); - for (i = 0; i < timeitm; i++) - { - PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset, - (options | g_notempty), use_offsets, use_size_offsets, - dfa_workspace, DFA_WS_DIMENSION); - } - } - else -#endif - - for (i = 0; i < timeitm; i++) - { - PCRE_EXEC(count, re, extra, bptr, len, start_offset, - (options | g_notempty), use_offsets, use_size_offsets); - } - total_match_time += (time_taken = clock() - start_time); - fprintf(outfile, "Execute time %.4f milliseconds\n", - (((double)time_taken * 1000.0) / (double)timeitm) / - (double)CLOCKS_PER_SEC); - } - - /* If find_match_limit is set, we want to do repeated matches with - varying limits in order to find the minimum value for the match limit and - for the recursion limit. The match limits are relevant only to the normal - running of pcre_exec(), so disable the JIT optimization. This makes it - possible to run the same set of tests with and without JIT externally - requested. 
*/ - - if (find_match_limit) - { - if (extra != NULL) { PCRE_FREE_STUDY(extra); } - extra = (pcre_extra *)malloc(sizeof(pcre_extra)); - extra->flags = 0; - - (void)check_match_limit(re, extra, bptr, len, start_offset, - options|g_notempty, use_offsets, use_size_offsets, - PCRE_EXTRA_MATCH_LIMIT, &(extra->match_limit), - PCRE_ERROR_MATCHLIMIT, "match()"); - - count = check_match_limit(re, extra, bptr, len, start_offset, - options|g_notempty, use_offsets, use_size_offsets, - PCRE_EXTRA_MATCH_LIMIT_RECURSION, &(extra->match_limit_recursion), - PCRE_ERROR_RECURSIONLIMIT, "match() recursion"); - } - - /* If callout_data is set, use the interface with additional data */ - - else if (callout_data_set) - { - if (extra == NULL) - { - extra = (pcre_extra *)malloc(sizeof(pcre_extra)); - extra->flags = 0; - } - extra->flags |= PCRE_EXTRA_CALLOUT_DATA; - extra->callout_data = &callout_data; - PCRE_EXEC(count, re, extra, bptr, len, start_offset, - options | g_notempty, use_offsets, use_size_offsets); - extra->flags &= ~PCRE_EXTRA_CALLOUT_DATA; - } - - /* The normal case is just to do the match once, with the default - value of match_limit. */ - -#if !defined NODFA - else if (all_use_dfa || use_dfa) - { - if (dfa_workspace == NULL) - dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int)); - if (dfa_matched++ == 0) - dfa_workspace[0] = -1; /* To catch bad restart */ - PCRE_DFA_EXEC(count, re, extra, bptr, len, start_offset, - (options | g_notempty), use_offsets, use_size_offsets, dfa_workspace, - DFA_WS_DIMENSION); - if (count == 0) - { - fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n"); - count = use_size_offsets/2; - } - } -#endif - - else - { - PCRE_EXEC(count, re, extra, bptr, len, start_offset, - options | g_notempty, use_offsets, use_size_offsets); - if (count == 0) - { - fprintf(outfile, "Matched, but too many substrings\n"); - /* 2 is a special case; match can be returned */ - count = (use_size_offsets == 2)? 
1 : use_size_offsets/3; - } - } - - /* Matched */ - - if (count >= 0) - { - int i, maxcount; - void *cnptr, *gnptr; - -#if !defined NODFA - if (all_use_dfa || use_dfa) maxcount = use_size_offsets/2; else -#endif - /* 2 is a special case; match can be returned */ - maxcount = (use_size_offsets == 2)? 1 : use_size_offsets/3; - - /* This is a check against a lunatic return value. */ - - if (count > maxcount) - { - fprintf(outfile, - "** PCRE error: returned count %d is too big for offset size %d\n", - count, use_size_offsets); - count = use_size_offsets/3; - if (do_g || do_G) - { - fprintf(outfile, "** /%c loop abandoned\n", do_g? 'g' : 'G'); - do_g = do_G = FALSE; /* Break g/G loop */ - } - } - - /* do_allcaps requests showing of all captures in the pattern, to check - unset ones at the end. */ - - if (do_allcaps) - { - if (all_use_dfa || use_dfa) - { - fprintf(outfile, "** Show all captures ignored after DFA matching\n"); - } - else - { - if (new_info(re, NULL, PCRE_INFO_CAPTURECOUNT, &count) < 0) - goto SKIP_DATA; - count++; /* Allow for full match */ - if (count * 2 > use_size_offsets) count = use_size_offsets/2; - } - } - - /* Output the captured substrings. Note that, for the matched string, - the use of \K in an assertion can make the start later than the end. 
*/ - - for (i = 0; i < count * 2; i += 2) - { - if (use_offsets[i] < 0) - { - if (use_offsets[i] != -1) - fprintf(outfile, "ERROR: bad negative value %d for offset %d\n", - use_offsets[i], i); - if (use_offsets[i+1] != -1) - fprintf(outfile, "ERROR: bad negative value %d for offset %d\n", - use_offsets[i+1], i+1); - fprintf(outfile, "%2d: \n", i/2); - } - else - { - int start = use_offsets[i]; - int end = use_offsets[i+1]; - - if (start > end) - { - start = use_offsets[i+1]; - end = use_offsets[i]; - fprintf(outfile, "Start of matched string is beyond its end - " - "displaying from end to start.\n"); - } - - fprintf(outfile, "%2d: ", i/2); - PCHARSV(bptr, start, end - start, outfile); - if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)"); - fprintf(outfile, "\n"); - - /* Note: don't use the start/end variables here because we want to - show the text from what is reported as the end. */ - - if (do_showcaprest || (i == 0 && do_showrest)) - { - fprintf(outfile, "%2d+ ", i/2); - PCHARSV(bptr, use_offsets[i+1], len - use_offsets[i+1], - outfile); - fprintf(outfile, "\n"); - } - } - } - - if (markptr != NULL) - { - fprintf(outfile, "MK: "); - PCHARSV(markptr, 0, -1, outfile); - fprintf(outfile, "\n"); - } - - for (i = 0; i < 32; i++) - { - if ((copystrings & (1 << i)) != 0) - { - int rc; - char copybuffer[256]; - PCRE_COPY_SUBSTRING(rc, bptr, use_offsets, count, i, - copybuffer, sizeof(copybuffer)); - if (rc < 0) - fprintf(outfile, "copy substring %d failed %d\n", i, rc); - else - { - fprintf(outfile, "%2dC ", i); - PCHARSV(copybuffer, 0, rc, outfile); - fprintf(outfile, " (%d)\n", rc); - } - } - } - - cnptr = copynames; - for (;;) - { - int rc; - char copybuffer[256]; - -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - { - if (*(pcre_uint32 *)cnptr == 0) break; - } -#endif -#ifdef SUPPORT_PCRE16 - if (pcre_mode == PCRE16_MODE) - { - if (*(pcre_uint16 *)cnptr == 0) break; - } -#endif -#ifdef SUPPORT_PCRE8 - if (pcre_mode == PCRE8_MODE) - { - if 
(*(pcre_uint8 *)cnptr == 0) break; - } -#endif - - PCRE_COPY_NAMED_SUBSTRING(rc, re, bptr, use_offsets, count, - cnptr, copybuffer, sizeof(copybuffer)); - - if (rc < 0) - { - fprintf(outfile, "copy substring "); - PCHARSV(cnptr, 0, -1, outfile); - fprintf(outfile, " failed %d\n", rc); - } - else - { - fprintf(outfile, " C "); - PCHARSV(copybuffer, 0, rc, outfile); - fprintf(outfile, " (%d) ", rc); - PCHARSV(cnptr, 0, -1, outfile); - putc('\n', outfile); - } - - cnptr = (char *)cnptr + (STRLEN(cnptr) + 1) * CHAR_SIZE; - } - - for (i = 0; i < 32; i++) - { - if ((getstrings & (1 << i)) != 0) - { - int rc; - const char *substring; - PCRE_GET_SUBSTRING(rc, bptr, use_offsets, count, i, &substring); - if (rc < 0) - fprintf(outfile, "get substring %d failed %d\n", i, rc); - else - { - fprintf(outfile, "%2dG ", i); - PCHARSV(substring, 0, rc, outfile); - fprintf(outfile, " (%d)\n", rc); - PCRE_FREE_SUBSTRING(substring); - } - } - } - - gnptr = getnames; - for (;;) - { - int rc; - const char *substring; - -#ifdef SUPPORT_PCRE32 - if (pcre_mode == PCRE32_MODE) - { - if (*(pcre_uint32 *)gnptr == 0) break; - } -#endif -#ifdef SUPPORT_PCRE16 - if (pcre_mode == PCRE16_MODE) - { - if (*(pcre_uint16 *)gnptr == 0) break; - } -#endif -#ifdef SUPPORT_PCRE8 - if (pcre_mode == PCRE8_MODE) - { - if (*(pcre_uint8 *)gnptr == 0) break; - } -#endif - - PCRE_GET_NAMED_SUBSTRING(rc, re, bptr, use_offsets, count, - gnptr, &substring); - if (rc < 0) - { - fprintf(outfile, "get substring "); - PCHARSV(gnptr, 0, -1, outfile); - fprintf(outfile, " failed %d\n", rc); - } - else - { - fprintf(outfile, " G "); - PCHARSV(substring, 0, rc, outfile); - fprintf(outfile, " (%d) ", rc); - PCHARSV(gnptr, 0, -1, outfile); - PCRE_FREE_SUBSTRING(substring); - putc('\n', outfile); - } - - gnptr = (char *)gnptr + (STRLEN(gnptr) + 1) * CHAR_SIZE; - } - - if (getlist) - { - int rc; - const char **stringlist; - PCRE_GET_SUBSTRING_LIST(rc, bptr, use_offsets, count, &stringlist); - if (rc < 0) - fprintf(outfile, "get 
substring list failed %d\n", rc); - else - { - for (i = 0; i < count; i++) - { - fprintf(outfile, "%2dL ", i); - PCHARSV(stringlist[i], 0, -1, outfile); - putc('\n', outfile); - } - if (stringlist[i] != NULL) - fprintf(outfile, "string list not terminated by NULL\n"); - PCRE_FREE_SUBSTRING_LIST(stringlist); - } - } - } - - /* There was a partial match. If the bumpalong point is not the same as - the first inspected character, show the offset explicitly. */ - - else if (count == PCRE_ERROR_PARTIAL) - { - fprintf(outfile, "Partial match"); - if (use_size_offsets > 2 && use_offsets[0] != use_offsets[2]) - fprintf(outfile, " at offset %d", use_offsets[2]); - if (markptr != NULL) - { - fprintf(outfile, ", mark="); - PCHARSV(markptr, 0, -1, outfile); - } - if (use_size_offsets > 1) - { - fprintf(outfile, ": "); - PCHARSV(bptr, use_offsets[0], use_offsets[1] - use_offsets[0], - outfile); - } - if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)"); - fprintf(outfile, "\n"); - break; /* Out of the /g loop */ - } - - /* Failed to match. If this is a /g or /G loop and we previously set - g_notempty after a null match, this is not necessarily the end. We want - to advance the start offset, and continue. We won't be at the end of the - string - that was checked before setting g_notempty. - - Complication arises in the case when the newline convention is "any", - "crlf", or "anycrlf". If the previous match was at the end of a line - terminated by CRLF, an advance of one character just passes the \r, - whereas we should prefer the longer newline sequence, as does the code in - pcre_exec(). Fudge the offset value to achieve this. We check for a - newline setting in the pattern; if none was set, use PCRE_CONFIG() to - find the default. - - Otherwise, in the case of UTF-8 matching, the advance must be one - character, not one byte. 
*/ - - else - { - if (g_notempty != 0) - { - int onechar = 1; - unsigned int obits = REAL_PCRE_OPTIONS(re); - use_offsets[0] = start_offset; - if ((obits & PCRE_NEWLINE_BITS) == 0) - { - int d; - (void)PCRE_CONFIG(PCRE_CONFIG_NEWLINE, &d); - /* Note that these values are always the ASCII ones, even in - EBCDIC environments. CR = 13, NL = 10. */ - obits = (d == 13)? PCRE_NEWLINE_CR : - (d == 10)? PCRE_NEWLINE_LF : - (d == (13<<8 | 10))? PCRE_NEWLINE_CRLF : - (d == -2)? PCRE_NEWLINE_ANYCRLF : - (d == -1)? PCRE_NEWLINE_ANY : 0; - } - if (((obits & PCRE_NEWLINE_BITS) == PCRE_NEWLINE_ANY || - (obits & PCRE_NEWLINE_BITS) == PCRE_NEWLINE_CRLF || - (obits & PCRE_NEWLINE_BITS) == PCRE_NEWLINE_ANYCRLF) - && - start_offset < len - 1 && ( -#ifdef SUPPORT_PCRE8 - (pcre_mode == PCRE8_MODE && - bptr[start_offset] == '\r' && - bptr[start_offset + 1] == '\n') || -#endif -#ifdef SUPPORT_PCRE16 - (pcre_mode == PCRE16_MODE && - ((PCRE_SPTR16)bptr)[start_offset] == '\r' && - ((PCRE_SPTR16)bptr)[start_offset + 1] == '\n') || -#endif -#ifdef SUPPORT_PCRE32 - (pcre_mode == PCRE32_MODE && - ((PCRE_SPTR32)bptr)[start_offset] == '\r' && - ((PCRE_SPTR32)bptr)[start_offset + 1] == '\n') || -#endif - 0)) - onechar++; - else if (use_utf) - { - while (start_offset + onechar < len) - { - if ((bptr[start_offset+onechar] & 0xc0) != 0x80) break; - onechar++; - } - } - use_offsets[1] = start_offset + onechar; - } - else - { - switch(count) - { - case PCRE_ERROR_NOMATCH: - if (gmatched == 0) - { - if (markptr == NULL) - { - fprintf(outfile, "No match"); - } - else - { - fprintf(outfile, "No match, mark = "); - PCHARSV(markptr, 0, -1, outfile); - } - if (verify_jit && jit_was_used) fprintf(outfile, " (JIT)"); - putc('\n', outfile); - } - break; - - case PCRE_ERROR_BADUTF8: - case PCRE_ERROR_SHORTUTF8: - fprintf(outfile, "Error %d (%s UTF-%d string)", count, - (count == PCRE_ERROR_BADUTF8)? 
"bad" : "short", - 8 * CHAR_SIZE); - if (use_size_offsets >= 2) - fprintf(outfile, " offset=%d reason=%d", use_offsets[0], - use_offsets[1]); - fprintf(outfile, "\n"); - break; - - case PCRE_ERROR_BADUTF8_OFFSET: - fprintf(outfile, "Error %d (bad UTF-%d offset)\n", count, - 8 * CHAR_SIZE); - break; - - default: - if (count < 0 && - (-count) < (int)(sizeof(errtexts)/sizeof(const char *))) - fprintf(outfile, "Error %d (%s)\n", count, errtexts[-count]); - else - fprintf(outfile, "Error %d (Unexpected value)\n", count); - break; - } - - break; /* Out of the /g loop */ - } - } - - /* If not /g or /G we are done */ - - if (!do_g && !do_G) break; - - if (use_offsets == NULL) - { - fprintf(outfile, "Cannot do global matching without an ovector\n"); - break; - } - - if (use_size_offsets < 2) - { - fprintf(outfile, "Cannot do global matching with an ovector size < 2\n"); - break; - } - - /* If we have matched an empty string, first check to see if we are at - the end of the subject. If so, the /g loop is over. Otherwise, mimic what - Perl's /g options does. This turns out to be rather cunning. First we set - PCRE_NOTEMPTY_ATSTART and PCRE_ANCHORED and try the match again at the - same point. If this fails (picked up above) we advance to the next - character. */ - - g_notempty = 0; - - if (use_offsets[0] == use_offsets[1]) - { - if (use_offsets[0] == len) break; - g_notempty = PCRE_NOTEMPTY_ATSTART | PCRE_ANCHORED; - } - - /* For /g, update the start offset, leaving the rest alone. There is a - tricky case when \K is used in a positive lookbehind assertion. This can - cause the end of the match to be less than or equal to the start offset. - In this case we restart at one past the start offset. This may return the - same match if the original start offset was bumped along during the - match, but eventually the new start offset will hit the actual start - offset. (In PCRE2 the true start offset is available, and this can be - done better. 
It is not worth doing more than making sure we do not loop - at this stage in the life of PCRE1.) */ - - if (do_g) - { - if (g_notempty == 0 && use_offsets[1] <= start_offset) - { - if (start_offset >= len) break; /* End of subject */ - start_offset++; - if (use_utf) - { - while (start_offset < len) - { - if ((bptr[start_offset] & 0xc0) != 0x80) break; - start_offset++; - } - } - } - else start_offset = use_offsets[1]; - } - - /* For /G, update the pointer and length */ - - else - { - bptr += use_offsets[1] * CHAR_SIZE; - len -= use_offsets[1]; - } - } /* End of loop for /g and /G */ - - NEXT_DATA: continue; - } /* End of loop for data lines */ - - CONTINUE: - -#if !defined NOPOSIX - if ((posix || do_posix) && preg.re_pcre != 0) regfree(&preg); -#endif - - if (re != NULL) new_free(re); - if (extra != NULL) - { - PCRE_FREE_STUDY(extra); - } - if (locale_set) - { - new_free((void *)tables); - setlocale(LC_CTYPE, "C"); - locale_set = 0; - } - if (jit_stack != NULL) - { - PCRE_JIT_STACK_FREE(jit_stack); - jit_stack = NULL; - } - } - -if (infile == stdin) fprintf(outfile, "\n"); - -if (showtotaltimes) - { - fprintf(outfile, "--------------------------------------\n"); - if (timeit > 0) - { - fprintf(outfile, "Total compile time %.4f milliseconds\n", - (((double)total_compile_time * 1000.0) / (double)timeit) / - (double)CLOCKS_PER_SEC); - fprintf(outfile, "Total study time %.4f milliseconds\n", - (((double)total_study_time * 1000.0) / (double)timeit) / - (double)CLOCKS_PER_SEC); - } - fprintf(outfile, "Total execute time %.4f milliseconds\n", - (((double)total_match_time * 1000.0) / (double)timeitm) / - (double)CLOCKS_PER_SEC); - } - -EXIT: - -if (infile != NULL && infile != stdin) fclose(infile); -if (outfile != NULL && outfile != stdout) fclose(outfile); - -free(buffer); -free(dbuffer); -free(pbuffer); -free(offsets); - -#ifdef SUPPORT_PCRE16 -if (buffer16 != NULL) free(buffer16); -#endif -#ifdef SUPPORT_PCRE32 -if (buffer32 != NULL) free(buffer32); -#endif - -#if 
!defined NODFA -if (dfa_workspace != NULL) - free(dfa_workspace); -#endif - -#if defined(__VMS) - yield = SS$_NORMAL; /* Return values via DCL symbols */ -#endif - -return yield; -} - -/* End of pcretest.c */ - diff --git a/src/pcre/perltest.pl b/src/pcre/perltest.pl deleted file mode 100755 index 29b808b5..00000000 --- a/src/pcre/perltest.pl +++ /dev/null @@ -1,242 +0,0 @@ -#! /usr/bin/env perl - -# Program for testing regular expressions with perl to check that PCRE handles -# them the same. This version needs to have "use utf8" at the start for running -# the UTF-8 tests, but *not* for the other tests. The only way I've found for -# doing this is to cat this line in explicitly in the RunPerlTest script. I've -# also used this method to supply "require Encode" for the UTF-8 tests, so that -# the main test will still run where Encode is not installed. - -#use utf8; -#require Encode; - -# Function for turning a string into a string of printing chars. - -sub pchars { -my($t) = ""; - -if ($utf8) - { - @p = unpack('U*', $_[0]); - foreach $c (@p) - { - if ($c >= 32 && $c < 127) { $t .= chr $c; } - else { $t .= sprintf("\\x{%02x}", $c); - } - } - } -else - { - foreach $c (split(//, $_[0])) - { - if (ord $c >= 32 && ord $c < 127) { $t .= $c; } - else { $t .= sprintf("\\x%02x", ord $c); } - } - } - -$t; -} - - -# Read lines from named file or stdin and write to named file or stdout; lines -# consist of a regular expression, in delimiters and optionally followed by -# options, followed by a set of test data, terminated by an empty line. 
- -# Sort out the input and output files - -if (@ARGV > 0) - { - open(INFILE, "<$ARGV[0]") || die "Failed to open $ARGV[0]\n"; - $infile = "INFILE"; - } -else { $infile = "STDIN"; } - -if (@ARGV > 1) - { - open(OUTFILE, ">$ARGV[1]") || die "Failed to open $ARGV[1]\n"; - $outfile = "OUTFILE"; - } -else { $outfile = "STDOUT"; } - -printf($outfile "Perl $] Regular Expressions\n\n"); - -# Main loop - -NEXT_RE: -for (;;) - { - printf " re> " if $infile eq "STDIN"; - last if ! ($_ = <$infile>); - printf $outfile "$_" if $infile ne "STDIN"; - next if ($_ =~ /^\s*$/ || $_ =~ /^< forbid/); - - $pattern = $_; - - while ($pattern !~ /^\s*(.).*\1/s) - { - printf " > " if $infile eq "STDIN"; - last if ! ($_ = <$infile>); - printf $outfile "$_" if $infile ne "STDIN"; - $pattern .= $_; - } - - chomp($pattern); - $pattern =~ s/\s+$//; - - # The private /+ modifier means "print $' afterwards". - - $showrest = ($pattern =~ s/\+(?=[a-zA-Z]*$)//); - - # A doubled version is used by pcretest to print remainders after captures - - $pattern =~ s/\+(?=[a-zA-Z]*$)//; - - # Remove /8 from a UTF-8 pattern. - - $utf8 = $pattern =~ s/8(?=[a-zA-Z]*$)//; - - # Remove /J from a pattern with duplicate names. - - $pattern =~ s/J(?=[a-zA-Z]*$)//; - - # Remove /K from a pattern (asks pcretest to check MARK data) */ - - $pattern =~ s/K(?=[a-zA-Z]*$)//; - - # /W asks pcretest to set PCRE_UCP; change this to /u for Perl - - $pattern =~ s/W(?=[a-zA-Z]*$)/u/; - - # Remove /S or /SS from a pattern (asks pcretest to study or not to study) - - $pattern =~ s/S(?=[a-zA-Z]*$)//g; - - # Remove /Y and /O from a pattern (disable PCRE optimizations) - - $pattern =~ s/[YO](?=[a-zA-Z]*$)//; - - # Check that the pattern is valid - - eval "\$_ =~ ${pattern}"; - if ($@) - { - printf $outfile "Error: $@"; - if ($infile != "STDIN") - { - for (;;) - { - last if ! 
($_ = <$infile>); - last if $_ =~ /^\s*$/; - } - } - next NEXT_RE; - } - - # If the /g modifier is present, we want to put a loop round the matching; - # otherwise just a single "if". - - $cmd = ($pattern =~ /g[a-z]*$/)? "while" : "if"; - - # If the pattern is actually the null string, Perl uses the most recently - # executed (and successfully compiled) regex is used instead. This is a - # nasty trap for the unwary! The PCRE test suite does contain null strings - # in places - if they are allowed through here all sorts of weird and - # unexpected effects happen. To avoid this, we replace such patterns with - # a non-null pattern that has the same effect. - - $pattern = "/(?#)/$2" if ($pattern =~ /^(.)\1(.*)$/); - - # Read data lines and test them - - for (;;) - { - printf "data> " if $infile eq "STDIN"; - last NEXT_RE if ! ($_ = <$infile>); - chomp; - printf $outfile "$_\n" if $infile ne "STDIN"; - - s/\s+$//; # Remove trailing space - s/^\s+//; # Remove leading space - s/\\Y//g; # Remove \Y (pcretest flag to set PCRE_NO_START_OPTIMIZE) - - last if ($_ eq ""); - $x = eval "\"$_\""; # To get escapes processed - - # Empty array for holding results, ensure $REGERROR and $REGMARK are - # unset, then do the matching. - - @subs = (); - - $pushes = "push \@subs,\$&;" . - "push \@subs,\$1;" . - "push \@subs,\$2;" . - "push \@subs,\$3;" . - "push \@subs,\$4;" . - "push \@subs,\$5;" . - "push \@subs,\$6;" . - "push \@subs,\$7;" . - "push \@subs,\$8;" . - "push \@subs,\$9;" . - "push \@subs,\$10;" . - "push \@subs,\$11;" . - "push \@subs,\$12;" . - "push \@subs,\$13;" . - "push \@subs,\$14;" . - "push \@subs,\$15;" . - "push \@subs,\$16;" . - "push \@subs,\$'; }"; - - undef $REGERROR; - undef $REGMARK; - - eval "${cmd} (\$x =~ ${pattern}) {" . 
$pushes; - - if ($@) - { - printf $outfile "Error: $@\n"; - next NEXT_RE; - } - elsif (scalar(@subs) == 0) - { - printf $outfile "No match"; - if (defined $REGERROR && $REGERROR != 1) - { printf $outfile (", mark = %s", &pchars($REGERROR)); } - printf $outfile "\n"; - } - else - { - while (scalar(@subs) != 0) - { - printf $outfile (" 0: %s\n", &pchars($subs[0])); - printf $outfile (" 0+ %s\n", &pchars($subs[17])) if $showrest; - $last_printed = 0; - for ($i = 1; $i <= 16; $i++) - { - if (defined $subs[$i]) - { - while ($last_printed++ < $i-1) - { printf $outfile ("%2d: \n", $last_printed); } - printf $outfile ("%2d: %s\n", $i, &pchars($subs[$i])); - $last_printed = $i; - } - } - splice(@subs, 0, 18); - } - - # It seems that $REGMARK is not marked as UTF-8 even when use utf8 is - # set and the input pattern was a UTF-8 string. We can, however, force - # it to be so marked. - - if (defined $REGMARK && $REGMARK != 1) - { - $xx = $REGMARK; - $xx = Encode::decode_utf8($xx) if $utf8; - printf $outfile ("MK: %s\n", &pchars($xx)); - } - } - } - } - -# printf $outfile "\n"; - -# End diff --git a/src/pcre/sljit/sljitNativeTILEGX-encoder.c b/src/pcre/sljit/sljitNativeTILEGX-encoder.c deleted file mode 100644 index dd82ebae..00000000 --- a/src/pcre/sljit/sljitNativeTILEGX-encoder.c +++ /dev/null @@ -1,10159 +0,0 @@ -/* - * Stack-less Just-In-Time compiler - * - * Copyright 2013-2013 Tilera Corporation(jiwang@tilera.com). All rights reserved. - * Copyright Zoltan Herczeg (hzmester@freemail.hu). All rights reserved. - * - * Redistribution and use in source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright notice, this list of - * conditions and the following disclaimer. - * - * 2. 
Redistributions in binary form must reproduce the above copyright notice, this list - * of conditions and the following disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) AND CONTRIBUTORS ``AS IS'' AND ANY - * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES - * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT - * SHALL THE COPYRIGHT HOLDER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, - * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED - * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR - * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN - * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -/* This code is owned by Tilera Corporation, and distributed as part - of multiple projects. In sljit, the code is under BSD licence. */ - -#include -#include -#include -#define BFD_RELOC(x) R_##x - -/* Special registers. */ -#define TREG_LR 55 -#define TREG_SN 56 -#define TREG_ZERO 63 - -/* Canonical name of each register. 
*/ -const char *const tilegx_register_names[] = -{ - "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", - "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", - "r16", "r17", "r18", "r19", "r20", "r21", "r22", "r23", - "r24", "r25", "r26", "r27", "r28", "r29", "r30", "r31", - "r32", "r33", "r34", "r35", "r36", "r37", "r38", "r39", - "r40", "r41", "r42", "r43", "r44", "r45", "r46", "r47", - "r48", "r49", "r50", "r51", "r52", "tp", "sp", "lr", - "sn", "idn0", "idn1", "udn0", "udn1", "udn2", "udn3", "zero" -}; - -enum -{ - R_NONE = 0, - R_TILEGX_NONE = 0, - R_TILEGX_64 = 1, - R_TILEGX_32 = 2, - R_TILEGX_16 = 3, - R_TILEGX_8 = 4, - R_TILEGX_64_PCREL = 5, - R_TILEGX_32_PCREL = 6, - R_TILEGX_16_PCREL = 7, - R_TILEGX_8_PCREL = 8, - R_TILEGX_HW0 = 9, - R_TILEGX_HW1 = 10, - R_TILEGX_HW2 = 11, - R_TILEGX_HW3 = 12, - R_TILEGX_HW0_LAST = 13, - R_TILEGX_HW1_LAST = 14, - R_TILEGX_HW2_LAST = 15, - R_TILEGX_COPY = 16, - R_TILEGX_GLOB_DAT = 17, - R_TILEGX_JMP_SLOT = 18, - R_TILEGX_RELATIVE = 19, - R_TILEGX_BROFF_X1 = 20, - R_TILEGX_JUMPOFF_X1 = 21, - R_TILEGX_JUMPOFF_X1_PLT = 22, - R_TILEGX_IMM8_X0 = 23, - R_TILEGX_IMM8_Y0 = 24, - R_TILEGX_IMM8_X1 = 25, - R_TILEGX_IMM8_Y1 = 26, - R_TILEGX_DEST_IMM8_X1 = 27, - R_TILEGX_MT_IMM14_X1 = 28, - R_TILEGX_MF_IMM14_X1 = 29, - R_TILEGX_MMSTART_X0 = 30, - R_TILEGX_MMEND_X0 = 31, - R_TILEGX_SHAMT_X0 = 32, - R_TILEGX_SHAMT_X1 = 33, - R_TILEGX_SHAMT_Y0 = 34, - R_TILEGX_SHAMT_Y1 = 35, - R_TILEGX_IMM16_X0_HW0 = 36, - R_TILEGX_IMM16_X1_HW0 = 37, - R_TILEGX_IMM16_X0_HW1 = 38, - R_TILEGX_IMM16_X1_HW1 = 39, - R_TILEGX_IMM16_X0_HW2 = 40, - R_TILEGX_IMM16_X1_HW2 = 41, - R_TILEGX_IMM16_X0_HW3 = 42, - R_TILEGX_IMM16_X1_HW3 = 43, - R_TILEGX_IMM16_X0_HW0_LAST = 44, - R_TILEGX_IMM16_X1_HW0_LAST = 45, - R_TILEGX_IMM16_X0_HW1_LAST = 46, - R_TILEGX_IMM16_X1_HW1_LAST = 47, - R_TILEGX_IMM16_X0_HW2_LAST = 48, - R_TILEGX_IMM16_X1_HW2_LAST = 49, - R_TILEGX_IMM16_X0_HW0_PCREL = 50, - R_TILEGX_IMM16_X1_HW0_PCREL = 51, - R_TILEGX_IMM16_X0_HW1_PCREL = 52, - 
R_TILEGX_IMM16_X1_HW1_PCREL = 53, - R_TILEGX_IMM16_X0_HW2_PCREL = 54, - R_TILEGX_IMM16_X1_HW2_PCREL = 55, - R_TILEGX_IMM16_X0_HW3_PCREL = 56, - R_TILEGX_IMM16_X1_HW3_PCREL = 57, - R_TILEGX_IMM16_X0_HW0_LAST_PCREL = 58, - R_TILEGX_IMM16_X1_HW0_LAST_PCREL = 59, - R_TILEGX_IMM16_X0_HW1_LAST_PCREL = 60, - R_TILEGX_IMM16_X1_HW1_LAST_PCREL = 61, - R_TILEGX_IMM16_X0_HW2_LAST_PCREL = 62, - R_TILEGX_IMM16_X1_HW2_LAST_PCREL = 63, - R_TILEGX_IMM16_X0_HW0_GOT = 64, - R_TILEGX_IMM16_X1_HW0_GOT = 65, - - R_TILEGX_IMM16_X0_HW0_PLT_PCREL = 66, - R_TILEGX_IMM16_X1_HW0_PLT_PCREL = 67, - R_TILEGX_IMM16_X0_HW1_PLT_PCREL = 68, - R_TILEGX_IMM16_X1_HW1_PLT_PCREL = 69, - R_TILEGX_IMM16_X0_HW2_PLT_PCREL = 70, - R_TILEGX_IMM16_X1_HW2_PLT_PCREL = 71, - - R_TILEGX_IMM16_X0_HW0_LAST_GOT = 72, - R_TILEGX_IMM16_X1_HW0_LAST_GOT = 73, - R_TILEGX_IMM16_X0_HW1_LAST_GOT = 74, - R_TILEGX_IMM16_X1_HW1_LAST_GOT = 75, - R_TILEGX_IMM16_X0_HW0_TLS_GD = 78, - R_TILEGX_IMM16_X1_HW0_TLS_GD = 79, - R_TILEGX_IMM16_X0_HW0_TLS_LE = 80, - R_TILEGX_IMM16_X1_HW0_TLS_LE = 81, - R_TILEGX_IMM16_X0_HW0_LAST_TLS_LE = 82, - R_TILEGX_IMM16_X1_HW0_LAST_TLS_LE = 83, - R_TILEGX_IMM16_X0_HW1_LAST_TLS_LE = 84, - R_TILEGX_IMM16_X1_HW1_LAST_TLS_LE = 85, - R_TILEGX_IMM16_X0_HW0_LAST_TLS_GD = 86, - R_TILEGX_IMM16_X1_HW0_LAST_TLS_GD = 87, - R_TILEGX_IMM16_X0_HW1_LAST_TLS_GD = 88, - R_TILEGX_IMM16_X1_HW1_LAST_TLS_GD = 89, - R_TILEGX_IMM16_X0_HW0_TLS_IE = 92, - R_TILEGX_IMM16_X1_HW0_TLS_IE = 93, - - R_TILEGX_IMM16_X0_HW0_LAST_PLT_PCREL = 94, - R_TILEGX_IMM16_X1_HW0_LAST_PLT_PCREL = 95, - R_TILEGX_IMM16_X0_HW1_LAST_PLT_PCREL = 96, - R_TILEGX_IMM16_X1_HW1_LAST_PLT_PCREL = 97, - R_TILEGX_IMM16_X0_HW2_LAST_PLT_PCREL = 98, - R_TILEGX_IMM16_X1_HW2_LAST_PLT_PCREL = 99, - - R_TILEGX_IMM16_X0_HW0_LAST_TLS_IE = 100, - R_TILEGX_IMM16_X1_HW0_LAST_TLS_IE = 101, - R_TILEGX_IMM16_X0_HW1_LAST_TLS_IE = 102, - R_TILEGX_IMM16_X1_HW1_LAST_TLS_IE = 103, - R_TILEGX_TLS_DTPMOD64 = 106, - R_TILEGX_TLS_DTPOFF64 = 107, - R_TILEGX_TLS_TPOFF64 = 108, - 
R_TILEGX_TLS_DTPMOD32 = 109, - R_TILEGX_TLS_DTPOFF32 = 110, - R_TILEGX_TLS_TPOFF32 = 111, - R_TILEGX_TLS_GD_CALL = 112, - R_TILEGX_IMM8_X0_TLS_GD_ADD = 113, - R_TILEGX_IMM8_X1_TLS_GD_ADD = 114, - R_TILEGX_IMM8_Y0_TLS_GD_ADD = 115, - R_TILEGX_IMM8_Y1_TLS_GD_ADD = 116, - R_TILEGX_TLS_IE_LOAD = 117, - R_TILEGX_IMM8_X0_TLS_ADD = 118, - R_TILEGX_IMM8_X1_TLS_ADD = 119, - R_TILEGX_IMM8_Y0_TLS_ADD = 120, - R_TILEGX_IMM8_Y1_TLS_ADD = 121, - R_TILEGX_GNU_VTINHERIT = 128, - R_TILEGX_GNU_VTENTRY = 129, - R_TILEGX_IRELATIVE = 130, - R_TILEGX_NUM = 131 -}; - -typedef enum -{ - TILEGX_PIPELINE_X0, - TILEGX_PIPELINE_X1, - TILEGX_PIPELINE_Y0, - TILEGX_PIPELINE_Y1, - TILEGX_PIPELINE_Y2, -} tilegx_pipeline; - -typedef unsigned long long tilegx_bundle_bits; - -/* These are the bits that determine if a bundle is in the X encoding. */ -#define TILEGX_BUNDLE_MODE_MASK ((tilegx_bundle_bits)3 << 62) - -enum -{ - /* Maximum number of instructions in a bundle (2 for X, 3 for Y). */ - TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE = 3, - - /* How many different pipeline encodings are there? X0, X1, Y0, Y1, Y2. */ - TILEGX_NUM_PIPELINE_ENCODINGS = 5, - - /* Log base 2 of TILEGX_BUNDLE_SIZE_IN_BYTES. */ - TILEGX_LOG2_BUNDLE_SIZE_IN_BYTES = 3, - - /* Instructions take this many bytes. */ - TILEGX_BUNDLE_SIZE_IN_BYTES = 1 << TILEGX_LOG2_BUNDLE_SIZE_IN_BYTES, - - /* Log base 2 of TILEGX_BUNDLE_ALIGNMENT_IN_BYTES. */ - TILEGX_LOG2_BUNDLE_ALIGNMENT_IN_BYTES = 3, - - /* Bundles should be aligned modulo this number of bytes. */ - TILEGX_BUNDLE_ALIGNMENT_IN_BYTES = - (1 << TILEGX_LOG2_BUNDLE_ALIGNMENT_IN_BYTES), - - /* Number of registers (some are magic, such as network I/O). */ - TILEGX_NUM_REGISTERS = 64, -}; - -/* Make a few "tile_" variables to simplify common code between - architectures. 
*/ - -typedef tilegx_bundle_bits tile_bundle_bits; -#define TILE_BUNDLE_SIZE_IN_BYTES TILEGX_BUNDLE_SIZE_IN_BYTES -#define TILE_BUNDLE_ALIGNMENT_IN_BYTES TILEGX_BUNDLE_ALIGNMENT_IN_BYTES -#define TILE_LOG2_BUNDLE_ALIGNMENT_IN_BYTES \ - TILEGX_LOG2_BUNDLE_ALIGNMENT_IN_BYTES - -/* 64-bit pattern for a { bpt ; nop } bundle. */ -#define TILEGX_BPT_BUNDLE 0x286a44ae51485000ULL - -typedef enum -{ - TILEGX_OP_TYPE_REGISTER, - TILEGX_OP_TYPE_IMMEDIATE, - TILEGX_OP_TYPE_ADDRESS, - TILEGX_OP_TYPE_SPR -} tilegx_operand_type; - -struct tilegx_operand -{ - /* Is this operand a register, immediate or address? */ - tilegx_operand_type type; - - /* The default relocation type for this operand. */ - signed int default_reloc : 16; - - /* How many bits is this value? (used for range checking) */ - unsigned int num_bits : 5; - - /* Is the value signed? (used for range checking) */ - unsigned int is_signed : 1; - - /* Is this operand a source register? */ - unsigned int is_src_reg : 1; - - /* Is this operand written? (i.e. is it a destination register) */ - unsigned int is_dest_reg : 1; - - /* Is this operand PC-relative? */ - unsigned int is_pc_relative : 1; - - /* By how many bits do we right shift the value before inserting? */ - unsigned int rightshift : 2; - - /* Return the bits for this operand to be ORed into an existing bundle. */ - tilegx_bundle_bits (*insert) (int op); - - /* Extract this operand and return it. 
*/ - unsigned int (*extract) (tilegx_bundle_bits bundle); -}; - -typedef enum -{ - TILEGX_OPC_BPT, - TILEGX_OPC_INFO, - TILEGX_OPC_INFOL, - TILEGX_OPC_LD4S_TLS, - TILEGX_OPC_LD_TLS, - TILEGX_OPC_MOVE, - TILEGX_OPC_MOVEI, - TILEGX_OPC_MOVELI, - TILEGX_OPC_PREFETCH, - TILEGX_OPC_PREFETCH_ADD_L1, - TILEGX_OPC_PREFETCH_ADD_L1_FAULT, - TILEGX_OPC_PREFETCH_ADD_L2, - TILEGX_OPC_PREFETCH_ADD_L2_FAULT, - TILEGX_OPC_PREFETCH_ADD_L3, - TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - TILEGX_OPC_PREFETCH_L1, - TILEGX_OPC_PREFETCH_L1_FAULT, - TILEGX_OPC_PREFETCH_L2, - TILEGX_OPC_PREFETCH_L2_FAULT, - TILEGX_OPC_PREFETCH_L3, - TILEGX_OPC_PREFETCH_L3_FAULT, - TILEGX_OPC_RAISE, - TILEGX_OPC_ADD, - TILEGX_OPC_ADDI, - TILEGX_OPC_ADDLI, - TILEGX_OPC_ADDX, - TILEGX_OPC_ADDXI, - TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXSC, - TILEGX_OPC_AND, - TILEGX_OPC_ANDI, - TILEGX_OPC_BEQZ, - TILEGX_OPC_BEQZT, - TILEGX_OPC_BFEXTS, - TILEGX_OPC_BFEXTU, - TILEGX_OPC_BFINS, - TILEGX_OPC_BGEZ, - TILEGX_OPC_BGEZT, - TILEGX_OPC_BGTZ, - TILEGX_OPC_BGTZT, - TILEGX_OPC_BLBC, - TILEGX_OPC_BLBCT, - TILEGX_OPC_BLBS, - TILEGX_OPC_BLBST, - TILEGX_OPC_BLEZ, - TILEGX_OPC_BLEZT, - TILEGX_OPC_BLTZ, - TILEGX_OPC_BLTZT, - TILEGX_OPC_BNEZ, - TILEGX_OPC_BNEZT, - TILEGX_OPC_CLZ, - TILEGX_OPC_CMOVEQZ, - TILEGX_OPC_CMOVNEZ, - TILEGX_OPC_CMPEQ, - TILEGX_OPC_CMPEQI, - TILEGX_OPC_CMPEXCH, - TILEGX_OPC_CMPEXCH4, - TILEGX_OPC_CMPLES, - TILEGX_OPC_CMPLEU, - TILEGX_OPC_CMPLTS, - TILEGX_OPC_CMPLTSI, - TILEGX_OPC_CMPLTU, - TILEGX_OPC_CMPLTUI, - TILEGX_OPC_CMPNE, - TILEGX_OPC_CMUL, - TILEGX_OPC_CMULA, - TILEGX_OPC_CMULAF, - TILEGX_OPC_CMULF, - TILEGX_OPC_CMULFR, - TILEGX_OPC_CMULH, - TILEGX_OPC_CMULHR, - TILEGX_OPC_CRC32_32, - TILEGX_OPC_CRC32_8, - TILEGX_OPC_CTZ, - TILEGX_OPC_DBLALIGN, - TILEGX_OPC_DBLALIGN2, - TILEGX_OPC_DBLALIGN4, - TILEGX_OPC_DBLALIGN6, - TILEGX_OPC_DRAIN, - TILEGX_OPC_DTLBPR, - TILEGX_OPC_EXCH, - TILEGX_OPC_EXCH4, - TILEGX_OPC_FDOUBLE_ADD_FLAGS, - TILEGX_OPC_FDOUBLE_ADDSUB, - TILEGX_OPC_FDOUBLE_MUL_FLAGS, - 
TILEGX_OPC_FDOUBLE_PACK1, - TILEGX_OPC_FDOUBLE_PACK2, - TILEGX_OPC_FDOUBLE_SUB_FLAGS, - TILEGX_OPC_FDOUBLE_UNPACK_MAX, - TILEGX_OPC_FDOUBLE_UNPACK_MIN, - TILEGX_OPC_FETCHADD, - TILEGX_OPC_FETCHADD4, - TILEGX_OPC_FETCHADDGEZ, - TILEGX_OPC_FETCHADDGEZ4, - TILEGX_OPC_FETCHAND, - TILEGX_OPC_FETCHAND4, - TILEGX_OPC_FETCHOR, - TILEGX_OPC_FETCHOR4, - TILEGX_OPC_FINV, - TILEGX_OPC_FLUSH, - TILEGX_OPC_FLUSHWB, - TILEGX_OPC_FNOP, - TILEGX_OPC_FSINGLE_ADD1, - TILEGX_OPC_FSINGLE_ADDSUB2, - TILEGX_OPC_FSINGLE_MUL1, - TILEGX_OPC_FSINGLE_MUL2, - TILEGX_OPC_FSINGLE_PACK1, - TILEGX_OPC_FSINGLE_PACK2, - TILEGX_OPC_FSINGLE_SUB1, - TILEGX_OPC_ICOH, - TILEGX_OPC_ILL, - TILEGX_OPC_INV, - TILEGX_OPC_IRET, - TILEGX_OPC_J, - TILEGX_OPC_JAL, - TILEGX_OPC_JALR, - TILEGX_OPC_JALRP, - TILEGX_OPC_JR, - TILEGX_OPC_JRP, - TILEGX_OPC_LD, - TILEGX_OPC_LD1S, - TILEGX_OPC_LD1S_ADD, - TILEGX_OPC_LD1U, - TILEGX_OPC_LD1U_ADD, - TILEGX_OPC_LD2S, - TILEGX_OPC_LD2S_ADD, - TILEGX_OPC_LD2U, - TILEGX_OPC_LD2U_ADD, - TILEGX_OPC_LD4S, - TILEGX_OPC_LD4S_ADD, - TILEGX_OPC_LD4U, - TILEGX_OPC_LD4U_ADD, - TILEGX_OPC_LD_ADD, - TILEGX_OPC_LDNA, - TILEGX_OPC_LDNA_ADD, - TILEGX_OPC_LDNT, - TILEGX_OPC_LDNT1S, - TILEGX_OPC_LDNT1S_ADD, - TILEGX_OPC_LDNT1U, - TILEGX_OPC_LDNT1U_ADD, - TILEGX_OPC_LDNT2S, - TILEGX_OPC_LDNT2S_ADD, - TILEGX_OPC_LDNT2U, - TILEGX_OPC_LDNT2U_ADD, - TILEGX_OPC_LDNT4S, - TILEGX_OPC_LDNT4S_ADD, - TILEGX_OPC_LDNT4U, - TILEGX_OPC_LDNT4U_ADD, - TILEGX_OPC_LDNT_ADD, - TILEGX_OPC_LNK, - TILEGX_OPC_MF, - TILEGX_OPC_MFSPR, - TILEGX_OPC_MM, - TILEGX_OPC_MNZ, - TILEGX_OPC_MTSPR, - TILEGX_OPC_MUL_HS_HS, - TILEGX_OPC_MUL_HS_HU, - TILEGX_OPC_MUL_HS_LS, - TILEGX_OPC_MUL_HS_LU, - TILEGX_OPC_MUL_HU_HU, - TILEGX_OPC_MUL_HU_LS, - TILEGX_OPC_MUL_HU_LU, - TILEGX_OPC_MUL_LS_LS, - TILEGX_OPC_MUL_LS_LU, - TILEGX_OPC_MUL_LU_LU, - TILEGX_OPC_MULA_HS_HS, - TILEGX_OPC_MULA_HS_HU, - TILEGX_OPC_MULA_HS_LS, - TILEGX_OPC_MULA_HS_LU, - TILEGX_OPC_MULA_HU_HU, - TILEGX_OPC_MULA_HU_LS, - TILEGX_OPC_MULA_HU_LU, - TILEGX_OPC_MULA_LS_LS, 
- TILEGX_OPC_MULA_LS_LU, - TILEGX_OPC_MULA_LU_LU, - TILEGX_OPC_MULAX, - TILEGX_OPC_MULX, - TILEGX_OPC_MZ, - TILEGX_OPC_NAP, - TILEGX_OPC_NOP, - TILEGX_OPC_NOR, - TILEGX_OPC_OR, - TILEGX_OPC_ORI, - TILEGX_OPC_PCNT, - TILEGX_OPC_REVBITS, - TILEGX_OPC_REVBYTES, - TILEGX_OPC_ROTL, - TILEGX_OPC_ROTLI, - TILEGX_OPC_SHL, - TILEGX_OPC_SHL16INSLI, - TILEGX_OPC_SHL1ADD, - TILEGX_OPC_SHL1ADDX, - TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL2ADDX, - TILEGX_OPC_SHL3ADD, - TILEGX_OPC_SHL3ADDX, - TILEGX_OPC_SHLI, - TILEGX_OPC_SHLX, - TILEGX_OPC_SHLXI, - TILEGX_OPC_SHRS, - TILEGX_OPC_SHRSI, - TILEGX_OPC_SHRU, - TILEGX_OPC_SHRUI, - TILEGX_OPC_SHRUX, - TILEGX_OPC_SHRUXI, - TILEGX_OPC_SHUFFLEBYTES, - TILEGX_OPC_ST, - TILEGX_OPC_ST1, - TILEGX_OPC_ST1_ADD, - TILEGX_OPC_ST2, - TILEGX_OPC_ST2_ADD, - TILEGX_OPC_ST4, - TILEGX_OPC_ST4_ADD, - TILEGX_OPC_ST_ADD, - TILEGX_OPC_STNT, - TILEGX_OPC_STNT1, - TILEGX_OPC_STNT1_ADD, - TILEGX_OPC_STNT2, - TILEGX_OPC_STNT2_ADD, - TILEGX_OPC_STNT4, - TILEGX_OPC_STNT4_ADD, - TILEGX_OPC_STNT_ADD, - TILEGX_OPC_SUB, - TILEGX_OPC_SUBX, - TILEGX_OPC_SUBXSC, - TILEGX_OPC_SWINT0, - TILEGX_OPC_SWINT1, - TILEGX_OPC_SWINT2, - TILEGX_OPC_SWINT3, - TILEGX_OPC_TBLIDXB0, - TILEGX_OPC_TBLIDXB1, - TILEGX_OPC_TBLIDXB2, - TILEGX_OPC_TBLIDXB3, - TILEGX_OPC_V1ADD, - TILEGX_OPC_V1ADDI, - TILEGX_OPC_V1ADDUC, - TILEGX_OPC_V1ADIFFU, - TILEGX_OPC_V1AVGU, - TILEGX_OPC_V1CMPEQ, - TILEGX_OPC_V1CMPEQI, - TILEGX_OPC_V1CMPLES, - TILEGX_OPC_V1CMPLEU, - TILEGX_OPC_V1CMPLTS, - TILEGX_OPC_V1CMPLTSI, - TILEGX_OPC_V1CMPLTU, - TILEGX_OPC_V1CMPLTUI, - TILEGX_OPC_V1CMPNE, - TILEGX_OPC_V1DDOTPU, - TILEGX_OPC_V1DDOTPUA, - TILEGX_OPC_V1DDOTPUS, - TILEGX_OPC_V1DDOTPUSA, - TILEGX_OPC_V1DOTP, - TILEGX_OPC_V1DOTPA, - TILEGX_OPC_V1DOTPU, - TILEGX_OPC_V1DOTPUA, - TILEGX_OPC_V1DOTPUS, - TILEGX_OPC_V1DOTPUSA, - TILEGX_OPC_V1INT_H, - TILEGX_OPC_V1INT_L, - TILEGX_OPC_V1MAXU, - TILEGX_OPC_V1MAXUI, - TILEGX_OPC_V1MINU, - TILEGX_OPC_V1MINUI, - TILEGX_OPC_V1MNZ, - TILEGX_OPC_V1MULTU, - TILEGX_OPC_V1MULU, - 
TILEGX_OPC_V1MULUS, - TILEGX_OPC_V1MZ, - TILEGX_OPC_V1SADAU, - TILEGX_OPC_V1SADU, - TILEGX_OPC_V1SHL, - TILEGX_OPC_V1SHLI, - TILEGX_OPC_V1SHRS, - TILEGX_OPC_V1SHRSI, - TILEGX_OPC_V1SHRU, - TILEGX_OPC_V1SHRUI, - TILEGX_OPC_V1SUB, - TILEGX_OPC_V1SUBUC, - TILEGX_OPC_V2ADD, - TILEGX_OPC_V2ADDI, - TILEGX_OPC_V2ADDSC, - TILEGX_OPC_V2ADIFFS, - TILEGX_OPC_V2AVGS, - TILEGX_OPC_V2CMPEQ, - TILEGX_OPC_V2CMPEQI, - TILEGX_OPC_V2CMPLES, - TILEGX_OPC_V2CMPLEU, - TILEGX_OPC_V2CMPLTS, - TILEGX_OPC_V2CMPLTSI, - TILEGX_OPC_V2CMPLTU, - TILEGX_OPC_V2CMPLTUI, - TILEGX_OPC_V2CMPNE, - TILEGX_OPC_V2DOTP, - TILEGX_OPC_V2DOTPA, - TILEGX_OPC_V2INT_H, - TILEGX_OPC_V2INT_L, - TILEGX_OPC_V2MAXS, - TILEGX_OPC_V2MAXSI, - TILEGX_OPC_V2MINS, - TILEGX_OPC_V2MINSI, - TILEGX_OPC_V2MNZ, - TILEGX_OPC_V2MULFSC, - TILEGX_OPC_V2MULS, - TILEGX_OPC_V2MULTS, - TILEGX_OPC_V2MZ, - TILEGX_OPC_V2PACKH, - TILEGX_OPC_V2PACKL, - TILEGX_OPC_V2PACKUC, - TILEGX_OPC_V2SADAS, - TILEGX_OPC_V2SADAU, - TILEGX_OPC_V2SADS, - TILEGX_OPC_V2SADU, - TILEGX_OPC_V2SHL, - TILEGX_OPC_V2SHLI, - TILEGX_OPC_V2SHLSC, - TILEGX_OPC_V2SHRS, - TILEGX_OPC_V2SHRSI, - TILEGX_OPC_V2SHRU, - TILEGX_OPC_V2SHRUI, - TILEGX_OPC_V2SUB, - TILEGX_OPC_V2SUBSC, - TILEGX_OPC_V4ADD, - TILEGX_OPC_V4ADDSC, - TILEGX_OPC_V4INT_H, - TILEGX_OPC_V4INT_L, - TILEGX_OPC_V4PACKSC, - TILEGX_OPC_V4SHL, - TILEGX_OPC_V4SHLSC, - TILEGX_OPC_V4SHRS, - TILEGX_OPC_V4SHRU, - TILEGX_OPC_V4SUB, - TILEGX_OPC_V4SUBSC, - TILEGX_OPC_WH64, - TILEGX_OPC_XOR, - TILEGX_OPC_XORI, - TILEGX_OPC_NONE -} tilegx_mnemonic; - -enum -{ - TILEGX_MAX_OPERANDS = 4 /* bfexts */ -}; - -struct tilegx_opcode -{ - /* The opcode mnemonic, e.g. "add" */ - const char *name; - - /* The enum value for this mnemonic. */ - tilegx_mnemonic mnemonic; - - /* A bit mask of which of the five pipes this instruction - is compatible with: - X0 0x01 - X1 0x02 - Y0 0x04 - Y1 0x08 - Y2 0x10 */ - unsigned char pipes; - - /* How many operands are there? 
*/ - unsigned char num_operands; - - /* Which register does this write implicitly, or TREG_ZERO if none? */ - unsigned char implicitly_written_register; - - /* Can this be bundled with other instructions (almost always true). */ - unsigned char can_bundle; - - /* The description of the operands. Each of these is an - * index into the tilegx_operands[] table. */ - unsigned char operands[TILEGX_NUM_PIPELINE_ENCODINGS][TILEGX_MAX_OPERANDS]; - - /* A mask of which bits have predefined values for each pipeline. - * This is useful for disassembly. */ - tilegx_bundle_bits fixed_bit_masks[TILEGX_NUM_PIPELINE_ENCODINGS]; - - /* For each bit set in fixed_bit_masks, what the value is for this - * instruction. */ - tilegx_bundle_bits fixed_bit_values[TILEGX_NUM_PIPELINE_ENCODINGS]; -}; - -/* Used for non-textual disassembly into structs. */ -struct tilegx_decoded_instruction -{ - const struct tilegx_opcode *opcode; - const struct tilegx_operand *operands[TILEGX_MAX_OPERANDS]; - long long operand_values[TILEGX_MAX_OPERANDS]; -}; - -enum -{ - ADDI_IMM8_OPCODE_X0 = 1, - ADDI_IMM8_OPCODE_X1 = 1, - ADDI_OPCODE_Y0 = 0, - ADDI_OPCODE_Y1 = 1, - ADDLI_OPCODE_X0 = 1, - ADDLI_OPCODE_X1 = 0, - ADDXI_IMM8_OPCODE_X0 = 2, - ADDXI_IMM8_OPCODE_X1 = 2, - ADDXI_OPCODE_Y0 = 1, - ADDXI_OPCODE_Y1 = 2, - ADDXLI_OPCODE_X0 = 2, - ADDXLI_OPCODE_X1 = 1, - ADDXSC_RRR_0_OPCODE_X0 = 1, - ADDXSC_RRR_0_OPCODE_X1 = 1, - ADDX_RRR_0_OPCODE_X0 = 2, - ADDX_RRR_0_OPCODE_X1 = 2, - ADDX_RRR_0_OPCODE_Y0 = 0, - ADDX_SPECIAL_0_OPCODE_Y1 = 0, - ADD_RRR_0_OPCODE_X0 = 3, - ADD_RRR_0_OPCODE_X1 = 3, - ADD_RRR_0_OPCODE_Y0 = 1, - ADD_SPECIAL_0_OPCODE_Y1 = 1, - ANDI_IMM8_OPCODE_X0 = 3, - ANDI_IMM8_OPCODE_X1 = 3, - ANDI_OPCODE_Y0 = 2, - ANDI_OPCODE_Y1 = 3, - AND_RRR_0_OPCODE_X0 = 4, - AND_RRR_0_OPCODE_X1 = 4, - AND_RRR_5_OPCODE_Y0 = 0, - AND_RRR_5_OPCODE_Y1 = 0, - BEQZT_BRANCH_OPCODE_X1 = 16, - BEQZ_BRANCH_OPCODE_X1 = 17, - BFEXTS_BF_OPCODE_X0 = 4, - BFEXTU_BF_OPCODE_X0 = 5, - BFINS_BF_OPCODE_X0 = 6, - BF_OPCODE_X0 = 3, - 
BGEZT_BRANCH_OPCODE_X1 = 18, - BGEZ_BRANCH_OPCODE_X1 = 19, - BGTZT_BRANCH_OPCODE_X1 = 20, - BGTZ_BRANCH_OPCODE_X1 = 21, - BLBCT_BRANCH_OPCODE_X1 = 22, - BLBC_BRANCH_OPCODE_X1 = 23, - BLBST_BRANCH_OPCODE_X1 = 24, - BLBS_BRANCH_OPCODE_X1 = 25, - BLEZT_BRANCH_OPCODE_X1 = 26, - BLEZ_BRANCH_OPCODE_X1 = 27, - BLTZT_BRANCH_OPCODE_X1 = 28, - BLTZ_BRANCH_OPCODE_X1 = 29, - BNEZT_BRANCH_OPCODE_X1 = 30, - BNEZ_BRANCH_OPCODE_X1 = 31, - BRANCH_OPCODE_X1 = 2, - CMOVEQZ_RRR_0_OPCODE_X0 = 5, - CMOVEQZ_RRR_4_OPCODE_Y0 = 0, - CMOVNEZ_RRR_0_OPCODE_X0 = 6, - CMOVNEZ_RRR_4_OPCODE_Y0 = 1, - CMPEQI_IMM8_OPCODE_X0 = 4, - CMPEQI_IMM8_OPCODE_X1 = 4, - CMPEQI_OPCODE_Y0 = 3, - CMPEQI_OPCODE_Y1 = 4, - CMPEQ_RRR_0_OPCODE_X0 = 7, - CMPEQ_RRR_0_OPCODE_X1 = 5, - CMPEQ_RRR_3_OPCODE_Y0 = 0, - CMPEQ_RRR_3_OPCODE_Y1 = 2, - CMPEXCH4_RRR_0_OPCODE_X1 = 6, - CMPEXCH_RRR_0_OPCODE_X1 = 7, - CMPLES_RRR_0_OPCODE_X0 = 8, - CMPLES_RRR_0_OPCODE_X1 = 8, - CMPLES_RRR_2_OPCODE_Y0 = 0, - CMPLES_RRR_2_OPCODE_Y1 = 0, - CMPLEU_RRR_0_OPCODE_X0 = 9, - CMPLEU_RRR_0_OPCODE_X1 = 9, - CMPLEU_RRR_2_OPCODE_Y0 = 1, - CMPLEU_RRR_2_OPCODE_Y1 = 1, - CMPLTSI_IMM8_OPCODE_X0 = 5, - CMPLTSI_IMM8_OPCODE_X1 = 5, - CMPLTSI_OPCODE_Y0 = 4, - CMPLTSI_OPCODE_Y1 = 5, - CMPLTS_RRR_0_OPCODE_X0 = 10, - CMPLTS_RRR_0_OPCODE_X1 = 10, - CMPLTS_RRR_2_OPCODE_Y0 = 2, - CMPLTS_RRR_2_OPCODE_Y1 = 2, - CMPLTUI_IMM8_OPCODE_X0 = 6, - CMPLTUI_IMM8_OPCODE_X1 = 6, - CMPLTU_RRR_0_OPCODE_X0 = 11, - CMPLTU_RRR_0_OPCODE_X1 = 11, - CMPLTU_RRR_2_OPCODE_Y0 = 3, - CMPLTU_RRR_2_OPCODE_Y1 = 3, - CMPNE_RRR_0_OPCODE_X0 = 12, - CMPNE_RRR_0_OPCODE_X1 = 12, - CMPNE_RRR_3_OPCODE_Y0 = 1, - CMPNE_RRR_3_OPCODE_Y1 = 3, - CMULAF_RRR_0_OPCODE_X0 = 13, - CMULA_RRR_0_OPCODE_X0 = 14, - CMULFR_RRR_0_OPCODE_X0 = 15, - CMULF_RRR_0_OPCODE_X0 = 16, - CMULHR_RRR_0_OPCODE_X0 = 17, - CMULH_RRR_0_OPCODE_X0 = 18, - CMUL_RRR_0_OPCODE_X0 = 19, - CNTLZ_UNARY_OPCODE_X0 = 1, - CNTLZ_UNARY_OPCODE_Y0 = 1, - CNTTZ_UNARY_OPCODE_X0 = 2, - CNTTZ_UNARY_OPCODE_Y0 = 2, - CRC32_32_RRR_0_OPCODE_X0 = 20, - 
CRC32_8_RRR_0_OPCODE_X0 = 21, - DBLALIGN2_RRR_0_OPCODE_X0 = 22, - DBLALIGN2_RRR_0_OPCODE_X1 = 13, - DBLALIGN4_RRR_0_OPCODE_X0 = 23, - DBLALIGN4_RRR_0_OPCODE_X1 = 14, - DBLALIGN6_RRR_0_OPCODE_X0 = 24, - DBLALIGN6_RRR_0_OPCODE_X1 = 15, - DBLALIGN_RRR_0_OPCODE_X0 = 25, - DRAIN_UNARY_OPCODE_X1 = 1, - DTLBPR_UNARY_OPCODE_X1 = 2, - EXCH4_RRR_0_OPCODE_X1 = 16, - EXCH_RRR_0_OPCODE_X1 = 17, - FDOUBLE_ADDSUB_RRR_0_OPCODE_X0 = 26, - FDOUBLE_ADD_FLAGS_RRR_0_OPCODE_X0 = 27, - FDOUBLE_MUL_FLAGS_RRR_0_OPCODE_X0 = 28, - FDOUBLE_PACK1_RRR_0_OPCODE_X0 = 29, - FDOUBLE_PACK2_RRR_0_OPCODE_X0 = 30, - FDOUBLE_SUB_FLAGS_RRR_0_OPCODE_X0 = 31, - FDOUBLE_UNPACK_MAX_RRR_0_OPCODE_X0 = 32, - FDOUBLE_UNPACK_MIN_RRR_0_OPCODE_X0 = 33, - FETCHADD4_RRR_0_OPCODE_X1 = 18, - FETCHADDGEZ4_RRR_0_OPCODE_X1 = 19, - FETCHADDGEZ_RRR_0_OPCODE_X1 = 20, - FETCHADD_RRR_0_OPCODE_X1 = 21, - FETCHAND4_RRR_0_OPCODE_X1 = 22, - FETCHAND_RRR_0_OPCODE_X1 = 23, - FETCHOR4_RRR_0_OPCODE_X1 = 24, - FETCHOR_RRR_0_OPCODE_X1 = 25, - FINV_UNARY_OPCODE_X1 = 3, - FLUSHWB_UNARY_OPCODE_X1 = 4, - FLUSH_UNARY_OPCODE_X1 = 5, - FNOP_UNARY_OPCODE_X0 = 3, - FNOP_UNARY_OPCODE_X1 = 6, - FNOP_UNARY_OPCODE_Y0 = 3, - FNOP_UNARY_OPCODE_Y1 = 8, - FSINGLE_ADD1_RRR_0_OPCODE_X0 = 34, - FSINGLE_ADDSUB2_RRR_0_OPCODE_X0 = 35, - FSINGLE_MUL1_RRR_0_OPCODE_X0 = 36, - FSINGLE_MUL2_RRR_0_OPCODE_X0 = 37, - FSINGLE_PACK1_UNARY_OPCODE_X0 = 4, - FSINGLE_PACK1_UNARY_OPCODE_Y0 = 4, - FSINGLE_PACK2_RRR_0_OPCODE_X0 = 38, - FSINGLE_SUB1_RRR_0_OPCODE_X0 = 39, - ICOH_UNARY_OPCODE_X1 = 7, - ILL_UNARY_OPCODE_X1 = 8, - ILL_UNARY_OPCODE_Y1 = 9, - IMM8_OPCODE_X0 = 4, - IMM8_OPCODE_X1 = 3, - INV_UNARY_OPCODE_X1 = 9, - IRET_UNARY_OPCODE_X1 = 10, - JALRP_UNARY_OPCODE_X1 = 11, - JALRP_UNARY_OPCODE_Y1 = 10, - JALR_UNARY_OPCODE_X1 = 12, - JALR_UNARY_OPCODE_Y1 = 11, - JAL_JUMP_OPCODE_X1 = 0, - JRP_UNARY_OPCODE_X1 = 13, - JRP_UNARY_OPCODE_Y1 = 12, - JR_UNARY_OPCODE_X1 = 14, - JR_UNARY_OPCODE_Y1 = 13, - JUMP_OPCODE_X1 = 4, - J_JUMP_OPCODE_X1 = 1, - LD1S_ADD_IMM8_OPCODE_X1 = 7, - 
LD1S_OPCODE_Y2 = 0, - LD1S_UNARY_OPCODE_X1 = 15, - LD1U_ADD_IMM8_OPCODE_X1 = 8, - LD1U_OPCODE_Y2 = 1, - LD1U_UNARY_OPCODE_X1 = 16, - LD2S_ADD_IMM8_OPCODE_X1 = 9, - LD2S_OPCODE_Y2 = 2, - LD2S_UNARY_OPCODE_X1 = 17, - LD2U_ADD_IMM8_OPCODE_X1 = 10, - LD2U_OPCODE_Y2 = 3, - LD2U_UNARY_OPCODE_X1 = 18, - LD4S_ADD_IMM8_OPCODE_X1 = 11, - LD4S_OPCODE_Y2 = 1, - LD4S_UNARY_OPCODE_X1 = 19, - LD4U_ADD_IMM8_OPCODE_X1 = 12, - LD4U_OPCODE_Y2 = 2, - LD4U_UNARY_OPCODE_X1 = 20, - LDNA_UNARY_OPCODE_X1 = 21, - LDNT1S_ADD_IMM8_OPCODE_X1 = 13, - LDNT1S_UNARY_OPCODE_X1 = 22, - LDNT1U_ADD_IMM8_OPCODE_X1 = 14, - LDNT1U_UNARY_OPCODE_X1 = 23, - LDNT2S_ADD_IMM8_OPCODE_X1 = 15, - LDNT2S_UNARY_OPCODE_X1 = 24, - LDNT2U_ADD_IMM8_OPCODE_X1 = 16, - LDNT2U_UNARY_OPCODE_X1 = 25, - LDNT4S_ADD_IMM8_OPCODE_X1 = 17, - LDNT4S_UNARY_OPCODE_X1 = 26, - LDNT4U_ADD_IMM8_OPCODE_X1 = 18, - LDNT4U_UNARY_OPCODE_X1 = 27, - LDNT_ADD_IMM8_OPCODE_X1 = 19, - LDNT_UNARY_OPCODE_X1 = 28, - LD_ADD_IMM8_OPCODE_X1 = 20, - LD_OPCODE_Y2 = 3, - LD_UNARY_OPCODE_X1 = 29, - LNK_UNARY_OPCODE_X1 = 30, - LNK_UNARY_OPCODE_Y1 = 14, - LWNA_ADD_IMM8_OPCODE_X1 = 21, - MFSPR_IMM8_OPCODE_X1 = 22, - MF_UNARY_OPCODE_X1 = 31, - MM_BF_OPCODE_X0 = 7, - MNZ_RRR_0_OPCODE_X0 = 40, - MNZ_RRR_0_OPCODE_X1 = 26, - MNZ_RRR_4_OPCODE_Y0 = 2, - MNZ_RRR_4_OPCODE_Y1 = 2, - MODE_OPCODE_YA2 = 1, - MODE_OPCODE_YB2 = 2, - MODE_OPCODE_YC2 = 3, - MTSPR_IMM8_OPCODE_X1 = 23, - MULAX_RRR_0_OPCODE_X0 = 41, - MULAX_RRR_3_OPCODE_Y0 = 2, - MULA_HS_HS_RRR_0_OPCODE_X0 = 42, - MULA_HS_HS_RRR_9_OPCODE_Y0 = 0, - MULA_HS_HU_RRR_0_OPCODE_X0 = 43, - MULA_HS_LS_RRR_0_OPCODE_X0 = 44, - MULA_HS_LU_RRR_0_OPCODE_X0 = 45, - MULA_HU_HU_RRR_0_OPCODE_X0 = 46, - MULA_HU_HU_RRR_9_OPCODE_Y0 = 1, - MULA_HU_LS_RRR_0_OPCODE_X0 = 47, - MULA_HU_LU_RRR_0_OPCODE_X0 = 48, - MULA_LS_LS_RRR_0_OPCODE_X0 = 49, - MULA_LS_LS_RRR_9_OPCODE_Y0 = 2, - MULA_LS_LU_RRR_0_OPCODE_X0 = 50, - MULA_LU_LU_RRR_0_OPCODE_X0 = 51, - MULA_LU_LU_RRR_9_OPCODE_Y0 = 3, - MULX_RRR_0_OPCODE_X0 = 52, - MULX_RRR_3_OPCODE_Y0 = 3, - 
MUL_HS_HS_RRR_0_OPCODE_X0 = 53, - MUL_HS_HS_RRR_8_OPCODE_Y0 = 0, - MUL_HS_HU_RRR_0_OPCODE_X0 = 54, - MUL_HS_LS_RRR_0_OPCODE_X0 = 55, - MUL_HS_LU_RRR_0_OPCODE_X0 = 56, - MUL_HU_HU_RRR_0_OPCODE_X0 = 57, - MUL_HU_HU_RRR_8_OPCODE_Y0 = 1, - MUL_HU_LS_RRR_0_OPCODE_X0 = 58, - MUL_HU_LU_RRR_0_OPCODE_X0 = 59, - MUL_LS_LS_RRR_0_OPCODE_X0 = 60, - MUL_LS_LS_RRR_8_OPCODE_Y0 = 2, - MUL_LS_LU_RRR_0_OPCODE_X0 = 61, - MUL_LU_LU_RRR_0_OPCODE_X0 = 62, - MUL_LU_LU_RRR_8_OPCODE_Y0 = 3, - MZ_RRR_0_OPCODE_X0 = 63, - MZ_RRR_0_OPCODE_X1 = 27, - MZ_RRR_4_OPCODE_Y0 = 3, - MZ_RRR_4_OPCODE_Y1 = 3, - NAP_UNARY_OPCODE_X1 = 32, - NOP_UNARY_OPCODE_X0 = 5, - NOP_UNARY_OPCODE_X1 = 33, - NOP_UNARY_OPCODE_Y0 = 5, - NOP_UNARY_OPCODE_Y1 = 15, - NOR_RRR_0_OPCODE_X0 = 64, - NOR_RRR_0_OPCODE_X1 = 28, - NOR_RRR_5_OPCODE_Y0 = 1, - NOR_RRR_5_OPCODE_Y1 = 1, - ORI_IMM8_OPCODE_X0 = 7, - ORI_IMM8_OPCODE_X1 = 24, - OR_RRR_0_OPCODE_X0 = 65, - OR_RRR_0_OPCODE_X1 = 29, - OR_RRR_5_OPCODE_Y0 = 2, - OR_RRR_5_OPCODE_Y1 = 2, - PCNT_UNARY_OPCODE_X0 = 6, - PCNT_UNARY_OPCODE_Y0 = 6, - REVBITS_UNARY_OPCODE_X0 = 7, - REVBITS_UNARY_OPCODE_Y0 = 7, - REVBYTES_UNARY_OPCODE_X0 = 8, - REVBYTES_UNARY_OPCODE_Y0 = 8, - ROTLI_SHIFT_OPCODE_X0 = 1, - ROTLI_SHIFT_OPCODE_X1 = 1, - ROTLI_SHIFT_OPCODE_Y0 = 0, - ROTLI_SHIFT_OPCODE_Y1 = 0, - ROTL_RRR_0_OPCODE_X0 = 66, - ROTL_RRR_0_OPCODE_X1 = 30, - ROTL_RRR_6_OPCODE_Y0 = 0, - ROTL_RRR_6_OPCODE_Y1 = 0, - RRR_0_OPCODE_X0 = 5, - RRR_0_OPCODE_X1 = 5, - RRR_0_OPCODE_Y0 = 5, - RRR_0_OPCODE_Y1 = 6, - RRR_1_OPCODE_Y0 = 6, - RRR_1_OPCODE_Y1 = 7, - RRR_2_OPCODE_Y0 = 7, - RRR_2_OPCODE_Y1 = 8, - RRR_3_OPCODE_Y0 = 8, - RRR_3_OPCODE_Y1 = 9, - RRR_4_OPCODE_Y0 = 9, - RRR_4_OPCODE_Y1 = 10, - RRR_5_OPCODE_Y0 = 10, - RRR_5_OPCODE_Y1 = 11, - RRR_6_OPCODE_Y0 = 11, - RRR_6_OPCODE_Y1 = 12, - RRR_7_OPCODE_Y0 = 12, - RRR_7_OPCODE_Y1 = 13, - RRR_8_OPCODE_Y0 = 13, - RRR_9_OPCODE_Y0 = 14, - SHIFT_OPCODE_X0 = 6, - SHIFT_OPCODE_X1 = 6, - SHIFT_OPCODE_Y0 = 15, - SHIFT_OPCODE_Y1 = 14, - SHL16INSLI_OPCODE_X0 = 7, - 
SHL16INSLI_OPCODE_X1 = 7, - SHL1ADDX_RRR_0_OPCODE_X0 = 67, - SHL1ADDX_RRR_0_OPCODE_X1 = 31, - SHL1ADDX_RRR_7_OPCODE_Y0 = 1, - SHL1ADDX_RRR_7_OPCODE_Y1 = 1, - SHL1ADD_RRR_0_OPCODE_X0 = 68, - SHL1ADD_RRR_0_OPCODE_X1 = 32, - SHL1ADD_RRR_1_OPCODE_Y0 = 0, - SHL1ADD_RRR_1_OPCODE_Y1 = 0, - SHL2ADDX_RRR_0_OPCODE_X0 = 69, - SHL2ADDX_RRR_0_OPCODE_X1 = 33, - SHL2ADDX_RRR_7_OPCODE_Y0 = 2, - SHL2ADDX_RRR_7_OPCODE_Y1 = 2, - SHL2ADD_RRR_0_OPCODE_X0 = 70, - SHL2ADD_RRR_0_OPCODE_X1 = 34, - SHL2ADD_RRR_1_OPCODE_Y0 = 1, - SHL2ADD_RRR_1_OPCODE_Y1 = 1, - SHL3ADDX_RRR_0_OPCODE_X0 = 71, - SHL3ADDX_RRR_0_OPCODE_X1 = 35, - SHL3ADDX_RRR_7_OPCODE_Y0 = 3, - SHL3ADDX_RRR_7_OPCODE_Y1 = 3, - SHL3ADD_RRR_0_OPCODE_X0 = 72, - SHL3ADD_RRR_0_OPCODE_X1 = 36, - SHL3ADD_RRR_1_OPCODE_Y0 = 2, - SHL3ADD_RRR_1_OPCODE_Y1 = 2, - SHLI_SHIFT_OPCODE_X0 = 2, - SHLI_SHIFT_OPCODE_X1 = 2, - SHLI_SHIFT_OPCODE_Y0 = 1, - SHLI_SHIFT_OPCODE_Y1 = 1, - SHLXI_SHIFT_OPCODE_X0 = 3, - SHLXI_SHIFT_OPCODE_X1 = 3, - SHLX_RRR_0_OPCODE_X0 = 73, - SHLX_RRR_0_OPCODE_X1 = 37, - SHL_RRR_0_OPCODE_X0 = 74, - SHL_RRR_0_OPCODE_X1 = 38, - SHL_RRR_6_OPCODE_Y0 = 1, - SHL_RRR_6_OPCODE_Y1 = 1, - SHRSI_SHIFT_OPCODE_X0 = 4, - SHRSI_SHIFT_OPCODE_X1 = 4, - SHRSI_SHIFT_OPCODE_Y0 = 2, - SHRSI_SHIFT_OPCODE_Y1 = 2, - SHRS_RRR_0_OPCODE_X0 = 75, - SHRS_RRR_0_OPCODE_X1 = 39, - SHRS_RRR_6_OPCODE_Y0 = 2, - SHRS_RRR_6_OPCODE_Y1 = 2, - SHRUI_SHIFT_OPCODE_X0 = 5, - SHRUI_SHIFT_OPCODE_X1 = 5, - SHRUI_SHIFT_OPCODE_Y0 = 3, - SHRUI_SHIFT_OPCODE_Y1 = 3, - SHRUXI_SHIFT_OPCODE_X0 = 6, - SHRUXI_SHIFT_OPCODE_X1 = 6, - SHRUX_RRR_0_OPCODE_X0 = 76, - SHRUX_RRR_0_OPCODE_X1 = 40, - SHRU_RRR_0_OPCODE_X0 = 77, - SHRU_RRR_0_OPCODE_X1 = 41, - SHRU_RRR_6_OPCODE_Y0 = 3, - SHRU_RRR_6_OPCODE_Y1 = 3, - SHUFFLEBYTES_RRR_0_OPCODE_X0 = 78, - ST1_ADD_IMM8_OPCODE_X1 = 25, - ST1_OPCODE_Y2 = 0, - ST1_RRR_0_OPCODE_X1 = 42, - ST2_ADD_IMM8_OPCODE_X1 = 26, - ST2_OPCODE_Y2 = 1, - ST2_RRR_0_OPCODE_X1 = 43, - ST4_ADD_IMM8_OPCODE_X1 = 27, - ST4_OPCODE_Y2 = 2, - ST4_RRR_0_OPCODE_X1 = 44, - 
STNT1_ADD_IMM8_OPCODE_X1 = 28, - STNT1_RRR_0_OPCODE_X1 = 45, - STNT2_ADD_IMM8_OPCODE_X1 = 29, - STNT2_RRR_0_OPCODE_X1 = 46, - STNT4_ADD_IMM8_OPCODE_X1 = 30, - STNT4_RRR_0_OPCODE_X1 = 47, - STNT_ADD_IMM8_OPCODE_X1 = 31, - STNT_RRR_0_OPCODE_X1 = 48, - ST_ADD_IMM8_OPCODE_X1 = 32, - ST_OPCODE_Y2 = 3, - ST_RRR_0_OPCODE_X1 = 49, - SUBXSC_RRR_0_OPCODE_X0 = 79, - SUBXSC_RRR_0_OPCODE_X1 = 50, - SUBX_RRR_0_OPCODE_X0 = 80, - SUBX_RRR_0_OPCODE_X1 = 51, - SUBX_RRR_0_OPCODE_Y0 = 2, - SUBX_RRR_0_OPCODE_Y1 = 2, - SUB_RRR_0_OPCODE_X0 = 81, - SUB_RRR_0_OPCODE_X1 = 52, - SUB_RRR_0_OPCODE_Y0 = 3, - SUB_RRR_0_OPCODE_Y1 = 3, - SWINT0_UNARY_OPCODE_X1 = 34, - SWINT1_UNARY_OPCODE_X1 = 35, - SWINT2_UNARY_OPCODE_X1 = 36, - SWINT3_UNARY_OPCODE_X1 = 37, - TBLIDXB0_UNARY_OPCODE_X0 = 9, - TBLIDXB0_UNARY_OPCODE_Y0 = 9, - TBLIDXB1_UNARY_OPCODE_X0 = 10, - TBLIDXB1_UNARY_OPCODE_Y0 = 10, - TBLIDXB2_UNARY_OPCODE_X0 = 11, - TBLIDXB2_UNARY_OPCODE_Y0 = 11, - TBLIDXB3_UNARY_OPCODE_X0 = 12, - TBLIDXB3_UNARY_OPCODE_Y0 = 12, - UNARY_RRR_0_OPCODE_X0 = 82, - UNARY_RRR_0_OPCODE_X1 = 53, - UNARY_RRR_1_OPCODE_Y0 = 3, - UNARY_RRR_1_OPCODE_Y1 = 3, - V1ADDI_IMM8_OPCODE_X0 = 8, - V1ADDI_IMM8_OPCODE_X1 = 33, - V1ADDUC_RRR_0_OPCODE_X0 = 83, - V1ADDUC_RRR_0_OPCODE_X1 = 54, - V1ADD_RRR_0_OPCODE_X0 = 84, - V1ADD_RRR_0_OPCODE_X1 = 55, - V1ADIFFU_RRR_0_OPCODE_X0 = 85, - V1AVGU_RRR_0_OPCODE_X0 = 86, - V1CMPEQI_IMM8_OPCODE_X0 = 9, - V1CMPEQI_IMM8_OPCODE_X1 = 34, - V1CMPEQ_RRR_0_OPCODE_X0 = 87, - V1CMPEQ_RRR_0_OPCODE_X1 = 56, - V1CMPLES_RRR_0_OPCODE_X0 = 88, - V1CMPLES_RRR_0_OPCODE_X1 = 57, - V1CMPLEU_RRR_0_OPCODE_X0 = 89, - V1CMPLEU_RRR_0_OPCODE_X1 = 58, - V1CMPLTSI_IMM8_OPCODE_X0 = 10, - V1CMPLTSI_IMM8_OPCODE_X1 = 35, - V1CMPLTS_RRR_0_OPCODE_X0 = 90, - V1CMPLTS_RRR_0_OPCODE_X1 = 59, - V1CMPLTUI_IMM8_OPCODE_X0 = 11, - V1CMPLTUI_IMM8_OPCODE_X1 = 36, - V1CMPLTU_RRR_0_OPCODE_X0 = 91, - V1CMPLTU_RRR_0_OPCODE_X1 = 60, - V1CMPNE_RRR_0_OPCODE_X0 = 92, - V1CMPNE_RRR_0_OPCODE_X1 = 61, - V1DDOTPUA_RRR_0_OPCODE_X0 = 161, - 
V1DDOTPUSA_RRR_0_OPCODE_X0 = 93, - V1DDOTPUS_RRR_0_OPCODE_X0 = 94, - V1DDOTPU_RRR_0_OPCODE_X0 = 162, - V1DOTPA_RRR_0_OPCODE_X0 = 95, - V1DOTPUA_RRR_0_OPCODE_X0 = 163, - V1DOTPUSA_RRR_0_OPCODE_X0 = 96, - V1DOTPUS_RRR_0_OPCODE_X0 = 97, - V1DOTPU_RRR_0_OPCODE_X0 = 164, - V1DOTP_RRR_0_OPCODE_X0 = 98, - V1INT_H_RRR_0_OPCODE_X0 = 99, - V1INT_H_RRR_0_OPCODE_X1 = 62, - V1INT_L_RRR_0_OPCODE_X0 = 100, - V1INT_L_RRR_0_OPCODE_X1 = 63, - V1MAXUI_IMM8_OPCODE_X0 = 12, - V1MAXUI_IMM8_OPCODE_X1 = 37, - V1MAXU_RRR_0_OPCODE_X0 = 101, - V1MAXU_RRR_0_OPCODE_X1 = 64, - V1MINUI_IMM8_OPCODE_X0 = 13, - V1MINUI_IMM8_OPCODE_X1 = 38, - V1MINU_RRR_0_OPCODE_X0 = 102, - V1MINU_RRR_0_OPCODE_X1 = 65, - V1MNZ_RRR_0_OPCODE_X0 = 103, - V1MNZ_RRR_0_OPCODE_X1 = 66, - V1MULTU_RRR_0_OPCODE_X0 = 104, - V1MULUS_RRR_0_OPCODE_X0 = 105, - V1MULU_RRR_0_OPCODE_X0 = 106, - V1MZ_RRR_0_OPCODE_X0 = 107, - V1MZ_RRR_0_OPCODE_X1 = 67, - V1SADAU_RRR_0_OPCODE_X0 = 108, - V1SADU_RRR_0_OPCODE_X0 = 109, - V1SHLI_SHIFT_OPCODE_X0 = 7, - V1SHLI_SHIFT_OPCODE_X1 = 7, - V1SHL_RRR_0_OPCODE_X0 = 110, - V1SHL_RRR_0_OPCODE_X1 = 68, - V1SHRSI_SHIFT_OPCODE_X0 = 8, - V1SHRSI_SHIFT_OPCODE_X1 = 8, - V1SHRS_RRR_0_OPCODE_X0 = 111, - V1SHRS_RRR_0_OPCODE_X1 = 69, - V1SHRUI_SHIFT_OPCODE_X0 = 9, - V1SHRUI_SHIFT_OPCODE_X1 = 9, - V1SHRU_RRR_0_OPCODE_X0 = 112, - V1SHRU_RRR_0_OPCODE_X1 = 70, - V1SUBUC_RRR_0_OPCODE_X0 = 113, - V1SUBUC_RRR_0_OPCODE_X1 = 71, - V1SUB_RRR_0_OPCODE_X0 = 114, - V1SUB_RRR_0_OPCODE_X1 = 72, - V2ADDI_IMM8_OPCODE_X0 = 14, - V2ADDI_IMM8_OPCODE_X1 = 39, - V2ADDSC_RRR_0_OPCODE_X0 = 115, - V2ADDSC_RRR_0_OPCODE_X1 = 73, - V2ADD_RRR_0_OPCODE_X0 = 116, - V2ADD_RRR_0_OPCODE_X1 = 74, - V2ADIFFS_RRR_0_OPCODE_X0 = 117, - V2AVGS_RRR_0_OPCODE_X0 = 118, - V2CMPEQI_IMM8_OPCODE_X0 = 15, - V2CMPEQI_IMM8_OPCODE_X1 = 40, - V2CMPEQ_RRR_0_OPCODE_X0 = 119, - V2CMPEQ_RRR_0_OPCODE_X1 = 75, - V2CMPLES_RRR_0_OPCODE_X0 = 120, - V2CMPLES_RRR_0_OPCODE_X1 = 76, - V2CMPLEU_RRR_0_OPCODE_X0 = 121, - V2CMPLEU_RRR_0_OPCODE_X1 = 77, - V2CMPLTSI_IMM8_OPCODE_X0 
= 16, - V2CMPLTSI_IMM8_OPCODE_X1 = 41, - V2CMPLTS_RRR_0_OPCODE_X0 = 122, - V2CMPLTS_RRR_0_OPCODE_X1 = 78, - V2CMPLTUI_IMM8_OPCODE_X0 = 17, - V2CMPLTUI_IMM8_OPCODE_X1 = 42, - V2CMPLTU_RRR_0_OPCODE_X0 = 123, - V2CMPLTU_RRR_0_OPCODE_X1 = 79, - V2CMPNE_RRR_0_OPCODE_X0 = 124, - V2CMPNE_RRR_0_OPCODE_X1 = 80, - V2DOTPA_RRR_0_OPCODE_X0 = 125, - V2DOTP_RRR_0_OPCODE_X0 = 126, - V2INT_H_RRR_0_OPCODE_X0 = 127, - V2INT_H_RRR_0_OPCODE_X1 = 81, - V2INT_L_RRR_0_OPCODE_X0 = 128, - V2INT_L_RRR_0_OPCODE_X1 = 82, - V2MAXSI_IMM8_OPCODE_X0 = 18, - V2MAXSI_IMM8_OPCODE_X1 = 43, - V2MAXS_RRR_0_OPCODE_X0 = 129, - V2MAXS_RRR_0_OPCODE_X1 = 83, - V2MINSI_IMM8_OPCODE_X0 = 19, - V2MINSI_IMM8_OPCODE_X1 = 44, - V2MINS_RRR_0_OPCODE_X0 = 130, - V2MINS_RRR_0_OPCODE_X1 = 84, - V2MNZ_RRR_0_OPCODE_X0 = 131, - V2MNZ_RRR_0_OPCODE_X1 = 85, - V2MULFSC_RRR_0_OPCODE_X0 = 132, - V2MULS_RRR_0_OPCODE_X0 = 133, - V2MULTS_RRR_0_OPCODE_X0 = 134, - V2MZ_RRR_0_OPCODE_X0 = 135, - V2MZ_RRR_0_OPCODE_X1 = 86, - V2PACKH_RRR_0_OPCODE_X0 = 136, - V2PACKH_RRR_0_OPCODE_X1 = 87, - V2PACKL_RRR_0_OPCODE_X0 = 137, - V2PACKL_RRR_0_OPCODE_X1 = 88, - V2PACKUC_RRR_0_OPCODE_X0 = 138, - V2PACKUC_RRR_0_OPCODE_X1 = 89, - V2SADAS_RRR_0_OPCODE_X0 = 139, - V2SADAU_RRR_0_OPCODE_X0 = 140, - V2SADS_RRR_0_OPCODE_X0 = 141, - V2SADU_RRR_0_OPCODE_X0 = 142, - V2SHLI_SHIFT_OPCODE_X0 = 10, - V2SHLI_SHIFT_OPCODE_X1 = 10, - V2SHLSC_RRR_0_OPCODE_X0 = 143, - V2SHLSC_RRR_0_OPCODE_X1 = 90, - V2SHL_RRR_0_OPCODE_X0 = 144, - V2SHL_RRR_0_OPCODE_X1 = 91, - V2SHRSI_SHIFT_OPCODE_X0 = 11, - V2SHRSI_SHIFT_OPCODE_X1 = 11, - V2SHRS_RRR_0_OPCODE_X0 = 145, - V2SHRS_RRR_0_OPCODE_X1 = 92, - V2SHRUI_SHIFT_OPCODE_X0 = 12, - V2SHRUI_SHIFT_OPCODE_X1 = 12, - V2SHRU_RRR_0_OPCODE_X0 = 146, - V2SHRU_RRR_0_OPCODE_X1 = 93, - V2SUBSC_RRR_0_OPCODE_X0 = 147, - V2SUBSC_RRR_0_OPCODE_X1 = 94, - V2SUB_RRR_0_OPCODE_X0 = 148, - V2SUB_RRR_0_OPCODE_X1 = 95, - V4ADDSC_RRR_0_OPCODE_X0 = 149, - V4ADDSC_RRR_0_OPCODE_X1 = 96, - V4ADD_RRR_0_OPCODE_X0 = 150, - V4ADD_RRR_0_OPCODE_X1 = 97, - 
V4INT_H_RRR_0_OPCODE_X0 = 151, - V4INT_H_RRR_0_OPCODE_X1 = 98, - V4INT_L_RRR_0_OPCODE_X0 = 152, - V4INT_L_RRR_0_OPCODE_X1 = 99, - V4PACKSC_RRR_0_OPCODE_X0 = 153, - V4PACKSC_RRR_0_OPCODE_X1 = 100, - V4SHLSC_RRR_0_OPCODE_X0 = 154, - V4SHLSC_RRR_0_OPCODE_X1 = 101, - V4SHL_RRR_0_OPCODE_X0 = 155, - V4SHL_RRR_0_OPCODE_X1 = 102, - V4SHRS_RRR_0_OPCODE_X0 = 156, - V4SHRS_RRR_0_OPCODE_X1 = 103, - V4SHRU_RRR_0_OPCODE_X0 = 157, - V4SHRU_RRR_0_OPCODE_X1 = 104, - V4SUBSC_RRR_0_OPCODE_X0 = 158, - V4SUBSC_RRR_0_OPCODE_X1 = 105, - V4SUB_RRR_0_OPCODE_X0 = 159, - V4SUB_RRR_0_OPCODE_X1 = 106, - WH64_UNARY_OPCODE_X1 = 38, - XORI_IMM8_OPCODE_X0 = 20, - XORI_IMM8_OPCODE_X1 = 45, - XOR_RRR_0_OPCODE_X0 = 160, - XOR_RRR_0_OPCODE_X1 = 107, - XOR_RRR_5_OPCODE_Y0 = 3, - XOR_RRR_5_OPCODE_Y1 = 3 -}; - -static __inline unsigned int -get_BFEnd_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int -get_BFOpcodeExtension_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 24)) & 0xf); -} - -static __inline unsigned int -get_BFStart_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 18)) & 0x3f); -} - -static __inline unsigned int -get_BrOff_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 31)) & 0x0000003f) | - (((unsigned int)(n >> 37)) & 0x0001ffc0); -} - -static __inline unsigned int -get_BrType_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 54)) & 0x1f); -} - -static __inline unsigned int -get_Dest_Imm8_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 31)) & 0x0000003f) | - (((unsigned int)(n >> 43)) & 0x000000c0); -} - -static __inline unsigned int -get_Dest_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 0)) & 0x3f); -} - -static __inline unsigned int -get_Dest_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 31)) & 0x3f); -} - -static __inline 
unsigned int -get_Dest_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 0)) & 0x3f); -} - -static __inline unsigned int -get_Dest_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 31)) & 0x3f); -} - -static __inline unsigned int -get_Imm16_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0xffff); -} - -static __inline unsigned int -get_Imm16_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0xffff); -} - -static __inline unsigned int -get_Imm8OpcodeExtension_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 20)) & 0xff); -} - -static __inline unsigned int -get_Imm8OpcodeExtension_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 51)) & 0xff); -} - -static __inline unsigned int -get_Imm8_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0xff); -} - -static __inline unsigned int -get_Imm8_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0xff); -} - -static __inline unsigned int -get_Imm8_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0xff); -} - -static __inline unsigned int -get_Imm8_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0xff); -} - -static __inline unsigned int -get_JumpOff_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 31)) & 0x7ffffff); -} - -static __inline unsigned int -get_JumpOpcodeExtension_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 58)) & 0x1); -} - -static __inline unsigned int -get_MF_Imm14_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 37)) & 0x3fff); -} - -static __inline unsigned int -get_MT_Imm14_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 31)) & 0x0000003f) | - (((unsigned int)(n >> 37)) & 0x00003fc0); -} - -static __inline unsigned int -get_Mode(tilegx_bundle_bits n) -{ - return (((unsigned 
int)(n >> 62)) & 0x3); -} - -static __inline unsigned int -get_Opcode_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 28)) & 0x7); -} - -static __inline unsigned int -get_Opcode_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 59)) & 0x7); -} - -static __inline unsigned int -get_Opcode_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 27)) & 0xf); -} - -static __inline unsigned int -get_Opcode_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 58)) & 0xf); -} - -static __inline unsigned int -get_Opcode_Y2(tilegx_bundle_bits n) -{ - return (((n >> 26)) & 0x00000001) | - (((unsigned int)(n >> 56)) & 0x00000002); -} - -static __inline unsigned int -get_RRROpcodeExtension_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 18)) & 0x3ff); -} - -static __inline unsigned int -get_RRROpcodeExtension_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 49)) & 0x3ff); -} - -static __inline unsigned int -get_RRROpcodeExtension_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 18)) & 0x3); -} - -static __inline unsigned int -get_RRROpcodeExtension_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 49)) & 0x3); -} - -static __inline unsigned int -get_ShAmt_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int -get_ShAmt_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0x3f); -} - -static __inline unsigned int -get_ShAmt_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int -get_ShAmt_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0x3f); -} - -static __inline unsigned int -get_ShiftOpcodeExtension_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return 
(((n >> 18)) & 0x3ff); -} - -static __inline unsigned int -get_ShiftOpcodeExtension_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 49)) & 0x3ff); -} - -static __inline unsigned int -get_ShiftOpcodeExtension_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 18)) & 0x3); -} - -static __inline unsigned int -get_ShiftOpcodeExtension_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 49)) & 0x3); -} - -static __inline unsigned int -get_SrcA_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 6)) & 0x3f); -} - -static __inline unsigned int -get_SrcA_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 37)) & 0x3f); -} - -static __inline unsigned int -get_SrcA_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 6)) & 0x3f); -} - -static __inline unsigned int -get_SrcA_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 37)) & 0x3f); -} - -static __inline unsigned int -get_SrcA_Y2(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 20)) & 0x3f); -} - -static __inline unsigned int -get_SrcBDest_Y2(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 51)) & 0x3f); -} - -static __inline unsigned int -get_SrcB_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int -get_SrcB_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0x3f); -} - -static __inline unsigned int -get_SrcB_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int -get_SrcB_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0x3f); -} - -static __inline unsigned int -get_UnaryOpcodeExtension_X0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int 
-get_UnaryOpcodeExtension_X1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0x3f); -} - -static __inline unsigned int -get_UnaryOpcodeExtension_Y0(tilegx_bundle_bits num) -{ - const unsigned int n = (unsigned int)num; - return (((n >> 12)) & 0x3f); -} - -static __inline unsigned int -get_UnaryOpcodeExtension_Y1(tilegx_bundle_bits n) -{ - return (((unsigned int)(n >> 43)) & 0x3f); -} - -static __inline int -sign_extend(int n, int num_bits) -{ - int shift = (int)(sizeof(int) * 8 - num_bits); - return (n << shift) >> shift; -} - -static __inline tilegx_bundle_bits -create_BFEnd_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 12); -} - -static __inline tilegx_bundle_bits -create_BFOpcodeExtension_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0xf) << 24); -} - -static __inline tilegx_bundle_bits -create_BFStart_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 18); -} - -static __inline tilegx_bundle_bits -create_BrOff_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x0000003f)) << 31) | - (((tilegx_bundle_bits)(n & 0x0001ffc0)) << 37); -} - -static __inline tilegx_bundle_bits -create_BrType_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x1f)) << 54); -} - -static __inline tilegx_bundle_bits -create_Dest_Imm8_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x0000003f)) << 31) | - (((tilegx_bundle_bits)(n & 0x000000c0)) << 43); -} - -static __inline tilegx_bundle_bits -create_Dest_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 0); -} - -static __inline tilegx_bundle_bits -create_Dest_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 31); -} - -static __inline tilegx_bundle_bits -create_Dest_Y0(int num) -{ - const unsigned int n = 
(unsigned int)num; - return ((n & 0x3f) << 0); -} - -static __inline tilegx_bundle_bits -create_Dest_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 31); -} - -static __inline tilegx_bundle_bits -create_Imm16_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0xffff) << 12); -} - -static __inline tilegx_bundle_bits -create_Imm16_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0xffff)) << 43); -} - -static __inline tilegx_bundle_bits -create_Imm8OpcodeExtension_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0xff) << 20); -} - -static __inline tilegx_bundle_bits -create_Imm8OpcodeExtension_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0xff)) << 51); -} - -static __inline tilegx_bundle_bits -create_Imm8_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0xff) << 12); -} - -static __inline tilegx_bundle_bits -create_Imm8_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0xff)) << 43); -} - -static __inline tilegx_bundle_bits -create_Imm8_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0xff) << 12); -} - -static __inline tilegx_bundle_bits -create_Imm8_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0xff)) << 43); -} - -static __inline tilegx_bundle_bits -create_JumpOff_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x7ffffff)) << 31); -} - -static __inline tilegx_bundle_bits -create_JumpOpcodeExtension_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x1)) << 58); -} - -static __inline tilegx_bundle_bits -create_MF_Imm14_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3fff)) << 37); -} - 
-static __inline tilegx_bundle_bits -create_MT_Imm14_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x0000003f)) << 31) | - (((tilegx_bundle_bits)(n & 0x00003fc0)) << 37); -} - -static __inline tilegx_bundle_bits -create_Mode(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3)) << 62); -} - -static __inline tilegx_bundle_bits -create_Opcode_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x7) << 28); -} - -static __inline tilegx_bundle_bits -create_Opcode_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x7)) << 59); -} - -static __inline tilegx_bundle_bits -create_Opcode_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0xf) << 27); -} - -static __inline tilegx_bundle_bits -create_Opcode_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0xf)) << 58); -} - -static __inline tilegx_bundle_bits -create_Opcode_Y2(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x00000001) << 26) | - (((tilegx_bundle_bits)(n & 0x00000002)) << 56); -} - -static __inline tilegx_bundle_bits -create_RRROpcodeExtension_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3ff) << 18); -} - -static __inline tilegx_bundle_bits -create_RRROpcodeExtension_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3ff)) << 49); -} - -static __inline tilegx_bundle_bits -create_RRROpcodeExtension_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3) << 18); -} - -static __inline tilegx_bundle_bits -create_RRROpcodeExtension_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3)) << 49); -} - -static __inline tilegx_bundle_bits -create_ShAmt_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) 
<< 12); -} - -static __inline tilegx_bundle_bits -create_ShAmt_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 43); -} - -static __inline tilegx_bundle_bits -create_ShAmt_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 12); -} - -static __inline tilegx_bundle_bits -create_ShAmt_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 43); -} - -static __inline tilegx_bundle_bits -create_ShiftOpcodeExtension_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3ff) << 18); -} - -static __inline tilegx_bundle_bits -create_ShiftOpcodeExtension_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3ff)) << 49); -} - -static __inline tilegx_bundle_bits -create_ShiftOpcodeExtension_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3) << 18); -} - -static __inline tilegx_bundle_bits -create_ShiftOpcodeExtension_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3)) << 49); -} - -static __inline tilegx_bundle_bits -create_SrcA_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 6); -} - -static __inline tilegx_bundle_bits -create_SrcA_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 37); -} - -static __inline tilegx_bundle_bits -create_SrcA_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 6); -} - -static __inline tilegx_bundle_bits -create_SrcA_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 37); -} - -static __inline tilegx_bundle_bits -create_SrcA_Y2(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 20); -} - -static __inline tilegx_bundle_bits -create_SrcBDest_Y2(int num) -{ - const unsigned 
int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 51); -} - -static __inline tilegx_bundle_bits -create_SrcB_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 12); -} - -static __inline tilegx_bundle_bits -create_SrcB_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 43); -} - -static __inline tilegx_bundle_bits -create_SrcB_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 12); -} - -static __inline tilegx_bundle_bits -create_SrcB_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 43); -} - -static __inline tilegx_bundle_bits -create_UnaryOpcodeExtension_X0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 12); -} - -static __inline tilegx_bundle_bits -create_UnaryOpcodeExtension_X1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 43); -} - -static __inline tilegx_bundle_bits -create_UnaryOpcodeExtension_Y0(int num) -{ - const unsigned int n = (unsigned int)num; - return ((n & 0x3f) << 12); -} - -static __inline tilegx_bundle_bits -create_UnaryOpcodeExtension_Y1(int num) -{ - const unsigned int n = (unsigned int)num; - return (((tilegx_bundle_bits)(n & 0x3f)) << 43); -} - -const struct tilegx_opcode tilegx_opcodes[336] = -{ - { "bpt", TILEGX_OPC_BPT, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffffffff80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a44ae00000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "info", TILEGX_OPC_INFO, 0xf, 1, TREG_ZERO, 1, - { { 0 }, { 1 }, { 2 }, { 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00fffULL, - 0xfff807ff80000000ULL, - 0x0000000078000fffULL, - 0x3c0007ff80000000ULL, - 0ULL - }, - { - 0x0000000040300fffULL, - 0x181807ff80000000ULL, - 0x0000000010000fffULL, - 
0x0c0007ff80000000ULL, - -1ULL - } -#endif - }, - { "infol", TILEGX_OPC_INFOL, 0x3, 1, TREG_ZERO, 1, - { { 4 }, { 5 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc000000070000fffULL, - 0xf80007ff80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000070000fffULL, - 0x380007ff80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld4s_tls", TILEGX_OPC_LD4S_TLS, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1858000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld_tls", TILEGX_OPC_LD_TLS, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18a0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "move", TILEGX_OPC_MOVE, 0xf, 2, TREG_ZERO, 1, - { { 8, 9 }, { 6, 7 }, { 10, 11 }, { 12, 13 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0xfffff80000000000ULL, - 0x00000000780ff000ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - 0x000000005107f000ULL, - 0x283bf80000000000ULL, - 0x00000000500bf000ULL, - 0x2c05f80000000000ULL, - -1ULL - } -#endif - }, - { "movei", TILEGX_OPC_MOVEI, 0xf, 2, TREG_ZERO, 1, - { { 8, 0 }, { 6, 1 }, { 10, 2 }, { 12, 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00fc0ULL, - 0xfff807e000000000ULL, - 0x0000000078000fc0ULL, - 0x3c0007e000000000ULL, - 0ULL - }, - { - 0x0000000040100fc0ULL, - 0x180807e000000000ULL, - 0x0000000000000fc0ULL, - 0x040007e000000000ULL, - -1ULL - } -#endif - }, - { "moveli", TILEGX_OPC_MOVELI, 0x3, 2, TREG_ZERO, 1, - { { 8, 4 }, { 6, 5 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc000000070000fc0ULL, - 0xf80007e000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000010000fc0ULL, - 0x000007e000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "prefetch", TILEGX_OPC_PREFETCH, 0x12, 1, 
TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286a801f80000000ULL, - -1ULL, - -1ULL, - 0x41f8000004000000ULL - } -#endif - }, - { "prefetch_add_l1", TILEGX_OPC_PREFETCH_ADD_L1, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8001f80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1840001f80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "prefetch_add_l1_fault", TILEGX_OPC_PREFETCH_ADD_L1_FAULT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8001f80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1838001f80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "prefetch_add_l2", TILEGX_OPC_PREFETCH_ADD_L2, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8001f80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1850001f80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "prefetch_add_l2_fault", TILEGX_OPC_PREFETCH_ADD_L2_FAULT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8001f80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1848001f80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "prefetch_add_l3", TILEGX_OPC_PREFETCH_ADD_L3, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8001f80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1860001f80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "prefetch_add_l3_fault", TILEGX_OPC_PREFETCH_ADD_L3_FAULT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8001f80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1858001f80000000ULL, - -1ULL, - 
-1ULL, - -1ULL - } -#endif - }, - { "prefetch_l1", TILEGX_OPC_PREFETCH_L1, 0x12, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286a801f80000000ULL, - -1ULL, - -1ULL, - 0x41f8000004000000ULL - } -#endif - }, - { "prefetch_l1_fault", TILEGX_OPC_PREFETCH_L1_FAULT, 0x12, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286a781f80000000ULL, - -1ULL, - -1ULL, - 0x41f8000000000000ULL - } -#endif - }, - { "prefetch_l2", TILEGX_OPC_PREFETCH_L2, 0x12, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286a901f80000000ULL, - -1ULL, - -1ULL, - 0x43f8000004000000ULL - } -#endif - }, - { "prefetch_l2_fault", TILEGX_OPC_PREFETCH_L2_FAULT, 0x12, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286a881f80000000ULL, - -1ULL, - -1ULL, - 0x43f8000000000000ULL - } -#endif - }, - { "prefetch_l3", TILEGX_OPC_PREFETCH_L3, 0x12, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286aa01f80000000ULL, - -1ULL, - -1ULL, - 0x83f8000000000000ULL - } -#endif - }, - { "prefetch_l3_fault", TILEGX_OPC_PREFETCH_L3_FAULT, 0x12, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff81f80000000ULL, - 0ULL, - 0ULL, - 0xc3f8000004000000ULL - }, - { - -1ULL, - 0x286a981f80000000ULL, - -1ULL, - -1ULL, - 0x81f8000004000000ULL - } -#endif - }, - { "raise", TILEGX_OPC_RAISE, 0x2, 0, TREG_ZERO, 1, - { { 0, }, 
{ }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffffffff80000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a44ae80000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "add", TILEGX_OPC_ADD, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x00000000500c0000ULL, - 0x2806000000000000ULL, - 0x0000000028040000ULL, - 0x1802000000000000ULL, - -1ULL - } -#endif - }, - { "addi", TILEGX_OPC_ADDI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 10, 11, 2 }, { 12, 13, 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0x0000000078000000ULL, - 0x3c00000000000000ULL, - 0ULL - }, - { - 0x0000000040100000ULL, - 0x1808000000000000ULL, - 0ULL, - 0x0400000000000000ULL, - -1ULL - } -#endif - }, - { "addli", TILEGX_OPC_ADDLI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 4 }, { 6, 7, 5 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc000000070000000ULL, - 0xf800000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000010000000ULL, - 0ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "addx", TILEGX_OPC_ADDX, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050080000ULL, - 0x2804000000000000ULL, - 0x0000000028000000ULL, - 0x1800000000000000ULL, - -1ULL - } -#endif - }, - { "addxi", TILEGX_OPC_ADDXI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 10, 11, 2 }, { 12, 13, 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0x0000000078000000ULL, - 0x3c00000000000000ULL, - 0ULL - }, - { - 0x0000000040200000ULL, - 0x1810000000000000ULL, - 0x0000000008000000ULL, - 
0x0800000000000000ULL, - -1ULL - } -#endif - }, - { "addxli", TILEGX_OPC_ADDXLI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 4 }, { 6, 7, 5 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc000000070000000ULL, - 0xf800000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000020000000ULL, - 0x0800000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "addxsc", TILEGX_OPC_ADDXSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050040000ULL, - 0x2802000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "and", TILEGX_OPC_AND, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050100000ULL, - 0x2808000000000000ULL, - 0x0000000050000000ULL, - 0x2c00000000000000ULL, - -1ULL - } -#endif - }, - { "andi", TILEGX_OPC_ANDI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 10, 11, 2 }, { 12, 13, 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0x0000000078000000ULL, - 0x3c00000000000000ULL, - 0ULL - }, - { - 0x0000000040300000ULL, - 0x1818000000000000ULL, - 0x0000000010000000ULL, - 0x0c00000000000000ULL, - -1ULL - } -#endif - }, - { "beqz", TILEGX_OPC_BEQZ, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1440000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "beqzt", TILEGX_OPC_BEQZT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1400000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bfexts", 
TILEGX_OPC_BFEXTS, 0x1, 4, TREG_ZERO, 1, - { { 8, 9, 21, 22 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007f000000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000034000000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bfextu", TILEGX_OPC_BFEXTU, 0x1, 4, TREG_ZERO, 1, - { { 8, 9, 21, 22 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007f000000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000035000000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bfins", TILEGX_OPC_BFINS, 0x1, 4, TREG_ZERO, 1, - { { 23, 9, 21, 22 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007f000000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000036000000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bgez", TILEGX_OPC_BGEZ, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x14c0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bgezt", TILEGX_OPC_BGEZT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1480000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bgtz", TILEGX_OPC_BGTZ, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1540000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bgtzt", TILEGX_OPC_BGTZT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1500000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "blbc", TILEGX_OPC_BLBC, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef 
DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x15c0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "blbct", TILEGX_OPC_BLBCT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1580000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "blbs", TILEGX_OPC_BLBS, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1640000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "blbst", TILEGX_OPC_BLBST, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1600000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "blez", TILEGX_OPC_BLEZ, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x16c0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "blezt", TILEGX_OPC_BLEZT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1680000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bltz", TILEGX_OPC_BLTZ, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1740000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bltzt", TILEGX_OPC_BLTZT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1700000000000000ULL, - 
-1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bnez", TILEGX_OPC_BNEZ, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x17c0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "bnezt", TILEGX_OPC_BNEZT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 20 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xffc0000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1780000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "clz", TILEGX_OPC_CLZ, 0x5, 2, TREG_ZERO, 1, - { { 8, 9 }, { 0, }, { 10, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051481000ULL, - -1ULL, - 0x00000000300c1000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmoveqz", TILEGX_OPC_CMOVEQZ, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050140000ULL, - -1ULL, - 0x0000000048000000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmovnez", TILEGX_OPC_CMOVNEZ, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050180000ULL, - -1ULL, - 0x0000000048040000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmpeq", TILEGX_OPC_CMPEQ, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x00000000501c0000ULL, - 0x280a000000000000ULL, - 0x0000000040000000ULL, - 0x2404000000000000ULL, - -1ULL - } -#endif - }, - { "cmpeqi", TILEGX_OPC_CMPEQI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 10, 
11, 2 }, { 12, 13, 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0x0000000078000000ULL, - 0x3c00000000000000ULL, - 0ULL - }, - { - 0x0000000040400000ULL, - 0x1820000000000000ULL, - 0x0000000018000000ULL, - 0x1000000000000000ULL, - -1ULL - } -#endif - }, - { "cmpexch", TILEGX_OPC_CMPEXCH, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x280e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmpexch4", TILEGX_OPC_CMPEXCH4, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x280c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmples", TILEGX_OPC_CMPLES, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050200000ULL, - 0x2810000000000000ULL, - 0x0000000038000000ULL, - 0x2000000000000000ULL, - -1ULL - } -#endif - }, - { "cmpleu", TILEGX_OPC_CMPLEU, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050240000ULL, - 0x2812000000000000ULL, - 0x0000000038040000ULL, - 0x2002000000000000ULL, - -1ULL - } -#endif - }, - { "cmplts", TILEGX_OPC_CMPLTS, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050280000ULL, - 0x2814000000000000ULL, - 0x0000000038080000ULL, - 
0x2004000000000000ULL, - -1ULL - } -#endif - }, - { "cmpltsi", TILEGX_OPC_CMPLTSI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 10, 11, 2 }, { 12, 13, 3 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0x0000000078000000ULL, - 0x3c00000000000000ULL, - 0ULL - }, - { - 0x0000000040500000ULL, - 0x1828000000000000ULL, - 0x0000000020000000ULL, - 0x1400000000000000ULL, - -1ULL - } -#endif - }, - { "cmpltu", TILEGX_OPC_CMPLTU, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x00000000502c0000ULL, - 0x2816000000000000ULL, - 0x00000000380c0000ULL, - 0x2006000000000000ULL, - -1ULL - } -#endif - }, - { "cmpltui", TILEGX_OPC_CMPLTUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040600000ULL, - 0x1830000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmpne", TILEGX_OPC_CMPNE, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050300000ULL, - 0x2818000000000000ULL, - 0x0000000040040000ULL, - 0x2406000000000000ULL, - -1ULL - } -#endif - }, - { "cmul", TILEGX_OPC_CMUL, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000504c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmula", TILEGX_OPC_CMULA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL 
- }, - { - 0x0000000050380000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmulaf", TILEGX_OPC_CMULAF, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050340000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmulf", TILEGX_OPC_CMULF, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050400000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmulfr", TILEGX_OPC_CMULFR, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000503c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmulh", TILEGX_OPC_CMULH, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050480000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "cmulhr", TILEGX_OPC_CMULHR, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050440000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "crc32_32", TILEGX_OPC_CRC32_32, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050500000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "crc32_8", TILEGX_OPC_CRC32_8, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050540000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } 
-#endif - }, - { "ctz", TILEGX_OPC_CTZ, 0x5, 2, TREG_ZERO, 1, - { { 8, 9 }, { 0, }, { 10, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051482000ULL, - -1ULL, - 0x00000000300c2000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "dblalign", TILEGX_OPC_DBLALIGN, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050640000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "dblalign2", TILEGX_OPC_DBLALIGN2, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050580000ULL, - 0x281a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "dblalign4", TILEGX_OPC_DBLALIGN4, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000505c0000ULL, - 0x281c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "dblalign6", TILEGX_OPC_DBLALIGN6, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050600000ULL, - 0x281e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "drain", TILEGX_OPC_DRAIN, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a080000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "dtlbpr", TILEGX_OPC_DTLBPR, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, 
- { - -1ULL, - 0x286a100000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "exch", TILEGX_OPC_EXCH, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2822000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "exch4", TILEGX_OPC_EXCH4, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2820000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_add_flags", TILEGX_OPC_FDOUBLE_ADD_FLAGS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000506c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_addsub", TILEGX_OPC_FDOUBLE_ADDSUB, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050680000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_mul_flags", TILEGX_OPC_FDOUBLE_MUL_FLAGS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050700000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_pack1", TILEGX_OPC_FDOUBLE_PACK1, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050740000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_pack2", TILEGX_OPC_FDOUBLE_PACK2, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - 
}, - { - 0x0000000050780000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_sub_flags", TILEGX_OPC_FDOUBLE_SUB_FLAGS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000507c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_unpack_max", TILEGX_OPC_FDOUBLE_UNPACK_MAX, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050800000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fdouble_unpack_min", TILEGX_OPC_FDOUBLE_UNPACK_MIN, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050840000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchadd", TILEGX_OPC_FETCHADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x282a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchadd4", TILEGX_OPC_FETCHADD4, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2824000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchaddgez", TILEGX_OPC_FETCHADDGEZ, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2828000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchaddgez4", TILEGX_OPC_FETCHADDGEZ4, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 
0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2826000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchand", TILEGX_OPC_FETCHAND, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x282e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchand4", TILEGX_OPC_FETCHAND4, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x282c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchor", TILEGX_OPC_FETCHOR, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2832000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fetchor4", TILEGX_OPC_FETCHOR4, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2830000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "finv", TILEGX_OPC_FINV, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a180000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "flush", TILEGX_OPC_FLUSH, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a280000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "flushwb", TILEGX_OPC_FLUSHWB, 0x2, 0, TREG_ZERO, 1, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a200000000000ULL, - -1ULL, - -1ULL, - -1ULL - 
} -#endif - }, - { "fnop", TILEGX_OPC_FNOP, 0xf, 0, TREG_ZERO, 1, - { { }, { }, { }, { }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0xfffff80000000000ULL, - 0x00000000780ff000ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - 0x0000000051483000ULL, - 0x286a300000000000ULL, - 0x00000000300c3000ULL, - 0x1c06400000000000ULL, - -1ULL - } -#endif - }, - { "fsingle_add1", TILEGX_OPC_FSINGLE_ADD1, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050880000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fsingle_addsub2", TILEGX_OPC_FSINGLE_ADDSUB2, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000508c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fsingle_mul1", TILEGX_OPC_FSINGLE_MUL1, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050900000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fsingle_mul2", TILEGX_OPC_FSINGLE_MUL2, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050940000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fsingle_pack1", TILEGX_OPC_FSINGLE_PACK1, 0x5, 2, TREG_ZERO, 1, - { { 8, 9 }, { 0, }, { 10, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051484000ULL, - -1ULL, - 0x00000000300c4000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fsingle_pack2", TILEGX_OPC_FSINGLE_PACK2, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 
0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050980000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "fsingle_sub1", TILEGX_OPC_FSINGLE_SUB1, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000509c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "icoh", TILEGX_OPC_ICOH, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a380000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ill", TILEGX_OPC_ILL, 0xa, 0, TREG_ZERO, 1, - { { 0, }, { }, { 0, }, { }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - -1ULL, - 0x286a400000000000ULL, - -1ULL, - 0x1c06480000000000ULL, - -1ULL - } -#endif - }, - { "inv", TILEGX_OPC_INV, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a480000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "iret", TILEGX_OPC_IRET, 0x2, 0, TREG_ZERO, 1, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286a500000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "j", TILEGX_OPC_J, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 25 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfc00000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2400000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "jal", TILEGX_OPC_JAL, 0x2, 1, TREG_LR, 1, - { { 0, }, { 25 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfc00000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2000000000000000ULL, - -1ULL, - -1ULL, - 
-1ULL - } -#endif - }, - { "jalr", TILEGX_OPC_JALR, 0xa, 1, TREG_LR, 1, - { { 0, }, { 7 }, { 0, }, { 13 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - -1ULL, - 0x286a600000000000ULL, - -1ULL, - 0x1c06580000000000ULL, - -1ULL - } -#endif - }, - { "jalrp", TILEGX_OPC_JALRP, 0xa, 1, TREG_LR, 1, - { { 0, }, { 7 }, { 0, }, { 13 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - -1ULL, - 0x286a580000000000ULL, - -1ULL, - 0x1c06500000000000ULL, - -1ULL - } -#endif - }, - { "jr", TILEGX_OPC_JR, 0xa, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 13 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - -1ULL, - 0x286a700000000000ULL, - -1ULL, - 0x1c06680000000000ULL, - -1ULL - } -#endif - }, - { "jrp", TILEGX_OPC_JRP, 0xa, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 13 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - -1ULL, - 0x286a680000000000ULL, - -1ULL, - 0x1c06600000000000ULL, - -1ULL - } -#endif - }, - { "ld", TILEGX_OPC_LD, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286ae80000000000ULL, - -1ULL, - -1ULL, - 0x8200000004000000ULL - } -#endif - }, - { "ld1s", TILEGX_OPC_LD1S, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286a780000000000ULL, - -1ULL, - -1ULL, - 0x4000000000000000ULL - } -#endif - }, - { "ld1s_add", TILEGX_OPC_LD1S_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, 
- { - -1ULL, - 0x1838000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld1u", TILEGX_OPC_LD1U, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286a800000000000ULL, - -1ULL, - -1ULL, - 0x4000000004000000ULL - } -#endif - }, - { "ld1u_add", TILEGX_OPC_LD1U_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1840000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld2s", TILEGX_OPC_LD2S, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286a880000000000ULL, - -1ULL, - -1ULL, - 0x4200000000000000ULL - } -#endif - }, - { "ld2s_add", TILEGX_OPC_LD2S_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1848000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld2u", TILEGX_OPC_LD2U, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286a900000000000ULL, - -1ULL, - -1ULL, - 0x4200000004000000ULL - } -#endif - }, - { "ld2u_add", TILEGX_OPC_LD2U_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1850000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld4s", TILEGX_OPC_LD4S, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 
0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286a980000000000ULL, - -1ULL, - -1ULL, - 0x8000000004000000ULL - } -#endif - }, - { "ld4s_add", TILEGX_OPC_LD4S_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1858000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld4u", TILEGX_OPC_LD4U, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 26, 14 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x286aa00000000000ULL, - -1ULL, - -1ULL, - 0x8200000000000000ULL - } -#endif - }, - { "ld4u_add", TILEGX_OPC_LD4U_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1860000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ld_add", TILEGX_OPC_LD_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18a0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldna", TILEGX_OPC_LDNA, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286aa80000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldna_add", TILEGX_OPC_LDNA_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18a8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt", TILEGX_OPC_LDNT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { 
- -1ULL, - 0x286ae00000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt1s", TILEGX_OPC_LDNT1S, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286ab00000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt1s_add", TILEGX_OPC_LDNT1S_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1868000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt1u", TILEGX_OPC_LDNT1U, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286ab80000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt1u_add", TILEGX_OPC_LDNT1U_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1870000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt2s", TILEGX_OPC_LDNT2S, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286ac00000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt2s_add", TILEGX_OPC_LDNT2S_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1878000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt2u", TILEGX_OPC_LDNT2U, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286ac80000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - 
}, - { "ldnt2u_add", TILEGX_OPC_LDNT2U_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1880000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt4s", TILEGX_OPC_LDNT4S, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286ad00000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt4s_add", TILEGX_OPC_LDNT4S_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1888000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt4u", TILEGX_OPC_LDNT4U, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286ad80000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt4u_add", TILEGX_OPC_LDNT4U_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1890000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "ldnt_add", TILEGX_OPC_LDNT_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 6, 15, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1898000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "lnk", TILEGX_OPC_LNK, 0xa, 1, TREG_ZERO, 1, - { { 0, }, { 6 }, { 0, }, { 12 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - -1ULL, - 0x286af00000000000ULL, - -1ULL, - 0x1c06700000000000ULL, - -1ULL - } -#endif - }, - { "mf", TILEGX_OPC_MF, 0x2, 0, 
TREG_ZERO, 1, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286af80000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mfspr", TILEGX_OPC_MFSPR, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 6, 27 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18b0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mm", TILEGX_OPC_MM, 0x1, 4, TREG_ZERO, 1, - { { 23, 9, 21, 22 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007f000000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000037000000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mnz", TILEGX_OPC_MNZ, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050a00000ULL, - 0x2834000000000000ULL, - 0x0000000048080000ULL, - 0x2804000000000000ULL, - -1ULL - } -#endif - }, - { "mtspr", TILEGX_OPC_MTSPR, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 28, 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18b8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_hs_hs", TILEGX_OPC_MUL_HS_HS, 0x5, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 10, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050d40000ULL, - -1ULL, - 0x0000000068000000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_hs_hu", TILEGX_OPC_MUL_HS_HU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050d80000ULL, - -1ULL, - -1ULL, - -1ULL, - 
-1ULL - } -#endif - }, - { "mul_hs_ls", TILEGX_OPC_MUL_HS_LS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050dc0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_hs_lu", TILEGX_OPC_MUL_HS_LU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050e00000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_hu_hu", TILEGX_OPC_MUL_HU_HU, 0x5, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 10, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050e40000ULL, - -1ULL, - 0x0000000068040000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_hu_ls", TILEGX_OPC_MUL_HU_LS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050e80000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_hu_lu", TILEGX_OPC_MUL_HU_LU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050ec0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_ls_ls", TILEGX_OPC_MUL_LS_LS, 0x5, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 10, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050f00000ULL, - -1ULL, - 0x0000000068080000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_ls_lu", TILEGX_OPC_MUL_LS_LU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 
0x0000000050f40000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mul_lu_lu", TILEGX_OPC_MUL_LU_LU, 0x5, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 10, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050f80000ULL, - -1ULL, - 0x00000000680c0000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hs_hs", TILEGX_OPC_MULA_HS_HS, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050a80000ULL, - -1ULL, - 0x0000000070000000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hs_hu", TILEGX_OPC_MULA_HS_HU, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050ac0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hs_ls", TILEGX_OPC_MULA_HS_LS, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050b00000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hs_lu", TILEGX_OPC_MULA_HS_LU, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050b40000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hu_hu", TILEGX_OPC_MULA_HU_HU, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050b80000ULL, - -1ULL, - 0x0000000070040000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hu_ls", TILEGX_OPC_MULA_HU_LS, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, 
{ 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050bc0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_hu_lu", TILEGX_OPC_MULA_HU_LU, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050c00000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_ls_ls", TILEGX_OPC_MULA_LS_LS, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050c40000ULL, - -1ULL, - 0x0000000070080000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_ls_lu", TILEGX_OPC_MULA_LS_LU, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050c80000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mula_lu_lu", TILEGX_OPC_MULA_LU_LU, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050cc0000ULL, - -1ULL, - 0x00000000700c0000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mulax", TILEGX_OPC_MULAX, 0x5, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 24, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050a40000ULL, - -1ULL, - 0x0000000040080000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "mulx", TILEGX_OPC_MULX, 0x5, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 10, 11, 18 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0x00000000780c0000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000050d00000ULL, - -1ULL, - 0x00000000400c0000ULL, - -1ULL, - -1ULL 
- } -#endif - }, - { "mz", TILEGX_OPC_MZ, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000050fc0000ULL, - 0x2836000000000000ULL, - 0x00000000480c0000ULL, - 0x2806000000000000ULL, - -1ULL - } -#endif - }, - { "nap", TILEGX_OPC_NAP, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286b000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "nop", TILEGX_OPC_NOP, 0xf, 0, TREG_ZERO, 1, - { { }, { }, { }, { }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0xfffff80000000000ULL, - 0x00000000780ff000ULL, - 0x3c07f80000000000ULL, - 0ULL - }, - { - 0x0000000051485000ULL, - 0x286b080000000000ULL, - 0x00000000300c5000ULL, - 0x1c06780000000000ULL, - -1ULL - } -#endif - }, - { "nor", TILEGX_OPC_NOR, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051000000ULL, - 0x2838000000000000ULL, - 0x0000000050040000ULL, - 0x2c02000000000000ULL, - -1ULL - } -#endif - }, - { "or", TILEGX_OPC_OR, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051040000ULL, - 0x283a000000000000ULL, - 0x0000000050080000ULL, - 0x2c04000000000000ULL, - -1ULL - } -#endif - }, - { "ori", TILEGX_OPC_ORI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { 
- 0x0000000040700000ULL, - 0x18c0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "pcnt", TILEGX_OPC_PCNT, 0x5, 2, TREG_ZERO, 1, - { { 8, 9 }, { 0, }, { 10, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051486000ULL, - -1ULL, - 0x00000000300c6000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "revbits", TILEGX_OPC_REVBITS, 0x5, 2, TREG_ZERO, 1, - { { 8, 9 }, { 0, }, { 10, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051487000ULL, - -1ULL, - 0x00000000300c7000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "revbytes", TILEGX_OPC_REVBYTES, 0x5, 2, TREG_ZERO, 1, - { { 8, 9 }, { 0, }, { 10, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051488000ULL, - -1ULL, - 0x00000000300c8000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "rotl", TILEGX_OPC_ROTL, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051080000ULL, - 0x283c000000000000ULL, - 0x0000000058000000ULL, - 0x3000000000000000ULL, - -1ULL - } -#endif - }, - { "rotli", TILEGX_OPC_ROTLI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 10, 11, 31 }, { 12, 13, 32 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000060040000ULL, - 0x3002000000000000ULL, - 0x0000000078000000ULL, - 0x3800000000000000ULL, - -1ULL - } -#endif - }, - { "shl", TILEGX_OPC_SHL, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 
0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051280000ULL, - 0x284c000000000000ULL, - 0x0000000058040000ULL, - 0x3002000000000000ULL, - -1ULL - } -#endif - }, - { "shl16insli", TILEGX_OPC_SHL16INSLI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 4 }, { 6, 7, 5 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc000000070000000ULL, - 0xf800000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000070000000ULL, - 0x3800000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "shl1add", TILEGX_OPC_SHL1ADD, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051100000ULL, - 0x2840000000000000ULL, - 0x0000000030000000ULL, - 0x1c00000000000000ULL, - -1ULL - } -#endif - }, - { "shl1addx", TILEGX_OPC_SHL1ADDX, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x00000000510c0000ULL, - 0x283e000000000000ULL, - 0x0000000060040000ULL, - 0x3402000000000000ULL, - -1ULL - } -#endif - }, - { "shl2add", TILEGX_OPC_SHL2ADD, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051180000ULL, - 0x2844000000000000ULL, - 0x0000000030040000ULL, - 0x1c02000000000000ULL, - -1ULL - } -#endif - }, - { "shl2addx", TILEGX_OPC_SHL2ADDX, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051140000ULL, 
- 0x2842000000000000ULL, - 0x0000000060080000ULL, - 0x3404000000000000ULL, - -1ULL - } -#endif - }, - { "shl3add", TILEGX_OPC_SHL3ADD, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051200000ULL, - 0x2848000000000000ULL, - 0x0000000030080000ULL, - 0x1c04000000000000ULL, - -1ULL - } -#endif - }, - { "shl3addx", TILEGX_OPC_SHL3ADDX, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x00000000511c0000ULL, - 0x2846000000000000ULL, - 0x00000000600c0000ULL, - 0x3406000000000000ULL, - -1ULL - } -#endif - }, - { "shli", TILEGX_OPC_SHLI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 10, 11, 31 }, { 12, 13, 32 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000060080000ULL, - 0x3004000000000000ULL, - 0x0000000078040000ULL, - 0x3802000000000000ULL, - -1ULL - } -#endif - }, - { "shlx", TILEGX_OPC_SHLX, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051240000ULL, - 0x284a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "shlxi", TILEGX_OPC_SHLXI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000600c0000ULL, - 0x3006000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "shrs", TILEGX_OPC_SHRS, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 
18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x00000000512c0000ULL, - 0x284e000000000000ULL, - 0x0000000058080000ULL, - 0x3004000000000000ULL, - -1ULL - } -#endif - }, - { "shrsi", TILEGX_OPC_SHRSI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 10, 11, 31 }, { 12, 13, 32 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000060100000ULL, - 0x3008000000000000ULL, - 0x0000000078080000ULL, - 0x3804000000000000ULL, - -1ULL - } -#endif - }, - { "shru", TILEGX_OPC_SHRU, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051340000ULL, - 0x2852000000000000ULL, - 0x00000000580c0000ULL, - 0x3006000000000000ULL, - -1ULL - } -#endif - }, - { "shrui", TILEGX_OPC_SHRUI, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 10, 11, 31 }, { 12, 13, 32 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000060140000ULL, - 0x300a000000000000ULL, - 0x00000000780c0000ULL, - 0x3806000000000000ULL, - -1ULL - } -#endif - }, - { "shrux", TILEGX_OPC_SHRUX, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051300000ULL, - 0x2850000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "shruxi", TILEGX_OPC_SHRUXI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - 
}, - { - 0x0000000060180000ULL, - 0x300c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "shufflebytes", TILEGX_OPC_SHUFFLEBYTES, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051380000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "st", TILEGX_OPC_ST, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 14, 33 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x2862000000000000ULL, - -1ULL, - -1ULL, - 0xc200000004000000ULL - } -#endif - }, - { "st1", TILEGX_OPC_ST1, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 14, 33 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x2854000000000000ULL, - -1ULL, - -1ULL, - 0xc000000000000000ULL - } -#endif - }, - { "st1_add", TILEGX_OPC_ST1_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18c8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "st2", TILEGX_OPC_ST2, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 14, 33 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x2856000000000000ULL, - -1ULL, - -1ULL, - 0xc000000004000000ULL - } -#endif - }, - { "st2_add", TILEGX_OPC_ST2_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18d0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "st4", TILEGX_OPC_ST4, 0x12, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 14, 33 } }, -#ifndef DISASM_ONLY - { - 0ULL, - 
0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0xc200000004000000ULL - }, - { - -1ULL, - 0x2858000000000000ULL, - -1ULL, - -1ULL, - 0xc200000000000000ULL - } -#endif - }, - { "st4_add", TILEGX_OPC_ST4_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18d8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "st_add", TILEGX_OPC_ST_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x1900000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt", TILEGX_OPC_STNT, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x2860000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt1", TILEGX_OPC_STNT1, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x285a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt1_add", TILEGX_OPC_STNT1_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18e0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt2", TILEGX_OPC_STNT2, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x285c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt2_add", TILEGX_OPC_STNT2_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 
0ULL - }, - { - -1ULL, - 0x18e8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt4", TILEGX_OPC_STNT4, 0x2, 2, TREG_ZERO, 1, - { { 0, }, { 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x285e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt4_add", TILEGX_OPC_STNT4_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18f0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "stnt_add", TILEGX_OPC_STNT_ADD, 0x2, 3, TREG_ZERO, 1, - { { 0, }, { 15, 17, 34 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x18f8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "sub", TILEGX_OPC_SUB, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051440000ULL, - 0x2868000000000000ULL, - 0x00000000280c0000ULL, - 0x1806000000000000ULL, - -1ULL - } -#endif - }, - { "subx", TILEGX_OPC_SUBX, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000051400000ULL, - 0x2866000000000000ULL, - 0x0000000028080000ULL, - 0x1804000000000000ULL, - -1ULL - } -#endif - }, - { "subxsc", TILEGX_OPC_SUBXSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000513c0000ULL, - 0x2864000000000000ULL, - -1ULL, - -1ULL, - -1ULL - 
} -#endif - }, - { "swint0", TILEGX_OPC_SWINT0, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286b100000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "swint1", TILEGX_OPC_SWINT1, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286b180000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "swint2", TILEGX_OPC_SWINT2, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286b200000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "swint3", TILEGX_OPC_SWINT3, 0x2, 0, TREG_ZERO, 0, - { { 0, }, { }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286b280000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "tblidxb0", TILEGX_OPC_TBLIDXB0, 0x5, 2, TREG_ZERO, 1, - { { 23, 9 }, { 0, }, { 24, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051489000ULL, - -1ULL, - 0x00000000300c9000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "tblidxb1", TILEGX_OPC_TBLIDXB1, 0x5, 2, TREG_ZERO, 1, - { { 23, 9 }, { 0, }, { 24, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x000000005148a000ULL, - -1ULL, - 0x00000000300ca000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "tblidxb2", TILEGX_OPC_TBLIDXB2, 0x5, 2, TREG_ZERO, 1, - { { 23, 9 }, { 0, }, { 24, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x000000005148b000ULL, - -1ULL, - 0x00000000300cb000ULL, - -1ULL, - -1ULL - } 
-#endif - }, - { "tblidxb3", TILEGX_OPC_TBLIDXB3, 0x5, 2, TREG_ZERO, 1, - { { 23, 9 }, { 0, }, { 24, 11 }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffff000ULL, - 0ULL, - 0x00000000780ff000ULL, - 0ULL, - 0ULL - }, - { - 0x000000005148c000ULL, - -1ULL, - 0x00000000300cc000ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1add", TILEGX_OPC_V1ADD, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051500000ULL, - 0x286e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1addi", TILEGX_OPC_V1ADDI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040800000ULL, - 0x1908000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1adduc", TILEGX_OPC_V1ADDUC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000514c0000ULL, - 0x286c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1adiffu", TILEGX_OPC_V1ADIFFU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051540000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1avgu", TILEGX_OPC_V1AVGU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051580000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpeq", TILEGX_OPC_V1CMPEQ, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 
0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000515c0000ULL, - 0x2870000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpeqi", TILEGX_OPC_V1CMPEQI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040900000ULL, - 0x1910000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmples", TILEGX_OPC_V1CMPLES, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051600000ULL, - 0x2872000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpleu", TILEGX_OPC_V1CMPLEU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051640000ULL, - 0x2874000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmplts", TILEGX_OPC_V1CMPLTS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051680000ULL, - 0x2876000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpltsi", TILEGX_OPC_V1CMPLTSI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040a00000ULL, - 0x1918000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpltu", TILEGX_OPC_V1CMPLTU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000516c0000ULL, - 
0x2878000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpltui", TILEGX_OPC_V1CMPLTUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040b00000ULL, - 0x1920000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1cmpne", TILEGX_OPC_V1CMPNE, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051700000ULL, - 0x287a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1ddotpu", TILEGX_OPC_V1DDOTPU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052880000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1ddotpua", TILEGX_OPC_V1DDOTPUA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052840000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1ddotpus", TILEGX_OPC_V1DDOTPUS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051780000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1ddotpusa", TILEGX_OPC_V1DDOTPUSA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051740000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1dotp", TILEGX_OPC_V1DOTP, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 
0ULL, - 0ULL - }, - { - 0x0000000051880000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1dotpa", TILEGX_OPC_V1DOTPA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000517c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1dotpu", TILEGX_OPC_V1DOTPU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052900000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1dotpua", TILEGX_OPC_V1DOTPUA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000528c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1dotpus", TILEGX_OPC_V1DOTPUS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051840000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1dotpusa", TILEGX_OPC_V1DOTPUSA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051800000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1int_h", TILEGX_OPC_V1INT_H, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000518c0000ULL, - 0x287c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1int_l", TILEGX_OPC_V1INT_L, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 
0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051900000ULL, - 0x287e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1maxu", TILEGX_OPC_V1MAXU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051940000ULL, - 0x2880000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1maxui", TILEGX_OPC_V1MAXUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040c00000ULL, - 0x1928000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1minu", TILEGX_OPC_V1MINU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051980000ULL, - 0x2882000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1minui", TILEGX_OPC_V1MINUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040d00000ULL, - 0x1930000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1mnz", TILEGX_OPC_V1MNZ, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000519c0000ULL, - 0x2884000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1multu", TILEGX_OPC_V1MULTU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051a00000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1mulu", TILEGX_OPC_V1MULU, 0x1, 
3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051a80000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1mulus", TILEGX_OPC_V1MULUS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051a40000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1mz", TILEGX_OPC_V1MZ, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051ac0000ULL, - 0x2886000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1sadau", TILEGX_OPC_V1SADAU, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051b00000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1sadu", TILEGX_OPC_V1SADU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051b40000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1shl", TILEGX_OPC_V1SHL, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051b80000ULL, - 0x2888000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1shli", TILEGX_OPC_V1SHLI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000601c0000ULL, - 0x300e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { 
"v1shrs", TILEGX_OPC_V1SHRS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051bc0000ULL, - 0x288a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1shrsi", TILEGX_OPC_V1SHRSI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000060200000ULL, - 0x3010000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1shru", TILEGX_OPC_V1SHRU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051c00000ULL, - 0x288c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1shrui", TILEGX_OPC_V1SHRUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000060240000ULL, - 0x3012000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1sub", TILEGX_OPC_V1SUB, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051c80000ULL, - 0x2890000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v1subuc", TILEGX_OPC_V1SUBUC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051c40000ULL, - 0x288e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2add", TILEGX_OPC_V2ADD, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef 
DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051d00000ULL, - 0x2894000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2addi", TILEGX_OPC_V2ADDI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040e00000ULL, - 0x1938000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2addsc", TILEGX_OPC_V2ADDSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051cc0000ULL, - 0x2892000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2adiffs", TILEGX_OPC_V2ADIFFS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051d40000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2avgs", TILEGX_OPC_V2AVGS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051d80000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpeq", TILEGX_OPC_V2CMPEQ, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051dc0000ULL, - 0x2896000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpeqi", TILEGX_OPC_V2CMPEQI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000040f00000ULL, - 0x1940000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, 
- { "v2cmples", TILEGX_OPC_V2CMPLES, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051e00000ULL, - 0x2898000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpleu", TILEGX_OPC_V2CMPLEU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051e40000ULL, - 0x289a000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmplts", TILEGX_OPC_V2CMPLTS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051e80000ULL, - 0x289c000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpltsi", TILEGX_OPC_V2CMPLTSI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000041000000ULL, - 0x1948000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpltu", TILEGX_OPC_V2CMPLTU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051ec0000ULL, - 0x289e000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpltui", TILEGX_OPC_V2CMPLTUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000041100000ULL, - 0x1950000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2cmpne", TILEGX_OPC_V2CMPNE, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, 
{ 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051f00000ULL, - 0x28a0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2dotp", TILEGX_OPC_V2DOTP, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051f80000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2dotpa", TILEGX_OPC_V2DOTPA, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051f40000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2int_h", TILEGX_OPC_V2INT_H, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000051fc0000ULL, - 0x28a2000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2int_l", TILEGX_OPC_V2INT_L, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052000000ULL, - 0x28a4000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2maxs", TILEGX_OPC_V2MAXS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052040000ULL, - 0x28a6000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2maxsi", TILEGX_OPC_V2MAXSI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000041200000ULL, - 0x1958000000000000ULL, - -1ULL, - -1ULL, 
- -1ULL - } -#endif - }, - { "v2mins", TILEGX_OPC_V2MINS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052080000ULL, - 0x28a8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2minsi", TILEGX_OPC_V2MINSI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000041300000ULL, - 0x1960000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2mnz", TILEGX_OPC_V2MNZ, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000520c0000ULL, - 0x28aa000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2mulfsc", TILEGX_OPC_V2MULFSC, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052100000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2muls", TILEGX_OPC_V2MULS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052140000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2mults", TILEGX_OPC_V2MULTS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052180000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2mz", TILEGX_OPC_V2MZ, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - 
}, - { - 0x00000000521c0000ULL, - 0x28ac000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2packh", TILEGX_OPC_V2PACKH, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052200000ULL, - 0x28ae000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2packl", TILEGX_OPC_V2PACKL, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052240000ULL, - 0x28b0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2packuc", TILEGX_OPC_V2PACKUC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052280000ULL, - 0x28b2000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2sadas", TILEGX_OPC_V2SADAS, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000522c0000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2sadau", TILEGX_OPC_V2SADAU, 0x1, 3, TREG_ZERO, 1, - { { 23, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052300000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2sads", TILEGX_OPC_V2SADS, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052340000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2sadu", TILEGX_OPC_V2SADU, 0x1, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 0, }, { 0, }, { 0, }, { 0, } }, -#ifndef 
DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052380000ULL, - -1ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shl", TILEGX_OPC_V2SHL, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052400000ULL, - 0x28b6000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shli", TILEGX_OPC_V2SHLI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000060280000ULL, - 0x3014000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shlsc", TILEGX_OPC_V2SHLSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000523c0000ULL, - 0x28b4000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shrs", TILEGX_OPC_V2SHRS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052440000ULL, - 0x28b8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shrsi", TILEGX_OPC_V2SHRSI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000602c0000ULL, - 0x3016000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shru", TILEGX_OPC_V2SHRU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052480000ULL, - 0x28ba000000000000ULL, - 
-1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2shrui", TILEGX_OPC_V2SHRUI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 29 }, { 6, 7, 30 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000060300000ULL, - 0x3018000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2sub", TILEGX_OPC_V2SUB, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052500000ULL, - 0x28be000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v2subsc", TILEGX_OPC_V2SUBSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000524c0000ULL, - 0x28bc000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4add", TILEGX_OPC_V4ADD, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052580000ULL, - 0x28c2000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4addsc", TILEGX_OPC_V4ADDSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052540000ULL, - 0x28c0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4int_h", TILEGX_OPC_V4INT_H, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000525c0000ULL, - 0x28c4000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4int_l", TILEGX_OPC_V4INT_L, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 
6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052600000ULL, - 0x28c6000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4packsc", TILEGX_OPC_V4PACKSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052640000ULL, - 0x28c8000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4shl", TILEGX_OPC_V4SHL, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000526c0000ULL, - 0x28cc000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4shlsc", TILEGX_OPC_V4SHLSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052680000ULL, - 0x28ca000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4shrs", TILEGX_OPC_V4SHRS, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052700000ULL, - 0x28ce000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4shru", TILEGX_OPC_V4SHRU, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052740000ULL, - 0x28d0000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4sub", TILEGX_OPC_V4SUB, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 
0ULL, - 0ULL, - 0ULL - }, - { - 0x00000000527c0000ULL, - 0x28d4000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "v4subsc", TILEGX_OPC_V4SUBSC, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000052780000ULL, - 0x28d2000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "wh64", TILEGX_OPC_WH64, 0x2, 1, TREG_ZERO, 1, - { { 0, }, { 7 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0ULL, - 0xfffff80000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - -1ULL, - 0x286b300000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { "xor", TILEGX_OPC_XOR, 0xf, 3, TREG_ZERO, 1, - { { 8, 9, 16 }, { 6, 7, 17 }, { 10, 11, 18 }, { 12, 13, 19 }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ffc0000ULL, - 0xfffe000000000000ULL, - 0x00000000780c0000ULL, - 0x3c06000000000000ULL, - 0ULL - }, - { - 0x0000000052800000ULL, - 0x28d6000000000000ULL, - 0x00000000500c0000ULL, - 0x2c06000000000000ULL, - -1ULL - } -#endif - }, - { "xori", TILEGX_OPC_XORI, 0x3, 3, TREG_ZERO, 1, - { { 8, 9, 0 }, { 6, 7, 1 }, { 0, }, { 0, }, { 0, } }, -#ifndef DISASM_ONLY - { - 0xc00000007ff00000ULL, - 0xfff8000000000000ULL, - 0ULL, - 0ULL, - 0ULL - }, - { - 0x0000000041400000ULL, - 0x1968000000000000ULL, - -1ULL, - -1ULL, - -1ULL - } -#endif - }, - { NULL, TILEGX_OPC_NONE, 0, 0, TREG_ZERO, 0, { { 0, } }, -#ifndef DISASM_ONLY - { 0, }, { 0, } -#endif - } -}; - -#define BITFIELD(start, size) ((start) | (((1 << (size)) - 1) << 6)) -#define CHILD(array_index) (TILEGX_OPC_NONE + (array_index)) - -static const unsigned short decode_X0_fsm[936] = -{ - BITFIELD(22, 9) /* index 0 */, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, 
TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, 
TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_BFEXTS, - TILEGX_OPC_BFEXTS, TILEGX_OPC_BFEXTS, TILEGX_OPC_BFEXTS, TILEGX_OPC_BFEXTU, - TILEGX_OPC_BFEXTU, TILEGX_OPC_BFEXTU, TILEGX_OPC_BFEXTU, TILEGX_OPC_BFINS, - TILEGX_OPC_BFINS, TILEGX_OPC_BFINS, TILEGX_OPC_BFINS, TILEGX_OPC_MM, - TILEGX_OPC_MM, TILEGX_OPC_MM, TILEGX_OPC_MM, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, 
TILEGX_OPC_NONE, TILEGX_OPC_NONE, CHILD(528), CHILD(578), - CHILD(583), CHILD(588), CHILD(593), CHILD(598), TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, CHILD(603), CHILD(620), CHILD(637), CHILD(654), CHILD(671), - CHILD(703), CHILD(797), CHILD(814), CHILD(831), CHILD(848), CHILD(865), - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, 
TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, CHILD(889), TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), 
CHILD(906), CHILD(906), CHILD(906), - CHILD(906), CHILD(906), CHILD(906), CHILD(906), CHILD(906), - BITFIELD(6, 2) /* index 513 */, - TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, CHILD(518), - BITFIELD(8, 2) /* index 518 */, - TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, CHILD(523), - BITFIELD(10, 2) /* index 523 */, - TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_MOVELI, - BITFIELD(20, 2) /* index 528 */, - TILEGX_OPC_NONE, CHILD(533), TILEGX_OPC_ADDXI, CHILD(548), - BITFIELD(6, 2) /* index 533 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(538), - BITFIELD(8, 2) /* index 538 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(543), - BITFIELD(10, 2) /* index 543 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_MOVEI, - BITFIELD(0, 2) /* index 548 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(553), - BITFIELD(2, 2) /* index 553 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(558), - BITFIELD(4, 2) /* index 558 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(563), - BITFIELD(6, 2) /* index 563 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(568), - BITFIELD(8, 2) /* index 568 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(573), - BITFIELD(10, 2) /* index 573 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_INFO, - BITFIELD(20, 2) /* index 578 */, - TILEGX_OPC_CMPEQI, TILEGX_OPC_CMPLTSI, TILEGX_OPC_CMPLTUI, TILEGX_OPC_ORI, - BITFIELD(20, 2) /* index 583 */, - TILEGX_OPC_V1ADDI, TILEGX_OPC_V1CMPEQI, TILEGX_OPC_V1CMPLTSI, - TILEGX_OPC_V1CMPLTUI, - BITFIELD(20, 2) /* index 588 */, - TILEGX_OPC_V1MAXUI, TILEGX_OPC_V1MINUI, TILEGX_OPC_V2ADDI, - TILEGX_OPC_V2CMPEQI, - BITFIELD(20, 2) /* index 593 */, - TILEGX_OPC_V2CMPLTSI, TILEGX_OPC_V2CMPLTUI, TILEGX_OPC_V2MAXSI, - TILEGX_OPC_V2MINSI, - BITFIELD(20, 2) /* index 598 */, - TILEGX_OPC_XORI, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - 
BITFIELD(18, 4) /* index 603 */, - TILEGX_OPC_NONE, TILEGX_OPC_ADDXSC, TILEGX_OPC_ADDX, TILEGX_OPC_ADD, - TILEGX_OPC_AND, TILEGX_OPC_CMOVEQZ, TILEGX_OPC_CMOVNEZ, TILEGX_OPC_CMPEQ, - TILEGX_OPC_CMPLES, TILEGX_OPC_CMPLEU, TILEGX_OPC_CMPLTS, TILEGX_OPC_CMPLTU, - TILEGX_OPC_CMPNE, TILEGX_OPC_CMULAF, TILEGX_OPC_CMULA, TILEGX_OPC_CMULFR, - BITFIELD(18, 4) /* index 620 */, - TILEGX_OPC_CMULF, TILEGX_OPC_CMULHR, TILEGX_OPC_CMULH, TILEGX_OPC_CMUL, - TILEGX_OPC_CRC32_32, TILEGX_OPC_CRC32_8, TILEGX_OPC_DBLALIGN2, - TILEGX_OPC_DBLALIGN4, TILEGX_OPC_DBLALIGN6, TILEGX_OPC_DBLALIGN, - TILEGX_OPC_FDOUBLE_ADDSUB, TILEGX_OPC_FDOUBLE_ADD_FLAGS, - TILEGX_OPC_FDOUBLE_MUL_FLAGS, TILEGX_OPC_FDOUBLE_PACK1, - TILEGX_OPC_FDOUBLE_PACK2, TILEGX_OPC_FDOUBLE_SUB_FLAGS, - BITFIELD(18, 4) /* index 637 */, - TILEGX_OPC_FDOUBLE_UNPACK_MAX, TILEGX_OPC_FDOUBLE_UNPACK_MIN, - TILEGX_OPC_FSINGLE_ADD1, TILEGX_OPC_FSINGLE_ADDSUB2, - TILEGX_OPC_FSINGLE_MUL1, TILEGX_OPC_FSINGLE_MUL2, TILEGX_OPC_FSINGLE_PACK2, - TILEGX_OPC_FSINGLE_SUB1, TILEGX_OPC_MNZ, TILEGX_OPC_MULAX, - TILEGX_OPC_MULA_HS_HS, TILEGX_OPC_MULA_HS_HU, TILEGX_OPC_MULA_HS_LS, - TILEGX_OPC_MULA_HS_LU, TILEGX_OPC_MULA_HU_HU, TILEGX_OPC_MULA_HU_LS, - BITFIELD(18, 4) /* index 654 */, - TILEGX_OPC_MULA_HU_LU, TILEGX_OPC_MULA_LS_LS, TILEGX_OPC_MULA_LS_LU, - TILEGX_OPC_MULA_LU_LU, TILEGX_OPC_MULX, TILEGX_OPC_MUL_HS_HS, - TILEGX_OPC_MUL_HS_HU, TILEGX_OPC_MUL_HS_LS, TILEGX_OPC_MUL_HS_LU, - TILEGX_OPC_MUL_HU_HU, TILEGX_OPC_MUL_HU_LS, TILEGX_OPC_MUL_HU_LU, - TILEGX_OPC_MUL_LS_LS, TILEGX_OPC_MUL_LS_LU, TILEGX_OPC_MUL_LU_LU, - TILEGX_OPC_MZ, - BITFIELD(18, 4) /* index 671 */, - TILEGX_OPC_NOR, CHILD(688), TILEGX_OPC_ROTL, TILEGX_OPC_SHL1ADDX, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL2ADDX, TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL3ADDX, TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHLX, TILEGX_OPC_SHL, - TILEGX_OPC_SHRS, TILEGX_OPC_SHRUX, TILEGX_OPC_SHRU, TILEGX_OPC_SHUFFLEBYTES, - TILEGX_OPC_SUBXSC, - BITFIELD(12, 2) /* index 688 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, 
CHILD(693), - BITFIELD(14, 2) /* index 693 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(698), - BITFIELD(16, 2) /* index 698 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_MOVE, - BITFIELD(18, 4) /* index 703 */, - TILEGX_OPC_SUBX, TILEGX_OPC_SUB, CHILD(720), TILEGX_OPC_V1ADDUC, - TILEGX_OPC_V1ADD, TILEGX_OPC_V1ADIFFU, TILEGX_OPC_V1AVGU, - TILEGX_OPC_V1CMPEQ, TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLEU, - TILEGX_OPC_V1CMPLTS, TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPNE, - TILEGX_OPC_V1DDOTPUSA, TILEGX_OPC_V1DDOTPUS, TILEGX_OPC_V1DOTPA, - BITFIELD(12, 4) /* index 720 */, - TILEGX_OPC_NONE, CHILD(737), CHILD(742), CHILD(747), CHILD(752), CHILD(757), - CHILD(762), CHILD(767), CHILD(772), CHILD(777), CHILD(782), CHILD(787), - CHILD(792), TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 737 */, - TILEGX_OPC_CLZ, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 742 */, - TILEGX_OPC_CTZ, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 747 */, - TILEGX_OPC_FNOP, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 752 */, - TILEGX_OPC_FSINGLE_PACK1, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 757 */, - TILEGX_OPC_NOP, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 762 */, - TILEGX_OPC_PCNT, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 767 */, - TILEGX_OPC_REVBITS, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 772 */, - TILEGX_OPC_REVBYTES, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 777 */, - TILEGX_OPC_TBLIDXB0, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 782 */, - TILEGX_OPC_TBLIDXB1, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(16, 2) /* index 787 */, - TILEGX_OPC_TBLIDXB2, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - 
BITFIELD(16, 2) /* index 792 */, - TILEGX_OPC_TBLIDXB3, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(18, 4) /* index 797 */, - TILEGX_OPC_V1DOTPUSA, TILEGX_OPC_V1DOTPUS, TILEGX_OPC_V1DOTP, - TILEGX_OPC_V1INT_H, TILEGX_OPC_V1INT_L, TILEGX_OPC_V1MAXU, - TILEGX_OPC_V1MINU, TILEGX_OPC_V1MNZ, TILEGX_OPC_V1MULTU, TILEGX_OPC_V1MULUS, - TILEGX_OPC_V1MULU, TILEGX_OPC_V1MZ, TILEGX_OPC_V1SADAU, TILEGX_OPC_V1SADU, - TILEGX_OPC_V1SHL, TILEGX_OPC_V1SHRS, - BITFIELD(18, 4) /* index 814 */, - TILEGX_OPC_V1SHRU, TILEGX_OPC_V1SUBUC, TILEGX_OPC_V1SUB, TILEGX_OPC_V2ADDSC, - TILEGX_OPC_V2ADD, TILEGX_OPC_V2ADIFFS, TILEGX_OPC_V2AVGS, - TILEGX_OPC_V2CMPEQ, TILEGX_OPC_V2CMPLES, TILEGX_OPC_V2CMPLEU, - TILEGX_OPC_V2CMPLTS, TILEGX_OPC_V2CMPLTU, TILEGX_OPC_V2CMPNE, - TILEGX_OPC_V2DOTPA, TILEGX_OPC_V2DOTP, TILEGX_OPC_V2INT_H, - BITFIELD(18, 4) /* index 831 */, - TILEGX_OPC_V2INT_L, TILEGX_OPC_V2MAXS, TILEGX_OPC_V2MINS, TILEGX_OPC_V2MNZ, - TILEGX_OPC_V2MULFSC, TILEGX_OPC_V2MULS, TILEGX_OPC_V2MULTS, TILEGX_OPC_V2MZ, - TILEGX_OPC_V2PACKH, TILEGX_OPC_V2PACKL, TILEGX_OPC_V2PACKUC, - TILEGX_OPC_V2SADAS, TILEGX_OPC_V2SADAU, TILEGX_OPC_V2SADS, - TILEGX_OPC_V2SADU, TILEGX_OPC_V2SHLSC, - BITFIELD(18, 4) /* index 848 */, - TILEGX_OPC_V2SHL, TILEGX_OPC_V2SHRS, TILEGX_OPC_V2SHRU, TILEGX_OPC_V2SUBSC, - TILEGX_OPC_V2SUB, TILEGX_OPC_V4ADDSC, TILEGX_OPC_V4ADD, TILEGX_OPC_V4INT_H, - TILEGX_OPC_V4INT_L, TILEGX_OPC_V4PACKSC, TILEGX_OPC_V4SHLSC, - TILEGX_OPC_V4SHL, TILEGX_OPC_V4SHRS, TILEGX_OPC_V4SHRU, TILEGX_OPC_V4SUBSC, - TILEGX_OPC_V4SUB, - BITFIELD(18, 3) /* index 865 */, - CHILD(874), CHILD(877), CHILD(880), CHILD(883), CHILD(886), TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(21, 1) /* index 874 */, - TILEGX_OPC_XOR, TILEGX_OPC_NONE, - BITFIELD(21, 1) /* index 877 */, - TILEGX_OPC_V1DDOTPUA, TILEGX_OPC_NONE, - BITFIELD(21, 1) /* index 880 */, - TILEGX_OPC_V1DDOTPU, TILEGX_OPC_NONE, - BITFIELD(21, 1) /* index 883 */, - TILEGX_OPC_V1DOTPUA, TILEGX_OPC_NONE, - BITFIELD(21, 1) 
/* index 886 */, - TILEGX_OPC_V1DOTPU, TILEGX_OPC_NONE, - BITFIELD(18, 4) /* index 889 */, - TILEGX_OPC_NONE, TILEGX_OPC_ROTLI, TILEGX_OPC_SHLI, TILEGX_OPC_SHLXI, - TILEGX_OPC_SHRSI, TILEGX_OPC_SHRUI, TILEGX_OPC_SHRUXI, TILEGX_OPC_V1SHLI, - TILEGX_OPC_V1SHRSI, TILEGX_OPC_V1SHRUI, TILEGX_OPC_V2SHLI, - TILEGX_OPC_V2SHRSI, TILEGX_OPC_V2SHRUI, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, - BITFIELD(0, 2) /* index 906 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(911), - BITFIELD(2, 2) /* index 911 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(916), - BITFIELD(4, 2) /* index 916 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(921), - BITFIELD(6, 2) /* index 921 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(926), - BITFIELD(8, 2) /* index 926 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(931), - BITFIELD(10, 2) /* index 931 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - TILEGX_OPC_INFOL, -}; - -static const unsigned short decode_X1_fsm[1266] = -{ - BITFIELD(53, 9) /* index 0 */, - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), CHILD(513), - CHILD(513), CHILD(513), CHILD(513), CHILD(513), 
TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, - TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_ADDXLI, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_BEQZT, - TILEGX_OPC_BEQZT, TILEGX_OPC_BEQZ, TILEGX_OPC_BEQZ, TILEGX_OPC_BGEZT, - TILEGX_OPC_BGEZT, TILEGX_OPC_BGEZ, TILEGX_OPC_BGEZ, TILEGX_OPC_BGTZT, - TILEGX_OPC_BGTZT, 
TILEGX_OPC_BGTZ, TILEGX_OPC_BGTZ, TILEGX_OPC_BLBCT, - TILEGX_OPC_BLBCT, TILEGX_OPC_BLBC, TILEGX_OPC_BLBC, TILEGX_OPC_BLBST, - TILEGX_OPC_BLBST, TILEGX_OPC_BLBS, TILEGX_OPC_BLBS, TILEGX_OPC_BLEZT, - TILEGX_OPC_BLEZT, TILEGX_OPC_BLEZ, TILEGX_OPC_BLEZ, TILEGX_OPC_BLTZT, - TILEGX_OPC_BLTZT, TILEGX_OPC_BLTZ, TILEGX_OPC_BLTZ, TILEGX_OPC_BNEZT, - TILEGX_OPC_BNEZT, TILEGX_OPC_BNEZ, TILEGX_OPC_BNEZ, CHILD(528), CHILD(578), - CHILD(598), CHILD(703), CHILD(723), CHILD(728), CHILD(753), CHILD(758), - CHILD(763), CHILD(768), CHILD(773), CHILD(778), TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, - TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_JAL, TILEGX_OPC_J, 
TILEGX_OPC_J, - TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, - TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, - TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, - TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, - TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, - TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, TILEGX_OPC_J, - CHILD(783), CHILD(800), CHILD(832), CHILD(849), CHILD(1168), CHILD(1185), - CHILD(1202), TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, CHILD(1219), TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - 
TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), CHILD(1236), - CHILD(1236), - BITFIELD(37, 2) /* index 513 */, - TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, CHILD(518), - BITFIELD(39, 2) /* index 518 */, - TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, CHILD(523), - BITFIELD(41, 2) /* index 523 */, - TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_ADDLI, TILEGX_OPC_MOVELI, - BITFIELD(51, 2) /* index 528 */, - TILEGX_OPC_NONE, CHILD(533), TILEGX_OPC_ADDXI, CHILD(548), - BITFIELD(37, 2) /* index 533 */, - TILEGX_OPC_ADDI, 
TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(538), - BITFIELD(39, 2) /* index 538 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(543), - BITFIELD(41, 2) /* index 543 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_MOVEI, - BITFIELD(31, 2) /* index 548 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(553), - BITFIELD(33, 2) /* index 553 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(558), - BITFIELD(35, 2) /* index 558 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(563), - BITFIELD(37, 2) /* index 563 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(568), - BITFIELD(39, 2) /* index 568 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(573), - BITFIELD(41, 2) /* index 573 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_INFO, - BITFIELD(51, 2) /* index 578 */, - TILEGX_OPC_CMPEQI, TILEGX_OPC_CMPLTSI, TILEGX_OPC_CMPLTUI, CHILD(583), - BITFIELD(31, 2) /* index 583 */, - TILEGX_OPC_LD1S_ADD, TILEGX_OPC_LD1S_ADD, TILEGX_OPC_LD1S_ADD, CHILD(588), - BITFIELD(33, 2) /* index 588 */, - TILEGX_OPC_LD1S_ADD, TILEGX_OPC_LD1S_ADD, TILEGX_OPC_LD1S_ADD, CHILD(593), - BITFIELD(35, 2) /* index 593 */, - TILEGX_OPC_LD1S_ADD, TILEGX_OPC_LD1S_ADD, TILEGX_OPC_LD1S_ADD, - TILEGX_OPC_PREFETCH_ADD_L1_FAULT, - BITFIELD(51, 2) /* index 598 */, - CHILD(603), CHILD(618), CHILD(633), CHILD(648), - BITFIELD(31, 2) /* index 603 */, - TILEGX_OPC_LD1U_ADD, TILEGX_OPC_LD1U_ADD, TILEGX_OPC_LD1U_ADD, CHILD(608), - BITFIELD(33, 2) /* index 608 */, - TILEGX_OPC_LD1U_ADD, TILEGX_OPC_LD1U_ADD, TILEGX_OPC_LD1U_ADD, CHILD(613), - BITFIELD(35, 2) /* index 613 */, - TILEGX_OPC_LD1U_ADD, TILEGX_OPC_LD1U_ADD, TILEGX_OPC_LD1U_ADD, - TILEGX_OPC_PREFETCH_ADD_L1, - BITFIELD(31, 2) /* index 618 */, - TILEGX_OPC_LD2S_ADD, TILEGX_OPC_LD2S_ADD, TILEGX_OPC_LD2S_ADD, CHILD(623), - BITFIELD(33, 2) /* index 623 */, - TILEGX_OPC_LD2S_ADD, TILEGX_OPC_LD2S_ADD, TILEGX_OPC_LD2S_ADD, CHILD(628), - 
BITFIELD(35, 2) /* index 628 */, - TILEGX_OPC_LD2S_ADD, TILEGX_OPC_LD2S_ADD, TILEGX_OPC_LD2S_ADD, - TILEGX_OPC_PREFETCH_ADD_L2_FAULT, - BITFIELD(31, 2) /* index 633 */, - TILEGX_OPC_LD2U_ADD, TILEGX_OPC_LD2U_ADD, TILEGX_OPC_LD2U_ADD, CHILD(638), - BITFIELD(33, 2) /* index 638 */, - TILEGX_OPC_LD2U_ADD, TILEGX_OPC_LD2U_ADD, TILEGX_OPC_LD2U_ADD, CHILD(643), - BITFIELD(35, 2) /* index 643 */, - TILEGX_OPC_LD2U_ADD, TILEGX_OPC_LD2U_ADD, TILEGX_OPC_LD2U_ADD, - TILEGX_OPC_PREFETCH_ADD_L2, - BITFIELD(31, 2) /* index 648 */, - CHILD(653), CHILD(653), CHILD(653), CHILD(673), - BITFIELD(43, 2) /* index 653 */, - CHILD(658), TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, - BITFIELD(45, 2) /* index 658 */, - CHILD(663), TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, - BITFIELD(47, 2) /* index 663 */, - CHILD(668), TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, - BITFIELD(49, 2) /* index 668 */, - TILEGX_OPC_LD4S_TLS, TILEGX_OPC_LD4S_ADD, TILEGX_OPC_LD4S_ADD, - TILEGX_OPC_LD4S_ADD, - BITFIELD(33, 2) /* index 673 */, - CHILD(653), CHILD(653), CHILD(653), CHILD(678), - BITFIELD(35, 2) /* index 678 */, - CHILD(653), CHILD(653), CHILD(653), CHILD(683), - BITFIELD(43, 2) /* index 683 */, - CHILD(688), TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - TILEGX_OPC_PREFETCH_ADD_L3_FAULT, TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - BITFIELD(45, 2) /* index 688 */, - CHILD(693), TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - TILEGX_OPC_PREFETCH_ADD_L3_FAULT, TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - BITFIELD(47, 2) /* index 693 */, - CHILD(698), TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - TILEGX_OPC_PREFETCH_ADD_L3_FAULT, TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - BITFIELD(49, 2) /* index 698 */, - TILEGX_OPC_LD4S_TLS, TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - TILEGX_OPC_PREFETCH_ADD_L3_FAULT, TILEGX_OPC_PREFETCH_ADD_L3_FAULT, - BITFIELD(51, 2) /* index 703 */, - CHILD(708), TILEGX_OPC_LDNT1S_ADD, TILEGX_OPC_LDNT1U_ADD, - TILEGX_OPC_LDNT2S_ADD, - BITFIELD(31, 2) /* index 708 */, - 
TILEGX_OPC_LD4U_ADD, TILEGX_OPC_LD4U_ADD, TILEGX_OPC_LD4U_ADD, CHILD(713), - BITFIELD(33, 2) /* index 713 */, - TILEGX_OPC_LD4U_ADD, TILEGX_OPC_LD4U_ADD, TILEGX_OPC_LD4U_ADD, CHILD(718), - BITFIELD(35, 2) /* index 718 */, - TILEGX_OPC_LD4U_ADD, TILEGX_OPC_LD4U_ADD, TILEGX_OPC_LD4U_ADD, - TILEGX_OPC_PREFETCH_ADD_L3, - BITFIELD(51, 2) /* index 723 */, - TILEGX_OPC_LDNT2U_ADD, TILEGX_OPC_LDNT4S_ADD, TILEGX_OPC_LDNT4U_ADD, - TILEGX_OPC_LDNT_ADD, - BITFIELD(51, 2) /* index 728 */, - CHILD(733), TILEGX_OPC_LDNA_ADD, TILEGX_OPC_MFSPR, TILEGX_OPC_MTSPR, - BITFIELD(43, 2) /* index 733 */, - CHILD(738), TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, - BITFIELD(45, 2) /* index 738 */, - CHILD(743), TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, - BITFIELD(47, 2) /* index 743 */, - CHILD(748), TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, - BITFIELD(49, 2) /* index 748 */, - TILEGX_OPC_LD_TLS, TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, TILEGX_OPC_LD_ADD, - BITFIELD(51, 2) /* index 753 */, - TILEGX_OPC_ORI, TILEGX_OPC_ST1_ADD, TILEGX_OPC_ST2_ADD, TILEGX_OPC_ST4_ADD, - BITFIELD(51, 2) /* index 758 */, - TILEGX_OPC_STNT1_ADD, TILEGX_OPC_STNT2_ADD, TILEGX_OPC_STNT4_ADD, - TILEGX_OPC_STNT_ADD, - BITFIELD(51, 2) /* index 763 */, - TILEGX_OPC_ST_ADD, TILEGX_OPC_V1ADDI, TILEGX_OPC_V1CMPEQI, - TILEGX_OPC_V1CMPLTSI, - BITFIELD(51, 2) /* index 768 */, - TILEGX_OPC_V1CMPLTUI, TILEGX_OPC_V1MAXUI, TILEGX_OPC_V1MINUI, - TILEGX_OPC_V2ADDI, - BITFIELD(51, 2) /* index 773 */, - TILEGX_OPC_V2CMPEQI, TILEGX_OPC_V2CMPLTSI, TILEGX_OPC_V2CMPLTUI, - TILEGX_OPC_V2MAXSI, - BITFIELD(51, 2) /* index 778 */, - TILEGX_OPC_V2MINSI, TILEGX_OPC_XORI, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(49, 4) /* index 783 */, - TILEGX_OPC_NONE, TILEGX_OPC_ADDXSC, TILEGX_OPC_ADDX, TILEGX_OPC_ADD, - TILEGX_OPC_AND, TILEGX_OPC_CMPEQ, TILEGX_OPC_CMPEXCH4, TILEGX_OPC_CMPEXCH, - TILEGX_OPC_CMPLES, TILEGX_OPC_CMPLEU, TILEGX_OPC_CMPLTS, TILEGX_OPC_CMPLTU, - TILEGX_OPC_CMPNE, TILEGX_OPC_DBLALIGN2, 
TILEGX_OPC_DBLALIGN4, - TILEGX_OPC_DBLALIGN6, - BITFIELD(49, 4) /* index 800 */, - TILEGX_OPC_EXCH4, TILEGX_OPC_EXCH, TILEGX_OPC_FETCHADD4, - TILEGX_OPC_FETCHADDGEZ4, TILEGX_OPC_FETCHADDGEZ, TILEGX_OPC_FETCHADD, - TILEGX_OPC_FETCHAND4, TILEGX_OPC_FETCHAND, TILEGX_OPC_FETCHOR4, - TILEGX_OPC_FETCHOR, TILEGX_OPC_MNZ, TILEGX_OPC_MZ, TILEGX_OPC_NOR, - CHILD(817), TILEGX_OPC_ROTL, TILEGX_OPC_SHL1ADDX, - BITFIELD(43, 2) /* index 817 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(822), - BITFIELD(45, 2) /* index 822 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(827), - BITFIELD(47, 2) /* index 827 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_MOVE, - BITFIELD(49, 4) /* index 832 */, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL2ADDX, TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL3ADDX, TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHLX, TILEGX_OPC_SHL, - TILEGX_OPC_SHRS, TILEGX_OPC_SHRUX, TILEGX_OPC_SHRU, TILEGX_OPC_ST1, - TILEGX_OPC_ST2, TILEGX_OPC_ST4, TILEGX_OPC_STNT1, TILEGX_OPC_STNT2, - TILEGX_OPC_STNT4, - BITFIELD(46, 7) /* index 849 */, - TILEGX_OPC_STNT, TILEGX_OPC_STNT, TILEGX_OPC_STNT, TILEGX_OPC_STNT, - TILEGX_OPC_STNT, TILEGX_OPC_STNT, TILEGX_OPC_STNT, TILEGX_OPC_STNT, - TILEGX_OPC_ST, TILEGX_OPC_ST, TILEGX_OPC_ST, TILEGX_OPC_ST, TILEGX_OPC_ST, - TILEGX_OPC_ST, TILEGX_OPC_ST, TILEGX_OPC_ST, TILEGX_OPC_SUBXSC, - TILEGX_OPC_SUBXSC, TILEGX_OPC_SUBXSC, TILEGX_OPC_SUBXSC, TILEGX_OPC_SUBXSC, - TILEGX_OPC_SUBXSC, TILEGX_OPC_SUBXSC, TILEGX_OPC_SUBXSC, TILEGX_OPC_SUBX, - TILEGX_OPC_SUBX, TILEGX_OPC_SUBX, TILEGX_OPC_SUBX, TILEGX_OPC_SUBX, - TILEGX_OPC_SUBX, TILEGX_OPC_SUBX, TILEGX_OPC_SUBX, TILEGX_OPC_SUB, - TILEGX_OPC_SUB, TILEGX_OPC_SUB, TILEGX_OPC_SUB, TILEGX_OPC_SUB, - TILEGX_OPC_SUB, TILEGX_OPC_SUB, TILEGX_OPC_SUB, CHILD(978), CHILD(987), - CHILD(1066), CHILD(1150), CHILD(1159), TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_V1ADDUC, TILEGX_OPC_V1ADDUC, TILEGX_OPC_V1ADDUC, - TILEGX_OPC_V1ADDUC, TILEGX_OPC_V1ADDUC, TILEGX_OPC_V1ADDUC, - 
TILEGX_OPC_V1ADDUC, TILEGX_OPC_V1ADDUC, TILEGX_OPC_V1ADD, TILEGX_OPC_V1ADD, - TILEGX_OPC_V1ADD, TILEGX_OPC_V1ADD, TILEGX_OPC_V1ADD, TILEGX_OPC_V1ADD, - TILEGX_OPC_V1ADD, TILEGX_OPC_V1ADD, TILEGX_OPC_V1CMPEQ, TILEGX_OPC_V1CMPEQ, - TILEGX_OPC_V1CMPEQ, TILEGX_OPC_V1CMPEQ, TILEGX_OPC_V1CMPEQ, - TILEGX_OPC_V1CMPEQ, TILEGX_OPC_V1CMPEQ, TILEGX_OPC_V1CMPEQ, - TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLES, - TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLES, - TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLES, TILEGX_OPC_V1CMPLEU, - TILEGX_OPC_V1CMPLEU, TILEGX_OPC_V1CMPLEU, TILEGX_OPC_V1CMPLEU, - TILEGX_OPC_V1CMPLEU, TILEGX_OPC_V1CMPLEU, TILEGX_OPC_V1CMPLEU, - TILEGX_OPC_V1CMPLEU, TILEGX_OPC_V1CMPLTS, TILEGX_OPC_V1CMPLTS, - TILEGX_OPC_V1CMPLTS, TILEGX_OPC_V1CMPLTS, TILEGX_OPC_V1CMPLTS, - TILEGX_OPC_V1CMPLTS, TILEGX_OPC_V1CMPLTS, TILEGX_OPC_V1CMPLTS, - TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPLTU, - TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPLTU, - TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPLTU, TILEGX_OPC_V1CMPNE, - TILEGX_OPC_V1CMPNE, TILEGX_OPC_V1CMPNE, TILEGX_OPC_V1CMPNE, - TILEGX_OPC_V1CMPNE, TILEGX_OPC_V1CMPNE, TILEGX_OPC_V1CMPNE, - TILEGX_OPC_V1CMPNE, TILEGX_OPC_V1INT_H, TILEGX_OPC_V1INT_H, - TILEGX_OPC_V1INT_H, TILEGX_OPC_V1INT_H, TILEGX_OPC_V1INT_H, - TILEGX_OPC_V1INT_H, TILEGX_OPC_V1INT_H, TILEGX_OPC_V1INT_H, - TILEGX_OPC_V1INT_L, TILEGX_OPC_V1INT_L, TILEGX_OPC_V1INT_L, - TILEGX_OPC_V1INT_L, TILEGX_OPC_V1INT_L, TILEGX_OPC_V1INT_L, - TILEGX_OPC_V1INT_L, TILEGX_OPC_V1INT_L, - BITFIELD(43, 3) /* index 978 */, - TILEGX_OPC_NONE, TILEGX_OPC_DRAIN, TILEGX_OPC_DTLBPR, TILEGX_OPC_FINV, - TILEGX_OPC_FLUSHWB, TILEGX_OPC_FLUSH, TILEGX_OPC_FNOP, TILEGX_OPC_ICOH, - BITFIELD(43, 3) /* index 987 */, - CHILD(996), TILEGX_OPC_INV, TILEGX_OPC_IRET, TILEGX_OPC_JALRP, - TILEGX_OPC_JALR, TILEGX_OPC_JRP, TILEGX_OPC_JR, CHILD(1051), - BITFIELD(31, 2) /* index 996 */, - CHILD(1001), CHILD(1026), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(33, 2) 
/* index 1001 */, - TILEGX_OPC_ILL, TILEGX_OPC_ILL, TILEGX_OPC_ILL, CHILD(1006), - BITFIELD(35, 2) /* index 1006 */, - TILEGX_OPC_ILL, CHILD(1011), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(37, 2) /* index 1011 */, - TILEGX_OPC_ILL, CHILD(1016), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(39, 2) /* index 1016 */, - TILEGX_OPC_ILL, CHILD(1021), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(41, 2) /* index 1021 */, - TILEGX_OPC_ILL, TILEGX_OPC_ILL, TILEGX_OPC_BPT, TILEGX_OPC_ILL, - BITFIELD(33, 2) /* index 1026 */, - TILEGX_OPC_ILL, TILEGX_OPC_ILL, TILEGX_OPC_ILL, CHILD(1031), - BITFIELD(35, 2) /* index 1031 */, - TILEGX_OPC_ILL, CHILD(1036), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(37, 2) /* index 1036 */, - TILEGX_OPC_ILL, CHILD(1041), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(39, 2) /* index 1041 */, - TILEGX_OPC_ILL, CHILD(1046), TILEGX_OPC_ILL, TILEGX_OPC_ILL, - BITFIELD(41, 2) /* index 1046 */, - TILEGX_OPC_ILL, TILEGX_OPC_ILL, TILEGX_OPC_RAISE, TILEGX_OPC_ILL, - BITFIELD(31, 2) /* index 1051 */, - TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, CHILD(1056), - BITFIELD(33, 2) /* index 1056 */, - TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, CHILD(1061), - BITFIELD(35, 2) /* index 1061 */, - TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, - TILEGX_OPC_PREFETCH_L1_FAULT, - BITFIELD(43, 3) /* index 1066 */, - CHILD(1075), CHILD(1090), CHILD(1105), CHILD(1120), CHILD(1135), - TILEGX_OPC_LDNA, TILEGX_OPC_LDNT1S, TILEGX_OPC_LDNT1U, - BITFIELD(31, 2) /* index 1075 */, - TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, CHILD(1080), - BITFIELD(33, 2) /* index 1080 */, - TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, CHILD(1085), - BITFIELD(35, 2) /* index 1085 */, - TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_PREFETCH, - BITFIELD(31, 2) /* index 1090 */, - TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, CHILD(1095), - BITFIELD(33, 2) /* index 1095 */, - TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, CHILD(1100), - BITFIELD(35, 
2) /* index 1100 */, - TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, - TILEGX_OPC_PREFETCH_L2_FAULT, - BITFIELD(31, 2) /* index 1105 */, - TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, CHILD(1110), - BITFIELD(33, 2) /* index 1110 */, - TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, CHILD(1115), - BITFIELD(35, 2) /* index 1115 */, - TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_PREFETCH_L2, - BITFIELD(31, 2) /* index 1120 */, - TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, CHILD(1125), - BITFIELD(33, 2) /* index 1125 */, - TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, CHILD(1130), - BITFIELD(35, 2) /* index 1130 */, - TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, - TILEGX_OPC_PREFETCH_L3_FAULT, - BITFIELD(31, 2) /* index 1135 */, - TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, CHILD(1140), - BITFIELD(33, 2) /* index 1140 */, - TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, CHILD(1145), - BITFIELD(35, 2) /* index 1145 */, - TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, TILEGX_OPC_PREFETCH_L3, - BITFIELD(43, 3) /* index 1150 */, - TILEGX_OPC_LDNT2S, TILEGX_OPC_LDNT2U, TILEGX_OPC_LDNT4S, TILEGX_OPC_LDNT4U, - TILEGX_OPC_LDNT, TILEGX_OPC_LD, TILEGX_OPC_LNK, TILEGX_OPC_MF, - BITFIELD(43, 3) /* index 1159 */, - TILEGX_OPC_NAP, TILEGX_OPC_NOP, TILEGX_OPC_SWINT0, TILEGX_OPC_SWINT1, - TILEGX_OPC_SWINT2, TILEGX_OPC_SWINT3, TILEGX_OPC_WH64, TILEGX_OPC_NONE, - BITFIELD(49, 4) /* index 1168 */, - TILEGX_OPC_V1MAXU, TILEGX_OPC_V1MINU, TILEGX_OPC_V1MNZ, TILEGX_OPC_V1MZ, - TILEGX_OPC_V1SHL, TILEGX_OPC_V1SHRS, TILEGX_OPC_V1SHRU, TILEGX_OPC_V1SUBUC, - TILEGX_OPC_V1SUB, TILEGX_OPC_V2ADDSC, TILEGX_OPC_V2ADD, TILEGX_OPC_V2CMPEQ, - TILEGX_OPC_V2CMPLES, TILEGX_OPC_V2CMPLEU, TILEGX_OPC_V2CMPLTS, - TILEGX_OPC_V2CMPLTU, - BITFIELD(49, 4) /* index 1185 */, - TILEGX_OPC_V2CMPNE, TILEGX_OPC_V2INT_H, TILEGX_OPC_V2INT_L, - TILEGX_OPC_V2MAXS, TILEGX_OPC_V2MINS, TILEGX_OPC_V2MNZ, TILEGX_OPC_V2MZ, - TILEGX_OPC_V2PACKH, TILEGX_OPC_V2PACKL, 
TILEGX_OPC_V2PACKUC, - TILEGX_OPC_V2SHLSC, TILEGX_OPC_V2SHL, TILEGX_OPC_V2SHRS, TILEGX_OPC_V2SHRU, - TILEGX_OPC_V2SUBSC, TILEGX_OPC_V2SUB, - BITFIELD(49, 4) /* index 1202 */, - TILEGX_OPC_V4ADDSC, TILEGX_OPC_V4ADD, TILEGX_OPC_V4INT_H, - TILEGX_OPC_V4INT_L, TILEGX_OPC_V4PACKSC, TILEGX_OPC_V4SHLSC, - TILEGX_OPC_V4SHL, TILEGX_OPC_V4SHRS, TILEGX_OPC_V4SHRU, TILEGX_OPC_V4SUBSC, - TILEGX_OPC_V4SUB, TILEGX_OPC_XOR, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(49, 4) /* index 1219 */, - TILEGX_OPC_NONE, TILEGX_OPC_ROTLI, TILEGX_OPC_SHLI, TILEGX_OPC_SHLXI, - TILEGX_OPC_SHRSI, TILEGX_OPC_SHRUI, TILEGX_OPC_SHRUXI, TILEGX_OPC_V1SHLI, - TILEGX_OPC_V1SHRSI, TILEGX_OPC_V1SHRUI, TILEGX_OPC_V2SHLI, - TILEGX_OPC_V2SHRSI, TILEGX_OPC_V2SHRUI, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, - BITFIELD(31, 2) /* index 1236 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(1241), - BITFIELD(33, 2) /* index 1241 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(1246), - BITFIELD(35, 2) /* index 1246 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(1251), - BITFIELD(37, 2) /* index 1251 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(1256), - BITFIELD(39, 2) /* index 1256 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - CHILD(1261), - BITFIELD(41, 2) /* index 1261 */, - TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, TILEGX_OPC_SHL16INSLI, - TILEGX_OPC_INFOL, -}; - -static const unsigned short decode_Y0_fsm[178] = -{ - BITFIELD(27, 4) /* index 0 */, - CHILD(17), TILEGX_OPC_ADDXI, CHILD(32), TILEGX_OPC_CMPEQI, - TILEGX_OPC_CMPLTSI, CHILD(62), CHILD(67), CHILD(118), CHILD(123), - CHILD(128), CHILD(133), CHILD(153), CHILD(158), CHILD(163), CHILD(168), - CHILD(173), - BITFIELD(6, 2) /* index 17 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(22), - BITFIELD(8, 2) /* index 22 */, - 
TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(27), - BITFIELD(10, 2) /* index 27 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_MOVEI, - BITFIELD(0, 2) /* index 32 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(37), - BITFIELD(2, 2) /* index 37 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(42), - BITFIELD(4, 2) /* index 42 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(47), - BITFIELD(6, 2) /* index 47 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(52), - BITFIELD(8, 2) /* index 52 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(57), - BITFIELD(10, 2) /* index 57 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_INFO, - BITFIELD(18, 2) /* index 62 */, - TILEGX_OPC_ADDX, TILEGX_OPC_ADD, TILEGX_OPC_SUBX, TILEGX_OPC_SUB, - BITFIELD(15, 5) /* index 67 */, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, - TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, - TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, CHILD(100), - CHILD(109), TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(12, 3) /* index 100 */, - TILEGX_OPC_NONE, TILEGX_OPC_CLZ, TILEGX_OPC_CTZ, TILEGX_OPC_FNOP, - TILEGX_OPC_FSINGLE_PACK1, TILEGX_OPC_NOP, TILEGX_OPC_PCNT, - TILEGX_OPC_REVBITS, - BITFIELD(12, 3) /* index 109 */, - TILEGX_OPC_REVBYTES, TILEGX_OPC_TBLIDXB0, TILEGX_OPC_TBLIDXB1, - TILEGX_OPC_TBLIDXB2, TILEGX_OPC_TBLIDXB3, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - TILEGX_OPC_NONE, - BITFIELD(18, 2) /* index 118 */, - TILEGX_OPC_CMPLES, TILEGX_OPC_CMPLEU, TILEGX_OPC_CMPLTS, 
TILEGX_OPC_CMPLTU, - BITFIELD(18, 2) /* index 123 */, - TILEGX_OPC_CMPEQ, TILEGX_OPC_CMPNE, TILEGX_OPC_MULAX, TILEGX_OPC_MULX, - BITFIELD(18, 2) /* index 128 */, - TILEGX_OPC_CMOVEQZ, TILEGX_OPC_CMOVNEZ, TILEGX_OPC_MNZ, TILEGX_OPC_MZ, - BITFIELD(18, 2) /* index 133 */, - TILEGX_OPC_AND, TILEGX_OPC_NOR, CHILD(138), TILEGX_OPC_XOR, - BITFIELD(12, 2) /* index 138 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(143), - BITFIELD(14, 2) /* index 143 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(148), - BITFIELD(16, 2) /* index 148 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_MOVE, - BITFIELD(18, 2) /* index 153 */, - TILEGX_OPC_ROTL, TILEGX_OPC_SHL, TILEGX_OPC_SHRS, TILEGX_OPC_SHRU, - BITFIELD(18, 2) /* index 158 */, - TILEGX_OPC_NONE, TILEGX_OPC_SHL1ADDX, TILEGX_OPC_SHL2ADDX, - TILEGX_OPC_SHL3ADDX, - BITFIELD(18, 2) /* index 163 */, - TILEGX_OPC_MUL_HS_HS, TILEGX_OPC_MUL_HU_HU, TILEGX_OPC_MUL_LS_LS, - TILEGX_OPC_MUL_LU_LU, - BITFIELD(18, 2) /* index 168 */, - TILEGX_OPC_MULA_HS_HS, TILEGX_OPC_MULA_HU_HU, TILEGX_OPC_MULA_LS_LS, - TILEGX_OPC_MULA_LU_LU, - BITFIELD(18, 2) /* index 173 */, - TILEGX_OPC_ROTLI, TILEGX_OPC_SHLI, TILEGX_OPC_SHRSI, TILEGX_OPC_SHRUI, -}; - -static const unsigned short decode_Y1_fsm[167] = -{ - BITFIELD(58, 4) /* index 0 */, - TILEGX_OPC_NONE, CHILD(17), TILEGX_OPC_ADDXI, CHILD(32), TILEGX_OPC_CMPEQI, - TILEGX_OPC_CMPLTSI, CHILD(62), CHILD(67), CHILD(117), CHILD(122), - CHILD(127), CHILD(132), CHILD(152), CHILD(157), CHILD(162), TILEGX_OPC_NONE, - BITFIELD(37, 2) /* index 17 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(22), - BITFIELD(39, 2) /* index 22 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, CHILD(27), - BITFIELD(41, 2) /* index 27 */, - TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_ADDI, TILEGX_OPC_MOVEI, - BITFIELD(31, 2) /* index 32 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(37), - BITFIELD(33, 2) /* index 37 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, 
TILEGX_OPC_ANDI, CHILD(42), - BITFIELD(35, 2) /* index 42 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(47), - BITFIELD(37, 2) /* index 47 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(52), - BITFIELD(39, 2) /* index 52 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, CHILD(57), - BITFIELD(41, 2) /* index 57 */, - TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_ANDI, TILEGX_OPC_INFO, - BITFIELD(49, 2) /* index 62 */, - TILEGX_OPC_ADDX, TILEGX_OPC_ADD, TILEGX_OPC_SUBX, TILEGX_OPC_SUB, - BITFIELD(47, 4) /* index 67 */, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL1ADD, - TILEGX_OPC_SHL1ADD, TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL2ADD, - TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL2ADD, TILEGX_OPC_SHL3ADD, - TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, TILEGX_OPC_SHL3ADD, CHILD(84), - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_NONE, - BITFIELD(43, 3) /* index 84 */, - CHILD(93), CHILD(96), CHILD(99), CHILD(102), CHILD(105), CHILD(108), - CHILD(111), CHILD(114), - BITFIELD(46, 1) /* index 93 */, - TILEGX_OPC_NONE, TILEGX_OPC_FNOP, - BITFIELD(46, 1) /* index 96 */, - TILEGX_OPC_NONE, TILEGX_OPC_ILL, - BITFIELD(46, 1) /* index 99 */, - TILEGX_OPC_NONE, TILEGX_OPC_JALRP, - BITFIELD(46, 1) /* index 102 */, - TILEGX_OPC_NONE, TILEGX_OPC_JALR, - BITFIELD(46, 1) /* index 105 */, - TILEGX_OPC_NONE, TILEGX_OPC_JRP, - BITFIELD(46, 1) /* index 108 */, - TILEGX_OPC_NONE, TILEGX_OPC_JR, - BITFIELD(46, 1) /* index 111 */, - TILEGX_OPC_NONE, TILEGX_OPC_LNK, - BITFIELD(46, 1) /* index 114 */, - TILEGX_OPC_NONE, TILEGX_OPC_NOP, - BITFIELD(49, 2) /* index 117 */, - TILEGX_OPC_CMPLES, TILEGX_OPC_CMPLEU, TILEGX_OPC_CMPLTS, TILEGX_OPC_CMPLTU, - BITFIELD(49, 2) /* index 122 */, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_CMPEQ, TILEGX_OPC_CMPNE, - BITFIELD(49, 2) /* index 127 */, - TILEGX_OPC_NONE, TILEGX_OPC_NONE, TILEGX_OPC_MNZ, TILEGX_OPC_MZ, - BITFIELD(49, 2) /* index 132 */, - TILEGX_OPC_AND, TILEGX_OPC_NOR, CHILD(137), TILEGX_OPC_XOR, - BITFIELD(43, 
2) /* index 137 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(142), - BITFIELD(45, 2) /* index 142 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, CHILD(147), - BITFIELD(47, 2) /* index 147 */, - TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_OR, TILEGX_OPC_MOVE, - BITFIELD(49, 2) /* index 152 */, - TILEGX_OPC_ROTL, TILEGX_OPC_SHL, TILEGX_OPC_SHRS, TILEGX_OPC_SHRU, - BITFIELD(49, 2) /* index 157 */, - TILEGX_OPC_NONE, TILEGX_OPC_SHL1ADDX, TILEGX_OPC_SHL2ADDX, - TILEGX_OPC_SHL3ADDX, - BITFIELD(49, 2) /* index 162 */, - TILEGX_OPC_ROTLI, TILEGX_OPC_SHLI, TILEGX_OPC_SHRSI, TILEGX_OPC_SHRUI, -}; - -static const unsigned short decode_Y2_fsm[118] = -{ - BITFIELD(62, 2) /* index 0 */, - TILEGX_OPC_NONE, CHILD(5), CHILD(66), CHILD(109), - BITFIELD(55, 3) /* index 5 */, - CHILD(14), CHILD(14), CHILD(14), CHILD(17), CHILD(40), CHILD(40), CHILD(40), - CHILD(43), - BITFIELD(26, 1) /* index 14 */, - TILEGX_OPC_LD1S, TILEGX_OPC_LD1U, - BITFIELD(26, 1) /* index 17 */, - CHILD(20), CHILD(30), - BITFIELD(51, 2) /* index 20 */, - TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, CHILD(25), - BITFIELD(53, 2) /* index 25 */, - TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, TILEGX_OPC_LD1S, - TILEGX_OPC_PREFETCH_L1_FAULT, - BITFIELD(51, 2) /* index 30 */, - TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, CHILD(35), - BITFIELD(53, 2) /* index 35 */, - TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_LD1U, TILEGX_OPC_PREFETCH, - BITFIELD(26, 1) /* index 40 */, - TILEGX_OPC_LD2S, TILEGX_OPC_LD2U, - BITFIELD(26, 1) /* index 43 */, - CHILD(46), CHILD(56), - BITFIELD(51, 2) /* index 46 */, - TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, CHILD(51), - BITFIELD(53, 2) /* index 51 */, - TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, TILEGX_OPC_LD2S, - TILEGX_OPC_PREFETCH_L2_FAULT, - BITFIELD(51, 2) /* index 56 */, - TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, CHILD(61), - BITFIELD(53, 2) /* index 61 */, - TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_LD2U, TILEGX_OPC_PREFETCH_L2, - BITFIELD(56, 2) /* 
index 66 */, - CHILD(71), CHILD(74), CHILD(90), CHILD(93), - BITFIELD(26, 1) /* index 71 */, - TILEGX_OPC_NONE, TILEGX_OPC_LD4S, - BITFIELD(26, 1) /* index 74 */, - TILEGX_OPC_NONE, CHILD(77), - BITFIELD(51, 2) /* index 77 */, - TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, CHILD(82), - BITFIELD(53, 2) /* index 82 */, - TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, TILEGX_OPC_LD4S, CHILD(87), - BITFIELD(55, 1) /* index 87 */, - TILEGX_OPC_LD4S, TILEGX_OPC_PREFETCH_L3_FAULT, - BITFIELD(26, 1) /* index 90 */, - TILEGX_OPC_LD4U, TILEGX_OPC_LD, - BITFIELD(26, 1) /* index 93 */, - CHILD(96), TILEGX_OPC_LD, - BITFIELD(51, 2) /* index 96 */, - TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, CHILD(101), - BITFIELD(53, 2) /* index 101 */, - TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, TILEGX_OPC_LD4U, CHILD(106), - BITFIELD(55, 1) /* index 106 */, - TILEGX_OPC_LD4U, TILEGX_OPC_PREFETCH_L3, - BITFIELD(26, 1) /* index 109 */, - CHILD(112), CHILD(115), - BITFIELD(57, 1) /* index 112 */, - TILEGX_OPC_ST1, TILEGX_OPC_ST4, - BITFIELD(57, 1) /* index 115 */, - TILEGX_OPC_ST2, TILEGX_OPC_ST, -}; - -#undef BITFIELD -#undef CHILD - -const unsigned short * const -tilegx_bundle_decoder_fsms[TILEGX_NUM_PIPELINE_ENCODINGS] = -{ - decode_X0_fsm, - decode_X1_fsm, - decode_Y0_fsm, - decode_Y1_fsm, - decode_Y2_fsm -}; - -const struct tilegx_operand tilegx_operands[35] = -{ - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_IMM8_X0), - 8, 1, 0, 0, 0, 0, - create_Imm8_X0, get_Imm8_X0 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_IMM8_X1), - 8, 1, 0, 0, 0, 0, - create_Imm8_X1, get_Imm8_X1 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_IMM8_Y0), - 8, 1, 0, 0, 0, 0, - create_Imm8_Y0, get_Imm8_Y0 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_IMM8_Y1), - 8, 1, 0, 0, 0, 0, - create_Imm8_Y1, get_Imm8_Y1 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_IMM16_X0_HW0_LAST), - 16, 1, 0, 0, 0, 0, - create_Imm16_X0, get_Imm16_X0 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, 
BFD_RELOC(TILEGX_IMM16_X1_HW0_LAST), - 16, 1, 0, 0, 0, 0, - create_Imm16_X1, get_Imm16_X1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 0, 1, 0, 0, - create_Dest_X1, get_Dest_X1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcA_X1, get_SrcA_X1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 0, 1, 0, 0, - create_Dest_X0, get_Dest_X0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcA_X0, get_SrcA_X0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 0, 1, 0, 0, - create_Dest_Y0, get_Dest_Y0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcA_Y0, get_SrcA_Y0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 0, 1, 0, 0, - create_Dest_Y1, get_Dest_Y1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcA_Y1, get_SrcA_Y1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcA_Y2, get_SrcA_Y2 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 1, 0, 0, - create_SrcA_X1, get_SrcA_X1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcB_X0, get_SrcB_X0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcB_X1, get_SrcB_X1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcB_Y0, get_SrcB_Y0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcB_Y1, get_SrcB_Y1 - }, - { - TILEGX_OP_TYPE_ADDRESS, BFD_RELOC(TILEGX_BROFF_X1), - 17, 1, 0, 0, 1, TILEGX_LOG2_BUNDLE_ALIGNMENT_IN_BYTES, - create_BrOff_X1, get_BrOff_X1 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_MMSTART_X0), - 6, 0, 0, 0, 0, 0, - create_BFStart_X0, get_BFStart_X0 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_MMEND_X0), - 6, 0, 0, 0, 0, 0, - create_BFEnd_X0, get_BFEnd_X0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 1, 0, 0, - 
create_Dest_X0, get_Dest_X0 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 1, 0, 0, - create_Dest_Y0, get_Dest_Y0 - }, - { - TILEGX_OP_TYPE_ADDRESS, BFD_RELOC(TILEGX_JUMPOFF_X1), - 27, 1, 0, 0, 1, TILEGX_LOG2_BUNDLE_ALIGNMENT_IN_BYTES, - create_JumpOff_X1, get_JumpOff_X1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 0, 1, 0, 0, - create_SrcBDest_Y2, get_SrcBDest_Y2 - }, - { - TILEGX_OP_TYPE_SPR, BFD_RELOC(TILEGX_MF_IMM14_X1), - 14, 0, 0, 0, 0, 0, - create_MF_Imm14_X1, get_MF_Imm14_X1 - }, - { - TILEGX_OP_TYPE_SPR, BFD_RELOC(TILEGX_MT_IMM14_X1), - 14, 0, 0, 0, 0, 0, - create_MT_Imm14_X1, get_MT_Imm14_X1 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_SHAMT_X0), - 6, 0, 0, 0, 0, 0, - create_ShAmt_X0, get_ShAmt_X0 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_SHAMT_X1), - 6, 0, 0, 0, 0, 0, - create_ShAmt_X1, get_ShAmt_X1 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_SHAMT_Y0), - 6, 0, 0, 0, 0, 0, - create_ShAmt_Y0, get_ShAmt_Y0 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_SHAMT_Y1), - 6, 0, 0, 0, 0, 0, - create_ShAmt_Y1, get_ShAmt_Y1 - }, - { - TILEGX_OP_TYPE_REGISTER, BFD_RELOC(NONE), - 6, 0, 1, 0, 0, 0, - create_SrcBDest_Y2, get_SrcBDest_Y2 - }, - { - TILEGX_OP_TYPE_IMMEDIATE, BFD_RELOC(TILEGX_DEST_IMM8_X1), - 8, 1, 0, 0, 0, 0, - create_Dest_Imm8_X1, get_Dest_Imm8_X1 - } -}; - -/* Given a set of bundle bits and a specific pipe, returns which - * instruction the bundle contains in that pipe. 
- */ -const struct tilegx_opcode * -find_opcode(tilegx_bundle_bits bits, tilegx_pipeline pipe) -{ - const unsigned short *table = tilegx_bundle_decoder_fsms[pipe]; - int index = 0; - - while (1) - { - unsigned short bitspec = table[index]; - unsigned int bitfield = - ((unsigned int)(bits >> (bitspec & 63))) & (bitspec >> 6); - - unsigned short next = table[index + 1 + bitfield]; - if (next <= TILEGX_OPC_NONE) - return &tilegx_opcodes[next]; - - index = next - TILEGX_OPC_NONE; - } -} - -int -parse_insn_tilegx(tilegx_bundle_bits bits, - unsigned long long pc, - struct tilegx_decoded_instruction - decoded[TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE]) -{ - int num_instructions = 0; - int pipe; - - int min_pipe, max_pipe; - if ((bits & TILEGX_BUNDLE_MODE_MASK) == 0) - { - min_pipe = TILEGX_PIPELINE_X0; - max_pipe = TILEGX_PIPELINE_X1; - } - else - { - min_pipe = TILEGX_PIPELINE_Y0; - max_pipe = TILEGX_PIPELINE_Y2; - } - - /* For each pipe, find an instruction that fits. */ - for (pipe = min_pipe; pipe <= max_pipe; pipe++) - { - const struct tilegx_opcode *opc; - struct tilegx_decoded_instruction *d; - int i; - - d = &decoded[num_instructions++]; - opc = find_opcode (bits, (tilegx_pipeline)pipe); - d->opcode = opc; - - /* Decode each operand, sign extending, etc. as appropriate. */ - for (i = 0; i < opc->num_operands; i++) - { - const struct tilegx_operand *op = - &tilegx_operands[opc->operands[pipe][i]]; - int raw_opval = op->extract (bits); - long long opval; - - if (op->is_signed) - { - /* Sign-extend the operand. */ - int shift = (int)((sizeof(int) * 8) - op->num_bits); - raw_opval = (raw_opval << shift) >> shift; - } - - /* Adjust PC-relative scaled branch offsets. */ - if (op->type == TILEGX_OP_TYPE_ADDRESS) - opval = (raw_opval * TILEGX_BUNDLE_SIZE_IN_BYTES) + pc; - else - opval = raw_opval; - - /* Record the final value. 
*/ - d->operands[i] = op; - d->operand_values[i] = opval; - } - } - - return num_instructions; -} - -struct tilegx_spr -{ - /* The number */ - int number; - - /* The name */ - const char *name; -}; - -static int -tilegx_spr_compare (const void *a_ptr, const void *b_ptr) -{ - const struct tilegx_spr *a = (const struct tilegx_spr *) a_ptr; - const struct tilegx_spr *b = (const struct tilegx_spr *) b_ptr; - return (a->number - b->number); -} - -const struct tilegx_spr tilegx_sprs[] = { - { 0, "MPL_MEM_ERROR_SET_0" }, - { 1, "MPL_MEM_ERROR_SET_1" }, - { 2, "MPL_MEM_ERROR_SET_2" }, - { 3, "MPL_MEM_ERROR_SET_3" }, - { 4, "MPL_MEM_ERROR" }, - { 5, "MEM_ERROR_CBOX_ADDR" }, - { 6, "MEM_ERROR_CBOX_STATUS" }, - { 7, "MEM_ERROR_ENABLE" }, - { 8, "MEM_ERROR_MBOX_ADDR" }, - { 9, "MEM_ERROR_MBOX_STATUS" }, - { 10, "SBOX_ERROR" }, - { 11, "XDN_DEMUX_ERROR" }, - { 256, "MPL_SINGLE_STEP_3_SET_0" }, - { 257, "MPL_SINGLE_STEP_3_SET_1" }, - { 258, "MPL_SINGLE_STEP_3_SET_2" }, - { 259, "MPL_SINGLE_STEP_3_SET_3" }, - { 260, "MPL_SINGLE_STEP_3" }, - { 261, "SINGLE_STEP_CONTROL_3" }, - { 512, "MPL_SINGLE_STEP_2_SET_0" }, - { 513, "MPL_SINGLE_STEP_2_SET_1" }, - { 514, "MPL_SINGLE_STEP_2_SET_2" }, - { 515, "MPL_SINGLE_STEP_2_SET_3" }, - { 516, "MPL_SINGLE_STEP_2" }, - { 517, "SINGLE_STEP_CONTROL_2" }, - { 768, "MPL_SINGLE_STEP_1_SET_0" }, - { 769, "MPL_SINGLE_STEP_1_SET_1" }, - { 770, "MPL_SINGLE_STEP_1_SET_2" }, - { 771, "MPL_SINGLE_STEP_1_SET_3" }, - { 772, "MPL_SINGLE_STEP_1" }, - { 773, "SINGLE_STEP_CONTROL_1" }, - { 1024, "MPL_SINGLE_STEP_0_SET_0" }, - { 1025, "MPL_SINGLE_STEP_0_SET_1" }, - { 1026, "MPL_SINGLE_STEP_0_SET_2" }, - { 1027, "MPL_SINGLE_STEP_0_SET_3" }, - { 1028, "MPL_SINGLE_STEP_0" }, - { 1029, "SINGLE_STEP_CONTROL_0" }, - { 1280, "MPL_IDN_COMPLETE_SET_0" }, - { 1281, "MPL_IDN_COMPLETE_SET_1" }, - { 1282, "MPL_IDN_COMPLETE_SET_2" }, - { 1283, "MPL_IDN_COMPLETE_SET_3" }, - { 1284, "MPL_IDN_COMPLETE" }, - { 1285, "IDN_COMPLETE_PENDING" }, - { 1536, "MPL_UDN_COMPLETE_SET_0" }, 
- { 1537, "MPL_UDN_COMPLETE_SET_1" }, - { 1538, "MPL_UDN_COMPLETE_SET_2" }, - { 1539, "MPL_UDN_COMPLETE_SET_3" }, - { 1540, "MPL_UDN_COMPLETE" }, - { 1541, "UDN_COMPLETE_PENDING" }, - { 1792, "MPL_ITLB_MISS_SET_0" }, - { 1793, "MPL_ITLB_MISS_SET_1" }, - { 1794, "MPL_ITLB_MISS_SET_2" }, - { 1795, "MPL_ITLB_MISS_SET_3" }, - { 1796, "MPL_ITLB_MISS" }, - { 1797, "ITLB_TSB_BASE_ADDR_0" }, - { 1798, "ITLB_TSB_BASE_ADDR_1" }, - { 1920, "ITLB_CURRENT_ATTR" }, - { 1921, "ITLB_CURRENT_PA" }, - { 1922, "ITLB_CURRENT_VA" }, - { 1923, "ITLB_INDEX" }, - { 1924, "ITLB_MATCH_0" }, - { 1925, "ITLB_PERF" }, - { 1926, "ITLB_PR" }, - { 1927, "ITLB_TSB_ADDR_0" }, - { 1928, "ITLB_TSB_ADDR_1" }, - { 1929, "ITLB_TSB_FILL_CURRENT_ATTR" }, - { 1930, "ITLB_TSB_FILL_MATCH" }, - { 1931, "NUMBER_ITLB" }, - { 1932, "REPLACEMENT_ITLB" }, - { 1933, "WIRED_ITLB" }, - { 2048, "MPL_ILL_SET_0" }, - { 2049, "MPL_ILL_SET_1" }, - { 2050, "MPL_ILL_SET_2" }, - { 2051, "MPL_ILL_SET_3" }, - { 2052, "MPL_ILL" }, - { 2304, "MPL_GPV_SET_0" }, - { 2305, "MPL_GPV_SET_1" }, - { 2306, "MPL_GPV_SET_2" }, - { 2307, "MPL_GPV_SET_3" }, - { 2308, "MPL_GPV" }, - { 2309, "GPV_REASON" }, - { 2560, "MPL_IDN_ACCESS_SET_0" }, - { 2561, "MPL_IDN_ACCESS_SET_1" }, - { 2562, "MPL_IDN_ACCESS_SET_2" }, - { 2563, "MPL_IDN_ACCESS_SET_3" }, - { 2564, "MPL_IDN_ACCESS" }, - { 2565, "IDN_DEMUX_COUNT_0" }, - { 2566, "IDN_DEMUX_COUNT_1" }, - { 2567, "IDN_FLUSH_EGRESS" }, - { 2568, "IDN_PENDING" }, - { 2569, "IDN_ROUTE_ORDER" }, - { 2570, "IDN_SP_FIFO_CNT" }, - { 2688, "IDN_DATA_AVAIL" }, - { 2816, "MPL_UDN_ACCESS_SET_0" }, - { 2817, "MPL_UDN_ACCESS_SET_1" }, - { 2818, "MPL_UDN_ACCESS_SET_2" }, - { 2819, "MPL_UDN_ACCESS_SET_3" }, - { 2820, "MPL_UDN_ACCESS" }, - { 2821, "UDN_DEMUX_COUNT_0" }, - { 2822, "UDN_DEMUX_COUNT_1" }, - { 2823, "UDN_DEMUX_COUNT_2" }, - { 2824, "UDN_DEMUX_COUNT_3" }, - { 2825, "UDN_FLUSH_EGRESS" }, - { 2826, "UDN_PENDING" }, - { 2827, "UDN_ROUTE_ORDER" }, - { 2828, "UDN_SP_FIFO_CNT" }, - { 2944, "UDN_DATA_AVAIL" }, - { 
3072, "MPL_SWINT_3_SET_0" }, - { 3073, "MPL_SWINT_3_SET_1" }, - { 3074, "MPL_SWINT_3_SET_2" }, - { 3075, "MPL_SWINT_3_SET_3" }, - { 3076, "MPL_SWINT_3" }, - { 3328, "MPL_SWINT_2_SET_0" }, - { 3329, "MPL_SWINT_2_SET_1" }, - { 3330, "MPL_SWINT_2_SET_2" }, - { 3331, "MPL_SWINT_2_SET_3" }, - { 3332, "MPL_SWINT_2" }, - { 3584, "MPL_SWINT_1_SET_0" }, - { 3585, "MPL_SWINT_1_SET_1" }, - { 3586, "MPL_SWINT_1_SET_2" }, - { 3587, "MPL_SWINT_1_SET_3" }, - { 3588, "MPL_SWINT_1" }, - { 3840, "MPL_SWINT_0_SET_0" }, - { 3841, "MPL_SWINT_0_SET_1" }, - { 3842, "MPL_SWINT_0_SET_2" }, - { 3843, "MPL_SWINT_0_SET_3" }, - { 3844, "MPL_SWINT_0" }, - { 4096, "MPL_ILL_TRANS_SET_0" }, - { 4097, "MPL_ILL_TRANS_SET_1" }, - { 4098, "MPL_ILL_TRANS_SET_2" }, - { 4099, "MPL_ILL_TRANS_SET_3" }, - { 4100, "MPL_ILL_TRANS" }, - { 4101, "ILL_TRANS_REASON" }, - { 4102, "ILL_VA_PC" }, - { 4352, "MPL_UNALIGN_DATA_SET_0" }, - { 4353, "MPL_UNALIGN_DATA_SET_1" }, - { 4354, "MPL_UNALIGN_DATA_SET_2" }, - { 4355, "MPL_UNALIGN_DATA_SET_3" }, - { 4356, "MPL_UNALIGN_DATA" }, - { 4608, "MPL_DTLB_MISS_SET_0" }, - { 4609, "MPL_DTLB_MISS_SET_1" }, - { 4610, "MPL_DTLB_MISS_SET_2" }, - { 4611, "MPL_DTLB_MISS_SET_3" }, - { 4612, "MPL_DTLB_MISS" }, - { 4613, "DTLB_TSB_BASE_ADDR_0" }, - { 4614, "DTLB_TSB_BASE_ADDR_1" }, - { 4736, "AAR" }, - { 4737, "CACHE_PINNED_WAYS" }, - { 4738, "DTLB_BAD_ADDR" }, - { 4739, "DTLB_BAD_ADDR_REASON" }, - { 4740, "DTLB_CURRENT_ATTR" }, - { 4741, "DTLB_CURRENT_PA" }, - { 4742, "DTLB_CURRENT_VA" }, - { 4743, "DTLB_INDEX" }, - { 4744, "DTLB_MATCH_0" }, - { 4745, "DTLB_PERF" }, - { 4746, "DTLB_TSB_ADDR_0" }, - { 4747, "DTLB_TSB_ADDR_1" }, - { 4748, "DTLB_TSB_FILL_CURRENT_ATTR" }, - { 4749, "DTLB_TSB_FILL_MATCH" }, - { 4750, "NUMBER_DTLB" }, - { 4751, "REPLACEMENT_DTLB" }, - { 4752, "WIRED_DTLB" }, - { 4864, "MPL_DTLB_ACCESS_SET_0" }, - { 4865, "MPL_DTLB_ACCESS_SET_1" }, - { 4866, "MPL_DTLB_ACCESS_SET_2" }, - { 4867, "MPL_DTLB_ACCESS_SET_3" }, - { 4868, "MPL_DTLB_ACCESS" }, - { 5120, 
"MPL_IDN_FIREWALL_SET_0" }, - { 5121, "MPL_IDN_FIREWALL_SET_1" }, - { 5122, "MPL_IDN_FIREWALL_SET_2" }, - { 5123, "MPL_IDN_FIREWALL_SET_3" }, - { 5124, "MPL_IDN_FIREWALL" }, - { 5125, "IDN_DIRECTION_PROTECT" }, - { 5376, "MPL_UDN_FIREWALL_SET_0" }, - { 5377, "MPL_UDN_FIREWALL_SET_1" }, - { 5378, "MPL_UDN_FIREWALL_SET_2" }, - { 5379, "MPL_UDN_FIREWALL_SET_3" }, - { 5380, "MPL_UDN_FIREWALL" }, - { 5381, "UDN_DIRECTION_PROTECT" }, - { 5632, "MPL_TILE_TIMER_SET_0" }, - { 5633, "MPL_TILE_TIMER_SET_1" }, - { 5634, "MPL_TILE_TIMER_SET_2" }, - { 5635, "MPL_TILE_TIMER_SET_3" }, - { 5636, "MPL_TILE_TIMER" }, - { 5637, "TILE_TIMER_CONTROL" }, - { 5888, "MPL_AUX_TILE_TIMER_SET_0" }, - { 5889, "MPL_AUX_TILE_TIMER_SET_1" }, - { 5890, "MPL_AUX_TILE_TIMER_SET_2" }, - { 5891, "MPL_AUX_TILE_TIMER_SET_3" }, - { 5892, "MPL_AUX_TILE_TIMER" }, - { 5893, "AUX_TILE_TIMER_CONTROL" }, - { 6144, "MPL_IDN_TIMER_SET_0" }, - { 6145, "MPL_IDN_TIMER_SET_1" }, - { 6146, "MPL_IDN_TIMER_SET_2" }, - { 6147, "MPL_IDN_TIMER_SET_3" }, - { 6148, "MPL_IDN_TIMER" }, - { 6149, "IDN_DEADLOCK_COUNT" }, - { 6150, "IDN_DEADLOCK_TIMEOUT" }, - { 6400, "MPL_UDN_TIMER_SET_0" }, - { 6401, "MPL_UDN_TIMER_SET_1" }, - { 6402, "MPL_UDN_TIMER_SET_2" }, - { 6403, "MPL_UDN_TIMER_SET_3" }, - { 6404, "MPL_UDN_TIMER" }, - { 6405, "UDN_DEADLOCK_COUNT" }, - { 6406, "UDN_DEADLOCK_TIMEOUT" }, - { 6656, "MPL_IDN_AVAIL_SET_0" }, - { 6657, "MPL_IDN_AVAIL_SET_1" }, - { 6658, "MPL_IDN_AVAIL_SET_2" }, - { 6659, "MPL_IDN_AVAIL_SET_3" }, - { 6660, "MPL_IDN_AVAIL" }, - { 6661, "IDN_AVAIL_EN" }, - { 6912, "MPL_UDN_AVAIL_SET_0" }, - { 6913, "MPL_UDN_AVAIL_SET_1" }, - { 6914, "MPL_UDN_AVAIL_SET_2" }, - { 6915, "MPL_UDN_AVAIL_SET_3" }, - { 6916, "MPL_UDN_AVAIL" }, - { 6917, "UDN_AVAIL_EN" }, - { 7168, "MPL_IPI_3_SET_0" }, - { 7169, "MPL_IPI_3_SET_1" }, - { 7170, "MPL_IPI_3_SET_2" }, - { 7171, "MPL_IPI_3_SET_3" }, - { 7172, "MPL_IPI_3" }, - { 7173, "IPI_EVENT_3" }, - { 7174, "IPI_EVENT_RESET_3" }, - { 7175, "IPI_EVENT_SET_3" }, - { 7176, 
"IPI_MASK_3" }, - { 7177, "IPI_MASK_RESET_3" }, - { 7178, "IPI_MASK_SET_3" }, - { 7424, "MPL_IPI_2_SET_0" }, - { 7425, "MPL_IPI_2_SET_1" }, - { 7426, "MPL_IPI_2_SET_2" }, - { 7427, "MPL_IPI_2_SET_3" }, - { 7428, "MPL_IPI_2" }, - { 7429, "IPI_EVENT_2" }, - { 7430, "IPI_EVENT_RESET_2" }, - { 7431, "IPI_EVENT_SET_2" }, - { 7432, "IPI_MASK_2" }, - { 7433, "IPI_MASK_RESET_2" }, - { 7434, "IPI_MASK_SET_2" }, - { 7680, "MPL_IPI_1_SET_0" }, - { 7681, "MPL_IPI_1_SET_1" }, - { 7682, "MPL_IPI_1_SET_2" }, - { 7683, "MPL_IPI_1_SET_3" }, - { 7684, "MPL_IPI_1" }, - { 7685, "IPI_EVENT_1" }, - { 7686, "IPI_EVENT_RESET_1" }, - { 7687, "IPI_EVENT_SET_1" }, - { 7688, "IPI_MASK_1" }, - { 7689, "IPI_MASK_RESET_1" }, - { 7690, "IPI_MASK_SET_1" }, - { 7936, "MPL_IPI_0_SET_0" }, - { 7937, "MPL_IPI_0_SET_1" }, - { 7938, "MPL_IPI_0_SET_2" }, - { 7939, "MPL_IPI_0_SET_3" }, - { 7940, "MPL_IPI_0" }, - { 7941, "IPI_EVENT_0" }, - { 7942, "IPI_EVENT_RESET_0" }, - { 7943, "IPI_EVENT_SET_0" }, - { 7944, "IPI_MASK_0" }, - { 7945, "IPI_MASK_RESET_0" }, - { 7946, "IPI_MASK_SET_0" }, - { 8192, "MPL_PERF_COUNT_SET_0" }, - { 8193, "MPL_PERF_COUNT_SET_1" }, - { 8194, "MPL_PERF_COUNT_SET_2" }, - { 8195, "MPL_PERF_COUNT_SET_3" }, - { 8196, "MPL_PERF_COUNT" }, - { 8197, "PERF_COUNT_0" }, - { 8198, "PERF_COUNT_1" }, - { 8199, "PERF_COUNT_CTL" }, - { 8200, "PERF_COUNT_DN_CTL" }, - { 8201, "PERF_COUNT_STS" }, - { 8202, "WATCH_MASK" }, - { 8203, "WATCH_VAL" }, - { 8448, "MPL_AUX_PERF_COUNT_SET_0" }, - { 8449, "MPL_AUX_PERF_COUNT_SET_1" }, - { 8450, "MPL_AUX_PERF_COUNT_SET_2" }, - { 8451, "MPL_AUX_PERF_COUNT_SET_3" }, - { 8452, "MPL_AUX_PERF_COUNT" }, - { 8453, "AUX_PERF_COUNT_0" }, - { 8454, "AUX_PERF_COUNT_1" }, - { 8455, "AUX_PERF_COUNT_CTL" }, - { 8456, "AUX_PERF_COUNT_STS" }, - { 8704, "MPL_INTCTRL_3_SET_0" }, - { 8705, "MPL_INTCTRL_3_SET_1" }, - { 8706, "MPL_INTCTRL_3_SET_2" }, - { 8707, "MPL_INTCTRL_3_SET_3" }, - { 8708, "MPL_INTCTRL_3" }, - { 8709, "INTCTRL_3_STATUS" }, - { 8710, "INTERRUPT_MASK_3" }, - { 
8711, "INTERRUPT_MASK_RESET_3" }, - { 8712, "INTERRUPT_MASK_SET_3" }, - { 8713, "INTERRUPT_VECTOR_BASE_3" }, - { 8714, "SINGLE_STEP_EN_0_3" }, - { 8715, "SINGLE_STEP_EN_1_3" }, - { 8716, "SINGLE_STEP_EN_2_3" }, - { 8717, "SINGLE_STEP_EN_3_3" }, - { 8832, "EX_CONTEXT_3_0" }, - { 8833, "EX_CONTEXT_3_1" }, - { 8834, "SYSTEM_SAVE_3_0" }, - { 8835, "SYSTEM_SAVE_3_1" }, - { 8836, "SYSTEM_SAVE_3_2" }, - { 8837, "SYSTEM_SAVE_3_3" }, - { 8960, "MPL_INTCTRL_2_SET_0" }, - { 8961, "MPL_INTCTRL_2_SET_1" }, - { 8962, "MPL_INTCTRL_2_SET_2" }, - { 8963, "MPL_INTCTRL_2_SET_3" }, - { 8964, "MPL_INTCTRL_2" }, - { 8965, "INTCTRL_2_STATUS" }, - { 8966, "INTERRUPT_MASK_2" }, - { 8967, "INTERRUPT_MASK_RESET_2" }, - { 8968, "INTERRUPT_MASK_SET_2" }, - { 8969, "INTERRUPT_VECTOR_BASE_2" }, - { 8970, "SINGLE_STEP_EN_0_2" }, - { 8971, "SINGLE_STEP_EN_1_2" }, - { 8972, "SINGLE_STEP_EN_2_2" }, - { 8973, "SINGLE_STEP_EN_3_2" }, - { 9088, "EX_CONTEXT_2_0" }, - { 9089, "EX_CONTEXT_2_1" }, - { 9090, "SYSTEM_SAVE_2_0" }, - { 9091, "SYSTEM_SAVE_2_1" }, - { 9092, "SYSTEM_SAVE_2_2" }, - { 9093, "SYSTEM_SAVE_2_3" }, - { 9216, "MPL_INTCTRL_1_SET_0" }, - { 9217, "MPL_INTCTRL_1_SET_1" }, - { 9218, "MPL_INTCTRL_1_SET_2" }, - { 9219, "MPL_INTCTRL_1_SET_3" }, - { 9220, "MPL_INTCTRL_1" }, - { 9221, "INTCTRL_1_STATUS" }, - { 9222, "INTERRUPT_MASK_1" }, - { 9223, "INTERRUPT_MASK_RESET_1" }, - { 9224, "INTERRUPT_MASK_SET_1" }, - { 9225, "INTERRUPT_VECTOR_BASE_1" }, - { 9226, "SINGLE_STEP_EN_0_1" }, - { 9227, "SINGLE_STEP_EN_1_1" }, - { 9228, "SINGLE_STEP_EN_2_1" }, - { 9229, "SINGLE_STEP_EN_3_1" }, - { 9344, "EX_CONTEXT_1_0" }, - { 9345, "EX_CONTEXT_1_1" }, - { 9346, "SYSTEM_SAVE_1_0" }, - { 9347, "SYSTEM_SAVE_1_1" }, - { 9348, "SYSTEM_SAVE_1_2" }, - { 9349, "SYSTEM_SAVE_1_3" }, - { 9472, "MPL_INTCTRL_0_SET_0" }, - { 9473, "MPL_INTCTRL_0_SET_1" }, - { 9474, "MPL_INTCTRL_0_SET_2" }, - { 9475, "MPL_INTCTRL_0_SET_3" }, - { 9476, "MPL_INTCTRL_0" }, - { 9477, "INTCTRL_0_STATUS" }, - { 9478, "INTERRUPT_MASK_0" }, - { 
9479, "INTERRUPT_MASK_RESET_0" }, - { 9480, "INTERRUPT_MASK_SET_0" }, - { 9481, "INTERRUPT_VECTOR_BASE_0" }, - { 9482, "SINGLE_STEP_EN_0_0" }, - { 9483, "SINGLE_STEP_EN_1_0" }, - { 9484, "SINGLE_STEP_EN_2_0" }, - { 9485, "SINGLE_STEP_EN_3_0" }, - { 9600, "EX_CONTEXT_0_0" }, - { 9601, "EX_CONTEXT_0_1" }, - { 9602, "SYSTEM_SAVE_0_0" }, - { 9603, "SYSTEM_SAVE_0_1" }, - { 9604, "SYSTEM_SAVE_0_2" }, - { 9605, "SYSTEM_SAVE_0_3" }, - { 9728, "MPL_BOOT_ACCESS_SET_0" }, - { 9729, "MPL_BOOT_ACCESS_SET_1" }, - { 9730, "MPL_BOOT_ACCESS_SET_2" }, - { 9731, "MPL_BOOT_ACCESS_SET_3" }, - { 9732, "MPL_BOOT_ACCESS" }, - { 9733, "BIG_ENDIAN_CONFIG" }, - { 9734, "CACHE_INVALIDATION_COMPRESSION_MODE" }, - { 9735, "CACHE_INVALIDATION_MASK_0" }, - { 9736, "CACHE_INVALIDATION_MASK_1" }, - { 9737, "CACHE_INVALIDATION_MASK_2" }, - { 9738, "CBOX_CACHEASRAM_CONFIG" }, - { 9739, "CBOX_CACHE_CONFIG" }, - { 9740, "CBOX_HOME_MAP_ADDR" }, - { 9741, "CBOX_HOME_MAP_DATA" }, - { 9742, "CBOX_MMAP_0" }, - { 9743, "CBOX_MMAP_1" }, - { 9744, "CBOX_MMAP_2" }, - { 9745, "CBOX_MMAP_3" }, - { 9746, "CBOX_MSR" }, - { 9747, "DIAG_BCST_CTL" }, - { 9748, "DIAG_BCST_MASK" }, - { 9749, "DIAG_BCST_TRIGGER" }, - { 9750, "DIAG_MUX_CTL" }, - { 9751, "DIAG_TRACE_CTL" }, - { 9752, "DIAG_TRACE_DATA" }, - { 9753, "DIAG_TRACE_STS" }, - { 9754, "IDN_DEMUX_BUF_THRESH" }, - { 9755, "L1_I_PIN_WAY_0" }, - { 9756, "MEM_ROUTE_ORDER" }, - { 9757, "MEM_STRIPE_CONFIG" }, - { 9758, "PERF_COUNT_PLS" }, - { 9759, "PSEUDO_RANDOM_NUMBER_MODIFY" }, - { 9760, "QUIESCE_CTL" }, - { 9761, "RSHIM_COORD" }, - { 9762, "SBOX_CONFIG" }, - { 9763, "UDN_DEMUX_BUF_THRESH" }, - { 9764, "XDN_CORE_STARVATION_COUNT" }, - { 9765, "XDN_ROUND_ROBIN_ARB_CTL" }, - { 9856, "CYCLE_MODIFY" }, - { 9857, "I_AAR" }, - { 9984, "MPL_WORLD_ACCESS_SET_0" }, - { 9985, "MPL_WORLD_ACCESS_SET_1" }, - { 9986, "MPL_WORLD_ACCESS_SET_2" }, - { 9987, "MPL_WORLD_ACCESS_SET_3" }, - { 9988, "MPL_WORLD_ACCESS" }, - { 9989, "DONE" }, - { 9990, "DSTREAM_PF" }, - { 9991, "FAIL" }, - { 
9992, "INTERRUPT_CRITICAL_SECTION" }, - { 9993, "PASS" }, - { 9994, "PSEUDO_RANDOM_NUMBER" }, - { 9995, "TILE_COORD" }, - { 9996, "TILE_RTF_HWM" }, - { 10112, "CMPEXCH_VALUE" }, - { 10113, "CYCLE" }, - { 10114, "EVENT_BEGIN" }, - { 10115, "EVENT_END" }, - { 10116, "PROC_STATUS" }, - { 10117, "SIM_CONTROL" }, - { 10118, "SIM_SOCKET" }, - { 10119, "STATUS_SATURATE" }, - { 10240, "MPL_I_ASID_SET_0" }, - { 10241, "MPL_I_ASID_SET_1" }, - { 10242, "MPL_I_ASID_SET_2" }, - { 10243, "MPL_I_ASID_SET_3" }, - { 10244, "MPL_I_ASID" }, - { 10245, "I_ASID" }, - { 10496, "MPL_D_ASID_SET_0" }, - { 10497, "MPL_D_ASID_SET_1" }, - { 10498, "MPL_D_ASID_SET_2" }, - { 10499, "MPL_D_ASID_SET_3" }, - { 10500, "MPL_D_ASID" }, - { 10501, "D_ASID" }, - { 10752, "MPL_DOUBLE_FAULT_SET_0" }, - { 10753, "MPL_DOUBLE_FAULT_SET_1" }, - { 10754, "MPL_DOUBLE_FAULT_SET_2" }, - { 10755, "MPL_DOUBLE_FAULT_SET_3" }, - { 10756, "MPL_DOUBLE_FAULT" }, - { 10757, "LAST_INTERRUPT_REASON" }, -}; - -const int tilegx_num_sprs = 441; - -const char * -get_tilegx_spr_name (int num) -{ - void *result; - struct tilegx_spr key; - - key.number = num; - result = bsearch((const void *) &key, (const void *) tilegx_sprs, - tilegx_num_sprs, sizeof (struct tilegx_spr), - tilegx_spr_compare); - - if (result == NULL) - { - return (NULL); - } - else - { - struct tilegx_spr *result_ptr = (struct tilegx_spr *) result; - return (result_ptr->name); - } -} - -int -print_insn_tilegx (unsigned char * memaddr) -{ - struct tilegx_decoded_instruction - decoded[TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE]; - unsigned char opbuf[TILEGX_BUNDLE_SIZE_IN_BYTES]; - int i, num_instructions, num_printed; - tilegx_mnemonic padding_mnemonic; - - memcpy((void *)opbuf, (void *)memaddr, TILEGX_BUNDLE_SIZE_IN_BYTES); - - /* Parse the instructions in the bundle. */ - num_instructions = - parse_insn_tilegx (*(unsigned long long *)opbuf, (unsigned long long)memaddr, decoded); - - /* Print the instructions in the bundle. 
*/ - printf("{ "); - num_printed = 0; - - /* Determine which nop opcode is used for padding and should be skipped. */ - padding_mnemonic = TILEGX_OPC_FNOP; - for (i = 0; i < num_instructions; i++) - { - if (!decoded[i].opcode->can_bundle) - { - /* Instructions that cannot be bundled are padded out with nops, - rather than fnops. Displaying them is always clutter. */ - padding_mnemonic = TILEGX_OPC_NOP; - break; - } - } - - for (i = 0; i < num_instructions; i++) - { - const struct tilegx_opcode *opcode = decoded[i].opcode; - const char *name; - int j; - - /* Do not print out fnops, unless everything is an fnop, in - which case we will print out just the last one. */ - if (opcode->mnemonic == padding_mnemonic - && (num_printed > 0 || i + 1 < num_instructions)) - continue; - - if (num_printed > 0) - printf(" ; "); - ++num_printed; - - name = opcode->name; - if (name == NULL) - name = ""; - printf("%s", name); - - for (j = 0; j < opcode->num_operands; j++) - { - unsigned long long num; - const struct tilegx_operand *op; - const char *spr_name; - - if (j > 0) - printf (","); - printf (" "); - - num = decoded[i].operand_values[j]; - - op = decoded[i].operands[j]; - switch (op->type) - { - case TILEGX_OP_TYPE_REGISTER: - printf ("%s", tilegx_register_names[(int)num]); - break; - case TILEGX_OP_TYPE_SPR: - spr_name = get_tilegx_spr_name(num); - if (spr_name != NULL) - printf ("%s", spr_name); - else - printf ("%d", (int)num); - break; - case TILEGX_OP_TYPE_IMMEDIATE: - printf ("%d", (int)num); - break; - case TILEGX_OP_TYPE_ADDRESS: - printf ("0x%016llx", num); - break; - default: - abort (); - } - } - } - printf (" }\n"); - - return TILEGX_BUNDLE_SIZE_IN_BYTES; -} diff --git a/src/pcre/sljit/sljitNativeTILEGX_64.c b/src/pcre/sljit/sljitNativeTILEGX_64.c deleted file mode 100644 index 003f43a7..00000000 --- a/src/pcre/sljit/sljitNativeTILEGX_64.c +++ /dev/null @@ -1,2555 +0,0 @@ -/* - * Stack-less Just-In-Time compiler - * - * Copyright 2013-2013 Tilera 
Corporation(jiwang@tilera.com). All rights reserved. - * Copyright Zoltan Herczeg (hzmester@freemail.hu). All rights reserved. - * - * Redistribution and use in source and binary forms, with or without modification, are - * permitted provided that the following conditions are met: - * - * 1. Redistributions of source code must retain the above copyright notice, this list of - * conditions and the following disclaimer. - * - * 2. Redistributions in binary form must reproduce the above copyright notice, this list - * of conditions and the following disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) AND CONTRIBUTORS ``AS IS'' AND ANY - * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES - * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT - * SHALL THE COPYRIGHT HOLDER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, - * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED - * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR - * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN - * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - */ - -/* TileGX architecture. */ -/* Contributed by Tilera Corporation. 
*/ -#include "sljitNativeTILEGX-encoder.c" - -#define SIMM_8BIT_MAX (0x7f) -#define SIMM_8BIT_MIN (-0x80) -#define SIMM_16BIT_MAX (0x7fff) -#define SIMM_16BIT_MIN (-0x8000) -#define SIMM_17BIT_MAX (0xffff) -#define SIMM_17BIT_MIN (-0x10000) -#define SIMM_32BIT_MAX (0x7fffffff) -#define SIMM_32BIT_MIN (-0x7fffffff - 1) -#define SIMM_48BIT_MAX (0x7fffffff0000L) -#define SIMM_48BIT_MIN (-0x800000000000L) -#define IMM16(imm) ((imm) & 0xffff) - -#define UIMM_16BIT_MAX (0xffff) - -#define TMP_REG1 (SLJIT_NUMBER_OF_REGISTERS + 2) -#define TMP_REG2 (SLJIT_NUMBER_OF_REGISTERS + 3) -#define TMP_REG3 (SLJIT_NUMBER_OF_REGISTERS + 4) -#define ADDR_TMP (SLJIT_NUMBER_OF_REGISTERS + 5) -#define PIC_ADDR_REG TMP_REG2 - -static const sljit_u8 reg_map[SLJIT_NUMBER_OF_REGISTERS + 6] = { - 63, 0, 1, 2, 3, 4, 30, 31, 32, 33, 34, 54, 5, 16, 6, 7 -}; - -#define SLJIT_LOCALS_REG_mapped 54 -#define TMP_REG1_mapped 5 -#define TMP_REG2_mapped 16 -#define TMP_REG3_mapped 6 -#define ADDR_TMP_mapped 7 - -/* Flags are keept in volatile registers. */ -#define EQUAL_FLAG 8 -/* And carry flag as well. */ -#define ULESS_FLAG 9 -#define UGREATER_FLAG 10 -#define LESS_FLAG 11 -#define GREATER_FLAG 12 -#define OVERFLOW_FLAG 13 - -#define ZERO 63 -#define RA 55 -#define TMP_EREG1 14 -#define TMP_EREG2 15 - -#define LOAD_DATA 0x01 -#define WORD_DATA 0x00 -#define BYTE_DATA 0x02 -#define HALF_DATA 0x04 -#define INT_DATA 0x06 -#define SIGNED_DATA 0x08 -#define DOUBLE_DATA 0x10 - -/* Separates integer and floating point registers */ -#define GPR_REG 0xf - -#define MEM_MASK 0x1f - -#define WRITE_BACK 0x00020 -#define ARG_TEST 0x00040 -#define ALT_KEEP_CACHE 0x00080 -#define CUMULATIVE_OP 0x00100 -#define LOGICAL_OP 0x00200 -#define IMM_OP 0x00400 -#define SRC2_IMM 0x00800 - -#define UNUSED_DEST 0x01000 -#define REG_DEST 0x02000 -#define REG1_SOURCE 0x04000 -#define REG2_SOURCE 0x08000 -#define SLOW_SRC1 0x10000 -#define SLOW_SRC2 0x20000 -#define SLOW_DEST 0x40000 - -/* Only these flags are set. 
UNUSED_DEST is not set when no flags should be set. - */ -#define CHECK_FLAGS(list) (!(flags & UNUSED_DEST) || (op & GET_FLAGS(~(list)))) - -SLJIT_API_FUNC_ATTRIBUTE const char *sljit_get_platform_name(void) -{ - return "TileGX" SLJIT_CPUINFO; -} - -/* Length of an instruction word */ -typedef sljit_uw sljit_ins; - -struct jit_instr { - const struct tilegx_opcode* opcode; - tilegx_pipeline pipe; - unsigned long input_registers; - unsigned long output_registers; - int operand_value[4]; - int line; -}; - -/* Opcode Helper Macros */ -#define TILEGX_X_MODE 0 - -#define X_MODE create_Mode(TILEGX_X_MODE) - -#define FNOP_X0 \ - create_Opcode_X0(RRR_0_OPCODE_X0) | \ - create_RRROpcodeExtension_X0(UNARY_RRR_0_OPCODE_X0) | \ - create_UnaryOpcodeExtension_X0(FNOP_UNARY_OPCODE_X0) - -#define FNOP_X1 \ - create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(UNARY_RRR_0_OPCODE_X1) | \ - create_UnaryOpcodeExtension_X1(FNOP_UNARY_OPCODE_X1) - -#define NOP \ - create_Mode(TILEGX_X_MODE) | FNOP_X0 | FNOP_X1 - -#define ANOP_X0 \ - create_Opcode_X0(RRR_0_OPCODE_X0) | \ - create_RRROpcodeExtension_X0(UNARY_RRR_0_OPCODE_X0) | \ - create_UnaryOpcodeExtension_X0(NOP_UNARY_OPCODE_X0) - -#define BPT create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(UNARY_RRR_0_OPCODE_X1) | \ - create_UnaryOpcodeExtension_X1(ILL_UNARY_OPCODE_X1) | \ - create_Dest_X1(0x1C) | create_SrcA_X1(0x25) | ANOP_X0 - -#define ADD_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(ADD_RRR_0_OPCODE_X1) | FNOP_X0 - -#define ADDI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(IMM8_OPCODE_X1) | \ - create_Imm8OpcodeExtension_X1(ADDI_IMM8_OPCODE_X1) | FNOP_X0 - -#define SUB_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(SUB_RRR_0_OPCODE_X1) | FNOP_X0 - -#define NOR_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - 
create_RRROpcodeExtension_X1(NOR_RRR_0_OPCODE_X1) | FNOP_X0 - -#define OR_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(OR_RRR_0_OPCODE_X1) | FNOP_X0 - -#define AND_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(AND_RRR_0_OPCODE_X1) | FNOP_X0 - -#define XOR_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(XOR_RRR_0_OPCODE_X1) | FNOP_X0 - -#define CMOVNEZ_X0 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X0(RRR_0_OPCODE_X0) | \ - create_RRROpcodeExtension_X0(CMOVNEZ_RRR_0_OPCODE_X0) | FNOP_X1 - -#define CMOVEQZ_X0 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X0(RRR_0_OPCODE_X0) | \ - create_RRROpcodeExtension_X0(CMOVEQZ_RRR_0_OPCODE_X0) | FNOP_X1 - -#define ADDLI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(ADDLI_OPCODE_X1) | FNOP_X0 - -#define V4INT_L_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(V4INT_L_RRR_0_OPCODE_X1) | FNOP_X0 - -#define BFEXTU_X0 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X0(BF_OPCODE_X0) | \ - create_BFOpcodeExtension_X0(BFEXTU_BF_OPCODE_X0) | FNOP_X1 - -#define BFEXTS_X0 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X0(BF_OPCODE_X0) | \ - create_BFOpcodeExtension_X0(BFEXTS_BF_OPCODE_X0) | FNOP_X1 - -#define SHL16INSLI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(SHL16INSLI_OPCODE_X1) | FNOP_X0 - -#define ST_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(ST_RRR_0_OPCODE_X1) | create_Dest_X1(0x0) | FNOP_X0 - -#define LD_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(UNARY_RRR_0_OPCODE_X1) | \ - create_UnaryOpcodeExtension_X1(LD_UNARY_OPCODE_X1) | FNOP_X0 - -#define JR_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - 
create_RRROpcodeExtension_X1(UNARY_RRR_0_OPCODE_X1) | \ - create_UnaryOpcodeExtension_X1(JR_UNARY_OPCODE_X1) | FNOP_X0 - -#define JALR_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(UNARY_RRR_0_OPCODE_X1) | \ - create_UnaryOpcodeExtension_X1(JALR_UNARY_OPCODE_X1) | FNOP_X0 - -#define CLZ_X0 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X0(RRR_0_OPCODE_X0) | \ - create_RRROpcodeExtension_X0(UNARY_RRR_0_OPCODE_X0) | \ - create_UnaryOpcodeExtension_X0(CNTLZ_UNARY_OPCODE_X0) | FNOP_X1 - -#define CMPLTUI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(IMM8_OPCODE_X1) | \ - create_Imm8OpcodeExtension_X1(CMPLTUI_IMM8_OPCODE_X1) | FNOP_X0 - -#define CMPLTU_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(CMPLTU_RRR_0_OPCODE_X1) | FNOP_X0 - -#define CMPLTS_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(CMPLTS_RRR_0_OPCODE_X1) | FNOP_X0 - -#define XORI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(IMM8_OPCODE_X1) | \ - create_Imm8OpcodeExtension_X1(XORI_IMM8_OPCODE_X1) | FNOP_X0 - -#define ORI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(IMM8_OPCODE_X1) | \ - create_Imm8OpcodeExtension_X1(ORI_IMM8_OPCODE_X1) | FNOP_X0 - -#define ANDI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(IMM8_OPCODE_X1) | \ - create_Imm8OpcodeExtension_X1(ANDI_IMM8_OPCODE_X1) | FNOP_X0 - -#define SHLI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(SHIFT_OPCODE_X1) | \ - create_ShiftOpcodeExtension_X1(SHLI_SHIFT_OPCODE_X1) | FNOP_X0 - -#define SHL_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(SHL_RRR_0_OPCODE_X1) | FNOP_X0 - -#define SHRSI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(SHIFT_OPCODE_X1) | \ - create_ShiftOpcodeExtension_X1(SHRSI_SHIFT_OPCODE_X1) | FNOP_X0 - -#define SHRS_X1 \ - create_Mode(TILEGX_X_MODE) | 
create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(SHRS_RRR_0_OPCODE_X1) | FNOP_X0 - -#define SHRUI_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(SHIFT_OPCODE_X1) | \ - create_ShiftOpcodeExtension_X1(SHRUI_SHIFT_OPCODE_X1) | FNOP_X0 - -#define SHRU_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(RRR_0_OPCODE_X1) | \ - create_RRROpcodeExtension_X1(SHRU_RRR_0_OPCODE_X1) | FNOP_X0 - -#define BEQZ_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(BRANCH_OPCODE_X1) | \ - create_BrType_X1(BEQZ_BRANCH_OPCODE_X1) | FNOP_X0 - -#define BNEZ_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(BRANCH_OPCODE_X1) | \ - create_BrType_X1(BNEZ_BRANCH_OPCODE_X1) | FNOP_X0 - -#define J_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(JUMP_OPCODE_X1) | \ - create_JumpOpcodeExtension_X1(J_JUMP_OPCODE_X1) | FNOP_X0 - -#define JAL_X1 \ - create_Mode(TILEGX_X_MODE) | create_Opcode_X1(JUMP_OPCODE_X1) | \ - create_JumpOpcodeExtension_X1(JAL_JUMP_OPCODE_X1) | FNOP_X0 - -#define DEST_X0(x) create_Dest_X0(x) -#define SRCA_X0(x) create_SrcA_X0(x) -#define SRCB_X0(x) create_SrcB_X0(x) -#define DEST_X1(x) create_Dest_X1(x) -#define SRCA_X1(x) create_SrcA_X1(x) -#define SRCB_X1(x) create_SrcB_X1(x) -#define IMM16_X1(x) create_Imm16_X1(x) -#define IMM8_X1(x) create_Imm8_X1(x) -#define BFSTART_X0(x) create_BFStart_X0(x) -#define BFEND_X0(x) create_BFEnd_X0(x) -#define SHIFTIMM_X1(x) create_ShAmt_X1(x) -#define JOFF_X1(x) create_JumpOff_X1(x) -#define BOFF_X1(x) create_BrOff_X1(x) - -static const tilegx_mnemonic data_transfer_insts[16] = { - /* u w s */ TILEGX_OPC_ST /* st */, - /* u w l */ TILEGX_OPC_LD /* ld */, - /* u b s */ TILEGX_OPC_ST1 /* st1 */, - /* u b l */ TILEGX_OPC_LD1U /* ld1u */, - /* u h s */ TILEGX_OPC_ST2 /* st2 */, - /* u h l */ TILEGX_OPC_LD2U /* ld2u */, - /* u i s */ TILEGX_OPC_ST4 /* st4 */, - /* u i l */ TILEGX_OPC_LD4U /* ld4u */, - /* s w s */ TILEGX_OPC_ST /* st */, - /* s w l */ TILEGX_OPC_LD /* ld */, - /* s b s */ TILEGX_OPC_ST1 /* 
st1 */, - /* s b l */ TILEGX_OPC_LD1S /* ld1s */, - /* s h s */ TILEGX_OPC_ST2 /* st2 */, - /* s h l */ TILEGX_OPC_LD2S /* ld2s */, - /* s i s */ TILEGX_OPC_ST4 /* st4 */, - /* s i l */ TILEGX_OPC_LD4S /* ld4s */, -}; - -#ifdef TILEGX_JIT_DEBUG -static sljit_s32 push_inst_debug(struct sljit_compiler *compiler, sljit_ins ins, int line) -{ - sljit_ins *ptr = (sljit_ins *)ensure_buf(compiler, sizeof(sljit_ins)); - FAIL_IF(!ptr); - *ptr = ins; - compiler->size++; - printf("|%04d|S0|:\t\t", line); - print_insn_tilegx(ptr); - return SLJIT_SUCCESS; -} - -static sljit_s32 push_inst_nodebug(struct sljit_compiler *compiler, sljit_ins ins) -{ - sljit_ins *ptr = (sljit_ins *)ensure_buf(compiler, sizeof(sljit_ins)); - FAIL_IF(!ptr); - *ptr = ins; - compiler->size++; - return SLJIT_SUCCESS; -} - -#define push_inst(a, b) push_inst_debug(a, b, __LINE__) -#else -static sljit_s32 push_inst(struct sljit_compiler *compiler, sljit_ins ins) -{ - sljit_ins *ptr = (sljit_ins *)ensure_buf(compiler, sizeof(sljit_ins)); - FAIL_IF(!ptr); - *ptr = ins; - compiler->size++; - return SLJIT_SUCCESS; -} -#endif - -#define BUNDLE_FORMAT_MASK(p0, p1, p2) \ - ((p0) | ((p1) << 8) | ((p2) << 16)) - -#define BUNDLE_FORMAT(p0, p1, p2) \ - { \ - { \ - (tilegx_pipeline)(p0), \ - (tilegx_pipeline)(p1), \ - (tilegx_pipeline)(p2) \ - }, \ - BUNDLE_FORMAT_MASK(1 << (p0), 1 << (p1), (1 << (p2))) \ - } - -#define NO_PIPELINE TILEGX_NUM_PIPELINE_ENCODINGS - -#define tilegx_is_x_pipeline(p) ((int)(p) <= (int)TILEGX_PIPELINE_X1) - -#define PI(encoding) \ - push_inst(compiler, encoding) - -#define PB3(opcode, dst, srca, srcb) \ - push_3_buffer(compiler, opcode, dst, srca, srcb, __LINE__) - -#define PB2(opcode, dst, src) \ - push_2_buffer(compiler, opcode, dst, src, __LINE__) - -#define JR(reg) \ - push_jr_buffer(compiler, TILEGX_OPC_JR, reg, __LINE__) - -#define ADD(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_ADD, dst, srca, srcb, __LINE__) - -#define SUB(dst, srca, srcb) \ - push_3_buffer(compiler, 
TILEGX_OPC_SUB, dst, srca, srcb, __LINE__) - -#define MUL(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_MULX, dst, srca, srcb, __LINE__) - -#define NOR(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_NOR, dst, srca, srcb, __LINE__) - -#define OR(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_OR, dst, srca, srcb, __LINE__) - -#define XOR(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_XOR, dst, srca, srcb, __LINE__) - -#define AND(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_AND, dst, srca, srcb, __LINE__) - -#define CLZ(dst, src) \ - push_2_buffer(compiler, TILEGX_OPC_CLZ, dst, src, __LINE__) - -#define SHLI(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_SHLI, dst, srca, srcb, __LINE__) - -#define SHRUI(dst, srca, imm) \ - push_3_buffer(compiler, TILEGX_OPC_SHRUI, dst, srca, imm, __LINE__) - -#define XORI(dst, srca, imm) \ - push_3_buffer(compiler, TILEGX_OPC_XORI, dst, srca, imm, __LINE__) - -#define ORI(dst, srca, imm) \ - push_3_buffer(compiler, TILEGX_OPC_ORI, dst, srca, imm, __LINE__) - -#define CMPLTU(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_CMPLTU, dst, srca, srcb, __LINE__) - -#define CMPLTS(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_CMPLTS, dst, srca, srcb, __LINE__) - -#define CMPLTUI(dst, srca, imm) \ - push_3_buffer(compiler, TILEGX_OPC_CMPLTUI, dst, srca, imm, __LINE__) - -#define CMOVNEZ(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_CMOVNEZ, dst, srca, srcb, __LINE__) - -#define CMOVEQZ(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_CMOVEQZ, dst, srca, srcb, __LINE__) - -#define ADDLI(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_ADDLI, dst, srca, srcb, __LINE__) - -#define SHL16INSLI(dst, srca, srcb) \ - push_3_buffer(compiler, TILEGX_OPC_SHL16INSLI, dst, srca, srcb, __LINE__) - -#define LD_ADD(dst, addr, adjust) \ - push_3_buffer(compiler, TILEGX_OPC_LD_ADD, dst, addr, adjust, __LINE__) - -#define ST_ADD(src, addr, adjust) \ - 
push_3_buffer(compiler, TILEGX_OPC_ST_ADD, src, addr, adjust, __LINE__) - -#define LD(dst, addr) \ - push_2_buffer(compiler, TILEGX_OPC_LD, dst, addr, __LINE__) - -#define BFEXTU(dst, src, start, end) \ - push_4_buffer(compiler, TILEGX_OPC_BFEXTU, dst, src, start, end, __LINE__) - -#define BFEXTS(dst, src, start, end) \ - push_4_buffer(compiler, TILEGX_OPC_BFEXTS, dst, src, start, end, __LINE__) - -#define ADD_SOLO(dest, srca, srcb) \ - push_inst(compiler, ADD_X1 | DEST_X1(dest) | SRCA_X1(srca) | SRCB_X1(srcb)) - -#define ADDI_SOLO(dest, srca, imm) \ - push_inst(compiler, ADDI_X1 | DEST_X1(dest) | SRCA_X1(srca) | IMM8_X1(imm)) - -#define ADDLI_SOLO(dest, srca, imm) \ - push_inst(compiler, ADDLI_X1 | DEST_X1(dest) | SRCA_X1(srca) | IMM16_X1(imm)) - -#define SHL16INSLI_SOLO(dest, srca, imm) \ - push_inst(compiler, SHL16INSLI_X1 | DEST_X1(dest) | SRCA_X1(srca) | IMM16_X1(imm)) - -#define JALR_SOLO(reg) \ - push_inst(compiler, JALR_X1 | SRCA_X1(reg)) - -#define JR_SOLO(reg) \ - push_inst(compiler, JR_X1 | SRCA_X1(reg)) - -struct Format { - /* Mapping of bundle issue slot to assigned pipe. */ - tilegx_pipeline pipe[TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE]; - - /* Mask of pipes used by this bundle. */ - unsigned int pipe_mask; -}; - -const struct Format formats[] = -{ - /* In Y format we must always have something in Y2, since it has - * no fnop, so this conveys that Y2 must always be used. */ - BUNDLE_FORMAT(TILEGX_PIPELINE_Y0, TILEGX_PIPELINE_Y2, NO_PIPELINE), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y1, TILEGX_PIPELINE_Y2, NO_PIPELINE), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y2, TILEGX_PIPELINE_Y0, NO_PIPELINE), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y2, TILEGX_PIPELINE_Y1, NO_PIPELINE), - - /* Y format has three instructions. 
*/ - BUNDLE_FORMAT(TILEGX_PIPELINE_Y0, TILEGX_PIPELINE_Y1, TILEGX_PIPELINE_Y2), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y0, TILEGX_PIPELINE_Y2, TILEGX_PIPELINE_Y1), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y1, TILEGX_PIPELINE_Y0, TILEGX_PIPELINE_Y2), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y1, TILEGX_PIPELINE_Y2, TILEGX_PIPELINE_Y0), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y2, TILEGX_PIPELINE_Y0, TILEGX_PIPELINE_Y1), - BUNDLE_FORMAT(TILEGX_PIPELINE_Y2, TILEGX_PIPELINE_Y1, TILEGX_PIPELINE_Y0), - - /* X format has only two instructions. */ - BUNDLE_FORMAT(TILEGX_PIPELINE_X0, TILEGX_PIPELINE_X1, NO_PIPELINE), - BUNDLE_FORMAT(TILEGX_PIPELINE_X1, TILEGX_PIPELINE_X0, NO_PIPELINE) -}; - - -struct jit_instr inst_buf[TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE]; -unsigned long inst_buf_index; - -tilegx_pipeline get_any_valid_pipe(const struct tilegx_opcode* opcode) -{ - /* FIXME: tile: we could pregenerate this. */ - int pipe; - for (pipe = 0; ((opcode->pipes & (1 << pipe)) == 0 && pipe < TILEGX_NUM_PIPELINE_ENCODINGS); pipe++) - ; - return (tilegx_pipeline)(pipe); -} - -void insert_nop(tilegx_mnemonic opc, int line) -{ - const struct tilegx_opcode* opcode = NULL; - - memmove(&inst_buf[1], &inst_buf[0], inst_buf_index * sizeof inst_buf[0]); - - opcode = &tilegx_opcodes[opc]; - inst_buf[0].opcode = opcode; - inst_buf[0].pipe = get_any_valid_pipe(opcode); - inst_buf[0].input_registers = 0; - inst_buf[0].output_registers = 0; - inst_buf[0].line = line; - ++inst_buf_index; -} - -const struct Format* compute_format() -{ - unsigned int compatible_pipes = BUNDLE_FORMAT_MASK( - inst_buf[0].opcode->pipes, - inst_buf[1].opcode->pipes, - (inst_buf_index == 3 ? 
inst_buf[2].opcode->pipes : (1 << NO_PIPELINE))); - - const struct Format* match = NULL; - const struct Format *b = NULL; - unsigned int i; - for (i = 0; i < sizeof formats / sizeof formats[0]; i++) { - b = &formats[i]; - if ((b->pipe_mask & compatible_pipes) == b->pipe_mask) { - match = b; - break; - } - } - - return match; -} - -sljit_s32 assign_pipes() -{ - unsigned long output_registers = 0; - unsigned int i = 0; - - if (inst_buf_index == 1) { - tilegx_mnemonic opc = inst_buf[0].opcode->can_bundle - ? TILEGX_OPC_FNOP : TILEGX_OPC_NOP; - insert_nop(opc, __LINE__); - } - - const struct Format* match = compute_format(); - - if (match == NULL) - return -1; - - for (i = 0; i < inst_buf_index; i++) { - - if ((i > 0) && ((inst_buf[i].input_registers & output_registers) != 0)) - return -1; - - if ((i > 0) && ((inst_buf[i].output_registers & output_registers) != 0)) - return -1; - - /* Don't include Rzero in the match set, to avoid triggering - needlessly on 'prefetch' instrs. */ - - output_registers |= inst_buf[i].output_registers & 0xFFFFFFFFFFFFFFL; - - inst_buf[i].pipe = match->pipe[i]; - } - - /* If only 2 instrs, and in Y-mode, insert a nop. */ - if (inst_buf_index == 2 && !tilegx_is_x_pipeline(match->pipe[0])) { - insert_nop(TILEGX_OPC_FNOP, __LINE__); - - /* Select the yet unassigned pipe. 
*/ - tilegx_pipeline pipe = (tilegx_pipeline)(((TILEGX_PIPELINE_Y0 - + TILEGX_PIPELINE_Y1 + TILEGX_PIPELINE_Y2) - - (inst_buf[1].pipe + inst_buf[2].pipe))); - - inst_buf[0].pipe = pipe; - } - - return 0; -} - -tilegx_bundle_bits get_bundle_bit(struct jit_instr *inst) -{ - int i, val; - const struct tilegx_opcode* opcode = inst->opcode; - tilegx_bundle_bits bits = opcode->fixed_bit_values[inst->pipe]; - - const struct tilegx_operand* operand = NULL; - for (i = 0; i < opcode->num_operands; i++) { - operand = &tilegx_operands[opcode->operands[inst->pipe][i]]; - val = inst->operand_value[i]; - - bits |= operand->insert(val); - } - - return bits; -} - -static sljit_s32 update_buffer(struct sljit_compiler *compiler) -{ - int i; - int orig_index = inst_buf_index; - struct jit_instr inst0 = inst_buf[0]; - struct jit_instr inst1 = inst_buf[1]; - struct jit_instr inst2 = inst_buf[2]; - tilegx_bundle_bits bits = 0; - - /* If the bundle is valid as is, perform the encoding and return 1. */ - if (assign_pipes() == 0) { - for (i = 0; i < inst_buf_index; i++) { - bits |= get_bundle_bit(inst_buf + i); -#ifdef TILEGX_JIT_DEBUG - printf("|%04d", inst_buf[i].line); -#endif - } -#ifdef TILEGX_JIT_DEBUG - if (inst_buf_index == 3) - printf("|M0|:\t"); - else - printf("|M0|:\t\t"); - print_insn_tilegx(&bits); -#endif - - inst_buf_index = 0; - -#ifdef TILEGX_JIT_DEBUG - return push_inst_nodebug(compiler, bits); -#else - return push_inst(compiler, bits); -#endif - } - - /* If the bundle is invalid, split it in two. First encode the first two - (or possibly 1) instructions, and then the last, separately. Note that - assign_pipes may have re-ordered the instrs (by inserting no-ops in - lower slots) so we need to reset them. 
*/ - - inst_buf_index = orig_index - 1; - inst_buf[0] = inst0; - inst_buf[1] = inst1; - inst_buf[2] = inst2; - if (assign_pipes() == 0) { - for (i = 0; i < inst_buf_index; i++) { - bits |= get_bundle_bit(inst_buf + i); -#ifdef TILEGX_JIT_DEBUG - printf("|%04d", inst_buf[i].line); -#endif - } - -#ifdef TILEGX_JIT_DEBUG - if (inst_buf_index == 3) - printf("|M1|:\t"); - else - printf("|M1|:\t\t"); - print_insn_tilegx(&bits); -#endif - - if ((orig_index - 1) == 2) { - inst_buf[0] = inst2; - inst_buf_index = 1; - } else if ((orig_index - 1) == 1) { - inst_buf[0] = inst1; - inst_buf_index = 1; - } else - SLJIT_UNREACHABLE(); - -#ifdef TILEGX_JIT_DEBUG - return push_inst_nodebug(compiler, bits); -#else - return push_inst(compiler, bits); -#endif - } else { - /* We had 3 instrs of which the first 2 can't live in the same bundle. - Split those two. Note that we don't try to then combine the second - and third instr into a single bundle. First instruction: */ - inst_buf_index = 1; - inst_buf[0] = inst0; - inst_buf[1] = inst1; - inst_buf[2] = inst2; - if (assign_pipes() == 0) { - for (i = 0; i < inst_buf_index; i++) { - bits |= get_bundle_bit(inst_buf + i); -#ifdef TILEGX_JIT_DEBUG - printf("|%04d", inst_buf[i].line); -#endif - } - -#ifdef TILEGX_JIT_DEBUG - if (inst_buf_index == 3) - printf("|M2|:\t"); - else - printf("|M2|:\t\t"); - print_insn_tilegx(&bits); -#endif - - inst_buf[0] = inst1; - inst_buf[1] = inst2; - inst_buf_index = orig_index - 1; -#ifdef TILEGX_JIT_DEBUG - return push_inst_nodebug(compiler, bits); -#else - return push_inst(compiler, bits); -#endif - } else - SLJIT_UNREACHABLE(); - } - - SLJIT_UNREACHABLE(); -} - -static sljit_s32 flush_buffer(struct sljit_compiler *compiler) -{ - while (inst_buf_index != 0) { - FAIL_IF(update_buffer(compiler)); - } - return SLJIT_SUCCESS; -} - -static sljit_s32 push_4_buffer(struct sljit_compiler *compiler, tilegx_mnemonic opc, int op0, int op1, int op2, int op3, int line) -{ - if (inst_buf_index == 
TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE) - FAIL_IF(update_buffer(compiler)); - - const struct tilegx_opcode* opcode = &tilegx_opcodes[opc]; - inst_buf[inst_buf_index].opcode = opcode; - inst_buf[inst_buf_index].pipe = get_any_valid_pipe(opcode); - inst_buf[inst_buf_index].operand_value[0] = op0; - inst_buf[inst_buf_index].operand_value[1] = op1; - inst_buf[inst_buf_index].operand_value[2] = op2; - inst_buf[inst_buf_index].operand_value[3] = op3; - inst_buf[inst_buf_index].input_registers = 1L << op1; - inst_buf[inst_buf_index].output_registers = 1L << op0; - inst_buf[inst_buf_index].line = line; - inst_buf_index++; - - return SLJIT_SUCCESS; -} - -static sljit_s32 push_3_buffer(struct sljit_compiler *compiler, tilegx_mnemonic opc, int op0, int op1, int op2, int line) -{ - if (inst_buf_index == TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE) - FAIL_IF(update_buffer(compiler)); - - const struct tilegx_opcode* opcode = &tilegx_opcodes[opc]; - inst_buf[inst_buf_index].opcode = opcode; - inst_buf[inst_buf_index].pipe = get_any_valid_pipe(opcode); - inst_buf[inst_buf_index].operand_value[0] = op0; - inst_buf[inst_buf_index].operand_value[1] = op1; - inst_buf[inst_buf_index].operand_value[2] = op2; - inst_buf[inst_buf_index].line = line; - - switch (opc) { - case TILEGX_OPC_ST_ADD: - inst_buf[inst_buf_index].input_registers = (1L << op0) | (1L << op1); - inst_buf[inst_buf_index].output_registers = 1L << op0; - break; - case TILEGX_OPC_LD_ADD: - inst_buf[inst_buf_index].input_registers = 1L << op1; - inst_buf[inst_buf_index].output_registers = (1L << op0) | (1L << op1); - break; - case TILEGX_OPC_ADD: - case TILEGX_OPC_AND: - case TILEGX_OPC_SUB: - case TILEGX_OPC_MULX: - case TILEGX_OPC_OR: - case TILEGX_OPC_XOR: - case TILEGX_OPC_NOR: - case TILEGX_OPC_SHL: - case TILEGX_OPC_SHRU: - case TILEGX_OPC_SHRS: - case TILEGX_OPC_CMPLTU: - case TILEGX_OPC_CMPLTS: - case TILEGX_OPC_CMOVEQZ: - case TILEGX_OPC_CMOVNEZ: - inst_buf[inst_buf_index].input_registers = (1L << op1) | (1L << op2); - 
inst_buf[inst_buf_index].output_registers = 1L << op0; - break; - case TILEGX_OPC_ADDLI: - case TILEGX_OPC_XORI: - case TILEGX_OPC_ORI: - case TILEGX_OPC_SHLI: - case TILEGX_OPC_SHRUI: - case TILEGX_OPC_SHRSI: - case TILEGX_OPC_SHL16INSLI: - case TILEGX_OPC_CMPLTUI: - case TILEGX_OPC_CMPLTSI: - inst_buf[inst_buf_index].input_registers = 1L << op1; - inst_buf[inst_buf_index].output_registers = 1L << op0; - break; - default: - printf("unrecoginzed opc: %s\n", opcode->name); - SLJIT_UNREACHABLE(); - } - - inst_buf_index++; - - return SLJIT_SUCCESS; -} - -static sljit_s32 push_2_buffer(struct sljit_compiler *compiler, tilegx_mnemonic opc, int op0, int op1, int line) -{ - if (inst_buf_index == TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE) - FAIL_IF(update_buffer(compiler)); - - const struct tilegx_opcode* opcode = &tilegx_opcodes[opc]; - inst_buf[inst_buf_index].opcode = opcode; - inst_buf[inst_buf_index].pipe = get_any_valid_pipe(opcode); - inst_buf[inst_buf_index].operand_value[0] = op0; - inst_buf[inst_buf_index].operand_value[1] = op1; - inst_buf[inst_buf_index].line = line; - - switch (opc) { - case TILEGX_OPC_BEQZ: - case TILEGX_OPC_BNEZ: - inst_buf[inst_buf_index].input_registers = 1L << op0; - break; - case TILEGX_OPC_ST: - case TILEGX_OPC_ST1: - case TILEGX_OPC_ST2: - case TILEGX_OPC_ST4: - inst_buf[inst_buf_index].input_registers = (1L << op0) | (1L << op1); - inst_buf[inst_buf_index].output_registers = 0; - break; - case TILEGX_OPC_CLZ: - case TILEGX_OPC_LD: - case TILEGX_OPC_LD1U: - case TILEGX_OPC_LD1S: - case TILEGX_OPC_LD2U: - case TILEGX_OPC_LD2S: - case TILEGX_OPC_LD4U: - case TILEGX_OPC_LD4S: - inst_buf[inst_buf_index].input_registers = 1L << op1; - inst_buf[inst_buf_index].output_registers = 1L << op0; - break; - default: - printf("unrecoginzed opc: %s\n", opcode->name); - SLJIT_UNREACHABLE(); - } - - inst_buf_index++; - - return SLJIT_SUCCESS; -} - -static sljit_s32 push_0_buffer(struct sljit_compiler *compiler, tilegx_mnemonic opc, int line) -{ - if 
(inst_buf_index == TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE) - FAIL_IF(update_buffer(compiler)); - - const struct tilegx_opcode* opcode = &tilegx_opcodes[opc]; - inst_buf[inst_buf_index].opcode = opcode; - inst_buf[inst_buf_index].pipe = get_any_valid_pipe(opcode); - inst_buf[inst_buf_index].input_registers = 0; - inst_buf[inst_buf_index].output_registers = 0; - inst_buf[inst_buf_index].line = line; - inst_buf_index++; - - return SLJIT_SUCCESS; -} - -static sljit_s32 push_jr_buffer(struct sljit_compiler *compiler, tilegx_mnemonic opc, int op0, int line) -{ - if (inst_buf_index == TILEGX_MAX_INSTRUCTIONS_PER_BUNDLE) - FAIL_IF(update_buffer(compiler)); - - const struct tilegx_opcode* opcode = &tilegx_opcodes[opc]; - inst_buf[inst_buf_index].opcode = opcode; - inst_buf[inst_buf_index].pipe = get_any_valid_pipe(opcode); - inst_buf[inst_buf_index].operand_value[0] = op0; - inst_buf[inst_buf_index].input_registers = 1L << op0; - inst_buf[inst_buf_index].output_registers = 0; - inst_buf[inst_buf_index].line = line; - inst_buf_index++; - - return flush_buffer(compiler); -} - -static SLJIT_INLINE sljit_ins * detect_jump_type(struct sljit_jump *jump, sljit_ins *code_ptr, sljit_ins *code) -{ - sljit_sw diff; - sljit_uw target_addr; - sljit_ins *inst; - - if (jump->flags & SLJIT_REWRITABLE_JUMP) - return code_ptr; - - if (jump->flags & JUMP_ADDR) - target_addr = jump->u.target; - else { - SLJIT_ASSERT(jump->flags & JUMP_LABEL); - target_addr = (sljit_uw)(code + jump->u.label->size); - } - - inst = (sljit_ins *)jump->addr; - if (jump->flags & IS_COND) - inst--; - - diff = ((sljit_sw) target_addr - (sljit_sw) inst) >> 3; - if (diff <= SIMM_17BIT_MAX && diff >= SIMM_17BIT_MIN) { - jump->flags |= PATCH_B; - - if (!(jump->flags & IS_COND)) { - if (jump->flags & IS_JAL) { - jump->flags &= ~(PATCH_B); - jump->flags |= PATCH_J; - inst[0] = JAL_X1; - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(inst); -#endif - } else { - inst[0] = BEQZ_X1 | 
SRCA_X1(ZERO); - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(inst); -#endif - } - - return inst; - } - - inst[0] = inst[0] ^ (0x7L << 55); - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(inst); -#endif - jump->addr -= sizeof(sljit_ins); - return inst; - } - - if (jump->flags & IS_COND) { - if ((target_addr & ~0x3FFFFFFFL) == ((jump->addr + sizeof(sljit_ins)) & ~0x3FFFFFFFL)) { - jump->flags |= PATCH_J; - inst[0] = (inst[0] & ~(BOFF_X1(-1))) | BOFF_X1(2); - inst[1] = J_X1; - return inst + 1; - } - - return code_ptr; - } - - if ((target_addr & ~0x3FFFFFFFL) == ((jump->addr + sizeof(sljit_ins)) & ~0x3FFFFFFFL)) { - jump->flags |= PATCH_J; - - if (jump->flags & IS_JAL) { - inst[0] = JAL_X1; - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(inst); -#endif - - } else { - inst[0] = J_X1; - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(inst); -#endif - } - - return inst; - } - - return code_ptr; -} - -SLJIT_API_FUNC_ATTRIBUTE void * sljit_generate_code(struct sljit_compiler *compiler) -{ - struct sljit_memory_fragment *buf; - sljit_ins *code; - sljit_ins *code_ptr; - sljit_ins *buf_ptr; - sljit_ins *buf_end; - sljit_uw word_count; - sljit_uw addr; - - struct sljit_label *label; - struct sljit_jump *jump; - struct sljit_const *const_; - - CHECK_ERROR_PTR(); - CHECK_PTR(check_sljit_generate_code(compiler)); - reverse_buf(compiler); - - code = (sljit_ins *)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins)); - PTR_FAIL_WITH_EXEC_IF(code); - buf = compiler->buf; - - code_ptr = code; - word_count = 0; - label = compiler->labels; - jump = compiler->jumps; - const_ = compiler->consts; - do { - buf_ptr = (sljit_ins *)buf->memory; - buf_end = buf_ptr + (buf->used_size >> 3); - do { - *code_ptr = *buf_ptr++; - SLJIT_ASSERT(!label || label->size >= word_count); - SLJIT_ASSERT(!jump || 
jump->addr >= word_count); - SLJIT_ASSERT(!const_ || const_->addr >= word_count); - /* These structures are ordered by their address. */ - if (label && label->size == word_count) { - /* Just recording the address. */ - label->addr = (sljit_uw) code_ptr; - label->size = code_ptr - code; - label = label->next; - } - - if (jump && jump->addr == word_count) { - if (jump->flags & IS_JAL) - jump->addr = (sljit_uw)(code_ptr - 4); - else - jump->addr = (sljit_uw)(code_ptr - 3); - - code_ptr = detect_jump_type(jump, code_ptr, code); - jump = jump->next; - } - - if (const_ && const_->addr == word_count) { - /* Just recording the address. */ - const_->addr = (sljit_uw) code_ptr; - const_ = const_->next; - } - - code_ptr++; - word_count++; - } while (buf_ptr < buf_end); - - buf = buf->next; - } while (buf); - - if (label && label->size == word_count) { - label->addr = (sljit_uw) code_ptr; - label->size = code_ptr - code; - label = label->next; - } - - SLJIT_ASSERT(!label); - SLJIT_ASSERT(!jump); - SLJIT_ASSERT(!const_); - SLJIT_ASSERT(code_ptr - code <= (sljit_sw)compiler->size); - - jump = compiler->jumps; - while (jump) { - do { - addr = (jump->flags & JUMP_LABEL) ? 
jump->u.label->addr : jump->u.target; - buf_ptr = (sljit_ins *)jump->addr; - - if (jump->flags & PATCH_B) { - addr = (sljit_sw)(addr - (jump->addr)) >> 3; - SLJIT_ASSERT((sljit_sw) addr <= SIMM_17BIT_MAX && (sljit_sw) addr >= SIMM_17BIT_MIN); - buf_ptr[0] = (buf_ptr[0] & ~(BOFF_X1(-1))) | BOFF_X1(addr); - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(buf_ptr); -#endif - break; - } - - if (jump->flags & PATCH_J) { - SLJIT_ASSERT((addr & ~0x3FFFFFFFL) == ((jump->addr + sizeof(sljit_ins)) & ~0x3FFFFFFFL)); - addr = (sljit_sw)(addr - (jump->addr)) >> 3; - buf_ptr[0] = (buf_ptr[0] & ~(JOFF_X1(-1))) | JOFF_X1(addr); - -#ifdef TILEGX_JIT_DEBUG - printf("[runtime relocate]%04d:\t", __LINE__); - print_insn_tilegx(buf_ptr); -#endif - break; - } - - SLJIT_ASSERT(!(jump->flags & IS_JAL)); - - /* Set the fields of immediate loads. */ - buf_ptr[0] = (buf_ptr[0] & ~(0xFFFFL << 43)) | (((addr >> 32) & 0xFFFFL) << 43); - buf_ptr[1] = (buf_ptr[1] & ~(0xFFFFL << 43)) | (((addr >> 16) & 0xFFFFL) << 43); - buf_ptr[2] = (buf_ptr[2] & ~(0xFFFFL << 43)) | ((addr & 0xFFFFL) << 43); - } while (0); - - jump = jump->next; - } - - compiler->error = SLJIT_ERR_COMPILED; - compiler->executable_size = (code_ptr - code) * sizeof(sljit_ins); - SLJIT_CACHE_FLUSH(code, code_ptr); - return code; -} - -static sljit_s32 load_immediate(struct sljit_compiler *compiler, sljit_s32 dst_ar, sljit_sw imm) -{ - - if (imm <= SIMM_16BIT_MAX && imm >= SIMM_16BIT_MIN) - return ADDLI(dst_ar, ZERO, imm); - - if (imm <= SIMM_32BIT_MAX && imm >= SIMM_32BIT_MIN) { - FAIL_IF(ADDLI(dst_ar, ZERO, imm >> 16)); - return SHL16INSLI(dst_ar, dst_ar, imm); - } - - if (imm <= SIMM_48BIT_MAX && imm >= SIMM_48BIT_MIN) { - FAIL_IF(ADDLI(dst_ar, ZERO, imm >> 32)); - FAIL_IF(SHL16INSLI(dst_ar, dst_ar, imm >> 16)); - return SHL16INSLI(dst_ar, dst_ar, imm); - } - - FAIL_IF(ADDLI(dst_ar, ZERO, imm >> 48)); - FAIL_IF(SHL16INSLI(dst_ar, dst_ar, imm >> 32)); - FAIL_IF(SHL16INSLI(dst_ar, dst_ar, 
imm >> 16)); - return SHL16INSLI(dst_ar, dst_ar, imm); -} - -static sljit_s32 emit_const(struct sljit_compiler *compiler, sljit_s32 dst_ar, sljit_sw imm, int flush) -{ - /* Should *not* be optimized as load_immediate, as pcre relocation - mechanism will match this fixed 4-instruction pattern. */ - if (flush) { - FAIL_IF(ADDLI_SOLO(dst_ar, ZERO, imm >> 32)); - FAIL_IF(SHL16INSLI_SOLO(dst_ar, dst_ar, imm >> 16)); - return SHL16INSLI_SOLO(dst_ar, dst_ar, imm); - } - - FAIL_IF(ADDLI(dst_ar, ZERO, imm >> 32)); - FAIL_IF(SHL16INSLI(dst_ar, dst_ar, imm >> 16)); - return SHL16INSLI(dst_ar, dst_ar, imm); -} - -static sljit_s32 emit_const_64(struct sljit_compiler *compiler, sljit_s32 dst_ar, sljit_sw imm, int flush) -{ - /* Should *not* be optimized as load_immediate, as pcre relocation - mechanism will match this fixed 4-instruction pattern. */ - if (flush) { - FAIL_IF(ADDLI_SOLO(reg_map[dst_ar], ZERO, imm >> 48)); - FAIL_IF(SHL16INSLI_SOLO(reg_map[dst_ar], reg_map[dst_ar], imm >> 32)); - FAIL_IF(SHL16INSLI_SOLO(reg_map[dst_ar], reg_map[dst_ar], imm >> 16)); - return SHL16INSLI_SOLO(reg_map[dst_ar], reg_map[dst_ar], imm); - } - - FAIL_IF(ADDLI(reg_map[dst_ar], ZERO, imm >> 48)); - FAIL_IF(SHL16INSLI(reg_map[dst_ar], reg_map[dst_ar], imm >> 32)); - FAIL_IF(SHL16INSLI(reg_map[dst_ar], reg_map[dst_ar], imm >> 16)); - return SHL16INSLI(reg_map[dst_ar], reg_map[dst_ar], imm); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_enter(struct sljit_compiler *compiler, - sljit_s32 options, sljit_s32 args, sljit_s32 scratches, sljit_s32 saveds, - sljit_s32 fscratches, sljit_s32 fsaveds, sljit_s32 local_size) -{ - sljit_ins base; - sljit_s32 i, tmp; - - CHECK_ERROR(); - CHECK(check_sljit_emit_enter(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size)); - set_emit_enter(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size); - - local_size += GET_SAVED_REGISTERS_SIZE(scratches, saveds, 1); - local_size = (local_size + 7) & ~7; - 
compiler->local_size = local_size; - - if (local_size <= SIMM_16BIT_MAX) { - /* Frequent case. */ - FAIL_IF(ADDLI(SLJIT_LOCALS_REG_mapped, SLJIT_LOCALS_REG_mapped, -local_size)); - base = SLJIT_LOCALS_REG_mapped; - } else { - FAIL_IF(load_immediate(compiler, TMP_REG1_mapped, local_size)); - FAIL_IF(ADD(TMP_REG2_mapped, SLJIT_LOCALS_REG_mapped, ZERO)); - FAIL_IF(SUB(SLJIT_LOCALS_REG_mapped, SLJIT_LOCALS_REG_mapped, TMP_REG1_mapped)); - base = TMP_REG2_mapped; - local_size = 0; - } - - /* Save the return address. */ - FAIL_IF(ADDLI(ADDR_TMP_mapped, base, local_size - 8)); - FAIL_IF(ST_ADD(ADDR_TMP_mapped, RA, -8)); - - /* Save the S registers. */ - tmp = saveds < SLJIT_NUMBER_OF_SAVED_REGISTERS ? (SLJIT_S0 + 1 - saveds) : SLJIT_FIRST_SAVED_REG; - for (i = SLJIT_S0; i >= tmp; i--) { - FAIL_IF(ST_ADD(ADDR_TMP_mapped, reg_map[i], -8)); - } - - /* Save the R registers that need to be reserved. */ - for (i = scratches; i >= SLJIT_FIRST_SAVED_REG; i--) { - FAIL_IF(ST_ADD(ADDR_TMP_mapped, reg_map[i], -8)); - } - - /* Move the arguments to S registers. 
*/ - for (i = 0; i < args; i++) { - FAIL_IF(ADD(reg_map[SLJIT_S0 - i], i, ZERO)); - } - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_set_context(struct sljit_compiler *compiler, - sljit_s32 options, sljit_s32 args, sljit_s32 scratches, sljit_s32 saveds, - sljit_s32 fscratches, sljit_s32 fsaveds, sljit_s32 local_size) -{ - CHECK_ERROR(); - CHECK(check_sljit_set_context(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size)); - set_set_context(compiler, options, args, scratches, saveds, fscratches, fsaveds, local_size); - - local_size += GET_SAVED_REGISTERS_SIZE(scratches, saveds, 1); - compiler->local_size = (local_size + 7) & ~7; - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_return(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 src, sljit_sw srcw) -{ - sljit_s32 local_size; - sljit_ins base; - sljit_s32 i, tmp; - sljit_s32 saveds; - - CHECK_ERROR(); - CHECK(check_sljit_emit_return(compiler, op, src, srcw)); - - FAIL_IF(emit_mov_before_return(compiler, op, src, srcw)); - - local_size = compiler->local_size; - if (local_size <= SIMM_16BIT_MAX) - base = SLJIT_LOCALS_REG_mapped; - else { - FAIL_IF(load_immediate(compiler, TMP_REG1_mapped, local_size)); - FAIL_IF(ADD(TMP_REG1_mapped, SLJIT_LOCALS_REG_mapped, TMP_REG1_mapped)); - base = TMP_REG1_mapped; - local_size = 0; - } - - /* Restore the return address. */ - FAIL_IF(ADDLI(ADDR_TMP_mapped, base, local_size - 8)); - FAIL_IF(LD_ADD(RA, ADDR_TMP_mapped, -8)); - - /* Restore the S registers. */ - saveds = compiler->saveds; - tmp = saveds < SLJIT_NUMBER_OF_SAVED_REGISTERS ? (SLJIT_S0 + 1 - saveds) : SLJIT_FIRST_SAVED_REG; - for (i = SLJIT_S0; i >= tmp; i--) { - FAIL_IF(LD_ADD(reg_map[i], ADDR_TMP_mapped, -8)); - } - - /* Restore the R registers that need to be reserved. 
*/ - for (i = compiler->scratches; i >= SLJIT_FIRST_SAVED_REG; i--) { - FAIL_IF(LD_ADD(reg_map[i], ADDR_TMP_mapped, -8)); - } - - if (compiler->local_size <= SIMM_16BIT_MAX) - FAIL_IF(ADDLI(SLJIT_LOCALS_REG_mapped, SLJIT_LOCALS_REG_mapped, compiler->local_size)); - else - FAIL_IF(ADD(SLJIT_LOCALS_REG_mapped, TMP_REG1_mapped, ZERO)); - - return JR(RA); -} - -/* reg_ar is an absoulute register! */ - -/* Can perform an operation using at most 1 instruction. */ -static sljit_s32 getput_arg_fast(struct sljit_compiler *compiler, sljit_s32 flags, sljit_s32 reg_ar, sljit_s32 arg, sljit_sw argw) -{ - SLJIT_ASSERT(arg & SLJIT_MEM); - - if ((!(flags & WRITE_BACK) || !(arg & REG_MASK)) - && !(arg & OFFS_REG_MASK) && argw <= SIMM_16BIT_MAX && argw >= SIMM_16BIT_MIN) { - /* Works for both absoulte and relative addresses. */ - if (SLJIT_UNLIKELY(flags & ARG_TEST)) - return 1; - - FAIL_IF(ADDLI(ADDR_TMP_mapped, reg_map[arg & REG_MASK], argw)); - - if (flags & LOAD_DATA) - FAIL_IF(PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, ADDR_TMP_mapped)); - else - FAIL_IF(PB2(data_transfer_insts[flags & MEM_MASK], ADDR_TMP_mapped, reg_ar)); - - return -1; - } - - return 0; -} - -/* See getput_arg below. - Note: can_cache is called only for binary operators. Those - operators always uses word arguments without write back. */ -static sljit_s32 can_cache(sljit_s32 arg, sljit_sw argw, sljit_s32 next_arg, sljit_sw next_argw) -{ - SLJIT_ASSERT((arg & SLJIT_MEM) && (next_arg & SLJIT_MEM)); - - /* Simple operation except for updates. */ - if (arg & OFFS_REG_MASK) { - argw &= 0x3; - next_argw &= 0x3; - if (argw && argw == next_argw - && (arg == next_arg || (arg & OFFS_REG_MASK) == (next_arg & OFFS_REG_MASK))) - return 1; - return 0; - } - - if (arg == next_arg) { - if (((next_argw - argw) <= SIMM_16BIT_MAX - && (next_argw - argw) >= SIMM_16BIT_MIN)) - return 1; - - return 0; - } - - return 0; -} - -/* Emit the necessary instructions. See can_cache above. 
*/ -static sljit_s32 getput_arg(struct sljit_compiler *compiler, sljit_s32 flags, sljit_s32 reg_ar, sljit_s32 arg, sljit_sw argw, sljit_s32 next_arg, sljit_sw next_argw) -{ - sljit_s32 tmp_ar, base; - - SLJIT_ASSERT(arg & SLJIT_MEM); - if (!(next_arg & SLJIT_MEM)) { - next_arg = 0; - next_argw = 0; - } - - if ((flags & MEM_MASK) <= GPR_REG && (flags & LOAD_DATA)) - tmp_ar = reg_ar; - else - tmp_ar = TMP_REG1_mapped; - - base = arg & REG_MASK; - - if (SLJIT_UNLIKELY(arg & OFFS_REG_MASK)) { - argw &= 0x3; - - if ((flags & WRITE_BACK) && reg_ar == reg_map[base]) { - SLJIT_ASSERT(!(flags & LOAD_DATA) && reg_map[TMP_REG1] != reg_ar); - FAIL_IF(ADD(TMP_REG1_mapped, reg_ar, ZERO)); - reg_ar = TMP_REG1_mapped; - } - - /* Using the cache. */ - if (argw == compiler->cache_argw) { - if (!(flags & WRITE_BACK)) { - if (arg == compiler->cache_arg) { - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, TMP_REG3_mapped); - else - return PB2(data_transfer_insts[flags & MEM_MASK], TMP_REG3_mapped, reg_ar); - } - - if ((SLJIT_MEM | (arg & OFFS_REG_MASK)) == compiler->cache_arg) { - if (arg == next_arg && argw == (next_argw & 0x3)) { - compiler->cache_arg = arg; - compiler->cache_argw = argw; - FAIL_IF(ADD(TMP_REG3_mapped, reg_map[base], TMP_REG3_mapped)); - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, TMP_REG3_mapped); - else - return PB2(data_transfer_insts[flags & MEM_MASK], TMP_REG3_mapped, reg_ar); - } - - FAIL_IF(ADD(tmp_ar, reg_map[base], TMP_REG3_mapped)); - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, tmp_ar); - else - return PB2(data_transfer_insts[flags & MEM_MASK], tmp_ar, reg_ar); - } - } else { - if ((SLJIT_MEM | (arg & OFFS_REG_MASK)) == compiler->cache_arg) { - FAIL_IF(ADD(reg_map[base], reg_map[base], TMP_REG3_mapped)); - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, reg_map[base]); - else - return 
PB2(data_transfer_insts[flags & MEM_MASK], reg_map[base], reg_ar); - } - } - } - - if (SLJIT_UNLIKELY(argw)) { - compiler->cache_arg = SLJIT_MEM | (arg & OFFS_REG_MASK); - compiler->cache_argw = argw; - FAIL_IF(SHLI(TMP_REG3_mapped, reg_map[OFFS_REG(arg)], argw)); - } - - if (!(flags & WRITE_BACK)) { - if (arg == next_arg && argw == (next_argw & 0x3)) { - compiler->cache_arg = arg; - compiler->cache_argw = argw; - FAIL_IF(ADD(TMP_REG3_mapped, reg_map[base], reg_map[!argw ? OFFS_REG(arg) : TMP_REG3])); - tmp_ar = TMP_REG3_mapped; - } else - FAIL_IF(ADD(tmp_ar, reg_map[base], reg_map[!argw ? OFFS_REG(arg) : TMP_REG3])); - - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, tmp_ar); - else - return PB2(data_transfer_insts[flags & MEM_MASK], tmp_ar, reg_ar); - } - - FAIL_IF(ADD(reg_map[base], reg_map[base], reg_map[!argw ? OFFS_REG(arg) : TMP_REG3])); - - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, reg_map[base]); - else - return PB2(data_transfer_insts[flags & MEM_MASK], reg_map[base], reg_ar); - } - - if (SLJIT_UNLIKELY(flags & WRITE_BACK) && base) { - /* Update only applies if a base register exists. 
*/ - if (reg_ar == reg_map[base]) { - SLJIT_ASSERT(!(flags & LOAD_DATA) && TMP_REG1_mapped != reg_ar); - if (argw <= SIMM_16BIT_MAX && argw >= SIMM_16BIT_MIN) { - FAIL_IF(ADDLI(ADDR_TMP_mapped, reg_map[base], argw)); - if (flags & LOAD_DATA) - FAIL_IF(PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, ADDR_TMP_mapped)); - else - FAIL_IF(PB2(data_transfer_insts[flags & MEM_MASK], ADDR_TMP_mapped, reg_ar)); - - if (argw) - return ADDLI(reg_map[base], reg_map[base], argw); - - return SLJIT_SUCCESS; - } - - FAIL_IF(ADD(TMP_REG1_mapped, reg_ar, ZERO)); - reg_ar = TMP_REG1_mapped; - } - - if (argw <= SIMM_16BIT_MAX && argw >= SIMM_16BIT_MIN) { - if (argw) - FAIL_IF(ADDLI(reg_map[base], reg_map[base], argw)); - } else { - if (compiler->cache_arg == SLJIT_MEM - && argw - compiler->cache_argw <= SIMM_16BIT_MAX - && argw - compiler->cache_argw >= SIMM_16BIT_MIN) { - if (argw != compiler->cache_argw) { - FAIL_IF(ADD(TMP_REG3_mapped, TMP_REG3_mapped, argw - compiler->cache_argw)); - compiler->cache_argw = argw; - } - - FAIL_IF(ADD(reg_map[base], reg_map[base], TMP_REG3_mapped)); - } else { - compiler->cache_arg = SLJIT_MEM; - compiler->cache_argw = argw; - FAIL_IF(load_immediate(compiler, TMP_REG3_mapped, argw)); - FAIL_IF(ADD(reg_map[base], reg_map[base], TMP_REG3_mapped)); - } - } - - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, reg_map[base]); - else - return PB2(data_transfer_insts[flags & MEM_MASK], reg_map[base], reg_ar); - } - - if (compiler->cache_arg == arg - && argw - compiler->cache_argw <= SIMM_16BIT_MAX - && argw - compiler->cache_argw >= SIMM_16BIT_MIN) { - if (argw != compiler->cache_argw) { - FAIL_IF(ADDLI(TMP_REG3_mapped, TMP_REG3_mapped, argw - compiler->cache_argw)); - compiler->cache_argw = argw; - } - - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, TMP_REG3_mapped); - else - return PB2(data_transfer_insts[flags & MEM_MASK], TMP_REG3_mapped, reg_ar); - } - - if (compiler->cache_arg 
== SLJIT_MEM - && argw - compiler->cache_argw <= SIMM_16BIT_MAX - && argw - compiler->cache_argw >= SIMM_16BIT_MIN) { - if (argw != compiler->cache_argw) - FAIL_IF(ADDLI(TMP_REG3_mapped, TMP_REG3_mapped, argw - compiler->cache_argw)); - } else { - compiler->cache_arg = SLJIT_MEM; - FAIL_IF(load_immediate(compiler, TMP_REG3_mapped, argw)); - } - - compiler->cache_argw = argw; - - if (!base) { - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, TMP_REG3_mapped); - else - return PB2(data_transfer_insts[flags & MEM_MASK], TMP_REG3_mapped, reg_ar); - } - - if (arg == next_arg - && next_argw - argw <= SIMM_16BIT_MAX - && next_argw - argw >= SIMM_16BIT_MIN) { - compiler->cache_arg = arg; - FAIL_IF(ADD(TMP_REG3_mapped, TMP_REG3_mapped, reg_map[base])); - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, TMP_REG3_mapped); - else - return PB2(data_transfer_insts[flags & MEM_MASK], TMP_REG3_mapped, reg_ar); - } - - FAIL_IF(ADD(tmp_ar, TMP_REG3_mapped, reg_map[base])); - - if (flags & LOAD_DATA) - return PB2(data_transfer_insts[flags & MEM_MASK], reg_ar, tmp_ar); - else - return PB2(data_transfer_insts[flags & MEM_MASK], tmp_ar, reg_ar); -} - -static SLJIT_INLINE sljit_s32 emit_op_mem(struct sljit_compiler *compiler, sljit_s32 flags, sljit_s32 reg_ar, sljit_s32 arg, sljit_sw argw) -{ - if (getput_arg_fast(compiler, flags, reg_ar, arg, argw)) - return compiler->error; - - compiler->cache_arg = 0; - compiler->cache_argw = 0; - return getput_arg(compiler, flags, reg_ar, arg, argw, 0, 0); -} - -static SLJIT_INLINE sljit_s32 emit_op_mem2(struct sljit_compiler *compiler, sljit_s32 flags, sljit_s32 reg, sljit_s32 arg1, sljit_sw arg1w, sljit_s32 arg2, sljit_sw arg2w) -{ - if (getput_arg_fast(compiler, flags, reg, arg1, arg1w)) - return compiler->error; - return getput_arg(compiler, flags, reg, arg1, arg1w, arg2, arg2w); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler *compiler, 
sljit_s32 dst, sljit_sw dstw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_enter(compiler, dst, dstw)); - ADJUST_LOCAL_OFFSET(dst, dstw); - - /* For UNUSED dst. Uncommon, but possible. */ - if (dst == SLJIT_UNUSED) - return SLJIT_SUCCESS; - - if (FAST_IS_REG(dst)) - return ADD(reg_map[dst], RA, ZERO); - - /* Memory. */ - return emit_op_mem(compiler, WORD_DATA, RA, dst, dstw); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - if (FAST_IS_REG(src)) - FAIL_IF(ADD(RA, reg_map[src], ZERO)); - - else if (src & SLJIT_MEM) - FAIL_IF(emit_op_mem(compiler, WORD_DATA | LOAD_DATA, RA, src, srcw)); - - else if (src & SLJIT_IMM) - FAIL_IF(load_immediate(compiler, RA, srcw)); - - return JR(RA); -} - -static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 flags, sljit_s32 dst, sljit_s32 src1, sljit_sw src2) -{ - sljit_s32 overflow_ra = 0; - - switch (GET_OPCODE(op)) { - case SLJIT_MOV: - case SLJIT_MOV_P: - SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); - if (dst != src2) - return ADD(reg_map[dst], reg_map[src2], ZERO); - return SLJIT_SUCCESS; - - case SLJIT_MOV_U32: - case SLJIT_MOV_S32: - SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); - if ((flags & (REG_DEST | REG2_SOURCE)) == (REG_DEST | REG2_SOURCE)) { - if (op == SLJIT_MOV_S32) - return BFEXTS(reg_map[dst], reg_map[src2], 0, 31); - - return BFEXTU(reg_map[dst], reg_map[src2], 0, 31); - } else if (dst != src2) { - SLJIT_ASSERT(src2 == 0); - return ADD(reg_map[dst], reg_map[src2], ZERO); - } - - return SLJIT_SUCCESS; - - case SLJIT_MOV_U8: - case SLJIT_MOV_S8: - SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); - if ((flags & (REG_DEST | REG2_SOURCE)) == (REG_DEST | REG2_SOURCE)) { - if (op == SLJIT_MOV_S8) - return BFEXTS(reg_map[dst], reg_map[src2], 0, 7); - - 
return BFEXTU(reg_map[dst], reg_map[src2], 0, 7); - } else if (dst != src2) { - SLJIT_ASSERT(src2 == 0); - return ADD(reg_map[dst], reg_map[src2], ZERO); - } - - return SLJIT_SUCCESS; - - case SLJIT_MOV_U16: - case SLJIT_MOV_S16: - SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); - if ((flags & (REG_DEST | REG2_SOURCE)) == (REG_DEST | REG2_SOURCE)) { - if (op == SLJIT_MOV_S16) - return BFEXTS(reg_map[dst], reg_map[src2], 0, 15); - - return BFEXTU(reg_map[dst], reg_map[src2], 0, 15); - } else if (dst != src2) { - SLJIT_ASSERT(src2 == 0); - return ADD(reg_map[dst], reg_map[src2], ZERO); - } - - return SLJIT_SUCCESS; - - case SLJIT_NOT: - SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); - if (op & SLJIT_SET_E) - FAIL_IF(NOR(EQUAL_FLAG, reg_map[src2], reg_map[src2])); - if (CHECK_FLAGS(SLJIT_SET_E)) - FAIL_IF(NOR(reg_map[dst], reg_map[src2], reg_map[src2])); - - return SLJIT_SUCCESS; - - case SLJIT_CLZ: - SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); - if (op & SLJIT_SET_E) - FAIL_IF(CLZ(EQUAL_FLAG, reg_map[src2])); - if (CHECK_FLAGS(SLJIT_SET_E)) - FAIL_IF(CLZ(reg_map[dst], reg_map[src2])); - - return SLJIT_SUCCESS; - - case SLJIT_ADD: - if (flags & SRC2_IMM) { - if (op & SLJIT_SET_O) { - FAIL_IF(SHRUI(TMP_EREG1, reg_map[src1], 63)); - if (src2 < 0) - FAIL_IF(XORI(TMP_EREG1, TMP_EREG1, 1)); - } - - if (op & SLJIT_SET_E) - FAIL_IF(ADDLI(EQUAL_FLAG, reg_map[src1], src2)); - - if (op & SLJIT_SET_C) { - if (src2 >= 0) - FAIL_IF(ORI(ULESS_FLAG ,reg_map[src1], src2)); - else { - FAIL_IF(ADDLI(ULESS_FLAG ,ZERO, src2)); - FAIL_IF(OR(ULESS_FLAG,reg_map[src1],ULESS_FLAG)); - } - } - - /* dst may be the same as src1 or src2. 
*/ - if (CHECK_FLAGS(SLJIT_SET_E)) - FAIL_IF(ADDLI(reg_map[dst], reg_map[src1], src2)); - - if (op & SLJIT_SET_O) { - FAIL_IF(SHRUI(OVERFLOW_FLAG, reg_map[dst], 63)); - - if (src2 < 0) - FAIL_IF(XORI(OVERFLOW_FLAG, OVERFLOW_FLAG, 1)); - } - } else { - if (op & SLJIT_SET_O) { - FAIL_IF(XOR(TMP_EREG1, reg_map[src1], reg_map[src2])); - FAIL_IF(SHRUI(TMP_EREG1, TMP_EREG1, 63)); - - if (src1 != dst) - overflow_ra = reg_map[src1]; - else if (src2 != dst) - overflow_ra = reg_map[src2]; - else { - /* Rare ocasion. */ - FAIL_IF(ADD(TMP_EREG2, reg_map[src1], ZERO)); - overflow_ra = TMP_EREG2; - } - } - - if (op & SLJIT_SET_E) - FAIL_IF(ADD(EQUAL_FLAG ,reg_map[src1], reg_map[src2])); - - if (op & SLJIT_SET_C) - FAIL_IF(OR(ULESS_FLAG,reg_map[src1], reg_map[src2])); - - /* dst may be the same as src1 or src2. */ - if (CHECK_FLAGS(SLJIT_SET_E)) - FAIL_IF(ADD(reg_map[dst],reg_map[src1], reg_map[src2])); - - if (op & SLJIT_SET_O) { - FAIL_IF(XOR(OVERFLOW_FLAG,reg_map[dst], overflow_ra)); - FAIL_IF(SHRUI(OVERFLOW_FLAG, OVERFLOW_FLAG, 63)); - } - } - - /* a + b >= a | b (otherwise, the carry should be set to 1). */ - if (op & SLJIT_SET_C) - FAIL_IF(CMPLTU(ULESS_FLAG ,reg_map[dst] ,ULESS_FLAG)); - - if (op & SLJIT_SET_O) - return CMOVNEZ(OVERFLOW_FLAG, TMP_EREG1, ZERO); - - return SLJIT_SUCCESS; - - case SLJIT_ADDC: - if (flags & SRC2_IMM) { - if (op & SLJIT_SET_C) { - if (src2 >= 0) - FAIL_IF(ORI(TMP_EREG1, reg_map[src1], src2)); - else { - FAIL_IF(ADDLI(TMP_EREG1, ZERO, src2)); - FAIL_IF(OR(TMP_EREG1, reg_map[src1], TMP_EREG1)); - } - } - - FAIL_IF(ADDLI(reg_map[dst], reg_map[src1], src2)); - - } else { - if (op & SLJIT_SET_C) - FAIL_IF(OR(TMP_EREG1, reg_map[src1], reg_map[src2])); - - /* dst may be the same as src1 or src2. 
*/ - FAIL_IF(ADD(reg_map[dst], reg_map[src1], reg_map[src2])); - } - - if (op & SLJIT_SET_C) - FAIL_IF(CMPLTU(TMP_EREG1, reg_map[dst], TMP_EREG1)); - - FAIL_IF(ADD(reg_map[dst], reg_map[dst], ULESS_FLAG)); - - if (!(op & SLJIT_SET_C)) - return SLJIT_SUCCESS; - - /* Set TMP_EREG2 (dst == 0) && (ULESS_FLAG == 1). */ - FAIL_IF(CMPLTUI(TMP_EREG2, reg_map[dst], 1)); - FAIL_IF(AND(TMP_EREG2, TMP_EREG2, ULESS_FLAG)); - /* Set carry flag. */ - return OR(ULESS_FLAG, TMP_EREG2, TMP_EREG1); - - case SLJIT_SUB: - if ((flags & SRC2_IMM) && ((op & (SLJIT_SET_U | SLJIT_SET_S)) || src2 == SIMM_16BIT_MIN)) { - FAIL_IF(ADDLI(TMP_REG2_mapped, ZERO, src2)); - src2 = TMP_REG2; - flags &= ~SRC2_IMM; - } - - if (flags & SRC2_IMM) { - if (op & SLJIT_SET_O) { - FAIL_IF(SHRUI(TMP_EREG1,reg_map[src1], 63)); - - if (src2 < 0) - FAIL_IF(XORI(TMP_EREG1, TMP_EREG1, 1)); - - if (src1 != dst) - overflow_ra = reg_map[src1]; - else { - /* Rare ocasion. */ - FAIL_IF(ADD(TMP_EREG2, reg_map[src1], ZERO)); - overflow_ra = TMP_EREG2; - } - } - - if (op & SLJIT_SET_E) - FAIL_IF(ADDLI(EQUAL_FLAG, reg_map[src1], -src2)); - - if (op & SLJIT_SET_C) { - FAIL_IF(load_immediate(compiler, ADDR_TMP_mapped, src2)); - FAIL_IF(CMPLTU(ULESS_FLAG, reg_map[src1], ADDR_TMP_mapped)); - } - - /* dst may be the same as src1 or src2. */ - if (CHECK_FLAGS(SLJIT_SET_E)) - FAIL_IF(ADDLI(reg_map[dst], reg_map[src1], -src2)); - - } else { - - if (op & SLJIT_SET_O) { - FAIL_IF(XOR(TMP_EREG1, reg_map[src1], reg_map[src2])); - FAIL_IF(SHRUI(TMP_EREG1, TMP_EREG1, 63)); - - if (src1 != dst) - overflow_ra = reg_map[src1]; - else { - /* Rare ocasion. 
*/ - FAIL_IF(ADD(TMP_EREG2, reg_map[src1], ZERO)); - overflow_ra = TMP_EREG2; - } - } - - if (op & SLJIT_SET_E) - FAIL_IF(SUB(EQUAL_FLAG, reg_map[src1], reg_map[src2])); - - if (op & (SLJIT_SET_U | SLJIT_SET_C)) - FAIL_IF(CMPLTU(ULESS_FLAG, reg_map[src1], reg_map[src2])); - - if (op & SLJIT_SET_U) - FAIL_IF(CMPLTU(UGREATER_FLAG, reg_map[src2], reg_map[src1])); - - if (op & SLJIT_SET_S) { - FAIL_IF(CMPLTS(LESS_FLAG ,reg_map[src1] ,reg_map[src2])); - FAIL_IF(CMPLTS(GREATER_FLAG ,reg_map[src2] ,reg_map[src1])); - } - - /* dst may be the same as src1 or src2. */ - if (CHECK_FLAGS(SLJIT_SET_E | SLJIT_SET_U | SLJIT_SET_S | SLJIT_SET_C)) - FAIL_IF(SUB(reg_map[dst], reg_map[src1], reg_map[src2])); - } - - if (op & SLJIT_SET_O) { - FAIL_IF(XOR(OVERFLOW_FLAG, reg_map[dst], overflow_ra)); - FAIL_IF(SHRUI(OVERFLOW_FLAG, OVERFLOW_FLAG, 63)); - return CMOVEQZ(OVERFLOW_FLAG, TMP_EREG1, ZERO); - } - - return SLJIT_SUCCESS; - - case SLJIT_SUBC: - if ((flags & SRC2_IMM) && src2 == SIMM_16BIT_MIN) { - FAIL_IF(ADDLI(TMP_REG2_mapped, ZERO, src2)); - src2 = TMP_REG2; - flags &= ~SRC2_IMM; - } - - if (flags & SRC2_IMM) { - if (op & SLJIT_SET_C) { - FAIL_IF(load_immediate(compiler, ADDR_TMP_mapped, -src2)); - FAIL_IF(CMPLTU(TMP_EREG1, reg_map[src1], ADDR_TMP_mapped)); - } - - /* dst may be the same as src1 or src2. */ - FAIL_IF(ADDLI(reg_map[dst], reg_map[src1], -src2)); - - } else { - if (op & SLJIT_SET_C) - FAIL_IF(CMPLTU(TMP_EREG1, reg_map[src1], reg_map[src2])); - /* dst may be the same as src1 or src2. 
*/ - FAIL_IF(SUB(reg_map[dst], reg_map[src1], reg_map[src2])); - } - - if (op & SLJIT_SET_C) - FAIL_IF(CMOVEQZ(TMP_EREG1, reg_map[dst], ULESS_FLAG)); - - FAIL_IF(SUB(reg_map[dst], reg_map[dst], ULESS_FLAG)); - - if (op & SLJIT_SET_C) - FAIL_IF(ADD(ULESS_FLAG, TMP_EREG1, ZERO)); - - return SLJIT_SUCCESS; - - case SLJIT_MUL: - if (flags & SRC2_IMM) { - FAIL_IF(load_immediate(compiler, TMP_REG2_mapped, src2)); - src2 = TMP_REG2; - flags &= ~SRC2_IMM; - } - - FAIL_IF(MUL(reg_map[dst], reg_map[src1], reg_map[src2])); - - return SLJIT_SUCCESS; - -#define EMIT_LOGICAL(op_imm, op_norm) \ - if (flags & SRC2_IMM) { \ - FAIL_IF(load_immediate(compiler, ADDR_TMP_mapped, src2)); \ - if (op & SLJIT_SET_E) \ - FAIL_IF(push_3_buffer( \ - compiler, op_norm, EQUAL_FLAG, reg_map[src1], \ - ADDR_TMP_mapped, __LINE__)); \ - if (CHECK_FLAGS(SLJIT_SET_E)) \ - FAIL_IF(push_3_buffer( \ - compiler, op_norm, reg_map[dst], reg_map[src1], \ - ADDR_TMP_mapped, __LINE__)); \ - } else { \ - if (op & SLJIT_SET_E) \ - FAIL_IF(push_3_buffer( \ - compiler, op_norm, EQUAL_FLAG, reg_map[src1], \ - reg_map[src2], __LINE__)); \ - if (CHECK_FLAGS(SLJIT_SET_E)) \ - FAIL_IF(push_3_buffer( \ - compiler, op_norm, reg_map[dst], reg_map[src1], \ - reg_map[src2], __LINE__)); \ - } - - case SLJIT_AND: - EMIT_LOGICAL(TILEGX_OPC_ANDI, TILEGX_OPC_AND); - return SLJIT_SUCCESS; - - case SLJIT_OR: - EMIT_LOGICAL(TILEGX_OPC_ORI, TILEGX_OPC_OR); - return SLJIT_SUCCESS; - - case SLJIT_XOR: - EMIT_LOGICAL(TILEGX_OPC_XORI, TILEGX_OPC_XOR); - return SLJIT_SUCCESS; - -#define EMIT_SHIFT(op_imm, op_norm) \ - if (flags & SRC2_IMM) { \ - if (op & SLJIT_SET_E) \ - FAIL_IF(push_3_buffer( \ - compiler, op_imm, EQUAL_FLAG, reg_map[src1], \ - src2 & 0x3F, __LINE__)); \ - if (CHECK_FLAGS(SLJIT_SET_E)) \ - FAIL_IF(push_3_buffer( \ - compiler, op_imm, reg_map[dst], reg_map[src1], \ - src2 & 0x3F, __LINE__)); \ - } else { \ - if (op & SLJIT_SET_E) \ - FAIL_IF(push_3_buffer( \ - compiler, op_norm, EQUAL_FLAG, reg_map[src1], \ - 
reg_map[src2], __LINE__)); \ - if (CHECK_FLAGS(SLJIT_SET_E)) \ - FAIL_IF(push_3_buffer( \ - compiler, op_norm, reg_map[dst], reg_map[src1], \ - reg_map[src2], __LINE__)); \ - } - - case SLJIT_SHL: - EMIT_SHIFT(TILEGX_OPC_SHLI, TILEGX_OPC_SHL); - return SLJIT_SUCCESS; - - case SLJIT_LSHR: - EMIT_SHIFT(TILEGX_OPC_SHRUI, TILEGX_OPC_SHRU); - return SLJIT_SUCCESS; - - case SLJIT_ASHR: - EMIT_SHIFT(TILEGX_OPC_SHRSI, TILEGX_OPC_SHRS); - return SLJIT_SUCCESS; - } - - SLJIT_UNREACHABLE(); - return SLJIT_SUCCESS; -} - -static sljit_s32 emit_op(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 flags, sljit_s32 dst, sljit_sw dstw, sljit_s32 src1, sljit_sw src1w, sljit_s32 src2, sljit_sw src2w) -{ - /* arg1 goes to TMP_REG1 or src reg. - arg2 goes to TMP_REG2, imm or src reg. - TMP_REG3 can be used for caching. - result goes to TMP_REG2, so put result can use TMP_REG1 and TMP_REG3. */ - sljit_s32 dst_r = TMP_REG2; - sljit_s32 src1_r; - sljit_sw src2_r = 0; - sljit_s32 sugg_src2_r = TMP_REG2; - - if (!(flags & ALT_KEEP_CACHE)) { - compiler->cache_arg = 0; - compiler->cache_argw = 0; - } - - if (SLJIT_UNLIKELY(dst == SLJIT_UNUSED)) { - if (op >= SLJIT_MOV && op <= SLJIT_MOVU_S32 && !(src2 & SLJIT_MEM)) - return SLJIT_SUCCESS; - if (GET_FLAGS(op)) - flags |= UNUSED_DEST; - } else if (FAST_IS_REG(dst)) { - dst_r = dst; - flags |= REG_DEST; - if (op >= SLJIT_MOV && op <= SLJIT_MOVU_S32) - sugg_src2_r = dst_r; - } else if ((dst & SLJIT_MEM) && !getput_arg_fast(compiler, flags | ARG_TEST, TMP_REG1_mapped, dst, dstw)) - flags |= SLOW_DEST; - - if (flags & IMM_OP) { - if ((src2 & SLJIT_IMM) && src2w) { - if ((!(flags & LOGICAL_OP) - && (src2w <= SIMM_16BIT_MAX && src2w >= SIMM_16BIT_MIN)) - || ((flags & LOGICAL_OP) && !(src2w & ~UIMM_16BIT_MAX))) { - flags |= SRC2_IMM; - src2_r = src2w; - } - } - - if (!(flags & SRC2_IMM) && (flags & CUMULATIVE_OP) && (src1 & SLJIT_IMM) && src1w) { - if ((!(flags & LOGICAL_OP) - && (src1w <= SIMM_16BIT_MAX && src1w >= SIMM_16BIT_MIN)) - || 
((flags & LOGICAL_OP) && !(src1w & ~UIMM_16BIT_MAX))) { - flags |= SRC2_IMM; - src2_r = src1w; - - /* And swap arguments. */ - src1 = src2; - src1w = src2w; - src2 = SLJIT_IMM; - /* src2w = src2_r unneeded. */ - } - } - } - - /* Source 1. */ - if (FAST_IS_REG(src1)) { - src1_r = src1; - flags |= REG1_SOURCE; - } else if (src1 & SLJIT_IMM) { - if (src1w) { - FAIL_IF(load_immediate(compiler, TMP_REG1_mapped, src1w)); - src1_r = TMP_REG1; - } else - src1_r = 0; - } else { - if (getput_arg_fast(compiler, flags | LOAD_DATA, TMP_REG1_mapped, src1, src1w)) - FAIL_IF(compiler->error); - else - flags |= SLOW_SRC1; - src1_r = TMP_REG1; - } - - /* Source 2. */ - if (FAST_IS_REG(src2)) { - src2_r = src2; - flags |= REG2_SOURCE; - if (!(flags & REG_DEST) && op >= SLJIT_MOV && op <= SLJIT_MOVU_S32) - dst_r = src2_r; - } else if (src2 & SLJIT_IMM) { - if (!(flags & SRC2_IMM)) { - if (src2w) { - FAIL_IF(load_immediate(compiler, reg_map[sugg_src2_r], src2w)); - src2_r = sugg_src2_r; - } else { - src2_r = 0; - if ((op >= SLJIT_MOV && op <= SLJIT_MOVU_S32) && (dst & SLJIT_MEM)) - dst_r = 0; - } - } - } else { - if (getput_arg_fast(compiler, flags | LOAD_DATA, reg_map[sugg_src2_r], src2, src2w)) - FAIL_IF(compiler->error); - else - flags |= SLOW_SRC2; - src2_r = sugg_src2_r; - } - - if ((flags & (SLOW_SRC1 | SLOW_SRC2)) == (SLOW_SRC1 | SLOW_SRC2)) { - SLJIT_ASSERT(src2_r == TMP_REG2); - if (!can_cache(src1, src1w, src2, src2w) && can_cache(src1, src1w, dst, dstw)) { - FAIL_IF(getput_arg(compiler, flags | LOAD_DATA, TMP_REG2_mapped, src2, src2w, src1, src1w)); - FAIL_IF(getput_arg(compiler, flags | LOAD_DATA, TMP_REG1_mapped, src1, src1w, dst, dstw)); - } else { - FAIL_IF(getput_arg(compiler, flags | LOAD_DATA, TMP_REG1_mapped, src1, src1w, src2, src2w)); - FAIL_IF(getput_arg(compiler, flags | LOAD_DATA, TMP_REG2_mapped, src2, src2w, dst, dstw)); - } - } else if (flags & SLOW_SRC1) - FAIL_IF(getput_arg(compiler, flags | LOAD_DATA, TMP_REG1_mapped, src1, src1w, dst, dstw)); - else if 
(flags & SLOW_SRC2) - FAIL_IF(getput_arg(compiler, flags | LOAD_DATA, reg_map[sugg_src2_r], src2, src2w, dst, dstw)); - - FAIL_IF(emit_single_op(compiler, op, flags, dst_r, src1_r, src2_r)); - - if (dst & SLJIT_MEM) { - if (!(flags & SLOW_DEST)) { - getput_arg_fast(compiler, flags, reg_map[dst_r], dst, dstw); - return compiler->error; - } - - return getput_arg(compiler, flags, reg_map[dst_r], dst, dstw, 0, 0); - } - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_flags(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 dst, sljit_sw dstw, sljit_s32 src, sljit_sw srcw, sljit_s32 type) -{ - sljit_s32 sugg_dst_ar, dst_ar; - sljit_s32 flags = GET_ALL_FLAGS(op); - sljit_s32 mem_type = (op & SLJIT_I32_OP) ? (INT_DATA | SIGNED_DATA) : WORD_DATA; - - CHECK_ERROR(); - CHECK(check_sljit_emit_op_flags(compiler, op, dst, dstw, src, srcw, type)); - ADJUST_LOCAL_OFFSET(dst, dstw); - - op = GET_OPCODE(op); - if (op == SLJIT_MOV_S32 || op == SLJIT_MOV_U32) - mem_type = INT_DATA | SIGNED_DATA; - sugg_dst_ar = reg_map[(op < SLJIT_ADD && FAST_IS_REG(dst)) ? 
dst : TMP_REG2]; - - compiler->cache_arg = 0; - compiler->cache_argw = 0; - if (op >= SLJIT_ADD && (src & SLJIT_MEM)) { - ADJUST_LOCAL_OFFSET(src, srcw); - FAIL_IF(emit_op_mem2(compiler, mem_type | LOAD_DATA, TMP_REG1_mapped, src, srcw, dst, dstw)); - src = TMP_REG1; - srcw = 0; - } - - switch (type & 0xff) { - case SLJIT_EQUAL: - case SLJIT_NOT_EQUAL: - FAIL_IF(CMPLTUI(sugg_dst_ar, EQUAL_FLAG, 1)); - dst_ar = sugg_dst_ar; - break; - case SLJIT_LESS: - case SLJIT_GREATER_EQUAL: - dst_ar = ULESS_FLAG; - break; - case SLJIT_GREATER: - case SLJIT_LESS_EQUAL: - dst_ar = UGREATER_FLAG; - break; - case SLJIT_SIG_LESS: - case SLJIT_SIG_GREATER_EQUAL: - dst_ar = LESS_FLAG; - break; - case SLJIT_SIG_GREATER: - case SLJIT_SIG_LESS_EQUAL: - dst_ar = GREATER_FLAG; - break; - case SLJIT_OVERFLOW: - case SLJIT_NOT_OVERFLOW: - dst_ar = OVERFLOW_FLAG; - break; - case SLJIT_MUL_OVERFLOW: - case SLJIT_MUL_NOT_OVERFLOW: - FAIL_IF(CMPLTUI(sugg_dst_ar, OVERFLOW_FLAG, 1)); - dst_ar = sugg_dst_ar; - type ^= 0x1; /* Flip type bit for the XORI below. 
*/ - break; - - default: - SLJIT_UNREACHABLE(); - dst_ar = sugg_dst_ar; - break; - } - - if (type & 0x1) { - FAIL_IF(XORI(sugg_dst_ar, dst_ar, 1)); - dst_ar = sugg_dst_ar; - } - - if (op >= SLJIT_ADD) { - if (TMP_REG2_mapped != dst_ar) - FAIL_IF(ADD(TMP_REG2_mapped, dst_ar, ZERO)); - return emit_op(compiler, op | flags, mem_type | CUMULATIVE_OP | LOGICAL_OP | IMM_OP | ALT_KEEP_CACHE, dst, dstw, src, srcw, TMP_REG2, 0); - } - - if (dst & SLJIT_MEM) - return emit_op_mem(compiler, mem_type, dst_ar, dst, dstw); - - if (sugg_dst_ar != dst_ar) - return ADD(sugg_dst_ar, dst_ar, ZERO); - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compiler, sljit_s32 op) { - CHECK_ERROR(); - CHECK(check_sljit_emit_op0(compiler, op)); - - op = GET_OPCODE(op); - switch (op) { - case SLJIT_NOP: - return push_0_buffer(compiler, TILEGX_OPC_FNOP, __LINE__); - - case SLJIT_BREAKPOINT: - return PI(BPT); - - case SLJIT_LMUL_UW: - case SLJIT_LMUL_SW: - case SLJIT_DIVMOD_UW: - case SLJIT_DIVMOD_SW: - case SLJIT_DIV_UW: - case SLJIT_DIV_SW: - SLJIT_UNREACHABLE(); - } - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 dst, sljit_sw dstw, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_op1(compiler, op, dst, dstw, src, srcw)); - ADJUST_LOCAL_OFFSET(dst, dstw); - ADJUST_LOCAL_OFFSET(src, srcw); - - switch (GET_OPCODE(op)) { - case SLJIT_MOV: - case SLJIT_MOV_P: - return emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_MOV_U32: - return emit_op(compiler, SLJIT_MOV_U32, INT_DATA, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_MOV_S32: - return emit_op(compiler, SLJIT_MOV_S32, INT_DATA | SIGNED_DATA, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_MOV_U8: - return emit_op(compiler, SLJIT_MOV_U8, BYTE_DATA, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? 
(sljit_u8) srcw : srcw); - - case SLJIT_MOV_S8: - return emit_op(compiler, SLJIT_MOV_S8, BYTE_DATA | SIGNED_DATA, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_s8) srcw : srcw); - - case SLJIT_MOV_U16: - return emit_op(compiler, SLJIT_MOV_U16, HALF_DATA, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_u16) srcw : srcw); - - case SLJIT_MOV_S16: - return emit_op(compiler, SLJIT_MOV_S16, HALF_DATA | SIGNED_DATA, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_s16) srcw : srcw); - - case SLJIT_MOVU: - case SLJIT_MOVU_P: - return emit_op(compiler, SLJIT_MOV, WORD_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_MOVU_U32: - return emit_op(compiler, SLJIT_MOV_U32, INT_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_MOVU_S32: - return emit_op(compiler, SLJIT_MOV_S32, INT_DATA | SIGNED_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_MOVU_U8: - return emit_op(compiler, SLJIT_MOV_U8, BYTE_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_u8) srcw : srcw); - - case SLJIT_MOVU_S8: - return emit_op(compiler, SLJIT_MOV_S8, BYTE_DATA | SIGNED_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_s8) srcw : srcw); - - case SLJIT_MOVU_U16: - return emit_op(compiler, SLJIT_MOV_U16, HALF_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_u16) srcw : srcw); - - case SLJIT_MOVU_S16: - return emit_op(compiler, SLJIT_MOV_S16, HALF_DATA | SIGNED_DATA | WRITE_BACK, dst, dstw, TMP_REG1, 0, src, (src & SLJIT_IMM) ? (sljit_s16) srcw : srcw); - - case SLJIT_NOT: - return emit_op(compiler, op, 0, dst, dstw, TMP_REG1, 0, src, srcw); - - case SLJIT_NEG: - return emit_op(compiler, SLJIT_SUB | GET_ALL_FLAGS(op), IMM_OP, dst, dstw, SLJIT_IMM, 0, src, srcw); - - case SLJIT_CLZ: - return emit_op(compiler, op, (op & SLJIT_I32_OP) ? 
INT_DATA : WORD_DATA, dst, dstw, TMP_REG1, 0, src, srcw); - } - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 dst, sljit_sw dstw, sljit_s32 src1, sljit_sw src1w, sljit_s32 src2, sljit_sw src2w) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_op2(compiler, op, dst, dstw, src1, src1w, src2, src2w)); - ADJUST_LOCAL_OFFSET(dst, dstw); - ADJUST_LOCAL_OFFSET(src1, src1w); - ADJUST_LOCAL_OFFSET(src2, src2w); - - switch (GET_OPCODE(op)) { - case SLJIT_ADD: - case SLJIT_ADDC: - return emit_op(compiler, op, CUMULATIVE_OP | IMM_OP, dst, dstw, src1, src1w, src2, src2w); - - case SLJIT_SUB: - case SLJIT_SUBC: - return emit_op(compiler, op, IMM_OP, dst, dstw, src1, src1w, src2, src2w); - - case SLJIT_MUL: - return emit_op(compiler, op, CUMULATIVE_OP, dst, dstw, src1, src1w, src2, src2w); - - case SLJIT_AND: - case SLJIT_OR: - case SLJIT_XOR: - return emit_op(compiler, op, CUMULATIVE_OP | LOGICAL_OP | IMM_OP, dst, dstw, src1, src1w, src2, src2w); - - case SLJIT_SHL: - case SLJIT_LSHR: - case SLJIT_ASHR: - if (src2 & SLJIT_IMM) - src2w &= 0x3f; - if (op & SLJIT_I32_OP) - src2w &= 0x1f; - - return emit_op(compiler, op, IMM_OP, dst, dstw, src1, src1w, src2, src2w); - } - - return SLJIT_SUCCESS; -} - -SLJIT_API_FUNC_ATTRIBUTE struct sljit_label * sljit_emit_label(struct sljit_compiler *compiler) -{ - struct sljit_label *label; - - flush_buffer(compiler); - - CHECK_ERROR_PTR(); - CHECK_PTR(check_sljit_emit_label(compiler)); - - if (compiler->last_label && compiler->last_label->size == compiler->size) - return compiler->last_label; - - label = (struct sljit_label *)ensure_abuf(compiler, sizeof(struct sljit_label)); - PTR_FAIL_IF(!label); - set_label(label, compiler); - return label; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_ijump(struct sljit_compiler *compiler, sljit_s32 type, sljit_s32 src, sljit_sw srcw) -{ - sljit_s32 src_r = TMP_REG2; - struct sljit_jump *jump = NULL; - - 
flush_buffer(compiler); - - CHECK_ERROR(); - CHECK(check_sljit_emit_ijump(compiler, type, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - if (FAST_IS_REG(src)) { - if (reg_map[src] != 0) - src_r = src; - else - FAIL_IF(ADD_SOLO(TMP_REG2_mapped, reg_map[src], ZERO)); - } - - if (type >= SLJIT_CALL0) { - SLJIT_ASSERT(reg_map[PIC_ADDR_REG] == 16 && PIC_ADDR_REG == TMP_REG2); - if (src & (SLJIT_IMM | SLJIT_MEM)) { - if (src & SLJIT_IMM) - FAIL_IF(emit_const(compiler, reg_map[PIC_ADDR_REG], srcw, 1)); - else { - SLJIT_ASSERT(src_r == TMP_REG2 && (src & SLJIT_MEM)); - FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, TMP_REG2, 0, TMP_REG1, 0, src, srcw)); - } - - FAIL_IF(ADD_SOLO(0, reg_map[SLJIT_R0], ZERO)); - - FAIL_IF(ADDI_SOLO(54, 54, -16)); - - FAIL_IF(JALR_SOLO(reg_map[PIC_ADDR_REG])); - - return ADDI_SOLO(54, 54, 16); - } - - /* Register input. */ - if (type >= SLJIT_CALL1) - FAIL_IF(ADD_SOLO(0, reg_map[SLJIT_R0], ZERO)); - - FAIL_IF(ADD_SOLO(reg_map[PIC_ADDR_REG], reg_map[src_r], ZERO)); - - FAIL_IF(ADDI_SOLO(54, 54, -16)); - - FAIL_IF(JALR_SOLO(reg_map[src_r])); - - return ADDI_SOLO(54, 54, 16); - } - - if (src & SLJIT_IMM) { - jump = (struct sljit_jump *)ensure_abuf(compiler, sizeof(struct sljit_jump)); - FAIL_IF(!jump); - set_jump(jump, compiler, JUMP_ADDR | ((type >= SLJIT_FAST_CALL) ? 
IS_JAL : 0)); - jump->u.target = srcw; - FAIL_IF(emit_const(compiler, TMP_REG2_mapped, 0, 1)); - - if (type >= SLJIT_FAST_CALL) { - FAIL_IF(ADD_SOLO(ZERO, ZERO, ZERO)); - jump->addr = compiler->size; - FAIL_IF(JR_SOLO(reg_map[src_r])); - } else { - jump->addr = compiler->size; - FAIL_IF(JR_SOLO(reg_map[src_r])); - } - - return SLJIT_SUCCESS; - - } else if (src & SLJIT_MEM) { - FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, TMP_REG2, 0, TMP_REG1, 0, src, srcw)); - flush_buffer(compiler); - } - - FAIL_IF(JR_SOLO(reg_map[src_r])); - - if (jump) - jump->addr = compiler->size; - - return SLJIT_SUCCESS; -} - -#define BR_Z(src) \ - inst = BEQZ_X1 | SRCA_X1(src); \ - flags = IS_COND; - -#define BR_NZ(src) \ - inst = BNEZ_X1 | SRCA_X1(src); \ - flags = IS_COND; - -SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump * sljit_emit_jump(struct sljit_compiler *compiler, sljit_s32 type) -{ - struct sljit_jump *jump; - sljit_ins inst; - sljit_s32 flags = 0; - - flush_buffer(compiler); - - CHECK_ERROR_PTR(); - CHECK_PTR(check_sljit_emit_jump(compiler, type)); - - jump = (struct sljit_jump *)ensure_abuf(compiler, sizeof(struct sljit_jump)); - PTR_FAIL_IF(!jump); - set_jump(jump, compiler, type & SLJIT_REWRITABLE_JUMP); - type &= 0xff; - - switch (type) { - case SLJIT_EQUAL: - BR_NZ(EQUAL_FLAG); - break; - case SLJIT_NOT_EQUAL: - BR_Z(EQUAL_FLAG); - break; - case SLJIT_LESS: - BR_Z(ULESS_FLAG); - break; - case SLJIT_GREATER_EQUAL: - BR_NZ(ULESS_FLAG); - break; - case SLJIT_GREATER: - BR_Z(UGREATER_FLAG); - break; - case SLJIT_LESS_EQUAL: - BR_NZ(UGREATER_FLAG); - break; - case SLJIT_SIG_LESS: - BR_Z(LESS_FLAG); - break; - case SLJIT_SIG_GREATER_EQUAL: - BR_NZ(LESS_FLAG); - break; - case SLJIT_SIG_GREATER: - BR_Z(GREATER_FLAG); - break; - case SLJIT_SIG_LESS_EQUAL: - BR_NZ(GREATER_FLAG); - break; - case SLJIT_OVERFLOW: - case SLJIT_MUL_OVERFLOW: - BR_Z(OVERFLOW_FLAG); - break; - case SLJIT_NOT_OVERFLOW: - case SLJIT_MUL_NOT_OVERFLOW: - BR_NZ(OVERFLOW_FLAG); - break; - default: - /* Not 
conditional branch. */ - inst = 0; - break; - } - - jump->flags |= flags; - - if (inst) { - inst = inst | ((type <= SLJIT_JUMP) ? BOFF_X1(5) : BOFF_X1(6)); - PTR_FAIL_IF(PI(inst)); - } - - PTR_FAIL_IF(emit_const(compiler, TMP_REG2_mapped, 0, 1)); - if (type <= SLJIT_JUMP) { - jump->addr = compiler->size; - PTR_FAIL_IF(JR_SOLO(TMP_REG2_mapped)); - } else { - SLJIT_ASSERT(reg_map[PIC_ADDR_REG] == 16 && PIC_ADDR_REG == TMP_REG2); - /* Cannot be optimized out if type is >= CALL0. */ - jump->flags |= IS_JAL | (type >= SLJIT_CALL0 ? SLJIT_REWRITABLE_JUMP : 0); - PTR_FAIL_IF(ADD_SOLO(0, reg_map[SLJIT_R0], ZERO)); - jump->addr = compiler->size; - PTR_FAIL_IF(JALR_SOLO(TMP_REG2_mapped)); - } - - return jump; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fop1(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 dst, sljit_sw dstw, sljit_s32 src, sljit_sw srcw) -{ - SLJIT_UNREACHABLE(); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fop2(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 dst, sljit_sw dstw, sljit_s32 src1, sljit_sw src1w, sljit_s32 src2, sljit_sw src2w) -{ - SLJIT_UNREACHABLE(); -} - -SLJIT_API_FUNC_ATTRIBUTE struct sljit_const * sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value) -{ - struct sljit_const *const_; - sljit_s32 reg; - - flush_buffer(compiler); - - CHECK_ERROR_PTR(); - CHECK_PTR(check_sljit_emit_const(compiler, dst, dstw, init_value)); - ADJUST_LOCAL_OFFSET(dst, dstw); - - const_ = (struct sljit_const *)ensure_abuf(compiler, sizeof(struct sljit_const)); - PTR_FAIL_IF(!const_); - set_const(const_, compiler); - - reg = FAST_IS_REG(dst) ? 
dst : TMP_REG2; - - PTR_FAIL_IF(emit_const_64(compiler, reg, init_value, 1)); - - if (dst & SLJIT_MEM) - PTR_FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, TMP_REG2, 0)); - return const_; -} - -SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target) -{ - sljit_ins *inst = (sljit_ins *)addr; - - inst[0] = (inst[0] & ~(0xFFFFL << 43)) | (((new_target >> 32) & 0xffff) << 43); - inst[1] = (inst[1] & ~(0xFFFFL << 43)) | (((new_target >> 16) & 0xffff) << 43); - inst[2] = (inst[2] & ~(0xFFFFL << 43)) | ((new_target & 0xffff) << 43); - SLJIT_CACHE_FLUSH(inst, inst + 3); -} - -SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant) -{ - sljit_ins *inst = (sljit_ins *)addr; - - inst[0] = (inst[0] & ~(0xFFFFL << 43)) | (((new_constant >> 48) & 0xFFFFL) << 43); - inst[1] = (inst[1] & ~(0xFFFFL << 43)) | (((new_constant >> 32) & 0xFFFFL) << 43); - inst[2] = (inst[2] & ~(0xFFFFL << 43)) | (((new_constant >> 16) & 0xFFFFL) << 43); - inst[3] = (inst[3] & ~(0xFFFFL << 43)) | ((new_constant & 0xFFFFL) << 43); - SLJIT_CACHE_FLUSH(inst, inst + 4); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) -{ - CHECK_REG_INDEX(check_sljit_get_register_index(reg)); - return reg_map[reg]; -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_custom(struct sljit_compiler *compiler, - void *instruction, sljit_s32 size) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_op_custom(compiler, instruction, size)); - return SLJIT_ERR_UNSUPPORTED; -} - diff --git a/src/pcre/testdata/grepinputv b/src/pcre/testdata/grepinputv deleted file mode 100644 index d33d326b..00000000 --- a/src/pcre/testdata/grepinputv +++ /dev/null @@ -1,4 +0,0 @@ -The quick brown -fox jumps -over the lazy dog. -This time it jumps and jumps and jumps. 
diff --git a/src/pcre/testdata/grepoutput8 b/src/pcre/testdata/grepoutput8 deleted file mode 100644 index 91493bdc..00000000 --- a/src/pcre/testdata/grepoutput8 +++ /dev/null @@ -1,12 +0,0 @@ ----------------------------- Test U1 ------------------------------ -1:X one -2:X two 3:X three 4:X four 5:X five -6:X six -7:X sevenÂ…8:X eight
9:X nine
10:X ten -RC=0 ----------------------------- Test U2 ------------------------------ -12-Before 111 -13-Before 222
14-Before 333Â…15:Match -16-After 111 -17-After 222
18-After 333 -RC=0 diff --git a/src/pcre/testdata/grepoutputN b/src/pcre/testdata/grepoutputN deleted file mode 100644 index 1f9f8801..00000000 --- a/src/pcre/testdata/grepoutputN +++ /dev/null @@ -1,16 +0,0 @@ ----------------------------- Test N1 ------------------------------ -1:abc 2:def ---------------------------- Test N2 ------------------------------ -1:abc def -2:ghi -jkl---------------------------- Test N3 ------------------------------ -2:def 3: -ghi -jkl---------------------------- Test N4 ------------------------------ -2:ghi -jkl---------------------------- Test N5 ------------------------------ -1:abc 2:def -3:ghi -4:jkl---------------------------- Test N6 ------------------------------ -1:abc 2:def -3:ghi -4:jkl \ No newline at end of file diff --git a/src/pcre/testdata/saved16 b/src/pcre/testdata/saved16 deleted file mode 100644 index f86326c9..00000000 Binary files a/src/pcre/testdata/saved16 and /dev/null differ diff --git a/src/pcre/testdata/saved16BE-1 b/src/pcre/testdata/saved16BE-1 deleted file mode 100644 index 5d2bd1be..00000000 Binary files a/src/pcre/testdata/saved16BE-1 and /dev/null differ diff --git a/src/pcre/testdata/saved16BE-2 b/src/pcre/testdata/saved16BE-2 deleted file mode 100644 index c91ce37b..00000000 Binary files a/src/pcre/testdata/saved16BE-2 and /dev/null differ diff --git a/src/pcre/testdata/saved16LE-1 b/src/pcre/testdata/saved16LE-1 deleted file mode 100644 index 822ccd70..00000000 Binary files a/src/pcre/testdata/saved16LE-1 and /dev/null differ diff --git a/src/pcre/testdata/saved16LE-2 b/src/pcre/testdata/saved16LE-2 deleted file mode 100644 index 656c058d..00000000 Binary files a/src/pcre/testdata/saved16LE-2 and /dev/null differ diff --git a/src/pcre/testdata/saved32 b/src/pcre/testdata/saved32 deleted file mode 100644 index a4e27041..00000000 Binary files a/src/pcre/testdata/saved32 and /dev/null differ diff --git a/src/pcre/testdata/saved32BE-1 b/src/pcre/testdata/saved32BE-1 deleted file mode 100644 index 
609d97cd..00000000 Binary files a/src/pcre/testdata/saved32BE-1 and /dev/null differ diff --git a/src/pcre/testdata/saved32BE-2 b/src/pcre/testdata/saved32BE-2 deleted file mode 100644 index 79bb5e88..00000000 Binary files a/src/pcre/testdata/saved32BE-2 and /dev/null differ diff --git a/src/pcre/testdata/saved32LE-1 b/src/pcre/testdata/saved32LE-1 deleted file mode 100644 index 901dfb63..00000000 Binary files a/src/pcre/testdata/saved32LE-1 and /dev/null differ diff --git a/src/pcre/testdata/saved32LE-2 b/src/pcre/testdata/saved32LE-2 deleted file mode 100644 index 5f64af9b..00000000 Binary files a/src/pcre/testdata/saved32LE-2 and /dev/null differ diff --git a/src/pcre/testdata/saved8 b/src/pcre/testdata/saved8 deleted file mode 100644 index 8cf0c131..00000000 Binary files a/src/pcre/testdata/saved8 and /dev/null differ diff --git a/src/pcre/testdata/testinput10 b/src/pcre/testdata/testinput10 deleted file mode 100644 index 93ddb3a7..00000000 --- a/src/pcre/testdata/testinput10 +++ /dev/null @@ -1,1419 +0,0 @@ -/-- This set of tests check Unicode property support with the DFA matching - functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest - when running it. 
--/ - -/\pL\P{Nd}/8 - AB - *** Failers - A0 - 00 - -/\X./8 - AB - A\x{300}BC - A\x{300}\x{301}\x{302}BC - *** Failers - \x{300} - -/\X\X/8 - ABC - A\x{300}B\x{300}\x{301}C - A\x{300}\x{301}\x{302}BC - *** Failers - \x{300} - -/^\pL+/8 - abcd - a - *** Failers - -/^\PL+/8 - 1234 - = - *** Failers - abcd - -/^\X+/8 - abcdA\x{300}\x{301}\x{302} - A\x{300}\x{301}\x{302} - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302} - a - *** Failers - \x{300}\x{301}\x{302} - -/\X?abc/8 - abc - A\x{300}abc - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - \x{300}abc - *** Failers - -/^\X?abc/8 - abc - A\x{300}abc - *** Failers - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - \x{300}abc - -/\X*abc/8 - abc - A\x{300}abc - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - \x{300}abc - *** Failers - -/^\X*abc/8 - abc - A\x{300}abc - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - *** Failers - \x{300}abc - -/^\pL?=./8 - A=b - =c - *** Failers - 1=2 - AAAA=b - -/^\pL*=./8 - AAAA=b - =c - *** Failers - 1=2 - -/^\X{2,3}X/8 - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - *** Failers - X - A\x{300}\x{301}\x{302}X - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - -/^\pC\pL\pM\pN\pP\pS\pZ\p{Xsp}/8 - >\x{1680}\x{2028}\x{0b} - ** Failers - \x{0b} - -/^>\p{Xsp}+/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}*/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}{2,9}/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>[\p{Xsp}]/8O - >\x{2028}\x{0b} - -/^>[\p{Xsp}]+/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}/8 - >\x{1680}\x{2028}\x{0b} - >\x{a0} - ** Failers - \x{0b} - -/^>\p{Xps}+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}+?/8 - >\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}*/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 
-/^>\p{Xps}{2,9}/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}?/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>[\p{Xps}]/8 - >\x{2028}\x{0b} - -/^>[\p{Xps}]+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^\p{Xwd}/8 - ABCD - 1234 - \x{6ca} - \x{a6c} - \x{10a7} - _ABC - ** Failers - [] - -/^\p{Xwd}+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}*/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}{2,9}/8 - A_12\x{6ca}\x{a6c}\x{10a7} - -/^[\p{Xwd}]/8 - ABCD1234_ - 1234abcd_ - \x{6ca} - \x{a6c} - \x{10a7} - _ABC - ** Failers - [] - -/^[\p{Xwd}]+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/-- Unicode properties for \b abd \B --/ - -/\b...\B/8W - abc_ - \x{37e}abc\x{376} - \x{37e}\x{376}\x{371}\x{393}\x{394} - !\x{c0}++\x{c1}\x{c2} - !\x{c0}+++++ - -/-- Without PCRE_UCP, non-ASCII always fail, even if < 256 --/ - -/\b...\B/8 - abc_ - ** Failers - \x{37e}abc\x{376} - \x{37e}\x{376}\x{371}\x{393}\x{394} - !\x{c0}++\x{c1}\x{c2} - !\x{c0}+++++ - -/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties --/ - -/\b...\B/W - abc_ - !\x{c0}++\x{c1}\x{c2} - !\x{c0}+++++ - -/-- Caseless single negated characters > 127 need UCP support --/ - -/[^\x{100}]/8i - \x{100}\x{101}X - -/[^\x{100}]+/8i - \x{100}\x{101}XX - -/^\X/8 - A\P - A\P\P - A\x{300}\x{301}\P - A\x{300}\x{301}\P\P - A\x{301}\P - A\x{301}\P\P - -/^\X{2,3}/8 - A\P - A\P\P - AA\P - AA\P\P - A\x{300}\x{301}\P - A\x{300}\x{301}\P\P - A\x{300}\x{301}A\x{300}\x{301}\P - A\x{300}\x{301}A\x{300}\x{301}\P\P - -/^\X{2}/8 - AA\P - AA\P\P - A\x{300}\x{301}A\x{300}\x{301}\P - A\x{300}\x{301}A\x{300}\x{301}\P\P - -/^\X+/8 - AA\P - AA\P\P - -/^\X+?Z/8 - AA\P - AA\P\P - -/-- These are tests for extended grapheme clusters --/ - -/^\X/8+ - G\x{34e}\x{34e}X - \x{34e}\x{34e}X - \x04X - \x{1100}X - \x{1100}\x{34e}X - \x{1b04}\x{1b04}X - *These match up to the roman letters - \x{1111}\x{1111}L,L - \x{1111}\x{1111}\x{1169}L,L,V - \x{1111}\x{ae4c}L, LV - 
\x{1111}\x{ad89}L, LVT - \x{1111}\x{ae4c}\x{1169}L, LV, V - \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V - \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T - \x{1111}\x{ad89}\x{11fe}L, LVT, T - \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T - \x{ad89}\x{11fe}\x{11fe}LVT, T, T - *These match just the first codepoint (invalid sequence) - \x{1111}\x{11fe}L, T - \x{ae4c}\x{1111}LV, L - \x{ae4c}\x{ae4c}LV, LV - \x{ae4c}\x{ad89}LV, LVT - \x{1169}\x{1111}V, L - \x{1169}\x{ae4c}V, LV - \x{1169}\x{ad89}V, LVT - \x{ad89}\x{1111}LVT, L - \x{ad89}\x{1169}LVT, V - \x{ad89}\x{ae4c}LVT, LV - \x{ad89}\x{ad89}LVT, LVT - \x{11fe}\x{1111}T, L - \x{11fe}\x{1169}T, V - \x{11fe}\x{ae4c}T, LV - \x{11fe}\x{ad89}T, LVT - *Test extend and spacing mark - \x{1111}\x{ae4c}\x{0711}L, LV, extend - \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark - \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark - *Test CR, LF, and control - \x0d\x{0711}CR, extend - \x0d\x{1b04}CR, spacingmark - \x0a\x{0711}LF, extend - \x0a\x{1b04}LF, spacingmark - \x0b\x{0711}Control, extend - \x09\x{1b04}Control, spacingmark - *There are no Prepend characters, so we can't test Prepend, CR - -/^(?>\X{2})X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - -/^\X{2,4}X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - -/^\X{2,4}?X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - -/-- --/ - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - -/[z\x{1e9e}]+/8i - \x{1e9e}\x{00df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - -/[z\x{00df}]+/8i - \x{1e9e}\x{00df} - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - -/[z\x{1f88}]+/8i - \x{1f88}\x{1f80} - -/-- Perl matches these --/ - -/\x{00b5}+/8i - \x{00b5}\x{039c}\x{03bc} - -/\x{039c}+/8i - \x{00b5}\x{039c}\x{03bc} - -/\x{03bc}+/8i - \x{00b5}\x{039c}\x{03bc} - - -/\x{00c5}+/8i - 
\x{00c5}\x{00e5}\x{212b} - -/\x{00e5}+/8i - \x{00c5}\x{00e5}\x{212b} - -/\x{212b}+/8i - \x{00c5}\x{00e5}\x{212b} - - -/\x{01c4}+/8i - \x{01c4}\x{01c5}\x{01c6} - -/\x{01c5}+/8i - \x{01c4}\x{01c5}\x{01c6} - -/\x{01c6}+/8i - \x{01c4}\x{01c5}\x{01c6} - - -/\x{01c7}+/8i - \x{01c7}\x{01c8}\x{01c9} - -/\x{01c8}+/8i - \x{01c7}\x{01c8}\x{01c9} - -/\x{01c9}+/8i - \x{01c7}\x{01c8}\x{01c9} - - -/\x{01ca}+/8i - \x{01ca}\x{01cb}\x{01cc} - -/\x{01cb}+/8i - \x{01ca}\x{01cb}\x{01cc} - -/\x{01cc}+/8i - \x{01ca}\x{01cb}\x{01cc} - - -/\x{01f1}+/8i - \x{01f1}\x{01f2}\x{01f3} - -/\x{01f2}+/8i - \x{01f1}\x{01f2}\x{01f3} - -/\x{01f3}+/8i - \x{01f1}\x{01f2}\x{01f3} - - -/\x{0345}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/\x{0399}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/\x{03b9}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/\x{1fbe}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - - -/\x{0392}+/8i - \x{0392}\x{03b2}\x{03d0} - -/\x{03b2}+/8i - \x{0392}\x{03b2}\x{03d0} - -/\x{03d0}+/8i - \x{0392}\x{03b2}\x{03d0} - - -/\x{0395}+/8i - \x{0395}\x{03b5}\x{03f5} - -/\x{03b5}+/8i - \x{0395}\x{03b5}\x{03f5} - -/\x{03f5}+/8i - \x{0395}\x{03b5}\x{03f5} - - -/\x{0398}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/\x{03b8}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/\x{03d1}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/\x{03f4}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - - -/\x{039a}+/8i - \x{039a}\x{03ba}\x{03f0} - -/\x{03ba}+/8i - \x{039a}\x{03ba}\x{03f0} - -/\x{03f0}+/8i - \x{039a}\x{03ba}\x{03f0} - - -/\x{03a0}+/8i - \x{03a0}\x{03c0}\x{03d6} - -/\x{03c0}+/8i - \x{03a0}\x{03c0}\x{03d6} - -/\x{03d6}+/8i - \x{03a0}\x{03c0}\x{03d6} - - -/\x{03a1}+/8i - \x{03a1}\x{03c1}\x{03f1} - -/\x{03c1}+/8i - \x{03a1}\x{03c1}\x{03f1} - -/\x{03f1}+/8i - \x{03a1}\x{03c1}\x{03f1} - - -/\x{03a3}+/8i - \x{03A3}\x{03C2}\x{03C3} - -/\x{03c2}+/8i - \x{03A3}\x{03C2}\x{03C3} - -/\x{03c3}+/8i - \x{03A3}\x{03C2}\x{03C3} - - -/\x{03a6}+/8i - \x{03a6}\x{03c6}\x{03d5} - -/\x{03c6}+/8i - \x{03a6}\x{03c6}\x{03d5} - -/\x{03d5}+/8i - 
\x{03a6}\x{03c6}\x{03d5} - - -/\x{03c9}+/8i - \x{03c9}\x{03a9}\x{2126} - -/\x{03a9}+/8i - \x{03c9}\x{03a9}\x{2126} - -/\x{2126}+/8i - \x{03c9}\x{03a9}\x{2126} - - -/\x{1e60}+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e61}+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e9b}+/8i - \x{1e60}\x{1e61}\x{1e9b} - - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - -/\x{1f80}+/8i - \x{1f88}\x{1f80} - -/\x{004b}+/8i - \x{004b}\x{006b}\x{212a} - -/\x{006b}+/8i - \x{004b}\x{006b}\x{212a} - -/\x{212a}+/8i - \x{004b}\x{006b}\x{212a} - - -/\x{0053}+/8i - \x{0053}\x{0073}\x{017f} - -/\x{0073}+/8i - \x{0053}\x{0073}\x{017f} - -/\x{017f}+/8i - \x{0053}\x{0073}\x{017f} - -/ist/8i - ikt - -/is+t/8i - iSs\x{17f}t - ikt - -/is+?t/8i - ikt - -/is?t/8i - ikt - -/is{2}t/8i - iskt - -/^\p{Xuc}/8 - $abc - @abc - `abc - \x{1234}abc - ** Failers - abc - -/^\p{Xuc}+/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}+?/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}+?\*/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}++/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}{3,5}/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}{3,5}?/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^[\p{Xuc}]/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^[\p{Xuc}]+/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\P{Xuc}/8 - abc - ** Failers - $abc - @abc - `abc - \x{1234}abc - -/^[\P{Xuc}]/8 - abc - ** Failers - $abc - @abc - `abc - \x{1234}abc - -/^A\s+Z/8W - A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - -/^A[\s]+Z/8W - A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - -/-- End of testinput10 --/ diff --git a/src/pcre/testdata/testinput11 b/src/pcre/testdata/testinput11 deleted file mode 100644 index 6f0989a1..00000000 --- a/src/pcre/testdata/testinput11 +++ /dev/null @@ -1,143 +0,0 @@ -/-- These are a few representative patterns whose lengths and offsets are to be 
-shown when the link size is 2. This is just a doublecheck test to ensure the -sizes don't go horribly wrong when something is changed. The pattern contents -are all themselves checked in other tests. Unicode, including property support, -is required for these tests. --/ - -/((?i)b)/BM - -/(?s)(.*X|^B)/BM - -/(?s:.*X|^B)/BM - -/^[[:alnum:]]/BM - -/#/IxMD - -/a#/IxMD - -/x?+/BM - -/x++/BM - -/x{1,3}+/BM - -/(x)*+/BM - -/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/BM - -|8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM - -|\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM - -/(a(?1)b)/BM - -/(a(?1)+b)/BM - -/a(?Pb|c)d(?Pe)/BM - -/(?:a(?Pc(?Pd)))(?Pa)/BM - -/(?Pa)...(?P=a)bbb(?P>a)d/BM - -/abc(?C255)de(?C)f/BM - -/abcde/CBM - -/\x{100}/8BM - -/\x{1000}/8BM - -/\x{10000}/8BM - -/\x{100000}/8BM - -/\x{10ffff}/8BM - -/\x{110000}/8BM - -/[\x{ff}]/8BM - -/[\x{100}]/8BM - -/\x80/8BM - -/\xff/8BM - -/\x{0041}\x{2262}\x{0391}\x{002e}/D8M - -/\x{D55c}\x{ad6d}\x{C5B4}/D8M - -/\x{65e5}\x{672c}\x{8a9e}/D8M - -/[\x{100}]/8BM - -/[Z\x{100}]/8BM - -/^[\x{100}\E-\Q\E\x{150}]/B8M - 
-/^[\QÄ€\E-\QÅ\E]/B8M - -/^[\QÄ€\E-\QÅ\E/B8M - -/[\p{L}]/BM - -/[\p{^L}]/BM - -/[\P{L}]/BM - -/[\P{^L}]/BM - -/[abc\p{L}\x{0660}]/8BM - -/[\p{Nd}]/8BM - -/[\p{Nd}+-]+/8BM - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iBM - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8BM - -/[\x{105}-\x{109}]/8iBM - -/( ( (?(1)0|) )* )/xBM - -/( (?(1)0|)* )/xBM - -/[a]/BM - -/[a]/8BM - -/[\xaa]/BM - -/[\xaa]/8BM - -/[^a]/BM - -/[^a]/8BM - -/[^\xaa]/BM - -/[^\xaa]/8BM - -/[^\d]/8WB - -/[[:^alpha:][:^cntrl:]]+/8WB - -/[[:^cntrl:][:^alpha:]]+/8WB - -/[[:alpha:]]+/8WB - -/[[:^alpha:]\S]+/8WB - -/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B - -/(((a\2)|(a*)\g<-1>))*a?/B - -/((?+1)(\1))/B - -/.((?2)(?R)\1)()/B - -/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/ - -/-- End of testinput11 --/ diff --git a/src/pcre/testdata/testinput13 b/src/pcre/testdata/testinput13 deleted file mode 100644 index c7bc67bb..00000000 --- a/src/pcre/testdata/testinput13 +++ /dev/null @@ -1,9 +0,0 @@ -/-- This test is run only when JIT support is not available. It checks that an -attempt to use it has the expected behaviour. It also tests things that -are different without JIT. --/ - -/abc/S+I - -/a*/SI - -/-- End of testinput13 --/ diff --git a/src/pcre/testdata/testinput15 b/src/pcre/testdata/testinput15 deleted file mode 100644 index c065105b..00000000 --- a/src/pcre/testdata/testinput15 +++ /dev/null @@ -1,369 +0,0 @@ -/-- This set of tests is for UTF-8 support but not Unicode property support, - and is relevant only to the 8-bit library. 
--/ - -< forbid W - -/X(\C{3})/8 - X\x{1234} - -/X(\C{4})/8 - X\x{1234}YZ - -/X\C*/8 - XYZabcdce - -/X\C*?/8 - XYZabcde - -/X\C{3,5}/8 - Xabcdefg - X\x{1234} - X\x{1234}YZ - X\x{1234}\x{512} - X\x{1234}\x{512}YZ - -/X\C{3,5}?/8 - Xabcdefg - X\x{1234} - X\x{1234}YZ - X\x{1234}\x{512} - -/a\Cb/8 - aXb - a\nb - -/a\C\Cb/8 - a\x{100}b - -/ab\Cde/8 - abXde - -/a\C\Cb/8 - a\x{100}b - ** Failers - a\x{12257}b - -/[Ã]/8 - -/Ã/8 - -/ÃÃÃxxx/8 - -/ÃÃÃxxx/8?DZSSO - -/badutf/8 - \xdf - \xef - \xef\x80 - \xf7 - \xf7\x80 - \xf7\x80\x80 - \xfb - \xfb\x80 - \xfb\x80\x80 - \xfb\x80\x80\x80 - \xfd - \xfd\x80 - \xfd\x80\x80 - \xfd\x80\x80\x80 - \xfd\x80\x80\x80\x80 - \xdf\x7f - \xef\x7f\x80 - \xef\x80\x7f - \xf7\x7f\x80\x80 - \xf7\x80\x7f\x80 - \xf7\x80\x80\x7f - \xfb\x7f\x80\x80\x80 - \xfb\x80\x7f\x80\x80 - \xfb\x80\x80\x7f\x80 - \xfb\x80\x80\x80\x7f - \xfd\x7f\x80\x80\x80\x80 - \xfd\x80\x7f\x80\x80\x80 - \xfd\x80\x80\x7f\x80\x80 - \xfd\x80\x80\x80\x7f\x80 - \xfd\x80\x80\x80\x80\x7f - \xed\xa0\x80 - \xc0\x8f - \xe0\x80\x8f - \xf0\x80\x80\x8f - \xf8\x80\x80\x80\x8f - \xfc\x80\x80\x80\x80\x8f - \x80 - \xfe - \xff - -/badutf/8 - \xfb\x80\x80\x80\x80 - \xfd\x80\x80\x80\x80\x80 - \xf7\xbf\xbf\xbf - -/shortutf/8 - \P\P\xdf - \P\P\xef - \P\P\xef\x80 - \P\P\xf7 - \P\P\xf7\x80 - \P\P\xf7\x80\x80 - \P\P\xfb - \P\P\xfb\x80 - \P\P\xfb\x80\x80 - \P\P\xfb\x80\x80\x80 - \P\P\xfd - \P\P\xfd\x80 - \P\P\xfd\x80\x80 - \P\P\xfd\x80\x80\x80 - \P\P\xfd\x80\x80\x80\x80 - -/anything/8 - \xc0\x80 - \xc1\x8f - \xe0\x9f\x80 - \xf0\x8f\x80\x80 - \xf8\x87\x80\x80\x80 - \xfc\x83\x80\x80\x80\x80 - \xfe\x80\x80\x80\x80\x80 - \xff\x80\x80\x80\x80\x80 - \xc3\x8f - \xe0\xaf\x80 - \xe1\x80\x80 - \xf0\x9f\x80\x80 - \xf1\x8f\x80\x80 - \xf8\x88\x80\x80\x80 - \xf9\x87\x80\x80\x80 - \xfc\x84\x80\x80\x80\x80 - \xfd\x83\x80\x80\x80\x80 - \?\xf8\x88\x80\x80\x80 - \?\xf9\x87\x80\x80\x80 - \?\xfc\x84\x80\x80\x80\x80 - \?\xfd\x83\x80\x80\x80\x80 - -/\x{100}/8DZ - -/\x{1000}/8DZ - -/\x{10000}/8DZ - -/\x{100000}/8DZ - 
-/\x{10ffff}/8DZ - -/[\x{ff}]/8DZ - -/[\x{100}]/8DZ - -/\x80/8DZ - -/\xff/8DZ - -/\x{D55c}\x{ad6d}\x{C5B4}/DZ8 - \x{D55c}\x{ad6d}\x{C5B4} - -/\x{65e5}\x{672c}\x{8a9e}/DZ8 - \x{65e5}\x{672c}\x{8a9e} - -/\x{80}/DZ8 - -/\x{084}/DZ8 - -/\x{104}/DZ8 - -/\x{861}/DZ8 - -/\x{212ab}/DZ8 - -/-- This one is here not because it's different to Perl, but because the way -the captured single-byte is displayed. (In Perl it becomes a character, and you -can't tell the difference.) --/ - -/X(\C)(.*)/8 - X\x{1234} - X\nabc - -/-- This one is here because Perl gives out a grumbly error message (quite -correctly, but that messes up comparisons). --/ - -/a\Cb/8 - *** Failers - a\x{100}b - -/[^ab\xC0-\xF0]/8SDZ - \x{f1} - \x{bf} - \x{100} - \x{1000} - *** Failers - \x{c0} - \x{f0} - -/Ä€{3,4}/8SDZ - \x{100}\x{100}\x{100}\x{100\x{100} - -/(\x{100}+|x)/8SDZ - -/(\x{100}*a|x)/8SDZ - -/(\x{100}{0,2}a|x)/8SDZ - -/(\x{100}{1,2}a|x)/8SDZ - -/\x{100}/8DZ - -/a\x{100}\x{101}*/8DZ - -/a\x{100}\x{101}+/8DZ - -/[^\x{c4}]/DZ - -/[\x{100}]/8DZ - \x{100} - Z\x{100} - \x{100}Z - *** Failers - -/[\xff]/DZ8 - >\x{ff}< - -/[^\xff]/8DZ - -/\x{100}abc(xyz(?1))/8DZ - -/a\x{1234}b/P8 - a\x{1234}b - -/\777/8I - \x{1ff} - \777 - -/\x{100}+\x{200}/8DZ - -/\x{100}+X/8DZ - -/^[\QÄ€\E-\QÅ\E/BZ8 - -/-- This tests the stricter UTF-8 check according to RFC 3629. --/ - -/X/8 - \x{d800} - \x{d800}\? - \x{da00} - \x{da00}\? - \x{dfff} - \x{dfff}\? - \x{110000} - \x{110000}\? - \x{2000000} - \x{2000000}\? - \x{7fffffff} - \x{7fffffff}\? 
- -/(*UTF8)\x{1234}/ - abcd\x{1234}pqr - -/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I - -/\h/SI8 - ABC\x{09} - ABC\x{20} - ABC\x{a0} - ABC\x{1680} - ABC\x{180e} - ABC\x{2000} - ABC\x{202f} - ABC\x{205f} - ABC\x{3000} - -/\v/SI8 - ABC\x{0a} - ABC\x{0b} - ABC\x{0c} - ABC\x{0d} - ABC\x{85} - ABC\x{2028} - -/\h*A/SI8 - CDBABC - -/\v+A/SI8 - -/\s?xxx\s/8SI - -/\sxxx\s/I8ST1 - AB\x{85}xxx\x{a0}XYZ - AB\x{a0}xxx\x{85}XYZ - -/\S \S/I8ST1 - \x{a2} \x{84} - A Z - -/a+/8 - a\x{123}aa\>1 - a\x{123}aa\>2 - a\x{123}aa\>3 - a\x{123}aa\>4 - a\x{123}aa\>5 - a\x{123}aa\>6 - -/\x{1234}+/iS8I - -/\x{1234}+?/iS8I - -/\x{1234}++/iS8I - -/\x{1234}{2}/iS8I - -/[^\x{c4}]/8DZ - -/X+\x{200}/8DZ - -/\R/SI8 - -/\777/8DZ - -/\w+\x{C4}/8BZ - a\x{C4}\x{C4} - -/\w+\x{C4}/8BZT1 - a\x{C4}\x{C4} - -/\W+\x{C4}/8BZ - !\x{C4} - -/\W+\x{C4}/8BZT1 - !\x{C4} - -/\W+\x{A1}/8BZ - !\x{A1} - -/\W+\x{A1}/8BZT1 - !\x{A1} - -/X\s+\x{A0}/8BZ - X\x20\x{A0}\x{A0} - -/X\s+\x{A0}/8BZT1 - X\x20\x{A0}\x{A0} - -/\S+\x{A0}/8BZ - X\x{A0}\x{A0} - -/\S+\x{A0}/8BZT1 - X\x{A0}\x{A0} - -/\x{a0}+\s!/8BZ - \x{a0}\x20! - -/\x{a0}+\s!/8BZT1 - \x{a0}\x20! - -/A/8 - \x{ff000041} - \x{7f000041} - -/(*UTF8)abc/9 - -/abc/89 - -//8+L - \xf1\xad\xae\xae - -/-- End of testinput15 --/ diff --git a/src/pcre/testdata/testinput16 b/src/pcre/testdata/testinput16 deleted file mode 100644 index 7ccde0a8..00000000 --- a/src/pcre/testdata/testinput16 +++ /dev/null @@ -1,67 +0,0 @@ -/-- This set of tests is run only with the 8-bit library when Unicode property - support is available. It starts with tests of the POSIX interface, because - that is supported only with the 8-bit library. 
--/ - -/\w/P - +++\x{c2} - -/\w/WP - +++\x{c2} - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ - -/AB\x{1fb0}/8DZ - -/AB\x{1fb0}/8DZi - -/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/8iSI - \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} - \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} - -/[â±¥]/8iBZ - -/[^â±¥]/8iBZ - -/\h/SI - -/\v/SI - -/\R/SI - -/[[:blank:]]/WBZ - -/\x{212a}+/i8SI - KKkk\x{212a} - -/s+/i8SI - SSss\x{17f} - -/[\W\p{Any}]/BZ - abc - 123 - -/[\W\pL]/BZ - abc - ** Failers - 123 - -/[\D]/8 - \x{1d7cf} - -/[\D\P{Nd}]/8 - \x{1d7cf} - -/[^\D]/8 - a9b - ** Failers - \x{1d7cf} - -/[^\D\P{Nd}]/8 - a9b - \x{1d7cf} - ** Failers - \x{10000} - -/-- End of testinput16 --/ diff --git a/src/pcre/testdata/testinput17 b/src/pcre/testdata/testinput17 deleted file mode 100644 index c48e77f9..00000000 --- a/src/pcre/testdata/testinput17 +++ /dev/null @@ -1,309 +0,0 @@ -/-- This set of tests is for the 16- and 32-bit library's basic (non-UTF-16 - or -32) features that are not compatible with the 8-bit library, or which - give different output in 16- or 32-bit mode. --/ - -< forbid 8W - -/a\Cb/ - aXb - a\nb - -/[^\x{c4}]/DZ - -/\x{100}/I - -/ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* # optional leading comment -(?: (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... 
-[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) # initial word -(?: (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) )* # further okay, if led by a period -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
-(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* -# address -| # or -(?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) # one word, optionally followed by.... -(?: -[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or... -\( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) | # comments, or... - -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -# quoted strings -)* -< (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* # leading < -(?: @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... 
-(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* - -(?: (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* , (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* -)* # further okay, if led by comma -: # closing colon -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* )? # optional route -(?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
-(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) # initial word -(?: (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) )* # further okay, if led by a period -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
-(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* -# address spec -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* > # trailing > -# name and address -) (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* # optional trailing comment -/xSI - -/[\h]/BZ - >\x09< - -/[\h]+/BZ - >\x09\x20\xa0< - -/[\v]/BZ - -/[^\h]/BZ - -/\h+/SI - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\xa0\x{2000} - -/[\h\x{dc00}]+/BZSI - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\xa0\x{2000} - -/\H+/SI - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} - -/[\H\x{d800}]+/ - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} - -/\v+/SI - \x{2027}\x{2030}\x{2028}\x{2029} - \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d - -/[\v\x{dc00}]+/BZSI - \x{2027}\x{2030}\x{2028}\x{2029} - \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d - -/\V+/SI - \x{2028}\x{2029}\x{2027}\x{2030} - \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 - -/[\V\x{d800}]+/ - \x{2028}\x{2029}\x{2027}\x{2030} - \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 - -/\R+/SI - \x{2027}\x{2030}\x{2028}\x{2029} - \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d - -/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I - \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} - -/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/BZ - -/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/BZi - -/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/BZ - 
-/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/BZiu0100/BZ - -/[\u0100-\u0200]/BZ - -/\ud800/BZ - -/^\x{ffff}+/i - \x{ffff} - -/^\x{ffff}?/i - \x{ffff} - -/^\x{ffff}*/i - \x{ffff} - -/^\x{ffff}{3}/i - \x{ffff}\x{ffff}\x{ffff} - -/^\x{ffff}{0,3}/i - \x{ffff} - -/[^\x00-a]{12,}[^b-\xff]*/BZ - -/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ - -/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/BZ - -/^[\x{1234}\x{4321}]{2,4}?/ - \x{1234}\x{1234}\x{1234}nd of testinput17 --/ diff --git a/src/pcre/testdata/testinput18 b/src/pcre/testdata/testinput18 deleted file mode 100644 index 2dfb54cd..00000000 --- a/src/pcre/testdata/testinput18 +++ /dev/null @@ -1,300 +0,0 @@ -/-- This set of tests is for UTF-16 and UTF-32 support, and is relevant only to - the 16- and 32-bit libraries. --/ - -< forbid W - -/ÃÃÃxxx/8?DZSS - -/abc/8 - Ã] - -/X(\C{3})/8 - X\x{11234}Y - X\x{11234}YZ - -/X(\C{4})/8 - X\x{11234}YZ - X\x{11234}YZW - -/X\C*/8 - XYZabcdce - -/X\C*?/8 - XYZabcde - -/X\C{3,5}/8 - Xabcdefg - X\x{11234}Y - X\x{11234}YZ - X\x{11234}\x{512} - X\x{11234}\x{512}YZ - X\x{11234}\x{512}\x{11234}Z - -/X\C{3,5}?/8 - Xabcdefg - X\x{11234}Y - X\x{11234}YZ - X\x{11234}\x{512}YZ - *** Failers - X\x{11234} - -/a\Cb/8 - aXb - a\nb - -/a\C\Cb/8 - a\x{12257}b - a\x{12257}\x{11234}b - ** Failers - a\x{100}b - -/ab\Cde/8 - abXde - -/-- Check maximum character size --/ - -/\x{ffff}/8DZ - -/\x{10000}/8DZ - -/\x{100}/8DZ - -/\x{1000}/8DZ - -/\x{10000}/8DZ - -/\x{100000}/8DZ - -/\x{10ffff}/8DZ - -/[\x{ff}]/8DZ - -/[\x{100}]/8DZ - -/\x80/8DZ - -/\xff/8DZ - -/\x{D55c}\x{ad6d}\x{C5B4}/DZ8 - \x{D55c}\x{ad6d}\x{C5B4} - -/\x{65e5}\x{672c}\x{8a9e}/DZ8 - \x{65e5}\x{672c}\x{8a9e} - -/\x{80}/DZ8 - -/\x{084}/DZ8 - -/\x{104}/DZ8 - -/\x{861}/DZ8 - -/\x{212ab}/DZ8 - -/-- This one is here not because it's different to Perl, but because the way -the captured single-byte is displayed. 
(In Perl it becomes a character, and you -can't tell the difference.) --/ - -/X(\C)(.*)/8 - X\x{1234} - X\nabc - -/-- This one is here because Perl gives out a grumbly error message (quite -correctly, but that messes up comparisons). --/ - -/a\Cb/8 - *** Failers - a\x{100}b - -/[^ab\xC0-\xF0]/8SDZ - \x{f1} - \x{bf} - \x{100} - \x{1000} - *** Failers - \x{c0} - \x{f0} - -/Ä€{3,4}/8SDZ - \x{100}\x{100}\x{100}\x{100\x{100} - -/(\x{100}+|x)/8SDZ - -/(\x{100}*a|x)/8SDZ - -/(\x{100}{0,2}a|x)/8SDZ - -/(\x{100}{1,2}a|x)/8SDZ - -/\x{100}/8DZ - -/a\x{100}\x{101}*/8DZ - -/a\x{100}\x{101}+/8DZ - -/[^\x{c4}]/DZ - -/[\x{100}]/8DZ - \x{100} - Z\x{100} - \x{100}Z - *** Failers - -/[\xff]/DZ8 - >\x{ff}< - -/[^\xff]/8DZ - -/\x{100}abc(xyz(?1))/8DZ - -/\777/8I - \x{1ff} - \777 - -/\x{100}+\x{200}/8DZ - -/\x{100}+X/8DZ - -/^[\QÄ€\E-\QÅ\E/BZ8 - -/X/8 - \x{d800} - \x{d800}\? - \x{da00} - \x{da00}\? - \x{dc00} - \x{dc00}\? - \x{de00} - \x{de00}\? - \x{dfff} - \x{dfff}\? - \x{110000} - \x{d800}\x{1234} - -/(*UTF16)\x{11234}/ - abcd\x{11234}pqr - -/(*UTF)\x{11234}/I - abcd\x{11234}pqr - -/(*UTF-32)\x{11234}/ - abcd\x{11234}pqr - -/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I - -/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I - -/\h/SI8 - ABC\x{09} - ABC\x{20} - ABC\x{a0} - ABC\x{1680} - ABC\x{180e} - ABC\x{2000} - ABC\x{202f} - ABC\x{205f} - ABC\x{3000} - -/\v/SI8 - ABC\x{0a} - ABC\x{0b} - ABC\x{0c} - ABC\x{0d} - ABC\x{85} - ABC\x{2028} - -/\h*A/SI8 - CDBABC - \x{2000}ABC - -/\R*A/SI8 - CDBABC - \x{2028}A - -/\v+A/SI8 - -/\s?xxx\s/8SI - -/\sxxx\s/I8ST1 - AB\x{85}xxx\x{a0}XYZ - AB\x{a0}xxx\x{85}XYZ - -/\S \S/I8ST1 - \x{a2} \x{84} - A Z - -/a+/8 - a\x{123}aa\>1 - a\x{123}aa\>2 - a\x{123}aa\>3 - a\x{123}aa\>4 - a\x{123}aa\>5 - a\x{123}aa\>6 - -/\x{1234}+/iS8I - -/\x{1234}+?/iS8I - -/\x{1234}++/iS8I - -/\x{1234}{2}/iS8I - -/[^\x{c4}]/8DZ - -/X+\x{200}/8DZ - -/\R/SI8 - -/-- Check bad offset --/ - -/a/8 - \x{10000}\>1 - \x{10000}ab\>1 - \x{10000}ab\>2 - \x{10000}ab\>3 - \x{10000}ab\>4 - \x{10000}ab\>5 - -/í¼€/8 - 
-/\w+\x{C4}/8BZ - a\x{C4}\x{C4} - -/\w+\x{C4}/8BZT1 - a\x{C4}\x{C4} - -/\W+\x{C4}/8BZ - !\x{C4} - -/\W+\x{C4}/8BZT1 - !\x{C4} - -/\W+\x{A1}/8BZ - !\x{A1} - -/\W+\x{A1}/8BZT1 - !\x{A1} - -/X\s+\x{A0}/8BZ - X\x20\x{A0}\x{A0} - -/X\s+\x{A0}/8BZT1 - X\x20\x{A0}\x{A0} - -/\S+\x{A0}/8BZ - X\x{A0}\x{A0} - -/\S+\x{A0}/8BZT1 - X\x{A0}\x{A0} - -/\x{a0}+\s!/8BZ - \x{a0}\x20! - -/\x{a0}+\s!/8BZT1 - \x{a0}\x20! - -/(*UTF)abc/9 - -/abc/89 - -/-- End of testinput18 --/ diff --git a/src/pcre/testdata/testinput19 b/src/pcre/testdata/testinput19 deleted file mode 100644 index dfe8c7be..00000000 --- a/src/pcre/testdata/testinput19 +++ /dev/null @@ -1,45 +0,0 @@ -/-- This set of tests is for Unicode property support, relevant only to the - 16- and 32-bit library. --/ - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ - -/AB\x{1fb0}/8DZ - -/AB\x{1fb0}/8DZi - -/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/8iSI - \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} - \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} - -/[â±¥]/8iBZ - -/[^â±¥]/8iBZ - -/[[:blank:]]/WBZ - -/\x{212a}+/i8SI - KKkk\x{212a} - -/s+/i8SI - SSss\x{17f} - -/[\D]/8 - \x{1d7cf} - -/[\D\P{Nd}]/8 - \x{1d7cf} - -/[^\D]/8 - a9b - ** Failers - \x{1d7cf} - -/[^\D\P{Nd}]/8 - a9b - \x{1d7cf} - ** Failers - \x{10000} - -/-- End of testinput19 --/ diff --git a/src/pcre/testdata/testinput2 b/src/pcre/testdata/testinput2 deleted file mode 100644 index 3528de15..00000000 --- a/src/pcre/testdata/testinput2 +++ /dev/null @@ -1,4263 +0,0 @@ -/-- This set of tests is not Perl-compatible. It checks on special features - of PCRE's API, error diagnostics, and the compiled code of some patterns. - It also checks the non-Perl syntax the PCRE supports (Python, .NET, - Oniguruma). 
Finally, there are some tests where PCRE and Perl differ, - either because PCRE can't be compatible, or there is a possible Perl - bug. - - NOTE: This is a non-UTF set of tests. When UTF support is needed, use - test 5, and if Unicode Property Support is needed, use test 7. --/ - -< forbid 8W - -/(a)b|/I - -/abc/I - abc - defabc - \Aabc - *** Failers - \Adefabc - ABC - -/^abc/I - abc - \Aabc - *** Failers - defabc - \Adefabc - -/a+bc/I - -/a*bc/I - -/a{3}bc/I - -/(abc|a+z)/I - -/^abc$/I - abc - *** Failers - def\nabc - -/ab\idef/X - -/(?X)ab\idef/X - -/x{5,4}/ - -/z{65536}/ - -/[abcd/ - -/(?X)[\B]/ - -/(?X)[\R]/ - -/(?X)[\X]/ - -/[\B]/BZ - -/[\R]/BZ - -/[\X]/BZ - -/[z-a]/ - -/^*/ - -/(abc/ - -/(?# abc/ - -/(?z)abc/ - -/.*b/I - -/.*?b/I - -/cat|dog|elephant/I - this sentence eventually mentions a cat - this sentences rambles on and on for a while and then reaches elephant - -/cat|dog|elephant/IS - this sentence eventually mentions a cat - this sentences rambles on and on for a while and then reaches elephant - -/cat|dog|elephant/IiS - this sentence eventually mentions a CAT cat - this sentences rambles on and on for a while to elephant ElePhant - -/a|[bcd]/IS - -/(a|[^\dZ])/IS - -/(a|b)*[\s]/IS - -/(ab\2)/ - -/{4,5}abc/ - -/(a)(b)(c)\2/I - abcb - \O0abcb - \O3abcb - \O6abcb - \O9abcb - \O12abcb - -/(a)bc|(a)(b)\2/I - abc - \O0abc - \O3abc - \O6abc - aba - \O0aba - \O3aba - \O6aba - \O9aba - \O12aba - -/abc$/IE - abc - *** Failers - abc\n - abc\ndef - -/(a)(b)(c)(d)(e)\6/ - -/the quick brown fox/I - the quick brown fox - this is a line with the quick brown fox - -/the quick brown fox/IA - the quick brown fox - *** Failers - this is a line with the quick brown fox - -/ab(?z)cd/ - -/^abc|def/I - abcdef - abcdef\B - -/.*((abc)$|(def))/I - defabc - \Zdefabc - -/)/ - -/a[]b/ - -/[^aeiou ]{3,}/I - co-processors, and for - -/<.*>/I - abcghinop - -/<.*?>/I - abcghinop - -/<.*>/IU - abcghinop - -/(?U)<.*>/I - abcghinop - -/<.*?>/IU - abcghinop - -/={3,}/IU - abc========def - 
-/(?U)={3,}?/I - abc========def - -/(?^abc)/Im - abc - def\nabc - *** Failers - defabc - -/(?<=ab(c+)d)ef/ - -/(?<=ab(?<=c+)d)ef/ - -/(?<=ab(c|de)f)g/ - -/The next three are in testinput2 because they have variable length branches/ - -/(?<=bullock|donkey)-cart/I - the bullock-cart - a donkey-cart race - *** Failers - cart - horse-and-cart - -/(?<=ab(?i)x|y|z)/I - -/(?>.*)(?<=(abcd)|(xyz))/I - alphabetabcd - endingxyz - -/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/I - abxyZZ - abXyZZ - ZZZ - zZZ - bZZ - BZZ - *** Failers - ZZ - abXYZZ - zzz - bzz - -/(?[^()]+) # Either a sequence of non-brackets (no backtracking) - | # Or - (?R) # Recurse - i.e. nested bracketed string - )* # Zero or more contents - \) # Closing ) - /Ix - (abcd) - (abcd)xyz - xyz(abcd) - (ab(xy)cd)pqr - (ab(xycd)pqr - () abc () - 12(abcde(fsh)xyz(foo(bar))lmno)89 - *** Failers - abcd - abcd) - (abcd - -/\( ( (?>[^()]+) | (?R) )* \) /Ixg - (ab(xy)cd)pqr - 1(abcd)(x(y)z)pqr - -/\( (?: (?>[^()]+) | (?R) ) \) /Ix - (abcd) - (ab(xy)cd) - (a(b(c)d)e) - ((ab)) - *** Failers - () - -/\( (?: (?>[^()]+) | (?R) )? \) /Ix - () - 12(abcde(fsh)xyz(foo(bar))lmno)89 - -/\( ( (?>[^()]+) | (?R) )* \) /Ix - (ab(xy)cd) - -/\( ( ( (?>[^()]+) | (?R) )* ) \) /Ix - (ab(xy)cd) - -/\( (123)? ( ( (?>[^()]+) | (?R) )* ) \) /Ix - (ab(xy)cd) - (123ab(xy)cd) - -/\( ( (123)? ( (?>[^()]+) | (?R) )* ) \) /Ix - (ab(xy)cd) - (123ab(xy)cd) - -/\( (((((((((( ( (?>[^()]+) | (?R) )* )))))))))) \) /Ix - (ab(xy)cd) - -/\( ( ( (?>[^()<>]+) | ((?>[^()]+)) | (?R) )* ) \) /Ix - (abcd(xyz

    qrs)123) - -/\( ( ( (?>[^()]+) | ((?R)) )* ) \) /Ix - (ab(cd)ef) - (ab(cd(ef)gh)ij) - -/^[[:alnum:]]/DZ - -/^[[:^alnum:]]/DZ - -/^[[:alpha:]]/DZ - -/^[[:^alpha:]]/DZ - -/[_[:alpha:]]/IS - -/^[[:ascii:]]/DZ - -/^[[:^ascii:]]/DZ - -/^[[:blank:]]/DZ - -/^[[:^blank:]]/DZ - -/[\n\x0b\x0c\x0d[:blank:]]/IS - -/^[[:cntrl:]]/DZ - -/^[[:digit:]]/DZ - -/^[[:graph:]]/DZ - -/^[[:lower:]]/DZ - -/^[[:print:]]/DZ - -/^[[:punct:]]/DZ - -/^[[:space:]]/DZ - -/^[[:upper:]]/DZ - -/^[[:xdigit:]]/DZ - -/^[[:word:]]/DZ - -/^[[:^cntrl:]]/DZ - -/^[12[:^digit:]]/DZ - -/^[[:^blank:]]/DZ - -/[01[:alpha:]%]/DZ - -/[[.ch.]]/I - -/[[=ch=]]/I - -/[[:rhubarb:]]/I - -/[[:upper:]]/Ii - A - a - -/[[:lower:]]/Ii - A - a - -/((?-i)[[:lower:]])[[:lower:]]/Ii - ab - aB - *** Failers - Ab - AB - -/[\200-\110]/I - -/^(?(0)f|b)oo/I - -/This one's here because of the large output vector needed/I - -/(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|
$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\
s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\w+)\s+(\270)/I - \O900 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC - -/This one's here because Perl does this differently and PCRE can't at present/I - -/(main(O)?)+/I - mainmain - mainOmain - -/These are all cases where Perl does it differently (nested captures)/I - -/^(a(b)?)+$/I - aba - -/^(aa(bb)?)+$/I - aabbaa - -/^(aa|aa(bb))+$/I - aabbaa - -/^(aa(bb)??)+$/I - aabbaa - -/^(?:aa(bb)?)+$/I - aabbaa - -/^(aa(b(b))?)+$/I - aabbaa - -/^(?:aa(b(b))?)+$/I - aabbaa - -/^(?:aa(b(?:b))?)+$/I - aabbaa - -/^(?:aa(bb(?:b))?)+$/I - aabbbaa - -/^(?:aa(b(?:bb))?)+$/I - aabbbaa - -/^(?:aa(?:b(b))?)+$/I - aabbaa - -/^(?:aa(?:b(bb))?)+$/I - aabbbaa - -/^(aa(b(bb))?)+$/I - aabbbaa - 
-/^(aa(bb(bb))?)+$/I - aabbbbaa - -/--------------------------------------------------------------------/I - -/#/IxDZ - -/a#/IxDZ - -/[\s]/DZ - -/[\S]/DZ - -/a(?i)b/DZ - ab - aB - *** Failers - AB - -/(a(?i)b)/DZ - ab - aB - *** Failers - AB - -/ (?i)abc/IxDZ - -/#this is a comment - (?i)abc/Ixx/DZ - -/ \Q\E/DZ - -/a\Q\E/DZ - abc - bca - bac - -/a\Q\Eb/DZ - abc - -/\Q\Eabc/DZ - -/x*+\w/DZ - *** Failers - xxxxx - -/x?+/DZ - -/x++/DZ - -/x{1,3}+/BZO - -/x{1,3}+/BZOi - -/[^x]{1,3}+/BZO - -/[^x]{1,3}+/BZOi - -/(x)*+/DZ - -/^(\w++|\s++)*$/I - now is the time for all good men to come to the aid of the party - *** Failers - this is not a line with only words and spaces! - -/(\d++)(\w)/I - 12345a - *** Failers - 12345+ - -/a++b/I - aaab - -/(a++b)/I - aaab - -/(a++)b/I - aaab - -/([^()]++|\([^()]*\))+/I - ((abc(ade)ufh()()x - -/\(([^()]++|\([^()]+\))+\)/I - (abc) - (abc(def)xyz) - *** Failers - ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - -/(abc){1,3}+/DZ - -/a+?+/I - -/a{2,3}?+b/I - -/(?U)a+?+/I - -/a{2,3}?+b/IU - -/x(?U)a++b/DZ - xaaaab - -/(?U)xa++b/DZ - xaaaab - -/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/DZ - -/^x(?U)a+b/DZ - -/^x(?U)(a+)b/DZ - -/[.x.]/I - -/[=x=]/I - -/[:x:]/I - -/\l/I - -/\L/I - -/\N{name}/I - -/\u/I - -/\U/I - -/a{1,3}b/U - ab - -/[/I - -/[a-/I - -/[[:space:]/I - -/[\s]/IDZ - -/[[:space:]]/IDZ - -/[[:space:]abcde]/IDZ - -/< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >/Ix - <> - - hij> - hij> - def> - - *** Failers - iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|IDZ - 
-|\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|IDZ - -/(.*)\d+\1/I - -/(.*)\d+/I - -/(.*)\d+\1/Is - -/(.*)\d+/Is - -/(.*(xyz))\d+\2/I - -/((.*))\d+\1/I - abc123bc - -/a[b]/I - -/(?=a).*/I - -/(?=abc).xyz/IiI - -/(?=abc)(?i).xyz/I - -/(?=a)(?=b)/I - -/(?=.)a/I - -/((?=abcda)a)/I - -/((?=abcda)ab)/I - -/()a/I - -/(?(1)ab|ac)(.)/I - -/(?(1)abz|acz)(.)/I - -/(?(1)abz)(.)/I - -/(?(1)abz)(1)23/I - -/(a)+/I - -/(a){2,3}/I - -/(a)*/I - -/[a]/I - -/[ab]/I - -/[ab]/IS - -/[^a]/I - -/\d456/I - -/\d456/IS - -/a^b/I - -/^a/Im - abcde - xy\nabc - *** Failers - xyabc - -/c|abc/I - -/(?i)[ab]/IS - -/[ab](?i)cd/IS - -/abc(?C)def/I - abcdef - 1234abcdef - *** Failers - abcxyz - abcxyzf - -/abc(?C)de(?C1)f/I - 123abcdef - -/(?C1)\dabc(?C2)def/IS - 1234abcdef - *** Failers - abcdef - -/(?C1)\dabc(?C2)def/ISS - 1234abcdef - *** Failers - abcdef - -/(?C255)ab/I - -/(?C256)ab/I - -/(?Cab)xx/I - -/(?C12vr)x/I - -/abc(?C)def/I - *** Failers - \x83\x0\x61bcdef - -/(abc)(?C)de(?C1)f/I - 123abcdef - 123abcdef\C+ - 123abcdef\C- - *** Failers - 123abcdef\C!1 - -/(?C0)(abc(?C1))*/I - abcabcabc - abcabc\C!1!3 - *** Failers - abcabcabc\C!1!3 - -/(\d{3}(?C))*/I - 123\C+ - 123456\C+ - 123456789\C+ - -/((xyz)(?C)p|(?C1)xyzabc)/I - xyzabc\C+ - -/(X)((xyz)(?C)p|(?C1)xyzabc)/I - Xxyzabc\C+ - -/(?=(abc))(?C)abcdef/I - abcdef\C+ - -/(?!(abc)(?C1)d)(?C2)abcxyz/I - abcxyz\C+ - -/(?<=(abc)(?C))xyz/I - abcxyz\C+ - -/a(b+)(c*)(?C1)/I - abbbbbccc\C*1 - -/a(b+?)(c*?)(?C1)/I - abbbbbccc\C*1 - -/(?C)abc/I - -/(?C)^abc/I - -/(?C)a|b/IS - -/(?R)/I - -/(a|(?R))/I 
- -/(ab|(bc|(de|(?R))))/I - -/x(ab|(bc|(de|(?R))))/I - xab - xbc - xde - xxab - xxxab - *** Failers - xyab - -/(ab|(bc|(de|(?1))))/I - -/x(ab|(bc|(de|(?1)x)x)x)/I - -/^([^()]|\((?1)*\))*$/I - abc - a(b)c - a(b(c))d - *** Failers) - a(b(c)d - -/^>abc>([^()]|\((?1)*\))*abc>123abc>1(2)3abc>(1(2)3)]*+) | (?2)) * >))/Ix - <> - - hij> - hij> - def> - - *** Failers - b|c)d(?Pe)/DZ - abde - acde - -/(?:a(?Pc(?Pd)))(?Pa)/DZ - -/(?Pa)...(?P=a)bbb(?P>a)d/DZ - -/^\W*(?:(?P(?P.)\W*(?P>one)\W*(?P=two)|)|(?P(?P.)\W*(?P>three)\W*(?P=four)|\W*.\W*))\W*$/Ii - 1221 - Satan, oscillate my metallic sonatas! - A man, a plan, a canal: Panama! - Able was I ere I saw Elba. - *** Failers - The quick brown fox - -/((?(R)a|b))\1(?1)?/I - bb - bbaa - -/(.*)a/Is - -/(.*)a\1/Is - -/(.*)a(b)\2/Is - -/((.*)a|(.*)b)z/Is - -/((.*)a|(.*)b)z\1/Is - -/((.*)a|(.*)b)z\2/Is - -/((.*)a|(.*)b)z\3/Is - -/((.*)a|^(.*)b)z\3/Is - -/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a/Is - -/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\31/Is - -/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\32/Is - -/(a)(bc)/INDZ - abc - -/(?Pa)(bc)/INDZ - abc - -/(a)(?Pbc)/INDZ - -/(a+)*zz/I - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\M - aaaaaaaaaaaaaz\M - -/(aaa(?C1)bbb|ab)/I - aaabbb - aaabbb\C*0 - aaabbb\C*1 - aaabbb\C*-1 - -/ab(?Pcd)ef(?Pgh)/I - abcdefgh - abcdefgh\C1\Gtwo - abcdefgh\Cone\Ctwo - abcdefgh\Cthree - -/(?P)(?P)/DZ - -/(?P)(?P)/DZ - -/(?Pzz)(?Paa)/I - zzaa\CZ - zzaa\CA - -/(?Peks)(?Peccs)/I - -/(?Pabc(?Pdef)(?Pxyz))/I - -"\[((?P\d+)(,(?P>elem))*)\]"I - [10,20,30,5,5,4,4,2,43,23,4234] - *** Failers - [] - -"\[((?P\d+)(,(?P>elem))*)?\]"I - 
[10,20,30,5,5,4,4,2,43,23,4234] - [] - -/(a(b(?2)c))?/DZ - -/(a(b(?2)c))*/DZ - -/(a(b(?2)c)){0,2}/DZ - -/[ab]{1}+/DZ - -/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/Ii - Baby Bjorn Active Carrier - With free SHIPPING!! - -/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/IiS - Baby Bjorn Active Carrier - With free SHIPPING!! - -/a*.*b/ISDZ - -/(a|b)*.?c/ISDZ - -/abc(?C255)de(?C)f/DZ - -/abcde/ICDZ - abcde - abcdfe - -/a*b/ICDZS - ab - aaaab - aaaacb - -/a*b/ICDZSS - ab - aaaab - aaaacb - -/a+b/ICDZ - ab - aaaab - aaaacb - -/(abc|def)x/ICDZS - abcx - defx - ** Failers - abcdefzx - -/(abc|def)x/ICDZSS - abcx - defx - ** Failers - abcdefzx - -/(ab|cd){3,4}/IC - ababab - abcdabcd - abcdcdcdcdcd - -/([ab]{,4}c|xy)/ICDZS - Note: that { does NOT introduce a quantifier - -/([ab]{,4}c|xy)/ICDZSS - Note: that { does NOT introduce a quantifier - -/([ab]{1,4}c|xy){4,5}?123/ICDZ - aacaacaacaacaac123 - -/\b.*/I - ab cd\>1 - -/\b.*/Is - ab cd\>1 - -/(?!.bcd).*/I - Xbcd12345 - -/abcde/I - ab\P - abc\P - abcd\P - abcde\P - the quick brown abc\P - ** Failers\P - the quick brown abxyz fox\P - -"^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/(20)?\d\d$"I - 13/05/04\P - 13/5/2004\P - 02/05/09\P - 1\P - 1/2\P - 1/2/0\P - 1/2/04\P - 0\P - 02/\P - 02/0\P - 02/1\P - ** Failers\P - \P - 123\P - 33/4/04\P - 3/13/04\P - 0/1/2003\P - 0/\P - 02/0/\P - 02/13\P - -/0{0,2}ABC/I - -/\d{3,}ABC/I - -/\d*ABC/I - -/[abc]+DE/I - -/[abc]?123/I - 123\P - a\P - b\P - c\P - c12\P - c123\P - -/^(?:\d){3,5}X/I - 1\P - 123\P - 123X - 1234\P - 1234X - 12345\P - 12345X - *** Failers - 1X - 123456\P - -//KF>testsavedregex - -/abc/IS>testsavedregex -testsavedregex -testsavedregex -testsavedregex -testsavedregex -testsavedregex -testsavedregex -testsavedregex -(.)*~smgI - \J1024\n\n\nPartner der LCO\nde\nPartner der LINEAS Consulting\nGmbH\nLINEAS Consulting GmbH Hamburg\nPartnerfirmen\n30 days\nindex,follow\n\nja\n3\nPartner\n\n\nLCO\nLINEAS Consulting\n15.10.2003\n\n\n\n\nDie Partnerfirmen der LINEAS 
Consulting\nGmbH\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n - -/^a/IF - -/line\nbreak/I - this is a line\nbreak - line one\nthis is a line\nbreak in the second line - -/line\nbreak/If - this is a line\nbreak - ** Failers - line one\nthis is a line\nbreak in the second line - -/line\nbreak/Imf - this is a line\nbreak - ** Failers - line one\nthis is a line\nbreak in the second line - -/(?i)(?-i)AbCd/I - AbCd - ** Failers - abcd - -/a{11111111111111111111}/I - -/(){64294967295}/I - -/(){2,4294967295}/I - -"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I - abcdefghijklAkB - -"(?Pa)(?Pb)(?Pc)(?Pd)(?Pe)(?Pf)(?Pg)(?Ph)(?Pi)(?Pj)(?Pk)(?Pl)A\11B"I - abcdefghijklAkB - -"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)A\11B"I - abcdefghijklAkB - -"(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)"I - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - -"(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)"I - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - -/[^()]*(?:\((?R)\)[^()]*)*/I - (this(and)that - (this(and)that) - (this(and)that)stuff - -/[^()]*(?:\((?>(?R))\)[^()]*)*/I - (this(and)that - (this(and)that) - -/[^()]*(?:\((?R)\))*[^()]*/I - 
(this(and)that - (this(and)that) - -/(?:\((?R)\))*[^()]*/I - (this(and)that - (this(and)that) - ((this)) - -/(?:\((?R)\))|[^()]*/I - (this(and)that - (this(and)that) - (this) - ((this)) - -/\x{0000ff}/I - -/^((?Pa1)|(?Pa2)b)/I - -/^((?Pa1)|(?Pa2)b)/IJ - a1b\CA - a2b\CA - ** Failers - a1b\CZ\CA - -/(?|(?)(?)(?)|(?)(?)(?))/IJ - -/^(?Pa)(?Pb)/IJ - ab\CA - -/^(?Pa)(?Pb)|cd/IJ - ab\CA - cd\CA - -/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/IJ - cdefgh\CA - -/^((?Pa1)|(?Pa2)b)/IJ - a1b\GA - a2b\GA - ** Failers - a1b\GZ\GA - -/^(?Pa)(?Pb)/IJ - ab\GA - -/^(?Pa)(?Pb)|cd/IJ - ab\GA - cd\GA - -/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/IJ - cdefgh\GA - -/(?J)^((?Pa1)|(?Pa2)b)/I - a1b\CA - a2b\CA - -/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I - -/ In this next test, J is not set at the outer level; consequently it isn't -set in the pattern's options; consequently pcre_get_named_substring() produces -a random value. /Ix - -/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I - a bc d\CA\CB\CC - -/^(?Pa)?(?(A)a|b)/I - aabc - bc - ** Failers - abc - -/(?:(?(ZZ)a|b)(?PX))+/I - bXaX - -/(?:(?(2y)a|b)(X))+/I - -/(?:(?(ZA)a|b)(?PX))+/I - -/(?:(?(ZZ)a|b)(?(ZZ)a|b)(?PX))+/I - bbXaaX - -/(?:(?(ZZ)a|\(b\))\\(?PX))+/I - (b)\\Xa\\X - -/(?PX|Y))+/I - bXXaYYaY - bXYaXXaX - -/()()()()()()()()()(?:(?(A)(?P=A)a|b)(?PX|Y))+/I - bXXaYYaY - -/\s*,\s*/IS - \x0b,\x0b - \x0c,\x0d - -/^abc/Im - xyz\nabc - xyz\nabc\ - xyz\r\nabc\ - xyz\rabc\ - xyz\r\nabc\ - ** Failers - xyz\nabc\ - xyz\r\nabc\ - xyz\nabc\ - xyz\rabc\ - xyz\rabc\ - -/abc$/Im - xyzabc - xyzabc\n - xyzabc\npqr - xyzabc\r\ - xyzabc\rpqr\ - xyzabc\r\n\ - xyzabc\r\npqr\ - ** Failers - xyzabc\r - xyzabc\rpqr - xyzabc\r\n - xyzabc\r\npqr - -/^abc/Im - xyz\rabcdef - xyz\nabcdef\ - ** Failers - xyz\nabcdef - -/^abc/Im - xyz\nabcdef - xyz\rabcdef\ - ** Failers - xyz\rabcdef - -/^abc/Im - xyz\r\nabcdef - xyz\rabcdef\ - ** Failers - xyz\rabcdef - -/^abc/Im - -/abc/I - xyz\rabc\ - abc - -/.*/I - abc\ndef - abc\rdef - abc\r\ndef - \abc\ndef - \abc\rdef - \abc\r\ndef - \abc\ndef - \abc\rdef - \abc\r\ndef - 
-/\w+(.)(.)?def/Is - abc\ndef - abc\rdef - abc\r\ndef - -+((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)+I - /* this is a C style comment */\M - -/(?P25[0-5]|2[0-4]\d|[01]?\d?\d)(?:\.(?P>B)){3}/I - -/()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - (.(.))/Ix - XY\O400 - -/(a*b|(?i:c*(?-i)d))/IS - -/()[ab]xyz/IS - -/(|)[ab]xyz/IS - -/(|c)[ab]xyz/IS - -/(|c?)[ab]xyz/IS - -/(d?|c?)[ab]xyz/IS - -/(d?|c)[ab]xyz/IS - -/^a*b\d/DZ - -/^a*+b\d/DZ - -/^a*?b\d/DZ - -/^a+A\d/DZ - aaaA5 - ** Failers - aaaa5 - -/^a*A\d/IiDZ - aaaA5 - aaaa5 - -/(a*|b*)[cd]/IS - -/(a+|b*)[cd]/IS - -/(a*|b+)[cd]/IS - -/(a+|b+)[cd]/IS - -/(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( - (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( - ((( - a - )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) - )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) - ))) -/Ix - large nest - -/a*\d/BZ - -/a*\D/BZ - -/0*\d/BZ - -/0*\D/BZ - -/a*\s/BZ - -/a*\S/BZ - -/ *\s/BZ - -/ *\S/BZ - -/a*\w/BZ - -/a*\W/BZ - -/=*\w/BZ - -/=*\W/BZ - -/\d*a/BZ - -/\d*2/BZ - -/\d*\d/BZ - -/\d*\D/BZ - -/\d*\s/BZ - -/\d*\S/BZ - -/\d*\w/BZ - -/\d*\W/BZ - -/\D*a/BZ - -/\D*2/BZ - -/\D*\d/BZ - -/\D*\D/BZ - -/\D*\s/BZ - -/\D*\S/BZ - -/\D*\w/BZ - -/\D*\W/BZ - -/\s*a/BZ - -/\s*2/BZ - -/\s*\d/BZ - -/\s*\D/BZ - -/\s*\s/BZ - -/\s*\S/BZ - -/\s*\w/BZ - -/\s*\W/BZ - -/\S*a/BZ - -/\S*2/BZ - -/\S*\d/BZ - -/\S*\D/BZ - -/\S*\s/BZ - -/\S*\S/BZ - -/\S*\w/BZ - -/\S*\W/BZ - -/\w*a/BZ - -/\w*2/BZ - -/\w*\d/BZ - -/\w*\D/BZ - -/\w*\s/BZ - -/\w*\S/BZ - -/\w*\w/BZ - -/\w*\W/BZ - -/\W*a/BZ - -/\W*2/BZ - -/\W*\d/BZ - -/\W*\D/BZ - -/\W*\s/BZ - -/\W*\S/BZ - -/\W*\w/BZ - -/\W*\W/BZ - -/[^a]+a/BZ - -/[^a]+a/BZi - 
-/[^a]+A/BZi - -/[^a]+b/BZ - -/[^a]+\d/BZ - -/a*[^a]/BZ - -/(?Px)(?Py)/I - xy\Cabc\Cxyz - -/(?x)(?'xyz'y)/I - xy\Cabc\Cxyz - -/(?x)(?'xyz>y)/I - -/(?P'abc'x)(?Py)/I - -/^(?:(?(ZZ)a|b)(?X))+/ - bXaX - bXbX - ** Failers - aXaX - aXbX - -/^(?P>abc)(?xxx)/ - -/^(?P>abc)(?x|y)/ - xx - xy - yy - yx - -/^(?P>abc)(?Px|y)/ - xx - xy - yy - yx - -/^((?(abc)a|b)(?x|y))+/ - bxay - bxby - ** Failers - axby - -/^(((?P=abc)|X)(?x|y))+/ - XxXxxx - XxXyyx - XxXyxx - ** Failers - x - -/^(?1)(abc)/ - abcabc - -/^(?:(?:\1|X)(a|b))+/ - Xaaa - Xaba - -/^[\E\Qa\E-\Qz\E]+/BZ - -/^[a\Q]bc\E]/BZ - -/^[a-\Q\E]/BZ - -/^(?P>abc)[()](?)/BZ - -/^((?(abc)y)[()](?Px))+/BZ - (xy)x - -/^(?P>abc)\Q()\E(?)/BZ - -/^(?P>abc)[a\Q(]\E(](?)/BZ - -/^(?P>abc) # this is (a comment) - (?)/BZx - -/^\W*(?:(?(?.)\W*(?&one)\W*\k|)|(?(?.)\W*(?&three)\W*\k'four'|\W*.\W*))\W*$/Ii - 1221 - Satan, oscillate my metallic sonatas! - A man, a plan, a canal: Panama! - Able was I ere I saw Elba. - *** Failers - The quick brown fox - -/(?=(\w+))\1:/I - abcd: - -/(?=(?'abc'\w+))\k:/I - abcd: - -/(?'abc'a|b)(?d|e)\k{2}/J - adaa - ** Failers - addd - adbb - -/(?'abc'a|b)(?d|e)(?&abc){2}/J - bdaa - bdab - ** Failers - bddd - -/(?( (?'B' abc (?(R) (?(R&A)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x - abcabc1Xabc2XabcXabcabc - -/(? 
(?'B' abc (?(R) (?(R&C)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x - -/^(?(DEFINE) abc | xyz ) /x - -/(?(DEFINE) abc) xyz/xI - -/(a|)*\d/ - \O0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - \O0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 - -/^a.b/ - a\rb - a\nb\ - a\x85b\ - ** Failers - a\nb - a\nb\ - a\rb\ - a\rb\ - a\x85b\ - a\rb\ - -/^abc./mgx - abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK - -/abc.$/mgx - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7 abc9 - -/a/ - -/a/ - -/^a\Rb/ - a\nb - a\rb - a\r\nb - a\x0bb - a\x0cb - a\x85b - ** Failers - a\n\rb - -/^a\R*b/ - ab - a\nb - a\rb - a\r\nb - a\x0bb - a\x0cb - a\x85b - a\n\rb - a\n\r\x85\x0cb - -/^a\R+b/ - a\nb - a\rb - a\r\nb - a\x0bb - a\x0cb - a\x85b - a\n\rb - a\n\r\x85\x0cb - ** Failers - ab - -/^a\R{1,3}b/ - a\nb - a\n\rb - a\n\r\x85b - a\r\n\r\nb - a\r\n\r\n\r\nb - a\n\r\n\rb - a\n\n\r\nb - ** Failers - a\n\n\n\rb - a\r - -/^a[\R]b/ - aRb - ** Failers - a\nb - -/(?&abc)X(?P)/I - abcPXP123 - -/(?1)X(?P)/I - abcPXP123 - -/(?:a(?&abc)b)*(?x)/ - 123axbaxbaxbx456 - 123axbaxbaxb456 - -/(?:a(?&abc)b){1,5}(?x)/ - 123axbaxbaxbx456 - -/(?:a(?&abc)b){2,5}(?x)/ - 123axbaxbaxbx456 - -/(?:a(?&abc)b){2,}(?x)/ - 123axbaxbaxbx456 - -/(abc)(?i:(?1))/ - defabcabcxyz - DEFabcABCXYZ - -/(abc)(?:(?i)(?1))/ - defabcabcxyz - DEFabcABCXYZ - -/^(a)\g-2/ - -/^(a)\g/ - -/^(a)\g{0}/ - -/^(a)\g{3/ - -/^(a)\g{aa}/ - -/^a.b/ - a\rb - *** Failers - a\nb - -/.+foo/ - afoo - ** Failers - \r\nfoo - \nfoo - -/.+foo/ - afoo - \nfoo - ** Failers - \r\nfoo - -/.+foo/ - afoo - ** Failers - \nfoo - \r\nfoo - -/.+foo/s - afoo - \r\nfoo - \nfoo - -/^$/mg - abc\r\rxyz - abc\n\rxyz - ** Failers - abc\r\nxyz - -/(?m)^$/g+ - abc\r\n\r\n - -/(?m)^$|^\r\n/g+ - abc\r\n\r\n - -/(?m)$/g+ - abc\r\n\r\n - -/abc.$/mgx - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9 - -/^X/m - XABC - ** Failers - XABC\B - -/(ab|c)(?-1)/BZ - abc - -/xy(?+1)(abc)/BZ - xyabcabc - ** Failers - 
xyabc - -/x(?-0)y/ - -/x(?-1)y/ - -/x(?+0)y/ - -/x(?+1)y/ - -/^(abc)?(?(-1)X|Y)/BZ - abcX - Y - ** Failers - abcY - -/^((?(+1)X|Y)(abc))+/BZ - YabcXabc - YabcXabcXabc - ** Failers - XabcXabc - -/(?(-1)a)/BZ - -/((?(-1)a))/BZ - -/((?(-2)a))/BZ - -/^(?(+1)X|Y)(.)/BZ - Y! - -/(?tom|bon)-\k{A}/ - tom-tom - bon-bon - ** Failers - tom-bon - -/\g{A/ - -/(?|(abc)|(xyz))/BZ - >abc< - >xyz< - -/(x)(?|(abc)|(xyz))(x)/BZ - xabcx - xxyzx - -/(x)(?|(abc)(pqr)|(xyz))(x)/BZ - xabcpqrx - xxyzx - -/\H++X/BZ - ** Failers - XXXX - -/\H+\hY/BZ - XXXX Y - -/\H+ Y/BZ - -/\h+A/BZ - -/\v*B/BZ - -/\V+\x0a/BZ - -/A+\h/BZ - -/ *\H/BZ - -/A*\v/BZ - -/\x0b*\V/BZ - -/\d+\h/BZ - -/\d*\v/BZ - -/S+\h\S+\v/BZ - -/\w{3,}\h\w+\v/BZ - -/\h+\d\h+\w\h+\S\h+\H/BZ - -/\v+\d\v+\w\v+\S\v+\V/BZ - -/\H+\h\H+\d/BZ - -/\V+\v\V+\w/BZ - -/\( (?: [^()]* | (?R) )* \)/x -\J1024(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(
0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(00)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0
)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0) - -/[\E]AAA/ - -/[\Q\E]AAA/ - -/[^\E]AAA/ - -/[^\Q\E]AAA/ - -/[\E^]AAA/ - -/[\Q\E^]AAA/ - -/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/BZ - -/^a+(*FAIL)/C - aaaaaa - -/a+b?c+(*FAIL)/C - aaabccc - -/a+b?(*PRUNE)c+(*FAIL)/C - aaabccc - -/a+b?(*COMMIT)c+(*FAIL)/C - aaabccc - -/a+b?(*SKIP)c+(*FAIL)/C - aaabcccaaabccc - -/a+b?(*THEN)c+(*FAIL)/C - aaabccc - -/a(*MARK)b/ - -/(?i:A{1,}\6666666666)/ - -/\g6666666666/ - -/[\g6666666666]/BZ - -/(?1)\c[/ - -/.+A/ - \r\nA - -/\nA/ - \r\nA - -/[\r\n]A/ - \r\nA - -/(\r|\n)A/ - \r\nA - -/a(*CR)b/ - -/(*CR)a.b/ - a\nb - ** Failers - a\rb - -/(*CR)a.b/ - a\nb - ** Failers - a\rb - -/(*LF)a.b/ - a\rb - ** Failers - a\nb - -/(*CRLF)a.b/ - a\rb - a\nb - ** Failers - a\r\nb - -/(*ANYCRLF)a.b/ - ** Failers - a\rb - a\nb - a\r\nb - -/(*ANY)a.b/ - ** Failers - a\rb - a\nb - a\r\nb - a\x85b - -/(*ANY).*/g - abc\r\ndef - -/(*ANYCRLF).*/g - abc\r\ndef - -/(*CRLF).*/g - abc\r\ndef - -/a\Rb/I - a\rb - a\nb - a\r\nb - ** Failers - a\x85b - a\x0bb - -/a\Rb/I - a\rb - a\nb - a\r\nb - a\x85b - a\x0bb - ** Failers - a\x85b\ - a\x0bb\ - -/a\R?b/I - a\rb - a\nb - a\r\nb - ** Failers - a\x85b - a\x0bb 
- -/a\R?b/I - a\rb - a\nb - a\r\nb - a\x85b - a\x0bb - ** Failers - a\x85b\ - a\x0bb\ - -/a\R{2,4}b/I - a\r\n\nb - a\n\r\rb - a\r\n\r\n\r\n\r\nb - ** Failers - a\x85\85b - a\x0b\0bb - -/a\R{2,4}b/I - a\r\rb - a\n\n\nb - a\r\n\n\r\rb - a\x85\85b - a\x0b\0bb - ** Failers - a\r\r\r\r\rb - a\x85\85b\ - a\x0b\0bb\ - -/(*BSR_ANYCRLF)a\Rb/I - a\nb - a\rb - -/(*BSR_UNICODE)a\Rb/I - a\x85b - -/(*BSR_ANYCRLF)(*CRLF)a\Rb/I - a\nb - a\rb - -/(*CRLF)(*BSR_UNICODE)a\Rb/I - a\x85b - -/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I - -/(?)(?&)/ - -/(?)(?&a)/ - -/(?)(?&aaaaaaaaaaaaaaaaaaaaaaa)/ - -/(?+-a)/ - -/(?-+a)/ - -/(?(-1))/ - -/(?(+10))/ - -/(?(10))/ - -/(?(+2))()()/ - -/(?(2))()()/ - -/\k''/ - -/\k<>/ - -/\k{}/ - -/\k/ - -/\kabc/ - -/(?P=)/ - -/(?P>)/ - -/(?!\w)(?R)/ - -/(?=\w)(?R)/ - -/(?x|y){0}z/ - xzxx - yzyy - ** Failers - xxz - -/(\3)(\1)(a)/ - cat - -/(\3)(\1)(a)/ - cat - -/TA]/ - The ACTA] comes - -/TA]/ - The ACTA] comes - -/(?2)[]a()b](abc)/ - abcbabc - -/(?2)[^]a()b](abc)/ - abcbabc - -/(?1)[]a()b](abc)/ - abcbabc - ** Failers - abcXabc - -/(?1)[^]a()b](abc)/ - abcXabc - ** Failers - abcbabc - -/(?2)[]a()b](abc)(xyz)/ - xyzbabcxyz - -/(?&N)[]a(?)](?abc)/ - abc)](abc)/ - abc - ** Failers - ab - -/a[]+b/ - ** Failers - ab - -/a[]*+b/ - ** Failers - ab - -/a[^]b/ - aXb - a\nb - ** Failers - ab - -/a[^]+b/ - aXb - a\nX\nXb - ** Failers - ab - -/a(?!)b/BZ - -/(?!)?a/BZ - ab - -/a(*FAIL)+b/ - -/(abc|pqr|123){0}[xyz]/SI - -/(?(?=.*b)b|^)/CI - adc - abc - -/(?(?=b).*b|^d)/I - -/(?(?=.*b).*b|^d)/I - -/xyz/C - xyz - abcxyz - abcxyz\Y - ** Failers - abc - abc\Y - abcxypqr - abcxypqr\Y - -/(*NO_START_OPT)xyz/C - abcxyz - -/(*NO_AUTO_POSSESS)a+b/BZ - -/xyz/CY - abcxyz - -/^"((?(?=[a])[^"])|b)*"$/C - "ab" - -/^"((?(?=[a])[^"])|b)*"$/ - "ab" - -/^X(?5)(a)(?|(b)|(q))(c)(d)Y/ - XYabcdY - -/^X(?&N)(a)(?|(b)|(q))(c)(d)(?Y)/ - XYabcdY - -/Xa{2,4}b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/Xa{2,4}?b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/Xa{2,4}+b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - 
-/X\d{2,4}b/ - X\P - X3\P - X33\P - X333\P - X3333\P - -/X\d{2,4}?b/ - X\P - X3\P - X33\P - X333\P - X3333\P - -/X\d{2,4}+b/ - X\P - X3\P - X33\P - X333\P - X3333\P - -/X\D{2,4}b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X\D{2,4}?b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X\D{2,4}+b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[abc]{2,4}b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[abc]{2,4}?b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[abc]{2,4}+b/ - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[^a]{2,4}b/ - X\P - Xz\P - Xzz\P - Xzzz\P - Xzzzz\P - -/X[^a]{2,4}?b/ - X\P - Xz\P - Xzz\P - Xzzz\P - Xzzzz\P - -/X[^a]{2,4}+b/ - X\P - Xz\P - Xzz\P - Xzzz\P - Xzzzz\P - -/(Y)X\1{2,4}b/ - YX\P - YXY\P - YXYY\P - YXYYY\P - YXYYYY\P - -/(Y)X\1{2,4}?b/ - YX\P - YXY\P - YXYY\P - YXYYY\P - YXYYYY\P - -/(Y)X\1{2,4}+b/ - YX\P - YXY\P - YXYY\P - YXYYY\P - YXYYYY\P - -/\++\KZ|\d+X|9+Y/ - ++++123999\P - ++++123999Y\P - ++++Z1234\P - -/Z(*F)/ - Z\P - ZA\P - -/Z(?!)/ - Z\P - ZA\P - -/dog(sbody)?/ - dogs\P - dogs\P\P - -/dog(sbody)??/ - dogs\P - dogs\P\P - -/dog|dogsbody/ - dogs\P - dogs\P\P - -/dogsbody|dog/ - dogs\P - dogs\P\P - -/\bthe cat\b/ - the cat\P - the cat\P\P - -/abc/ - abc\P - abc\P\P - -/abc\K123/ - xyzabc123pqr - xyzabc12\P - xyzabc12\P\P - -/(?<=abc)123/ - xyzabc123pqr - xyzabc12\P - xyzabc12\P\P - -/\babc\b/ - +++abc+++ - +++ab\P - +++ab\P\P - -/(?&word)(?&element)(?(DEFINE)(?<[^m][^>]>[^<])(?\w*+))/BZ - -/(?&word)(?&element)(?(DEFINE)(?<[^\d][^>]>[^<])(?\w*+))/BZ - -/(ab)(x(y)z(cd(*ACCEPT)))pq/BZ - -/abc\K/+ - abcdef - abcdef\N\N - xyzabcdef\N\N - ** Failers - abcdef\N - xyzabcdef\N - -/^(?:(?=abc)|abc\K)/+ - abcdef - abcdef\N\N - ** Failers - abcdef\N - -/a?b?/+ - xyz - xyzabc - xyzabc\N - xyzabc\N\N - xyz\N\N - ** Failers - xyz\N - -/^a?b?/+ - xyz - xyzabc - ** Failers - xyzabc\N - xyzabc\N\N - xyz\N\N - xyz\N - -/^(?a|b\gc)/ - aaaa - bacxxx - bbaccxxx - bbbacccxx - -/^(?a|b\g'name'c)/ - aaaa - bacxxx - bbaccxxx - bbbacccxx - -/^(a|b\g<1>c)/ - aaaa - 
bacxxx - bbaccxxx - bbbacccxx - -/^(a|b\g'1'c)/ - aaaa - bacxxx - bbaccxxx - bbbacccxx - -/^(a|b\g'-1'c)/ - aaaa - bacxxx - bbaccxxx - bbbacccxx - -/(^(a|b\g<-1>c))/ - aaaa - bacxxx - bbaccxxx - bbbacccxx - -/(?-i:\g)(?i:(?a))/ - XaaX - XAAX - -/(?i:\g)(?-i:(?a))/ - XaaX - ** Failers - XAAX - -/(?-i:\g<+1>)(?i:(a))/ - XaaX - XAAX - -/(?=(?(?#simplesyntax)\$(?[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g)\]|->\g(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g(?\[(?:\g|'(?:\\.|[^'\\])*'|"(?:\g|\\.|[^"\\])*")\])?|\g|\$\{\g\})\}|(?#complexsyntax)\{(?\$(?\g(\g*|\(.*?\))?)(?:->\g)*|\$\g|\$\{\g\})\}))\{/ - -/(?a|b|c)\g*/ - abc - accccbbb - -/^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/ - XYabcdY - -/(?<=b(?1)|zzz)(a)/ - xbaax - xzzzax - -/(a)(?<=b\1)/ - -/(a)(?<=b+(?1))/ - -/(a+)(?<=b(?1))/ - -/(a(?<=b(?1)))/ - -/(?<=b(?1))xyz/ - -/(?<=b(?1))xyz(b+)pqrstuvew/ - -/(a|bc)\1/SI - -/(a|bc)\1{2,3}/SI - -/(a|bc)(?1)/SI - -/(a|b\1)(a|b\1)/SI - -/(a|b\1){2}/SI - -/(a|bbbb\1)(a|bbbb\1)/SI - -/(a|bbbb\1){2}/SI - -/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI - -/]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS - -"(?>.*/)foo"SI - -/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /xSI - -/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI - -/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI - -/A)|(?
    B))/I - AB\Ca - BA\Ca - -/(?|(?A)|(?B))/ - -/(?:a(? (?')|(?")) | - b(? (?')|(?")) ) - (?('quote')[a-z]+|[0-9]+)/JIx - a"aaaaa - b"aaaaa - ** Failers - b"11111 - a"11111 - -/^(?|(a)(b)(c)(?d)|(?e)) (?('D')X|Y)/JDZx - abcdX - eX - ** Failers - abcdY - ey - -/(?a) (b)(c) (?d (?(R&A)$ | (?4)) )/JDZx - abcdd - ** Failers - abcdde - -/abcd*/ - xxxxabcd\P - xxxxabcd\P\P - -/abcd*/i - xxxxabcd\P - xxxxabcd\P\P - XXXXABCD\P - XXXXABCD\P\P - -/abc\d*/ - xxxxabc1\P - xxxxabc1\P\P - -/(a)bc\1*/ - xxxxabca\P - xxxxabca\P\P - -/abc[de]*/ - xxxxabcde\P - xxxxabcde\P\P - -/-- This is not in the Perl-compatible test because Perl seems currently to be - broken and not behaving as specified in that it *does* bumpalong after - hitting (*COMMIT). --/ - -/(?1)(A(*COMMIT)|B)D/ - ABD - XABD - BAD - ABXABD - ** Failers - ABX - BAXBAD - -/(\3)(\1)(a)/ - cat - -/(\3)(\1)(a)/SI - cat - -/(\3)(\1)(a)/SI - cat - -/i(?(DEFINE)(?a))/SI - i - -/()i(?(1)a)/SI - ia - -/(?i)a(?-i)b|c/BZ - XabX - XAbX - CcC - ** Failers - XABX - -/(?i)a(?s)b|c/BZ - -/(?i)a(?s-i)b|c/BZ - -/^(ab(c\1)d|x){2}$/BZ - xabcxd - -/^(?&t)*+(?(DEFINE)(?.))$/BZ - -/^(?&t)*(?(DEFINE)(?.))$/BZ - -/ -- This one is here because Perl gives the match as "b" rather than "ab". I - believe this to be a Perl bug. --/ - -/(?>a\Kb)z|(ab)/ - ab - -/(?P(?P0|)|(?P>L2)(?P>L1))/ - -/abc(*MARK:)pqr/ - -/abc(*:)pqr/ - -/abc(*FAIL:123)xyz/ - -/--- This should, and does, fail. In Perl, it does not, which I think is a - bug because replacing the B in the pattern by (B|D) does make it fail. ---/ - -/A(*COMMIT)B/+K - ACABX - -/--- These should be different, but in Perl they are not, which I think - is a bug in Perl. ---/ - -/A(*THEN)B|A(*THEN)C/K - AC - -/A(*PRUNE)B|A(*PRUNE)C/K - AC - -/--- Mark names can be duplicated. Perl doesn't give a mark for this one, -though PCRE does. ---/ - -/^A(*:A)B|^X(*:A)Y/K - ** Failers - XAQQ - -/--- COMMIT at the start of a pattern should be the same as an anchor. Perl -optimizations defeat this. 
So does the PCRE optimization unless we disable it -with \Y. ---/ - -/(*COMMIT)ABC/ - ABCDEFG - ** Failers - DEFGABC\Y - -/^(ab (c+(*THEN)cd) | xyz)/x - abcccd - -/^(ab (c+(*PRUNE)cd) | xyz)/x - abcccd - -/^(ab (c+(*FAIL)cd) | xyz)/x - abcccd - -/--- Perl gets some of these wrong ---/ - -/(?>.(*ACCEPT))*?5/ - abcde - -/(.(*ACCEPT))*?5/ - abcde - -/(.(*ACCEPT))5/ - abcde - -/(.(*ACCEPT))*5/ - abcde - -/A\NB./BZ - ACBD - *** Failers - A\nB - ACB\n - -/A\NB./sBZ - ACBD - ACB\n - *** Failers - A\nB - -/A\NB/ - A\nB - A\rB - ** Failers - A\r\nB - -/\R+b/BZ - -/\R+\n/BZ - -/\R+\d/BZ - -/\d*\R/BZ - -/\s*\R/BZ - \x20\x0a - \x20\x0d - \x20\x0d\x0a - -/\S*\R/BZ - a\x0a - -/X\h*\R/BZ - X\x20\x0a - -/X\H*\R/BZ - X\x0d\x0a - -/X\H+\R/BZ - X\x0d\x0a - -/X\H++\R/BZ - X\x0d\x0a - -/(?<=abc)def/ - abc\P\P - -/abc$/ - abc - abc\P - abc\P\P - -/abc$/m - abc - abc\n - abc\P\P - abc\n\P\P - abc\P - abc\n\P - -/abc\z/ - abc - abc\P - abc\P\P - -/abc\Z/ - abc - abc\P - abc\P\P - -/abc\b/ - abc - abc\P - abc\P\P - -/abc\B/ - abc - abc\P - abc\P\P - -/.+/ - abc\>0 - abc\>1 - abc\>2 - abc\>3 - abc\>4 - abc\>-4 - -/^\cÄ£/ - -/(?P(?P=abn)xxx)/BZ - -/(a\1z)/BZ - -/(?P(?P=abn)(?(?P=axn)xxx)/BZ - -/(?P(?P=axn)xxx)(?yy)/BZ - -/-- These tests are here because Perl gets the first one wrong. 
--/ - -/(\R*)(.)/s - \r\n - \r\r\n\n\r - \r\r\n\n\r\n - -/(\R)*(.)/s - \r\n - \r\r\n\n\r - \r\r\n\n\r\n - -/((?>\r\n|\n|\x0b|\f|\r|\x85)*)(.)/s - \r\n - \r\r\n\n\r - \r\r\n\n\r\n - -/-- --/ - -/^abc$/BZ - -/^abc$/BZm - -/^(a)*+(\w)/S - aaaaX - ** Failers - aaaa - -/^(?:a)*+(\w)/S - aaaaX - ** Failers - aaaa - -/(a)++1234/SDZ - -/([abc])++1234/SI - -/(?<=(abc)+)X/ - -/(^ab)/I - -/(^ab)++/I - -/(^ab|^)+/I - -/(^ab|^)++/I - -/(?:^ab)/I - -/(?:^ab)++/I - -/(?:^ab|^)+/I - -/(?:^ab|^)++/I - -/(.*ab)/I - -/(.*ab)++/I - -/(.*ab|.*)+/I - -/(.*ab|.*)++/I - -/(?:.*ab)/I - -/(?:.*ab)++/I - -/(?:.*ab|.*)+/I - -/(?:.*ab|.*)++/I - -/(?=a)[bcd]/I - -/((?=a))[bcd]/I - -/((?=a))+[bcd]/I - -/((?=a))++[bcd]/I - -/(?=a+)[bcd]/iI - -/(?=a+?)[bcd]/iI - -/(?=a++)[bcd]/iI - -/(?=a{3})[bcd]/iI - -/(abc)\1+/S - -/-- Perl doesn't get these right IMO (the 3rd is PCRE-specific) --/ - -/(?1)(?:(b(*ACCEPT))){0}/ - b - -/(?1)(?:(b(*ACCEPT))){0}c/ - bc - ** Failers - b - -/(?1)(?:((*ACCEPT))){0}c/ - c - c\N - -/^.*?(?(?=a)a|b(*THEN)c)/ - ba - -/^.*?(?(?=a)a|bc)/ - ba - -/^.*?(?(?=a)a(*THEN)b|c)/ - ac - -/^.*?(?(?=a)a(*THEN)b)c/ - ac - -/^.*?(a(*THEN)b)c/ - aabc - -/^.*? (?1) c (?(DEFINE)(a(*THEN)b))/x - aabc - -/^.*?(a(*THEN)b|z)c/ - aabc - -/^.*?(z|a(*THEN)b)c/ - aabc - -/-- --/ - -/-- These studied versions are here because they are not Perl-compatible; the - studying means the mark is not seen. 
--/ - -/(*MARK:A)(*SKIP:B)(C|X)/KS - C - D - -/(*:A)A+(*SKIP:A)(B|Z)/KS - AAAC - -/-- --/ - -"(?=a*(*ACCEPT)b)c" - c - c\N - -/(?1)c(?(DEFINE)((*ACCEPT)b))/ - c - c\N - -/(?>(*ACCEPT)b)c/ - c - c\N - -/(?:(?>(a)))+a%/++ - %aa% - -/(a)b|ac/++SS - ac\O3 - -/(a)(b)x|abc/++ - abc\O6 - -/(a)bc|(a)(b)\2/ - \O3abc - \O4abc - -/(?(DEFINE)(a(?2)|b)(b(?1)|a))(?:(?1)|(?2))/SI - -/(a(?2)|b)(b(?1)|a)(?:(?1)|(?2))/SI - -/(a(?2)|b)(b(?1)|a)(?1)(?2)/SI - -/(abc)(?1)/SI - -/^(?>a)++/ - aa\M - aaaaaaaaa\M - -/(a)(?1)++/ - aa\M - aaaaaaaaa\M - -/(?:(foo)|(bar)|(baz))X/SS= - bazfooX - foobazbarX - barfooX - bazX - foobarbazX - bazfooX\O0 - bazfooX\O2 - bazfooX\O4 - bazfooX\O6 - bazfooX\O8 - bazfooX\O10 - -/(?=abc){3}abc/BZ - -/(?=abc)+abc/BZ - -/(?=abc)++abc/BZ - -/(?=abc){0}xyz/BZ - -/(?=(a))?./BZ - -/(?=(a))??./BZ - -/^(?=(a)){0}b(?1)/BZ - -/(?(DEFINE)(a))?b(?1)/BZ - -/^(?=(?1))?[az]([abc])d/BZ - -/^(?!a){0}\w+/BZ - -/(?<=(abc))?xyz/BZ - -/[:a[:abc]b:]/BZ - -/((?2))((?1))/SS - abc - -/((?(R2)a+|(?1)b))/SS - aaaabcde - -/(?(R)a*(?1)|((?R))b)/SS - aaaabcde - -/(a+|(?R)b)/ - -/^(a(*:A)(d|e(*:B))z|aeq)/C - adz - aez - aeqwerty - -/.(*F)/ - \P\Pabc - -/\btype\b\W*?\btext\b\W*?\bjavascript\b/IS - -/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|a+)(?>(z+))\w/BZ - aaaazzzzb - ** Failers - aazz - -/(.)(\1|a(?2))/ - bab - -/\1|(.)(?R)\1/ - cbbbc - -/(.)((?(1)c|a)|a(?2))/ - baa - -/(?P(?P=abn)xxx)/BZ - -/(a\1z)/BZ - -/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/ - \Maabbccddee - -/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/ - \Maabbccddee - -/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/ - \Maabbccddee - -/^a\x41z/ - aAz - *** Failers - ax41z - -/^a[m\x41]z/ - aAz - -/^a\x1z/ - ax1z - -/^a\u0041z/ - aAz - *** Failers - au0041z - -/^a[m\u0041]z/ - aAz - -/^a\u041z/ - au041z - *** Failers - aAz - -/^a\U0041z/ - aU0041z - *** Failers - aAz - -/(?(?=c)c|d)++Y/BZ - -/(?(?=c)c|d)*+Y/BZ - -/a[\NB]c/ - aNc - -/a[B-\Nc]/ - -/a[B\Nc]/ - -/(a)(?2){0,1999}?(b)/ - -/(a)(?(DEFINE)(b))(?2){0,1999}?(?2)/ - -/--- 
This test, with something more complicated than individual letters, causes -different behaviour in Perl. Perhaps it disables some optimization; no tag is -passed back for the failures, whereas in PCRE there is a tag. ---/ - -/(A|P)(*:A)(B|P) | (X|P)(X|P)(*:B)(Y|P)/xK - AABC - XXYZ - ** Failers - XAQQ - XAQQXZZ - AXQQQ - AXXQQQ - -/-- Perl doesn't give marks for these, though it does if the alternatives are -replaced by single letters. --/ - -/(b|q)(*:m)f|a(*:n)w/K - aw - ** Failers - abc - -/(q|b)(*:m)f|a(*:n)w/K - aw - ** Failers - abc - -/-- After a partial match, the behaviour is as for a failure. --/ - -/^a(*:X)bcde/K - abc\P - -/-- These are here because Perl doesn't return a mark, except for the first --/ - -/(?=(*:x))(q|)/K+ - abc - -/(?=(*:x))((*:y)q|)/K+ - abc - -/(?=(*:x))(?:(*:y)q|)/K+ - abc - -/(?=(*:x))(?>(*:y)q|)/K+ - abc - -/(?=a(*:x))(?!a(*:y)c)/K+ - ab - -/(?=a(*:x))(?=a(*:y)c|)/K+ - ab - -/(..)\1/ - ab\P - aba\P - abab\P - -/(..)\1/i - ab\P - abA\P - aBAb\P - -/(..)\1{2,}/ - ab\P - aba\P - abab\P - ababa\P - ababab\P - ababab\P\P - abababa\P - abababa\P\P - -/(..)\1{2,}/i - ab\P - aBa\P - aBAb\P - AbaBA\P - abABAb\P - aBAbaB\P\P - abABabA\P - abaBABa\P\P - -/(..)\1{2,}?x/i - ab\P - abA\P - aBAb\P - abaBA\P - abAbaB\P - abaBabA\P - abAbABaBx\P - -/^(..)\1/ - aba\P - -/^(..)\1{2,3}x/ - aba\P - ababa\P - ababa\P\P - abababx - ababababx - -/^(..)\1{2,3}?x/ - aba\P - ababa\P - ababa\P\P - abababx - ababababx - -/^(..)(\1{2,3})ab/ - abababab - -/^\R/ - \r\P - \r\P\P - -/^\R{2,3}x/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - \r\rx - \r\r\rx - -/^\R{2,3}?x/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - \r\rx - \r\r\rx - -/^\R?x/ - \r\P - \r\P\P - x - \rx - -/^\R+x/ - \r\P - \r\P\P - \r\n\P - \r\n\P\P - \rx - -/^a$/ - a\r\P - a\r\P\P - -/^a$/m - a\r\P - a\r\P\P - -/^(a$|a\r)/ - a\r\P - a\r\P\P - -/^(a$|a\r)/m - a\r\P - a\r\P\P - -/./ - \r\P - \r\P\P - -/.{2,3}/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - 
-/.{2,3}?/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -"AB(C(D))(E(F))?(?(?=\2)(?=\4))" - ABCDGHI\O03 - -/-- These are all run as real matches in test 1; here we are just checking the -settings of the anchored and startline bits. --/ - -/(?>.*?a)(?<=ba)/I - -/(?:.*?a)(?<=ba)/I - -/.*?a(*PRUNE)b/I - -/.*?a(*PRUNE)b/sI - -/^a(*PRUNE)b/sI - -/.*?a(*SKIP)b/I - -/(?>.*?a)b/sI - -/(?>.*?a)b/I - -/(?>^a)b/sI - -/(?>.*?)(?<=(abcd)|(wxyz))/I - -/(?>.*)(?<=(abcd)|(wxyz))/I - -"(?>.*)foo"I - -"(?>.*?)foo"I - -/(?>^abc)/mI - -/(?>.*abc)/mI - -/(?:.*abc)/mI - -/-- Check PCRE_STUDY_EXTRA_NEEDED --/ - -/.?/S-I - -/.?/S!I - -/(?:(a)+(?C1)bb|aa(?C2)b)/ - aab\C+ - -/(?:(a)++(?C1)bb|aa(?C2)b)/ - aab\C+ - -/(?:(?>(a))(?C1)bb|aa(?C2)b)/ - aab\C+ - -/(?:(?1)(?C1)x|ab(?C2))((a)){0}/ - aab\C+ - -/(?1)(?C1)((a)(?C2)){0}/ - aab\C+ - -/(?:(a)+(?C1)bb|aa(?C2)b)++/ - aab\C+ - aab\C+\O2 - -/(ab)x|ab/ - ab\O3 - ab\O2 - -/(ab)/ - ab\O3 - ab\O2 - -/(?<=123)(*MARK:xx)abc/K - xxxx123a\P\P - xxxx123a\P - -/123\Kabc/ - xxxx123a\P\P - xxxx123a\P - -/^(?(?=a)aa|bb)/C - bb - -/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/ - bb - -/-- Perl seems to have a bug with this one --/ - -/aaaaa(*COMMIT)(*PRUNE)b|a+c/ - aaaaaac - -/-- Here are some that Perl treats differently because of the way it handles -backtracking verbs. 
--/ - - /(?!a(*COMMIT)b)ac|ad/ - ac - ad - -/^(?!a(*THEN)b|ac)../ - ac - ad - -/^(?=a(*THEN)b|ac)/ - ac - -/\A.*?(?:a|b(*THEN)c)/ - ba - -/\A.*?(?:a|b(*THEN)c)++/ - ba - -/\A.*?(?:a|b(*THEN)c|d)/ - ba - -/(?:(a(*MARK:X)a+(*SKIP:X)b)){0}(?:(?1)|aac)/ - aac - -/\A.*?(a|b(*THEN)c)/ - ba - -/^(A(*THEN)B|A(*THEN)D)/ - AD - -/(?!b(*THEN)a)bn|bnn/ - bnn - -/(?(?=b(*SKIP)a)bn|bnn)/ - bnn - -/(?=b(*THEN)a|)bn|bnn/ - bnn - -/-------------------------/ - -/(*LIMIT_MATCH=12bc)abc/ - -/(*LIMIT_MATCH=4294967290)abc/ - -/(*LIMIT_RECURSION=4294967280)abc/I - -/(a+)*zz/ - aaaaaaaaaaaaaz - aaaaaaaaaaaaaz\q3000 - -/(a+)*zz/S- - aaaaaaaaaaaaaz\Q10 - -/(*LIMIT_MATCH=3000)(a+)*zz/I - aaaaaaaaaaaaaz - aaaaaaaaaaaaaz\q60000 - -/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I - aaaaaaaaaaaaaz - -/(*LIMIT_MATCH=60000)(a+)*zz/I - aaaaaaaaaaaaaz - aaaaaaaaaaaaaz\q3000 - -/(*LIMIT_RECURSION=10)(a+)*zz/IS- - aaaaaaaaaaaaaz - aaaaaaaaaaaaaz\Q1000 - -/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/IS- - aaaaaaaaaaaaaz - -/(*LIMIT_RECURSION=1000)(a+)*zz/IS- - aaaaaaaaaaaaaz - aaaaaaaaaaaaaz\Q10 - -/-- This test causes a segfault with Perl 5.18.0 --/ - -/^(?=(a)){0}b(?1)/ - backgammon - -/(?|(?f)|(?b))/JI - -/(?abc)(?z)\k()/JDZS - -/a*[bcd]/BZ - -/[bcd]*a/BZ - -/-- A complete set of tests for auto-possessification of character types --/ - -/\D+\D \D+\d \D+\S \D+\s \D+\W \D+\w \D+. \D+\C \D+\R \D+\H \D+\h \D+\V \D+\v \D+\Z \D+\z \D+$/BZx - -/\d+\D \d+\d \d+\S \d+\s \d+\W \d+\w \d+. \d+\C \d+\R \d+\H \d+\h \d+\V \d+\v \d+\Z \d+\z \d+$/BZx - -/\S+\D \S+\d \S+\S \S+\s \S+\W \S+\w \S+. \S+\C \S+\R \S+\H \S+\h \S+\V \S+\v \S+\Z \S+\z \S+$/BZx - -/\s+\D \s+\d \s+\S \s+\s \s+\W \s+\w \s+. \s+\C \s+\R \s+\H \s+\h \s+\V \s+\v \s+\Z \s+\z \s+$/BZx - -/\W+\D \W+\d \W+\S \W+\s \W+\W \W+\w \W+. \W+\C \W+\R \W+\H \W+\h \W+\V \W+\v \W+\Z \W+\z \W+$/BZx - -/\w+\D \w+\d \w+\S \w+\s \w+\W \w+\w \w+. \w+\C \w+\R \w+\H \w+\h \w+\V \w+\v \w+\Z \w+\z \w+$/BZx - -/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. 
\C+\C \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/BZx - -/\R+\D \R+\d \R+\S \R+\s \R+\W \R+\w \R+. \R+\C \R+\R \R+\H \R+\h \R+\V \R+\v \R+\Z \R+\z \R+$/BZx - -/\H+\D \H+\d \H+\S \H+\s \H+\W \H+\w \H+. \H+\C \H+\R \H+\H \H+\h \H+\V \H+\v \H+\Z \H+\z \H+$/BZx - -/\h+\D \h+\d \h+\S \h+\s \h+\W \h+\w \h+. \h+\C \h+\R \h+\H \h+\h \h+\V \h+\v \h+\Z \h+\z \h+$/BZx - -/\V+\D \V+\d \V+\S \V+\s \V+\W \V+\w \V+. \V+\C \V+\R \V+\H \V+\h \V+\V \V+\v \V+\Z \V+\z \V+$/BZx - -/\v+\D \v+\d \v+\S \v+\s \v+\W \v+\w \v+. \v+\C \v+\R \v+\H \v+\h \v+\V \v+\v \v+\Z \v+\z \v+$/BZx - -/ a+\D a+\d a+\S a+\s a+\W a+\w a+. a+\C a+\R a+\H a+\h a+\V a+\v a+\Z a+\z a+$/BZx - -/\n+\D \n+\d \n+\S \n+\s \n+\W \n+\w \n+. \n+\C \n+\R \n+\H \n+\h \n+\V \n+\v \n+\Z \n+\z \n+$/BZx - -/ .+\D .+\d .+\S .+\s .+\W .+\w .+. .+\C .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/BZx - -/ .+\D .+\d .+\S .+\s .+\W .+\w .+. .+\C .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/BZxs - -/\D+$ \d+$ \S+$ \s+$ \W+$ \w+$ \C+$ \R+$ \H+$ \h+$ \V+$ \v+$ a+$ \n+$ .+$ .+$/BZxm - -/(?=a+)a(a+)++a/BZ - -/a+(bb|cc)a+(?:bb|cc)a+(?>bb|cc)a+(?:bb|cc)+a+(aa)a+(?:bb|aa)/BZ - -/a+(bb|cc)?#a+(?:bb|cc)??#a+(?:bb|cc)?+#a+(?:bb|cc)*#a+(bb|cc)?a#a+(?:aa)?/BZ - -/a+(?:bb)?a#a+(?:|||)#a+(?:|b)a#a+(?:|||)?a/BZ - -/[ab]*/BZ - aaaa - -/[ab]*?/BZ - aaaa - -/[ab]?/BZ - aaaa - -/[ab]??/BZ - aaaa - -/[ab]+/BZ - aaaa - -/[ab]+?/BZ - aaaa - -/[ab]{2,3}/BZ - aaaa - -/[ab]{2,3}?/BZ - aaaa - -/[ab]{2,}/BZ - aaaa - -/[ab]{2,}?/BZ - aaaa - -/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/BZ - -/[a-d]{5,12}[e-z0-9]*#[^a-z]+[b-y]*a[2-7]?[^0-9a-z]+/BZ - -/[a-z]*\s#[ \t]?\S#[a-c]*\S#[C-G]+?\d#[4-8]*\D#[4-9,]*\D#[!$]{0,5}\w#[M-Xf-l]+\W#[a-c,]?\W/BZ - -/a+(aa|bb)*c#a*(bb|cc)*a#a?(bb|cc)*d#[a-f]*(g|hh)*f/BZ - -/[a-f]*(g|hh|i)*i#[a-x]{4,}(y{0,6})*y#[a-k]+(ll|mm)+n/BZ - -/[a-f]*(?>gg|hh)+#[a-f]*(?>gg|hh)?#[a-f]*(?>gg|hh)*a#[a-f]*(?>gg|hh)*h/BZ - -/[a-c]*d/DZS - -/[a-c]+d/DZS - -/[a-c]?d/DZS - -/[a-c]{4,6}d/DZS - -/[a-c]{0,6}d/DZS - -/-- End of special auto-possessive tests --/ - -/^A\o{1239}B/ - 
A\123B - -/^A\oB/ - -/^A\x{zz}B/ - -/^A\x{12Z/ - -/^A\x{/ - -/[ab]++/BZO - -/[^ab]*+/BZO - -/a{4}+/BZO - -/a{4}+/BZOi - -/[a-[:digit:]]+/ - -/[A-[:digit:]]+/ - -/[a-[.xxx.]]+/ - -/[a-[=xxx=]]+/ - -/[a-[!xxx!]]+/ - -/[A-[!xxx!]]+/ - A]]] - -/[a-\d]+/ - -/(?<0abc>xx)/ - -/(?&1abc)xx(?<1abc>y)/ - -/(?xx)/ - -/(?'0abc'xx)/ - -/(?P<0abc>xx)/ - -/\k<5ghj>/ - -/\k'5ghj'/ - -/\k{2fgh}/ - -/(?P=8yuki)/ - -/\g{4df}/ - -/(?&1abc)xx(?<1abc>y)/ - -/(?P>1abc)xx(?<1abc>y)/ - -/\g'3gh'/ - -/\g<5fg>/ - -/(?(<4gh>)abc)/ - -/(?('4gh')abc)/ - -/(?(4gh)abc)/ - -/(?(R&6yh)abc)/ - -/(((a\2)|(a*)\g<-1>))*a?/BZ - -/-- Test the ugly "start or end of word" compatibility syntax --/ - -/[[:<:]]red[[:>:]]/BZ - little red riding hood - a /red/ thing - red is a colour - put it all on red - ** Failers - no reduction - Alfred Winifred - -/[a[:<:]] should give error/ - -/(?=ab\K)/+ - abcd - -/abcd/f - xx\nxabcd - -/ -- Test stack check external calls --/ - -/(((((a)))))/Q0 - -/(((((a)))))/Q1 - -/(((((a)))))/Q - -/^\w+(?>\s*)(?<=\w)/BZ - -/\othing/ - -/\o{}/ - -/\o{whatever}/ - -/\xthing/ - -/\x{}/ - -/\x{whatever}/ - -"((?=(?(?=(?(?=(?(?=()))))))))" - a - -"(?(?=)==)(((((((((?=)))))))))" - a - -/^(?:(a)|b)(?(1)A|B)/I - aA123\O3 - aA123\O6 - -'^(?:(?a)|b)(?()A|B)' - aA123\O3 - aA123\O6 - -'^(?)(?:(?a)|b)(?()A|B)'J - aA123\O3 - aA123\O6 - -'^(?:(?X)|)(?:(?a)|b)\k{AA}'J - aa123\O3 - aa123\O6 - -/(?(?J)(?1(111111)11|)1|1|)(?()1)/ - -/(?(?=0)?)+/ - -/(?(?=0)(?=00)?00765)/ - 00765 - -/(?(?=0)(?=00)?00765|(?!3).56)/ - 00765 - 456 - ** Failers - 356 - -'^(a)*+(\w)' - g - g\O3 - -'^(?:a)*+(\w)' - g - g\O3 - -//C - \O\C+ - -"((?2){0,1999}())?" 
- -/((?+1)(\1))/BZ - -/(?(?!)a|b)/ - bbb - aaa - -"((?2)+)((?1))" - -"(?(?.*!.*)?)" - -"X((?2)()*+){2}+"BZ - -"X((?2)()*+){2}"BZ - -"(?<=((?2))((?1)))" - -/(?<=\Ka)/g+ - aaaaa - -/(?<=\Ka)/G+ - aaaaa - -/((?2){73}(?2))((?1))/ - -/.((?2)(?R)\1)()/BZ - -/(?1)()((((((\1++))\x85)+)|))/ - -/(\9*+(?2);\3++()2|)++{/ - -/\V\x85\9*+((?2)\3++()2)*:2/ - -/(((?(R)){0,2}) (?''((?'R')((?'R')))))/J - -/(((?(X)){0,2}) (?''((?'X')((?'X')))))/J - -/(((?(R)){0,2}) (?''((?'X')((?'R')))))/ - -"(?J)(?'d'(?'d'\g{d}))" - -".*?\h.+.\.+\R*?\xd(?i)(?=!(?=b`b`b`\`b\xa9b!)`\a`bbbbbbbbbbbbb`bbbbbbbbbbbb*R\x85bbbbbbb\C?{((?2)(?))(( -\H){8(?<=(?1){29}\xa8bbbb\x16\xd\xc6^($(?1)/ - -/a[[:punct:]b]/BZ - -/L(?#(|++)(?J:(?)(?))(?)/ - \O\CC - -/(?=a\K)/ - ring bpattingbobnd $ 1,oern cou \rb\L - -/(?<=((?C)0))/ - 9010 - abcd - -/((?J)(?'R'(?'R'(?'R'(?'R'(?'R'(?|(\k'R'))))))))/ - -/\N(?(?C)0?!.)*/ - -/(?abc)(?(R)xyz)/BZ - -/(?abc)(?(R)xyz)/BZ - -/(?=.*[A-Z])/I - -"(?<=(a))\1?b" - ab - aaab - -"(?=(a))\1?b" - ab - aaab - -/(?(?=^))b/ - abc - -/-- End of testinput2 --/ diff --git a/src/pcre/testdata/testinput21 b/src/pcre/testdata/testinput21 deleted file mode 100644 index 30895eef..00000000 --- a/src/pcre/testdata/testinput21 +++ /dev/null @@ -1,26 +0,0 @@ -/-- Tests for reloading pre-compiled patterns. The first one gives an error -right away, and can be any old pattern compiled in 8-bit mode ("abc" is -typical). The others require the link size to be 2. 
*/x - -(?:[AaLl]+)[^xX-]*?)(?P[\x{150}-\x{250}\x{300}]| - [^\x{800}aAs-uS-U\x{d800}-\x{dfff}])++[^#\b\x{500}\x{1000}]{3,5}$ - /x - - In 16-bit mode with options: S>testdata/saved16LE-1 - FS>testdata/saved16BE-1 - In 32-bit mode with options: S>testdata/saved32LE-1 - FS>testdata/saved32BE-1 ---%x - -[aZ\x{400}-\x{10ffff}]{4,} - [\x{f123}\x{10039}\x{20000}-\x{21234}]?| - [A-Cx-z\x{100000}-\x{1000a7}\x{101234}]) - (?[^az])/x - - In 16-bit mode with options: S8>testdata/saved16LE-2 - FS8>testdata/saved16BE-2 - In 32-bit mode with options: S8>testdata/saved32LE-2 - FS8>testdata/saved32BE-2 ---%8x - ->>\xaa<<< - >>>\xba<<< - -/[\W]+/Lfr_FR - >>>\xaa<<< - >>>\xba<<< - -/[^[:alpha:]]+/Lfr_FR - >>>\xaa<<< - >>>\xba<<< - -/\w+/Lfr_FR - >>>\xaa<<< - >>>\xba<<< - -/[\w]+/Lfr_FR - >>>\xaa<<< - >>>\xba<<< - -/[[:alpha:]]+/Lfr_FR - >>>\xaa<<< - >>>\xba<<< - -/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR - -/-- End of testinput3 --/ diff --git a/src/pcre/testdata/testinput4 b/src/pcre/testdata/testinput4 deleted file mode 100644 index 63368c0a..00000000 --- a/src/pcre/testdata/testinput4 +++ /dev/null @@ -1,733 +0,0 @@ -/-- This set of tests is for UTF support, excluding Unicode properties. It is - compatible with all versions of Perl >= 5.10 and both the 8-bit and 16-bit - PCRE libraries. 
--/ - -< forbid 9?=ABCDEFfGILMNPTUWXZ< - -/a.b/8 - acb - a\x7fb - a\x{100}b - *** Failers - a\nb - -/a(.{3})b/8 - a\x{4000}xyb - a\x{4000}\x7fyb - a\x{4000}\x{100}yb - *** Failers - a\x{4000}b - ac\ncb - -/a(.*?)(.)/ - a\xc0\x88b - -/a(.*?)(.)/8 - a\x{100}b - -/a(.*)(.)/ - a\xc0\x88b - -/a(.*)(.)/8 - a\x{100}b - -/a(.)(.)/ - a\xc0\x92bcd - -/a(.)(.)/8 - a\x{240}bcd - -/a(.?)(.)/ - a\xc0\x92bcd - -/a(.?)(.)/8 - a\x{240}bcd - -/a(.??)(.)/ - a\xc0\x92bcd - -/a(.??)(.)/8 - a\x{240}bcd - -/a(.{3})b/8 - a\x{1234}xyb - a\x{1234}\x{4321}yb - a\x{1234}\x{4321}\x{3412}b - *** Failers - a\x{1234}b - ac\ncb - -/a(.{3,})b/8 - a\x{1234}xyb - a\x{1234}\x{4321}yb - a\x{1234}\x{4321}\x{3412}b - axxxxbcdefghijb - a\x{1234}\x{4321}\x{3412}\x{3421}b - *** Failers - a\x{1234}b - -/a(.{3,}?)b/8 - a\x{1234}xyb - a\x{1234}\x{4321}yb - a\x{1234}\x{4321}\x{3412}b - axxxxbcdefghijb - a\x{1234}\x{4321}\x{3412}\x{3421}b - *** Failers - a\x{1234}b - -/a(.{3,5})b/8 - a\x{1234}xyb - a\x{1234}\x{4321}yb - a\x{1234}\x{4321}\x{3412}b - axxxxbcdefghijb - a\x{1234}\x{4321}\x{3412}\x{3421}b - axbxxbcdefghijb - axxxxxbcdefghijb - *** Failers - a\x{1234}b - axxxxxxbcdefghijb - -/a(.{3,5}?)b/8 - a\x{1234}xyb - a\x{1234}\x{4321}yb - a\x{1234}\x{4321}\x{3412}b - axxxxbcdefghijb - a\x{1234}\x{4321}\x{3412}\x{3421}b - axbxxbcdefghijb - axxxxxbcdefghijb - *** Failers - a\x{1234}b - axxxxxxbcdefghijb - -/^[a\x{c0}]/8 - *** Failers - \x{100} - -/(?<=aXb)cd/8 - aXbcd - -/(?<=a\x{100}b)cd/8 - a\x{100}bcd - -/(?<=a\x{100000}b)cd/8 - a\x{100000}bcd - -/(?:\x{100}){3}b/8 - \x{100}\x{100}\x{100}b - *** Failers - \x{100}\x{100}b - -/\x{ab}/8 - \x{ab} - \xc2\xab - *** Failers - \x00{ab} - -/(?<=(.))X/8 - WXYZ - \x{256}XYZ - *** Failers - XYZ - -/[^a]+/8g - bcd - \x{100}aY\x{256}Z - -/^[^a]{2}/8 - \x{100}bc - -/^[^a]{2,}/8 - \x{100}bcAa - -/^[^a]{2,}?/8 - \x{100}bca - -/[^a]+/8ig - bcd - \x{100}aY\x{256}Z - -/^[^a]{2}/8i - \x{100}bc - -/^[^a]{2,}/8i - \x{100}bcAa - -/^[^a]{2,}?/8i - \x{100}bca - -/\x{100}{0,0}/8 - abcd - 
-/\x{100}?/8 - abcd - \x{100}\x{100} - -/\x{100}{0,3}/8 - \x{100}\x{100} - \x{100}\x{100}\x{100}\x{100} - -/\x{100}*/8 - abce - \x{100}\x{100}\x{100}\x{100} - -/\x{100}{1,1}/8 - abcd\x{100}\x{100}\x{100}\x{100} - -/\x{100}{1,3}/8 - abcd\x{100}\x{100}\x{100}\x{100} - -/\x{100}+/8 - abcd\x{100}\x{100}\x{100}\x{100} - -/\x{100}{3}/8 - abcd\x{100}\x{100}\x{100}XX - -/\x{100}{3,5}/8 - abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX - -/\x{100}{3,}/8 - abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX - -/(?<=a\x{100}{2}b)X/8+ - Xyyya\x{100}\x{100}bXzzz - -/\D*/8 - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - -/\D*/8 - \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x
{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} - -/\D/8 - 1X2 - 1\x{100}2 - -/>\S/8 - > >X Y - > >\x{100} Y - -/\d/8 - \x{100}3 - -/\s/8 - \x{100} X - -/\D+/8 - 12abcd34 - *** Failers - 1234 - -/\D{2,3}/8 - 12abcd34 - 12ab34 - *** Failers - 1234 - 12a34 - -/\D{2,3}?/8 - 12abcd34 - 12ab34 - *** Failers - 1234 - 12a34 - -/\d+/8 - 12abcd34 - *** Failers - -/\d{2,3}/8 - 12abcd34 - 1234abcd - *** Failers - 1.4 - -/\d{2,3}?/8 - 12abcd34 - 1234abcd - *** Failers - 1.4 - -/\S+/8 - 12abcd34 - *** Failers - \ \ - -/\S{2,3}/8 - 12abcd34 - 1234abcd - *** Failers - \ \ - -/\S{2,3}?/8 - 12abcd34 - 1234abcd - *** Failers - \ \ - -/>\s+ <34 - *** Failers - -/>\s{2,3} \s{2,3}? \xff< - -/[\xff]/8 - >\x{ff}< - -/[^\xFF]/ - XYZ - -/[^\xff]/8 - XYZ - \x{123} - -/^[ac]*b/8 - xb - -/^[ac\x{100}]*b/8 - xb - -/^[^x]*b/8i - xb - -/^[^x]*b/8 - xb - -/^\d*b/8 - xb - -/(|a)/g8 - catac - a\x{256}a - -/^\x{85}$/8i - \x{85} - -/^ሴ/8 - ሴ - -/^\ሴ/8 - ሴ - -"(?s)(.{1,5})"8 - abcdefg - ab - -/a*\x{100}*\w/8 - a - -/\S\S/8g - A\x{a3}BC - -/\S{2}/8g - A\x{a3}BC - -/\W\W/8g - +\x{a3}== - -/\W{2}/8g - +\x{a3}== - -/\S/8g - \x{442}\x{435}\x{441}\x{442} - -/[\S]/8g - \x{442}\x{435}\x{441}\x{442} - -/\D/8g - \x{442}\x{435}\x{441}\x{442} - -/[\D]/8g - \x{442}\x{435}\x{441}\x{442} - -/\W/8g - \x{2442}\x{2435}\x{2441}\x{2442} - -/[\W]/8g - \x{2442}\x{2435}\x{2441}\x{2442} - -/[\S\s]*/8 - abc\n\r\x{442}\x{435}\x{441}\x{442}xyz - -/[\x{41f}\S]/8g - \x{442}\x{435}\x{441}\x{442} - -/.[^\S]./8g - abc def\x{442}\x{443}xyz\npqr - -/.[^\S\n]./8g - abc def\x{442}\x{443}xyz\npqr - -/[[:^alnum:]]/8g - +\x{2442} - -/[[:^alpha:]]/8g - +\x{2442} - -/[[:^ascii:]]/8g - A\x{442} - -/[[:^blank:]]/8g - A\x{442} - 
-/[[:^cntrl:]]/8g - A\x{442} - -/[[:^digit:]]/8g - A\x{442} - -/[[:^graph:]]/8g - \x19\x{e01ff} - -/[[:^lower:]]/8g - A\x{422} - -/[[:^print:]]/8g - \x{19}\x{e01ff} - -/[[:^punct:]]/8g - A\x{442} - -/[[:^space:]]/8g - A\x{442} - -/[[:^upper:]]/8g - a\x{442} - -/[[:^word:]]/8g - +\x{2442} - -/[[:^xdigit:]]/8g - M\x{442} - -/[^ABCDEFGHIJKLMNOPQRSTUVWXYZÀÃÂÃÄÅÆÇÈÉÊËÌÃÃŽÃÃÑÒÓÔÕÖØÙÚÛÜÃÞĀĂĄĆĈĊČĎÄĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİIJĴĶĹĻĽĿÅŃŅŇŊŌŎÅÅ’Å”Å–Å˜ÅšÅœÅžÅ Å¢Å¤Å¦Å¨ÅªÅ¬Å®Å°Å²Å´Å¶Å¸Å¹Å»Å½ÆÆ‚Æ„Æ†Æ‡Æ‰ÆŠÆ‹ÆŽÆÆÆ‘Æ“Æ”Æ–Æ—Æ˜ÆœÆÆŸÆ Æ¢Æ¤Æ¦Æ§Æ©Æ¬Æ®Æ¯Æ±Æ²Æ³ÆµÆ·Æ¸Æ¼Ç„LJNJÇÇǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮDZǴǶǷǸǺǼǾȀȂȄȆȈȊȌȎÈȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺȻȽȾÉΆΈΉΊΌΎÎΑΒΓΔΕΖΗΘΙΚΛΜÎΞΟΠΡΣΤΥΦΧΨΩΪΫϒϓϔϘϚϜϞϠϢϤϦϨϪϬϮϴϷϹϺϽϾϿЀÐЂЃЄЅІЇЈЉЊЋЌÐÐŽÐÐБВГДЕЖЗИЙКЛМÐОПРСТУФХЦЧШЩЪЫЬЭЮЯѠѢѤѦѨѪѬѮѰѲѴѶѸѺѼѾҀҊҌҎÒҒҔҖҘҚҜҞҠҢҤҦҨҪҬҮҰҲҴҶҸҺҼҾӀÓÓƒÓ…Ó‡Ó‰Ó‹ÓÓӒӔӖӘӚӜӞӠӢӤӦӨӪӬӮӰӲӴӶӸԀԂԄԆԈԊԌԎԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀÕÕ‚ÕƒÕ„Õ…Õ†Õ‡ÕˆÕ‰ÕŠÕ‹ÕŒÕÕŽÕÕՑՒՓՔՕՖႠႡႢႣႤႥႦႧႨႩႪႫႬႭႮႯႰႱႲႳႴႵႶႷႸႹႺႻႼႽႾႿჀáƒáƒ‚ჃჄჅḀḂḄḆḈḊḌḎá¸á¸’ḔḖḘḚḜḞḠḢḤḦḨḪḬḮḰḲḴḶḸḺḼḾṀṂṄṆṈṊṌṎá¹á¹’ṔṖṘṚṜṞṠṢṤṦṨṪṬṮṰṲṴṶṸṺṼṾẀẂẄẆẈẊẌẎáºáº’ẔẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼẾỀỂỄỆỈỊỌỎá»á»’ỔỖỘỚỜỞỠỢỤỦỨỪỬỮỰỲỴỶỸἈἉἊἋἌá¼á¼Žá¼á¼˜á¼™á¼šá¼›á¼œá¼á¼¨á¼©á¼ªá¼«á¼¬á¼­á¼®á¼¯á¼¸á¼¹á¼ºá¼»á¼¼á¼½á¼¾á¼¿á½ˆá½‰á½Šá½‹á½Œá½á½™á½›á½á½Ÿá½¨á½©á½ªá½«á½¬á½­á½®á½¯á¾¸á¾¹á¾ºá¾»á¿ˆá¿‰á¿Šá¿‹á¿˜á¿™á¿šá¿›á¿¨á¿©á¿ªá¿«á¿¬á¿¸á¿¹á¿ºá¿»abcdefghijklmnopqrstuvwxyzªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿÄăąćĉċÄÄđēĕėęěÄğġģĥħĩīĭįıijĵķĸĺļľŀłńņňʼnŋÅÅőœŕŗřśÅÅŸÅ¡Å£Å¥Å§Å©Å«Å­Å¯Å±Å³ÅµÅ·ÅºÅ¼Å¾Å¿Æ€ÆƒÆ…ÆˆÆŒÆÆ’ƕƙƚƛƞơƣƥƨƪƫƭưƴƶƹƺƽƾƿdžljnjǎÇǒǔǖǘǚǜÇǟǡǣǥǧǩǫǭǯǰdzǵǹǻǽǿÈȃȅȇȉȋÈÈȑȓȕȗșțÈȟȡȣȥȧȩȫȭȯȱȳȴȵȶȷȸȹȼȿɀÉɑɒɓɔɕɖɗɘəɚɛɜÉɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹɺɻɼɽɾɿʀÊʂʃʄʅʆʇʈʉʊʋʌÊÊŽÊÊʑʒʓʔʕʖʗʘʙʚʛʜÊʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭʮʯÎάέήίΰαβγδεζηθικλμνξοπÏςστυφχψωϊϋόÏÏŽÏϑϕϖϗϙϛÏϟϡϣϥϧϩϫϭϯϰϱϲϳϵϸϻϼабвгдежзийклмнопрÑтуфхцчшщъыьÑÑŽÑÑёђѓєѕіїјљњћќÑўџѡѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿÒÒ‹ÒÒÒ‘Ò“Ò•Ò—Ò™Ò›ÒҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎӑӓӕӗәӛÓÓŸÓ¡Ó£Ó¥Ó§Ó©Ó«Ó­Ó¯Ó±Ó³ÓµÓ·Ó¹ÔÔƒÔ…Ô‡Ô‰Ô‹ÔÔÕ¡Õ¢Õ£Õ¤Õ¥Õ¦Õ§Õ¨Õ©ÕªÕ«Õ¬Õ­Õ®Õ¯Õ°Õ±Õ²Õ³Õ´ÕµÕ¶Õ·Õ¸Õ¹ÕºÕ»Õ¼Õ½Õ¾Õ¿Ö€Öւփքօֆևᴀá´á´‚ᴃᴄᴅᴆᴇᴈᴉᴊᴋᴌá´á´Žá´á´á´‘ᴒᴓᴔᴕᴖᴗᴘᴙᴚᴛᴜá´á´žá´Ÿá´ 
á´¡á´¢á´£á´¤á´¥á´¦á´§á´¨á´©á´ªá´«áµ¢áµ£áµ¤áµ¥áµ¦áµ§áµ¨áµ©áµªáµ«áµ¬áµ­áµ®áµ¯áµ°áµ±áµ²áµ³áµ´áµµáµ¶áµ·áµ¹áµºáµ»áµ¼áµ½áµ¾áµ¿á¶€á¶á¶‚ᶃᶄᶅᶆᶇᶈᶉᶊᶋᶌá¶á¶Žá¶á¶á¶‘ᶒᶓᶔᶕᶖᶗᶘᶙᶚá¸á¸ƒá¸…ḇḉḋá¸á¸á¸‘ḓḕḗḙḛá¸á¸Ÿá¸¡á¸£á¸¥á¸§á¸©á¸«á¸­á¸¯á¸±á¸³á¸µá¸·á¸¹á¸»á¸½á¸¿á¹á¹ƒá¹…ṇṉṋá¹á¹á¹‘ṓṕṗṙṛá¹á¹Ÿá¹¡á¹£á¹¥á¹§á¹©á¹«á¹­á¹¯á¹±á¹³á¹µá¹·á¹¹á¹»á¹½á¹¿áºáºƒáº…ẇẉẋáºáºáº‘ẓẕẖẗẘẙẚẛạảấầẩẫậắằẳẵặẹẻẽếá»á»ƒá»…ệỉịá»á»á»‘ồổỗộớá»á»Ÿá»¡á»£á»¥á»§á»©á»«á»­á»¯á»±á»³á»µá»·á»¹á¼€á¼á¼‚ἃἄἅἆἇá¼á¼‘ἒἓἔἕἠἡἢἣἤἥἦἧἰἱἲἳἴἵἶἷὀá½á½‚ὃὄὅá½á½‘ὒὓὔὕὖὗὠὡὢὣὤὥὦὧὰάὲέὴήὶίὸόὺύὼώᾀá¾á¾‚ᾃᾄᾅᾆᾇá¾á¾‘ᾒᾓᾔᾕᾖᾗᾠᾡᾢᾣᾤᾥᾦᾧᾰᾱᾲᾳᾴᾶᾷιῂῃῄῆῇá¿á¿‘ῒΐῖῗῠῡῢΰῤῥῦῧῲῳῴῶῷâ²â²ƒâ²…ⲇⲉⲋâ²â²â²‘ⲓⲕⲗⲙⲛâ²â²Ÿâ²¡â²£â²¥â²§â²©â²«â²­â²¯â²±â²³â²µâ²·â²¹â²»â²½â²¿â³â³ƒâ³…ⳇⳉⳋâ³â³â³‘ⳓⳕⳗⳙⳛâ³â³Ÿâ³¡â³£â³¤â´€â´â´‚ⴃⴄⴅⴆⴇⴈⴉⴊⴋⴌâ´â´Žâ´â´â´‘ⴒⴓⴔⴕⴖⴗⴘⴙⴚⴛⴜâ´â´žâ´Ÿâ´ â´¡â´¢â´£â´¤â´¥ï¬€ï¬ï¬‚ffifflſtstﬓﬔﬕﬖﬗ\d-_^]/8 - -/^[^d]*?$/ - abc - -/^[^d]*?$/8 - abc - -/^[^d]*?$/i - abc - -/^[^d]*?$/8i - abc - -/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8 - -/^[a\x{c0}]b/8 - \x{c0}b - -/^([a\x{c0}]*?)aa/8 - a\x{c0}aaaa/ - -/^([a\x{c0}]*?)aa/8 - a\x{c0}aaaa/ - a\x{c0}a\x{c0}aaa/ - -/^([a\x{c0}]*)aa/8 - a\x{c0}aaaa/ - a\x{c0}a\x{c0}aaa/ - -/^([a\x{c0}]*)a\x{c0}/8 - a\x{c0}aaaa/ - a\x{c0}a\x{c0}aaa/ - -/A*/g8 - AAB\x{123}BAA - -/(abc)\1/8i - abc - -/(abc)\1/8 - abc - -/a(*:a\x{1234}b)/8K - abc - -/a(*:a£b)/8K - abc - -/-- Noncharacters --/ - -/./8 - \x{fffe} - \x{ffff} - \x{1fffe} - \x{1ffff} - \x{2fffe} - \x{2ffff} - \x{3fffe} - \x{3ffff} - \x{4fffe} - \x{4ffff} - \x{5fffe} - \x{5ffff} - \x{6fffe} - \x{6ffff} - \x{7fffe} - \x{7ffff} - \x{8fffe} - \x{8ffff} - \x{9fffe} - \x{9ffff} - \x{afffe} - \x{affff} - \x{bfffe} - \x{bffff} - \x{cfffe} - \x{cffff} - \x{dfffe} - \x{dffff} - \x{efffe} - \x{effff} - \x{ffffe} - \x{fffff} - \x{10fffe} - \x{10ffff} - \x{fdd0} - \x{fdd1} - \x{fdd2} - \x{fdd3} - \x{fdd4} - \x{fdd5} - \x{fdd6} - \x{fdd7} - \x{fdd8} - \x{fdd9} - \x{fdda} - \x{fddb} - \x{fddc} - \x{fddd} - \x{fdde} - \x{fddf} - \x{fde0} - \x{fde1} - \x{fde2} - \x{fde3} - \x{fde4} - \x{fde5} - \x{fde6} - \x{fde7} - \x{fde8} - 
\x{fde9} - \x{fdea} - \x{fdeb} - \x{fdec} - \x{fded} - \x{fdee} - \x{fdef} - -/^\d*\w{4}/8 - 1234 - 123 - -/^[^b]*\w{4}/8 - aaaa - aaa - -/^[^b]*\w{4}/8i - aaaa - aaa - -/^\x{100}*.{4}/8 - \x{100}\x{100}\x{100}\x{100} - \x{100}\x{100}\x{100} - -/^\x{100}*.{4}/8i - \x{100}\x{100}\x{100}\x{100} - \x{100}\x{100}\x{100} - -/^a+[a\x{200}]/8 - aa - -/^.\B.\B./8 - \x{10123}\x{10124}\x{10125} - -/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8 - #\x{10000}#\x{100}#\x{10ffff}# - -"[\S\V\H]"8 - -/\C(\W?Å¿)'?{{/8 - \\C(\\W?Å¿)'?{{ - -/[^\x{100}-\x{ffff}]*[\x80-\xff]/8 - \x{99}\x{99}\x{99} - -/-- End of testinput4 --/ diff --git a/src/pcre/testdata/testinput5 b/src/pcre/testdata/testinput5 deleted file mode 100644 index c94008c3..00000000 --- a/src/pcre/testdata/testinput5 +++ /dev/null @@ -1,807 +0,0 @@ -/-- This set of tests checks the API, internals, and non-Perl stuff for UTF - support, excluding Unicode properties. However, tests that give different - results in 8-bit and 16-bit modes are excluded (see tests 16 and 17). 
--/ - -< forbid W - -/\x{110000}/8DZ - -/\o{4200000}/8DZ - -/\x{ffffffff}/8 - -/\o{37777777777}/8 - -/\x{100000000}/8 - -/\o{77777777777}/8 - -/\x{d800}/8 - -/\o{154000}/8 - -/\x{dfff}/8 - -/\o{157777}/8 - -/\x{d7ff}/8 - -/\o{153777}/8 - -/\x{e000}/8 - -/\o{170000}/8 - -/^\x{100}a\x{1234}/8 - \x{100}a\x{1234}bcd - -/\x{0041}\x{2262}\x{0391}\x{002e}/DZ8 - \x{0041}\x{2262}\x{0391}\x{002e} - -/.{3,5}X/DZ8 - \x{212ab}\x{212ab}\x{212ab}\x{861}X - -/.{3,5}?/DZ8 - \x{212ab}\x{212ab}\x{212ab}\x{861} - -/(?<=\C)X/8 - Should produce an error diagnostic - -/^[ab]/8DZ - bar - *** Failers - c - \x{ff} - \x{100} - -/^[^ab]/8DZ - c - \x{ff} - \x{100} - *** Failers - aaa - -/\x{100}*(\d+|"(?1)")/8 - 1234 - "1234" - \x{100}1234 - "\x{100}1234" - \x{100}\x{100}12ab - \x{100}\x{100}"12" - *** Failers - \x{100}\x{100}abcd - -/\x{100}*/8DZ - -/a\x{100}*/8DZ - -/ab\x{100}*/8DZ - -/\x{100}*A/8DZ - A - -/\x{100}*\d(?R)/8DZ - -/[Z\x{100}]/8DZ - Z\x{100} - \x{100} - \x{100}Z - *** Failers - -/[\x{200}-\x{100}]/8 - -/[Ä€-Ä„]/8 - \x{100} - \x{104} - *** Failers - \x{105} - \x{ff} - -/[z-\x{100}]/8DZ - -/[z\Qa-d]Ä€\E]/8DZ - \x{100} - Ä€ - -/[\xFF]/DZ - >\xff< - -/[^\xFF]/DZ - -/[Ä-Ü]/8 - Ö # Matches without Study - \x{d6} - -/[Ä-Ü]/8S - Ö <-- Same with Study - \x{d6} - -/[\x{c4}-\x{dc}]/8 - Ö # Matches without Study - \x{d6} - -/[\x{c4}-\x{dc}]/8S - Ö <-- Same with Study - \x{d6} - -/[^\x{100}]abc(xyz(?1))/8DZ - -/[ab\x{100}]abc(xyz(?1))/8DZ - -/(\x{100}(b(?2)c))?/DZ8 - -/(\x{100}(b(?2)c)){0,2}/DZ8 - -/(\x{100}(b(?1)c))?/DZ8 - -/(\x{100}(b(?1)c)){0,2}/DZ8 - -/\W/8 - A.B - A\x{100}B - -/\w/8 - \x{100}X - -/^\ሴ/8DZ - -/\x{100}*\d/8DZ - -/\x{100}*\s/8DZ - -/\x{100}*\w/8DZ - -/\x{100}*\D/8DZ - -/\x{100}*\S/8DZ - -/\x{100}*\W/8DZ - -/()()()()()()()()()() - ()()()()()()()()()() - ()()()()()()()()()() - ()()()()()()()()()() - A (x) (?41) B/8x - AxxB - -/^[\x{100}\E-\Q\E\x{150}]/BZ8 - -/^[\QÄ€\E-\QÅ\E]/BZ8 - -/^abc./mgx8 - abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 
\x{2028}abc8 \x{2029}abc9 JUNK - -/abc.$/mgx8 - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 - -/^a\Rb/8 - a\nb - a\rb - a\r\nb - a\x0bb - a\x0cb - a\x{85}b - a\x{2028}b - a\x{2029}b - ** Failers - a\n\rb - -/^a\R*b/8 - ab - a\nb - a\rb - a\r\nb - a\x0bb - a\x0c\x{2028}\x{2029}b - a\x{85}b - a\n\rb - a\n\r\x{85}\x0cb - -/^a\R+b/8 - a\nb - a\rb - a\r\nb - a\x0bb - a\x0c\x{2028}\x{2029}b - a\x{85}b - a\n\rb - a\n\r\x{85}\x0cb - ** Failers - ab - -/^a\R{1,3}b/8 - a\nb - a\n\rb - a\n\r\x{85}b - a\r\n\r\nb - a\r\n\r\n\r\nb - a\n\r\n\rb - a\n\n\r\nb - ** Failers - a\n\n\n\rb - a\r - -/\H\h\V\v/8 - X X\x0a - X\x09X\x0b - ** Failers - \x{a0} X\x0a - -/\H*\h+\V?\v{3,4}/8 - \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a - \x09\x20\x{a0}\x0a\x0b\x0c - ** Failers - \x09\x20\x{a0}\x0a\x0b - -/\H\h\V\v/8 - \x{3001}\x{3000}\x{2030}\x{2028} - X\x{180e}X\x{85} - ** Failers - \x{2009} X\x0a - -/\H*\h+\V?\v{3,4}/8 - \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a - \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a - \x09\x20\x{202f}\x0a\x0b\x0c - ** Failers - \x09\x{200a}\x{a0}\x{2028}\x0b - -/[\h]/8BZ - >\x{1680} - -/[\h]{3,}/8BZ - >\x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000}< - -/[\v]/8BZ - -/[\H]/8BZ - -/[\V]/8BZ - -/.*$/8 - \x{1ec5} - -/a\Rb/I8 - a\rb - a\nb - a\r\nb - ** Failers - a\x{85}b - a\x0bb - -/a\Rb/I8 - a\rb - a\nb - a\r\nb - a\x{85}b - a\x0bb - ** Failers - a\x{85}b\ - a\x0bb\ - -/a\R?b/I8 - a\rb - a\nb - a\r\nb - ** Failers - a\x{85}b - a\x0bb - -/a\R?b/I8 - a\rb - a\nb - a\r\nb - a\x{85}b - a\x0bb - ** Failers - a\x{85}b\ - a\x0bb\ - -/.*a.*=.b.*/8 - QQQ\x{2029}ABCaXYZ=!bPQR - ** Failers - a\x{2029}b - \x61\xe2\x80\xa9\x62 - -/[[:a\x{100}b:]]/8 - -/a[^]b/8 - a\x{1234}b - a\nb - ** Failers - ab - -/a[^]+b/8 - aXb - a\nX\nX\x{1234}b - ** Failers - ab - -/(\x{de})\1/ - \x{de}\x{de} - -/X/8f - A\x{1ec5}ABCXYZ - -/Xa{2,4}b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - 
-/Xa{2,4}?b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/Xa{2,4}+b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X\x{123}{2,4}b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X\x{123}{2,4}?b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X\x{123}{2,4}+b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X\x{123}{2,4}b/8 - Xx\P - X\x{123}x\P - X\x{123}\x{123}x\P - X\x{123}\x{123}\x{123}x\P - X\x{123}\x{123}\x{123}\x{123}x\P - -/X\x{123}{2,4}?b/8 - Xx\P - X\x{123}x\P - X\x{123}\x{123}x\P - X\x{123}\x{123}\x{123}x\P - X\x{123}\x{123}\x{123}\x{123}x\P - -/X\x{123}{2,4}+b/8 - Xx\P - X\x{123}x\P - X\x{123}\x{123}x\P - X\x{123}\x{123}\x{123}x\P - X\x{123}\x{123}\x{123}\x{123}x\P - -/X\d{2,4}b/8 - X\P - X3\P - X33\P - X333\P - X3333\P - -/X\d{2,4}?b/8 - X\P - X3\P - X33\P - X333\P - X3333\P - -/X\d{2,4}+b/8 - X\P - X3\P - X33\P - X333\P - X3333\P - -/X\D{2,4}b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X\D{2,4}?b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X\D{2,4}+b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X\D{2,4}b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X\D{2,4}?b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X\D{2,4}+b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X[abc]{2,4}b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[abc]{2,4}?b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[abc]{2,4}+b/8 - X\P - Xa\P - Xaa\P - Xaaa\P - Xaaaa\P - -/X[abc\x{123}]{2,4}b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X[abc\x{123}]{2,4}?b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - 
-/X[abc\x{123}]{2,4}+b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X[^a]{2,4}b/8 - X\P - Xz\P - Xzz\P - Xzzz\P - Xzzzz\P - -/X[^a]{2,4}?b/8 - X\P - Xz\P - Xzz\P - Xzzz\P - Xzzzz\P - -/X[^a]{2,4}+b/8 - X\P - Xz\P - Xzz\P - Xzzz\P - Xzzzz\P - -/X[^a]{2,4}b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X[^a]{2,4}?b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/X[^a]{2,4}+b/8 - X\P - X\x{123}\P - X\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\P - X\x{123}\x{123}\x{123}\x{123}\P - -/(Y)X\1{2,4}b/8 - YX\P - YXY\P - YXYY\P - YXYYY\P - YXYYYY\P - -/(Y)X\1{2,4}?b/8 - YX\P - YXY\P - YXYY\P - YXYYY\P - YXYYYY\P - -/(Y)X\1{2,4}+b/8 - YX\P - YXY\P - YXYY\P - YXYYY\P - YXYYYY\P - -/(\x{123})X\1{2,4}b/8 - \x{123}X\P - \x{123}X\x{123}\P - \x{123}X\x{123}\x{123}\P - \x{123}X\x{123}\x{123}\x{123}\P - \x{123}X\x{123}\x{123}\x{123}\x{123}\P - -/(\x{123})X\1{2,4}?b/8 - \x{123}X\P - \x{123}X\x{123}\P - \x{123}X\x{123}\x{123}\P - \x{123}X\x{123}\x{123}\x{123}\P - \x{123}X\x{123}\x{123}\x{123}\x{123}\P - -/(\x{123})X\1{2,4}+b/8 - \x{123}X\P - \x{123}X\x{123}\P - \x{123}X\x{123}\x{123}\P - \x{123}X\x{123}\x{123}\x{123}\P - \x{123}X\x{123}\x{123}\x{123}\x{123}\P - -/\bthe cat\b/8 - the cat\P - the cat\P\P - -/abcd*/8 - xxxxabcd\P - xxxxabcd\P\P - -/abcd*/i8 - xxxxabcd\P - xxxxabcd\P\P - XXXXABCD\P - XXXXABCD\P\P - -/abc\d*/8 - xxxxabc1\P - xxxxabc1\P\P - -/(a)bc\1*/8 - xxxxabca\P - xxxxabca\P\P - -/abc[de]*/8 - xxxxabcde\P - xxxxabcde\P\P - -/X\W{3}X/8 - \PX - -/\sxxx\s/8T1 - AB\x{85}xxx\x{a0}XYZ - AB\x{a0}xxx\x{85}XYZ - -/\S \S/8T1 - \x{a2} \x{84} - -'A#хц'8xBZ - -'A#хц - PQ'8xBZ - -/a+#Ñ…aa - z#XX?/8xBZ - -/a+#Ñ…aa - z#Ñ…?/8xBZ - -/\g{A}xxx#bXX(?'A'123) (?'A'456)/8xBZ - -/\g{A}xxx#bÑ…(?'A'123) (?'A'456)/8xBZ - -/^\cÄ£/8 - -/(\R*)(.)/s8 - \r\n - \r\r\n\n\r - \r\r\n\n\r\n - -/(\R)*(.)/s8 - \r\n - \r\r\n\n\r - 
\r\r\n\n\r\n - -/[^\x{1234}]+/iS8I - -/[^\x{1234}]+?/iS8I - -/[^\x{1234}]++/iS8I - -/[^\x{1234}]{2}/iS8I - -// - -/f.*/ - \P\Pfor - -/f.*/s - \P\Pfor - -/f.*/8 - \P\Pfor - -/f.*/8s - \P\Pfor - -/\x{d7ff}\x{e000}/8 - -/\x{d800}/8 - -/\x{dfff}/8 - -/\h+/8 - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} - -/[\h\x{e000}]+/8BZ - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} - -/\H+/8 - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} - -/[\H\x{d7ff}]+/8BZ - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} - -/\v+/8 - \x{2027}\x{2030}\x{2028}\x{2029} - \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d - -/[\v\x{e000}]+/8BZ - \x{2027}\x{2030}\x{2028}\x{2029} - \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d - -/\V+/8 - \x{2028}\x{2029}\x{2027}\x{2030} - \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} - -/[\V\x{d7ff}]+/8BZ - \x{2028}\x{2029}\x{2027}\x{2030} - \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} - -/\R+/8 - \x{2027}\x{2030}\x{2028}\x{2029} - \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d - -/(..)\1/8 - ab\P - aba\P - abab\P - -/(..)\1/8i - ab\P - abA\P - aBAb\P - -/(..)\1{2,}/8 - ab\P - aba\P - abab\P - ababa\P - ababab\P - ababab\P\P - abababa\P - abababa\P\P - -/(..)\1{2,}/8i - ab\P - aBa\P - aBAb\P - AbaBA\P - abABAb\P - aBAbaB\P\P - abABabA\P - abaBABa\P\P - -/(..)\1{2,}?x/8i - ab\P - abA\P - aBAb\P - abaBA\P - abAbaB\P - abaBabA\P - abAbABaBx\P - -/./8 - \r\P - \r\P\P - -/.{2,3}/8 - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -/.{2,3}?/8 - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/8BZ - 
-/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/8BZi - -/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/8BZ - -/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/8BZi - -/(?<=\x{1234}\x{1234})\bxy/I8 - -/(?8BZ - -/[\u0100-\u0200]/8BZ - -/\ud800/8 - -/^a+[a\x{200}]/8BZ - aa - -/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ - -/[^\xff]*PRUNE:\x{100}abc(xyz(?1))/8DZ - -/(?<=\K\x{17f})/8g+ - \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} - -/(?<=\K\x{17f})/8G+ - \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} - -/\C[^\v]+\x80/8 - [Aá¿»BÅ€C] - -/\C[^\d]+\x80/8 - [Aá¿»BÅ€C] - -/-- End of testinput5 --/ diff --git a/src/pcre/testdata/testinput6 b/src/pcre/testdata/testinput6 deleted file mode 100644 index 22ed1e64..00000000 --- a/src/pcre/testdata/testinput6 +++ /dev/null @@ -1,1571 +0,0 @@ -/-- This set of tests is for Unicode property support. It is compatible with - Perl >= 5.15. 
--/ - -< forbid 9?=ABCDEFfGILMNPTUXZ< - -/^\pC\pL\pM\pN\pP\pS\pZ\s+/8W - >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} - -/^>\pZ+/8W - >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} - -/^>[[:space:]]*/8W - >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} - -/^>[[:blank:]]*/8W - >\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028} - -/^[[:alpha:]]*/8W - Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d} - -/^[[:alnum:]]*/8W - Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee} - -/^[[:cntrl:]]*/8W - \x{0}\x{09}\x{1f}\x{7f}\x{9f} - -/^[[:graph:]]*/8W - A\x{a1}\x{a0} - -/^[[:print:]]*/8W - A z\x{a0}\x{a1} - -/^[[:punct:]]*/8W - .+\x{a1}\x{a0} - -/\p{Zs}*?\R/ - ** Failers - a\xFCb - -/\p{Zs}*\R/ - ** Failers - a\xFCb - -/â±¥/8i - â±¥ - Ⱥx - Ⱥ - -/[â±¥]/8i - â±¥ - Ⱥx - Ⱥ - -/Ⱥ/8i - Ⱥ - â±¥ - -/-- These are tests for extended grapheme clusters --/ - -/^\X/8+ - G\x{34e}\x{34e}X - \x{34e}\x{34e}X - \x04X - \x{1100}X - \x{1100}\x{34e}X - \x{1b04}\x{1b04}X - *These match up to the roman letters - \x{1111}\x{1111}L,L - \x{1111}\x{1111}\x{1169}L,L,V - \x{1111}\x{ae4c}L, LV - \x{1111}\x{ad89}L, LVT - \x{1111}\x{ae4c}\x{1169}L, LV, V - \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V - \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T - \x{1111}\x{ad89}\x{11fe}L, LVT, T - \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T - \x{ad89}\x{11fe}\x{11fe}LVT, T, T - *These match just the first codepoint (invalid sequence) - \x{1111}\x{11fe}L, T - \x{ae4c}\x{1111}LV, L - \x{ae4c}\x{ae4c}LV, LV - \x{ae4c}\x{ad89}LV, LVT - \x{1169}\x{1111}V, L - \x{1169}\x{ae4c}V, LV - \x{1169}\x{ad89}V, LVT - \x{ad89}\x{1111}LVT, L - \x{ad89}\x{1169}LVT, V - \x{ad89}\x{ae4c}LVT, LV - \x{ad89}\x{ad89}LVT, LVT - \x{11fe}\x{1111}T, L - \x{11fe}\x{1169}T, V - \x{11fe}\x{ae4c}T, LV - \x{11fe}\x{ad89}T, LVT - *Test extend and spacing mark - \x{1111}\x{ae4c}\x{0711}L, LV, extend - \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark - 
\x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark - *Test CR, LF, and control - \x0d\x{0711}CR, extend - \x0d\x{1b04}CR, spacingmark - \x0a\x{0711}LF, extend - \x0a\x{1b04}LF, spacingmark - \x0b\x{0711}Control, extend - \x09\x{1b04}Control, spacingmark - *There are no Prepend characters, so we can't test Prepend, CR - -/^(?>\X{2})X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - -/^\X{2,4}X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - -/^\X{2,4}?X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - -/\X*Z/8Y - A\x{300} - -/\X*(.)/8Y - A\x{1111}\x{ae4c}\x{1169} - -/\X?abc/8Y -\xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\>06\? - -/-- --/ - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - -/[z\x{1e9e}]+/8i - \x{1e9e}\x{00df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - -/[z\x{00df}]+/8i - \x{1e9e}\x{00df} - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - -/[z\x{1f88}]+/8i - \x{1f88}\x{1f80} - -/-- Characters with more than one other case; test in classes --/ - -/[z\x{00b5}]+/8i - \x{00b5}\x{039c}\x{03bc} - -/[z\x{039c}]+/8i - \x{00b5}\x{039c}\x{03bc} - -/[z\x{03bc}]+/8i - \x{00b5}\x{039c}\x{03bc} - -/[z\x{00c5}]+/8i - \x{00c5}\x{00e5}\x{212b} - -/[z\x{00e5}]+/8i - \x{00c5}\x{00e5}\x{212b} - -/[z\x{212b}]+/8i - \x{00c5}\x{00e5}\x{212b} - -/[z\x{01c4}]+/8i - \x{01c4}\x{01c5}\x{01c6} - -/[z\x{01c5}]+/8i - \x{01c4}\x{01c5}\x{01c6} - -/[z\x{01c6}]+/8i - \x{01c4}\x{01c5}\x{01c6} - -/[z\x{01c7}]+/8i - \x{01c7}\x{01c8}\x{01c9} - -/[z\x{01c8}]+/8i - \x{01c7}\x{01c8}\x{01c9} - -/[z\x{01c9}]+/8i - \x{01c7}\x{01c8}\x{01c9} - -/[z\x{01ca}]+/8i - \x{01ca}\x{01cb}\x{01cc} - -/[z\x{01cb}]+/8i - \x{01ca}\x{01cb}\x{01cc} - -/[z\x{01cc}]+/8i - \x{01ca}\x{01cb}\x{01cc} - -/[z\x{01f1}]+/8i - \x{01f1}\x{01f2}\x{01f3} - -/[z\x{01f2}]+/8i - 
\x{01f1}\x{01f2}\x{01f3} - -/[z\x{01f3}]+/8i - \x{01f1}\x{01f2}\x{01f3} - -/[z\x{0345}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/[z\x{0399}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/[z\x{03b9}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/[z\x{1fbe}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/[z\x{0392}]+/8i - \x{0392}\x{03b2}\x{03d0} - -/[z\x{03b2}]+/8i - \x{0392}\x{03b2}\x{03d0} - -/[z\x{03d0}]+/8i - \x{0392}\x{03b2}\x{03d0} - -/[z\x{0395}]+/8i - \x{0395}\x{03b5}\x{03f5} - -/[z\x{03b5}]+/8i - \x{0395}\x{03b5}\x{03f5} - -/[z\x{03f5}]+/8i - \x{0395}\x{03b5}\x{03f5} - -/[z\x{0398}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/[z\x{03b8}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/[z\x{03d1}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/[z\x{03f4}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/[z\x{039a}]+/8i - \x{039a}\x{03ba}\x{03f0} - -/[z\x{03ba}]+/8i - \x{039a}\x{03ba}\x{03f0} - -/[z\x{03f0}]+/8i - \x{039a}\x{03ba}\x{03f0} - -/[z\x{03a0}]+/8i - \x{03a0}\x{03c0}\x{03d6} - -/[z\x{03c0}]+/8i - \x{03a0}\x{03c0}\x{03d6} - -/[z\x{03d6}]+/8i - \x{03a0}\x{03c0}\x{03d6} - -/[z\x{03a1}]+/8i - \x{03a1}\x{03c1}\x{03f1} - -/[z\x{03c1}]+/8i - \x{03a1}\x{03c1}\x{03f1} - -/[z\x{03f1}]+/8i - \x{03a1}\x{03c1}\x{03f1} - -/[z\x{03a3}]+/8i - \x{03A3}\x{03C2}\x{03C3} - -/[z\x{03c2}]+/8i - \x{03A3}\x{03C2}\x{03C3} - -/[z\x{03c3}]+/8i - \x{03A3}\x{03C2}\x{03C3} - -/[z\x{03a6}]+/8i - \x{03a6}\x{03c6}\x{03d5} - -/[z\x{03c6}]+/8i - \x{03a6}\x{03c6}\x{03d5} - -/[z\x{03d5}]+/8i - \x{03a6}\x{03c6}\x{03d5} - -/[z\x{03c9}]+/8i - \x{03c9}\x{03a9}\x{2126} - -/[z\x{03a9}]+/8i - \x{03c9}\x{03a9}\x{2126} - -/[z\x{2126}]+/8i - \x{03c9}\x{03a9}\x{2126} - -/[z\x{1e60}]+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/[z\x{1e61}]+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/[z\x{1e9b}]+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/-- Perl 5.12.4 gets these wrong, but 5.15.3 is OK --/ - -/[z\x{004b}]+/8i - \x{004b}\x{006b}\x{212a} - -/[z\x{006b}]+/8i - \x{004b}\x{006b}\x{212a} - -/[z\x{212a}]+/8i - \x{004b}\x{006b}\x{212a} - -/[z\x{0053}]+/8i 
- \x{0053}\x{0073}\x{017f} - -/[z\x{0073}]+/8i - \x{0053}\x{0073}\x{017f} - -/[z\x{017f}]+/8i - \x{0053}\x{0073}\x{017f} - -/-- --/ - -/(ΣΆΜΟΣ) \1/8i - ΣΆΜΟΣ ΣΆΜΟΣ - ΣΆΜΟΣ σάμος - σάμος σάμος - σάμος σάμοσ - σάμος ΣΆΜΟΣ - -/(σάμος) \1/8i - ΣΆΜΟΣ ΣΆΜΟΣ - ΣΆΜΟΣ σάμος - σάμος σάμος - σάμος σάμοσ - σάμος ΣΆΜΟΣ - -/(ΣΆΜΟΣ) \1*/8i - ΣΆΜΟΣ\x20 - ΣΆΜΟΣ ΣΆΜΟΣσάμοςσάμος - -/-- Perl matches these --/ - -/\x{00b5}+/8i - \x{00b5}\x{039c}\x{03bc} - -/\x{039c}+/8i - \x{00b5}\x{039c}\x{03bc} - -/\x{03bc}+/8i - \x{00b5}\x{039c}\x{03bc} - - -/\x{00c5}+/8i - \x{00c5}\x{00e5}\x{212b} - -/\x{00e5}+/8i - \x{00c5}\x{00e5}\x{212b} - -/\x{212b}+/8i - \x{00c5}\x{00e5}\x{212b} - - -/\x{01c4}+/8i - \x{01c4}\x{01c5}\x{01c6} - -/\x{01c5}+/8i - \x{01c4}\x{01c5}\x{01c6} - -/\x{01c6}+/8i - \x{01c4}\x{01c5}\x{01c6} - - -/\x{01c7}+/8i - \x{01c7}\x{01c8}\x{01c9} - -/\x{01c8}+/8i - \x{01c7}\x{01c8}\x{01c9} - -/\x{01c9}+/8i - \x{01c7}\x{01c8}\x{01c9} - - -/\x{01ca}+/8i - \x{01ca}\x{01cb}\x{01cc} - -/\x{01cb}+/8i - \x{01ca}\x{01cb}\x{01cc} - -/\x{01cc}+/8i - \x{01ca}\x{01cb}\x{01cc} - - -/\x{01f1}+/8i - \x{01f1}\x{01f2}\x{01f3} - -/\x{01f2}+/8i - \x{01f1}\x{01f2}\x{01f3} - -/\x{01f3}+/8i - \x{01f1}\x{01f2}\x{01f3} - - -/\x{0345}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/\x{0399}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/\x{03b9}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - -/\x{1fbe}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - - -/\x{0392}+/8i - \x{0392}\x{03b2}\x{03d0} - -/\x{03b2}+/8i - \x{0392}\x{03b2}\x{03d0} - -/\x{03d0}+/8i - \x{0392}\x{03b2}\x{03d0} - - -/\x{0395}+/8i - \x{0395}\x{03b5}\x{03f5} - -/\x{03b5}+/8i - \x{0395}\x{03b5}\x{03f5} - -/\x{03f5}+/8i - \x{0395}\x{03b5}\x{03f5} - - -/\x{0398}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/\x{03b8}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/\x{03d1}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - -/\x{03f4}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - - -/\x{039a}+/8i - \x{039a}\x{03ba}\x{03f0} - -/\x{03ba}+/8i - \x{039a}\x{03ba}\x{03f0} - -/\x{03f0}+/8i - 
\x{039a}\x{03ba}\x{03f0} - - -/\x{03a0}+/8i - \x{03a0}\x{03c0}\x{03d6} - -/\x{03c0}+/8i - \x{03a0}\x{03c0}\x{03d6} - -/\x{03d6}+/8i - \x{03a0}\x{03c0}\x{03d6} - - -/\x{03a1}+/8i - \x{03a1}\x{03c1}\x{03f1} - -/\x{03c1}+/8i - \x{03a1}\x{03c1}\x{03f1} - -/\x{03f1}+/8i - \x{03a1}\x{03c1}\x{03f1} - - -/\x{03a3}+/8i - \x{03A3}\x{03C2}\x{03C3} - -/\x{03c2}+/8i - \x{03A3}\x{03C2}\x{03C3} - -/\x{03c3}+/8i - \x{03A3}\x{03C2}\x{03C3} - - -/\x{03a6}+/8i - \x{03a6}\x{03c6}\x{03d5} - -/\x{03c6}+/8i - \x{03a6}\x{03c6}\x{03d5} - -/\x{03d5}+/8i - \x{03a6}\x{03c6}\x{03d5} - - -/\x{03c9}+/8i - \x{03c9}\x{03a9}\x{2126} - -/\x{03a9}+/8i - \x{03c9}\x{03a9}\x{2126} - -/\x{2126}+/8i - \x{03c9}\x{03a9}\x{2126} - - -/\x{1e60}+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e61}+/8i - \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e9b}+/8i - \x{1e60}\x{1e61}\x{1e9b} - - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - -/\x{1f80}+/8i - \x{1f88}\x{1f80} - - -/-- Perl 5.12.4 gets these wrong, but 5.15.3 is OK --/ - -/\x{004b}+/8i - \x{004b}\x{006b}\x{212a} - -/\x{006b}+/8i - \x{004b}\x{006b}\x{212a} - -/\x{212a}+/8i - \x{004b}\x{006b}\x{212a} - - -/\x{0053}+/8i - \x{0053}\x{0073}\x{017f} - -/\x{0073}+/8i - \x{0053}\x{0073}\x{017f} - -/\x{017f}+/8i - \x{0053}\x{0073}\x{017f} - -/^\p{Any}*\d{4}/8 - 1234 - 123 - -/^\X*\w{4}/8 - 1234 - 123 - -/^A\s+Z/8W - A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - -/^A[\s]+Z/8W - A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - -/^[[:graph:]]+$/8W - Letter:ABC - Mark:\x{300}\x{1d172}\x{1d17b} - Number:9\x{660} - Punctuation:\x{66a},; - Symbol:\x{6de}<>\x{fffc} - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - \x{feff} - \x{fff9}\x{fffa}\x{fffb} - \x{110bd} - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - \x{e0001} 
- \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - ** Failers - \x{09} - \x{0a} - \x{1D} - \x{20} - \x{85} - \x{a0} - \x{61c} - \x{1680} - \x{180e} - \x{2028} - \x{2029} - \x{202f} - \x{2065} - \x{2066} - \x{2067} - \x{2068} - \x{2069} - \x{3000} - \x{e0002} - \x{e001f} - \x{e0080} - -/^[[:print:]]+$/8W - Space: \x{a0} - \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} - \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} - \x{202f}\x{205f} - \x{3000} - Letter:ABC - Mark:\x{300}\x{1d172}\x{1d17b} - Number:9\x{660} - Punctuation:\x{66a},; - Symbol:\x{6de}<>\x{fffc} - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - \x{180e} - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - \x{202f} - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - \x{feff} - \x{fff9}\x{fffa}\x{fffb} - \x{110bd} - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - \x{e0001} - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - ** Failers - \x{09} - \x{1D} - \x{85} - \x{61c} - \x{2028} - \x{2029} - \x{2065} - \x{2066} - \x{2067} - \x{2068} - \x{2069} - \x{e0002} - \x{e001f} - \x{e0080} - -/^[[:punct:]]+$/8W - \$+<=>^`|~ - !\"#%&'()*,-./:;?@[\\]_{} - \x{a1}\x{a7} - \x{37e} - ** Failers - abcde - -/^[[:^graph:]]+$/8W - \x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{61c}\x{1680}\x{180e} - \x{2028}\x{2029}\x{202f}\x{2065}\x{2066}\x{2067}\x{2068}\x{2069} - \x{3000}\x{e0002}\x{e001f}\x{e0080} - ** Failers - Letter:ABC - Mark:\x{300}\x{1d172}\x{1d17b} - Number:9\x{660} - Punctuation:\x{66a},; - Symbol:\x{6de}<>\x{fffc} - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - \x{feff} - \x{fff9}\x{fffa}\x{fffb} - \x{110bd} - 
\x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - \x{e0001} - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - -/^[[:^print:]]+$/8W - \x{09}\x{1D}\x{85}\x{61c}\x{2028}\x{2029}\x{2065}\x{2066}\x{2067} - \x{2068}\x{2069}\x{e0002}\x{e001f}\x{e0080} - ** Failers - Space: \x{a0} - \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} - \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} - \x{202f}\x{205f} - \x{3000} - Letter:ABC - Mark:\x{300}\x{1d172}\x{1d17b} - Number:9\x{660} - Punctuation:\x{66a},; - Symbol:\x{6de}<>\x{fffc} - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - \x{180e} - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - \x{202f} - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - \x{feff} - \x{fff9}\x{fffa}\x{fffb} - \x{110bd} - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - \x{e0001} - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - -/^[[:^punct:]]+$/8W - abcde - ** Failers - \$+<=>^`|~ - !\"#%&'()*,-./:;?@[\\]_{} - \x{a1}\x{a7} - \x{37e} - -/[RST]+/8iW - Ss\x{17f} - -/[R-T]+/8iW - Ss\x{17f} - -/[q-u]+/8iW - Ss\x{17f} - -/^s?c/mi8 - scat - -/[A-`]/i8 - abcdefghijklmno - -/\C\X*QT/8 - Ó…\x0aT - -/[\pS#moq]/ - = - -/[[:punct:]]/8W - \xc2\xb4 - \x{b4} - -/[[:^ascii:]]/8W - \x{100} - \x{200} - \x{300} - \x{37e} - a - 9 - g - -/[[:^ascii:]\w]/8W - a - 9 - g - \x{100} - \x{200} - \x{300} - \x{37e} - -/[\w[:^ascii:]]/8W - a - 9 - g - \x{100} - \x{200} - \x{300} - \x{37e} - -/[^[:ascii:]\W]/8W - a - 9 - g - \x{100} - \x{200} - \x{300} - \x{37e} - -/[[:^ascii:]a]/8W - a - 9 - g - \x{100} - \x{200} - \x{37e} - -/[^[:^ascii:]\d]/8W - a - ~ - 0 - \a - \x{7f} - \x{389} - \x{20ac} - -/(?=.*b)\pL/ - 11bb - -/(?(?=.*b)(?=.*b)\pL|.*c)/ - 11bb - -/-- End of testinput6 --/ diff --git a/src/pcre/testdata/testinput7 b/src/pcre/testdata/testinput7 deleted file mode 100644 index 
f44a810f..00000000 --- a/src/pcre/testdata/testinput7 +++ /dev/null @@ -1,851 +0,0 @@ -/-- These tests for Unicode property support test PCRE's API and show some of - the compiled code. They are not Perl-compatible. --/ - -/[\p{L}]/DZ - -/[\p{^L}]/DZ - -/[\P{L}]/DZ - -/[\P{^L}]/DZ - -/[abc\p{L}\x{0660}]/8DZ - -/[\p{Nd}]/8DZ - 1234 - -/[\p{Nd}+-]+/8DZ - 1234 - 12-34 - 12+\x{661}-34 - ** Failers - abcd - -/[\x{105}-\x{109}]/8iDZ - \x{104} - \x{105} - \x{109} - ** Failers - \x{100} - \x{10a} - -/[z-\x{100}]/8iDZ - Z - z - \x{39c} - \x{178} - | - \x{80} - \x{ff} - \x{100} - \x{101} - ** Failers - \x{102} - Y - y - -/[z-\x{100}]/8DZi - -/(?:[\PPa*]*){8,}/ - -/[\P{Any}]/BZ - -/[\P{Any}\E]/BZ - -/(\P{Yi}+\277)/ - -/(\P{Yi}+\277)?/ - -/(?<=\P{Yi}{3}A)X/ - -/\p{Yi}+(\P{Yi}+)(?1)/ - -/(\P{Yi}{2}\277)?/ - -/[\P{Yi}A]/ - -/[\P{Yi}\P{Yi}\P{Yi}A]/ - -/[^\P{Yi}A]/ - -/[^\P{Yi}\P{Yi}\P{Yi}A]/ - -/(\P{Yi}*\277)*/ - -/(\P{Yi}*?\277)*/ - -/(\p{Yi}*+\277)*/ - -/(\P{Yi}?\277)*/ - -/(\P{Yi}??\277)*/ - -/(\p{Yi}?+\277)*/ - -/(\P{Yi}{0,3}\277)*/ - -/(\P{Yi}{0,3}?\277)*/ - -/(\p{Yi}{0,3}+\277)*/ - -/\p{Zl}{2,3}+/8BZ - 

 - \x{2028}\x{2028}\x{2028} - -/\p{Zl}/8BZ - -/\p{Lu}{3}+/8BZ - -/\pL{2}+/8BZ - -/\p{Cc}{2}+/8BZ - -/^\p{Cf}/8 - \x{180e} - \x{061c} - \x{2066} - \x{2067} - \x{2068} - \x{2069} - -/^\p{Cs}/8 - \?\x{dfff} - ** Failers - \x{09f} - -/^\p{Mn}/8 - \x{1a1b} - -/^\p{Pe}/8 - \x{2309} - \x{230b} - -/^\p{Ps}/8 - \x{2308} - \x{230a} - -/^\p{Sc}+/8 - $\x{a2}\x{a3}\x{a4}\x{a5}\x{a6} - \x{9f2} - ** Failers - X - \x{2c2} - -/^\p{Zs}/8 - \ \ - \x{a0} - \x{1680} - \x{2000} - \x{2001} - ** Failers - \x{2028} - \x{200d} - -/-- These are here rather than in test 6 because Perl has problems with - the negative versions of the properties and behaves has changed how - it behaves for caseless matching. --/ - -/\p{^Lu}/8i - 1234 - ** Failers - ABC - -/\P{Lu}/8i - 1234 - ** Failers - ABC - -/\p{Ll}/8i - a - Az - ** Failers - ABC - -/\p{Lu}/8i - A - a\x{10a0}B - ** Failers - a - \x{1d00} - -/\p{Lu}/8i - A - aZ - ** Failers - abc - -/[\x{c0}\x{391}]/8i - \x{c0} - \x{e0} - -/-- The next two are special cases where the lengths of the different cases of -the same character differ. The first went wrong with heap frame storage; the -second was broken in all cases. --/ - -/^\x{023a}+?(\x{0130}+)/8i - \x{023a}\x{2c65}\x{0130} - -/^\x{023a}+([^X])/8i - \x{023a}\x{2c65}X - -/\x{c0}+\x{116}+/8i - \x{c0}\x{e0}\x{116}\x{117} - -/[\x{c0}\x{116}]+/8i - \x{c0}\x{e0}\x{116}\x{117} - -/(\x{de})\1/8i - \x{de}\x{de} - \x{de}\x{fe} - \x{fe}\x{fe} - \x{fe}\x{de} - -/^\x{c0}$/8i - \x{c0} - \x{e0} - -/^\x{e0}$/8i - \x{c0} - \x{e0} - -/-- The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE -will match it only with UCP support, because without that it has no notion -of case for anything other than the ASCII letters. --/ - -/((?i)[\x{c0}])/8 - \x{c0} - \x{e0} - -/(?i:[\x{c0}])/8 - \x{c0} - \x{e0} - -/-- These are PCRE's extra properties to help with Unicodizing \d etc. 
--/ - -/^\p{Xan}/8 - ABCD - 1234 - \x{6ca} - \x{a6c} - \x{10a7} - ** Failers - _ABC - -/^\p{Xan}+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - ** Failers - _ABC - -/^\p{Xan}+?/8 - \x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xan}*/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xan}{2,9}/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xan}{2,9}?/8 - \x{6ca}\x{a6c}\x{10a7}_ - -/^[\p{Xan}]/8 - ABCD1234_ - 1234abcd_ - \x{6ca} - \x{a6c} - \x{10a7} - ** Failers - _ABC - -/^[\p{Xan}]+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - ** Failers - _ABC - -/^>\p{Xsp}/8 - >\x{1680}\x{2028}\x{0b} - >\x{a0} - ** Failers - \x{0b} - -/^>\p{Xsp}+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}+?/8 - >\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}*/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}{2,9}/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}{2,9}?/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>[\p{Xsp}]/8 - >\x{2028}\x{0b} - -/^>[\p{Xsp}]+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}/8 - >\x{1680}\x{2028}\x{0b} - >\x{a0} - ** Failers - \x{0b} - -/^>\p{Xps}+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}+?/8 - >\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}*/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}?/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>[\p{Xps}]/8 - >\x{2028}\x{0b} - -/^>[\p{Xps}]+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^\p{Xwd}/8 - ABCD - 1234 - \x{6ca} - \x{a6c} - \x{10a7} - _ABC - ** Failers - [] - -/^\p{Xwd}+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}+?/8 - \x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}*/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}{2,9}/8 - A_B12\x{6ca}\x{a6c}\x{10a7} - -/^\p{Xwd}{2,9}?/8 - \x{6ca}\x{a6c}\x{10a7}_ - -/^[\p{Xwd}]/8 - ABCD1234_ - 1234abcd_ - \x{6ca} - \x{a6c} - \x{10a7} - _ABC - ** Failers 
- [] - -/^[\p{Xwd}]+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/-- A check not in UTF-8 mode --/ - -/^[\p{Xwd}]+/ - ABCD1234_ - -/-- Some negative checks --/ - -/^[\P{Xwd}]+/8 - !.+\x{019}\x{35a}AB - -/^[\p{^Xwd}]+/8 - !.+\x{019}\x{35a}AB - -/[\D]/WBZ8 - 1\x{3c8}2 - -/[\d]/WBZ8 - >\x{6f4}< - -/[\S]/WBZ8 - \x{1680}\x{6f4}\x{1680} - -/[\s]/WBZ8 - >\x{1680}< - -/[\W]/WBZ8 - A\x{1712}B - -/[\w]/WBZ8 - >\x{1723}< - -/\D/WBZ8 - 1\x{3c8}2 - -/\d/WBZ8 - >\x{6f4}< - -/\S/WBZ8 - \x{1680}\x{6f4}\x{1680} - -/\s/WBZ8 - >\x{1680}> - -/\W/WBZ8 - A\x{1712}B - -/\w/WBZ8 - >\x{1723}< - -/[[:alpha:]]/WBZ - -/[[:lower:]]/WBZ - -/[[:upper:]]/WBZ - -/[[:alnum:]]/WBZ - -/[[:ascii:]]/WBZ - -/[[:cntrl:]]/WBZ - -/[[:digit:]]/WBZ - -/[[:graph:]]/WBZ - -/[[:print:]]/WBZ - -/[[:punct:]]/WBZ - -/[[:space:]]/WBZ - -/[[:word:]]/WBZ - -/[[:xdigit:]]/WBZ - -/-- Unicode properties for \b abd \B --/ - -/\b...\B/8W - abc_ - \x{37e}abc\x{376} - \x{37e}\x{376}\x{371}\x{393}\x{394} - !\x{c0}++\x{c1}\x{c2} - !\x{c0}+++++ - -/-- Without PCRE_UCP, non-ASCII always fail, even if < 256 --/ - -/\b...\B/8 - abc_ - ** Failers - \x{37e}abc\x{376} - \x{37e}\x{376}\x{371}\x{393}\x{394} - !\x{c0}++\x{c1}\x{c2} - !\x{c0}+++++ - -/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties --/ - -/\b...\B/W - abc_ - !\x{c0}++\x{c1}\x{c2} - !\x{c0}+++++ - -/-- Some of these are silly, but they check various combinations --/ - -/[[:^alpha:][:^cntrl:]]+/8WBZ - 123 - abc - -/[[:^cntrl:][:^alpha:]]+/8WBZ - 123 - abc - -/[[:alpha:]]+/8WBZ - abc - -/[[:^alpha:]\S]+/8WBZ - 123 - abc - -/[^\d]+/8WBZ - abc123 - abc\x{123} - \x{660}abc - -/\p{Lu}+9\p{Lu}+B\p{Lu}+b/BZ - -/\p{^Lu}+9\p{^Lu}+B\p{^Lu}+b/BZ - -/\P{Lu}+9\P{Lu}+B\P{Lu}+b/BZ - -/\p{Han}+X\p{Greek}+\x{370}/BZ8 - -/\p{Xan}+!\p{Xan}+A/BZ - -/\p{Xsp}+!\p{Xsp}\t/BZ - -/\p{Xps}+!\p{Xps}\t/BZ - -/\p{Xwd}+!\p{Xwd}_/BZ - -/A+\p{N}A+\dB+\p{N}*B+\d*/WBZ - -/-- These behaved oddly in Perl, so they are kept in this test --/ - -/(\x{23a}\x{23a}\x{23a})?\1/8i - 
\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} - -/(ȺȺȺ)?\1/8i - ȺȺȺⱥⱥ - -/(\x{23a}\x{23a}\x{23a})?\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - -/(ȺȺȺ)?\1/8i - ȺȺȺⱥⱥⱥ - -/(\x{23a}\x{23a}\x{23a})\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} - -/(ȺȺȺ)\1/8i - ȺȺȺⱥⱥ - -/(\x{23a}\x{23a}\x{23a})\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - -/(ȺȺȺ)\1/8i - ȺȺȺⱥⱥⱥ - -/(\x{2c65}\x{2c65})\1/8i - \x{2c65}\x{2c65}\x{23a}\x{23a} - -/(ⱥⱥ)\1/8i - ⱥⱥȺȺ - -/(\x{23a}\x{23a}\x{23a})\1Y/8i - X\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}YZ - -/(\x{2c65}\x{2c65})\1Y/8i - X\x{2c65}\x{2c65}\x{23a}\x{23a}YZ - -/-- --/ - -/-- These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE --/ - -/^[\p{Batak}]/8 - \x{1bc0} - \x{1bff} - ** Failers - \x{1bf4} - -/^[\p{Brahmi}]/8 - \x{11000} - \x{1106f} - ** Failers - \x{1104e} - -/^[\p{Mandaic}]/8 - \x{840} - \x{85e} - ** Failers - \x{85c} - \x{85d} - -/-- --/ - -/(\X*)(.)/s8 - A\x{300} - -/^S(\X*)e(\X*)$/8 - SteÌreÌo - -/^\X/8 - ÌreÌo - -/^a\X41z/ - aX41z - *** Failers - aAz - -/(?<=ab\Cde)X/8 - -/\X/ - a\P - a\P\P - -/\Xa/ - aa\P - aa\P\P - -/\X{2}/ - aa\P - aa\P\P - -/\X+a/ - a\P - aa\P - aa\P\P - -/\X+?a/ - a\P - ab\P - aa\P - aa\P\P - aba\P - -/-- These Unicode 6.1.0 scripts are not known to Perl. 
--/ - -/\p{Chakma}\d/8W - \x{11100}\x{1113c} - -/\p{Takri}\d/8W - \x{11680}\x{116c0} - -/^\X/8 - A\P - A\P\P - A\x{300}\x{301}\P - A\x{300}\x{301}\P\P - A\x{301}\P - A\x{301}\P\P - -/^\X{2,3}/8 - A\P - A\P\P - AA\P - AA\P\P - A\x{300}\x{301}\P - A\x{300}\x{301}\P\P - A\x{300}\x{301}A\x{300}\x{301}\P - A\x{300}\x{301}A\x{300}\x{301}\P\P - -/^\X{2}/8 - AA\P - AA\P\P - A\x{300}\x{301}A\x{300}\x{301}\P - A\x{300}\x{301}A\x{300}\x{301}\P\P - -/^\X+/8 - AA\P - AA\P\P - -/^\X+?Z/8 - AA\P - AA\P\P - -/A\x{3a3}B/8iDZ - -/\x{3a3}B/8iDZ - -/[\x{3a3}]/8iBZ - -/[^\x{3a3}]/8iBZ - -/[\x{3a3}]+/8iBZ - -/[^\x{3a3}]+/8iBZ - -/a*\x{3a3}/8iBZ - -/\x{3a3}+a/8iBZ - -/\x{3a3}*\x{3c2}/8iBZ - -/\x{3a3}{3}/8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}{2,4}/8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}{2,4}?/8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}+./8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}++./8i+ - ** Failers - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}*\x{3c2}/8iBZ - -/[^\x{3a3}]*\x{3c2}/8iBZ - -/[^a]*\x{3c2}/8iBZ - -/ist/8iBZ - ikt - -/is+t/8i - iSs\x{17f}t - ikt - -/is+?t/8i - ikt - -/is?t/8i - ikt - -/is{2}t/8i - iskt - -/-- This property is a PCRE special --/ - -/^\p{Xuc}/8 - $abc - @abc - `abc - \x{1234}abc - ** Failers - abc - -/^\p{Xuc}+/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}+?/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}+?\*/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}++/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}{3,5}/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\p{Xuc}{3,5}?/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^[\p{Xuc}]/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^[\p{Xuc}]+/8 - $@`\x{a0}\x{1234}\x{e000}** - ** Failers - \x{9f} - -/^\P{Xuc}/8 - abc - ** Failers - $abc - @abc - `abc - \x{1234}abc - -/^[\P{Xuc}]/8 - abc - ** Failers - $abc - @abc - `abc - 
\x{1234}abc - -/-- Some auto-possessification tests --/ - -/\pN+\z/BZ - -/\PN+\z/BZ - -/\pN+/BZ - -/\PN+/BZ - -/\p{Any}+\p{Any} \p{Any}+\P{Any} \p{Any}+\p{L&} \p{Any}+\p{L} \p{Any}+\p{Lu} \p{Any}+\p{Han} \p{Any}+\p{Xan} \p{Any}+\p{Xsp} \p{Any}+\p{Xps} \p{Xwd}+\p{Any} \p{Any}+\p{Xuc}/BWZx - -/\p{L&}+\p{Any} \p{L&}+\p{L&} \P{L&}+\p{L&} \p{L&}+\p{L} \p{L&}+\p{Lu} \p{L&}+\p{Han} \p{L&}+\p{Xan} \p{L&}+\P{Xan} \p{L&}+\p{Xsp} \p{L&}+\p{Xps} \p{Xwd}+\p{L&} \p{L&}+\p{Xuc}/BWZx - -/\p{N}+\p{Any} \p{N}+\p{L&} \p{N}+\p{L} \p{N}+\P{L} \p{N}+\P{N} \p{N}+\p{Lu} \p{N}+\p{Han} \p{N}+\p{Xan} \p{N}+\p{Xsp} \p{N}+\p{Xps} \p{Xwd}+\p{N} \p{N}+\p{Xuc}/BWZx - -/\p{Lu}+\p{Any} \p{Lu}+\p{L&} \p{Lu}+\p{L} \p{Lu}+\p{Lu} \P{Lu}+\p{Lu} \p{Lu}+\p{Nd} \p{Lu}+\P{Nd} \p{Lu}+\p{Han} \p{Lu}+\p{Xan} \p{Lu}+\p{Xsp} \p{Lu}+\p{Xps} \p{Xwd}+\p{Lu} \p{Lu}+\p{Xuc}/BWZx - -/\p{Han}+\p{Lu} \p{Han}+\p{L&} \p{Han}+\p{L} \p{Han}+\p{Lu} \p{Han}+\p{Arabic} \p{Arabic}+\p{Arabic} \p{Han}+\p{Xan} \p{Han}+\p{Xsp} \p{Han}+\p{Xps} \p{Xwd}+\p{Han} \p{Han}+\p{Xuc}/BWZx - -/\p{Xan}+\p{Any} \p{Xan}+\p{L&} \P{Xan}+\p{L&} \p{Xan}+\p{L} \p{Xan}+\p{Lu} \p{Xan}+\p{Han} \p{Xan}+\p{Xan} \p{Xan}+\P{Xan} \p{Xan}+\p{Xsp} \p{Xan}+\p{Xps} \p{Xwd}+\p{Xan} \p{Xan}+\p{Xuc}/BWZx - -/\p{Xsp}+\p{Any} \p{Xsp}+\p{L&} \p{Xsp}+\p{L} \p{Xsp}+\p{Lu} \p{Xsp}+\p{Han} \p{Xsp}+\p{Xan} \p{Xsp}+\p{Xsp} \P{Xsp}+\p{Xsp} \p{Xsp}+\p{Xps} \p{Xwd}+\p{Xsp} \p{Xsp}+\p{Xuc}/BWZx - -/\p{Xwd}+\p{Any} \p{Xwd}+\p{L&} \p{Xwd}+\p{L} \p{Xwd}+\p{Lu} \p{Xwd}+\p{Han} \p{Xwd}+\p{Xan} \p{Xwd}+\p{Xsp} \p{Xwd}+\p{Xps} \p{Xwd}+\p{Xwd} \p{Xwd}+\P{Xwd} \p{Xwd}+\p{Xuc}/BWZx - -/\p{Xuc}+\p{Any} \p{Xuc}+\p{L&} \p{Xuc}+\p{L} \p{Xuc}+\p{Lu} \p{Xuc}+\p{Han} \p{Xuc}+\p{Xan} \p{Xuc}+\p{Xsp} \p{Xuc}+\p{Xps} \p{Xwd}+\p{Xuc} \p{Xuc}+\p{Xuc} \p{Xuc}+\P{Xuc}/BWZx - -/\p{N}+\p{Ll} \p{N}+\p{Nd} \p{N}+\P{Nd}/BWZx - -/\p{Xan}+\p{L} \p{Xan}+\p{N} \p{Xan}+\p{C} \p{Xan}+\P{L} \P{Xan}+\p{N} \p{Xan}+\P{C}/BWZx - -/\p{L}+\p{Xan} \p{N}+\p{Xan} \p{C}+\p{Xan} \P{L}+\p{Xan} \p{N}+\p{Xan} \P{C}+\p{Xan} 
\p{L}+\P{Xan}/BWZx - -/\p{Xan}+\p{Lu} \p{Xan}+\p{Nd} \p{Xan}+\p{Cc} \p{Xan}+\P{Ll} \P{Xan}+\p{No} \p{Xan}+\P{Cf}/BWZx - -/\p{Lu}+\p{Xan} \p{Nd}+\p{Xan} \p{Cs}+\p{Xan} \P{Lt}+\p{Xan} \p{Nl}+\p{Xan} \P{Cc}+\p{Xan} \p{Lt}+\P{Xan}/BWZx - -/\w+\p{P} \w+\p{Po} \w+\s \p{Xan}+\s \s+\p{Xan} \s+\w/BWZx - -/\w+\P{P} \W+\p{Po} \w+\S \P{Xan}+\s \s+\P{Xan} \s+\W/BWZx - -/\w+\p{Po} \w+\p{Pc} \W+\p{Po} \W+\p{Pc} \w+\P{Po} \w+\P{Pc}/BWZx - -/\p{Nl}+\p{Xan} \P{Nl}+\p{Xan} \p{Nl}+\P{Xan} \P{Nl}+\P{Xan}/BWZx - -/\p{Xan}+\p{Nl} \P{Xan}+\p{Nl} \p{Xan}+\P{Nl} \P{Xan}+\P{Nl}/BWZx - -/\p{Xan}+\p{Nd} \P{Xan}+\p{Nd} \p{Xan}+\P{Nd} \P{Xan}+\P{Nd}/BWZx - -/-- End auto-possessification tests --/ - -/\w+/8CWBZ - abcd - -/[\p{N}]?+/BZO - -/[\p{L}ab]{2,3}+/BZO - -/\D+\X \d+\X \S+\X \s+\X \W+\X \w+\X \C+\X \R+\X \H+\X \h+\X \V+\X \v+\X a+\X \n+\X .+\X/BZx - -/.+\X/BZxs - -/\X+$/BZxm - -/\X+\D \X+\d \X+\S \X+\s \X+\W \X+\w \X+. \X+\C \X+\R \X+\H \X+\h \X+\V \X+\v \X+\X \X+\Z \X+\z \X+$/BZx - -/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/8WBZ - -/[RST]+/8iWBZ - -/[R-T]+/8iWBZ - -/[Q-U]+/8iWBZ - -/^s?c/mi8I - scat - -/a[[:punct:]b]/WBZ - -/a[[:punct:]b]/8WBZ - -/a[b[:punct:]]/8WBZ - -/L(?#(|++\S/8 - > >X Y - > >\x{100} Y - -/\d/8 - \x{100}3 - -/\s/8 - \x{100} X - -/\D+/8 - 12abcd34 - *** Failers - 1234 - -/\D{2,3}/8 - 12abcd34 - 12ab34 - *** Failers - 1234 - 12a34 - -/\D{2,3}?/8 - 12abcd34 - 12ab34 - *** Failers - 1234 - 12a34 - -/\d+/8 - 12abcd34 - *** Failers - -/\d{2,3}/8 - 12abcd34 - 1234abcd - *** Failers - 1.4 - -/\d{2,3}?/8 - 12abcd34 - 1234abcd - *** Failers - 1.4 - -/\S+/8 - 12abcd34 - *** Failers - \ \ - -/\S{2,3}/8 - 12abcd34 - 1234abcd - *** Failers - \ \ - -/\S{2,3}?/8 - 12abcd34 - 1234abcd - *** Failers - \ \ - -/>\s+ <34 - *** Failers - -/>\s{2,3} \s{2,3}? 
\xff< - -/[\xff]/8 - >\x{ff}< - -/[^\xFF]/ - XYZ - -/[^\xff]/8 - XYZ - \x{123} - -/^[ac]*b/8 - xb - -/^[ac\x{100}]*b/8 - xb - -/^[^x]*b/8i - xb - -/^[^x]*b/8 - xb - -/^\d*b/8 - xb - -/(|a)/g8 - catac - a\x{256}a - -/^\x{85}$/8i - \x{85} - -/^abc./mgx8 - abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK - -/abc.$/mgx8 - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 - -/^a\Rb/8 - a\nb - a\rb - a\r\nb - a\x0bb - a\x0cb - a\x{85}b - a\x{2028}b - a\x{2029}b - ** Failers - a\n\rb - -/^a\R*b/8 - ab - a\nb - a\rb - a\r\nb - a\x0bb - a\x0c\x{2028}\x{2029}b - a\x{85}b - a\n\rb - a\n\r\x{85}\x0cb - -/^a\R+b/8 - a\nb - a\rb - a\r\nb - a\x0bb - a\x0c\x{2028}\x{2029}b - a\x{85}b - a\n\rb - a\n\r\x{85}\x0cb - ** Failers - ab - -/^a\R{1,3}b/8 - a\nb - a\n\rb - a\n\r\x{85}b - a\r\n\r\nb - a\r\n\r\n\r\nb - a\n\r\n\rb - a\n\n\r\nb - ** Failers - a\n\n\n\rb - a\r - -/\h+\V?\v{3,4}/8O - \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - -/\V?\v{3,4}/8O - \x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - -/\h+\V?\v{3,4}/8O - >\x09\x20\x{a0}X\x0a\x0a\x0a< - -/\V?\v{3,4}/8O - >\x09\x20\x{a0}X\x0a\x0a\x0a< - -/\H\h\V\v/8 - X X\x0a - X\x09X\x0b - ** Failers - \x{a0} X\x0a - -/\H*\h+\V?\v{3,4}/8O - \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a - \x09\x20\x{a0}\x0a\x0b\x0c - ** Failers - \x09\x20\x{a0}\x0a\x0b - -/\H\h\V\v/8 - \x{3001}\x{3000}\x{2030}\x{2028} - X\x{180e}X\x{85} - ** Failers - \x{2009} X\x0a - -/\H*\h+\V?\v{3,4}/8O - \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a - \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a - \x09\x20\x{202f}\x0a\x0b\x0c - ** Failers - \x09\x{200a}\x{a0}\x{2028}\x0b - -/a\Rb/I8 - a\rb - a\nb - a\r\nb - ** Failers - a\x{85}b - a\x0bb - -/a\Rb/I8 - a\rb - a\nb - a\r\nb - a\x{85}b - a\x0bb - ** Failers - a\x{85}b\ - a\x0bb\ - -/a\R?b/I8 - a\rb - a\nb - a\r\nb - ** Failers - a\x{85}b - a\x0bb - -/a\R?b/I8 - a\rb - a\nb - a\r\nb - a\x{85}b - a\x0bb - ** 
Failers - a\x{85}b\ - a\x0bb\ - -/X/8f - A\x{1ec5}ABCXYZ - -/abcd*/8 - xxxxabcd\P - xxxxabcd\P\P - -/abcd*/i8 - xxxxabcd\P - xxxxabcd\P\P - XXXXABCD\P - XXXXABCD\P\P - -/abc\d*/8 - xxxxabc1\P - xxxxabc1\P\P - -/abc[de]*/8 - xxxxabcde\P - xxxxabcde\P\P - -/\bthe cat\b/8 - the cat\P - the cat\P\P - -/ab\Cde/8 - abXde - -/(?<=ab\Cde)X/8 - -/./8 - \r\P - \r\P\P - -/.{2,3}/8 - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -/.{2,3}?/8 - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -/[^\x{100}]/8 - \x{100}\x{101}X - -/[^\x{100}]+/8 - \x{100}\x{101}X - -/-- End of testinput9 --/ diff --git a/src/pcre/testdata/testinputEBC b/src/pcre/testdata/testinputEBC deleted file mode 100644 index 378755d3..00000000 --- a/src/pcre/testdata/testinputEBC +++ /dev/null @@ -1,124 +0,0 @@ -/-- This is a specialized test for checking, when PCRE is compiled with the -EBCDIC option but in an ASCII environment, that newline and white space -functionality is working. It catches cases where explicit values such as 0x0a -have been used instead of names like CHAR_LF. Needless to say, it is not a -genuine EBCDIC test! In patterns, alphabetic characters that follow a backslash -must be in EBCDIC code. In data, newlines and other spacing characters must be -in EBCDIC, but can be specified as escapes. 
--/ - -/-- Test default newline and variations --/ - -/^A/m - ABC - 12\x15ABC - -/^A/m - 12\x15ABC - 12\x0dABC - 12\x0d\x15ABC - 12\x25ABC - -/^A/m - 12\x15ABC - 12\x0dABC - 12\x0d\x15ABC - ** Fail - 12\x25ABC - -/-- Test \h --/ - -/^A\ˆ/ - A B - A\x41B - -/-- Test \H --/ - -/^A\È/ - AB - A\x42B - ** Fail - A B - A\x41B - -/-- Test \R --/ - -/^A\Ù/ - A\x15B - A\x0dB - A\x25B - A\x0bB - A\x0cB - ** Fail - A B - -/-- Test \v --/ - -/^A\¥/ - A\x15B - A\x0dB - A\x25B - A\x0bB - A\x0cB - ** Fail - A B - -/-- Test \V --/ - -/^A\å/ - A B - ** Fail - A\x15B - A\x0dB - A\x25B - A\x0bB - A\x0cB - -/-- For repeated items, use an atomic group so that the output is the same -for DFA matching (otherwise it may show multiple matches). --/ - -/-- Test \h+ --/ - -/^A(?>\ˆ+)/ - A B - -/-- Test \H+ --/ - -/^A(?>\È+)/ - AB - ** Fail - A B - -/-- Test \R+ --/ - -/^A(?>\Ù+)/ - A\x15B - A\x0dB - A\x25B - A\x0bB - A\x0cB - ** Fail - A B - -/-- Test \v+ --/ - -/^A(?>\¥+)/ - A\x15B - A\x0dB - A\x25B - A\x0bB - A\x0cB - ** Fail - A B - -/-- Test \V+ --/ - -/^A(?>\å+)/ - A B - ** Fail - A\x15B - A\x0dB - A\x25B - A\x0bB - A\x0cB - -/-- End --/ diff --git a/src/pcre/testdata/testoutput10 b/src/pcre/testdata/testoutput10 deleted file mode 100644 index b89169cd..00000000 --- a/src/pcre/testdata/testoutput10 +++ /dev/null @@ -1,2547 +0,0 @@ -/-- This set of tests check Unicode property support with the DFA matching - functionality of pcre_dfa_exec(). The -dfa flag must be used with pcretest - when running it. 
--/ - -/\pL\P{Nd}/8 - AB - 0: AB - *** Failers - 0: Fa - A0 -No match - 00 -No match - -/\X./8 - AB - 0: AB - A\x{300}BC - 0: A\x{300}B - A\x{300}\x{301}\x{302}BC - 0: A\x{300}\x{301}\x{302}B - *** Failers - 0: ** - \x{300} -No match - -/\X\X/8 - ABC - 0: AB - A\x{300}B\x{300}\x{301}C - 0: A\x{300}B\x{300}\x{301} - A\x{300}\x{301}\x{302}BC - 0: A\x{300}\x{301}\x{302}B - *** Failers - 0: ** - \x{300} -No match - -/^\pL+/8 - abcd - 0: abcd - a - 0: a - *** Failers -No match - -/^\PL+/8 - 1234 - 0: 1234 - = - 0: = - *** Failers - 0: *** - abcd -No match - -/^\X+/8 - abcdA\x{300}\x{301}\x{302} - 0: abcdA\x{300}\x{301}\x{302} - A\x{300}\x{301}\x{302} - 0: A\x{300}\x{301}\x{302} - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302} - 0: A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302} - a - 0: a - *** Failers - 0: *** Failers - \x{300}\x{301}\x{302} - 0: \x{300}\x{301}\x{302} - -/\X?abc/8 - abc - 0: abc - A\x{300}abc - 0: A\x{300}abc - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - 0: A\x{300}abc - \x{300}abc - 0: \x{300}abc - *** Failers -No match - -/^\X?abc/8 - abc - 0: abc - A\x{300}abc - 0: A\x{300}abc - *** Failers -No match - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz -No match - \x{300}abc - 0: \x{300}abc - -/\X*abc/8 - abc - 0: abc - A\x{300}abc - 0: A\x{300}abc - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - 0: A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abc - \x{300}abc - 0: \x{300}abc - *** Failers -No match - -/^\X*abc/8 - abc - 0: abc - A\x{300}abc - 0: A\x{300}abc - A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz - 0: A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abc - *** Failers -No match - \x{300}abc - 0: \x{300}abc - -/^\pL?=./8 - A=b - 0: A=b - =c - 0: =c - *** Failers -No match - 1=2 -No match - AAAA=b -No match - -/^\pL*=./8 - AAAA=b - 0: AAAA=b - =c - 0: =c - *** Failers -No match - 1=2 -No match - -/^\X{2,3}X/8 - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - 0: A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - 
A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - 0: A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X - *** Failers -No match - X -No match - A\x{300}\x{301}\x{302}X -No match - A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X -No match - -/^\pC\pL\pM\pN\pP\pS\pZ\p{Xsp}/8 - >\x{1680}\x{2028}\x{0b} - 0: >\x{1680} - ** Failers -No match - \x{0b} -No match - -/^>\p{Xsp}+/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} - 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} - 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} - 4: > \x{09}\x{0a}\x{0c}\x{0d} - 5: > \x{09}\x{0a}\x{0c} - 6: > \x{09}\x{0a} - 7: > \x{09} - 8: > - -/^>\p{Xsp}*/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} - 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} - 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} - 4: > \x{09}\x{0a}\x{0c}\x{0d} - 5: > \x{09}\x{0a}\x{0c} - 6: > \x{09}\x{0a} - 7: > \x{09} - 8: > - 9: > - -/^>\p{Xsp}{2,9}/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} - 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} - 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} - 4: > \x{09}\x{0a}\x{0c}\x{0d} - 5: > \x{09}\x{0a}\x{0c} - 6: > \x{09}\x{0a} - 7: > \x{09} - -/^>[\p{Xsp}]/8O - >\x{2028}\x{0b} - 0: >\x{2028} - -/^>[\p{Xsp}]+/8O - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} - 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} - 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} - 4: > \x{09}\x{0a}\x{0c}\x{0d} - 5: > \x{09}\x{0a}\x{0c} - 6: > \x{09}\x{0a} - 7: > \x{09} - 8: > - -/^>\p{Xps}/8 - 
>\x{1680}\x{2028}\x{0b} - 0: >\x{1680} - >\x{a0} - 0: >\x{a0} - ** Failers -No match - \x{0b} -No match - -/^>\p{Xps}+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}+?/8 - >\x{1680}\x{2028}\x{0b} - 0: >\x{1680}\x{2028}\x{0b} - 1: >\x{1680}\x{2028} - 2: >\x{1680} - -/^>\p{Xps}*/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}?/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} - 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} - 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} - 4: > \x{09}\x{0a}\x{0c}\x{0d} - 5: > \x{09}\x{0a}\x{0c} - 6: > \x{09}\x{0a} - 7: > \x{09} - -/^>[\p{Xps}]/8 - >\x{2028}\x{0b} - 0: >\x{2028} - -/^>[\p{Xps}]+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^\p{Xwd}/8 - ABCD - 0: A - 1234 - 0: 1 - \x{6ca} - 0: \x{6ca} - \x{a6c} - 0: \x{a6c} - \x{10a7} - 0: \x{10a7} - _ABC - 0: _ - ** Failers -No match - [] -No match - -/^\p{Xwd}+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}*/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}{2,9}/8 - A_12\x{6ca}\x{a6c}\x{10a7} - 0: A_12\x{6ca}\x{a6c}\x{10a7} - -/^[\p{Xwd}]/8 - ABCD1234_ - 0: A - 1234abcd_ - 0: 1 - \x{6ca} - 0: \x{6ca} - \x{a6c} - 0: \x{a6c} - \x{10a7} - 0: \x{10a7} - _ABC - 0: _ - ** Failers -No match - [] -No match - -/^[\p{Xwd}]+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/-- Unicode properties for \b abd \B --/ - -/\b...\B/8W - abc_ - 0: abc - \x{37e}abc\x{376} - 0: abc - 
\x{37e}\x{376}\x{371}\x{393}\x{394} - 0: \x{376}\x{371}\x{393} - !\x{c0}++\x{c1}\x{c2} - 0: ++\x{c1} - !\x{c0}+++++ - 0: \x{c0}++ - -/-- Without PCRE_UCP, non-ASCII always fail, even if < 256 --/ - -/\b...\B/8 - abc_ - 0: abc - ** Failers - 0: Fai - \x{37e}abc\x{376} -No match - \x{37e}\x{376}\x{371}\x{393}\x{394} -No match - !\x{c0}++\x{c1}\x{c2} -No match - !\x{c0}+++++ -No match - -/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties --/ - -/\b...\B/W - abc_ - 0: abc - !\x{c0}++\x{c1}\x{c2} - 0: ++\xc1 - !\x{c0}+++++ - 0: \xc0++ - -/-- Caseless single negated characters > 127 need UCP support --/ - -/[^\x{100}]/8i - \x{100}\x{101}X - 0: X - -/[^\x{100}]+/8i - \x{100}\x{101}XX - 0: XX - -/^\X/8 - A\P - 0: A - A\P\P -Partial match: A - A\x{300}\x{301}\P - 0: A\x{300}\x{301} - A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301} - A\x{301}\P - 0: A\x{301} - A\x{301}\P\P -Partial match: A\x{301} - -/^\X{2,3}/8 - A\P -Partial match: A - A\P\P -Partial match: A - AA\P - 0: AA - AA\P\P -Partial match: AA - A\x{300}\x{301}\P -Partial match: A\x{300}\x{301} - A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301} - A\x{300}\x{301}A\x{300}\x{301}\P - 0: A\x{300}\x{301}A\x{300}\x{301} - A\x{300}\x{301}A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301}A\x{300}\x{301} - -/^\X{2}/8 - AA\P - 0: AA - AA\P\P -Partial match: AA - A\x{300}\x{301}A\x{300}\x{301}\P - 0: A\x{300}\x{301}A\x{300}\x{301} - A\x{300}\x{301}A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301}A\x{300}\x{301} - -/^\X+/8 - AA\P - 0: AA - AA\P\P -Partial match: AA - -/^\X+?Z/8 - AA\P -Partial match: AA - AA\P\P -Partial match: AA - -/-- These are tests for extended grapheme clusters --/ - -/^\X/8+ - G\x{34e}\x{34e}X - 0: G\x{34e}\x{34e} - 0+ X - \x{34e}\x{34e}X - 0: \x{34e}\x{34e} - 0+ X - \x04X - 0: \x{04} - 0+ X - \x{1100}X - 0: \x{1100} - 0+ X - \x{1100}\x{34e}X - 0: \x{1100}\x{34e} - 0+ X - \x{1b04}\x{1b04}X - 0: \x{1b04}\x{1b04} - 0+ X - *These match up to the roman letters - 0: * - 0+ 
These match up to the roman letters - \x{1111}\x{1111}L,L - 0: \x{1111}\x{1111} - 0+ L,L - \x{1111}\x{1111}\x{1169}L,L,V - 0: \x{1111}\x{1111}\x{1169} - 0+ L,L,V - \x{1111}\x{ae4c}L, LV - 0: \x{1111}\x{ae4c} - 0+ L, LV - \x{1111}\x{ad89}L, LVT - 0: \x{1111}\x{ad89} - 0+ L, LVT - \x{1111}\x{ae4c}\x{1169}L, LV, V - 0: \x{1111}\x{ae4c}\x{1169} - 0+ L, LV, V - \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V - 0: \x{1111}\x{ae4c}\x{1169}\x{1169} - 0+ L, LV, V, V - \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T - 0: \x{1111}\x{ae4c}\x{1169}\x{11fe} - 0+ L, LV, V, T - \x{1111}\x{ad89}\x{11fe}L, LVT, T - 0: \x{1111}\x{ad89}\x{11fe} - 0+ L, LVT, T - \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T - 0: \x{1111}\x{ad89}\x{11fe}\x{11fe} - 0+ L, LVT, T, T - \x{ad89}\x{11fe}\x{11fe}LVT, T, T - 0: \x{ad89}\x{11fe}\x{11fe} - 0+ LVT, T, T - *These match just the first codepoint (invalid sequence) - 0: * - 0+ These match just the first codepoint (invalid sequence) - \x{1111}\x{11fe}L, T - 0: \x{1111} - 0+ \x{11fe}L, T - \x{ae4c}\x{1111}LV, L - 0: \x{ae4c} - 0+ \x{1111}LV, L - \x{ae4c}\x{ae4c}LV, LV - 0: \x{ae4c} - 0+ \x{ae4c}LV, LV - \x{ae4c}\x{ad89}LV, LVT - 0: \x{ae4c} - 0+ \x{ad89}LV, LVT - \x{1169}\x{1111}V, L - 0: \x{1169} - 0+ \x{1111}V, L - \x{1169}\x{ae4c}V, LV - 0: \x{1169} - 0+ \x{ae4c}V, LV - \x{1169}\x{ad89}V, LVT - 0: \x{1169} - 0+ \x{ad89}V, LVT - \x{ad89}\x{1111}LVT, L - 0: \x{ad89} - 0+ \x{1111}LVT, L - \x{ad89}\x{1169}LVT, V - 0: \x{ad89} - 0+ \x{1169}LVT, V - \x{ad89}\x{ae4c}LVT, LV - 0: \x{ad89} - 0+ \x{ae4c}LVT, LV - \x{ad89}\x{ad89}LVT, LVT - 0: \x{ad89} - 0+ \x{ad89}LVT, LVT - \x{11fe}\x{1111}T, L - 0: \x{11fe} - 0+ \x{1111}T, L - \x{11fe}\x{1169}T, V - 0: \x{11fe} - 0+ \x{1169}T, V - \x{11fe}\x{ae4c}T, LV - 0: \x{11fe} - 0+ \x{ae4c}T, LV - \x{11fe}\x{ad89}T, LVT - 0: \x{11fe} - 0+ \x{ad89}T, LVT - *Test extend and spacing mark - 0: * - 0+ Test extend and spacing mark - \x{1111}\x{ae4c}\x{0711}L, LV, extend - 0: \x{1111}\x{ae4c}\x{711} - 0+ L, LV, extend - 
\x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark - 0: \x{1111}\x{ae4c}\x{1b04} - 0+ L, LV, spacing mark - \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark - 0: \x{1111}\x{ae4c}\x{1b04}\x{711}\x{1b04} - 0+ L, LV, spacing mark, extend, spacing mark - *Test CR, LF, and control - 0: * - 0+ Test CR, LF, and control - \x0d\x{0711}CR, extend - 0: \x{0d} - 0+ \x{711}CR, extend - \x0d\x{1b04}CR, spacingmark - 0: \x{0d} - 0+ \x{1b04}CR, spacingmark - \x0a\x{0711}LF, extend - 0: \x{0a} - 0+ \x{711}LF, extend - \x0a\x{1b04}LF, spacingmark - 0: \x{0a} - 0+ \x{1b04}LF, spacingmark - \x0b\x{0711}Control, extend - 0: \x{0b} - 0+ \x{711}Control, extend - \x09\x{1b04}Control, spacingmark - 0: \x{09} - 0+ \x{1b04}Control, spacingmark - *There are no Prepend characters, so we can't test Prepend, CR - 0: * - 0+ There are no Prepend characters, so we can't test Prepend, CR - -/^(?>\X{2})X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - -/^\X{2,4}X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - -/^\X{2,4}?X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - -/-- --/ - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/[z\x{1e9e}]+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/[z\x{00df}]+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - 
-/[z\x{1f88}]+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - -/-- Perl matches these --/ - -/\x{00b5}+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/\x{039c}+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/\x{03bc}+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - - -/\x{00c5}+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/\x{00e5}+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/\x{212b}+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - - -/\x{01c4}+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/\x{01c5}+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/\x{01c6}+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - - -/\x{01c7}+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/\x{01c8}+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/\x{01c9}+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - - -/\x{01ca}+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/\x{01cb}+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/\x{01cc}+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - - -/\x{01f1}+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/\x{01f2}+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/\x{01f3}+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - - -/\x{0345}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/\x{0399}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/\x{03b9}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/\x{1fbe}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - - -/\x{0392}+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/\x{03b2}+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/\x{03d0}+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - - -/\x{0395}+/8i - \x{0395}\x{03b5}\x{03f5} - 0: 
\x{395}\x{3b5}\x{3f5} - -/\x{03b5}+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - -/\x{03f5}+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - - -/\x{0398}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/\x{03b8}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/\x{03d1}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/\x{03f4}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - - -/\x{039a}+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/\x{03ba}+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/\x{03f0}+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - - -/\x{03a0}+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/\x{03c0}+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/\x{03d6}+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - - -/\x{03a1}+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/\x{03c1}+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/\x{03f1}+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - - -/\x{03a3}+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - -/\x{03c2}+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - -/\x{03c3}+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - - -/\x{03a6}+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/\x{03c6}+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/\x{03d5}+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - - -/\x{03c9}+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/\x{03a9}+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/\x{2126}+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - - -/\x{1e60}+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e61}+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e9b}+/8i - \x{1e60}\x{1e61}\x{1e9b} - 
0: \x{1e60}\x{1e61}\x{1e9b} - - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - -/\x{1f80}+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - -/\x{004b}+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/\x{006b}+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/\x{212a}+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - - -/\x{0053}+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/\x{0073}+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/\x{017f}+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/ist/8i - ikt -No match - -/is+t/8i - iSs\x{17f}t - 0: iSs\x{17f}t - ikt -No match - -/is+?t/8i - ikt -No match - -/is?t/8i - ikt -No match - -/is{2}t/8i - iskt -No match - -/^\p{Xuc}/8 - $abc - 0: $ - @abc - 0: @ - `abc - 0: ` - \x{1234}abc - 0: \x{1234} - ** Failers -No match - abc -No match - -/^\p{Xuc}+/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}+?/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - 1: $@`\x{a0}\x{1234} - 2: $@`\x{a0} - 3: $@` - 4: $@ - 5: $ - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}+?\*/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000}* - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}++/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}{3,5}/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234} - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}{3,5}?/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234} - 1: $@`\x{a0} - 2: $@` - ** Failers -No match - \x{9f} -No match - -/^[\p{Xuc}]/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $ - ** Failers -No match - \x{9f} -No match - -/^[\p{Xuc}]+/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - ** Failers -No match - \x{9f} -No match - -/^\P{Xuc}/8 - abc - 0: a - ** Failers - 0: * - $abc -No match - 
@abc -No match - `abc -No match - \x{1234}abc -No match - -/^[\P{Xuc}]/8 - abc - 0: a - ** Failers - 0: * - $abc -No match - @abc -No match - `abc -No match - \x{1234}abc -No match - -/^A\s+Z/8W - A\x{2005}Z - 0: A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - 0: A\x{85}\x{180e}\x{2005}Z - -/^A[\s]+Z/8W - A\x{2005}Z - 0: A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - 0: A\x{85}\x{180e}\x{2005}Z - -/-- End of testinput10 --/ diff --git a/src/pcre/testdata/testoutput13 b/src/pcre/testdata/testoutput13 deleted file mode 100644 index d6fb8a5c..00000000 --- a/src/pcre/testdata/testoutput13 +++ /dev/null @@ -1,22 +0,0 @@ -/-- This test is run only when JIT support is not available. It checks that an -attempt to use it has the expected behaviour. It also tests things that -are different without JIT. --/ - -/abc/S+I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' -Subject length lower bound = 3 -No starting char list -JIT support is not available in this version of PCRE - -/a*/SI -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char -Study returned NULL - -/-- End of testinput13 --/ diff --git a/src/pcre/testdata/testoutput15 b/src/pcre/testdata/testoutput15 deleted file mode 100644 index e4e123c3..00000000 --- a/src/pcre/testdata/testoutput15 +++ /dev/null @@ -1,1144 +0,0 @@ -/-- This set of tests is for UTF-8 support but not Unicode property support, - and is relevant only to the 8-bit library. 
--/ - -< forbid W - -/X(\C{3})/8 - X\x{1234} - 0: X\x{1234} - 1: \x{1234} - -/X(\C{4})/8 - X\x{1234}YZ - 0: X\x{1234}Y - 1: \x{1234}Y - -/X\C*/8 - XYZabcdce - 0: XYZabcdce - -/X\C*?/8 - XYZabcde - 0: X - -/X\C{3,5}/8 - Xabcdefg - 0: Xabcde - X\x{1234} - 0: X\x{1234} - X\x{1234}YZ - 0: X\x{1234}YZ - X\x{1234}\x{512} - 0: X\x{1234}\x{512} - X\x{1234}\x{512}YZ - 0: X\x{1234}\x{512} - -/X\C{3,5}?/8 - Xabcdefg - 0: Xabc - X\x{1234} - 0: X\x{1234} - X\x{1234}YZ - 0: X\x{1234} - X\x{1234}\x{512} - 0: X\x{1234} - -/a\Cb/8 - aXb - 0: aXb - a\nb - 0: a\x{0a}b - -/a\C\Cb/8 - a\x{100}b - 0: a\x{100}b - -/ab\Cde/8 - abXde - 0: abXde - -/a\C\Cb/8 - a\x{100}b - 0: a\x{100}b - ** Failers -No match - a\x{12257}b -No match - -/[Ã]/8 -Failed: invalid UTF-8 string at offset 1 - -/Ã/8 -Failed: invalid UTF-8 string at offset 0 - -/ÃÃÃxxx/8 -Failed: invalid UTF-8 string at offset 0 - -/ÃÃÃxxx/8?DZSSO ------------------------------------------------------------------- - Bra - \X{c0}\X{c0}\X{c0}xxx - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: no_auto_possessify utf no_utf_check -First char = \x{c3} -Need char = 'x' - -/badutf/8 - \xdf -Error -10 (bad UTF-8 string) offset=0 reason=1 - \xef -Error -10 (bad UTF-8 string) offset=0 reason=2 - \xef\x80 -Error -10 (bad UTF-8 string) offset=0 reason=1 - \xf7 -Error -10 (bad UTF-8 string) offset=0 reason=3 - \xf7\x80 -Error -10 (bad UTF-8 string) offset=0 reason=2 - \xf7\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=1 - \xfb -Error -10 (bad UTF-8 string) offset=0 reason=4 - \xfb\x80 -Error -10 (bad UTF-8 string) offset=0 reason=3 - \xfb\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=2 - \xfb\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=1 - \xfd -Error -10 (bad UTF-8 string) offset=0 reason=5 - \xfd\x80 -Error -10 (bad UTF-8 string) offset=0 reason=4 - \xfd\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=3 - \xfd\x80\x80\x80 -Error -10 (bad 
UTF-8 string) offset=0 reason=2 - \xfd\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=1 - \xdf\x7f -Error -10 (bad UTF-8 string) offset=0 reason=6 - \xef\x7f\x80 -Error -10 (bad UTF-8 string) offset=0 reason=6 - \xef\x80\x7f -Error -10 (bad UTF-8 string) offset=0 reason=7 - \xf7\x7f\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=6 - \xf7\x80\x7f\x80 -Error -10 (bad UTF-8 string) offset=0 reason=7 - \xf7\x80\x80\x7f -Error -10 (bad UTF-8 string) offset=0 reason=8 - \xfb\x7f\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=6 - \xfb\x80\x7f\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=7 - \xfb\x80\x80\x7f\x80 -Error -10 (bad UTF-8 string) offset=0 reason=8 - \xfb\x80\x80\x80\x7f -Error -10 (bad UTF-8 string) offset=0 reason=9 - \xfd\x7f\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=6 - \xfd\x80\x7f\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=7 - \xfd\x80\x80\x7f\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=8 - \xfd\x80\x80\x80\x7f\x80 -Error -10 (bad UTF-8 string) offset=0 reason=9 - \xfd\x80\x80\x80\x80\x7f -Error -10 (bad UTF-8 string) offset=0 reason=10 - \xed\xa0\x80 -Error -10 (bad UTF-8 string) offset=0 reason=14 - \xc0\x8f -Error -10 (bad UTF-8 string) offset=0 reason=15 - \xe0\x80\x8f -Error -10 (bad UTF-8 string) offset=0 reason=16 - \xf0\x80\x80\x8f -Error -10 (bad UTF-8 string) offset=0 reason=17 - \xf8\x80\x80\x80\x8f -Error -10 (bad UTF-8 string) offset=0 reason=18 - \xfc\x80\x80\x80\x80\x8f -Error -10 (bad UTF-8 string) offset=0 reason=19 - \x80 -Error -10 (bad UTF-8 string) offset=0 reason=20 - \xfe -Error -10 (bad UTF-8 string) offset=0 reason=21 - \xff -Error -10 (bad UTF-8 string) offset=0 reason=21 - -/badutf/8 - \xfb\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=11 - \xfd\x80\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=12 - \xf7\xbf\xbf\xbf -Error -10 (bad UTF-8 string) offset=0 reason=13 - -/shortutf/8 - \P\P\xdf -Error -25 (short UTF-8 
string) offset=0 reason=1 - \P\P\xef -Error -25 (short UTF-8 string) offset=0 reason=2 - \P\P\xef\x80 -Error -25 (short UTF-8 string) offset=0 reason=1 - \P\P\xf7 -Error -25 (short UTF-8 string) offset=0 reason=3 - \P\P\xf7\x80 -Error -25 (short UTF-8 string) offset=0 reason=2 - \P\P\xf7\x80\x80 -Error -25 (short UTF-8 string) offset=0 reason=1 - \P\P\xfb -Error -25 (short UTF-8 string) offset=0 reason=4 - \P\P\xfb\x80 -Error -25 (short UTF-8 string) offset=0 reason=3 - \P\P\xfb\x80\x80 -Error -25 (short UTF-8 string) offset=0 reason=2 - \P\P\xfb\x80\x80\x80 -Error -25 (short UTF-8 string) offset=0 reason=1 - \P\P\xfd -Error -25 (short UTF-8 string) offset=0 reason=5 - \P\P\xfd\x80 -Error -25 (short UTF-8 string) offset=0 reason=4 - \P\P\xfd\x80\x80 -Error -25 (short UTF-8 string) offset=0 reason=3 - \P\P\xfd\x80\x80\x80 -Error -25 (short UTF-8 string) offset=0 reason=2 - \P\P\xfd\x80\x80\x80\x80 -Error -25 (short UTF-8 string) offset=0 reason=1 - -/anything/8 - \xc0\x80 -Error -10 (bad UTF-8 string) offset=0 reason=15 - \xc1\x8f -Error -10 (bad UTF-8 string) offset=0 reason=15 - \xe0\x9f\x80 -Error -10 (bad UTF-8 string) offset=0 reason=16 - \xf0\x8f\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=17 - \xf8\x87\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=18 - \xfc\x83\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=19 - \xfe\x80\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=21 - \xff\x80\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=21 - \xc3\x8f -No match - \xe0\xaf\x80 -No match - \xe1\x80\x80 -No match - \xf0\x9f\x80\x80 -No match - \xf1\x8f\x80\x80 -No match - \xf8\x88\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=11 - \xf9\x87\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=11 - \xfc\x84\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=12 - \xfd\x83\x80\x80\x80\x80 -Error -10 (bad UTF-8 string) offset=0 reason=12 - \?\xf8\x88\x80\x80\x80 -No match - 
\?\xf9\x87\x80\x80\x80 -No match - \?\xfc\x84\x80\x80\x80\x80 -No match - \?\xfd\x83\x80\x80\x80\x80 -No match - -/\x{100}/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = \x{80} - -/\x{1000}/8DZ ------------------------------------------------------------------- - Bra - \x{1000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{e1} -Need char = \x{80} - -/\x{10000}/8DZ ------------------------------------------------------------------- - Bra - \x{10000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{f0} -Need char = \x{80} - -/\x{100000}/8DZ ------------------------------------------------------------------- - Bra - \x{100000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{f4} -Need char = \x{80} - -/\x{10ffff}/8DZ ------------------------------------------------------------------- - Bra - \x{10ffff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{f4} -Need char = \x{bf} - -/[\x{ff}]/8DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c3} -Need char = \x{bf} - -/[\x{100}]/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = \x{80} 
- -/\x80/8DZ ------------------------------------------------------------------- - Bra - \x{80} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c2} -Need char = \x{80} - -/\xff/8DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c3} -Need char = \x{bf} - -/\x{D55c}\x{ad6d}\x{C5B4}/DZ8 ------------------------------------------------------------------- - Bra - \x{d55c}\x{ad6d}\x{c5b4} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ed} -Need char = \x{b4} - \x{D55c}\x{ad6d}\x{C5B4} - 0: \x{d55c}\x{ad6d}\x{c5b4} - -/\x{65e5}\x{672c}\x{8a9e}/DZ8 ------------------------------------------------------------------- - Bra - \x{65e5}\x{672c}\x{8a9e} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{e6} -Need char = \x{9e} - \x{65e5}\x{672c}\x{8a9e} - 0: \x{65e5}\x{672c}\x{8a9e} - -/\x{80}/DZ8 ------------------------------------------------------------------- - Bra - \x{80} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c2} -Need char = \x{80} - -/\x{084}/DZ8 ------------------------------------------------------------------- - Bra - \x{84} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c2} -Need char = \x{84} - -/\x{104}/DZ8 ------------------------------------------------------------------- - Bra - \x{104} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf 
-First char = \x{c4} -Need char = \x{84} - -/\x{861}/DZ8 ------------------------------------------------------------------- - Bra - \x{861} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{e0} -Need char = \x{a1} - -/\x{212ab}/DZ8 ------------------------------------------------------------------- - Bra - \x{212ab} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{f0} -Need char = \x{ab} - -/-- This one is here not because it's different to Perl, but because the way -the captured single-byte is displayed. (In Perl it becomes a character, and you -can't tell the difference.) --/ - -/X(\C)(.*)/8 - X\x{1234} - 0: X\x{1234} - 1: \x{e1} - 2: \x{88}\x{b4} - X\nabc - 0: X\x{0a}abc - 1: \x{0a} - 2: abc - -/-- This one is here because Perl gives out a grumbly error message (quite -correctly, but that messes up comparisons). --/ - -/a\Cb/8 - *** Failers -No match - a\x{100}b -No match - -/[^ab\xC0-\xF0]/8SDZ ------------------------------------------------------------------- - Bra - [\x00-`c-\xbf\xf1-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a - \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 - \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 - 5 6 7 8 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y - Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f - \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 - \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf - \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee - \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd - \xfe \xff - \x{f1} - 0: \x{f1} - \x{bf} - 0: \x{bf} - \x{100} - 0: \x{100} - \x{1000} - 0: \x{1000} - *** Failers - 0: * - \x{c0} -No match - \x{f0} -No match - -/Ä€{3,4}/8SDZ ------------------------------------------------------------------- - Bra - \x{100}{3} - \x{100}?+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = \x{80} -Subject length lower bound = 3 -No starting char list - \x{100}\x{100}\x{100}\x{100\x{100} - 0: \x{100}\x{100}\x{100} - -/(\x{100}+|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}++ - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: x \xc4 - -/(\x{100}*a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}*+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: a x \xc4 - -/(\x{100}{0,2}a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}{0,2}+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char 
-Subject length lower bound = 1 -Starting chars: a x \xc4 - -/(\x{100}{1,2}a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100} - \x{100}{0,1}+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: x \xc4 - -/\x{100}/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = \x{80} - -/a\x{100}\x{101}*/8DZ ------------------------------------------------------------------- - Bra - a\x{100} - \x{101}*+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = \x{80} - -/a\x{100}\x{101}+/8DZ ------------------------------------------------------------------- - Bra - a\x{100} - \x{101}++ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = \x{81} - -/[^\x{c4}]/DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\x{100}]/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = \x{80} - \x{100} - 0: \x{100} - Z\x{100} - 0: \x{100} - \x{100}Z - 0: \x{100} - *** Failers -No match - -/[\xff]/DZ8 ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c3} -Need char = \x{bf} - >\x{ff}< - 0: \x{ff} - -/[^\xff]/8DZ ------------------------------------------------------------------- - Bra - [^\x{ff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}abc(xyz(?1))/8DZ ------------------------------------------------------------------- - Bra - \x{100}abc - CBra 1 - xyz - Recurse - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -First char = \x{c4} -Need char = 'z' - -/a\x{1234}b/P8 - a\x{1234}b - 0: a\x{1234}b - -/\777/8I -Capturing subpattern count = 0 -Options: utf -First char = \x{c7} -Need char = \x{bf} - \x{1ff} - 0: \x{1ff} - \777 - 0: \x{1ff} - -/\x{100}+\x{200}/8DZ ------------------------------------------------------------------- - Bra - \x{100}++ - \x{200} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = \x{80} - -/\x{100}+X/8DZ ------------------------------------------------------------------- - Bra - \x{100}++ - X - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c4} -Need char = 'X' - -/^[\QÄ€\E-\QÅ\E/BZ8 -Failed: missing terminating ] for character class at offset 15 - -/-- This tests the stricter UTF-8 check according to RFC 3629. --/ - -/X/8 - \x{d800} -Error -10 (bad UTF-8 string) offset=0 reason=14 - \x{d800}\? -No match - \x{da00} -Error -10 (bad UTF-8 string) offset=0 reason=14 - \x{da00}\? -No match - \x{dfff} -Error -10 (bad UTF-8 string) offset=0 reason=14 - \x{dfff}\? -No match - \x{110000} -Error -10 (bad UTF-8 string) offset=0 reason=13 - \x{110000}\? 
-No match - \x{2000000} -Error -10 (bad UTF-8 string) offset=0 reason=11 - \x{2000000}\? -No match - \x{7fffffff} -Error -10 (bad UTF-8 string) offset=0 reason=12 - \x{7fffffff}\? -No match - -/(*UTF8)\x{1234}/ - abcd\x{1234}pqr - 0: \x{1234} - -/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode utf -Forced newline sequence: CRLF -First char = 'a' -Need char = 'b' - -/\h/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x20 \xc2 \xe1 \xe2 \xe3 - ABC\x{09} - 0: \x{09} - ABC\x{20} - 0: - ABC\x{a0} - 0: \x{a0} - ABC\x{1680} - 0: \x{1680} - ABC\x{180e} - 0: \x{180e} - ABC\x{2000} - 0: \x{2000} - ABC\x{202f} - 0: \x{202f} - ABC\x{205f} - 0: \x{205f} - ABC\x{3000} - 0: \x{3000} - -/\v/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2 - ABC\x{0a} - 0: \x{0a} - ABC\x{0b} - 0: \x{0b} - ABC\x{0c} - 0: \x{0c} - ABC\x{0d} - 0: \x{0d} - ABC\x{85} - 0: \x{85} - ABC\x{2028} - 0: \x{2028} - -/\h*A/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'A' -Subject length lower bound = 1 -Starting chars: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 - CDBABC - 0: A - -/\v+A/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'A' -Subject length lower bound = 2 -Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2 - -/\s?xxx\s/8SI -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'x' -Subject length lower bound = 4 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x - -/\sxxx\s/I8ST1 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'x' -Subject length lower bound = 5 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \xc2 - AB\x{85}xxx\x{a0}XYZ - 0: \x{85}xxx\x{a0} - AB\x{a0}xxx\x{85}XYZ - 0: \x{a0}xxx\x{85} - -/\S \S/I8ST1 -Capturing subpattern count = 0 -Options: utf -No first char 
-Need char = ' ' -Subject length lower bound = 3 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f - \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e - \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C - D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h - i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 - \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 - \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 - \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 - \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff - \x{a2} \x{84} - 0: \x{a2} \x{84} - A Z - 0: A Z - -/a+/8 - a\x{123}aa\>1 - 0: aa - a\x{123}aa\>2 -Error -11 (bad UTF-8 offset) - a\x{123}aa\>3 - 0: aa - a\x{123}aa\>4 - 0: a - a\x{123}aa\>5 -No match - a\x{123}aa\>6 -Error -24 (bad offset value) - -/\x{1234}+/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \xe1 - -/\x{1234}+?/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \xe1 - -/\x{1234}++/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \xe1 - -/\x{1234}{2}/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 2 -Starting chars: \xe1 - -/[^\x{c4}]/8DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/X+\x{200}/8DZ ------------------------------------------------------------------- - Bra - X++ - \x{200} - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'X' -Need char = \x{80} - -/\R/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \xc2 \xe2 - -/\777/8DZ ------------------------------------------------------------------- - Bra - \x{1ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{c7} -Need char = \x{bf} - -/\w+\x{C4}/8BZ ------------------------------------------------------------------- - Bra - \w++ - \x{c4} - Ket - End ------------------------------------------------------------------- - a\x{C4}\x{C4} - 0: a\x{c4} - -/\w+\x{C4}/8BZT1 ------------------------------------------------------------------- - Bra - \w+ - \x{c4} - Ket - End ------------------------------------------------------------------- - a\x{C4}\x{C4} - 0: a\x{c4}\x{c4} - -/\W+\x{C4}/8BZ ------------------------------------------------------------------- - Bra - \W+ - \x{c4} - Ket - End ------------------------------------------------------------------- - !\x{C4} - 0: !\x{c4} - -/\W+\x{C4}/8BZT1 ------------------------------------------------------------------- - Bra - \W++ - \x{c4} - Ket - End ------------------------------------------------------------------- - !\x{C4} - 0: !\x{c4} - -/\W+\x{A1}/8BZ ------------------------------------------------------------------- - Bra - \W+ - \x{a1} - Ket - End ------------------------------------------------------------------- - !\x{A1} - 0: !\x{a1} - -/\W+\x{A1}/8BZT1 ------------------------------------------------------------------- - Bra - \W+ - \x{a1} - Ket - End ------------------------------------------------------------------- - !\x{A1} - 0: !\x{a1} - -/X\s+\x{A0}/8BZ ------------------------------------------------------------------- - Bra - X - \s++ - \x{a0} - Ket - End 
------------------------------------------------------------------- - X\x20\x{A0}\x{A0} - 0: X \x{a0} - -/X\s+\x{A0}/8BZT1 ------------------------------------------------------------------- - Bra - X - \s+ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x20\x{A0}\x{A0} - 0: X \x{a0}\x{a0} - -/\S+\x{A0}/8BZ ------------------------------------------------------------------- - Bra - \S+ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x{A0}\x{A0} - 0: X\x{a0}\x{a0} - -/\S+\x{A0}/8BZT1 ------------------------------------------------------------------- - Bra - \S++ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x{A0}\x{A0} - 0: X\x{a0} - -/\x{a0}+\s!/8BZ ------------------------------------------------------------------- - Bra - \x{a0}++ - \s - ! - Ket - End ------------------------------------------------------------------- - \x{a0}\x20! - 0: \x{a0} ! - -/\x{a0}+\s!/8BZT1 ------------------------------------------------------------------- - Bra - \x{a0}+ - \s - ! - Ket - End ------------------------------------------------------------------- - \x{a0}\x20! - 0: \x{a0} ! - -/A/8 - \x{ff000041} -** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8 - \x{7f000041} -Error -10 (bad UTF-8 string) offset=0 reason=12 - -/(*UTF8)abc/9 -Failed: setting UTF is disabled by the application at offset 0 - -/abc/89 -Failed: setting UTF is disabled by the application at offset 0 - -//8+L - \xf1\xad\xae\xae - 0: - 0+ \x{6dbae} - -/-- End of testinput15 --/ diff --git a/src/pcre/testdata/testoutput16 b/src/pcre/testdata/testoutput16 deleted file mode 100644 index e6ba26ac..00000000 --- a/src/pcre/testdata/testoutput16 +++ /dev/null @@ -1,193 +0,0 @@ -/-- This set of tests is run only with the 8-bit library when Unicode property - support is available. 
It starts with tests of the POSIX interface, because - that is supported only with the 8-bit library. --/ - -/\w/P - +++\x{c2} -No match: POSIX code 17: match failed - -/\w/WP - +++\x{c2} - 0: \xc2 - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ ------------------------------------------------------------------- - Bra - /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -First char = 'A' (caseless) -No need char - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ ------------------------------------------------------------------- - Bra - A\x{391}\x{10427}\x{ff3a}\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'A' -Need char = \x{b0} - -/AB\x{1fb0}/8DZ ------------------------------------------------------------------- - Bra - AB\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'A' -Need char = \x{b0} - -/AB\x{1fb0}/8DZi ------------------------------------------------------------------- - Bra - /i AB\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -First char = 'A' (caseless) -Need char = 'B' (caseless) - -/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/8iSI -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 17 -Starting chars: \xd0 \xd1 - \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} - 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} - 
\x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} - 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} - -/[ⱥ]/8iBZ ------------------------------------------------------------------- - Bra - /i \x{2c65} - Ket - End ------------------------------------------------------------------- - -/[^ⱥ]/8iBZ ------------------------------------------------------------------- - Bra - /i [^\x{2c65}] - Ket - End ------------------------------------------------------------------- - -/\h/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x20 \xa0 - -/\v/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 - -/\R/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 - -/[[:blank:]]/WBZ ------------------------------------------------------------------- - Bra - [\x09 \xa0] - Ket - End ------------------------------------------------------------------- - -/\x{212a}+/i8SI -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: K k \xe2 - KKkk\x{212a} - 0: KKkk\x{212a} - -/s+/i8SI -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: S s \xc5 - SSss\x{17f} - 0: SSss\x{17f} - -/[\W\p{Any}]/BZ ------------------------------------------------------------------- - Bra - [\x00-/:-@[-^`{-\xff\p{Any}] - Ket - End ------------------------------------------------------------------- - abc - 0: a - 123 - 0: 1 - -/[\W\pL]/BZ ------------------------------------------------------------------- - Bra - [\x00-/:-@[-^`{-\xff\p{L}] - Ket -
End ------------------------------------------------------------------- - abc - 0: a - ** Failers - 0: * - 123 -No match - -/[\D]/8 - \x{1d7cf} - 0: \x{1d7cf} - -/[\D\P{Nd}]/8 - \x{1d7cf} - 0: \x{1d7cf} - -/[^\D]/8 - a9b - 0: 9 - ** Failers -No match - \x{1d7cf} -No match - -/[^\D\P{Nd}]/8 - a9b - 0: 9 - \x{1d7cf} - 0: \x{1d7cf} - ** Failers -No match - \x{10000} -No match - -/-- End of testinput16 --/ diff --git a/src/pcre/testdata/testoutput17 b/src/pcre/testdata/testoutput17 deleted file mode 100644 index 9ef6c727..00000000 --- a/src/pcre/testdata/testoutput17 +++ /dev/null @@ -1,560 +0,0 @@ -/-- This set of tests is for the 16- and 32-bit library's basic (non-UTF-16 - or -32) features that are not compatible with the 8-bit library, or which - give different output in 16- or 32-bit mode. --/ - -< forbid 8W - -/a\Cb/ - aXb - 0: aXb - a\nb - 0: a\x0ab - -/[^\x{c4}]/DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/\x{100}/I -Capturing subpattern count = 0 -No options -First char = \x{100} -No need char - -/ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* # optional leading comment -(?: (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) # initial word -(?: (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. 
(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) )* # further okay, if led by a period -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* -# address -| # or -(?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... 
-[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) # one word, optionally followed by.... -(?: -[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or... -\( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) | # comments, or... - -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -# quoted strings -)* -< (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* # leading < -(?: @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
-(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* - -(?: (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* , (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* -)* # further okay, if led by comma -: # closing colon -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* )? # optional route -(?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... 
-[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) # initial word -(?: (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| -" (?: # opening quote... -[^\\\x80-\xff\n\015"] # Anything except backslash and quote -| # or -\\ [^\x80-\xff] # Escaped something (something != CR) -)* " # closing quote -) )* # further okay, if led by a period -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* @ (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... -(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # initial subdomain -(?: # -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* \. # if led by a period... -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* (?: -[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
-(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom -| \[ # [ -(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff -\] # ] -) # ...further okay -)* -# address spec -(?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* > # trailing > -# name and address -) (?: [\040\t] | \( -(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* -\) )* # optional trailing comment -/xSI -Capturing subpattern count = 0 -Contains explicit CR or LF match -Options: extended -No first char -No need char -Subject length lower bound = 3 -Starting chars: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 - 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e - f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff - -/[\h]/BZ ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] - Ket - End ------------------------------------------------------------------- - >\x09< - 0: \x09 - -/[\h]+/BZ ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]++ - Ket - End ------------------------------------------------------------------- - >\x09\x20\xa0< - 0: \x09 \xa0 - -/[\v]/BZ ------------------------------------------------------------------- - Bra - [\x0a-\x0d\x85\x{2028}-\x{2029}] - Ket - End ------------------------------------------------------------------- - -/[^\h]/BZ ------------------------------------------------------------------- - Bra - [^\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] - Ket - End ------------------------------------------------------------------- - -/\h+/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 
-Starting chars: \x09 \x20 \xa0 \xff - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - 0: \x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\xa0\x{2000} - 0: \x{200a}\xa0\x{2000} - -/[\h\x{dc00}]+/BZSI ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}\x{dc00}]++ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x20 \xa0 \xff - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - 0: \x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\xa0\x{2000} - 0: \x{200a}\xa0\x{2000} - -/\H+/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -No starting char list - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - 0: \x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - 0: \x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - 0: \x{202e}\x{2030}\x{205e}\x{2060} - \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} - 0: \x9f\xa1\x{2fff}\x{3001} - -/[\H\x{d800}]+/ - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - 0: \x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - 0: \x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - 0: \x{202e}\x{2030}\x{205e}\x{2060} - \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} - 0: \x9f\xa1\x{2fff}\x{3001} - -/\v+/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - \x{2027}\x{2030}\x{2028}\x{2029} - 0: \x{2028}\x{2029} - \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d - 0: \x85\x0a\x0b\x0c\x0d - -/[\v\x{dc00}]+/BZSI ------------------------------------------------------------------- - Bra - [\x0a-\x0d\x85\x{2028}-\x{2029}\x{dc00}]++ - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - \x{2027}\x{2030}\x{2028}\x{2029} - 0: \x{2028}\x{2029} - \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d - 0: \x85\x0a\x0b\x0c\x0d - -/\V+/SI -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -No starting char list - \x{2028}\x{2029}\x{2027}\x{2030} - 0: \x{2027}\x{2030} - \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 - 0: \x09\x0e\x84\x86 - -/[\V\x{d800}]+/ - \x{2028}\x{2029}\x{2027}\x{2030} - 0: \x{2027}\x{2030} - \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 - 0: \x09\x0e\x84\x86 - -/\R+/SI -Capturing subpattern count = 0 -Options: bsr_unicode -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - \x{2027}\x{2030}\x{2028}\x{2029} - 0: \x{2028}\x{2029} - \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d - 0: \x85\x0a\x0b\x0c\x0d - -/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I -Capturing subpattern count = 0 -No options -First char = \x{d800} -Need char = \x{dd00} - \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} - 0: \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} - -/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/BZ ------------------------------------------------------------------- - Bra - [^\x80] - [^\x{ff}] - [^\x{100}] - [^\x{1000}] - [^\x{ffff}] - Ket - End ------------------------------------------------------------------- - -/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/BZi ------------------------------------------------------------------- - Bra - /i [^\x80] - /i [^\x{ff}] - /i [^\x{100}] - /i [^\x{1000}] - /i [^\x{ffff}] - Ket - End ------------------------------------------------------------------- - -/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/BZ 
------------------------------------------------------------------- - Bra - [^\x{100}]* - [^\x{1000}]+ - [^\x{ffff}]?? - [^\x{8000}]{4} - [^\x{8000}]* - [^\x{7fff}]{2} - [^\x{7fff}]{0,7}? - [^\x{100}]{5} - [^\x{100}]?+ - Ket - End ------------------------------------------------------------------- - -/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/BZi ------------------------------------------------------------------- - Bra - /i [^\x{100}]* - /i [^\x{1000}]+ - /i [^\x{ffff}]?? - /i [^\x{8000}]{4} - /i [^\x{8000}]* - /i [^\x{7fff}]{2} - /i [^\x{7fff}]{0,7}? - /i [^\x{100}]{5} - /i [^\x{100}]?+ - Ket - End ------------------------------------------------------------------- - -/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/K - XX - 0: XX -MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF - -/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/K - XX - 0: XX -MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE - -/\u0100/BZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- - 
-/[\u0100-\u0200]/BZ ------------------------------------------------------------------- - Bra - [\x{100}-\x{200}] - Ket - End ------------------------------------------------------------------- - -/\ud800/BZ ------------------------------------------------------------------- - Bra - \x{d800} - Ket - End ------------------------------------------------------------------- - -/^\x{ffff}+/i - \x{ffff} - 0: \x{ffff} - -/^\x{ffff}?/i - \x{ffff} - 0: \x{ffff} - -/^\x{ffff}*/i - \x{ffff} - 0: \x{ffff} - -/^\x{ffff}{3}/i - \x{ffff}\x{ffff}\x{ffff} - 0: \x{ffff}\x{ffff}\x{ffff} - -/^\x{ffff}{0,3}/i - \x{ffff} - 0: \x{ffff} - -/[^\x00-a]{12,}[^b-\xff]*/BZ ------------------------------------------------------------------- - Bra - [b-\xff] (neg){12,} - [\x00-a] (neg)*+ - Ket - End ------------------------------------------------------------------- - -/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ ------------------------------------------------------------------- - Bra - [\x00-\x08\x0e-\x1f!-\xff] (neg)* - \s* - - [0-9A-Z_a-z]++ - \W+ - - [\x00-/:-\xff] (neg)*? - \d - 0 - [\x00-/:-@[-^`{-\xff] (neg){4,6}? - \w* - A - Ket - End ------------------------------------------------------------------- - -/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/BZ ------------------------------------------------------------------- - Bra - a* - [b-\xff\x{100}-\x{200}]?+ - a# - a*+ - [b-\xff\x{100}-\x{200}]? 
- b# - [a-f]*+ - [g-\xff\x{100}-\x{200}]*+ - # - [g-\xff\x{100}-\x{200}]*+ - [a-c]*+ - # - [g-\xff\x{100}-\x{200}]* - [a-h]*+ - Ket - End ------------------------------------------------------------------- - -/^[\x{1234}\x{4321}]{2,4}?/ - \x{1234}\x{1234}\x{1234} - 0: \x{1234}\x{1234}nd of testinput17 --/ diff --git a/src/pcre/testdata/testoutput18-16 b/src/pcre/testdata/testoutput18-16 deleted file mode 100644 index 1ef87047..00000000 --- a/src/pcre/testdata/testoutput18-16 +++ /dev/null @@ -1,1026 +0,0 @@ -/-- This set of tests is for UTF-16 and UTF-32 support, and is relevant only to - the 16- and 32-bit libraries. --/ - -< forbid W - -/ÃÃÃxxx/8?DZSS -**Failed: invalid UTF-8 string cannot be converted to UTF-16 - -/abc/8 - Ã] -**Failed: invalid UTF-8 string cannot be used as input in UTF mode - -/X(\C{3})/8 - X\x{11234}Y - 0: X\x{11234}Y - 1: \x{11234}Y - X\x{11234}YZ - 0: X\x{11234}Y - 1: \x{11234}Y - -/X(\C{4})/8 - X\x{11234}YZ - 0: X\x{11234}YZ - 1: \x{11234}YZ - X\x{11234}YZW - 0: X\x{11234}YZ - 1: \x{11234}YZ - -/X\C*/8 - XYZabcdce - 0: XYZabcdce - -/X\C*?/8 - XYZabcde - 0: X - -/X\C{3,5}/8 - Xabcdefg - 0: Xabcde - X\x{11234}Y - 0: X\x{11234}Y - X\x{11234}YZ - 0: X\x{11234}YZ - X\x{11234}\x{512} - 0: X\x{11234}\x{512} - X\x{11234}\x{512}YZ - 0: X\x{11234}\x{512}YZ - X\x{11234}\x{512}\x{11234}Z - 0: X\x{11234}\x{512}\x{11234} - -/X\C{3,5}?/8 - Xabcdefg - 0: Xabc - X\x{11234}Y - 0: X\x{11234}Y - X\x{11234}YZ - 0: X\x{11234}Y - X\x{11234}\x{512}YZ - 0: X\x{11234}\x{512} - *** Failers -No match - X\x{11234} -No match - -/a\Cb/8 - aXb - 0: aXb - a\nb - 0: a\x{0a}b - -/a\C\Cb/8 - a\x{12257}b - 0: a\x{12257}b - a\x{12257}\x{11234}b -No match - ** Failers -No match - a\x{100}b -No match - -/ab\Cde/8 - abXde - 0: abXde - -/-- Check maximum character size --/ - -/\x{ffff}/8DZ ------------------------------------------------------------------- - Bra - \x{ffff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 
0 -Options: utf -First char = \x{ffff} -No need char - -/\x{10000}/8DZ ------------------------------------------------------------------- - Bra - \x{10000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{d800} -Need char = \x{dc00} - -/\x{100}/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - -/\x{1000}/8DZ ------------------------------------------------------------------- - Bra - \x{1000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{1000} -No need char - -/\x{10000}/8DZ ------------------------------------------------------------------- - Bra - \x{10000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{d800} -Need char = \x{dc00} - -/\x{100000}/8DZ ------------------------------------------------------------------- - Bra - \x{100000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{dbc0} -Need char = \x{dc00} - -/\x{10ffff}/8DZ ------------------------------------------------------------------- - Bra - \x{10ffff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{dbff} -Need char = \x{dfff} - -/[\x{ff}]/8DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ff} -No need char - -/[\x{100}]/8DZ 
------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - -/\x80/8DZ ------------------------------------------------------------------- - Bra - \x80 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{80} -No need char - -/\xff/8DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ff} -No need char - -/\x{D55c}\x{ad6d}\x{C5B4}/DZ8 ------------------------------------------------------------------- - Bra - \x{d55c}\x{ad6d}\x{c5b4} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{d55c} -Need char = \x{c5b4} - \x{D55c}\x{ad6d}\x{C5B4} - 0: \x{d55c}\x{ad6d}\x{c5b4} - -/\x{65e5}\x{672c}\x{8a9e}/DZ8 ------------------------------------------------------------------- - Bra - \x{65e5}\x{672c}\x{8a9e} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{65e5} -Need char = \x{8a9e} - \x{65e5}\x{672c}\x{8a9e} - 0: \x{65e5}\x{672c}\x{8a9e} - -/\x{80}/DZ8 ------------------------------------------------------------------- - Bra - \x80 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{80} -No need char - -/\x{084}/DZ8 ------------------------------------------------------------------- - Bra - \x{84} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{84} -No need char 
- -/\x{104}/DZ8 ------------------------------------------------------------------- - Bra - \x{104} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{104} -No need char - -/\x{861}/DZ8 ------------------------------------------------------------------- - Bra - \x{861} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{861} -No need char - -/\x{212ab}/DZ8 ------------------------------------------------------------------- - Bra - \x{212ab} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{d844} -Need char = \x{deab} - -/-- This one is here not because it's different to Perl, but because the way -the captured single-byte is displayed. (In Perl it becomes a character, and you -can't tell the difference.) --/ - -/X(\C)(.*)/8 - X\x{1234} - 0: X\x{1234} - 1: \x{1234} - 2: - X\nabc - 0: X\x{0a}abc - 1: \x{0a} - 2: abc - -/-- This one is here because Perl gives out a grumbly error message (quite -correctly, but that messes up comparisons). --/ - -/a\Cb/8 - *** Failers -No match - a\x{100}b - 0: a\x{100}b - -/[^ab\xC0-\xF0]/8SDZ ------------------------------------------------------------------- - Bra - [\x00-`c-\xbf\xf1-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a - \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 - \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 - 5 6 7 8 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y - Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f - \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e - \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d - \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac - \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb - \xbc \xbd \xbe \xbf \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb - \xfc \xfd \xfe \xff - \x{f1} - 0: \x{f1} - \x{bf} - 0: \x{bf} - \x{100} - 0: \x{100} - \x{1000} - 0: \x{1000} - *** Failers - 0: * - \x{c0} -No match - \x{f0} -No match - -/Ä€{3,4}/8SDZ ------------------------------------------------------------------- - Bra - \x{100}{3} - \x{100}?+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -Need char = \x{100} -Subject length lower bound = 3 -No starting char list - \x{100}\x{100}\x{100}\x{100\x{100} - 0: \x{100}\x{100}\x{100} - -/(\x{100}+|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}++ - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: x \xff - -/(\x{100}*a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}*+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: a x \xff - -/(\x{100}{0,2}a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}{0,2}+ - a - Alt - x - Ket - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: a x \xff - -/(\x{100}{1,2}a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100} - \x{100}{0,1}+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: x \xff - -/\x{100}/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - -/a\x{100}\x{101}*/8DZ ------------------------------------------------------------------- - Bra - a\x{100} - \x{101}*+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = \x{100} - -/a\x{100}\x{101}+/8DZ ------------------------------------------------------------------- - Bra - a\x{100} - \x{101}++ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = \x{101} - -/[^\x{c4}]/DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\x{100}]/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - \x{100} - 0: \x{100} - Z\x{100} - 0: \x{100} - \x{100}Z - 0: \x{100} - 
*** Failers -No match - -/[\xff]/DZ8 ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ff} -No need char - >\x{ff}< - 0: \x{ff} - -/[^\xff]/8DZ ------------------------------------------------------------------- - Bra - [^\x{ff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}abc(xyz(?1))/8DZ ------------------------------------------------------------------- - Bra - \x{100}abc - CBra 1 - xyz - Recurse - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -First char = \x{100} -Need char = 'z' - -/\777/8I -Capturing subpattern count = 0 -Options: utf -First char = \x{1ff} -No need char - \x{1ff} - 0: \x{1ff} - \777 - 0: \x{1ff} - -/\x{100}+\x{200}/8DZ ------------------------------------------------------------------- - Bra - \x{100}++ - \x{200} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -Need char = \x{200} - -/\x{100}+X/8DZ ------------------------------------------------------------------- - Bra - \x{100}++ - X - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -Need char = 'X' - -/^[\QÄ€\E-\QÅ\E/BZ8 -Failed: missing terminating ] for character class at offset 13 - -/X/8 - \x{d800} -Error -10 (bad UTF-16 string) offset=0 reason=1 - \x{d800}\? -No match - \x{da00} -Error -10 (bad UTF-16 string) offset=0 reason=1 - \x{da00}\? -No match - \x{dc00} -Error -10 (bad UTF-16 string) offset=0 reason=3 - \x{dc00}\? -No match - \x{de00} -Error -10 (bad UTF-16 string) offset=0 reason=3 - \x{de00}\? 
-No match - \x{dfff} -Error -10 (bad UTF-16 string) offset=0 reason=3 - \x{dfff}\? -No match - \x{110000} -** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16 - \x{d800}\x{1234} -Error -10 (bad UTF-16 string) offset=1 reason=2 - -/(*UTF16)\x{11234}/ - abcd\x{11234}pqr - 0: \x{11234} - -/(*UTF)\x{11234}/I -Capturing subpattern count = 0 -Options: utf -First char = \x{d804} -Need char = \x{de34} - abcd\x{11234}pqr - 0: \x{11234} - -/(*UTF-32)\x{11234}/ -Failed: (*VERB) not recognized or malformed at offset 5 - -/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode utf -Forced newline sequence: CRLF -First char = 'a' -Need char = 'b' - -/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I -Failed: (*VERB) not recognized or malformed at offset 12 - -/\h/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x20 \xa0 \xff - ABC\x{09} - 0: \x{09} - ABC\x{20} - 0: - ABC\x{a0} - 0: \x{a0} - ABC\x{1680} - 0: \x{1680} - ABC\x{180e} - 0: \x{180e} - ABC\x{2000} - 0: \x{2000} - ABC\x{202f} - 0: \x{202f} - ABC\x{205f} - 0: \x{205f} - ABC\x{3000} - 0: \x{3000} - -/\v/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - ABC\x{0a} - 0: \x{0a} - ABC\x{0b} - 0: \x{0b} - ABC\x{0c} - 0: \x{0c} - ABC\x{0d} - 0: \x{0d} - ABC\x{85} - 0: \x{85} - ABC\x{2028} - 0: \x{2028} - -/\h*A/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'A' -Subject length lower bound = 1 -Starting chars: \x09 \x20 A \xa0 \xff - CDBABC - 0: A - \x{2000}ABC - 0: \x{2000}A - -/\R*A/SI8 -Capturing subpattern count = 0 -Options: bsr_unicode utf -No first char -Need char = 'A' -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d A \x85 \xff - CDBABC - 0: A - \x{2028}A - 0: \x{2028}A - -/\v+A/SI8 -Capturing subpattern count = 0 
-Options: utf -No first char -Need char = 'A' -Subject length lower bound = 2 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - -/\s?xxx\s/8SI -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'x' -Subject length lower bound = 4 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x - -/\sxxx\s/I8ST1 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'x' -Subject length lower bound = 5 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0 - AB\x{85}xxx\x{a0}XYZ - 0: \x{85}xxx\x{a0} - AB\x{a0}xxx\x{85}XYZ - 0: \x{a0}xxx\x{85} - -/\S \S/I8ST1 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = ' ' -Subject length lower bound = 3 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f - \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e - \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C - D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h - i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 - \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 - \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4 - \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 - \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 - \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 - \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 - \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef - \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe - \xff - \x{a2} \x{84} - 0: \x{a2} \x{84} - A Z - 0: A Z - -/a+/8 - a\x{123}aa\>1 - 0: aa - a\x{123}aa\>2 - 0: aa - a\x{123}aa\>3 - 0: a - a\x{123}aa\>4 -No match - a\x{123}aa\>5 -Error -24 (bad offset value) - a\x{123}aa\>6 -Error -24 (bad offset value) - -/\x{1234}+/iS8I -Capturing subpattern count = 0 
-Options: caseless utf -First char = \x{1234} -No need char -Subject length lower bound = 1 -No starting char list - -/\x{1234}+?/iS8I -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{1234} -No need char -Subject length lower bound = 1 -No starting char list - -/\x{1234}++/iS8I -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{1234} -No need char -Subject length lower bound = 1 -No starting char list - -/\x{1234}{2}/iS8I -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{1234} -Need char = \x{1234} -Subject length lower bound = 2 -No starting char list - -/[^\x{c4}]/8DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/X+\x{200}/8DZ ------------------------------------------------------------------- - Bra - X++ - \x{200} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'X' -Need char = \x{200} - -/\R/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - -/-- Check bad offset --/ - -/a/8 - \x{10000}\>1 -Error -11 (bad UTF-16 offset) - \x{10000}ab\>1 -Error -11 (bad UTF-16 offset) - \x{10000}ab\>2 - 0: a - \x{10000}ab\>3 -No match - \x{10000}ab\>4 -No match - \x{10000}ab\>5 -Error -24 (bad offset value) - -/í¼€/8 -Failed: invalid UTF-16 string at offset 0 - -/\w+\x{C4}/8BZ ------------------------------------------------------------------- - Bra - \w++ - \x{c4} - Ket - End ------------------------------------------------------------------- - a\x{C4}\x{C4} - 0: a\x{c4} - -/\w+\x{C4}/8BZT1 ------------------------------------------------------------------- - Bra - \w+ - \x{c4} - Ket - End 
------------------------------------------------------------------- - a\x{C4}\x{C4} - 0: a\x{c4}\x{c4} - -/\W+\x{C4}/8BZ ------------------------------------------------------------------- - Bra - \W+ - \x{c4} - Ket - End ------------------------------------------------------------------- - !\x{C4} - 0: !\x{c4} - -/\W+\x{C4}/8BZT1 ------------------------------------------------------------------- - Bra - \W++ - \x{c4} - Ket - End ------------------------------------------------------------------- - !\x{C4} - 0: !\x{c4} - -/\W+\x{A1}/8BZ ------------------------------------------------------------------- - Bra - \W+ - \x{a1} - Ket - End ------------------------------------------------------------------- - !\x{A1} - 0: !\x{a1} - -/\W+\x{A1}/8BZT1 ------------------------------------------------------------------- - Bra - \W+ - \x{a1} - Ket - End ------------------------------------------------------------------- - !\x{A1} - 0: !\x{a1} - -/X\s+\x{A0}/8BZ ------------------------------------------------------------------- - Bra - X - \s++ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x20\x{A0}\x{A0} - 0: X \x{a0} - -/X\s+\x{A0}/8BZT1 ------------------------------------------------------------------- - Bra - X - \s+ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x20\x{A0}\x{A0} - 0: X \x{a0}\x{a0} - -/\S+\x{A0}/8BZ ------------------------------------------------------------------- - Bra - \S+ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x{A0}\x{A0} - 0: X\x{a0}\x{a0} - -/\S+\x{A0}/8BZT1 ------------------------------------------------------------------- - Bra - \S++ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x{A0}\x{A0} - 0: X\x{a0} - -/\x{a0}+\s!/8BZ ------------------------------------------------------------------- - Bra - \x{a0}++ - \s - ! 
- Ket - End ------------------------------------------------------------------- - \x{a0}\x20! - 0: \x{a0} ! - -/\x{a0}+\s!/8BZT1 ------------------------------------------------------------------- - Bra - \x{a0}+ - \s - ! - Ket - End ------------------------------------------------------------------- - \x{a0}\x20! - 0: \x{a0} ! - -/(*UTF)abc/9 -Failed: setting UTF is disabled by the application at offset 0 - -/abc/89 -Failed: setting UTF is disabled by the application at offset 0 - -/-- End of testinput18 --/ diff --git a/src/pcre/testdata/testoutput18-32 b/src/pcre/testdata/testoutput18-32 deleted file mode 100644 index 622ba64a..00000000 --- a/src/pcre/testdata/testoutput18-32 +++ /dev/null @@ -1,1023 +0,0 @@ -/-- This set of tests is for UTF-16 and UTF-32 support, and is relevant only to - the 16- and 32-bit libraries. --/ - -< forbid W - -/ÃÃÃxxx/8?DZSS -**Failed: invalid UTF-8 string cannot be converted to UTF-32 - -/abc/8 - Ã] -**Failed: invalid UTF-8 string cannot be used as input in UTF mode - -/X(\C{3})/8 - X\x{11234}Y -No match - X\x{11234}YZ - 0: X\x{11234}YZ - 1: \x{11234}YZ - -/X(\C{4})/8 - X\x{11234}YZ -No match - X\x{11234}YZW - 0: X\x{11234}YZW - 1: \x{11234}YZW - -/X\C*/8 - XYZabcdce - 0: XYZabcdce - -/X\C*?/8 - XYZabcde - 0: X - -/X\C{3,5}/8 - Xabcdefg - 0: Xabcde - X\x{11234}Y -No match - X\x{11234}YZ - 0: X\x{11234}YZ - X\x{11234}\x{512} -No match - X\x{11234}\x{512}YZ - 0: X\x{11234}\x{512}YZ - X\x{11234}\x{512}\x{11234}Z - 0: X\x{11234}\x{512}\x{11234}Z - -/X\C{3,5}?/8 - Xabcdefg - 0: Xabc - X\x{11234}Y -No match - X\x{11234}YZ - 0: X\x{11234}YZ - X\x{11234}\x{512}YZ - 0: X\x{11234}\x{512}Y - *** Failers -No match - X\x{11234} -No match - -/a\Cb/8 - aXb - 0: aXb - a\nb - 0: a\x{0a}b - -/a\C\Cb/8 - a\x{12257}b -No match - a\x{12257}\x{11234}b - 0: a\x{12257}\x{11234}b - ** Failers -No match - a\x{100}b -No match - -/ab\Cde/8 - abXde - 0: abXde - -/-- Check maximum character size --/ - -/\x{ffff}/8DZ 
------------------------------------------------------------------- - Bra - \x{ffff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ffff} -No need char - -/\x{10000}/8DZ ------------------------------------------------------------------- - Bra - \x{10000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{10000} -No need char - -/\x{100}/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - -/\x{1000}/8DZ ------------------------------------------------------------------- - Bra - \x{1000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{1000} -No need char - -/\x{10000}/8DZ ------------------------------------------------------------------- - Bra - \x{10000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{10000} -No need char - -/\x{100000}/8DZ ------------------------------------------------------------------- - Bra - \x{100000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100000} -No need char - -/\x{10ffff}/8DZ ------------------------------------------------------------------- - Bra - \x{10ffff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{10ffff} -No need char - -/[\x{ff}]/8DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ff} -No need char - -/[\x{100}]/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - -/\x80/8DZ ------------------------------------------------------------------- - Bra - \x80 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{80} -No need char - -/\xff/8DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ff} -No need char - -/\x{D55c}\x{ad6d}\x{C5B4}/DZ8 ------------------------------------------------------------------- - Bra - \x{d55c}\x{ad6d}\x{c5b4} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{d55c} -Need char = \x{c5b4} - \x{D55c}\x{ad6d}\x{C5B4} - 0: \x{d55c}\x{ad6d}\x{c5b4} - -/\x{65e5}\x{672c}\x{8a9e}/DZ8 ------------------------------------------------------------------- - Bra - \x{65e5}\x{672c}\x{8a9e} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{65e5} -Need char = \x{8a9e} - \x{65e5}\x{672c}\x{8a9e} - 0: \x{65e5}\x{672c}\x{8a9e} - -/\x{80}/DZ8 ------------------------------------------------------------------- - Bra - \x80 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{80} -No need char - -/\x{084}/DZ8 ------------------------------------------------------------------- - Bra - 
\x{84} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{84} -No need char - -/\x{104}/DZ8 ------------------------------------------------------------------- - Bra - \x{104} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{104} -No need char - -/\x{861}/DZ8 ------------------------------------------------------------------- - Bra - \x{861} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{861} -No need char - -/\x{212ab}/DZ8 ------------------------------------------------------------------- - Bra - \x{212ab} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{212ab} -No need char - -/-- This one is here not because it's different to Perl, but because the way -the captured single-byte is displayed. (In Perl it becomes a character, and you -can't tell the difference.) --/ - -/X(\C)(.*)/8 - X\x{1234} - 0: X\x{1234} - 1: \x{1234} - 2: - X\nabc - 0: X\x{0a}abc - 1: \x{0a} - 2: abc - -/-- This one is here because Perl gives out a grumbly error message (quite -correctly, but that messes up comparisons). --/ - -/a\Cb/8 - *** Failers -No match - a\x{100}b - 0: a\x{100}b - -/[^ab\xC0-\xF0]/8SDZ ------------------------------------------------------------------- - Bra - [\x00-`c-\xbf\xf1-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a - \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 - \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . 
/ 0 1 2 3 4 - 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y - Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f - \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e - \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d - \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac - \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb - \xbc \xbd \xbe \xbf \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb - \xfc \xfd \xfe \xff - \x{f1} - 0: \x{f1} - \x{bf} - 0: \x{bf} - \x{100} - 0: \x{100} - \x{1000} - 0: \x{1000} - *** Failers - 0: * - \x{c0} -No match - \x{f0} -No match - -/Ä€{3,4}/8SDZ ------------------------------------------------------------------- - Bra - \x{100}{3} - \x{100}?+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -Need char = \x{100} -Subject length lower bound = 3 -No starting char list - \x{100}\x{100}\x{100}\x{100\x{100} - 0: \x{100}\x{100}\x{100} - -/(\x{100}+|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}++ - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: x \xff - -/(\x{100}*a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}*+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: a x \xff - -/(\x{100}{0,2}a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100}{0,2}+ - a - Alt - x - Ket - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: a x \xff - -/(\x{100}{1,2}a|x)/8SDZ ------------------------------------------------------------------- - Bra - CBra 1 - \x{100} - \x{100}{0,1}+ - a - Alt - x - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: x \xff - -/\x{100}/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - -/a\x{100}\x{101}*/8DZ ------------------------------------------------------------------- - Bra - a\x{100} - \x{101}*+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = \x{100} - -/a\x{100}\x{101}+/8DZ ------------------------------------------------------------------- - Bra - a\x{100} - \x{101}++ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = \x{101} - -/[^\x{c4}]/DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\x{100}]/8DZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -No need char - \x{100} - 0: \x{100} - Z\x{100} - 0: \x{100} - \x{100}Z - 0: \x{100} - 
*** Failers -No match - -/[\xff]/DZ8 ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{ff} -No need char - >\x{ff}< - 0: \x{ff} - -/[^\xff]/8DZ ------------------------------------------------------------------- - Bra - [^\x{ff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}abc(xyz(?1))/8DZ ------------------------------------------------------------------- - Bra - \x{100}abc - CBra 1 - xyz - Recurse - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -First char = \x{100} -Need char = 'z' - -/\777/8I -Capturing subpattern count = 0 -Options: utf -First char = \x{1ff} -No need char - \x{1ff} - 0: \x{1ff} - \777 - 0: \x{1ff} - -/\x{100}+\x{200}/8DZ ------------------------------------------------------------------- - Bra - \x{100}++ - \x{200} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -Need char = \x{200} - -/\x{100}+X/8DZ ------------------------------------------------------------------- - Bra - \x{100}++ - X - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = \x{100} -Need char = 'X' - -/^[\QÄ€\E-\QÅ\E/BZ8 -Failed: missing terminating ] for character class at offset 13 - -/X/8 - \x{d800} -Error -10 (bad UTF-32 string) offset=0 reason=1 - \x{d800}\? -No match - \x{da00} -Error -10 (bad UTF-32 string) offset=0 reason=1 - \x{da00}\? -No match - \x{dc00} -Error -10 (bad UTF-32 string) offset=0 reason=1 - \x{dc00}\? -No match - \x{de00} -Error -10 (bad UTF-32 string) offset=0 reason=1 - \x{de00}\? 
-No match - \x{dfff} -Error -10 (bad UTF-32 string) offset=0 reason=1 - \x{dfff}\? -No match - \x{110000} -Error -10 (bad UTF-32 string) offset=0 reason=3 - \x{d800}\x{1234} -Error -10 (bad UTF-32 string) offset=0 reason=1 - -/(*UTF16)\x{11234}/ -Failed: (*VERB) not recognized or malformed at offset 5 - -/(*UTF)\x{11234}/I -Capturing subpattern count = 0 -Options: utf -First char = \x{11234} -No need char - abcd\x{11234}pqr - 0: \x{11234} - -/(*UTF-32)\x{11234}/ -Failed: (*VERB) not recognized or malformed at offset 5 - -/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I -Failed: (*VERB) not recognized or malformed at offset 12 - -/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode utf -Forced newline sequence: CRLF -First char = 'a' -Need char = 'b' - -/\h/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x20 \xa0 \xff - ABC\x{09} - 0: \x{09} - ABC\x{20} - 0: - ABC\x{a0} - 0: \x{a0} - ABC\x{1680} - 0: \x{1680} - ABC\x{180e} - 0: \x{180e} - ABC\x{2000} - 0: \x{2000} - ABC\x{202f} - 0: \x{202f} - ABC\x{205f} - 0: \x{205f} - ABC\x{3000} - 0: \x{3000} - -/\v/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - ABC\x{0a} - 0: \x{0a} - ABC\x{0b} - 0: \x{0b} - ABC\x{0c} - 0: \x{0c} - ABC\x{0d} - 0: \x{0d} - ABC\x{85} - 0: \x{85} - ABC\x{2028} - 0: \x{2028} - -/\h*A/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'A' -Subject length lower bound = 1 -Starting chars: \x09 \x20 A \xa0 \xff - CDBABC - 0: A - \x{2000}ABC - 0: \x{2000}A - -/\R*A/SI8 -Capturing subpattern count = 0 -Options: bsr_unicode utf -No first char -Need char = 'A' -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d A \x85 \xff - CDBABC - 0: A - \x{2028}A - 0: \x{2028}A - -/\v+A/SI8 -Capturing subpattern count = 0 -Options: utf -No first char 
-Need char = 'A' -Subject length lower bound = 2 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - -/\s?xxx\s/8SI -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'x' -Subject length lower bound = 4 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 x - -/\sxxx\s/I8ST1 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'x' -Subject length lower bound = 5 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0 - AB\x{85}xxx\x{a0}XYZ - 0: \x{85}xxx\x{a0} - AB\x{a0}xxx\x{85}XYZ - 0: \x{a0}xxx\x{85} - -/\S \S/I8ST1 -Capturing subpattern count = 0 -Options: utf -No first char -Need char = ' ' -Subject length lower bound = 3 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f - \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e - \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C - D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h - i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 - \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 - \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4 - \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 - \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 - \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 - \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 - \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef - \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe - \xff - \x{a2} \x{84} - 0: \x{a2} \x{84} - A Z - 0: A Z - -/a+/8 - a\x{123}aa\>1 - 0: aa - a\x{123}aa\>2 - 0: aa - a\x{123}aa\>3 - 0: a - a\x{123}aa\>4 -No match - a\x{123}aa\>5 -Error -24 (bad offset value) - a\x{123}aa\>6 -Error -24 (bad offset value) - -/\x{1234}+/iS8I -Capturing subpattern count = 0 -Options: caseless utf 
-First char = \x{1234} -No need char -Subject length lower bound = 1 -No starting char list - -/\x{1234}+?/iS8I -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{1234} -No need char -Subject length lower bound = 1 -No starting char list - -/\x{1234}++/iS8I -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{1234} -No need char -Subject length lower bound = 1 -No starting char list - -/\x{1234}{2}/iS8I -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{1234} -Need char = \x{1234} -Subject length lower bound = 2 -No starting char list - -/[^\x{c4}]/8DZ ------------------------------------------------------------------- - Bra - [^\x{c4}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/X+\x{200}/8DZ ------------------------------------------------------------------- - Bra - X++ - \x{200} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'X' -Need char = \x{200} - -/\R/SI8 -Capturing subpattern count = 0 -Options: utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x0a \x0b \x0c \x0d \x85 \xff - -/-- Check bad offset --/ - -/a/8 - \x{10000}\>1 -No match - \x{10000}ab\>1 - 0: a - \x{10000}ab\>2 -No match - \x{10000}ab\>3 -No match - \x{10000}ab\>4 -Error -24 (bad offset value) - \x{10000}ab\>5 -Error -24 (bad offset value) - -/í¼€/8 -**Failed: character value is ill-formed UTF-32 - -/\w+\x{C4}/8BZ ------------------------------------------------------------------- - Bra - \w++ - \x{c4} - Ket - End ------------------------------------------------------------------- - a\x{C4}\x{C4} - 0: a\x{c4} - -/\w+\x{C4}/8BZT1 ------------------------------------------------------------------- - Bra - \w+ - \x{c4} - Ket - End ------------------------------------------------------------------- - 
a\x{C4}\x{C4} - 0: a\x{c4}\x{c4} - -/\W+\x{C4}/8BZ ------------------------------------------------------------------- - Bra - \W+ - \x{c4} - Ket - End ------------------------------------------------------------------- - !\x{C4} - 0: !\x{c4} - -/\W+\x{C4}/8BZT1 ------------------------------------------------------------------- - Bra - \W++ - \x{c4} - Ket - End ------------------------------------------------------------------- - !\x{C4} - 0: !\x{c4} - -/\W+\x{A1}/8BZ ------------------------------------------------------------------- - Bra - \W+ - \x{a1} - Ket - End ------------------------------------------------------------------- - !\x{A1} - 0: !\x{a1} - -/\W+\x{A1}/8BZT1 ------------------------------------------------------------------- - Bra - \W+ - \x{a1} - Ket - End ------------------------------------------------------------------- - !\x{A1} - 0: !\x{a1} - -/X\s+\x{A0}/8BZ ------------------------------------------------------------------- - Bra - X - \s++ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x20\x{A0}\x{A0} - 0: X \x{a0} - -/X\s+\x{A0}/8BZT1 ------------------------------------------------------------------- - Bra - X - \s+ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x20\x{A0}\x{A0} - 0: X \x{a0}\x{a0} - -/\S+\x{A0}/8BZ ------------------------------------------------------------------- - Bra - \S+ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x{A0}\x{A0} - 0: X\x{a0}\x{a0} - -/\S+\x{A0}/8BZT1 ------------------------------------------------------------------- - Bra - \S++ - \x{a0} - Ket - End ------------------------------------------------------------------- - X\x{A0}\x{A0} - 0: X\x{a0} - -/\x{a0}+\s!/8BZ ------------------------------------------------------------------- - Bra - \x{a0}++ - \s - ! - Ket - End ------------------------------------------------------------------- - \x{a0}\x20! 
- 0: \x{a0} ! - -/\x{a0}+\s!/8BZT1 ------------------------------------------------------------------- - Bra - \x{a0}+ - \s - ! - Ket - End ------------------------------------------------------------------- - \x{a0}\x20! - 0: \x{a0} ! - -/(*UTF)abc/9 -Failed: setting UTF is disabled by the application at offset 0 - -/abc/89 -Failed: setting UTF is disabled by the application at offset 0 - -/-- End of testinput18 --/ diff --git a/src/pcre/testdata/testoutput19 b/src/pcre/testdata/testoutput19 deleted file mode 100644 index 982bea4c..00000000 --- a/src/pcre/testdata/testoutput19 +++ /dev/null @@ -1,134 +0,0 @@ -/-- This set of tests is for Unicode property support, relevant only to the - 16- and 32-bit library. --/ - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iDZ ------------------------------------------------------------------- - Bra - /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -First char = 'A' (caseless) -Need char = \x{1fb0} (caseless) - -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8DZ ------------------------------------------------------------------- - Bra - A\x{391}\x{10427}\x{ff3a}\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'A' -Need char = \x{1fb0} - -/AB\x{1fb0}/8DZ ------------------------------------------------------------------- - Bra - AB\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'A' -Need char = \x{1fb0} - -/AB\x{1fb0}/8DZi ------------------------------------------------------------------- - Bra - /i AB\x{1fb0} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -First char = 'A' (caseless) -Need char = \x{1fb0} (caseless) - 
-/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/8iSI -Capturing subpattern count = 0 -Options: caseless utf -First char = \x{401} (caseless) -Need char = \x{42f} (caseless) -Subject length lower bound = 17 -No starting char list - \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} - 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} - \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} - 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} - -/[ⱥ]/8iBZ ------------------------------------------------------------------- - Bra - /i \x{2c65} - Ket - End ------------------------------------------------------------------- - -/[^ⱥ]/8iBZ ------------------------------------------------------------------- - Bra - /i [^\x{2c65}] - Ket - End ------------------------------------------------------------------- - -/[[:blank:]]/WBZ ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] - Ket - End ------------------------------------------------------------------- - -/\x{212a}+/i8SI -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: K k \xff - KKkk\x{212a} - 0: KKkk\x{212a} - -/s+/i8SI -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -Starting chars: S s \xff - SSss\x{17f} - 0: SSss\x{17f} - -/[\D]/8 - \x{1d7cf} - 0: \x{1d7cf} - -/[\D\P{Nd}]/8 - \x{1d7cf} - 0: \x{1d7cf} - -/[^\D]/8 - a9b - 0: 9 - ** Failers -No match - \x{1d7cf} -No match - -/[^\D\P{Nd}]/8 - a9b - 0: 9 - \x{1d7cf} - 0: 
\x{1d7cf} - ** Failers -No match - \x{10000} -No match - -/-- End of testinput19 --/ diff --git a/src/pcre/testdata/testoutput2 b/src/pcre/testdata/testoutput2 deleted file mode 100644 index 4ccda272..00000000 --- a/src/pcre/testdata/testoutput2 +++ /dev/null @@ -1,14728 +0,0 @@ -/-- This set of tests is not Perl-compatible. It checks on special features - of PCRE's API, error diagnostics, and the compiled code of some patterns. - It also checks the non-Perl syntax the PCRE supports (Python, .NET, - Oniguruma). Finally, there are some tests where PCRE and Perl differ, - either because PCRE can't be compatible, or there is a possible Perl - bug. - - NOTE: This is a non-UTF set of tests. When UTF support is needed, use - test 5, and if Unicode Property Support is needed, use test 7. --/ - -< forbid 8W - -/(a)b|/I -Capturing subpattern count = 1 -May match empty string -No options -No first char -No need char - -/abc/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - abc - 0: abc - defabc - 0: abc - \Aabc - 0: abc - *** Failers -No match - \Adefabc -No match - ABC -No match - -/^abc/I -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - abc - 0: abc - \Aabc - 0: abc - *** Failers -No match - defabc -No match - \Adefabc -No match - -/a+bc/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - -/a*bc/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'c' - -/a{3}bc/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - -/(abc|a+z)/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/^abc$/I -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - abc - 0: abc - *** Failers -No match - def\nabc -No match - -/ab\idef/X -Failed: unrecognized character follows \ at offset 3 - -/(?X)ab\idef/X -Failed: unrecognized character follows \ at offset 7 - -/x{5,4}/ -Failed: numbers out of 
order in {} quantifier at offset 5 - -/z{65536}/ -Failed: number too big in {} quantifier at offset 7 - -/[abcd/ -Failed: missing terminating ] for character class at offset 5 - -/(?X)[\B]/ -Failed: invalid escape sequence in character class at offset 6 - -/(?X)[\R]/ -Failed: invalid escape sequence in character class at offset 6 - -/(?X)[\X]/ -Failed: invalid escape sequence in character class at offset 6 - -/[\B]/BZ ------------------------------------------------------------------- - Bra - B - Ket - End ------------------------------------------------------------------- - -/[\R]/BZ ------------------------------------------------------------------- - Bra - R - Ket - End ------------------------------------------------------------------- - -/[\X]/BZ ------------------------------------------------------------------- - Bra - X - Ket - End ------------------------------------------------------------------- - -/[z-a]/ -Failed: range out of order in character class at offset 3 - -/^*/ -Failed: nothing to repeat at offset 1 - -/(abc/ -Failed: missing ) at offset 4 - -/(?# abc/ -Failed: missing ) after comment at offset 7 - -/(?z)abc/ -Failed: unrecognized character after (? 
or (?- at offset 2 - -/.*b/I -Capturing subpattern count = 0 -No options -First char at start or follows newline -Need char = 'b' - -/.*?b/I -Capturing subpattern count = 0 -No options -First char at start or follows newline -Need char = 'b' - -/cat|dog|elephant/I -Capturing subpattern count = 0 -No options -No first char -No need char - this sentence eventually mentions a cat - 0: cat - this sentences rambles on and on for a while and then reaches elephant - 0: elephant - -/cat|dog|elephant/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 3 -Starting chars: c d e - this sentence eventually mentions a cat - 0: cat - this sentences rambles on and on for a while and then reaches elephant - 0: elephant - -/cat|dog|elephant/IiS -Capturing subpattern count = 0 -Options: caseless -No first char -No need char -Subject length lower bound = 3 -Starting chars: C D E c d e - this sentence eventually mentions a CAT cat - 0: CAT - this sentences rambles on and on for a while to elephant ElePhant - 0: elephant - -/a|[bcd]/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b c d - -/(a|[^\dZ])/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a - \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 - \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > - ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y [ \ ] ^ _ ` a b c d - e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 - \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 - \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 - \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 - \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf - \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce - \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd - \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec - \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb - \xfc \xfd \xfe \xff - -/(a|b)*[\s]/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 a b - -/(ab\2)/ -Failed: reference to non-existent subpattern at offset 6 - -/{4,5}abc/ -Failed: nothing to repeat at offset 4 - -/(a)(b)(c)\2/I -Capturing subpattern count = 3 -Max back reference = 2 -No options -First char = 'a' -Need char = 'c' - abcb - 0: abcb - 1: a - 2: b - 3: c - \O0abcb -Matched, but too many substrings - \O3abcb -Matched, but too many substrings - 0: abcb - \O6abcb -Matched, but too many substrings - 0: abcb - 1: a - \O9abcb -Matched, but too many substrings - 0: abcb - 1: a - 2: b - \O12abcb - 0: abcb - 1: a - 2: b - 3: c - -/(a)bc|(a)(b)\2/I -Capturing subpattern count = 3 -Max back reference = 2 -No options -First char = 'a' -No need char - abc - 0: abc - 1: a - \O0abc -Matched, but too many substrings - \O3abc -Matched, but too many substrings - 0: abc - \O6abc - 0: abc - 1: a - aba - 0: aba - 1: - 2: a - 3: b - \O0aba -Matched, but too many substrings - \O3aba -Matched, but too many substrings - 0: aba - \O6aba -Matched, but too many substrings - 0: aba - 1: - \O9aba -Matched, but too many 
substrings - 0: aba - 1: - 2: a - \O12aba - 0: aba - 1: - 2: a - 3: b - -/abc$/IE -Capturing subpattern count = 0 -Options: dollar_endonly -First char = 'a' -Need char = 'c' - abc - 0: abc - *** Failers -No match - abc\n -No match - abc\ndef -No match - -/(a)(b)(c)(d)(e)\6/ -Failed: reference to non-existent subpattern at offset 17 - -/the quick brown fox/I -Capturing subpattern count = 0 -No options -First char = 't' -Need char = 'x' - the quick brown fox - 0: the quick brown fox - this is a line with the quick brown fox - 0: the quick brown fox - -/the quick brown fox/IA -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - the quick brown fox - 0: the quick brown fox - *** Failers -No match - this is a line with the quick brown fox -No match - -/ab(?z)cd/ -Failed: unrecognized character after (? or (?- at offset 4 - -/^abc|def/I -Capturing subpattern count = 0 -No options -No first char -No need char - abcdef - 0: abc - abcdef\B - 0: def - -/.*((abc)$|(def))/I -Capturing subpattern count = 3 -No options -First char at start or follows newline -No need char - defabc - 0: defabc - 1: abc - 2: abc - \Zdefabc - 0: def - 1: def - 2: - 3: def - -/)/ -Failed: unmatched parentheses at offset 0 - -/a[]b/ -Failed: missing terminating ] for character class at offset 4 - -/[^aeiou ]{3,}/I -Capturing subpattern count = 0 -No options -No first char -No need char - co-processors, and for - 0: -pr - -/<.*>/I -Capturing subpattern count = 0 -No options -First char = '<' -Need char = '>' - abcghinop - 0: ghi - -/<.*?>/I -Capturing subpattern count = 0 -No options -First char = '<' -Need char = '>' - abcghinop - 0: - -/<.*>/IU -Capturing subpattern count = 0 -Options: ungreedy -First char = '<' -Need char = '>' - abcghinop - 0: - -/(?U)<.*>/I -Capturing subpattern count = 0 -No options -First char = '<' -Need char = '>' - abcghinop - 0: - -/<.*?>/IU -Capturing subpattern count = 0 -Options: ungreedy -First char = '<' -Need char = '>' - abcghinop - 0: 
ghi - -/={3,}/IU -Capturing subpattern count = 0 -Options: ungreedy -First char = '=' -Need char = '=' - abc========def - 0: === - -/(?U)={3,}?/I -Capturing subpattern count = 0 -No options -First char = '=' -Need char = '=' - abc========def - 0: ======== - -/(?^abc)/Im -Capturing subpattern count = 0 -Options: multiline -First char at start or follows newline -Need char = 'c' - abc - 0: abc - def\nabc - 0: abc - *** Failers -No match - defabc -No match - -/(?<=ab(c+)d)ef/ -Failed: lookbehind assertion is not fixed length at offset 11 - -/(?<=ab(?<=c+)d)ef/ -Failed: lookbehind assertion is not fixed length at offset 12 - -/(?<=ab(c|de)f)g/ -Failed: lookbehind assertion is not fixed length at offset 13 - -/The next three are in testinput2 because they have variable length branches/ - -/(?<=bullock|donkey)-cart/I -Capturing subpattern count = 0 -Max lookbehind = 7 -No options -First char = '-' -Need char = 't' - the bullock-cart - 0: -cart - a donkey-cart race - 0: -cart - *** Failers -No match - cart -No match - horse-and-cart -No match - -/(?<=ab(?i)x|y|z)/I -Capturing subpattern count = 0 -Max lookbehind = 3 -May match empty string -No options -No first char -No need char - -/(?>.*)(?<=(abcd)|(xyz))/I -Capturing subpattern count = 2 -Max lookbehind = 4 -May match empty string -No options -No first char -No need char - alphabetabcd - 0: alphabetabcd - 1: abcd - endingxyz - 0: endingxyz - 1: - 2: xyz - -/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/I -Capturing subpattern count = 0 -Max lookbehind = 4 -No options -First char = 'Z' -Need char = 'Z' - abxyZZ - 0: ZZ - abXyZZ - 0: ZZ - ZZZ - 0: ZZ - zZZ - 0: ZZ - bZZ - 0: ZZ - BZZ - 0: ZZ - *** Failers -No match - ZZ -No match - abXYZZ -No match - zzz -No match - bzz -No match - -/(? 
- 3: f - 1G a (1) - 2G (0) - 3G f (1) -get substring 4 failed -7 - 0L adef - 1L a - 2L - 3L f - bcdef\G1\G2\G3\G4\L - 0: bcdef - 1: bc - 2: bc - 3: f - 1G bc (2) - 2G bc (2) - 3G f (1) -get substring 4 failed -7 - 0L bcdef - 1L bc - 2L bc - 3L f - adefghijk\C0 - 0: adef - 1: a - 2: - 3: f - 0C adef (4) - -/^abc\00def/I -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - abc\00def\L\C0 - 0: abc\x00def - 0C abc\x00def (7) - 0L abc - -/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ -)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ -)?)?)?)?)?)?)?)?)?otherword/I -Capturing subpattern count = 8 -Contains explicit CR or LF match -No options -First char = 'w' -Need char = 'd' - -/.*X/IDZ ------------------------------------------------------------------- - Bra - Any* - X - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char at start or follows newline -Need char = 'X' - -/.*X/IDZs ------------------------------------------------------------------- - Bra - AllAny* - X - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored dotall -No first char -Need char = 'X' - -/(.*X|^B)/IDZ ------------------------------------------------------------------- - Bra - CBra 1 - Any* - X - Alt - ^ - B - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -No options -First char at start or follows newline -No need char - -/(.*X|^B)/IDZs ------------------------------------------------------------------- - Bra - CBra 1 - AllAny* - X - Alt - ^ - B - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: anchored dotall -No first char -No need char - -/(?s)(.*X|^B)/IDZ 
------------------------------------------------------------------- - Bra - CBra 1 - AllAny* - X - Alt - ^ - B - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - -/(?s:.*X|^B)/IDZ ------------------------------------------------------------------- - Bra - Bra - AllAny* - X - Alt - ^ - B - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/\Biss\B/I+ -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = 'i' -Need char = 's' - Mississippi - 0: iss - 0+ issippi - -/iss/IG+ -Capturing subpattern count = 0 -No options -First char = 'i' -Need char = 's' - Mississippi - 0: iss - 0+ issippi - 0: iss - 0+ ippi - -/\Biss\B/IG+ -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = 'i' -Need char = 's' - Mississippi - 0: iss - 0+ issippi - -/\Biss\B/Ig+ -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = 'i' -Need char = 's' - Mississippi - 0: iss - 0+ issippi - 0: iss - 0+ ippi - *** Failers -No match - Mississippi\A -No match - -/(?<=[Ms])iss/Ig+ -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = 'i' -Need char = 's' - Mississippi - 0: iss - 0+ issippi - 0: iss - 0+ ippi - -/(?<=[Ms])iss/IG+ -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = 'i' -Need char = 's' - Mississippi - 0: iss - 0+ issippi - -/^iss/Ig+ -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - ississippi - 0: iss - 0+ issippi - -/.*iss/Ig+ -Capturing subpattern count = 0 -No options -First char at start or follows newline -Need char = 's' - abciss\nxyzisspqr - 0: abciss - 0+ \x0axyzisspqr - 0: xyziss - 0+ pqr - -/.i./I+g -Capturing subpattern count = 0 -No options -No first char -Need char = 'i' - Mississippi - 0: Mis - 0+ 
sissippi - 0: sis - 0+ sippi - 0: sip - 0+ pi - Mississippi\A - 0: Mis - 0+ sissippi - 0: sis - 0+ sippi - 0: sip - 0+ pi - Missouri river - 0: Mis - 0+ souri river - 0: ri - 0+ river - 0: riv - 0+ er - Missouri river\A - 0: Mis - 0+ souri river - -/^.is/I+g -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - Mississippi - 0: Mis - 0+ sissippi - -/^ab\n/Ig+ -Capturing subpattern count = 0 -Contains explicit CR or LF match -Options: anchored -No first char -No need char - ab\nab\ncd - 0: ab\x0a - 0+ ab\x0acd - -/^ab\n/Img+ -Capturing subpattern count = 0 -Contains explicit CR or LF match -Options: multiline -First char at start or follows newline -Need char = \x0a - ab\nab\ncd - 0: ab\x0a - 0+ ab\x0acd - 0: ab\x0a - 0+ cd - -/abc/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - -/abc|bac/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'c' - -/(abc|bac)/I -Capturing subpattern count = 1 -No options -No first char -Need char = 'c' - -/(abc|(c|dc))/I -Capturing subpattern count = 2 -No options -No first char -Need char = 'c' - -/(abc|(d|de)c)/I -Capturing subpattern count = 2 -No options -No first char -Need char = 'c' - -/a*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - -/a+/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/(baa|a+)/I -Capturing subpattern count = 1 -No options -No first char -Need char = 'a' - -/a{0,3}/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - -/baa{3,}/I -Capturing subpattern count = 0 -No options -First char = 'b' -Need char = 'a' - -/"([^\\"]+|\\.)*"/I -Capturing subpattern count = 1 -No options -First char = '"' -Need char = '"' - -/(abc|ab[cd])/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/(a|.)/I -Capturing subpattern count = 1 -No options -No first char -No need char - 
-/a|ba|\w/I -Capturing subpattern count = 0 -No options -No first char -No need char - -/abc(?=pqr)/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'r' - -/...(?<=abc)/I -Capturing subpattern count = 0 -Max lookbehind = 3 -No options -No first char -No need char - -/abc(?!pqr)/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - -/ab./I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/ab[xyz]/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/abc*/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/ab.c*/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/a.c*/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/.c*/I -Capturing subpattern count = 0 -No options -No first char -No need char - -/ac*/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/(a.c*|b.c*)/I -Capturing subpattern count = 1 -No options -No first char -No need char - -/a.c*|aba/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/.+a/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'a' - -/(?=abcda)a.*/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'a' - -/(?=a)a.*/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/a(b)*/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/a\d*/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/ab\d*/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/a(\d)*/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/abcde{0,0}/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'd' - -/ab\d+/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char 
= 'b' - -/a(?(1)b)(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -No need char - -/a(?(1)bag|big)(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -Need char = 'g' - -/a(?(1)bag|big)*(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -No need char - -/a(?(1)bag|big)+(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -Need char = 'g' - -/a(?(1)b..|b..)(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -Need char = 'b' - -/ab\d{0}e/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'e' - -/a?b?/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - a - 0: a - b - 0: b - ab - 0: ab - \ - 0: - *** Failers - 0: - \N -No match - -/|-/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - abcd - 0: - -abc - 0: - \Nab-c - 0: - - *** Failers - 0: - \Nabc -No match - -/^.?abcd/IS -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'd' -Subject length lower bound = 4 -No starting char list - -/\( # ( at start - (?: # Non-capturing bracket - (?>[^()]+) # Either a sequence of non-brackets (no backtracking) - | # Or - (?R) # Recurse - i.e. 
nested bracketed string - )* # Zero or more contents - \) # Closing ) - /Ix -Capturing subpattern count = 0 -Options: extended -First char = '(' -Need char = ')' - (abcd) - 0: (abcd) - (abcd)xyz - 0: (abcd) - xyz(abcd) - 0: (abcd) - (ab(xy)cd)pqr - 0: (ab(xy)cd) - (ab(xycd)pqr - 0: (xycd) - () abc () - 0: () - 12(abcde(fsh)xyz(foo(bar))lmno)89 - 0: (abcde(fsh)xyz(foo(bar))lmno) - *** Failers -No match - abcd -No match - abcd) -No match - (abcd -No match - -/\( ( (?>[^()]+) | (?R) )* \) /Ixg -Capturing subpattern count = 1 -Options: extended -First char = '(' -Need char = ')' - (ab(xy)cd)pqr - 0: (ab(xy)cd) - 1: cd - 1(abcd)(x(y)z)pqr - 0: (abcd) - 1: abcd - 0: (x(y)z) - 1: z - -/\( (?: (?>[^()]+) | (?R) ) \) /Ix -Capturing subpattern count = 0 -Options: extended -First char = '(' -Need char = ')' - (abcd) - 0: (abcd) - (ab(xy)cd) - 0: (xy) - (a(b(c)d)e) - 0: (c) - ((ab)) - 0: ((ab)) - *** Failers -No match - () -No match - -/\( (?: (?>[^()]+) | (?R) )? \) /Ix -Capturing subpattern count = 0 -Options: extended -First char = '(' -Need char = ')' - () - 0: () - 12(abcde(fsh)xyz(foo(bar))lmno)89 - 0: (fsh) - -/\( ( (?>[^()]+) | (?R) )* \) /Ix -Capturing subpattern count = 1 -Options: extended -First char = '(' -Need char = ')' - (ab(xy)cd) - 0: (ab(xy)cd) - 1: cd - -/\( ( ( (?>[^()]+) | (?R) )* ) \) /Ix -Capturing subpattern count = 2 -Options: extended -First char = '(' -Need char = ')' - (ab(xy)cd) - 0: (ab(xy)cd) - 1: ab(xy)cd - 2: cd - -/\( (123)? ( ( (?>[^()]+) | (?R) )* ) \) /Ix -Capturing subpattern count = 3 -Options: extended -First char = '(' -Need char = ')' - (ab(xy)cd) - 0: (ab(xy)cd) - 1: - 2: ab(xy)cd - 3: cd - (123ab(xy)cd) - 0: (123ab(xy)cd) - 1: 123 - 2: ab(xy)cd - 3: cd - -/\( ( (123)? 
( (?>[^()]+) | (?R) )* ) \) /Ix -Capturing subpattern count = 3 -Options: extended -First char = '(' -Need char = ')' - (ab(xy)cd) - 0: (ab(xy)cd) - 1: ab(xy)cd - 2: - 3: cd - (123ab(xy)cd) - 0: (123ab(xy)cd) - 1: 123ab(xy)cd - 2: 123 - 3: cd - -/\( (((((((((( ( (?>[^()]+) | (?R) )* )))))))))) \) /Ix -Capturing subpattern count = 11 -Options: extended -First char = '(' -Need char = ')' - (ab(xy)cd) - 0: (ab(xy)cd) - 1: ab(xy)cd - 2: ab(xy)cd - 3: ab(xy)cd - 4: ab(xy)cd - 5: ab(xy)cd - 6: ab(xy)cd - 7: ab(xy)cd - 8: ab(xy)cd - 9: ab(xy)cd -10: ab(xy)cd -11: cd - -/\( ( ( (?>[^()<>]+) | ((?>[^()]+)) | (?R) )* ) \) /Ix -Capturing subpattern count = 3 -Options: extended -First char = '(' -Need char = ')' - (abcd(xyz

    qrs)123) - 0: (abcd(xyz

    qrs)123) - 1: abcd(xyz

    qrs)123 - 2: 123 - -/\( ( ( (?>[^()]+) | ((?R)) )* ) \) /Ix -Capturing subpattern count = 3 -Options: extended -First char = '(' -Need char = ')' - (ab(cd)ef) - 0: (ab(cd)ef) - 1: ab(cd)ef - 2: ef - 3: (cd) - (ab(cd(ef)gh)ij) - 0: (ab(cd(ef)gh)ij) - 1: ab(cd(ef)gh)ij - 2: ij - 3: (cd(ef)gh) - -/^[[:alnum:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [0-9A-Za-z] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:^alnum:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-/:-@[-`{-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:alpha:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [A-Za-z] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:^alpha:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-@[-`{-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/[_[:alpha:]]/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z - _ a b c d e f g h i j k l m n o p q r s t u v w x y z - -/^[[:ascii:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-\x7f] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:^ascii:]]/DZ 
------------------------------------------------------------------- - Bra - ^ - [\x80-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:blank:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x09 ] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:^blank:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-\x08\x0a-\x1f!-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/[\n\x0b\x0c\x0d[:blank:]]/IS -Capturing subpattern count = 0 -Contains explicit CR or LF match -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 - -/^[[:cntrl:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-\x1f\x7f] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:digit:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [0-9] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:graph:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [!-~] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:lower:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [a-z] - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:print:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [ -~] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:punct:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [!-/:-@[-`{-~] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:space:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x09-\x0d ] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:upper:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [A-Z] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:xdigit:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [0-9A-Fa-f] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:word:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [0-9A-Z_a-z] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:^cntrl:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [ -~\x80-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern 
count = 0 -Options: anchored -No first char -No need char - -/^[12[:^digit:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-/12:-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/^[[:^blank:]]/DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-\x08\x0a-\x1f!-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/[01[:alpha:]%]/DZ ------------------------------------------------------------------- - Bra - [%01A-Za-z] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[[.ch.]]/I -Failed: POSIX collating elements are not supported at offset 1 - -/[[=ch=]]/I -Failed: POSIX collating elements are not supported at offset 1 - -/[[:rhubarb:]]/I -Failed: unknown POSIX class name at offset 3 - -/[[:upper:]]/Ii -Capturing subpattern count = 0 -Options: caseless -No first char -No need char - A - 0: A - a - 0: a - -/[[:lower:]]/Ii -Capturing subpattern count = 0 -Options: caseless -No first char -No need char - A - 0: A - a - 0: a - -/((?-i)[[:lower:]])[[:lower:]]/Ii -Capturing subpattern count = 1 -Options: caseless -No first char -No need char - ab - 0: ab - 1: a - aB - 0: aB - 1: a - *** Failers - 0: ai - 1: a - Ab -No match - AB -No match - -/[\200-\110]/I -Failed: range out of order in character class at offset 9 - -/^(?(0)f|b)oo/I -Failed: invalid condition (?(0) at offset 6 - -/This one's here because of the large output vector needed/I -Capturing subpattern count = 0 -No options -First char = 'T' -Need char = 'd' - 
-/(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s
|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\w+)\s+(\270)/I -Capturing subpattern count = 271 -Max back reference = 270 -No options -No first char -No need char - \O900 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 
121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC - 0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC - 1: 1 - 2: 2 - 3: 3 - 4: 4 - 5: 5 - 6: 6 - 7: 7 - 8: 8 - 9: 9 -10: 10 -11: 11 -12: 12 -13: 13 -14: 14 -15: 15 -16: 16 -17: 17 -18: 18 -19: 19 -20: 20 -21: 21 -22: 22 -23: 23 -24: 24 -25: 25 -26: 26 -27: 27 -28: 28 -29: 29 -30: 30 -31: 31 -32: 32 -33: 33 -34: 34 -35: 35 -36: 36 -37: 37 -38: 38 -39: 39 -40: 40 -41: 41 -42: 42 -43: 43 -44: 44 -45: 45 -46: 46 -47: 47 -48: 48 -49: 49 -50: 50 -51: 51 -52: 52 -53: 53 
-54: 54 -55: 55 -56: 56 -57: 57 -58: 58 -59: 59 -60: 60 -61: 61 -62: 62 -63: 63 -64: 64 -65: 65 -66: 66 -67: 67 -68: 68 -69: 69 -70: 70 -71: 71 -72: 72 -73: 73 -74: 74 -75: 75 -76: 76 -77: 77 -78: 78 -79: 79 -80: 80 -81: 81 -82: 82 -83: 83 -84: 84 -85: 85 -86: 86 -87: 87 -88: 88 -89: 89 -90: 90 -91: 91 -92: 92 -93: 93 -94: 94 -95: 95 -96: 96 -97: 97 -98: 98 -99: 99 -100: 100 -101: 101 -102: 102 -103: 103 -104: 104 -105: 105 -106: 106 -107: 107 -108: 108 -109: 109 -110: 110 -111: 111 -112: 112 -113: 113 -114: 114 -115: 115 -116: 116 -117: 117 -118: 118 -119: 119 -120: 120 -121: 121 -122: 122 -123: 123 -124: 124 -125: 125 -126: 126 -127: 127 -128: 128 -129: 129 -130: 130 -131: 131 -132: 132 -133: 133 -134: 134 -135: 135 -136: 136 -137: 137 -138: 138 -139: 139 -140: 140 -141: 141 -142: 142 -143: 143 -144: 144 -145: 145 -146: 146 -147: 147 -148: 148 -149: 149 -150: 150 -151: 151 -152: 152 -153: 153 -154: 154 -155: 155 -156: 156 -157: 157 -158: 158 -159: 159 -160: 160 -161: 161 -162: 162 -163: 163 -164: 164 -165: 165 -166: 166 -167: 167 -168: 168 -169: 169 -170: 170 -171: 171 -172: 172 -173: 173 -174: 174 -175: 175 -176: 176 -177: 177 -178: 178 -179: 179 -180: 180 -181: 181 -182: 182 -183: 183 -184: 184 -185: 185 -186: 186 -187: 187 -188: 188 -189: 189 -190: 190 -191: 191 -192: 192 -193: 193 -194: 194 -195: 195 -196: 196 -197: 197 -198: 198 -199: 199 -200: 200 -201: 201 -202: 202 -203: 203 -204: 204 -205: 205 -206: 206 -207: 207 -208: 208 -209: 209 -210: 210 -211: 211 -212: 212 -213: 213 -214: 214 -215: 215 -216: 216 -217: 217 -218: 218 -219: 219 -220: 220 -221: 221 -222: 222 -223: 223 -224: 224 -225: 225 -226: 226 -227: 227 -228: 228 -229: 229 -230: 230 -231: 231 -232: 232 -233: 233 -234: 234 -235: 235 -236: 236 -237: 237 -238: 238 -239: 239 -240: 240 -241: 241 -242: 242 -243: 243 -244: 244 -245: 245 -246: 246 -247: 247 -248: 248 -249: 249 -250: 250 -251: 251 -252: 252 -253: 253 -254: 254 -255: 255 -256: 256 -257: 257 -258: 258 -259: 259 -260: 260 -261: 261 -262: 262 
-263: 263 -264: 264 -265: 265 -266: 266 -267: 267 -268: 268 -269: 269 -270: ABC -271: ABC - -/This one's here because Perl does this differently and PCRE can't at present/I -Capturing subpattern count = 0 -No options -First char = 'T' -Need char = 't' - -/(main(O)?)+/I -Capturing subpattern count = 2 -No options -First char = 'm' -Need char = 'n' - mainmain - 0: mainmain - 1: main - mainOmain - 0: mainOmain - 1: main - 2: O - -/These are all cases where Perl does it differently (nested captures)/I -Capturing subpattern count = 1 -No options -First char = 'T' -Need char = 's' - -/^(a(b)?)+$/I -Capturing subpattern count = 2 -Options: anchored -No first char -No need char - aba - 0: aba - 1: a - 2: b - -/^(aa(bb)?)+$/I -Capturing subpattern count = 2 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: aa - 2: bb - -/^(aa|aa(bb))+$/I -Capturing subpattern count = 2 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: aa - 2: bb - -/^(aa(bb)??)+$/I -Capturing subpattern count = 2 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: aa - 2: bb - -/^(?:aa(bb)?)+$/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: bb - -/^(aa(b(b))?)+$/I -Capturing subpattern count = 3 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: aa - 2: bb - 3: b - -/^(?:aa(b(b))?)+$/I -Capturing subpattern count = 2 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: bb - 2: b - -/^(?:aa(b(?:b))?)+$/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: bb - -/^(?:aa(bb(?:b))?)+$/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - aabbbaa - 0: aabbbaa - 1: bbb - -/^(?:aa(b(?:bb))?)+$/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - aabbbaa - 0: aabbbaa - 1: bbb - -/^(?:aa(?:b(b))?)+$/I -Capturing subpattern count = 1 
-Options: anchored -No first char -No need char - aabbaa - 0: aabbaa - 1: b - -/^(?:aa(?:b(bb))?)+$/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - aabbbaa - 0: aabbbaa - 1: bb - -/^(aa(b(bb))?)+$/I -Capturing subpattern count = 3 -Options: anchored -No first char -No need char - aabbbaa - 0: aabbbaa - 1: aa - 2: bbb - 3: bb - -/^(aa(bb(bb))?)+$/I -Capturing subpattern count = 3 -Options: anchored -No first char -No need char - aabbbbaa - 0: aabbbbaa - 1: aa - 2: bbbb - 3: bb - -/--------------------------------------------------------------------/I -Capturing subpattern count = 0 -No options -First char = '-' -Need char = '-' - -/#/IxDZ ------------------------------------------------------------------- - Bra - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -May match empty string -Options: extended -No first char -No need char - -/a#/IxDZ ------------------------------------------------------------------- - Bra - a - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: extended -First char = 'a' -No need char - -/[\s]/DZ ------------------------------------------------------------------- - Bra - [\x09-\x0d ] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\S]/DZ ------------------------------------------------------------------- - Bra - [\x00-\x08\x0e-\x1f!-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/a(?i)b/DZ ------------------------------------------------------------------- - Bra - a - /i b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' (caseless) - ab - 0: 
ab - aB - 0: aB - *** Failers -No match - AB -No match - -/(a(?i)b)/DZ ------------------------------------------------------------------- - Bra - CBra 1 - a - /i b - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'b' (caseless) - ab - 0: ab - 1: ab - aB - 0: aB - 1: aB - *** Failers -No match - AB -No match - -/ (?i)abc/IxDZ ------------------------------------------------------------------- - Bra - /i abc - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: extended -First char = 'a' (caseless) -Need char = 'c' (caseless) - -/#this is a comment - (?i)abc/IxDZ ------------------------------------------------------------------- - Bra - /i abc - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: extended -First char = 'a' (caseless) -Need char = 'c' (caseless) - -/123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/DZ ------------------------------------------------------------------- - Bra - 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = '1' -Need char = '0' - 
-/\Q123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/DZ ------------------------------------------------------------------- - Bra - 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = '1' -Need char = '0' - -/\Q\E/DZ ------------------------------------------------------------------- - Bra - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - \ - 0: - -/\Q\Ex/DZ ------------------------------------------------------------------- - Bra - x - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'x' -No need char - -/ \Q\E/DZ ------------------------------------------------------------------- - Bra - - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = ' ' -No need char - -/a\Q\E/DZ ------------------------------------------------------------------- - Bra - a - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - abc - 0: a - bca - 0: a - bac - 0: a - -/a\Q\Eb/DZ ------------------------------------------------------------------- - Bra - ab - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - abc - 0: ab - -/\Q\Eabc/DZ ------------------------------------------------------------------- - Bra - abc - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - -/x*+\w/DZ ------------------------------------------------------------------- - Bra - x*+ - \w - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - *** Failers - 0: F - xxxxx -No match - -/x?+/DZ ------------------------------------------------------------------- - Bra - x?+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - -/x++/DZ ------------------------------------------------------------------- - Bra - x++ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'x' -No need char - -/x{1,3}+/BZO ------------------------------------------------------------------- - Bra - x - x{0,2}+ - Ket - End ------------------------------------------------------------------- - -/x{1,3}+/BZOi ------------------------------------------------------------------- - Bra - /i x - /i x{0,2}+ - Ket - End ------------------------------------------------------------------- - -/[^x]{1,3}+/BZO ------------------------------------------------------------------- - Bra - [^x] - [^x]{0,2}+ - Ket - End ------------------------------------------------------------------- - -/[^x]{1,3}+/BZOi ------------------------------------------------------------------- - Bra - /i [^x] - /i [^x]{0,2}+ - Ket - End ------------------------------------------------------------------- - 
-/(x)*+/DZ ------------------------------------------------------------------- - Bra - Braposzero - CBraPos 1 - x - KetRpos - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -May match empty string -No options -No first char -No need char - -/^(\w++|\s++)*$/I -Capturing subpattern count = 1 -May match empty string -Options: anchored -No first char -No need char - now is the time for all good men to come to the aid of the party - 0: now is the time for all good men to come to the aid of the party - 1: party - *** Failers -No match - this is not a line with only words and spaces! -No match - -/(\d++)(\w)/I -Capturing subpattern count = 2 -No options -No first char -No need char - 12345a - 0: 12345a - 1: 12345 - 2: a - *** Failers -No match - 12345+ -No match - -/a++b/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - aaab - 0: aaab - -/(a++b)/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'b' - aaab - 0: aaab - 1: aaab - -/(a++)b/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'b' - aaab - 0: aaab - 1: aaa - -/([^()]++|\([^()]*\))+/I -Capturing subpattern count = 1 -No options -No first char -No need char - ((abc(ade)ufh()()x - 0: abc(ade)ufh()()x - 1: x - -/\(([^()]++|\([^()]+\))+\)/I -Capturing subpattern count = 1 -No options -First char = '(' -Need char = ')' - (abc) - 0: (abc) - 1: abc - (abc(def)xyz) - 0: (abc(def)xyz) - 1: xyz - *** Failers -No match - ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -No match - -/(abc){1,3}+/DZ ------------------------------------------------------------------- - Bra - Once - CBra 1 - abc - Ket - Brazero - Bra - CBra 1 - abc - Ket - Brazero - CBra 1 - abc - Ket - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'c' - -/a+?+/I -Failed: nothing to repeat at offset 
3 - -/a{2,3}?+b/I -Failed: nothing to repeat at offset 7 - -/(?U)a+?+/I -Failed: nothing to repeat at offset 7 - -/a{2,3}?+b/IU -Failed: nothing to repeat at offset 7 - -/x(?U)a++b/DZ ------------------------------------------------------------------- - Bra - x - a++ - b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'x' -Need char = 'b' - xaaaab - 0: xaaaab - -/(?U)xa++b/DZ ------------------------------------------------------------------- - Bra - x - a++ - b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'x' -Need char = 'b' - xaaaab - 0: xaaaab - -/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/DZ ------------------------------------------------------------------- - Bra - ^ - CBra 1 - CBra 2 - a+ - Ket - CBra 3 - [ab]+? - Ket - CBra 4 - [bc]+ - Ket - CBra 5 - \w*+ - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 5 -Options: anchored -No first char -No need char - -/^x(?U)a+b/DZ ------------------------------------------------------------------- - Bra - ^ - x - a++ - b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'b' - -/^x(?U)(a+)b/DZ ------------------------------------------------------------------- - Bra - ^ - x - CBra 1 - a+? 
- Ket - b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: anchored -No first char -Need char = 'b' - -/[.x.]/I -Failed: POSIX collating elements are not supported at offset 0 - -/[=x=]/I -Failed: POSIX collating elements are not supported at offset 0 - -/[:x:]/I -Failed: POSIX named classes are supported only within a class at offset 0 - -/\l/I -Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1 - -/\L/I -Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1 - -/\N{name}/I -Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1 - -/\u/I -Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1 - -/\U/I -Failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1 - -/a{1,3}b/U - ab - 0: ab - -/[/I -Failed: missing terminating ] for character class at offset 1 - -/[a-/I -Failed: missing terminating ] for character class at offset 3 - -/[[:space:]/I -Failed: missing terminating ] for character class at offset 10 - -/[\s]/IDZ ------------------------------------------------------------------- - Bra - [\x09-\x0d ] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[[:space:]]/IDZ ------------------------------------------------------------------- - Bra - [\x09-\x0d ] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[[:space:]abcde]/IDZ ------------------------------------------------------------------- - Bra - [\x09-\x0d a-e] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >/Ix -Capturing subpattern count = 0 -Options: extended -First char = '<' -Need char = '>' - <> - 0: 
<> - - 0: - hij> - 0: hij> - hij> - 0: - def> - 0: def> - - 0: <> - *** Failers -No match - iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|IDZ ------------------------------------------------------------------- - Bra - 8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X - \b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = '8' -Need char = 'X' - -|\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|IDZ ------------------------------------------------------------------- - Bra - 
$<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X - \b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = '$' -Need char = 'X' - -/(.*)\d+\1/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char - -/(.*)\d+/I -Capturing subpattern count = 1 -No options -First char at start or follows newline -No need char - -/(.*)\d+\1/Is -Capturing subpattern count = 1 -Max back reference = 1 -Options: dotall -No first char -No need char - -/(.*)\d+/Is -Capturing subpattern count = 1 -Options: anchored dotall -No first char -No need char - -/(.*(xyz))\d+\2/I -Capturing subpattern count = 2 -Max back reference = 2 -No options -First char at start or follows newline -Need char = 'z' - -/((.*))\d+\1/I -Capturing subpattern count = 2 -Max back reference = 1 -No options -No first char -No need char - abc123bc - 0: bc123bc - 1: bc - 2: bc - -/a[b]/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/(?=a).*/I -Capturing subpattern count = 0 -May match empty string -No options -First char = 'a' -No need char - -/(?=abc).xyz/IiI -Capturing subpattern count = 0 -Options: caseless -First char = 'a' (caseless) -Need char = 'z' (caseless) - -/(?=abc)(?i).xyz/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'z' (caseless) - -/(?=a)(?=b)/I -Capturing subpattern count = 0 -May match empty string -No options -First char = 'a' -No need char - -/(?=.)a/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/((?=abcda)a)/I -Capturing 
subpattern count = 1 -No options -First char = 'a' -Need char = 'a' - -/((?=abcda)ab)/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'b' - -/()a/I -Capturing subpattern count = 1 -No options -No first char -Need char = 'a' - -/(?(1)ab|ac)(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -No need char - -/(?(1)abz|acz)(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' -Need char = 'z' - -/(?(1)abz)(.)/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char - -/(?(1)abz)(1)23/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -Need char = '3' - -/(a)+/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/(a){2,3}/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'a' - -/(a)*/I -Capturing subpattern count = 1 -May match empty string -No options -No first char -No need char - -/[a]/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/[ab]/I -Capturing subpattern count = 0 -No options -No first char -No need char - -/[ab]/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b - -/[^a]/I -Capturing subpattern count = 0 -No options -No first char -No need char - -/\d456/I -Capturing subpattern count = 0 -No options -No first char -Need char = '6' - -/\d456/IS -Capturing subpattern count = 0 -No options -No first char -Need char = '6' -Subject length lower bound = 4 -Starting chars: 0 1 2 3 4 5 6 7 8 9 - -/a^b/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/^a/Im -Capturing subpattern count = 0 -Options: multiline -First char at start or follows newline -Need char = 'a' - abcde - 0: a - xy\nabc - 0: a - *** Failers -No match - xyabc -No match - -/c|abc/I -Capturing subpattern count = 0 -No 
options -No first char -Need char = 'c' - -/(?i)[ab]/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: A B a b - -/[ab](?i)cd/IS -Capturing subpattern count = 0 -No options -No first char -Need char = 'd' (caseless) -Subject length lower bound = 3 -Starting chars: a b - -/abc(?C)def/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'f' - abcdef ---->abcdef - 0 ^ ^ d - 0: abcdef - 1234abcdef ---->1234abcdef - 0 ^ ^ d - 0: abcdef - *** Failers -No match - abcxyz -No match - abcxyzf ---->abcxyzf - 0 ^ ^ d -No match - -/abc(?C)de(?C1)f/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'f' - 123abcdef ---->123abcdef - 0 ^ ^ d - 1 ^ ^ f - 0: abcdef - -/(?C1)\dabc(?C2)def/IS -Capturing subpattern count = 0 -No options -No first char -Need char = 'f' -Subject length lower bound = 7 -Starting chars: 0 1 2 3 4 5 6 7 8 9 - 1234abcdef ---->1234abcdef - 1 ^ \d - 1 ^ \d - 1 ^ \d - 1 ^ \d - 2 ^ ^ d - 0: 4abcdef - *** Failers -No match - abcdef -No match - -/(?C1)\dabc(?C2)def/ISS -Capturing subpattern count = 0 -No options -No first char -Need char = 'f' - 1234abcdef ---->1234abcdef - 1 ^ \d - 1 ^ \d - 1 ^ \d - 1 ^ \d - 2 ^ ^ d - 0: 4abcdef - *** Failers -No match - abcdef ---->abcdef - 1 ^ \d - 1 ^ \d - 1 ^ \d - 1 ^ \d - 1 ^ \d - 1 ^ \d -No match - -/(?C255)ab/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'b' - -/(?C256)ab/I -Failed: number after (?C is > 255 at offset 6 - -/(?Cab)xx/I -Failed: closing ) for (?C expected at offset 3 - -/(?C12vr)x/I -Failed: closing ) for (?C expected at offset 5 - -/abc(?C)def/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'f' - *** Failers -No match - \x83\x0\x61bcdef ---->\x83\x00abcdef - 0 ^ ^ d - 0: abcdef - -/(abc)(?C)de(?C1)f/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'f' - 123abcdef ---->123abcdef - 0 ^ ^ d - 1 ^ ^ f - 
0: abcdef - 1: abc - 123abcdef\C+ -Callout 0: last capture = 1 - 0: - 1: abc ---->123abcdef - ^ ^ d -Callout 1: last capture = 1 - 0: - 1: abc ---->123abcdef - ^ ^ f - 0: abcdef - 1: abc - 123abcdef\C- - 0: abcdef - 1: abc - *** Failers -No match - 123abcdef\C!1 ---->123abcdef - 0 ^ ^ d - 1 ^ ^ f -No match - -/(?C0)(abc(?C1))*/I -Capturing subpattern count = 1 -May match empty string -No options -No first char -No need char - abcabcabc ---->abcabcabc - 0 ^ (abc(?C1))* - 1 ^ ^ ) - 1 ^ ^ ) - 1 ^ ^ ) - 0: abcabcabc - 1: abc - abcabc\C!1!3 ---->abcabc - 0 ^ (abc(?C1))* - 1 ^ ^ ) - 1 ^ ^ ) - 0: abcabc - 1: abc - *** Failers ---->*** Failers - 0 ^ (abc(?C1))* - 0: - abcabcabc\C!1!3 ---->abcabcabc - 0 ^ (abc(?C1))* - 1 ^ ^ ) - 1 ^ ^ ) - 1 ^ ^ ) - 0: abcabc - 1: abc - -/(\d{3}(?C))*/I -Capturing subpattern count = 1 -May match empty string -No options -No first char -No need char - 123\C+ -Callout 0: last capture = -1 - 0: ---->123 - ^ ^ ) - 0: 123 - 1: 123 - 123456\C+ -Callout 0: last capture = -1 - 0: ---->123456 - ^ ^ ) -Callout 0: last capture = 1 - 0: - 1: 123 ---->123456 - ^ ^ ) - 0: 123456 - 1: 456 - 123456789\C+ -Callout 0: last capture = -1 - 0: ---->123456789 - ^ ^ ) -Callout 0: last capture = 1 - 0: - 1: 123 ---->123456789 - ^ ^ ) -Callout 0: last capture = 1 - 0: - 1: 456 ---->123456789 - ^ ^ ) - 0: 123456789 - 1: 789 - -/((xyz)(?C)p|(?C1)xyzabc)/I -Capturing subpattern count = 2 -No options -First char = 'x' -No need char - xyzabc\C+ -Callout 0: last capture = 2 - 0: - 1: - 2: xyz ---->xyzabc - ^ ^ p -Callout 1: last capture = -1 - 0: ---->xyzabc - ^ x - 0: xyzabc - 1: xyzabc - -/(X)((xyz)(?C)p|(?C1)xyzabc)/I -Capturing subpattern count = 3 -No options -First char = 'X' -Need char = 'x' - Xxyzabc\C+ -Callout 0: last capture = 3 - 0: - 1: X - 2: - 3: xyz ---->Xxyzabc - ^ ^ p -Callout 1: last capture = 1 - 0: - 1: X ---->Xxyzabc - ^^ x - 0: Xxyzabc - 1: X - 2: xyzabc - -/(?=(abc))(?C)abcdef/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need 
char = 'f' - abcdef\C+ -Callout 0: last capture = 1 - 0: - 1: abc ---->abcdef - ^ a - 0: abcdef - 1: abc - -/(?!(abc)(?C1)d)(?C2)abcxyz/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'z' - abcxyz\C+ -Callout 1: last capture = 1 - 0: - 1: abc ---->abcxyz - ^ ^ d -Callout 2: last capture = -1 - 0: ---->abcxyz - ^ a - 0: abcxyz - -/(?<=(abc)(?C))xyz/I -Capturing subpattern count = 1 -Max lookbehind = 3 -No options -First char = 'x' -Need char = 'z' - abcxyz\C+ -Callout 0: last capture = 1 - 0: - 1: abc ---->abcxyz - ^ ) - 0: xyz - 1: abc - -/a(b+)(c*)(?C1)/I -Capturing subpattern count = 2 -No options -First char = 'a' -Need char = 'b' - abbbbbccc\C*1 ---->abbbbbccc - 1 ^ ^ -Callout data = 1 -No match - -/a(b+?)(c*?)(?C1)/I -Capturing subpattern count = 2 -No options -First char = 'a' -Need char = 'b' - abbbbbccc\C*1 ---->abbbbbccc - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 - 1 ^ ^ -Callout data = 1 -No match - -/(?C)abc/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - -/(?C)^abc/I -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/(?C)a|b/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b - -/(?R)/I -Failed: recursive call could loop indefinitely at offset 3 - -/(a|(?R))/I -Failed: recursive call could loop indefinitely at offset 6 - -/(ab|(bc|(de|(?R))))/I -Failed: recursive call could loop indefinitely at offset 15 - -/x(ab|(bc|(de|(?R))))/I -Capturing subpattern count = 3 -No options -First char = 'x' -No need char - xab - 0: xab - 1: ab - xbc - 0: xbc - 1: bc - 2: bc - xde - 0: xde - 1: de - 2: de - 3: de - xxab - 0: xxab - 1: xab - 2: xab - 3: xab - xxxab - 0: xxxab - 1: xxab - 2: xxab - 3: xxab - *** Failers -No match - xyab -No match - 
-/(ab|(bc|(de|(?1))))/I -Failed: recursive call could loop indefinitely at offset 15 - -/x(ab|(bc|(de|(?1)x)x)x)/I -Failed: recursive call could loop indefinitely at offset 16 - -/^([^()]|\((?1)*\))*$/I -Capturing subpattern count = 1 -May match empty string -Options: anchored -No first char -No need char - abc - 0: abc - 1: c - a(b)c - 0: a(b)c - 1: c - a(b(c))d - 0: a(b(c))d - 1: d - *** Failers) -No match - a(b(c)d -No match - -/^>abc>([^()]|\((?1)*\))*abc>123abc>123abc>1(2)3abc>1(2)3abc>(1(2)3)abc>(1(2)3)]*+) | (?2)) * >))/Ix -Capturing subpattern count = 2 -Options: extended -First char = '<' -Need char = '>' - <> - 0: <> - 1: <> - 2: <> - - 0: - 1: - 2: - hij> - 0: hij> - 1: hij> - 2: hij> - hij> - 0: - 1: - 2: - def> - 0: def> - 1: def> - 2: def> - - 0: <> - 1: <> - 2: <> - *** Failers -No match - b|c)d(?Pe)/DZ ------------------------------------------------------------------- - Bra - a - CBra 1 - b - Alt - c - Ket - d - CBra 2 - e - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -Named capturing subpatterns: - longername2 2 - name1 1 -No options -First char = 'a' -Need char = 'e' - abde - 0: abde - 1: b - 2: e - acde - 0: acde - 1: c - 2: e - -/(?:a(?Pc(?Pd)))(?Pa)/DZ ------------------------------------------------------------------- - Bra - Bra - a - CBra 1 - c - CBra 2 - d - Ket - Ket - Ket - CBra 3 - a - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 3 -Named capturing subpatterns: - a 3 - c 1 - d 2 -No options -First char = 'a' -Need char = 'a' - -/(?Pa)...(?P=a)bbb(?P>a)d/DZ ------------------------------------------------------------------- - Bra - CBra 1 - a - Ket - Any - Any - Any - \1 - bbb - Recurse - d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Max back reference = 1 -Named capturing subpatterns: - a 1 -No options -First char = 'a' -Need char 
= 'd' - -/^\W*(?:(?P(?P.)\W*(?P>one)\W*(?P=two)|)|(?P(?P.)\W*(?P>three)\W*(?P=four)|\W*.\W*))\W*$/Ii -Capturing subpattern count = 4 -Max back reference = 4 -Named capturing subpatterns: - four 4 - one 1 - three 3 - two 2 -May match empty string -Options: anchored caseless -No first char -No need char - 1221 - 0: 1221 - 1: 1221 - 2: 1 - Satan, oscillate my metallic sonatas! - 0: Satan, oscillate my metallic sonatas! - 1: - 2: - 3: Satan, oscillate my metallic sonatas - 4: S - A man, a plan, a canal: Panama! - 0: A man, a plan, a canal: Panama! - 1: - 2: - 3: A man, a plan, a canal: Panama - 4: A - Able was I ere I saw Elba. - 0: Able was I ere I saw Elba. - 1: - 2: - 3: Able was I ere I saw Elba - 4: A - *** Failers -No match - The quick brown fox -No match - -/((?(R)a|b))\1(?1)?/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char - bb - 0: bb - 1: b - bbaa - 0: bba - 1: b - -/(.*)a/Is -Capturing subpattern count = 1 -Options: anchored dotall -No first char -Need char = 'a' - -/(.*)a\1/Is -Capturing subpattern count = 1 -Max back reference = 1 -Options: dotall -No first char -Need char = 'a' - -/(.*)a(b)\2/Is -Capturing subpattern count = 2 -Max back reference = 2 -Options: anchored dotall -No first char -Need char = 'b' - -/((.*)a|(.*)b)z/Is -Capturing subpattern count = 3 -Options: anchored dotall -No first char -Need char = 'z' - -/((.*)a|(.*)b)z\1/Is -Capturing subpattern count = 3 -Max back reference = 1 -Options: dotall -No first char -Need char = 'z' - -/((.*)a|(.*)b)z\2/Is -Capturing subpattern count = 3 -Max back reference = 2 -Options: dotall -No first char -Need char = 'z' - -/((.*)a|(.*)b)z\3/Is -Capturing subpattern count = 3 -Max back reference = 3 -Options: dotall -No first char -Need char = 'z' - -/((.*)a|^(.*)b)z\3/Is -Capturing subpattern count = 3 -Max back reference = 3 -Options: anchored dotall -No first char -Need char = 'z' - 
-/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a/Is -Capturing subpattern count = 31 -May match empty string -Options: anchored dotall -No first char -No need char - -/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\31/Is -Capturing subpattern count = 31 -Max back reference = 31 -May match empty string -Options: dotall -No first char -No need char - -/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\32/Is -Capturing subpattern count = 32 -Max back reference = 32 -May match empty string -Options: dotall -No first char -No need char - -/(a)(bc)/INDZ ------------------------------------------------------------------- - Bra - Bra - a - Ket - Bra - bc - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: no_auto_capture -First char = 'a' -Need char = 'c' - abc - 0: abc - -/(?Pa)(bc)/INDZ ------------------------------------------------------------------- - Bra - CBra 1 - a - Ket - Bra - bc - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Named capturing subpatterns: - one 1 -Options: no_auto_capture -First char = 'a' -Need char = 'c' - abc - 0: abc - 1: a - -/(a)(?Pbc)/INDZ ------------------------------------------------------------------- - Bra - Bra - a - Ket - CBra 1 - bc - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Named capturing subpatterns: - named 1 -Options: no_auto_capture -First char = 'a' -Need char = 'c' - -/(a+)*zz/I -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' - 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\M -Minimum match() limit = 8 -Minimum match() recursion limit = 6 - 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz - 1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - aaaaaaaaaaaaaz\M -Minimum match() limit = 32768 -Minimum match() recursion limit = 29 -No match - -/(aaa(?C1)bbb|ab)/I -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'b' - aaabbb ---->aaabbb - 1 ^ ^ b - 0: aaabbb - 1: aaabbb - aaabbb\C*0 ---->aaabbb - 1 ^ ^ b - 0: aaabbb - 1: aaabbb - aaabbb\C*1 ---->aaabbb - 1 ^ ^ b -Callout data = 1 - 0: ab - 1: ab - aaabbb\C*-1 ---->aaabbb - 1 ^ ^ b -Callout data = -1 -No match - -/ab(?Pcd)ef(?Pgh)/I -Capturing subpattern count = 2 -Named capturing subpatterns: - one 1 - two 2 -No options -First char = 'a' -Need char = 'h' - abcdefgh - 0: abcdefgh - 1: cd - 2: gh - abcdefgh\C1\Gtwo - 0: abcdefgh - 1: cd - 2: gh - 1C cd (2) - G gh (2) two - abcdefgh\Cone\Ctwo - 0: abcdefgh - 1: cd - 2: gh - C cd (2) one - C gh (2) two - abcdefgh\Cthree -no parentheses with name "three" - 0: abcdefgh - 1: cd - 2: gh -copy substring three failed -7 - -/(?P)(?P)/DZ ------------------------------------------------------------------- - Bra - CBra 1 - Ket - CBra 2 - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -Named capturing subpatterns: - Tes 1 - Test 2 -May match empty string -No options -No first char -No need char - -/(?P)(?P)/DZ ------------------------------------------------------------------- - Bra - CBra 1 - Ket - CBra 2 - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -Named capturing subpatterns: - Tes 2 - Test 1 -May match empty string -No options -No first char -No need char - -/(?Pzz)(?Paa)/I -Capturing subpattern count = 2 -Named capturing subpatterns: - A 2 - Z 1 -No options -First char = 'z' 
-Need char = 'a' - zzaa\CZ - 0: zzaa - 1: zz - 2: aa - C zz (2) Z - zzaa\CA - 0: zzaa - 1: zz - 2: aa - C aa (2) A - -/(?Peks)(?Peccs)/I -Failed: two named subpatterns have the same name at offset 15 - -/(?Pabc(?Pdef)(?Pxyz))/I -Failed: two named subpatterns have the same name at offset 30 - -"\[((?P\d+)(,(?P>elem))*)\]"I -Capturing subpattern count = 3 -Named capturing subpatterns: - elem 2 -No options -First char = '[' -Need char = ']' - [10,20,30,5,5,4,4,2,43,23,4234] - 0: [10,20,30,5,5,4,4,2,43,23,4234] - 1: 10,20,30,5,5,4,4,2,43,23,4234 - 2: 10 - 3: ,4234 - *** Failers -No match - [] -No match - -"\[((?P\d+)(,(?P>elem))*)?\]"I -Capturing subpattern count = 3 -Named capturing subpatterns: - elem 2 -No options -First char = '[' -Need char = ']' - [10,20,30,5,5,4,4,2,43,23,4234] - 0: [10,20,30,5,5,4,4,2,43,23,4234] - 1: 10,20,30,5,5,4,4,2,43,23,4234 - 2: 10 - 3: ,4234 - [] - 0: [] - -/(a(b(?2)c))?/DZ ------------------------------------------------------------------- - Bra - Brazero - CBra 1 - a - CBra 2 - b - Recurse - c - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -No options -No first char -No need char - -/(a(b(?2)c))*/DZ ------------------------------------------------------------------- - Bra - Brazero - CBra 1 - a - CBra 2 - b - Recurse - c - Ket - KetRmax - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -No options -No first char -No need char - -/(a(b(?2)c)){0,2}/DZ ------------------------------------------------------------------- - Bra - Brazero - Bra - CBra 1 - a - CBra 2 - b - Recurse - c - Ket - Ket - Brazero - CBra 1 - a - CBra 2 - b - Recurse - c - Ket - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -No options -No first char -No need char - -/[ab]{1}+/DZ 
------------------------------------------------------------------- - Bra - [ab]{1,1}+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/Ii -Capturing subpattern count = 3 -Options: caseless -No first char -Need char = 'g' (caseless) - Baby Bjorn Active Carrier - With free SHIPPING!! - 0: Baby Bjorn Active Carrier - With free SHIPPING!! - 1: Baby Bjorn Active Carrier - With free SHIPPING!! - -/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/IiS -Capturing subpattern count = 3 -Options: caseless -No first char -Need char = 'g' (caseless) -Subject length lower bound = 8 -No starting char list - Baby Bjorn Active Carrier - With free SHIPPING!! - 0: Baby Bjorn Active Carrier - With free SHIPPING!! - 1: Baby Bjorn Active Carrier - With free SHIPPING!! - -/a*.*b/ISDZ ------------------------------------------------------------------- - Bra - a* - Any* - b - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -Need char = 'b' -Subject length lower bound = 1 -No starting char list - -/(a|b)*.?c/ISDZ ------------------------------------------------------------------- - Bra - Brazero - CBra 1 - a - Alt - b - KetRmax - Any? 
- c - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -No options -No first char -Need char = 'c' -Subject length lower bound = 1 -No starting char list - -/abc(?C255)de(?C)f/DZ ------------------------------------------------------------------- - Bra - abc - Callout 255 10 1 - de - Callout 0 16 1 - f - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'f' - -/abcde/ICDZ ------------------------------------------------------------------- - Bra - Callout 255 0 1 - a - Callout 255 1 1 - b - Callout 255 2 1 - c - Callout 255 3 1 - d - Callout 255 4 1 - e - Callout 255 5 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: -First char = 'a' -Need char = 'e' - abcde ---->abcde - +0 ^ a - +1 ^^ b - +2 ^ ^ c - +3 ^ ^ d - +4 ^ ^ e - +5 ^ ^ - 0: abcde - abcdfe ---->abcdfe - +0 ^ a - +1 ^^ b - +2 ^ ^ c - +3 ^ ^ d - +4 ^ ^ e -No match - -/a*b/ICDZS ------------------------------------------------------------------- - Bra - Callout 255 0 2 - a*+ - Callout 255 2 1 - b - Callout 255 3 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: -No first char -Need char = 'b' -Subject length lower bound = 1 -Starting chars: a b - ab ---->ab - +0 ^ a* - +2 ^^ b - +3 ^ ^ - 0: ab - aaaab ---->aaaab - +0 ^ a* - +2 ^ ^ b - +3 ^ ^ - 0: aaaab - aaaacb ---->aaaacb - +0 ^ a* - +2 ^ ^ b - +0 ^ a* - +2 ^ ^ b - +0 ^ a* - +2 ^ ^ b - +0 ^ a* - +2 ^^ b - +0 ^ a* - +2 ^ b - +3 ^^ - 0: b - -/a*b/ICDZSS ------------------------------------------------------------------- - Bra - Callout 255 0 2 - a*+ - Callout 255 2 1 - b - Callout 255 3 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: -No first char -Need char = 'b' - ab ---->ab 
- +0 ^ a* - +2 ^^ b - +3 ^ ^ - 0: ab - aaaab ---->aaaab - +0 ^ a* - +2 ^ ^ b - +3 ^ ^ - 0: aaaab - aaaacb ---->aaaacb - +0 ^ a* - +2 ^ ^ b - +0 ^ a* - +2 ^ ^ b - +0 ^ a* - +2 ^ ^ b - +0 ^ a* - +2 ^^ b - +0 ^ a* - +2 ^ b - +0 ^ a* - +2 ^ b - +3 ^^ - 0: b - -/a+b/ICDZ ------------------------------------------------------------------- - Bra - Callout 255 0 2 - a++ - Callout 255 2 1 - b - Callout 255 3 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: -First char = 'a' -Need char = 'b' - ab ---->ab - +0 ^ a+ - +2 ^^ b - +3 ^ ^ - 0: ab - aaaab ---->aaaab - +0 ^ a+ - +2 ^ ^ b - +3 ^ ^ - 0: aaaab - aaaacb ---->aaaacb - +0 ^ a+ - +2 ^ ^ b - +0 ^ a+ - +2 ^ ^ b - +0 ^ a+ - +2 ^ ^ b - +0 ^ a+ - +2 ^^ b -No match - -/(abc|def)x/ICDZS ------------------------------------------------------------------- - Bra - Callout 255 0 9 - CBra 1 - Callout 255 1 1 - a - Callout 255 2 1 - b - Callout 255 3 1 - c - Callout 255 4 0 - Alt - Callout 255 5 1 - d - Callout 255 6 1 - e - Callout 255 7 1 - f - Callout 255 8 0 - Ket - Callout 255 9 1 - x - Callout 255 10 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: -No first char -Need char = 'x' -Subject length lower bound = 4 -Starting chars: a d - abcx ---->abcx - +0 ^ (abc|def) - +1 ^ a - +2 ^^ b - +3 ^ ^ c - +4 ^ ^ | - +9 ^ ^ x -+10 ^ ^ - 0: abcx - 1: abc - defx ---->defx - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +6 ^^ e - +7 ^ ^ f - +8 ^ ^ ) - +9 ^ ^ x -+10 ^ ^ - 0: defx - 1: def - ** Failers -No match - abcdefzx ---->abcdefzx - +0 ^ (abc|def) - +1 ^ a - +2 ^^ b - +3 ^ ^ c - +4 ^ ^ | - +9 ^ ^ x - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +6 ^^ e - +7 ^ ^ f - +8 ^ ^ ) - +9 ^ ^ x -No match - -/(abc|def)x/ICDZSS ------------------------------------------------------------------- - Bra - Callout 255 0 9 - CBra 1 - Callout 255 1 1 - a - Callout 255 2 1 - b - Callout 255 3 1 - c - Callout 255 4 0 - Alt 
- Callout 255 5 1 - d - Callout 255 6 1 - e - Callout 255 7 1 - f - Callout 255 8 0 - Ket - Callout 255 9 1 - x - Callout 255 10 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: -No first char -Need char = 'x' - abcx ---->abcx - +0 ^ (abc|def) - +1 ^ a - +2 ^^ b - +3 ^ ^ c - +4 ^ ^ | - +9 ^ ^ x -+10 ^ ^ - 0: abcx - 1: abc - defx ---->defx - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +6 ^^ e - +7 ^ ^ f - +8 ^ ^ ) - +9 ^ ^ x -+10 ^ ^ - 0: defx - 1: def - ** Failers -No match - abcdefzx ---->abcdefzx - +0 ^ (abc|def) - +1 ^ a - +2 ^^ b - +3 ^ ^ c - +4 ^ ^ | - +9 ^ ^ x - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +6 ^^ e - +7 ^ ^ f - +8 ^ ^ ) - +9 ^ ^ x - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d - +0 ^ (abc|def) - +1 ^ a - +5 ^ d -No match - -/(ab|cd){3,4}/IC -Capturing subpattern count = 1 -Options: -No first char -No need char - ababab ---->ababab - +0 ^ (ab|cd){3,4} - +1 ^ a - +2 ^^ b - +3 ^ ^ | - +1 ^ ^ a - +2 ^ ^ b - +3 ^ ^ | - +1 ^ ^ a - +2 ^ ^ b - +3 ^ ^ | - +1 ^ ^ a - +4 ^ ^ c -+12 ^ ^ - 0: ababab - 1: ab - abcdabcd ---->abcdabcd - +0 ^ (ab|cd){3,4} - +1 ^ a - +2 ^^ b - +3 ^ ^ | - +1 ^ ^ a - +4 ^ ^ c - +5 ^ ^ d - +6 ^ ^ ) - +1 ^ ^ a - +2 ^ ^ b - +3 ^ ^ | - +1 ^ ^ a - +4 ^ ^ c - +5 ^ ^ d - +6 ^ ^ ) -+12 ^ ^ - 0: abcdabcd - 1: cd - abcdcdcdcdcd ---->abcdcdcdcdcd - +0 ^ (ab|cd){3,4} - +1 ^ a - +2 ^^ b - +3 ^ ^ | - +1 ^ ^ a - +4 ^ ^ c - +5 ^ ^ d - +6 ^ ^ ) - +1 ^ ^ a - +4 ^ ^ c - +5 ^ ^ d - +6 ^ ^ ) - +1 ^ ^ a - +4 ^ ^ c - +5 ^ ^ d - +6 ^ ^ ) -+12 ^ ^ - 0: abcdcdcd - 1: cd - -/([ab]{,4}c|xy)/ICDZS ------------------------------------------------------------------- - Bra - Callout 255 0 14 - CBra 1 - Callout 255 1 4 - [ab] - Callout 255 5 1 - { - Callout 255 6 1 - , - Callout 255 7 1 - 4 - Callout 255 8 1 - } - Callout 255 9 1 - c - Callout 255 10 0 - Alt - Callout 255 
11 1 - x - Callout 255 12 1 - y - Callout 255 13 0 - Ket - Callout 255 14 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b x - Note: that { does NOT introduce a quantifier ---->Note: that { does NOT introduce a quantifier - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] - +5 ^^ { -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] - +5 ^^ { -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] - +5 ^^ { -+11 ^ x -No match - -/([ab]{,4}c|xy)/ICDZSS ------------------------------------------------------------------- - Bra - Callout 255 0 14 - CBra 1 - Callout 255 1 4 - [ab] - Callout 255 5 1 - { - Callout 255 6 1 - , - Callout 255 7 1 - 4 - Callout 255 8 1 - } - Callout 255 9 1 - c - Callout 255 10 0 - Alt - Callout 255 11 1 - x - Callout 255 12 1 - y - Callout 255 13 0 - Ket - Callout 255 14 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: -No first char -No need char - Note: that { does NOT introduce a quantifier ---->Note: that { does NOT introduce a quantifier - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] - +5 ^^ { -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ 
x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] - +5 ^^ { -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] - +5 ^^ { -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x - +0 ^ ([ab]{,4}c|xy) - +1 ^ [ab] -+11 ^ x -No match - -/([ab]{1,4}c|xy){4,5}?123/ICDZ ------------------------------------------------------------------- - Bra - Callout 255 0 21 - CBra 1 - Callout 255 1 9 - [ab]{1,4}+ - Callout 255 10 1 - c - Callout 255 11 0 - Alt - Callout 255 12 1 - x - Callout 255 13 1 - y - Callout 255 14 0 - Ket - CBra 1 - Callout 255 1 9 - [ab]{1,4}+ - Callout 255 10 1 - c - Callout 255 11 0 - Alt - Callout 255 12 1 - x - Callout 255 13 1 - y - Callout 255 14 0 - Ket - CBra 1 - Callout 255 1 9 - [ab]{1,4}+ - Callout 255 10 1 - c - Callout 255 11 0 - Alt - Callout 255 12 1 - x - Callout 255 13 1 - y - Callout 255 14 0 - Ket - CBra 1 - Callout 255 1 9 - [ab]{1,4}+ - Callout 255 10 1 - c - Callout 255 11 0 - Alt - Callout 255 12 1 - x - Callout 255 13 1 - y - Callout 255 14 0 - Ket - Braminzero - CBra 1 - Callout 255 1 9 - [ab]{1,4}+ - Callout 255 10 1 - c - Callout 
255 11 0 - Alt - Callout 255 12 1 - x - Callout 255 13 1 - y - Callout 255 14 0 - Ket - Callout 255 21 1 - 1 - Callout 255 22 1 - 2 - Callout 255 23 1 - 3 - Callout 255 24 0 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: -No first char -Need char = '3' - aacaacaacaacaac123 ---->aacaacaacaacaac123 - +0 ^ ([ab]{1,4}c|xy){4,5}? - +1 ^ [ab]{1,4} -+10 ^ ^ c -+11 ^ ^ | - +1 ^ ^ [ab]{1,4} -+10 ^ ^ c -+11 ^ ^ | - +1 ^ ^ [ab]{1,4} -+10 ^ ^ c -+11 ^ ^ | - +1 ^ ^ [ab]{1,4} -+10 ^ ^ c -+11 ^ ^ | -+21 ^ ^ 1 - +1 ^ ^ [ab]{1,4} -+10 ^ ^ c -+11 ^ ^ | -+21 ^ ^ 1 -+22 ^ ^ 2 -+23 ^ ^ 3 -+24 ^ ^ - 0: aacaacaacaacaac123 - 1: aac - -/\b.*/I -Capturing subpattern count = 0 -Max lookbehind = 1 -May match empty string -No options -No first char -No need char - ab cd\>1 - 0: cd - -/\b.*/Is -Capturing subpattern count = 0 -Max lookbehind = 1 -May match empty string -Options: dotall -No first char -No need char - ab cd\>1 - 0: cd - -/(?!.bcd).*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - Xbcd12345 - 0: bcd12345 - -/abcde/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'e' - ab\P -Partial match: ab - abc\P -Partial match: abc - abcd\P -Partial match: abcd - abcde\P - 0: abcde - the quick brown abc\P -Partial match: abc - ** Failers\P -No match - the quick brown abxyz fox\P -No match - -"^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/(20)?\d\d$"I -Capturing subpattern count = 3 -Options: anchored -No first char -Need char = '/' - 13/05/04\P - 0: 13/05/04 - 1: 13 - 2: 05 - 13/5/2004\P - 0: 13/5/2004 - 1: 13 - 2: 5 - 3: 20 - 02/05/09\P - 0: 02/05/09 - 1: 02 - 2: 05 - 1\P -Partial match: 1 - 1/2\P -Partial match: 1/2 - 1/2/0\P -Partial match: 1/2/0 - 1/2/04\P - 0: 1/2/04 - 1: 1 - 2: 2 - 0\P -Partial match: 0 - 02/\P -Partial match: 02/ - 02/0\P -Partial match: 02/0 - 02/1\P -Partial match: 02/1 - ** Failers\P -No match - \P -No match - 123\P 
-No match - 33/4/04\P -No match - 3/13/04\P -No match - 0/1/2003\P -No match - 0/\P -No match - 02/0/\P -No match - 02/13\P -No match - -/0{0,2}ABC/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'C' - -/\d{3,}ABC/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'C' - -/\d*ABC/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'C' - -/[abc]+DE/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'E' - -/[abc]?123/I -Capturing subpattern count = 0 -No options -No first char -Need char = '3' - 123\P - 0: 123 - a\P -Partial match: a - b\P -Partial match: b - c\P -Partial match: c - c12\P -Partial match: c12 - c123\P - 0: c123 - -/^(?:\d){3,5}X/I -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'X' - 1\P -Partial match: 1 - 123\P -Partial match: 123 - 123X - 0: 123X - 1234\P -Partial match: 1234 - 1234X - 0: 1234X - 12345\P -Partial match: 12345 - 12345X - 0: 12345X - *** Failers -No match - 1X -No match - 123456\P -No match - -//KF>testsavedregex -Compiled pattern written to testsavedregex -Study data written to testsavedregex - -/abc/IS>testsavedregex -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' -Subject length lower bound = 3 -No starting char list -Compiled pattern written to testsavedregex -Study data written to testsavedregex -testsavedregex -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' -Compiled pattern written to testsavedregex -testsavedregex -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' -Subject length lower bound = 3 -No starting char list -Compiled pattern written to testsavedregex -Study data written to testsavedregex -testsavedregex -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' -Compiled pattern written to testsavedregex -testsavedregex -Capturing subpattern count = 1 -No options -No first char -No 
need char -Subject length lower bound = 1 -Starting chars: a b -Compiled pattern written to testsavedregex -Study data written to testsavedregex -testsavedregex -Capturing subpattern count = 1 -No options -No first char -No need char -Compiled pattern written to testsavedregex -testsavedregex -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b -Compiled pattern written to testsavedregex -Study data written to testsavedregex -testsavedregex -Capturing subpattern count = 1 -No options -No first char -No need char -Compiled pattern written to testsavedregex -(.)*~smgI -Capturing subpattern count = 3 -Max back reference = 1 -Options: multiline dotall -First char = '<' -Need char = '>' - \J1024\n\n\nPartner der LCO\nde\nPartner der LINEAS Consulting\nGmbH\nLINEAS Consulting GmbH Hamburg\nPartnerfirmen\n30 days\nindex,follow\n\nja\n3\nPartner\n\n\nLCO\nLINEAS Consulting\n15.10.2003\n\n\n\n\nDie Partnerfirmen der LINEAS Consulting\nGmbH\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n - 0: \x0a\x0aPartner der LCO\x0ade\x0aPartner der LINEAS Consulting\x0aGmbH\x0aLINEAS Consulting GmbH Hamburg\x0aPartnerfirmen\x0a30 days\x0aindex,follow\x0a\x0aja\x0a3\x0aPartner\x0a\x0a\x0aLCO\x0aLINEAS Consulting\x0a15.10.2003\x0a\x0a\x0a\x0a\x0aDie Partnerfirmen der LINEAS Consulting\x0aGmbH\x0a\x0a\x0a \x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a - 1: seite - 2: \x0a - 3: seite - -/^a/IF -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/line\nbreak/I -Capturing subpattern count = 0 -Contains explicit CR or LF match -No options -First char = 'l' -Need char = 'k' - this is a line\nbreak - 0: line\x0abreak - line one\nthis is a line\nbreak in the second line - 0: line\x0abreak - -/line\nbreak/If -Capturing subpattern count = 0 -Contains explicit CR or LF match -Options: firstline -First char = 'l' -Need char = 'k' - this is a line\nbreak 
- 0: line\x0abreak - ** Failers -No match - line one\nthis is a line\nbreak in the second line -No match - -/line\nbreak/Imf -Capturing subpattern count = 0 -Contains explicit CR or LF match -Options: multiline firstline -First char = 'l' -Need char = 'k' - this is a line\nbreak - 0: line\x0abreak - ** Failers -No match - line one\nthis is a line\nbreak in the second line -No match - -/(?i)(?-i)AbCd/I -Capturing subpattern count = 0 -No options -First char = 'A' -Need char = 'd' - AbCd - 0: AbCd - ** Failers -No match - abcd -No match - -/a{11111111111111111111}/I -Failed: number too big in {} quantifier at offset 8 - -/(){64294967295}/I -Failed: number too big in {} quantifier at offset 9 - -/(){2,4294967295}/I -Failed: number too big in {} quantifier at offset 11 - -"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I -Capturing subpattern count = 1 -Max back reference = 1 -No options -First char = 'a' (caseless) -Need char = 'B' - abcdefghijklAkB - 0: abcdefghijklAkB - 1: k - -"(?Pa)(?Pb)(?Pc)(?Pd)(?Pe)(?Pf)(?Pg)(?Ph)(?Pi)(?Pj)(?Pk)(?Pl)A\11B"I -Capturing subpattern count = 12 -Max back reference = 11 -Named capturing subpatterns: - n0 1 - n1 2 - n10 11 - n11 12 - n2 3 - n3 4 - n4 5 - n5 6 - n6 7 - n7 8 - n8 9 - n9 10 -No options -First char = 'a' -Need char = 'B' - abcdefghijklAkB - 0: abcdefghijklAkB - 1: a - 2: b - 3: c - 4: d - 5: e - 6: f - 7: g - 8: h - 9: i -10: j -11: k -12: l - -"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)A\11B"I -Capturing subpattern count = 12 -Max back reference = 11 -No options -First char = 'a' -Need char = 'B' - abcdefghijklAkB - 0: abcdefghijklAkB - 1: a - 2: b - 3: c - 4: d - 5: e - 6: f - 7: g - 8: h - 9: i -10: j -11: k -12: l - 
-"(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)"I -Capturing subpattern count = 101 -Named capturing subpatterns: - name0 1 - name1 2 - name10 11 - name100 101 - name11 12 - name12 13 - name13 14 - name14 15 - name15 16 - name16 17 - name17 18 - name18 19 - name19 20 - name2 3 - name20 21 - name21 22 - name22 23 - name23 24 - name24 25 - name25 26 - name26 27 - name27 28 - name28 29 - name29 30 - name3 4 - name30 31 - name31 32 - name32 33 - name33 34 - name34 35 - name35 36 - name36 37 - name37 38 - name38 39 - name39 40 - name4 5 - name40 41 - name41 42 - name42 43 - name43 44 - name44 45 - name45 46 - name46 47 - name47 48 - name48 49 - name49 50 - name5 6 - name50 51 - name51 52 - name52 53 - name53 54 - name54 55 - name55 56 - name56 57 - name57 58 - name58 59 - name59 60 - name6 7 - name60 61 - name61 62 - name62 63 - name63 64 - name64 65 - name65 66 - name66 67 - name67 68 - name68 69 - name69 70 - name7 8 - name70 71 - name71 72 - name72 73 - name73 74 - name74 75 - name75 76 - name76 77 - name77 78 - name78 79 - name79 80 - name8 9 - name80 81 - name81 82 - name82 83 - name83 84 - name84 85 - name85 86 - name86 87 - name87 88 - name88 89 - name89 90 - name9 10 - name90 91 - name91 92 - name92 93 - name93 94 - name94 95 - name95 96 - name96 97 - name97 98 - name98 99 - name99 100 -No options -First char = 'a' -Need char = 'a' - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -Matched, but too many substrings - 0: 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - 1: a - 2: a - 3: a - 4: a - 5: a - 6: a - 7: a - 8: a - 9: a -10: a -11: a -12: a -13: a -14: a - -"(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)"I -Capturing subpattern count = 101 -No options -First char = 'a' -Need char = 'a' - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -Matched, but too many substrings - 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - 1: a - 2: a - 3: a - 4: a - 5: a - 6: a - 7: a - 8: a - 9: a -10: a -11: a -12: a -13: a -14: a - -/[^()]*(?:\((?R)\)[^()]*)*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - (this(and)that - 0: - (this(and)that) - 0: (this(and)that) - (this(and)that)stuff - 0: (this(and)that)stuff - -/[^()]*(?:\((?>(?R))\)[^()]*)*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - (this(and)that - 0: - (this(and)that) - 0: (this(and)that) - -/[^()]*(?:\((?R)\))*[^()]*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - (this(and)that - 0: - (this(and)that) - 0: (this(and)that) - -/(?:\((?R)\))*[^()]*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - (this(and)that - 0: - (this(and)that) - 0: - ((this)) - 0: ((this)) - -/(?:\((?R)\))|[^()]*/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - (this(and)that - 0: - (this(and)that) - 0: - (this) - 0: (this) - ((this)) - 0: ((this)) - -/\x{0000ff}/I -Capturing 
subpattern count = 0 -No options -First char = \xff -No need char - -/^((?Pa1)|(?Pa2)b)/I -Failed: two named subpatterns have the same name at offset 17 - -/^((?Pa1)|(?Pa2)b)/IJ -Capturing subpattern count = 3 -Named capturing subpatterns: - A 2 - A 3 -Options: anchored dupnames -No first char -No need char - a1b\CA - 0: a1 - 1: a1 - 2: a1 - C a1 (2) A - a2b\CA - 0: a2b - 1: a2b - 2: - 3: a2 - C a2 (2) A - ** Failers -No match - a1b\CZ\CA -no parentheses with name "Z" - 0: a1 - 1: a1 - 2: a1 -copy substring Z failed -7 - C a1 (2) A - -/(?|(?)(?)(?)|(?)(?)(?))/IJ -Capturing subpattern count = 3 -Named capturing subpatterns: - a 1 - a 3 - b 2 -May match empty string -Options: dupnames -No first char -No need char - -/^(?Pa)(?Pb)/IJ -Capturing subpattern count = 2 -Named capturing subpatterns: - A 1 - A 2 -Options: anchored dupnames -No first char -No need char - ab\CA - 0: ab - 1: a - 2: b - C a (1) A - -/^(?Pa)(?Pb)|cd/IJ -Capturing subpattern count = 2 -Named capturing subpatterns: - A 1 - A 2 -Options: dupnames -No first char -No need char - ab\CA - 0: ab - 1: a - 2: b - C a (1) A - cd\CA - 0: cd -copy substring A failed -7 - -/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/IJ -Capturing subpattern count = 4 -Named capturing subpatterns: - A 1 - A 2 - A 3 - A 4 -Options: dupnames -No first char -No need char - cdefgh\CA - 0: cdefgh - 1: - 2: - 3: ef - 4: gh - C ef (2) A - -/^((?Pa1)|(?Pa2)b)/IJ -Capturing subpattern count = 3 -Named capturing subpatterns: - A 2 - A 3 -Options: anchored dupnames -No first char -No need char - a1b\GA - 0: a1 - 1: a1 - 2: a1 - G a1 (2) A - a2b\GA - 0: a2b - 1: a2b - 2: - 3: a2 - G a2 (2) A - ** Failers -No match - a1b\GZ\GA -no parentheses with name "Z" - 0: a1 - 1: a1 - 2: a1 -get substring Z failed -7 - G a1 (2) A - -/^(?Pa)(?Pb)/IJ -Capturing subpattern count = 2 -Named capturing subpatterns: - A 1 - A 2 -Options: anchored dupnames -No first char -No need char - ab\GA - 0: ab - 1: a - 2: b - G a (1) A - -/^(?Pa)(?Pb)|cd/IJ -Capturing subpattern count 
= 2 -Named capturing subpatterns: - A 1 - A 2 -Options: dupnames -No first char -No need char - ab\GA - 0: ab - 1: a - 2: b - G a (1) A - cd\GA - 0: cd -get substring A failed -7 - -/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/IJ -Capturing subpattern count = 4 -Named capturing subpatterns: - A 1 - A 2 - A 3 - A 4 -Options: dupnames -No first char -No need char - cdefgh\GA - 0: cdefgh - 1: - 2: - 3: ef - 4: gh - G ef (2) A - -/(?J)^((?Pa1)|(?Pa2)b)/I -Capturing subpattern count = 3 -Named capturing subpatterns: - A 2 - A 3 -Options: anchored -Duplicate name status changes -No first char -No need char - a1b\CA - 0: a1 - 1: a1 - 2: a1 - C a1 (2) A - a2b\CA - 0: a2b - 1: a2b - 2: - 3: a2 - C a2 (2) A - -/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I -Failed: two named subpatterns have the same name at offset 37 - -/ In this next test, J is not set at the outer level; consequently it isn't -set in the pattern's options; consequently pcre_get_named_substring() produces -a random value. /Ix -Capturing subpattern count = 1 -Options: extended -First char = 'I' -Need char = 'e' - -/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I -Capturing subpattern count = 4 -Named capturing subpatterns: - A 1 - B 2 - B 3 - C 4 -Options: anchored -Duplicate name status changes -No first char -No need char - a bc d\CA\CB\CC - 0: a bc d - 1: a - 2: b - 3: c - 4: d - C a (1) A - C b (1) B - C d (1) C - -/^(?Pa)?(?(A)a|b)/I -Capturing subpattern count = 1 -Max back reference = 1 -Named capturing subpatterns: - A 1 -Options: anchored -No first char -No need char - aabc - 0: aa - 1: a - bc - 0: b - ** Failers -No match - abc -No match - -/(?:(?(ZZ)a|b)(?PX))+/I -Capturing subpattern count = 1 -Max back reference = 1 -Named capturing subpatterns: - ZZ 1 -No options -No first char -Need char = 'X' - bXaX - 0: bXaX - 1: X - -/(?:(?(2y)a|b)(X))+/I -Failed: malformed number or name after (?( at offset 7 - -/(?:(?(ZA)a|b)(?PX))+/I -Failed: reference to non-existent subpattern at offset 9 - -/(?:(?(ZZ)a|b)(?(ZZ)a|b)(?PX))+/I -Capturing subpattern 
count = 1 -Max back reference = 1 -Named capturing subpatterns: - ZZ 1 -No options -No first char -Need char = 'X' - bbXaaX - 0: bbXaaX - 1: X - -/(?:(?(ZZ)a|\(b\))\\(?PX))+/I -Capturing subpattern count = 1 -Max back reference = 1 -Named capturing subpatterns: - ZZ 1 -No options -No first char -Need char = 'X' - (b)\\Xa\\X - 0: (b)\Xa\X - 1: X - -/(?PX|Y))+/I -Capturing subpattern count = 1 -Max back reference = 1 -Named capturing subpatterns: - A 1 -No options -No first char -No need char - bXXaYYaY - 0: bXXaYYaY - 1: Y - bXYaXXaX - 0: bX - 1: X - -/()()()()()()()()()(?:(?(A)(?P=A)a|b)(?PX|Y))+/I -Capturing subpattern count = 10 -Max back reference = 10 -Named capturing subpatterns: - A 10 -No options -No first char -No need char - bXXaYYaY - 0: bXXaYYaY - 1: - 2: - 3: - 4: - 5: - 6: - 7: - 8: - 9: -10: Y - -/\s*,\s*/IS -Capturing subpattern count = 0 -No options -No first char -Need char = ',' -Subject length lower bound = 1 -Starting chars: \x09 \x0a \x0b \x0c \x0d \x20 , - \x0b,\x0b - 0: \x0b,\x0b - \x0c,\x0d - 0: \x0c,\x0d - -/^abc/Im -Capturing subpattern count = 0 -Options: multiline -First char at start or follows newline -Need char = 'c' - xyz\nabc - 0: abc - xyz\nabc\ - 0: abc - xyz\r\nabc\ - 0: abc - xyz\rabc\ - 0: abc - xyz\r\nabc\ - 0: abc - ** Failers -No match - xyz\nabc\ -No match - xyz\r\nabc\ -No match - xyz\nabc\ -No match - xyz\rabc\ -No match - xyz\rabc\ -No match - -/abc$/Im -Capturing subpattern count = 0 -Options: multiline -Forced newline sequence: LF -First char = 'a' -Need char = 'c' - xyzabc - 0: abc - xyzabc\n - 0: abc - xyzabc\npqr - 0: abc - xyzabc\r\ - 0: abc - xyzabc\rpqr\ - 0: abc - xyzabc\r\n\ - 0: abc - xyzabc\r\npqr\ - 0: abc - ** Failers -No match - xyzabc\r -No match - xyzabc\rpqr -No match - xyzabc\r\n -No match - xyzabc\r\npqr -No match - -/^abc/Im -Capturing subpattern count = 0 -Options: multiline -Forced newline sequence: CR -First char at start or follows newline -Need char = 'c' - xyz\rabcdef - 0: abc - xyz\nabcdef\ - 
0: abc - ** Failers -No match - xyz\nabcdef -No match - -/^abc/Im -Capturing subpattern count = 0 -Options: multiline -Forced newline sequence: LF -First char at start or follows newline -Need char = 'c' - xyz\nabcdef - 0: abc - xyz\rabcdef\ - 0: abc - ** Failers -No match - xyz\rabcdef -No match - -/^abc/Im -Capturing subpattern count = 0 -Options: multiline -Forced newline sequence: CRLF -First char at start or follows newline -Need char = 'c' - xyz\r\nabcdef - 0: abc - xyz\rabcdef\ - 0: abc - ** Failers -No match - xyz\rabcdef -No match - -/^abc/Im -Unknown modifier at: - - -/abc/I -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' - xyz\rabc\ -Unknown escape sequence at: - abc - 0: abc - -/.*/I -Capturing subpattern count = 0 -May match empty string -Options: -Forced newline sequence: LF -First char at start or follows newline -No need char - abc\ndef - 0: abc - abc\rdef - 0: abc\x0ddef - abc\r\ndef - 0: abc\x0d - \abc\ndef - 0: abc\x0adef - \abc\rdef - 0: abc - \abc\r\ndef - 0: abc - \abc\ndef - 0: abc\x0adef - \abc\rdef - 0: abc\x0ddef - \abc\r\ndef - 0: abc - -/\w+(.)(.)?def/Is -Capturing subpattern count = 2 -Options: dotall -No first char -Need char = 'f' - abc\ndef - 0: abc\x0adef - 1: \x0a - abc\rdef - 0: abc\x0ddef - 1: \x0d - abc\r\ndef - 0: abc\x0d\x0adef - 1: \x0d - 2: \x0a - -+((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)+I -Capturing subpattern count = 1 -May match empty string -No options -No first char -No need char - /* this is a C style comment */\M -Minimum match() limit = 120 -Minimum match() recursion limit = 6 - 0: /* this is a C style comment */ - 1: /* this is a C style comment */ - -/(?P25[0-5]|2[0-4]\d|[01]?\d?\d)(?:\.(?P>B)){3}/I -Capturing subpattern count = 1 -Named capturing subpatterns: - B 1 -No options -No first char -Need char = '.' 
- -/()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - ()()()()()()()()()()()()()()()()()()()() - (.(.))/Ix -Capturing subpattern count = 102 -Options: extended -No first char -No need char - XY\O400 - 0: XY - 1: - 2: - 3: - 4: - 5: - 6: - 7: - 8: - 9: -10: -11: -12: -13: -14: -15: -16: -17: -18: -19: -20: -21: -22: -23: -24: -25: -26: -27: -28: -29: -30: -31: -32: -33: -34: -35: -36: -37: -38: -39: -40: -41: -42: -43: -44: -45: -46: -47: -48: -49: -50: -51: -52: -53: -54: -55: -56: -57: -58: -59: -60: -61: -62: -63: -64: -65: -66: -67: -68: -69: -70: -71: -72: -73: -74: -75: -76: -77: -78: -79: -80: -81: -82: -83: -84: -85: -86: -87: -88: -89: -90: -91: -92: -93: -94: -95: -96: -97: -98: -99: -100: -101: XY -102: Y - -/(a*b|(?i:c*(?-i)d))/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: C a b c d - -/()[ab]xyz/IS -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' -Subject length lower bound = 4 -Starting chars: a b - -/(|)[ab]xyz/IS -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' -Subject length lower bound = 4 -Starting chars: a b - -/(|c)[ab]xyz/IS -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' -Subject length lower bound = 4 -Starting chars: a b c - -/(|c?)[ab]xyz/IS -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' -Subject length lower bound = 4 -Starting chars: a b c - -/(d?|c?)[ab]xyz/IS -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' -Subject length lower bound = 4 -Starting chars: a b c d - -/(d?|c)[ab]xyz/IS -Capturing subpattern count = 1 -No options -No first char -Need char = 'z' -Subject length lower bound = 4 -Starting chars: a b c d - -/^a*b\d/DZ ------------------------------------------------------------------- - Bra - 
^ - a*+ - b - \d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'b' - -/^a*+b\d/DZ ------------------------------------------------------------------- - Bra - ^ - a*+ - b - \d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'b' - -/^a*?b\d/DZ ------------------------------------------------------------------- - Bra - ^ - a*+ - b - \d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'b' - -/^a+A\d/DZ ------------------------------------------------------------------- - Bra - ^ - a++ - A - \d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored -No first char -Need char = 'A' - aaaA5 - 0: aaaA5 - ** Failers -No match - aaaa5 -No match - -/^a*A\d/IiDZ ------------------------------------------------------------------- - Bra - ^ - /i a* - /i A - \d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored caseless -No first char -Need char = 'A' (caseless) - aaaA5 - 0: aaaA5 - aaaa5 - 0: aaaa5 - -/(a*|b*)[cd]/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b c d - -/(a+|b*)[cd]/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b c d - -/(a*|b+)[cd]/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: a b c d - -/(a+|b+)[cd]/IS -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - 
-/(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( - (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( - ((( - a - )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) - )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) - ))) -/Ix -Capturing subpattern count = 203 -Options: extended -First char = 'a' -No need char - large nest -Matched, but too many substrings - 0: a - 1: a - 2: a - 3: a - 4: a - 5: a - 6: a - 7: a - 8: a - 9: a -10: a -11: a -12: a -13: a -14: a - -/a*\d/BZ ------------------------------------------------------------------- - Bra - a*+ - \d - Ket - End ------------------------------------------------------------------- - -/a*\D/BZ ------------------------------------------------------------------- - Bra - a* - \D - Ket - End ------------------------------------------------------------------- - -/0*\d/BZ ------------------------------------------------------------------- - Bra - 0* - \d - Ket - End ------------------------------------------------------------------- - -/0*\D/BZ ------------------------------------------------------------------- - Bra - 0*+ - \D - Ket - End ------------------------------------------------------------------- - -/a*\s/BZ ------------------------------------------------------------------- - Bra - a*+ - \s - Ket - End ------------------------------------------------------------------- - -/a*\S/BZ ------------------------------------------------------------------- - Bra - a* - \S - Ket - End ------------------------------------------------------------------- - -/ *\s/BZ ------------------------------------------------------------------- - Bra - * - \s - Ket - End ------------------------------------------------------------------- - -/ *\S/BZ ------------------------------------------------------------------- - Bra - *+ - \S 
- Ket - End ------------------------------------------------------------------- - -/a*\w/BZ ------------------------------------------------------------------- - Bra - a* - \w - Ket - End ------------------------------------------------------------------- - -/a*\W/BZ ------------------------------------------------------------------- - Bra - a*+ - \W - Ket - End ------------------------------------------------------------------- - -/=*\w/BZ ------------------------------------------------------------------- - Bra - =*+ - \w - Ket - End ------------------------------------------------------------------- - -/=*\W/BZ ------------------------------------------------------------------- - Bra - =* - \W - Ket - End ------------------------------------------------------------------- - -/\d*a/BZ ------------------------------------------------------------------- - Bra - \d*+ - a - Ket - End ------------------------------------------------------------------- - -/\d*2/BZ ------------------------------------------------------------------- - Bra - \d* - 2 - Ket - End ------------------------------------------------------------------- - -/\d*\d/BZ ------------------------------------------------------------------- - Bra - \d* - \d - Ket - End ------------------------------------------------------------------- - -/\d*\D/BZ ------------------------------------------------------------------- - Bra - \d*+ - \D - Ket - End ------------------------------------------------------------------- - -/\d*\s/BZ ------------------------------------------------------------------- - Bra - \d*+ - \s - Ket - End ------------------------------------------------------------------- - -/\d*\S/BZ ------------------------------------------------------------------- - Bra - \d* - \S - Ket - End ------------------------------------------------------------------- - -/\d*\w/BZ ------------------------------------------------------------------- - Bra - \d* - \w - Ket - End 
------------------------------------------------------------------- - -/\d*\W/BZ ------------------------------------------------------------------- - Bra - \d*+ - \W - Ket - End ------------------------------------------------------------------- - -/\D*a/BZ ------------------------------------------------------------------- - Bra - \D* - a - Ket - End ------------------------------------------------------------------- - -/\D*2/BZ ------------------------------------------------------------------- - Bra - \D*+ - 2 - Ket - End ------------------------------------------------------------------- - -/\D*\d/BZ ------------------------------------------------------------------- - Bra - \D*+ - \d - Ket - End ------------------------------------------------------------------- - -/\D*\D/BZ ------------------------------------------------------------------- - Bra - \D* - \D - Ket - End ------------------------------------------------------------------- - -/\D*\s/BZ ------------------------------------------------------------------- - Bra - \D* - \s - Ket - End ------------------------------------------------------------------- - -/\D*\S/BZ ------------------------------------------------------------------- - Bra - \D* - \S - Ket - End ------------------------------------------------------------------- - -/\D*\w/BZ ------------------------------------------------------------------- - Bra - \D* - \w - Ket - End ------------------------------------------------------------------- - -/\D*\W/BZ ------------------------------------------------------------------- - Bra - \D* - \W - Ket - End ------------------------------------------------------------------- - -/\s*a/BZ ------------------------------------------------------------------- - Bra - \s*+ - a - Ket - End ------------------------------------------------------------------- - -/\s*2/BZ ------------------------------------------------------------------- - Bra - \s*+ - 2 - Ket - End 
------------------------------------------------------------------- - -/\s*\d/BZ ------------------------------------------------------------------- - Bra - \s*+ - \d - Ket - End ------------------------------------------------------------------- - -/\s*\D/BZ ------------------------------------------------------------------- - Bra - \s* - \D - Ket - End ------------------------------------------------------------------- - -/\s*\s/BZ ------------------------------------------------------------------- - Bra - \s* - \s - Ket - End ------------------------------------------------------------------- - -/\s*\S/BZ ------------------------------------------------------------------- - Bra - \s*+ - \S - Ket - End ------------------------------------------------------------------- - -/\s*\w/BZ ------------------------------------------------------------------- - Bra - \s*+ - \w - Ket - End ------------------------------------------------------------------- - -/\s*\W/BZ ------------------------------------------------------------------- - Bra - \s* - \W - Ket - End ------------------------------------------------------------------- - -/\S*a/BZ ------------------------------------------------------------------- - Bra - \S* - a - Ket - End ------------------------------------------------------------------- - -/\S*2/BZ ------------------------------------------------------------------- - Bra - \S* - 2 - Ket - End ------------------------------------------------------------------- - -/\S*\d/BZ ------------------------------------------------------------------- - Bra - \S* - \d - Ket - End ------------------------------------------------------------------- - -/\S*\D/BZ ------------------------------------------------------------------- - Bra - \S* - \D - Ket - End ------------------------------------------------------------------- - -/\S*\s/BZ ------------------------------------------------------------------- - Bra - \S*+ - \s - Ket - End 
------------------------------------------------------------------- - -/\S*\S/BZ ------------------------------------------------------------------- - Bra - \S* - \S - Ket - End ------------------------------------------------------------------- - -/\S*\w/BZ ------------------------------------------------------------------- - Bra - \S* - \w - Ket - End ------------------------------------------------------------------- - -/\S*\W/BZ ------------------------------------------------------------------- - Bra - \S* - \W - Ket - End ------------------------------------------------------------------- - -/\w*a/BZ ------------------------------------------------------------------- - Bra - \w* - a - Ket - End ------------------------------------------------------------------- - -/\w*2/BZ ------------------------------------------------------------------- - Bra - \w* - 2 - Ket - End ------------------------------------------------------------------- - -/\w*\d/BZ ------------------------------------------------------------------- - Bra - \w* - \d - Ket - End ------------------------------------------------------------------- - -/\w*\D/BZ ------------------------------------------------------------------- - Bra - \w* - \D - Ket - End ------------------------------------------------------------------- - -/\w*\s/BZ ------------------------------------------------------------------- - Bra - \w*+ - \s - Ket - End ------------------------------------------------------------------- - -/\w*\S/BZ ------------------------------------------------------------------- - Bra - \w* - \S - Ket - End ------------------------------------------------------------------- - -/\w*\w/BZ ------------------------------------------------------------------- - Bra - \w* - \w - Ket - End ------------------------------------------------------------------- - -/\w*\W/BZ ------------------------------------------------------------------- - Bra - \w*+ - \W - Ket - End 
------------------------------------------------------------------- - -/\W*a/BZ ------------------------------------------------------------------- - Bra - \W*+ - a - Ket - End ------------------------------------------------------------------- - -/\W*2/BZ ------------------------------------------------------------------- - Bra - \W*+ - 2 - Ket - End ------------------------------------------------------------------- - -/\W*\d/BZ ------------------------------------------------------------------- - Bra - \W*+ - \d - Ket - End ------------------------------------------------------------------- - -/\W*\D/BZ ------------------------------------------------------------------- - Bra - \W* - \D - Ket - End ------------------------------------------------------------------- - -/\W*\s/BZ ------------------------------------------------------------------- - Bra - \W* - \s - Ket - End ------------------------------------------------------------------- - -/\W*\S/BZ ------------------------------------------------------------------- - Bra - \W* - \S - Ket - End ------------------------------------------------------------------- - -/\W*\w/BZ ------------------------------------------------------------------- - Bra - \W*+ - \w - Ket - End ------------------------------------------------------------------- - -/\W*\W/BZ ------------------------------------------------------------------- - Bra - \W* - \W - Ket - End ------------------------------------------------------------------- - -/[^a]+a/BZ ------------------------------------------------------------------- - Bra - [^a]++ - a - Ket - End ------------------------------------------------------------------- - -/[^a]+a/BZi ------------------------------------------------------------------- - Bra - /i [^a]++ - /i a - Ket - End ------------------------------------------------------------------- - -/[^a]+A/BZi ------------------------------------------------------------------- - Bra - /i [^a]++ - /i A - Ket - End 
------------------------------------------------------------------- - -/[^a]+b/BZ ------------------------------------------------------------------- - Bra - [^a]+ - b - Ket - End ------------------------------------------------------------------- - -/[^a]+\d/BZ ------------------------------------------------------------------- - Bra - [^a]+ - \d - Ket - End ------------------------------------------------------------------- - -/a*[^a]/BZ ------------------------------------------------------------------- - Bra - a*+ - [^a] - Ket - End ------------------------------------------------------------------- - -/(?Px)(?Py)/I -Capturing subpattern count = 2 -Named capturing subpatterns: - abc 1 - xyz 2 -No options -First char = 'x' -Need char = 'y' - xy\Cabc\Cxyz - 0: xy - 1: x - 2: y - C x (1) abc - C y (1) xyz - -/(?x)(?'xyz'y)/I -Capturing subpattern count = 2 -Named capturing subpatterns: - abc 1 - xyz 2 -No options -First char = 'x' -Need char = 'y' - xy\Cabc\Cxyz - 0: xy - 1: x - 2: y - C x (1) abc - C y (1) xyz - -/(?x)(?'xyz>y)/I -Failed: syntax error in subpattern name (missing terminator) at offset 15 - -/(?P'abc'x)(?Py)/I -Failed: unrecognized character after (?P at offset 3 - -/^(?:(?(ZZ)a|b)(?X))+/ - bXaX - 0: bXaX - 1: X - bXbX - 0: bX - 1: X - ** Failers -No match - aXaX -No match - aXbX -No match - -/^(?P>abc)(?xxx)/ -Failed: reference to non-existent subpattern at offset 8 - -/^(?P>abc)(?x|y)/ - xx - 0: xx - 1: x - xy - 0: xy - 1: y - yy - 0: yy - 1: y - yx - 0: yx - 1: x - -/^(?P>abc)(?Px|y)/ - xx - 0: xx - 1: x - xy - 0: xy - 1: y - yy - 0: yy - 1: y - yx - 0: yx - 1: x - -/^((?(abc)a|b)(?x|y))+/ - bxay - 0: bxay - 1: ay - 2: y - bxby - 0: bx - 1: bx - 2: x - ** Failers -No match - axby -No match - -/^(((?P=abc)|X)(?x|y))+/ - XxXxxx - 0: XxXxxx - 1: xx - 2: x - 3: x - XxXyyx - 0: XxXyyx - 1: yx - 2: y - 3: x - XxXyxx - 0: XxXy - 1: Xy - 2: X - 3: y - ** Failers -No match - x -No match - -/^(?1)(abc)/ - abcabc - 0: abcabc - 1: abc - 
-/^(?:(?:\1|X)(a|b))+/ - Xaaa - 0: Xaaa - 1: a - Xaba - 0: Xa - 1: a - -/^[\E\Qa\E-\Qz\E]+/BZ ------------------------------------------------------------------- - Bra - ^ - [a-z]++ - Ket - End ------------------------------------------------------------------- - -/^[a\Q]bc\E]/BZ ------------------------------------------------------------------- - Bra - ^ - [\]a-c] - Ket - End ------------------------------------------------------------------- - -/^[a-\Q\E]/BZ ------------------------------------------------------------------- - Bra - ^ - [\-a] - Ket - End ------------------------------------------------------------------- - -/^(?P>abc)[()](?)/BZ ------------------------------------------------------------------- - Bra - ^ - Recurse - [()] - CBra 1 - Ket - Ket - End ------------------------------------------------------------------- - -/^((?(abc)y)[()](?Px))+/BZ ------------------------------------------------------------------- - Bra - ^ - CBra 1 - Cond - 2 Cond ref - y - Ket - [()] - CBra 2 - x - Ket - KetRmax - Ket - End ------------------------------------------------------------------- - (xy)x - 0: (xy)x - 1: y)x - 2: x - -/^(?P>abc)\Q()\E(?)/BZ ------------------------------------------------------------------- - Bra - ^ - Recurse - () - CBra 1 - Ket - Ket - End ------------------------------------------------------------------- - -/^(?P>abc)[a\Q(]\E(](?)/BZ ------------------------------------------------------------------- - Bra - ^ - Recurse - [(\]a] - CBra 1 - Ket - Ket - End ------------------------------------------------------------------- - -/^(?P>abc) # this is (a comment) - (?)/BZx ------------------------------------------------------------------- - Bra - ^ - Recurse - CBra 1 - Ket - Ket - End ------------------------------------------------------------------- - -/^\W*(?:(?(?.)\W*(?&one)\W*\k|)|(?(?.)\W*(?&three)\W*\k'four'|\W*.\W*))\W*$/Ii -Capturing subpattern count = 4 -Max back reference = 4 -Named capturing subpatterns: - four 4 - one 1 - 
three 3 - two 2 -May match empty string -Options: anchored caseless -No first char -No need char - 1221 - 0: 1221 - 1: 1221 - 2: 1 - Satan, oscillate my metallic sonatas! - 0: Satan, oscillate my metallic sonatas! - 1: - 2: - 3: Satan, oscillate my metallic sonatas - 4: S - A man, a plan, a canal: Panama! - 0: A man, a plan, a canal: Panama! - 1: - 2: - 3: A man, a plan, a canal: Panama - 4: A - Able was I ere I saw Elba. - 0: Able was I ere I saw Elba. - 1: - 2: - 3: Able was I ere I saw Elba - 4: A - *** Failers -No match - The quick brown fox -No match - -/(?=(\w+))\1:/I -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -Need char = ':' - abcd: - 0: abcd: - 1: abcd - -/(?=(?'abc'\w+))\k:/I -Capturing subpattern count = 1 -Max back reference = 1 -Named capturing subpatterns: - abc 1 -No options -No first char -Need char = ':' - abcd: - 0: abcd: - 1: abcd - -/(?'abc'a|b)(?d|e)\k{2}/J - adaa - 0: adaa - 1: a - 2: d - ** Failers -No match - addd -No match - adbb -No match - -/(?'abc'a|b)(?d|e)(?&abc){2}/J - bdaa - 0: bdaa - 1: b - 2: d - bdab - 0: bdab - 1: b - 2: d - ** Failers -No match - bddd -No match - -/(?( (?'B' abc (?(R) (?(R&A)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x - abcabc1Xabc2XabcXabcabc - 0: abcabc1Xabc2XabcX - 1: abcabc1Xabc2XabcX - 2: abcabc1Xabc2XabcX - -/(? 
(?'B' abc (?(R) (?(R&C)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x -Failed: reference to non-existent subpattern at offset 29 - -/^(?(DEFINE) abc | xyz ) /x -Failed: DEFINE group contains more than one branch at offset 22 - -/(?(DEFINE) abc) xyz/xI -Capturing subpattern count = 0 -Options: extended -First char = 'x' -Need char = 'z' - -/(a|)*\d/ - \O0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -No match - \O0aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 -Matched, but too many substrings - -/^a.b/ - a\rb - 0: a\x0db - a\nb\ - 0: a\x0ab - a\x85b\ - 0: a\x85b - ** Failers -No match - a\nb -No match - a\nb\ -No match - a\rb\ -No match - a\rb\ -No match - a\x85b\ -No match - a\rb\ -No match - -/^abc./mgx - abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK - 0: abc1 - 0: abc2 - 0: abc3 - 0: abc4 - 0: abc5 - 0: abc6 - 0: abc7 - -/abc.$/mgx - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7 abc9 - 0: abc1 - 0: abc2 - 0: abc3 - 0: abc4 - 0: abc5 - 0: abc6 - 0: abc9 - -/a/ - -/a/ -Failed: inconsistent NEWLINE options at offset 0 - -/^a\Rb/ - a\nb - 0: a\x0ab - a\rb - 0: a\x0db - a\r\nb - 0: a\x0d\x0ab - a\x0bb - 0: a\x0bb - a\x0cb - 0: a\x0cb - a\x85b - 0: a\x85b - ** Failers -No match - a\n\rb -No match - -/^a\R*b/ - ab - 0: ab - a\nb - 0: a\x0ab - a\rb - 0: a\x0db - a\r\nb - 0: a\x0d\x0ab - a\x0bb - 0: a\x0bb - a\x0cb - 0: a\x0cb - a\x85b - 0: a\x85b - a\n\rb - 0: a\x0a\x0db - a\n\r\x85\x0cb - 0: a\x0a\x0d\x85\x0cb - -/^a\R+b/ - a\nb - 0: a\x0ab - a\rb - 0: a\x0db - a\r\nb - 0: a\x0d\x0ab - a\x0bb - 0: a\x0bb - a\x0cb - 0: a\x0cb - a\x85b - 0: a\x85b - a\n\rb - 0: a\x0a\x0db - a\n\r\x85\x0cb - 0: a\x0a\x0d\x85\x0cb - ** Failers -No match - ab -No match - -/^a\R{1,3}b/ - a\nb - 0: a\x0ab - a\n\rb - 0: a\x0a\x0db - a\n\r\x85b - 0: a\x0a\x0d\x85b - a\r\n\r\nb - 0: a\x0d\x0a\x0d\x0ab - a\r\n\r\n\r\nb - 0: a\x0d\x0a\x0d\x0a\x0d\x0ab - a\n\r\n\rb - 0: a\x0a\x0d\x0a\x0db - a\n\n\r\nb - 0: a\x0a\x0a\x0d\x0ab - ** 
Failers -No match - a\n\n\n\rb -No match - a\r -No match - -/^a[\R]b/ - aRb - 0: aRb - ** Failers -No match - a\nb -No match - -/(?&abc)X(?P)/I -Capturing subpattern count = 1 -Named capturing subpatterns: - abc 1 -No options -No first char -Need char = 'P' - abcPXP123 - 0: PXP - 1: P - -/(?1)X(?P)/I -Capturing subpattern count = 1 -Named capturing subpatterns: - abc 1 -No options -No first char -Need char = 'P' - abcPXP123 - 0: PXP - 1: P - -/(?:a(?&abc)b)*(?x)/ - 123axbaxbaxbx456 - 0: axbaxbaxbx - 1: x - 123axbaxbaxb456 - 0: x - 1: x - -/(?:a(?&abc)b){1,5}(?x)/ - 123axbaxbaxbx456 - 0: axbaxbaxbx - 1: x - -/(?:a(?&abc)b){2,5}(?x)/ - 123axbaxbaxbx456 - 0: axbaxbaxbx - 1: x - -/(?:a(?&abc)b){2,}(?x)/ - 123axbaxbaxbx456 - 0: axbaxbaxbx - 1: x - -/(abc)(?i:(?1))/ - defabcabcxyz - 0: abcabc - 1: abc - DEFabcABCXYZ -No match - -/(abc)(?:(?i)(?1))/ - defabcabcxyz - 0: abcabc - 1: abc - DEFabcABCXYZ -No match - -/^(a)\g-2/ -Failed: reference to non-existent subpattern at offset 7 - -/^(a)\g/ -Failed: a numbered reference must not be zero at offset 5 - -/^(a)\g{0}/ -Failed: a numbered reference must not be zero at offset 8 - -/^(a)\g{3/ -Failed: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number at offset 8 - -/^(a)\g{aa}/ -Failed: reference to non-existent subpattern at offset 9 - -/^a.b/ - a\rb - 0: a\x0db - *** Failers -No match - a\nb -No match - -/.+foo/ - afoo - 0: afoo - ** Failers -No match - \r\nfoo -No match - \nfoo -No match - -/.+foo/ - afoo - 0: afoo - \nfoo - 0: \x0afoo - ** Failers -No match - \r\nfoo -No match - -/.+foo/ - afoo - 0: afoo - ** Failers -No match - \nfoo -No match - \r\nfoo -No match - -/.+foo/s - afoo - 0: afoo - \r\nfoo - 0: \x0d\x0afoo - \nfoo - 0: \x0afoo - -/^$/mg - abc\r\rxyz - 0: - abc\n\rxyz - 0: - ** Failers -No match - abc\r\nxyz -No match - -/(?m)^$/g+ - abc\r\n\r\n - 0: - 0+ \x0d\x0a - -/(?m)^$|^\r\n/g+ - abc\r\n\r\n - 0: - 0+ \x0d\x0a - 0: \x0d\x0a - 0+ - -/(?m)$/g+ - abc\r\n\r\n - 0: - 0+ 
\x0d\x0a\x0d\x0a - 0: - 0+ \x0d\x0a - 0: - 0+ - -/abc.$/mgx - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9 - 0: abc1 - 0: abc4 - 0: abc5 - 0: abc9 - -/^X/m - XABC - 0: X - ** Failers -No match - XABC\B -No match - -/(ab|c)(?-1)/BZ ------------------------------------------------------------------- - Bra - CBra 1 - ab - Alt - c - Ket - Recurse - Ket - End ------------------------------------------------------------------- - abc - 0: abc - 1: ab - -/xy(?+1)(abc)/BZ ------------------------------------------------------------------- - Bra - xy - Recurse - CBra 1 - abc - Ket - Ket - End ------------------------------------------------------------------- - xyabcabc - 0: xyabcabc - 1: abc - ** Failers -No match - xyabc -No match - -/x(?-0)y/ -Failed: a numbered reference must not be zero at offset 5 - -/x(?-1)y/ -Failed: reference to non-existent subpattern at offset 5 - -/x(?+0)y/ -Failed: a numbered reference must not be zero at offset 5 - -/x(?+1)y/ -Failed: reference to non-existent subpattern at offset 5 - -/^(abc)?(?(-1)X|Y)/BZ ------------------------------------------------------------------- - Bra - ^ - Brazero - CBra 1 - abc - Ket - Cond - 1 Cond ref - X - Alt - Y - Ket - Ket - End ------------------------------------------------------------------- - abcX - 0: abcX - 1: abc - Y - 0: Y - ** Failers -No match - abcY -No match - -/^((?(+1)X|Y)(abc))+/BZ ------------------------------------------------------------------- - Bra - ^ - CBra 1 - Cond - 2 Cond ref - X - Alt - Y - Ket - CBra 2 - abc - Ket - KetRmax - Ket - End ------------------------------------------------------------------- - YabcXabc - 0: YabcXabc - 1: Xabc - 2: abc - YabcXabcXabc - 0: YabcXabcXabc - 1: Xabc - 2: abc - ** Failers -No match - XabcXabc -No match - -/(?(-1)a)/BZ -Failed: reference to non-existent subpattern at offset 6 - -/((?(-1)a))/BZ ------------------------------------------------------------------- - Bra - CBra 1 - Cond - 1 Cond ref - a - Ket - Ket - Ket - End 
------------------------------------------------------------------- - -/((?(-2)a))/BZ -Failed: reference to non-existent subpattern at offset 7 - -/^(?(+1)X|Y)(.)/BZ ------------------------------------------------------------------- - Bra - ^ - Cond - 1 Cond ref - X - Alt - Y - Ket - CBra 1 - Any - Ket - Ket - End ------------------------------------------------------------------- - Y! - 0: Y! - 1: ! - -/(?tom|bon)-\k{A}/ - tom-tom - 0: tom-tom - 1: tom - bon-bon - 0: bon-bon - 1: bon - ** Failers -No match - tom-bon -No match - -/\g{A/ -Failed: syntax error in subpattern name (missing terminator) at offset 4 - -/(?|(abc)|(xyz))/BZ ------------------------------------------------------------------- - Bra - Bra - CBra 1 - abc - Ket - Alt - CBra 1 - xyz - Ket - Ket - Ket - End ------------------------------------------------------------------- - >abc< - 0: abc - 1: abc - >xyz< - 0: xyz - 1: xyz - -/(x)(?|(abc)|(xyz))(x)/BZ ------------------------------------------------------------------- - Bra - CBra 1 - x - Ket - Bra - CBra 2 - abc - Ket - Alt - CBra 2 - xyz - Ket - Ket - CBra 3 - x - Ket - Ket - End ------------------------------------------------------------------- - xabcx - 0: xabcx - 1: x - 2: abc - 3: x - xxyzx - 0: xxyzx - 1: x - 2: xyz - 3: x - -/(x)(?|(abc)(pqr)|(xyz))(x)/BZ ------------------------------------------------------------------- - Bra - CBra 1 - x - Ket - Bra - CBra 2 - abc - Ket - CBra 3 - pqr - Ket - Alt - CBra 2 - xyz - Ket - Ket - CBra 4 - x - Ket - Ket - End ------------------------------------------------------------------- - xabcpqrx - 0: xabcpqrx - 1: x - 2: abc - 3: pqr - 4: x - xxyzx - 0: xxyzx - 1: x - 2: xyz - 3: - 4: x - -/\H++X/BZ ------------------------------------------------------------------- - Bra - \H++ - X - Ket - End ------------------------------------------------------------------- - ** Failers -No match - XXXX -No match - -/\H+\hY/BZ ------------------------------------------------------------------- - Bra - \H++ - 
\h - Y - Ket - End ------------------------------------------------------------------- - XXXX Y - 0: XXXX Y - -/\H+ Y/BZ ------------------------------------------------------------------- - Bra - \H++ - Y - Ket - End ------------------------------------------------------------------- - -/\h+A/BZ ------------------------------------------------------------------- - Bra - \h++ - A - Ket - End ------------------------------------------------------------------- - -/\v*B/BZ ------------------------------------------------------------------- - Bra - \v*+ - B - Ket - End ------------------------------------------------------------------- - -/\V+\x0a/BZ ------------------------------------------------------------------- - Bra - \V++ - \x0a - Ket - End ------------------------------------------------------------------- - -/A+\h/BZ ------------------------------------------------------------------- - Bra - A++ - \h - Ket - End ------------------------------------------------------------------- - -/ *\H/BZ ------------------------------------------------------------------- - Bra - *+ - \H - Ket - End ------------------------------------------------------------------- - -/A*\v/BZ ------------------------------------------------------------------- - Bra - A*+ - \v - Ket - End ------------------------------------------------------------------- - -/\x0b*\V/BZ ------------------------------------------------------------------- - Bra - \x0b*+ - \V - Ket - End ------------------------------------------------------------------- - -/\d+\h/BZ ------------------------------------------------------------------- - Bra - \d++ - \h - Ket - End ------------------------------------------------------------------- - -/\d*\v/BZ ------------------------------------------------------------------- - Bra - \d*+ - \v - Ket - End ------------------------------------------------------------------- - -/S+\h\S+\v/BZ ------------------------------------------------------------------- - Bra - S++ - \h - 
\S++ - \v - Ket - End ------------------------------------------------------------------- - -/\w{3,}\h\w+\v/BZ ------------------------------------------------------------------- - Bra - \w{3} - \w*+ - \h - \w++ - \v - Ket - End ------------------------------------------------------------------- - -/\h+\d\h+\w\h+\S\h+\H/BZ ------------------------------------------------------------------- - Bra - \h++ - \d - \h++ - \w - \h++ - \S - \h++ - \H - Ket - End ------------------------------------------------------------------- - -/\v+\d\v+\w\v+\S\v+\V/BZ ------------------------------------------------------------------- - Bra - \v++ - \d - \v++ - \w - \v++ - \S - \v++ - \V - Ket - End ------------------------------------------------------------------- - -/\H+\h\H+\d/BZ ------------------------------------------------------------------- - Bra - \H++ - \h - \H+ - \d - Ket - End ------------------------------------------------------------------- - -/\V+\v\V+\w/BZ ------------------------------------------------------------------- - Bra - \V++ - \v - \V+ - \w - Ket - End ------------------------------------------------------------------- - -/\( (?: [^()]* | (?R) )* \)/x 
-\J1024(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(
0(0(0(0(0(00)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0
)0)0)0)0)0)0)0) - 0: (0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(
0(0(0(0(0(0(0(0(0(0(0(0(00)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0
)0)0)0)0)0)0)0)0)0)0)0)0)0)0) - -/[\E]AAA/ -Failed: missing terminating ] for character class at offset 7 - -/[\Q\E]AAA/ -Failed: missing terminating ] for character class at offset 9 - -/[^\E]AAA/ -Failed: missing terminating ] for character class at offset 8 - -/[^\Q\E]AAA/ -Failed: missing terminating ] for character class at offset 10 - -/[\E^]AAA/ -Failed: missing terminating ] for character class at offset 8 - -/[\Q\E^]AAA/ -Failed: missing terminating ] for character class at offset 10 - -/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/BZ ------------------------------------------------------------------- - Bra - A - *PRUNE - B - *SKIP - C - *THEN - D - *COMMIT - E - *FAIL - F - *FAIL - G - *FAIL - H - *ACCEPT - I - Ket - End ------------------------------------------------------------------- - -/^a+(*FAIL)/C - aaaaaa ---->aaaaaa - +0 ^ ^ - +1 ^ a+ - +3 ^ ^ (*FAIL) - +3 ^ ^ (*FAIL) - +3 ^ ^ (*FAIL) - +3 ^ ^ (*FAIL) - +3 ^ ^ (*FAIL) - +3 ^^ (*FAIL) -No match - -/a+b?c+(*FAIL)/C - aaabccc ---->aaabccc - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ c+ - +6 ^ ^ (*FAIL) - +6 ^ ^ (*FAIL) - +6 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ c+ - +6 ^ ^ (*FAIL) - +6 ^ ^ (*FAIL) - +6 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^^ b? - +4 ^ ^ c+ - +6 ^ ^ (*FAIL) - +6 ^ ^ (*FAIL) - +6 ^ ^ (*FAIL) -No match - -/a+b?(*PRUNE)c+(*FAIL)/C - aaabccc ---->aaabccc - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ (*PRUNE) -+12 ^ ^ c+ -+14 ^ ^ (*FAIL) -+14 ^ ^ (*FAIL) -+14 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ (*PRUNE) -+12 ^ ^ c+ -+14 ^ ^ (*FAIL) -+14 ^ ^ (*FAIL) -+14 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^^ b? - +4 ^ ^ (*PRUNE) -+12 ^ ^ c+ -+14 ^ ^ (*FAIL) -+14 ^ ^ (*FAIL) -+14 ^ ^ (*FAIL) -No match - -/a+b?(*COMMIT)c+(*FAIL)/C - aaabccc ---->aaabccc - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ (*COMMIT) -+13 ^ ^ c+ -+15 ^ ^ (*FAIL) -+15 ^ ^ (*FAIL) -+15 ^ ^ (*FAIL) -No match - -/a+b?(*SKIP)c+(*FAIL)/C - aaabcccaaabccc ---->aaabcccaaabccc - +0 ^ a+ - +2 ^ ^ b? 
- +4 ^ ^ (*SKIP) -+11 ^ ^ c+ -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ (*SKIP) -+11 ^ ^ c+ -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -No match - -/a+b?(*THEN)c+(*FAIL)/C - aaabccc ---->aaabccc - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ (*THEN) -+11 ^ ^ c+ -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^ ^ b? - +4 ^ ^ (*THEN) -+11 ^ ^ c+ -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) - +0 ^ a+ - +2 ^^ b? - +4 ^ ^ (*THEN) -+11 ^ ^ c+ -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -+13 ^ ^ (*FAIL) -No match - -/a(*MARK)b/ -Failed: (*MARK) must have an argument at offset 7 - -/(?i:A{1,}\6666666666)/ -Failed: number is too big at offset 19 - -/\g6666666666/ -Failed: number is too big at offset 11 - -/[\g6666666666]/BZ ------------------------------------------------------------------- - Bra - [6g] - Ket - End ------------------------------------------------------------------- - -/(?1)\c[/ -Failed: reference to non-existent subpattern at offset 3 - -/.+A/ - \r\nA -No match - -/\nA/ - \r\nA - 0: \x0aA - -/[\r\n]A/ - \r\nA - 0: \x0aA - -/(\r|\n)A/ - \r\nA - 0: \x0aA - 1: \x0a - -/a(*CR)b/ -Failed: (*VERB) not recognized or malformed at offset 5 - -/(*CR)a.b/ - a\nb - 0: a\x0ab - ** Failers -No match - a\rb -No match - -/(*CR)a.b/ - a\nb - 0: a\x0ab - ** Failers -No match - a\rb -No match - -/(*LF)a.b/ - a\rb - 0: a\x0db - ** Failers -No match - a\nb -No match - -/(*CRLF)a.b/ - a\rb - 0: a\x0db - a\nb - 0: a\x0ab - ** Failers -No match - a\r\nb -No match - -/(*ANYCRLF)a.b/ - ** Failers -No match - a\rb -No match - a\nb -No match - a\r\nb -No match - -/(*ANY)a.b/ - ** Failers -No match - a\rb -No match - a\nb -No match - a\r\nb -No match - a\x85b -No match - -/(*ANY).*/g - abc\r\ndef - 0: abc - 0: - 0: def - 0: - -/(*ANYCRLF).*/g - abc\r\ndef - 0: abc - 0: - 0: def - 0: - -/(*CRLF).*/g - abc\r\ndef - 0: abc - 0: - 0: def - 0: - -/a\Rb/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' - 
a\rb - 0: a\x0db - a\nb - 0: a\x0ab - a\r\nb - 0: a\x0d\x0ab - ** Failers -No match - a\x85b -No match - a\x0bb -No match - -/a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' - a\rb - 0: a\x0db - a\nb - 0: a\x0ab - a\r\nb - 0: a\x0d\x0ab - a\x85b - 0: a\x85b - a\x0bb - 0: a\x0bb - ** Failers -No match - a\x85b\ -No match - a\x0bb\ -No match - -/a\R?b/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x0db - a\nb - 0: a\x0ab - a\r\nb - 0: a\x0d\x0ab - ** Failers -No match - a\x85b -No match - a\x0bb -No match - -/a\R?b/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' - a\rb - 0: a\x0db - a\nb - 0: a\x0ab - a\r\nb - 0: a\x0d\x0ab - a\x85b - 0: a\x85b - a\x0bb - 0: a\x0bb - ** Failers -No match - a\x85b\ -No match - a\x0bb\ -No match - -/a\R{2,4}b/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' - a\r\n\nb - 0: a\x0d\x0a\x0ab - a\n\r\rb - 0: a\x0a\x0d\x0db - a\r\n\r\n\r\n\r\nb - 0: a\x0d\x0a\x0d\x0a\x0d\x0a\x0d\x0ab - ** Failers -No match - a\x85\85b -No match - a\x0b\0bb -No match - -/a\R{2,4}b/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' - a\r\rb - 0: a\x0d\x0db - a\n\n\nb - 0: a\x0a\x0a\x0ab - a\r\n\n\r\rb - 0: a\x0d\x0a\x0a\x0d\x0db - a\x85\85b -No match - a\x0b\0bb -No match - ** Failers -No match - a\r\r\r\r\rb -No match - a\x85\85b\ -No match - a\x0b\0bb\ -No match - -/(*BSR_ANYCRLF)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' - a\nb - 0: a\x0ab - a\rb - 0: a\x0db - -/(*BSR_UNICODE)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' - a\x85b - 0: a\x85b - -/(*BSR_ANYCRLF)(*CRLF)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -Forced newline sequence: CRLF -First char = 'a' -Need char = 'b' - a\nb - 0: a\x0ab - a\rb - 0: a\x0db - 
-/(*CRLF)(*BSR_UNICODE)a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode -Forced newline sequence: CRLF -First char = 'a' -Need char = 'b' - a\x85b - 0: a\x85b - -/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -Forced newline sequence: CR -First char = 'a' -Need char = 'b' - -/(?)(?&)/ -Failed: subpattern name expected at offset 9 - -/(?)(?&a)/ -Failed: reference to non-existent subpattern at offset 12 - -/(?)(?&aaaaaaaaaaaaaaaaaaaaaaa)/ -Failed: reference to non-existent subpattern at offset 32 - -/(?+-a)/ -Failed: digit expected after (?+ at offset 3 - -/(?-+a)/ -Failed: unrecognized character after (? or (?- at offset 3 - -/(?(-1))/ -Failed: reference to non-existent subpattern at offset 6 - -/(?(+10))/ -Failed: reference to non-existent subpattern at offset 7 - -/(?(10))/ -Failed: reference to non-existent subpattern at offset 6 - -/(?(+2))()()/ - -/(?(2))()()/ - -/\k''/ -Failed: subpattern name expected at offset 3 - -/\k<>/ -Failed: subpattern name expected at offset 3 - -/\k{}/ -Failed: subpattern name expected at offset 3 - -/\k/ -Failed: \k is not followed by a braced, angle-bracketed, or quoted name at offset 1 - -/\kabc/ -Failed: \k is not followed by a braced, angle-bracketed, or quoted name at offset 1 - -/(?P=)/ -Failed: subpattern name expected at offset 4 - -/(?P>)/ -Failed: subpattern name expected at offset 4 - -/(?!\w)(?R)/ -Failed: recursive call could loop indefinitely at offset 9 - -/(?=\w)(?R)/ -Failed: recursive call could loop indefinitely at offset 9 - -/(?x|y){0}z/ - xzxx - 0: xz - yzyy - 0: yz - ** Failers -No match - xxz -No match - -/(\3)(\1)(a)/ - cat -No match - -/(\3)(\1)(a)/ - cat - 0: a - 1: - 2: - 3: a - -/TA]/ - The ACTA] comes - 0: TA] - -/TA]/ -Failed: ] is an invalid data character in JavaScript compatibility mode at offset 2 - -/(?2)[]a()b](abc)/ -Failed: reference to non-existent subpattern at offset 3 - -/(?2)[^]a()b](abc)/ -Failed: reference to non-existent subpattern at 
offset 3 - -/(?1)[]a()b](abc)/ - abcbabc - 0: abcbabc - 1: abc - ** Failers -No match - abcXabc -No match - -/(?1)[^]a()b](abc)/ - abcXabc - 0: abcXabc - 1: abc - ** Failers -No match - abcbabc -No match - -/(?2)[]a()b](abc)(xyz)/ - xyzbabcxyz - 0: xyzbabcxyz - 1: abc - 2: xyz - -/(?&N)[]a(?)](?abc)/ -Failed: reference to non-existent subpattern at offset 4 - -/(?&N)[]a(?)](abc)/ -Failed: reference to non-existent subpattern at offset 4 - -/a[]b/ -Failed: missing terminating ] for character class at offset 4 - -/a[^]b/ -Failed: missing terminating ] for character class at offset 5 - -/a[]b/ - ** Failers -No match - ab -No match - -/a[]+b/ - ** Failers -No match - ab -No match - -/a[]*+b/ - ** Failers -No match - ab -No match - -/a[^]b/ - aXb - 0: aXb - a\nb - 0: a\x0ab - ** Failers -No match - ab -No match - -/a[^]+b/ - aXb - 0: aXb - a\nX\nXb - 0: a\x0aX\x0aXb - ** Failers -No match - ab -No match - -/a(?!)b/BZ ------------------------------------------------------------------- - Bra - a - *FAIL - b - Ket - End ------------------------------------------------------------------- - -/(?!)?a/BZ ------------------------------------------------------------------- - Bra - Brazero - Assert not - Ket - a - Ket - End ------------------------------------------------------------------- - ab - 0: a - -/a(*FAIL)+b/ -Failed: nothing to repeat at offset 8 - -/(abc|pqr|123){0}[xyz]/SI -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: x y z - -/(?(?=.*b)b|^)/CI -Capturing subpattern count = 0 -May match empty string -Options: -No first char -No need char - adc ---->adc - +0 ^ (?(?=.*b)b|^) - +2 ^ (?=.*b) - +5 ^ .* - +7 ^ ^ b - +7 ^ ^ b - +7 ^^ b - +7 ^ b -+11 ^ ^ -+12 ^ ) -+13 ^ - 0: - abc ---->abc - +0 ^ (?(?=.*b)b|^) - +2 ^ (?=.*b) - +5 ^ .* - +7 ^ ^ b - +7 ^ ^ b - +7 ^^ b - +8 ^ ^ ) - +9 ^ b - +0 ^ (?(?=.*b)b|^) - +2 ^ (?=.*b) - +5 ^ .* - +7 ^ ^ b - +7 ^^ b - +7 ^ b - +8 ^^ ) - +9 ^ b -+10 ^^ | -+13 ^^ - 0: 
b - -/(?(?=b).*b|^d)/I -Capturing subpattern count = 0 -No options -No first char -No need char - -/(?(?=.*b).*b|^d)/I -Capturing subpattern count = 0 -No options -No first char -No need char - -/xyz/C - xyz ---->xyz - +0 ^ x - +1 ^^ y - +2 ^ ^ z - +3 ^ ^ - 0: xyz - abcxyz ---->abcxyz - +0 ^ x - +1 ^^ y - +2 ^ ^ z - +3 ^ ^ - 0: xyz - abcxyz\Y ---->abcxyz - +0 ^ x - +0 ^ x - +0 ^ x - +0 ^ x - +1 ^^ y - +2 ^ ^ z - +3 ^ ^ - 0: xyz - ** Failers -No match - abc -No match - abc\Y ---->abc - +0 ^ x - +0 ^ x - +0 ^ x - +0 ^ x -No match - abcxypqr -No match - abcxypqr\Y ---->abcxypqr - +0 ^ x - +0 ^ x - +0 ^ x - +0 ^ x - +1 ^^ y - +2 ^ ^ z - +0 ^ x - +0 ^ x - +0 ^ x - +0 ^ x - +0 ^ x -No match - -/(*NO_START_OPT)xyz/C - abcxyz ---->abcxyz -+15 ^ x -+15 ^ x -+15 ^ x -+15 ^ x -+16 ^^ y -+17 ^ ^ z -+18 ^ ^ - 0: xyz - -/(*NO_AUTO_POSSESS)a+b/BZ ------------------------------------------------------------------- - Bra - a+ - b - Ket - End ------------------------------------------------------------------- - -/xyz/CY - abcxyz ---->abcxyz - +0 ^ x - +0 ^ x - +0 ^ x - +0 ^ x - +1 ^^ y - +2 ^ ^ z - +3 ^ ^ - 0: xyz - -/^"((?(?=[a])[^"])|b)*"$/C - "ab" ---->"ab" - +0 ^ ^ - +1 ^ " - +2 ^^ ((?(?=[a])[^"])|b)* - +3 ^^ (?(?=[a])[^"]) - +5 ^^ (?=[a]) - +8 ^^ [a] -+11 ^ ^ ) -+12 ^^ [^"] -+16 ^ ^ ) -+17 ^ ^ | - +3 ^ ^ (?(?=[a])[^"]) - +5 ^ ^ (?=[a]) - +8 ^ ^ [a] -+17 ^ ^ | -+21 ^ ^ " -+18 ^ ^ b -+19 ^ ^ ) - +3 ^ ^ (?(?=[a])[^"]) - +5 ^ ^ (?=[a]) - +8 ^ ^ [a] -+17 ^ ^ | -+21 ^ ^ " -+22 ^ ^ $ -+23 ^ ^ - 0: "ab" - 1: - -/^"((?(?=[a])[^"])|b)*"$/ - "ab" - 0: "ab" - 1: - -/^X(?5)(a)(?|(b)|(q))(c)(d)Y/ -Failed: reference to non-existent subpattern at offset 5 - -/^X(?&N)(a)(?|(b)|(q))(c)(d)(?Y)/ - XYabcdY - 0: XYabcdY - 1: a - 2: b - 3: c - 4: d - 5: Y - -/Xa{2,4}b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/Xa{2,4}?b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - 
Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/Xa{2,4}+b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\d{2,4}b/ - X\P -Partial match: X - X3\P -Partial match: X3 - X33\P -Partial match: X33 - X333\P -Partial match: X333 - X3333\P -Partial match: X3333 - -/X\d{2,4}?b/ - X\P -Partial match: X - X3\P -Partial match: X3 - X33\P -Partial match: X33 - X333\P -Partial match: X333 - X3333\P -Partial match: X3333 - -/X\d{2,4}+b/ - X\P -Partial match: X - X3\P -Partial match: X3 - X33\P -Partial match: X33 - X333\P -Partial match: X333 - X3333\P -Partial match: X3333 - -/X\D{2,4}b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\D{2,4}?b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\D{2,4}+b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X[abc]{2,4}b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X[abc]{2,4}?b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X[abc]{2,4}+b/ - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X[^a]{2,4}b/ - X\P -Partial match: X - Xz\P -Partial match: Xz - Xzz\P -Partial match: Xzz - Xzzz\P -Partial match: Xzzz - Xzzzz\P -Partial match: Xzzzz - -/X[^a]{2,4}?b/ - X\P -Partial match: X - Xz\P -Partial match: Xz - Xzz\P -Partial match: Xzz - Xzzz\P -Partial match: Xzzz - Xzzzz\P -Partial match: Xzzzz - -/X[^a]{2,4}+b/ - X\P -Partial match: X - Xz\P -Partial 
match: Xz - Xzz\P -Partial match: Xzz - Xzzz\P -Partial match: Xzzz - Xzzzz\P -Partial match: Xzzzz - -/(Y)X\1{2,4}b/ - YX\P -Partial match: YX - YXY\P -Partial match: YXY - YXYY\P -Partial match: YXYY - YXYYY\P -Partial match: YXYYY - YXYYYY\P -Partial match: YXYYYY - -/(Y)X\1{2,4}?b/ - YX\P -Partial match: YX - YXY\P -Partial match: YXY - YXYY\P -Partial match: YXYY - YXYYY\P -Partial match: YXYYY - YXYYYY\P -Partial match: YXYYYY - -/(Y)X\1{2,4}+b/ - YX\P -Partial match: YX - YXY\P -Partial match: YXY - YXYY\P -Partial match: YXYY - YXYYY\P -Partial match: YXYYY - YXYYYY\P -Partial match: YXYYYY - -/\++\KZ|\d+X|9+Y/ - ++++123999\P -Partial match: 123999 - ++++123999Y\P - 0: 999Y - ++++Z1234\P - 0: Z - -/Z(*F)/ - Z\P -No match - ZA\P -No match - -/Z(?!)/ - Z\P -No match - ZA\P -No match - -/dog(sbody)?/ - dogs\P - 0: dog - dogs\P\P -Partial match: dogs - -/dog(sbody)??/ - dogs\P - 0: dog - dogs\P\P - 0: dog - -/dog|dogsbody/ - dogs\P - 0: dog - dogs\P\P - 0: dog - -/dogsbody|dog/ - dogs\P - 0: dog - dogs\P\P -Partial match: dogs - -/\bthe cat\b/ - the cat\P - 0: the cat - the cat\P\P -Partial match: the cat - -/abc/ - abc\P - 0: abc - abc\P\P - 0: abc - -/abc\K123/ - xyzabc123pqr - 0: 123 - xyzabc12\P -Partial match: abc12 - xyzabc12\P\P -Partial match: abc12 - -/(?<=abc)123/ - xyzabc123pqr - 0: 123 - xyzabc12\P -Partial match at offset 6: abc12 - xyzabc12\P\P -Partial match at offset 6: abc12 - -/\babc\b/ - +++abc+++ - 0: abc - +++ab\P -Partial match at offset 3: +ab - +++ab\P\P -Partial match at offset 3: +ab - -/(?&word)(?&element)(?(DEFINE)(?<[^m][^>]>[^<])(?\w*+))/BZ ------------------------------------------------------------------- - Bra - Recurse - Recurse - Cond - Cond def - CBra 1 - < - [^m] - [^>] - > - [^<] - Ket - CBra 2 - \w*+ - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/(?&word)(?&element)(?(DEFINE)(?<[^\d][^>]>[^<])(?\w*+))/BZ ------------------------------------------------------------------- - 
Bra - Recurse - Recurse - Cond - Cond def - CBra 1 - < - [\x00-/:-\xff] (neg) - [^>] - > - [^<] - Ket - CBra 2 - \w*+ - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/(ab)(x(y)z(cd(*ACCEPT)))pq/BZ ------------------------------------------------------------------- - Bra - CBra 1 - ab - Ket - CBra 2 - x - CBra 3 - y - Ket - z - CBra 4 - cd - Close 4 - Close 2 - *ACCEPT - Ket - Ket - pq - Ket - End ------------------------------------------------------------------- - -/abc\K/+ - abcdef - 0: - 0+ def - abcdef\N\N - 0: - 0+ def - xyzabcdef\N\N - 0: - 0+ def - ** Failers -No match - abcdef\N -No match - xyzabcdef\N -No match - -/^(?:(?=abc)|abc\K)/+ - abcdef - 0: - 0+ abcdef - abcdef\N\N - 0: - 0+ def - ** Failers -No match - abcdef\N -No match - -/a?b?/+ - xyz - 0: - 0+ xyz - xyzabc - 0: - 0+ xyzabc - xyzabc\N - 0: ab - 0+ c - xyzabc\N\N - 0: - 0+ yzabc - xyz\N\N - 0: - 0+ yz - ** Failers - 0: - 0+ ** Failers - xyz\N -No match - -/^a?b?/+ - xyz - 0: - 0+ xyz - xyzabc - 0: - 0+ xyzabc - ** Failers - 0: - 0+ ** Failers - xyzabc\N -No match - xyzabc\N\N -No match - xyz\N\N -No match - xyz\N -No match - -/^(?a|b\gc)/ - aaaa - 0: a - 1: a - bacxxx - 0: bac - 1: bac - bbaccxxx - 0: bbacc - 1: bbacc - bbbacccxx - 0: bbbaccc - 1: bbbaccc - -/^(?a|b\g'name'c)/ - aaaa - 0: a - 1: a - bacxxx - 0: bac - 1: bac - bbaccxxx - 0: bbacc - 1: bbacc - bbbacccxx - 0: bbbaccc - 1: bbbaccc - -/^(a|b\g<1>c)/ - aaaa - 0: a - 1: a - bacxxx - 0: bac - 1: bac - bbaccxxx - 0: bbacc - 1: bbacc - bbbacccxx - 0: bbbaccc - 1: bbbaccc - -/^(a|b\g'1'c)/ - aaaa - 0: a - 1: a - bacxxx - 0: bac - 1: bac - bbaccxxx - 0: bbacc - 1: bbacc - bbbacccxx - 0: bbbaccc - 1: bbbaccc - -/^(a|b\g'-1'c)/ - aaaa - 0: a - 1: a - bacxxx - 0: bac - 1: bac - bbaccxxx - 0: bbacc - 1: bbacc - bbbacccxx - 0: bbbaccc - 1: bbbaccc - -/(^(a|b\g<-1>c))/ - aaaa - 0: a - 1: a - 2: a - bacxxx - 0: bac - 1: bac - 2: bac - bbaccxxx - 0: bbacc - 1: bbacc - 2: bbacc - bbbacccxx - 0: bbbaccc 
- 1: bbbaccc - 2: bbbaccc - -/(?-i:\g)(?i:(?a))/ - XaaX - 0: aa - 1: a - XAAX - 0: AA - 1: A - -/(?i:\g)(?-i:(?a))/ - XaaX - 0: aa - 1: a - ** Failers -No match - XAAX -No match - -/(?-i:\g<+1>)(?i:(a))/ - XaaX - 0: aa - 1: a - XAAX - 0: AA - 1: A - -/(?=(?(?#simplesyntax)\$(?[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g)\]|->\g(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g(?\[(?:\g|'(?:\\.|[^'\\])*'|"(?:\g|\\.|[^"\\])*")\])?|\g|\$\{\g\})\}|(?#complexsyntax)\{(?\$(?\g(\g*|\(.*?\))?)(?:->\g)*|\$\g|\$\{\g\})\}))\{/ - -/(?a|b|c)\g*/ - abc - 0: abc - 1: a - accccbbb - 0: accccbbb - 1: a - -/^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/ - XYabcdY - 0: XYabcdY - 1: a - 2: b - 3: - 4: - 5: c - 6: d - 7: Y - -/(?<=b(?1)|zzz)(a)/ - xbaax - 0: a - 1: a - xzzzax - 0: a - 1: a - -/(a)(?<=b\1)/ -Failed: lookbehind assertion is not fixed length at offset 10 - -/(a)(?<=b+(?1))/ -Failed: lookbehind assertion is not fixed length at offset 13 - -/(a+)(?<=b(?1))/ -Failed: lookbehind assertion is not fixed length at offset 14 - -/(a(?<=b(?1)))/ -Failed: lookbehind assertion is not fixed length at offset 13 - -/(?<=b(?1))xyz/ -Failed: reference to non-existent subpattern at offset 8 - -/(?<=b(?1))xyz(b+)pqrstuvew/ -Failed: lookbehind assertion is not fixed length at offset 26 - -/(a|bc)\1/SI -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - -/(a|bc)\1{2,3}/SI -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char -Subject length lower bound = 3 -Starting chars: a b - -/(a|bc)(?1)/SI -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - -/(a|b\1)(a|b\1)/SI -Capturing subpattern count = 2 -Max back reference = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - -/(a|b\1){2}/SI -Capturing 
subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - -/(a|bbbb\1)(a|bbbb\1)/SI -Capturing subpattern count = 2 -Max back reference = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - -/(a|bbbb\1){2}/SI -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -No need char -Subject length lower bound = 2 -Starting chars: a b - -/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/SI -Capturing subpattern count = 1 -Options: anchored -No first char -Need char = ':' -Subject length lower bound = 22 -No starting char list - -/]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/isIS -Capturing subpattern count = 11 -Options: caseless dotall -First char = '<' -Need char = '>' -Subject length lower bound = 47 -No starting char list - -"(?>.*/)foo"SI -Capturing subpattern count = 0 -No options -No first char -Need char = 'o' -Subject length lower bound = 4 -No starting char list - -/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /xSI -Capturing subpattern count = 0 -Options: extended -No first char -Need char = '-' -Subject length lower bound = 8 -No starting char list - -/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/iSI -Capturing subpattern count = 1 -Options: caseless -No first char -No need char -Subject length lower bound = 1 -Starting chars: A B C a b c - -/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/SI -Capturing subpattern count = 0 -No options -No first char -Need char = 'b' -Subject length lower bound = 41 -Starting chars: c d - -/A)|(?
    B))/I -Capturing subpattern count = 1 -Named capturing subpatterns: - a 1 -No options -No first char -No need char - AB\Ca - 0: A - 1: A - C A (1) a - BA\Ca - 0: B - 1: B - C B (1) a - -/(?|(?A)|(?B))/ -Failed: different names for subpatterns of the same number are not allowed at offset 15 - -/(?:a(? (?')|(?")) | - b(? (?')|(?")) ) - (?('quote')[a-z]+|[0-9]+)/JIx -Capturing subpattern count = 6 -Max back reference = 1 -Named capturing subpatterns: - apostrophe 2 - apostrophe 5 - quote 1 - quote 4 - realquote 3 - realquote 6 -Options: extended dupnames -No first char -No need char - a"aaaaa - 0: a"aaaaa - 1: " - 2: - 3: " - b"aaaaa - 0: b"aaaaa - 1: - 2: - 3: - 4: " - 5: - 6: " - ** Failers -No match - b"11111 -No match - a"11111 -No match - -/^(?|(a)(b)(c)(?d)|(?e)) (?('D')X|Y)/JDZx ------------------------------------------------------------------- - Bra - ^ - Bra - CBra 1 - a - Ket - CBra 2 - b - Ket - CBra 3 - c - Ket - CBra 4 - d - Ket - Alt - CBra 1 - e - Ket - Ket - Cond - Cond ref 2 - X - Alt - Y - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 4 -Max back reference = 4 -Named capturing subpatterns: - D 4 - D 1 -Options: anchored extended dupnames -No first char -No need char - abcdX - 0: abcdX - 1: a - 2: b - 3: c - 4: d - eX - 0: eX - 1: e - ** Failers -No match - abcdY -No match - ey -No match - -/(?a) (b)(c) (?d (?(R&A)$ | (?4)) )/JDZx ------------------------------------------------------------------- - Bra - CBra 1 - a - Ket - CBra 2 - b - Ket - CBra 3 - c - Ket - CBra 4 - d - Cond - Cond recurse 2 - $ - Alt - Recurse - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 4 -Max back reference = 1 -Named capturing subpatterns: - A 1 - A 4 -Options: extended dupnames -First char = 'a' -Need char = 'd' - abcdd - 0: abcdd - 1: a - 2: b - 3: c - 4: dd - ** Failers -No match - abcdde -No match - -/abcd*/ - xxxxabcd\P - 0: 
abcd - xxxxabcd\P\P -Partial match: abcd - -/abcd*/i - xxxxabcd\P - 0: abcd - xxxxabcd\P\P -Partial match: abcd - XXXXABCD\P - 0: ABCD - XXXXABCD\P\P -Partial match: ABCD - -/abc\d*/ - xxxxabc1\P - 0: abc1 - xxxxabc1\P\P -Partial match: abc1 - -/(a)bc\1*/ - xxxxabca\P - 0: abca - 1: a - xxxxabca\P\P -Partial match: abca - -/abc[de]*/ - xxxxabcde\P - 0: abcde - xxxxabcde\P\P -Partial match: abcde - -/-- This is not in the Perl-compatible test because Perl seems currently to be - broken and not behaving as specified in that it *does* bumpalong after - hitting (*COMMIT). --/ - -/(?1)(A(*COMMIT)|B)D/ - ABD - 0: ABD - 1: B - XABD - 0: ABD - 1: B - BAD - 0: BAD - 1: A - ABXABD - 0: ABD - 1: B - ** Failers -No match - ABX -No match - BAXBAD -No match - -/(\3)(\1)(a)/ - cat - 0: a - 1: - 2: - 3: a - -/(\3)(\1)(a)/SI -Capturing subpattern count = 3 -Max back reference = 3 -Options: -No first char -Need char = 'a' -Subject length lower bound = 1 -No starting char list - cat - 0: a - 1: - 2: - 3: a - -/(\3)(\1)(a)/SI -Capturing subpattern count = 3 -Max back reference = 3 -No options -No first char -Need char = 'a' -Subject length lower bound = 3 -No starting char list - cat -No match - -/i(?(DEFINE)(?a))/SI -Capturing subpattern count = 1 -Named capturing subpatterns: - s 1 -No options -First char = 'i' -No need char -Subject length lower bound = 1 -No starting char list - i - 0: i - -/()i(?(1)a)/SI -Capturing subpattern count = 1 -Max back reference = 1 -No options -No first char -Need char = 'i' -Subject length lower bound = 1 -Starting chars: i - ia - 0: ia - 1: - -/(?i)a(?-i)b|c/BZ ------------------------------------------------------------------- - Bra - /i a - b - Alt - c - Ket - End ------------------------------------------------------------------- - XabX - 0: ab - XAbX - 0: Ab - CcC - 0: c - ** Failers -No match - XABX -No match - -/(?i)a(?s)b|c/BZ ------------------------------------------------------------------- - Bra - /i ab - Alt - /i c - Ket - End 
------------------------------------------------------------------- - -/(?i)a(?s-i)b|c/BZ ------------------------------------------------------------------- - Bra - /i a - b - Alt - c - Ket - End ------------------------------------------------------------------- - -/^(ab(c\1)d|x){2}$/BZ ------------------------------------------------------------------- - Bra - ^ - Once - CBra 1 - ab - CBra 2 - c - \1 - Ket - d - Alt - x - Ket - Ket - Once - CBra 1 - ab - CBra 2 - c - \1 - Ket - d - Alt - x - Ket - Ket - $ - Ket - End ------------------------------------------------------------------- - xabcxd - 0: xabcxd - 1: abcxd - 2: cx - -/^(?&t)*+(?(DEFINE)(?.))$/BZ ------------------------------------------------------------------- - Bra - ^ - Braposzero - SBraPos - Recurse - KetRpos - Cond - Cond def - CBra 1 - Any - Ket - Ket - $ - Ket - End ------------------------------------------------------------------- - -/^(?&t)*(?(DEFINE)(?.))$/BZ ------------------------------------------------------------------- - Bra - ^ - Brazero - Once - Recurse - KetRmax - Cond - Cond def - CBra 1 - Any - Ket - Ket - $ - Ket - End ------------------------------------------------------------------- - -/ -- This one is here because Perl gives the match as "b" rather than "ab". I - believe this to be a Perl bug. --/ - -/(?>a\Kb)z|(ab)/ - ab - 0: ab - 1: ab - -/(?P(?P0|)|(?P>L2)(?P>L1))/ -Failed: recursive call could loop indefinitely at offset 31 - -/abc(*MARK:)pqr/ -Failed: (*MARK) must have an argument at offset 10 - -/abc(*:)pqr/ -Failed: (*MARK) must have an argument at offset 6 - -/abc(*FAIL:123)xyz/ -Failed: an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT) at offset 13 - -/--- This should, and does, fail. In Perl, it does not, which I think is a - bug because replacing the B in the pattern by (B|D) does make it fail. ---/ - -/A(*COMMIT)B/+K - ACABX -No match - -/--- These should be different, but in Perl they are not, which I think - is a bug in Perl. 
---/ - -/A(*THEN)B|A(*THEN)C/K - AC - 0: AC - -/A(*PRUNE)B|A(*PRUNE)C/K - AC -No match - -/--- Mark names can be duplicated. Perl doesn't give a mark for this one, -though PCRE does. ---/ - -/^A(*:A)B|^X(*:A)Y/K - ** Failers -No match - XAQQ -No match, mark = A - -/--- COMMIT at the start of a pattern should be the same as an anchor. Perl -optimizations defeat this. So does the PCRE optimization unless we disable it -with \Y. ---/ - -/(*COMMIT)ABC/ - ABCDEFG - 0: ABC - ** Failers -No match - DEFGABC\Y -No match - -/^(ab (c+(*THEN)cd) | xyz)/x - abcccd -No match - -/^(ab (c+(*PRUNE)cd) | xyz)/x - abcccd -No match - -/^(ab (c+(*FAIL)cd) | xyz)/x - abcccd -No match - -/--- Perl gets some of these wrong ---/ - -/(?>.(*ACCEPT))*?5/ - abcde - 0: a - -/(.(*ACCEPT))*?5/ - abcde - 0: a - 1: a - -/(.(*ACCEPT))5/ - abcde - 0: a - 1: a - -/(.(*ACCEPT))*5/ - abcde - 0: a - 1: a - -/A\NB./BZ ------------------------------------------------------------------- - Bra - A - Any - B - Any - Ket - End ------------------------------------------------------------------- - ACBD - 0: ACBD - *** Failers -No match - A\nB -No match - ACB\n -No match - -/A\NB./sBZ ------------------------------------------------------------------- - Bra - A - Any - B - AllAny - Ket - End ------------------------------------------------------------------- - ACBD - 0: ACBD - ACB\n - 0: ACB\x0a - *** Failers -No match - A\nB -No match - -/A\NB/ - A\nB - 0: A\x0aB - A\rB - 0: A\x0dB - ** Failers -No match - A\r\nB -No match - -/\R+b/BZ ------------------------------------------------------------------- - Bra - \R++ - b - Ket - End ------------------------------------------------------------------- - -/\R+\n/BZ ------------------------------------------------------------------- - Bra - \R+ - \x0a - Ket - End ------------------------------------------------------------------- - -/\R+\d/BZ ------------------------------------------------------------------- - Bra - \R++ - \d - Ket - End 
------------------------------------------------------------------- - -/\d*\R/BZ ------------------------------------------------------------------- - Bra - \d*+ - \R - Ket - End ------------------------------------------------------------------- - -/\s*\R/BZ ------------------------------------------------------------------- - Bra - \s* - \R - Ket - End ------------------------------------------------------------------- - \x20\x0a - 0: \x0a - \x20\x0d - 0: \x0d - \x20\x0d\x0a - 0: \x0d\x0a - -/\S*\R/BZ ------------------------------------------------------------------- - Bra - \S*+ - \R - Ket - End ------------------------------------------------------------------- - a\x0a - 0: a\x0a - -/X\h*\R/BZ ------------------------------------------------------------------- - Bra - X - \h*+ - \R - Ket - End ------------------------------------------------------------------- - X\x20\x0a - 0: X \x0a - -/X\H*\R/BZ ------------------------------------------------------------------- - Bra - X - \H* - \R - Ket - End ------------------------------------------------------------------- - X\x0d\x0a - 0: X\x0d\x0a - -/X\H+\R/BZ ------------------------------------------------------------------- - Bra - X - \H+ - \R - Ket - End ------------------------------------------------------------------- - X\x0d\x0a - 0: X\x0d\x0a - -/X\H++\R/BZ ------------------------------------------------------------------- - Bra - X - \H++ - \R - Ket - End ------------------------------------------------------------------- - X\x0d\x0a -No match - -/(?<=abc)def/ - abc\P\P -Partial match at offset 3: abc - -/abc$/ - abc - 0: abc - abc\P - 0: abc - abc\P\P -Partial match: abc - -/abc$/m - abc - 0: abc - abc\n - 0: abc - abc\P\P -Partial match: abc - abc\n\P\P - 0: abc - abc\P - 0: abc - abc\n\P - 0: abc - -/abc\z/ - abc - 0: abc - abc\P - 0: abc - abc\P\P -Partial match: abc - -/abc\Z/ - abc - 0: abc - abc\P - 0: abc - abc\P\P -Partial match: abc - -/abc\b/ - abc - 0: abc - abc\P - 0: abc - abc\P\P -Partial 
match: abc - -/abc\B/ - abc -No match - abc\P -Partial match: abc - abc\P\P -Partial match: abc - -/.+/ - abc\>0 - 0: abc - abc\>1 - 0: bc - abc\>2 - 0: c - abc\>3 -No match - abc\>4 -Error -24 (bad offset value) - abc\>-4 -Error -24 (bad offset value) - -/^\cÄ£/ -Failed: \c must be followed by an ASCII character at offset 3 - -/(?P(?P=abn)xxx)/BZ ------------------------------------------------------------------- - Bra - Once - CBra 1 - \1 - xxx - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/(a\1z)/BZ ------------------------------------------------------------------- - Bra - Once - CBra 1 - a - \1 - z - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/(?P(?P=abn)(?(?P=axn)xxx)/BZ -Failed: reference to non-existent subpattern at offset 15 - -/(?P(?P=axn)xxx)(?yy)/BZ ------------------------------------------------------------------- - Bra - CBra 1 - \2 - xxx - Ket - CBra 2 - yy - Ket - Ket - End ------------------------------------------------------------------- - -/-- These tests are here because Perl gets the first one wrong. 
--/ - -/(\R*)(.)/s - \r\n - 0: \x0d - 1: - 2: \x0d - \r\r\n\n\r - 0: \x0d\x0d\x0a\x0a\x0d - 1: \x0d\x0d\x0a\x0a - 2: \x0d - \r\r\n\n\r\n - 0: \x0d\x0d\x0a\x0a\x0d - 1: \x0d\x0d\x0a\x0a - 2: \x0d - -/(\R)*(.)/s - \r\n - 0: \x0d - 1: - 2: \x0d - \r\r\n\n\r - 0: \x0d\x0d\x0a\x0a\x0d - 1: \x0a - 2: \x0d - \r\r\n\n\r\n - 0: \x0d\x0d\x0a\x0a\x0d - 1: \x0a - 2: \x0d - -/((?>\r\n|\n|\x0b|\f|\r|\x85)*)(.)/s - \r\n - 0: \x0d - 1: - 2: \x0d - \r\r\n\n\r - 0: \x0d\x0d\x0a\x0a\x0d - 1: \x0d\x0d\x0a\x0a - 2: \x0d - \r\r\n\n\r\n - 0: \x0d\x0d\x0a\x0a\x0d - 1: \x0d\x0d\x0a\x0a - 2: \x0d - -/-- --/ - -/^abc$/BZ ------------------------------------------------------------------- - Bra - ^ - abc - $ - Ket - End ------------------------------------------------------------------- - -/^abc$/BZm ------------------------------------------------------------------- - Bra - /m ^ - abc - /m $ - Ket - End ------------------------------------------------------------------- - -/^(a)*+(\w)/S - aaaaX - 0: aaaaX - 1: a - 2: X - ** Failers -No match - aaaa -No match - -/^(?:a)*+(\w)/S - aaaaX - 0: aaaaX - 1: X - ** Failers -No match - aaaa -No match - -/(a)++1234/SDZ ------------------------------------------------------------------- - Bra - CBraPos 1 - a - KetRpos - 1234 - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = '4' -Subject length lower bound = 5 -No starting char list - -/([abc])++1234/SI -Capturing subpattern count = 1 -No options -No first char -Need char = '4' -Subject length lower bound = 5 -Starting chars: a b c - -/(?<=(abc)+)X/ -Failed: lookbehind assertion is not fixed length at offset 10 - -/(^ab)/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - -/(^ab)++/I -Capturing subpattern count = 1 -Options: anchored -No first char -No need char - -/(^ab|^)+/I -Capturing subpattern count = 1 -May match empty string -Options: anchored -No first char -No 
need char - -/(^ab|^)++/I -Capturing subpattern count = 1 -May match empty string -Options: anchored -No first char -No need char - -/(?:^ab)/I -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/(?:^ab)++/I -Capturing subpattern count = 0 -Options: anchored -No first char -No need char - -/(?:^ab|^)+/I -Capturing subpattern count = 0 -May match empty string -Options: anchored -No first char -No need char - -/(?:^ab|^)++/I -Capturing subpattern count = 0 -May match empty string -Options: anchored -No first char -No need char - -/(.*ab)/I -Capturing subpattern count = 1 -No options -First char at start or follows newline -Need char = 'b' - -/(.*ab)++/I -Capturing subpattern count = 1 -No options -First char at start or follows newline -Need char = 'b' - -/(.*ab|.*)+/I -Capturing subpattern count = 1 -May match empty string -No options -First char at start or follows newline -No need char - -/(.*ab|.*)++/I -Capturing subpattern count = 1 -May match empty string -No options -First char at start or follows newline -No need char - -/(?:.*ab)/I -Capturing subpattern count = 0 -No options -First char at start or follows newline -Need char = 'b' - -/(?:.*ab)++/I -Capturing subpattern count = 0 -No options -First char at start or follows newline -Need char = 'b' - -/(?:.*ab|.*)+/I -Capturing subpattern count = 0 -May match empty string -No options -First char at start or follows newline -No need char - -/(?:.*ab|.*)++/I -Capturing subpattern count = 0 -May match empty string -No options -First char at start or follows newline -No need char - -/(?=a)[bcd]/I -Capturing subpattern count = 0 -No options -First char = 'a' -No need char - -/((?=a))[bcd]/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/((?=a))+[bcd]/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/((?=a))++[bcd]/I -Capturing subpattern count = 1 -No options -First char = 'a' -No need char - -/(?=a+)[bcd]/iI -Capturing 
subpattern count = 0 -Options: caseless -First char = 'a' (caseless) -No need char - -/(?=a+?)[bcd]/iI -Capturing subpattern count = 0 -Options: caseless -First char = 'a' (caseless) -No need char - -/(?=a++)[bcd]/iI -Capturing subpattern count = 0 -Options: caseless -First char = 'a' (caseless) -No need char - -/(?=a{3})[bcd]/iI -Capturing subpattern count = 0 -Options: caseless -First char = 'a' (caseless) -Need char = 'a' (caseless) - -/(abc)\1+/S - -/-- Perl doesn't get these right IMO (the 3rd is PCRE-specific) --/ - -/(?1)(?:(b(*ACCEPT))){0}/ - b - 0: b - -/(?1)(?:(b(*ACCEPT))){0}c/ - bc - 0: bc - ** Failers -No match - b -No match - -/(?1)(?:((*ACCEPT))){0}c/ - c - 0: c - c\N - 0: c - -/^.*?(?(?=a)a|b(*THEN)c)/ - ba -No match - -/^.*?(?(?=a)a|bc)/ - ba - 0: ba - -/^.*?(?(?=a)a(*THEN)b|c)/ - ac -No match - -/^.*?(?(?=a)a(*THEN)b)c/ - ac -No match - -/^.*?(a(*THEN)b)c/ - aabc -No match - -/^.*? (?1) c (?(DEFINE)(a(*THEN)b))/x - aabc - 0: aabc - -/^.*?(a(*THEN)b|z)c/ - aabc - 0: aabc - 1: ab - -/^.*?(z|a(*THEN)b)c/ - aabc - 0: aabc - 1: ab - -/-- --/ - -/-- These studied versions are here because they are not Perl-compatible; the - studying means the mark is not seen. 
--/ - -/(*MARK:A)(*SKIP:B)(C|X)/KS - C - 0: C - 1: C -MK: A - D -No match, mark = A - -/(*:A)A+(*SKIP:A)(B|Z)/KS - AAAC -No match, mark = A - -/-- --/ - -"(?=a*(*ACCEPT)b)c" - c - 0: c - c\N - 0: c - -/(?1)c(?(DEFINE)((*ACCEPT)b))/ - c - 0: c - c\N - 0: c - -/(?>(*ACCEPT)b)c/ - c - 0: - c\N -No match - -/(?:(?>(a)))+a%/++ - %aa% - 0: aa% - 0+ - 1: a - 1+ a% - -/(a)b|ac/++SS - ac\O3 - 0: ac - 0+ - -/(a)(b)x|abc/++ - abc\O6 - 0: abc - 0+ - -/(a)bc|(a)(b)\2/ - \O3abc -Matched, but too many substrings - 0: abc - \O4abc -Matched, but too many substrings - 0: abc - -/(?(DEFINE)(a(?2)|b)(b(?1)|a))(?:(?1)|(?2))/SI -Capturing subpattern count = 2 -No options -No first char -No need char -Subject length lower bound = 1 -No starting char list - -/(a(?2)|b)(b(?1)|a)(?:(?1)|(?2))/SI -Capturing subpattern count = 2 -No options -No first char -No need char -Subject length lower bound = 3 -Starting chars: a b - -/(a(?2)|b)(b(?1)|a)(?1)(?2)/SI -Capturing subpattern count = 2 -No options -No first char -No need char -Subject length lower bound = 4 -Starting chars: a b - -/(abc)(?1)/SI -Capturing subpattern count = 1 -No options -First char = 'a' -Need char = 'c' -Subject length lower bound = 6 -No starting char list - -/^(?>a)++/ - aa\M -Minimum match() limit = 5 -Minimum match() recursion limit = 2 - 0: aa - aaaaaaaaa\M -Minimum match() limit = 12 -Minimum match() recursion limit = 2 - 0: aaaaaaaaa - -/(a)(?1)++/ - aa\M -Minimum match() limit = 7 -Minimum match() recursion limit = 4 - 0: aa - 1: a - aaaaaaaaa\M -Minimum match() limit = 21 -Minimum match() recursion limit = 4 - 0: aaaaaaaaa - 1: a - -/(?:(foo)|(bar)|(baz))X/SS= - bazfooX - 0: fooX - 1: foo - 2: - 3: - foobazbarX - 0: barX - 1: - 2: bar - 3: - barfooX - 0: fooX - 1: foo - 2: - 3: - bazX - 0: bazX - 1: - 2: - 3: baz - foobarbazX - 0: bazX - 1: - 2: - 3: baz - bazfooX\O0 -Matched, but too many substrings - bazfooX\O2 -Matched, but too many substrings - 0: fooX - bazfooX\O4 -Matched, but too many substrings - 0: fooX - 
1: - bazfooX\O6 -Matched, but too many substrings - 0: fooX - 1: foo - 2: - bazfooX\O8 -Matched, but too many substrings - 0: fooX - 1: foo - 2: - 3: - bazfooX\O10 - 0: fooX - 1: foo - 2: - 3: - -/(?=abc){3}abc/BZ ------------------------------------------------------------------- - Bra - Assert - abc - Ket - abc - Ket - End ------------------------------------------------------------------- - -/(?=abc)+abc/BZ ------------------------------------------------------------------- - Bra - Assert - abc - Ket - abc - Ket - End ------------------------------------------------------------------- - -/(?=abc)++abc/BZ ------------------------------------------------------------------- - Bra - Assert - abc - Ket - abc - Ket - End ------------------------------------------------------------------- - -/(?=abc){0}xyz/BZ ------------------------------------------------------------------- - Bra - Skip zero - Assert - abc - Ket - xyz - Ket - End ------------------------------------------------------------------- - -/(?=(a))?./BZ ------------------------------------------------------------------- - Bra - Brazero - Assert - CBra 1 - a - Ket - Ket - Any - Ket - End ------------------------------------------------------------------- - -/(?=(a))??./BZ ------------------------------------------------------------------- - Bra - Braminzero - Assert - CBra 1 - a - Ket - Ket - Any - Ket - End ------------------------------------------------------------------- - -/^(?=(a)){0}b(?1)/BZ ------------------------------------------------------------------- - Bra - ^ - Skip zero - Assert - CBra 1 - a - Ket - Ket - b - Recurse - Ket - End ------------------------------------------------------------------- - -/(?(DEFINE)(a))?b(?1)/BZ ------------------------------------------------------------------- - Bra - Cond - Cond def - CBra 1 - a - Ket - Ket - b - Recurse - Ket - End ------------------------------------------------------------------- - -/^(?=(?1))?[az]([abc])d/BZ 
------------------------------------------------------------------- - Bra - ^ - Brazero - Assert - Recurse - Ket - [az] - CBra 1 - [a-c] - Ket - d - Ket - End ------------------------------------------------------------------- - -/^(?!a){0}\w+/BZ ------------------------------------------------------------------- - Bra - ^ - Skip zero - Assert not - a - Ket - \w++ - Ket - End ------------------------------------------------------------------- - -/(?<=(abc))?xyz/BZ ------------------------------------------------------------------- - Bra - Brazero - AssertB - Reverse - CBra 1 - abc - Ket - Ket - xyz - Ket - End ------------------------------------------------------------------- - -/[:a[:abc]b:]/BZ ------------------------------------------------------------------- - Bra - [:[a-c] - b:] - Ket - End ------------------------------------------------------------------- - -/((?2))((?1))/SS - abc -Error -26 (nested recursion at the same subject position) - -/((?(R2)a+|(?1)b))/SS - aaaabcde -Error -26 (nested recursion at the same subject position) - -/(?(R)a*(?1)|((?R))b)/SS - aaaabcde -Error -26 (nested recursion at the same subject position) - -/(a+|(?R)b)/ -Failed: recursive call could loop indefinitely at offset 7 - -/^(a(*:A)(d|e(*:B))z|aeq)/C - adz ---->adz - +0 ^ ^ - +1 ^ (a(*:A)(d|e(*:B))z|aeq) - +2 ^ a - +3 ^^ (*:A) - +8 ^^ (d|e(*:B)) -Latest Mark: A - +9 ^^ d -+10 ^ ^ | -+18 ^ ^ z -+19 ^ ^ | -+24 ^ ^ - 0: adz - 1: adz - 2: d - aez ---->aez - +0 ^ ^ - +1 ^ (a(*:A)(d|e(*:B))z|aeq) - +2 ^ a - +3 ^^ (*:A) - +8 ^^ (d|e(*:B)) -Latest Mark: A - +9 ^^ d -+11 ^^ e -+12 ^ ^ (*:B) -+17 ^ ^ ) -Latest Mark: B -+18 ^ ^ z -+19 ^ ^ | -+24 ^ ^ - 0: aez - 1: aez - 2: e - aeqwerty ---->aeqwerty - +0 ^ ^ - +1 ^ (a(*:A)(d|e(*:B))z|aeq) - +2 ^ a - +3 ^^ (*:A) - +8 ^^ (d|e(*:B)) -Latest Mark: A - +9 ^^ d -+11 ^^ e -+12 ^ ^ (*:B) -+17 ^ ^ ) -Latest Mark: B -+18 ^ ^ z -+20 ^ a -+21 ^^ e -+22 ^ ^ q -+23 ^ ^ ) -+24 ^ ^ - 0: aeq - 1: aeq - -/.(*F)/ - \P\Pabc -No match - 
-/\btype\b\W*?\btext\b\W*?\bjavascript\b/IS -Capturing subpattern count = 0 -Max lookbehind = 1 -No options -First char = 't' -Need char = 't' -Subject length lower bound = 18 -No starting char list - -/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|a+)(?>(z+))\w/BZ ------------------------------------------------------------------- - Bra - ^ - Once_NC - a++ - Ket - Once - CBra 1 - z++ - Ket - Ket - \w - Ket - End ------------------------------------------------------------------- - aaaazzzzb - 0: aaaazzzzb - 1: zzzz - ** Failers -No match - aazz -No match - -/(.)(\1|a(?2))/ - bab - 0: bab - 1: b - 2: ab - -/\1|(.)(?R)\1/ - cbbbc - 0: cbbbc - 1: c - -/(.)((?(1)c|a)|a(?2))/ - baa -No match - -/(?P(?P=abn)xxx)/BZ ------------------------------------------------------------------- - Bra - Once - CBra 1 - \1 - xxx - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/(a\1z)/BZ ------------------------------------------------------------------- - Bra - Once - CBra 1 - a - \1 - z - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/ - \Maabbccddee -Minimum match() limit = 7 -Minimum match() recursion limit = 2 - 0: aabbccddee - -/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/ - \Maabbccddee -Minimum match() limit = 17 -Minimum match() recursion limit = 16 - 0: aabbccddee - 1: aa - 2: bb - 3: cc - 4: dd - 5: ee - -/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/ - \Maabbccddee -Minimum match() limit = 13 -Minimum match() recursion limit = 10 - 0: aabbccddee - 1: aa - 2: cc - 3: ee - -/^a\x41z/ - aAz - 0: aAz - *** Failers -No match - ax41z -No match - -/^a[m\x41]z/ - aAz - 0: aAz - -/^a\x1z/ - ax1z - 0: ax1z - -/^a\u0041z/ - aAz - 0: aAz - *** Failers -No match - au0041z -No match - -/^a[m\u0041]z/ - aAz - 0: aAz - -/^a\u041z/ - au041z - 0: au041z - *** Failers -No match - aAz -No match - -/^a\U0041z/ - aU0041z - 0: aU0041z - *** Failers -No match - aAz -No 
match - -/(?(?=c)c|d)++Y/BZ ------------------------------------------------------------------- - Bra - BraPos - Cond - Assert - c - Ket - c - Alt - d - Ket - KetRpos - Y - Ket - End ------------------------------------------------------------------- - -/(?(?=c)c|d)*+Y/BZ ------------------------------------------------------------------- - Bra - Braposzero - BraPos - Cond - Assert - c - Ket - c - Alt - d - Ket - KetRpos - Y - Ket - End ------------------------------------------------------------------- - -/a[\NB]c/ -Failed: \N is not supported in a class at offset 3 - -/a[B-\Nc]/ -Failed: invalid range in character class at offset 5 - -/a[B\Nc]/ -Failed: \N is not supported in a class at offset 4 - -/(a)(?2){0,1999}?(b)/ - -/(a)(?(DEFINE)(b))(?2){0,1999}?(?2)/ - -/--- This test, with something more complicated than individual letters, causes -different behaviour in Perl. Perhaps it disables some optimization; no tag is -passed back for the failures, whereas in PCRE there is a tag. ---/ - -/(A|P)(*:A)(B|P) | (X|P)(X|P)(*:B)(Y|P)/xK - AABC - 0: AB - 1: A - 2: B -MK: A - XXYZ - 0: XXY - 1: - 2: - 3: X - 4: X - 5: Y -MK: B - ** Failers -No match - XAQQ -No match, mark = A - XAQQXZZ -No match, mark = A - AXQQQ -No match, mark = A - AXXQQQ -No match, mark = B - -/-- Perl doesn't give marks for these, though it does if the alternatives are -replaced by single letters. --/ - -/(b|q)(*:m)f|a(*:n)w/K - aw - 0: aw -MK: n - ** Failers -No match, mark = n - abc -No match, mark = m - -/(q|b)(*:m)f|a(*:n)w/K - aw - 0: aw -MK: n - ** Failers -No match, mark = n - abc -No match, mark = m - -/-- After a partial match, the behaviour is as for a failure. 
--/ - -/^a(*:X)bcde/K - abc\P -Partial match, mark=X: abc - -/-- These are here because Perl doesn't return a mark, except for the first --/ - -/(?=(*:x))(q|)/K+ - abc - 0: - 0+ abc - 1: -MK: x - -/(?=(*:x))((*:y)q|)/K+ - abc - 0: - 0+ abc - 1: -MK: x - -/(?=(*:x))(?:(*:y)q|)/K+ - abc - 0: - 0+ abc -MK: x - -/(?=(*:x))(?>(*:y)q|)/K+ - abc - 0: - 0+ abc -MK: x - -/(?=a(*:x))(?!a(*:y)c)/K+ - ab - 0: - 0+ ab -MK: x - -/(?=a(*:x))(?=a(*:y)c|)/K+ - ab - 0: - 0+ ab -MK: x - -/(..)\1/ - ab\P -Partial match: ab - aba\P -Partial match: aba - abab\P - 0: abab - 1: ab - -/(..)\1/i - ab\P -Partial match: ab - abA\P -Partial match: abA - aBAb\P - 0: aBAb - 1: aB - -/(..)\1{2,}/ - ab\P -Partial match: ab - aba\P -Partial match: aba - abab\P -Partial match: abab - ababa\P -Partial match: ababa - ababab\P - 0: ababab - 1: ab - ababab\P\P -Partial match: ababab - abababa\P - 0: ababab - 1: ab - abababa\P\P -Partial match: abababa - -/(..)\1{2,}/i - ab\P -Partial match: ab - aBa\P -Partial match: aBa - aBAb\P -Partial match: aBAb - AbaBA\P -Partial match: AbaBA - abABAb\P - 0: abABAb - 1: ab - aBAbaB\P\P -Partial match: aBAbaB - abABabA\P - 0: abABab - 1: ab - abaBABa\P\P -Partial match: abaBABa - -/(..)\1{2,}?x/i - ab\P -Partial match: ab - abA\P -Partial match: abA - aBAb\P -Partial match: aBAb - abaBA\P -Partial match: abaBA - abAbaB\P -Partial match: abAbaB - abaBabA\P -Partial match: abaBabA - abAbABaBx\P - 0: abAbABaBx - 1: ab - -/^(..)\1/ - aba\P -Partial match: aba - -/^(..)\1{2,3}x/ - aba\P -Partial match: aba - ababa\P -Partial match: ababa - ababa\P\P -Partial match: ababa - abababx - 0: abababx - 1: ab - ababababx - 0: ababababx - 1: ab - -/^(..)\1{2,3}?x/ - aba\P -Partial match: aba - ababa\P -Partial match: ababa - ababa\P\P -Partial match: ababa - abababx - 0: abababx - 1: ab - ababababx - 0: ababababx - 1: ab - -/^(..)(\1{2,3})ab/ - abababab - 0: abababab - 1: ab - 2: abab - -/^\R/ - \r\P - 0: \x0d - \r\P\P -Partial match: \x0d - -/^\R{2,3}x/ - \r\P -Partial match: 
\x0d - \r\P\P -Partial match: \x0d - \r\r\P -Partial match: \x0d\x0d - \r\r\P\P -Partial match: \x0d\x0d - \r\r\r\P -Partial match: \x0d\x0d\x0d - \r\r\r\P\P -Partial match: \x0d\x0d\x0d - \r\rx - 0: \x0d\x0dx - \r\r\rx - 0: \x0d\x0d\x0dx - -/^\R{2,3}?x/ - \r\P -Partial match: \x0d - \r\P\P -Partial match: \x0d - \r\r\P -Partial match: \x0d\x0d - \r\r\P\P -Partial match: \x0d\x0d - \r\r\r\P -Partial match: \x0d\x0d\x0d - \r\r\r\P\P -Partial match: \x0d\x0d\x0d - \r\rx - 0: \x0d\x0dx - \r\r\rx - 0: \x0d\x0d\x0dx - -/^\R?x/ - \r\P -Partial match: \x0d - \r\P\P -Partial match: \x0d - x - 0: x - \rx - 0: \x0dx - -/^\R+x/ - \r\P -Partial match: \x0d - \r\P\P -Partial match: \x0d - \r\n\P -Partial match: \x0d\x0a - \r\n\P\P -Partial match: \x0d\x0a - \rx - 0: \x0dx - -/^a$/ - a\r\P -Partial match: a\x0d - a\r\P\P -Partial match: a\x0d - -/^a$/m - a\r\P -Partial match: a\x0d - a\r\P\P -Partial match: a\x0d - -/^(a$|a\r)/ - a\r\P - 0: a\x0d - 1: a\x0d - a\r\P\P -Partial match: a\x0d - -/^(a$|a\r)/m - a\r\P - 0: a\x0d - 1: a\x0d - a\r\P\P -Partial match: a\x0d - -/./ - \r\P - 0: \x0d - \r\P\P -Partial match: \x0d - -/.{2,3}/ - \r\P -Partial match: \x0d - \r\P\P -Partial match: \x0d - \r\r\P - 0: \x0d\x0d - \r\r\P\P -Partial match: \x0d\x0d - \r\r\r\P - 0: \x0d\x0d\x0d - \r\r\r\P\P -Partial match: \x0d\x0d\x0d - -/.{2,3}?/ - \r\P -Partial match: \x0d - \r\P\P -Partial match: \x0d - \r\r\P - 0: \x0d\x0d - \r\r\P\P -Partial match: \x0d\x0d - \r\r\r\P - 0: \x0d\x0d - \r\r\r\P\P - 0: \x0d\x0d - -"AB(C(D))(E(F))?(?(?=\2)(?=\4))" - ABCDGHI\O03 -Matched, but too many substrings - 0: ABCD - -/-- These are all run as real matches in test 1; here we are just checking the -settings of the anchored and startline bits. 
--/ - -/(?>.*?a)(?<=ba)/I -Capturing subpattern count = 0 -Max lookbehind = 2 -No options -No first char -Need char = 'a' - -/(?:.*?a)(?<=ba)/I -Capturing subpattern count = 0 -Max lookbehind = 2 -No options -First char at start or follows newline -Need char = 'a' - -/.*?a(*PRUNE)b/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'b' - -/.*?a(*PRUNE)b/sI -Capturing subpattern count = 0 -Options: dotall -No first char -Need char = 'b' - -/^a(*PRUNE)b/sI -Capturing subpattern count = 0 -Options: anchored dotall -No first char -No need char - -/.*?a(*SKIP)b/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'b' - -/(?>.*?a)b/sI -Capturing subpattern count = 0 -Options: dotall -No first char -Need char = 'b' - -/(?>.*?a)b/I -Capturing subpattern count = 0 -No options -No first char -Need char = 'b' - -/(?>^a)b/sI -Capturing subpattern count = 0 -Options: anchored dotall -No first char -No need char - -/(?>.*?)(?<=(abcd)|(wxyz))/I -Capturing subpattern count = 2 -Max lookbehind = 4 -May match empty string -No options -No first char -No need char - -/(?>.*)(?<=(abcd)|(wxyz))/I -Capturing subpattern count = 2 -Max lookbehind = 4 -May match empty string -No options -No first char -No need char - -"(?>.*)foo"I -Capturing subpattern count = 0 -No options -No first char -Need char = 'o' - -"(?>.*?)foo"I -Capturing subpattern count = 0 -No options -No first char -Need char = 'o' - -/(?>^abc)/mI -Capturing subpattern count = 0 -Options: multiline -First char at start or follows newline -Need char = 'c' - -/(?>.*abc)/mI -Capturing subpattern count = 0 -Options: multiline -No first char -Need char = 'c' - -/(?:.*abc)/mI -Capturing subpattern count = 0 -Options: multiline -First char at start or follows newline -Need char = 'c' - -/-- Check PCRE_STUDY_EXTRA_NEEDED --/ - -/.?/S-I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char -Study returned NULL - -/.?/S!I -Capturing subpattern count 
= 0 -May match empty string -No options -No first char -No need char -Subject length lower bound = -1 -No starting char list - -/(?:(a)+(?C1)bb|aa(?C2)b)/ - aab\C+ -Callout 1: last capture = 1 - 0: - 1: a ---->aab - ^ ^ b -Callout 1: last capture = 1 - 0: - 1: a ---->aab - ^^ b -Callout 2: last capture = -1 - 0: ---->aab - ^ ^ b - 0: aab - -/(?:(a)++(?C1)bb|aa(?C2)b)/ - aab\C+ -Callout 1: last capture = 1 - 0: - 1: a ---->aab - ^ ^ b -Callout 2: last capture = -1 - 0: ---->aab - ^ ^ b - 0: aab - -/(?:(?>(a))(?C1)bb|aa(?C2)b)/ - aab\C+ -Callout 1: last capture = 1 - 0: - 1: a ---->aab - ^^ b -Callout 2: last capture = -1 - 0: ---->aab - ^ ^ b - 0: aab - -/(?:(?1)(?C1)x|ab(?C2))((a)){0}/ - aab\C+ -Callout 1: last capture = -1 - 0: ---->aab - ^^ x -Callout 1: last capture = -1 - 0: ---->aab - ^^ x -Callout 2: last capture = -1 - 0: ---->aab - ^ ^ ) - 0: ab - -/(?1)(?C1)((a)(?C2)){0}/ - aab\C+ -Callout 2: last capture = 2 - 0: - 1: - 2: a ---->aab - ^^ ) -Callout 1: last capture = -1 - 0: ---->aab - ^^ ((a)(?C2)){0} - 0: a - -/(?:(a)+(?C1)bb|aa(?C2)b)++/ - aab\C+ -Callout 1: last capture = 1 - 0: - 1: a ---->aab - ^ ^ b -Callout 1: last capture = 1 - 0: - 1: a ---->aab - ^^ b -Callout 2: last capture = -1 - 0: ---->aab - ^ ^ b - 0: aab - aab\C+\O2 -Callout 1: last capture = 1 - 0: ---->aab - ^ ^ b -Callout 1: last capture = 1 - 0: ---->aab - ^^ b -Callout 2: last capture = -1 - 0: ---->aab - ^ ^ b - 0: aab - -/(ab)x|ab/ - ab\O3 - 0: ab - ab\O2 - 0: ab - -/(ab)/ - ab\O3 -Matched, but too many substrings - 0: ab - ab\O2 -Matched, but too many substrings - 0: ab - -/(?<=123)(*MARK:xx)abc/K - xxxx123a\P\P -Partial match at offset 7, mark=xx: 123a - xxxx123a\P -Partial match at offset 7, mark=xx: 123a - -/123\Kabc/ - xxxx123a\P\P -Partial match: 123a - xxxx123a\P -Partial match: 123a - -/^(?(?=a)aa|bb)/C - bb ---->bb - +0 ^ ^ - +1 ^ (?(?=a)aa|bb) - +3 ^ (?=a) - +6 ^ a -+11 ^ b -+12 ^^ b -+13 ^ ^ ) -+14 ^ ^ - 0: bb - 
-/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/ - bb ---->bb - 1 ^ ^ - 2 ^ (?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10)) - 99 ^ (?=(?C3)a(?C4)) - 3 ^ a - 8 ^ b - 9 ^^ b - 10 ^ ^ ) - 11 ^ ^ - 0: bb - -/-- Perl seems to have a bug with this one --/ - -/aaaaa(*COMMIT)(*PRUNE)b|a+c/ - aaaaaac - 0: aaaac - -/-- Here are some that Perl treats differently because of the way it handles -backtracking verbs. --/ - - /(?!a(*COMMIT)b)ac|ad/ - ac - 0: ac - ad - 0: ad - -/^(?!a(*THEN)b|ac)../ - ac -No match - ad - 0: ad - -/^(?=a(*THEN)b|ac)/ - ac - 0: - -/\A.*?(?:a|b(*THEN)c)/ - ba - 0: ba - -/\A.*?(?:a|b(*THEN)c)++/ - ba - 0: ba - -/\A.*?(?:a|b(*THEN)c|d)/ - ba - 0: ba - -/(?:(a(*MARK:X)a+(*SKIP:X)b)){0}(?:(?1)|aac)/ - aac - 0: aac - -/\A.*?(a|b(*THEN)c)/ - ba - 0: ba - 1: a - -/^(A(*THEN)B|A(*THEN)D)/ - AD - 0: AD - 1: AD - -/(?!b(*THEN)a)bn|bnn/ - bnn - 0: bn - -/(?(?=b(*SKIP)a)bn|bnn)/ - bnn -No match - -/(?=b(*THEN)a|)bn|bnn/ - bnn - 0: bn - -/-------------------------/ - -/(*LIMIT_MATCH=12bc)abc/ -Failed: (*VERB) not recognized or malformed at offset 7 - -/(*LIMIT_MATCH=4294967290)abc/ -Failed: (*VERB) not recognized or malformed at offset 7 - -/(*LIMIT_RECURSION=4294967280)abc/I -Capturing subpattern count = 0 -Recursion limit = 4294967280 -No options -First char = 'a' -Need char = 'c' - -/(a+)*zz/ - aaaaaaaaaaaaaz -No match - aaaaaaaaaaaaaz\q3000 -Error -8 (match limit exceeded) - -/(a+)*zz/S- - aaaaaaaaaaaaaz\Q10 -Error -21 (recursion limit exceeded) - -/(*LIMIT_MATCH=3000)(a+)*zz/I -Capturing subpattern count = 1 -Match limit = 3000 -No options -No first char -Need char = 'z' - aaaaaaaaaaaaaz -Error -8 (match limit exceeded) - aaaaaaaaaaaaaz\q60000 -Error -8 (match limit exceeded) - -/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I -Capturing subpattern count = 1 -Match limit = 3000 -No options -No first char -Need char = 'z' - aaaaaaaaaaaaaz -Error -8 (match limit exceeded) - -/(*LIMIT_MATCH=60000)(a+)*zz/I -Capturing 
subpattern count = 1 -Match limit = 60000 -No options -No first char -Need char = 'z' - aaaaaaaaaaaaaz -No match - aaaaaaaaaaaaaz\q3000 -Error -8 (match limit exceeded) - -/(*LIMIT_RECURSION=10)(a+)*zz/IS- -Capturing subpattern count = 1 -Recursion limit = 10 -No options -No first char -Need char = 'z' -Subject length lower bound = 2 -Starting chars: a z - aaaaaaaaaaaaaz -Error -21 (recursion limit exceeded) - aaaaaaaaaaaaaz\Q1000 -Error -21 (recursion limit exceeded) - -/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/IS- -Capturing subpattern count = 1 -Recursion limit = 10 -No options -No first char -Need char = 'z' -Subject length lower bound = 2 -Starting chars: a z - aaaaaaaaaaaaaz -Error -21 (recursion limit exceeded) - -/(*LIMIT_RECURSION=1000)(a+)*zz/IS- -Capturing subpattern count = 1 -Recursion limit = 1000 -No options -No first char -Need char = 'z' -Subject length lower bound = 2 -Starting chars: a z - aaaaaaaaaaaaaz -No match - aaaaaaaaaaaaaz\Q10 -Error -21 (recursion limit exceeded) - -/-- This test causes a segfault with Perl 5.18.0 --/ - -/^(?=(a)){0}b(?1)/ - backgammon - 0: ba - -/(?|(?f)|(?b))/JI -Capturing subpattern count = 1 -Named capturing subpatterns: - n 1 -Options: dupnames -No first char -No need char - -/(?abc)(?z)\k()/JDZS ------------------------------------------------------------------- - Bra - CBra 1 - abc - Ket - CBra 2 - z - Ket - \k2 - CBra 3 - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 3 -Max back reference = 2 -Named capturing subpatterns: - a 1 - a 2 -Options: dupnames -First char = 'a' -Need char = 'z' -Subject length lower bound = 5 -No starting char list - -/a*[bcd]/BZ ------------------------------------------------------------------- - Bra - a*+ - [b-d] - Ket - End ------------------------------------------------------------------- - -/[bcd]*a/BZ ------------------------------------------------------------------- - Bra - [b-d]*+ - a - Ket - End 
------------------------------------------------------------------- - -/-- A complete set of tests for auto-possessification of character types --/ - -/\D+\D \D+\d \D+\S \D+\s \D+\W \D+\w \D+. \D+\C \D+\R \D+\H \D+\h \D+\V \D+\v \D+\Z \D+\z \D+$/BZx ------------------------------------------------------------------- - Bra - \D+ - \D - \D++ - \d - \D+ - \S - \D+ - \s - \D+ - \W - \D+ - \w - \D+ - Any - \D+ - AllAny - \D+ - \R - \D+ - \H - \D+ - \h - \D+ - \V - \D+ - \v - \D+ - \Z - \D++ - \z - \D+ - $ - Ket - End ------------------------------------------------------------------- - -/\d+\D \d+\d \d+\S \d+\s \d+\W \d+\w \d+. \d+\C \d+\R \d+\H \d+\h \d+\V \d+\v \d+\Z \d+\z \d+$/BZx ------------------------------------------------------------------- - Bra - \d++ - \D - \d+ - \d - \d+ - \S - \d++ - \s - \d++ - \W - \d+ - \w - \d+ - Any - \d+ - AllAny - \d++ - \R - \d+ - \H - \d++ - \h - \d+ - \V - \d++ - \v - \d++ - \Z - \d++ - \z - \d++ - $ - Ket - End ------------------------------------------------------------------- - -/\S+\D \S+\d \S+\S \S+\s \S+\W \S+\w \S+. \S+\C \S+\R \S+\H \S+\h \S+\V \S+\v \S+\Z \S+\z \S+$/BZx ------------------------------------------------------------------- - Bra - \S+ - \D - \S+ - \d - \S+ - \S - \S++ - \s - \S+ - \W - \S+ - \w - \S+ - Any - \S+ - AllAny - \S++ - \R - \S+ - \H - \S++ - \h - \S+ - \V - \S++ - \v - \S++ - \Z - \S++ - \z - \S++ - $ - Ket - End ------------------------------------------------------------------- - -/\s+\D \s+\d \s+\S \s+\s \s+\W \s+\w \s+. \s+\C \s+\R \s+\H \s+\h \s+\V \s+\v \s+\Z \s+\z \s+$/BZx ------------------------------------------------------------------- - Bra - \s+ - \D - \s++ - \d - \s++ - \S - \s+ - \s - \s+ - \W - \s++ - \w - \s+ - Any - \s+ - AllAny - \s+ - \R - \s+ - \H - \s+ - \h - \s+ - \V - \s+ - \v - \s+ - \Z - \s++ - \z - \s+ - $ - Ket - End ------------------------------------------------------------------- - -/\W+\D \W+\d \W+\S \W+\s \W+\W \W+\w \W+. 
\W+\C \W+\R \W+\H \W+\h \W+\V \W+\v \W+\Z \W+\z \W+$/BZx ------------------------------------------------------------------- - Bra - \W+ - \D - \W++ - \d - \W+ - \S - \W+ - \s - \W+ - \W - \W++ - \w - \W+ - Any - \W+ - AllAny - \W+ - \R - \W+ - \H - \W+ - \h - \W+ - \V - \W+ - \v - \W+ - \Z - \W++ - \z - \W+ - $ - Ket - End ------------------------------------------------------------------- - -/\w+\D \w+\d \w+\S \w+\s \w+\W \w+\w \w+. \w+\C \w+\R \w+\H \w+\h \w+\V \w+\v \w+\Z \w+\z \w+$/BZx ------------------------------------------------------------------- - Bra - \w+ - \D - \w+ - \d - \w+ - \S - \w++ - \s - \w++ - \W - \w+ - \w - \w+ - Any - \w+ - AllAny - \w++ - \R - \w+ - \H - \w++ - \h - \w+ - \V - \w++ - \v - \w++ - \Z - \w++ - \z - \w++ - $ - Ket - End ------------------------------------------------------------------- - -/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\C \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/BZx ------------------------------------------------------------------- - Bra - AllAny+ - \D - AllAny+ - \d - AllAny+ - \S - AllAny+ - \s - AllAny+ - \W - AllAny+ - \w - AllAny+ - Any - AllAny+ - AllAny - AllAny+ - \R - AllAny+ - \H - AllAny+ - \h - AllAny+ - \V - AllAny+ - \v - AllAny+ - \Z - AllAny++ - \z - AllAny+ - $ - Ket - End ------------------------------------------------------------------- - -/\R+\D \R+\d \R+\S \R+\s \R+\W \R+\w \R+. \R+\C \R+\R \R+\H \R+\h \R+\V \R+\v \R+\Z \R+\z \R+$/BZx ------------------------------------------------------------------- - Bra - \R+ - \D - \R++ - \d - \R+ - \S - \R++ - \s - \R+ - \W - \R++ - \w - \R++ - Any - \R+ - AllAny - \R+ - \R - \R+ - \H - \R++ - \h - \R+ - \V - \R+ - \v - \R+ - \Z - \R++ - \z - \R+ - $ - Ket - End ------------------------------------------------------------------- - -/\H+\D \H+\d \H+\S \H+\s \H+\W \H+\w \H+. 
\H+\C \H+\R \H+\H \H+\h \H+\V \H+\v \H+\Z \H+\z \H+$/BZx ------------------------------------------------------------------- - Bra - \H+ - \D - \H+ - \d - \H+ - \S - \H+ - \s - \H+ - \W - \H+ - \w - \H+ - Any - \H+ - AllAny - \H+ - \R - \H+ - \H - \H++ - \h - \H+ - \V - \H+ - \v - \H+ - \Z - \H++ - \z - \H+ - $ - Ket - End ------------------------------------------------------------------- - -/\h+\D \h+\d \h+\S \h+\s \h+\W \h+\w \h+. \h+\C \h+\R \h+\H \h+\h \h+\V \h+\v \h+\Z \h+\z \h+$/BZx ------------------------------------------------------------------- - Bra - \h+ - \D - \h++ - \d - \h++ - \S - \h+ - \s - \h+ - \W - \h++ - \w - \h+ - Any - \h+ - AllAny - \h++ - \R - \h++ - \H - \h+ - \h - \h+ - \V - \h++ - \v - \h+ - \Z - \h++ - \z - \h+ - $ - Ket - End ------------------------------------------------------------------- - -/\V+\D \V+\d \V+\S \V+\s \V+\W \V+\w \V+. \V+\C \V+\R \V+\H \V+\h \V+\V \V+\v \V+\Z \V+\z \V+$/BZx ------------------------------------------------------------------- - Bra - \V+ - \D - \V+ - \d - \V+ - \S - \V+ - \s - \V+ - \W - \V+ - \w - \V+ - Any - \V+ - AllAny - \V++ - \R - \V+ - \H - \V+ - \h - \V+ - \V - \V++ - \v - \V+ - \Z - \V++ - \z - \V+ - $ - Ket - End ------------------------------------------------------------------- - -/\v+\D \v+\d \v+\S \v+\s \v+\W \v+\w \v+. \v+\C \v+\R \v+\H \v+\h \v+\V \v+\v \v+\Z \v+\z \v+$/BZx ------------------------------------------------------------------- - Bra - \v+ - \D - \v++ - \d - \v++ - \S - \v+ - \s - \v+ - \W - \v++ - \w - \v+ - Any - \v+ - AllAny - \v+ - \R - \v+ - \H - \v++ - \h - \v++ - \V - \v+ - \v - \v+ - \Z - \v++ - \z - \v+ - $ - Ket - End ------------------------------------------------------------------- - -/ a+\D a+\d a+\S a+\s a+\W a+\w a+. 
a+\C a+\R a+\H a+\h a+\V a+\v a+\Z a+\z a+$/BZx ------------------------------------------------------------------- - Bra - a+ - \D - a++ - \d - a+ - \S - a++ - \s - a++ - \W - a+ - \w - a+ - Any - a+ - AllAny - a++ - \R - a+ - \H - a++ - \h - a+ - \V - a++ - \v - a++ - \Z - a++ - \z - a++ - $ - Ket - End ------------------------------------------------------------------- - -/\n+\D \n+\d \n+\S \n+\s \n+\W \n+\w \n+. \n+\C \n+\R \n+\H \n+\h \n+\V \n+\v \n+\Z \n+\z \n+$/BZx ------------------------------------------------------------------- - Bra - \x0a+ - \D - \x0a++ - \d - \x0a++ - \S - \x0a+ - \s - \x0a+ - \W - \x0a++ - \w - \x0a+ - Any - \x0a+ - AllAny - \x0a+ - \R - \x0a+ - \H - \x0a++ - \h - \x0a++ - \V - \x0a+ - \v - \x0a+ - \Z - \x0a++ - \z - \x0a+ - $ - Ket - End ------------------------------------------------------------------- - -/ .+\D .+\d .+\S .+\s .+\W .+\w .+. .+\C .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/BZx ------------------------------------------------------------------- - Bra - Any+ - \D - Any+ - \d - Any+ - \S - Any+ - \s - Any+ - \W - Any+ - \w - Any+ - Any - Any+ - AllAny - Any++ - \R - Any+ - \H - Any+ - \h - Any+ - \V - Any+ - \v - Any+ - \Z - Any++ - \z - Any+ - $ - Ket - End ------------------------------------------------------------------- - -/ .+\D .+\d .+\S .+\s .+\W .+\w .+. 
.+\C .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/BZxs ------------------------------------------------------------------- - Bra - AllAny+ - \D - AllAny+ - \d - AllAny+ - \S - AllAny+ - \s - AllAny+ - \W - AllAny+ - \w - AllAny+ - AllAny - AllAny+ - AllAny - AllAny+ - \R - AllAny+ - \H - AllAny+ - \h - AllAny+ - \V - AllAny+ - \v - AllAny+ - \Z - AllAny++ - \z - AllAny+ - $ - Ket - End ------------------------------------------------------------------- - -/\D+$ \d+$ \S+$ \s+$ \W+$ \w+$ \C+$ \R+$ \H+$ \h+$ \V+$ \v+$ a+$ \n+$ .+$ .+$/BZxm ------------------------------------------------------------------- - Bra - \D+ - /m $ - \d++ - /m $ - \S++ - /m $ - \s+ - /m $ - \W+ - /m $ - \w++ - /m $ - AllAny+ - /m $ - \R+ - /m $ - \H+ - /m $ - \h+ - /m $ - \V+ - /m $ - \v+ - /m $ - a+ - /m $ - \x0a+ - /m $ - Any+ - /m $ - Any+ - /m $ - Ket - End ------------------------------------------------------------------- - -/(?=a+)a(a+)++a/BZ ------------------------------------------------------------------- - Bra - Assert - a++ - Ket - a - CBraPos 1 - a++ - KetRpos - a - Ket - End ------------------------------------------------------------------- - -/a+(bb|cc)a+(?:bb|cc)a+(?>bb|cc)a+(?:bb|cc)+a+(aa)a+(?:bb|aa)/BZ ------------------------------------------------------------------- - Bra - a++ - CBra 1 - bb - Alt - cc - Ket - a++ - Bra - bb - Alt - cc - Ket - a++ - Once_NC - bb - Alt - cc - Ket - a++ - Bra - bb - Alt - cc - KetRmax - a+ - CBra 2 - aa - Ket - a+ - Bra - bb - Alt - aa - Ket - Ket - End ------------------------------------------------------------------- - -/a+(bb|cc)?#a+(?:bb|cc)??#a+(?:bb|cc)?+#a+(?:bb|cc)*#a+(bb|cc)?a#a+(?:aa)?/BZ ------------------------------------------------------------------- - Bra - a++ - Brazero - CBra 1 - bb - Alt - cc - Ket - # - a++ - Braminzero - Bra - bb - Alt - cc - Ket - # - a++ - Once - Brazero - Bra - bb - Alt - cc - Ket - Ket - # - a++ - Brazero - Bra - bb - Alt - cc - KetRmax - # - a+ - Brazero - CBra 2 - bb - Alt - cc - Ket - a# - a+ - 
Brazero - Bra - aa - Ket - Ket - End ------------------------------------------------------------------- - -/a+(?:bb)?a#a+(?:|||)#a+(?:|b)a#a+(?:|||)?a/BZ ------------------------------------------------------------------- - Bra - a+ - Brazero - Bra - bb - Ket - a# - a++ - Bra - Alt - Alt - Alt - Ket - # - a+ - Bra - Alt - b - Ket - a# - a+ - Brazero - Bra - Alt - Alt - Alt - Ket - a - Ket - End ------------------------------------------------------------------- - -/[ab]*/BZ ------------------------------------------------------------------- - Bra - [ab]*+ - Ket - End ------------------------------------------------------------------- - aaaa - 0: aaaa - -/[ab]*?/BZ ------------------------------------------------------------------- - Bra - [ab]*? - Ket - End ------------------------------------------------------------------- - aaaa - 0: - -/[ab]?/BZ ------------------------------------------------------------------- - Bra - [ab]?+ - Ket - End ------------------------------------------------------------------- - aaaa - 0: a - -/[ab]??/BZ ------------------------------------------------------------------- - Bra - [ab]?? - Ket - End ------------------------------------------------------------------- - aaaa - 0: - -/[ab]+/BZ ------------------------------------------------------------------- - Bra - [ab]++ - Ket - End ------------------------------------------------------------------- - aaaa - 0: aaaa - -/[ab]+?/BZ ------------------------------------------------------------------- - Bra - [ab]+? - Ket - End ------------------------------------------------------------------- - aaaa - 0: a - -/[ab]{2,3}/BZ ------------------------------------------------------------------- - Bra - [ab]{2,3}+ - Ket - End ------------------------------------------------------------------- - aaaa - 0: aaa - -/[ab]{2,3}?/BZ ------------------------------------------------------------------- - Bra - [ab]{2,3}? 
- Ket - End ------------------------------------------------------------------- - aaaa - 0: aa - -/[ab]{2,}/BZ ------------------------------------------------------------------- - Bra - [ab]{2,}+ - Ket - End ------------------------------------------------------------------- - aaaa - 0: aaaa - -/[ab]{2,}?/BZ ------------------------------------------------------------------- - Bra - [ab]{2,}? - Ket - End ------------------------------------------------------------------- - aaaa - 0: aa - -/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/BZ ------------------------------------------------------------------- - Bra - \d++ - \s{0,5}+ - = - \s*+ - \S? - = - \w{0,4}+ - \W*+ - Ket - End ------------------------------------------------------------------- - -/[a-d]{5,12}[e-z0-9]*#[^a-z]+[b-y]*a[2-7]?[^0-9a-z]+/BZ ------------------------------------------------------------------- - Bra - [a-d]{5,12}+ - [0-9e-z]*+ - # - [\x00-`{-\xff] (neg)++ - [b-y]*+ - a - [2-7]?+ - [\x00-/:-`{-\xff] (neg)++ - Ket - End ------------------------------------------------------------------- - -/[a-z]*\s#[ \t]?\S#[a-c]*\S#[C-G]+?\d#[4-8]*\D#[4-9,]*\D#[!$]{0,5}\w#[M-Xf-l]+\W#[a-c,]?\W/BZ ------------------------------------------------------------------- - Bra - [a-z]*+ - \s - # - [\x09 ]?+ - \S - # - [a-c]* - \S - # - [C-G]++ - \d - # - [4-8]*+ - \D - # - [,4-9]* - \D - # - [!$]{0,5}+ - \w - # - [M-Xf-l]++ - \W - # - [,a-c]? 
- \W - Ket - End ------------------------------------------------------------------- - -/a+(aa|bb)*c#a*(bb|cc)*a#a?(bb|cc)*d#[a-f]*(g|hh)*f/BZ ------------------------------------------------------------------- - Bra - a+ - Brazero - CBra 1 - aa - Alt - bb - KetRmax - c# - a* - Brazero - CBra 2 - bb - Alt - cc - KetRmax - a# - a?+ - Brazero - CBra 3 - bb - Alt - cc - KetRmax - d# - [a-f]* - Brazero - CBra 4 - g - Alt - hh - KetRmax - f - Ket - End ------------------------------------------------------------------- - -/[a-f]*(g|hh|i)*i#[a-x]{4,}(y{0,6})*y#[a-k]+(ll|mm)+n/BZ ------------------------------------------------------------------- - Bra - [a-f]*+ - Brazero - CBra 1 - g - Alt - hh - Alt - i - KetRmax - i# - [a-x]{4,} - Brazero - SCBra 2 - y{0,6} - KetRmax - y# - [a-k]++ - CBra 3 - ll - Alt - mm - KetRmax - n - Ket - End ------------------------------------------------------------------- - -/[a-f]*(?>gg|hh)+#[a-f]*(?>gg|hh)?#[a-f]*(?>gg|hh)*a#[a-f]*(?>gg|hh)*h/BZ ------------------------------------------------------------------- - Bra - [a-f]*+ - Once_NC - gg - Alt - hh - KetRmax - # - [a-f]*+ - Brazero - Once_NC - gg - Alt - hh - Ket - # - [a-f]* - Brazero - Once_NC - gg - Alt - hh - KetRmax - a# - [a-f]*+ - Brazero - Once_NC - gg - Alt - hh - KetRmax - h - Ket - End ------------------------------------------------------------------- - -/[a-c]*d/DZS ------------------------------------------------------------------- - Bra - [a-c]*+ - d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -Need char = 'd' -Subject length lower bound = 1 -Starting chars: a b c d - -/[a-c]+d/DZS ------------------------------------------------------------------- - Bra - [a-c]++ - d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -Need char = 'd' -Subject length lower bound = 2 -Starting chars: a b c - 
-/[a-c]?d/DZS ------------------------------------------------------------------- - Bra - [a-c]?+ - d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -Need char = 'd' -Subject length lower bound = 1 -Starting chars: a b c d - -/[a-c]{4,6}d/DZS ------------------------------------------------------------------- - Bra - [a-c]{4,6}+ - d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -Need char = 'd' -Subject length lower bound = 5 -Starting chars: a b c - -/[a-c]{0,6}d/DZS ------------------------------------------------------------------- - Bra - [a-c]{0,6}+ - d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -Need char = 'd' -Subject length lower bound = 1 -Starting chars: a b c d - -/-- End of special auto-possessive tests --/ - -/^A\o{1239}B/ -Failed: non-octal character in \o{} (closing brace missing?) at offset 8 - -/^A\oB/ -Failed: missing opening brace after \o at offset 3 - -/^A\x{zz}B/ -Failed: non-hex character in \x{} (closing brace missing?) at offset 5 - -/^A\x{12Z/ -Failed: non-hex character in \x{} (closing brace missing?) at offset 7 - -/^A\x{/ -Failed: non-hex character in \x{} (closing brace missing?) 
at offset 5 - -/[ab]++/BZO ------------------------------------------------------------------- - Bra - [ab]++ - Ket - End ------------------------------------------------------------------- - -/[^ab]*+/BZO ------------------------------------------------------------------- - Bra - [\x00-`c-\xff] (neg)*+ - Ket - End ------------------------------------------------------------------- - -/a{4}+/BZO ------------------------------------------------------------------- - Bra - a{4} - Ket - End ------------------------------------------------------------------- - -/a{4}+/BZOi ------------------------------------------------------------------- - Bra - /i a{4} - Ket - End ------------------------------------------------------------------- - -/[a-[:digit:]]+/ -Failed: invalid range in character class at offset 3 - -/[A-[:digit:]]+/ -Failed: invalid range in character class at offset 3 - -/[a-[.xxx.]]+/ -Failed: invalid range in character class at offset 3 - -/[a-[=xxx=]]+/ -Failed: invalid range in character class at offset 3 - -/[a-[!xxx!]]+/ -Failed: range out of order in character class at offset 3 - -/[A-[!xxx!]]+/ - A]]] - 0: A]]] - -/[a-\d]+/ -Failed: invalid range in character class at offset 4 - -/(?<0abc>xx)/ -Failed: group name must start with a non-digit at offset 3 - -/(?&1abc)xx(?<1abc>y)/ -Failed: group name must start with a non-digit at offset 3 - -/(?xx)/ -Failed: syntax error in subpattern name (missing terminator) at offset 5 - -/(?'0abc'xx)/ -Failed: group name must start with a non-digit at offset 3 - -/(?P<0abc>xx)/ -Failed: group name must start with a non-digit at offset 4 - -/\k<5ghj>/ -Failed: group name must start with a non-digit at offset 3 - -/\k'5ghj'/ -Failed: group name must start with a non-digit at offset 3 - -/\k{2fgh}/ -Failed: group name must start with a non-digit at offset 3 - -/(?P=8yuki)/ -Failed: group name must start with a non-digit at offset 4 - -/\g{4df}/ -Failed: group name must start with a non-digit at offset 3 - 
-/(?&1abc)xx(?<1abc>y)/ -Failed: group name must start with a non-digit at offset 3 - -/(?P>1abc)xx(?<1abc>y)/ -Failed: group name must start with a non-digit at offset 4 - -/\g'3gh'/ -Failed: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number at offset 2 - -/\g<5fg>/ -Failed: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number at offset 2 - -/(?(<4gh>)abc)/ -Failed: group name must start with a non-digit at offset 4 - -/(?('4gh')abc)/ -Failed: group name must start with a non-digit at offset 4 - -/(?(4gh)abc)/ -Failed: malformed number or name after (?( at offset 4 - -/(?(R&6yh)abc)/ -Failed: group name must start with a non-digit at offset 5 - -/(((a\2)|(a*)\g<-1>))*a?/BZ ------------------------------------------------------------------- - Bra - Brazero - SCBra 1 - Once - CBra 2 - CBra 3 - a - \2 - Ket - Alt - CBra 4 - a* - Ket - Recurse - Ket - Ket - KetRmax - a?+ - Ket - End ------------------------------------------------------------------- - -/-- Test the ugly "start or end of word" compatibility syntax --/ - -/[[:<:]]red[[:>:]]/BZ ------------------------------------------------------------------- - Bra - \b - Assert - \w - Ket - red - \b - AssertB - Reverse - \w - Ket - Ket - End ------------------------------------------------------------------- - little red riding hood - 0: red - a /red/ thing - 0: red - red is a colour - 0: red - put it all on red - 0: red - ** Failers -No match - no reduction -No match - Alfred Winifred -No match - -/[a[:<:]] should give error/ -Failed: unknown POSIX class name at offset 4 - -/(?=ab\K)/+ - abcd -Start of matched string is beyond its end - displaying from end to start. 
- 0: ab - 0+ abcd - -/abcd/f - xx\nxabcd -No match - -/ -- Test stack check external calls --/ - -/(((((a)))))/Q0 - -/(((((a)))))/Q1 -Failed: parentheses are too deeply nested (stack check) at offset 0 - -/(((((a)))))/Q -** Missing 0 or 1 after /Q - -/^\w+(?>\s*)(?<=\w)/BZ ------------------------------------------------------------------- - Bra - ^ - \w+ - Once_NC - \s*+ - Ket - AssertB - Reverse - \w - Ket - Ket - End ------------------------------------------------------------------- - -/\othing/ -Failed: missing opening brace after \o at offset 1 - -/\o{}/ -Failed: digits missing in \x{} or \o{} at offset 1 - -/\o{whatever}/ -Failed: non-octal character in \o{} (closing brace missing?) at offset 3 - -/\xthing/ - -/\x{}/ -Failed: digits missing in \x{} or \o{} at offset 3 - -/\x{whatever}/ -Failed: non-hex character in \x{} (closing brace missing?) at offset 3 - -"((?=(?(?=(?(?=(?(?=()))))))))" - a - 0: - 1: - 2: - -"(?(?=)==)(((((((((?=)))))))))" - a -No match - -/^(?:(a)|b)(?(1)A|B)/I -Capturing subpattern count = 1 -Max back reference = 1 -Options: anchored -No first char -No need char - aA123\O3 -Matched, but too many substrings - 0: aA - aA123\O6 - 0: aA - 1: a - -'^(?:(?a)|b)(?()A|B)' - aA123\O3 -Matched, but too many substrings - 0: aA - aA123\O6 - 0: aA - 1: a - -'^(?)(?:(?a)|b)(?()A|B)'J - aA123\O3 -Matched, but too many substrings - 0: aA - aA123\O6 -Matched, but too many substrings - 0: aA - 1: - -'^(?:(?X)|)(?:(?a)|b)\k{AA}'J - aa123\O3 -Matched, but too many substrings - 0: aa - aa123\O6 -Matched, but too many substrings - 0: aa - 1: - -/(?(?J)(?1(111111)11|)1|1|)(?()1)/ - -/(?(?=0)?)+/ -Failed: nothing to repeat at offset 7 - -/(?(?=0)(?=00)?00765)/ - 00765 - 0: 00765 - -/(?(?=0)(?=00)?00765|(?!3).56)/ - 00765 - 0: 00765 - 456 - 0: 456 - ** Failers -No match - 356 -No match - -'^(a)*+(\w)' - g - 0: g - 1: - 2: g - g\O3 -Matched, but too many substrings - 0: g - -'^(?:a)*+(\w)' - g - 0: g - 1: g - g\O3 -Matched, but too many substrings - 0: g - -//C 
- \O\C+ -Callout 255: last capture = -1 ----> - +0 ^ -Matched, but too many substrings - -"((?2){0,1999}())?" - -/((?+1)(\1))/BZ ------------------------------------------------------------------- - Bra - Once - CBra 1 - Recurse - CBra 2 - \1 - Ket - Ket - Ket - Ket - End ------------------------------------------------------------------- - -/(?(?!)a|b)/ - bbb - 0: b - aaa -No match - -"((?2)+)((?1))" - -"(?(?.*!.*)?)" -Failed: assertion expected after (?( or (?(?C) at offset 3 - -"X((?2)()*+){2}+"BZ ------------------------------------------------------------------- - Bra - X - Once - CBra 1 - Recurse - Braposzero - SCBraPos 2 - KetRpos - Ket - CBra 1 - Recurse - Braposzero - SCBraPos 2 - KetRpos - Ket - Ket - Ket - End ------------------------------------------------------------------- - -"X((?2)()*+){2}"BZ ------------------------------------------------------------------- - Bra - X - CBra 1 - Recurse - Braposzero - SCBraPos 2 - KetRpos - Ket - CBra 1 - Recurse - Braposzero - SCBraPos 2 - KetRpos - Ket - Ket - End ------------------------------------------------------------------- - -"(?<=((?2))((?1)))" -Failed: lookbehind assertion is not fixed length at offset 17 - -/(?<=\Ka)/g+ - aaaaa - 0: a - 0+ aaaa - 0: a - 0+ aaaa - 0: a - 0+ aaa - 0: a - 0+ aa - 0: a - 0+ a - 0: a - 0+ - -/(?<=\Ka)/G+ - aaaaa - 0: a - 0+ aaaa - 0: a - 0+ aaa - 0: a - 0+ aa - 0: a - 0+ a - 0: a - 0+ - -/((?2){73}(?2))((?1))/ - -/.((?2)(?R)\1)()/BZ ------------------------------------------------------------------- - Bra - Any - Once - CBra 1 - Recurse - Recurse - \1 - Ket - Ket - CBra 2 - Ket - Ket - End ------------------------------------------------------------------- - -/(?1)()((((((\1++))\x85)+)|))/ - -/(\9*+(?2);\3++()2|)++{/ -Failed: reference to non-existent subpattern at offset 22 - -/\V\x85\9*+((?2)\3++()2)*:2/ -Failed: reference to non-existent subpattern at offset 26 - -/(((?(R)){0,2}) (?''((?'R')((?'R')))))/J - -/(((?(X)){0,2}) (?''((?'X')((?'X')))))/J - -/(((?(R)){0,2}) 
(?''((?'X')((?'R')))))/ - -"(?J)(?'d'(?'d'\g{d}))" - -".*?\h.+.\.+\R*?\xd(?i)(?=!(?=b`b`b`\`b\xa9b!)`\a`bbbbbbbbbbbbb`bbbbbbbbbbbb*R\x85bbbbbbb\C?{((?2)(?))(( -\H){8(?<=(?1){29}\xa8bbbb\x16\xd\xc6^($(?1)/ - -/a[[:punct:]b]/BZ ------------------------------------------------------------------- - Bra - a - [!-/:-@[-`b{-~] - Ket - End ------------------------------------------------------------------- - -/L(?#(|++)(?J:(?)(?))(?)/ - \O\CC -Matched, but too many substrings -copy substring C failed -7 - -/(?=a\K)/ - ring bpattingbobnd $ 1,oern cou \rb\L -Start of matched string is beyond its end - displaying from end to start. - 0: a - 0L - -/(?<=((?C)0))/ - 9010 ---->9010 - 0 ^ 0 - 0 ^ 0 - 0: - 1: 0 - abcd ---->abcd - 0 ^ 0 - 0 ^ 0 - 0 ^ 0 - 0 ^ 0 -No match - -/((?J)(?'R'(?'R'(?'R'(?'R'(?'R'(?|(\k'R'))))))))/ - -/\N(?(?C)0?!.)*/ -Failed: assertion expected after (?( or (?(?C) at offset 4 - -/(?abc)(?(R)xyz)/BZ ------------------------------------------------------------------- - Bra - CBra 1 - abc - Ket - Cond - Cond recurse any - xyz - Ket - Ket - End ------------------------------------------------------------------- - -/(?abc)(?(R)xyz)/BZ ------------------------------------------------------------------- - Bra - CBra 1 - abc - Ket - Cond - 1 Cond ref - xyz - Ket - Ket - End ------------------------------------------------------------------- - -/(?=.*[A-Z])/I -Capturing subpattern count = 0 -May match empty string -No options -No first char -No need char - -"(?<=(a))\1?b" - ab - 0: b - 1: a - aaab - 0: ab - 1: a - -"(?=(a))\1?b" - ab - 0: ab - 1: a - aaab - 0: ab - 1: a - -/(?(?=^))b/ - abc - 0: b - -/-- End of testinput2 --/ diff --git a/src/pcre/testdata/testoutput21-16 b/src/pcre/testdata/testoutput21-16 deleted file mode 100644 index da194d90..00000000 --- a/src/pcre/testdata/testoutput21-16 +++ /dev/null @@ -1,100 +0,0 @@ -/-- Tests for reloading pre-compiled patterns. 
The first one gives an error -right away, and can be any old pattern compiled in 8-bit mode ("abc" is -typical). The others require the link size to be 2. */x - -(?:[AaLl]+)[^xX-]*?)(?P[\x{150}-\x{250}\x{300}]| - [^\x{800}aAs-uS-U\x{d800}-\x{dfff}])++[^#\b\x{500}\x{1000}]{3,5}$ - /x - - In 16-bit mode with options: S>testdata/saved16LE-1 - FS>testdata/saved16BE-1 - In 32-bit mode with options: S>testdata/saved32LE-1 - FS>testdata/saved32BE-1 ---%x - -(?:[AaLl]+)[^xX-]*?)(?P[\x{150}-\x{250}\x{300}]| - [^\x{800}aAs-uS-U\x{d800}-\x{dfff}])++[^#\b\x{500}\x{1000}]{3,5}$ - /x - - In 16-bit mode with options: S>testdata/saved16LE-1 - FS>testdata/saved16BE-1 - In 32-bit mode with options: S>testdata/saved32LE-1 - FS>testdata/saved32BE-1 ---%x - -[aZ\x{400}-\x{10ffff}]{4,} - [\x{f123}\x{10039}\x{20000}-\x{21234}]?| - [A-Cx-z\x{100000}-\x{1000a7}\x{101234}]) - (?[^az])/x - - In 16-bit mode with options: S8>testdata/saved16LE-2 - FS8>testdata/saved16BE-2 - In 32-bit mode with options: S8>testdata/saved32LE-2 - FS8>testdata/saved32BE-2 ---%8x - -[aZ\x{400}-\x{10ffff}]{4,} - [\x{f123}\x{10039}\x{20000}-\x{21234}]?| - [A-Cx-z\x{100000}-\x{1000a7}\x{101234}]) - (?[^az])/x - - In 16-bit mode with options: S8>testdata/saved16LE-2 - FS8>testdata/saved16BE-2 - In 32-bit mode with options: S8>testdata/saved32LE-2 - FS8>testdata/saved32BE-2 ---%8x - - ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ - _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 - \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f - \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e - \x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae - \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd - \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc - \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb - \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea - \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 - \xfa \xfb \xfc \xfd \xfe \xff - -/[\V]/BZSI ------------------------------------------------------------------- - Bra - [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e - \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d - \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > - ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c - d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 - \x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 - \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 - \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 - \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf - \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce - \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd - \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec - \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb - \xfc \xfd \xfe \xff - -/-- End of testinput23 --/ diff --git a/src/pcre/testdata/testoutput24 b/src/pcre/testdata/testoutput24 deleted file mode 100644 index 0714a0fe..00000000 --- a/src/pcre/testdata/testoutput24 +++ /dev/null @@ -1,13 +0,0 @@ -/-- Tests for the 16-bit library with UTF-16 support only */ - -< forbid W - -/bad/8 - \x{d800} -Error -10 (bad UTF-16 string) offset=0 reason=1 - -/short/8 - \P\P\x{d800} -Error -25 (short UTF-16 string) offset=0 reason=1 - -/-- End of testinput24 --/ diff --git a/src/pcre/testdata/testoutput25 b/src/pcre/testdata/testoutput25 deleted file mode 100644 index 4c62c8d8..00000000 --- a/src/pcre/testdata/testoutput25 +++ /dev/null @@ -1,119 +0,0 @@ -/-- Tests for the 32-bit library only */ - -< forbid 8W - -/-- Check maximum character size --/ - -/\x{110000}/ - -/\x{7fffffff}/ - -/\x{80000000}/ - -/\x{ffffffff}/ - -/\x{100000000}/ -Failed: character value in \x{} or \o{} is too large at offset 12 - -/\o{17777777777}/ - -/\o{20000000000}/ - -/\o{37777777777}/ - -/\o{40000000000}/ -Failed: character value in \x{} or \o{} is too large at offset 14 - -/\x{7fffffff}\x{7fffffff}/I -Capturing subpattern count = 0 -No options -First char = \x{7fffffff} -Need char = \x{7fffffff} - 
-/\x{80000000}\x{80000000}/I -Capturing subpattern count = 0 -No options -First char = \x{80000000} -Need char = \x{80000000} - -/\x{ffffffff}\x{ffffffff}/I -Capturing subpattern count = 0 -No options -First char = \x{ffffffff} -Need char = \x{ffffffff} - -/-- Non-UTF characters --/ - -/\C{2,3}/ - \x{400000}\x{400001}\x{400002}\x{400003} - 0: \x{400000}\x{400001}\x{400002} - -/\x{400000}\x{800000}/iDZ ------------------------------------------------------------------- - Bra - /i \x{400000}\x{800000} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless -First char = \x{400000} -Need char = \x{800000} - -/-- Check character ranges --/ - -/[\H]/BZSI ------------------------------------------------------------------- - Bra - [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffffffff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b - \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a - \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 - : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ - _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 - \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f - \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e - \x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae - \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd - \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc - \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb - \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea - \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 - \xfa \xfb \xfc \xfd \xfe \xff - -/[\V]/BZSI ------------------------------------------------------------------- - Bra - [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffffffff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e - \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d - \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > - ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c - d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 - \x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 - \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 - \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 - \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf - \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce - \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd - \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec - \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb - \xfc \xfd \xfe \xff - -/-- End of testinput25 --/ diff --git a/src/pcre/testdata/testoutput26 b/src/pcre/testdata/testoutput26 deleted file mode 100644 index 28f8d42a..00000000 --- a/src/pcre/testdata/testoutput26 +++ /dev/null @@ -1,17 +0,0 @@ -/-- Tests for the 32-bit library with UTF-32 support only */ - -< forbid W - -/-- Non-UTF characters --/ - -/\x{110000}/8 -Failed: character value in \x{} or \o{} is too large at offset 9 - -/\o{4200000}/8 -Failed: character value in \x{} or \o{} is too large at offset 10 - -/\C/8 - \x{110000} -Error -10 (bad UTF-32 string) offset=0 reason=3 - -/-- End of testinput26 --/ diff --git a/src/pcre/testdata/testoutput3 b/src/pcre/testdata/testoutput3 deleted file mode 100644 index 73119ab4..00000000 --- a/src/pcre/testdata/testoutput3 +++ /dev/null @@ -1,174 +0,0 @@ -/-- This set of tests checks local-specific features, using the "fr_FR" locale. - It is not Perl-compatible. When run via RunTest, the locale is edited to - be whichever of "fr_FR", "french", or "fr" is found to exist. There is - different version of this file called wintestinput3 for use on Windows, - where the locale is called "french" and the tests are run using - RunTest.bat. 
--/ - -< forbid 8W - -/^[\w]+/ - *** Failers -No match - École -No match - -/^[\w]+/Lfr_FR - École - 0: École - -/^[\w]+/ - *** Failers -No match - École -No match - -/^[\W]+/ - École - 0: \xc9 - -/^[\W]+/Lfr_FR - *** Failers - 0: *** - École -No match - -/[\b]/ - \b - 0: \x08 - *** Failers -No match - a -No match - -/[\b]/Lfr_FR - \b - 0: \x08 - *** Failers -No match - a -No match - -/^\w+/ - *** Failers -No match - École -No match - -/^\w+/Lfr_FR - École - 0: École - -/(.+)\b(.+)/ - École - 0: \xc9cole - 1: \xc9 - 2: cole - -/(.+)\b(.+)/Lfr_FR - *** Failers - 0: *** Failers - 1: *** - 2: Failers - École -No match - -/École/i - École - 0: \xc9cole - *** Failers -No match - école -No match - -/École/iLfr_FR - École - 0: École - école - 0: école - -/\w/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P - Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z - -/\w/ISLfr_FR -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P - Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z - ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â - ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ - -/^[\xc8-\xc9]/iLfr_FR - École - 0: É - école - 0: é - -/^[\xc8-\xc9]/Lfr_FR - École - 0: É - *** Failers -No match - école -No match - -/\W+/Lfr_FR - >>>\xaa<<< - 0: >>> - >>>\xba<<< - 0: >>> - -/[\W]+/Lfr_FR - >>>\xaa<<< - 0: >>> - >>>\xba<<< - 0: >>> - -/[^[:alpha:]]+/Lfr_FR - >>>\xaa<<< - 0: >>> - >>>\xba<<< - 0: >>> - -/\w+/Lfr_FR - >>>\xaa<<< - 0: ª - >>>\xba<<< - 0: º - -/[\w]+/Lfr_FR - >>>\xaa<<< - 0: ª - >>>\xba<<< - 0: º - -/[[:alpha:]]+/Lfr_FR - >>>\xaa<<< - 0: ª - >>>\xba<<< - 0: º - -/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR 
------------------------------------------------------------------- - Bra - [A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff] - [a-z\xb5\xdf-\xf6\xf8-\xff] - [A-Z\xc0-\xd6\xd8-\xde] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/-- End of testinput3 --/ diff --git a/src/pcre/testdata/testoutput3A b/src/pcre/testdata/testoutput3A deleted file mode 100644 index 0bde024e..00000000 --- a/src/pcre/testdata/testoutput3A +++ /dev/null @@ -1,174 +0,0 @@ -/-- This set of tests checks local-specific features, using the "fr_FR" locale. - It is not Perl-compatible. When run via RunTest, the locale is edited to - be whichever of "fr_FR", "french", or "fr" is found to exist. There is - different version of this file called wintestinput3 for use on Windows, - where the locale is called "french" and the tests are run using - RunTest.bat. --/ - -< forbid 8W - -/^[\w]+/ - *** Failers -No match - École -No match - -/^[\w]+/Lfr_FR - École - 0: École - -/^[\w]+/ - *** Failers -No match - École -No match - -/^[\W]+/ - École - 0: \xc9 - -/^[\W]+/Lfr_FR - *** Failers - 0: *** - École -No match - -/[\b]/ - \b - 0: \x08 - *** Failers -No match - a -No match - -/[\b]/Lfr_FR - \b - 0: \x08 - *** Failers -No match - a -No match - -/^\w+/ - *** Failers -No match - École -No match - -/^\w+/Lfr_FR - École - 0: École - -/(.+)\b(.+)/ - École - 0: \xc9cole - 1: \xc9 - 2: cole - -/(.+)\b(.+)/Lfr_FR - *** Failers - 0: *** Failers - 1: *** - 2: Failers - École -No match - -/École/i - École - 0: \xc9cole - *** Failers -No match - école -No match - -/École/iLfr_FR - École - 0: École - école - 0: école - -/\w/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P - Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z - -/\w/ISLfr_FR -Capturing subpattern 
count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P - Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z - ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â - ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ - -/^[\xc8-\xc9]/iLfr_FR - École - 0: É - école - 0: é - -/^[\xc8-\xc9]/Lfr_FR - École - 0: É - *** Failers -No match - école -No match - -/\W+/Lfr_FR - >>>\xaa<<< - 0: >>> - >>>\xba<<< - 0: >>> - -/[\W]+/Lfr_FR - >>>\xaa<<< - 0: >>> - >>>\xba<<< - 0: >>> - -/[^[:alpha:]]+/Lfr_FR - >>>\xaa<<< - 0: >>> - >>>\xba<<< - 0: >>> - -/\w+/Lfr_FR - >>>\xaa<<< - 0: ª - >>>\xba<<< - 0: º - -/[\w]+/Lfr_FR - >>>\xaa<<< - 0: ª - >>>\xba<<< - 0: º - -/[[:alpha:]]+/Lfr_FR - >>>\xaa<<< - 0: ª - >>>\xba<<< - 0: º - -/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR ------------------------------------------------------------------- - Bra - [A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff] - [a-z\xaa\xb5\xba\xdf-\xf6\xf8-\xff] - [A-Z\xc0-\xd6\xd8-\xde] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/-- End of testinput3 --/ diff --git a/src/pcre/testdata/testoutput4 b/src/pcre/testdata/testoutput4 deleted file mode 100644 index 69e812cd..00000000 --- a/src/pcre/testdata/testoutput4 +++ /dev/null @@ -1,1284 +0,0 @@ -/-- This set of tests is for UTF support, excluding Unicode properties. It is - compatible with all versions of Perl >= 5.10 and both the 8-bit and 16-bit - PCRE libraries. 
--/ - -< forbid 9?=ABCDEFfGILMNPTUWXZ< - -/a.b/8 - acb - 0: acb - a\x7fb - 0: a\x{7f}b - a\x{100}b - 0: a\x{100}b - *** Failers -No match - a\nb -No match - -/a(.{3})b/8 - a\x{4000}xyb - 0: a\x{4000}xyb - 1: \x{4000}xy - a\x{4000}\x7fyb - 0: a\x{4000}\x{7f}yb - 1: \x{4000}\x{7f}y - a\x{4000}\x{100}yb - 0: a\x{4000}\x{100}yb - 1: \x{4000}\x{100}y - *** Failers -No match - a\x{4000}b -No match - ac\ncb -No match - -/a(.*?)(.)/ - a\xc0\x88b - 0: a\xc0 - 1: - 2: \xc0 - -/a(.*?)(.)/8 - a\x{100}b - 0: a\x{100} - 1: - 2: \x{100} - -/a(.*)(.)/ - a\xc0\x88b - 0: a\xc0\x88b - 1: \xc0\x88 - 2: b - -/a(.*)(.)/8 - a\x{100}b - 0: a\x{100}b - 1: \x{100} - 2: b - -/a(.)(.)/ - a\xc0\x92bcd - 0: a\xc0\x92 - 1: \xc0 - 2: \x92 - -/a(.)(.)/8 - a\x{240}bcd - 0: a\x{240}b - 1: \x{240} - 2: b - -/a(.?)(.)/ - a\xc0\x92bcd - 0: a\xc0\x92 - 1: \xc0 - 2: \x92 - -/a(.?)(.)/8 - a\x{240}bcd - 0: a\x{240}b - 1: \x{240} - 2: b - -/a(.??)(.)/ - a\xc0\x92bcd - 0: a\xc0 - 1: - 2: \xc0 - -/a(.??)(.)/8 - a\x{240}bcd - 0: a\x{240} - 1: - 2: \x{240} - -/a(.{3})b/8 - a\x{1234}xyb - 0: a\x{1234}xyb - 1: \x{1234}xy - a\x{1234}\x{4321}yb - 0: a\x{1234}\x{4321}yb - 1: \x{1234}\x{4321}y - a\x{1234}\x{4321}\x{3412}b - 0: a\x{1234}\x{4321}\x{3412}b - 1: \x{1234}\x{4321}\x{3412} - *** Failers -No match - a\x{1234}b -No match - ac\ncb -No match - -/a(.{3,})b/8 - a\x{1234}xyb - 0: a\x{1234}xyb - 1: \x{1234}xy - a\x{1234}\x{4321}yb - 0: a\x{1234}\x{4321}yb - 1: \x{1234}\x{4321}y - a\x{1234}\x{4321}\x{3412}b - 0: a\x{1234}\x{4321}\x{3412}b - 1: \x{1234}\x{4321}\x{3412} - axxxxbcdefghijb - 0: axxxxbcdefghijb - 1: xxxxbcdefghij - a\x{1234}\x{4321}\x{3412}\x{3421}b - 0: a\x{1234}\x{4321}\x{3412}\x{3421}b - 1: \x{1234}\x{4321}\x{3412}\x{3421} - *** Failers -No match - a\x{1234}b -No match - -/a(.{3,}?)b/8 - a\x{1234}xyb - 0: a\x{1234}xyb - 1: \x{1234}xy - a\x{1234}\x{4321}yb - 0: a\x{1234}\x{4321}yb - 1: \x{1234}\x{4321}y - a\x{1234}\x{4321}\x{3412}b - 0: a\x{1234}\x{4321}\x{3412}b - 1: \x{1234}\x{4321}\x{3412} - 
axxxxbcdefghijb - 0: axxxxb - 1: xxxx - a\x{1234}\x{4321}\x{3412}\x{3421}b - 0: a\x{1234}\x{4321}\x{3412}\x{3421}b - 1: \x{1234}\x{4321}\x{3412}\x{3421} - *** Failers -No match - a\x{1234}b -No match - -/a(.{3,5})b/8 - a\x{1234}xyb - 0: a\x{1234}xyb - 1: \x{1234}xy - a\x{1234}\x{4321}yb - 0: a\x{1234}\x{4321}yb - 1: \x{1234}\x{4321}y - a\x{1234}\x{4321}\x{3412}b - 0: a\x{1234}\x{4321}\x{3412}b - 1: \x{1234}\x{4321}\x{3412} - axxxxbcdefghijb - 0: axxxxb - 1: xxxx - a\x{1234}\x{4321}\x{3412}\x{3421}b - 0: a\x{1234}\x{4321}\x{3412}\x{3421}b - 1: \x{1234}\x{4321}\x{3412}\x{3421} - axbxxbcdefghijb - 0: axbxxb - 1: xbxx - axxxxxbcdefghijb - 0: axxxxxb - 1: xxxxx - *** Failers -No match - a\x{1234}b -No match - axxxxxxbcdefghijb -No match - -/a(.{3,5}?)b/8 - a\x{1234}xyb - 0: a\x{1234}xyb - 1: \x{1234}xy - a\x{1234}\x{4321}yb - 0: a\x{1234}\x{4321}yb - 1: \x{1234}\x{4321}y - a\x{1234}\x{4321}\x{3412}b - 0: a\x{1234}\x{4321}\x{3412}b - 1: \x{1234}\x{4321}\x{3412} - axxxxbcdefghijb - 0: axxxxb - 1: xxxx - a\x{1234}\x{4321}\x{3412}\x{3421}b - 0: a\x{1234}\x{4321}\x{3412}\x{3421}b - 1: \x{1234}\x{4321}\x{3412}\x{3421} - axbxxbcdefghijb - 0: axbxxb - 1: xbxx - axxxxxbcdefghijb - 0: axxxxxb - 1: xxxxx - *** Failers -No match - a\x{1234}b -No match - axxxxxxbcdefghijb -No match - -/^[a\x{c0}]/8 - *** Failers -No match - \x{100} -No match - -/(?<=aXb)cd/8 - aXbcd - 0: cd - -/(?<=a\x{100}b)cd/8 - a\x{100}bcd - 0: cd - -/(?<=a\x{100000}b)cd/8 - a\x{100000}bcd - 0: cd - -/(?:\x{100}){3}b/8 - \x{100}\x{100}\x{100}b - 0: \x{100}\x{100}\x{100}b - *** Failers -No match - \x{100}\x{100}b -No match - -/\x{ab}/8 - \x{ab} - 0: \x{ab} - \xc2\xab - 0: \x{ab} - *** Failers -No match - \x00{ab} -No match - -/(?<=(.))X/8 - WXYZ - 0: X - 1: W - \x{256}XYZ - 0: X - 1: \x{256} - *** Failers -No match - XYZ -No match - -/[^a]+/8g - bcd - 0: bcd - \x{100}aY\x{256}Z - 0: \x{100} - 0: Y\x{256}Z - -/^[^a]{2}/8 - \x{100}bc - 0: \x{100}b - -/^[^a]{2,}/8 - \x{100}bcAa - 0: \x{100}bcA - -/^[^a]{2,}?/8 - 
\x{100}bca - 0: \x{100}b - -/[^a]+/8ig - bcd - 0: bcd - \x{100}aY\x{256}Z - 0: \x{100} - 0: Y\x{256}Z - -/^[^a]{2}/8i - \x{100}bc - 0: \x{100}b - -/^[^a]{2,}/8i - \x{100}bcAa - 0: \x{100}bc - -/^[^a]{2,}?/8i - \x{100}bca - 0: \x{100}b - -/\x{100}{0,0}/8 - abcd - 0: - -/\x{100}?/8 - abcd - 0: - \x{100}\x{100} - 0: \x{100} - -/\x{100}{0,3}/8 - \x{100}\x{100} - 0: \x{100}\x{100} - \x{100}\x{100}\x{100}\x{100} - 0: \x{100}\x{100}\x{100} - -/\x{100}*/8 - abce - 0: - \x{100}\x{100}\x{100}\x{100} - 0: \x{100}\x{100}\x{100}\x{100} - -/\x{100}{1,1}/8 - abcd\x{100}\x{100}\x{100}\x{100} - 0: \x{100} - -/\x{100}{1,3}/8 - abcd\x{100}\x{100}\x{100}\x{100} - 0: \x{100}\x{100}\x{100} - -/\x{100}+/8 - abcd\x{100}\x{100}\x{100}\x{100} - 0: \x{100}\x{100}\x{100}\x{100} - -/\x{100}{3}/8 - abcd\x{100}\x{100}\x{100}XX - 0: \x{100}\x{100}\x{100} - -/\x{100}{3,5}/8 - abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX - 0: \x{100}\x{100}\x{100}\x{100}\x{100} - -/\x{100}{3,}/8 - abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX - 0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} - -/(?<=a\x{100}{2}b)X/8+ - Xyyya\x{100}\x{100}bXzzz - 0: X - 0+ zzz - -/\D*/8 - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa - -/\D*/8 - 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} - 0: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} - -/\D/8 - 1X2 - 0: X - 1\x{100}2 - 0: \x{100} - -/>\S/8 - > >X Y - 0: >X - > >\x{100} Y - 0: >\x{100} - -/\d/8 - \x{100}3 - 0: 3 - -/\s/8 - \x{100} X - 0: - -/\D+/8 - 12abcd34 - 0: abcd - *** Failers - 0: *** Failers - 1234 -No match - -/\D{2,3}/8 - 12abcd34 - 0: abc - 12ab34 - 0: ab - *** Failers - 0: *** - 1234 -No match - 12a34 -No match - -/\D{2,3}?/8 - 12abcd34 - 0: ab - 12ab34 - 0: ab - *** Failers - 0: ** - 1234 -No match - 12a34 
-No match - -/\d+/8 - 12abcd34 - 0: 12 - *** Failers -No match - -/\d{2,3}/8 - 12abcd34 - 0: 12 - 1234abcd - 0: 123 - *** Failers -No match - 1.4 -No match - -/\d{2,3}?/8 - 12abcd34 - 0: 12 - 1234abcd - 0: 12 - *** Failers -No match - 1.4 -No match - -/\S+/8 - 12abcd34 - 0: 12abcd34 - *** Failers - 0: *** - \ \ -No match - -/\S{2,3}/8 - 12abcd34 - 0: 12a - 1234abcd - 0: 123 - *** Failers - 0: *** - \ \ -No match - -/\S{2,3}?/8 - 12abcd34 - 0: 12 - 1234abcd - 0: 12 - *** Failers - 0: ** - \ \ -No match - -/>\s+ <34 - 0: > < - 0+ 34 - *** Failers -No match - -/>\s{2,3} < - 0+ cd - ab> < - 0+ ce - *** Failers -No match - ab> \s{2,3}? < - 0+ cd - ab> < - 0+ ce - *** Failers -No match - ab> \xff< - 0: \xff - -/[\xff]/8 - >\x{ff}< - 0: \x{ff} - -/[^\xFF]/ - XYZ - 0: X - -/[^\xff]/8 - XYZ - 0: X - \x{123} - 0: \x{123} - -/^[ac]*b/8 - xb -No match - -/^[ac\x{100}]*b/8 - xb -No match - -/^[^x]*b/8i - xb -No match - -/^[^x]*b/8 - xb -No match - -/^\d*b/8 - xb -No match - -/(|a)/g8 - catac - 0: - 1: - 0: - 1: - 0: a - 1: a - 0: - 1: - 0: - 1: - 0: a - 1: a - 0: - 1: - 0: - 1: - a\x{256}a - 0: - 1: - 0: a - 1: a - 0: - 1: - 0: - 1: - 0: a - 1: a - 0: - 1: - -/^\x{85}$/8i - \x{85} - 0: \x{85} - -/^ሴ/8 - ሴ - 0: \x{1234} - -/^\ሴ/8 - ሴ - 0: \x{1234} - -"(?s)(.{1,5})"8 - abcdefg - 0: abcde - 1: abcde - ab - 0: ab - 1: ab - -/a*\x{100}*\w/8 - a - 0: a - -/\S\S/8g - A\x{a3}BC - 0: A\x{a3} - 0: BC - -/\S{2}/8g - A\x{a3}BC - 0: A\x{a3} - 0: BC - -/\W\W/8g - +\x{a3}== - 0: +\x{a3} - 0: == - -/\W{2}/8g - +\x{a3}== - 0: +\x{a3} - 0: == - -/\S/8g - \x{442}\x{435}\x{441}\x{442} - 0: \x{442} - 0: \x{435} - 0: \x{441} - 0: \x{442} - -/[\S]/8g - \x{442}\x{435}\x{441}\x{442} - 0: \x{442} - 0: \x{435} - 0: \x{441} - 0: \x{442} - -/\D/8g - \x{442}\x{435}\x{441}\x{442} - 0: \x{442} - 0: \x{435} - 0: \x{441} - 0: \x{442} - -/[\D]/8g - \x{442}\x{435}\x{441}\x{442} - 0: \x{442} - 0: \x{435} - 0: \x{441} - 0: \x{442} - -/\W/8g - \x{2442}\x{2435}\x{2441}\x{2442} - 0: \x{2442} - 0: \x{2435} - 0: 
\x{2441} - 0: \x{2442} - -/[\W]/8g - \x{2442}\x{2435}\x{2441}\x{2442} - 0: \x{2442} - 0: \x{2435} - 0: \x{2441} - 0: \x{2442} - -/[\S\s]*/8 - abc\n\r\x{442}\x{435}\x{441}\x{442}xyz - 0: abc\x{0a}\x{0d}\x{442}\x{435}\x{441}\x{442}xyz - -/[\x{41f}\S]/8g - \x{442}\x{435}\x{441}\x{442} - 0: \x{442} - 0: \x{435} - 0: \x{441} - 0: \x{442} - -/.[^\S]./8g - abc def\x{442}\x{443}xyz\npqr - 0: c d - 0: z\x{0a}p - -/.[^\S\n]./8g - abc def\x{442}\x{443}xyz\npqr - 0: c d - -/[[:^alnum:]]/8g - +\x{2442} - 0: + - 0: \x{2442} - -/[[:^alpha:]]/8g - +\x{2442} - 0: + - 0: \x{2442} - -/[[:^ascii:]]/8g - A\x{442} - 0: \x{442} - -/[[:^blank:]]/8g - A\x{442} - 0: A - 0: \x{442} - -/[[:^cntrl:]]/8g - A\x{442} - 0: A - 0: \x{442} - -/[[:^digit:]]/8g - A\x{442} - 0: A - 0: \x{442} - -/[[:^graph:]]/8g - \x19\x{e01ff} - 0: \x{19} - 0: \x{e01ff} - -/[[:^lower:]]/8g - A\x{422} - 0: A - 0: \x{422} - -/[[:^print:]]/8g - \x{19}\x{e01ff} - 0: \x{19} - 0: \x{e01ff} - -/[[:^punct:]]/8g - A\x{442} - 0: A - 0: \x{442} - -/[[:^space:]]/8g - A\x{442} - 0: A - 0: \x{442} - -/[[:^upper:]]/8g - a\x{442} - 0: a - 0: \x{442} - -/[[:^word:]]/8g - +\x{2442} - 0: + - 0: \x{2442} - -/[[:^xdigit:]]/8g - M\x{442} - 0: M - 0: \x{442} - -/[^ABCDEFGHIJKLMNOPQRSTUVWXYZÀÃÂÃÄÅÆÇÈÉÊËÌÃÃŽÃÃÑÒÓÔÕÖØÙÚÛÜÃÞĀĂĄĆĈĊČĎÄĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİIJĴĶĹĻĽĿÅŃŅŇŊŌŎÅÅ’Å”Å–Å˜ÅšÅœÅžÅ Å¢Å¤Å¦Å¨ÅªÅ¬Å®Å°Å²Å´Å¶Å¸Å¹Å»Å½ÆÆ‚Æ„Æ†Æ‡Æ‰ÆŠÆ‹ÆŽÆÆÆ‘Æ“Æ”Æ–Æ—Æ˜ÆœÆÆŸÆ 
Æ¢Æ¤Æ¦Æ§Æ©Æ¬Æ®Æ¯Æ±Æ²Æ³ÆµÆ·Æ¸Æ¼Ç„LJNJÇÇǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮDZǴǶǷǸǺǼǾȀȂȄȆȈȊȌȎÈȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺȻȽȾÉΆΈΉΊΌΎÎΑΒΓΔΕΖΗΘΙΚΛΜÎΞΟΠΡΣΤΥΦΧΨΩΪΫϒϓϔϘϚϜϞϠϢϤϦϨϪϬϮϴϷϹϺϽϾϿЀÐЂЃЄЅІЇЈЉЊЋЌÐÐŽÐÐБВГДЕЖЗИЙКЛМÐОПРСТУФХЦЧШЩЪЫЬЭЮЯѠѢѤѦѨѪѬѮѰѲѴѶѸѺѼѾҀҊҌҎÒҒҔҖҘҚҜҞҠҢҤҦҨҪҬҮҰҲҴҶҸҺҼҾӀÓÓƒÓ…Ó‡Ó‰Ó‹ÓÓӒӔӖӘӚӜӞӠӢӤӦӨӪӬӮӰӲӴӶӸԀԂԄԆԈԊԌԎԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀÕÕ‚ÕƒÕ„Õ…Õ†Õ‡ÕˆÕ‰ÕŠÕ‹ÕŒÕÕŽÕÕՑՒՓՔՕՖႠႡႢႣႤႥႦႧႨႩႪႫႬႭႮႯႰႱႲႳႴႵႶႷႸႹႺႻႼႽႾႿჀáƒáƒ‚ჃჄჅḀḂḄḆḈḊḌḎá¸á¸’ḔḖḘḚḜḞḠḢḤḦḨḪḬḮḰḲḴḶḸḺḼḾṀṂṄṆṈṊṌṎá¹á¹’ṔṖṘṚṜṞṠṢṤṦṨṪṬṮṰṲṴṶṸṺṼṾẀẂẄẆẈẊẌẎáºáº’ẔẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼẾỀỂỄỆỈỊỌỎá»á»’ỔỖỘỚỜỞỠỢỤỦỨỪỬỮỰỲỴỶỸἈἉἊἋἌá¼á¼Žá¼á¼˜á¼™á¼šá¼›á¼œá¼á¼¨á¼©á¼ªá¼«á¼¬á¼­á¼®á¼¯á¼¸á¼¹á¼ºá¼»á¼¼á¼½á¼¾á¼¿á½ˆá½‰á½Šá½‹á½Œá½á½™á½›á½á½Ÿá½¨á½©á½ªá½«á½¬á½­á½®á½¯á¾¸á¾¹á¾ºá¾»á¿ˆá¿‰á¿Šá¿‹á¿˜á¿™á¿šá¿›á¿¨á¿©á¿ªá¿«á¿¬á¿¸á¿¹á¿ºá¿»abcdefghijklmnopqrstuvwxyzªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿÄăąćĉċÄÄđēĕėęěÄğġģĥħĩīĭįıijĵķĸĺļľŀłńņňʼnŋÅÅőœŕŗřśÅÅŸÅ¡Å£Å¥Å§Å©Å«Å­Å¯Å±Å³ÅµÅ·ÅºÅ¼Å¾Å¿Æ€ÆƒÆ…ÆˆÆŒÆÆ’ƕƙƚƛƞơƣƥƨƪƫƭưƴƶƹƺƽƾƿdžljnjǎÇǒǔǖǘǚǜÇǟǡǣǥǧǩǫǭǯǰdzǵǹǻǽǿÈȃȅȇȉȋÈÈȑȓȕȗșțÈȟȡȣȥȧȩȫȭȯȱȳȴȵȶȷȸȹȼȿɀÉɑɒɓɔɕɖɗɘəɚɛɜÉɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹɺɻɼɽɾɿʀÊʂʃʄʅʆʇʈʉʊʋʌÊÊŽÊÊʑʒʓʔʕʖʗʘʙʚʛʜÊʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭʮʯÎάέήίΰαβγδεζηθικλμνξοπÏςστυφχψωϊϋόÏÏŽÏϑϕϖϗϙϛÏϟϡϣϥϧϩϫϭϯϰϱϲϳϵϸϻϼабвгдежзийклмнопрÑтуфхцчшщъыьÑÑŽÑÑёђѓєѕіїјљњћќÑўџѡѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿÒÒ‹ÒÒÒ‘Ò“Ò•Ò—Ò™Ò›ÒҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎӑӓӕӗәӛÓÓŸÓ¡Ó£Ó¥Ó§Ó©Ó«Ó­Ó¯Ó±Ó³ÓµÓ·Ó¹ÔÔƒÔ…Ô‡Ô‰Ô‹ÔÔÕ¡Õ¢Õ£Õ¤Õ¥Õ¦Õ§Õ¨Õ©ÕªÕ«Õ¬Õ­Õ®Õ¯Õ°Õ±Õ²Õ³Õ´ÕµÕ¶Õ·Õ¸Õ¹ÕºÕ»Õ¼Õ½Õ¾Õ¿Ö€Öւփքօֆևᴀá´á´‚ᴃᴄᴅᴆᴇᴈᴉᴊᴋᴌá´á´Žá´á´á´‘ᴒᴓᴔᴕᴖᴗᴘᴙᴚᴛᴜá´á´žá´Ÿá´ 
á´¡á´¢á´£á´¤á´¥á´¦á´§á´¨á´©á´ªá´«áµ¢áµ£áµ¤áµ¥áµ¦áµ§áµ¨áµ©áµªáµ«áµ¬áµ­áµ®áµ¯áµ°áµ±áµ²áµ³áµ´áµµáµ¶áµ·áµ¹áµºáµ»áµ¼áµ½áµ¾áµ¿á¶€á¶á¶‚ᶃᶄᶅᶆᶇᶈᶉᶊᶋᶌá¶á¶Žá¶á¶á¶‘ᶒᶓᶔᶕᶖᶗᶘᶙᶚá¸á¸ƒá¸…ḇḉḋá¸á¸á¸‘ḓḕḗḙḛá¸á¸Ÿá¸¡á¸£á¸¥á¸§á¸©á¸«á¸­á¸¯á¸±á¸³á¸µá¸·á¸¹á¸»á¸½á¸¿á¹á¹ƒá¹…ṇṉṋá¹á¹á¹‘ṓṕṗṙṛá¹á¹Ÿá¹¡á¹£á¹¥á¹§á¹©á¹«á¹­á¹¯á¹±á¹³á¹µá¹·á¹¹á¹»á¹½á¹¿áºáºƒáº…ẇẉẋáºáºáº‘ẓẕẖẗẘẙẚẛạảấầẩẫậắằẳẵặẹẻẽếá»á»ƒá»…ệỉịá»á»á»‘ồổỗộớá»á»Ÿá»¡á»£á»¥á»§á»©á»«á»­á»¯á»±á»³á»µá»·á»¹á¼€á¼á¼‚ἃἄἅἆἇá¼á¼‘ἒἓἔἕἠἡἢἣἤἥἦἧἰἱἲἳἴἵἶἷὀá½á½‚ὃὄὅá½á½‘ὒὓὔὕὖὗὠὡὢὣὤὥὦὧὰάὲέὴήὶίὸόὺύὼώᾀá¾á¾‚ᾃᾄᾅᾆᾇá¾á¾‘ᾒᾓᾔᾕᾖᾗᾠᾡᾢᾣᾤᾥᾦᾧᾰᾱᾲᾳᾴᾶᾷιῂῃῄῆῇá¿á¿‘ῒΐῖῗῠῡῢΰῤῥῦῧῲῳῴῶῷâ²â²ƒâ²…ⲇⲉⲋâ²â²â²‘ⲓⲕⲗⲙⲛâ²â²Ÿâ²¡â²£â²¥â²§â²©â²«â²­â²¯â²±â²³â²µâ²·â²¹â²»â²½â²¿â³â³ƒâ³…ⳇⳉⳋâ³â³â³‘ⳓⳕⳗⳙⳛâ³â³Ÿâ³¡â³£â³¤â´€â´â´‚ⴃⴄⴅⴆⴇⴈⴉⴊⴋⴌâ´â´Žâ´â´â´‘ⴒⴓⴔⴕⴖⴗⴘⴙⴚⴛⴜâ´â´žâ´Ÿâ´ â´¡â´¢â´£â´¤â´¥ï¬€ï¬ï¬‚ffifflſtstﬓﬔﬕﬖﬗ\d-_^]/8 - -/^[^d]*?$/ - abc - 0: abc - -/^[^d]*?$/8 - abc - 0: abc - -/^[^d]*?$/i - abc - 0: abc - -/^[^d]*?$/8i - abc - 0: abc - -/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/8 - -/^[a\x{c0}]b/8 - \x{c0}b - 0: \x{c0}b - -/^([a\x{c0}]*?)aa/8 - a\x{c0}aaaa/ - 0: a\x{c0}aa - 1: a\x{c0} - -/^([a\x{c0}]*?)aa/8 - a\x{c0}aaaa/ - 0: a\x{c0}aa - 1: a\x{c0} - a\x{c0}a\x{c0}aaa/ - 0: a\x{c0}a\x{c0}aa - 1: a\x{c0}a\x{c0} - -/^([a\x{c0}]*)aa/8 - a\x{c0}aaaa/ - 0: a\x{c0}aaaa - 1: a\x{c0}aa - a\x{c0}a\x{c0}aaa/ - 0: a\x{c0}a\x{c0}aaa - 1: a\x{c0}a\x{c0}a - -/^([a\x{c0}]*)a\x{c0}/8 - a\x{c0}aaaa/ - 0: a\x{c0} - 1: - a\x{c0}a\x{c0}aaa/ - 0: a\x{c0}a\x{c0} - 1: a\x{c0} - -/A*/g8 - AAB\x{123}BAA - 0: AA - 0: - 0: - 0: - 0: AA - 0: - -/(abc)\1/8i - abc -No match - -/(abc)\1/8 - abc -No match - -/a(*:a\x{1234}b)/8K - abc - 0: a -MK: a\x{1234}b - -/a(*:a£b)/8K - abc - 0: a -MK: a\x{a3}b - -/-- Noncharacters --/ - -/./8 - \x{fffe} - 0: \x{fffe} - \x{ffff} - 0: \x{ffff} - \x{1fffe} - 0: \x{1fffe} - \x{1ffff} - 0: \x{1ffff} - \x{2fffe} - 0: \x{2fffe} - \x{2ffff} - 0: \x{2ffff} - \x{3fffe} - 0: \x{3fffe} - \x{3ffff} - 0: \x{3ffff} - \x{4fffe} - 0: \x{4fffe} - \x{4ffff} - 0: \x{4ffff} - \x{5fffe} - 0: \x{5fffe} - \x{5ffff} - 0: 
\x{5ffff} - \x{6fffe} - 0: \x{6fffe} - \x{6ffff} - 0: \x{6ffff} - \x{7fffe} - 0: \x{7fffe} - \x{7ffff} - 0: \x{7ffff} - \x{8fffe} - 0: \x{8fffe} - \x{8ffff} - 0: \x{8ffff} - \x{9fffe} - 0: \x{9fffe} - \x{9ffff} - 0: \x{9ffff} - \x{afffe} - 0: \x{afffe} - \x{affff} - 0: \x{affff} - \x{bfffe} - 0: \x{bfffe} - \x{bffff} - 0: \x{bffff} - \x{cfffe} - 0: \x{cfffe} - \x{cffff} - 0: \x{cffff} - \x{dfffe} - 0: \x{dfffe} - \x{dffff} - 0: \x{dffff} - \x{efffe} - 0: \x{efffe} - \x{effff} - 0: \x{effff} - \x{ffffe} - 0: \x{ffffe} - \x{fffff} - 0: \x{fffff} - \x{10fffe} - 0: \x{10fffe} - \x{10ffff} - 0: \x{10ffff} - \x{fdd0} - 0: \x{fdd0} - \x{fdd1} - 0: \x{fdd1} - \x{fdd2} - 0: \x{fdd2} - \x{fdd3} - 0: \x{fdd3} - \x{fdd4} - 0: \x{fdd4} - \x{fdd5} - 0: \x{fdd5} - \x{fdd6} - 0: \x{fdd6} - \x{fdd7} - 0: \x{fdd7} - \x{fdd8} - 0: \x{fdd8} - \x{fdd9} - 0: \x{fdd9} - \x{fdda} - 0: \x{fdda} - \x{fddb} - 0: \x{fddb} - \x{fddc} - 0: \x{fddc} - \x{fddd} - 0: \x{fddd} - \x{fdde} - 0: \x{fdde} - \x{fddf} - 0: \x{fddf} - \x{fde0} - 0: \x{fde0} - \x{fde1} - 0: \x{fde1} - \x{fde2} - 0: \x{fde2} - \x{fde3} - 0: \x{fde3} - \x{fde4} - 0: \x{fde4} - \x{fde5} - 0: \x{fde5} - \x{fde6} - 0: \x{fde6} - \x{fde7} - 0: \x{fde7} - \x{fde8} - 0: \x{fde8} - \x{fde9} - 0: \x{fde9} - \x{fdea} - 0: \x{fdea} - \x{fdeb} - 0: \x{fdeb} - \x{fdec} - 0: \x{fdec} - \x{fded} - 0: \x{fded} - \x{fdee} - 0: \x{fdee} - \x{fdef} - 0: \x{fdef} - -/^\d*\w{4}/8 - 1234 - 0: 1234 - 123 -No match - -/^[^b]*\w{4}/8 - aaaa - 0: aaaa - aaa -No match - -/^[^b]*\w{4}/8i - aaaa - 0: aaaa - aaa -No match - -/^\x{100}*.{4}/8 - \x{100}\x{100}\x{100}\x{100} - 0: \x{100}\x{100}\x{100}\x{100} - \x{100}\x{100}\x{100} -No match - -/^\x{100}*.{4}/8i - \x{100}\x{100}\x{100}\x{100} - 0: \x{100}\x{100}\x{100}\x{100} - \x{100}\x{100}\x{100} -No match - -/^a+[a\x{200}]/8 - aa - 0: aa - -/^.\B.\B./8 - \x{10123}\x{10124}\x{10125} - 0: \x{10123}\x{10124}\x{10125} - -/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/8 - #\x{10000}#\x{100}#\x{10ffff}# - 0: 
#\x{10000}#\x{100}#\x{10ffff}# - -"[\S\V\H]"8 - -/\C(\W?Å¿)'?{{/8 - \\C(\\W?Å¿)'?{{ -No match - -/[^\x{100}-\x{ffff}]*[\x80-\xff]/8 - \x{99}\x{99}\x{99} - 0: \x{99}\x{99}\x{99} - -/-- End of testinput4 --/ diff --git a/src/pcre/testdata/testoutput5 b/src/pcre/testdata/testoutput5 deleted file mode 100644 index 090e1e1c..00000000 --- a/src/pcre/testdata/testoutput5 +++ /dev/null @@ -1,1953 +0,0 @@ -/-- This set of tests checks the API, internals, and non-Perl stuff for UTF - support, excluding Unicode properties. However, tests that give different - results in 8-bit and 16-bit modes are excluded (see tests 16 and 17). --/ - -< forbid W - -/\x{110000}/8DZ -Failed: character value in \x{} or \o{} is too large at offset 9 - -/\o{4200000}/8DZ -Failed: character value in \x{} or \o{} is too large at offset 10 - -/\x{ffffffff}/8 -Failed: character value in \x{} or \o{} is too large at offset 11 - -/\o{37777777777}/8 -Failed: character value in \x{} or \o{} is too large at offset 14 - -/\x{100000000}/8 -Failed: character value in \x{} or \o{} is too large at offset 12 - -/\o{77777777777}/8 -Failed: character value in \x{} or \o{} is too large at offset 14 - -/\x{d800}/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 7 - -/\o{154000}/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 9 - -/\x{dfff}/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 7 - -/\o{157777}/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 9 - -/\x{d7ff}/8 - -/\o{153777}/8 - -/\x{e000}/8 - -/\o{170000}/8 - -/^\x{100}a\x{1234}/8 - \x{100}a\x{1234}bcd - 0: \x{100}a\x{1234} - -/\x{0041}\x{2262}\x{0391}\x{002e}/DZ8 ------------------------------------------------------------------- - Bra - A\x{2262}\x{391}. - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'A' -Need char = '.' 
- \x{0041}\x{2262}\x{0391}\x{002e} - 0: A\x{2262}\x{391}. - -/.{3,5}X/DZ8 ------------------------------------------------------------------- - Bra - Any{3} - Any{0,2} - X - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'X' - \x{212ab}\x{212ab}\x{212ab}\x{861}X - 0: \x{212ab}\x{212ab}\x{212ab}\x{861}X - -/.{3,5}?/DZ8 ------------------------------------------------------------------- - Bra - Any{3} - Any{0,2}? - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - \x{212ab}\x{212ab}\x{212ab}\x{861} - 0: \x{212ab}\x{212ab}\x{212ab} - -/(?<=\C)X/8 -Failed: \C not allowed in lookbehind assertion at offset 6 - -/^[ab]/8DZ ------------------------------------------------------------------- - Bra - ^ - [ab] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored utf -No first char -No need char - bar - 0: b - *** Failers -No match - c -No match - \x{ff} -No match - \x{100} -No match - -/^[^ab]/8DZ ------------------------------------------------------------------- - Bra - ^ - [\x00-`c-\xff] (neg) - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored utf -No first char -No need char - c - 0: c - \x{ff} - 0: \x{ff} - \x{100} - 0: \x{100} - *** Failers - 0: * - aaa -No match - -/\x{100}*(\d+|"(?1)")/8 - 1234 - 0: 1234 - 1: 1234 - "1234" - 0: "1234" - 1: "1234" - \x{100}1234 - 0: \x{100}1234 - 1: 1234 - "\x{100}1234" - 0: \x{100}1234 - 1: 1234 - \x{100}\x{100}12ab - 0: \x{100}\x{100}12 - 1: 12 - \x{100}\x{100}"12" - 0: \x{100}\x{100}"12" - 1: "12" - *** Failers -No match - \x{100}\x{100}abcd -No match - -/\x{100}*/8DZ ------------------------------------------------------------------- - Bra - \x{100}*+ - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -May match empty string -Options: utf -No first char -No need char - -/a\x{100}*/8DZ ------------------------------------------------------------------- - Bra - a - \x{100}*+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -No need char - -/ab\x{100}*/8DZ ------------------------------------------------------------------- - Bra - ab - \x{100}*+ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -First char = 'a' -Need char = 'b' - -/\x{100}*A/8DZ ------------------------------------------------------------------- - Bra - \x{100}*+ - A - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -Need char = 'A' - A - 0: A - -/\x{100}*\d(?R)/8DZ ------------------------------------------------------------------- - Bra - \x{100}*+ - \d - Recurse - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/[Z\x{100}]/8DZ ------------------------------------------------------------------- - Bra - [Z\x{100}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - Z\x{100} - 0: Z - \x{100} - 0: \x{100} - \x{100}Z - 0: \x{100} - *** Failers -No match - -/[\x{200}-\x{100}]/8 -Failed: range out of order in character class at offset 15 - -/[Ä€-Ä„]/8 - \x{100} - 0: \x{100} - \x{104} - 0: \x{104} - *** Failers -No match - \x{105} -No match - \x{ff} -No match - -/[z-\x{100}]/8DZ ------------------------------------------------------------------- - Bra - [z-\xff\x{100}] - Ket - End 
------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/[z\Qa-d]Ä€\E]/8DZ ------------------------------------------------------------------- - Bra - [\-\]adz\x{100}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - \x{100} - 0: \x{100} - Ä€ - 0: \x{100} - -/[\xFF]/DZ ------------------------------------------------------------------- - Bra - \x{ff} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -First char = \xff -No need char - >\xff< - 0: \xff - -/[^\xFF]/DZ ------------------------------------------------------------------- - Bra - [^\x{ff}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[Ä-Ü]/8 - Ö # Matches without Study - 0: \x{d6} - \x{d6} - 0: \x{d6} - -/[Ä-Ü]/8S - Ö <-- Same with Study - 0: \x{d6} - \x{d6} - 0: \x{d6} - -/[\x{c4}-\x{dc}]/8 - Ö # Matches without Study - 0: \x{d6} - \x{d6} - 0: \x{d6} - -/[\x{c4}-\x{dc}]/8S - Ö <-- Same with Study - 0: \x{d6} - \x{d6} - 0: \x{d6} - -/[^\x{100}]abc(xyz(?1))/8DZ ------------------------------------------------------------------- - Bra - [^\x{100}] - abc - CBra 1 - xyz - Recurse - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -Need char = 'z' - -/[ab\x{100}]abc(xyz(?1))/8DZ ------------------------------------------------------------------- - Bra - [ab\x{100}] - abc - CBra 1 - xyz - Recurse - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -Need char = 'z' - -/(\x{100}(b(?2)c))?/DZ8 
------------------------------------------------------------------- - Bra - Brazero - CBra 1 - \x{100} - CBra 2 - b - Recurse - c - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -Options: utf -No first char -No need char - -/(\x{100}(b(?2)c)){0,2}/DZ8 ------------------------------------------------------------------- - Bra - Brazero - Bra - CBra 1 - \x{100} - CBra 2 - b - Recurse - c - Ket - Ket - Brazero - CBra 1 - \x{100} - CBra 2 - b - Recurse - c - Ket - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -Options: utf -No first char -No need char - -/(\x{100}(b(?1)c))?/DZ8 ------------------------------------------------------------------- - Bra - Brazero - CBra 1 - \x{100} - CBra 2 - b - Recurse - c - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -Options: utf -No first char -No need char - -/(\x{100}(b(?1)c)){0,2}/DZ8 ------------------------------------------------------------------- - Bra - Brazero - Bra - CBra 1 - \x{100} - CBra 2 - b - Recurse - c - Ket - Ket - Brazero - CBra 1 - \x{100} - CBra 2 - b - Recurse - c - Ket - Ket - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 2 -May match empty string -Options: utf -No first char -No need char - -/\W/8 - A.B - 0: . 
- A\x{100}B - 0: \x{100} - -/\w/8 - \x{100}X - 0: X - -/^\ሴ/8DZ ------------------------------------------------------------------- - Bra - ^ - \x{1234} - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: anchored utf -No first char -No need char - -/\x{100}*\d/8DZ ------------------------------------------------------------------- - Bra - \x{100}*+ - \d - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}*\s/8DZ ------------------------------------------------------------------- - Bra - \x{100}*+ - \s - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}*\w/8DZ ------------------------------------------------------------------- - Bra - \x{100}*+ - \w - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}*\D/8DZ ------------------------------------------------------------------- - Bra - \x{100}* - \D - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}*\S/8DZ ------------------------------------------------------------------- - Bra - \x{100}* - \S - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/\x{100}*\W/8DZ ------------------------------------------------------------------- - Bra - \x{100}* - \W - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/()()()()()()()()()() - ()()()()()()()()()() - ()()()()()()()()()() - ()()()()()()()()()() - 
A (x) (?41) B/8x - AxxB -Matched, but too many substrings - 0: AxxB - 1: - 2: - 3: - 4: - 5: - 6: - 7: - 8: - 9: -10: -11: -12: -13: -14: - -/^[\x{100}\E-\Q\E\x{150}]/BZ8 ------------------------------------------------------------------- - Bra - ^ - [\x{100}-\x{150}] - Ket - End ------------------------------------------------------------------- - -/^[\QÄ€\E-\QÅ\E]/BZ8 ------------------------------------------------------------------- - Bra - ^ - [\x{100}-\x{150}] - Ket - End ------------------------------------------------------------------- - -/^abc./mgx8 - abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK - 0: abc1 - 0: abc2 - 0: abc3 - 0: abc4 - 0: abc5 - 0: abc6 - 0: abc7 - 0: abc8 - 0: abc9 - -/abc.$/mgx8 - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 - 0: abc1 - 0: abc2 - 0: abc3 - 0: abc4 - 0: abc5 - 0: abc6 - 0: abc7 - 0: abc8 - 0: abc9 - -/^a\Rb/8 - a\nb - 0: a\x{0a}b - a\rb - 0: a\x{0d}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x0bb - 0: a\x{0b}b - a\x0cb - 0: a\x{0c}b - a\x{85}b - 0: a\x{85}b - a\x{2028}b - 0: a\x{2028}b - a\x{2029}b - 0: a\x{2029}b - ** Failers -No match - a\n\rb -No match - -/^a\R*b/8 - ab - 0: ab - a\nb - 0: a\x{0a}b - a\rb - 0: a\x{0d}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x0bb - 0: a\x{0b}b - a\x0c\x{2028}\x{2029}b - 0: a\x{0c}\x{2028}\x{2029}b - a\x{85}b - 0: a\x{85}b - a\n\rb - 0: a\x{0a}\x{0d}b - a\n\r\x{85}\x0cb - 0: a\x{0a}\x{0d}\x{85}\x{0c}b - -/^a\R+b/8 - a\nb - 0: a\x{0a}b - a\rb - 0: a\x{0d}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x0bb - 0: a\x{0b}b - a\x0c\x{2028}\x{2029}b - 0: a\x{0c}\x{2028}\x{2029}b - a\x{85}b - 0: a\x{85}b - a\n\rb - 0: a\x{0a}\x{0d}b - a\n\r\x{85}\x0cb - 0: a\x{0a}\x{0d}\x{85}\x{0c}b - ** Failers -No match - ab -No match - -/^a\R{1,3}b/8 - a\nb - 0: a\x{0a}b - a\n\rb - 0: a\x{0a}\x{0d}b - a\n\r\x{85}b - 0: a\x{0a}\x{0d}\x{85}b - a\r\n\r\nb - 0: a\x{0d}\x{0a}\x{0d}\x{0a}b - a\r\n\r\n\r\nb - 0: 
a\x{0d}\x{0a}\x{0d}\x{0a}\x{0d}\x{0a}b - a\n\r\n\rb - 0: a\x{0a}\x{0d}\x{0a}\x{0d}b - a\n\n\r\nb - 0: a\x{0a}\x{0a}\x{0d}\x{0a}b - ** Failers -No match - a\n\n\n\rb -No match - a\r -No match - -/\H\h\V\v/8 - X X\x0a - 0: X X\x{0a} - X\x09X\x0b - 0: X\x{09}X\x{0b} - ** Failers -No match - \x{a0} X\x0a -No match - -/\H*\h+\V?\v{3,4}/8 - \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d} - \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a - 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}\x{0d} - \x09\x20\x{a0}\x0a\x0b\x0c - 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c} - ** Failers -No match - \x09\x20\x{a0}\x0a\x0b -No match - -/\H\h\V\v/8 - \x{3001}\x{3000}\x{2030}\x{2028} - 0: \x{3001}\x{3000}\x{2030}\x{2028} - X\x{180e}X\x{85} - 0: X\x{180e}X\x{85} - ** Failers -No match - \x{2009} X\x0a -No match - -/\H*\h+\V?\v{3,4}/8 - \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a - 0: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c}\x{0d} - \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a - 0: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c}\x{2028} - \x09\x20\x{202f}\x0a\x0b\x0c - 0: \x{09} \x{202f}\x{0a}\x{0b}\x{0c} - ** Failers -No match - \x09\x{200a}\x{a0}\x{2028}\x0b -No match - -/[\h]/8BZ ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] - Ket - End ------------------------------------------------------------------- - >\x{1680} - 0: \x{1680} - -/[\h]{3,}/8BZ ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]{3,}+ - Ket - End ------------------------------------------------------------------- - >\x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000}< - 0: \x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000} - -/[\v]/8BZ ------------------------------------------------------------------- - Bra - [\x0a-\x0d\x85\x{2028}-\x{2029}] - Ket - End 
------------------------------------------------------------------- - -/[\H]/8BZ ------------------------------------------------------------------- - Bra - [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}] - Ket - End ------------------------------------------------------------------- - -/[\V]/8BZ ------------------------------------------------------------------- - Bra - [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{10ffff}] - Ket - End ------------------------------------------------------------------- - -/.*$/8 - \x{1ec5} - 0: \x{1ec5} - -/a\Rb/I8 -Capturing subpattern count = 0 -Options: bsr_anycrlf utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - ** Failers -No match - a\x{85}b -No match - a\x0bb -No match - -/a\Rb/I8 -Capturing subpattern count = 0 -Options: bsr_unicode utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x{85}b - 0: a\x{85}b - a\x0bb - 0: a\x{0b}b - ** Failers -No match - a\x{85}b\ -No match - a\x0bb\ -No match - -/a\R?b/I8 -Capturing subpattern count = 0 -Options: bsr_anycrlf utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - ** Failers -No match - a\x{85}b -No match - a\x0bb -No match - -/a\R?b/I8 -Capturing subpattern count = 0 -Options: bsr_unicode utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x{85}b - 0: a\x{85}b - a\x0bb - 0: a\x{0b}b - ** Failers -No match - a\x{85}b\ -No match - a\x0bb\ -No match - -/.*a.*=.b.*/8 - QQQ\x{2029}ABCaXYZ=!bPQR - 0: ABCaXYZ=!bPQR - ** Failers -No match - a\x{2029}b -No match - \x61\xe2\x80\xa9\x62 -No match - -/[[:a\x{100}b:]]/8 -Failed: unknown POSIX class name at offset 3 - -/a[^]b/8 - a\x{1234}b - 0: a\x{1234}b - a\nb - 0: a\x{0a}b - ** 
Failers -No match - ab -No match - -/a[^]+b/8 - aXb - 0: aXb - a\nX\nX\x{1234}b - 0: a\x{0a}X\x{0a}X\x{1234}b - ** Failers -No match - ab -No match - -/(\x{de})\1/ - \x{de}\x{de} - 0: \xde\xde - 1: \xde - -/X/8f - A\x{1ec5}ABCXYZ - 0: X - -/Xa{2,4}b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/Xa{2,4}?b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/Xa{2,4}+b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\x{123}{2,4}b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X\x{123}{2,4}?b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X\x{123}{2,4}+b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X\x{123}{2,4}b/8 - Xx\P -No match - X\x{123}x\P -No match - X\x{123}\x{123}x\P -No match - X\x{123}\x{123}\x{123}x\P -No match - X\x{123}\x{123}\x{123}\x{123}x\P -No match - -/X\x{123}{2,4}?b/8 - Xx\P -No match - X\x{123}x\P -No match - X\x{123}\x{123}x\P -No match - X\x{123}\x{123}\x{123}x\P -No match - X\x{123}\x{123}\x{123}\x{123}x\P -No match - -/X\x{123}{2,4}+b/8 - Xx\P -No match - X\x{123}x\P -No match - X\x{123}\x{123}x\P -No match - 
X\x{123}\x{123}\x{123}x\P -No match - X\x{123}\x{123}\x{123}\x{123}x\P -No match - -/X\d{2,4}b/8 - X\P -Partial match: X - X3\P -Partial match: X3 - X33\P -Partial match: X33 - X333\P -Partial match: X333 - X3333\P -Partial match: X3333 - -/X\d{2,4}?b/8 - X\P -Partial match: X - X3\P -Partial match: X3 - X33\P -Partial match: X33 - X333\P -Partial match: X333 - X3333\P -Partial match: X3333 - -/X\d{2,4}+b/8 - X\P -Partial match: X - X3\P -Partial match: X3 - X33\P -Partial match: X33 - X333\P -Partial match: X333 - X3333\P -Partial match: X3333 - -/X\D{2,4}b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\D{2,4}?b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\D{2,4}+b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X\D{2,4}b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X\D{2,4}?b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X\D{2,4}+b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X[abc]{2,4}b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial 
match: Xaaaa - -/X[abc]{2,4}?b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X[abc]{2,4}+b/8 - X\P -Partial match: X - Xa\P -Partial match: Xa - Xaa\P -Partial match: Xaa - Xaaa\P -Partial match: Xaaa - Xaaaa\P -Partial match: Xaaaa - -/X[abc\x{123}]{2,4}b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X[abc\x{123}]{2,4}?b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X[abc\x{123}]{2,4}+b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X[^a]{2,4}b/8 - X\P -Partial match: X - Xz\P -Partial match: Xz - Xzz\P -Partial match: Xzz - Xzzz\P -Partial match: Xzzz - Xzzzz\P -Partial match: Xzzzz - -/X[^a]{2,4}?b/8 - X\P -Partial match: X - Xz\P -Partial match: Xz - Xzz\P -Partial match: Xzz - Xzzz\P -Partial match: Xzzz - Xzzzz\P -Partial match: Xzzzz - -/X[^a]{2,4}+b/8 - X\P -Partial match: X - Xz\P -Partial match: Xz - Xzz\P -Partial match: Xzz - Xzzz\P -Partial match: Xzzz - Xzzzz\P -Partial match: Xzzzz - -/X[^a]{2,4}b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X[^a]{2,4}?b/8 - X\P -Partial match: X - 
X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/X[^a]{2,4}+b/8 - X\P -Partial match: X - X\x{123}\P -Partial match: X\x{123} - X\x{123}\x{123}\P -Partial match: X\x{123}\x{123} - X\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123} - X\x{123}\x{123}\x{123}\x{123}\P -Partial match: X\x{123}\x{123}\x{123}\x{123} - -/(Y)X\1{2,4}b/8 - YX\P -Partial match: YX - YXY\P -Partial match: YXY - YXYY\P -Partial match: YXYY - YXYYY\P -Partial match: YXYYY - YXYYYY\P -Partial match: YXYYYY - -/(Y)X\1{2,4}?b/8 - YX\P -Partial match: YX - YXY\P -Partial match: YXY - YXYY\P -Partial match: YXYY - YXYYY\P -Partial match: YXYYY - YXYYYY\P -Partial match: YXYYYY - -/(Y)X\1{2,4}+b/8 - YX\P -Partial match: YX - YXY\P -Partial match: YXY - YXYY\P -Partial match: YXYY - YXYYY\P -Partial match: YXYYY - YXYYYY\P -Partial match: YXYYYY - -/(\x{123})X\1{2,4}b/8 - \x{123}X\P -Partial match: \x{123}X - \x{123}X\x{123}\P -Partial match: \x{123}X\x{123} - \x{123}X\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123} - \x{123}X\x{123}\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123}\x{123} - \x{123}X\x{123}\x{123}\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123} - -/(\x{123})X\1{2,4}?b/8 - \x{123}X\P -Partial match: \x{123}X - \x{123}X\x{123}\P -Partial match: \x{123}X\x{123} - \x{123}X\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123} - \x{123}X\x{123}\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123}\x{123} - \x{123}X\x{123}\x{123}\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123} - -/(\x{123})X\1{2,4}+b/8 - \x{123}X\P -Partial match: \x{123}X - \x{123}X\x{123}\P -Partial match: \x{123}X\x{123} - \x{123}X\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123} - \x{123}X\x{123}\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123}\x{123} - 
\x{123}X\x{123}\x{123}\x{123}\x{123}\P -Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123} - -/\bthe cat\b/8 - the cat\P - 0: the cat - the cat\P\P -Partial match: the cat - -/abcd*/8 - xxxxabcd\P - 0: abcd - xxxxabcd\P\P -Partial match: abcd - -/abcd*/i8 - xxxxabcd\P - 0: abcd - xxxxabcd\P\P -Partial match: abcd - XXXXABCD\P - 0: ABCD - XXXXABCD\P\P -Partial match: ABCD - -/abc\d*/8 - xxxxabc1\P - 0: abc1 - xxxxabc1\P\P -Partial match: abc1 - -/(a)bc\1*/8 - xxxxabca\P - 0: abca - 1: a - xxxxabca\P\P -Partial match: abca - -/abc[de]*/8 - xxxxabcde\P - 0: abcde - xxxxabcde\P\P -Partial match: abcde - -/X\W{3}X/8 - \PX -Partial match: X - -/\sxxx\s/8T1 - AB\x{85}xxx\x{a0}XYZ - 0: \x{85}xxx\x{a0} - AB\x{a0}xxx\x{85}XYZ - 0: \x{a0}xxx\x{85} - -/\S \S/8T1 - \x{a2} \x{84} - 0: \x{a2} \x{84} - -'A#хц'8xBZ ------------------------------------------------------------------- - Bra - A - Ket - End ------------------------------------------------------------------- - -'A#хц - PQ'8xBZ ------------------------------------------------------------------- - Bra - APQ - Ket - End ------------------------------------------------------------------- - -/a+#Ñ…aa - z#XX?/8xBZ ------------------------------------------------------------------- - Bra - a++ - z - Ket - End ------------------------------------------------------------------- - -/a+#Ñ…aa - z#Ñ…?/8xBZ ------------------------------------------------------------------- - Bra - a++ - z - Ket - End ------------------------------------------------------------------- - -/\g{A}xxx#bXX(?'A'123) (?'A'456)/8xBZ ------------------------------------------------------------------- - Bra - \1 - xxx - CBra 1 - 456 - Ket - Ket - End ------------------------------------------------------------------- - -/\g{A}xxx#bÑ…(?'A'123) (?'A'456)/8xBZ ------------------------------------------------------------------- - Bra - \1 - xxx - CBra 1 - 456 - Ket - Ket - End ------------------------------------------------------------------- - -/^\cÄ£/8 
-Failed: \c must be followed by an ASCII character at offset 3 - -/(\R*)(.)/s8 - \r\n - 0: \x{0d} - 1: - 2: \x{0d} - \r\r\n\n\r - 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} - 1: \x{0d}\x{0d}\x{0a}\x{0a} - 2: \x{0d} - \r\r\n\n\r\n - 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} - 1: \x{0d}\x{0d}\x{0a}\x{0a} - 2: \x{0d} - -/(\R)*(.)/s8 - \r\n - 0: \x{0d} - 1: - 2: \x{0d} - \r\r\n\n\r - 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} - 1: \x{0a} - 2: \x{0d} - \r\r\n\n\r\n - 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} - 1: \x{0a} - 2: \x{0d} - -/[^\x{1234}]+/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -No starting char list - -/[^\x{1234}]+?/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -No starting char list - -/[^\x{1234}]++/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 1 -No starting char list - -/[^\x{1234}]{2}/iS8I -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char -Subject length lower bound = 2 -No starting char list - -// -Failed: inconsistent NEWLINE options at offset 0 - -/f.*/ - \P\Pfor -Partial match: for - -/f.*/s - \P\Pfor -Partial match: for - -/f.*/8 - \P\Pfor -Partial match: for - -/f.*/8s - \P\Pfor -Partial match: for - -/\x{d7ff}\x{e000}/8 - -/\x{d800}/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 7 - -/\x{dfff}/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 7 - -/\h+/8 - \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - 0: \x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} - 0: \x{200a}\x{a0}\x{2000} - -/[\h\x{e000}]+/8BZ ------------------------------------------------------------------- - Bra - [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}\x{e000}]++ - Ket - End ------------------------------------------------------------------- - 
\x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} - 0: \x{1680}\x{2000}\x{202f}\x{3000} - \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} - 0: \x{200a}\x{a0}\x{2000} - -/\H+/8 - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - 0: \x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - 0: \x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - 0: \x{202e}\x{2030}\x{205e}\x{2060} - \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} - 0: \x{9f}\x{a1}\x{2fff}\x{3001} - -/[\H\x{d7ff}]+/8BZ ------------------------------------------------------------------- - Bra - [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}\x{d7ff}]++ - Ket - End ------------------------------------------------------------------- - \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} - 0: \x{167f}\x{1681}\x{180d}\x{180f} - \x{2000}\x{200a}\x{1fff}\x{200b} - 0: \x{1fff}\x{200b} - \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} - 0: \x{202e}\x{2030}\x{205e}\x{2060} - \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} - 0: \x{9f}\x{a1}\x{2fff}\x{3001} - -/\v+/8 - \x{2027}\x{2030}\x{2028}\x{2029} - 0: \x{2028}\x{2029} - \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d - 0: \x{85}\x{0a}\x{0b}\x{0c}\x{0d} - -/[\v\x{e000}]+/8BZ ------------------------------------------------------------------- - Bra - [\x0a-\x0d\x85\x{2028}-\x{2029}\x{e000}]++ - Ket - End ------------------------------------------------------------------- - \x{2027}\x{2030}\x{2028}\x{2029} - 0: \x{2028}\x{2029} - \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d - 0: \x{85}\x{0a}\x{0b}\x{0c}\x{0d} - -/\V+/8 - \x{2028}\x{2029}\x{2027}\x{2030} - 0: \x{2027}\x{2030} - \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} - 0: \x{09}\x{0e}\x{84}\x{86} - -/[\V\x{d7ff}]+/8BZ ------------------------------------------------------------------- - Bra - [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{10ffff}\x{d7ff}]++ - Ket - End 
------------------------------------------------------------------- - \x{2028}\x{2029}\x{2027}\x{2030} - 0: \x{2027}\x{2030} - \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} - 0: \x{09}\x{0e}\x{84}\x{86} - -/\R+/8 - \x{2027}\x{2030}\x{2028}\x{2029} - 0: \x{2028}\x{2029} - \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d - 0: \x{85}\x{0a}\x{0b}\x{0c}\x{0d} - -/(..)\1/8 - ab\P -Partial match: ab - aba\P -Partial match: aba - abab\P - 0: abab - 1: ab - -/(..)\1/8i - ab\P -Partial match: ab - abA\P -Partial match: abA - aBAb\P - 0: aBAb - 1: aB - -/(..)\1{2,}/8 - ab\P -Partial match: ab - aba\P -Partial match: aba - abab\P -Partial match: abab - ababa\P -Partial match: ababa - ababab\P - 0: ababab - 1: ab - ababab\P\P -Partial match: ababab - abababa\P - 0: ababab - 1: ab - abababa\P\P -Partial match: abababa - -/(..)\1{2,}/8i - ab\P -Partial match: ab - aBa\P -Partial match: aBa - aBAb\P -Partial match: aBAb - AbaBA\P -Partial match: AbaBA - abABAb\P - 0: abABAb - 1: ab - aBAbaB\P\P -Partial match: aBAbaB - abABabA\P - 0: abABab - 1: ab - abaBABa\P\P -Partial match: abaBABa - -/(..)\1{2,}?x/8i - ab\P -Partial match: ab - abA\P -Partial match: abA - aBAb\P -Partial match: aBAb - abaBA\P -Partial match: abaBA - abAbaB\P -Partial match: abAbaB - abaBabA\P -Partial match: abaBabA - abAbABaBx\P - 0: abAbABaBx - 1: ab - -/./8 - \r\P - 0: \x{0d} - \r\P\P -Partial match: \x{0d} - -/.{2,3}/8 - \r\P -Partial match: \x{0d} - \r\P\P -Partial match: \x{0d} - \r\r\P - 0: \x{0d}\x{0d} - \r\r\P\P -Partial match: \x{0d}\x{0d} - \r\r\r\P - 0: \x{0d}\x{0d}\x{0d} - \r\r\r\P\P -Partial match: \x{0d}\x{0d}\x{0d} - -/.{2,3}?/8 - \r\P -Partial match: \x{0d} - \r\P\P -Partial match: \x{0d} - \r\r\P - 0: \x{0d}\x{0d} - \r\r\P\P -Partial match: \x{0d}\x{0d} - \r\r\r\P - 0: \x{0d}\x{0d} - \r\r\r\P\P - 0: \x{0d}\x{0d} - -/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/8BZ ------------------------------------------------------------------- - Bra - [^\x{100}] - [^\x{1234}] - [^\x{ffff}] - 
[^\x{10000}] - [^\x{10ffff}] - Ket - End ------------------------------------------------------------------- - -/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/8BZi ------------------------------------------------------------------- - Bra - /i [^\x{100}] - /i [^\x{1234}] - /i [^\x{ffff}] - /i [^\x{10000}] - /i [^\x{10ffff}] - Ket - End ------------------------------------------------------------------- - -/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/8BZ ------------------------------------------------------------------- - Bra - [^\x{100}]* - [^\x{10000}]+ - [^\x{10ffff}]?? - [^\x{8000}]{4} - [^\x{8000}]* - [^\x{7fff}]{2} - [^\x{7fff}]{0,7}? - [^\x{fffff}]{5} - [^\x{fffff}]?+ - Ket - End ------------------------------------------------------------------- - -/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/8BZi ------------------------------------------------------------------- - Bra - /i [^\x{100}]* - /i [^\x{10000}]+ - /i [^\x{10ffff}]?? - /i [^\x{8000}]{4} - /i [^\x{8000}]* - /i [^\x{7fff}]{2} - /i [^\x{7fff}]{0,7}? 
- /i [^\x{fffff}]{5} - /i [^\x{fffff}]?+ - Ket - End ------------------------------------------------------------------- - -/(?<=\x{1234}\x{1234})\bxy/I8 -Capturing subpattern count = 0 -Max lookbehind = 2 -Options: utf -First char = 'x' -Need char = 'y' - -/(?8BZ ------------------------------------------------------------------- - Bra - \x{100} - Ket - End ------------------------------------------------------------------- - -/[\u0100-\u0200]/8BZ ------------------------------------------------------------------- - Bra - [\x{100}-\x{200}] - Ket - End ------------------------------------------------------------------- - -/\ud800/8 -Failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 5 - -/^a+[a\x{200}]/8BZ ------------------------------------------------------------------- - Bra - ^ - a+ - [a\x{200}] - Ket - End ------------------------------------------------------------------- - aa - 0: aa - -/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/8BZ ------------------------------------------------------------------- - Bra - [b-d\x{200}-\x{250}]*+ - [ae-h]?+ - # - [\x{200}-\x{250}]{0,8}+ - [\x00-\xff]* - # - [\x{200}-\x{250}]++ - [a-z] - Ket - End ------------------------------------------------------------------- - -/[^\xff]*PRUNE:\x{100}abc(xyz(?1))/8DZ ------------------------------------------------------------------- - Bra - [^\x{ff}]* - PRUNE:\x{100}abc - CBra 1 - xyz - Recurse - Ket - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 1 -Options: utf -No first char -Need char = 'z' - -/(?<=\K\x{17f})/8g+ - \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f}\x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f}\x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f} - 0: \x{17f} - 0+ - -/(?<=\K\x{17f})/8G+ - \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ 
\x{17f}\x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f}\x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f}\x{17f} - 0: \x{17f} - 0+ \x{17f} - 0: \x{17f} - 0+ - -/\C[^\v]+\x80/8 - [Aá¿»BÅ€C] -No match - -/\C[^\d]+\x80/8 - [Aá¿»BÅ€C] -No match - -/-- End of testinput5 --/ diff --git a/src/pcre/testdata/testoutput6 b/src/pcre/testdata/testoutput6 deleted file mode 100644 index 422d3833..00000000 --- a/src/pcre/testdata/testoutput6 +++ /dev/null @@ -1,2584 +0,0 @@ -/-- This set of tests is for Unicode property support. It is compatible with - Perl >= 5.15. --/ - -< forbid 9?=ABCDEFfGILMNPTUXZ< - -/^\pC\pL\pM\pN\pP\pS\pZ\s+/8W - >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} - 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b} - -/^>\pZ+/8W - >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} - 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f} - -/^>[[:space:]]*/8W - >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} - 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b} - -/^>[[:blank:]]*/8W - >\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028} - 0: > \x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{09} - -/^[[:alpha:]]*/8W - Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d} - 0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d} - -/^[[:alnum:]]*/8W - Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee} - 0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee} - -/^[[:cntrl:]]*/8W - \x{0}\x{09}\x{1f}\x{7f}\x{9f} - 0: \x{00}\x{09}\x{1f}\x{7f} - -/^[[:graph:]]*/8W - A\x{a1}\x{a0} - 0: A\x{a1} - -/^[[:print:]]*/8W - A z\x{a0}\x{a1} - 0: A z\x{a0}\x{a1} - -/^[[:punct:]]*/8W - .+\x{a1}\x{a0} - 0: .+\x{a1} - -/\p{Zs}*?\R/ - ** Failers -No match - a\xFCb -No match - -/\p{Zs}*\R/ - ** Failers -No match - a\xFCb -No match - -/â±¥/8i - â±¥ - 0: \x{2c65} - Ⱥx - 0: \x{23a} - Ⱥ - 0: \x{23a} - -/[â±¥]/8i - â±¥ - 0: \x{2c65} - Ⱥx - 0: \x{23a} - Ⱥ - 0: \x{23a} - -/Ⱥ/8i - Ⱥ - 0: \x{23a} - â±¥ - 0: \x{2c65} - -/-- 
These are tests for extended grapheme clusters --/ - -/^\X/8+ - G\x{34e}\x{34e}X - 0: G\x{34e}\x{34e} - 0+ X - \x{34e}\x{34e}X - 0: \x{34e}\x{34e} - 0+ X - \x04X - 0: \x{04} - 0+ X - \x{1100}X - 0: \x{1100} - 0+ X - \x{1100}\x{34e}X - 0: \x{1100}\x{34e} - 0+ X - \x{1b04}\x{1b04}X - 0: \x{1b04}\x{1b04} - 0+ X - *These match up to the roman letters - 0: * - 0+ These match up to the roman letters - \x{1111}\x{1111}L,L - 0: \x{1111}\x{1111} - 0+ L,L - \x{1111}\x{1111}\x{1169}L,L,V - 0: \x{1111}\x{1111}\x{1169} - 0+ L,L,V - \x{1111}\x{ae4c}L, LV - 0: \x{1111}\x{ae4c} - 0+ L, LV - \x{1111}\x{ad89}L, LVT - 0: \x{1111}\x{ad89} - 0+ L, LVT - \x{1111}\x{ae4c}\x{1169}L, LV, V - 0: \x{1111}\x{ae4c}\x{1169} - 0+ L, LV, V - \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V - 0: \x{1111}\x{ae4c}\x{1169}\x{1169} - 0+ L, LV, V, V - \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T - 0: \x{1111}\x{ae4c}\x{1169}\x{11fe} - 0+ L, LV, V, T - \x{1111}\x{ad89}\x{11fe}L, LVT, T - 0: \x{1111}\x{ad89}\x{11fe} - 0+ L, LVT, T - \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T - 0: \x{1111}\x{ad89}\x{11fe}\x{11fe} - 0+ L, LVT, T, T - \x{ad89}\x{11fe}\x{11fe}LVT, T, T - 0: \x{ad89}\x{11fe}\x{11fe} - 0+ LVT, T, T - *These match just the first codepoint (invalid sequence) - 0: * - 0+ These match just the first codepoint (invalid sequence) - \x{1111}\x{11fe}L, T - 0: \x{1111} - 0+ \x{11fe}L, T - \x{ae4c}\x{1111}LV, L - 0: \x{ae4c} - 0+ \x{1111}LV, L - \x{ae4c}\x{ae4c}LV, LV - 0: \x{ae4c} - 0+ \x{ae4c}LV, LV - \x{ae4c}\x{ad89}LV, LVT - 0: \x{ae4c} - 0+ \x{ad89}LV, LVT - \x{1169}\x{1111}V, L - 0: \x{1169} - 0+ \x{1111}V, L - \x{1169}\x{ae4c}V, LV - 0: \x{1169} - 0+ \x{ae4c}V, LV - \x{1169}\x{ad89}V, LVT - 0: \x{1169} - 0+ \x{ad89}V, LVT - \x{ad89}\x{1111}LVT, L - 0: \x{ad89} - 0+ \x{1111}LVT, L - \x{ad89}\x{1169}LVT, V - 0: \x{ad89} - 0+ \x{1169}LVT, V - \x{ad89}\x{ae4c}LVT, LV - 0: \x{ad89} - 0+ \x{ae4c}LVT, LV - \x{ad89}\x{ad89}LVT, LVT - 0: \x{ad89} - 0+ \x{ad89}LVT, LVT - \x{11fe}\x{1111}T, L - 0: \x{11fe} - 0+ 
\x{1111}T, L - \x{11fe}\x{1169}T, V - 0: \x{11fe} - 0+ \x{1169}T, V - \x{11fe}\x{ae4c}T, LV - 0: \x{11fe} - 0+ \x{ae4c}T, LV - \x{11fe}\x{ad89}T, LVT - 0: \x{11fe} - 0+ \x{ad89}T, LVT - *Test extend and spacing mark - 0: * - 0+ Test extend and spacing mark - \x{1111}\x{ae4c}\x{0711}L, LV, extend - 0: \x{1111}\x{ae4c}\x{711} - 0+ L, LV, extend - \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark - 0: \x{1111}\x{ae4c}\x{1b04} - 0+ L, LV, spacing mark - \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark - 0: \x{1111}\x{ae4c}\x{1b04}\x{711}\x{1b04} - 0+ L, LV, spacing mark, extend, spacing mark - *Test CR, LF, and control - 0: * - 0+ Test CR, LF, and control - \x0d\x{0711}CR, extend - 0: \x{0d} - 0+ \x{711}CR, extend - \x0d\x{1b04}CR, spacingmark - 0: \x{0d} - 0+ \x{1b04}CR, spacingmark - \x0a\x{0711}LF, extend - 0: \x{0a} - 0+ \x{711}LF, extend - \x0a\x{1b04}LF, spacingmark - 0: \x{0a} - 0+ \x{1b04}LF, spacingmark - \x0b\x{0711}Control, extend - 0: \x{0b} - 0+ \x{711}Control, extend - \x09\x{1b04}Control, spacingmark - 0: \x{09} - 0+ \x{1b04}Control, spacingmark - *There are no Prepend characters, so we can't test Prepend, CR - 0: * - 0+ There are no Prepend characters, so we can't test Prepend, CR - -/^(?>\X{2})X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - -/^\X{2,4}X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - -/^\X{2,4}?X/8+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0: 
\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X - 0+ - -/\X*Z/8Y - A\x{300} -No match - -/\X*(.)/8Y - A\x{1111}\x{ae4c}\x{1169} - 0: A\x{1111} - 1: \x{1111} - -/\X?abc/8Y -\xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\>06\? - 0: A\x{300}abc - -/-- --/ - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/[z\x{1e9e}]+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/[z\x{00df}]+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - -/[z\x{1f88}]+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - -/-- Characters with more than one other case; test in classes --/ - -/[z\x{00b5}]+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/[z\x{039c}]+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/[z\x{03bc}]+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/[z\x{00c5}]+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/[z\x{00e5}]+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/[z\x{212b}]+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/[z\x{01c4}]+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/[z\x{01c5}]+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/[z\x{01c6}]+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/[z\x{01c7}]+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/[z\x{01c8}]+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/[z\x{01c9}]+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/[z\x{01ca}]+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/[z\x{01cb}]+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/[z\x{01cc}]+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/[z\x{01f1}]+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/[z\x{01f2}]+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/[z\x{01f3}]+/8i - 
\x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/[z\x{0345}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/[z\x{0399}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/[z\x{03b9}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/[z\x{1fbe}]+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/[z\x{0392}]+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/[z\x{03b2}]+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/[z\x{03d0}]+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/[z\x{0395}]+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - -/[z\x{03b5}]+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - -/[z\x{03f5}]+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - -/[z\x{0398}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/[z\x{03b8}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/[z\x{03d1}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/[z\x{03f4}]+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/[z\x{039a}]+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/[z\x{03ba}]+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/[z\x{03f0}]+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/[z\x{03a0}]+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/[z\x{03c0}]+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/[z\x{03d6}]+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/[z\x{03a1}]+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/[z\x{03c1}]+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/[z\x{03f1}]+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/[z\x{03a3}]+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - -/[z\x{03c2}]+/8i - \x{03A3}\x{03C2}\x{03C3} 
- 0: \x{3a3}\x{3c2}\x{3c3} - -/[z\x{03c3}]+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - -/[z\x{03a6}]+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/[z\x{03c6}]+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/[z\x{03d5}]+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/[z\x{03c9}]+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/[z\x{03a9}]+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/[z\x{2126}]+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/[z\x{1e60}]+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/[z\x{1e61}]+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/[z\x{1e9b}]+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/-- Perl 5.12.4 gets these wrong, but 5.15.3 is OK --/ - -/[z\x{004b}]+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/[z\x{006b}]+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/[z\x{212a}]+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/[z\x{0053}]+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/[z\x{0073}]+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/[z\x{017f}]+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/-- --/ - -/(ΣΆΜΟΣ) \1/8i - ΣΆΜΟΣ ΣΆΜΟΣ - 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - ΣΆΜΟΣ σάμος - 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - σάμος σάμος - 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - σάμος σάμοσ - 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c3} - 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - σάμος ΣΆΜΟΣ - 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - -/(σάμος) \1/8i - ΣΆΜΟΣ ΣΆΜΟΣ - 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} 
\x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - ΣΆΜΟΣ σάμος - 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - σάμος σάμος - 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - σάμος σάμοσ - 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c3} - 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - σάμος ΣΆΜΟΣ - 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - -/(ΣΆΜΟΣ) \1*/8i - ΣΆΜΟΣ\x20 - 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - ΣΆΜΟΣ ΣΆΜΟΣσάμοςσάμος - 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3}\x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2}\x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} - 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} - -/-- Perl matches these --/ - -/\x{00b5}+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/\x{039c}+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - -/\x{03bc}+/8i - \x{00b5}\x{039c}\x{03bc} - 0: \x{b5}\x{39c}\x{3bc} - - -/\x{00c5}+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/\x{00e5}+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - -/\x{212b}+/8i - \x{00c5}\x{00e5}\x{212b} - 0: \x{c5}\x{e5}\x{212b} - - -/\x{01c4}+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/\x{01c5}+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - -/\x{01c6}+/8i - \x{01c4}\x{01c5}\x{01c6} - 0: \x{1c4}\x{1c5}\x{1c6} - - -/\x{01c7}+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/\x{01c8}+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - -/\x{01c9}+/8i - \x{01c7}\x{01c8}\x{01c9} - 0: \x{1c7}\x{1c8}\x{1c9} - - -/\x{01ca}+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/\x{01cb}+/8i - \x{01ca}\x{01cb}\x{01cc} - 0: \x{1ca}\x{1cb}\x{1cc} - -/\x{01cc}+/8i - \x{01ca}\x{01cb}\x{01cc} 
- 0: \x{1ca}\x{1cb}\x{1cc} - - -/\x{01f1}+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/\x{01f2}+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - -/\x{01f3}+/8i - \x{01f1}\x{01f2}\x{01f3} - 0: \x{1f1}\x{1f2}\x{1f3} - - -/\x{0345}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/\x{0399}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/\x{03b9}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - -/\x{1fbe}+/8i - \x{0345}\x{0399}\x{03b9}\x{1fbe} - 0: \x{345}\x{399}\x{3b9}\x{1fbe} - - -/\x{0392}+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/\x{03b2}+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - -/\x{03d0}+/8i - \x{0392}\x{03b2}\x{03d0} - 0: \x{392}\x{3b2}\x{3d0} - - -/\x{0395}+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - -/\x{03b5}+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - -/\x{03f5}+/8i - \x{0395}\x{03b5}\x{03f5} - 0: \x{395}\x{3b5}\x{3f5} - - -/\x{0398}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/\x{03b8}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/\x{03d1}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - -/\x{03f4}+/8i - \x{0398}\x{03b8}\x{03d1}\x{03f4} - 0: \x{398}\x{3b8}\x{3d1}\x{3f4} - - -/\x{039a}+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/\x{03ba}+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - -/\x{03f0}+/8i - \x{039a}\x{03ba}\x{03f0} - 0: \x{39a}\x{3ba}\x{3f0} - - -/\x{03a0}+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/\x{03c0}+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - -/\x{03d6}+/8i - \x{03a0}\x{03c0}\x{03d6} - 0: \x{3a0}\x{3c0}\x{3d6} - - -/\x{03a1}+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/\x{03c1}+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: \x{3a1}\x{3c1}\x{3f1} - -/\x{03f1}+/8i - \x{03a1}\x{03c1}\x{03f1} - 0: 
\x{3a1}\x{3c1}\x{3f1} - - -/\x{03a3}+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - -/\x{03c2}+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - -/\x{03c3}+/8i - \x{03A3}\x{03C2}\x{03C3} - 0: \x{3a3}\x{3c2}\x{3c3} - - -/\x{03a6}+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/\x{03c6}+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - -/\x{03d5}+/8i - \x{03a6}\x{03c6}\x{03d5} - 0: \x{3a6}\x{3c6}\x{3d5} - - -/\x{03c9}+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/\x{03a9}+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - -/\x{2126}+/8i - \x{03c9}\x{03a9}\x{2126} - 0: \x{3c9}\x{3a9}\x{2126} - - -/\x{1e60}+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e61}+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - -/\x{1e9b}+/8i - \x{1e60}\x{1e61}\x{1e9b} - 0: \x{1e60}\x{1e61}\x{1e9b} - - -/\x{1e9e}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - -/\x{00df}+/8i - \x{1e9e}\x{00df} - 0: \x{1e9e}\x{df} - - -/\x{1f88}+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - -/\x{1f80}+/8i - \x{1f88}\x{1f80} - 0: \x{1f88}\x{1f80} - - -/-- Perl 5.12.4 gets these wrong, but 5.15.3 is OK --/ - -/\x{004b}+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/\x{006b}+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - -/\x{212a}+/8i - \x{004b}\x{006b}\x{212a} - 0: Kk\x{212a} - - -/\x{0053}+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/\x{0073}+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/\x{017f}+/8i - \x{0053}\x{0073}\x{017f} - 0: Ss\x{17f} - -/^\p{Any}*\d{4}/8 - 1234 - 0: 1234 - 123 -No match - -/^\X*\w{4}/8 - 1234 - 0: 1234 - 123 -No match - -/^A\s+Z/8W - A\x{2005}Z - 0: A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - 0: A\x{85}\x{180e}\x{2005}Z - -/^A[\s]+Z/8W - A\x{2005}Z - 0: A\x{2005}Z - A\x{85}\x{180e}\x{2005}Z - 0: A\x{85}\x{180e}\x{2005}Z - -/^[[:graph:]]+$/8W - Letter:ABC - 0: Letter:ABC - Mark:\x{300}\x{1d172}\x{1d17b} - 0: Mark:\x{300}\x{1d172}\x{1d17b} - Number:9\x{660} - 0: 
Number:9\x{660} - Punctuation:\x{66a},; - 0: Punctuation:\x{66a},; - Symbol:\x{6de}<>\x{fffc} - 0: Symbol:\x{6de}<>\x{fffc} - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - 0: Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - 0: \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - 0: \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - 0: \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - 0: \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - \x{feff} - 0: \x{feff} - \x{fff9}\x{fffa}\x{fffb} - 0: \x{fff9}\x{fffa}\x{fffb} - \x{110bd} - 0: \x{110bd} - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - 0: \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - \x{e0001} - 0: \x{e0001} - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - 0: \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - ** Failers -No match - \x{09} -No match - \x{0a} -No match - \x{1D} -No match - \x{20} -No match - \x{85} -No match - \x{a0} -No match - \x{61c} -No match - \x{1680} -No match - \x{180e} -No match - \x{2028} -No match - \x{2029} -No match - \x{202f} -No match - \x{2065} -No match - \x{2066} -No match - \x{2067} -No match - \x{2068} -No match - \x{2069} -No match - \x{3000} -No match - \x{e0002} -No match - \x{e001f} -No match - \x{e0080} -No match - -/^[[:print:]]+$/8W - Space: \x{a0} - 0: Space: \x{a0} - \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} - 0: \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} - \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} - 0: \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} - \x{202f}\x{205f} - 0: \x{202f}\x{205f} - \x{3000} - 0: \x{3000} - Letter:ABC - 0: Letter:ABC - Mark:\x{300}\x{1d172}\x{1d17b} - 0: Mark:\x{300}\x{1d172}\x{1d17b} - Number:9\x{660} - 0: Number:9\x{660} - 
Punctuation:\x{66a},; - 0: Punctuation:\x{66a},; - Symbol:\x{6de}<>\x{fffc} - 0: Symbol:\x{6de}<>\x{fffc} - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - 0: Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} - \x{180e} - 0: \x{180e} - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - 0: \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - 0: \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} - \x{202f} - 0: \x{202f} - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - 0: \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - 0: \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} - \x{feff} - 0: \x{feff} - \x{fff9}\x{fffa}\x{fffb} - 0: \x{fff9}\x{fffa}\x{fffb} - \x{110bd} - 0: \x{110bd} - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - 0: \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} - \x{e0001} - 0: \x{e0001} - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - 0: \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} - ** Failers - 0: ** Failers - \x{09} -No match - \x{1D} -No match - \x{85} -No match - \x{61c} -No match - \x{2028} -No match - \x{2029} -No match - \x{2065} -No match - \x{2066} -No match - \x{2067} -No match - \x{2068} -No match - \x{2069} -No match - \x{e0002} -No match - \x{e001f} -No match - \x{e0080} -No match - -/^[[:punct:]]+$/8W - \$+<=>^`|~ - 0: $+<=>^`|~ - !\"#%&'()*,-./:;?@[\\]_{} - 0: !"#%&'()*,-./:;?@[\]_{} - \x{a1}\x{a7} - 0: \x{a1}\x{a7} - \x{37e} - 0: \x{37e} - ** Failers -No match - abcde -No match - -/^[[:^graph:]]+$/8W - \x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{61c}\x{1680}\x{180e} - 0: \x{09}\x{0a}\x{1d} \x{85}\x{a0}\x{61c}\x{1680}\x{180e} - \x{2028}\x{2029}\x{202f}\x{2065}\x{2066}\x{2067}\x{2068}\x{2069} - 0: \x{2028}\x{2029}\x{202f}\x{2065}\x{2066}\x{2067}\x{2068}\x{2069} - \x{3000}\x{e0002}\x{e001f}\x{e0080} - 0: \x{3000}\x{e0002}\x{e001f}\x{e0080} - ** Failers -No 
match - Letter:ABC -No match - Mark:\x{300}\x{1d172}\x{1d17b} -No match - Number:9\x{660} -No match - Punctuation:\x{66a},; -No match - Symbol:\x{6de}<>\x{fffc} -No match - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} -No match - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} -No match - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} -No match - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} -No match - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} -No match - \x{feff} -No match - \x{fff9}\x{fffa}\x{fffb} -No match - \x{110bd} -No match - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} -No match - \x{e0001} -No match - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} -No match - -/^[[:^print:]]+$/8W - \x{09}\x{1D}\x{85}\x{61c}\x{2028}\x{2029}\x{2065}\x{2066}\x{2067} - 0: \x{09}\x{1d}\x{85}\x{61c}\x{2028}\x{2029}\x{2065}\x{2066}\x{2067} - \x{2068}\x{2069}\x{e0002}\x{e001f}\x{e0080} - 0: \x{2068}\x{2069}\x{e0002}\x{e001f}\x{e0080} - ** Failers -No match - Space: \x{a0} -No match - \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} -No match - \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} -No match - \x{202f}\x{205f} -No match - \x{3000} -No match - Letter:ABC -No match - Mark:\x{300}\x{1d172}\x{1d17b} -No match - Number:9\x{660} -No match - Punctuation:\x{66a},; -No match - Symbol:\x{6de}<>\x{fffc} -No match - Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} -No match - \x{180e} -No match - \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} -No match - \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} -No match - \x{202f} -No match - \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} -No match - \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} -No match - \x{feff} -No match - \x{fff9}\x{fffa}\x{fffb} -No match - \x{110bd} -No match - \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} -No match - \x{e0001} -No match - \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} -No match - 
-/^[[:^punct:]]+$/8W - abcde - 0: abcde - ** Failers -No match - \$+<=>^`|~ -No match - !\"#%&'()*,-./:;?@[\\]_{} -No match - \x{a1}\x{a7} -No match - \x{37e} -No match - -/[RST]+/8iW - Ss\x{17f} - 0: Ss\x{17f} - -/[R-T]+/8iW - Ss\x{17f} - 0: Ss\x{17f} - -/[q-u]+/8iW - Ss\x{17f} - 0: Ss\x{17f} - -/^s?c/mi8 - scat - 0: sc - -/[A-`]/i8 - abcdefghijklmno - 0: a - -/\C\X*QT/8 - Ó…\x0aT -No match - -/[\pS#moq]/ - = - 0: = - -/[[:punct:]]/8W - \xc2\xb4 -No match - \x{b4} -No match - -/[[:^ascii:]]/8W - \x{100} - 0: \x{100} - \x{200} - 0: \x{200} - \x{300} - 0: \x{300} - \x{37e} - 0: \x{37e} - a -No match - 9 -No match - g -No match - -/[[:^ascii:]\w]/8W - a - 0: a - 9 - 0: 9 - g - 0: g - \x{100} - 0: \x{100} - \x{200} - 0: \x{200} - \x{300} - 0: \x{300} - \x{37e} - 0: \x{37e} - -/[\w[:^ascii:]]/8W - a - 0: a - 9 - 0: 9 - g - 0: g - \x{100} - 0: \x{100} - \x{200} - 0: \x{200} - \x{300} - 0: \x{300} - \x{37e} - 0: \x{37e} - -/[^[:ascii:]\W]/8W - a -No match - 9 -No match - g -No match - \x{100} - 0: \x{100} - \x{200} - 0: \x{200} - \x{300} -No match - \x{37e} -No match - -/[[:^ascii:]a]/8W - a - 0: a - 9 -No match - g -No match - \x{100} - 0: \x{100} - \x{200} - 0: \x{200} - \x{37e} - 0: \x{37e} - -/[^[:^ascii:]\d]/8W - a - 0: a - ~ - 0: ~ - 0 -No match - \a - 0: \x{07} - \x{7f} - 0: \x{7f} - \x{389} -No match - \x{20ac} -No match - -/(?=.*b)\pL/ - 11bb - 0: b - -/(?(?=.*b)(?=.*b)\pL|.*c)/ - 11bb - 0: b - -/-- End of testinput6 --/ diff --git a/src/pcre/testdata/testoutput7 b/src/pcre/testdata/testoutput7 deleted file mode 100644 index 2b167b28..00000000 --- a/src/pcre/testdata/testoutput7 +++ /dev/null @@ -1,2345 +0,0 @@ -/-- These tests for Unicode property support test PCRE's API and show some of - the compiled code. They are not Perl-compatible. 
--/ - -/[\p{L}]/DZ ------------------------------------------------------------------- - Bra - [\p{L}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\p{^L}]/DZ ------------------------------------------------------------------- - Bra - [\P{L}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\P{L}]/DZ ------------------------------------------------------------------- - Bra - [\P{L}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[\P{^L}]/DZ ------------------------------------------------------------------- - Bra - [\p{L}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -No options -No first char -No need char - -/[abc\p{L}\x{0660}]/8DZ ------------------------------------------------------------------- - Bra - [a-c\p{L}\x{660}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - -/[\p{Nd}]/8DZ ------------------------------------------------------------------- - Bra - [\p{Nd}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - 1234 - 0: 1 - -/[\p{Nd}+-]+/8DZ ------------------------------------------------------------------- - Bra - [+\-\p{Nd}]++ - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: utf -No first char -No need char - 1234 - 0: 1234 - 12-34 - 0: 12-34 - 12+\x{661}-34 - 0: 12+\x{661}-34 - ** Failers -No match - abcd -No match - -/[\x{105}-\x{109}]/8iDZ 
------------------------------------------------------------------- - Bra - [\x{104}-\x{109}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char - \x{104} - 0: \x{104} - \x{105} - 0: \x{105} - \x{109} - 0: \x{109} - ** Failers -No match - \x{100} -No match - \x{10a} -No match - -/[z-\x{100}]/8iDZ ------------------------------------------------------------------- - Bra - [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char - Z - 0: Z - z - 0: z - \x{39c} - 0: \x{39c} - \x{178} - 0: \x{178} - | - 0: | - \x{80} - 0: \x{80} - \x{ff} - 0: \x{ff} - \x{100} - 0: \x{100} - \x{101} - 0: \x{101} - ** Failers -No match - \x{102} -No match - Y -No match - y -No match - -/[z-\x{100}]/8DZi ------------------------------------------------------------------- - Bra - [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -No first char -No need char - -/(?:[\PPa*]*){8,}/ - -/[\P{Any}]/BZ ------------------------------------------------------------------- - Bra - [\P{Any}] - Ket - End ------------------------------------------------------------------- - -/[\P{Any}\E]/BZ ------------------------------------------------------------------- - Bra - [\P{Any}] - Ket - End ------------------------------------------------------------------- - -/(\P{Yi}+\277)/ - -/(\P{Yi}+\277)?/ - -/(?<=\P{Yi}{3}A)X/ - -/\p{Yi}+(\P{Yi}+)(?1)/ - -/(\P{Yi}{2}\277)?/ - -/[\P{Yi}A]/ - -/[\P{Yi}\P{Yi}\P{Yi}A]/ - -/[^\P{Yi}A]/ - -/[^\P{Yi}\P{Yi}\P{Yi}A]/ - -/(\P{Yi}*\277)*/ - -/(\P{Yi}*?\277)*/ - -/(\p{Yi}*+\277)*/ - -/(\P{Yi}?\277)*/ - -/(\P{Yi}??\277)*/ - -/(\p{Yi}?+\277)*/ 
- -/(\P{Yi}{0,3}\277)*/ - -/(\P{Yi}{0,3}?\277)*/ - -/(\p{Yi}{0,3}+\277)*/ - -/\p{Zl}{2,3}+/8BZ ------------------------------------------------------------------- - Bra - prop Zl {2} - prop Zl ?+ - Ket - End ------------------------------------------------------------------- - 

 - 0: \x{2028}\x{2028} - \x{2028}\x{2028}\x{2028} - 0: \x{2028}\x{2028}\x{2028} - -/\p{Zl}/8BZ ------------------------------------------------------------------- - Bra - prop Zl - Ket - End ------------------------------------------------------------------- - -/\p{Lu}{3}+/8BZ ------------------------------------------------------------------- - Bra - prop Lu {3} - Ket - End ------------------------------------------------------------------- - -/\pL{2}+/8BZ ------------------------------------------------------------------- - Bra - prop L {2} - Ket - End ------------------------------------------------------------------- - -/\p{Cc}{2}+/8BZ ------------------------------------------------------------------- - Bra - prop Cc {2} - Ket - End ------------------------------------------------------------------- - -/^\p{Cf}/8 - \x{180e} - 0: \x{180e} - \x{061c} - 0: \x{61c} - \x{2066} - 0: \x{2066} - \x{2067} - 0: \x{2067} - \x{2068} - 0: \x{2068} - \x{2069} - 0: \x{2069} - -/^\p{Cs}/8 - \?\x{dfff} - 0: \x{dfff} - ** Failers -No match - \x{09f} -No match - -/^\p{Mn}/8 - \x{1a1b} - 0: \x{1a1b} - -/^\p{Pe}/8 - \x{2309} - 0: \x{2309} - \x{230b} - 0: \x{230b} - -/^\p{Ps}/8 - \x{2308} - 0: \x{2308} - \x{230a} - 0: \x{230a} - -/^\p{Sc}+/8 - $\x{a2}\x{a3}\x{a4}\x{a5}\x{a6} - 0: $\x{a2}\x{a3}\x{a4}\x{a5} - \x{9f2} - 0: \x{9f2} - ** Failers -No match - X -No match - \x{2c2} -No match - -/^\p{Zs}/8 - \ \ - 0: - \x{a0} - 0: \x{a0} - \x{1680} - 0: \x{1680} - \x{2000} - 0: \x{2000} - \x{2001} - 0: \x{2001} - ** Failers -No match - \x{2028} -No match - \x{200d} -No match - -/-- These are here rather than in test 6 because Perl has problems with - the negative versions of the properties and behaves has changed how - it behaves for caseless matching. 
--/ - -/\p{^Lu}/8i - 1234 - 0: 1 - ** Failers - 0: * - ABC -No match - -/\P{Lu}/8i - 1234 - 0: 1 - ** Failers - 0: * - ABC -No match - -/\p{Ll}/8i - a - 0: a - Az - 0: z - ** Failers - 0: a - ABC -No match - -/\p{Lu}/8i - A - 0: A - a\x{10a0}B - 0: \x{10a0} - ** Failers - 0: F - a -No match - \x{1d00} -No match - -/\p{Lu}/8i - A - 0: A - aZ - 0: Z - ** Failers - 0: F - abc -No match - -/[\x{c0}\x{391}]/8i - \x{c0} - 0: \x{c0} - \x{e0} - 0: \x{e0} - -/-- The next two are special cases where the lengths of the different cases of -the same character differ. The first went wrong with heap frame storage; the -second was broken in all cases. --/ - -/^\x{023a}+?(\x{0130}+)/8i - \x{023a}\x{2c65}\x{0130} - 0: \x{23a}\x{2c65}\x{130} - 1: \x{130} - -/^\x{023a}+([^X])/8i - \x{023a}\x{2c65}X - 0: \x{23a}\x{2c65} - 1: \x{2c65} - -/\x{c0}+\x{116}+/8i - \x{c0}\x{e0}\x{116}\x{117} - 0: \x{c0}\x{e0}\x{116}\x{117} - -/[\x{c0}\x{116}]+/8i - \x{c0}\x{e0}\x{116}\x{117} - 0: \x{c0}\x{e0}\x{116}\x{117} - -/(\x{de})\1/8i - \x{de}\x{de} - 0: \x{de}\x{de} - 1: \x{de} - \x{de}\x{fe} - 0: \x{de}\x{fe} - 1: \x{de} - \x{fe}\x{fe} - 0: \x{fe}\x{fe} - 1: \x{fe} - \x{fe}\x{de} - 0: \x{fe}\x{de} - 1: \x{fe} - -/^\x{c0}$/8i - \x{c0} - 0: \x{c0} - \x{e0} - 0: \x{e0} - -/^\x{e0}$/8i - \x{c0} - 0: \x{c0} - \x{e0} - 0: \x{e0} - -/-- The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE -will match it only with UCP support, because without that it has no notion -of case for anything other than the ASCII letters. --/ - -/((?i)[\x{c0}])/8 - \x{c0} - 0: \x{c0} - 1: \x{c0} - \x{e0} - 0: \x{e0} - 1: \x{e0} - -/(?i:[\x{c0}])/8 - \x{c0} - 0: \x{c0} - \x{e0} - 0: \x{e0} - -/-- These are PCRE's extra properties to help with Unicodizing \d etc. 
--/ - -/^\p{Xan}/8 - ABCD - 0: A - 1234 - 0: 1 - \x{6ca} - 0: \x{6ca} - \x{a6c} - 0: \x{a6c} - \x{10a7} - 0: \x{10a7} - ** Failers -No match - _ABC -No match - -/^\p{Xan}+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7} - ** Failers -No match - _ABC -No match - -/^\p{Xan}+?/8 - \x{6ca}\x{a6c}\x{10a7}_ - 0: \x{6ca} - -/^\p{Xan}*/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7} - -/^\p{Xan}{2,9}/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca} - -/^\p{Xan}{2,9}?/8 - \x{6ca}\x{a6c}\x{10a7}_ - 0: \x{6ca}\x{a6c} - -/^[\p{Xan}]/8 - ABCD1234_ - 0: A - 1234abcd_ - 0: 1 - \x{6ca} - 0: \x{6ca} - \x{a6c} - 0: \x{a6c} - \x{10a7} - 0: \x{10a7} - ** Failers -No match - _ABC -No match - -/^[\p{Xan}]+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7} - ** Failers -No match - _ABC -No match - -/^>\p{Xsp}/8 - >\x{1680}\x{2028}\x{0b} - 0: >\x{1680} - >\x{a0} - 0: >\x{a0} - ** Failers -No match - \x{0b} -No match - -/^>\p{Xsp}+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}+?/8 - >\x{1680}\x{2028}\x{0b} - 0: >\x{1680} - -/^>\p{Xsp}*/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}{2,9}/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xsp}{2,9}?/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09} - -/^>[\p{Xsp}]/8 - >\x{2028}\x{0b} - 0: >\x{2028} - -/^>[\p{Xsp}]+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}/8 - >\x{1680}\x{2028}\x{0b} - 0: >\x{1680} - >\x{a0} - 0: >\x{a0} - ** Failers -No match - \x{0b} -No match - -/^>\p{Xps}+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 
-/^>\p{Xps}+?/8 - >\x{1680}\x{2028}\x{0b} - 0: >\x{1680} - -/^>\p{Xps}*/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^>\p{Xps}{2,9}?/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09} - -/^>[\p{Xps}]/8 - >\x{2028}\x{0b} - 0: >\x{2028} - -/^>[\p{Xps}]+/8 - > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} - -/^\p{Xwd}/8 - ABCD - 0: A - 1234 - 0: 1 - \x{6ca} - 0: \x{6ca} - \x{a6c} - 0: \x{a6c} - \x{10a7} - 0: \x{10a7} - _ABC - 0: _ - ** Failers -No match - [] -No match - -/^\p{Xwd}+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}+?/8 - \x{6ca}\x{a6c}\x{10a7}_ - 0: \x{6ca} - -/^\p{Xwd}*/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/^\p{Xwd}{2,9}/8 - A_B12\x{6ca}\x{a6c}\x{10a7} - 0: A_B12\x{6ca}\x{a6c}\x{10a7} - -/^\p{Xwd}{2,9}?/8 - \x{6ca}\x{a6c}\x{10a7}_ - 0: \x{6ca}\x{a6c} - -/^[\p{Xwd}]/8 - ABCD1234_ - 0: A - 1234abcd_ - 0: 1 - \x{6ca} - 0: \x{6ca} - \x{a6c} - 0: \x{a6c} - \x{10a7} - 0: \x{10a7} - _ABC - 0: _ - ** Failers -No match - [] -No match - -/^[\p{Xwd}]+/8 - ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ - -/-- A check not in UTF-8 mode --/ - -/^[\p{Xwd}]+/ - ABCD1234_ - 0: ABCD1234_ - -/-- Some negative checks --/ - -/^[\P{Xwd}]+/8 - !.+\x{019}\x{35a}AB - 0: !.+\x{19}\x{35a} - -/^[\p{^Xwd}]+/8 - !.+\x{019}\x{35a}AB - 0: !.+\x{19}\x{35a} - -/[\D]/WBZ8 ------------------------------------------------------------------- - Bra - [\P{Nd}] - Ket - End ------------------------------------------------------------------- - 1\x{3c8}2 - 0: \x{3c8} - -/[\d]/WBZ8 ------------------------------------------------------------------- - Bra - [\p{Nd}] - Ket - End 
------------------------------------------------------------------- - >\x{6f4}< - 0: \x{6f4} - -/[\S]/WBZ8 ------------------------------------------------------------------- - Bra - [\P{Xsp}] - Ket - End ------------------------------------------------------------------- - \x{1680}\x{6f4}\x{1680} - 0: \x{6f4} - -/[\s]/WBZ8 ------------------------------------------------------------------- - Bra - [\p{Xsp}] - Ket - End ------------------------------------------------------------------- - >\x{1680}< - 0: \x{1680} - -/[\W]/WBZ8 ------------------------------------------------------------------- - Bra - [\P{Xwd}] - Ket - End ------------------------------------------------------------------- - A\x{1712}B - 0: \x{1712} - -/[\w]/WBZ8 ------------------------------------------------------------------- - Bra - [\p{Xwd}] - Ket - End ------------------------------------------------------------------- - >\x{1723}< - 0: \x{1723} - -/\D/WBZ8 ------------------------------------------------------------------- - Bra - notprop Nd - Ket - End ------------------------------------------------------------------- - 1\x{3c8}2 - 0: \x{3c8} - -/\d/WBZ8 ------------------------------------------------------------------- - Bra - prop Nd - Ket - End ------------------------------------------------------------------- - >\x{6f4}< - 0: \x{6f4} - -/\S/WBZ8 ------------------------------------------------------------------- - Bra - notprop Xsp - Ket - End ------------------------------------------------------------------- - \x{1680}\x{6f4}\x{1680} - 0: \x{6f4} - -/\s/WBZ8 ------------------------------------------------------------------- - Bra - prop Xsp - Ket - End ------------------------------------------------------------------- - >\x{1680}> - 0: \x{1680} - -/\W/WBZ8 ------------------------------------------------------------------- - Bra - notprop Xwd - Ket - End ------------------------------------------------------------------- - A\x{1712}B - 0: \x{1712} - -/\w/WBZ8 
------------------------------------------------------------------- - Bra - prop Xwd - Ket - End ------------------------------------------------------------------- - >\x{1723}< - 0: \x{1723} - -/[[:alpha:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{L}] - Ket - End ------------------------------------------------------------------- - -/[[:lower:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{Ll}] - Ket - End ------------------------------------------------------------------- - -/[[:upper:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{Lu}] - Ket - End ------------------------------------------------------------------- - -/[[:alnum:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{Xan}] - Ket - End ------------------------------------------------------------------- - -/[[:ascii:]]/WBZ ------------------------------------------------------------------- - Bra - [\x00-\x7f] - Ket - End ------------------------------------------------------------------- - -/[[:cntrl:]]/WBZ ------------------------------------------------------------------- - Bra - [\x00-\x1f\x7f] - Ket - End ------------------------------------------------------------------- - -/[[:digit:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{Nd}] - Ket - End ------------------------------------------------------------------- - -/[[:graph:]]/WBZ ------------------------------------------------------------------- - Bra - [[:graph:]] - Ket - End ------------------------------------------------------------------- - -/[[:print:]]/WBZ ------------------------------------------------------------------- - Bra - [[:print:]] - Ket - End ------------------------------------------------------------------- - -/[[:punct:]]/WBZ ------------------------------------------------------------------- - Bra - [[:punct:]] - Ket - End 
------------------------------------------------------------------- - -/[[:space:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{Xps}] - Ket - End ------------------------------------------------------------------- - -/[[:word:]]/WBZ ------------------------------------------------------------------- - Bra - [\p{Xwd}] - Ket - End ------------------------------------------------------------------- - -/[[:xdigit:]]/WBZ ------------------------------------------------------------------- - Bra - [0-9A-Fa-f] - Ket - End ------------------------------------------------------------------- - -/-- Unicode properties for \b abd \B --/ - -/\b...\B/8W - abc_ - 0: abc - \x{37e}abc\x{376} - 0: abc - \x{37e}\x{376}\x{371}\x{393}\x{394} - 0: \x{376}\x{371}\x{393} - !\x{c0}++\x{c1}\x{c2} - 0: ++\x{c1} - !\x{c0}+++++ - 0: \x{c0}++ - -/-- Without PCRE_UCP, non-ASCII always fail, even if < 256 --/ - -/\b...\B/8 - abc_ - 0: abc - ** Failers - 0: Fai - \x{37e}abc\x{376} -No match - \x{37e}\x{376}\x{371}\x{393}\x{394} -No match - !\x{c0}++\x{c1}\x{c2} -No match - !\x{c0}+++++ -No match - -/-- With PCRE_UCP, non-UTF8 chars that are < 256 still check properties --/ - -/\b...\B/W - abc_ - 0: abc - !\x{c0}++\x{c1}\x{c2} - 0: ++\xc1 - !\x{c0}+++++ - 0: \xc0++ - -/-- Some of these are silly, but they check various combinations --/ - -/[[:^alpha:][:^cntrl:]]+/8WBZ ------------------------------------------------------------------- - Bra - [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++ - Ket - End ------------------------------------------------------------------- - 123 - 0: 123 - abc - 0: abc - -/[[:^cntrl:][:^alpha:]]+/8WBZ ------------------------------------------------------------------- - Bra - [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++ - Ket - End ------------------------------------------------------------------- - 123 - 0: 123 - abc - 0: abc - -/[[:alpha:]]+/8WBZ ------------------------------------------------------------------- - Bra - [\p{L}]++ - Ket - 
End ------------------------------------------------------------------- - abc - 0: abc - -/[[:^alpha:]\S]+/8WBZ ------------------------------------------------------------------- - Bra - [\P{L}\P{Xsp}]++ - Ket - End ------------------------------------------------------------------- - 123 - 0: 123 - abc - 0: abc - -/[^\d]+/8WBZ ------------------------------------------------------------------- - Bra - [^\p{Nd}]++ - Ket - End ------------------------------------------------------------------- - abc123 - 0: abc - abc\x{123} - 0: abc\x{123} - \x{660}abc - 0: abc - -/\p{Lu}+9\p{Lu}+B\p{Lu}+b/BZ ------------------------------------------------------------------- - Bra - prop Lu ++ - 9 - prop Lu + - B - prop Lu ++ - b - Ket - End ------------------------------------------------------------------- - -/\p{^Lu}+9\p{^Lu}+B\p{^Lu}+b/BZ ------------------------------------------------------------------- - Bra - notprop Lu + - 9 - notprop Lu ++ - B - notprop Lu + - b - Ket - End ------------------------------------------------------------------- - -/\P{Lu}+9\P{Lu}+B\P{Lu}+b/BZ ------------------------------------------------------------------- - Bra - notprop Lu + - 9 - notprop Lu ++ - B - notprop Lu + - b - Ket - End ------------------------------------------------------------------- - -/\p{Han}+X\p{Greek}+\x{370}/BZ8 ------------------------------------------------------------------- - Bra - prop Han ++ - X - prop Greek + - \x{370} - Ket - End ------------------------------------------------------------------- - -/\p{Xan}+!\p{Xan}+A/BZ ------------------------------------------------------------------- - Bra - prop Xan ++ - ! - prop Xan + - A - Ket - End ------------------------------------------------------------------- - -/\p{Xsp}+!\p{Xsp}\t/BZ ------------------------------------------------------------------- - Bra - prop Xsp ++ - ! 
- prop Xsp - \x09 - Ket - End ------------------------------------------------------------------- - -/\p{Xps}+!\p{Xps}\t/BZ ------------------------------------------------------------------- - Bra - prop Xps ++ - ! - prop Xps - \x09 - Ket - End ------------------------------------------------------------------- - -/\p{Xwd}+!\p{Xwd}_/BZ ------------------------------------------------------------------- - Bra - prop Xwd ++ - ! - prop Xwd - _ - Ket - End ------------------------------------------------------------------- - -/A+\p{N}A+\dB+\p{N}*B+\d*/WBZ ------------------------------------------------------------------- - Bra - A++ - prop N - A++ - prop Nd - B+ - prop N *+ - B++ - prop Nd *+ - Ket - End ------------------------------------------------------------------- - -/-- These behaved oddly in Perl, so they are kept in this test --/ - -/(\x{23a}\x{23a}\x{23a})?\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} -No match - -/(ȺȺȺ)?\1/8i - ȺȺȺⱥⱥ -No match - -/(\x{23a}\x{23a}\x{23a})?\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - 1: \x{23a}\x{23a}\x{23a} - -/(ȺȺȺ)?\1/8i - ȺȺȺⱥⱥⱥ - 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - 1: \x{23a}\x{23a}\x{23a} - -/(\x{23a}\x{23a}\x{23a})\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} -No match - -/(ȺȺȺ)\1/8i - ȺȺȺⱥⱥ -No match - -/(\x{23a}\x{23a}\x{23a})\1/8i - \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - 1: \x{23a}\x{23a}\x{23a} - -/(ȺȺȺ)\1/8i - ȺȺȺⱥⱥⱥ - 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} - 1: \x{23a}\x{23a}\x{23a} - -/(\x{2c65}\x{2c65})\1/8i - \x{2c65}\x{2c65}\x{23a}\x{23a} - 0: \x{2c65}\x{2c65}\x{23a}\x{23a} - 1: \x{2c65}\x{2c65} - -/(ⱥⱥ)\1/8i - ⱥⱥȺȺ - 0: \x{2c65}\x{2c65}\x{23a}\x{23a} - 1: \x{2c65}\x{2c65} - -/(\x{23a}\x{23a}\x{23a})\1Y/8i - X\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}YZ - 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}Y - 1: \x{23a}\x{23a}\x{23a} - -/(\x{2c65}\x{2c65})\1Y/8i 
- X\x{2c65}\x{2c65}\x{23a}\x{23a}YZ - 0: \x{2c65}\x{2c65}\x{23a}\x{23a}Y - 1: \x{2c65}\x{2c65} - -/-- --/ - -/-- These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE --/ - -/^[\p{Batak}]/8 - \x{1bc0} - 0: \x{1bc0} - \x{1bff} - 0: \x{1bff} - ** Failers -No match - \x{1bf4} -No match - -/^[\p{Brahmi}]/8 - \x{11000} - 0: \x{11000} - \x{1106f} - 0: \x{1106f} - ** Failers -No match - \x{1104e} -No match - -/^[\p{Mandaic}]/8 - \x{840} - 0: \x{840} - \x{85e} - 0: \x{85e} - ** Failers -No match - \x{85c} -No match - \x{85d} -No match - -/-- --/ - -/(\X*)(.)/s8 - A\x{300} - 0: A - 1: - 2: A - -/^S(\X*)e(\X*)$/8 - SteÌreÌo - 0: Ste\x{301}re\x{301}o - 1: te\x{301}r - 2: \x{301}o - -/^\X/8 - ÌreÌo - 0: \x{301} - -/^a\X41z/ - aX41z - 0: aX41z - *** Failers -No match - aAz -No match - -/(?<=ab\Cde)X/8 -Failed: \C not allowed in lookbehind assertion at offset 10 - -/\X/ - a\P - 0: a - a\P\P -Partial match: a - -/\Xa/ - aa\P - 0: aa - aa\P\P - 0: aa - -/\X{2}/ - aa\P - 0: aa - aa\P\P -Partial match: aa - -/\X+a/ - a\P -Partial match: a - aa\P - 0: aa - aa\P\P -Partial match: aa - -/\X+?a/ - a\P -Partial match: a - ab\P -Partial match: ab - aa\P - 0: aa - aa\P\P - 0: aa - aba\P - 0: aba - -/-- These Unicode 6.1.0 scripts are not known to Perl. 
--/ - -/\p{Chakma}\d/8W - \x{11100}\x{1113c} - 0: \x{11100}\x{1113c} - -/\p{Takri}\d/8W - \x{11680}\x{116c0} - 0: \x{11680}\x{116c0} - -/^\X/8 - A\P - 0: A - A\P\P -Partial match: A - A\x{300}\x{301}\P - 0: A\x{300}\x{301} - A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301} - A\x{301}\P - 0: A\x{301} - A\x{301}\P\P -Partial match: A\x{301} - -/^\X{2,3}/8 - A\P -Partial match: A - A\P\P -Partial match: A - AA\P - 0: AA - AA\P\P -Partial match: AA - A\x{300}\x{301}\P -Partial match: A\x{300}\x{301} - A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301} - A\x{300}\x{301}A\x{300}\x{301}\P - 0: A\x{300}\x{301}A\x{300}\x{301} - A\x{300}\x{301}A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301}A\x{300}\x{301} - -/^\X{2}/8 - AA\P - 0: AA - AA\P\P -Partial match: AA - A\x{300}\x{301}A\x{300}\x{301}\P - 0: A\x{300}\x{301}A\x{300}\x{301} - A\x{300}\x{301}A\x{300}\x{301}\P\P -Partial match: A\x{300}\x{301}A\x{300}\x{301} - -/^\X+/8 - AA\P - 0: AA - AA\P\P -Partial match: AA - -/^\X+?Z/8 - AA\P -Partial match: AA - AA\P\P -Partial match: AA - -/A\x{3a3}B/8iDZ ------------------------------------------------------------------- - Bra - /i A - clist 03a3 03c2 03c3 - /i B - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -First char = 'A' (caseless) -Need char = 'B' (caseless) - -/\x{3a3}B/8iDZ ------------------------------------------------------------------- - Bra - clist 03a3 03c2 03c3 - /i B - Ket - End ------------------------------------------------------------------- -Capturing subpattern count = 0 -Options: caseless utf -No first char -Need char = 'B' (caseless) - -/[\x{3a3}]/8iBZ ------------------------------------------------------------------- - Bra - clist 03a3 03c2 03c3 - Ket - End ------------------------------------------------------------------- - -/[^\x{3a3}]/8iBZ ------------------------------------------------------------------- - Bra - not clist 03a3 03c2 03c3 - Ket - End 
------------------------------------------------------------------- - -/[\x{3a3}]+/8iBZ ------------------------------------------------------------------- - Bra - clist 03a3 03c2 03c3 ++ - Ket - End ------------------------------------------------------------------- - -/[^\x{3a3}]+/8iBZ ------------------------------------------------------------------- - Bra - not clist 03a3 03c2 03c3 ++ - Ket - End ------------------------------------------------------------------- - -/a*\x{3a3}/8iBZ ------------------------------------------------------------------- - Bra - /i a*+ - clist 03a3 03c2 03c3 - Ket - End ------------------------------------------------------------------- - -/\x{3a3}+a/8iBZ ------------------------------------------------------------------- - Bra - clist 03a3 03c2 03c3 ++ - /i a - Ket - End ------------------------------------------------------------------- - -/\x{3a3}*\x{3c2}/8iBZ ------------------------------------------------------------------- - Bra - clist 03a3 03c2 03c3 * - clist 03a3 03c2 03c3 - Ket - End ------------------------------------------------------------------- - -/\x{3a3}{3}/8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - 0: \x{3a3}\x{3c3}\x{3c2} - 0+ \x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}{2,4}/8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - 0: \x{3a3}\x{3c3}\x{3c2}\x{3a3} - 0+ \x{3c3}\x{3c2} - -/\x{3a3}{2,4}?/8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - 0: \x{3a3}\x{3c3} - 0+ \x{3c2}\x{3a3}\x{3c3}\x{3c2} - -/\x{3a3}+./8i+ - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - 0: \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} - 0+ - -/\x{3a3}++./8i+ - ** Failers -No match - \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} -No match - -/\x{3a3}*\x{3c2}/8iBZ ------------------------------------------------------------------- - Bra - clist 03a3 03c2 03c3 * - clist 03a3 03c2 03c3 - Ket - End ------------------------------------------------------------------- - -/[^\x{3a3}]*\x{3c2}/8iBZ 
------------------------------------------------------------------- - Bra - not clist 03a3 03c2 03c3 *+ - clist 03a3 03c2 03c3 - Ket - End ------------------------------------------------------------------- - -/[^a]*\x{3c2}/8iBZ ------------------------------------------------------------------- - Bra - /i [^a]* - clist 03a3 03c2 03c3 - Ket - End ------------------------------------------------------------------- - -/ist/8iBZ ------------------------------------------------------------------- - Bra - /i i - clist 0053 0073 017f - /i t - Ket - End ------------------------------------------------------------------- - ikt -No match - -/is+t/8i - iSs\x{17f}t - 0: iSs\x{17f}t - ikt -No match - -/is+?t/8i - ikt -No match - -/is?t/8i - ikt -No match - -/is{2}t/8i - iskt -No match - -/-- This property is a PCRE special --/ - -/^\p{Xuc}/8 - $abc - 0: $ - @abc - 0: @ - `abc - 0: ` - \x{1234}abc - 0: \x{1234} - ** Failers -No match - abc -No match - -/^\p{Xuc}+/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}+?/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $ - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}+?\*/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000}* - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}++/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}{3,5}/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234} - ** Failers -No match - \x{9f} -No match - -/^\p{Xuc}{3,5}?/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@` - ** Failers -No match - \x{9f} -No match - -/^[\p{Xuc}]/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $ - ** Failers -No match - \x{9f} -No match - -/^[\p{Xuc}]+/8 - $@`\x{a0}\x{1234}\x{e000}** - 0: $@`\x{a0}\x{1234}\x{e000} - ** Failers -No match - \x{9f} -No match - -/^\P{Xuc}/8 - abc - 0: a - ** Failers - 0: * - $abc -No match - @abc -No match - `abc -No match - \x{1234}abc -No match - -/^[\P{Xuc}]/8 - abc - 0: a - 
** Failers - 0: * - $abc -No match - @abc -No match - `abc -No match - \x{1234}abc -No match - -/-- Some auto-possessification tests --/ - -/\pN+\z/BZ ------------------------------------------------------------------- - Bra - prop N ++ - \z - Ket - End ------------------------------------------------------------------- - -/\PN+\z/BZ ------------------------------------------------------------------- - Bra - notprop N ++ - \z - Ket - End ------------------------------------------------------------------- - -/\pN+/BZ ------------------------------------------------------------------- - Bra - prop N ++ - Ket - End ------------------------------------------------------------------- - -/\PN+/BZ ------------------------------------------------------------------- - Bra - notprop N ++ - Ket - End ------------------------------------------------------------------- - -/\p{Any}+\p{Any} \p{Any}+\P{Any} \p{Any}+\p{L&} \p{Any}+\p{L} \p{Any}+\p{Lu} \p{Any}+\p{Han} \p{Any}+\p{Xan} \p{Any}+\p{Xsp} \p{Any}+\p{Xps} \p{Xwd}+\p{Any} \p{Any}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop Any + - prop Any - prop Any + - notprop Any - prop Any + - prop L& - prop Any + - prop L - prop Any + - prop Lu - prop Any + - prop Han - prop Any + - prop Xan - prop Any + - prop Xsp - prop Any + - prop Xps - prop Xwd + - prop Any - prop Any + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{L&}+\p{Any} \p{L&}+\p{L&} \P{L&}+\p{L&} \p{L&}+\p{L} \p{L&}+\p{Lu} \p{L&}+\p{Han} \p{L&}+\p{Xan} \p{L&}+\P{Xan} \p{L&}+\p{Xsp} \p{L&}+\p{Xps} \p{Xwd}+\p{L&} \p{L&}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop L& + - prop Any - prop L& + - prop L& - notprop L& ++ - prop L& - prop L& + - prop L - prop L& + - prop Lu - prop L& + - prop Han - prop L& + - prop Xan - prop L& ++ - notprop Xan - prop L& ++ - prop Xsp - prop L& ++ - prop Xps - prop Xwd + - prop L& - prop L& + - 
prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{N}+\p{Any} \p{N}+\p{L&} \p{N}+\p{L} \p{N}+\P{L} \p{N}+\P{N} \p{N}+\p{Lu} \p{N}+\p{Han} \p{N}+\p{Xan} \p{N}+\p{Xsp} \p{N}+\p{Xps} \p{Xwd}+\p{N} \p{N}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop N + - prop Any - prop N + - prop L& - prop N ++ - prop L - prop N + - notprop L - prop N ++ - notprop N - prop N ++ - prop Lu - prop N + - prop Han - prop N + - prop Xan - prop N ++ - prop Xsp - prop N ++ - prop Xps - prop Xwd + - prop N - prop N + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{Lu}+\p{Any} \p{Lu}+\p{L&} \p{Lu}+\p{L} \p{Lu}+\p{Lu} \P{Lu}+\p{Lu} \p{Lu}+\p{Nd} \p{Lu}+\P{Nd} \p{Lu}+\p{Han} \p{Lu}+\p{Xan} \p{Lu}+\p{Xsp} \p{Lu}+\p{Xps} \p{Xwd}+\p{Lu} \p{Lu}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop Lu + - prop Any - prop Lu + - prop L& - prop Lu + - prop L - prop Lu + - prop Lu - notprop Lu ++ - prop Lu - prop Lu ++ - prop Nd - prop Lu + - notprop Nd - prop Lu + - prop Han - prop Lu + - prop Xan - prop Lu ++ - prop Xsp - prop Lu ++ - prop Xps - prop Xwd + - prop Lu - prop Lu + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{Han}+\p{Lu} \p{Han}+\p{L&} \p{Han}+\p{L} \p{Han}+\p{Lu} \p{Han}+\p{Arabic} \p{Arabic}+\p{Arabic} \p{Han}+\p{Xan} \p{Han}+\p{Xsp} \p{Han}+\p{Xps} \p{Xwd}+\p{Han} \p{Han}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop Han + - prop Lu - prop Han + - prop L& - prop Han + - prop L - prop Han + - prop Lu - prop Han ++ - prop Arabic - prop Arabic + - prop Arabic - prop Han + - prop Xan - prop Han + - prop Xsp - prop Han + - prop Xps - prop Xwd + - prop Han - prop Han + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{Xan}+\p{Any} \p{Xan}+\p{L&} \P{Xan}+\p{L&} 
\p{Xan}+\p{L} \p{Xan}+\p{Lu} \p{Xan}+\p{Han} \p{Xan}+\p{Xan} \p{Xan}+\P{Xan} \p{Xan}+\p{Xsp} \p{Xan}+\p{Xps} \p{Xwd}+\p{Xan} \p{Xan}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop Xan + - prop Any - prop Xan + - prop L& - notprop Xan ++ - prop L& - prop Xan + - prop L - prop Xan + - prop Lu - prop Xan + - prop Han - prop Xan + - prop Xan - prop Xan ++ - notprop Xan - prop Xan ++ - prop Xsp - prop Xan ++ - prop Xps - prop Xwd + - prop Xan - prop Xan + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{Xsp}+\p{Any} \p{Xsp}+\p{L&} \p{Xsp}+\p{L} \p{Xsp}+\p{Lu} \p{Xsp}+\p{Han} \p{Xsp}+\p{Xan} \p{Xsp}+\p{Xsp} \P{Xsp}+\p{Xsp} \p{Xsp}+\p{Xps} \p{Xwd}+\p{Xsp} \p{Xsp}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop Xsp + - prop Any - prop Xsp ++ - prop L& - prop Xsp ++ - prop L - prop Xsp ++ - prop Lu - prop Xsp + - prop Han - prop Xsp ++ - prop Xan - prop Xsp + - prop Xsp - notprop Xsp ++ - prop Xsp - prop Xsp + - prop Xps - prop Xwd ++ - prop Xsp - prop Xsp + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{Xwd}+\p{Any} \p{Xwd}+\p{L&} \p{Xwd}+\p{L} \p{Xwd}+\p{Lu} \p{Xwd}+\p{Han} \p{Xwd}+\p{Xan} \p{Xwd}+\p{Xsp} \p{Xwd}+\p{Xps} \p{Xwd}+\p{Xwd} \p{Xwd}+\P{Xwd} \p{Xwd}+\p{Xuc}/BWZx ------------------------------------------------------------------- - Bra - prop Xwd + - prop Any - prop Xwd + - prop L& - prop Xwd + - prop L - prop Xwd + - prop Lu - prop Xwd + - prop Han - prop Xwd + - prop Xan - prop Xwd ++ - prop Xsp - prop Xwd ++ - prop Xps - prop Xwd + - prop Xwd - prop Xwd ++ - notprop Xwd - prop Xwd + - prop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{Xuc}+\p{Any} \p{Xuc}+\p{L&} \p{Xuc}+\p{L} \p{Xuc}+\p{Lu} \p{Xuc}+\p{Han} \p{Xuc}+\p{Xan} \p{Xuc}+\p{Xsp} \p{Xuc}+\p{Xps} \p{Xwd}+\p{Xuc} \p{Xuc}+\p{Xuc} \p{Xuc}+\P{Xuc}/BWZx 
------------------------------------------------------------------- - Bra - prop Xuc + - prop Any - prop Xuc + - prop L& - prop Xuc + - prop L - prop Xuc + - prop Lu - prop Xuc + - prop Han - prop Xuc + - prop Xan - prop Xuc + - prop Xsp - prop Xuc + - prop Xps - prop Xwd + - prop Xuc - prop Xuc + - prop Xuc - prop Xuc ++ - notprop Xuc - Ket - End ------------------------------------------------------------------- - -/\p{N}+\p{Ll} \p{N}+\p{Nd} \p{N}+\P{Nd}/BWZx ------------------------------------------------------------------- - Bra - prop N ++ - prop Ll - prop N + - prop Nd - prop N + - notprop Nd - Ket - End ------------------------------------------------------------------- - -/\p{Xan}+\p{L} \p{Xan}+\p{N} \p{Xan}+\p{C} \p{Xan}+\P{L} \P{Xan}+\p{N} \p{Xan}+\P{C}/BWZx ------------------------------------------------------------------- - Bra - prop Xan + - prop L - prop Xan + - prop N - prop Xan ++ - prop C - prop Xan + - notprop L - notprop Xan ++ - prop N - prop Xan + - notprop C - Ket - End ------------------------------------------------------------------- - -/\p{L}+\p{Xan} \p{N}+\p{Xan} \p{C}+\p{Xan} \P{L}+\p{Xan} \p{N}+\p{Xan} \P{C}+\p{Xan} \p{L}+\P{Xan}/BWZx ------------------------------------------------------------------- - Bra - prop L + - prop Xan - prop N + - prop Xan - prop C ++ - prop Xan - notprop L + - prop Xan - prop N + - prop Xan - notprop C + - prop Xan - prop L ++ - notprop Xan - Ket - End ------------------------------------------------------------------- - -/\p{Xan}+\p{Lu} \p{Xan}+\p{Nd} \p{Xan}+\p{Cc} \p{Xan}+\P{Ll} \P{Xan}+\p{No} \p{Xan}+\P{Cf}/BWZx ------------------------------------------------------------------- - Bra - prop Xan + - prop Lu - prop Xan + - prop Nd - prop Xan ++ - prop Cc - prop Xan + - notprop Ll - notprop Xan ++ - prop No - prop Xan + - notprop Cf - Ket - End ------------------------------------------------------------------- - -/\p{Lu}+\p{Xan} \p{Nd}+\p{Xan} \p{Cs}+\p{Xan} \P{Lt}+\p{Xan} \p{Nl}+\p{Xan} \P{Cc}+\p{Xan} 
\p{Lt}+\P{Xan}/BWZx ------------------------------------------------------------------- - Bra - prop Lu + - prop Xan - prop Nd + - prop Xan - prop Cs ++ - prop Xan - notprop Lt + - prop Xan - prop Nl + - prop Xan - notprop Cc + - prop Xan - prop Lt ++ - notprop Xan - Ket - End ------------------------------------------------------------------- - -/\w+\p{P} \w+\p{Po} \w+\s \p{Xan}+\s \s+\p{Xan} \s+\w/BWZx ------------------------------------------------------------------- - Bra - prop Xwd + - prop P - prop Xwd + - prop Po - prop Xwd ++ - prop Xsp - prop Xan ++ - prop Xsp - prop Xsp ++ - prop Xan - prop Xsp ++ - prop Xwd - Ket - End ------------------------------------------------------------------- - -/\w+\P{P} \W+\p{Po} \w+\S \P{Xan}+\s \s+\P{Xan} \s+\W/BWZx ------------------------------------------------------------------- - Bra - prop Xwd + - notprop P - notprop Xwd + - prop Po - prop Xwd + - notprop Xsp - notprop Xan + - prop Xsp - prop Xsp + - notprop Xan - prop Xsp + - notprop Xwd - Ket - End ------------------------------------------------------------------- - -/\w+\p{Po} \w+\p{Pc} \W+\p{Po} \W+\p{Pc} \w+\P{Po} \w+\P{Pc}/BWZx ------------------------------------------------------------------- - Bra - prop Xwd + - prop Po - prop Xwd ++ - prop Pc - notprop Xwd + - prop Po - notprop Xwd + - prop Pc - prop Xwd + - notprop Po - prop Xwd + - notprop Pc - Ket - End ------------------------------------------------------------------- - -/\p{Nl}+\p{Xan} \P{Nl}+\p{Xan} \p{Nl}+\P{Xan} \P{Nl}+\P{Xan}/BWZx ------------------------------------------------------------------- - Bra - prop Nl + - prop Xan - notprop Nl + - prop Xan - prop Nl ++ - notprop Xan - notprop Nl + - notprop Xan - Ket - End ------------------------------------------------------------------- - -/\p{Xan}+\p{Nl} \P{Xan}+\p{Nl} \p{Xan}+\P{Nl} \P{Xan}+\P{Nl}/BWZx ------------------------------------------------------------------- - Bra - prop Xan + - prop Nl - notprop Xan ++ - prop Nl - prop Xan + - notprop 
Nl - notprop Xan + - notprop Nl - Ket - End ------------------------------------------------------------------- - -/\p{Xan}+\p{Nd} \P{Xan}+\p{Nd} \p{Xan}+\P{Nd} \P{Xan}+\P{Nd}/BWZx ------------------------------------------------------------------- - Bra - prop Xan + - prop Nd - notprop Xan ++ - prop Nd - prop Xan + - notprop Nd - notprop Xan + - notprop Nd - Ket - End ------------------------------------------------------------------- - -/-- End auto-possessification tests --/ - -/\w+/8CWBZ ------------------------------------------------------------------- - Bra - Callout 255 0 3 - prop Xwd ++ - Callout 255 3 0 - Ket - End ------------------------------------------------------------------- - abcd ---->abcd - +0 ^ \w+ - +3 ^ ^ - 0: abcd - -/[\p{N}]?+/BZO ------------------------------------------------------------------- - Bra - [\p{N}]?+ - Ket - End ------------------------------------------------------------------- - -/[\p{L}ab]{2,3}+/BZO ------------------------------------------------------------------- - Bra - [ab\p{L}]{2,3}+ - Ket - End ------------------------------------------------------------------- - -/\D+\X \d+\X \S+\X \s+\X \W+\X \w+\X \C+\X \R+\X \H+\X \h+\X \V+\X \v+\X a+\X \n+\X .+\X/BZx ------------------------------------------------------------------- - Bra - \D+ - extuni - \d+ - extuni - \S+ - extuni - \s+ - extuni - \W+ - extuni - \w+ - extuni - AllAny+ - extuni - \R+ - extuni - \H+ - extuni - \h+ - extuni - \V+ - extuni - \v+ - extuni - a+ - extuni - \x0a+ - extuni - Any+ - extuni - Ket - End ------------------------------------------------------------------- - -/.+\X/BZxs ------------------------------------------------------------------- - Bra - AllAny+ - extuni - Ket - End ------------------------------------------------------------------- - -/\X+$/BZxm ------------------------------------------------------------------- - Bra - extuni+ - /m $ - Ket - End ------------------------------------------------------------------- - -/\X+\D \X+\d 
\X+\S \X+\s \X+\W \X+\w \X+. \X+\C \X+\R \X+\H \X+\h \X+\V \X+\v \X+\X \X+\Z \X+\z \X+$/BZx ------------------------------------------------------------------- - Bra - extuni+ - \D - extuni+ - \d - extuni+ - \S - extuni+ - \s - extuni+ - \W - extuni+ - \w - extuni+ - Any - extuni+ - AllAny - extuni+ - \R - extuni+ - \H - extuni+ - \h - extuni+ - \V - extuni+ - \v - extuni+ - extuni - extuni+ - \Z - extuni++ - \z - extuni+ - $ - Ket - End ------------------------------------------------------------------- - -/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/8WBZ ------------------------------------------------------------------- - Bra - prop Nd ++ - prop Xsp {0,5}+ - = - prop Xsp *+ - notprop Xsp ? - = - prop Xwd {0,4}+ - notprop Xwd *+ - Ket - End ------------------------------------------------------------------- - -/[RST]+/8iWBZ ------------------------------------------------------------------- - Bra - [R-Tr-t\x{17f}]++ - Ket - End ------------------------------------------------------------------- - -/[R-T]+/8iWBZ ------------------------------------------------------------------- - Bra - [R-Tr-t\x{17f}]++ - Ket - End ------------------------------------------------------------------- - -/[Q-U]+/8iWBZ ------------------------------------------------------------------- - Bra - [Q-Uq-u\x{17f}]++ - Ket - End ------------------------------------------------------------------- - -/^s?c/mi8I -Capturing subpattern count = 0 -Options: caseless multiline utf -First char at start or follows newline -Need char = 'c' (caseless) - scat - 0: sc - -/a[[:punct:]b]/WBZ ------------------------------------------------------------------- - Bra - a - [b[:punct:]] - Ket - End ------------------------------------------------------------------- - -/a[[:punct:]b]/8WBZ ------------------------------------------------------------------- - Bra - a - [b[:punct:]] - Ket - End ------------------------------------------------------------------- - -/a[b[:punct:]]/8WBZ 
------------------------------------------------------------------- - Bra - a - [b[:punct:]] - Ket - End ------------------------------------------------------------------- - -/L(?#(|++\S/8 - > >X Y - 0: >X - > >\x{100} Y - 0: >\x{100} - -/\d/8 - \x{100}3 - 0: 3 - -/\s/8 - \x{100} X - 0: - -/\D+/8 - 12abcd34 - 0: abcd - *** Failers - 0: *** Failers - 1234 -No match - -/\D{2,3}/8 - 12abcd34 - 0: abc - 12ab34 - 0: ab - *** Failers - 0: *** - 1234 -No match - 12a34 -No match - -/\D{2,3}?/8 - 12abcd34 - 0: abc - 1: ab - 12ab34 - 0: ab - *** Failers - 0: *** - 1: ** - 1234 -No match - 12a34 -No match - -/\d+/8 - 12abcd34 - 0: 12 - *** Failers -No match - -/\d{2,3}/8 - 12abcd34 - 0: 12 - 1234abcd - 0: 123 - *** Failers -No match - 1.4 -No match - -/\d{2,3}?/8 - 12abcd34 - 0: 12 - 1234abcd - 0: 123 - 1: 12 - *** Failers -No match - 1.4 -No match - -/\S+/8 - 12abcd34 - 0: 12abcd34 - *** Failers - 0: *** - \ \ -No match - -/\S{2,3}/8 - 12abcd34 - 0: 12a - 1234abcd - 0: 123 - *** Failers - 0: *** - \ \ -No match - -/\S{2,3}?/8 - 12abcd34 - 0: 12a - 1: 12 - 1234abcd - 0: 123 - 1: 12 - *** Failers - 0: *** - 1: ** - \ \ -No match - -/>\s+ <34 - 0: > < - *** Failers -No match - -/>\s{2,3} < - ab> < - *** Failers -No match - ab> \s{2,3}? 
< - ab> < - *** Failers -No match - ab> \xff< - 0: \xff - -/[\xff]/8 - >\x{ff}< - 0: \x{ff} - -/[^\xFF]/ - XYZ - 0: X - -/[^\xff]/8 - XYZ - 0: X - \x{123} - 0: \x{123} - -/^[ac]*b/8 - xb -No match - -/^[ac\x{100}]*b/8 - xb -No match - -/^[^x]*b/8i - xb -No match - -/^[^x]*b/8 - xb -No match - -/^\d*b/8 - xb -No match - -/(|a)/g8 - catac - 0: - 0: a - 1: - 0: - 0: a - 1: - 0: - 0: - a\x{256}a - 0: a - 1: - 0: - 0: a - 1: - 0: - -/^\x{85}$/8i - \x{85} - 0: \x{85} - -/^abc./mgx8 - abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK - 0: abc1 - 0: abc2 - 0: abc3 - 0: abc4 - 0: abc5 - 0: abc6 - 0: abc7 - 0: abc8 - 0: abc9 - -/abc.$/mgx8 - abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 - 0: abc1 - 0: abc2 - 0: abc3 - 0: abc4 - 0: abc5 - 0: abc6 - 0: abc7 - 0: abc8 - 0: abc9 - -/^a\Rb/8 - a\nb - 0: a\x{0a}b - a\rb - 0: a\x{0d}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x0bb - 0: a\x{0b}b - a\x0cb - 0: a\x{0c}b - a\x{85}b - 0: a\x{85}b - a\x{2028}b - 0: a\x{2028}b - a\x{2029}b - 0: a\x{2029}b - ** Failers -No match - a\n\rb -No match - -/^a\R*b/8 - ab - 0: ab - a\nb - 0: a\x{0a}b - a\rb - 0: a\x{0d}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x0bb - 0: a\x{0b}b - a\x0c\x{2028}\x{2029}b - 0: a\x{0c}\x{2028}\x{2029}b - a\x{85}b - 0: a\x{85}b - a\n\rb - 0: a\x{0a}\x{0d}b - a\n\r\x{85}\x0cb - 0: a\x{0a}\x{0d}\x{85}\x{0c}b - -/^a\R+b/8 - a\nb - 0: a\x{0a}b - a\rb - 0: a\x{0d}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x0bb - 0: a\x{0b}b - a\x0c\x{2028}\x{2029}b - 0: a\x{0c}\x{2028}\x{2029}b - a\x{85}b - 0: a\x{85}b - a\n\rb - 0: a\x{0a}\x{0d}b - a\n\r\x{85}\x0cb - 0: a\x{0a}\x{0d}\x{85}\x{0c}b - ** Failers -No match - ab -No match - -/^a\R{1,3}b/8 - a\nb - 0: a\x{0a}b - a\n\rb - 0: a\x{0a}\x{0d}b - a\n\r\x{85}b - 0: a\x{0a}\x{0d}\x{85}b - a\r\n\r\nb - 0: a\x{0d}\x{0a}\x{0d}\x{0a}b - a\r\n\r\n\r\nb - 0: a\x{0d}\x{0a}\x{0d}\x{0a}\x{0d}\x{0a}b - a\n\r\n\rb - 0: a\x{0a}\x{0d}\x{0a}\x{0d}b - a\n\n\r\nb - 0: 
a\x{0a}\x{0a}\x{0d}\x{0a}b - ** Failers -No match - a\n\n\n\rb -No match - a\r -No match - -/\h+\V?\v{3,4}/8O - \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d} - 1: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c} - -/\V?\v{3,4}/8O - \x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - 0: X\x{0a}\x{0b}\x{0c}\x{0d} - 1: X\x{0a}\x{0b}\x{0c} - -/\h+\V?\v{3,4}/8O - >\x09\x20\x{a0}X\x0a\x0a\x0a< - 0: \x{09} \x{a0}X\x{0a}\x{0a}\x{0a} - -/\V?\v{3,4}/8O - >\x09\x20\x{a0}X\x0a\x0a\x0a< - 0: X\x{0a}\x{0a}\x{0a} - -/\H\h\V\v/8 - X X\x0a - 0: X X\x{0a} - X\x09X\x0b - 0: X\x{09}X\x{0b} - ** Failers -No match - \x{a0} X\x0a -No match - -/\H*\h+\V?\v{3,4}/8O - \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a - 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d} - 1: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c} - \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a - 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}\x{0d} - 1: \x{09} \x{a0}\x{0a}\x{0b}\x{0c} - \x09\x20\x{a0}\x0a\x0b\x0c - 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c} - ** Failers -No match - \x09\x20\x{a0}\x0a\x0b -No match - -/\H\h\V\v/8 - \x{3001}\x{3000}\x{2030}\x{2028} - 0: \x{3001}\x{3000}\x{2030}\x{2028} - X\x{180e}X\x{85} - 0: X\x{180e}X\x{85} - ** Failers -No match - \x{2009} X\x0a -No match - -/\H*\h+\V?\v{3,4}/8O - \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a - 0: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c}\x{0d} - 1: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c} - \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a - 0: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c}\x{2028} - 1: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c} - \x09\x20\x{202f}\x0a\x0b\x0c - 0: \x{09} \x{202f}\x{0a}\x{0b}\x{0c} - ** Failers -No match - \x09\x{200a}\x{a0}\x{2028}\x0b -No match - -/a\Rb/I8 -Capturing subpattern count = 0 -Options: bsr_anycrlf utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - ** Failers -No match - a\x{85}b -No match - a\x0bb -No match - -/a\Rb/I8 -Capturing subpattern count = 0 -Options: bsr_unicode utf -First char = 'a' -Need 
char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x{85}b - 0: a\x{85}b - a\x0bb - 0: a\x{0b}b - ** Failers -No match - a\x{85}b\ -No match - a\x0bb\ -No match - -/a\R?b/I8 -Capturing subpattern count = 0 -Options: bsr_anycrlf utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - ** Failers -No match - a\x{85}b -No match - a\x0bb -No match - -/a\R?b/I8 -Capturing subpattern count = 0 -Options: bsr_unicode utf -First char = 'a' -Need char = 'b' - a\rb - 0: a\x{0d}b - a\nb - 0: a\x{0a}b - a\r\nb - 0: a\x{0d}\x{0a}b - a\x{85}b - 0: a\x{85}b - a\x0bb - 0: a\x{0b}b - ** Failers -No match - a\x{85}b\ -No match - a\x0bb\ -No match - -/X/8f - A\x{1ec5}ABCXYZ - 0: X - -/abcd*/8 - xxxxabcd\P - 0: abcd - xxxxabcd\P\P -Partial match: abcd - -/abcd*/i8 - xxxxabcd\P - 0: abcd - xxxxabcd\P\P -Partial match: abcd - XXXXABCD\P - 0: ABCD - XXXXABCD\P\P -Partial match: ABCD - -/abc\d*/8 - xxxxabc1\P - 0: abc1 - xxxxabc1\P\P -Partial match: abc1 - -/abc[de]*/8 - xxxxabcde\P - 0: abcde - xxxxabcde\P\P -Partial match: abcde - -/\bthe cat\b/8 - the cat\P - 0: the cat - the cat\P\P -Partial match: the cat - -/ab\Cde/8 - abXde -Error -16 (item unsupported for DFA matching) - -/(?<=ab\Cde)X/8 -Failed: \C not allowed in lookbehind assertion at offset 10 - -/./8 - \r\P - 0: \x{0d} - \r\P\P -Partial match: \x{0d} - -/.{2,3}/8 - \r\P -Partial match: \x{0d} - \r\P\P -Partial match: \x{0d} - \r\r\P - 0: \x{0d}\x{0d} - \r\r\P\P -Partial match: \x{0d}\x{0d} - \r\r\r\P - 0: \x{0d}\x{0d}\x{0d} - \r\r\r\P\P -Partial match: \x{0d}\x{0d}\x{0d} - -/.{2,3}?/8 - \r\P -Partial match: \x{0d} - \r\P\P -Partial match: \x{0d} - \r\r\P - 0: \x{0d}\x{0d} - \r\r\P\P -Partial match: \x{0d}\x{0d} - \r\r\r\P - 0: \x{0d}\x{0d}\x{0d} - 1: \x{0d}\x{0d} - \r\r\r\P\P -Partial match: \x{0d}\x{0d}\x{0d} - -/[^\x{100}]/8 - \x{100}\x{101}X - 0: \x{101} - -/[^\x{100}]+/8 - \x{100}\x{101}X - 0: \x{101}X - -/-- End of testinput9 --/ diff 
--git a/src/pcre/testdata/testoutputEBC b/src/pcre/testdata/testoutputEBC deleted file mode 100644 index 72b6fa3e..00000000 --- a/src/pcre/testdata/testoutputEBC +++ /dev/null @@ -1,188 +0,0 @@ -/-- This is a specialized test for checking, when PCRE is compiled with the -EBCDIC option but in an ASCII environment, that newline and white space -functionality is working. It catches cases where explicit values such as 0x0a -have been used instead of names like CHAR_LF. Needless to say, it is not a -genuine EBCDIC test! In patterns, alphabetic characters that follow a backslash -must be in EBCDIC code. In data, newlines and other spacing characters must be -in EBCDIC, but can be specified as escapes. --/ - -/-- Test default newline and variations --/ - -/^A/m - ABC - 0: A - 12\x15ABC - 0: A - -/^A/m - 12\x15ABC - 0: A - 12\x0dABC - 0: A - 12\x0d\x15ABC - 0: A - 12\x25ABC - 0: A - -/^A/m - 12\x15ABC - 0: A - 12\x0dABC - 0: A - 12\x0d\x15ABC - 0: A - ** Fail -No match - 12\x25ABC -No match - -/-- Test \h --/ - -/^A\ˆ/ - A B - 0: A\x20 - A\x41B - 0: AA - -/-- Test \H --/ - -/^A\È/ - AB - 0: AB - A\x42B - 0: AB - ** Fail -No match - A B -No match - A\x41B -No match - -/-- Test \R --/ - -/^A\Ù/ - A\x15B - 0: A\x15 - A\x0dB - 0: A\x0d - A\x25B - 0: A\x25 - A\x0bB - 0: A\x0b - A\x0cB - 0: A\x0c - ** Fail -No match - A B -No match - -/-- Test \v --/ - -/^A\¥/ - A\x15B - 0: A\x15 - A\x0dB - 0: A\x0d - A\x25B - 0: A\x25 - A\x0bB - 0: A\x0b - A\x0cB - 0: A\x0c - ** Fail -No match - A B -No match - -/-- Test \V --/ - -/^A\å/ - A B - 0: A\x20 - ** Fail -No match - A\x15B -No match - A\x0dB -No match - A\x25B -No match - A\x0bB -No match - A\x0cB -No match - -/-- For repeated items, use an atomic group so that the output is the same -for DFA matching (otherwise it may show multiple matches). 
--/ - -/-- Test \h+ --/ - -/^A(?>\ˆ+)/ - A B - 0: A\x20 - -/-- Test \H+ --/ - -/^A(?>\È+)/ - AB - 0: AB - ** Fail -No match - A B -No match - -/-- Test \R+ --/ - -/^A(?>\Ù+)/ - A\x15B - 0: A\x15 - A\x0dB - 0: A\x0d - A\x25B - 0: A\x25 - A\x0bB - 0: A\x0b - A\x0cB - 0: A\x0c - ** Fail -No match - A B -No match - -/-- Test \v+ --/ - -/^A(?>\¥+)/ - A\x15B - 0: A\x15 - A\x0dB - 0: A\x0d - A\x25B - 0: A\x25 - A\x0bB - 0: A\x0b - A\x0cB - 0: A\x0c - ** Fail -No match - A B -No match - -/-- Test \V+ --/ - -/^A(?>\å+)/ - A B - 0: A\x20B - ** Fail -No match - A\x15B -No match - A\x0dB -No match - A\x25B -No match - A\x0bB -No match - A\x0cB -No match - -/-- End --/ diff --git a/src/pcre/testdata/wintestinput3 b/src/pcre/testdata/wintestinput3 deleted file mode 100644 index 04e76a6d..00000000 --- a/src/pcre/testdata/wintestinput3 +++ /dev/null @@ -1,91 +0,0 @@ -/^[\w]+/ - *** Failers - École - -/^[\w]+/Lfrench - École - -/^[\w]+/ - *** Failers - École - -/^[\W]+/ - École - -/^[\W]+/Lfrench - *** Failers - École - -/[\b]/ - \b - *** Failers - a - -/[\b]/Lfrench - \b - *** Failers - a - -/^\w+/ - *** Failers - École - -/^\w+/Lfrench - École - -/(.+)\b(.+)/ - École - -/(.+)\b(.+)/Lfrench - *** Failers - École - -/École/i - École - *** Failers - école - -/École/iLfrench - École - école - -/\w/IS - -/\w/ISLfrench - -/^[\xc8-\xc9]/iLfrench - École - école - -/^[\xc8-\xc9]/Lfrench - École - *** Failers - école - -/\W+/Lfrench - >>>\xaa<<< - >>>\xba<<< - -/[\W]+/Lfrench - >>>\xaa<<< - >>>\xba<<< - -/[^[:alpha:]]+/Lfrench - >>>\xaa<<< - >>>\xba<<< - -/\w+/Lfrench - >>>\xaa<<< - >>>\xba<<< - -/[\w]+/Lfrench - >>>\xaa<<< - >>>\xba<<< - -/[[:alpha:]]+/Lfrench - >>>\xaa<<< - >>>\xba<<< - -/[[:alpha:]][[:lower:]][[:upper:]]/DZLfrench - -/ End of testinput3 / diff --git a/src/pcre/132html b/src/pcre2/132html similarity index 89% rename from src/pcre/132html rename to src/pcre2/132html index e0002497..1bd62ba2 100755 --- a/src/pcre/132html +++ b/src/pcre2/132html @@ -1,6 +1,6 @@ #! 
/usr/bin/perl -w -# Script to turn PCRE man pages into HTML +# Script to turn PCRE2 man pages into HTML # Subroutine to handle font changes and other escapes @@ -63,12 +63,12 @@ print <

    $ARGV[0] man page

    -Return to the PCRE index page. +Return to the PCRE2 index page.

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong.
    End @@ -82,7 +82,7 @@ while () if (/^\./) { - # Some of the PCRE man pages used to contain instances of .br. However, + # Some of the PCRE2 man pages used to contain instances of .br. However, # they should have all been removed because they cause trouble in some # (other) automated systems that translate man pages to HTML. Complain if # we find .br or .in (another macro that is deprecated). @@ -109,8 +109,9 @@ while () # Handling .sp is subtle. If it is inside a literal section, do nothing if # the next line is a non literal text line; similarly, if not inside a # literal section, do nothing if a literal follows, unless we are inside - # a .nf/.ne section. The point being that the

     and 
    that delimit - # literal sections will do the spacing. Always skip if no previous output. + # a .nf/.fi section or about to enter one. The point being that the
    +    # and 
    that delimit literal sections will do the spacing. Always skip + # if no previous output. elsif (/^\.sp/) { @@ -123,7 +124,7 @@ while () } else { - print TEMP "
    \n
    \n" if ($innf || !/^[\s.]/); + print TEMP "
    \n
    \n" if ($innf || /^\.nf/ || !/^[\s.]/); } redo; # Now process the lookahead line we just read } @@ -232,7 +233,7 @@ while () redo; # Process the joined lines } - # .EX/.EE are used in the pcredemo page to bracket the entire program, + # .EX/.EE are used in the pcre2demo page to bracket the entire program, # which is unmodified except for turning backslash into "\e". elsif (/^\.EX\s*$/) @@ -303,7 +304,7 @@ print while (); print < -Return to the PCRE index page. +Return to the PCRE2 index page.

    End diff --git a/src/pcre2/AUTHORS b/src/pcre2/AUTHORS new file mode 100644 index 00000000..c61b5f3a --- /dev/null +++ b/src/pcre2/AUTHORS @@ -0,0 +1,36 @@ +THE MAIN PCRE2 LIBRARY CODE +--------------------------- + +Written by: Philip Hazel +Email local part: Philip.Hazel +Email domain: gmail.com + +University of Cambridge Computing Service, +Cambridge, England. + +Copyright (c) 1997-2021 University of Cambridge +All rights reserved + + +PCRE2 JUST-IN-TIME COMPILATION SUPPORT +-------------------------------------- + +Written by: Zoltan Herczeg +Email local part: hzmester +Emain domain: freemail.hu + +Copyright(c) 2010-2021 Zoltan Herczeg +All rights reserved. + + +STACK-LESS JUST-IN-TIME COMPILER +-------------------------------- + +Written by: Zoltan Herczeg +Email local part: hzmester +Emain domain: freemail.hu + +Copyright(c) 2009-2021 Zoltan Herczeg +All rights reserved. + +#### diff --git a/src/pcre2/CMakeLists.txt b/src/pcre2/CMakeLists.txt new file mode 100644 index 00000000..71ba693d --- /dev/null +++ b/src/pcre2/CMakeLists.txt @@ -0,0 +1,1017 @@ +# CMakeLists.txt +# +# This file enables PCRE2 to be built with the CMake configuration and build +# tool. Download CMake in source or binary form from http://www.cmake.org/ +# Converted to support PCRE2 from the original PCRE file, August 2014. +# +# Original listfile by Christian Ehrlicher +# Refined and expanded by Daniel Richard G. +# 2007-09-14 mod by Sheri so 7.4 supported configuration options can be entered +# 2007-09-19 Adjusted by PH to retain previous default settings +# 2007-12-26 (a) On UNIX, use names libpcre instead of just pcre +# (b) Ensure pcretest and pcregrep link with the local library, +# not a previously-installed one. +# (c) Add PCRE_SUPPORT_LIBREADLINE, PCRE_SUPPORT_LIBZ, and +# PCRE_SUPPORT_LIBBZ2. +# 2008-01-20 Brought up to date to include several new features by Christian +# Ehrlicher. 
+# 2008-01-22 Sheri added options for backward compatibility of library names +# when building with minGW: +# if "ON", NON_STANDARD_LIB_PREFIX causes shared libraries to +# be built without "lib" as prefix. (The libraries will be named +# pcre.dll, pcreposix.dll and pcrecpp.dll). +# if "ON", NON_STANDARD_LIB_SUFFIX causes shared libraries to +# be built with suffix of "-0.dll". (The libraries will be named +# libpcre-0.dll, libpcreposix-0.dll and libpcrecpp-0.dll - same names +# built by default with Configure and Make. +# 2008-01-23 PH removed the automatic build of pcredemo. +# 2008-04-22 PH modified READLINE support so it finds NCURSES when needed. +# 2008-07-03 PH updated for revised UCP property support (change of files) +# 2009-03-23 PH applied Steven Van Ingelgem's patch to change the name +# CMAKE_BINARY_DIR to PROJECT_BINARY_DIR so that it works when PCRE +# is included within another project. +# 2009-03-23 PH applied a modified version of Steven Van Ingelgem's patches to +# add options to stop the building of pcregrep and the tests, and +# to disable the final configuration report. +# 2009-04-11 PH applied Christian Ehrlicher's patch to show compiler flags that +# are set by specifying a release type. 
+# 2010-01-02 PH added test for stdint.h +# 2010-03-02 PH added test for inttypes.h +# 2011-08-01 PH added PCREGREP_BUFSIZE +# 2011-08-22 PH added PCRE_SUPPORT_JIT +# 2011-09-06 PH modified WIN32 ADD_TEST line as suggested by Sergey Cherepanov +# 2011-09-06 PH added PCRE_SUPPORT_PCREGREP_JIT +# 2011-10-04 Sheri added support for including coff data in windows shared libraries +# compiled with MINGW if pcre.rc and/or pcreposix.rc are placed in +# the source dir by the user prior to building +# 2011-10-04 Sheri changed various add_test's to use exes' location built instead +# of DEBUG location only (likely only matters in MSVC) +# 2011-10-04 Sheri added scripts to provide needed variables to RunTest and +# RunGrepTest (used for UNIX and Msys) +# 2011-10-04 Sheri added scripts to provide needed variables and to execute +# RunTest.bat in Win32 (for effortless testing with "make test") +# 2011-10-04 Sheri Increased minimum required cmake version +# 2012-01-06 PH removed pcre_info.c and added pcre_string_utils.c +# 2012-01-10 Zoltan Herczeg added libpcre16 support +# 2012-01-13 Stephen Kelly added out of source build support +# 2012-01-17 PH applied Stephen Kelly's patch to parse the version data out +# of the configure.ac file +# 2012-02-26 PH added support for libedit +# 2012-09-06 PH added support for PCRE_EBCDIC_NL25 +# 2012-09-08 ChPe added PCRE32 support +# 2012-10-23 PH added support for VALGRIND and GCOV +# 2012-12-08 PH added patch from Daniel Richard G to quash some MSVC warnings +# 2013-07-01 PH realized that the "support" for GCOV was a total nonsense and +# so it has been removed. +# 2013-10-08 PH got rid of the "source" command, which is a bash-ism (use ".") +# 2013-11-05 PH added support for PARENS_NEST_LIMIT +# 2014-08-29 PH converted the file for PCRE2 (which has no C++). 
+# 2015-04-24 PH added support for PCRE2_DEBUG +# 2015-07-16 PH updated for new pcre2_find_bracket source module +# 2015-08-24 PH correct C_FLAGS setting (patch from Roy Ivy III) +# 2015-10-16 PH added support for never-backslash-C +# 2016-03-01 PH applied Chris Wilson's patch for MSVC static +# 2016-06-24 PH applied Chris Wilson's second patch, putting the first under +# a new option instead of being unconditional. +# 2016-10-05 PH fixed a typo (PCRE should be PCRE2) in above patch +# fix by David Gaussmann +# 2016-10-07 PH added PCREGREP_MAX_BUFSIZE +# 2017-03-11 PH turned HEAP_MATCH_RECURSE into a NO-OP for 10.30 +# 2017-04-08 PH added HEAP_LIMIT +# 2017-06-15 ZH added SUPPORT_JIT_SEALLOC support +# 2018-06-19 PH added checks for stdint.h and inttypes.h (later removed) +# 2018-06-27 PH added Daniel's patch to increase the stack for MSVC +# 2018-11-14 PH removed unnecessary checks for stdint.h and inttypes.h +# 2018-11-16 PH added PCRE2GREP_SUPPORT_CALLOUT_FORK support and tidied +# 2019-02-16 PH hacked to avoid CMP0026 policy issue (see comments below) +# 2020-03-16 PH renamed dftables as pcre2_dftables (as elsewhere) +# 2020-03-24 PH changed CMAKE_MODULE_PATH definition to add, not replace +# 2020-04-08 Carlo added function check for secure_getenv, fixed strerror +# 2020-04-16 enh added check for __attribute__((uninitialized)) +# 2020-04-25 PH applied patches from Uwe Korn to support pkg-config and +# library versioning. +# 2020-04-25 Carlo added function check for mkostemp used in ProtExecAllocator +# 2020-04-28 PH added function check for memfd_create based on Carlo's patch +# 2020-05-25 PH added a check for Intel CET +# 2020-12-03 PH altered the definition of pcre2test as suggested by Daniel + +PROJECT(PCRE2 C) + +# Increased minimum to 2.8.5 to support GNUInstallDirs. +CMAKE_MINIMUM_REQUIRED(VERSION 2.8.5) + +# Set policy CMP0026 to avoid warnings for the use of LOCATION in +# GET_TARGET_PROPERTY. This should no longer be required.
+# CMAKE_POLICY(SET CMP0026 OLD) + +# For FindReadline.cmake. This was changed to allow setting CMAKE_MODULE_PATH +# on the command line. +# SET(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake) + +LIST(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake) + +INCLUDE_DIRECTORIES(${PROJECT_SOURCE_DIR}/src) + +# external packages +FIND_PACKAGE( BZip2 ) +FIND_PACKAGE( ZLIB ) +FIND_PACKAGE( Readline ) +FIND_PACKAGE( Editline ) + +# Configuration checks + +INCLUDE(CheckCSourceCompiles) +INCLUDE(CheckFunctionExists) +INCLUDE(CheckSymbolExists) +INCLUDE(CheckIncludeFile) +INCLUDE(CheckTypeSize) +INCLUDE(GNUInstallDirs) # for CMAKE_INSTALL_LIBDIR + +CHECK_INCLUDE_FILE(dirent.h HAVE_DIRENT_H) +CHECK_INCLUDE_FILE(stdint.h HAVE_STDINT_H) +CHECK_INCLUDE_FILE(inttypes.h HAVE_INTTYPES_H) +CHECK_INCLUDE_FILE(sys/stat.h HAVE_SYS_STAT_H) +CHECK_INCLUDE_FILE(sys/types.h HAVE_SYS_TYPES_H) +CHECK_INCLUDE_FILE(unistd.h HAVE_UNISTD_H) +CHECK_INCLUDE_FILE(windows.h HAVE_WINDOWS_H) + +CHECK_SYMBOL_EXISTS(bcopy "strings.h" HAVE_BCOPY) +CHECK_SYMBOL_EXISTS(memfd_create "sys/mman.h" HAVE_MEMFD_CREATE) +CHECK_SYMBOL_EXISTS(memmove "string.h" HAVE_MEMMOVE) +CHECK_SYMBOL_EXISTS(secure_getenv "stdlib.h" HAVE_SECURE_GETENV) +CHECK_SYMBOL_EXISTS(strerror "string.h" HAVE_STRERROR) + +set(ORIG_CMAKE_REQUIRED_FLAGS ${CMAKE_REQUIRED_FLAGS}) +set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -Werror") +CHECK_C_SOURCE_COMPILES( + "int main() { char buf[128] __attribute__((uninitialized)); (void)buf; return 0; }" + HAVE_ATTRIBUTE_UNINITIALIZED +) +set(CMAKE_REQUIRED_FLAGS ${ORIG_CMAKE_REQUIRED_FLAGS}) + +# Check whether Intel CET is enabled, and if so, adjust compiler flags. This +# code was written by PH, trying to imitate the logic from the autotools +# configuration. 
+ +CHECK_C_SOURCE_COMPILES( + "#ifndef __CET__ + #error CET is not enabled + #endif + int main() { return 0; }" + INTEL_CET_ENABLED +) + +IF (INTEL_CET_ENABLED) + SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mshstk") +ENDIF(INTEL_CET_ENABLED) + + + +# User-configurable options +# +# Note: CMakeSetup displays these in alphabetical order, regardless of +# the order we use here. + +SET(BUILD_SHARED_LIBS OFF CACHE BOOL + "Build shared libraries instead of static ones.") + +OPTION(PCRE2_BUILD_PCRE2_8 "Build 8 bit PCRE2 library" ON) + +OPTION(PCRE2_BUILD_PCRE2_16 "Build 16 bit PCRE2 library" OFF) + +OPTION(PCRE2_BUILD_PCRE2_32 "Build 32 bit PCRE2 library" OFF) + +OPTION(PCRE2_DEBUG "Include debugging code" OFF) + +OPTION(PCRE2_DISABLE_PERCENT_ZT "Disable the use of %zu and %td (rarely needed)" OFF) + +SET(PCRE2_EBCDIC OFF CACHE BOOL + "Use EBCDIC coding instead of ASCII. (This is rarely used outside of mainframe systems.)") + +SET(PCRE2_EBCDIC_NL25 OFF CACHE BOOL + "Use 0x25 as EBCDIC NL character instead of 0x15; implies EBCDIC.") + +SET(PCRE2_LINK_SIZE "2" CACHE STRING + "Internal link size (2, 3 or 4 allowed). See LINK_SIZE in config.h.in for details.") + +SET(PCRE2_PARENS_NEST_LIMIT "250" CACHE STRING + "Default nested parentheses limit. See PARENS_NEST_LIMIT in config.h.in for details.") + +SET(PCRE2_HEAP_LIMIT "20000000" CACHE STRING + "Default limit on heap memory (kibibytes). See HEAP_LIMIT in config.h.in for details.") + +SET(PCRE2_MATCH_LIMIT "10000000" CACHE STRING + "Default limit on internal looping. See MATCH_LIMIT in config.h.in for details.") + +SET(PCRE2_MATCH_LIMIT_DEPTH "MATCH_LIMIT" CACHE STRING + "Default limit on internal depth of search. See MATCH_LIMIT_DEPTH in config.h.in for details.") + +SET(PCRE2GREP_BUFSIZE "20480" CACHE STRING + "Buffer starting size parameter for pcre2grep. See PCRE2GREP_BUFSIZE in config.h.in for details.") + +SET(PCRE2GREP_MAX_BUFSIZE "1048576" CACHE STRING + "Buffer maximum size parameter for pcre2grep. 
See PCRE2GREP_MAX_BUFSIZE in config.h.in for details.") + +SET(PCRE2_NEWLINE "LF" CACHE STRING + "What to recognize as a newline (one of CR, LF, CRLF, ANY, ANYCRLF, NUL).") + +SET(PCRE2_HEAP_MATCH_RECURSE OFF CACHE BOOL + "Obsolete option: do not use") + +SET(PCRE2_SUPPORT_JIT OFF CACHE BOOL + "Enable support for Just-in-time compiling.") + +IF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD) + SET(PCRE2_SUPPORT_JIT_SEALLOC OFF CACHE BOOL + "Enable SELinux compatible execmem allocator in JIT (experimental).") +ELSE(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD) + SET(PCRE2_SUPPORT_JIT_SEALLOC IGNORE) +ENDIF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD) + +SET(PCRE2GREP_SUPPORT_JIT ON CACHE BOOL + "Enable use of Just-in-time compiling in pcre2grep.") + +SET(PCRE2GREP_SUPPORT_CALLOUT ON CACHE BOOL + "Enable callout string support in pcre2grep.") + +SET(PCRE2GREP_SUPPORT_CALLOUT_FORK ON CACHE BOOL + "Enable callout string fork support in pcre2grep.") + +SET(PCRE2_SUPPORT_UNICODE ON CACHE BOOL + "Enable support for Unicode and UTF-8/UTF-16/UTF-32 encoding.") + +SET(PCRE2_SUPPORT_BSR_ANYCRLF OFF CACHE BOOL + "ON=Backslash-R matches only LF CR and CRLF, OFF=Backslash-R matches all Unicode Linebreaks") + +SET(PCRE2_NEVER_BACKSLASH_C OFF CACHE BOOL + "If ON, backslash-C (upper case C) is locked out.") + +SET(PCRE2_SUPPORT_VALGRIND OFF CACHE BOOL + "Enable Valgrind support.") + +OPTION(PCRE2_SHOW_REPORT "Show the final configuration report" ON) +OPTION(PCRE2_BUILD_PCRE2GREP "Build pcre2grep" ON) +OPTION(PCRE2_BUILD_TESTS "Build the tests" ON) + +IF (MINGW) + OPTION(NON_STANDARD_LIB_PREFIX + "ON=Shared libraries built in mingw will be named pcre2.dll, etc., instead of libpcre2.dll, etc." + OFF) + + OPTION(NON_STANDARD_LIB_SUFFIX + "ON=Shared libraries built in mingw will be named libpcre2-0.dll, etc., instead of libpcre2.dll, etc." + OFF) +ENDIF(MINGW) + +IF(MSVC) + OPTION(PCRE2_STATIC_RUNTIME + "ON=Compile against the static runtime (/MT)." 
+ OFF) + OPTION(INSTALL_MSVC_PDB + "ON=Install .pdb files built by MSVC, if generated" + OFF) +ENDIF(MSVC) + +# bzip2 lib +IF(BZIP2_FOUND) + OPTION (PCRE2_SUPPORT_LIBBZ2 "Enable support for linking pcre2grep with libbz2." ON) +ENDIF(BZIP2_FOUND) +IF(PCRE2_SUPPORT_LIBBZ2) + INCLUDE_DIRECTORIES(${BZIP2_INCLUDE_DIR}) +ENDIF(PCRE2_SUPPORT_LIBBZ2) + +# zlib +IF(ZLIB_FOUND) + OPTION (PCRE2_SUPPORT_LIBZ "Enable support for linking pcre2grep with libz." ON) +ENDIF(ZLIB_FOUND) +IF(PCRE2_SUPPORT_LIBZ) + INCLUDE_DIRECTORIES(${ZLIB_INCLUDE_DIR}) +ENDIF(PCRE2_SUPPORT_LIBZ) + +# editline lib +IF(EDITLINE_FOUND) + OPTION (PCRE2_SUPPORT_LIBEDIT "Enable support for linking pcre2test with libedit." OFF) +ENDIF(EDITLINE_FOUND) +IF(PCRE2_SUPPORT_LIBEDIT) + INCLUDE_DIRECTORIES(${EDITLINE_INCLUDE_DIR}) +ENDIF(PCRE2_SUPPORT_LIBEDIT) + +# readline lib +IF(READLINE_FOUND) + OPTION (PCRE2_SUPPORT_LIBREADLINE "Enable support for linking pcre2test with libreadline." ON) +ENDIF(READLINE_FOUND) +IF(PCRE2_SUPPORT_LIBREADLINE) + INCLUDE_DIRECTORIES(${READLINE_INCLUDE_DIR}) +ENDIF(PCRE2_SUPPORT_LIBREADLINE) + +# Prepare build configuration + +IF(NOT BUILD_SHARED_LIBS) + SET(PCRE2_STATIC 1) +ENDIF(NOT BUILD_SHARED_LIBS) + +IF(NOT PCRE2_BUILD_PCRE2_8 AND NOT PCRE2_BUILD_PCRE2_16 AND NOT PCRE2_BUILD_PCRE2_32) + MESSAGE(FATAL_ERROR "At least one of PCRE2_BUILD_PCRE2_8, PCRE2_BUILD_PCRE2_16 or PCRE2_BUILD_PCRE2_32 must be enabled") +ENDIF(NOT PCRE2_BUILD_PCRE2_8 AND NOT PCRE2_BUILD_PCRE2_16 AND NOT PCRE2_BUILD_PCRE2_32) + +IF(PCRE2_BUILD_PCRE2_8) + SET(SUPPORT_PCRE2_8 1) +ENDIF(PCRE2_BUILD_PCRE2_8) + +IF(PCRE2_BUILD_PCRE2_16) + SET(SUPPORT_PCRE2_16 1) +ENDIF(PCRE2_BUILD_PCRE2_16) + +IF(PCRE2_BUILD_PCRE2_32) + SET(SUPPORT_PCRE2_32 1) +ENDIF(PCRE2_BUILD_PCRE2_32) + +IF(PCRE2_BUILD_PCRE2GREP AND NOT PCRE2_BUILD_PCRE2_8) + MESSAGE(STATUS "** PCRE2_BUILD_PCRE2_8 must be enabled for the pcre2grep program") + SET(PCRE2_BUILD_PCRE2GREP OFF) +ENDIF(PCRE2_BUILD_PCRE2GREP AND NOT PCRE2_BUILD_PCRE2_8) + 
+IF(PCRE2_SUPPORT_LIBREADLINE AND PCRE2_SUPPORT_LIBEDIT) + MESSAGE(FATAL_ERROR "Only one of libreadline or libeditline can be specified") +ENDIF(PCRE2_SUPPORT_LIBREADLINE AND PCRE2_SUPPORT_LIBEDIT) + +IF(PCRE2_SUPPORT_BSR_ANYCRLF) + SET(BSR_ANYCRLF 1) +ENDIF(PCRE2_SUPPORT_BSR_ANYCRLF) + +IF(PCRE2_NEVER_BACKSLASH_C) + SET(NEVER_BACKSLASH_C 1) +ENDIF(PCRE2_NEVER_BACKSLASH_C) + +IF(PCRE2_SUPPORT_UNICODE) + SET(SUPPORT_UNICODE 1) +ENDIF(PCRE2_SUPPORT_UNICODE) + +IF(PCRE2_SUPPORT_JIT) + SET(SUPPORT_JIT 1) +ENDIF(PCRE2_SUPPORT_JIT) + +IF(PCRE2_SUPPORT_JIT_SEALLOC) + SET(CMAKE_REQUIRED_DEFINITIONS -D_GNU_SOURCE) + CHECK_SYMBOL_EXISTS(mkostemp stdlib.h REQUIRED) + UNSET(CMAKE_REQUIRED_DEFINITIONS) + IF(${REQUIRED}) + IF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD) + ADD_DEFINITIONS(-D_GNU_SOURCE) + SET(SLJIT_PROT_EXECUTABLE_ALLOCATOR 1) + ELSE(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD) + MESSAGE(FATAL_ERROR "Your configuration is not supported") + ENDIF(${CMAKE_SYSTEM_NAME} MATCHES Linux|NetBSD) + ELSE(${REQUIRED}) + SET(PCRE2_SUPPORT_JIT_SEALLOC OFF) + ENDIF(${REQUIRED}) +ENDIF(PCRE2_SUPPORT_JIT_SEALLOC) + +IF(PCRE2GREP_SUPPORT_JIT) + SET(SUPPORT_PCRE2GREP_JIT 1) +ENDIF(PCRE2GREP_SUPPORT_JIT) + +IF(PCRE2GREP_SUPPORT_CALLOUT) + SET(SUPPORT_PCRE2GREP_CALLOUT 1) + IF(PCRE2GREP_SUPPORT_CALLOUT_FORK) + SET(SUPPORT_PCRE2GREP_CALLOUT_FORK 1) + ENDIF(PCRE2GREP_SUPPORT_CALLOUT_FORK) +ENDIF(PCRE2GREP_SUPPORT_CALLOUT) + +IF(PCRE2_SUPPORT_VALGRIND) + SET(SUPPORT_VALGRIND 1) +ENDIF(PCRE2_SUPPORT_VALGRIND) + +IF(PCRE2_DISABLE_PERCENT_ZT) + SET(DISABLE_PERCENT_ZT 1) +ENDIF(PCRE2_DISABLE_PERCENT_ZT) + +# This next one used to reference ${READLINE_LIBRARY}) +# but I was advised to add the NCURSES test as well, along with +# some modifications to cmake/FindReadline.cmake which should +# make it possible to override the default if necessary. 
PH + +IF(PCRE2_SUPPORT_LIBREADLINE) + SET(SUPPORT_LIBREADLINE 1) + SET(PCRE2TEST_LIBS ${READLINE_LIBRARY} ${NCURSES_LIBRARY}) +ENDIF(PCRE2_SUPPORT_LIBREADLINE) + +# libedit is a plug-compatible alternative to libreadline + +IF(PCRE2_SUPPORT_LIBEDIT) + SET(SUPPORT_LIBEDIT 1) + SET(PCRE2TEST_LIBS ${EDITLINE_LIBRARY} ${NCURSES_LIBRARY}) +ENDIF(PCRE2_SUPPORT_LIBEDIT) + +IF(PCRE2_SUPPORT_LIBZ) + SET(SUPPORT_LIBZ 1) + SET(PCRE2GREP_LIBS ${PCRE2GREP_LIBS} ${ZLIB_LIBRARIES}) +ENDIF(PCRE2_SUPPORT_LIBZ) + +IF(PCRE2_SUPPORT_LIBBZ2) + SET(SUPPORT_LIBBZ2 1) + SET(PCRE2GREP_LIBS ${PCRE2GREP_LIBS} ${BZIP2_LIBRARIES}) +ENDIF(PCRE2_SUPPORT_LIBBZ2) + +SET(NEWLINE_DEFAULT "") + +IF(PCRE2_NEWLINE STREQUAL "CR") + SET(NEWLINE_DEFAULT "1") +ENDIF(PCRE2_NEWLINE STREQUAL "CR") +IF(PCRE2_NEWLINE STREQUAL "LF") + SET(NEWLINE_DEFAULT "2") +ENDIF(PCRE2_NEWLINE STREQUAL "LF") +IF(PCRE2_NEWLINE STREQUAL "CRLF") + SET(NEWLINE_DEFAULT "3") +ENDIF(PCRE2_NEWLINE STREQUAL "CRLF") +IF(PCRE2_NEWLINE STREQUAL "ANY") + SET(NEWLINE_DEFAULT "4") +ENDIF(PCRE2_NEWLINE STREQUAL "ANY") +IF(PCRE2_NEWLINE STREQUAL "ANYCRLF") + SET(NEWLINE_DEFAULT "5") +ENDIF(PCRE2_NEWLINE STREQUAL "ANYCRLF") +IF(PCRE2_NEWLINE STREQUAL "NUL") + SET(NEWLINE_DEFAULT "6") +ENDIF(PCRE2_NEWLINE STREQUAL "NUL") + +IF(NEWLINE_DEFAULT STREQUAL "") + MESSAGE(FATAL_ERROR "The PCRE2_NEWLINE variable must be set to one of the following values: \"LF\", \"CR\", \"CRLF\", \"ANY\", \"ANYCRLF\", \"NUL\".") +ENDIF(NEWLINE_DEFAULT STREQUAL "") + +IF(PCRE2_EBCDIC) + SET(EBCDIC 1) +ENDIF(PCRE2_EBCDIC) + +IF(PCRE2_EBCDIC_NL25) + SET(EBCDIC 1) + SET(EBCDIC_NL25 1) +ENDIF(PCRE2_EBCDIC_NL25) + +# Output files + +CONFIGURE_FILE(config-cmake.h.in + ${PROJECT_BINARY_DIR}/config.h + @ONLY) + +# Parse version numbers and date out of configure.ac + +file(STRINGS ${PROJECT_SOURCE_DIR}/configure.ac + configure_lines + LIMIT_COUNT 50 # Read only the first 50 lines of the file +) + +set(SEARCHED_VARIABLES "pcre2_major" "pcre2_minor" "pcre2_prerelease" "pcre2_date" +
"libpcre2_posix_version" "libpcre2_8_version" "libpcre2_16_version" "libpcre2_32_version") +foreach(configure_line ${configure_lines}) + foreach(_substitution_variable ${SEARCHED_VARIABLES}) + string(TOUPPER ${_substitution_variable} _substitution_variable_upper) + if (NOT ${_substitution_variable_upper}) + string(REGEX MATCH "m4_define\\(${_substitution_variable}, *\\[(.*)\\]" MATCHED_STRING ${configure_line}) + if (CMAKE_MATCH_1) + set(${_substitution_variable_upper} ${CMAKE_MATCH_1}) + endif() + endif() + endforeach() +endforeach() + +macro(PARSE_LIB_VERSION VARIABLE_PREFIX) + string(REPLACE ":" ";" ${VARIABLE_PREFIX}_VERSION_LIST ${${VARIABLE_PREFIX}_VERSION}) + list(GET ${VARIABLE_PREFIX}_VERSION_LIST 0 ${VARIABLE_PREFIX}_VERSION_CURRENT) + list(GET ${VARIABLE_PREFIX}_VERSION_LIST 1 ${VARIABLE_PREFIX}_VERSION_REVISION) + list(GET ${VARIABLE_PREFIX}_VERSION_LIST 2 ${VARIABLE_PREFIX}_VERSION_AGE) + + math(EXPR ${VARIABLE_PREFIX}_SOVERSION "${${VARIABLE_PREFIX}_VERSION_CURRENT} - ${${VARIABLE_PREFIX}_VERSION_AGE}") + math(EXPR ${VARIABLE_PREFIX}_MACHO_COMPATIBILITY_VERSION "${${VARIABLE_PREFIX}_VERSION_CURRENT} + 1") + math(EXPR ${VARIABLE_PREFIX}_MACHO_CURRENT_VERSION "${${VARIABLE_PREFIX}_VERSION_CURRENT} + 1") + set(${VARIABLE_PREFIX}_MACHO_CURRENT_VERSION "${${VARIABLE_PREFIX}_MACHO_CURRENT_VERSION}.${${VARIABLE_PREFIX}_VERSION_REVISION}") + set(${VARIABLE_PREFIX}_VERSION "${${VARIABLE_PREFIX}_SOVERSION}.${${VARIABLE_PREFIX}_VERSION_AGE}.${${VARIABLE_PREFIX}_VERSION_REVISION}") +endmacro() + +PARSE_LIB_VERSION(LIBPCRE2_POSIX) +PARSE_LIB_VERSION(LIBPCRE2_8) +PARSE_LIB_VERSION(LIBPCRE2_16) +PARSE_LIB_VERSION(LIBPCRE2_32) + +CONFIGURE_FILE(src/pcre2.h.in + ${PROJECT_BINARY_DIR}/pcre2.h + @ONLY) + +# Make sure to not link debug libs +# against release libs and vice versa +IF(WIN32) + SET(CMAKE_DEBUG_POSTFIX "d") +ENDIF(WIN32) + +# Generate pkg-config files + +SET(PACKAGE_VERSION "${PCRE2_MAJOR}.${PCRE2_MINOR}") +SET(prefix ${CMAKE_INSTALL_PREFIX}) +
+SET(exec_prefix "\${prefix}") +SET(libdir "\${exec_prefix}/${CMAKE_INSTALL_LIBDIR}") +SET(includedir "\${prefix}/include") +IF(WIN32 AND (CMAKE_BUILD_TYPE MATCHES Debug)) + SET(LIB_POSTFIX ${CMAKE_DEBUG_POSTFIX}) +ENDIF() +CONFIGURE_FILE(libpcre2-posix.pc.in libpcre2-posix.pc @ONLY) +SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-posix.pc") + +IF(PCRE2_BUILD_PCRE2_8) + CONFIGURE_FILE(libpcre2-8.pc.in libpcre2-8.pc @ONLY) + SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-8.pc") + SET(enable_pcre2_8 "yes") +ELSE() + SET(enable_pcre2_8 "no") +ENDIF() + +IF(PCRE2_BUILD_PCRE2_16) + CONFIGURE_FILE(libpcre2-16.pc.in libpcre2-16.pc @ONLY) + SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-16.pc") + SET(enable_pcre2_16 "yes") +ELSE() + SET(enable_pcre2_16 "no") +ENDIF() + +IF(PCRE2_BUILD_PCRE2_32) + CONFIGURE_FILE(libpcre2-32.pc.in libpcre2-32.pc @ONLY) + SET(pkg_config_files ${pkg_config_files} "${CMAKE_CURRENT_BINARY_DIR}/libpcre2-32.pc") + SET(enable_pcre2_32 "yes") +ELSE() + SET(enable_pcre2_32 "no") +ENDIF() + +CONFIGURE_FILE(pcre2-config.in pcre2-config @ONLY) + +# Character table generation + +OPTION(PCRE2_REBUILD_CHARTABLES "Rebuild char tables" OFF) +IF(PCRE2_REBUILD_CHARTABLES) + ADD_EXECUTABLE(pcre2_dftables src/pcre2_dftables.c) + ADD_CUSTOM_COMMAND( + COMMENT "Generating character tables (pcre2_chartables.c) for current locale" + DEPENDS pcre2_dftables + COMMAND pcre2_dftables + ARGS ${PROJECT_BINARY_DIR}/pcre2_chartables.c + OUTPUT ${PROJECT_BINARY_DIR}/pcre2_chartables.c + ) +ELSE(PCRE2_REBUILD_CHARTABLES) + CONFIGURE_FILE(${PROJECT_SOURCE_DIR}/src/pcre2_chartables.c.dist + ${PROJECT_BINARY_DIR}/pcre2_chartables.c + COPYONLY) +ENDIF(PCRE2_REBUILD_CHARTABLES) + +# Source code + +SET(PCRE2_HEADERS ${PROJECT_BINARY_DIR}/pcre2.h) + +SET(PCRE2_SOURCES + src/pcre2_auto_possess.c + ${PROJECT_BINARY_DIR}/pcre2_chartables.c + src/pcre2_compile.c + src/pcre2_config.c + 
src/pcre2_context.c + src/pcre2_convert.c + src/pcre2_dfa_match.c + src/pcre2_error.c + src/pcre2_extuni.c + src/pcre2_find_bracket.c + src/pcre2_jit_compile.c + src/pcre2_maketables.c + src/pcre2_match.c + src/pcre2_match_data.c + src/pcre2_newline.c + src/pcre2_ord2utf.c + src/pcre2_pattern_info.c + src/pcre2_script_run.c + src/pcre2_serialize.c + src/pcre2_string_utils.c + src/pcre2_study.c + src/pcre2_substitute.c + src/pcre2_substring.c + src/pcre2_tables.c + src/pcre2_ucd.c + src/pcre2_valid_utf.c + src/pcre2_xclass.c +) + +SET(PCRE2POSIX_HEADERS src/pcre2posix.h) +SET(PCRE2POSIX_SOURCES src/pcre2posix.c) + +IF(MINGW AND NOT PCRE2_STATIC) +IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc) +ADD_CUSTOM_COMMAND(OUTPUT ${PROJECT_SOURCE_DIR}/pcre2.o +PRE-LINK +COMMAND windres ARGS pcre2.rc pcre2.o +WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} +COMMENT Using pcre2 coff info in mingw build) +SET(PCRE2_SOURCES + ${PCRE2_SOURCES} ${PROJECT_SOURCE_DIR}/pcre2.o +) +ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc) +IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc) +ADD_CUSTOM_COMMAND(OUTPUT ${PROJECT_SOURCE_DIR}/pcre2posix.o +PRE-LINK +COMMAND windres ARGS pcre2posix.rc pcre2posix.o +WORKING_DIRECTORY ${PROJECT_SOURCE_DIR} +COMMENT Using pcre2posix coff info in mingw build) +SET(PCRE2POSIX_SOURCES + ${PCRE2POSIX_SOURCES} ${PROJECT_SOURCE_DIR}/pcre2posix.o +) +ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc) +ENDIF(MINGW AND NOT PCRE2_STATIC) + +IF(MSVC AND NOT PCRE2_STATIC) +IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc) +SET(PCRE2_SOURCES + ${PCRE2_SOURCES} pcre2.rc) +ENDIF(EXISTS ${PROJECT_SOURCE_DIR}/pcre2.rc) +IF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc) +SET(PCRE2POSIX_SOURCES + ${PCRE2POSIX_SOURCES} pcre2posix.rc) +ENDIF (EXISTS ${PROJECT_SOURCE_DIR}/pcre2posix.rc) +ENDIF(MSVC AND NOT PCRE2_STATIC) + +# Fix static compilation with MSVC: https://bugs.exim.org/show_bug.cgi?id=1681 +# This code was taken from the CMake wiki, not from WebM. 
+ +IF(MSVC AND PCRE2_STATIC_RUNTIME) + MESSAGE(STATUS "** MSVC and PCRE2_STATIC_RUNTIME: modifying compiler flags to use static runtime library") + foreach(flag_var + CMAKE_C_FLAGS CMAKE_C_FLAGS_DEBUG CMAKE_C_FLAGS_RELEASE + CMAKE_C_FLAGS_MINSIZEREL CMAKE_C_FLAGS_RELWITHDEBINFO) + string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}") + endforeach() +ENDIF(MSVC AND PCRE2_STATIC_RUNTIME) + +# Build setup + +ADD_DEFINITIONS(-DHAVE_CONFIG_H) + +IF(MSVC) + ADD_DEFINITIONS(-D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS) +ENDIF(MSVC) + +SET(CMAKE_INCLUDE_CURRENT_DIR 1) + +SET(targets) + +# 8-bit library + +IF(PCRE2_BUILD_PCRE2_8) +ADD_LIBRARY(pcre2-8 ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h) +SET_TARGET_PROPERTIES(pcre2-8 PROPERTIES + COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8 + MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_8_MACHO_COMPATIBILITY_VERSION}" + MACHO_CURRENT_VERSION "${LIBPCRE2_8_MACHO_CURRENT_VERSION}" + VERSION ${LIBPCRE2_8_VERSION} + SOVERSION ${LIBPCRE2_8_SOVERSION}) +SET(targets ${targets} pcre2-8) +ADD_LIBRARY(pcre2-posix ${PCRE2POSIX_HEADERS} ${PCRE2POSIX_SOURCES}) +SET_TARGET_PROPERTIES(pcre2-posix PROPERTIES + COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8 + MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_POSIX_MACHO_COMPATIBILITY_VERSION}" + MACHO_CURRENT_VERSION "${LIBPCRE2_POSIX_MACHO_CURRENT_VERSION}" + VERSION ${LIBPCRE2_POSIX_VERSION} + SOVERSION ${LIBPCRE2_POSIX_SOVERSION}) +SET(targets ${targets} pcre2-posix) +TARGET_LINK_LIBRARIES(pcre2-posix pcre2-8) + +IF(MINGW AND NOT PCRE2_STATIC) + IF(NON_STANDARD_LIB_PREFIX) + SET_TARGET_PROPERTIES(pcre2-8 pcre2-posix PROPERTIES PREFIX "") + ENDIF(NON_STANDARD_LIB_PREFIX) + IF(NON_STANDARD_LIB_SUFFIX) + SET_TARGET_PROPERTIES(pcre2-8 pcre2-posix PROPERTIES SUFFIX "-0.dll") + ENDIF(NON_STANDARD_LIB_SUFFIX) +ENDIF(MINGW AND NOT PCRE2_STATIC) +ENDIF(PCRE2_BUILD_PCRE2_8) + +# 16-bit library + +IF(PCRE2_BUILD_PCRE2_16) +ADD_LIBRARY(pcre2-16 ${PCRE2_HEADERS} ${PCRE2_SOURCES} 
${PROJECT_BINARY_DIR}/config.h) +SET_TARGET_PROPERTIES(pcre2-16 PROPERTIES + COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=16 + MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_16_MACHO_COMPATIBILITY_VERSION}" + MACHO_CURRENT_VERSION "${LIBPCRE2_16_MACHO_CURRENT_VERSION}" + VERSION ${LIBPCRE2_16_VERSION} + SOVERSION ${LIBPCRE2_16_SOVERSION}) +SET(targets ${targets} pcre2-16) + +IF(MINGW AND NOT PCRE2_STATIC) + IF(NON_STANDARD_LIB_PREFIX) + SET_TARGET_PROPERTIES(pcre2-16 PROPERTIES PREFIX "") + ENDIF(NON_STANDARD_LIB_PREFIX) + IF(NON_STANDARD_LIB_SUFFIX) + SET_TARGET_PROPERTIES(pcre2-16 PROPERTIES SUFFIX "-0.dll") + ENDIF(NON_STANDARD_LIB_SUFFIX) +ENDIF(MINGW AND NOT PCRE2_STATIC) +ENDIF(PCRE2_BUILD_PCRE2_16) + +# 32-bit library + +IF(PCRE2_BUILD_PCRE2_32) +ADD_LIBRARY(pcre2-32 ${PCRE2_HEADERS} ${PCRE2_SOURCES} ${PROJECT_BINARY_DIR}/config.h) +SET_TARGET_PROPERTIES(pcre2-32 PROPERTIES + COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=32 + MACHO_COMPATIBILITY_VERSION "${LIBPCRE2_32_MACHO_COMPATIBILITY_VERSION}" + MACHO_CURRENT_VERSION "${LIBPCRE2_32_MACHO_CURRENT_VERSION}" + VERSION ${LIBPCRE2_32_VERSION} + SOVERSION ${LIBPCRE2_32_SOVERSION}) +SET(targets ${targets} pcre2-32) + +IF(MINGW AND NOT PCRE2_STATIC) + IF(NON_STANDARD_LIB_PREFIX) + SET_TARGET_PROPERTIES(pcre2-32 PROPERTIES PREFIX "") + ENDIF(NON_STANDARD_LIB_PREFIX) + IF(NON_STANDARD_LIB_SUFFIX) + SET_TARGET_PROPERTIES(pcre2-32 PROPERTIES SUFFIX "-0.dll") + ENDIF(NON_STANDARD_LIB_SUFFIX) +ENDIF(MINGW AND NOT PCRE2_STATIC) +ENDIF(PCRE2_BUILD_PCRE2_32) + +# Executables + +IF(PCRE2_BUILD_PCRE2GREP) + ADD_EXECUTABLE(pcre2grep src/pcre2grep.c) + SET_PROPERTY(TARGET pcre2grep + PROPERTY COMPILE_DEFINITIONS PCRE2_CODE_UNIT_WIDTH=8) + SET(targets ${targets} pcre2grep) + TARGET_LINK_LIBRARIES(pcre2grep pcre2-posix ${PCRE2GREP_LIBS}) +ENDIF(PCRE2_BUILD_PCRE2GREP) + +# Testing + +IF(PCRE2_BUILD_TESTS) + ENABLE_TESTING() + + SET(PCRE2TEST_SOURCES src/pcre2test.c) + + IF(MSVC) + # This is needed to avoid a stack overflow error in the
standard tests. The + # flag should be indicated with a forward-slash instead of a hyphen, but + # then CMake treats it as a file path. + SET(PCRE2TEST_LINKER_FLAGS -STACK:2500000) + ENDIF(MSVC) + + ADD_EXECUTABLE(pcre2test ${PCRE2TEST_SOURCES}) + SET(targets ${targets} pcre2test) + IF(PCRE2_BUILD_PCRE2_8) + LIST(APPEND PCRE2TEST_LIBS pcre2-posix pcre2-8) + ENDIF(PCRE2_BUILD_PCRE2_8) + IF(PCRE2_BUILD_PCRE2_16) + LIST(APPEND PCRE2TEST_LIBS pcre2-16) + ENDIF(PCRE2_BUILD_PCRE2_16) + IF(PCRE2_BUILD_PCRE2_32) + LIST(APPEND PCRE2TEST_LIBS pcre2-32) + ENDIF(PCRE2_BUILD_PCRE2_32) + TARGET_LINK_LIBRARIES(pcre2test ${PCRE2TEST_LIBS} ${PCRE2TEST_LINKER_FLAGS}) + + IF(PCRE2_SUPPORT_JIT) + ADD_EXECUTABLE(pcre2_jit_test src/pcre2_jit_test.c) + SET(targets ${targets} pcre2_jit_test) + SET(PCRE2_JIT_TEST_LIBS ) + IF(PCRE2_BUILD_PCRE2_8) + LIST(APPEND PCRE2_JIT_TEST_LIBS pcre2-8) + ENDIF(PCRE2_BUILD_PCRE2_8) + IF(PCRE2_BUILD_PCRE2_16) + LIST(APPEND PCRE2_JIT_TEST_LIBS pcre2-16) + ENDIF(PCRE2_BUILD_PCRE2_16) + IF(PCRE2_BUILD_PCRE2_32) + LIST(APPEND PCRE2_JIT_TEST_LIBS pcre2-32) + ENDIF(PCRE2_BUILD_PCRE2_32) + TARGET_LINK_LIBRARIES(pcre2_jit_test ${PCRE2_JIT_TEST_LIBS}) + ENDIF(PCRE2_SUPPORT_JIT) + + # exes in Debug location tested by the RunTest and RunGrepTest shell scripts + # via "make test" + + # The commented out code below provokes a warning about future removal + # of the facility, and requires policy CMP0026 to be set to "OLD". I have + # got fed-up with the warnings, but my plea for help on the mailing list + # produced no response. So, I've hacked. The new code below seems to work on + # Linux. 
+# IF(PCRE2_BUILD_PCRE2GREP) +# GET_TARGET_PROPERTY(PCRE2GREP_EXE pcre2grep DEBUG_LOCATION) +# ENDIF(PCRE2_BUILD_PCRE2GREP) +# +# GET_TARGET_PROPERTY(PCRE2TEST_EXE pcre2test DEBUG_LOCATION) + + IF(PCRE2_BUILD_PCRE2GREP) + SET(PCRE2GREP_EXE $<TARGET_FILE:pcre2grep>) + ENDIF(PCRE2_BUILD_PCRE2GREP) + + SET(PCRE2TEST_EXE $<TARGET_FILE:pcre2test>) + + +# ================================================= + # Write out a CTest configuration file + # + FILE(WRITE ${PROJECT_BINARY_DIR}/CTestCustom.ctest + "# This is a generated file. +MESSAGE(\"When testing is complete, review test output in the +\\\"${PROJECT_BINARY_DIR}/Testing/Temporary\\\" folder.\") +MESSAGE(\" \") +") + + FILE(WRITE ${PROJECT_BINARY_DIR}/pcre2_test.sh + "#! /bin/sh +# This is a generated file. +. ${PROJECT_SOURCE_DIR}/RunTest +if test \"$?\" != \"0\"; then exit 1; fi +# End +") + + IF(UNIX) + ADD_TEST(pcre2_test sh ${PROJECT_BINARY_DIR}/pcre2_test.sh) + ENDIF(UNIX) + + IF(PCRE2_BUILD_PCRE2GREP) + FILE(WRITE ${PROJECT_BINARY_DIR}/pcre2_grep_test.sh + "#! /bin/sh +# This is a generated file. +. ${PROJECT_SOURCE_DIR}/RunGrepTest +if test \"$?\" != \"0\"; then exit 1; fi +# End +") + + IF(UNIX) + ADD_TEST(pcre2_grep_test sh ${PROJECT_BINARY_DIR}/pcre2_grep_test.sh) + ENDIF(UNIX) + ENDIF(PCRE2_BUILD_PCRE2GREP) + + IF(WIN32) + # Provide environment for executing the bat file version of RunTest + FILE(TO_NATIVE_PATH ${PROJECT_SOURCE_DIR} winsrc) + FILE(TO_NATIVE_PATH ${PROJECT_BINARY_DIR} winbin) + FILE(TO_NATIVE_PATH ${PCRE2TEST_EXE} winexe) + + FILE(WRITE ${PROJECT_BINARY_DIR}/pcre2_test.bat + "\@REM This is a generated file. +\@echo off +setlocal +SET srcdir=\"${winsrc}\" +# The next line was replaced by the following one after a user comment.
+# SET pcre2test=\"${winexe}\" +SET pcre2test=\"${winbin}\\pcre2test.exe\" +if not [%CMAKE_CONFIG_TYPE%]==[] SET pcre2test=\"${winbin}\\%CMAKE_CONFIG_TYPE%\\pcre2test.exe\" +call %srcdir%\\RunTest.Bat +if errorlevel 1 exit /b 1 +echo RunTest.bat tests successfully completed +") + + ADD_TEST(NAME pcre2_test_bat + COMMAND pcre2_test.bat) + SET_TESTS_PROPERTIES(pcre2_test_bat PROPERTIES + PASS_REGULAR_EXPRESSION "RunTest\\.bat tests successfully completed") + + IF("$ENV{OSTYPE}" STREQUAL "msys") + # Both the sh and bat file versions of RunTest are run if make test is used + # in msys + ADD_TEST(pcre2_test_sh sh.exe ${PROJECT_BINARY_DIR}/pcre2_test.sh) + IF(PCRE2_BUILD_PCRE2GREP) + ADD_TEST(pcre2_grep_test sh.exe ${PROJECT_BINARY_DIR}/pcre2_grep_test.sh) + ENDIF(PCRE2_BUILD_PCRE2GREP) + ENDIF("$ENV{OSTYPE}" STREQUAL "msys") + ENDIF(WIN32) + + # Changed to accommodate testing whichever location was just built + + IF(PCRE2_SUPPORT_JIT) + ADD_TEST(pcre2_jit_test pcre2_jit_test) + ENDIF(PCRE2_SUPPORT_JIT) + +ENDIF(PCRE2_BUILD_TESTS) + +# Installation + +SET(CMAKE_INSTALL_ALWAYS 1) + +INSTALL(TARGETS ${targets} + RUNTIME DESTINATION bin + LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} + ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}) +INSTALL(FILES ${pkg_config_files} DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig) +INSTALL(FILES "${CMAKE_CURRENT_BINARY_DIR}/pcre2-config" + DESTINATION bin + # Set 0755 permissions + PERMISSIONS OWNER_WRITE OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE) + +INSTALL(FILES ${PCRE2_HEADERS} ${PCRE2POSIX_HEADERS} DESTINATION include) + +FILE(GLOB html ${PROJECT_SOURCE_DIR}/doc/html/*.html) +FILE(GLOB man1 ${PROJECT_SOURCE_DIR}/doc/*.1) +FILE(GLOB man3 ${PROJECT_SOURCE_DIR}/doc/*.3) + +FOREACH(man ${man3}) + GET_FILENAME_COMPONENT(man_tmp ${man} NAME) + SET(man3_new ${man3} ${man}) +ENDFOREACH(man ${man3}) +SET(man3 ${man3_new}) + +INSTALL(FILES ${man1} DESTINATION man/man1) +INSTALL(FILES ${man3} DESTINATION man/man3) 
+INSTALL(FILES ${html} DESTINATION share/doc/pcre2/html) + +IF(MSVC AND INSTALL_MSVC_PDB) + INSTALL(FILES ${PROJECT_BINARY_DIR}/pcre2.pdb + ${PROJECT_BINARY_DIR}/pcre2posix.pdb + DESTINATION bin + CONFIGURATIONS RelWithDebInfo) + INSTALL(FILES ${PROJECT_BINARY_DIR}/pcre2d.pdb + ${PROJECT_BINARY_DIR}/pcre2posixd.pdb + DESTINATION bin + CONFIGURATIONS Debug) +ENDIF(MSVC AND INSTALL_MSVC_PDB) + +# Help, only for nice output +IF(BUILD_SHARED_LIBS) + SET(BUILD_STATIC_LIBS OFF) +ELSE(BUILD_SHARED_LIBS) + SET(BUILD_STATIC_LIBS ON) +ENDIF(BUILD_SHARED_LIBS) + +IF(PCRE2_HEAP_MATCH_RECURSE) + MESSAGE(WARNING "HEAP_MATCH_RECURSE is obsolete and does nothing.") +ENDIF(PCRE2_HEAP_MATCH_RECURSE) + +IF(PCRE2_SHOW_REPORT) + STRING(TOUPPER "${CMAKE_BUILD_TYPE}" buildtype) + IF (CMAKE_C_FLAGS) + SET(cfsp " ") + ENDIF(CMAKE_C_FLAGS) + MESSAGE(STATUS "") + MESSAGE(STATUS "") + MESSAGE(STATUS "PCRE2-${PCRE2_MAJOR}.${PCRE2_MINOR} configuration summary:") + MESSAGE(STATUS "") + MESSAGE(STATUS " Install prefix .................. : ${CMAKE_INSTALL_PREFIX}") + MESSAGE(STATUS " C compiler ...................... : ${CMAKE_C_COMPILER}") + MESSAGE(STATUS " C compiler flags ................ : ${CMAKE_C_FLAGS}${cfsp}${CMAKE_C_FLAGS_${buildtype}}") + MESSAGE(STATUS "") + MESSAGE(STATUS " Build 8 bit PCRE2 library ....... : ${PCRE2_BUILD_PCRE2_8}") + MESSAGE(STATUS " Build 16 bit PCRE2 library ...... : ${PCRE2_BUILD_PCRE2_16}") + MESSAGE(STATUS " Build 32 bit PCRE2 library ...... : ${PCRE2_BUILD_PCRE2_32}") + MESSAGE(STATUS " Enable JIT compiling support .... : ${PCRE2_SUPPORT_JIT}") + MESSAGE(STATUS " Use SELinux allocator in JIT .... : ${PCRE2_SUPPORT_JIT_SEALLOC}") + MESSAGE(STATUS " Enable Unicode support .......... : ${PCRE2_SUPPORT_UNICODE}") + MESSAGE(STATUS " Newline char/sequence ........... : ${PCRE2_NEWLINE}") + MESSAGE(STATUS " \\R matches only ANYCRLF ......... : ${PCRE2_SUPPORT_BSR_ANYCRLF}") + MESSAGE(STATUS " \\C is disabled .................. 
: ${PCRE2_NEVER_BACKSLASH_C}") + MESSAGE(STATUS " EBCDIC coding ................... : ${PCRE2_EBCDIC}") + MESSAGE(STATUS " EBCDIC coding with NL=0x25 ...... : ${PCRE2_EBCDIC_NL25}") + MESSAGE(STATUS " Rebuild char tables ............. : ${PCRE2_REBUILD_CHARTABLES}") + MESSAGE(STATUS " Internal link size .............. : ${PCRE2_LINK_SIZE}") + MESSAGE(STATUS " Parentheses nest limit .......... : ${PCRE2_PARENS_NEST_LIMIT}") + MESSAGE(STATUS " Heap limit ...................... : ${PCRE2_HEAP_LIMIT}") + MESSAGE(STATUS " Match limit ..................... : ${PCRE2_MATCH_LIMIT}") + MESSAGE(STATUS " Match depth limit ............... : ${PCRE2_MATCH_LIMIT_DEPTH}") + MESSAGE(STATUS " Build shared libs ............... : ${BUILD_SHARED_LIBS}") + MESSAGE(STATUS " Build static libs ............... : ${BUILD_STATIC_LIBS}") + MESSAGE(STATUS " Build pcre2grep ................. : ${PCRE2_BUILD_PCRE2GREP}") + MESSAGE(STATUS " Enable JIT in pcre2grep ......... : ${PCRE2GREP_SUPPORT_JIT}") + MESSAGE(STATUS " Enable callouts in pcre2grep .... : ${PCRE2GREP_SUPPORT_CALLOUT}") + MESSAGE(STATUS " Enable callout fork in pcre2grep. : ${PCRE2GREP_SUPPORT_CALLOUT_FORK}") + MESSAGE(STATUS " Buffer size for pcre2grep ....... : ${PCRE2GREP_BUFSIZE}") + MESSAGE(STATUS " Build tests (implies pcre2test .. : ${PCRE2_BUILD_TESTS}") + MESSAGE(STATUS " and pcre2grep)") + IF(ZLIB_FOUND) + MESSAGE(STATUS " Link pcre2grep with libz ........ : ${PCRE2_SUPPORT_LIBZ}") + ELSE(ZLIB_FOUND) + MESSAGE(STATUS " Link pcre2grep with libz ........ : Library not found" ) + ENDIF(ZLIB_FOUND) + IF(BZIP2_FOUND) + MESSAGE(STATUS " Link pcre2grep with libbz2 ...... : ${PCRE2_SUPPORT_LIBBZ2}") + ELSE(BZIP2_FOUND) + MESSAGE(STATUS " Link pcre2grep with libbz2 ...... : Library not found" ) + ENDIF(BZIP2_FOUND) + IF(EDITLINE_FOUND) + MESSAGE(STATUS " Link pcre2test with libeditline . : ${PCRE2_SUPPORT_LIBEDIT}") + ELSE(EDITLINE_FOUND) + MESSAGE(STATUS " Link pcre2test with libeditline . 
: Library not found" ) + ENDIF(EDITLINE_FOUND) + IF(READLINE_FOUND) + MESSAGE(STATUS " Link pcre2test with libreadline . : ${PCRE2_SUPPORT_LIBREADLINE}") + ELSE(READLINE_FOUND) + MESSAGE(STATUS " Link pcre2test with libreadline . : Library not found" ) + ENDIF(READLINE_FOUND) + MESSAGE(STATUS " Support Valgrind .................: ${PCRE2_SUPPORT_VALGRIND}") + IF(PCRE2_DISABLE_PERCENT_ZT) + MESSAGE(STATUS " Use %zu and %td ..................: OFF" ) + ELSE(PCRE2_DISABLE_PERCENT_ZT) + MESSAGE(STATUS " Use %zu and %td ..................: AUTO" ) + ENDIF(PCRE2_DISABLE_PERCENT_ZT) + + IF(MINGW AND NOT PCRE2_STATIC) + MESSAGE(STATUS " Non-standard dll names (prefix) . : ${NON_STANDARD_LIB_PREFIX}") + MESSAGE(STATUS " Non-standard dll names (suffix) . : ${NON_STANDARD_LIB_SUFFIX}") + ENDIF(MINGW AND NOT PCRE2_STATIC) + + IF(MSVC) + MESSAGE(STATUS " Install MSVC .pdb files ..........: ${INSTALL_MSVC_PDB}") + ENDIF(MSVC) + + MESSAGE(STATUS "") +ENDIF(PCRE2_SHOW_REPORT) + +# end CMakeLists.txt diff --git a/src/pcre2/COPYING b/src/pcre2/COPYING new file mode 100644 index 00000000..c233950f --- /dev/null +++ b/src/pcre2/COPYING @@ -0,0 +1,5 @@ +PCRE2 LICENCE + +Please see the file LICENCE in the PCRE2 distribution for licensing details. + +End diff --git a/src/pcre2/ChangeLog b/src/pcre2/ChangeLog new file mode 100644 index 00000000..22f3afe9 --- /dev/null +++ b/src/pcre2/ChangeLog @@ -0,0 +1,2487 @@ +Change Log for PCRE2 +-------------------- + + +Version 10.37 26-May-2021 +------------------------- + +1. Change RunGrepTest to use tr instead of sed when testing with binary +zero bytes, because sed varies a lot from system to system and has problems +with binary zeros. This is from Bugzilla #2681. Patch from Jeremie +Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later: +it broke it for at least one version of Solaris, where tr can't handle binary +zeros. 
However, that system had /usr/xpg4/bin/tr installed, which works OK, so +RunGrepTest now checks for that command and uses it if found. + +2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem +with a NULL dereference. I don't think this case could ever occur in practice, +but I have put in a check in order to get rid of the compiler error. + +3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on +Windows. Patch from email@cs-ware.de fixes bugzilla #2688. + +4. Two bugs related to over-large numbers have been fixed so the behaviour is +now the same as Perl. + + (a) A pattern such as /\214748364/ gave an overflow error instead of being + treated as the octal number \214 followed by literal digits. + + (b) A sequence such as {65536 that has no terminating } so is not a + quantifier was nevertheless complaining that a quantifier number was too big. + +5. A run of autoconf suggested that configure.ac was out-of-date with respect +to the latest autoconf. Running autoupdate made some valid changes, some valid +suggestions, and also some invalid changes, which were fixed by hand. Autoconf +now runs clean and the resulting "configure" seems to work, so I hope nothing +is broken. Later: the requirement for autoconf 2.70 broke some automatic test +robots. It doesn't seem to be necessary: trying a reduction to 2.60. + +6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave +the answer "bac", whereas Perl and JIT both yield "c". This was because the +effect of \K was not propagating back from the full pattern recursion. Other +recursions such as /(a\K.(?1)*)/ did not have this problem. + +7. Restore single character repetition optimization in JIT. Currently fewer +character repetitions are optimized than in 10.34. + +8. When the names of the functions in the POSIX wrapper were changed to +pcre2_regcomp() etc.
(see change 10.33 #4 below), functions with the original +names were left in the library so that pre-compiled programs would still work. +However, this has proved troublesome when programs link with several libraries, +some of which use PCRE2 via the POSIX interface while others use a native POSIX +library. For this reason, the POSIX function names are removed in this release. +The macros in pcre2posix.h should ensure that re-compiling fixes any programs +that haven't been compiled since before 10.33. + + +Version 10.36 04-December-2020 +------------------------------ + +1. Add CET_CFLAGS so that when Intel CET is enabled, pass -mshstk to +compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for +Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt +invented by PH. + +2. Fix infinite loop when a single byte newline is searched in JIT when +invalid utf8 mode is enabled. + +3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584): + + - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded + lib. This allows differentiation between lib and lib64. + CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for + pkgconfig file generation. + + - Add the version of PCRE2 to the configuration summary like ./configure + does. + + - Fix typo: MACTHED_STRING->MATCHED_STRING + +4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla +#2588): + + - Add escaped double quotes around include directory in CMakeLists.txt to + allow spaces in directory names. + + - This fixes a cmake error, if the path of the pcre2 source contains a space. + +5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's +documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS. +Moreover, these functions come from specific header files, which need to be +specified (and, thankfully, are the same on both the Linux and WinXX +platforms.) + +6.
Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c. + +7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for +debug Windows builds using CMake. This also updated configure so that it +generates *.pc files and pcre2-config with the same content, as in the past. + +8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a +single digit, the code unit beyond d was being read (i.e. there was a read +buffer overflow). Fixes ClusterFuzz 23779. + +9. After the rework in r1235, certain character ranges were incorrectly +handled by an optimization in JIT. Furthermore a wrong offset was used to +read a value from a buffer which could lead to memory overread. + +10. Unnoticed for many years was the fact that delimiters other than / in the +testinput1 and testinput4 files could cause incorrect behaviour when these +files were processed by perltest.sh. There were several tests that used quotes +as delimiters, and it was just luck that they didn't go wrong with perltest.sh. +All the patterns in testinput1 and testinput4 now use / as their delimiter. +This fixes Bugzilla #2641. + +11. Perl has started to give an error for \K within lookarounds (though there +are cases where it doesn't). PCRE2 still allows this, so the tests that include +this case have been moved from test 1 to test 2. + +12. Further to 10 above, pcre2test has been updated to detect and grumble if a +delimiter other than / is used after #perltest. + +13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS +was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding +the start of a match was not resetting correctly after a failed match on the +first valid fragment of the subject, possibly causing incorrect "no match" +returns on subsequent fragments. For example, the pattern /A/ failed to match +the subject \xe5A. Fixes Bugzilla #2642. + +14. 
Fixed a bug in character set matching when JIT is enabled and both unicode +scripts and unicode classes are present at the same time. + +15. Added GNU grep's -m (aka --max-count) option to pcre2grep. + +16. Refactored substitution processing in pcre2grep strings, both for the -O +option and when dealing with callouts. There is now a single function that +handles $ expansion in all cases (instead of multiple copies of almost +identical code). This means that the same escape sequences are available +everywhere, which was not previously the case. At the same time, the escape +sequences $x{...} and $o{...} have been introduced, to allow for characters +whose code points are greater than 255 in Unicode mode. + +17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit +test for a version of sed that can handle binary zero, instead of assuming that +any Linux version will work. Later: replaced $(...) by `...` because not all +shells recognize the former. + +18. Fixed a word boundary check bug in JIT when partial matching is enabled. + +19. Fix ARM64 compilation warning in JIT. Patch by Carlo. + +20. A bug in the RunTest script meant that if the first part of test 2 failed, +the failure was not reported. + +21. Test 2 was failing when run from a directory other than the source +directory. This failure was previously missed in RunTest because of 20 above. +Fixes added to both RunTest and RunTest.bat. + +22. Patch to CMakeLists.txt from Daniel to fix problem with testing under +Windows. + + +Version 10.35 09-May-2020 +--------------------------- + +1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT. + +2. Fix ARMv5 JIT improper handling of labels right after a constant pool. + +3. A JIT bug is fixed which allowed to read the fields of the compiled +pattern before its existence is checked. + +4. 
Back in the PCRE1 days, capturing groups that contained recursive back +references to themselves were made atomic (version 8.01, change 18) because +after the end of a repeated group, the captured substrings had their values from +the final repetition, not from an earlier repetition that might be the +destination of a backtrack. This feature was documented, and was carried over +into PCRE2. However, it has now been realized that the major refactoring that +was done for 10.30 has made this atomicizing unnecessary, and it is confusing +when users are unaware of it, making some patterns appear not to be working as +expected. Capture values of recursive back references in repeated groups are +now correctly backtracked, so this unnecessary restriction has been removed. + +5. Added PCRE2_SUBSTITUTE_LITERAL. + +6. Avoid some VS compiler warnings. + +7. Added PCRE2_SUBSTITUTE_MATCHED. + +8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another +regex engine. The Perl regex folks are aware of this usage and have made a note +about it. + +9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to +1, believing that repeating an assertion is pointless. However, if a positive +assertion contains capturing groups, repetition can be useful. In any case, an +assertion could always be wrapped in a repeated group. The only restriction +that is now imposed is that an unlimited maximum is changed to one more than +the minimum. + +10. Fix *THEN verbs in lookahead assertions in JIT. + +11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY. + +12. The JIT stack should be freed when the low-level stack allocation fails. + +13. In pcre2grep, if the final line in a scanned file is output but does not +end with a newline sequence, add a newline according to the --newline setting. + +14. (?(DEFINE)...) groups were not being handled correctly when checking for +the fixed length of a lookbehind assertion.
Such a group within a lookbehind +should be skipped, as it does not contribute to the length of the group. +Instead, the (DEFINE) group was being processed, and if at the end of the +lookbehind, that end was not correctly recognized. Errors such as "lookbehind +assertion is not fixed length" and also "internal error: bad code value in +parsed_skip()" could result. + +15. Put a limit of 1000 on recursive calls in pcre2_study() when searching +nested groups for starting code units, in order to avoid stack overflow issues. +If the limit is reached, it just gives up trying for this optimization. + +16. The control verb chain list must always be restored when exiting from a +recurse function in JIT. + +17. Fix a crash which occurs when the character type of an invalid UTF +character is decoded in JIT. + +18. Changes in many areas of the code so that when Unicode is supported and +PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for +upper/lower case computations on characters whose code points are greater than +127. + +19. The function for checking UTF-16 validity was returning an incorrect offset +for the start of the error when a high surrogate was not followed by a valid +low surrogate. This caused incorrect behaviour, for example when +PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the +invalid high surrogate, such as /aa/ matching "\x{d800}aa". + +20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern +could be mis-compiled and therefore not match correctly. This is the example +that found this: /(?(DEFINE)(?bar))(? has been raised to +50, (b) the new --om-capture option changes the limit, (c) an error is raised +if -o asks for a group that is above the limit. + +12. 
The quantifier {1} was always being ignored, but this is incorrect when it +is made possessive and applied to an item in parentheses, because a +parenthesized item may contain multiple branches or other backtracking points, +for example /(a|ab){1}+c/ or /(a+){1}+a/. + +13. For partial matches, pcre2test was always showing the maximum lookbehind +characters, flagged with "<", which is misleading when the lookbehind didn't +actually look behind the start (because it was later in the pattern). Showing +all consulted preceding characters for partial matches is now controlled by the +existing "allusedtext" modifier and, as for complete matches, this facility is +available only for non-JIT matching, because JIT does not maintain the first +and last consulted characters. + +14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match +if the end of the subject was encountered in a lookahead (conditional or +otherwise), an atomic group, or a recursion. + +15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero. + +16. Check for integer overflow when computing lookbehind lengths. Fixes +Clusterfuzz issue 15636. + +17. Implemented non-atomic positive lookaround assertions. + +18. If a lookbehind contained a lookahead that contained another lookbehind +within it, the nested lookbehind was not correctly processed. For example, if +/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching +"b". + +19. Implemented pcre2_get_match_data_size(). + +20. Two alterations to partial matching: + + (a) The definition of a partial match is slightly changed: if a pattern + contains any lookbehinds, an empty partial match may be given, because this + is another situation where adding characters to the current subject can + lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab". + + (b) Similarly, if a pattern could match an empty string, an empty partial + match may be given. Example: /(?![ab]).*/ with subject "ab". 
This case + applies only to PCRE2_PARTIAL_HARD. + + (c) An empty string partial hard match can be returned for \z and \Z as it + is documented that they shouldn't match. + +21. A branch that started with (*ACCEPT) was not being recognized as one that +could match an empty string. + +22. Corrected pcre2_set_character_tables() tables data type: was const unsigned +char * instead of const uint8_t *, as generated by pcre2_maketables(). + +23. Upgraded to Unicode 12.1.0. + +24. Add -jitfast command line option to pcre2test (to make all the jit options +available directly). + +25. Make pcre2test -C show if libreadline or libedit is supported. + +26. If the length of one branch of a group exceeded 65535 (the maximum value +that is remembered as a minimum length), the whole group's length was +incorrectly recorded as 65535, leading to incorrect "no match" when start-up +optimizations were in force. + +27. The "rightmost consulted character" value was not always correct; in +particular, if a pattern ended with a negative lookahead, characters that were +inspected in that lookahead were not included. + +28. Add the pcre2_maketables_free() function. + +29. The start-up optimization that looks for a unique initial matching +code unit in the interpretive engines uses memchr() in 8-bit mode. When the +search is caseless, it was doing so inefficiently, which ended up slowing down +the match drastically when the subject was very long. The revised code (a) +remembers if one case is not found, so it never repeats the search for that +case after a bumpalong and (b) when one case has been found, it searches only +up to that position for an earlier occurrence of the other case. This fix +applies to both interpretive pcre2_match() and to pcre2_dfa_match(). + +30. While scanning to find the minimum length of a group, if any branch has +minimum length zero, there is no need to scan any subsequent branches (a small +compile-time performance improvement). + +31. 
Installed a .gitignore file on a user's suggestion. When using the svn +repository with git (through git svn) this helps keep it tidy. + +32. Add underflow check in JIT which may occur when the value of subject +string pointer is close to 0. + +33. Arrange for classes such as [Aa] which contain just the two cases of the +same character, to be treated as a single caseless character. This causes the +first and required code unit optimizations to kick in where relevant. + +34. Improve the bitmap of starting bytes for positive classes that include wide +characters, but no property types, in UTF-8 mode. Previously, on encountering +such a class, the bits for all bytes greater than \xc4 were set, thus +specifying any character with codepoint >= 0x100. Now the only bits that are +set are for the relevant bytes that start the wide characters. This can give a +noticeable performance improvement. + +35. If the bitmap of starting code units contains only 1 or 2 bits, replace it +with a single starting code unit (1 bit) or a caseless single starting code +unit if the two relevant characters are case-partners. This is particularly +relevant to the 8-bit library, though it applies to all. It can give a +performance boost for patterns such as [Ww]ord and (word|WORD). However, this +optimization doesn't happen if there is a "required" code unit of the same +value (because the search for a "required" code unit starts at the match start +for non-unique first code unit patterns, but after a unique first code unit, +and patterns such as a*a need the former action). + +36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately +after a successful compile, instead of at the start of matching to avoid a +sanitizer complaint (regexec is supposed to be thread safe). + +37. Add NEON vectorization to JIT to speed up matching of first character and +pairs of characters on ARM64 CPUs. + +38. 
If a non-ASCII character was the first in a starting assertion in a +caseless match, the "first code unit" optimization did not get the casing +right, and the assertion failed to match a character in the other case if it +did not start with the same code unit. + +39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking +operation was incorrectly removed in r1136. Reported by Ralf Junker. + + +Version 10.33 16-April-2019 +--------------------------- + +1. Added "allvector" to pcre2test to make it easy to check the part of the +ovector that shouldn't be changed, in particular after substitute and failed or +partial matches. + +2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has +a greater than 1 fixed quantifier. This issue was found by Yunho Kim. + +3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but +prior to release, fixed a bug that caused a crash if pcre2_substitute() was +called with a NULL match context. + +4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper +functions that use the standard POSIX names. However, in pcre2posix.h the POSIX +names are defined as macros. This should help avoid linking with the wrong +library in some environments while still exporting the POSIX names for +pre-existing programs that use them. (The Debian alternative names are also +defined as macros, but not documented.) + +5. Fix an xclass matching issue in JIT. + +6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315). + +7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and +lookaround assertions, for example, (*pla:...) and (*atomic:...). These are +characterized by a lower case letter following (* and to simplify coding for +this, the character tables created by pcre2_maketables() were updated to add a +new "is lower case letter" bit. At the same time, the now unused "is +hexadecimal digit" bit was removed. 
The default tables in +src/pcre2_chartables.c.dist are updated. + +8. Implement the new Perl "script run" features (*script_run:...) and +(*atomic_script_run:...) aka (*sr:...) and (*asr:...). + +9. Fixed two typos in change 22 for 10.21, which added special handling for +ranges such as a-z in EBCDIC environments. The original code probably never +worked, though there were no bug reports. + +10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via +pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast +path. Also, when a match fails, set the subject field in the match data to NULL +for tidiness - none of the substring extractors should reference this after +match failure. + +11. If a pattern started with a subroutine call that had a quantifier with a +minimum of zero, an incorrect "match must start with this character" could be +recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to +be the first character of a match. + +12. The heap limit checking code in pcre2_dfa_match() could suffer from +overflow if the heap limit was set very large. This could cause incorrect "heap +limit exceeded" errors. + +13. Add "kibibytes" to the heap limit output from pcre2test -C to make the +units clear. + +14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness. + +15. Updated the VMS-specific code in pcre2test on the advice of a VMS user. + +16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from +pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32 +below was unnecessarily complicated, as inttypes.h is a Standard C header, +which is defined to be a superset of stdint.h. Instead of conditionally +including stdint.h or inttypes.h, pcre2.h now unconditionally includes +inttypes.h. This supports environments that do not have stdint.h but do have +inttypes.h, which are known to exist.
A note in the autotools documentation +says (November 2018) that there are none known that are the other way round. + +17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to +forcibly disable the use of %zu and %td in formatting strings because there is +at least one version of VMS that claims to be C99 but does not support these +modifiers. + +18. Added --disable-pcre2grep-callout-fork, which restricts the callout support +in pcre2grep to the inbuilt echo facility. This may be useful in environments +that do not support fork(). + +19. Fix two instances of <= 0 being applied to unsigned integers (the VMS +compiler complains). + +20. Added "fork" support for VMS to pcre2grep, for running an external program +via a string callout. + +21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel. + +22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN) +followed by ^ it was not recognized as anchored. + +23. The RunGrepTest script used to cut out the test of NUL characters for +Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD +systems can't either. I've inverted the test so that only those OS that are +known to work (currently only Linux) try to run this test. + +24. Some tests in RunGrepTest appended to testtrygrep from two different file +descriptors instead of redirecting stderr to stdout. This worked on Linux, but +it was reported not to on other systems, causing the tests to fail. + +25. In the RunTest script, make the test for stack setting use the same value +for the stack as it needs for -bigstack. + +26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning. + +26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s +which are valid in character classes, but not as the end of ranges, were being +treated as literals. An example is [_-\s] (but not [\s-_] because that gave an +error at the *start* of a range). 
Now an "invalid range" error is given +independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL. + +27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known escape +sequences such as \eX when they appeared invalidly in a character class. Now +the option applies only to unrecognized or malformed escape sequences. + +28. Fix word boundary in JIT compiler. Patch by Mike Munday. + +29. The pcre2_dfa_match() function was incorrectly handling conditional version +tests such as (?(VERSION>=0)...) when the version test was true. Incorrect +processing or a crash could result. + +30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group +names, as Perl does. There was a small bug in this new code, found by +ClusterFuzz 12950, fixed before release. + +31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh} +construct. + +32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits +from auto-anchoring if \p{Any}* starts a pattern. + +33. Compile invalid UTF check in JIT test when only pcre32 is enabled. + +34. For some time now, CMake has been warning about the setting of policy +CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be +removed in a future version. A request for CMake expertise on the list produced +no result, so I have now hacked CMakeLists.txt along the lines of some changes +I found on the Internet. The new code no longer needs the policy setting, and +it appears to work fine on Linux. + +35. Setting --enable-jit=auto for an out-of-tree build failed because the +source directory wasn't in the search path for AC_TRY_COMPILE always. Patch +from Ross Burton. + +36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available. +Patch by Guillem Jover. + +37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler +warnings were reported. + +38.
Using the clang compiler with sanitizing options causes runtime complaints +about truncation for statements such as x = ~x when x is an 8-bit value; it +seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x +gets rid of the warnings. There were also two missing casts in pcre2test. + + +Version 10.32 10-September-2018 +------------------------------- + +1. When matching using the REG_STARTEND feature of the POSIX API with a +non-zero starting offset, unset capturing groups with lower numbers than a +group that did capture something were not being correctly returned as "unset" +(that is, with offset values of -1). + +2. When matching using the POSIX API, pcre2test used to omit listing unset +groups altogether. Now it shows those that come before any actual captures as +"<unset>", as happens for non-POSIX matching. + +3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only", +whatever the build configuration was. It now correctly says "\R matches all +Unicode newlines" in the default case when --enable-bsr-anycrlf has not been +specified. Similarly, running "pcre2test -C bsr" never produced the result +ANY. + +4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing +multi-code-unit characters caused bad behaviour and possibly a crash. This +issue was fixed for other kinds of repeat in release 10.20 by change 19, but +repeating character classes were overlooked. + +5. pcre2grep now supports the inclusion of binary zeros in patterns that are +read from files via the -f option. + +6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2. + +7. Added --enable-jit=auto support to configure.ac. + +8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit +modes for the benefit of m68k, where pointers can be 16-bit aligned. The +dummies force 32-bit alignment and this ensures that the structure is a +multiple of PCRE2_SIZE, a requirement that is tested at compile time.
In other +architectures, alignment requirements take care of this automatically. + +9. When returning an error from pcre2_pattern_convert(), ensure the error +offset is set zero for early errors. + +10. A number of patches for Windows support from Daniel Richard G: + + (a) List of error numbers in Runtest.bat corrected (it was not the same as in + Runtest). + + (b) pcre2grep snprintf() workaround as used elsewhere in the tree. + + (c) Support for non-C99 snprintf() that returns -1 in the overflow case. + +11. Minor tidy of pcre2_dfa_match() code. + +12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer +use the stack for local workspace and local ovectors. Instead, an initial block +of stack is reserved, but if this is insufficient, heap memory is used. The +heap limit parameter now applies to pcre2_dfa_match(). + +13. If a "find limits" test of DFA matching in pcre2test resulted in too many +matches for the ovector, no matches were displayed. + +14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as +EOF. The test looks to have come from a fuzzer. + +15. If PCRE2 was built with a default match limit a lot greater than the +default default of 10 000 000, some JIT tests of the match limit no longer +failed. All such tests now set 10 000 000 as the upper limit. + +16. Another Windows related patch for pcregrep to ensure that WIN32 is +undefined under Cygwin. + +17. Test for the presence of stdint.h and inttypes.h in configure and CMake and +include whichever exists (stdint preferred) instead of unconditionally +including stdint. This makes life easier for old and non-standard systems. + +18. Further changes to improve portability, especially to old and or non- +standard systems: + + (a) Put all printf arguments in RunGrepTest into single, not double, quotes, + and use \0 not \x00 for binary zero. + + (b) Avoid the use of C++ (i.e. BCPL) // comments. + + (c) Parameterize the use of %zu in pcre2test to make it like %td. 
For both of + these now, if using MSVC or a standard C before C99, %lu is used with a + cast if necessary. + +19. Applied a contributed patch to CMakeLists.txt to increase the stack size +when linking pcre2test with MSVC. This gets rid of a stack overflow error in +the standard set of tests. + +20. Output a warning in pcre2test when ignoring the "altglobal" modifier when +it is given with the "replace" modifier. + +21. In both pcre2test and pcre2_substitute(), with global matching, a pattern +that matched an empty string, but never at the starting match offset, was not +handled in a Perl-compatible way. The pattern /(a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP +shouldn't find a MARK (because is in an atomic group), but it did. + +26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set +a list of modifiers for all subsequent patterns - only those that the script +recognizes are meaningful; (2) #subject lines can be used to set or unset a +default "mark" modifier; (3) Unsupported #command lines give a warning when +they are ignored; (4) Mark data is output only if the "mark" modifier is +present. + +27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported. + +28. A (*MARK) name was not being passed back for positive assertions that were +terminated by (*ACCEPT). + +29. Add support for \N{U+dddd}, but only in Unicode mode. + +30. Add support for (?^) for unsetting all imnsx options. + +31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose +code point was less than 256 and that were recognized by the lookup table +generated by pcre2_maketables(), which uses isspace() to identify white space. +Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085, +U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by +Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl. + +32. 
In certain circumstances, option settings within patterns were not being +correctly processed. For example, the pattern /((?i)A)(?m)B/ incorrectly +matched "ab". (The (?m) setting lost the fact that (?i) should be reset at the +end of its group during the parse process, but without another setting such as +(?m) the compile phase got it right.) This bug was introduced by the +refactoring in release 10.23. + +33. PCRE2 uses bcopy() if available when memmove() is not, and it used just to +define memmove() as function call to bcopy(). This hasn't been tested for a +long time because in pcre2test the result of memmove() was being used, whereas +bcopy() doesn't return a result. This feature is now refactored always to call +an emulation function when there is no memmove(). The emulation makes use of +bcopy() when available. + +34. When serializing a pattern, set the memctl, executable_jit, and tables +fields (that is, all the fields that contain pointers) to zeros so that the +result of serializing is always the same. These fields are re-set when the +pattern is deserialized. + +35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated +negative class with no characters less than 0x100 followed by a positive class +with only characters less than 0x100, the first class was incorrectly being +auto-possessified, causing incorrect match failures. + +36. Removed the character type bit ctype_meta, which dates from PCRE1 and is +not used in PCRE2. + +37. Tidied up unnecessarily complicated macros used in the escapes table. + +38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted +from distribution tarballs, owing to a typo in Makefile.am which had +testoutput8-16-3 twice. Now fixed. + +39. If the only branch in a conditional subpattern was anchored, the whole +subpattern was treated as anchored, when it should not have been, since the +assumed empty second branch cannot be anchored. 
Demonstrated by test patterns +such as /(?(1)^())b/ or /(?(?=^))b/. + +40. A repeated conditional subpattern that could match an empty string was +always assumed to be unanchored. Now it is checked just like any other +repeated conditional subpattern, and can be found to be anchored if the minimum +quantifier is one or more. I can't see much use for a repeated anchored +pattern, but the behaviour is now consistent. + +41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint +(for an event that could never occur but you had to have external information +to know that). + +42. If before the first match in a file that was being searched by pcre2grep +there was a line that was sufficiently long to cause the input buffer to be +expanded, the variable holding the location of the end of the previous match +was being adjusted incorrectly, and could cause an overflow warning from a code +sanitizer. However, as the value is used only to print pending "after" lines +when the next match is reached (and there are no such lines in this case) this +bug could do no damage. + + +Version 10.31 12-February-2018 +------------------------------ + +1. Fix typo (missing ]) in VMS code in pcre2test.c. + +2. Replace the replicated code for matching extended Unicode grapheme sequences +(which got a lot more complicated by change 10.30/49) by a single subroutine +that is called by both pcre2_match() and pcre2_dfa_match(). + +3. Add idempotent guard to pcre2_internal.h. + +4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and +PCRE2_CONFIG_COMPILED_WIDTHS. + +5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is +defined (e.g. by --enable-never-backslash-C). + +6. Defined public names for all the pcre2_compile() error numbers, and used +the public names in pcre2_convert.c. + +7. Fixed a small memory leak in pcre2test (convert contexts). + +8. Added two casts to compile.c and one to match.c to avoid compiler warnings. + +9.
Added code to pcre2grep when compiled under VMS to set the symbol +PCRE2GREP_RC to the exit status, because VMS does not distinguish between +exit(0) and exit(1). + +10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain +about a bad option only if the following argument item does not start with a +hyphen. + +11. pcre2grep was truncating components of file names to 128 characters when +processing files with the -r option, and also (some very odd code) truncating +path names to 512 characters. There is now a check on the absolute length of +full path file names, which may be up to 2047 characters long. + +12. When an assertion contained (*ACCEPT) it caused all open capturing groups +to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to +misbehaviour for subsequent references to groups that started outside the +assertion. ACCEPT in an assertion now closes only those groups that were +started within that assertion. Fixes oss-fuzz issues 3852 and 3891. + +13. Multiline matching in pcre2grep was misbehaving if the pattern matched +within a line, and then matched again at the end of the line and over into +subsequent lines. Behaviour was different with and without colouring, and +sometimes context lines were incorrectly printed and/or line endings were lost. +All these issues should now be fixed. + +14. If --line-buffered was specified for pcre2grep when input was from a +compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be +ignored for compressed files.) + +15. Although pcre2_jit_match checks whether the pattern is compiled +in a given mode, it was also expected that at least one mode is available. +This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION +when the pattern is not optimized by JIT at all. + +16. 
The line number and related variables such as match counts in pcre2grep +were all int variables, causing overflow when files with more than 2147483647 +lines were processed (assuming 32-bit ints). They have all been changed to +unsigned long ints. + +17. If a backreference with a minimum repeat count of zero was first in a +pattern, apart from assertions, an incorrect first matching character could be +recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set +as the first character of a match. + +18. Characters in a leading positive assertion are considered for recording a +first character of a match when the rest of the pattern does not provide one. +However, a character in a non-assertive group within a leading assertion such +as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an +infelicity rather than an outright bug, because it did not affect the result of +a match, just its speed. (In fact, in this case, the starting 'a' was +subsequently picked up in the study.) + +19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return" +instead of "RRETURN" saves unwinding the backtracks in these cases (only one +didn't). + +20. Allocate a single callout block on the stack at the start of pcre2_match() +and set its never-changing fields once only. Do the same for pcre2_dfa_match(). + +21. Save the extra compile options (set in the compile context) with the +compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS +to retrieve them, and update pcre2test to show them. + +22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new +field callout_flags in callout blocks. The bits are set by pcre2_match(), but +not by JIT or pcre2_dfa_match(). Their settings are shown in pcre2test callouts +if the callout_extra subject modifier is set. These bits are provided to help +with tracking how a backtracking match is proceeding. + +23. 
Updated the pcre2demo.c demonstration program, which was missing the extra +code for -g that handles the case when \K in an assertion causes the match to +end at the original start point. Also arranged for it to detect when \K causes +the end of a match to be before its start. + +24. Similar to 23 above, strange things (including loops) could happen in +pcre2grep when \K was used in an assertion when --colour was used or in +multiline mode. The "end at original start point" bug is fixed, and if the end +point is found to be before the start point, they are swapped. + +25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT +matching (both pcre2_match() and pcre2_dfa_match()) and the matched string +started with the first code unit of a newline sequence, matching failed because +it was not tried at the newline. + +26. Code for giving up a non-partial match after failing to find a starting +code unit anywhere in the subject was missing when searching for one of a +number of code units (the bitmap case) in both pcre2_match() and +pcre2_dfa_match(). This was a missing optimization rather than a bug. + +27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a +pointer argument rather than a code unit value. This should not have affected +the generated code. + +28. The JIT compiler has been updated. + +29. Avoid pointer overflow for unset captures in pcre2_substring_list_get(). +This could not actually cause a crash because it was always used in a memcpy() +call with zero length. + +30. Some internal structures have a variable-length ovector[] as their last +element. Their actual memory is obtained dynamically, giving an ovector of +appropriate length. However, they are defined in the structure as +ovector[NUMBER], where NUMBER is large so that array bound checkers don't +grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing +groups, making the ovector larger than this. 
The number has been increased to +131072, which allows for the maximum number of captures (65535) plus the +overall match. This fixes oss-fuzz issue 5415. + +31. Auto-possessification at the end of a capturing group was dependent on what +follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused +incorrect behaviour when the group was called recursively from elsewhere in the +pattern where something different might follow. This bug is an unforeseen +consequence of change #1 for 10.30 - the implementation of backtracking into +recursions. Iterators at the ends of capturing groups are no longer considered +for auto-possessification if the pattern contains any recursions. Fixes +Bugzilla #2232. + + +Version 10.30 14-August-2017 +---------------------------- + +1. The main interpreter, pcre2_match(), has been refactored into a new version +that does not use recursive function calls (and therefore the stack) for +remembering backtracking positions. This makes --disable-stack-for-recursion a +NOOP. The new implementation allows backtracking into recursive group calls in +patterns, making it more compatible with Perl, and also fixes some other +hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because +the old code had a number of fudges to try to reduce stack usage. It seems to +run no slower than the old code. + +A number of bugs in the refactored code were subsequently fixed during testing +before release, but after the code was made available in the repository. These +bugs were never in fully released code, but are noted here for the record. + + (a) If a pattern had fewer capturing parentheses than the ovector supplied in + the match data block, a memory error (detectable by ASAN) occurred after + a match, because the external block was being set from non-existent + internal ovector fields. Fixes oss-fuzz issue 781.
+ + (b) A pattern with very many capturing parentheses (when the internal frame + size was greater than the initial frame vector on the stack) caused a + crash. A vector on the heap is now set up at the start of matching if the + vector on the stack is not big enough to handle at least 10 frames. + Fixes oss-fuzz issue 783. + + (c) Handling of (*VERB)s in recursions was wrong in some cases. + + (d) Captures in negative assertions that were used as conditions were not + happening if the assertion matched via (*ACCEPT). + + (e) Mark values were not being passed out of recursions. + + (f) Refactor some code in do_callout() to avoid picky compiler warnings about + negative indices. Fixes oss-fuzz issue 1454. + + (g) Similarly refactor the way the variable length ovector is addressed for + similar reasons. Fixes oss-fuzz issue 1465. + +2. Now that pcre2_match() no longer uses recursive function calls (see above), +the "match limit recursion" value seems misnamed. It still exists, and limits +the depth of tree that is searched. To avoid future confusion, it has been +renamed as "depth limit" in all relevant places (--with-depth-limit, +(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still +available for backwards compatibility. + +3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers: + + (a) Check for malloc failures when getting memory for the ovector (POSIX) or + the match data block (non-POSIX). + +4. In the 32-bit library in non-UTF mode, an attempt to find a Unicode property +for a character with a code point greater than 0x10ffff (the Unicode maximum) +caused a crash. + +5. If a lookbehind assertion that contained a back reference to a group +appearing later in the pattern was compiled with the PCRE2_ANCHORED option, +undefined actions (often a segmentation fault) could occur, depending on what +other options were set. An example assertion is (?" should be ">=" in opcode check in pcre2_auto_possess.c. 
+ (b) Added some casts to avoid "suspicious implicit sign extension". + (c) Resource leaks in pcre2test in rare error cases. + (d) Avoid warning for never-use case OP_TABLE_LENGTH which is just a fudge + for checking at compile time that tables are the right size. + (e) Add missing "fall through" comment. + +29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features. + +30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this. + +31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in +pcre2test, a crash could occur. + +32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so +that all the tests can run with clang's sanitizing options. + +33. Implement extra compile options in the compile context and add the first +one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. + +34. Implement newline type PCRE2_NEWLINE_NUL. + +35. A lookbehind assertion that had a zero-length branch caused undefined +behaviour when processed by pcre2_dfa_match(). This is oss-fuzz issue 1859. + +36. The match limit value now also applies to pcre2_dfa_match() as there are +patterns that can use up a lot of resources without necessarily recursing very +deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761. + +37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL. + +38. Fix returned offsets from regexec() when REG_STARTEND is used with a +starting offset greater than zero. + +39. Implement REG_PEND (GNU extension) for the POSIX wrapper. + +40. Implement the subject_literal modifier in pcre2test, and allow jitstack on +pattern lines. + +41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC. + +42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit +of pcre2grep. + +43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL, +PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs: + + (a) The -F option did not work for fixed strings containing \E. 
+ (b) The -w option did not work for patterns with multiple branches. + +44. Added configuration options for the SELinux compatible execmem allocator in +JIT. + +45. Increased the limit for searching for a "must be present" code unit in +subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are +much faster. + +46. Arrange for anchored patterns to record and use "first code unit" data, +because this can give a fast "no match" without searching for a "required code +unit". Previously only non-anchored patterns did this. + +47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0. + +48. Add the callout_no_where modifier to pcre2test. + +49. Update extended grapheme breaking rules to the latest set that are in +Unicode Standard Annex #29. + +50. Added experimental foreign pattern conversion facilities +(pcre2_pattern_convert() and friends). + +51. Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE +is defined in a system header in cygwin. Also modified some of the #ifdefs in +pcre2grep related to Windows and Cygwin support. + +52. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a +character class is the last character in the class, Perl does not give a +warning. PCRE2 now also treats this as a literal. + +53. Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was +not doing so for [\d-X] (and similar escapes), as is documented. + +54. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard. + +55. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in +pcre2_compile() which could never actually trigger (code should have been cut +out when Unicode support is disabled). + + +Version 10.23 14-February-2017 +------------------------------ + +1. Extended pcre2test with the utf8_input modifier so that it is able to +generate all possible 16-bit and 32-bit code unit values in non-UTF modes. + +2. 
In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without +PCRE2_UCP set, a negative character type such as \D in a positive class should +cause all characters greater than 255 to match, whatever else is in the class. +There was a bug that caused this not to happen if a Unicode property item was +added to such a class, for example [\D\P{Nd}] or [\W\pL]. + +3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax +checking is now done in the pre-pass that identifies capturing groups. This has +reduced the amount of duplication and made the code tidier. While doing this, +some minor bugs and Perl incompatibilities were fixed, including: + + (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead + of giving an invalid quantifier error. + + (b) {0} can now be used after a group in a lookbehind assertion; previously + this caused an "assertion is not fixed length" error. + + (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with + the name "DEFINE" exists. PCRE2 now does likewise. + + (d) A recursion condition test such as (?(R2)...) must now refer to an + existing subpattern. + + (e) A conditional recursion test such as (?(R)...) misbehaved if there was a + group whose name began with "R". + + (f) When testing zero-terminated patterns under valgrind, the terminating + zero is now marked "no access". This catches bugs that would otherwise + show up only with non-zero-terminated patterns. + + (g) A hyphen appearing immediately after a POSIX character class (for example + /[[:ascii:]-z]/) now generates an error. Perl does accept this as a + literal, but gives a warning, so it seems best to fail it in PCRE. + + (h) An empty \Q\E sequence may appear after a callout that precedes an + assertion condition (it is, of course, ignored). 
+ +One effect of the refactoring is that some error numbers and messages have +changed, and the pattern offset given for compiling errors is not always the +right-most character that has been read. In particular, for a variable-length +lookbehind assertion it now points to the start of the assertion. Another +change is that when a callout appears before a group, the "length of next +pattern item" that is passed now just gives the length of the opening +parenthesis item, not the length of the whole group. A length of zero is now +given only for a callout at the end of the pattern. Automatic callouts are no +longer inserted before and after explicit callouts in the pattern. + +A number of bugs in the refactored code were subsequently fixed during testing +before release, but after the code was made available in the repository. Many +of the bugs were discovered by fuzzing testing. Several of them were related to +the change from assuming a zero-terminated pattern (which previously had +required non-zero terminated strings to be copied). These bugs were never in +fully released code, but are noted here for the record. + + (a) An overall recursion such as (?0) inside a lookbehind assertion was not + being diagnosed as an error. + + (b) In utf mode, the length of a *MARK (or other verb) name was being checked + in characters instead of code units, which could lead to bad code being + compiled, leading to unpredictable behaviour. + + (c) In extended /x mode, characters whose code was greater than 255 caused + a lookup outside one of the global tables. A similar bug existed for wide + characters in *VERB names. + + (d) The amount of memory needed for a compiled pattern was miscalculated if a + lookbehind contained more than one toplevel branch and the first branch + was of length zero. 
+ + (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero- + terminated pattern, if a # comment ran on to the end of the pattern, one + or more code units past the end were being read. + + (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g. + "{2,2") could cause reading beyond the pattern. + + (g) When reading a callout string, if the end delimiter was at the end of the + pattern one further code unit was read. + + (h) An unterminated number after \g' could cause reading beyond the pattern. + + (i) An insufficient memory size was being computed for compiling with + PCRE2_AUTO_CALLOUT. + + (j) A conditional group with an assertion condition used more memory than was + allowed for it during parsing, so too many of them could therefore + overrun a buffer. + + (k) If parsing a pattern exactly filled the buffer, the internal test for + overrun did not check when the final META_END item was added. + + (l) If a lookbehind contained a subroutine call, and the called group + contained an option setting such as (?s), and the PCRE2_ANCHORED option + was set, unpredictable behaviour could occur. The underlying bug was + incorrect code and insufficient checking while searching for the end of + the called subroutine in the parsed pattern. + + (m) Quantifiers following (*VERB)s were not being diagnosed as errors. + + (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and + PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour. + + (o) If \Q was preceded by a quantified item, and the following \E was + followed by '?' or '+', and there was at least one literal character + between them, an internal error "unexpected repeat" occurred (example: + /.+\QX\E+/). + + (p) A buffer overflow could occur while sorting the names in the group name + list (depending on the order in which the names were seen). 
+ + (q) A conditional group that started with a callout was not doing the right + check for a following assertion, leading to compiling bad code. Example: + /(?(C'XX))?!XX/ + + (r) If a character whose code point was greater than 0xffff appeared within + a lookbehind that was within another lookbehind, the calculation of the + lookbehind length went wrong and could provoke an internal error. + + (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused + an internal error. Now the hyphen is treated as a literal. + +4. Back references are now permitted in lookbehind assertions when there are +no duplicated group numbers (that is, (?| has not been used), and, if the +reference is by name, there is only one group of that name. The referenced +group must, of course, be of fixed length. + +5. pcre2test has been upgraded so that, when run under valgrind with valgrind +support enabled, reading past the end of the pattern is detected, both when +compiling and during callout processing. + +6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back +reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does +not recognize this syntax. + +7. Automatic callouts are no longer generated before and after callouts in the +pattern. + +8. When pcre2test was outputting information from a callout, the caret indicator +for the current position in the subject line was incorrect if it was after an +escape sequence for a character whose code point was greater than \x{ff}. + +9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be +PCRE2_STATIC_RUNTIME). Fix from David Gaussmann. + +10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer +expansion when long lines are encountered. Original patch by Dmitry +Cherniachenko. + +11.
If pcre2grep was compiled with JIT support, but the library was compiled +without it (something that neither ./configure nor CMake allows, but it can be +done by editing config.h), pcre2grep was giving a JIT error. Now it detects +this situation and does not try to use JIT. + +12. Added some "const" qualifiers to variables in pcre2grep. + +13. Added Dmitry Cherniachenko's patch for colouring output in Windows +(untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment +variables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found. + +14. Add the -t (grand total) option to pcre2grep. + +15. A number of bugs have been mended relating to match start-up optimizations +when the first thing in a pattern is a positive lookahead. These all applied +only when PCRE2_NO_START_OPTIMIZE was *not* set: + + (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed + both an initial 'X' and a following 'X'. + (b) Some patterns starting with an assertion that started with .* were + incorrectly optimized as having to match at the start of the subject or + after a newline. There are cases where this is not true, for example, + (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that + start with spaces. Starting .* in an assertion is no longer taken as an + indication of matching at the start (or after a newline). + +16. The "offset" modifier in pcre2test was not being ignored (as documented) +when the POSIX API was in use. + +17. Added --enable-fuzz-support to "configure", causing a non-installed +library containing a test function that can be called by fuzzers to be +compiled. A non-installed binary to run the test function locally, called +pcre2fuzzcheck is also compiled. + +18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and +which started with .* inside a positive lookahead was incorrectly being +compiled as implicitly anchored. + +19.
Removed all instances of "register" declarations, as they are considered +obsolete these days and in any case had become very haphazard. + +20. Add strerror() to pcre2test for failed file opening. + +21. Make pcre2test -C list valgrind support when it is enabled. + +22. Add the use_length modifier to pcre2test. + +23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and +'copy' modifiers. + +24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it +is apparently needed there as well as in the function definitions. (Why did +nobody ask for this in PCRE1?) + +25. Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to +PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more standard +compliant and unique. + +26. pcre2-config --libs-posix was listing -lpcre2posix instead of +-lpcre2-posix. Also, the CMake build process was building the library with the +wrong name. + +27. In pcre2test, give some offset information for errors in hex patterns. +This uses the C99 formatting sequence %td, except for MSVC which doesn't +support it - %lu is used instead. + +28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to +pcre2test for testing it. + +29. Fix small memory leak in pcre2test. + +30. Fix out-of-bounds read for partial matching of /./ against an empty string +when the newline type is CRLF. + +31. Fix a bug in pcre2test that caused a crash when a locale was set either in +the current pattern or a previous one and a wide character was matched. + +32. The appearance of \p, \P, or \X in a substitution string when +PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL +dereference). + +33. If the starting offset was specified as greater than the subject length in +a call to pcre2_substitute() an out-of-bounds memory reference could occur. + +34. 
When PCRE2 was compiled to use the heap instead of the stack for recursive +calls to match(), a repeated minimizing caseless back reference, or a +maximizing one where the two cases had different numbers of code units, +followed by a caseful back reference, could lose the caselessness of the first +repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX +but didn't). + +35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum +matching length and just records zero. Typically this happens when there are +too many nested or recursive back references. If the limit was reached in +certain recursive cases it failed to be triggered and an internal error could +be the result. + +36. The pcre2_dfa_match() function now takes note of the recursion limit for +the internal recursive calls that are used for lookrounds and recursions within +the pattern. + +37. More refactoring has got rid of the internal could_be_empty_branch() +function (around 400 lines of code, including comments) by keeping track of +could-be-emptiness as the pattern is compiled instead of scanning compiled +groups. (This would have been much harder before the refactoring of #3 above.) +This lifts a restriction on the number of branches in a group (more than about +1100 would give "pattern is too complicated"). + +38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern +auto_callout". + +39. In a library with Unicode support, incorrect data was compiled for a +pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide +characters to match (for example, /[\s[:^ascii:]]/). + +40. The callout_error modifier has been added to pcre2test to make it possible +to return PCRE2_ERROR_CALLOUT from a callout. + +41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of +"<esc>[00m". + +42.
The limit in the auto-possessification code that was intended to catch +overly-complicated patterns and not spend too much time auto-possessifying was +being reset too often, resulting in very long compile times for some patterns. +Now such patterns are no longer completely auto-possessified. + +43. Applied Jason Hood's revised patch for RunTest.bat. + +44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood. + +45. Minor cosmetic fix to pcre2test: move a variable that is not used under +Windows into the "not Windows" code. + +46. Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy +some of the code: + + * normalised the Windows condition by ensuring WIN32 is defined; + * enables the callout feature under Windows; + * adds globbing (Microsoft's implementation expands quoted args), + using a tweaked opendirectory; + * implements the is_*_tty functions for Windows; + * --color=always will write the ANSI sequences to file; + * add sequences 4 (underline works on Win10) and 5 (blink as bright + background, relatively standard on DOS/Win); + * remove the (char *) casts for the now-const strings; + * remove GREP_COLOUR (grep's command line allowed the 'u', but not + the environment), parsing GREP_COLORS instead; + * uses the current colour if not set, rather than black; + * add print_match for the undefined case; + * fixes a typo. + +In addition, colour settings containing anything other than digits and +semicolon are ignored, and the colour controls are no longer output for empty +strings. + +47. Detecting patterns that are too large inside the length-measuring loop +saves processing ridiculously long patterns to their end. + +48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it +just wastes time. In the UTF case it can also produce redundant entries in +XCLASS lists caused by characters with multiple other cases and pairs of +characters in the same "not-x" sublists. + +49. 
A pattern such as /(?=(a\K))/ can report the end of the match being before +its start; pcre2test was not handling this correctly when using the POSIX +interface (it was OK with the native interface). + +50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will +continue to work, falling back to interpretation if anything goes wrong with +JIT. + +51. Applied patches from Christian Persch to configure.ac to make use of the +AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT +modules. + +52. Minor fixes to pcre2grep from Jason Hood: + * fixed some spacing; + * Windows doesn't usually use single quotes, so I've added a define + to use appropriate quotes [in an example]; + * LC_ALL was displayed as "LCC_ALL"; + * numbers 11, 12 & 13 should end in "th"; + * use double quotes in usage message. + +53. When autopossessifying, skip empty branches without recursion, to reduce +stack usage for the benefit of clang with -fsanitize-address, which uses huge +stack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553. + +54. A pattern with very many explicit back references to a group that is a long +way from the start of the pattern could take a long time to compile because +searching for the referenced group in order to find the minimum length was +being done repeatedly. Now up to 128 group minimum lengths are cached and the +attempt to find a minimum length is abandoned if there is a back reference to a +group whose number is greater than 128. (In that case, the pattern is so +complicated that this optimization probably isn't worth it.) This fixes +oss-fuzz issue 557. + +55. Issue 32 for 10.22 below was not correctly fixed. If pcre2grep in multiline +mode with --only-matching matched several lines, it restarted scanning at the +next line instead of moving on to the end of the matched string, which can be +several lines after the start. + +56. 
Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line +with updates to the non-Windows version. + + + +Version 10.22 29-July-2016 +-------------------------- + +1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3 +to fix problems with running the tests under Windows. + +2. Implemented a facility for quoting literal characters within hexadecimal +patterns in pcre2test, to make it easier to create patterns with just a few +non-printing characters. + +3. Binary zeros are not supported in pcre2test input files. It now detects them +and gives an error. + +4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to +smc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so +that it matches only unknown objects. + +5. Updated the maintenance script maint/ManyConfigTests to make it easier to +select individual groups of tests. + +6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option +used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this +disables the use of back references (and subroutine calls), which are supported +by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no +longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch +and pmatch when regexec() is called. + +7. Because of 6 above, pcre2test has been modified with a new modifier called +posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture +modifier had this effect. That option is now ignored when the POSIX API is in +use. + +8. Minor tidies to the pcre2demo.c sample program, including more comments +about its 8-bit-ness. + +9. Detect unmatched closing parentheses and give the error in the pre-scan +instead of later. Previously the pre-scan carried on and could give a +misleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a +message about invalid duplicate group names. + +10.
It has happened that pcre2test was accidentally linked with another POSIX +regex library instead of libpcre2-posix. In this situation, a call to regcomp() +(in the other library) may succeed, returning zero, but of course putting its +own data into the regex_t block. In one example the re_pcre2_code field was +left as NULL, which made pcre2test think it had not got a compiled POSIX regex, +so it treated the next line as another pattern line, resulting in a confusing +error message. A check has been added to pcre2test to see if the data returned +from a successful call of regcomp() are valid for PCRE2's regcomp(). If they +are not, an error message is output and the pcre2test run is abandoned. The +message points out the possibility of a mis-linking. Hopefully this will avoid +some head-scratching the next time this happens. + +11. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind +assertion, caused pcre2test to output a very large number of spaces when the +callout was taken, making the program appear to loop. + +12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply +nested set of parentheses of sufficient size caused an overflow of the +compiling workspace (which was diagnosed, but of course is not desirable). + +13. Detect missing closing parentheses during the pre-pass for group +identification. + +14. Changed some integer variable types and put in a number of casts, following +a report of compiler warnings from Visual Studio 2013 and a few tests with +gcc's -Wconversion (which still throws up a lot). + +15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test +for testing it. + +16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of +regerror(). When the error buffer is too small, my version of snprintf() puts a +binary zero in the final byte.
Bug #1801 seems to show that other versions do +not do this, leading to bad output from pcre2test when it was checking for +buffer overflow. It no longer assumes a binary zero at the end of a too-small +regerror() buffer. + +17. Fixed typo ("&&" for "&") in pcre2_study(). Fortunately, this could not +actually affect anything, by sheer luck. + +18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect +"const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for +older MSVC compilers. This has been done both in src/pcre2_internal.h for most +of the library, and also in src/pcre2posix.c, which no longer includes +pcre2_internal.h (see 24 below). + +19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC +static compilation. Subsequently applied Chris Wilson's second patch, putting +the first patch under a new option instead of being unconditional when +PCRE_STATIC is set. + +20. Updated pcre2grep to set stdout as binary when run under Windows, so as not +to convert \r\n at the ends of reflected lines into \r\r\n. This required +ensuring that other output that is written to stdout (e.g. file names) uses the +appropriate line terminator: \r\n for Windows, \n otherwise. + +21. When a line is too long for pcre2grep's internal buffer, show the maximum +length in the error message. + +22. Added support for string callouts to pcre2grep (Zoltan's patch with PH +additions). + +23. RunTest.bat was missing a "set type" line for test 22. + +24. The pcre2posix.c file was including pcre2_internal.h, and using some +"private" knowledge of the data structures. This is unnecessary; the code has +been re-factored and no longer includes pcre2_internal.h. + +25. A race condition in JIT, reported by Mozilla, is fixed. + +26. Minor code refactor to avoid "array subscript is below array bounds" +compiler warning. + +27. Minor code refactor to avoid "left shift of negative number" warning. + +28.
Add a bit more sanity checking to pcre2_serialize_decode() and document +that it expects trusted data. + +29. Fix typo in pcre2_jit_test.c + +30. Due to an oversight, pcre2grep was not making use of JIT when available. +This is now fixed. + +31. The RunGrepTest script is updated to use the valgrind suppressions file +when testing with JIT under valgrind (compare 10.21/51 below). The suppressions +file is updated so that it is now the same as for PCRE1: it suppresses the +Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled +code). Also changed smc-check=all to smc-check=all-non-file as was done for +RunTest (see 4 above). + +32. Implemented the PCRE2_NO_JIT option for pcre2_match(). + +33. Fix typo that gave a compiler error when JIT not supported. + +34. Fix comment describing the returns from find_fixedlength(). + +35. Fix potential negative index in pcre2test. + +36. Calls to pcre2_get_error_message() with error numbers that are never +returned by PCRE2 functions were returning empty strings. Now the error code +PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to +show the texts for given error numbers (i.e. to call pcre2_get_error_message() +and display what it returns) and a few representative error codes are now +checked in RunTest. + +37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in +pcre2_match.c, in anticipation that this is needed for the same reason it was +recently added to pcrecpp.cc in PCRE1. + +38. Using -o with -M in pcre2grep could cause unnecessary repeated output when +the match extended over a line boundary, as it tried to find more matches "on +the same line" - but it was already over the end. + +39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it +to the same code as '.' when PCRE2_DOTALL is set). + +40. Fix two clang compiler warnings in pcre2test when only one code unit width +is supported. + +41.
Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack +if it fails when running the interpreter with a 16MiB stack (and if changing +the stack size via pcre2test is possible). This avoids having to manually set a +large stack size when testing with clang. + +42. Fix register overwrite in JIT when SSE2 acceleration is enabled. + +43. Detect integer overflow in pcre2test pattern and data repetition counts. + +44. In pcre2test, ignore "allcaptures" after DFA matching. + +45. Fix unaligned accesses on x86. Patch by Marc Mutz. + +46. Fix some more clang compiler warnings. + + +Version 10.21 12-January-2016 +----------------------------- + +1. Improve matching speed of patterns starting with + or * in JIT. + +2. Use memchr() to find the first character in an unanchored match in 8-bit +mode in the interpreter. This gives a significant speed improvement. + +3. Removed a redundant copy of the opcode_possessify table in the +pcre2_auto_possessify.c source. + +4. Fix typos in dftables.c for z/OS. + +5. Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that +processing them could involve a buffer overflow if the following character was +an opening parenthesis. + +6. Change 36 for 10.20 also introduced a bug in processing this pattern: +/((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK) +setting (which (*:0) is), then (?x) did not get unset at the end of its group +during the scan for named groups, and hence the external # was incorrectly +treated as a comment and the invalid (?' at the end of the pattern was not +diagnosed. This caused a buffer overflow during the real compile. This bug was +discovered by Karl Skomski with the LLVM fuzzer. + +7. Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its +own source module to avoid a circular dependency between src/pcre2_compile.c +and src/pcre2_study.c + +8.
A callout with a string argument containing an opening square bracket, for +example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer +overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer. + +9. The handling of callouts during the pre-pass for named group identification +has been tightened up. + +10. The quantifier {1} can be ignored, whether greedy, non-greedy, or +possessive. This is a very minor optimization. + +11. A possessively repeated conditional group that could match an empty string, +for example, /(?(R))*+/, was incorrectly compiled. + +12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian +Persch). + +13. An empty comment (?#) in a pattern was incorrectly processed and could +provoke a buffer overflow. This bug was discovered by Karl Skomski with the +LLVM fuzzer. + +14. Fix infinite recursion in the JIT compiler when certain patterns such as +/(?:|a|){100}x/ are analysed. + +15. Some patterns with character classes involving [: and \\ were incorrectly +compiled and could cause reading from uninitialized memory or an incorrect +error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]. The +first of these bugs was discovered by Karl Skomski with the LLVM fuzzer. + +16. Pathological patterns containing many nested occurrences of [: caused +pcre2_compile() to run for a very long time. This bug was found by the LLVM +fuzzer. + +17. A missing closing parenthesis for a callout with a string argument was not +being diagnosed, possibly leading to a buffer overflow. This bug was found by +the LLVM fuzzer. + +18. A conditional group with only one branch has an implicit empty alternative +branch and must therefore be treated as potentially matching an empty string. + +19. If (?R was followed by - or + incorrect behaviour happened instead of a +diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer. + +20. 
Another bug that was introduced by change 36 for 10.20: conditional groups +whose condition was an assertion preceded by an explicit callout with a string +argument might be incorrectly processed, especially if the string contained \Q. +This bug was discovered by Karl Skomski with the LLVM fuzzer. + +21. Compiling PCRE2 with the sanitize options of clang showed up a number of +very pedantic coding infelicities and a buffer overflow while checking a UTF-8 +string if the final multi-byte UTF-8 character was truncated. + +22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a +class, where both values are literal letters in the same case, omit the +non-letter EBCDIC code points within the range. + +23. Finding the minimum matching length of complex patterns with back +references and/or recursions can take a long time. There is now a cut-off that +gives up trying to find a minimum length when things get too complex. + +24. An optimization has been added that speeds up finding the minimum matching +length for patterns containing repeated capturing groups or recursions. + +25. If a pattern contained a back reference to a group whose number was +duplicated as a result of appearing in a (?|...) group, the computation of the +minimum matching length gave a wrong result, which could cause incorrect "no +match" errors. For such patterns, a minimum matching length cannot at present +be computed. + +26. Added a check for integer overflow in conditions (?() and +(?(R). This omission was discovered by Karl Skomski with the LLVM +fuzzer. + +27. Fixed an issue when \p{Any} inside an xclass did not read the current +character. + +28. If pcre2grep was given the -q option with -c or -l, or when handling a +binary file, it incorrectly wrote output to stdout. + +29. The JIT compiler did not restore the control verb head in case of *THEN +control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer. + +30. 
The way recursive references such as (?3) are compiled has been re-written +because the old way was the cause of many issues. Now, conversion of the group +number into a pattern offset does not happen until the pattern has been +completely compiled. This does mean that detection of all infinitely looping +recursions is postponed till match time. In the past, some easy ones were +detected at compile time. This re-writing was done in response to yet another +bug found by the LLVM fuzzer. + +31. A test for a back reference to a non-existent group was missing for items +such as \987. This caused incorrect code to be compiled. This issue was found +by Karl Skomski with a custom LLVM fuzzer. + +32. Error messages for syntax errors following \g and \k were giving inaccurate +offsets in the pattern. + +33. Improve the performance of starting single character repetitions in JIT. + +34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0. + +35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now +give the right offset instead of zero. + +36. The JIT compiler should not check repeats after a {0,1} repeat byte code. +This issue was found by Karl Skomski with a custom LLVM fuzzer. + +37. The JIT compiler should restore the control chain for empty possessive +repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer. + +38. A bug which was introduced by the single character repetition optimization +was fixed. + +39. Match limit check added to recursion. This issue was found by Karl Skomski +with a custom LLVM fuzzer. + +40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look +only at the part of the subject that is relevant when the starting offset is +non-zero. + +41. Improve first character match in JIT with SSE2 on x86. + +42. Fix two assertion fails in JIT. These issues were found by Karl Skomski +with a custom LLVM fuzzer. + +43. 
Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy +III). + +44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added +test (there are now 20 in total). + +45. Fixed a corner case of range optimization in JIT. + +46. Add the ${*MARK} facility to pcre2_substitute(). + +47. Modifier lists in pcre2test were splitting at spaces without the required +commas. + +48. Implemented PCRE2_ALT_VERBNAMES. + +49. Fixed two issues in JIT. These were found by Karl Skomski with a custom +LLVM fuzzer. + +50. The pcre2test program has been extended by adding the #newline_default +command. This has made it possible to run the standard tests when PCRE2 is +compiled with either CR or CRLF as the default newline convention. As part of +this work, the new command was added to several test files and the testing +scripts were modified. The pcre2grep tests can now also be run when there is no +LF in the default newline convention. + +51. The RunTest script has been modified so that, when JIT is used and valgrind +is specified, a valgrind suppressions file is set up to ignore "Invalid read of +size 16" errors because these are false positives when the hardware supports +the SSE2 instruction set. + +52. It is now possible to have comment lines amid the subject strings in +pcre2test (and perltest.sh) input. + +53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit(). + +54. Add the null_context modifier to pcre2test so that calling pcre2_compile() +and the matching functions with NULL contexts can be tested. + +55. Implemented PCRE2_SUBSTITUTE_EXTENDED. + +56. In a character class such as [\W\p{Any}] where both a negative-type escape +("not a word character") and a property escape were present, the property +escape was being ignored. + +57. Fixed integer overflow for patterns whose minimum matching length is very, +very large. + +58. Implemented --never-backslash-C. + +59. 
Change 55 above introduced a bug by which certain patterns provoked the +erroneous error "\ at end of pattern". + +60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling +errors or other strange effects if compiled in UCP mode. Found with libFuzzer +and AddressSanitizer. + +61. Whitespace at the end of a pcre2test pattern line caused a spurious error +message if there were only single-character modifiers. It should be ignored. + +62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results +or segmentation errors for some patterns. Found with libFuzzer and +AddressSanitizer. + +63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer +overflow. + +64. Improve error message for overly-complicated patterns. + +65. Implemented an optional replication feature for patterns in pcre2test, to +make it easier to test long repetitive patterns. The tests for 63 above are +converted to use the new feature. + +66. In the POSIX wrapper, if regerror() was given too small a buffer, it could +misbehave. + +67. In pcre2_substitute() in UTF mode, the UTF validity check on the +replacement string was happening before the length setting when the replacement +string was zero-terminated. + +68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the +second and subsequent calls to pcre2_match(). + +69. There was no check for integer overflow for a replacement group number in +pcre2_substitute(). An added check for a number greater than the largest group +number in the pattern means this is not now needed. + +70. The PCRE2-specific VERSION condition didn't work correctly if only one +digit was given after the decimal point, or if more than two digits were given. +It now works with one or two digits, and gives a compile time error if more are +given. + +71. In pcre2_substitute() there was the possibility of reading one code unit +beyond the end of the replacement string. + +72. 
The code for checking a subject's UTF-32 validity for a pattern with a +lookbehind involved an out-of-bounds pointer, which could potentially cause +trouble in some environments. + +73. The maximum lookbehind length was incorrectly calculated for patterns such +as /(?<=(a)(?-1))x/ which have a recursion within a backreference. + +74. Give an error if a lookbehind assertion is longer than 65535 code units. + +75. Give an error in pcre2_substitute() if a match ends before it starts (as a +result of the use of \K). + +76. Check the length of subpattern names and the names in (*MARK:xx) etc. +dynamically to avoid the possibility of integer overflow. + +77. Implement pcre2_set_max_pattern_length() so that programs can restrict the +size of patterns that they are prepared to handle. + +78. (*NO_AUTO_POSSESS) was not working. + +79. Adding group information caching improves the speed of compiling when +checking whether a group has a fixed length and/or could match an empty string, +especially when recursion or subroutine calls are involved. However, this +cannot be used when (?| is present in the pattern because the same number may +be used for groups of different sizes. To catch runaway patterns in this +situation, counts have been introduced to the functions that scan for empty +branches or compute fixed lengths. + +80. Allow for the possibility of the size of the nest_save structure not being +a factor of the size of the compiling workspace (it currently is). + +81. Check for integer overflow in minimum length calculation and cap it at +65535. + +82. Small optimizations in code for finding the minimum matching length. + +83. Lock out configuring for EBCDIC with non-8-bit libraries. + +84. Test for error code <= 0 in regerror(). + +85. Check for too many replacements (more than INT_MAX) in pcre2_substitute(). + +86. Avoid the possibility of computing with an out-of-bounds pointer (though +not dereferencing it) while handling lookbehind assertions. + +87. 
Failure to get memory for the match data in regcomp() is now given as a +regcomp() error instead of waiting for regexec() to pick it up. + +88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid +newline sequence. + +89. Paranoid check in regcomp() for bad error code from pcre2_compile(). + +90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well +as for link size 2. + +91. Document that JIT has a limit on pattern size, and give more information +about JIT compile failures in pcre2test. + +92. Implement PCRE2_INFO_HASBACKSLASHC. + +93. Re-arrange valgrind support code in pcre2test to avoid spurious reports +with JIT (possibly caused by SSE2?). + +94. Support offset_limit in JIT. + +95. A sequence such as [[:punct:]b] that is, a POSIX character class followed +by a single ASCII character in a class item, was incorrectly compiled in UCP +mode. The POSIX class got lost, but only if the single character followed it. + +96. [:punct:] in UCP mode was matching some characters in the range 128-255 +that should not have been matched. + +97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all +characters with code points greater than 255 are in the class. When a Unicode +property was also in the class (if PCRE2_UCP is set, escapes such as \w are +turned into Unicode properties), wide characters were not correctly handled, +and could fail to match. + +98. In pcre2test, make the "startoffset" modifier a synonym of "offset", +because it sets the "startoffset" parameter for pcre2_match(). + +99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between +an item and its qualifier (for example, A(?#comment)?B) pcre2_compile() +misbehaved. This bug was found by the LLVM fuzzer. + +100. The error for an invalid UTF pattern string always gave the code unit +offset as zero instead of where the invalidity was found. + +101. 
Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not +working correctly in UCP mode. + +102. Similar to 99 above, if an isolated \E was present between an item and its +qualifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved. This bug +was found by the LLVM fuzzer. + +103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND +was set when the pmatch argument was NULL. It now returns REG_INVARG. + +104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep. + +105. An empty \Q\E sequence between an item and its qualifier caused +pcre2_compile() to misbehave when auto callouts were enabled. This bug +was found by the LLVM fuzzer. + +106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or +other verb "name" ended with whitespace immediately before the closing +parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when +both those options were set. + +107. In a number of places pcre2_compile() was not handling NULL characters +correctly, and pcre2test with the "bincode" modifier was not always correctly +displaying fields containing NULLS: + + (a) Within /x extended #-comments + (b) Within the "name" part of (*MARK) and other *verbs + (c) Within the text argument of a callout + +108. If a pattern that was compiled with PCRE2_EXTENDED started with white +space or a #-type comment that was followed by (?-x), which turns off +PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again, +pcre2_compile() assumed that (?-x) applied to the whole pattern and +consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix +for this bug means that a setting of any of the (?imsxJU) options at the start +of a pattern is no longer transferred to the options that are returned by +PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have +changed when the effects of those options were all moved to compile time. + +109. 
An escaped closing parenthesis in the "name" part of a (*verb) when +PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug +was found by the LLVM fuzzer. + +110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it +possible to test it. + +111. "Harden" pcre2test against ridiculously large values in modifiers and +command line arguments. + +112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and PCRE2_SUBSTITUTE_OVERFLOW_ +LENGTH. + +113. Fix printing of *MARK names that contain binary zeroes in pcre2test. + + +Version 10.20 30-June-2015 +-------------------------- + +1. Callouts with string arguments have been added. + +2. Assertion code generator in JIT has been optimized. + +3. The invalid pattern (?(?C) has a missing assertion condition at the end. The +pcre2_compile() function read past the end of the input before diagnosing an +error. This bug was discovered by the LLVM fuzzer. + +4. Implemented pcre2_callout_enumerate(). + +5. Fix JIT compilation of conditional blocks whose assertion is converted to +(*FAIL). E.g: /(?(?!))/. + +6. The pattern /(?(?!)^)/ caused references to random memory. This bug was +discovered by the LLVM fuzzer. + +7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly +when this assertion was used as a condition, for example (?(?!)a|b). In +pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect +error about an unsupported item. + +8. For some types of pattern, for example /Z*(|d*){216}/, the auto- +possessification code could take exponential time to complete. A recursion +depth limit of 1000 has been imposed to limit the resources used by this +optimization. This infelicity was discovered by the LLVM fuzzer. + +9. A pattern such as /(*UTF)[\S\V\H]/, which contains a negated special class +such as \S in non-UCP mode, explicit wide characters (> 255) can be ignored +because \S ensures they are all in the class. 
The code for doing this was +interacting badly with the code for computing the amount of space needed to +compile the pattern, leading to a buffer overflow. This bug was discovered by +the LLVM fuzzer. + +10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside +other kinds of group caused stack overflow at compile time. This bug was +discovered by the LLVM fuzzer. + +11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment +between a subroutine call and its quantifier was incorrectly compiled, leading +to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer. + +12. The illegal pattern /(?(?.*!.*)?)/ was not being diagnosed as missing an +assertion after (?(. The code was failing to check the character after (?(?< +for the ! or = that would indicate a lookbehind assertion. This bug was +discovered by the LLVM fuzzer. + +13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with +a fixed maximum following a group that contains a subroutine reference was +incorrectly compiled and could trigger buffer overflow. This bug was discovered +by the LLVM fuzzer. + +14. Negative relative recursive references such as (?-7) to non-existent +subpatterns were not being diagnosed and could lead to unpredictable behaviour. +This bug was discovered by the LLVM fuzzer. + +15. The bug fixed in 14 was due to an integer variable that was unsigned when +it should have been signed. Some other "int" variables, having been checked, +have either been changed to uint32_t or commented as "must be signed". + +16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1))) +caused a stack overflow instead of the diagnosis of a non-fixed length +lookbehind assertion. This bug was discovered by the LLVM fuzzer. + +17. The use of \K in a positive lookbehind assertion in a non-anchored pattern +(e.g. /(?<=\Ka)/) could make pcre2grep loop. + +18. 
There was a similar problem to 17 in pcre2test for global matches, though +the code there did catch the loop. + +19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*), +and a subsequent item in the pattern caused a non-match, backtracking over the +repeated \X did not stop, but carried on past the start of the subject, causing +reference to random memory and/or a segfault. There were also some other cases +where backtracking after \C could crash. This set of bugs was discovered by the +LLVM fuzzer. + +20. The function for finding the minimum length of a matching string could take +a very long time if mutual recursion was present many times in a pattern, for +example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has +been implemented. This infelicity was discovered by the LLVM fuzzer. + +21. Implemented PCRE2_NEVER_BACKSLASH_C. + +22. The feature for string replication in pcre2test could read from freed +memory if the replication required a buffer to be extended, and it was not +working properly in 16-bit and 32-bit modes. This issue was discovered by a +fuzzer: see http://lcamtuf.coredump.cx/afl/. + +23. Added the PCRE2_ALT_CIRCUMFLEX option. + +24. Adjust the treatment of \8 and \9 to be the same as the current Perl +behaviour. + +25. Static linking against the PCRE2 library using the pkg-config module was +failing on missing pthread symbols. + +26. If a group that contained a recursive back reference also contained a +forward reference subroutine call followed by a non-forward-reference +subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to +compile correct code, leading to undefined behaviour or an internally detected +error. This bug was discovered by the LLVM fuzzer. + +27. Quantification of certain items (e.g. atomic back references) could cause +incorrect code to be compiled when recursive forward references were involved. +For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. 
This bug was +discovered by the LLVM fuzzer. + +28. A repeated conditional group whose condition was a reference by name caused +a buffer overflow if there was more than one group with the given name. This +bug was discovered by the LLVM fuzzer. + +29. A recursive back reference by name within a group that had the same name as +another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/. +This bug was discovered by the LLVM fuzzer. + +30. A forward reference by name to a group whose number is the same as the +current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a +buffer overflow at compile time. This bug was discovered by the LLVM fuzzer. + +31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1 +as an int; fixed by writing it as 1u). + +32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives +a warning for "fileno" unless -std=gnu99 is used. + +33. A lookbehind assertion within a set of mutually recursive subpatterns could +provoke a buffer overflow. This bug was discovered by the LLVM fuzzer. + +34. Give an error for an empty subpattern name such as (?''). + +35. Make pcre2test give an error if a pattern that follows #forbid_utf contains +\P, \p, or \X. + +36. The way named subpatterns are handled has been refactored. There is now a +pre-pass over the regex which does nothing other than identify named +subpatterns and count the total captures. This means that information about +named patterns is known before the rest of the compile. In particular, it means +that forward references can be checked as they are encountered. Previously, the +code for handling forward references was contorted and led to several errors in +computing the memory requirements for some patterns, leading to buffer +overflows. + +37. There was no check for integer overflow in subroutine calls such as (?123). + +38.
The table entry for \l in EBCDIC environments was incorrect, leading to its +being treated as a literal 'l' instead of causing an error. + +39. If a non-capturing group containing a conditional group that could match +an empty string was repeated, it was not identified as matching an empty string +itself. For example: /^(?:(?(1)x|)+)+$()/. + +40. In an EBCDIC environment, pcretest was mishandling the escape sequences +\a and \e in test subject lines. + +41. In an EBCDIC environment, \a in a pattern was converted to the ASCII +instead of the EBCDIC value. + +42. The handling of \c in an EBCDIC environment has been revised so that it is +now compatible with the specification in Perl's perlebcdic page. + +43. Single character repetition in JIT has been improved. 20-30% speedup +was achieved on certain patterns. + +44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in +ASCII/Unicode. This has now been added to the list of characters that are +recognized as white space in EBCDIC. + +45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave +an error (correctly) when used outside a class, but did not give an error +within a class. + +46. \h within a class was incorrectly compiled in EBCDIC environments. + +47. JIT should return with error when the compiled pattern requires +more stack space than the maximum. + +48. Fixed a memory leak in pcre2grep when a locale is set. + + +Version 10.10 06-March-2015 +--------------------------- + +1. When a pattern is compiled, it remembers the highest back reference so that +when matching, if the ovector is too small, extra memory can be obtained to +use instead. A conditional subpattern whose condition is a check on a capture +having happened, such as, for example in the pattern /^(?:(a)|b)(?(1)A|B)/, is +another kind of back reference, but it was not setting the highest +backreference number. 
This mattered only if pcre2_match() was called with an +ovector that was too small to hold the capture, and there was no other kind of +back reference (a situation which is probably quite rare). The effect of the +bug was that the condition was always treated as FALSE when the capture could +not be consulted, leading to incorrect behaviour by pcre2_match(). This bug +has been fixed. + +2. Functions for serialization and deserialization of sets of compiled patterns +have been added. + +3. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove +excess code units at the end of the data block that may occasionally occur if +the code for calculating the size over-estimates. This change stops the +serialization code copying uninitialized data, to which valgrind objects. The +documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not +include the general overhead. This has been corrected. + +4. All code units in every slot in the table of group names are now set, again +in order to avoid accessing uninitialized data when serializing. + +5. The (*NO_JIT) feature is implemented. + +6. If a bug that caused pcre2_compile() to use more memory than allocated was +triggered when using valgrind, the code in (3) above passed a stupidly large +value to valgrind. This caused a crash instead of an "internal error" return. + +7. A reference to a duplicated named group (either a back reference or a test +for being set in a conditional) that occurred in a part of the pattern where +PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern +to be incorrectly calculated, leading to overwriting. + +8. A mutually recursive set of back references such as (\2)(\1) caused a +segfault at compile time (while trying to find the minimum matching length). +The infinite loop is now broken (with the minimum length unset, that is, zero). + +9.
If an assertion that was used as a condition was quantified with a minimum +of zero, matching went wrong. In particular, if the whole group had unlimited +repetition and could match an empty string, a segfault was likely. The pattern +(?(?=0)?)+ is an example that caused this. Perl allows assertions to be +quantified, but not if they are being used as conditions, so the above pattern +is faulted by Perl. PCRE2 has now been changed so that it also rejects such +patterns. + +10. The error message for an invalid quantifier has been changed from "nothing +to repeat" to "quantifier does not follow a repeatable item". + +11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but +scanning the compiled pattern in subsequent auto-possessification can get out +of step and lead to an unknown opcode. Previously this could have caused an +infinite loop. Now it generates an "internal error" error. This is a tidyup, +not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an +undefined outcome. + +12. A UTF pattern containing a "not" match of a non-ASCII character and a +subroutine reference could loop at compile time. Example: /[^\xff]((?1))/. + +13. The locale test (RunTest 3) has been upgraded. It now checks that a locale +that is found in the output of "locale -a" can actually be set by pcre2test +before it is accepted. Previously, in an environment where a locale was listed +but would not set (an example does exist), the test would "pass" without +actually doing anything. Also the fr_CA locale has been added to the list of +locales that can be used. + +14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a +capturing group number without parentheses, the last character was incorrectly +literally included at the end of the replacement string. + +15. 
A possessive capturing group such as (a)*+ with a minimum repeat of zero +failed to allow the zero-repeat case if pcre2_match() was called with an +ovector too small to capture the group. + +16. Improved error message in pcre2test when setting the stack size (-S) fails. + +17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the +transfer from PCRE1, meaning that CMake configuration failed if "build tests" +was selected. (2) The file src/pcre2_serialize.c had not been added to the list +of PCRE2 sources, which caused a failure to build pcre2test. + +18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems +only on Windows. + +19. Use binary input when reading back saved serialized patterns in pcre2test. + +20. Added RunTest.bat for running the tests under Windows. + +21. "make distclean" was not removing config.h, a file that may be created for +use with CMake. + +22. A pattern such as "((?2){0,1999}())?", which has a group containing a +forward reference repeated a large (but limited) number of times within a +repeated outer group that has a zero minimum quantifier, caused incorrect code +to be compiled, leading to the error "internal error: previously-checked +referenced subpattern not found" when an incorrect memory address was read. +This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's +FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.) + +23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine +call within a group that also contained a recursive back reference caused +incorrect code to be compiled. This bug was reported as "heap overflow", +discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015: +CVE-2015-2326 was given to this.) + +24. Computing the size of the JIT read-only data in advance has been a source +of various issues, and new ones still appear, unfortunately.
To fix +existing and future issues, size computation is eliminated from the code, +and replaced by on-demand memory allocation. + +25. A pattern such as /(?i)[A-`]/, where characters in the other case are +adjacent to the end of the range, and the range contained characters with more +than one other case, caused incorrect behaviour when compiled in UTF mode. In +that example, the range a-j was left out of the class. + + +Version 10.00 05-January-2015 +----------------------------- + +Version 10.00 is the first release of PCRE2, a revised API for the PCRE +library. Changes prior to 10.00 are logged in the ChangeLog file for the old +API, up to item 20 for release 8.36. + +The code of the library was heavily revised as part of the new API +implementation. Details of each and every modification were not individually +logged. In addition to the API changes, the following changes were made. They +are either new functionality, or bug fixes and other noticeable changes of +behaviour that were implemented after the code had been forked. + +1. Including Unicode support at build time is now enabled by default, but it +can optionally be disabled. It is not enabled by default at run time (no +change). + +2. The test program, now called pcre2test, was re-specified and almost +completely re-written. Its input is not compatible with input for pcretest. + +3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the +PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is +matched by that pattern. + +4. For the benefit of those who use PCRE2 via some other application, that is, +not writing the function calls themselves, it is possible to check the PCRE2 +version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a +string such as "yesno". + +5. There are case-equivalent Unicode characters whose encodings use different +numbers of code units in UTF-8. U+023A and U+2C65 are one example. 
(It is +theoretically possible for this to happen in UTF-16 too.) If a backreference to +a group containing one of these characters was greedily repeated, and during +the match a backtrack occurred, the subject might be backtracked by the wrong +number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly +(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should +capture the final character, which is the three bytes E2, B1, and A5 in UTF-8. +Incorrect backtracking meant that group 2 captured only the last two bytes. +This bug has been fixed; the new code is slower, but it is used only when the +strings matched by the repetition are not all the same length. + +6. A pattern such as /()a/ was not setting the "first character must be 'a'" +information. This applied to any pattern with a group that matched no +characters, for example: /(?:(?=.)|(? 0) while () { + $count = 0; $line++; if (/^\s*$/) { @@ -50,14 +51,24 @@ while (scalar(@ARGV) > 0) $yield = 1; } } - else + elsif (/\\[^ef]|\\f[^IBP]/) { - if (/\\[^ef]|\\f[^IBP]/) - { - printf "Bad backslash in line $line of $file\n"; - $yield = 1; - } - } + printf "Bad backslash in line $line of $file\n"; + $yield = 1; + } + while (/\\f[BI]/g) + { + $count++; + } + while (/\\fP/g) + { + $count--; + } + if ($count != 0) + { + printf "Mismatching formatting in line $line of $file\n"; + $yield = 1; + } } close(IN); diff --git a/src/pcre/CleanTxt b/src/pcre2/CleanTxt similarity index 100% rename from src/pcre/CleanTxt rename to src/pcre2/CleanTxt diff --git a/src/pcre/Detrail b/src/pcre2/Detrail similarity index 100% rename from src/pcre/Detrail rename to src/pcre2/Detrail diff --git a/src/pcre2/HACKING b/src/pcre2/HACKING new file mode 100644 index 00000000..20faf8f4 --- /dev/null +++ b/src/pcre2/HACKING @@ -0,0 +1,830 @@ +Technical Notes about PCRE2 +--------------------------- + +These are very rough technical notes that record potentially useful information +about PCRE2 internals. 
PCRE2 is a library based on the original PCRE library, +but with a revised (and incompatible) API. To avoid confusion, the original +library is referred to as PCRE1 below. For information about testing PCRE2, see +the pcre2test documentation and the comment at the head of the RunTest file. + +PCRE1 releases were up to 8.3x when PCRE2 was developed, and later bug fix +releases remain in the 8.xx series. PCRE2 releases started at 10.00 to avoid +confusion with PCRE1. + + +Historical note 1 +----------------- + +Many years ago I implemented some regular expression functions to an algorithm +suggested by Martin Richards. The rather simple patterns were not Unix-like in +form, and were quite restricted in what they could do by comparison with Perl. +The interesting part about the algorithm was that the amount of space required +to hold the compiled form of an expression was known in advance. The code to +apply an expression did not operate by backtracking, as the original Henry +Spencer code and current PCRE2 and Perl code does, but instead checked all +possibilities simultaneously by keeping a list of current states and checking +all of them as it advanced through the subject string. In the terminology of +Jeffrey Friedl's book, it was a "DFA algorithm", though it was not a +traditional Finite State Machine (FSM). When the pattern was all used up, all +remaining states were possible matches, and the one matching the longest subset +of the subject string was chosen. This did not necessarily maximize the +individual wild portions of the pattern, as is expected in Unix and Perl-style +regular expressions. + + +Historical note 2 +----------------- + +By contrast, the code originally written by Henry Spencer (which was +subsequently heavily modified for Perl) compiles the expression twice: once in +a dummy mode in order to find out how much store will be needed, and then for +real. 
(The Perl version probably doesn't do this any more; I'm talking about +the original library.) The execution function operates by backtracking and +maximizing (or, optionally, minimizing, in Perl) the amount of the subject that +matches individual wild portions of the pattern. This is an "NFA algorithm" in +Friedl's terminology. + + +OK, here's the real stuff +------------------------- + +For the set of functions that formed the original PCRE1 library in 1997 (which +are unrelated to those mentioned above), I tried at first to invent an +algorithm that used an amount of store bounded by a multiple of the number of +characters in the pattern, to save on compiling time. However, because of the +greater complexity in Perl regular expressions, I couldn't do this, even though +the then current Perl 5.004 patterns were much simpler than those supported +nowadays. In any case, a first pass through the pattern is helpful for other +reasons. + + +Support for 16-bit and 32-bit data strings +------------------------------------------- + +The PCRE2 library can be compiled in any combination of 8-bit, 16-bit or 32-bit +modes, creating up to three different libraries. In the description that +follows, the word "short" is used for a 16-bit data quantity, and the phrase +"code unit" is used for a quantity that is a byte in 8-bit mode, a short in +16-bit mode and a 32-bit word in 32-bit mode. The names of PCRE2 functions are +given in generic form, without the _8, _16, or _32 suffix. + + +Computing the memory requirement: how it was +-------------------------------------------- + +Up to and including release 6.7, PCRE1 worked by running a very degenerate +first pass to calculate a maximum memory requirement, and then a second pass to +do the real compile - which might use a bit less than the predicted amount of +memory. 
The idea was that this would turn out faster than the Henry Spencer +code because the first pass is degenerate and the second pass can just store +stuff straight into memory, which it knows is big enough. + + +Computing the memory requirement: how it is +------------------------------------------- + +By the time I was working on a potential 6.8 release, the degenerate first pass +had become very complicated and hard to maintain. Indeed one of the early +things I did for 6.8 was to fix Yet Another Bug in the memory computation. Then +I had a flash of inspiration as to how I could run the real compile function in +a "fake" mode that enables it to compute how much memory it would need, while +in most cases only ever using a small amount of working memory, and without too +many tests of the mode that might slow it down. So I refactored the compiling +functions to work this way. This got rid of about 600 lines of source and made +further maintenance and development easier. As this was such a major change, I +never released 6.8, instead upping the number to 7.0 (other quite major changes +were also present in the 7.0 release). + +A side effect of this work was that the previous limit of 200 on the nesting +depth of parentheses was removed. However, there was a downside: compiling ran +more slowly than before (30% or more, depending on the pattern) because it now +did a full analysis of the pattern. My hope was that this would not be a big +issue, and in the event, nobody has commented on it. + +At release 8.34, a limit on the nesting depth of parentheses was re-introduced +(default 250, settable at build time) so as to put a limit on the amount of +system stack used by the compile function, which uses recursive function calls +for nested parenthesized groups. This is a safety feature for environments with +small stacks where the patterns are provided by users. + + +Yet another pattern scan +------------------------ + +History repeated itself for PCRE2 release 10.20. 
A number of bugs relating to +named subpatterns had been discovered by fuzzers. Most of these were related to +the handling of forward references when it was not known if the named group was +unique. (References to non-unique names use a different opcode and more +memory.) The use of duplicate group numbers (the (?| facility) also caused +issues. + +To get around these problems I adopted a new approach by adding a third pass +over the pattern (really a "pre-pass"), which did nothing other than identify +all the named subpatterns and their corresponding group numbers. This means +that the actual compile (both the memory-computing dummy run and the real +compile) has full knowledge of group names and numbers throughout. Several +dozen lines of messy code were eliminated, though the new pre-pass was not +short. In particular, parsing and skipping over [] classes is complicated. + +While working on 10.22 I realized that I could simplify yet again by moving +more of the parsing into the pre-pass, thus avoiding doing it in two places, so +after 10.22 was released, the code underwent yet another big refactoring. This +is how it is from 10.23 onwards: + +The function called parse_regex() scans the pattern characters, parsing them +into literal data and meta characters. It converts escapes such as \x{123} +into literals, handles \Q...\E, and skips over comments and non-significant +white space. The result of the scanning is put into a vector of 32-bit unsigned +integers. Values less than 0x80000000 are literal data. Higher values represent +meta-characters. The top 16-bits of such values identify the meta-character, +and these are given names such as META_CAPTURE. The lower 16-bits are available +for data, for example, the capturing group number. The only situation in which +literal data values greater than 0x7fffffff can appear is when the 32-bit +library is running in non-UTF mode. 
This is handled by having a special +meta-character that is followed by the 32-bit data value. + +The size of the parsed pattern vector, when auto-callouts are not enabled, is +bounded by the length of the pattern (with one exception). The code is written +so that each item in the pattern uses no more vector elements than the number +of code units in the item itself. The exception is the aforementioned large +32-bit number handling. For this reason, 32-bit non-UTF patterns are scanned in +advance to check for such values. When auto-callouts are enabled, the generous +assumption is made that there will be a callout for each pattern code unit +(which of course is only actually true if all code units are literals) plus one +at the end. There is a default parsed pattern vector on the system stack, but +if this is not big enough, heap memory is used. + +As before, the actual compiling function is run twice, the first time to +determine the amount of memory needed for the final compiled pattern. It +now processes the parsed pattern vector, not the pattern itself, although some +of the parsed items refer to strings in the pattern - for example, group +names. As escapes and comments have already been processed, the code is a bit +simpler than before. + +Most errors can be diagnosed during the parsing scan. For those that cannot +(for example, "lookbehind assertion is not fixed length"), the parsed code +contains offsets into the pattern so that the actual compiling code can +report where errors are. + + +The elements of the parsed pattern vector +----------------------------------------- + +The word "offset" below means a code unit offset into the pattern. When +PCRE2_SIZE (which is usually size_t) is no bigger than uint32_t, an offset is +stored in a single parsed pattern element. Otherwise (typically on 64-bit +systems) it occupies two elements. 
The following meta items occupy just one +element, with no data: + +META_ACCEPT (*ACCEPT) +META_ASTERISK * +META_ASTERISK_PLUS *+ +META_ASTERISK_QUERY *? +META_ATOMIC (?> start of atomic group +META_CIRCUMFLEX ^ metacharacter +META_CLASS [ start of non-empty class +META_CLASS_EMPTY [] empty class - only with PCRE2_ALLOW_EMPTY_CLASS +META_CLASS_EMPTY_NOT [^] negative empty class - ditto +META_CLASS_END ] end of non-empty class +META_CLASS_NOT [^ start non-empty negative class +META_COMMIT (*COMMIT) +META_COND_ASSERT (?(?assertion) +META_DOLLAR $ metacharacter +META_DOT . metacharacter +META_END End of pattern (this value is 0x80000000) +META_FAIL (*FAIL) +META_KET ) closing parenthesis +META_LOOKAHEAD (?= start of lookahead +META_LOOKAHEAD_NA (*napla: start of non-atomic lookahead +META_LOOKAHEADNOT (?! start of negative lookahead +META_NOCAPTURE (?: no capture parens +META_PLUS + +META_PLUS_PLUS ++ +META_PLUS_QUERY +? +META_PRUNE (*PRUNE) - no argument +META_QUERY ? +META_QUERY_PLUS ?+ +META_QUERY_QUERY ?? +META_RANGE_ESCAPED hyphen in class range with at least one escape +META_RANGE_LITERAL hyphen in class range defined literally +META_SKIP (*SKIP) - no argument +META_THEN (*THEN) - no argument + +The two RANGE values occur only in character classes. They are positioned +between two literals that define the start and end of the range. In an EBCDIC +environment it is necessary to know whether either of the range values was +specified as an escape. In an ASCII/Unicode environment the distinction is not +relevant. + +The following have data in the lower 16 bits, and may be followed by other data +elements: + +META_ALT | alternation +META_BACKREF back reference +META_CAPTURE start of capturing group +META_ESCAPE non-literal escape sequence +META_RECURSE recursion call + +If the data for META_ALT is non-zero, it is inside a lookbehind, and the data +is the length of its branch, for which OP_REVERSE must be generated.
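The split between literal values and meta items in the parsed pattern vector can be sketched in C. This is only an illustration under stated assumptions: META_END is given in the notes as 0x80000000, but DEMO_META_CAPTURE and all the helper names are invented here, and the real meta codes are defined in the PCRE2 source.

```c
#include <assert.h>
#include <stdint.h>

/* META_END is stated in the notes to be 0x80000000. DEMO_META_CAPTURE is a
   made-up code used only for this illustration. */
#define META_END           0x80000000u
#define DEMO_META_CAPTURE  0x80080000u

/* Elements below META_END are literal data; anything else is a meta item. */
static inline int is_literal(uint32_t element)
{
    return element < META_END;
}

/* Combine a meta code (top 16 bits) with 16 bits of data, such as a
   capturing group number. */
static inline uint32_t make_meta(uint32_t code, uint32_t data)
{
    assert((code & 0x0000ffffu) == 0);  /* code lives in the top half only */
    assert(data <= 0xffffu);            /* data must fit in the lower half */
    return code | data;
}

/* Split an element back into its two halves. */
static inline uint32_t get_meta_code(uint32_t element)
{
    return element & 0xffff0000u;
}

static inline uint32_t get_meta_data(uint32_t element)
{
    return element & 0x0000ffffu;
}
```

With these hypothetical helpers, make_meta(DEMO_META_CAPTURE, 3) would stand for the start of capturing group 3, while a plain 'a' is stored as its own code point and is recognized by is_literal().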
+ +META_BACKREF, META_CAPTURE, and META_RECURSE have the capture group number as +their data in the lower 16 bits of the element. + +META_BACKREF is followed by an offset if the back reference group number is 10 +or more. The offsets of the first occurrences of references to groups whose +numbers are less than 10 are put in cb->small_ref_offset[] (only the first +occurrence is useful). On 64-bit systems this avoids using more than two parsed +pattern elements for items such as \3. The offset is used when an error occurs +because the reference is to a non-existent group. + +META_RECURSE is always followed by an offset, for use in error messages. + +META_ESCAPE has an ESC_xxx value as its data. For ESC_P and ESC_p, the next +element contains the 16-bit type and data property values, packed together. +ESC_g and ESC_k are used only for named references - numerical ones are turned +into META_RECURSE or META_BACKREF as appropriate. ESC_g and ESC_k are followed +by a length and an offset into the pattern to specify the name. + +The following have one data item that follows in the next vector element: + +META_BIGVALUE Next is a literal >= META_END +META_OPTIONS (?i) and friends (data is new option bits) +META_POSIX POSIX class item (data identifies the class) +META_POSIX_NEG negative POSIX class item (ditto) + +The following are followed by a length element, then a number of character code +values (which should match with the length): + +META_MARK (*MARK:xxxx) +META_COMMIT_ARG (*COMMIT:xxxx) +META_PRUNE_ARG (*PRUNE:xxxx) +META_SKIP_ARG (*SKIP:xxxx) +META_THEN_ARG (*THEN:xxxx) + +The following are followed by a length element, then an offset in the pattern +that identifies the name: + +META_COND_NAME (?(<name>) or (?('name') or (?(name) +META_COND_RNAME (?(R&name) +META_COND_RNUMBER (?(Rdigits) +META_RECURSE_BYNAME (?&name) +META_BACKREF_BYNAME \k'name' + +META_COND_RNUMBER is used for names that start with R and continue with digits, +because this is an ambiguous case.
It could be a back reference to a group with +that name, or it could be a recursion test on a numbered group. + +This one is followed by an offset, for use in error messages, then a number: + +META_COND_NUMBER (?([+-]digits) + +The following is followed just by an offset, for use in error messages: + +META_COND_DEFINE (?(DEFINE) + +The following are also followed just by an offset, but also the lower 16 bits +of the main word contain the length of the first branch of the lookbehind +group; this is used when generating OP_REVERSE for that branch. + +META_LOOKBEHIND (?<= start of lookbehind +META_LOOKBEHIND_NA (*naplb: start of non-atomic lookbehind +META_LOOKBEHINDNOT (?' and 1 for '>='; +the next two are the major and minor numbers: + +META_COND_VERSION (?(VERSIONx.y) + +Callouts are converted into one of two items: + +META_CALLOUT_NUMBER (?C with numerical argument +META_CALLOUT_STRING (?C with string argument + +In both cases, the next two elements contain the offset and length of the next +item in the pattern. Then there is either one callout number, or a length and +an offset for the string argument. The length includes both delimiters. + + +Traditional matching function +----------------------------- + +The "traditional", and original, matching function is called pcre2_match(), and +it implements an NFA algorithm, similar to the original Henry Spencer algorithm +and the way that Perl works. This is not surprising, since it is intended to be +as compatible with Perl as possible. This is the function most users of PCRE2 +will use most of the time. If PCRE2 is compiled with just-in-time (JIT) +support, and studying a compiled pattern with JIT is successful, the JIT code +is run instead of the normal pcre2_match() code, but the result is the same. + + +Supplementary matching function +------------------------------- + +There is also a supplementary matching function called pcre2_dfa_match(). 
This +implements a DFA matching algorithm that searches simultaneously for all +possible matches that start at one point in the subject string. (Going back to +my roots: see Historical Note 1 above.) This function interprets the same +compiled pattern data as pcre2_match(); however, not all the facilities are +available, and those that are do not always work in quite the same way. See the +user documentation for details. + +The algorithm that is used for pcre2_dfa_match() is not a traditional FSM, +because it may have a number of states active at one time. More work would be +needed at compile time to produce a traditional FSM where only one state is +ever active at once. I believe some other regex matchers work this way. JIT +support is not available for this kind of matching. + + +Changeable options +------------------ + +The /i, /m, or /s options (PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, and +others) may be changed in the middle of patterns by items such as (?i). Their +processing is handled entirely at compile time by generating different opcodes +for the different settings. The runtime functions do not need to keep track of +an option's state. + +PCRE2_DUPNAMES, PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE +are tracked and processed during the parsing pre-pass. The others are handled +from META_OPTIONS items during the main compile phase. + + +Format of compiled patterns +--------------------------- + +The compiled form of a pattern is a vector of unsigned code units (bytes in +8-bit mode, shorts in 16-bit mode, 32-bit words in 32-bit mode), containing +items of variable length. The first code unit in an item contains an opcode, +and the length of the item is either implicit in the opcode or contained in the +data that follows it. + +In many cases listed below, LINK_SIZE data values are specified for offsets +within the compiled pattern. LINK_SIZE always specifies a number of bytes.
The +default value for LINK_SIZE is 2, except for the 32-bit library, where it can +only be 4. The 8-bit library can be compiled to use 3-byte or 4-byte values, +and the 16-bit library can be compiled to use 4-byte values, though this +impairs performance. Specifying a LINK_SIZE larger than 2 for these libraries is +necessary only when patterns whose compiled length is greater than 65535 code +units are going to be processed. When a LINK_SIZE value uses more than one code +unit, the most significant unit is first. + +In this description, we assume the "normal" compilation options. Data values +that are counts (e.g. quantifiers) are always two bytes long in 8-bit mode +(most significant byte first), and one code unit in 16-bit and 32-bit modes. + + +Opcodes with no following data +------------------------------ + +These items are all just one unit long: + + OP_END end of pattern + OP_ANY match any one character other than newline + OP_ALLANY match any one character, including newline + OP_ANYBYTE match any single code unit, even in UTF-8/16 mode + OP_SOD match start of data: \A + OP_SOM start of match (subject + offset): \G + OP_SET_SOM set start of match (\K) + OP_CIRC ^ (start of data) + OP_CIRCM ^ multiline mode (start of data or after newline) + OP_NOT_WORD_BOUNDARY \B + OP_WORD_BOUNDARY \b + OP_NOT_DIGIT \D + OP_DIGIT \d + OP_NOT_HSPACE \H + OP_HSPACE \h + OP_NOT_WHITESPACE \S + OP_WHITESPACE \s + OP_NOT_VSPACE \V + OP_VSPACE \v + OP_NOT_WORDCHAR \W + OP_WORDCHAR \w + OP_EODN match end of data or newline at end: \Z + OP_EOD match end of data: \z + OP_DOLL $ (end of data, or before final newline) + OP_DOLLM $ multiline mode (end of data or before newline) + OP_EXTUNI match an extended Unicode grapheme cluster + OP_ANYNL match any Unicode newline sequence + + OP_ASSERT_ACCEPT ) + OP_ACCEPT ) These are Perl 5.10's "backtracking control + OP_COMMIT ) verbs".
If OP_ACCEPT is inside capturing + OP_FAIL ) parentheses, it may be preceded by one or more + OP_PRUNE ) OP_CLOSE, each followed by a number that + OP_SKIP ) indicates which parentheses must be closed. + OP_THEN ) + +OP_ASSERT_ACCEPT is used when (*ACCEPT) is encountered within an assertion. +This ends the assertion, not the entire pattern match. The assertion (?!) is +always optimized to OP_FAIL. + +OP_ALLANY is used for '.' when PCRE2_DOTALL is set. It is also used for \C in +non-UTF modes and in UTF-32 mode (since one code unit still equals one +character). Another use is for [^] when empty classes are permitted +(PCRE2_ALLOW_EMPTY_CLASS is set). + + +Backtracking control verbs +-------------------------- + +Verbs with no arguments generate opcodes with no following data (as listed +in the section above). + +(*MARK:NAME) generates OP_MARK followed by the mark name, preceded by a +length in one code unit, and followed by a binary zero. The name length is +limited by the size of the code unit. + +(*ACCEPT:NAME) and (*FAIL:NAME) are compiled as (*MARK:NAME)(*ACCEPT) and +(*MARK:NAME)(*FAIL) respectively. + +For (*COMMIT:NAME), (*PRUNE:NAME), (*SKIP:NAME), and (*THEN:NAME), the opcodes +OP_COMMIT_ARG, OP_PRUNE_ARG, OP_SKIP_ARG, and OP_THEN_ARG are used, with the +name following in the same format as for OP_MARK. + + +Matching literal characters +--------------------------- + +The OP_CHAR opcode is followed by a single character that is to be matched +casefully. For caseless matching of characters that have at most two +case-equivalent code points, OP_CHARI is used. In UTF-8 or UTF-16 modes, the +character may be more than one code unit long. In UTF-32 mode, characters are +always exactly one code unit long. + +If there is only one character in a character class, OP_CHAR or OP_CHARI is +used for a positive class, and OP_NOT or OP_NOTI for a negative one (that is, +for something like [^a]). 
+ +Caseless matching (positive or negative) of characters that have more than two +case-equivalent code points (which is possible only in UTF mode) is handled by +compiling a Unicode property item (see below), with the pseudo-property +PT_CLIST. The value of this property is an offset in a vector called +"ucd_caseless_sets" which identifies the start of a short list of equivalent +characters, terminated by the value NOTACHAR (0xffffffff). + + +Repeating single characters +--------------------------- + +The common repeats (*, +, ?), when applied to a single character, use the +following opcodes, which come in caseful and caseless versions: + + Caseful Caseless + OP_STAR OP_STARI + OP_MINSTAR OP_MINSTARI + OP_POSSTAR OP_POSSTARI + OP_PLUS OP_PLUSI + OP_MINPLUS OP_MINPLUSI + OP_POSPLUS OP_POSPLUSI + OP_QUERY OP_QUERYI + OP_MINQUERY OP_MINQUERYI + OP_POSQUERY OP_POSQUERYI + +Each opcode is followed by the character that is to be repeated. In ASCII or +UTF-32 modes, these are two-code-unit items; in UTF-8 or UTF-16 modes, the +length is variable. Those with "MIN" in their names are the minimizing +versions. Those with "POS" in their names are possessive versions. Other kinds +of repeat make use of these opcodes: + + Caseful Caseless + OP_UPTO OP_UPTOI + OP_MINUPTO OP_MINUPTOI + OP_POSUPTO OP_POSUPTOI + OP_EXACT OP_EXACTI + +Each of these is followed by a count and then the repeated character. The count +is two bytes long in 8-bit mode (most significant byte first), or one code unit +in 16-bit and 32-bit modes. + +OP_UPTO matches from 0 to the given number. A repeat with a non-zero minimum +and a fixed maximum is coded as an OP_EXACT followed by an OP_UPTO (or +OP_MINUPTO or OP_POSUPTO). + +Another set of matching repeating opcodes (called OP_NOTSTAR, OP_NOTSTARI, +etc.) are used for repeated, negated, single-character classes such as [^a]*. +The normal single-character opcodes (OP_STAR, etc.) are used for repeated +positive single-character classes.
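The count layout and the OP_EXACT plus OP_UPTO split described above can be illustrated with a minimal sketch in C, assuming 8-bit mode. The helper names put_count, get_count, and split_repeat are hypothetical, chosen for this illustration only; they are not taken from the PCRE2 source.

```c
#include <stdint.h>

/* Hypothetical helper: store a repeat count as two bytes,
   most significant byte first, as described for 8-bit mode. */
static void put_count(uint8_t *p, unsigned n)
{
    p[0] = (uint8_t)(n >> 8);
    p[1] = (uint8_t)(n & 0xff);
}

/* Hypothetical helper: read back a count stored by put_count(). */
static unsigned get_count(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Hypothetical helper: split a {min,max} repeat (min > 0, max finite,
   max >= min) into the count for the exact part and the count for the
   following up-to part. */
static void split_repeat(unsigned min, unsigned max,
                         unsigned *exact, unsigned *upto)
{
    *exact = min;
    *upto = max - min;
}
```

Under these assumptions, x{2,5} splits into an exact count of 2 followed by an up-to count of 3, and a count of 1000 is stored as the bytes 3 and 232.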
+ + +Repeating character types +------------------------- + +Repeats of things like \d are done exactly as for single characters, except +that instead of a character, the opcode for the type (e.g. OP_DIGIT) is stored +in the next code unit. The opcodes are: + + OP_TYPESTAR + OP_TYPEMINSTAR + OP_TYPEPOSSTAR + OP_TYPEPLUS + OP_TYPEMINPLUS + OP_TYPEPOSPLUS + OP_TYPEQUERY + OP_TYPEMINQUERY + OP_TYPEPOSQUERY + OP_TYPEUPTO + OP_TYPEMINUPTO + OP_TYPEPOSUPTO + OP_TYPEEXACT + + +Match by Unicode property +------------------------- + +OP_PROP and OP_NOTPROP are used for positive and negative matches of a +character by testing its Unicode property (the \p and \P escape sequences). +Each is followed by two code units that encode the desired property as a type +and a value. The types are a set of #defines of the form PT_xxx, and the values +are enumerations of the form ucp_xx, defined in the pcre2_ucp.h source file. +The value is relevant only for PT_GC (General Category), PT_PC (Particular +Category), PT_SC (Script), and the pseudo-property PT_CLIST, which is used to +identify a list of case-equivalent characters when there are three or more. + +Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by +three code units: OP_PROP or OP_NOTPROP, and then the desired property type and +value. + + +Character classes +----------------- + +If there is only one character in a class, OP_CHAR or OP_CHARI is used for a +positive class, and OP_NOT or OP_NOTI for a negative one (that is, for +something like [^a]), except when caselessly matching a character that has more +than two case-equivalent code points (which can happen only in UTF mode). In +this case a Unicode property item is used, as described above in "Matching +literal characters". + +A set of repeating opcodes (called OP_NOTSTAR etc.) are used for repeated, +negated, single-character classes. The normal single-character opcodes +(OP_STAR, etc.) are used for repeated positive single-character classes. 
+ +When there is more than one character in a class, and all the code points are +less than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a +negative one. In either case, the opcode is followed by a 32-byte (16-short, +8-word) bit map containing a 1 bit for every character that is acceptable. The +bits are counted from the least significant end of each unit. In caseless mode, +bits for both cases are set. + +The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 and +16-bit and 32-bit modes, subject characters with values greater than 255 can be +handled correctly. For OP_CLASS they do not match, whereas for OP_NCLASS they +do. + +For classes containing characters with values greater than 255 or that contain +\p or \P, OP_XCLASS is used. It optionally uses a bit map if any acceptable +code points are less than 256, followed by a list of pairs (for a range) and/or +single characters and/or properties. In caseless mode, all equivalent +characters are explicitly listed. + +OP_XCLASS is followed by a LINK_SIZE value containing the total length of the +opcode and its data. This is followed by a code unit containing flag bits: +XCL_NOT indicates that this is a negative class, and XCL_MAP indicates that a +bit map is present. There follows the bit map, if XCL_MAP is set, and then a +sequence of items coded as follows: + + XCL_END marks the end of the list + XCL_SINGLE one character follows + XCL_RANGE two characters follow + XCL_PROP a Unicode property (type, value) follows + XCL_NOTPROP a Unicode property (type, value) follows + +If a range starts with a code point less than 256 and ends with one greater +than 255, it is split into two ranges, with characters less than 256 being +indicated in the bit map, and the rest with XCL_RANGE. 
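The 32-byte bit map that follows OP_CLASS and OP_NCLASS in 8-bit mode, and the different treatment of code points above 255, can be sketched in C as follows. This is an illustration under stated assumptions rather than the library's code: the function names class_map_set, class_map_test, and class_matches are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: mark code point c (0..255) as acceptable.
   Bits are counted from the least significant end of each byte. */
static void class_map_set(uint8_t map[32], unsigned c)
{
    map[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Hypothetical helper: return 1 if code point c (0..255) is acceptable. */
static int class_map_test(const uint8_t map[32], unsigned c)
{
    return (map[c >> 3] >> (c & 7)) & 1;
}

/* Hypothetical helper: decide a match for either opcode. The map
   already holds a 1 bit for every acceptable character, for positive
   and negative classes alike, so only code points above 255 need the
   'negated' flag: they fail for OP_CLASS and match for OP_NCLASS. */
static int class_matches(const uint8_t map[32], int negated, uint32_t c)
{
    if (c > 255) return negated ? 1 : 0;
    return class_map_test(map, (unsigned)c);
}
```

For a class such as [^a], the map would have every bit set except the one for 'a', and negated would be 1, so U+0100 matches without consulting the map.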
+ +When XCL_NOT is set, the bit map, if present, contains bits for characters that +are allowed (exactly as for OP_NCLASS), but the list of items that follow it +specifies characters and properties that are not allowed. + + +Back references +--------------- + +OP_REF (caseful) or OP_REFI (caseless) is followed by a count containing the +reference number when the reference is to a unique capturing group (either by +number or by name). When named groups are used, there may be more than one +group with the same name. In this case, a reference to such a group by name +generates OP_DNREF or OP_DNREFI. These are followed by two counts: the index +(not the byte offset) in the group name table of the first entry for the +required name, followed by the number of groups with the same name. The +matching code can then search for the first one that is set. + + +Repeating character classes and back references +----------------------------------------------- + +Single-character classes are handled specially (see above). This section +applies to other classes and also to back references. In both cases, the repeat +information follows the base item. The matching code looks at the following +opcode to see if it is one of these: + + OP_CRSTAR + OP_CRMINSTAR + OP_CRPOSSTAR + OP_CRPLUS + OP_CRMINPLUS + OP_CRPOSPLUS + OP_CRQUERY + OP_CRMINQUERY + OP_CRPOSQUERY + OP_CRRANGE + OP_CRMINRANGE + OP_CRPOSRANGE + +All but the last three are single-code-unit items, with no data. The range +opcodes are followed by the minimum and maximum repeat counts. + + +Brackets and alternation +------------------------ + +A pair of non-capturing round brackets is wrapped round each expression at +compile time, so alternation always happens in the context of brackets. + +[Note for North Americans: "bracket" to some English speakers, including +myself, can be round, square, curly, or pointy. Hence this usage rather than +"parentheses".] 
+ +Non-capturing brackets use the opcode OP_BRA, capturing brackets use OP_CBRA. A +bracket opcode is followed by a LINK_SIZE value which gives the offset to the +next alternative OP_ALT or, if there aren't any branches, to the terminating +opcode. Each OP_ALT is followed by a LINK_SIZE value giving the offset to the +next one, or to the final opcode. For capturing brackets, the bracket number is +a count that immediately follows the offset. + +There are several opcodes that mark the end of a subpattern group. OP_KET is +used for subpatterns that do not repeat indefinitely, OP_KETRMIN and +OP_KETRMAX are used for indefinite repetitions, minimally or maximally +respectively, and OP_KETRPOS for possessive repetitions (see below for more +details). All four are followed by a LINK_SIZE value giving (as a positive +number) the offset back to the matching bracket opcode. + +If a subpattern is quantified such that it is permitted to match zero times, it +is preceded by one of OP_BRAZERO, OP_BRAMINZERO, or OP_SKIPZERO. These are +single-unit opcodes that tell the matcher that skipping the following +subpattern entirely is a valid match. In the case of the first two, not +skipping the pattern is also valid (greedy and non-greedy). The third is used +when a pattern has the quantifier {0,0}. It cannot be entirely discarded, +because it may be called as a subroutine from elsewhere in the pattern. + +A subpattern with an indefinite maximum repetition is replicated in the +compiled data its minimum number of times (or once with OP_BRAZERO if the +minimum is zero), with the final copy terminating with OP_KETRMIN or OP_KETRMAX +as appropriate. + +A subpattern with a bounded maximum repetition is replicated in a nested +fashion up to the maximum number of times, with OP_BRAZERO or OP_BRAMINZERO +before each replication after the minimum, so that, for example, (abc){2,5} is +compiled as (abc)(abc)((abc)((abc)(abc)?)?)?, except that each bracketed group +has the same number. 
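The nested replication of a bounded repeat can be made concrete with a small C sketch that builds the equivalent pattern text. This is purely illustrative: the real compiler replicates compiled opcodes, not pattern strings, and the names append and expand_bounded are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append s to the NUL-terminated buffer out of
   total size 'size', truncating rather than overflowing. */
static void append(char *out, size_t size, const char *s)
{
    size_t used = strlen(out);
    if (used + 1 < size)
        strncat(out, s, size - used - 1);
}

/* Hypothetical helper: write the textual equivalent of group{min,max}
   into out, assuming max >= min and a finite max. The first 'min'
   copies are mandatory; each remaining copy is optional and nests
   inside the previous optional copy. */
static void expand_bounded(const char *group, unsigned min, unsigned max,
                           char *out, size_t size)
{
    unsigned optional = max - min;
    unsigned i;

    if (size == 0) return;
    out[0] = '\0';

    for (i = 0; i < min; i++)
        append(out, size, group);

    /* Every optional copy except the innermost opens a wrapper. */
    for (i = 0; i < optional; i++) {
        if (i + 1 < optional) append(out, size, "(");
        append(out, size, group);
    }

    /* The innermost copy is made optional, then each wrapper is closed. */
    if (optional > 0) append(out, size, "?");
    for (i = 1; i < optional; i++)
        append(out, size, ")?");
}
```

Calling expand_bounded("(abc)", 2, 5, buf, sizeof buf) with a sufficiently large buffer yields (abc)(abc)((abc)((abc)(abc)?)?)?, the form given in the text.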
+ +When a repeated subpattern has an unbounded upper limit, it is checked to see +whether it could match an empty string. If this is the case, the opcode in the +final replication is changed to OP_SBRA or OP_SCBRA. This tells the matcher +that it needs to check for matching an empty string when it hits OP_KETRMIN or +OP_KETRMAX, and if so, to break the loop. + + +Possessive brackets +------------------- + +When a repeated group (capturing or non-capturing) is marked as possessive by +the "+" notation, e.g. (abc)++, different opcodes are used. Their names all +have POS on the end, e.g. OP_BRAPOS instead of OP_BRA and OP_SCBRAPOS instead +of OP_SCBRA. The end of such a group is marked by OP_KETRPOS. If the minimum +repetition is zero, the group is preceded by OP_BRAPOSZERO. + + +Once-only (atomic) groups +------------------------- + +These are just like other subpatterns, but they start with the opcode OP_ONCE. +The check for matching an empty string in an unbounded repeat is handled +entirely at runtime, so there is just this one opcode for atomic groups. + + +Assertions +---------- + +Forward assertions are also just like other subpatterns, but starting with one +of the opcodes OP_ASSERT, OP_ASSERT_NA (non-atomic assertion), or +OP_ASSERT_NOT. Backward assertions use the opcodes OP_ASSERTBACK, +OP_ASSERTBACK_NA, and OP_ASSERTBACK_NOT, and the first opcode inside the +assertion is OP_REVERSE, followed by a count of the number of characters to +move back the pointer in the subject string. In ASCII or UTF-32 mode, the count +is also the number of code units, but in UTF-8/16 mode each character may +occupy more than one code unit. A separate count is present in each alternative +of a lookbehind assertion, allowing each branch to have a different (but fixed) +length. 
+ + +Conditional subpatterns +----------------------- + +These are like other subpatterns, but they start with the opcode OP_COND, or +OP_SCOND for one that might match an empty string in an unbounded repeat. + +If the condition is a back reference, this is stored at the start of the +subpattern using the opcode OP_CREF followed by a count containing the +reference number, provided that the reference is to a unique capturing group. +If the reference was by name and there is more than one group with that name, +OP_DNCREF is used instead. It is followed by two counts: the index in the group +names table, and the number of groups with the same name. This allows the +matcher to check if any group with the given name is set. + +If the condition is "in recursion" (coded as "(?(R)"), or "in recursion of +group x" (coded as "(?(Rx)"), the group number is stored at the start of the +subpattern using the opcode OP_RREF (with a value of RREF_ANY (0xffff) for "the +whole pattern") or OP_DNRREF (with data as for OP_DNCREF). + +For a DEFINE condition, OP_FALSE is used (with no associated data). During +compilation, however, a DEFINE condition is coded as OP_DEFINE so that, when +the conditional group is complete, there can be a check to ensure that it +contains only one top-level branch. Once this has happened, the opcode is +changed to OP_FALSE, so the matcher never sees OP_DEFINE. + +There is a special PCRE2-specific condition of the form (VERSION[>]=x.y), which +tests the PCRE2 version number. This compiles into one of the opcodes OP_TRUE +or OP_FALSE. + +If a condition is not a back reference, recursion test, DEFINE, or VERSION, it +must start with a parenthesized atomic assertion, whose opcode normally +immediately follows OP_COND or OP_SCOND. However, if automatic callouts are +enabled, a callout is inserted immediately before the assertion. It is also +possible to insert a manual callout at this point. Only assertion conditions +may have callouts preceding the condition.
+ +A condition that is the negative assertion (?!) is optimized to OP_FAIL in all +parts of the pattern, so this is another opcode that may appear as a condition. +It is treated the same as OP_FALSE. + + +Recursion +--------- + +Recursion either matches the current pattern, or some subexpression. The opcode +OP_RECURSE is followed by a LINK_SIZE value that is the offset to the starting +bracket from the start of the whole pattern. OP_RECURSE is also used for +"subroutine" calls, even though they are not strictly a recursion. Up till +release 10.30 recursions were treated as atomic groups, making them +incompatible with Perl (but PCRE had them well before Perl did). From 10.30, +backtracking into recursions is supported. + +Repeated recursions used to be wrapped inside OP_ONCE brackets, which not only +forced no backtracking, but also allowed repetition to be handled as for other +bracketed groups. From 10.30 onwards, repeated recursions are duplicated for +their minimum repetitions, and then wrapped in non-capturing brackets for the +remainder. For example, (?1){3} is treated as (?1)(?1)(?1), and (?1){2,4} is +treated as (?1)(?1)(?:(?1)){0,2}. + + +Callouts +-------- + +A callout may have either a numerical argument or a string argument. These use +OP_CALLOUT or OP_CALLOUT_STR, respectively. In each case these are followed by +two LINK_SIZE values giving the offset in the pattern string to the start of +the following item, and another count giving the length of this item. These +values make it possible for pcre2test to output useful tracing information +using callouts. + +In the case of a numeric callout, after these two values there is a single code +unit containing the callout number, in the range 0-255, with 255 being used for +callouts that are automatically inserted as a result of the PCRE2_AUTO_CALLOUT +option. 
Thus, this opcode item is of fixed length: + + [OP_CALLOUT] [PATTERN_OFFSET] [PATTERN_LENGTH] [NUMBER] + +For callouts with string arguments, OP_CALLOUT_STR has three more data items: +a LINK_SIZE value giving the complete length of the entire opcode item, a +LINK_SIZE item containing the offset within the pattern string to the start of +the string argument, and the string itself, preceded by its starting delimiter +and followed by a binary zero. When a callout function is called, a pointer to +the actual string is passed, but the delimiter can be accessed as string[-1] if +the application needs it. In the 8-bit library, the callout in /X(?C'abc')Y/ is +compiled as the following bytes (decimal numbers represent binary values): + + [OP_CALLOUT_STR] [0] [10] [0] [1] [0] [14] [0] [5] ['] [a] [b] [c] [0] + -------- ------- -------- ------- + | | | | + ------- LINK_SIZE items ------ + +Opcode table checking +--------------------- + +The last opcode that is defined in pcre2_internal.h is OP_TABLE_LENGTH. This is +not a real opcode, but is used to check at compile time that tables indexed by +opcode are the correct length, in order to catch updating errors. + +Philip Hazel +12 July 2019 diff --git a/src/pcre/INSTALL b/src/pcre2/INSTALL similarity index 100% rename from src/pcre/INSTALL rename to src/pcre2/INSTALL diff --git a/src/pcre/LICENCE b/src/pcre2/LICENCE similarity index 54% rename from src/pcre/LICENCE rename to src/pcre2/LICENCE index 760a6666..18684cea 100644 --- a/src/pcre/LICENCE +++ b/src/pcre2/LICENCE @@ -1,42 +1,43 @@ -PCRE LICENCE ------------- +PCRE2 LICENCE +------------- -PCRE is a library of functions to support regular expressions whose syntax +PCRE2 is a library of functions to support regular expressions whose syntax and semantics are as close as possible to those of the Perl 5 language. -Release 8 of PCRE is distributed under the terms of the "BSD" licence, as -specified below. 
The documentation for PCRE, supplied in the "doc" -directory, is distributed under the same terms as the software itself. The data -in the testdata directory is not copyrighted and is in the public domain. +Releases 10.00 and above of PCRE2 are distributed under the terms of the "BSD" +licence, as specified below, with one exemption for certain binary +redistributions. The documentation for PCRE2, supplied in the "doc" directory, +is distributed under the same terms as the software itself. The data in the +testdata directory is not copyrighted and is in the public domain. The basic library functions are written in C and are freestanding. Also -included in the distribution is a set of C++ wrapper functions, and a -just-in-time compiler that can be used to optimize pattern matching. These -are both optional features that can be omitted when the library is built. +included in the distribution is a just-in-time compiler that can be used to +optimize pattern matching. This is an optional feature that can be omitted when +the library is built. THE BASIC LIBRARY FUNCTIONS --------------------------- Written by: Philip Hazel -Email local part: ph10 -Email domain: cam.ac.uk +Email local part: Philip.Hazel +Email domain: gmail.com University of Cambridge Computing Service, Cambridge, England. -Copyright (c) 1997-2019 University of Cambridge +Copyright (c) 1997-2021 University of Cambridge All rights reserved. -PCRE JUST-IN-TIME COMPILATION SUPPORT -------------------------------------- +PCRE2 JUST-IN-TIME COMPILATION SUPPORT +-------------------------------------- Written by: Zoltan Herczeg Email local part: hzmester Email domain: freemail.hu -Copyright(c) 2010-2019 Zoltan Herczeg +Copyright(c) 2010-2021 Zoltan Herczeg All rights reserved. @@ -47,16 +48,7 @@ Written by: Zoltan Herczeg Email local part: hzmester Email domain: freemail.hu -Copyright(c) 2009-2019 Zoltan Herczeg -All rights reserved. 
- - -THE C++ WRAPPER FUNCTIONS -------------------------- - -Contributed by: Google Inc. - -Copyright (c) 2007-2012, Google Inc. +Copyright(c) 2009-2021 Zoltan Herczeg All rights reserved. @@ -66,17 +58,16 @@ THE "BSD" LICENCE Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: - * Redistributions of source code must retain the above copyright notice, + * Redistributions of source code must retain the above copyright notices, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the + notices, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. - * Neither the name of the University of Cambridge nor the name of Google - Inc. nor the names of their contributors may be used to endorse or - promote products derived from this software without specific prior - written permission. + * Neither the name of the University of Cambridge nor the names of any + contributors may be used to endorse or promote products derived from this + software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE @@ -90,4 +81,14 @@ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +EXEMPTION FOR BINARY LIBRARY-LIKE PACKAGES +------------------------------------------ + +The second condition in the BSD licence (covering binary redistributions) does +not apply all the way down a chain of software. 
If binary package A includes +PCRE2, it must respect the condition, but if package B is software that +includes package A, the condition is not imposed on package B unless it uses +PCRE2 independently. + End diff --git a/src/pcre2/Makefile.am b/src/pcre2/Makefile.am new file mode 100644 index 00000000..bd8e6f0c --- /dev/null +++ b/src/pcre2/Makefile.am @@ -0,0 +1,868 @@ +## Process this file with automake to produce Makefile.in. + +AUTOMAKE_OPTIONS = subdir-objects +ACLOCAL_AMFLAGS = -I m4 + +## This seems to have become necessary for building in non-source directory. + +AM_CPPFLAGS="-I$(srcdir)/src" + +## Specify the documentation files that are distributed. + +dist_doc_DATA = \ + AUTHORS \ + COPYING \ + ChangeLog \ + LICENCE \ + NEWS \ + README \ + doc/pcre2.txt \ + doc/pcre2-config.txt \ + doc/pcre2grep.txt \ + doc/pcre2test.txt + +dist_html_DATA = \ + doc/html/NON-AUTOTOOLS-BUILD.txt \ + doc/html/README.txt \ + doc/html/index.html \ + doc/html/pcre2-config.html \ + doc/html/pcre2.html \ + doc/html/pcre2_callout_enumerate.html \ + doc/html/pcre2_code_copy.html \ + doc/html/pcre2_code_copy_with_tables.html \ + doc/html/pcre2_code_free.html \ + doc/html/pcre2_compile.html \ + doc/html/pcre2_compile_context_copy.html \ + doc/html/pcre2_compile_context_create.html \ + doc/html/pcre2_compile_context_free.html \ + doc/html/pcre2_config.html \ + doc/html/pcre2_convert_context_copy.html \ + doc/html/pcre2_convert_context_create.html \ + doc/html/pcre2_convert_context_free.html \ + doc/html/pcre2_converted_pattern_free.html \ + doc/html/pcre2_dfa_match.html \ + doc/html/pcre2_general_context_copy.html \ + doc/html/pcre2_general_context_create.html \ + doc/html/pcre2_general_context_free.html \ + doc/html/pcre2_get_error_message.html \ + doc/html/pcre2_get_mark.html \ + doc/html/pcre2_get_match_data_size.html \ + doc/html/pcre2_get_ovector_count.html \ + doc/html/pcre2_get_ovector_pointer.html \ + doc/html/pcre2_get_startchar.html \ + doc/html/pcre2_jit_compile.html \ + 
doc/html/pcre2_jit_free_unused_memory.html \ + doc/html/pcre2_jit_match.html \ + doc/html/pcre2_jit_stack_assign.html \ + doc/html/pcre2_jit_stack_create.html \ + doc/html/pcre2_jit_stack_free.html \ + doc/html/pcre2_maketables.html \ + doc/html/pcre2_maketables_free.html \ + doc/html/pcre2_match.html \ + doc/html/pcre2_match_context_copy.html \ + doc/html/pcre2_match_context_create.html \ + doc/html/pcre2_match_context_free.html \ + doc/html/pcre2_match_data_create.html \ + doc/html/pcre2_match_data_create_from_pattern.html \ + doc/html/pcre2_match_data_free.html \ + doc/html/pcre2_pattern_convert.html \ + doc/html/pcre2_pattern_info.html \ + doc/html/pcre2_serialize_decode.html \ + doc/html/pcre2_serialize_encode.html \ + doc/html/pcre2_serialize_free.html \ + doc/html/pcre2_serialize_get_number_of_codes.html \ + doc/html/pcre2_set_bsr.html \ + doc/html/pcre2_set_callout.html \ + doc/html/pcre2_set_character_tables.html \ + doc/html/pcre2_set_compile_extra_options.html \ + doc/html/pcre2_set_compile_recursion_guard.html \ + doc/html/pcre2_set_depth_limit.html \ + doc/html/pcre2_set_glob_escape.html \ + doc/html/pcre2_set_glob_separator.html \ + doc/html/pcre2_set_heap_limit.html \ + doc/html/pcre2_set_match_limit.html \ + doc/html/pcre2_set_max_pattern_length.html \ + doc/html/pcre2_set_offset_limit.html \ + doc/html/pcre2_set_newline.html \ + doc/html/pcre2_set_parens_nest_limit.html \ + doc/html/pcre2_set_recursion_limit.html \ + doc/html/pcre2_set_recursion_memory_management.html \ + doc/html/pcre2_set_substitute_callout.html \ + doc/html/pcre2_substitute.html \ + doc/html/pcre2_substring_copy_byname.html \ + doc/html/pcre2_substring_copy_bynumber.html \ + doc/html/pcre2_substring_free.html \ + doc/html/pcre2_substring_get_byname.html \ + doc/html/pcre2_substring_get_bynumber.html \ + doc/html/pcre2_substring_length_byname.html \ + doc/html/pcre2_substring_length_bynumber.html \ + doc/html/pcre2_substring_list_free.html \ + 
doc/html/pcre2_substring_list_get.html \ + doc/html/pcre2_substring_nametable_scan.html \ + doc/html/pcre2_substring_number_from_name.html \ + doc/html/pcre2api.html \ + doc/html/pcre2build.html \ + doc/html/pcre2callout.html \ + doc/html/pcre2compat.html \ + doc/html/pcre2convert.html \ + doc/html/pcre2demo.html \ + doc/html/pcre2grep.html \ + doc/html/pcre2jit.html \ + doc/html/pcre2limits.html \ + doc/html/pcre2matching.html \ + doc/html/pcre2partial.html \ + doc/html/pcre2pattern.html \ + doc/html/pcre2perform.html \ + doc/html/pcre2posix.html \ + doc/html/pcre2sample.html \ + doc/html/pcre2serialize.html \ + doc/html/pcre2syntax.html \ + doc/html/pcre2test.html \ + doc/html/pcre2unicode.html + +dist_man_MANS = \ + doc/pcre2-config.1 \ + doc/pcre2.3 \ + doc/pcre2_callout_enumerate.3 \ + doc/pcre2_code_copy.3 \ + doc/pcre2_code_copy_with_tables.3 \ + doc/pcre2_code_free.3 \ + doc/pcre2_compile.3 \ + doc/pcre2_compile_context_copy.3 \ + doc/pcre2_compile_context_create.3 \ + doc/pcre2_compile_context_free.3 \ + doc/pcre2_config.3 \ + doc/pcre2_convert_context_copy.3 \ + doc/pcre2_convert_context_create.3 \ + doc/pcre2_convert_context_free.3 \ + doc/pcre2_converted_pattern_free.3 \ + doc/pcre2_dfa_match.3 \ + doc/pcre2_general_context_copy.3 \ + doc/pcre2_general_context_create.3 \ + doc/pcre2_general_context_free.3 \ + doc/pcre2_get_error_message.3 \ + doc/pcre2_get_mark.3 \ + doc/pcre2_get_match_data_size.3 \ + doc/pcre2_get_ovector_count.3 \ + doc/pcre2_get_ovector_pointer.3 \ + doc/pcre2_get_startchar.3 \ + doc/pcre2_jit_compile.3 \ + doc/pcre2_jit_free_unused_memory.3 \ + doc/pcre2_jit_match.3 \ + doc/pcre2_jit_stack_assign.3 \ + doc/pcre2_jit_stack_create.3 \ + doc/pcre2_jit_stack_free.3 \ + doc/pcre2_maketables.3 \ + doc/pcre2_maketables_free.3 \ + doc/pcre2_match.3 \ + doc/pcre2_match_context_copy.3 \ + doc/pcre2_match_context_create.3 \ + doc/pcre2_match_context_free.3 \ + doc/pcre2_match_data_create.3 \ + doc/pcre2_match_data_create_from_pattern.3 \ + 
doc/pcre2_match_data_free.3 \ + doc/pcre2_pattern_convert.3 \ + doc/pcre2_pattern_info.3 \ + doc/pcre2_serialize_decode.3 \ + doc/pcre2_serialize_encode.3 \ + doc/pcre2_serialize_free.3 \ + doc/pcre2_serialize_get_number_of_codes.3 \ + doc/pcre2_set_bsr.3 \ + doc/pcre2_set_callout.3 \ + doc/pcre2_set_character_tables.3 \ + doc/pcre2_set_compile_extra_options.3 \ + doc/pcre2_set_compile_recursion_guard.3 \ + doc/pcre2_set_depth_limit.3 \ + doc/pcre2_set_glob_escape.3 \ + doc/pcre2_set_glob_separator.3 \ + doc/pcre2_set_heap_limit.3 \ + doc/pcre2_set_match_limit.3 \ + doc/pcre2_set_max_pattern_length.3 \ + doc/pcre2_set_offset_limit.3 \ + doc/pcre2_set_newline.3 \ + doc/pcre2_set_parens_nest_limit.3 \ + doc/pcre2_set_recursion_limit.3 \ + doc/pcre2_set_recursion_memory_management.3 \ + doc/pcre2_set_substitute_callout.3 \ + doc/pcre2_substitute.3 \ + doc/pcre2_substring_copy_byname.3 \ + doc/pcre2_substring_copy_bynumber.3 \ + doc/pcre2_substring_free.3 \ + doc/pcre2_substring_get_byname.3 \ + doc/pcre2_substring_get_bynumber.3 \ + doc/pcre2_substring_length_byname.3 \ + doc/pcre2_substring_length_bynumber.3 \ + doc/pcre2_substring_list_free.3 \ + doc/pcre2_substring_list_get.3 \ + doc/pcre2_substring_nametable_scan.3 \ + doc/pcre2_substring_number_from_name.3 \ + doc/pcre2api.3 \ + doc/pcre2build.3 \ + doc/pcre2callout.3 \ + doc/pcre2compat.3 \ + doc/pcre2convert.3 \ + doc/pcre2demo.3 \ + doc/pcre2grep.1 \ + doc/pcre2jit.3 \ + doc/pcre2limits.3 \ + doc/pcre2matching.3 \ + doc/pcre2partial.3 \ + doc/pcre2pattern.3 \ + doc/pcre2perform.3 \ + doc/pcre2posix.3 \ + doc/pcre2sample.3 \ + doc/pcre2serialize.3 \ + doc/pcre2syntax.3 \ + doc/pcre2test.1 \ + doc/pcre2unicode.3 + +# The Libtool libraries to install. We'll add to this later. + +lib_LTLIBRARIES = + +# Unit tests you want to run when people type 'make check'. 
+# TESTS is for binary unit tests, check_SCRIPTS for script-based tests + +TESTS = +check_SCRIPTS = +dist_noinst_SCRIPTS = + +# Some of the binaries we make are to be installed, and others are +# (non-user-visible) helper programs needed to build the libraries. + +bin_PROGRAMS = +noinst_PROGRAMS = + +# Additional files to delete on 'make clean', 'make distclean', +# and 'make maintainer-clean'. + +CLEANFILES = +DISTCLEANFILES = src/config.h.in~ +MAINTAINERCLEANFILES = + +# Additional files to bundle with the distribution, over and above what +# the Autotools include by default. + +EXTRA_DIST = + +# These files contain additional m4 macros that are used by autoconf. + +EXTRA_DIST += \ + m4/ax_pthread.m4 m4/pcre2_visibility.m4 + +# These files contain maintenance information + +EXTRA_DIST += \ + NON-AUTOTOOLS-BUILD \ + HACKING + +# These files are used in the preparation of a release + +EXTRA_DIST += \ + PrepareRelease \ + CheckMan \ + CleanTxt \ + Detrail \ + 132html \ + doc/index.html.src + +# These files are usable versions of pcre2.h and config.h that are distributed +# for the benefit of people who are building PCRE2 manually, without the +# Autotools support. + +EXTRA_DIST += \ + src/pcre2.h.generic \ + src/config.h.generic + +# The only difference between pcre2.h.in and pcre2.h is the setting of the PCRE +# version number. Therefore, we can create the generic version just by copying. + +src/pcre2.h.generic: src/pcre2.h.in configure.ac + rm -f $@ + cp -p src/pcre2.h $@ + +# It is more complicated for config.h.generic. We need the version that results +# from a default configuration so as to get all the default values for PCRE +# configuration macros such as MATCH_LIMIT and NEWLINE. We can get this by +# doing a configure in a temporary directory. However, some trickery is needed, +# because the source directory may already be configured. If you just try +# running configure in a new directory, it complains. 
For this reason, we move +# config.status out of the way while doing the default configuration. The +# resulting config.h is munged by perl to put #ifdefs round any #defines for +# macros with values, and to #undef all boolean macros such as HAVE_xxx and +# SUPPORT_xxx. We also get rid of any gcc-specific visibility settings. Make +# sure that PCRE2_EXP_DEFN is unset (in case it has visibility settings). + +src/config.h.generic: configure.ac + rm -rf $@ _generic + mkdir _generic + cs=$(srcdir)/config.status; test ! -f $$cs || mv -f $$cs $$cs.aside + cd _generic && $(abs_top_srcdir)/configure || : + cs=$(srcdir)/config.status; test ! -f $$cs.aside || mv -f $$cs.aside $$cs + test -f _generic/src/config.h + perl -n \ + -e 'BEGIN{$$blank=0;}' \ + -e 'if(/PCRE2_EXP_DEFN/){print"/* #undef PCRE2_EXP_DEFN */\n";$$blank=0;next;}' \ + -e 'if(/to make a symbol visible/){next;}' \ + -e 'if(/__attribute__ \(\(visibility/){next;}' \ + -e 'if(/LT_OBJDIR/){print"/* This is ignored unless you are using libtool. */\n";}' \ + -e 'if(/^#define\s((?:HAVE|SUPPORT|STDC)_\w+)/){print"/* #undef $$1 */\n";$$blank=0;next;}' \ + -e 'if(/^#define\s(?!PACKAGE|VERSION)(\w+)/){print"#ifndef $$1\n$$_#endif\n";$$blank=0;next;}' \ + -e 'if(/^\s*$$/){print unless $$blank; $$blank=1;} else{print;$$blank=0;}' \ + _generic/src/config.h >$@ + rm -rf _generic + +MAINTAINERCLEANFILES += src/pcre2.h.generic src/config.h.generic + +# These are the header files we'll install. We do not distribute pcre2.h +# because it is generated from pcre2.h.in. + +nodist_include_HEADERS = src/pcre2.h +include_HEADERS = src/pcre2posix.h + +# This is the "config" script. + +bin_SCRIPTS = pcre2-config + +## --------------------------------------------------------------- +## The pcre2_dftables program is used to rebuild character tables before +## compiling PCRE2, if --enable-rebuild-chartables is specified. It is not an +## installed program. 
The default (when --enable-rebuild-chartables is not +## specified) is to copy a distributed set of tables that are defined for ASCII +## code. In this case, pcre2_dftables is not needed. + +if WITH_REBUILD_CHARTABLES +noinst_PROGRAMS += pcre2_dftables +pcre2_dftables_SOURCES = src/pcre2_dftables.c +src/pcre2_chartables.c: pcre2_dftables$(EXEEXT) + rm -f $@ + ./pcre2_dftables$(EXEEXT) $@ +else +src/pcre2_chartables.c: $(srcdir)/src/pcre2_chartables.c.dist + rm -f $@ + $(LN_S) $(abs_srcdir)/src/pcre2_chartables.c.dist $(abs_builddir)/src/pcre2_chartables.c +endif # WITH_REBUILD_CHARTABLES + +BUILT_SOURCES = src/pcre2_chartables.c +NODIST_SOURCES = src/pcre2_chartables.c + +## Define the list of common sources, then arrange to build whichever of the +## 8-, 16-, or 32-bit libraries are configured. + +COMMON_SOURCES = \ + src/pcre2_auto_possess.c \ + src/pcre2_compile.c \ + src/pcre2_config.c \ + src/pcre2_context.c \ + src/pcre2_convert.c \ + src/pcre2_dfa_match.c \ + src/pcre2_error.c \ + src/pcre2_extuni.c \ + src/pcre2_find_bracket.c \ + src/pcre2_internal.h \ + src/pcre2_intmodedep.h \ + src/pcre2_jit_compile.c \ + src/pcre2_jit_neon_inc.h \ + src/pcre2_jit_simd_inc.h \ + src/pcre2_maketables.c \ + src/pcre2_match.c \ + src/pcre2_match_data.c \ + src/pcre2_newline.c \ + src/pcre2_ord2utf.c \ + src/pcre2_pattern_info.c \ + src/pcre2_script_run.c \ + src/pcre2_serialize.c \ + src/pcre2_string_utils.c \ + src/pcre2_study.c \ + src/pcre2_substitute.c \ + src/pcre2_substring.c \ + src/pcre2_tables.c \ + src/pcre2_ucd.c \ + src/pcre2_ucp.h \ + src/pcre2_valid_utf.c \ + src/pcre2_xclass.c + +if WITH_PCRE2_8 +lib_LTLIBRARIES += libpcre2-8.la +libpcre2_8_la_SOURCES = \ + $(COMMON_SOURCES) +nodist_libpcre2_8_la_SOURCES = \ + $(NODIST_SOURCES) +libpcre2_8_la_CFLAGS = \ + -DPCRE2_CODE_UNIT_WIDTH=8 \ + $(VISIBILITY_CFLAGS) \ + $(CET_CFLAGS) \ + $(AM_CFLAGS) +libpcre2_8_la_LIBADD = +endif # WITH_PCRE2_8 + +if WITH_PCRE2_16 +lib_LTLIBRARIES += libpcre2-16.la 
+libpcre2_16_la_SOURCES = \ + $(COMMON_SOURCES) +nodist_libpcre2_16_la_SOURCES = \ + $(NODIST_SOURCES) +libpcre2_16_la_CFLAGS = \ + -DPCRE2_CODE_UNIT_WIDTH=16 \ + $(VISIBILITY_CFLAGS) \ + $(CET_CFLAGS) \ + $(AM_CFLAGS) +libpcre2_16_la_LIBADD = +endif # WITH_PCRE2_16 + +if WITH_PCRE2_32 +lib_LTLIBRARIES += libpcre2-32.la +libpcre2_32_la_SOURCES = \ + $(COMMON_SOURCES) +nodist_libpcre2_32_la_SOURCES = \ + $(NODIST_SOURCES) +libpcre2_32_la_CFLAGS = \ + -DPCRE2_CODE_UNIT_WIDTH=32 \ + $(VISIBILITY_CFLAGS) \ + $(CET_CFLAGS) \ + $(AM_CFLAGS) +libpcre2_32_la_LIBADD = +endif # WITH_PCRE2_32 + +# The pcre2_chartables.c.dist file is the default version of +# pcre2_chartables.c, used unless --enable-rebuild-chartables is specified. + +EXTRA_DIST += src/pcre2_chartables.c.dist +CLEANFILES += src/pcre2_chartables.c + +# The JIT compiler lives in a separate directory, but its files are #included +# when pcre2_jit_compile.c is processed, so they must be distributed. + +EXTRA_DIST += \ + src/sljit/sljitConfig.h \ + src/sljit/sljitConfigInternal.h \ + src/sljit/sljitExecAllocator.c \ + src/sljit/sljitLir.c \ + src/sljit/sljitLir.h \ + src/sljit/sljitNativeARM_32.c \ + src/sljit/sljitNativeARM_64.c \ + src/sljit/sljitNativeARM_T2_32.c \ + src/sljit/sljitNativeMIPS_32.c \ + src/sljit/sljitNativeMIPS_64.c \ + src/sljit/sljitNativeMIPS_common.c \ + src/sljit/sljitNativePPC_32.c \ + src/sljit/sljitNativePPC_64.c \ + src/sljit/sljitNativePPC_common.c \ + src/sljit/sljitNativeS390X.c \ + src/sljit/sljitNativeSPARC_32.c \ + src/sljit/sljitNativeSPARC_common.c \ + src/sljit/sljitNativeX86_32.c \ + src/sljit/sljitNativeX86_64.c \ + src/sljit/sljitNativeX86_common.c \ + src/sljit/sljitProtExecAllocator.c \ + src/sljit/sljitUtils.c \ + src/sljit/sljitWXExecAllocator.c + +# Some of the JIT sources are also in separate files that are #included. 
+ +EXTRA_DIST += \ + src/pcre2_jit_match.c \ + src/pcre2_jit_misc.c + +if WITH_PCRE2_8 +libpcre2_8_la_LDFLAGS = $(EXTRA_LIBPCRE2_8_LDFLAGS) +endif # WITH_PCRE2_8 +if WITH_PCRE2_16 +libpcre2_16_la_LDFLAGS = $(EXTRA_LIBPCRE2_16_LDFLAGS) +endif # WITH_PCRE2_16 +if WITH_PCRE2_32 +libpcre2_32_la_LDFLAGS = $(EXTRA_LIBPCRE2_32_LDFLAGS) +endif # WITH_PCRE2_32 + +if WITH_VALGRIND +if WITH_PCRE2_8 +libpcre2_8_la_CFLAGS += $(VALGRIND_CFLAGS) +endif # WITH_PCRE2_8 +if WITH_PCRE2_16 +libpcre2_16_la_CFLAGS += $(VALGRIND_CFLAGS) +endif # WITH_PCRE2_16 +if WITH_PCRE2_32 +libpcre2_32_la_CFLAGS += $(VALGRIND_CFLAGS) +endif # WITH_PCRE2_32 +endif # WITH_VALGRIND + +if WITH_GCOV +if WITH_PCRE2_8 +libpcre2_8_la_CFLAGS += $(GCOV_CFLAGS) +endif # WITH_PCRE2_8 +if WITH_PCRE2_16 +libpcre2_16_la_CFLAGS += $(GCOV_CFLAGS) +endif # WITH_PCRE2_16 +if WITH_PCRE2_32 +libpcre2_32_la_CFLAGS += $(GCOV_CFLAGS) +endif # WITH_PCRE2_32 +endif # WITH_GCOV + +## A version of the 8-bit library that has a POSIX API. + +if WITH_PCRE2_8 +lib_LTLIBRARIES += libpcre2-posix.la +libpcre2_posix_la_SOURCES = src/pcre2posix.c +libpcre2_posix_la_CFLAGS = \ + -DPCRE2_CODE_UNIT_WIDTH=8 \ + $(VISIBILITY_CFLAGS) $(AM_CFLAGS) +libpcre2_posix_la_LDFLAGS = $(EXTRA_LIBPCRE2_POSIX_LDFLAGS) +libpcre2_posix_la_LIBADD = libpcre2-8.la +if WITH_GCOV +libpcre2_posix_la_CFLAGS += $(GCOV_CFLAGS) +endif # WITH_GCOV +endif # WITH_PCRE2_8 + +## Build pcre2grep and optional fuzzer stuff if the 8-bit library is enabled + +if WITH_PCRE2_8 +bin_PROGRAMS += pcre2grep +pcre2grep_SOURCES = src/pcre2grep.c +pcre2grep_CFLAGS = $(AM_CFLAGS) +pcre2grep_LDADD = $(LIBZ) $(LIBBZ2) +pcre2grep_LDADD += libpcre2-8.la +if WITH_GCOV +pcre2grep_CFLAGS += $(GCOV_CFLAGS) +pcre2grep_LDADD += $(GCOV_LIBS) +endif # WITH_GCOV + +## If fuzzer support is enabled, build a non-distributed library containing the +## fuzzing function. Also build the standalone checking binary from the same +## source but using -DSTANDALONE. 
+ +if WITH_FUZZ_SUPPORT +noinst_LIBRARIES = .libs/libpcre2-fuzzsupport.a +_libs_libpcre2_fuzzsupport_a_SOURCES = src/pcre2_fuzzsupport.c +_libs_libpcre2_fuzzsupport_a_CFLAGS = $(AM_CFLAGS) +_libs_libpcre2_fuzzsupport_a_LIBADD = + +noinst_PROGRAMS += pcre2fuzzcheck +pcre2fuzzcheck_SOURCES = src/pcre2_fuzzsupport.c +pcre2fuzzcheck_CFLAGS = -DSTANDALONE $(AM_CFLAGS) +pcre2fuzzcheck_LDADD = libpcre2-8.la +if WITH_GCOV +pcre2fuzzcheck_CFLAGS += $(GCOV_CFLAGS) +pcre2fuzzcheck_LDADD += $(GCOV_LIBS) +endif # WITH_GCOV +endif # WITH_FUZZ_SUPPORT +endif # WITH_PCRE2_8 + +## -------- Testing ---------- + +## If JIT support is enabled, arrange for the JIT test program to run. + +if WITH_JIT +TESTS += pcre2_jit_test +noinst_PROGRAMS += pcre2_jit_test +pcre2_jit_test_SOURCES = src/pcre2_jit_test.c +pcre2_jit_test_CFLAGS = $(AM_CFLAGS) +pcre2_jit_test_LDADD = +if WITH_PCRE2_8 +pcre2_jit_test_LDADD += libpcre2-8.la +endif # WITH_PCRE2_8 +if WITH_PCRE2_16 +pcre2_jit_test_LDADD += libpcre2-16.la +endif # WITH_PCRE2_16 +if WITH_PCRE2_32 +pcre2_jit_test_LDADD += libpcre2-32.la +endif # WITH_PCRE2_32 +if WITH_GCOV +pcre2_jit_test_CFLAGS += $(GCOV_CFLAGS) +pcre2_jit_test_LDADD += $(GCOV_LIBS) +endif # WITH_GCOV +endif # WITH_JIT + +# Build the general pcre2test program. The file src/pcre2_printint.c is +# #included by pcre2test as many times as needed, at different code unit +# widths.
+ +bin_PROGRAMS += pcre2test +EXTRA_DIST += src/pcre2_printint.c +pcre2test_SOURCES = src/pcre2test.c +pcre2test_CFLAGS = $(AM_CFLAGS) +pcre2test_LDADD = $(LIBREADLINE) + +if WITH_PCRE2_8 +pcre2test_LDADD += libpcre2-8.la libpcre2-posix.la +endif # WITH_PCRE2_8 + +if WITH_PCRE2_16 +pcre2test_LDADD += libpcre2-16.la +endif # WITH_PCRE2_16 + +if WITH_PCRE2_32 +pcre2test_LDADD += libpcre2-32.la +endif # WITH_PCRE2_32 + +if WITH_VALGRIND +pcre2test_CFLAGS += $(VALGRIND_CFLAGS) +endif # WITH_VALGRIND + +if WITH_GCOV +pcre2test_CFLAGS += $(GCOV_CFLAGS) +pcre2test_LDADD += $(GCOV_LIBS) +endif # WITH_GCOV + +## The main library tests. Each test is a binary plus a script that runs that +## binary in various ways. We install these test binaries in case folks find it +## helpful. The two .bat files are for running the tests under Windows. + +TESTS += RunTest +EXTRA_DIST += RunTest.bat +dist_noinst_SCRIPTS += RunTest + +## When the 8-bit library is configured, pcre2grep will have been built. + +if WITH_PCRE2_8 +TESTS += RunGrepTest +EXTRA_DIST += RunGrepTest.bat +dist_noinst_SCRIPTS += RunGrepTest +endif # WITH_PCRE2_8 + +## Distribute all the test data files + +EXTRA_DIST += \ + testdata/grepbinary \ + testdata/grepfilelist \ + testdata/grepinput \ + testdata/grepinput3 \ + testdata/grepinput8 \ + testdata/grepinputM \ + testdata/grepinputv \ + testdata/grepinputx \ + testdata/greplist \ + testdata/grepoutput \ + testdata/grepoutput8 \ + testdata/grepoutputC \ + testdata/grepoutputCN \ + testdata/grepoutputN \ + testdata/greppatN4 \ + testdata/testbtables \ + testdata/testinput1 \ + testdata/testinput2 \ + testdata/testinput3 \ + testdata/testinput4 \ + testdata/testinput5 \ + testdata/testinput6 \ + testdata/testinput7 \ + testdata/testinput8 \ + testdata/testinput9 \ + testdata/testinput10 \ + testdata/testinput11 \ + testdata/testinput12 \ + testdata/testinput13 \ + testdata/testinput14 \ + testdata/testinput15 \ + testdata/testinput16 \ + testdata/testinput17 \ + 
testdata/testinput18 \ + testdata/testinput19 \ + testdata/testinput20 \ + testdata/testinput21 \ + testdata/testinput22 \ + testdata/testinput23 \ + testdata/testinput24 \ + testdata/testinput25 \ + testdata/testinputEBC \ + testdata/testoutput1 \ + testdata/testoutput2 \ + testdata/testoutput3 \ + testdata/testoutput3A \ + testdata/testoutput3B \ + testdata/testoutput4 \ + testdata/testoutput5 \ + testdata/testoutput6 \ + testdata/testoutput7 \ + testdata/testoutput8-16-2 \ + testdata/testoutput8-16-3 \ + testdata/testoutput8-16-4 \ + testdata/testoutput8-32-2 \ + testdata/testoutput8-32-3 \ + testdata/testoutput8-32-4 \ + testdata/testoutput8-8-2 \ + testdata/testoutput8-8-3 \ + testdata/testoutput8-8-4 \ + testdata/testoutput9 \ + testdata/testoutput10 \ + testdata/testoutput11-16 \ + testdata/testoutput11-32 \ + testdata/testoutput12-16 \ + testdata/testoutput12-32 \ + testdata/testoutput13 \ + testdata/testoutput14-16 \ + testdata/testoutput14-32 \ + testdata/testoutput14-8 \ + testdata/testoutput15 \ + testdata/testoutput16 \ + testdata/testoutput17 \ + testdata/testoutput18 \ + testdata/testoutput19 \ + testdata/testoutput20 \ + testdata/testoutput21 \ + testdata/testoutput22-16 \ + testdata/testoutput22-32 \ + testdata/testoutput22-8 \ + testdata/testoutput23 \ + testdata/testoutput24 \ + testdata/testoutput25 \ + testdata/testoutputEBC \ + testdata/valgrind-jit.supp \ + testdata/wintestinput3 \ + testdata/wintestoutput3 \ + perltest.sh + +# RunTest and RunGrepTest should clean up after themselves, but just in case +# they don't, add their working files to CLEANFILES. + +CLEANFILES += \ + testSinput \ + test3input \ + test3output \ + test3outputA \ + test3outputB \ + testtry \ + teststdout \ + teststderr \ + teststderrgrep \ + testtemp1grep \ + testtemp2grep \ + testtrygrep \ + testNinputgrep + +## ------------ End of testing ------------- + + +# PCRE2 demonstration program. Not built automatically. The point is that the +# users should build it themselves.
So just distribute the source. + +EXTRA_DIST += src/pcre2demo.c + + +# We have .pc files for pkg-config users. + +pkgconfigdir = $(libdir)/pkgconfig +pkgconfig_DATA = + +if WITH_PCRE2_8 +pkgconfig_DATA += libpcre2-8.pc libpcre2-posix.pc +endif + +if WITH_PCRE2_16 +pkgconfig_DATA += libpcre2-16.pc +endif + +if WITH_PCRE2_32 +pkgconfig_DATA += libpcre2-32.pc +endif + + +# gcov/lcov code coverage reporting +# +# Coverage reporting targets: +# +# coverage: Create a coverage report from 'make check' +# coverage-baseline: Capture baseline coverage information +# coverage-reset: This zeros the coverage counters only +# coverage-report: This creates the coverage report only +# coverage-clean-report: This removes the generated coverage report +# without cleaning the coverage data itself +# coverage-clean-data: This removes the captured coverage data without +# removing the coverage files created at compile time (*.gcno) +# coverage-clean: This cleans all coverage data including the generated +# coverage report. 
+ +if WITH_GCOV +COVERAGE_TEST_NAME = $(PACKAGE) +COVERAGE_NAME = $(PACKAGE)-$(VERSION) +COVERAGE_OUTPUT_FILE = $(COVERAGE_NAME)-coverage.info +COVERAGE_OUTPUT_DIR = $(COVERAGE_NAME)-coverage +COVERAGE_LCOV_EXTRA_FLAGS = +COVERAGE_GENHTML_EXTRA_FLAGS = + +coverage_quiet = $(coverage_quiet_$(V)) +coverage_quiet_ = $(coverage_quiet_$(AM_DEFAULT_VERBOSITY)) +coverage_quiet_0 = --quiet + +coverage-check: all + -$(MAKE) $(AM_MAKEFLAGS) -k check + +coverage-baseline: + $(LCOV) $(coverage_quiet) \ + --directory $(top_builddir) \ + --output-file "$(COVERAGE_OUTPUT_FILE)" \ + --capture \ + --initial + +coverage-report: + $(LCOV) $(coverage_quiet) \ + --directory $(top_builddir) \ + --capture \ + --output-file "$(COVERAGE_OUTPUT_FILE).tmp" \ + --test-name "$(COVERAGE_TEST_NAME)" \ + --no-checksum \ + --compat-libtool \ + $(COVERAGE_LCOV_EXTRA_FLAGS) + $(LCOV) $(coverage_quiet) \ + --directory $(top_builddir) \ + --output-file "$(COVERAGE_OUTPUT_FILE)" \ + --remove "$(COVERAGE_OUTPUT_FILE).tmp" \ + "/tmp/*" \ + "/usr/include/*" \ + "$(includedir)/*" + -@rm -f "$(COVERAGE_OUTPUT_FILE).tmp" + LANG=C $(GENHTML) $(coverage_quiet) \ + --prefix $(top_builddir) \ + --output-directory "$(COVERAGE_OUTPUT_DIR)" \ + --title "$(PACKAGE) $(VERSION) Code Coverage Report" \ + --show-details "$(COVERAGE_OUTPUT_FILE)" \ + --legend \ + $(COVERAGE_GENHTML_EXTRA_FLAGS) + @echo "Code coverage report written to file://$(abs_builddir)/$(COVERAGE_OUTPUT_DIR)/index.html" + +coverage-reset: + -$(LCOV) $(coverage_quiet) --zerocounters --directory $(top_builddir) + +coverage-clean-report: + -rm -f "$(COVERAGE_OUTPUT_FILE)" "$(COVERAGE_OUTPUT_FILE).tmp" + -rm -rf "$(COVERAGE_OUTPUT_DIR)" + +coverage-clean-data: + -find $(top_builddir) -name "*.gcda" -delete + +coverage-clean: coverage-reset coverage-clean-report coverage-clean-data + -find $(top_builddir) -name "*.gcno" -delete + +coverage-distclean: coverage-clean + +coverage: coverage-reset coverage-baseline coverage-check coverage-report +clean-local: 
coverage-clean +distclean-local: coverage-distclean + +.PHONY: coverage coverage-baseline coverage-check coverage-report coverage-reset coverage-clean-report coverage-clean-data coverage-clean coverage-distclean + +# Without coverage support, still arrange for 'make distclean' to get rid of +# any coverage files that may have been left from a different configuration. + +else + +coverage: + @echo "Configuring with --enable-coverage is required to generate code coverage report." + +DISTCLEANFILES += src/*.gcda src/*.gcno + +distclean-local: + rm -rf $(PACKAGE)-$(VERSION)-coverage* + +endif # WITH_GCOV + +## CMake support + +EXTRA_DIST += \ + cmake/COPYING-CMAKE-SCRIPTS \ + cmake/FindPackageHandleStandardArgs.cmake \ + cmake/FindReadline.cmake \ + cmake/FindEditline.cmake \ + CMakeLists.txt \ + config-cmake.h.in + +## end Makefile.am diff --git a/src/pcre2/Makefile.in b/src/pcre2/Makefile.in new file mode 100644 index 00000000..1da0cfc1 --- /dev/null +++ b/src/pcre2/Makefile.in @@ -0,0 +1,3664 @@ +# Makefile.in generated by automake 1.16.3 from Makefile.am. +# @configure_input@ + +# Copyright (C) 1994-2020 Free Software Foundation, Inc. + +# This Makefile.in is free software; the Free Software Foundation +# gives unlimited permission to copy and/or distribute it, +# with or without modifications, as long as this notice is preserved. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY, to the extent permitted by law; without +# even the implied warranty of MERCHANTABILITY or FITNESS FOR A +# PARTICULAR PURPOSE. + +@SET_MAKE@ + + + + + + +VPATH = @srcdir@ +am__is_gnu_make = { \ + if test -z '$(MAKELEVEL)'; then \ + false; \ + elif test -n '$(MAKE_HOST)'; then \ + true; \ + elif test -n '$(MAKE_VERSION)' && test -n '$(CURDIR)'; then \ + true; \ + else \ + false; \ + fi; \ +} +am__make_running_with_option = \ + case $${target_option-} in \ + ?) 
;; \ + *) echo "am__make_running_with_option: internal error: invalid" \ + "target option '$${target_option-}' specified" >&2; \ + exit 1;; \ + esac; \ + has_opt=no; \ + sane_makeflags=$$MAKEFLAGS; \ + if $(am__is_gnu_make); then \ + sane_makeflags=$$MFLAGS; \ + else \ + case $$MAKEFLAGS in \ + *\\[\ \ ]*) \ + bs=\\; \ + sane_makeflags=`printf '%s\n' "$$MAKEFLAGS" \ + | sed "s/$$bs$$bs[$$bs $$bs ]*//g"`;; \ + esac; \ + fi; \ + skip_next=no; \ + strip_trailopt () \ + { \ + flg=`printf '%s\n' "$$flg" | sed "s/$$1.*$$//"`; \ + }; \ + for flg in $$sane_makeflags; do \ + test $$skip_next = yes && { skip_next=no; continue; }; \ + case $$flg in \ + *=*|--*) continue;; \ + -*I) strip_trailopt 'I'; skip_next=yes;; \ + -*I?*) strip_trailopt 'I';; \ + -*O) strip_trailopt 'O'; skip_next=yes;; \ + -*O?*) strip_trailopt 'O';; \ + -*l) strip_trailopt 'l'; skip_next=yes;; \ + -*l?*) strip_trailopt 'l';; \ + -[dEDm]) skip_next=yes;; \ + -[JT]) skip_next=yes;; \ + esac; \ + case $$flg in \ + *$$target_option*) has_opt=yes; break;; \ + esac; \ + done; \ + test $$has_opt = yes +am__make_dryrun = (target_option=n; $(am__make_running_with_option)) +am__make_keepgoing = (target_option=k; $(am__make_running_with_option)) +pkgdatadir = $(datadir)/@PACKAGE@ +pkgincludedir = $(includedir)/@PACKAGE@ +pkglibdir = $(libdir)/@PACKAGE@ +pkglibexecdir = $(libexecdir)/@PACKAGE@ +am__cd = CDPATH="$${ZSH_VERSION+.}$(PATH_SEPARATOR)" && cd +install_sh_DATA = $(install_sh) -c -m 644 +install_sh_PROGRAM = $(install_sh) -c +install_sh_SCRIPT = $(install_sh) -c +INSTALL_HEADER = $(INSTALL_DATA) +transform = $(program_transform_name) +NORMAL_INSTALL = : +PRE_INSTALL = : +POST_INSTALL = : +NORMAL_UNINSTALL = : +PRE_UNINSTALL = : +POST_UNINSTALL = : +build_triplet = @build@ +host_triplet = @host@ +TESTS = $(am__EXEEXT_4) RunTest $(am__append_32) +bin_PROGRAMS = $(am__EXEEXT_1) pcre2test$(EXEEXT) +noinst_PROGRAMS = $(am__EXEEXT_2) $(am__EXEEXT_3) $(am__EXEEXT_4) +@WITH_REBUILD_CHARTABLES_TRUE@am__append_1 = 
pcre2_dftables +@WITH_PCRE2_8_TRUE@am__append_2 = libpcre2-8.la +@WITH_PCRE2_16_TRUE@am__append_3 = libpcre2-16.la +@WITH_PCRE2_32_TRUE@am__append_4 = libpcre2-32.la +@WITH_PCRE2_8_TRUE@@WITH_VALGRIND_TRUE@am__append_5 = $(VALGRIND_CFLAGS) +@WITH_PCRE2_16_TRUE@@WITH_VALGRIND_TRUE@am__append_6 = $(VALGRIND_CFLAGS) +@WITH_PCRE2_32_TRUE@@WITH_VALGRIND_TRUE@am__append_7 = $(VALGRIND_CFLAGS) +@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__append_8 = $(GCOV_CFLAGS) +@WITH_GCOV_TRUE@@WITH_PCRE2_16_TRUE@am__append_9 = $(GCOV_CFLAGS) +@WITH_GCOV_TRUE@@WITH_PCRE2_32_TRUE@am__append_10 = $(GCOV_CFLAGS) +@WITH_PCRE2_8_TRUE@am__append_11 = libpcre2-posix.la +@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__append_12 = $(GCOV_CFLAGS) +@WITH_PCRE2_8_TRUE@am__append_13 = pcre2grep +@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__append_14 = $(GCOV_CFLAGS) +@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__append_15 = $(GCOV_LIBS) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@am__append_16 = pcre2fuzzcheck +@WITH_FUZZ_SUPPORT_TRUE@@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__append_17 = $(GCOV_CFLAGS) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__append_18 = $(GCOV_LIBS) +@WITH_JIT_TRUE@am__append_19 = pcre2_jit_test +@WITH_JIT_TRUE@am__append_20 = pcre2_jit_test +@WITH_JIT_TRUE@@WITH_PCRE2_8_TRUE@am__append_21 = libpcre2-8.la +@WITH_JIT_TRUE@@WITH_PCRE2_16_TRUE@am__append_22 = libpcre2-16.la +@WITH_JIT_TRUE@@WITH_PCRE2_32_TRUE@am__append_23 = libpcre2-32.la +@WITH_GCOV_TRUE@@WITH_JIT_TRUE@am__append_24 = $(GCOV_CFLAGS) +@WITH_GCOV_TRUE@@WITH_JIT_TRUE@am__append_25 = $(GCOV_LIBS) +@WITH_PCRE2_8_TRUE@am__append_26 = libpcre2-8.la libpcre2-posix.la +@WITH_PCRE2_16_TRUE@am__append_27 = libpcre2-16.la +@WITH_PCRE2_32_TRUE@am__append_28 = libpcre2-32.la +@WITH_VALGRIND_TRUE@am__append_29 = $(VALGRIND_CFLAGS) +@WITH_GCOV_TRUE@am__append_30 = $(GCOV_CFLAGS) +@WITH_GCOV_TRUE@am__append_31 = $(GCOV_LIBS) +@WITH_PCRE2_8_TRUE@am__append_32 = RunGrepTest +@WITH_PCRE2_8_TRUE@am__append_33 = RunGrepTest.bat 
+@WITH_PCRE2_8_TRUE@am__append_34 = RunGrepTest +@WITH_PCRE2_8_TRUE@am__append_35 = libpcre2-8.pc libpcre2-posix.pc +@WITH_PCRE2_16_TRUE@am__append_36 = libpcre2-16.pc +@WITH_PCRE2_32_TRUE@am__append_37 = libpcre2-32.pc +@WITH_GCOV_FALSE@am__append_38 = src/*.gcda src/*.gcno +subdir = . +ACLOCAL_M4 = $(top_srcdir)/aclocal.m4 +am__aclocal_m4_deps = $(top_srcdir)/m4/ax_pthread.m4 \ + $(top_srcdir)/m4/libtool.m4 $(top_srcdir)/m4/ltoptions.m4 \ + $(top_srcdir)/m4/ltsugar.m4 $(top_srcdir)/m4/ltversion.m4 \ + $(top_srcdir)/m4/lt~obsolete.m4 \ + $(top_srcdir)/m4/pcre2_visibility.m4 \ + $(top_srcdir)/configure.ac +am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ + $(ACLOCAL_M4) +DIST_COMMON = $(srcdir)/Makefile.am $(top_srcdir)/configure \ + $(am__configure_deps) $(am__dist_noinst_SCRIPTS_DIST) \ + $(dist_doc_DATA) $(dist_html_DATA) $(include_HEADERS) \ + $(am__DIST_COMMON) +am__CONFIG_DISTCLEAN_FILES = config.status config.cache config.log \ + configure.lineno config.status.lineno +mkinstalldirs = $(install_sh) -d +CONFIG_HEADER = $(top_builddir)/src/config.h +CONFIG_CLEAN_FILES = libpcre2-8.pc libpcre2-16.pc libpcre2-32.pc \ + libpcre2-posix.pc pcre2-config src/pcre2.h +CONFIG_CLEAN_VPATH_FILES = +@WITH_PCRE2_8_TRUE@am__EXEEXT_1 = pcre2grep$(EXEEXT) +am__installdirs = "$(DESTDIR)$(bindir)" "$(DESTDIR)$(libdir)" \ + "$(DESTDIR)$(bindir)" "$(DESTDIR)$(man1dir)" \ + "$(DESTDIR)$(man3dir)" "$(DESTDIR)$(docdir)" \ + "$(DESTDIR)$(htmldir)" "$(DESTDIR)$(pkgconfigdir)" \ + "$(DESTDIR)$(includedir)" "$(DESTDIR)$(includedir)" +@WITH_REBUILD_CHARTABLES_TRUE@am__EXEEXT_2 = pcre2_dftables$(EXEEXT) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@am__EXEEXT_3 = pcre2fuzzcheck$(EXEEXT) +@WITH_JIT_TRUE@am__EXEEXT_4 = pcre2_jit_test$(EXEEXT) +PROGRAMS = $(bin_PROGRAMS) $(noinst_PROGRAMS) +LIBRARIES = $(noinst_LIBRARIES) +am__vpath_adj_setup = srcdirstrip=`echo "$(srcdir)" | sed 's|.|.|g'`; +am__vpath_adj = case $$p in \ + $(srcdir)/*) f=`echo "$$p" | sed 
"s|^$$srcdirstrip/||"`;; \ + *) f=$$p;; \ + esac; +am__strip_dir = f=`echo $$p | sed -e 's|^.*/||'`; +am__install_max = 40 +am__nobase_strip_setup = \ + srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*|]/\\\\&/g'` +am__nobase_strip = \ + for p in $$list; do echo "$$p"; done | sed -e "s|$$srcdirstrip/||" +am__nobase_list = $(am__nobase_strip_setup); \ + for p in $$list; do echo "$$p $$p"; done | \ + sed "s| $$srcdirstrip/| |;"' / .*\//!s/ .*/ ./; s,\( .*\)/[^/]*$$,\1,' | \ + $(AWK) 'BEGIN { files["."] = "" } { files[$$2] = files[$$2] " " $$1; \ + if (++n[$$2] == $(am__install_max)) \ + { print $$2, files[$$2]; n[$$2] = 0; files[$$2] = "" } } \ + END { for (dir in files) print dir, files[dir] }' +am__base_list = \ + sed '$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;$$!N;s/\n/ /g' | \ + sed '$$!N;$$!N;$$!N;$$!N;s/\n/ /g' +am__uninstall_files_from_dir = { \ + test -z "$$files" \ + || { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \ + || { echo " ( cd '$$dir' && rm -f" $$files ")"; \ + $(am__cd) "$$dir" && rm -f $$files; }; \ + } +LTLIBRARIES = $(lib_LTLIBRARIES) +ARFLAGS = cru +AM_V_AR = $(am__v_AR_@AM_V@) +am__v_AR_ = $(am__v_AR_@AM_DEFAULT_V@) +am__v_AR_0 = @echo " AR " $@; +am__v_AR_1 = +_libs_libpcre2_fuzzsupport_a_AR = $(AR) $(ARFLAGS) +_libs_libpcre2_fuzzsupport_a_DEPENDENCIES = +am___libs_libpcre2_fuzzsupport_a_SOURCES_DIST = \ + src/pcre2_fuzzsupport.c +am__dirstamp = $(am__leading_dot)dirstamp +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@am__libs_libpcre2_fuzzsupport_a_OBJECTS = src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.$(OBJEXT) +_libs_libpcre2_fuzzsupport_a_OBJECTS = \ + $(am__libs_libpcre2_fuzzsupport_a_OBJECTS) +libpcre2_16_la_DEPENDENCIES = +am__libpcre2_16_la_SOURCES_DIST = src/pcre2_auto_possess.c \ + src/pcre2_compile.c src/pcre2_config.c src/pcre2_context.c \ + src/pcre2_convert.c src/pcre2_dfa_match.c src/pcre2_error.c \ + src/pcre2_extuni.c src/pcre2_find_bracket.c \ + src/pcre2_internal.h src/pcre2_intmodedep.h \ + 
src/pcre2_jit_compile.c src/pcre2_jit_neon_inc.h \ + src/pcre2_jit_simd_inc.h src/pcre2_maketables.c \ + src/pcre2_match.c src/pcre2_match_data.c src/pcre2_newline.c \ + src/pcre2_ord2utf.c src/pcre2_pattern_info.c \ + src/pcre2_script_run.c src/pcre2_serialize.c \ + src/pcre2_string_utils.c src/pcre2_study.c \ + src/pcre2_substitute.c src/pcre2_substring.c \ + src/pcre2_tables.c src/pcre2_ucd.c src/pcre2_ucp.h \ + src/pcre2_valid_utf.c src/pcre2_xclass.c +am__objects_1 = src/libpcre2_16_la-pcre2_auto_possess.lo \ + src/libpcre2_16_la-pcre2_compile.lo \ + src/libpcre2_16_la-pcre2_config.lo \ + src/libpcre2_16_la-pcre2_context.lo \ + src/libpcre2_16_la-pcre2_convert.lo \ + src/libpcre2_16_la-pcre2_dfa_match.lo \ + src/libpcre2_16_la-pcre2_error.lo \ + src/libpcre2_16_la-pcre2_extuni.lo \ + src/libpcre2_16_la-pcre2_find_bracket.lo \ + src/libpcre2_16_la-pcre2_jit_compile.lo \ + src/libpcre2_16_la-pcre2_maketables.lo \ + src/libpcre2_16_la-pcre2_match.lo \ + src/libpcre2_16_la-pcre2_match_data.lo \ + src/libpcre2_16_la-pcre2_newline.lo \ + src/libpcre2_16_la-pcre2_ord2utf.lo \ + src/libpcre2_16_la-pcre2_pattern_info.lo \ + src/libpcre2_16_la-pcre2_script_run.lo \ + src/libpcre2_16_la-pcre2_serialize.lo \ + src/libpcre2_16_la-pcre2_string_utils.lo \ + src/libpcre2_16_la-pcre2_study.lo \ + src/libpcre2_16_la-pcre2_substitute.lo \ + src/libpcre2_16_la-pcre2_substring.lo \ + src/libpcre2_16_la-pcre2_tables.lo \ + src/libpcre2_16_la-pcre2_ucd.lo \ + src/libpcre2_16_la-pcre2_valid_utf.lo \ + src/libpcre2_16_la-pcre2_xclass.lo +@WITH_PCRE2_16_TRUE@am_libpcre2_16_la_OBJECTS = $(am__objects_1) +am__objects_2 = src/libpcre2_16_la-pcre2_chartables.lo +@WITH_PCRE2_16_TRUE@nodist_libpcre2_16_la_OBJECTS = $(am__objects_2) +libpcre2_16_la_OBJECTS = $(am_libpcre2_16_la_OBJECTS) \ + $(nodist_libpcre2_16_la_OBJECTS) +AM_V_lt = $(am__v_lt_@AM_V@) +am__v_lt_ = $(am__v_lt_@AM_DEFAULT_V@) +am__v_lt_0 = --silent +am__v_lt_1 = +libpcre2_16_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \ + 
$(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \ + $(libpcre2_16_la_CFLAGS) $(CFLAGS) $(libpcre2_16_la_LDFLAGS) \ + $(LDFLAGS) -o $@ +@WITH_PCRE2_16_TRUE@am_libpcre2_16_la_rpath = -rpath $(libdir) +libpcre2_32_la_DEPENDENCIES = +am__libpcre2_32_la_SOURCES_DIST = src/pcre2_auto_possess.c \ + src/pcre2_compile.c src/pcre2_config.c src/pcre2_context.c \ + src/pcre2_convert.c src/pcre2_dfa_match.c src/pcre2_error.c \ + src/pcre2_extuni.c src/pcre2_find_bracket.c \ + src/pcre2_internal.h src/pcre2_intmodedep.h \ + src/pcre2_jit_compile.c src/pcre2_jit_neon_inc.h \ + src/pcre2_jit_simd_inc.h src/pcre2_maketables.c \ + src/pcre2_match.c src/pcre2_match_data.c src/pcre2_newline.c \ + src/pcre2_ord2utf.c src/pcre2_pattern_info.c \ + src/pcre2_script_run.c src/pcre2_serialize.c \ + src/pcre2_string_utils.c src/pcre2_study.c \ + src/pcre2_substitute.c src/pcre2_substring.c \ + src/pcre2_tables.c src/pcre2_ucd.c src/pcre2_ucp.h \ + src/pcre2_valid_utf.c src/pcre2_xclass.c +am__objects_3 = src/libpcre2_32_la-pcre2_auto_possess.lo \ + src/libpcre2_32_la-pcre2_compile.lo \ + src/libpcre2_32_la-pcre2_config.lo \ + src/libpcre2_32_la-pcre2_context.lo \ + src/libpcre2_32_la-pcre2_convert.lo \ + src/libpcre2_32_la-pcre2_dfa_match.lo \ + src/libpcre2_32_la-pcre2_error.lo \ + src/libpcre2_32_la-pcre2_extuni.lo \ + src/libpcre2_32_la-pcre2_find_bracket.lo \ + src/libpcre2_32_la-pcre2_jit_compile.lo \ + src/libpcre2_32_la-pcre2_maketables.lo \ + src/libpcre2_32_la-pcre2_match.lo \ + src/libpcre2_32_la-pcre2_match_data.lo \ + src/libpcre2_32_la-pcre2_newline.lo \ + src/libpcre2_32_la-pcre2_ord2utf.lo \ + src/libpcre2_32_la-pcre2_pattern_info.lo \ + src/libpcre2_32_la-pcre2_script_run.lo \ + src/libpcre2_32_la-pcre2_serialize.lo \ + src/libpcre2_32_la-pcre2_string_utils.lo \ + src/libpcre2_32_la-pcre2_study.lo \ + src/libpcre2_32_la-pcre2_substitute.lo \ + src/libpcre2_32_la-pcre2_substring.lo \ + src/libpcre2_32_la-pcre2_tables.lo \ + src/libpcre2_32_la-pcre2_ucd.lo \ + 
src/libpcre2_32_la-pcre2_valid_utf.lo \ + src/libpcre2_32_la-pcre2_xclass.lo +@WITH_PCRE2_32_TRUE@am_libpcre2_32_la_OBJECTS = $(am__objects_3) +am__objects_4 = src/libpcre2_32_la-pcre2_chartables.lo +@WITH_PCRE2_32_TRUE@nodist_libpcre2_32_la_OBJECTS = $(am__objects_4) +libpcre2_32_la_OBJECTS = $(am_libpcre2_32_la_OBJECTS) \ + $(nodist_libpcre2_32_la_OBJECTS) +libpcre2_32_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \ + $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \ + $(libpcre2_32_la_CFLAGS) $(CFLAGS) $(libpcre2_32_la_LDFLAGS) \ + $(LDFLAGS) -o $@ +@WITH_PCRE2_32_TRUE@am_libpcre2_32_la_rpath = -rpath $(libdir) +libpcre2_8_la_DEPENDENCIES = +am__libpcre2_8_la_SOURCES_DIST = src/pcre2_auto_possess.c \ + src/pcre2_compile.c src/pcre2_config.c src/pcre2_context.c \ + src/pcre2_convert.c src/pcre2_dfa_match.c src/pcre2_error.c \ + src/pcre2_extuni.c src/pcre2_find_bracket.c \ + src/pcre2_internal.h src/pcre2_intmodedep.h \ + src/pcre2_jit_compile.c src/pcre2_jit_neon_inc.h \ + src/pcre2_jit_simd_inc.h src/pcre2_maketables.c \ + src/pcre2_match.c src/pcre2_match_data.c src/pcre2_newline.c \ + src/pcre2_ord2utf.c src/pcre2_pattern_info.c \ + src/pcre2_script_run.c src/pcre2_serialize.c \ + src/pcre2_string_utils.c src/pcre2_study.c \ + src/pcre2_substitute.c src/pcre2_substring.c \ + src/pcre2_tables.c src/pcre2_ucd.c src/pcre2_ucp.h \ + src/pcre2_valid_utf.c src/pcre2_xclass.c +am__objects_5 = src/libpcre2_8_la-pcre2_auto_possess.lo \ + src/libpcre2_8_la-pcre2_compile.lo \ + src/libpcre2_8_la-pcre2_config.lo \ + src/libpcre2_8_la-pcre2_context.lo \ + src/libpcre2_8_la-pcre2_convert.lo \ + src/libpcre2_8_la-pcre2_dfa_match.lo \ + src/libpcre2_8_la-pcre2_error.lo \ + src/libpcre2_8_la-pcre2_extuni.lo \ + src/libpcre2_8_la-pcre2_find_bracket.lo \ + src/libpcre2_8_la-pcre2_jit_compile.lo \ + src/libpcre2_8_la-pcre2_maketables.lo \ + src/libpcre2_8_la-pcre2_match.lo \ + src/libpcre2_8_la-pcre2_match_data.lo \ + src/libpcre2_8_la-pcre2_newline.lo \ + 
src/libpcre2_8_la-pcre2_ord2utf.lo \ + src/libpcre2_8_la-pcre2_pattern_info.lo \ + src/libpcre2_8_la-pcre2_script_run.lo \ + src/libpcre2_8_la-pcre2_serialize.lo \ + src/libpcre2_8_la-pcre2_string_utils.lo \ + src/libpcre2_8_la-pcre2_study.lo \ + src/libpcre2_8_la-pcre2_substitute.lo \ + src/libpcre2_8_la-pcre2_substring.lo \ + src/libpcre2_8_la-pcre2_tables.lo \ + src/libpcre2_8_la-pcre2_ucd.lo \ + src/libpcre2_8_la-pcre2_valid_utf.lo \ + src/libpcre2_8_la-pcre2_xclass.lo +@WITH_PCRE2_8_TRUE@am_libpcre2_8_la_OBJECTS = $(am__objects_5) +am__objects_6 = src/libpcre2_8_la-pcre2_chartables.lo +@WITH_PCRE2_8_TRUE@nodist_libpcre2_8_la_OBJECTS = $(am__objects_6) +libpcre2_8_la_OBJECTS = $(am_libpcre2_8_la_OBJECTS) \ + $(nodist_libpcre2_8_la_OBJECTS) +libpcre2_8_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \ + $(LIBTOOLFLAGS) --mode=link $(CCLD) $(libpcre2_8_la_CFLAGS) \ + $(CFLAGS) $(libpcre2_8_la_LDFLAGS) $(LDFLAGS) -o $@ +@WITH_PCRE2_8_TRUE@am_libpcre2_8_la_rpath = -rpath $(libdir) +@WITH_PCRE2_8_TRUE@libpcre2_posix_la_DEPENDENCIES = libpcre2-8.la +am__libpcre2_posix_la_SOURCES_DIST = src/pcre2posix.c +@WITH_PCRE2_8_TRUE@am_libpcre2_posix_la_OBJECTS = \ +@WITH_PCRE2_8_TRUE@ src/libpcre2_posix_la-pcre2posix.lo +libpcre2_posix_la_OBJECTS = $(am_libpcre2_posix_la_OBJECTS) +libpcre2_posix_la_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \ + $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \ + $(libpcre2_posix_la_CFLAGS) $(CFLAGS) \ + $(libpcre2_posix_la_LDFLAGS) $(LDFLAGS) -o $@ +@WITH_PCRE2_8_TRUE@am_libpcre2_posix_la_rpath = -rpath $(libdir) +am__pcre2_dftables_SOURCES_DIST = src/pcre2_dftables.c +@WITH_REBUILD_CHARTABLES_TRUE@am_pcre2_dftables_OBJECTS = \ +@WITH_REBUILD_CHARTABLES_TRUE@ src/pcre2_dftables.$(OBJEXT) +pcre2_dftables_OBJECTS = $(am_pcre2_dftables_OBJECTS) +pcre2_dftables_LDADD = $(LDADD) +am__pcre2_jit_test_SOURCES_DIST = src/pcre2_jit_test.c +@WITH_JIT_TRUE@am_pcre2_jit_test_OBJECTS = \ +@WITH_JIT_TRUE@ 
src/pcre2_jit_test-pcre2_jit_test.$(OBJEXT) +pcre2_jit_test_OBJECTS = $(am_pcre2_jit_test_OBJECTS) +am__DEPENDENCIES_1 = +@WITH_GCOV_TRUE@@WITH_JIT_TRUE@am__DEPENDENCIES_2 = \ +@WITH_GCOV_TRUE@@WITH_JIT_TRUE@ $(am__DEPENDENCIES_1) +@WITH_JIT_TRUE@pcre2_jit_test_DEPENDENCIES = $(am__append_21) \ +@WITH_JIT_TRUE@ $(am__append_22) $(am__append_23) \ +@WITH_JIT_TRUE@ $(am__DEPENDENCIES_2) +pcre2_jit_test_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \ + $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \ + $(pcre2_jit_test_CFLAGS) $(CFLAGS) $(AM_LDFLAGS) $(LDFLAGS) -o \ + $@ +am__pcre2fuzzcheck_SOURCES_DIST = src/pcre2_fuzzsupport.c +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@am_pcre2fuzzcheck_OBJECTS = src/pcre2fuzzcheck-pcre2_fuzzsupport.$(OBJEXT) +pcre2fuzzcheck_OBJECTS = $(am_pcre2fuzzcheck_OBJECTS) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__DEPENDENCIES_3 = $(am__DEPENDENCIES_1) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@pcre2fuzzcheck_DEPENDENCIES = \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ libpcre2-8.la \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ $(am__DEPENDENCIES_3) +pcre2fuzzcheck_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC \ + $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=link $(CCLD) \ + $(pcre2fuzzcheck_CFLAGS) $(CFLAGS) $(AM_LDFLAGS) $(LDFLAGS) -o \ + $@ +am__pcre2grep_SOURCES_DIST = src/pcre2grep.c +@WITH_PCRE2_8_TRUE@am_pcre2grep_OBJECTS = \ +@WITH_PCRE2_8_TRUE@ src/pcre2grep-pcre2grep.$(OBJEXT) +pcre2grep_OBJECTS = $(am_pcre2grep_OBJECTS) +@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@am__DEPENDENCIES_4 = \ +@WITH_GCOV_TRUE@@WITH_PCRE2_8_TRUE@ $(am__DEPENDENCIES_1) +@WITH_PCRE2_8_TRUE@pcre2grep_DEPENDENCIES = $(am__DEPENDENCIES_1) \ +@WITH_PCRE2_8_TRUE@ $(am__DEPENDENCIES_1) libpcre2-8.la \ +@WITH_PCRE2_8_TRUE@ $(am__DEPENDENCIES_4) +pcre2grep_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \ + $(LIBTOOLFLAGS) --mode=link $(CCLD) $(pcre2grep_CFLAGS) \ + $(CFLAGS) $(AM_LDFLAGS) $(LDFLAGS) -o $@ +am_pcre2test_OBJECTS = 
src/pcre2test-pcre2test.$(OBJEXT) +pcre2test_OBJECTS = $(am_pcre2test_OBJECTS) +@WITH_GCOV_TRUE@am__DEPENDENCIES_5 = $(am__DEPENDENCIES_1) +pcre2test_DEPENDENCIES = $(am__DEPENDENCIES_1) $(am__append_26) \ + $(am__append_27) $(am__append_28) $(am__DEPENDENCIES_5) +pcre2test_LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \ + $(LIBTOOLFLAGS) --mode=link $(CCLD) $(pcre2test_CFLAGS) \ + $(CFLAGS) $(AM_LDFLAGS) $(LDFLAGS) -o $@ +am__dist_noinst_SCRIPTS_DIST = RunTest RunGrepTest +SCRIPTS = $(bin_SCRIPTS) $(dist_noinst_SCRIPTS) +AM_V_P = $(am__v_P_@AM_V@) +am__v_P_ = $(am__v_P_@AM_DEFAULT_V@) +am__v_P_0 = false +am__v_P_1 = : +AM_V_GEN = $(am__v_GEN_@AM_V@) +am__v_GEN_ = $(am__v_GEN_@AM_DEFAULT_V@) +am__v_GEN_0 = @echo " GEN " $@; +am__v_GEN_1 = +AM_V_at = $(am__v_at_@AM_V@) +am__v_at_ = $(am__v_at_@AM_DEFAULT_V@) +am__v_at_0 = @ +am__v_at_1 = +DEFAULT_INCLUDES = -I.@am__isrc@ -I$(top_builddir)/src +depcomp = $(SHELL) $(top_srcdir)/depcomp +am__maybe_remake_depfiles = depfiles +am__depfiles_remade = src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Po \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Plo \ + 
src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Plo \ + src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Plo \ + src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Plo \ + 
src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Plo \ + src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Plo \ + src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Plo \ + src/$(DEPDIR)/pcre2_dftables.Po \ + src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Po \ + src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Po \ + src/$(DEPDIR)/pcre2grep-pcre2grep.Po \ + src/$(DEPDIR)/pcre2test-pcre2test.Po +am__mv = mv -f +COMPILE = $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) \ + $(CPPFLAGS) $(AM_CFLAGS) $(CFLAGS) +LTCOMPILE = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \ + $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) \ + $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) \ + $(AM_CFLAGS) $(CFLAGS) +AM_V_CC = $(am__v_CC_@AM_V@) +am__v_CC_ = $(am__v_CC_@AM_DEFAULT_V@) 
+am__v_CC_0 = @echo " CC " $@; +am__v_CC_1 = +CCLD = $(CC) +LINK = $(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) \ + $(LIBTOOLFLAGS) --mode=link $(CCLD) $(AM_CFLAGS) $(CFLAGS) \ + $(AM_LDFLAGS) $(LDFLAGS) -o $@ +AM_V_CCLD = $(am__v_CCLD_@AM_V@) +am__v_CCLD_ = $(am__v_CCLD_@AM_DEFAULT_V@) +am__v_CCLD_0 = @echo " CCLD " $@; +am__v_CCLD_1 = +SOURCES = $(_libs_libpcre2_fuzzsupport_a_SOURCES) \ + $(libpcre2_16_la_SOURCES) $(nodist_libpcre2_16_la_SOURCES) \ + $(libpcre2_32_la_SOURCES) $(nodist_libpcre2_32_la_SOURCES) \ + $(libpcre2_8_la_SOURCES) $(nodist_libpcre2_8_la_SOURCES) \ + $(libpcre2_posix_la_SOURCES) $(pcre2_dftables_SOURCES) \ + $(pcre2_jit_test_SOURCES) $(pcre2fuzzcheck_SOURCES) \ + $(pcre2grep_SOURCES) $(pcre2test_SOURCES) +DIST_SOURCES = $(am___libs_libpcre2_fuzzsupport_a_SOURCES_DIST) \ + $(am__libpcre2_16_la_SOURCES_DIST) \ + $(am__libpcre2_32_la_SOURCES_DIST) \ + $(am__libpcre2_8_la_SOURCES_DIST) \ + $(am__libpcre2_posix_la_SOURCES_DIST) \ + $(am__pcre2_dftables_SOURCES_DIST) \ + $(am__pcre2_jit_test_SOURCES_DIST) \ + $(am__pcre2fuzzcheck_SOURCES_DIST) \ + $(am__pcre2grep_SOURCES_DIST) $(pcre2test_SOURCES) +am__can_run_installinfo = \ + case $$AM_UPDATE_INFO_DIR in \ + n|no|NO) false;; \ + *) (install-info --version) >/dev/null 2>&1;; \ + esac +man1dir = $(mandir)/man1 +man3dir = $(mandir)/man3 +NROFF = nroff +MANS = $(dist_man_MANS) +DATA = $(dist_doc_DATA) $(dist_html_DATA) $(pkgconfig_DATA) +HEADERS = $(include_HEADERS) $(nodist_include_HEADERS) +am__tagged_files = $(HEADERS) $(SOURCES) $(TAGS_FILES) $(LISP) +# Read a list of newline-separated strings from the standard input, +# and print each of them once, without duplicates. Input order is +# *not* preserved. +am__uniquify_input = $(AWK) '\ + BEGIN { nonempty = 0; } \ + { items[$$0] = 1; nonempty = 1; } \ + END { if (nonempty) { for (i in items) print i; }; } \ +' +# Make sure the list of sources is unique. 
This is necessary because, +# e.g., the same source file might be shared among _SOURCES variables +# for different programs/libraries. +am__define_uniq_tagged_files = \ + list='$(am__tagged_files)'; \ + unique=`for i in $$list; do \ + if test -f "$$i"; then echo $$i; else echo $(srcdir)/$$i; fi; \ + done | $(am__uniquify_input)` +ETAGS = etags +CTAGS = ctags +CSCOPE = cscope +AM_RECURSIVE_TARGETS = cscope check recheck +am__tty_colors_dummy = \ + mgn= red= grn= lgn= blu= brg= std=; \ + am__color_tests=no +am__tty_colors = { \ + $(am__tty_colors_dummy); \ + if test "X$(AM_COLOR_TESTS)" = Xno; then \ + am__color_tests=no; \ + elif test "X$(AM_COLOR_TESTS)" = Xalways; then \ + am__color_tests=yes; \ + elif test "X$$TERM" != Xdumb && { test -t 1; } 2>/dev/null; then \ + am__color_tests=yes; \ + fi; \ + if test $$am__color_tests = yes; then \ + red=''; \ + grn=''; \ + lgn=''; \ + blu=''; \ + mgn=''; \ + brg=''; \ + std=''; \ + fi; \ +} +am__recheck_rx = ^[ ]*:recheck:[ ]* +am__global_test_result_rx = ^[ ]*:global-test-result:[ ]* +am__copy_in_global_log_rx = ^[ ]*:copy-in-global-log:[ ]* +# A command that, given a newline-separated list of test names on the +# standard input, print the name of the tests that are to be re-run +# upon "make recheck". +am__list_recheck_tests = $(AWK) '{ \ + recheck = 1; \ + while ((rc = (getline line < ($$0 ".trs"))) != 0) \ + { \ + if (rc < 0) \ + { \ + if ((getline line2 < ($$0 ".log")) < 0) \ + recheck = 0; \ + break; \ + } \ + else if (line ~ /$(am__recheck_rx)[nN][Oo]/) \ + { \ + recheck = 0; \ + break; \ + } \ + else if (line ~ /$(am__recheck_rx)[yY][eE][sS]/) \ + { \ + break; \ + } \ + }; \ + if (recheck) \ + print $$0; \ + close ($$0 ".trs"); \ + close ($$0 ".log"); \ +}' +# A command that, given a newline-separated list of test names on the +# standard input, create the global log from their .trs and .log files. 
+am__create_global_log = $(AWK) ' \ +function fatal(msg) \ +{ \ + print "fatal: making $@: " msg | "cat >&2"; \ + exit 1; \ +} \ +function rst_section(header) \ +{ \ + print header; \ + len = length(header); \ + for (i = 1; i <= len; i = i + 1) \ + printf "="; \ + printf "\n\n"; \ +} \ +{ \ + copy_in_global_log = 1; \ + global_test_result = "RUN"; \ + while ((rc = (getline line < ($$0 ".trs"))) != 0) \ + { \ + if (rc < 0) \ + fatal("failed to read from " $$0 ".trs"); \ + if (line ~ /$(am__global_test_result_rx)/) \ + { \ + sub("$(am__global_test_result_rx)", "", line); \ + sub("[ ]*$$", "", line); \ + global_test_result = line; \ + } \ + else if (line ~ /$(am__copy_in_global_log_rx)[nN][oO]/) \ + copy_in_global_log = 0; \ + }; \ + if (copy_in_global_log) \ + { \ + rst_section(global_test_result ": " $$0); \ + while ((rc = (getline line < ($$0 ".log"))) != 0) \ + { \ + if (rc < 0) \ + fatal("failed to read from " $$0 ".log"); \ + print line; \ + }; \ + printf "\n"; \ + }; \ + close ($$0 ".trs"); \ + close ($$0 ".log"); \ +}' +# Restructured Text title. +am__rst_title = { sed 's/.*/ & /;h;s/./=/g;p;x;s/ *$$//;p;g' && echo; } +# Solaris 10 'make', and several other traditional 'make' implementations, +# pass "-e" to $(SHELL), and POSIX 2008 even requires this. Work around it +# by disabling -e (using the XSI extension "set +e") if it's set. +am__sh_e_setup = case $$- in *e*) set +e;; esac +# Default flags passed to test drivers. +am__common_driver_flags = \ + --color-tests "$$am__color_tests" \ + --enable-hard-errors "$$am__enable_hard_errors" \ + --expect-failure "$$am__expect_failure" +# To be inserted before the command running the test. Creates the +# directory for the log if needed. Stores in $dir the directory +# containing $f, in $tst the test, in $log the log. Executes the +# developer- defined test setup AM_TESTS_ENVIRONMENT (if any), and +# passes TESTS_ENVIRONMENT. 
Set up options for the wrapper that +# will run the test scripts (or their associated LOG_COMPILER, if +# they have one). +am__check_pre = \ +$(am__sh_e_setup); \ +$(am__vpath_adj_setup) $(am__vpath_adj) \ +$(am__tty_colors); \ +srcdir=$(srcdir); export srcdir; \ +case "$@" in \ + */*) am__odir=`echo "./$@" | sed 's|/[^/]*$$||'`;; \ + *) am__odir=.;; \ +esac; \ +test "x$$am__odir" = x"." || test -d "$$am__odir" \ + || $(MKDIR_P) "$$am__odir" || exit $$?; \ +if test -f "./$$f"; then dir=./; \ +elif test -f "$$f"; then dir=; \ +else dir="$(srcdir)/"; fi; \ +tst=$$dir$$f; log='$@'; \ +if test -n '$(DISABLE_HARD_ERRORS)'; then \ + am__enable_hard_errors=no; \ +else \ + am__enable_hard_errors=yes; \ +fi; \ +case " $(XFAIL_TESTS) " in \ + *[\ \ ]$$f[\ \ ]* | *[\ \ ]$$dir$$f[\ \ ]*) \ + am__expect_failure=yes;; \ + *) \ + am__expect_failure=no;; \ +esac; \ +$(AM_TESTS_ENVIRONMENT) $(TESTS_ENVIRONMENT) +# A shell command to get the names of the tests scripts with any registered +# extension removed (i.e., equivalently, the names of the test logs, with +# the '.log' extension removed). The result is saved in the shell variable +# '$bases'. This honors runtime overriding of TESTS and TEST_LOGS. Sadly, +# we cannot use something simpler, involving e.g., "$(TEST_LOGS:.log=)", +# since that might cause problem with VPATH rewrites for suffix-less tests. +# See also 'test-harness-vpath-rewrite.sh' and 'test-trs-basic.sh'. 
+am__set_TESTS_bases = \ + bases='$(TEST_LOGS)'; \ + bases=`for i in $$bases; do echo $$i; done | sed 's/\.log$$//'`; \ + bases=`echo $$bases` +AM_TESTSUITE_SUMMARY_HEADER = ' for $(PACKAGE_STRING)' +RECHECK_LOGS = $(TEST_LOGS) +TEST_SUITE_LOG = test-suite.log +TEST_EXTENSIONS = @EXEEXT@ .test +LOG_DRIVER = $(SHELL) $(top_srcdir)/test-driver +LOG_COMPILE = $(LOG_COMPILER) $(AM_LOG_FLAGS) $(LOG_FLAGS) +am__set_b = \ + case '$@' in \ + */*) \ + case '$*' in \ + */*) b='$*';; \ + *) b=`echo '$@' | sed 's/\.log$$//'`; \ + esac;; \ + *) \ + b='$*';; \ + esac +am__test_logs1 = $(TESTS:=.log) +am__test_logs2 = $(am__test_logs1:@EXEEXT@.log=.log) +TEST_LOGS = $(am__test_logs2:.test.log=.log) +TEST_LOG_DRIVER = $(SHELL) $(top_srcdir)/test-driver +TEST_LOG_COMPILE = $(TEST_LOG_COMPILER) $(AM_TEST_LOG_FLAGS) \ + $(TEST_LOG_FLAGS) +am__DIST_COMMON = $(dist_man_MANS) $(srcdir)/Makefile.in \ + $(srcdir)/libpcre2-16.pc.in $(srcdir)/libpcre2-32.pc.in \ + $(srcdir)/libpcre2-8.pc.in $(srcdir)/libpcre2-posix.pc.in \ + $(srcdir)/pcre2-config.in $(top_srcdir)/src/config.h.in \ + $(top_srcdir)/src/pcre2.h.in AUTHORS COPYING ChangeLog INSTALL \ + NEWS README ar-lib compile config.guess config.sub depcomp \ + install-sh ltmain.sh missing test-driver +DISTFILES = $(DIST_COMMON) $(DIST_SOURCES) $(TEXINFOS) $(EXTRA_DIST) +distdir = $(PACKAGE)-$(VERSION) +top_distdir = $(distdir) +am__remove_distdir = \ + if test -d "$(distdir)"; then \ + find "$(distdir)" -type d ! -perm -200 -exec chmod u+w {} ';' \ + && rm -rf "$(distdir)" \ + || { sleep 5 && rm -rf "$(distdir)"; }; \ + else :; fi +am__post_remove_distdir = $(am__remove_distdir) +DIST_ARCHIVES = $(distdir).tar.gz $(distdir).tar.bz2 $(distdir).zip +GZIP_ENV = --best +DIST_TARGETS = dist-bzip2 dist-gzip dist-zip +# Exists only to be overridden by the user if desired. +AM_DISTCHECK_DVI_TARGET = dvi +distuninstallcheck_listfiles = find . 
-type f -print +am__distuninstallcheck_listfiles = $(distuninstallcheck_listfiles) \ + | sed 's|^\./|$(prefix)/|' | grep -v '$(infodir)/dir$$' +distcleancheck_listfiles = find . -type f -print +ACLOCAL = @ACLOCAL@ +AMTAR = @AMTAR@ +AM_DEFAULT_VERBOSITY = @AM_DEFAULT_VERBOSITY@ +AR = @AR@ +AS = @AS@ +AUTOCONF = @AUTOCONF@ +AUTOHEADER = @AUTOHEADER@ +AUTOMAKE = @AUTOMAKE@ +AWK = @AWK@ +CC = @CC@ +CCDEPMODE = @CCDEPMODE@ +CET_CFLAGS = @CET_CFLAGS@ +CFLAGS = @CFLAGS@ +CPPFLAGS = @CPPFLAGS@ +CYGPATH_W = @CYGPATH_W@ +DEFS = @DEFS@ +DEPDIR = @DEPDIR@ +DISTCHECK_CONFIGURE_FLAGS = @DISTCHECK_CONFIGURE_FLAGS@ +DLLTOOL = @DLLTOOL@ +DSYMUTIL = @DSYMUTIL@ +DUMPBIN = @DUMPBIN@ +ECHO_C = @ECHO_C@ +ECHO_N = @ECHO_N@ +ECHO_T = @ECHO_T@ +EGREP = @EGREP@ +EXEEXT = @EXEEXT@ +EXTRA_LIBPCRE2_16_LDFLAGS = @EXTRA_LIBPCRE2_16_LDFLAGS@ +EXTRA_LIBPCRE2_32_LDFLAGS = @EXTRA_LIBPCRE2_32_LDFLAGS@ +EXTRA_LIBPCRE2_8_LDFLAGS = @EXTRA_LIBPCRE2_8_LDFLAGS@ +EXTRA_LIBPCRE2_POSIX_LDFLAGS = @EXTRA_LIBPCRE2_POSIX_LDFLAGS@ +FGREP = @FGREP@ +GCOV_CFLAGS = @GCOV_CFLAGS@ +GCOV_CXXFLAGS = @GCOV_CXXFLAGS@ +GCOV_LIBS = @GCOV_LIBS@ +GENHTML = @GENHTML@ +GREP = @GREP@ +HAVE_VISIBILITY = @HAVE_VISIBILITY@ +INSTALL = @INSTALL@ +INSTALL_DATA = @INSTALL_DATA@ +INSTALL_PROGRAM = @INSTALL_PROGRAM@ +INSTALL_SCRIPT = @INSTALL_SCRIPT@ +INSTALL_STRIP_PROGRAM = @INSTALL_STRIP_PROGRAM@ +LCOV = @LCOV@ +LD = @LD@ +LDFLAGS = @LDFLAGS@ +LIBBZ2 = @LIBBZ2@ +LIBOBJS = @LIBOBJS@ +LIBREADLINE = @LIBREADLINE@ +LIBS = @LIBS@ +LIBTOOL = @LIBTOOL@ +LIBZ = @LIBZ@ +LIB_POSTFIX = @LIB_POSTFIX@ +LIPO = @LIPO@ +LN_S = @LN_S@ +LTLIBOBJS = @LTLIBOBJS@ +LT_SYS_LIBRARY_PATH = @LT_SYS_LIBRARY_PATH@ +MAKEINFO = @MAKEINFO@ +MANIFEST_TOOL = @MANIFEST_TOOL@ +MKDIR_P = @MKDIR_P@ +NM = @NM@ +NMEDIT = @NMEDIT@ +OBJDUMP = @OBJDUMP@ +OBJEXT = @OBJEXT@ +OTOOL = @OTOOL@ +OTOOL64 = @OTOOL64@ +PACKAGE = @PACKAGE@ +PACKAGE_BUGREPORT = @PACKAGE_BUGREPORT@ +PACKAGE_NAME = @PACKAGE_NAME@ +PACKAGE_STRING = @PACKAGE_STRING@ +PACKAGE_TARNAME = @PACKAGE_TARNAME@ 
+PACKAGE_URL = @PACKAGE_URL@ +PACKAGE_VERSION = @PACKAGE_VERSION@ +PATH_SEPARATOR = @PATH_SEPARATOR@ +PCRE2_DATE = @PCRE2_DATE@ +PCRE2_MAJOR = @PCRE2_MAJOR@ +PCRE2_MINOR = @PCRE2_MINOR@ +PCRE2_PRERELEASE = @PCRE2_PRERELEASE@ +PCRE2_STATIC_CFLAG = @PCRE2_STATIC_CFLAG@ +PKG_CONFIG = @PKG_CONFIG@ +PKG_CONFIG_LIBDIR = @PKG_CONFIG_LIBDIR@ +PKG_CONFIG_PATH = @PKG_CONFIG_PATH@ +PTHREAD_CC = @PTHREAD_CC@ +PTHREAD_CFLAGS = @PTHREAD_CFLAGS@ +PTHREAD_LIBS = @PTHREAD_LIBS@ +RANLIB = @RANLIB@ +SED = @SED@ +SET_MAKE = @SET_MAKE@ +SHELL = @SHELL@ +SHTOOL = @SHTOOL@ +STRIP = @STRIP@ +VALGRIND_CFLAGS = @VALGRIND_CFLAGS@ +VALGRIND_LIBS = @VALGRIND_LIBS@ +VERSION = @VERSION@ +VISIBILITY_CFLAGS = @VISIBILITY_CFLAGS@ +VISIBILITY_CXXFLAGS = @VISIBILITY_CXXFLAGS@ +abs_builddir = @abs_builddir@ +abs_srcdir = @abs_srcdir@ +abs_top_builddir = @abs_top_builddir@ +abs_top_srcdir = @abs_top_srcdir@ +ac_ct_AR = @ac_ct_AR@ +ac_ct_CC = @ac_ct_CC@ +ac_ct_DUMPBIN = @ac_ct_DUMPBIN@ +am__include = @am__include@ +am__leading_dot = @am__leading_dot@ +am__quote = @am__quote@ +am__tar = @am__tar@ +am__untar = @am__untar@ +ax_pthread_config = @ax_pthread_config@ +bindir = @bindir@ +build = @build@ +build_alias = @build_alias@ +build_cpu = @build_cpu@ +build_os = @build_os@ +build_vendor = @build_vendor@ +builddir = @builddir@ +datadir = @datadir@ +datarootdir = @datarootdir@ +docdir = @docdir@ +dvidir = @dvidir@ +enable_pcre2_16 = @enable_pcre2_16@ +enable_pcre2_32 = @enable_pcre2_32@ +enable_pcre2_8 = @enable_pcre2_8@ +exec_prefix = @exec_prefix@ +host = @host@ +host_alias = @host_alias@ +host_cpu = @host_cpu@ +host_os = @host_os@ +host_vendor = @host_vendor@ +htmldir = @htmldir@ +includedir = @includedir@ +infodir = @infodir@ +install_sh = @install_sh@ +libdir = @libdir@ +libexecdir = @libexecdir@ +localedir = @localedir@ +localstatedir = @localstatedir@ +mandir = @mandir@ +mkdir_p = @mkdir_p@ +oldincludedir = @oldincludedir@ +pdfdir = @pdfdir@ +prefix = @prefix@ +program_transform_name = 
@program_transform_name@ +psdir = @psdir@ +runstatedir = @runstatedir@ +sbindir = @sbindir@ +sharedstatedir = @sharedstatedir@ +srcdir = @srcdir@ +sysconfdir = @sysconfdir@ +target_alias = @target_alias@ +top_build_prefix = @top_build_prefix@ +top_builddir = @top_builddir@ +top_srcdir = @top_srcdir@ +AUTOMAKE_OPTIONS = subdir-objects +ACLOCAL_AMFLAGS = -I m4 +AM_CPPFLAGS = "-I$(srcdir)/src" +dist_doc_DATA = \ + AUTHORS \ + COPYING \ + ChangeLog \ + LICENCE \ + NEWS \ + README \ + doc/pcre2.txt \ + doc/pcre2-config.txt \ + doc/pcre2grep.txt \ + doc/pcre2test.txt + +dist_html_DATA = \ + doc/html/NON-AUTOTOOLS-BUILD.txt \ + doc/html/README.txt \ + doc/html/index.html \ + doc/html/pcre2-config.html \ + doc/html/pcre2.html \ + doc/html/pcre2_callout_enumerate.html \ + doc/html/pcre2_code_copy.html \ + doc/html/pcre2_code_copy_with_tables.html \ + doc/html/pcre2_code_free.html \ + doc/html/pcre2_compile.html \ + doc/html/pcre2_compile_context_copy.html \ + doc/html/pcre2_compile_context_create.html \ + doc/html/pcre2_compile_context_free.html \ + doc/html/pcre2_config.html \ + doc/html/pcre2_convert_context_copy.html \ + doc/html/pcre2_convert_context_create.html \ + doc/html/pcre2_convert_context_free.html \ + doc/html/pcre2_converted_pattern_free.html \ + doc/html/pcre2_dfa_match.html \ + doc/html/pcre2_general_context_copy.html \ + doc/html/pcre2_general_context_create.html \ + doc/html/pcre2_general_context_free.html \ + doc/html/pcre2_get_error_message.html \ + doc/html/pcre2_get_mark.html \ + doc/html/pcre2_get_match_data_size.html \ + doc/html/pcre2_get_ovector_count.html \ + doc/html/pcre2_get_ovector_pointer.html \ + doc/html/pcre2_get_startchar.html \ + doc/html/pcre2_jit_compile.html \ + doc/html/pcre2_jit_free_unused_memory.html \ + doc/html/pcre2_jit_match.html \ + doc/html/pcre2_jit_stack_assign.html \ + doc/html/pcre2_jit_stack_create.html \ + doc/html/pcre2_jit_stack_free.html \ + doc/html/pcre2_maketables.html \ + doc/html/pcre2_maketables_free.html \ + 
doc/html/pcre2_match.html \ + doc/html/pcre2_match_context_copy.html \ + doc/html/pcre2_match_context_create.html \ + doc/html/pcre2_match_context_free.html \ + doc/html/pcre2_match_data_create.html \ + doc/html/pcre2_match_data_create_from_pattern.html \ + doc/html/pcre2_match_data_free.html \ + doc/html/pcre2_pattern_convert.html \ + doc/html/pcre2_pattern_info.html \ + doc/html/pcre2_serialize_decode.html \ + doc/html/pcre2_serialize_encode.html \ + doc/html/pcre2_serialize_free.html \ + doc/html/pcre2_serialize_get_number_of_codes.html \ + doc/html/pcre2_set_bsr.html \ + doc/html/pcre2_set_callout.html \ + doc/html/pcre2_set_character_tables.html \ + doc/html/pcre2_set_compile_extra_options.html \ + doc/html/pcre2_set_compile_recursion_guard.html \ + doc/html/pcre2_set_depth_limit.html \ + doc/html/pcre2_set_glob_escape.html \ + doc/html/pcre2_set_glob_separator.html \ + doc/html/pcre2_set_heap_limit.html \ + doc/html/pcre2_set_match_limit.html \ + doc/html/pcre2_set_max_pattern_length.html \ + doc/html/pcre2_set_offset_limit.html \ + doc/html/pcre2_set_newline.html \ + doc/html/pcre2_set_parens_nest_limit.html \ + doc/html/pcre2_set_recursion_limit.html \ + doc/html/pcre2_set_recursion_memory_management.html \ + doc/html/pcre2_set_substitute_callout.html \ + doc/html/pcre2_substitute.html \ + doc/html/pcre2_substring_copy_byname.html \ + doc/html/pcre2_substring_copy_bynumber.html \ + doc/html/pcre2_substring_free.html \ + doc/html/pcre2_substring_get_byname.html \ + doc/html/pcre2_substring_get_bynumber.html \ + doc/html/pcre2_substring_length_byname.html \ + doc/html/pcre2_substring_length_bynumber.html \ + doc/html/pcre2_substring_list_free.html \ + doc/html/pcre2_substring_list_get.html \ + doc/html/pcre2_substring_nametable_scan.html \ + doc/html/pcre2_substring_number_from_name.html \ + doc/html/pcre2api.html \ + doc/html/pcre2build.html \ + doc/html/pcre2callout.html \ + doc/html/pcre2compat.html \ + doc/html/pcre2convert.html \ + 
doc/html/pcre2demo.html \ + doc/html/pcre2grep.html \ + doc/html/pcre2jit.html \ + doc/html/pcre2limits.html \ + doc/html/pcre2matching.html \ + doc/html/pcre2partial.html \ + doc/html/pcre2pattern.html \ + doc/html/pcre2perform.html \ + doc/html/pcre2posix.html \ + doc/html/pcre2sample.html \ + doc/html/pcre2serialize.html \ + doc/html/pcre2syntax.html \ + doc/html/pcre2test.html \ + doc/html/pcre2unicode.html + +dist_man_MANS = \ + doc/pcre2-config.1 \ + doc/pcre2.3 \ + doc/pcre2_callout_enumerate.3 \ + doc/pcre2_code_copy.3 \ + doc/pcre2_code_copy_with_tables.3 \ + doc/pcre2_code_free.3 \ + doc/pcre2_compile.3 \ + doc/pcre2_compile_context_copy.3 \ + doc/pcre2_compile_context_create.3 \ + doc/pcre2_compile_context_free.3 \ + doc/pcre2_config.3 \ + doc/pcre2_convert_context_copy.3 \ + doc/pcre2_convert_context_create.3 \ + doc/pcre2_convert_context_free.3 \ + doc/pcre2_converted_pattern_free.3 \ + doc/pcre2_dfa_match.3 \ + doc/pcre2_general_context_copy.3 \ + doc/pcre2_general_context_create.3 \ + doc/pcre2_general_context_free.3 \ + doc/pcre2_get_error_message.3 \ + doc/pcre2_get_mark.3 \ + doc/pcre2_get_match_data_size.3 \ + doc/pcre2_get_ovector_count.3 \ + doc/pcre2_get_ovector_pointer.3 \ + doc/pcre2_get_startchar.3 \ + doc/pcre2_jit_compile.3 \ + doc/pcre2_jit_free_unused_memory.3 \ + doc/pcre2_jit_match.3 \ + doc/pcre2_jit_stack_assign.3 \ + doc/pcre2_jit_stack_create.3 \ + doc/pcre2_jit_stack_free.3 \ + doc/pcre2_maketables.3 \ + doc/pcre2_maketables_free.3 \ + doc/pcre2_match.3 \ + doc/pcre2_match_context_copy.3 \ + doc/pcre2_match_context_create.3 \ + doc/pcre2_match_context_free.3 \ + doc/pcre2_match_data_create.3 \ + doc/pcre2_match_data_create_from_pattern.3 \ + doc/pcre2_match_data_free.3 \ + doc/pcre2_pattern_convert.3 \ + doc/pcre2_pattern_info.3 \ + doc/pcre2_serialize_decode.3 \ + doc/pcre2_serialize_encode.3 \ + doc/pcre2_serialize_free.3 \ + doc/pcre2_serialize_get_number_of_codes.3 \ + doc/pcre2_set_bsr.3 \ + doc/pcre2_set_callout.3 \ + 
doc/pcre2_set_character_tables.3 \ + doc/pcre2_set_compile_extra_options.3 \ + doc/pcre2_set_compile_recursion_guard.3 \ + doc/pcre2_set_depth_limit.3 \ + doc/pcre2_set_glob_escape.3 \ + doc/pcre2_set_glob_separator.3 \ + doc/pcre2_set_heap_limit.3 \ + doc/pcre2_set_match_limit.3 \ + doc/pcre2_set_max_pattern_length.3 \ + doc/pcre2_set_offset_limit.3 \ + doc/pcre2_set_newline.3 \ + doc/pcre2_set_parens_nest_limit.3 \ + doc/pcre2_set_recursion_limit.3 \ + doc/pcre2_set_recursion_memory_management.3 \ + doc/pcre2_set_substitute_callout.3 \ + doc/pcre2_substitute.3 \ + doc/pcre2_substring_copy_byname.3 \ + doc/pcre2_substring_copy_bynumber.3 \ + doc/pcre2_substring_free.3 \ + doc/pcre2_substring_get_byname.3 \ + doc/pcre2_substring_get_bynumber.3 \ + doc/pcre2_substring_length_byname.3 \ + doc/pcre2_substring_length_bynumber.3 \ + doc/pcre2_substring_list_free.3 \ + doc/pcre2_substring_list_get.3 \ + doc/pcre2_substring_nametable_scan.3 \ + doc/pcre2_substring_number_from_name.3 \ + doc/pcre2api.3 \ + doc/pcre2build.3 \ + doc/pcre2callout.3 \ + doc/pcre2compat.3 \ + doc/pcre2convert.3 \ + doc/pcre2demo.3 \ + doc/pcre2grep.1 \ + doc/pcre2jit.3 \ + doc/pcre2limits.3 \ + doc/pcre2matching.3 \ + doc/pcre2partial.3 \ + doc/pcre2pattern.3 \ + doc/pcre2perform.3 \ + doc/pcre2posix.3 \ + doc/pcre2sample.3 \ + doc/pcre2serialize.3 \ + doc/pcre2syntax.3 \ + doc/pcre2test.1 \ + doc/pcre2unicode.3 + + +# The Libtool libraries to install. We'll add to this later. +lib_LTLIBRARIES = $(am__append_2) $(am__append_3) $(am__append_4) \ + $(am__append_11) +check_SCRIPTS = +dist_noinst_SCRIPTS = RunTest $(am__append_34) + +# Additional files to delete on 'make clean', 'make distclean', +# and 'make maintainer-clean'. + +# RunTest and RunGrepTest should clean up after themselves, but just in case +# they don't, add their working files to CLEANFILES. 
+CLEANFILES = src/pcre2_chartables.c testSinput test3input test3output \ + test3outputA test3outputB testtry teststdout teststderr \ + teststderrgrep testtemp1grep testtemp2grep testtrygrep \ + testNinputgrep +DISTCLEANFILES = src/config.h.in~ $(am__append_38) +MAINTAINERCLEANFILES = src/pcre2.h.generic src/config.h.generic + +# Additional files to bundle with the distribution, over and above what +# the Autotools include by default. + +# These files contain additional m4 macros that are used by autoconf. + +# These files contain maintenance information + +# These files are used in the preparation of a release + +# These files are usable versions of pcre2.h and config.h that are distributed +# for the benefit of people who are building PCRE2 manually, without the +# Autotools support. + +# The pcre2_chartables.c.dist file is the default version of +# pcre2_chartables.c, used unless --enable-rebuild-chartables is specified. + +# The JIT compiler lives in a separate directory, but its files are #included +# when pcre2_jit_compile.c is processed, so they must be distributed. + +# Some of the JIT sources are also in separate files that are #included. + +# PCRE2 demonstration program. Not built automatically. The point is that the +# users should build it themselves. So just distribute the source. 
+EXTRA_DIST = m4/ax_pthread.m4 m4/pcre2_visibility.m4 \ + NON-AUTOTOOLS-BUILD HACKING PrepareRelease CheckMan CleanTxt \ + Detrail 132html doc/index.html.src src/pcre2.h.generic \ + src/config.h.generic src/pcre2_chartables.c.dist \ + src/sljit/sljitConfig.h src/sljit/sljitConfigInternal.h \ + src/sljit/sljitExecAllocator.c src/sljit/sljitLir.c \ + src/sljit/sljitLir.h src/sljit/sljitNativeARM_32.c \ + src/sljit/sljitNativeARM_64.c src/sljit/sljitNativeARM_T2_32.c \ + src/sljit/sljitNativeMIPS_32.c src/sljit/sljitNativeMIPS_64.c \ + src/sljit/sljitNativeMIPS_common.c \ + src/sljit/sljitNativePPC_32.c src/sljit/sljitNativePPC_64.c \ + src/sljit/sljitNativePPC_common.c src/sljit/sljitNativeS390X.c \ + src/sljit/sljitNativeSPARC_32.c \ + src/sljit/sljitNativeSPARC_common.c \ + src/sljit/sljitNativeX86_32.c src/sljit/sljitNativeX86_64.c \ + src/sljit/sljitNativeX86_common.c \ + src/sljit/sljitProtExecAllocator.c src/sljit/sljitUtils.c \ + src/sljit/sljitWXExecAllocator.c src/pcre2_jit_match.c \ + src/pcre2_jit_misc.c src/pcre2_printint.c RunTest.bat \ + $(am__append_33) testdata/grepbinary testdata/grepfilelist \ + testdata/grepinput testdata/grepinput3 testdata/grepinput8 \ + testdata/grepinputM testdata/grepinputv testdata/grepinputx \ + testdata/greplist testdata/grepoutput testdata/grepoutput8 \ + testdata/grepoutputC testdata/grepoutputCN \ + testdata/grepoutputN testdata/greppatN4 testdata/testbtables \ + testdata/testinput1 testdata/testinput2 testdata/testinput3 \ + testdata/testinput4 testdata/testinput5 testdata/testinput6 \ + testdata/testinput7 testdata/testinput8 testdata/testinput9 \ + testdata/testinput10 testdata/testinput11 testdata/testinput12 \ + testdata/testinput13 testdata/testinput14 testdata/testinput15 \ + testdata/testinput16 testdata/testinput17 testdata/testinput18 \ + testdata/testinput19 testdata/testinput20 testdata/testinput21 \ + testdata/testinput22 testdata/testinput23 testdata/testinput24 \ + testdata/testinput25 
testdata/testinputEBC \ + testdata/testoutput1 testdata/testoutput2 testdata/testoutput3 \ + testdata/testoutput3A testdata/testoutput3B \ + testdata/testoutput4 testdata/testoutput5 testdata/testoutput6 \ + testdata/testoutput7 testdata/testoutput8-16-2 \ + testdata/testoutput8-16-3 testdata/testoutput8-16-4 \ + testdata/testoutput8-32-2 testdata/testoutput8-32-3 \ + testdata/testoutput8-32-4 testdata/testoutput8-8-2 \ + testdata/testoutput8-8-3 testdata/testoutput8-8-4 \ + testdata/testoutput9 testdata/testoutput10 \ + testdata/testoutput11-16 testdata/testoutput11-32 \ + testdata/testoutput12-16 testdata/testoutput12-32 \ + testdata/testoutput13 testdata/testoutput14-16 \ + testdata/testoutput14-32 testdata/testoutput14-8 \ + testdata/testoutput15 testdata/testoutput16 \ + testdata/testoutput17 testdata/testoutput18 \ + testdata/testoutput19 testdata/testoutput20 \ + testdata/testoutput21 testdata/testoutput22-16 \ + testdata/testoutput22-32 testdata/testoutput22-8 \ + testdata/testoutput23 testdata/testoutput24 \ + testdata/testoutput25 testdata/testoutputEBC \ + testdata/valgrind-jit.supp testdata/wintestinput3 \ + testdata/wintestoutput3 perltest.sh src/pcre2demo.c \ + cmake/COPYING-CMAKE-SCRIPTS \ + cmake/FindPackageHandleStandardArgs.cmake \ + cmake/FindReadline.cmake cmake/FindEditline.cmake \ + CMakeLists.txt config-cmake.h.in + +# These are the header files we'll install. We do not distribute pcre2.h +# because it is generated from pcre2.h.in. +nodist_include_HEADERS = src/pcre2.h +include_HEADERS = src/pcre2posix.h + +# This is the "config" script. 
+bin_SCRIPTS = pcre2-config +@WITH_REBUILD_CHARTABLES_TRUE@pcre2_dftables_SOURCES = src/pcre2_dftables.c +BUILT_SOURCES = src/pcre2_chartables.c +NODIST_SOURCES = src/pcre2_chartables.c +COMMON_SOURCES = \ + src/pcre2_auto_possess.c \ + src/pcre2_compile.c \ + src/pcre2_config.c \ + src/pcre2_context.c \ + src/pcre2_convert.c \ + src/pcre2_dfa_match.c \ + src/pcre2_error.c \ + src/pcre2_extuni.c \ + src/pcre2_find_bracket.c \ + src/pcre2_internal.h \ + src/pcre2_intmodedep.h \ + src/pcre2_jit_compile.c \ + src/pcre2_jit_neon_inc.h \ + src/pcre2_jit_simd_inc.h \ + src/pcre2_maketables.c \ + src/pcre2_match.c \ + src/pcre2_match_data.c \ + src/pcre2_newline.c \ + src/pcre2_ord2utf.c \ + src/pcre2_pattern_info.c \ + src/pcre2_script_run.c \ + src/pcre2_serialize.c \ + src/pcre2_string_utils.c \ + src/pcre2_study.c \ + src/pcre2_substitute.c \ + src/pcre2_substring.c \ + src/pcre2_tables.c \ + src/pcre2_ucd.c \ + src/pcre2_ucp.h \ + src/pcre2_valid_utf.c \ + src/pcre2_xclass.c + +@WITH_PCRE2_8_TRUE@libpcre2_8_la_SOURCES = \ +@WITH_PCRE2_8_TRUE@ $(COMMON_SOURCES) + +@WITH_PCRE2_8_TRUE@nodist_libpcre2_8_la_SOURCES = \ +@WITH_PCRE2_8_TRUE@ $(NODIST_SOURCES) + +@WITH_PCRE2_8_TRUE@libpcre2_8_la_CFLAGS = -DPCRE2_CODE_UNIT_WIDTH=8 \ +@WITH_PCRE2_8_TRUE@ $(VISIBILITY_CFLAGS) $(CET_CFLAGS) \ +@WITH_PCRE2_8_TRUE@ $(AM_CFLAGS) $(am__append_5) \ +@WITH_PCRE2_8_TRUE@ $(am__append_8) +@WITH_PCRE2_8_TRUE@libpcre2_8_la_LIBADD = +@WITH_PCRE2_16_TRUE@libpcre2_16_la_SOURCES = \ +@WITH_PCRE2_16_TRUE@ $(COMMON_SOURCES) + +@WITH_PCRE2_16_TRUE@nodist_libpcre2_16_la_SOURCES = \ +@WITH_PCRE2_16_TRUE@ $(NODIST_SOURCES) + +@WITH_PCRE2_16_TRUE@libpcre2_16_la_CFLAGS = \ +@WITH_PCRE2_16_TRUE@ -DPCRE2_CODE_UNIT_WIDTH=16 \ +@WITH_PCRE2_16_TRUE@ $(VISIBILITY_CFLAGS) $(CET_CFLAGS) \ +@WITH_PCRE2_16_TRUE@ $(AM_CFLAGS) $(am__append_6) \ +@WITH_PCRE2_16_TRUE@ $(am__append_9) +@WITH_PCRE2_16_TRUE@libpcre2_16_la_LIBADD = +@WITH_PCRE2_32_TRUE@libpcre2_32_la_SOURCES = \ +@WITH_PCRE2_32_TRUE@ $(COMMON_SOURCES) 
+ +@WITH_PCRE2_32_TRUE@nodist_libpcre2_32_la_SOURCES = \ +@WITH_PCRE2_32_TRUE@ $(NODIST_SOURCES) + +@WITH_PCRE2_32_TRUE@libpcre2_32_la_CFLAGS = \ +@WITH_PCRE2_32_TRUE@ -DPCRE2_CODE_UNIT_WIDTH=32 \ +@WITH_PCRE2_32_TRUE@ $(VISIBILITY_CFLAGS) $(CET_CFLAGS) \ +@WITH_PCRE2_32_TRUE@ $(AM_CFLAGS) $(am__append_7) \ +@WITH_PCRE2_32_TRUE@ $(am__append_10) +@WITH_PCRE2_32_TRUE@libpcre2_32_la_LIBADD = +@WITH_PCRE2_8_TRUE@libpcre2_8_la_LDFLAGS = $(EXTRA_LIBPCRE2_8_LDFLAGS) +@WITH_PCRE2_16_TRUE@libpcre2_16_la_LDFLAGS = $(EXTRA_LIBPCRE2_16_LDFLAGS) +@WITH_PCRE2_32_TRUE@libpcre2_32_la_LDFLAGS = $(EXTRA_LIBPCRE2_32_LDFLAGS) +@WITH_PCRE2_8_TRUE@libpcre2_posix_la_SOURCES = src/pcre2posix.c +@WITH_PCRE2_8_TRUE@libpcre2_posix_la_CFLAGS = \ +@WITH_PCRE2_8_TRUE@ -DPCRE2_CODE_UNIT_WIDTH=8 \ +@WITH_PCRE2_8_TRUE@ $(VISIBILITY_CFLAGS) $(AM_CFLAGS) \ +@WITH_PCRE2_8_TRUE@ $(am__append_12) +@WITH_PCRE2_8_TRUE@libpcre2_posix_la_LDFLAGS = $(EXTRA_LIBPCRE2_POSIX_LDFLAGS) +@WITH_PCRE2_8_TRUE@libpcre2_posix_la_LIBADD = libpcre2-8.la +@WITH_PCRE2_8_TRUE@pcre2grep_SOURCES = src/pcre2grep.c +@WITH_PCRE2_8_TRUE@pcre2grep_CFLAGS = $(AM_CFLAGS) $(am__append_14) +@WITH_PCRE2_8_TRUE@pcre2grep_LDADD = $(LIBZ) $(LIBBZ2) libpcre2-8.la \ +@WITH_PCRE2_8_TRUE@ $(am__append_15) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@noinst_LIBRARIES = .libs/libpcre2-fuzzsupport.a +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@_libs_libpcre2_fuzzsupport_a_SOURCES = src/pcre2_fuzzsupport.c +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@_libs_libpcre2_fuzzsupport_a_CFLAGS = $(AM_CFLAGS) +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@_libs_libpcre2_fuzzsupport_a_LIBADD = +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@pcre2fuzzcheck_SOURCES = src/pcre2_fuzzsupport.c +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@pcre2fuzzcheck_CFLAGS = \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ -DSTANDALONE \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ $(AM_CFLAGS) \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ $(am__append_17) 
+@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@pcre2fuzzcheck_LDADD = \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ libpcre2-8.la \ +@WITH_FUZZ_SUPPORT_TRUE@@WITH_PCRE2_8_TRUE@ $(am__append_18) +@WITH_JIT_TRUE@pcre2_jit_test_SOURCES = src/pcre2_jit_test.c +@WITH_JIT_TRUE@pcre2_jit_test_CFLAGS = $(AM_CFLAGS) $(am__append_24) +@WITH_JIT_TRUE@pcre2_jit_test_LDADD = $(am__append_21) \ +@WITH_JIT_TRUE@ $(am__append_22) $(am__append_23) \ +@WITH_JIT_TRUE@ $(am__append_25) +pcre2test_SOURCES = src/pcre2test.c +pcre2test_CFLAGS = $(AM_CFLAGS) $(am__append_29) $(am__append_30) +pcre2test_LDADD = $(LIBREADLINE) $(am__append_26) $(am__append_27) \ + $(am__append_28) $(am__append_31) + +# We have .pc files for pkg-config users. +pkgconfigdir = $(libdir)/pkgconfig +pkgconfig_DATA = $(am__append_35) $(am__append_36) $(am__append_37) + +# gcov/lcov code coverage reporting +# +# Coverage reporting targets: +# +# coverage: Create a coverage report from 'make check' +# coverage-baseline: Capture baseline coverage information +# coverage-reset: This zeros the coverage counters only +# coverage-report: This creates the coverage report only +# coverage-clean-report: This removes the generated coverage report +# without cleaning the coverage data itself +# coverage-clean-data: This removes the captured coverage data without +# removing the coverage files created at compile time (*.gcno) +# coverage-clean: This cleans all coverage data including the generated +# coverage report. 
+@WITH_GCOV_TRUE@COVERAGE_TEST_NAME = $(PACKAGE) +@WITH_GCOV_TRUE@COVERAGE_NAME = $(PACKAGE)-$(VERSION) +@WITH_GCOV_TRUE@COVERAGE_OUTPUT_FILE = $(COVERAGE_NAME)-coverage.info +@WITH_GCOV_TRUE@COVERAGE_OUTPUT_DIR = $(COVERAGE_NAME)-coverage +@WITH_GCOV_TRUE@COVERAGE_LCOV_EXTRA_FLAGS = +@WITH_GCOV_TRUE@COVERAGE_GENHTML_EXTRA_FLAGS = +@WITH_GCOV_TRUE@coverage_quiet = $(coverage_quiet_$(V)) +@WITH_GCOV_TRUE@coverage_quiet_ = $(coverage_quiet_$(AM_DEFAULT_VERBOSITY)) +@WITH_GCOV_TRUE@coverage_quiet_0 = --quiet +all: $(BUILT_SOURCES) + $(MAKE) $(AM_MAKEFLAGS) all-am + +.SUFFIXES: +.SUFFIXES: .c .lo .log .o .obj .test .test$(EXEEXT) .trs +am--refresh: Makefile + @: +$(srcdir)/Makefile.in: $(srcdir)/Makefile.am $(am__configure_deps) + @for dep in $?; do \ + case '$(am__configure_deps)' in \ + *$$dep*) \ + echo ' cd $(srcdir) && $(AUTOMAKE) --gnu'; \ + $(am__cd) $(srcdir) && $(AUTOMAKE) --gnu \ + && exit 0; \ + exit 1;; \ + esac; \ + done; \ + echo ' cd $(top_srcdir) && $(AUTOMAKE) --gnu Makefile'; \ + $(am__cd) $(top_srcdir) && \ + $(AUTOMAKE) --gnu Makefile +Makefile: $(srcdir)/Makefile.in $(top_builddir)/config.status + @case '$?' 
in \ + *config.status*) \ + echo ' $(SHELL) ./config.status'; \ + $(SHELL) ./config.status;; \ + *) \ + echo ' cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__maybe_remake_depfiles)'; \ + cd $(top_builddir) && $(SHELL) ./config.status $@ $(am__maybe_remake_depfiles);; \ + esac; + +$(top_builddir)/config.status: $(top_srcdir)/configure $(CONFIG_STATUS_DEPENDENCIES) + $(SHELL) ./config.status --recheck + +$(top_srcdir)/configure: $(am__configure_deps) + $(am__cd) $(srcdir) && $(AUTOCONF) +$(ACLOCAL_M4): $(am__aclocal_m4_deps) + $(am__cd) $(srcdir) && $(ACLOCAL) $(ACLOCAL_AMFLAGS) +$(am__aclocal_m4_deps): + +src/config.h: src/stamp-h1 + @test -f $@ || rm -f src/stamp-h1 + @test -f $@ || $(MAKE) $(AM_MAKEFLAGS) src/stamp-h1 + +src/stamp-h1: $(top_srcdir)/src/config.h.in $(top_builddir)/config.status + @rm -f src/stamp-h1 + cd $(top_builddir) && $(SHELL) ./config.status src/config.h +$(top_srcdir)/src/config.h.in: $(am__configure_deps) + ($(am__cd) $(top_srcdir) && $(AUTOHEADER)) + rm -f src/stamp-h1 + touch $@ + +distclean-hdr: + -rm -f src/config.h src/stamp-h1 +libpcre2-8.pc: $(top_builddir)/config.status $(srcdir)/libpcre2-8.pc.in + cd $(top_builddir) && $(SHELL) ./config.status $@ +libpcre2-16.pc: $(top_builddir)/config.status $(srcdir)/libpcre2-16.pc.in + cd $(top_builddir) && $(SHELL) ./config.status $@ +libpcre2-32.pc: $(top_builddir)/config.status $(srcdir)/libpcre2-32.pc.in + cd $(top_builddir) && $(SHELL) ./config.status $@ +libpcre2-posix.pc: $(top_builddir)/config.status $(srcdir)/libpcre2-posix.pc.in + cd $(top_builddir) && $(SHELL) ./config.status $@ +pcre2-config: $(top_builddir)/config.status $(srcdir)/pcre2-config.in + cd $(top_builddir) && $(SHELL) ./config.status $@ +src/pcre2.h: $(top_builddir)/config.status $(top_srcdir)/src/pcre2.h.in + cd $(top_builddir) && $(SHELL) ./config.status $@ +install-binPROGRAMS: $(bin_PROGRAMS) + @$(NORMAL_INSTALL) + @list='$(bin_PROGRAMS)'; test -n "$(bindir)" || list=; \ + if test -n "$$list"; then \ + echo " 
$(MKDIR_P) '$(DESTDIR)$(bindir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(bindir)" || exit 1; \ + fi; \ + for p in $$list; do echo "$$p $$p"; done | \ + sed 's/$(EXEEXT)$$//' | \ + while read p p1; do if test -f $$p \ + || test -f $$p1 \ + ; then echo "$$p"; echo "$$p"; else :; fi; \ + done | \ + sed -e 'p;s,.*/,,;n;h' \ + -e 's|.*|.|' \ + -e 'p;x;s,.*/,,;s/$(EXEEXT)$$//;$(transform);s/$$/$(EXEEXT)/' | \ + sed 'N;N;N;s,\n, ,g' | \ + $(AWK) 'BEGIN { files["."] = ""; dirs["."] = 1 } \ + { d=$$3; if (dirs[d] != 1) { print "d", d; dirs[d] = 1 } \ + if ($$2 == $$4) files[d] = files[d] " " $$1; \ + else { print "f", $$3 "/" $$4, $$1; } } \ + END { for (d in files) print "f", d, files[d] }' | \ + while read type dir files; do \ + if test "$$dir" = .; then dir=; else dir=/$$dir; fi; \ + test -z "$$files" || { \ + echo " $(INSTALL_PROGRAM_ENV) $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=install $(INSTALL_PROGRAM) $$files '$(DESTDIR)$(bindir)$$dir'"; \ + $(INSTALL_PROGRAM_ENV) $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=install $(INSTALL_PROGRAM) $$files "$(DESTDIR)$(bindir)$$dir" || exit $$?; \ + } \ + ; done + +uninstall-binPROGRAMS: + @$(NORMAL_UNINSTALL) + @list='$(bin_PROGRAMS)'; test -n "$(bindir)" || list=; \ + files=`for p in $$list; do echo "$$p"; done | \ + sed -e 'h;s,^.*/,,;s/$(EXEEXT)$$//;$(transform)' \ + -e 's/$$/$(EXEEXT)/' \ + `; \ + test -n "$$list" || exit 0; \ + echo " ( cd '$(DESTDIR)$(bindir)' && rm -f" $$files ")"; \ + cd "$(DESTDIR)$(bindir)" && rm -f $$files + +clean-binPROGRAMS: + @list='$(bin_PROGRAMS)'; test -n "$$list" || exit 0; \ + echo " rm -f" $$list; \ + rm -f $$list || exit $$?; \ + test -n "$(EXEEXT)" || exit 0; \ + list=`for p in $$list; do echo "$$p"; done | sed 's/$(EXEEXT)$$//'`; \ + echo " rm -f" $$list; \ + rm -f $$list + +clean-noinstPROGRAMS: + @list='$(noinst_PROGRAMS)'; test -n "$$list" || exit 0; \ + echo " rm -f" $$list; \ + rm -f $$list || exit $$?; \ + test -n "$(EXEEXT)" || exit 0; \ + list=`for p in $$list; do echo 
"$$p"; done | sed 's/$(EXEEXT)$$//'`; \ + echo " rm -f" $$list; \ + rm -f $$list + +clean-noinstLIBRARIES: + -test -z "$(noinst_LIBRARIES)" || rm -f $(noinst_LIBRARIES) + +install-libLTLIBRARIES: $(lib_LTLIBRARIES) + @$(NORMAL_INSTALL) + @list='$(lib_LTLIBRARIES)'; test -n "$(libdir)" || list=; \ + list2=; for p in $$list; do \ + if test -f $$p; then \ + list2="$$list2 $$p"; \ + else :; fi; \ + done; \ + test -z "$$list2" || { \ + echo " $(MKDIR_P) '$(DESTDIR)$(libdir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(libdir)" || exit 1; \ + echo " $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=install $(INSTALL) $(INSTALL_STRIP_FLAG) $$list2 '$(DESTDIR)$(libdir)'"; \ + $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=install $(INSTALL) $(INSTALL_STRIP_FLAG) $$list2 "$(DESTDIR)$(libdir)"; \ + } + +uninstall-libLTLIBRARIES: + @$(NORMAL_UNINSTALL) + @list='$(lib_LTLIBRARIES)'; test -n "$(libdir)" || list=; \ + for p in $$list; do \ + $(am__strip_dir) \ + echo " $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=uninstall rm -f '$(DESTDIR)$(libdir)/$$f'"; \ + $(LIBTOOL) $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=uninstall rm -f "$(DESTDIR)$(libdir)/$$f"; \ + done + +clean-libLTLIBRARIES: + -test -z "$(lib_LTLIBRARIES)" || rm -f $(lib_LTLIBRARIES) + @list='$(lib_LTLIBRARIES)'; \ + locs=`for p in $$list; do echo $$p; done | \ + sed 's|^[^/]*$$|.|; s|/[^/]*$$||; s|$$|/so_locations|' | \ + sort -u`; \ + test -z "$$locs" || { \ + echo rm -f $${locs}; \ + rm -f $${locs}; \ + } +src/$(am__dirstamp): + @$(MKDIR_P) src + @: > src/$(am__dirstamp) +src/$(DEPDIR)/$(am__dirstamp): + @$(MKDIR_P) src/$(DEPDIR) + @: > src/$(DEPDIR)/$(am__dirstamp) +src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.$(OBJEXT): \ + src/$(am__dirstamp) src/$(DEPDIR)/$(am__dirstamp) +.libs/$(am__dirstamp): + @$(MKDIR_P) .libs + @: > .libs/$(am__dirstamp) + +.libs/libpcre2-fuzzsupport.a: $(_libs_libpcre2_fuzzsupport_a_OBJECTS) $(_libs_libpcre2_fuzzsupport_a_DEPENDENCIES) 
$(EXTRA__libs_libpcre2_fuzzsupport_a_DEPENDENCIES) .libs/$(am__dirstamp) + $(AM_V_at)-rm -f .libs/libpcre2-fuzzsupport.a + $(AM_V_AR)$(_libs_libpcre2_fuzzsupport_a_AR) .libs/libpcre2-fuzzsupport.a $(_libs_libpcre2_fuzzsupport_a_OBJECTS) $(_libs_libpcre2_fuzzsupport_a_LIBADD) + $(AM_V_at)$(RANLIB) .libs/libpcre2-fuzzsupport.a +src/libpcre2_16_la-pcre2_auto_possess.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_compile.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_config.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_context.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_convert.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_dfa_match.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_error.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_extuni.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_find_bracket.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_jit_compile.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_maketables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_match.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_match_data.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_newline.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_ord2utf.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_pattern_info.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_script_run.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_serialize.lo: src/$(am__dirstamp) \ + 
src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_string_utils.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_study.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_substitute.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_substring.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_tables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_ucd.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_valid_utf.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_xclass.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_16_la-pcre2_chartables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +libpcre2-16.la: $(libpcre2_16_la_OBJECTS) $(libpcre2_16_la_DEPENDENCIES) $(EXTRA_libpcre2_16_la_DEPENDENCIES) + $(AM_V_CCLD)$(libpcre2_16_la_LINK) $(am_libpcre2_16_la_rpath) $(libpcre2_16_la_OBJECTS) $(libpcre2_16_la_LIBADD) $(LIBS) +src/libpcre2_32_la-pcre2_auto_possess.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_compile.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_config.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_context.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_convert.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_dfa_match.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_error.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_extuni.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_find_bracket.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_jit_compile.lo: src/$(am__dirstamp) 
\ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_maketables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_match.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_match_data.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_newline.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_ord2utf.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_pattern_info.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_script_run.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_serialize.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_string_utils.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_study.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_substitute.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_substring.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_tables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_ucd.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_valid_utf.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_xclass.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_32_la-pcre2_chartables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +libpcre2-32.la: $(libpcre2_32_la_OBJECTS) $(libpcre2_32_la_DEPENDENCIES) $(EXTRA_libpcre2_32_la_DEPENDENCIES) + $(AM_V_CCLD)$(libpcre2_32_la_LINK) $(am_libpcre2_32_la_rpath) $(libpcre2_32_la_OBJECTS) $(libpcre2_32_la_LIBADD) $(LIBS) +src/libpcre2_8_la-pcre2_auto_possess.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_compile.lo: 
src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_config.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_context.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_convert.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_dfa_match.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_error.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_extuni.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_find_bracket.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_jit_compile.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_maketables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_match.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_match_data.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_newline.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_ord2utf.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_pattern_info.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_script_run.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_serialize.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_string_utils.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_study.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_substitute.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_substring.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_tables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) 
+src/libpcre2_8_la-pcre2_ucd.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_valid_utf.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_xclass.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) +src/libpcre2_8_la-pcre2_chartables.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +libpcre2-8.la: $(libpcre2_8_la_OBJECTS) $(libpcre2_8_la_DEPENDENCIES) $(EXTRA_libpcre2_8_la_DEPENDENCIES) + $(AM_V_CCLD)$(libpcre2_8_la_LINK) $(am_libpcre2_8_la_rpath) $(libpcre2_8_la_OBJECTS) $(libpcre2_8_la_LIBADD) $(LIBS) +src/libpcre2_posix_la-pcre2posix.lo: src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +libpcre2-posix.la: $(libpcre2_posix_la_OBJECTS) $(libpcre2_posix_la_DEPENDENCIES) $(EXTRA_libpcre2_posix_la_DEPENDENCIES) + $(AM_V_CCLD)$(libpcre2_posix_la_LINK) $(am_libpcre2_posix_la_rpath) $(libpcre2_posix_la_OBJECTS) $(libpcre2_posix_la_LIBADD) $(LIBS) +src/pcre2_dftables.$(OBJEXT): src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +pcre2_dftables$(EXEEXT): $(pcre2_dftables_OBJECTS) $(pcre2_dftables_DEPENDENCIES) $(EXTRA_pcre2_dftables_DEPENDENCIES) + @rm -f pcre2_dftables$(EXEEXT) + $(AM_V_CCLD)$(LINK) $(pcre2_dftables_OBJECTS) $(pcre2_dftables_LDADD) $(LIBS) +src/pcre2_jit_test-pcre2_jit_test.$(OBJEXT): src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +pcre2_jit_test$(EXEEXT): $(pcre2_jit_test_OBJECTS) $(pcre2_jit_test_DEPENDENCIES) $(EXTRA_pcre2_jit_test_DEPENDENCIES) + @rm -f pcre2_jit_test$(EXEEXT) + $(AM_V_CCLD)$(pcre2_jit_test_LINK) $(pcre2_jit_test_OBJECTS) $(pcre2_jit_test_LDADD) $(LIBS) +src/pcre2fuzzcheck-pcre2_fuzzsupport.$(OBJEXT): src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +pcre2fuzzcheck$(EXEEXT): $(pcre2fuzzcheck_OBJECTS) $(pcre2fuzzcheck_DEPENDENCIES) $(EXTRA_pcre2fuzzcheck_DEPENDENCIES) + @rm -f pcre2fuzzcheck$(EXEEXT) + $(AM_V_CCLD)$(pcre2fuzzcheck_LINK) $(pcre2fuzzcheck_OBJECTS) $(pcre2fuzzcheck_LDADD) $(LIBS) 
+src/pcre2grep-pcre2grep.$(OBJEXT): src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +pcre2grep$(EXEEXT): $(pcre2grep_OBJECTS) $(pcre2grep_DEPENDENCIES) $(EXTRA_pcre2grep_DEPENDENCIES) + @rm -f pcre2grep$(EXEEXT) + $(AM_V_CCLD)$(pcre2grep_LINK) $(pcre2grep_OBJECTS) $(pcre2grep_LDADD) $(LIBS) +src/pcre2test-pcre2test.$(OBJEXT): src/$(am__dirstamp) \ + src/$(DEPDIR)/$(am__dirstamp) + +pcre2test$(EXEEXT): $(pcre2test_OBJECTS) $(pcre2test_DEPENDENCIES) $(EXTRA_pcre2test_DEPENDENCIES) + @rm -f pcre2test$(EXEEXT) + $(AM_V_CCLD)$(pcre2test_LINK) $(pcre2test_OBJECTS) $(pcre2test_LDADD) $(LIBS) +install-binSCRIPTS: $(bin_SCRIPTS) + @$(NORMAL_INSTALL) + @list='$(bin_SCRIPTS)'; test -n "$(bindir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(bindir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(bindir)" || exit 1; \ + fi; \ + for p in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + if test -f "$$d$$p"; then echo "$$d$$p"; echo "$$p"; else :; fi; \ + done | \ + sed -e 'p;s,.*/,,;n' \ + -e 'h;s|.*|.|' \ + -e 'p;x;s,.*/,,;$(transform)' | sed 'N;N;N;s,\n, ,g' | \ + $(AWK) 'BEGIN { files["."] = ""; dirs["."] = 1; } \ + { d=$$3; if (dirs[d] != 1) { print "d", d; dirs[d] = 1 } \ + if ($$2 == $$4) { files[d] = files[d] " " $$1; \ + if (++n[d] == $(am__install_max)) { \ + print "f", d, files[d]; n[d] = 0; files[d] = "" } } \ + else { print "f", d "/" $$4, $$1 } } \ + END { for (d in files) print "f", d, files[d] }' | \ + while read type dir files; do \ + if test "$$dir" = .; then dir=; else dir=/$$dir; fi; \ + test -z "$$files" || { \ + echo " $(INSTALL_SCRIPT) $$files '$(DESTDIR)$(bindir)$$dir'"; \ + $(INSTALL_SCRIPT) $$files "$(DESTDIR)$(bindir)$$dir" || exit $$?; \ + } \ + ; done + +uninstall-binSCRIPTS: + @$(NORMAL_UNINSTALL) + @list='$(bin_SCRIPTS)'; test -n "$(bindir)" || exit 0; \ + files=`for p in $$list; do echo "$$p"; done | \ + sed -e 's,.*/,,;$(transform)'`; \ + dir='$(DESTDIR)$(bindir)'; $(am__uninstall_files_from_dir) + 
+mostlyclean-compile: + -rm -f *.$(OBJEXT) + -rm -f src/*.$(OBJEXT) + -rm -f src/*.lo + +distclean-compile: + -rm -f *.tab.c + +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Po@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ 
@am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Plo@am__quote@ # am--include-marker 
+@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Plo@am__quote@ # 
am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Plo@am__quote@ # 
am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Plo@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Plo@am__quote@ # 
am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/pcre2_dftables.Po@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Po@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Po@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/pcre2grep-pcre2grep.Po@am__quote@ # am--include-marker +@AMDEP_TRUE@@am__include@ @am__quote@src/$(DEPDIR)/pcre2test-pcre2test.Po@am__quote@ # am--include-marker + +$(am__depfiles_remade): + @$(MKDIR_P) $(@D) + @echo '# dummy' >$@-t && $(am__mv) $@-t $@ + +am--depfiles: $(am__depfiles_remade) + +.c.o: +@am__fastdepCC_TRUE@ $(AM_V_CC)depbase=`echo $@ | sed 's|[^/]*$$|$(DEPDIR)/&|;s|\.o$$||'`;\ +@am__fastdepCC_TRUE@ $(COMPILE) -MT $@ -MD -MP -MF $$depbase.Tpo -c -o $@ $< &&\ +@am__fastdepCC_TRUE@ $(am__mv) $$depbase.Tpo $$depbase.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='$<' object='$@' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(COMPILE) -c -o $@ $< + +.c.obj: +@am__fastdepCC_TRUE@ $(AM_V_CC)depbase=`echo $@ | sed 's|[^/]*$$|$(DEPDIR)/&|;s|\.obj$$||'`;\ +@am__fastdepCC_TRUE@ $(COMPILE) -MT $@ -MD -MP -MF $$depbase.Tpo -c -o $@ `$(CYGPATH_W) '$<'` &&\ +@am__fastdepCC_TRUE@ $(am__mv) $$depbase.Tpo $$depbase.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='$<' object='$@' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(COMPILE) -c -o $@ `$(CYGPATH_W) '$<'` + +.c.lo: +@am__fastdepCC_TRUE@ $(AM_V_CC)depbase=`echo $@ | sed 's|[^/]*$$|$(DEPDIR)/&|;s|\.lo$$||'`;\ +@am__fastdepCC_TRUE@ $(LTCOMPILE) -MT $@ -MD -MP -MF $$depbase.Tpo -c -o $@ $< &&\ +@am__fastdepCC_TRUE@ $(am__mv) $$depbase.Tpo $$depbase.Plo 
+@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='$<' object='$@' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LTCOMPILE) -c -o $@ $< + +src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.o: src/pcre2_fuzzsupport.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(_libs_libpcre2_fuzzsupport_a_CFLAGS) $(CFLAGS) -MT src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.o -MD -MP -MF src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Tpo -c -o src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.o `test -f 'src/pcre2_fuzzsupport.c' || echo '$(srcdir)/'`src/pcre2_fuzzsupport.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Tpo src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_fuzzsupport.c' object='src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.o' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(_libs_libpcre2_fuzzsupport_a_CFLAGS) $(CFLAGS) -c -o src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.o `test -f 'src/pcre2_fuzzsupport.c' || echo '$(srcdir)/'`src/pcre2_fuzzsupport.c + +src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.obj: src/pcre2_fuzzsupport.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(_libs_libpcre2_fuzzsupport_a_CFLAGS) $(CFLAGS) -MT src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.obj -MD -MP -MF src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Tpo -c -o src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.obj `if test -f 'src/pcre2_fuzzsupport.c'; then $(CYGPATH_W) 
'src/pcre2_fuzzsupport.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2_fuzzsupport.c'; fi` +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Tpo src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_fuzzsupport.c' object='src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.obj' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(_libs_libpcre2_fuzzsupport_a_CFLAGS) $(CFLAGS) -c -o src/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.obj `if test -f 'src/pcre2_fuzzsupport.c'; then $(CYGPATH_W) 'src/pcre2_fuzzsupport.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2_fuzzsupport.c'; fi` + +src/libpcre2_16_la-pcre2_auto_possess.lo: src/pcre2_auto_possess.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_auto_possess.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Tpo -c -o src/libpcre2_16_la-pcre2_auto_possess.lo `test -f 'src/pcre2_auto_possess.c' || echo '$(srcdir)/'`src/pcre2_auto_possess.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_auto_possess.c' object='src/libpcre2_16_la-pcre2_auto_possess.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) 
$(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_auto_possess.lo `test -f 'src/pcre2_auto_possess.c' || echo '$(srcdir)/'`src/pcre2_auto_possess.c + +src/libpcre2_16_la-pcre2_compile.lo: src/pcre2_compile.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_compile.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Tpo -c -o src/libpcre2_16_la-pcre2_compile.lo `test -f 'src/pcre2_compile.c' || echo '$(srcdir)/'`src/pcre2_compile.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_compile.c' object='src/libpcre2_16_la-pcre2_compile.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_compile.lo `test -f 'src/pcre2_compile.c' || echo '$(srcdir)/'`src/pcre2_compile.c + +src/libpcre2_16_la-pcre2_config.lo: src/pcre2_config.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_config.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Tpo -c -o src/libpcre2_16_la-pcre2_config.lo `test -f 'src/pcre2_config.c' || echo '$(srcdir)/'`src/pcre2_config.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Tpo 
src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_config.c' object='src/libpcre2_16_la-pcre2_config.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_config.lo `test -f 'src/pcre2_config.c' || echo '$(srcdir)/'`src/pcre2_config.c + +src/libpcre2_16_la-pcre2_context.lo: src/pcre2_context.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_context.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Tpo -c -o src/libpcre2_16_la-pcre2_context.lo `test -f 'src/pcre2_context.c' || echo '$(srcdir)/'`src/pcre2_context.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_context.c' object='src/libpcre2_16_la-pcre2_context.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_context.lo `test -f 'src/pcre2_context.c' || echo '$(srcdir)/'`src/pcre2_context.c + +src/libpcre2_16_la-pcre2_convert.lo: src/pcre2_convert.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) 
$(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_convert.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Tpo -c -o src/libpcre2_16_la-pcre2_convert.lo `test -f 'src/pcre2_convert.c' || echo '$(srcdir)/'`src/pcre2_convert.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_convert.c' object='src/libpcre2_16_la-pcre2_convert.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_convert.lo `test -f 'src/pcre2_convert.c' || echo '$(srcdir)/'`src/pcre2_convert.c + +src/libpcre2_16_la-pcre2_dfa_match.lo: src/pcre2_dfa_match.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_dfa_match.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Tpo -c -o src/libpcre2_16_la-pcre2_dfa_match.lo `test -f 'src/pcre2_dfa_match.c' || echo '$(srcdir)/'`src/pcre2_dfa_match.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_dfa_match.c' object='src/libpcre2_16_la-pcre2_dfa_match.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ 
$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_dfa_match.lo `test -f 'src/pcre2_dfa_match.c' || echo '$(srcdir)/'`src/pcre2_dfa_match.c + +src/libpcre2_16_la-pcre2_error.lo: src/pcre2_error.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_error.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Tpo -c -o src/libpcre2_16_la-pcre2_error.lo `test -f 'src/pcre2_error.c' || echo '$(srcdir)/'`src/pcre2_error.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_error.c' object='src/libpcre2_16_la-pcre2_error.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_error.lo `test -f 'src/pcre2_error.c' || echo '$(srcdir)/'`src/pcre2_error.c + +src/libpcre2_16_la-pcre2_extuni.lo: src/pcre2_extuni.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_extuni.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Tpo -c -o src/libpcre2_16_la-pcre2_extuni.lo `test -f 'src/pcre2_extuni.c' || echo 
'$(srcdir)/'`src/pcre2_extuni.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_extuni.c' object='src/libpcre2_16_la-pcre2_extuni.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_extuni.lo `test -f 'src/pcre2_extuni.c' || echo '$(srcdir)/'`src/pcre2_extuni.c + +src/libpcre2_16_la-pcre2_find_bracket.lo: src/pcre2_find_bracket.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_find_bracket.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Tpo -c -o src/libpcre2_16_la-pcre2_find_bracket.lo `test -f 'src/pcre2_find_bracket.c' || echo '$(srcdir)/'`src/pcre2_find_bracket.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_find_bracket.c' object='src/libpcre2_16_la-pcre2_find_bracket.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_find_bracket.lo `test -f 'src/pcre2_find_bracket.c' || echo 
'$(srcdir)/'`src/pcre2_find_bracket.c + +src/libpcre2_16_la-pcre2_jit_compile.lo: src/pcre2_jit_compile.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_jit_compile.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Tpo -c -o src/libpcre2_16_la-pcre2_jit_compile.lo `test -f 'src/pcre2_jit_compile.c' || echo '$(srcdir)/'`src/pcre2_jit_compile.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_jit_compile.c' object='src/libpcre2_16_la-pcre2_jit_compile.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_jit_compile.lo `test -f 'src/pcre2_jit_compile.c' || echo '$(srcdir)/'`src/pcre2_jit_compile.c + +src/libpcre2_16_la-pcre2_maketables.lo: src/pcre2_maketables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_maketables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Tpo -c -o src/libpcre2_16_la-pcre2_maketables.lo `test -f 'src/pcre2_maketables.c' || echo '$(srcdir)/'`src/pcre2_maketables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Plo 
+@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_maketables.c' object='src/libpcre2_16_la-pcre2_maketables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_maketables.lo `test -f 'src/pcre2_maketables.c' || echo '$(srcdir)/'`src/pcre2_maketables.c + +src/libpcre2_16_la-pcre2_match.lo: src/pcre2_match.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_match.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Tpo -c -o src/libpcre2_16_la-pcre2_match.lo `test -f 'src/pcre2_match.c' || echo '$(srcdir)/'`src/pcre2_match.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_match.c' object='src/libpcre2_16_la-pcre2_match.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_match.lo `test -f 'src/pcre2_match.c' || echo '$(srcdir)/'`src/pcre2_match.c + +src/libpcre2_16_la-pcre2_match_data.lo: src/pcre2_match_data.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) 
$(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_match_data.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Tpo -c -o src/libpcre2_16_la-pcre2_match_data.lo `test -f 'src/pcre2_match_data.c' || echo '$(srcdir)/'`src/pcre2_match_data.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_match_data.c' object='src/libpcre2_16_la-pcre2_match_data.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_match_data.lo `test -f 'src/pcre2_match_data.c' || echo '$(srcdir)/'`src/pcre2_match_data.c + +src/libpcre2_16_la-pcre2_newline.lo: src/pcre2_newline.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_newline.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Tpo -c -o src/libpcre2_16_la-pcre2_newline.lo `test -f 'src/pcre2_newline.c' || echo '$(srcdir)/'`src/pcre2_newline.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_newline.c' object='src/libpcre2_16_la-pcre2_newline.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) 
$(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_newline.lo `test -f 'src/pcre2_newline.c' || echo '$(srcdir)/'`src/pcre2_newline.c + +src/libpcre2_16_la-pcre2_ord2utf.lo: src/pcre2_ord2utf.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_ord2utf.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Tpo -c -o src/libpcre2_16_la-pcre2_ord2utf.lo `test -f 'src/pcre2_ord2utf.c' || echo '$(srcdir)/'`src/pcre2_ord2utf.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_ord2utf.c' object='src/libpcre2_16_la-pcre2_ord2utf.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_ord2utf.lo `test -f 'src/pcre2_ord2utf.c' || echo '$(srcdir)/'`src/pcre2_ord2utf.c + +src/libpcre2_16_la-pcre2_pattern_info.lo: src/pcre2_pattern_info.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_pattern_info.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Tpo -c -o src/libpcre2_16_la-pcre2_pattern_info.lo `test -f 'src/pcre2_pattern_info.c' || 
echo '$(srcdir)/'`src/pcre2_pattern_info.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_pattern_info.c' object='src/libpcre2_16_la-pcre2_pattern_info.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_pattern_info.lo `test -f 'src/pcre2_pattern_info.c' || echo '$(srcdir)/'`src/pcre2_pattern_info.c + +src/libpcre2_16_la-pcre2_script_run.lo: src/pcre2_script_run.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_script_run.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Tpo -c -o src/libpcre2_16_la-pcre2_script_run.lo `test -f 'src/pcre2_script_run.c' || echo '$(srcdir)/'`src/pcre2_script_run.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_script_run.c' object='src/libpcre2_16_la-pcre2_script_run.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_script_run.lo `test -f 
'src/pcre2_script_run.c' || echo '$(srcdir)/'`src/pcre2_script_run.c + +src/libpcre2_16_la-pcre2_serialize.lo: src/pcre2_serialize.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_serialize.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Tpo -c -o src/libpcre2_16_la-pcre2_serialize.lo `test -f 'src/pcre2_serialize.c' || echo '$(srcdir)/'`src/pcre2_serialize.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_serialize.c' object='src/libpcre2_16_la-pcre2_serialize.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_serialize.lo `test -f 'src/pcre2_serialize.c' || echo '$(srcdir)/'`src/pcre2_serialize.c + +src/libpcre2_16_la-pcre2_string_utils.lo: src/pcre2_string_utils.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_string_utils.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Tpo -c -o src/libpcre2_16_la-pcre2_string_utils.lo `test -f 'src/pcre2_string_utils.c' || echo '$(srcdir)/'`src/pcre2_string_utils.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Plo 
+@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_string_utils.c' object='src/libpcre2_16_la-pcre2_string_utils.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_string_utils.lo `test -f 'src/pcre2_string_utils.c' || echo '$(srcdir)/'`src/pcre2_string_utils.c + +src/libpcre2_16_la-pcre2_study.lo: src/pcre2_study.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_study.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Tpo -c -o src/libpcre2_16_la-pcre2_study.lo `test -f 'src/pcre2_study.c' || echo '$(srcdir)/'`src/pcre2_study.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_study.c' object='src/libpcre2_16_la-pcre2_study.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_study.lo `test -f 'src/pcre2_study.c' || echo '$(srcdir)/'`src/pcre2_study.c + +src/libpcre2_16_la-pcre2_substitute.lo: src/pcre2_substitute.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) 
$(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_substitute.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Tpo -c -o src/libpcre2_16_la-pcre2_substitute.lo `test -f 'src/pcre2_substitute.c' || echo '$(srcdir)/'`src/pcre2_substitute.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_substitute.c' object='src/libpcre2_16_la-pcre2_substitute.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_substitute.lo `test -f 'src/pcre2_substitute.c' || echo '$(srcdir)/'`src/pcre2_substitute.c + +src/libpcre2_16_la-pcre2_substring.lo: src/pcre2_substring.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_substring.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Tpo -c -o src/libpcre2_16_la-pcre2_substring.lo `test -f 'src/pcre2_substring.c' || echo '$(srcdir)/'`src/pcre2_substring.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_substring.c' object='src/libpcre2_16_la-pcre2_substring.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ 
$(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_substring.lo `test -f 'src/pcre2_substring.c' || echo '$(srcdir)/'`src/pcre2_substring.c + +src/libpcre2_16_la-pcre2_tables.lo: src/pcre2_tables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_tables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Tpo -c -o src/libpcre2_16_la-pcre2_tables.lo `test -f 'src/pcre2_tables.c' || echo '$(srcdir)/'`src/pcre2_tables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_tables.c' object='src/libpcre2_16_la-pcre2_tables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_tables.lo `test -f 'src/pcre2_tables.c' || echo '$(srcdir)/'`src/pcre2_tables.c + +src/libpcre2_16_la-pcre2_ucd.lo: src/pcre2_ucd.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_ucd.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Tpo -c -o src/libpcre2_16_la-pcre2_ucd.lo `test -f 'src/pcre2_ucd.c' || echo '$(srcdir)/'`src/pcre2_ucd.c 
+@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_ucd.c' object='src/libpcre2_16_la-pcre2_ucd.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_ucd.lo `test -f 'src/pcre2_ucd.c' || echo '$(srcdir)/'`src/pcre2_ucd.c + +src/libpcre2_16_la-pcre2_valid_utf.lo: src/pcre2_valid_utf.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_valid_utf.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Tpo -c -o src/libpcre2_16_la-pcre2_valid_utf.lo `test -f 'src/pcre2_valid_utf.c' || echo '$(srcdir)/'`src/pcre2_valid_utf.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_valid_utf.c' object='src/libpcre2_16_la-pcre2_valid_utf.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_valid_utf.lo `test -f 'src/pcre2_valid_utf.c' || echo '$(srcdir)/'`src/pcre2_valid_utf.c + +src/libpcre2_16_la-pcre2_xclass.lo: src/pcre2_xclass.c 
+@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_xclass.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Tpo -c -o src/libpcre2_16_la-pcre2_xclass.lo `test -f 'src/pcre2_xclass.c' || echo '$(srcdir)/'`src/pcre2_xclass.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_xclass.c' object='src/libpcre2_16_la-pcre2_xclass.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_xclass.lo `test -f 'src/pcre2_xclass.c' || echo '$(srcdir)/'`src/pcre2_xclass.c + +src/libpcre2_16_la-pcre2_chartables.lo: src/pcre2_chartables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_16_la-pcre2_chartables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Tpo -c -o src/libpcre2_16_la-pcre2_chartables.lo `test -f 'src/pcre2_chartables.c' || echo '$(srcdir)/'`src/pcre2_chartables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Tpo src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_chartables.c' object='src/libpcre2_16_la-pcre2_chartables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ 
DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_16_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_16_la-pcre2_chartables.lo `test -f 'src/pcre2_chartables.c' || echo '$(srcdir)/'`src/pcre2_chartables.c + +src/libpcre2_32_la-pcre2_auto_possess.lo: src/pcre2_auto_possess.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_auto_possess.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Tpo -c -o src/libpcre2_32_la-pcre2_auto_possess.lo `test -f 'src/pcre2_auto_possess.c' || echo '$(srcdir)/'`src/pcre2_auto_possess.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_auto_possess.c' object='src/libpcre2_32_la-pcre2_auto_possess.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_auto_possess.lo `test -f 'src/pcre2_auto_possess.c' || echo '$(srcdir)/'`src/pcre2_auto_possess.c + +src/libpcre2_32_la-pcre2_compile.lo: src/pcre2_compile.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT 
src/libpcre2_32_la-pcre2_compile.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Tpo -c -o src/libpcre2_32_la-pcre2_compile.lo `test -f 'src/pcre2_compile.c' || echo '$(srcdir)/'`src/pcre2_compile.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_compile.c' object='src/libpcre2_32_la-pcre2_compile.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_compile.lo `test -f 'src/pcre2_compile.c' || echo '$(srcdir)/'`src/pcre2_compile.c + +src/libpcre2_32_la-pcre2_config.lo: src/pcre2_config.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_config.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Tpo -c -o src/libpcre2_32_la-pcre2_config.lo `test -f 'src/pcre2_config.c' || echo '$(srcdir)/'`src/pcre2_config.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_config.c' object='src/libpcre2_32_la-pcre2_config.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) 
$(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_config.lo `test -f 'src/pcre2_config.c' || echo '$(srcdir)/'`src/pcre2_config.c + +src/libpcre2_32_la-pcre2_context.lo: src/pcre2_context.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_context.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Tpo -c -o src/libpcre2_32_la-pcre2_context.lo `test -f 'src/pcre2_context.c' || echo '$(srcdir)/'`src/pcre2_context.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_context.c' object='src/libpcre2_32_la-pcre2_context.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_context.lo `test -f 'src/pcre2_context.c' || echo '$(srcdir)/'`src/pcre2_context.c + +src/libpcre2_32_la-pcre2_convert.lo: src/pcre2_convert.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_convert.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Tpo -c -o src/libpcre2_32_la-pcre2_convert.lo `test -f 'src/pcre2_convert.c' || echo '$(srcdir)/'`src/pcre2_convert.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Plo 
+@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_convert.c' object='src/libpcre2_32_la-pcre2_convert.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_convert.lo `test -f 'src/pcre2_convert.c' || echo '$(srcdir)/'`src/pcre2_convert.c + +src/libpcre2_32_la-pcre2_dfa_match.lo: src/pcre2_dfa_match.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_dfa_match.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Tpo -c -o src/libpcre2_32_la-pcre2_dfa_match.lo `test -f 'src/pcre2_dfa_match.c' || echo '$(srcdir)/'`src/pcre2_dfa_match.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_dfa_match.c' object='src/libpcre2_32_la-pcre2_dfa_match.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_dfa_match.lo `test -f 'src/pcre2_dfa_match.c' || echo '$(srcdir)/'`src/pcre2_dfa_match.c + +src/libpcre2_32_la-pcre2_error.lo: src/pcre2_error.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) 
--mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_error.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Tpo -c -o src/libpcre2_32_la-pcre2_error.lo `test -f 'src/pcre2_error.c' || echo '$(srcdir)/'`src/pcre2_error.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_error.c' object='src/libpcre2_32_la-pcre2_error.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_error.lo `test -f 'src/pcre2_error.c' || echo '$(srcdir)/'`src/pcre2_error.c + +src/libpcre2_32_la-pcre2_extuni.lo: src/pcre2_extuni.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_extuni.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Tpo -c -o src/libpcre2_32_la-pcre2_extuni.lo `test -f 'src/pcre2_extuni.c' || echo '$(srcdir)/'`src/pcre2_extuni.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_extuni.c' object='src/libpcre2_32_la-pcre2_extuni.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) 
$(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_extuni.lo `test -f 'src/pcre2_extuni.c' || echo '$(srcdir)/'`src/pcre2_extuni.c + +src/libpcre2_32_la-pcre2_find_bracket.lo: src/pcre2_find_bracket.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_find_bracket.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Tpo -c -o src/libpcre2_32_la-pcre2_find_bracket.lo `test -f 'src/pcre2_find_bracket.c' || echo '$(srcdir)/'`src/pcre2_find_bracket.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_find_bracket.c' object='src/libpcre2_32_la-pcre2_find_bracket.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_find_bracket.lo `test -f 'src/pcre2_find_bracket.c' || echo '$(srcdir)/'`src/pcre2_find_bracket.c + +src/libpcre2_32_la-pcre2_jit_compile.lo: src/pcre2_jit_compile.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_jit_compile.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Tpo -c -o src/libpcre2_32_la-pcre2_jit_compile.lo `test -f 
'src/pcre2_jit_compile.c' || echo '$(srcdir)/'`src/pcre2_jit_compile.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_jit_compile.c' object='src/libpcre2_32_la-pcre2_jit_compile.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_jit_compile.lo `test -f 'src/pcre2_jit_compile.c' || echo '$(srcdir)/'`src/pcre2_jit_compile.c + +src/libpcre2_32_la-pcre2_maketables.lo: src/pcre2_maketables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_maketables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Tpo -c -o src/libpcre2_32_la-pcre2_maketables.lo `test -f 'src/pcre2_maketables.c' || echo '$(srcdir)/'`src/pcre2_maketables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_maketables.c' object='src/libpcre2_32_la-pcre2_maketables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o 
src/libpcre2_32_la-pcre2_maketables.lo `test -f 'src/pcre2_maketables.c' || echo '$(srcdir)/'`src/pcre2_maketables.c + +src/libpcre2_32_la-pcre2_match.lo: src/pcre2_match.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_match.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Tpo -c -o src/libpcre2_32_la-pcre2_match.lo `test -f 'src/pcre2_match.c' || echo '$(srcdir)/'`src/pcre2_match.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_match.c' object='src/libpcre2_32_la-pcre2_match.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_match.lo `test -f 'src/pcre2_match.c' || echo '$(srcdir)/'`src/pcre2_match.c + +src/libpcre2_32_la-pcre2_match_data.lo: src/pcre2_match_data.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_match_data.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Tpo -c -o src/libpcre2_32_la-pcre2_match_data.lo `test -f 'src/pcre2_match_data.c' || echo '$(srcdir)/'`src/pcre2_match_data.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ 
$(AM_V_CC)source='src/pcre2_match_data.c' object='src/libpcre2_32_la-pcre2_match_data.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_match_data.lo `test -f 'src/pcre2_match_data.c' || echo '$(srcdir)/'`src/pcre2_match_data.c + +src/libpcre2_32_la-pcre2_newline.lo: src/pcre2_newline.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_newline.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Tpo -c -o src/libpcre2_32_la-pcre2_newline.lo `test -f 'src/pcre2_newline.c' || echo '$(srcdir)/'`src/pcre2_newline.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_newline.c' object='src/libpcre2_32_la-pcre2_newline.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_newline.lo `test -f 'src/pcre2_newline.c' || echo '$(srcdir)/'`src/pcre2_newline.c + +src/libpcre2_32_la-pcre2_ord2utf.lo: src/pcre2_ord2utf.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) 
$(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_ord2utf.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Tpo -c -o src/libpcre2_32_la-pcre2_ord2utf.lo `test -f 'src/pcre2_ord2utf.c' || echo '$(srcdir)/'`src/pcre2_ord2utf.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_ord2utf.c' object='src/libpcre2_32_la-pcre2_ord2utf.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_ord2utf.lo `test -f 'src/pcre2_ord2utf.c' || echo '$(srcdir)/'`src/pcre2_ord2utf.c + +src/libpcre2_32_la-pcre2_pattern_info.lo: src/pcre2_pattern_info.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_pattern_info.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Tpo -c -o src/libpcre2_32_la-pcre2_pattern_info.lo `test -f 'src/pcre2_pattern_info.c' || echo '$(srcdir)/'`src/pcre2_pattern_info.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_pattern_info.c' object='src/libpcre2_32_la-pcre2_pattern_info.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) 
$(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_pattern_info.lo `test -f 'src/pcre2_pattern_info.c' || echo '$(srcdir)/'`src/pcre2_pattern_info.c + +src/libpcre2_32_la-pcre2_script_run.lo: src/pcre2_script_run.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_script_run.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Tpo -c -o src/libpcre2_32_la-pcre2_script_run.lo `test -f 'src/pcre2_script_run.c' || echo '$(srcdir)/'`src/pcre2_script_run.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_script_run.c' object='src/libpcre2_32_la-pcre2_script_run.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_script_run.lo `test -f 'src/pcre2_script_run.c' || echo '$(srcdir)/'`src/pcre2_script_run.c + +src/libpcre2_32_la-pcre2_serialize.lo: src/pcre2_serialize.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_serialize.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Tpo -c -o src/libpcre2_32_la-pcre2_serialize.lo 
`test -f 'src/pcre2_serialize.c' || echo '$(srcdir)/'`src/pcre2_serialize.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_serialize.c' object='src/libpcre2_32_la-pcre2_serialize.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_serialize.lo `test -f 'src/pcre2_serialize.c' || echo '$(srcdir)/'`src/pcre2_serialize.c + +src/libpcre2_32_la-pcre2_string_utils.lo: src/pcre2_string_utils.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_string_utils.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Tpo -c -o src/libpcre2_32_la-pcre2_string_utils.lo `test -f 'src/pcre2_string_utils.c' || echo '$(srcdir)/'`src/pcre2_string_utils.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_string_utils.c' object='src/libpcre2_32_la-pcre2_string_utils.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o 
src/libpcre2_32_la-pcre2_string_utils.lo `test -f 'src/pcre2_string_utils.c' || echo '$(srcdir)/'`src/pcre2_string_utils.c + +src/libpcre2_32_la-pcre2_study.lo: src/pcre2_study.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_study.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Tpo -c -o src/libpcre2_32_la-pcre2_study.lo `test -f 'src/pcre2_study.c' || echo '$(srcdir)/'`src/pcre2_study.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_study.c' object='src/libpcre2_32_la-pcre2_study.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_study.lo `test -f 'src/pcre2_study.c' || echo '$(srcdir)/'`src/pcre2_study.c + +src/libpcre2_32_la-pcre2_substitute.lo: src/pcre2_substitute.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_substitute.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Tpo -c -o src/libpcre2_32_la-pcre2_substitute.lo `test -f 'src/pcre2_substitute.c' || echo '$(srcdir)/'`src/pcre2_substitute.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Plo 
+@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_substitute.c' object='src/libpcre2_32_la-pcre2_substitute.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_substitute.lo `test -f 'src/pcre2_substitute.c' || echo '$(srcdir)/'`src/pcre2_substitute.c + +src/libpcre2_32_la-pcre2_substring.lo: src/pcre2_substring.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_substring.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Tpo -c -o src/libpcre2_32_la-pcre2_substring.lo `test -f 'src/pcre2_substring.c' || echo '$(srcdir)/'`src/pcre2_substring.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_substring.c' object='src/libpcre2_32_la-pcre2_substring.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_substring.lo `test -f 'src/pcre2_substring.c' || echo '$(srcdir)/'`src/pcre2_substring.c + +src/libpcre2_32_la-pcre2_tables.lo: src/pcre2_tables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) 
$(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_tables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Tpo -c -o src/libpcre2_32_la-pcre2_tables.lo `test -f 'src/pcre2_tables.c' || echo '$(srcdir)/'`src/pcre2_tables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_tables.c' object='src/libpcre2_32_la-pcre2_tables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_tables.lo `test -f 'src/pcre2_tables.c' || echo '$(srcdir)/'`src/pcre2_tables.c + +src/libpcre2_32_la-pcre2_ucd.lo: src/pcre2_ucd.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_ucd.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Tpo -c -o src/libpcre2_32_la-pcre2_ucd.lo `test -f 'src/pcre2_ucd.c' || echo '$(srcdir)/'`src/pcre2_ucd.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_ucd.c' object='src/libpcre2_32_la-pcre2_ucd.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) 
$(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_ucd.lo `test -f 'src/pcre2_ucd.c' || echo '$(srcdir)/'`src/pcre2_ucd.c + +src/libpcre2_32_la-pcre2_valid_utf.lo: src/pcre2_valid_utf.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_valid_utf.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Tpo -c -o src/libpcre2_32_la-pcre2_valid_utf.lo `test -f 'src/pcre2_valid_utf.c' || echo '$(srcdir)/'`src/pcre2_valid_utf.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_valid_utf.c' object='src/libpcre2_32_la-pcre2_valid_utf.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_valid_utf.lo `test -f 'src/pcre2_valid_utf.c' || echo '$(srcdir)/'`src/pcre2_valid_utf.c + +src/libpcre2_32_la-pcre2_xclass.lo: src/pcre2_xclass.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_xclass.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Tpo -c -o src/libpcre2_32_la-pcre2_xclass.lo `test -f 'src/pcre2_xclass.c' || echo '$(srcdir)/'`src/pcre2_xclass.c +@am__fastdepCC_TRUE@ 
$(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_xclass.c' object='src/libpcre2_32_la-pcre2_xclass.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_xclass.lo `test -f 'src/pcre2_xclass.c' || echo '$(srcdir)/'`src/pcre2_xclass.c + +src/libpcre2_32_la-pcre2_chartables.lo: src/pcre2_chartables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_32_la-pcre2_chartables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Tpo -c -o src/libpcre2_32_la-pcre2_chartables.lo `test -f 'src/pcre2_chartables.c' || echo '$(srcdir)/'`src/pcre2_chartables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Tpo src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_chartables.c' object='src/libpcre2_32_la-pcre2_chartables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_32_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_32_la-pcre2_chartables.lo `test -f 'src/pcre2_chartables.c' || echo '$(srcdir)/'`src/pcre2_chartables.c + +src/libpcre2_8_la-pcre2_auto_possess.lo: 
src/pcre2_auto_possess.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_auto_possess.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Tpo -c -o src/libpcre2_8_la-pcre2_auto_possess.lo `test -f 'src/pcre2_auto_possess.c' || echo '$(srcdir)/'`src/pcre2_auto_possess.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_auto_possess.c' object='src/libpcre2_8_la-pcre2_auto_possess.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_auto_possess.lo `test -f 'src/pcre2_auto_possess.c' || echo '$(srcdir)/'`src/pcre2_auto_possess.c + +src/libpcre2_8_la-pcre2_compile.lo: src/pcre2_compile.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_compile.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Tpo -c -o src/libpcre2_8_la-pcre2_compile.lo `test -f 'src/pcre2_compile.c' || echo '$(srcdir)/'`src/pcre2_compile.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_compile.c' object='src/libpcre2_8_la-pcre2_compile.lo' libtool=yes 
@AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_compile.lo `test -f 'src/pcre2_compile.c' || echo '$(srcdir)/'`src/pcre2_compile.c + +src/libpcre2_8_la-pcre2_config.lo: src/pcre2_config.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_config.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Tpo -c -o src/libpcre2_8_la-pcre2_config.lo `test -f 'src/pcre2_config.c' || echo '$(srcdir)/'`src/pcre2_config.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_config.c' object='src/libpcre2_8_la-pcre2_config.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_config.lo `test -f 'src/pcre2_config.c' || echo '$(srcdir)/'`src/pcre2_config.c + +src/libpcre2_8_la-pcre2_context.lo: src/pcre2_context.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_context.lo -MD -MP -MF 
src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Tpo -c -o src/libpcre2_8_la-pcre2_context.lo `test -f 'src/pcre2_context.c' || echo '$(srcdir)/'`src/pcre2_context.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_context.c' object='src/libpcre2_8_la-pcre2_context.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_context.lo `test -f 'src/pcre2_context.c' || echo '$(srcdir)/'`src/pcre2_context.c + +src/libpcre2_8_la-pcre2_convert.lo: src/pcre2_convert.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_convert.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Tpo -c -o src/libpcre2_8_la-pcre2_convert.lo `test -f 'src/pcre2_convert.c' || echo '$(srcdir)/'`src/pcre2_convert.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_convert.c' object='src/libpcre2_8_la-pcre2_convert.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o 
src/libpcre2_8_la-pcre2_convert.lo `test -f 'src/pcre2_convert.c' || echo '$(srcdir)/'`src/pcre2_convert.c + +src/libpcre2_8_la-pcre2_dfa_match.lo: src/pcre2_dfa_match.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_dfa_match.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Tpo -c -o src/libpcre2_8_la-pcre2_dfa_match.lo `test -f 'src/pcre2_dfa_match.c' || echo '$(srcdir)/'`src/pcre2_dfa_match.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_dfa_match.c' object='src/libpcre2_8_la-pcre2_dfa_match.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_dfa_match.lo `test -f 'src/pcre2_dfa_match.c' || echo '$(srcdir)/'`src/pcre2_dfa_match.c + +src/libpcre2_8_la-pcre2_error.lo: src/pcre2_error.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_error.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Tpo -c -o src/libpcre2_8_la-pcre2_error.lo `test -f 'src/pcre2_error.c' || echo '$(srcdir)/'`src/pcre2_error.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ 
$(AM_V_CC)source='src/pcre2_error.c' object='src/libpcre2_8_la-pcre2_error.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_error.lo `test -f 'src/pcre2_error.c' || echo '$(srcdir)/'`src/pcre2_error.c + +src/libpcre2_8_la-pcre2_extuni.lo: src/pcre2_extuni.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_extuni.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Tpo -c -o src/libpcre2_8_la-pcre2_extuni.lo `test -f 'src/pcre2_extuni.c' || echo '$(srcdir)/'`src/pcre2_extuni.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_extuni.c' object='src/libpcre2_8_la-pcre2_extuni.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_extuni.lo `test -f 'src/pcre2_extuni.c' || echo '$(srcdir)/'`src/pcre2_extuni.c + +src/libpcre2_8_la-pcre2_find_bracket.lo: src/pcre2_find_bracket.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) 
$(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_find_bracket.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Tpo -c -o src/libpcre2_8_la-pcre2_find_bracket.lo `test -f 'src/pcre2_find_bracket.c' || echo '$(srcdir)/'`src/pcre2_find_bracket.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_find_bracket.c' object='src/libpcre2_8_la-pcre2_find_bracket.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_find_bracket.lo `test -f 'src/pcre2_find_bracket.c' || echo '$(srcdir)/'`src/pcre2_find_bracket.c + +src/libpcre2_8_la-pcre2_jit_compile.lo: src/pcre2_jit_compile.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_jit_compile.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Tpo -c -o src/libpcre2_8_la-pcre2_jit_compile.lo `test -f 'src/pcre2_jit_compile.c' || echo '$(srcdir)/'`src/pcre2_jit_compile.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_jit_compile.c' object='src/libpcre2_8_la-pcre2_jit_compile.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) 
--tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_jit_compile.lo `test -f 'src/pcre2_jit_compile.c' || echo '$(srcdir)/'`src/pcre2_jit_compile.c + +src/libpcre2_8_la-pcre2_maketables.lo: src/pcre2_maketables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_maketables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Tpo -c -o src/libpcre2_8_la-pcre2_maketables.lo `test -f 'src/pcre2_maketables.c' || echo '$(srcdir)/'`src/pcre2_maketables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_maketables.c' object='src/libpcre2_8_la-pcre2_maketables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_maketables.lo `test -f 'src/pcre2_maketables.c' || echo '$(srcdir)/'`src/pcre2_maketables.c + +src/libpcre2_8_la-pcre2_match.lo: src/pcre2_match.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_match.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Tpo -c -o src/libpcre2_8_la-pcre2_match.lo `test -f 'src/pcre2_match.c' || echo 
'$(srcdir)/'`src/pcre2_match.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_match.c' object='src/libpcre2_8_la-pcre2_match.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_match.lo `test -f 'src/pcre2_match.c' || echo '$(srcdir)/'`src/pcre2_match.c + +src/libpcre2_8_la-pcre2_match_data.lo: src/pcre2_match_data.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_match_data.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Tpo -c -o src/libpcre2_8_la-pcre2_match_data.lo `test -f 'src/pcre2_match_data.c' || echo '$(srcdir)/'`src/pcre2_match_data.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_match_data.c' object='src/libpcre2_8_la-pcre2_match_data.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_match_data.lo `test -f 'src/pcre2_match_data.c' || echo '$(srcdir)/'`src/pcre2_match_data.c + 
+src/libpcre2_8_la-pcre2_newline.lo: src/pcre2_newline.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_newline.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Tpo -c -o src/libpcre2_8_la-pcre2_newline.lo `test -f 'src/pcre2_newline.c' || echo '$(srcdir)/'`src/pcre2_newline.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_newline.c' object='src/libpcre2_8_la-pcre2_newline.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_newline.lo `test -f 'src/pcre2_newline.c' || echo '$(srcdir)/'`src/pcre2_newline.c + +src/libpcre2_8_la-pcre2_ord2utf.lo: src/pcre2_ord2utf.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_ord2utf.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Tpo -c -o src/libpcre2_8_la-pcre2_ord2utf.lo `test -f 'src/pcre2_ord2utf.c' || echo '$(srcdir)/'`src/pcre2_ord2utf.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_ord2utf.c' object='src/libpcre2_8_la-pcre2_ord2utf.lo' libtool=yes @AMDEPBACKSLASH@ 
+@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_ord2utf.lo `test -f 'src/pcre2_ord2utf.c' || echo '$(srcdir)/'`src/pcre2_ord2utf.c + +src/libpcre2_8_la-pcre2_pattern_info.lo: src/pcre2_pattern_info.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_pattern_info.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Tpo -c -o src/libpcre2_8_la-pcre2_pattern_info.lo `test -f 'src/pcre2_pattern_info.c' || echo '$(srcdir)/'`src/pcre2_pattern_info.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_pattern_info.c' object='src/libpcre2_8_la-pcre2_pattern_info.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_pattern_info.lo `test -f 'src/pcre2_pattern_info.c' || echo '$(srcdir)/'`src/pcre2_pattern_info.c + +src/libpcre2_8_la-pcre2_script_run.lo: src/pcre2_script_run.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) 
$(CFLAGS) -MT src/libpcre2_8_la-pcre2_script_run.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Tpo -c -o src/libpcre2_8_la-pcre2_script_run.lo `test -f 'src/pcre2_script_run.c' || echo '$(srcdir)/'`src/pcre2_script_run.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_script_run.c' object='src/libpcre2_8_la-pcre2_script_run.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_script_run.lo `test -f 'src/pcre2_script_run.c' || echo '$(srcdir)/'`src/pcre2_script_run.c + +src/libpcre2_8_la-pcre2_serialize.lo: src/pcre2_serialize.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_serialize.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Tpo -c -o src/libpcre2_8_la-pcre2_serialize.lo `test -f 'src/pcre2_serialize.c' || echo '$(srcdir)/'`src/pcre2_serialize.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_serialize.c' object='src/libpcre2_8_la-pcre2_serialize.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) 
$(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_serialize.lo `test -f 'src/pcre2_serialize.c' || echo '$(srcdir)/'`src/pcre2_serialize.c + +src/libpcre2_8_la-pcre2_string_utils.lo: src/pcre2_string_utils.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_string_utils.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Tpo -c -o src/libpcre2_8_la-pcre2_string_utils.lo `test -f 'src/pcre2_string_utils.c' || echo '$(srcdir)/'`src/pcre2_string_utils.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_string_utils.c' object='src/libpcre2_8_la-pcre2_string_utils.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_string_utils.lo `test -f 'src/pcre2_string_utils.c' || echo '$(srcdir)/'`src/pcre2_string_utils.c + +src/libpcre2_8_la-pcre2_study.lo: src/pcre2_study.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_study.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Tpo -c -o src/libpcre2_8_la-pcre2_study.lo `test -f 'src/pcre2_study.c' || echo '$(srcdir)/'`src/pcre2_study.c +@am__fastdepCC_TRUE@ 
$(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_study.c' object='src/libpcre2_8_la-pcre2_study.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_study.lo `test -f 'src/pcre2_study.c' || echo '$(srcdir)/'`src/pcre2_study.c + +src/libpcre2_8_la-pcre2_substitute.lo: src/pcre2_substitute.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_substitute.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Tpo -c -o src/libpcre2_8_la-pcre2_substitute.lo `test -f 'src/pcre2_substitute.c' || echo '$(srcdir)/'`src/pcre2_substitute.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_substitute.c' object='src/libpcre2_8_la-pcre2_substitute.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_substitute.lo `test -f 'src/pcre2_substitute.c' || echo '$(srcdir)/'`src/pcre2_substitute.c + +src/libpcre2_8_la-pcre2_substring.lo: src/pcre2_substring.c 
+@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_substring.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Tpo -c -o src/libpcre2_8_la-pcre2_substring.lo `test -f 'src/pcre2_substring.c' || echo '$(srcdir)/'`src/pcre2_substring.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_substring.c' object='src/libpcre2_8_la-pcre2_substring.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_substring.lo `test -f 'src/pcre2_substring.c' || echo '$(srcdir)/'`src/pcre2_substring.c + +src/libpcre2_8_la-pcre2_tables.lo: src/pcre2_tables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_tables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Tpo -c -o src/libpcre2_8_la-pcre2_tables.lo `test -f 'src/pcre2_tables.c' || echo '$(srcdir)/'`src/pcre2_tables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_tables.c' object='src/libpcre2_8_la-pcre2_tables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) 
$(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_tables.lo `test -f 'src/pcre2_tables.c' || echo '$(srcdir)/'`src/pcre2_tables.c + +src/libpcre2_8_la-pcre2_ucd.lo: src/pcre2_ucd.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_ucd.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Tpo -c -o src/libpcre2_8_la-pcre2_ucd.lo `test -f 'src/pcre2_ucd.c' || echo '$(srcdir)/'`src/pcre2_ucd.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_ucd.c' object='src/libpcre2_8_la-pcre2_ucd.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_ucd.lo `test -f 'src/pcre2_ucd.c' || echo '$(srcdir)/'`src/pcre2_ucd.c + +src/libpcre2_8_la-pcre2_valid_utf.lo: src/pcre2_valid_utf.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_valid_utf.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Tpo -c -o src/libpcre2_8_la-pcre2_valid_utf.lo `test -f 'src/pcre2_valid_utf.c' || echo 
'$(srcdir)/'`src/pcre2_valid_utf.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_valid_utf.c' object='src/libpcre2_8_la-pcre2_valid_utf.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_valid_utf.lo `test -f 'src/pcre2_valid_utf.c' || echo '$(srcdir)/'`src/pcre2_valid_utf.c + +src/libpcre2_8_la-pcre2_xclass.lo: src/pcre2_xclass.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_xclass.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Tpo -c -o src/libpcre2_8_la-pcre2_xclass.lo `test -f 'src/pcre2_xclass.c' || echo '$(srcdir)/'`src/pcre2_xclass.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_xclass.c' object='src/libpcre2_8_la-pcre2_xclass.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_xclass.lo `test -f 'src/pcre2_xclass.c' || echo '$(srcdir)/'`src/pcre2_xclass.c + 
+src/libpcre2_8_la-pcre2_chartables.lo: src/pcre2_chartables.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_8_la-pcre2_chartables.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Tpo -c -o src/libpcre2_8_la-pcre2_chartables.lo `test -f 'src/pcre2_chartables.c' || echo '$(srcdir)/'`src/pcre2_chartables.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Tpo src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_chartables.c' object='src/libpcre2_8_la-pcre2_chartables.lo' libtool=yes @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_8_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_8_la-pcre2_chartables.lo `test -f 'src/pcre2_chartables.c' || echo '$(srcdir)/'`src/pcre2_chartables.c + +src/libpcre2_posix_la-pcre2posix.lo: src/pcre2posix.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_posix_la_CFLAGS) $(CFLAGS) -MT src/libpcre2_posix_la-pcre2posix.lo -MD -MP -MF src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Tpo -c -o src/libpcre2_posix_la-pcre2posix.lo `test -f 'src/pcre2posix.c' || echo '$(srcdir)/'`src/pcre2posix.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Tpo src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Plo +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2posix.c' object='src/libpcre2_posix_la-pcre2posix.lo' libtool=yes 
@AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(libpcre2_posix_la_CFLAGS) $(CFLAGS) -c -o src/libpcre2_posix_la-pcre2posix.lo `test -f 'src/pcre2posix.c' || echo '$(srcdir)/'`src/pcre2posix.c + +src/pcre2_jit_test-pcre2_jit_test.o: src/pcre2_jit_test.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2_jit_test_CFLAGS) $(CFLAGS) -MT src/pcre2_jit_test-pcre2_jit_test.o -MD -MP -MF src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Tpo -c -o src/pcre2_jit_test-pcre2_jit_test.o `test -f 'src/pcre2_jit_test.c' || echo '$(srcdir)/'`src/pcre2_jit_test.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Tpo src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_jit_test.c' object='src/pcre2_jit_test-pcre2_jit_test.o' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2_jit_test_CFLAGS) $(CFLAGS) -c -o src/pcre2_jit_test-pcre2_jit_test.o `test -f 'src/pcre2_jit_test.c' || echo '$(srcdir)/'`src/pcre2_jit_test.c + +src/pcre2_jit_test-pcre2_jit_test.obj: src/pcre2_jit_test.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2_jit_test_CFLAGS) $(CFLAGS) -MT src/pcre2_jit_test-pcre2_jit_test.obj -MD -MP -MF src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Tpo -c -o src/pcre2_jit_test-pcre2_jit_test.obj `if test -f 'src/pcre2_jit_test.c'; then $(CYGPATH_W) 'src/pcre2_jit_test.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2_jit_test.c'; fi` 
+@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Tpo src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_jit_test.c' object='src/pcre2_jit_test-pcre2_jit_test.obj' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2_jit_test_CFLAGS) $(CFLAGS) -c -o src/pcre2_jit_test-pcre2_jit_test.obj `if test -f 'src/pcre2_jit_test.c'; then $(CYGPATH_W) 'src/pcre2_jit_test.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2_jit_test.c'; fi` + +src/pcre2fuzzcheck-pcre2_fuzzsupport.o: src/pcre2_fuzzsupport.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2fuzzcheck_CFLAGS) $(CFLAGS) -MT src/pcre2fuzzcheck-pcre2_fuzzsupport.o -MD -MP -MF src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Tpo -c -o src/pcre2fuzzcheck-pcre2_fuzzsupport.o `test -f 'src/pcre2_fuzzsupport.c' || echo '$(srcdir)/'`src/pcre2_fuzzsupport.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Tpo src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_fuzzsupport.c' object='src/pcre2fuzzcheck-pcre2_fuzzsupport.o' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2fuzzcheck_CFLAGS) $(CFLAGS) -c -o src/pcre2fuzzcheck-pcre2_fuzzsupport.o `test -f 'src/pcre2_fuzzsupport.c' || echo '$(srcdir)/'`src/pcre2_fuzzsupport.c + +src/pcre2fuzzcheck-pcre2_fuzzsupport.obj: src/pcre2_fuzzsupport.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) 
$(pcre2fuzzcheck_CFLAGS) $(CFLAGS) -MT src/pcre2fuzzcheck-pcre2_fuzzsupport.obj -MD -MP -MF src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Tpo -c -o src/pcre2fuzzcheck-pcre2_fuzzsupport.obj `if test -f 'src/pcre2_fuzzsupport.c'; then $(CYGPATH_W) 'src/pcre2_fuzzsupport.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2_fuzzsupport.c'; fi` +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Tpo src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2_fuzzsupport.c' object='src/pcre2fuzzcheck-pcre2_fuzzsupport.obj' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2fuzzcheck_CFLAGS) $(CFLAGS) -c -o src/pcre2fuzzcheck-pcre2_fuzzsupport.obj `if test -f 'src/pcre2_fuzzsupport.c'; then $(CYGPATH_W) 'src/pcre2_fuzzsupport.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2_fuzzsupport.c'; fi` + +src/pcre2grep-pcre2grep.o: src/pcre2grep.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2grep_CFLAGS) $(CFLAGS) -MT src/pcre2grep-pcre2grep.o -MD -MP -MF src/$(DEPDIR)/pcre2grep-pcre2grep.Tpo -c -o src/pcre2grep-pcre2grep.o `test -f 'src/pcre2grep.c' || echo '$(srcdir)/'`src/pcre2grep.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2grep-pcre2grep.Tpo src/$(DEPDIR)/pcre2grep-pcre2grep.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2grep.c' object='src/pcre2grep-pcre2grep.o' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2grep_CFLAGS) $(CFLAGS) -c -o src/pcre2grep-pcre2grep.o `test -f 'src/pcre2grep.c' || echo 
'$(srcdir)/'`src/pcre2grep.c + +src/pcre2grep-pcre2grep.obj: src/pcre2grep.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2grep_CFLAGS) $(CFLAGS) -MT src/pcre2grep-pcre2grep.obj -MD -MP -MF src/$(DEPDIR)/pcre2grep-pcre2grep.Tpo -c -o src/pcre2grep-pcre2grep.obj `if test -f 'src/pcre2grep.c'; then $(CYGPATH_W) 'src/pcre2grep.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2grep.c'; fi` +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2grep-pcre2grep.Tpo src/$(DEPDIR)/pcre2grep-pcre2grep.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2grep.c' object='src/pcre2grep-pcre2grep.obj' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2grep_CFLAGS) $(CFLAGS) -c -o src/pcre2grep-pcre2grep.obj `if test -f 'src/pcre2grep.c'; then $(CYGPATH_W) 'src/pcre2grep.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2grep.c'; fi` + +src/pcre2test-pcre2test.o: src/pcre2test.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2test_CFLAGS) $(CFLAGS) -MT src/pcre2test-pcre2test.o -MD -MP -MF src/$(DEPDIR)/pcre2test-pcre2test.Tpo -c -o src/pcre2test-pcre2test.o `test -f 'src/pcre2test.c' || echo '$(srcdir)/'`src/pcre2test.c +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2test-pcre2test.Tpo src/$(DEPDIR)/pcre2test-pcre2test.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2test.c' object='src/pcre2test-pcre2test.o' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2test_CFLAGS) $(CFLAGS) -c -o src/pcre2test-pcre2test.o `test -f 'src/pcre2test.c' 
|| echo '$(srcdir)/'`src/pcre2test.c + +src/pcre2test-pcre2test.obj: src/pcre2test.c +@am__fastdepCC_TRUE@ $(AM_V_CC)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2test_CFLAGS) $(CFLAGS) -MT src/pcre2test-pcre2test.obj -MD -MP -MF src/$(DEPDIR)/pcre2test-pcre2test.Tpo -c -o src/pcre2test-pcre2test.obj `if test -f 'src/pcre2test.c'; then $(CYGPATH_W) 'src/pcre2test.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2test.c'; fi` +@am__fastdepCC_TRUE@ $(AM_V_at)$(am__mv) src/$(DEPDIR)/pcre2test-pcre2test.Tpo src/$(DEPDIR)/pcre2test-pcre2test.Po +@AMDEP_TRUE@@am__fastdepCC_FALSE@ $(AM_V_CC)source='src/pcre2test.c' object='src/pcre2test-pcre2test.obj' libtool=no @AMDEPBACKSLASH@ +@AMDEP_TRUE@@am__fastdepCC_FALSE@ DEPDIR=$(DEPDIR) $(CCDEPMODE) $(depcomp) @AMDEPBACKSLASH@ +@am__fastdepCC_FALSE@ $(AM_V_CC@am__nodep@)$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(pcre2test_CFLAGS) $(CFLAGS) -c -o src/pcre2test-pcre2test.obj `if test -f 'src/pcre2test.c'; then $(CYGPATH_W) 'src/pcre2test.c'; else $(CYGPATH_W) '$(srcdir)/src/pcre2test.c'; fi` + +mostlyclean-libtool: + -rm -f *.lo + +clean-libtool: + -rm -rf .libs _libs + -rm -rf src/.libs src/_libs + +distclean-libtool: + -rm -f libtool config.lt +install-man1: $(dist_man_MANS) + @$(NORMAL_INSTALL) + @list1=''; \ + list2='$(dist_man_MANS)'; \ + test -n "$(man1dir)" \ + && test -n "`echo $$list1$$list2`" \ + || exit 0; \ + echo " $(MKDIR_P) '$(DESTDIR)$(man1dir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(man1dir)" || exit 1; \ + { for i in $$list1; do echo "$$i"; done; \ + if test -n "$$list2"; then \ + for i in $$list2; do echo "$$i"; done \ + | sed -n '/\.1[a-z]*$$/p'; \ + fi; \ + } | while read p; do \ + if test -f $$p; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; echo "$$p"; \ + done | \ + sed -e 'n;s,.*/,,;p;h;s,.*\.,,;s,^[^1][0-9a-z]*$$,1,;x' \ + -e 's,\.[0-9a-z]*$$,,;$(transform);G;s,\n,.,' | \ + sed 'N;N;s,\n, ,g' | { \ + list=; while read file base inst; do \ + if test 
"$$base" = "$$inst"; then list="$$list $$file"; else \ + echo " $(INSTALL_DATA) '$$file' '$(DESTDIR)$(man1dir)/$$inst'"; \ + $(INSTALL_DATA) "$$file" "$(DESTDIR)$(man1dir)/$$inst" || exit $$?; \ + fi; \ + done; \ + for i in $$list; do echo "$$i"; done | $(am__base_list) | \ + while read files; do \ + test -z "$$files" || { \ + echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(man1dir)'"; \ + $(INSTALL_DATA) $$files "$(DESTDIR)$(man1dir)" || exit $$?; }; \ + done; } + +uninstall-man1: + @$(NORMAL_UNINSTALL) + @list=''; test -n "$(man1dir)" || exit 0; \ + files=`{ for i in $$list; do echo "$$i"; done; \ + l2='$(dist_man_MANS)'; for i in $$l2; do echo "$$i"; done | \ + sed -n '/\.1[a-z]*$$/p'; \ + } | sed -e 's,.*/,,;h;s,.*\.,,;s,^[^1][0-9a-z]*$$,1,;x' \ + -e 's,\.[0-9a-z]*$$,,;$(transform);G;s,\n,.,'`; \ + dir='$(DESTDIR)$(man1dir)'; $(am__uninstall_files_from_dir) +install-man3: $(dist_man_MANS) + @$(NORMAL_INSTALL) + @list1=''; \ + list2='$(dist_man_MANS)'; \ + test -n "$(man3dir)" \ + && test -n "`echo $$list1$$list2`" \ + || exit 0; \ + echo " $(MKDIR_P) '$(DESTDIR)$(man3dir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(man3dir)" || exit 1; \ + { for i in $$list1; do echo "$$i"; done; \ + if test -n "$$list2"; then \ + for i in $$list2; do echo "$$i"; done \ + | sed -n '/\.3[a-z]*$$/p'; \ + fi; \ + } | while read p; do \ + if test -f $$p; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; echo "$$p"; \ + done | \ + sed -e 'n;s,.*/,,;p;h;s,.*\.,,;s,^[^3][0-9a-z]*$$,3,;x' \ + -e 's,\.[0-9a-z]*$$,,;$(transform);G;s,\n,.,' | \ + sed 'N;N;s,\n, ,g' | { \ + list=; while read file base inst; do \ + if test "$$base" = "$$inst"; then list="$$list $$file"; else \ + echo " $(INSTALL_DATA) '$$file' '$(DESTDIR)$(man3dir)/$$inst'"; \ + $(INSTALL_DATA) "$$file" "$(DESTDIR)$(man3dir)/$$inst" || exit $$?; \ + fi; \ + done; \ + for i in $$list; do echo "$$i"; done | $(am__base_list) | \ + while read files; do \ + test -z "$$files" || { \ + echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(man3dir)'"; \ + 
$(INSTALL_DATA) $$files "$(DESTDIR)$(man3dir)" || exit $$?; }; \ + done; } + +uninstall-man3: + @$(NORMAL_UNINSTALL) + @list=''; test -n "$(man3dir)" || exit 0; \ + files=`{ for i in $$list; do echo "$$i"; done; \ + l2='$(dist_man_MANS)'; for i in $$l2; do echo "$$i"; done | \ + sed -n '/\.3[a-z]*$$/p'; \ + } | sed -e 's,.*/,,;h;s,.*\.,,;s,^[^3][0-9a-z]*$$,3,;x' \ + -e 's,\.[0-9a-z]*$$,,;$(transform);G;s,\n,.,'`; \ + dir='$(DESTDIR)$(man3dir)'; $(am__uninstall_files_from_dir) +install-dist_docDATA: $(dist_doc_DATA) + @$(NORMAL_INSTALL) + @list='$(dist_doc_DATA)'; test -n "$(docdir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(docdir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(docdir)" || exit 1; \ + fi; \ + for p in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; \ + done | $(am__base_list) | \ + while read files; do \ + echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(docdir)'"; \ + $(INSTALL_DATA) $$files "$(DESTDIR)$(docdir)" || exit $$?; \ + done + +uninstall-dist_docDATA: + @$(NORMAL_UNINSTALL) + @list='$(dist_doc_DATA)'; test -n "$(docdir)" || list=; \ + files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \ + dir='$(DESTDIR)$(docdir)'; $(am__uninstall_files_from_dir) +install-dist_htmlDATA: $(dist_html_DATA) + @$(NORMAL_INSTALL) + @list='$(dist_html_DATA)'; test -n "$(htmldir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(htmldir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(htmldir)" || exit 1; \ + fi; \ + for p in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; \ + done | $(am__base_list) | \ + while read files; do \ + echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(htmldir)'"; \ + $(INSTALL_DATA) $$files "$(DESTDIR)$(htmldir)" || exit $$?; \ + done + +uninstall-dist_htmlDATA: + @$(NORMAL_UNINSTALL) + @list='$(dist_html_DATA)'; test -n "$(htmldir)" || list=; \ + files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \ + 
dir='$(DESTDIR)$(htmldir)'; $(am__uninstall_files_from_dir) +install-pkgconfigDATA: $(pkgconfig_DATA) + @$(NORMAL_INSTALL) + @list='$(pkgconfig_DATA)'; test -n "$(pkgconfigdir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(pkgconfigdir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(pkgconfigdir)" || exit 1; \ + fi; \ + for p in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; \ + done | $(am__base_list) | \ + while read files; do \ + echo " $(INSTALL_DATA) $$files '$(DESTDIR)$(pkgconfigdir)'"; \ + $(INSTALL_DATA) $$files "$(DESTDIR)$(pkgconfigdir)" || exit $$?; \ + done + +uninstall-pkgconfigDATA: + @$(NORMAL_UNINSTALL) + @list='$(pkgconfig_DATA)'; test -n "$(pkgconfigdir)" || list=; \ + files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \ + dir='$(DESTDIR)$(pkgconfigdir)'; $(am__uninstall_files_from_dir) +install-includeHEADERS: $(include_HEADERS) + @$(NORMAL_INSTALL) + @list='$(include_HEADERS)'; test -n "$(includedir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(includedir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(includedir)" || exit 1; \ + fi; \ + for p in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; \ + done | $(am__base_list) | \ + while read files; do \ + echo " $(INSTALL_HEADER) $$files '$(DESTDIR)$(includedir)'"; \ + $(INSTALL_HEADER) $$files "$(DESTDIR)$(includedir)" || exit $$?; \ + done + +uninstall-includeHEADERS: + @$(NORMAL_UNINSTALL) + @list='$(include_HEADERS)'; test -n "$(includedir)" || list=; \ + files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \ + dir='$(DESTDIR)$(includedir)'; $(am__uninstall_files_from_dir) +install-nodist_includeHEADERS: $(nodist_include_HEADERS) + @$(NORMAL_INSTALL) + @list='$(nodist_include_HEADERS)'; test -n "$(includedir)" || list=; \ + if test -n "$$list"; then \ + echo " $(MKDIR_P) '$(DESTDIR)$(includedir)'"; \ + $(MKDIR_P) "$(DESTDIR)$(includedir)" || exit 1; \ + fi; \ + for p 
in $$list; do \ + if test -f "$$p"; then d=; else d="$(srcdir)/"; fi; \ + echo "$$d$$p"; \ + done | $(am__base_list) | \ + while read files; do \ + echo " $(INSTALL_HEADER) $$files '$(DESTDIR)$(includedir)'"; \ + $(INSTALL_HEADER) $$files "$(DESTDIR)$(includedir)" || exit $$?; \ + done + +uninstall-nodist_includeHEADERS: + @$(NORMAL_UNINSTALL) + @list='$(nodist_include_HEADERS)'; test -n "$(includedir)" || list=; \ + files=`for p in $$list; do echo $$p; done | sed -e 's|^.*/||'`; \ + dir='$(DESTDIR)$(includedir)'; $(am__uninstall_files_from_dir) + +ID: $(am__tagged_files) + $(am__define_uniq_tagged_files); mkid -fID $$unique +tags: tags-am +TAGS: tags + +tags-am: $(TAGS_DEPENDENCIES) $(am__tagged_files) + set x; \ + here=`pwd`; \ + $(am__define_uniq_tagged_files); \ + shift; \ + if test -z "$(ETAGS_ARGS)$$*$$unique"; then :; else \ + test -n "$$unique" || unique=$$empty_fix; \ + if test $$# -gt 0; then \ + $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \ + "$$@" $$unique; \ + else \ + $(ETAGS) $(ETAGSFLAGS) $(AM_ETAGSFLAGS) $(ETAGS_ARGS) \ + $$unique; \ + fi; \ + fi +ctags: ctags-am + +CTAGS: ctags +ctags-am: $(TAGS_DEPENDENCIES) $(am__tagged_files) + $(am__define_uniq_tagged_files); \ + test -z "$(CTAGS_ARGS)$$unique" \ + || $(CTAGS) $(CTAGSFLAGS) $(AM_CTAGSFLAGS) $(CTAGS_ARGS) \ + $$unique + +GTAGS: + here=`$(am__cd) $(top_builddir) && pwd` \ + && $(am__cd) $(top_srcdir) \ + && gtags -i $(GTAGS_ARGS) "$$here" +cscope: cscope.files + test ! 
-s cscope.files \ + || $(CSCOPE) -b -q $(AM_CSCOPEFLAGS) $(CSCOPEFLAGS) -i cscope.files $(CSCOPE_ARGS) +clean-cscope: + -rm -f cscope.files +cscope.files: clean-cscope cscopelist +cscopelist: cscopelist-am + +cscopelist-am: $(am__tagged_files) + list='$(am__tagged_files)'; \ + case "$(srcdir)" in \ + [\\/]* | ?:[\\/]*) sdir="$(srcdir)" ;; \ + *) sdir=$(subdir)/$(srcdir) ;; \ + esac; \ + for i in $$list; do \ + if test -f "$$i"; then \ + echo "$(subdir)/$$i"; \ + else \ + echo "$$sdir/$$i"; \ + fi; \ + done >> $(top_builddir)/cscope.files + +distclean-tags: + -rm -f TAGS ID GTAGS GRTAGS GSYMS GPATH tags + -rm -f cscope.out cscope.in.out cscope.po.out cscope.files + +# Recover from deleted '.trs' file; this should ensure that +# "rm -f foo.log; make foo.trs" re-run 'foo.test', and re-create +# both 'foo.log' and 'foo.trs'. Break the recipe in two subshells +# to avoid problems with "make -n". +.log.trs: + rm -f $< $@ + $(MAKE) $(AM_MAKEFLAGS) $< + +# Leading 'am--fnord' is there to ensure the list of targets does not +# expand to empty, as could happen e.g. with make check TESTS=''. 
+am--fnord $(TEST_LOGS) $(TEST_LOGS:.log=.trs): $(am__force_recheck) +am--force-recheck: + @: + +$(TEST_SUITE_LOG): $(TEST_LOGS) + @$(am__set_TESTS_bases); \ + am__f_ok () { test -f "$$1" && test -r "$$1"; }; \ + redo_bases=`for i in $$bases; do \ + am__f_ok $$i.trs && am__f_ok $$i.log || echo $$i; \ + done`; \ + if test -n "$$redo_bases"; then \ + redo_logs=`for i in $$redo_bases; do echo $$i.log; done`; \ + redo_results=`for i in $$redo_bases; do echo $$i.trs; done`; \ + if $(am__make_dryrun); then :; else \ + rm -f $$redo_logs && rm -f $$redo_results || exit 1; \ + fi; \ + fi; \ + if test -n "$$am__remaking_logs"; then \ + echo "fatal: making $(TEST_SUITE_LOG): possible infinite" \ + "recursion detected" >&2; \ + elif test -n "$$redo_logs"; then \ + am__remaking_logs=yes $(MAKE) $(AM_MAKEFLAGS) $$redo_logs; \ + fi; \ + if $(am__make_dryrun); then :; else \ + st=0; \ + errmsg="fatal: making $(TEST_SUITE_LOG): failed to create"; \ + for i in $$redo_bases; do \ + test -f $$i.trs && test -r $$i.trs \ + || { echo "$$errmsg $$i.trs" >&2; st=1; }; \ + test -f $$i.log && test -r $$i.log \ + || { echo "$$errmsg $$i.log" >&2; st=1; }; \ + done; \ + test $$st -eq 0 || exit 1; \ + fi + @$(am__sh_e_setup); $(am__tty_colors); $(am__set_TESTS_bases); \ + ws='[ ]'; \ + results=`for b in $$bases; do echo $$b.trs; done`; \ + test -n "$$results" || results=/dev/null; \ + all=` grep "^$$ws*:test-result:" $$results | wc -l`; \ + pass=` grep "^$$ws*:test-result:$$ws*PASS" $$results | wc -l`; \ + fail=` grep "^$$ws*:test-result:$$ws*FAIL" $$results | wc -l`; \ + skip=` grep "^$$ws*:test-result:$$ws*SKIP" $$results | wc -l`; \ + xfail=`grep "^$$ws*:test-result:$$ws*XFAIL" $$results | wc -l`; \ + xpass=`grep "^$$ws*:test-result:$$ws*XPASS" $$results | wc -l`; \ + error=`grep "^$$ws*:test-result:$$ws*ERROR" $$results | wc -l`; \ + if test `expr $$fail + $$xpass + $$error` -eq 0; then \ + success=true; \ + else \ + success=false; \ + fi; \ + br='==================='; br=$$br$$br$$br$$br; 
\ + result_count () \ + { \ + if test x"$$1" = x"--maybe-color"; then \ + maybe_colorize=yes; \ + elif test x"$$1" = x"--no-color"; then \ + maybe_colorize=no; \ + else \ + echo "$@: invalid 'result_count' usage" >&2; exit 4; \ + fi; \ + shift; \ + desc=$$1 count=$$2; \ + if test $$maybe_colorize = yes && test $$count -gt 0; then \ + color_start=$$3 color_end=$$std; \ + else \ + color_start= color_end=; \ + fi; \ + echo "$${color_start}# $$desc $$count$${color_end}"; \ + }; \ + create_testsuite_report () \ + { \ + result_count $$1 "TOTAL:" $$all "$$brg"; \ + result_count $$1 "PASS: " $$pass "$$grn"; \ + result_count $$1 "SKIP: " $$skip "$$blu"; \ + result_count $$1 "XFAIL:" $$xfail "$$lgn"; \ + result_count $$1 "FAIL: " $$fail "$$red"; \ + result_count $$1 "XPASS:" $$xpass "$$red"; \ + result_count $$1 "ERROR:" $$error "$$mgn"; \ + }; \ + { \ + echo "$(PACKAGE_STRING): $(subdir)/$(TEST_SUITE_LOG)" | \ + $(am__rst_title); \ + create_testsuite_report --no-color; \ + echo; \ + echo ".. contents:: :depth: 2"; \ + echo; \ + for b in $$bases; do echo $$b; done \ + | $(am__create_global_log); \ + } >$(TEST_SUITE_LOG).tmp || exit 1; \ + mv $(TEST_SUITE_LOG).tmp $(TEST_SUITE_LOG); \ + if $$success; then \ + col="$$grn"; \ + else \ + col="$$red"; \ + test x"$$VERBOSE" = x || cat $(TEST_SUITE_LOG); \ + fi; \ + echo "$${col}$$br$${std}"; \ + echo "$${col}Testsuite summary"$(AM_TESTSUITE_SUMMARY_HEADER)"$${std}"; \ + echo "$${col}$$br$${std}"; \ + create_testsuite_report --maybe-color; \ + echo "$$col$$br$$std"; \ + if $$success; then :; else \ + echo "$${col}See $(subdir)/$(TEST_SUITE_LOG)$${std}"; \ + if test -n "$(PACKAGE_BUGREPORT)"; then \ + echo "$${col}Please report to $(PACKAGE_BUGREPORT)$${std}"; \ + fi; \ + echo "$$col$$br$$std"; \ + fi; \ + $$success || exit 1 + +check-TESTS: $(check_SCRIPTS) + @list='$(RECHECK_LOGS)'; test -z "$$list" || rm -f $$list + @list='$(RECHECK_LOGS:.log=.trs)'; test -z "$$list" || rm -f $$list + @test -z "$(TEST_SUITE_LOG)" || rm -f 
$(TEST_SUITE_LOG) + @set +e; $(am__set_TESTS_bases); \ + log_list=`for i in $$bases; do echo $$i.log; done`; \ + trs_list=`for i in $$bases; do echo $$i.trs; done`; \ + log_list=`echo $$log_list`; trs_list=`echo $$trs_list`; \ + $(MAKE) $(AM_MAKEFLAGS) $(TEST_SUITE_LOG) TEST_LOGS="$$log_list"; \ + exit $$?; +recheck: all $(check_SCRIPTS) + @test -z "$(TEST_SUITE_LOG)" || rm -f $(TEST_SUITE_LOG) + @set +e; $(am__set_TESTS_bases); \ + bases=`for i in $$bases; do echo $$i; done \ + | $(am__list_recheck_tests)` || exit 1; \ + log_list=`for i in $$bases; do echo $$i.log; done`; \ + log_list=`echo $$log_list`; \ + $(MAKE) $(AM_MAKEFLAGS) $(TEST_SUITE_LOG) \ + am__force_recheck=am--force-recheck \ + TEST_LOGS="$$log_list"; \ + exit $$? +pcre2_jit_test.log: pcre2_jit_test$(EXEEXT) + @p='pcre2_jit_test$(EXEEXT)'; \ + b='pcre2_jit_test'; \ + $(am__check_pre) $(LOG_DRIVER) --test-name "$$f" \ + --log-file $$b.log --trs-file $$b.trs \ + $(am__common_driver_flags) $(AM_LOG_DRIVER_FLAGS) $(LOG_DRIVER_FLAGS) -- $(LOG_COMPILE) \ + "$$tst" $(AM_TESTS_FD_REDIRECT) +RunTest.log: RunTest + @p='RunTest'; \ + b='RunTest'; \ + $(am__check_pre) $(LOG_DRIVER) --test-name "$$f" \ + --log-file $$b.log --trs-file $$b.trs \ + $(am__common_driver_flags) $(AM_LOG_DRIVER_FLAGS) $(LOG_DRIVER_FLAGS) -- $(LOG_COMPILE) \ + "$$tst" $(AM_TESTS_FD_REDIRECT) +RunGrepTest.log: RunGrepTest + @p='RunGrepTest'; \ + b='RunGrepTest'; \ + $(am__check_pre) $(LOG_DRIVER) --test-name "$$f" \ + --log-file $$b.log --trs-file $$b.trs \ + $(am__common_driver_flags) $(AM_LOG_DRIVER_FLAGS) $(LOG_DRIVER_FLAGS) -- $(LOG_COMPILE) \ + "$$tst" $(AM_TESTS_FD_REDIRECT) +.test.log: + @p='$<'; \ + $(am__set_b); \ + $(am__check_pre) $(TEST_LOG_DRIVER) --test-name "$$f" \ + --log-file $$b.log --trs-file $$b.trs \ + $(am__common_driver_flags) $(AM_TEST_LOG_DRIVER_FLAGS) $(TEST_LOG_DRIVER_FLAGS) -- $(TEST_LOG_COMPILE) \ + "$$tst" $(AM_TESTS_FD_REDIRECT) +@am__EXEEXT_TRUE@.test$(EXEEXT).log: +@am__EXEEXT_TRUE@ @p='$<'; \ 
+@am__EXEEXT_TRUE@ $(am__set_b); \ +@am__EXEEXT_TRUE@ $(am__check_pre) $(TEST_LOG_DRIVER) --test-name "$$f" \ +@am__EXEEXT_TRUE@ --log-file $$b.log --trs-file $$b.trs \ +@am__EXEEXT_TRUE@ $(am__common_driver_flags) $(AM_TEST_LOG_DRIVER_FLAGS) $(TEST_LOG_DRIVER_FLAGS) -- $(TEST_LOG_COMPILE) \ +@am__EXEEXT_TRUE@ "$$tst" $(AM_TESTS_FD_REDIRECT) + +distdir: $(BUILT_SOURCES) + $(MAKE) $(AM_MAKEFLAGS) distdir-am + +distdir-am: $(DISTFILES) + $(am__remove_distdir) + test -d "$(distdir)" || mkdir "$(distdir)" + @srcdirstrip=`echo "$(srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \ + topsrcdirstrip=`echo "$(top_srcdir)" | sed 's/[].[^$$\\*]/\\\\&/g'`; \ + list='$(DISTFILES)'; \ + dist_files=`for file in $$list; do echo $$file; done | \ + sed -e "s|^$$srcdirstrip/||;t" \ + -e "s|^$$topsrcdirstrip/|$(top_builddir)/|;t"`; \ + case $$dist_files in \ + */*) $(MKDIR_P) `echo "$$dist_files" | \ + sed '/\//!d;s|^|$(distdir)/|;s,/[^/]*$$,,' | \ + sort -u` ;; \ + esac; \ + for file in $$dist_files; do \ + if test -f $$file || test -d $$file; then d=.; else d=$(srcdir); fi; \ + if test -d $$d/$$file; then \ + dir=`echo "/$$file" | sed -e 's,/[^/]*$$,,'`; \ + if test -d "$(distdir)/$$file"; then \ + find "$(distdir)/$$file" -type d ! -perm -700 -exec chmod u+rwx {} \;; \ + fi; \ + if test -d $(srcdir)/$$file && test $$d != $(srcdir); then \ + cp -fpR $(srcdir)/$$file "$(distdir)$$dir" || exit 1; \ + find "$(distdir)/$$file" -type d ! -perm -700 -exec chmod u+rwx {} \;; \ + fi; \ + cp -fpR $$d/$$file "$(distdir)$$dir" || exit 1; \ + else \ + test -f "$(distdir)/$$file" \ + || cp -p $$d/$$file "$(distdir)/$$file" \ + || exit 1; \ + fi; \ + done + -test -n "$(am__skip_mode_fix)" \ + || find "$(distdir)" -type d ! -perm -755 \ + -exec chmod u+rwx,go+rx {} \; -o \ + ! -type d ! -perm -444 -links 1 -exec chmod a+r {} \; -o \ + ! -type d ! -perm -400 -exec chmod a+r {} \; -o \ + ! -type d ! 
-perm -444 -exec $(install_sh) -c -m a+r {} {} \; \ + || chmod -R a+r "$(distdir)" +dist-gzip: distdir + tardir=$(distdir) && $(am__tar) | eval GZIP= gzip $(GZIP_ENV) -c >$(distdir).tar.gz + $(am__post_remove_distdir) +dist-bzip2: distdir + tardir=$(distdir) && $(am__tar) | BZIP2=$${BZIP2--9} bzip2 -c >$(distdir).tar.bz2 + $(am__post_remove_distdir) + +dist-lzip: distdir + tardir=$(distdir) && $(am__tar) | lzip -c $${LZIP_OPT--9} >$(distdir).tar.lz + $(am__post_remove_distdir) + +dist-xz: distdir + tardir=$(distdir) && $(am__tar) | XZ_OPT=$${XZ_OPT--e} xz -c >$(distdir).tar.xz + $(am__post_remove_distdir) + +dist-zstd: distdir + tardir=$(distdir) && $(am__tar) | zstd -c $${ZSTD_CLEVEL-$${ZSTD_OPT--19}} >$(distdir).tar.zst + $(am__post_remove_distdir) + +dist-tarZ: distdir + @echo WARNING: "Support for distribution archives compressed with" \ + "legacy program 'compress' is deprecated." >&2 + @echo WARNING: "It will be removed altogether in Automake 2.0" >&2 + tardir=$(distdir) && $(am__tar) | compress -c >$(distdir).tar.Z + $(am__post_remove_distdir) + +dist-shar: distdir + @echo WARNING: "Support for shar distribution archives is" \ + "deprecated." >&2 + @echo WARNING: "It will be removed altogether in Automake 2.0" >&2 + shar $(distdir) | eval GZIP= gzip $(GZIP_ENV) -c >$(distdir).shar.gz + $(am__post_remove_distdir) +dist-zip: distdir + -rm -f $(distdir).zip + zip -rq $(distdir).zip $(distdir) + $(am__post_remove_distdir) + +dist dist-all: + $(MAKE) $(AM_MAKEFLAGS) $(DIST_TARGETS) am__post_remove_distdir='@:' + $(am__post_remove_distdir) + +# This target untars the dist file and tries a VPATH configuration. Then +# it guarantees that the distribution is self-contained by making another +# tarfile. 
+distcheck: dist + case '$(DIST_ARCHIVES)' in \ + *.tar.gz*) \ + eval GZIP= gzip $(GZIP_ENV) -dc $(distdir).tar.gz | $(am__untar) ;;\ + *.tar.bz2*) \ + bzip2 -dc $(distdir).tar.bz2 | $(am__untar) ;;\ + *.tar.lz*) \ + lzip -dc $(distdir).tar.lz | $(am__untar) ;;\ + *.tar.xz*) \ + xz -dc $(distdir).tar.xz | $(am__untar) ;;\ + *.tar.Z*) \ + uncompress -c $(distdir).tar.Z | $(am__untar) ;;\ + *.shar.gz*) \ + eval GZIP= gzip $(GZIP_ENV) -dc $(distdir).shar.gz | unshar ;;\ + *.zip*) \ + unzip $(distdir).zip ;;\ + *.tar.zst*) \ + zstd -dc $(distdir).tar.zst | $(am__untar) ;;\ + esac + chmod -R a-w $(distdir) + chmod u+w $(distdir) + mkdir $(distdir)/_build $(distdir)/_build/sub $(distdir)/_inst + chmod a-w $(distdir) + test -d $(distdir)/_build || exit 0; \ + dc_install_base=`$(am__cd) $(distdir)/_inst && pwd | sed -e 's,^[^:\\/]:[\\/],/,'` \ + && dc_destdir="$${TMPDIR-/tmp}/am-dc-$$$$/" \ + && am__cwd=`pwd` \ + && $(am__cd) $(distdir)/_build/sub \ + && ../../configure \ + $(AM_DISTCHECK_CONFIGURE_FLAGS) \ + $(DISTCHECK_CONFIGURE_FLAGS) \ + --srcdir=../.. --prefix="$$dc_install_base" \ + && $(MAKE) $(AM_MAKEFLAGS) \ + && $(MAKE) $(AM_MAKEFLAGS) $(AM_DISTCHECK_DVI_TARGET) \ + && $(MAKE) $(AM_MAKEFLAGS) check \ + && $(MAKE) $(AM_MAKEFLAGS) install \ + && $(MAKE) $(AM_MAKEFLAGS) installcheck \ + && $(MAKE) $(AM_MAKEFLAGS) uninstall \ + && $(MAKE) $(AM_MAKEFLAGS) distuninstallcheck_dir="$$dc_install_base" \ + distuninstallcheck \ + && chmod -R a-w "$$dc_install_base" \ + && ({ \ + (cd ../.. 
&& umask 077 && mkdir "$$dc_destdir") \ + && $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" install \ + && $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" uninstall \ + && $(MAKE) $(AM_MAKEFLAGS) DESTDIR="$$dc_destdir" \ + distuninstallcheck_dir="$$dc_destdir" distuninstallcheck; \ + } || { rm -rf "$$dc_destdir"; exit 1; }) \ + && rm -rf "$$dc_destdir" \ + && $(MAKE) $(AM_MAKEFLAGS) dist \ + && rm -rf $(DIST_ARCHIVES) \ + && $(MAKE) $(AM_MAKEFLAGS) distcleancheck \ + && cd "$$am__cwd" \ + || exit 1 + $(am__post_remove_distdir) + @(echo "$(distdir) archives ready for distribution: "; \ + list='$(DIST_ARCHIVES)'; for i in $$list; do echo $$i; done) | \ + sed -e 1h -e 1s/./=/g -e 1p -e 1x -e '$$p' -e '$$x' +distuninstallcheck: + @test -n '$(distuninstallcheck_dir)' || { \ + echo 'ERROR: trying to run $@ with an empty' \ + '$$(distuninstallcheck_dir)' >&2; \ + exit 1; \ + }; \ + $(am__cd) '$(distuninstallcheck_dir)' || { \ + echo 'ERROR: cannot chdir into $(distuninstallcheck_dir)' >&2; \ + exit 1; \ + }; \ + test `$(am__distuninstallcheck_listfiles) | wc -l` -eq 0 \ + || { echo "ERROR: files left after uninstall:" ; \ + if test -n "$(DESTDIR)"; then \ + echo " (check DESTDIR support)"; \ + fi ; \ + $(distuninstallcheck_listfiles) ; \ + exit 1; } >&2 +distcleancheck: distclean + @if test '$(srcdir)' = . 
; then \ + echo "ERROR: distcleancheck can only run from a VPATH build" ; \ + exit 1 ; \ + fi + @test `$(distcleancheck_listfiles) | wc -l` -eq 0 \ + || { echo "ERROR: files left in build directory after distclean:" ; \ + $(distcleancheck_listfiles) ; \ + exit 1; } >&2 +check-am: all-am + $(MAKE) $(AM_MAKEFLAGS) $(check_SCRIPTS) + $(MAKE) $(AM_MAKEFLAGS) check-TESTS +check: $(BUILT_SOURCES) + $(MAKE) $(AM_MAKEFLAGS) check-am +all-am: Makefile $(PROGRAMS) $(LIBRARIES) $(LTLIBRARIES) $(SCRIPTS) \ + $(MANS) $(DATA) $(HEADERS) +install-binPROGRAMS: install-libLTLIBRARIES + +installdirs: + for dir in "$(DESTDIR)$(bindir)" "$(DESTDIR)$(libdir)" "$(DESTDIR)$(bindir)" "$(DESTDIR)$(man1dir)" "$(DESTDIR)$(man3dir)" "$(DESTDIR)$(docdir)" "$(DESTDIR)$(htmldir)" "$(DESTDIR)$(pkgconfigdir)" "$(DESTDIR)$(includedir)" "$(DESTDIR)$(includedir)"; do \ + test -z "$$dir" || $(MKDIR_P) "$$dir"; \ + done +install: $(BUILT_SOURCES) + $(MAKE) $(AM_MAKEFLAGS) install-am +install-exec: $(BUILT_SOURCES) + $(MAKE) $(AM_MAKEFLAGS) install-exec-am +install-data: install-data-am +uninstall: uninstall-am + +install-am: all-am + @$(MAKE) $(AM_MAKEFLAGS) install-exec-am install-data-am + +installcheck: installcheck-am +install-strip: + if test -z '$(STRIP)'; then \ + $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ + install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ + install; \ + else \ + $(MAKE) $(AM_MAKEFLAGS) INSTALL_PROGRAM="$(INSTALL_STRIP_PROGRAM)" \ + install_sh_PROGRAM="$(INSTALL_STRIP_PROGRAM)" INSTALL_STRIP_FLAG=-s \ + "INSTALL_PROGRAM_ENV=STRIPPROG='$(STRIP)'" install; \ + fi +mostlyclean-generic: + -test -z "$(TEST_LOGS)" || rm -f $(TEST_LOGS) + -test -z "$(TEST_LOGS:.log=.trs)" || rm -f $(TEST_LOGS:.log=.trs) + -test -z "$(TEST_SUITE_LOG)" || rm -f $(TEST_SUITE_LOG) + +clean-generic: + -test -z "$(CLEANFILES)" || rm -f $(CLEANFILES) + +distclean-generic: + -test -z "$(CONFIG_CLEAN_FILES)" || rm -f $(CONFIG_CLEAN_FILES) + -test . 
= "$(srcdir)" || test -z "$(CONFIG_CLEAN_VPATH_FILES)" || rm -f $(CONFIG_CLEAN_VPATH_FILES) + -rm -f .libs/$(am__dirstamp) + -rm -f src/$(DEPDIR)/$(am__dirstamp) + -rm -f src/$(am__dirstamp) + -test -z "$(DISTCLEANFILES)" || rm -f $(DISTCLEANFILES) + +maintainer-clean-generic: + @echo "This command is intended for maintainers to use" + @echo "it deletes files that may require special tools to rebuild." + -test -z "$(BUILT_SOURCES)" || rm -f $(BUILT_SOURCES) + -test -z "$(MAINTAINERCLEANFILES)" || rm -f $(MAINTAINERCLEANFILES) +@WITH_GCOV_FALSE@clean-local: +clean: clean-am + +clean-am: clean-binPROGRAMS clean-generic clean-libLTLIBRARIES \ + clean-libtool clean-local clean-noinstLIBRARIES \ + clean-noinstPROGRAMS mostlyclean-am + +distclean: distclean-am + -rm -f $(am__CONFIG_DISTCLEAN_FILES) + -rm -f src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Po + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Plo + -rm -f 
src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Plo + -rm -f 
src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Plo + -rm -f src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Plo + -rm -f src/$(DEPDIR)/pcre2_dftables.Po + -rm -f src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Po + -rm -f src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Po + -rm -f src/$(DEPDIR)/pcre2grep-pcre2grep.Po + -rm -f src/$(DEPDIR)/pcre2test-pcre2test.Po + -rm -f Makefile +distclean-am: clean-am distclean-compile distclean-generic \ + distclean-hdr distclean-libtool distclean-local distclean-tags + +dvi: dvi-am + +dvi-am: + +html: 
html-am + +html-am: + +info: info-am + +info-am: + +install-data-am: install-dist_docDATA install-dist_htmlDATA \ + install-includeHEADERS install-man \ + install-nodist_includeHEADERS install-pkgconfigDATA + +install-dvi: install-dvi-am + +install-dvi-am: + +install-exec-am: install-binPROGRAMS install-binSCRIPTS \ + install-libLTLIBRARIES + +install-html: install-html-am + +install-html-am: + +install-info: install-info-am + +install-info-am: + +install-man: install-man1 install-man3 + +install-pdf: install-pdf-am + +install-pdf-am: + +install-ps: install-ps-am + +install-ps-am: + +installcheck-am: + +maintainer-clean: maintainer-clean-am + -rm -f $(am__CONFIG_DISTCLEAN_FILES) + -rm -rf $(top_srcdir)/autom4te.cache + -rm -f src/$(DEPDIR)/_libs_libpcre2_fuzzsupport_a-pcre2_fuzzsupport.Po + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_auto_possess.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_chartables.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_config.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_context.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_convert.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_dfa_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_error.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_extuni.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_find_bracket.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_jit_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_maketables.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_match_data.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_newline.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_ord2utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_pattern_info.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_script_run.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_serialize.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_string_utils.Plo + -rm -f 
src/$(DEPDIR)/libpcre2_16_la-pcre2_study.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_substitute.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_substring.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_tables.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_ucd.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_valid_utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_16_la-pcre2_xclass.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_auto_possess.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_chartables.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_config.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_context.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_convert.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_dfa_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_error.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_extuni.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_find_bracket.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_jit_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_maketables.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_match_data.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_newline.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_ord2utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_pattern_info.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_script_run.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_serialize.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_string_utils.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_study.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_substitute.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_substring.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_tables.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_ucd.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_valid_utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_32_la-pcre2_xclass.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_auto_possess.Plo + -rm -f 
src/$(DEPDIR)/libpcre2_8_la-pcre2_chartables.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_config.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_context.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_convert.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_dfa_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_error.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_extuni.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_find_bracket.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_jit_compile.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_maketables.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_match.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_match_data.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_newline.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_ord2utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_pattern_info.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_script_run.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_serialize.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_string_utils.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_study.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_substitute.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_substring.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_tables.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_ucd.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_valid_utf.Plo + -rm -f src/$(DEPDIR)/libpcre2_8_la-pcre2_xclass.Plo + -rm -f src/$(DEPDIR)/libpcre2_posix_la-pcre2posix.Plo + -rm -f src/$(DEPDIR)/pcre2_dftables.Po + -rm -f src/$(DEPDIR)/pcre2_jit_test-pcre2_jit_test.Po + -rm -f src/$(DEPDIR)/pcre2fuzzcheck-pcre2_fuzzsupport.Po + -rm -f src/$(DEPDIR)/pcre2grep-pcre2grep.Po + -rm -f src/$(DEPDIR)/pcre2test-pcre2test.Po + -rm -f Makefile +maintainer-clean-am: distclean-am maintainer-clean-generic + +mostlyclean: mostlyclean-am + +mostlyclean-am: mostlyclean-compile mostlyclean-generic \ + mostlyclean-libtool + +pdf: pdf-am + +pdf-am: + +ps: ps-am + +ps-am: + 
+uninstall-am: uninstall-binPROGRAMS uninstall-binSCRIPTS \ + uninstall-dist_docDATA uninstall-dist_htmlDATA \ + uninstall-includeHEADERS uninstall-libLTLIBRARIES \ + uninstall-man uninstall-nodist_includeHEADERS \ + uninstall-pkgconfigDATA + +uninstall-man: uninstall-man1 uninstall-man3 + +.MAKE: all check check-am install install-am install-exec \ + install-strip + +.PHONY: CTAGS GTAGS TAGS all all-am am--depfiles am--refresh check \ + check-TESTS check-am clean clean-binPROGRAMS clean-cscope \ + clean-generic clean-libLTLIBRARIES clean-libtool clean-local \ + clean-noinstLIBRARIES clean-noinstPROGRAMS cscope \ + cscopelist-am ctags ctags-am dist dist-all dist-bzip2 \ + dist-gzip dist-lzip dist-shar dist-tarZ dist-xz dist-zip \ + dist-zstd distcheck distclean distclean-compile \ + distclean-generic distclean-hdr distclean-libtool \ + distclean-local distclean-tags distcleancheck distdir \ + distuninstallcheck dvi dvi-am html html-am info info-am \ + install install-am install-binPROGRAMS install-binSCRIPTS \ + install-data install-data-am install-dist_docDATA \ + install-dist_htmlDATA install-dvi install-dvi-am install-exec \ + install-exec-am install-html install-html-am \ + install-includeHEADERS install-info install-info-am \ + install-libLTLIBRARIES install-man install-man1 install-man3 \ + install-nodist_includeHEADERS install-pdf install-pdf-am \ + install-pkgconfigDATA install-ps install-ps-am install-strip \ + installcheck installcheck-am installdirs maintainer-clean \ + maintainer-clean-generic mostlyclean mostlyclean-compile \ + mostlyclean-generic mostlyclean-libtool pdf pdf-am ps ps-am \ + recheck tags tags-am uninstall uninstall-am \ + uninstall-binPROGRAMS uninstall-binSCRIPTS \ + uninstall-dist_docDATA uninstall-dist_htmlDATA \ + uninstall-includeHEADERS uninstall-libLTLIBRARIES \ + uninstall-man uninstall-man1 uninstall-man3 \ + uninstall-nodist_includeHEADERS uninstall-pkgconfigDATA + +.PRECIOUS: Makefile + + +# The only difference between 
pcre2.h.in and pcre2.h is the setting of the PCRE +# version number. Therefore, we can create the generic version just by copying. + +src/pcre2.h.generic: src/pcre2.h.in configure.ac + rm -f $@ + cp -p src/pcre2.h $@ + +# It is more complicated for config.h.generic. We need the version that results +# from a default configuration so as to get all the default values for PCRE +# configuration macros such as MATCH_LIMIT and NEWLINE. We can get this by +# doing a configure in a temporary directory. However, some trickery is needed, +# because the source directory may already be configured. If you just try +# running configure in a new directory, it complains. For this reason, we move +# config.status out of the way while doing the default configuration. The +# resulting config.h is munged by perl to put #ifdefs round any #defines for +# macros with values, and to #undef all boolean macros such as HAVE_xxx and +# SUPPORT_xxx. We also get rid of any gcc-specific visibility settings. Make +# sure that PCRE2_EXP_DEFN is unset (in case it has visibility settings). + +src/config.h.generic: configure.ac + rm -rf $@ _generic + mkdir _generic + cs=$(srcdir)/config.status; test ! -f $$cs || mv -f $$cs $$cs.aside + cd _generic && $(abs_top_srcdir)/configure || : + cs=$(srcdir)/config.status; test ! -f $$cs.aside || mv -f $$cs.aside $$cs + test -f _generic/src/config.h + perl -n \ + -e 'BEGIN{$$blank=0;}' \ + -e 'if(/PCRE2_EXP_DEFN/){print"/* #undef PCRE2_EXP_DEFN */\n";$$blank=0;next;}' \ + -e 'if(/to make a symbol visible/){next;}' \ + -e 'if(/__attribute__ \(\(visibility/){next;}' \ + -e 'if(/LT_OBJDIR/){print"/* This is ignored unless you are using libtool. 
*/\n";}' \ + -e 'if(/^#define\s((?:HAVE|SUPPORT|STDC)_\w+)/){print"/* #undef $$1 */\n";$$blank=0;next;}' \ + -e 'if(/^#define\s(?!PACKAGE|VERSION)(\w+)/){print"#ifndef $$1\n$$_#endif\n";$$blank=0;next;}' \ + -e 'if(/^\s*$$/){print unless $$blank; $$blank=1;} else{print;$$blank=0;}' \ + _generic/src/config.h >$@ + rm -rf _generic +@WITH_REBUILD_CHARTABLES_TRUE@src/pcre2_chartables.c: pcre2_dftables$(EXEEXT) +@WITH_REBUILD_CHARTABLES_TRUE@ rm -f $@ +@WITH_REBUILD_CHARTABLES_TRUE@ ./pcre2_dftables$(EXEEXT) $@ +@WITH_REBUILD_CHARTABLES_FALSE@src/pcre2_chartables.c: $(srcdir)/src/pcre2_chartables.c.dist +@WITH_REBUILD_CHARTABLES_FALSE@ rm -f $@ +@WITH_REBUILD_CHARTABLES_FALSE@ $(LN_S) $(abs_srcdir)/src/pcre2_chartables.c.dist $(abs_builddir)/src/pcre2_chartables.c + +@WITH_GCOV_TRUE@coverage-check: all +@WITH_GCOV_TRUE@ -$(MAKE) $(AM_MAKEFLAGS) -k check + +@WITH_GCOV_TRUE@coverage-baseline: +@WITH_GCOV_TRUE@ $(LCOV) $(coverage_quiet) \ +@WITH_GCOV_TRUE@ --directory $(top_builddir) \ +@WITH_GCOV_TRUE@ --output-file "$(COVERAGE_OUTPUT_FILE)" \ +@WITH_GCOV_TRUE@ --capture \ +@WITH_GCOV_TRUE@ --initial + +@WITH_GCOV_TRUE@coverage-report: +@WITH_GCOV_TRUE@ $(LCOV) $(coverage_quiet) \ +@WITH_GCOV_TRUE@ --directory $(top_builddir) \ +@WITH_GCOV_TRUE@ --capture \ +@WITH_GCOV_TRUE@ --output-file "$(COVERAGE_OUTPUT_FILE).tmp" \ +@WITH_GCOV_TRUE@ --test-name "$(COVERAGE_TEST_NAME)" \ +@WITH_GCOV_TRUE@ --no-checksum \ +@WITH_GCOV_TRUE@ --compat-libtool \ +@WITH_GCOV_TRUE@ $(COVERAGE_LCOV_EXTRA_FLAGS) +@WITH_GCOV_TRUE@ $(LCOV) $(coverage_quiet) \ +@WITH_GCOV_TRUE@ --directory $(top_builddir) \ +@WITH_GCOV_TRUE@ --output-file "$(COVERAGE_OUTPUT_FILE)" \ +@WITH_GCOV_TRUE@ --remove "$(COVERAGE_OUTPUT_FILE).tmp" \ +@WITH_GCOV_TRUE@ "/tmp/*" \ +@WITH_GCOV_TRUE@ "/usr/include/*" \ +@WITH_GCOV_TRUE@ "$(includedir)/*" +@WITH_GCOV_TRUE@ -@rm -f "$(COVERAGE_OUTPUT_FILE).tmp" +@WITH_GCOV_TRUE@ LANG=C $(GENHTML) $(coverage_quiet) \ +@WITH_GCOV_TRUE@ --prefix $(top_builddir) \ +@WITH_GCOV_TRUE@ 
--output-directory "$(COVERAGE_OUTPUT_DIR)" \ +@WITH_GCOV_TRUE@ --title "$(PACKAGE) $(VERSION) Code Coverage Report" \ +@WITH_GCOV_TRUE@ --show-details "$(COVERAGE_OUTPUT_FILE)" \ +@WITH_GCOV_TRUE@ --legend \ +@WITH_GCOV_TRUE@ $(COVERAGE_GENHTML_EXTRA_FLAGS) +@WITH_GCOV_TRUE@ @echo "Code coverage report written to file://$(abs_builddir)/$(COVERAGE_OUTPUT_DIR)/index.html" + +@WITH_GCOV_TRUE@coverage-reset: +@WITH_GCOV_TRUE@ -$(LCOV) $(coverage_quiet) --zerocounters --directory $(top_builddir) + +@WITH_GCOV_TRUE@coverage-clean-report: +@WITH_GCOV_TRUE@ -rm -f "$(COVERAGE_OUTPUT_FILE)" "$(COVERAGE_OUTPUT_FILE).tmp" +@WITH_GCOV_TRUE@ -rm -rf "$(COVERAGE_OUTPUT_DIR)" + +@WITH_GCOV_TRUE@coverage-clean-data: +@WITH_GCOV_TRUE@ -find $(top_builddir) -name "*.gcda" -delete + +@WITH_GCOV_TRUE@coverage-clean: coverage-reset coverage-clean-report coverage-clean-data +@WITH_GCOV_TRUE@ -find $(top_builddir) -name "*.gcno" -delete + +@WITH_GCOV_TRUE@coverage-distclean: coverage-clean + +@WITH_GCOV_TRUE@coverage: coverage-reset coverage-baseline coverage-check coverage-report +@WITH_GCOV_TRUE@clean-local: coverage-clean +@WITH_GCOV_TRUE@distclean-local: coverage-distclean + +@WITH_GCOV_TRUE@.PHONY: coverage coverage-baseline coverage-check coverage-report coverage-reset coverage-clean-report coverage-clean-data coverage-clean coverage-distclean + +# Without coverage support, still arrange for 'make distclean' to get rid of +# any coverage files that may have been left from a different configuration. + +@WITH_GCOV_FALSE@coverage: +@WITH_GCOV_FALSE@ @echo "Configuring with --enable-coverage is required to generate code coverage report." + +@WITH_GCOV_FALSE@distclean-local: +@WITH_GCOV_FALSE@ rm -rf $(PACKAGE)-$(VERSION)-coverage* + +# Tell versions [3.59,3.63) of GNU make to not export all variables. +# Otherwise a system limit (for SysV at least) may be exceeded. 
+.NOEXPORT: diff --git a/src/pcre2/NEWS b/src/pcre2/NEWS new file mode 100644 index 00000000..8e3cf7ea --- /dev/null +++ b/src/pcre2/NEWS @@ -0,0 +1,355 @@ +News about PCRE2 releases +------------------------- + + +Version 10.37 26-May-2021 +------------------------- + +A few more bug fixes and tidies. The only change of real note is the removal of +the actual POSIX names regcomp etc. from the POSIX wrapper library because +these have caused issues for some applications (see 10.33 #2 below). + + +Version 10.36 04-December-2020 +------------------------------ + +Again, mainly bug fixes and tidies. The only enhancements are the addition of +GNU grep's -m (aka --max-count) option to pcre2grep, and also unifying the +handling of substitution strings for both -O and callouts in pcre2grep, with +the addition of $x{...} and $o{...} to allow for characters whose code points +are greater than 255 in Unicode mode. + +NOTE: there is an outstanding issue with JIT support for MacOS on arm64 +hardware. For details, please see Bugzilla issue #2618. + + +Version 10.35 15-April-2020 +--------------------------- + +Bugfixes, tidies, and a few new enhancements. + +1. Capturing groups that contain recursive backreferences to themselves are no +longer automatically atomic, because the restriction is no longer necessary +as a result of the 10.30 restructuring. + +2. Several new options for pcre2_substitute(). + +3. When Unicode is supported and PCRE2_UCP is set without PCRE2_UTF, Unicode +character properties are used for upper/lower case computations on characters +whose code points are greater than 127. + +4. The character tables (for low-valued characters) can now more easily be +saved and restored in binary. + +5. Updated to Unicode 13.0.0. + + +Version 10.34 21-November-2019 +------------------------------ + +Another release with a few enhancements as well as bugfixes and tidies. The +main new features are: + +1. There is now some support for matching in invalid UTF strings. + +2. 
Non-atomic positive lookarounds are implemented in the pcre2_match() +interpreter, but not in JIT. + +3. Added two new functions: pcre2_get_match_data_size() and +pcre2_maketables_free(). + +4. Upgraded to Unicode 12.1.0. + + +Version 10.33 16-April-2019 +--------------------------- + +Yet more bugfixes, tidies, and a few enhancements, summarized here (see +ChangeLog for the full list): + +1. Callouts from pcre2_substitute() are now available. + +2. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper +functions that use the standard POSIX names. However, in pcre2posix.h the POSIX +names are defined as macros. This should help avoid linking with the wrong +library in some environments, while still exporting the POSIX names for +pre-existing programs that use them. + +3. Some new options: + + (a) PCRE2_EXTRA_ESCAPED_CR_IS_LF makes \r behave as \n. + + (b) PCRE2_EXTRA_ALT_BSUX enables support for ECMAScript 6's \u{hh...} + construct. + + (c) PCRE2_COPY_MATCHED_SUBJECT causes a copy of a matched subject to be + made, instead of just remembering a pointer. + +4. Some new Perl features: + + (a) Perl 5.28's experimental alphabetic names for atomic groups and + lookaround assertions, for example, (*pla:...) and (*atomic:...). + + (b) The new Perl "script run" features (*script_run:...) and + (*atomic_script_run:...) aka (*sr:...) and (*asr:...). + + (c) When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in + capture group names. + +5. --disable-percent-zt disables the use of %zu and %td in formatting strings +in pcre2test. They were already automatically disabled for VC and older C +compilers. + +6. Some changes related to callouts in pcre2grep: + + (a) Support for running an external program under VMS has been added, in + addition to Windows and fork() support. + + (b) --disable-pcre2grep-callout-fork restricts the callout support to + the inbuilt echo facility. 
+ + +Version 10.32 10-September-2018 +------------------------------- + +This is another mainly bugfix and tidying release with a few minor +enhancements. These are the main ones: + +1. pcre2grep now supports the inclusion of binary zeros in patterns that are +read from files via the -f option. + +2. ./configure now supports --enable-jit=auto, which automatically enables JIT +if the hardware supports it. + +3. In pcre2_dfa_match(), internal recursive calls no longer use the stack for +local workspace and local ovectors. Instead, an initial block of stack is +reserved, but if this is insufficient, heap memory is used. The heap limit +parameter now applies to pcre2_dfa_match(). + +4. Updated to Unicode version 11.0.0. + +5. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported. + +6. Added support for \N{U+dddd}, but only in Unicode mode. + +7. Added support for (?^) to unset all imnsx options. + + +Version 10.31 12-February-2018 +------------------------------ + +This is mainly a bugfix and tidying release (see ChangeLog for full details). +However, there are some minor enhancements. + +1. New pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and +PCRE2_CONFIG_COMPILED_WIDTHS. + +2. New pcre2_pattern_info() option PCRE2_INFO_EXTRAOPTIONS to retrieve the +extra compile time options. + +3. There are now public names for all the pcre2_compile() error numbers. + +4. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new +field callout_flags in callout blocks. + + +Version 10.30 14-August-2017 +---------------------------- + +The full list of changes that includes bugfixes and tidies is, as always, in +ChangeLog. These are the most important new features: + +1. The main interpreter, pcre2_match(), has been refactored into a new version +that does not use recursive function calls (and therefore the system stack) for +remembering backtracking positions. This makes --disable-stack-for-recursion a +NOOP. 
The new implementation allows backtracking into recursive group calls in +patterns, making it more compatible with Perl, and also fixes some other +previously hard-to-do issues. For patterns that have a lot of backtracking, the +heap is now used, and there is an explicit limit on the amount, settable by +pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx). The "recursion limit" is retained, +but is renamed as "depth limit" (though the old names remain for +compatibility). + +There is also a change in the way callouts from pcre2_match() are handled. The +offset_vector field in the callout block is no longer a pointer to the +actual ovector that was passed to the matching function in the match data +block. Instead it points to an internal ovector of a size large enough to hold +all possible captured substrings in the pattern. + +2. The new option PCRE2_ENDANCHORED insists that a pattern match must end at +the end of the subject. + +3. The new option PCRE2_EXTENDED_MORE implements Perl's /xx feature, and +pcre2test is upgraded to support it. Setting within the pattern by (?xx) is +also supported. + +4. (?n) can be used to set PCRE2_NO_AUTO_CAPTURE, because Perl now has this. + +5. Additional compile options in the compile context are now available, and the +first two are: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES and +PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL. + +6. The newline type PCRE2_NEWLINE_NUL is now available. + +7. The match limit value now also applies to pcre2_dfa_match() as there are +patterns that can use up a lot of resources without necessarily recursing very +deeply. + +8. The option REG_PEND (a GNU extension) is now available for the POSIX +wrapper. Also there is a new option PCRE2_LITERAL which is used to support +REG_NOSPEC. + +9. PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are implemented for the +benefit of pcre2grep, and pcre2grep's -F, -w, and -x options are re-implemented +using PCRE2_LITERAL, PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. 
This +is tidier and also fixes some bugs. + +10. The Unicode tables are upgraded from Unicode 8.0.0 to Unicode 10.0.0. + +11. There are some experimental functions for converting foreign patterns +(globs and POSIX patterns) into PCRE2 patterns. + + +Version 10.23 14-February-2017 +------------------------------ + +1. ChangeLog has the details of a lot of bug fixes and tidies. + +2. There has been a major re-factoring of the pcre2_compile.c file. Most syntax +checking is now done in the pre-pass that identifies capturing groups. This has +reduced the amount of duplication and made the code tidier. While doing this, +some minor bugs and Perl incompatibilities were fixed (see ChangeLog for +details.) + +3. Back references are now permitted in lookbehind assertions when there are +no duplicated group numbers (that is, (?| has not been used), and, if the +reference is by name, there is only one group of that name. The referenced +group must, of course, be of fixed length. + +4. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back +reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does +not recognize this syntax. + +5. pcre2grep now automatically expands its buffer up to a maximum set by +--max-buffer-size. + +6. The -t option (grand total) has been added to pcre2grep. + +7. A new function called pcre2_code_copy_with_tables() exists to copy a +compiled pattern along with a private copy of the character tables that it +uses. + +8. A user supplied a number of patches to upgrade pcre2grep under Windows and +tidy the code. + +9. Several updates have been made to pcre2test and test scripts (see +ChangeLog). + + +Version 10.22 29-July-2016 +-------------------------- + +1. ChangeLog has the details of a number of bug fixes. + +2. The POSIX wrapper function regcomp() previously did not support back references +and subroutine calls if called with the REG_NOSUB option. It now does. + +3.
A new function, pcre2_code_copy(), is added, to make a copy of a compiled +pattern. + +4. Support for string callouts is added to pcre2grep. + +5. Added the PCRE2_NO_JIT option to pcre2_match(). + +6. The pcre2_get_error_message() function now returns with a negative error +code if the error number it is given is unknown. + +7. Several updates have been made to pcre2test and test scripts (see +ChangeLog). + + +Version 10.21 12-January-2016 +----------------------------- + +1. Many bugs have been fixed. A large number of them were provoked only by very +strange pattern input, and were discovered by fuzzers. Some others were +discovered by code auditing. See ChangeLog for details. + +2. The Unicode tables have been updated to Unicode version 8.0.0. + +3. For Perl compatibility in EBCDIC environments, ranges such as a-z in a +class, where both values are literal letters in the same case, omit the +non-letter EBCDIC code points within the range. + +4. There have been a number of enhancements to the pcre2_substitute() function, +giving more flexibility to replacement facilities. It is now also possible to +cause the function to return the needed buffer size if the one given is too +small. + +5. The PCRE2_ALT_VERBNAMES option causes the "name" parts of special verbs such +as (*THEN:name) to be processed for backslashes and to take note of +PCRE2_EXTENDED. + +6. PCRE2_INFO_HASBACKSLASHC makes it possible for a client to find out if a +pattern uses \C, and --never-backslash-C makes it possible to compile a version +of PCRE2 in which the use of \C is always forbidden. + +7. A limit to the length of pattern that can be handled can now be set by +calling pcre2_set_max_pattern_length(). + +8. When matching an unanchored pattern, a match can be required to begin within +a given number of code units after the start of the subject by calling +pcre2_set_offset_limit(). + +9.
The pcre2test program has been extended to test new facilities, and it can +now run the tests when LF on its own is not a valid newline sequence. + +10. The RunTest script has also been updated to enable more tests to be run. + +11. There have been some minor performance enhancements. + + +Version 10.20 30-June-2015 +-------------------------- + +1. Callouts with string arguments and the pcre2_callout_enumerate() function +have been implemented. + +2. The PCRE2_NEVER_BACKSLASH_C option, which locks out the use of \C, is added. + +3. The PCRE2_ALT_CIRCUMFLEX option lets ^ match after a newline at the end of a +subject in multiline mode. + +4. The way named subpatterns are handled has been refactored. The previous +approach had several bugs. + +5. The handling of \c in EBCDIC environments has been changed to conform to the +perlebcdic document. This is an incompatible change. + +6. Bugs have been mended, many of them discovered by fuzzers. + + +Version 10.10 06-March-2015 +--------------------------- + +1. Serialization and de-serialization functions have been added to the API, +making it possible to save and restore sets of compiled patterns, though +restoration must be done in the same environment that was used for compilation. + +2. The (*NO_JIT) feature has been added; this makes it possible for a pattern +creator to specify that JIT is not to be used. + +3. A number of bugs have been fixed. In particular, bugs that caused building +on Windows using CMake to fail have been mended. + + +Version 10.00 05-January-2015 +----------------------------- + +Version 10.00 is the first release of PCRE2, a revised API for the PCRE +library. Changes prior to 10.00 are logged in the ChangeLog file for the old +API, up to item 20 for release 8.36. New programs are recommended to use the +new library. Programs that use the original (PCRE1) API will need changing +before linking with the new library. 
+ +**** diff --git a/src/pcre2/NON-AUTOTOOLS-BUILD b/src/pcre2/NON-AUTOTOOLS-BUILD new file mode 100644 index 00000000..6bf65765 --- /dev/null +++ b/src/pcre2/NON-AUTOTOOLS-BUILD @@ -0,0 +1,410 @@ +Building PCRE2 without using autotools +-------------------------------------- + +This document contains the following sections: + + General + Generic instructions for the PCRE2 C library + Stack size in Windows environments + Linking programs in Windows environments + Calling conventions in Windows environments + Comments about Win32 builds + Building PCRE2 on Windows with CMake + Building PCRE2 on Windows with Visual Studio + Testing with RunTest.bat + Building PCRE2 on native z/OS and z/VM + + +GENERAL + +The basic PCRE2 library consists entirely of code written in Standard C, and so +should compile successfully on any system that has a Standard C compiler and +library. + +The PCRE2 distribution includes a "configure" file for use by the +configure/make (autotools) build system, as found in many Unix-like +environments. The README file contains information about the options for +"configure". + +There is also support for CMake, which some users prefer, especially in Windows +environments, though it can also be run in Unix-like environments. See the +section entitled "Building PCRE2 on Windows with CMake" below. + +Versions of src/config.h and src/pcre2.h are distributed in the PCRE2 tarballs +under the names src/config.h.generic and src/pcre2.h.generic. These are +provided for those who build PCRE2 without using "configure" or CMake. If you +use "configure" or CMake, the .generic versions are not used. + + +GENERIC INSTRUCTIONS FOR THE PCRE2 C LIBRARY + +The following are generic instructions for building the PCRE2 C library "by +hand". If you are going to use CMake, this section does not apply to you; you +can skip ahead to the CMake section. 
Note that the settings concerned with +8-bit, 16-bit, and 32-bit code units relate to the type of data string that +PCRE2 processes. They are NOT referring to the underlying operating system bit +width. You do not have to do anything special to compile in a 64-bit +environment, for example. + + (1) Copy or rename the file src/config.h.generic as src/config.h, and edit the + macro settings that it contains to whatever is appropriate for your + environment. In particular, you can alter the definition of the NEWLINE + macro to specify what character(s) you want to be interpreted as line + terminators by default. + + When you subsequently compile any of the PCRE2 modules, you must specify + -DHAVE_CONFIG_H to your compiler so that src/config.h is included in the + sources. + + An alternative approach is not to edit src/config.h, but to use -D on the + compiler command line to make any changes that you need to the + configuration options. In this case -DHAVE_CONFIG_H must not be set. + + NOTE: There have been occasions when the way in which certain parameters + in src/config.h are used has changed between releases. (In the + configure/make world, this is handled automatically.) When upgrading to a + new release, you are strongly advised to review src/config.h.generic + before re-using what you had previously. + + Note also that the src/config.h.generic file is created from a config.h + that was generated by Autotools, which automatically includes settings of + a number of macros that are not actually used by PCRE2 (for example, + HAVE_MEMORY_H). + + (2) Copy or rename the file src/pcre2.h.generic as src/pcre2.h. + + (3) EITHER: + Copy or rename file src/pcre2_chartables.c.dist as + src/pcre2_chartables.c. + + OR: + Compile src/pcre2_dftables.c as a stand-alone program (using + -DHAVE_CONFIG_H if you have set up src/config.h), and then run it with + the single argument "src/pcre2_chartables.c". 
This generates a set of + standard character tables and writes them to that file. The tables are + generated using the default C locale for your system. If you want to use + a locale that is specified by LC_xxx environment variables, add the -L + option to the pcre2_dftables command. You must use this method if you + are building on a system that uses EBCDIC code. + + The tables in src/pcre2_chartables.c are defaults. The caller of PCRE2 can + specify alternative tables at run time. + + (4) For a library that supports 8-bit code units in the character strings that + it processes, compile the following source files from the src directory, + setting -DPCRE2_CODE_UNIT_WIDTH=8 as a compiler option. Also set + -DHAVE_CONFIG_H if you have set up src/config.h with your configuration, + or else use other -D settings to change the configuration as required. + + pcre2_auto_possess.c + pcre2_chartables.c + pcre2_compile.c + pcre2_config.c + pcre2_context.c + pcre2_convert.c + pcre2_dfa_match.c + pcre2_error.c + pcre2_extuni.c + pcre2_find_bracket.c + pcre2_jit_compile.c + pcre2_maketables.c + pcre2_match.c + pcre2_match_data.c + pcre2_newline.c + pcre2_ord2utf.c + pcre2_pattern_info.c + pcre2_script_run.c + pcre2_serialize.c + pcre2_string_utils.c + pcre2_study.c + pcre2_substitute.c + pcre2_substring.c + pcre2_tables.c + pcre2_ucd.c + pcre2_valid_utf.c + pcre2_xclass.c + + Make sure that you include -I. in the compiler command (or equivalent for + an unusual compiler) so that all included PCRE2 header files are first + sought in the src directory under the current directory. Otherwise you run + the risk of picking up a previously-installed file from somewhere else. + + Note that you must compile pcre2_jit_compile.c, even if you have not + defined SUPPORT_JIT in src/config.h, because when JIT support is not + configured, dummy functions are compiled. 
When JIT support IS configured, + pcre2_jit_compile.c #includes other files from the sljit subdirectory, + all of whose names begin with "sljit". It also #includes + src/pcre2_jit_match.c and src/pcre2_jit_misc.c, so you should not compile + these yourself. + + Note also that the pcre2_fuzzsupport.c file contains special code that is + useful to those who want to run fuzzing tests on the PCRE2 library. Unless + you are doing that, you can ignore it. + + (5) Now link all the compiled code into an object library in whichever form + your system keeps such libraries. This is the basic PCRE2 C 8-bit library. + If your system has static and shared libraries, you may have to do this + once for each type. + + (6) If you want to build a library that supports 16-bit or 32-bit code units, + (as well as, or instead of the 8-bit library) just supply 16 or 32 as the + value of -DPCRE2_CODE_UNIT_WIDTH when you are compiling. + + (7) If you want to build the POSIX wrapper functions (which apply only to the + 8-bit library), ensure that you have the src/pcre2posix.h file and then + compile src/pcre2posix.c. Link the result (on its own) as the pcre2posix + library. + + (8) The pcre2test program can be linked with any combination of the 8-bit, + 16-bit and 32-bit libraries (depending on what you selected in + src/config.h). Compile src/pcre2test.c; don't forget -DHAVE_CONFIG_H if + necessary, but do NOT define PCRE2_CODE_UNIT_WIDTH. Then link with the + appropriate library/ies. If you compiled an 8-bit library, pcre2test also + needs the pcre2posix wrapper library. + + (9) Run pcre2test on the testinput files in the testdata directory, and check + that the output matches the corresponding testoutput files. There are + comments about what each test does in the section entitled "Testing PCRE2" + in the README file. 
If you compiled more than one of the 8-bit, 16-bit and + 32-bit libraries, you need to run pcre2test with the -16 option to do + 16-bit tests and with the -32 option to do 32-bit tests. + + Some tests are relevant only when certain build-time options are selected. + For example, test 4 is for Unicode support, and will not run if you have + built PCRE2 without it. See the comments at the start of each testinput + file. If you have a suitable Unix-like shell, the RunTest script will run + the appropriate tests for you. The command "RunTest list" will output a + list of all the tests. + + Note that the supplied files are in Unix format, with just LF characters + as line terminators. You may need to edit them to change this if your + system uses a different convention. + +(10) If you have built PCRE2 with SUPPORT_JIT, the JIT features can be tested + by running pcre2test with the -jit option. This is done automatically by + the RunTest script. You might also like to build and run the freestanding + JIT test program, src/pcre2_jit_test.c. + +(11) If you want to use the pcre2grep command, compile and link + src/pcre2grep.c; it uses only the basic 8-bit PCRE2 library (it does not + need the pcre2posix library). If you have built the PCRE2 library with JIT + support by defining SUPPORT_JIT in src/config.h, you can also define + SUPPORT_PCRE2GREP_JIT, which causes pcre2grep to make use of JIT (unless + it is run with --no-jit). If you define SUPPORT_PCRE2GREP_JIT without + defining SUPPORT_JIT, pcre2grep does not try to make use of JIT. + + +STACK SIZE IN WINDOWS ENVIRONMENTS + +Prior to release 10.30 the default system stack size of 1MiB in some Windows +environments caused issues with some tests. This should no longer be the case +for 10.30 and later releases. + + +LINKING PROGRAMS IN WINDOWS ENVIRONMENTS + +If you want to statically link a program against a PCRE2 library in the form of +a non-dll .a file, you must define PCRE2_STATIC before including src/pcre2.h. 
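The reason the macro has to be defined before the header is read can be illustrated with a hypothetical header fragment. The DEMO_ names below are invented for this sketch, and it does not reproduce the contents of src/pcre2.h; it only shows the common pattern in which a "static" macro, tested at the point of inclusion, selects between a plain extern declaration and a DLL import declaration.

```c
/* The application defines this first, as it would define PCRE2_STATIC
   before including the library header. DEMO_STATIC is a made-up name. */
#define DEMO_STATIC

/* --- start of hypothetical header fragment --- */
#if defined(_WIN32) && !defined(DEMO_STATIC)
   /* Windows DLL build: the function is imported from a DLL. */
#  define DEMO_EXP_DECL extern __declspec(dllimport)
#else
   /* Static linking (or a non-Windows system): a plain declaration. */
#  define DEMO_EXP_DECL extern
#endif

DEMO_EXP_DECL int demo_version(void);
/* --- end of hypothetical header fragment --- */

/* Stand-in for the statically linked library code; the value is arbitrary. */
int demo_version(void)
{
    return 10;
}
```

If DEMO_STATIC were defined only after the header fragment, the conditional would already have been evaluated and the import form of the declaration would be used on Windows, which is why the order matters.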
+ + +CALLING CONVENTIONS IN WINDOWS ENVIRONMENTS + +It is possible to compile programs to use different calling conventions using +MSVC. Search the web for "calling conventions" for more information. To make it +easier to change the calling convention for the exported functions in the +PCRE2 library, the macro PCRE2_CALL_CONVENTION is present in all the external +definitions. It can be set externally when compiling (e.g. in CFLAGS). If it is +not set, it defaults to empty; the default calling convention is then used +(which is what is wanted most of the time). + + +COMMENTS ABOUT WIN32 BUILDS (see also "BUILDING PCRE2 ON WINDOWS WITH CMAKE") + +There are two ways of building PCRE2 using the "configure, make, make install" +paradigm on Windows systems: using MinGW or using Cygwin. These are not at all +the same thing; they are completely different from each other. There is also +support for building using CMake, which some users find a more straightforward +way of building PCRE2 under Windows. + +The MinGW home page (http://www.mingw.org/) says this: + + MinGW: A collection of freely available and freely distributable Windows + specific header files and import libraries combined with GNU toolsets that + allow one to produce native Windows programs that do not rely on any + 3rd-party C runtime DLLs. + +The Cygwin home page (http://www.cygwin.com/) says this: + + Cygwin is a Linux-like environment for Windows. It consists of two parts: + + . A DLL (cygwin1.dll) which acts as a Linux API emulation layer providing + substantial Linux API functionality + + . A collection of tools which provide Linux look and feel. + +On both MinGW and Cygwin, PCRE2 should build correctly using: + + ./configure && make && make install + +This should create two libraries called libpcre2-8 and libpcre2-posix. These +are independent libraries: when you link with libpcre2-posix you must also link +with libpcre2-8, which contains the basic functions. 
+ +Using Cygwin's compiler generates libraries and executables that depend on +cygwin1.dll. If a library that is generated this way is distributed, +cygwin1.dll has to be distributed as well. Since cygwin1.dll is under the GPL +licence, this forces not only PCRE2 to be under the GPL, but also the entire +application. A distributor who wants to keep their own code proprietary must +purchase an appropriate Cygwin licence. + +MinGW has no such restrictions. The MinGW compiler generates a library or +executable that can run standalone on Windows without any third party dll or +licensing issues. + +But there is more complication: + +If a Cygwin user uses the -mno-cygwin Cygwin gcc flag, what that really does is +to tell Cygwin's gcc to use the MinGW gcc. Cygwin's gcc is only acting as a +front end to MinGW's gcc (if you install Cygwin's gcc, you get both Cygwin's +gcc and MinGW's gcc). So, a user can: + +. Build native binaries by using MinGW or by getting Cygwin and using + -mno-cygwin. + +. Build binaries that depend on cygwin1.dll by using Cygwin with the normal + compiler flags. + +The test files that are supplied with PCRE2 are in UNIX format, with LF +characters as line terminators. Unless your PCRE2 library uses a default +newline option that includes LF as a valid newline, it may be necessary to +change the line terminators in the test files to get some of the tests to work. + + +BUILDING PCRE2 ON WINDOWS WITH CMAKE + +CMake is an alternative configuration facility that can be used instead of +"configure". CMake creates project files (make files, solution files, etc.) +tailored to numerous development environments, including Visual Studio, +Borland, Msys, MinGW, NMake, and Unix. If possible, use short paths with no +spaces in the names for your CMake installation and your PCRE2 source and build +directories. + +The following instructions were contributed by a PCRE1 user, but they should +also work for PCRE2. If they are not followed exactly, errors may occur. 
In the +event that errors do occur, it is recommended that you delete the CMake cache +before attempting to repeat the CMake build process. In the CMake GUI, the +cache can be deleted by selecting "File > Delete Cache". + +1. Install the latest CMake version available from http://www.cmake.org/, and + ensure that cmake\bin is on your path. + +2. Unzip (retaining folder structure) the PCRE2 source tree into a source + directory such as C:\pcre2. You should ensure your local date and time + is not earlier than the file dates in your source dir if the release is + very new. + +3. Create a new, empty build directory, preferably a subdirectory of the + source dir. For example, C:\pcre2\pcre2-xx\build. + +4. Run cmake-gui from the shell environment of your build tool, for example, + Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++. Do not try + to start CMake from the Windows Start menu, as this can lead to errors. + +5. Enter C:\pcre2\pcre2-xx and C:\pcre2\pcre2-xx\build for the source and + build directories, respectively. + +6. Hit the "Configure" button. + +7. Select the particular IDE / build tool that you are using (Visual + Studio, MSYS makefiles, MinGW makefiles, etc.) + +8. The GUI will then list several configuration options. This is where + you can disable Unicode support or select other PCRE2 optional features. + +9. Hit "Configure" again. The adjacent "Generate" button should now be + active. + +10. Hit "Generate". + +11. The build directory should now contain a usable build system, be it a + solution file for Visual Studio, makefiles for MinGW, etc. Exit from + cmake-gui and use the generated build system with your compiler or IDE. + E.g., for MinGW you can run "make", or for Visual Studio, open the PCRE2 + solution, select the desired configuration (Debug, or Release, etc.) and + build the ALL_BUILD project. + +12.
If during configuration with cmake-gui you've elected to build the test + programs, you can execute them by building the test project. E.g., for + MinGW: "make test"; for Visual Studio build the RUN_TESTS project. The + most recent build configuration is targeted by the tests. A summary of + test results is presented. Complete test output is subsequently + available for review in Testing\Temporary under your build dir. + + +BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO + +The code currently cannot be compiled without a stdint.h header, which is +available only in relatively recent versions of Visual Studio. However, this +portable and permissively-licensed implementation of the header worked without +issue: + + http://www.azillionmonkeys.com/qed/pstdint.h + +Just rename it and drop it into the top level of the build tree. + + +TESTING WITH RUNTEST.BAT + +If configured with CMake, building the test project ("make test" or building +ALL_TESTS in Visual Studio) creates (and runs) pcre2_test.bat (and depending +on your configuration options, possibly other test programs) in the build +directory. The pcre2_test.bat script runs RunTest.bat with correct source and +exe paths. + +For manual testing with RunTest.bat, provided the build dir is a subdirectory +of the source directory: Open a command shell window. Chdir to the location +of your pcre2test.exe and pcre2grep.exe programs. Call RunTest.bat with +"..\RunTest.bat" or "..\..\RunTest.bat" as appropriate. + +To run only a particular test with RunTest.bat, provide a test number argument. + +Otherwise: + +1. Copy RunTest.bat into the directory where pcre2test.exe and pcre2grep.exe + have been created. + +2. Edit RunTest.bat to identify the full or relative location of + the pcre2 source (in which the testdata folder resides), e.g.: + + set srcdir=C:\pcre2\pcre2-10.00 + +3. In a Windows command environment, chdir to the location of your bat and + exe programs. + +4. Run RunTest.bat.
Test outputs will automatically be compared to expected + results, and discrepancies will be identified in the console output. + +To independently test the just-in-time compiler, run pcre2_jit_test.exe. + + +BUILDING PCRE2 ON NATIVE Z/OS AND Z/VM + +z/OS and z/VM are operating systems for mainframe computers, produced by IBM. +The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and +applications can be supported through UNIX System Services, and in such an +environment it should be possible to build PCRE2 in the same way as in other +systems, with the EBCDIC related configuration settings, but it is not known if +anybody has tried this. + +In native z/OS (without UNIX System Services) and in z/VM, special ports are +required. For details, please see file 939 on this web site: + + http://www.cbttape.org + +Everything in that location, source and executable, is in EBCDIC and native +z/OS file formats. The port provides an API for LE languages such as COBOL and +for the z/OS and z/VM versions of the Rexx languages. + +=========================== +Last Updated: 28 April 2021 +=========================== diff --git a/src/pcre/PrepareRelease b/src/pcre2/PrepareRelease similarity index 62% rename from src/pcre/PrepareRelease rename to src/pcre2/PrepareRelease index 9891e08d..e7cf8db8 100755 --- a/src/pcre/PrepareRelease +++ b/src/pcre2/PrepareRelease @@ -1,7 +1,7 @@ #/bin/sh -# Script to prepare the files for building a PCRE release. It does some -# processing of the documentation, detrails files, and creates pcre.h.generic +# Script to prepare the files for building a PCRE2 release. It does some +# processing of the documentation, detrails files, and creates pcre2.h.generic # and config.h.generic (for use by builders who can't run ./configure). # You must run this script before running "make dist". If its first argument @@ -9,7 +9,7 @@ # arguments.
The script makes use of the following files: # 132html A Perl script that converts a .1 or .3 man page into HTML. It -# "knows" the relevant troff constructs that are used in the PCRE +# "knows" the relevant troff constructs that are used in the PCRE2 # man pages. # CheckMan A Perl script that checks man pages for typos in the mark up. @@ -27,19 +27,19 @@ # README & NON-AUTOTOOLS-BUILD # These files are copied into the doc/html directory, with .txt -# extensions so that they can be hyperlinked from the HTML +# extensions so that they can be hyperlinked from the HTML # documentation, because some people just go to the HTML without # looking for text files. -# First, sort out the documentation. Remove pcredemo.3 first because it won't +# First, sort out the documentation. Remove pcre2demo.3 first because it won't # pass the markup check (it is created below, using markup that none of the # other pages use). cd doc echo Processing documentation -/bin/rm -f pcredemo.3 +/bin/rm -f pcre2demo.3 # Check the remaining man pages @@ -50,37 +50,37 @@ if [ $? != 0 ] ; then exit 1; fi # tidy for online reading. Concatenate all the .3 stuff, but omit the # individual function pages. -cat <<End >pcre.txt +cat <<End >pcre2.txt ----------------------------------------------------------------------------- -This file contains a concatenation of the PCRE man pages, converted to plain +This file contains a concatenation of the PCRE2 man pages, converted to plain text format for ease of searching with a text editor, or for use on systems that do not have a man page processor. The small individual files that give synopses of each function in the library have not been included. Neither has -the pcredemo program. There are separate text files for the pcregrep and -pcretest commands. +the pcre2demo program. There are separate text files for the pcre2grep and +pcre2test commands.
----------------------------------------------------------------------------- End -echo "Making pcre.txt" -for file in pcre pcre16 pcre32 pcrebuild pcrematching pcreapi pcrecallout \ - pcrecompat pcrepattern pcresyntax pcreunicode pcrejit pcrepartial \ - pcreprecompile pcreperform pcreposix pcrecpp pcresample \ - pcrelimits pcrestack ; do +echo "Making pcre2.txt" +for file in pcre2 pcre2api pcre2build pcre2callout pcre2compat pcre2jit \ + pcre2limits pcre2matching pcre2partial pcre2pattern pcre2perform \ + pcre2posix pcre2sample pcre2serialize pcre2syntax \ + pcre2unicode ; do echo " Processing $file.3" nroff -c -man $file.3 >$file.rawtxt - perl ../CleanTxt <$file.rawtxt >>pcre.txt + perl ../CleanTxt <$file.rawtxt >>pcre2.txt /bin/rm $file.rawtxt - echo "------------------------------------------------------------------------------" >>pcre.txt - if [ "$file" != "pcresample" ] ; then - echo " " >>pcre.txt - echo " " >>pcre.txt + echo "------------------------------------------------------------------------------" >>pcre2.txt + if [ "$file" != "pcre2sample" ] ; then + echo " " >>pcre2.txt + echo " " >>pcre2.txt fi done # The three commands -for file in pcretest pcregrep pcre-config ; do +for file in pcre2test pcre2grep pcre2-config ; do echo Making $file.txt nroff -c -man $file.1 >$file.rawtxt perl ../CleanTxt <$file.rawtxt >$file.txt @@ -88,12 +88,12 @@ for file in pcretest pcregrep pcre-config ; do done -# Make pcredemo.3 from the pcredemo.c source file +# Make pcre2demo.3 from the pcre2demo.c source file -echo "Making pcredemo.3" -perl <<"END" >pcredemo.3 - open(IN, "../pcredemo.c") || die "Failed to open pcredemo.c\n"; - open(OUT, ">pcredemo.3") || die "Failed to open pcredemo.3\n"; +echo "Making pcre2demo.3" +perl <<"END" >pcre2demo.3 + open(IN, "../src/pcre2demo.c") || die "Failed to open src/pcre2demo.c\n"; + open(OUT, ">pcre2demo.3") || die "Failed to open pcre2demo.3\n"; print OUT ".\\\" Start example.\n" . ".de EX\n" . ". nr mE \\\\n(.f\n" . 
@@ -145,12 +145,10 @@ for file in *.3 ; do base=`basename $file .3` toc=-toc if [ `expr $base : '.*_'` -ne 0 ] ; then toc="" ; fi - if [ "$base" = "pcresample" ] || \ - [ "$base" = "pcrestack" ] || \ - [ "$base" = "pcrecompat" ] || \ - [ "$base" = "pcrelimits" ] || \ - [ "$base" = "pcreperform" ] || \ - [ "$base" = "pcreunicode" ] ; then + if [ "$base" = "pcre2sample" ] || \ + [ "$base" = "pcre2compat" ] || \ + [ "$base" = "pcre2limits" ] || \ + [ "$base" = "pcre2unicode" ] ; then toc="" fi echo " Making $base.html" @@ -167,20 +165,16 @@ if [ "$1" = "doc" ] ; then exit; fi # These files are detrailed; do not detrail the test data because there may be # significant trailing spaces. Do not detrail RunTest.bat, because it has CRLF # line endings and the detrail script removes all trailing white space. The -# configure files are also omitted from the detrailing. We don't bother with -# those pcre[16|32]_xx files that just define COMPILE_PCRE16 and then #include the -# common file, because they aren't going to change. +# configure files are also omitted from the detrailing. 
files="\ Makefile.am \ - Makefile.in \ configure.ac \ README \ LICENCE \ COPYING \ AUTHORS \ NEWS \ - NON-UNIX-USE \ NON-AUTOTOOLS-BUILD \ INSTALL \ 132html \ @@ -190,65 +184,49 @@ files="\ CMakeLists.txt \ RunGrepTest \ RunTest \ - pcre-config.in \ - libpcre.pc.in \ - libpcre16.pc.in \ - libpcre32.pc.in \ - libpcreposix.pc.in \ - libpcrecpp.pc.in \ - config.h.in \ - pcre_chartables.c.dist \ - pcredemo.c \ - pcregrep.c \ - pcretest.c \ - dftables.c \ - pcreposix.c \ - pcreposix.h \ - pcre.h.in \ - pcre_internal.h \ - pcre_byte_order.c \ - pcre_compile.c \ - pcre_config.c \ - pcre_dfa_exec.c \ - pcre_exec.c \ - pcre_fullinfo.c \ - pcre_get.c \ - pcre_globals.c \ - pcre_jit_compile.c \ - pcre_jit_test.c \ - pcre_maketables.c \ - pcre_newline.c \ - pcre_ord2utf8.c \ - pcre16_ord2utf16.c \ - pcre32_ord2utf32.c \ - pcre_printint.c \ - pcre_refcount.c \ - pcre_string_utils.c \ - pcre_study.c \ - pcre_tables.c \ - pcre_valid_utf8.c \ - pcre_version.c \ - pcre_xclass.c \ - pcre16_utf16_utils.c \ - pcre32_utf32_utils.c \ - pcre16_valid_utf16.c \ - pcre32_valid_utf32.c \ - pcre_scanner.cc \ - pcre_scanner.h \ - pcre_scanner_unittest.cc \ - pcrecpp.cc \ - pcrecpp.h \ - pcrecpparg.h.in \ - pcrecpp_unittest.cc \ - pcre_stringpiece.cc \ - pcre_stringpiece.h.in \ - pcre_stringpiece_unittest.cc \ - perltest.pl \ - ucp.h \ - makevp.bat \ - pcre.def \ - libpcre.def \ - libpcreposix.def" + pcre2-config.in \ + perltest.sh \ + libpcre2-8.pc.in \ + libpcre2-16.pc.in \ + libpcre2-32.pc.in \ + libpcre2-posix.pc.in \ + src/pcre2_dftables.c \ + src/pcre2.h.in \ + src/pcre2_auto_possess.c \ + src/pcre2_compile.c \ + src/pcre2_config.c \ + src/pcre2_context.c \ + src/pcre2_convert.c \ + src/pcre2_dfa_match.c \ + src/pcre2_error.c \ + src/pcre2_extuni.c \ + src/pcre2_find_bracket.c \ + src/pcre2_internal.h \ + src/pcre2_intmodedep.h \ + src/pcre2_jit_compile.c \ + src/pcre2_jit_match.c \ + src/pcre2_jit_misc.c \ + src/pcre2_jit_test.c \ + src/pcre2_maketables.c \ + src/pcre2_match.c \ + 
src/pcre2_match_data.c \ + src/pcre2_newline.c \ + src/pcre2_ord2utf.c \ + src/pcre2_pattern_info.c \ + src/pcre2_printint.c \ + src/pcre2_string_utils.c \ + src/pcre2_study.c \ + src/pcre2_substring.c \ + src/pcre2_tables.c \ + src/pcre2_ucd.c \ + src/pcre2_ucp.h \ + src/pcre2_valid_utf.c \ + src/pcre2_xclass.c \ + src/pcre2demo.c \ + src/pcre2grep.c \ + src/pcre2posix.c \ + src/pcre2posix.h \ + src/pcre2test.c" echo Detrailing perl ./Detrail $files doc/p* doc/html/* diff --git a/src/pcre2/README b/src/pcre2/README new file mode 100644 index 00000000..d1a3120e --- /dev/null +++ b/src/pcre2/README @@ -0,0 +1,907 @@ +README file for PCRE2 (Perl-compatible regular expression library) +------------------------------------------------------------------ + +PCRE2 is a re-working of the original PCRE1 library to provide an entirely new +API. Since its initial release in 2015, there has been further development of +the code and it now differs from PCRE1 in more than just the API. There are new +features, and the internals have been improved. The original PCRE1 library is +now obsolete and should not be used in new projects. The latest release of +PCRE2 is available in three alternative formats from: + +https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz +https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2 +https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip + +There is a mailing list for discussion about the development of PCRE at +pcre-dev@exim.org. You can access the archives and subscribe or manage your +subscription here: + + https://lists.exim.org/mailman/listinfo/pcre-dev + +Please read the NEWS file if you are upgrading from a previous release. 
The +contents of this README file are: + + The PCRE2 APIs + Documentation for PCRE2 + Contributions by users of PCRE2 + Building PCRE2 on non-Unix-like systems + Building PCRE2 without using autotools + Building PCRE2 using autotools + Retrieving configuration information + Shared libraries + Cross-compiling using autotools + Making new tarballs + Testing PCRE2 + Character tables + File manifest + + +The PCRE2 APIs +-------------- + +PCRE2 is written in C, and it has its own API. There are three sets of +functions, one for the 8-bit library, which processes strings of bytes, one for +the 16-bit library, which processes strings of 16-bit values, and one for the +32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there +are no C++ wrappers. + +The distribution does contain a set of C wrapper functions for the 8-bit +library that are based on the POSIX regular expression API (see the pcre2posix +man page). These are built into a library called libpcre2-posix. Note that this +just provides a POSIX calling interface to PCRE2; the regular expressions +themselves still follow Perl syntax and semantics. The POSIX API is restricted, +and does not give full access to all of PCRE2's facilities. + +The header file for the POSIX-style functions is called pcre2posix.h. The +official POSIX name is regex.h, but I did not want to risk possible problems +with existing files of that name by distributing it that way. To use PCRE2 with +an existing program that uses the POSIX API, pcre2posix.h will have to be +renamed or pointed at by a link (or the program modified, of course). See the +pcre2posix documentation for more details. + + +Documentation for PCRE2 +----------------------- + +If you install PCRE2 in the normal way on a Unix-like system, you will end up +with a set of man pages whose names all start with "pcre2". The one that is +just called "pcre2" lists all the others. 
In addition to these man pages, the +PCRE2 documentation is supplied in two other forms: + + 1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and + doc/pcre2test.txt in the source distribution. The first of these is a + concatenation of the text forms of all the section 3 man pages except the + listing of pcre2demo.c and those that summarize individual functions. The + other two are the text forms of the section 1 man pages for the pcre2grep + and pcre2test commands. These text forms are provided for ease of scanning + with text editors or similar tools. They are installed in + <prefix>/share/doc/pcre2, where <prefix> is the installation prefix + (defaulting to /usr/local). + + 2. A set of files containing all the documentation in HTML form, hyperlinked + in various ways, and rooted in a file called index.html, is distributed in + doc/html and installed in <prefix>/share/doc/pcre2/html. + + +Building PCRE2 on non-Unix-like systems +--------------------------------------- + +For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if +your system supports the use of "configure" and "make" you may be able to build +PCRE2 using autotools in the same way as for many Unix-like systems. + +PCRE2 can also be configured using CMake, which can be run in various ways +(command line, GUI, etc). This creates Makefiles, solution files, etc. The file +NON-AUTOTOOLS-BUILD has information about CMake. + +PCRE2 has been compiled on many different operating systems. It should be +straightforward to build PCRE2 on any system that has a Standard C compiler and +library, because it uses only Standard C functions. + + +Building PCRE2 without using autotools +-------------------------------------- + +The use of autotools (in particular, libtool) is problematic in some +environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD +file for ways of building PCRE2 without using autotools.
+ + +Building PCRE2 using autotools +------------------------------ + +The following instructions assume the use of the widely used "configure; make; +make install" (autotools) process. + +To build PCRE2 on a system that supports autotools, first run the "configure" +command from the PCRE2 distribution directory, with your current directory set +to the directory where you want the files to be created. This command is a +standard GNU "autoconf" configuration script, for which generic instructions +are supplied in the file INSTALL. + +Most commonly, people build PCRE2 within its own distribution directory, and in +this case, on many systems, just running "./configure" is sufficient. However, +the usual methods of changing standard defaults are available. For example: + +CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local + +This command specifies that the C compiler should be run with the flags '-O2 +-Wall' instead of the default, and that "make install" should install PCRE2 +under /opt/local instead of the default /usr/local. + +If you want to build in a different directory, just run "configure" with that +directory as current. For example, suppose you have unpacked the PCRE2 source +into /source/pcre2/pcre2-xxx, but you want to build it in +/build/pcre2/pcre2-xxx: + +cd /build/pcre2/pcre2-xxx +/source/pcre2/pcre2-xxx/configure + +PCRE2 is written in C and is normally compiled as a C library. However, it is +possible to build it as a C++ library, though the provided building apparatus +does not have any features to support this. + +There are some optional features that can be included or omitted from the PCRE2 +library. They are also documented in the pcre2build man page. + +. By default, both shared and static libraries are built. You can change this + by adding one of these options to the "configure" command: + + --disable-shared + --disable-static + + (See also "Shared libraries" below.) + +. By default, only the 8-bit library is built.
If you add --enable-pcre2-16 to + the "configure" command, the 16-bit library is also built. If you add + --enable-pcre2-32 to the "configure" command, the 32-bit library is also + built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8 + to disable building the 8-bit library. + +. If you want to include support for just-in-time (JIT) compiling, which can + give large performance improvements on certain platforms, add --enable-jit to + the "configure" command. This support is available only for certain hardware + architectures. If you try to enable it on an unsupported architecture, there + will be a compile time error. If in doubt, use --enable-jit=auto, which + enables JIT only if the current hardware is supported. + +. If you are enabling JIT under SELinux environment you may also want to add + --enable-jit-sealloc, which enables the use of an executable memory allocator + that is compatible with SELinux. Warning: this allocator is experimental! + It does not support fork() operation and may crash when no disk space is + available. This option has no effect if JIT is disabled. + +. If you do not want to make use of the default support for UTF-8 Unicode + character strings in the 8-bit library, UTF-16 Unicode character strings in + the 16-bit library, or UTF-32 Unicode character strings in the 32-bit + library, you can add --disable-unicode to the "configure" command. This + reduces the size of the libraries. It is not possible to configure one + library with Unicode support, and another without, in the same configuration. + It is also not possible to use --enable-ebcdic (see below) with Unicode + support, so if this option is set, you must also use --disable-unicode. + + When Unicode support is available, the use of a UTF encoding still has to be + enabled by setting the PCRE2_UTF option at run time or starting a pattern + with (*UTF). 
When PCRE2 is compiled with Unicode support, its input can only + either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. + + As well as supporting UTF strings, Unicode support includes support for the + \P, \p, and \X sequences that recognize Unicode character properties. + However, only the basic two-letter properties such as Lu are supported. + Escape sequences such as \d and \w in patterns do not by default make use of + Unicode properties, but can be made to do so by setting the PCRE2_UCP option + or starting a pattern with (*UCP). + +. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any + of the preceding, or any of the Unicode newline sequences, or the NUL (zero) + character as indicating the end of a line. Whatever you specify at build time + is the default; the caller of PCRE2 can change the selection at run time. The + default newline indicator is a single LF character (the Unix standard). You + can specify the default newline indicator by adding --enable-newline-is-cr, + --enable-newline-is-lf, --enable-newline-is-crlf, + --enable-newline-is-anycrlf, --enable-newline-is-any, or + --enable-newline-is-nul to the "configure" command, respectively. + +. By default, the sequence \R in a pattern matches any Unicode line ending + sequence. This is independent of the option specifying what PCRE2 considers + to be the end of a line (see above). However, the caller of PCRE2 can + restrict \R to match only CR, LF, or CRLF. You can make this the default by + adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R"). + +. In a pattern, the escape sequence \C matches a single code unit, even in a + UTF mode. This can be dangerous because it breaks up multi-code-unit + characters. You can build PCRE2 with the use of \C permanently locked out by + adding --enable-never-backslash-C (note the upper case C) to the "configure" + command. 
When \C is allowed by the library, individual applications can lock + it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option. + +. PCRE2 has a counter that limits the depth of nesting of parentheses in a + pattern. This limits the amount of system stack that a pattern uses when it + is compiled. The default is 250, but you can change it by setting, for + example, + + --with-parens-nest-limit=500 + +. PCRE2 has a counter that can be set to limit the amount of computing resource + it uses when matching a pattern. If the limit is exceeded during a match, the + match fails. The default is ten million. You can change the default by + setting, for example, + + --with-match-limit=500000 + + on the "configure" command. This is just the default; individual calls to + pcre2_match() or pcre2_dfa_match() can supply their own value. There is more + discussion in the pcre2api man page (search for pcre2_set_match_limit). + +. There is a separate counter that limits the depth of nested backtracking + (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a + matching process, which indirectly limits the amount of heap memory that is + used, and in the case of pcre2_dfa_match() the amount of stack as well. This + counter also has a default of ten million, which is essentially "unlimited". + You can change the default by setting, for example, + + --with-match-limit-depth=5000 + + There is more discussion in the pcre2api man page (search for + pcre2_set_depth_limit). + +. You can also set an explicit limit on the amount of heap memory used by + the pcre2_match() and pcre2_dfa_match() interpreters: + + --with-heap-limit=500 + + The units are kibibytes (units of 1024 bytes). This limit does not apply when + the JIT optimization (which has its own memory control features) is used. + There is more discussion on the pcre2api man page (search for + pcre2_set_heap_limit). + +. 
In the 8-bit library, the default maximum compiled pattern size is around + 64 kibibytes. You can increase this by adding --with-link-size=3 to the + "configure" command. PCRE2 then uses three bytes instead of two for offsets + to different parts of the compiled pattern. In the 16-bit library, + --with-link-size=3 is the same as --with-link-size=4, which (in both + libraries) uses four-byte offsets. Increasing the internal link size reduces + performance in the 8-bit and 16-bit libraries. In the 32-bit library, the + link size setting is ignored, as 4-byte offsets are always used. + +. For speed, PCRE2 uses four tables for manipulating and identifying characters + whose code point values are less than 256. By default, it uses a set of + tables for ASCII encoding that is part of the distribution. If you specify + + --enable-rebuild-chartables + + a program called pcre2_dftables is compiled and run in the default C locale + when you obey "make". It builds a source file called pcre2_chartables.c. If + you do not specify this option, pcre2_chartables.c is created as a copy of + pcre2_chartables.c.dist. See "Character tables" below for further + information. + +. It is possible to compile PCRE2 for use on systems that use EBCDIC as their + character code (as opposed to ASCII/Unicode) by specifying + + --enable-ebcdic --disable-unicode + + This automatically implies --enable-rebuild-chartables (see above). However, + when PCRE2 is built this way, it always operates in EBCDIC. It cannot support + both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25, + which specifies that the code value for the EBCDIC NL character is 0x25 + instead of the default 0x15. + +. If you specify --enable-debug, additional debugging code is included in the + build. This option is intended for use by the PCRE2 maintainers. + +. 
In environments where valgrind is installed, if you specify + + --enable-valgrind + + PCRE2 will use valgrind annotations to mark certain memory regions as + unaddressable. This allows it to detect invalid memory accesses, and is + mostly useful for debugging PCRE2 itself. + +. In environments where the gcc compiler is used and lcov is installed, if you + specify + + --enable-coverage + + the build process implements a code coverage report for the test suite. The + report is generated by running "make coverage". If ccache is installed on + your system, it must be disabled when building PCRE2 for coverage reporting. + You can do this by setting the environment variable CCACHE_DISABLE=1 before + running "make" to build PCRE2. There is more information about coverage + reporting in the "pcre2build" documentation. + +. When JIT support is enabled, pcre2grep automatically makes use of it, unless + you add --disable-pcre2grep-jit to the "configure" command. + +. There is support for calling external programs during matching in the + pcre2grep command, using PCRE2's callout facility with string arguments. This + support can be disabled by adding --disable-pcre2grep-callout to the + "configure" command. There are two kinds of callout: one that generates + output from inbuilt code, and another that calls an external program. The + latter has special support for Windows and VMS; otherwise it assumes the + existence of the fork() function. This facility can be disabled by adding + --disable-pcre2grep-callout-fork to the "configure" command. + +. The pcre2grep program currently supports only 8-bit data files, and so + requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use + libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by + specifying one or both of + + --enable-pcre2grep-libz + --enable-pcre2grep-libbz2 + + Of course, the relevant libraries must be installed on your system. + +. 
The default starting size (in bytes) of the internal buffer used by pcre2grep + can be set by, for example: + + --with-pcre2grep-bufsize=51200 + + The value must be a plain integer. The default is 20480. The amount of memory + used by pcre2grep is actually three times this number, to allow for "before" + and "after" lines. If very long lines are encountered, the buffer is + automatically enlarged, up to a fixed maximum size. + +. The default maximum size of pcre2grep's internal buffer can be set by, for + example: + + --with-pcre2grep-max-bufsize=2097152 + + The default is either 1048576 or the value of --with-pcre2grep-bufsize, + whichever is the larger. + +. It is possible to compile pcre2test so that it links with the libreadline + or libedit libraries, by specifying, respectively, + + --enable-pcre2test-libreadline or --enable-pcre2test-libedit + + If this is done, when pcre2test's input is from a terminal, it reads it using + the readline() function. This provides line-editing and history facilities. + Note that libreadline is GPL-licenced, so if you distribute a binary of + pcre2test linked in this way, there may be licensing issues. These can be + avoided by linking with libedit (which has a BSD licence) instead. + + Enabling libreadline causes the -lreadline option to be added to the + pcre2test build. In many operating environments with a system-installed + readline library this is sufficient. However, in some environments (e.g. if + an unmodified distribution version of readline is in use), it may be + necessary to specify something like LIBS="-lncurses" as well. This is + because, to quote the readline INSTALL, "Readline uses the termcap functions, + but does not link with the termcap or curses library itself, allowing + applications which link with readline the to choose an appropriate library."
+ If you get error messages about missing functions tgetstr, tgetent, tputs, + tgetflag, or tgoto, this is the problem, and linking with the ncurses library + should fix it. + +. The C99 standard defines formatting modifiers z and t for size_t and + ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in + environments other than Microsoft Visual Studio when __STDC_VERSION__ is + defined and has a value greater than or equal to 199901L (indicating C99). + However, there is at least one environment that claims to be C99 but does not + support these modifiers. If --disable-percent-zt is specified, no use is made + of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for + size_t values. + +. There is a special option called --enable-fuzz-support for use by people who + want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit + library. If set, it causes an extra library called libpcre2-fuzzsupport.a to + be built, but not installed. This contains a single function called + LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the + length of the string. When called, this function tries to compile the string + as a pattern, and if that succeeds, to match it. This is done both with no + options and with some random option bits that are generated from the string. + Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to + be created. This is normally run under valgrind or used when PCRE2 is + compiled with address sanitizing enabled. It calls the fuzzing function and + outputs information about what it is doing. The input strings are specified by + arguments: if an argument starts with "=" the rest of it is a literal input + string. Otherwise, it is assumed to be a file name, and the contents of the + file are the test string. + +. 
Releases before 10.30 could be compiled with --disable-stack-for-recursion, + which caused pcre2_match() to use individual blocks on the heap for + backtracking instead of recursive function calls (which use the stack). This + is now obsolete since pcre2_match() was refactored always to use the heap (in + a much more efficient way than before). This option is retained for backwards + compatibility, but has no effect other than to output a warning. + +The "configure" script builds the following files for the basic C library: + +. Makefile the makefile that builds the library +. src/config.h build-time configuration options for the library +. src/pcre2.h the public PCRE2 header file +. pcre2-config script that shows the building settings such as CFLAGS + that were set for "configure" +. libpcre2-8.pc ) +. libpcre2-16.pc ) data for the pkg-config command +. libpcre2-32.pc ) +. libpcre2-posix.pc ) +. libtool script that builds shared and/or static libraries + +Versions of config.h and pcre2.h are distributed in the src directory of PCRE2 +tarballs under the names config.h.generic and pcre2.h.generic. These are +provided for those who have to build PCRE2 without using "configure" or CMake. +If you use "configure" or CMake, the .generic versions are not used. + +The "configure" script also creates config.status, which is an executable +script that can be run to recreate the configuration, and config.log, which +contains compiler output from tests that "configure" runs. + +Once "configure" has run, you can run "make". This builds whichever of the +libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test +program called pcre2test. If you enabled JIT support with --enable-jit, another +test program called pcre2_jit_test is built as well. If the 8-bit library is +built, libpcre2-posix and the pcre2grep command are also built. Running +"make" with the -j option may speed up compilation on multiprocessor systems. 
+ +The command "make check" runs all the appropriate tests. Details of the PCRE2 +tests are given below in a separate section of this document. The -j option of +"make" can also be used when running the tests. + +You can use "make install" to install PCRE2 into live directories on your +system. The following are installed (file names are all relative to the +<prefix> that is set when "configure" is run): + + Commands (bin): + pcre2test + pcre2grep (if 8-bit support is enabled) + pcre2-config + + Libraries (lib): + libpcre2-8 (if 8-bit support is enabled) + libpcre2-16 (if 16-bit support is enabled) + libpcre2-32 (if 32-bit support is enabled) + libpcre2-posix (if 8-bit support is enabled) + + Configuration information (lib/pkgconfig): + libpcre2-8.pc + libpcre2-16.pc + libpcre2-32.pc + libpcre2-posix.pc + + Header files (include): + pcre2.h + pcre2posix.h + + Man pages (share/man/man{1,3}): + pcre2grep.1 + pcre2test.1 + pcre2-config.1 + pcre2.3 + pcre2*.3 (lots more pages, all starting "pcre2") + + HTML documentation (share/doc/pcre2/html): + index.html + *.html (lots more pages, hyperlinked from index.html) + + Text file documentation (share/doc/pcre2): + AUTHORS + COPYING + ChangeLog + LICENCE + NEWS + README + pcre2.txt (a concatenation of the man(3) pages) + pcre2test.txt the pcre2test man page + pcre2grep.txt the pcre2grep man page + pcre2-config.txt the pcre2-config man page + +If you want to remove PCRE2 from your system, you can run "make uninstall". +This removes all the files that "make install" installed. However, it does not +remove any directories, because these are often shared with other programs. + + +Retrieving configuration information +------------------------------------ + +Running "make install" installs the command pcre2-config, which can be used to +recall information about the PCRE2 configuration and installation.
For example: + + pcre2-config --version + +prints the version number, and + + pcre2-config --libs8 + +outputs information about where the 8-bit library is installed. This command +can be included in makefiles for programs that use PCRE2, saving the programmer +from having to remember too many details. Run pcre2-config with no arguments to +obtain a list of possible arguments. + +The pkg-config command is another system for saving and retrieving information +about installed libraries. Instead of separate commands for each library, a +single command is used. For example: + + pkg-config --libs libpcre2-16 + +The data is held in *.pc files that are installed in a directory called +<prefix>/lib/pkgconfig. + + +Shared libraries +---------------- + +The default distribution builds PCRE2 as shared libraries and static libraries, +as long as the operating system supports shared libraries. Shared library +support relies on the "libtool" script which is built as part of the +"configure" process. + +The libtool script is used to compile and link both shared and static +libraries. They are placed in a subdirectory called .libs when they are newly +built. The programs pcre2test and pcre2grep are built to use these uninstalled +libraries (by means of wrapper scripts in the case of shared libraries). When +you use "make install" to install shared libraries, pcre2grep and pcre2test are +automatically re-built to use the newly installed shared libraries before being +installed themselves. However, the versions left in the build directory still +use the uninstalled libraries. + +To build PCRE2 using static libraries only you must use --disable-shared when +configuring it. For example: + +./configure --prefix=/usr/gnu --disable-shared + +Then run "make" in the usual way. Similarly, you can use --disable-static to +build only shared libraries.
+ + +Cross-compiling using autotools +------------------------------- + +You can specify CC and CFLAGS in the normal way to the "configure" command, in +order to cross-compile PCRE2 for some other host. However, you should NOT +specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c +source file is compiled and run on the local host, in order to generate the +inbuilt character tables (the pcre2_chartables.c file). This will probably not +work, because pcre2_dftables.c needs to be compiled with the local compiler, +not the cross compiler. + +When --enable-rebuild-chartables is not specified, pcre2_chartables.c is +created by making a copy of pcre2_chartables.c.dist, which is a default set of +tables that assumes ASCII code. Cross-compiling with the default tables should +not be a problem. + +If you need to modify the character tables when cross-compiling, you should +move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by +hand and run it on the local host to make a new version of +pcre2_chartables.c.dist. See the pcre2build section "Creating character tables +at build time" for more details. + + +Making new tarballs +------------------- + +The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and +zip formats. The command "make distcheck" does the same, but then does a trial +build of the new distribution to ensure that it works. + +If you have modified any of the man page sources in the doc directory, you +should first run the PrepareRelease script before making a distribution. This +script creates the .txt and HTML forms of the documentation from the man pages. + + +Testing PCRE2 +------------- + +To test the basic PCRE2 library on a Unix-like system, run the RunTest script. +There is another script called RunGrepTest that tests the pcre2grep command. +When JIT support is enabled, a third test program called pcre2_jit_test is +built. 
Both the scripts and all the program tests are run if you obey "make +check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD. + +The RunTest script runs the pcre2test test program (which is documented in its +own man page) on each of the relevant testinput files in the testdata +directory, and compares the output with the contents of the corresponding +testoutput files. RunTest uses a file called testtry to hold the main output +from pcre2test. Other files whose names begin with "test" are used as working +files in some tests. + +Some tests are relevant only when certain build-time options were selected. For +example, the tests for UTF-8/16/32 features are run only when Unicode support +is available. RunTest outputs a comment when it skips a test. + +Many (but not all) of the tests that are not skipped are run twice if JIT +support is available. On the second run, JIT compilation is forced. This +testing can be suppressed by putting "nojit" on the RunTest command line. + +The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit +libraries that are enabled. If you want to run just one set of tests, call +RunTest with either the -8, -16 or -32 option. + +If valgrind is installed, you can run the tests under it by putting "valgrind" +on the RunTest command line. To run pcre2test on just one or more specific test +files, give their numbers as arguments to RunTest, for example: + + RunTest 2 7 11 + +You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the +end), or a number preceded by ~ to exclude a test. For example: + + RunTest 3-15 ~10 + +This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests +except test 13. Whatever order the arguments are in, the tests are always run +in numerical order. + +You can also call RunTest with the single argument "list" to cause it to output +a list of tests.
+ +The test sequence starts with "test 0", which is a special test that has no +input file, and whose output is not checked. This is because it will be +different on different hardware and with different configurations. The test +exists in order to exercise some of pcre2test's code that would not otherwise +be run. + +Tests 1 and 2 can always be run, as they expect only plain text strings (not +UTF) and make no use of Unicode properties. The first test file can be fed +directly into the perltest.sh script to check that Perl gives the same results. +The only difference you should see is in the first few lines, where the Perl +version is given instead of the PCRE2 version. The second set of tests check +auxiliary functions, error detection, and run-time flags that are specific to +PCRE2. It also uses the debugging flags to check some of the internals of +pcre2_compile(). + +If you build PCRE2 with a locale setting that is not the standard C locale, the +character tables may be different (see next paragraph). In some cases, this may +cause failures in the second set of tests. For example, in a locale where the +isprint() function yields TRUE for characters in the range 128-255, the use of +[:isascii:] inside a character class defines a different set of characters, and +this shows up in this test as a difference in the compiled code, which is being +listed for checking. For example, where the comparison test output contains +[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other +cases. This is not a bug in PCRE2. + +Test 3 checks pcre2_maketables(), the facility for building a set of character +tables for a specific locale and using them instead of the default tables. The +script uses the "locale" command to check for the availability of the "fr_FR", +"french", or "fr" locale, and uses the first one that it finds. 
If the "locale" +command fails, or if its output doesn't include "fr_FR", "french", or "fr" in +the list of available locales, the third test cannot be run, and a comment is +output to say why. If running this test produces an error like this: + + ** Failed to set locale "fr_FR" + +it means that the given locale is not available on your system, despite being +listed by "locale". This does not mean that PCRE2 is broken. There are three +alternative output files for the third test, because three different versions +of the French locale have been encountered. The test passes if its output +matches any one of them. + +Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible +with the perltest.sh script, and test 5 checking PCRE2-specific things. + +Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in +non-UTF mode and UTF-mode with Unicode property support, respectively. + +Test 8 checks some internal offsets and code size features, but it is run only +when Unicode support is enabled. The output is different in 8-bit, 16-bit, and +32-bit modes and for different link sizes, so there are different output files +for each mode and link size. + +Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in +16-bit and 32-bit modes. These are tests that generate different output in +8-bit mode. Each pair is for general cases and Unicode support, respectively. + +Test 13 checks the handling of non-UTF characters greater than 255 by +pcre2_dfa_match() in 16-bit and 32-bit modes. + +Test 14 contains some special UTF and UCP tests that give different output for +different code unit widths. + +Test 15 contains a number of tests that must not be run with JIT. They check, +among other non-JIT things, the match-limiting features of the interpretive +matcher. + +Test 16 is run only when JIT support is not available. It checks that an +attempt to use JIT has the expected behaviour.
+ +Test 17 is run only when JIT support is available. It checks JIT complete and +partial modes, match-limiting under JIT, and other JIT-specific features. + +Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to +the 8-bit library, without and with Unicode support, respectively. + +Test 20 checks the serialization functions by writing a set of compiled +patterns to a file, and then reloading and checking them. + +Tests 21 and 22 test \C support when the use of \C is not locked out, without +and with UTF support, respectively. Test 23 tests \C when it is locked out. + +Tests 24 and 25 test the experimental pattern conversion functions, without and +with UTF support, respectively. + + +Character tables +---------------- + +For speed, PCRE2 uses four tables for manipulating and identifying characters +whose code point values are less than 256. By default, a set of tables that is +built into the library is used. The pcre2_maketables() function can be called +by an application to create a new set of tables in the current locale. These are +passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a +compile context. + +The source file called pcre2_chartables.c contains the default set of tables. +By default, this is created as a copy of pcre2_chartables.c.dist, which +contains tables for ASCII coding. However, if --enable-rebuild-chartables is +specified for ./configure, a new version of pcre2_chartables.c is built by the +program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C +character handling functions such as isalnum(), isalpha(), isupper(), +islower(), etc. to build the table sources. This means that the default C +locale that is set for your system will control the contents of these default +tables. You can change the default tables by editing pcre2_chartables.c and +then re-building PCRE2. If you do this, you should take care to ensure that the +file does not get automatically re-generated.
The best way to do this is to +move pcre2_chartables.c.dist out of the way and replace it with your customized +tables. + +When the pcre2_dftables program is run as a result of specifying +--enable-rebuild-chartables, it uses the default C locale that is set on your +system. It does not pay attention to the LC_xxx environment variables. In other +words, it uses the system's default locale rather than whatever the compiling +user happens to have set. If you really do want to build a source set of +character tables in a locale that is specified by the LC_xxx variables, you can +run the pcre2_dftables program by hand with the -L option. For example: + + ./pcre2_dftables -L pcre2_chartables.c.special + +The second argument names the file where the source code for the tables is +written. The first two 256-byte tables provide lower casing and case flipping +functions, respectively. The next table consists of a number of 32-byte bit +maps which identify certain character classes such as digits, "word" +characters, white space, etc. These are used when building 32-byte bit maps +that represent character classes for code points less than 256. The final +256-byte table has bits indicating various character types, as follows: + + 1 white space character + 2 letter + 4 lower case letter + 8 decimal digit + 16 alphanumeric or '_' + +You can also specify -b (with or without -L) when running pcre2_dftables. This +causes the tables to be written in binary instead of as source code. A set of +binary tables can be loaded into memory by an application and passed to +pcre2_compile() in the same way as tables created dynamically by calling +pcre2_maketables(). The tables are just a string of bytes, independent of +hardware characteristics such as endianness. This means they can be bundled +with an application that runs in different environments, to ensure consistent +behaviour. + +See also the pcre2build section "Creating character tables at build time". 
+ + +File manifest +------------- + +The distribution should contain the files listed below. + +(A) Source files for the PCRE2 library functions and their headers are found in + the src directory: + + src/pcre2_dftables.c auxiliary program for building pcre2_chartables.c + when --enable-rebuild-chartables is specified + + src/pcre2_chartables.c.dist a default set of character tables that assume + ASCII coding; unless --enable-rebuild-chartables is + specified, used by copying to pcre2_chartables.c + + src/pcre2posix.c ) + src/pcre2_auto_possess.c ) + src/pcre2_compile.c ) + src/pcre2_config.c ) + src/pcre2_context.c ) + src/pcre2_convert.c ) + src/pcre2_dfa_match.c ) + src/pcre2_error.c ) + src/pcre2_extuni.c ) + src/pcre2_find_bracket.c ) + src/pcre2_jit_compile.c ) + src/pcre2_jit_match.c ) sources for the functions in the library, + src/pcre2_jit_misc.c ) and some internal functions that they use + src/pcre2_maketables.c ) + src/pcre2_match.c ) + src/pcre2_match_data.c ) + src/pcre2_newline.c ) + src/pcre2_ord2utf.c ) + src/pcre2_pattern_info.c ) + src/pcre2_script_run.c ) + src/pcre2_serialize.c ) + src/pcre2_string_utils.c ) + src/pcre2_study.c ) + src/pcre2_substitute.c ) + src/pcre2_substring.c ) + src/pcre2_tables.c ) + src/pcre2_ucd.c ) + src/pcre2_valid_utf.c ) + src/pcre2_xclass.c ) + + src/pcre2_printint.c debugging function that is used by pcre2test, + src/pcre2_fuzzsupport.c function for (optional) fuzzing support + + src/config.h.in template for config.h, when built by "configure" + src/pcre2.h.in template for pcre2.h when built by "configure" + src/pcre2posix.h header for the external POSIX wrapper API + src/pcre2_internal.h header for internal use + src/pcre2_intmodedep.h a mode-specific internal header + src/pcre2_ucp.h header for Unicode property handling + + sljit/* source files for the JIT compiler + +(B) Source files for programs that use PCRE2: + + src/pcre2demo.c simple demonstration of coding calls to PCRE2 + src/pcre2grep.c source of a 
grep utility that uses PCRE2 + src/pcre2test.c comprehensive test program + src/pcre2_jit_test.c JIT test program + +(C) Auxiliary files: + + 132html script to turn "man" pages into HTML + AUTHORS information about the author of PCRE2 + ChangeLog log of changes to the code + CleanTxt script to clean nroff output for txt man pages + Detrail script to remove trailing spaces + HACKING some notes about the internals of PCRE2 + INSTALL generic installation instructions + LICENCE conditions for the use of PCRE2 + COPYING the same, using GNU's standard name + Makefile.in ) template for Unix Makefile, which is built by + ) "configure" + Makefile.am ) the automake input that was used to create + ) Makefile.in + NEWS important changes in this release + NON-AUTOTOOLS-BUILD notes on building PCRE2 without using autotools + PrepareRelease script to make preparations for "make dist" + README this file + RunTest a Unix shell script for running tests + RunGrepTest a Unix shell script for pcre2grep tests + aclocal.m4 m4 macros (generated by "aclocal") + config.guess ) files used by libtool, + config.sub ) used only when building a shared library + configure a configuring shell script (built by autoconf) + configure.ac ) the autoconf input that was used to build + ) "configure" and config.h + depcomp ) script to find program dependencies, generated by + ) automake + doc/*.3 man page sources for PCRE2 + doc/*.1 man page sources for pcre2grep and pcre2test + doc/index.html.src the base HTML page + doc/html/* HTML documentation + doc/pcre2.txt plain text version of the man pages + doc/pcre2test.txt plain text documentation of test program + install-sh a shell script for installing files + libpcre2-8.pc.in template for libpcre2-8.pc for pkg-config + libpcre2-16.pc.in template for libpcre2-16.pc for pkg-config + libpcre2-32.pc.in template for libpcre2-32.pc for pkg-config + libpcre2-posix.pc.in template for libpcre2-posix.pc for pkg-config + ltmain.sh file used to build a libtool script 
+ missing ) common stub for a few missing GNU programs while + ) installing, generated by automake + mkinstalldirs script for making install directories + perltest.sh Script for running a Perl test program + pcre2-config.in source of script which retains PCRE2 information + testdata/testinput* test data for main library tests + testdata/testoutput* expected test results + testdata/grep* input and output for pcre2grep tests + testdata/* other supporting test files + +(D) Auxiliary files for cmake support + + cmake/COPYING-CMAKE-SCRIPTS + cmake/FindPackageHandleStandardArgs.cmake + cmake/FindEditline.cmake + cmake/FindReadline.cmake + CMakeLists.txt + config-cmake.h.in + +(E) Auxiliary files for building PCRE2 "by hand" + + src/pcre2.h.generic ) a version of the public PCRE2 header file + ) for use in non-"configure" environments + src/config.h.generic ) a version of config.h for use in non-"configure" + ) environments + +Philip Hazel +Email local part: Philip.Hazel +Email domain: gmail.com +Last updated: 28 April 2021 diff --git a/src/pcre2/RunGrepTest b/src/pcre2/RunGrepTest new file mode 100755 index 00000000..78206baf --- /dev/null +++ b/src/pcre2/RunGrepTest @@ -0,0 +1,821 @@ +#! /bin/sh + +# Run pcre2grep tests. The assumption is that the PCRE2 tests check the library +# itself. What we are checking here is the file handling and options that are +# supported by pcre2grep. This script must be run in the build directory. + +# CODING CONVENTIONS: +# * Put printf arguments in single, not double quotes to avoid unwanted +# escaping. +# * Use \0 for binary zero in printf, not \x0, for the benefit of older +# versions (and use octal for other special values). + +# Set the C locale, so that sort(1) behaves predictably. + +LC_ALL=C +export LC_ALL + +# Remove any non-default colouring and aliases that the caller may have set. 
+ +unset PCRE2GREP_COLOUR PCRE2GREP_COLOR PCREGREP_COLOUR PCREGREP_COLOR +unset GREP_COLOR GREP_COLORS +unset cp ls mv rm + +# Remember the current (build) directory, set the program to be tested, and +# valgrind settings when requested. + +builddir=`pwd` +pcre2grep=$builddir/pcre2grep +pcre2test=$builddir/pcre2test + +if [ ! -x $pcre2grep ] ; then + echo "** $pcre2grep does not exist or is not executable." + exit 1 +fi + +if [ ! -x $pcre2test ] ; then + echo "** $pcre2test does not exist or is not executable." + exit 1 +fi + +valgrind= +while [ $# -gt 0 ] ; do + case $1 in + valgrind) valgrind="valgrind -q --leak-check=no --smc-check=all-non-file";; + *) echo "RunGrepTest: Unknown argument $1"; exit 1;; + esac + shift +done + +vjs= +pcre2grep_version=`$pcre2grep -V` +if [ "$valgrind" = "" ] ; then + echo "Testing $pcre2grep_version" +else + echo "Testing $pcre2grep_version using valgrind" + $pcre2test -C jit >/dev/null + if [ $? -ne 0 ]; then + vjs="--suppressions=./testdata/valgrind-jit.supp" + fi +fi + +# Set up a suitable "diff" command for comparison. Some systems have a diff +# that lacks a -u option. Try to deal with this; better do the test for the -b +# option as well. + +cf="diff" +diff -b /dev/null /dev/null 2>/dev/null && cf="diff -b" +diff -u /dev/null /dev/null 2>/dev/null && cf="diff -u" +diff -ub /dev/null /dev/null 2>/dev/null && cf="diff -ub" + +# If this test is being run from "make check", $srcdir will be set. If not, set +# it to the current or parent directory, whichever one contains the test data. +# Subsequently, we run most of the pcre2grep tests in the source directory so +# that the file names in the output are always the same. + +if [ -z "$srcdir" -o ! -d "$srcdir/testdata" ] ; then + if [ -d "./testdata" ] ; then + srcdir=. + elif [ -d "../testdata" ] ; then + srcdir=.. + else + echo "Cannot find the testdata directory" + exit 1 + fi +fi + +# Check for the availability of UTF-8 support + +$pcre2test -C unicode >/dev/null +utf8=$? 
+ +# Check default newline convention. If it does not include LF, force LF. + +nl=`$pcre2test -C newline` +if [ "$nl" != "LF" -a "$nl" != "ANY" -a "$nl" != "ANYCRLF" ]; then + pcre2grep="$pcre2grep -N LF" + echo "Default newline setting forced to LF" +fi + +# ------ Function to run and check a special pcre2grep arguments test ------- + +checkspecial() + { + $valgrind $pcre2grep $1 >>testtrygrep 2>&1 + if [ $? -ne $2 ] ; then + echo "** pcre2grep $1 failed - check testtrygrep" + exit 1 + fi + } + +# ------ Normal tests ------ + +echo "Testing pcre2grep main features" + +echo "---------------------------- Test 1 ------------------------------" >testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 2 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep '^PATTERN' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 3 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -in PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 4 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -ic PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 5 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -in PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 6 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -inh PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 7 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -il PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 8 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -l PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 9 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -q PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 10 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -q NEVER-PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 11 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -vn pattern ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 12 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -ix pattern ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 13 -----------------------------" >>testtrygrep +echo seventeen >testtemp1grep +(cd $srcdir; $valgrind $vjs $pcre2grep -f./testdata/greplist -f $builddir/testtemp1grep ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 14 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -w pat ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 15 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep 'abc^*' ./testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 16 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep abc ./testdata/grepinput ./testdata/nonexistfile) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 17 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -M 'the\noutput' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 18 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mn '(the\noutput|dog\.\n--)' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 19 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mix 'Pattern' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 20 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mixn 'complete pair\nof lines' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 21 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -nA3 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 22 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -nB3 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 23 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -C3 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 24 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -A9 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 25 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -nB9 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 26 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -A9 -B9 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 27 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -A10 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 28 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -nB10 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 29 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -C12 -B10 'four' ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 30 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -inB3 'pattern' ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 31 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -inA3 'pattern' ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 32 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -L 'fox' ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 33 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep 'fox' ./testdata/grepnonexist) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 34 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -s 'fox' ./testdata/grepnonexist) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 35 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --include=grepinputx --include grepinput8 --exclude-dir='^\.' 'fox' ./testdata | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 36 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --include=grepinput --exclude 'grepinput$' --exclude=grepinput8 --exclude=grepinputM --exclude-dir='^\.' 'fox' ./testdata | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 37 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep '^(a+)*\d' ./testdata/grepinput) >>testtrygrep 2>teststderrgrep +echo "RC=$?" >>testtrygrep +echo "======== STDERR ========" >>testtrygrep +cat teststderrgrep >>testtrygrep + +echo "---------------------------- Test 38 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep '>\x00<' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 39 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -A1 'before the binary zero' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 40 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -B1 'after the binary zero' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 41 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -B1 -o '\w+ the binary zero' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 42 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -B1 -onH '\w+ the binary zero' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 43 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -on 'before|zero|after' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 44 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -on -e before -ezero -e after ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 45 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -on -f ./testdata/greplist -e binary ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 46 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -eabc -e '(unclosed' ./testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 47 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Fx "AB.VE +elephant" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 48 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -F "AB.VE +elephant" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 49 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -F -e DATA -e "AB.VE +elephant" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 50 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep "^(abc|def|ghi|jkl)" ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 51 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mv "brown\sfox" ./testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 52 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --colour=always jumps ./testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 53 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --file-offsets 'before|zero|after' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 54 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --line-offsets 'before|zero|after' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 55 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -f./testdata/greplist --color=always ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 56 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -c lazy ./testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 57 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -c -l lazy ./testdata/grepinput*) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 58 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --regex=PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 59 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --regexp=PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 60 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --regex PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 61 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --regexp PATTERN ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 62 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $pcre2grep --match-limit=1000 --no-jit -M 'This is a file(.|\R)*file.' ./testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 63 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $pcre2grep --recursion-limit=1000 --no-jit -M 'This is a file(.|\R)*file.' ./testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 64 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o1 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 65 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o2 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 66 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o3 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 67 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o12 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 68 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --only-matching=2 '(?<=PAT)TERN (ap(pear)s)' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 69 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -vn --colour=always pattern ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 70 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --color=always -M "triple:\t.*\n\n" ./testdata/grepinput3) >>testtrygrep +echo "RC=$?" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --color=always -M -n "triple:\t.*\n\n" ./testdata/grepinput3) >>testtrygrep +echo "RC=$?" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -M "triple:\t.*\n\n" ./testdata/grepinput3) >>testtrygrep +echo "RC=$?" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -M -n "triple:\t.*\n\n" ./testdata/grepinput3) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 71 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o "^01|^02|^03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 72 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --color=always "^01|^02|^03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 73 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o --colour=always "^01|^02|^03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 74 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o "^01|02|^03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 75 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --color=always "^01|02|^03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 76 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o --colour=always "^01|02|^03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 77 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o "^01|^02|03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 78 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --color=always "^01|^02|03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 79 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o --colour=always "^01|^02|03" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 80 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o "\b01|\b02" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 81 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --color=always "\\b01|\\b02" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 82 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o --colour=always "\\b01|\\b02" ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 83 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --buffer-size=10 --max-buffer-size=100 "^a" ./testdata/grepinput3) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 84 -----------------------------" >>testtrygrep +echo testdata/grepinput3 >testtemp1grep +(cd $srcdir; $valgrind $vjs $pcre2grep --file-list ./testdata/grepfilelist --file-list $builddir/testtemp1grep "fox|complete|t7") >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 85 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --file-list=./testdata/grepfilelist "dolor" ./testdata/grepinput3) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 86 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep "dog" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 87 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep "cat" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 88 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -v "cat" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 89 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -I "dog" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 90 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --binary-files=without-match "dog" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 91 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -a "dog" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 92 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --binary-files=text "dog" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 93 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --text "dog" ./testdata/grepbinary) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 94 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --include=grepinputx --include grepinput8 'fox' ./testdata/grepinput* | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 95 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --file-list ./testdata/grepfilelist --exclude grepinputv "fox|complete") >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 96 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --include-dir=testdata --exclude '^(?!grepinput)' --exclude=grepinputM 'fox' ./test* | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 97 -----------------------------" >>testtrygrep +echo "grepinput$" >testtemp1grep +echo "grepinput8" >>testtemp1grep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --include=grepinput --exclude=grepinputM --exclude-from $builddir/testtemp1grep --exclude-dir='^\.' 
'fox' ./testdata | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 98 -----------------------------" >>testtrygrep +echo "grepinput$" >testtemp1grep +echo "grepinput8" >>testtemp1grep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --exclude=grepinput3 --exclude=grepinputM --include=grepinput --exclude-from $builddir/testtemp1grep --exclude-dir='^\.' 'fox' ./testdata | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 99 -----------------------------" >>testtrygrep +echo "grepinput$" >testtemp1grep +echo "grepinput8" >testtemp2grep +(cd $srcdir; $valgrind $vjs $pcre2grep -L -r --include grepinput --exclude=grepinputM --exclude-from $builddir/testtemp1grep --exclude-from=$builddir/testtemp2grep --exclude-dir='^\.' 'fox' ./testdata | sort) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 100 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Ho2 --only-matching=1 -o3 '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 101 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o3 -Ho2 -o12 --only-matching=1 -o3 --colour=always --om-separator='|' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 102 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -n "^$" ./testdata/grepinput3) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 103 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --only-matching "^$" ./testdata/grepinput3) >>testtrygrep 2>&1 +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 104 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -n --only-matching "^$" ./testdata/grepinput3) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 105 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --colour=always "ipsum|" ./testdata/grepinput3) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 106 -----------------------------" >>testtrygrep +(cd $srcdir; echo "a" | $valgrind $vjs $pcre2grep -M "|a" ) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 107 -----------------------------" >>testtrygrep +echo "a" >testtemp1grep +echo "aaaaa" >>testtemp1grep +(cd $srcdir; $valgrind $vjs $pcre2grep --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 108 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 109 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -cq lazy ./testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 110 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --om-separator / -Mo0 -o1 -o2 'match (\d+):\n (.)\n' testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 111 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --line-offsets -M 'match (\d+):\n (.)\n' testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 112 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --file-offsets -M 'match (\d+):\n (.)\n' testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 113 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --total-count 'the' testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 114 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -tc 'the' testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 115 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -tlc 'the' testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 116 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep --exclude=grepinputM -th 'the' testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 117 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -tch 'the' testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 118 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -tL 'the' testdata/grepinput*) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 119 -----------------------------" >>testtrygrep +printf '123\n456\n789\n---abc\ndef\nxyz\n---\n' >testNinputgrep +$valgrind $vjs $pcre2grep -Mo '(\n|[^-])*---' testNinputgrep >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 120 ------------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -HO '$0:$2$1$3' '(\w+) binary (\w+)(\.)?' ./testdata/grepinput) >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 121 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -F '\E and (regex)' testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 122 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -w 'cat|dog' testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 123 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -w 'dog|cat' testdata/grepinputv) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 124 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mn --colour=always 'start[\s]+end' testdata/grepinputM) >>testtrygrep +echo "RC=$?" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mn --colour=always -A2 'start[\s]+end' testdata/grepinputM) >>testtrygrep +echo "RC=$?" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mn 'start[\s]+end' testdata/grepinputM) >>testtrygrep +echo "RC=$?" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -Mn -A2 'start[\s]+end' testdata/grepinputM) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 125 -----------------------------" >>testtrygrep +printf 'abcd\n' >testNinputgrep +$valgrind $vjs $pcre2grep --colour=always '(?<=\K.)' testNinputgrep >>testtrygrep +echo "RC=$?" >>testtrygrep +$valgrind $vjs $pcre2grep --colour=always '(?=.\K)' testNinputgrep >>testtrygrep +echo "RC=$?" >>testtrygrep +$valgrind $vjs $pcre2grep --colour=always '(?<=\K[ac])' testNinputgrep >>testtrygrep +echo "RC=$?" >>testtrygrep +$valgrind $vjs $pcre2grep --colour=always '(?=[ac]\K)' testNinputgrep >>testtrygrep +echo "RC=$?" 
>>testtrygrep + +echo "---------------------------- Test 126 -----------------------------" >>testtrygrep +printf 'Next line pattern has binary zero\nABC\0XYZ\n' >testtemp1grep +printf 'ABC\0XYZ\nABCDEF\nDEFABC\n' >testtemp2grep +$valgrind $vjs $pcre2grep -a -f testtemp1grep testtemp2grep >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 127 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o --om-capture=0 'pattern()()()()' testdata/grepinput) >>testtrygrep +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 128 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o1 --om-capture=0 'pattern()()()()' testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 129 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -m 2 'fox' testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 130 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -o -m2 'fox' testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 131 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -oc -m2 'fox' testdata/grepinput) >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 132 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -m1 -A3 '^match'; echo '---'; head -1) <$srcdir/testdata/grepinput >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +echo "---------------------------- Test 133 -----------------------------" >>testtrygrep +(cd $srcdir; $valgrind $vjs $pcre2grep -m1 -O '=$x{41}$x423$o{103}$o1045=' 'fox') <$srcdir/testdata/grepinputv >>testtrygrep 2>&1 +echo "RC=$?" >>testtrygrep + +# Now compare the results. 
+ +$cf $srcdir/testdata/grepoutput testtrygrep +if [ $? != 0 ] ; then exit 1; fi + + +# These tests require UTF-8 support + +if [ $utf8 -ne 0 ] ; then + echo "Testing pcre2grep UTF-8 features" + + echo "---------------------------- Test U1 ------------------------------" >testtrygrep + (cd $srcdir; $valgrind $vjs $pcre2grep -n -u --newline=any "^X" ./testdata/grepinput8) >>testtrygrep + echo "RC=$?" >>testtrygrep + + echo "---------------------------- Test U2 ------------------------------" >>testtrygrep + (cd $srcdir; $valgrind $vjs $pcre2grep -n -u -C 3 --newline=any "Match" ./testdata/grepinput8) >>testtrygrep + echo "RC=$?" >>testtrygrep + + echo "---------------------------- Test U3 ------------------------------" >>testtrygrep + (cd $srcdir; $valgrind $vjs $pcre2grep --line-offsets -u --newline=any '(?<=\K\x{17f})' ./testdata/grepinput8) >>testtrygrep + echo "RC=$?" >>testtrygrep + + echo "---------------------------- Test U4 ------------------------------" >>testtrygrep + printf 'A\341\200\200\200CD\342\200\200Z\n' >testtemp1grep + (cd $srcdir; $valgrind $vjs $pcre2grep -u -o '....' $builddir/testtemp1grep) >>testtrygrep 2>&1 + echo "RC=$?" >>testtrygrep + + echo "---------------------------- Test U5 ------------------------------" >>testtrygrep + printf 'A\341\200\200\200CD\342\200\200Z\n' >testtemp1grep + (cd $srcdir; $valgrind $vjs $pcre2grep -U -o '....' $builddir/testtemp1grep) >>testtrygrep + echo "RC=$?" >>testtrygrep + + echo "---------------------------- Test U6 -----------------------------" >>testtrygrep + (cd $srcdir; $valgrind $vjs $pcre2grep -u -m1 -O '=$x{1d3}$o{744}=' 'fox') <$srcdir/testdata/grepinputv >>testtrygrep 2>&1 + echo "RC=$?" >>testtrygrep + + $cf $srcdir/testdata/grepoutput8 testtrygrep + if [ $? 
!= 0 ] ; then exit 1; fi + +else + echo "Skipping pcre2grep UTF-8 tests: no UTF-8 support in PCRE2 library" +fi + + +# We go to some contortions to try to ensure that the tests for the various +# newline settings will work in environments where the normal newline sequence +# is not \n. Do not use exported files, whose line endings might be changed. +# Instead, create an input file using printf so that its contents are exactly +# what we want. Note the messy fudge to get printf to write a string that +# starts with a hyphen. These tests are run in the build directory. + +echo "Testing pcre2grep newline settings" +printf 'abc\rdef\r\nghi\njkl' >testNinputgrep + +printf '%c--------------------------- Test N1 ------------------------------\r\n' - >testtrygrep +$valgrind $vjs $pcre2grep -n -N CR "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +printf '%c--------------------------- Test N2 ------------------------------\r\n' - >>testtrygrep +$valgrind $vjs $pcre2grep -n --newline=crlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +printf '%c--------------------------- Test N3 ------------------------------\r\n' - >>testtrygrep +pattern=`printf 'def\rjkl'` +$valgrind $vjs $pcre2grep -n --newline=cr -F "$pattern" testNinputgrep >>testtrygrep + +printf '%c--------------------------- Test N4 ------------------------------\r\n' - >>testtrygrep +$valgrind $vjs $pcre2grep -n --newline=crlf -F -f $srcdir/testdata/greppatN4 testNinputgrep >>testtrygrep + +printf '%c--------------------------- Test N5 ------------------------------\r\n' - >>testtrygrep +$valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +printf '%c--------------------------- Test N6 ------------------------------\r\n' - >>testtrygrep +$valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +# This next test involves NUL characters. It seems impossible to handle them +# easily in many operating systems. 
An earlier version of this script used sed +# to translate NUL into the string ZERO, but this didn't work on Solaris (aka +# SunOS), where the version of sed explicitly doesn't like them, and also MacOS +# (Darwin), OpenBSD, FreeBSD, NetBSD, and some Linux distributions like Alpine, +# even when using GNU sed. A user suggested using tr instead, which +# necessitates translating to a single character (@). However, on (some +# versions of?) Solaris, the normal "tr" cannot handle binary zeros, but if +# /usr/xpg4/bin/tr is available, it can do so, so test for that. + +if [ -x /usr/xpg4/bin/tr ] ; then + tr=/usr/xpg4/bin/tr +else + tr=tr +fi + +printf '%c--------------------------- Test N7 ------------------------------\r\n' - >>testtrygrep +printf 'abc\0def' >testNinputgrep +$valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | $tr '\000' '@' >>testtrygrep +echo "" >>testtrygrep + +$cf $srcdir/testdata/grepoutputN testtrygrep +if [ $? != 0 ] ; then exit 1; fi + +# If pcre2grep supports script callouts, run some tests on them. It is possible +# to restrict these callouts to the non-fork case, either for security, or for +# environments that do not support fork(). This is handled by comparing to a +# different output. 
+ +if $valgrind $vjs $pcre2grep --help | $valgrind $vjs $pcre2grep -q 'callout scripts in patterns are supported'; then + echo "Testing pcre2grep script callouts" + $valgrind $vjs $pcre2grep '(T)(..(.))(?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4) ($14) ($0)")()' $srcdir/testdata/grepinputv >testtrygrep + $valgrind $vjs $pcre2grep '(T)(..(.))()()()()()()()(..)(?C"/bin/echo|Arg1: [$11] [${11}]")' $srcdir/testdata/grepinputv >>testtrygrep + $valgrind $vjs $pcre2grep '(T)(?C"|$0:$1$n")' $srcdir/testdata/grepinputv >>testtrygrep + $valgrind $vjs $pcre2grep '(T)(?C"|$1$n")(*F)' $srcdir/testdata/grepinputv >>testtrygrep + $valgrind $vjs $pcre2grep -m1 '(T)(?C"|$0:$1:$x{41}$o{101}$n")' $srcdir/testdata/grepinputv >>testtrygrep + + if $valgrind $vjs $pcre2grep --help | $valgrind $vjs $pcre2grep -q 'Non-fork callout scripts in patterns are supported'; then + $cf $srcdir/testdata/grepoutputCN testtrygrep + else + $cf $srcdir/testdata/grepoutputC testtrygrep + fi + + if [ $? != 0 ] ; then exit 1; fi +else + echo "Script callouts are not supported" +fi + +# Finally, some tests to exercise code that is not tested above, just to be +# sure that it runs OK. Doing this improves the coverage statistics. The output +# is not checked. + +echo "Testing miscellaneous pcre2grep arguments (unchecked)" +echo '' >testtrygrep +checkspecial '-xxxxx' 2 +checkspecial '--help' 0 +checkspecial '--line-buffered --colour=auto abc /dev/null' 1 + +# Clean up local working files +rm -f testNinputgrep teststderrgrep testtrygrep testtemp1grep testtemp2grep + +exit 0 + +# End diff --git a/src/pcre2/RunGrepTest.bat b/src/pcre2/RunGrepTest.bat new file mode 100644 index 00000000..4a095a36 --- /dev/null +++ b/src/pcre2/RunGrepTest.bat @@ -0,0 +1,699 @@ +@echo off + +:: Run pcre2grep tests. The assumption is that the PCRE2 tests check the library +:: itself. What we are checking here is the file handling and options that are +:: supported by pcre2grep. 
This script must be run in the build directory. +:: (jmh: I've only tested in the main directory, using my own builds.) + +setlocal enabledelayedexpansion + +:: Remove any non-default colouring that the caller may have set. + +set PCRE2GREP_COLOUR= +set PCRE2GREP_COLOR= +set PCREGREP_COLOUR= +set PCREGREP_COLOR= +set GREP_COLORS= +set GREP_COLOR= + +:: Remember the current (build) directory and set the program to be tested. + +set builddir="%CD%" +set pcre2grep=%builddir%\pcre2grep.exe +set pcre2test=%builddir%\pcre2test.exe + +if NOT exist %pcre2grep% ( + echo ** %pcre2grep% does not exist. + exit /b 1 +) + +if NOT exist %pcre2test% ( + echo ** %pcre2test% does not exist. + exit /b 1 +) + +for /f "delims=" %%a in ('"%pcre2grep%" -V') do set pcre2grep_version=%%a +echo Testing %pcre2grep_version% + +:: Set up a suitable "diff" command for comparison. Some systems have a diff +:: that lacks a -u option. Try to deal with this; better do the test for the -b +:: option as well. Use FC if there's no diff, taking care to ignore equality. + +set cf= +set cfout= +diff -b nul nul 2>nul && set cf=diff -b +diff -u nul nul 2>nul && set cf=diff -u +diff -ub nul nul 2>nul && set cf=diff -ub +if NOT defined cf ( + set cf=fc /n + set "cfout=>testcf || (type testcf & cmd /c exit /b 1)" +) + +:: Set srcdir to the current or parent directory, whichever one contains the +:: test data. Subsequently, we run most of the pcre2grep tests in the source +:: directory so that the file names in the output are always the same. + +if NOT defined srcdir set srcdir=. +if NOT exist %srcdir%\testdata\ ( + if exist testdata\ ( + set srcdir=. + ) else if exist ..\testdata\ ( + set srcdir=.. + ) else if exist ..\..\testdata\ ( + set srcdir=..\.. + ) else ( + echo Cannot find the testdata directory + exit /b 1 + ) +) + +:: Check for the availability of UTF-8 support + +%pcre2test% -C unicode >nul +set utf8=%ERRORLEVEL% + +:: Check default newline convention. If it does not include LF, force LF. 
+ +for /f %%a in ('"%pcre2test%" -C newline') do set nl=%%a +if NOT "%nl%" == "LF" if NOT "%nl%" == "ANY" if NOT "%nl%" == "ANYCRLF" ( + set pcre2grep=%pcre2grep% -N LF + echo Default newline setting forced to LF +) + +:: Create a simple printf via cscript/JScript (an actual printf may translate +:: LF to CRLF, which this one does not). + +echo WScript.StdOut.Write(WScript.Arguments(0).replace(/\\r/g, "\r").replace(/\\n/g, "\n")) >printf.js +set printf=cscript //nologo printf.js + +:: ------ Normal tests ------ + +echo Testing pcre2grep main features + +echo ---------------------------- Test 1 ------------------------------>testtrygrep +(pushd %srcdir% & %pcre2grep% PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 2 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% "^PATTERN" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 3 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -in PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 4 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -ic PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 5 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -in PATTERN ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 6 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -inh PATTERN ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 7 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -il PATTERN 
./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 8 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -l PATTERN ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 9 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -q PATTERN ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 10 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -q NEVER-PATTERN ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 11 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -vn pattern ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 12 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -ix pattern ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 13 ----------------------------->>testtrygrep +echo seventeen >testtemp1grep +(pushd %srcdir% & %pcre2grep% -f./testdata/greplist -f %builddir%\testtemp1grep ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 14 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -w pat ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 15 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% "abc^*" ./testdata/grepinput & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 16 
----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% abc ./testdata/grepinput ./testdata/nonexistfile & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 17 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -M "the\noutput" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 18 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -Mn "(the\noutput|dog\.\n--)" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 19 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -Mix "Pattern" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 20 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -Mixn "complete pair\nof lines" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 21 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -nA3 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 22 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -nB3 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 23 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -C3 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 24 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -A9 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 25 
----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -nB9 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 26 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -A9 -B9 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 27 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -A10 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 28 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -nB10 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 29 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -C12 -B10 "four" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 30 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -inB3 "pattern" ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 31 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -inA3 "pattern" ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 32 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -L "fox" ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 33 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% "fox" ./testdata/grepnonexist & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 34 
----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -s "fox" ./testdata/grepnonexist & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 35 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -L -r --include=grepinputx --include grepinput8 --exclude-dir="^\." "fox" ./testdata | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 36 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -L -r --include=grepinput --exclude "grepinput$" --exclude=grepinput8 --exclude-dir="^\." "fox" ./testdata | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 37 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% "^(a+)*\d" ./testdata/grepinput & popd) >>testtrygrep 2>teststderrgrep +echo RC=^%ERRORLEVEL%>>testtrygrep +echo ======== STDERR ========>>testtrygrep +type teststderrgrep >>testtrygrep + +echo ---------------------------- Test 38 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% ">\x00<" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 39 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -A1 "before the binary zero" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 40 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -B1 "after the binary zero" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 41 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -B1 -o "\w+ the binary zero" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 42 
------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -B1 -onH "\w+ the binary zero" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 43 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -on "before|zero|after" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 44 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -on -e before -ezero -e after ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 45 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -on -f ./testdata/greplist -e binary ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 46 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -eabc -e "(unclosed" ./testdata/grepinput & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 47 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -Fx AB.VE^ + +elephant ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 48 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -F AB.VE^ + +elephant ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 49 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -F -e DATA -e AB.VE^ + +elephant ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 50 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% "^(abc|def|ghi|jkl)" ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + 
+echo ---------------------------- Test 51 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -Mv "brown\sfox" ./testdata/grepinputv & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 52 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% --colour=always jumps ./testdata/grepinputv & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 53 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% --file-offsets "before|zero|after" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 54 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% --line-offsets "before|zero|after" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 55 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -f./testdata/greplist --color=always ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 56 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -c lazy ./testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 57 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -c -l lazy ./testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 58 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --regex=PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 59 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --regexp=PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo 
---------------------------- Test 60 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --regex PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 61 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --regexp PATTERN ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 62 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --match-limit=1000 --no-jit -M "This is a file(.|\R)*file." ./testdata/grepinput & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 63 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --recursion-limit=1000 --no-jit -M "This is a file(.|\R)*file." ./testdata/grepinput & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 64 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -o1 "(?<=PAT)TERN (ap(pear)s)" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 65 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -o2 "(?<=PAT)TERN (ap(pear)s)" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 66 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -o3 "(?<=PAT)TERN (ap(pear)s)" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 67 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -o12 "(?<=PAT)TERN (ap(pear)s)" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 68 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% 
--only-matching=2 "(?<=PAT)TERN (ap(pear)s)" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 69 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -vn --colour=always pattern ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 70 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --color=always -M "triple:\t.*\n\n" ./testdata/grepinput3 & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 71 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o "^01|^02|^03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 72 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --color=always "^01|^02|^03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 73 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o --colour=always "^01|^02|^03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 74 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o "^01|02|^03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 75 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --color=always "^01|02|^03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 76 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o --colour=always "^01|02|^03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 77 
----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o "^01|^02|03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 78 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --color=always "^01|^02|03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 79 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o --colour=always "^01|^02|03" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 80 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o "\b01|\b02" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 81 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --color=always "\b01|\b02" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 82 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -o --colour=always "\b01|\b02" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 83 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --buffer-size=10 --max-buffer-size=100 "^a" ./testdata/grepinput3 & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 84 ----------------------------->>testtrygrep +echo testdata/grepinput3 >testtemp1grep +(pushd %srcdir% & %pcre2grep% --file-list ./testdata/grepfilelist --file-list %builddir%\testtemp1grep "fox|complete|t7" & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 85 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% 
--file-list=./testdata/grepfilelist "dolor" ./testdata/grepinput3 & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 86 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% "dog" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 87 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% "cat" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 88 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -v "cat" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 89 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -I "dog" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 90 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --binary-files=without-match "dog" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 91 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -a "dog" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 92 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --binary-files=text "dog" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 93 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --text "dog" ./testdata/grepbinary & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 94 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -L -r 
--include=grepinputx --include grepinput8 "fox" ./testdata/grepinput* | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 95 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --file-list ./testdata/grepfilelist --exclude grepinputv "fox|complete" & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 96 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -L -r --include-dir=testdata --exclude "^^(?^!grepinput)" "fox" ./test* | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 97 ----------------------------->>testtrygrep +echo grepinput$>testtemp1grep +echo grepinput8>>testtemp1grep +(pushd %srcdir% & %pcre2grep% -L -r --include=grepinput --exclude-from %builddir%\testtemp1grep --exclude-dir="^\." "fox" ./testdata | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 98 ----------------------------->>testtrygrep +echo grepinput$>testtemp1grep +echo grepinput8>>testtemp1grep +(pushd %srcdir% & %pcre2grep% -L -r --exclude=grepinput3 --include=grepinput --exclude-from %builddir%\testtemp1grep --exclude-dir="^\." "fox" ./testdata | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 99 ----------------------------->>testtrygrep +echo grepinput$>testtemp1grep +echo grepinput8>testtemp2grep +(pushd %srcdir% & %pcre2grep% -L -r --include grepinput --exclude-from %builddir%\testtemp1grep --exclude-from=%builddir%\testtemp2grep --exclude-dir="^\." "fox" ./testdata | sort & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 100 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -Ho2 --only-matching=1 -o3 "(\w+) binary (\w+)(\.)?" 
./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 101 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -o3 -Ho2 -o12 --only-matching=1 -o3 --colour=always --om-separator="|" "(\w+) binary (\w+)(\.)?" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 102 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -n "^$" ./testdata/grepinput3 & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 103 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --only-matching "^$" ./testdata/grepinput3 & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 104 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -n --only-matching "^$" ./testdata/grepinput3 & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 105 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --colour=always "ipsum|" ./testdata/grepinput3 & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 106 ----------------------------->>testtrygrep +(pushd %srcdir% & echo a| %pcre2grep% -M "|a" & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 107 ----------------------------->>testtrygrep +echo a>testtemp1grep +echo aaaaa>>testtemp1grep +(pushd %srcdir% & %pcre2grep% --line-offsets "(?<=\Ka)" %builddir%\testtemp1grep & popd) >>testtrygrep 2>&1 +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 108 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -lq PATTERN ./testdata/grepinput ./testdata/grepinputx & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo 
---------------------------- Test 109 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -cq lazy ./testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 110 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --om-separator / -Mo0 -o1 -o2 "match (\d+):\n (.)\n" testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 111 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --line-offsets -M "match (\d+):\n (.)\n" testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 112 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --file-offsets -M "match (\d+):\n (.)\n" testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 113 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% --total-count "the" testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 114 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -tc "the" testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 115 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -tlc "the" testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 116 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -th "the" testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 117 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -tch "the" testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo 
---------------------------- Test 118 ----------------------------->>testtrygrep +(pushd %srcdir% & %pcre2grep% -tL "the" testdata/grepinput* & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 119 ----------------------------->>testtrygrep +%printf% "123\n456\n789\n---abc\ndef\nxyz\n---\n" >testNinputgrep +%pcre2grep% -Mo "(\n|[^-])*---" testNinputgrep >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +echo ---------------------------- Test 120 ------------------------------>>testtrygrep +(pushd %srcdir% & %pcre2grep% -HO "$0:$2$1$3" "(\w+) binary (\w+)(\.)?" ./testdata/grepinput & popd) >>testtrygrep +echo RC=^%ERRORLEVEL%>>testtrygrep + +:: Now compare the results. + +%cf% %srcdir%\testdata\grepoutput testtrygrep %cfout% +if ERRORLEVEL 1 exit /b 1 + + +:: These tests require UTF-8 support + +if %utf8% neq 0 ( + echo Testing pcre2grep UTF-8 features + + echo ---------------------------- Test U1 ------------------------------>testtrygrep + (pushd %srcdir% & %pcre2grep% -n -u --newline=any "^X" ./testdata/grepinput8 & popd) >>testtrygrep + echo RC=^%ERRORLEVEL%>>testtrygrep + + echo ---------------------------- Test U2 ------------------------------>>testtrygrep + (pushd %srcdir% & %pcre2grep% -n -u -C 3 --newline=any "Match" ./testdata/grepinput8 & popd) >>testtrygrep + echo RC=^%ERRORLEVEL%>>testtrygrep + + echo ---------------------------- Test U3 ------------------------------>>testtrygrep + (pushd %srcdir% & %pcre2grep% --line-offsets -u --newline=any "(?<=\K\x{17f})" ./testdata/grepinput8 & popd) >>testtrygrep + echo RC=^%ERRORLEVEL%>>testtrygrep + + %cf% %srcdir%\testdata\grepoutput8 testtrygrep %cfout% + if ERRORLEVEL 1 exit /b 1 + +) else ( + echo Skipping pcre2grep UTF-8 tests: no UTF-8 support in PCRE2 library +) + + +:: We go to some contortions to try to ensure that the tests for the various +:: newline settings will work in environments where the normal newline sequence +:: is not \n. 
Do not use exported files, whose line endings might be changed. +:: Instead, create an input file so that its contents are exactly what we want. +:: These tests are run in the build directory. + +echo Testing pcre2grep newline settings +%printf% "abc\rdef\r\nghi\njkl" >testNinputgrep + +echo ---------------------------- Test N1 ------------------------------>testtrygrep +%pcre2grep% -n -N CR "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +echo ---------------------------- Test N2 ------------------------------>>testtrygrep +%pcre2grep% -n --newline=crlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +echo ---------------------------- Test N3 ------------------------------>>testtrygrep +for /f %%a in ('%printf% "def\rjkl"') do set pattern=%%a +%pcre2grep% -n --newline=cr -F "!pattern!" testNinputgrep >>testtrygrep + +echo ---------------------------- Test N4 ------------------------------>>testtrygrep +%pcre2grep% -n --newline=crlf -F -f %srcdir%/testdata/greppatN4 testNinputgrep >>testtrygrep + +echo ---------------------------- Test N5 ------------------------------>>testtrygrep +%pcre2grep% -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +echo ---------------------------- Test N6 ------------------------------>>testtrygrep +%pcre2grep% -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep + +%cf% %srcdir%\testdata\grepoutputN testtrygrep %cfout% +if ERRORLEVEL 1 exit /b 1 + +:: If pcre2grep supports script callouts, run some tests on them. 
+ +%pcre2grep% --help | %pcre2grep% -q "callout scripts in patterns are supported" +if %ERRORLEVEL% equ 0 ( + echo Testing pcre2grep script callouts + %pcre2grep% "(T)(..(.))(?C'cmd|/c echo|Arg1: [$1] [$2] [$3]|Arg2: ^$|${1}^$| ($4) ($14) ($0)')()" %srcdir%/testdata/grepinputv >testtrygrep + %pcre2grep% "(T)(..(.))()()()()()()()(..)(?C'cmd|/c echo|Arg1: [$11] [${11}]')" %srcdir%/testdata/grepinputv >>testtrygrep + %pcre2grep% "(T)(?C'|$0:$1$n')" %srcdir%/testdata/grepinputv >>testtrygrep + %pcre2grep% "(T)(?C'|$1$n')(*F)" %srcdir%/testdata/grepinputv >>testtrygrep + %pcre2grep% --help | %pcre2grep% -q "Non-script callout scripts in patterns are supported" + if %ERRORLEVEL% equ 0 ( + %cf% %srcdir%\testdata\grepoutputCN testtrygrep %cfout% + ) else ( + %cf% %srcdir%\testdata\grepoutputC testtrygrep %cfout% + ) + if ERRORLEVEL 1 exit /b 1 +) else ( + echo Script callouts are not supported +) + +:: Finally, some tests to exercise code that is not tested above, just to be +:: sure that it runs OK. Doing this improves the coverage statistics. The output +:: is not checked. + +echo Testing miscellaneous pcre2grep arguments (unchecked) +%printf% "" >testtrygrep +call :checkspecial "-xxxxx" 2 || exit /b 1 +call :checkspecial "--help" 0 || exit /b 1 +call :checkspecial "--line-buffered --colour=auto abc nul" 1 || exit /b 1 + +:: Clean up local working files +del testcf printf.js testNinputgrep teststderrgrep testtrygrep testtemp1grep testtemp2grep + +exit /b 0 + +:: ------ Function to run and check a special pcre2grep arguments test ------- + +:checkspecial + %pcre2grep% %~1 >>testtrygrep 2>&1 + if %ERRORLEVEL% neq %2 ( + echo ** pcre2grep %~1 failed - check testtrygrep + exit /b 1 + ) + exit /b 0 + +:: End diff --git a/src/pcre2/RunTest b/src/pcre2/RunTest new file mode 100755 index 00000000..14d9f60f --- /dev/null +++ b/src/pcre2/RunTest @@ -0,0 +1,869 @@ +#! 
/bin/sh + +############################################################################### +# Run the PCRE2 tests using the pcre2test program. The appropriate tests are +# selected, depending on which build-time options were used. +# +# When JIT support is available, all appropriate tests are run with and without +# JIT, unless "-nojit" is given on the command line. There are also two tests +# for JIT-specific features, one to be run when JIT support is available +# (unless "-nojit" is specified), and one when it is not. +# +# Whichever of the 8-, 16- and 32-bit libraries exist are tested. It is also +# possible to select which to test by giving "-8", "-16" or "-32" on the +# command line. +# +# As well as "-nojit", "-8", "-16", and "-32", arguments for this script are +# individual test numbers, ranges of tests such as 3-6 or 3- (meaning 3 to the +# end), or a number preceded by ~ to exclude a test. For example, "3-15 ~10" +# runs tests 3 to 15, excluding test 10, and just "~10" runs all the tests +# except test 10. Whatever order the arguments are in, the tests are always run +# in numerical order. +# +# Inappropriate tests are automatically skipped (with a comment to say so). For +# example, if JIT support is not compiled, test 16 is skipped, whereas if JIT +# support is compiled, test 15 is skipped. +# +# Other arguments can be one of the words "-valgrind", "-valgrind-log", or +# "-sim" followed by an argument to run cross-compiled executables under a +# simulator, for example: +# +# RunTest 3 -sim "qemu-arm -s 8388608" +# +# For backwards compatibility, -nojit, -valgrind, -valgrind-log, and -sim may +# be given without the leading "-" character. +# +# When PCRE2 is compiled by clang with -fsanitize arguments, some tests need +# very much more stack than normal. In environments where the stack can be +# set at runtime, -bigstack sets a gigantic stack. 
+# +# There are two special cases where only one argument is allowed: +# +# If the first and only argument is "ebcdic", the script runs the special +# EBCDIC test that can be useful for checking certain EBCDIC features, even +# when run in an ASCII environment. PCRE2 must be built with EBCDIC support for +# this test to be run. +# +# If the script is obeyed as "RunTest list", a list of available tests is +# output, but none of them are run. +############################################################################### + +# Define test titles in variables so that they can be output as a list. Some +# of them are modified (e.g. with -8 or -16) when used in the actual tests. + +title0="Test 0: Unchecked pcre2test argument tests (to improve coverage)" +title1="Test 1: Main non-UTF, non-UCP functionality (compatible with Perl >= 5.10)" +title2="Test 2: API, errors, internals and non-Perl stuff" +title3="Test 3: Locale-specific features" +title4A="Test 4: UTF" +title4B=" and Unicode property support (compatible with Perl >= 5.10)" +title5A="Test 5: API, internals, and non-Perl stuff for UTF" +title5B=" and UCP support" +title6="Test 6: DFA matching main non-UTF, non-UCP functionality" +title7A="Test 7: DFA matching with UTF" +title7B=" and Unicode property support" +title8="Test 8: Internal offsets and code size tests" +title9="Test 9: Specials for the basic 8-bit library" +title10="Test 10: Specials for the 8-bit library with UTF-8 and UCP support" +title11="Test 11: Specials for the basic 16-bit and 32-bit libraries" +title12="Test 12: Specials for the 16-bit and 32-bit libraries UTF and UCP support" +title13="Test 13: DFA specials for the basic 16-bit and 32-bit libraries" +title14="Test 14: DFA specials for UTF and UCP support" +title15="Test 15: Non-JIT limits and other non-JIT tests" +title16="Test 16: JIT-specific features when JIT is not available" +title17="Test 17: JIT-specific features when JIT is available" +title18="Test 18: Tests of the POSIX interface, 
excluding UTF/UCP" +title19="Test 19: Tests of the POSIX interface with UTF/UCP" +title20="Test 20: Serialization and code copy tests" +title21="Test 21: \C tests without UTF (supported for DFA matching)" +title22="Test 22: \C tests with UTF (not supported for DFA matching)" +title23="Test 23: \C disabled test" +title24="Test 24: Non-UTF pattern conversion tests" +title25="Test 25: UTF pattern conversion tests" +maxtest=25 + +if [ $# -eq 1 -a "$1" = "list" ]; then + echo $title0 + echo $title1 + echo $title2 "(not UTF or UCP)" + echo $title3 + echo $title4A $title4B + echo $title5A $title5B + echo $title6 + echo $title7A $title7B + echo $title8 + echo $title9 + echo $title10 + echo $title11 + echo $title12 + echo $title13 + echo $title14 + echo $title15 + echo $title16 + echo $title17 + echo $title18 + echo $title19 + echo $title20 + echo $title21 + echo $title22 + echo $title23 + echo $title24 + echo $title25 + exit 0 +fi + +# Set up a suitable "diff" command for comparison. Some systems +# have a diff that lacks a -u option. Try to deal with this. + +cf="diff" +diff -u /dev/null /dev/null 2>/dev/null && cf="diff -u" + +# Find the test data + +if [ -n "$srcdir" -a -d "$srcdir" ] ; then + testdata="$srcdir/testdata" +elif [ -d "./testdata" ] ; then + testdata=./testdata +elif [ -d "../testdata" ] ; then + testdata=../testdata +else + echo "Cannot find the testdata directory" + exit 1 +fi + + +# ------ Function to check results of a test ------- + +# This function is called with three parameters: +# +# $1 the value of $? after a call to pcre2test +# $2 the suffix of the output file to compare with +# $3 the $opt value (empty, -jit, or -dfa) +# +# Note: must define using name(), not "function name", for Solaris. + +checkresult() + { + if [ $1 -ne 0 ] ; then + echo "** pcre2test failed - check testtry" + exit 1 + fi + case "$3" in + -jit) with=" with JIT";; + -dfa) with=" with DFA";; + *) with="";; + esac + $cf $testdata/testoutput$2 testtry + if [ $? 
!= 0 ] ; then + echo "" + echo "** Test $2 failed$with" + exit 1 + fi + echo " OK$with" + } + + +# ------ Function to run and check a special pcre2test arguments test ------- + +checkspecial() + { + $valgrind $vjs ./pcre2test $1 >>testtry + if [ $? -ne 0 ] ; then + echo "** pcre2test $1 failed - check testtry" + exit 1 + fi + } + + +# ------ Special EBCDIC Test ------- + +if [ $# -eq 1 -a "$1" = "ebcdic" ]; then + $valgrind ./pcre2test -C ebcdic >/dev/null + ebcdic=$? + if [ $ebcdic -ne 1 ] ; then + echo "Cannot run EBCDIC tests: EBCDIC support not compiled" + exit 1 + fi + for opt in "" "-dfa"; do + ./pcre2test -q $opt $testdata/testinputEBC >testtry + checkresult $? EBC "$opt" + done +exit 0 +fi + + +# ------ Normal Tests ------ + +# Default values + +arg8= +arg16= +arg32= +nojit= +bigstack= +sim= +skip= +valgrind= +vjs= + +# This is in case the caller has set aliases (as I do - PH) +unset cp ls mv rm + +# Process options and select which tests to run; for those that are explicitly +# requested, check that the necessary optional facilities are available. 
+ +do0=no +do1=no +do2=no +do3=no +do4=no +do5=no +do6=no +do7=no +do8=no +do9=no +do10=no +do11=no +do12=no +do13=no +do14=no +do15=no +do16=no +do17=no +do18=no +do19=no +do20=no +do21=no +do22=no +do23=no +do24=no +do25=no + +while [ $# -gt 0 ] ; do + case $1 in + 0) do0=yes;; + 1) do1=yes;; + 2) do2=yes;; + 3) do3=yes;; + 4) do4=yes;; + 5) do5=yes;; + 6) do6=yes;; + 7) do7=yes;; + 8) do8=yes;; + 9) do9=yes;; + 10) do10=yes;; + 11) do11=yes;; + 12) do12=yes;; + 13) do13=yes;; + 14) do14=yes;; + 15) do15=yes;; + 16) do16=yes;; + 17) do17=yes;; + 18) do18=yes;; + 19) do19=yes;; + 20) do20=yes;; + 21) do21=yes;; + 22) do22=yes;; + 23) do23=yes;; + 24) do24=yes;; + 25) do25=yes;; + -8) arg8=yes;; + -16) arg16=yes;; + -32) arg32=yes;; + bigstack|-bigstack) bigstack=yes;; + nojit|-nojit) nojit=yes;; + sim|-sim) shift; sim=$1;; + valgrind|-valgrind) valgrind="valgrind --tool=memcheck -q --smc-check=all-non-file";; + valgrind-log|-valgrind-log) valgrind="valgrind --tool=memcheck --num-callers=30 --leak-check=no --error-limit=no --smc-check=all-non-file --log-file=report.%p ";; + ~*) + if expr "$1" : '~[0-9][0-9]*$' >/dev/null; then + skip="$skip `expr "$1" : '~\([0-9]*\)*$'`" + else + echo "Unknown option or test selector '$1'"; exit 1 + fi + ;; + *-*) + if expr "$1" : '[0-9][0-9]*-[0-9]*$' >/dev/null; then + tf=`expr "$1" : '\([0-9]*\)'` + tt=`expr "$1" : '.*-\([0-9]*\)'` + if [ "$tt" = "" ] ; then tt=$maxtest; fi + if expr \( "$tt" ">" "$maxtest" \) >/dev/null; then + echo "Invalid test range '$1'"; exit 1 + fi + while expr "$tf" "<=" "$tt" >/dev/null; do + eval do${tf}=yes + tf=`expr $tf + 1` + done + else + echo "Invalid test range '$1'"; exit 1 + fi + ;; + *) echo "Unknown option or test selector '$1'"; exit 1;; + esac + shift +done + +# Find which optional facilities are available. + +$sim ./pcre2test -C linksize >/dev/null +link_size=$? 
+if [ $link_size -lt 2 ] ; then + echo "RunTest: Failed to find internal link size" + exit 1 +fi +if [ $link_size -gt 4 ] ; then + echo "RunTest: Failed to find internal link size" + exit 1 +fi + +# If it is possible to set the system stack size and -bigstack was given, +# set up a large stack. + +$sim ./pcre2test -S 64 /dev/null /dev/null +if [ $? -eq 0 -a "$bigstack" != "" ] ; then + setstack="-S 64" +else + setstack="" +fi + +# All of 8-bit, 16-bit, and 32-bit character strings may be supported, but only +# one need be. + +$sim ./pcre2test -C pcre2-8 >/dev/null +support8=$? +$sim ./pcre2test -C pcre2-16 >/dev/null +support16=$? +$sim ./pcre2test -C pcre2-32 >/dev/null +support32=$? + +# \C may be disabled + +$sim ./pcre2test -C backslash-C >/dev/null +supportBSC=$? + +# Initialize all bitsizes skipped + +test8=skip +test16=skip +test32=skip + +# If no bitsize arguments, select all that are available + +if [ "$arg8$arg16$arg32" = "" ] ; then + if [ $support8 -ne 0 ] ; then + test8=-8 + fi + if [ $support16 -ne 0 ] ; then + test16=-16 + fi + if [ $support32 -ne 0 ] ; then + test32=-32 + fi + +# Otherwise, select requested bit sizes + +else + if [ "$arg8" = yes ] ; then + if [ $support8 -eq 0 ] ; then + echo "Cannot run 8-bit library tests: 8-bit library not compiled" + exit 1 + fi + test8=-8 + fi + if [ "$arg16" = yes ] ; then + if [ $support16 -eq 0 ] ; then + echo "Cannot run 16-bit library tests: 16-bit library not compiled" + exit 1 + fi + test16=-16 + fi + if [ "$arg32" = yes ] ; then + if [ $support32 -eq 0 ] ; then + echo "Cannot run 32-bit library tests: 32-bit library not compiled" + exit 1 + fi + test32=-32 + fi +fi + +# UTF support is implied by Unicode support, and it always applies to all bit +# sizes if both are supported; we can't have UTF-8 support without UTF-16 or +# UTF-32 support. + +$sim ./pcre2test -C unicode >/dev/null +utf=$? 
+ +# When JIT is used with valgrind, we need to set up valgrind suppressions as +# otherwise there are a lot of false positive valgrind reports when the +# hardware supports SSE2. + +jitopt= +$sim ./pcre2test -C jit >/dev/null +jit=$? +if [ $jit -ne 0 -a "$nojit" != "yes" ] ; then + jitopt=-jit + if [ "$valgrind" != "" ] ; then + vjs="--suppressions=$testdata/valgrind-jit.supp" + fi +fi + +# If no specific tests were requested, select all. Those that are not +# relevant will be automatically skipped. + +if [ $do0 = no -a $do1 = no -a $do2 = no -a $do3 = no -a \ + $do4 = no -a $do5 = no -a $do6 = no -a $do7 = no -a \ + $do8 = no -a $do9 = no -a $do10 = no -a $do11 = no -a \ + $do12 = no -a $do13 = no -a $do14 = no -a $do15 = no -a \ + $do16 = no -a $do17 = no -a $do18 = no -a $do19 = no -a \ + $do20 = no -a $do21 = no -a $do22 = no -a $do23 = no -a \ + $do24 = no -a $do25 = no \ + ]; then + do0=yes + do1=yes + do2=yes + do3=yes + do4=yes + do5=yes + do6=yes + do7=yes + do8=yes + do9=yes + do10=yes + do11=yes + do12=yes + do13=yes + do14=yes + do15=yes + do16=yes + do17=yes + do18=yes + do19=yes + do20=yes + do21=yes + do22=yes + do23=yes + do24=yes + do25=yes +fi + +# Handle any explicit skips at this stage, so that an argument list may consist +# only of explicit skips. + +for i in $skip; do eval do$i=no; done + +# Show which release and which test data + +echo "" +echo PCRE2 C library tests using test data from $testdata +$sim ./pcre2test /dev/null +echo "" + +for bmode in "$test8" "$test16" "$test32"; do + case "$bmode" in + skip) continue;; + -16) if [ "$test8$test32" != "skipskip" ] ; then echo ""; fi + bits=16; echo "---- Testing 16-bit library ----"; echo "";; + -32) if [ "$test8$test16" != "skipskip" ] ; then echo ""; fi + bits=32; echo "---- Testing 32-bit library ----"; echo "";; + -8) bits=8; echo "---- Testing 8-bit library ----"; echo "";; + esac + + # Test 0 is a special test.
Its output is not checked, because it will + # be different on different hardware and with different configurations. + # Running this test just exercises the code. + + if [ $do0 = yes ] ; then + echo $title0 + echo '/abc/jit,memory,framesize' >testSinput + echo ' abc' >>testSinput + echo '' >testtry + checkspecial '-C' + checkspecial '--help' + checkspecial '-S 1 -t 10 testSinput' + echo " OK" + fi + + # Primary non-UTF test, compatible with JIT and all versions of Perl >= 5.8 + + if [ $do1 = yes ] ; then + echo $title1 + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput1 testtry + checkresult $? 1 "$opt" + done + fi + + # PCRE2 tests that are not Perl-compatible: API, errors, internals. We copy + # the testbtables file to the current directory for use by this test. + + if [ $do2 = yes ] ; then + echo $title2 "(excluding UTF-$bits)" + cp $testdata/testbtables . + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput2 testtry + saverc=$? + if [ $saverc = 0 ] ; then + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $bmode $opt -error -70,-62,-2,-1,0,100,101,191,200 >>testtry + checkresult $? 2 "$opt" + else + checkresult $saverc 2 "$opt" + fi + done + fi + + # Locale-specific tests, provided that either the "fr_FR", "fr_CA", "french" + # or "fr" locale is available. The first two are Unix-like standards; the + # last two are for Windows. Unfortunately, different versions of the French + # locale give different outputs for some items. This test passes if the + # output matches any one of the alternative output files. + + if [ $do3 = yes ] ; then + locale= + + # In some environments locales that are listed by the "locale -a" + # command do not seem to work with setlocale(). Therefore, we do + # a preliminary test to see if pcre2test can set one before going + # on to use it. 
+ + for loc in 'fr_FR' 'french' 'fr' 'fr_CA'; do + locale -a | grep "^$loc\$" >/dev/null + if [ $? -eq 0 ] ; then + echo "/a/locale=$loc" | \ + $sim $valgrind ./pcre2test -q $bmode | \ + grep "Failed to set locale" >/dev/null + if [ $? -ne 0 ] ; then + locale=$loc + if [ "$locale" = "fr_FR" ] ; then + infile=$testdata/testinput3 + outfile=$testdata/testoutput3 + outfile2=$testdata/testoutput3A + outfile3=$testdata/testoutput3B + else + infile=test3input + outfile=test3output + outfile2=test3outputA + outfile3=test3outputB + sed "s/fr_FR/$loc/" $testdata/testinput3 >test3input + sed "s/fr_FR/$loc/" $testdata/testoutput3 >test3output + sed "s/fr_FR/$loc/" $testdata/testoutput3A >test3outputA + sed "s/fr_FR/$loc/" $testdata/testoutput3B >test3outputB + fi + break + fi + fi + done + + if [ "$locale" != "" ] ; then + echo $title3 "(using '$locale' locale)" + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $infile testtry + if [ $? = 0 ] ; then + case "$opt" in + -jit) with=" with JIT";; + *) with="";; + esac + if $cf $outfile testtry >teststdout || \ + $cf $outfile2 testtry >teststdout || \ + $cf $outfile3 testtry >teststdout + then + echo " OK$with" + else + echo "** Locale test did not run successfully$with. The output did not match" + echo " $outfile, $outfile2 or $outfile3." + echo " This may mean that there is a problem with the locale settings rather" + echo " than a bug in PCRE2." + exit 1 + fi + else exit 1 + fi + done + else + echo "Cannot test locale-specific features - none of the 'fr_FR', 'fr_CA'," + echo "'fr' or 'french' locales can be set, or the \"locale\" command is" + echo "not available to check for them." 
+ echo " " + fi + fi + + # Tests for UTF and Unicode property support + + if [ $do4 = yes ] ; then + echo ${title4A}-${bits}${title4B} + if [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput4 testtry + checkresult $? 4 "$opt" + done + fi + fi + + if [ $do5 = yes ] ; then + echo ${title5A}-${bits}$title5B + if [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput5 testtry + checkresult $? 5 "$opt" + done + fi + fi + + # Tests for DFA matching support + + if [ $do6 = yes ] ; then + echo $title6 + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput6 testtry + checkresult $? 6 "" + fi + + if [ $do7 = yes ] ; then + echo ${title7A}-${bits}$title7B + if [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $opt $testdata/testinput7 testtry + checkresult $? 7 "" + fi + fi + + # Test of internal offsets and code sizes. This test is run only when there + # is UTF/UCP support. The actual tests are mostly the same as in some of the + # above, but in this test we inspect some offsets and sizes. This is a + # doublecheck for the maintainer, just in case something changes unexpectedly. + # The output from this test is different in 8-bit, 16-bit, and 32-bit modes + # and for different link sizes, so there are different output files for each + # mode and link size. + + if [ $do8 = yes ] ; then + echo $title8 + if [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput8 testtry + checkresult $?
8-$bits-$link_size "" + fi + fi + + # Tests for 8-bit-specific features + + if [ "$do9" = yes ] ; then + echo $title9 + if [ "$bits" = "16" -o "$bits" = "32" ] ; then + echo " Skipped when running 16/32-bit tests" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput9 testtry + checkresult $? 9 "$opt" + done + fi + fi + + # Tests for UTF-8 and UCP 8-bit-specific features + + if [ "$do10" = yes ] ; then + echo $title10 + if [ "$bits" = "16" -o "$bits" = "32" ] ; then + echo " Skipped when running 16/32-bit tests" + elif [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput10 testtry + checkresult $? 10 "$opt" + done + fi + fi + + # Tests for 16-bit and 32-bit features. Output is different for the two widths. + + if [ $do11 = yes ] ; then + echo $title11 + if [ "$bits" = "8" ] ; then + echo " Skipped when running 8-bit tests" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput11 testtry + checkresult $? 11-$bits "$opt" + done + fi + fi + + # Tests for 16-bit and 32-bit features with UTF-16/32 and UCP support. Output + # is different for the two widths. + + if [ $do12 = yes ] ; then + echo $title12 + if [ "$bits" = "8" ] ; then + echo " Skipped when running 8-bit tests" + elif [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput12 testtry + checkresult $? 
12-$bits "$opt" + done + fi + fi + + # Tests for 16/32-bit-specific features in DFA non-UTF modes + + if [ $do13 = yes ] ; then + echo $title13 + if [ "$bits" = "8" ] ; then + echo " Skipped when running 8-bit tests" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput13 testtry + checkresult $? 13 "" + fi + fi + + # Tests for DFA UTF and UCP features. Output is different for the different widths. + + if [ $do14 = yes ] ; then + echo $title14 + if [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $opt $testdata/testinput14 testtry + checkresult $? 14-$bits "" + fi + fi + + # Test non-JIT match and recursion limits + + if [ $do15 = yes ] ; then + echo $title15 + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput15 testtry + checkresult $? 15 "" + fi + + # Test JIT-specific features when JIT is not available + + if [ $do16 = yes ] ; then + echo $title16 + if [ $jit -ne 0 ] ; then + echo " Skipped because JIT is available" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput16 testtry + checkresult $? 16 "" + fi + fi + + # Test JIT-specific features when JIT is available + + if [ $do17 = yes ] ; then + echo $title17 + if [ $jit -eq 0 -o "$nojit" = "yes" ] ; then + echo " Skipped because JIT is not available or nojit was specified" + else + $sim $valgrind $vjs ./pcre2test -q $setstack $bmode $testdata/testinput17 testtry + checkresult $? 17 "" + fi + fi + + # Tests for the POSIX interface without UTF/UCP (8-bit only) + + if [ $do18 = yes ] ; then + echo $title18 + if [ "$bits" = "16" -o "$bits" = "32" ] ; then + echo " Skipped when running 16/32-bit tests" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput18 testtry + checkresult $? 
18 "" + fi + fi + + # Tests for the POSIX interface with UTF/UCP (8-bit only) + + if [ $do19 = yes ] ; then + echo $title19 + if [ "$bits" = "16" -o "$bits" = "32" ] ; then + echo " Skipped when running 16/32-bit tests" + elif [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput19 testtry + checkresult $? 19 "" + fi + fi + + # Serialization tests + + if [ $do20 = yes ] ; then + echo $title20 + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput20 testtry + checkresult $? 20 "" + fi + + # \C tests without UTF - DFA matching is supported + + if [ "$do21" = yes ] ; then + echo $title21 + if [ $supportBSC -eq 0 ] ; then + echo " Skipped because \C is disabled" + else + for opt in "" $jitopt -dfa; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput21 testtry + checkresult $? 21 "$opt" + done + fi + fi + + # \C tests with UTF - DFA matching is not supported for \C in UTF mode + + if [ "$do22" = yes ] ; then + echo $title22 + if [ $supportBSC -eq 0 ] ; then + echo " Skipped because \C is disabled" + elif [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + for opt in "" $jitopt; do + $sim $valgrind ${opt:+$vjs} ./pcre2test -q $setstack $bmode $opt $testdata/testinput22 testtry + checkresult $? 22-$bits "$opt" + done + fi + fi + + # Test when \C is disabled + + if [ "$do23" = yes ] ; then + echo $title23 + if [ $supportBSC -ne 0 ] ; then + echo " Skipped because \C is not disabled" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput23 testtry + checkresult $? 23 "" + fi + fi + + # Non-UTF pattern conversion tests + + if [ "$do24" = yes ] ; then + echo $title24 + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput24 testtry + checkresult $? 
24 "" + fi + + # UTF pattern conversion tests + + if [ "$do25" = yes ] ; then + echo $title25 + if [ $utf -eq 0 ] ; then + echo " Skipped because UTF-$bits support is not available" + else + $sim $valgrind ./pcre2test -q $setstack $bmode $testdata/testinput25 testtry + checkresult $? 25 "" + fi + fi + +# End of loop for 8/16/32-bit tests +done + +# Clean up local working files +rm -f testbtables testSinput test3input testsaved1 testsaved2 test3output test3outputA test3outputB teststdout teststderr testtry + +# End diff --git a/src/pcre2/RunTest.bat b/src/pcre2/RunTest.bat new file mode 100644 index 00000000..791f2650 --- /dev/null +++ b/src/pcre2/RunTest.bat @@ -0,0 +1,525 @@ +@echo off +@rem +@rem MS Windows batch file to run pcre2test on testfiles with the correct +@rem options. This file must use CRLF linebreaks to function properly, +@rem and requires both pcre2test and pcre2grep. +@rem +@rem ------------------------ HISTORY ---------------------------------- +@rem This file was originally contributed to PCRE1 by Ralf Junker, and touched +@rem up by Daniel Richard G. Tests 10-12 added by Philip H. +@rem Philip H also changed test 3 to use "wintest" files. +@rem +@rem Updated by Tom Fortmann to support explicit test numbers on the command +@rem line. Added argument validation and added error reporting. +@rem +@rem Sheri Pierce added logic to skip feature dependent tests +@rem tests 4 5 7 10 12 14 19 and 22 require Unicode support +@rem 8 requires Unicode and link size 2 +@rem 16 requires absence of jit support +@rem 17 requires presence of jit support +@rem Sheri P also added override tests for study and jit testing +@rem Zoltan Herczeg added libpcre16 support +@rem Zoltan Herczeg added libpcre32 support +@rem ------------------------------------------------------------------- +@rem +@rem The file was converted for PCRE2 by PH, February 2015. +@rem Updated for new test 14 (moving others up a number), August 2015. 
+@rem Tidied and updated for new tests 21, 22, 23 by PH, October 2015. +@rem PH added missing "set type" for test 22, April 2016. +@rem PH added copy command for new testbtables file, November 2020 + + +setlocal enabledelayedexpansion +if [%srcdir%]==[] ( +if exist testdata\ set srcdir=.) +if [%srcdir%]==[] ( +if exist ..\testdata\ set srcdir=..) +if [%srcdir%]==[] ( +if exist ..\..\testdata\ set srcdir=..\..) +if NOT exist %srcdir%\testdata\ ( +echo Error: distribution testdata folder not found! +call :conferror +exit /b 1 +goto :eof +) + +if [%pcre2test%]==[] set pcre2test=.\pcre2test.exe + +echo source dir is %srcdir% +echo pcre2test=%pcre2test% + +if NOT exist %pcre2test% ( +echo Error: %pcre2test% not found! +echo. +call :conferror +exit /b 1 +) + +%pcre2test% -C linksize >NUL +set link_size=%ERRORLEVEL% +%pcre2test% -C pcre2-8 >NUL +set support8=%ERRORLEVEL% +%pcre2test% -C pcre2-16 >NUL +set support16=%ERRORLEVEL% +%pcre2test% -C pcre2-32 >NUL +set support32=%ERRORLEVEL% +%pcre2test% -C unicode >NUL +set unicode=%ERRORLEVEL% +%pcre2test% -C jit >NUL +set jit=%ERRORLEVEL% +%pcre2test% -C backslash-C >NUL +set supportBSC=%ERRORLEVEL% + +if %support8% EQU 1 ( +if not exist testout8 md testout8 +if not exist testoutjit8 md testoutjit8 +) + +if %support16% EQU 1 ( +if not exist testout16 md testout16 +if not exist testoutjit16 md testoutjit16 +) + +if %support32% EQU 1 ( +if not exist testout32 md testout32 +if not exist testoutjit32 md testoutjit32 +) + +set do1=no +set do2=no +set do3=no +set do4=no +set do5=no +set do6=no +set do7=no +set do8=no +set do9=no +set do10=no +set do11=no +set do12=no +set do13=no +set do14=no +set do15=no +set do16=no +set do17=no +set do18=no +set do19=no +set do20=no +set do21=no +set do22=no +set do23=no +set all=yes + +for %%a in (%*) do ( + set valid=no + for %%v in (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23) do if %%v == %%a set valid=yes + if "!valid!"
== "yes" ( + set do%%a=yes + set all=no +) else ( + echo Invalid test number - %%a! + echo Usage %0 [ test_number ] ... + echo Where test_number is one or more optional test numbers 1 through 23, default is all tests. + exit /b 1 +) +) +set failed="no" + +if "%all%" == "yes" ( + set do1=yes + set do2=yes + set do3=yes + set do4=yes + set do5=yes + set do6=yes + set do7=yes + set do8=yes + set do9=yes + set do10=yes + set do11=yes + set do12=yes + set do13=yes + set do14=yes + set do15=yes + set do16=yes + set do17=yes + set do18=yes + set do19=yes + set do20=yes + set do21=yes + set do22=yes + set do23=yes +) + +@echo RunTest.bat's pcre2test output is written to newly created subfolders +@echo named testout{8,16,32} and testoutjit{8,16,32}. +@echo. + +set mode= +set bits=8 + +:nextMode +if "%mode%" == "" ( + if %support8% EQU 0 goto modeSkip + echo. + echo ---- Testing 8-bit library ---- + echo. +) +if "%mode%" == "-16" ( + if %support16% EQU 0 goto modeSkip + echo. + echo ---- Testing 16-bit library ---- + echo. +) +if "%mode%" == "-32" ( + if %support32% EQU 0 goto modeSkip + echo. + echo ---- Testing 32-bit library ---- + echo. 
+) +if "%do1%" == "yes" call :do1 +if "%do2%" == "yes" call :do2 +if "%do3%" == "yes" call :do3 +if "%do4%" == "yes" call :do4 +if "%do5%" == "yes" call :do5 +if "%do6%" == "yes" call :do6 +if "%do7%" == "yes" call :do7 +if "%do8%" == "yes" call :do8 +if "%do9%" == "yes" call :do9 +if "%do10%" == "yes" call :do10 +if "%do11%" == "yes" call :do11 +if "%do12%" == "yes" call :do12 +if "%do13%" == "yes" call :do13 +if "%do14%" == "yes" call :do14 +if "%do15%" == "yes" call :do15 +if "%do16%" == "yes" call :do16 +if "%do17%" == "yes" call :do17 +if "%do18%" == "yes" call :do18 +if "%do19%" == "yes" call :do19 +if "%do20%" == "yes" call :do20 +if "%do21%" == "yes" call :do21 +if "%do22%" == "yes" call :do22 +if "%do23%" == "yes" call :do23 +:modeSkip +if "%mode%" == "" ( + set mode=-16 + set bits=16 + goto nextMode +) +if "%mode%" == "-16" ( + set mode=-32 + set bits=32 + goto nextMode +) + +@rem If mode is -32, testing is finished +if %failed% == "yes" ( +echo In above output, one or more of the various tests failed! +exit /b 1 +) +echo All OK +goto :eof + +:runsub +@rem Function to execute pcre2test and compare the output +@rem Arguments are as follows: +@rem +@rem 1 = test number +@rem 2 = outputdir +@rem 3 = test name use double quotes +@rem 4 - 9 = pcre2test options + +if [%1] == [] ( + echo Missing test number argument! + exit /b 1 +) + +if [%2] == [] ( + echo Missing outputdir! + exit /b 1 +) + +if [%3] == [] ( + echo Missing test name argument! + exit /b 1 +) + +if %1 == 8 ( + set outnum=8-%bits%-%link_size% +) else ( + set outnum=%1 +) +set testinput=testinput%1 +set testoutput=testoutput%outnum% +if exist %srcdir%\testdata\win%testinput% ( + set testinput=wintestinput%1 + set testoutput=wintestoutput%outnum% +) + +echo Test %1: %3 +%pcre2test% %mode% %4 %5 %6 %7 %8 %9 %srcdir%\testdata\%testinput% >%2%bits%\%testoutput% +if errorlevel 1 ( + echo. failed executing command-line: + echo. 
%pcre2test% %mode% %4 %5 %6 %7 %8 %9 %srcdir%\testdata\%testinput% ^>%2%bits%\%testoutput% + set failed="yes" + goto :eof +) else if [%1]==[2] ( + %pcre2test% %mode% %4 %5 %6 %7 %8 %9 -error -70,-62,-2,-1,0,100,101,191,200 >>%2%bits%\%testoutput% +) + +set type= +if [%1]==[11] ( + set type=-%bits% +) +if [%1]==[12] ( + set type=-%bits% +) +if [%1]==[14] ( + set type=-%bits% +) +if [%1]==[22] ( + set type=-%bits% +) + +fc /n %srcdir%\testdata\%testoutput%%type% %2%bits%\%testoutput% >NUL + +if errorlevel 1 ( + echo. failed comparison: fc /n %srcdir%\testdata\%testoutput% %2%bits%\%testoutput% + if [%1]==[3] ( + echo. + echo ** Test 3 failure usually means french locale is not + echo ** available on the system, rather than a bug or problem with PCRE2. + echo. + goto :eof +) + + set failed="yes" + goto :eof +) + +echo. Passed. +goto :eof + +:do1 +call :runsub 1 testout "Main non-UTF, non-UCP functionality (Compatible with Perl >= 5.10)" -q +if %jit% EQU 1 call :runsub 1 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do2 + copy /y %srcdir%\testdata\testbtables testbtables + call :runsub 2 testout "API, errors, internals, and non-Perl stuff" -q + if %jit% EQU 1 call :runsub 2 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do3 + call :runsub 3 testout "Locale-specific features" -q + if %jit% EQU 1 call :runsub 3 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do4 +if %unicode% EQU 0 ( + echo Test 4 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 4 testout "UTF-%bits% and Unicode property support - (Compatible with Perl >= 5.10)" -q + if %jit% EQU 1 call :runsub 4 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do5 +if %unicode% EQU 0 ( + echo Test 5 Skipped due to absence of Unicode support. 
+ goto :eof +) + call :runsub 5 testout "API, internals, and non-Perl stuff for UTF-%bits% and UCP" -q + if %jit% EQU 1 call :runsub 5 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do6 + call :runsub 6 testout "DFA matching main non-UTF, non-UCP functionality" -q +goto :eof + +:do7 +if %unicode% EQU 0 ( + echo Test 7 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 7 testout "DFA matching with UTF-%bits% and Unicode property support" -q + goto :eof + +:do8 +if NOT %link_size% EQU 2 ( + echo Test 8 Skipped because link size is not 2. + goto :eof +) +if %unicode% EQU 0 ( + echo Test 8 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 8 testout "Internal offsets and code size tests" -q +goto :eof + +:do9 +if NOT %bits% EQU 8 ( + echo Test 9 Skipped when running 16/32-bit tests. + goto :eof +) + call :runsub 9 testout "Specials for the basic 8-bit library" -q + if %jit% EQU 1 call :runsub 9 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do10 +if NOT %bits% EQU 8 ( + echo Test 10 Skipped when running 16/32-bit tests. + goto :eof +) +if %unicode% EQU 0 ( + echo Test 10 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 10 testout "Specials for the 8-bit library with Unicode support" -q + if %jit% EQU 1 call :runsub 10 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do11 +if %bits% EQU 8 ( + echo Test 11 Skipped when running 8-bit tests. + goto :eof +) + call :runsub 11 testout "Specials for the basic 16/32-bit library" -q + if %jit% EQU 1 call :runsub 11 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do12 +if %bits% EQU 8 ( + echo Test 12 Skipped when running 8-bit tests. + goto :eof +) +if %unicode% EQU 0 ( + echo Test 12 Skipped due to absence of Unicode support. 
+ goto :eof +) + call :runsub 12 testout "Specials for the 16/32-bit library with Unicode support" -q + if %jit% EQU 1 call :runsub 12 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do13 +if %bits% EQU 8 ( + echo Test 13 Skipped when running 8-bit tests. + goto :eof +) + call :runsub 13 testout "DFA specials for the basic 16/32-bit library" -q +goto :eof + +:do14 +if %unicode% EQU 0 ( + echo Test 14 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 14 testout "DFA specials for UTF and UCP support" -q + goto :eof + +:do15 +call :runsub 15 testout "Non-JIT limits and other non_JIT tests" -q +goto :eof + +:do16 +if %jit% EQU 1 ( + echo Test 16 Skipped due to presence of JIT support. + goto :eof +) + call :runsub 16 testout "JIT-specific features when JIT is not available" -q +goto :eof + +:do17 +if %jit% EQU 0 ( + echo Test 17 Skipped due to absence of JIT support. + goto :eof +) + call :runsub 17 testout "JIT-specific features when JIT is available" -q +goto :eof + +:do18 +if %bits% EQU 16 ( + echo Test 18 Skipped when running 16-bit tests. + goto :eof +) +if %bits% EQU 32 ( + echo Test 18 Skipped when running 32-bit tests. + goto :eof +) + call :runsub 18 testout "POSIX interface, excluding UTF-8 and UCP" -q +goto :eof + +:do19 +if %bits% EQU 16 ( + echo Test 19 Skipped when running 16-bit tests. + goto :eof +) +if %bits% EQU 32 ( + echo Test 19 Skipped when running 32-bit tests. + goto :eof +) +if %unicode% EQU 0 ( + echo Test 19 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 19 testout "POSIX interface with UTF-8 and UCP" -q +goto :eof + +:do20 +call :runsub 20 testout "Serialization tests" -q +goto :eof + +:do21 +if %supportBSC% EQU 0 ( + echo Test 21 Skipped due to absence of backslash-C support. 
+ goto :eof +) + call :runsub 21 testout "Backslash-C tests without UTF" -q + call :runsub 21 testout "Backslash-C tests without UTF (DFA)" -q -dfa + if %jit% EQU 1 call :runsub 21 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do22 +if %supportBSC% EQU 0 ( + echo Test 22 Skipped due to absence of backslash-C support. + goto :eof +) +if %unicode% EQU 0 ( + echo Test 22 Skipped due to absence of Unicode support. + goto :eof +) + call :runsub 22 testout "Backslash-C tests with UTF" -q + if %jit% EQU 1 call :runsub 22 testoutjit "Test with JIT Override" -q -jit +goto :eof + +:do23 +if %supportBSC% EQU 1 ( + echo Test 23 Skipped due to presence of backslash-C support. + goto :eof +) + call :runsub 23 testout "Backslash-C disabled test" -q +goto :eof + +:conferror +@echo. +@echo Either your build is incomplete or you have a configuration error. +@echo. +@echo If configured with cmake and executed via "make test" or the MSVC "RUN_TESTS" +@echo project, pcre2_test.bat defines variables and automatically calls RunTest.bat. +@echo For manual testing of all available features, after configuring with cmake +@echo and building, you can run the built pcre2_test.bat. For best results with +@echo cmake builds and tests avoid directories with full path names that include +@echo spaces for source or build. +@echo. +@echo Otherwise, if the build dir is in a subdir of the source dir, testdata needed +@echo for input and verification should be found automatically when (from the +@echo location of the built exes) you call RunTest.bat. By default RunTest.bat +@echo runs all tests compatible with the linked pcre2 library but it can be given +@echo a test number as an argument. +@echo. +@echo If the build dir is not under the source dir you can either copy your exes +@echo to the source folder or copy RunTest.bat and the testdata folder to the +@echo location of your built exes and then run RunTest.bat. +@echo.
+goto :eof diff --git a/src/pcre/aclocal.m4 b/src/pcre2/aclocal.m4 similarity index 97% rename from src/pcre/aclocal.m4 rename to src/pcre2/aclocal.m4 index 85368ab8..71611157 100644 --- a/src/pcre/aclocal.m4 +++ b/src/pcre2/aclocal.m4 @@ -1,6 +1,6 @@ -# generated automatically by aclocal 1.16.1 -*- Autoconf -*- +# generated automatically by aclocal 1.16.3 -*- Autoconf -*- -# Copyright (C) 1996-2018 Free Software Foundation, Inc. +# Copyright (C) 1996-2020 Free Software Foundation, Inc. # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -14,8 +14,8 @@ m4_ifndef([AC_CONFIG_MACRO_DIRS], [m4_defun([_AM_CONFIG_MACRO_DIRS], [])m4_defun([AC_CONFIG_MACRO_DIRS], [_AM_CONFIG_MACRO_DIRS($@)])]) m4_ifndef([AC_AUTOCONF_VERSION], [m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl -m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.69],, -[m4_warning([this file was generated for autoconf 2.69. +m4_if(m4_defn([AC_AUTOCONF_VERSION]), [2.71],, +[m4_warning([this file was generated for autoconf 2.71. You have another version of autoconf. It may work, but is not guaranteed to. If you have problems, you may need to regenerate the build system entirely. To do so, use the procedure documented by the package, typically 'autoreconf'.])]) @@ -364,7 +364,7 @@ AS_IF([test "$AS_TR_SH([with_]m4_tolower([$1]))" = "yes"], [AC_DEFINE([HAVE_][$1], 1, [Enable ]m4_tolower([$1])[ support])]) ])dnl PKG_HAVE_DEFINE_WITH_MODULES -# Copyright (C) 2002-2018 Free Software Foundation, Inc. +# Copyright (C) 2002-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -379,7 +379,7 @@ AC_DEFUN([AM_AUTOMAKE_VERSION], [am__api_version='1.16' dnl Some users find AM_AUTOMAKE_VERSION and mistake it for a way to dnl require some minimum version. Point them to the right macro. 
-m4_if([$1], [1.16.1], [], +m4_if([$1], [1.16.3], [], [AC_FATAL([Do not call $0, use AM_INIT_AUTOMAKE([$1]).])])dnl ]) @@ -395,12 +395,12 @@ m4_define([_AM_AUTOCONF_VERSION], []) # Call AM_AUTOMAKE_VERSION and AM_AUTOMAKE_VERSION so they can be traced. # This function is AC_REQUIREd by AM_INIT_AUTOMAKE. AC_DEFUN([AM_SET_CURRENT_AUTOMAKE_VERSION], -[AM_AUTOMAKE_VERSION([1.16.1])dnl +[AM_AUTOMAKE_VERSION([1.16.3])dnl m4_ifndef([AC_AUTOCONF_VERSION], [m4_copy([m4_PACKAGE_VERSION], [AC_AUTOCONF_VERSION])])dnl _AM_AUTOCONF_VERSION(m4_defn([AC_AUTOCONF_VERSION]))]) -# Copyright (C) 2011-2018 Free Software Foundation, Inc. +# Copyright (C) 2011-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -462,7 +462,7 @@ AC_SUBST([AR])dnl # AM_AUX_DIR_EXPAND -*- Autoconf -*- -# Copyright (C) 2001-2018 Free Software Foundation, Inc. +# Copyright (C) 2001-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -514,7 +514,7 @@ am_aux_dir=`cd "$ac_aux_dir" && pwd` # AM_CONDITIONAL -*- Autoconf -*- -# Copyright (C) 1997-2018 Free Software Foundation, Inc. +# Copyright (C) 1997-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -545,7 +545,7 @@ AC_CONFIG_COMMANDS_PRE( Usually this means the macro was only invoked conditionally.]]) fi])]) -# Copyright (C) 1999-2018 Free Software Foundation, Inc. +# Copyright (C) 1999-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -736,7 +736,7 @@ _AM_SUBST_NOTMAKE([am__nodep])dnl # Generate code to set up dependency tracking. -*- Autoconf -*- -# Copyright (C) 1999-2018 Free Software Foundation, Inc. 
+# Copyright (C) 1999-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -775,7 +775,9 @@ AC_DEFUN([_AM_OUTPUT_DEPENDENCY_COMMANDS], done if test $am_rc -ne 0; then AC_MSG_FAILURE([Something went wrong bootstrapping makefile fragments - for automatic dependency tracking. Try re-running configure with the + for automatic dependency tracking. If GNU make was not used, consider + re-running the configure script with MAKE="gmake" (or whatever is + necessary). You can also try re-running configure with the '--disable-dependency-tracking' option to at least be able to build the package (albeit without support for automatic dependency tracking).]) fi @@ -802,7 +804,7 @@ AC_DEFUN([AM_OUTPUT_DEPENDENCY_COMMANDS], # Do all the work for Automake. -*- Autoconf -*- -# Copyright (C) 1996-2018 Free Software Foundation, Inc. +# Copyright (C) 1996-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -999,7 +1001,7 @@ for _am_header in $config_headers :; do done echo "timestamp for $_am_arg" >`AS_DIRNAME(["$_am_arg"])`/stamp-h[]$_am_stamp_count]) -# Copyright (C) 2001-2018 Free Software Foundation, Inc. +# Copyright (C) 2001-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1020,7 +1022,7 @@ if test x"${install_sh+set}" != xset; then fi AC_SUBST([install_sh])]) -# Copyright (C) 2003-2018 Free Software Foundation, Inc. +# Copyright (C) 2003-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1041,7 +1043,7 @@ AC_SUBST([am__leading_dot])]) # Check to see how 'make' treats includes. -*- Autoconf -*- -# Copyright (C) 2001-2018 Free Software Foundation, Inc. 
+# Copyright (C) 2001-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1084,7 +1086,7 @@ AC_SUBST([am__quote])]) # Fake the existence of programs that GNU maintainers use. -*- Autoconf -*- -# Copyright (C) 1997-2018 Free Software Foundation, Inc. +# Copyright (C) 1997-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1105,12 +1107,7 @@ AC_DEFUN([AM_MISSING_HAS_RUN], [AC_REQUIRE([AM_AUX_DIR_EXPAND])dnl AC_REQUIRE_AUX_FILE([missing])dnl if test x"${MISSING+set}" != xset; then - case $am_aux_dir in - *\ * | *\ *) - MISSING="\${SHELL} \"$am_aux_dir/missing\"" ;; - *) - MISSING="\${SHELL} $am_aux_dir/missing" ;; - esac + MISSING="\${SHELL} '$am_aux_dir/missing'" fi # Use eval to expand $SHELL if eval "$MISSING --is-lightweight"; then @@ -1123,7 +1120,7 @@ fi # Helper functions for option handling. -*- Autoconf -*- -# Copyright (C) 2001-2018 Free Software Foundation, Inc. +# Copyright (C) 2001-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1152,7 +1149,7 @@ AC_DEFUN([_AM_SET_OPTIONS], AC_DEFUN([_AM_IF_OPTION], [m4_ifset(_AM_MANGLE_OPTION([$1]), [$2], [$3])]) -# Copyright (C) 1999-2018 Free Software Foundation, Inc. +# Copyright (C) 1999-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1199,7 +1196,7 @@ AC_LANG_POP([C])]) # For backward compatibility. AC_DEFUN_ONCE([AM_PROG_CC_C_O], [AC_REQUIRE([AC_PROG_CC])]) -# Copyright (C) 2001-2018 Free Software Foundation, Inc. +# Copyright (C) 2001-2020 Free Software Foundation, Inc. 
# # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1218,7 +1215,7 @@ AC_DEFUN([AM_RUN_LOG], # Check to make sure that the build environment is sane. -*- Autoconf -*- -# Copyright (C) 1996-2018 Free Software Foundation, Inc. +# Copyright (C) 1996-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1299,7 +1296,7 @@ AC_CONFIG_COMMANDS_PRE( rm -f conftest.file ]) -# Copyright (C) 2009-2018 Free Software Foundation, Inc. +# Copyright (C) 2009-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1359,7 +1356,7 @@ AC_SUBST([AM_BACKSLASH])dnl _AM_SUBST_NOTMAKE([AM_BACKSLASH])dnl ]) -# Copyright (C) 2001-2018 Free Software Foundation, Inc. +# Copyright (C) 2001-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1387,7 +1384,7 @@ fi INSTALL_STRIP_PROGRAM="\$(install_sh) -c -s" AC_SUBST([INSTALL_STRIP_PROGRAM])]) -# Copyright (C) 2006-2018 Free Software Foundation, Inc. +# Copyright (C) 2006-2020 Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1406,7 +1403,7 @@ AC_DEFUN([AM_SUBST_NOTMAKE], [_AM_SUBST_NOTMAKE($@)]) # Check how to create a tarball. -*- Autoconf -*- -# Copyright (C) 2004-2018 Free Software Foundation, Inc. +# Copyright (C) 2004-2020 Free Software Foundation, Inc. 
# # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -1543,4 +1540,4 @@ m4_include([m4/ltoptions.m4]) m4_include([m4/ltsugar.m4]) m4_include([m4/ltversion.m4]) m4_include([m4/lt~obsolete.m4]) -m4_include([m4/pcre_visibility.m4]) +m4_include([m4/pcre2_visibility.m4]) diff --git a/src/pcre/ar-lib b/src/pcre2/ar-lib similarity index 94% rename from src/pcre/ar-lib rename to src/pcre2/ar-lib index 0baa4f60..1e9388e2 100755 --- a/src/pcre/ar-lib +++ b/src/pcre2/ar-lib @@ -2,9 +2,9 @@ # Wrapper for Microsoft lib.exe me=ar-lib -scriptversion=2012-03-01.08; # UTC +scriptversion=2019-07-04.01; # UTC -# Copyright (C) 2010-2018 Free Software Foundation, Inc. +# Copyright (C) 2010-2020 Free Software Foundation, Inc. # Written by Peter Rosin . # # This program is free software; you can redistribute it and/or modify @@ -53,7 +53,7 @@ func_file_conv () MINGW*) file_conv=mingw ;; - CYGWIN*) + CYGWIN* | MSYS*) file_conv=cygwin ;; *) @@ -65,7 +65,7 @@ func_file_conv () mingw) file=`cmd //C echo "$file " | sed -e 's/"\(.*\) " *$/\1/'` ;; - cygwin) + cygwin | msys) file=`cygpath -m "$file" || echo "$file"` ;; wine) @@ -224,10 +224,11 @@ elif test -n "$extract"; then esac done else - $AR -NOLOGO -LIST "$archive" | sed -e 's/\\/\\\\/g' | while read member - do - $AR -NOLOGO -EXTRACT:"$member" "$archive" || exit $? - done + $AR -NOLOGO -LIST "$archive" | tr -d '\r' | sed -e 's/\\/\\\\/g' \ + | while read member + do + $AR -NOLOGO -EXTRACT:"$member" "$archive" || exit $? 
+ done fi elif test -n "$quick$replace"; then diff --git a/src/pcre/cmake/COPYING-CMAKE-SCRIPTS b/src/pcre2/cmake/COPYING-CMAKE-SCRIPTS similarity index 100% rename from src/pcre/cmake/COPYING-CMAKE-SCRIPTS rename to src/pcre2/cmake/COPYING-CMAKE-SCRIPTS diff --git a/src/pcre/cmake/FindEditline.cmake b/src/pcre2/cmake/FindEditline.cmake similarity index 100% rename from src/pcre/cmake/FindEditline.cmake rename to src/pcre2/cmake/FindEditline.cmake diff --git a/src/pcre/cmake/FindPackageHandleStandardArgs.cmake b/src/pcre2/cmake/FindPackageHandleStandardArgs.cmake similarity index 100% rename from src/pcre/cmake/FindPackageHandleStandardArgs.cmake rename to src/pcre2/cmake/FindPackageHandleStandardArgs.cmake diff --git a/src/pcre/cmake/FindReadline.cmake b/src/pcre2/cmake/FindReadline.cmake similarity index 100% rename from src/pcre/cmake/FindReadline.cmake rename to src/pcre2/cmake/FindReadline.cmake diff --git a/src/pcre/compile b/src/pcre2/compile similarity index 98% rename from src/pcre/compile rename to src/pcre2/compile index 99e50524..23fcba01 100755 --- a/src/pcre/compile +++ b/src/pcre2/compile @@ -3,7 +3,7 @@ scriptversion=2018-03-07.03; # UTC -# Copyright (C) 1999-2018 Free Software Foundation, Inc. +# Copyright (C) 1999-2020 Free Software Foundation, Inc. # Written by Tom Tromey . 
# # This program is free software; you can redistribute it and/or modify @@ -53,7 +53,7 @@ func_file_conv () MINGW*) file_conv=mingw ;; - CYGWIN*) + CYGWIN* | MSYS*) file_conv=cygwin ;; *) @@ -67,7 +67,7 @@ func_file_conv () mingw/*) file=`cmd //C echo "$file " | sed -e 's/"\(.*\) " *$/\1/'` ;; - cygwin/*) + cygwin/* | msys/*) file=`cygpath -m "$file" || echo "$file"` ;; wine/*) diff --git a/src/pcre2/config-cmake.h.in b/src/pcre2/config-cmake.h.in new file mode 100644 index 00000000..7766dd74 --- /dev/null +++ b/src/pcre2/config-cmake.h.in @@ -0,0 +1,58 @@ +/* config.h for CMake builds */ + +#cmakedefine HAVE_ATTRIBUTE_UNINITIALIZED 1 +#cmakedefine HAVE_DIRENT_H 1 +#cmakedefine HAVE_INTTYPES_H 1 +#cmakedefine HAVE_STDINT_H 1 +#cmakedefine HAVE_STRERROR 1 +#cmakedefine HAVE_SYS_STAT_H 1 +#cmakedefine HAVE_SYS_TYPES_H 1 +#cmakedefine HAVE_UNISTD_H 1 +#cmakedefine HAVE_WINDOWS_H 1 + +#cmakedefine HAVE_BCOPY 1 +#cmakedefine HAVE_MEMFD_CREATE 1 +#cmakedefine HAVE_MEMMOVE 1 +#cmakedefine HAVE_SECURE_GETENV 1 +#cmakedefine HAVE_STRERROR 1 + +#cmakedefine PCRE2_STATIC 1 + +#cmakedefine SUPPORT_PCRE2_8 1 +#cmakedefine SUPPORT_PCRE2_16 1 +#cmakedefine SUPPORT_PCRE2_32 1 +#cmakedefine PCRE2_DEBUG 1 +#cmakedefine DISABLE_PERCENT_ZT 1 + +#cmakedefine SUPPORT_LIBBZ2 1 +#cmakedefine SUPPORT_LIBEDIT 1 +#cmakedefine SUPPORT_LIBREADLINE 1 +#cmakedefine SUPPORT_LIBZ 1 + +#cmakedefine SUPPORT_JIT 1 +#cmakedefine SLJIT_PROT_EXECUTABLE_ALLOCATOR 1 +#cmakedefine SUPPORT_PCRE2GREP_JIT 1 +#cmakedefine SUPPORT_PCRE2GREP_CALLOUT 1 +#cmakedefine SUPPORT_PCRE2GREP_CALLOUT_FORK 1 +#cmakedefine SUPPORT_UNICODE 1 +#cmakedefine SUPPORT_VALGRIND 1 + +#cmakedefine BSR_ANYCRLF 1 +#cmakedefine EBCDIC 1 +#cmakedefine EBCDIC_NL25 1 +#cmakedefine HEAP_MATCH_RECURSE 1 +#cmakedefine NEVER_BACKSLASH_C 1 + +#define LINK_SIZE @PCRE2_LINK_SIZE@ +#define HEAP_LIMIT @PCRE2_HEAP_LIMIT@ +#define MATCH_LIMIT @PCRE2_MATCH_LIMIT@ +#define MATCH_LIMIT_DEPTH @PCRE2_MATCH_LIMIT_DEPTH@ +#define NEWLINE_DEFAULT 
@NEWLINE_DEFAULT@ +#define PARENS_NEST_LIMIT @PCRE2_PARENS_NEST_LIMIT@ +#define PCRE2GREP_BUFSIZE @PCRE2GREP_BUFSIZE@ +#define PCRE2GREP_MAX_BUFSIZE @PCRE2GREP_MAX_BUFSIZE@ + +#define MAX_NAME_SIZE 32 +#define MAX_NAME_COUNT 10000 + +/* end config.h for CMake builds */ diff --git a/src/pcre/config.guess b/src/pcre2/config.guess similarity index 71% rename from src/pcre/config.guess rename to src/pcre2/config.guess index 256083a7..0fc11edb 100755 --- a/src/pcre/config.guess +++ b/src/pcre2/config.guess @@ -1,8 +1,8 @@ #! /bin/sh # Attempt to guess a canonical system name. -# Copyright 1992-2018 Free Software Foundation, Inc. +# Copyright 1992-2020 Free Software Foundation, Inc. -timestamp='2018-03-08' +timestamp='2020-11-07' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -32,7 +32,7 @@ timestamp='2018-03-08' # Please send patches to . -me=`echo "$0" | sed -e 's,.*/,,'` +me=$(echo "$0" | sed -e 's,.*/,,') usage="\ Usage: $0 [OPTION] @@ -50,7 +50,7 @@ version="\ GNU config.guess ($timestamp) Originally written by Per Bothner. -Copyright 1992-2018 Free Software Foundation, Inc. +Copyright 1992-2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." @@ -84,8 +84,6 @@ if test $# != 0; then exit 1 fi -trap 'exit 1' 1 2 15 - # CC_FOR_BUILD -- compiler used by this script. Note that the use of a # compiler to aid in system detection is discouraged as it requires # temporary files to be created and, as you can see below, it is a @@ -96,41 +94,47 @@ trap 'exit 1' 1 2 15 # Portable tmp directory creation inspired by the Autoconf team. 
-set_cc_for_build=' -trap "exitcode=\$?; (rm -f \$tmpfiles 2>/dev/null; rmdir \$tmp 2>/dev/null) && exit \$exitcode" 0 ; -trap "rm -f \$tmpfiles 2>/dev/null; rmdir \$tmp 2>/dev/null; exit 1" 1 2 13 15 ; -: ${TMPDIR=/tmp} ; - { tmp=`(umask 077 && mktemp -d "$TMPDIR/cgXXXXXX") 2>/dev/null` && test -n "$tmp" && test -d "$tmp" ; } || - { test -n "$RANDOM" && tmp=$TMPDIR/cg$$-$RANDOM && (umask 077 && mkdir $tmp) ; } || - { tmp=$TMPDIR/cg-$$ && (umask 077 && mkdir $tmp) && echo "Warning: creating insecure temp directory" >&2 ; } || - { echo "$me: cannot create a temporary directory in $TMPDIR" >&2 ; exit 1 ; } ; -dummy=$tmp/dummy ; -tmpfiles="$dummy.c $dummy.o $dummy.rel $dummy" ; -case $CC_FOR_BUILD,$HOST_CC,$CC in - ,,) echo "int x;" > "$dummy.c" ; - for c in cc gcc c89 c99 ; do - if ($c -c -o "$dummy.o" "$dummy.c") >/dev/null 2>&1 ; then - CC_FOR_BUILD="$c"; break ; - fi ; - done ; - if test x"$CC_FOR_BUILD" = x ; then - CC_FOR_BUILD=no_compiler_found ; - fi - ;; - ,,*) CC_FOR_BUILD=$CC ;; - ,*,*) CC_FOR_BUILD=$HOST_CC ;; -esac ; set_cc_for_build= ;' +tmp= +# shellcheck disable=SC2172 +trap 'test -z "$tmp" || rm -fr "$tmp"' 0 1 2 13 15 + +set_cc_for_build() { + # prevent multiple calls if $tmp is already set + test "$tmp" && return 0 + : "${TMPDIR=/tmp}" + # shellcheck disable=SC2039 + { tmp=$( (umask 077 && mktemp -d "$TMPDIR/cgXXXXXX") 2>/dev/null) && test -n "$tmp" && test -d "$tmp" ; } || + { test -n "$RANDOM" && tmp=$TMPDIR/cg$$-$RANDOM && (umask 077 && mkdir "$tmp" 2>/dev/null) ; } || + { tmp=$TMPDIR/cg-$$ && (umask 077 && mkdir "$tmp" 2>/dev/null) && echo "Warning: creating insecure temp directory" >&2 ; } || + { echo "$me: cannot create a temporary directory in $TMPDIR" >&2 ; exit 1 ; } + dummy=$tmp/dummy + case ${CC_FOR_BUILD-},${HOST_CC-},${CC-} in + ,,) echo "int x;" > "$dummy.c" + for driver in cc gcc c89 c99 ; do + if ($driver -c -o "$dummy.o" "$dummy.c") >/dev/null 2>&1 ; then + CC_FOR_BUILD="$driver" + break + fi + done + if test x"$CC_FOR_BUILD" = x ; 
then + CC_FOR_BUILD=no_compiler_found + fi + ;; + ,,*) CC_FOR_BUILD=$CC ;; + ,*,*) CC_FOR_BUILD=$HOST_CC ;; + esac +} # This is needed to find uname on a Pyramid OSx when run in the BSD universe. # (ghazi@noc.rutgers.edu 1994-08-24) -if (test -f /.attbin/uname) >/dev/null 2>&1 ; then +if test -f /.attbin/uname ; then PATH=$PATH:/.attbin ; export PATH fi -UNAME_MACHINE=`(uname -m) 2>/dev/null` || UNAME_MACHINE=unknown -UNAME_RELEASE=`(uname -r) 2>/dev/null` || UNAME_RELEASE=unknown -UNAME_SYSTEM=`(uname -s) 2>/dev/null` || UNAME_SYSTEM=unknown -UNAME_VERSION=`(uname -v) 2>/dev/null` || UNAME_VERSION=unknown +UNAME_MACHINE=$( (uname -m) 2>/dev/null) || UNAME_MACHINE=unknown +UNAME_RELEASE=$( (uname -r) 2>/dev/null) || UNAME_RELEASE=unknown +UNAME_SYSTEM=$( (uname -s) 2>/dev/null) || UNAME_SYSTEM=unknown +UNAME_VERSION=$( (uname -v) 2>/dev/null) || UNAME_VERSION=unknown case "$UNAME_SYSTEM" in Linux|GNU|GNU/*) @@ -138,7 +142,7 @@ Linux|GNU|GNU/*) # We could probably try harder. LIBC=gnu - eval "$set_cc_for_build" + set_cc_for_build cat <<-EOF > "$dummy.c" #include #if defined(__UCLIBC__) @@ -146,17 +150,15 @@ Linux|GNU|GNU/*) #elif defined(__dietlibc__) LIBC=dietlibc #else + #include + #ifdef __DEFINED_va_list + LIBC=musl + #else LIBC=gnu #endif + #endif EOF - eval "`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^LIBC' | sed 's, ,,g'`" - - # If ldd exists, use it to detect musl libc. - if command -v ldd >/dev/null && \ - ldd --version 2>&1 | grep -q ^musl - then - LIBC=musl - fi + eval "$($CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^LIBC' | sed 's, ,,g')" ;; esac @@ -175,19 +177,20 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in # Note: NetBSD doesn't particularly care about the vendor # portion of the name. We always set it to "unknown". 
sysctl="sysctl -n hw.machine_arch" - UNAME_MACHINE_ARCH=`(uname -p 2>/dev/null || \ + UNAME_MACHINE_ARCH=$( (uname -p 2>/dev/null || \ "/sbin/$sysctl" 2>/dev/null || \ "/usr/sbin/$sysctl" 2>/dev/null || \ - echo unknown)` + echo unknown)) case "$UNAME_MACHINE_ARCH" in + aarch64eb) machine=aarch64_be-unknown ;; armeb) machine=armeb-unknown ;; arm*) machine=arm-unknown ;; sh3el) machine=shl-unknown ;; sh3eb) machine=sh-unknown ;; sh5el) machine=sh5le-unknown ;; earmv*) - arch=`echo "$UNAME_MACHINE_ARCH" | sed -e 's,^e\(armv[0-9]\).*$,\1,'` - endian=`echo "$UNAME_MACHINE_ARCH" | sed -ne 's,^.*\(eb\)$,\1,p'` + arch=$(echo "$UNAME_MACHINE_ARCH" | sed -e 's,^e\(armv[0-9]\).*$,\1,') + endian=$(echo "$UNAME_MACHINE_ARCH" | sed -ne 's,^.*\(eb\)$,\1,p') machine="${arch}${endian}"-unknown ;; *) machine="$UNAME_MACHINE_ARCH"-unknown ;; @@ -199,7 +202,7 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in os=netbsdelf ;; arm*|i386|m68k|ns32k|sh3*|sparc|vax) - eval "$set_cc_for_build" + set_cc_for_build if echo __ELF__ | $CC_FOR_BUILD -E - 2>/dev/null \ | grep -q __ELF__ then @@ -218,7 +221,7 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in case "$UNAME_MACHINE_ARCH" in earm*) expr='s/^earmv[0-9]/-eabi/;s/eb$//' - abi=`echo "$UNAME_MACHINE_ARCH" | sed -e "$expr"` + abi=$(echo "$UNAME_MACHINE_ARCH" | sed -e "$expr") ;; esac # The OS release @@ -231,24 +234,24 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in release='-gnu' ;; *) - release=`echo "$UNAME_RELEASE" | sed -e 's/[-_].*//' | cut -d. -f1,2` + release=$(echo "$UNAME_RELEASE" | sed -e 's/[-_].*//' | cut -d. -f1,2) ;; esac # Since CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM: # contains redundant information, the shorter form: # CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM is used. 
- echo "$machine-${os}${release}${abi}" + echo "$machine-${os}${release}${abi-}" exit ;; *:Bitrig:*:*) - UNAME_MACHINE_ARCH=`arch | sed 's/Bitrig.//'` + UNAME_MACHINE_ARCH=$(arch | sed 's/Bitrig.//') echo "$UNAME_MACHINE_ARCH"-unknown-bitrig"$UNAME_RELEASE" exit ;; *:OpenBSD:*:*) - UNAME_MACHINE_ARCH=`arch | sed 's/OpenBSD.//'` + UNAME_MACHINE_ARCH=$(arch | sed 's/OpenBSD.//') echo "$UNAME_MACHINE_ARCH"-unknown-openbsd"$UNAME_RELEASE" exit ;; *:LibertyBSD:*:*) - UNAME_MACHINE_ARCH=`arch | sed 's/^.*BSD\.//'` + UNAME_MACHINE_ARCH=$(arch | sed 's/^.*BSD\.//') echo "$UNAME_MACHINE_ARCH"-unknown-libertybsd"$UNAME_RELEASE" exit ;; *:MidnightBSD:*:*) @@ -260,6 +263,9 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in *:SolidBSD:*:*) echo "$UNAME_MACHINE"-unknown-solidbsd"$UNAME_RELEASE" exit ;; + *:OS108:*:*) + echo "$UNAME_MACHINE"-unknown-os108_"$UNAME_RELEASE" + exit ;; macppc:MirBSD:*:*) echo powerpc-unknown-mirbsd"$UNAME_RELEASE" exit ;; @@ -269,26 +275,29 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in *:Sortix:*:*) echo "$UNAME_MACHINE"-unknown-sortix exit ;; + *:Twizzler:*:*) + echo "$UNAME_MACHINE"-unknown-twizzler + exit ;; *:Redox:*:*) echo "$UNAME_MACHINE"-unknown-redox exit ;; mips:OSF1:*.*) - echo mips-dec-osf1 - exit ;; + echo mips-dec-osf1 + exit ;; alpha:OSF1:*:*) case $UNAME_RELEASE in *4.0) - UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $3}'` + UNAME_RELEASE=$(/usr/sbin/sizer -v | awk '{print $3}') ;; *5.*) - UNAME_RELEASE=`/usr/sbin/sizer -v | awk '{print $4}'` + UNAME_RELEASE=$(/usr/sbin/sizer -v | awk '{print $4}') ;; esac # According to Compaq, /usr/sbin/psrinfo has been available on # OSF/1 and Tru64 systems produced since 1995. I hope that # covers most systems running today. This code pipes the CPU # types through head -n 1, so we only detect the type of CPU 0. 
- ALPHA_CPU_TYPE=`/usr/sbin/psrinfo -v | sed -n -e 's/^ The alpha \(.*\) processor.*$/\1/p' | head -n 1` + ALPHA_CPU_TYPE=$(/usr/sbin/psrinfo -v | sed -n -e 's/^ The alpha \(.*\) processor.*$/\1/p' | head -n 1) case "$ALPHA_CPU_TYPE" in "EV4 (21064)") UNAME_MACHINE=alpha ;; @@ -326,7 +335,7 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in # A Tn.n version is a released field test version. # A Xn.n version is an unreleased experimental baselevel. # 1.2 uses "1.2" for uname -r. - echo "$UNAME_MACHINE"-dec-osf"`echo "$UNAME_RELEASE" | sed -e 's/^[PVTX]//' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz`" + echo "$UNAME_MACHINE"-dec-osf"$(echo "$UNAME_RELEASE" | sed -e 's/^[PVTX]//' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz)" # Reset EXIT trap before exiting to avoid spurious non-zero exit code. exitcode=$? trap '' 0 @@ -360,7 +369,7 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in exit ;; Pyramid*:OSx*:*:* | MIS*:OSx*:*:* | MIS*:SMP_DC-OSx*:*:*) # akee@wpdis03.wpafb.af.mil (Earle F. Ake) contributed MIS and NILE. 
- if test "`(/bin/universe) 2>/dev/null`" = att ; then + if test "$( (/bin/universe) 2>/dev/null)" = att ; then echo pyramid-pyramid-sysv3 else echo pyramid-pyramid-bsd @@ -373,28 +382,28 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in echo sparc-icl-nx6 exit ;; DRS?6000:UNIX_SV:4.2*:7* | DRS?6000:isis:4.2*:7*) - case `/usr/bin/uname -p` in + case $(/usr/bin/uname -p) in sparc) echo sparc-icl-nx7; exit ;; esac ;; s390x:SunOS:*:*) - echo "$UNAME_MACHINE"-ibm-solaris2"`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'`" + echo "$UNAME_MACHINE"-ibm-solaris2"$(echo "$UNAME_RELEASE" | sed -e 's/[^.]*//')" exit ;; sun4H:SunOS:5.*:*) - echo sparc-hal-solaris2"`echo "$UNAME_RELEASE"|sed -e 's/[^.]*//'`" + echo sparc-hal-solaris2"$(echo "$UNAME_RELEASE"|sed -e 's/[^.]*//')" exit ;; sun4*:SunOS:5.*:* | tadpole*:SunOS:5.*:*) - echo sparc-sun-solaris2"`echo "$UNAME_RELEASE" | sed -e 's/[^.]*//'`" + echo sparc-sun-solaris2"$(echo "$UNAME_RELEASE" | sed -e 's/[^.]*//')" exit ;; i86pc:AuroraUX:5.*:* | i86xen:AuroraUX:5.*:*) echo i386-pc-auroraux"$UNAME_RELEASE" exit ;; i86pc:SunOS:5.*:* | i86xen:SunOS:5.*:*) - eval "$set_cc_for_build" + set_cc_for_build SUN_ARCH=i386 # If there is a compiler, see if it is configured for 64-bit objects. # Note that the Sun cc does not turn __LP64__ into 1 like gcc does. # This test works for both compilers. - if [ "$CC_FOR_BUILD" != no_compiler_found ]; then + if test "$CC_FOR_BUILD" != no_compiler_found; then if (echo '#ifdef __amd64'; echo IS_64BIT_ARCH; echo '#endif') | \ (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ grep IS_64BIT_ARCH >/dev/null @@ -402,30 +411,30 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in SUN_ARCH=x86_64 fi fi - echo "$SUN_ARCH"-pc-solaris2"`echo "$UNAME_RELEASE"|sed -e 's/[^.]*//'`" + echo "$SUN_ARCH"-pc-solaris2"$(echo "$UNAME_RELEASE"|sed -e 's/[^.]*//')" exit ;; sun4*:SunOS:6*:*) # According to config.sub, this is the proper way to canonicalize # SunOS6. 
Hard to guess exactly what SunOS6 will be like, but # it's likely to be more like Solaris than SunOS4. - echo sparc-sun-solaris3"`echo "$UNAME_RELEASE"|sed -e 's/[^.]*//'`" + echo sparc-sun-solaris3"$(echo "$UNAME_RELEASE"|sed -e 's/[^.]*//')" exit ;; sun4*:SunOS:*:*) - case "`/usr/bin/arch -k`" in + case "$(/usr/bin/arch -k)" in Series*|S4*) - UNAME_RELEASE=`uname -v` + UNAME_RELEASE=$(uname -v) ;; esac # Japanese Language versions have a version number like `4.1.3-JL'. - echo sparc-sun-sunos"`echo "$UNAME_RELEASE"|sed -e 's/-/_/'`" + echo sparc-sun-sunos"$(echo "$UNAME_RELEASE"|sed -e 's/-/_/')" exit ;; sun3*:SunOS:*:*) echo m68k-sun-sunos"$UNAME_RELEASE" exit ;; sun*:*:4.2BSD:*) - UNAME_RELEASE=`(sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null` + UNAME_RELEASE=$( (sed 1q /etc/motd | awk '{print substr($5,1,3)}') 2>/dev/null) test "x$UNAME_RELEASE" = x && UNAME_RELEASE=3 - case "`/bin/arch`" in + case "$(/bin/arch)" in sun3) echo m68k-sun-sunos"$UNAME_RELEASE" ;; @@ -482,7 +491,7 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in echo clipper-intergraph-clix"$UNAME_RELEASE" exit ;; mips:*:*:UMIPS | mips:*:*:RISCos) - eval "$set_cc_for_build" + set_cc_for_build sed 's/^ //' << EOF > "$dummy.c" #ifdef __cplusplus #include /* for printf() prototype */ @@ -505,8 +514,8 @@ case "$UNAME_MACHINE:$UNAME_SYSTEM:$UNAME_RELEASE:$UNAME_VERSION" in } EOF $CC_FOR_BUILD -o "$dummy" "$dummy.c" && - dummyarg=`echo "$UNAME_RELEASE" | sed -n 's/\([0-9]*\).*/\1/p'` && - SYSTEM_NAME=`"$dummy" "$dummyarg"` && + dummyarg=$(echo "$UNAME_RELEASE" | sed -n 's/\([0-9]*\).*/\1/p') && + SYSTEM_NAME=$("$dummy" "$dummyarg") && { echo "$SYSTEM_NAME"; exit; } echo mips-mips-riscos"$UNAME_RELEASE" exit ;; @@ -533,11 +542,11 @@ EOF exit ;; AViiON:dgux:*:*) # DG/UX returns AViiON for all architectures - UNAME_PROCESSOR=`/usr/bin/uname -p` - if [ "$UNAME_PROCESSOR" = mc88100 ] || [ "$UNAME_PROCESSOR" = mc88110 ] + UNAME_PROCESSOR=$(/usr/bin/uname -p) + if test 
"$UNAME_PROCESSOR" = mc88100 || test "$UNAME_PROCESSOR" = mc88110 then - if [ "$TARGET_BINARY_INTERFACE"x = m88kdguxelfx ] || \ - [ "$TARGET_BINARY_INTERFACE"x = x ] + if test "$TARGET_BINARY_INTERFACE"x = m88kdguxelfx || \ + test "$TARGET_BINARY_INTERFACE"x = x then echo m88k-dg-dgux"$UNAME_RELEASE" else @@ -561,17 +570,17 @@ EOF echo m68k-tektronix-bsd exit ;; *:IRIX*:*:*) - echo mips-sgi-irix"`echo "$UNAME_RELEASE"|sed -e 's/-/_/g'`" + echo mips-sgi-irix"$(echo "$UNAME_RELEASE"|sed -e 's/-/_/g')" exit ;; ????????:AIX?:[12].1:2) # AIX 2.2.1 or AIX 2.1.1 is RT/PC AIX. echo romp-ibm-aix # uname -m gives an 8 hex-code CPU id - exit ;; # Note that: echo "'`uname -s`'" gives 'AIX ' + exit ;; # Note that: echo "'$(uname -s)'" gives 'AIX ' i*86:AIX:*:*) echo i386-ibm-aix exit ;; ia64:AIX:*:*) - if [ -x /usr/bin/oslevel ] ; then - IBM_REV=`/usr/bin/oslevel` + if test -x /usr/bin/oslevel ; then + IBM_REV=$(/usr/bin/oslevel) else IBM_REV="$UNAME_VERSION.$UNAME_RELEASE" fi @@ -579,7 +588,7 @@ EOF exit ;; *:AIX:2:3) if grep bos325 /usr/include/stdio.h >/dev/null 2>&1; then - eval "$set_cc_for_build" + set_cc_for_build sed 's/^ //' << EOF > "$dummy.c" #include @@ -591,7 +600,7 @@ EOF exit(0); } EOF - if $CC_FOR_BUILD -o "$dummy" "$dummy.c" && SYSTEM_NAME=`"$dummy"` + if $CC_FOR_BUILD -o "$dummy" "$dummy.c" && SYSTEM_NAME=$("$dummy") then echo "$SYSTEM_NAME" else @@ -604,15 +613,15 @@ EOF fi exit ;; *:AIX:*:[4567]) - IBM_CPU_ID=`/usr/sbin/lsdev -C -c processor -S available | sed 1q | awk '{ print $1 }'` + IBM_CPU_ID=$(/usr/sbin/lsdev -C -c processor -S available | sed 1q | awk '{ print $1 }') if /usr/sbin/lsattr -El "$IBM_CPU_ID" | grep ' POWER' >/dev/null 2>&1; then IBM_ARCH=rs6000 else IBM_ARCH=powerpc fi - if [ -x /usr/bin/lslpp ] ; then - IBM_REV=`/usr/bin/lslpp -Lqc bos.rte.libc | - awk -F: '{ print $3 }' | sed s/[0-9]*$/0/` + if test -x /usr/bin/lslpp ; then + IBM_REV=$(/usr/bin/lslpp -Lqc bos.rte.libc | + awk -F: '{ print $3 }' | sed s/[0-9]*$/0/) else 
IBM_REV="$UNAME_VERSION.$UNAME_RELEASE" fi @@ -640,14 +649,14 @@ EOF echo m68k-hp-bsd4.4 exit ;; 9000/[34678]??:HP-UX:*:*) - HPUX_REV=`echo "$UNAME_RELEASE"|sed -e 's/[^.]*.[0B]*//'` + HPUX_REV=$(echo "$UNAME_RELEASE"|sed -e 's/[^.]*.[0B]*//') case "$UNAME_MACHINE" in 9000/31?) HP_ARCH=m68000 ;; 9000/[34]??) HP_ARCH=m68k ;; 9000/[678][0-9][0-9]) - if [ -x /usr/bin/getconf ]; then - sc_cpu_version=`/usr/bin/getconf SC_CPU_VERSION 2>/dev/null` - sc_kernel_bits=`/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null` + if test -x /usr/bin/getconf; then + sc_cpu_version=$(/usr/bin/getconf SC_CPU_VERSION 2>/dev/null) + sc_kernel_bits=$(/usr/bin/getconf SC_KERNEL_BITS 2>/dev/null) case "$sc_cpu_version" in 523) HP_ARCH=hppa1.0 ;; # CPU_PA_RISC1_0 528) HP_ARCH=hppa1.1 ;; # CPU_PA_RISC1_1 @@ -659,8 +668,8 @@ EOF esac ;; esac fi - if [ "$HP_ARCH" = "" ]; then - eval "$set_cc_for_build" + if test "$HP_ARCH" = ""; then + set_cc_for_build sed 's/^ //' << EOF > "$dummy.c" #define _HPUX_SOURCE @@ -694,13 +703,13 @@ EOF exit (0); } EOF - (CCOPTS="" $CC_FOR_BUILD -o "$dummy" "$dummy.c" 2>/dev/null) && HP_ARCH=`"$dummy"` + (CCOPTS="" $CC_FOR_BUILD -o "$dummy" "$dummy.c" 2>/dev/null) && HP_ARCH=$("$dummy") test -z "$HP_ARCH" && HP_ARCH=hppa fi ;; esac - if [ "$HP_ARCH" = hppa2.0w ] + if test "$HP_ARCH" = hppa2.0w then - eval "$set_cc_for_build" + set_cc_for_build # hppa2.0w-hp-hpux* has a 64-bit kernel and a compiler generating # 32-bit code. 
hppa64-hp-hpux* has the same kernel and a compiler @@ -722,11 +731,11 @@ EOF echo "$HP_ARCH"-hp-hpux"$HPUX_REV" exit ;; ia64:HP-UX:*:*) - HPUX_REV=`echo "$UNAME_RELEASE"|sed -e 's/[^.]*.[0B]*//'` + HPUX_REV=$(echo "$UNAME_RELEASE"|sed -e 's/[^.]*.[0B]*//') echo ia64-hp-hpux"$HPUX_REV" exit ;; 3050*:HI-UX:*:*) - eval "$set_cc_for_build" + set_cc_for_build sed 's/^ //' << EOF > "$dummy.c" #include int @@ -752,7 +761,7 @@ EOF exit (0); } EOF - $CC_FOR_BUILD -o "$dummy" "$dummy.c" && SYSTEM_NAME=`"$dummy"` && + $CC_FOR_BUILD -o "$dummy" "$dummy.c" && SYSTEM_NAME=$("$dummy") && { echo "$SYSTEM_NAME"; exit; } echo unknown-hitachi-hiuxwe2 exit ;; @@ -772,7 +781,7 @@ EOF echo hppa1.0-hp-osf exit ;; i*86:OSF1:*:*) - if [ -x /usr/sbin/sysversion ] ; then + if test -x /usr/sbin/sysversion ; then echo "$UNAME_MACHINE"-unknown-osf1mk else echo "$UNAME_MACHINE"-unknown-osf1 @@ -821,14 +830,14 @@ EOF echo craynv-cray-unicosmp"$UNAME_RELEASE" | sed -e 's/\.[^.]*$/.X/' exit ;; F30[01]:UNIX_System_V:*:* | F700:UNIX_System_V:*:*) - FUJITSU_PROC=`uname -m | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz` - FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` - FUJITSU_REL=`echo "$UNAME_RELEASE" | sed -e 's/ /_/'` + FUJITSU_PROC=$(uname -m | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz) + FUJITSU_SYS=$(uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///') + FUJITSU_REL=$(echo "$UNAME_RELEASE" | sed -e 's/ /_/') echo "${FUJITSU_PROC}-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit ;; 5000:UNIX_System_V:4.*:*) - FUJITSU_SYS=`uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///'` - FUJITSU_REL=`echo "$UNAME_RELEASE" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/ /_/'` + FUJITSU_SYS=$(uname -p | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz | sed -e 's/\///') + FUJITSU_REL=$(echo "$UNAME_RELEASE" | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ 
abcdefghijklmnopqrstuvwxyz | sed -e 's/ /_/') echo "sparc-fujitsu-${FUJITSU_SYS}${FUJITSU_REL}" exit ;; i*86:BSD/386:*:* | i*86:BSD/OS:*:* | *:Ascend\ Embedded/OS:*:*) @@ -840,15 +849,26 @@ EOF *:BSD/OS:*:*) echo "$UNAME_MACHINE"-unknown-bsdi"$UNAME_RELEASE" exit ;; + arm:FreeBSD:*:*) + UNAME_PROCESSOR=$(uname -p) + set_cc_for_build + if echo __ARM_PCS_VFP | $CC_FOR_BUILD -E - 2>/dev/null \ + | grep -q __ARM_PCS_VFP + then + echo "${UNAME_PROCESSOR}"-unknown-freebsd"$(echo ${UNAME_RELEASE}|sed -e 's/[-(].*//')"-gnueabi + else + echo "${UNAME_PROCESSOR}"-unknown-freebsd"$(echo ${UNAME_RELEASE}|sed -e 's/[-(].*//')"-gnueabihf + fi + exit ;; *:FreeBSD:*:*) - UNAME_PROCESSOR=`/usr/bin/uname -p` + UNAME_PROCESSOR=$(/usr/bin/uname -p) case "$UNAME_PROCESSOR" in amd64) UNAME_PROCESSOR=x86_64 ;; i386) UNAME_PROCESSOR=i586 ;; esac - echo "$UNAME_PROCESSOR"-unknown-freebsd"`echo "$UNAME_RELEASE"|sed -e 's/[-(].*//'`" + echo "$UNAME_PROCESSOR"-unknown-freebsd"$(echo "$UNAME_RELEASE"|sed -e 's/[-(].*//')" exit ;; i*:CYGWIN*:*) echo "$UNAME_MACHINE"-pc-cygwin @@ -881,21 +901,21 @@ EOF echo "$UNAME_MACHINE"-pc-uwin exit ;; amd64:CYGWIN*:*:* | x86_64:CYGWIN*:*:*) - echo x86_64-unknown-cygwin + echo x86_64-pc-cygwin exit ;; prep*:SunOS:5.*:*) - echo powerpcle-unknown-solaris2"`echo "$UNAME_RELEASE"|sed -e 's/[^.]*//'`" + echo powerpcle-unknown-solaris2"$(echo "$UNAME_RELEASE"|sed -e 's/[^.]*//')" exit ;; *:GNU:*:*) # the GNU system - echo "`echo "$UNAME_MACHINE"|sed -e 's,[-/].*$,,'`-unknown-$LIBC`echo "$UNAME_RELEASE"|sed -e 's,/.*$,,'`" + echo "$(echo "$UNAME_MACHINE"|sed -e 's,[-/].*$,,')-unknown-$LIBC$(echo "$UNAME_RELEASE"|sed -e 's,/.*$,,')" exit ;; *:GNU/*:*:*) # other systems with GNU libc and userland - echo "$UNAME_MACHINE-unknown-`echo "$UNAME_SYSTEM" | sed 's,^[^/]*/,,' | tr "[:upper:]" "[:lower:]"``echo "$UNAME_RELEASE"|sed -e 's/[-(].*//'`-$LIBC" + echo "$UNAME_MACHINE-unknown-$(echo "$UNAME_SYSTEM" | sed 's,^[^/]*/,,' | tr "[:upper:]" "[:lower:]")$(echo 
"$UNAME_RELEASE"|sed -e 's/[-(].*//')-$LIBC" exit ;; - i*86:Minix:*:*) - echo "$UNAME_MACHINE"-pc-minix + *:Minix:*:*) + echo "$UNAME_MACHINE"-unknown-minix exit ;; aarch64:Linux:*:*) echo "$UNAME_MACHINE"-unknown-linux-"$LIBC" @@ -905,7 +925,7 @@ EOF echo "$UNAME_MACHINE"-unknown-linux-"$LIBC" exit ;; alpha:Linux:*:*) - case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' < /proc/cpuinfo` in + case $(sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' /proc/cpuinfo 2>/dev/null) in EV5) UNAME_MACHINE=alphaev5 ;; EV56) UNAME_MACHINE=alphaev56 ;; PCA56) UNAME_MACHINE=alphapca56 ;; @@ -922,7 +942,7 @@ EOF echo "$UNAME_MACHINE"-unknown-linux-"$LIBC" exit ;; arm*:Linux:*:*) - eval "$set_cc_for_build" + set_cc_for_build if echo __ARM_EABI__ | $CC_FOR_BUILD -E - 2>/dev/null \ | grep -q __ARM_EABI__ then @@ -971,23 +991,51 @@ EOF echo "$UNAME_MACHINE"-unknown-linux-"$LIBC" exit ;; mips:Linux:*:* | mips64:Linux:*:*) - eval "$set_cc_for_build" + set_cc_for_build + IS_GLIBC=0 + test x"${LIBC}" = xgnu && IS_GLIBC=1 sed 's/^ //' << EOF > "$dummy.c" #undef CPU - #undef ${UNAME_MACHINE} - #undef ${UNAME_MACHINE}el + #undef mips + #undef mipsel + #undef mips64 + #undef mips64el + #if ${IS_GLIBC} && defined(_ABI64) + LIBCABI=gnuabi64 + #else + #if ${IS_GLIBC} && defined(_ABIN32) + LIBCABI=gnuabin32 + #else + LIBCABI=${LIBC} + #endif + #endif + + #if ${IS_GLIBC} && defined(__mips64) && defined(__mips_isa_rev) && __mips_isa_rev>=6 + CPU=mipsisa64r6 + #else + #if ${IS_GLIBC} && !defined(__mips64) && defined(__mips_isa_rev) && __mips_isa_rev>=6 + CPU=mipsisa32r6 + #else + #if defined(__mips64) + CPU=mips64 + #else + CPU=mips + #endif + #endif + #endif + #if defined(__MIPSEL__) || defined(__MIPSEL) || defined(_MIPSEL) || defined(MIPSEL) - CPU=${UNAME_MACHINE}el + MIPS_ENDIAN=el #else #if defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || defined(MIPSEB) - CPU=${UNAME_MACHINE} + MIPS_ENDIAN= #else - CPU= + MIPS_ENDIAN= #endif #endif EOF - eval "`$CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep 
'^CPU'`" - test "x$CPU" != x && { echo "$CPU-unknown-linux-$LIBC"; exit; } + eval "$($CC_FOR_BUILD -E "$dummy.c" 2>/dev/null | grep '^CPU\|^MIPS_ENDIAN\|^LIBCABI')" + test "x$CPU" != x && { echo "$CPU${MIPS_ENDIAN}-unknown-linux-$LIBCABI"; exit; } ;; mips64el:Linux:*:*) echo "$UNAME_MACHINE"-unknown-linux-"$LIBC" @@ -1006,7 +1054,7 @@ EOF exit ;; parisc:Linux:*:* | hppa:Linux:*:*) # Look for CPU level - case `grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2` in + case $(grep '^cpu[^a-z]*:' /proc/cpuinfo 2>/dev/null | cut -d' ' -f2) in PA7*) echo hppa1.1-unknown-linux-"$LIBC" ;; PA8*) echo hppa2.0-unknown-linux-"$LIBC" ;; *) echo hppa-unknown-linux-"$LIBC" ;; @@ -1046,7 +1094,17 @@ EOF echo "$UNAME_MACHINE"-dec-linux-"$LIBC" exit ;; x86_64:Linux:*:*) - echo "$UNAME_MACHINE"-pc-linux-"$LIBC" + set_cc_for_build + LIBCABI=$LIBC + if test "$CC_FOR_BUILD" != no_compiler_found; then + if (echo '#ifdef __ILP32__'; echo IS_X32; echo '#endif') | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ + grep IS_X32 >/dev/null + then + LIBCABI="$LIBC"x32 + fi + fi + echo "$UNAME_MACHINE"-pc-linux-"$LIBCABI" exit ;; xtensa*:Linux:*:*) echo "$UNAME_MACHINE"-unknown-linux-"$LIBC" @@ -1086,7 +1144,7 @@ EOF echo "$UNAME_MACHINE"-pc-msdosdjgpp exit ;; i*86:*:4.*:*) - UNAME_REL=`echo "$UNAME_RELEASE" | sed 's/\/MP$//'` + UNAME_REL=$(echo "$UNAME_RELEASE" | sed 's/\/MP$//') if grep Novell /usr/include/link.h >/dev/null 2>/dev/null; then echo "$UNAME_MACHINE"-univel-sysv"$UNAME_REL" else @@ -1095,19 +1153,19 @@ EOF exit ;; i*86:*:5:[678]*) # UnixWare 7.x, OpenUNIX and OpenServer 6. 
- case `/bin/uname -X | grep "^Machine"` in + case $(/bin/uname -X | grep "^Machine") in *486*) UNAME_MACHINE=i486 ;; *Pentium) UNAME_MACHINE=i586 ;; *Pent*|*Celeron) UNAME_MACHINE=i686 ;; esac - echo "$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}{$UNAME_VERSION}" + echo "$UNAME_MACHINE-unknown-sysv${UNAME_RELEASE}${UNAME_SYSTEM}${UNAME_VERSION}" exit ;; i*86:*:3.2:*) if test -f /usr/options/cb.name; then - UNAME_REL=`sed -n 's/.*Version //p' /dev/null >/dev/null ; then - UNAME_REL=`(/bin/uname -X|grep Release|sed -e 's/.*= //')` + UNAME_REL=$( (/bin/uname -X|grep Release|sed -e 's/.*= //')) (/bin/uname -X|grep i80486 >/dev/null) && UNAME_MACHINE=i486 (/bin/uname -X|grep '^Machine.*Pentium' >/dev/null) \ && UNAME_MACHINE=i586 @@ -1157,7 +1215,7 @@ EOF 3[345]??:*:4.0:3.0 | 3[34]??A:*:4.0:3.0 | 3[34]??,*:*:4.0:3.0 | 3[34]??/*:*:4.0:3.0 | 4400:*:4.0:3.0 | 4850:*:4.0:3.0 | SKA40:*:4.0:3.0 | SDS2:*:4.0:3.0 | SHG2:*:4.0:3.0 | S7501*:*:4.0:3.0) OS_REL='' test -r /etc/.relid \ - && OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid` + && OS_REL=.$(sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid) /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ && { echo i486-ncr-sysv4.3"$OS_REL"; exit; } /bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \ @@ -1168,7 +1226,7 @@ EOF NCR*:*:4.2:* | MPRAS*:*:4.2:*) OS_REL='.3' test -r /etc/.relid \ - && OS_REL=.`sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid` + && OS_REL=.$(sed -n 's/[^ ]* [^ ]* \([0-9][0-9]\).*/\1/p' < /etc/.relid) /bin/uname -p 2>/dev/null | grep 86 >/dev/null \ && { echo i486-ncr-sysv4.3"$OS_REL"; exit; } /bin/uname -p 2>/dev/null | /bin/grep entium >/dev/null \ @@ -1201,7 +1259,7 @@ EOF exit ;; *:SINIX-*:*:*) if uname -p 2>/dev/null >/dev/null ; then - UNAME_MACHINE=`(uname -p) 2>/dev/null` + UNAME_MACHINE=$( (uname -p) 2>/dev/null) echo "$UNAME_MACHINE"-sni-sysv4 else echo ns32k-sni-sysv @@ -1235,7 +1293,7 @@ EOF echo mips-sony-newsos6 exit ;; 
R[34]000:*System_V*:*:* | R4000:UNIX_SYSV:*:* | R*000:UNIX_SV:*:*) - if [ -d /usr/nec ]; then + if test -d /usr/nec; then echo mips-nec-sysv"$UNAME_RELEASE" else echo mips-unknown-sysv"$UNAME_RELEASE" @@ -1283,44 +1341,48 @@ EOF *:Rhapsody:*:*) echo "$UNAME_MACHINE"-apple-rhapsody"$UNAME_RELEASE" exit ;; + arm64:Darwin:*:*) + echo aarch64-apple-darwin"$UNAME_RELEASE" + exit ;; *:Darwin:*:*) - UNAME_PROCESSOR=`uname -p` || UNAME_PROCESSOR=unknown - eval "$set_cc_for_build" - if test "$UNAME_PROCESSOR" = unknown ; then - UNAME_PROCESSOR=powerpc + UNAME_PROCESSOR=$(uname -p) + case $UNAME_PROCESSOR in + unknown) UNAME_PROCESSOR=powerpc ;; + esac + if command -v xcode-select > /dev/null 2> /dev/null && \ + ! xcode-select --print-path > /dev/null 2> /dev/null ; then + # Avoid executing cc if there is no toolchain installed as + # cc will be a stub that puts up a graphical alert + # prompting the user to install developer tools. + CC_FOR_BUILD=no_compiler_found + else + set_cc_for_build fi - if test "`echo "$UNAME_RELEASE" | sed -e 's/\..*//'`" -le 10 ; then - if [ "$CC_FOR_BUILD" != no_compiler_found ]; then - if (echo '#ifdef __LP64__'; echo IS_64BIT_ARCH; echo '#endif') | \ - (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ - grep IS_64BIT_ARCH >/dev/null - then - case $UNAME_PROCESSOR in - i386) UNAME_PROCESSOR=x86_64 ;; - powerpc) UNAME_PROCESSOR=powerpc64 ;; - esac - fi - # On 10.4-10.6 one might compile for PowerPC via gcc -arch ppc - if (echo '#ifdef __POWERPC__'; echo IS_PPC; echo '#endif') | \ - (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ - grep IS_PPC >/dev/null - then - UNAME_PROCESSOR=powerpc - fi + if test "$CC_FOR_BUILD" != no_compiler_found; then + if (echo '#ifdef __LP64__'; echo IS_64BIT_ARCH; echo '#endif') | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ + grep IS_64BIT_ARCH >/dev/null + then + case $UNAME_PROCESSOR in + i386) UNAME_PROCESSOR=x86_64 ;; + powerpc) UNAME_PROCESSOR=powerpc64 ;; + esac + fi + # On 10.4-10.6 one might compile for 
PowerPC via gcc -arch ppc + if (echo '#ifdef __POWERPC__'; echo IS_PPC; echo '#endif') | \ + (CCOPTS="" $CC_FOR_BUILD -E - 2>/dev/null) | \ + grep IS_PPC >/dev/null + then + UNAME_PROCESSOR=powerpc fi elif test "$UNAME_PROCESSOR" = i386 ; then - # Avoid executing cc on OS X 10.9, as it ships with a stub - # that puts up a graphical alert prompting to install - # developer tools. Any system running Mac OS X 10.7 or - # later (Darwin 11 and later) is required to have a 64-bit - # processor. This is not true of the ARM version of Darwin - # that Apple uses in portable devices. - UNAME_PROCESSOR=x86_64 + # uname -m returns i386 or x86_64 + UNAME_PROCESSOR=$UNAME_MACHINE fi echo "$UNAME_PROCESSOR"-apple-darwin"$UNAME_RELEASE" exit ;; *:procnto*:*:* | *:QNX:[0123456789]*:*) - UNAME_PROCESSOR=`uname -p` + UNAME_PROCESSOR=$(uname -p) if test "$UNAME_PROCESSOR" = x86; then UNAME_PROCESSOR=i386 UNAME_MACHINE=pc @@ -1358,6 +1420,7 @@ EOF # "uname -m" is not consistent, so use $cputype instead. 386 # is converted to i386 for consistency with other x86 # operating systems. 
+ # shellcheck disable=SC2154 if test "$cputype" = 386; then UNAME_MACHINE=i386 else @@ -1387,10 +1450,10 @@ EOF echo mips-sei-seiux"$UNAME_RELEASE" exit ;; *:DragonFly:*:*) - echo "$UNAME_MACHINE"-unknown-dragonfly"`echo "$UNAME_RELEASE"|sed -e 's/[-(].*//'`" + echo "$UNAME_MACHINE"-unknown-dragonfly"$(echo "$UNAME_RELEASE"|sed -e 's/[-(].*//')" exit ;; *:*VMS:*:*) - UNAME_MACHINE=`(uname -p) 2>/dev/null` + UNAME_MACHINE=$( (uname -p) 2>/dev/null) case "$UNAME_MACHINE" in A*) echo alpha-dec-vms ; exit ;; I*) echo ia64-dec-vms ; exit ;; @@ -1400,7 +1463,7 @@ EOF echo i386-pc-xenix exit ;; i*86:skyos:*:*) - echo "$UNAME_MACHINE"-pc-skyos"`echo "$UNAME_RELEASE" | sed -e 's/ .*$//'`" + echo "$UNAME_MACHINE"-pc-skyos"$(echo "$UNAME_RELEASE" | sed -e 's/ .*$//')" exit ;; i*86:rdos:*:*) echo "$UNAME_MACHINE"-pc-rdos @@ -1414,8 +1477,148 @@ EOF amd64:Isilon\ OneFS:*:*) echo x86_64-unknown-onefs exit ;; + *:Unleashed:*:*) + echo "$UNAME_MACHINE"-unknown-unleashed"$UNAME_RELEASE" + exit ;; esac +# No uname command or uname output not recognized. +set_cc_for_build +cat > "$dummy.c" < +#include +#endif +#if defined(ultrix) || defined(_ultrix) || defined(__ultrix) || defined(__ultrix__) +#if defined (vax) || defined (__vax) || defined (__vax__) || defined(mips) || defined(__mips) || defined(__mips__) || defined(MIPS) || defined(__MIPS__) +#include +#if defined(_SIZE_T_) || defined(SIGLOST) +#include +#endif +#endif +#endif +main () +{ +#if defined (sony) +#if defined (MIPSEB) + /* BFD wants "bsd" instead of "newsos". Perhaps BFD should be changed, + I don't know.... 
*/ + printf ("mips-sony-bsd\n"); exit (0); +#else +#include + printf ("m68k-sony-newsos%s\n", +#ifdef NEWSOS4 + "4" +#else + "" +#endif + ); exit (0); +#endif +#endif + +#if defined (NeXT) +#if !defined (__ARCHITECTURE__) +#define __ARCHITECTURE__ "m68k" +#endif + int version; + version=$( (hostinfo | sed -n 's/.*NeXT Mach \([0-9]*\).*/\1/p') 2>/dev/null); + if (version < 4) + printf ("%s-next-nextstep%d\n", __ARCHITECTURE__, version); + else + printf ("%s-next-openstep%d\n", __ARCHITECTURE__, version); + exit (0); +#endif + +#if defined (MULTIMAX) || defined (n16) +#if defined (UMAXV) + printf ("ns32k-encore-sysv\n"); exit (0); +#else +#if defined (CMU) + printf ("ns32k-encore-mach\n"); exit (0); +#else + printf ("ns32k-encore-bsd\n"); exit (0); +#endif +#endif +#endif + +#if defined (__386BSD__) + printf ("i386-pc-bsd\n"); exit (0); +#endif + +#if defined (sequent) +#if defined (i386) + printf ("i386-sequent-dynix\n"); exit (0); +#endif +#if defined (ns32000) + printf ("ns32k-sequent-dynix\n"); exit (0); +#endif +#endif + +#if defined (_SEQUENT_) + struct utsname un; + + uname(&un); + if (strncmp(un.version, "V2", 2) == 0) { + printf ("i386-sequent-ptx2\n"); exit (0); + } + if (strncmp(un.version, "V1", 2) == 0) { /* XXX is V1 correct? 
*/ + printf ("i386-sequent-ptx1\n"); exit (0); + } + printf ("i386-sequent-ptx\n"); exit (0); +#endif + +#if defined (vax) +#if !defined (ultrix) +#include +#if defined (BSD) +#if BSD == 43 + printf ("vax-dec-bsd4.3\n"); exit (0); +#else +#if BSD == 199006 + printf ("vax-dec-bsd4.3reno\n"); exit (0); +#else + printf ("vax-dec-bsd\n"); exit (0); +#endif +#endif +#else + printf ("vax-dec-bsd\n"); exit (0); +#endif +#else +#if defined(_SIZE_T_) || defined(SIGLOST) + struct utsname un; + uname (&un); + printf ("vax-dec-ultrix%s\n", un.release); exit (0); +#else + printf ("vax-dec-ultrix\n"); exit (0); +#endif +#endif +#endif +#if defined(ultrix) || defined(_ultrix) || defined(__ultrix) || defined(__ultrix__) +#if defined(mips) || defined(__mips) || defined(__mips__) || defined(MIPS) || defined(__MIPS__) +#if defined(_SIZE_T_) || defined(SIGLOST) + struct utsname *un; + uname (&un); + printf ("mips-dec-ultrix%s\n", un.release); exit (0); +#else + printf ("mips-dec-ultrix\n"); exit (0); +#endif +#endif +#endif + +#if defined (alliant) && defined (i860) + printf ("i860-alliant-bsd\n"); exit (0); +#endif + + exit (1); +} +EOF + +$CC_FOR_BUILD -o "$dummy" "$dummy.c" 2>/dev/null && SYSTEM_NAME=$($dummy) && + { echo "$SYSTEM_NAME"; exit; } + +# Apollos put the system type in the environment. 
+test -d /usr/apollo && { echo "$ISP-apollo-$SYSTYPE"; exit; } + echo "$0: unable to guess system type" >&2 case "$UNAME_MACHINE:$UNAME_SYSTEM" in @@ -1438,6 +1641,12 @@ copies of config.guess and config.sub with the latest versions from: https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.guess and https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub +EOF + +year=$(echo $timestamp | sed 's,-.*,,') +# shellcheck disable=SC2003 +if test "$(expr "$(date +%Y)" - "$year")" -lt 3 ; then + cat >&2 </dev/null || echo unknown` -uname -r = `(uname -r) 2>/dev/null || echo unknown` -uname -s = `(uname -s) 2>/dev/null || echo unknown` -uname -v = `(uname -v) 2>/dev/null || echo unknown` +uname -m = $( (uname -m) 2>/dev/null || echo unknown) +uname -r = $( (uname -r) 2>/dev/null || echo unknown) +uname -s = $( (uname -s) 2>/dev/null || echo unknown) +uname -v = $( (uname -v) 2>/dev/null || echo unknown) -/usr/bin/uname -p = `(/usr/bin/uname -p) 2>/dev/null` -/bin/uname -X = `(/bin/uname -X) 2>/dev/null` +/usr/bin/uname -p = $( (/usr/bin/uname -p) 2>/dev/null) +/bin/uname -X = $( (/bin/uname -X) 2>/dev/null) -hostinfo = `(hostinfo) 2>/dev/null` -/bin/universe = `(/bin/universe) 2>/dev/null` -/usr/bin/arch -k = `(/usr/bin/arch -k) 2>/dev/null` -/bin/arch = `(/bin/arch) 2>/dev/null` -/usr/bin/oslevel = `(/usr/bin/oslevel) 2>/dev/null` -/usr/convex/getsysinfo = `(/usr/convex/getsysinfo) 2>/dev/null` +hostinfo = $( (hostinfo) 2>/dev/null) +/bin/universe = $( (/bin/universe) 2>/dev/null) +/usr/bin/arch -k = $( (/usr/bin/arch -k) 2>/dev/null) +/bin/arch = $( (/bin/arch) 2>/dev/null) +/usr/bin/oslevel = $( (/usr/bin/oslevel) 2>/dev/null) +/usr/convex/getsysinfo = $( (/usr/convex/getsysinfo) 2>/dev/null) UNAME_MACHINE = "$UNAME_MACHINE" UNAME_RELEASE = "$UNAME_RELEASE" UNAME_SYSTEM = "$UNAME_SYSTEM" UNAME_VERSION = "$UNAME_VERSION" EOF +fi exit 1 diff --git a/src/pcre2/config.sub b/src/pcre2/config.sub new file mode 100755 index 
00000000..c874b7a9 --- /dev/null +++ b/src/pcre2/config.sub @@ -0,0 +1,1853 @@ +#! /bin/sh +# Configuration validation subroutine script. +# Copyright 1992-2020 Free Software Foundation, Inc. + +timestamp='2020-11-07' + +# This file is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, see . +# +# As a special exception to the GNU General Public License, if you +# distribute this file as part of a program that contains a +# configuration script generated by Autoconf, you may include it under +# the same distribution terms that you use for the rest of that +# program. This Exception is an additional permission under section 7 +# of the GNU General Public License, version 3 ("GPLv3"). + + +# Please send patches to . +# +# Configuration subroutine to validate and canonicalize a configuration type. +# Supply the specified configuration type as an argument. +# If it is invalid, we print an error message on stderr and exit with code 1. +# Otherwise, we print the canonical config type on stdout and succeed. + +# You can get the latest version of this script from: +# https://git.savannah.gnu.org/gitweb/?p=config.git;a=blob_plain;f=config.sub + +# This file is supposed to be the same for all GNU packages +# and recognize all the CPU types, system types and aliases +# that are meaningful with *any* GNU software. +# Each package is responsible for reporting which valid configurations +# it does not support. 
The user should be able to distinguish +# a failure to support a valid configuration from a meaningless +# configuration. + +# The goal of this file is to map all the various variations of a given +# machine specification into a single specification in the form: +# CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM +# or in some cases, the newer four-part form: +# CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM +# It is wrong to echo any other type of specification. + +me=$(echo "$0" | sed -e 's,.*/,,') + +usage="\ +Usage: $0 [OPTION] CPU-MFR-OPSYS or ALIAS + +Canonicalize a configuration name. + +Options: + -h, --help print this help, then exit + -t, --time-stamp print date of last modification, then exit + -v, --version print version number, then exit + +Report bugs and patches to ." + +version="\ +GNU config.sub ($timestamp) + +Copyright 1992-2020 Free Software Foundation, Inc. + +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE." + +help=" +Try \`$me --help' for more information." + +# Parse command line +while test $# -gt 0 ; do + case $1 in + --time-stamp | --time* | -t ) + echo "$timestamp" ; exit ;; + --version | -v ) + echo "$version" ; exit ;; + --help | --h* | -h ) + echo "$usage"; exit ;; + -- ) # Stop option processing + shift; break ;; + - ) # Use stdin as input. + break ;; + -* ) + echo "$me: invalid option $1$help" >&2 + exit 1 ;; + + *local*) + # First pass through any local machine types. 
+ echo "$1" + exit ;; + + * ) + break ;; + esac +done + +case $# in + 0) echo "$me: missing argument$help" >&2 + exit 1;; + 1) ;; + *) echo "$me: too many arguments$help" >&2 + exit 1;; +esac + +# Split fields of configuration type +# shellcheck disable=SC2162 +IFS="-" read field1 field2 field3 field4 <&2 + exit 1 + ;; + *-*-*-*) + basic_machine=$field1-$field2 + basic_os=$field3-$field4 + ;; + *-*-*) + # Ambiguous whether COMPANY is present, or skipped and KERNEL-OS is two + # parts + maybe_os=$field2-$field3 + case $maybe_os in + nto-qnx* | linux-* | uclinux-uclibc* \ + | uclinux-gnu* | kfreebsd*-gnu* | knetbsd*-gnu* | netbsd*-gnu* \ + | netbsd*-eabi* | kopensolaris*-gnu* | cloudabi*-eabi* \ + | storm-chaos* | os2-emx* | rtmk-nova*) + basic_machine=$field1 + basic_os=$maybe_os + ;; + android-linux) + basic_machine=$field1-unknown + basic_os=linux-android + ;; + *) + basic_machine=$field1-$field2 + basic_os=$field3 + ;; + esac + ;; + *-*) + # A lone config we happen to match not fitting any pattern + case $field1-$field2 in + decstation-3100) + basic_machine=mips-dec + basic_os= + ;; + *-*) + # Second component is usually, but not always the OS + case $field2 in + # Prevent following clause from handling this valid os + sun*os*) + basic_machine=$field1 + basic_os=$field2 + ;; + # Manufacturers + dec* | mips* | sequent* | encore* | pc533* | sgi* | sony* \ + | att* | 7300* | 3300* | delta* | motorola* | sun[234]* \ + | unicom* | ibm* | next | hp | isi* | apollo | altos* \ + | convergent* | ncr* | news | 32* | 3600* | 3100* \ + | hitachi* | c[123]* | convex* | sun | crds | omron* | dg \ + | ultra | tti* | harris | dolphin | highlevel | gould \ + | cbm | ns | masscomp | apple | axis | knuth | cray \ + | microblaze* | sim | cisco \ + | oki | wec | wrs | winbond) + basic_machine=$field1-$field2 + basic_os= + ;; + *) + basic_machine=$field1 + basic_os=$field2 + ;; + esac + ;; + esac + ;; + *) + # Convert single-component short-hands not valid as part of + # 
multi-component configurations. + case $field1 in + 386bsd) + basic_machine=i386-pc + basic_os=bsd + ;; + a29khif) + basic_machine=a29k-amd + basic_os=udi + ;; + adobe68k) + basic_machine=m68010-adobe + basic_os=scout + ;; + alliant) + basic_machine=fx80-alliant + basic_os= + ;; + altos | altos3068) + basic_machine=m68k-altos + basic_os= + ;; + am29k) + basic_machine=a29k-none + basic_os=bsd + ;; + amdahl) + basic_machine=580-amdahl + basic_os=sysv + ;; + amiga) + basic_machine=m68k-unknown + basic_os= + ;; + amigaos | amigados) + basic_machine=m68k-unknown + basic_os=amigaos + ;; + amigaunix | amix) + basic_machine=m68k-unknown + basic_os=sysv4 + ;; + apollo68) + basic_machine=m68k-apollo + basic_os=sysv + ;; + apollo68bsd) + basic_machine=m68k-apollo + basic_os=bsd + ;; + aros) + basic_machine=i386-pc + basic_os=aros + ;; + aux) + basic_machine=m68k-apple + basic_os=aux + ;; + balance) + basic_machine=ns32k-sequent + basic_os=dynix + ;; + blackfin) + basic_machine=bfin-unknown + basic_os=linux + ;; + cegcc) + basic_machine=arm-unknown + basic_os=cegcc + ;; + convex-c1) + basic_machine=c1-convex + basic_os=bsd + ;; + convex-c2) + basic_machine=c2-convex + basic_os=bsd + ;; + convex-c32) + basic_machine=c32-convex + basic_os=bsd + ;; + convex-c34) + basic_machine=c34-convex + basic_os=bsd + ;; + convex-c38) + basic_machine=c38-convex + basic_os=bsd + ;; + cray) + basic_machine=j90-cray + basic_os=unicos + ;; + crds | unos) + basic_machine=m68k-crds + basic_os= + ;; + da30) + basic_machine=m68k-da30 + basic_os= + ;; + decstation | pmax | pmin | dec3100 | decstatn) + basic_machine=mips-dec + basic_os= + ;; + delta88) + basic_machine=m88k-motorola + basic_os=sysv3 + ;; + dicos) + basic_machine=i686-pc + basic_os=dicos + ;; + djgpp) + basic_machine=i586-pc + basic_os=msdosdjgpp + ;; + ebmon29k) + basic_machine=a29k-amd + basic_os=ebmon + ;; + es1800 | OSE68k | ose68k | ose | OSE) + basic_machine=m68k-ericsson + basic_os=ose + ;; + gmicro) + basic_machine=tron-gmicro + 
basic_os=sysv + ;; + go32) + basic_machine=i386-pc + basic_os=go32 + ;; + h8300hms) + basic_machine=h8300-hitachi + basic_os=hms + ;; + h8300xray) + basic_machine=h8300-hitachi + basic_os=xray + ;; + h8500hms) + basic_machine=h8500-hitachi + basic_os=hms + ;; + harris) + basic_machine=m88k-harris + basic_os=sysv3 + ;; + hp300 | hp300hpux) + basic_machine=m68k-hp + basic_os=hpux + ;; + hp300bsd) + basic_machine=m68k-hp + basic_os=bsd + ;; + hppaosf) + basic_machine=hppa1.1-hp + basic_os=osf + ;; + hppro) + basic_machine=hppa1.1-hp + basic_os=proelf + ;; + i386mach) + basic_machine=i386-mach + basic_os=mach + ;; + isi68 | isi) + basic_machine=m68k-isi + basic_os=sysv + ;; + m68knommu) + basic_machine=m68k-unknown + basic_os=linux + ;; + magnum | m3230) + basic_machine=mips-mips + basic_os=sysv + ;; + merlin) + basic_machine=ns32k-utek + basic_os=sysv + ;; + mingw64) + basic_machine=x86_64-pc + basic_os=mingw64 + ;; + mingw32) + basic_machine=i686-pc + basic_os=mingw32 + ;; + mingw32ce) + basic_machine=arm-unknown + basic_os=mingw32ce + ;; + monitor) + basic_machine=m68k-rom68k + basic_os=coff + ;; + morphos) + basic_machine=powerpc-unknown + basic_os=morphos + ;; + moxiebox) + basic_machine=moxie-unknown + basic_os=moxiebox + ;; + msdos) + basic_machine=i386-pc + basic_os=msdos + ;; + msys) + basic_machine=i686-pc + basic_os=msys + ;; + mvs) + basic_machine=i370-ibm + basic_os=mvs + ;; + nacl) + basic_machine=le32-unknown + basic_os=nacl + ;; + ncr3000) + basic_machine=i486-ncr + basic_os=sysv4 + ;; + netbsd386) + basic_machine=i386-pc + basic_os=netbsd + ;; + netwinder) + basic_machine=armv4l-rebel + basic_os=linux + ;; + news | news700 | news800 | news900) + basic_machine=m68k-sony + basic_os=newsos + ;; + news1000) + basic_machine=m68030-sony + basic_os=newsos + ;; + necv70) + basic_machine=v70-nec + basic_os=sysv + ;; + nh3000) + basic_machine=m68k-harris + basic_os=cxux + ;; + nh[45]000) + basic_machine=m88k-harris + basic_os=cxux + ;; + nindy960) + 
basic_machine=i960-intel + basic_os=nindy + ;; + mon960) + basic_machine=i960-intel + basic_os=mon960 + ;; + nonstopux) + basic_machine=mips-compaq + basic_os=nonstopux + ;; + os400) + basic_machine=powerpc-ibm + basic_os=os400 + ;; + OSE68000 | ose68000) + basic_machine=m68000-ericsson + basic_os=ose + ;; + os68k) + basic_machine=m68k-none + basic_os=os68k + ;; + paragon) + basic_machine=i860-intel + basic_os=osf + ;; + parisc) + basic_machine=hppa-unknown + basic_os=linux + ;; + psp) + basic_machine=mipsallegrexel-sony + basic_os=psp + ;; + pw32) + basic_machine=i586-unknown + basic_os=pw32 + ;; + rdos | rdos64) + basic_machine=x86_64-pc + basic_os=rdos + ;; + rdos32) + basic_machine=i386-pc + basic_os=rdos + ;; + rom68k) + basic_machine=m68k-rom68k + basic_os=coff + ;; + sa29200) + basic_machine=a29k-amd + basic_os=udi + ;; + sei) + basic_machine=mips-sei + basic_os=seiux + ;; + sequent) + basic_machine=i386-sequent + basic_os= + ;; + sps7) + basic_machine=m68k-bull + basic_os=sysv2 + ;; + st2000) + basic_machine=m68k-tandem + basic_os= + ;; + stratus) + basic_machine=i860-stratus + basic_os=sysv4 + ;; + sun2) + basic_machine=m68000-sun + basic_os= + ;; + sun2os3) + basic_machine=m68000-sun + basic_os=sunos3 + ;; + sun2os4) + basic_machine=m68000-sun + basic_os=sunos4 + ;; + sun3) + basic_machine=m68k-sun + basic_os= + ;; + sun3os3) + basic_machine=m68k-sun + basic_os=sunos3 + ;; + sun3os4) + basic_machine=m68k-sun + basic_os=sunos4 + ;; + sun4) + basic_machine=sparc-sun + basic_os= + ;; + sun4os3) + basic_machine=sparc-sun + basic_os=sunos3 + ;; + sun4os4) + basic_machine=sparc-sun + basic_os=sunos4 + ;; + sun4sol2) + basic_machine=sparc-sun + basic_os=solaris2 + ;; + sun386 | sun386i | roadrunner) + basic_machine=i386-sun + basic_os= + ;; + sv1) + basic_machine=sv1-cray + basic_os=unicos + ;; + symmetry) + basic_machine=i386-sequent + basic_os=dynix + ;; + t3e) + basic_machine=alphaev5-cray + basic_os=unicos + ;; + t90) + basic_machine=t90-cray + 
basic_os=unicos + ;; + toad1) + basic_machine=pdp10-xkl + basic_os=tops20 + ;; + tpf) + basic_machine=s390x-ibm + basic_os=tpf + ;; + udi29k) + basic_machine=a29k-amd + basic_os=udi + ;; + ultra3) + basic_machine=a29k-nyu + basic_os=sym1 + ;; + v810 | necv810) + basic_machine=v810-nec + basic_os=none + ;; + vaxv) + basic_machine=vax-dec + basic_os=sysv + ;; + vms) + basic_machine=vax-dec + basic_os=vms + ;; + vsta) + basic_machine=i386-pc + basic_os=vsta + ;; + vxworks960) + basic_machine=i960-wrs + basic_os=vxworks + ;; + vxworks68) + basic_machine=m68k-wrs + basic_os=vxworks + ;; + vxworks29k) + basic_machine=a29k-wrs + basic_os=vxworks + ;; + xbox) + basic_machine=i686-pc + basic_os=mingw32 + ;; + ymp) + basic_machine=ymp-cray + basic_os=unicos + ;; + *) + basic_machine=$1 + basic_os= + ;; + esac + ;; +esac + +# Decode 1-component or ad-hoc basic machines +case $basic_machine in + # Here we handle the default manufacturer of certain CPU types. It is in + # some cases the only manufacturer, in others, it is the most popular. + w89k) + cpu=hppa1.1 + vendor=winbond + ;; + op50n) + cpu=hppa1.1 + vendor=oki + ;; + op60c) + cpu=hppa1.1 + vendor=oki + ;; + ibm*) + cpu=i370 + vendor=ibm + ;; + orion105) + cpu=clipper + vendor=highlevel + ;; + mac | mpw | mac-mpw) + cpu=m68k + vendor=apple + ;; + pmac | pmac-mpw) + cpu=powerpc + vendor=apple + ;; + + # Recognize the various machine names and aliases which stand + # for a CPU type and a company and sometimes even an OS. 
+ 3b1 | 7300 | 7300-att | att-7300 | pc7300 | safari | unixpc) + cpu=m68000 + vendor=att + ;; + 3b*) + cpu=we32k + vendor=att + ;; + bluegene*) + cpu=powerpc + vendor=ibm + basic_os=cnk + ;; + decsystem10* | dec10*) + cpu=pdp10 + vendor=dec + basic_os=tops10 + ;; + decsystem20* | dec20*) + cpu=pdp10 + vendor=dec + basic_os=tops20 + ;; + delta | 3300 | motorola-3300 | motorola-delta \ + | 3300-motorola | delta-motorola) + cpu=m68k + vendor=motorola + ;; + dpx2*) + cpu=m68k + vendor=bull + basic_os=sysv3 + ;; + encore | umax | mmax) + cpu=ns32k + vendor=encore + ;; + elxsi) + cpu=elxsi + vendor=elxsi + basic_os=${basic_os:-bsd} + ;; + fx2800) + cpu=i860 + vendor=alliant + ;; + genix) + cpu=ns32k + vendor=ns + ;; + h3050r* | hiux*) + cpu=hppa1.1 + vendor=hitachi + basic_os=hiuxwe2 + ;; + hp3k9[0-9][0-9] | hp9[0-9][0-9]) + cpu=hppa1.0 + vendor=hp + ;; + hp9k2[0-9][0-9] | hp9k31[0-9]) + cpu=m68000 + vendor=hp + ;; + hp9k3[2-9][0-9]) + cpu=m68k + vendor=hp + ;; + hp9k6[0-9][0-9] | hp6[0-9][0-9]) + cpu=hppa1.0 + vendor=hp + ;; + hp9k7[0-79][0-9] | hp7[0-79][0-9]) + cpu=hppa1.1 + vendor=hp + ;; + hp9k78[0-9] | hp78[0-9]) + # FIXME: really hppa2.0-hp + cpu=hppa1.1 + vendor=hp + ;; + hp9k8[67]1 | hp8[67]1 | hp9k80[24] | hp80[24] | hp9k8[78]9 | hp8[78]9 | hp9k893 | hp893) + # FIXME: really hppa2.0-hp + cpu=hppa1.1 + vendor=hp + ;; + hp9k8[0-9][13679] | hp8[0-9][13679]) + cpu=hppa1.1 + vendor=hp + ;; + hp9k8[0-9][0-9] | hp8[0-9][0-9]) + cpu=hppa1.0 + vendor=hp + ;; + i*86v32) + cpu=$(echo "$1" | sed -e 's/86.*/86/') + vendor=pc + basic_os=sysv32 + ;; + i*86v4*) + cpu=$(echo "$1" | sed -e 's/86.*/86/') + vendor=pc + basic_os=sysv4 + ;; + i*86v) + cpu=$(echo "$1" | sed -e 's/86.*/86/') + vendor=pc + basic_os=sysv + ;; + i*86sol2) + cpu=$(echo "$1" | sed -e 's/86.*/86/') + vendor=pc + basic_os=solaris2 + ;; + j90 | j90-cray) + cpu=j90 + vendor=cray + basic_os=${basic_os:-unicos} + ;; + iris | iris4d) + cpu=mips + vendor=sgi + case $basic_os in + irix*) + ;; + *) + basic_os=irix4 
+ ;; + esac + ;; + miniframe) + cpu=m68000 + vendor=convergent + ;; + *mint | mint[0-9]* | *MiNT | *MiNT[0-9]*) + cpu=m68k + vendor=atari + basic_os=mint + ;; + news-3600 | risc-news) + cpu=mips + vendor=sony + basic_os=newsos + ;; + next | m*-next) + cpu=m68k + vendor=next + case $basic_os in + openstep*) + ;; + nextstep*) + ;; + ns2*) + basic_os=nextstep2 + ;; + *) + basic_os=nextstep3 + ;; + esac + ;; + np1) + cpu=np1 + vendor=gould + ;; + op50n-* | op60c-*) + cpu=hppa1.1 + vendor=oki + basic_os=proelf + ;; + pa-hitachi) + cpu=hppa1.1 + vendor=hitachi + basic_os=hiuxwe2 + ;; + pbd) + cpu=sparc + vendor=tti + ;; + pbb) + cpu=m68k + vendor=tti + ;; + pc532) + cpu=ns32k + vendor=pc532 + ;; + pn) + cpu=pn + vendor=gould + ;; + power) + cpu=power + vendor=ibm + ;; + ps2) + cpu=i386 + vendor=ibm + ;; + rm[46]00) + cpu=mips + vendor=siemens + ;; + rtpc | rtpc-*) + cpu=romp + vendor=ibm + ;; + sde) + cpu=mipsisa32 + vendor=sde + basic_os=${basic_os:-elf} + ;; + simso-wrs) + cpu=sparclite + vendor=wrs + basic_os=vxworks + ;; + tower | tower-32) + cpu=m68k + vendor=ncr + ;; + vpp*|vx|vx-*) + cpu=f301 + vendor=fujitsu + ;; + w65) + cpu=w65 + vendor=wdc + ;; + w89k-*) + cpu=hppa1.1 + vendor=winbond + basic_os=proelf + ;; + none) + cpu=none + vendor=none + ;; + leon|leon[3-9]) + cpu=sparc + vendor=$basic_machine + ;; + leon-*|leon[3-9]-*) + cpu=sparc + vendor=$(echo "$basic_machine" | sed 's/-.*//') + ;; + + *-*) + # shellcheck disable=SC2162 + IFS="-" read cpu vendor <&2 + exit 1 + ;; + esac + ;; +esac + +# Here we canonicalize certain aliases for manufacturers. +case $vendor in + digital*) + vendor=dec + ;; + commodore*) + vendor=cbm + ;; + *) + ;; +esac + +# Decode manufacturer-specific aliases for certain operating systems. + +if test x$basic_os != x +then + +# First recognize some ad-hoc cases, or perhaps split kernel-os, or else just +# set os.
+case $basic_os in + gnu/linux*) + kernel=linux + os=$(echo $basic_os | sed -e 's|gnu/linux|gnu|') + ;; + os2-emx) + kernel=os2 + os=$(echo $basic_os | sed -e 's|os2-emx|emx|') + ;; + nto-qnx*) + kernel=nto + os=$(echo $basic_os | sed -e 's|nto-qnx|qnx|') + ;; + *-*) + # shellcheck disable=SC2162 + IFS="-" read kernel os <&2 + exit 1 + ;; +esac + +# As a final step for OS-related things, validate the OS-kernel combination +# (given a valid OS), if there is a kernel. +case $kernel-$os in + linux-gnu* | linux-dietlibc* | linux-android* | linux-newlib* | linux-musl* | linux-uclibc* ) + ;; + uclinux-uclibc* ) + ;; + -dietlibc* | -newlib* | -musl* | -uclibc* ) + # These are just libc implementations, not actual OSes, and thus + # require a kernel. + echo "Invalid configuration \`$1': libc \`$os' needs explicit kernel." 1>&2 + exit 1 + ;; + kfreebsd*-gnu* | kopensolaris*-gnu*) + ;; + nto-qnx*) + ;; + os2-emx) + ;; + *-eabi* | *-gnueabi*) + ;; + -*) + # Blank kernel with real OS is always fine. + ;; + *-*) + echo "Invalid configuration \`$1': Kernel \`$kernel' not known to work with OS \`$os'." 1>&2 + exit 1 + ;; +esac + +# Here we handle the case where we know the os, and the CPU type, but not the +# manufacturer. We pick the logical manufacturer. 
+case $vendor in + unknown) + case $cpu-$os in + *-riscix*) + vendor=acorn + ;; + *-sunos*) + vendor=sun + ;; + *-cnk* | *-aix*) + vendor=ibm + ;; + *-beos*) + vendor=be + ;; + *-hpux*) + vendor=hp + ;; + *-mpeix*) + vendor=hp + ;; + *-hiux*) + vendor=hitachi + ;; + *-unos*) + vendor=crds + ;; + *-dgux*) + vendor=dg + ;; + *-luna*) + vendor=omron + ;; + *-genix*) + vendor=ns + ;; + *-clix*) + vendor=intergraph + ;; + *-mvs* | *-opened*) + vendor=ibm + ;; + *-os400*) + vendor=ibm + ;; + s390-* | s390x-*) + vendor=ibm + ;; + *-ptx*) + vendor=sequent + ;; + *-tpf*) + vendor=ibm + ;; + *-vxsim* | *-vxworks* | *-windiss*) + vendor=wrs + ;; + *-aux*) + vendor=apple + ;; + *-hms*) + vendor=hitachi + ;; + *-mpw* | *-macos*) + vendor=apple + ;; + *-*mint | *-mint[0-9]* | *-*MiNT | *-MiNT[0-9]*) + vendor=atari + ;; + *-vos*) + vendor=stratus + ;; + esac + ;; +esac + +echo "$cpu-$vendor-${kernel:+$kernel-}$os" +exit + +# Local variables: +# eval: (add-hook 'before-save-hook 'time-stamp) +# time-stamp-start: "timestamp='" +# time-stamp-format: "%:y-%02m-%02d" +# time-stamp-end: "'" +# End: diff --git a/src/pcre/configure b/src/pcre2/configure similarity index 56% rename from src/pcre/configure rename to src/pcre2/configure index 2db0796f..319d8fb1 100755 --- a/src/pcre/configure +++ b/src/pcre2/configure @@ -1,9 +1,10 @@ #! /bin/sh # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.69 for PCRE 8.43. +# Generated by GNU Autoconf 2.71 for PCRE2 10.37. # # -# Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc. +# Copyright (C) 1992-1996, 1998-2017, 2020-2021 Free Software Foundation, +# Inc. 
# # # This configure script is free software; the Free Software Foundation @@ -14,14 +15,16 @@ # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh -if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : +as_nop=: +if test ${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1 +then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST -else +else $as_nop case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( @@ -31,46 +34,46 @@ esac fi + +# Reset variables that may have inherited troublesome values from +# the environment. + +# IFS needs to be set, to space, tab, and newline, in precisely that order. +# (If _AS_PATH_WALK were called with IFS unset, it would have the +# side effect of setting IFS to empty, thus disabling word splitting.) +# Quoting is to prevent editors from complaining about space-tab. as_nl=' ' export as_nl -# Printing a long string crashes Solaris 7 /usr/bin/printf. -as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo -# Prefer a ksh shell builtin over an external printf program on Solaris, -# but without wasting forks for bash or zsh. 
-if test -z "$BASH_VERSION$ZSH_VERSION" \ - && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='print -r --' - as_echo_n='print -rn --' -elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='printf %s\n' - as_echo_n='printf %s' -else - if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then - as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' - as_echo_n='/usr/ucb/echo -n' - else - as_echo_body='eval expr "X$1" : "X\\(.*\\)"' - as_echo_n_body='eval - arg=$1; - case $arg in #( - *"$as_nl"*) - expr "X$arg" : "X\\(.*\\)$as_nl"; - arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; - esac; - expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" - ' - export as_echo_n_body - as_echo_n='sh -c $as_echo_n_body as_echo' - fi - export as_echo_body - as_echo='sh -c $as_echo_body as_echo' -fi +IFS=" "" $as_nl" + +PS1='$ ' +PS2='> ' +PS4='+ ' + +# Ensure predictable behavior from utilities with locale-dependent output. +LC_ALL=C +export LC_ALL +LANGUAGE=C +export LANGUAGE + +# We cannot yet rely on "unset" to work, but we need these variables +# to be unset--not just set to an empty or harmless value--now, to +# avoid bugs in old shells (e.g. pre-3.0 UWIN ksh). This construct +# also avoids known problems related to "unset" and subshell syntax +# in other old shells (e.g. bash 2.01 and pdksh 5.2.14). +for as_var in BASH_ENV ENV MAIL MAILPATH CDPATH +do eval test \${$as_var+y} \ + && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : +done + +# Ensure that fds 0, 1, and 2 are open. +if (exec 3>&0) 2>/dev/null; then :; else exec 0&1) 2>/dev/null; then :; else exec 1>/dev/null; fi +if (exec 3>&2) ; then :; else exec 2>/dev/null; fi # The user is always right. 
-if test "${PATH_SEPARATOR+set}" != set; then +if ${PATH_SEPARATOR+false} :; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || @@ -79,13 +82,6 @@ if test "${PATH_SEPARATOR+set}" != set; then fi -# IFS -# We need space, tab and new line, in precisely that order. Quoting is -# there to prevent editors from complaining about space-tab. -# (If _AS_PATH_WALK were called with IFS unset, it would disable word -# splitting by setting IFS to empty value.) -IFS=" "" $as_nl" - # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( @@ -94,8 +90,12 @@ case $0 in #(( for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + test -r "$as_dir$0" && as_myself=$as_dir$0 && break done IFS=$as_save_IFS @@ -107,30 +107,10 @@ if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then - $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 + printf "%s\n" "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi -# Unset variables that we do not need and which cause bugs (e.g. in -# pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" -# suppresses any "Segmentation fault" message there. '((' could -# trigger a bug in pdksh 5.2.14. -for as_var in BASH_ENV ENV MAIL MAILPATH -do eval test x\${$as_var+set} = xset \ - && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : -done -PS1='$ ' -PS2='> ' -PS4='+ ' - -# NLS nuisances. -LC_ALL=C -export LC_ALL -LANGUAGE=C -export LANGUAGE - -# CDPATH. -(unset CDPATH) >/dev/null 2>&1 && unset CDPATH # Use a proper internal environment variable to ensure we don't fall # into an infinite loop, continuously re-executing ourselves. 
@@ -152,20 +132,22 @@ esac exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"} # Admittedly, this is quite paranoid, since all the known shells bail # out after a failed `exec'. -$as_echo "$0: could not re-execute with $CONFIG_SHELL" >&2 -as_fn_exit 255 +printf "%s\n" "$0: could not re-execute with $CONFIG_SHELL" >&2 +exit 255 fi # We don't want this to propagate to other subprocesses. { _as_can_reexec=; unset _as_can_reexec;} if test "x$CONFIG_SHELL" = x; then - as_bourne_compatible="if test -n \"\${ZSH_VERSION+set}\" && (emulate sh) >/dev/null 2>&1; then : + as_bourne_compatible="as_nop=: +if test \${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1 +then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on \${1+\"\$@\"}, which # is contrary to our usage. Disable this feature. alias -g '\${1+\"\$@\"}'='\"\$@\"' setopt NO_GLOB_SUBST -else +else \$as_nop case \`(set -o) 2>/dev/null\` in #( *posix*) : set -o posix ;; #( @@ -185,18 +167,20 @@ as_fn_success || { exitcode=1; echo as_fn_success failed.; } as_fn_failure && { exitcode=1; echo as_fn_failure succeeded.; } as_fn_ret_success || { exitcode=1; echo as_fn_ret_success failed.; } as_fn_ret_failure && { exitcode=1; echo as_fn_ret_failure succeeded.; } -if ( set x; as_fn_ret_success y && test x = \"\$1\" ); then : +if ( set x; as_fn_ret_success y && test x = \"\$1\" ) +then : -else +else \$as_nop exitcode=1; echo positional parameters were not saved. 
fi test x\$exitcode = x0 || exit 1 +blah=\$(echo \$(echo blah)) +test x\"\$blah\" = xblah || exit 1 test -x / || exit 1" as_suggested=" as_lineno_1=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_1a=\$LINENO as_lineno_2=";as_suggested=$as_suggested$LINENO;as_suggested=$as_suggested" as_lineno_2a=\$LINENO eval 'test \"x\$as_lineno_1'\$as_run'\" != \"x\$as_lineno_2'\$as_run'\" && test \"x\`expr \$as_lineno_1'\$as_run' + 1\`\" = \"x\$as_lineno_2'\$as_run'\"' || exit 1 -test \$(( 1 + 1 )) = 2 || exit 1 test -n \"\${ZSH_VERSION+set}\${BASH_VERSION+set}\" || ( ECHO='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' @@ -204,31 +188,40 @@ test \$(( 1 + 1 )) = 2 || exit 1 ECHO=\$ECHO\$ECHO\$ECHO\$ECHO\$ECHO\$ECHO PATH=/empty FPATH=/empty; export PATH FPATH test \"X\`printf %s \$ECHO\`\" = \"X\$ECHO\" \\ - || test \"X\`print -r -- \$ECHO\`\" = \"X\$ECHO\" ) || exit 1" - if (eval "$as_required") 2>/dev/null; then : + || test \"X\`print -r -- \$ECHO\`\" = \"X\$ECHO\" ) || exit 1 +test \$(( 1 + 1 )) = 2 || exit 1" + if (eval "$as_required") 2>/dev/null +then : as_have_required=yes -else +else $as_nop as_have_required=no fi - if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null; then : + if test x$as_have_required = xyes && (eval "$as_suggested") 2>/dev/null +then : -else +else $as_nop as_save_IFS=$IFS; IFS=$PATH_SEPARATOR as_found=false for as_dir in /bin$PATH_SEPARATOR/usr/bin$PATH_SEPARATOR$PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac as_found=: case $as_dir in #( /*) for as_base in sh bash ksh sh5; do # Try only shells that exist, to save several forks. 
- as_shell=$as_dir/$as_base + as_shell=$as_dir$as_base if { test -f "$as_shell" || test -f "$as_shell.exe"; } && - { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$as_shell"; } 2>/dev/null; then : + as_run=a "$as_shell" -c "$as_bourne_compatible""$as_required" 2>/dev/null +then : CONFIG_SHELL=$as_shell as_have_required=yes - if { $as_echo "$as_bourne_compatible""$as_suggested" | as_run=a "$as_shell"; } 2>/dev/null; then : + if as_run=a "$as_shell" -c "$as_bourne_compatible""$as_suggested" 2>/dev/null +then : break 2 fi fi @@ -236,14 +229,21 @@ fi esac as_found=false done -$as_found || { if { test -f "$SHELL" || test -f "$SHELL.exe"; } && - { $as_echo "$as_bourne_compatible""$as_required" | as_run=a "$SHELL"; } 2>/dev/null; then : - CONFIG_SHELL=$SHELL as_have_required=yes -fi; } IFS=$as_save_IFS +if $as_found +then : + +else $as_nop + if { test -f "$SHELL" || test -f "$SHELL.exe"; } && + as_run=a "$SHELL" -c "$as_bourne_compatible""$as_required" 2>/dev/null +then : + CONFIG_SHELL=$SHELL as_have_required=yes +fi +fi - if test "x$CONFIG_SHELL" != x; then : + if test "x$CONFIG_SHELL" != x +then : export CONFIG_SHELL # We cannot yet assume a decent shell, so we have to provide a # neutralization value for shells without unset; and this also @@ -261,18 +261,19 @@ esac exec $CONFIG_SHELL $as_opts "$as_myself" ${1+"$@"} # Admittedly, this is quite paranoid, since all the known shells bail # out after a failed `exec'. -$as_echo "$0: could not re-execute with $CONFIG_SHELL" >&2 +printf "%s\n" "$0: could not re-execute with $CONFIG_SHELL" >&2 exit 255 fi - if test x$as_have_required = xno; then : - $as_echo "$0: This script requires a shell more modern than all" - $as_echo "$0: the shells that I found on your system." - if test x${ZSH_VERSION+set} = xset ; then - $as_echo "$0: In particular, zsh $ZSH_VERSION has bugs and should" - $as_echo "$0: be upgraded to zsh 4.3.4 or later." 
+ if test x$as_have_required = xno +then : + printf "%s\n" "$0: This script requires a shell more modern than all" + printf "%s\n" "$0: the shells that I found on your system." + if test ${ZSH_VERSION+y} ; then + printf "%s\n" "$0: In particular, zsh $ZSH_VERSION has bugs and should" + printf "%s\n" "$0: be upgraded to zsh 4.3.4 or later." else - $as_echo "$0: Please tell bug-autoconf@gnu.org about your system, + printf "%s\n" "$0: Please tell bug-autoconf@gnu.org about your system, $0: including any error possibly output before this $0: message. Then install a modern shell, or manually run $0: the script under such a shell if you do have one." @@ -299,6 +300,7 @@ as_fn_unset () } as_unset=as_fn_unset + # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. @@ -316,6 +318,14 @@ as_fn_exit () as_fn_set_status $1 exit $1 } # as_fn_exit +# as_fn_nop +# --------- +# Do nothing but, unlike ":", preserve the value of $?. +as_fn_nop () +{ + return $? +} +as_nop=as_fn_nop # as_fn_mkdir_p # ------------- @@ -330,7 +340,7 @@ as_fn_mkdir_p () as_dirs= while :; do case $as_dir in #( - *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( + *\'*) as_qdir=`printf "%s\n" "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" @@ -339,7 +349,7 @@ $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$as_dir" | +printf "%s\n" X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -378,12 +388,13 @@ as_fn_executable_p () # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. 
-if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : +if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null +then : eval 'as_fn_append () { eval $1+=\$2 }' -else +else $as_nop as_fn_append () { eval $1=\$$1\$2 @@ -395,18 +406,27 @@ fi # as_fn_append # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. -if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : +if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null +then : eval 'as_fn_arith () { as_val=$(( $* )) }' -else +else $as_nop as_fn_arith () { as_val=`expr "$@" || test $? -eq 1` } fi # as_fn_arith +# as_fn_nop +# --------- +# Do nothing but, unlike ":", preserve the value of $?. +as_fn_nop () +{ + return $? +} +as_nop=as_fn_nop # as_fn_error STATUS ERROR [LINENO LOG_FD] # ---------------------------------------- @@ -418,9 +438,9 @@ as_fn_error () as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi - $as_echo "$as_me: error: $2" >&2 + printf "%s\n" "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error @@ -447,7 +467,7 @@ as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 
2>/dev/null || -$as_echo X/"$0" | +printf "%s\n" X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q @@ -491,7 +511,7 @@ as_cr_alnum=$as_cr_Letters$as_cr_digits s/-\n.*// ' >$as_me.lineno && chmod +x "$as_me.lineno" || - { $as_echo "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2; as_fn_exit 1; } + { printf "%s\n" "$as_me: error: cannot create $as_me.lineno; rerun with a POSIX shell" >&2; as_fn_exit 1; } # If we had to re-execute with $CONFIG_SHELL, we're ensured to have # already done that, so ensure we don't try to do so again and fall @@ -505,6 +525,10 @@ as_cr_alnum=$as_cr_Letters$as_cr_digits exit } + +# Determine whether it's possible to make 'echo' print without a newline. +# These variables are no longer used directly by Autoconf, but are AC_SUBSTed +# for compatibility with existing Makefiles. ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) @@ -518,6 +542,13 @@ case `echo -n x` in #((((( ECHO_N='-n';; esac +# For backward compatibility with old third-party macros, we provide +# the shell variables $as_echo and $as_echo_n. New code should use +# AS_ECHO(["message"]) and AS_ECHO_N(["message"]), respectively. +as_echo='printf %s\n' +as_echo_n='printf %s' + + rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file @@ -585,54 +616,52 @@ MFLAGS= MAKEFLAGS= # Identity of this package. -PACKAGE_NAME='PCRE' -PACKAGE_TARNAME='pcre' -PACKAGE_VERSION='8.43' -PACKAGE_STRING='PCRE 8.43' +PACKAGE_NAME='PCRE2' +PACKAGE_TARNAME='pcre2' +PACKAGE_VERSION='10.37' +PACKAGE_STRING='PCRE2 10.37' PACKAGE_BUGREPORT='' PACKAGE_URL='' -ac_unique_file="pcre.h.in" +ac_unique_file="src/pcre2.h.in" # Factoring default headers for most tests. 
ac_includes_default="\ -#include -#ifdef HAVE_SYS_TYPES_H -# include +#include +#ifdef HAVE_STDIO_H +# include #endif -#ifdef HAVE_SYS_STAT_H -# include -#endif -#ifdef STDC_HEADERS +#ifdef HAVE_STDLIB_H # include -# include -#else -# ifdef HAVE_STDLIB_H -# include -# endif #endif #ifdef HAVE_STRING_H -# if !defined STDC_HEADERS && defined HAVE_MEMORY_H -# include -# endif # include #endif -#ifdef HAVE_STRINGS_H -# include -#endif #ifdef HAVE_INTTYPES_H # include #endif #ifdef HAVE_STDINT_H # include #endif +#ifdef HAVE_STRINGS_H +# include +#endif +#ifdef HAVE_SYS_TYPES_H +# include +#endif +#ifdef HAVE_SYS_STAT_H +# include +#endif #ifdef HAVE_UNISTD_H # include #endif" +ac_header_c_list= ac_subst_vars='am__EXEEXT_FALSE am__EXEEXT_TRUE LTLIBOBJS LIBOBJS +LIB_POSTFIX +CET_CFLAGS WITH_GCOV_FALSE WITH_GCOV_TRUE GCOV_LIBS @@ -649,49 +678,44 @@ PKG_CONFIG LIBBZ2 LIBZ DISTCHECK_CONFIGURE_FLAGS -EXTRA_LIBPCRECPP_LDFLAGS -EXTRA_LIBPCREPOSIX_LDFLAGS -EXTRA_LIBPCRE32_LDFLAGS -EXTRA_LIBPCRE16_LDFLAGS -EXTRA_LIBPCRE_LDFLAGS +EXTRA_LIBPCRE2_POSIX_LDFLAGS +EXTRA_LIBPCRE2_32_LDFLAGS +EXTRA_LIBPCRE2_16_LDFLAGS +EXTRA_LIBPCRE2_8_LDFLAGS PTHREAD_CFLAGS PTHREAD_LIBS PTHREAD_CC ax_pthread_config -PCRE_STATIC_CFLAG +PCRE2_STATIC_CFLAG LIBREADLINE +WITH_FUZZ_SUPPORT_FALSE +WITH_FUZZ_SUPPORT_TRUE WITH_VALGRIND_FALSE WITH_VALGRIND_TRUE -WITH_UTF_FALSE -WITH_UTF_TRUE +WITH_UNICODE_FALSE +WITH_UNICODE_TRUE WITH_JIT_FALSE WITH_JIT_TRUE WITH_REBUILD_CHARTABLES_FALSE WITH_REBUILD_CHARTABLES_TRUE -WITH_PCRE_CPP_FALSE -WITH_PCRE_CPP_TRUE -WITH_PCRE32_FALSE -WITH_PCRE32_TRUE -WITH_PCRE16_FALSE -WITH_PCRE16_TRUE -WITH_PCRE8_FALSE -WITH_PCRE8_TRUE -pcre_have_bits_type_traits -pcre_have_type_traits -pcre_have_ulong_long -pcre_have_long_long -enable_cpp -enable_pcre32 -enable_pcre16 -enable_pcre8 -PCRE_DATE -PCRE_PRERELEASE -PCRE_MINOR -PCRE_MAJOR +WITH_DEBUG_FALSE +WITH_DEBUG_TRUE +WITH_PCRE2_32_FALSE +WITH_PCRE2_32_TRUE +WITH_PCRE2_16_FALSE +WITH_PCRE2_16_TRUE +WITH_PCRE2_8_FALSE 
+WITH_PCRE2_8_TRUE +enable_pcre2_32 +enable_pcre2_16 +enable_pcre2_8 +PCRE2_DATE +PCRE2_PRERELEASE +PCRE2_MINOR +PCRE2_MAJOR HAVE_VISIBILITY VISIBILITY_CXXFLAGS VISIBILITY_CFLAGS -CXXCPP LT_SYS_LIBRARY_PATH OTOOL64 OTOOL @@ -706,11 +730,9 @@ ac_ct_DUMPBIN DUMPBIN LD FGREP +EGREP +GREP SED -LIBTOOL -OBJDUMP -DLLTOOL -AS host_os host_vendor host_cpu @@ -719,15 +741,12 @@ build_os build_vendor build_cpu build -EGREP -GREP -CPP -am__fastdepCXX_FALSE -am__fastdepCXX_TRUE -CXXDEPMODE -ac_ct_CXX -CXXFLAGS -CXX +LIBTOOL +OBJDUMP +DLLTOOL +AS +ac_ct_AR +AR am__fastdepCC_FALSE am__fastdepCC_TRUE CCDEPMODE @@ -744,8 +763,6 @@ CPPFLAGS LDFLAGS CFLAGS CC -ac_ct_AR -AR AM_BACKSLASH AM_DEFAULT_VERBOSITY AM_DEFAULT_V @@ -792,6 +809,7 @@ infodir docdir oldincludedir includedir +runstatedir localstatedir sharedstatedir sysconfdir @@ -828,34 +846,44 @@ enable_libtool_lock enable_pcre8 enable_pcre16 enable_pcre32 -enable_cpp +enable_pcre2_8 +enable_pcre2_16 +enable_pcre2_32 +enable_debug enable_jit -enable_pcregrep_jit +enable_jit_sealloc +enable_pcre2grep_jit +enable_pcre2grep_callout +enable_pcre2grep_callout_fork enable_rebuild_chartables -enable_utf8 -enable_utf -enable_unicode_properties +enable_unicode enable_newline_is_cr enable_newline_is_lf enable_newline_is_crlf enable_newline_is_anycrlf enable_newline_is_any +enable_newline_is_nul enable_bsr_anycrlf +enable_never_backslash_C enable_ebcdic enable_ebcdic_nl25 -enable_stack_for_recursion -enable_pcregrep_libz -enable_pcregrep_libbz2 -with_pcregrep_bufsize -enable_pcretest_libedit -enable_pcretest_libreadline -with_posix_malloc_threshold +enable_pcre2grep_libz +enable_pcre2grep_libbz2 +with_pcre2grep_bufsize +with_pcre2grep_max_bufsize +enable_pcre2test_libedit +enable_pcre2test_libreadline with_link_size with_parens_nest_limit +with_heap_limit with_match_limit +with_match_limit_depth with_match_limit_recursion enable_valgrind enable_coverage +enable_fuzz_support +enable_stack_for_recursion +enable_percent_zt ' 
ac_precious_vars='build_alias host_alias @@ -865,12 +893,7 @@ CFLAGS LDFLAGS LIBS CPPFLAGS -CXX -CXXFLAGS -CCC -CPP LT_SYS_LIBRARY_PATH -CXXCPP PKG_CONFIG PKG_CONFIG_PATH PKG_CONFIG_LIBDIR @@ -916,6 +939,7 @@ datadir='${datarootdir}' sysconfdir='${prefix}/etc' sharedstatedir='${prefix}/com' localstatedir='${prefix}/var' +runstatedir='${localstatedir}/run' includedir='${prefix}/include' oldincludedir='/usr/include' docdir='${datarootdir}/doc/${PACKAGE_TARNAME}' @@ -945,8 +969,6 @@ do *) ac_optarg=yes ;; esac - # Accept the important Cygnus configure options, so we can diagnose typos. - case $ac_dashdash$ac_option in --) ac_dashdash=yes ;; @@ -987,9 +1009,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*disable-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid feature name: $ac_useropt" + as_fn_error $? "invalid feature name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" @@ -1013,9 +1035,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*enable-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid feature name: $ac_useropt" + as_fn_error $? 
"invalid feature name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "enable_$ac_useropt" @@ -1168,6 +1190,15 @@ do | -silent | --silent | --silen | --sile | --sil) silent=yes ;; + -runstatedir | --runstatedir | --runstatedi | --runstated \ + | --runstate | --runstat | --runsta | --runst | --runs \ + | --run | --ru | --r) + ac_prev=runstatedir ;; + -runstatedir=* | --runstatedir=* | --runstatedi=* | --runstated=* \ + | --runstate=* | --runstat=* | --runsta=* | --runst=* | --runs=* \ + | --run=* | --ru=* | --r=*) + runstatedir=$ac_optarg ;; + -sbindir | --sbindir | --sbindi | --sbind | --sbin | --sbi | --sb) ac_prev=sbindir ;; -sbindir=* | --sbindir=* | --sbindi=* | --sbind=* | --sbin=* \ @@ -1217,9 +1248,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*with-\([^=]*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid package name: $ac_useropt" + as_fn_error $? "invalid package name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" @@ -1233,9 +1264,9 @@ do ac_useropt=`expr "x$ac_option" : 'x-*without-\(.*\)'` # Reject names that are not valid shell variable names. expr "x$ac_useropt" : ".*[^-+._$as_cr_alnum]" >/dev/null && - as_fn_error $? "invalid package name: $ac_useropt" + as_fn_error $? "invalid package name: \`$ac_useropt'" ac_useropt_orig=$ac_useropt - ac_useropt=`$as_echo "$ac_useropt" | sed 's/[-+.]/_/g'` + ac_useropt=`printf "%s\n" "$ac_useropt" | sed 's/[-+.]/_/g'` case $ac_user_opts in *" "with_$ac_useropt" @@ -1279,9 +1310,9 @@ Try \`$0 --help' for more information" *) # FIXME: should be removed in autoconf 3.0. 
- $as_echo "$as_me: WARNING: you should use --build, --host, --target" >&2 + printf "%s\n" "$as_me: WARNING: you should use --build, --host, --target" >&2 expr "x$ac_option" : ".*[^-._$as_cr_alnum]" >/dev/null && - $as_echo "$as_me: WARNING: invalid host type: $ac_option" >&2 + printf "%s\n" "$as_me: WARNING: invalid host type: $ac_option" >&2 : "${build_alias=$ac_option} ${host_alias=$ac_option} ${target_alias=$ac_option}" ;; @@ -1297,7 +1328,7 @@ if test -n "$ac_unrecognized_opts"; then case $enable_option_checking in no) ;; fatal) as_fn_error $? "unrecognized options: $ac_unrecognized_opts" ;; - *) $as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2 ;; + *) printf "%s\n" "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2 ;; esac fi @@ -1305,7 +1336,7 @@ fi for ac_var in exec_prefix prefix bindir sbindir libexecdir datarootdir \ datadir sysconfdir sharedstatedir localstatedir includedir \ oldincludedir docdir infodir htmldir dvidir pdfdir psdir \ - libdir localedir mandir + libdir localedir mandir runstatedir do eval ac_val=\$$ac_var # Remove trailing slashes. @@ -1361,7 +1392,7 @@ $as_expr X"$as_myself" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_myself" : 'X\(//\)[^/]' \| \ X"$as_myself" : 'X\(//\)$' \| \ X"$as_myself" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$as_myself" | +printf "%s\n" X"$as_myself" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -1418,7 +1449,7 @@ if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures PCRE 8.43 to adapt to many kinds of systems. +\`configure' configures PCRE2 10.37 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... 
@@ -1458,6 +1489,7 @@ Fine tuning of the installation directories: --sysconfdir=DIR read-only single-machine data [PREFIX/etc] --sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com] --localstatedir=DIR modifiable single-machine data [PREFIX/var] + --runstatedir=DIR modifiable per-process data [LOCALSTATEDIR/run] --libdir=DIR object code libraries [EPREFIX/lib] --includedir=DIR C header files [PREFIX/include] --oldincludedir=DIR C header files for non-gcc [/usr/include] @@ -1466,7 +1498,7 @@ Fine tuning of the installation directories: --infodir=DIR info documentation [DATAROOTDIR/info] --localedir=DIR locale-dependent data [DATAROOTDIR/locale] --mandir=DIR man documentation [DATAROOTDIR/man] - --docdir=DIR documentation root [DATAROOTDIR/doc/pcre] + --docdir=DIR documentation root [DATAROOTDIR/doc/pcre2] --htmldir=DIR html documentation [DOCDIR] --dvidir=DIR dvi documentation [DOCDIR] --pdfdir=DIR pdf documentation [DOCDIR] @@ -1488,7 +1520,7 @@ fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of PCRE 8.43:";; + short | recursive ) echo "Configuration of PCRE2 10.37:";; esac cat <<\_ACEOF @@ -1507,21 +1539,22 @@ Optional Features: --enable-fast-install[=PKGS] optimize for fast installation [default=yes] --disable-libtool-lock avoid locking (might break parallel builds) - --disable-pcre8 disable 8 bit character support - --enable-pcre16 enable 16 bit character support - --enable-pcre32 enable 32 bit character support - --disable-cpp disable C++ support + + --disable-pcre2-8 disable 8 bit character support + --enable-pcre2-16 enable 16 bit character support + --enable-pcre2-32 enable 32 bit character support + --enable-debug enable debugging code --enable-jit enable Just-In-Time compiling support - --disable-pcregrep-jit disable JIT support in pcregrep + --enable-jit-sealloc enable SELinux compatible execmem allocator in JIT + (experimental) + --disable-pcre2grep-jit disable JIT support in 
pcre2grep + --disable-pcre2grep-callout + disable callout script support in pcre2grep + --disable-pcre2grep-callout-fork + disable callout script fork support in pcre2grep --enable-rebuild-chartables rebuild character tables in current locale - --enable-utf8 another name for --enable-utf. Kept only for - compatibility reasons - --enable-utf enable UTF-8/16/32 support (incompatible with - --enable-ebcdic) - --enable-unicode-properties - enable Unicode properties support (implies - --enable-utf) + --disable-unicode disable Unicode support --enable-newline-is-cr use CR as newline character --enable-newline-is-lf use LF as newline character (default) --enable-newline-is-crlf @@ -1529,23 +1562,26 @@ Optional Features: --enable-newline-is-anycrlf use CR, LF, or CRLF as newline sequence --enable-newline-is-any use any valid Unicode newline sequence + --enable-newline-is-nul use NUL (binary zero) as newline character --enable-bsr-anycrlf \R matches only CR, LF, CRLF by default + --enable-never-backslash-C + use of \C causes an error --enable-ebcdic assume EBCDIC coding rather than ASCII; incompatible with --enable-utf; use only in (uncommon) EBCDIC environments; it implies --enable-rebuild-chartables --enable-ebcdic-nl25 set EBCDIC code for NL to 0x25 instead of 0x15; it implies --enable-ebcdic - --disable-stack-for-recursion - don't use stack recursion when matching - --enable-pcregrep-libz link pcregrep with libz to handle .gz files - --enable-pcregrep-libbz2 - link pcregrep with libbz2 to handle .bz2 files - --enable-pcretest-libedit - link pcretest with libedit - --enable-pcretest-libreadline - link pcretest with libreadline - --enable-valgrind valgrind support + --enable-pcre2grep-libz link pcre2grep with libz to handle .gz files + --enable-pcre2grep-libbz2 + link pcre2grep with libbz2 to handle .bz2 files + --enable-pcre2test-libedit + link pcre2test with libedit + --enable-pcre2test-libreadline + link pcre2test with libreadline + --enable-valgrind enable valgrind 
support --enable-coverage enable code coverage reports using gcov + --enable-fuzz-support enable fuzzer support + --disable-percent-zt disable the use of z and t formatting modifiers Optional Packages: --with-PACKAGE[=ARG] use PACKAGE [ARG=yes] @@ -1558,18 +1594,23 @@ Optional Packages: --with-gnu-ld assume the C compiler uses GNU ld [default=no] --with-sysroot[=DIR] Search for dependent libraries within DIR (or the compiler's sysroot if not specified). - --with-pcregrep-bufsize=N - pcregrep buffer size (default=20480, minimum=8192) - --with-posix-malloc-threshold=NBYTES - threshold for POSIX malloc usage (default=10) + --with-pcre2grep-bufsize=N + pcre2grep initial buffer size (default=20480, + minimum=8192) + --with-pcre2grep-max-bufsize=N + pcre2grep maximum buffer size (default=1048576, + minimum=8192) --with-link-size=N internal link size (2, 3, or 4 allowed; default=2) --with-parens-nest-limit=N nested parentheses limit (default=250) + --with-heap-limit=N default limit on heap memory (kibibytes, + default=20000000) --with-match-limit=N default limit on internal looping (default=10000000) - --with-match-limit-recursion=N - default limit on internal recursion + --with-match-limit-depth=N + default limit on match tree depth (default=MATCH_LIMIT) + Some influential environment variables: CC C compiler command CFLAGS C compiler flags @@ -1578,12 +1619,8 @@ Some influential environment variables: LIBS libraries to pass to the linker, e.g. -l CPPFLAGS (Objective) C/C++ preprocessor flags, e.g. -I if you have headers in a nonstandard directory - CXX C++ compiler command - CXXFLAGS C++ compiler flags - CPP C preprocessor LT_SYS_LIBRARY_PATH User-defined run-time library search path. - CXXCPP C++ preprocessor PKG_CONFIG path to pkg-config utility PKG_CONFIG_PATH directories to add to pkg-config's search path @@ -1615,9 +1652,9 @@ if test "$ac_init_help" = "recursive"; then case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. 
ac_top_build_prefix= ;; *) - ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` + ac_dir_suffix=/`printf "%s\n" "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. - ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` + ac_top_builddir_sub=`printf "%s\n" "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; @@ -1645,7 +1682,8 @@ esac ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix cd "$ac_dir" || { ac_status=$?; continue; } - # Check for guested configure. + # Check for configure.gnu first; this name is used for a wrapper for + # Metaconfig's "Configure" on case-insensitive file systems. if test -f "$ac_srcdir/configure.gnu"; then echo && $SHELL "$ac_srcdir/configure.gnu" --help=recursive @@ -1653,7 +1691,7 @@ ac_abs_srcdir=$ac_abs_top_srcdir$ac_dir_suffix echo && $SHELL "$ac_srcdir/configure" --help=recursive else - $as_echo "$as_me: WARNING: no configuration information is in $ac_dir" >&2 + printf "%s\n" "$as_me: WARNING: no configuration information is in $ac_dir" >&2 fi || ac_status=$? cd "$ac_pwd" || { ac_status=$?; break; } done @@ -1662,10 +1700,10 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -PCRE configure 8.43 -generated by GNU Autoconf 2.69 +PCRE2 configure 10.37 +generated by GNU Autoconf 2.71 -Copyright (C) 2012 Free Software Foundation, Inc. +Copyright (C) 2021 Free Software Foundation, Inc. This configure script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it. 
_ACEOF @@ -1682,14 +1720,14 @@ fi ac_fn_c_try_compile () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - rm -f conftest.$ac_objext + rm -f conftest.$ac_objext conftest.beam if { { ac_try="$ac_compile" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>conftest.err ac_status=$? if test -s conftest.err; then @@ -1697,14 +1735,15 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err - } && test -s conftest.$ac_objext; then : + } && test -s conftest.$ac_objext +then : ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 @@ -1714,123 +1753,6 @@ fi } # ac_fn_c_try_compile -# ac_fn_cxx_try_compile LINENO -# ---------------------------- -# Try to compile conftest.$ac_ext, and return whether this succeeded. -ac_fn_cxx_try_compile () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - rm -f conftest.$ac_objext - if { { ac_try="$ac_compile" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_compile") 2>conftest.err - ac_status=$? - if test -s conftest.err; then - grep -v '^ *+' conftest.err >conftest.er1 - cat conftest.er1 >&5 - mv -f conftest.er1 conftest.err - fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 - test $ac_status = 0; } && { - test -z "$ac_cxx_werror_flag" || - test ! -s conftest.err - } && test -s conftest.$ac_objext; then : - ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - - ac_retval=1 -fi - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - as_fn_set_status $ac_retval - -} # ac_fn_cxx_try_compile - -# ac_fn_c_try_cpp LINENO -# ---------------------- -# Try to preprocess conftest.$ac_ext, and return whether this succeeded. -ac_fn_c_try_cpp () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - if { { ac_try="$ac_cpp conftest.$ac_ext" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_cpp conftest.$ac_ext") 2>conftest.err - ac_status=$? - if test -s conftest.err; then - grep -v '^ *+' conftest.err >conftest.er1 - cat conftest.er1 >&5 - mv -f conftest.er1 conftest.err - fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } > conftest.i && { - test -z "$ac_c_preproc_warn_flag$ac_c_werror_flag" || - test ! -s conftest.err - }; then : - ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - - ac_retval=1 -fi - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - as_fn_set_status $ac_retval - -} # ac_fn_c_try_cpp - -# ac_fn_c_try_run LINENO -# ---------------------- -# Try to link conftest.$ac_ext, and return whether this succeeded. Assumes -# that executables *can* be run. 
-ac_fn_c_try_run () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - if { { ac_try="$ac_link" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_link") 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } && { ac_try='./conftest$ac_exeext' - { { case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_try") 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; }; then : - ac_retval=0 -else - $as_echo "$as_me: program exited with status $ac_status" >&5 - $as_echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - - ac_retval=$ac_status -fi - rm -rf conftest.dSYM conftest_ipa8_conftest.oo - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - as_fn_set_status $ac_retval - -} # ac_fn_c_try_run - # ac_fn_c_check_header_compile LINENO HEADER VAR INCLUDES # ------------------------------------------------------- # Tests whether HEADER exists and can be compiled using the include files in @@ -1838,26 +1760,28 @@ fi ac_fn_c_check_header_compile () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +printf %s "checking for $2... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ $4 #include <$2> _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : eval "$3=yes" -else +else $as_nop eval "$3=no" fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno } # ac_fn_c_check_header_compile @@ -1869,11 +1793,12 @@ $as_echo "$ac_res" >&6; } ac_fn_c_find_intX_t () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for int$2_t" >&5 -$as_echo_n "checking for int$2_t... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for int$2_t" >&5 +printf %s "checking for int$2_t... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop eval "$3=no" # Order is important - never check a type that is potentially smaller # than half of the expected target width. @@ -1884,7 +1809,7 @@ else $ac_includes_default enum { N = $2 / 2 - 1 }; int -main () +main (void) { static int test_array [1 - 2 * !(0 < ($ac_type) ((((($ac_type) 1 << N) << N) - 1) * 2 + 1))]; test_array [0] = 0; @@ -1894,13 +1819,14 @@ return test_array [0]; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ $ac_includes_default enum { N = $2 / 2 - 1 }; int -main () +main (void) { static int test_array [1 - 2 * !(($ac_type) ((((($ac_type) 1 << N) << N) - 1) * 2 + 1) < ($ac_type) ((((($ac_type) 1 << N) << N) - 1) * 2 + 2))]; @@ -1911,9 +1837,10 @@ return test_array [0]; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : -else +else $as_nop case $ac_type in #( int$2_t) : eval "$3=yes" ;; #( @@ -1921,19 +1848,20 @@ else eval "$3=\$ac_type" ;; esac fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext - if eval test \"x\$"$3"\" = x"no"; then : +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + if eval test \"x\$"$3"\" = x"no" +then : -else +else $as_nop break fi done fi eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno } # ac_fn_c_find_intX_t @@ -1944,14 +1872,14 @@ $as_echo "$ac_res" >&6; } ac_fn_c_try_link () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - rm -f conftest.$ac_objext conftest$ac_exeext + rm -f conftest.$ac_objext conftest.beam conftest$ac_exeext if { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>conftest.err ac_status=$? if test -s conftest.err; then @@ -1959,17 +1887,18 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 mv -f conftest.er1 conftest.err fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } && { test -z "$ac_c_werror_flag" || test ! -s conftest.err } && test -s conftest$ac_exeext && { test "$cross_compiling" = yes || test -x conftest$ac_exeext - }; then : + } +then : ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 ac_retval=1 @@ -1990,11 +1919,12 @@ fi ac_fn_c_check_func () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +printf %s "checking for $2... " >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ /* Define $2 to an innocuous variant, in case <limits.h> declares $2. @@ -2002,16 +1932,9 @@ else #define $2 innocuous_$2 /* System header to define __stub macros and hopefully few prototypes, - which can conflict with char $2 (); below. - Prefer <limits.h> to <assert.h> if __STDC__ is defined, since - <limits.h> exists even on freestanding compilers. */ - -#ifdef __STDC__ -# include <limits.h> -#else -# include <assert.h> -#endif + which can conflict with char $2 (); below. */ +#include <limits.h> #undef $2 /* Override any GCC internal prototype to avoid an error.
@@ -2029,400 +1952,113 @@ choke me #endif int -main () +main (void) { return $2 (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : eval "$3=yes" -else +else $as_nop eval "$3=no" fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext fi eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno } # ac_fn_c_check_func -# ac_fn_cxx_try_cpp LINENO -# ------------------------ -# Try to preprocess conftest.$ac_ext, and return whether this succeeded. -ac_fn_cxx_try_cpp () +# ac_fn_c_check_type LINENO TYPE VAR INCLUDES +# ------------------------------------------- +# Tests whether TYPE exists after having included INCLUDES, setting cache +# variable VAR accordingly. +ac_fn_c_check_type () { as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - if { { ac_try="$ac_cpp conftest.$ac_ext" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_cpp conftest.$ac_ext") 2>conftest.err - ac_status=$? - if test -s conftest.err; then - grep -v '^ *+' conftest.err >conftest.er1 - cat conftest.er1 >&5 - mv -f conftest.er1 conftest.err - fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } > conftest.i && { - test -z "$ac_cxx_preproc_warn_flag$ac_cxx_werror_flag" || - test ! -s conftest.err - }; then : - ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 +printf %s "checking for $2... 
" >&6; } +if eval test \${$3+y} +then : + printf %s "(cached) " >&6 +else $as_nop + eval "$3=no" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$4 +int +main (void) +{ +if (sizeof ($2)) + return 0; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$4 +int +main (void) +{ +if (sizeof (($2))) + return 0; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : - ac_retval=1 +else $as_nop + eval "$3=yes" +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +fi +eval ac_res=\$$3 + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 +printf "%s\n" "$ac_res" >&6; } eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - as_fn_set_status $ac_retval -} # ac_fn_cxx_try_cpp +} # ac_fn_c_check_type +ac_configure_args_raw= +for ac_arg +do + case $ac_arg in + *\'*) + ac_arg=`printf "%s\n" "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; + esac + as_fn_append ac_configure_args_raw " '$ac_arg'" +done -# ac_fn_cxx_try_link LINENO -# ------------------------- -# Try to link conftest.$ac_ext, and return whether this succeeded. -ac_fn_cxx_try_link () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - rm -f conftest.$ac_objext conftest$ac_exeext - if { { ac_try="$ac_link" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; +case $ac_configure_args_raw in + *$as_nl*) + ac_safe_unquote= ;; + *) + ac_unsafe_z='|&;<>()$`\\"*?[ '' ' # This string ends in space, tab. 
+ ac_unsafe_a="$ac_unsafe_z#~" + ac_safe_unquote="s/ '\\([^$ac_unsafe_a][^$ac_unsafe_z]*\\)'/ \\1/g" + ac_configure_args_raw=` printf "%s\n" "$ac_configure_args_raw" | sed "$ac_safe_unquote"`;; esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_link") 2>conftest.err - ac_status=$? - if test -s conftest.err; then - grep -v '^ *+' conftest.err >conftest.er1 - cat conftest.er1 >&5 - mv -f conftest.er1 conftest.err - fi - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } && { - test -z "$ac_cxx_werror_flag" || - test ! -s conftest.err - } && test -s conftest$ac_exeext && { - test "$cross_compiling" = yes || - test -x conftest$ac_exeext - }; then : - ac_retval=0 -else - $as_echo "$as_me: failed program was:" >&5 -sed 's/^/| /' conftest.$ac_ext >&5 - - ac_retval=1 -fi - # Delete the IPA/IPO (Inter Procedural Analysis/Optimization) information - # created by the PGI compiler (conftest_ipa8_conftest.oo), as it would - # interfere with the next link command; also delete a directory that is - # left behind by Apple's compiler. We do this before executing the actions. - rm -rf conftest.dSYM conftest_ipa8_conftest.oo - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - as_fn_set_status $ac_retval - -} # ac_fn_cxx_try_link - -# ac_fn_c_check_header_mongrel LINENO HEADER VAR INCLUDES -# ------------------------------------------------------- -# Tests whether HEADER exists, giving a warning if it cannot be compiled using -# the include files in INCLUDES and setting the cache variable VAR -# accordingly. -ac_fn_c_check_header_mongrel () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - if eval \${$3+:} false; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... 
" >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -else - # Is the header compilable? -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 usability" >&5 -$as_echo_n "checking $2 usability... " >&6; } -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -$4 -#include <$2> -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_header_compiler=yes -else - ac_header_compiler=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_compiler" >&5 -$as_echo "$ac_header_compiler" >&6; } - -# Is the header present? -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 presence" >&5 -$as_echo_n "checking $2 presence... " >&6; } -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include <$2> -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : - ac_header_preproc=yes -else - ac_header_preproc=no -fi -rm -f conftest.err conftest.i conftest.$ac_ext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_preproc" >&5 -$as_echo "$ac_header_preproc" >&6; } - -# So? What about this header? -case $ac_header_compiler:$ac_header_preproc:$ac_c_preproc_warn_flag in #(( - yes:no: ) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&5 -$as_echo "$as_me: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5 -$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;} - ;; - no:yes:* ) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: present but cannot be compiled" >&5 -$as_echo "$as_me: WARNING: $2: present but cannot be compiled" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: check for missing prerequisite headers?" 
>&5 -$as_echo "$as_me: WARNING: $2: check for missing prerequisite headers?" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: see the Autoconf documentation" >&5 -$as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&5 -$as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5 -$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;} - ;; -esac - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else - eval "$3=\$ac_header_compiler" -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -fi - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - -} # ac_fn_c_check_header_mongrel - -# ac_fn_cxx_check_header_mongrel LINENO HEADER VAR INCLUDES -# --------------------------------------------------------- -# Tests whether HEADER exists, giving a warning if it cannot be compiled using -# the include files in INCLUDES and setting the cache variable VAR -# accordingly. -ac_fn_cxx_check_header_mongrel () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - if eval \${$3+:} false; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -else - # Is the header compilable? -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 usability" >&5 -$as_echo_n "checking $2 usability... " >&6; } -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -$4 -#include <$2> -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - ac_header_compiler=yes -else - ac_header_compiler=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_compiler" >&5 -$as_echo "$ac_header_compiler" >&6; } - -# Is the header present? -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking $2 presence" >&5 -$as_echo_n "checking $2 presence... " >&6; } -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include <$2> -_ACEOF -if ac_fn_cxx_try_cpp "$LINENO"; then : - ac_header_preproc=yes -else - ac_header_preproc=no -fi -rm -f conftest.err conftest.i conftest.$ac_ext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_header_preproc" >&5 -$as_echo "$ac_header_preproc" >&6; } - -# So? What about this header? -case $ac_header_compiler:$ac_header_preproc:$ac_cxx_preproc_warn_flag in #(( - yes:no: ) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&5 -$as_echo "$as_me: WARNING: $2: accepted by the compiler, rejected by the preprocessor!" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5 -$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;} - ;; - no:yes:* ) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: present but cannot be compiled" >&5 -$as_echo "$as_me: WARNING: $2: present but cannot be compiled" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: check for missing prerequisite headers?" >&5 -$as_echo "$as_me: WARNING: $2: check for missing prerequisite headers?" 
>&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: see the Autoconf documentation" >&5 -$as_echo "$as_me: WARNING: $2: see the Autoconf documentation" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&5 -$as_echo "$as_me: WARNING: $2: section \"Present But Cannot Be Compiled\"" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $2: proceeding with the compiler's result" >&5 -$as_echo "$as_me: WARNING: $2: proceeding with the compiler's result" >&2;} - ;; -esac - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else - eval "$3=\$ac_header_compiler" -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } -fi - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - -} # ac_fn_cxx_check_header_mongrel - -# ac_fn_cxx_check_type LINENO TYPE VAR INCLUDES -# --------------------------------------------- -# Tests whether TYPE exists after having included INCLUDES, setting cache -# variable VAR accordingly. -ac_fn_cxx_check_type () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else - eval "$3=no" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -$4 -int -main () -{ -if (sizeof ($2)) - return 0; - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -$4 -int -main () -{ -if (sizeof (($2))) - return 0; - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - -else - eval "$3=yes" -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno - -} # ac_fn_cxx_check_type - -# ac_fn_c_check_type LINENO TYPE VAR INCLUDES -# ------------------------------------------- -# Tests whether TYPE exists after having included INCLUDES, setting cache -# variable VAR accordingly. -ac_fn_c_check_type () -{ - as_lineno=${as_lineno-"$1"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $2" >&5 -$as_echo_n "checking for $2... " >&6; } -if eval \${$3+:} false; then : - $as_echo_n "(cached) " >&6 -else - eval "$3=no" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -$4 -int -main () -{ -if (sizeof ($2)) - return 0; - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -$4 -int -main () -{ -if (sizeof (($2))) - return 0; - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - -else - eval "$3=yes" -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -eval ac_res=\$$3 - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_res" >&5 -$as_echo "$ac_res" >&6; } - eval $as_lineno_stack; ${as_lineno_stack:+:} unset as_lineno -} # ac_fn_c_check_type cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by PCRE $as_me 8.43, which was -generated by GNU Autoconf 2.69. 
Invocation command line was +It was created by PCRE2 $as_me 10.37, which was +generated by GNU Autoconf 2.71. Invocation command line was - $ $0 $@ + $ $0$ac_configure_args_raw _ACEOF exec 5>>config.log @@ -2455,8 +2091,12 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - $as_echo "PATH: $as_dir" + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + printf "%s\n" "PATH: $as_dir" done IFS=$as_save_IFS @@ -2491,7 +2131,7 @@ do | -silent | --silent | --silen | --sile | --sil) continue ;; *\'*) - ac_arg=`$as_echo "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; + ac_arg=`printf "%s\n" "$ac_arg" | sed "s/'/'\\\\\\\\''/g"` ;; esac case $ac_pass in 1) as_fn_append ac_configure_args0 " '$ac_arg'" ;; @@ -2526,11 +2166,13 @@ done # WARNING: Use '\'' to represent an apostrophe within the trap. # WARNING: Do not start the trap code with a newline, due to a FreeBSD 4.0 bug. trap 'exit_status=$? + # Sanitize IFS. + IFS=" "" $as_nl" # Save into config.log some information that might help in debugging. { echo - $as_echo "## ---------------- ## + printf "%s\n" "## ---------------- ## ## Cache variables. ## ## ---------------- ##" echo @@ -2541,8 +2183,8 @@ trap 'exit_status=$? case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -2566,7 +2208,7 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; ) echo - $as_echo "## ----------------- ## + printf "%s\n" "## ----------------- ## ## Output variables. 
## ## ----------------- ##" echo @@ -2574,14 +2216,14 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; do eval ac_val=\$$ac_var case $ac_val in - *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; + *\'\''*) ac_val=`printf "%s\n" "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac - $as_echo "$ac_var='\''$ac_val'\''" + printf "%s\n" "$ac_var='\''$ac_val'\''" done | sort echo if test -n "$ac_subst_files"; then - $as_echo "## ------------------- ## + printf "%s\n" "## ------------------- ## ## File substitutions. ## ## ------------------- ##" echo @@ -2589,15 +2231,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; do eval ac_val=\$$ac_var case $ac_val in - *\'\''*) ac_val=`$as_echo "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; + *\'\''*) ac_val=`printf "%s\n" "$ac_val" | sed "s/'\''/'\''\\\\\\\\'\'''\''/g"`;; esac - $as_echo "$ac_var='\''$ac_val'\''" + printf "%s\n" "$ac_var='\''$ac_val'\''" done | sort echo fi if test -s confdefs.h; then - $as_echo "## ----------- ## + printf "%s\n" "## ----------- ## ## confdefs.h. ## ## ----------- ##" echo @@ -2605,8 +2247,8 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; echo fi test "$ac_signal" != 0 && - $as_echo "$as_me: caught signal $ac_signal" - $as_echo "$as_me: exit $exit_status" + printf "%s\n" "$as_me: caught signal $ac_signal" + printf "%s\n" "$as_me: exit $exit_status" } >&5 rm -f core *.core core.conftest.* && rm -f -r conftest* confdefs* conf$$* $ac_clean_files && @@ -2620,63 +2262,48 @@ ac_signal=0 # confdefs.h avoids OS command line length limits that DEFS can exceed. rm -f -r conftest* confdefs.h -$as_echo "/* confdefs.h */" > confdefs.h +printf "%s\n" "/* confdefs.h */" > confdefs.h # Predefined preprocessor variables. 
-cat >>confdefs.h <<_ACEOF -#define PACKAGE_NAME "$PACKAGE_NAME" -_ACEOF +printf "%s\n" "#define PACKAGE_NAME \"$PACKAGE_NAME\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_TARNAME "$PACKAGE_TARNAME" -_ACEOF +printf "%s\n" "#define PACKAGE_TARNAME \"$PACKAGE_TARNAME\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_VERSION "$PACKAGE_VERSION" -_ACEOF +printf "%s\n" "#define PACKAGE_VERSION \"$PACKAGE_VERSION\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_STRING "$PACKAGE_STRING" -_ACEOF +printf "%s\n" "#define PACKAGE_STRING \"$PACKAGE_STRING\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_BUGREPORT "$PACKAGE_BUGREPORT" -_ACEOF +printf "%s\n" "#define PACKAGE_BUGREPORT \"$PACKAGE_BUGREPORT\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PACKAGE_URL "$PACKAGE_URL" -_ACEOF +printf "%s\n" "#define PACKAGE_URL \"$PACKAGE_URL\"" >>confdefs.h # Let the site file select an alternate cache file if it wants to. # Prefer an explicitly selected file to automatically selected ones. -ac_site_file1=NONE -ac_site_file2=NONE if test -n "$CONFIG_SITE"; then - # We do not want a PATH search for config.site. 
- case $CONFIG_SITE in #(( - -*) ac_site_file1=./$CONFIG_SITE;; - */*) ac_site_file1=$CONFIG_SITE;; - *) ac_site_file1=./$CONFIG_SITE;; - esac + ac_site_files="$CONFIG_SITE" elif test "x$prefix" != xNONE; then - ac_site_file1=$prefix/share/config.site - ac_site_file2=$prefix/etc/config.site + ac_site_files="$prefix/share/config.site $prefix/etc/config.site" else - ac_site_file1=$ac_default_prefix/share/config.site - ac_site_file2=$ac_default_prefix/etc/config.site + ac_site_files="$ac_default_prefix/share/config.site $ac_default_prefix/etc/config.site" fi -for ac_site_file in "$ac_site_file1" "$ac_site_file2" + +for ac_site_file in $ac_site_files do - test "x$ac_site_file" = xNONE && continue - if test /dev/null != "$ac_site_file" && test -r "$ac_site_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading site script $ac_site_file" >&5 -$as_echo "$as_me: loading site script $ac_site_file" >&6;} + case $ac_site_file in #( + */*) : + ;; #( + *) : + ac_site_file=./$ac_site_file ;; +esac + if test -f "$ac_site_file" && test -r "$ac_site_file"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading site script $ac_site_file" >&5 +printf "%s\n" "$as_me: loading site script $ac_site_file" >&6;} sed 's/^/| /' "$ac_site_file" >&5 . "$ac_site_file" \ - || { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} + || { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "failed to load site script $ac_site_file See \`config.log' for more details" "$LINENO" 5; } fi @@ -2686,120 +2313,511 @@ if test -r "$cache_file"; then # Some versions of bash will fail to source /dev/null (special files # actually), so we avoid doing that. DJGPP emulates it as a regular file. 
if test /dev/null != "$cache_file" && test -f "$cache_file"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 -$as_echo "$as_me: loading cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: loading cache $cache_file" >&5 +printf "%s\n" "$as_me: loading cache $cache_file" >&6;} case $cache_file in [\\/]* | ?:[\\/]* ) . "$cache_file";; *) . "./$cache_file";; esac fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 -$as_echo "$as_me: creating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating cache $cache_file" >&5 +printf "%s\n" "$as_me: creating cache $cache_file" >&6;} >$cache_file fi -# Check that the precious variables saved in the cache have kept the same -# value. -ac_cache_corrupted=false -for ac_var in $ac_precious_vars; do - eval ac_old_set=\$ac_cv_env_${ac_var}_set - eval ac_new_set=\$ac_env_${ac_var}_set - eval ac_old_val=\$ac_cv_env_${ac_var}_value - eval ac_new_val=\$ac_env_${ac_var}_value - case $ac_old_set,$ac_new_set in - set,) - { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 -$as_echo "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} - ac_cache_corrupted=: ;; - ,set) - { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5 -$as_echo "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} - ac_cache_corrupted=: ;; - ,);; - *) - if test "x$ac_old_val" != "x$ac_new_val"; then - # differences in whitespace do not lead to failure. 
- ac_old_val_w=`echo x $ac_old_val` - ac_new_val_w=`echo x $ac_new_val` - if test "$ac_old_val_w" != "$ac_new_val_w"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5 -$as_echo "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} - ac_cache_corrupted=: - else - { $as_echo "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5 -$as_echo "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;} - eval $ac_var=\$ac_old_val - fi - { $as_echo "$as_me:${as_lineno-$LINENO}: former value: \`$ac_old_val'" >&5 -$as_echo "$as_me: former value: \`$ac_old_val'" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: current value: \`$ac_new_val'" >&5 -$as_echo "$as_me: current value: \`$ac_new_val'" >&2;} - fi;; - esac - # Pass precious variables to config.status. - if test "$ac_new_set" = set; then - case $ac_new_val in - *\'*) ac_arg=$ac_var=`$as_echo "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; - *) ac_arg=$ac_var=$ac_new_val ;; - esac - case " $ac_configure_args " in - *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. - *) as_fn_append ac_configure_args " '$ac_arg'" ;; - esac - fi -done -if $ac_cache_corrupted; then - { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} - { $as_echo "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5 -$as_echo "$as_me: error: changes in the environment can compromise the build" >&2;} - as_fn_error $? "run \`make distclean' and/or \`rm $cache_file' and start over" "$LINENO" 5 -fi -## -------------------- ## -## Main body of script. ## -## -------------------- ## +# Test code for whether the C compiler supports C89 (global declarations) +ac_c_conftest_c89_globals=' +/* Does the compiler advertise C89 conformance? 
+ Do not test the value of __STDC__, because some compilers set it to 0 + while being otherwise adequately conformant. */ +#if !defined __STDC__ +# error "Compiler does not advertise C89 conformance" +#endif -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu +#include <stddef.h> +#include <stdarg.h> +struct stat; +/* Most of the following tests are stolen from RCS 5.7 src/conf.sh. */ +struct buf { int x; }; +struct buf * (*rcsopen) (struct buf *, struct stat *, int); +static char *e (p, i) + char **p; + int i; +{ + return p[i]; +} +static char *f (char * (*g) (char **, int), char **p, ...) +{ + char *s; + va_list v; + va_start (v,p); + s = g (p, va_arg (v,int)); + va_end (v); + return s; +} +/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has + function prototypes and stuff, but not \xHH hex character constants. + These do not provoke an error unfortunately, instead are silently treated + as an "x". The following induces an error, until -std is added to get + proper ANSI mode. Curiously \x00 != x always comes out true, for an + array size at least. It is necessary to write \x00 == 0 to get something + that is true only with -std. */ +int osf4_cc_array ['\''\x00'\'' == 0 ? 1 : -1]; +/* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters + inside strings and character constants. */ +#define FOO(x) '\''x'\'' +int xlc6_cc_array[FOO(a) == '\''x'\'' ? 1 : -1]; -am__api_version='1.16' +int test (int i, double x); +struct s1 {int (*f) (int a);}; +struct s2 {int (*f) (double a);}; +int pairnames (int, char **, int *(*)(struct buf *, struct stat *, int), + int, int);' -ac_aux_dir= -for ac_dir in "$srcdir" "$srcdir/.."
"$srcdir/../.."; do - if test -f "$ac_dir/install-sh"; then - ac_aux_dir=$ac_dir - ac_install_sh="$ac_aux_dir/install-sh -c" - break - elif test -f "$ac_dir/install.sh"; then - ac_aux_dir=$ac_dir - ac_install_sh="$ac_aux_dir/install.sh -c" - break - elif test -f "$ac_dir/shtool"; then - ac_aux_dir=$ac_dir - ac_install_sh="$ac_aux_dir/shtool install -c" - break - fi -done -if test -z "$ac_aux_dir"; then - as_fn_error $? "cannot find install-sh, install.sh, or shtool in \"$srcdir\" \"$srcdir/..\" \"$srcdir/../..\"" "$LINENO" 5 -fi +# Test code for whether the C compiler supports C89 (body of main). +ac_c_conftest_c89_main=' +ok |= (argc == 0 || f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]); +' -# These three variables are undocumented and unsupported, -# and are intended to be withdrawn in a future Autoconf release. +# Test code for whether the C compiler supports C99 (global declarations) +ac_c_conftest_c99_globals=' +// Does the compiler advertise C99 conformance? +#if !defined __STDC_VERSION__ || __STDC_VERSION__ < 199901L +# error "Compiler does not advertise C99 conformance" +#endif + +#include +extern int puts (const char *); +extern int printf (const char *, ...); +extern int dprintf (int, const char *, ...); +extern void *malloc (size_t); + +// Check varargs macros. These examples are taken from C99 6.10.3.5. +// dprintf is used instead of fprintf to avoid needing to declare +// FILE and stderr. +#define debug(...) dprintf (2, __VA_ARGS__) +#define showlist(...) puts (#__VA_ARGS__) +#define report(test,...) ((test) ? puts (#test) : printf (__VA_ARGS__)) +static void +test_varargs_macros (void) +{ + int x = 1234; + int y = 5678; + debug ("Flag"); + debug ("X = %d\n", x); + showlist (The first, second, and third items.); + report (x>y, "x is %d but y is %d", x, y); +} + +// Check long long types. 
+#define BIG64 18446744073709551615ull +#define BIG32 4294967295ul +#define BIG_OK (BIG64 / BIG32 == 4294967297ull && BIG64 % BIG32 == 0) +#if !BIG_OK + #error "your preprocessor is broken" +#endif +#if BIG_OK +#else + #error "your preprocessor is broken" +#endif +static long long int bignum = -9223372036854775807LL; +static unsigned long long int ubignum = BIG64; + +struct incomplete_array +{ + int datasize; + double data[]; +}; + +struct named_init { + int number; + const wchar_t *name; + double average; +}; + +typedef const char *ccp; + +static inline int +test_restrict (ccp restrict text) +{ + // See if C++-style comments work. + // Iterate through items via the restricted pointer. + // Also check for declarations in for loops. + for (unsigned int i = 0; *(text+i) != '\''\0'\''; ++i) + continue; + return 0; +} + +// Check varargs and va_copy. +static bool +test_varargs (const char *format, ...) +{ + va_list args; + va_start (args, format); + va_list args_copy; + va_copy (args_copy, args); + + const char *str = ""; + int number = 0; + float fnumber = 0; + + while (*format) + { + switch (*format++) + { + case '\''s'\'': // string + str = va_arg (args_copy, const char *); + break; + case '\''d'\'': // int + number = va_arg (args_copy, int); + break; + case '\''f'\'': // float + fnumber = va_arg (args_copy, double); + break; + default: + break; + } + } + va_end (args_copy); + va_end (args); + + return *str && number && fnumber; +} +' + +# Test code for whether the C compiler supports C99 (body of main). +ac_c_conftest_c99_main=' + // Check bool. + _Bool success = false; + success |= (argc != 0); + + // Check restrict. + if (test_restrict ("String literal") == 0) + success = true; + char *restrict newvar = "Another string"; + + // Check varargs. + success &= test_varargs ("s, d'\'' f .", "string", 65, 34.234); + test_varargs_macros (); + + // Check flexible array members. 
+ struct incomplete_array *ia = + malloc (sizeof (struct incomplete_array) + (sizeof (double) * 10)); + ia->datasize = 10; + for (int i = 0; i < ia->datasize; ++i) + ia->data[i] = i * 1.234; + + // Check named initializers. + struct named_init ni = { + .number = 34, + .name = L"Test wide string", + .average = 543.34343, + }; + + ni.number = 58; + + int dynamic_array[ni.number]; + dynamic_array[0] = argv[0][0]; + dynamic_array[ni.number - 1] = 543; + + // work around unused variable warnings + ok |= (!success || bignum == 0LL || ubignum == 0uLL || newvar[0] == '\''x'\'' + || dynamic_array[ni.number - 1] != 543); +' + +# Test code for whether the C compiler supports C11 (global declarations) +ac_c_conftest_c11_globals=' +// Does the compiler advertise C11 conformance? +#if !defined __STDC_VERSION__ || __STDC_VERSION__ < 201112L +# error "Compiler does not advertise C11 conformance" +#endif + +// Check _Alignas. +char _Alignas (double) aligned_as_double; +char _Alignas (0) no_special_alignment; +extern char aligned_as_int; +char _Alignas (0) _Alignas (int) aligned_as_int; + +// Check _Alignof. +enum +{ + int_alignment = _Alignof (int), + int_array_alignment = _Alignof (int[100]), + char_alignment = _Alignof (char) +}; +_Static_assert (0 < -_Alignof (int), "_Alignof is signed"); + +// Check _Noreturn. +int _Noreturn does_not_return (void) { for (;;) continue; } + +// Check _Static_assert. +struct test_static_assert +{ + int x; + _Static_assert (sizeof (int) <= sizeof (long int), + "_Static_assert does not work in struct"); + long int y; +}; + +// Check UTF-8 literals. +#define u8 syntax error! +char const utf8_literal[] = u8"happens to be ASCII" "another string"; + +// Check duplicate typedefs. +typedef long *long_ptr; +typedef long int *long_ptr; +typedef long_ptr long_ptr; + +// Anonymous structures and unions -- taken from C11 6.7.2.1 Example 1. 
+struct anonymous +{ + union { + struct { int i; int j; }; + struct { int k; long int l; } w; + }; + int m; +} v1; +' + +# Test code for whether the C compiler supports C11 (body of main). +ac_c_conftest_c11_main=' + _Static_assert ((offsetof (struct anonymous, i) + == offsetof (struct anonymous, w.k)), + "Anonymous union alignment botch"); + v1.i = 2; + v1.w.k = 5; + ok |= v1.i != 5; +' + +# Test code for whether the C compiler supports C11 (complete). +ac_c_conftest_c11_program="${ac_c_conftest_c89_globals} +${ac_c_conftest_c99_globals} +${ac_c_conftest_c11_globals} + +int +main (int argc, char **argv) +{ + int ok = 0; + ${ac_c_conftest_c89_main} + ${ac_c_conftest_c99_main} + ${ac_c_conftest_c11_main} + return ok; +} +" + +# Test code for whether the C compiler supports C99 (complete). +ac_c_conftest_c99_program="${ac_c_conftest_c89_globals} +${ac_c_conftest_c99_globals} + +int +main (int argc, char **argv) +{ + int ok = 0; + ${ac_c_conftest_c89_main} + ${ac_c_conftest_c99_main} + return ok; +} +" + +# Test code for whether the C compiler supports C89 (complete). 
+ac_c_conftest_c89_program="${ac_c_conftest_c89_globals} + +int +main (int argc, char **argv) +{ + int ok = 0; + ${ac_c_conftest_c89_main} + return ok; +} +" + +as_fn_append ac_header_c_list " stdio.h stdio_h HAVE_STDIO_H" +as_fn_append ac_header_c_list " stdlib.h stdlib_h HAVE_STDLIB_H" +as_fn_append ac_header_c_list " string.h string_h HAVE_STRING_H" +as_fn_append ac_header_c_list " inttypes.h inttypes_h HAVE_INTTYPES_H" +as_fn_append ac_header_c_list " stdint.h stdint_h HAVE_STDINT_H" +as_fn_append ac_header_c_list " strings.h strings_h HAVE_STRINGS_H" +as_fn_append ac_header_c_list " sys/stat.h sys_stat_h HAVE_SYS_STAT_H" +as_fn_append ac_header_c_list " sys/types.h sys_types_h HAVE_SYS_TYPES_H" +as_fn_append ac_header_c_list " unistd.h unistd_h HAVE_UNISTD_H" +as_fn_append ac_header_c_list " wchar.h wchar_h HAVE_WCHAR_H" +as_fn_append ac_header_c_list " minix/config.h minix_config_h HAVE_MINIX_CONFIG_H" + +# Auxiliary files required by this configure script. +ac_aux_files="config.guess config.sub ltmain.sh ar-lib compile missing install-sh" + +# Locations in which to look for auxiliary files. +ac_aux_dir_candidates="${srcdir}${PATH_SEPARATOR}${srcdir}/..${PATH_SEPARATOR}${srcdir}/../.." + +# Search for a directory containing all of the required auxiliary files, +# $ac_aux_files, from the $PATH-style list $ac_aux_dir_candidates. +# If we don't find one directory that contains all the files we need, +# we report the set of missing files from the *first* directory in +# $ac_aux_dir_candidates and give up. 
+ac_missing_aux_files="" +ac_first_candidate=: +printf "%s\n" "$as_me:${as_lineno-$LINENO}: looking for aux files: $ac_aux_files" >&5 +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +as_found=false +for as_dir in $ac_aux_dir_candidates +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + as_found=: + + printf "%s\n" "$as_me:${as_lineno-$LINENO}: trying $as_dir" >&5 + ac_aux_dir_found=yes + ac_install_sh= + for ac_aux in $ac_aux_files + do + # As a special case, if "install-sh" is required, that requirement + # can be satisfied by any of "install-sh", "install.sh", or "shtool", + # and $ac_install_sh is set appropriately for whichever one is found. + if test x"$ac_aux" = x"install-sh" + then + if test -f "${as_dir}install-sh"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}install-sh found" >&5 + ac_install_sh="${as_dir}install-sh -c" + elif test -f "${as_dir}install.sh"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}install.sh found" >&5 + ac_install_sh="${as_dir}install.sh -c" + elif test -f "${as_dir}shtool"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}shtool found" >&5 + ac_install_sh="${as_dir}shtool install -c" + else + ac_aux_dir_found=no + if $ac_first_candidate; then + ac_missing_aux_files="${ac_missing_aux_files} install-sh" + else + break + fi + fi + else + if test -f "${as_dir}${ac_aux}"; then + printf "%s\n" "$as_me:${as_lineno-$LINENO}: ${as_dir}${ac_aux} found" >&5 + else + ac_aux_dir_found=no + if $ac_first_candidate; then + ac_missing_aux_files="${ac_missing_aux_files} ${ac_aux}" + else + break + fi + fi + fi + done + if test "$ac_aux_dir_found" = yes; then + ac_aux_dir="$as_dir" + break + fi + ac_first_candidate=false + + as_found=false +done +IFS=$as_save_IFS +if $as_found +then : + +else $as_nop + as_fn_error $? 
"cannot find required auxiliary files:$ac_missing_aux_files" "$LINENO" 5 +fi + + +# These three variables are undocumented and unsupported, +# and are intended to be withdrawn in a future Autoconf release. # They can cause serious problems if a builder's source tree is in a directory # whose full name contains unusual characters. -ac_config_guess="$SHELL $ac_aux_dir/config.guess" # Please don't use this var. -ac_config_sub="$SHELL $ac_aux_dir/config.sub" # Please don't use this var. -ac_configure="$SHELL $ac_aux_dir/configure" # Please don't use this var. +if test -f "${ac_aux_dir}config.guess"; then + ac_config_guess="$SHELL ${ac_aux_dir}config.guess" +fi +if test -f "${ac_aux_dir}config.sub"; then + ac_config_sub="$SHELL ${ac_aux_dir}config.sub" +fi +if test -f "$ac_aux_dir/configure"; then + ac_configure="$SHELL ${ac_aux_dir}configure" +fi + +# Check that the precious variables saved in the cache have kept the same +# value. +ac_cache_corrupted=false +for ac_var in $ac_precious_vars; do + eval ac_old_set=\$ac_cv_env_${ac_var}_set + eval ac_new_set=\$ac_env_${ac_var}_set + eval ac_old_val=\$ac_cv_env_${ac_var}_value + eval ac_new_val=\$ac_env_${ac_var}_value + case $ac_old_set,$ac_new_set in + set,) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&5 +printf "%s\n" "$as_me: error: \`$ac_var' was set to \`$ac_old_val' in the previous run" >&2;} + ac_cache_corrupted=: ;; + ,set) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' was not set in the previous run" >&5 +printf "%s\n" "$as_me: error: \`$ac_var' was not set in the previous run" >&2;} + ac_cache_corrupted=: ;; + ,);; + *) + if test "x$ac_old_val" != "x$ac_new_val"; then + # differences in whitespace do not lead to failure. 
+ ac_old_val_w=`echo x $ac_old_val` + ac_new_val_w=`echo x $ac_new_val` + if test "$ac_old_val_w" != "$ac_new_val_w"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: \`$ac_var' has changed since the previous run:" >&5 +printf "%s\n" "$as_me: error: \`$ac_var' has changed since the previous run:" >&2;} + ac_cache_corrupted=: + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&5 +printf "%s\n" "$as_me: warning: ignoring whitespace changes in \`$ac_var' since the previous run:" >&2;} + eval $ac_var=\$ac_old_val + fi + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: former value: \`$ac_old_val'" >&5 +printf "%s\n" "$as_me: former value: \`$ac_old_val'" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: current value: \`$ac_new_val'" >&5 +printf "%s\n" "$as_me: current value: \`$ac_new_val'" >&2;} + fi;; + esac + # Pass precious variables to config.status. + if test "$ac_new_set" = set; then + case $ac_new_val in + *\'*) ac_arg=$ac_var=`printf "%s\n" "$ac_new_val" | sed "s/'/'\\\\\\\\''/g"` ;; + *) ac_arg=$ac_var=$ac_new_val ;; + esac + case " $ac_configure_args " in + *" '$ac_arg' "*) ;; # Avoid dups. Use of quotes ensures accuracy. + *) as_fn_append ac_configure_args " '$ac_arg'" ;; + esac + fi +done +if $ac_cache_corrupted; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: changes in the environment can compromise the build" >&5 +printf "%s\n" "$as_me: error: changes in the environment can compromise the build" >&2;} + as_fn_error $? "run \`${MAKE-make} distclean' and/or \`rm $cache_file' + and start over" "$LINENO" 5 +fi +## -------------------- ## +## Main body of script. 
## +## -------------------- ## + +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu + + + +am__api_version='1.16' + -# Find a good install program. We prefer a C program (faster), + # Find a good install program. We prefer a C program (faster), # so one script is as good as another. But avoid the broken or # incompatible versions: # SysV /etc/install, /usr/sbin/install @@ -2813,20 +2831,25 @@ ac_configure="$SHELL $ac_aux_dir/configure" # Please don't use this var. # OS/2's system install, which has a completely different semantic # ./install, which can be erroneously created by make from ./install.sh. # Reject install programs that cannot install multiple files. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for a BSD-compatible install" >&5 -$as_echo_n "checking for a BSD-compatible install... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for a BSD-compatible install" >&5 +printf %s "checking for a BSD-compatible install... " >&6; } if test -z "$INSTALL"; then -if ${ac_cv_path_install+:} false; then : - $as_echo_n "(cached) " >&6 -else +if test ${ac_cv_path_install+y} +then : + printf %s "(cached) " >&6 +else $as_nop as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - # Account for people who put trailing slashes in PATH elements. -case $as_dir/ in #(( - ./ | .// | /[cC]/* | \ + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + # Account for fact that we put trailing slashes in our PATH walk. +case $as_dir in #(( + ./ | /[cC]/* | \ /etc/* | /usr/sbin/* | /usr/etc/* | /sbin/* | /usr/afsws/bin/* | \ ?:[\\/]os2[\\/]install[\\/]* | ?:[\\/]OS2[\\/]INSTALL[\\/]* | \ /usr/ucb/* ) ;; @@ -2836,13 +2859,13 @@ case $as_dir/ in #(( # by default. 
for ac_prog in ginstall scoinst install; do for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_prog$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_prog$ac_exec_ext"; then if test $ac_prog = install && - grep dspmsg "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then + grep dspmsg "$as_dir$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # AIX install. It has an incompatible calling convention. : elif test $ac_prog = install && - grep pwplus "$as_dir/$ac_prog$ac_exec_ext" >/dev/null 2>&1; then + grep pwplus "$as_dir$ac_prog$ac_exec_ext" >/dev/null 2>&1; then # program-specific install script used by HP pwplus--don't use. : else @@ -2850,12 +2873,12 @@ case $as_dir/ in #(( echo one > conftest.one echo two > conftest.two mkdir conftest.dir - if "$as_dir/$ac_prog$ac_exec_ext" -c conftest.one conftest.two "`pwd`/conftest.dir" && + if "$as_dir$ac_prog$ac_exec_ext" -c conftest.one conftest.two "`pwd`/conftest.dir/" && test -s conftest.one && test -s conftest.two && test -s conftest.dir/conftest.one && test -s conftest.dir/conftest.two then - ac_cv_path_install="$as_dir/$ac_prog$ac_exec_ext -c" + ac_cv_path_install="$as_dir$ac_prog$ac_exec_ext -c" break 3 fi fi @@ -2871,7 +2894,7 @@ IFS=$as_save_IFS rm -rf conftest.one conftest.two conftest.dir fi - if test "${ac_cv_path_install+set}" = set; then + if test ${ac_cv_path_install+y}; then INSTALL=$ac_cv_path_install else # As a last resort, use the slow shell script. Don't cache a @@ -2881,8 +2904,8 @@ fi INSTALL=$ac_install_sh fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $INSTALL" >&5 -$as_echo "$INSTALL" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $INSTALL" >&5 +printf "%s\n" "$INSTALL" >&6; } # Use test -z because SunOS4 sh mishandles braces in ${var-val}. # It thinks the first close brace ends the variable substitution. 
@@ -2892,8 +2915,8 @@ test -z "$INSTALL_SCRIPT" && INSTALL_SCRIPT='${INSTALL}' test -z "$INSTALL_DATA" && INSTALL_DATA='${INSTALL} -m 644' -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether build environment is sane" >&5 -$as_echo_n "checking whether build environment is sane... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether build environment is sane" >&5 +printf %s "checking whether build environment is sane... " >&6; } # Reject unsafe characters in $srcdir or the absolute working directory # name. Accept space and tab only in the latter. am_lf=' @@ -2947,8 +2970,8 @@ else as_fn_error $? "newly created file is older than distributed files! Check your system clock" "$LINENO" 5 fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } # If we didn't sleep, we still need to ensure time stamps of config.status and # generated files are strictly newer. am_sleep_pid= @@ -2967,26 +2990,23 @@ test "$program_suffix" != NONE && # Double any \ or $. # By default was `s,x,x', remove it if useless. ac_script='s/[\\$]/&&/g;s/;s,x,x,$//' -program_transform_name=`$as_echo "$program_transform_name" | sed "$ac_script"` +program_transform_name=`printf "%s\n" "$program_transform_name" | sed "$ac_script"` + # Expand $ac_aux_dir to an absolute path. 
am_aux_dir=`cd "$ac_aux_dir" && pwd` -if test x"${MISSING+set}" != xset; then - case $am_aux_dir in - *\ * | *\ *) - MISSING="\${SHELL} \"$am_aux_dir/missing\"" ;; - *) - MISSING="\${SHELL} $am_aux_dir/missing" ;; - esac + + if test x"${MISSING+set}" != xset; then + MISSING="\${SHELL} '$am_aux_dir/missing'" fi # Use eval to expand $SHELL if eval "$MISSING --is-lightweight"; then am_missing_run="$MISSING " else am_missing_run= - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: 'missing' script is too old or missing" >&5 -$as_echo "$as_me: WARNING: 'missing' script is too old or missing" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: 'missing' script is too old or missing" >&5 +printf "%s\n" "$as_me: WARNING: 'missing' script is too old or missing" >&2;} fi if test x"${install_sh+set}" != xset; then @@ -3006,11 +3026,12 @@ if test "$cross_compiling" != no; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}strip", so it can be a program name with args. set dummy ${ac_tool_prefix}strip; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_STRIP+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_STRIP+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$STRIP"; then ac_cv_prog_STRIP="$STRIP" # Let the user override the test. else @@ -3018,11 +3039,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_STRIP="${ac_tool_prefix}strip" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3033,11 +3058,11 @@ fi fi STRIP=$ac_cv_prog_STRIP if test -n "$STRIP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $STRIP" >&5 -$as_echo "$STRIP" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $STRIP" >&5 +printf "%s\n" "$STRIP" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3046,11 +3071,12 @@ if test -z "$ac_cv_prog_STRIP"; then ac_ct_STRIP=$STRIP # Extract the first word of "strip", so it can be a program name with args. set dummy strip; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_STRIP+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_STRIP+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_STRIP"; then ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test. else @@ -3058,11 +3084,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_STRIP="strip" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3073,11 +3103,11 @@ fi fi ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP if test -n "$ac_ct_STRIP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_STRIP" >&5 -$as_echo "$ac_ct_STRIP" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_STRIP" >&5 +printf "%s\n" "$ac_ct_STRIP" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi if test "x$ac_ct_STRIP" = x; then @@ -3085,8 +3115,8 @@ fi else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac STRIP=$ac_ct_STRIP @@ -3098,25 +3128,31 @@ fi fi INSTALL_STRIP_PROGRAM="\$(install_sh) -c -s" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for a thread-safe mkdir -p" >&5 -$as_echo_n "checking for a thread-safe mkdir -p... " >&6; } + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for a race-free mkdir -p" >&5 +printf %s "checking for a race-free mkdir -p... 
" >&6; } if test -z "$MKDIR_P"; then - if ${ac_cv_path_mkdir+:} false; then : - $as_echo_n "(cached) " >&6 -else + if test ${ac_cv_path_mkdir+y} +then : + printf %s "(cached) " >&6 +else $as_nop as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH$PATH_SEPARATOR/opt/sfw/bin do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_prog in mkdir gmkdir; do for ac_exec_ext in '' $ac_executable_extensions; do - as_fn_executable_p "$as_dir/$ac_prog$ac_exec_ext" || continue - case `"$as_dir/$ac_prog$ac_exec_ext" --version 2>&1` in #( - 'mkdir (GNU coreutils) '* | \ - 'mkdir (coreutils) '* | \ + as_fn_executable_p "$as_dir$ac_prog$ac_exec_ext" || continue + case `"$as_dir$ac_prog$ac_exec_ext" --version 2>&1` in #( + 'mkdir ('*'coreutils) '* | \ + 'BusyBox '* | \ 'mkdir (fileutils) '4.1*) - ac_cv_path_mkdir=$as_dir/$ac_prog$ac_exec_ext + ac_cv_path_mkdir=$as_dir$ac_prog$ac_exec_ext break 3;; esac done @@ -3127,7 +3163,7 @@ IFS=$as_save_IFS fi test -d ./--version && rmdir ./--version - if test "${ac_cv_path_mkdir+set}" = set; then + if test ${ac_cv_path_mkdir+y}; then MKDIR_P="$ac_cv_path_mkdir -p" else # As a last resort, use the slow shell script. Don't cache a @@ -3137,18 +3173,19 @@ fi MKDIR_P="$ac_install_sh -d" fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $MKDIR_P" >&5 -$as_echo "$MKDIR_P" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $MKDIR_P" >&5 +printf "%s\n" "$MKDIR_P" >&6; } for ac_prog in gawk mawk nawk awk do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_AWK+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_AWK+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$AWK"; then ac_cv_prog_AWK="$AWK" # Let the user override the test. else @@ -3156,11 +3193,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_AWK="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3171,24 +3212,25 @@ fi fi AWK=$ac_cv_prog_AWK if test -n "$AWK"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $AWK" >&5 -$as_echo "$AWK" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $AWK" >&5 +printf "%s\n" "$AWK" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi test -n "$AWK" && break done -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} sets \$(MAKE)" >&5 -$as_echo_n "checking whether ${MAKE-make} sets \$(MAKE)... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} sets \$(MAKE)" >&5 +printf %s "checking whether ${MAKE-make} sets \$(MAKE)... 
" >&6; } set x ${MAKE-make} -ac_make=`$as_echo "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'` -if eval \${ac_cv_prog_make_${ac_make}_set+:} false; then : - $as_echo_n "(cached) " >&6 -else +ac_make=`printf "%s\n" "$2" | sed 's/+/p/g; s/[^a-zA-Z0-9_]/_/g'` +if eval test \${ac_cv_prog_make_${ac_make}_set+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat >conftest.make <<\_ACEOF SHELL = /bin/sh all: @@ -3204,12 +3246,12 @@ esac rm -f conftest.make fi if eval test \$ac_cv_prog_make_${ac_make}_set = yes; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } SET_MAKE= else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } SET_MAKE="MAKE=${MAKE-make}" fi @@ -3223,7 +3265,8 @@ fi rmdir .tst 2>/dev/null # Check whether --enable-silent-rules was given. -if test "${enable_silent_rules+set}" = set; then : +if test ${enable_silent_rules+y} +then : enableval=$enable_silent_rules; fi @@ -3233,12 +3276,13 @@ case $enable_silent_rules in # ((( *) AM_DEFAULT_VERBOSITY=1;; esac am_make=${MAKE-make} -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $am_make supports nested variables" >&5 -$as_echo_n "checking whether $am_make supports nested variables... " >&6; } -if ${am_cv_make_support_nested_variables+:} false; then : - $as_echo_n "(cached) " >&6 -else - if $as_echo 'TRUE=$(BAR$(V)) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $am_make supports nested variables" >&5 +printf %s "checking whether $am_make supports nested variables... 
" >&6; } +if test ${am_cv_make_support_nested_variables+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if printf "%s\n" 'TRUE=$(BAR$(V)) BAR0=false BAR1=true V=1 @@ -3250,8 +3294,8 @@ else am_cv_make_support_nested_variables=no fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_make_support_nested_variables" >&5 -$as_echo "$am_cv_make_support_nested_variables" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_make_support_nested_variables" >&5 +printf "%s\n" "$am_cv_make_support_nested_variables" >&6; } if test $am_cv_make_support_nested_variables = yes; then AM_V='$(V)' AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)' @@ -3282,18 +3326,14 @@ fi # Define the identity of the package. - PACKAGE='pcre' - VERSION='8.43' + PACKAGE='pcre2' + VERSION='10.37' -cat >>confdefs.h <<_ACEOF -#define PACKAGE "$PACKAGE" -_ACEOF +printf "%s\n" "#define PACKAGE \"$PACKAGE\"" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define VERSION "$VERSION" -_ACEOF +printf "%s\n" "#define VERSION \"$VERSION\"" >>confdefs.h # Some tools Automake needs. @@ -3377,7 +3417,8 @@ END fi # Check whether --enable-silent-rules was given. -if test "${enable_silent_rules+set}" = set; then : +if test ${enable_silent_rules+y} +then : enableval=$enable_silent_rules; fi @@ -3387,12 +3428,13 @@ case $enable_silent_rules in # ((( *) AM_DEFAULT_VERBOSITY=0;; esac am_make=${MAKE-make} -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $am_make supports nested variables" >&5 -$as_echo_n "checking whether $am_make supports nested variables... " >&6; } -if ${am_cv_make_support_nested_variables+:} false; then : - $as_echo_n "(cached) " >&6 -else - if $as_echo 'TRUE=$(BAR$(V)) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $am_make supports nested variables" >&5 +printf %s "checking whether $am_make supports nested variables... 
" >&6; } +if test ${am_cv_make_support_nested_variables+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if printf "%s\n" 'TRUE=$(BAR$(V)) BAR0=false BAR1=true V=1 @@ -3404,8 +3446,8 @@ else am_cv_make_support_nested_variables=no fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_make_support_nested_variables" >&5 -$as_echo "$am_cv_make_support_nested_variables" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_make_support_nested_variables" >&5 +printf "%s\n" "$am_cv_make_support_nested_variables" >&6; } if test $am_cv_make_support_nested_variables = yes; then AM_V='$(V)' AM_DEFAULT_V='$(AM_DEFAULT_VERBOSITY)' @@ -3415,71 +3457,29 @@ else fi AM_BACKSLASH='\' -ac_config_headers="$ac_config_headers config.h" +ac_config_headers="$ac_config_headers src/config.h" + + +# This was added at the suggestion of libtoolize (03-Jan-10) + + +# The default CFLAGS in Autoconf are "-g -O2" for gcc and just "-g" for any +# other compiler. There doesn't seem to be a standard way of getting rid of the +# -g (which I don't think is needed for a production library). This fudge seems +# to achieve the necessary. First, we remember the externally set values of +# CFLAGS. Then call the AC_PROG_CC macro to find the compiler - if CFLAGS is +# not set, it will be set to Autoconf's defaults. Afterwards, if the original +# values were not set, remove the -g from the Autoconf defaults. + +remember_set_CFLAGS="$CFLAGS" + + -# This is a new thing required to stop a warning from automake 1.12 -DEPDIR="${am__leading_dot}deps" -ac_config_commands="$ac_config_commands depfiles" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} supports the include directive" >&5 -$as_echo_n "checking whether ${MAKE-make} supports the include directive... " >&6; } -cat > confinc.mk << 'END' -am__doit: - @echo this is the am__doit target >confinc.out -.PHONY: am__doit -END -am__include="#" -am__quote= -# BSD make does it like this. 
-echo '.include "confinc.mk" # ignored' > confmf.BSD -# Other make implementations (GNU, Solaris 10, AIX) do it like this. -echo 'include confinc.mk # ignored' > confmf.GNU -_am_result=no -for s in GNU BSD; do - { echo "$as_me:$LINENO: ${MAKE-make} -f confmf.$s && cat confinc.out" >&5 - (${MAKE-make} -f confmf.$s && cat confinc.out) >&5 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } - case $?:`cat confinc.out 2>/dev/null` in #( - '0:this is the am__doit target') : - case $s in #( - BSD) : - am__include='.include' am__quote='"' ;; #( - *) : - am__include='include' am__quote='' ;; -esac ;; #( - *) : - ;; -esac - if test "$am__include" != "#"; then - _am_result="yes ($s style)" - break - fi -done -rm -f confinc.* confmf.* -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: ${_am_result}" >&5 -$as_echo "${_am_result}" >&6; } -# Check whether --enable-dependency-tracking was given. -if test "${enable_dependency_tracking+set}" = set; then : - enableval=$enable_dependency_tracking; -fi -if test "x$enable_dependency_tracking" != xno; then - am_depcomp="$ac_aux_dir/depcomp" - AMDEPBACKSLASH='\' - am__nodep='_no' -fi - if test "x$enable_dependency_tracking" != xno; then - AMDEP_TRUE= - AMDEP_FALSE='#' -else - AMDEP_TRUE='#' - AMDEP_FALSE= -fi ac_ext=c @@ -3490,11 +3490,12 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. set dummy ${ac_tool_prefix}gcc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. 
else @@ -3502,11 +3503,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}gcc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3517,11 +3522,11 @@ fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3530,11 +3535,12 @@ if test -z "$ac_cv_prog_CC"; then ac_ct_CC=$CC # Extract the first word of "gcc", so it can be a program name with args. set dummy gcc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else @@ -3542,11 +3548,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="gcc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3557,11 +3567,11 @@ fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 -$as_echo "$ac_ct_CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 +printf "%s\n" "$ac_ct_CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi if test "x$ac_ct_CC" = x; then @@ -3569,8 +3579,8 @@ fi else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC @@ -3583,11 +3593,12 @@ if test -z "$CC"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. set dummy ${ac_tool_prefix}cc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else @@ -3595,11 +3606,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_CC="${ac_tool_prefix}cc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3610,11 +3625,11 @@ fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3623,11 +3638,12 @@ fi if test -z "$CC"; then # Extract the first word of "cc", so it can be a program name with args. set dummy cc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. else @@ -3636,15 +3652,19 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + if test "$as_dir$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then ac_prog_rejected=yes continue fi ac_cv_prog_CC="cc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3660,18 +3680,18 @@ if test $ac_prog_rejected = yes; then # However, it has the same basename, so the bogon will be chosen # first if we set CC to just the basename; use the full file name. shift - ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" + ac_cv_prog_CC="$as_dir$ac_word${1+' '}$@" fi fi fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3682,11 +3702,12 @@ if test -z "$CC"; then do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$CC"; then ac_cv_prog_CC="$CC" # Let the user override the test. 
else @@ -3694,11 +3715,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_CC="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3709,11 +3734,11 @@ fi fi CC=$ac_cv_prog_CC if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3726,11 +3751,12 @@ if test -z "$CC"; then do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_CC"; then ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. else @@ -3738,11 +3764,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_CC="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -3753,11 +3783,11 @@ fi fi ac_ct_CC=$ac_cv_prog_ac_ct_CC if test -n "$ac_ct_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 -$as_echo "$ac_ct_CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 +printf "%s\n" "$ac_ct_CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -3769,8 +3799,8 @@ done else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac CC=$ac_ct_CC @@ -3778,25 +3808,129 @@ esac fi fi - - -test -z "$CC" && { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} -as_fn_error $? "no acceptable C compiler found in \$PATH -See \`config.log' for more details" "$LINENO" 5; } - +if test -z "$CC"; then + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}clang", so it can be a program name with args. 
+set dummy ${ac_tool_prefix}clang; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$CC"; then + ac_cv_prog_CC="$CC" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_CC="${ac_tool_prefix}clang" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +CC=$ac_cv_prog_CC +if test -n "$CC"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 +printf "%s\n" "$CC" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + +fi +if test -z "$ac_cv_prog_CC"; then + ac_ct_CC=$CC + # Extract the first word of "clang", so it can be a program name with args. +set dummy clang; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_CC"; then + ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_CC="clang" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +ac_ct_CC=$ac_cv_prog_ac_ct_CC +if test -n "$ac_ct_CC"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 +printf "%s\n" "$ac_ct_CC" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + if test "x$ac_ct_CC" = x; then + CC="" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + CC=$ac_ct_CC + fi +else + CC="$ac_cv_prog_CC" +fi + +fi + + +test -z "$CC" && { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} +as_fn_error $? "no acceptable C compiler found in \$PATH +See \`config.log' for more details" "$LINENO" 5; } + # Provide some information about the compiler. 
-$as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5 +printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5 set X $ac_compile ac_compiler=$2 -for ac_option in --version -v -V -qversion; do +for ac_option in --version -v -V -qversion -version; do { { ac_try="$ac_compiler $ac_option >&5" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_compiler $ac_option >&5") 2>conftest.err ac_status=$? if test -s conftest.err; then @@ -3806,7 +3940,7 @@ $as_echo "$ac_try_echo"; } >&5 cat conftest.er1 >&5 fi rm -f conftest.er1 conftest.err - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } done @@ -3814,7 +3948,7 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; @@ -3826,9 +3960,9 @@ ac_clean_files="$ac_clean_files a.out a.out.dSYM a.exe b.out" # Try to create an executable without -o first, disregard a.out. # It will help us diagnose broken compilers, and finding out an intuition # of exeext. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C compiler works" >&5 -$as_echo_n "checking whether the C compiler works... " >&6; } -ac_link_default=`$as_echo "$ac_link" | sed 's/ -o *conftest[^ ]*//'` +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler works" >&5 +printf %s "checking whether the C compiler works... 
" >&6; } +ac_link_default=`printf "%s\n" "$ac_link" | sed 's/ -o *conftest[^ ]*//'` # The possible output files: ac_files="a.out conftest.exe conftest a.exe a_out.exe b.out conftest.*" @@ -3849,11 +3983,12 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link_default") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } +then : # Autoconf-2.13 could set the ac_cv_exeext variable to `no'. # So ignore a value of `no', otherwise this would lead to `EXEEXT = no' # in a Makefile. We should not override ac_cv_exeext if it was cached, @@ -3870,7 +4005,7 @@ do # certainly right. break;; *.* ) - if test "${ac_cv_exeext+set}" = set && test "$ac_cv_exeext" != no; + if test ${ac_cv_exeext+y} && test "$ac_cv_exeext" != no; then :; else ac_cv_exeext=`expr "$ac_file" : '[^.]*\(\..*\)'` fi @@ -3886,44 +4021,46 @@ do done test "$ac_cv_exeext" = no && ac_cv_exeext= -else +else $as_nop ac_file='' fi -if test -z "$ac_file"; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -$as_echo "$as_me: failed program was:" >&5 +if test -z "$ac_file" +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error 77 "C compiler cannot create executables See \`config.log' for more details" "$LINENO" 5; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; 
} -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler default output file name" >&5 -$as_echo_n "checking for C compiler default output file name... " >&6; } -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_file" >&5 -$as_echo "$ac_file" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for C compiler default output file name" >&5 +printf %s "checking for C compiler default output file name... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_file" >&5 +printf "%s\n" "$ac_file" >&6; } ac_exeext=$ac_cv_exeext rm -f -r a.out a.out.dSYM a.exe conftest$ac_cv_exeext b.out ac_clean_files=$ac_clean_files_save -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of executables" >&5 -$as_echo_n "checking for suffix of executables... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for suffix of executables" >&5 +printf %s "checking for suffix of executables... " >&6; } if { { ac_try="$ac_link" case "(($ac_try" in *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } +then : # If both `conftest.exe' and `conftest' are `present' (well, observable) # catch `conftest.exe'. 
For instance with Cygwin, `ls conftest' will # work properly (i.e., refer to `conftest.exe'), while it won't with @@ -3937,15 +4074,15 @@ for ac_file in conftest.exe conftest conftest.*; do * ) break;; esac done -else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +else $as_nop + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of executables: cannot compile and link See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest conftest$ac_cv_exeext -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5 -$as_echo "$ac_cv_exeext" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_exeext" >&5 +printf "%s\n" "$ac_cv_exeext" >&6; } rm -f conftest.$ac_ext EXEEXT=$ac_cv_exeext @@ -3954,7 +4091,7 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include <stdio.h> int -main () +main (void) { FILE *f = fopen ("conftest.out", "w"); return ferror (f) || fclose (f) != 0; @@ -3966,8 +4103,8 @@ _ACEOF ac_clean_files="$ac_clean_files conftest.out" # Check that the compiler produces executables we can run. If not, either # the compiler is broken, or we cross compile. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are cross compiling" >&5 -$as_echo_n "checking whether we are cross compiling... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether we are cross compiling" >&5 +printf %s "checking whether we are cross compiling... " >&6; } if test "$cross_compiling" != yes; then { { ac_try="$ac_link" case "(($ac_try" in @@ -3975,10 +4112,10 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_link") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$?
= $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } if { ac_try='./conftest$ac_cv_exeext' { { case "(($ac_try" in @@ -3986,39 +4123,40 @@ $as_echo "$ac_try_echo"; } >&5 *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_try") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; }; then cross_compiling=no else if test "$cross_compiling" = maybe; then cross_compiling=yes else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} -as_fn_error $? "cannot run C compiled programs. + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} +as_fn_error 77 "cannot run C compiled programs. If you meant to cross compile, use \`--host'. See \`config.log' for more details" "$LINENO" 5; } fi fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5 -$as_echo "$cross_compiling" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $cross_compiling" >&5 +printf "%s\n" "$cross_compiling" >&6; } rm -f conftest.$ac_ext conftest$ac_cv_exeext conftest.out ac_clean_files=$ac_clean_files_save -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5 -$as_echo_n "checking for suffix of object files... " >&6; } -if ${ac_cv_objext+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for suffix of object files" >&5 +printf %s "checking for suffix of object files... " >&6; } +if test ${ac_cv_objext+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int -main () +main (void) { ; @@ -4032,11 +4170,12 @@ case "(($ac_try" in *) ac_try_echo=$ac_try;; esac eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 +printf "%s\n" "$ac_try_echo"; } >&5 (eval "$ac_compile") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then : + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } +then : for ac_file in conftest.o conftest.obj conftest.*; do test -f "$ac_file" || continue; case $ac_file in @@ -4045,31 +4184,32 @@ $as_echo "$ac_try_echo"; } >&5 break;; esac done -else - $as_echo "$as_me: failed program was:" >&5 +else $as_nop + printf "%s\n" "$as_me: failed program was:" >&5 sed 's/^/| /' conftest.$ac_ext >&5 -{ { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} +{ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "cannot compute suffix of object files: cannot compile See \`config.log' for more details" "$LINENO" 5; } fi rm -f conftest.$ac_cv_objext conftest.$ac_ext fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5 -$as_echo "$ac_cv_objext" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_objext" >&5 +printf "%s\n" "$ac_cv_objext" >&6; } OBJEXT=$ac_cv_objext ac_objext=$OBJEXT -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C compiler" >&5 -$as_echo_n "checking whether we are using the GNU C compiler... " >&6; } -if ${ac_cv_c_compiler_gnu+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the compiler supports GNU C" >&5 +printf %s "checking whether the compiler supports GNU C... 
" >&6; } +if test ${ac_cv_c_compiler_gnu+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { #ifndef __GNUC__ choke me @@ -4079,29 +4219,33 @@ main () return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_compiler_gnu=yes -else +else $as_nop ac_compiler_gnu=no fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ac_cv_c_compiler_gnu=$ac_compiler_gnu fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5 -$as_echo "$ac_cv_c_compiler_gnu" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5 +printf "%s\n" "$ac_cv_c_compiler_gnu" >&6; } +ac_compiler_gnu=$ac_cv_c_compiler_gnu + if test $ac_compiler_gnu = yes; then GCC=yes else GCC= fi -ac_test_CFLAGS=${CFLAGS+set} +ac_test_CFLAGS=${CFLAGS+y} ac_save_CFLAGS=$CFLAGS -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC accepts -g" >&5 -$as_echo_n "checking whether $CC accepts -g... " >&6; } -if ${ac_cv_prog_cc_g+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $CC accepts -g" >&5 +printf %s "checking whether $CC accepts -g... " >&6; } +if test ${ac_cv_prog_cc_g+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_save_c_werror_flag=$ac_c_werror_flag ac_c_werror_flag=yes ac_cv_prog_cc_g=no @@ -4110,57 +4254,60 @@ else /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_prog_cc_g=yes -else +else $as_nop CFLAGS="" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : -else +else $as_nop ac_c_werror_flag=$ac_save_c_werror_flag CFLAGS="-g" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_prog_cc_g=yes fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ac_c_werror_flag=$ac_save_c_werror_flag fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5 -$as_echo "$ac_cv_prog_cc_g" >&6; } -if test "$ac_test_CFLAGS" = set; then +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5 +printf "%s\n" "$ac_cv_prog_cc_g" >&6; } +if test $ac_test_CFLAGS; then CFLAGS=$ac_save_CFLAGS elif test $ac_cv_prog_cc_g = yes; then if test "$GCC" = yes; then @@ -4175,94 +4322,144 @@ else CFLAGS= fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $CC option to accept ISO C89" >&5 -$as_echo_n "checking for $CC option to accept ISO C89... " >&6; } -if ${ac_cv_prog_cc_c89+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_cv_prog_cc_c89=no +ac_prog_cc_stdc=no +if test x$ac_prog_cc_stdc = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable C11 features" >&5 +printf %s "checking for $CC option to enable C11 features... " >&6; } +if test ${ac_cv_prog_cc_c11+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_prog_cc_c11=no ac_save_CC=$CC cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ -#include -#include -struct stat; -/* Most of the following tests are stolen from RCS 5.7's src/conf.sh. */ -struct buf { int x; }; -FILE * (*rcsopen) (struct buf *, struct stat *, int); -static char *e (p, i) - char **p; - int i; -{ - return p[i]; -} -static char *f (char * (*g) (char **, int), char **p, ...) -{ - char *s; - va_list v; - va_start (v,p); - s = g (p, va_arg (v,int)); - va_end (v); - return s; -} - -/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has - function prototypes and stuff, but not '\xHH' hex character constants. - These don't provoke an error unfortunately, instead are silently treated - as 'x'. The following induces an error, until -std is added to get - proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an - array size at least. It's necessary to write '\x00'==0 to get something - that's true only with -std. */ -int osf4_cc_array ['\x00' == 0 ? 1 : -1]; +$ac_c_conftest_c11_program +_ACEOF +for ac_arg in '' -std=gnu11 +do + CC="$ac_save_CC $ac_arg" + if ac_fn_c_try_compile "$LINENO" +then : + ac_cv_prog_cc_c11=$ac_arg +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam + test "x$ac_cv_prog_cc_c11" != "xno" && break +done +rm -f conftest.$ac_ext +CC=$ac_save_CC +fi -/* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters - inside strings and character constants. */ -#define FOO(x) 'x' -int xlc6_cc_array[FOO(a) == 'x' ? 
1 : -1]; +if test "x$ac_cv_prog_cc_c11" = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 +printf "%s\n" "unsupported" >&6; } +else $as_nop + if test "x$ac_cv_prog_cc_c11" = x +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 +printf "%s\n" "none needed" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c11" >&5 +printf "%s\n" "$ac_cv_prog_cc_c11" >&6; } + CC="$CC $ac_cv_prog_cc_c11" +fi + ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c11 + ac_prog_cc_stdc=c11 +fi +fi +if test x$ac_prog_cc_stdc = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable C99 features" >&5 +printf %s "checking for $CC option to enable C99 features... " >&6; } +if test ${ac_cv_prog_cc_c99+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_prog_cc_c99=no +ac_save_CC=$CC +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$ac_c_conftest_c99_program +_ACEOF +for ac_arg in '' -std=gnu99 -std=c99 -c99 -qlanglvl=extc1x -qlanglvl=extc99 -AC99 -D_STDC_C99= +do + CC="$ac_save_CC $ac_arg" + if ac_fn_c_try_compile "$LINENO" +then : + ac_cv_prog_cc_c99=$ac_arg +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam + test "x$ac_cv_prog_cc_c99" != "xno" && break +done +rm -f conftest.$ac_ext +CC=$ac_save_CC +fi -int test (int i, double x); -struct s1 {int (*f) (int a);}; -struct s2 {int (*f) (double a);}; -int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); -int argc; -char **argv; -int -main () -{ -return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; - ; - return 0; -} +if test "x$ac_cv_prog_cc_c99" = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 +printf "%s\n" "unsupported" >&6; } +else $as_nop + if test "x$ac_cv_prog_cc_c99" = x +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 +printf "%s\n" "none needed" >&6; } +else 
$as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c99" >&5 +printf "%s\n" "$ac_cv_prog_cc_c99" >&6; } + CC="$CC $ac_cv_prog_cc_c99" +fi + ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c99 + ac_prog_cc_stdc=c99 +fi +fi +if test x$ac_prog_cc_stdc = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $CC option to enable C89 features" >&5 +printf %s "checking for $CC option to enable C89 features... " >&6; } +if test ${ac_cv_prog_cc_c89+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_prog_cc_c89=no +ac_save_CC=$CC +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +$ac_c_conftest_c89_program _ACEOF -for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std \ - -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" +for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" do CC="$ac_save_CC $ac_arg" - if ac_fn_c_try_compile "$LINENO"; then : + if ac_fn_c_try_compile "$LINENO" +then : ac_cv_prog_cc_c89=$ac_arg fi -rm -f core conftest.err conftest.$ac_objext +rm -f core conftest.err conftest.$ac_objext conftest.beam test "x$ac_cv_prog_cc_c89" != "xno" && break done rm -f conftest.$ac_ext CC=$ac_save_CC - fi -# AC_CACHE_VAL -case "x$ac_cv_prog_cc_c89" in - x) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 -$as_echo "none needed" >&6; } ;; - xno) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 -$as_echo "unsupported" >&6; } ;; - *) - CC="$CC $ac_cv_prog_cc_c89" - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5 -$as_echo "$ac_cv_prog_cc_c89" >&6; } ;; -esac -if test "x$ac_cv_prog_cc_c89" != xno; then : +if test "x$ac_cv_prog_cc_c89" = xno +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 +printf "%s\n" "unsupported" >&6; } +else $as_nop + if test "x$ac_cv_prog_cc_c89" = x +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 +printf "%s\n" "none 
needed" >&6; } +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5 +printf "%s\n" "$ac_cv_prog_cc_c89" >&6; } + CC="$CC $ac_cv_prog_cc_c89" +fi + ac_cv_prog_cc_stdc=$ac_cv_prog_cc_c89 + ac_prog_cc_stdc=c89 +fi fi ac_ext=c @@ -4271,21 +4468,23 @@ ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu -ac_ext=c + + ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC understands -c and -o together" >&5 -$as_echo_n "checking whether $CC understands -c and -o together... " >&6; } -if ${am_cv_prog_cc_c_o+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether $CC understands -c and -o together" >&5 +printf %s "checking whether $CC understands -c and -o together... " >&6; } +if test ${am_cv_prog_cc_c_o+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; @@ -4313,8 +4512,8 @@ _ACEOF rm -f core conftest* unset am_i fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_prog_cc_c_o" >&5 -$as_echo "$am_cv_prog_cc_c_o" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_prog_cc_c_o" >&5 +printf "%s\n" "$am_cv_prog_cc_c_o" >&6; } if test "$am_cv_prog_cc_c_o" != yes; then # Losing compiler, so override with the script. # FIXME: It is wrong to rewrite CC. 
@@ -4329,14 +4528,79 @@ ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu +DEPDIR="${am__leading_dot}deps" -depcc="$CC" am_compiler_list= +ac_config_commands="$ac_config_commands depfiles" + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether ${MAKE-make} supports the include directive" >&5 +printf %s "checking whether ${MAKE-make} supports the include directive... " >&6; } +cat > confinc.mk << 'END' +am__doit: + @echo this is the am__doit target >confinc.out +.PHONY: am__doit +END +am__include="#" +am__quote= +# BSD make does it like this. +echo '.include "confinc.mk" # ignored' > confmf.BSD +# Other make implementations (GNU, Solaris 10, AIX) do it like this. +echo 'include confinc.mk # ignored' > confmf.GNU +_am_result=no +for s in GNU BSD; do + { echo "$as_me:$LINENO: ${MAKE-make} -f confmf.$s && cat confinc.out" >&5 + (${MAKE-make} -f confmf.$s && cat confinc.out) >&5 2>&5 + ac_status=$? + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + (exit $ac_status); } + case $?:`cat confinc.out 2>/dev/null` in #( + '0:this is the am__doit target') : + case $s in #( + BSD) : + am__include='.include' am__quote='"' ;; #( + *) : + am__include='include' am__quote='' ;; +esac ;; #( + *) : + ;; +esac + if test "$am__include" != "#"; then + _am_result="yes ($s style)" + break + fi +done +rm -f confinc.* confmf.* +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ${_am_result}" >&5 +printf "%s\n" "${_am_result}" >&6; } + +# Check whether --enable-dependency-tracking was given. +if test ${enable_dependency_tracking+y} +then : + enableval=$enable_dependency_tracking; +fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking dependency style of $depcc" >&5 -$as_echo_n "checking dependency style of $depcc... 
" >&6; } -if ${am_cv_CC_dependencies_compiler_type+:} false; then : - $as_echo_n "(cached) " >&6 +if test "x$enable_dependency_tracking" != xno; then + am_depcomp="$ac_aux_dir/depcomp" + AMDEPBACKSLASH='\' + am__nodep='_no' +fi + if test "x$enable_dependency_tracking" != xno; then + AMDEP_TRUE= + AMDEP_FALSE='#' else + AMDEP_TRUE='#' + AMDEP_FALSE= +fi + + + +depcc="$CC" am_compiler_list= + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking dependency style of $depcc" >&5 +printf %s "checking dependency style of $depcc... " >&6; } +if test ${am_cv_CC_dependencies_compiler_type+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then # We make a subdir and do the tests there. Otherwise we can end up # making bogus files that we don't know about and never remove. For @@ -4443,8 +4707,8 @@ else fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_CC_dependencies_compiler_type" >&5 -$as_echo "$am_cv_CC_dependencies_compiler_type" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_CC_dependencies_compiler_type" >&5 +printf "%s\n" "$am_cv_CC_dependencies_compiler_type" >&6; } CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type if @@ -4459,72 +4723,267 @@ fi -if test -n "$ac_tool_prefix"; then - for ac_prog in ar lib "link -lib" - do - # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. -set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_AR+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$AR"; then - ac_cv_prog_AR="$AR" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH + +ac_header= ac_cache= +for ac_item in $ac_header_c_list do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_AR="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 + if test $ac_cache; then + ac_fn_c_check_header_compile "$LINENO" $ac_header ac_cv_header_$ac_cache "$ac_includes_default" + if eval test \"x\$ac_cv_header_$ac_cache\" = xyes; then + printf "%s\n" "#define $ac_item 1" >> confdefs.h + fi + ac_header= ac_cache= + elif test $ac_header; then + ac_cache=$ac_item + else + ac_header=$ac_item fi done - done -IFS=$as_save_IFS -fi -fi -AR=$ac_cv_prog_AR -if test -n "$AR"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $AR" >&5 -$as_echo "$AR" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - test -n "$AR" && break - done + + + + + +if test $ac_cv_header_stdlib_h = yes && test $ac_cv_header_string_h = yes +then : + +printf "%s\n" "#define STDC_HEADERS 1" >>confdefs.h + fi -if test -z "$AR"; then - ac_ct_AR=$AR - for ac_prog in ar lib "link -lib" -do - # Extract the first word of "$ac_prog", so it can be a program name with args. -set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_AR+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_AR"; then - ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH + + + + + + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether it is safe to define __EXTENSIONS__" >&5 +printf %s "checking whether it is safe to define __EXTENSIONS__... " >&6; } +if test ${ac_cv_safe_to_define___extensions__+y} +then : + printf %s "(cached) " >&6 +else $as_nop + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ + +# define __EXTENSIONS__ 1 + $ac_includes_default +int +main (void) +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + ac_cv_safe_to_define___extensions__=yes +else $as_nop + ac_cv_safe_to_define___extensions__=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_safe_to_define___extensions__" >&5 +printf "%s\n" "$ac_cv_safe_to_define___extensions__" >&6; } + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether _XOPEN_SOURCE should be defined" >&5 +printf %s "checking whether _XOPEN_SOURCE should be defined... " >&6; } +if test ${ac_cv_should_define__xopen_source+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_cv_should_define__xopen_source=no + if test $ac_cv_header_wchar_h = yes +then : + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + + #include + mbstate_t x; +int +main (void) +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + +else $as_nop + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ + + #define _XOPEN_SOURCE 500 + #include + mbstate_t x; +int +main (void) +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + ac_cv_should_define__xopen_source=yes +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +fi +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_should_define__xopen_source" >&5 +printf "%s\n" "$ac_cv_should_define__xopen_source" >&6; } + + printf "%s\n" "#define _ALL_SOURCE 1" >>confdefs.h + + printf "%s\n" "#define _DARWIN_C_SOURCE 1" >>confdefs.h + + printf "%s\n" "#define _GNU_SOURCE 1" >>confdefs.h + + printf "%s\n" "#define _HPUX_ALT_XOPEN_SOCKET_API 1" >>confdefs.h + + printf "%s\n" "#define _NETBSD_SOURCE 1" >>confdefs.h + + printf "%s\n" "#define _OPENBSD_SOURCE 1" >>confdefs.h + + printf "%s\n" "#define _POSIX_PTHREAD_SEMANTICS 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_IEC_60559_ATTRIBS_EXT__ 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_IEC_60559_BFP_EXT__ 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_IEC_60559_DFP_EXT__ 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_IEC_60559_TYPES_EXT__ 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_LIB_EXT2__ 1" >>confdefs.h + + printf "%s\n" "#define __STDC_WANT_MATH_SPEC_FUNCS__ 1" >>confdefs.h + + printf "%s\n" "#define _TANDEM_SOURCE 1" >>confdefs.h + + if test $ac_cv_header_minix_config_h = yes +then : + MINIX=yes + printf "%s\n" "#define _MINIX 1" >>confdefs.h + + printf "%s\n" "#define _POSIX_SOURCE 1" >>confdefs.h + + printf "%s\n" "#define _POSIX_1_SOURCE 2" >>confdefs.h + +else $as_nop + MINIX= +fi + if test $ac_cv_safe_to_define___extensions__ = yes +then : + printf "%s\n" "#define __EXTENSIONS__ 1" >>confdefs.h + +fi + if test $ac_cv_should_define__xopen_source = yes +then : + printf "%s\n" "#define _XOPEN_SOURCE 
500" >>confdefs.h + +fi + + +if test "x$remember_set_CFLAGS" = "x" +then + if test "$CFLAGS" = "-g -O2" + then + CFLAGS="-O2" + elif test "$CFLAGS" = "-g" + then + CFLAGS="" + fi +fi + +# This is a new thing required to stop a warning from automake 1.12 + + if test -n "$ac_tool_prefix"; then + for ac_prog in ar lib "link -lib" + do + # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. +set dummy $ac_tool_prefix$ac_prog; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_AR+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$AR"; then + ac_cv_prog_AR="$AR" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_AR="$ac_tool_prefix$ac_prog" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +AR=$ac_cv_prog_AR +if test -n "$AR"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $AR" >&5 +printf "%s\n" "$AR" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + + test -n "$AR" && break + done +fi +if test -z "$AR"; then + ac_ct_AR=$AR + for ac_prog in ar lib "link -lib" +do + # Extract the first word of "$ac_prog", so it can be a program name with args. +set dummy $ac_prog; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_AR+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_AR"; then + ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_AR="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -4535,11 +4994,11 @@ fi fi ac_ct_AR=$ac_cv_prog_ac_ct_AR if test -n "$ac_ct_AR"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AR" >&5 -$as_echo "$ac_ct_AR" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AR" >&5 +printf "%s\n" "$ac_ct_AR" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -4551,8 +5010,8 @@ done else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac AR=$ac_ct_AR @@ -4561,11 +5020,12 @@ fi : ${AR=ar} -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking the archiver ($AR) interface" >&5 -$as_echo_n "checking the archiver ($AR) interface... " >&6; } -if ${am_cv_ar_interface+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking the archiver ($AR) interface" >&5 +printf %s "checking the archiver ($AR) interface... 
" >&6; } +if test ${am_cv_ar_interface+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' @@ -4577,12 +5037,13 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu /* end confdefs.h. */ int some_variable = 0; _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : am_ar_try='$AR cru libconftest.a conftest.$ac_objext >&5' { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$am_ar_try\""; } >&5 (eval $am_ar_try) 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; } if test "$ac_status" -eq 0; then am_cv_ar_interface=ar @@ -4591,7 +5052,7 @@ if ac_fn_c_try_compile "$LINENO"; then : { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$am_ar_try\""; } >&5 (eval $am_ar_try) 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 test $ac_status = 0; } if test "$ac_status" -eq 0; then am_cv_ar_interface=lib @@ -4602,7 +5063,7 @@ if ac_fn_c_try_compile "$LINENO"; then : rm -f conftest.lib libconftest.a fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' @@ -4610,8 +5071,8 @@ ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $ ac_compiler_gnu=$ac_cv_c_compiler_gnu fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_ar_interface" >&5 -$as_echo "$am_cv_ar_interface" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $am_cv_ar_interface" >&5 +printf "%s\n" "$am_cv_ar_interface" >&6; } case $am_cv_ar_interface in ar) @@ -4631,1868 +5092,1160 @@ unknown) esac -# This was added at the suggestion of libtoolize (03-Jan-10) +# Check for a 64-bit integer type +ac_fn_c_find_intX_t "$LINENO" "64" "ac_cv_c_int64_t" +case $ac_cv_c_int64_t in #( + no|yes) ;; #( + *) +printf "%s\n" "#define int64_t $ac_cv_c_int64_t" >>confdefs.h +;; +esac -# The default CFLAGS and CXXFLAGS in Autoconf are "-g -O2" for gcc and just -# "-g" for any other compiler. There doesn't seem to be a standard way of -# getting rid of the -g (which I don't think is needed for a production -# library). This fudge seems to achieve the necessary. First, we remember the -# externally set values of CFLAGS and CXXFLAGS. Then call the AC_PROG_CC and -# AC_PROG_CXX macros to find the compilers - if CFLAGS and CXXFLAGS are not -# set, they will be set to Autoconf's defaults. Afterwards, if the original -# values were not set, remove the -g from the Autoconf defaults. 
-# (PH 02-May-07) -remember_set_CFLAGS="$CFLAGS" -remember_set_CXXFLAGS="$CXXFLAGS" -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}gcc", so it can be a program name with args. -set dummy ${ac_tool_prefix}gcc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CC="${ac_tool_prefix}gcc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS +case `pwd` in + *\ * | *\ *) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&5 +printf "%s\n" "$as_me: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&2;} ;; +esac -fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -fi -if test -z "$ac_cv_prog_CC"; then - ac_ct_CC=$CC - # Extract the first word of "gcc", so it can be a program name with args. -set dummy gcc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... 
" >&6; } -if ${ac_cv_prog_ac_ct_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_CC"; then - ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_CC="gcc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS +macro_version='2.4.6.42-b88ce-dirty' +macro_revision='2.4.6.42' -fi -fi -ac_ct_CC=$ac_cv_prog_ac_ct_CC -if test -n "$ac_ct_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 -$as_echo "$ac_ct_CC" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - if test "x$ac_ct_CC" = x; then - CC="" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - CC=$ac_ct_CC - fi -else - CC="$ac_cv_prog_CC" -fi -if test -z "$CC"; then - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}cc", so it can be a program name with args. -set dummy ${ac_tool_prefix}cc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CC="${ac_tool_prefix}cc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - fi -fi -if test -z "$CC"; then - # Extract the first word of "cc", so it can be a program name with args. -set dummy cc; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. -else - ac_prog_rejected=no -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - if test "$as_dir/$ac_word$ac_exec_ext" = "/usr/ucb/cc"; then - ac_prog_rejected=yes - continue - fi - ac_cv_prog_CC="cc" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -if test $ac_prog_rejected = yes; then - # We found a bogon in the path, so make sure we never use it. - set dummy $ac_cv_prog_CC - shift - if test $# != 0; then - # We chose a different compiler from the bogus one. - # However, it has the same basename, so the bogon will be chosen - # first if we set CC to just the basename; use the full file name. 
- shift - ac_cv_prog_CC="$as_dir/$ac_word${1+' '}$@" - fi -fi -fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -fi -if test -z "$CC"; then - if test -n "$ac_tool_prefix"; then - for ac_prog in cl.exe - do - # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. -set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$CC"; then - ac_cv_prog_CC="$CC" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CC="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -CC=$ac_cv_prog_CC -if test -n "$CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CC" >&5 -$as_echo "$CC" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - test -n "$CC" && break - done -fi -if test -z "$CC"; then - ac_ct_CC=$CC - for ac_prog in cl.exe -do - # Extract the first word of "$ac_prog", so it can be a program name with args. -set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_CC"; then - ac_cv_prog_ac_ct_CC="$ac_ct_CC" # Let the user override the test. 
-else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_CC="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -ac_ct_CC=$ac_cv_prog_ac_ct_CC -if test -n "$ac_ct_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CC" >&5 -$as_echo "$ac_ct_CC" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi +ltmain=$ac_aux_dir/ltmain.sh - test -n "$ac_ct_CC" && break -done - if test "x$ac_ct_CC" = x; then - CC="" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; + + # Make sure we can run config.sub. +$SHELL "${ac_aux_dir}config.sub" sun4 >/dev/null 2>&1 || + as_fn_error $? "cannot run $SHELL ${ac_aux_dir}config.sub" "$LINENO" 5 + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking build system type" >&5 +printf %s "checking build system type... " >&6; } +if test ${ac_cv_build+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_build_alias=$build_alias +test "x$ac_build_alias" = x && + ac_build_alias=`$SHELL "${ac_aux_dir}config.guess"` +test "x$ac_build_alias" = x && + as_fn_error $? "cannot guess build type; you must specify one" "$LINENO" 5 +ac_cv_build=`$SHELL "${ac_aux_dir}config.sub" $ac_build_alias` || + as_fn_error $? "$SHELL ${ac_aux_dir}config.sub $ac_build_alias failed" "$LINENO" 5 + +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_build" >&5 +printf "%s\n" "$ac_cv_build" >&6; } +case $ac_cv_build in +*-*-*) ;; +*) as_fn_error $? 
"invalid value of canonical build" "$LINENO" 5;; esac - CC=$ac_ct_CC - fi +build=$ac_cv_build +ac_save_IFS=$IFS; IFS='-' +set x $ac_cv_build +shift +build_cpu=$1 +build_vendor=$2 +shift; shift +# Remember, the first character of IFS is used to create $*, +# except with old shells: +build_os=$* +IFS=$ac_save_IFS +case $build_os in *\ *) build_os=`echo "$build_os" | sed 's/ /-/g'`;; esac + + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking host system type" >&5 +printf %s "checking host system type... " >&6; } +if test ${ac_cv_host+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test "x$host_alias" = x; then + ac_cv_host=$ac_cv_build +else + ac_cv_host=`$SHELL "${ac_aux_dir}config.sub" $host_alias` || + as_fn_error $? "$SHELL ${ac_aux_dir}config.sub $host_alias failed" "$LINENO" 5 fi fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_host" >&5 +printf "%s\n" "$ac_cv_host" >&6; } +case $ac_cv_host in +*-*-*) ;; +*) as_fn_error $? "invalid value of canonical host" "$LINENO" 5;; +esac +host=$ac_cv_host +ac_save_IFS=$IFS; IFS='-' +set x $ac_cv_host +shift +host_cpu=$1 +host_vendor=$2 +shift; shift +# Remember, the first character of IFS is used to create $*, +# except with old shells: +host_os=$* +IFS=$ac_save_IFS +case $host_os in *\ *) host_os=`echo "$host_os" | sed 's/ /-/g'`;; esac -test -z "$CC" && { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} -as_fn_error $? "no acceptable C compiler found in \$PATH -See \`config.log' for more details" "$LINENO" 5; } +# Backslashify metacharacters that are still active within +# double-quoted strings. +sed_quote_subst='s/\(["`$\\]\)/\\\1/g' -# Provide some information about the compiler. 
-$as_echo "$as_me:${as_lineno-$LINENO}: checking for C compiler version" >&5 -set X $ac_compile -ac_compiler=$2 -for ac_option in --version -v -V -qversion; do - { { ac_try="$ac_compiler $ac_option >&5" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_compiler $ac_option >&5") 2>conftest.err - ac_status=$? - if test -s conftest.err; then - sed '10a\ -... rest of stderr output deleted ... - 10q' conftest.err >conftest.er1 - cat conftest.er1 >&5 - fi - rm -f conftest.er1 conftest.err - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } -done +# Same as above, but do not quote variable references. +double_quote_subst='s/\(["`\\]\)/\\\1/g' -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C compiler" >&5 -$as_echo_n "checking whether we are using the GNU C compiler... " >&6; } -if ${ac_cv_c_compiler_gnu+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ +# Sed substitution to delay expansion of an escaped shell variable in a +# double_quote_subst'ed string. +delay_variable_subst='s/\\\\\\\\\\\$/\\\\\\$/g' -int -main () -{ -#ifndef __GNUC__ - choke me -#endif +# Sed substitution to delay expansion of an escaped single quote. 
+delay_single_quote_subst='s/'\''/'\'\\\\\\\'\''/g' - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_compiler_gnu=yes -else - ac_compiler_gnu=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_cv_c_compiler_gnu=$ac_compiler_gnu +# Sed substitution to avoid accidental globbing in evaled expressions +no_glob_subst='s/\*/\\\*/g' -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_compiler_gnu" >&5 -$as_echo "$ac_cv_c_compiler_gnu" >&6; } -if test $ac_compiler_gnu = yes; then - GCC=yes +ECHO='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' +ECHO=$ECHO$ECHO$ECHO$ECHO$ECHO +ECHO=$ECHO$ECHO$ECHO$ECHO$ECHO$ECHO + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to print strings" >&5 +printf %s "checking how to print strings... " >&6; } +# Test print first, because it will be a builtin if present. +if test "X`( print -r -- -n ) 2>/dev/null`" = X-n && \ + test "X`print -r -- $ECHO 2>/dev/null`" = "X$ECHO"; then + ECHO='print -r --' +elif test "X`printf %s $ECHO 2>/dev/null`" = "X$ECHO"; then + ECHO='printf %s\n' else - GCC= + # Use this function as a fallback that always works. + func_fallback_echo () + { + eval 'cat <<_LTECHO_EOF +$1 +_LTECHO_EOF' + } + ECHO='func_fallback_echo' fi -ac_test_CFLAGS=${CFLAGS+set} -ac_save_CFLAGS=$CFLAGS -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC accepts -g" >&5 -$as_echo_n "checking whether $CC accepts -g... " >&6; } -if ${ac_cv_prog_cc_g+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_save_c_werror_flag=$ac_c_werror_flag - ac_c_werror_flag=yes - ac_cv_prog_cc_g=no - CFLAGS="-g" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () +# func_echo_all arg... +# Invoke $ECHO with all args, space-separated. 
+func_echo_all () { - - ; - return 0; + $ECHO "" } -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_prog_cc_g=yes -else - CFLAGS="" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () -{ +case $ECHO in + printf*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: printf" >&5 +printf "%s\n" "printf" >&6; } ;; + print*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: print -r" >&5 +printf "%s\n" "print -r" >&6; } ;; + *) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: cat" >&5 +printf "%s\n" "cat" >&6; } ;; +esac - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : -else - ac_c_werror_flag=$ac_save_c_werror_flag - CFLAGS="-g" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_prog_cc_g=yes -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext - ac_c_werror_flag=$ac_save_c_werror_flag -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_g" >&5 -$as_echo "$ac_cv_prog_cc_g" >&6; } -if test "$ac_test_CFLAGS" = set; then - CFLAGS=$ac_save_CFLAGS -elif test $ac_cv_prog_cc_g = yes; then - if test "$GCC" = yes; then - CFLAGS="-g -O2" - else - CFLAGS="-g" - fi -else - if test "$GCC" = yes; then - CFLAGS="-O2" - else - CFLAGS= - fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $CC option to accept ISO C89" >&5 -$as_echo_n "checking for $CC option to accept ISO C89... " >&6; } -if ${ac_cv_prog_cc_c89+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_cv_prog_cc_c89=no -ac_save_CC=$CC -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include <stdarg.h> -#include <stdio.h> -struct stat; -/* Most of the following tests are stolen from RCS 5.7's src/conf.sh.
*/ -struct buf { int x; }; -FILE * (*rcsopen) (struct buf *, struct stat *, int); -static char *e (p, i) - char **p; - int i; -{ - return p[i]; -} -static char *f (char * (*g) (char **, int), char **p, ...) -{ - char *s; - va_list v; - va_start (v,p); - s = g (p, va_arg (v,int)); - va_end (v); - return s; -} -/* OSF 4.0 Compaq cc is some sort of almost-ANSI by default. It has - function prototypes and stuff, but not '\xHH' hex character constants. - These don't provoke an error unfortunately, instead are silently treated - as 'x'. The following induces an error, until -std is added to get - proper ANSI mode. Curiously '\x00'!='x' always comes out true, for an - array size at least. It's necessary to write '\x00'==0 to get something - that's true only with -std. */ -int osf4_cc_array ['\x00' == 0 ? 1 : -1]; -/* IBM C 6 for AIX is almost-ANSI by default, but it replaces macro parameters - inside strings and character constants. */ -#define FOO(x) 'x' -int xlc6_cc_array[FOO(a) == 'x' ? 1 : -1]; -int test (int i, double x); -struct s1 {int (*f) (int a);}; -struct s2 {int (*f) (double a);}; -int pairnames (int, char **, FILE *(*)(struct buf *, struct stat *, int), int, int); -int argc; -char **argv; -int -main () -{ -return f (e, argv, 0) != argv[0] || f (e, argv, 1) != argv[1]; - ; - return 0; -} -_ACEOF -for ac_arg in '' -qlanglvl=extc89 -qlanglvl=ansi -std \ - -Ae "-Aa -D_HPUX_SOURCE" "-Xc -D__EXTENSIONS__" -do - CC="$ac_save_CC $ac_arg" - if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_prog_cc_c89=$ac_arg -fi -rm -f core conftest.err conftest.$ac_objext - test "x$ac_cv_prog_cc_c89" != "xno" && break -done -rm -f conftest.$ac_ext -CC=$ac_save_CC -fi -# AC_CACHE_VAL -case "x$ac_cv_prog_cc_c89" in - x) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: none needed" >&5 -$as_echo "none needed" >&6; } ;; - xno) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: unsupported" >&5 -$as_echo "unsupported" >&6; } ;; - *) - CC="$CC $ac_cv_prog_cc_c89" - { $as_echo 
"$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cc_c89" >&5 -$as_echo "$ac_cv_prog_cc_c89" >&6; } ;; -esac -if test "x$ac_cv_prog_cc_c89" != xno; then : -fi -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CC understands -c and -o together" >&5 -$as_echo_n "checking whether $CC understands -c and -o together... " >&6; } -if ${am_cv_prog_cc_c_o+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () -{ - ; - return 0; -} -_ACEOF - # Make sure it works both with $CC and with simple cc. - # Following AC_PROG_CC_C_O, we do the test twice because some - # compilers refuse to overwrite an existing .o file with -o, - # though they will create one. - am_cv_prog_cc_c_o=yes - for am_i in 1 2; do - if { echo "$as_me:$LINENO: $CC -c conftest.$ac_ext -o conftest2.$ac_objext" >&5 - ($CC -c conftest.$ac_ext -o conftest2.$ac_objext) >&5 2>&5 - ac_status=$? - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - (exit $ac_status); } \ - && test -f conftest2.$ac_objext; then - : OK - else - am_cv_prog_cc_c_o=no - break + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for a sed that does not truncate output" >&5 +printf %s "checking for a sed that does not truncate output... 
" >&6; } +if test ${ac_cv_path_SED+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_script=s/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb/ + for ac_i in 1 2 3 4 5 6 7; do + ac_script="$ac_script$as_nl$ac_script" + done + echo "$ac_script" 2>/dev/null | sed 99q >conftest.sed + { ac_script=; unset ac_script;} + if test -z "$SED"; then + ac_path_SED_found=false + # Loop through the user's path and test for each of PROGNAME-LIST + as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in sed gsed + do + for ac_exec_ext in '' $ac_executable_extensions; do + ac_path_SED="$as_dir$ac_prog$ac_exec_ext" + as_fn_executable_p "$ac_path_SED" || continue +# Check for GNU ac_path_SED and select it if it is found. + # Check for GNU $ac_path_SED +case `"$ac_path_SED" --version 2>&1` in +*GNU*) + ac_cv_path_SED="$ac_path_SED" ac_path_SED_found=:;; +*) + ac_count=0 + printf %s 0123456789 >"conftest.in" + while : + do + cat "conftest.in" "conftest.in" >"conftest.tmp" + mv "conftest.tmp" "conftest.in" + cp "conftest.in" "conftest.nl" + printf "%s\n" '' >> "conftest.nl" + "$ac_path_SED" -f conftest.sed < "conftest.nl" >"conftest.out" 2>/dev/null || break + diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break + as_fn_arith $ac_count + 1 && ac_count=$as_val + if test $ac_count -gt ${ac_path_SED_max-0}; then + # Best one so far, save it but keep looking for a better one + ac_cv_path_SED="$ac_path_SED" + ac_path_SED_max=$ac_count fi + # 10*(2^10) chars as input seems more than enough + test $ac_count -gt 10 && break done - rm -f core conftest* - unset am_i + rm -f conftest.in conftest.tmp conftest.nl conftest.out;; +esac + + $ac_path_SED_found && break 3 + done + done + done +IFS=$as_save_IFS + if test -z "$ac_cv_path_SED"; then + as_fn_error $? 
"no acceptable sed could be found in \$PATH" "$LINENO" 5 + fi +else + ac_cv_path_SED=$SED fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_prog_cc_c_o" >&5 -$as_echo "$am_cv_prog_cc_c_o" >&6; } -if test "$am_cv_prog_cc_c_o" != yes; then - # Losing compiler, so override with the script. - # FIXME: It is wrong to rewrite CC. - # But if we don't then we get into trouble of one sort or another. - # A longer-term fix would be to have automake use am__CC in this case, - # and then we could set am__CC="\$(top_srcdir)/compile \$(CC)" - CC="$am_aux_dir/compile $CC" + fi -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_SED" >&5 +printf "%s\n" "$ac_cv_path_SED" >&6; } + SED="$ac_cv_path_SED" + rm -f conftest.sed +test -z "$SED" && SED=sed +Xsed="$SED -e 1s/^X//" -depcc="$CC" am_compiler_list= -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking dependency style of $depcc" >&5 -$as_echo_n "checking dependency style of $depcc... " >&6; } -if ${am_cv_CC_dependencies_compiler_type+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then - # We make a subdir and do the tests there. Otherwise we can end up - # making bogus files that we don't know about and never remove. For - # instance it was reported that on HP-UX the gcc test will end up - # making a dummy file named 'D' -- because '-MD' means "put the output - # in D". - rm -rf conftest.dir - mkdir conftest.dir - # Copy depcomp to subdir because otherwise we won't find it if we're - # using a relative directory. - cp "$am_depcomp" conftest.dir - cd conftest.dir - # We will build objects and dependencies in a subdirectory because - # it helps to detect inapplicable dependency modes. 
For instance - # both Tru64's cc and ICC support -MD to output dependencies as a - # side effect of compilation, but ICC will put the dependencies in - # the current directory while Tru64 will put them in the object - # directory. - mkdir sub - am_cv_CC_dependencies_compiler_type=none - if test "$am_compiler_list" = ""; then - am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` - fi - am__universal=false - case " $depcc " in #( - *\ -arch\ *\ -arch\ *) am__universal=true ;; - esac - for depmode in $am_compiler_list; do - # Setup a source with many dependencies, because some compilers - # like to wrap large dependency lists on column 80 (with \), and - # we should not choose a depcomp mode which is confused by this. - # - # We need to recreate these files for each test, as the compiler may - # overwrite some of them when testing with obscure command lines. - # This happens at least with the AIX C compiler. - : > sub/conftest.c - for i in 1 2 3 4 5 6; do - echo '#include "conftst'$i'.h"' >> sub/conftest.c - # Using ": > sub/conftst$i.h" creates only sub/conftst1.h with - # Solaris 10 /bin/sh. - echo '/* dummy */' > sub/conftst$i.h - done - echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf - # We check with '-c' and '-o' for the sake of the "dashmstdout" - # mode. It turns out that the SunPro C++ compiler does not properly - # handle '-M -o', and we need to detect this. Also, some Intel - # versions had trouble with output in subdirs. - am__obj=sub/conftest.${OBJEXT-o} - am__minus_obj="-o $am__obj" - case $depmode in - gcc) - # This depmode causes a compiler race in universal mode. - test "$am__universal" = false || continue - ;; - nosideeffect) - # After this tag, mechanisms are not by side-effect, so they'll - # only be used when explicitly requested. 
- if test "x$enable_dependency_tracking" = xyes; then - continue - else - break - fi - ;; - msvc7 | msvc7msys | msvisualcpp | msvcmsys) - # This compiler won't grok '-c -o', but also, the minuso test has - # not run yet. These depmodes are late enough in the game, and - # so weak that their functioning should not be impacted. - am__obj=conftest.${OBJEXT-o} - am__minus_obj= - ;; - none) break ;; - esac - if depmode=$depmode \ - source=sub/conftest.c object=$am__obj \ - depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ - $SHELL ./depcomp $depcc -c $am__minus_obj sub/conftest.c \ - >/dev/null 2>conftest.err && - grep sub/conftst1.h sub/conftest.Po > /dev/null 2>&1 && - grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && - grep $am__obj sub/conftest.Po > /dev/null 2>&1 && - ${MAKE-make} -s -f confmf > /dev/null 2>&1; then - # icc doesn't choke on unknown options, it will just issue warnings - # or remarks (even with -Werror). So we grep stderr for any message - # that says an option was ignored or not supported. - # When given -MP, icc 7.0 and 7.1 complain thusly: - # icc: Command line warning: ignoring option '-M'; no argument required - # The diagnosis changed in icc 8.0: - # icc: Command line remark: option '-MP' not supported - if (grep 'ignoring option' conftest.err || - grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else - am_cv_CC_dependencies_compiler_type=$depmode - break - fi - fi - done - cd .. 
- rm -rf conftest.dir -else - am_cv_CC_dependencies_compiler_type=none -fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $am_cv_CC_dependencies_compiler_type" >&5 -$as_echo "$am_cv_CC_dependencies_compiler_type" >&6; } -CCDEPMODE=depmode=$am_cv_CC_dependencies_compiler_type - if - test "x$enable_dependency_tracking" != xno \ - && test "$am_cv_CC_dependencies_compiler_type" = gcc3; then - am__fastdepCC_TRUE= - am__fastdepCC_FALSE='#' -else - am__fastdepCC_TRUE='#' - am__fastdepCC_FALSE= -fi -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu -if test -z "$CXX"; then - if test -n "$CCC"; then - CXX=$CCC - else - if test -n "$ac_tool_prefix"; then - for ac_prog in g++ c++ gpp aCC CC cxx cc++ cl.exe FCC KCC RCC xlC_r xlC - do - # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. -set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$CXX"; then - ac_cv_prog_CXX="$CXX" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5 +printf %s "checking for grep that handles long lines and -e... " >&6; } +if test ${ac_cv_path_GREP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -z "$GREP"; then + ac_path_GREP_found=false + # Loop through the user's path and test for each of PROGNAME-LIST + as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in grep ggrep + do for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_CXX="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done + ac_path_GREP="$as_dir$ac_prog$ac_exec_ext" + as_fn_executable_p "$ac_path_GREP" || continue +# Check for GNU ac_path_GREP and select it if it is found. + # Check for GNU $ac_path_GREP +case `"$ac_path_GREP" --version 2>&1` in +*GNU*) + ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;; +*) + ac_count=0 + printf %s 0123456789 >"conftest.in" + while : + do + cat "conftest.in" "conftest.in" >"conftest.tmp" + mv "conftest.tmp" "conftest.in" + cp "conftest.in" "conftest.nl" + printf "%s\n" 'GREP' >> "conftest.nl" + "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break + diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break + as_fn_arith $ac_count + 1 && ac_count=$as_val + if test $ac_count -gt ${ac_path_GREP_max-0}; then + # Best one so far, save it but keep looking for a better one + ac_cv_path_GREP="$ac_path_GREP" + ac_path_GREP_max=$ac_count + fi + # 10*(2^10) chars as input seems more than enough + test $ac_count -gt 10 && break done -IFS=$as_save_IFS + rm -f conftest.in conftest.tmp conftest.nl conftest.out;; +esac -fi -fi -CXX=$ac_cv_prog_CXX -if test -n "$CXX"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CXX" >&5 -$as_echo "$CXX" >&6; } + $ac_path_GREP_found && break 3 + done + done + done +IFS=$as_save_IFS + if test -z "$ac_cv_path_GREP"; then + as_fn_error $? 
"no acceptable grep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5 + fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + ac_cv_path_GREP=$GREP fi - - test -n "$CXX" && break - done fi -if test -z "$CXX"; then - ac_ct_CXX=$CXX - for ac_prog in g++ c++ gpp aCC CC cxx cc++ cl.exe FCC KCC RCC xlC_r xlC -do - # Extract the first word of "$ac_prog", so it can be a program name with args. -set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_CXX"; then - ac_cv_prog_ac_ct_CXX="$ac_ct_CXX" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5 +printf "%s\n" "$ac_cv_path_GREP" >&6; } + GREP="$ac_cv_path_GREP" + + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5 +printf %s "checking for egrep... " >&6; } +if test ${ac_cv_path_EGREP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if echo a | $GREP -E '(a|b)' >/dev/null 2>&1 + then ac_cv_path_EGREP="$GREP -E" + else + if test -z "$EGREP"; then + ac_path_EGREP_found=false + # Loop through the user's path and test for each of PROGNAME-LIST + as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in egrep + do for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_CXX="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done + ac_path_EGREP="$as_dir$ac_prog$ac_exec_ext" + as_fn_executable_p "$ac_path_EGREP" || continue +# Check for GNU ac_path_EGREP and select it if it is found. + # Check for GNU $ac_path_EGREP +case `"$ac_path_EGREP" --version 2>&1` in +*GNU*) + ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;; +*) + ac_count=0 + printf %s 0123456789 >"conftest.in" + while : + do + cat "conftest.in" "conftest.in" >"conftest.tmp" + mv "conftest.tmp" "conftest.in" + cp "conftest.in" "conftest.nl" + printf "%s\n" 'EGREP' >> "conftest.nl" + "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break + diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break + as_fn_arith $ac_count + 1 && ac_count=$as_val + if test $ac_count -gt ${ac_path_EGREP_max-0}; then + # Best one so far, save it but keep looking for a better one + ac_cv_path_EGREP="$ac_path_EGREP" + ac_path_EGREP_max=$ac_count + fi + # 10*(2^10) chars as input seems more than enough + test $ac_count -gt 10 && break done -IFS=$as_save_IFS + rm -f conftest.in conftest.tmp conftest.nl conftest.out;; +esac -fi -fi -ac_ct_CXX=$ac_cv_prog_ac_ct_CXX -if test -n "$ac_ct_CXX"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CXX" >&5 -$as_echo "$ac_ct_CXX" >&6; } + $ac_path_EGREP_found && break 3 + done + done + done +IFS=$as_save_IFS + if test -z "$ac_cv_path_EGREP"; then + as_fn_error $? 
"no acceptable egrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5 + fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + ac_cv_path_EGREP=$EGREP fi + fi +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5 +printf "%s\n" "$ac_cv_path_EGREP" >&6; } + EGREP="$ac_cv_path_EGREP" - test -n "$ac_ct_CXX" && break -done - if test "x$ac_ct_CXX" = x; then - CXX="g++" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for fgrep" >&5 +printf %s "checking for fgrep... " >&6; } +if test ${ac_cv_path_FGREP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if echo 'ab*c' | $GREP -F 'ab*c' >/dev/null 2>&1 + then ac_cv_path_FGREP="$GREP -F" + else + if test -z "$FGREP"; then + ac_path_FGREP_found=false + # Loop through the user's path and test for each of PROGNAME-LIST + as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in fgrep + do + for ac_exec_ext in '' $ac_executable_extensions; do + ac_path_FGREP="$as_dir$ac_prog$ac_exec_ext" + as_fn_executable_p "$ac_path_FGREP" || continue +# Check for GNU ac_path_FGREP and select it if it is found. 
+ # Check for GNU $ac_path_FGREP +case `"$ac_path_FGREP" --version 2>&1` in +*GNU*) + ac_cv_path_FGREP="$ac_path_FGREP" ac_path_FGREP_found=:;; +*) + ac_count=0 + printf %s 0123456789 >"conftest.in" + while : + do + cat "conftest.in" "conftest.in" >"conftest.tmp" + mv "conftest.tmp" "conftest.in" + cp "conftest.in" "conftest.nl" + printf "%s\n" 'FGREP' >> "conftest.nl" + "$ac_path_FGREP" FGREP < "conftest.nl" >"conftest.out" 2>/dev/null || break + diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break + as_fn_arith $ac_count + 1 && ac_count=$as_val + if test $ac_count -gt ${ac_path_FGREP_max-0}; then + # Best one so far, save it but keep looking for a better one + ac_cv_path_FGREP="$ac_path_FGREP" + ac_path_FGREP_max=$ac_count + fi + # 10*(2^10) chars as input seems more than enough + test $ac_count -gt 10 && break + done + rm -f conftest.in conftest.tmp conftest.nl conftest.out;; esac - CXX=$ac_ct_CXX + + $ac_path_FGREP_found && break 3 + done + done + done +IFS=$as_save_IFS + if test -z "$ac_cv_path_FGREP"; then + as_fn_error $? "no acceptable fgrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5 fi +else + ac_cv_path_FGREP=$FGREP fi - fi + fi fi -# Provide some information about the compiler. -$as_echo "$as_me:${as_lineno-$LINENO}: checking for C++ compiler version" >&5 -set X $ac_compile -ac_compiler=$2 -for ac_option in --version -v -V -qversion; do - { { ac_try="$ac_compiler $ac_option >&5" -case "(($ac_try" in - *\"* | *\`* | *\\*) ac_try_echo=\$ac_try;; - *) ac_try_echo=$ac_try;; -esac -eval ac_try_echo="\"\$as_me:${as_lineno-$LINENO}: $ac_try_echo\"" -$as_echo "$ac_try_echo"; } >&5 - (eval "$ac_compiler $ac_option >&5") 2>conftest.err - ac_status=$? - if test -s conftest.err; then - sed '10a\ -... rest of stderr output deleted ... - 10q' conftest.err >conftest.er1 - cat conftest.er1 >&5 - fi - rm -f conftest.er1 conftest.err - $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 - test $ac_status = 0; } -done +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_FGREP" >&5 +printf "%s\n" "$ac_cv_path_FGREP" >&6; } + FGREP="$ac_cv_path_FGREP" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether we are using the GNU C++ compiler" >&5 -$as_echo_n "checking whether we are using the GNU C++ compiler... " >&6; } -if ${ac_cv_cxx_compiler_gnu+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () -{ -#ifndef __GNUC__ - choke me -#endif +test -z "$GREP" && GREP=grep - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - ac_compiler_gnu=yes -else - ac_compiler_gnu=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_cv_cxx_compiler_gnu=$ac_compiler_gnu -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_cxx_compiler_gnu" >&5 -$as_echo "$ac_cv_cxx_compiler_gnu" >&6; } -if test $ac_compiler_gnu = yes; then - GXX=yes -else - GXX= -fi -ac_test_CXXFLAGS=${CXXFLAGS+set} -ac_save_CXXFLAGS=$CXXFLAGS -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether $CXX accepts -g" >&5 -$as_echo_n "checking whether $CXX accepts -g... " >&6; } -if ${ac_cv_prog_cxx_g+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_save_cxx_werror_flag=$ac_cxx_werror_flag - ac_cxx_werror_flag=yes - ac_cv_prog_cxx_g=no - CXXFLAGS="-g" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - ac_cv_prog_cxx_g=yes -else - CXXFLAGS="" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : -else - ac_cxx_werror_flag=$ac_save_cxx_werror_flag - CXXFLAGS="-g" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - ac_cv_prog_cxx_g=yes -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext - ac_cxx_werror_flag=$ac_save_cxx_werror_flag -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_prog_cxx_g" >&5 -$as_echo "$ac_cv_prog_cxx_g" >&6; } -if test "$ac_test_CXXFLAGS" = set; then - CXXFLAGS=$ac_save_CXXFLAGS -elif test $ac_cv_prog_cxx_g = yes; then - if test "$GXX" = yes; then - CXXFLAGS="-g -O2" - else - CXXFLAGS="-g" - fi -else - if test "$GXX" = yes; then - CXXFLAGS="-O2" - else - CXXFLAGS= - fi -fi -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -depcc="$CXX" am_compiler_list= -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking dependency style of $depcc" >&5 -$as_echo_n "checking dependency style of $depcc... " >&6; } -if ${am_cv_CXX_dependencies_compiler_type+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -z "$AMDEP_TRUE" && test -f "$am_depcomp"; then - # We make a subdir and do the tests there. Otherwise we can end up - # making bogus files that we don't know about and never remove. For - # instance it was reported that on HP-UX the gcc test will end up - # making a dummy file named 'D' -- because '-MD' means "put the output - # in D". - rm -rf conftest.dir - mkdir conftest.dir - # Copy depcomp to subdir because otherwise we won't find it if we're - # using a relative directory. - cp "$am_depcomp" conftest.dir - cd conftest.dir - # We will build objects and dependencies in a subdirectory because - # it helps to detect inapplicable dependency modes. 
For instance - # both Tru64's cc and ICC support -MD to output dependencies as a - # side effect of compilation, but ICC will put the dependencies in - # the current directory while Tru64 will put them in the object - # directory. - mkdir sub - am_cv_CXX_dependencies_compiler_type=none - if test "$am_compiler_list" = ""; then - am_compiler_list=`sed -n 's/^#*\([a-zA-Z0-9]*\))$/\1/p' < ./depcomp` - fi - am__universal=false - case " $depcc " in #( - *\ -arch\ *\ -arch\ *) am__universal=true ;; - esac - for depmode in $am_compiler_list; do - # Setup a source with many dependencies, because some compilers - # like to wrap large dependency lists on column 80 (with \), and - # we should not choose a depcomp mode which is confused by this. - # - # We need to recreate these files for each test, as the compiler may - # overwrite some of them when testing with obscure command lines. - # This happens at least with the AIX C compiler. - : > sub/conftest.c - for i in 1 2 3 4 5 6; do - echo '#include "conftst'$i'.h"' >> sub/conftest.c - # Using ": > sub/conftst$i.h" creates only sub/conftst1.h with - # Solaris 10 /bin/sh. - echo '/* dummy */' > sub/conftst$i.h - done - echo "${am__include} ${am__quote}sub/conftest.Po${am__quote}" > confmf - # We check with '-c' and '-o' for the sake of the "dashmstdout" - # mode. It turns out that the SunPro C++ compiler does not properly - # handle '-M -o', and we need to detect this. Also, some Intel - # versions had trouble with output in subdirs. - am__obj=sub/conftest.${OBJEXT-o} - am__minus_obj="-o $am__obj" - case $depmode in - gcc) - # This depmode causes a compiler race in universal mode. - test "$am__universal" = false || continue - ;; - nosideeffect) - # After this tag, mechanisms are not by side-effect, so they'll - # only be used when explicitly requested. 
- if test "x$enable_dependency_tracking" = xyes; then - continue - else - break - fi - ;; - msvc7 | msvc7msys | msvisualcpp | msvcmsys) - # This compiler won't grok '-c -o', but also, the minuso test has - # not run yet. These depmodes are late enough in the game, and - # so weak that their functioning should not be impacted. - am__obj=conftest.${OBJEXT-o} - am__minus_obj= + + + + + +# Check whether --with-gnu-ld was given. +if test ${with_gnu_ld+y} +then : + withval=$with_gnu_ld; test no = "$withval" || with_gnu_ld=yes +else $as_nop + with_gnu_ld=no +fi + +ac_prog=ld +if test yes = "$GCC"; then + # Check if gcc -print-prog-name=ld gives a path. + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5 +printf %s "checking for ld used by $CC... " >&6; } + case $host in + *-*-mingw*) + # gcc leaves a trailing carriage return, which upsets mingw + ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; + *) + ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; + esac + case $ac_prog in + # Accept absolute paths. + [\\/]* | ?:[\\/]*) + re_direlt='/[^/][^/]*/\.\./' + # Canonicalize the pathname of ld + ac_prog=`$ECHO "$ac_prog"| $SED 's%\\\\%/%g'` + while $ECHO "$ac_prog" | $GREP "$re_direlt" > /dev/null 2>&1; do + ac_prog=`$ECHO $ac_prog| $SED "s%$re_direlt%/%"` + done + test -z "$LD" && LD=$ac_prog ;; - none) break ;; - esac - if depmode=$depmode \ - source=sub/conftest.c object=$am__obj \ - depfile=sub/conftest.Po tmpdepfile=sub/conftest.TPo \ - $SHELL ./depcomp $depcc -c $am__minus_obj sub/conftest.c \ - >/dev/null 2>conftest.err && - grep sub/conftst1.h sub/conftest.Po > /dev/null 2>&1 && - grep sub/conftst6.h sub/conftest.Po > /dev/null 2>&1 && - grep $am__obj sub/conftest.Po > /dev/null 2>&1 && - ${MAKE-make} -s -f confmf > /dev/null 2>&1; then - # icc doesn't choke on unknown options, it will just issue warnings - # or remarks (even with -Werror). So we grep stderr for any message - # that says an option was ignored or not supported. 
- # When given -MP, icc 7.0 and 7.1 complain thusly: - # icc: Command line warning: ignoring option '-M'; no argument required - # The diagnosis changed in icc 8.0: - # icc: Command line remark: option '-MP' not supported - if (grep 'ignoring option' conftest.err || - grep 'not supported' conftest.err) >/dev/null 2>&1; then :; else - am_cv_CXX_dependencies_compiler_type=$depmode - break - fi + "") + # If it fails, then pretend we aren't using GCC. + ac_prog=ld + ;; + *) + # If it is relative, then search for the first ld in PATH. + with_gnu_ld=unknown + ;; + esac +elif test yes = "$with_gnu_ld"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for GNU ld" >&5 +printf %s "checking for GNU ld... " >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for non-GNU ld" >&5 +printf %s "checking for non-GNU ld... " >&6; } +fi +if test ${lt_cv_path_LD+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -z "$LD"; then + lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR + for ac_dir in $PATH; do + IFS=$lt_save_ifs + test -z "$ac_dir" && ac_dir=. + if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then + lt_cv_path_LD=$ac_dir/$ac_prog + # Check to see if the program is GNU ld. I'd rather use --version, + # but apparently some variants of GNU ld only accept -v. + # Break only if it was the GNU/non-GNU ld that we prefer. 
+ case `"$lt_cv_path_LD" -v 2>&1 &5 -$as_echo "$am_cv_CXX_dependencies_compiler_type" >&6; } -CXXDEPMODE=depmode=$am_cv_CXX_dependencies_compiler_type - if - test "x$enable_dependency_tracking" != xno \ - && test "$am_cv_CXX_dependencies_compiler_type" = gcc3; then - am__fastdepCXX_TRUE= - am__fastdepCXX_FALSE='#' +LD=$lt_cv_path_LD +if test -n "$LD"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $LD" >&5 +printf "%s\n" "$LD" >&6; } else - am__fastdepCXX_TRUE='#' - am__fastdepCXX_FALSE= -fi - - - - -if test "x$remember_set_CFLAGS" = "x" -then - if test "$CFLAGS" = "-g -O2" - then - CFLAGS="-O2" - elif test "$CFLAGS" = "-g" - then - CFLAGS="" - fi + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - -if test "x$remember_set_CXXFLAGS" = "x" -then - if test "$CXXFLAGS" = "-g -O2" - then - CXXFLAGS="-O2" - elif test "$CXXFLAGS" = "-g" - then - CXXFLAGS="" - fi +test -z "$LD" && as_fn_error $? "no acceptable ld found in \$PATH" "$LINENO" 5 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if the linker ($LD) is GNU ld" >&5 +printf %s "checking if the linker ($LD) is GNU ld... " >&6; } +if test ${lt_cv_prog_gnu_ld+y} +then : + printf %s "(cached) " >&6 +else $as_nop + # I'd rather use --version here, but apparently some GNU lds only accept -v. +case `$LD -v 2>&1 &5 +printf "%s\n" "$lt_cv_prog_gnu_ld" >&6; } +with_gnu_ld=$lt_cv_prog_gnu_ld -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : -else - CXX=""; CXXCP=""; CXXFLAGS="" -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -# Check for a 64-bit integer type -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to run the C preprocessor" >&5 -$as_echo_n "checking how to run the C preprocessor... " >&6; } -# On Suns, sometimes $CPP names a directory. -if test -n "$CPP" && test -d "$CPP"; then - CPP= -fi -if test -z "$CPP"; then - if ${ac_cv_prog_CPP+:} false; then : - $as_echo_n "(cached) " >&6 -else - # Double quotes because CPP needs to be expanded - for CPP in "$CC -E" "$CC -E -traditional-cpp" "/lib/cpp" - do - ac_preproc_ok=false -for ac_c_preproc_warn_flag in '' yes -do - # Use a header file that comes with gcc, so configuring glibc - # with a fresh cross-compiler works. - # Prefer to if __STDC__ is defined, since - # exists even on freestanding compilers. - # On the NeXT, cc -E runs the code through the compiler's parser, - # not just through cpp. "Syntax error" is here to catch this case. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#ifdef __STDC__ -# include -#else -# include -#endif - Syntax error -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : -else - # Broken: fails on valid input. -continue -fi -rm -f conftest.err conftest.i conftest.$ac_ext - # OK, works on sane cases. Now check whether nonexistent headers - # can be detected and how. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -#include -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : - # Broken: success on invalid input. -continue +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for BSD- or MS-compatible name lister (nm)" >&5 +printf %s "checking for BSD- or MS-compatible name lister (nm)... " >&6; } +if test ${lt_cv_path_NM+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$NM"; then + # Let the user override the test. + lt_cv_path_NM=$NM else - # Passes both tests. -ac_preproc_ok=: -break -fi -rm -f conftest.err conftest.i conftest.$ac_ext - -done -# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. -rm -f conftest.i conftest.err conftest.$ac_ext -if $ac_preproc_ok; then : - break -fi - - done - ac_cv_prog_CPP=$CPP - -fi - CPP=$ac_cv_prog_CPP -else - ac_cv_prog_CPP=$CPP -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $CPP" >&5 -$as_echo "$CPP" >&6; } -ac_preproc_ok=false -for ac_c_preproc_warn_flag in '' yes -do - # Use a header file that comes with gcc, so configuring glibc - # with a fresh cross-compiler works. - # Prefer to if __STDC__ is defined, since - # exists even on freestanding compilers. - # On the NeXT, cc -E runs the code through the compiler's parser, - # not just through cpp. "Syntax error" is here to catch this case. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#ifdef __STDC__ -# include -#else -# include -#endif - Syntax error -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : - -else - # Broken: fails on valid input. -continue + lt_nm_to_check=${ac_tool_prefix}nm + if test -n "$ac_tool_prefix" && test "$build" = "$host"; then + lt_nm_to_check="$lt_nm_to_check nm" + fi + for lt_tmp_nm in $lt_nm_to_check; do + lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR + for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do + IFS=$lt_save_ifs + test -z "$ac_dir" && ac_dir=. 
+ tmp_nm=$ac_dir/$lt_tmp_nm + if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext"; then + # Check to see if the nm accepts a BSD-compat flag. + # Adding the 'sed 1q' prevents false positives on HP-UX, which says: + # nm: unknown option "B" ignored + # Tru64's nm complains that /dev/null is an invalid object file + # MSYS converts /dev/null to NUL, MinGW nm treats NUL as empty + case $build_os in + mingw*) lt_bad_file=conftest.nm/nofile ;; + *) lt_bad_file=/dev/null ;; + esac + case `"$tmp_nm" -B $lt_bad_file 2>&1 | sed '1q'` in + *$lt_bad_file* | *'Invalid file or object type'*) + lt_cv_path_NM="$tmp_nm -B" + break 2 + ;; + *) + case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in + */dev/null*) + lt_cv_path_NM="$tmp_nm -p" + break 2 + ;; + *) + lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but + continue # so that we can try to find one that supports BSD flags + ;; + esac + ;; + esac + fi + done + IFS=$lt_save_ifs + done + : ${lt_cv_path_NM=no} fi -rm -f conftest.err conftest.i conftest.$ac_ext - - # OK, works on sane cases. Now check whether nonexistent headers - # can be detected and how. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -_ACEOF -if ac_fn_c_try_cpp "$LINENO"; then : - # Broken: success on invalid input. -continue -else - # Passes both tests. -ac_preproc_ok=: -break fi -rm -f conftest.err conftest.i conftest.$ac_ext - -done -# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. -rm -f conftest.i conftest.err conftest.$ac_ext -if $ac_preproc_ok; then : - +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_NM" >&5 +printf "%s\n" "$lt_cv_path_NM" >&6; } +if test no != "$lt_cv_path_NM"; then + NM=$lt_cv_path_NM else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} -as_fn_error $? 
"C preprocessor \"$CPP\" fails sanity check -See \`config.log' for more details" "$LINENO" 5; } -fi - -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for grep that handles long lines and -e" >&5 -$as_echo_n "checking for grep that handles long lines and -e... " >&6; } -if ${ac_cv_path_GREP+:} false; then : - $as_echo_n "(cached) " >&6 + # Didn't find any BSD compatible name lister, look for dumpbin. + if test -n "$DUMPBIN"; then : + # Let the user override the test. + else + if test -n "$ac_tool_prefix"; then + for ac_prog in dumpbin "link -dump" + do + # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. +set dummy $ac_tool_prefix$ac_prog; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_DUMPBIN+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$DUMPBIN"; then + ac_cv_prog_DUMPBIN="$DUMPBIN" # Let the user override the test. else - if test -z "$GREP"; then - ac_path_GREP_found=false - # Loop through the user's path and test for each of PROGNAME-LIST - as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in grep ggrep; do + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_GREP="$as_dir/$ac_prog$ac_exec_ext" - as_fn_executable_p "$ac_path_GREP" || continue -# Check for GNU ac_path_GREP and select it if it is found. 
- # Check for GNU $ac_path_GREP -case `"$ac_path_GREP" --version 2>&1` in -*GNU*) - ac_cv_path_GREP="$ac_path_GREP" ac_path_GREP_found=:;; -*) - ac_count=0 - $as_echo_n 0123456789 >"conftest.in" - while : - do - cat "conftest.in" "conftest.in" >"conftest.tmp" - mv "conftest.tmp" "conftest.in" - cp "conftest.in" "conftest.nl" - $as_echo 'GREP' >> "conftest.nl" - "$ac_path_GREP" -e 'GREP$' -e '-(cannot match)-' < "conftest.nl" >"conftest.out" 2>/dev/null || break - diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break - as_fn_arith $ac_count + 1 && ac_count=$as_val - if test $ac_count -gt ${ac_path_GREP_max-0}; then - # Best one so far, save it but keep looking for a better one - ac_cv_path_GREP="$ac_path_GREP" - ac_path_GREP_max=$ac_count - fi - # 10*(2^10) chars as input seems more than enough - test $ac_count -gt 10 && break - done - rm -f conftest.in conftest.tmp conftest.nl conftest.out;; -esac - - $ac_path_GREP_found && break 3 - done - done + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_DUMPBIN="$ac_tool_prefix$ac_prog" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done done IFS=$as_save_IFS - if test -z "$ac_cv_path_GREP"; then - as_fn_error $? "no acceptable grep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5 - fi -else - ac_cv_path_GREP=$GREP -fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_GREP" >&5 -$as_echo "$ac_cv_path_GREP" >&6; } - GREP="$ac_cv_path_GREP" +fi +DUMPBIN=$ac_cv_prog_DUMPBIN +if test -n "$DUMPBIN"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $DUMPBIN" >&5 +printf "%s\n" "$DUMPBIN" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for egrep" >&5 -$as_echo_n "checking for egrep... 
" >&6; } -if ${ac_cv_path_EGREP+:} false; then : - $as_echo_n "(cached) " >&6 + test -n "$DUMPBIN" && break + done +fi +if test -z "$DUMPBIN"; then + ac_ct_DUMPBIN=$DUMPBIN + for ac_prog in dumpbin "link -dump" +do + # Extract the first word of "$ac_prog", so it can be a program name with args. +set dummy $ac_prog; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_DUMPBIN+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_DUMPBIN"; then + ac_cv_prog_ac_ct_DUMPBIN="$ac_ct_DUMPBIN" # Let the user override the test. else - if echo a | $GREP -E '(a|b)' >/dev/null 2>&1 - then ac_cv_path_EGREP="$GREP -E" - else - if test -z "$EGREP"; then - ac_path_EGREP_found=false - # Loop through the user's path and test for each of PROGNAME-LIST - as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in egrep; do + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_EGREP="$as_dir/$ac_prog$ac_exec_ext" - as_fn_executable_p "$ac_path_EGREP" || continue -# Check for GNU ac_path_EGREP and select it if it is found. 
- # Check for GNU $ac_path_EGREP -case `"$ac_path_EGREP" --version 2>&1` in -*GNU*) - ac_cv_path_EGREP="$ac_path_EGREP" ac_path_EGREP_found=:;; -*) - ac_count=0 - $as_echo_n 0123456789 >"conftest.in" - while : - do - cat "conftest.in" "conftest.in" >"conftest.tmp" - mv "conftest.tmp" "conftest.in" - cp "conftest.in" "conftest.nl" - $as_echo 'EGREP' >> "conftest.nl" - "$ac_path_EGREP" 'EGREP$' < "conftest.nl" >"conftest.out" 2>/dev/null || break - diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break - as_fn_arith $ac_count + 1 && ac_count=$as_val - if test $ac_count -gt ${ac_path_EGREP_max-0}; then - # Best one so far, save it but keep looking for a better one - ac_cv_path_EGREP="$ac_path_EGREP" - ac_path_EGREP_max=$ac_count - fi - # 10*(2^10) chars as input seems more than enough - test $ac_count -gt 10 && break - done - rm -f conftest.in conftest.tmp conftest.nl conftest.out;; -esac - - $ac_path_EGREP_found && break 3 - done - done + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_DUMPBIN="$ac_prog" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done done IFS=$as_save_IFS - if test -z "$ac_cv_path_EGREP"; then - as_fn_error $? "no acceptable egrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5 - fi -else - ac_cv_path_EGREP=$EGREP -fi - fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_EGREP" >&5 -$as_echo "$ac_cv_path_EGREP" >&6; } - EGREP="$ac_cv_path_EGREP" - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ANSI C header files" >&5 -$as_echo_n "checking for ANSI C header files... " >&6; } -if ${ac_cv_header_stdc+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -#include -#include -#include -#include - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_header_stdc=yes +fi +ac_ct_DUMPBIN=$ac_cv_prog_ac_ct_DUMPBIN +if test -n "$ac_ct_DUMPBIN"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DUMPBIN" >&5 +printf "%s\n" "$ac_ct_DUMPBIN" >&6; } else - ac_cv_header_stdc=no + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -if test $ac_cv_header_stdc = yes; then - # SunOS 4.x string.h does not declare mem*, contrary to ANSI. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "memchr" >/dev/null 2>&1; then : + test -n "$ac_ct_DUMPBIN" && break +done -else - ac_cv_header_stdc=no + if test "x$ac_ct_DUMPBIN" = x; then + DUMPBIN=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + DUMPBIN=$ac_ct_DUMPBIN + fi fi -rm -f conftest* + case `$DUMPBIN -symbols -headers /dev/null 2>&1 | sed '1q'` in + *COFF*) + DUMPBIN="$DUMPBIN -symbols -headers" + ;; + *) + DUMPBIN=: + ;; + esac + fi + + if test : != "$DUMPBIN"; then + NM=$DUMPBIN + fi fi +test -z "$NM" && NM=nm -if test $ac_cv_header_stdc = yes; then - # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "free" >/dev/null 2>&1; then : -else - ac_cv_header_stdc=no -fi -rm -f conftest* -fi -if test $ac_cv_header_stdc = yes; then - # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. 
- if test "$cross_compiling" = yes; then : - : -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -#include -#if ((' ' & 0x0FF) == 0x020) -# define ISLOWER(c) ('a' <= (c) && (c) <= 'z') -# define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) -#else -# define ISLOWER(c) \ - (('a' <= (c) && (c) <= 'i') \ - || ('j' <= (c) && (c) <= 'r') \ - || ('s' <= (c) && (c) <= 'z')) -# define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c)) -#endif -#define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) -int -main () -{ - int i; - for (i = 0; i < 256; i++) - if (XOR (islower (i), ISLOWER (i)) - || toupper (i) != TOUPPER (i)) - return 2; - return 0; -} -_ACEOF -if ac_fn_c_try_run "$LINENO"; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking the name lister ($NM) interface" >&5 +printf %s "checking the name lister ($NM) interface... " >&6; } +if test ${lt_cv_nm_interface+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_nm_interface="BSD nm" + echo "int some_variable = 0;" > conftest.$ac_ext + (eval echo "\"\$as_me:$LINENO: $ac_compile\"" >&5) + (eval "$ac_compile" 2>conftest.err) + cat conftest.err >&5 + (eval echo "\"\$as_me:$LINENO: $NM \\\"conftest.$ac_objext\\\"\"" >&5) + (eval "$NM \"conftest.$ac_objext\"" 2>conftest.err > conftest.out) + cat conftest.err >&5 + (eval echo "\"\$as_me:$LINENO: output\"" >&5) + cat conftest.out >&5 + if $GREP 'External.*some_variable' conftest.out > /dev/null; then + lt_cv_nm_interface="MS dumpbin" + fi + rm -f conftest* +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_nm_interface" >&5 +printf "%s\n" "$lt_cv_nm_interface" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5 +printf %s "checking whether ln -s works... 
" >&6; } +LN_S=$as_ln_s +if test "$LN_S" = "ln -s"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } else - ac_cv_header_stdc=no -fi -rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ - conftest.$ac_objext conftest.beam conftest.$ac_ext + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 +printf "%s\n" "no, using $LN_S" >&6; } fi -fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stdc" >&5 -$as_echo "$ac_cv_header_stdc" >&6; } -if test $ac_cv_header_stdc = yes; then +# find the maximum length of command line arguments +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking the maximum length of command line arguments" >&5 +printf %s "checking the maximum length of command line arguments... " >&6; } +if test ${lt_cv_sys_max_cmd_len+y} +then : + printf %s "(cached) " >&6 +else $as_nop + i=0 + teststring=ABCD -$as_echo "#define STDC_HEADERS 1" >>confdefs.h + case $build_os in + msdosdjgpp*) + # On DJGPP, this test can blow up pretty badly due to problems in libc + # (any single argument exceeding 2000 bytes causes a buffer overrun + # during glob expansion). Even if it were fixed, the result of this + # check would be larger than it should be. + lt_cv_sys_max_cmd_len=12288; # 12K is about right + ;; -fi + gnu*) + # Under GNU Hurd, this test is not required because there is + # no limit to the length of command line arguments. + # Libtool will interpret -1 as no limit whatsoever + lt_cv_sys_max_cmd_len=-1; + ;; -# On IRIX 5.3, sys/types and inttypes.h are conflicting. 
-for ac_header in sys/types.h sys/stat.h stdlib.h string.h memory.h strings.h \ - inttypes.h stdint.h unistd.h -do : - as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` -ac_fn_c_check_header_compile "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default -" -if eval test \"x\$"$as_ac_Header"\" = x"yes"; then : - cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1 -_ACEOF + cygwin* | mingw* | cegcc*) + # On Win9x/ME, this test blows up -- it succeeds, but takes + # about 5 minutes as the teststring grows exponentially. + # Worse, since 9x/ME are not pre-emptively multitasking, + # you end up with a "frozen" computer, even though with patience + # the test eventually succeeds (with a max line length of 256k). + # Instead, let's just punt: use the minimum linelength reported by + # all of the supported platforms: 8192 (on NT/2K/XP). + lt_cv_sys_max_cmd_len=8192; + ;; -fi + mint*) + # On MiNT this can take a long time and run out of memory. + lt_cv_sys_max_cmd_len=8192; + ;; -done + amigaos*) + # On AmigaOS with pdksh, this test takes hours, literally. + # So we just punt and use a minimum line length of 8192. + lt_cv_sys_max_cmd_len=8192; + ;; + + bitrig* | darwin* | dragonfly* | freebsd* | netbsd* | openbsd*) + # This has been around since 386BSD, at least. Likely further. + if test -x /sbin/sysctl; then + lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax` + elif test -x /usr/sbin/sysctl; then + lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax` + else + lt_cv_sys_max_cmd_len=65536 # usable default for all BSDs + fi + # And add a safety zone + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` + ;; + + interix*) + # We know the value 262144 and hardcode it with a safety zone (like BSD) + lt_cv_sys_max_cmd_len=196608 + ;; + os2*) + # The test takes a long time on OS/2. 
+ lt_cv_sys_max_cmd_len=8192 + ;; -ac_fn_c_find_intX_t "$LINENO" "64" "ac_cv_c_int64_t" -case $ac_cv_c_int64_t in #( - no|yes) ;; #( + osf*) + # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running configure + # due to this test when exec_disable_arg_limit is 1 on Tru64. It is not + # nice to cause kernel panics so lets avoid the loop below. + # First set a reasonable default. + lt_cv_sys_max_cmd_len=16384 + # + if test -x /sbin/sysconfig; then + case `/sbin/sysconfig -q proc exec_disable_arg_limit` in + *1*) lt_cv_sys_max_cmd_len=-1 ;; + esac + fi + ;; + sco3.2v5*) + lt_cv_sys_max_cmd_len=102400 + ;; + sysv5* | sco5v6* | sysv4.2uw2*) + kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null` + if test -n "$kargmax"; then + lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[ ]//'` + else + lt_cv_sys_max_cmd_len=32768 + fi + ;; *) + lt_cv_sys_max_cmd_len=`(getconf ARG_MAX) 2> /dev/null` + if test -n "$lt_cv_sys_max_cmd_len" && \ + test undefined != "$lt_cv_sys_max_cmd_len"; then + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` + else + # Make teststring a little bigger before we do anything with it. + # a 1K string should be a reasonable start. + for i in 1 2 3 4 5 6 7 8; do + teststring=$teststring$teststring + done + SHELL=${SHELL-${CONFIG_SHELL-/bin/sh}} + # If test is not a shell built-in, we'll probably end up computing a + # maximum length that is only half of the actual maximum length, but + # we can't tell. + while { test X`env echo "$teststring$teststring" 2>/dev/null` \ + = "X$teststring$teststring"; } >/dev/null 2>&1 && + test 17 != "$i" # 1/2 MB should be enough + do + i=`expr $i + 1` + teststring=$teststring$teststring + done + # Only check the string length outside the loop. 
+ lt_cv_sys_max_cmd_len=`expr "X$teststring" : ".*" 2>&1` + teststring= + # Add a significant safety factor because C++ compilers can tack on + # massive amounts of additional arguments before passing them to the + # linker. It appears as though 1/2 is a usable value. + lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 2` + fi + ;; + esac -cat >>confdefs.h <<_ACEOF -#define int64_t $ac_cv_c_int64_t -_ACEOF -;; -esac +fi + +if test -n "$lt_cv_sys_max_cmd_len"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_sys_max_cmd_len" >&5 +printf "%s\n" "$lt_cv_sys_max_cmd_len" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: none" >&5 +printf "%s\n" "none" >&6; } +fi +max_cmd_len=$lt_cv_sys_max_cmd_len -# Make sure we can run config.sub. -$SHELL "$ac_aux_dir/config.sub" sun4 >/dev/null 2>&1 || - as_fn_error $? "cannot run $SHELL $ac_aux_dir/config.sub" "$LINENO" 5 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking build system type" >&5 -$as_echo_n "checking build system type... " >&6; } -if ${ac_cv_build+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_build_alias=$build_alias -test "x$ac_build_alias" = x && - ac_build_alias=`$SHELL "$ac_aux_dir/config.guess"` -test "x$ac_build_alias" = x && - as_fn_error $? "cannot guess build type; you must specify one" "$LINENO" 5 -ac_cv_build=`$SHELL "$ac_aux_dir/config.sub" $ac_build_alias` || - as_fn_error $? "$SHELL $ac_aux_dir/config.sub $ac_build_alias failed" "$LINENO" 5 -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_build" >&5 -$as_echo "$ac_cv_build" >&6; } -case $ac_cv_build in -*-*-*) ;; -*) as_fn_error $? 
"invalid value of canonical build" "$LINENO" 5;; -esac -build=$ac_cv_build -ac_save_IFS=$IFS; IFS='-' -set x $ac_cv_build -shift -build_cpu=$1 -build_vendor=$2 -shift; shift -# Remember, the first character of IFS is used to create $*, -# except with old shells: -build_os=$* -IFS=$ac_save_IFS -case $build_os in *\ *) build_os=`echo "$build_os" | sed 's/ /-/g'`;; esac +: ${CP="cp -f"} +: ${MV="mv -f"} +: ${RM="rm -f"} -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking host system type" >&5 -$as_echo_n "checking host system type... " >&6; } -if ${ac_cv_host+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test "x$host_alias" = x; then - ac_cv_host=$ac_cv_build +if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then + lt_unset=unset else - ac_cv_host=`$SHELL "$ac_aux_dir/config.sub" $host_alias` || - as_fn_error $? "$SHELL $ac_aux_dir/config.sub $host_alias failed" "$LINENO" 5 + lt_unset=false fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_host" >&5 -$as_echo "$ac_cv_host" >&6; } -case $ac_cv_host in -*-*-*) ;; -*) as_fn_error $? "invalid value of canonical host" "$LINENO" 5;; + + + + +# test EBCDIC or ASCII +case `echo X|tr X '\101'` in + A) # ASCII based system + # \n is not interpreted correctly by Solaris 8 /usr/ucb/tr + lt_SP2NL='tr \040 \012' + lt_NL2SP='tr \015\012 \040\040' + ;; + *) # EBCDIC based system + lt_SP2NL='tr \100 \n' + lt_NL2SP='tr \r\n \100\100' + ;; esac -host=$ac_cv_host -ac_save_IFS=$IFS; IFS='-' -set x $ac_cv_host -shift -host_cpu=$1 -host_vendor=$2 -shift; shift -# Remember, the first character of IFS is used to create $*, -# except with old shells: -host_os=$* -IFS=$ac_save_IFS -case $host_os in *\ *) host_os=`echo "$host_os" | sed 's/ /-/g'`;; esac -enable_win32_dll=yes -case $host in -*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-cegcc*) - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}as", so it can be a program name with args. 
-set dummy ${ac_tool_prefix}as; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_AS+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$AS"; then - ac_cv_prog_AS="$AS" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_AS="${ac_tool_prefix}as" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -AS=$ac_cv_prog_AS -if test -n "$AS"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $AS" >&5 -$as_echo "$AS" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -fi -if test -z "$ac_cv_prog_AS"; then - ac_ct_AS=$AS - # Extract the first word of "as", so it can be a program name with args. -set dummy as; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_AS+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_AS"; then - ac_cv_prog_ac_ct_AS="$ac_ct_AS" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_AS="as" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -ac_ct_AS=$ac_cv_prog_ac_ct_AS -if test -n "$ac_ct_AS"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AS" >&5 -$as_echo "$ac_ct_AS" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - if test "x$ac_ct_AS" = x; then - AS="false" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to convert $build file names to $host format" >&5 +printf %s "checking how to convert $build file names to $host format... 
" >&6; } +if test ${lt_cv_to_host_file_cmd+y} +then : + printf %s "(cached) " >&6 +else $as_nop + case $host in + *-*-mingw* ) + case $build in + *-*-mingw* ) # actually msys + lt_cv_to_host_file_cmd=func_convert_file_msys_to_w32 + ;; + *-*-cygwin* ) + lt_cv_to_host_file_cmd=func_convert_file_cygwin_to_w32 + ;; + * ) # otherwise, assume *nix + lt_cv_to_host_file_cmd=func_convert_file_nix_to_w32 + ;; + esac + ;; + *-*-cygwin* ) + case $build in + *-*-mingw* ) # actually msys + lt_cv_to_host_file_cmd=func_convert_file_msys_to_cygwin + ;; + *-*-cygwin* ) + lt_cv_to_host_file_cmd=func_convert_file_noop + ;; + * ) # otherwise, assume *nix + lt_cv_to_host_file_cmd=func_convert_file_nix_to_cygwin + ;; + esac + ;; + * ) # unhandled hosts (and "normal" native builds) + lt_cv_to_host_file_cmd=func_convert_file_noop + ;; esac - AS=$ac_ct_AS - fi -else - AS="$ac_cv_prog_AS" + fi - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}dlltool", so it can be a program name with args. -set dummy ${ac_tool_prefix}dlltool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_DLLTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$DLLTOOL"; then - ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_DLLTOOL="${ac_tool_prefix}dlltool" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS +to_host_file_cmd=$lt_cv_to_host_file_cmd +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_to_host_file_cmd" >&5 +printf "%s\n" "$lt_cv_to_host_file_cmd" >&6; } -fi -fi -DLLTOOL=$ac_cv_prog_DLLTOOL -if test -n "$DLLTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $DLLTOOL" >&5 -$as_echo "$DLLTOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -fi -if test -z "$ac_cv_prog_DLLTOOL"; then - ac_ct_DLLTOOL=$DLLTOOL - # Extract the first word of "dlltool", so it can be a program name with args. -set dummy dlltool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_DLLTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_DLLTOOL"; then - ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_DLLTOOL="dlltool" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL -if test -n "$ac_ct_DLLTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DLLTOOL" >&5 -$as_echo "$ac_ct_DLLTOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - if test "x$ac_ct_DLLTOOL" = x; then - DLLTOOL="false" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to convert $build file names to toolchain format" >&5 +printf %s "checking how to convert $build file names to toolchain format... " >&6; } +if test ${lt_cv_to_tool_file_cmd+y} +then : + printf %s "(cached) " >&6 +else $as_nop + #assume ordinary cross tools, or native build. +lt_cv_to_tool_file_cmd=func_convert_file_noop +case $host in + *-*-mingw* ) + case $build in + *-*-mingw* ) # actually msys + lt_cv_to_tool_file_cmd=func_convert_file_msys_to_w32 + ;; + esac + ;; esac - DLLTOOL=$ac_ct_DLLTOOL - fi -else - DLLTOOL="$ac_cv_prog_DLLTOOL" + fi - if test -n "$ac_tool_prefix"; then +to_tool_file_cmd=$lt_cv_to_tool_file_cmd +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_to_tool_file_cmd" >&5 +printf "%s\n" "$lt_cv_to_tool_file_cmd" >&6; } + + + + + +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $LD option to reload object files" >&5 +printf %s "checking for $LD option to reload object files... 
" >&6; } +if test ${lt_cv_ld_reload_flag+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_ld_reload_flag='-r' +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_reload_flag" >&5 +printf "%s\n" "$lt_cv_ld_reload_flag" >&6; } +reload_flag=$lt_cv_ld_reload_flag +case $reload_flag in +"" | " "*) ;; +*) reload_flag=" $reload_flag" ;; +esac +reload_cmds='$LD$reload_flag -o $output$reload_objs' +case $host_os in + cygwin* | mingw* | pw32* | cegcc*) + if test yes != "$GCC"; then + reload_cmds=false + fi + ;; + darwin*) + if test yes = "$GCC"; then + reload_cmds='$LTCC $LTCFLAGS -nostdlib $wl-r -o $output$reload_objs' + else + reload_cmds='$LD$reload_flag -o $output$reload_objs' + fi + ;; +esac + + + + + + + + + +if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}objdump", so it can be a program name with args. set dummy ${ac_tool_prefix}objdump; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_OBJDUMP+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_OBJDUMP+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$OBJDUMP"; then ac_cv_prog_OBJDUMP="$OBJDUMP" # Let the user override the test. else @@ -6500,11 +6253,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_OBJDUMP="${ac_tool_prefix}objdump" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -6515,11 +6272,11 @@ fi fi OBJDUMP=$ac_cv_prog_OBJDUMP if test -n "$OBJDUMP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OBJDUMP" >&5 -$as_echo "$OBJDUMP" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $OBJDUMP" >&5 +printf "%s\n" "$OBJDUMP" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -6528,11 +6285,12 @@ if test -z "$ac_cv_prog_OBJDUMP"; then ac_ct_OBJDUMP=$OBJDUMP # Extract the first word of "objdump", so it can be a program name with args. set dummy objdump; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_OBJDUMP+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_OBJDUMP+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ac_ct_OBJDUMP"; then ac_cv_prog_ac_ct_OBJDUMP="$ac_ct_OBJDUMP" # Let the user override the test. else @@ -6540,11 +6298,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ac_ct_OBJDUMP="objdump" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -6555,11 +6317,11 @@ fi fi ac_ct_OBJDUMP=$ac_cv_prog_ac_ct_OBJDUMP if test -n "$ac_ct_OBJDUMP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OBJDUMP" >&5 -$as_echo "$ac_ct_OBJDUMP" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OBJDUMP" >&5 +printf "%s\n" "$ac_ct_OBJDUMP" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi if test "x$ac_ct_OBJDUMP" = x; then @@ -6567,8 +6329,8 @@ fi else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac OBJDUMP=$ac_ct_OBJDUMP @@ -6577,109 +6339,242 @@ else OBJDUMP="$ac_cv_prog_OBJDUMP" fi - ;; -esac - -test -z "$AS" && AS=as - +test -z "$OBJDUMP" && OBJDUMP=objdump -test -z "$DLLTOOL" && DLLTOOL=dlltool +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to recognize dependent libraries" >&5 +printf %s "checking how to recognize dependent libraries... 
" >&6; } +if test ${lt_cv_deplibs_check_method+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_file_magic_cmd='$MAGIC_CMD' +lt_cv_file_magic_test_file= +lt_cv_deplibs_check_method='unknown' +# Need to set the preceding variable on all platforms that support +# interlibrary dependencies. +# 'none' -- dependencies not supported. +# 'unknown' -- same as none, but documents that we really don't know. +# 'pass_all' -- all dependencies passed with no checks. +# 'test_compile' -- check by making test program. +# 'file_magic [[regex]]' -- check by looking for files in library path +# that responds to the $file_magic_cmd with a given extended regex. +# If you have 'file' or equivalent on your system and you're not sure +# whether 'pass_all' will *always* work, you probably want this one. +case $host_os in +aix[4-9]*) + lt_cv_deplibs_check_method=pass_all + ;; +beos*) + lt_cv_deplibs_check_method=pass_all + ;; -test -z "$OBJDUMP" && OBJDUMP=objdump +bsdi[45]*) + lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib)' + lt_cv_file_magic_cmd='/usr/bin/file -L' + lt_cv_file_magic_test_file=/shlib/libc.so + ;; +cygwin*) + # func_win32_libid is a shell function defined in ltmain.sh + lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL' + lt_cv_file_magic_cmd='func_win32_libid' + ;; +mingw* | pw32*) + # Base MSYS/MinGW do not provide the 'file' command needed by + # func_win32_libid shell function, so use a weaker test based on 'objdump', + # unless we find 'file', for example because we are cross-compiling. + if ( file / ) >/dev/null 2>&1; then + lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL' + lt_cv_file_magic_cmd='func_win32_libid' + else + # Keep this pattern in sync with the one in func_win32_libid. 
+ lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)' + lt_cv_file_magic_cmd='$OBJDUMP -f' + fi + ;; +cegcc*) + # use the weaker test based on 'objdump'. See mingw*. + lt_cv_deplibs_check_method='file_magic file format pe-arm-.*little(.*architecture: arm)?' + lt_cv_file_magic_cmd='$OBJDUMP -f' + ;; +darwin* | rhapsody*) + lt_cv_deplibs_check_method=pass_all + ;; +freebsd* | dragonfly*) + if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then + case $host_cpu in + i*86 ) + # Not sure whether the presence of OpenBSD here was a mistake. + # Let's accept both of them until this is cleared up. + lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[3-9]86 (compact )?demand paged shared library' + lt_cv_file_magic_cmd=/usr/bin/file + lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*` + ;; + esac + else + lt_cv_deplibs_check_method=pass_all + fi + ;; +haiku*) + lt_cv_deplibs_check_method=pass_all + ;; -case `pwd` in - *\ * | *\ *) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&5 -$as_echo "$as_me: WARNING: Libtool does not cope well with whitespace in \`pwd\`" >&2;} ;; -esac - +hpux10.20* | hpux11*) + lt_cv_file_magic_cmd=/usr/bin/file + case $host_cpu in + ia64*) + lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - IA64' + lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so + ;; + hppa*64*) + lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF[ -][0-9][0-9])(-bit)?( [LM]SB)? 
shared object( file)?[, -]* PA-RISC [0-9]\.[0-9]' + lt_cv_file_magic_test_file=/usr/lib/pa20_64/libc.sl + ;; + *) + lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|PA-RISC[0-9]\.[0-9]) shared library' + lt_cv_file_magic_test_file=/usr/lib/libc.sl + ;; + esac + ;; +interix[3-9]*) + # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a here + lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|\.a)$' + ;; -macro_version='2.4.6.42-b88ce' -macro_revision='2.4.6.42' +irix5* | irix6* | nonstopux*) + case $LD in + *-32|*"-32 ") libmagic=32-bit;; + *-n32|*"-n32 ") libmagic=N32;; + *-64|*"-64 ") libmagic=64-bit;; + *) libmagic=never-match;; + esac + lt_cv_deplibs_check_method=pass_all + ;; +# This must be glibc/ELF. +linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) + lt_cv_deplibs_check_method=pass_all + ;; +netbsd*) + if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then + lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$' + else + lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|_pic\.a)$' + fi + ;; +newos6*) + lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (executable|dynamic lib)' + lt_cv_file_magic_cmd=/usr/bin/file + lt_cv_file_magic_test_file=/usr/lib/libnls.so + ;; +*nto* | *qnx*) + lt_cv_deplibs_check_method=pass_all + ;; +openbsd* | bitrig*) + if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then + lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|\.so|_pic\.a)$' + else + lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$' + fi + ;; +osf3* | osf4* | osf5*) + lt_cv_deplibs_check_method=pass_all + ;; +rdos*) + lt_cv_deplibs_check_method=pass_all + ;; +solaris*) + lt_cv_deplibs_check_method=pass_all + ;; +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + lt_cv_deplibs_check_method=pass_all + ;; +sysv4 | sysv4.3*) + case $host_vendor in + motorola) + lt_cv_deplibs_check_method='file_magic ELF 
[0-9][0-9]*-bit [ML]SB (shared object|dynamic lib) M[0-9][0-9]* Version [0-9]' + lt_cv_file_magic_test_file=`echo /usr/lib/libc.so*` + ;; + ncr) + lt_cv_deplibs_check_method=pass_all + ;; + sequent) + lt_cv_file_magic_cmd='/bin/file' + lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [LM]SB (shared object|dynamic lib )' + ;; + sni) + lt_cv_file_magic_cmd='/bin/file' + lt_cv_deplibs_check_method="file_magic ELF [0-9][0-9]*-bit [LM]SB dynamic lib" + lt_cv_file_magic_test_file=/lib/libc.so + ;; + siemens) + lt_cv_deplibs_check_method=pass_all + ;; + pc) + lt_cv_deplibs_check_method=pass_all + ;; + esac + ;; +tpf*) + lt_cv_deplibs_check_method=pass_all + ;; +os2*) + lt_cv_deplibs_check_method=pass_all + ;; +esac +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_deplibs_check_method" >&5 +printf "%s\n" "$lt_cv_deplibs_check_method" >&6; } -ltmain=$ac_aux_dir/ltmain.sh +file_magic_glob= +want_nocaseglob=no +if test "$build" = "$host"; then + case $host_os in + mingw* | pw32*) + if ( shopt | grep nocaseglob ) >/dev/null 2>&1; then + want_nocaseglob=yes + else + file_magic_glob=`echo aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ | $SED -e "s/\(..\)/s\/[\1]\/[\1]\/g;/g"` + fi + ;; + esac +fi -# Backslashify metacharacters that are still active within -# double-quoted strings. -sed_quote_subst='s/\(["`$\\]\)/\\\1/g' +file_magic_cmd=$lt_cv_file_magic_cmd +deplibs_check_method=$lt_cv_deplibs_check_method +test -z "$deplibs_check_method" && deplibs_check_method=unknown -# Same as above, but do not quote variable references. -double_quote_subst='s/\(["`\\]\)/\\\1/g' -# Sed substitution to delay expansion of an escaped shell variable in a -# double_quote_subst'ed string. -delay_variable_subst='s/\\\\\\\\\\\$/\\\\\\$/g' -# Sed substitution to delay expansion of an escaped single quote. 
-delay_single_quote_subst='s/'\''/'\'\\\\\\\'\''/g' -# Sed substitution to avoid accidental globbing in evaled expressions -no_glob_subst='s/\*/\\\*/g' -ECHO='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' -ECHO=$ECHO$ECHO$ECHO$ECHO$ECHO -ECHO=$ECHO$ECHO$ECHO$ECHO$ECHO$ECHO -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to print strings" >&5 -$as_echo_n "checking how to print strings... " >&6; } -# Test print first, because it will be a builtin if present. -if test "X`( print -r -- -n ) 2>/dev/null`" = X-n && \ - test "X`print -r -- $ECHO 2>/dev/null`" = "X$ECHO"; then - ECHO='print -r --' -elif test "X`printf %s $ECHO 2>/dev/null`" = "X$ECHO"; then - ECHO='printf %s\n' -else - # Use this function as a fallback that always works. - func_fallback_echo () - { - eval 'cat <<_LTECHO_EOF -$1 -_LTECHO_EOF' - } - ECHO='func_fallback_echo' -fi -# func_echo_all arg... -# Invoke $ECHO with all args, space-separated. -func_echo_all () -{ - $ECHO "" -} -case $ECHO in - printf*) { $as_echo "$as_me:${as_lineno-$LINENO}: result: printf" >&5 -$as_echo "printf" >&6; } ;; - print*) { $as_echo "$as_me:${as_lineno-$LINENO}: result: print -r" >&5 -$as_echo "print -r" >&6; } ;; - *) { $as_echo "$as_me:${as_lineno-$LINENO}: result: cat" >&5 -$as_echo "cat" >&6; } ;; -esac @@ -6694,168 +6589,109 @@ esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for a sed that does not truncate output" >&5 -$as_echo_n "checking for a sed that does not truncate output... " >&6; } -if ${ac_cv_path_SED+:} false; then : - $as_echo_n "(cached) " >&6 +if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}dlltool", so it can be a program name with args. +set dummy ${ac_tool_prefix}dlltool; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_DLLTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$DLLTOOL"; then + ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test. else - ac_script=s/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa/bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb/ - for ac_i in 1 2 3 4 5 6 7; do - ac_script="$ac_script$as_nl$ac_script" - done - echo "$ac_script" 2>/dev/null | sed 99q >conftest.sed - { ac_script=; unset ac_script;} - if test -z "$SED"; then - ac_path_SED_found=false - # Loop through the user's path and test for each of PROGNAME-LIST - as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in sed gsed; do + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_SED="$as_dir/$ac_prog$ac_exec_ext" - as_fn_executable_p "$ac_path_SED" || continue -# Check for GNU ac_path_SED and select it if it is found. 
- # Check for GNU $ac_path_SED -case `"$ac_path_SED" --version 2>&1` in -*GNU*) - ac_cv_path_SED="$ac_path_SED" ac_path_SED_found=:;; -*) - ac_count=0 - $as_echo_n 0123456789 >"conftest.in" - while : - do - cat "conftest.in" "conftest.in" >"conftest.tmp" - mv "conftest.tmp" "conftest.in" - cp "conftest.in" "conftest.nl" - $as_echo '' >> "conftest.nl" - "$ac_path_SED" -f conftest.sed < "conftest.nl" >"conftest.out" 2>/dev/null || break - diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break - as_fn_arith $ac_count + 1 && ac_count=$as_val - if test $ac_count -gt ${ac_path_SED_max-0}; then - # Best one so far, save it but keep looking for a better one - ac_cv_path_SED="$ac_path_SED" - ac_path_SED_max=$ac_count - fi - # 10*(2^10) chars as input seems more than enough - test $ac_count -gt 10 && break - done - rm -f conftest.in conftest.tmp conftest.nl conftest.out;; -esac - - $ac_path_SED_found && break 3 - done - done + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_DLLTOOL="${ac_tool_prefix}dlltool" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done done IFS=$as_save_IFS - if test -z "$ac_cv_path_SED"; then - as_fn_error $? "no acceptable sed could be found in \$PATH" "$LINENO" 5 - fi -else - ac_cv_path_SED=$SED -fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_SED" >&5 -$as_echo "$ac_cv_path_SED" >&6; } - SED="$ac_cv_path_SED" - rm -f conftest.sed - -test -z "$SED" && SED=sed -Xsed="$SED -e 1s/^X//" - - - - - - - - - +fi +DLLTOOL=$ac_cv_prog_DLLTOOL +if test -n "$DLLTOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $DLLTOOL" >&5 +printf "%s\n" "$DLLTOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for fgrep" >&5 -$as_echo_n "checking for fgrep... 
" >&6; } -if ${ac_cv_path_FGREP+:} false; then : - $as_echo_n "(cached) " >&6 +fi +if test -z "$ac_cv_prog_DLLTOOL"; then + ac_ct_DLLTOOL=$DLLTOOL + # Extract the first word of "dlltool", so it can be a program name with args. +set dummy dlltool; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_DLLTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_DLLTOOL"; then + ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test. else - if echo 'ab*c' | $GREP -F 'ab*c' >/dev/null 2>&1 - then ac_cv_path_FGREP="$GREP -F" - else - if test -z "$FGREP"; then - ac_path_FGREP_found=false - # Loop through the user's path and test for each of PROGNAME-LIST - as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH$PATH_SEPARATOR/usr/xpg4/bin +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in fgrep; do + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_FGREP="$as_dir/$ac_prog$ac_exec_ext" - as_fn_executable_p "$ac_path_FGREP" || continue -# Check for GNU ac_path_FGREP and select it if it is found. 
- # Check for GNU $ac_path_FGREP -case `"$ac_path_FGREP" --version 2>&1` in -*GNU*) - ac_cv_path_FGREP="$ac_path_FGREP" ac_path_FGREP_found=:;; -*) - ac_count=0 - $as_echo_n 0123456789 >"conftest.in" - while : - do - cat "conftest.in" "conftest.in" >"conftest.tmp" - mv "conftest.tmp" "conftest.in" - cp "conftest.in" "conftest.nl" - $as_echo 'FGREP' >> "conftest.nl" - "$ac_path_FGREP" FGREP < "conftest.nl" >"conftest.out" 2>/dev/null || break - diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break - as_fn_arith $ac_count + 1 && ac_count=$as_val - if test $ac_count -gt ${ac_path_FGREP_max-0}; then - # Best one so far, save it but keep looking for a better one - ac_cv_path_FGREP="$ac_path_FGREP" - ac_path_FGREP_max=$ac_count - fi - # 10*(2^10) chars as input seems more than enough - test $ac_count -gt 10 && break - done - rm -f conftest.in conftest.tmp conftest.nl conftest.out;; -esac - - $ac_path_FGREP_found && break 3 - done - done + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_DLLTOOL="dlltool" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done done IFS=$as_save_IFS - if test -z "$ac_cv_path_FGREP"; then - as_fn_error $? 
"no acceptable fgrep could be found in $PATH$PATH_SEPARATOR/usr/xpg4/bin" "$LINENO" 5 - fi + +fi +fi +ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL +if test -n "$ac_ct_DLLTOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DLLTOOL" >&5 +printf "%s\n" "$ac_ct_DLLTOOL" >&6; } else - ac_cv_path_FGREP=$FGREP + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - fi + if test "x$ac_ct_DLLTOOL" = x; then + DLLTOOL="false" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + DLLTOOL=$ac_ct_DLLTOOL + fi +else + DLLTOOL="$ac_cv_prog_DLLTOOL" fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_FGREP" >&5 -$as_echo "$ac_cv_path_FGREP" >&6; } - FGREP="$ac_cv_path_FGREP" - - -test -z "$GREP" && GREP=grep - - - - - - - - - - - +test -z "$DLLTOOL" && DLLTOOL=dlltool @@ -6863,110 +6699,38 @@ test -z "$GREP" && GREP=grep -# Check whether --with-gnu-ld was given. -if test "${with_gnu_ld+set}" = set; then : - withval=$with_gnu_ld; test no = "$withval" || with_gnu_ld=yes -else - with_gnu_ld=no -fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to associate runtime and link libraries" >&5 +printf %s "checking how to associate runtime and link libraries... " >&6; } +if test ${lt_cv_sharedlib_from_linklib_cmd+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_sharedlib_from_linklib_cmd='unknown' -ac_prog=ld -if test yes = "$GCC"; then - # Check if gcc -print-prog-name=ld gives a path. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5 -$as_echo_n "checking for ld used by $CC... 
" >&6; } - case $host in - *-*-mingw*) - # gcc leaves a trailing carriage return, which upsets mingw - ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; - *) - ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; - esac - case $ac_prog in - # Accept absolute paths. - [\\/]* | ?:[\\/]*) - re_direlt='/[^/][^/]*/\.\./' - # Canonicalize the pathname of ld - ac_prog=`$ECHO "$ac_prog"| $SED 's%\\\\%/%g'` - while $ECHO "$ac_prog" | $GREP "$re_direlt" > /dev/null 2>&1; do - ac_prog=`$ECHO $ac_prog| $SED "s%$re_direlt%/%"` - done - test -z "$LD" && LD=$ac_prog - ;; - "") - # If it fails, then pretend we aren't using GCC. - ac_prog=ld +case $host_os in +cygwin* | mingw* | pw32* | cegcc*) + # two different shell functions defined in ltmain.sh; + # decide which one to use based on capabilities of $DLLTOOL + case `$DLLTOOL --help 2>&1` in + *--identify-strict*) + lt_cv_sharedlib_from_linklib_cmd=func_cygming_dll_for_implib ;; *) - # If it is relative, then search for the first ld in PATH. - with_gnu_ld=unknown + lt_cv_sharedlib_from_linklib_cmd=func_cygming_dll_for_implib_fallback ;; esac -elif test yes = "$with_gnu_ld"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for GNU ld" >&5 -$as_echo_n "checking for GNU ld... " >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for non-GNU ld" >&5 -$as_echo_n "checking for non-GNU ld... " >&6; } -fi -if ${lt_cv_path_LD+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -z "$LD"; then - lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR - for ac_dir in $PATH; do - IFS=$lt_save_ifs - test -z "$ac_dir" && ac_dir=. - if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then - lt_cv_path_LD=$ac_dir/$ac_prog - # Check to see if the program is GNU ld. I'd rather use --version, - # but apparently some variants of GNU ld only accept -v. - # Break only if it was the GNU/non-GNU ld that we prefer. 
- case `"$lt_cv_path_LD" -v 2>&1 &5 -$as_echo "$LD" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -test -z "$LD" && as_fn_error $? "no acceptable ld found in \$PATH" "$LINENO" 5 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking if the linker ($LD) is GNU ld" >&5 -$as_echo_n "checking if the linker ($LD) is GNU ld... " >&6; } -if ${lt_cv_prog_gnu_ld+:} false; then : - $as_echo_n "(cached) " >&6 -else - # I'd rather use --version here, but apparently some GNU lds only accept -v. -case `$LD -v 2>&1 &5 -$as_echo "$lt_cv_prog_gnu_ld" >&6; } -with_gnu_ld=$lt_cv_prog_gnu_ld - +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_sharedlib_from_linklib_cmd" >&5 +printf "%s\n" "$lt_cv_sharedlib_from_linklib_cmd" >&6; } +sharedlib_from_linklib_cmd=$lt_cv_sharedlib_from_linklib_cmd +test -z "$sharedlib_from_linklib_cmd" && sharedlib_from_linklib_cmd=$ECHO @@ -6974,91 +6738,33 @@ with_gnu_ld=$lt_cv_prog_gnu_ld -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for BSD- or MS-compatible name lister (nm)" >&5 -$as_echo_n "checking for BSD- or MS-compatible name lister (nm)... " >&6; } -if ${lt_cv_path_NM+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$NM"; then - # Let the user override the test. - lt_cv_path_NM=$NM -else - lt_nm_to_check=${ac_tool_prefix}nm - if test -n "$ac_tool_prefix" && test "$build" = "$host"; then - lt_nm_to_check="$lt_nm_to_check nm" - fi - for lt_tmp_nm in $lt_nm_to_check; do - lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR - for ac_dir in $PATH /usr/ccs/bin/elf /usr/ccs/bin /usr/ucb /bin; do - IFS=$lt_save_ifs - test -z "$ac_dir" && ac_dir=. - tmp_nm=$ac_dir/$lt_tmp_nm - if test -f "$tmp_nm" || test -f "$tmp_nm$ac_exeext"; then - # Check to see if the nm accepts a BSD-compat flag. 
- # Adding the 'sed 1q' prevents false positives on HP-UX, which says: - # nm: unknown option "B" ignored - # Tru64's nm complains that /dev/null is an invalid object file - # MSYS converts /dev/null to NUL, MinGW nm treats NUL as empty - case $build_os in - mingw*) lt_bad_file=conftest.nm/nofile ;; - *) lt_bad_file=/dev/null ;; - esac - case `"$tmp_nm" -B $lt_bad_file 2>&1 | sed '1q'` in - *$lt_bad_file* | *'Invalid file or object type'*) - lt_cv_path_NM="$tmp_nm -B" - break 2 - ;; - *) - case `"$tmp_nm" -p /dev/null 2>&1 | sed '1q'` in - */dev/null*) - lt_cv_path_NM="$tmp_nm -p" - break 2 - ;; - *) - lt_cv_path_NM=${lt_cv_path_NM="$tmp_nm"} # keep the first match, but - continue # so that we can try to find one that supports BSD flags - ;; - esac - ;; - esac - fi - done - IFS=$lt_save_ifs - done - : ${lt_cv_path_NM=no} -fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_NM" >&5 -$as_echo "$lt_cv_path_NM" >&6; } -if test no != "$lt_cv_path_NM"; then - NM=$lt_cv_path_NM -else - # Didn't find any BSD compatible name lister, look for dumpbin. - if test -n "$DUMPBIN"; then : - # Let the user override the test. - else - if test -n "$ac_tool_prefix"; then - for ac_prog in dumpbin "link -dump" +if test -n "$ac_tool_prefix"; then + for ac_prog in ar do # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_DUMPBIN+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$DUMPBIN"; then - ac_cv_prog_DUMPBIN="$DUMPBIN" # Let the user override the test. +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_AR+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$AR"; then + ac_cv_prog_AR="$AR" # Let the user override the test. 
else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_DUMPBIN="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_AR="$ac_tool_prefix$ac_prog" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -7067,42 +6773,47 @@ IFS=$as_save_IFS fi fi -DUMPBIN=$ac_cv_prog_DUMPBIN -if test -n "$DUMPBIN"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $DUMPBIN" >&5 -$as_echo "$DUMPBIN" >&6; } +AR=$ac_cv_prog_AR +if test -n "$AR"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $AR" >&5 +printf "%s\n" "$AR" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - test -n "$DUMPBIN" && break + test -n "$AR" && break done fi -if test -z "$DUMPBIN"; then - ac_ct_DUMPBIN=$DUMPBIN - for ac_prog in dumpbin "link -dump" +if test -z "$AR"; then + ac_ct_AR=$AR + for ac_prog in ar do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_DUMPBIN+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_DUMPBIN"; then - ac_cv_prog_ac_ct_DUMPBIN="$ac_ct_DUMPBIN" # Let the user override the test. +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_ac_ct_AR+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_AR"; then + ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_DUMPBIN="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_AR="$ac_prog" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -7111,373 +6822,224 @@ IFS=$as_save_IFS fi fi -ac_ct_DUMPBIN=$ac_cv_prog_ac_ct_DUMPBIN -if test -n "$ac_ct_DUMPBIN"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DUMPBIN" >&5 -$as_echo "$ac_ct_DUMPBIN" >&6; } +ac_ct_AR=$ac_cv_prog_ac_ct_AR +if test -n "$ac_ct_AR"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AR" >&5 +printf "%s\n" "$ac_ct_AR" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - test -n "$ac_ct_DUMPBIN" && break + test -n "$ac_ct_AR" && break done - if test "x$ac_ct_DUMPBIN" = x; then - DUMPBIN=":" + if test "x$ac_ct_AR" = x; then + AR="false" else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" 
>&2;} ac_tool_warned=yes ;; esac - DUMPBIN=$ac_ct_DUMPBIN - fi -fi - - case `$DUMPBIN -symbols -headers /dev/null 2>&1 | sed '1q'` in - *COFF*) - DUMPBIN="$DUMPBIN -symbols -headers" - ;; - *) - DUMPBIN=: - ;; - esac - fi - - if test : != "$DUMPBIN"; then - NM=$DUMPBIN + AR=$ac_ct_AR fi fi -test -z "$NM" && NM=nm - +: ${AR=ar} -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking the name lister ($NM) interface" >&5 -$as_echo_n "checking the name lister ($NM) interface... " >&6; } -if ${lt_cv_nm_interface+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_nm_interface="BSD nm" - echo "int some_variable = 0;" > conftest.$ac_ext - (eval echo "\"\$as_me:$LINENO: $ac_compile\"" >&5) - (eval "$ac_compile" 2>conftest.err) - cat conftest.err >&5 - (eval echo "\"\$as_me:$LINENO: $NM \\\"conftest.$ac_objext\\\"\"" >&5) - (eval "$NM \"conftest.$ac_objext\"" 2>conftest.err > conftest.out) - cat conftest.err >&5 - (eval echo "\"\$as_me:$LINENO: output\"" >&5) - cat conftest.out >&5 - if $GREP 'External.*some_variable' conftest.out > /dev/null; then - lt_cv_nm_interface="MS dumpbin" - fi - rm -f conftest* -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_nm_interface" >&5 -$as_echo "$lt_cv_nm_interface" >&6; } -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5 -$as_echo_n "checking whether ln -s works... " >&6; } -LN_S=$as_ln_s -if test "$LN_S" = "ln -s"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 -$as_echo "no, using $LN_S" >&6; } -fi -# find the maximum length of command line arguments -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking the maximum length of command line arguments" >&5 -$as_echo_n "checking the maximum length of command line arguments... 
" >&6; } -if ${lt_cv_sys_max_cmd_len+:} false; then : - $as_echo_n "(cached) " >&6 -else - i=0 - teststring=ABCD +# Use ARFLAGS variable as AR's operation code to sync the variable naming with +# Automake. If both AR_FLAGS and ARFLAGS are specified, AR_FLAGS should have +# higher priority because thats what people were doing historically (setting +# ARFLAGS for automake and AR_FLAGS for libtool). FIXME: Make the AR_FLAGS +# variable obsoleted/removed. - case $build_os in - msdosdjgpp*) - # On DJGPP, this test can blow up pretty badly due to problems in libc - # (any single argument exceeding 2000 bytes causes a buffer overrun - # during glob expansion). Even if it were fixed, the result of this - # check would be larger than it should be. - lt_cv_sys_max_cmd_len=12288; # 12K is about right - ;; +test ${AR_FLAGS+y} || AR_FLAGS=${ARFLAGS-cr} +lt_ar_flags=$AR_FLAGS - gnu*) - # Under GNU Hurd, this test is not required because there is - # no limit to the length of command line arguments. - # Libtool will interpret -1 as no limit whatsoever - lt_cv_sys_max_cmd_len=-1; - ;; - cygwin* | mingw* | cegcc*) - # On Win9x/ME, this test blows up -- it succeeds, but takes - # about 5 minutes as the teststring grows exponentially. - # Worse, since 9x/ME are not pre-emptively multitasking, - # you end up with a "frozen" computer, even though with patience - # the test eventually succeeds (with a max line length of 256k). - # Instead, let's just punt: use the minimum linelength reported by - # all of the supported platforms: 8192 (on NT/2K/XP). - lt_cv_sys_max_cmd_len=8192; - ;; - mint*) - # On MiNT this can take a long time and run out of memory. - lt_cv_sys_max_cmd_len=8192; - ;; - amigaos*) - # On AmigaOS with pdksh, this test takes hours, literally. - # So we just punt and use a minimum line length of 8192. - lt_cv_sys_max_cmd_len=8192; - ;; - bitrig* | darwin* | dragonfly* | freebsd* | netbsd* | openbsd*) - # This has been around since 386BSD, at least. Likely further. 
- if test -x /sbin/sysctl; then - lt_cv_sys_max_cmd_len=`/sbin/sysctl -n kern.argmax` - elif test -x /usr/sbin/sysctl; then - lt_cv_sys_max_cmd_len=`/usr/sbin/sysctl -n kern.argmax` - else - lt_cv_sys_max_cmd_len=65536 # usable default for all BSDs - fi - # And add a safety zone - lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` - lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` - ;; - interix*) - # We know the value 262144 and hardcode it with a safety zone (like BSD) - lt_cv_sys_max_cmd_len=196608 - ;; +# Make AR_FLAGS overridable by 'make ARFLAGS='. Don't try to run-time override +# by AR_FLAGS because that was never working and AR_FLAGS is about to die. - os2*) - # The test takes a long time on OS/2. - lt_cv_sys_max_cmd_len=8192 - ;; - osf*) - # Dr. Hans Ekkehard Plesser reports seeing a kernel panic running configure - # due to this test when exec_disable_arg_limit is 1 on Tru64. It is not - # nice to cause kernel panics so lets avoid the loop below. - # First set a reasonable default. - lt_cv_sys_max_cmd_len=16384 - # - if test -x /sbin/sysconfig; then - case `/sbin/sysconfig -q proc exec_disable_arg_limit` in - *1*) lt_cv_sys_max_cmd_len=-1 ;; - esac - fi - ;; - sco3.2v5*) - lt_cv_sys_max_cmd_len=102400 - ;; - sysv5* | sco5v6* | sysv4.2uw2*) - kargmax=`grep ARG_MAX /etc/conf/cf.d/stune 2>/dev/null` - if test -n "$kargmax"; then - lt_cv_sys_max_cmd_len=`echo $kargmax | sed 's/.*[ ]//'` - else - lt_cv_sys_max_cmd_len=32768 - fi - ;; - *) - lt_cv_sys_max_cmd_len=`(getconf ARG_MAX) 2> /dev/null` - if test -n "$lt_cv_sys_max_cmd_len" && \ - test undefined != "$lt_cv_sys_max_cmd_len"; then - lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 4` - lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \* 3` - else - # Make teststring a little bigger before we do anything with it. - # a 1K string should be a reasonable start. 
- for i in 1 2 3 4 5 6 7 8; do - teststring=$teststring$teststring - done - SHELL=${SHELL-${CONFIG_SHELL-/bin/sh}} - # If test is not a shell built-in, we'll probably end up computing a - # maximum length that is only half of the actual maximum length, but - # we can't tell. - while { test X`env echo "$teststring$teststring" 2>/dev/null` \ - = "X$teststring$teststring"; } >/dev/null 2>&1 && - test 17 != "$i" # 1/2 MB should be enough - do - i=`expr $i + 1` - teststring=$teststring$teststring - done - # Only check the string length outside the loop. - lt_cv_sys_max_cmd_len=`expr "X$teststring" : ".*" 2>&1` - teststring= - # Add a significant safety factor because C++ compilers can tack on - # massive amounts of additional arguments before passing them to the - # linker. It appears as though 1/2 is a usable value. - lt_cv_sys_max_cmd_len=`expr $lt_cv_sys_max_cmd_len \/ 2` - fi - ;; - esac -fi -if test -n "$lt_cv_sys_max_cmd_len"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_sys_max_cmd_len" >&5 -$as_echo "$lt_cv_sys_max_cmd_len" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: none" >&5 -$as_echo "none" >&6; } -fi -max_cmd_len=$lt_cv_sys_max_cmd_len +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for archiver @FILE support" >&5 +printf %s "checking for archiver @FILE support... " >&6; } +if test ${lt_cv_ar_at_file+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_ar_at_file=no + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +int +main (void) +{ + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + echo conftest.$ac_objext > conftest.lst + lt_ar_try='$AR $AR_FLAGS libconftest.a @conftest.lst >&5' + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$lt_ar_try\""; } >&5 + (eval $lt_ar_try) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + test $ac_status = 0; } + if test 0 -eq "$ac_status"; then + # Ensure the archiver fails upon bogus file names. + rm -f conftest.$ac_objext libconftest.a + { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$lt_ar_try\""; } >&5 + (eval $lt_ar_try) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } + if test 0 -ne "$ac_status"; then + lt_cv_ar_at_file=@ + fi + fi + rm -f conftest.* libconftest.a +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext -: ${CP="cp -f"} -: ${MV="mv -f"} -: ${RM="rm -f"} +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ar_at_file" >&5 +printf "%s\n" "$lt_cv_ar_at_file" >&6; } -if ( (MAIL=60; unset MAIL) || exit) >/dev/null 2>&1; then - lt_unset=unset +if test no = "$lt_cv_ar_at_file"; then + archiver_list_spec= else - lt_unset=false + archiver_list_spec=$lt_cv_ar_at_file fi -# test EBCDIC or ASCII -case `echo X|tr X '\101'` in - A) # ASCII based system - # \n is not interpreted correctly by Solaris 8 /usr/ucb/tr - lt_SP2NL='tr \040 \012' - lt_NL2SP='tr \015\012 \040\040' - ;; - *) # EBCDIC based system - lt_SP2NL='tr \100 \n' - lt_NL2SP='tr \r\n \100\100' - ;; -esac - - - - +if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}strip", so it can be a program name with args. +set dummy ${ac_tool_prefix}strip; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_STRIP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$STRIP"; then + ac_cv_prog_STRIP="$STRIP" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_STRIP="${ac_tool_prefix}strip" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +STRIP=$ac_cv_prog_STRIP +if test -n "$STRIP"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $STRIP" >&5 +printf "%s\n" "$STRIP" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to convert $build file names to $host format" >&5 -$as_echo_n "checking how to convert $build file names to $host format... " >&6; } -if ${lt_cv_to_host_file_cmd+:} false; then : - $as_echo_n "(cached) " >&6 +fi +if test -z "$ac_cv_prog_STRIP"; then + ac_ct_STRIP=$STRIP + # Extract the first word of "strip", so it can be a program name with args. +set dummy strip; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_STRIP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_STRIP"; then + ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test. 
else - case $host in - *-*-mingw* ) - case $build in - *-*-mingw* ) # actually msys - lt_cv_to_host_file_cmd=func_convert_file_msys_to_w32 - ;; - *-*-cygwin* ) - lt_cv_to_host_file_cmd=func_convert_file_cygwin_to_w32 - ;; - * ) # otherwise, assume *nix - lt_cv_to_host_file_cmd=func_convert_file_nix_to_w32 - ;; - esac - ;; - *-*-cygwin* ) - case $build in - *-*-mingw* ) # actually msys - lt_cv_to_host_file_cmd=func_convert_file_msys_to_cygwin - ;; - *-*-cygwin* ) - lt_cv_to_host_file_cmd=func_convert_file_noop - ;; - * ) # otherwise, assume *nix - lt_cv_to_host_file_cmd=func_convert_file_nix_to_cygwin - ;; - esac - ;; - * ) # unhandled hosts (and "normal" native builds) - lt_cv_to_host_file_cmd=func_convert_file_noop - ;; -esac +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_STRIP="strip" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS fi - -to_host_file_cmd=$lt_cv_to_host_file_cmd -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_to_host_file_cmd" >&5 -$as_echo "$lt_cv_to_host_file_cmd" >&6; } - - - - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to convert $build file names to toolchain format" >&5 -$as_echo_n "checking how to convert $build file names to toolchain format... " >&6; } -if ${lt_cv_to_tool_file_cmd+:} false; then : - $as_echo_n "(cached) " >&6 +fi +ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP +if test -n "$ac_ct_STRIP"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_STRIP" >&5 +printf "%s\n" "$ac_ct_STRIP" >&6; } else - #assume ordinary cross tools, or native build. 
-lt_cv_to_tool_file_cmd=func_convert_file_noop -case $host in - *-*-mingw* ) - case $build in - *-*-mingw* ) # actually msys - lt_cv_to_tool_file_cmd=func_convert_file_msys_to_w32 - ;; - esac - ;; -esac - + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi -to_tool_file_cmd=$lt_cv_to_tool_file_cmd -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_to_tool_file_cmd" >&5 -$as_echo "$lt_cv_to_tool_file_cmd" >&6; } - - - - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $LD option to reload object files" >&5 -$as_echo_n "checking for $LD option to reload object files... " >&6; } -if ${lt_cv_ld_reload_flag+:} false; then : - $as_echo_n "(cached) " >&6 + if test "x$ac_ct_STRIP" = x; then + STRIP=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + STRIP=$ac_ct_STRIP + fi else - lt_cv_ld_reload_flag='-r' + STRIP="$ac_cv_prog_STRIP" fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_reload_flag" >&5 -$as_echo "$lt_cv_ld_reload_flag" >&6; } -reload_flag=$lt_cv_ld_reload_flag -case $reload_flag in -"" | " "*) ;; -*) reload_flag=" $reload_flag" ;; -esac -reload_cmds='$LD$reload_flag -o $output$reload_objs' -case $host_os in - cygwin* | mingw* | pw32* | cegcc*) - if test yes != "$GCC"; then - reload_cmds=false - fi - ;; - darwin*) - if test yes = "$GCC"; then - reload_cmds='$LTCC $LTCFLAGS -nostdlib $wl-r -o $output$reload_objs' - else - reload_cmds='$LD$reload_flag -o $output$reload_objs' - fi - ;; -esac - - +test -z "$STRIP" && STRIP=: @@ -7485,25 +7047,30 @@ esac if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}objdump", so it can be a program name with args. 
-set dummy ${ac_tool_prefix}objdump; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_OBJDUMP+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$OBJDUMP"; then - ac_cv_prog_OBJDUMP="$OBJDUMP" # Let the user override the test. + # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args. +set dummy ${ac_tool_prefix}ranlib; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_RANLIB+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$RANLIB"; then + ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_OBJDUMP="${ac_tool_prefix}objdump" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -7512,38 +7079,43 @@ IFS=$as_save_IFS fi fi -OBJDUMP=$ac_cv_prog_OBJDUMP -if test -n "$OBJDUMP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OBJDUMP" >&5 -$as_echo "$OBJDUMP" >&6; } +RANLIB=$ac_cv_prog_RANLIB +if test -n "$RANLIB"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5 +printf "%s\n" "$RANLIB" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi fi -if test -z "$ac_cv_prog_OBJDUMP"; then - 
ac_ct_OBJDUMP=$OBJDUMP - # Extract the first word of "objdump", so it can be a program name with args. -set dummy objdump; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_OBJDUMP+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_OBJDUMP"; then - ac_cv_prog_ac_ct_OBJDUMP="$ac_ct_OBJDUMP" # Let the user override the test. +if test -z "$ac_cv_prog_RANLIB"; then + ac_ct_RANLIB=$RANLIB + # Extract the first word of "ranlib", so it can be a program name with args. +set dummy ranlib; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_RANLIB+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_RANLIB"; then + ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test. else as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_OBJDUMP="objdump" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_RANLIB="ranlib" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -7552,257 +7124,85 @@ IFS=$as_save_IFS fi fi -ac_ct_OBJDUMP=$ac_cv_prog_ac_ct_OBJDUMP -if test -n "$ac_ct_OBJDUMP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OBJDUMP" >&5 -$as_echo "$ac_ct_OBJDUMP" >&6; } +ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB +if test -n "$ac_ct_RANLIB"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5 +printf "%s\n" "$ac_ct_RANLIB" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - if test "x$ac_ct_OBJDUMP" = x; then - OBJDUMP="false" + if test "x$ac_ct_RANLIB" = x; then + RANLIB=":" else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac - OBJDUMP=$ac_ct_OBJDUMP + RANLIB=$ac_ct_RANLIB fi else - OBJDUMP="$ac_cv_prog_OBJDUMP" + RANLIB="$ac_cv_prog_RANLIB" fi -test -z "$OBJDUMP" && OBJDUMP=objdump +test -z "$RANLIB" && RANLIB=: -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to recognize dependent libraries" >&5 -$as_echo_n "checking how to recognize 
dependent libraries... " >&6; } -if ${lt_cv_deplibs_check_method+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_file_magic_cmd='$MAGIC_CMD' -lt_cv_file_magic_test_file= -lt_cv_deplibs_check_method='unknown' -# Need to set the preceding variable on all platforms that support -# interlibrary dependencies. -# 'none' -- dependencies not supported. -# 'unknown' -- same as none, but documents that we really don't know. -# 'pass_all' -- all dependencies passed with no checks. -# 'test_compile' -- check by making test program. -# 'file_magic [[regex]]' -- check by looking for files in library path -# that responds to the $file_magic_cmd with a given extended regex. -# If you have 'file' or equivalent on your system and you're not sure -# whether 'pass_all' will *always* work, you probably want this one. +# Determine commands to create old-style static archives. +old_archive_cmds='$AR $AR_FLAGS $oldlib$oldobjs' +old_postinstall_cmds='chmod 644 $oldlib' +old_postuninstall_cmds= + +if test -n "$RANLIB"; then + case $host_os in + bitrig* | openbsd*) + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$tool_oldlib" + ;; + *) + old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib" + ;; + esac + old_archive_cmds="$old_archive_cmds~\$RANLIB \$tool_oldlib" +fi case $host_os in -aix[4-9]*) - lt_cv_deplibs_check_method=pass_all - ;; + darwin*) + lock_old_archive_extraction=yes ;; + *) + lock_old_archive_extraction=no ;; +esac -beos*) - lt_cv_deplibs_check_method=pass_all - ;; -bsdi[45]*) - lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib)' - lt_cv_file_magic_cmd='/usr/bin/file -L' - lt_cv_file_magic_test_file=/shlib/libc.so - ;; -cygwin*) - # func_win32_libid is a shell function defined in ltmain.sh - lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL' - lt_cv_file_magic_cmd='func_win32_libid' - ;; -mingw* | pw32*) - # Base MSYS/MinGW do not provide the 'file' command needed by - # 
func_win32_libid shell function, so use a weaker test based on 'objdump', - # unless we find 'file', for example because we are cross-compiling. - if ( file / ) >/dev/null 2>&1; then - lt_cv_deplibs_check_method='file_magic ^x86 archive import|^x86 DLL' - lt_cv_file_magic_cmd='func_win32_libid' - else - # Keep this pattern in sync with the one in func_win32_libid. - lt_cv_deplibs_check_method='file_magic file format (pei*-i386(.*architecture: i386)?|pe-arm-wince|pe-x86-64)' - lt_cv_file_magic_cmd='$OBJDUMP -f' - fi - ;; -cegcc*) - # use the weaker test based on 'objdump'. See mingw*. - lt_cv_deplibs_check_method='file_magic file format pe-arm-.*little(.*architecture: arm)?' - lt_cv_file_magic_cmd='$OBJDUMP -f' - ;; -darwin* | rhapsody*) - lt_cv_deplibs_check_method=pass_all - ;; -freebsd* | dragonfly*) - if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then - case $host_cpu in - i*86 ) - # Not sure whether the presence of OpenBSD here was a mistake. - # Let's accept both of them until this is cleared up. - lt_cv_deplibs_check_method='file_magic (FreeBSD|OpenBSD|DragonFly)/i[3-9]86 (compact )?demand paged shared library' - lt_cv_file_magic_cmd=/usr/bin/file - lt_cv_file_magic_test_file=`echo /usr/lib/libc.so.*` - ;; - esac - else - lt_cv_deplibs_check_method=pass_all - fi - ;; -haiku*) - lt_cv_deplibs_check_method=pass_all - ;; -hpux10.20* | hpux11*) - lt_cv_file_magic_cmd=/usr/bin/file - case $host_cpu in - ia64*) - lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF-[0-9][0-9]) shared object file - IA64' - lt_cv_file_magic_test_file=/usr/lib/hpux32/libc.so - ;; - hppa*64*) - lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|ELF[ -][0-9][0-9])(-bit)?( [LM]SB)? 
shared object( file)?[, -]* PA-RISC [0-9]\.[0-9]' - lt_cv_file_magic_test_file=/usr/lib/pa20_64/libc.sl - ;; - *) - lt_cv_deplibs_check_method='file_magic (s[0-9][0-9][0-9]|PA-RISC[0-9]\.[0-9]) shared library' - lt_cv_file_magic_test_file=/usr/lib/libc.sl - ;; - esac - ;; -interix[3-9]*) - # PIC code is broken on Interix 3.x, that's why |\.a not |_pic\.a here - lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|\.a)$' - ;; -irix5* | irix6* | nonstopux*) - case $LD in - *-32|*"-32 ") libmagic=32-bit;; - *-n32|*"-n32 ") libmagic=N32;; - *-64|*"-64 ") libmagic=64-bit;; - *) libmagic=never-match;; - esac - lt_cv_deplibs_check_method=pass_all - ;; -# This must be glibc/ELF. -linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) - lt_cv_deplibs_check_method=pass_all - ;; -netbsd*) - if echo __ELF__ | $CC -E - | $GREP __ELF__ > /dev/null; then - lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$' - else - lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so|_pic\.a)$' - fi - ;; -newos6*) - lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (executable|dynamic lib)' - lt_cv_file_magic_cmd=/usr/bin/file - lt_cv_file_magic_test_file=/usr/lib/libnls.so - ;; -*nto* | *qnx*) - lt_cv_deplibs_check_method=pass_all - ;; -openbsd* | bitrig*) - if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then - lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|\.so|_pic\.a)$' - else - lt_cv_deplibs_check_method='match_pattern /lib[^/]+(\.so\.[0-9]+\.[0-9]+|_pic\.a)$' - fi - ;; -osf3* | osf4* | osf5*) - lt_cv_deplibs_check_method=pass_all - ;; -rdos*) - lt_cv_deplibs_check_method=pass_all - ;; -solaris*) - lt_cv_deplibs_check_method=pass_all - ;; -sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) - lt_cv_deplibs_check_method=pass_all - ;; -sysv4 | sysv4.3*) - case $host_vendor in - motorola) - lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [ML]SB (shared object|dynamic lib) M[0-9][0-9]* 
Version [0-9]' - lt_cv_file_magic_test_file=`echo /usr/lib/libc.so*` - ;; - ncr) - lt_cv_deplibs_check_method=pass_all - ;; - sequent) - lt_cv_file_magic_cmd='/bin/file' - lt_cv_deplibs_check_method='file_magic ELF [0-9][0-9]*-bit [LM]SB (shared object|dynamic lib )' - ;; - sni) - lt_cv_file_magic_cmd='/bin/file' - lt_cv_deplibs_check_method="file_magic ELF [0-9][0-9]*-bit [LM]SB dynamic lib" - lt_cv_file_magic_test_file=/lib/libc.so - ;; - siemens) - lt_cv_deplibs_check_method=pass_all - ;; - pc) - lt_cv_deplibs_check_method=pass_all - ;; - esac - ;; -tpf*) - lt_cv_deplibs_check_method=pass_all - ;; -os2*) - lt_cv_deplibs_check_method=pass_all - ;; -esac -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_deplibs_check_method" >&5 -$as_echo "$lt_cv_deplibs_check_method" >&6; } -file_magic_glob= -want_nocaseglob=no -if test "$build" = "$host"; then - case $host_os in - mingw* | pw32*) - if ( shopt | grep nocaseglob ) >/dev/null 2>&1; then - want_nocaseglob=yes - else - file_magic_glob=`echo aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ | $SED -e "s/\(..\)/s\/[\1]\/[\1]\/g;/g"` - fi - ;; - esac -fi -file_magic_cmd=$lt_cv_file_magic_cmd -deplibs_check_method=$lt_cv_deplibs_check_method -test -z "$deplibs_check_method" && deplibs_check_method=unknown @@ -7817,332 +7217,326 @@ test -z "$deplibs_check_method" && deplibs_check_method=unknown +# If no C compiler was specified, use CC. +LTCC=${LTCC-"$CC"} +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} +# Allow CC to be a program name with arguments. +compiler=$CC +# Check for command to grab the raw symbol name followed by C symbol from nm. +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking command to parse $NM output from $compiler object" >&5 +printf %s "checking command to parse $NM output from $compiler object... 
" >&6; } +if test ${lt_cv_sys_global_symbol_pipe+y} +then : + printf %s "(cached) " >&6 +else $as_nop +# These are sane defaults that work on at least a few old systems. +# [They come from Ultrix. What could be older than Ultrix?!! ;)] +# Character class describing NM global symbol codes. +symcode='[BCDEGRST]' +# Regexp to match symbols that can be accessed directly from C. +sympat='\([_A-Za-z][_A-Za-z0-9]*\)' -if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}dlltool", so it can be a program name with args. -set dummy ${ac_tool_prefix}dlltool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_DLLTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$DLLTOOL"; then - ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_DLLTOOL="${ac_tool_prefix}dlltool" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 +# Define system-specific variables. 
+case $host_os in +aix*) + symcode='[BCDT]' + ;; +cygwin* | mingw* | pw32* | cegcc*) + symcode='[ABCDGISTW]' + ;; +hpux*) + if test ia64 = "$host_cpu"; then + symcode='[ABCDEGRST]' fi -done - done -IFS=$as_save_IFS + ;; +irix* | nonstopux*) + symcode='[BCDEGRST]' + ;; +osf*) + symcode='[BCDEGQRST]' + ;; +solaris*) + symcode='[BDRT]' + ;; +sco3.2v5*) + symcode='[DT]' + ;; +sysv4.2uw2*) + symcode='[DT]' + ;; +sysv5* | sco5v6* | unixware* | OpenUNIX*) + symcode='[ABDT]' + ;; +sysv4) + symcode='[DFNSTU]' + ;; +esac -fi -fi -DLLTOOL=$ac_cv_prog_DLLTOOL -if test -n "$DLLTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $DLLTOOL" >&5 -$as_echo "$DLLTOOL" >&6; } +# If we're using GNU nm, then use its standard symbol codes. +case `$NM -V 2>&1` in +*GNU* | *'with BFD'*) + symcode='[ABCDGIRSTW]' ;; +esac + +if test "$lt_cv_nm_interface" = "MS dumpbin"; then + # Gets list of data symbols to import. + lt_cv_sys_global_symbol_to_import="sed -n -e 's/^I .* \(.*\)$/\1/p'" + # Adjust the below global symbol transforms to fixup imported variables. + lt_cdecl_hook=" -e 's/^I .* \(.*\)$/extern __declspec(dllimport) char \1;/p'" + lt_c_name_hook=" -e 's/^I .* \(.*\)$/ {\"\1\", (void *) 0},/p'" + lt_c_name_lib_hook="\ + -e 's/^I .* \(lib.*\)$/ {\"\1\", (void *) 0},/p'\ + -e 's/^I .* \(.*\)$/ {\"lib\1\", (void *) 0},/p'" else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + # Disable hooks by default. + lt_cv_sys_global_symbol_to_import= + lt_cdecl_hook= + lt_c_name_hook= + lt_c_name_lib_hook= fi +# Transform an extracted symbol line into a proper C declaration. +# Some systems (esp. on ia64) link data and code symbols differently, +# so use this general approach. 
+lt_cv_sys_global_symbol_to_cdecl="sed -n"\ +$lt_cdecl_hook\ +" -e 's/^T .* \(.*\)$/extern int \1();/p'"\ +" -e 's/^$symcode$symcode* .* \(.*\)$/extern char \1;/p'" -fi -if test -z "$ac_cv_prog_DLLTOOL"; then - ac_ct_DLLTOOL=$DLLTOOL - # Extract the first word of "dlltool", so it can be a program name with args. -set dummy dlltool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_DLLTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_DLLTOOL"; then - ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_DLLTOOL="dlltool" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS +# Transform an extracted symbol line into symbol name and symbol address +lt_cv_sys_global_symbol_to_c_name_address="sed -n"\ +$lt_c_name_hook\ +" -e 's/^: \(.*\) .*$/ {\"\1\", (void *) 0},/p'"\ +" -e 's/^$symcode$symcode* .* \(.*\)$/ {\"\1\", (void *) \&\1},/p'" -fi -fi -ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL -if test -n "$ac_ct_DLLTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DLLTOOL" >&5 -$as_echo "$ac_ct_DLLTOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi +# Transform an extracted symbol line into symbol name with lib prefix and +# symbol address. 
+lt_cv_sys_global_symbol_to_c_name_address_lib_prefix="sed -n"\ +$lt_c_name_lib_hook\ +" -e 's/^: \(.*\) .*$/ {\"\1\", (void *) 0},/p'"\ +" -e 's/^$symcode$symcode* .* \(lib.*\)$/ {\"\1\", (void *) \&\1},/p'"\ +" -e 's/^$symcode$symcode* .* \(.*\)$/ {\"lib\1\", (void *) \&\1},/p'" - if test "x$ac_ct_DLLTOOL" = x; then - DLLTOOL="false" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; +# Handle CRLF in mingw tool chain +opt_cr= +case $build_os in +mingw*) + opt_cr=`$ECHO 'x\{0,1\}' | tr x '\015'` # option cr in regexp + ;; esac - DLLTOOL=$ac_ct_DLLTOOL - fi -else - DLLTOOL="$ac_cv_prog_DLLTOOL" -fi - -test -z "$DLLTOOL" && DLLTOOL=dlltool - - +# Try without a prefix underscore, then with it. +for ac_symprfx in "" "_"; do + # Transform symcode, sympat, and symprfx into a raw symbol and a C symbol. + symxfrm="\\1 $ac_symprfx\\2 \\2" + # Write the raw and C identifiers. + if test "$lt_cv_nm_interface" = "MS dumpbin"; then + # Fake it for dumpbin and say T for any non-static function, + # D for any global variable and I for any imported variable. + # Also find C++ and __fastcall symbols from MSVC++ or ICC, + # which start with @ or ?. 
+ lt_cv_sys_global_symbol_pipe="$AWK '"\ +" {last_section=section; section=\$ 3};"\ +" /^COFF SYMBOL TABLE/{for(i in hide) delete hide[i]};"\ +" /Section length .*#relocs.*(pick any)/{hide[last_section]=1};"\ +" /^ *Symbol name *: /{split(\$ 0,sn,\":\"); si=substr(sn[2],2)};"\ +" /^ *Type *: code/{print \"T\",si,substr(si,length(prfx))};"\ +" /^ *Type *: data/{print \"I\",si,substr(si,length(prfx))};"\ +" \$ 0!~/External *\|/{next};"\ +" / 0+ UNDEF /{next}; / UNDEF \([^|]\)*()/{next};"\ +" {if(hide[section]) next};"\ +" {f=\"D\"}; \$ 0~/\(\).*\|/{f=\"T\"};"\ +" {split(\$ 0,a,/\||\r/); split(a[2],s)};"\ +" s[1]~/^[@?]/{print f,s[1],s[1]; next};"\ +" s[1]~prfx {split(s[1],t,\"@\"); print f,t[1],substr(t[1],length(prfx))}"\ +" ' prfx=^$ac_symprfx" + else + lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[ ]\($symcode$symcode*\)[ ][ ]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'" + fi + lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ __gnu_lto/d'" + # Check to see that the pipe works correctly. + pipe_works=no -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to associate runtime and link libraries" >&5 -$as_echo_n "checking how to associate runtime and link libraries... 
" >&6; } -if ${lt_cv_sharedlib_from_linklib_cmd+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_sharedlib_from_linklib_cmd='unknown' + rm -f conftest* + cat > conftest.$ac_ext <<_LT_EOF +#ifdef __cplusplus +extern "C" { +#endif +char nm_test_var; +void nm_test_func(void); +void nm_test_func(void){} +#ifdef __cplusplus +} +#endif +int main(){nm_test_var='a';nm_test_func();return(0);} +_LT_EOF -case $host_os in -cygwin* | mingw* | pw32* | cegcc*) - # two different shell functions defined in ltmain.sh; - # decide which one to use based on capabilities of $DLLTOOL - case `$DLLTOOL --help 2>&1` in - *--identify-strict*) - lt_cv_sharedlib_from_linklib_cmd=func_cygming_dll_for_implib - ;; - *) - lt_cv_sharedlib_from_linklib_cmd=func_cygming_dll_for_implib_fallback - ;; - esac - ;; -*) - # fallback: assume linklib IS sharedlib - lt_cv_sharedlib_from_linklib_cmd=$ECHO - ;; -esac + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; then + # Now try to grab the symbols. + nlist=conftest.nm + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist\""; } >&5 + (eval $NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } && test -s "$nlist"; then + # Try sorting and uniquifying the output. 
+ if sort "$nlist" | uniq > "$nlist"T; then + mv -f "$nlist"T "$nlist" + else + rm -f "$nlist"T + fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_sharedlib_from_linklib_cmd" >&5 -$as_echo "$lt_cv_sharedlib_from_linklib_cmd" >&6; } -sharedlib_from_linklib_cmd=$lt_cv_sharedlib_from_linklib_cmd -test -z "$sharedlib_from_linklib_cmd" && sharedlib_from_linklib_cmd=$ECHO + # Make sure that we snagged all the symbols we need. + if $GREP ' nm_test_var$' "$nlist" >/dev/null; then + if $GREP ' nm_test_func$' "$nlist" >/dev/null; then + cat <<_LT_EOF > conftest.$ac_ext +/* Keep this code in sync between libtool.m4, ltmain, lt_system.h, and tests. */ +#if defined _WIN32 || defined __CYGWIN__ || defined _WIN32_WCE +/* DATA imports from DLLs on WIN32 can't be const, because runtime + relocations are performed -- see ld's documentation on pseudo-relocs. */ +# define LT_DLSYM_CONST +#elif defined __osf__ +/* This system does not cope well with relocations in const data. */ +# define LT_DLSYM_CONST +#else +# define LT_DLSYM_CONST const +#endif +#ifdef __cplusplus +extern "C" { +#endif +_LT_EOF + # Now generate the symbol file. + eval "$lt_cv_sys_global_symbol_to_cdecl"' < "$nlist" | $GREP -v main >> conftest.$ac_ext' + cat <<_LT_EOF >> conftest.$ac_ext +/* The mapping between symbol names and symbols. */ +LT_DLSYM_CONST struct { + const char *name; + void *address; +} +lt__PROGRAM__LTX_preloaded_symbols[] = +{ + { "@PROGRAM@", (void *) 0 }, +_LT_EOF + $SED "s/^$symcode$symcode* .* \(.*\)$/ {\"\1\", (void *) \&\1},/" < "$nlist" | $GREP -v main >> conftest.$ac_ext + cat <<\_LT_EOF >> conftest.$ac_ext + {0, (void *) 0} +}; +/* This works around a problem in FreeBSD linker */ +#ifdef FREEBSD_WORKAROUND +static const void *lt_preloaded_setup() { + return lt__PROGRAM__LTX_preloaded_symbols; +} +#endif +#ifdef __cplusplus +} +#endif +_LT_EOF + # Now try linking the two files. 
+ mv conftest.$ac_objext conftstm.$ac_objext + lt_globsym_save_LIBS=$LIBS + lt_globsym_save_CFLAGS=$CFLAGS + LIBS=conftstm.$ac_objext + CFLAGS="$CFLAGS$lt_prog_compiler_no_builtin_flag" + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 + (eval $ac_link) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } && test -s conftest$ac_exeext; then + pipe_works=yes + fi + LIBS=$lt_globsym_save_LIBS + CFLAGS=$lt_globsym_save_CFLAGS + else + echo "cannot find nm_test_func in $nlist" >&5 + fi + else + echo "cannot find nm_test_var in $nlist" >&5 + fi + else + echo "cannot run $lt_cv_sys_global_symbol_pipe" >&5 + fi + else + echo "$progname: failed program was:" >&5 + cat conftest.$ac_ext >&5 + fi + rm -rf conftest* conftst* -if test -n "$ac_tool_prefix"; then - for ac_prog in ar - do - # Extract the first word of "$ac_tool_prefix$ac_prog", so it can be a program name with args. -set dummy $ac_tool_prefix$ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_AR+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$AR"; then - ac_cv_prog_AR="$AR" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_AR="$ac_tool_prefix$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 + # Do not use the global_symbol_pipe unless it works. 
+ if test yes = "$pipe_works"; then + break + else + lt_cv_sys_global_symbol_pipe= fi done - done -IFS=$as_save_IFS fi + +if test -z "$lt_cv_sys_global_symbol_pipe"; then + lt_cv_sys_global_symbol_to_cdecl= fi -AR=$ac_cv_prog_AR -if test -n "$AR"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $AR" >&5 -$as_echo "$AR" >&6; } +if test -z "$lt_cv_sys_global_symbol_pipe$lt_cv_sys_global_symbol_to_cdecl"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: failed" >&5 +printf "%s\n" "failed" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ok" >&5 +printf "%s\n" "ok" >&6; } fi - - test -n "$AR" && break - done +# Response file support. +if test "$lt_cv_nm_interface" = "MS dumpbin"; then + nm_file_list_spec='@' +elif $NM --help 2>/dev/null | grep '[@]FILE' >/dev/null; then + nm_file_list_spec='@' fi -if test -z "$AR"; then - ac_ct_AR=$AR - for ac_prog in ar -do - # Extract the first word of "$ac_prog", so it can be a program name with args. -set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_AR+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_AR"; then - ac_cv_prog_ac_ct_AR="$ac_ct_AR" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_AR="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -ac_ct_AR=$ac_cv_prog_ac_ct_AR -if test -n "$ac_ct_AR"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AR" >&5 -$as_echo "$ac_ct_AR" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - test -n "$ac_ct_AR" && break -done - if test "x$ac_ct_AR" = x; then - AR="false" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - AR=$ac_ct_AR - fi -fi -: ${AR=ar} -# Use ARFLAGS variable as AR's operation code to sync the variable naming with -# Automake. If both AR_FLAGS and ARFLAGS are specified, AR_FLAGS should have -# higher priority because thats what people were doing historically (setting -# ARFLAGS for automake and AR_FLAGS for libtool). FIXME: Make the AR_FLAGS -# variable obsoleted/removed. -test ${AR_FLAGS+y} || AR_FLAGS=${ARFLAGS-cr} -lt_ar_flags=$AR_FLAGS -# Make AR_FLAGS overridable by 'make ARFLAGS='. Don't try to run-time override -# by AR_FLAGS because that was never working and AR_FLAGS is about to die. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for archiver @FILE support" >&5 -$as_echo_n "checking for archiver @FILE support... " >&6; } -if ${lt_cv_ar_at_file+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_ar_at_file=no - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - echo conftest.$ac_objext > conftest.lst - lt_ar_try='$AR $AR_FLAGS libconftest.a @conftest.lst >&5' - { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$lt_ar_try\""; } >&5 - (eval $lt_ar_try) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } - if test 0 -eq "$ac_status"; then - # Ensure the archiver fails upon bogus file names. - rm -f conftest.$ac_objext libconftest.a - { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$lt_ar_try\""; } >&5 - (eval $lt_ar_try) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } - if test 0 -ne "$ac_status"; then - lt_cv_ar_at_file=@ - fi - fi - rm -f conftest.* libconftest.a - -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ar_at_file" >&5 -$as_echo "$lt_cv_ar_at_file" >&6; } -if test no = "$lt_cv_ar_at_file"; then - archiver_list_spec= -else - archiver_list_spec=$lt_cv_ar_at_file -fi @@ -8150,250 +7544,112 @@ fi -if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}strip", so it can be a program name with args. -set dummy ${ac_tool_prefix}strip; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_STRIP+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$STRIP"; then - ac_cv_prog_STRIP="$STRIP" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_STRIP="${ac_tool_prefix}strip" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -STRIP=$ac_cv_prog_STRIP -if test -n "$STRIP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $STRIP" >&5 -$as_echo "$STRIP" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for sysroot" >&5 +printf %s "checking for sysroot... " >&6; } +# Check whether --with-sysroot was given. +if test ${with_sysroot+y} +then : + withval=$with_sysroot; +else $as_nop + with_sysroot=no fi -if test -z "$ac_cv_prog_STRIP"; then - ac_ct_STRIP=$STRIP - # Extract the first word of "strip", so it can be a program name with args. -set dummy strip; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_STRIP+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_STRIP"; then - ac_cv_prog_ac_ct_STRIP="$ac_ct_STRIP" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_STRIP="strip" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS -fi -fi -ac_ct_STRIP=$ac_cv_prog_ac_ct_STRIP -if test -n "$ac_ct_STRIP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_STRIP" >&5 -$as_echo "$ac_ct_STRIP" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - if test "x$ac_ct_STRIP" = x; then - STRIP=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; +lt_sysroot= +case $with_sysroot in #( + yes) + if test yes = "$GCC"; then + lt_sysroot=`$CC --print-sysroot 2>/dev/null` + fi + ;; #( + /*) + lt_sysroot=`echo "$with_sysroot" | sed -e "$sed_quote_subst"` + ;; #( + no|'') + ;; #( + *) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $with_sysroot" >&5 +printf "%s\n" "$with_sysroot" >&6; } + as_fn_error $? "The sysroot must be an absolute path." "$LINENO" 5 + ;; esac - STRIP=$ac_ct_STRIP - fi -else - STRIP="$ac_cv_prog_STRIP" -fi - -test -z "$STRIP" && STRIP=: + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ${lt_sysroot:-no}" >&5 +printf "%s\n" "${lt_sysroot:-no}" >&6; } -if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}ranlib", so it can be a program name with args. -set dummy ${ac_tool_prefix}ranlib; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_RANLIB+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$RANLIB"; then - ac_cv_prog_RANLIB="$RANLIB" # Let the user override the test. 
-else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for a working dd" >&5 +printf %s "checking for a working dd... " >&6; } +if test ${ac_cv_path_lt_DD+y} +then : + printf %s "(cached) " >&6 +else $as_nop + printf 0123456789abcdef0123456789abcdef >conftest.i +cat conftest.i conftest.i >conftest2.i +: ${lt_DD:=$DD} +if test -z "$lt_DD"; then + ac_path_lt_DD_found=false + # Loop through the user's path and test for each of PROGNAME-LIST + as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_prog in dd + do for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_RANLIB="${ac_tool_prefix}ranlib" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -RANLIB=$ac_cv_prog_RANLIB -if test -n "$RANLIB"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $RANLIB" >&5 -$as_echo "$RANLIB" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - + ac_path_lt_DD="$as_dir$ac_prog$ac_exec_ext" + as_fn_executable_p "$ac_path_lt_DD" || continue +if "$ac_path_lt_DD" bs=32 count=1 conftest.out 2>/dev/null; then + cmp -s conftest.i conftest.out \ + && ac_cv_path_lt_DD="$ac_path_lt_DD" ac_path_lt_DD_found=: fi -if test -z "$ac_cv_prog_RANLIB"; then - ac_ct_RANLIB=$RANLIB - # Extract the first word of "ranlib", so it can be a program name with args. -set dummy ranlib; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_RANLIB+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_RANLIB"; then - ac_cv_prog_ac_ct_RANLIB="$ac_ct_RANLIB" # Let the user override the test. 
-else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_RANLIB="ranlib" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done + $ac_path_lt_DD_found && break 3 + done + done done IFS=$as_save_IFS - -fi -fi -ac_ct_RANLIB=$ac_cv_prog_ac_ct_RANLIB -if test -n "$ac_ct_RANLIB"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_RANLIB" >&5 -$as_echo "$ac_ct_RANLIB" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_RANLIB" = x; then - RANLIB=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - RANLIB=$ac_ct_RANLIB + if test -z "$ac_cv_path_lt_DD"; then + : fi else - RANLIB="$ac_cv_prog_RANLIB" + ac_cv_path_lt_DD=$lt_DD fi -test -z "$RANLIB" && RANLIB=: - - - - - - -# Determine commands to create old-style static archives. 
-old_archive_cmds='$AR $AR_FLAGS $oldlib$oldobjs' -old_postinstall_cmds='chmod 644 $oldlib' -old_postuninstall_cmds= - -if test -n "$RANLIB"; then - case $host_os in - bitrig* | openbsd*) - old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB -t \$tool_oldlib" - ;; - *) - old_postinstall_cmds="$old_postinstall_cmds~\$RANLIB \$tool_oldlib" - ;; - esac - old_archive_cmds="$old_archive_cmds~\$RANLIB \$tool_oldlib" +rm -f conftest.i conftest2.i conftest.out fi - -case $host_os in - darwin*) - lock_old_archive_extraction=yes ;; - *) - lock_old_archive_extraction=no ;; -esac - - - - - - - - - - - - - - - - - - - - - +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_lt_DD" >&5 +printf "%s\n" "$ac_cv_path_lt_DD" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to truncate binary pipes" >&5 +printf %s "checking how to truncate binary pipes... " >&6; } +if test ${lt_cv_truncate_bin+y} +then : + printf %s "(cached) " >&6 +else $as_nop + printf 0123456789abcdef0123456789abcdef >conftest.i +cat conftest.i conftest.i >conftest2.i +lt_cv_truncate_bin= +if "$ac_cv_path_lt_DD" bs=32 count=1 conftest.out 2>/dev/null; then + cmp -s conftest.i conftest.out \ + && lt_cv_truncate_bin="$ac_cv_path_lt_DD bs=4096 count=1" +fi +rm -f conftest.i conftest2.i conftest.out +test -z "$lt_cv_truncate_bin" && lt_cv_truncate_bin="$SED -e 4q" +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_truncate_bin" >&5 +printf "%s\n" "$lt_cv_truncate_bin" >&6; } @@ -8401,4089 +7657,926 @@ esac +# Calculate cc_basename. Skip known compiler wrappers and cross-prefix. +func_cc_basename () +{ + for cc_temp in $*""; do + case $cc_temp in + compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; + distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; + \-*) ;; + *) break;; + esac + done + func_cc_basename_result=`$ECHO "$cc_temp" | $SED "s%.*/%%; s%^$host_alias-%%"` +} +# Check whether --enable-libtool-lock was given. 
+if test ${enable_libtool_lock+y} +then : + enableval=$enable_libtool_lock; +fi +test no = "$enable_libtool_lock" || enable_libtool_lock=yes - - - - - - -# If no C compiler was specified, use CC. -LTCC=${LTCC-"$CC"} - -# If no C compiler flags were specified, use CFLAGS. -LTCFLAGS=${LTCFLAGS-"$CFLAGS"} - -# Allow CC to be a program name with arguments. -compiler=$CC - - -# Check for command to grab the raw symbol name followed by C symbol from nm. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking command to parse $NM output from $compiler object" >&5 -$as_echo_n "checking command to parse $NM output from $compiler object... " >&6; } -if ${lt_cv_sys_global_symbol_pipe+:} false; then : - $as_echo_n "(cached) " >&6 -else - -# These are sane defaults that work on at least a few old systems. -# [They come from Ultrix. What could be older than Ultrix?!! ;)] - -# Character class describing NM global symbol codes. -symcode='[BCDEGRST]' - -# Regexp to match symbols that can be accessed directly from C. -sympat='\([_A-Za-z][_A-Za-z0-9]*\)' - -# Define system-specific variables. -case $host_os in -aix*) - symcode='[BCDT]' - ;; -cygwin* | mingw* | pw32* | cegcc*) - symcode='[ABCDGISTW]' - ;; -hpux*) - if test ia64 = "$host_cpu"; then - symcode='[ABCDEGRST]' - fi - ;; -irix* | nonstopux*) - symcode='[BCDEGRST]' - ;; -osf*) - symcode='[BCDEGQRST]' - ;; -solaris*) - symcode='[BDRT]' - ;; -sco3.2v5*) - symcode='[DT]' - ;; -sysv4.2uw2*) - symcode='[DT]' - ;; -sysv5* | sco5v6* | unixware* | OpenUNIX*) - symcode='[ABDT]' - ;; -sysv4) - symcode='[DFNSTU]' - ;; -esac - -# If we're using GNU nm, then use its standard symbol codes. -case `$NM -V 2>&1` in -*GNU* | *'with BFD'*) - symcode='[ABCDGIRSTW]' ;; -esac - -if test "$lt_cv_nm_interface" = "MS dumpbin"; then - # Gets list of data symbols to import. - lt_cv_sys_global_symbol_to_import="sed -n -e 's/^I .* \(.*\)$/\1/p'" - # Adjust the below global symbol transforms to fixup imported variables. 
- lt_cdecl_hook=" -e 's/^I .* \(.*\)$/extern __declspec(dllimport) char \1;/p'" - lt_c_name_hook=" -e 's/^I .* \(.*\)$/ {\"\1\", (void *) 0},/p'" - lt_c_name_lib_hook="\ - -e 's/^I .* \(lib.*\)$/ {\"\1\", (void *) 0},/p'\ - -e 's/^I .* \(.*\)$/ {\"lib\1\", (void *) 0},/p'" -else - # Disable hooks by default. - lt_cv_sys_global_symbol_to_import= - lt_cdecl_hook= - lt_c_name_hook= - lt_c_name_lib_hook= -fi - -# Transform an extracted symbol line into a proper C declaration. -# Some systems (esp. on ia64) link data and code symbols differently, -# so use this general approach. -lt_cv_sys_global_symbol_to_cdecl="sed -n"\ -$lt_cdecl_hook\ -" -e 's/^T .* \(.*\)$/extern int \1();/p'"\ -" -e 's/^$symcode$symcode* .* \(.*\)$/extern char \1;/p'" - -# Transform an extracted symbol line into symbol name and symbol address -lt_cv_sys_global_symbol_to_c_name_address="sed -n"\ -$lt_c_name_hook\ -" -e 's/^: \(.*\) .*$/ {\"\1\", (void *) 0},/p'"\ -" -e 's/^$symcode$symcode* .* \(.*\)$/ {\"\1\", (void *) \&\1},/p'" - -# Transform an extracted symbol line into symbol name with lib prefix and -# symbol address. -lt_cv_sys_global_symbol_to_c_name_address_lib_prefix="sed -n"\ -$lt_c_name_lib_hook\ -" -e 's/^: \(.*\) .*$/ {\"\1\", (void *) 0},/p'"\ -" -e 's/^$symcode$symcode* .* \(lib.*\)$/ {\"\1\", (void *) \&\1},/p'"\ -" -e 's/^$symcode$symcode* .* \(.*\)$/ {\"lib\1\", (void *) \&\1},/p'" - -# Handle CRLF in mingw tool chain -opt_cr= -case $build_os in -mingw*) - opt_cr=`$ECHO 'x\{0,1\}' | tr x '\015'` # option cr in regexp - ;; -esac - -# Try without a prefix underscore, then with it. -for ac_symprfx in "" "_"; do - - # Transform symcode, sympat, and symprfx into a raw symbol and a C symbol. - symxfrm="\\1 $ac_symprfx\\2 \\2" - - # Write the raw and C identifiers. - if test "$lt_cv_nm_interface" = "MS dumpbin"; then - # Fake it for dumpbin and say T for any non-static function, - # D for any global variable and I for any imported variable. 
- # Also find C++ and __fastcall symbols from MSVC++ or ICC, - # which start with @ or ?. - lt_cv_sys_global_symbol_pipe="$AWK '"\ -" {last_section=section; section=\$ 3};"\ -" /^COFF SYMBOL TABLE/{for(i in hide) delete hide[i]};"\ -" /Section length .*#relocs.*(pick any)/{hide[last_section]=1};"\ -" /^ *Symbol name *: /{split(\$ 0,sn,\":\"); si=substr(sn[2],2)};"\ -" /^ *Type *: code/{print \"T\",si,substr(si,length(prfx))};"\ -" /^ *Type *: data/{print \"I\",si,substr(si,length(prfx))};"\ -" \$ 0!~/External *\|/{next};"\ -" / 0+ UNDEF /{next}; / UNDEF \([^|]\)*()/{next};"\ -" {if(hide[section]) next};"\ -" {f=\"D\"}; \$ 0~/\(\).*\|/{f=\"T\"};"\ -" {split(\$ 0,a,/\||\r/); split(a[2],s)};"\ -" s[1]~/^[@?]/{print f,s[1],s[1]; next};"\ -" s[1]~prfx {split(s[1],t,\"@\"); print f,t[1],substr(t[1],length(prfx))}"\ -" ' prfx=^$ac_symprfx" - else - lt_cv_sys_global_symbol_pipe="sed -n -e 's/^.*[ ]\($symcode$symcode*\)[ ][ ]*$ac_symprfx$sympat$opt_cr$/$symxfrm/p'" - fi - lt_cv_sys_global_symbol_pipe="$lt_cv_sys_global_symbol_pipe | sed '/ __gnu_lto/d'" - - # Check to see that the pipe works correctly. - pipe_works=no - - rm -f conftest* - cat > conftest.$ac_ext <<_LT_EOF -#ifdef __cplusplus -extern "C" { -#endif -char nm_test_var; -void nm_test_func(void); -void nm_test_func(void){} -#ifdef __cplusplus -} -#endif -int main(){nm_test_var='a';nm_test_func();return(0);} -_LT_EOF - - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - # Now try to grab the symbols. - nlist=conftest.nm - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist\""; } >&5 - (eval $NM conftest.$ac_objext \| "$lt_cv_sys_global_symbol_pipe" \> $nlist) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 - test $ac_status = 0; } && test -s "$nlist"; then - # Try sorting and uniquifying the output. - if sort "$nlist" | uniq > "$nlist"T; then - mv -f "$nlist"T "$nlist" - else - rm -f "$nlist"T - fi - - # Make sure that we snagged all the symbols we need. - if $GREP ' nm_test_var$' "$nlist" >/dev/null; then - if $GREP ' nm_test_func$' "$nlist" >/dev/null; then - cat <<_LT_EOF > conftest.$ac_ext -/* Keep this code in sync between libtool.m4, ltmain, lt_system.h, and tests. */ -#if defined _WIN32 || defined __CYGWIN__ || defined _WIN32_WCE -/* DATA imports from DLLs on WIN32 can't be const, because runtime - relocations are performed -- see ld's documentation on pseudo-relocs. */ -# define LT_DLSYM_CONST -#elif defined __osf__ -/* This system does not cope well with relocations in const data. */ -# define LT_DLSYM_CONST -#else -# define LT_DLSYM_CONST const -#endif - -#ifdef __cplusplus -extern "C" { -#endif - -_LT_EOF - # Now generate the symbol file. - eval "$lt_cv_sys_global_symbol_to_cdecl"' < "$nlist" | $GREP -v main >> conftest.$ac_ext' - - cat <<_LT_EOF >> conftest.$ac_ext - -/* The mapping between symbol names and symbols. */ -LT_DLSYM_CONST struct { - const char *name; - void *address; -} -lt__PROGRAM__LTX_preloaded_symbols[] = -{ - { "@PROGRAM@", (void *) 0 }, -_LT_EOF - $SED "s/^$symcode$symcode* .* \(.*\)$/ {\"\1\", (void *) \&\1},/" < "$nlist" | $GREP -v main >> conftest.$ac_ext - cat <<\_LT_EOF >> conftest.$ac_ext - {0, (void *) 0} -}; - -/* This works around a problem in FreeBSD linker */ -#ifdef FREEBSD_WORKAROUND -static const void *lt_preloaded_setup() { - return lt__PROGRAM__LTX_preloaded_symbols; -} -#endif - -#ifdef __cplusplus -} -#endif -_LT_EOF - # Now try linking the two files. 
- mv conftest.$ac_objext conftstm.$ac_objext - lt_globsym_save_LIBS=$LIBS - lt_globsym_save_CFLAGS=$CFLAGS - LIBS=conftstm.$ac_objext - CFLAGS="$CFLAGS$lt_prog_compiler_no_builtin_flag" - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 - (eval $ac_link) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } && test -s conftest$ac_exeext; then - pipe_works=yes - fi - LIBS=$lt_globsym_save_LIBS - CFLAGS=$lt_globsym_save_CFLAGS - else - echo "cannot find nm_test_func in $nlist" >&5 - fi - else - echo "cannot find nm_test_var in $nlist" >&5 - fi - else - echo "cannot run $lt_cv_sys_global_symbol_pipe" >&5 - fi - else - echo "$progname: failed program was:" >&5 - cat conftest.$ac_ext >&5 - fi - rm -rf conftest* conftst* - - # Do not use the global_symbol_pipe unless it works. - if test yes = "$pipe_works"; then - break - else - lt_cv_sys_global_symbol_pipe= - fi -done - -fi - -if test -z "$lt_cv_sys_global_symbol_pipe"; then - lt_cv_sys_global_symbol_to_cdecl= -fi -if test -z "$lt_cv_sys_global_symbol_pipe$lt_cv_sys_global_symbol_to_cdecl"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: failed" >&5 -$as_echo "failed" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok" >&5 -$as_echo "ok" >&6; } -fi - -# Response file support. -if test "$lt_cv_nm_interface" = "MS dumpbin"; then - nm_file_list_spec='@' -elif $NM --help 2>/dev/null | grep '[@]FILE' >/dev/null; then - nm_file_list_spec='@' -fi - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for sysroot" >&5 -$as_echo_n "checking for sysroot... " >&6; } - -# Check whether --with-sysroot was given. 
-if test "${with_sysroot+set}" = set; then : - withval=$with_sysroot; -else - with_sysroot=no -fi - - -lt_sysroot= -case $with_sysroot in #( - yes) - if test yes = "$GCC"; then - lt_sysroot=`$CC --print-sysroot 2>/dev/null` - fi - ;; #( - /*) - lt_sysroot=`echo "$with_sysroot" | sed -e "$sed_quote_subst"` - ;; #( - no|'') - ;; #( - *) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_sysroot" >&5 -$as_echo "$with_sysroot" >&6; } - as_fn_error $? "The sysroot must be an absolute path." "$LINENO" 5 - ;; -esac - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${lt_sysroot:-no}" >&5 -$as_echo "${lt_sysroot:-no}" >&6; } - - - - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for a working dd" >&5 -$as_echo_n "checking for a working dd... " >&6; } -if ${ac_cv_path_lt_DD+:} false; then : - $as_echo_n "(cached) " >&6 -else - printf 0123456789abcdef0123456789abcdef >conftest.i -cat conftest.i conftest.i >conftest2.i -: ${lt_DD:=$DD} -if test -z "$lt_DD"; then - ac_path_lt_DD_found=false - # Loop through the user's path and test for each of PROGNAME-LIST - as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_prog in dd; do - for ac_exec_ext in '' $ac_executable_extensions; do - ac_path_lt_DD="$as_dir/$ac_prog$ac_exec_ext" - as_fn_executable_p "$ac_path_lt_DD" || continue -if "$ac_path_lt_DD" bs=32 count=1 <conftest2.i >conftest.out 2>/dev/null; then - cmp -s conftest.i conftest.out \ - && ac_cv_path_lt_DD="$ac_path_lt_DD" ac_path_lt_DD_found=: -fi - $ac_path_lt_DD_found && break 3 - done - done - done -IFS=$as_save_IFS - if test -z "$ac_cv_path_lt_DD"; then - : - fi -else - ac_cv_path_lt_DD=$lt_DD -fi - -rm -f conftest.i conftest2.i conftest.out -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_path_lt_DD" >&5 -$as_echo "$ac_cv_path_lt_DD" >&6; } - - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to truncate binary pipes" >&5 -$as_echo_n "checking how to truncate binary pipes... 
" >&6; } -if ${lt_cv_truncate_bin+:} false; then : - $as_echo_n "(cached) " >&6 -else - printf 0123456789abcdef0123456789abcdef >conftest.i -cat conftest.i conftest.i >conftest2.i -lt_cv_truncate_bin= -if "$ac_cv_path_lt_DD" bs=32 count=1 conftest.out 2>/dev/null; then - cmp -s conftest.i conftest.out \ - && lt_cv_truncate_bin="$ac_cv_path_lt_DD bs=4096 count=1" -fi -rm -f conftest.i conftest2.i conftest.out -test -z "$lt_cv_truncate_bin" && lt_cv_truncate_bin="$SED -e 4q" -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_truncate_bin" >&5 -$as_echo "$lt_cv_truncate_bin" >&6; } - - - - - - - -# Calculate cc_basename. Skip known compiler wrappers and cross-prefix. -func_cc_basename () -{ - for cc_temp in $*""; do - case $cc_temp in - compile | *[\\/]compile | ccache | *[\\/]ccache ) ;; - distcc | *[\\/]distcc | purify | *[\\/]purify ) ;; - \-*) ;; - *) break;; - esac - done - func_cc_basename_result=`$ECHO "$cc_temp" | $SED "s%.*/%%; s%^$host_alias-%%"` -} - -# Check whether --enable-libtool-lock was given. -if test "${enable_libtool_lock+set}" = set; then : - enableval=$enable_libtool_lock; -fi - -test no = "$enable_libtool_lock" || enable_libtool_lock=yes - -# Some flags need to be propagated to the compiler or linker for good -# libtool support. -case $host in -ia64-*-hpux*) - # Find out what ABI is being produced by ac_compile, and set mode - # options accordingly. - echo 'int i;' > conftest.$ac_ext - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - case `/usr/bin/file conftest.$ac_objext` in - *ELF-32*) - HPUX_IA64_MODE=32 - ;; - *ELF-64*) - HPUX_IA64_MODE=64 - ;; - esac - fi - rm -rf conftest* - ;; -*-*-irix6*) - # Find out what ABI is being produced by ac_compile, and set linker - # options accordingly. 
- echo '#line '$LINENO' "configure"' > conftest.$ac_ext - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - if test yes = "$lt_cv_prog_gnu_ld"; then - case `/usr/bin/file conftest.$ac_objext` in - *32-bit*) - LD="${LD-ld} -melf32bsmip" - ;; - *N32*) - LD="${LD-ld} -melf32bmipn32" - ;; - *64-bit*) - LD="${LD-ld} -melf64bmip" - ;; - esac - else - case `/usr/bin/file conftest.$ac_objext` in - *32-bit*) - LD="${LD-ld} -32" - ;; - *N32*) - LD="${LD-ld} -n32" - ;; - *64-bit*) - LD="${LD-ld} -64" - ;; - esac - fi - fi - rm -rf conftest* - ;; - -mips64*-*linux*) - # Find out what ABI is being produced by ac_compile, and set linker - # options accordingly. - echo '#line '$LINENO' "configure"' > conftest.$ac_ext - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - emul=elf - case `/usr/bin/file conftest.$ac_objext` in - *32-bit*) - emul="${emul}32" - ;; - *64-bit*) - emul="${emul}64" - ;; - esac - case `/usr/bin/file conftest.$ac_objext` in - *MSB*) - emul="${emul}btsmip" - ;; - *LSB*) - emul="${emul}ltsmip" - ;; - esac - case `/usr/bin/file conftest.$ac_objext` in - *N32*) - emul="${emul}n32" - ;; - esac - LD="${LD-ld} -m $emul" - fi - rm -rf conftest* - ;; - -x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \ -s390*-*linux*|s390*-*tpf*|sparc*-*linux*) - # Find out what ABI is being produced by ac_compile, and set linker - # options accordingly. Note that the listed cases only cover the - # situations where additional linker options are needed (such as when - # doing 32-bit compilation for a host where ld defaults to 64-bit, or - # vice versa); the common cases where no linker options are needed do - # not appear in the list. 
- echo 'int i;' > conftest.$ac_ext - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - case `/usr/bin/file conftest.o` in - *32-bit*) - case $host in - x86_64-*kfreebsd*-gnu) - LD="${LD-ld} -m elf_i386_fbsd" - ;; - x86_64-*linux*) - case `/usr/bin/file conftest.o` in - *x86-64*) - LD="${LD-ld} -m elf32_x86_64" - ;; - *) - LD="${LD-ld} -m elf_i386" - ;; - esac - ;; - powerpc64le-*linux*) - LD="${LD-ld} -m elf32lppclinux" - ;; - powerpc64-*linux*) - LD="${LD-ld} -m elf32ppclinux" - ;; - s390x-*linux*) - LD="${LD-ld} -m elf_s390" - ;; - sparc64-*linux*) - LD="${LD-ld} -m elf32_sparc" - ;; - esac - ;; - *64-bit*) - case $host in - x86_64-*kfreebsd*-gnu) - LD="${LD-ld} -m elf_x86_64_fbsd" - ;; - x86_64-*linux*) - LD="${LD-ld} -m elf_x86_64" - ;; - powerpcle-*linux*) - LD="${LD-ld} -m elf64lppc" - ;; - powerpc-*linux*) - LD="${LD-ld} -m elf64ppc" - ;; - s390*-*linux*|s390*-*tpf*) - LD="${LD-ld} -m elf64_s390" - ;; - sparc*-*linux*) - LD="${LD-ld} -m elf64_sparc" - ;; - esac - ;; - esac - fi - rm -rf conftest* - ;; - -*-*-sco3.2v5*) - # On SCO OpenServer 5, we need -belf to get full-featured binaries. - SAVE_CFLAGS=$CFLAGS - CFLAGS="$CFLAGS -belf" - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the C compiler needs -belf" >&5 -$as_echo_n "checking whether the C compiler needs -belf... " >&6; } -if ${lt_cv_cc_needs_belf+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - lt_cv_cc_needs_belf=yes -else - lt_cv_cc_needs_belf=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext - ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_cc_needs_belf" >&5 -$as_echo "$lt_cv_cc_needs_belf" >&6; } - if test yes != "$lt_cv_cc_needs_belf"; then - # this is probably gcc 2.8.0, egcs 1.0 or newer; no need for -belf - CFLAGS=$SAVE_CFLAGS - fi - ;; -*-*solaris*) - # Find out what ABI is being produced by ac_compile, and set linker - # options accordingly. - echo 'int i;' > conftest.$ac_ext - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - case `/usr/bin/file conftest.o` in - *64-bit*) - case $lt_cv_prog_gnu_ld in - yes*) - case $host in - i?86-*-solaris*|x86_64-*-solaris*) - LD="${LD-ld} -m elf_x86_64" - ;; - sparc*-*-solaris*) - LD="${LD-ld} -m elf64_sparc" - ;; - esac - # GNU ld 2.21 introduced _sol2 emulations. Use them if available. - if ${LD-ld} -V | grep _sol2 >/dev/null 2>&1; then - LD=${LD-ld}_sol2 - fi - ;; - *) - if ${LD-ld} -64 -r -o conftest2.o conftest.o >/dev/null 2>&1; then - LD="${LD-ld} -64" - fi - ;; - esac - ;; - esac - fi - rm -rf conftest* - ;; -esac - -need_locks=$enable_libtool_lock - -if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}mt", so it can be a program name with args. -set dummy ${ac_tool_prefix}mt; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... 
" >&6; } -if ${ac_cv_prog_MANIFEST_TOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$MANIFEST_TOOL"; then - ac_cv_prog_MANIFEST_TOOL="$MANIFEST_TOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_MANIFEST_TOOL="${ac_tool_prefix}mt" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -MANIFEST_TOOL=$ac_cv_prog_MANIFEST_TOOL -if test -n "$MANIFEST_TOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $MANIFEST_TOOL" >&5 -$as_echo "$MANIFEST_TOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - -fi -if test -z "$ac_cv_prog_MANIFEST_TOOL"; then - ac_ct_MANIFEST_TOOL=$MANIFEST_TOOL - # Extract the first word of "mt", so it can be a program name with args. -set dummy mt; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_MANIFEST_TOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_MANIFEST_TOOL"; then - ac_cv_prog_ac_ct_MANIFEST_TOOL="$ac_ct_MANIFEST_TOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_MANIFEST_TOOL="mt" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -ac_ct_MANIFEST_TOOL=$ac_cv_prog_ac_ct_MANIFEST_TOOL -if test -n "$ac_ct_MANIFEST_TOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_MANIFEST_TOOL" >&5 -$as_echo "$ac_ct_MANIFEST_TOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_MANIFEST_TOOL" = x; then - MANIFEST_TOOL=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - MANIFEST_TOOL=$ac_ct_MANIFEST_TOOL - fi -else - MANIFEST_TOOL="$ac_cv_prog_MANIFEST_TOOL" -fi - -test -z "$MANIFEST_TOOL" && MANIFEST_TOOL=mt -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking if $MANIFEST_TOOL is a manifest tool" >&5 -$as_echo_n "checking if $MANIFEST_TOOL is a manifest tool... " >&6; } -if ${lt_cv_path_mainfest_tool+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_path_mainfest_tool=no - echo "$as_me:$LINENO: $MANIFEST_TOOL '-?'" >&5 - $MANIFEST_TOOL '-?' 2>conftest.err > conftest.out - cat conftest.err >&5 - if $GREP 'Manifest Tool' conftest.out > /dev/null; then - lt_cv_path_mainfest_tool=yes - fi - rm -f conftest* -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_mainfest_tool" >&5 -$as_echo "$lt_cv_path_mainfest_tool" >&6; } -if test yes != "$lt_cv_path_mainfest_tool"; then - MANIFEST_TOOL=: -fi - - - - - - - case $host_os in - rhapsody* | darwin*) - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}dsymutil", so it can be a program name with args. 
-set dummy ${ac_tool_prefix}dsymutil; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_DSYMUTIL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$DSYMUTIL"; then - ac_cv_prog_DSYMUTIL="$DSYMUTIL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_DSYMUTIL="${ac_tool_prefix}dsymutil" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -DSYMUTIL=$ac_cv_prog_DSYMUTIL -if test -n "$DSYMUTIL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $DSYMUTIL" >&5 -$as_echo "$DSYMUTIL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - -fi -if test -z "$ac_cv_prog_DSYMUTIL"; then - ac_ct_DSYMUTIL=$DSYMUTIL - # Extract the first word of "dsymutil", so it can be a program name with args. -set dummy dsymutil; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_DSYMUTIL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_DSYMUTIL"; then - ac_cv_prog_ac_ct_DSYMUTIL="$ac_ct_DSYMUTIL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_DSYMUTIL="dsymutil" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -ac_ct_DSYMUTIL=$ac_cv_prog_ac_ct_DSYMUTIL -if test -n "$ac_ct_DSYMUTIL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DSYMUTIL" >&5 -$as_echo "$ac_ct_DSYMUTIL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_DSYMUTIL" = x; then - DSYMUTIL=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - DSYMUTIL=$ac_ct_DSYMUTIL - fi -else - DSYMUTIL="$ac_cv_prog_DSYMUTIL" -fi - - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}nmedit", so it can be a program name with args. -set dummy ${ac_tool_prefix}nmedit; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_NMEDIT+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$NMEDIT"; then - ac_cv_prog_NMEDIT="$NMEDIT" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_NMEDIT="${ac_tool_prefix}nmedit" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -NMEDIT=$ac_cv_prog_NMEDIT -if test -n "$NMEDIT"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $NMEDIT" >&5 -$as_echo "$NMEDIT" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - -fi -if test -z "$ac_cv_prog_NMEDIT"; then - ac_ct_NMEDIT=$NMEDIT - # Extract the first word of "nmedit", so it can be a program name with args. -set dummy nmedit; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_NMEDIT+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_NMEDIT"; then - ac_cv_prog_ac_ct_NMEDIT="$ac_ct_NMEDIT" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_NMEDIT="nmedit" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -ac_ct_NMEDIT=$ac_cv_prog_ac_ct_NMEDIT -if test -n "$ac_ct_NMEDIT"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_NMEDIT" >&5 -$as_echo "$ac_ct_NMEDIT" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_NMEDIT" = x; then - NMEDIT=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - NMEDIT=$ac_ct_NMEDIT - fi -else - NMEDIT="$ac_cv_prog_NMEDIT" -fi - - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}lipo", so it can be a program name with args. -set dummy ${ac_tool_prefix}lipo; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_LIPO+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$LIPO"; then - ac_cv_prog_LIPO="$LIPO" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_LIPO="${ac_tool_prefix}lipo" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -LIPO=$ac_cv_prog_LIPO -if test -n "$LIPO"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $LIPO" >&5 -$as_echo "$LIPO" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - -fi -if test -z "$ac_cv_prog_LIPO"; then - ac_ct_LIPO=$LIPO - # Extract the first word of "lipo", so it can be a program name with args. -set dummy lipo; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_LIPO+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_LIPO"; then - ac_cv_prog_ac_ct_LIPO="$ac_ct_LIPO" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_LIPO="lipo" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -ac_ct_LIPO=$ac_cv_prog_ac_ct_LIPO -if test -n "$ac_ct_LIPO"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_LIPO" >&5 -$as_echo "$ac_ct_LIPO" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_LIPO" = x; then - LIPO=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - LIPO=$ac_ct_LIPO - fi -else - LIPO="$ac_cv_prog_LIPO" -fi - - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}otool", so it can be a program name with args. -set dummy ${ac_tool_prefix}otool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_OTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$OTOOL"; then - ac_cv_prog_OTOOL="$OTOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_OTOOL="${ac_tool_prefix}otool" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -OTOOL=$ac_cv_prog_OTOOL -if test -n "$OTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OTOOL" >&5 -$as_echo "$OTOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - -fi -if test -z "$ac_cv_prog_OTOOL"; then - ac_ct_OTOOL=$OTOOL - # Extract the first word of "otool", so it can be a program name with args. -set dummy otool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_OTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_OTOOL"; then - ac_cv_prog_ac_ct_OTOOL="$ac_ct_OTOOL" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_OTOOL="otool" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -ac_ct_OTOOL=$ac_cv_prog_ac_ct_OTOOL -if test -n "$ac_ct_OTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OTOOL" >&5 -$as_echo "$ac_ct_OTOOL" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_OTOOL" = x; then - OTOOL=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - OTOOL=$ac_ct_OTOOL - fi -else - OTOOL="$ac_cv_prog_OTOOL" -fi - - if test -n "$ac_tool_prefix"; then - # Extract the first word of "${ac_tool_prefix}otool64", so it can be a program name with args. -set dummy ${ac_tool_prefix}otool64; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_OTOOL64+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$OTOOL64"; then - ac_cv_prog_OTOOL64="$OTOOL64" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_OTOOL64="${ac_tool_prefix}otool64" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -OTOOL64=$ac_cv_prog_OTOOL64 -if test -n "$OTOOL64"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $OTOOL64" >&5 -$as_echo "$OTOOL64" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - -fi -if test -z "$ac_cv_prog_OTOOL64"; then - ac_ct_OTOOL64=$OTOOL64 - # Extract the first word of "otool64", so it can be a program name with args. -set dummy otool64; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ac_ct_OTOOL64+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -n "$ac_ct_OTOOL64"; then - ac_cv_prog_ac_ct_OTOOL64="$ac_ct_OTOOL64" # Let the user override the test. -else -as_save_IFS=$IFS; IFS=$PATH_SEPARATOR -for as_dir in $PATH -do - IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
- for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_prog_ac_ct_OTOOL64="otool64" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 - break 2 - fi -done - done -IFS=$as_save_IFS - -fi -fi -ac_ct_OTOOL64=$ac_cv_prog_ac_ct_OTOOL64 -if test -n "$ac_ct_OTOOL64"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OTOOL64" >&5 -$as_echo "$ac_ct_OTOOL64" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - if test "x$ac_ct_OTOOL64" = x; then - OTOOL64=":" - else - case $cross_compiling:$ac_tool_warned in -yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} -ac_tool_warned=yes ;; -esac - OTOOL64=$ac_ct_OTOOL64 - fi -else - OTOOL64="$ac_cv_prog_OTOOL64" -fi - - - - - - - - - - - - - - - - - - - - - - - - - - - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for -single_module linker flag" >&5 -$as_echo_n "checking for -single_module linker flag... " >&6; } -if ${lt_cv_apple_cc_single_mod+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_apple_cc_single_mod=no - if test -z "$LT_MULTI_MODULE"; then - # By default we will add the -single_module flag. You can override - # by either setting the environment variable LT_MULTI_MODULE - # non-empty at configure time, or by adding -multi_module to the - # link flags. - rm -rf libconftest.dylib* - echo "int foo(void){return 1;}" > conftest.c - echo "$LTCC $LTCFLAGS $LDFLAGS -o libconftest.dylib \ --dynamiclib -Wl,-single_module conftest.c" >&5 - $LTCC $LTCFLAGS $LDFLAGS -o libconftest.dylib \ - -dynamiclib -Wl,-single_module conftest.c 2>conftest.err - _lt_result=$? 
- # If there is a non-empty error log, and "single_module" - # appears in it, assume the flag caused a linker warning - if test -s conftest.err && $GREP single_module conftest.err; then - cat conftest.err >&5 - # Otherwise, if the output was created with a 0 exit code from - # the compiler, it worked. - elif test -f libconftest.dylib && test 0 = "$_lt_result"; then - lt_cv_apple_cc_single_mod=yes - else - cat conftest.err >&5 - fi - rm -rf libconftest.dylib* - rm -f conftest.* - fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_apple_cc_single_mod" >&5 -$as_echo "$lt_cv_apple_cc_single_mod" >&6; } - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for -exported_symbols_list linker flag" >&5 -$as_echo_n "checking for -exported_symbols_list linker flag... " >&6; } -if ${lt_cv_ld_exported_symbols_list+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_ld_exported_symbols_list=no - save_LDFLAGS=$LDFLAGS - echo "_main" > conftest.sym - LDFLAGS="$LDFLAGS -Wl,-exported_symbols_list,conftest.sym" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - lt_cv_ld_exported_symbols_list=yes -else - lt_cv_ld_exported_symbols_list=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext - LDFLAGS=$save_LDFLAGS - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_exported_symbols_list" >&5 -$as_echo "$lt_cv_ld_exported_symbols_list" >&6; } - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for -force_load linker flag" >&5 -$as_echo_n "checking for -force_load linker flag... 
" >&6; } -if ${lt_cv_ld_force_load+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_ld_force_load=no - cat > conftest.c << _LT_EOF -int forced_loaded() { return 2;} -_LT_EOF - echo "$LTCC $LTCFLAGS -c -o conftest.o conftest.c" >&5 - $LTCC $LTCFLAGS -c -o conftest.o conftest.c 2>&5 - echo "$AR $AR_FLAGS libconftest.a conftest.o" >&5 - $AR $AR_FLAGS libconftest.a conftest.o 2>&5 - echo "$RANLIB libconftest.a" >&5 - $RANLIB libconftest.a 2>&5 - cat > conftest.c << _LT_EOF -int main() { return 0;} -_LT_EOF - echo "$LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a" >&5 - $LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a 2>conftest.err - _lt_result=$? - if test -s conftest.err && $GREP force_load conftest.err; then - cat conftest.err >&5 - elif test -f conftest && test 0 = "$_lt_result" && $GREP forced_load conftest >/dev/null 2>&1; then - lt_cv_ld_force_load=yes - else - cat conftest.err >&5 - fi - rm -f conftest.err libconftest.a conftest conftest.c - rm -rf conftest.dSYM - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_force_load" >&5 -$as_echo "$lt_cv_ld_force_load" >&6; } - case $host_os in - rhapsody* | darwin1.[012]) - _lt_dar_allow_undefined='$wl-undefined ${wl}suppress' ;; - darwin1.*) - _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;; - darwin*) # darwin 5.x on - # if running on 10.5 or later, the deployment target defaults - # to the OS version, if on x86, and 10.4, the deployment - # target defaults to 10.4. Don't you love it? 
- case ${MACOSX_DEPLOYMENT_TARGET-10.0},$host in - 10.0,*86*-darwin8*|10.0,*-darwin[91]*) - _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;; - 10.[012][,.]*) - _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;; - 10.*) - _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;; - esac - ;; - esac - if test yes = "$lt_cv_apple_cc_single_mod"; then - _lt_dar_single_mod='$single_module' - fi - if test yes = "$lt_cv_ld_exported_symbols_list"; then - _lt_dar_export_syms=' $wl-exported_symbols_list,$output_objdir/$libname-symbols.expsym' - else - _lt_dar_export_syms='~$NMEDIT -s $output_objdir/$libname-symbols.expsym $lib' - fi - if test : != "$DSYMUTIL" && test no = "$lt_cv_ld_force_load"; then - _lt_dsymutil='~$DSYMUTIL $lib || :' - else - _lt_dsymutil= - fi - ;; - esac - -# func_munge_path_list VARIABLE PATH -# ----------------------------------- -# VARIABLE is name of variable containing _space_ separated list of -# directories to be munged by the contents of PATH, which is string -# having a format: -# "DIR[:DIR]:" -# string "DIR[ DIR]" will be prepended to VARIABLE -# ":DIR[:DIR]" -# string "DIR[ DIR]" will be appended to VARIABLE -# "DIRP[:DIRP]::[DIRA:]DIRA" -# string "DIRP[ DIRP]" will be prepended to VARIABLE and string -# "DIRA[ DIRA]" will be appended to VARIABLE -# "DIR[:DIR]" -# VARIABLE will be replaced by "DIR[ DIR]" -func_munge_path_list () -{ - case x$2 in - x) - ;; - *:) - eval $1=\"`$ECHO $2 | $SED 's/:/ /g'` \$$1\" - ;; - x:*) - eval $1=\"\$$1 `$ECHO $2 | $SED 's/:/ /g'`\" - ;; - *::*) - eval $1=\"\$$1\ `$ECHO $2 | $SED -e 's/.*:://' -e 's/:/ /g'`\" - eval $1=\"`$ECHO $2 | $SED -e 's/::.*//' -e 's/:/ /g'`\ \$$1\" - ;; - *) - eval $1=\"`$ECHO $2 | $SED 's/:/ /g'`\" - ;; - esac -} - -for ac_header in dlfcn.h -do : - ac_fn_c_check_header_compile "$LINENO" "dlfcn.h" "ac_cv_header_dlfcn_h" "$ac_includes_default -" -if test "x$ac_cv_header_dlfcn_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_DLFCN_H 
1 -_ACEOF - -fi - -done - - - -func_stripname_cnf () -{ - case $2 in - .*) func_stripname_result=`$ECHO "$3" | $SED "s%^$1%%; s%\\\\$2\$%%"`;; - *) func_stripname_result=`$ECHO "$3" | $SED "s%^$1%%; s%$2\$%%"`;; - esac -} # func_stripname_cnf - - - - - -# Set options - - - - enable_dlopen=no - - - - # Check whether --enable-shared was given. -if test "${enable_shared+set}" = set; then : - enableval=$enable_shared; p=${PACKAGE-default} - case $enableval in - yes) enable_shared=yes ;; - no) enable_shared=no ;; - *) - enable_shared=no - # Look at the argument we got. We use all the common list separators. - lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, - for pkg in $enableval; do - IFS=$lt_save_ifs - if test "X$pkg" = "X$p"; then - enable_shared=yes - fi - done - IFS=$lt_save_ifs - ;; - esac -else - enable_shared=yes -fi - - - - - - - - - - # Check whether --enable-static was given. -if test "${enable_static+set}" = set; then : - enableval=$enable_static; p=${PACKAGE-default} - case $enableval in - yes) enable_static=yes ;; - no) enable_static=no ;; - *) - enable_static=no - # Look at the argument we got. We use all the common list separators. - lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, - for pkg in $enableval; do - IFS=$lt_save_ifs - if test "X$pkg" = "X$p"; then - enable_static=yes - fi - done - IFS=$lt_save_ifs - ;; - esac -else - enable_static=yes -fi - - - - - - - - - - -# Check whether --with-pic was given. -if test "${with_pic+set}" = set; then : - withval=$with_pic; lt_p=${PACKAGE-default} - case $withval in - yes|no) pic_mode=$withval ;; - *) - pic_mode=default - # Look at the argument we got. We use all the common list separators. - lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, - for lt_pkg in $withval; do - IFS=$lt_save_ifs - if test "X$lt_pkg" = "X$lt_p"; then - pic_mode=yes - fi - done - IFS=$lt_save_ifs - ;; - esac -else - pic_mode=default -fi - - - - - - - - - # Check whether --enable-fast-install was given. 
-if test "${enable_fast_install+set}" = set; then : - enableval=$enable_fast_install; p=${PACKAGE-default} - case $enableval in - yes) enable_fast_install=yes ;; - no) enable_fast_install=no ;; - *) - enable_fast_install=no - # Look at the argument we got. We use all the common list separators. - lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, - for pkg in $enableval; do - IFS=$lt_save_ifs - if test "X$pkg" = "X$p"; then - enable_fast_install=yes - fi - done - IFS=$lt_save_ifs - ;; - esac -else - enable_fast_install=yes -fi - - - - - - - - - shared_archive_member_spec= -case $host,$enable_shared in -power*-*-aix[5-9]*,yes) - { $as_echo "$as_me:${as_lineno-$LINENO}: checking which variant of shared library versioning to provide" >&5 -$as_echo_n "checking which variant of shared library versioning to provide... " >&6; } - -# Check whether --with-aix-soname was given. -if test "${with_aix_soname+set}" = set; then : - withval=$with_aix_soname; case $withval in - aix|svr4|both) - ;; - *) - as_fn_error $? "Unknown argument to --with-aix-soname" "$LINENO" 5 - ;; - esac - lt_cv_with_aix_soname=$with_aix_soname -else - if ${lt_cv_with_aix_soname+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_with_aix_soname=aix -fi - - with_aix_soname=$lt_cv_with_aix_soname -fi - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $with_aix_soname" >&5 -$as_echo "$with_aix_soname" >&6; } - if test aix != "$with_aix_soname"; then - # For the AIX way of multilib, we name the shared archive member - # based on the bitwidth used, traditionally 'shr.o' or 'shr_64.o', - # and 'shr.imp' or 'shr_64.imp', respectively, for the Import File. - # Even when GNU compilers ignore OBJECT_MODE but need '-maix64' flag, - # the AIX toolchain works better with OBJECT_MODE set (default 32). 
- if test 64 = "${OBJECT_MODE-32}"; then - shared_archive_member_spec=shr_64 - else - shared_archive_member_spec=shr - fi - fi - ;; -*) - with_aix_soname=aix - ;; -esac - - - - - - - - - - -# This can be used to rebuild libtool when needed -LIBTOOL_DEPS=$ltmain - -# Always use our own libtool. -LIBTOOL='$(SHELL) $(top_builddir)/libtool' - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -test -z "$LN_S" && LN_S="ln -s" - - - - - - - - - - - - - - -if test -n "${ZSH_VERSION+set}"; then - setopt NO_GLOB_SUBST -fi - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for objdir" >&5 -$as_echo_n "checking for objdir... " >&6; } -if ${lt_cv_objdir+:} false; then : - $as_echo_n "(cached) " >&6 -else - rm -f .libs 2>/dev/null -mkdir .libs 2>/dev/null -if test -d .libs; then - lt_cv_objdir=.libs -else - # MS-DOS does not allow filenames that begin with a dot. - lt_cv_objdir=_libs -fi -rmdir .libs 2>/dev/null -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_objdir" >&5 -$as_echo "$lt_cv_objdir" >&6; } -objdir=$lt_cv_objdir - - - - - -cat >>confdefs.h <<_ACEOF -#define LT_OBJDIR "$lt_cv_objdir/" -_ACEOF - - - - -case $host_os in -aix3*) - # AIX sometimes has problems with the GCC collect2 program. For some - # reason, if we set the COLLECT_NAMES environment variable, the problems - # vanish in a puff of smoke. - if test set != "${COLLECT_NAMES+set}"; then - COLLECT_NAMES= - export COLLECT_NAMES - fi - ;; -esac - -# Global variables: -ofile=libtool -can_build_shared=yes - -# All known linkers require a '.a' archive for static linking (except MSVC and -# ICC, which need '.lib'). 
-libext=a - -with_gnu_ld=$lt_cv_prog_gnu_ld - -old_CC=$CC -old_CFLAGS=$CFLAGS - -# Set sane defaults for various variables -test -z "$CC" && CC=cc -test -z "$LTCC" && LTCC=$CC -test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS -test -z "$LD" && LD=ld -test -z "$ac_objext" && ac_objext=o - -func_cc_basename $compiler -cc_basename=$func_cc_basename_result - - -# Only perform the check for file, if the check method requires it -test -z "$MAGIC_CMD" && MAGIC_CMD=file -case $deplibs_check_method in -file_magic*) - if test "$file_magic_cmd" = '$MAGIC_CMD'; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ${ac_tool_prefix}file" >&5 -$as_echo_n "checking for ${ac_tool_prefix}file... " >&6; } -if ${lt_cv_path_MAGIC_CMD+:} false; then : - $as_echo_n "(cached) " >&6 -else - case $MAGIC_CMD in -[\\/*] | ?:[\\/]*) - lt_cv_path_MAGIC_CMD=$MAGIC_CMD # Let the user override the test with a path. - ;; -*) - lt_save_MAGIC_CMD=$MAGIC_CMD - lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR - ac_dummy="/usr/bin$PATH_SEPARATOR$PATH" - for ac_dir in $ac_dummy; do - IFS=$lt_save_ifs - test -z "$ac_dir" && ac_dir=. - if test -f "$ac_dir/${ac_tool_prefix}file"; then - lt_cv_path_MAGIC_CMD=$ac_dir/"${ac_tool_prefix}file" - if test -n "$file_magic_test_file"; then - case $deplibs_check_method in - "file_magic "*) - file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` - MAGIC_CMD=$lt_cv_path_MAGIC_CMD - if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | - $EGREP "$file_magic_regex" > /dev/null; then - : - else - cat <<_LT_EOF 1>&2 - -*** Warning: the command libtool uses to detect shared libraries, -*** $file_magic_cmd, produces output that libtool cannot recognize. -*** The result is that libtool may fail to recognize shared libraries -*** as such. This will affect the creation of libtool libraries that -*** depend on shared libraries, but programs linked with such libtool -*** libraries will work regardless of this problem. 
Nevertheless, you -*** may want to report the problem to your system manager and/or to -*** bug-libtool@gnu.org - -_LT_EOF - fi ;; - esac - fi - break - fi - done - IFS=$lt_save_ifs - MAGIC_CMD=$lt_save_MAGIC_CMD - ;; -esac -fi - -MAGIC_CMD=$lt_cv_path_MAGIC_CMD -if test -n "$MAGIC_CMD"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $MAGIC_CMD" >&5 -$as_echo "$MAGIC_CMD" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - - - - -if test -z "$lt_cv_path_MAGIC_CMD"; then - if test -n "$ac_tool_prefix"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for file" >&5 -$as_echo_n "checking for file... " >&6; } -if ${lt_cv_path_MAGIC_CMD+:} false; then : - $as_echo_n "(cached) " >&6 -else - case $MAGIC_CMD in -[\\/*] | ?:[\\/]*) - lt_cv_path_MAGIC_CMD=$MAGIC_CMD # Let the user override the test with a path. - ;; -*) - lt_save_MAGIC_CMD=$MAGIC_CMD - lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR - ac_dummy="/usr/bin$PATH_SEPARATOR$PATH" - for ac_dir in $ac_dummy; do - IFS=$lt_save_ifs - test -z "$ac_dir" && ac_dir=. - if test -f "$ac_dir/file"; then - lt_cv_path_MAGIC_CMD=$ac_dir/"file" - if test -n "$file_magic_test_file"; then - case $deplibs_check_method in - "file_magic "*) - file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` - MAGIC_CMD=$lt_cv_path_MAGIC_CMD - if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | - $EGREP "$file_magic_regex" > /dev/null; then - : - else - cat <<_LT_EOF 1>&2 - -*** Warning: the command libtool uses to detect shared libraries, -*** $file_magic_cmd, produces output that libtool cannot recognize. -*** The result is that libtool may fail to recognize shared libraries -*** as such. This will affect the creation of libtool libraries that -*** depend on shared libraries, but programs linked with such libtool -*** libraries will work regardless of this problem. 
Nevertheless, you -*** may want to report the problem to your system manager and/or to -*** bug-libtool@gnu.org - -_LT_EOF - fi ;; - esac - fi - break - fi - done - IFS=$lt_save_ifs - MAGIC_CMD=$lt_save_MAGIC_CMD - ;; -esac -fi - -MAGIC_CMD=$lt_cv_path_MAGIC_CMD -if test -n "$MAGIC_CMD"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $MAGIC_CMD" >&5 -$as_echo "$MAGIC_CMD" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi - - - else - MAGIC_CMD=: - fi -fi - - fi - ;; -esac - -# Use C for the default configuration in the libtool script - -lt_save_CC=$CC -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu - - -# Source file extension for C test sources. -ac_ext=c - -# Object file extension for compiled C test sources. -objext=o -objext=$objext - -# Code to be used in simple compile tests -lt_simple_compile_test_code="int some_variable = 0;" - -# Code to be used in simple link tests -lt_simple_link_test_code='int main(){return(0);}' - - - - - - - -# If no C compiler was specified, use CC. -LTCC=${LTCC-"$CC"} - -# If no C compiler flags were specified, use CFLAGS. -LTCFLAGS=${LTCFLAGS-"$CFLAGS"} - -# Allow CC to be a program name with arguments. -compiler=$CC - -# Save the default compiler, since it gets overwritten when the other -# tags are being tested, and _LT_TAGVAR(compiler, []) is a NOP. 
-compiler_DEFAULT=$CC - -# save warnings/boilerplate of simple test code -ac_outfile=conftest.$ac_objext -echo "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err -_lt_compiler_boilerplate=`cat conftest.err` -$RM conftest* - -ac_outfile=conftest.$ac_objext -echo "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err -_lt_linker_boilerplate=`cat conftest.err` -$RM -r conftest* - - -## CAVEAT EMPTOR: -## There is no encapsulation within the following macros, do not change -## the running order or otherwise move them around unless you know exactly -## what you are doing... -if test -n "$compiler"; then - -lt_prog_compiler_no_builtin_flag= - -if test yes = "$GCC"; then - case $cc_basename in - nvcc*) - lt_prog_compiler_no_builtin_flag=' -Xcompiler -fno-builtin' ;; - *) - lt_prog_compiler_no_builtin_flag=' -fno-builtin' ;; - esac - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -fno-rtti -fno-exceptions" >&5 -$as_echo_n "checking if $compiler supports -fno-rtti -fno-exceptions... " >&6; } -if ${lt_cv_prog_compiler_rtti_exceptions+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_rtti_exceptions=no - ac_outfile=conftest.$ac_objext - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - lt_compiler_flag="-fno-rtti -fno-exceptions" ## exclude from sc_useless_quotes_in_assignment - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - # The option is referenced via a variable to avoid confusing sed. 
- lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>conftest.err) - ac_status=$? - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - if (exit $ac_status) && test -s "$ac_outfile"; then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings other than the usual output. - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' >conftest.exp - $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 - if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then - lt_cv_prog_compiler_rtti_exceptions=yes - fi - fi - $RM conftest* - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_rtti_exceptions" >&5 -$as_echo "$lt_cv_prog_compiler_rtti_exceptions" >&6; } - -if test yes = "$lt_cv_prog_compiler_rtti_exceptions"; then - lt_prog_compiler_no_builtin_flag="$lt_prog_compiler_no_builtin_flag -fno-rtti -fno-exceptions" -else - : -fi - -fi - - - - - - - lt_prog_compiler_wl= -lt_prog_compiler_pic= -lt_prog_compiler_static= - - - if test yes = "$GCC"; then - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_static='-static' - - case $host_os in - aix*) - # All AIX code is PIC. - if test ia64 = "$host_cpu"; then - # AIX 5 now supports IA64 processor - lt_prog_compiler_static='-Bstatic' - fi - lt_prog_compiler_pic='-fPIC' - ;; - - amigaos*) - case $host_cpu in - powerpc) - # see comment about AmigaOS4 .so support - lt_prog_compiler_pic='-fPIC' - ;; - m68k) - # FIXME: we need at least 68020 code to build shared libraries, but - # adding the '-m68020' flag to GCC prevents building anything better, - # like '-m68040'. 
- lt_prog_compiler_pic='-m68020 -resident32 -malways-restore-a4' - ;; - esac - ;; - - beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) - # PIC is the default for these OSes. - ;; - - mingw* | cygwin* | pw32* | os2* | cegcc*) - # This hack is so that the source file can tell whether it is being - # built for inclusion in a dll (and should export symbols for example). - # Although the cygwin gcc ignores -fPIC, still need this for old-style - # (--disable-auto-import) libraries - lt_prog_compiler_pic='-DDLL_EXPORT' - case $host_os in - os2*) - lt_prog_compiler_static='$wl-static' - ;; - esac - ;; - - darwin* | rhapsody*) - # PIC is the default on this platform - # Common symbols not allowed in MH_DYLIB files - lt_prog_compiler_pic='-fno-common' - ;; - - haiku*) - # PIC is the default for Haiku. - # The "-static" flag exists, but is broken. - lt_prog_compiler_static= - ;; - - hpux*) - # PIC is the default for 64-bit PA HP-UX, but not for 32-bit - # PA HP-UX. On IA64 HP-UX, PIC is the default but the pic flag - # sets the default TLS model and affects inlining. - case $host_cpu in - hppa*64*) - # +Z the default - ;; - *) - lt_prog_compiler_pic='-fPIC' - ;; - esac - ;; - - interix[3-9]*) - # Interix 3.x gcc -fpic/-fPIC options generate broken code. - # Instead, we relocate shared libraries at runtime. - ;; - - msdosdjgpp*) - # Just because we use GCC doesn't mean we suddenly get shared libraries - # on systems that don't support them. - lt_prog_compiler_can_build_shared=no - enable_shared=no - ;; - - *nto* | *qnx*) - # QNX uses GNU C++, but need to define -shared option too, otherwise - # it will coredump. 
- lt_prog_compiler_pic='-fPIC -shared' - ;; - - sysv4*MP*) - if test -d /usr/nec; then - lt_prog_compiler_pic=-Kconform_pic - fi - ;; - - *) - lt_prog_compiler_pic='-fPIC' - ;; - esac - - case $cc_basename in - nvcc*) # Cuda Compiler Driver 2.2 - lt_prog_compiler_wl='-Xlinker ' - if test -n "$lt_prog_compiler_pic"; then - lt_prog_compiler_pic="-Xcompiler $lt_prog_compiler_pic" - fi - ;; - esac - else - # PORTME Check for flag to pass linker flags through the system compiler. - case $host_os in - aix*) - lt_prog_compiler_wl='-Wl,' - if test ia64 = "$host_cpu"; then - # AIX 5 now supports IA64 processor - lt_prog_compiler_static='-Bstatic' - else - lt_prog_compiler_static='-bnso -bI:/lib/syscalls.exp' - fi - ;; - - darwin* | rhapsody*) - # PIC is the default on this platform - # Common symbols not allowed in MH_DYLIB files - lt_prog_compiler_pic='-fno-common' - case $cc_basename in - nagfor*) - # NAG Fortran compiler - lt_prog_compiler_wl='-Wl,-Wl,,' - lt_prog_compiler_pic='-PIC' - lt_prog_compiler_static='-Bstatic' - ;; - esac - ;; - - mingw* | cygwin* | pw32* | os2* | cegcc*) - # This hack is so that the source file can tell whether it is being - # built for inclusion in a dll (and should export symbols for example). - lt_prog_compiler_pic='-DDLL_EXPORT' - case $host_os in - os2*) - lt_prog_compiler_static='$wl-static' - ;; - esac - ;; - - hpux9* | hpux10* | hpux11*) - lt_prog_compiler_wl='-Wl,' - # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but - # not for PA HP-UX. - case $host_cpu in - hppa*64*|ia64*) - # +Z the default - ;; - *) - lt_prog_compiler_pic='+Z' - ;; - esac - # Is there a better lt_prog_compiler_static that works with the bundled CC? - lt_prog_compiler_static='$wl-a ${wl}archive' - ;; - - irix5* | irix6* | nonstopux*) - lt_prog_compiler_wl='-Wl,' - # PIC (with -KPIC) is the default. 
- lt_prog_compiler_static='-non_shared' - ;; - - linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) - case $cc_basename in - # old Intel for x86_64, which still supported -KPIC. - ecc*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-static' - ;; - # icc used to be incompatible with GCC. - # ICC 10 doesn't accept -KPIC any more. - icc* | ifort*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-fPIC' - lt_prog_compiler_static='-static' - ;; - # Lahey Fortran 8.1. - lf95*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='--shared' - lt_prog_compiler_static='--static' - ;; - nagfor*) - # NAG Fortran compiler - lt_prog_compiler_wl='-Wl,-Wl,,' - lt_prog_compiler_pic='-PIC' - lt_prog_compiler_static='-Bstatic' - ;; - tcc*) - # Fabrice Bellard et al's Tiny C Compiler - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-fPIC' - lt_prog_compiler_static='-static' - ;; - pgcc* | pgf77* | pgf90* | pgf95* | pgfortran*) - # Portland Group compilers (*not* the Pentium gcc compiler, - # which looks to be a dead project) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-fpic' - lt_prog_compiler_static='-Bstatic' - ;; - ccc*) - lt_prog_compiler_wl='-Wl,' - # All Alpha code is PIC. 
- lt_prog_compiler_static='-non_shared' - ;; - xl* | bgxl* | bgf* | mpixl*) - # IBM XL C 8.0/Fortran 10.1, 11.1 on PPC and BlueGene - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-qpic' - lt_prog_compiler_static='-qstaticlink' - ;; - *) - case `$CC -V 2>&1 | sed 5q` in - *Sun\ Ceres\ Fortran* | *Sun*Fortran*\ [1-7].* | *Sun*Fortran*\ 8.[0-3]*) - # Sun Fortran 8.3 passes all unrecognized flags to the linker - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - lt_prog_compiler_wl='' - ;; - *Sun\ F* | *Sun*Fortran*) - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - lt_prog_compiler_wl='-Qoption ld ' - ;; - *Sun\ C*) - # Sun C 5.9 - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - lt_prog_compiler_wl='-Wl,' - ;; - *Intel*\ [CF]*Compiler*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-fPIC' - lt_prog_compiler_static='-static' - ;; - *Portland\ Group*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-fpic' - lt_prog_compiler_static='-Bstatic' - ;; - esac - ;; - esac - ;; - - newsos6) - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - ;; - - *nto* | *qnx*) - # QNX uses GNU C++, but need to define -shared option too, otherwise - # it will coredump. - lt_prog_compiler_pic='-fPIC -shared' - ;; - - osf3* | osf4* | osf5*) - lt_prog_compiler_wl='-Wl,' - # All OSF/1 code is PIC. 
- lt_prog_compiler_static='-non_shared' - ;; - - rdos*) - lt_prog_compiler_static='-non_shared' - ;; - - solaris*) - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - case $cc_basename in - f77* | f90* | f95* | sunf77* | sunf90* | sunf95*) - lt_prog_compiler_wl='-Qoption ld ';; - *) - lt_prog_compiler_wl='-Wl,';; - esac - ;; - - sunos4*) - lt_prog_compiler_wl='-Qoption ld ' - lt_prog_compiler_pic='-PIC' - lt_prog_compiler_static='-Bstatic' - ;; - - sysv4 | sysv4.2uw2* | sysv4.3*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - ;; - - sysv4*MP*) - if test -d /usr/nec; then - lt_prog_compiler_pic='-Kconform_pic' - lt_prog_compiler_static='-Bstatic' - fi - ;; - - sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_pic='-KPIC' - lt_prog_compiler_static='-Bstatic' - ;; - - unicos*) - lt_prog_compiler_wl='-Wl,' - lt_prog_compiler_can_build_shared=no - ;; - - uts4*) - lt_prog_compiler_pic='-pic' - lt_prog_compiler_static='-Bstatic' - ;; - - *) - lt_prog_compiler_can_build_shared=no - ;; - esac - fi - -case $host_os in - # For platforms that do not support PIC, -DPIC is meaningless: - *djgpp*) - lt_prog_compiler_pic= - ;; - *) - lt_prog_compiler_pic="$lt_prog_compiler_pic -DPIC" - ;; -esac - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $compiler option to produce PIC" >&5 -$as_echo_n "checking for $compiler option to produce PIC... " >&6; } -if ${lt_cv_prog_compiler_pic+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_pic=$lt_prog_compiler_pic -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic" >&5 -$as_echo "$lt_cv_prog_compiler_pic" >&6; } -lt_prog_compiler_pic=$lt_cv_prog_compiler_pic - -# -# Check to make sure the PIC flag actually works. 
-# -if test -n "$lt_prog_compiler_pic"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler PIC flag $lt_prog_compiler_pic works" >&5 -$as_echo_n "checking if $compiler PIC flag $lt_prog_compiler_pic works... " >&6; } -if ${lt_cv_prog_compiler_pic_works+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_pic_works=no - ac_outfile=conftest.$ac_objext - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - lt_compiler_flag="$lt_prog_compiler_pic -DPIC" ## exclude from sc_useless_quotes_in_assignment - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - # The option is referenced via a variable to avoid confusing sed. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>conftest.err) - ac_status=$? - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - if (exit $ac_status) && test -s "$ac_outfile"; then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings other than the usual output. - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' >conftest.exp - $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 - if test ! 
-s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then - lt_cv_prog_compiler_pic_works=yes - fi - fi - $RM conftest* - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_works" >&5 -$as_echo "$lt_cv_prog_compiler_pic_works" >&6; } - -if test yes = "$lt_cv_prog_compiler_pic_works"; then - case $lt_prog_compiler_pic in - "" | " "*) ;; - *) lt_prog_compiler_pic=" $lt_prog_compiler_pic" ;; - esac -else - lt_prog_compiler_pic= - lt_prog_compiler_can_build_shared=no -fi - -fi - - - - - - - - - - - -# -# Check to make sure the static flag actually works. -# -wl=$lt_prog_compiler_wl eval lt_tmp_static_flag=\"$lt_prog_compiler_static\" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler static flag $lt_tmp_static_flag works" >&5 -$as_echo_n "checking if $compiler static flag $lt_tmp_static_flag works... " >&6; } -if ${lt_cv_prog_compiler_static_works+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_static_works=no - save_LDFLAGS=$LDFLAGS - LDFLAGS="$LDFLAGS $lt_tmp_static_flag" - echo "$lt_simple_link_test_code" > conftest.$ac_ext - if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then - # The linker can only warn and ignore the option if not recognized - # So say no if there are warnings - if test -s conftest.err; then - # Append any errors to the config.log. 
- cat conftest.err 1>&5 - $ECHO "$_lt_linker_boilerplate" | $SED '/^$/d' > conftest.exp - $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 - if diff conftest.exp conftest.er2 >/dev/null; then - lt_cv_prog_compiler_static_works=yes - fi - else - lt_cv_prog_compiler_static_works=yes - fi - fi - $RM -r conftest* - LDFLAGS=$save_LDFLAGS - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_static_works" >&5 -$as_echo "$lt_cv_prog_compiler_static_works" >&6; } - -if test yes = "$lt_cv_prog_compiler_static_works"; then - : -else - lt_prog_compiler_static= -fi - - - - - - - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -c -o file.$ac_objext" >&5 -$as_echo_n "checking if $compiler supports -c -o file.$ac_objext... " >&6; } -if ${lt_cv_prog_compiler_c_o+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_c_o=no - $RM -r conftest 2>/dev/null - mkdir conftest - cd conftest - mkdir out - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - - lt_compiler_flag="-o out/conftest2.$ac_objext" - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>out/conftest.err) - ac_status=$? - cat out/conftest.err >&5 - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - if (exit $ac_status) && test -s out/conftest2.$ac_objext - then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' > out/conftest.exp - $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then - lt_cv_prog_compiler_c_o=yes - fi - fi - chmod u+w . 2>&5 - $RM conftest* - # SGI C++ compiler will create directory out/ii_files/ for - # template instantiation - test -d out/ii_files && $RM out/ii_files/* && rmdir out/ii_files - $RM out/* && rmdir out - cd .. - $RM -r conftest - $RM conftest* - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o" >&5 -$as_echo "$lt_cv_prog_compiler_c_o" >&6; } - - - - - - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -c -o file.$ac_objext" >&5 -$as_echo_n "checking if $compiler supports -c -o file.$ac_objext... " >&6; } -if ${lt_cv_prog_compiler_c_o+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_c_o=no - $RM -r conftest 2>/dev/null - mkdir conftest - cd conftest - mkdir out - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - - lt_compiler_flag="-o out/conftest2.$ac_objext" - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>out/conftest.err) - ac_status=$? - cat out/conftest.err >&5 - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - if (exit $ac_status) && test -s out/conftest2.$ac_objext - then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' > out/conftest.exp - $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then - lt_cv_prog_compiler_c_o=yes - fi - fi - chmod u+w . 2>&5 - $RM conftest* - # SGI C++ compiler will create directory out/ii_files/ for - # template instantiation - test -d out/ii_files && $RM out/ii_files/* && rmdir out/ii_files - $RM out/* && rmdir out - cd .. - $RM -r conftest - $RM conftest* - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o" >&5 -$as_echo "$lt_cv_prog_compiler_c_o" >&6; } - - - - -hard_links=nottested -if test no = "$lt_cv_prog_compiler_c_o" && test no != "$need_locks"; then - # do not overwrite the value of need_locks provided by the user - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if we can lock with hard links" >&5 -$as_echo_n "checking if we can lock with hard links... " >&6; } - hard_links=yes - $RM conftest* - ln conftest.a conftest.b 2>/dev/null && hard_links=no - touch conftest.a - ln conftest.a conftest.b 2>&5 || hard_links=no - ln conftest.a conftest.b 2>/dev/null && hard_links=no - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $hard_links" >&5 -$as_echo "$hard_links" >&6; } - if test no = "$hard_links"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: '$CC' does not support '-c -o', so 'make -j' may be unsafe" >&5 -$as_echo "$as_me: WARNING: '$CC' does not support '-c -o', so 'make -j' may be unsafe" >&2;} - need_locks=warn - fi -else - need_locks=no -fi - - - - - - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the $compiler linker ($LD) supports shared libraries" >&5 -$as_echo_n "checking whether the $compiler linker ($LD) supports shared libraries... 
" >&6; } - - runpath_var= - allow_undefined_flag= - always_export_symbols=no - archive_cmds= - archive_expsym_cmds= - compiler_needs_object=no - enable_shared_with_static_runtimes=no - export_dynamic_flag_spec= - export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' - hardcode_automatic=no - hardcode_direct=no - hardcode_direct_absolute=no - hardcode_libdir_flag_spec= - hardcode_libdir_separator= - hardcode_minus_L=no - hardcode_shlibpath_var=unsupported - inherit_rpath=no - link_all_deplibs=unknown - module_cmds= - module_expsym_cmds= - old_archive_from_new_cmds= - old_archive_from_expsyms_cmds= - thread_safe_flag_spec= - whole_archive_flag_spec= - # include_expsyms should be a list of space-separated symbols to be *always* - # included in the symbol list - include_expsyms= - # exclude_expsyms can be an extended regexp of symbols to exclude - # it will be wrapped by ' (' and ')$', so one must not match beginning or - # end of line. Example: 'a|bc|.*d.*' will exclude the symbols 'a' and 'bc', - # as well as any symbol that contains 'd'. - exclude_expsyms='_GLOBAL_OFFSET_TABLE_|_GLOBAL__F[ID]_.*' - # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out - # platforms (ab)use it in PIC code, but their linkers get confused if - # the symbol is explicitly referenced. Since portable code cannot - # rely on this symbol name, it's probably fine to never include it in - # preloaded symbol tables. - # Exclude shared library initialization/finalization symbols. - extract_expsyms_cmds= - - case $host_os in - cygwin* | mingw* | pw32* | cegcc*) - # FIXME: the MSVC++ and ICC port hasn't been tested in a loooong time - # When not using gcc, we currently assume that we are using - # Microsoft Visual C++ or Intel C++ Compiler. 
- if test yes != "$GCC"; then - with_gnu_ld=no - fi - ;; - interix*) - # we just hope/assume this is gcc and not c89 (= MSVC++ or ICC) - with_gnu_ld=yes - ;; - openbsd* | bitrig*) - with_gnu_ld=no - ;; - esac - - ld_shlibs=yes - - # On some targets, GNU ld is compatible enough with the native linker - # that we're better off using the native interface for both. - lt_use_gnu_ld_interface=no - if test yes = "$with_gnu_ld"; then - case $host_os in - aix*) - # The AIX port of GNU ld has always aspired to compatibility - # with the native linker. However, as the warning in the GNU ld - # block says, versions before 2.19.5* couldn't really create working - # shared libraries, regardless of the interface used. - case `$LD -v 2>&1` in - *\ \(GNU\ Binutils\)\ 2.19.5*) ;; - *\ \(GNU\ Binutils\)\ 2.[2-9]*) ;; - *\ \(GNU\ Binutils\)\ [3-9]*) ;; - *) - lt_use_gnu_ld_interface=yes - ;; - esac - ;; - *) - lt_use_gnu_ld_interface=yes - ;; - esac - fi - - if test yes = "$lt_use_gnu_ld_interface"; then - # If archive_cmds runs LD, not CC, wlarc should be empty - wlarc='$wl' - - # Set some defaults for GNU ld with shared library support. These - # are reset later if shared libraries are not supported. Putting them - # here allows them to be overridden if necessary. - runpath_var=LD_RUN_PATH - hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - export_dynamic_flag_spec='$wl--export-dynamic' - # ancient GNU ld didn't support --whole-archive et. al. - if $LD --help 2>&1 | $GREP 'no-whole-archive' > /dev/null; then - whole_archive_flag_spec=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive' - else - whole_archive_flag_spec= - fi - supports_anon_versioning=no - case `$LD -v | $SED -e 's/(^)\+)\s\+//' 2>&1` in - *GNU\ gold*) supports_anon_versioning=yes ;; - *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 - *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... - *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... 
- *\ 2.11.*) ;; # other 2.11 versions - *) supports_anon_versioning=yes ;; - esac - - # See if GNU ld supports shared libraries. - case $host_os in - aix[3-9]*) - # On AIX/PPC, the GNU linker is very broken - if test ia64 != "$host_cpu"; then - ld_shlibs=no - cat <<_LT_EOF 1>&2 - -*** Warning: the GNU linker, at least up to release 2.19, is reported -*** to be unable to reliably create shared libraries on AIX. -*** Therefore, libtool is disabling shared libraries support. If you -*** really care for shared libraries, you may want to install binutils -*** 2.20 or above, or modify your PATH so that a non-GNU linker is found. -*** You will then need to restart the configuration process. - -_LT_EOF - fi - ;; - - amigaos*) - case $host_cpu in - powerpc) - # see comment about AmigaOS4 .so support - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds='' - ;; - m68k) - archive_cmds='$RM $output_objdir/a2ixlibrary.data~$ECHO "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$ECHO "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$ECHO "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$ECHO "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' - hardcode_libdir_flag_spec='-L$libdir' - hardcode_minus_L=yes - ;; - esac - ;; - - beos*) - if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then - allow_undefined_flag=unsupported - # Joseph Beckenbach says some releases of gcc - # support --undefined. This deserves some investigation. FIXME - archive_cmds='$CC -nostart $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - else - ld_shlibs=no - fi - ;; - - cygwin* | mingw* | pw32* | cegcc*) - # _LT_TAGVAR(hardcode_libdir_flag_spec, ) is actually meaningless, - # as there is no search path for DLLs. 
- hardcode_libdir_flag_spec='-L$libdir' - export_dynamic_flag_spec='$wl--export-all-symbols' - allow_undefined_flag=unsupported - always_export_symbols=no - enable_shared_with_static_runtimes=yes - export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS][ ]/s/.*[ ]\([^ ]*\)/\1 DATA/;s/^.*[ ]__nm__\([^ ]*\)[ ][^ ]*/\1 DATA/;/^I[ ]/d;/^[AITW][ ]/s/.* //'\'' | sort | uniq > $export_symbols' - exclude_expsyms='[_]+GLOBAL_OFFSET_TABLE_|[_]+GLOBAL__[FID]_.*|[_]+head_[A-Za-z0-9_]+_dll|[A-Za-z0-9_]+_dll_iname' - - if $LD --help 2>&1 | $GREP 'auto-import' > /dev/null; then - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname $wl--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' - # If the export-symbols file already is a .def file, use it as - # is; otherwise, prepend EXPORTS... - archive_expsym_cmds='if test DEF = "`$SED -n -e '\''s/^[ ]*//'\'' -e '\''/^\(;.*\)*$/d'\'' -e '\''s/^\(EXPORTS\|LIBRARY\)\([ ].*\)*$/DEF/p'\'' -e q $export_symbols`" ; then - cp $export_symbols $output_objdir/$soname.def; - else - echo EXPORTS > $output_objdir/$soname.def; - cat $export_symbols >> $output_objdir/$soname.def; - fi~ - $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname $wl--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' - else - ld_shlibs=no - fi - ;; - - haiku*) - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - link_all_deplibs=yes - ;; - - os2*) - hardcode_libdir_flag_spec='-L$libdir' - hardcode_minus_L=yes - allow_undefined_flag=unsupported - shrext_cmds=.dll - archive_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ - $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ - $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ - $ECHO EXPORTS >> $output_objdir/$libname.def~ - emxexp $libobjs | $SED /"_DLL_InitTerm"/d >> 
$output_objdir/$libname.def~ - $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ - emximp -o $lib $output_objdir/$libname.def' - archive_expsym_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ - $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ - $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ - $ECHO EXPORTS >> $output_objdir/$libname.def~ - prefix_cmds="$SED"~ - if test EXPORTS = "`$SED 1q $export_symbols`"; then - prefix_cmds="$prefix_cmds -e 1d"; - fi~ - prefix_cmds="$prefix_cmds -e \"s/^\(.*\)$/_\1/g\""~ - cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~ - $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ - emximp -o $lib $output_objdir/$libname.def' - old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def' - enable_shared_with_static_runtimes=yes - file_list_spec='@' - ;; - - interix[3-9]*) - hardcode_direct=no - hardcode_shlibpath_var=no - hardcode_libdir_flag_spec='$wl-rpath,$libdir' - export_dynamic_flag_spec='$wl-E' - # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. - # Instead, shared libraries are loaded at an image base (0x10000000 by - # default) and relocated if they conflict, which is a slow very memory - # consuming and fragmenting process. To avoid this, we pick a random, - # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link - # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
- archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' - archive_expsym_cmds='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' - ;; - - gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu) - tmp_diet=no - if test linux-dietlibc = "$host_os"; then - case $cc_basename in - diet\ *) tmp_diet=yes;; # linux-dietlibc with static linking (!diet-dyn) - esac - fi - if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null \ - && test no = "$tmp_diet" - then - tmp_addflag=' $pic_flag' - tmp_sharedflag='-shared' - case $cc_basename,$host_cpu in - pgcc*) # Portland Group C compiler - whole_archive_flag_spec='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' - tmp_addflag=' $pic_flag' - ;; - pgf77* | pgf90* | pgf95* | pgfortran*) - # Portland Group f77 and f90 compilers - whole_archive_flag_spec='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' - tmp_addflag=' $pic_flag -Mnomain' ;; - ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 - tmp_addflag=' -i_dynamic' ;; - efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 - tmp_addflag=' -i_dynamic -nofor_main' ;; - ifc* | ifort*) # Intel Fortran compiler - tmp_addflag=' -nofor_main' ;; - lf95*) # Lahey Fortran 8.1 - whole_archive_flag_spec= - tmp_sharedflag='--shared' ;; - nagfor*) # NAGFOR 5.3 - tmp_sharedflag='-Wl,-shared' ;; - xl[cC]* | bgxl[cC]* | mpixl[cC]*) # IBM XL C 8.0 on PPC (deal with xlf below) - 
tmp_sharedflag='-qmkshrobj' - tmp_addflag= ;; - nvcc*) # Cuda Compiler Driver 2.2 - whole_archive_flag_spec='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' - compiler_needs_object=yes - ;; - esac - case `$CC -V 2>&1 | sed 5q` in - *Sun\ C*) # Sun C 5.9 - whole_archive_flag_spec='$wl--whole-archive`new_convenience=; for conv in $convenience\"\"; do test -z \"$conv\" || new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' - compiler_needs_object=yes - tmp_sharedflag='-G' ;; - *Sun\ F*) # Sun Fortran 8.3 - tmp_sharedflag='-G' ;; - esac - archive_cmds='$CC '"$tmp_sharedflag""$tmp_addflag"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - - if test yes = "$supports_anon_versioning"; then - archive_expsym_cmds='echo "{ global:" > $output_objdir/$libname.ver~ - cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ - echo "local: *; };" >> $output_objdir/$libname.ver~ - $CC '"$tmp_sharedflag""$tmp_addflag"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib' - fi - - case $cc_basename in - tcc*) - export_dynamic_flag_spec='-rdynamic' - ;; - xlf* | bgf* | bgxlf* | mpixlf*) - # IBM XL Fortran 10.1 on PPC cannot create shared libs itself - whole_archive_flag_spec='--whole-archive$convenience --no-whole-archive' - hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - archive_cmds='$LD -shared $libobjs $deplibs $linker_flags -soname $soname -o $lib' - if test yes = "$supports_anon_versioning"; then - archive_expsym_cmds='echo "{ global:" > $output_objdir/$libname.ver~ - cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ - echo "local: *; };" >> $output_objdir/$libname.ver~ - $LD -shared $libobjs $deplibs $linker_flags -soname $soname -version-script 
$output_objdir/$libname.ver -o $lib' - fi - ;; - esac - else - ld_shlibs=no - fi - ;; - - netbsd*) - if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then - archive_cmds='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' - wlarc= - else - archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - fi - ;; - - solaris*) - if $LD -v 2>&1 | $GREP 'BFD 2\.8' > /dev/null; then - ld_shlibs=no - cat <<_LT_EOF 1>&2 - -*** Warning: The releases 2.8.* of the GNU linker cannot reliably -*** create shared libraries on Solaris systems. Therefore, libtool -*** is disabling shared libraries support. We urge you to upgrade GNU -*** binutils to release 2.9.1 or newer. Another option is to modify -*** your PATH or compiler configuration so that the native linker is -*** used, and then restart. - -_LT_EOF - elif $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then - archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - else - ld_shlibs=no - fi - ;; - - sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) - case `$LD -v 2>&1` in - *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) - ld_shlibs=no - cat <<_LT_EOF 1>&2 - -*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 cannot -*** reliably create shared libraries on SCO systems. Therefore, libtool -*** is disabling shared libraries support. We urge you to upgrade GNU -*** binutils to release 2.16.91.0.3 or newer. Another option is to modify -*** your PATH or compiler configuration so that the native linker is -*** used, and then restart. 
- -_LT_EOF - ;; - *) - # For security reasons, it is highly recommended that you always - # use absolute paths for naming shared libraries, and exclude the - # DT_RUNPATH tag from executables and libraries. But doing so - # requires that you compile everything twice, which is a pain. - if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then - hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - else - ld_shlibs=no - fi - ;; - esac - ;; - - sunos4*) - archive_cmds='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' - wlarc= - hardcode_direct=yes - hardcode_shlibpath_var=no - ;; - - *) - if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then - archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - else - ld_shlibs=no - fi - ;; - esac - - if test no = "$ld_shlibs"; then - runpath_var= - hardcode_libdir_flag_spec= - export_dynamic_flag_spec= - whole_archive_flag_spec= - fi - else - # PORTME fill in a description of your system's linker (not GNU ld) - case $host_os in - aix3*) - allow_undefined_flag=unsupported - always_export_symbols=yes - archive_expsym_cmds='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' - # Note: this linker hardcodes the directories in LIBPATH if there - # are no directories specified by -L. 
- hardcode_minus_L=yes - if test yes = "$GCC" && test -z "$lt_prog_compiler_static"; then - # Neither direct hardcoding nor static linking is supported with a - # broken collect2. - hardcode_direct=unsupported - fi - ;; - - aix[4-9]*) - if test ia64 = "$host_cpu"; then - # On IA64, the linker does run time linking by default, so we don't - # have to do anything special. - aix_use_runtimelinking=no - exp_sym_flag='-Bexport' - no_entry_flag= - else - # If we're using GNU nm, then we don't want the "-C" option. - # -C means demangle to GNU nm, but means don't demangle to AIX nm. - # Without the "-l" option, or with the "-B" option, AIX nm treats - # weak defined symbols like other global defined symbols, whereas - # GNU nm marks them as "W". - # While the 'weak' keyword is ignored in the Export File, we need - # it in the Import File for the 'aix-soname' feature, so we have - # to replace the "-B" option with "-P" for AIX nm. - if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then - export_symbols_cmds='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && (substr(\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols' - else - export_symbols_cmds='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols' - fi - aix_use_runtimelinking=no - - # Test if we are trying to use run time linking or normal - # AIX style linking. If -brtl is somewhere in LDFLAGS, we - # have runtime linking enabled, and use it for executables. 
- # For shared libraries, we enable/disable runtime linking - # depending on the kind of the shared library created - - # when "with_aix_soname,aix_use_runtimelinking" is: - # "aix,no" lib.a(lib.so.V) shared, rtl:no, for executables - # "aix,yes" lib.so shared, rtl:yes, for executables - # lib.a static archive - # "both,no" lib.so.V(shr.o) shared, rtl:yes - # lib.a(lib.so.V) shared, rtl:no, for executables - # "both,yes" lib.so.V(shr.o) shared, rtl:yes, for executables - # lib.a(lib.so.V) shared, rtl:no - # "svr4,*" lib.so.V(shr.o) shared, rtl:yes, for executables - # lib.a static archive - case $host_os in aix4.[23]|aix4.[23].*|aix[5-9]*) - for ld_flag in $LDFLAGS; do - if (test x-brtl = "x$ld_flag" || test x-Wl,-brtl = "x$ld_flag"); then - aix_use_runtimelinking=yes - break - fi - done - if test svr4,no = "$with_aix_soname,$aix_use_runtimelinking"; then - # With aix-soname=svr4, we create the lib.so.V shared archives only, - # so we don't have lib.a shared libs to link our executables. - # We have to force runtime linking in this case. - aix_use_runtimelinking=yes - LDFLAGS="$LDFLAGS -Wl,-brtl" - fi - ;; - esac - - exp_sym_flag='-bexport' - no_entry_flag='-bnoentry' - fi - - # When large executables or shared objects are built, AIX ld can - # have problems creating the table of contents. If linking a library - # or program results in "error TOC overflow" add -mminimal-toc to - # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not - # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. - - archive_cmds='' - hardcode_direct=yes - hardcode_direct_absolute=yes - hardcode_libdir_separator=':' - link_all_deplibs=yes - file_list_spec='$wl-f,' - case $with_aix_soname,$aix_use_runtimelinking in - aix,*) ;; # traditional, no import file - svr4,* | *,yes) # use import file - # The Import File defines what to hardcode. 
- hardcode_direct=no - hardcode_direct_absolute=no - ;; - esac - - if test yes = "$GCC"; then - case $host_os in aix4.[012]|aix4.[012].*) - # We only want to do this on AIX 4.2 and lower, the check - # below for broken collect2 doesn't work under 4.3+ - collect2name=`$CC -print-prog-name=collect2` - if test -f "$collect2name" && - strings "$collect2name" | $GREP resolve_lib_name >/dev/null - then - # We have reworked collect2 - : - else - # We have old collect2 - hardcode_direct=unsupported - # It fails to find uninstalled libraries when the uninstalled - # path is not listed in the libpath. Setting hardcode_minus_L - # to unsupported forces relinking - hardcode_minus_L=yes - hardcode_libdir_flag_spec='-L$libdir' - hardcode_libdir_separator= - fi - ;; - esac - shared_flag='-shared' - if test yes = "$aix_use_runtimelinking"; then - shared_flag="$shared_flag "'$wl-G' - fi - # Need to ensure runtime linking is disabled for the traditional - # shared library, or the linker may eventually find shared libraries - # /with/ Import File - we do not want to mix them. - shared_flag_aix='-shared' - shared_flag_svr4='-shared $wl-G' - else - # not using gcc - if test ia64 = "$host_cpu"; then - # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release - # chokes on -Wl,-G. The following line is correct: - shared_flag='-G' - else - if test yes = "$aix_use_runtimelinking"; then - shared_flag='$wl-G' - else - shared_flag='$wl-bM:SRE' - fi - shared_flag_aix='$wl-bM:SRE' - shared_flag_svr4='$wl-G' - fi - fi - - export_dynamic_flag_spec='$wl-bexpall' - # It seems that -bexpall does not export symbols beginning with - # underscore (_), so it is better to generate a list of symbols to export. - always_export_symbols=yes - if test aix,yes = "$with_aix_soname,$aix_use_runtimelinking"; then - # Warning - without using the other runtime loading flags (-brtl), - # -berok will link without error, but may produce a broken library. 
- allow_undefined_flag='-berok' - # Determine the default libpath from the value encoded in an - # empty executable. - if test set = "${lt_cv_aix_libpath+set}"; then - aix_libpath=$lt_cv_aix_libpath -else - if ${lt_cv_aix_libpath_+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - - lt_aix_libpath_sed=' - /Import File Strings/,/^$/ { - /^0/ { - s/^0 *\([^ ]*\) *$/\1/ - p - } - }' - lt_cv_aix_libpath_=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` - # Check for a 64-bit object if we didn't find anything. - if test -z "$lt_cv_aix_libpath_"; then - lt_cv_aix_libpath_=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` - fi -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext - if test -z "$lt_cv_aix_libpath_"; then - lt_cv_aix_libpath_=/usr/lib:/lib - fi - -fi - - aix_libpath=$lt_cv_aix_libpath_ -fi - - hardcode_libdir_flag_spec='$wl-blibpath:$libdir:'"$aix_libpath" - archive_expsym_cmds='$CC -o $output_objdir/$soname $libobjs $deplibs $wl'$no_entry_flag' $compiler_flags `if test -n "$allow_undefined_flag"; then func_echo_all "$wl$allow_undefined_flag"; else :; fi` $wl'$exp_sym_flag:\$export_symbols' '$shared_flag - else - if test ia64 = "$host_cpu"; then - hardcode_libdir_flag_spec='$wl-R $libdir:/usr/lib:/lib' - allow_undefined_flag="-z nodefs" - archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\$wl$no_entry_flag"' $compiler_flags $wl$allow_undefined_flag '"\$wl$exp_sym_flag:\$export_symbols" - else - # Determine the default libpath from the value encoded in an - # empty executable. 
- if test set = "${lt_cv_aix_libpath+set}"; then - aix_libpath=$lt_cv_aix_libpath -else - if ${lt_cv_aix_libpath_+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ - -int -main () -{ - - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - - lt_aix_libpath_sed=' - /Import File Strings/,/^$/ { - /^0/ { - s/^0 *\([^ ]*\) *$/\1/ - p - } - }' - lt_cv_aix_libpath_=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` - # Check for a 64-bit object if we didn't find anything. - if test -z "$lt_cv_aix_libpath_"; then - lt_cv_aix_libpath_=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` - fi -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext - if test -z "$lt_cv_aix_libpath_"; then - lt_cv_aix_libpath_=/usr/lib:/lib - fi - -fi - - aix_libpath=$lt_cv_aix_libpath_ -fi - - hardcode_libdir_flag_spec='$wl-blibpath:$libdir:'"$aix_libpath" - # Warning - without using the other run time loading flags, - # -berok will link without error, but may produce a broken library. - no_undefined_flag=' $wl-bernotok' - allow_undefined_flag=' $wl-berok' - if test yes = "$with_gnu_ld"; then - # We only use this code for GNU lds that support --whole-archive. - whole_archive_flag_spec='$wl--whole-archive$convenience $wl--no-whole-archive' - else - # Exported symbols can be pulled into shared objects from archives - whole_archive_flag_spec='$convenience' - fi - archive_cmds_need_lc=yes - archive_expsym_cmds='$RM -r $output_objdir/$realname.d~$MKDIR $output_objdir/$realname.d' - # -brtl affects multiple linker settings, -berok does not and is overridden later - compiler_flags_filtered='`func_echo_all "$compiler_flags " | $SED -e "s%-brtl\\([, ]\\)%-berok\\1%g"`' - if test svr4 != "$with_aix_soname"; then - # This is similar to how AIX traditionally builds its shared libraries. 
- archive_expsym_cmds="$archive_expsym_cmds"'~$CC '$shared_flag_aix' -o $output_objdir/$realname.d/$soname $libobjs $deplibs $wl-bnoentry '$compiler_flags_filtered'$wl-bE:$export_symbols$allow_undefined_flag~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$realname.d/$soname' - fi - if test aix != "$with_aix_soname"; then - archive_expsym_cmds="$archive_expsym_cmds"'~$CC '$shared_flag_svr4' -o $output_objdir/$realname.d/$shared_archive_member_spec.o $libobjs $deplibs $wl-bnoentry '$compiler_flags_filtered'$wl-bE:$export_symbols$allow_undefined_flag~$STRIP -e $output_objdir/$realname.d/$shared_archive_member_spec.o~( func_echo_all "#! $soname($shared_archive_member_spec.o)"; if test shr_64 = "$shared_archive_member_spec"; then func_echo_all "# 64"; else func_echo_all "# 32"; fi; cat $export_symbols ) > $output_objdir/$realname.d/$shared_archive_member_spec.imp~$AR $AR_FLAGS $output_objdir/$soname $output_objdir/$realname.d/$shared_archive_member_spec.o $output_objdir/$realname.d/$shared_archive_member_spec.imp' - else - # used by -dlpreopen to get the symbols - archive_expsym_cmds="$archive_expsym_cmds"'~$MV $output_objdir/$realname.d/$soname $output_objdir' - fi - archive_expsym_cmds="$archive_expsym_cmds"'~$RM -r $output_objdir/$realname.d' - fi - fi - ;; - - amigaos*) - case $host_cpu in - powerpc) - # see comment about AmigaOS4 .so support - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds='' - ;; - m68k) - archive_cmds='$RM $output_objdir/a2ixlibrary.data~$ECHO "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$ECHO "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$ECHO "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$ECHO "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' - hardcode_libdir_flag_spec='-L$libdir' - hardcode_minus_L=yes - ;; - esac - ;; - - 
bsdi[45]*) - export_dynamic_flag_spec=-rdynamic - ;; - - cygwin* | mingw* | pw32* | cegcc*) - # When not using gcc, we currently assume that we are using - # Microsoft Visual C++ or Intel C++ Compiler. - # hardcode_libdir_flag_spec is actually meaningless, as there is - # no search path for DLLs. - case $cc_basename in - cl* | icl*) - # Native MSVC or ICC - hardcode_libdir_flag_spec=' ' - allow_undefined_flag=unsupported - always_export_symbols=yes - file_list_spec='@' - # Tell ltmain to make .lib files, not .a files. - libext=lib - # Tell ltmain to make .dll files, not .so files. - shrext_cmds=.dll - # FIXME: Setting linknames here is a bad hack. - archive_cmds='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames=' - archive_expsym_cmds='if test DEF = "`$SED -n -e '\''s/^[ ]*//'\'' -e '\''/^\(;.*\)*$/d'\'' -e '\''s/^\(EXPORTS\|LIBRARY\)\([ ].*\)*$/DEF/p'\'' -e q $export_symbols`" ; then - cp "$export_symbols" "$output_objdir/$soname.def"; - echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp"; - else - $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp; - fi~ - $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~ - linknames=' - # The linker will not automatically build a static lib if we build a DLL. 
- # _LT_TAGVAR(old_archive_from_new_cmds, )='true' - enable_shared_with_static_runtimes=yes - exclude_expsyms='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*' - export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS][ ]/s/.*[ ]\([^ ]*\)/\1,DATA/'\'' | $SED -e '\''/^[AITW][ ]/s/.*[ ]//'\'' | sort | uniq > $export_symbols' - # Don't use ranlib - old_postinstall_cmds='chmod 644 $oldlib' - postlink_cmds='lt_outputfile="@OUTPUT@"~ - lt_tool_outputfile="@TOOL_OUTPUT@"~ - case $lt_outputfile in - *.exe|*.EXE) ;; - *) - lt_outputfile=$lt_outputfile.exe - lt_tool_outputfile=$lt_tool_outputfile.exe - ;; - esac~ - if test : != "$MANIFEST_TOOL" && test -f "$lt_outputfile.manifest"; then - $MANIFEST_TOOL -manifest "$lt_tool_outputfile.manifest" -outputresource:"$lt_tool_outputfile" || exit 1; - $RM "$lt_outputfile.manifest"; - fi' - ;; - *) - # Assume MSVC and ICC wrapper - hardcode_libdir_flag_spec=' ' - allow_undefined_flag=unsupported - # Tell ltmain to make .lib files, not .a files. - libext=lib - # Tell ltmain to make .dll files, not .so files. - shrext_cmds=.dll - # FIXME: Setting linknames here is a bad hack. - archive_cmds='$CC -o $lib $libobjs $compiler_flags `func_echo_all "$deplibs" | $SED '\''s/ -lc$//'\''` -link -dll~linknames=' - # The linker will automatically build a .lib file if we build a DLL. - old_archive_from_new_cmds='true' - # FIXME: Should let the user specify the lib program. 
- old_archive_cmds='lib -OUT:$oldlib$oldobjs$old_deplibs' - enable_shared_with_static_runtimes=yes - ;; - esac - ;; - - darwin* | rhapsody*) - - - archive_cmds_need_lc=no - hardcode_direct=no - hardcode_automatic=yes - hardcode_shlibpath_var=unsupported - if test yes = "$lt_cv_ld_force_load"; then - whole_archive_flag_spec='`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience $wl-force_load,$conv\"; done; func_echo_all \"$new_convenience\"`' - - else - whole_archive_flag_spec='' - fi - link_all_deplibs=yes - allow_undefined_flag=$_lt_dar_allow_undefined - case $cc_basename in - ifort*|nagfor*) _lt_dar_can_shared=yes ;; - *) _lt_dar_can_shared=$GCC ;; - esac - if test yes = "$_lt_dar_can_shared"; then - output_verbose_link_cmd=func_echo_all - archive_cmds="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dsymutil" - module_cmds="\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dsymutil" - archive_expsym_cmds="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil" - module_expsym_cmds="sed -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil" - - else - ld_shlibs=no - fi - - ;; - - dgux*) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_libdir_flag_spec='-L$libdir' - hardcode_shlibpath_var=no - ;; - - # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor - # support. 
Future versions do this automatically, but an explicit c++rt0.o - # does not break anything, and helps significantly (at the cost of a little - # extra space). - freebsd2.2*) - archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' - hardcode_libdir_flag_spec='-R$libdir' - hardcode_direct=yes - hardcode_shlibpath_var=no - ;; - - # Unfortunately, older versions of FreeBSD 2 do not have this feature. - freebsd2.*) - archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct=yes - hardcode_minus_L=yes - hardcode_shlibpath_var=no - ;; - - # FreeBSD 3 and greater uses gcc -shared to do shared libraries. - freebsd* | dragonfly*) - archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' - hardcode_libdir_flag_spec='-R$libdir' - hardcode_direct=yes - hardcode_shlibpath_var=no - ;; - - hpux9*) - if test yes = "$GCC"; then - archive_cmds='$RM $output_objdir/$soname~$CC -shared $pic_flag $wl+b $wl$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test "x$output_objdir/$soname" = "x$lib" || mv $output_objdir/$soname $lib' - else - archive_cmds='$RM $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test "x$output_objdir/$soname" = "x$lib" || mv $output_objdir/$soname $lib' - fi - hardcode_libdir_flag_spec='$wl+b $wl$libdir' - hardcode_libdir_separator=: - hardcode_direct=yes - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. 
- hardcode_minus_L=yes - export_dynamic_flag_spec='$wl-E' - ;; - - hpux10*) - if test yes,no = "$GCC,$with_gnu_ld"; then - archive_cmds='$CC -shared $pic_flag $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $libobjs $deplibs $compiler_flags' - else - archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' - fi - if test no = "$with_gnu_ld"; then - hardcode_libdir_flag_spec='$wl+b $wl$libdir' - hardcode_libdir_separator=: - hardcode_direct=yes - hardcode_direct_absolute=yes - export_dynamic_flag_spec='$wl-E' - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. - hardcode_minus_L=yes - fi - ;; - - hpux11*) - if test yes,no = "$GCC,$with_gnu_ld"; then - case $host_cpu in - hppa*64*) - archive_cmds='$CC -shared $wl+h $wl$soname -o $lib $libobjs $deplibs $compiler_flags' - ;; - ia64*) - archive_cmds='$CC -shared $pic_flag $wl+h $wl$soname $wl+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' - ;; - *) - archive_cmds='$CC -shared $pic_flag $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $libobjs $deplibs $compiler_flags' - ;; - esac - else - case $host_cpu in - hppa*64*) - archive_cmds='$CC -b $wl+h $wl$soname -o $lib $libobjs $deplibs $compiler_flags' - ;; - ia64*) - archive_cmds='$CC -b $wl+h $wl$soname $wl+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' - ;; - *) - - # Older versions of the 11.00 compiler do not understand -b yet - # (HP92453-01 A.11.01.20 doesn't, HP92453-01 B.11.X.35175-35176.GP does) - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $CC understands -b" >&5 -$as_echo_n "checking if $CC understands -b... 
" >&6; } -if ${lt_cv_prog_compiler__b+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler__b=no - save_LDFLAGS=$LDFLAGS - LDFLAGS="$LDFLAGS -b" - echo "$lt_simple_link_test_code" > conftest.$ac_ext - if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then - # The linker can only warn and ignore the option if not recognized - # So say no if there are warnings - if test -s conftest.err; then - # Append any errors to the config.log. - cat conftest.err 1>&5 - $ECHO "$_lt_linker_boilerplate" | $SED '/^$/d' > conftest.exp - $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 - if diff conftest.exp conftest.er2 >/dev/null; then - lt_cv_prog_compiler__b=yes - fi - else - lt_cv_prog_compiler__b=yes - fi - fi - $RM -r conftest* - LDFLAGS=$save_LDFLAGS - -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler__b" >&5 -$as_echo "$lt_cv_prog_compiler__b" >&6; } - -if test yes = "$lt_cv_prog_compiler__b"; then - archive_cmds='$CC -b $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $libobjs $deplibs $compiler_flags' -else - archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' -fi - - ;; - esac - fi - if test no = "$with_gnu_ld"; then - hardcode_libdir_flag_spec='$wl+b $wl$libdir' - hardcode_libdir_separator=: - - case $host_cpu in - hppa*64*|ia64*) - hardcode_direct=no - hardcode_shlibpath_var=no - ;; - *) - hardcode_direct=yes - hardcode_direct_absolute=yes - export_dynamic_flag_spec='$wl-E' - - # hardcode_minus_L: Not really in the search PATH, - # but as the default location of the library. 
- hardcode_minus_L=yes - ;; - esac - fi - ;; - - irix5* | irix6* | nonstopux*) - if test yes = "$GCC"; then - archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' - # Try to use the -exported_symbol ld option, if it does not - # work, assume that -exports_file does not work either and - # implicitly export all symbols. - # This should be the same for all languages, so no per-tag cache variable. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the $host_os linker accepts -exported_symbol" >&5 -$as_echo_n "checking whether the $host_os linker accepts -exported_symbol... " >&6; } -if ${lt_cv_irix_exported_symbol+:} false; then : - $as_echo_n "(cached) " >&6 -else - save_LDFLAGS=$LDFLAGS - LDFLAGS="$LDFLAGS -shared $wl-exported_symbol ${wl}foo $wl-update_registry $wl/dev/null" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -int foo (void) { return 0; } -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - lt_cv_irix_exported_symbol=yes -else - lt_cv_irix_exported_symbol=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext - LDFLAGS=$save_LDFLAGS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_irix_exported_symbol" >&5 -$as_echo "$lt_cv_irix_exported_symbol" >&6; } - if test yes = "$lt_cv_irix_exported_symbol"; then - archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations $wl-exports_file $wl$export_symbols -o $lib' - fi - else - archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' - archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -exports_file $export_symbols -o $lib' - fi - archive_cmds_need_lc='no' - hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - hardcode_libdir_separator=: - inherit_rpath=yes - link_all_deplibs=yes - ;; - - linux*) - case $cc_basename in - tcc*) - # Fabrice Bellard et al's Tiny C Compiler - ld_shlibs=yes - archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' - ;; - esac - ;; - - netbsd*) - if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then - archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out - else - archive_cmds='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF - fi - hardcode_libdir_flag_spec='-R$libdir' - hardcode_direct=yes - hardcode_shlibpath_var=no - ;; - - newsos6) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct=yes - 
hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - hardcode_libdir_separator=: - hardcode_shlibpath_var=no - ;; - - *nto* | *qnx*) - ;; - - openbsd* | bitrig*) - if test -f /usr/libexec/ld.so; then - hardcode_direct=yes - hardcode_shlibpath_var=no - hardcode_direct_absolute=yes - if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then - archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags $wl-retain-symbols-file,$export_symbols' - hardcode_libdir_flag_spec='$wl-rpath,$libdir' - export_dynamic_flag_spec='$wl-E' - else - archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' - hardcode_libdir_flag_spec='$wl-rpath,$libdir' - fi - else - ld_shlibs=no - fi - ;; - - os2*) - hardcode_libdir_flag_spec='-L$libdir' - hardcode_minus_L=yes - allow_undefined_flag=unsupported - shrext_cmds=.dll - archive_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ - $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ - $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ - $ECHO EXPORTS >> $output_objdir/$libname.def~ - emxexp $libobjs | $SED /"_DLL_InitTerm"/d >> $output_objdir/$libname.def~ - $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ - emximp -o $lib $output_objdir/$libname.def' - archive_expsym_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ - $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ - $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ - $ECHO EXPORTS >> $output_objdir/$libname.def~ - prefix_cmds="$SED"~ - if test EXPORTS = "`$SED 1q $export_symbols`"; then - prefix_cmds="$prefix_cmds -e 1d"; - fi~ - prefix_cmds="$prefix_cmds -e \"s/^\(.*\)$/_\1/g\""~ - cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~ - 
$CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ - emximp -o $lib $output_objdir/$libname.def' - old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def' - enable_shared_with_static_runtimes=yes - file_list_spec='@' - ;; - - osf3*) - if test yes = "$GCC"; then - allow_undefined_flag=' $wl-expect_unresolved $wl\*' - archive_cmds='$CC -shared$allow_undefined_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' - else - allow_undefined_flag=' -expect_unresolved \*' - archive_cmds='$CC -shared$allow_undefined_flag $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' - fi - archive_cmds_need_lc='no' - hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - hardcode_libdir_separator=: - ;; - - osf4* | osf5*) # as osf3* with the addition of -msym flag - if test yes = "$GCC"; then - allow_undefined_flag=' $wl-expect_unresolved $wl\*' - archive_cmds='$CC -shared$allow_undefined_flag $pic_flag $libobjs $deplibs $compiler_flags $wl-msym $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' - hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' - else - allow_undefined_flag=' -expect_unresolved \*' - archive_cmds='$CC -shared$allow_undefined_flag $libobjs $deplibs $compiler_flags -msym -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' - archive_expsym_cmds='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; printf "%s\\n" "-hidden">> $lib.exp~ - $CC -shared$allow_undefined_flag $wl-input $wl$lib.exp 
$compiler_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && $ECHO "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib~$RM $lib.exp' - - # Both c and cxx compiler support -rpath directly - hardcode_libdir_flag_spec='-rpath $libdir' - fi - archive_cmds_need_lc='no' - hardcode_libdir_separator=: - ;; - - solaris*) - no_undefined_flag=' -z defs' - if test yes = "$GCC"; then - wlarc='$wl' - archive_cmds='$CC -shared $pic_flag $wl-z ${wl}text $wl-h $wl$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ - $CC -shared $pic_flag $wl-z ${wl}text $wl-M $wl$lib.exp $wl-h $wl$soname -o $lib $libobjs $deplibs $compiler_flags~$RM $lib.exp' - else - case `$CC -V 2>&1` in - *"Compilers 5.0"*) - wlarc='' - archive_cmds='$LD -G$allow_undefined_flag -h $soname -o $lib $libobjs $deplibs $linker_flags' - archive_expsym_cmds='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ - $LD -G$allow_undefined_flag -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$RM $lib.exp' - ;; - *) - wlarc='$wl' - archive_cmds='$CC -G$allow_undefined_flag -h $soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ - $CC -G$allow_undefined_flag -M $lib.exp -h $soname -o $lib $libobjs $deplibs $compiler_flags~$RM $lib.exp' - ;; - esac - fi - hardcode_libdir_flag_spec='-R$libdir' - hardcode_shlibpath_var=no - case $host_os in - solaris2.[0-5] | solaris2.[0-5].*) ;; - *) - # The compiler driver will combine and reorder linker options, - # but understands '-z linker_flag'. GCC discards it without '$wl', - # but is careful enough not to reorder. - # Supported since Solaris 2.6 (maybe 2.5.1?) 
- if test yes = "$GCC"; then - whole_archive_flag_spec='$wl-z ${wl}allextract$convenience $wl-z ${wl}defaultextract' - else - whole_archive_flag_spec='-z allextract$convenience -z defaultextract' - fi - ;; - esac - link_all_deplibs=yes - ;; - - sunos4*) - if test sequent = "$host_vendor"; then - # Use $CC to link under sequent, because it throws in some extra .o - # files that make .init and .fini sections work. - archive_cmds='$CC -G $wl-h $soname -o $lib $libobjs $deplibs $compiler_flags' - else - archive_cmds='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' - fi - hardcode_libdir_flag_spec='-L$libdir' - hardcode_direct=yes - hardcode_minus_L=yes - hardcode_shlibpath_var=no - ;; - - sysv4) - case $host_vendor in - sni) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct=yes # is this really true??? - ;; - siemens) - ## LD is ld it makes a PLAMLIB - ## CC just makes a GrossModule. - archive_cmds='$LD -G -o $lib $libobjs $deplibs $linker_flags' - reload_cmds='$CC -r -o $output$reload_objs' - hardcode_direct=no - ;; - motorola) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_direct=no #Motorola manual says yes, but my tests say they lie - ;; - esac - runpath_var='LD_RUN_PATH' - hardcode_shlibpath_var=no - ;; - - sysv4.3*) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_shlibpath_var=no - export_dynamic_flag_spec='-Bexport' - ;; - - sysv4*MP*) - if test -d /usr/nec; then - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_shlibpath_var=no - runpath_var=LD_RUN_PATH - hardcode_runpath_var=yes - ld_shlibs=yes - fi - ;; - - sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7* | sco3.2v5.0.[024]*) - no_undefined_flag='$wl-z,text' - archive_cmds_need_lc=no - hardcode_shlibpath_var=no - runpath_var='LD_RUN_PATH' - - if test yes = "$GCC"; then - archive_cmds='$CC -shared $wl-h,$soname -o 
$lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='$CC -shared $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - else - archive_cmds='$CC -G $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='$CC -G $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - fi - ;; - - sysv5* | sco3.2v5* | sco5v6*) - # Note: We CANNOT use -z defs as we might desire, because we do not - # link with -lc, and that would cause any symbols used from libc to - # always be unresolved, which means just about no library would - # ever link correctly. If we're not using GNU ld we use -z text - # though, which does catch some bad symbols but isn't as heavy-handed - # as -z defs. - no_undefined_flag='$wl-z,text' - allow_undefined_flag='$wl-z,nodefs' - archive_cmds_need_lc=no - hardcode_shlibpath_var=no - hardcode_libdir_flag_spec='$wl-R,$libdir' - hardcode_libdir_separator=':' - link_all_deplibs=yes - export_dynamic_flag_spec='$wl-Bexport' - runpath_var='LD_RUN_PATH' - - if test yes = "$GCC"; then - archive_cmds='$CC -shared $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='$CC -shared $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - else - archive_cmds='$CC -G $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds='$CC -G $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - fi - ;; - - uts4*) - archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' - hardcode_libdir_flag_spec='-L$libdir' - hardcode_shlibpath_var=no - ;; - - *) - ld_shlibs=no - ;; +# Some flags need to be propagated to the compiler or linker for good +# libtool support. +case $host in +ia64-*-hpux*) + # Find out what ABI is being produced by ac_compile, and set mode + # options accordingly. 
+ echo 'int i;' > conftest.$ac_ext + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; then + case `/usr/bin/file conftest.$ac_objext` in + *ELF-32*) + HPUX_IA64_MODE=32 + ;; + *ELF-64*) + HPUX_IA64_MODE=64 + ;; esac - - if test sni = "$host_vendor"; then - case $host in - sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) - export_dynamic_flag_spec='$wl-Blargedynsym' + fi + rm -rf conftest* + ;; +*-*-irix6*) + # Find out what ABI is being produced by ac_compile, and set linker + # options accordingly. + echo '#line '$LINENO' "configure"' > conftest.$ac_ext + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; then + if test yes = "$lt_cv_prog_gnu_ld"; then + case `/usr/bin/file conftest.$ac_objext` in + *32-bit*) + LD="${LD-ld} -melf32bsmip" + ;; + *N32*) + LD="${LD-ld} -melf32bmipn32" + ;; + *64-bit*) + LD="${LD-ld} -melf64bmip" ;; esac + else + case `/usr/bin/file conftest.$ac_objext` in + *32-bit*) + LD="${LD-ld} -32" + ;; + *N32*) + LD="${LD-ld} -n32" + ;; + *64-bit*) + LD="${LD-ld} -64" + ;; + esac fi fi + rm -rf conftest* + ;; -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ld_shlibs" >&5 -$as_echo "$ld_shlibs" >&6; } -test no = "$ld_shlibs" && can_build_shared=no - -with_gnu_ld=$with_gnu_ld - - - - - - - - - - - - - +mips64*-*linux*) + # Find out what ABI is being produced by ac_compile, and set linker + # options accordingly. + echo '#line '$LINENO' "configure"' > conftest.$ac_ext + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + test $ac_status = 0; }; then + emul=elf + case `/usr/bin/file conftest.$ac_objext` in + *32-bit*) + emul="${emul}32" + ;; + *64-bit*) + emul="${emul}64" + ;; + esac + case `/usr/bin/file conftest.$ac_objext` in + *MSB*) + emul="${emul}btsmip" + ;; + *LSB*) + emul="${emul}ltsmip" + ;; + esac + case `/usr/bin/file conftest.$ac_objext` in + *N32*) + emul="${emul}n32" + ;; + esac + LD="${LD-ld} -m $emul" + fi + rm -rf conftest* + ;; +x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \ +s390*-*linux*|s390*-*tpf*|sparc*-*linux*) + # Find out what ABI is being produced by ac_compile, and set linker + # options accordingly. Note that the listed cases only cover the + # situations where additional linker options are needed (such as when + # doing 32-bit compilation for a host where ld defaults to 64-bit, or + # vice versa); the common cases where no linker options are needed do + # not appear in the list. + echo 'int i;' > conftest.$ac_ext + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 + test $ac_status = 0; }; then + case `/usr/bin/file conftest.o` in + *32-bit*) + case $host in + x86_64-*kfreebsd*-gnu) + LD="${LD-ld} -m elf_i386_fbsd" + ;; + x86_64-*linux*) + case `/usr/bin/file conftest.o` in + *x86-64*) + LD="${LD-ld} -m elf32_x86_64" + ;; + *) + LD="${LD-ld} -m elf_i386" + ;; + esac + ;; + powerpc64le-*linux*) + LD="${LD-ld} -m elf32lppclinux" + ;; + powerpc64-*linux*) + LD="${LD-ld} -m elf32ppclinux" + ;; + s390x-*linux*) + LD="${LD-ld} -m elf_s390" + ;; + sparc64-*linux*) + LD="${LD-ld} -m elf32_sparc" + ;; + esac + ;; + *64-bit*) + case $host in + x86_64-*kfreebsd*-gnu) + LD="${LD-ld} -m elf_x86_64_fbsd" + ;; + x86_64-*linux*) + LD="${LD-ld} -m elf_x86_64" + ;; + powerpcle-*linux*) + LD="${LD-ld} -m elf64lppc" + ;; + powerpc-*linux*) + LD="${LD-ld} -m elf64ppc" + ;; + s390*-*linux*|s390*-*tpf*) + LD="${LD-ld} -m elf64_s390" + ;; + sparc*-*linux*) + LD="${LD-ld} -m elf64_sparc" + ;; + esac + ;; + esac + fi + rm -rf conftest* + ;; -# -# Do we need to explicitly link libc? -# -case "x$archive_cmds_need_lc" in -x|xyes) - # Assume -lc should be added - archive_cmds_need_lc=yes +*-*-sco3.2v5*) + # On SCO OpenServer 5, we need -belf to get full-featured binaries. + SAVE_CFLAGS=$CFLAGS + CFLAGS="$CFLAGS -belf" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the C compiler needs -belf" >&5 +printf %s "checking whether the C compiler needs -belf... " >&6; } +if test ${lt_cv_cc_needs_belf+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu - if test yes,yes = "$GCC,$enable_shared"; then - case $archive_cmds in - *'~'*) - # FIXME: we may have to deal with multi-command sequences. 
- ;; - '$CC '*) - # Test whether the compiler implicitly links with -lc since on some - # systems, -lgcc has to come before -lc. If gcc already passes -lc - # to ld, don't add -lc before -lgcc. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether -lc should be explicitly linked in" >&5 -$as_echo_n "checking whether -lc should be explicitly linked in... " >&6; } -if ${lt_cv_archive_cmds_need_lc+:} false; then : - $as_echo_n "(cached) " >&6 -else - $RM conftest* - echo "$lt_simple_compile_test_code" > conftest.$ac_ext + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } 2>conftest.err; then - soname=conftest - lib=conftest - libobjs=conftest.$ac_objext - deplibs= - wl=$lt_prog_compiler_wl - pic_flag=$lt_prog_compiler_pic - compiler_flags=-v - linker_flags=-v - verstring= - output_objdir=. - libname=conftest - lt_save_allow_undefined_flag=$allow_undefined_flag - allow_undefined_flag= - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$archive_cmds 2\>\&1 \| $GREP \" -lc \" \>/dev/null 2\>\&1\""; } >&5 - (eval $archive_cmds 2\>\&1 \| $GREP \" -lc \" \>/dev/null 2\>\&1) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? 
= $ac_status" >&5 - test $ac_status = 0; } - then - lt_cv_archive_cmds_need_lc=no - else - lt_cv_archive_cmds_need_lc=yes - fi - allow_undefined_flag=$lt_save_allow_undefined_flag - else - cat conftest.err 1>&5 - fi - $RM conftest* +int +main (void) +{ + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + lt_cv_cc_needs_belf=yes +else $as_nop + lt_cv_cc_needs_belf=no fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_archive_cmds_need_lc" >&5 -$as_echo "$lt_cv_archive_cmds_need_lc" >&6; } - archive_cmds_need_lc=$lt_cv_archive_cmds_need_lc +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext + ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu + +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_cc_needs_belf" >&5 +printf "%s\n" "$lt_cv_cc_needs_belf" >&6; } + if test yes != "$lt_cv_cc_needs_belf"; then + # this is probably gcc 2.8.0, egcs 1.0 or newer; no need for -belf + CFLAGS=$SAVE_CFLAGS + fi + ;; +*-*solaris*) + # Find out what ABI is being produced by ac_compile, and set linker + # options accordingly. + echo 'int i;' > conftest.$ac_ext + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + (eval $ac_compile) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; }; then + case `/usr/bin/file conftest.o` in + *64-bit*) + case $lt_cv_prog_gnu_ld in + yes*) + case $host in + i?86-*-solaris*|x86_64-*-solaris*) + LD="${LD-ld} -m elf_x86_64" + ;; + sparc*-*-solaris*) + LD="${LD-ld} -m elf64_sparc" + ;; + esac + # GNU ld 2.21 introduced _sol2 emulations. Use them if available. 
+ if ${LD-ld} -V | grep _sol2 >/dev/null 2>&1; then + LD=${LD-ld}_sol2 + fi + ;; + *) + if ${LD-ld} -64 -r -o conftest2.o conftest.o >/dev/null 2>&1; then + LD="${LD-ld} -64" + fi + ;; + esac ;; esac fi + rm -rf conftest* ;; esac +need_locks=$enable_libtool_lock +if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}mt", so it can be a program name with args. +set dummy ${ac_tool_prefix}mt; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_MANIFEST_TOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$MANIFEST_TOOL"; then + ac_cv_prog_MANIFEST_TOOL="$MANIFEST_TOOL" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_MANIFEST_TOOL="${ac_tool_prefix}mt" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +MANIFEST_TOOL=$ac_cv_prog_MANIFEST_TOOL +if test -n "$MANIFEST_TOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $MANIFEST_TOOL" >&5 +printf "%s\n" "$MANIFEST_TOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +fi +if test -z "$ac_cv_prog_MANIFEST_TOOL"; then + ac_ct_MANIFEST_TOOL=$MANIFEST_TOOL + # Extract the first word of "mt", so it can be a program name with args. +set dummy mt; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_ac_ct_MANIFEST_TOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_MANIFEST_TOOL"; then + ac_cv_prog_ac_ct_MANIFEST_TOOL="$ac_ct_MANIFEST_TOOL" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_MANIFEST_TOOL="mt" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +ac_ct_MANIFEST_TOOL=$ac_cv_prog_ac_ct_MANIFEST_TOOL +if test -n "$ac_ct_MANIFEST_TOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_MANIFEST_TOOL" >&5 +printf "%s\n" "$ac_ct_MANIFEST_TOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + if test "x$ac_ct_MANIFEST_TOOL" = x; then + MANIFEST_TOOL=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + MANIFEST_TOOL=$ac_ct_MANIFEST_TOOL + fi +else + MANIFEST_TOOL="$ac_cv_prog_MANIFEST_TOOL" +fi +test -z "$MANIFEST_TOOL" && MANIFEST_TOOL=mt +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $MANIFEST_TOOL is a manifest tool" >&5 +printf %s "checking if $MANIFEST_TOOL is a manifest tool... " >&6; } +if test ${lt_cv_path_mainfest_tool+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_path_mainfest_tool=no + echo "$as_me:$LINENO: $MANIFEST_TOOL '-?'" >&5 + $MANIFEST_TOOL '-?' 
2>conftest.err > conftest.out + cat conftest.err >&5 + if $GREP 'Manifest Tool' conftest.out > /dev/null; then + lt_cv_path_mainfest_tool=yes + fi + rm -f conftest* +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_path_mainfest_tool" >&5 +printf "%s\n" "$lt_cv_path_mainfest_tool" >&6; } +if test yes != "$lt_cv_path_mainfest_tool"; then + MANIFEST_TOOL=: +fi + case $host_os in + rhapsody* | darwin*) + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}dsymutil", so it can be a program name with args. +set dummy ${ac_tool_prefix}dsymutil; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_DSYMUTIL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$DSYMUTIL"; then + ac_cv_prog_DSYMUTIL="$DSYMUTIL" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_DSYMUTIL="${ac_tool_prefix}dsymutil" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +DSYMUTIL=$ac_cv_prog_DSYMUTIL +if test -n "$DSYMUTIL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $DSYMUTIL" >&5 +printf "%s\n" "$DSYMUTIL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +fi +if test -z "$ac_cv_prog_DSYMUTIL"; then + ac_ct_DSYMUTIL=$DSYMUTIL + # Extract the first word of "dsymutil", so it can be a program name with args. +set dummy dsymutil; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_ac_ct_DSYMUTIL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_DSYMUTIL"; then + ac_cv_prog_ac_ct_DSYMUTIL="$ac_ct_DSYMUTIL" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_DSYMUTIL="dsymutil" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +ac_ct_DSYMUTIL=$ac_cv_prog_ac_ct_DSYMUTIL +if test -n "$ac_ct_DSYMUTIL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DSYMUTIL" >&5 +printf "%s\n" "$ac_ct_DSYMUTIL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + if test "x$ac_ct_DSYMUTIL" = x; then + DSYMUTIL=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + DSYMUTIL=$ac_ct_DSYMUTIL + fi +else + DSYMUTIL="$ac_cv_prog_DSYMUTIL" +fi + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}nmedit", so it can be a program name with args. +set dummy ${ac_tool_prefix}nmedit; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_NMEDIT+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$NMEDIT"; then + ac_cv_prog_NMEDIT="$NMEDIT" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_NMEDIT="${ac_tool_prefix}nmedit" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +NMEDIT=$ac_cv_prog_NMEDIT +if test -n "$NMEDIT"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $NMEDIT" >&5 +printf "%s\n" "$NMEDIT" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +fi +if test -z "$ac_cv_prog_NMEDIT"; then + ac_ct_NMEDIT=$NMEDIT + # Extract the first word of "nmedit", so it can be a program name with args. +set dummy nmedit; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_NMEDIT+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_NMEDIT"; then + ac_cv_prog_ac_ct_NMEDIT="$ac_ct_NMEDIT" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_NMEDIT="nmedit" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +ac_ct_NMEDIT=$ac_cv_prog_ac_ct_NMEDIT +if test -n "$ac_ct_NMEDIT"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_NMEDIT" >&5 +printf "%s\n" "$ac_ct_NMEDIT" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + if test "x$ac_ct_NMEDIT" = x; then + NMEDIT=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + NMEDIT=$ac_ct_NMEDIT + fi +else + NMEDIT="$ac_cv_prog_NMEDIT" +fi + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}lipo", so it can be a program name with args. +set dummy ${ac_tool_prefix}lipo; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_LIPO+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$LIPO"; then + ac_cv_prog_LIPO="$LIPO" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_LIPO="${ac_tool_prefix}lipo" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +LIPO=$ac_cv_prog_LIPO +if test -n "$LIPO"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $LIPO" >&5 +printf "%s\n" "$LIPO" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +fi +if test -z "$ac_cv_prog_LIPO"; then + ac_ct_LIPO=$LIPO + # Extract the first word of "lipo", so it can be a program name with args. +set dummy lipo; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_LIPO+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_LIPO"; then + ac_cv_prog_ac_ct_LIPO="$ac_ct_LIPO" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_LIPO="lipo" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +ac_ct_LIPO=$ac_cv_prog_ac_ct_LIPO +if test -n "$ac_ct_LIPO"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_LIPO" >&5 +printf "%s\n" "$ac_ct_LIPO" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + if test "x$ac_ct_LIPO" = x; then + LIPO=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + LIPO=$ac_ct_LIPO + fi +else + LIPO="$ac_cv_prog_LIPO" +fi + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}otool", so it can be a program name with args. +set dummy ${ac_tool_prefix}otool; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_OTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$OTOOL"; then + ac_cv_prog_OTOOL="$OTOOL" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_OTOOL="${ac_tool_prefix}otool" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +OTOOL=$ac_cv_prog_OTOOL +if test -n "$OTOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $OTOOL" >&5 +printf "%s\n" "$OTOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +fi +if test -z "$ac_cv_prog_OTOOL"; then + ac_ct_OTOOL=$OTOOL + # Extract the first word of "otool", so it can be a program name with args. +set dummy otool; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_OTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_OTOOL"; then + ac_cv_prog_ac_ct_OTOOL="$ac_ct_OTOOL" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_OTOOL="otool" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +ac_ct_OTOOL=$ac_cv_prog_ac_ct_OTOOL +if test -n "$ac_ct_OTOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OTOOL" >&5 +printf "%s\n" "$ac_ct_OTOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + if test "x$ac_ct_OTOOL" = x; then + OTOOL=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + OTOOL=$ac_ct_OTOOL + fi +else + OTOOL="$ac_cv_prog_OTOOL" +fi + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}otool64", so it can be a program name with args. +set dummy ${ac_tool_prefix}otool64; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_OTOOL64+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$OTOOL64"; then + ac_cv_prog_OTOOL64="$OTOOL64" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_OTOOL64="${ac_tool_prefix}otool64" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +OTOOL64=$ac_cv_prog_OTOOL64 +if test -n "$OTOOL64"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $OTOOL64" >&5 +printf "%s\n" "$OTOOL64" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +fi +if test -z "$ac_cv_prog_OTOOL64"; then + ac_ct_OTOOL64=$OTOOL64 + # Extract the first word of "otool64", so it can be a program name with args. +set dummy otool64; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_OTOOL64+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_OTOOL64"; then + ac_cv_prog_ac_ct_OTOOL64="$ac_ct_OTOOL64" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_OTOOL64="otool64" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS +fi +fi +ac_ct_OTOOL64=$ac_cv_prog_ac_ct_OTOOL64 +if test -n "$ac_ct_OTOOL64"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OTOOL64" >&5 +printf "%s\n" "$ac_ct_OTOOL64" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + if test "x$ac_ct_OTOOL64" = x; then + OTOOL64=":" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + OTOOL64=$ac_ct_OTOOL64 + fi +else + OTOOL64="$ac_cv_prog_OTOOL64" +fi @@ -12504,942 +8597,563 @@ esac - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking dynamic linker characteristics" >&5 -$as_echo_n "checking dynamic linker characteristics... 
" >&6; } - -if test yes = "$GCC"; then - case $host_os in - darwin*) lt_awk_arg='/^libraries:/,/LR/' ;; - *) lt_awk_arg='/^libraries:/' ;; - esac - case $host_os in - mingw* | cegcc*) lt_sed_strip_eq='s|=\([A-Za-z]:\)|\1|g' ;; - *) lt_sed_strip_eq='s|=/|/|g' ;; - esac - lt_search_path_spec=`$CC -print-search-dirs | awk $lt_awk_arg | $SED -e "s/^libraries://" -e $lt_sed_strip_eq` - case $lt_search_path_spec in - *\;*) - # if the path contains ";" then we assume it to be the separator - # otherwise default to the standard path separator (i.e. ":") - it is - # assumed that no part of a normal pathname contains ";" but that should - # okay in the real world where ";" in dirpaths is itself problematic. - lt_search_path_spec=`$ECHO "$lt_search_path_spec" | $SED 's/;/ /g'` - ;; - *) - lt_search_path_spec=`$ECHO "$lt_search_path_spec" | $SED "s/$PATH_SEPARATOR/ /g"` - ;; - esac - # Ok, now we have the path, separated by spaces, we can step through it - # and add multilib dir if necessary... - lt_tmp_lt_search_path_spec= - lt_multi_os_dir=/`$CC $CPPFLAGS $CFLAGS $LDFLAGS -print-multi-os-directory 2>/dev/null` - # ...but if some path component already ends with the multilib dir we assume - # that all is fine and trust -print-search-dirs as is (GCC 4.2? or newer). 
- case "$lt_multi_os_dir; $lt_search_path_spec " in - "/; "* | "/.; "* | "/./; "* | *"$lt_multi_os_dir "* | *"$lt_multi_os_dir/ "*) - lt_multi_os_dir= - ;; - esac - for lt_sys_path in $lt_search_path_spec; do - if test -d "$lt_sys_path$lt_multi_os_dir"; then - lt_tmp_lt_search_path_spec="$lt_tmp_lt_search_path_spec $lt_sys_path$lt_multi_os_dir" - elif test -n "$lt_multi_os_dir"; then - test -d "$lt_sys_path" && \ - lt_tmp_lt_search_path_spec="$lt_tmp_lt_search_path_spec $lt_sys_path" - fi - done - lt_search_path_spec=`$ECHO "$lt_tmp_lt_search_path_spec" | awk ' -BEGIN {RS = " "; FS = "/|\n";} { - lt_foo = ""; - lt_count = 0; - for (lt_i = NF; lt_i > 0; lt_i--) { - if ($lt_i != "" && $lt_i != ".") { - if ($lt_i == "..") { - lt_count++; - } else { - if (lt_count == 0) { - lt_foo = "/" $lt_i lt_foo; - } else { - lt_count--; - } - } - } - } - if (lt_foo != "") { lt_freq[lt_foo]++; } - if (lt_freq[lt_foo] == 1) { print lt_foo; } -}'` - # AWK program above erroneously prepends '/' to C:/dos/paths - # for these hosts. 
- case $host_os in - mingw* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\ - $SED 's|/\([A-Za-z]:\)|\1|g'` ;; - esac - sys_lib_search_path_spec=`$ECHO "$lt_search_path_spec" | $lt_NL2SP` -else - sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" -fi -library_names_spec= -libname_spec='lib$name' -soname_spec= -shrext_cmds=.so -postinstall_cmds= -postuninstall_cmds= -finish_cmds= -finish_eval= -shlibpath_var= -shlibpath_overrides_runpath=unknown -version_type=none -dynamic_linker="$host_os ld.so" -sys_lib_dlsearch_path_spec="/lib /usr/lib" -need_lib_prefix=unknown -hardcode_into_libs=no - -# when you set need_version to no, make sure it does not cause -set_version -# flags to be left without arguments -need_version=unknown - - - -case $host_os in -aix3*) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname.a' - shlibpath_var=LIBPATH - - # AIX 3 has no versioning support, so we append a major version to the name. - soname_spec='$libname$release$shared_ext$major' - ;; - -aix[4-9]*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - hardcode_into_libs=yes - if test ia64 = "$host_cpu"; then - # AIX 5 supports IA64 - library_names_spec='$libname$release$shared_ext$major $libname$release$shared_ext$versuffix $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - else - # With GCC up to 2.95.x, collect2 would create an import file - # for dependence libraries. The import file would start with - # the line '#! .'. This would cause the generated library to - # depend on '.', always an invalid library. This was fixed in - # development snapshots of GCC prior to 3.0. 
- case $host_os in - aix4 | aix4.[01] | aix4.[01].*) - if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' - echo ' yes ' - echo '#endif'; } | $CC -E - | $GREP yes > /dev/null; then - : - else - can_build_shared=no - fi - ;; - esac - # Using Import Files as archive members, it is possible to support - # filename-based versioning of shared library archives on AIX. While - # this would work for both with and without runtime linking, it will - # prevent static linking of such archives. So we do filename-based - # shared library versioning with .so extension only, which is used - # when both runtime linking and shared linking is enabled. - # Unfortunately, runtime linking may impact performance, so we do - # not want this to be the default eventually. Also, we use the - # versioned .so libs for executables only if there is the -brtl - # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only. - # To allow for filename-based versioning support, we need to create - # libNAME.so.V as an archive file, containing: - # *) an Import File, referring to the versioned filename of the - # archive as well as the shared archive member, telling the - # bitwidth (32 or 64) of that shared object, and providing the - # list of exported symbols of that shared object, eventually - # decorated with the 'weak' keyword - # *) the shared object with the F_LOADONLY flag set, to really avoid - # it being seen by the linker. - # At run time we better use the real file rather than another symlink, - # but for link time we create the symlink libNAME.so -> libNAME.so.V - - case $with_aix_soname,$aix_use_runtimelinking in - # AIX (on Power*) has no versioning support, so currently we cannot hardcode correct - # soname into executable. Probably we can add versioning support to - # collect2, so additional links can be useful in future. 
- aix,yes) # traditional libtool - dynamic_linker='AIX unversionable lib.so' - # If using run time linking (on AIX 4.2 or later) use lib.so - # instead of lib.a to let people know that these are not - # typical AIX shared libraries. - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - ;; - aix,no) # traditional AIX only - dynamic_linker='AIX lib.a(lib.so.V)' - # We preserve .a as extension for shared libraries through AIX4.2 - # and later when we are not doing run time linking. - library_names_spec='$libname$release.a $libname.a' - soname_spec='$libname$release$shared_ext$major' - ;; - svr4,*) # full svr4 only - dynamic_linker="AIX lib.so.V($shared_archive_member_spec.o)" - library_names_spec='$libname$release$shared_ext$major $libname$shared_ext' - # We do not specify a path in Import Files, so LIBPATH fires. - shlibpath_overrides_runpath=yes - ;; - *,yes) # both, prefer svr4 - dynamic_linker="AIX lib.so.V($shared_archive_member_spec.o), lib.a(lib.so.V)" - library_names_spec='$libname$release$shared_ext$major $libname$shared_ext' - # unpreferred sharedlib libNAME.a needs extra handling - postinstall_cmds='test -n "$linkname" || linkname="$realname"~func_stripname "" ".so" "$linkname"~$install_shared_prog "$dir/$func_stripname_result.$libext" "$destdir/$func_stripname_result.$libext"~test -z "$tstripme" || test -z "$striplib" || $striplib "$destdir/$func_stripname_result.$libext"' - postuninstall_cmds='for n in $library_names $old_library; do :; done~func_stripname "" ".so" "$n"~test "$func_stripname_result" = "$n" || func_append rmfiles " $odir/$func_stripname_result.$libext"' - # We do not specify a path in Import Files, so LIBPATH fires. 
- shlibpath_overrides_runpath=yes - ;; - *,no) # both, prefer aix - dynamic_linker="AIX lib.a(lib.so.V), lib.so.V($shared_archive_member_spec.o)" - library_names_spec='$libname$release.a $libname.a' - soname_spec='$libname$release$shared_ext$major' - # unpreferred sharedlib libNAME.so.V and symlink libNAME.so need extra handling - postinstall_cmds='test -z "$dlname" || $install_shared_prog $dir/$dlname $destdir/$dlname~test -z "$tstripme" || test -z "$striplib" || $striplib $destdir/$dlname~test -n "$linkname" || linkname=$realname~func_stripname "" ".a" "$linkname"~(cd "$destdir" && $LN_S -f $dlname $func_stripname_result.so)' - postuninstall_cmds='test -z "$dlname" || func_append rmfiles " $odir/$dlname"~for n in $old_library $library_names; do :; done~func_stripname "" ".a" "$n"~func_append rmfiles " $odir/$func_stripname_result.so"' - ;; - esac - shlibpath_var=LIBPATH - fi - ;; - -amigaos*) - case $host_cpu in - powerpc) - # Since July 2007 AmigaOS4 officially supports .so libraries. - # When compiling the executable, add -use-dynld -Lsobjs: to the compileline. - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - ;; - m68k) - library_names_spec='$libname.ixlibrary $libname.a' - # Create ${libname}_ixlibrary.a entries in /sys/libs. 
- finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`func_echo_all "$lib" | $SED '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; $RM /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' - ;; - esac - ;; - -beos*) - library_names_spec='$libname$shared_ext' - dynamic_linker="$host_os ld.so" - shlibpath_var=LIBRARY_PATH - ;; - -bsdi[45]*) - version_type=linux # correct to gnu/linux during the next big refactor - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' - shlibpath_var=LD_LIBRARY_PATH - sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" - sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" - # the default ld.so.conf also contains /usr/contrib/lib and - # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow - # libtool to hard-code these into programs - ;; - -cygwin* | mingw* | pw32* | cegcc*) - version_type=windows - shrext_cmds=.dll - need_version=no - need_lib_prefix=no - - case $GCC,$cc_basename in - yes,*) - # gcc - library_names_spec='$libname.dll.a' - # DLL is installed to $(libdir)/../bin by postinstall_cmds - postinstall_cmds='base_file=`basename \$file`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname~ - chmod a+x \$dldir/$dlname~ - if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then - eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; - fi' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $RM \$dlpath' - shlibpath_overrides_runpath=yes - - case $host_os in - cygwin*) - # Cygwin DLLs use 'cyg' prefix rather than 'lib' - soname_spec='`echo $libname | sed -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - - sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/lib/w32api" - ;; - mingw* | cegcc*) - # MinGW DLLs use traditional 'lib' prefix - soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - ;; - pw32*) - # pw32 DLLs use 'pw' prefix rather than 'lib' - library_names_spec='`echo $libname | sed -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - ;; - esac - dynamic_linker='Win32 ld.exe' - ;; - - *,cl* | *,icl*) - # Native MSVC or ICC - libname_spec='$name' - soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - library_names_spec='$libname.dll.lib' - - case $build_os in - mingw*) - sys_lib_search_path_spec= - lt_save_ifs=$IFS - IFS=';' - for lt_path in $LIB - do - IFS=$lt_save_ifs - # Let DOS variable expansion print the short 8.3 style file name. - lt_path=`cd "$lt_path" 2>/dev/null && cmd //C "for %i in (".") do @echo %~si"` - sys_lib_search_path_spec="$sys_lib_search_path_spec $lt_path" - done - IFS=$lt_save_ifs - # Convert to MSYS style. - sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | sed -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'` - ;; - cygwin*) - # Convert to unix form, then to dos form, then back to unix form - # but this time dos style (no spaces!) so that the unix form looks - # like /cygdrive/c/PROGRA~1:/cygdr... 
- sys_lib_search_path_spec=`cygpath --path --unix "$LIB"` - sys_lib_search_path_spec=`cygpath --path --dos "$sys_lib_search_path_spec" 2>/dev/null` - sys_lib_search_path_spec=`cygpath --path --unix "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` - ;; - *) - sys_lib_search_path_spec=$LIB - if $ECHO "$sys_lib_search_path_spec" | $GREP ';[c-zC-Z]:/' >/dev/null; then - # It is most probably a Windows format PATH. - sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` - else - sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` + + + + + + + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -single_module linker flag" >&5 +printf %s "checking for -single_module linker flag... " >&6; } +if test ${lt_cv_apple_cc_single_mod+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_apple_cc_single_mod=no + if test -z "$LT_MULTI_MODULE"; then + # By default we will add the -single_module flag. You can override + # by either setting the environment variable LT_MULTI_MODULE + # non-empty at configure time, or by adding -multi_module to the + # link flags. + rm -rf libconftest.dylib* + echo "int foo(void){return 1;}" > conftest.c + echo "$LTCC $LTCFLAGS $LDFLAGS -o libconftest.dylib \ +-dynamiclib -Wl,-single_module conftest.c" >&5 + $LTCC $LTCFLAGS $LDFLAGS -o libconftest.dylib \ + -dynamiclib -Wl,-single_module conftest.c 2>conftest.err + _lt_result=$? + # If there is a non-empty error log, and "single_module" + # appears in it, assume the flag caused a linker warning + if test -s conftest.err && $GREP single_module conftest.err; then + cat conftest.err >&5 + # Otherwise, if the output was created with a 0 exit code from + # the compiler, it worked. 
+ elif test -f libconftest.dylib && test 0 = "$_lt_result"; then + lt_cv_apple_cc_single_mod=yes + else + cat conftest.err >&5 + fi + rm -rf libconftest.dylib* + rm -f conftest.* fi - # FIXME: find the short name or the path components, as spaces are - # common. (e.g. "Program Files" -> "PROGRA~1") - ;; - esac +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_apple_cc_single_mod" >&5 +printf "%s\n" "$lt_cv_apple_cc_single_mod" >&6; } - # DLL is installed to $(libdir)/../bin by postinstall_cmds - postinstall_cmds='base_file=`basename \$file`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $RM \$dlpath' - shlibpath_overrides_runpath=yes - dynamic_linker='Win32 link.exe' - ;; + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -exported_symbols_list linker flag" >&5 +printf %s "checking for -exported_symbols_list linker flag... " >&6; } +if test ${lt_cv_ld_exported_symbols_list+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_ld_exported_symbols_list=no + save_LDFLAGS=$LDFLAGS + echo "_main" > conftest.sym + LDFLAGS="$LDFLAGS -Wl,-exported_symbols_list,conftest.sym" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ - *) - # Assume MSVC and ICC wrapper - library_names_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext $libname.lib' - dynamic_linker='Win32 ld.exe' - ;; - esac - # FIXME: first we should search . 
and the directory the executable is in - shlibpath_var=PATH - ;; +int +main (void) +{ -darwin* | rhapsody*) - dynamic_linker="$host_os dyld" - version_type=darwin - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$major$shared_ext $libname$shared_ext' - soname_spec='$libname$release$major$shared_ext' - shlibpath_overrides_runpath=yes - shlibpath_var=DYLD_LIBRARY_PATH - shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + lt_cv_ld_exported_symbols_list=yes +else $as_nop + lt_cv_ld_exported_symbols_list=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext + LDFLAGS=$save_LDFLAGS - sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/local/lib" - sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' - ;; +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_exported_symbols_list" >&5 +printf "%s\n" "$lt_cv_ld_exported_symbols_list" >&6; } -dgux*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - ;; + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for -force_load linker flag" >&5 +printf %s "checking for -force_load linker flag... 
" >&6; } +if test ${lt_cv_ld_force_load+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_ld_force_load=no + cat > conftest.c << _LT_EOF +int forced_loaded() { return 2;} +_LT_EOF + echo "$LTCC $LTCFLAGS -c -o conftest.o conftest.c" >&5 + $LTCC $LTCFLAGS -c -o conftest.o conftest.c 2>&5 + echo "$AR $AR_FLAGS libconftest.a conftest.o" >&5 + $AR $AR_FLAGS libconftest.a conftest.o 2>&5 + echo "$RANLIB libconftest.a" >&5 + $RANLIB libconftest.a 2>&5 + cat > conftest.c << _LT_EOF +int main() { return 0;} +_LT_EOF + echo "$LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a" >&5 + $LTCC $LTCFLAGS $LDFLAGS -o conftest conftest.c -Wl,-force_load,./libconftest.a 2>conftest.err + _lt_result=$? + if test -s conftest.err && $GREP force_load conftest.err; then + cat conftest.err >&5 + elif test -f conftest && test 0 = "$_lt_result" && $GREP forced_load conftest >/dev/null 2>&1; then + lt_cv_ld_force_load=yes + else + cat conftest.err >&5 + fi + rm -f conftest.err libconftest.a conftest conftest.c + rm -rf conftest.dSYM -freebsd* | dragonfly*) - # DragonFly does not have aout. When/if they implement a new - # versioning mechanism, adjust this. 
- if test -x /usr/bin/objformat; then - objformat=`/usr/bin/objformat` - else +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_ld_force_load" >&5 +printf "%s\n" "$lt_cv_ld_force_load" >&6; } case $host_os in - freebsd[23].*) objformat=aout ;; - *) objformat=elf ;; - esac - fi - version_type=freebsd-$objformat - case $version_type in - freebsd-elf*) - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - need_version=no - need_lib_prefix=no - ;; - freebsd-*) - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - need_version=yes - ;; - esac - shlibpath_var=LD_LIBRARY_PATH - case $host_os in - freebsd2.*) - shlibpath_overrides_runpath=yes - ;; - freebsd3.[01]* | freebsdelf3.[01]*) - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; - freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ - freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - ;; - *) # from 4.6 on, and DragonFly - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes + rhapsody* | darwin1.[012]) + _lt_dar_allow_undefined='$wl-undefined ${wl}suppress' ;; + darwin1.*) + _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;; + darwin*) # darwin 5.x on + # if running on 10.5 or later, the deployment target defaults + # to the OS version, if on x86, and 10.4, the deployment + # target defaults to 10.4. Don't you love it? 
+ case ${MACOSX_DEPLOYMENT_TARGET-10.0},$host in + 10.0,*86*-darwin8*|10.0,*-darwin[91]*) + _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;; + 10.[012][,.]*) + _lt_dar_allow_undefined='$wl-flat_namespace $wl-undefined ${wl}suppress' ;; + 10.*) + _lt_dar_allow_undefined='$wl-undefined ${wl}dynamic_lookup' ;; + esac ;; esac - ;; - -haiku*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - dynamic_linker="$host_os runtime_loader" - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LIBRARY_PATH - shlibpath_overrides_runpath=no - sys_lib_dlsearch_path_spec='/boot/home/config/lib /boot/common/lib /boot/system/lib' - hardcode_into_libs=yes - ;; - -hpux9* | hpux10* | hpux11*) - # Give a soname corresponding to the major version so that dld.sl refuses to - # link against other versions. - version_type=sunos - need_lib_prefix=no - need_version=no - case $host_cpu in - ia64*) - shrext_cmds='.so' - hardcode_into_libs=yes - dynamic_linker="$host_os dld.so" - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
- library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - if test 32 = "$HPUX_IA64_MODE"; then - sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" - sys_lib_dlsearch_path_spec=/usr/lib/hpux32 + if test yes = "$lt_cv_apple_cc_single_mod"; then + _lt_dar_single_mod='$single_module' + fi + if test yes = "$lt_cv_ld_exported_symbols_list"; then + _lt_dar_export_syms=' $wl-exported_symbols_list,$output_objdir/$libname-symbols.expsym' else - sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" - sys_lib_dlsearch_path_spec=/usr/lib/hpux64 + _lt_dar_export_syms='~$NMEDIT -s $output_objdir/$libname-symbols.expsym $lib' + fi + if test : != "$DSYMUTIL" && test no = "$lt_cv_ld_force_load"; then + _lt_dsymutil='~$DSYMUTIL $lib || :' + else + _lt_dsymutil= fi - ;; - hppa*64*) - shrext_cmds='.sl' - hardcode_into_libs=yes - dynamic_linker="$host_os dld.sl" - shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH - shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - ;; - *) - shrext_cmds='.sl' - dynamic_linker="$host_os dld.sl" - shlibpath_var=SHLIB_PATH - shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' ;; esac - # HP-UX runs *really* slowly unless shared libraries are mode 555, ... 
- postinstall_cmds='chmod 555 $lib' - # or fails outright, so override atomically: - install_override_mode=555 - ;; - -interix[3-9]*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - ;; -irix5* | irix6* | nonstopux*) - case $host_os in - nonstopux*) version_type=nonstopux ;; +# func_munge_path_list VARIABLE PATH +# ----------------------------------- +# VARIABLE is name of variable containing _space_ separated list of +# directories to be munged by the contents of PATH, which is string +# having a format: +# "DIR[:DIR]:" +# string "DIR[ DIR]" will be prepended to VARIABLE +# ":DIR[:DIR]" +# string "DIR[ DIR]" will be appended to VARIABLE +# "DIRP[:DIRP]::[DIRA:]DIRA" +# string "DIRP[ DIRP]" will be prepended to VARIABLE and string +# "DIRA[ DIRA]" will be appended to VARIABLE +# "DIR[:DIR]" +# VARIABLE will be replaced by "DIR[ DIR]" +func_munge_path_list () +{ + case x$2 in + x) + ;; + *:) + eval $1=\"`$ECHO $2 | $SED 's/:/ /g'` \$$1\" + ;; + x:*) + eval $1=\"\$$1 `$ECHO $2 | $SED 's/:/ /g'`\" + ;; + *::*) + eval $1=\"\$$1\ `$ECHO $2 | $SED -e 's/.*:://' -e 's/:/ /g'`\" + eval $1=\"`$ECHO $2 | $SED -e 's/::.*//' -e 's/:/ /g'`\ \$$1\" + ;; *) - if test yes = "$lt_cv_prog_gnu_ld"; then - version_type=linux # correct to gnu/linux during the next big refactor - else - version_type=irix - fi ;; - esac - need_lib_prefix=no - need_version=no - soname_spec='$libname$release$shared_ext$major' - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$release$shared_ext $libname$shared_ext' - case $host_os in - irix5* | nonstopux*) - libsuff= shlibsuff= - ;; - *) 
- case $LD in # libtool.m4 will add one of these switches to LD - *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") - libsuff= shlibsuff= libmagic=32-bit;; - *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") - libsuff=32 shlibsuff=N32 libmagic=N32;; - *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") - libsuff=64 shlibsuff=64 libmagic=64-bit;; - *) libsuff= shlibsuff= libmagic=never-match;; + eval $1=\"`$ECHO $2 | $SED 's/:/ /g'`\" + ;; esac - ;; - esac - shlibpath_var=LD_LIBRARY${shlibsuff}_PATH - shlibpath_overrides_runpath=no - sys_lib_search_path_spec="/usr/lib$libsuff /lib$libsuff /usr/local/lib$libsuff" - sys_lib_dlsearch_path_spec="/usr/lib$libsuff /lib$libsuff" - hardcode_into_libs=yes - ;; +} -# No shared lib support for Linux oldld, aout, or coff. -linux*oldld* | linux*aout* | linux*coff*) - dynamic_linker=no - ;; +ac_fn_c_check_header_compile "$LINENO" "dlfcn.h" "ac_cv_header_dlfcn_h" "$ac_includes_default +" +if test "x$ac_cv_header_dlfcn_h" = xyes +then : + printf "%s\n" "#define HAVE_DLFCN_H 1" >>confdefs.h -linux*android*) - version_type=none # Android doesn't support versioned libraries. - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext' - soname_spec='$libname$release$shared_ext' - finish_cmds= - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes +fi - # This implies no fast_install, which is unacceptable. - # Some rework will be needed to allow for fast_install - # before this can be enabled. - hardcode_into_libs=yes - dynamic_linker='Android linker' - # Don't embed -rpath directories since the linker doesn't support them. - hardcode_libdir_flag_spec='-L$libdir' - ;; -# This must be glibc/ELF. 
-linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - # Some binutils ld are patched to set DT_RUNPATH - if ${lt_cv_shlibpath_overrides_runpath+:} false; then : - $as_echo_n "(cached) " >&6 + +# Set options +enable_win32_dll=yes + +case $host in +*-*-cygwin* | *-*-mingw* | *-*-pw32* | *-*-cegcc*) + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}as", so it can be a program name with args. +set dummy ${ac_tool_prefix}as; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_AS+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$AS"; then + ac_cv_prog_AS="$AS" # Let the user override the test. else - lt_cv_shlibpath_overrides_runpath=no - save_LDFLAGS=$LDFLAGS - save_libdir=$libdir - eval "libdir=/foo; wl=\"$lt_prog_compiler_wl\"; \ - LDFLAGS=\"\$LDFLAGS $hardcode_libdir_flag_spec\"" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_AS="${ac_tool_prefix}as" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +AS=$ac_cv_prog_AS +if test -n "$AS"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $AS" >&5 +printf "%s\n" "$AS" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + +fi +if test -z "$ac_cv_prog_AS"; then + ac_ct_AS=$AS + # Extract the first word of "as", so it can be a program name with args. +set dummy as; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_AS+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_AS"; then + ac_cv_prog_ac_ct_AS="$ac_ct_AS" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_AS="as" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS + +fi +fi +ac_ct_AS=$ac_cv_prog_ac_ct_AS +if test -n "$ac_ct_AS"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_AS" >&5 +printf "%s\n" "$ac_ct_AS" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + if test "x$ac_ct_AS" = x; then + AS="false" + else + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + AS=$ac_ct_AS + fi +else + AS="$ac_cv_prog_AS" +fi -int -main () -{ + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}dlltool", so it can be a program name with args. +set dummy ${ac_tool_prefix}dlltool; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_DLLTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$DLLTOOL"; then + ac_cv_prog_DLLTOOL="$DLLTOOL" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_DLLTOOL="${ac_tool_prefix}dlltool" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 + fi +done + done +IFS=$as_save_IFS - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - if ($OBJDUMP -p conftest$ac_exeext) 2>/dev/null | grep "RUNPATH.*$libdir" >/dev/null; then : - lt_cv_shlibpath_overrides_runpath=yes fi fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext - LDFLAGS=$save_LDFLAGS - libdir=$save_libdir - +DLLTOOL=$ac_cv_prog_DLLTOOL +if test -n "$DLLTOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $DLLTOOL" >&5 +printf "%s\n" "$DLLTOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi - shlibpath_overrides_runpath=$lt_cv_shlibpath_overrides_runpath - - # This implies no fast_install, which is unacceptable. - # Some rework will be needed to allow for fast_install - # before this can be enabled. - hardcode_into_libs=yes - # Ideally, we could use ldconfig to report *all* directores which are - # searched for libraries, however this is still not possible. Aside from not - # being certain /sbin/ldconfig is available, command - # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64, - # even though it is searched at run-time. Try to do the best guess by - # appending ld.so.conf contents (and includes) to the search path. 
- if test -f /etc/ld.so.conf; then - lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s 2>/dev/null", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;/^[ ]*hwcap[ ]/d;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;s/"//g;/^$/d' | tr '\n' ' '` - sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" +fi +if test -z "$ac_cv_prog_DLLTOOL"; then + ac_ct_DLLTOOL=$DLLTOOL + # Extract the first word of "dlltool", so it can be a program name with args. +set dummy dlltool; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_DLLTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_DLLTOOL"; then + ac_cv_prog_ac_ct_DLLTOOL="$ac_ct_DLLTOOL" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_DLLTOOL="dlltool" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 fi +done + done +IFS=$as_save_IFS - # We used to test for /lib/ld.so.1 and disable shared libraries on - # powerpc, because MkLinux only supported shared libraries with the - # GNU dynamic linker. Since this was broken with cross compilers, - # most powerpc-linux boxes support dynamic linking these days and - # people can always --disable-shared, the test was removed, and we - # assume the GNU/Linux dynamic linker is in use. 
- dynamic_linker='GNU/Linux ld.so' - ;; +fi +fi +ac_ct_DLLTOOL=$ac_cv_prog_ac_ct_DLLTOOL +if test -n "$ac_ct_DLLTOOL"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_DLLTOOL" >&5 +printf "%s\n" "$ac_ct_DLLTOOL" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi -netbsd*) - version_type=sunos - need_lib_prefix=no - need_version=no - if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' - dynamic_linker='NetBSD (a.out) ld.so' + if test "x$ac_ct_DLLTOOL" = x; then + DLLTOOL="false" else - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - dynamic_linker='NetBSD ld.elf_so' + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + DLLTOOL=$ac_ct_DLLTOOL fi - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; - -newsos6) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - ;; - -*nto* | *qnx*) - version_type=qnx - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - dynamic_linker='ldqnx.so' - ;; +else + DLLTOOL="$ac_cv_prog_DLLTOOL" +fi 
-openbsd* | bitrig*) - version_type=sunos - sys_lib_dlsearch_path_spec=/usr/lib - need_lib_prefix=no - if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then - need_version=no - else - need_version=yes + if test -n "$ac_tool_prefix"; then + # Extract the first word of "${ac_tool_prefix}objdump", so it can be a program name with args. +set dummy ${ac_tool_prefix}objdump; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_OBJDUMP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$OBJDUMP"; then + ac_cv_prog_OBJDUMP="$OBJDUMP" # Let the user override the test. +else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_OBJDUMP="${ac_tool_prefix}objdump" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 fi - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - ;; - -os2*) - libname_spec='$name' - version_type=windows - shrext_cmds=.dll - need_version=no - need_lib_prefix=no - # OS/2 can only load a DLL with a base name of 8 characters or less. - soname_spec='`test -n "$os2dllname" && libname="$os2dllname"; - v=$($ECHO $release$versuffix | tr -d .-); - n=$($ECHO $libname | cut -b -$((8 - ${#v})) | tr . 
_); - $ECHO $n$v`$shared_ext' - library_names_spec='${libname}_dll.$libext' - dynamic_linker='OS/2 ld.exe' - shlibpath_var=BEGINLIBPATH - sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - postinstall_cmds='base_file=`basename \$file`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; $ECHO \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname~ - chmod a+x \$dldir/$dlname~ - if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then - eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; - fi' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; $ECHO \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $RM \$dlpath' - ;; - -osf3* | osf4* | osf5*) - version_type=osf - need_lib_prefix=no - need_version=no - soname_spec='$libname$release$shared_ext$major' - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - ;; - -rdos*) - dynamic_linker=no - ;; - -solaris*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - # ldd complains unless libraries are executable - postinstall_cmds='chmod +x $lib' - ;; +done + done +IFS=$as_save_IFS -sunos4*) - version_type=sunos - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' - shlibpath_var=LD_LIBRARY_PATH - 
shlibpath_overrides_runpath=yes - if test yes = "$with_gnu_ld"; then - need_lib_prefix=no - fi - need_version=yes - ;; +fi +fi +OBJDUMP=$ac_cv_prog_OBJDUMP +if test -n "$OBJDUMP"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $OBJDUMP" >&5 +printf "%s\n" "$OBJDUMP" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi -sysv4 | sysv4.3*) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - case $host_vendor in - sni) - shlibpath_overrides_runpath=no - need_lib_prefix=no - runpath_var=LD_RUN_PATH - ;; - siemens) - need_lib_prefix=no - ;; - motorola) - need_lib_prefix=no - need_version=no - shlibpath_overrides_runpath=no - sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' - ;; - esac - ;; -sysv4*MP*) - if test -d /usr/nec; then - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$shared_ext.$versuffix $libname$shared_ext.$major $libname$shared_ext' - soname_spec='$libname$shared_ext.$major' - shlibpath_var=LD_LIBRARY_PATH +fi +if test -z "$ac_cv_prog_OBJDUMP"; then + ac_ct_OBJDUMP=$OBJDUMP + # Extract the first word of "objdump", so it can be a program name with args. +set dummy objdump; ac_word=$2 +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_ac_ct_OBJDUMP+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test -n "$ac_ct_OBJDUMP"; then + ac_cv_prog_ac_ct_OBJDUMP="$ac_ct_OBJDUMP" # Let the user override the test. 
+else +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR +for as_dir in $PATH +do + IFS=$as_save_IFS + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + for ac_exec_ext in '' $ac_executable_extensions; do + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_prog_ac_ct_OBJDUMP="objdump" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 + break 2 fi - ;; +done + done +IFS=$as_save_IFS -sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) - version_type=sco - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - if test yes = "$with_gnu_ld"; then - sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' +fi +fi +ac_ct_OBJDUMP=$ac_cv_prog_ac_ct_OBJDUMP +if test -n "$ac_ct_OBJDUMP"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_ct_OBJDUMP" >&5 +printf "%s\n" "$ac_ct_OBJDUMP" >&6; } +else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi + + if test "x$ac_ct_OBJDUMP" = x; then + OBJDUMP="false" else - sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' - case $host_os in - sco3.2v5*) - sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" - ;; - esac + case $cross_compiling:$ac_tool_warned in +yes:) +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +ac_tool_warned=yes ;; +esac + OBJDUMP=$ac_ct_OBJDUMP fi - sys_lib_dlsearch_path_spec='/usr/lib' - ;; +else + OBJDUMP="$ac_cv_prog_OBJDUMP" +fi -tpf*) - # TPF is a cross-target only. Preferred cross-host = GNU/Linux. 
- version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes ;; +esac -uts4*) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - ;; +test -z "$AS" && AS=as -*) - dynamic_linker=no - ;; -esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $dynamic_linker" >&5 -$as_echo "$dynamic_linker" >&6; } -test no = "$dynamic_linker" && can_build_shared=no -variables_saved_for_relink="PATH $shlibpath_var $runpath_var" -if test yes = "$GCC"; then - variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" -fi -if test set = "${lt_cv_sys_lib_search_path_spec+set}"; then - sys_lib_search_path_spec=$lt_cv_sys_lib_search_path_spec -fi -if test set = "${lt_cv_sys_lib_dlsearch_path_spec+set}"; then - sys_lib_dlsearch_path_spec=$lt_cv_sys_lib_dlsearch_path_spec -fi -# remember unaugmented sys_lib_dlsearch_path content for libtool script decls... -configure_time_dlsearch_path=$sys_lib_dlsearch_path_spec +test -z "$DLLTOOL" && DLLTOOL=dlltool + + + + + +test -z "$OBJDUMP" && OBJDUMP=objdump + -# ... but it needs LT_SYS_LIBRARY_PATH munging for other configure-time code -func_munge_path_list sys_lib_dlsearch_path_spec "$LT_SYS_LIBRARY_PATH" -# to be used as default LT_SYS_LIBRARY_PATH value in generated libtool -configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH + enable_dlopen=no + + + + # Check whether --enable-shared was given. 
+if test ${enable_shared+y} +then : + enableval=$enable_shared; p=${PACKAGE-default} + case $enableval in + yes) enable_shared=yes ;; + no) enable_shared=no ;; + *) + enable_shared=no + # Look at the argument we got. We use all the common list separators. + lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, + for pkg in $enableval; do + IFS=$lt_save_ifs + if test "X$pkg" = "X$p"; then + enable_shared=yes + fi + done + IFS=$lt_save_ifs + ;; + esac +else $as_nop + enable_shared=yes +fi @@ -13449,6 +9163,29 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH + # Check whether --enable-static was given. +if test ${enable_static+y} +then : + enableval=$enable_static; p=${PACKAGE-default} + case $enableval in + yes) enable_static=yes ;; + no) enable_static=no ;; + *) + enable_static=no + # Look at the argument we got. We use all the common list separators. + lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, + for pkg in $enableval; do + IFS=$lt_save_ifs + if test "X$pkg" = "X$p"; then + enable_static=yes + fi + done + IFS=$lt_save_ifs + ;; + esac +else $as_nop + enable_static=yes +fi @@ -13459,6 +9196,28 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH +# Check whether --with-pic was given. +if test ${with_pic+y} +then : + withval=$with_pic; lt_p=${PACKAGE-default} + case $withval in + yes|no) pic_mode=$withval ;; + *) + pic_mode=default + # Look at the argument we got. We use all the common list separators. + lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, + for lt_pkg in $withval; do + IFS=$lt_save_ifs + if test "X$lt_pkg" = "X$lt_p"; then + pic_mode=yes + fi + done + IFS=$lt_save_ifs + ;; + esac +else $as_nop + pic_mode=default +fi @@ -13467,6 +9226,29 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH + # Check whether --enable-fast-install was given. 
+if test ${enable_fast_install+y} +then : + enableval=$enable_fast_install; p=${PACKAGE-default} + case $enableval in + yes) enable_fast_install=yes ;; + no) enable_fast_install=no ;; + *) + enable_fast_install=no + # Look at the argument we got. We use all the common list separators. + lt_save_ifs=$IFS; IFS=$IFS$PATH_SEPARATOR, + for pkg in $enableval; do + IFS=$lt_save_ifs + if test "X$pkg" = "X$p"; then + enable_fast_install=yes + fi + done + IFS=$lt_save_ifs + ;; + esac +else $as_nop + enable_fast_install=yes +fi @@ -13475,9 +9257,53 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH + shared_archive_member_spec= +case $host,$enable_shared in +power*-*-aix[5-9]*,yes) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking which variant of shared library versioning to provide" >&5 +printf %s "checking which variant of shared library versioning to provide... " >&6; } +# Check whether --with-aix-soname was given. +if test ${with_aix_soname+y} +then : + withval=$with_aix_soname; case $withval in + aix|svr4|both) + ;; + *) + as_fn_error $? "Unknown argument to --with-aix-soname" "$LINENO" 5 + ;; + esac + lt_cv_with_aix_soname=$with_aix_soname +else $as_nop + if test ${lt_cv_with_aix_soname+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_with_aix_soname=aix +fi + with_aix_soname=$lt_cv_with_aix_soname +fi + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $with_aix_soname" >&5 +printf "%s\n" "$with_aix_soname" >&6; } + if test aix != "$with_aix_soname"; then + # For the AIX way of multilib, we name the shared archive member + # based on the bitwidth used, traditionally 'shr.o' or 'shr_64.o', + # and 'shr.imp' or 'shr_64.imp', respectively, for the Import File. + # Even when GNU compilers ignore OBJECT_MODE but need '-maix64' flag, + # the AIX toolchain works better with OBJECT_MODE set (default 32). 
+ if test 64 = "${OBJECT_MODE-32}"; then + shared_archive_member_spec=shr_64 + else + shared_archive_member_spec=shr + fi + fi + ;; +*) + with_aix_soname=aix + ;; +esac @@ -13488,7 +9314,11 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH +# This can be used to rebuild libtool when needed +LIBTOOL_DEPS=$ltmain +# Always use our own libtool. +LIBTOOL='$(SHELL) $(top_builddir)/libtool' @@ -13519,6 +9349,7 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH +test -z "$LN_S" && LN_S="ln -s" @@ -13533,1250 +9364,1520 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH - { $as_echo "$as_me:${as_lineno-$LINENO}: checking how to hardcode library paths into programs" >&5 -$as_echo_n "checking how to hardcode library paths into programs... " >&6; } -hardcode_action= -if test -n "$hardcode_libdir_flag_spec" || - test -n "$runpath_var" || - test yes = "$hardcode_automatic"; then +if test -n "${ZSH_VERSION+set}"; then + setopt NO_GLOB_SUBST +fi - # We can hardcode non-existent directories. - if test no != "$hardcode_direct" && - # If the only mechanism to avoid hardcoding is shlibpath_var, we - # have to relink, otherwise we might link with an installed library - # when we should be linking with a yet-to-be-installed one - ## test no != "$_LT_TAGVAR(hardcode_shlibpath_var, )" && - test no != "$hardcode_minus_L"; then - # Linking always hardcodes the temporary library directory. - hardcode_action=relink - else - # We can link without hardcoding, and we can hardcode nonexisting dirs. - hardcode_action=immediate - fi -else - # We cannot hardcode anything, or else we can only hardcode existing - # directories. - hardcode_action=unsupported +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for objdir" >&5 +printf %s "checking for objdir... 
" >&6; } +if test ${lt_cv_objdir+y} +then : + printf %s "(cached) " >&6 +else $as_nop + rm -f .libs 2>/dev/null +mkdir .libs 2>/dev/null +if test -d .libs; then + lt_cv_objdir=.libs +else + # MS-DOS does not allow filenames that begin with a dot. + lt_cv_objdir=_libs fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $hardcode_action" >&5 -$as_echo "$hardcode_action" >&6; } - -if test relink = "$hardcode_action" || - test yes = "$inherit_rpath"; then - # Fast installation is not supported - enable_fast_install=no -elif test yes = "$shlibpath_overrides_runpath" || - test no = "$enable_shared"; then - # Fast installation is not necessary - enable_fast_install=needless +rmdir .libs 2>/dev/null fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_objdir" >&5 +printf "%s\n" "$lt_cv_objdir" >&6; } +objdir=$lt_cv_objdir +printf "%s\n" "#define LT_OBJDIR \"$lt_cv_objdir/\"" >>confdefs.h - if test yes != "$enable_dlopen"; then - enable_dlopen=unknown - enable_dlopen_self=unknown - enable_dlopen_self_static=unknown -else - lt_cv_dlopen=no - lt_cv_dlopen_libs= - case $host_os in - beos*) - lt_cv_dlopen=load_add_on - lt_cv_dlopen_libs= - lt_cv_dlopen_self=yes - ;; - mingw* | pw32* | cegcc*) - lt_cv_dlopen=LoadLibrary - lt_cv_dlopen_libs= - ;; - cygwin*) - lt_cv_dlopen=dlopen - lt_cv_dlopen_libs= - ;; +case $host_os in +aix3*) + # AIX sometimes has problems with the GCC collect2 program. For some + # reason, if we set the COLLECT_NAMES environment variable, the problems + # vanish in a puff of smoke. + if test set != "${COLLECT_NAMES+set}"; then + COLLECT_NAMES= + export COLLECT_NAMES + fi + ;; +esac - darwin*) - # if libdl is installed we need to link against it - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlopen in -ldl" >&5 -$as_echo_n "checking for dlopen in -ldl... 
" >&6; } -if ${ac_cv_lib_dl_dlopen+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldl $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ +# Global variables: +ofile=libtool +can_build_shared=yes -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif -char dlopen (); -int -main () -{ -return dlopen (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_dl_dlopen=yes -else - ac_cv_lib_dl_dlopen=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlopen" >&5 -$as_echo "$ac_cv_lib_dl_dlopen" >&6; } -if test "x$ac_cv_lib_dl_dlopen" = xyes; then : - lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-ldl -else +# All known linkers require a '.a' archive for static linking (except MSVC and +# ICC, which need '.lib'). +libext=a - lt_cv_dlopen=dyld - lt_cv_dlopen_libs= - lt_cv_dlopen_self=yes +with_gnu_ld=$lt_cv_prog_gnu_ld -fi +old_CC=$CC +old_CFLAGS=$CFLAGS - ;; +# Set sane defaults for various variables +test -z "$CC" && CC=cc +test -z "$LTCC" && LTCC=$CC +test -z "$LTCFLAGS" && LTCFLAGS=$CFLAGS +test -z "$LD" && LD=ld +test -z "$ac_objext" && ac_objext=o - tpf*) - # Don't try to run any link tests for TPF. We know it's impossible - # because TPF is a cross-compiler, and we know how we open DSOs. 
- lt_cv_dlopen=dlopen - lt_cv_dlopen_libs= - lt_cv_dlopen_self=no - ;; +func_cc_basename $compiler +cc_basename=$func_cc_basename_result - *) - ac_fn_c_check_func "$LINENO" "shl_load" "ac_cv_func_shl_load" -if test "x$ac_cv_func_shl_load" = xyes; then : - lt_cv_dlopen=shl_load -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for shl_load in -ldld" >&5 -$as_echo_n "checking for shl_load in -ldld... " >&6; } -if ${ac_cv_lib_dld_shl_load+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldld $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif -char shl_load (); -int -main () -{ -return shl_load (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_dld_shl_load=yes -else - ac_cv_lib_dld_shl_load=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS +# Only perform the check for file, if the check method requires it +test -z "$MAGIC_CMD" && MAGIC_CMD=file +case $deplibs_check_method in +file_magic*) + if test "$file_magic_cmd" = '$MAGIC_CMD'; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for ${ac_tool_prefix}file" >&5 +printf %s "checking for ${ac_tool_prefix}file... " >&6; } +if test ${lt_cv_path_MAGIC_CMD+y} +then : + printf %s "(cached) " >&6 +else $as_nop + case $MAGIC_CMD in +[\\/*] | ?:[\\/]*) + lt_cv_path_MAGIC_CMD=$MAGIC_CMD # Let the user override the test with a path. + ;; +*) + lt_save_MAGIC_CMD=$MAGIC_CMD + lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR + ac_dummy="/usr/bin$PATH_SEPARATOR$PATH" + for ac_dir in $ac_dummy; do + IFS=$lt_save_ifs + test -z "$ac_dir" && ac_dir=. 
+ if test -f "$ac_dir/${ac_tool_prefix}file"; then + lt_cv_path_MAGIC_CMD=$ac_dir/"${ac_tool_prefix}file" + if test -n "$file_magic_test_file"; then + case $deplibs_check_method in + "file_magic "*) + file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` + MAGIC_CMD=$lt_cv_path_MAGIC_CMD + if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | + $EGREP "$file_magic_regex" > /dev/null; then + : + else + cat <<_LT_EOF 1>&2 + +*** Warning: the command libtool uses to detect shared libraries, +*** $file_magic_cmd, produces output that libtool cannot recognize. +*** The result is that libtool may fail to recognize shared libraries +*** as such. This will affect the creation of libtool libraries that +*** depend on shared libraries, but programs linked with such libtool +*** libraries will work regardless of this problem. Nevertheless, you +*** may want to report the problem to your system manager and/or to +*** bug-libtool@gnu.org + +_LT_EOF + fi ;; + esac + fi + break + fi + done + IFS=$lt_save_ifs + MAGIC_CMD=$lt_save_MAGIC_CMD + ;; +esac fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dld_shl_load" >&5 -$as_echo "$ac_cv_lib_dld_shl_load" >&6; } -if test "x$ac_cv_lib_dld_shl_load" = xyes; then : - lt_cv_dlopen=shl_load lt_cv_dlopen_libs=-ldld -else - ac_fn_c_check_func "$LINENO" "dlopen" "ac_cv_func_dlopen" -if test "x$ac_cv_func_dlopen" = xyes; then : - lt_cv_dlopen=dlopen -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlopen in -ldl" >&5 -$as_echo_n "checking for dlopen in -ldl... " >&6; } -if ${ac_cv_lib_dl_dlopen+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldl $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif -char dlopen (); -int -main () -{ -return dlopen (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_dl_dlopen=yes +MAGIC_CMD=$lt_cv_path_MAGIC_CMD +if test -n "$MAGIC_CMD"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $MAGIC_CMD" >&5 +printf "%s\n" "$MAGIC_CMD" >&6; } else - ac_cv_lib_dl_dlopen=no + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlopen" >&5 -$as_echo "$ac_cv_lib_dl_dlopen" >&6; } -if test "x$ac_cv_lib_dl_dlopen" = xyes; then : - lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-ldl -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlopen in -lsvld" >&5 -$as_echo_n "checking for dlopen in -lsvld... " >&6; } -if ${ac_cv_lib_svld_dlopen+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-lsvld $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif -char dlopen (); -int -main () -{ -return dlopen (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_svld_dlopen=yes -else - ac_cv_lib_svld_dlopen=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS + + + + +if test -z "$lt_cv_path_MAGIC_CMD"; then + if test -n "$ac_tool_prefix"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for file" >&5 +printf %s "checking for file... 
" >&6; } +if test ${lt_cv_path_MAGIC_CMD+y} +then : + printf %s "(cached) " >&6 +else $as_nop + case $MAGIC_CMD in +[\\/*] | ?:[\\/]*) + lt_cv_path_MAGIC_CMD=$MAGIC_CMD # Let the user override the test with a path. + ;; +*) + lt_save_MAGIC_CMD=$MAGIC_CMD + lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR + ac_dummy="/usr/bin$PATH_SEPARATOR$PATH" + for ac_dir in $ac_dummy; do + IFS=$lt_save_ifs + test -z "$ac_dir" && ac_dir=. + if test -f "$ac_dir/file"; then + lt_cv_path_MAGIC_CMD=$ac_dir/"file" + if test -n "$file_magic_test_file"; then + case $deplibs_check_method in + "file_magic "*) + file_magic_regex=`expr "$deplibs_check_method" : "file_magic \(.*\)"` + MAGIC_CMD=$lt_cv_path_MAGIC_CMD + if eval $file_magic_cmd \$file_magic_test_file 2> /dev/null | + $EGREP "$file_magic_regex" > /dev/null; then + : + else + cat <<_LT_EOF 1>&2 + +*** Warning: the command libtool uses to detect shared libraries, +*** $file_magic_cmd, produces output that libtool cannot recognize. +*** The result is that libtool may fail to recognize shared libraries +*** as such. This will affect the creation of libtool libraries that +*** depend on shared libraries, but programs linked with such libtool +*** libraries will work regardless of this problem. Nevertheless, you +*** may want to report the problem to your system manager and/or to +*** bug-libtool@gnu.org + +_LT_EOF + fi ;; + esac + fi + break + fi + done + IFS=$lt_save_ifs + MAGIC_CMD=$lt_save_MAGIC_CMD + ;; +esac fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_svld_dlopen" >&5 -$as_echo "$ac_cv_lib_svld_dlopen" >&6; } -if test "x$ac_cv_lib_svld_dlopen" = xyes; then : - lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-lsvld -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dld_link in -ldld" >&5 -$as_echo_n "checking for dld_link in -ldld... 
" >&6; } -if ${ac_cv_lib_dld_dld_link+:} false; then : - $as_echo_n "(cached) " >&6 -else - ac_check_lib_save_LIBS=$LIBS -LIBS="-ldld $LIBS" -cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -/* Override any GCC internal prototype to avoid an error. - Use char because int might match the return type of a GCC - builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif -char dld_link (); -int -main () -{ -return dld_link (); - ; - return 0; -} -_ACEOF -if ac_fn_c_try_link "$LINENO"; then : - ac_cv_lib_dld_dld_link=yes +MAGIC_CMD=$lt_cv_path_MAGIC_CMD +if test -n "$MAGIC_CMD"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $MAGIC_CMD" >&5 +printf "%s\n" "$MAGIC_CMD" >&6; } else - ac_cv_lib_dld_dld_link=no -fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -LIBS=$ac_check_lib_save_LIBS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dld_dld_link" >&5 -$as_echo "$ac_cv_lib_dld_dld_link" >&6; } -if test "x$ac_cv_lib_dld_dld_link" = xyes; then : - lt_cv_dlopen=dld_link lt_cv_dlopen_libs=-ldld + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi + else + MAGIC_CMD=: + fi fi + fi + ;; +esac -fi +# Use C for the default configuration in the libtool script +lt_save_CC=$CC +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu -fi +# Source file extension for C test sources. +ac_ext=c -fi +# Object file extension for compiled C test sources. 
+objext=o +objext=$objext +# Code to be used in simple compile tests +lt_simple_compile_test_code="int some_variable = 0;" -fi +# Code to be used in simple link tests +lt_simple_link_test_code='int main(){return(0);}' - ;; - esac - if test no = "$lt_cv_dlopen"; then - enable_dlopen=no - else - enable_dlopen=yes - fi - case $lt_cv_dlopen in - dlopen) - save_CPPFLAGS=$CPPFLAGS - test yes = "$ac_cv_header_dlfcn_h" && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" - save_LDFLAGS=$LDFLAGS - wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" - save_LIBS=$LIBS - LIBS="$lt_cv_dlopen_libs $LIBS" - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether a program can dlopen itself" >&5 -$as_echo_n "checking whether a program can dlopen itself... " >&6; } -if ${lt_cv_dlopen_self+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test yes = "$cross_compiling"; then : - lt_cv_dlopen_self=cross -else - lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 - lt_status=$lt_dlunknown - cat > conftest.$ac_ext <<_LT_EOF -#line $LINENO "configure" -#include "confdefs.h" -#if HAVE_DLFCN_H -#include <dlfcn.h> -#endif +# If no C compiler was specified, use CC. +LTCC=${LTCC-"$CC"} -#include <stdio.h> +# If no C compiler flags were specified, use CFLAGS. +LTCFLAGS=${LTCFLAGS-"$CFLAGS"} -#ifdef RTLD_GLOBAL -# define LT_DLGLOBAL RTLD_GLOBAL -#else -# ifdef DL_GLOBAL -# define LT_DLGLOBAL DL_GLOBAL -# else -# define LT_DLGLOBAL 0 -# endif -#endif +# Allow CC to be a program name with arguments. +compiler=$CC -/* We may have to define LT_DLLAZY_OR_NOW in the command line if we - find out it does not work in some platform.
*/ -#ifndef LT_DLLAZY_OR_NOW -# ifdef RTLD_LAZY -# define LT_DLLAZY_OR_NOW RTLD_LAZY -# else -# ifdef DL_LAZY -# define LT_DLLAZY_OR_NOW DL_LAZY -# else -# ifdef RTLD_NOW -# define LT_DLLAZY_OR_NOW RTLD_NOW -# else -# ifdef DL_NOW -# define LT_DLLAZY_OR_NOW DL_NOW -# else -# define LT_DLLAZY_OR_NOW 0 -# endif -# endif -# endif -# endif -#endif +# Save the default compiler, since it gets overwritten when the other +# tags are being tested, and _LT_TAGVAR(compiler, []) is a NOP. +compiler_DEFAULT=$CC -/* When -fvisibility=hidden is used, assume the code has been annotated - correspondingly for the symbols needed. */ -#if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3)) -int fnord () __attribute__((visibility("default"))); -#endif +# save warnings/boilerplate of simple test code +ac_outfile=conftest.$ac_objext +echo "$lt_simple_compile_test_code" >conftest.$ac_ext +eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err +_lt_compiler_boilerplate=`cat conftest.err` +$RM conftest* -int fnord () { return 42; } -int main () -{ - void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); - int status = $lt_dlunknown; +ac_outfile=conftest.$ac_objext +echo "$lt_simple_link_test_code" >conftest.$ac_ext +eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err +_lt_linker_boilerplate=`cat conftest.err` +$RM -r conftest* - if (self) - { - if (dlsym (self,"fnord")) status = $lt_dlno_uscore; - else - { - if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; - else puts (dlerror ()); - } - /* dlclose (self); */ - } - else - puts (dlerror ()); - return status; -} -_LT_EOF - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 - (eval $ac_link) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } && test -s "conftest$ac_exeext" 2>/dev/null; then - (./conftest; exit; ) >&5 2>/dev/null - lt_status=$? 
- case x$lt_status in - x$lt_dlno_uscore) lt_cv_dlopen_self=yes ;; - x$lt_dlneed_uscore) lt_cv_dlopen_self=yes ;; - x$lt_dlunknown|x*) lt_cv_dlopen_self=no ;; - esac - else : - # compilation failed - lt_cv_dlopen_self=no - fi +## CAVEAT EMPTOR: +## There is no encapsulation within the following macros, do not change +## the running order or otherwise move them around unless you know exactly +## what you are doing... +if test -n "$compiler"; then + +lt_prog_compiler_no_builtin_flag= + +if test yes = "$GCC"; then + case $cc_basename in + nvcc*) + lt_prog_compiler_no_builtin_flag=' -Xcompiler -fno-builtin' ;; + *) + lt_prog_compiler_no_builtin_flag=' -fno-builtin' ;; + esac + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -fno-rtti -fno-exceptions" >&5 +printf %s "checking if $compiler supports -fno-rtti -fno-exceptions... " >&6; } +if test ${lt_cv_prog_compiler_rtti_exceptions+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler_rtti_exceptions=no + ac_outfile=conftest.$ac_objext + echo "$lt_simple_compile_test_code" > conftest.$ac_ext + lt_compiler_flag="-fno-rtti -fno-exceptions" ## exclude from sc_useless_quotes_in_assignment + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. + # The option is referenced via a variable to avoid confusing sed. + lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) + (eval "$lt_compile" 2>conftest.err) + ac_status=$? + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 + if (exit $ac_status) && test -s "$ac_outfile"; then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings other than the usual output. + $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then + lt_cv_prog_compiler_rtti_exceptions=yes + fi + fi + $RM conftest* + fi -rm -fr conftest* +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_rtti_exceptions" >&5 +printf "%s\n" "$lt_cv_prog_compiler_rtti_exceptions" >&6; } +if test yes = "$lt_cv_prog_compiler_rtti_exceptions"; then + lt_prog_compiler_no_builtin_flag="$lt_prog_compiler_no_builtin_flag -fno-rtti -fno-exceptions" +else + : +fi fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_dlopen_self" >&5 -$as_echo "$lt_cv_dlopen_self" >&6; } - if test yes = "$lt_cv_dlopen_self"; then - wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $lt_prog_compiler_static\" - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether a statically linked program can dlopen itself" >&5 -$as_echo_n "checking whether a statically linked program can dlopen itself... " >&6; } -if ${lt_cv_dlopen_self_static+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test yes = "$cross_compiling"; then : - lt_cv_dlopen_self_static=cross -else - lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 - lt_status=$lt_dlunknown - cat > conftest.$ac_ext <<_LT_EOF -#line $LINENO "configure" -#include "confdefs.h" -#if HAVE_DLFCN_H -#include <dlfcn.h> -#endif -#include <stdio.h> -#ifdef RTLD_GLOBAL -# define LT_DLGLOBAL RTLD_GLOBAL -#else -# ifdef DL_GLOBAL -# define LT_DLGLOBAL DL_GLOBAL -# else -# define LT_DLGLOBAL 0 -# endif -#endif -/* We may have to define LT_DLLAZY_OR_NOW in the command line if we - find out it does not work in some platform.
*/ -#ifndef LT_DLLAZY_OR_NOW -# ifdef RTLD_LAZY -# define LT_DLLAZY_OR_NOW RTLD_LAZY -# else -# ifdef DL_LAZY -# define LT_DLLAZY_OR_NOW DL_LAZY -# else -# ifdef RTLD_NOW -# define LT_DLLAZY_OR_NOW RTLD_NOW -# else -# ifdef DL_NOW -# define LT_DLLAZY_OR_NOW DL_NOW -# else -# define LT_DLLAZY_OR_NOW 0 -# endif -# endif -# endif -# endif -#endif -/* When -fvisibility=hidden is used, assume the code has been annotated - correspondingly for the symbols needed. */ -#if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3)) -int fnord () __attribute__((visibility("default"))); -#endif + lt_prog_compiler_wl= +lt_prog_compiler_pic= +lt_prog_compiler_static= -int fnord () { return 42; } -int main () -{ - void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); - int status = $lt_dlunknown; - if (self) - { - if (dlsym (self,"fnord")) status = $lt_dlno_uscore; - else - { - if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; - else puts (dlerror ()); - } - /* dlclose (self); */ - } - else - puts (dlerror ()); + if test yes = "$GCC"; then + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_static='-static' - return status; -} -_LT_EOF - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 - (eval $ac_link) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } && test -s "conftest$ac_exeext" 2>/dev/null; then - (./conftest; exit; ) >&5 2>/dev/null - lt_status=$? - case x$lt_status in - x$lt_dlno_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_dlneed_uscore) lt_cv_dlopen_self_static=yes ;; - x$lt_dlunknown|x*) lt_cv_dlopen_self_static=no ;; - esac - else : - # compilation failed - lt_cv_dlopen_self_static=no - fi -fi -rm -fr conftest* + case $host_os in + aix*) + # All AIX code is PIC. 
+ if test ia64 = "$host_cpu"; then + # AIX 5 now supports IA64 processor + lt_prog_compiler_static='-Bstatic' + fi + lt_prog_compiler_pic='-fPIC' + ;; + amigaos*) + case $host_cpu in + powerpc) + # see comment about AmigaOS4 .so support + lt_prog_compiler_pic='-fPIC' + ;; + m68k) + # FIXME: we need at least 68020 code to build shared libraries, but + # adding the '-m68020' flag to GCC prevents building anything better, + # like '-m68040'. + lt_prog_compiler_pic='-m68020 -resident32 -malways-restore-a4' + ;; + esac + ;; -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_dlopen_self_static" >&5 -$as_echo "$lt_cv_dlopen_self_static" >&6; } - fi + beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) + # PIC is the default for these OSes. + ;; - CPPFLAGS=$save_CPPFLAGS - LDFLAGS=$save_LDFLAGS - LIBS=$save_LIBS - ;; - esac + mingw* | cygwin* | pw32* | os2* | cegcc*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). + # Although the cygwin gcc ignores -fPIC, still need this for old-style + # (--disable-auto-import) libraries + lt_prog_compiler_pic='-DDLL_EXPORT' + case $host_os in + os2*) + lt_prog_compiler_static='$wl-static' + ;; + esac + ;; - case $lt_cv_dlopen_self in - yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; - *) enable_dlopen_self=unknown ;; - esac + darwin* | rhapsody*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + lt_prog_compiler_pic='-fno-common' + ;; - case $lt_cv_dlopen_self_static in - yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; - *) enable_dlopen_self_static=unknown ;; - esac -fi + haiku*) + # PIC is the default for Haiku. + # The "-static" flag exists, but is broken. + lt_prog_compiler_static= + ;; + + hpux*) + # PIC is the default for 64-bit PA HP-UX, but not for 32-bit + # PA HP-UX. 
On IA64 HP-UX, PIC is the default but the pic flag + # sets the default TLS model and affects inlining. + case $host_cpu in + hppa*64*) + # +Z the default + ;; + *) + lt_prog_compiler_pic='-fPIC' + ;; + esac + ;; + + interix[3-9]*) + # Interix 3.x gcc -fpic/-fPIC options generate broken code. + # Instead, we relocate shared libraries at runtime. + ;; + + msdosdjgpp*) + # Just because we use GCC doesn't mean we suddenly get shared libraries + # on systems that don't support them. + lt_prog_compiler_can_build_shared=no + enable_shared=no + ;; + + *nto* | *qnx*) + # QNX uses GNU C++, but need to define -shared option too, otherwise + # it will coredump. + lt_prog_compiler_pic='-fPIC -shared' + ;; + + sysv4*MP*) + if test -d /usr/nec; then + lt_prog_compiler_pic=-Kconform_pic + fi + ;; + *) + lt_prog_compiler_pic='-fPIC' + ;; + esac + case $cc_basename in + nvcc*) # Cuda Compiler Driver 2.2 + lt_prog_compiler_wl='-Xlinker ' + if test -n "$lt_prog_compiler_pic"; then + lt_prog_compiler_pic="-Xcompiler $lt_prog_compiler_pic" + fi + ;; + esac + else + # PORTME Check for flag to pass linker flags through the system compiler. + case $host_os in + aix*) + lt_prog_compiler_wl='-Wl,' + if test ia64 = "$host_cpu"; then + # AIX 5 now supports IA64 processor + lt_prog_compiler_static='-Bstatic' + else + lt_prog_compiler_static='-bnso -bI:/lib/syscalls.exp' + fi + ;; + darwin* | rhapsody*) + # PIC is the default on this platform + # Common symbols not allowed in MH_DYLIB files + lt_prog_compiler_pic='-fno-common' + case $cc_basename in + nagfor*) + # NAG Fortran compiler + lt_prog_compiler_wl='-Wl,-Wl,,' + lt_prog_compiler_pic='-PIC' + lt_prog_compiler_static='-Bstatic' + ;; + esac + ;; + mingw* | cygwin* | pw32* | os2* | cegcc*) + # This hack is so that the source file can tell whether it is being + # built for inclusion in a dll (and should export symbols for example). 
+ lt_prog_compiler_pic='-DDLL_EXPORT' + case $host_os in + os2*) + lt_prog_compiler_static='$wl-static' + ;; + esac + ;; + hpux9* | hpux10* | hpux11*) + lt_prog_compiler_wl='-Wl,' + # PIC is the default for IA64 HP-UX and 64-bit HP-UX, but + # not for PA HP-UX. + case $host_cpu in + hppa*64*|ia64*) + # +Z the default + ;; + *) + lt_prog_compiler_pic='+Z' + ;; + esac + # Is there a better lt_prog_compiler_static that works with the bundled CC? + lt_prog_compiler_static='$wl-a ${wl}archive' + ;; + irix5* | irix6* | nonstopux*) + lt_prog_compiler_wl='-Wl,' + # PIC (with -KPIC) is the default. + lt_prog_compiler_static='-non_shared' + ;; + linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) + case $cc_basename in + # old Intel for x86_64, which still supported -KPIC. + ecc*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-static' + ;; + # icc used to be incompatible with GCC. + # ICC 10 doesn't accept -KPIC any more. + icc* | ifort*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-fPIC' + lt_prog_compiler_static='-static' + ;; + # Lahey Fortran 8.1. + lf95*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='--shared' + lt_prog_compiler_static='--static' + ;; + nagfor*) + # NAG Fortran compiler + lt_prog_compiler_wl='-Wl,-Wl,,' + lt_prog_compiler_pic='-PIC' + lt_prog_compiler_static='-Bstatic' + ;; + tcc*) + # Fabrice Bellard et al's Tiny C Compiler + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-fPIC' + lt_prog_compiler_static='-static' + ;; + pgcc* | pgf77* | pgf90* | pgf95* | pgfortran*) + # Portland Group compilers (*not* the Pentium gcc compiler, + # which looks to be a dead project) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-fpic' + lt_prog_compiler_static='-Bstatic' + ;; + ccc*) + lt_prog_compiler_wl='-Wl,' + # All Alpha code is PIC. 
+ lt_prog_compiler_static='-non_shared' + ;; + xl* | bgxl* | bgf* | mpixl*) + # IBM XL C 8.0/Fortran 10.1, 11.1 on PPC and BlueGene + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-qpic' + lt_prog_compiler_static='-qstaticlink' + ;; + *) + case `$CC -V 2>&1 | sed 5q` in + *Sun\ Ceres\ Fortran* | *Sun*Fortran*\ [1-7].* | *Sun*Fortran*\ 8.[0-3]*) + # Sun Fortran 8.3 passes all unrecognized flags to the linker + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + lt_prog_compiler_wl='' + ;; + *Sun\ F* | *Sun*Fortran*) + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + lt_prog_compiler_wl='-Qoption ld ' + ;; + *Sun\ C*) + # Sun C 5.9 + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + lt_prog_compiler_wl='-Wl,' + ;; + *Intel*\ [CF]*Compiler*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-fPIC' + lt_prog_compiler_static='-static' + ;; + *Portland\ Group*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-fpic' + lt_prog_compiler_static='-Bstatic' + ;; + esac + ;; + esac + ;; + newsos6) + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + ;; + *nto* | *qnx*) + # QNX uses GNU C++, but need to define -shared option too, otherwise + # it will coredump. + lt_prog_compiler_pic='-fPIC -shared' + ;; + osf3* | osf4* | osf5*) + lt_prog_compiler_wl='-Wl,' + # All OSF/1 code is PIC. 
+ lt_prog_compiler_static='-non_shared' + ;; + rdos*) + lt_prog_compiler_static='-non_shared' + ;; + solaris*) + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + case $cc_basename in + f77* | f90* | f95* | sunf77* | sunf90* | sunf95*) + lt_prog_compiler_wl='-Qoption ld ';; + *) + lt_prog_compiler_wl='-Wl,';; + esac + ;; + sunos4*) + lt_prog_compiler_wl='-Qoption ld ' + lt_prog_compiler_pic='-PIC' + lt_prog_compiler_static='-Bstatic' + ;; + sysv4 | sysv4.2uw2* | sysv4.3*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + ;; + sysv4*MP*) + if test -d /usr/nec; then + lt_prog_compiler_pic='-Kconform_pic' + lt_prog_compiler_static='-Bstatic' + fi + ;; + sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_pic='-KPIC' + lt_prog_compiler_static='-Bstatic' + ;; -striplib= -old_striplib= -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether stripping libraries is possible" >&5 -$as_echo_n "checking whether stripping libraries is possible... 
" >&6; } -if test -z "$STRIP"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -else - if $STRIP -V 2>&1 | $GREP "GNU strip" >/dev/null; then - old_striplib="$STRIP --strip-debug" - striplib="$STRIP --strip-unneeded" - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } - else - case $host_os in - darwin*) - # FIXME - insert some real tests, host_os isn't really good enough - striplib="$STRIP -x" - old_striplib="$STRIP -S" - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + unicos*) + lt_prog_compiler_wl='-Wl,' + lt_prog_compiler_can_build_shared=no ;; - freebsd*) - if $STRIP -V 2>&1 | $GREP "elftoolchain" >/dev/null; then - old_striplib="$STRIP --strip-debug" - striplib="$STRIP --strip-unneeded" - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } - else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } - fi + + uts4*) + lt_prog_compiler_pic='-pic' + lt_prog_compiler_static='-Bstatic' ;; + *) - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + lt_prog_compiler_can_build_shared=no ;; esac fi -fi - - - - - - - - - - +case $host_os in + # For platforms that do not support PIC, -DPIC is meaningless: + *djgpp*) + lt_prog_compiler_pic= + ;; + *) + lt_prog_compiler_pic="$lt_prog_compiler_pic -DPIC" + ;; +esac - # Report what library types will actually be built - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if libtool supports shared libraries" >&5 -$as_echo_n "checking if libtool supports shared libraries... " >&6; } - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $can_build_shared" >&5 -$as_echo "$can_build_shared" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $compiler option to produce PIC" >&5 +printf %s "checking for $compiler option to produce PIC... 
" >&6; } +if test ${lt_cv_prog_compiler_pic+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler_pic=$lt_prog_compiler_pic +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic" >&5 +printf "%s\n" "$lt_cv_prog_compiler_pic" >&6; } +lt_prog_compiler_pic=$lt_cv_prog_compiler_pic - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build shared libraries" >&5 -$as_echo_n "checking whether to build shared libraries... " >&6; } - test no = "$can_build_shared" && enable_shared=no +# +# Check to make sure the PIC flag actually works. +# +if test -n "$lt_prog_compiler_pic"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $compiler PIC flag $lt_prog_compiler_pic works" >&5 +printf %s "checking if $compiler PIC flag $lt_prog_compiler_pic works... " >&6; } +if test ${lt_cv_prog_compiler_pic_works+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler_pic_works=no + ac_outfile=conftest.$ac_objext + echo "$lt_simple_compile_test_code" > conftest.$ac_ext + lt_compiler_flag="$lt_prog_compiler_pic -DPIC" ## exclude from sc_useless_quotes_in_assignment + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. + # The option is referenced via a variable to avoid confusing sed. + lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) + (eval "$lt_compile" 2>conftest.err) + ac_status=$? + cat conftest.err >&5 + echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 + if (exit $ac_status) && test -s "$ac_outfile"; then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings other than the usual output. + $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' >conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if test ! -s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then + lt_cv_prog_compiler_pic_works=yes + fi + fi + $RM conftest* - # On AIX, shared libraries and static libraries use the same namespace, and - # are all built from PIC. - case $host_os in - aix3*) - test yes = "$enable_shared" && enable_static=no - if test -n "$RANLIB"; then - archive_cmds="$archive_cmds~\$RANLIB \$lib" - postinstall_cmds='$RANLIB $lib' - fi - ;; +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_works" >&5 +printf "%s\n" "$lt_cv_prog_compiler_pic_works" >&6; } - aix[4-9]*) - if test ia64 != "$host_cpu"; then - case $enable_shared,$with_aix_soname,$aix_use_runtimelinking in - yes,aix,yes) ;; # shared object as lib.so file only - yes,svr4,*) ;; # shared object as lib.so archive member only - yes,*) enable_static=no ;; # shared object in lib.a archive as well - esac - fi - ;; - esac - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $enable_shared" >&5 -$as_echo "$enable_shared" >&6; } +if test yes = "$lt_cv_prog_compiler_pic_works"; then + case $lt_prog_compiler_pic in + "" | " "*) ;; + *) lt_prog_compiler_pic=" $lt_prog_compiler_pic" ;; + esac +else + lt_prog_compiler_pic= + lt_prog_compiler_can_build_shared=no +fi - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether to build static libraries" >&5 -$as_echo_n "checking whether to build static libraries... " >&6; } - # Make sure either enable_shared or enable_static is yes. 
- test yes = "$enable_shared" || enable_static=yes - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $enable_static" >&5 -$as_echo "$enable_static" >&6; } +fi -fi -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -CC=$lt_save_CC - if test -n "$CXX" && ( test no != "$CXX" && - ( (test g++ = "$CXX" && `g++ -v >/dev/null 2>&1` ) || - (test g++ != "$CXX"))); then - ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to run the C++ preprocessor" >&5 -$as_echo_n "checking how to run the C++ preprocessor... " >&6; } -if test -z "$CXXCPP"; then - if ${ac_cv_prog_CXXCPP+:} false; then : - $as_echo_n "(cached) " >&6 -else - # Double quotes because CXXCPP needs to be expanded - for CXXCPP in "$CXX -E" "/lib/cpp" - do - ac_preproc_ok=false -for ac_cxx_preproc_warn_flag in '' yes -do - # Use a header file that comes with gcc, so configuring glibc - # with a fresh cross-compiler works. - # Prefer to if __STDC__ is defined, since - # exists even on freestanding compilers. - # On the NeXT, cc -E runs the code through the compiler's parser, - # not just through cpp. "Syntax error" is here to catch this case. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#ifdef __STDC__ -# include -#else -# include -#endif - Syntax error -_ACEOF -if ac_fn_cxx_try_cpp "$LINENO"; then : -else - # Broken: fails on valid input. -continue -fi -rm -f conftest.err conftest.i conftest.$ac_ext - # OK, works on sane cases. Now check whether nonexistent headers - # can be detected and how. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -#include -_ACEOF -if ac_fn_cxx_try_cpp "$LINENO"; then : - # Broken: success on invalid input. -continue -else - # Passes both tests. -ac_preproc_ok=: -break -fi -rm -f conftest.err conftest.i conftest.$ac_ext -done -# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. -rm -f conftest.i conftest.err conftest.$ac_ext -if $ac_preproc_ok; then : - break -fi - done - ac_cv_prog_CXXCPP=$CXXCPP -fi - CXXCPP=$ac_cv_prog_CXXCPP -else - ac_cv_prog_CXXCPP=$CXXCPP -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $CXXCPP" >&5 -$as_echo "$CXXCPP" >&6; } -ac_preproc_ok=false -for ac_cxx_preproc_warn_flag in '' yes -do - # Use a header file that comes with gcc, so configuring glibc - # with a fresh cross-compiler works. - # Prefer to if __STDC__ is defined, since - # exists even on freestanding compilers. - # On the NeXT, cc -E runs the code through the compiler's parser, - # not just through cpp. "Syntax error" is here to catch this case. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#ifdef __STDC__ -# include -#else -# include -#endif - Syntax error -_ACEOF -if ac_fn_cxx_try_cpp "$LINENO"; then : +# +# Check to make sure the static flag actually works. +# +wl=$lt_prog_compiler_wl eval lt_tmp_static_flag=\"$lt_prog_compiler_static\" +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $compiler static flag $lt_tmp_static_flag works" >&5 +printf %s "checking if $compiler static flag $lt_tmp_static_flag works... " >&6; } +if test ${lt_cv_prog_compiler_static_works+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler_static_works=no + save_LDFLAGS=$LDFLAGS + LDFLAGS="$LDFLAGS $lt_tmp_static_flag" + echo "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. 
+ cat conftest.err 1>&5 + $ECHO "$_lt_linker_boilerplate" | $SED '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + lt_cv_prog_compiler_static_works=yes + fi + else + lt_cv_prog_compiler_static_works=yes + fi + fi + $RM -r conftest* + LDFLAGS=$save_LDFLAGS -else - # Broken: fails on valid input. -continue fi -rm -f conftest.err conftest.i conftest.$ac_ext +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_static_works" >&5 +printf "%s\n" "$lt_cv_prog_compiler_static_works" >&6; } - # OK, works on sane cases. Now check whether nonexistent headers - # can be detected and how. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -_ACEOF -if ac_fn_cxx_try_cpp "$LINENO"; then : - # Broken: success on invalid input. -continue +if test yes = "$lt_cv_prog_compiler_static_works"; then + : else - # Passes both tests. -ac_preproc_ok=: -break + lt_prog_compiler_static= fi -rm -f conftest.err conftest.i conftest.$ac_ext -done -# Because of `break', _AC_PREPROC_IFELSE's cleaning code was skipped. -rm -f conftest.i conftest.err conftest.$ac_ext -if $ac_preproc_ok; then : -else - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} -as_fn_error $? "C++ preprocessor \"$CXXCPP\" fails sanity check -See \`config.log' for more details" "$LINENO" 5; } + + + + + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -c -o file.$ac_objext" >&5 +printf %s "checking if $compiler supports -c -o file.$ac_objext... 
" >&6; } +if test ${lt_cv_prog_compiler_c_o+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler_c_o=no + $RM -r conftest 2>/dev/null + mkdir conftest + cd conftest + mkdir out + echo "$lt_simple_compile_test_code" > conftest.$ac_ext + + lt_compiler_flag="-o out/conftest2.$ac_objext" + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. + lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) + (eval "$lt_compile" 2>out/conftest.err) + ac_status=$? + cat out/conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + if (exit $ac_status) && test -s out/conftest2.$ac_objext + then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings + $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then + lt_cv_prog_compiler_c_o=yes + fi + fi + chmod u+w . 2>&5 + $RM conftest* + # SGI C++ compiler will create directory out/ii_files/ for + # template instantiation + test -d out/ii_files && $RM out/ii_files/* && rmdir out/ii_files + $RM out/* && rmdir out + cd .. 
+ $RM -r conftest + $RM conftest* + fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o" >&5 +printf "%s\n" "$lt_cv_prog_compiler_c_o" >&6; } -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu -else - _lt_caught_CXX_error=yes -fi - -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu - -archive_cmds_need_lc_CXX=no -allow_undefined_flag_CXX= -always_export_symbols_CXX=no -archive_expsym_cmds_CXX= -compiler_needs_object_CXX=no -export_dynamic_flag_spec_CXX= -hardcode_direct_CXX=no -hardcode_direct_absolute_CXX=no -hardcode_libdir_flag_spec_CXX= -hardcode_libdir_separator_CXX= -hardcode_minus_L_CXX=no -hardcode_shlibpath_var_CXX=unsupported -hardcode_automatic_CXX=no -inherit_rpath_CXX=no -module_cmds_CXX= -module_expsym_cmds_CXX= -link_all_deplibs_CXX=unknown -old_archive_cmds_CXX=$old_archive_cmds -reload_flag_CXX=$reload_flag -reload_cmds_CXX=$reload_cmds -no_undefined_flag_CXX= -whole_archive_flag_spec_CXX= -enable_shared_with_static_runtimes_CXX=no - -# Source file extension for C++ test sources. -ac_ext=cpp - -# Object file extension for compiled C++ test sources. -objext=o -objext_CXX=$objext -# No sense in running all these tests if we already determined that -# the CXX compiler isn't working. Some variables (like enable_shared) -# are currently assumed to apply to all compilers on this platform, -# and will be corrupted by setting them based on a non-working compiler. 
-if test yes != "$_lt_caught_CXX_error"; then - # Code to be used in simple compile tests - lt_simple_compile_test_code="int some_variable = 0;" - # Code to be used in simple link tests - lt_simple_link_test_code='int main(int, char *[]) { return(0); }' - # ltmain only uses $CC for tagged configurations so make sure $CC is set. + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -c -o file.$ac_objext" >&5 +printf %s "checking if $compiler supports -c -o file.$ac_objext... " >&6; } +if test ${lt_cv_prog_compiler_c_o+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler_c_o=no + $RM -r conftest 2>/dev/null + mkdir conftest + cd conftest + mkdir out + echo "$lt_simple_compile_test_code" > conftest.$ac_ext + lt_compiler_flag="-o out/conftest2.$ac_objext" + # Insert the option either (1) after the last *FLAGS variable, or + # (2) before a word containing "conftest.", or (3) at the end. + # Note that $ac_compile itself does not contain backslashes and begins + # with a dollar sign (not a hyphen), so the echo should work correctly. + lt_compile=`echo "$ac_compile" | $SED \ + -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ + -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ + -e 's:$: $lt_compiler_flag:'` + (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) + (eval "$lt_compile" 2>out/conftest.err) + ac_status=$? + cat out/conftest.err >&5 + echo "$as_me:$LINENO: \$? = $ac_status" >&5 + if (exit $ac_status) && test -s out/conftest2.$ac_objext + then + # The compiler can only warn and ignore the option if not recognized + # So say no if there are warnings + $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' > out/conftest.exp + $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 + if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then + lt_cv_prog_compiler_c_o=yes + fi + fi + chmod u+w . 
2>&5 + $RM conftest* + # SGI C++ compiler will create directory out/ii_files/ for + # template instantiation + test -d out/ii_files && $RM out/ii_files/* && rmdir out/ii_files + $RM out/* && rmdir out + cd .. + $RM -r conftest + $RM conftest* +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o" >&5 +printf "%s\n" "$lt_cv_prog_compiler_c_o" >&6; } -# If no C compiler was specified, use CC. -LTCC=${LTCC-"$CC"} -# If no C compiler flags were specified, use CFLAGS. -LTCFLAGS=${LTCFLAGS-"$CFLAGS"} +hard_links=nottested +if test no = "$lt_cv_prog_compiler_c_o" && test no != "$need_locks"; then + # do not overwrite the value of need_locks provided by the user + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if we can lock with hard links" >&5 +printf %s "checking if we can lock with hard links... " >&6; } + hard_links=yes + $RM conftest* + ln conftest.a conftest.b 2>/dev/null && hard_links=no + touch conftest.a + ln conftest.a conftest.b 2>&5 || hard_links=no + ln conftest.a conftest.b 2>/dev/null && hard_links=no + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $hard_links" >&5 +printf "%s\n" "$hard_links" >&6; } + if test no = "$hard_links"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: '$CC' does not support '-c -o', so 'make -j' may be unsafe" >&5 +printf "%s\n" "$as_me: WARNING: '$CC' does not support '-c -o', so 'make -j' may be unsafe" >&2;} + need_locks=warn + fi +else + need_locks=no +fi -# Allow CC to be a program name with arguments. 
-compiler=$CC - # save warnings/boilerplate of simple test code - ac_outfile=conftest.$ac_objext -echo "$lt_simple_compile_test_code" >conftest.$ac_ext -eval "$ac_compile" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err -_lt_compiler_boilerplate=`cat conftest.err` -$RM conftest* - ac_outfile=conftest.$ac_objext -echo "$lt_simple_link_test_code" >conftest.$ac_ext -eval "$ac_link" 2>&1 >/dev/null | $SED '/^$/d; /^ *+/d' >conftest.err -_lt_linker_boilerplate=`cat conftest.err` -$RM -r conftest* - # Allow CC to be a program name with arguments. - lt_save_CC=$CC - lt_save_CFLAGS=$CFLAGS - lt_save_LD=$LD - lt_save_GCC=$GCC - GCC=$GXX - lt_save_with_gnu_ld=$with_gnu_ld - lt_save_path_LD=$lt_cv_path_LD - if test -n "${lt_cv_prog_gnu_ldcxx+set}"; then - lt_cv_prog_gnu_ld=$lt_cv_prog_gnu_ldcxx - else - $as_unset lt_cv_prog_gnu_ld - fi - if test -n "${lt_cv_path_LDCXX+set}"; then - lt_cv_path_LD=$lt_cv_path_LDCXX - else - $as_unset lt_cv_path_LD + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the $compiler linker ($LD) supports shared libraries" >&5 +printf %s "checking whether the $compiler linker ($LD) supports shared libraries... 
" >&6; } + + runpath_var= + allow_undefined_flag= + always_export_symbols=no + archive_cmds= + archive_expsym_cmds= + compiler_needs_object=no + enable_shared_with_static_runtimes=no + export_dynamic_flag_spec= + export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' + hardcode_automatic=no + hardcode_direct=no + hardcode_direct_absolute=no + hardcode_libdir_flag_spec= + hardcode_libdir_separator= + hardcode_minus_L=no + hardcode_shlibpath_var=unsupported + inherit_rpath=no + link_all_deplibs=unknown + module_cmds= + module_expsym_cmds= + old_archive_from_new_cmds= + old_archive_from_expsyms_cmds= + thread_safe_flag_spec= + whole_archive_flag_spec= + # include_expsyms should be a list of space-separated symbols to be *always* + # included in the symbol list + include_expsyms= + # exclude_expsyms can be an extended regexp of symbols to exclude + # it will be wrapped by ' (' and ')$', so one must not match beginning or + # end of line. Example: 'a|bc|.*d.*' will exclude the symbols 'a' and 'bc', + # as well as any symbol that contains 'd'. + exclude_expsyms='_GLOBAL_OFFSET_TABLE_|_GLOBAL__F[ID]_.*' + # Although _GLOBAL_OFFSET_TABLE_ is a valid symbol C name, most a.out + # platforms (ab)use it in PIC code, but their linkers get confused if + # the symbol is explicitly referenced. Since portable code cannot + # rely on this symbol name, it's probably fine to never include it in + # preloaded symbol tables. + # Exclude shared library initialization/finalization symbols. + extract_expsyms_cmds= + + case $host_os in + cygwin* | mingw* | pw32* | cegcc*) + # FIXME: the MSVC++ and ICC port hasn't been tested in a loooong time + # When not using gcc, we currently assume that we are using + # Microsoft Visual C++ or Intel C++ Compiler. 
+ if test yes != "$GCC"; then + with_gnu_ld=no + fi + ;; + interix*) + # we just hope/assume this is gcc and not c89 (= MSVC++ or ICC) + with_gnu_ld=yes + ;; + openbsd* | bitrig*) + with_gnu_ld=no + ;; + esac + + ld_shlibs=yes + + # On some targets, GNU ld is compatible enough with the native linker + # that we're better off using the native interface for both. + lt_use_gnu_ld_interface=no + if test yes = "$with_gnu_ld"; then + case $host_os in + aix*) + # The AIX port of GNU ld has always aspired to compatibility + # with the native linker. However, as the warning in the GNU ld + # block says, versions before 2.19.5* couldn't really create working + # shared libraries, regardless of the interface used. + case `$LD -v 2>&1` in + *\ \(GNU\ Binutils\)\ 2.19.5*) ;; + *\ \(GNU\ Binutils\)\ 2.[2-9]*) ;; + *\ \(GNU\ Binutils\)\ [3-9]*) ;; + *) + lt_use_gnu_ld_interface=yes + ;; + esac + ;; + *) + lt_use_gnu_ld_interface=yes + ;; + esac fi - test -z "${LDCXX+set}" || LD=$LDCXX - CC=${CXX-"c++"} - CFLAGS=$CXXFLAGS - compiler=$CC - compiler_CXX=$CC - func_cc_basename $compiler -cc_basename=$func_cc_basename_result + if test yes = "$lt_use_gnu_ld_interface"; then + # If archive_cmds runs LD, not CC, wlarc should be empty + wlarc='$wl' - if test -n "$compiler"; then - # We don't want -fno-exception when compiling C++ code, so set the - # no_builtin_flag separately - if test yes = "$GXX"; then - lt_prog_compiler_no_builtin_flag_CXX=' -fno-builtin' + # Set some defaults for GNU ld with shared library support. These + # are reset later if shared libraries are not supported. Putting them + # here allows them to be overridden if necessary. + runpath_var=LD_RUN_PATH + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + export_dynamic_flag_spec='$wl--export-dynamic' + # ancient GNU ld didn't support --whole-archive et. al. 
+ if $LD --help 2>&1 | $GREP 'no-whole-archive' > /dev/null; then + whole_archive_flag_spec=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive' else - lt_prog_compiler_no_builtin_flag_CXX= + whole_archive_flag_spec= fi + supports_anon_versioning=no + case `$LD -v | $SED -e 's/(^)\+)\s\+//' 2>&1` in + *GNU\ gold*) supports_anon_versioning=yes ;; + *\ [01].* | *\ 2.[0-9].* | *\ 2.10.*) ;; # catch versions < 2.11 + *\ 2.11.93.0.2\ *) supports_anon_versioning=yes ;; # RH7.3 ... + *\ 2.11.92.0.12\ *) supports_anon_versioning=yes ;; # Mandrake 8.2 ... + *\ 2.11.*) ;; # other 2.11 versions + *) supports_anon_versioning=yes ;; + esac - if test yes = "$GXX"; then - # Set up default GNU C++ configuration + # See if GNU ld supports shared libraries. + case $host_os in + aix[3-9]*) + # On AIX/PPC, the GNU linker is very broken + if test ia64 != "$host_cpu"; then + ld_shlibs=no + cat <<_LT_EOF 1>&2 +*** Warning: the GNU linker, at least up to release 2.19, is reported +*** to be unable to reliably create shared libraries on AIX. +*** Therefore, libtool is disabling shared libraries support. If you +*** really care for shared libraries, you may want to install binutils +*** 2.20 or above, or modify your PATH so that a non-GNU linker is found. +*** You will then need to restart the configuration process. +_LT_EOF + fi + ;; -# Check whether --with-gnu-ld was given. 
-if test "${with_gnu_ld+set}" = set; then : - withval=$with_gnu_ld; test no = "$withval" || with_gnu_ld=yes -else - with_gnu_ld=no -fi + amigaos*) + case $host_cpu in + powerpc) + # see comment about AmigaOS4 .so support + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + archive_expsym_cmds='' + ;; + m68k) + archive_cmds='$RM $output_objdir/a2ixlibrary.data~$ECHO "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$ECHO "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$ECHO "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$ECHO "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' + hardcode_libdir_flag_spec='-L$libdir' + hardcode_minus_L=yes + ;; + esac + ;; -ac_prog=ld -if test yes = "$GCC"; then - # Check if gcc -print-prog-name=ld gives a path. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ld used by $CC" >&5 -$as_echo_n "checking for ld used by $CC... " >&6; } - case $host in - *-*-mingw*) - # gcc leaves a trailing carriage return, which upsets mingw - ac_prog=`($CC -print-prog-name=ld) 2>&5 | tr -d '\015'` ;; - *) - ac_prog=`($CC -print-prog-name=ld) 2>&5` ;; - esac - case $ac_prog in - # Accept absolute paths. - [\\/]* | ?:[\\/]*) - re_direlt='/[^/][^/]*/\.\./' - # Canonicalize the pathname of ld - ac_prog=`$ECHO "$ac_prog"| $SED 's%\\\\%/%g'` - while $ECHO "$ac_prog" | $GREP "$re_direlt" > /dev/null 2>&1; do - ac_prog=`$ECHO $ac_prog| $SED "s%$re_direlt%/%"` - done - test -z "$LD" && LD=$ac_prog + beos*) + if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then + allow_undefined_flag=unsupported + # Joseph Beckenbach says some releases of gcc + # support --undefined. This deserves some investigation. 
FIXME + archive_cmds='$CC -nostart $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + else + ld_shlibs=no + fi ;; - "") - # If it fails, then pretend we aren't using GCC. - ac_prog=ld - ;; - *) - # If it is relative, then search for the first ld in PATH. - with_gnu_ld=unknown - ;; - esac -elif test yes = "$with_gnu_ld"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for GNU ld" >&5 -$as_echo_n "checking for GNU ld... " >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for non-GNU ld" >&5 -$as_echo_n "checking for non-GNU ld... " >&6; } -fi -if ${lt_cv_path_LD+:} false; then : - $as_echo_n "(cached) " >&6 -else - if test -z "$LD"; then - lt_save_ifs=$IFS; IFS=$PATH_SEPARATOR - for ac_dir in $PATH; do - IFS=$lt_save_ifs - test -z "$ac_dir" && ac_dir=. - if test -f "$ac_dir/$ac_prog" || test -f "$ac_dir/$ac_prog$ac_exeext"; then - lt_cv_path_LD=$ac_dir/$ac_prog - # Check to see if the program is GNU ld. I'd rather use --version, - # but apparently some variants of GNU ld only accept -v. - # Break only if it was the GNU/non-GNU ld that we prefer. - case `"$lt_cv_path_LD" -v 2>&1 &5 -$as_echo "$LD" >&6; } -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -test -z "$LD" && as_fn_error $? "no acceptable ld found in \$PATH" "$LINENO" 5 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking if the linker ($LD) is GNU ld" >&5 -$as_echo_n "checking if the linker ($LD) is GNU ld... " >&6; } -if ${lt_cv_prog_gnu_ld+:} false; then : - $as_echo_n "(cached) " >&6 -else - # I'd rather use --version here, but apparently some GNU lds only accept -v. -case `$LD -v 2>&1 &5 -$as_echo "$lt_cv_prog_gnu_ld" >&6; } -with_gnu_ld=$lt_cv_prog_gnu_ld + cygwin* | mingw* | pw32* | cegcc*) + # _LT_TAGVAR(hardcode_libdir_flag_spec, ) is actually meaningless, + # as there is no search path for DLLs. 
+ hardcode_libdir_flag_spec='-L$libdir' + export_dynamic_flag_spec='$wl--export-all-symbols' + allow_undefined_flag=unsupported + always_export_symbols=no + enable_shared_with_static_runtimes=yes + export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS][ ]/s/.*[ ]\([^ ]*\)/\1 DATA/;s/^.*[ ]__nm__\([^ ]*\)[ ][^ ]*/\1 DATA/;/^I[ ]/d;/^[AITW][ ]/s/.* //'\'' | sort | uniq > $export_symbols' + exclude_expsyms='[_]+GLOBAL_OFFSET_TABLE_|[_]+GLOBAL__[FID]_.*|[_]+head_[A-Za-z0-9_]+_dll|[A-Za-z0-9_]+_dll_iname' + + if $LD --help 2>&1 | $GREP 'auto-import' > /dev/null; then + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -o $output_objdir/$soname $wl--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + # If the export-symbols file already is a .def file, use it as + # is; otherwise, prepend EXPORTS... + archive_expsym_cmds='if test DEF = "`$SED -n -e '\''s/^[ ]*//'\'' -e '\''/^\(;.*\)*$/d'\'' -e '\''s/^\(EXPORTS\|LIBRARY\)\([ ].*\)*$/DEF/p'\'' -e q $export_symbols`" ; then + cp $export_symbols $output_objdir/$soname.def; + else + echo EXPORTS > $output_objdir/$soname.def; + cat $export_symbols >> $output_objdir/$soname.def; + fi~ + $CC -shared $output_objdir/$soname.def $libobjs $deplibs $compiler_flags -o $output_objdir/$soname $wl--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' + else + ld_shlibs=no + fi + ;; + + haiku*) + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + link_all_deplibs=yes + ;; + + os2*) + hardcode_libdir_flag_spec='-L$libdir' + hardcode_minus_L=yes + allow_undefined_flag=unsupported + shrext_cmds=.dll + archive_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ + $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ + $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ + $ECHO EXPORTS >> $output_objdir/$libname.def~ + emxexp $libobjs | $SED /"_DLL_InitTerm"/d >> 
$output_objdir/$libname.def~ + $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ + emximp -o $lib $output_objdir/$libname.def' + archive_expsym_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ + $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ + $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ + $ECHO EXPORTS >> $output_objdir/$libname.def~ + prefix_cmds="$SED"~ + if test EXPORTS = "`$SED 1q $export_symbols`"; then + prefix_cmds="$prefix_cmds -e 1d"; + fi~ + prefix_cmds="$prefix_cmds -e \"s/^\(.*\)$/_\1/g\""~ + cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~ + $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ + emximp -o $lib $output_objdir/$libname.def' + old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def' + enable_shared_with_static_runtimes=yes + file_list_spec='@' + ;; + + interix[3-9]*) + hardcode_direct=no + hardcode_shlibpath_var=no + hardcode_libdir_flag_spec='$wl-rpath,$libdir' + export_dynamic_flag_spec='$wl-E' + # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. + # Instead, shared libraries are loaded at an image base (0x10000000 by + # default) and relocated if they conflict, which is a slow very memory + # consuming and fragmenting process. To avoid this, we pick a random, + # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link + # time. Moving up from 0x10000000 also allows more sbrk(2) space. 
+ archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + archive_expsym_cmds='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' + ;; + + gnu* | linux* | tpf* | k*bsd*-gnu | kopensolaris*-gnu) + tmp_diet=no + if test linux-dietlibc = "$host_os"; then + case $cc_basename in + diet\ *) tmp_diet=yes;; # linux-dietlibc with static linking (!diet-dyn) + esac + fi + if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null \ + && test no = "$tmp_diet" + then + tmp_addflag=' $pic_flag' + tmp_sharedflag='-shared' + case $cc_basename,$host_cpu in + pgcc*) # Portland Group C compiler + whole_archive_flag_spec='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' + tmp_addflag=' $pic_flag' + ;; + pgf77* | pgf90* | pgf95* | pgfortran*) + # Portland Group f77 and f90 compilers + whole_archive_flag_spec='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' + tmp_addflag=' $pic_flag -Mnomain' ;; + ecc*,ia64* | icc*,ia64*) # Intel C compiler on ia64 + tmp_addflag=' -i_dynamic' ;; + efc*,ia64* | ifort*,ia64*) # Intel Fortran compiler on ia64 + tmp_addflag=' -i_dynamic -nofor_main' ;; + ifc* | ifort*) # Intel Fortran compiler + tmp_addflag=' -nofor_main' ;; + lf95*) # Lahey Fortran 8.1 + whole_archive_flag_spec= + tmp_sharedflag='--shared' ;; + nagfor*) # NAGFOR 5.3 + tmp_sharedflag='-Wl,-shared' ;; + xl[cC]* | bgxl[cC]* | mpixl[cC]*) # IBM XL C 8.0 on PPC (deal with xlf below) + 
tmp_sharedflag='-qmkshrobj' + tmp_addflag= ;; + nvcc*) # Cuda Compiler Driver 2.2 + whole_archive_flag_spec='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' + compiler_needs_object=yes + ;; + esac + case `$CC -V 2>&1 | sed 5q` in + *Sun\ C*) # Sun C 5.9 + whole_archive_flag_spec='$wl--whole-archive`new_convenience=; for conv in $convenience\"\"; do test -z \"$conv\" || new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' + compiler_needs_object=yes + tmp_sharedflag='-G' ;; + *Sun\ F*) # Sun Fortran 8.3 + tmp_sharedflag='-G' ;; + esac + archive_cmds='$CC '"$tmp_sharedflag""$tmp_addflag"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + + if test yes = "$supports_anon_versioning"; then + archive_expsym_cmds='echo "{ global:" > $output_objdir/$libname.ver~ + cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ + echo "local: *; };" >> $output_objdir/$libname.ver~ + $CC '"$tmp_sharedflag""$tmp_addflag"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib' + fi + case $cc_basename in + tcc*) + export_dynamic_flag_spec='-rdynamic' + ;; + xlf* | bgf* | bgxlf* | mpixlf*) + # IBM XL Fortran 10.1 on PPC cannot create shared libs itself + whole_archive_flag_spec='--whole-archive$convenience --no-whole-archive' + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + archive_cmds='$LD -shared $libobjs $deplibs $linker_flags -soname $soname -o $lib' + if test yes = "$supports_anon_versioning"; then + archive_expsym_cmds='echo "{ global:" > $output_objdir/$libname.ver~ + cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ + echo "local: *; };" >> $output_objdir/$libname.ver~ + $LD -shared $libobjs $deplibs $linker_flags -soname $soname -version-script 
$output_objdir/$libname.ver -o $lib' + fi + ;; + esac + else + ld_shlibs=no + fi + ;; + netbsd*) + if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then + archive_cmds='$LD -Bshareable $libobjs $deplibs $linker_flags -o $lib' + wlarc= + else + archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' + fi + ;; + solaris*) + if $LD -v 2>&1 | $GREP 'BFD 2\.8' > /dev/null; then + ld_shlibs=no + cat <<_LT_EOF 1>&2 +*** Warning: The releases 2.8.* of the GNU linker cannot reliably +*** create shared libraries on Solaris systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.9.1 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. +_LT_EOF + elif $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then + archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' + else + ld_shlibs=no + fi + ;; + sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX*) + case `$LD -v 2>&1` in + *\ [01].* | *\ 2.[0-9].* | *\ 2.1[0-5].*) + ld_shlibs=no + cat <<_LT_EOF 1>&2 - # Check if GNU C++ uses GNU ld as the underlying linker, since the - # archiving commands below assume that GNU ld is being used. 
- if test yes = "$with_gnu_ld"; then - archive_cmds_CXX='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds_CXX='$CC $pic_flag -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' +*** Warning: Releases of the GNU linker prior to 2.16.91.0.3 cannot +*** reliably create shared libraries on SCO systems. Therefore, libtool +*** is disabling shared libraries support. We urge you to upgrade GNU +*** binutils to release 2.16.91.0.3 or newer. Another option is to modify +*** your PATH or compiler configuration so that the native linker is +*** used, and then restart. - hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir' - export_dynamic_flag_spec_CXX='$wl--export-dynamic' +_LT_EOF + ;; + *) + # For security reasons, it is highly recommended that you always + # use absolute paths for naming shared libraries, and exclude the + # DT_RUNPATH tag from executables and libraries. But doing so + # requires that you compile everything twice, which is a pain. + if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' + else + ld_shlibs=no + fi + ;; + esac + ;; - # If archive_cmds runs LD, not CC, wlarc should be empty - # XXX I think wlarc can be eliminated in ltcf-cxx, but I need to - # investigate it a little bit more. (MM) - wlarc='$wl' + sunos4*) + archive_cmds='$LD -assert pure-text -Bshareable -o $lib $libobjs $deplibs $linker_flags' + wlarc= + hardcode_direct=yes + hardcode_shlibpath_var=no + ;; - # ancient GNU ld didn't support --whole-archive et. al. 
- if eval "`$CC -print-prog-name=ld` --help 2>&1" | - $GREP 'no-whole-archive' > /dev/null; then - whole_archive_flag_spec_CXX=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive' - else - whole_archive_flag_spec_CXX= - fi + *) + if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then + archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' else - with_gnu_ld=no - wlarc= - - # A generic and very simple default shared library creation - # command for GNU C++ for the case where it uses the native - # linker, instead of GNU ld. If possible, this setting should - # overridden to take advantage of the native linker features on - # the platform it is being used on. - archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib' + ld_shlibs=no fi + ;; + esac - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. - output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"' - - else - GXX=no - with_gnu_ld=no - wlarc= + if test no = "$ld_shlibs"; then + runpath_var= + hardcode_libdir_flag_spec= + export_dynamic_flag_spec= + whole_archive_flag_spec= fi - - # PORTME: fill in a description of your system's C++ link characteristics - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the $compiler linker ($LD) supports shared libraries" >&5 -$as_echo_n "checking whether the $compiler linker ($LD) supports shared libraries... 
" >&6; } - ld_shlibs_CXX=yes + else + # PORTME fill in a description of your system's linker (not GNU ld) case $host_os in - aix3*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - aix[4-9]*) - if test ia64 = "$host_cpu"; then - # On IA64, the linker does run time linking by default, so we don't - # have to do anything special. - aix_use_runtimelinking=no - exp_sym_flag='-Bexport' - no_entry_flag= - else - aix_use_runtimelinking=no - - # Test if we are trying to use run time linking or normal - # AIX style linking. If -brtl is somewhere in LDFLAGS, we - # have runtime linking enabled, and use it for executables. - # For shared libraries, we enable/disable runtime linking - # depending on the kind of the shared library created - - # when "with_aix_soname,aix_use_runtimelinking" is: - # "aix,no" lib.a(lib.so.V) shared, rtl:no, for executables - # "aix,yes" lib.so shared, rtl:yes, for executables - # lib.a static archive - # "both,no" lib.so.V(shr.o) shared, rtl:yes - # lib.a(lib.so.V) shared, rtl:no, for executables - # "both,yes" lib.so.V(shr.o) shared, rtl:yes, for executables - # lib.a(lib.so.V) shared, rtl:no - # "svr4,*" lib.so.V(shr.o) shared, rtl:yes, for executables - # lib.a static archive - case $host_os in aix4.[23]|aix4.[23].*|aix[5-9]*) - for ld_flag in $LDFLAGS; do - case $ld_flag in - *-brtl*) - aix_use_runtimelinking=yes - break - ;; - esac - done - if test svr4,no = "$with_aix_soname,$aix_use_runtimelinking"; then - # With aix-soname=svr4, we create the lib.so.V shared archives only, - # so we don't have lib.a shared libs to link our executables. - # We have to force runtime linking in this case. 
- aix_use_runtimelinking=yes - LDFLAGS="$LDFLAGS -Wl,-brtl" - fi - ;; - esac + aix3*) + allow_undefined_flag=unsupported + always_export_symbols=yes + archive_expsym_cmds='$LD -o $output_objdir/$soname $libobjs $deplibs $linker_flags -bE:$export_symbols -T512 -H512 -bM:SRE~$AR $AR_FLAGS $lib $output_objdir/$soname' + # Note: this linker hardcodes the directories in LIBPATH if there + # are no directories specified by -L. + hardcode_minus_L=yes + if test yes = "$GCC" && test -z "$lt_prog_compiler_static"; then + # Neither direct hardcoding nor static linking is supported with a + # broken collect2. + hardcode_direct=unsupported + fi + ;; - exp_sym_flag='-bexport' - no_entry_flag='-bnoentry' - fi + aix[4-9]*) + if test ia64 = "$host_cpu"; then + # On IA64, the linker does run time linking by default, so we don't + # have to do anything special. + aix_use_runtimelinking=no + exp_sym_flag='-Bexport' + no_entry_flag= + else + # If we're using GNU nm, then we don't want the "-C" option. + # -C means demangle to GNU nm, but means don't demangle to AIX nm. + # Without the "-l" option, or with the "-B" option, AIX nm treats + # weak defined symbols like other global defined symbols, whereas + # GNU nm marks them as "W". + # While the 'weak' keyword is ignored in the Export File, we need + # it in the Import File for the 'aix-soname' feature, so we have + # to replace the "-B" option with "-P" for AIX nm. 
+ if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then + export_symbols_cmds='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && (substr(\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols' + else + export_symbols_cmds='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols' + fi + aix_use_runtimelinking=no - # When large executables or shared objects are built, AIX ld can - # have problems creating the table of contents. If linking a library - # or program results in "error TOC overflow" add -mminimal-toc to - # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not - # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. - - archive_cmds_CXX='' - hardcode_direct_CXX=yes - hardcode_direct_absolute_CXX=yes - hardcode_libdir_separator_CXX=':' - link_all_deplibs_CXX=yes - file_list_spec_CXX='$wl-f,' - case $with_aix_soname,$aix_use_runtimelinking in - aix,*) ;; # no import file - svr4,* | *,yes) # use import file - # The Import File defines what to hardcode. - hardcode_direct_CXX=no - hardcode_direct_absolute_CXX=no - ;; - esac + # Test if we are trying to use run time linking or normal + # AIX style linking. If -brtl is somewhere in LDFLAGS, we + # have runtime linking enabled, and use it for executables. 
+ # For shared libraries, we enable/disable runtime linking + # depending on the kind of the shared library created - + # when "with_aix_soname,aix_use_runtimelinking" is: + # "aix,no" lib.a(lib.so.V) shared, rtl:no, for executables + # "aix,yes" lib.so shared, rtl:yes, for executables + # lib.a static archive + # "both,no" lib.so.V(shr.o) shared, rtl:yes + # lib.a(lib.so.V) shared, rtl:no, for executables + # "both,yes" lib.so.V(shr.o) shared, rtl:yes, for executables + # lib.a(lib.so.V) shared, rtl:no + # "svr4,*" lib.so.V(shr.o) shared, rtl:yes, for executables + # lib.a static archive + case $host_os in aix4.[23]|aix4.[23].*|aix[5-9]*) + for ld_flag in $LDFLAGS; do + if (test x-brtl = "x$ld_flag" || test x-Wl,-brtl = "x$ld_flag"); then + aix_use_runtimelinking=yes + break + fi + done + if test svr4,no = "$with_aix_soname,$aix_use_runtimelinking"; then + # With aix-soname=svr4, we create the lib.so.V shared archives only, + # so we don't have lib.a shared libs to link our executables. + # We have to force runtime linking in this case. + aix_use_runtimelinking=yes + LDFLAGS="$LDFLAGS -Wl,-brtl" + fi + ;; + esac + + exp_sym_flag='-bexport' + no_entry_flag='-bnoentry' + fi + + # When large executables or shared objects are built, AIX ld can + # have problems creating the table of contents. If linking a library + # or program results in "error TOC overflow" add -mminimal-toc to + # CXXFLAGS/CFLAGS for g++/gcc. In the cases where that is not + # enough to fix the problem, add -Wl,-bbigtoc to LDFLAGS. + + archive_cmds='' + hardcode_direct=yes + hardcode_direct_absolute=yes + hardcode_libdir_separator=':' + link_all_deplibs=yes + file_list_spec='$wl-f,' + case $with_aix_soname,$aix_use_runtimelinking in + aix,*) ;; # traditional, no import file + svr4,* | *,yes) # use import file + # The Import File defines what to hardcode. 
+ hardcode_direct=no + hardcode_direct_absolute=no + ;; + esac - if test yes = "$GXX"; then - case $host_os in aix4.[012]|aix4.[012].*) - # We only want to do this on AIX 4.2 and lower, the check - # below for broken collect2 doesn't work under 4.3+ + if test yes = "$GCC"; then + case $host_os in aix4.[012]|aix4.[012].*) + # We only want to do this on AIX 4.2 and lower, the check + # below for broken collect2 doesn't work under 4.3+ collect2name=`$CC -print-prog-name=collect2` if test -f "$collect2name" && - strings "$collect2name" | $GREP resolve_lib_name >/dev/null + strings "$collect2name" | $GREP resolve_lib_name >/dev/null then - # We have reworked collect2 - : + # We have reworked collect2 + : else - # We have old collect2 - hardcode_direct_CXX=unsupported - # It fails to find uninstalled libraries when the uninstalled - # path is not listed in the libpath. Setting hardcode_minus_L - # to unsupported forces relinking - hardcode_minus_L_CXX=yes - hardcode_libdir_flag_spec_CXX='-L$libdir' - hardcode_libdir_separator_CXX= + # We have old collect2 + hardcode_direct=unsupported + # It fails to find uninstalled libraries when the uninstalled + # path is not listed in the libpath. Setting hardcode_minus_L + # to unsupported forces relinking + hardcode_minus_L=yes + hardcode_libdir_flag_spec='-L$libdir' + hardcode_libdir_separator= fi - esac - shared_flag='-shared' + ;; + esac + shared_flag='-shared' + if test yes = "$aix_use_runtimelinking"; then + shared_flag="$shared_flag "'$wl-G' + fi + # Need to ensure runtime linking is disabled for the traditional + # shared library, or the linker may eventually find shared libraries + # /with/ Import File - we do not want to mix them. + shared_flag_aix='-shared' + shared_flag_svr4='-shared $wl-G' + else + # not using gcc + if test ia64 = "$host_cpu"; then + # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release + # chokes on -Wl,-G. 
The following line is correct: + shared_flag='-G' + else if test yes = "$aix_use_runtimelinking"; then - shared_flag=$shared_flag' $wl-G' + shared_flag='$wl-G' + else + shared_flag='$wl-bM:SRE' fi - # Need to ensure runtime linking is disabled for the traditional - # shared library, or the linker may eventually find shared libraries - # /with/ Import File - we do not want to mix them. - shared_flag_aix='-shared' - shared_flag_svr4='-shared $wl-G' - else - # not using gcc - if test ia64 = "$host_cpu"; then - # VisualAge C++, Version 5.5 for AIX 5L for IA-64, Beta 3 Release - # chokes on -Wl,-G. The following line is correct: - shared_flag='-G' - else - if test yes = "$aix_use_runtimelinking"; then - shared_flag='$wl-G' - else - shared_flag='$wl-bM:SRE' - fi - shared_flag_aix='$wl-bM:SRE' - shared_flag_svr4='$wl-G' - fi - fi + shared_flag_aix='$wl-bM:SRE' + shared_flag_svr4='$wl-G' + fi + fi - export_dynamic_flag_spec_CXX='$wl-bexpall' - # It seems that -bexpall does not export symbols beginning with - # underscore (_), so it is better to generate a list of symbols to - # export. - always_export_symbols_CXX=yes - if test aix,yes = "$with_aix_soname,$aix_use_runtimelinking"; then - # Warning - without using the other runtime loading flags (-brtl), - # -berok will link without error, but may produce a broken library. - # The "-G" linker flag allows undefined symbols. - no_undefined_flag_CXX='-bernotok' - # Determine the default libpath from the value encoded in an empty - # executable. - if test set = "${lt_cv_aix_libpath+set}"; then + export_dynamic_flag_spec='$wl-bexpall' + # It seems that -bexpall does not export symbols beginning with + # underscore (_), so it is better to generate a list of symbols to export. + always_export_symbols=yes + if test aix,yes = "$with_aix_soname,$aix_use_runtimelinking"; then + # Warning - without using the other runtime loading flags (-brtl), + # -berok will link without error, but may produce a broken library. 
+ allow_undefined_flag='-berok' + # Determine the default libpath from the value encoded in an + # empty executable. + if test set = "${lt_cv_aix_libpath+set}"; then aix_libpath=$lt_cv_aix_libpath else - if ${lt_cv_aix_libpath__CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else + if test ${lt_cv_aix_libpath_+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_cxx_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : lt_aix_libpath_sed=' /Import File Strings/,/^$/ { @@ -14785,52 +10886,53 @@ if ac_fn_cxx_try_link "$LINENO"; then : p } }' - lt_cv_aix_libpath__CXX=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` + lt_cv_aix_libpath_=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` # Check for a 64-bit object if we didn't find anything. - if test -z "$lt_cv_aix_libpath__CXX"; then - lt_cv_aix_libpath__CXX=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` + if test -z "$lt_cv_aix_libpath_"; then + lt_cv_aix_libpath_=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` fi fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext - if test -z "$lt_cv_aix_libpath__CXX"; then - lt_cv_aix_libpath__CXX=/usr/lib:/lib + if test -z "$lt_cv_aix_libpath_"; then + lt_cv_aix_libpath_=/usr/lib:/lib fi fi - aix_libpath=$lt_cv_aix_libpath__CXX + aix_libpath=$lt_cv_aix_libpath_ fi - hardcode_libdir_flag_spec_CXX='$wl-blibpath:$libdir:'"$aix_libpath" - - archive_expsym_cmds_CXX='$CC -o $output_objdir/$soname $libobjs $deplibs $wl'$no_entry_flag' $compiler_flags `if test -n "$allow_undefined_flag"; then func_echo_all "$wl$allow_undefined_flag"; else :; fi` $wl'$exp_sym_flag:\$export_symbols' '$shared_flag - else - if test ia64 = "$host_cpu"; then - 
hardcode_libdir_flag_spec_CXX='$wl-R $libdir:/usr/lib:/lib' - allow_undefined_flag_CXX="-z nodefs" - archive_expsym_cmds_CXX="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\$wl$no_entry_flag"' $compiler_flags $wl$allow_undefined_flag '"\$wl$exp_sym_flag:\$export_symbols" - else - # Determine the default libpath from the value encoded in an - # empty executable. - if test set = "${lt_cv_aix_libpath+set}"; then + hardcode_libdir_flag_spec='$wl-blibpath:$libdir:'"$aix_libpath" + archive_expsym_cmds='$CC -o $output_objdir/$soname $libobjs $deplibs $wl'$no_entry_flag' $compiler_flags `if test -n "$allow_undefined_flag"; then func_echo_all "$wl$allow_undefined_flag"; else :; fi` $wl'$exp_sym_flag:\$export_symbols' '$shared_flag + else + if test ia64 = "$host_cpu"; then + hardcode_libdir_flag_spec='$wl-R $libdir:/usr/lib:/lib' + allow_undefined_flag="-z nodefs" + archive_expsym_cmds="\$CC $shared_flag"' -o $output_objdir/$soname $libobjs $deplibs '"\$wl$no_entry_flag"' $compiler_flags $wl$allow_undefined_flag '"\$wl$exp_sym_flag:\$export_symbols" + else + # Determine the default libpath from the value encoded in an + # empty executable. + if test set = "${lt_cv_aix_libpath+set}"; then aix_libpath=$lt_cv_aix_libpath else - if ${lt_cv_aix_libpath__CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else + if test ${lt_cv_aix_libpath_+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_cxx_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : lt_aix_libpath_sed=' /Import File Strings/,/^$/ { @@ -14839,1017 +10941,756 @@ if ac_fn_cxx_try_link "$LINENO"; then : p } }' - lt_cv_aix_libpath__CXX=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` + lt_cv_aix_libpath_=`dump -H conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` # Check for a 64-bit object if we didn't find anything. 
- if test -z "$lt_cv_aix_libpath__CXX"; then - lt_cv_aix_libpath__CXX=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` + if test -z "$lt_cv_aix_libpath_"; then + lt_cv_aix_libpath_=`dump -HX64 conftest$ac_exeext 2>/dev/null | $SED -n -e "$lt_aix_libpath_sed"` fi fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext - if test -z "$lt_cv_aix_libpath__CXX"; then - lt_cv_aix_libpath__CXX=/usr/lib:/lib + if test -z "$lt_cv_aix_libpath_"; then + lt_cv_aix_libpath_=/usr/lib:/lib fi fi - aix_libpath=$lt_cv_aix_libpath__CXX -fi + aix_libpath=$lt_cv_aix_libpath_ +fi + + hardcode_libdir_flag_spec='$wl-blibpath:$libdir:'"$aix_libpath" + # Warning - without using the other run time loading flags, + # -berok will link without error, but may produce a broken library. + no_undefined_flag=' $wl-bernotok' + allow_undefined_flag=' $wl-berok' + if test yes = "$with_gnu_ld"; then + # We only use this code for GNU lds that support --whole-archive. + whole_archive_flag_spec='$wl--whole-archive$convenience $wl--no-whole-archive' + else + # Exported symbols can be pulled into shared objects from archives + whole_archive_flag_spec='$convenience' + fi + archive_cmds_need_lc=yes + archive_expsym_cmds='$RM -r $output_objdir/$realname.d~$MKDIR $output_objdir/$realname.d' + # -brtl affects multiple linker settings, -berok does not and is overridden later + compiler_flags_filtered='`func_echo_all "$compiler_flags " | $SED -e "s%-brtl\\([, ]\\)%-berok\\1%g"`' + if test svr4 != "$with_aix_soname"; then + # This is similar to how AIX traditionally builds its shared libraries. 
+ archive_expsym_cmds="$archive_expsym_cmds"'~$CC '$shared_flag_aix' -o $output_objdir/$realname.d/$soname $libobjs $deplibs $wl-bnoentry '$compiler_flags_filtered'$wl-bE:$export_symbols$allow_undefined_flag~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$realname.d/$soname' + fi + if test aix != "$with_aix_soname"; then + archive_expsym_cmds="$archive_expsym_cmds"'~$CC '$shared_flag_svr4' -o $output_objdir/$realname.d/$shared_archive_member_spec.o $libobjs $deplibs $wl-bnoentry '$compiler_flags_filtered'$wl-bE:$export_symbols$allow_undefined_flag~$STRIP -e $output_objdir/$realname.d/$shared_archive_member_spec.o~( func_echo_all "#! $soname($shared_archive_member_spec.o)"; if test shr_64 = "$shared_archive_member_spec"; then func_echo_all "# 64"; else func_echo_all "# 32"; fi; cat $export_symbols ) > $output_objdir/$realname.d/$shared_archive_member_spec.imp~$AR $AR_FLAGS $output_objdir/$soname $output_objdir/$realname.d/$shared_archive_member_spec.o $output_objdir/$realname.d/$shared_archive_member_spec.imp' + else + # used by -dlpreopen to get the symbols + archive_expsym_cmds="$archive_expsym_cmds"'~$MV $output_objdir/$realname.d/$soname $output_objdir' + fi + archive_expsym_cmds="$archive_expsym_cmds"'~$RM -r $output_objdir/$realname.d' + fi + fi + ;; + + amigaos*) + case $host_cpu in + powerpc) + # see comment about AmigaOS4 .so support + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' + archive_expsym_cmds='' + ;; + m68k) + archive_cmds='$RM $output_objdir/a2ixlibrary.data~$ECHO "#define NAME $libname" > $output_objdir/a2ixlibrary.data~$ECHO "#define LIBRARY_ID 1" >> $output_objdir/a2ixlibrary.data~$ECHO "#define VERSION $major" >> $output_objdir/a2ixlibrary.data~$ECHO "#define REVISION $revision" >> $output_objdir/a2ixlibrary.data~$AR $AR_FLAGS $lib $libobjs~$RANLIB $lib~(cd $output_objdir && a2ixlibrary -32)' + hardcode_libdir_flag_spec='-L$libdir' + hardcode_minus_L=yes + ;; + esac + ;; + + 
bsdi[45]*) + export_dynamic_flag_spec=-rdynamic + ;; + + cygwin* | mingw* | pw32* | cegcc*) + # When not using gcc, we currently assume that we are using + # Microsoft Visual C++ or Intel C++ Compiler. + # hardcode_libdir_flag_spec is actually meaningless, as there is + # no search path for DLLs. + case $cc_basename in + cl* | icl*) + # Native MSVC or ICC + hardcode_libdir_flag_spec=' ' + allow_undefined_flag=unsupported + always_export_symbols=yes + file_list_spec='@' + # Tell ltmain to make .lib files, not .a files. + libext=lib + # Tell ltmain to make .dll files, not .so files. + shrext_cmds=.dll + # FIXME: Setting linknames here is a bad hack. + archive_cmds='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames=' + archive_expsym_cmds='if test DEF = "`$SED -n -e '\''s/^[ ]*//'\'' -e '\''/^\(;.*\)*$/d'\'' -e '\''s/^\(EXPORTS\|LIBRARY\)\([ ].*\)*$/DEF/p'\'' -e q $export_symbols`" ; then + cp "$export_symbols" "$output_objdir/$soname.def"; + echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp"; + else + $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp; + fi~ + $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~ + linknames=' + # The linker will not automatically build a static lib if we build a DLL. 
+ # _LT_TAGVAR(old_archive_from_new_cmds, )='true' + enable_shared_with_static_runtimes=yes + exclude_expsyms='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*' + export_symbols_cmds='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS][ ]/s/.*[ ]\([^ ]*\)/\1,DATA/'\'' | $SED -e '\''/^[AITW][ ]/s/.*[ ]//'\'' | sort | uniq > $export_symbols' + # Don't use ranlib + old_postinstall_cmds='chmod 644 $oldlib' + postlink_cmds='lt_outputfile="@OUTPUT@"~ + lt_tool_outputfile="@TOOL_OUTPUT@"~ + case $lt_outputfile in + *.exe|*.EXE) ;; + *) + lt_outputfile=$lt_outputfile.exe + lt_tool_outputfile=$lt_tool_outputfile.exe + ;; + esac~ + if test : != "$MANIFEST_TOOL" && test -f "$lt_outputfile.manifest"; then + $MANIFEST_TOOL -manifest "$lt_tool_outputfile.manifest" -outputresource:"$lt_tool_outputfile" || exit 1; + $RM "$lt_outputfile.manifest"; + fi' + ;; + *) + # Assume MSVC and ICC wrapper + hardcode_libdir_flag_spec=' ' + allow_undefined_flag=unsupported + # Tell ltmain to make .lib files, not .a files. + libext=lib + # Tell ltmain to make .dll files, not .so files. + shrext_cmds=.dll + # FIXME: Setting linknames here is a bad hack. + archive_cmds='$CC -o $lib $libobjs $compiler_flags `func_echo_all "$deplibs" | $SED '\''s/ -lc$//'\''` -link -dll~linknames=' + # The linker will automatically build a .lib file if we build a DLL. + old_archive_from_new_cmds='true' + # FIXME: Should let the user specify the lib program. 
+ old_archive_cmds='lib -OUT:$oldlib$oldobjs$old_deplibs' + enable_shared_with_static_runtimes=yes + ;; + esac + ;; + + darwin* | rhapsody*) + + + archive_cmds_need_lc=no + hardcode_direct=no + hardcode_automatic=yes + hardcode_shlibpath_var=unsupported + if test yes = "$lt_cv_ld_force_load"; then + whole_archive_flag_spec='`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience $wl-force_load,$conv\"; done; func_echo_all \"$new_convenience\"`' + + else + whole_archive_flag_spec='' + fi + link_all_deplibs=yes + allow_undefined_flag=$_lt_dar_allow_undefined + case $cc_basename in + ifort*|nagfor*) _lt_dar_can_shared=yes ;; + *) _lt_dar_can_shared=$GCC ;; + esac + if test yes = "$_lt_dar_can_shared"; then + output_verbose_link_cmd=func_echo_all + archive_cmds="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dsymutil" + module_cmds="\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dsymutil" + archive_expsym_cmds="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil" + module_expsym_cmds="sed -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil" + + else + ld_shlibs=no + fi + + ;; + + dgux*) + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_libdir_flag_spec='-L$libdir' + hardcode_shlibpath_var=no + ;; + + # FreeBSD 2.2.[012] allows us to include c++rt0.o to get C++ constructor + # support. 
Future versions do this automatically, but an explicit c++rt0.o + # does not break anything, and helps significantly (at the cost of a little + # extra space). + freebsd2.2*) + archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags /usr/lib/c++rt0.o' + hardcode_libdir_flag_spec='-R$libdir' + hardcode_direct=yes + hardcode_shlibpath_var=no + ;; + + # Unfortunately, older versions of FreeBSD 2 do not have this feature. + freebsd2.*) + archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct=yes + hardcode_minus_L=yes + hardcode_shlibpath_var=no + ;; + + # FreeBSD 3 and greater uses gcc -shared to do shared libraries. + freebsd* | dragonfly*) + archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + hardcode_libdir_flag_spec='-R$libdir' + hardcode_direct=yes + hardcode_shlibpath_var=no + ;; + + hpux9*) + if test yes = "$GCC"; then + archive_cmds='$RM $output_objdir/$soname~$CC -shared $pic_flag $wl+b $wl$install_libdir -o $output_objdir/$soname $libobjs $deplibs $compiler_flags~test "x$output_objdir/$soname" = "x$lib" || mv $output_objdir/$soname $lib' + else + archive_cmds='$RM $output_objdir/$soname~$LD -b +b $install_libdir -o $output_objdir/$soname $libobjs $deplibs $linker_flags~test "x$output_objdir/$soname" = "x$lib" || mv $output_objdir/$soname $lib' + fi + hardcode_libdir_flag_spec='$wl+b $wl$libdir' + hardcode_libdir_separator=: + hardcode_direct=yes - hardcode_libdir_flag_spec_CXX='$wl-blibpath:$libdir:'"$aix_libpath" - # Warning - without using the other run time loading flags, - # -berok will link without error, but may produce a broken library. - no_undefined_flag_CXX=' $wl-bernotok' - allow_undefined_flag_CXX=' $wl-berok' - if test yes = "$with_gnu_ld"; then - # We only use this code for GNU lds that support --whole-archive. 
- whole_archive_flag_spec_CXX='$wl--whole-archive$convenience $wl--no-whole-archive' - else - # Exported symbols can be pulled into shared objects from archives - whole_archive_flag_spec_CXX='$convenience' - fi - archive_cmds_need_lc_CXX=yes - archive_expsym_cmds_CXX='$RM -r $output_objdir/$realname.d~$MKDIR $output_objdir/$realname.d' - # -brtl affects multiple linker settings, -berok does not and is overridden later - compiler_flags_filtered='`func_echo_all "$compiler_flags " | $SED -e "s%-brtl\\([, ]\\)%-berok\\1%g"`' - if test svr4 != "$with_aix_soname"; then - # This is similar to how AIX traditionally builds its shared - # libraries. Need -bnortl late, we may have -brtl in LDFLAGS. - archive_expsym_cmds_CXX="$archive_expsym_cmds_CXX"'~$CC '$shared_flag_aix' -o $output_objdir/$realname.d/$soname $libobjs $deplibs $wl-bnoentry '$compiler_flags_filtered'$wl-bE:$export_symbols$allow_undefined_flag~$AR $AR_FLAGS $output_objdir/$libname$release.a $output_objdir/$realname.d/$soname' - fi - if test aix != "$with_aix_soname"; then - archive_expsym_cmds_CXX="$archive_expsym_cmds_CXX"'~$CC '$shared_flag_svr4' -o $output_objdir/$realname.d/$shared_archive_member_spec.o $libobjs $deplibs $wl-bnoentry '$compiler_flags_filtered'$wl-bE:$export_symbols$allow_undefined_flag~$STRIP -e $output_objdir/$realname.d/$shared_archive_member_spec.o~( func_echo_all "#! 
$soname($shared_archive_member_spec.o)"; if test shr_64 = "$shared_archive_member_spec"; then func_echo_all "# 64"; else func_echo_all "# 32"; fi; cat $export_symbols ) > $output_objdir/$realname.d/$shared_archive_member_spec.imp~$AR $AR_FLAGS $output_objdir/$soname $output_objdir/$realname.d/$shared_archive_member_spec.o $output_objdir/$realname.d/$shared_archive_member_spec.imp' - else - # used by -dlpreopen to get the symbols - archive_expsym_cmds_CXX="$archive_expsym_cmds_CXX"'~$MV $output_objdir/$realname.d/$soname $output_objdir' - fi - archive_expsym_cmds_CXX="$archive_expsym_cmds_CXX"'~$RM -r $output_objdir/$realname.d' - fi - fi - ;; + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + hardcode_minus_L=yes + export_dynamic_flag_spec='$wl-E' + ;; - beos*) - if $LD --help 2>&1 | $GREP ': supported targets:.* elf' > /dev/null; then - allow_undefined_flag_CXX=unsupported - # Joseph Beckenbach says some releases of gcc - # support --undefined. This deserves some investigation. FIXME - archive_cmds_CXX='$CC -nostart $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - else - ld_shlibs_CXX=no - fi - ;; + hpux10*) + if test yes,no = "$GCC,$with_gnu_ld"; then + archive_cmds='$CC -shared $pic_flag $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' + fi + if test no = "$with_gnu_ld"; then + hardcode_libdir_flag_spec='$wl+b $wl$libdir' + hardcode_libdir_separator=: + hardcode_direct=yes + hardcode_direct_absolute=yes + export_dynamic_flag_spec='$wl-E' + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. 
+ hardcode_minus_L=yes + fi + ;; - chorus*) - case $cc_basename in - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no + hpux11*) + if test yes,no = "$GCC,$with_gnu_ld"; then + case $host_cpu in + hppa*64*) + archive_cmds='$CC -shared $wl+h $wl$soname -o $lib $libobjs $deplibs $compiler_flags' ;; - esac - ;; - - cygwin* | mingw* | pw32* | cegcc*) - case $GXX,$cc_basename in - ,cl* | no,cl* | ,icl* | no,icl*) - # Native MSVC or ICC - # hardcode_libdir_flag_spec is actually meaningless, as there is - # no search path for DLLs. - hardcode_libdir_flag_spec_CXX=' ' - allow_undefined_flag_CXX=unsupported - always_export_symbols_CXX=yes - file_list_spec_CXX='@' - # Tell ltmain to make .lib files, not .a files. - libext=lib - # Tell ltmain to make .dll files, not .so files. - shrext_cmds=.dll - # FIXME: Setting linknames here is a bad hack. - archive_cmds_CXX='$CC -o $output_objdir/$soname $libobjs $compiler_flags $deplibs -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~linknames=' - archive_expsym_cmds_CXX='if test DEF = "`$SED -n -e '\''s/^[ ]*//'\'' -e '\''/^\(;.*\)*$/d'\'' -e '\''s/^\(EXPORTS\|LIBRARY\)\([ ].*\)*$/DEF/p'\'' -e q $export_symbols`" ; then - cp "$export_symbols" "$output_objdir/$soname.def"; - echo "$tool_output_objdir$soname.def" > "$output_objdir/$soname.exp"; - else - $SED -e '\''s/^/-link -EXPORT:/'\'' < $export_symbols > $output_objdir/$soname.exp; - fi~ - $CC -o $tool_output_objdir$soname $libobjs $compiler_flags $deplibs "@$tool_output_objdir$soname.exp" -Wl,-DLL,-IMPLIB:"$tool_output_objdir$libname.dll.lib"~ - linknames=' - # The linker will not automatically build a static lib if we build a DLL. 
- # _LT_TAGVAR(old_archive_from_new_cmds, CXX)='true' - enable_shared_with_static_runtimes_CXX=yes - # Don't use ranlib - old_postinstall_cmds_CXX='chmod 644 $oldlib' - postlink_cmds_CXX='lt_outputfile="@OUTPUT@"~ - lt_tool_outputfile="@TOOL_OUTPUT@"~ - case $lt_outputfile in - *.exe|*.EXE) ;; - *) - lt_outputfile=$lt_outputfile.exe - lt_tool_outputfile=$lt_tool_outputfile.exe - ;; - esac~ - func_to_tool_file "$lt_outputfile"~ - if test : != "$MANIFEST_TOOL" && test -f "$lt_outputfile.manifest"; then - $MANIFEST_TOOL -manifest "$lt_tool_outputfile.manifest" -outputresource:"$lt_tool_outputfile" || exit 1; - $RM "$lt_outputfile.manifest"; - fi' + ia64*) + archive_cmds='$CC -shared $pic_flag $wl+h $wl$soname $wl+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' ;; *) - # g++ - # _LT_TAGVAR(hardcode_libdir_flag_spec, CXX) is actually meaningless, - # as there is no search path for DLLs. - hardcode_libdir_flag_spec_CXX='-L$libdir' - export_dynamic_flag_spec_CXX='$wl--export-all-symbols' - allow_undefined_flag_CXX=unsupported - always_export_symbols_CXX=no - enable_shared_with_static_runtimes_CXX=yes - - if $LD --help 2>&1 | $GREP 'auto-import' > /dev/null; then - archive_cmds_CXX='$CC -shared -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname $wl--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' - # If the export-symbols file already is a .def file, use it as - # is; otherwise, prepend EXPORTS... 
- archive_expsym_cmds_CXX='if test DEF = "`$SED -n -e '\''s/^[ ]*//'\'' -e '\''/^\(;.*\)*$/d'\'' -e '\''s/^\(EXPORTS\|LIBRARY\)\([ ].*\)*$/DEF/p'\'' -e q $export_symbols`" ; then - cp $export_symbols $output_objdir/$soname.def; - else - echo EXPORTS > $output_objdir/$soname.def; - cat $export_symbols >> $output_objdir/$soname.def; - fi~ - $CC -shared -nostdlib $output_objdir/$soname.def $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $output_objdir/$soname $wl--enable-auto-image-base -Xlinker --out-implib -Xlinker $lib' - else - ld_shlibs_CXX=no - fi + archive_cmds='$CC -shared $pic_flag $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $libobjs $deplibs $compiler_flags' ;; esac - ;; - darwin* | rhapsody*) - + else + case $host_cpu in + hppa*64*) + archive_cmds='$CC -b $wl+h $wl$soname -o $lib $libobjs $deplibs $compiler_flags' + ;; + ia64*) + archive_cmds='$CC -b $wl+h $wl$soname $wl+nodefaultrpath -o $lib $libobjs $deplibs $compiler_flags' + ;; + *) - archive_cmds_need_lc_CXX=no - hardcode_direct_CXX=no - hardcode_automatic_CXX=yes - hardcode_shlibpath_var_CXX=unsupported - if test yes = "$lt_cv_ld_force_load"; then - whole_archive_flag_spec_CXX='`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience $wl-force_load,$conv\"; done; func_echo_all \"$new_convenience\"`' + # Older versions of the 11.00 compiler do not understand -b yet + # (HP92453-01 A.11.01.20 doesn't, HP92453-01 B.11.X.35175-35176.GP does) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if $CC understands -b" >&5 +printf %s "checking if $CC understands -b... 
" >&6; } +if test ${lt_cv_prog_compiler__b+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_prog_compiler__b=no + save_LDFLAGS=$LDFLAGS + LDFLAGS="$LDFLAGS -b" + echo "$lt_simple_link_test_code" > conftest.$ac_ext + if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then + # The linker can only warn and ignore the option if not recognized + # So say no if there are warnings + if test -s conftest.err; then + # Append any errors to the config.log. + cat conftest.err 1>&5 + $ECHO "$_lt_linker_boilerplate" | $SED '/^$/d' > conftest.exp + $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 + if diff conftest.exp conftest.er2 >/dev/null; then + lt_cv_prog_compiler__b=yes + fi + else + lt_cv_prog_compiler__b=yes + fi + fi + $RM -r conftest* + LDFLAGS=$save_LDFLAGS - else - whole_archive_flag_spec_CXX='' - fi - link_all_deplibs_CXX=yes - allow_undefined_flag_CXX=$_lt_dar_allow_undefined - case $cc_basename in - ifort*|nagfor*) _lt_dar_can_shared=yes ;; - *) _lt_dar_can_shared=$GCC ;; - esac - if test yes = "$_lt_dar_can_shared"; then - output_verbose_link_cmd=func_echo_all - archive_cmds_CXX="\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dsymutil" - module_cmds_CXX="\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dsymutil" - archive_expsym_cmds_CXX="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$libobjs \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring $_lt_dar_single_mod$_lt_dar_export_syms$_lt_dsymutil" - module_expsym_cmds_CXX="sed -e 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC \$allow_undefined_flag -o \$lib -bundle \$libobjs \$deplibs \$compiler_flags$_lt_dar_export_syms$_lt_dsymutil" - if test yes != "$lt_cv_apple_cc_single_mod"; then - archive_cmds_CXX="\$CC -r 
-keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dsymutil" - archive_expsym_cmds_CXX="sed 's|^|_|' < \$export_symbols > \$output_objdir/\$libname-symbols.expsym~\$CC -r -keep_private_externs -nostdlib -o \$lib-master.o \$libobjs~\$CC -dynamiclib \$allow_undefined_flag -o \$lib \$lib-master.o \$deplibs \$compiler_flags -install_name \$rpath/\$soname \$verstring$_lt_dar_export_syms$_lt_dsymutil" - fi +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler__b" >&5 +printf "%s\n" "$lt_cv_prog_compiler__b" >&6; } - else - ld_shlibs_CXX=no - fi +if test yes = "$lt_cv_prog_compiler__b"; then + archive_cmds='$CC -b $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $libobjs $deplibs $compiler_flags' +else + archive_cmds='$LD -b +h $soname +b $install_libdir -o $lib $libobjs $deplibs $linker_flags' +fi - ;; + ;; + esac + fi + if test no = "$with_gnu_ld"; then + hardcode_libdir_flag_spec='$wl+b $wl$libdir' + hardcode_libdir_separator=: - os2*) - hardcode_libdir_flag_spec_CXX='-L$libdir' - hardcode_minus_L_CXX=yes - allow_undefined_flag_CXX=unsupported - shrext_cmds=.dll - archive_cmds_CXX='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ - $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ - $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ - $ECHO EXPORTS >> $output_objdir/$libname.def~ - emxexp $libobjs | $SED /"_DLL_InitTerm"/d >> $output_objdir/$libname.def~ - $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ - emximp -o $lib $output_objdir/$libname.def' - archive_expsym_cmds_CXX='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ - $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ - $ECHO "DATA MULTIPLE NONSHARED" 
>> $output_objdir/$libname.def~ - $ECHO EXPORTS >> $output_objdir/$libname.def~ - prefix_cmds="$SED"~ - if test EXPORTS = "`$SED 1q $export_symbols`"; then - prefix_cmds="$prefix_cmds -e 1d"; - fi~ - prefix_cmds="$prefix_cmds -e \"s/^\(.*\)$/_\1/g\""~ - cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~ - $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ - emximp -o $lib $output_objdir/$libname.def' - old_archive_From_new_cmds_CXX='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def' - enable_shared_with_static_runtimes_CXX=yes - file_list_spec_CXX='@' - ;; + case $host_cpu in + hppa*64*|ia64*) + hardcode_direct=no + hardcode_shlibpath_var=no + ;; + *) + hardcode_direct=yes + hardcode_direct_absolute=yes + export_dynamic_flag_spec='$wl-E' - dgux*) - case $cc_basename in - ec++*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - ghcx*) - # Green Hills C++ Compiler - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - esac - ;; + # hardcode_minus_L: Not really in the search PATH, + # but as the default location of the library. + hardcode_minus_L=yes + ;; + esac + fi + ;; - freebsd2.*) - # C++ shared libraries reported to be fairly broken before - # switch to ELF - ld_shlibs_CXX=no - ;; + irix5* | irix6* | nonstopux*) + if test yes = "$GCC"; then + archive_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' + # Try to use the -exported_symbol ld option, if it does not + # work, assume that -exports_file does not work either and + # implicitly export all symbols. + # This should be the same for all languages, so no per-tag cache variable. 
+ { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the $host_os linker accepts -exported_symbol" >&5 +printf %s "checking whether the $host_os linker accepts -exported_symbol... " >&6; } +if test ${lt_cv_irix_exported_symbol+y} +then : + printf %s "(cached) " >&6 +else $as_nop + save_LDFLAGS=$LDFLAGS + LDFLAGS="$LDFLAGS -shared $wl-exported_symbol ${wl}foo $wl-update_registry $wl/dev/null" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +int foo (void) { return 0; } +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + lt_cv_irix_exported_symbol=yes +else $as_nop + lt_cv_irix_exported_symbol=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext + LDFLAGS=$save_LDFLAGS +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_irix_exported_symbol" >&5 +printf "%s\n" "$lt_cv_irix_exported_symbol" >&6; } + if test yes = "$lt_cv_irix_exported_symbol"; then + archive_expsym_cmds='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations $wl-exports_file $wl$export_symbols -o $lib' + fi + else + archive_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' + archive_expsym_cmds='$CC -shared $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -exports_file $export_symbols -o $lib' + fi + archive_cmds_need_lc='no' + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + hardcode_libdir_separator=: + inherit_rpath=yes + link_all_deplibs=yes + ;; - freebsd-elf*) - archive_cmds_need_lc_CXX=no - ;; + linux*) + case $cc_basename in + tcc*) + # Fabrice Bellard et al's Tiny C Compiler + ld_shlibs=yes + 
archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + ;; + esac + ;; - freebsd* | dragonfly*) - # FreeBSD 3 and later use GNU C++ and GNU ld with standard ELF - # conventions - ld_shlibs_CXX=yes - ;; + netbsd*) + if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then + archive_cmds='$LD -Bshareable -o $lib $libobjs $deplibs $linker_flags' # a.out + else + archive_cmds='$LD -shared -o $lib $libobjs $deplibs $linker_flags' # ELF + fi + hardcode_libdir_flag_spec='-R$libdir' + hardcode_direct=yes + hardcode_shlibpath_var=no + ;; - haiku*) - archive_cmds_CXX='$CC -shared $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - link_all_deplibs_CXX=yes - ;; + newsos6) + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct=yes + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + hardcode_libdir_separator=: + hardcode_shlibpath_var=no + ;; - hpux9*) - hardcode_libdir_flag_spec_CXX='$wl+b $wl$libdir' - hardcode_libdir_separator_CXX=: - export_dynamic_flag_spec_CXX='$wl-E' - hardcode_direct_CXX=yes - hardcode_minus_L_CXX=yes # Not in the search PATH, - # but as the default - # location of the library. - - case $cc_basename in - CC*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - aCC*) - archive_cmds_CXX='$RM $output_objdir/$soname~$CC -b $wl+b $wl$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test "x$output_objdir/$soname" = "x$lib" || mv $output_objdir/$soname $lib' - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. - # - # There doesn't appear to be a way to prevent this compiler from - # explicitly linking system object files so we need to strip them - # from the output so that they don't get included in the library - # dependencies. 
- output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $EGREP "\-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"' - ;; - *) - if test yes = "$GXX"; then - archive_cmds_CXX='$RM $output_objdir/$soname~$CC -shared -nostdlib $pic_flag $wl+b $wl$install_libdir -o $output_objdir/$soname $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~test "x$output_objdir/$soname" = "x$lib" || mv $output_objdir/$soname $lib' - else - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - fi - ;; - esac - ;; + *nto* | *qnx*) + ;; - hpux10*|hpux11*) - if test no = "$with_gnu_ld"; then - hardcode_libdir_flag_spec_CXX='$wl+b $wl$libdir' - hardcode_libdir_separator_CXX=: + openbsd* | bitrig*) + if test -f /usr/libexec/ld.so; then + hardcode_direct=yes + hardcode_shlibpath_var=no + hardcode_direct_absolute=yes + if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then + archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags $wl-retain-symbols-file,$export_symbols' + hardcode_libdir_flag_spec='$wl-rpath,$libdir' + export_dynamic_flag_spec='$wl-E' + else + archive_cmds='$CC -shared $pic_flag -o $lib $libobjs $deplibs $compiler_flags' + hardcode_libdir_flag_spec='$wl-rpath,$libdir' + fi + else + ld_shlibs=no + fi + ;; - case $host_cpu in - hppa*64*|ia64*) - ;; - *) - export_dynamic_flag_spec_CXX='$wl-E' - ;; - esac - fi - case $host_cpu in - hppa*64*|ia64*) - hardcode_direct_CXX=no - hardcode_shlibpath_var_CXX=no - ;; - *) - hardcode_direct_CXX=yes - hardcode_direct_absolute_CXX=yes - hardcode_minus_L_CXX=yes # Not in the search PATH, - # but as the default - # location of the library. 
- ;; - esac + os2*) + hardcode_libdir_flag_spec='-L$libdir' + hardcode_minus_L=yes + allow_undefined_flag=unsupported + shrext_cmds=.dll + archive_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ + $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ + $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ + $ECHO EXPORTS >> $output_objdir/$libname.def~ + emxexp $libobjs | $SED /"_DLL_InitTerm"/d >> $output_objdir/$libname.def~ + $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ + emximp -o $lib $output_objdir/$libname.def' + archive_expsym_cmds='$ECHO "LIBRARY ${soname%$shared_ext} INITINSTANCE TERMINSTANCE" > $output_objdir/$libname.def~ + $ECHO "DESCRIPTION \"$libname\"" >> $output_objdir/$libname.def~ + $ECHO "DATA MULTIPLE NONSHARED" >> $output_objdir/$libname.def~ + $ECHO EXPORTS >> $output_objdir/$libname.def~ + prefix_cmds="$SED"~ + if test EXPORTS = "`$SED 1q $export_symbols`"; then + prefix_cmds="$prefix_cmds -e 1d"; + fi~ + prefix_cmds="$prefix_cmds -e \"s/^\(.*\)$/_\1/g\""~ + cat $export_symbols | $prefix_cmds >> $output_objdir/$libname.def~ + $CC -Zdll -Zcrtdll -o $output_objdir/$soname $libobjs $deplibs $compiler_flags $output_objdir/$libname.def~ + emximp -o $lib $output_objdir/$libname.def' + old_archive_From_new_cmds='emximp -o $output_objdir/${libname}_dll.a $output_objdir/$libname.def' + enable_shared_with_static_runtimes=yes + file_list_spec='@' + ;; - case $cc_basename in - CC*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - aCC*) - case $host_cpu in - hppa*64*) - archive_cmds_CXX='$CC -b $wl+h $wl$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - ;; - ia64*) - archive_cmds_CXX='$CC -b $wl+h $wl$soname $wl+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - ;; - *) - archive_cmds_CXX='$CC -b $wl+h $wl$soname $wl+b 
$wl$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - ;; - esac - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. - # - # There doesn't appear to be a way to prevent this compiler from - # explicitly linking system object files so we need to strip them - # from the output so that they don't get included in the library - # dependencies. - output_verbose_link_cmd='templist=`($CC -b $CFLAGS -v conftest.$objext 2>&1) | $GREP "\-L"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"' - ;; - *) - if test yes = "$GXX"; then - if test no = "$with_gnu_ld"; then - case $host_cpu in - hppa*64*) - archive_cmds_CXX='$CC -shared -nostdlib -fPIC $wl+h $wl$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - ;; - ia64*) - archive_cmds_CXX='$CC -shared -nostdlib $pic_flag $wl+h $wl$soname $wl+nodefaultrpath -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - ;; - *) - archive_cmds_CXX='$CC -shared -nostdlib $pic_flag $wl+h $wl$soname $wl+b $wl$install_libdir -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - ;; - esac - fi - else - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - fi - ;; - esac - ;; + osf3*) + if test yes = "$GCC"; then + allow_undefined_flag=' $wl-expect_unresolved $wl\*' + archive_cmds='$CC -shared$allow_undefined_flag $libobjs $deplibs $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' + else + allow_undefined_flag=' -expect_unresolved \*' + archive_cmds='$CC -shared$allow_undefined_flag $libobjs $deplibs $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` 
-update_registry $output_objdir/so_locations -o $lib' + fi + archive_cmds_need_lc='no' + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + hardcode_libdir_separator=: + ;; - interix[3-9]*) - hardcode_direct_CXX=no - hardcode_shlibpath_var_CXX=no - hardcode_libdir_flag_spec_CXX='$wl-rpath,$libdir' - export_dynamic_flag_spec_CXX='$wl-E' - # Hack: On Interix 3.x, we cannot compile PIC because of a broken gcc. - # Instead, shared libraries are loaded at an image base (0x10000000 by - # default) and relocated if they conflict, which is a slow very memory - # consuming and fragmenting process. To avoid this, we pick a random, - # 256 KiB-aligned image base between 0x50000000 and 0x6FFC0000 at link - # time. Moving up from 0x10000000 also allows more sbrk(2) space. - archive_cmds_CXX='$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' - archive_expsym_cmds_CXX='sed "s|^|_|" $export_symbols >$output_objdir/$soname.expsym~$CC -shared $pic_flag $libobjs $deplibs $compiler_flags $wl-h,$soname $wl--retain-symbols-file,$output_objdir/$soname.expsym $wl--image-base,`expr ${RANDOM-$$} % 4096 / 2 \* 262144 + 1342177280` -o $lib' - ;; - irix5* | irix6*) - case $cc_basename in - CC*) - # SGI C++ - archive_cmds_CXX='$CC -shared -all -multigot $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' - - # Archives containing C++ object files must be created using - # "CC -ar", where "CC" is the IRIX C++ compiler. This is - # necessary to make sure instantiated templates are included - # in the archive. 
- old_archive_cmds_CXX='$CC -ar -WR,-u -o $oldlib $oldobjs' - ;; - *) - if test yes = "$GXX"; then - if test no = "$with_gnu_ld"; then - archive_cmds_CXX='$CC -shared $pic_flag -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' - else - archive_cmds_CXX='$CC -shared $pic_flag -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` -o $lib' - fi - fi - link_all_deplibs_CXX=yes - ;; - esac - hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir' - hardcode_libdir_separator_CXX=: - inherit_rpath_CXX=yes - ;; + osf4* | osf5*) # as osf3* with the addition of -msym flag + if test yes = "$GCC"; then + allow_undefined_flag=' $wl-expect_unresolved $wl\*' + archive_cmds='$CC -shared$allow_undefined_flag $pic_flag $libobjs $deplibs $compiler_flags $wl-msym $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' + hardcode_libdir_flag_spec='$wl-rpath $wl$libdir' + else + allow_undefined_flag=' -expect_unresolved \*' + archive_cmds='$CC -shared$allow_undefined_flag $libobjs $deplibs $compiler_flags -msym -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' + archive_expsym_cmds='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done; printf "%s\\n" "-hidden">> $lib.exp~ + $CC -shared$allow_undefined_flag $wl-input $wl$lib.exp $compiler_flags $libobjs $deplibs -soname $soname `test -n "$verstring" && $ECHO "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib~$RM $lib.exp' - linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) - case $cc_basename in - 
KCC*) - # Kuck and Associates, Inc. (KAI) C++ Compiler - - # KCC will only create a shared library if the output file - # ends with ".so" (or ".sl" for HP-UX), so rename the library - # to its proper name (with version) after linking. - archive_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\$tempext\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' - archive_expsym_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo $lib | $SED -e "s/\$tempext\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib $wl-retain-symbols-file,$export_symbols; mv \$templib $lib' - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. - # - # There doesn't appear to be a way to prevent this compiler from - # explicitly linking system object files so we need to strip them - # from the output so that they don't get included in the library - # dependencies. - output_verbose_link_cmd='templist=`$CC $CFLAGS -v conftest.$objext -o libconftest$shared_ext 2>&1 | $GREP "ld"`; rm -f libconftest$shared_ext; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"' - - hardcode_libdir_flag_spec_CXX='$wl-rpath,$libdir' - export_dynamic_flag_spec_CXX='$wl--export-dynamic' - - # Archives containing C++ object files must be created using - # "CC -Bstatic", where "CC" is the KAI C++ compiler. 
- old_archive_cmds_CXX='$CC -Bstatic -o $oldlib $oldobjs' - ;; - icpc* | ecpc* ) - # Intel C++ - with_gnu_ld=yes - # version 8.0 and above of icpc choke on multiply defined symbols - # if we add $predep_objects and $postdep_objects, however 7.1 and - # earlier do not add the objects themselves. - case `$CC -V 2>&1` in - *"Version 7."*) - archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - ;; - *) # Version 8.0 or newer - tmp_idyn= - case $host_cpu in - ia64*) tmp_idyn=' -i_dynamic';; - esac - archive_cmds_CXX='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds_CXX='$CC -shared'"$tmp_idyn"' $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - ;; - esac - archive_cmds_need_lc_CXX=no - hardcode_libdir_flag_spec_CXX='$wl-rpath,$libdir' - export_dynamic_flag_spec_CXX='$wl--export-dynamic' - whole_archive_flag_spec_CXX='$wl--whole-archive$convenience $wl--no-whole-archive' - ;; - pgCC* | pgcpp*) - # Portland Group C++ compiler - case `$CC -V` in - *pgCC\ [1-5].* | *pgcpp\ [1-5].*) - prelink_cmds_CXX='tpldir=Template.dir~ - rm -rf $tpldir~ - $CC --prelink_objects --instantiation_dir $tpldir $objs $libobjs $compile_deplibs~ - compile_command="$compile_command `find $tpldir -name \*.o | sort | $NL2SP`"' - old_archive_cmds_CXX='tpldir=Template.dir~ - rm -rf $tpldir~ - $CC --prelink_objects --instantiation_dir $tpldir $oldobjs$old_deplibs~ - $AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name \*.o | sort | $NL2SP`~ - $RANLIB $oldlib' - archive_cmds_CXX='tpldir=Template.dir~ - rm -rf $tpldir~ - $CC --prelink_objects --instantiation_dir $tpldir $predep_objects $libobjs $deplibs $convenience 
$postdep_objects~ - $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds_CXX='tpldir=Template.dir~ - rm -rf $tpldir~ - $CC --prelink_objects --instantiation_dir $tpldir $predep_objects $libobjs $deplibs $convenience $postdep_objects~ - $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - ;; - *) # Version 6 and above use weak symbols - archive_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname $wl-retain-symbols-file $wl$export_symbols -o $lib' - ;; - esac + # Both c and cxx compiler support -rpath directly + hardcode_libdir_flag_spec='-rpath $libdir' + fi + archive_cmds_need_lc='no' + hardcode_libdir_separator=: + ;; - hardcode_libdir_flag_spec_CXX='$wl--rpath $wl$libdir' - export_dynamic_flag_spec_CXX='$wl--export-dynamic' - whole_archive_flag_spec_CXX='$wl--whole-archive`for conv in $convenience\"\"; do test -n \"$conv\" && new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' - ;; - cxx*) - # Compaq C++ - archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname -o $lib' - archive_expsym_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname -o $lib $wl-retain-symbols-file $wl$export_symbols' - - runpath_var=LD_RUN_PATH - hardcode_libdir_flag_spec_CXX='-rpath $libdir' - hardcode_libdir_separator_CXX=: - - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, 
object files and flags are used when - # linking a shared library. - # - # There doesn't appear to be a way to prevent this compiler from - # explicitly linking system object files so we need to strip them - # from the output so that they don't get included in the library - # dependencies. - output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "ld"`; templist=`func_echo_all "$templist" | $SED "s/\(^.*ld.*\)\( .*ld .*$\)/\1/"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "X$list" | $Xsed' - ;; - xl* | mpixl* | bgxl*) - # IBM XL 8.0 on PPC, with GNU ld - hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir' - export_dynamic_flag_spec_CXX='$wl--export-dynamic' - archive_cmds_CXX='$CC -qmkshrobj $libobjs $deplibs $compiler_flags $wl-soname $wl$soname -o $lib' - if test yes = "$supports_anon_versioning"; then - archive_expsym_cmds_CXX='echo "{ global:" > $output_objdir/$libname.ver~ - cat $export_symbols | sed -e "s/\(.*\)/\1;/" >> $output_objdir/$libname.ver~ - echo "local: *; };" >> $output_objdir/$libname.ver~ - $CC -qmkshrobj $libobjs $deplibs $compiler_flags $wl-soname $wl$soname $wl-version-script $wl$output_objdir/$libname.ver -o $lib' - fi - ;; - *) - case `$CC -V 2>&1 | sed 5q` in - *Sun\ C*) - # Sun C++ 5.9 - no_undefined_flag_CXX=' -zdefs' - archive_cmds_CXX='$CC -G$allow_undefined_flag -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - archive_expsym_cmds_CXX='$CC -G$allow_undefined_flag -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-retain-symbols-file $wl$export_symbols' - hardcode_libdir_flag_spec_CXX='-R$libdir' - whole_archive_flag_spec_CXX='$wl--whole-archive`new_convenience=; for conv in $convenience\"\"; do test -z \"$conv\" || new_convenience=\"$new_convenience,$conv\"; done; func_echo_all \"$new_convenience\"` $wl--no-whole-archive' - 
compiler_needs_object_CXX=yes - - # Not sure whether something based on - # $CC $CFLAGS -v conftest.$objext -o libconftest$shared_ext 2>&1 - # would be better. - output_verbose_link_cmd='func_echo_all' - - # Archives containing C++ object files must be created using - # "CC -xar", where "CC" is the Sun C++ compiler. This is - # necessary to make sure instantiated templates are included - # in the archive. - old_archive_cmds_CXX='$CC -xar -o $oldlib $oldobjs' - ;; - esac - ;; + solaris*) + no_undefined_flag=' -z defs' + if test yes = "$GCC"; then + wlarc='$wl' + archive_cmds='$CC -shared $pic_flag $wl-z ${wl}text $wl-h $wl$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ + $CC -shared $pic_flag $wl-z ${wl}text $wl-M $wl$lib.exp $wl-h $wl$soname -o $lib $libobjs $deplibs $compiler_flags~$RM $lib.exp' + else + case `$CC -V 2>&1` in + *"Compilers 5.0"*) + wlarc='' + archive_cmds='$LD -G$allow_undefined_flag -h $soname -o $lib $libobjs $deplibs $linker_flags' + archive_expsym_cmds='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ + $LD -G$allow_undefined_flag -M $lib.exp -h $soname -o $lib $libobjs $deplibs $linker_flags~$RM $lib.exp' + ;; + *) + wlarc='$wl' + archive_cmds='$CC -G$allow_undefined_flag -h $soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ + $CC -G$allow_undefined_flag -M $lib.exp -h $soname -o $lib $libobjs $deplibs $compiler_flags~$RM $lib.exp' + ;; esac + fi + hardcode_libdir_flag_spec='-R$libdir' + hardcode_shlibpath_var=no + case $host_os in + solaris2.[0-5] | solaris2.[0-5].*) ;; + *) + # The compiler driver will combine and reorder linker options, + # but understands '-z linker_flag'. 
GCC discards it without '$wl', + # but is careful enough not to reorder. + # Supported since Solaris 2.6 (maybe 2.5.1?) + if test yes = "$GCC"; then + whole_archive_flag_spec='$wl-z ${wl}allextract$convenience $wl-z ${wl}defaultextract' + else + whole_archive_flag_spec='-z allextract$convenience -z defaultextract' + fi ;; + esac + link_all_deplibs=yes + ;; - lynxos*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - - m88k*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - - mvs*) - case $cc_basename in - cxx*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - esac - ;; + sunos4*) + if test sequent = "$host_vendor"; then + # Use $CC to link under sequent, because it throws in some extra .o + # files that make .init and .fini sections work. + archive_cmds='$CC -G $wl-h $soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds='$LD -assert pure-text -Bstatic -o $lib $libobjs $deplibs $linker_flags' + fi + hardcode_libdir_flag_spec='-L$libdir' + hardcode_direct=yes + hardcode_minus_L=yes + hardcode_shlibpath_var=no + ;; - netbsd*) - if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then - archive_cmds_CXX='$LD -Bshareable -o $lib $predep_objects $libobjs $deplibs $postdep_objects $linker_flags' - wlarc= - hardcode_libdir_flag_spec_CXX='-R$libdir' - hardcode_direct_CXX=yes - hardcode_shlibpath_var_CXX=no - fi - # Workaround some broken pre-1.5 toolchains - output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP conftest.$objext | $SED -e "s:-lgcc -lc -lgcc::"' + sysv4) + case $host_vendor in + sni) + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct=yes # is this really true??? ;; - - *nto* | *qnx*) - ld_shlibs_CXX=yes + siemens) + ## LD is ld it makes a PLAMLIB + ## CC just makes a GrossModule. 
+ archive_cmds='$LD -G -o $lib $libobjs $deplibs $linker_flags' + reload_cmds='$CC -r -o $output$reload_objs' + hardcode_direct=no + ;; + motorola) + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_direct=no #Motorola manual says yes, but my tests say they lie ;; + esac + runpath_var='LD_RUN_PATH' + hardcode_shlibpath_var=no + ;; - openbsd* | bitrig*) - if test -f /usr/libexec/ld.so; then - hardcode_direct_CXX=yes - hardcode_shlibpath_var_CXX=no - hardcode_direct_absolute_CXX=yes - archive_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -o $lib' - hardcode_libdir_flag_spec_CXX='$wl-rpath,$libdir' - if test -z "`echo __ELF__ | $CC -E - | grep __ELF__`"; then - archive_expsym_cmds_CXX='$CC -shared $pic_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-retain-symbols-file,$export_symbols -o $lib' - export_dynamic_flag_spec_CXX='$wl-E' - whole_archive_flag_spec_CXX=$wlarc'--whole-archive$convenience '$wlarc'--no-whole-archive' - fi - output_verbose_link_cmd=func_echo_all - else - ld_shlibs_CXX=no - fi - ;; + sysv4.3*) + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_shlibpath_var=no + export_dynamic_flag_spec='-Bexport' + ;; - osf3* | osf4* | osf5*) - case $cc_basename in - KCC*) - # Kuck and Associates, Inc. (KAI) C++ Compiler + sysv4*MP*) + if test -d /usr/nec; then + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_shlibpath_var=no + runpath_var=LD_RUN_PATH + hardcode_runpath_var=yes + ld_shlibs=yes + fi + ;; - # KCC will only create a shared library if the output file - # ends with ".so" (or ".sl" for HP-UX), so rename the library - # to its proper name (with version) after linking. 
- archive_cmds_CXX='tempext=`echo $shared_ext | $SED -e '\''s/\([^()0-9A-Za-z{}]\)/\\\\\1/g'\''`; templib=`echo "$lib" | $SED -e "s/\$tempext\..*/.so/"`; $CC $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags --soname $soname -o \$templib; mv \$templib $lib' + sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7* | sco3.2v5.0.[024]*) + no_undefined_flag='$wl-z,text' + archive_cmds_need_lc=no + hardcode_shlibpath_var=no + runpath_var='LD_RUN_PATH' - hardcode_libdir_flag_spec_CXX='$wl-rpath,$libdir' - hardcode_libdir_separator_CXX=: + if test yes = "$GCC"; then + archive_cmds='$CC -shared $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -shared $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds='$CC -G $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -G $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + fi + ;; - # Archives containing C++ object files must be created using - # the KAI C++ compiler. 
- case $host in - osf3*) old_archive_cmds_CXX='$CC -Bstatic -o $oldlib $oldobjs' ;; - *) old_archive_cmds_CXX='$CC -o $oldlib $oldobjs' ;; - esac - ;; - RCC*) - # Rational C++ 2.4.1 - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - cxx*) - case $host in - osf3*) - allow_undefined_flag_CXX=' $wl-expect_unresolved $wl\*' - archive_cmds_CXX='$CC -shared$allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $soname `test -n "$verstring" && func_echo_all "$wl-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' - hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir' - ;; - *) - allow_undefined_flag_CXX=' -expect_unresolved \*' - archive_cmds_CXX='$CC -shared$allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname `test -n "$verstring" && func_echo_all "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib' - archive_expsym_cmds_CXX='for i in `cat $export_symbols`; do printf "%s %s\\n" -exported_symbol "\$i" >> $lib.exp; done~ - echo "-hidden">> $lib.exp~ - $CC -shared$allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags -msym -soname $soname $wl-input $wl$lib.exp `test -n "$verstring" && $ECHO "-set_version $verstring"` -update_registry $output_objdir/so_locations -o $lib~ - $RM $lib.exp' - hardcode_libdir_flag_spec_CXX='-rpath $libdir' - ;; - esac + sysv5* | sco3.2v5* | sco5v6*) + # Note: We CANNOT use -z defs as we might desire, because we do not + # link with -lc, and that would cause any symbols used from libc to + # always be unresolved, which means just about no library would + # ever link correctly. If we're not using GNU ld we use -z text + # though, which does catch some bad symbols but isn't as heavy-handed + # as -z defs. 
+ no_undefined_flag='$wl-z,text' + allow_undefined_flag='$wl-z,nodefs' + archive_cmds_need_lc=no + hardcode_shlibpath_var=no + hardcode_libdir_flag_spec='$wl-R,$libdir' + hardcode_libdir_separator=':' + link_all_deplibs=yes + export_dynamic_flag_spec='$wl-Bexport' + runpath_var='LD_RUN_PATH' - hardcode_libdir_separator_CXX=: - - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. - # - # There doesn't appear to be a way to prevent this compiler from - # explicitly linking system object files so we need to strip them - # from the output so that they don't get included in the library - # dependencies. - output_verbose_link_cmd='templist=`$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP "ld" | $GREP -v "ld:"`; templist=`func_echo_all "$templist" | $SED "s/\(^.*ld.*\)\( .*ld.*$\)/\1/"`; list= ; for z in $templist; do case $z in conftest.$objext) list="$list $z";; *.$objext);; *) list="$list $z";;esac; done; func_echo_all "$list"' - ;; - *) - if test yes,no = "$GXX,$with_gnu_ld"; then - allow_undefined_flag_CXX=' $wl-expect_unresolved $wl\*' - case $host in - osf3*) - archive_cmds_CXX='$CC -shared -nostdlib $allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' - ;; - *) - archive_cmds_CXX='$CC -shared $pic_flag -nostdlib $allow_undefined_flag $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-msym $wl-soname $wl$soname `test -n "$verstring" && func_echo_all "$wl-set_version $wl$verstring"` $wl-update_registry $wl$output_objdir/so_locations -o $lib' - ;; - esac - - hardcode_libdir_flag_spec_CXX='$wl-rpath $wl$libdir' - hardcode_libdir_separator_CXX=: - - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files 
and flags are used when - # linking a shared library. - output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"' - - else - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - fi - ;; - esac - ;; + if test yes = "$GCC"; then + archive_cmds='$CC -shared $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -shared $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + else + archive_cmds='$CC -G $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + archive_expsym_cmds='$CC -G $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' + fi + ;; - psos*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; + uts4*) + archive_cmds='$LD -G -h $soname -o $lib $libobjs $deplibs $linker_flags' + hardcode_libdir_flag_spec='-L$libdir' + hardcode_shlibpath_var=no + ;; - sunos4*) - case $cc_basename in - CC*) - # Sun C++ 4.x - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - lcc*) - # Lucid - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - esac - ;; + *) + ld_shlibs=no + ;; + esac - solaris*) - case $cc_basename in - CC* | sunCC*) - # Sun C++ 4.2, 5.x and Centerline C++ - archive_cmds_need_lc_CXX=yes - no_undefined_flag_CXX=' -zdefs' - archive_cmds_CXX='$CC -G$allow_undefined_flag -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags' - archive_expsym_cmds_CXX='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ - $CC -G$allow_undefined_flag $wl-M $wl$lib.exp -h$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$RM $lib.exp' - - hardcode_libdir_flag_spec_CXX='-R$libdir' - hardcode_shlibpath_var_CXX=no - case $host_os in - solaris2.[0-5] | solaris2.[0-5].*) ;; 
- *) - # The compiler driver will combine and reorder linker options, - # but understands '-z linker_flag'. - # Supported since Solaris 2.6 (maybe 2.5.1?) - whole_archive_flag_spec_CXX='-z allextract$convenience -z defaultextract' - ;; - esac - link_all_deplibs_CXX=yes + if test sni = "$host_vendor"; then + case $host in + sysv4 | sysv4.2uw2* | sysv4.3* | sysv5*) + export_dynamic_flag_spec='$wl-Blargedynsym' + ;; + esac + fi + fi - output_verbose_link_cmd='func_echo_all' +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ld_shlibs" >&5 +printf "%s\n" "$ld_shlibs" >&6; } +test no = "$ld_shlibs" && can_build_shared=no - # Archives containing C++ object files must be created using - # "CC -xar", where "CC" is the Sun C++ compiler. This is - # necessary to make sure instantiated templates are included - # in the archive. - old_archive_cmds_CXX='$CC -xar -o $oldlib $oldobjs' - ;; - gcx*) - # Green Hills C++ Compiler - archive_cmds_CXX='$CC -shared $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-h $wl$soname -o $lib' +with_gnu_ld=$with_gnu_ld - # The C++ compiler must be used to create the archive. 
- old_archive_cmds_CXX='$CC $LDFLAGS -archive -o $oldlib $oldobjs' - ;; - *) - # GNU C++ compiler with Solaris linker - if test yes,no = "$GXX,$with_gnu_ld"; then - no_undefined_flag_CXX=' $wl-z ${wl}defs' - if $CC --version | $GREP -v '^2\.7' > /dev/null; then - archive_cmds_CXX='$CC -shared $pic_flag -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-h $wl$soname -o $lib' - archive_expsym_cmds_CXX='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ - $CC -shared $pic_flag -nostdlib $wl-M $wl$lib.exp $wl-h $wl$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$RM $lib.exp' - - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. - output_verbose_link_cmd='$CC -shared $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"' - else - # g++ 2.7 appears to require '-G' NOT '-shared' on this - # platform. - archive_cmds_CXX='$CC -G -nostdlib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags $wl-h $wl$soname -o $lib' - archive_expsym_cmds_CXX='echo "{ global:" > $lib.exp~cat $export_symbols | $SED -e "s/\(.*\)/\1;/" >> $lib.exp~echo "local: *; };" >> $lib.exp~ - $CC -G -nostdlib $wl-M $wl$lib.exp $wl-h $wl$soname -o $lib $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags~$RM $lib.exp' - - # Commands to make compiler produce verbose output that lists - # what "hidden" libraries, object files and flags are used when - # linking a shared library. 
- output_verbose_link_cmd='$CC -G $CFLAGS -v conftest.$objext 2>&1 | $GREP -v "^Configured with:" | $GREP "\-L"' - fi - - hardcode_libdir_flag_spec_CXX='$wl-R $wl$libdir' - case $host_os in - solaris2.[0-5] | solaris2.[0-5].*) ;; - *) - whole_archive_flag_spec_CXX='$wl-z ${wl}allextract$convenience $wl-z ${wl}defaultextract' - ;; - esac - fi - ;; - esac - ;; - sysv4*uw2* | sysv5OpenUNIX* | sysv5UnixWare7.[01].[10]* | unixware7* | sco3.2v5.0.[024]*) - no_undefined_flag_CXX='$wl-z,text' - archive_cmds_need_lc_CXX=no - hardcode_shlibpath_var_CXX=no - runpath_var='LD_RUN_PATH' - case $cc_basename in - CC*) - archive_cmds_CXX='$CC -G $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds_CXX='$CC -G $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - ;; - *) - archive_cmds_CXX='$CC -shared $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds_CXX='$CC -shared $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - ;; - esac - ;; - sysv5* | sco3.2v5* | sco5v6*) - # Note: We CANNOT use -z defs as we might desire, because we do not - # link with -lc, and that would cause any symbols used from libc to - # always be unresolved, which means just about no library would - # ever link correctly. If we're not using GNU ld we use -z text - # though, which does catch some bad symbols but isn't as heavy-handed - # as -z defs. 
- no_undefined_flag_CXX='$wl-z,text' - allow_undefined_flag_CXX='$wl-z,nodefs' - archive_cmds_need_lc_CXX=no - hardcode_shlibpath_var_CXX=no - hardcode_libdir_flag_spec_CXX='$wl-R,$libdir' - hardcode_libdir_separator_CXX=':' - link_all_deplibs_CXX=yes - export_dynamic_flag_spec_CXX='$wl-Bexport' - runpath_var='LD_RUN_PATH' - case $cc_basename in - CC*) - archive_cmds_CXX='$CC -G $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds_CXX='$CC -G $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - old_archive_cmds_CXX='$CC -Tprelink_objects $oldobjs~ - '"$old_archive_cmds_CXX" - reload_cmds_CXX='$CC -Tprelink_objects $reload_objs~ - '"$reload_cmds_CXX" - ;; - *) - archive_cmds_CXX='$CC -shared $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - archive_expsym_cmds_CXX='$CC -shared $wl-Bexport:$export_symbols $wl-h,$soname -o $lib $libobjs $deplibs $compiler_flags' - ;; - esac - ;; - tandem*) - case $cc_basename in - NCC*) - # NonStop-UX NCC 3.20 - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - esac - ;; - vxworks*) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - *) - # FIXME: insert proper C++ library support - ld_shlibs_CXX=no - ;; - esac - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ld_shlibs_CXX" >&5 -$as_echo "$ld_shlibs_CXX" >&6; } - test no = "$ld_shlibs_CXX" && can_build_shared=no - - GCC_CXX=$GXX - LD_CXX=$LD - - ## CAVEAT EMPTOR: - ## There is no encapsulation within the following macros, do not change - ## the running order or otherwise move them around unless you know exactly - ## what you are doing... 
- # Dependencies to place before and after the object being linked: -predep_objects_CXX= -postdep_objects_CXX= -predeps_CXX= -postdeps_CXX= -compiler_lib_search_path_CXX= - -cat > conftest.$ac_ext <<_LT_EOF -class Foo -{ -public: - Foo (void) { a = 0; } -private: - int a; -}; -_LT_EOF -_lt_libdeps_save_CFLAGS=$CFLAGS -case "$CC $CFLAGS " in #( -*\ -flto*\ *) CFLAGS="$CFLAGS -fno-lto" ;; -*\ -fwhopr*\ *) CFLAGS="$CFLAGS -fno-whopr" ;; -*\ -fuse-linker-plugin*\ *) CFLAGS="$CFLAGS -fno-use-linker-plugin" ;; -esac -if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 + + + +# +# Do we need to explicitly link libc? +# +case "x$archive_cmds_need_lc" in +x|xyes) + # Assume -lc should be added + archive_cmds_need_lc=yes + + if test yes,yes = "$GCC,$enable_shared"; then + case $archive_cmds in + *'~'*) + # FIXME: we may have to deal with multi-command sequences. + ;; + '$CC '*) + # Test whether the compiler implicitly links with -lc since on some + # systems, -lgcc has to come before -lc. If gcc already passes -lc + # to ld, don't add -lc before -lgcc. + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether -lc should be explicitly linked in" >&5 +printf %s "checking whether -lc should be explicitly linked in... " >&6; } +if test ${lt_cv_archive_cmds_need_lc+y} +then : + printf %s "(cached) " >&6 +else $as_nop + $RM conftest* + echo "$lt_simple_compile_test_code" > conftest.$ac_ext + + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 (eval $ac_compile) 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; }; then - # Parse the compiler output and extract the necessary - # objects, libraries and library flags. - - # Sentinel used to keep track of whether or not we are before - # the conftest object file. 
- pre_test_object_deps_done=no - - for p in `eval "$output_verbose_link_cmd"`; do - case $prev$p in - - -L* | -R* | -l*) - # Some compilers place space between "-{L,R}" and the path. - # Remove the space. - if test x-L = "$p" || - test x-R = "$p"; then - prev=$p - continue - fi + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } 2>conftest.err; then + soname=conftest + lib=conftest + libobjs=conftest.$ac_objext + deplibs= + wl=$lt_prog_compiler_wl + pic_flag=$lt_prog_compiler_pic + compiler_flags=-v + linker_flags=-v + verstring= + output_objdir=. + libname=conftest + lt_save_allow_undefined_flag=$allow_undefined_flag + allow_undefined_flag= + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$archive_cmds 2\>\&1 \| $GREP \" -lc \" \>/dev/null 2\>\&1\""; } >&5 + (eval $archive_cmds 2\>\&1 \| $GREP \" -lc \" \>/dev/null 2\>\&1) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } + then + lt_cv_archive_cmds_need_lc=no + else + lt_cv_archive_cmds_need_lc=yes + fi + allow_undefined_flag=$lt_save_allow_undefined_flag + else + cat conftest.err 1>&5 + fi + $RM conftest* + +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_archive_cmds_need_lc" >&5 +printf "%s\n" "$lt_cv_archive_cmds_need_lc" >&6; } + archive_cmds_need_lc=$lt_cv_archive_cmds_need_lc + ;; + esac + fi + ;; +esac + + + + + + + + + + + + + + + + + + + + + - # Expand the sysroot to ease extracting the directories later. 
- if test -z "$prev"; then - case $p in - -L*) func_stripname_cnf '-L' '' "$p"; prev=-L; p=$func_stripname_result ;; - -R*) func_stripname_cnf '-R' '' "$p"; prev=-R; p=$func_stripname_result ;; - -l*) func_stripname_cnf '-l' '' "$p"; prev=-l; p=$func_stripname_result ;; - esac - fi - case $p in - =*) func_stripname_cnf '=' '' "$p"; p=$lt_sysroot$func_stripname_result ;; - esac - if test no = "$pre_test_object_deps_done"; then - case $prev in - -L | -R) - # Internal compiler library paths should come after those - # provided the user. The postdeps already come after the - # user supplied libs so there is no need to process them. - if test -z "$compiler_lib_search_path_CXX"; then - compiler_lib_search_path_CXX=$prev$p - else - compiler_lib_search_path_CXX="${compiler_lib_search_path_CXX} $prev$p" - fi - ;; - # The "-l" case would never come before the object being - # linked, so don't bother handling this case. - esac - else - if test -z "$postdeps_CXX"; then - postdeps_CXX=$prev$p - else - postdeps_CXX="${postdeps_CXX} $prev$p" - fi - fi - prev= - ;; - - *.lto.$objext) ;; # Ignore GCC LTO objects - *.$objext) - # This assumes that the test object file only shows up - # once in the compiler output. - if test "$p" = "conftest.$objext"; then - pre_test_object_deps_done=yes - continue - fi - if test no = "$pre_test_object_deps_done"; then - if test -z "$predep_objects_CXX"; then - predep_objects_CXX=$p - else - predep_objects_CXX="$predep_objects_CXX $p" - fi - else - if test -z "$postdep_objects_CXX"; then - postdep_objects_CXX=$p - else - postdep_objects_CXX="$postdep_objects_CXX $p" - fi - fi - ;; - *) ;; # Ignore the rest. - esac - done - # Clean up. 
- rm -f a.out a.exe -else - echo "libtool.m4: error: problem compiling CXX test program" -fi -$RM -f confest.$objext -CFLAGS=$_lt_libdeps_save_CFLAGS -# PORTME: override above test on systems where it is broken -case $host_os in -interix[3-9]*) - # Interix 3.5 installs completely hosed .la files for C++, so rather than - # hack all around it, let's just trust "g++" to DTRT. - predep_objects_CXX= - postdep_objects_CXX= - postdeps_CXX= - ;; -esac -case " $postdeps_CXX " in -*" -lc "*) archive_cmds_need_lc_CXX=no ;; -esac - compiler_lib_search_dirs_CXX= -if test -n "${compiler_lib_search_path_CXX}"; then - compiler_lib_search_dirs_CXX=`echo " ${compiler_lib_search_path_CXX}" | $SED -e 's! -L! !g' -e 's!^ !!'` -fi @@ -15881,1549 +11722,1598 @@ fi - lt_prog_compiler_wl_CXX= -lt_prog_compiler_pic_CXX= -lt_prog_compiler_static_CXX= - # C++ specific cases for pic, static, wl, etc. - if test yes = "$GXX"; then - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_static_CXX='-static' - case $host_os in - aix*) - # All AIX code is PIC. - if test ia64 = "$host_cpu"; then - # AIX 5 now supports IA64 processor - lt_prog_compiler_static_CXX='-Bstatic' - fi - lt_prog_compiler_pic_CXX='-fPIC' - ;; - amigaos*) - case $host_cpu in - powerpc) - # see comment about AmigaOS4 .so support - lt_prog_compiler_pic_CXX='-fPIC' - ;; - m68k) - # FIXME: we need at least 68020 code to build shared libraries, but - # adding the '-m68020' flag to GCC prevents building anything better, - # like '-m68040'. - lt_prog_compiler_pic_CXX='-m68020 -resident32 -malways-restore-a4' - ;; - esac - ;; - beos* | irix5* | irix6* | nonstopux* | osf3* | osf4* | osf5*) - # PIC is the default for these OSes. - ;; - mingw* | cygwin* | os2* | pw32* | cegcc*) - # This hack is so that the source file can tell whether it is being - # built for inclusion in a dll (and should export symbols for example). 
- # Although the cygwin gcc ignores -fPIC, still need this for old-style - # (--disable-auto-import) libraries - lt_prog_compiler_pic_CXX='-DDLL_EXPORT' - case $host_os in - os2*) - lt_prog_compiler_static_CXX='$wl-static' - ;; - esac - ;; - darwin* | rhapsody*) - # PIC is the default on this platform - # Common symbols not allowed in MH_DYLIB files - lt_prog_compiler_pic_CXX='-fno-common' - ;; - *djgpp*) - # DJGPP does not support shared libraries at all - lt_prog_compiler_pic_CXX= - ;; - haiku*) - # PIC is the default for Haiku. - # The "-static" flag exists, but is broken. - lt_prog_compiler_static_CXX= - ;; - interix[3-9]*) - # Interix 3.x gcc -fpic/-fPIC options generate broken code. - # Instead, we relocate shared libraries at runtime. - ;; - sysv4*MP*) - if test -d /usr/nec; then - lt_prog_compiler_pic_CXX=-Kconform_pic - fi - ;; - hpux*) - # PIC is the default for 64-bit PA HP-UX, but not for 32-bit - # PA HP-UX. On IA64 HP-UX, PIC is the default but the pic flag - # sets the default TLS model and affects inlining. - case $host_cpu in - hppa*64*) - ;; - *) - lt_prog_compiler_pic_CXX='-fPIC' - ;; - esac - ;; - *qnx* | *nto*) - # QNX uses GNU C++, but need to define -shared option too, otherwise - # it will coredump. - lt_prog_compiler_pic_CXX='-fPIC -shared' - ;; - *) - lt_prog_compiler_pic_CXX='-fPIC' - ;; - esac - else - case $host_os in - aix[4-9]*) - # All AIX code is PIC. 
- if test ia64 = "$host_cpu"; then - # AIX 5 now supports IA64 processor - lt_prog_compiler_static_CXX='-Bstatic' - else - lt_prog_compiler_static_CXX='-bnso -bI:/lib/syscalls.exp' - fi - ;; - chorus*) - case $cc_basename in - cxch68*) - # Green Hills C++ Compiler - # _LT_TAGVAR(lt_prog_compiler_static, CXX)="--no_auto_instantiation -u __main -u __premain -u _abort -r $COOL_DIR/lib/libOrb.a $MVME_DIR/lib/CC/libC.a $MVME_DIR/lib/classix/libcx.s.a" - ;; - esac - ;; - mingw* | cygwin* | os2* | pw32* | cegcc*) - # This hack is so that the source file can tell whether it is being - # built for inclusion in a dll (and should export symbols for example). - lt_prog_compiler_pic_CXX='-DDLL_EXPORT' - ;; - dgux*) - case $cc_basename in - ec++*) - lt_prog_compiler_pic_CXX='-KPIC' - ;; - ghcx*) - # Green Hills C++ Compiler - lt_prog_compiler_pic_CXX='-pic' - ;; - *) - ;; - esac - ;; - freebsd* | dragonfly*) - # FreeBSD uses GNU C++ - ;; - hpux9* | hpux10* | hpux11*) - case $cc_basename in - CC*) - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_static_CXX='$wl-a ${wl}archive' - if test ia64 != "$host_cpu"; then - lt_prog_compiler_pic_CXX='+Z' - fi - ;; - aCC*) - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_static_CXX='$wl-a ${wl}archive' - case $host_cpu in - hppa*64*|ia64*) - # +Z the default - ;; - *) - lt_prog_compiler_pic_CXX='+Z' - ;; - esac - ;; - *) - ;; - esac - ;; - interix*) - # This is c89, which is MS Visual C++ (no shared libs) - # Anyone wants to do a port? - ;; - irix5* | irix6* | nonstopux*) - case $cc_basename in - CC*) - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_static_CXX='-non_shared' - # CC pic flag -KPIC is the default. - ;; - *) - ;; - esac - ;; - linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) - case $cc_basename in - KCC*) - # KAI C++ Compiler - lt_prog_compiler_wl_CXX='--backend -Wl,' - lt_prog_compiler_pic_CXX='-fPIC' - ;; - ecpc* ) - # old Intel C++ for x86_64, which still supported -KPIC. 
- lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_pic_CXX='-KPIC' - lt_prog_compiler_static_CXX='-static' - ;; - icpc* ) - # Intel C++, used to be incompatible with GCC. - # ICC 10 doesn't accept -KPIC any more. - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_pic_CXX='-fPIC' - lt_prog_compiler_static_CXX='-static' - ;; - pgCC* | pgcpp*) - # Portland Group C++ compiler - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_pic_CXX='-fpic' - lt_prog_compiler_static_CXX='-Bstatic' - ;; - cxx*) - # Compaq C++ - # Make sure the PIC flag is empty. It appears that all Alpha - # Linux and Compaq Tru64 Unix objects are PIC. - lt_prog_compiler_pic_CXX= - lt_prog_compiler_static_CXX='-non_shared' - ;; - xlc* | xlC* | bgxl[cC]* | mpixl[cC]*) - # IBM XL 8.0, 9.0 on PPC and BlueGene - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_pic_CXX='-qpic' - lt_prog_compiler_static_CXX='-qstaticlink' - ;; - *) - case `$CC -V 2>&1 | sed 5q` in - *Sun\ C*) - # Sun C++ 5.9 - lt_prog_compiler_pic_CXX='-KPIC' - lt_prog_compiler_static_CXX='-Bstatic' - lt_prog_compiler_wl_CXX='-Qoption ld ' - ;; - esac - ;; - esac - ;; - lynxos*) - ;; - m88k*) - ;; - mvs*) - case $cc_basename in - cxx*) - lt_prog_compiler_pic_CXX='-W c,exportall' - ;; - *) - ;; - esac - ;; - netbsd*) - ;; - *qnx* | *nto*) - # QNX uses GNU C++, but need to define -shared option too, otherwise - # it will coredump. - lt_prog_compiler_pic_CXX='-fPIC -shared' - ;; - osf3* | osf4* | osf5*) - case $cc_basename in - KCC*) - lt_prog_compiler_wl_CXX='--backend -Wl,' - ;; - RCC*) - # Rational C++ 2.4.1 - lt_prog_compiler_pic_CXX='-pic' - ;; - cxx*) - # Digital/Compaq C++ - lt_prog_compiler_wl_CXX='-Wl,' - # Make sure the PIC flag is empty. It appears that all Alpha - # Linux and Compaq Tru64 Unix objects are PIC. 
- lt_prog_compiler_pic_CXX= - lt_prog_compiler_static_CXX='-non_shared' - ;; - *) - ;; - esac - ;; - psos*) - ;; - solaris*) - case $cc_basename in - CC* | sunCC*) - # Sun C++ 4.2, 5.x and Centerline C++ - lt_prog_compiler_pic_CXX='-KPIC' - lt_prog_compiler_static_CXX='-Bstatic' - lt_prog_compiler_wl_CXX='-Qoption ld ' - ;; - gcx*) - # Green Hills C++ Compiler - lt_prog_compiler_pic_CXX='-PIC' - ;; - *) - ;; - esac - ;; - sunos4*) - case $cc_basename in - CC*) - # Sun C++ 4.x - lt_prog_compiler_pic_CXX='-pic' - lt_prog_compiler_static_CXX='-Bstatic' - ;; - lcc*) - # Lucid - lt_prog_compiler_pic_CXX='-pic' - ;; - *) - ;; - esac - ;; - sysv5* | unixware* | sco3.2v5* | sco5v6* | OpenUNIX*) - case $cc_basename in - CC*) - lt_prog_compiler_wl_CXX='-Wl,' - lt_prog_compiler_pic_CXX='-KPIC' - lt_prog_compiler_static_CXX='-Bstatic' - ;; - esac - ;; - tandem*) - case $cc_basename in - NCC*) - # NonStop-UX NCC 3.20 - lt_prog_compiler_pic_CXX='-KPIC' - ;; - *) - ;; - esac - ;; - vxworks*) - ;; - *) - lt_prog_compiler_can_build_shared_CXX=no - ;; - esac - fi -case $host_os in - # For platforms that do not support PIC, -DPIC is meaningless: - *djgpp*) - lt_prog_compiler_pic_CXX= - ;; - *) - lt_prog_compiler_pic_CXX="$lt_prog_compiler_pic_CXX -DPIC" - ;; -esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $compiler option to produce PIC" >&5 -$as_echo_n "checking for $compiler option to produce PIC... " >&6; } -if ${lt_cv_prog_compiler_pic_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_pic_CXX=$lt_prog_compiler_pic_CXX -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_CXX" >&5 -$as_echo "$lt_cv_prog_compiler_pic_CXX" >&6; } -lt_prog_compiler_pic_CXX=$lt_cv_prog_compiler_pic_CXX -# -# Check to make sure the PIC flag actually works. 
-# -if test -n "$lt_prog_compiler_pic_CXX"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler PIC flag $lt_prog_compiler_pic_CXX works" >&5 -$as_echo_n "checking if $compiler PIC flag $lt_prog_compiler_pic_CXX works... " >&6; } -if ${lt_cv_prog_compiler_pic_works_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_pic_works_CXX=no - ac_outfile=conftest.$ac_objext - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - lt_compiler_flag="$lt_prog_compiler_pic_CXX -DPIC" ## exclude from sc_useless_quotes_in_assignment - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - # The option is referenced via a variable to avoid confusing sed. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>conftest.err) - ac_status=$? - cat conftest.err >&5 - echo "$as_me:$LINENO: \$? = $ac_status" >&5 - if (exit $ac_status) && test -s "$ac_outfile"; then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings other than the usual output. - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' >conftest.exp - $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 - if test ! 
-s conftest.er2 || diff conftest.exp conftest.er2 >/dev/null; then - lt_cv_prog_compiler_pic_works_CXX=yes - fi - fi - $RM conftest* -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_pic_works_CXX" >&5 -$as_echo "$lt_cv_prog_compiler_pic_works_CXX" >&6; } -if test yes = "$lt_cv_prog_compiler_pic_works_CXX"; then - case $lt_prog_compiler_pic_CXX in - "" | " "*) ;; - *) lt_prog_compiler_pic_CXX=" $lt_prog_compiler_pic_CXX" ;; - esac -else - lt_prog_compiler_pic_CXX= - lt_prog_compiler_can_build_shared_CXX=no -fi -fi -# -# Check to make sure the static flag actually works. -# -wl=$lt_prog_compiler_wl_CXX eval lt_tmp_static_flag=\"$lt_prog_compiler_static_CXX\" -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler static flag $lt_tmp_static_flag works" >&5 -$as_echo_n "checking if $compiler static flag $lt_tmp_static_flag works... " >&6; } -if ${lt_cv_prog_compiler_static_works_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_static_works_CXX=no - save_LDFLAGS=$LDFLAGS - LDFLAGS="$LDFLAGS $lt_tmp_static_flag" - echo "$lt_simple_link_test_code" > conftest.$ac_ext - if (eval $ac_link 2>conftest.err) && test -s conftest$ac_exeext; then - # The linker can only warn and ignore the option if not recognized - # So say no if there are warnings - if test -s conftest.err; then - # Append any errors to the config.log. 
- cat conftest.err 1>&5 - $ECHO "$_lt_linker_boilerplate" | $SED '/^$/d' > conftest.exp - $SED '/^$/d; /^ *+/d' conftest.err >conftest.er2 - if diff conftest.exp conftest.er2 >/dev/null; then - lt_cv_prog_compiler_static_works_CXX=yes - fi - else - lt_cv_prog_compiler_static_works_CXX=yes - fi - fi - $RM -r conftest* - LDFLAGS=$save_LDFLAGS -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_static_works_CXX" >&5 -$as_echo "$lt_cv_prog_compiler_static_works_CXX" >&6; } -if test yes = "$lt_cv_prog_compiler_static_works_CXX"; then - : -else - lt_prog_compiler_static_CXX= -fi - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -c -o file.$ac_objext" >&5 -$as_echo_n "checking if $compiler supports -c -o file.$ac_objext... " >&6; } -if ${lt_cv_prog_compiler_c_o_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_c_o_CXX=no - $RM -r conftest 2>/dev/null - mkdir conftest - cd conftest - mkdir out - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - lt_compiler_flag="-o out/conftest2.$ac_objext" - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>out/conftest.err) - ac_status=$? - cat out/conftest.err >&5 - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - if (exit $ac_status) && test -s out/conftest2.$ac_objext - then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' > out/conftest.exp - $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then - lt_cv_prog_compiler_c_o_CXX=yes - fi - fi - chmod u+w . 2>&5 - $RM conftest* - # SGI C++ compiler will create directory out/ii_files/ for - # template instantiation - test -d out/ii_files && $RM out/ii_files/* && rmdir out/ii_files - $RM out/* && rmdir out - cd .. - $RM -r conftest - $RM conftest* + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking dynamic linker characteristics" >&5 +printf %s "checking dynamic linker characteristics... " >&6; } + +if test yes = "$GCC"; then + case $host_os in + darwin*) lt_awk_arg='/^libraries:/,/LR/' ;; + *) lt_awk_arg='/^libraries:/' ;; + esac + case $host_os in + mingw* | cegcc*) lt_sed_strip_eq='s|=\([A-Za-z]:\)|\1|g' ;; + *) lt_sed_strip_eq='s|=/|/|g' ;; + esac + lt_search_path_spec=`$CC -print-search-dirs | awk $lt_awk_arg | $SED -e "s/^libraries://" -e $lt_sed_strip_eq` + case $lt_search_path_spec in + *\;*) + # if the path contains ";" then we assume it to be the separator + # otherwise default to the standard path separator (i.e. ":") - it is + # assumed that no part of a normal pathname contains ";" but that should + # okay in the real world where ";" in dirpaths is itself problematic. + lt_search_path_spec=`$ECHO "$lt_search_path_spec" | $SED 's/;/ /g'` + ;; + *) + lt_search_path_spec=`$ECHO "$lt_search_path_spec" | $SED "s/$PATH_SEPARATOR/ /g"` + ;; + esac + # Ok, now we have the path, separated by spaces, we can step through it + # and add multilib dir if necessary... 
+ lt_tmp_lt_search_path_spec= + lt_multi_os_dir=/`$CC $CPPFLAGS $CFLAGS $LDFLAGS -print-multi-os-directory 2>/dev/null` + # ...but if some path component already ends with the multilib dir we assume + # that all is fine and trust -print-search-dirs as is (GCC 4.2? or newer). + case "$lt_multi_os_dir; $lt_search_path_spec " in + "/; "* | "/.; "* | "/./; "* | *"$lt_multi_os_dir "* | *"$lt_multi_os_dir/ "*) + lt_multi_os_dir= + ;; + esac + for lt_sys_path in $lt_search_path_spec; do + if test -d "$lt_sys_path$lt_multi_os_dir"; then + lt_tmp_lt_search_path_spec="$lt_tmp_lt_search_path_spec $lt_sys_path$lt_multi_os_dir" + elif test -n "$lt_multi_os_dir"; then + test -d "$lt_sys_path" && \ + lt_tmp_lt_search_path_spec="$lt_tmp_lt_search_path_spec $lt_sys_path" + fi + done + lt_search_path_spec=`$ECHO "$lt_tmp_lt_search_path_spec" | awk ' +BEGIN {RS = " "; FS = "/|\n";} { + lt_foo = ""; + lt_count = 0; + for (lt_i = NF; lt_i > 0; lt_i--) { + if ($lt_i != "" && $lt_i != ".") { + if ($lt_i == "..") { + lt_count++; + } else { + if (lt_count == 0) { + lt_foo = "/" $lt_i lt_foo; + } else { + lt_count--; + } + } + } + } + if (lt_foo != "") { lt_freq[lt_foo]++; } + if (lt_freq[lt_foo] == 1) { print lt_foo; } +}'` + # AWK program above erroneously prepends '/' to C:/dos/paths + # for these hosts. 
+ case $host_os in + mingw* | cegcc*) lt_search_path_spec=`$ECHO "$lt_search_path_spec" |\ + $SED 's|/\([A-Za-z]:\)|\1|g'` ;; + esac + sys_lib_search_path_spec=`$ECHO "$lt_search_path_spec" | $lt_NL2SP` +else + sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o_CXX" >&5 -$as_echo "$lt_cv_prog_compiler_c_o_CXX" >&6; } +library_names_spec= +libname_spec='lib$name' +soname_spec= +shrext_cmds=.so +postinstall_cmds= +postuninstall_cmds= +finish_cmds= +finish_eval= +shlibpath_var= +shlibpath_overrides_runpath=unknown +version_type=none +dynamic_linker="$host_os ld.so" +sys_lib_dlsearch_path_spec="/lib /usr/lib" +need_lib_prefix=unknown +hardcode_into_libs=no +# when you set need_version to no, make sure it does not cause -set_version +# flags to be left without arguments +need_version=unknown - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if $compiler supports -c -o file.$ac_objext" >&5 -$as_echo_n "checking if $compiler supports -c -o file.$ac_objext... " >&6; } -if ${lt_cv_prog_compiler_c_o_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_prog_compiler_c_o_CXX=no - $RM -r conftest 2>/dev/null - mkdir conftest - cd conftest - mkdir out - echo "$lt_simple_compile_test_code" > conftest.$ac_ext - lt_compiler_flag="-o out/conftest2.$ac_objext" - # Insert the option either (1) after the last *FLAGS variable, or - # (2) before a word containing "conftest.", or (3) at the end. - # Note that $ac_compile itself does not contain backslashes and begins - # with a dollar sign (not a hyphen), so the echo should work correctly. - lt_compile=`echo "$ac_compile" | $SED \ - -e 's:.*FLAGS}\{0,1\} :&$lt_compiler_flag :; t' \ - -e 's: [^ ]*conftest\.: $lt_compiler_flag&:; t' \ - -e 's:$: $lt_compiler_flag:'` - (eval echo "\"\$as_me:$LINENO: $lt_compile\"" >&5) - (eval "$lt_compile" 2>out/conftest.err) - ac_status=$? - cat out/conftest.err >&5 - echo "$as_me:$LINENO: \$? 
= $ac_status" >&5 - if (exit $ac_status) && test -s out/conftest2.$ac_objext - then - # The compiler can only warn and ignore the option if not recognized - # So say no if there are warnings - $ECHO "$_lt_compiler_boilerplate" | $SED '/^$/d' > out/conftest.exp - $SED '/^$/d; /^ *+/d' out/conftest.err >out/conftest.er2 - if test ! -s out/conftest.er2 || diff out/conftest.exp out/conftest.er2 >/dev/null; then - lt_cv_prog_compiler_c_o_CXX=yes - fi - fi - chmod u+w . 2>&5 - $RM conftest* - # SGI C++ compiler will create directory out/ii_files/ for - # template instantiation - test -d out/ii_files && $RM out/ii_files/* && rmdir out/ii_files - $RM out/* && rmdir out - cd .. - $RM -r conftest - $RM conftest* +case $host_os in +aix3*) + version_type=linux # correct to gnu/linux during the next big refactor + library_names_spec='$libname$release$shared_ext$versuffix $libname.a' + shlibpath_var=LIBPATH -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_prog_compiler_c_o_CXX" >&5 -$as_echo "$lt_cv_prog_compiler_c_o_CXX" >&6; } + # AIX 3 has no versioning support, so we append a major version to the name. + soname_spec='$libname$release$shared_ext$major' + ;; + +aix[4-9]*) + version_type=linux # correct to gnu/linux during the next big refactor + need_lib_prefix=no + need_version=no + hardcode_into_libs=yes + if test ia64 = "$host_cpu"; then + # AIX 5 supports IA64 + library_names_spec='$libname$release$shared_ext$major $libname$release$shared_ext$versuffix $libname$shared_ext' + shlibpath_var=LD_LIBRARY_PATH + else + # With GCC up to 2.95.x, collect2 would create an import file + # for dependence libraries. The import file would start with + # the line '#! .'. This would cause the generated library to + # depend on '.', always an invalid library. This was fixed in + # development snapshots of GCC prior to 3.0. 
+ case $host_os in + aix4 | aix4.[01] | aix4.[01].*) + if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' + echo ' yes ' + echo '#endif'; } | $CC -E - | $GREP yes > /dev/null; then + : + else + can_build_shared=no + fi + ;; + esac + # Using Import Files as archive members, it is possible to support + # filename-based versioning of shared library archives on AIX. While + # this would work for both with and without runtime linking, it will + # prevent static linking of such archives. So we do filename-based + # shared library versioning with .so extension only, which is used + # when both runtime linking and shared linking is enabled. + # Unfortunately, runtime linking may impact performance, so we do + # not want this to be the default eventually. Also, we use the + # versioned .so libs for executables only if there is the -brtl + # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only. + # To allow for filename-based versioning support, we need to create + # libNAME.so.V as an archive file, containing: + # *) an Import File, referring to the versioned filename of the + # archive as well as the shared archive member, telling the + # bitwidth (32 or 64) of that shared object, and providing the + # list of exported symbols of that shared object, eventually + # decorated with the 'weak' keyword + # *) the shared object with the F_LOADONLY flag set, to really avoid + # it being seen by the linker. + # At run time we better use the real file rather than another symlink, + # but for link time we create the symlink libNAME.so -> libNAME.so.V + case $with_aix_soname,$aix_use_runtimelinking in + # AIX (on Power*) has no versioning support, so currently we cannot hardcode correct + # soname into executable. Probably we can add versioning support to + # collect2, so additional links can be useful in future. 
+ aix,yes) # traditional libtool + dynamic_linker='AIX unversionable lib.so' + # If using run time linking (on AIX 4.2 or later) use lib.so + # instead of lib.a to let people know that these are not + # typical AIX shared libraries. + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + ;; + aix,no) # traditional AIX only + dynamic_linker='AIX lib.a(lib.so.V)' + # We preserve .a as extension for shared libraries through AIX4.2 + # and later when we are not doing run time linking. + library_names_spec='$libname$release.a $libname.a' + soname_spec='$libname$release$shared_ext$major' + ;; + svr4,*) # full svr4 only + dynamic_linker="AIX lib.so.V($shared_archive_member_spec.o)" + library_names_spec='$libname$release$shared_ext$major $libname$shared_ext' + # We do not specify a path in Import Files, so LIBPATH fires. + shlibpath_overrides_runpath=yes + ;; + *,yes) # both, prefer svr4 + dynamic_linker="AIX lib.so.V($shared_archive_member_spec.o), lib.a(lib.so.V)" + library_names_spec='$libname$release$shared_ext$major $libname$shared_ext' + # unpreferred sharedlib libNAME.a needs extra handling + postinstall_cmds='test -n "$linkname" || linkname="$realname"~func_stripname "" ".so" "$linkname"~$install_shared_prog "$dir/$func_stripname_result.$libext" "$destdir/$func_stripname_result.$libext"~test -z "$tstripme" || test -z "$striplib" || $striplib "$destdir/$func_stripname_result.$libext"' + postuninstall_cmds='for n in $library_names $old_library; do :; done~func_stripname "" ".so" "$n"~test "$func_stripname_result" = "$n" || func_append rmfiles " $odir/$func_stripname_result.$libext"' + # We do not specify a path in Import Files, so LIBPATH fires. 
+ shlibpath_overrides_runpath=yes + ;; + *,no) # both, prefer aix + dynamic_linker="AIX lib.a(lib.so.V), lib.so.V($shared_archive_member_spec.o)" + library_names_spec='$libname$release.a $libname.a' + soname_spec='$libname$release$shared_ext$major' + # unpreferred sharedlib libNAME.so.V and symlink libNAME.so need extra handling + postinstall_cmds='test -z "$dlname" || $install_shared_prog $dir/$dlname $destdir/$dlname~test -z "$tstripme" || test -z "$striplib" || $striplib $destdir/$dlname~test -n "$linkname" || linkname=$realname~func_stripname "" ".a" "$linkname"~(cd "$destdir" && $LN_S -f $dlname $func_stripname_result.so)' + postuninstall_cmds='test -z "$dlname" || func_append rmfiles " $odir/$dlname"~for n in $old_library $library_names; do :; done~func_stripname "" ".a" "$n"~func_append rmfiles " $odir/$func_stripname_result.so"' + ;; + esac + shlibpath_var=LIBPATH + fi + ;; +amigaos*) + case $host_cpu in + powerpc) + # Since July 2007 AmigaOS4 officially supports .so libraries. + # When compiling the executable, add -use-dynld -Lsobjs: to the compileline. + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + ;; + m68k) + library_names_spec='$libname.ixlibrary $libname.a' + # Create ${libname}_ixlibrary.a entries in /sys/libs. 
+ finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`func_echo_all "$lib" | $SED '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; $RM /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' + ;; + esac + ;; +beos*) + library_names_spec='$libname$shared_ext' + dynamic_linker="$host_os ld.so" + shlibpath_var=LIBRARY_PATH + ;; -hard_links=nottested -if test no = "$lt_cv_prog_compiler_c_o_CXX" && test no != "$need_locks"; then - # do not overwrite the value of need_locks provided by the user - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if we can lock with hard links" >&5 -$as_echo_n "checking if we can lock with hard links... " >&6; } - hard_links=yes - $RM conftest* - ln conftest.a conftest.b 2>/dev/null && hard_links=no - touch conftest.a - ln conftest.a conftest.b 2>&5 || hard_links=no - ln conftest.a conftest.b 2>/dev/null && hard_links=no - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $hard_links" >&5 -$as_echo "$hard_links" >&6; } - if test no = "$hard_links"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: '$CC' does not support '-c -o', so 'make -j' may be unsafe" >&5 -$as_echo "$as_me: WARNING: '$CC' does not support '-c -o', so 'make -j' may be unsafe" >&2;} - need_locks=warn - fi -else - need_locks=no -fi +bsdi[45]*) + version_type=linux # correct to gnu/linux during the next big refactor + need_version=no + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' + shlibpath_var=LD_LIBRARY_PATH + sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" + sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" + # the default ld.so.conf also contains /usr/contrib/lib and + # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), 
but let us allow + # libtool to hard-code these into programs + ;; +cygwin* | mingw* | pw32* | cegcc*) + version_type=windows + shrext_cmds=.dll + need_version=no + need_lib_prefix=no + case $GCC,$cc_basename in + yes,*) + # gcc + library_names_spec='$libname.dll.a' + # DLL is installed to $(libdir)/../bin by postinstall_cmds + postinstall_cmds='base_file=`basename \$file`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + test -d \$dldir || mkdir -p \$dldir~ + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname~ + if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then + eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; + fi' + postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ + dlpath=$dir/\$dldll~ + $RM \$dlpath' + shlibpath_overrides_runpath=yes - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the $compiler linker ($LD) supports shared libraries" >&5 -$as_echo_n "checking whether the $compiler linker ($LD) supports shared libraries... " >&6; } + case $host_os in + cygwin*) + # Cygwin DLLs use 'cyg' prefix rather than 'lib' + soname_spec='`echo $libname | sed -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' - exclude_expsyms_CXX='_GLOBAL_OFFSET_TABLE_|_GLOBAL__F[ID]_.*' - case $host_os in - aix[4-9]*) - # If we're using GNU nm, then we don't want the "-C" option. - # -C means demangle to GNU nm, but means don't demangle to AIX nm. - # Without the "-l" option, or with the "-B" option, AIX nm treats - # weak defined symbols like other global defined symbols, whereas - # GNU nm marks them as "W". 
- # While the 'weak' keyword is ignored in the Export File, we need - # it in the Import File for the 'aix-soname' feature, so we have - # to replace the "-B" option with "-P" for AIX nm. - if $NM -V 2>&1 | $GREP 'GNU' > /dev/null; then - export_symbols_cmds_CXX='$NM -Bpg $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "W")) && (substr(\$ 3,1,1) != ".")) { if (\$ 2 == "W") { print \$ 3 " weak" } else { print \$ 3 } } }'\'' | sort -u > $export_symbols' - else - export_symbols_cmds_CXX='`func_echo_all $NM | $SED -e '\''s/B\([^B]*\)$/P\1/'\''` -PCpgl $libobjs $convenience | awk '\''{ if (((\$ 2 == "T") || (\$ 2 == "D") || (\$ 2 == "B") || (\$ 2 == "L") || (\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) && (substr(\$ 1,1,1) != ".")) { if ((\$ 2 == "W") || (\$ 2 == "V") || (\$ 2 == "Z")) { print \$ 1 " weak" } else { print \$ 1 } } }'\'' | sort -u > $export_symbols' - fi - ;; - pw32*) - export_symbols_cmds_CXX=$ltdll_cmds + sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/lib/w32api" + ;; + mingw* | cegcc*) + # MinGW DLLs use traditional 'lib' prefix + soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' + ;; + pw32*) + # pw32 DLLs use 'pw' prefix rather than 'lib' + library_names_spec='`echo $libname | sed -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' + ;; + esac + dynamic_linker='Win32 ld.exe' ;; - cygwin* | mingw* | cegcc*) - case $cc_basename in - cl* | icl*) - exclude_expsyms_CXX='_NULL_IMPORT_DESCRIPTOR|_IMPORT_DESCRIPTOR_.*' + + *,cl* | *,icl*) + # Native MSVC or ICC + libname_spec='$name' + soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' + library_names_spec='$libname.dll.lib' + + case $build_os in + mingw*) + sys_lib_search_path_spec= + lt_save_ifs=$IFS + IFS=';' + for lt_path in $LIB + do + IFS=$lt_save_ifs + # Let DOS variable expansion print the short 8.3 style file name. 
+ lt_path=`cd "$lt_path" 2>/dev/null && cmd //C "for %i in (".") do @echo %~si"` + sys_lib_search_path_spec="$sys_lib_search_path_spec $lt_path" + done + IFS=$lt_save_ifs + # Convert to MSYS style. + sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | sed -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'` + ;; + cygwin*) + # Convert to unix form, then to dos form, then back to unix form + # but this time dos style (no spaces!) so that the unix form looks + # like /cygdrive/c/PROGRA~1:/cygdr... + sys_lib_search_path_spec=`cygpath --path --unix "$LIB"` + sys_lib_search_path_spec=`cygpath --path --dos "$sys_lib_search_path_spec" 2>/dev/null` + sys_lib_search_path_spec=`cygpath --path --unix "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` ;; *) - export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED -e '\''/^[BCDGRS][ ]/s/.*[ ]\([^ ]*\)/\1 DATA/;s/^.*[ ]__nm__\([^ ]*\)[ ][^ ]*/\1 DATA/;/^I[ ]/d;/^[AITW][ ]/s/.* //'\'' | sort | uniq > $export_symbols' - exclude_expsyms_CXX='[_]+GLOBAL_OFFSET_TABLE_|[_]+GLOBAL__[FID]_.*|[_]+head_[A-Za-z0-9_]+_dll|[A-Za-z0-9_]+_dll_iname' + sys_lib_search_path_spec=$LIB + if $ECHO "$sys_lib_search_path_spec" | $GREP ';[c-zC-Z]:/' >/dev/null; then + # It is most probably a Windows format PATH. + sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` + else + sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` + fi + # FIXME: find the short name or the path components, as spaces are + # common. (e.g. "Program Files" -> "PROGRA~1") ;; esac + + # DLL is installed to $(libdir)/../bin by postinstall_cmds + postinstall_cmds='base_file=`basename \$file`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + test -d \$dldir || mkdir -p \$dldir~ + $install_prog $dir/$dlname \$dldir/$dlname' + postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ + dlpath=$dir/\$dldll~ + $RM \$dlpath' + shlibpath_overrides_runpath=yes + dynamic_linker='Win32 link.exe' ;; + *) - export_symbols_cmds_CXX='$NM $libobjs $convenience | $global_symbol_pipe | $SED '\''s/.* //'\'' | sort | uniq > $export_symbols' + # Assume MSVC and ICC wrapper + library_names_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext $libname.lib' + dynamic_linker='Win32 ld.exe' ;; esac + # FIXME: first we should search . and the directory the executable is in + shlibpath_var=PATH + ;; -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ld_shlibs_CXX" >&5 -$as_echo "$ld_shlibs_CXX" >&6; } -test no = "$ld_shlibs_CXX" && can_build_shared=no - -with_gnu_ld_CXX=$with_gnu_ld - - - - - - -# -# Do we need to explicitly link libc? -# -case "x$archive_cmds_need_lc_CXX" in -x|xyes) - # Assume -lc should be added - archive_cmds_need_lc_CXX=yes +darwin* | rhapsody*) + dynamic_linker="$host_os dyld" + version_type=darwin + need_lib_prefix=no + need_version=no + library_names_spec='$libname$release$major$shared_ext $libname$shared_ext' + soname_spec='$libname$release$major$shared_ext' + shlibpath_overrides_runpath=yes + shlibpath_var=DYLD_LIBRARY_PATH + shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' - if test yes,yes = "$GCC,$enable_shared"; then - case $archive_cmds_CXX in - *'~'*) - # FIXME: we may have to deal with multi-command sequences. - ;; - '$CC '*) - # Test whether the compiler implicitly links with -lc since on some - # systems, -lgcc has to come before -lc. If gcc already passes -lc - # to ld, don't add -lc before -lgcc. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether -lc should be explicitly linked in" >&5 -$as_echo_n "checking whether -lc should be explicitly linked in... 
" >&6; } -if ${lt_cv_archive_cmds_need_lc_CXX+:} false; then : - $as_echo_n "(cached) " >&6 -else - $RM conftest* - echo "$lt_simple_compile_test_code" > conftest.$ac_ext + sys_lib_search_path_spec="$sys_lib_search_path_spec /usr/local/lib" + sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' + ;; - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_compile\""; } >&5 - (eval $ac_compile) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } 2>conftest.err; then - soname=conftest - lib=conftest - libobjs=conftest.$ac_objext - deplibs= - wl=$lt_prog_compiler_wl_CXX - pic_flag=$lt_prog_compiler_pic_CXX - compiler_flags=-v - linker_flags=-v - verstring= - output_objdir=. - libname=conftest - lt_save_allow_undefined_flag=$allow_undefined_flag_CXX - allow_undefined_flag_CXX= - if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$archive_cmds_CXX 2\>\&1 \| $GREP \" -lc \" \>/dev/null 2\>\&1\""; } >&5 - (eval $archive_cmds_CXX 2\>\&1 \| $GREP \" -lc \" \>/dev/null 2\>\&1) 2>&5 - ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 - test $ac_status = 0; } - then - lt_cv_archive_cmds_need_lc_CXX=no - else - lt_cv_archive_cmds_need_lc_CXX=yes - fi - allow_undefined_flag_CXX=$lt_save_allow_undefined_flag - else - cat conftest.err 1>&5 - fi - $RM conftest* +dgux*) + version_type=linux # correct to gnu/linux during the next big refactor + need_lib_prefix=no + need_version=no + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + shlibpath_var=LD_LIBRARY_PATH + ;; -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $lt_cv_archive_cmds_need_lc_CXX" >&5 -$as_echo "$lt_cv_archive_cmds_need_lc_CXX" >&6; } - archive_cmds_need_lc_CXX=$lt_cv_archive_cmds_need_lc_CXX - ;; +freebsd* | dragonfly*) + # DragonFly does not have aout. 
When/if they implement a new + # versioning mechanism, adjust this. + if test -x /usr/bin/objformat; then + objformat=`/usr/bin/objformat` + else + case $host_os in + freebsd[23].*) objformat=aout ;; + *) objformat=elf ;; esac fi + version_type=freebsd-$objformat + case $version_type in + freebsd-elf*) + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + need_version=no + need_lib_prefix=no + ;; + freebsd-*) + library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' + need_version=yes + ;; + esac + shlibpath_var=LD_LIBRARY_PATH + case $host_os in + freebsd2.*) + shlibpath_overrides_runpath=yes + ;; + freebsd3.[01]* | freebsdelf3.[01]*) + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; + freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ + freebsd4.[0-5] | freebsdelf4.[0-5] | freebsd4.1.1 | freebsdelf4.1.1) + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; + *) # from 4.6 on, and DragonFly + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + ;; + esac ;; -esac - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +haiku*) + version_type=linux # correct to gnu/linux during the next big refactor + need_lib_prefix=no + need_version=no + dynamic_linker="$host_os runtime_loader" + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + shlibpath_var=LIBRARY_PATH + shlibpath_overrides_runpath=no + sys_lib_dlsearch_path_spec='/boot/home/config/lib /boot/common/lib /boot/system/lib' + hardcode_into_libs=yes + ;; +hpux9* | hpux10* | hpux11*) + # Give a soname corresponding to the major version so that dld.sl refuses to + # link against other versions. 
+ version_type=sunos + need_lib_prefix=no + need_version=no + case $host_cpu in + ia64*) + shrext_cmds='.so' + hardcode_into_libs=yes + dynamic_linker="$host_os dld.so" + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + if test 32 = "$HPUX_IA64_MODE"; then + sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" + sys_lib_dlsearch_path_spec=/usr/lib/hpux32 + else + sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" + sys_lib_dlsearch_path_spec=/usr/lib/hpux64 + fi + ;; + hppa*64*) + shrext_cmds='.sl' + hardcode_into_libs=yes + dynamic_linker="$host_os dld.sl" + shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH + shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + ;; + *) + shrext_cmds='.sl' + dynamic_linker="$host_os dld.sl" + shlibpath_var=SHLIB_PATH + shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + ;; + esac + # HP-UX runs *really* slowly unless shared libraries are mode 555, ... 
+ postinstall_cmds='chmod 555 $lib' + # or fails outright, so override atomically: + install_override_mode=555 + ;; +interix[3-9]*) + version_type=linux # correct to gnu/linux during the next big refactor + need_lib_prefix=no + need_version=no + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + ;; +irix5* | irix6* | nonstopux*) + case $host_os in + nonstopux*) version_type=nonstopux ;; + *) + if test yes = "$lt_cv_prog_gnu_ld"; then + version_type=linux # correct to gnu/linux during the next big refactor + else + version_type=irix + fi ;; + esac + need_lib_prefix=no + need_version=no + soname_spec='$libname$release$shared_ext$major' + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$release$shared_ext $libname$shared_ext' + case $host_os in + irix5* | nonstopux*) + libsuff= shlibsuff= + ;; + *) + case $LD in # libtool.m4 will add one of these switches to LD + *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") + libsuff= shlibsuff= libmagic=32-bit;; + *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") + libsuff=32 shlibsuff=N32 libmagic=N32;; + *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") + libsuff=64 shlibsuff=64 libmagic=64-bit;; + *) libsuff= shlibsuff= libmagic=never-match;; + esac + ;; + esac + shlibpath_var=LD_LIBRARY${shlibsuff}_PATH + shlibpath_overrides_runpath=no + sys_lib_search_path_spec="/usr/lib$libsuff /lib$libsuff /usr/local/lib$libsuff" + sys_lib_dlsearch_path_spec="/usr/lib$libsuff /lib$libsuff" + hardcode_into_libs=yes + ;; +# No shared lib support for Linux oldld, aout, or coff. +linux*oldld* | linux*aout* | linux*coff*) + dynamic_linker=no + ;; +linux*android*) + version_type=none # Android doesn't support versioned libraries. 
+ need_lib_prefix=no + need_version=no + library_names_spec='$libname$release$shared_ext' + soname_spec='$libname$release$shared_ext' + finish_cmds= + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + # This implies no fast_install, which is unacceptable. + # Some rework will be needed to allow for fast_install + # before this can be enabled. + hardcode_into_libs=yes - { $as_echo "$as_me:${as_lineno-$LINENO}: checking dynamic linker characteristics" >&5 -$as_echo_n "checking dynamic linker characteristics... " >&6; } + dynamic_linker='Android linker' + # Don't embed -rpath directories since the linker doesn't support them. + hardcode_libdir_flag_spec='-L$libdir' + ;; -library_names_spec= -libname_spec='lib$name' -soname_spec= -shrext_cmds=.so -postinstall_cmds= -postuninstall_cmds= -finish_cmds= -finish_eval= -shlibpath_var= -shlibpath_overrides_runpath=unknown -version_type=none -dynamic_linker="$host_os ld.so" -sys_lib_dlsearch_path_spec="/lib /usr/lib" -need_lib_prefix=unknown -hardcode_into_libs=no +# This must be glibc/ELF. 
+linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) + version_type=linux # correct to gnu/linux during the next big refactor + need_lib_prefix=no + need_version=no + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=no -# when you set need_version to no, make sure it does not cause -set_version -# flags to be left without arguments -need_version=unknown + # Some binutils ld are patched to set DT_RUNPATH + if test ${lt_cv_shlibpath_overrides_runpath+y} +then : + printf %s "(cached) " >&6 +else $as_nop + lt_cv_shlibpath_overrides_runpath=no + save_LDFLAGS=$LDFLAGS + save_libdir=$libdir + eval "libdir=/foo; wl=\"$lt_prog_compiler_wl\"; \ + LDFLAGS=\"\$LDFLAGS $hardcode_libdir_flag_spec\"" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +int +main (void) +{ + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + if ($OBJDUMP -p conftest$ac_exeext) 2>/dev/null | grep "RUNPATH.*$libdir" >/dev/null +then : + lt_cv_shlibpath_overrides_runpath=yes +fi +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext + LDFLAGS=$save_LDFLAGS + libdir=$save_libdir -case $host_os in -aix3*) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname.a' - shlibpath_var=LIBPATH +fi - # AIX 3 has no versioning support, so we append a major version to the name. - soname_spec='$libname$release$shared_ext$major' - ;; + shlibpath_overrides_runpath=$lt_cv_shlibpath_overrides_runpath -aix[4-9]*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no + # This implies no fast_install, which is unacceptable. 
+ # Some rework will be needed to allow for fast_install + # before this can be enabled. hardcode_into_libs=yes - if test ia64 = "$host_cpu"; then - # AIX 5 supports IA64 - library_names_spec='$libname$release$shared_ext$major $libname$release$shared_ext$versuffix $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - else - # With GCC up to 2.95.x, collect2 would create an import file - # for dependence libraries. The import file would start with - # the line '#! .'. This would cause the generated library to - # depend on '.', always an invalid library. This was fixed in - # development snapshots of GCC prior to 3.0. - case $host_os in - aix4 | aix4.[01] | aix4.[01].*) - if { echo '#if __GNUC__ > 2 || (__GNUC__ == 2 && __GNUC_MINOR__ >= 97)' - echo ' yes ' - echo '#endif'; } | $CC -E - | $GREP yes > /dev/null; then - : - else - can_build_shared=no - fi - ;; - esac - # Using Import Files as archive members, it is possible to support - # filename-based versioning of shared library archives on AIX. While - # this would work for both with and without runtime linking, it will - # prevent static linking of such archives. So we do filename-based - # shared library versioning with .so extension only, which is used - # when both runtime linking and shared linking is enabled. - # Unfortunately, runtime linking may impact performance, so we do - # not want this to be the default eventually. Also, we use the - # versioned .so libs for executables only if there is the -brtl - # linker flag in LDFLAGS as well, or --with-aix-soname=svr4 only. 
- # To allow for filename-based versioning support, we need to create - # libNAME.so.V as an archive file, containing: - # *) an Import File, referring to the versioned filename of the - # archive as well as the shared archive member, telling the - # bitwidth (32 or 64) of that shared object, and providing the - # list of exported symbols of that shared object, eventually - # decorated with the 'weak' keyword - # *) the shared object with the F_LOADONLY flag set, to really avoid - # it being seen by the linker. - # At run time we better use the real file rather than another symlink, - # but for link time we create the symlink libNAME.so -> libNAME.so.V - case $with_aix_soname,$aix_use_runtimelinking in - # AIX (on Power*) has no versioning support, so currently we cannot hardcode correct - # soname into executable. Probably we can add versioning support to - # collect2, so additional links can be useful in future. - aix,yes) # traditional libtool - dynamic_linker='AIX unversionable lib.so' - # If using run time linking (on AIX 4.2 or later) use lib.so - # instead of lib.a to let people know that these are not - # typical AIX shared libraries. - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - ;; - aix,no) # traditional AIX only - dynamic_linker='AIX lib.a(lib.so.V)' - # We preserve .a as extension for shared libraries through AIX4.2 - # and later when we are not doing run time linking. - library_names_spec='$libname$release.a $libname.a' - soname_spec='$libname$release$shared_ext$major' - ;; - svr4,*) # full svr4 only - dynamic_linker="AIX lib.so.V($shared_archive_member_spec.o)" - library_names_spec='$libname$release$shared_ext$major $libname$shared_ext' - # We do not specify a path in Import Files, so LIBPATH fires. 
- shlibpath_overrides_runpath=yes - ;; - *,yes) # both, prefer svr4 - dynamic_linker="AIX lib.so.V($shared_archive_member_spec.o), lib.a(lib.so.V)" - library_names_spec='$libname$release$shared_ext$major $libname$shared_ext' - # unpreferred sharedlib libNAME.a needs extra handling - postinstall_cmds='test -n "$linkname" || linkname="$realname"~func_stripname "" ".so" "$linkname"~$install_shared_prog "$dir/$func_stripname_result.$libext" "$destdir/$func_stripname_result.$libext"~test -z "$tstripme" || test -z "$striplib" || $striplib "$destdir/$func_stripname_result.$libext"' - postuninstall_cmds='for n in $library_names $old_library; do :; done~func_stripname "" ".so" "$n"~test "$func_stripname_result" = "$n" || func_append rmfiles " $odir/$func_stripname_result.$libext"' - # We do not specify a path in Import Files, so LIBPATH fires. - shlibpath_overrides_runpath=yes - ;; - *,no) # both, prefer aix - dynamic_linker="AIX lib.a(lib.so.V), lib.so.V($shared_archive_member_spec.o)" - library_names_spec='$libname$release.a $libname.a' - soname_spec='$libname$release$shared_ext$major' - # unpreferred sharedlib libNAME.so.V and symlink libNAME.so need extra handling - postinstall_cmds='test -z "$dlname" || $install_shared_prog $dir/$dlname $destdir/$dlname~test -z "$tstripme" || test -z "$striplib" || $striplib $destdir/$dlname~test -n "$linkname" || linkname=$realname~func_stripname "" ".a" "$linkname"~(cd "$destdir" && $LN_S -f $dlname $func_stripname_result.so)' - postuninstall_cmds='test -z "$dlname" || func_append rmfiles " $odir/$dlname"~for n in $old_library $library_names; do :; done~func_stripname "" ".a" "$n"~func_append rmfiles " $odir/$func_stripname_result.so"' - ;; - esac - shlibpath_var=LIBPATH + # Ideally, we could use ldconfig to report *all* directores which are + # searched for libraries, however this is still not possible. 
Aside from not + # being certain /sbin/ldconfig is available, command + # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64, + # even though it is searched at run-time. Try to do the best guess by + # appending ld.so.conf contents (and includes) to the search path. + if test -f /etc/ld.so.conf; then + lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s 2>/dev/null", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;/^[ ]*hwcap[ ]/d;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;s/"//g;/^$/d' | tr '\n' ' '` + sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" fi + + # We used to test for /lib/ld.so.1 and disable shared libraries on + # powerpc, because MkLinux only supported shared libraries with the + # GNU dynamic linker. Since this was broken with cross compilers, + # most powerpc-linux boxes support dynamic linking these days and + # people can always --disable-shared, the test was removed, and we + # assume the GNU/Linux dynamic linker is in use. + dynamic_linker='GNU/Linux ld.so' ;; -amigaos*) - case $host_cpu in - powerpc) - # Since July 2007 AmigaOS4 officially supports .so libraries. - # When compiling the executable, add -use-dynld -Lsobjs: to the compileline. +netbsd*) + version_type=sunos + need_lib_prefix=no + need_version=no + if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then + library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' + dynamic_linker='NetBSD (a.out) ld.so' + else library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - ;; - m68k) - library_names_spec='$libname.ixlibrary $libname.a' - # Create ${libname}_ixlibrary.a entries in /sys/libs. 
- finish_eval='for lib in `ls $libdir/*.ixlibrary 2>/dev/null`; do libname=`func_echo_all "$lib" | $SED '\''s%^.*/\([^/]*\)\.ixlibrary$%\1%'\''`; $RM /sys/libs/${libname}_ixlibrary.a; $show "cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a"; cd /sys/libs && $LN_S $lib ${libname}_ixlibrary.a || exit 1; done' - ;; - esac + soname_spec='$libname$release$shared_ext$major' + dynamic_linker='NetBSD ld.elf_so' + fi + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes ;; -beos*) - library_names_spec='$libname$shared_ext' - dynamic_linker="$host_os ld.so" - shlibpath_var=LIBRARY_PATH +newsos6) + version_type=linux # correct to gnu/linux during the next big refactor + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes ;; -bsdi[45]*) - version_type=linux # correct to gnu/linux during the next big refactor +*nto* | *qnx*) + version_type=qnx + need_lib_prefix=no need_version=no library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' soname_spec='$libname$release$shared_ext$major' - finish_cmds='PATH="\$PATH:/sbin" ldconfig $libdir' shlibpath_var=LD_LIBRARY_PATH - sys_lib_search_path_spec="/shlib /usr/lib /usr/X11/lib /usr/contrib/lib /lib /usr/local/lib" - sys_lib_dlsearch_path_spec="/shlib /usr/lib /usr/local/lib" - # the default ld.so.conf also contains /usr/contrib/lib and - # /usr/X11R6/lib (/usr/X11 is a link to /usr/X11R6), but let us allow - # libtool to hard-code these into programs + shlibpath_overrides_runpath=no + hardcode_into_libs=yes + dynamic_linker='ldqnx.so' ;; -cygwin* | mingw* | pw32* | cegcc*) +openbsd* | bitrig*) + version_type=sunos + sys_lib_dlsearch_path_spec=/usr/lib + need_lib_prefix=no + if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then + need_version=no + else + need_version=yes + fi + 
library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' + finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + ;; + +os2*) + libname_spec='$name' version_type=windows shrext_cmds=.dll need_version=no need_lib_prefix=no + # OS/2 can only load a DLL with a base name of 8 characters or less. + soname_spec='`test -n "$os2dllname" && libname="$os2dllname"; + v=$($ECHO $release$versuffix | tr -d .-); + n=$($ECHO $libname | cut -b -$((8 - ${#v})) | tr . _); + $ECHO $n$v`$shared_ext' + library_names_spec='${libname}_dll.$libext' + dynamic_linker='OS/2 ld.exe' + shlibpath_var=BEGINLIBPATH + sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + postinstall_cmds='base_file=`basename \$file`~ + dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; $ECHO \$dlname'\''`~ + dldir=$destdir/`dirname \$dlpath`~ + test -d \$dldir || mkdir -p \$dldir~ + $install_prog $dir/$dlname \$dldir/$dlname~ + chmod a+x \$dldir/$dlname~ + if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then + eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; + fi' + postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; $ECHO \$dlname'\''`~ + dlpath=$dir/\$dldll~ + $RM \$dlpath' + ;; - case $GCC,$cc_basename in - yes,*) - # gcc - library_names_spec='$libname.dll.a' - # DLL is installed to $(libdir)/../bin by postinstall_cmds - postinstall_cmds='base_file=`basename \$file`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname~ - chmod a+x \$dldir/$dlname~ - if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then - eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; - fi' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; echo \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $RM \$dlpath' - shlibpath_overrides_runpath=yes - - case $host_os in - cygwin*) - # Cygwin DLLs use 'cyg' prefix rather than 'lib' - soname_spec='`echo $libname | sed -e 's/^lib/cyg/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - - ;; - mingw* | cegcc*) - # MinGW DLLs use traditional 'lib' prefix - soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - ;; - pw32*) - # pw32 DLLs use 'pw' prefix rather than 'lib' - library_names_spec='`echo $libname | sed -e 's/^lib/pw/'``echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - ;; - esac - dynamic_linker='Win32 ld.exe' - ;; - - *,cl* | *,icl*) - # Native MSVC or ICC - libname_spec='$name' - soname_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext' - library_names_spec='$libname.dll.lib' - - case $build_os in - mingw*) - sys_lib_search_path_spec= - lt_save_ifs=$IFS - IFS=';' - for lt_path in $LIB - do - IFS=$lt_save_ifs - # Let DOS variable expansion print the short 8.3 style file name. - lt_path=`cd "$lt_path" 2>/dev/null && cmd //C "for %i in (".") do @echo %~si"` - sys_lib_search_path_spec="$sys_lib_search_path_spec $lt_path" - done - IFS=$lt_save_ifs - # Convert to MSYS style. - sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | sed -e 's|\\\\|/|g' -e 's| \\([a-zA-Z]\\):| /\\1|g' -e 's|^ ||'` - ;; - cygwin*) - # Convert to unix form, then to dos form, then back to unix form - # but this time dos style (no spaces!) so that the unix form looks - # like /cygdrive/c/PROGRA~1:/cygdr... 
- sys_lib_search_path_spec=`cygpath --path --unix "$LIB"` - sys_lib_search_path_spec=`cygpath --path --dos "$sys_lib_search_path_spec" 2>/dev/null` - sys_lib_search_path_spec=`cygpath --path --unix "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` - ;; - *) - sys_lib_search_path_spec=$LIB - if $ECHO "$sys_lib_search_path_spec" | $GREP ';[c-zC-Z]:/' >/dev/null; then - # It is most probably a Windows format PATH. - sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e 's/;/ /g'` - else - sys_lib_search_path_spec=`$ECHO "$sys_lib_search_path_spec" | $SED -e "s/$PATH_SEPARATOR/ /g"` - fi - # FIXME: find the short name or the path components, as spaces are - # common. (e.g. "Program Files" -> "PROGRA~1") - ;; - esac - - # DLL is installed to $(libdir)/../bin by postinstall_cmds - postinstall_cmds='base_file=`basename \$file`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; echo \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. $file; echo \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $RM \$dlpath' - shlibpath_overrides_runpath=yes - dynamic_linker='Win32 link.exe' - ;; +osf3* | osf4* | osf5*) + version_type=osf + need_lib_prefix=no + need_version=no + soname_spec='$libname$release$shared_ext$major' + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + shlibpath_var=LD_LIBRARY_PATH + sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" + sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec + ;; - *) - # Assume MSVC and ICC wrapper - library_names_spec='$libname`echo $release | $SED -e 's/[.]/-/g'`$versuffix$shared_ext $libname.lib' - dynamic_linker='Win32 ld.exe' - ;; - esac - # FIXME: first we should search . 
and the directory the executable is in - shlibpath_var=PATH +rdos*) + dynamic_linker=no ;; -darwin* | rhapsody*) - dynamic_linker="$host_os dyld" - version_type=darwin +solaris*) + version_type=linux # correct to gnu/linux during the next big refactor need_lib_prefix=no need_version=no - library_names_spec='$libname$release$major$shared_ext $libname$shared_ext' - soname_spec='$libname$release$major$shared_ext' + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=yes - shlibpath_var=DYLD_LIBRARY_PATH - shrext_cmds='`test .$module = .yes && echo .so || echo .dylib`' + hardcode_into_libs=yes + # ldd complains unless libraries are executable + postinstall_cmds='chmod +x $lib' + ;; + +sunos4*) + version_type=sunos + library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' + finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' + shlibpath_var=LD_LIBRARY_PATH + shlibpath_overrides_runpath=yes + if test yes = "$with_gnu_ld"; then + need_lib_prefix=no + fi + need_version=yes + ;; + +sysv4 | sysv4.3*) + version_type=linux # correct to gnu/linux during the next big refactor + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + shlibpath_var=LD_LIBRARY_PATH + case $host_vendor in + sni) + shlibpath_overrides_runpath=no + need_lib_prefix=no + runpath_var=LD_RUN_PATH + ;; + siemens) + need_lib_prefix=no + ;; + motorola) + need_lib_prefix=no + need_version=no + shlibpath_overrides_runpath=no + sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' + ;; + esac + ;; - sys_lib_dlsearch_path_spec='/usr/local/lib /lib /usr/lib' +sysv4*MP*) + if test -d /usr/nec; then + version_type=linux # correct to gnu/linux during the next big refactor + 
library_names_spec='$libname$shared_ext.$versuffix $libname$shared_ext.$major $libname$shared_ext' + soname_spec='$libname$shared_ext.$major' + shlibpath_var=LD_LIBRARY_PATH + fi ;; -dgux*) - version_type=linux # correct to gnu/linux during the next big refactor +sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) + version_type=sco need_lib_prefix=no need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext $libname$shared_ext' soname_spec='$libname$release$shared_ext$major' shlibpath_var=LD_LIBRARY_PATH - ;; - -freebsd* | dragonfly*) - # DragonFly does not have aout. When/if they implement a new - # versioning mechanism, adjust this. - if test -x /usr/bin/objformat; then - objformat=`/usr/bin/objformat` + shlibpath_overrides_runpath=yes + hardcode_into_libs=yes + if test yes = "$with_gnu_ld"; then + sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' else + sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' case $host_os in - freebsd[23].*) objformat=aout ;; - *) objformat=elf ;; + sco3.2v5*) + sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" + ;; esac fi - version_type=freebsd-$objformat - case $version_type in - freebsd-elf*) - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - need_version=no - need_lib_prefix=no - ;; - freebsd-*) - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - need_version=yes - ;; - esac - shlibpath_var=LD_LIBRARY_PATH - case $host_os in - freebsd2.*) - shlibpath_overrides_runpath=yes - ;; - freebsd3.[01]* | freebsdelf3.[01]*) - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; - freebsd3.[2-9]* | freebsdelf3.[2-9]* | \ - freebsd4.[0-5] | freebsdelf4.[0-5] | 
freebsd4.1.1 | freebsdelf4.1.1) - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - ;; - *) # from 4.6 on, and DragonFly - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; - esac + sys_lib_dlsearch_path_spec='/usr/lib' ;; -haiku*) +tpf*) + # TPF is a cross-target only. Preferred cross-host = GNU/Linux. version_type=linux # correct to gnu/linux during the next big refactor need_lib_prefix=no need_version=no - dynamic_linker="$host_os runtime_loader" library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LIBRARY_PATH + shlibpath_var=LD_LIBRARY_PATH shlibpath_overrides_runpath=no - sys_lib_dlsearch_path_spec='/boot/home/config/lib /boot/common/lib /boot/system/lib' hardcode_into_libs=yes ;; -hpux9* | hpux10* | hpux11*) - # Give a soname corresponding to the major version so that dld.sl refuses to - # link against other versions. - version_type=sunos - need_lib_prefix=no - need_version=no - case $host_cpu in - ia64*) - shrext_cmds='.so' - hardcode_into_libs=yes - dynamic_linker="$host_os dld.so" - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - if test 32 = "$HPUX_IA64_MODE"; then - sys_lib_search_path_spec="/usr/lib/hpux32 /usr/local/lib/hpux32 /usr/local/lib" - sys_lib_dlsearch_path_spec=/usr/lib/hpux32 - else - sys_lib_search_path_spec="/usr/lib/hpux64 /usr/local/lib/hpux64" - sys_lib_dlsearch_path_spec=/usr/lib/hpux64 - fi - ;; - hppa*64*) - shrext_cmds='.sl' - hardcode_into_libs=yes - dynamic_linker="$host_os dld.sl" - shlibpath_var=LD_LIBRARY_PATH # How should we handle SHLIB_PATH - shlibpath_overrides_runpath=yes # Unless +noenvvar is specified. 
- library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - sys_lib_search_path_spec="/usr/lib/pa20_64 /usr/ccs/lib/pa20_64" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - ;; - *) - shrext_cmds='.sl' - dynamic_linker="$host_os dld.sl" - shlibpath_var=SHLIB_PATH - shlibpath_overrides_runpath=no # +s is required to enable SHLIB_PATH - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - ;; - esac - # HP-UX runs *really* slowly unless shared libraries are mode 555, ... - postinstall_cmds='chmod 555 $lib' - # or fails outright, so override atomically: - install_override_mode=555 +uts4*) + version_type=linux # correct to gnu/linux during the next big refactor + library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' + soname_spec='$libname$release$shared_ext$major' + shlibpath_var=LD_LIBRARY_PATH + ;; + +*) + dynamic_linker=no ;; +esac +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $dynamic_linker" >&5 +printf "%s\n" "$dynamic_linker" >&6; } +test no = "$dynamic_linker" && can_build_shared=no + +variables_saved_for_relink="PATH $shlibpath_var $runpath_var" +if test yes = "$GCC"; then + variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" +fi + +if test set = "${lt_cv_sys_lib_search_path_spec+set}"; then + sys_lib_search_path_spec=$lt_cv_sys_lib_search_path_spec +fi + +if test set = "${lt_cv_sys_lib_dlsearch_path_spec+set}"; then + sys_lib_dlsearch_path_spec=$lt_cv_sys_lib_dlsearch_path_spec +fi + +# remember unaugmented sys_lib_dlsearch_path content for libtool script decls... +configure_time_dlsearch_path=$sys_lib_dlsearch_path_spec + +# ... 
but it needs LT_SYS_LIBRARY_PATH munging for other configure-time code +func_munge_path_list sys_lib_dlsearch_path_spec "$LT_SYS_LIBRARY_PATH" + +# to be used as default LT_SYS_LIBRARY_PATH value in generated libtool +configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking how to hardcode library paths into programs" >&5 +printf %s "checking how to hardcode library paths into programs... " >&6; } +hardcode_action= +if test -n "$hardcode_libdir_flag_spec" || + test -n "$runpath_var" || + test yes = "$hardcode_automatic"; then -interix[3-9]*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - dynamic_linker='Interix 3.x ld.so.1 (PE, like ELF)' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - ;; + # We can hardcode non-existent directories. + if test no != "$hardcode_direct" && + # If the only mechanism to avoid hardcoding is shlibpath_var, we + # have to relink, otherwise we might link with an installed library + # when we should be linking with a yet-to-be-installed one + ## test no != "$_LT_TAGVAR(hardcode_shlibpath_var, )" && + test no != "$hardcode_minus_L"; then + # Linking always hardcodes the temporary library directory. + hardcode_action=relink + else + # We can link without hardcoding, and we can hardcode nonexisting dirs. + hardcode_action=immediate + fi +else + # We cannot hardcode anything, or else we can only hardcode existing + # directories. 
+ hardcode_action=unsupported +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $hardcode_action" >&5 +printf "%s\n" "$hardcode_action" >&6; } + +if test relink = "$hardcode_action" || + test yes = "$inherit_rpath"; then + # Fast installation is not supported + enable_fast_install=no +elif test yes = "$shlibpath_overrides_runpath" || + test no = "$enable_shared"; then + # Fast installation is not necessary + enable_fast_install=needless +fi + + + + + + + if test yes != "$enable_dlopen"; then + enable_dlopen=unknown + enable_dlopen_self=unknown + enable_dlopen_self_static=unknown +else + lt_cv_dlopen=no + lt_cv_dlopen_libs= -irix5* | irix6* | nonstopux*) - case $host_os in - nonstopux*) version_type=nonstopux ;; - *) - if test yes = "$lt_cv_prog_gnu_ld"; then - version_type=linux # correct to gnu/linux during the next big refactor - else - version_type=irix - fi ;; - esac - need_lib_prefix=no - need_version=no - soname_spec='$libname$release$shared_ext$major' - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$release$shared_ext $libname$shared_ext' case $host_os in - irix5* | nonstopux*) - libsuff= shlibsuff= + beos*) + lt_cv_dlopen=load_add_on + lt_cv_dlopen_libs= + lt_cv_dlopen_self=yes ;; - *) - case $LD in # libtool.m4 will add one of these switches to LD - *-32|*"-32 "|*-melf32bsmip|*"-melf32bsmip ") - libsuff= shlibsuff= libmagic=32-bit;; - *-n32|*"-n32 "|*-melf32bmipn32|*"-melf32bmipn32 ") - libsuff=32 shlibsuff=N32 libmagic=N32;; - *-64|*"-64 "|*-melf64bmip|*"-melf64bmip ") - libsuff=64 shlibsuff=64 libmagic=64-bit;; - *) libsuff= shlibsuff= libmagic=never-match;; - esac + + mingw* | pw32* | cegcc*) + lt_cv_dlopen=LoadLibrary + lt_cv_dlopen_libs= ;; - esac - shlibpath_var=LD_LIBRARY${shlibsuff}_PATH - shlibpath_overrides_runpath=no - sys_lib_search_path_spec="/usr/lib$libsuff /lib$libsuff /usr/local/lib$libsuff" - sys_lib_dlsearch_path_spec="/usr/lib$libsuff /lib$libsuff" - hardcode_into_libs=yes - 
;; -# No shared lib support for Linux oldld, aout, or coff. -linux*oldld* | linux*aout* | linux*coff*) - dynamic_linker=no - ;; + cygwin*) + lt_cv_dlopen=dlopen + lt_cv_dlopen_libs= + ;; -linux*android*) - version_type=none # Android doesn't support versioned libraries. - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext' - soname_spec='$libname$release$shared_ext' - finish_cmds= - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes + darwin*) + # if libdl is installed we need to link against it + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -ldl" >&5 +printf %s "checking for dlopen in -ldl... " >&6; } +if test ${ac_cv_lib_dl_dlopen+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_check_lib_save_LIBS=$LIBS +LIBS="-ldl $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ - # This implies no fast_install, which is unacceptable. - # Some rework will be needed to allow for fast_install - # before this can be enabled. - hardcode_into_libs=yes +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +char dlopen (); +int +main (void) +{ +return dlopen (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + ac_cv_lib_dl_dlopen=yes +else $as_nop + ac_cv_lib_dl_dlopen=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlopen" >&5 +printf "%s\n" "$ac_cv_lib_dl_dlopen" >&6; } +if test "x$ac_cv_lib_dl_dlopen" = xyes +then : + lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-ldl +else $as_nop - dynamic_linker='Android linker' - # Don't embed -rpath directories since the linker doesn't support them. 
- hardcode_libdir_flag_spec_CXX='-L$libdir' - ;; + lt_cv_dlopen=dyld + lt_cv_dlopen_libs= + lt_cv_dlopen_self=yes -# This must be glibc/ELF. -linux* | k*bsd*-gnu | kopensolaris*-gnu | gnu*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -n $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no +fi - # Some binutils ld are patched to set DT_RUNPATH - if ${lt_cv_shlibpath_overrides_runpath+:} false; then : - $as_echo_n "(cached) " >&6 -else - lt_cv_shlibpath_overrides_runpath=no - save_LDFLAGS=$LDFLAGS - save_libdir=$libdir - eval "libdir=/foo; wl=\"$lt_prog_compiler_wl_CXX\"; \ - LDFLAGS=\"\$LDFLAGS $hardcode_libdir_flag_spec_CXX\"" - cat confdefs.h - <<_ACEOF >conftest.$ac_ext + ;; + + tpf*) + # Don't try to run any link tests for TPF. We know it's impossible + # because TPF is a cross-compiler, and we know how we open DSOs. + lt_cv_dlopen=dlopen + lt_cv_dlopen_libs= + lt_cv_dlopen_self=no + ;; + + *) + ac_fn_c_check_func "$LINENO" "shl_load" "ac_cv_func_shl_load" +if test "x$ac_cv_func_shl_load" = xyes +then : + lt_cv_dlopen=shl_load +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for shl_load in -ldld" >&5 +printf %s "checking for shl_load in -ldld... " >&6; } +if test ${ac_cv_lib_dld_shl_load+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_check_lib_save_LIBS=$LIBS +LIBS="-ldld $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. 
*/ +char shl_load (); +int +main (void) +{ +return shl_load (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + ac_cv_lib_dld_shl_load=yes +else $as_nop + ac_cv_lib_dld_shl_load=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dld_shl_load" >&5 +printf "%s\n" "$ac_cv_lib_dld_shl_load" >&6; } +if test "x$ac_cv_lib_dld_shl_load" = xyes +then : + lt_cv_dlopen=shl_load lt_cv_dlopen_libs=-ldld +else $as_nop + ac_fn_c_check_func "$LINENO" "dlopen" "ac_cv_func_dlopen" +if test "x$ac_cv_func_dlopen" = xyes +then : + lt_cv_dlopen=dlopen +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -ldl" >&5 +printf %s "checking for dlopen in -ldl... " >&6; } +if test ${ac_cv_lib_dl_dlopen+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_check_lib_save_LIBS=$LIBS +LIBS="-ldl $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +char dlopen (); +int +main (void) +{ +return dlopen (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + ac_cv_lib_dl_dlopen=yes +else $as_nop + ac_cv_lib_dl_dlopen=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dl_dlopen" >&5 +printf "%s\n" "$ac_cv_lib_dl_dlopen" >&6; } +if test "x$ac_cv_lib_dl_dlopen" = xyes +then : + lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-ldl +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dlopen in -lsvld" >&5 +printf %s "checking for dlopen in -lsvld... 
" >&6; } +if test ${ac_cv_lib_svld_dlopen+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_check_lib_save_LIBS=$LIBS +LIBS="-lsvld $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. */ +char dlopen (); int -main () +main (void) { +return dlopen (); + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO" +then : + ac_cv_lib_svld_dlopen=yes +else $as_nop + ac_cv_lib_svld_dlopen=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ + conftest$ac_exeext conftest.$ac_ext +LIBS=$ac_check_lib_save_LIBS +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_svld_dlopen" >&5 +printf "%s\n" "$ac_cv_lib_svld_dlopen" >&6; } +if test "x$ac_cv_lib_svld_dlopen" = xyes +then : + lt_cv_dlopen=dlopen lt_cv_dlopen_libs=-lsvld +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for dld_link in -ldld" >&5 +printf %s "checking for dld_link in -ldld... " >&6; } +if test ${ac_cv_lib_dld_dld_link+y} +then : + printf %s "(cached) " >&6 +else $as_nop + ac_check_lib_save_LIBS=$LIBS +LIBS="-ldld $LIBS" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +/* Override any GCC internal prototype to avoid an error. + Use char because int might match the return type of a GCC + builtin and then its argument prototype would still apply. 
*/ +char dld_link (); +int +main (void) +{ +return dld_link (); ; return 0; } _ACEOF -if ac_fn_cxx_try_link "$LINENO"; then : - if ($OBJDUMP -p conftest$ac_exeext) 2>/dev/null | grep "RUNPATH.*$libdir" >/dev/null; then : - lt_cv_shlibpath_overrides_runpath=yes -fi +if ac_fn_c_try_link "$LINENO" +then : + ac_cv_lib_dld_dld_link=yes +else $as_nop + ac_cv_lib_dld_dld_link=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext - LDFLAGS=$save_LDFLAGS - libdir=$save_libdir +LIBS=$ac_check_lib_save_LIBS +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_dld_dld_link" >&5 +printf "%s\n" "$ac_cv_lib_dld_dld_link" >&6; } +if test "x$ac_cv_lib_dld_dld_link" = xyes +then : + lt_cv_dlopen=dld_link lt_cv_dlopen_libs=-ldld +fi + fi - shlibpath_overrides_runpath=$lt_cv_shlibpath_overrides_runpath - # This implies no fast_install, which is unacceptable. - # Some rework will be needed to allow for fast_install - # before this can be enabled. - hardcode_into_libs=yes +fi - # Ideally, we could use ldconfig to report *all* directores which are - # searched for libraries, however this is still not possible. Aside from not - # being certain /sbin/ldconfig is available, command - # 'ldconfig -N -X -v | grep ^/' on 64bit Fedora does not report /usr/lib64, - # even though it is searched at run-time. Try to do the best guess by - # appending ld.so.conf contents (and includes) to the search path. 
- if test -f /etc/ld.so.conf; then - lt_ld_extra=`awk '/^include / { system(sprintf("cd /etc; cat %s 2>/dev/null", \$2)); skip = 1; } { if (!skip) print \$0; skip = 0; }' < /etc/ld.so.conf | $SED -e 's/#.*//;/^[ ]*hwcap[ ]/d;s/[:, ]/ /g;s/=[^=]*$//;s/=[^= ]* / /g;s/"//g;/^$/d' | tr '\n' ' '` - sys_lib_dlsearch_path_spec="/lib /usr/lib $lt_ld_extra" - fi - # We used to test for /lib/ld.so.1 and disable shared libraries on - # powerpc, because MkLinux only supported shared libraries with the - # GNU dynamic linker. Since this was broken with cross compilers, - # most powerpc-linux boxes support dynamic linking these days and - # people can always --disable-shared, the test was removed, and we - # assume the GNU/Linux dynamic linker is in use. - dynamic_linker='GNU/Linux ld.so' - ;; +fi -netbsd*) - version_type=sunos - need_lib_prefix=no - need_version=no - if echo __ELF__ | $CC -E - | $GREP __ELF__ >/dev/null; then - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' - dynamic_linker='NetBSD (a.out) ld.so' - else - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - dynamic_linker='NetBSD ld.elf_so' - fi - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - ;; -newsos6) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - ;; +fi -*nto* | *qnx*) - version_type=qnx - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no 
- hardcode_into_libs=yes - dynamic_linker='ldqnx.so' - ;; -openbsd* | bitrig*) - version_type=sunos - sys_lib_dlsearch_path_spec=/usr/lib - need_lib_prefix=no - if test -z "`echo __ELF__ | $CC -E - | $GREP __ELF__`"; then - need_version=no +fi + + ;; + esac + + if test no = "$lt_cv_dlopen"; then + enable_dlopen=no else - need_version=yes + enable_dlopen=yes fi - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - finish_cmds='PATH="\$PATH:/sbin" ldconfig -m $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - ;; -os2*) - libname_spec='$name' - version_type=windows - shrext_cmds=.dll - need_version=no - need_lib_prefix=no - # OS/2 can only load a DLL with a base name of 8 characters or less. - soname_spec='`test -n "$os2dllname" && libname="$os2dllname"; - v=$($ECHO $release$versuffix | tr -d .-); - n=$($ECHO $libname | cut -b -$((8 - ${#v})) | tr . _); - $ECHO $n$v`$shared_ext' - library_names_spec='${libname}_dll.$libext' - dynamic_linker='OS/2 ld.exe' - shlibpath_var=BEGINLIBPATH - sys_lib_search_path_spec="/lib /usr/lib /usr/local/lib" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - postinstall_cmds='base_file=`basename \$file`~ - dlpath=`$SHELL 2>&1 -c '\''. $dir/'\''\$base_file'\''i; $ECHO \$dlname'\''`~ - dldir=$destdir/`dirname \$dlpath`~ - test -d \$dldir || mkdir -p \$dldir~ - $install_prog $dir/$dlname \$dldir/$dlname~ - chmod a+x \$dldir/$dlname~ - if test -n '\''$stripme'\'' && test -n '\''$striplib'\''; then - eval '\''$striplib \$dldir/$dlname'\'' || exit \$?; - fi' - postuninstall_cmds='dldll=`$SHELL 2>&1 -c '\''. 
$file; $ECHO \$dlname'\''`~ - dlpath=$dir/\$dldll~ - $RM \$dlpath' - ;; + case $lt_cv_dlopen in + dlopen) + save_CPPFLAGS=$CPPFLAGS + test yes = "$ac_cv_header_dlfcn_h" && CPPFLAGS="$CPPFLAGS -DHAVE_DLFCN_H" -osf3* | osf4* | osf5*) - version_type=osf - need_lib_prefix=no - need_version=no - soname_spec='$libname$release$shared_ext$major' - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - sys_lib_search_path_spec="/usr/shlib /usr/ccs/lib /usr/lib/cmplrs/cc /usr/lib /usr/local/lib /var/shlib" - sys_lib_dlsearch_path_spec=$sys_lib_search_path_spec - ;; + save_LDFLAGS=$LDFLAGS + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $export_dynamic_flag_spec\" -rdos*) - dynamic_linker=no - ;; + save_LIBS=$LIBS + LIBS="$lt_cv_dlopen_libs $LIBS" -solaris*) - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - # ldd complains unless libraries are executable - postinstall_cmds='chmod +x $lib' - ;; + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether a program can dlopen itself" >&5 +printf %s "checking whether a program can dlopen itself... 
&6; } +if test">
" >&6; } +if test ${lt_cv_dlopen_self+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test yes = "$cross_compiling"; then : + lt_cv_dlopen_self=cross +else + lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 + lt_status=$lt_dlunknown + cat > conftest.$ac_ext <<_LT_EOF +#line $LINENO "configure" +#include "confdefs.h" -sunos4*) - version_type=sunos - library_names_spec='$libname$release$shared_ext$versuffix $libname$shared_ext$versuffix' - finish_cmds='PATH="\$PATH:/usr/etc" ldconfig $libdir' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - if test yes = "$with_gnu_ld"; then - need_lib_prefix=no - fi - need_version=yes - ;; +#if HAVE_DLFCN_H +#include <dlfcn.h> +#endif -sysv4 | sysv4.3*) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - case $host_vendor in - sni) - shlibpath_overrides_runpath=no - need_lib_prefix=no - runpath_var=LD_RUN_PATH - ;; - siemens) - need_lib_prefix=no - ;; - motorola) - need_lib_prefix=no - need_version=no - shlibpath_overrides_runpath=no - sys_lib_search_path_spec='/lib /usr/lib /usr/ccs/lib' - ;; - esac - ;; +#include <stdio.h> -sysv4*MP*) - if test -d /usr/nec; then - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$shared_ext.$versuffix $libname$shared_ext.$major $libname$shared_ext' - soname_spec='$libname$shared_ext.$major' - shlibpath_var=LD_LIBRARY_PATH - fi - ;; +#ifdef RTLD_GLOBAL +# define LT_DLGLOBAL RTLD_GLOBAL +#else +# ifdef DL_GLOBAL +# define LT_DLGLOBAL DL_GLOBAL +# else +# define LT_DLGLOBAL 0 +# endif +#endif -sysv5* | sco3.2v5* | sco5v6* | unixware* | OpenUNIX* | sysv4*uw2*) - version_type=sco - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext
$libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=yes - hardcode_into_libs=yes - if test yes = "$with_gnu_ld"; then - sys_lib_search_path_spec='/usr/local/lib /usr/gnu/lib /usr/ccs/lib /usr/lib /lib' - else - sys_lib_search_path_spec='/usr/ccs/lib /usr/lib' - case $host_os in - sco3.2v5*) - sys_lib_search_path_spec="$sys_lib_search_path_spec /lib" - ;; - esac - fi - sys_lib_dlsearch_path_spec='/usr/lib' - ;; +/* We may have to define LT_DLLAZY_OR_NOW in the command line if we + find out it does not work in some platform. */ +#ifndef LT_DLLAZY_OR_NOW +# ifdef RTLD_LAZY +# define LT_DLLAZY_OR_NOW RTLD_LAZY +# else +# ifdef DL_LAZY +# define LT_DLLAZY_OR_NOW DL_LAZY +# else +# ifdef RTLD_NOW +# define LT_DLLAZY_OR_NOW RTLD_NOW +# else +# ifdef DL_NOW +# define LT_DLLAZY_OR_NOW DL_NOW +# else +# define LT_DLLAZY_OR_NOW 0 +# endif +# endif +# endif +# endif +#endif -tpf*) - # TPF is a cross-target only. Preferred cross-host = GNU/Linux. - version_type=linux # correct to gnu/linux during the next big refactor - need_lib_prefix=no - need_version=no - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - shlibpath_var=LD_LIBRARY_PATH - shlibpath_overrides_runpath=no - hardcode_into_libs=yes - ;; +/* When -fvisibility=hidden is used, assume the code has been annotated + correspondingly for the symbols needed. 
*/ +#if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3)) +int fnord () __attribute__((visibility("default"))); +#endif -uts4*) - version_type=linux # correct to gnu/linux during the next big refactor - library_names_spec='$libname$release$shared_ext$versuffix $libname$release$shared_ext$major $libname$shared_ext' - soname_spec='$libname$release$shared_ext$major' - shlibpath_var=LD_LIBRARY_PATH - ;; +int fnord () { return 42; } +int main () +{ + void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); + int status = $lt_dlunknown; -*) - dynamic_linker=no - ;; -esac -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $dynamic_linker" >&5 -$as_echo "$dynamic_linker" >&6; } -test no = "$dynamic_linker" && can_build_shared=no + if (self) + { + if (dlsym (self,"fnord")) status = $lt_dlno_uscore; + else + { + if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; + else puts (dlerror ()); + } + /* dlclose (self); */ + } + else + puts (dlerror ()); -variables_saved_for_relink="PATH $shlibpath_var $runpath_var" -if test yes = "$GCC"; then - variables_saved_for_relink="$variables_saved_for_relink GCC_EXEC_PREFIX COMPILER_PATH LIBRARY_PATH" + return status; +} +_LT_EOF + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 + (eval $ac_link) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } && test -s "conftest$ac_exeext" 2>/dev/null; then + (./conftest; exit; ) >&5 2>/dev/null + lt_status=$? 
+ case x$lt_status in + x$lt_dlno_uscore) lt_cv_dlopen_self=yes ;; + x$lt_dlneed_uscore) lt_cv_dlopen_self=yes ;; + x$lt_dlunknown|x*) lt_cv_dlopen_self=no ;; + esac + else : + # compilation failed + lt_cv_dlopen_self=no + fi fi +rm -fr conftest* -if test set = "${lt_cv_sys_lib_search_path_spec+set}"; then - sys_lib_search_path_spec=$lt_cv_sys_lib_search_path_spec -fi -if test set = "${lt_cv_sys_lib_dlsearch_path_spec+set}"; then - sys_lib_dlsearch_path_spec=$lt_cv_sys_lib_dlsearch_path_spec fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_dlopen_self" >&5 +printf "%s\n" "$lt_cv_dlopen_self" >&6; } -# remember unaugmented sys_lib_dlsearch_path content for libtool script decls... -configure_time_dlsearch_path=$sys_lib_dlsearch_path_spec - -# ... but it needs LT_SYS_LIBRARY_PATH munging for other configure-time code -func_munge_path_list sys_lib_dlsearch_path_spec "$LT_SYS_LIBRARY_PATH" - -# to be used as default LT_SYS_LIBRARY_PATH value in generated libtool -configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH - + if test yes = "$lt_cv_dlopen_self"; then + wl=$lt_prog_compiler_wl eval LDFLAGS=\"\$LDFLAGS $lt_prog_compiler_static\" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether a statically linked program can dlopen itself" >&5 +printf %s "checking whether a statically linked program can dlopen itself... 
&6; } +if test">
" >&6; } +if test ${lt_cv_dlopen_self_static+y} +then : + printf %s "(cached) " >&6 +else $as_nop + if test yes = "$cross_compiling"; then : + lt_cv_dlopen_self_static=cross +else + lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 + lt_status=$lt_dlunknown + cat > conftest.$ac_ext <<_LT_EOF +#line $LINENO "configure" +#include "confdefs.h" +#if HAVE_DLFCN_H +#include <dlfcn.h> +#endif +#include <stdio.h> +#ifdef RTLD_GLOBAL +# define LT_DLGLOBAL RTLD_GLOBAL +#else +# ifdef DL_GLOBAL +# define LT_DLGLOBAL DL_GLOBAL +# else +# define LT_DLGLOBAL 0 +# endif +#endif +/* We may have to define LT_DLLAZY_OR_NOW in the command line if we + find out it does not work in some platform. */ +#ifndef LT_DLLAZY_OR_NOW +# ifdef RTLD_LAZY +# define LT_DLLAZY_OR_NOW RTLD_LAZY +# else +# ifdef DL_LAZY +# define LT_DLLAZY_OR_NOW DL_LAZY +# else +# ifdef RTLD_NOW +# define LT_DLLAZY_OR_NOW RTLD_NOW +# else +# ifdef DL_NOW +# define LT_DLLAZY_OR_NOW DL_NOW +# else +# define LT_DLLAZY_OR_NOW 0 +# endif +# endif +# endif +# endif +#endif +/* When -fvisibility=hidden is used, assume the code has been annotated + correspondingly for the symbols needed. */ +#if defined __GNUC__ && (((__GNUC__ == 3) && (__GNUC_MINOR__ >= 3)) || (__GNUC__ > 3)) +int fnord () __attribute__((visibility("default"))); +#endif +int fnord () { return 42; } +int main () +{ + void *self = dlopen (0, LT_DLGLOBAL|LT_DLLAZY_OR_NOW); + int status = $lt_dlunknown; + if (self) + { + if (dlsym (self,"fnord")) status = $lt_dlno_uscore; + else + { + if (dlsym( self,"_fnord")) status = $lt_dlneed_uscore; + else puts (dlerror ()); + } + /* dlclose (self); */ + } + else + puts (dlerror ()); + return status; +} +_LT_EOF + if { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_link\""; } >&5 + (eval $ac_link) 2>&5 + ac_status=$? + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + test $ac_status = 0; } && test -s "conftest$ac_exeext" 2>/dev/null; then + (./conftest; exit; ) >&5 2>/dev/null + lt_status=$?
+ case x$lt_status in + x$lt_dlno_uscore) lt_cv_dlopen_self_static=yes ;; + x$lt_dlneed_uscore) lt_cv_dlopen_self_static=yes ;; + x$lt_dlunknown|x*) lt_cv_dlopen_self_static=no ;; + esac + else : + # compilation failed + lt_cv_dlopen_self_static=no + fi +fi +rm -fr conftest* +fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $lt_cv_dlopen_self_static" >&5 +printf "%s\n" "$lt_cv_dlopen_self_static" >&6; } + fi + CPPFLAGS=$save_CPPFLAGS + LDFLAGS=$save_LDFLAGS + LIBS=$save_LIBS + ;; + esac + case $lt_cv_dlopen_self in + yes|no) enable_dlopen_self=$lt_cv_dlopen_self ;; + *) enable_dlopen_self=unknown ;; + esac + case $lt_cv_dlopen_self_static in + yes|no) enable_dlopen_self_static=$lt_cv_dlopen_self_static ;; + *) enable_dlopen_self_static=unknown ;; + esac +fi @@ -17441,6 +13331,46 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH +striplib= +old_striplib= +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether stripping libraries is possible" >&5 +printf %s "checking whether stripping libraries is possible... 
" >&6; } +if test -z "$STRIP"; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +else + if $STRIP -V 2>&1 | $GREP "GNU strip" >/dev/null; then + old_striplib="$STRIP --strip-debug" + striplib="$STRIP --strip-unneeded" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + else + case $host_os in + darwin*) + # FIXME - insert some real tests, host_os isn't really good enough + striplib="$STRIP -x" + old_striplib="$STRIP -S" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + ;; + freebsd*) + if $STRIP -V 2>&1 | $GREP "elftoolchain" >/dev/null; then + old_striplib="$STRIP --strip-debug" + striplib="$STRIP --strip-unneeded" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } + else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + fi + ;; + *) + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + ;; + esac + fi +fi @@ -17450,70 +13380,62 @@ configure_time_lt_sys_library_path=$LT_SYS_LIBRARY_PATH - { $as_echo "$as_me:${as_lineno-$LINENO}: checking how to hardcode library paths into programs" >&5 -$as_echo_n "checking how to hardcode library paths into programs... " >&6; } -hardcode_action_CXX= -if test -n "$hardcode_libdir_flag_spec_CXX" || - test -n "$runpath_var_CXX" || - test yes = "$hardcode_automatic_CXX"; then - # We can hardcode non-existent directories. - if test no != "$hardcode_direct_CXX" && - # If the only mechanism to avoid hardcoding is shlibpath_var, we - # have to relink, otherwise we might link with an installed library - # when we should be linking with a yet-to-be-installed one - ## test no != "$_LT_TAGVAR(hardcode_shlibpath_var, CXX)" && - test no != "$hardcode_minus_L_CXX"; then - # Linking always hardcodes the temporary library directory. 
- hardcode_action_CXX=relink - else - # We can link without hardcoding, and we can hardcode nonexisting dirs. - hardcode_action_CXX=immediate - fi -else - # We cannot hardcode anything, or else we can only hardcode existing - # directories. - hardcode_action_CXX=unsupported -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $hardcode_action_CXX" >&5 -$as_echo "$hardcode_action_CXX" >&6; } -if test relink = "$hardcode_action_CXX" || - test yes = "$inherit_rpath_CXX"; then - # Fast installation is not supported - enable_fast_install=no -elif test yes = "$shlibpath_overrides_runpath" || - test no = "$enable_shared"; then - # Fast installation is not necessary - enable_fast_install=needless -fi + # Report what library types will actually be built + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if libtool supports shared libraries" >&5 +printf %s "checking if libtool supports shared libraries... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $can_build_shared" >&5 +printf "%s\n" "$can_build_shared" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to build shared libraries" >&5 +printf %s "checking whether to build shared libraries... " >&6; } + test no = "$can_build_shared" && enable_shared=no + # On AIX, shared libraries and static libraries use the same namespace, and + # are all built from PIC. 
+ case $host_os in + aix3*) + test yes = "$enable_shared" && enable_static=no + if test -n "$RANLIB"; then + archive_cmds="$archive_cmds~\$RANLIB \$lib" + postinstall_cmds='$RANLIB $lib' + fi + ;; + aix[4-9]*) + if test ia64 != "$host_cpu"; then + case $enable_shared,$with_aix_soname,$aix_use_runtimelinking in + yes,aix,yes) ;; # shared object as lib.so file only + yes,svr4,*) ;; # shared object as lib.so archive member only + yes,*) enable_static=no ;; # shared object in lib.a archive as well + esac + fi + ;; + esac + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $enable_shared" >&5 +printf "%s\n" "$enable_shared" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether to build static libraries" >&5 +printf %s "checking whether to build static libraries... " >&6; } + # Make sure either enable_shared or enable_static is yes. + test yes = "$enable_shared" || enable_static=yes + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $enable_static" >&5 +printf "%s\n" "$enable_static" >&6; } - fi # test -n "$compiler" - CC=$lt_save_CC - CFLAGS=$lt_save_CFLAGS - LDCXX=$LD - LD=$lt_save_LD - GCC=$lt_save_GCC - with_gnu_ld=$lt_save_with_gnu_ld - lt_cv_path_LDCXX=$lt_cv_path_LD - lt_cv_path_LD=$lt_save_path_LD - lt_cv_prog_gnu_ldcxx=$lt_cv_prog_gnu_ld - lt_cv_prog_gnu_ld=$lt_save_with_gnu_ld -fi # test yes != "$_lt_caught_CXX_error" +fi ac_ext=c ac_cpp='$CPP $CPPFLAGS' ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu +CC=$lt_save_CC + @@ -17536,15 +13458,15 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu # Only expand once: -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5 -$as_echo_n "checking whether ln -s works... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether ln -s works" >&5 +printf %s "checking whether ln -s works... 
" >&6; } LN_S=$as_ln_s if test "$LN_S" = "ln -s"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 -$as_echo "no, using $LN_S" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no, using $LN_S" >&5 +printf "%s\n" "no, using $LN_S" >&6; } fi @@ -17556,45 +13478,48 @@ fi VISIBILITY_CXXFLAGS= HAVE_VISIBILITY=0 if test -n "$GCC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether the -Werror option is usable" >&5 -$as_echo_n "checking whether the -Werror option is usable... " >&6; } - if ${pcre_cv_cc_vis_werror+:} false; then : - $as_echo_n "(cached) " >&6 -else - - pcre_save_CFLAGS="$CFLAGS" + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether the -Werror option is usable" >&5 +printf %s "checking whether the -Werror option is usable... " >&6; } + if test ${pcre2_cv_cc_vis_werror+y} +then : + printf %s "(cached) " >&6 +else $as_nop + + pcre2_save_CFLAGS="$CFLAGS" CFLAGS="$CFLAGS -Werror" cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - pcre_cv_cc_vis_werror=yes -else - pcre_cv_cc_vis_werror=no -fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext - CFLAGS="$pcre_save_CFLAGS" -fi - - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pcre_cv_cc_vis_werror" >&5 -$as_echo "$pcre_cv_cc_vis_werror" >&6; } - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for simple visibility declarations" >&5 -$as_echo_n "checking for simple visibility declarations... 
" >&6; } - if ${pcre_cv_cc_visibility+:} false; then : - $as_echo_n "(cached) " >&6 -else - - pcre_save_CFLAGS="$CFLAGS" +if ac_fn_c_try_compile "$LINENO" +then : + pcre2_cv_cc_vis_werror=yes +else $as_nop + pcre2_cv_cc_vis_werror=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + CFLAGS="$pcre2_save_CFLAGS" +fi + + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $pcre2_cv_cc_vis_werror" >&5 +printf "%s\n" "$pcre2_cv_cc_vis_werror" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for simple visibility declarations" >&5 +printf %s "checking for simple visibility declarations... " >&6; } + if test ${pcre2_cv_cc_visibility+y} +then : + printf %s "(cached) " >&6 +else $as_nop + + pcre2_save_CFLAGS="$CFLAGS" CFLAGS="$CFLAGS -fvisibility=hidden" - if test $pcre_cv_cc_vis_werror = yes; then + if test $pcre2_cv_cc_vis_werror = yes; then CFLAGS="$CFLAGS -Werror" fi cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -17606,73 +13531,108 @@ extern __attribute__((__visibility__("hidden"))) int hiddenvar; void dummyfunc (void) {} int -main () +main (void) { ; return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - pcre_cv_cc_visibility=yes -else - pcre_cv_cc_visibility=no +if ac_fn_c_try_compile "$LINENO" +then : + pcre2_cv_cc_visibility=yes +else $as_nop + pcre2_cv_cc_visibility=no fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext - CFLAGS="$pcre_save_CFLAGS" +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + CFLAGS="$pcre2_save_CFLAGS" fi - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $pcre_cv_cc_visibility" >&5 -$as_echo "$pcre_cv_cc_visibility" >&6; } - if test $pcre_cv_cc_visibility = yes; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $pcre2_cv_cc_visibility" >&5 +printf "%s\n" "$pcre2_cv_cc_visibility" >&6; } + if test $pcre2_cv_cc_visibility = yes; then VISIBILITY_CFLAGS="-fvisibility=hidden" VISIBILITY_CXXFLAGS="-fvisibility=hidden 
-fvisibility-inlines-hidden" HAVE_VISIBILITY=1 -$as_echo "#define PCRE_EXP_DECL extern __attribute__ ((visibility (\"default\")))" >>confdefs.h +printf "%s\n" "#define PCRE2_EXP_DECL extern __attribute__ ((visibility (\"default\")))" >>confdefs.h -$as_echo "#define PCRE_EXP_DEFN __attribute__ ((visibility (\"default\")))" >>confdefs.h +printf "%s\n" "#define PCRE2_EXP_DEFN __attribute__ ((visibility (\"default\")))" >>confdefs.h -$as_echo "#define PCRE_EXP_DATA_DEFN __attribute__ ((visibility (\"default\")))" >>confdefs.h +printf "%s\n" "#define PCRE2POSIX_EXP_DECL extern __attribute__ ((visibility (\"default\")))" >>confdefs.h -$as_echo "#define PCREPOSIX_EXP_DECL extern __attribute__ ((visibility (\"default\")))" >>confdefs.h +printf "%s\n" "#define PCRE2POSIX_EXP_DEFN extern __attribute__ ((visibility (\"default\")))" >>confdefs.h + fi + fi -$as_echo "#define PCREPOSIX_EXP_DEFN extern __attribute__ ((visibility (\"default\")))" >>confdefs.h -$as_echo "#define PCRECPP_EXP_DECL extern __attribute__ ((visibility (\"default\")))" >>confdefs.h +printf "%s\n" "#define HAVE_VISIBILITY $HAVE_VISIBILITY" >>confdefs.h -$as_echo "#define PCRECPP_EXP_DEFN __attribute__ ((visibility (\"default\")))" >>confdefs.h - fi - fi +# Check for Clang __attribute__((uninitialized)) feature +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for __attribute__((uninitialized))" >&5 +printf %s "checking for __attribute__((uninitialized))... " >&6; } +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu +tmp_CFLAGS=$CFLAGS +CFLAGS="$CFLAGS -Werror" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ -cat >>confdefs.h <<_ACEOF -#define HAVE_VISIBILITY $HAVE_VISIBILITY +int +main (void) +{ +char buf[128] __attribute__((uninitialized));(void)buf + ; + return 0; +} _ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + pcre2_cc_cv_attribute_uninitialized=yes +else $as_nop + pcre2_cc_cv_attribute_uninitialized=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $pcre2_cc_cv_attribute_uninitialized" >&5 +printf "%s\n" "$pcre2_cc_cv_attribute_uninitialized" >&6; } +if test "$pcre2_cc_cv_attribute_uninitialized" = yes; then +printf "%s\n" "#define HAVE_ATTRIBUTE_UNINITIALIZED 1" >>confdefs.h + +fi +CFLAGS=$tmp_CFLAGS +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu # Versioning -PCRE_MAJOR="8" -PCRE_MINOR="43" -PCRE_PRERELEASE="" -PCRE_DATE="2019-02-23" +PCRE2_MAJOR="10" +PCRE2_MINOR="37" +PCRE2_PRERELEASE="" +PCRE2_DATE="2021-05-26" -if test "$PCRE_MINOR" = "08" -o "$PCRE_MINOR" = "09" +if test "$PCRE2_MINOR" = "08" -o "$PCRE2_MINOR" = "09" then echo "***" - echo "*** Minor version number $PCRE_MINOR must not be used. ***" - echo "*** Use only 01 to 07 or 10 onwards, to avoid octal issues. ***" + echo "*** Minor version number $PCRE2_MINOR must not be used. ***" + echo "*** Use only 00 to 07 or 10 onwards, to avoid octal issues. ***" echo "***" exit 1 fi @@ -17688,58 +13648,92 @@ then htmldir='${docdir}/html' fi -# Handle --disable-pcre8 (enabled by default) +# Force an error for PCRE1 size options # Check whether --enable-pcre8 was given. -if test "${enable_pcre8+set}" = set; then : +if test ${enable_pcre8+y} +then : enableval=$enable_pcre8; -else - enable_pcre8=unset +else $as_nop + enable_pcre8=no fi - - -# Handle --enable-pcre16 (disabled by default) # Check whether --enable-pcre16 was given. 
-if test "${enable_pcre16+set}" = set; then : +if test ${enable_pcre16+y} +then : enableval=$enable_pcre16; -else - enable_pcre16=unset +else $as_nop + enable_pcre16=no fi - - -# Handle --enable-pcre32 (disabled by default) # Check whether --enable-pcre32 was given. -if test "${enable_pcre32+set}" = set; then : +if test ${enable_pcre32+y} +then : enableval=$enable_pcre32; -else - enable_pcre32=unset +else $as_nop + enable_pcre32=no fi +if test "$enable_pcre8$enable_pcre16$enable_pcre32" != "nonono" +then + echo "** ERROR: Use --[en|dis]able-pcre2-[8|16|32], not --[en|dis]able-pcre[8|16|32]" + exit 1 +fi -# Handle --disable-cpp. The substitution of enable_cpp is needed for use in -# pcre-config. -# Check whether --enable-cpp was given. -if test "${enable_cpp+set}" = set; then : - enableval=$enable_cpp; -else - enable_cpp=unset +# Handle --disable-pcre2-8 (enabled by default) +# Check whether --enable-pcre2-8 was given. +if test ${enable_pcre2_8+y} +then : + enableval=$enable_pcre2_8; +else $as_nop + enable_pcre2_8=unset +fi + + + +# Handle --enable-pcre2-16 (disabled by default) +# Check whether --enable-pcre2-16 was given. +if test ${enable_pcre2_16+y} +then : + enableval=$enable_pcre2_16; +else $as_nop + enable_pcre2_16=unset fi +# Handle --enable-pcre2-32 (disabled by default) +# Check whether --enable-pcre2-32 was given. +if test ${enable_pcre2_32+y} +then : + enableval=$enable_pcre2_32; +else $as_nop + enable_pcre2_32=unset +fi + + + +# Handle --enable-debug (disabled by default) +# Check whether --enable-debug was given. +if test ${enable_debug+y} +then : + enableval=$enable_debug; +else $as_nop + enable_debug=no +fi + + # Handle --enable-jit (disabled by default) # Check whether --enable-jit was given. -if test "${enable_jit+set}" = set; then : +if test ${enable_jit+y} +then : enableval=$enable_jit; -else +else $as_nop enable_jit=no fi # This code enables JIT if the hardware supports it. 
- if test "$enable_jit" = "auto"; then ac_ext=c ac_cpp='$CPP $CPPFLAGS' @@ -17747,741 +13741,521 @@ ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' ac_compiler_gnu=$ac_cv_c_compiler_gnu + SAVE_CPPFLAGS=$CPPFLAGS + CPPFLAGS=-I$srcdir cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #define SLJIT_CONFIG_AUTO 1 - #include "sljit/sljitConfigInternal.h" + #include "src/sljit/sljitConfigInternal.h" #if (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED) #error unsupported #endif _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : enable_jit=yes -else +else $as_nop enable_jit=no fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext + CPPFLAGS=$SAVE_CPPFLAGS + echo checking for JIT support on this hardware... $enable_jit fi -# Handle --disable-pcregrep-jit (enabled by default) -# Check whether --enable-pcregrep-jit was given. -if test "${enable_pcregrep_jit+set}" = set; then : - enableval=$enable_pcregrep_jit; -else - enable_pcregrep_jit=yes +# Handle --enable-jit-sealloc (disabled by default and only experimental) +case $host_os in + linux* | netbsd*) + # Check whether --enable-jit-sealloc was given. +if test ${enable_jit_sealloc+y} +then : + enableval=$enable_jit_sealloc; +else $as_nop + enable_jit_sealloc=no +fi + + ;; + *) + enable_jit_sealloc=unsupported + ;; +esac + +# Handle --disable-pcre2grep-jit (enabled by default) +# Check whether --enable-pcre2grep-jit was given. +if test ${enable_pcre2grep_jit+y} +then : + enableval=$enable_pcre2grep_jit; +else $as_nop + enable_pcre2grep_jit=yes fi -# Handle --enable-rebuild-chartables -# Check whether --enable-rebuild-chartables was given. 
-if test "${enable_rebuild_chartables+set}" = set; then : - enableval=$enable_rebuild_chartables; -else - enable_rebuild_chartables=no +# Handle --disable-pcre2grep-callout (enabled by default) +# Check whether --enable-pcre2grep-callout was given. +if test ${enable_pcre2grep_callout+y} +then : + enableval=$enable_pcre2grep_callout; +else $as_nop + enable_pcre2grep_callout=yes fi -# Handle --enable-utf8 (disabled by default) -# Check whether --enable-utf8 was given. -if test "${enable_utf8+set}" = set; then : - enableval=$enable_utf8; -else - enable_utf8=unset +# Handle --disable-pcre2grep-callout-fork (enabled by default) +# Check whether --enable-pcre2grep-callout-fork was given. +if test ${enable_pcre2grep_callout_fork+y} +then : + enableval=$enable_pcre2grep_callout_fork; +else $as_nop + enable_pcre2grep_callout_fork=yes fi -# Handle --enable-utf (disabled by default) -# Check whether --enable-utf was given. -if test "${enable_utf+set}" = set; then : - enableval=$enable_utf; -else - enable_utf=unset +# Handle --enable-rebuild-chartables +# Check whether --enable-rebuild-chartables was given. +if test ${enable_rebuild_chartables+y} +then : + enableval=$enable_rebuild_chartables; +else $as_nop + enable_rebuild_chartables=no fi -# Handle --enable-unicode-properties -# Check whether --enable-unicode-properties was given. -if test "${enable_unicode_properties+set}" = set; then : - enableval=$enable_unicode_properties; -else - enable_unicode_properties=no +# Handle --disable-unicode (enabled by default) +# Check whether --enable-unicode was given. +if test ${enable_unicode+y} +then : + enableval=$enable_unicode; +else $as_nop + enable_unicode=unset fi # Handle newline options -ac_pcre_newline=lf +ac_pcre2_newline=lf # Check whether --enable-newline-is-cr was given. 
-if test "${enable_newline_is_cr+set}" = set; then : - enableval=$enable_newline_is_cr; ac_pcre_newline=cr +if test ${enable_newline_is_cr+y} +then : + enableval=$enable_newline_is_cr; ac_pcre2_newline=cr fi # Check whether --enable-newline-is-lf was given. -if test "${enable_newline_is_lf+set}" = set; then : - enableval=$enable_newline_is_lf; ac_pcre_newline=lf +if test ${enable_newline_is_lf+y} +then : + enableval=$enable_newline_is_lf; ac_pcre2_newline=lf fi # Check whether --enable-newline-is-crlf was given. -if test "${enable_newline_is_crlf+set}" = set; then : - enableval=$enable_newline_is_crlf; ac_pcre_newline=crlf +if test ${enable_newline_is_crlf+y} +then : + enableval=$enable_newline_is_crlf; ac_pcre2_newline=crlf fi # Check whether --enable-newline-is-anycrlf was given. -if test "${enable_newline_is_anycrlf+set}" = set; then : - enableval=$enable_newline_is_anycrlf; ac_pcre_newline=anycrlf +if test ${enable_newline_is_anycrlf+y} +then : + enableval=$enable_newline_is_anycrlf; ac_pcre2_newline=anycrlf fi # Check whether --enable-newline-is-any was given. -if test "${enable_newline_is_any+set}" = set; then : - enableval=$enable_newline_is_any; ac_pcre_newline=any +if test ${enable_newline_is_any+y} +then : + enableval=$enable_newline_is_any; ac_pcre2_newline=any +fi + +# Check whether --enable-newline-is-nul was given. +if test ${enable_newline_is_nul+y} +then : + enableval=$enable_newline_is_nul; ac_pcre2_newline=nul fi -enable_newline="$ac_pcre_newline" +enable_newline="$ac_pcre2_newline" # Handle --enable-bsr-anycrlf # Check whether --enable-bsr-anycrlf was given. -if test "${enable_bsr_anycrlf+set}" = set; then : +if test ${enable_bsr_anycrlf+y} +then : enableval=$enable_bsr_anycrlf; -else +else $as_nop enable_bsr_anycrlf=no fi +# Handle --enable-never-backslash-C +# Check whether --enable-never-backslash-C was given. 
+if test ${enable_never_backslash_C+y} +then : + enableval=$enable_never_backslash_C; +else $as_nop + enable_never_backslash_C=no +fi + + # Handle --enable-ebcdic # Check whether --enable-ebcdic was given. -if test "${enable_ebcdic+set}" = set; then : +if test ${enable_ebcdic+y} +then : enableval=$enable_ebcdic; -else +else $as_nop enable_ebcdic=no fi # Handle --enable-ebcdic-nl25 # Check whether --enable-ebcdic-nl25 was given. -if test "${enable_ebcdic_nl25+set}" = set; then : +if test ${enable_ebcdic_nl25+y} +then : enableval=$enable_ebcdic_nl25; -else +else $as_nop enable_ebcdic_nl25=no fi -# Handle --disable-stack-for-recursion -# Check whether --enable-stack-for-recursion was given. -if test "${enable_stack_for_recursion+set}" = set; then : - enableval=$enable_stack_for_recursion; -else - enable_stack_for_recursion=yes -fi - - -# Handle --enable-pcregrep-libz -# Check whether --enable-pcregrep-libz was given. -if test "${enable_pcregrep_libz+set}" = set; then : - enableval=$enable_pcregrep_libz; -else - enable_pcregrep_libz=no -fi - - -# Handle --enable-pcregrep-libbz2 -# Check whether --enable-pcregrep-libbz2 was given. -if test "${enable_pcregrep_libbz2+set}" = set; then : - enableval=$enable_pcregrep_libbz2; -else - enable_pcregrep_libbz2=no -fi - - -# Handle --with-pcregrep-bufsize=N - -# Check whether --with-pcregrep-bufsize was given. -if test "${with_pcregrep_bufsize+set}" = set; then : - withval=$with_pcregrep_bufsize; -else - with_pcregrep_bufsize=20480 -fi - - -# Handle --enable-pcretest-libedit -# Check whether --enable-pcretest-libedit was given. -if test "${enable_pcretest_libedit+set}" = set; then : - enableval=$enable_pcretest_libedit; -else - enable_pcretest_libedit=no -fi - - -# Handle --enable-pcretest-libreadline -# Check whether --enable-pcretest-libreadline was given. 
-if test "${enable_pcretest_libreadline+set}" = set; then : - enableval=$enable_pcretest_libreadline; -else - enable_pcretest_libreadline=no -fi - - -# Handle --with-posix-malloc-threshold=NBYTES - -# Check whether --with-posix-malloc-threshold was given. -if test "${with_posix_malloc_threshold+set}" = set; then : - withval=$with_posix_malloc_threshold; -else - with_posix_malloc_threshold=10 -fi - - -# Handle --with-link-size=N - -# Check whether --with-link-size was given. -if test "${with_link_size+set}" = set; then : - withval=$with_link_size; -else - with_link_size=2 -fi - - -# Handle --with-parens-nest-limit=N - -# Check whether --with-parens-nest-limit was given. -if test "${with_parens_nest_limit+set}" = set; then : - withval=$with_parens_nest_limit; -else - with_parens_nest_limit=250 -fi - - -# Handle --with-match-limit=N - -# Check whether --with-match-limit was given. -if test "${with_match_limit+set}" = set; then : - withval=$with_match_limit; -else - with_match_limit=10000000 -fi - - -# Handle --with-match-limit_recursion=N -# -# Note: In config.h, the default is to define MATCH_LIMIT_RECURSION -# symbolically as MATCH_LIMIT, which in turn is defined to be some numeric -# value (e.g. 10000000). MATCH_LIMIT_RECURSION can otherwise be set to some -# different numeric value (or even the same numeric value as MATCH_LIMIT, -# though no longer defined in terms of the latter). -# - -# Check whether --with-match-limit-recursion was given. -if test "${with_match_limit_recursion+set}" = set; then : - withval=$with_match_limit_recursion; -else - with_match_limit_recursion=MATCH_LIMIT -fi - - -# Handle --enable-valgrind -# Check whether --enable-valgrind was given. -if test "${enable_valgrind+set}" = set; then : - enableval=$enable_valgrind; -else - enable_valgrind=no -fi - - -# Enable code coverage reports using gcov -# Check whether --enable-coverage was given. 
-if test "${enable_coverage+set}" = set; then : - enableval=$enable_coverage; -else - enable_coverage=no -fi - - -# Copy enable_utf8 value to enable_utf for compatibility reasons -if test "x$enable_utf8" != "xunset" -then - if test "x$enable_utf" != "xunset" - then - as_fn_error $? "--enable/disable-utf8 is kept only for compatibility reasons and its value is copied to --enable/disable-utf. Newer code must use --enable/disable-utf alone." "$LINENO" 5 - fi - enable_utf=$enable_utf8 -fi - -# Set the default value for pcre8 -if test "x$enable_pcre8" = "xunset" -then - enable_pcre8=yes -fi - -# Set the default value for pcre16 -if test "x$enable_pcre16" = "xunset" -then - enable_pcre16=no -fi - -# Set the default value for pcre32 -if test "x$enable_pcre32" = "xunset" -then - enable_pcre32=no -fi - -# Make sure enable_pcre8 or enable_pcre16 was set -if test "x$enable_pcre8$enable_pcre16$enable_pcre32" = "xnonono" -then - as_fn_error $? "At least one of 8, 16 or 32 bit pcre library must be enabled" "$LINENO" 5 -fi - -# Make sure that if enable_unicode_properties was set, that UTF support is enabled. -if test "x$enable_unicode_properties" = "xyes" -then - if test "x$enable_utf" = "xno" - then - as_fn_error $? "support for Unicode properties requires UTF-8/16/32 support" "$LINENO" 5 - fi - enable_utf=yes -fi - -# enable_utf is disabled by default. -if test "x$enable_utf" = "xunset" -then - enable_utf=no -fi - -# enable_cpp copies the value of enable_pcre8 by default -if test "x$enable_cpp" = "xunset" -then - enable_cpp=$enable_pcre8 -fi - -# Make sure that if enable_cpp was set, that enable_pcre8 support is enabled -if test "x$enable_cpp" = "xyes" -then - if test "x$enable_pcre8" = "xno" - then - as_fn_error $? "C++ library requires pcre library with 8 bit characters" "$LINENO" 5 - fi -fi - -# Convert the newline identifier into the appropriate integer value. The first -# three are ASCII values 0x0a, 0x0d, and 0x0d0a, but if EBCDIC is enabled, they -# are changed below. 
- -case "$enable_newline" in - lf) ac_pcre_newline_value=10 ;; - cr) ac_pcre_newline_value=13 ;; - crlf) ac_pcre_newline_value=3338 ;; - anycrlf) ac_pcre_newline_value=-2 ;; - any) ac_pcre_newline_value=-1 ;; - *) - as_fn_error $? "invalid argument \"$enable_newline\" to --enable-newline option" "$LINENO" 5 - ;; -esac - -# --enable-ebcdic-nl25 implies --enable-ebcdic -if test "x$enable_ebcdic_nl25" = "xyes"; then - enable_ebcdic=yes -fi - -# Make sure that if enable_ebcdic is set, rebuild_chartables is also enabled, -# and the newline value is adjusted appropriately (CR is still 13, but LF is -# 21 or 37). Also check that UTF support is not requested, because PCRE cannot -# handle EBCDIC and UTF in the same build. To do so it would need to use -# different character constants depending on the mode. -# -if test "x$enable_ebcdic" = "xyes"; then - enable_rebuild_chartables=yes - - if test "x$enable_utf" = "xyes"; then - as_fn_error $? "support for EBCDIC and UTF-8/16/32 cannot be enabled at the same time" "$LINENO" 5 - fi - - if test "x$enable_ebcdic_nl25" = "xno"; then - case "$ac_pcre_newline_value" in - 10) ac_pcre_newline_value=21 ;; - 3338) ac_pcre_newline_value=3349 ;; - esac - else - case "$ac_pcre_newline_value" in - 10) ac_pcre_newline_value=37 ;; - 3338) ac_pcre_newline_value=3365 ;; - esac - fi +# Handle --enable-pcre2grep-libz +# Check whether --enable-pcre2grep-libz was given. +if test ${enable_pcre2grep_libz+y} +then : + enableval=$enable_pcre2grep_libz; +else $as_nop + enable_pcre2grep_libz=no fi -# Check argument to --with-link-size -case "$with_link_size" in - 2|3|4) ;; - *) - as_fn_error $? "invalid argument \"$with_link_size\" to --with-link-size option" "$LINENO" 5 - ;; -esac - - - -# Checks for header files. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for ANSI C header files" >&5 -$as_echo_n "checking for ANSI C header files... 
" >&6; } -if ${ac_cv_header_stdc+:} false; then : - $as_echo_n "(cached) " >&6 -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -#include -#include -#include - -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_c_try_compile "$LINENO"; then : - ac_cv_header_stdc=yes -else - ac_cv_header_stdc=no +# Handle --enable-pcre2grep-libbz2 +# Check whether --enable-pcre2grep-libbz2 was given. +if test ${enable_pcre2grep_libbz2+y} +then : + enableval=$enable_pcre2grep_libbz2; +else $as_nop + enable_pcre2grep_libbz2=no fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -if test $ac_cv_header_stdc = yes; then - # SunOS 4.x string.h does not declare mem*, contrary to ANSI. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "memchr" >/dev/null 2>&1; then : +# Handle --with-pcre2grep-bufsize=N -else - ac_cv_header_stdc=no +# Check whether --with-pcre2grep-bufsize was given. +if test ${with_pcre2grep_bufsize+y} +then : + withval=$with_pcre2grep_bufsize; +else $as_nop + with_pcre2grep_bufsize=20480 fi -rm -f conftest* -fi -if test $ac_cv_header_stdc = yes; then - # ISC 2.0.2 stdlib.h does not declare free, contrary to ANSI. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include +# Handle --with-pcre2grep-max-bufsize=N -_ACEOF -if (eval "$ac_cpp conftest.$ac_ext") 2>&5 | - $EGREP "free" >/dev/null 2>&1; then : +# Check whether --with-pcre2grep-max-bufsize was given. +if test ${with_pcre2grep_max_bufsize+y} +then : + withval=$with_pcre2grep_max_bufsize; +else $as_nop + with_pcre2grep_max_bufsize=1048576 +fi -else - ac_cv_header_stdc=no + +# Handle --enable-pcre2test-libedit +# Check whether --enable-pcre2test-libedit was given. 
+if test ${enable_pcre2test_libedit+y} +then : + enableval=$enable_pcre2test_libedit; +else $as_nop + enable_pcre2test_libedit=no fi -rm -f conftest* + +# Handle --enable-pcre2test-libreadline +# Check whether --enable-pcre2test-libreadline was given. +if test ${enable_pcre2test_libreadline+y} +then : + enableval=$enable_pcre2test_libreadline; +else $as_nop + enable_pcre2test_libreadline=no fi -if test $ac_cv_header_stdc = yes; then - # /bin/cc in Irix-4.0.5 gets non-ANSI ctype macros unless using -ansi. - if test "$cross_compiling" = yes; then : - : -else - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -#include -#include -#if ((' ' & 0x0FF) == 0x020) -# define ISLOWER(c) ('a' <= (c) && (c) <= 'z') -# define TOUPPER(c) (ISLOWER(c) ? 'A' + ((c) - 'a') : (c)) -#else -# define ISLOWER(c) \ - (('a' <= (c) && (c) <= 'i') \ - || ('j' <= (c) && (c) <= 'r') \ - || ('s' <= (c) && (c) <= 'z')) -# define TOUPPER(c) (ISLOWER(c) ? ((c) | 0x40) : (c)) -#endif -#define XOR(e, f) (((e) && !(f)) || (!(e) && (f))) -int -main () -{ - int i; - for (i = 0; i < 256; i++) - if (XOR (islower (i), ISLOWER (i)) - || toupper (i) != TOUPPER (i)) - return 2; - return 0; -} -_ACEOF -if ac_fn_c_try_run "$LINENO"; then : +# Handle --with-link-size=N -else - ac_cv_header_stdc=no -fi -rm -f core *.core core.conftest.* gmon.out bb.out conftest$ac_exeext \ - conftest.$ac_objext conftest.beam conftest.$ac_ext +# Check whether --with-link-size was given. +if test ${with_link_size+y} +then : + withval=$with_link_size; +else $as_nop + with_link_size=2 fi -fi -fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_header_stdc" >&5 -$as_echo "$ac_cv_header_stdc" >&6; } -if test $ac_cv_header_stdc = yes; then -$as_echo "#define STDC_HEADERS 1" >>confdefs.h +# Handle --with-parens-nest-limit=N +# Check whether --with-parens-nest-limit was given. 
+if test ${with_parens_nest_limit+y} +then : + withval=$with_parens_nest_limit; +else $as_nop + with_parens_nest_limit=250 fi -for ac_header in limits.h sys/types.h sys/stat.h dirent.h -do : - as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` -ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" -if eval test \"x\$"$as_ac_Header"\" = x"yes"; then : - cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_header" | $as_tr_cpp` 1 -_ACEOF +# Handle --with-heap-limit + +# Check whether --with-heap-limit was given. +if test ${with_heap_limit+y} +then : + withval=$with_heap_limit; +else $as_nop + with_heap_limit=20000000 fi -done -for ac_header in windows.h -do : - ac_fn_c_check_header_mongrel "$LINENO" "windows.h" "ac_cv_header_windows_h" "$ac_includes_default" -if test "x$ac_cv_header_windows_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_WINDOWS_H 1 -_ACEOF - HAVE_WINDOWS_H=1 +# Handle --with-match-limit=N + +# Check whether --with-match-limit was given. +if test ${with_match_limit+y} +then : + withval=$with_match_limit; +else $as_nop + with_match_limit=10000000 fi -done + +# Handle --with-match-limit-depth=N +# Recognize old synonym --with-match-limit-recursion +# +# Note: In config.h, the default is to define MATCH_LIMIT_DEPTH symbolically as +# MATCH_LIMIT, which in turn is defined to be some numeric value (e.g. +# 10000000). MATCH_LIMIT_DEPTH can otherwise be set to some different numeric +# value (or even the same numeric value as MATCH_LIMIT, though no longer +# defined in terms of the latter). +# + +# Check whether --with-match-limit-depth was given. +if test ${with_match_limit_depth+y} +then : + withval=$with_match_limit_depth; +else $as_nop + with_match_limit_depth=MATCH_LIMIT +fi -# The files below are C++ header files. -pcre_have_type_traits="0" -pcre_have_bits_type_traits="0" -if test "x$enable_cpp" = "xyes" -a -z "$CXX"; then - as_fn_error $? 
"Invalid C++ compiler or C++ compiler flags" "$LINENO" 5 +# Check whether --with-match-limit-recursion was given. +if test ${with_match_limit_recursion+y} +then : + withval=$with_match_limit_recursion; +else $as_nop + with_match_limit_recursion=UNSET fi -if test "x$enable_cpp" = "xyes" -a -n "$CXX" -then -ac_ext=cpp -ac_cpp='$CXXCPP $CPPFLAGS' -ac_compile='$CXX -c $CXXFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CXX -o conftest$ac_exeext $CXXFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_cxx_compiler_gnu - - -# Older versions of pcre defined pcrecpp::no_arg, but in new versions -# it's called pcrecpp::RE::no_arg. For backwards ABI compatibility, -# we want to make one an alias for the other. Different systems do -# this in different ways. Some systems, for instance, can do it via -# a linker flag: -alias (for os x 10.5) or -i (for os x <=10.4). -OLD_LDFLAGS="$LDFLAGS" -for flag in "-alias,__ZN7pcrecpp2RE6no_argE,__ZN7pcrecpp6no_argE" \ - "-i__ZN7pcrecpp6no_argE:__ZN7pcrecpp2RE6no_argE"; do - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for alias support in the linker" >&5 -$as_echo_n "checking for alias support in the linker... " >&6; } - LDFLAGS="$OLD_LDFLAGS -Wl,$flag" - # We try to run the linker with this new ld flag. If the link fails, - # we give up and remove the new flag from LDFLAGS. - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. */ -namespace pcrecpp { - class RE { static int no_arg; }; - int RE::no_arg; - } -int -main () -{ - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_link "$LINENO"; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; }; - EXTRA_LIBPCRECPP_LDFLAGS="$EXTRA_LIBPCRECPP_LDFLAGS -Wl,$flag"; - break; -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } +# Handle --enable-valgrind +# Check whether --enable-valgrind was given. 
+if test ${enable_valgrind+y} +then : + enableval=$enable_valgrind; +else $as_nop + enable_valgrind=no fi -rm -f core conftest.err conftest.$ac_objext \ - conftest$ac_exeext conftest.$ac_ext -done -LDFLAGS="$OLD_LDFLAGS" -# We could be more clever here, given we're doing AC_SUBST with this -# (eg set a var to be the name of the include file we want). But we're not -# so it's easy to change back to 'regular' autoconf vars if we needed to. -for ac_header in string -do : - ac_fn_cxx_check_header_mongrel "$LINENO" "string" "ac_cv_header_string" "$ac_includes_default" -if test "x$ac_cv_header_string" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_STRING 1 -_ACEOF - pcre_have_cpp_headers="1" -else - pcre_have_cpp_headers="0" +# Enable code coverage reports using gcov +# Check whether --enable-coverage was given. +if test ${enable_coverage+y} +then : + enableval=$enable_coverage; +else $as_nop + enable_coverage=no fi -done -for ac_header in bits/type_traits.h -do : - ac_fn_cxx_check_header_mongrel "$LINENO" "bits/type_traits.h" "ac_cv_header_bits_type_traits_h" "$ac_includes_default" -if test "x$ac_cv_header_bits_type_traits_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_BITS_TYPE_TRAITS_H 1 -_ACEOF - pcre_have_bits_type_traits="1" -else - pcre_have_bits_type_traits="0" +# Handle --enable-fuzz-support +# Check whether --enable-fuzz_support was given. +if test ${enable_fuzz_support+y} +then : + enableval=$enable_fuzz_support; +else $as_nop + enable_fuzz_support=no fi -done -for ac_header in type_traits.h -do : - ac_fn_cxx_check_header_mongrel "$LINENO" "type_traits.h" "ac_cv_header_type_traits_h" "$ac_includes_default" -if test "x$ac_cv_header_type_traits_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_TYPE_TRAITS_H 1 -_ACEOF - pcre_have_type_traits="1" -else - pcre_have_type_traits="0" +# Handle --disable-stack-for-recursion +# This option became obsolete at release 10.30. +# Check whether --enable-stack-for-recursion was given. 
+if test ${enable_stack_for_recursion+y} +then : + enableval=$enable_stack_for_recursion; +else $as_nop + enable_stack_for_recursion=yes fi -done +# Original code +# AC_ARG_ENABLE(stack-for-recursion, +# AS_HELP_STRING([--disable-stack-for-recursion], +# [don't use stack recursion when matching]), +# , enable_stack_for_recursion=yes) -# (This isn't c++-specific, but is only used in pcrecpp.cc, so try this -# in a c++ context. This matters becuase strtoimax is C99 and may not -# be supported by the C++ compiler.) -# Figure out how to create a longlong from a string: strtoll and -# equiv. It's not enough to call AC_CHECK_FUNCS: hpux has a -# strtoll, for instance, but it only takes 2 args instead of 3! -# We have to call AH_TEMPLATE since AC_DEFINE_UNQUOTED below is complex. +# Handle --disable-percent_zt (set as "auto" by default) +# Check whether --enable-percent-zt was given. +if test ${enable_percent_zt+y} +then : + enableval=$enable_percent_zt; +else $as_nop + enable_percent_zt=auto +fi +# Set the default value for pcre2-8 +if test "x$enable_pcre2_8" = "xunset" +then + enable_pcre2_8=yes +fi +# Set the default value for pcre2-16 +if test "x$enable_pcre2_16" = "xunset" +then + enable_pcre2_16=no +fi -have_strto_fn=0 -for fn in strtoq strtoll _strtoi64 strtoimax; do - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $fn" >&5 -$as_echo_n "checking for $fn... " >&6; } - if test "$fn" = strtoimax; then - include=stdint.h - else - include=stdlib.h - fi - cat confdefs.h - <<_ACEOF >conftest.$ac_ext -/* end confdefs.h. 
*/ -#include <$include> -int -main () -{ -char* e; return $fn("100", &e, 10) - ; - return 0; -} -_ACEOF -if ac_fn_cxx_try_compile "$LINENO"; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } +# Set the default value for pcre2-32 +if test "x$enable_pcre2_32" = "xunset" +then + enable_pcre2_32=no +fi -cat >>confdefs.h <<_ACEOF -#define HAVE_`echo $fn | tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ` 1 -_ACEOF +# Make sure at least one library is selected +if test "x$enable_pcre2_8$enable_pcre2_16$enable_pcre2_32" = "xnonono" +then + as_fn_error $? "At least one of the 8, 16 or 32 bit libraries must be enabled" "$LINENO" 5 +fi - have_strto_fn=1 - break -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } +# Unicode is enabled by default. +if test "x$enable_unicode" = "xunset" +then + enable_unicode=yes fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext -done -if test "$have_strto_fn" = 1; then - ac_fn_cxx_check_type "$LINENO" "long long" "ac_cv_type_long_long" "$ac_includes_default" -if test "x$ac_cv_type_long_long" = xyes; then : +# Convert the newline identifier into the appropriate integer value. These must +# agree with the PCRE2_NEWLINE_xxx values in pcre2.h. -cat >>confdefs.h <<_ACEOF -#define HAVE_LONG_LONG 1 -_ACEOF +case "$enable_newline" in + cr) ac_pcre2_newline_value=1 ;; + lf) ac_pcre2_newline_value=2 ;; + crlf) ac_pcre2_newline_value=3 ;; + any) ac_pcre2_newline_value=4 ;; + anycrlf) ac_pcre2_newline_value=5 ;; + nul) ac_pcre2_newline_value=6 ;; + *) + as_fn_error $? 
"invalid argument \"$enable_newline\" to --enable-newline option" "$LINENO" 5 + ;; +esac -pcre_have_long_long="1" -else - pcre_have_long_long="0" +# --enable-ebcdic-nl25 implies --enable-ebcdic +if test "x$enable_ebcdic_nl25" = "xyes"; then + enable_ebcdic=yes fi - ac_fn_cxx_check_type "$LINENO" "unsigned long long" "ac_cv_type_unsigned_long_long" "$ac_includes_default" -if test "x$ac_cv_type_unsigned_long_long" = xyes; then : +# Make sure that if enable_ebcdic is set, rebuild_chartables is also enabled. +# Also check that UTF support is not requested, because PCRE2 cannot handle +# EBCDIC and UTF in the same build. To do so it would need to use different +# character constants depending on the mode. Also, EBCDIC cannot be used with +# 16-bit and 32-bit libraries. +# +if test "x$enable_ebcdic" = "xyes"; then + enable_rebuild_chartables=yes + if test "x$enable_unicode" = "xyes"; then + as_fn_error $? "support for EBCDIC and Unicode cannot be enabled at the same time" "$LINENO" 5 + fi + if test "x$enable_pcre2_16" = "xyes" -o "x$enable_pcre2_32" = "xyes"; then + as_fn_error $? "EBCDIC support is available only for the 8-bit library" "$LINENO" 5 + fi +fi -cat >>confdefs.h <<_ACEOF -#define HAVE_UNSIGNED_LONG_LONG 1 -_ACEOF +# Check argument to --with-link-size +case "$with_link_size" in + 2|3|4) ;; + *) + as_fn_error $? "invalid argument \"$with_link_size\" to --with-link-size option" "$LINENO" 5 + ;; +esac -pcre_have_ulong_long="1" -else - pcre_have_ulong_long="0" -fi -else - pcre_have_long_long="0" - pcre_have_ulong_long="0" + +# Checks for header files. 
+ac_fn_c_check_header_compile "$LINENO" "limits.h" "ac_cv_header_limits_h" "$ac_includes_default" +if test "x$ac_cv_header_limits_h" = xyes +then : + printf "%s\n" "#define HAVE_LIMITS_H 1" >>confdefs.h + fi +ac_fn_c_check_header_compile "$LINENO" "sys/types.h" "ac_cv_header_sys_types_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_types_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_TYPES_H 1" >>confdefs.h +fi +ac_fn_c_check_header_compile "$LINENO" "sys/stat.h" "ac_cv_header_sys_stat_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_stat_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_STAT_H 1" >>confdefs.h +fi +ac_fn_c_check_header_compile "$LINENO" "dirent.h" "ac_cv_header_dirent_h" "$ac_includes_default" +if test "x$ac_cv_header_dirent_h" = xyes +then : + printf "%s\n" "#define HAVE_DIRENT_H 1" >>confdefs.h -ac_ext=c -ac_cpp='$CPP $CPPFLAGS' -ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' -ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' -ac_compiler_gnu=$ac_cv_c_compiler_gnu +fi + for ac_header in windows.h +do : + ac_fn_c_check_header_compile "$LINENO" "windows.h" "ac_cv_header_windows_h" "$ac_includes_default" +if test "x$ac_cv_header_windows_h" = xyes +then : + printf "%s\n" "#define HAVE_WINDOWS_H 1" >>confdefs.h + HAVE_WINDOWS_H=1 fi -# Using AC_SUBST eliminates the need to include config.h in a public .h file +done + for ac_header in sys/wait.h +do : + ac_fn_c_check_header_compile "$LINENO" "sys/wait.h" "ac_cv_header_sys_wait_h" "$ac_includes_default" +if test "x$ac_cv_header_sys_wait_h" = xyes +then : + printf "%s\n" "#define HAVE_SYS_WAIT_H 1" >>confdefs.h + HAVE_SYS_WAIT_H=1 +fi +done # Conditional compilation - if test "x$enable_pcre8" = "xyes"; then - WITH_PCRE8_TRUE= - WITH_PCRE8_FALSE='#' + if test "x$enable_pcre2_8" = "xyes"; then + WITH_PCRE2_8_TRUE= + WITH_PCRE2_8_FALSE='#' else - WITH_PCRE8_TRUE='#' - WITH_PCRE8_FALSE= + WITH_PCRE2_8_TRUE='#' + WITH_PCRE2_8_FALSE= fi - 
if test "x$enable_pcre16" = "xyes"; then - WITH_PCRE16_TRUE= - WITH_PCRE16_FALSE='#' + if test "x$enable_pcre2_16" = "xyes"; then + WITH_PCRE2_16_TRUE= + WITH_PCRE2_16_FALSE='#' else - WITH_PCRE16_TRUE='#' - WITH_PCRE16_FALSE= + WITH_PCRE2_16_TRUE='#' + WITH_PCRE2_16_FALSE= fi - if test "x$enable_pcre32" = "xyes"; then - WITH_PCRE32_TRUE= - WITH_PCRE32_FALSE='#' + if test "x$enable_pcre2_32" = "xyes"; then + WITH_PCRE2_32_TRUE= + WITH_PCRE2_32_FALSE='#' else - WITH_PCRE32_TRUE='#' - WITH_PCRE32_FALSE= + WITH_PCRE2_32_TRUE='#' + WITH_PCRE2_32_FALSE= fi - if test "x$enable_cpp" = "xyes"; then - WITH_PCRE_CPP_TRUE= - WITH_PCRE_CPP_FALSE='#' + if test "x$enable_debug" = "xyes"; then + WITH_DEBUG_TRUE= + WITH_DEBUG_FALSE='#' else - WITH_PCRE_CPP_TRUE='#' - WITH_PCRE_CPP_FALSE= + WITH_DEBUG_TRUE='#' + WITH_DEBUG_FALSE= fi if test "x$enable_rebuild_chartables" = "xyes"; then @@ -18500,12 +14274,12 @@ else WITH_JIT_FALSE= fi - if test "x$enable_utf" = "xyes"; then - WITH_UTF_TRUE= - WITH_UTF_FALSE='#' + if test "x$enable_unicode" = "xyes"; then + WITH_UNICODE_TRUE= + WITH_UNICODE_FALSE='#' else - WITH_UTF_TRUE='#' - WITH_UTF_FALSE= + WITH_UNICODE_TRUE='#' + WITH_UNICODE_FALSE= fi if test "x$enable_valgrind" = "xyes"; then @@ -18516,19 +14290,33 @@ else WITH_VALGRIND_FALSE= fi + if test "x$enable_fuzz_support" = "xyes"; then + WITH_FUZZ_SUPPORT_TRUE= + WITH_FUZZ_SUPPORT_FALSE='#' +else + WITH_FUZZ_SUPPORT_TRUE='#' + WITH_FUZZ_SUPPORT_FALSE= +fi + + +if test "$enable_fuzz_support" = "yes" -a "$enable_pcre2_8" = "no"; then + echo "** ERROR: Fuzzer support requires the 8-bit library" + exit 1 +fi # Checks for typedefs, structures, and compiler characteristics. -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for an ANSI C-conforming const" >&5 -$as_echo_n "checking for an ANSI C-conforming const... 
" >&6; } -if ${ac_cv_c_const+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for an ANSI C-conforming const" >&5 +printf %s "checking for an ANSI C-conforming const... " >&6; } +if test ${ac_cv_c_const+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ int -main () +main (void) { #ifndef __cplusplus @@ -18541,7 +14329,7 @@ main () /* NEC SVR4.0.2 mips cc rejects this. */ struct point {int x, y;}; static struct point const zero = {0,0}; - /* AIX XL C 1.02.0.0 rejects this. + /* IBM XL C 1.02.0.0 rejects this. It does not let you subtract one const X* pointer from another in an arm of an if-expression whose if-part is not a constant expression */ @@ -18569,7 +14357,7 @@ main () iptr p = 0; ++p; } - { /* AIX XL C 1.02.0.0 rejects this sort of thing, saying + { /* IBM XL C 1.02.0.0 rejects this sort of thing, saying "k.c", line 2.27: 1506-025 (S) Operand must be a modifiable lvalue. 
*/ struct s { int j; const int *ap[3]; } bx; struct s *b = &bx; b->j = 5; @@ -18585,67 +14373,91 @@ main () return 0; } _ACEOF -if ac_fn_c_try_compile "$LINENO"; then : +if ac_fn_c_try_compile "$LINENO" +then : ac_cv_c_const=yes -else +else $as_nop ac_cv_c_const=no fi -rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_const" >&5 -$as_echo "$ac_cv_c_const" >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_c_const" >&5 +printf "%s\n" "$ac_cv_c_const" >&6; } if test $ac_cv_c_const = no; then -$as_echo "#define const /**/" >>confdefs.h +printf "%s\n" "#define const /**/" >>confdefs.h fi ac_fn_c_check_type "$LINENO" "size_t" "ac_cv_type_size_t" "$ac_includes_default" -if test "x$ac_cv_type_size_t" = xyes; then : +if test "x$ac_cv_type_size_t" = xyes +then : -else +else $as_nop -cat >>confdefs.h <<_ACEOF -#define size_t unsigned int -_ACEOF +printf "%s\n" "#define size_t unsigned int" >>confdefs.h fi # Checks for library functions. 
-for ac_func in bcopy memmove strerror -do : - as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh` -ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var" -if eval test \"x\$"$as_ac_var"\" = x"yes"; then : - cat >>confdefs.h <<_ACEOF -#define `$as_echo "HAVE_$ac_func" | $as_tr_cpp` 1 -_ACEOF +ac_fn_c_check_func "$LINENO" "bcopy" "ac_cv_func_bcopy" +if test "x$ac_cv_func_bcopy" = xyes +then : + printf "%s\n" "#define HAVE_BCOPY 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "memfd_create" "ac_cv_func_memfd_create" +if test "x$ac_cv_func_memfd_create" = xyes +then : + printf "%s\n" "#define HAVE_MEMFD_CREATE 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "memmove" "ac_cv_func_memmove" +if test "x$ac_cv_func_memmove" = xyes +then : + printf "%s\n" "#define HAVE_MEMMOVE 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "mkostemp" "ac_cv_func_mkostemp" +if test "x$ac_cv_func_mkostemp" = xyes +then : + printf "%s\n" "#define HAVE_MKOSTEMP 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "secure_getenv" "ac_cv_func_secure_getenv" +if test "x$ac_cv_func_secure_getenv" = xyes +then : + printf "%s\n" "#define HAVE_SECURE_GETENV 1" >>confdefs.h + +fi +ac_fn_c_check_func "$LINENO" "strerror" "ac_cv_func_strerror" +if test "x$ac_cv_func_strerror" = xyes +then : + printf "%s\n" "#define HAVE_STRERROR 1" >>confdefs.h fi -done # Check for the availability of libz (aka zlib) -for ac_header in zlib.h + for ac_header in zlib.h do : - ac_fn_c_check_header_mongrel "$LINENO" "zlib.h" "ac_cv_header_zlib_h" "$ac_includes_default" -if test "x$ac_cv_header_zlib_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_ZLIB_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "zlib.h" "ac_cv_header_zlib_h" "$ac_includes_default" +if test "x$ac_cv_header_zlib_h" = xyes +then : + printf "%s\n" "#define HAVE_ZLIB_H 1" >>confdefs.h HAVE_ZLIB_H=1 fi done - -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for gzopen in -lz" >&5 -$as_echo_n "checking for gzopen in -lz... 
" >&6; } -if ${ac_cv_lib_z_gzopen+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for gzopen in -lz" >&5 +printf %s "checking for gzopen in -lz... " >&6; } +if test ${ac_cv_lib_z_gzopen+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lz $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18654,30 +14466,29 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif char gzopen (); int -main () +main (void) { return gzopen (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_z_gzopen=yes -else +else $as_nop ac_cv_lib_z_gzopen=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_z_gzopen" >&5 -$as_echo "$ac_cv_lib_z_gzopen" >&6; } -if test "x$ac_cv_lib_z_gzopen" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_z_gzopen" >&5 +printf "%s\n" "$ac_cv_lib_z_gzopen" >&6; } +if test "x$ac_cv_lib_z_gzopen" = xyes +then : HAVE_LIBZ=1 fi @@ -18692,31 +14503,29 @@ fi # therefore missing the function definition. # - The compiler thus generates a "C" signature for the test function. # - The linker fails to find the "C" function. -# - PCRE fails to configure if asked to do so against libbz2. +# - PCRE2 fails to configure if asked to do so against libbz2. # # Solution: # # - Replace the AC_CHECK_LIB test with a custom test. 
-for ac_header in bzlib.h + for ac_header in bzlib.h do : - ac_fn_c_check_header_mongrel "$LINENO" "bzlib.h" "ac_cv_header_bzlib_h" "$ac_includes_default" -if test "x$ac_cv_header_bzlib_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_BZLIB_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "bzlib.h" "ac_cv_header_bzlib_h" "$ac_includes_default" +if test "x$ac_cv_header_bzlib_h" = xyes +then : + printf "%s\n" "#define HAVE_BZLIB_H 1" >>confdefs.h HAVE_BZLIB_H=1 fi done - # Original test # AC_CHECK_LIB([bz2], [BZ2_bzopen], [HAVE_LIBBZ2=1]) # # Custom test follows -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libbz2" >&5 -$as_echo_n "checking for libbz2... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for libbz2" >&5 +printf %s "checking for libbz2... " >&6; } OLD_LIBS="$LIBS" LIBS="$LIBS -lbz2" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18726,56 +14535,54 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext #include #endif int -main () +main (void) { return (int)BZ2_bzopen("conftest", "rb"); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; };HAVE_LIBBZ2=1; break; -else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } -fi -rm -f core conftest.err conftest.$ac_objext \ +if ac_fn_c_try_link "$LINENO" +then : + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; };HAVE_LIBBZ2=1; break; +else $as_nop + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS="$OLD_LIBS" # Check for the availabiity of libreadline -if test "$enable_pcretest_libreadline" = "yes"; then - for ac_header in readline/readline.h +if test "$enable_pcre2test_libreadline" = "yes"; then + for ac_header in readline/readline.h do : - ac_fn_c_check_header_mongrel 
"$LINENO" "readline/readline.h" "ac_cv_header_readline_readline_h" "$ac_includes_default" -if test "x$ac_cv_header_readline_readline_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_READLINE_READLINE_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "readline/readline.h" "ac_cv_header_readline_readline_h" "$ac_includes_default" +if test "x$ac_cv_header_readline_readline_h" = xyes +then : + printf "%s\n" "#define HAVE_READLINE_READLINE_H 1" >>confdefs.h HAVE_READLINE_H=1 fi done - - for ac_header in readline/history.h + for ac_header in readline/history.h do : - ac_fn_c_check_header_mongrel "$LINENO" "readline/history.h" "ac_cv_header_readline_history_h" "$ac_includes_default" -if test "x$ac_cv_header_readline_history_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_READLINE_HISTORY_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "readline/history.h" "ac_cv_header_readline_history_h" "$ac_includes_default" +if test "x$ac_cv_header_readline_history_h" = xyes +then : + printf "%s\n" "#define HAVE_READLINE_HISTORY_H 1" >>confdefs.h HAVE_HISTORY_H=1 fi done - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 -$as_echo_n "checking for readline in -lreadline... " >&6; } -if ${ac_cv_lib_readline_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 +printf %s "checking for readline in -lreadline... " >&6; } +if test ${ac_cv_lib_readline_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lreadline $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18784,38 +14591,38 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_readline_readline=yes -else +else $as_nop ac_cv_lib_readline_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 -$as_echo "$ac_cv_lib_readline_readline" >&6; } -if test "x$ac_cv_lib_readline_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 +printf "%s\n" "$ac_cv_lib_readline_readline" >&6; } +if test "x$ac_cv_lib_readline_readline" = xyes +then : LIBREADLINE="-lreadline" -else +else $as_nop unset ac_cv_lib_readline_readline; - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 -$as_echo_n "checking for readline in -lreadline... " >&6; } -if ${ac_cv_lib_readline_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 +printf %s "checking for readline in -lreadline... " >&6; } +if test ${ac_cv_lib_readline_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lreadline -ltinfo $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18824,38 +14631,38 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_readline_readline=yes -else +else $as_nop ac_cv_lib_readline_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 -$as_echo "$ac_cv_lib_readline_readline" >&6; } -if test "x$ac_cv_lib_readline_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 +printf "%s\n" "$ac_cv_lib_readline_readline" >&6; } +if test "x$ac_cv_lib_readline_readline" = xyes +then : LIBREADLINE="-ltinfo" -else +else $as_nop unset ac_cv_lib_readline_readline; - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 -$as_echo_n "checking for readline in -lreadline... " >&6; } -if ${ac_cv_lib_readline_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 +printf %s "checking for readline in -lreadline... " >&6; } +if test ${ac_cv_lib_readline_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lreadline -lcurses $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18864,38 +14671,38 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_readline_readline=yes -else +else $as_nop ac_cv_lib_readline_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 -$as_echo "$ac_cv_lib_readline_readline" >&6; } -if test "x$ac_cv_lib_readline_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 +printf "%s\n" "$ac_cv_lib_readline_readline" >&6; } +if test "x$ac_cv_lib_readline_readline" = xyes +then : LIBREADLINE="-lcurses" -else +else $as_nop unset ac_cv_lib_readline_readline; - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 -$as_echo_n "checking for readline in -lreadline... " >&6; } -if ${ac_cv_lib_readline_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 +printf %s "checking for readline in -lreadline... " >&6; } +if test ${ac_cv_lib_readline_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lreadline -lncurses $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18904,38 +14711,38 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_readline_readline=yes -else +else $as_nop ac_cv_lib_readline_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 -$as_echo "$ac_cv_lib_readline_readline" >&6; } -if test "x$ac_cv_lib_readline_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 +printf "%s\n" "$ac_cv_lib_readline_readline" >&6; } +if test "x$ac_cv_lib_readline_readline" = xyes +then : LIBREADLINE="-lncurses" -else +else $as_nop unset ac_cv_lib_readline_readline; - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 -$as_echo_n "checking for readline in -lreadline... " >&6; } -if ${ac_cv_lib_readline_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 +printf %s "checking for readline in -lreadline... " >&6; } +if test ${ac_cv_lib_readline_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lreadline -lncursesw $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18944,38 +14751,38 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_readline_readline=yes -else +else $as_nop ac_cv_lib_readline_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 -$as_echo "$ac_cv_lib_readline_readline" >&6; } -if test "x$ac_cv_lib_readline_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 +printf "%s\n" "$ac_cv_lib_readline_readline" >&6; } +if test "x$ac_cv_lib_readline_readline" = xyes +then : LIBREADLINE="-lncursesw" -else +else $as_nop unset ac_cv_lib_readline_readline; - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 -$as_echo_n "checking for readline in -lreadline... " >&6; } -if ${ac_cv_lib_readline_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -lreadline" >&5 +printf %s "checking for readline in -lreadline... " >&6; } +if test ${ac_cv_lib_readline_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-lreadline -ltermcap $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -18984,32 +14791,31 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_readline_readline=yes -else +else $as_nop ac_cv_lib_readline_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 -$as_echo "$ac_cv_lib_readline_readline" >&6; } -if test "x$ac_cv_lib_readline_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_readline_readline" >&5 +printf "%s\n" "$ac_cv_lib_readline_readline" >&6; } +if test "x$ac_cv_lib_readline_readline" = xyes +then : LIBREADLINE="-ltermcap" -else +else $as_nop LIBREADLINE="" fi @@ -19036,50 +14842,45 @@ fi # Check for the availability of libedit. Different distributions put its # headers in different places. Try to cover the most common ones. 
-if test "$enable_pcretest_libedit" = "yes"; then - for ac_header in editline/readline.h +if test "$enable_pcre2test_libedit" = "yes"; then + for ac_header in editline/readline.h do : - ac_fn_c_check_header_mongrel "$LINENO" "editline/readline.h" "ac_cv_header_editline_readline_h" "$ac_includes_default" -if test "x$ac_cv_header_editline_readline_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_EDITLINE_READLINE_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "editline/readline.h" "ac_cv_header_editline_readline_h" "$ac_includes_default" +if test "x$ac_cv_header_editline_readline_h" = xyes +then : + printf "%s\n" "#define HAVE_EDITLINE_READLINE_H 1" >>confdefs.h HAVE_EDITLINE_READLINE_H=1 -else - for ac_header in edit/readline/readline.h +else $as_nop + for ac_header in edit/readline/readline.h do : - ac_fn_c_check_header_mongrel "$LINENO" "edit/readline/readline.h" "ac_cv_header_edit_readline_readline_h" "$ac_includes_default" -if test "x$ac_cv_header_edit_readline_readline_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_EDIT_READLINE_READLINE_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "edit/readline/readline.h" "ac_cv_header_edit_readline_readline_h" "$ac_includes_default" +if test "x$ac_cv_header_edit_readline_readline_h" = xyes +then : + printf "%s\n" "#define HAVE_EDIT_READLINE_READLINE_H 1" >>confdefs.h HAVE_READLINE_READLINE_H=1 -else - for ac_header in readline/readline.h +else $as_nop + for ac_header in readline/readline.h do : - ac_fn_c_check_header_mongrel "$LINENO" "readline/readline.h" "ac_cv_header_readline_readline_h" "$ac_includes_default" -if test "x$ac_cv_header_readline_readline_h" = xyes; then : - cat >>confdefs.h <<_ACEOF -#define HAVE_READLINE_READLINE_H 1 -_ACEOF + ac_fn_c_check_header_compile "$LINENO" "readline/readline.h" "ac_cv_header_readline_readline_h" "$ac_includes_default" +if test "x$ac_cv_header_readline_readline_h" = xyes +then : + printf "%s\n" "#define HAVE_READLINE_READLINE_H 1" >>confdefs.h 
HAVE_READLINE_READLINE_H=1 fi done - fi done - fi done - - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for readline in -ledit" >&5 -$as_echo_n "checking for readline in -ledit... " >&6; } -if ${ac_cv_lib_edit_readline+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for readline in -ledit" >&5 +printf %s "checking for readline in -ledit... " >&6; } +if test ${ac_cv_lib_edit_readline+y} +then : + printf %s "(cached) " >&6 +else $as_nop ac_check_lib_save_LIBS=$LIBS LIBS="-ledit $LIBS" cat confdefs.h - <<_ACEOF >conftest.$ac_ext @@ -19088,64 +14889,75 @@ cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. */ -#ifdef __cplusplus -extern "C" -#endif char readline (); int -main () +main (void) { return readline (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ac_cv_lib_edit_readline=yes -else +else $as_nop ac_cv_lib_edit_readline=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS=$ac_check_lib_save_LIBS fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_edit_readline" >&5 -$as_echo "$ac_cv_lib_edit_readline" >&6; } -if test "x$ac_cv_lib_edit_readline" = xyes; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_edit_readline" >&5 +printf "%s\n" "$ac_cv_lib_edit_readline" >&6; } +if test "x$ac_cv_lib_edit_readline" = xyes +then : LIBEDIT="-ledit" fi fi -# This facilitates -ansi builds under Linux - -PCRE_STATIC_CFLAG="" +PCRE2_STATIC_CFLAG="" if test "x$enable_shared" = "xno" ; then -$as_echo "#define PCRE_STATIC 1" >>confdefs.h +printf "%s\n" "#define PCRE2_STATIC 1" >>confdefs.h + + PCRE2_STATIC_CFLAG="-DPCRE2_STATIC" +fi + + +# Here is where 
PCRE2-specific defines are handled + +if test "$enable_pcre2_8" = "yes"; then + +printf "%s\n" "#define SUPPORT_PCRE2_8 /**/" >>confdefs.h - PCRE_STATIC_CFLAG="-DPCRE_STATIC" fi +if test "$enable_pcre2_16" = "yes"; then -# Here is where pcre specific defines are handled +printf "%s\n" "#define SUPPORT_PCRE2_16 /**/" >>confdefs.h -if test "$enable_pcre8" = "yes"; then +fi + +if test "$enable_pcre2_32" = "yes"; then -$as_echo "#define SUPPORT_PCRE8 /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_PCRE2_32 /**/" >>confdefs.h fi -if test "$enable_pcre16" = "yes"; then +if test "$enable_debug" = "yes"; then -$as_echo "#define SUPPORT_PCRE16 /**/" >>confdefs.h +printf "%s\n" "#define PCRE2_DEBUG /**/" >>confdefs.h fi -if test "$enable_pcre32" = "yes"; then +if test "$enable_percent_zt" = "no"; then -$as_echo "#define SUPPORT_PCRE32 /**/" >>confdefs.h +printf "%s\n" "#define DISABLE_PERCENT_ZT /**/" >>confdefs.h +else + enable_percent_zt=auto fi # Unless running under Windows, JIT support requires pthreads. @@ -19174,33 +14986,31 @@ if test x"$PTHREAD_LIBS$PTHREAD_CFLAGS" != x; then CFLAGS="$CFLAGS $PTHREAD_CFLAGS" save_LIBS="$LIBS" LIBS="$PTHREAD_LIBS $LIBS" - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS" >&5 -$as_echo_n "checking for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS" >&5 +printf %s "checking for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS... " >&6; } cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ /* Override any GCC internal prototype to avoid an error. Use char because int might match the return type of a GCC builtin and then its argument prototype would still apply. 
*/ -#ifdef __cplusplus -extern "C" -#endif char pthread_join (); int -main () +main (void) { return pthread_join (); ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ax_pthread_ok=yes fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ax_pthread_ok" >&5 -$as_echo "$ax_pthread_ok" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_pthread_ok" >&5 +printf "%s\n" "$ax_pthread_ok" >&6; } if test x"$ax_pthread_ok" = xno; then PTHREAD_LIBS="" PTHREAD_CFLAGS="" @@ -19265,24 +15075,25 @@ for flag in $ax_pthread_flags; do case $flag in none) - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether pthreads work without any flags" >&5 -$as_echo_n "checking whether pthreads work without any flags... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether pthreads work without any flags" >&5 +printf %s "checking whether pthreads work without any flags... " >&6; } ;; -*) - { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether pthreads work with $flag" >&5 -$as_echo_n "checking whether pthreads work with $flag... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether pthreads work with $flag" >&5 +printf %s "checking whether pthreads work with $flag... " >&6; } PTHREAD_CFLAGS="$flag" ;; pthread-config) # Extract the first word of "pthread-config", so it can be a program name with args. set dummy pthread-config; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_ax_pthread_config+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... 
" >&6; } +if test ${ac_cv_prog_ax_pthread_config+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$ax_pthread_config"; then ac_cv_prog_ax_pthread_config="$ax_pthread_config" # Let the user override the test. else @@ -19290,11 +15101,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_ax_pthread_config="yes" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -19306,11 +15121,11 @@ fi fi ax_pthread_config=$ac_cv_prog_ax_pthread_config if test -n "$ax_pthread_config"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ax_pthread_config" >&5 -$as_echo "$ax_pthread_config" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_pthread_config" >&5 +printf "%s\n" "$ax_pthread_config" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -19320,8 +15135,8 @@ fi ;; *) - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for the pthreads library -l$flag" >&5 -$as_echo_n "checking for the pthreads library -l$flag... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for the pthreads library -l$flag" >&5 +printf %s "checking for the pthreads library -l$flag... " >&6; } PTHREAD_LIBS="-l$flag" ;; esac @@ -19346,7 +15161,7 @@ $as_echo_n "checking for the pthreads library -l$flag... 
" >&6; } static void routine(void *a) { a = 0; } static void *start_routine(void *a) { return a; } int -main () +main (void) { pthread_t th; pthread_attr_t attr; pthread_create(&th, 0, start_routine, 0); @@ -19358,17 +15173,18 @@ pthread_t th; pthread_attr_t attr; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ax_pthread_ok=yes fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext LIBS="$save_LIBS" CFLAGS="$save_CFLAGS" - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ax_pthread_ok" >&5 -$as_echo "$ax_pthread_ok" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_pthread_ok" >&5 +printf "%s\n" "$ax_pthread_ok" >&6; } if test "x$ax_pthread_ok" = xyes; then break; fi @@ -19386,39 +15202,38 @@ if test "x$ax_pthread_ok" = xyes; then CFLAGS="$CFLAGS $PTHREAD_CFLAGS" # Detect AIX lossage: JOINABLE attribute is called UNDETACHED. - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for joinable pthread attribute" >&5 -$as_echo_n "checking for joinable pthread attribute... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for joinable pthread attribute" >&5 +printf %s "checking for joinable pthread attribute... " >&6; } attr_name=unknown for attr in PTHREAD_CREATE_JOINABLE PTHREAD_CREATE_UNDETACHED; do cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. 
*/ #include <pthread.h> int -main () +main (void) { int attr = $attr; return attr /* ; */ ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : attr_name=$attr; break fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext done - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $attr_name" >&5 -$as_echo "$attr_name" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $attr_name" >&5 +printf "%s\n" "$attr_name" >&6; } if test "$attr_name" != PTHREAD_CREATE_JOINABLE; then -cat >>confdefs.h <<_ACEOF -#define PTHREAD_CREATE_JOINABLE $attr_name -_ACEOF +printf "%s\n" "#define PTHREAD_CREATE_JOINABLE $attr_name" >>confdefs.h fi - { $as_echo "$as_me:${as_lineno-$LINENO}: checking if more special flags are required for pthreads" >&5 -$as_echo_n "checking if more special flags are required for pthreads... " >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking if more special flags are required for pthreads" >&5 +printf %s "checking if more special flags are required for pthreads... " >&6; } flag=no case ${host_os} in aix* | freebsd* | darwin*) flag="-D_THREAD_SAFE";; @@ -19431,44 +15246,47 @@ $as_echo_n "checking if more special flags are required for pthreads... " >&6; } fi ;; esac - { $as_echo "$as_me:${as_lineno-$LINENO}: result: ${flag}" >&5 -$as_echo "${flag}" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: ${flag}" >&5 +printf "%s\n" "${flag}" >&6; } if test "x$flag" != xno; then PTHREAD_CFLAGS="$flag $PTHREAD_CFLAGS" fi - { $as_echo "$as_me:${as_lineno-$LINENO}: checking for PTHREAD_PRIO_INHERIT" >&5 -$as_echo_n "checking for PTHREAD_PRIO_INHERIT... " >&6; } -if ${ax_cv_PTHREAD_PRIO_INHERIT+:} false; then : - $as_echo_n "(cached) " >&6 -else + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for PTHREAD_PRIO_INHERIT" >&5 +printf %s "checking for PTHREAD_PRIO_INHERIT... 
" >&6; } +if test ${ax_cv_PTHREAD_PRIO_INHERIT+y} +then : + printf %s "(cached) " >&6 +else $as_nop cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ #include <pthread.h> int -main () +main (void) { int i = PTHREAD_PRIO_INHERIT; ; return 0; } _ACEOF -if ac_fn_c_try_link "$LINENO"; then : +if ac_fn_c_try_link "$LINENO" +then : ax_cv_PTHREAD_PRIO_INHERIT=yes -else +else $as_nop ax_cv_PTHREAD_PRIO_INHERIT=no fi -rm -f core conftest.err conftest.$ac_objext \ +rm -f core conftest.err conftest.$ac_objext conftest.beam \ conftest$ac_exeext conftest.$ac_ext fi -{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_PRIO_INHERIT" >&5 -$as_echo "$ax_cv_PTHREAD_PRIO_INHERIT" >&6; } - if test "x$ax_cv_PTHREAD_PRIO_INHERIT" = "xyes"; then : +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ax_cv_PTHREAD_PRIO_INHERIT" >&5 +printf "%s\n" "$ax_cv_PTHREAD_PRIO_INHERIT" >&6; } + if test "x$ax_cv_PTHREAD_PRIO_INHERIT" = "xyes" +then : -$as_echo "#define HAVE_PTHREAD_PRIO_INHERIT 1" >>confdefs.h +printf "%s\n" "#define HAVE_PTHREAD_PRIO_INHERIT 1" >>confdefs.h fi @@ -19481,11 +15299,12 @@ fi do # Extract the first word of "$ac_prog", so it can be a program name with args. set dummy $ac_prog; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_prog_PTHREAD_CC+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_prog_PTHREAD_CC+y} +then : + printf %s "(cached) " >&6 +else $as_nop if test -n "$PTHREAD_CC"; then ac_cv_prog_PTHREAD_CC="$PTHREAD_CC" # Let the user override the test. else @@ -19493,11 +15312,15 @@ as_save_IFS=$IFS; IFS=$PATH_SEPARATOR for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then ac_cv_prog_PTHREAD_CC="$ac_prog" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -19508,11 +15331,11 @@ fi fi PTHREAD_CC=$ac_cv_prog_PTHREAD_CC if test -n "$PTHREAD_CC"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $PTHREAD_CC" >&5 -$as_echo "$PTHREAD_CC" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $PTHREAD_CC" >&5 +printf "%s\n" "$PTHREAD_CC" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -19534,7 +15357,7 @@ fi # Finally, execute ACTION-IF-FOUND/ACTION-IF-NOT-FOUND: if test x"$ax_pthread_ok" = xyes; then -$as_echo "#define HAVE_PTHREAD 1" >>confdefs.h +printf "%s\n" "#define HAVE_PTHREAD 1" >>confdefs.h : else @@ -19553,148 +15376,173 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu LIBS="$PTHREAD_LIBS $LIBS" fi -$as_echo "#define SUPPORT_JIT /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_JIT /**/" >>confdefs.h else - enable_pcregrep_jit="no" + enable_pcre2grep_jit="no" fi -if test "$enable_pcregrep_jit" = "yes"; then +if test "$enable_jit_sealloc" = "yes"; then -$as_echo "#define SUPPORT_PCREGREP_JIT /**/" >>confdefs.h +printf "%s\n" "#define SLJIT_PROT_EXECUTABLE_ALLOCATOR 1" >>confdefs.h fi -if test "$enable_utf" = "yes"; then +if test "$enable_pcre2grep_jit" = "yes"; then -$as_echo "#define SUPPORT_UTF /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_PCRE2GREP_JIT /**/" >>confdefs.h fi -if test "$enable_unicode_properties" = "yes"; then +if test "$enable_pcre2grep_callout" = "yes"; then + if test 
"$enable_pcre2grep_callout_fork" = "yes"; then + if test "$HAVE_WINDOWS_H" != "1"; then + if test "$HAVE_SYS_WAIT_H" != "1"; then + as_fn_error $? "Callout script support needs sys/wait.h." "$LINENO" 5 + fi + fi + +printf "%s\n" "#define SUPPORT_PCRE2GREP_CALLOUT_FORK /**/" >>confdefs.h -$as_echo "#define SUPPORT_UCP /**/" >>confdefs.h + fi + +printf "%s\n" "#define SUPPORT_PCRE2GREP_CALLOUT /**/" >>confdefs.h +else + enable_pcre2grep_callout_fork="no" fi -if test "$enable_stack_for_recursion" = "no"; then +if test "$enable_unicode" = "yes"; then -$as_echo "#define NO_RECURSE /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_UNICODE /**/" >>confdefs.h fi -if test "$enable_pcregrep_libz" = "yes"; then +if test "$enable_pcre2grep_libz" = "yes"; then -$as_echo "#define SUPPORT_LIBZ /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_LIBZ /**/" >>confdefs.h fi -if test "$enable_pcregrep_libbz2" = "yes"; then +if test "$enable_pcre2grep_libbz2" = "yes"; then -$as_echo "#define SUPPORT_LIBBZ2 /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_LIBBZ2 /**/" >>confdefs.h fi -if test $with_pcregrep_bufsize -lt 8192 ; then - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $with_pcregrep_bufsize is too small for --with-pcregrep-bufsize; using 8192" >&5 -$as_echo "$as_me: WARNING: $with_pcregrep_bufsize is too small for --with-pcregrep-bufsize; using 8192" >&2;} - with_pcregrep_bufsize="8192" +if test $with_pcre2grep_bufsize -lt 8192 ; then + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $with_pcre2grep_bufsize is too small for --with-pcre2grep-bufsize; using 8192" >&5 +printf "%s\n" "$as_me: WARNING: $with_pcre2grep_bufsize is too small for --with-pcre2grep-bufsize; using 8192" >&2;} + with_pcre2grep_bufsize="8192" +else + if test $? -gt 1 ; then + as_fn_error $? "Bad value for --with-pcre2grep-bufsize" "$LINENO" 5 + fi +fi + +if test $with_pcre2grep_max_bufsize -lt $with_pcre2grep_bufsize ; then + with_pcre2grep_max_bufsize="$with_pcre2grep_bufsize" else if test $? 
-gt 1 ; then - as_fn_error $? "Bad value for --with-pcregrep-bufsize" "$LINENO" 5 + as_fn_error $? "Bad value for --with-pcre2grep-max-bufsize" "$LINENO" 5 fi fi -cat >>confdefs.h <<_ACEOF -#define PCREGREP_BUFSIZE $with_pcregrep_bufsize -_ACEOF +printf "%s\n" "#define PCRE2GREP_BUFSIZE $with_pcre2grep_bufsize" >>confdefs.h + + +printf "%s\n" "#define PCRE2GREP_MAX_BUFSIZE $with_pcre2grep_max_bufsize" >>confdefs.h -if test "$enable_pcretest_libedit" = "yes"; then -$as_echo "#define SUPPORT_LIBEDIT /**/" >>confdefs.h +if test "$enable_pcre2test_libedit" = "yes"; then + +printf "%s\n" "#define SUPPORT_LIBEDIT /**/" >>confdefs.h LIBREADLINE="$LIBEDIT" -elif test "$enable_pcretest_libreadline" = "yes"; then +elif test "$enable_pcre2test_libreadline" = "yes"; then -$as_echo "#define SUPPORT_LIBREADLINE /**/" >>confdefs.h +printf "%s\n" "#define SUPPORT_LIBREADLINE /**/" >>confdefs.h fi -cat >>confdefs.h <<_ACEOF -#define NEWLINE $ac_pcre_newline_value -_ACEOF +printf "%s\n" "#define NEWLINE_DEFAULT $ac_pcre2_newline_value" >>confdefs.h if test "$enable_bsr_anycrlf" = "yes"; then -$as_echo "#define BSR_ANYCRLF /**/" >>confdefs.h +printf "%s\n" "#define BSR_ANYCRLF /**/" >>confdefs.h fi +if test "$enable_never_backslash_C" = "yes"; then -cat >>confdefs.h <<_ACEOF -#define LINK_SIZE $with_link_size -_ACEOF +printf "%s\n" "#define NEVER_BACKSLASH_C /**/" >>confdefs.h + +fi +printf "%s\n" "#define LINK_SIZE $with_link_size" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define POSIX_MALLOC_THRESHOLD $with_posix_malloc_threshold -_ACEOF +printf "%s\n" "#define PARENS_NEST_LIMIT $with_parens_nest_limit" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define PARENS_NEST_LIMIT $with_parens_nest_limit -_ACEOF +printf "%s\n" "#define MATCH_LIMIT $with_match_limit" >>confdefs.h -cat >>confdefs.h <<_ACEOF -#define MATCH_LIMIT $with_match_limit -_ACEOF +# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth +if test "$with_match_limit_recursion" != "UNSET"; then +cat 
<<EOF -cat >>confdefs.h <<_ACEOF -#define MATCH_LIMIT_RECURSION $with_match_limit_recursion -_ACEOF +WARNING: --with-match-limit-recursion is an obsolete option. Please use + --with-match-limit-depth in future. If both are set, --with-match-limit-depth + will be used. See also --with-heap-limit. + +EOF +if test "$with_match_limit_depth" = "MATCH_LIMIT"; then + with_match_limit_depth=$with_match_limit_recursion +fi +fi + + +printf "%s\n" "#define MATCH_LIMIT_DEPTH $with_match_limit_depth" >>confdefs.h -$as_echo "#define MAX_NAME_SIZE 32" >>confdefs.h +printf "%s\n" "#define HEAP_LIMIT $with_heap_limit" >>confdefs.h -$as_echo "#define MAX_NAME_COUNT 10000" >>confdefs.h +printf "%s\n" "#define MAX_NAME_SIZE 32" >>confdefs.h + + + +printf "%s\n" "#define MAX_NAME_COUNT 10000" >>confdefs.h if test "$enable_ebcdic" = "yes"; then -cat >>confdefs.h <<_ACEOF -#define EBCDIC /**/ -_ACEOF +printf "%s\n" "#define EBCDIC /**/" >>confdefs.h fi if test "$enable_ebcdic_nl25" = "yes"; then -cat >>confdefs.h <<_ACEOF -#define EBCDIC_NL25 /**/ -_ACEOF +printf "%s\n" "#define EBCDIC_NL25 /**/" >>confdefs.h fi if test "$enable_valgrind" = "yes"; then -cat >>confdefs.h <<_ACEOF -#define SUPPORT_VALGRIND /**/ -_ACEOF +printf "%s\n" "#define SUPPORT_VALGRIND /**/" >>confdefs.h fi @@ -19710,25 +15558,20 @@ case $host_os in ;; esac -# The extra LDFLAGS for each particular library -# (Note: The libpcre*_version bits are m4 variables, assigned above) - -EXTRA_LIBPCRE_LDFLAGS="$EXTRA_LIBPCRE_LDFLAGS \ - $NO_UNDEFINED -version-info 3:11:2" - -EXTRA_LIBPCRE16_LDFLAGS="$EXTRA_LIBPCRE16_LDFLAGS \ - $NO_UNDEFINED -version-info 2:11:2" +# The extra LDFLAGS for each particular library. The libpcre2*_version values +# are m4 variables, assigned above. 
-EXTRA_LIBPCRE32_LDFLAGS="$EXTRA_LIBPCRE32_LDFLAGS \ - $NO_UNDEFINED -version-info 0:11:0" +EXTRA_LIBPCRE2_8_LDFLAGS="$EXTRA_LIBPCRE2_8_LDFLAGS \ + $NO_UNDEFINED -version-info 10:2:10" -EXTRA_LIBPCREPOSIX_LDFLAGS="$EXTRA_LIBPCREPOSIX_LDFLAGS \ - $NO_UNDEFINED -version-info 0:6:0" +EXTRA_LIBPCRE2_16_LDFLAGS="$EXTRA_LIBPCRE2_16_LDFLAGS \ + $NO_UNDEFINED -version-info 10:2:10" -EXTRA_LIBPCRECPP_LDFLAGS="$EXTRA_LIBPCRECPP_LDFLAGS \ - $NO_UNDEFINED -version-info 0:1:0 \ - $EXPORT_ALL_SYMBOLS" +EXTRA_LIBPCRE2_32_LDFLAGS="$EXTRA_LIBPCRE2_32_LDFLAGS \ + $NO_UNDEFINED -version-info 10:2:10" +EXTRA_LIBPCRE2_POSIX_LDFLAGS="$EXTRA_LIBPCRE2_POSIX_LDFLAGS \ + $NO_UNDEFINED -version-info 3:0:0" @@ -19737,68 +15580,68 @@ EXTRA_LIBPCRECPP_LDFLAGS="$EXTRA_LIBPCRECPP_LDFLAGS \ # When we run 'make distcheck', use these arguments. Turning off compiler # optimization makes it run faster. -DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre16 --enable-pcre32 --enable-jit --enable-cpp --enable-unicode-properties" +DISTCHECK_CONFIGURE_FLAGS="CFLAGS='' CXXFLAGS='' --enable-pcre2-16 --enable-pcre2-32 --enable-jit" -# Check that, if --enable-pcregrep-libz or --enable-pcregrep-libbz2 is +# Check that, if --enable-pcre2grep-libz or --enable-pcre2grep-libbz2 is # specified, the relevant library is available. 
-if test "$enable_pcregrep_libz" = "yes"; then +if test "$enable_pcre2grep_libz" = "yes"; then if test "$HAVE_ZLIB_H" != "1"; then - echo "** Cannot --enable-pcregrep-libz because zlib.h was not found" + echo "** Cannot --enable-pcre2grep-libz because zlib.h was not found" exit 1 fi if test "$HAVE_LIBZ" != "1"; then - echo "** Cannot --enable-pcregrep-libz because libz was not found" + echo "** Cannot --enable-pcre2grep-libz because libz was not found" exit 1 fi LIBZ="-lz" fi -if test "$enable_pcregrep_libbz2" = "yes"; then +if test "$enable_pcre2grep_libbz2" = "yes"; then if test "$HAVE_BZLIB_H" != "1"; then - echo "** Cannot --enable-pcregrep-libbz2 because bzlib.h was not found" + echo "** Cannot --enable-pcre2grep-libbz2 because bzlib.h was not found" exit 1 fi if test "$HAVE_LIBBZ2" != "1"; then - echo "** Cannot --enable-pcregrep-libbz2 because libbz2 was not found" + echo "** Cannot --enable-pcre2grep-libbz2 because libbz2 was not found" exit 1 fi LIBBZ2="-lbz2" fi -# Similarly for --enable-pcretest-readline +# Similarly for --enable-pcre2test-readline -if test "$enable_pcretest_libedit" = "yes"; then - if test "$enable_pcretest_libreadline" = "yes"; then - echo "** Cannot use both --enable-pcretest-libedit and --enable-pcretest-readline" +if test "$enable_pcre2test_libedit" = "yes"; then + if test "$enable_pcre2test_libreadline" = "yes"; then + echo "** Cannot use both --enable-pcre2test-libedit and --enable-pcre2test-readline" exit 1 fi if test "$HAVE_EDITLINE_READLINE_H" != "1" -a \ "$HAVE_READLINE_READLINE_H" != "1"; then - echo "** Cannot --enable-pcretest-libedit because neither editline/readline.h" + echo "** Cannot --enable-pcre2test-libedit because neither editline/readline.h" echo "** nor readline/readline.h was found." exit 1 fi if test -z "$LIBEDIT"; then - echo "** Cannot --enable-pcretest-libedit because libedit library was not found." + echo "** Cannot --enable-pcre2test-libedit because libedit library was not found." 
exit 1 fi fi -if test "$enable_pcretest_libreadline" = "yes"; then +if test "$enable_pcre2test_libreadline" = "yes"; then if test "$HAVE_READLINE_H" != "1"; then - echo "** Cannot --enable-pcretest-readline because readline/readline.h was not found." + echo "** Cannot --enable-pcre2test-readline because readline/readline.h was not found." exit 1 fi if test "$HAVE_HISTORY_H" != "1"; then - echo "** Cannot --enable-pcretest-readline because readline/history.h was not found." + echo "** Cannot --enable-pcre2test-readline because readline/history.h was not found." exit 1 fi if test -z "$LIBREADLINE"; then - echo "** Cannot --enable-pcretest-readline because readline library was not found." + echo "** Cannot --enable-pcre2test-readline because readline library was not found." exit 1 fi fi @@ -19817,11 +15660,12 @@ if test "x$ac_cv_env_PKG_CONFIG_set" != "xset"; then if test -n "$ac_tool_prefix"; then # Extract the first word of "${ac_tool_prefix}pkg-config", so it can be a program name with args. set dummy ${ac_tool_prefix}pkg-config; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_path_PKG_CONFIG+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_path_PKG_CONFIG+y} +then : + printf %s "(cached) " >&6 +else $as_nop case $PKG_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_PKG_CONFIG="$PKG_CONFIG" # Let the user override the test with a path. @@ -19831,11 +15675,15 @@ else for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_path_PKG_CONFIG="$as_dir/$ac_word$ac_exec_ext" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_path_PKG_CONFIG="$as_dir$ac_word$ac_exec_ext" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -19847,11 +15695,11 @@ esac fi PKG_CONFIG=$ac_cv_path_PKG_CONFIG if test -n "$PKG_CONFIG"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $PKG_CONFIG" >&5 -$as_echo "$PKG_CONFIG" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $PKG_CONFIG" >&5 +printf "%s\n" "$PKG_CONFIG" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -19860,11 +15708,12 @@ if test -z "$ac_cv_path_PKG_CONFIG"; then ac_pt_PKG_CONFIG=$PKG_CONFIG # Extract the first word of "pkg-config", so it can be a program name with args. set dummy pkg-config; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_path_ac_pt_PKG_CONFIG+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_path_ac_pt_PKG_CONFIG+y} +then : + printf %s "(cached) " >&6 +else $as_nop case $ac_pt_PKG_CONFIG in [\\/]* | ?:[\\/]*) ac_cv_path_ac_pt_PKG_CONFIG="$ac_pt_PKG_CONFIG" # Let the user override the test with a path. @@ -19874,11 +15723,15 @@ else for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_path_ac_pt_PKG_CONFIG="$as_dir/$ac_word$ac_exec_ext" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_path_ac_pt_PKG_CONFIG="$as_dir$ac_word$ac_exec_ext" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -19890,11 +15743,11 @@ esac fi ac_pt_PKG_CONFIG=$ac_cv_path_ac_pt_PKG_CONFIG if test -n "$ac_pt_PKG_CONFIG"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_pt_PKG_CONFIG" >&5 -$as_echo "$ac_pt_PKG_CONFIG" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $ac_pt_PKG_CONFIG" >&5 +printf "%s\n" "$ac_pt_PKG_CONFIG" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi if test "x$ac_pt_PKG_CONFIG" = x; then @@ -19902,8 +15755,8 @@ fi else case $cross_compiling:$ac_tool_warned in yes:) -{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 -$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5 +printf "%s\n" "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;} ac_tool_warned=yes ;; esac PKG_CONFIG=$ac_pt_PKG_CONFIG @@ -19915,30 +15768,30 @@ fi fi if test -n "$PKG_CONFIG"; then _pkg_min_version=0.9.0 - { $as_echo "$as_me:${as_lineno-$LINENO}: checking pkg-config is at least version $_pkg_min_version" >&5 -$as_echo_n "checking pkg-config is at least version $_pkg_min_version... 
" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking pkg-config is at least version $_pkg_min_version" >&5 +printf %s "checking pkg-config is at least version $_pkg_min_version... " >&6; } if $PKG_CONFIG --atleast-pkgconfig-version $_pkg_min_version; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } PKG_CONFIG="" fi fi pkg_failed=no -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for VALGRIND" >&5 -$as_echo_n "checking for VALGRIND... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for VALGRIND" >&5 +printf %s "checking for VALGRIND... " >&6; } if test -n "$VALGRIND_CFLAGS"; then pkg_cv_VALGRIND_CFLAGS="$VALGRIND_CFLAGS" elif test -n "$PKG_CONFIG"; then if test -n "$PKG_CONFIG" && \ - { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"valgrind\""; } >&5 + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"valgrind\""; } >&5 ($PKG_CONFIG --exists --print-errors "valgrind") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then pkg_cv_VALGRIND_CFLAGS=`$PKG_CONFIG --cflags "valgrind" 2>/dev/null` test "x$?" 
!= "x0" && pkg_failed=yes @@ -19952,10 +15805,10 @@ if test -n "$VALGRIND_LIBS"; then pkg_cv_VALGRIND_LIBS="$VALGRIND_LIBS" elif test -n "$PKG_CONFIG"; then if test -n "$PKG_CONFIG" && \ - { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"valgrind\""; } >&5 + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"valgrind\""; } >&5 ($PKG_CONFIG --exists --print-errors "valgrind") 2>&5 ac_status=$? - $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5 test $ac_status = 0; }; then pkg_cv_VALGRIND_LIBS=`$PKG_CONFIG --libs "valgrind" 2>/dev/null` test "x$?" != "x0" && pkg_failed=yes @@ -19969,8 +15822,8 @@ fi if test $pkg_failed = yes; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then _pkg_short_errors_supported=yes @@ -19996,10 +15849,10 @@ Alternatively, you may set the environment variables VALGRIND_CFLAGS and VALGRIND_LIBS to avoid the need to call pkg-config. See the pkg-config man page for more details." "$LINENO" 5 elif test $pkg_failed = untried; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it is in your PATH or set the PKG_CONFIG environment variable to the full path to pkg-config. 
@@ -20013,8 +15866,8 @@ See \`config.log' for more details" "$LINENO" 5; } else VALGRIND_CFLAGS=$pkg_cv_VALGRIND_CFLAGS VALGRIND_LIBS=$pkg_cv_VALGRIND_LIBS - { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 -$as_echo "yes" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: yes" >&5 +printf "%s\n" "yes" >&6; } fi fi @@ -20028,11 +15881,12 @@ if test "$enable_coverage" = "yes"; then # ccache is incompatible with gcov # Extract the first word of "shtool", so it can be a program name with args. set dummy shtool; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_path_SHTOOL+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_path_SHTOOL+y} +then : + printf %s "(cached) " >&6 +else $as_nop case $SHTOOL in [\\/]* | ?:[\\/]*) ac_cv_path_SHTOOL="$SHTOOL" # Let the user override the test with a path. @@ -20042,11 +15896,15 @@ else for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_path_SHTOOL="$as_dir/$ac_word$ac_exec_ext" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_path_SHTOOL="$as_dir$ac_word$ac_exec_ext" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -20059,11 +15917,11 @@ esac fi SHTOOL=$ac_cv_path_SHTOOL if test -n "$SHTOOL"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $SHTOOL" >&5 -$as_echo "$SHTOOL" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $SHTOOL" >&5 +printf "%s\n" "$SHTOOL" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -20081,11 +15939,12 @@ fi # Extract the first word of "lcov", so it can be a program name with args. set dummy lcov; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_path_LCOV+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_path_LCOV+y} +then : + printf %s "(cached) " >&6 +else $as_nop case $LCOV in [\\/]* | ?:[\\/]*) ac_cv_path_LCOV="$LCOV" # Let the user override the test with a path. @@ -20095,11 +15954,15 @@ else for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_path_LCOV="$as_dir/$ac_word$ac_exec_ext" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_path_LCOV="$as_dir$ac_word$ac_exec_ext" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -20112,11 +15975,11 @@ esac fi LCOV=$ac_cv_path_LCOV if test -n "$LCOV"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $LCOV" >&5 -$as_echo "$LCOV" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $LCOV" >&5 +printf "%s\n" "$LCOV" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -20127,11 +15990,12 @@ fi # Extract the first word of "genhtml", so it can be a program name with args. set dummy genhtml; ac_word=$2 -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 -$as_echo_n "checking for $ac_word... " >&6; } -if ${ac_cv_path_GENHTML+:} false; then : - $as_echo_n "(cached) " >&6 -else +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5 +printf %s "checking for $ac_word... " >&6; } +if test ${ac_cv_path_GENHTML+y} +then : + printf %s "(cached) " >&6 +else $as_nop case $GENHTML in [\\/]* | ?:[\\/]*) ac_cv_path_GENHTML="$GENHTML" # Let the user override the test with a path. @@ -20141,11 +16005,15 @@ else for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. 
+ case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac for ac_exec_ext in '' $ac_executable_extensions; do - if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then - ac_cv_path_GENHTML="$as_dir/$ac_word$ac_exec_ext" - $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5 + if as_fn_executable_p "$as_dir$ac_word$ac_exec_ext"; then + ac_cv_path_GENHTML="$as_dir$ac_word$ac_exec_ext" + printf "%s\n" "$as_me:${as_lineno-$LINENO}: found $as_dir$ac_word$ac_exec_ext" >&5 break 2 fi done @@ -20158,11 +16026,11 @@ esac fi GENHTML=$ac_cv_path_GENHTML if test -n "$GENHTML"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: $GENHTML" >&5 -$as_echo "$GENHTML" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $GENHTML" >&5 +printf "%s\n" "$GENHTML" >&6; } else - { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 -$as_echo "no" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: no" >&5 +printf "%s\n" "no" >&6; } fi @@ -20188,15 +16056,61 @@ else fi +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking whether Intel CET is enabled" >&5 +printf %s "checking whether Intel CET is enabled... " >&6; } +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu + +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. 
*/ + +int +main (void) +{ +#ifndef __CET__ +# error CET is not enabled +#endif + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO" +then : + pcre2_cc_cv_intel_cet_enabled=yes +else $as_nop + pcre2_cc_cv_intel_cet_enabled=no +fi +rm -f core conftest.err conftest.$ac_objext conftest.beam conftest.$ac_ext +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: $pcre2_cc_cv_intel_cet_enabled" >&5 +printf "%s\n" "$pcre2_cc_cv_intel_cet_enabled" >&6; } +if test "$pcre2_cc_cv_intel_cet_enabled" = yes; then + CET_CFLAGS="-mshstk" + +fi +ac_ext=c +ac_cpp='$CPP $CPPFLAGS' +ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5' +ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5' +ac_compiler_gnu=$ac_cv_c_compiler_gnu + + +# LIB_POSTFIX is used by CMakeLists.txt for Windows debug builds. +# Pass empty LIB_POSTFIX to *.pc files and pcre2-config here. + + # Produce these files, in addition to config.h. -ac_config_files="$ac_config_files Makefile libpcre.pc libpcre16.pc libpcre32.pc libpcreposix.pc libpcrecpp.pc pcre-config pcre.h pcre_stringpiece.h pcrecpparg.h" + +ac_config_files="$ac_config_files Makefile libpcre2-8.pc libpcre2-16.pc libpcre2-32.pc libpcre2-posix.pc pcre2-config src/pcre2.h" # Make the generated script files executable. ac_config_commands="$ac_config_commands script-chmod" -# Make sure that pcre_chartables.c is removed in case the method for +# Make sure that pcre2_chartables.c is removed in case the method for # creating it was changed by reconfiguration. 
ac_config_commands="$ac_config_commands delete-old-chartables" @@ -20228,8 +16142,8 @@ _ACEOF case $ac_val in #( *${as_nl}*) case $ac_var in #( - *_cv_*) { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 -$as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; + *_cv_*) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: cache variable $ac_var contains a newline" >&5 +printf "%s\n" "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; esac case $ac_var in #( _ | IFS | as_nl) ;; #( @@ -20259,15 +16173,15 @@ $as_echo "$as_me: WARNING: cache variable $ac_var contains a newline" >&2;} ;; /^ac_cv_env_/b end t clear :clear - s/^\([^=]*\)=\(.*[{}].*\)$/test "${\1+set}" = set || &/ + s/^\([^=]*\)=\(.*[{}].*\)$/test ${\1+y} || &/ t end s/^\([^=]*\)=\(.*\)$/\1=${\1=\2}/ :end' >>confcache if diff "$cache_file" confcache >/dev/null 2>&1; then :; else if test -w "$cache_file"; then if test "x$cache_file" != "x/dev/null"; then - { $as_echo "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 -$as_echo "$as_me: updating cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: updating cache $cache_file" >&5 +printf "%s\n" "$as_me: updating cache $cache_file" >&6;} if test ! -f "$cache_file" || test -h "$cache_file"; then cat confcache >"$cache_file" else @@ -20281,8 +16195,8 @@ $as_echo "$as_me: updating cache $cache_file" >&6;} fi fi else - { $as_echo "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 -$as_echo "$as_me: not updating unwritable cache $cache_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: not updating unwritable cache $cache_file" >&5 +printf "%s\n" "$as_me: not updating unwritable cache $cache_file" >&6;} fi fi rm -f confcache @@ -20299,7 +16213,7 @@ U= for ac_i in : $LIBOBJS; do test "x$ac_i" = x: && continue # 1. Remove the extension, and $U if already installed. 
ac_script='s/\$U\././;s/\.o$//;s/\.obj$//' - ac_i=`$as_echo "$ac_i" | sed "$ac_script"` + ac_i=`printf "%s\n" "$ac_i" | sed "$ac_script"` # 2. Prepend LIBOBJDIR. When used with automake>=1.10 LIBOBJDIR # will be set to the directory where LIBOBJS objects are built. as_fn_append ac_libobjs " \${LIBOBJDIR}$ac_i\$U.$ac_objext" @@ -20310,14 +16224,14 @@ LIBOBJS=$ac_libobjs LTLIBOBJS=$ac_ltlibobjs -{ $as_echo "$as_me:${as_lineno-$LINENO}: checking that generated files are newer than configure" >&5 -$as_echo_n "checking that generated files are newer than configure... " >&6; } +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: checking that generated files are newer than configure" >&5 +printf %s "checking that generated files are newer than configure... " >&6; } if test -n "$am_sleep_pid"; then # Hide warnings about reused PIDs. wait $am_sleep_pid 2>/dev/null fi - { $as_echo "$as_me:${as_lineno-$LINENO}: result: done" >&5 -$as_echo "done" >&6; } + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: result: done" >&5 +printf "%s\n" "done" >&6; } if test -n "$EXEEXT"; then am__EXEEXT_TRUE= am__EXEEXT_FALSE='#' @@ -20334,28 +16248,20 @@ if test -z "${am__fastdepCC_TRUE}" && test -z "${am__fastdepCC_FALSE}"; then as_fn_error $? "conditional \"am__fastdepCC\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi -if test -z "${am__fastdepCC_TRUE}" && test -z "${am__fastdepCC_FALSE}"; then - as_fn_error $? "conditional \"am__fastdepCC\" was never defined. -Usually this means the macro was only invoked conditionally." "$LINENO" 5 -fi -if test -z "${am__fastdepCXX_TRUE}" && test -z "${am__fastdepCXX_FALSE}"; then - as_fn_error $? "conditional \"am__fastdepCXX\" was never defined. -Usually this means the macro was only invoked conditionally." "$LINENO" 5 -fi -if test -z "${WITH_PCRE8_TRUE}" && test -z "${WITH_PCRE8_FALSE}"; then - as_fn_error $? "conditional \"WITH_PCRE8\" was never defined. 
+if test -z "${WITH_PCRE2_8_TRUE}" && test -z "${WITH_PCRE2_8_FALSE}"; then + as_fn_error $? "conditional \"WITH_PCRE2_8\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi -if test -z "${WITH_PCRE16_TRUE}" && test -z "${WITH_PCRE16_FALSE}"; then - as_fn_error $? "conditional \"WITH_PCRE16\" was never defined. +if test -z "${WITH_PCRE2_16_TRUE}" && test -z "${WITH_PCRE2_16_FALSE}"; then + as_fn_error $? "conditional \"WITH_PCRE2_16\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi -if test -z "${WITH_PCRE32_TRUE}" && test -z "${WITH_PCRE32_FALSE}"; then - as_fn_error $? "conditional \"WITH_PCRE32\" was never defined. +if test -z "${WITH_PCRE2_32_TRUE}" && test -z "${WITH_PCRE2_32_FALSE}"; then + as_fn_error $? "conditional \"WITH_PCRE2_32\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi -if test -z "${WITH_PCRE_CPP_TRUE}" && test -z "${WITH_PCRE_CPP_FALSE}"; then - as_fn_error $? "conditional \"WITH_PCRE_CPP\" was never defined. +if test -z "${WITH_DEBUG_TRUE}" && test -z "${WITH_DEBUG_FALSE}"; then + as_fn_error $? "conditional \"WITH_DEBUG\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi if test -z "${WITH_REBUILD_CHARTABLES_TRUE}" && test -z "${WITH_REBUILD_CHARTABLES_FALSE}"; then @@ -20366,14 +16272,18 @@ if test -z "${WITH_JIT_TRUE}" && test -z "${WITH_JIT_FALSE}"; then as_fn_error $? "conditional \"WITH_JIT\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi -if test -z "${WITH_UTF_TRUE}" && test -z "${WITH_UTF_FALSE}"; then - as_fn_error $? "conditional \"WITH_UTF\" was never defined. +if test -z "${WITH_UNICODE_TRUE}" && test -z "${WITH_UNICODE_FALSE}"; then + as_fn_error $? "conditional \"WITH_UNICODE\" was never defined. Usually this means the macro was only invoked conditionally." 
"$LINENO" 5 fi if test -z "${WITH_VALGRIND_TRUE}" && test -z "${WITH_VALGRIND_FALSE}"; then as_fn_error $? "conditional \"WITH_VALGRIND\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 fi +if test -z "${WITH_FUZZ_SUPPORT_TRUE}" && test -z "${WITH_FUZZ_SUPPORT_FALSE}"; then + as_fn_error $? "conditional \"WITH_FUZZ_SUPPORT\" was never defined. +Usually this means the macro was only invoked conditionally." "$LINENO" 5 +fi if test -z "${WITH_GCOV_TRUE}" && test -z "${WITH_GCOV_FALSE}"; then as_fn_error $? "conditional \"WITH_GCOV\" was never defined. Usually this means the macro was only invoked conditionally." "$LINENO" 5 @@ -20383,8 +16293,8 @@ fi ac_write_fail=0 ac_clean_files_save=$ac_clean_files ac_clean_files="$ac_clean_files $CONFIG_STATUS" -{ $as_echo "$as_me:${as_lineno-$LINENO}: creating $CONFIG_STATUS" >&5 -$as_echo "$as_me: creating $CONFIG_STATUS" >&6;} +{ printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating $CONFIG_STATUS" >&5 +printf "%s\n" "$as_me: creating $CONFIG_STATUS" >&6;} as_write_fail=0 cat >$CONFIG_STATUS <<_ASEOF || as_write_fail=1 #! $SHELL @@ -20407,14 +16317,16 @@ cat >>$CONFIG_STATUS <<\_ASEOF || as_write_fail=1 # Be more Bourne compatible DUALCASE=1; export DUALCASE # for MKS sh -if test -n "${ZSH_VERSION+set}" && (emulate sh) >/dev/null 2>&1; then : +as_nop=: +if test ${ZSH_VERSION+y} && (emulate sh) >/dev/null 2>&1 +then : emulate sh NULLCMD=: # Pre-4.2 versions of Zsh do word splitting on ${1+"$@"}, which # is contrary to our usage. Disable this feature. alias -g '${1+"$@"}'='"$@"' setopt NO_GLOB_SUBST -else +else $as_nop case `(set -o) 2>/dev/null` in #( *posix*) : set -o posix ;; #( @@ -20424,46 +16336,46 @@ esac fi + +# Reset variables that may have inherited troublesome values from +# the environment. + +# IFS needs to be set, to space, tab, and newline, in precisely that order. 
+# (If _AS_PATH_WALK were called with IFS unset, it would have the +# side effect of setting IFS to empty, thus disabling word splitting.) +# Quoting is to prevent editors from complaining about space-tab. as_nl=' ' export as_nl -# Printing a long string crashes Solaris 7 /usr/bin/printf. -as_echo='\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\' -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo -as_echo=$as_echo$as_echo$as_echo$as_echo$as_echo$as_echo -# Prefer a ksh shell builtin over an external printf program on Solaris, -# but without wasting forks for bash or zsh. -if test -z "$BASH_VERSION$ZSH_VERSION" \ - && (test "X`print -r -- $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='print -r --' - as_echo_n='print -rn --' -elif (test "X`printf %s $as_echo`" = "X$as_echo") 2>/dev/null; then - as_echo='printf %s\n' - as_echo_n='printf %s' -else - if test "X`(/usr/ucb/echo -n -n $as_echo) 2>/dev/null`" = "X-n $as_echo"; then - as_echo_body='eval /usr/ucb/echo -n "$1$as_nl"' - as_echo_n='/usr/ucb/echo -n' - else - as_echo_body='eval expr "X$1" : "X\\(.*\\)"' - as_echo_n_body='eval - arg=$1; - case $arg in #( - *"$as_nl"*) - expr "X$arg" : "X\\(.*\\)$as_nl"; - arg=`expr "X$arg" : ".*$as_nl\\(.*\\)"`;; - esac; - expr "X$arg" : "X\\(.*\\)" | tr -d "$as_nl" - ' - export as_echo_n_body - as_echo_n='sh -c $as_echo_n_body as_echo' - fi - export as_echo_body - as_echo='sh -c $as_echo_body as_echo' -fi +IFS=" "" $as_nl" + +PS1='$ ' +PS2='> ' +PS4='+ ' + +# Ensure predictable behavior from utilities with locale-dependent output. +LC_ALL=C +export LC_ALL +LANGUAGE=C +export LANGUAGE + +# We cannot yet rely on "unset" to work, but we need these variables +# to be unset--not just set to an empty or harmless value--now, to +# avoid bugs in old shells (e.g. pre-3.0 UWIN ksh). This construct +# also avoids known problems related to "unset" and subshell syntax +# in other old shells (e.g. bash 2.01 and pdksh 5.2.14). 
+for as_var in BASH_ENV ENV MAIL MAILPATH CDPATH +do eval test \${$as_var+y} \ + && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : +done + +# Ensure that fds 0, 1, and 2 are open. +if (exec 3>&0) 2>/dev/null; then :; else exec 0</dev/null; fi +if (exec 3>&1) 2>/dev/null; then :; else exec 1>/dev/null; fi +if (exec 3>&2) ; then :; else exec 2>/dev/null; fi # The user is always right. -if test "${PATH_SEPARATOR+set}" != set; then +if ${PATH_SEPARATOR+false} :; then PATH_SEPARATOR=: (PATH='/bin;/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 && { (PATH='/bin:/bin'; FPATH=$PATH; sh -c :) >/dev/null 2>&1 || @@ -20472,13 +16384,6 @@ if test "${PATH_SEPARATOR+set}" != set; then fi -# IFS -# We need space, tab and new line, in precisely that order. Quoting is -# there to prevent editors from complaining about space-tab. -# (If _AS_PATH_WALK were called with IFS unset, it would disable word -# splitting by setting IFS to empty value.) -IFS=" "" $as_nl" - # Find who we are. Look in the path if we contain no directory separator. as_myself= case $0 in #(( @@ -20487,8 +16392,12 @@ case $0 in #(( for as_dir in $PATH do IFS=$as_save_IFS - test -z "$as_dir" && as_dir=. - test -r "$as_dir/$0" && as_myself=$as_dir/$0 && break + case $as_dir in #((( + '') as_dir=./ ;; + */) ;; + *) as_dir=$as_dir/ ;; + esac + test -r "$as_dir$0" && as_myself=$as_dir$0 && break done IFS=$as_save_IFS @@ -20500,30 +16409,10 @@ if test "x$as_myself" = x; then as_myself=$0 fi if test ! -f "$as_myself"; then - $as_echo "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 + printf "%s\n" "$as_myself: error: cannot find myself; rerun with an absolute file name" >&2 exit 1 fi -# Unset variables that we do not need and which cause bugs (e.g. in -# pre-3.0 UWIN ksh). But do not cause bugs in bash 2.01; the "|| exit 1" -# suppresses any "Segmentation fault" message there. '((' could -# trigger a bug in pdksh 5.2.14.
-for as_var in BASH_ENV ENV MAIL MAILPATH -do eval test x\${$as_var+set} = xset \ - && ( (unset $as_var) || exit 1) >/dev/null 2>&1 && unset $as_var || : -done -PS1='$ ' -PS2='> ' -PS4='+ ' - -# NLS nuisances. -LC_ALL=C -export LC_ALL -LANGUAGE=C -export LANGUAGE - -# CDPATH. -(unset CDPATH) >/dev/null 2>&1 && unset CDPATH # as_fn_error STATUS ERROR [LINENO LOG_FD] @@ -20536,13 +16425,14 @@ as_fn_error () as_status=$1; test $as_status -eq 0 && as_status=1 if test "$4"; then as_lineno=${as_lineno-"$3"} as_lineno_stack=as_lineno_stack=$as_lineno_stack - $as_echo "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 + printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: $2" >&$4 fi - $as_echo "$as_me: error: $2" >&2 + printf "%s\n" "$as_me: error: $2" >&2 as_fn_exit $as_status } # as_fn_error + # as_fn_set_status STATUS # ----------------------- # Set $? to STATUS, without forking. @@ -20569,18 +16459,20 @@ as_fn_unset () { eval $1=; unset $1;} } as_unset=as_fn_unset + # as_fn_append VAR VALUE # ---------------------- # Append the text in VALUE to the end of the definition contained in VAR. Take # advantage of any shell optimizations that allow amortized linear growth over # repeated appends, instead of the typical quadratic growth present in naive # implementations. -if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null; then : +if (eval "as_var=1; as_var+=2; test x\$as_var = x12") 2>/dev/null +then : eval 'as_fn_append () { eval $1+=\$2 }' -else +else $as_nop as_fn_append () { eval $1=\$$1\$2 @@ -20592,12 +16484,13 @@ fi # as_fn_append # Perform arithmetic evaluation on the ARGs, and store the result in the # global $as_val. Take advantage of shells that can avoid forks. The arguments # must be portable across $(()) and expr. -if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null; then : +if (eval "test \$(( 1 + 1 )) = 2") 2>/dev/null +then : eval 'as_fn_arith () { as_val=$(( $* )) }' -else +else $as_nop as_fn_arith () { as_val=`expr "$@" || test $? 
-eq 1` @@ -20628,7 +16521,7 @@ as_me=`$as_basename -- "$0" || $as_expr X/"$0" : '.*/\([^/][^/]*\)/*$' \| \ X"$0" : 'X\(//\)$' \| \ X"$0" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X/"$0" | +printf "%s\n" X/"$0" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q @@ -20650,6 +16543,10 @@ as_cr_Letters=$as_cr_letters$as_cr_LETTERS as_cr_digits='0123456789' as_cr_alnum=$as_cr_Letters$as_cr_digits + +# Determine whether it's possible to make 'echo' print without a newline. +# These variables are no longer used directly by Autoconf, but are AC_SUBSTed +# for compatibility with existing Makefiles. ECHO_C= ECHO_N= ECHO_T= case `echo -n x` in #((((( -n*) @@ -20663,6 +16560,12 @@ case `echo -n x` in #((((( ECHO_N='-n';; esac +# For backward compatibility with old third-party macros, we provide +# the shell variables $as_echo and $as_echo_n. New code should use +# AS_ECHO(["message"]) and AS_ECHO_N(["message"]), respectively. +as_echo='printf %s\n' +as_echo_n='printf %s' + rm -f conf$$ conf$$.exe conf$$.file if test -d conf$$.dir; then rm -f conf$$.dir/conf$$.file @@ -20704,7 +16607,7 @@ as_fn_mkdir_p () as_dirs= while :; do case $as_dir in #( - *\'*) as_qdir=`$as_echo "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( + *\'*) as_qdir=`printf "%s\n" "$as_dir" | sed "s/'/'\\\\\\\\''/g"`;; #'( *) as_qdir=$as_dir;; esac as_dirs="'$as_qdir' $as_dirs" @@ -20713,7 +16616,7 @@ $as_expr X"$as_dir" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$as_dir" : 'X\(//\)[^/]' \| \ X"$as_dir" : 'X\(//\)$' \| \ X"$as_dir" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$as_dir" | +printf "%s\n" X"$as_dir" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -20775,8 +16678,8 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. ac_log=" -This file was extended by PCRE $as_me 8.43, which was -generated by GNU Autoconf 2.69. 
Invocation command line was +This file was extended by PCRE2 $as_me 10.37, which was +generated by GNU Autoconf 2.71. Invocation command line was CONFIG_FILES = $CONFIG_FILES CONFIG_HEADERS = $CONFIG_HEADERS @@ -20838,14 +16741,16 @@ $config_commands Report bugs to the package provider." _ACEOF +ac_cs_config=`printf "%s\n" "$ac_configure_args" | sed "$ac_safe_unquote"` +ac_cs_config_escaped=`printf "%s\n" "$ac_cs_config" | sed "s/^ //; s/'/'\\\\\\\\''/g"` cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 -ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`" +ac_cs_config='$ac_cs_config_escaped' ac_cs_version="\\ -PCRE config.status 8.43 -configured by $0, generated by GNU Autoconf 2.69, +PCRE2 config.status 10.37 +configured by $0, generated by GNU Autoconf 2.71, with options \\"\$ac_cs_config\\" -Copyright (C) 2012 Free Software Foundation, Inc. +Copyright (C) 2021 Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it." @@ -20885,15 +16790,15 @@ do -recheck | --recheck | --rechec | --reche | --rech | --rec | --re | --r) ac_cs_recheck=: ;; --version | --versio | --versi | --vers | --ver | --ve | --v | -V ) - $as_echo "$ac_cs_version"; exit ;; + printf "%s\n" "$ac_cs_version"; exit ;; --config | --confi | --conf | --con | --co | --c ) - $as_echo "$ac_cs_config"; exit ;; + printf "%s\n" "$ac_cs_config"; exit ;; --debug | --debu | --deb | --de | --d | -d ) debug=: ;; --file | --fil | --fi | --f ) $ac_shift case $ac_optarg in - *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; + *\'*) ac_optarg=`printf "%s\n" "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; '') as_fn_error $? 
"missing file argument" ;; esac as_fn_append CONFIG_FILES " '$ac_optarg'" @@ -20901,7 +16806,7 @@ do --header | --heade | --head | --hea ) $ac_shift case $ac_optarg in - *\'*) ac_optarg=`$as_echo "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; + *\'*) ac_optarg=`printf "%s\n" "$ac_optarg" | sed "s/'/'\\\\\\\\''/g"` ;; esac as_fn_append CONFIG_HEADERS " '$ac_optarg'" ac_need_defaults=false;; @@ -20910,7 +16815,7 @@ do as_fn_error $? "ambiguous option: \`$1' Try \`$0 --help' for more information.";; --help | --hel | -h ) - $as_echo "$ac_cs_usage"; exit ;; + printf "%s\n" "$ac_cs_usage"; exit ;; -q | -quiet | --quiet | --quie | --qui | --qu | --q \ | -silent | --silent | --silen | --sile | --sil | --si | --s) ac_cs_silent=: ;; @@ -20938,7 +16843,7 @@ cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 if \$ac_cs_recheck; then set X $SHELL '$0' $ac_configure_args \$ac_configure_extra_args --no-create --no-recursion shift - \$as_echo "running CONFIG_SHELL=$SHELL \$*" >&6 + \printf "%s\n" "running CONFIG_SHELL=$SHELL \$*" >&6 CONFIG_SHELL='$SHELL' export CONFIG_SHELL exec "\$@" @@ -20952,7 +16857,7 @@ exec 5>>config.log sed 'h;s/./-/g;s/^.../## /;s/...$/ ##/;p;x;p;x' <<_ASBOX ## Running $as_me. 
## _ASBOX - $as_echo "$ac_log" + printf "%s\n" "$ac_log" } >&5 _ACEOF @@ -20970,11 +16875,11 @@ AMDEP_TRUE="$AMDEP_TRUE" MAKE="${MAKE-make}" sed_quote_subst='$sed_quote_subst' double_quote_subst='$double_quote_subst' delay_variable_subst='$delay_variable_subst' +macro_version='`$ECHO "$macro_version" | $SED "$delay_single_quote_subst"`' +macro_revision='`$ECHO "$macro_revision" | $SED "$delay_single_quote_subst"`' AS='`$ECHO "$AS" | $SED "$delay_single_quote_subst"`' DLLTOOL='`$ECHO "$DLLTOOL" | $SED "$delay_single_quote_subst"`' OBJDUMP='`$ECHO "$OBJDUMP" | $SED "$delay_single_quote_subst"`' -macro_version='`$ECHO "$macro_version" | $SED "$delay_single_quote_subst"`' -macro_revision='`$ECHO "$macro_revision" | $SED "$delay_single_quote_subst"`' enable_shared='`$ECHO "$enable_shared" | $SED "$delay_single_quote_subst"`' enable_static='`$ECHO "$enable_static" | $SED "$delay_single_quote_subst"`' pic_mode='`$ECHO "$pic_mode" | $SED "$delay_single_quote_subst"`' @@ -21107,60 +17012,6 @@ enable_dlopen_self='`$ECHO "$enable_dlopen_self" | $SED "$delay_single_quote_sub enable_dlopen_self_static='`$ECHO "$enable_dlopen_self_static" | $SED "$delay_single_quote_subst"`' old_striplib='`$ECHO "$old_striplib" | $SED "$delay_single_quote_subst"`' striplib='`$ECHO "$striplib" | $SED "$delay_single_quote_subst"`' -compiler_lib_search_dirs='`$ECHO "$compiler_lib_search_dirs" | $SED "$delay_single_quote_subst"`' -predep_objects='`$ECHO "$predep_objects" | $SED "$delay_single_quote_subst"`' -postdep_objects='`$ECHO "$postdep_objects" | $SED "$delay_single_quote_subst"`' -predeps='`$ECHO "$predeps" | $SED "$delay_single_quote_subst"`' -postdeps='`$ECHO "$postdeps" | $SED "$delay_single_quote_subst"`' -compiler_lib_search_path='`$ECHO "$compiler_lib_search_path" | $SED "$delay_single_quote_subst"`' -LD_CXX='`$ECHO "$LD_CXX" | $SED "$delay_single_quote_subst"`' -reload_flag_CXX='`$ECHO "$reload_flag_CXX" | $SED "$delay_single_quote_subst"`' -reload_cmds_CXX='`$ECHO "$reload_cmds_CXX" | 
$SED "$delay_single_quote_subst"`' -old_archive_cmds_CXX='`$ECHO "$old_archive_cmds_CXX" | $SED "$delay_single_quote_subst"`' -compiler_CXX='`$ECHO "$compiler_CXX" | $SED "$delay_single_quote_subst"`' -GCC_CXX='`$ECHO "$GCC_CXX" | $SED "$delay_single_quote_subst"`' -lt_prog_compiler_no_builtin_flag_CXX='`$ECHO "$lt_prog_compiler_no_builtin_flag_CXX" | $SED "$delay_single_quote_subst"`' -lt_prog_compiler_pic_CXX='`$ECHO "$lt_prog_compiler_pic_CXX" | $SED "$delay_single_quote_subst"`' -lt_prog_compiler_wl_CXX='`$ECHO "$lt_prog_compiler_wl_CXX" | $SED "$delay_single_quote_subst"`' -lt_prog_compiler_static_CXX='`$ECHO "$lt_prog_compiler_static_CXX" | $SED "$delay_single_quote_subst"`' -lt_cv_prog_compiler_c_o_CXX='`$ECHO "$lt_cv_prog_compiler_c_o_CXX" | $SED "$delay_single_quote_subst"`' -archive_cmds_need_lc_CXX='`$ECHO "$archive_cmds_need_lc_CXX" | $SED "$delay_single_quote_subst"`' -enable_shared_with_static_runtimes_CXX='`$ECHO "$enable_shared_with_static_runtimes_CXX" | $SED "$delay_single_quote_subst"`' -export_dynamic_flag_spec_CXX='`$ECHO "$export_dynamic_flag_spec_CXX" | $SED "$delay_single_quote_subst"`' -whole_archive_flag_spec_CXX='`$ECHO "$whole_archive_flag_spec_CXX" | $SED "$delay_single_quote_subst"`' -compiler_needs_object_CXX='`$ECHO "$compiler_needs_object_CXX" | $SED "$delay_single_quote_subst"`' -old_archive_from_new_cmds_CXX='`$ECHO "$old_archive_from_new_cmds_CXX" | $SED "$delay_single_quote_subst"`' -old_archive_from_expsyms_cmds_CXX='`$ECHO "$old_archive_from_expsyms_cmds_CXX" | $SED "$delay_single_quote_subst"`' -archive_cmds_CXX='`$ECHO "$archive_cmds_CXX" | $SED "$delay_single_quote_subst"`' -archive_expsym_cmds_CXX='`$ECHO "$archive_expsym_cmds_CXX" | $SED "$delay_single_quote_subst"`' -module_cmds_CXX='`$ECHO "$module_cmds_CXX" | $SED "$delay_single_quote_subst"`' -module_expsym_cmds_CXX='`$ECHO "$module_expsym_cmds_CXX" | $SED "$delay_single_quote_subst"`' -with_gnu_ld_CXX='`$ECHO "$with_gnu_ld_CXX" | $SED "$delay_single_quote_subst"`' 
-allow_undefined_flag_CXX='`$ECHO "$allow_undefined_flag_CXX" | $SED "$delay_single_quote_subst"`' -no_undefined_flag_CXX='`$ECHO "$no_undefined_flag_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_libdir_flag_spec_CXX='`$ECHO "$hardcode_libdir_flag_spec_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_libdir_separator_CXX='`$ECHO "$hardcode_libdir_separator_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_direct_CXX='`$ECHO "$hardcode_direct_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_direct_absolute_CXX='`$ECHO "$hardcode_direct_absolute_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_minus_L_CXX='`$ECHO "$hardcode_minus_L_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_shlibpath_var_CXX='`$ECHO "$hardcode_shlibpath_var_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_automatic_CXX='`$ECHO "$hardcode_automatic_CXX" | $SED "$delay_single_quote_subst"`' -inherit_rpath_CXX='`$ECHO "$inherit_rpath_CXX" | $SED "$delay_single_quote_subst"`' -link_all_deplibs_CXX='`$ECHO "$link_all_deplibs_CXX" | $SED "$delay_single_quote_subst"`' -always_export_symbols_CXX='`$ECHO "$always_export_symbols_CXX" | $SED "$delay_single_quote_subst"`' -export_symbols_cmds_CXX='`$ECHO "$export_symbols_cmds_CXX" | $SED "$delay_single_quote_subst"`' -exclude_expsyms_CXX='`$ECHO "$exclude_expsyms_CXX" | $SED "$delay_single_quote_subst"`' -include_expsyms_CXX='`$ECHO "$include_expsyms_CXX" | $SED "$delay_single_quote_subst"`' -prelink_cmds_CXX='`$ECHO "$prelink_cmds_CXX" | $SED "$delay_single_quote_subst"`' -postlink_cmds_CXX='`$ECHO "$postlink_cmds_CXX" | $SED "$delay_single_quote_subst"`' -file_list_spec_CXX='`$ECHO "$file_list_spec_CXX" | $SED "$delay_single_quote_subst"`' -hardcode_action_CXX='`$ECHO "$hardcode_action_CXX" | $SED "$delay_single_quote_subst"`' -compiler_lib_search_dirs_CXX='`$ECHO "$compiler_lib_search_dirs_CXX" | $SED "$delay_single_quote_subst"`' -predep_objects_CXX='`$ECHO "$predep_objects_CXX" | $SED "$delay_single_quote_subst"`' 
-postdep_objects_CXX='`$ECHO "$postdep_objects_CXX" | $SED "$delay_single_quote_subst"`' -predeps_CXX='`$ECHO "$predeps_CXX" | $SED "$delay_single_quote_subst"`' -postdeps_CXX='`$ECHO "$postdeps_CXX" | $SED "$delay_single_quote_subst"`' -compiler_lib_search_path_CXX='`$ECHO "$compiler_lib_search_path_CXX" | $SED "$delay_single_quote_subst"`' LTCC='$LTCC' LTCFLAGS='$LTCFLAGS' @@ -21242,38 +17093,7 @@ soname_spec \ install_override_mode \ finish_eval \ old_striplib \ -striplib \ -compiler_lib_search_dirs \ -predep_objects \ -postdep_objects \ -predeps \ -postdeps \ -compiler_lib_search_path \ -LD_CXX \ -reload_flag_CXX \ -compiler_CXX \ -lt_prog_compiler_no_builtin_flag_CXX \ -lt_prog_compiler_pic_CXX \ -lt_prog_compiler_wl_CXX \ -lt_prog_compiler_static_CXX \ -lt_cv_prog_compiler_c_o_CXX \ -export_dynamic_flag_spec_CXX \ -whole_archive_flag_spec_CXX \ -compiler_needs_object_CXX \ -with_gnu_ld_CXX \ -allow_undefined_flag_CXX \ -no_undefined_flag_CXX \ -hardcode_libdir_flag_spec_CXX \ -hardcode_libdir_separator_CXX \ -exclude_expsyms_CXX \ -include_expsyms_CXX \ -file_list_spec_CXX \ -compiler_lib_search_dirs_CXX \ -predep_objects_CXX \ -postdep_objects_CXX \ -predeps_CXX \ -postdeps_CXX \ -compiler_lib_search_path_CXX; do +striplib; do case \`eval \\\\\$ECHO \\\\""\\\\\$\$var"\\\\"\` in *[\\\\\\\`\\"\\\$]*) eval "lt_\$var=\\\\\\"\\\`\\\$ECHO \\"\\\$\$var\\" | \\\$SED \\"\\\$sed_quote_subst\\"\\\`\\\\\\"" ## exclude from sc_prohibit_nested_quotes @@ -21304,18 +17124,7 @@ postuninstall_cmds \ finish_cmds \ sys_lib_search_path_spec \ configure_time_dlsearch_path \ -configure_time_lt_sys_library_path \ -reload_cmds_CXX \ -old_archive_cmds_CXX \ -old_archive_from_new_cmds_CXX \ -old_archive_from_expsyms_cmds_CXX \ -archive_cmds_CXX \ -archive_expsym_cmds_CXX \ -module_cmds_CXX \ -module_expsym_cmds_CXX \ -export_symbols_cmds_CXX \ -prelink_cmds_CXX \ -postlink_cmds_CXX; do +configure_time_lt_sys_library_path; do case \`eval \\\\\$ECHO \\\\""\\\\\$\$var"\\\\"\` in 
*[\\\\\\\`\\"\\\$]*) eval "lt_\$var=\\\\\\"\\\`\\\$ECHO \\"\\\$\$var\\" | \\\$SED -e \\"\\\$double_quote_subst\\" -e \\"\\\$sed_quote_subst\\" -e \\"\\\$delay_variable_subst\\"\\\`\\\\\\"" ## exclude from sc_prohibit_nested_quotes @@ -21343,8 +17152,6 @@ fi - - _ACEOF cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 @@ -21353,19 +17160,16 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 for ac_config_target in $ac_config_targets do case $ac_config_target in - "config.h") CONFIG_HEADERS="$CONFIG_HEADERS config.h" ;; + "src/config.h") CONFIG_HEADERS="$CONFIG_HEADERS src/config.h" ;; "depfiles") CONFIG_COMMANDS="$CONFIG_COMMANDS depfiles" ;; "libtool") CONFIG_COMMANDS="$CONFIG_COMMANDS libtool" ;; "Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;; - "libpcre.pc") CONFIG_FILES="$CONFIG_FILES libpcre.pc" ;; - "libpcre16.pc") CONFIG_FILES="$CONFIG_FILES libpcre16.pc" ;; - "libpcre32.pc") CONFIG_FILES="$CONFIG_FILES libpcre32.pc" ;; - "libpcreposix.pc") CONFIG_FILES="$CONFIG_FILES libpcreposix.pc" ;; - "libpcrecpp.pc") CONFIG_FILES="$CONFIG_FILES libpcrecpp.pc" ;; - "pcre-config") CONFIG_FILES="$CONFIG_FILES pcre-config" ;; - "pcre.h") CONFIG_FILES="$CONFIG_FILES pcre.h" ;; - "pcre_stringpiece.h") CONFIG_FILES="$CONFIG_FILES pcre_stringpiece.h" ;; - "pcrecpparg.h") CONFIG_FILES="$CONFIG_FILES pcrecpparg.h" ;; + "libpcre2-8.pc") CONFIG_FILES="$CONFIG_FILES libpcre2-8.pc" ;; + "libpcre2-16.pc") CONFIG_FILES="$CONFIG_FILES libpcre2-16.pc" ;; + "libpcre2-32.pc") CONFIG_FILES="$CONFIG_FILES libpcre2-32.pc" ;; + "libpcre2-posix.pc") CONFIG_FILES="$CONFIG_FILES libpcre2-posix.pc" ;; + "pcre2-config") CONFIG_FILES="$CONFIG_FILES pcre2-config" ;; + "src/pcre2.h") CONFIG_FILES="$CONFIG_FILES src/pcre2.h" ;; "script-chmod") CONFIG_COMMANDS="$CONFIG_COMMANDS script-chmod" ;; "delete-old-chartables") CONFIG_COMMANDS="$CONFIG_COMMANDS delete-old-chartables" ;; @@ -21379,9 +17183,9 @@ done # We use the long form for the default assignment because of an extremely # bizarre bug on 
SunOS 4.1.3. if $ac_need_defaults; then - test "${CONFIG_FILES+set}" = set || CONFIG_FILES=$config_files - test "${CONFIG_HEADERS+set}" = set || CONFIG_HEADERS=$config_headers - test "${CONFIG_COMMANDS+set}" = set || CONFIG_COMMANDS=$config_commands + test ${CONFIG_FILES+y} || CONFIG_FILES=$config_files + test ${CONFIG_HEADERS+y} || CONFIG_HEADERS=$config_headers + test ${CONFIG_COMMANDS+y} || CONFIG_COMMANDS=$config_commands fi # Have a temporary directory for convenience. Make it in the build tree @@ -21717,7 +17521,7 @@ do esac || as_fn_error 1 "cannot find input file: \`$ac_f'" "$LINENO" 5;; esac - case $ac_f in *\'*) ac_f=`$as_echo "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac + case $ac_f in *\'*) ac_f=`printf "%s\n" "$ac_f" | sed "s/'/'\\\\\\\\''/g"`;; esac as_fn_append ac_file_inputs " '$ac_f'" done @@ -21725,17 +17529,17 @@ do # use $as_me), people would be surprised to read: # /* config.h. Generated by config.status. */ configure_input='Generated from '` - $as_echo "$*" | sed 's|^[^:]*/||;s|:[^:]*/|, |g' + printf "%s\n" "$*" | sed 's|^[^:]*/||;s|:[^:]*/|, |g' `' by configure.' if test x"$ac_file" != x-; then configure_input="$ac_file. $configure_input" - { $as_echo "$as_me:${as_lineno-$LINENO}: creating $ac_file" >&5 -$as_echo "$as_me: creating $ac_file" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: creating $ac_file" >&5 +printf "%s\n" "$as_me: creating $ac_file" >&6;} fi # Neutralize special characters interpreted by sed in replacement strings. case $configure_input in #( *\&* | *\|* | *\\* ) - ac_sed_conf_input=`$as_echo "$configure_input" | + ac_sed_conf_input=`printf "%s\n" "$configure_input" | sed 's/[\\\\&|]/\\\\&/g'`;; #( *) ac_sed_conf_input=$configure_input;; esac @@ -21752,7 +17556,7 @@ $as_expr X"$ac_file" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$ac_file" : 'X\(//\)[^/]' \| \ X"$ac_file" : 'X\(//\)$' \| \ X"$ac_file" : 'X\(/\)' \| . 
2>/dev/null || -$as_echo X"$ac_file" | +printf "%s\n" X"$ac_file" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -21776,9 +17580,9 @@ $as_echo X"$ac_file" | case "$ac_dir" in .) ac_dir_suffix= ac_top_builddir_sub=. ac_top_build_prefix= ;; *) - ac_dir_suffix=/`$as_echo "$ac_dir" | sed 's|^\.[\\/]||'` + ac_dir_suffix=/`printf "%s\n" "$ac_dir" | sed 's|^\.[\\/]||'` # A ".." for each directory in $ac_dir_suffix. - ac_top_builddir_sub=`$as_echo "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` + ac_top_builddir_sub=`printf "%s\n" "$ac_dir_suffix" | sed 's|/[^\\/]*|/..|g;s|/||'` case $ac_top_builddir_sub in "") ac_top_builddir_sub=. ac_top_build_prefix= ;; *) ac_top_build_prefix=$ac_top_builddir_sub/ ;; @@ -21840,8 +17644,8 @@ ac_sed_dataroot=' case `eval "sed -n \"\$ac_sed_dataroot\" $ac_file_inputs"` in *datarootdir*) ac_datarootdir_seen=yes;; *@datadir@*|*@docdir@*|*@infodir@*|*@localedir@*|*@mandir@*) - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&5 -$as_echo "$as_me: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&5 +printf "%s\n" "$as_me: WARNING: $ac_file_inputs seems to ignore the --datarootdir setting" >&2;} _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_datarootdir_hack=' @@ -21885,9 +17689,9 @@ test -z "$ac_datarootdir_hack$ac_datarootdir_seen" && { ac_out=`sed -n '/\${datarootdir}/p' "$ac_tmp/out"`; test -n "$ac_out"; } && { ac_out=`sed -n '/^[ ]*datarootdir[ ]*:*=/p' \ "$ac_tmp/out"`; test -z "$ac_out"; } && - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable \`datarootdir' + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: $ac_file contains a reference to the variable \`datarootdir' which seems to be undefined. 
Please make sure it is defined" >&5 -$as_echo "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir' +printf "%s\n" "$as_me: WARNING: $ac_file contains a reference to the variable \`datarootdir' which seems to be undefined. Please make sure it is defined" >&2;} rm -f "$ac_tmp/stdin" @@ -21903,20 +17707,20 @@ which seems to be undefined. Please make sure it is defined" >&2;} # if test x"$ac_file" != x-; then { - $as_echo "/* $configure_input */" \ + printf "%s\n" "/* $configure_input */" >&1 \ && eval '$AWK -f "$ac_tmp/defines.awk"' "$ac_file_inputs" } >"$ac_tmp/config.h" \ || as_fn_error $? "could not create $ac_file" "$LINENO" 5 if diff "$ac_file" "$ac_tmp/config.h" >/dev/null 2>&1; then - { $as_echo "$as_me:${as_lineno-$LINENO}: $ac_file is unchanged" >&5 -$as_echo "$as_me: $ac_file is unchanged" >&6;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: $ac_file is unchanged" >&5 +printf "%s\n" "$as_me: $ac_file is unchanged" >&6;} else rm -f "$ac_file" mv "$ac_tmp/config.h" "$ac_file" \ || as_fn_error $? "could not create $ac_file" "$LINENO" 5 fi else - $as_echo "/* $configure_input */" \ + printf "%s\n" "/* $configure_input */" >&1 \ && eval '$AWK -f "$ac_tmp/defines.awk"' "$ac_file_inputs" \ || as_fn_error $? "could not create -" "$LINENO" 5 fi @@ -21936,7 +17740,7 @@ $as_expr X"$_am_arg" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$_am_arg" : 'X\(//\)[^/]' \| \ X"$_am_arg" : 'X\(//\)$' \| \ X"$_am_arg" : 'X\(/\)' \| . 
2>/dev/null || -$as_echo X"$_am_arg" | +printf "%s\n" X"$_am_arg" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -21956,8 +17760,8 @@ $as_echo X"$_am_arg" | s/.*/./; q'`/stamp-h$_am_stamp_count ;; - :C) { $as_echo "$as_me:${as_lineno-$LINENO}: executing $ac_file commands" >&5 -$as_echo "$as_me: executing $ac_file commands" >&6;} + :C) { printf "%s\n" "$as_me:${as_lineno-$LINENO}: executing $ac_file commands" >&5 +printf "%s\n" "$as_me: executing $ac_file commands" >&6;} ;; esac @@ -21983,7 +17787,7 @@ esac for am_mf do # Strip MF so we end up with the name of the file. - am_mf=`$as_echo "$am_mf" | sed -e 's/:.*$//'` + am_mf=`printf "%s\n" "$am_mf" | sed -e 's/:.*$//'` # Check whether this is an Automake generated Makefile which includes # dependency-tracking related rules and includes. # Grep'ing the whole file directly is not great: AIX grep has a line @@ -21995,7 +17799,7 @@ $as_expr X"$am_mf" : 'X\(.*[^/]\)//*[^/][^/]*/*$' \| \ X"$am_mf" : 'X\(//\)[^/]' \| \ X"$am_mf" : 'X\(//\)$' \| \ X"$am_mf" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X"$am_mf" | +printf "%s\n" X"$am_mf" | sed '/^X\(.*[^/]\)\/\/*[^/][^/]*\/*$/{ s//\1/ q @@ -22017,7 +17821,7 @@ $as_echo X"$am_mf" | $as_expr X/"$am_mf" : '.*/\([^/][^/]*\)/*$' \| \ X"$am_mf" : 'X\(//\)$' \| \ X"$am_mf" : 'X\(/\)' \| . 2>/dev/null || -$as_echo X/"$am_mf" | +printf "%s\n" X/"$am_mf" | sed '/^.*\/\([^/][^/]*\)\/*$/{ s//\1/ q @@ -22042,10 +17846,12 @@ $as_echo X/"$am_mf" | (exit $ac_status); } || am_rc=$? done if test $am_rc -ne 0; then - { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 -$as_echo "$as_me: error: in \`$ac_pwd':" >&2;} + { { printf "%s\n" "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5 +printf "%s\n" "$as_me: error: in \`$ac_pwd':" >&2;} as_fn_error $? "Something went wrong bootstrapping makefile fragments - for automatic dependency tracking. Try re-running configure with the + for automatic dependency tracking. 
If GNU make was not used, consider + re-running the configure script with MAKE=\"gmake\" (or whatever is + necessary). You can also try re-running configure with the '--disable-dependency-tracking' option to at least be able to build the package (albeit without support for automatic dependency tracking). See \`config.log' for more details" "$LINENO" 5; } @@ -22072,7 +17878,6 @@ See \`config.log' for more details" "$LINENO" 5; } cat <<_LT_EOF >> "$cfgfile" #! $SHELL # Generated automatically by $as_me ($PACKAGE) $VERSION -# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: # NOTE: Changes made to this file will be lost: look at ltmain.sh. # Provide generalized library-building support services. @@ -22102,13 +17907,17 @@ See \`config.log' for more details" "$LINENO" 5; } # The names of the tagged configurations supported by this script. -available_tags='CXX ' +available_tags='' # Configured defaults for sys_lib_dlsearch_path munging. : \${LT_SYS_LIBRARY_PATH="$configure_time_lt_sys_library_path"} # ### BEGIN LIBTOOL CONFIG +# Which release of libtool.m4 was used? +macro_version=$macro_version +macro_revision=$macro_revision + # Assembler program. AS=$lt_AS @@ -22118,10 +17927,6 @@ DLLTOOL=$lt_DLLTOOL # Object dumper program. OBJDUMP=$lt_OBJDUMP -# Which release of libtool.m4 was used? -macro_version=$macro_version -macro_revision=$macro_revision - # Whether or not to build shared libraries. build_libtool_libs=$enable_shared @@ -22513,20 +18318,6 @@ file_list_spec=$lt_file_list_spec # How to hardcode a shared library path into an executable. hardcode_action=$hardcode_action -# The directories searched by this compiler when creating a shared library. -compiler_lib_search_dirs=$lt_compiler_lib_search_dirs - -# Dependencies to place before and after the objects being linked to -# create a shared library. 
-predep_objects=$lt_predep_objects -postdep_objects=$lt_postdep_objects -predeps=$lt_predeps -postdeps=$lt_postdeps - -# The library search path used internally by the compiler when linking -# a shared library. -compiler_lib_search_path=$lt_compiler_lib_search_path - # ### END LIBTOOL CONFIG _LT_EOF @@ -22605,6 +18396,7 @@ _LT_EOF esac + ltmain=$ac_aux_dir/ltmain.sh @@ -22619,162 +18411,9 @@ ltmain=$ac_aux_dir/ltmain.sh (rm -f "$ofile" && cp "$cfgfile" "$ofile" && rm -f "$cfgfile") chmod +x "$ofile" - - cat <<_LT_EOF >> "$ofile" - -# ### BEGIN LIBTOOL TAG CONFIG: CXX - -# The linker used to build libraries. -LD=$lt_LD_CXX - -# How to create reloadable object files. -reload_flag=$lt_reload_flag_CXX -reload_cmds=$lt_reload_cmds_CXX - -# Commands used to build an old-style archive. -old_archive_cmds=$lt_old_archive_cmds_CXX - -# A language specific compiler. -CC=$lt_compiler_CXX - -# Is the compiler the GNU compiler? -with_gcc=$GCC_CXX - -# Compiler flag to turn off builtin functions. -no_builtin_flag=$lt_lt_prog_compiler_no_builtin_flag_CXX - -# Additional compiler flags for building library objects. -pic_flag=$lt_lt_prog_compiler_pic_CXX - -# How to pass a linker flag through the compiler. -wl=$lt_lt_prog_compiler_wl_CXX - -# Compiler flag to prevent dynamic linking. -link_static_flag=$lt_lt_prog_compiler_static_CXX - -# Does compiler simultaneously support -c and -o options? -compiler_c_o=$lt_lt_cv_prog_compiler_c_o_CXX - -# Whether or not to add -lc for building shared libraries. -build_libtool_need_lc=$archive_cmds_need_lc_CXX - -# Whether or not to disallow shared libs when runtime libs are static. -allow_libtool_libs_with_static_runtimes=$enable_shared_with_static_runtimes_CXX - -# Compiler flag to allow reflexive dlopens. -export_dynamic_flag_spec=$lt_export_dynamic_flag_spec_CXX - -# Compiler flag to generate shared objects directly from archives. 
-whole_archive_flag_spec=$lt_whole_archive_flag_spec_CXX - -# Whether the compiler copes with passing no objects directly. -compiler_needs_object=$lt_compiler_needs_object_CXX - -# Create an old-style archive from a shared archive. -old_archive_from_new_cmds=$lt_old_archive_from_new_cmds_CXX - -# Create a temporary old-style archive to link instead of a shared archive. -old_archive_from_expsyms_cmds=$lt_old_archive_from_expsyms_cmds_CXX - -# Commands used to build a shared archive. -archive_cmds=$lt_archive_cmds_CXX -archive_expsym_cmds=$lt_archive_expsym_cmds_CXX - -# Commands used to build a loadable module if different from building -# a shared archive. -module_cmds=$lt_module_cmds_CXX -module_expsym_cmds=$lt_module_expsym_cmds_CXX - -# Whether we are building with GNU ld or not. -with_gnu_ld=$lt_with_gnu_ld_CXX - -# Flag that allows shared libraries with undefined symbols to be built. -allow_undefined_flag=$lt_allow_undefined_flag_CXX - -# Flag that enforces no undefined symbols. -no_undefined_flag=$lt_no_undefined_flag_CXX - -# Flag to hardcode \$libdir into a binary during linking. -# This must work even if \$libdir does not exist -hardcode_libdir_flag_spec=$lt_hardcode_libdir_flag_spec_CXX - -# Whether we need a single "-rpath" flag with a separated argument. -hardcode_libdir_separator=$lt_hardcode_libdir_separator_CXX - -# Set to "yes" if using DIR/libNAME\$shared_ext during linking hardcodes -# DIR into the resulting binary. -hardcode_direct=$hardcode_direct_CXX - -# Set to "yes" if using DIR/libNAME\$shared_ext during linking hardcodes -# DIR into the resulting binary and the resulting library dependency is -# "absolute",i.e impossible to change by setting \$shlibpath_var if the -# library is relocated. -hardcode_direct_absolute=$hardcode_direct_absolute_CXX - -# Set to "yes" if using the -LDIR flag during linking hardcodes DIR -# into the resulting binary. 
-hardcode_minus_L=$hardcode_minus_L_CXX - -# Set to "yes" if using SHLIBPATH_VAR=DIR during linking hardcodes DIR -# into the resulting binary. -hardcode_shlibpath_var=$hardcode_shlibpath_var_CXX - -# Set to "yes" if building a shared library automatically hardcodes DIR -# into the library and all subsequent libraries and executables linked -# against it. -hardcode_automatic=$hardcode_automatic_CXX - -# Set to yes if linker adds runtime paths of dependent libraries -# to runtime path list. -inherit_rpath=$inherit_rpath_CXX - -# Whether libtool must link a program against all its dependency libraries. -link_all_deplibs=$link_all_deplibs_CXX - -# Set to "yes" if exported symbols are required. -always_export_symbols=$always_export_symbols_CXX - -# The commands to list exported symbols. -export_symbols_cmds=$lt_export_symbols_cmds_CXX - -# Symbols that should not be listed in the preloaded symbols. -exclude_expsyms=$lt_exclude_expsyms_CXX - -# Symbols that must always be exported. -include_expsyms=$lt_include_expsyms_CXX - -# Commands necessary for linking programs (against libraries) with templates. -prelink_cmds=$lt_prelink_cmds_CXX - -# Commands necessary for finishing linking programs. -postlink_cmds=$lt_postlink_cmds_CXX - -# Specify filename containing input files. -file_list_spec=$lt_file_list_spec_CXX - -# How to hardcode a shared library path into an executable. -hardcode_action=$hardcode_action_CXX - -# The directories searched by this compiler when creating a shared library. -compiler_lib_search_dirs=$lt_compiler_lib_search_dirs_CXX - -# Dependencies to place before and after the objects being linked to -# create a shared library. -predep_objects=$lt_predep_objects_CXX -postdep_objects=$lt_postdep_objects_CXX -predeps=$lt_predeps_CXX -postdeps=$lt_postdeps_CXX - -# The library search path used internally by the compiler when linking -# a shared library. 
-compiler_lib_search_path=$lt_compiler_lib_search_path_CXX - -# ### END LIBTOOL TAG CONFIG: CXX -_LT_EOF - ;; - "script-chmod":C) chmod a+x pcre-config ;; - "delete-old-chartables":C) rm -f pcre_chartables.c ;; + "script-chmod":C) chmod a+x pcre2-config ;; + "delete-old-chartables":C) rm -f pcre2_chartables.c ;; esac done # for ac_tag @@ -22809,11 +18448,20 @@ if test "$no_create" != yes; then $ac_cs_success || as_fn_exit 1 fi if test -n "$ac_unrecognized_opts" && test "$enable_option_checking" != no; then - { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5 -$as_echo "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;} + { printf "%s\n" "$as_me:${as_lineno-$LINENO}: WARNING: unrecognized options: $ac_unrecognized_opts" >&5 +printf "%s\n" "$as_me: WARNING: unrecognized options: $ac_unrecognized_opts" >&2;} fi +# --disable-stack-for-recursion is obsolete and has no effect. + +if test "$enable_stack_for_recursion" = "no"; then +cat < +#endif]], +[[return (int)BZ2_bzopen("conftest", "rb");]])], +[AC_MSG_RESULT([yes]);HAVE_LIBBZ2=1; break;], +AC_MSG_RESULT([no])) +LIBS="$OLD_LIBS" + +# Check for the availabiity of libreadline + +if test "$enable_pcre2test_libreadline" = "yes"; then + AC_CHECK_HEADERS([readline/readline.h], [HAVE_READLINE_H=1]) + AC_CHECK_HEADERS([readline/history.h], [HAVE_HISTORY_H=1]) + AC_CHECK_LIB([readline], [readline], [LIBREADLINE="-lreadline"], + [unset ac_cv_lib_readline_readline; + AC_CHECK_LIB([readline], [readline], [LIBREADLINE="-ltinfo"], + [unset ac_cv_lib_readline_readline; + AC_CHECK_LIB([readline], [readline], [LIBREADLINE="-lcurses"], + [unset ac_cv_lib_readline_readline; + AC_CHECK_LIB([readline], [readline], [LIBREADLINE="-lncurses"], + [unset ac_cv_lib_readline_readline; + AC_CHECK_LIB([readline], [readline], [LIBREADLINE="-lncursesw"], + [unset ac_cv_lib_readline_readline; + AC_CHECK_LIB([readline], [readline], [LIBREADLINE="-ltermcap"], + [LIBREADLINE=""], + 
[-ltermcap])], + [-lncursesw])], + [-lncurses])], + [-lcurses])], + [-ltinfo])]) + AC_SUBST(LIBREADLINE) + if test -n "$LIBREADLINE"; then + if test "$LIBREADLINE" != "-lreadline"; then + echo "-lreadline needs $LIBREADLINE" + LIBREADLINE="-lreadline $LIBREADLINE" + fi + fi +fi + + +# Check for the availability of libedit. Different distributions put its +# headers in different places. Try to cover the most common ones. + +if test "$enable_pcre2test_libedit" = "yes"; then + AC_CHECK_HEADERS([editline/readline.h], [HAVE_EDITLINE_READLINE_H=1], + [AC_CHECK_HEADERS([edit/readline/readline.h], [HAVE_READLINE_READLINE_H=1], + [AC_CHECK_HEADERS([readline/readline.h], [HAVE_READLINE_READLINE_H=1])])]) + AC_CHECK_LIB([edit], [readline], [LIBEDIT="-ledit"]) +fi + +PCRE2_STATIC_CFLAG="" +if test "x$enable_shared" = "xno" ; then + AC_DEFINE([PCRE2_STATIC], [1], [ + Define to any value if linking statically (TODO: make nice with Libtool)]) + PCRE2_STATIC_CFLAG="-DPCRE2_STATIC" +fi +AC_SUBST(PCRE2_STATIC_CFLAG) + +# Here is where PCRE2-specific defines are handled + +if test "$enable_pcre2_8" = "yes"; then + AC_DEFINE([SUPPORT_PCRE2_8], [], [ + Define to any value to enable the 8 bit PCRE2 library.]) +fi + +if test "$enable_pcre2_16" = "yes"; then + AC_DEFINE([SUPPORT_PCRE2_16], [], [ + Define to any value to enable the 16 bit PCRE2 library.]) +fi + +if test "$enable_pcre2_32" = "yes"; then + AC_DEFINE([SUPPORT_PCRE2_32], [], [ + Define to any value to enable the 32 bit PCRE2 library.]) +fi + +if test "$enable_debug" = "yes"; then + AC_DEFINE([PCRE2_DEBUG], [], [ + Define to any value to include debugging code.]) +fi + +if test "$enable_percent_zt" = "no"; then + AC_DEFINE([DISABLE_PERCENT_ZT], [], [ + Define to any value to disable the use of the z and t modifiers in + formatting settings such as %zu or %td (this is rarely needed).]) +else + enable_percent_zt=auto +fi + +# Unless running under Windows, JIT support requires pthreads. 
+ +if test "$enable_jit" = "yes"; then + if test "$HAVE_WINDOWS_H" != "1"; then + AX_PTHREAD([], [AC_MSG_ERROR([JIT support requires pthreads])]) + CC="$PTHREAD_CC" + CFLAGS="$PTHREAD_CFLAGS $CFLAGS" + LIBS="$PTHREAD_LIBS $LIBS" + fi + AC_DEFINE([SUPPORT_JIT], [], [ + Define to any value to enable support for Just-In-Time compiling.]) +else + enable_pcre2grep_jit="no" +fi + +if test "$enable_jit_sealloc" = "yes"; then + AC_DEFINE([SLJIT_PROT_EXECUTABLE_ALLOCATOR], [1], [ + Define to any non-zero number to enable support for SELinux + compatible executable memory allocator in JIT. Note that this + will have no effect unless SUPPORT_JIT is also defined.]) +fi + +if test "$enable_pcre2grep_jit" = "yes"; then + AC_DEFINE([SUPPORT_PCRE2GREP_JIT], [], [ + Define to any value to enable JIT support in pcre2grep. Note that this will + have no effect unless SUPPORT_JIT is also defined.]) +fi + +if test "$enable_pcre2grep_callout" = "yes"; then + if test "$enable_pcre2grep_callout_fork" = "yes"; then + if test "$HAVE_WINDOWS_H" != "1"; then + if test "$HAVE_SYS_WAIT_H" != "1"; then + AC_MSG_ERROR([Callout script support needs sys/wait.h.]) + fi + fi + AC_DEFINE([SUPPORT_PCRE2GREP_CALLOUT_FORK], [], [ + Define to any value to enable fork support in pcre2grep callout scripts. + This will have no effect unless SUPPORT_PCRE2GREP_CALLOUT is also + defined.]) + fi + AC_DEFINE([SUPPORT_PCRE2GREP_CALLOUT], [], [ + Define to any value to enable callout script support in pcre2grep.]) +else + enable_pcre2grep_callout_fork="no" +fi + +if test "$enable_unicode" = "yes"; then + AC_DEFINE([SUPPORT_UNICODE], [], [ + Define to any value to enable support for Unicode and UTF encoding. + This will work even in an EBCDIC environment, but it is incompatible + with the EBCDIC macro. 
That is, PCRE2 can support *either* EBCDIC + code *or* ASCII/Unicode, but not both at once.]) +fi + +if test "$enable_pcre2grep_libz" = "yes"; then + AC_DEFINE([SUPPORT_LIBZ], [], [ + Define to any value to allow pcre2grep to be linked with libz, so that it is + able to handle .gz files.]) +fi + +if test "$enable_pcre2grep_libbz2" = "yes"; then + AC_DEFINE([SUPPORT_LIBBZ2], [], [ + Define to any value to allow pcre2grep to be linked with libbz2, so that it + is able to handle .bz2 files.]) +fi + +if test $with_pcre2grep_bufsize -lt 8192 ; then + AC_MSG_WARN([$with_pcre2grep_bufsize is too small for --with-pcre2grep-bufsize; using 8192]) + with_pcre2grep_bufsize="8192" +else + if test $? -gt 1 ; then + AC_MSG_ERROR([Bad value for --with-pcre2grep-bufsize]) + fi +fi + +if test $with_pcre2grep_max_bufsize -lt $with_pcre2grep_bufsize ; then + with_pcre2grep_max_bufsize="$with_pcre2grep_bufsize" +else + if test $? -gt 1 ; then + AC_MSG_ERROR([Bad value for --with-pcre2grep-max-bufsize]) + fi +fi + +AC_DEFINE_UNQUOTED([PCRE2GREP_BUFSIZE], [$with_pcre2grep_bufsize], [ + The value of PCRE2GREP_BUFSIZE is the starting size of the buffer used by + pcre2grep to hold parts of the file it is searching. The buffer will be + expanded up to PCRE2GREP_MAX_BUFSIZE if necessary, for files containing very + long lines. The actual amount of memory used by pcre2grep is three times this + number, because it allows for the buffering of "before" and "after" lines.]) + +AC_DEFINE_UNQUOTED([PCRE2GREP_MAX_BUFSIZE], [$with_pcre2grep_max_bufsize], [ + The value of PCRE2GREP_MAX_BUFSIZE specifies the maximum size of the buffer + used by pcre2grep to hold parts of the file it is searching. 
The actual + amount of memory used by pcre2grep is three times this number, because it + allows for the buffering of "before" and "after" lines.]) + +if test "$enable_pcre2test_libedit" = "yes"; then + AC_DEFINE([SUPPORT_LIBEDIT], [], [ + Define to any value to allow pcre2test to be linked with libedit.]) + LIBREADLINE="$LIBEDIT" +elif test "$enable_pcre2test_libreadline" = "yes"; then + AC_DEFINE([SUPPORT_LIBREADLINE], [], [ + Define to any value to allow pcre2test to be linked with libreadline.]) +fi + +AC_DEFINE_UNQUOTED([NEWLINE_DEFAULT], [$ac_pcre2_newline_value], [ + The value of NEWLINE_DEFAULT determines the default newline character + sequence. PCRE2 client programs can override this by selecting other values + at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), + 5 (ANYCRLF), and 6 (NUL).]) + +if test "$enable_bsr_anycrlf" = "yes"; then + AC_DEFINE([BSR_ANYCRLF], [], [ + By default, the \R escape sequence matches any Unicode line ending + character or sequence of characters. If BSR_ANYCRLF is defined (to any + value), this is changed so that backslash-R matches only CR, LF, or CRLF. + The build-time default can be overridden by the user of PCRE2 at runtime.]) +fi + +if test "$enable_never_backslash_C" = "yes"; then + AC_DEFINE([NEVER_BACKSLASH_C], [], [ + Defining NEVER_BACKSLASH_C locks out the use of \C in all patterns.]) +fi + +AC_DEFINE_UNQUOTED([LINK_SIZE], [$with_link_size], [ + The value of LINK_SIZE determines the number of bytes used to store + links as offsets within the compiled regex. The default is 2, which + allows for compiled patterns up to 65535 code units long. This covers the + vast majority of cases. However, PCRE2 can also be compiled to use 3 or 4 + bytes instead. This allows for longer patterns in extreme cases.]) + +AC_DEFINE_UNQUOTED([PARENS_NEST_LIMIT], [$with_parens_nest_limit], [ + The value of PARENS_NEST_LIMIT specifies the maximum depth of nested + parentheses (of any kind) in a pattern. 
This limits the amount of system + stack that is used while compiling a pattern.]) + +AC_DEFINE_UNQUOTED([MATCH_LIMIT], [$with_match_limit], [ + The value of MATCH_LIMIT determines the default number of times the + pcre2_match() function can record a backtrack position during a single + matching attempt. The value is also used to limit a loop counter in + pcre2_dfa_match(). There is a runtime interface for setting a different + limit. The limit exists in order to catch runaway regular expressions that + take for ever to determine that they do not match. The default is set very + large so that it does not accidentally catch legitimate cases.]) + +# --with-match-limit-recursion is an obsolete synonym for --with-match-limit-depth + +if test "$with_match_limit_recursion" != "UNSET"; then +cat < Delete Cache". + +1. Install the latest CMake version available from http://www.cmake.org/, and + ensure that cmake\bin is on your path. + +2. Unzip (retaining folder structure) the PCRE2 source tree into a source + directory such as C:\pcre2. You should ensure your local date and time + is not earlier than the file dates in your source dir if the release is + very new. + +3. Create a new, empty build directory, preferably a subdirectory of the + source dir. For example, C:\pcre2\pcre2-xx\build. + +4. Run cmake-gui from the Shell environment of your build tool, for example, + Msys for Msys/MinGW or Visual Studio Command Prompt for VC/VC++. Do not try + to start CMake from the Windows Start menu, as this can lead to errors. + +5. Enter C:\pcre2\pcre2-xx and C:\pcre2\pcre2-xx\build for the source and + build directories, respectively. + +6. Hit the "Configure" button. + +7. Select the particular IDE / build tool that you are using (Visual + Studio, MSYS makefiles, MinGW makefiles, etc.) + +8. The GUI will then list several configuration options. This is where + you can disable Unicode support or select other PCRE2 optional features. + +9. Hit "Configure" again.
The adjacent "Generate" button should now be + active. + +10. Hit "Generate". + +11. The build directory should now contain a usable build system, be it a + solution file for Visual Studio, makefiles for MinGW, etc. Exit from + cmake-gui and use the generated build system with your compiler or IDE. + E.g., for MinGW you can run "make", or for Visual Studio, open the PCRE2 + solution, select the desired configuration (Debug, or Release, etc.) and + build the ALL_BUILD project. + +12. If during configuration with cmake-gui you've elected to build the test + programs, you can execute them by building the test project. E.g., for + MinGW: "make test"; for Visual Studio build the RUN_TESTS project. The + most recent build configuration is targeted by the tests. A summary of + test results is presented. Complete test output is subsequently + available for review in Testing\Temporary under your build dir. + + +BUILDING PCRE2 ON WINDOWS WITH VISUAL STUDIO + +The code currently cannot be compiled without a stdint.h header, which is +available only in relatively recent versions of Visual Studio. However, this +portable and permissively-licensed implementation of the header worked without +issue: + + http://www.azillionmonkeys.com/qed/pstdint.h + +Just rename it and drop it into the top level of the build tree. + + +TESTING WITH RUNTEST.BAT + +If configured with CMake, building the test project ("make test" or building +ALL_TESTS in Visual Studio) creates (and runs) pcre2_test.bat (and depending +on your configuration options, possibly other test programs) in the build +directory. The pcre2_test.bat script runs RunTest.bat with correct source and +exe paths. + +For manual testing with RunTest.bat, provided the build dir is a subdirectory +of the source directory: Open command shell window. Chdir to the location +of your pcre2test.exe and pcre2grep.exe programs. Call RunTest.bat with +"..\RunTest.Bat" or "..\..\RunTest.bat" as appropriate. 
+ +To run only a particular test with RunTest.bat, provide a test number argument. + +Otherwise: + +1. Copy RunTest.bat into the directory where pcre2test.exe and pcre2grep.exe + have been created. + +2. Edit RunTest.bat to identify the full or relative location of + the pcre2 source (in which the testdata folder resides), e.g.: + + set srcdir=C:\pcre2\pcre2-10.00 + +3. In a Windows command environment, chdir to the location of your bat and + exe programs. + +4. Run RunTest.bat. Test outputs will automatically be compared to expected + results, and discrepancies will be identified in the console output. + +To independently test the just-in-time compiler, run pcre2_jit_test.exe. + + +BUILDING PCRE2 ON NATIVE Z/OS AND Z/VM + +z/OS and z/VM are operating systems for mainframe computers, produced by IBM. +The character code used is EBCDIC, not ASCII or Unicode. In z/OS, UNIX APIs and +applications can be supported through UNIX System Services, and in such an +environment it should be possible to build PCRE2 in the same way as in other +systems, with the EBCDIC related configuration settings, but it is not known if +anybody has tried this. + +In native z/OS (without UNIX System Services) and in z/VM, special ports are +required. For details, please see file 939 on this web site: + + http://www.cbttape.org + +Everything in that location, source and executable, is in EBCDIC and native +z/OS file formats. The port provides an API for LE languages such as COBOL and +for the z/OS and z/VM versions of the Rexx languages.
+ +=========================== +Last Updated: 28 April 2021 +=========================== diff --git a/src/pcre2/doc/html/README.txt b/src/pcre2/doc/html/README.txt new file mode 100644 index 00000000..d1a3120e --- /dev/null +++ b/src/pcre2/doc/html/README.txt @@ -0,0 +1,907 @@ +README file for PCRE2 (Perl-compatible regular expression library) +------------------------------------------------------------------ + +PCRE2 is a re-working of the original PCRE1 library to provide an entirely new +API. Since its initial release in 2015, there has been further development of +the code and it now differs from PCRE1 in more than just the API. There are new +features, and the internals have been improved. The original PCRE1 library is +now obsolete and should not be used in new projects. The latest release of +PCRE2 is available in three alternative formats from: + +https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz +https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2 +https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip + +There is a mailing list for discussion about the development of PCRE at +pcre-dev@exim.org. You can access the archives and subscribe or manage your +subscription here: + + https://lists.exim.org/mailman/listinfo/pcre-dev + +Please read the NEWS file if you are upgrading from a previous release. The +contents of this README file are: + + The PCRE2 APIs + Documentation for PCRE2 + Contributions by users of PCRE2 + Building PCRE2 on non-Unix-like systems + Building PCRE2 without using autotools + Building PCRE2 using autotools + Retrieving configuration information + Shared libraries + Cross-compiling using autotools + Making new tarballs + Testing PCRE2 + Character tables + File manifest + + +The PCRE2 APIs +-------------- + +PCRE2 is written in C, and it has its own API. 
There are three sets of +functions, one for the 8-bit library, which processes strings of bytes, one for +the 16-bit library, which processes strings of 16-bit values, and one for the +32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there +are no C++ wrappers. + +The distribution does contain a set of C wrapper functions for the 8-bit +library that are based on the POSIX regular expression API (see the pcre2posix +man page). These are built into a library called libpcre2-posix. Note that this +just provides a POSIX calling interface to PCRE2; the regular expressions +themselves still follow Perl syntax and semantics. The POSIX API is restricted, +and does not give full access to all of PCRE2's facilities. + +The header file for the POSIX-style functions is called pcre2posix.h. The +official POSIX name is regex.h, but I did not want to risk possible problems +with existing files of that name by distributing it that way. To use PCRE2 with +an existing program that uses the POSIX API, pcre2posix.h will have to be +renamed or pointed at by a link (or the program modified, of course). See the +pcre2posix documentation for more details. + + +Documentation for PCRE2 +----------------------- + +If you install PCRE2 in the normal way on a Unix-like system, you will end up +with a set of man pages whose names all start with "pcre2". The one that is +just called "pcre2" lists all the others. In addition to these man pages, the +PCRE2 documentation is supplied in two other forms: + + 1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and + doc/pcre2test.txt in the source distribution. The first of these is a + concatenation of the text forms of all the section 3 man pages except the + listing of pcre2demo.c and those that summarize individual functions. The + other two are the text forms of the section 1 man pages for the pcre2grep + and pcre2test commands. These text forms are provided for ease of scanning + with text editors or similar tools. 
They are installed in + <prefix>/share/doc/pcre2, where <prefix> is the installation prefix + (defaulting to /usr/local). + + 2. A set of files containing all the documentation in HTML form, hyperlinked + in various ways, and rooted in a file called index.html, is distributed in + doc/html and installed in <prefix>/share/doc/pcre2/html. + + +Building PCRE2 on non-Unix-like systems +--------------------------------------- + +For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if +your system supports the use of "configure" and "make" you may be able to build +PCRE2 using autotools in the same way as for many Unix-like systems. + +PCRE2 can also be configured using CMake, which can be run in various ways +(command line, GUI, etc). This creates Makefiles, solution files, etc. The file +NON-AUTOTOOLS-BUILD has information about CMake. + +PCRE2 has been compiled on many different operating systems. It should be +straightforward to build PCRE2 on any system that has a Standard C compiler and +library, because it uses only Standard C functions. + + +Building PCRE2 without using autotools +-------------------------------------- + +The use of autotools (in particular, libtool) is problematic in some +environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD +file for ways of building PCRE2 without using autotools. + + +Building PCRE2 using autotools +------------------------------ + +The following instructions assume the use of the widely used "configure; make; +make install" (autotools) process. + +To build PCRE2 on a system that supports autotools, first run the "configure" +command from the PCRE2 distribution directory, with your current directory set +to the directory where you want the files to be created. This command is a +standard GNU "autoconf" configuration script, for which generic instructions +are supplied in the file INSTALL.
+ +Most commonly, people build PCRE2 within its own distribution directory, and in +this case, on many systems, just running "./configure" is sufficient. However, +the usual methods of changing standard defaults are available. For example: + +CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local + +This command specifies that the C compiler should be run with the flags '-O2 +-Wall' instead of the default, and that "make install" should install PCRE2 +under /opt/local instead of the default /usr/local. + +If you want to build in a different directory, just run "configure" with that +directory as current. For example, suppose you have unpacked the PCRE2 source +into /source/pcre2/pcre2-xxx, but you want to build it in +/build/pcre2/pcre2-xxx: + +cd /build/pcre2/pcre2-xxx +/source/pcre2/pcre2-xxx/configure + +PCRE2 is written in C and is normally compiled as a C library. However, it is +possible to build it as a C++ library, though the provided building apparatus +does not have any features to support this. + +There are some optional features that can be included or omitted from the PCRE2 +library. They are also documented in the pcre2build man page. + +. By default, both shared and static libraries are built. You can change this + by adding one of these options to the "configure" command: + + --disable-shared + --disable-static + + (See also "Shared libraries on Unix-like systems" below.) + +. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to + the "configure" command, the 16-bit library is also built. If you add + --enable-pcre2-32 to the "configure" command, the 32-bit library is also + built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8 + to disable building the 8-bit library. + +. If you want to include support for just-in-time (JIT) compiling, which can + give large performance improvements on certain platforms, add --enable-jit to + the "configure" command. 
This support is available only for certain hardware + architectures. If you try to enable it on an unsupported architecture, there + will be a compile time error. If in doubt, use --enable-jit=auto, which + enables JIT only if the current hardware is supported. + +. If you are enabling JIT under SELinux environment you may also want to add + --enable-jit-sealloc, which enables the use of an executable memory allocator + that is compatible with SELinux. Warning: this allocator is experimental! + It does not support fork() operation and may crash when no disk space is + available. This option has no effect if JIT is disabled. + +. If you do not want to make use of the default support for UTF-8 Unicode + character strings in the 8-bit library, UTF-16 Unicode character strings in + the 16-bit library, or UTF-32 Unicode character strings in the 32-bit + library, you can add --disable-unicode to the "configure" command. This + reduces the size of the libraries. It is not possible to configure one + library with Unicode support, and another without, in the same configuration. + It is also not possible to use --enable-ebcdic (see below) with Unicode + support, so if this option is set, you must also use --disable-unicode. + + When Unicode support is available, the use of a UTF encoding still has to be + enabled by setting the PCRE2_UTF option at run time or starting a pattern + with (*UTF). When PCRE2 is compiled with Unicode support, its input can only + either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms. + + As well as supporting UTF strings, Unicode support includes support for the + \P, \p, and \X sequences that recognize Unicode character properties. + However, only the basic two-letter properties such as Lu are supported. + Escape sequences such as \d and \w in patterns do not by default make use of + Unicode properties, but can be made to do so by setting the PCRE2_UCP option + or starting a pattern with (*UCP). + +. 
You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any + of the preceding, or any of the Unicode newline sequences, or the NUL (zero) + character as indicating the end of a line. Whatever you specify at build time + is the default; the caller of PCRE2 can change the selection at run time. The + default newline indicator is a single LF character (the Unix standard). You + can specify the default newline indicator by adding --enable-newline-is-cr, + --enable-newline-is-lf, --enable-newline-is-crlf, + --enable-newline-is-anycrlf, --enable-newline-is-any, or + --enable-newline-is-nul to the "configure" command, respectively. + +. By default, the sequence \R in a pattern matches any Unicode line ending + sequence. This is independent of the option specifying what PCRE2 considers + to be the end of a line (see above). However, the caller of PCRE2 can + restrict \R to match only CR, LF, or CRLF. You can make this the default by + adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R"). + +. In a pattern, the escape sequence \C matches a single code unit, even in a + UTF mode. This can be dangerous because it breaks up multi-code-unit + characters. You can build PCRE2 with the use of \C permanently locked out by + adding --enable-never-backslash-C (note the upper case C) to the "configure" + command. When \C is allowed by the library, individual applications can lock + it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option. + +. PCRE2 has a counter that limits the depth of nesting of parentheses in a + pattern. This limits the amount of system stack that a pattern uses when it + is compiled. The default is 250, but you can change it by setting, for + example, + + --with-parens-nest-limit=500 + +. PCRE2 has a counter that can be set to limit the amount of computing resource + it uses when matching a pattern. If the limit is exceeded during a match, the + match fails. The default is ten million. 
You can change the default by + setting, for example, + + --with-match-limit=500000 + + on the "configure" command. This is just the default; individual calls to + pcre2_match() or pcre2_dfa_match() can supply their own value. There is more + discussion in the pcre2api man page (search for pcre2_set_match_limit). + +. There is a separate counter that limits the depth of nested backtracking + (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a + matching process, which indirectly limits the amount of heap memory that is + used, and in the case of pcre2_dfa_match() the amount of stack as well. This + counter also has a default of ten million, which is essentially "unlimited". + You can change the default by setting, for example, + + --with-match-limit-depth=5000 + + There is more discussion in the pcre2api man page (search for + pcre2_set_depth_limit). + +. You can also set an explicit limit on the amount of heap memory used by + the pcre2_match() and pcre2_dfa_match() interpreters: + + --with-heap-limit=500 + + The units are kibibytes (units of 1024 bytes). This limit does not apply when + the JIT optimization (which has its own memory control features) is used. + There is more discussion on the pcre2api man page (search for + pcre2_set_heap_limit). + +. In the 8-bit library, the default maximum compiled pattern size is around + 64 kibibytes. You can increase this by adding --with-link-size=3 to the + "configure" command. PCRE2 then uses three bytes instead of two for offsets + to different parts of the compiled pattern. In the 16-bit library, + --with-link-size=3 is the same as --with-link-size=4, which (in both + libraries) uses four-byte offsets. Increasing the internal link size reduces + performance in the 8-bit and 16-bit libraries. In the 32-bit library, the + link size setting is ignored, as 4-byte offsets are always used. + +. 
For speed, PCRE2 uses four tables for manipulating and identifying characters + whose code point values are less than 256. By default, it uses a set of + tables for ASCII encoding that is part of the distribution. If you specify + + --enable-rebuild-chartables + + a program called pcre2_dftables is compiled and run in the default C locale + when you obey "make". It builds a source file called pcre2_chartables.c. If + you do not specify this option, pcre2_chartables.c is created as a copy of + pcre2_chartables.c.dist. See "Character tables" below for further + information. + +. It is possible to compile PCRE2 for use on systems that use EBCDIC as their + character code (as opposed to ASCII/Unicode) by specifying + + --enable-ebcdic --disable-unicode + + This automatically implies --enable-rebuild-chartables (see above). However, + when PCRE2 is built this way, it always operates in EBCDIC. It cannot support + both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25, + which specifies that the code value for the EBCDIC NL character is 0x25 + instead of the default 0x15. + +. If you specify --enable-debug, additional debugging code is included in the + build. This option is intended for use by the PCRE2 maintainers. + +. In environments where valgrind is installed, if you specify + + --enable-valgrind + + PCRE2 will use valgrind annotations to mark certain memory regions as + unaddressable. This allows it to detect invalid memory accesses, and is + mostly useful for debugging PCRE2 itself. + +. In environments where the gcc compiler is used and lcov is installed, if you + specify + + --enable-coverage + + the build process implements a code coverage report for the test suite. The + report is generated by running "make coverage". If ccache is installed on + your system, it must be disabled when building PCRE2 for coverage reporting. + You can do this by setting the environment variable CCACHE_DISABLE=1 before + running "make" to build PCRE2. 
There is more information about coverage + reporting in the "pcre2build" documentation. + +. When JIT support is enabled, pcre2grep automatically makes use of it, unless + you add --disable-pcre2grep-jit to the "configure" command. + +. There is support for calling external programs during matching in the + pcre2grep command, using PCRE2's callout facility with string arguments. This + support can be disabled by adding --disable-pcre2grep-callout to the + "configure" command. There are two kinds of callout: one that generates + output from inbuilt code, and another that calls an external program. The + latter has special support for Windows and VMS; otherwise it assumes the + existence of the fork() function. This facility can be disabled by adding + --disable-pcre2grep-callout-fork to the "configure" command. + +. The pcre2grep program currently supports only 8-bit data files, and so + requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use + libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by + specifying one or both of + + --enable-pcre2grep-libz + --enable-pcre2grep-libbz2 + + Of course, the relevant libraries must be installed on your system. + +. The default starting size (in bytes) of the internal buffer used by pcre2grep + can be set by, for example: + + --with-pcre2grep-bufsize=51200 + + The value must be a plain integer. The default is 20480. The amount of memory + used by pcre2grep is actually three times this number, to allow for "before" + and "after" lines. If very long lines are encountered, the buffer is + automatically enlarged, up to a fixed maximum size. + +. The default maximum size of pcre2grep's internal buffer can be set by, for + example: + + --with-pcre2grep-max-bufsize=2097152 + + The default is either 1048576 or the value of --with-pcre2grep-bufsize, + whichever is the larger. + +. 
It is possible to compile pcre2test so that it links with the libreadline + or libedit libraries, by specifying, respectively, + + --enable-pcre2test-libreadline or --enable-pcre2test-libedit + + If this is done, when pcre2test's input is from a terminal, it reads it using + the readline() function. This provides line-editing and history facilities. + Note that libreadline is GPL-licenced, so if you distribute a binary of + pcre2test linked in this way, there may be licensing issues. These can be + avoided by linking with libedit (which has a BSD licence) instead. + + Enabling libreadline causes the -lreadline option to be added to the + pcre2test build. In many operating environments with a system-installed + readline library this is sufficient. However, in some environments (e.g. if + an unmodified distribution version of readline is in use), it may be + necessary to specify something like LIBS="-lncurses" as well. This is + because, to quote the readline INSTALL, "Readline uses the termcap functions, + but does not link with the termcap or curses library itself, allowing + applications which link with readline the to choose an appropriate library." + If you get error messages about missing functions tgetstr, tgetent, tputs, + tgetflag, or tgoto, this is the problem, and linking with the ncurses library + should fix it. + +. The C99 standard defines formatting modifiers z and t for size_t and + ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in + environments other than Microsoft Visual Studio when __STDC_VERSION__ is + defined and has a value greater than or equal to 199901L (indicating C99). + However, there is at least one environment that claims to be C99 but does not + support these modifiers. If --disable-percent-zt is specified, no use is made + of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for + size_t values. + +. 
There is a special option called --enable-fuzz-support for use by people who + want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit + library. If set, it causes an extra library called libpcre2-fuzzsupport.a to + be built, but not installed. This contains a single function called + LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the + length of the string. When called, this function tries to compile the string + as a pattern, and if that succeeds, to match it. This is done both with no + options and with some random option bits that are generated from the string. + Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to + be created. This is normally run under valgrind or used when PCRE2 is + compiled with address sanitizing enabled. It calls the fuzzing function and + outputs information about what it is doing. The input strings are specified by + arguments: if an argument starts with "=" the rest of it is a literal input + string. Otherwise, it is assumed to be a file name, and the contents of the + file are the test string. + +. Releases before 10.30 could be compiled with --disable-stack-for-recursion, + which caused pcre2_match() to use individual blocks on the heap for + backtracking instead of recursive function calls (which use the stack). This + is now obsolete since pcre2_match() was refactored always to use the heap (in + a much more efficient way than before). This option is retained for backwards + compatibility, but has no effect other than to output a warning. + +The "configure" script builds the following files for the basic C library: + +. Makefile the makefile that builds the library +. src/config.h build-time configuration options for the library +. src/pcre2.h the public PCRE2 header file +. pcre2-config script that shows the building settings such as CFLAGS + that were set for "configure" +. libpcre2-8.pc ) +. libpcre2-16.pc ) data for the pkg-config command +. libpcre2-32.pc ) +. 
libpcre2-posix.pc ) +. libtool script that builds shared and/or static libraries + +Versions of config.h and pcre2.h are distributed in the src directory of PCRE2 +tarballs under the names config.h.generic and pcre2.h.generic. These are +provided for those who have to build PCRE2 without using "configure" or CMake. +If you use "configure" or CMake, the .generic versions are not used. + +The "configure" script also creates config.status, which is an executable +script that can be run to recreate the configuration, and config.log, which +contains compiler output from tests that "configure" runs. + +Once "configure" has run, you can run "make". This builds whichever of the +libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test +program called pcre2test. If you enabled JIT support with --enable-jit, another +test program called pcre2_jit_test is built as well. If the 8-bit library is +built, libpcre2-posix and the pcre2grep command are also built. Running +"make" with the -j option may speed up compilation on multiprocessor systems. + +The command "make check" runs all the appropriate tests. Details of the PCRE2 +tests are given below in a separate section of this document. The -j option of +"make" can also be used when running the tests. + +You can use "make install" to install PCRE2 into live directories on your +system. 
The following are installed (file names are all relative to the +<prefix> that is set when "configure" is run): + + Commands (bin): + pcre2test + pcre2grep (if 8-bit support is enabled) + pcre2-config + + Libraries (lib): + libpcre2-8 (if 8-bit support is enabled) + libpcre2-16 (if 16-bit support is enabled) + libpcre2-32 (if 32-bit support is enabled) + libpcre2-posix (if 8-bit support is enabled) + + Configuration information (lib/pkgconfig): + libpcre2-8.pc + libpcre2-16.pc + libpcre2-32.pc + libpcre2-posix.pc + + Header files (include): + pcre2.h + pcre2posix.h + + Man pages (share/man/man{1,3}): + pcre2grep.1 + pcre2test.1 + pcre2-config.1 + pcre2.3 + pcre2*.3 (lots more pages, all starting "pcre2") + + HTML documentation (share/doc/pcre2/html): + index.html + *.html (lots more pages, hyperlinked from index.html) + + Text file documentation (share/doc/pcre2): + AUTHORS + COPYING + ChangeLog + LICENCE + NEWS + README + pcre2.txt (a concatenation of the man(3) pages) + pcre2test.txt the pcre2test man page + pcre2grep.txt the pcre2grep man page + pcre2-config.txt the pcre2-config man page + +If you want to remove PCRE2 from your system, you can run "make uninstall". +This removes all the files that "make install" installed. However, it does not +remove any directories, because these are often shared with other programs. + + +Retrieving configuration information +------------------------------------ + +Running "make install" installs the command pcre2-config, which can be used to +recall information about the PCRE2 configuration and installation. For example: + + pcre2-config --version + +prints the version number, and + + pcre2-config --libs8 + +outputs information about where the 8-bit library is installed. This command +can be included in makefiles for programs that use PCRE2, saving the programmer +from having to remember too many details. Run pcre2-config with no arguments to +obtain a list of possible arguments. 
+ +The pkg-config command is another system for saving and retrieving information +about installed libraries. Instead of separate commands for each library, a +single command is used. For example: + + pkg-config --libs libpcre2-16 + +The data is held in *.pc files that are installed in a directory called +<prefix>/lib/pkgconfig. + + +Shared libraries +---------------- + +The default distribution builds PCRE2 as shared libraries and static libraries, +as long as the operating system supports shared libraries. Shared library +support relies on the "libtool" script which is built as part of the +"configure" process. + +The libtool script is used to compile and link both shared and static +libraries. They are placed in a subdirectory called .libs when they are newly +built. The programs pcre2test and pcre2grep are built to use these uninstalled +libraries (by means of wrapper scripts in the case of shared libraries). When +you use "make install" to install shared libraries, pcre2grep and pcre2test are +automatically re-built to use the newly installed shared libraries before being +installed themselves. However, the versions left in the build directory still +use the uninstalled libraries. + +To build PCRE2 using static libraries only you must use --disable-shared when +configuring it. For example: + +./configure --prefix=/usr/gnu --disable-shared + +Then run "make" in the usual way. Similarly, you can use --disable-static to +build only shared libraries. + + +Cross-compiling using autotools +------------------------------- + +You can specify CC and CFLAGS in the normal way to the "configure" command, in +order to cross-compile PCRE2 for some other host. However, you should NOT +specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c +source file is compiled and run on the local host, in order to generate the +inbuilt character tables (the pcre2_chartables.c file). 
This will probably not +work, because pcre2_dftables.c needs to be compiled with the local compiler, +not the cross compiler. + +When --enable-rebuild-chartables is not specified, pcre2_chartables.c is +created by making a copy of pcre2_chartables.c.dist, which is a default set of +tables that assumes ASCII code. Cross-compiling with the default tables should +not be a problem. + +If you need to modify the character tables when cross-compiling, you should +move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by +hand and run it on the local host to make a new version of +pcre2_chartables.c.dist. See the pcre2build section "Creating character tables +at build time" for more details. + + +Making new tarballs +------------------- + +The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and +zip formats. The command "make distcheck" does the same, but then does a trial +build of the new distribution to ensure that it works. + +If you have modified any of the man page sources in the doc directory, you +should first run the PrepareRelease script before making a distribution. This +script creates the .txt and HTML forms of the documentation from the man pages. + + +Testing PCRE2 +------------- + +To test the basic PCRE2 library on a Unix-like system, run the RunTest script. +There is another script called RunGrepTest that tests the pcre2grep command. +When JIT support is enabled, a third test program called pcre2_jit_test is +built. Both the scripts and all the program tests are run if you obey "make +check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD. + +The RunTest script runs the pcre2test test program (which is documented in its +own man page) on each of the relevant testinput files in the testdata +directory, and compares the output with the contents of the corresponding +testoutput files. RunTest uses a file called testtry to hold the main output +from pcre2test. 
Other files whose names begin with "test" are used as working +files in some tests. + +Some tests are relevant only when certain build-time options were selected. For +example, the tests for UTF-8/16/32 features are run only when Unicode support +is available. RunTest outputs a comment when it skips a test. + +Many (but not all) of the tests that are not skipped are run twice if JIT +support is available. On the second run, JIT compilation is forced. This +testing can be suppressed by putting "nojit" on the RunTest command line. + +The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit +libraries that are enabled. If you want to run just one set of tests, call +RunTest with either the -8, -16 or -32 option. + +If valgrind is installed, you can run the tests under it by putting "valgrind" +on the RunTest command line. To run pcre2test on just one or more specific test +files, give their numbers as arguments to RunTest, for example: + + RunTest 2 7 11 + +You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the +end), or a number preceded by ~ to exclude a test. For example: + + RunTest 3-15 ~10 + +This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests +except test 13. Whatever order the arguments are in, the tests are always run +in numerical order. + +You can also call RunTest with the single argument "list" to cause it to output +a list of tests. + +The test sequence starts with "test 0", which is a special test that has no +input file, and whose output is not checked. This is because it will be +different on different hardware and with different configurations. The test +exists in order to exercise some of pcre2test's code that would not otherwise +be run. + +Tests 1 and 2 can always be run, as they expect only plain text strings (not +UTF) and make no use of Unicode properties. The first test file can be fed +directly into the perltest.sh script to check that Perl gives the same results. 
+The only difference you should see is in the first few lines, where the Perl +version is given instead of the PCRE2 version. The second set of tests check +auxiliary functions, error detection, and run-time flags that are specific to +PCRE2. It also uses the debugging flags to check some of the internals of +pcre2_compile(). + +If you build PCRE2 with a locale setting that is not the standard C locale, the +character tables may be different (see next paragraph). In some cases, this may +cause failures in the second set of tests. For example, in a locale where the +isprint() function yields TRUE for characters in the range 128-255, the use of +[:isascii:] inside a character class defines a different set of characters, and +this shows up in this test as a difference in the compiled code, which is being +listed for checking. For example, where the comparison test output contains +[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other +cases. This is not a bug in PCRE2. + +Test 3 checks pcre2_maketables(), the facility for building a set of character +tables for a specific locale and using them instead of the default tables. The +script uses the "locale" command to check for the availability of the "fr_FR", +"french", or "fr" locale, and uses the first one that it finds. If the "locale" +command fails, or if its output doesn't include "fr_FR", "french", or "fr" in +the list of available locales, the third test cannot be run, and a comment is +output to say why. If running this test produces an error like this: + + ** Failed to set locale "fr_FR" + +it means that the given locale is not available on your system, despite being +listed by "locale". This does not mean that PCRE2 is broken. There are three +alternative output files for the third test, because three different versions +of the French locale have been encountered. The test passes if its output +matches any one of them. 
+ +Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible +with the perltest.sh script, and test 5 checking PCRE2-specific things. + +Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in +non-UTF mode and UTF-mode with Unicode property support, respectively. + +Test 8 checks some internal offsets and code size features, but it is run only +when Unicode support is enabled. The output is different in 8-bit, 16-bit, and +32-bit modes and for different link sizes, so there are different output files +for each mode and link size. + +Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in +16-bit and 32-bit modes. These are tests that generate different output in +8-bit mode. Each pair is for general cases and Unicode support, respectively. + +Test 13 checks the handling of non-UTF characters greater than 255 by +pcre2_dfa_match() in 16-bit and 32-bit modes. + +Test 14 contains some special UTF and UCP tests that give different output for +different code unit widths. + +Test 15 contains a number of tests that must not be run with JIT. They check, +among other non-JIT things, the match-limiting features of the interpretive +matcher. + +Test 16 is run only when JIT support is not available. It checks that an +attempt to use JIT has the expected behaviour. + +Test 17 is run only when JIT support is available. It checks JIT complete and +partial modes, match-limiting under JIT, and other JIT-specific features. + +Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to +the 8-bit library, without and with Unicode support, respectively. + +Test 20 checks the serialization functions by writing a set of compiled +patterns to a file, and then reloading and checking them. + +Tests 21 and 22 test \C support when the use of \C is not locked out, without +and with UTF support, respectively. Test 23 tests \C when it is locked out. 
+ +Tests 24 and 25 test the experimental pattern conversion functions, without and +with UTF support, respectively. + + +Character tables +---------------- + +For speed, PCRE2 uses four tables for manipulating and identifying characters +whose code point values are less than 256. By default, a set of tables that is +built into the library is used. The pcre2_maketables() function can be called +by an application to create a new set of tables in the current locale. These are +passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a +compile context. + +The source file called pcre2_chartables.c contains the default set of tables. +By default, this is created as a copy of pcre2_chartables.c.dist, which +contains tables for ASCII coding. However, if --enable-rebuild-chartables is +specified for ./configure, a new version of pcre2_chartables.c is built by the +program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C +character handling functions such as isalnum(), isalpha(), isupper(), +islower(), etc. to build the table sources. This means that the default C +locale that is set for your system will control the contents of these default +tables. You can change the default tables by editing pcre2_chartables.c and +then re-building PCRE2. If you do this, you should take care to ensure that the +file does not get automatically re-generated. The best way to do this is to +move pcre2_chartables.c.dist out of the way and replace it with your customized +tables. + +When the pcre2_dftables program is run as a result of specifying +--enable-rebuild-chartables, it uses the default C locale that is set on your +system. It does not pay attention to the LC_xxx environment variables. In other +words, it uses the system's default locale rather than whatever the compiling +user happens to have set. 
If you really do want to build a source set of +character tables in a locale that is specified by the LC_xxx variables, you can +run the pcre2_dftables program by hand with the -L option. For example: + + ./pcre2_dftables -L pcre2_chartables.c.special + +The second argument names the file where the source code for the tables is +written. The first two 256-byte tables provide lower casing and case flipping +functions, respectively. The next table consists of a number of 32-byte bit +maps which identify certain character classes such as digits, "word" +characters, white space, etc. These are used when building 32-byte bit maps +that represent character classes for code points less than 256. The final +256-byte table has bits indicating various character types, as follows: + + 1 white space character + 2 letter + 4 lower case letter + 8 decimal digit + 16 alphanumeric or '_' + +You can also specify -b (with or without -L) when running pcre2_dftables. This +causes the tables to be written in binary instead of as source code. A set of +binary tables can be loaded into memory by an application and passed to +pcre2_compile() in the same way as tables created dynamically by calling +pcre2_maketables(). The tables are just a string of bytes, independent of +hardware characteristics such as endianness. This means they can be bundled +with an application that runs in different environments, to ensure consistent +behaviour. + +See also the pcre2build section "Creating character tables at build time". + + +File manifest +------------- + +The distribution should contain the files listed below. 
+ +(A) Source files for the PCRE2 library functions and their headers are found in + the src directory: + + src/pcre2_dftables.c auxiliary program for building pcre2_chartables.c + when --enable-rebuild-chartables is specified + + src/pcre2_chartables.c.dist a default set of character tables that assume + ASCII coding; unless --enable-rebuild-chartables is + specified, used by copying to pcre2_chartables.c + + src/pcre2posix.c ) + src/pcre2_auto_possess.c ) + src/pcre2_compile.c ) + src/pcre2_config.c ) + src/pcre2_context.c ) + src/pcre2_convert.c ) + src/pcre2_dfa_match.c ) + src/pcre2_error.c ) + src/pcre2_extuni.c ) + src/pcre2_find_bracket.c ) + src/pcre2_jit_compile.c ) + src/pcre2_jit_match.c ) sources for the functions in the library, + src/pcre2_jit_misc.c ) and some internal functions that they use + src/pcre2_maketables.c ) + src/pcre2_match.c ) + src/pcre2_match_data.c ) + src/pcre2_newline.c ) + src/pcre2_ord2utf.c ) + src/pcre2_pattern_info.c ) + src/pcre2_script_run.c ) + src/pcre2_serialize.c ) + src/pcre2_string_utils.c ) + src/pcre2_study.c ) + src/pcre2_substitute.c ) + src/pcre2_substring.c ) + src/pcre2_tables.c ) + src/pcre2_ucd.c ) + src/pcre2_valid_utf.c ) + src/pcre2_xclass.c ) + + src/pcre2_printint.c debugging function that is used by pcre2test, + src/pcre2_fuzzsupport.c function for (optional) fuzzing support + + src/config.h.in template for config.h, when built by "configure" + src/pcre2.h.in template for pcre2.h when built by "configure" + src/pcre2posix.h header for the external POSIX wrapper API + src/pcre2_internal.h header for internal use + src/pcre2_intmodedep.h a mode-specific internal header + src/pcre2_ucp.h header for Unicode property handling + + sljit/* source files for the JIT compiler + +(B) Source files for programs that use PCRE2: + + src/pcre2demo.c simple demonstration of coding calls to PCRE2 + src/pcre2grep.c source of a grep utility that uses PCRE2 + src/pcre2test.c comprehensive test program + src/pcre2_jit_test.c 
JIT test program + +(C) Auxiliary files: + + 132html script to turn "man" pages into HTML + AUTHORS information about the author of PCRE2 + ChangeLog log of changes to the code + CleanTxt script to clean nroff output for txt man pages + Detrail script to remove trailing spaces + HACKING some notes about the internals of PCRE2 + INSTALL generic installation instructions + LICENCE conditions for the use of PCRE2 + COPYING the same, using GNU's standard name + Makefile.in ) template for Unix Makefile, which is built by + ) "configure" + Makefile.am ) the automake input that was used to create + ) Makefile.in + NEWS important changes in this release + NON-AUTOTOOLS-BUILD notes on building PCRE2 without using autotools + PrepareRelease script to make preparations for "make dist" + README this file + RunTest a Unix shell script for running tests + RunGrepTest a Unix shell script for pcre2grep tests + aclocal.m4 m4 macros (generated by "aclocal") + config.guess ) files used by libtool, + config.sub ) used only when building a shared library + configure a configuring shell script (built by autoconf) + configure.ac ) the autoconf input that was used to build + ) "configure" and config.h + depcomp ) script to find program dependencies, generated by + ) automake + doc/*.3 man page sources for PCRE2 + doc/*.1 man page sources for pcre2grep and pcre2test + doc/index.html.src the base HTML page + doc/html/* HTML documentation + doc/pcre2.txt plain text version of the man pages + doc/pcre2test.txt plain text documentation of test program + install-sh a shell script for installing files + libpcre2-8.pc.in template for libpcre2-8.pc for pkg-config + libpcre2-16.pc.in template for libpcre2-16.pc for pkg-config + libpcre2-32.pc.in template for libpcre2-32.pc for pkg-config + libpcre2-posix.pc.in template for libpcre2-posix.pc for pkg-config + ltmain.sh file used to build a libtool script + missing ) common stub for a few missing GNU programs while + ) installing, generated by 
automake + mkinstalldirs script for making install directories + perltest.sh Script for running a Perl test program + pcre2-config.in source of script which retains PCRE2 information + testdata/testinput* test data for main library tests + testdata/testoutput* expected test results + testdata/grep* input and output for pcre2grep tests + testdata/* other supporting test files + +(D) Auxiliary files for cmake support + + cmake/COPYING-CMAKE-SCRIPTS + cmake/FindPackageHandleStandardArgs.cmake + cmake/FindEditline.cmake + cmake/FindReadline.cmake + CMakeLists.txt + config-cmake.h.in + +(E) Auxiliary files for building PCRE2 "by hand" + + src/pcre2.h.generic ) a version of the public PCRE2 header file + ) for use in non-"configure" environments + src/config.h.generic ) a version of config.h for use in non-"configure" + ) environments + +Philip Hazel +Email local part: Philip.Hazel +Email domain: gmail.com +Last updated: 28 April 2021 diff --git a/src/pcre2/doc/html/index.html b/src/pcre2/doc/html/index.html new file mode 100644 index 00000000..2c7c5fb2 --- /dev/null +++ b/src/pcre2/doc/html/index.html @@ -0,0 +1,312 @@ + + + +PCRE2 specification + + +

    Perl-compatible Regular Expressions (revised API: PCRE2)

    +

    +The HTML documentation for PCRE2 consists of a number of pages that are listed +below in alphabetical order. If you are new to PCRE2, please read the first one +first. +

    pcre2  Introductory page
    pcre2-config  Information about the installation configuration
    pcre2api  PCRE2's native API
    pcre2build  Building PCRE2
    pcre2callout  The callout facility
    pcre2compat  Compatibility with Perl
    pcre2convert  Experimental foreign pattern conversion functions
    pcre2demo  A demonstration C program that uses the PCRE2 library
    pcre2grep  The pcre2grep command
    pcre2jit  Discussion of the just-in-time optimization support
    pcre2limits  Details of size and other limits
    pcre2matching  Discussion of the two matching algorithms
    pcre2partial  Using PCRE2 for partial matching
    pcre2pattern  Specification of the regular expressions supported by PCRE2
    pcre2perform  Some comments on performance
    pcre2posix  The POSIX API to the PCRE2 8-bit library
    pcre2sample  Discussion of the pcre2demo program
    pcre2serialize  Serializing functions for saving precompiled patterns
    pcre2syntax  Syntax quick-reference summary
    pcre2test  The pcre2test command for testing PCRE2
    pcre2unicode  Discussion of Unicode and UTF-8/UTF-16/UTF-32 support
    + +

    +There are also individual pages that summarize the interface for each function +in the library. +

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    pcre2_callout_enumerate  Enumerate callouts in a compiled pattern
    pcre2_code_copy  Copy a compiled pattern
    pcre2_code_copy_with_tables  Copy a compiled pattern and its character tables
    pcre2_code_free  Free a compiled pattern
    pcre2_compile  Compile a regular expression pattern
    pcre2_compile_context_copy  Copy a compile context
    pcre2_compile_context_create  Create a compile context
    pcre2_compile_context_free  Free a compile context
    pcre2_config  Show build-time configuration options
    pcre2_convert_context_copy  Copy a convert context
    pcre2_convert_context_create  Create a convert context
    pcre2_convert_context_free  Free a convert context
    pcre2_converted_pattern_free  Free converted foreign pattern
    pcre2_dfa_match  Match a compiled pattern to a subject string + (DFA algorithm; not Perl compatible)
    pcre2_general_context_copy  Copy a general context
    pcre2_general_context_create  Create a general context
    pcre2_general_context_free  Free a general context
    pcre2_get_error_message  Get textual error message for error number
    pcre2_get_mark  Get a (*MARK) name
    pcre2_get_match_data_size  Get the size of a match data block
    pcre2_get_ovector_count  Get the ovector count
    pcre2_get_ovector_pointer  Get a pointer to the ovector
    pcre2_get_startchar  Get the starting character offset
    pcre2_jit_compile  Process a compiled pattern with the JIT compiler
    pcre2_jit_free_unused_memory  Free unused JIT memory
    pcre2_jit_match  Fast path interface to JIT matching
    pcre2_jit_stack_assign  Assign stack for JIT matching
    pcre2_jit_stack_create  Create a stack for JIT matching
    pcre2_jit_stack_free  Free a JIT matching stack
    pcre2_maketables  Build character tables in current locale
    pcre2_maketables_free  Free character tables
    pcre2_match  Match a compiled pattern to a subject string + (Perl compatible)
    pcre2_match_context_copy  Copy a match context
    pcre2_match_context_create  Create a match context
    pcre2_match_context_free  Free a match context
    pcre2_match_data_create  Create a match data block
    pcre2_match_data_create_from_pattern  Create a match data block getting size from pattern
    pcre2_match_data_free  Free a match data block
    pcre2_pattern_convert  Experimental foreign pattern converter
    pcre2_pattern_info  Extract information about a pattern
    pcre2_serialize_decode  Decode serialized compiled patterns
    pcre2_serialize_encode  Serialize compiled patterns for save/restore
    pcre2_serialize_free  Free serialized compiled patterns
    pcre2_serialize_get_number_of_codes  Get number of serialized compiled patterns
    pcre2_set_bsr  Set \R convention
    pcre2_set_callout  Set up a callout function
    pcre2_set_character_tables  Set character tables
    pcre2_set_compile_extra_options  Set compile time extra options
    pcre2_set_compile_recursion_guard  Set up a compile recursion guard function
    pcre2_set_depth_limit  Set the match backtracking depth limit
    pcre2_set_glob_escape  Set glob escape character
    pcre2_set_glob_separator  Set glob separator character
    pcre2_set_heap_limit  Set the match backtracking heap limit
    pcre2_set_match_limit  Set the match limit
    pcre2_set_max_pattern_length  Set the maximum length of pattern
    pcre2_set_newline  Set the newline convention
    pcre2_set_offset_limit  Set the offset limit
    pcre2_set_parens_nest_limit  Set the parentheses nesting limit
    pcre2_set_recursion_limit  Obsolete: use pcre2_set_depth_limit
    pcre2_set_recursion_memory_management  Obsolete function that (from 10.30 onwards) does nothing
    pcre2_substitute  Match a compiled pattern to a subject string and do + substitutions
    pcre2_substring_copy_byname  Extract named substring into given buffer
    pcre2_substring_copy_bynumber  Extract numbered substring into given buffer
    pcre2_substring_free  Free extracted substring
    pcre2_substring_get_byname  Extract named substring into new memory
    pcre2_substring_get_bynumber  Extract numbered substring into new memory
    pcre2_substring_length_byname  Find length of named substring
    pcre2_substring_length_bynumber  Find length of numbered substring
    pcre2_substring_list_free  Free list of extracted substrings
    pcre2_substring_list_get  Extract all substrings into new memory
    pcre2_substring_nametable_scan  Find table entries for given string name
    pcre2_substring_number_from_name  Convert captured string name to number
    + + + diff --git a/src/pcre2/doc/html/pcre2-config.html b/src/pcre2/doc/html/pcre2-config.html new file mode 100644 index 00000000..b71d7602 --- /dev/null +++ b/src/pcre2/doc/html/pcre2-config.html @@ -0,0 +1,102 @@ + + +pcre2-config specification + + +

    pcre2-config man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    SYNOPSIS
    +

    +pcre2-config [--prefix] [--exec-prefix] [--version] + [--libs8] [--libs16] [--libs32] [--libs-posix] + [--cflags] [--cflags-posix] +

    +
    DESCRIPTION
    +

    +pcre2-config returns the configuration of the installed PCRE2 libraries +and the options required to compile a program to use them. Some of the options +apply only to the 8-bit, or 16-bit, or 32-bit libraries, respectively, and are +not available for libraries that have not been built. If an unavailable option +is encountered, the "usage" information is output. +

    +
    OPTIONS
    +

    +--prefix +Writes the directory prefix used in the PCRE2 installation for architecture +independent files (/usr on many systems, /usr/local on some +systems) to the standard output. +

    +

    +--exec-prefix +Writes the directory prefix used in the PCRE2 installation for architecture +dependent files (normally the same as --prefix) to the standard output. +

    +

    +--version +Writes the version number of the installed PCRE2 libraries to the standard +output. +

    +

    +--libs8 +Writes to the standard output the command line options required to link +with the 8-bit PCRE2 library (-lpcre2-8 on many systems). +

    +

    +--libs16 +Writes to the standard output the command line options required to link +with the 16-bit PCRE2 library (-lpcre2-16 on many systems). +

    +

    +--libs32 +Writes to the standard output the command line options required to link +with the 32-bit PCRE2 library (-lpcre2-32 on many systems). +

    +

    +--libs-posix +Writes to the standard output the command line options required to link with +PCRE2's POSIX API wrapper library (-lpcre2-posix -lpcre2-8 on many +systems). +

    +

    +--cflags +Writes to the standard output the command line options required to compile +files that use PCRE2 (this may include some -I options, but is blank on +many systems). +

    +

    +--cflags-posix +Writes to the standard output the command line options required to compile +files that use PCRE2's POSIX API wrapper library (this may include some +-I options, but is blank on many systems). +

    +
    SEE ALSO
    +

    +pcre2(3) +

    +
    AUTHOR
    +

    +This manual page was originally written by Mark Baker for the Debian GNU/Linux +system. It has been subsequently revised as a generic PCRE2 man page. +

    +
    REVISION
    +

    +Last updated: 28 September 2014 +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2.html b/src/pcre2/doc/html/pcre2.html new file mode 100644 index 00000000..1e267d09 --- /dev/null +++ b/src/pcre2/doc/html/pcre2.html @@ -0,0 +1,213 @@ + + +pcre2 specification + + +

    pcre2 man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    INTRODUCTION
    +

    +PCRE2 is the name used for a revised API for the PCRE library, which is a set +of functions, written in C, that implement regular expression pattern matching +using the same syntax and semantics as Perl, with just a few differences. After +nearly two decades, the limitations of the original API were making development +increasingly difficult. The new API is more extensible, and it was simplified +by abolishing the separate "study" optimizing function; in PCRE2, patterns are +automatically optimized where possible. Since forking from PCRE1, the code has +been extensively refactored and new features introduced. +

    +

    +As well as Perl-style regular expression patterns, some features that appeared +in Python and the original PCRE before they appeared in Perl are available +using the Python syntax. There is also some support for one or two .NET and +Oniguruma syntax items, and there are options for requesting some minor changes +that give better ECMAScript (aka JavaScript) compatibility. +

    +

    +The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit, +or 32-bit code units, which means that up to three separate libraries may be +installed, one for each code unit size. The size of code unit is not related to +the bit size of the underlying hardware. In a 64-bit environment that also +supports 32-bit applications, versions of PCRE2 that are compiled in both +64-bit and 32-bit modes may be needed. +

    +

    +The original work to extend PCRE to 16-bit and 32-bit code units was done by +Zoltan Herczeg and Christian Persch, respectively. In all three cases, strings +can be interpreted either as one character per code unit, or as UTF-encoded +Unicode, with support for Unicode general category properties. Unicode support +is optional at build time (but is the default). However, processing strings as +UTF code units must be enabled explicitly at run time. The version of Unicode +in use can be discovered by running +

    +  pcre2test -C
    +
    +

    +

    +The three libraries contain identical sets of functions, with names ending in +_8, _16, or _32, respectively (for example, pcre2_compile_8()). However, +by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or 32, a program that uses just +one code unit width can be written using generic names such as +pcre2_compile(), and the documentation is written assuming that this is +the case. +

    +

    +In addition to the Perl-compatible matching function, PCRE2 contains an +alternative function that matches the same compiled patterns in a different +way. In certain circumstances, the alternative function has some advantages. +For a discussion of the two matching algorithms, see the +pcre2matching +page. +

    +

    +Details of exactly which Perl regular expression features are and are not +supported by PCRE2 are given in separate documents. See the +pcre2pattern +and +pcre2compat +pages. There is a syntax summary in the +pcre2syntax +page. +

    +

    +Some features of PCRE2 can be included, excluded, or changed when the library +is built. The +pcre2_config() +function makes it possible for a client to discover which features are +available. The features themselves are described in the +pcre2build +page. Documentation about building PCRE2 for various operating systems can be +found in the +README +and +NON-AUTOTOOLS-BUILD +files in the source distribution. +

    +

    +The libraries contain a number of undocumented internal functions and data +tables that are used by more than one of the exported external functions, but +which are not intended for use by external callers. Their names all begin with +"_pcre2", which hopefully will not provoke any name clashes. In some +environments, it is possible to control which external symbols are exported +when a shared library is built, and in these cases the undocumented symbols are +not exported. +

    +
    SECURITY CONSIDERATIONS
    +

    +If you are using PCRE2 in a non-UTF application that permits users to supply +arbitrary patterns for compilation, you should be aware of a feature that +allows users to turn on UTF support from within a pattern. For example, an +8-bit pattern that begins with "(*UTF)" turns on UTF-8 mode, which interprets +patterns and subjects as strings of UTF-8 code units instead of individual +8-bit characters. This causes both the pattern and any data against which it is +matched to be checked for UTF-8 validity. If the data string is very long, such +a check might use enough resources to degrade the performance of your +application. +

    +

    +One way of guarding against this possibility is to use the +pcre2_pattern_info() function to check the compiled pattern's options for +PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling +pcre2_compile(). This causes a compile time error if the pattern contains +a UTF-setting sequence. +

    +

    +The use of Unicode properties for character types such as \d can also be +enabled from within the pattern, by specifying "(*UCP)". This feature can be +disallowed by setting the PCRE2_NEVER_UCP option. +

    +

    +If your application is one that supports UTF, be aware that validity checking +can take time. If the same data string is to be matched many times, you can use +the PCRE2_NO_UTF_CHECK option for the second and subsequent matches to avoid +running redundant checks. +

    +

    +The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead to +problems, because it may leave the current matching point in the middle of a +multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used by an +application to lock out the use of \C, causing a compile-time error if it is +encountered. It is also possible to build PCRE2 with the use of \C permanently +disabled. +

    +

    +Another way that performance can be hit is by running a pattern that has a very +large search tree against a string that will never match. Nested unlimited +repeats in a pattern are a common example. PCRE2 provides some protection +against this: see the pcre2_set_match_limit() function in the +pcre2api +page. There is a similar function called pcre2_set_depth_limit() that can +be used to restrict the amount of memory that is used. +

    +
    USER DOCUMENTATION
    +

    +The user documentation for PCRE2 comprises a number of different sections. In +the "man" format, each of these is a separate "man page". In the HTML format, +each is a separate page, linked from the index page. In the plain text format, +the descriptions of the pcre2grep and pcre2test programs are in +files called pcre2grep.txt and pcre2test.txt, respectively. The +remaining sections, except for the pcre2demo section (which is a program +listing), and the short pages for individual functions, are concatenated in +pcre2.txt, for ease of searching. The sections are as follows: +

    +  pcre2              this document
    +  pcre2-config       show PCRE2 installation configuration information
    +  pcre2api           details of PCRE2's native C API
    +  pcre2build         building PCRE2
    +  pcre2callout       details of the pattern callout feature
    +  pcre2compat        discussion of Perl compatibility
    +  pcre2convert       details of pattern conversion functions
    +  pcre2demo          a demonstration C program that uses PCRE2
    +  pcre2grep          description of the pcre2grep command (8-bit only)
    +  pcre2jit           discussion of just-in-time optimization support
    +  pcre2limits        details of size and other limits
    +  pcre2matching      discussion of the two matching algorithms
    +  pcre2partial       details of the partial matching facility
    +  pcre2pattern       syntax and semantics of supported regular expression patterns
    +  pcre2perform       discussion of performance issues
    +  pcre2posix         the POSIX-compatible C API for the 8-bit library
    +  pcre2sample        discussion of the pcre2demo program
    +  pcre2serialize     details of pattern serialization
    +  pcre2syntax        quick syntax reference
    +  pcre2test          description of the pcre2test command
    +  pcre2unicode       discussion of Unicode and UTF support
    +
    +In the "man" and HTML formats, there is also a short page for each C library +function, listing its arguments and results. +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +

    +Putting an actual email address here is a spam magnet. If you want to email me, +use my two initials, followed by the two digits 10, at the domain cam.ac.uk. +

    +
    REVISION
    +

    +Last updated: 28 April 2021 +
    +Copyright © 1997-2021 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_callout_enumerate.html b/src/pcre2/doc/html/pcre2_callout_enumerate.html new file mode 100644 index 00000000..505ea7b2 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_callout_enumerate.html @@ -0,0 +1,63 @@ + + +pcre2_callout_enumerate specification + + +

    pcre2_callout_enumerate man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *callout_data); +

    +
    +DESCRIPTION +
    +

    +This function scans a compiled regular expression and calls the callback() +function for each callout within the pattern. The yield of the function is zero +for success and non-zero otherwise. The arguments are: +

    +  code           Points to the compiled pattern
    +  callback       The callback function
    +  callout_data   User data that is passed to the callback
    +
    +The callback() function is passed a pointer to a data block containing +the following fields (not necessarily in this order): +
    +  uint32_t   version                Block version number
    +  uint32_t   callout_number         Number for numbered callouts
    +  PCRE2_SIZE pattern_position       Offset to next item in pattern
    +  PCRE2_SIZE next_item_length       Length of next item in pattern
    +  PCRE2_SIZE callout_string_offset  Offset to string within pattern
    +  PCRE2_SIZE callout_string_length  Length of callout string
    +  PCRE2_SPTR callout_string         Points to callout string or is NULL
    +
    +The second argument passed to the callback() function is the callout data +that was passed to pcre2_callout_enumerate(). The callback() +function must return zero for success. Any other value causes the pattern scan +to stop, with the value being passed back as the result of +pcre2_callout_enumerate(). +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_code_copy.html b/src/pcre2/doc/html/pcre2_code_copy.html new file mode 100644 index 00000000..667d7b7f --- /dev/null +++ b/src/pcre2/doc/html/pcre2_code_copy.html @@ -0,0 +1,43 @@ + + +pcre2_code_copy specification + + +

    pcre2_code_copy man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_code *pcre2_code_copy(const pcre2_code *code); +

    +
    +DESCRIPTION +
    +

    +This function makes a copy of the memory used for a compiled pattern, excluding +any memory used by the JIT compiler. Without a subsequent call to +pcre2_jit_compile(), the copy can be used only for non-JIT matching. The +pointer to the character tables is copied, not the tables themselves (see +pcre2_code_copy_with_tables()). The yield of the function is NULL if +code is NULL or if sufficient memory cannot be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_code_copy_with_tables.html b/src/pcre2/doc/html/pcre2_code_copy_with_tables.html new file mode 100644 index 00000000..67b2e1ff --- /dev/null +++ b/src/pcre2/doc/html/pcre2_code_copy_with_tables.html @@ -0,0 +1,44 @@ + + +pcre2_code_copy_with_tables specification + + +

    pcre2_code_copy_with_tables man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code); +

    +
    +DESCRIPTION +
    +

    +This function makes a copy of the memory used for a compiled pattern, excluding +any memory used by the JIT compiler. Without a subsequent call to +pcre2_jit_compile(), the copy can be used only for non-JIT matching. +Unlike pcre2_code_copy(), a separate copy of the character tables is also +made, with the new code pointing to it. This memory will be automatically freed +when pcre2_code_free() is called. The yield of the function is NULL if +code is NULL or if sufficient memory cannot be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_code_free.html b/src/pcre2/doc/html/pcre2_code_free.html new file mode 100644 index 00000000..ff302fcd --- /dev/null +++ b/src/pcre2/doc/html/pcre2_code_free.html @@ -0,0 +1,42 @@ + + +pcre2_code_free specification + + +

    pcre2_code_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_code_free(pcre2_code *code); +

    +
    +DESCRIPTION +
    +

    +If code is NULL, this function does nothing. Otherwise, code must +point to a compiled pattern. This function frees its memory, including any +memory used by the JIT compiler. If the compiled pattern was created by a call +to pcre2_code_copy_with_tables(), the memory for the character tables is +also freed. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_compile.html b/src/pcre2/doc/html/pcre2_compile.html new file mode 100644 index 00000000..f6485f22 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_compile.html @@ -0,0 +1,107 @@ + + +pcre2_compile specification + + +

    pcre2_compile man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset, + pcre2_compile_context *ccontext); +

    +
    +DESCRIPTION +
    +

    +This function compiles a regular expression pattern into an internal form. Its +arguments are: +

    +  pattern       A string containing the expression to be compiled
    +  length        The length of the string or PCRE2_ZERO_TERMINATED
    +  options       Option bits
    +  errorcode     Where to put an error code
    +  erroroffset   Where to put an error offset
    +  ccontext      Pointer to a compile context or NULL
    +
    +The length of the pattern and any error offset that is returned are in code +units, not characters. A compile context is needed only if you want to provide +custom memory allocation functions, or to provide an external function for +system stack size checking, or to change one or more of these parameters: +
    +  What \R matches (Unicode newlines, or CR, LF, CRLF only);
    +  PCRE2's character tables;
    +  The newline character sequence;
    +  The compile time nested parentheses limit;
    +  The maximum pattern length (in code units) that is allowed;
    +  The additional option bits (see pcre2_set_compile_extra_options()).
    +
    +The option bits are: +
    +  PCRE2_ANCHORED           Force pattern anchoring
    +  PCRE2_ALLOW_EMPTY_CLASS  Allow empty classes
    +  PCRE2_ALT_BSUX           Alternative handling of \u, \U, and \x
    +  PCRE2_ALT_CIRCUMFLEX     Alternative handling of ^ in multiline mode
    +  PCRE2_ALT_VERBNAMES      Process backslashes in verb names
    +  PCRE2_AUTO_CALLOUT       Compile automatic callouts
    +  PCRE2_CASELESS           Do caseless matching
    +  PCRE2_DOLLAR_ENDONLY     $ not to match newline at end
    +  PCRE2_DOTALL             . matches anything including NL
    +  PCRE2_DUPNAMES           Allow duplicate names for subpatterns
    +  PCRE2_ENDANCHORED        Pattern can match only at end of subject
    +  PCRE2_EXTENDED           Ignore white space and # comments
    +  PCRE2_FIRSTLINE          Force matching to be before newline
    +  PCRE2_LITERAL            Pattern characters are all literal
    +  PCRE2_MATCH_INVALID_UTF  Enable support for matching invalid UTF
    +  PCRE2_MATCH_UNSET_BACKREF  Match unset backreferences
    +  PCRE2_MULTILINE          ^ and $ match newlines within data
    +  PCRE2_NEVER_BACKSLASH_C  Lock out the use of \C in patterns
    +  PCRE2_NEVER_UCP          Lock out PCRE2_UCP, e.g. via (*UCP)
    +  PCRE2_NEVER_UTF          Lock out PCRE2_UTF, e.g. via (*UTF)
    +  PCRE2_NO_AUTO_CAPTURE    Disable numbered capturing parentheses
    +                            (named ones available)
    +  PCRE2_NO_AUTO_POSSESS    Disable auto-possessification
    +  PCRE2_NO_DOTSTAR_ANCHOR  Disable automatic anchoring for .*
    +  PCRE2_NO_START_OPTIMIZE  Disable match-time start optimizations
    +  PCRE2_NO_UTF_CHECK       Do not check the pattern for UTF validity
    +                             (only relevant if PCRE2_UTF is set)
    +  PCRE2_UCP                Use Unicode properties for \d, \w, etc.
    +  PCRE2_UNGREEDY           Invert greediness of quantifiers
    +  PCRE2_USE_OFFSET_LIMIT   Enable offset limit for unanchored matching
    +  PCRE2_UTF                Treat pattern and subjects as UTF strings
    +
    +PCRE2 must be built with Unicode support (the default) in order to use +PCRE2_UTF, PCRE2_UCP and related options. +

    +

    +Additional options may be set in the compile context via the +pcre2_set_compile_extra_options +function. +

    +

    +The yield of this function is a pointer to a private data structure that +contains the compiled pattern, or NULL if an error was detected. +

    +

    +There is a complete description of the PCRE2 native API, with more detail on +each option, in the +pcre2api +page, and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_compile_context_copy.html b/src/pcre2/doc/html/pcre2_compile_context_copy.html new file mode 100644 index 00000000..9e9884b8 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_compile_context_copy.html @@ -0,0 +1,41 @@ + + +pcre2_compile_context_copy specification + + +

    pcre2_compile_context_copy man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_compile_context *pcre2_compile_context_copy( + pcre2_compile_context *ccontext); +

    +
    +DESCRIPTION +
    +

    +This function makes a new copy of a compile context, using the memory +allocation function that was used for the original context. The result is NULL +if the memory cannot be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_compile_context_create.html b/src/pcre2/doc/html/pcre2_compile_context_create.html new file mode 100644 index 00000000..5eacd4ec --- /dev/null +++ b/src/pcre2/doc/html/pcre2_compile_context_create.html @@ -0,0 +1,42 @@ + + +pcre2_compile_context_create specification + + +

    pcre2_compile_context_create man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_compile_context *pcre2_compile_context_create( + pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function creates and initializes a new compile context. If its argument is +NULL, malloc() is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_compile_context_free.html b/src/pcre2/doc/html/pcre2_compile_context_free.html new file mode 100644 index 00000000..b4159b11 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_compile_context_free.html @@ -0,0 +1,41 @@ + + +pcre2_compile_context_free specification + + +

    pcre2_compile_context_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_compile_context_free(pcre2_compile_context *ccontext); +

    +
    +DESCRIPTION +
    +

    +This function frees the memory occupied by a compile context, using the memory +freeing function from the general context with which it was created, or +free() if that was not set. If the argument is NULL, the function returns +immediately without doing anything. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_config.html b/src/pcre2/doc/html/pcre2_config.html new file mode 100644 index 00000000..f05bd062 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_config.html @@ -0,0 +1,84 @@ + + +pcre2_config specification + + +

    pcre2_config man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_config(uint32_t what, void *where); +

    +
    +DESCRIPTION +
    +

    +This function makes it possible for a client program to find out which optional +features are available in the version of the PCRE2 library it is using. The +arguments are as follows: +

    +  what     A code specifying what information is required
    +  where    Points to where to put the information
    +
    +If where is NULL, the function returns the amount of memory needed for +the requested information. When the information is a string, the value is in +code units; for other types of data it is in bytes. +

    +

    +If where is not NULL, for PCRE2_CONFIG_JITTARGET, +PCRE2_CONFIG_UNICODE_VERSION, and PCRE2_CONFIG_VERSION it must point to a +buffer that is large enough to hold the string. For all other codes it must +point to a uint32_t integer variable. The available codes are: +

    +  PCRE2_CONFIG_BSR             Indicates what \R matches by default:
    +                                 PCRE2_BSR_UNICODE
    +                                 PCRE2_BSR_ANYCRLF
    +  PCRE2_CONFIG_COMPILED_WIDTHS Which of 8/16/32 support was compiled
    +  PCRE2_CONFIG_DEPTHLIMIT      Default backtracking depth limit
    +  PCRE2_CONFIG_HEAPLIMIT       Default heap memory limit
    +  PCRE2_CONFIG_JIT             Availability of just-in-time compiler support (1=yes 0=no)
    +  PCRE2_CONFIG_JITTARGET       Information (a string) about the target architecture for the JIT compiler
    +  PCRE2_CONFIG_LINKSIZE        Configured internal link size (2, 3, 4)
    +  PCRE2_CONFIG_MATCHLIMIT      Default internal resource limit
    +  PCRE2_CONFIG_NEVER_BACKSLASH_C  Whether or not \C is disabled
    +  PCRE2_CONFIG_NEWLINE         Code for the default newline sequence:
    +                                 PCRE2_NEWLINE_CR
    +                                 PCRE2_NEWLINE_LF
    +                                 PCRE2_NEWLINE_CRLF
    +                                 PCRE2_NEWLINE_ANY
    +                                 PCRE2_NEWLINE_ANYCRLF
    +                                 PCRE2_NEWLINE_NUL
    +  PCRE2_CONFIG_PARENSLIMIT     Default parentheses nesting limit
    +  PCRE2_CONFIG_RECURSIONLIMIT  Obsolete: use PCRE2_CONFIG_DEPTHLIMIT
    +  PCRE2_CONFIG_STACKRECURSE    Obsolete: always returns 0
    +  PCRE2_CONFIG_UNICODE         Availability of Unicode support (1=yes 0=no)
    +  PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string)
    +  PCRE2_CONFIG_VERSION         The PCRE2 version (a string)
    +
    +The function yields a non-negative value on success or the negative value +PCRE2_ERROR_BADOPTION otherwise. This is also the result for the +PCRE2_CONFIG_JITTARGET code if JIT support is not available. When a string is +requested, the function returns the number of code units used, including the +terminating zero. +
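The two-call pattern described above for string-valued codes (ask for the size with a NULL `where`, then fetch into a buffer of that size) can be sketched as follows. This is an illustration under stated assumptions, not code from the PCRE2 distribution: `config_fn`, `config_string` and `fake_config` are hypothetical names, the 8-bit library is assumed (one byte per code unit), and `fake_config` is only a stand-in so that the sketch compiles without the library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as pcre2_config(): int f(uint32_t what, void *where). */
typedef int (*config_fn)(uint32_t what, void *where);

/* Fetch a string-valued setting using the two-call pattern.
   Returns a malloc()ed, zero-terminated string, or NULL on failure.
   Assumes 8-bit code units, so one code unit is one byte. */
static char *config_string(config_fn query, uint32_t what)
{
    assert(query != NULL);

    /* First call: where == NULL, so the result is the size needed,
       in code units, including the terminating zero. */
    int needed = query(what, NULL);
    if (needed <= 0)
        return NULL;               /* negative means an error code */

    char *buf = malloc((size_t)needed);
    if (buf == NULL)
        return NULL;

    /* Second call: the buffer is now large enough for the string. */
    if (query(what, buf) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Hypothetical stand-in for pcre2_config(), for illustration only.
   It knows a single made-up code (1) and rejects everything else. */
static int fake_config(uint32_t what, void *where)
{
    static const char version[] = "10.00 (stand-in)";
    if (what != 1u)
        return -1;                 /* mimics a negative error return */
    if (where != NULL)
        memcpy(where, version, sizeof version);
    return (int)sizeof version;    /* counts the terminating zero */
}
```

With the real library, a call would presumably look like `config_string(pcre2_config, PCRE2_CONFIG_VERSION)` after defining `PCRE2_CODE_UNIT_WIDTH` as 8 and including `pcre2.h`; the caller frees the result with `free()`.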

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_convert_context_copy.html b/src/pcre2/doc/html/pcre2_convert_context_copy.html new file mode 100644 index 00000000..3c44ac6d --- /dev/null +++ b/src/pcre2/doc/html/pcre2_convert_context_copy.html @@ -0,0 +1,40 @@ + + +pcre2_convert_context_copy specification + + +

    pcre2_convert_context_copy man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_convert_context *pcre2_convert_context_copy( + pcre2_convert_context *cvcontext); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It makes a new copy of a convert context, using the memory allocation function +that was used for the original context. The result is NULL if the memory cannot +be obtained. +

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_convert_context_create.html b/src/pcre2/doc/html/pcre2_convert_context_create.html new file mode 100644 index 00000000..25647809 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_convert_context_create.html @@ -0,0 +1,41 @@ + + +pcre2_convert_context_create specification + + +

    pcre2_convert_context_create man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_convert_context *pcre2_convert_context_create( + pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It creates and initializes a new convert context. If its argument is +NULL, malloc() is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_convert_context_free.html b/src/pcre2/doc/html/pcre2_convert_context_free.html new file mode 100644 index 00000000..e9b142bf --- /dev/null +++ b/src/pcre2/doc/html/pcre2_convert_context_free.html @@ -0,0 +1,40 @@ + + +pcre2_convert_context_free specification + + +

    pcre2_convert_context_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_convert_context_free(pcre2_convert_context *cvcontext); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It frees the memory occupied by a convert context, using the memory +freeing function from the general context with which it was created, or +free() if that was not set. If the argument is NULL, the function returns +immediately without doing anything. +

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_converted_pattern_free.html b/src/pcre2/doc/html/pcre2_converted_pattern_free.html new file mode 100644 index 00000000..01d28d7a --- /dev/null +++ b/src/pcre2/doc/html/pcre2_converted_pattern_free.html @@ -0,0 +1,40 @@ + + +pcre2_converted_pattern_free specification + + +

    pcre2_converted_pattern_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It frees the memory occupied by a converted pattern that was obtained by +calling pcre2_pattern_convert() with arguments that caused it to place +the converted pattern into newly obtained heap memory. If the argument is NULL, +the function returns immediately without doing anything. +

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_dfa_match.html b/src/pcre2/doc/html/pcre2_dfa_match.html new file mode 100644 index 00000000..232e2bce --- /dev/null +++ b/src/pcre2/doc/html/pcre2_dfa_match.html @@ -0,0 +1,80 @@ + + +pcre2_dfa_match specification + + +

    pcre2_dfa_match man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, + int *workspace, PCRE2_SIZE wscount); +

    +
    +DESCRIPTION +
    +

    +This function matches a compiled regular expression against a given subject +string, using an alternative matching algorithm that scans the subject string +just once (except when processing lookaround assertions). This function is +not Perl-compatible (the Perl-compatible matching function is +pcre2_match()). The arguments for this function are: +

    +  code         Points to the compiled pattern
    +  subject      Points to the subject string
    +  length       Length of the subject string
    +  startoffset  Offset in the subject at which to start matching
    +  options      Option bits
    +  match_data   Points to a match data block, for results
    +  mcontext     Points to a match context, or is NULL
    +  workspace    Points to a vector of ints used as working space
    +  wscount      Number of elements in the vector
    +
    +For pcre2_dfa_match(), a match context is needed only if you want to set +up a callout function or specify the heap limit, the match limit, or the recursion +depth limit. The length and startoffset values are code units, not +characters. The options are: +
    +  PCRE2_ANCHORED          Match only at the first position
    +  PCRE2_COPY_MATCHED_SUBJECT
    +                          On success, make a private subject copy
    +  PCRE2_ENDANCHORED       Pattern can match only at end of subject
    +  PCRE2_NOTBOL            Subject is not the beginning of a line
    +  PCRE2_NOTEOL            Subject is not the end of a line
    +  PCRE2_NOTEMPTY          An empty string is not a valid match
    +  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject is not a valid match
    +  PCRE2_NO_UTF_CHECK      Do not check the subject for UTF validity (only relevant if PCRE2_UTF
    +                           was set at compile time)
    +  PCRE2_PARTIAL_HARD      Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
    +  PCRE2_PARTIAL_SOFT      Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found
    +  PCRE2_DFA_RESTART       Restart after a partial match
    +  PCRE2_DFA_SHORTEST      Return only the shortest match
    +
    +There are restrictions on what may appear in a pattern when using this matching +function. Details are given in the +pcre2matching +documentation. For details of partial matching, see the +pcre2partial +page. There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_general_context_copy.html b/src/pcre2/doc/html/pcre2_general_context_copy.html new file mode 100644 index 00000000..00185346 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_general_context_copy.html @@ -0,0 +1,42 @@ + + +pcre2_general_context_copy specification + + +

    pcre2_general_context_copy man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_general_context *pcre2_general_context_copy( + pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function makes a new copy of a general context, using the memory +allocation functions in the context, if set, to get the necessary memory. +Otherwise malloc() is used. The result is NULL if the memory cannot be +obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_general_context_create.html b/src/pcre2/doc/html/pcre2_general_context_create.html new file mode 100644 index 00000000..bc31ee82 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_general_context_create.html @@ -0,0 +1,44 @@ + + +pcre2_general_context_create specification + + +

    pcre2_general_context_create man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_general_context *pcre2_general_context_create( + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); +

    +
    +DESCRIPTION +
    +

    +This function creates and initializes a general context. The arguments define +custom memory management functions and a data value that is passed to them when +they are called. The private_malloc() function is used to get memory for +the context. If either of the first two arguments is NULL, the system memory +management function is used. The result is NULL if no memory could be obtained. +
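As an illustration of the callback shapes named in the synopsis, here is a minimal sketch of a pair of memory functions that count allocations. The names `alloc_stats`, `counting_malloc` and `counting_free` are hypothetical, and the sketch assumes that `PCRE2_SIZE` is `size_t`, as it is in the standard `pcre2.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record, passed to the library as memory_data. */
struct alloc_stats {
    size_t allocations;   /* successful calls to counting_malloc() */
    size_t frees;         /* non-NULL blocks passed to counting_free() */
};

/* Same shape as private_malloc: void *(*)(PCRE2_SIZE, void *). */
static void *counting_malloc(size_t size, void *memory_data)
{
    struct alloc_stats *stats = memory_data;
    assert(stats != NULL);

    void *block = malloc(size);
    if (block != NULL)
        stats->allocations++;
    return block;
}

/* Same shape as private_free: void (*)(void *, void *). */
static void counting_free(void *block, void *memory_data)
{
    struct alloc_stats *stats = memory_data;
    assert(stats != NULL);

    if (block != NULL)
        stats->frees++;
    free(block);
}
```

These would be handed to the library as `pcre2_general_context_create(counting_malloc, counting_free, &stats)`. The `stats` object has to outlive the context and everything allocated through it, because the same pointer is passed back on every call.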

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_general_context_free.html b/src/pcre2/doc/html/pcre2_general_context_free.html new file mode 100644 index 00000000..9f335f57 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_general_context_free.html @@ -0,0 +1,40 @@ + + +pcre2_general_context_free specification + + +

    pcre2_general_context_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_general_context_free(pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function frees the memory occupied by a general context, using the memory +freeing function within the context, if set. If the argument is NULL, the +function returns immediately without doing anything. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_get_error_message.html b/src/pcre2/doc/html/pcre2_get_error_message.html new file mode 100644 index 00000000..70057600 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_get_error_message.html @@ -0,0 +1,51 @@ + + +pcre2_get_error_message specification + + +

    pcre2_get_error_message man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer, + PCRE2_SIZE bufflen); +

    +
    +DESCRIPTION +
    +

    +This function provides a textual error message for each PCRE2 error code. +Compilation errors are positive numbers; UTF formatting errors and matching +errors are negative numbers. The arguments are: +

    +  errorcode   an error code (positive or negative)
    +  buffer      where to put the message
    +  bufflen     the length of the buffer (code units)
    +
    +The function returns the length of the message in code units, excluding the +trailing zero, or the negative error code PCRE2_ERROR_NOMEMORY if the buffer is +too small. In this case, the returned message is truncated (but still with a +trailing zero). If errorcode does not contain a recognized error code +number, the negative value PCRE2_ERROR_BADDATA is returned. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_get_mark.html b/src/pcre2/doc/html/pcre2_get_mark.html new file mode 100644 index 00000000..88e63269 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_get_mark.html @@ -0,0 +1,47 @@ + + +pcre2_get_mark specification + + +

    pcre2_get_mark man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data); +

    +
    +DESCRIPTION +
    +

    +After a call of pcre2_match() that was passed the match block that is +this function's argument, this function returns a pointer to the last (*MARK), +(*PRUNE), or (*THEN) name that was encountered during the matching process. The +name is zero-terminated, and is within the compiled pattern. The length of the +name is in the preceding code unit. If no name is available, NULL is returned. +

    +

    +After a successful match, the name that is returned is the last one on the +matching path. After a failed match or a partial match, the last encountered +name is returned. +
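The remark that the length of the name is held in the preceding code unit can be sketched as a one-line helper. This assumes the 8-bit library, where the pointer returned by pcre2_get_mark() is a `const uint8_t *`; `mark_name_length` is a hypothetical name, not part of the PCRE2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the length of a mark name, read from the code unit that
   immediately precedes the name. The pointer must be a non-NULL
   value of the kind pcre2_get_mark() returns. */
static size_t mark_name_length(const uint8_t *mark)
{
    assert(mark != NULL);
    return (size_t)mark[-1];
}
```

Since the name is also zero-terminated, `strlen()` on the same pointer is an alternative in the 8-bit case when the name contains no embedded zeros.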

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_get_match_data_size.html b/src/pcre2/doc/html/pcre2_get_match_data_size.html new file mode 100644 index 00000000..113ecaab --- /dev/null +++ b/src/pcre2/doc/html/pcre2_get_match_data_size.html @@ -0,0 +1,39 @@ + + +pcre2_get_match_data_size specification + + +

    pcre2_get_match_data_size man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +PCRE2_SIZE pcre2_get_match_data_size(pcre2_match_data *match_data); +

    +
    +DESCRIPTION +
    +

    +This function returns the size, in bytes, of the match data block that is its +argument. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_get_ovector_count.html b/src/pcre2/doc/html/pcre2_get_ovector_count.html new file mode 100644 index 00000000..05aacb6d --- /dev/null +++ b/src/pcre2/doc/html/pcre2_get_ovector_count.html @@ -0,0 +1,39 @@ + + +pcre2_get_ovector_count specification + + +

    pcre2_get_ovector_count man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data); +

    +
    +DESCRIPTION +
    +

    +This function returns the number of pairs of offsets in the ovector that forms +part of the given match data block. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_get_ovector_pointer.html b/src/pcre2/doc/html/pcre2_get_ovector_pointer.html new file mode 100644 index 00000000..ff6317ef --- /dev/null +++ b/src/pcre2/doc/html/pcre2_get_ovector_pointer.html @@ -0,0 +1,40 @@ + + +pcre2_get_ovector_pointer specification + + +

    pcre2_get_ovector_pointer man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data); +

    +
    +DESCRIPTION +
    +

    +This function returns a pointer to the vector of offsets that forms part of the +given match data block. The number of pairs can be found by calling +pcre2_get_ovector_count(). +
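The pair layout can be sketched with a small helper that reads one capture group from an offset vector. It takes a plain `size_t` array so that it compiles without the library; `OFFSET_UNSET` and `group_span` are hypothetical names, with `OFFSET_UNSET` mirroring the `PCRE2_UNSET` value from `pcre2.h`. The caller is assumed to pass a group number that the pattern actually defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors PCRE2_UNSET, which pcre2.h defines as ~(PCRE2_SIZE)0. */
#define OFFSET_UNSET (~(size_t)0)

/* Look up capture group `group` in an offset vector holding
   `pair_count` (start, end) pairs. On success, store the start offset
   and the length in code units and return 1. Return 0 if the group is
   out of range, unset, or has an end offset before its start. */
static int group_span(const size_t *ovector, uint32_t pair_count,
                      uint32_t group, size_t *start, size_t *length)
{
    assert(ovector != NULL && start != NULL && length != NULL);

    if (group >= pair_count)
        return 0;

    size_t begin = ovector[2 * (size_t)group];      /* first of the pair */
    size_t end   = ovector[2 * (size_t)group + 1];  /* second of the pair */

    if (begin == OFFSET_UNSET || end == OFFSET_UNSET)
        return 0;
    if (end < begin)
        return 0;                                   /* defensive check */

    *start  = begin;
    *length = end - begin;
    return 1;
}
```

With the library, the first two arguments would come from `pcre2_get_ovector_pointer(match_data)` and `pcre2_get_ovector_count(match_data)`.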

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_get_startchar.html b/src/pcre2/doc/html/pcre2_get_startchar.html new file mode 100644 index 00000000..d2c28b2a --- /dev/null +++ b/src/pcre2/doc/html/pcre2_get_startchar.html @@ -0,0 +1,44 @@ + + +pcre2_get_startchar specification + + +

    pcre2_get_startchar man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data); +

    +
    +DESCRIPTION +
    +

    +After a successful call of pcre2_match() that was passed the match block +that is this function's argument, this function returns the code unit offset of +the character at which the successful match started. For a non-partial match, +this can be different to the value of ovector[0] if the pattern contains +the \K escape sequence. After a partial match, however, this value is always +the same as ovector[0] because \K does not affect the result of a +partial match. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_jit_compile.html b/src/pcre2/doc/html/pcre2_jit_compile.html new file mode 100644 index 00000000..873d0dde --- /dev/null +++ b/src/pcre2/doc/html/pcre2_jit_compile.html @@ -0,0 +1,63 @@ + + +pcre2_jit_compile specification + + +

    pcre2_jit_compile man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_jit_compile(pcre2_code *code, uint32_t options); +

    +
    +DESCRIPTION +
    +

    +This function requests JIT compilation, which, if the just-in-time compiler is +available, further processes a compiled pattern into machine code that executes +much faster than the pcre2_match() interpretive matching function. Full +details are given in the +pcre2jit +documentation. +

    +

    +The first argument is a pointer that was returned by a successful call to +pcre2_compile(), and the second must contain one or more of the following +bits: +

    +  PCRE2_JIT_COMPLETE      compile code for full matching
    +  PCRE2_JIT_PARTIAL_SOFT  compile code for soft partial matching
    +  PCRE2_JIT_PARTIAL_HARD  compile code for hard partial matching
    +
    +There is also an obsolete option called PCRE2_JIT_INVALID_UTF, which has been +superseded by the pcre2_compile() option PCRE2_MATCH_INVALID_UTF. The old +option is deprecated and may be removed in the future. +

    +

    +The yield of the function is 0 for success, or a negative error code otherwise. +In particular, PCRE2_ERROR_JIT_BADOPTION is returned if JIT is not supported or +if an unknown bit is set in options. The function can also return +PCRE2_ERROR_NOMEMORY if JIT is unable to allocate executable memory for the +compiler, even if it was because of a system security restriction. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_jit_free_unused_memory.html b/src/pcre2/doc/html/pcre2_jit_free_unused_memory.html new file mode 100644 index 00000000..7f37e583 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_jit_free_unused_memory.html @@ -0,0 +1,43 @@ + + +pcre2_jit_free_unused_memory specification + + +

    pcre2_jit_free_unused_memory man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function frees unused JIT executable memory. The argument is a general +context, for custom memory management, or NULL for standard memory management. +JIT memory allocation retains some memory in order to improve future JIT +compilation speed. In low memory conditions, +pcre2_jit_free_unused_memory() can be used to cause this memory to be +freed. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_jit_match.html b/src/pcre2/doc/html/pcre2_jit_match.html new file mode 100644 index 00000000..8629e4a4 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_jit_match.html @@ -0,0 +1,60 @@ + + +pcre2_jit_match specification + + +

    pcre2_jit_match man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); +

    +
    +DESCRIPTION +
    +

    +This function matches a compiled regular expression that has been successfully +processed by the JIT compiler against a given subject string, using a matching +algorithm that is similar to Perl's. It is a "fast path" interface to JIT, and +it bypasses some of the sanity checks that pcre2_match() applies. +Its arguments are exactly the same as for +pcre2_match(), +except that the subject string must be specified with a length; +PCRE2_ZERO_TERMINATED is not supported. +

    +

    +The supported options are PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, +PCRE2_NOTEMPTY_ATSTART, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Unsupported +options are ignored. The subject string is not checked for UTF validity. +

    +

    +The return values are the same as for pcre2_match() plus +PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested +that was not compiled. For details of partial matching, see the +pcre2partial +page. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the JIT API in the +pcre2jit +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_jit_stack_assign.html b/src/pcre2/doc/html/pcre2_jit_stack_assign.html new file mode 100644 index 00000000..4b3abb90 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_jit_stack_assign.html @@ -0,0 +1,75 @@ + + +pcre2_jit_stack_assign specification + + +

    pcre2_jit_stack_assign man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_jit_stack_assign(pcre2_match_context *mcontext, + pcre2_jit_callback callback_function, void *callback_data); +

    +
    +DESCRIPTION +
    +

    +This function provides control over the memory used by JIT as a run-time stack +when pcre2_match() or pcre2_jit_match() is called with a pattern +that has been successfully processed by the JIT compiler. The information that +determines which stack is used is put into a match context that is subsequently +passed to a matching function. The arguments of this function are: +

    +  mcontext       a pointer to a match context
    +  callback       a callback function
    +  callback_data  a JIT stack or a value to be passed to the callback
    +
    +

    +

    +If mcontext is NULL, the function returns immediately, without doing +anything. +

    +

    +If callback is NULL and callback_data is NULL, an internal 32KiB +block on the machine stack is used. +

    +

    +If callback is NULL and callback_data is not NULL, +callback_data must be a valid JIT stack, the result of calling +pcre2_jit_stack_create(). +

    +

    +If callback is not NULL, it is called with callback_data as an +argument at the start of matching, in order to set up a JIT stack. If the +result is NULL, the internal 32KiB stack is used; otherwise the return value +must be a valid JIT stack, the result of calling +pcre2_jit_stack_create(). +

    +

    +You may safely use the same JIT stack for multiple patterns, as long as they +are all matched in the same thread. In a multithread application, each thread +must use its own JIT stack. For more details, see the +pcre2jit +page. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_jit_stack_create.html b/src/pcre2/doc/html/pcre2_jit_stack_create.html new file mode 100644 index 00000000..6200d177 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_jit_stack_create.html @@ -0,0 +1,49 @@ + + +pcre2_jit_stack_create specification + + +

    pcre2_jit_stack_create man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize, + PCRE2_SIZE maxsize, pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function is used to create a stack for use by the code compiled by the JIT +compiler. The first two arguments are a starting size for the stack, and a +maximum size to which it is allowed to grow. The final argument is a general +context, for memory allocation functions, or NULL for standard memory +allocation. The result can be passed to the JIT run-time code by calling +pcre2_jit_stack_assign() to associate the stack with a compiled pattern, +which can then be processed by pcre2_match() or pcre2_jit_match(). +A maximum stack size of 512KiB to 1MiB should be more than enough for any +pattern. For more details, see the +pcre2jit +page. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_jit_stack_free.html b/src/pcre2/doc/html/pcre2_jit_stack_free.html new file mode 100644 index 00000000..1d078d74 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_jit_stack_free.html @@ -0,0 +1,43 @@ + + +pcre2_jit_stack_free specification + + +

    pcre2_jit_stack_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack); +

    +
    +DESCRIPTION +
    +

    +This function is used to free a JIT stack that was created by +pcre2_jit_stack_create() when it is no longer needed. If the argument is +NULL, the function returns immediately without doing anything. For more +details, see the +pcre2jit +page. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_maketables.html b/src/pcre2/doc/html/pcre2_maketables.html new file mode 100644 index 00000000..19636545 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_maketables.html @@ -0,0 +1,48 @@ + + +pcre2_maketables specification + + +

    pcre2_maketables man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +const uint8_t *pcre2_maketables(pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function builds a set of character tables for character code points that +are less than 256. These can be passed to pcre2_compile() in a compile +context in order to override the internal, built-in tables (which were either +defaulted or made by pcre2_maketables() when PCRE2 was compiled). See the +pcre2_set_character_tables() +page. You might want to do this if you are using a non-standard locale. +

    +

    +If the argument is NULL, malloc() is used to get memory for the tables. +Otherwise it must point to a general context, which can supply pointers to a +custom memory manager. The function yields a pointer to the tables. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_maketables_free.html b/src/pcre2/doc/html/pcre2_maketables_free.html new file mode 100644 index 00000000..7316ab25 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_maketables_free.html @@ -0,0 +1,44 @@ + + +pcre2_maketables_free specification + + +

    pcre2_maketables_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_maketables_free(pcre2_general_context *gcontext, + const uint8_t *tables); +

    +
    +DESCRIPTION +
    +

    +This function discards a set of character tables that were created by a call +to +pcre2_maketables(). +

    +

    +The gcontext parameter should match what was used in that call to +account for any custom allocators that might be in use; if it is NULL +the system free() is used. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match.html b/src/pcre2/doc/html/pcre2_match.html new file mode 100644 index 00000000..90f7fcc1 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match.html @@ -0,0 +1,85 @@ + + +pcre2_match specification + + +

    pcre2_match man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); +

    +
    +DESCRIPTION +
    +

    +This function matches a compiled regular expression against a given subject +string, using a matching algorithm that is similar to Perl's. It returns +offsets to what it has matched and to captured substrings via the +match_data block, which can be processed by functions with names that +start with pcre2_get_ovector_...() or pcre2_substring_...(). The +return from pcre2_match() is one more than the highest numbered capturing +pair that has been set (for example, 1 if there are no captures), zero if the +vector of offsets is too small, or a negative error code for no match and other +errors. The function arguments are: +

    +  code         Points to the compiled pattern
    +  subject      Points to the subject string
    +  length       Length of the subject string
    +  startoffset  Offset in the subject at which to start matching
    +  options      Option bits
    +  match_data   Points to a match data block, for results
    +  mcontext     Points to a match context, or is NULL
    +
    +A match context is needed only if you want to: +
    +  Set up a callout function
    +  Set a matching offset limit
    +  Change the heap memory limit
    +  Change the backtracking match limit
    +  Change the backtracking depth limit
    +  Set custom memory management specifically for the match
    +
    +The length and startoffset values are code units, not characters. +The length may be given as PCRE2_ZERO_TERMINATED for a subject that is +terminated by a binary zero code unit. The options are: +
    +  PCRE2_ANCHORED          Match only at the first position
    +  PCRE2_COPY_MATCHED_SUBJECT
    +                          On success, make a private subject copy
    +  PCRE2_ENDANCHORED       Pattern can match only at end of subject
    +  PCRE2_NOTBOL            Subject string is not the beginning of a line
    +  PCRE2_NOTEOL            Subject string is not the end of a line
    +  PCRE2_NOTEMPTY          An empty string is not a valid match
    +  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject is not a valid match
    +  PCRE2_NO_JIT            Do not use JIT matching
    +  PCRE2_NO_UTF_CHECK      Do not check the subject for UTF validity (only relevant if PCRE2_UTF
    +                           was set at compile time)
    +  PCRE2_PARTIAL_HARD      Return PCRE2_ERROR_PARTIAL for a partial match even if there is a full match
    +  PCRE2_PARTIAL_SOFT      Return PCRE2_ERROR_PARTIAL for a partial match if no full matches are found
    +
    +For details of partial matching, see the +pcre2partial +page. There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match_context_copy.html b/src/pcre2/doc/html/pcre2_match_context_copy.html new file mode 100644 index 00000000..4a719d69 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match_context_copy.html @@ -0,0 +1,41 @@ + + +pcre2_match_context_copy specification + + +

    pcre2_match_context_copy man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_match_context *pcre2_match_context_copy( + pcre2_match_context *mcontext); +

    +
    +DESCRIPTION +
    +

    +This function makes a new copy of a match context, using the memory +allocation function that was used for the original context. The result is NULL +if the memory cannot be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match_context_create.html b/src/pcre2/doc/html/pcre2_match_context_create.html new file mode 100644 index 00000000..f7f27351 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match_context_create.html @@ -0,0 +1,42 @@ + + +pcre2_match_context_create specification + + +

    pcre2_match_context_create man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_match_context *pcre2_match_context_create( + pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function creates and initializes a new match context. If its argument is +NULL, malloc() is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match_context_free.html b/src/pcre2/doc/html/pcre2_match_context_free.html new file mode 100644 index 00000000..7f00ea9b --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match_context_free.html @@ -0,0 +1,41 @@ + + +pcre2_match_context_free specification + + +

    pcre2_match_context_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_match_context_free(pcre2_match_context *mcontext); +

    +
    +DESCRIPTION +
    +

    +This function frees the memory occupied by a match context, using the memory +freeing function from the general context with which it was created, or +free() if that was not set. If the argument is NULL, the function returns +immediately without doing anything. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match_data_create.html b/src/pcre2/doc/html/pcre2_match_data_create.html new file mode 100644 index 00000000..8d0321b5 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match_data_create.html @@ -0,0 +1,49 @@ + + +pcre2_match_data_create specification + + +

    pcre2_match_data_create man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize, + pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function creates a new match data block, which is used for holding the +result of a match. The first argument specifies the number of pairs of offsets +that are required. These form the "output vector" (ovector) within the match +data block, and are used to identify the matched string and any captured +substrings. There is always one pair of offsets; if ovecsize is zero, it +is treated as one. +

    +

    +The second argument points to a general context, for custom memory management, +or is NULL for system memory management. The result of the function is NULL if +the memory for the block could not be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match_data_create_from_pattern.html b/src/pcre2/doc/html/pcre2_match_data_create_from_pattern.html new file mode 100644 index 00000000..f40cf1e1 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match_data_create_from_pattern.html @@ -0,0 +1,50 @@ + + +pcre2_match_data_create_from_pattern specification + + +

    pcre2_match_data_create_from_pattern man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +pcre2_match_data *pcre2_match_data_create_from_pattern( + const pcre2_code *code, pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function creates a new match data block, which is used for holding the +result of a match. The first argument points to a compiled pattern. The number +of capturing parentheses within the pattern is used to compute the number of +pairs of offsets that are required in the match data block. These form the +"output vector" (ovector) within the match data block, and are used to identify +the matched string and any captured substrings. +

    +

    +The second argument points to a general context, for custom memory management, +or is NULL to use the same memory allocator as was used for the compiled +pattern. The result of the function is NULL if the memory for the block could +not be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_match_data_free.html b/src/pcre2/doc/html/pcre2_match_data_free.html new file mode 100644 index 00000000..6ba6162d --- /dev/null +++ b/src/pcre2/doc/html/pcre2_match_data_free.html @@ -0,0 +1,46 @@ + + +pcre2_match_data_free specification + + +

    pcre2_match_data_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_match_data_free(pcre2_match_data *match_data); +

    +
    +DESCRIPTION +
    +

    +If match_data is NULL, this function does nothing. Otherwise, +match_data must point to a match data block, which this function frees, +using the memory freeing function from the general context or compiled pattern +with which it was created, or free() if that was not set. +

    +

    +If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this +match data block, the copy of the subject that was remembered with the block is +also freed. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_pattern_convert.html b/src/pcre2/doc/html/pcre2_pattern_convert.html new file mode 100644 index 00000000..2fcd7cc0 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_pattern_convert.html @@ -0,0 +1,70 @@ + + +pcre2_pattern_convert specification + + +

    pcre2_pattern_convert man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, PCRE2_UCHAR **buffer, + PCRE2_SIZE *blength, pcre2_convert_context *cvcontext); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It converts a foreign pattern (for example, a glob) into a PCRE2 regular +expression pattern. Its arguments are: +

    +  pattern     The foreign pattern
    +  length      The length of the input pattern or PCRE2_ZERO_TERMINATED
    +  options     Option bits
    +  buffer      Pointer to pointer to output buffer, or NULL
    +  blength     Pointer to output length field
    +  cvcontext   Pointer to a convert context or NULL
    +
    +The length of the converted pattern (excluding the terminating zero) is +returned via blength. If buffer is NULL, the function just returns +the output length. If buffer points to a NULL pointer, heap memory is +obtained for the converted pattern, using the allocator in the context if +present (or else malloc()), and the field pointed to by buffer is +updated. If buffer points to a non-NULL field, that must point to a +buffer whose size is in the variable pointed to by blength. This value is +updated. +

    +

    +The option bits are: +

    +  PCRE2_CONVERT_UTF                     Input is UTF
    +  PCRE2_CONVERT_NO_UTF_CHECK            Do not check UTF validity
    +  PCRE2_CONVERT_POSIX_BASIC             Convert POSIX basic pattern
    +  PCRE2_CONVERT_POSIX_EXTENDED          Convert POSIX extended pattern
    +  PCRE2_CONVERT_GLOB                    ) Convert
    +  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR  )   various types
    +  PCRE2_CONVERT_GLOB_NO_STARSTAR        )     of glob
    +
    +The return value from pcre2_pattern_convert() is zero on success or a +non-zero PCRE2 error code. +

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_pattern_info.html b/src/pcre2/doc/html/pcre2_pattern_info.html new file mode 100644 index 00000000..eaaac6c0 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_pattern_info.html @@ -0,0 +1,109 @@ + + +pcre2_pattern_info specification + + +

    pcre2_pattern_info man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_pattern_info(const pcre2_code *code, uint32_t what, + void *where); +

    +
    +DESCRIPTION +
    +

    +This function returns information about a compiled pattern. Its arguments are: +

    +  code     Pointer to a compiled regular expression pattern
    +  what     What information is required
    +  where    Where to put the information
    +
    +The recognized values for the what argument, and the information they +request are as follows: +
    +  PCRE2_INFO_ALLOPTIONS      Final options after compiling
    +  PCRE2_INFO_ARGOPTIONS      Options passed to pcre2_compile()
    +  PCRE2_INFO_BACKREFMAX      Number of highest backreference
    +  PCRE2_INFO_BSR             What \R matches:
    +                               PCRE2_BSR_UNICODE: Unicode line endings
    +                               PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
    +  PCRE2_INFO_CAPTURECOUNT    Number of capturing subpatterns
    +  PCRE2_INFO_DEPTHLIMIT      Backtracking depth limit if set, otherwise PCRE2_ERROR_UNSET
    +  PCRE2_INFO_EXTRAOPTIONS    Extra options that were passed in the
    +                               compile context
    +  PCRE2_INFO_FIRSTBITMAP     Bitmap of first code units, or NULL
    +  PCRE2_INFO_FIRSTCODETYPE   Type of start-of-match information
    +                               0 nothing set
    +                               1 first code unit is set
    +                               2 start of string or after newline
    +  PCRE2_INFO_FIRSTCODEUNIT   First code unit when type is 1
    +  PCRE2_INFO_FRAMESIZE       Size of backtracking frame
    +  PCRE2_INFO_HASBACKSLASHC   Return 1 if pattern contains \C
    +  PCRE2_INFO_HASCRORLF       Return 1 if explicit CR or LF matches exist in the pattern
    +  PCRE2_INFO_HEAPLIMIT       Heap memory limit if set, otherwise PCRE2_ERROR_UNSET
    +  PCRE2_INFO_JCHANGED        Return 1 if (?J) or (?-J) was used
    +  PCRE2_INFO_JITSIZE         Size of JIT compiled code, or 0
    +  PCRE2_INFO_LASTCODETYPE    Type of must-be-present information
    +                               0 nothing set
    +                               1 code unit is set
    +  PCRE2_INFO_LASTCODEUNIT    Last code unit when type is 1
    +  PCRE2_INFO_MATCHEMPTY      1 if the pattern can match an empty string, 0 otherwise
    +  PCRE2_INFO_MATCHLIMIT      Match limit if set, otherwise PCRE2_ERROR_UNSET
    +  PCRE2_INFO_MAXLOOKBEHIND   Length (in characters) of the longest lookbehind assertion
    +  PCRE2_INFO_MINLENGTH       Lower bound length of matching strings
    +  PCRE2_INFO_NAMECOUNT       Number of named subpatterns
    +  PCRE2_INFO_NAMEENTRYSIZE   Size of name table entries
    +  PCRE2_INFO_NAMETABLE       Pointer to name table
    +  PCRE2_INFO_NEWLINE         Code for the newline sequence:
    +                               PCRE2_NEWLINE_CR
    +                               PCRE2_NEWLINE_LF
    +                               PCRE2_NEWLINE_CRLF
    +                               PCRE2_NEWLINE_ANY
    +                               PCRE2_NEWLINE_ANYCRLF
    +                               PCRE2_NEWLINE_NUL
    +  PCRE2_INFO_RECURSIONLIMIT  Obsolete synonym for PCRE2_INFO_DEPTHLIMIT
    +  PCRE2_INFO_SIZE            Size of compiled pattern
    +
    +If where is NULL, the function returns the amount of memory needed for +the requested information, in bytes. Otherwise, the where argument must +point to an unsigned 32-bit integer (uint32_t variable), except for the +following what values, when it must point to a variable of the type +shown: +
    +  PCRE2_INFO_FIRSTBITMAP     const uint8_t *
    +  PCRE2_INFO_JITSIZE         size_t
    +  PCRE2_INFO_NAMETABLE       PCRE2_SPTR
    +  PCRE2_INFO_SIZE            size_t
    +
    +The yield of the function is zero on success or: +
    +  PCRE2_ERROR_NULL           the argument code is NULL
    +  PCRE2_ERROR_BADMAGIC       the "magic number" was not found
    +  PCRE2_ERROR_BADOPTION      the value of what is invalid
    +  PCRE2_ERROR_BADMODE        the pattern was compiled in the wrong mode
    +  PCRE2_ERROR_UNSET          the requested information is not set
    +
    +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_serialize_decode.html b/src/pcre2/doc/html/pcre2_serialize_decode.html new file mode 100644 index 00000000..cff6e6cc --- /dev/null +++ b/src/pcre2/doc/html/pcre2_serialize_decode.html @@ -0,0 +1,65 @@ + + +pcre2_serialize_decode specification + + +

    pcre2_serialize_decode man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int32_t pcre2_serialize_decode(pcre2_code **codes, + int32_t number_of_codes, const uint8_t *bytes, + pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function decodes a serialized set of compiled patterns back into a list of +individual patterns. This is possible only on a host that is running the same +version of PCRE2, with the same code unit width, and the host must also have +the same endianness, pointer width and PCRE2_SIZE type. The arguments for +pcre2_serialize_decode() are: +

    +  codes            pointer to a vector in which to build the list
    +  number_of_codes  number of slots in the vector
    +  bytes            the serialized byte stream
    +  gcontext         pointer to a general context or NULL
    +
    +The bytes argument must point to a block of data that was originally +created by pcre2_serialize_encode(), though it may have been saved on +disc or elsewhere in the meantime. If there are more codes in the serialized +data than slots in the list, only those compiled patterns that will fit are +decoded. The yield of the function is the number of decoded patterns, or one of +the following negative error codes: +
    +  PCRE2_ERROR_BADDATA   number_of_codes is zero or less
    +  PCRE2_ERROR_BADMAGIC  mismatch of id bytes in bytes
    +  PCRE2_ERROR_BADMODE   mismatch of code unit size or PCRE2 version
    +  PCRE2_ERROR_MEMORY    memory allocation failed
    +  PCRE2_ERROR_NULL      codes or bytes is NULL
    +
    +PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled +on a system with different endianness. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the serialization functions in the +pcre2serialize +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_serialize_encode.html b/src/pcre2/doc/html/pcre2_serialize_encode.html new file mode 100644 index 00000000..f1532700 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_serialize_encode.html @@ -0,0 +1,66 @@ + + +pcre2_serialize_encode specification + + +

    pcre2_serialize_encode man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int32_t pcre2_serialize_encode(const pcre2_code **codes, + int32_t number_of_codes, uint8_t **serialized_bytes, + PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext); +

    +
    +DESCRIPTION +
    +

    +This function encodes a list of compiled patterns into a byte stream that can +be saved on disc or elsewhere. Note that this is not an abstract, portable format like +Java or .NET serialization. Conversion of the byte stream back into usable compiled patterns +can only happen on a host that is running the same version of PCRE2, with the +same code unit width, and the host must also have the same endianness, pointer +width and PCRE2_SIZE type. The arguments for pcre2_serialize_encode() +are: +

    +  codes             pointer to a vector containing the list
    +  number_of_codes   number of slots in the vector
    +  serialized_bytes  set to point to the serialized byte stream
    +  serialized_size   set to the number of bytes in the byte stream
    +  gcontext          pointer to a general context or NULL
    +
    +The context argument is used to obtain memory for the byte stream. When the +serialized data is no longer needed, it must be freed by calling +pcre2_serialize_free(). The yield of the function is the number of +serialized patterns, or one of the following negative error codes: +
    +  PCRE2_ERROR_BADDATA      number_of_codes is zero or less
    +  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
    +  PCRE2_ERROR_MEMORY       memory allocation failed
    +  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
    +  PCRE2_ERROR_NULL         an argument other than gcontext is NULL
    +
    +PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or +that a slot in the vector does not point to a compiled pattern. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the serialization functions in the +pcre2serialize +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_serialize_free.html b/src/pcre2/doc/html/pcre2_serialize_free.html new file mode 100644 index 00000000..26b435bc --- /dev/null +++ b/src/pcre2/doc/html/pcre2_serialize_free.html @@ -0,0 +1,41 @@ + + +pcre2_serialize_free specification + + +

    pcre2_serialize_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_serialize_free(uint8_t *bytes); +

    +
    +DESCRIPTION +
    +

    +This function frees the memory that was obtained by +pcre2_serialize_encode() to hold a serialized byte stream. The argument +must point to such a byte stream or be NULL, in which case the function returns +without doing anything. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the serialization functions in the +pcre2serialize +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_serialize_get_number_of_codes.html b/src/pcre2/doc/html/pcre2_serialize_get_number_of_codes.html new file mode 100644 index 00000000..fdd24294 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_serialize_get_number_of_codes.html @@ -0,0 +1,49 @@ + + +pcre2_serialize_get_number_of_codes specification + + +

    pcre2_serialize_get_number_of_codes man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes); +

    +
    +DESCRIPTION +
    +

    +The bytes argument must point to a serialized byte stream that was +originally created by pcre2_serialize_encode() (though it may have been +saved on disc or elsewhere in the meantime). The function returns the number of +serialized patterns in the byte stream, or one of the following negative error +codes: +

    +  PCRE2_ERROR_BADMAGIC  mismatch of id bytes in bytes
    +  PCRE2_ERROR_BADMODE   mismatch of code unit size or PCRE2 version
    +  PCRE2_ERROR_NULL      the argument is NULL
    +
    +PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled +on a system with different endianness. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the serialization functions in the +pcre2serialize +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_bsr.html b/src/pcre2/doc/html/pcre2_set_bsr.html new file mode 100644 index 00000000..8a62f18a --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_bsr.html @@ -0,0 +1,42 @@ + + +pcre2_set_bsr specification + + +

    pcre2_set_bsr man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_bsr(pcre2_compile_context *ccontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function sets the convention for processing \R within a compile context. +The second argument must be one of PCRE2_BSR_ANYCRLF or PCRE2_BSR_UNICODE. The +result is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_callout.html b/src/pcre2/doc/html/pcre2_set_callout.html new file mode 100644 index 00000000..4e7aca6c --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_callout.html @@ -0,0 +1,43 @@ + + +pcre2_set_callout specification + + +

    pcre2_set_callout man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_callout_block *), + void *callout_data); +

    +
    +DESCRIPTION +
    +

    +This function sets the callout fields in a match context (the first argument). +The second argument specifies a callout function, and the third argument is an +opaque data item that is passed to it. The result of this function is always +zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_character_tables.html b/src/pcre2/doc/html/pcre2_set_character_tables.html new file mode 100644 index 00000000..8564eea6 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_character_tables.html @@ -0,0 +1,45 @@ + + +pcre2_set_character_tables specification + + +

    pcre2_set_character_tables man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_character_tables(pcre2_compile_context *ccontext, + const uint8_t *tables); +

    +
    +DESCRIPTION +
    +

    +This function sets a pointer to custom character tables within a compile +context. The second argument must point to a set of PCRE2 character tables or +be NULL to request the default tables. The result is always zero. Character +tables can be created by calling pcre2_maketables() or by running the +pcre2_dftables maintenance command in binary mode (see the +pcre2build +documentation). +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_compile_extra_options.html b/src/pcre2/doc/html/pcre2_set_compile_extra_options.html new file mode 100644 index 00000000..c6c11f7e --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_compile_extra_options.html @@ -0,0 +1,47 @@ + + +pcre2_set_compile_extra_options specification + + +

    pcre2_set_compile_extra_options man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext, + uint32_t extra_options); +

    +
    +DESCRIPTION +
    +

    +This function sets additional option bits for pcre2_compile() that are +housed in a compile context. It completely replaces all the bits. The extra +options are: +

    +  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES  Allow \x{d800} to \x{dfff} in UTF-8 and UTF-32 modes
    +  PCRE2_EXTRA_ALT_BSUX                 Extended alternate \u, \U, and \x handling
    +  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL    Treat all invalid escapes as a literal following character
    +  PCRE2_EXTRA_ESCAPED_CR_IS_LF         Interpret \r as \n
    +  PCRE2_EXTRA_MATCH_LINE               Pattern matches whole lines
    +  PCRE2_EXTRA_MATCH_WORD               Pattern matches "words"
    +
    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_compile_recursion_guard.html b/src/pcre2/doc/html/pcre2_set_compile_recursion_guard.html new file mode 100644 index 00000000..c09942ce --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_compile_recursion_guard.html @@ -0,0 +1,46 @@ + + +pcre2_set_compile_recursion_guard specification + + +

    pcre2_set_compile_recursion_guard man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext, + int (*guard_function)(uint32_t, void *), void *user_data); +

    +
    +DESCRIPTION +
    +

    +This function defines, within a compile context, a function that is called +whenever pcre2_compile() starts to compile a parenthesized part of a +pattern. The first argument to the function gives the current depth of +parenthesis nesting, and the second is user data that is supplied when the +function is set up. The guard function should return zero if all is well, or +non-zero to force an error. This feature is provided so that applications can +check the available system stack space, in order to avoid running out. The +result of pcre2_set_compile_recursion_guard() is always zero. +
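As an illustration of the shape such a function takes, here is a minimal sketch that matches the `int (*guard_function)(uint32_t, void *)` prototype from the synopsis. It uses a fixed nesting cap as a stand-in for a real stack-space check; the `guard_limits` structure and the `depth_guard` name are hypothetical and not part of PCRE2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user data: the deepest nesting the application tolerates. */
typedef struct {
    uint32_t max_depth;
} guard_limits;

/* Return zero to let compilation continue, non-zero to force an error. */
int depth_guard(uint32_t depth, void *user_data)
{
    const guard_limits *limits = user_data;

    if (limits == NULL)
        return 0;                       /* no limit configured: always allow */

    return depth > limits->max_depth ? 1 : 0;
}
```

With a compile context `ccontext` and a `guard_limits` variable `limits` in scope, the function would be installed with `pcre2_set_compile_recursion_guard(ccontext, depth_guard, &limits)`.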

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_depth_limit.html b/src/pcre2/doc/html/pcre2_set_depth_limit.html new file mode 100644 index 00000000..a1cf7062 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_depth_limit.html @@ -0,0 +1,40 @@ + + +pcre2_set_depth_limit specification + + +

    pcre2_set_depth_limit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_depth_limit(pcre2_match_context *mcontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function sets the backtracking depth limit field in a match context. The +result is always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_glob_escape.html b/src/pcre2/doc/html/pcre2_set_glob_escape.html new file mode 100644 index 00000000..2b556271 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_glob_escape.html @@ -0,0 +1,43 @@ + + +pcre2_set_glob_escape specification + + +

    pcre2_set_glob_escape man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_glob_escape(pcre2_convert_context *cvcontext, + uint32_t escape_char); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It sets the escape character that is used when converting globs. The second +argument must either be zero (meaning there is no escape character) or a +punctuation character whose code point is less than 256. The default is grave +accent if running under Windows, otherwise backslash. The result of the +function is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. +
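The rule for acceptable escape characters can be checked before calling the function, for example when the character comes from user configuration. The helper below is a sketch of that rule as stated above, not PCRE2's own code; `glob_escape_is_acceptable` is a hypothetical name, and the result of `ispunct()` for code points above 127 depends on the current C locale.

```c
#include <ctype.h>
#include <stdint.h>

/* Return non-zero if escape_char is zero (no escape character) or a
   punctuation character whose code point is less than 256. */
int glob_escape_is_acceptable(uint32_t escape_char)
{
    if (escape_char == 0)
        return 1;
    if (escape_char >= 256)
        return 0;
    return ispunct((int)escape_char) != 0;
}
```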

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_glob_separator.html b/src/pcre2/doc/html/pcre2_set_glob_separator.html new file mode 100644 index 00000000..283648ea --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_glob_separator.html @@ -0,0 +1,42 @@ + + +pcre2_set_glob_separator specification + + +

    pcre2_set_glob_separator man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_glob_separator(pcre2_convert_context *cvcontext, + uint32_t separator_char); +

    +
    +DESCRIPTION +
    +

    +This function is part of an experimental set of pattern conversion functions. +It sets the component separator character that is used when converting globs. +The second argument must be one of the characters forward slash, backslash, or +dot. The default is backslash when running under Windows, otherwise forward +slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if +the second argument is invalid. +
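The separator rule and the platform default described above can be sketched as follows. Both function names are hypothetical, and using the `_WIN32` macro to detect Windows is an assumption of this sketch rather than something this page specifies.

```c
#include <stdint.h>

/* Return non-zero if separator_char is forward slash, backslash, or dot. */
int glob_separator_is_acceptable(uint32_t separator_char)
{
    return separator_char == '/' ||
           separator_char == '\\' ||
           separator_char == '.';
}

/* The documented default: backslash under Windows, otherwise forward slash. */
uint32_t glob_separator_default(void)
{
#ifdef _WIN32
    return '\\';
#else
    return '/';
#endif
}
```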

    +

    +The pattern conversion functions are described in the +pcre2convert +documentation. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_heap_limit.html b/src/pcre2/doc/html/pcre2_set_heap_limit.html new file mode 100644 index 00000000..3631ef6f --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_heap_limit.html @@ -0,0 +1,40 @@ + + +pcre2_set_heap_limit specification + + +

    pcre2_set_heap_limit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_heap_limit(pcre2_match_context *mcontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function sets the backtracking heap limit field in a match context. The +result is always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_match_limit.html b/src/pcre2/doc/html/pcre2_set_match_limit.html new file mode 100644 index 00000000..e840c744 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_match_limit.html @@ -0,0 +1,40 @@ + + +pcre2_set_match_limit specification + + +

    pcre2_set_match_limit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_match_limit(pcre2_match_context *mcontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function sets the match limit field in a match context. The result is +always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_max_pattern_length.html b/src/pcre2/doc/html/pcre2_set_max_pattern_length.html new file mode 100644 index 00000000..f6e422aa --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_max_pattern_length.html @@ -0,0 +1,43 @@ + + +pcre2_set_max_pattern_length specification + + +

    pcre2_set_max_pattern_length man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, + PCRE2_SIZE value); +

    +
    +DESCRIPTION +
    +

    +This function sets, in a compile context, the maximum text length (in code +units) of the pattern that can be compiled. The result is always zero. If a +longer pattern is passed to pcre2_compile() there is an immediate error +return. The default is effectively unlimited, being the largest value a +PCRE2_SIZE variable can hold. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_newline.html b/src/pcre2/doc/html/pcre2_set_newline.html new file mode 100644 index 00000000..ba813001 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_newline.html @@ -0,0 +1,51 @@ + + +pcre2_set_newline specification + + +

    pcre2_set_newline man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_newline(pcre2_compile_context *ccontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function sets the newline convention within a compile context. This +specifies which character(s) are recognized as newlines when compiling and +matching patterns. The second argument must be one of: +

    +  PCRE2_NEWLINE_CR        Carriage return only
    +  PCRE2_NEWLINE_LF        Linefeed only
    +  PCRE2_NEWLINE_CRLF      CR followed by LF only
    +  PCRE2_NEWLINE_ANYCRLF   Any of the above
    +  PCRE2_NEWLINE_ANY       Any Unicode newline sequence
    +  PCRE2_NEWLINE_NUL       The NUL character (binary zero)
    +
    +The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_offset_limit.html b/src/pcre2/doc/html/pcre2_set_offset_limit.html new file mode 100644 index 00000000..6d9a85c6 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_offset_limit.html @@ -0,0 +1,40 @@ + + +pcre2_set_offset_limit specification + + +

    pcre2_set_offset_limit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_offset_limit(pcre2_match_context *mcontext, + PCRE2_SIZE value); +

    +
    +DESCRIPTION +
    +

    +This function sets the offset limit field in a match context. The result is +always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_parens_nest_limit.html b/src/pcre2/doc/html/pcre2_set_parens_nest_limit.html new file mode 100644 index 00000000..95fd31c3 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_parens_nest_limit.html @@ -0,0 +1,40 @@ + + +pcre2_set_parens_nest_limit specification + + +

    pcre2_set_parens_nest_limit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function sets, in a compile context, the maximum depth of nested +parentheses in a pattern. The result is always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_recursion_limit.html b/src/pcre2/doc/html/pcre2_set_recursion_limit.html new file mode 100644 index 00000000..9ff68c2f --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_recursion_limit.html @@ -0,0 +1,40 @@ + + +pcre2_set_recursion_limit specification + + +

    pcre2_set_recursion_limit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_recursion_limit(pcre2_match_context *mcontext, + uint32_t value); +

    +
    +DESCRIPTION +
    +

    +This function is obsolete and should not be used in new code. Use +pcre2_set_depth_limit() instead. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_recursion_memory_management.html b/src/pcre2/doc/html/pcre2_set_recursion_memory_management.html new file mode 100644 index 00000000..1e057b9d --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_recursion_memory_management.html @@ -0,0 +1,42 @@ + + +pcre2_set_recursion_memory_management specification + + +

    pcre2_set_recursion_memory_management man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_recursion_memory_management( + pcre2_match_context *mcontext, + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); +

    +
    +DESCRIPTION +
    +

    +From release 10.30 onwards, this function is obsolete and does nothing. The +result is always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_set_substitute_callout.html b/src/pcre2/doc/html/pcre2_set_substitute_callout.html new file mode 100644 index 00000000..7ae3a398 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_set_substitute_callout.html @@ -0,0 +1,43 @@ + + +pcre2_set_substitute_callout specification + + +

    pcre2_set_substitute_callout man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *), + void *callout_data); +

    +
    +DESCRIPTION +
    +

    +This function sets the substitute callout fields in a match context (the first +argument). The second argument specifies a callout function, and the third +argument is an opaque data item that is passed to it. The result of this +function is always zero. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substitute.html b/src/pcre2/doc/html/pcre2_substitute.html new file mode 100644 index 00000000..10b2267e --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substitute.html @@ -0,0 +1,111 @@ + + +pcre2_substitute specification + + +

    pcre2_substitute man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, PCRE2_SPTR replacement, + PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, + PCRE2_SIZE *outlengthptr); +

    +
    +DESCRIPTION +
    +

    +This function matches a compiled regular expression against a given subject +string, using a matching algorithm that is similar to Perl's. It then makes a +copy of the subject, substituting a replacement string for what was matched. +Its arguments are: +

    +  code          Points to the compiled pattern
    +  subject       Points to the subject string
    +  length        Length of the subject string
    +  startoffset   Offset in the subject at which to start matching
    +  options       Option bits
    +  match_data    Points to a match data block, or is NULL
    +  mcontext      Points to a match context, or is NULL
    +  replacement   Points to the replacement string
    +  rlength       Length of the replacement string
    +  outputbuffer  Points to the output buffer
    +  outlengthptr  Points to the length of the output buffer
    +
    +A match data block is needed only if you want to inspect the data from the +final match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is +set. A match context is needed only if you want to: +
    +  Set up a callout function
    +  Set a matching offset limit
    +  Change the backtracking match limit
    +  Change the backtracking depth limit
    +  Set custom memory management in the match context
    +
    +The length, startoffset and rlength values are in code units, +not characters, as is the value of the variable pointed at by +outlengthptr. This variable must contain the length of the output buffer +when the function is called. If the function is successful, the value is +changed to the length of the new string, excluding the trailing zero that is +automatically added. +

    +

    +The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for +zero-terminated strings. The options are: +

    +  PCRE2_ANCHORED             Match only at the first position
    +  PCRE2_ENDANCHORED          Pattern can match only at end of subject
    +  PCRE2_NOTBOL               Subject is not the beginning of a line
    +  PCRE2_NOTEOL               Subject is not the end of a line
    +  PCRE2_NOTEMPTY             An empty string is not a valid match
    +  PCRE2_NOTEMPTY_ATSTART     An empty string at the start of the subject is not a valid match
    +  PCRE2_NO_JIT               Do not use JIT matching
    +  PCRE2_NO_UTF_CHECK         Do not check the subject or replacement for UTF validity (only relevant if
    +                              PCRE2_UTF was set at compile time)
    +  PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
    +  PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
    +  PCRE2_SUBSTITUTE_LITERAL   The replacement string is literal
    +  PCRE2_SUBSTITUTE_MATCHED   Use pre-existing match data for 1st match
    +  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
    +  PCRE2_SUBSTITUTE_REPLACEMENT_ONLY  Return only replacement string(s)
    +  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
    +  PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
    +
    +If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED, +PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored. +

    +

    +If PCRE2_SUBSTITUTE_MATCHED is set, match_data must not be NULL; its +contents must be the result of a call to pcre2_match() using the same +pattern and subject. +

    +

    +The function returns the number of substitutions, which may be zero if there +are no matches. The result may be greater than one only when +PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code +is returned. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_copy_byname.html b/src/pcre2/doc/html/pcre2_substring_copy_byname.html new file mode 100644 index 00000000..fd01805e --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_copy_byname.html @@ -0,0 +1,58 @@ + + +pcre2_substring_copy_byname specification + + +

    pcre2_substring_copy_byname man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_copy_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for extracting a captured substring, identified +by name, into a given buffer. The arguments are: +

    +  match_data    The match data block for the match
    +  name          Name of the required substring
    +  buffer        Buffer to receive the string
    +  bufflen       Length of buffer (code units)
    +
    +The bufflen variable is updated to contain the length of the extracted +string, excluding the trailing zero. The yield of the function is zero for +success or one of the following error numbers: +
    +  PCRE2_ERROR_NOSUBSTRING   there are no groups of that name
    +  PCRE2_ERROR_UNAVAILABLE   the ovector was too small for that group
    +  PCRE2_ERROR_UNSET         the group did not participate in the match
    +  PCRE2_ERROR_NOMEMORY      the buffer is not big enough
    +
    +If there is more than one group with the given name, the first one that is set +is returned. In this situation PCRE2_ERROR_UNSET means that no group with the +given name was set. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_copy_bynumber.html b/src/pcre2/doc/html/pcre2_substring_copy_bynumber.html new file mode 100644 index 00000000..83e1a272 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_copy_bynumber.html @@ -0,0 +1,57 @@ + + +pcre2_substring_copy_bynumber specification + + +

    pcre2_substring_copy_bynumber man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_copy_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR *buffer, + PCRE2_SIZE *bufflen); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for extracting a captured substring into a given +buffer. The arguments are: +

    +  match_data    The match data block for the match
    +  number        Number of the required substring
    +  buffer        Buffer to receive the string
    +  bufflen       Length of buffer
    +
    +The bufflen variable is updated with the length of the extracted string, +excluding the terminating zero. The yield of the function is zero for success +or one of the following error numbers: +
    +  PCRE2_ERROR_NOSUBSTRING   there are no groups of that number
    +  PCRE2_ERROR_UNAVAILABLE   the ovector was too small for that group
    +  PCRE2_ERROR_UNSET         the group did not participate in the match
    +  PCRE2_ERROR_NOMEMORY      the buffer is too small
    +
    +
    +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_free.html b/src/pcre2/doc/html/pcre2_substring_free.html new file mode 100644 index 00000000..e0d0fbda --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_free.html @@ -0,0 +1,41 @@ + + +pcre2_substring_free specification + + +

    pcre2_substring_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_substring_free(PCRE2_UCHAR *buffer); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for freeing the memory obtained by a previous +call to pcre2_substring_get_byname() or +pcre2_substring_get_bynumber(). Its only argument is a pointer to the +string. If the argument is NULL, the function does nothing. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_get_byname.html b/src/pcre2/doc/html/pcre2_substring_get_byname.html new file mode 100644 index 00000000..a4b8771d --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_get_byname.html @@ -0,0 +1,60 @@ + + +pcre2_substring_get_byname specification + + +

    pcre2_substring_get_byname man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_get_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for extracting a captured substring by name into +newly acquired memory. The arguments are: +

    +  match_data    The match data for the match
    +  name          Name of the required substring
    +  bufferptr     Where to put the string pointer
    +  bufflen       Where to put the string length
    +
    +The memory in which the substring is placed is obtained by calling the same +memory allocation function that was used for the match data block. The +convenience function pcre2_substring_free() can be used to free it when +it is no longer needed. The yield of the function is zero for success or one of +the following error numbers: +
    +  PCRE2_ERROR_NOSUBSTRING   there are no groups of that name
    +  PCRE2_ERROR_UNAVAILABLE   the ovector was too small for that group
    +  PCRE2_ERROR_UNSET         the group did not participate in the match
    +  PCRE2_ERROR_NOMEMORY      memory could not be obtained
    +
    +If there is more than one group with the given name, the first one that is set +is returned. In this situation PCRE2_ERROR_UNSET means that no group with the +given name was set. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_get_bynumber.html b/src/pcre2/doc/html/pcre2_substring_get_bynumber.html new file mode 100644 index 00000000..391bc82b --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_get_bynumber.html @@ -0,0 +1,58 @@ + + +pcre2_substring_get_bynumber specification + + +

    pcre2_substring_get_bynumber man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_get_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for extracting a captured substring by number +into newly acquired memory. The arguments are: +

    +  match_data    The match data for the match
    +  number        Number of the required substring
    +  bufferptr     Where to put the string pointer
    +  bufflen       Where to put the string length
    +
    +The memory in which the substring is placed is obtained by calling the same +memory allocation function that was used for the match data block. The +convenience function pcre2_substring_free() can be used to free it when +it is no longer needed. The yield of the function is zero for success or one of +the following error numbers: +
    +  PCRE2_ERROR_NOSUBSTRING   there are no groups of that number
    +  PCRE2_ERROR_UNAVAILABLE   the ovector was too small for that group
    +  PCRE2_ERROR_UNSET         the group did not participate in the match
    +  PCRE2_ERROR_NOMEMORY      memory could not be obtained
    +
    +
    +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_length_byname.html b/src/pcre2/doc/html/pcre2_substring_length_byname.html new file mode 100644 index 00000000..213bc949 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_length_byname.html @@ -0,0 +1,46 @@ + + +pcre2_substring_length_byname specification + + +

    pcre2_substring_length_byname man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_length_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_SIZE *length); +

    +
    +DESCRIPTION +
    +

    +This function returns the length of a matched substring, identified by name. +The arguments are: +

    +  match_data   The match data block for the match
    +  name         The substring name
    +  length       Where to return the length
    +
    +The yield is zero on success, or an error code if the substring is not found. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_length_bynumber.html b/src/pcre2/doc/html/pcre2_substring_length_bynumber.html new file mode 100644 index 00000000..db01cca4 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_length_bynumber.html @@ -0,0 +1,48 @@ + + +pcre2_substring_length_bynumber specification + + +

    pcre2_substring_length_bynumber man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_length_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_SIZE *length); +

    +
    +DESCRIPTION +
    +

    +This function returns the length of a matched substring, identified by number. +The arguments are: +

    +  match_data   The match data block for the match
    +  number       The substring number
    +  length       Where to return the length, or NULL
    +
    +The third argument may be NULL if all you want to know is whether or not a +substring is set. The yield is zero on success, or a negative error code +otherwise. After a partial match, only substring 0 is available. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_list_free.html b/src/pcre2/doc/html/pcre2_substring_list_free.html new file mode 100644 index 00000000..0919d1e5 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_list_free.html @@ -0,0 +1,41 @@ + + +pcre2_substring_list_free specification + + +

    pcre2_substring_list_free man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +void pcre2_substring_list_free(PCRE2_SPTR *list); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for freeing the memory obtained by a previous +call to pcre2_substring_list_get(). Its only argument is a pointer to +the list of string pointers. If the argument is NULL, the function returns +immediately, without doing anything. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_list_get.html b/src/pcre2/doc/html/pcre2_substring_list_get.html new file mode 100644 index 00000000..fd436274 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_list_get.html @@ -0,0 +1,56 @@ + + +pcre2_substring_list_get specification + + +

    pcre2_substring_list_get man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_list_get(pcre2_match_data *match_data, + PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr); +

    +
    +DESCRIPTION +
    +

    +This is a convenience function for extracting all the captured substrings after +a pattern match. It builds a list of pointers to the strings, and (optionally) +a second list that contains their lengths (in code units), excluding a +terminating zero that is added to each of them. All this is done in a single +block of memory that is obtained using the same memory allocation function that +was used to get the match data block. The convenience function +pcre2_substring_list_free() can be used to free it when it is no longer +needed. The arguments are: +

    +  match_data    The match data block
    +  listptr       Where to put a pointer to the list
    +  lengthsptr    Where to put a pointer to the lengths, or NULL
    +
    +A pointer to a list of pointers is put in the variable whose address is in +listptr. The list is terminated by a NULL pointer. If lengthsptr is +not NULL, a matching list of lengths is created, and its address is placed in +lengthsptr. The yield of the function is zero on success or +PCRE2_ERROR_NOMEMORY if sufficient memory could not be obtained. +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_nametable_scan.html b/src/pcre2/doc/html/pcre2_substring_nametable_scan.html new file mode 100644 index 00000000..277affae --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_nametable_scan.html @@ -0,0 +1,53 @@ + + +pcre2_substring_nametable_scan specification + + +

    pcre2_substring_nametable_scan man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_nametable_scan(const pcre2_code *code, + PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last); +

    +
    +DESCRIPTION +
    +

    +This convenience function finds, for a compiled pattern, the first and last +entries for a given name in the table that translates capture group names into +numbers. +

    +  code    Compiled regular expression
    +  name    Name whose entries are required
    +  first   Where to return a pointer to the first entry
    +  last    Where to return a pointer to the last entry
    +
    +When the name is found in the table, if first is NULL, the function +returns a group number, but if there is more than one matching entry, it is not +defined which one. Otherwise, when both pointers have been set, the yield of +the function is the length of each entry in code units. If the name is not +found, PCRE2_ERROR_NOSUBSTRING is returned. +

    +

    +There is a complete description of the PCRE2 native API, including the format of +the table entries, in the +pcre2api +page, and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2_substring_number_from_name.html b/src/pcre2/doc/html/pcre2_substring_number_from_name.html new file mode 100644 index 00000000..160fbda6 --- /dev/null +++ b/src/pcre2/doc/html/pcre2_substring_number_from_name.html @@ -0,0 +1,50 @@ + + +pcre2_substring_number_from_name specification + + +

    pcre2_substring_number_from_name man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SYNOPSIS +
    +

    +#include <pcre2.h> +

    +

    +int pcre2_substring_number_from_name(const pcre2_code *code, + PCRE2_SPTR name); +

    +
    +DESCRIPTION +
    +

    +This convenience function finds the number of a named substring capturing +parenthesis in a compiled pattern, provided that it is a unique name. The +function arguments are: +

    +  code    Compiled regular expression
    +  name    Name whose number is required
    +
    +The yield of the function is the number of the parenthesis if the name is +found, or PCRE2_ERROR_NOSUBSTRING if it is not found. When duplicate names are +allowed (PCRE2_DUPNAMES is set), if the name is not unique, +PCRE2_ERROR_NOUNIQUESUBSTRING is returned. You can obtain the list of numbers +with the same name by calling pcre2_substring_nametable_scan(). +

    +

    +There is a complete description of the PCRE2 native API in the +pcre2api +page and a description of the POSIX API in the +pcre2posix +page. +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2api.html b/src/pcre2/doc/html/pcre2api.html new file mode 100644 index 00000000..4ca0eb0c --- /dev/null +++ b/src/pcre2/doc/html/pcre2api.html @@ -0,0 +1,3998 @@ + + +pcre2api specification + + +

    pcre2api man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +

    +#include <pcre2.h> +
    +
    +PCRE2 is a new API for PCRE, starting at release 10.0. This document contains a +description of all its native functions. See the +pcre2 +document for an overview of all the PCRE2 documentation. +

    +
    PCRE2 NATIVE API BASIC FUNCTIONS
    +

    +pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset, + pcre2_compile_context *ccontext); +
    +
    +void pcre2_code_free(pcre2_code *code); +
    +
    +pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize, + pcre2_general_context *gcontext); +
    +
    +pcre2_match_data *pcre2_match_data_create_from_pattern( + const pcre2_code *code, pcre2_general_context *gcontext); +
    +
    +int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); +
    +
    +int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, + int *workspace, PCRE2_SIZE wscount); +
    +
    +void pcre2_match_data_free(pcre2_match_data *match_data); +

    +
    PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS
    +

    +PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data); +
    +
    +uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data); +
    +
    +PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data); +
    +
    +PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data); +

    +
    PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS
    +

    +pcre2_general_context *pcre2_general_context_create( + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); +
    +
    +pcre2_general_context *pcre2_general_context_copy( + pcre2_general_context *gcontext); +
    +
    +void pcre2_general_context_free(pcre2_general_context *gcontext); +

    +
    PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS
    +

    +pcre2_compile_context *pcre2_compile_context_create( + pcre2_general_context *gcontext); +
    +
    +pcre2_compile_context *pcre2_compile_context_copy( + pcre2_compile_context *ccontext); +
    +
    +void pcre2_compile_context_free(pcre2_compile_context *ccontext); +
    +
    +int pcre2_set_bsr(pcre2_compile_context *ccontext, + uint32_t value); +
    +
    +int pcre2_set_character_tables(pcre2_compile_context *ccontext, + const uint8_t *tables); +
    +
    +int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext, + uint32_t extra_options); +
    +
    +int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, + PCRE2_SIZE value); +
    +
    +int pcre2_set_newline(pcre2_compile_context *ccontext, + uint32_t value); +
    +
    +int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, + uint32_t value); +
    +
    +int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext, + int (*guard_function)(uint32_t, void *), void *user_data); +

    +
    PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS
    +

    +pcre2_match_context *pcre2_match_context_create( + pcre2_general_context *gcontext); +
    +
    +pcre2_match_context *pcre2_match_context_copy( + pcre2_match_context *mcontext); +
    +
    +void pcre2_match_context_free(pcre2_match_context *mcontext); +
    +
    +int pcre2_set_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_callout_block *, void *), + void *callout_data); +
    +
    +int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *, void *), + void *callout_data); +
    +
    +int pcre2_set_offset_limit(pcre2_match_context *mcontext, + PCRE2_SIZE value); +
    +
    +int pcre2_set_heap_limit(pcre2_match_context *mcontext, + uint32_t value); +
    +
    +int pcre2_set_match_limit(pcre2_match_context *mcontext, + uint32_t value); +
    +
    +int pcre2_set_depth_limit(pcre2_match_context *mcontext, + uint32_t value); +

    +
    PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS
    +

    +int pcre2_substring_copy_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen); +
    +
    +int pcre2_substring_copy_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR *buffer, + PCRE2_SIZE *bufflen); +
    +
    +void pcre2_substring_free(PCRE2_UCHAR *buffer); +
    +
    +int pcre2_substring_get_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen); +
    +
    +int pcre2_substring_get_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR **bufferptr, + PCRE2_SIZE *bufflen); +
    +
    +int pcre2_substring_length_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_SIZE *length); +
    +
    +int pcre2_substring_length_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_SIZE *length); +
    +
    +int pcre2_substring_nametable_scan(const pcre2_code *code, + PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last); +
    +
    +int pcre2_substring_number_from_name(const pcre2_code *code, + PCRE2_SPTR name); +
    +
    +void pcre2_substring_list_free(PCRE2_SPTR *list); +
    +
    +int pcre2_substring_list_get(pcre2_match_data *match_data, + PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr); +

    +
    PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION
    +

    +int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, PCRE2_SPTR replacement, + PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, + PCRE2_SIZE *outlengthptr); +

    +
    PCRE2 NATIVE API JIT FUNCTIONS
    +

    +int pcre2_jit_compile(pcre2_code *code, uint32_t options); +
    +
    +int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); +
    +
    +void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); +
    +
    +pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize, + PCRE2_SIZE maxsize, pcre2_general_context *gcontext); +
    +
    +void pcre2_jit_stack_assign(pcre2_match_context *mcontext, + pcre2_jit_callback callback_function, void *callback_data); +
    +
    +void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack); +

    +
    PCRE2 NATIVE API SERIALIZATION FUNCTIONS
    +

    +int32_t pcre2_serialize_decode(pcre2_code **codes, + int32_t number_of_codes, const uint8_t *bytes, + pcre2_general_context *gcontext); +
    +
    +int32_t pcre2_serialize_encode(const pcre2_code **codes, + int32_t number_of_codes, uint8_t **serialized_bytes, + PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext); +
    +
    +void pcre2_serialize_free(uint8_t *bytes); +
    +
    +int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes); +

    +
    PCRE2 NATIVE API AUXILIARY FUNCTIONS
    +

    +pcre2_code *pcre2_code_copy(const pcre2_code *code); +
    +
    +pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code); +
    +
    +int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer, + PCRE2_SIZE bufflen); +
    +
    +const uint8_t *pcre2_maketables(pcre2_general_context *gcontext); +
    +
    +void pcre2_maketables_free(pcre2_general_context *gcontext, + const uint8_t *tables); +
    +
    +int pcre2_pattern_info(const pcre2_code *code, uint32_t what, + void *where); +
    +
    +int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); +
    +
    +int pcre2_config(uint32_t what, void *where); +

    +
    PCRE2 NATIVE API OBSOLETE FUNCTIONS
    +

    +int pcre2_set_recursion_limit(pcre2_match_context *mcontext, + uint32_t value); +
    +
    +int pcre2_set_recursion_memory_management( + pcre2_match_context *mcontext, + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); +
    +
    +These functions became obsolete at release 10.30 and are retained only for +backward compatibility. They should not be used in new code. The first is +replaced by pcre2_set_depth_limit(); the second is no longer needed and +has no effect (it always returns zero). +

    +
    PCRE2 EXPERIMENTAL PATTERN CONVERSION FUNCTIONS
    +

    +pcre2_convert_context *pcre2_convert_context_create( + pcre2_general_context *gcontext); +
    +
    +pcre2_convert_context *pcre2_convert_context_copy( + pcre2_convert_context *cvcontext); +
    +
    +void pcre2_convert_context_free(pcre2_convert_context *cvcontext); +
    +
    +int pcre2_set_glob_escape(pcre2_convert_context *cvcontext, + uint32_t escape_char); +
    +
    +int pcre2_set_glob_separator(pcre2_convert_context *cvcontext, + uint32_t separator_char); +
    +
    +int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, PCRE2_UCHAR **buffer, + PCRE2_SIZE *blength, pcre2_convert_context *cvcontext); +
    +
    +void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern); +
    +
    +These functions provide a way of converting non-PCRE2 patterns into +patterns that can be processed by pcre2_compile(). This facility is +experimental and may be changed in future releases. At present, "globs" and +POSIX basic and extended patterns can be converted. Details are given in the +pcre2convert +documentation. +

    +
    PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES
    +

    +There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code +units, respectively. However, there is just one header file, pcre2.h. +This contains the function prototypes and other definitions for all three +libraries. One, two, or all three can be installed simultaneously. On Unix-like +systems the libraries are called libpcre2-8, libpcre2-16, and +libpcre2-32, and they can also co-exist with the original PCRE libraries. +

    +

    +Character strings are passed to and from a PCRE2 library as a sequence of +unsigned integers in code units of the appropriate width. Every PCRE2 function +comes in three different forms, one for each library, for example: +

    +  pcre2_compile_8()
    +  pcre2_compile_16()
    +  pcre2_compile_32()
    +
    +There are also three different sets of data types: +
    +  PCRE2_UCHAR8, PCRE2_UCHAR16, PCRE2_UCHAR32
    +  PCRE2_SPTR8,  PCRE2_SPTR16,  PCRE2_SPTR32
    +
    +The UCHAR types define unsigned code units of the appropriate widths. For +example, PCRE2_UCHAR16 is usually defined as `uint16_t'. The SPTR types are +constant pointers to the equivalent UCHAR types, that is, they are pointers to +vectors of unsigned code units. +

    +

    +Many applications use only one code unit width. For their convenience, macros +are defined whose names are the generic forms such as pcre2_compile() and +PCRE2_SPTR. These macros use the value of the macro PCRE2_CODE_UNIT_WIDTH to +generate the appropriate width-specific function and macro names. +PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it +to be 8, 16, or 32 before including pcre2.h in order to make use of the +generic names. +

    +

    +Applications that use more than one code unit width can be linked with more +than one PCRE2 library, but must define PCRE2_CODE_UNIT_WIDTH to be 0 before +including pcre2.h, and then use the real function names. Any code that is +to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is +unknown should also use the real function names. (Unfortunately, it is not +possible in C code to save and restore the value of a macro.) +

    +

    +If PCRE2_CODE_UNIT_WIDTH is not defined before including pcre2.h, a +compiler error occurs. +

    +

    +When using multiple libraries in an application, you must take care when +processing any particular pattern to use only functions from a single library. +For example, if you want to run a match using a pattern that was compiled with +pcre2_compile_16(), you must do so with pcre2_match_16(), not +pcre2_match_8() or pcre2_match_32(). +

    +

    +In the function summaries above, and in the rest of this document and other +PCRE2 documents, functions and data types are described using their generic +names, without the _8, _16, or _32 suffix. +

    +
    PCRE2 API OVERVIEW
    +

    +PCRE2 has its own native API, which is described in this document. There are +also some wrapper functions for the 8-bit library that correspond to the +POSIX regular expression API, but they do not give access to all the +functionality of PCRE2. They are described in the +pcre2posix +documentation. Both these APIs define a set of C function calls. +

    +

    +The native API C data types, function prototypes, option values, and error +codes are defined in the header file pcre2.h, which also contains +definitions of PCRE2_MAJOR and PCRE2_MINOR, the major and minor release numbers +for the library. Applications can use these to include support for different +releases of PCRE2. +

    +

    +In a Windows environment, if you want to statically link an application program +against a non-dll PCRE2 library, you must define PCRE2_STATIC before including +pcre2.h. +

    +

    +The functions pcre2_compile() and pcre2_match() are used for +compiling and matching regular expressions in a Perl-compatible manner. A +sample program that demonstrates the simplest way of using them is provided in +the file called pcre2demo.c in the PCRE2 source distribution. A listing +of this program is given in the +pcre2demo +documentation, and the +pcre2sample +documentation describes how to compile and run it. +

    +

    +The compiling and matching functions recognize various options that are passed +as bits in an options argument. There are also some more complicated parameters +such as custom memory management functions and resource limits that are passed +in "contexts" (which are just memory blocks, described below). Simple +applications do not need to make use of contexts. +

    +

    +Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be +built in appropriate hardware environments. It greatly speeds up the matching +performance of many patterns. Programs can request that it be used if +available by calling pcre2_jit_compile() after a pattern has been +successfully compiled by pcre2_compile(). This does nothing if JIT +support is not available. +

    +

    +More complicated programs might need to make use of the specialist functions +pcre2_jit_stack_create(), pcre2_jit_stack_free(), and +pcre2_jit_stack_assign() in order to control the JIT code's memory usage. +

    +

    +JIT matching is automatically used by pcre2_match() if it is available, +unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT +matching, which gives improved performance at the expense of less sanity +checking. The JIT-specific functions are discussed in the +pcre2jit +documentation. +

    +

    +A second matching function, pcre2_dfa_match(), which is not +Perl-compatible, is also provided. This uses a different algorithm for the +matching. The alternative algorithm finds all possible matches (at a given +point in the subject), and scans the subject just once (unless there are +lookaround assertions). However, this algorithm does not return captured +substrings. A description of the two matching algorithms and their advantages +and disadvantages is given in the +pcre2matching +documentation. There is no JIT support for pcre2_dfa_match(). +

    +

    +In addition to the main compiling and matching functions, there are convenience +functions for extracting captured substrings from a subject string that has +been matched by pcre2_match(). They are: +

    +  pcre2_substring_copy_byname()
    +  pcre2_substring_copy_bynumber()
    +  pcre2_substring_get_byname()
    +  pcre2_substring_get_bynumber()
    +  pcre2_substring_list_get()
    +  pcre2_substring_length_byname()
    +  pcre2_substring_length_bynumber()
    +  pcre2_substring_nametable_scan()
    +  pcre2_substring_number_from_name()
    +
    +pcre2_substring_free() and pcre2_substring_list_free() are also +provided, to free memory used for extracted strings. If either of these +functions is called with a NULL argument, the function returns immediately +without doing anything. +

    +

    +The function pcre2_substitute() can be called to match a pattern and +return a copy of the subject string with substitutions for parts that were +matched. +

    +

    +Functions whose names begin with pcre2_serialize_ are used for saving +compiled patterns on disc or elsewhere, and reloading them later. +

    +

    +Finally, there are functions for finding out information about a compiled +pattern (pcre2_pattern_info()) and about the configuration with which +PCRE2 was built (pcre2_config()). +

    +

    +Functions with names ending with _free() are used for freeing memory +blocks of various sorts. In all cases, if one of these functions is called with +a NULL argument, it does nothing. +

    +
    STRING LENGTHS AND OFFSETS
    +

    +The PCRE2 API uses string lengths and offsets into strings of code units in +several places. These values are always of type PCRE2_SIZE, which is an +unsigned integer type, currently always defined as size_t. The largest +value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved +as a special indicator for zero-terminated strings and unset offsets. +Therefore, the longest string that can be handled is one less than this +maximum. +
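The reserved-value convention described above can be sketched as follows. This is only an illustration: DEMO_ZERO_TERMINATED and demo_resolve_length() are hypothetical names chosen so the example builds without pcre2.h, and it assumes 8-bit code units and that PCRE2_SIZE is size_t, as the text states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the reserved value ~(PCRE2_SIZE)0, assuming
   PCRE2_SIZE is size_t. This is not a name taken from pcre2.h. */
#define DEMO_ZERO_TERMINATED (~(size_t)0)

/* Resolve a caller-supplied length for a string of 8-bit code units.
   The sentinel means "measure up to the terminating zero"; any other
   value is taken as an explicit length in code units. */
size_t demo_resolve_length(const char *subject, size_t length)
{
    assert(subject != NULL);
    return (length == DEMO_ZERO_TERMINATED) ? strlen(subject) : length;
}
```

Because the all-ones value is reserved, an explicit length can never be equal to it, which is why the longest usable string is one code unit shorter than the maximum of the type.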

    +
    NEWLINES
    +

    +PCRE2 supports five different conventions for indicating line breaks in +strings: a single CR (carriage return) character, a single LF (linefeed) +character, the two-character sequence CRLF, any of the three preceding, or any +Unicode newline sequence. The Unicode newline sequences are the three just +mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed, +U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS +(paragraph separator, U+2029). +
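As a rough sketch of the "any Unicode newline sequence" convention listed above, the helpers below classify code points and measure the newline at a given position. The function names are hypothetical, and the code works on an array of code points rather than on PCRE2's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the code point is one of the single-character Unicode newlines
   listed above. The two-character sequence CRLF has to be recognized
   separately by looking at adjacent characters. */
bool demo_is_unicode_newline_char(uint32_t c)
{
    switch (c) {
    case 0x000A: /* LF  */
    case 0x000B: /* VT  */
    case 0x000C: /* FF  */
    case 0x000D: /* CR  */
    case 0x0085: /* NEL */
    case 0x2028: /* LS  */
    case 0x2029: /* PS  */
        return true;
    default:
        return false;
    }
}

/* Length, in code points, of the newline that starts at text[i]:
   2 for CRLF, 1 for a single newline character, 0 if there is none. */
size_t demo_newline_length(const uint32_t *text, size_t len, size_t i)
{
    assert(text != NULL);
    if (i >= len) return 0;
    if (text[i] == 0x000D && i + 1 < len && text[i + 1] == 0x000A) return 2;
    return demo_is_unicode_newline_char(text[i]) ? 1 : 0;
}
```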

    +

    +Each of the first three conventions is used by at least one operating system as +its standard newline sequence. When PCRE2 is built, a default can be specified. +If it is not, the default is set to LF, which is the Unix standard. However, +the newline convention can be changed by an application when calling +pcre2_compile(), or it can be specified by special text at the start of +the pattern itself; this overrides any other settings. See the +pcre2pattern +page for details of the special character sequences. +

    +

    +In the PCRE2 documentation the word "newline" is used to mean "the character or +pair of characters that indicate a line break". The choice of newline +convention affects the handling of the dot, circumflex, and dollar +metacharacters, the handling of #-comments in /x mode, and, when CRLF is a +recognized line ending sequence, the match position advancement for a +non-anchored pattern. There is more detail about this in the +section on pcre2_match() options +below. +

    +

    +The choice of newline convention does not affect the interpretation of +the \n or \r escape sequences, nor does it affect what \R matches; this has +its own separate convention. +

    +
    MULTITHREADING
    +

    +In a multithreaded application it is important to keep thread-specific data +separate from data that can be shared between threads. The PCRE2 library code +itself is thread-safe: it contains no static or global variables. The API is +designed to be fairly simple for non-threaded applications while at the same +time ensuring that multithreaded applications can use it. +

    +

    +There are several different blocks of data that are used to pass information +between the application and the PCRE2 libraries. +

    +
    +The compiled pattern +
    +

    +A pointer to the compiled form of a pattern is returned to the user when +pcre2_compile() is successful. The data in the compiled pattern is fixed, +and does not change when the pattern is matched. Therefore, it is thread-safe, +that is, the same compiled pattern can be used by more than one thread +simultaneously. For example, an application can compile all its patterns at the +start, before creating multiple threads that use them. However, if the +just-in-time (JIT) optimization feature is being used, it needs separate memory +stack areas for each thread. See the +pcre2jit +documentation for more details. +

    +

    +In a more complicated situation, where patterns are compiled only when they are +first needed, but are still shared between threads, pointers to compiled +patterns must be protected from simultaneous writing by multiple threads. This +is somewhat tricky to do correctly. If you know that writing to a pointer is +atomic in your environment, you can use logic like this: +

    +  Get a read-only (shared) lock (mutex) for pointer
    +  if (pointer == NULL)
    +    {
    +    Get a write (unique) lock for pointer
    +    if (pointer == NULL) pointer = pcre2_compile(...
    +    }
    +  Release the lock
    +  Use pointer in pcre2_match()
    +
    +Of course, testing for compilation errors should also be included in the code. +

    +

    +The reason for checking the pointer a second time is as follows: Several +threads may have acquired the shared lock and tested the pointer for being +NULL, but only one of them will be given the write lock, with the rest kept +waiting. The winning thread will compile the pattern and store the result. +After this thread releases the write lock, another thread will get it, and if +it does not retest pointer for being NULL, will recompile the pattern and +overwrite the pointer, creating a memory leak and possibly causing other +issues. +

    +

    +In an environment where writing to a pointer may not be atomic, the above logic +is not sufficient. The thread that is doing the compiling may be descheduled +after writing only part of the pointer, which could cause other threads to use +an invalid value. Instead of checking the pointer itself, a separate "pointer +is valid" flag (that can be updated atomically) must be used: +

    +  Get a read-only (shared) lock (mutex) for pointer
    +  if (!pointer_is_valid)
    +    {
    +    Get a write (unique) lock for pointer
    +    if (!pointer_is_valid)
    +      {
    +      pointer = pcre2_compile(...
    +      pointer_is_valid = TRUE
    +      }
    +    }
    +  Release the lock
    +  Use pointer in pcre2_match()
    +
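One possible C rendering of the "pointer is valid" logic is sketched below. It is not the code the PCRE2 documentation prescribes: it swaps the shared/unique lock of the pseudocode for a C11 atomic flag plus a plain POSIX mutex, and demo_code and demo_compile() are hypothetical stand-ins for a compiled pattern and for pcre2_compile() so that the example builds without pcre2.h.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a compiled pattern and for pcre2_compile(). */
typedef struct { int id; } demo_code;

static demo_code demo_storage;
int demo_compile_calls = 0;          /* counts how often we "compiled" */

demo_code *demo_compile(void)
{
    demo_compile_calls++;
    demo_storage.id = 42;
    return &demo_storage;
}

static demo_code *shared_pattern = NULL;   /* written only under the mutex */
static atomic_bool pattern_is_valid;       /* zero-initialized, i.e. false */
static pthread_mutex_t pattern_lock = PTHREAD_MUTEX_INITIALIZER;

demo_code *demo_get_pattern(void)
{
    demo_code *result;

    /* Fast path: the acquire load pairs with the release store below, so
       a thread that sees the flag set also sees the stored pointer. */
    if (atomic_load_explicit(&pattern_is_valid, memory_order_acquire))
        return shared_pattern;

    pthread_mutex_lock(&pattern_lock);
    /* Second check: another thread may have compiled the pattern while
       this one was waiting for the lock. */
    if (!atomic_load_explicit(&pattern_is_valid, memory_order_relaxed)) {
        shared_pattern = demo_compile();
        if (shared_pattern != NULL)      /* publish only on success */
            atomic_store_explicit(&pattern_is_valid, true,
                                  memory_order_release);
    }
    result = shared_pattern;
    pthread_mutex_unlock(&pattern_lock);
    return result;
}
```

Every thread would call demo_get_pattern() before matching. The flag is set only after a successful compile, so a failed compilation is retried by the next caller rather than being cached.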
    +If JIT is being used, but the JIT compilation is not being done immediately +(perhaps waiting to see if the pattern is used often enough), similar logic is +required. JIT compilation updates a value within the compiled code block, so a +thread must gain unique write access to the pointer before calling +pcre2_jit_compile(). Alternatively, pcre2_code_copy() or +pcre2_code_copy_with_tables() can be used to obtain a private copy of the +compiled code before calling the JIT compiler. +

    +
    +Context blocks +
    +

    +The next main section below introduces the idea of "contexts" in which PCRE2 +functions are called. A context is nothing more than a collection of parameters +that control the way PCRE2 operates. Grouping a number of parameters together +in a context is a convenient way of passing them to a PCRE2 function without +using lots of arguments. The parameters that are stored in contexts are in some +sense "advanced features" of the API. Many straightforward applications will +not need to use contexts. +

    +

    +In a multithreaded application, if the parameters in a context are values that +are never changed, the same context can be used by all the threads. However, if +any thread needs to change any value in a context, it must make its own +thread-specific copy. +

    +
    +Match blocks +
    +

    +The matching functions need a block of memory for storing the results of a +match. This includes details of what was matched, as well as additional +information such as the name of a (*MARK) setting. Each thread must provide its +own copy of this memory. +

    +
    PCRE2 CONTEXTS
    +

    +Some PCRE2 functions have a lot of parameters, many of which are used only by +specialist applications, for example, those that use custom memory management +or non-standard character tables. To keep function argument lists at a +reasonable size, and at the same time to keep the API extensible, "uncommon" +parameters are passed to certain functions in a context instead of +directly. A context is just a block of memory that holds the parameter values. +Applications that do not need to adjust any of the context parameters can pass +NULL when a context pointer is required. +

    +

    +There are three different types of context: a general context that is relevant +for several PCRE2 operations, a compile-time context, and a match-time context. +

    +
    +The general context +
    +

    +At present, this context just contains pointers to (and data for) external +memory management functions that are called from several places in the PCRE2 +library. The context is named "general" rather than specifically "memory" +because in future other fields may be added. If you do not want to supply your +own custom memory management functions, you do not need to bother with a +general context. A general context is created by: +
    +
    +pcre2_general_context *pcre2_general_context_create( + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); +
    +
    +The two function pointers specify custom memory management functions, whose +prototypes are: +

    +  void *private_malloc(PCRE2_SIZE, void *);
    +  void  private_free(void *, void *);
    +
    +Whenever code in PCRE2 calls these functions, the final argument is the value +of memory_data. Either of the first two arguments of the creation +function may be NULL, in which case the system memory management functions +malloc() and free() are used. (This is not currently useful, as +there are no other fields in a general context, but in future there might be.) +The private_malloc() function is used (if supplied) to obtain memory for +storing the context, and all three values are saved as part of the context. +
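A minimal sketch of a pair of custom memory functions with the prototypes shown above follows. The demo_mem_stats structure and both function names are hypothetical, and size_t is written in place of PCRE2_SIZE so that the example does not need pcre2.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping block that an application could pass as
   memory_data; it is handed back as the final argument of each call. */
typedef struct {
    size_t allocations;   /* successful demo_private_malloc() calls */
    size_t frees;         /* demo_private_free() calls on non-NULL blocks */
} demo_mem_stats;

/* Same shape as private_malloc(), with size_t standing in for PCRE2_SIZE. */
void *demo_private_malloc(size_t size, void *memory_data)
{
    demo_mem_stats *stats = memory_data;
    void *block = malloc(size);
    if (block != NULL && stats != NULL) stats->allocations++;
    return block;
}

/* Same shape as private_free(); a NULL block is ignored. */
void demo_private_free(void *block, void *memory_data)
{
    demo_mem_stats *stats = memory_data;
    if (block == NULL) return;
    if (stats != NULL) stats->frees++;
    free(block);
}
```

With pcre2.h available, these two functions and a pointer to a demo_mem_stats variable would be the three arguments of pcre2_general_context_create().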

    +

    +Whenever PCRE2 creates a data block of any kind, the block contains a pointer +to the free() function that matches the malloc() function that was +used. When the time comes to free the block, this function is called. +

    +

    +A general context can be copied by calling: +
    +
    +pcre2_general_context *pcre2_general_context_copy( + pcre2_general_context *gcontext); +
    +
    +The memory used for a general context should be freed by calling: +
    +
    +void pcre2_general_context_free(pcre2_general_context *gcontext); +
    +
    +If this function is passed a NULL argument, it returns immediately without +doing anything. +

    +
    +The compile context +
    +

    +A compile context is required if you want to provide an external function for +stack checking during compilation or to change the default values of any of the +following compile-time parameters: +

    +  What \R matches (Unicode newlines or CR, LF, CRLF only)
    +  PCRE2's character tables
    +  The newline character sequence
    +  The compile time nested parentheses limit
    +  The maximum length of the pattern string
    +  The extra options bits (none set by default)
    +
    +A compile context is also required if you are using custom memory management. +If none of these apply, just pass NULL as the context argument of +pcre2_compile(). +

    +

    +A compile context is created, copied, and freed by the following functions: +
    +
    +pcre2_compile_context *pcre2_compile_context_create( + pcre2_general_context *gcontext); +
    +
    +pcre2_compile_context *pcre2_compile_context_copy( + pcre2_compile_context *ccontext); +
    +
    +void pcre2_compile_context_free(pcre2_compile_context *ccontext); +
    +
    +A compile context is created with default values for its parameters. These can +be changed by calling the following functions, which return 0 on success, or +PCRE2_ERROR_BADDATA if invalid data is detected. +
    +
    +int pcre2_set_bsr(pcre2_compile_context *ccontext, + uint32_t value); +
    +
    +The value must be PCRE2_BSR_ANYCRLF, to specify that \R matches only CR, LF, +or CRLF, or PCRE2_BSR_UNICODE, to specify that \R matches any Unicode line +ending sequence. The value is used by the JIT compiler and by the two +interpreted matching functions, pcre2_match() and +pcre2_dfa_match(). +
    +
    +int pcre2_set_character_tables(pcre2_compile_context *ccontext, + const uint8_t *tables); +
    +
    +The value must be the result of a call to pcre2_maketables(), whose only +argument is a general context. This function builds a set of character tables +in the current locale. +
    +
    +int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext, + uint32_t extra_options); +
    +
    +As PCRE2 has developed, almost all the 32 option bits that are available in +the options argument of pcre2_compile() have been used up. To avoid +running out, the compile context contains a set of extra option bits which are +used for some newer, assumed rarer, options. This function sets those bits. It +always sets all the bits (either on or off), replacing the previous value as a +whole; it cannot be used to change individual bits of an existing setting. The +available options are defined in the section entitled "Extra +compile options" +below. +
    +
    +int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, + PCRE2_SIZE value); +
    +
    +This sets a maximum length, in code units, for any pattern string that is +compiled with this context. If the pattern is longer, an error is generated. +This facility is provided so that applications that accept patterns from +external sources can limit their size. The default is the largest number that a +PCRE2_SIZE variable can hold, which is effectively unlimited. +
    +
    +int pcre2_set_newline(pcre2_compile_context *ccontext, + uint32_t value); +
    +
    +This specifies which characters or character sequences are to be recognized as +newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only), +PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character +sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), +PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the +NUL character, that is a binary zero). +

    +

    +A pattern can override the value set in the compile context by starting with a +sequence such as (*CRLF). See the +pcre2pattern +page for details. +

    +

    +When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE +option, the newline convention affects the recognition of the end of internal +comments starting with #. The value is saved with the compiled pattern for +subsequent use by the JIT compiler and by the two interpreted matching +functions, pcre2_match() and pcre2_dfa_match(). +
    +
    +int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, + uint32_t value); +
    +
    +This parameter adjusts the limit, set when PCRE2 is built (default 250), on the +depth of parenthesis nesting in a pattern. This limit stops rogue patterns +using up too much system stack when being compiled. The limit applies to +parentheses of all kinds, not just capturing parentheses. +
    +
    +int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext, + int (*guard_function)(uint32_t, void *), void *user_data); +
    +
    +There is at least one application that runs PCRE2 in threads with very limited +system stack, where running out of stack is to be avoided at all costs. The +parenthesis limit above cannot take account of how much stack is actually +available during compilation. For a finer control, you can supply a function +that is called whenever pcre2_compile() starts to compile a parenthesized +part of a pattern. This function can check the actual stack size (or anything +else that it wants to, of course). +

    +

    +The first argument to the guard function gives the current depth of +nesting, and the second is user data that is set up by the last argument of +pcre2_set_compile_recursion_guard(). The guard function should return +zero if all is well, or non-zero to force an error. +
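A guard function with the documented signature might look like the sketch below. It is deliberately simplified: instead of measuring the real stack, it compares the nesting depth with a budget supplied through the user data. The demo_guard_state structure and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread settings passed as the user data pointer. */
typedef struct {
    uint32_t max_depth;      /* deepest nesting this thread will accept */
    uint32_t deepest_seen;   /* deepest nesting reported so far */
} demo_guard_state;

/* Returns zero to let compilation continue, non-zero to force an error. */
int demo_recursion_guard(uint32_t depth, void *user_data)
{
    demo_guard_state *state = user_data;
    assert(state != NULL);
    if (depth > state->deepest_seen) state->deepest_seen = depth;
    return depth > state->max_depth;
}
```

An application with a very small stack could replace the depth comparison with a check of how much stack remains, which is the use case the text describes.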

    +
    +The match context +
    +

    +A match context is required if you want to: +

    +  Set up a callout function
    +  Set an offset limit for matching an unanchored pattern
    +  Change the limit on the amount of heap used when matching
    +  Change the backtracking match limit
    +  Change the backtracking depth limit
    +  Set custom memory management specifically for the match
    +
    +If none of these apply, just pass NULL as the context argument of +pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match(). +

    +

    +A match context is created, copied, and freed by the following functions: +
    +
    +pcre2_match_context *pcre2_match_context_create( + pcre2_general_context *gcontext); +
    +
    +pcre2_match_context *pcre2_match_context_copy( + pcre2_match_context *mcontext); +
    +
    +void pcre2_match_context_free(pcre2_match_context *mcontext); +
    +
    +A match context is created with default values for its parameters. These can +be changed by calling the following functions, which return 0 on success, or +PCRE2_ERROR_BADDATA if invalid data is detected. +
    +
    +int pcre2_set_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_callout_block *, void *), + void *callout_data); +
    +
    +This sets up a callout function for PCRE2 to call at specified points +during a matching operation. Details are given in the +pcre2callout +documentation. +
    +
    +int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *, void *), + void *callout_data); +
    +
    +This sets up a callout function for PCRE2 to call after each substitution +made by pcre2_substitute(). Details are given in the section entitled +"Creating a new string with substitutions" +below. +
    +
    +int pcre2_set_offset_limit(pcre2_match_context *mcontext, + PCRE2_SIZE value); +
    +
    +The offset_limit parameter limits how far an unanchored search can +advance in the subject string. The default value is PCRE2_UNSET. The +pcre2_match() and pcre2_dfa_match() functions return +PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given +offset is not found. In the same circumstances, the pcre2_substitute() +function makes no more substitutions. +

    +

    +For example, if the pattern /abc/ is matched against "123abc" with an offset +limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be +found if the startoffset argument of pcre2_match(), +pcre2_dfa_match(), or pcre2_substitute() is greater than the offset +limit set in the match context. +
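The /abc/ example can be modelled with a plain literal search, as sketched below. This is only a model of the documented behaviour, not PCRE2 code: demo_find_with_limit(), DEMO_UNSET, and DEMO_NOMATCH are hypothetical stand-ins for the matching function, PCRE2_UNSET, and PCRE2_ERROR_NOMATCH.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UNSET   (~(size_t)0)   /* stand-in for "no offset limit" */
#define DEMO_NOMATCH (-1L)          /* stand-in for PCRE2_ERROR_NOMATCH */

/* Unanchored search for a literal needle. Start positions are tried from
   start_offset upwards, but never beyond offset_limit. Returns the offset
   of the match, or DEMO_NOMATCH. */
long demo_find_with_limit(const char *subject, const char *needle,
                          size_t start_offset, size_t offset_limit)
{
    size_t slen, nlen, last, pos;

    assert(subject != NULL && needle != NULL);
    slen = strlen(subject);
    nlen = strlen(needle);
    if (nlen > slen) return DEMO_NOMATCH;

    last = slen - nlen;                 /* last start that can still fit */
    if (offset_limit != DEMO_UNSET && offset_limit < last)
        last = offset_limit;            /* the limit cuts the search short */

    for (pos = start_offset; pos <= last; pos++) {
        if (memcmp(subject + pos, needle, nlen) == 0) return (long)pos;
    }
    return DEMO_NOMATCH;
}
```

Searching "123abc" for "abc" with a limit of 2 fails, while a limit of 3 finds the match at offset 3. A start offset of 4 with a limit of 3 can never match, because the loop has no positions left to try.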

    +

    +When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when +calling pcre2_compile() so that when JIT is in use, different code can be +compiled. If a match is started with a non-default offset limit when +PCRE2_USE_OFFSET_LIMIT is not set, an error is generated. +

    +

    +The offset limit facility can be used to track progress when searching large +subject strings or to limit the extent of global substitutions. See also the +PCRE2_FIRSTLINE option, which requires a match to start before or at the first +newline that follows the start of matching in the subject. If this is set with +an offset limit, a match must occur in the first line and also within the +offset limit. In other words, whichever limit comes first is used. +
    +
    +int pcre2_set_heap_limit(pcre2_match_context *mcontext, + uint32_t value); +
    +
    +The heap_limit parameter specifies, in units of kibibytes (1024 bytes), +the maximum amount of heap memory that pcre2_match() may use to hold +backtracking information when running an interpretive match. This limit also +applies to pcre2_dfa_match(), which may use the heap when processing +patterns with a lot of nested pattern recursion or lookarounds or atomic +groups. This limit does not apply to matching with the JIT optimization, which +has its own memory control arrangements (see the +pcre2jit +documentation for more details). If the limit is reached, the negative error +code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2 +is built; if it is not, the default is set very large and is essentially +"unlimited". +

    +

    +A value for the heap limit may also be supplied by an item at the start of a +pattern of the form +

    +  (*LIMIT_HEAP=ddd)
    +
    +where ddd is a decimal number. However, such a setting is ignored unless ddd is +less than the limit set by the caller of pcre2_match() or, if no such +limit is set, less than the default. +
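The rule that an in-pattern setting can only lower a limit amounts to taking a minimum, as the sketch below shows. The function name and its parameters are hypothetical; the same rule is stated in this document for the (*LIMIT_MATCH=ddd) and (*LIMIT_DEPTH=ddd) items.

```c
#include <assert.h>
#include <stdint.h>

/* Limit in force for one match: the caller's limit (or the default when
   the caller set none), lowered by an in-pattern (*LIMIT_...=ddd) item
   only when ddd is smaller. A larger ddd is ignored. */
uint32_t demo_effective_limit(uint32_t caller_or_default,
                              int has_pattern_item,
                              uint32_t pattern_value)
{
    if (has_pattern_item && pattern_value < caller_or_default)
        return pattern_value;
    return caller_or_default;
}
```

This is why a pattern from an untrusted source cannot raise a limit that the application has imposed.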

    +

    +The pcre2_match() function starts out using a 20KiB vector on the system +stack for recording backtracking points. The more nested backtracking points +there are (that is, the deeper the search tree), the more memory is needed. +Heap memory is used only if the initial vector is too small. If the heap limit +is set to a value less than 21 (in particular, zero) no heap memory will be +used. In this case, only patterns that do not have a lot of nested backtracking +can be successfully processed. +

    +

    +Similarly, for pcre2_dfa_match(), a vector on the system stack is used +when processing pattern recursions, lookarounds, or atomic groups, and only if +this is not big enough is heap memory used. In this case, too, setting a value +of zero disables the use of the heap. +
    +
    +int pcre2_set_match_limit(pcre2_match_context *mcontext, + uint32_t value); +
    +
    +The match_limit parameter provides a means of preventing PCRE2 from using +up too many computing resources when processing patterns that are not going to +match, but which have a very large number of possibilities in their search +trees. The classic example is a pattern that uses nested unlimited repeats. +

    +

    +There is an internal counter in pcre2_match() that is incremented each +time round its main matching loop. If this value reaches the match limit, +pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT. This has +the effect of limiting the amount of backtracking that can take place. For +patterns that are not anchored, the count restarts from zero for each position +in the subject string. This limit also applies to pcre2_dfa_match(), +though the counting is done in a different way. +

    +

    +When pcre2_match() is called with a pattern that was successfully +processed by pcre2_jit_compile(), the way in which matching is executed +is entirely different. However, there is still the possibility of runaway +matching that goes on for a very long time, and so the match_limit value +is also used in this case (but in a different way) to limit how long the +matching can continue. +

    +

    +The default value for the limit can be set when PCRE2 is built; if it is +not, the default is 10 million, which handles all but the most extreme cases. A +value for the match limit may also be supplied by an item at the start of a +pattern of the form +

    +  (*LIMIT_MATCH=ddd)
    +
    +where ddd is a decimal number. However, such a setting is ignored unless ddd is +less than the limit set by the caller of pcre2_match() or +pcre2_dfa_match() or, if no such limit is set, less than the default. +
    +
    +int pcre2_set_depth_limit(pcre2_match_context *mcontext, + uint32_t value); +
    +
    +This parameter limits the depth of nested backtracking in pcre2_match(). +Each time a nested backtracking point is passed, a new memory "frame" is used +to remember the state of matching at that point. Thus, this parameter +indirectly limits the amount of memory that is used in a match. However, +because the size of each memory "frame" depends on the number of capturing +parentheses, the actual memory limit varies from pattern to pattern. This limit +was more useful in versions before 10.30, where function recursion was used for +backtracking. +

    +

    +The depth limit is not relevant, and is ignored, when matching is done using +JIT compiled code. However, it is supported by pcre2_dfa_match(), which +uses it to limit the depth of nested internal recursive function calls that +implement atomic groups, lookaround assertions, and pattern recursions. This +limits, indirectly, the amount of system stack that is used. It was more useful +in versions before 10.32, when stack memory was used for local workspace +vectors for recursive function calls. From version 10.32, only local variables +are allocated on the stack and as each call uses only a few hundred bytes, even +a small stack can support quite a lot of recursion. +

    +

    +If the depth of internal recursive function calls is great enough, local +workspace vectors are allocated on the heap from version 10.32 onwards, so the +depth limit also indirectly limits the amount of heap memory that is used. A +recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string +using pcre2_dfa_match(), can use a great deal of memory. However, it is +probably better to limit heap usage directly by calling +pcre2_set_heap_limit(). +

    +

    +The default value for the depth limit can be set when PCRE2 is built; if it is +not, the default is set to the same value as the default for the match limit. +If the limit is exceeded, pcre2_match() or pcre2_dfa_match() +returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be +supplied by an item at the start of a pattern of the form +

    +  (*LIMIT_DEPTH=ddd)
    +
    +where ddd is a decimal number. However, such a setting is ignored unless ddd is +less than the limit set by the caller of pcre2_match() or +pcre2_dfa_match() or, if no such limit is set, less than the default. +

    +
    CHECKING BUILD-TIME OPTIONS
    +

    +int pcre2_config(uint32_t what, void *where); +

    +

    +The function pcre2_config() makes it possible for a PCRE2 client to find +the value of certain configuration parameters and to discover which optional +features have been compiled into the PCRE2 library. The +pcre2build +documentation has more details about these features. +

    +

    +The first argument for pcre2_config() specifies which information is +required. The second argument is a pointer to memory into which the information +is placed. If NULL is passed, the function returns the amount of memory that is +needed for the requested information. For calls that return numerical values, +the value is in bytes; when requesting these values, where should point +to appropriately aligned memory. For calls that return strings, the required +length is given in code units, not counting the terminating zero. +

    +

    +When requesting information, the returned value from pcre2_config() is +non-negative on success, or the negative error code PCRE2_ERROR_BADOPTION if +the value in the first argument is not recognized. The following information is +available: +

    +  PCRE2_CONFIG_BSR
    +
    +The output is a uint32_t integer whose value indicates what character +sequences the \R escape sequence matches by default. A value of +PCRE2_BSR_UNICODE means that \R matches any Unicode line ending sequence; a +value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. The +default can be overridden when a pattern is compiled. +
    +  PCRE2_CONFIG_COMPILED_WIDTHS
    +
    +The output is a uint32_t integer whose lower bits indicate which code unit +widths were selected when PCRE2 was built. The 1-bit indicates 8-bit support, +and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively. +
    +  PCRE2_CONFIG_DEPTHLIMIT
    +
    +The output is a uint32_t integer that gives the default limit for the depth of +nested backtracking in pcre2_match() or the depth of nested recursions, +lookarounds, and atomic groups in pcre2_dfa_match(). Further details are +given with pcre2_set_depth_limit() above. +
    +  PCRE2_CONFIG_HEAPLIMIT
    +
    +The output is a uint32_t integer that gives, in kibibytes, the default limit +for the amount of heap memory used by pcre2_match() or +pcre2_dfa_match(). Further details are given with +pcre2_set_heap_limit() above. +
    +  PCRE2_CONFIG_JIT
    +
    +The output is a uint32_t integer that is set to one if support for just-in-time +compiling is available; otherwise it is set to zero. +
    +  PCRE2_CONFIG_JITTARGET
    +
    +The where argument should point to a buffer that is at least 48 code +units long. (The exact length required can be found by calling +pcre2_config() with where set to NULL.) The buffer is filled with a +string that contains the name of the architecture for which the JIT compiler is +configured, for example "x86 32bit (little endian + unaligned)". If JIT support +is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of +code units used is returned. This is the length of the string, plus one unit +for the terminating zero. +
    +  PCRE2_CONFIG_LINKSIZE
    +
    +The output is a uint32_t integer that contains the number of bytes used for +internal linkage in compiled regular expressions. When PCRE2 is configured, the +value can be set to 2, 3, or 4, with the default being 2. This is the value +that is returned by pcre2_config(). However, when the 16-bit library is +compiled, a value of 3 is rounded up to 4, and when the 32-bit library is +compiled, internal linkages always use 4 bytes, so the configured value is not +relevant. +

    +

    +The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all +but the most massive patterns, since it allows the size of the compiled pattern +to be up to 65535 code units. Larger values allow larger regular expressions to +be compiled by those two libraries, but at the expense of slower matching. +

    +  PCRE2_CONFIG_MATCHLIMIT
    +
    +The output is a uint32_t integer that gives the default match limit for +pcre2_match(). Further details are given with +pcre2_set_match_limit() above. +
    +  PCRE2_CONFIG_NEWLINE
    +
    +The output is a uint32_t integer whose value specifies the default character +sequence that is recognized as meaning "newline". The values are: +
    +  PCRE2_NEWLINE_CR       Carriage return (CR)
    +  PCRE2_NEWLINE_LF       Linefeed (LF)
    +  PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
    +  PCRE2_NEWLINE_ANY      Any Unicode line ending
    +  PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
    +  PCRE2_NEWLINE_NUL      The NUL character (binary zero)
    +
    +The default should normally correspond to the standard sequence for your +operating system. +
    +  PCRE2_CONFIG_NEVER_BACKSLASH_C
    +
    +The output is a uint32_t integer that is set to one if the use of \C was +permanently disabled when PCRE2 was built; otherwise it is set to zero. +
    +  PCRE2_CONFIG_PARENSLIMIT
    +
    +The output is a uint32_t integer that gives the maximum depth of nesting +of parentheses (of any kind) in a pattern. This limit is imposed to cap the +amount of system stack used when a pattern is compiled. It is specified when +PCRE2 is built; the default is 250. This limit does not take into account the +stack that may already be used by the calling application. For finer control +over compilation stack usage, see pcre2_set_compile_recursion_guard(). +
    +  PCRE2_CONFIG_STACKRECURSE
    +
    +This parameter is obsolete and should not be used in new code. The output is a +uint32_t integer that is always set to zero. +
    +  PCRE2_CONFIG_TABLES_LENGTH
    +
    +The output is a uint32_t integer that gives the length of PCRE2's character +processing tables in bytes. For details of these tables see the +section on locale support +below. +
    +  PCRE2_CONFIG_UNICODE_VERSION
    +
    +The where argument should point to a buffer that is at least 24 code +units long. (The exact length required can be found by calling +pcre2_config() with where set to NULL.) If PCRE2 has been compiled +without Unicode support, the buffer is filled with the text "Unicode not +supported". Otherwise, the Unicode version string (for example, "8.0.0") is +inserted. The number of code units used is returned. This is the length of the +string plus one unit for the terminating zero. +
    +  PCRE2_CONFIG_UNICODE
    +
    +The output is a uint32_t integer that is set to one if Unicode support is +available; otherwise it is set to zero. Unicode support implies UTF support. +
    +  PCRE2_CONFIG_VERSION
    +
    +The where argument should point to a buffer that is at least 24 code +units long. (The exact length required can be found by calling +pcre2_config() with where set to NULL.) The buffer is filled with +the PCRE2 version string, zero-terminated. The number of code units used is +returned. This is the length of the string plus one unit for the terminating +zero. +

    +
    COMPILING A PATTERN
    +

    +pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset, + pcre2_compile_context *ccontext); +
    +
    +void pcre2_code_free(pcre2_code *code); +
    +
    +pcre2_code *pcre2_code_copy(const pcre2_code *code); +
    +
    +pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code); +

    +

    +The pcre2_compile() function compiles a pattern into an internal form. +The pattern is defined by a pointer to a string of code units and a length (in +code units). If the pattern is zero-terminated, the length can be specified as +PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that +contains the compiled pattern and related data, or NULL if an error occurred. +

    +

    +If the compile context argument ccontext is NULL, memory for the compiled +pattern is obtained by calling malloc(). Otherwise, it is obtained from +the same memory function that was used for the compile context. The caller must +free the memory by calling pcre2_code_free() when it is no longer needed. +If pcre2_code_free() is called with a NULL argument, it returns +immediately, without doing anything. +

    +

    +The function pcre2_code_copy() makes a copy of the compiled code in new +memory, using the same memory allocator as was used for the original. However, +if the code has been processed by the JIT compiler (see +below), +the JIT information cannot be copied (because it is position-dependent). +The new copy can initially be used only for non-JIT matching, though it can be +passed to pcre2_jit_compile() if required. If pcre2_code_copy() is +called with a NULL argument, it returns NULL. +

    +

    +The pcre2_code_copy() function provides a way for individual threads in a +multithreaded application to acquire a private copy of shared compiled code. +However, it does not make a copy of the character tables used by the compiled +pattern; the new pattern code points to the same tables as the original code. +(See +"Locale Support" +below for details of these character tables.) In many applications the same +tables are used throughout, so this behaviour is appropriate. Nevertheless, +there are occasions when copies of both a compiled pattern and the relevant +tables are needed. The pcre2_code_copy_with_tables() function provides this +facility. Copies of both the code and the tables are made, with the new code pointing to +the new tables. The memory for the new tables is automatically freed when +pcre2_code_free() is called for the new copy of the compiled code. If +pcre2_code_copy_with_tables() is called with a NULL argument, it returns +NULL. +

    +

    +NOTE: When one of the matching functions is called, pointers to the compiled +pattern and the subject string are set in the match data block so that they can +be referenced by the substring extraction functions after a successful match. +After running a match, you must not free a compiled pattern or a subject string +until after all operations on the +match data block +have taken place, unless, in the case of the subject string, you have used the +PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled +"Option bits for pcre2_match()" +below. +

    +

    +The options argument for pcre2_compile() contains various bit +settings that affect the compilation. It should be zero if none of them are +required. The available options are described below. Some of them (in +particular, those that are compatible with Perl, but some others as well) can +also be set and unset from within the pattern (see the detailed description in +the +pcre2pattern +documentation). +

    +

    +For those options that can be different in different parts of the pattern, the +contents of the options argument specifies their settings at the start of +compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK +options can be set at the time of matching as well as at compile time. +

    +

    +Some additional options and less frequently required compile-time parameters +(for example, the newline setting) can be provided in a compile context (as +described +above). +

    +

    +If errorcode or erroroffset is NULL, pcre2_compile() returns +NULL immediately. Otherwise, the variables to which these point are set to an +error code and an offset (number of code units) within the pattern, +respectively, when pcre2_compile() returns NULL because a compilation +error has occurred. The values are not defined when compilation is successful +and pcre2_compile() returns a non-NULL value. +

    +

    +There are nearly 100 positive error codes that pcre2_compile() may return +if it finds an error in the pattern. There are also some negative error codes +that are used for invalid UTF strings when validity checking is in force. These +are the same as given by pcre2_match() and pcre2_dfa_match(), and +are described in the +pcre2unicode +documentation. There is no separate documentation for the positive error codes, +because the textual error messages that are obtained by calling the +pcre2_get_error_message() function (see "Obtaining a textual error +message" +below) +should be self-explanatory. Macro names starting with PCRE2_ERROR_ are defined +for both positive and negative error codes in pcre2.h. +

    +

    +The value returned in erroroffset is an indication of where in the +pattern the error occurred. It is not necessarily the furthest point in the +pattern that was read. For example, after the error "lookbehind assertion is +not fixed length", the error offset points to the start of the failing +assertion. For an invalid UTF-8 or UTF-16 string, the offset is that of the +first code unit of the failing character. +

    +

    +Some errors are not detected until the whole pattern has been scanned; in these +cases, the offset passed back is the length of the pattern. Note that the +offset is in code units, not characters, even in a UTF mode. It may sometimes +point into the middle of a UTF-8 or UTF-16 character. +
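Because the error offset counts code units, an application that wants to show the failing position to a user in UTF-8 mode may first need to move the offset back to the start of the character that contains it. The following is a minimal sketch of that adjustment, assuming an 8-bit pattern held in memory; the function name utf8_char_start is hypothetical and is not part of PCRE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): given a code-unit offset into a
   UTF-8 pattern, return the offset of the first code unit of the character
   that contains it. An offset equal to the pattern length is returned
   unchanged, since it does not point at any character. */
size_t utf8_char_start(const uint8_t *pattern, size_t length, size_t offset)
{
  assert(pattern != NULL && offset <= length);
  /* UTF-8 continuation bytes have the bit pattern 10xxxxxx. */
  while (offset > 0 && offset < length && (pattern[offset] & 0xC0) == 0x80)
    offset--;
  return offset;
}
```

The helper assumes the pattern is valid UTF-8; for an invalid string it merely skips back over any run of continuation bytes.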

    +

    +This code fragment shows a typical straightforward call to +pcre2_compile(): +

    +  pcre2_code *re;
    +  PCRE2_SIZE erroffset;
    +  int errorcode;
    +  re = pcre2_compile(
    +    "^A.*Z",                /* the pattern */
    +    PCRE2_ZERO_TERMINATED,  /* the pattern is zero-terminated */
    +    0,                      /* default options */
    +    &errorcode,             /* for error code */
    +    &erroffset,             /* for error offset */
    +    NULL);                  /* no compile context */
    +
    +
    +

    +
    +Main compile options +
    +

    +The following names for option bits are defined in the pcre2.h header +file: +

    +  PCRE2_ANCHORED
    +
    +If this bit is set, the pattern is forced to be "anchored", that is, it is +constrained to match only at the first matching point in the string that is +being searched (the "subject string"). This effect can also be achieved by +appropriate constructs in the pattern itself, which is the only way to do it in +Perl. +
    +  PCRE2_ALLOW_EMPTY_CLASS
    +
    +By default, for compatibility with Perl, a closing square bracket that +immediately follows an opening one is treated as a data character for the +class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which +therefore contains no characters and so can never match. +
    +  PCRE2_ALT_BSUX
    +
    +This option requests alternative handling of three escape sequences, which +makes PCRE2's behaviour more like ECMAscript (aka JavaScript). When it is set: +

    +

    +(1) \U matches an upper case "U" character; by default \U causes a compile +time error (Perl uses \U to upper case subsequent characters). +

    +

    +(2) \u matches a lower case "u" character unless it is followed by four +hexadecimal digits, in which case the hexadecimal number defines the code point +to match. By default, \u causes a compile time error (Perl uses it to upper +case the following character). +

    +

    +(3) \x matches a lower case "x" character unless it is followed by two +hexadecimal digits, in which case the hexadecimal number defines the code point +to match. By default, as in Perl, a hexadecimal number is always expected after +\x, but it may have zero, one, or two digits (so, for example, \xz matches a +binary zero character followed by z). +

    +

    +ECMAscript 6 added additional functionality to \u. This can be accessed using +the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options" +below). +Note that this alternative escape handling applies only to patterns. Neither of +these options affects the processing of replacement strings passed to +pcre2_substitute(). +
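The rule for \u under PCRE2_ALT_BSUX can be pictured with a small stand-alone function. This is a sketch of the rule as stated in item (2) above, not the code PCRE2 uses; the name alt_bsux_u is hypothetical, and an ASCII environment is assumed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2). 'after_u' points just past "\u"
   in a zero-terminated pattern. The code point to be matched is stored in
   *cp, and the number of characters consumed after "\u" is returned:
   4 when four hexadecimal digits follow, otherwise 0 (a literal "u"). */
size_t alt_bsux_u(const char *after_u, uint32_t *cp)
{
  uint32_t value = 0;
  assert(after_u != NULL && cp != NULL);
  for (size_t i = 0; i < 4; i++) {
    unsigned char c = (unsigned char)after_u[i];
    if (!isxdigit(c)) {   /* also stops at the terminating zero */
      *cp = 'u';
      return 0;
    }
    value = value * 16 +
            (uint32_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
  }
  *cp = value;
  return 4;
}
```

With this model, the text following \u in "\u00e9" produces the code point 0xe9, while "\u12" is treated as a literal "u" followed by "12".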

    +  PCRE2_ALT_CIRCUMFLEX
    +
    +In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter +matches at the start of the subject (unless PCRE2_NOTBOL is set), and also +after any internal newline. However, it does not match after a newline at the +end of the subject, for compatibility with Perl. If you want a multiline +circumflex also to match after a terminating newline, you must set +PCRE2_ALT_CIRCUMFLEX. +
    +  PCRE2_ALT_VERBNAMES
    +
    +By default, for compatibility with Perl, the name in any verb sequence such as +(*MARK:NAME) is any sequence of characters that does not include a closing +parenthesis. The name is not processed in any way, and it is not possible to +include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES +option is set, normal backslash processing is applied to verb names and only an +unescaped closing parenthesis terminates the name. A closing parenthesis can be +included in a name either as \) or between \Q and \E. If the PCRE2_EXTENDED +or PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped +whitespace in verb names is skipped and #-comments are recognized, exactly as +in the rest of the pattern. +
    +  PCRE2_AUTO_CALLOUT
    +
    +If this bit is set, pcre2_compile() automatically inserts callout items, +all with number 255, before each pattern item, except immediately before or +after an explicit callout in the pattern. For discussion of the callout +facility, see the +pcre2callout +documentation. +
    +  PCRE2_CASELESS
    +
    +If this bit is set, letters in the pattern match both upper and lower case +letters in the subject. It is equivalent to Perl's /i option, and it can be +changed within a pattern by a (?i) option setting. If either PCRE2_UTF or +PCRE2_UCP is set, Unicode properties are used for all characters with more than +one other case, and for all characters whose code points are greater than +U+007F. Note that there are two ASCII characters, K and S, that, in addition to +their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin +sign) and U+017F (long S) respectively. For lower valued characters with only +one other case, a lookup table is used for speed. When neither PCRE2_UTF nor +PCRE2_UCP is set, a lookup table is used for all code points less than 256, and +higher code points (available only in 16-bit or 32-bit mode) are treated as not +having another case. +
    +  PCRE2_DOLLAR_ENDONLY
    +
    +If this bit is set, a dollar metacharacter in the pattern matches only at the +end of the subject string. Without this option, a dollar also matches +immediately before a newline at the end of the string (but not before any other +newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is +set. There is no equivalent to this option in Perl, and no way to set it within +a pattern. +
    +  PCRE2_DOTALL
    +
    +If this bit is set, a dot metacharacter in the pattern matches any character, +including one that indicates a newline. However, it only ever matches one +character, even if newlines are coded as CRLF. Without this option, a dot does +not match when the current position in the subject is at a newline. This option +is equivalent to Perl's /s option, and it can be changed within a pattern by a +(?s) option setting. A negative class such as [^a] always matches newline +characters, and the \N escape sequence always matches a non-newline character, +independent of the setting of PCRE2_DOTALL. +
    +  PCRE2_DUPNAMES
    +
    +If this bit is set, names used to identify capture groups need not be unique. +This can be helpful for certain types of pattern when it is known that only one +instance of the named group can ever be matched. There are more details of +named capture groups below; see also the +pcre2pattern +documentation. +
    +  PCRE2_ENDANCHORED
    +
    +If this bit is set, the end of any pattern match must be right at the end of +the string being searched (the "subject string"). If the pattern match +succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the +match fails at the current starting point. For unanchored patterns, a new match +is then tried at the next starting point. However, if the match succeeds by +reaching the end of the pattern, but not the end of the subject, backtracking +occurs and an alternative match may be found. Consider these two patterns: +
    +  .(*ACCEPT)|..
    +  .|..
    +
    +If matched against "abc" with PCRE2_ENDANCHORED set, the first matches "c" +whereas the second matches "bc". The effect of PCRE2_ENDANCHORED can also be +achieved by appropriate constructs in the pattern itself, which is the only way +to do it in Perl. +

    +

    +For DFA matching with pcre2_dfa_match(), PCRE2_ENDANCHORED applies only +to the first (that is, the longest) matched string. Other parallel matches, +which are necessarily substrings of the first one, must obviously end before +the end of the subject. +

    +  PCRE2_EXTENDED
    +
    +If this bit is set, most white space characters in the pattern are totally +ignored except when escaped or inside a character class. However, white space +is not allowed within sequences such as (?> that introduce various +parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable +white space is permitted between an item and a following quantifier and between +a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is +equivalent to Perl's /x option, and it can be changed within a pattern by a +(?x) option setting. +

    +

    +When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as +white space only those characters with code points less than 256 that are +flagged as white space in its low-character table. The table is normally +created by +pcre2_maketables(), +which uses the isspace() function to identify space characters. In most +ASCII environments, the relevant characters are those with code points 0x0009 +(tab), 0x000A (linefeed), 0x000B (vertical tab), 0x000C (formfeed), 0x000D +(carriage return), and 0x0020 (space). +

    +

    +When PCRE2 is compiled with Unicode support, in addition to these characters, +five more Unicode "Pattern White Space" characters are recognized by +PCRE2_EXTENDED. These are U+0085 (next line), U+200E (left-to-right mark), +U+200F (right-to-left mark), U+2028 (line separator), and U+2029 (paragraph +separator). This set of characters is the same as recognized by Perl's /x +option. Note that the horizontal and vertical space characters that are matched +by the \h and \v escapes in patterns are a much bigger set. +
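The eleven characters listed in the two paragraphs above can be collected into a single test. The sketch below assumes an ASCII environment and a build with Unicode support; the function name is_pattern_white_space is hypothetical and does not exist in PCRE2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of PCRE2): true if 'cp' is one of the
   characters that the text above lists as ignorable under PCRE2_EXTENDED. */
bool is_pattern_white_space(uint32_t cp)
{
  assert(cp <= 0x10FFFF);   /* must be a Unicode code point */
  switch (cp) {
    case 0x0009: case 0x000A: case 0x000B:   /* tab, linefeed, vertical tab */
    case 0x000C: case 0x000D: case 0x0020:   /* formfeed, CR, space */
    case 0x0085:                             /* next line */
    case 0x200E: case 0x200F:                /* left-to-right, right-to-left mark */
    case 0x2028: case 0x2029:                /* line, paragraph separator */
      return true;
    default:
      return false;
  }
}
```

Note that a character such as U+00A0 (no-break space) is not in this set, even though it is matched by \h.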

    +

    +As well as ignoring most white space, PCRE2_EXTENDED also causes characters +between an unescaped # outside a character class and the next newline, +inclusive, to be ignored, which makes it possible to include comments inside +complicated patterns. Note that the end of this type of comment is a literal +newline sequence in the pattern; escape sequences that happen to represent a +newline do not count. +

    +

    +Which characters are interpreted as newlines can be specified by a setting in +the compile context that is passed to pcre2_compile() or by a special +sequence at the start of the pattern, as described in the section entitled +"Newline conventions" +in the pcre2pattern documentation. A default is defined when PCRE2 is +built. +

    +  PCRE2_EXTENDED_MORE
    +
    +This option has the effect of PCRE2_EXTENDED, but, in addition, unescaped space +and horizontal tab characters are ignored inside a character class. Note: only +these two characters are ignored, not the full set of pattern white space +characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is +equivalent to Perl's /xx option, and it can be changed within a pattern by a +(?xx) option setting. +
    +  PCRE2_FIRSTLINE
    +
    +If this option is set, the start of an unanchored pattern match must be before +or at the first newline in the subject string following the start of matching, +though the matched text may continue over the newline. If startoffset is +non-zero, the limiting newline is not necessarily the first newline in the +subject. For example, if the subject string is "abc\nxyz" (where \n +represents a single-character newline) a pattern match for "yz" succeeds with +PCRE2_FIRSTLINE if startoffset is greater than 3. See also +PCRE2_USE_OFFSET_LIMIT, which provides a more general limiting facility. If +PCRE2_FIRSTLINE is set with an offset limit, a match must occur in the first +line and also within the offset limit. In other words, whichever limit comes +first is used. +
    +  PCRE2_LITERAL
    +
    +If this option is set, all meta-characters in the pattern are disabled, and it +is treated as a literal string. Matching literal strings with a regular +expression engine is not the most efficient way of doing it. If you are doing a +lot of literal matching and are worried about efficiency, you should consider +using other approaches. The only other main options that are allowed with +PCRE2_LITERAL are: PCRE2_ANCHORED, PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT, +PCRE2_CASELESS, PCRE2_FIRSTLINE, PCRE2_MATCH_INVALID_UTF, +PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_UTF, and +PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE and +PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an error. +
    +  PCRE2_MATCH_INVALID_UTF
    +
    +This option forces PCRE2_UTF (see below) and also enables support for matching +by pcre2_match() in subject strings that contain invalid UTF sequences. +This facility is not supported for DFA matching. For details, see the +pcre2unicode +documentation. +
    +  PCRE2_MATCH_UNSET_BACKREF
    +
    +If this option is set, a backreference to an unset capture group matches an +empty string (by default this causes the current matching alternative to fail). +A pattern such as (\1)(a) succeeds when this option is set (assuming it can +find an "a" in the subject), whereas it fails by default, for Perl +compatibility. Setting this option makes PCRE2 behave more like ECMAscript (aka +JavaScript). +
    +  PCRE2_MULTILINE
    +
    +By default, for the purposes of matching "start of line" and "end of line", +PCRE2 treats the subject string as consisting of a single line of characters, +even if it actually contains newlines. The "start of line" metacharacter (^) +matches only at the start of the string, and the "end of line" metacharacter +($) matches only at the end of the string, or before a terminating newline +(except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless +PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a +newline. This behaviour (for ^, $, and dot) is the same as Perl. +

    +

    +When PCRE2_MULTILINE is set, the "start of line" and "end of line" +constructs match immediately following or immediately before internal newlines +in the subject string, respectively, as well as at the very start and end. This +is equivalent to Perl's /m option, and it can be changed within a pattern by a +(?m) option setting. Note that the "start of line" metacharacter does not match +after a newline at the end of the subject, for compatibility with Perl. +However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If +there are no newlines in a subject string, or no occurrences of ^ or $ in a +pattern, setting PCRE2_MULTILINE has no effect. +

    +  PCRE2_NEVER_BACKSLASH_C
    +
    +This option locks out the use of \C in the pattern that is being compiled. +This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because +it may leave the current matching point in the middle of a multi-code-unit +character. This option may be useful in applications that process patterns from +external sources. Note that there is also a build-time option that permanently +locks out the use of \C. +
    +  PCRE2_NEVER_UCP
    +
    +This option locks out the use of Unicode properties for handling \B, \b, \D, +\d, \S, \s, \W, \w, and some of the POSIX character classes, as described +for the PCRE2_UCP option below. In particular, it prevents the creator of the +pattern from enabling this facility by starting the pattern with (*UCP). This +option may be useful in applications that process patterns from external +sources. The option combination PCRE2_UCP and PCRE2_NEVER_UCP causes an error. +
    +  PCRE2_NEVER_UTF
    +
    +This option locks out interpretation of the pattern as UTF-8, UTF-16, or +UTF-32, depending on which library is in use. In particular, it prevents the +creator of the pattern from switching to UTF interpretation by starting the +pattern with (*UTF). This option may be useful in applications that process +patterns from external sources. The combination of PCRE2_UTF and +PCRE2_NEVER_UTF causes an error. +
    +  PCRE2_NO_AUTO_CAPTURE
    +
    +If this option is set, it disables the use of numbered capturing parentheses in +the pattern. Any opening parenthesis that is not followed by ? behaves as if it +were followed by ?: but named parentheses can still be used for capturing (and +they acquire numbers in the usual way). This is the same as Perl's /n option. +Note that, when this option is set, references to capture groups +(backreferences or recursion/subroutine calls) may only refer to named groups, +though the reference can be by name or by number. +
    +  PCRE2_NO_AUTO_POSSESS
    +
    +If this option is set, it disables "auto-possessification", which is an +optimization that, for example, turns a+b into a++b in order to avoid +backtracks into a+ that can never be successful. However, if callouts are in +use, auto-possessification means that some callouts are never taken. You can +set this option if you want the matching functions to do a full unoptimized +search and run all the callouts, but it is mainly provided for testing +purposes. +
    +  PCRE2_NO_DOTSTAR_ANCHOR
    +
    +If this option is set, it disables an optimization that is applied when .* is +the first significant item in a top-level branch of a pattern, and all the +other branches also start with .* or with \A or \G or ^. The optimization is +automatically disabled for .* if it is inside an atomic group or a capture +group that is the subject of a backreference, or if the pattern contains +(*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is +automatically anchored if PCRE2_DOTALL is set for all the .* items and +PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match +must start either at the start of the subject or following a newline is +remembered. Like other optimizations, this can cause callouts to be skipped. +
    +  PCRE2_NO_START_OPTIMIZE
    +
    +This is an option whose main effect is at matching time. It does not change +what pcre2_compile() generates, but it does affect the output of the JIT +compiler. +

    +

    +There are a number of optimizations that may occur at the start of a match, in +order to speed up the process. For example, if it is known that an unanchored +match must start with a specific code unit value, the matching code searches +the subject for that value, and fails immediately if it cannot find it, without +actually running the main matching function. This means that a special item +such as (*COMMIT) at the start of a pattern is not considered until after a +suitable starting point for the match has been found. Also, when callouts or +(*MARK) items are in use, these "start-up" optimizations can cause them to be +skipped if the pattern is never actually used. The start-up optimizations are +in effect a pre-scan of the subject that takes place before the pattern is run. +

    +

    +The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations, +possibly causing performance to suffer, but ensuring that in cases where the +result is "no match", the callouts do occur, and that items such as (*COMMIT) +and (*MARK) are considered at every possible starting position in the subject +string. +

    +

    +Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching operation. +Consider the pattern +

    +  (*COMMIT)ABC
    +
    +When this is compiled, PCRE2 records the fact that a match must start with the +character "A". Suppose the subject string is "DEFABC". The start-up +optimization scans along the subject, finds "A" and runs the first match +attempt from there. The (*COMMIT) item means that the pattern must match the +current starting position, which in this case, it does. However, if the same +match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the +subject string does not happen. The first match attempt is run starting from +"D" and when this fails, (*COMMIT) prevents any further matches being tried, so +the overall result is "no match". +
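The difference between the two runs can be imitated by a toy model in plain C. This is not PCRE2 code and does not call the library; it only mimics the two behaviours described above for the single pattern (*COMMIT)ABC. The function name commit_abc_matches is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model (not PCRE2 code) of matching the pattern (*COMMIT)ABC.
   With start_optimize set, the subject is first scanned for 'A' and the
   one committed attempt is made there; otherwise the attempt is made at
   offset 0. In both cases (*COMMIT) rules out any later starting point,
   so exactly one attempt decides the result. */
bool commit_abc_matches(const char *subject, bool start_optimize)
{
  const char *start = subject;
  assert(subject != NULL);
  if (start_optimize) {
    start = strchr(subject, 'A');      /* start-up pre-scan of the subject */
    if (start == NULL) return false;   /* no 'A': fail without an attempt */
  }
  return strncmp(start, "ABC", 3) == 0;
}
```

In this model, the subject "DEFABC" matches when start_optimize is true and fails when it is false, mirroring the outcome described in the text.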

    +

    +Another start-up optimization makes use of a minimum length for a matching +subject, which is recorded when possible. Consider the pattern +

    +  (*MARK:1)B(*MARK:2)(X|Y)
    +
    +The minimum length for a match is two characters. If the subject is "XXBB", the +"starting character" optimization skips "XX", then tries to match "BB", which +is long enough. In the process, (*MARK:2) is encountered and remembered. When +the match attempt fails, the next "B" is found, but there is only one character +left, so there are no more attempts, and "no match" is returned with the "last +mark seen" set to "2". If PCRE2_NO_START_OPTIMIZE is set, however, matches are tried +at every possible starting position, including at the end of the subject, where +(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is +returned is "1". In this case, the optimizations do not affect the overall +match result, which is still "no match", but they do affect the auxiliary +information that is returned. +
    +  PCRE2_NO_UTF_CHECK
    +
    +When PCRE2_UTF is set, the validity of the pattern as a UTF string is +automatically checked. There are discussions about the validity of +UTF-8 strings, +UTF-16 strings, +and +UTF-32 strings +in the +pcre2unicode +document. If an invalid UTF sequence is found, pcre2_compile() returns a +negative error code. +

    +

    +If you know that your pattern is a valid UTF string, and you want to skip this +check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When +it is set, the effect of passing an invalid UTF string as a pattern is +undefined. It may cause your program to crash or loop. +

    +

    +Note that this option can also be passed to pcre2_match() and +pcre2_dfa_match(), to suppress UTF validity checking of the subject +string. +

    +

    +Note also that setting PCRE2_NO_UTF_CHECK at compile time does not disable the +error that is given if an escape sequence for an invalid Unicode code point is +encountered in the pattern. In particular, the so-called "surrogate" code +points (0xd800 to 0xdfff) are invalid. If you want to allow escape sequences +such as \x{d800} you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra +option, as described in the section entitled "Extra compile options" +below. +However, this is possible only in UTF-8 and UTF-32 modes, because these values +are not representable in UTF-16. +

    +  PCRE2_UCP
    +
    +This option has two effects. Firstly, it changes the way PCRE2 processes \B, +\b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes. By +default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode +properties are used instead to classify characters. More details are given in +the section on +generic character types +in the +pcre2pattern +page. If you set PCRE2_UCP, matching one of the items it affects takes much +longer. +

    +

    +The second effect of PCRE2_UCP is to force the use of Unicode properties for +upper/lower casing operations on characters with code points greater than 127, +even when PCRE2_UTF is not set. This makes it possible, for example, to process +strings in the 16-bit UCS-2 code. This option is available only if PCRE2 has +been compiled with Unicode support (which is the default). +

    +  PCRE2_UNGREEDY
    +
    +This option inverts the "greediness" of the quantifiers so that they are not +greedy by default, but become greedy if followed by "?". It is not compatible +with Perl. It can also be set by a (?U) option setting within the pattern. +
    +  PCRE2_USE_OFFSET_LIMIT
    +
    +This option must be set for pcre2_compile() if +pcre2_set_offset_limit() is going to be used to set a non-default offset +limit in a match context for matches that use this pattern. An error is +generated if an offset limit is set without this option. For more details, see +the description of pcre2_set_offset_limit() in the +section +that describes match contexts. See also the PCRE2_FIRSTLINE +option above. +
    +  PCRE2_UTF
    +
    +This option causes PCRE2 to regard both the pattern and the subject strings +that are subsequently processed as strings of UTF characters instead of +single-code-unit strings. It is available when PCRE2 is built to include +Unicode support (which is the default). If Unicode support is not available, +the use of this option provokes an error. Details of how PCRE2_UTF changes the +behaviour of PCRE2 are given in the +pcre2unicode +page. In particular, note that it changes the way PCRE2_CASELESS handles +characters with code points greater than 127. +

    +
    +Extra compile options +
    +

    +The option bits that can be set in a compile context by calling the +pcre2_set_compile_extra_options() function are as follows: +

    +  PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
    +
    +This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is +forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate" +code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode +code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot +therefore be represented in UTF-16. They can be represented in UTF-8 and +UTF-32, but are defined as invalid code points, and cause errors if encountered +in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2. +

    +

    +These values also cause errors if encountered in escape sequences such as +\x{d912} within a pattern. However, it seems that some applications, when +using PCRE2 to check for unwanted characters in UTF-8 strings, explicitly test +for the surrogates using escape sequences. The PCRE2_NO_UTF_CHECK option does +not disable the error that occurs, because it applies only to the testing of +input strings for UTF validity. +

    +

    +If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code +point values in UTF-8 and UTF-32 patterns no longer provoke errors and are +incorporated in the compiled pattern. However, they can only match subject +characters if the matching function is called with PCRE2_NO_UTF_CHECK set. +
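The surrogate arithmetic described above can be sketched in plain C. This is only an illustration of the ranges quoted in the text; the function names is_surrogate() and combine_surrogates() are hypothetical and are not part of the PCRE2 API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the surrogate code points 0xd800 to 0xdfff, which are invalid
   as characters in UTF-8 and UTF-32 strings. */
bool is_surrogate(uint32_t cp)
{
    return cp >= 0xd800 && cp <= 0xdfff;
}

/* Combine a UTF-16 high/low surrogate pair into the code point it encodes
   (0x10000 to 0x10ffff). Returns 0 if the two units do not form a pair. */
uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    if (high < 0xd800 || high > 0xdbff || low < 0xdc00 || low > 0xdfff)
        return 0;
    return 0x10000 + (((uint32_t)(high - 0xd800) << 10) |
                      (uint32_t)(low - 0xdc00));
}
```

An application that wants to reject surrogates in UTF-8 input could run a check like is_surrogate() on decoded code points instead of testing for them with escape sequences in a pattern.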

    +  PCRE2_EXTRA_ALT_BSUX
    +
    +The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and \x in +the way that ECMAScript (aka JavaScript) does. Additional functionality was +defined by ECMAScript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of +PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} as a hexadecimal +character code, where hhh.. is any number of hexadecimal digits. +
    +  PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
    +
    +This is a dangerous option. Use with care. By default, an unrecognized escape +such as \j or a malformed one such as \x{2z} causes a compile-time error when +detected by pcre2_compile(). Perl is somewhat inconsistent in handling +such items: for example, \j is treated as a literal "j", and non-hexadecimal +digits in \x{} are just ignored, though warnings are given in both cases if +Perl's warning switch is enabled. However, a malformed octal number after \o{ +always causes an error in Perl. +

    +

    +If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to +pcre2_compile(), all unrecognized or malformed escape sequences are +treated as single-character escapes. For example, \j is a literal "j" and +\x{2z} is treated as the literal string "x{2z}". Setting this option means +that typos in patterns may go undetected and have unexpected results. Also note +that a sequence such as [\N{] is interpreted as a malformed attempt at +[\N{...}] and so is treated as [N{] whereas [\N] gives an error because an +unqualified \N is a valid escape sequence but is not supported in a character +class. To reiterate: this is a dangerous option. Use with great care. +

    +  PCRE2_EXTRA_ESCAPED_CR_IS_LF
    +
    +There are some legacy applications where the escape sequence \r in a pattern +is expected to match a newline. If this option is set, \r in a pattern is +converted to \n so that it matches a LF (linefeed) instead of a CR (carriage +return) character. The option does not affect a literal CR in the pattern, nor +does it affect CR specified as an explicit code point such as \x{0D}. +
    +  PCRE2_EXTRA_MATCH_LINE
    +
    +This option is provided for use by the -x option of pcre2grep. It +causes the pattern only to match complete lines. This is achieved by +automatically inserting the code for "^(?:" at the start of the compiled +pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched +line may be in the middle of the subject string. This option can be used with +PCRE2_LITERAL. +
    +  PCRE2_EXTRA_MATCH_WORD
    +
    +This option is provided for use by the -w option of pcre2grep. It +causes the pattern only to match strings that have a word boundary at the start +and the end. This is achieved by automatically inserting the code for "\b(?:" +at the start of the compiled pattern and ")\b" at the end. The option may be +used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is +also set. +
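For readers who want to see what the two wrapping options amount to, the following sketch builds the textual equivalent of the wrapped pattern. It is an approximation only: PCRE2 inserts compiled code rather than editing the pattern text, and the sketch does not account for PCRE2_LITERAL or for option settings such as (*UTF) at the start of a pattern. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Write prefix + pattern + suffix into out.
   Returns 0 on success, or -1 if out is too small. */
int wrap_pattern(char *out, size_t outsize, const char *prefix,
                 const char *pattern, const char *suffix)
{
    int n = snprintf(out, outsize, "%s%s%s", prefix, pattern, suffix);
    return (n >= 0 && (size_t)n < outsize) ? 0 : -1;
}

/* Textual equivalent of PCRE2_EXTRA_MATCH_LINE: ^(?:pattern)$ */
int wrap_match_line(char *out, size_t outsize, const char *pattern)
{
    return wrap_pattern(out, outsize, "^(?:", pattern, ")$");
}

/* Textual equivalent of PCRE2_EXTRA_MATCH_WORD: \b(?:pattern)\b */
int wrap_match_word(char *out, size_t outsize, const char *pattern)
{
    return wrap_pattern(out, outsize, "\\b(?:", pattern, ")\\b");
}
```

The non-capturing group matters: without it, an alternation such as cat|dog would bind the anchors to the first and last alternatives only.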

    +
    JUST-IN-TIME (JIT) COMPILATION
    +

    +int pcre2_jit_compile(pcre2_code *code, uint32_t options); +
    +
    +int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); +
    +
    +void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); +
    +
    +pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize, + PCRE2_SIZE maxsize, pcre2_general_context *gcontext); +
    +
    +void pcre2_jit_stack_assign(pcre2_match_context *mcontext, + pcre2_jit_callback callback_function, void *callback_data); +
    +
    +void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack); +

    +

    +These functions provide support for JIT compilation, which, if the just-in-time +compiler is available, further processes a compiled pattern into machine code +that executes much faster than the pcre2_match() interpretive matching +function. Full details are given in the +pcre2jit +documentation. +

    +

    +JIT compilation is a heavyweight optimization. It can take some time for +patterns to be analyzed, and for one-off matches and simple patterns the +benefit of faster execution might be offset by a much slower compilation time. +Most (but not all) patterns can be optimized by the JIT compiler. +

    +
    LOCALE SUPPORT
    +

    +const uint8_t *pcre2_maketables(pcre2_general_context *gcontext); +
    +
    +void pcre2_maketables_free(pcre2_general_context *gcontext, + const uint8_t *tables); +

    +

    +PCRE2 handles caseless matching, and determines whether characters are letters, +digits, or whatever, by reference to a set of tables, indexed by character code +point. However, this applies only to characters whose code points are less than +256. By default, higher-valued code points never match escapes such as \w or +\d. +

    +

    +When PCRE2 is built with Unicode support (the default), the Unicode properties +of all characters can be tested with \p and \P, or, alternatively, the +PCRE2_UCP option can be set when a pattern is compiled; this causes \w and +friends to use Unicode property support instead of the built-in tables. +PCRE2_UCP also causes upper/lower casing operations on characters with code +points greater than 127 to use Unicode properties. These effects apply even +when PCRE2_UTF is not set. +

    +

    +The use of locales with Unicode is discouraged. If you are handling characters +with code points greater than 127, you should either use Unicode support, or +use locales, but not try to mix the two. +

    +

    +PCRE2 contains a built-in set of character tables that are used by default. +These are sufficient for many applications. Normally, the internal tables +recognize only ASCII characters. However, when PCRE2 is built, it is possible +to cause the internal tables to be rebuilt in the default "C" locale of the +local system, which may cause them to be different. +

    +

    +The built-in tables can be overridden by tables supplied by the application +that calls PCRE2. These may be created in a different locale from the default. +As more and more applications change to using Unicode, the need for this locale +support is expected to die away. +

    +

    +External tables are built by calling the pcre2_maketables() function, in +the relevant locale. The only argument to this function is a general context, +which can be used to pass a custom memory allocator. If the argument is NULL, +the system malloc() is used. The result can be passed to +pcre2_compile() as often as necessary, by creating a compile context and +calling pcre2_set_character_tables() to set the tables pointer therein. +

    +

    +For example, to build and use tables that are appropriate for the French locale +(where accented characters with values greater than 127 are treated as +letters), the following code could be used: +

    +  setlocale(LC_CTYPE, "fr_FR");
    +  tables = pcre2_maketables(NULL);
    +  ccontext = pcre2_compile_context_create(NULL);
    +  pcre2_set_character_tables(ccontext, tables);
    +  re = pcre2_compile(..., ccontext);
    +
    +The locale name "fr_FR" is used on Linux and other Unix-like systems; if you +are using Windows, the name for the French locale is "french". +

    +

    +The pointer that is passed (via the compile context) to pcre2_compile() +is saved with the compiled pattern, and the same tables are used by the +matching functions. Thus, for any single pattern, compilation and matching both +happen in the same locale, but different patterns can be processed in different +locales. +

    +

    +It is the caller's responsibility to ensure that the memory containing the +tables remains available while they are still in use. When they are no longer +needed, you can discard them using pcre2_maketables_free(), passing as its +first parameter the same general context that was used to create the +tables. +

    +
    +Saving locale tables +
    +

    +The tables described above are just a sequence of binary bytes, which makes +them independent of hardware characteristics such as endianness or whether the +processor is 32-bit or 64-bit. A copy of the result of pcre2_maketables() +can therefore be saved in a file or elsewhere and re-used later, even in a +different program or on another computer. The size of the tables (number of +bytes) must be obtained by calling pcre2_config() with the +PCRE2_CONFIG_TABLES_LENGTH option because pcre2_maketables() does not +return this value. Note that the pcre2_dftables program, which is part of +the PCRE2 build system, can be used stand-alone to create a file that contains +a set of binary tables. See the +pcre2build +documentation for details. +

    +
    INFORMATION ABOUT A COMPILED PATTERN
    +

    +int pcre2_pattern_info(const pcre2_code *code, uint32_t what, void *where); +

    +

    +The pcre2_pattern_info() function returns general information about a +compiled pattern. For information about callouts, see the +next section. +The first argument for pcre2_pattern_info() is a pointer to the compiled +pattern. The second argument specifies which piece of information is required, +and the third argument is a pointer to a variable to receive the data. If the +third argument is NULL, the first argument is ignored, and the function returns +the size in bytes of the variable that is required for the information +requested. Otherwise, the yield of the function is zero for success, or one of +the following negative numbers: +

    +  PCRE2_ERROR_NULL           the argument code was NULL
    +  PCRE2_ERROR_BADMAGIC       the "magic number" was not found
    +  PCRE2_ERROR_BADOPTION      the value of what was invalid
    +  PCRE2_ERROR_UNSET          the requested field is not set
    +
    +The "magic number" is placed at the start of each compiled pattern as a simple +check against passing an arbitrary memory pointer. Here is a typical call of +pcre2_pattern_info(), to obtain the length of the compiled pattern: +
    +  int rc;
    +  size_t length;
    +  rc = pcre2_pattern_info(
    +    re,               /* result of pcre2_compile() */
    +    PCRE2_INFO_SIZE,  /* what is required */
    +    &length);         /* where to put the data */
    +
    +The possible values for the second argument are defined in pcre2.h, and +are as follows: +
    +  PCRE2_INFO_ALLOPTIONS
    +  PCRE2_INFO_ARGOPTIONS
    +  PCRE2_INFO_EXTRAOPTIONS
    +
    +Return copies of the pattern's options. The third argument should point to a +uint32_t variable. PCRE2_INFO_ARGOPTIONS returns exactly the options that +were passed to pcre2_compile(), whereas PCRE2_INFO_ALLOPTIONS returns +the compile options as modified by any top-level (*XXX) option settings such as +(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the +extra options that were set in the compile context by calling the +pcre2_set_compile_extra_options() function. +

    +

    +For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED +option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF. +Option settings such as (?i) that can change within a pattern do not affect the +result of PCRE2_INFO_ALLOPTIONS, even if they appear right at the start of the +pattern. (This was different in some earlier releases.) +

    +

    +A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if +the first significant item in every top-level branch is one of the following: +

    +  ^     unless PCRE2_MULTILINE is set
    +  \A    always
    +  \G    always
    +  .*    sometimes - see below
    +
    +When .* is the first significant item, anchoring is possible only when all the +following are true: +
    +  .* is not in an atomic group
    +  .* is not in a capture group that is the subject of a backreference
    +  PCRE2_DOTALL is in force for .*
    +  Neither (*PRUNE) nor (*SKIP) appears in the pattern
    +  PCRE2_NO_DOTSTAR_ANCHOR is not set
    +
    +For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the +options returned for PCRE2_INFO_ALLOPTIONS. +
    +  PCRE2_INFO_BACKREFMAX
    +
    +Return the number of the highest backreference in the pattern. The third +argument should point to a uint32_t variable. Named capture groups +acquire numbers as well as names, and these count towards the highest +backreference. Backreferences such as \4 or \g{12} match the captured +characters of the given group, but in addition, the check that a capture +group is set in a conditional group such as (?(3)a|b) is also a backreference. +Zero is returned if there are no backreferences. +
    +  PCRE2_INFO_BSR
    +
    +The output is a uint32_t integer whose value indicates what character sequences +the \R escape sequence matches. A value of PCRE2_BSR_UNICODE means that \R +matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means +that \R matches only CR, LF, or CRLF. +
    +  PCRE2_INFO_CAPTURECOUNT
    +
    +Return the highest capture group number in the pattern. In patterns where (?| +is not used, this is also the total number of capture groups. The third +argument should point to a uint32_t variable. +
    +  PCRE2_INFO_DEPTHLIMIT
    +
    +If the pattern set a backtracking depth limit by including an item of the form +(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument +should point to a uint32_t integer. If no such value has been set, the call to +pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET. Note that this +limit will only be used during matching if it is less than the limit set or +defaulted by the caller of the match function. +
    +  PCRE2_INFO_FIRSTBITMAP
    +
    +In the absence of a single first code unit for a non-anchored pattern, +pcre2_compile() may construct a 256-bit table that defines a fixed set of +values for the first code unit in any match. For example, a pattern that starts +with [abc] results in a table with three bits set. When code unit values +greater than 255 are supported, the flag bit for 255 means "any code unit of +value 255 or above". If such a table was constructed, a pointer to it is +returned. Otherwise NULL is returned. The third argument should point to a +const uint8_t * variable. +
    +  PCRE2_INFO_FIRSTCODETYPE
    +
    +Return information about the first code unit of any matched string, for a +non-anchored pattern. The third argument should point to a uint32_t +variable. If there is a fixed first value, for example, the letter "c" from a +pattern such as (cat|cow|coyote), 1 is returned, and the value can be retrieved +using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but it is +known that a match can occur only at the start of the subject or following a +newline in the subject, 2 is returned. Otherwise, and for anchored patterns, 0 +is returned. +
    +  PCRE2_INFO_FIRSTCODEUNIT
    +
    +Return the value of the first code unit of any matched string for a pattern +where PCRE2_INFO_FIRSTCODETYPE returns 1; otherwise return 0. The third +argument should point to a uint32_t variable. In the 8-bit library, the +value is always less than 256. In the 16-bit library the value can be up to +0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff, +and up to 0xffffffff when not using UTF-32 mode. +
    +  PCRE2_INFO_FRAMESIZE
    +
    +Return the size (in bytes) of the data frames that are used to remember +backtracking positions when the pattern is processed by pcre2_match() +without the use of JIT. The third argument should point to a size_t +variable. The frame size depends on the number of capturing parentheses in the +pattern. Each additional capture group adds two PCRE2_SIZE variables. +
    +  PCRE2_INFO_HASBACKSLASHC
    +
    +Return 1 if the pattern contains any instances of \C, otherwise 0. The third +argument should point to a uint32_t variable. +
    +  PCRE2_INFO_HASCRORLF
    +
    +Return 1 if the pattern contains any explicit matches for CR or LF characters, +otherwise 0. The third argument should point to a uint32_t variable. An +explicit match is either a literal CR or LF character, or \r or \n or one of +the equivalent hexadecimal or octal escape sequences. +
    +  PCRE2_INFO_HEAPLIMIT
    +
    +If the pattern set a heap memory limit by including an item of the form +(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument +should point to a uint32_t integer. If no such value has been set, the call to +pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET. Note that this +limit will only be used during matching if it is less than the limit set or +defaulted by the caller of the match function. +
    +  PCRE2_INFO_JCHANGED
    +
    +Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise +0. The third argument should point to a uint32_t variable. (?J) and +(?-J) set and unset the local PCRE2_DUPNAMES option, respectively. +
    +  PCRE2_INFO_JITSIZE
    +
    +If the compiled pattern was successfully processed by +pcre2_jit_compile(), return the size of the JIT compiled code, otherwise +return zero. The third argument should point to a size_t variable. +
    +  PCRE2_INFO_LASTCODETYPE
    +
    +Return 1 if there is a rightmost literal code unit that must exist in any +matched string, other than at its start. The third argument should point to a +uint32_t variable. If there is no such value, 0 is returned. When 1 is +returned, the code unit value itself can be retrieved using +PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is +recorded only if it follows something of variable length. For example, for the +pattern /^a\d+z\d+/ the returned value is 1 (with "z" returned from +PCRE2_INFO_LASTCODEUNIT), but for /^a\dz\d/ the returned value is 0. +
    +  PCRE2_INFO_LASTCODEUNIT
    +
    +Return the value of the rightmost literal code unit that must exist in any +matched string, other than at its start, for a pattern where +PCRE2_INFO_LASTCODETYPE returns 1. Otherwise, return 0. The third argument +should point to a uint32_t variable. +
    +  PCRE2_INFO_MATCHEMPTY
    +
    +Return 1 if the pattern might match an empty string, otherwise 0. The third +argument should point to a uint32_t variable. When a pattern contains +recursive subroutine calls it is not always possible to determine whether or +not it can match an empty string. PCRE2 takes a cautious approach and returns 1 +in such cases. +
    +  PCRE2_INFO_MATCHLIMIT
    +
    +If the pattern set a match limit by including an item of the form +(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument +should point to a uint32_t integer. If no such value has been set, the call to +pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET. Note that this +limit will only be used during matching if it is less than the limit set or +defaulted by the caller of the match function. +
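The rule stated for the depth, heap, and match limits (a limit set inside the pattern takes effect only if it is lower than the caller's limit) can be written as a small helper. This is a sketch of the rule as described here, not of PCRE2's internal code; effective_limit() is a hypothetical name, and pattern_has_limit stands for "pcre2_pattern_info() did not return PCRE2_ERROR_UNSET".

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the limit that applies during matching: the pattern's own limit
   is honoured only when it is set and lower than the caller's limit. */
uint32_t effective_limit(uint32_t caller_limit, bool pattern_has_limit,
                         uint32_t pattern_limit)
{
    if (pattern_has_limit && pattern_limit < caller_limit)
        return pattern_limit;
    return caller_limit;
}
```

In other words, a pattern can tighten a limit but can never raise it above what the calling application allows.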
    +  PCRE2_INFO_MAXLOOKBEHIND
    +
    +A lookbehind assertion moves back a certain number of characters (not code +units) when it starts to process each of its branches. This request returns the +largest of these backward moves. The third argument should point to a uint32_t +integer. The simple assertions \b and \B require a one-character lookbehind +and cause PCRE2_INFO_MAXLOOKBEHIND to return 1 in the absence of anything +longer. \A also registers a one-character lookbehind, though it does not +actually inspect the previous character. +

    +

    +Note that this information is useful for multi-segment matching only +if the pattern contains no nested lookbehinds. For example, the pattern +(?<=a(?<=ba)c) returns a maximum lookbehind of 2, but when it is processed, the +first lookbehind moves back by two characters, matches one character, then the +nested lookbehind also moves back by two characters. This puts the matching +point three characters earlier than it was at the start. +PCRE2_INFO_MAXLOOKBEHIND is really only useful as a debugging tool. See the +pcre2partial +documentation for a discussion of multi-segment matching. +

    +  PCRE2_INFO_MINLENGTH
    +
    +If a minimum length for matching subject strings was computed, its value is +returned. Otherwise the returned value is 0. This value is not computed when +PCRE2_NO_START_OPTIMIZE is set. The value is a number of characters, which in +UTF mode may be different from the number of code units. The third argument +should point to a uint32_t variable. The value is a lower bound to the +length of any matching string. There may not be any strings of that length that +do actually match, but every string that does match is at least that long. +
    +  PCRE2_INFO_NAMECOUNT
    +  PCRE2_INFO_NAMEENTRYSIZE
    +  PCRE2_INFO_NAMETABLE
    +
    +PCRE2 supports the use of named as well as numbered capturing parentheses. The +names are just an additional way of identifying the parentheses, which still +acquire numbers. Several convenience functions such as +pcre2_substring_get_byname() are provided for extracting captured +substrings by name. It is also possible to extract the data directly, by first +converting the name to a number in order to access the correct pointers in the +output vector (described with pcre2_match() below). To do the conversion, +you need to use the name-to-number map, which is described by these three +values. +

    +

    +The map consists of a number of fixed-size entries. PCRE2_INFO_NAMECOUNT gives +the number of entries, and PCRE2_INFO_NAMEENTRYSIZE gives the size of each +entry in code units; both of these return a uint32_t value. The entry +size depends on the length of the longest name. +

    +

    +PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is +a PCRE2_SPTR pointer to a block of code units. In the 8-bit library, the first +two bytes of each entry are the number of the capturing parenthesis, most +significant byte first. In the 16-bit library, the pointer points to 16-bit +code units, the first of which contains the parenthesis number. In the 32-bit +library, the pointer points to 32-bit code units, the first of which contains +the parenthesis number. The rest of the entry is the corresponding name, zero +terminated. +

    +

    +The names are in alphabetical order. If (?| is used to create multiple capture +groups with the same number, as described in the +section on duplicate group numbers +in the +pcre2pattern +page, the groups may be given the same name, but there is only one entry in the +table. Different names for groups of the same number are not permitted. +

    +

    +Duplicate names for capture groups with different numbers are permitted, but +only if PCRE2_DUPNAMES is set. They appear in the table in the order in which +they were found in the pattern. In the absence of (?| this is the order of +increasing number; when (?| is used this is not necessarily the case because +later capture groups may have lower numbers. +

    +

    +As a simple example of the name/number table, consider the following pattern +after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white +space - including newlines - is ignored): +

    +  (?<date> (?<year>(\d\d)?\d\d) - (?<month>\d\d) - (?<day>\d\d) )
    +
    +There are four named capture groups, so the table has four entries, and each +entry in the table is eight bytes long. The table is as follows, with +non-printing bytes shown in hexadecimal, and undefined bytes shown as ??: +
    +  00 01 d  a  t  e  00 ??
    +  00 05 d  a  y  00 ?? ??
    +  00 04 m  o  n  t  h  00
    +  00 02 y  e  a  r  00 ??
    +
    +When writing code to extract data from named capture groups using the +name-to-number map, remember that the length of the entries is likely to be +different for each compiled pattern. +
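A lookup over the 8-bit table layout described above might look like the following sketch. It assumes the three values have already been obtained with PCRE2_INFO_NAMETABLE, PCRE2_INFO_NAMECOUNT, and PCRE2_INFO_NAMEENTRYSIZE; the function group_number_for_name() is hypothetical and takes the table as plain bytes so that the layout is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Search an 8-bit name table for a group name. Each entry holds the group
   number in its first two bytes (most significant first), followed by the
   zero-terminated name. Returns the group number, or 0 if the name is not
   in the table (capture group numbers start at 1). */
uint32_t group_number_for_name(const uint8_t *table, uint32_t name_count,
                               uint32_t entry_size, const char *name)
{
    /* Two number bytes plus at least a terminating zero. */
    assert(entry_size >= 3);
    for (uint32_t i = 0; i < name_count; i++) {
        const uint8_t *entry = table + (size_t)i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return ((uint32_t)entry[0] << 8) | entry[1];
    }
    return 0;
}
```

With the table shown above (four entries of eight bytes), looking up "month" yields 4. When duplicate names are in use, this sketch returns only the first matching entry.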
    +  PCRE2_INFO_NEWLINE
    +
    +The output is one of the following uint32_t values: +
    +  PCRE2_NEWLINE_CR       Carriage return (CR)
    +  PCRE2_NEWLINE_LF       Linefeed (LF)
    +  PCRE2_NEWLINE_CRLF     Carriage return, linefeed (CRLF)
    +  PCRE2_NEWLINE_ANY      Any Unicode line ending
    +  PCRE2_NEWLINE_ANYCRLF  Any of CR, LF, or CRLF
    +  PCRE2_NEWLINE_NUL      The NUL character (binary zero)
    +
    +This identifies the character sequence that will be recognized as meaning +"newline" while matching. +
    +  PCRE2_INFO_SIZE
    +
    +Return the size of the compiled pattern in bytes (for all three libraries). The +third argument should point to a size_t variable. This value includes the +size of the general data block that precedes the code units of the compiled +pattern itself. The value that is used when pcre2_compile() is getting +memory in which to place the compiled pattern may be slightly larger than the +value returned by this option, because there are cases where the code that +calculates the size has to over-estimate. Processing a pattern with the JIT +compiler does not alter the value returned by this option. +

    +
    INFORMATION ABOUT A PATTERN'S CALLOUTS
    +

    +int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); +
    +
    +A script language that supports the use of string arguments in callouts might +like to scan all the callouts in a pattern before running the match. This can +be done by calling pcre2_callout_enumerate(). The first argument is a +pointer to a compiled pattern, the second points to a callback function, and +the third is arbitrary user data. The callback function is called for every +callout in the pattern in the order in which they appear. Its first argument is +a pointer to a callout enumeration block, and its second argument is the +user_data value that was passed to pcre2_callout_enumerate(). The +contents of the callout enumeration block are described in the +pcre2callout +documentation, which also gives further details about callouts. +

    +
    SERIALIZATION AND PRECOMPILING
    +

    +It is possible to save compiled patterns on disc or elsewhere, and reload them +later, subject to a number of restrictions. The host on which the patterns are +reloaded must be running the same version of PCRE2, with the same code unit +width, and must also have the same endianness, pointer width, and PCRE2_SIZE +type. Before compiled patterns can be saved, they must be converted to a +"serialized" form, which in the case of PCRE2 is really just a bytecode dump. +The functions whose names begin with pcre2_serialize_ are used for +converting to and from the serialized form. They are described in the +pcre2serialize +documentation. Note that PCRE2 serialization does not convert compiled patterns +to an abstract format like Java or .NET serialization. +

    +
    THE MATCH DATA BLOCK
    +

    +pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize, + pcre2_general_context *gcontext); +
    +
    +pcre2_match_data *pcre2_match_data_create_from_pattern( + const pcre2_code *code, pcre2_general_context *gcontext); +
    +
    +void pcre2_match_data_free(pcre2_match_data *match_data); +

    +

    +Information about a successful or unsuccessful match is placed in a match +data block, which is an opaque structure that is accessed by function calls. In +particular, the match data block contains a vector of offsets into the subject +string that define the matched part of the subject and any substrings that were +captured. This is known as the ovector. +

    +

    +Before calling pcre2_match(), pcre2_dfa_match(), or +pcre2_jit_match() you must create a match data block by calling one of +the creation functions above. For pcre2_match_data_create(), the first +argument is the number of pairs of offsets in the ovector. One pair of +offsets is required to identify the string that matched the whole pattern, with +an additional pair for each captured substring. For example, a value of 4 +creates enough space to record the matched portion of the subject plus three +captured substrings. A minimum of 1 pair is imposed by +pcre2_match_data_create(), so it is always possible to return the overall +matched string. +

    +

    +The second argument of pcre2_match_data_create() is a pointer to a +general context, which can specify custom memory management for obtaining the +memory for the match data block. If you are not using custom memory management, +pass NULL, which causes malloc() to be used. +

    +

    +For pcre2_match_data_create_from_pattern(), the first argument is a +pointer to a compiled pattern. The ovector is created to be exactly the right +size to hold all the substrings a pattern might capture. The second argument is +again a pointer to a general context, but in this case if NULL is passed, the +memory is obtained using the same allocator that was used for the compiled +pattern (custom or default). +

    +

    +A match data block can be used many times, with the same or different compiled +patterns. You can extract information from a match data block after a match +operation has finished, using functions that are described in the sections on +matched strings +and +other match data +below. +

    +

    +When a call of pcre2_match() fails, valid data is available in the match +block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one +of the error codes for an invalid UTF string. Exactly what is available depends +on the error, and is detailed below. +

    +

    +When one of the matching functions is called, pointers to the compiled pattern +and the subject string are set in the match data block so that they can be +referenced by the extraction functions after a successful match. After running +a match, you must not free a compiled pattern or a subject string until after +all operations on the match data block (for that match) have taken place, +unless, in the case of the subject string, you have used the +PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled +"Option bits for pcre2_match()" +below. +

    +

    +When a match data block itself is no longer needed, it should be freed by +calling pcre2_match_data_free(). If this function is called with a NULL +argument, it returns immediately, without doing anything. +

    +
    MATCHING A PATTERN: THE TRADITIONAL FUNCTION
    +

    +int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); +

    +

    +The function pcre2_match() is called to match a subject string against a +compiled pattern, which is passed in the code argument. You can call +pcre2_match() with the same code argument as many times as you +like, in order to find multiple matches in the subject string or to match +different subject strings with the same pattern. +

    +

    +This function is the main matching facility of the library, and it operates in +a Perl-like manner. For specialist use there is also an alternative matching +function, which is described +below +in the section about the pcre2_dfa_match() function. +

    +

    +Here is an example of a simple call to pcre2_match(): +

    +  pcre2_match_data *md = pcre2_match_data_create(4, NULL);
    +  int rc = pcre2_match(
    +    re,             /* result of pcre2_compile() */
    +    "some string",  /* the subject string */
    +    11,             /* the length of the subject string */
    +    0,              /* start at offset 0 in the subject */
    +    0,              /* default options */
    +    md,             /* the match data block */
    +    NULL);          /* a match context; NULL means use defaults */
    +
    +If the subject string is zero-terminated, the length can be given as +PCRE2_ZERO_TERMINATED. A match context must be provided if certain less common +matching parameters are to be changed. For details, see the section on +the match context +above. +

    +
    +The string to be matched by pcre2_match() +
    +

    +The subject string is passed to pcre2_match() as a pointer in +subject, a length in length, and a starting offset in +startoffset. The length and offset are in code units, not characters. +That is, they are in bytes for the 8-bit library, 16-bit code units for the +16-bit library, and 32-bit code units for the 32-bit library, whether or not +UTF processing is enabled. +

    +

    +If startoffset is greater than the length of the subject, +pcre2_match() returns PCRE2_ERROR_BADOFFSET. When the starting offset is +zero, the search for a match starts at the beginning of the subject, and this +is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset +must point to the start of a character, or to the end of the subject (in UTF-32 +mode, one code unit equals one character, so all offsets are valid). Like the +pattern string, the subject may contain binary zeros. +

    +

    +A non-zero starting offset is useful when searching for another match in the +same subject by calling pcre2_match() again after a previous success. +Setting startoffset differs from passing over a shortened string and +setting PCRE2_NOTBOL in the case of a pattern that begins with any kind of +lookbehind. For example, consider the pattern +

    +  \Biss\B
    +
    +which finds occurrences of "iss" in the middle of words. (\B matches only if +the current position in the subject is not a word boundary.) When applied to +the string "Mississippi" the first call to pcre2_match() finds the first +occurrence. If pcre2_match() is called again with just the remainder of +the subject, namely "issippi", it does not match, because \B is always false at +the start of the subject, which is deemed to be a word boundary. However, if +pcre2_match() is passed the entire string again, but with +startoffset set to 4, it finds the second occurrence of "iss" because it +is able to look behind the starting point to discover that it is preceded by a +letter. +

    +

    +Finding all the matches in a subject is tricky when the pattern can match an +empty string. It is possible to emulate Perl's /g behaviour by first trying the +match again at the same offset, with the PCRE2_NOTEMPTY_ATSTART and +PCRE2_ANCHORED options, and then if that fails, advancing the starting offset +and trying an ordinary match again. There is some code that demonstrates how to +do this in the +pcre2demo +sample program. In the most general case, you have to check to see if the +newline convention recognizes CRLF as a newline, and if so, and the current +character is CR followed by LF, advance the starting offset by two characters +instead of one. +

    +

    +If a non-zero starting offset is passed when the pattern is anchored, a single +attempt to match at the given offset is made. This can only succeed if the +pattern does not require the match to be at the start of the subject. In other +words, the anchoring must be the result of setting the PCRE2_ANCHORED option or +the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \A. +

    +
    +Option bits for pcre2_match() +
    +

    +The unused bits of the options argument for pcre2_match() must be +zero. The only bits that may be set are PCRE2_ANCHORED, +PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, +PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, +PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below. +

    +

    +Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by +the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the +interpretive code in pcre2_match() is run. Apart from PCRE2_NO_JIT +(obviously), the remaining options are supported for JIT matching. +

    +  PCRE2_ANCHORED
    +
    +The PCRE2_ANCHORED option limits pcre2_match() to matching at the first +matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out +to be anchored by virtue of its contents, it cannot be made unanchored at +matching time. Note that setting the option at match time disables JIT +matching. +
    +  PCRE2_COPY_MATCHED_SUBJECT
    +
    +By default, a pointer to the subject is remembered in the match data block so +that, after a successful match, it can be referenced by the substring +extraction functions. This means that the subject's memory must not be freed +until all such operations are complete. For some applications where the +lifetime of the subject string is not guaranteed, it may be necessary to make a +copy of the subject string, but it is wasteful to do this unless the match is +successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the +subject is copied and the new pointer is remembered in the match data block +instead of the original subject pointer. The memory allocator that was used for +the match block itself is used. The copy is automatically freed when +pcre2_match_data_free() is called to free the match data block. It is also +automatically freed if the match data block is re-used for another match +operation. +
    +  PCRE2_ENDANCHORED
    +
    +If the PCRE2_ENDANCHORED option is set, any string that pcre2_match() +matches must be right at the end of the subject string. Note that setting the +option at match time disables JIT matching. +
    +  PCRE2_NOTBOL
    +
    +This option specifies that the first character of the subject string is not the +beginning of a line, so the circumflex metacharacter should not match before +it. Setting this without having set PCRE2_MULTILINE at compile time causes +circumflex never to match. This option affects only the behaviour of the +circumflex metacharacter. It does not affect \A. +
    +  PCRE2_NOTEOL
    +
    +This option specifies that the end of the subject string is not the end of a +line, so the dollar metacharacter should not match it nor (except in multiline +mode) a newline immediately before it. Setting this without having set +PCRE2_MULTILINE at compile time causes dollar never to match. This option +affects only the behaviour of the dollar metacharacter. It does not affect \Z +or \z. +
    +  PCRE2_NOTEMPTY
    +
    +An empty string is not considered to be a valid match if this option is set. If +there are alternatives in the pattern, they are tried. If all the alternatives +match the empty string, the entire match fails. For example, if the pattern +
    +  a?b?
    +
    +is applied to a string not beginning with "a" or "b", it matches an empty +string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not +valid, so pcre2_match() searches further into the string for occurrences +of "a" or "b". +
    +  PCRE2_NOTEMPTY_ATSTART
    +
    +This is like PCRE2_NOTEMPTY, except that it locks out an empty string match +only at the first matching position, that is, at the start of the subject plus +the starting offset. An empty string match later in the subject is permitted. +If the pattern is anchored, such a match can occur only if the pattern contains +\K. +
    +  PCRE2_NO_JIT
    +
    +By default, if a pattern has been successfully processed by +pcre2_jit_compile(), JIT is automatically used when pcre2_match() +is called with options that JIT supports. Setting PCRE2_NO_JIT disables the use +of JIT; it forces matching to be done by the interpreter. +
    +  PCRE2_NO_UTF_CHECK
    +
    +When PCRE2_UTF is set at compile time, the validity of the subject as a UTF +string is checked unless PCRE2_NO_UTF_CHECK is passed to pcre2_match() or +PCRE2_MATCH_INVALID_UTF was passed to pcre2_compile(). The latter special +case is discussed in detail in the +pcre2unicode +documentation. +

    +

    +In the default case, if a non-zero starting offset is given, the check is +applied only to that part of the subject that could be inspected during +matching, and there is a check that the starting offset points to the first +code unit of a character or to the end of the subject. If there are no +lookbehind assertions in the pattern, the check starts at the starting offset. +Otherwise, it starts at the length of the longest lookbehind before the +starting offset, or at the start of the subject if there are not that many +characters before the starting offset. Note that the sequences \b and \B are +one-character lookbehinds. +

    +

    +The check is carried out before any other processing takes place, and a +negative error code is returned if the check fails. There are several UTF error +codes for each code unit width, corresponding to different problems with the +code unit sequence. There are discussions about the validity of +UTF-8 strings, +UTF-16 strings, +and +UTF-32 strings +in the +pcre2unicode +documentation. +

    +

    +If you know that your subject is valid, and you want to skip this check for +performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling +pcre2_match(). You might want to do this for the second and subsequent +calls to pcre2_match() if you are making repeated calls to find multiple +matches in the same subject string. +

    +

    +Warning: Unless PCRE2_MATCH_INVALID_UTF was set at compile time, when +PCRE2_NO_UTF_CHECK is set at match time the effect of passing an invalid +string as a subject, or an invalid value of startoffset, is undefined. +Your program may crash or loop indefinitely or give wrong results. +

    +  PCRE2_PARTIAL_HARD
    +  PCRE2_PARTIAL_SOFT
    +
    +These options turn on the partial matching feature. A partial match occurs if +the end of the subject string is reached successfully, but there are not enough +subject characters to complete the match. In addition, either at least one +character must have been inspected or the pattern must contain a lookbehind, or +the pattern must be one that could match an empty string. +

    +

    +If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) +is set, matching continues by testing any remaining alternatives. Only if no +complete match can be found is PCRE2_ERROR_PARTIAL returned instead of +PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the +caller is prepared to handle a partial match, but only if no complete match can +be found. +

    +

    +If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if +a partial match is found, pcre2_match() immediately returns +PCRE2_ERROR_PARTIAL, without considering any other alternatives. In other +words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more +important than an alternative complete match. +

    +

    +There is a more detailed discussion of partial and multi-segment matching, with +examples, in the +pcre2partial +documentation. +

    +
    NEWLINE HANDLING WHEN MATCHING
    +

    +When PCRE2 is built, a default newline convention is set; this is usually the +standard convention for the operating system. The default can be overridden in +a +compile context +by calling pcre2_set_newline(). It can also be overridden by starting a +pattern string with, for example, (*CRLF), as described in the +section on newline conventions +in the +pcre2pattern +page. During matching, the newline choice affects the behaviour of the dot, +circumflex, and dollar metacharacters. It may also alter the way the match +starting position is advanced after a match failure for an unanchored pattern. +

    +

    +When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as +the newline convention, and a match attempt for an unanchored pattern fails +when the current starting position is at a CRLF sequence, and the pattern +contains no explicit matches for CR or LF characters, the match position is +advanced by two characters instead of one, in other words, to after the CRLF. +

    +

    +The above rule is a compromise that makes the most common cases work as +expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is +not set), it does not match the string "\r\nA" because, after failing at the +start, it skips both the CR and the LF before retrying. However, the pattern +[\r\n]A does match that string, because it contains an explicit CR or LF +reference, and so advances only by one character after the first failure. +

    +

    +An explicit match for CR or LF is either a literal appearance of one of those +characters in the pattern, or one of the \r or \n or equivalent octal or +hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor +does \s, even though it includes CR and LF in the characters that it matches. +

    +

    +Notwithstanding the above, anomalous effects may still occur when CRLF is a +valid newline sequence and explicit \r or \n escapes appear in the pattern. +

    +
    HOW pcre2_match() RETURNS A STRING AND CAPTURED SUBSTRINGS
    +

    +uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data); +
    +
    +PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data); +

    +

    +In general, a pattern matches a certain portion of the subject, and in +addition, further substrings from the subject may be picked out by +parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's +book, this is called "capturing" in what follows, and the phrase "capture +group" (Perl terminology) is used for a fragment of a pattern that picks out a +substring. PCRE2 supports several other kinds of parenthesized group that do +not cause substrings to be captured. The pcre2_pattern_info() function +can be used to find out how many capture groups there are in a compiled +pattern. +

    +

    +You can use auxiliary functions for accessing captured substrings +by number +or +by name, +as described in sections below. +

    +

    +Alternatively, you can make direct use of the vector of PCRE2_SIZE values, +called the ovector, which contains the offsets of captured strings. It is +part of the +match data block. +The function pcre2_get_ovector_pointer() returns the address of the +ovector, and pcre2_get_ovector_count() returns the number of pairs of +values it contains. +

    +

    +Within the ovector, the first in each pair of values is set to the offset of +the first code unit of a substring, and the second is set to the offset of the +first code unit after the end of a substring. These values are always code unit +offsets, not character offsets. That is, they are byte offsets in the 8-bit +library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit +library. +

    +

    +After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair +of offsets (that is, ovector[0] and ovector[1]) are set. They +identify the part of the subject that was partially matched. See the +pcre2partial +documentation for details of partial matching. +

    +

    +After a fully successful match, the first pair of offsets identifies the +portion of the subject string that was matched by the entire pattern. The next +pair is used for the first captured substring, and so on. The value returned by +pcre2_match() is one more than the highest numbered pair that has been +set. For example, if two substrings have been captured, the returned value is +3. If there are no captured substrings, the return value from a successful +match is 1, indicating that just the first pair of offsets has been set. +

    +

    +If a pattern uses the \K escape sequence within a positive assertion, the +reported start of a successful match can be greater than the end of the match. +For example, if the pattern (?=ab\K) is matched against "ab", the start and +end offset values for the match are 2 and 0. +

    +

    +If a capture group is matched repeatedly within a single match operation, it is +the last portion of the subject that it matched that is returned. +

    +

    +If the ovector is too small to hold all the captured substring offsets, as much +as possible is filled in, and the function returns a value of zero. If captured +substrings are not of interest, pcre2_match() may be called with a match +data block whose ovector is of minimum length (that is, one pair). +

    +

    +It is possible for capture group number n+1 to match some part of the +subject when group n has not been used at all. For example, if the string +"abc" is matched against the pattern (a|(z))(bc) the return from the function +is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both +values in the offset pairs corresponding to unused groups are set to +PCRE2_UNSET. +

    +

    +Offset values that correspond to unused groups at the end of the expression are +also set to PCRE2_UNSET. For example, if the string "abc" is matched against +the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the +function is 2, because the highest used capture group number is 1. The offsets +for the second and third capture groups (assuming the vector is large +enough, of course) are set to PCRE2_UNSET. +

    +

    +Elements in the ovector that do not correspond to capturing parentheses in the +pattern are never changed. That is, if a pattern contains n capturing +parentheses, no more than ovector[0] to ovector[2n+1] are set by +pcre2_match(). The other elements retain whatever values they previously +had. After a failed match attempt, the contents of the ovector are unchanged. +

    +
    OTHER INFORMATION ABOUT A MATCH
    +

    +PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data); +
    +
    +PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data); +

    +

    +As well as the offsets in the ovector, other information about a match is +retained in the match data block and can be retrieved by the above functions in +appropriate circumstances. If they are called at other times, the result is +undefined. +

    +

    +After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure +to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function +pcre2_get_mark() can be called to access this name, which can be +specified in the pattern by any of the backtracking control verbs, not just +(*MARK). The same function applies to all the verbs. It returns a pointer to +the zero-terminated name, which is within the compiled pattern. If no name is +available, NULL is returned. The length of the name (excluding the terminating +zero) is stored in the code unit that precedes the name. You should use this +length instead of relying on the terminating zero if the name might contain a +binary zero. +

    +

    +After a successful match, the name that is returned is the last mark name +encountered on the matching path through the pattern. Instances of backtracking +verbs without names do not count. Thus, for example, if the matching path +contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a +partial match, the last encountered name is returned. For example, consider +this pattern: +

    +  ^(*MARK:A)((*MARK:B)a|b)c
    +
    +When it matches "bc", the returned name is A. The B mark is "seen" in the first +branch of the group, but it is not on the matching path. On the other hand, +when this pattern fails to match "bx", the returned name is B. +

    +

    +Warning: By default, certain start-of-match optimizations are used to +give a fast "no match" result in some situations. For example, if the anchoring +is removed from the pattern above, there is an initial check for the presence +of "c" in the subject before running the matching engine. This check fails for +"bx", causing a match failure without seeing any marks. You can disable the +start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for +pcre2_compile() or by starting the pattern with (*NO_START_OPT). +

    +

    +After a successful match, a partial match, or one of the invalid UTF errors +(for example, PCRE2_ERROR_UTF8_ERR5), pcre2_get_startchar() can be +called. After a successful or partial match it returns the code unit offset of +the character at which the match started. For a non-partial match, this can be +different to the value of ovector[0] if the pattern contains the \K +escape sequence. After a partial match, however, this value is always the same +as ovector[0] because \K does not affect the result of a partial match. +

    +

    +After a UTF check failure, pcre2_get_startchar() can be used to obtain +the code unit offset of the invalid UTF character. Details are given in the +pcre2unicode +page. +

    +
    ERROR RETURNS FROM pcre2_match()
    +

    +If pcre2_match() fails, it returns a negative number. This can be +converted to a text string by calling the pcre2_get_error_message() +function (see "Obtaining a textual error message" +below). +Negative error codes are also returned by other functions, and are documented +with them. The codes are given names in the header file. If UTF checking is in +force and an invalid UTF subject string is detected, one of a number of +UTF-specific negative error codes is returned. Details are given in the +pcre2unicode +page. The following are the other errors that may be returned by +pcre2_match(): +

    +  PCRE2_ERROR_NOMATCH
    +
    +The subject string did not match the pattern. +
    +  PCRE2_ERROR_PARTIAL
    +
    +The subject string did not match, but it did match partially. See the +pcre2partial +documentation for details of partial matching. +
    +  PCRE2_ERROR_BADMAGIC
    +
    +PCRE2 stores a 4-byte "magic number" at the start of the compiled code, to +catch the case when it is passed a junk pointer. This is the error that is +returned when the magic number is not present. +
    +  PCRE2_ERROR_BADMODE
    +
    +This error is given when a compiled pattern is passed to a function in a +library of a different code unit width, for example, a pattern compiled by +the 8-bit library is passed to a 16-bit or 32-bit library function. +
    +  PCRE2_ERROR_BADOFFSET
    +
    +The value of startoffset was greater than the length of the subject. +
    +  PCRE2_ERROR_BADOPTION
    +
    +An unrecognized bit was set in the options argument. +
    +  PCRE2_ERROR_BADUTFOFFSET
    +
    +The UTF code unit sequence that was passed as a subject was checked and found +to be valid (the PCRE2_NO_UTF_CHECK option was not set), but the value of +startoffset did not point to the beginning of a UTF character or the end +of the subject. +
    +  PCRE2_ERROR_CALLOUT
    +
    +This error is never generated by pcre2_match() itself. It is provided for +use by callout functions that want to cause pcre2_match() or +pcre2_callout_enumerate() to return a distinctive error code. See the +pcre2callout +documentation for details. +
    +  PCRE2_ERROR_DEPTHLIMIT
    +
    +The nested backtracking depth limit was reached. +
    +  PCRE2_ERROR_HEAPLIMIT
    +
    +The heap limit was reached. +
    +  PCRE2_ERROR_INTERNAL
    +
    +An unexpected internal error has occurred. This error could be caused by a bug +in PCRE2 or by overwriting of the compiled pattern. +
    +  PCRE2_ERROR_JIT_STACKLIMIT
    +
    +This error is returned when a pattern that was successfully studied using JIT +is being matched, but the memory available for the just-in-time processing +stack is not large enough. See the +pcre2jit +documentation for more details. +
    +  PCRE2_ERROR_MATCHLIMIT
    +
    +The backtracking match limit was reached. +
    +  PCRE2_ERROR_NOMEMORY
    +
    +If a pattern contains many nested backtracking points, heap memory is used to +remember them. This error is given when the memory allocation function (default +or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given +if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is +also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails. +
    +  PCRE2_ERROR_NULL
    +
    +Either the code, subject, or match_data argument was passed +as NULL. +
    +  PCRE2_ERROR_RECURSELOOP
    +
    +This error is returned when pcre2_match() detects a recursion loop within +the pattern. Specifically, it means that either the whole pattern or a +capture group has been called recursively for the second time at the same +position in the subject string. Some simple patterns that might do this are +detected and faulted at compile time, but more complicated cases, in particular +mutual recursions between two different groups, cannot be detected until +matching is attempted. +

    +
    OBTAINING A TEXTUAL ERROR MESSAGE
    +

    +int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer, + PCRE2_SIZE bufflen); +

    +

    +A text message for an error code from any PCRE2 function (compile, match, or +auxiliary) can be obtained by calling pcre2_get_error_message(). The code +is passed as the first argument, with the remaining two arguments specifying a +code unit buffer and its length in code units, into which the text message is +placed. The message is returned in code units of the appropriate width for the +library that is being used. +

    +

    +The returned message is terminated with a trailing zero, and the function +returns the number of code units used, excluding the trailing zero. If the +error number is unknown, the negative error code PCRE2_ERROR_BADDATA is +returned. If the buffer is too small, the message is truncated (but still with +a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned. +None of the messages are very long; a buffer size of 120 code units is ample. +

    +
    EXTRACTING CAPTURED SUBSTRINGS BY NUMBER
    +

    +int pcre2_substring_length_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_SIZE *length); +
    +
    +int pcre2_substring_copy_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR *buffer, + PCRE2_SIZE *bufflen); +
    +
    +int pcre2_substring_get_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR **bufferptr, + PCRE2_SIZE *bufflen); +
    +
    +void pcre2_substring_free(PCRE2_UCHAR *buffer); +

    +

    +Captured substrings can be accessed directly by using the ovector as described +above. +For convenience, auxiliary functions are provided for extracting captured +substrings as new, separate, zero-terminated strings. A substring that contains +a binary zero is correctly extracted and has a further zero added on the end, +but the result is not, of course, a C string. +

    +

    +The functions in this section identify substrings by number. The number zero +refers to the entire matched substring, with higher numbers referring to +substrings captured by parenthesized groups. After a partial match, only +substring zero is available. An attempt to extract any other substring gives +the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for +extracting captured substrings by name. +

    +

    +If a pattern uses the \K escape sequence within a positive assertion, the +reported start of a successful match can be greater than the end of the match. +For example, if the pattern (?=ab\K) is matched against "ab", the start and +end offset values for the match are 2 and 0. In this situation, calling these +functions with a zero substring number extracts a zero-length empty string. +

    +

    +You can find the length in code units of a captured substring without +extracting it by calling pcre2_substring_length_bynumber(). The first +argument is a pointer to the match data block, the second is the group number, +and the third is a pointer to a variable into which the length is placed. If +you just want to know whether or not the substring has been captured, you can +pass the third argument as NULL. +

    +

    +The pcre2_substring_copy_bynumber() function copies a captured substring +into a supplied buffer, whereas pcre2_substring_get_bynumber() copies it +into new memory, obtained using the same memory allocation function that was +used for the match data block. The first two arguments of these functions are a +pointer to the match data block and a capture group number. +

    +

    +The final arguments of pcre2_substring_copy_bynumber() are a pointer to +the buffer and a pointer to a variable that contains its length in code units. +This is updated to contain the actual number of code units used for the +extracted substring, excluding the terminating zero. +

    +

    +For pcre2_substring_get_bynumber() the third and fourth arguments point +to variables that are updated with a pointer to the new memory and the number +of code units that comprise the substring, again excluding the terminating +zero. When the substring is no longer needed, the memory should be freed by +calling pcre2_substring_free(). +

    +

    +The return value from all these functions is zero for success, or a negative +error code. If the pattern match failed, the match failure code is returned. +If a substring number greater than zero is used after a partial match, +PCRE2_ERROR_PARTIAL is returned. Other possible error codes are: +

    +  PCRE2_ERROR_NOMEMORY
    +
    +The buffer was too small for pcre2_substring_copy_bynumber(), or the +attempt to get memory failed for pcre2_substring_get_bynumber(). +
    +  PCRE2_ERROR_NOSUBSTRING
    +
    +There is no substring with that number in the pattern, that is, the number is +greater than the number of capturing parentheses. +
    +  PCRE2_ERROR_UNAVAILABLE
    +
    +The substring number, though not greater than the number of captures in the +pattern, is greater than the number of slots in the ovector, so the substring +could not be captured. +
    +  PCRE2_ERROR_UNSET
    +
    +The substring did not participate in the match. For example, if the pattern is +(abc)|(def) and the subject is "def", and the ovector contains at least two +capturing slots, substring number 1 is unset. +

    +
    EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS
    +

    +int pcre2_substring_list_get(pcre2_match_data *match_data, + PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr); +
    +
    +void pcre2_substring_list_free(PCRE2_SPTR *list); +

    +

    +The pcre2_substring_list_get() function extracts all available substrings +and builds a list of pointers to them. It also (optionally) builds a second +list that contains their lengths (in code units), excluding a terminating zero +that is added to each of them. All this is done in a single block of memory +that is obtained using the same memory allocation function that was used to get +the match data block. +

    +

    +This function must be called only after a successful match. If called after a +partial match, the error code PCRE2_ERROR_PARTIAL is returned. +

    +

    +The address of the memory block is returned via listptr, which is also +the start of the list of string pointers. The end of the list is marked by a +NULL pointer. The address of the list of lengths is returned via +lengthsptr. If your strings do not contain binary zeros and you do not +therefore need the lengths, you may supply NULL as the lengthsptr +argument to disable the creation of a list of lengths. The yield of the +function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block +could not be obtained. When the list is no longer needed, it should be freed by +calling pcre2_substring_list_free(). +

    +

    +If this function encounters a substring that is unset, which can happen when +capture group number n+1 matches some part of the subject, but group +n has not been used at all, it returns an empty string. This can be +distinguished from a genuine zero-length substring by inspecting the +appropriate offset in the ovector, which contains PCRE2_UNSET for unset +substrings, or by calling pcre2_substring_length_bynumber(). +

    +
    EXTRACTING CAPTURED SUBSTRINGS BY NAME
    +

    +int pcre2_substring_number_from_name(const pcre2_code *code, + PCRE2_SPTR name); +
    +
    +int pcre2_substring_length_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_SIZE *length); +
    +
    +int pcre2_substring_copy_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen); +
    +
    +int pcre2_substring_get_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen); +
    +
    +void pcre2_substring_free(PCRE2_UCHAR *buffer); +

    +

    +To extract a substring by name, you first have to find the associated number. +For example, for this pattern: +

    +  (a+)b(?<xxx>\d+)...
    +
    +the number of the capture group called "xxx" is 2. If the name is known to be +unique (PCRE2_DUPNAMES was not set), you can find the number from the name by +calling pcre2_substring_number_from_name(). The first argument is the +compiled pattern, and the second is the name. The yield of the function is the +group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or +PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name. +Given the number, you can extract the substring directly from the ovector, or +use one of the "bynumber" functions described above. +

    +

    +For convenience, there are also "byname" functions that correspond to the +"bynumber" functions, the only difference being that the second argument is a +name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate +names, these functions scan all the groups with the given name, and return the +captured substring from the first named group that is set. +

    +

    +If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is +returned. If all groups with the name have numbers that are greater than the +number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there +is at least one group with a slot in the ovector, but no group is found to be +set, PCRE2_ERROR_UNSET is returned. +

    +

    +Warning: If the pattern uses the (?| feature to set up multiple +capture groups with the same number, as described in the +section on duplicate group numbers +in the +pcre2pattern +page, you cannot use names to distinguish the different capture groups, because +names are not included in the compiled code. The matching process uses only +numbers. For this reason, the use of different names for groups with the +same number causes an error at compile time. +

    +
    CREATING A NEW STRING WITH SUBSTITUTIONS
    +

    +int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, PCRE2_SPTR replacement, + PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, + PCRE2_SIZE *outlengthptr); +

    +

    +This function optionally calls pcre2_match() and then makes a copy of the +subject string in outputbuffer, replacing parts that were matched with +the replacement string, whose length is supplied in rlength. This +can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an +option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the +replacement string(s). The default action is to perform just one replacement if +the pattern matches, but there is an option that requests multiple replacements +(see PCRE2_SUBSTITUTE_GLOBAL below). +

    +

    +If successful, pcre2_substitute() returns the number of substitutions +that were carried out. This may be zero if no match was found, and is never +greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is +returned if an error is detected. +

    +

    +Matches in which a \K item in a lookahead in the pattern causes the match to +end before it starts are not supported, and give rise to an error return. For +global replacements, matches in which \K in a lookbehind causes the match to +start earlier than the point that was reached in the previous iteration are +also not supported. +

    +

    +The first seven arguments of pcre2_substitute() are the same as for +pcre2_match(), except that the partial matching options are not +permitted, and match_data may be passed as NULL, in which case a match +data block is obtained and freed within this function, using memory management +functions from the match context, if provided, or else those that were used to +allocate memory for the compiled code. +

    +

    +If match_data is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the +provided block is used for all calls to pcre2_match(), and its contents +afterwards are the result of the final call. For global changes, this will +always be a no-match error. The contents of the ovector within the match data +block may or may not have been changed. +

    +

    +As well as the usual options for pcre2_match(), a number of additional +options can be set in the options argument of pcre2_substitute(). +One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external +match_data block must be provided, and it must have been used for an +external call to pcre2_match(). The data in the match_data block +(return code, offset vector) is used for the first substitution instead of +calling pcre2_match() from within pcre2_substitute(). This allows +an application to check for a match before choosing to substitute, without +having to repeat the match. +

    +

    +The contents of the externally supplied match data block are not changed when +PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set, +pcre2_match() is called after the first substitution to check for further +matches, but this is done using an internally obtained match data block, thus +always leaving the external block unchanged. +

    +

    +The code argument is not used for matching before the first substitution +when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when +PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the +UTF setting and the number of capturing parentheses in the pattern. +

    +

    +The default action of pcre2_substitute() is to return a copy of the +subject string with matched substrings replaced. However, if +PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are +returned. In the global case, multiple replacements are concatenated in the +output buffer. Substitution callouts (see +below) +can be used to separate them if necessary. +

    +

    +The outlengthptr argument of pcre2_substitute() must point to a +variable that contains the length, in code units, of the output buffer. If the +function is successful, the value is updated to contain the length in code +units of the new string, excluding the trailing zero that is automatically +added. +

    +

    +If the function is not successful, the value set via outlengthptr depends +on the type of error. For syntax errors in the replacement string, the value is +the offset in the replacement string where the error was detected. For other +errors, the value is PCRE2_UNSET by default. This includes the case of the +output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set. +

    +

    +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is +too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If +this option is set, however, pcre2_substitute() continues to go through +the motions of matching and substituting (without, of course, writing anything) +in order to compute the size of buffer that is needed. This value is passed +back via the outlengthptr variable, with the result of the function still +being PCRE2_ERROR_NOMEMORY. +

    +

    +Passing a buffer size of zero is a permitted way of finding out how much memory +is needed for a given substitution. However, this does mean that the entire +operation is carried out twice. Depending on the application, it may be more +efficient to allocate a large buffer and free the excess afterwards, instead of +using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH. +

    +

    +The replacement string, which is interpreted as a UTF string in UTF mode, is +checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF +replacement string causes an immediate return with the relevant UTF error code. +

    +

    +If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted +in any way. By default, however, a dollar character is an escape character that +can specify the insertion of characters from capture groups and names from +(*MARK) or other control verbs in the pattern. The following forms are always +recognized: +

    +  $$                  insert a dollar character
    +  $<n> or ${<n>}      insert the contents of group <n>
    +  $*MARK or ${*MARK}  insert a control verb name
    +
    +Either a group number or a group name can be given for <n>. Curly brackets are +required only if the following character would be interpreted as part of the +number or name. The number may be zero to include the entire matched string. +For example, if the pattern a(b)c is matched with "=abc=" and the replacement +string "+$1$0$1+", the result is "=+babcb+=". +

    +

    +$*MARK inserts the name from the last encountered backtracking control verb on +the matching path that has a name. (*MARK) must always include a name, but the +other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name +inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This +facility can be used to perform simple simultaneous substitutions, as this +pcre2test example shows: +

    +  /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK}
    +      apple lemon
    +   2: pear orange
    +
    +PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string, +replacing every matching substring. If this option is not set, only the first +matching substring is replaced. The search for matches takes place in the +original subject string (that is, previous replacements do not affect it). +Iteration is implemented by advancing the startoffset value for each +search, which is always passed the entire subject string. If an offset limit is +set in the match context, searching stops when that limit is reached. +

    +

    +You can restrict the effect of a global substitution to a portion of the +subject string by setting either or both of startoffset and an offset +limit. Here is a pcre2test example: +

    +  /B/g,replace=!,use_offset_limit
    +  ABC ABC ABC ABC\=offset=3,offset_limit=12
    +   2: ABC A!C A!C ABC
    +
    +When continuing with global substitutions after matching a substring with zero +length, an attempt to find a non-empty match at the same offset is performed. +If this is not successful, the offset is advanced by one character except when +CRLF is a valid newline sequence and the next two characters are CR, LF. In +this case, the offset is advanced by two characters. +
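The advance rule in the preceding paragraph can be modelled in a few lines. This is a simplified sketch for 8-bit subjects and is not taken from the PCRE2 sources; the function name and its parameters are hypothetical. It covers only the step that follows a failed attempt to find a non-empty match at the same offset.

```c
#include <stddef.h>

/* Hypothetical model: given the offset of an empty match, return the offset
   at which the next search should start. crlf_is_newline is non-zero when
   CRLF is a valid newline sequence; utf8 is non-zero in UTF-8 mode. */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t offset, int crlf_is_newline, int utf8)
{
    if (offset >= length) return length + 1;   /* nothing left to search */

    /* CR followed by LF is skipped as a unit when CRLF is a newline. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;

    size_t next = offset + 1;
    if (utf8) {
        /* One character may be several code units: skip continuation bytes. */
        while (next < length && ((unsigned char)subject[next] & 0xC0) == 0x80)
            next++;
    }
    return next;
}
```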

    +

    +PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do +not appear in the pattern to be treated as unset groups. This option should be +used with care, because it means that a typo in a group name or number no +longer causes the PCRE2_ERROR_NOSUBSTRING error. +

    +

    +PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown +groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty +strings when inserted as described above. If this option is not set, an attempt +to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does +not influence the extended substitution syntax described below. +

    +

    +PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the +replacement string. Without this option, only the dollar character is special, +and only the group insertion forms listed above are valid. When +PCRE2_SUBSTITUTE_EXTENDED is set, two things change: +

    +

    +Firstly, backslash in a replacement string is interpreted as an escape +character. The usual forms such as \n or \x{ddd} can be used to specify +particular character codes, and backslash followed by any non-alphanumeric +character quotes that character. Extended quoting can be coded using \Q...\E, +exactly as in pattern strings. +

    +

    +There are also four escape sequences for forcing the case of inserted letters. +The insertion mechanism has three states: no case forcing, force upper case, +and force lower case. The escape sequences change the current state: \U and +\L change to upper or lower case forcing, respectively, and \E (when not +terminating a \Q quoted sequence) reverts to no case forcing. The sequences +\u and \l force the next character (if it is a letter) to upper or lower +case, respectively, and then the state automatically reverts to no case +forcing. Case forcing applies to all inserted characters, including those from +capture groups and letters within \Q...\E quoted sequences. If either +PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode +properties are used for case forcing characters whose code points are greater +than 127. +

    +

    +Note that case forcing sequences such as \U...\E do not nest. For example, +the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final \E has no +effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do +not apply to replacement strings. +
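The case forcing states can be illustrated with a small stand-alone model. It is a sketch under stated assumptions, not PCRE2's implementation: the function name is hypothetical, only ASCII letters are handled, \Q...\E quoting and dollar insertions are ignored, and any other escaped character is simply copied.

```c
#include <ctype.h>

/* Hypothetical model of \U, \L, \E, \u and \l. The out buffer must have room
   for strlen(in) + 1 characters. */
void force_case_model(const char *in, char *out)
{
    int force = 0;   /* case applied to the next character: 1 upper, -1 lower */
    int reset = 0;   /* state that "force" returns to after each character   */

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c == '\\' && in[1] != '\0') {
            c = (unsigned char)*++in;
            switch (c) {
            case 'U': force = reset = 1;  continue;
            case 'L': force = reset = -1; continue;
            case 'E': force = reset = 0;  continue;
            case 'u': force = 1;  reset = 0; continue;  /* one character only */
            case 'l': force = -1; reset = 0; continue;  /* one character only */
            default:  break;      /* model: copy the escaped character as is */
            }
        }
        if (force > 0) c = (unsigned char)toupper(c);
        else if (force < 0) c = (unsigned char)tolower(c);
        *out++ = (char)c;
        force = reset;
    }
    *out = '\0';
}
```

Because there is a single current state rather than a stack, \L inside \U simply replaces the state, which is why the sequences do not nest.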

    +

    +The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more +flexibility to capture group substitution. The syntax is similar to that used +by Bash: +

    +  ${<n>:-<string>}
    +  ${<n>:+<string1>:<string2>}
    +
    +As before, <n> may be a group number or a name. The first form specifies a +default value. If group <n> is set, its value is inserted; if not, <string> is +expanded and the result inserted. The second form specifies strings that are +expanded and inserted when group <n> is set or unset, respectively. The first +form is just a convenient shorthand for +
    +  ${<n>:+${<n>}:<string>}
    +
    +Backslash can be used to escape colons and closing curly brackets in the +replacement strings. A change of the case forcing state within a replacement +string remains in force afterwards, as shown in this pcre2test example: +
    +  /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo
    +      body
    +   1: hello
    +      somebody
    +   1: HELLO
    +
    +The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended +substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown +groups in the extended syntax forms to be treated as unset. +

    +

    +If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, +PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and +are ignored. +

    +
    +Substitution errors +
    +

    +In the event of an error, pcre2_substitute() returns a negative error +code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from +pcre2_match() are passed straight back. +

    +

    +PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion, +unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set. +

    +

    +PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an +unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple +(non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. +

    +

    +PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is +needed is returned via outlengthptr. Note that this does not happen by +default. +

    +

    +PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the +match_data argument is NULL. +

    +

    +PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the +replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE +(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket +not found), PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group +substitution), and PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before +it started or the match started earlier than the current position in the +subject, which can happen if \K is used in an assertion). +

    +

    +As for all PCRE2 errors, a text message that describes the error can be +obtained by calling the pcre2_get_error_message() function (see +"Obtaining a textual error message" +above). +

    +
    +Substitution callouts +
    +

    +int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *, void *), + void *callout_data); +
    +
    +The pcre2_set_substitute_callout() function can be used to specify a +callout function for pcre2_substitute(). This information is passed in +a match context. The callout function is called after each substitution has +been processed, but it can cause the replacement not to happen. The callout +function is not called for simulated substitutions that happen as a result of +the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. +

    +

    +The first argument of the callout function is a pointer to a substitute callout +block structure, which contains the following fields, not necessarily in this +order: +

    +  uint32_t    version;
    +  uint32_t    subscount;
    +  PCRE2_SPTR  input;
    +  PCRE2_SPTR  output;
    +  PCRE2_SIZE *ovector;
    +  uint32_t    oveccount;
    +  PCRE2_SIZE  output_offsets[2];
    +
    +The version field contains the version number of the block format. The +current version is 0. The version number will increase in future if more fields +are added, but the intention is never to remove any of the existing fields. +

    +

    +The subscount field is the number of the current match. It is 1 for the +first callout, 2 for the second, and so on. The input and output +pointers are copies of the values passed to pcre2_substitute(). +

    +

    +The ovector field points to the ovector, which contains the result of the +most recent match. The oveccount field contains the number of pairs that +are set in the ovector, and is always greater than zero. +

    +

    +The output_offsets vector contains the offsets of the replacement in the +output string. This has already been processed for dollar and (if requested) +backslash substitutions as described above. +

    +

    +The second argument of the callout function is the value passed as +callout_data when the function was registered. The value returned by the +callout function is interpreted as follows: +

    +

    +If the value is zero, the replacement is accepted, and, if +PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next +match. If the value is not zero, the current replacement is not accepted. If +the value is greater than zero, processing continues when +PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or +PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the +output and the call to pcre2_substitute() exits, returning the number of +matches so far. +

    +
    DUPLICATE CAPTURE GROUP NAMES
    +

    +int pcre2_substring_nametable_scan(const pcre2_code *code, + PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last); +

    +

    +When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture +groups are not required to be unique. Duplicate names are always allowed for +groups with the same number, created by using the (?| feature. Indeed, if such +groups are named, they are required to use the same names. +

    +

    +Normally, patterns that use duplicate names are such that in any one match, +only one of each set of identically-named groups participates. An example is +shown in the +pcre2pattern +documentation. +

    +

    +When duplicates are present, pcre2_substring_copy_byname() and +pcre2_substring_get_byname() return the first substring corresponding to +the given name that is set. Only if none are set is PCRE2_ERROR_UNSET +returned. The pcre2_substring_number_from_name() function returns the +error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate names. +

    +

    +If you want to get full details of all captured substrings for a given name, +you must use the pcre2_substring_nametable_scan() function. The first +argument is the compiled pattern, and the second is the name. If the third and +fourth arguments are NULL, the function returns a group number for a unique +name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise. +

    +

    +When the third and fourth arguments are not NULL, they must be pointers to +variables that are updated by the function. After it has run, they point to the +first and last entries in the name-to-number table for the given name, and the +function returns the length of each entry in code units. In both cases, +PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name. +

    +

    +The format of the name table is described +above +in the section entitled Information about a pattern. Given all the +relevant entries for the name, you can extract each of their numbers, and hence +the captured data. +
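Walking the entries between the two returned pointers might look like the sketch below. It assumes the 8-bit library, in which each name table entry starts with the group number in two bytes, most significant first, followed by the zero-terminated name. The function name is hypothetical, and plain unsigned char pointers stand in for PCRE2_SPTR so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Hypothetical helper: first and last are the entry pointers set by
   pcre2_substring_nametable_scan(), and entry_size is its return value.
   Stores up to max_numbers group numbers and returns how many were stored. */
size_t group_numbers_from_entries(const unsigned char *first,
                                  const unsigned char *last,
                                  size_t entry_size,
                                  int *numbers, size_t max_numbers)
{
    size_t n = 0;
    if (entry_size == 0) return 0;
    for (const unsigned char *entry = first;
         entry <= last && n < max_numbers;
         entry += entry_size) {
        /* Group number: two bytes, most significant byte first. */
        numbers[n++] = (entry[0] << 8) | entry[1];
    }
    return n;
}
```

Each number obtained this way can then be passed to one of the "bynumber" functions, or used to index the ovector directly.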

    +
    FINDING ALL POSSIBLE MATCHES AT ONE POSITION
    +

    +The traditional matching function uses a similar algorithm to Perl, which stops +when it finds the first match at a given point in the subject. If you want to +find all possible matches, or the longest possible match at a given position, +consider using the alternative matching function (see below) instead. If you +cannot use the alternative function, you can kludge it up by making use of the +callout facility, which is described in the +pcre2callout +documentation. +

    +

    +What you have to do is to insert a callout right at the end of the pattern. +When your callout function is called, extract and save the current matched +substring. Then return 1, which forces pcre2_match() to backtrack and try +other alternatives. Ultimately, when it runs out of matches, +pcre2_match() will yield PCRE2_ERROR_NOMATCH. +

    +
    MATCHING A PATTERN: THE ALTERNATIVE FUNCTION
    +

    +int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, + int *workspace, PCRE2_SIZE wscount); +

    +

    +The function pcre2_dfa_match() is called to match a subject string +against a compiled pattern, using a matching algorithm that scans the subject +string just once (not counting lookaround assertions), and does not backtrack. +This has different characteristics to the normal algorithm, and is not +compatible with Perl. Some of the features of PCRE2 patterns are not supported. +Nevertheless, there are times when this kind of matching can be useful. For a +discussion of the two matching algorithms, and a list of features that +pcre2_dfa_match() does not support, see the +pcre2matching +documentation. +

    +

    +The arguments for the pcre2_dfa_match() function are the same as for +pcre2_match(), plus two extras. The ovector within the match data block +is used in a different way, and this is described below. The other common +arguments are used in the same way as for pcre2_match(), so their +description is not repeated here. +

    +

    +The two additional arguments provide workspace for the function. The workspace +vector should contain at least 20 elements. It is used for keeping track of +multiple paths through the pattern tree. More workspace is needed for patterns +and subjects where there are a lot of potential matches. +

    +

    +Here is an example of a simple call to pcre2_dfa_match(): +

    +  int wspace[20];
    +  pcre2_match_data *md = pcre2_match_data_create(4, NULL);
    +  int rc = pcre2_dfa_match(
    +    re,             /* result of pcre2_compile() */
    +    "some string",  /* the subject string */
    +    11,             /* the length of the subject string */
    +    0,              /* start at offset 0 in the subject */
    +    0,              /* default options */
    +    md,             /* the match data block */
    +    NULL,           /* a match context; NULL means use defaults */
    +    wspace,         /* working space vector */
    +    20);            /* number of elements (NOT size in bytes) */
    +
    +

    +
    +Option bits for pcre2_dfa_match() +
    +

    +The unused bits of the options argument for pcre2_dfa_match() must +be zero. The only bits that may be set are PCRE2_ANCHORED, +PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, +PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, +PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last +four of these are exactly the same as for pcre2_match(), so their +description is not repeated here. +

    +  PCRE2_PARTIAL_HARD
    +  PCRE2_PARTIAL_SOFT
    +
    +These have the same general effect as they do for pcre2_match(), but the +details are slightly different. When PCRE2_PARTIAL_HARD is set for +pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the +subject is reached and there is still at least one matching possibility that +requires additional characters. This happens even if some complete matches have +already been found. When PCRE2_PARTIAL_SOFT is set, the return code +PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the +subject is reached, there have been no complete matches, but there is still at +least one matching possibility. The portion of the string that was inspected +when the longest partial match was found is set as the first matching string in +both cases. There is a more detailed discussion of partial and multi-segment +matching, with examples, in the +pcre2partial +documentation. +
    +  PCRE2_DFA_SHORTEST
    +
    +Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to stop as +soon as it has found one match. Because of the way the alternative algorithm +works, this is necessarily the shortest possible match at the first possible +matching point in the subject string. +
    +  PCRE2_DFA_RESTART
    +
    +When pcre2_dfa_match() returns a partial match, it is possible to call it +again, with additional subject characters, and have it continue with the same +match. The PCRE2_DFA_RESTART option requests this action; when it is set, the +workspace and wscount arguments must reference the same vector as +before because data about the match so far is left in them after a partial +match. There is more discussion of this facility in the +pcre2partial +documentation. +

    +
    +Successful returns from pcre2_dfa_match() +
    +

    +When pcre2_dfa_match() succeeds, it may have matched more than one +substring in the subject. Note, however, that all the matches from one run of +the function start at the same point in the subject. The shorter matches are +all initial substrings of the longer matches. For example, if the pattern +

    +  <.*>
    +
    +is matched against the string +
    +  This is <something> <something else> <something further> no more
    +
    +the three matched strings are +
    +  <something> <something else> <something further>
    +  <something> <something else>
    +  <something>
    +
    +On success, the yield of the function is a number greater than zero, which is +the number of matched substrings. The offsets of the substrings are returned in +the ovector, and can be extracted by number in the same way as for +pcre2_match(), but the numbers bear no relation to any capture groups +that may exist in the pattern, because DFA matching does not support capturing. +

    +

    +Calls to the convenience functions that extract substrings by name +return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a +DFA match. The convenience functions that extract substrings by number never +return PCRE2_ERROR_NOSUBSTRING. +

    +

    +The matched strings are stored in the ovector in reverse order of length; that +is, the longest matching string is first. If there were too many matches to fit +into the ovector, the yield of the function is zero, and the vector is filled +with the longest matches. +

    +

    +NOTE: PCRE2's "auto-possessification" optimization usually applies to character +repeats at the end of a pattern (as well as internally). For example, the +pattern "a\d+" is compiled as if it were "a\d++". For DFA matching, this +means that only one possible match is found. If you really do want multiple +matches in such cases, either use an ungreedy repeat such as "a\d+?" or set +the PCRE2_NO_AUTO_POSSESS option when compiling. +

    +
    +Error returns from pcre2_dfa_match() +
    +

    +The pcre2_dfa_match() function returns a negative number when it fails. +Many of the errors are the same as for pcre2_match(), as described +above. +There are in addition the following errors that are specific to +pcre2_dfa_match(): +

    +  PCRE2_ERROR_DFA_UITEM
    +
    +This return is given if pcre2_dfa_match() encounters an item in the +pattern that it does not support, for instance, the use of \C in UTF mode or +a backreference. +
    +  PCRE2_ERROR_DFA_UCOND
    +
    +This return is given if pcre2_dfa_match() encounters a condition item +that uses a backreference for the condition, or a test for recursion in a +specific capture group. These are not supported. +
    +  PCRE2_ERROR_DFA_UINVALID_UTF
    +
    +This return is given if pcre2_dfa_match() is called for a pattern that +was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for DFA +matching. +
    +  PCRE2_ERROR_DFA_WSSIZE
    +
    +This return is given if pcre2_dfa_match() runs out of space in the +workspace vector. +
    +  PCRE2_ERROR_DFA_RECURSE
    +
    +When a recursion or subroutine call is processed, the matching function calls +itself recursively, using private memory for the ovector and workspace. +This error is given if the internal ovector is not large enough. This should be +extremely rare, as a vector of size 1000 is used. +
    +  PCRE2_ERROR_DFA_BADRESTART
    +
    +When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option, +some plausibility checks are made on the contents of the workspace, which +should contain data about the previous partial match. If any of these checks +fail, this error is given. +

    +
    SEE ALSO
    +

    +pcre2build(3), pcre2callout(3), pcre2demo(3), +pcre2matching(3), pcre2partial(3), pcre2posix(3), +pcre2sample(3), pcre2unicode(3). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 04 November 2020 +
    +Copyright © 1997-2020 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2build.html b/src/pcre2/doc/html/pcre2build.html new file mode 100644 index 00000000..a206b232 --- /dev/null +++ b/src/pcre2/doc/html/pcre2build.html @@ -0,0 +1,623 @@ + + +pcre2build specification + + +

    pcre2build man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    BUILDING PCRE2
    +

    +PCRE2 is distributed with a configure script that can be used to build +the library in Unix-like environments using the applications known as +Autotools. Also in the distribution are files to support building using +CMake instead of configure. The text file +README +contains general information about building with Autotools (some of which is +repeated below), and also has some comments about building on various operating +systems. There is a lot more information about building PCRE2 without using +Autotools (including information about using CMake and building "by +hand") in the text file called +NON-AUTOTOOLS-BUILD. +You should consult this file as well as the +README +file if you are building in a non-Unix-like environment. +

    +
    PCRE2 BUILD-TIME OPTIONS
    +

    +The rest of this document describes the optional features of PCRE2 that can be +selected when the library is compiled. It assumes use of the configure +script, where the optional features are selected or deselected by providing +options to configure before running the make command. However, the +same options can be selected in both Unix-like and non-Unix-like environments +if you are using CMake instead of configure to build PCRE2. +

    +

    +If you are not using Autotools or CMake, option selection can be done by +editing the config.h file, or by passing parameter settings to the +compiler, as described in +NON-AUTOTOOLS-BUILD. +

    +

    +The complete list of options for configure (which includes the standard +ones such as the selection of the installation directory) can be obtained by +running +

    +  ./configure --help
    +
    +The following sections include descriptions of "on/off" options whose names +begin with --enable or --disable. Because of the way that configure +works, --enable and --disable always come in pairs, so the complementary option +always exists as well, but as it specifies the default, it is not described. +Options that specify values have names that start with --with. At the end of a +configure run, a summary of the configuration is output. +

    +
    BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES
    +

    +By default, a library called libpcre2-8 is built, containing functions +that take string arguments contained in arrays of bytes, interpreted either as +single-byte characters, or UTF-8 strings. You can also build two other +libraries, called libpcre2-16 and libpcre2-32, which process +strings that are contained in arrays of 16-bit and 32-bit code units, +respectively. These can be interpreted either as single-unit characters or +UTF-16/UTF-32 strings. To build these additional libraries, add one or both of +the following to the configure command: +

    +  --enable-pcre2-16
    +  --enable-pcre2-32
    +
    +If you do not want the 8-bit library, add +
    +  --disable-pcre2-8
    +
    +as well. At least one of the three libraries must be built. Note that the POSIX +wrapper is for the 8-bit library only, and that pcre2grep is an 8-bit +program. Neither of these is built if you select only the 16-bit or 32-bit +libraries. +

    +
    BUILDING SHARED AND STATIC LIBRARIES
    +

    +The Autotools PCRE2 building process uses libtool to build both shared +and static libraries by default. You can suppress an unwanted library by adding +one of +

    +  --disable-shared
    +  --disable-static
    +
    +to the configure command. +

    +
    UNICODE AND UTF SUPPORT
    +

    +By default, PCRE2 is built with support for Unicode and UTF character strings. +To build it without Unicode support, add +

    +  --disable-unicode
    +
    +to the configure command. This setting applies to all three libraries. It +is not possible to build one library with Unicode support and another without +in the same configuration. +

    +

    +Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, UTF-16 +or UTF-32. To do that, applications that use the library can set the PCRE2_UTF +option when they call pcre2_compile() to compile a pattern. +Alternatively, patterns may be started with (*UTF) unless the application has +locked this out by setting PCRE2_NEVER_UTF. +

    +

    +UTF support allows the libraries to process character code points up to +0x10ffff in the strings that they handle. Unicode support also gives access to +the Unicode properties of characters, using pattern escapes such as \P, \p, +and \X. Only the general category properties such as Lu and Nd are +supported. Details are given in the +pcre2pattern +documentation. +

    +

    +Pattern escapes such as \d and \w do not by default make use of Unicode +properties. The application can request that they do by setting the PCRE2_UCP +option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also +request this by starting with (*UCP). +

    +
    DISABLING THE USE OF \C
    +

    +The \C escape sequence, which matches a single code unit, even in a UTF mode, +can cause unpredictable behaviour because it may leave the current matching +point in the middle of a multi-code-unit character. The application can lock it +out by setting the PCRE2_NEVER_BACKSLASH_C option when calling +pcre2_compile(). There is also a build-time option +

    +  --enable-never-backslash-C
    +
    +(note the upper case C) which locks out the use of \C entirely. +

    +
    JUST-IN-TIME COMPILER SUPPORT
    +

    +Just-in-time (JIT) compiler support is included in the build by specifying +

    +  --enable-jit
    +
    +This support is available only for certain hardware architectures. If this +option is set for an unsupported architecture, a building error occurs. +If in doubt, use +
    +  --enable-jit=auto
    +
    +which enables JIT only if the current hardware is supported. You can check +if JIT is enabled in the configuration summary that is output at the end of a +configure run. If you are enabling JIT under SELinux you may also want to +add +
    +  --enable-jit-sealloc
    +
    +which enables the use of an execmem allocator in JIT that is compatible with +SELinux. This has no effect if JIT is not enabled. See the +pcre2jit +documentation for a discussion of JIT usage. When JIT support is enabled, +pcre2grep automatically makes use of it, unless you add +
    +  --disable-pcre2grep-jit
    +
    +to the configure command. +

    +
    NEWLINE RECOGNITION
    +

    +By default, PCRE2 interprets the linefeed (LF) character as indicating the end +of a line. This is the normal newline character on Unix-like systems. You can +compile PCRE2 to use carriage return (CR) instead, by adding +

    +  --enable-newline-is-cr
    +
    +to the configure command. There is also an --enable-newline-is-lf option, +which explicitly specifies linefeed as the newline character. +

    +

    +Alternatively, you can specify that line endings are to be indicated by the +two-character sequence CRLF (CR immediately followed by LF). If you want this, +add +

    +  --enable-newline-is-crlf
    +
    +to the configure command. There is a fourth option, specified by +
    +  --enable-newline-is-anycrlf
    +
    +which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as +indicating a line ending. A fifth option, specified by +
    +  --enable-newline-is-any
    +
    +causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline +sequences are the three just mentioned, plus the single characters VT (vertical +tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line +separator, U+2028), and PS (paragraph separator, U+2029). The final option is +
    +  --enable-newline-is-nul
    +
    +which causes NUL (binary zero) to be set as the default line-ending character. +

    +

    +Whatever default line ending convention is selected when PCRE2 is built can be +overridden by applications that use the library. At build time it is +recommended to use the standard for your operating system. +
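The "any" convention described above can be sketched as a small recognizer. This is an illustration written for this document, not PCRE2's own code; the function name any_newline_length is hypothetical, and the input is assumed to be an array of already-decoded code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the length, in code points, of the Unicode
   newline sequence that starts at s[0], or 0 if there is none. CRLF counts
   as one sequence of length 2. */
static size_t any_newline_length(const uint32_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n == 0) return 0;
    switch (s[0]) {
    case 0x000D:                      /* CR, possibly followed by LF */
        return (n > 1 && s[1] == 0x000A) ? 2 : 1;
    case 0x000A:                      /* LF */
    case 0x000B:                      /* VT */
    case 0x000C:                      /* FF */
    case 0x0085:                      /* NEL */
    case 0x2028:                      /* LS */
    case 0x2029:                      /* PS */
        return 1;
    default:
        return 0;
    }
}
```

Restricting the switch to the CR and LF cases would give the corresponding sketch for the "anycrlf" convention.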

    +
    WHAT \R MATCHES
    +

    +By default, the sequence \R in a pattern matches any Unicode newline sequence, +independently of what has been selected as the line ending sequence. If you +specify +

    +  --enable-bsr-anycrlf
    +
    +the default is changed so that \R matches only CR, LF, or CRLF. Whatever is +selected when PCRE2 is built can be overridden by applications that use the +library. +

    +
    HANDLING VERY LARGE PATTERNS
    +

    +Within a compiled pattern, offset values are used to point from one part to +another (for example, from an opening parenthesis to an alternation +metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values +are used for these offsets, leading to a maximum size for a compiled pattern of +around 64 thousand code units. This is sufficient to handle all but the most +gigantic patterns. Nevertheless, some people do want to process truly enormous +patterns, so it is possible to compile PCRE2 to use three-byte or four-byte +offsets by adding a setting such as +

    +  --with-link-size=3
    +
    +to the configure command. The value given must be 2, 3, or 4. For the +16-bit library, a value of 3 is rounded up to 4. In these libraries, using +longer offsets slows down the operation of PCRE2 because it has to load +additional data when handling them. For the 32-bit library the value is always +4 and cannot be overridden; the value of --with-link-size is ignored. +
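The rounding rules for the link size can be written out as a short sketch. Both function names are hypothetical, and link_value_count() only counts the distinct values an offset of that width can hold; PCRE2's actual pattern size limit may be lower.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the link size in bytes that is actually used.
   width_bits is 8, 16 or 32; requested is the --with-link-size value. */
static int effective_link_size(int width_bits, int requested)
{
    assert(width_bits == 8 || width_bits == 16 || width_bits == 32);
    assert(requested >= 2 && requested <= 4);
    if (width_bits == 32) return 4;                    /* setting is ignored */
    if (width_bits == 16 && requested == 3) return 4;  /* rounded up */
    return requested;
}

/* Number of distinct values that an offset of link_size bytes can hold.
   This is an upper bound only, not PCRE2's exact pattern size limit. */
static uint64_t link_value_count(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (uint64_t)1 << (8 * link_size);
}
```

With the default of two bytes, link_value_count(2) is 65536, which matches the "around 64 thousand code units" figure given above.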

    +
    LIMITING PCRE2 RESOURCE USAGE
    +

    +The pcre2_match() function increments a counter each time it goes round +its main loop. Putting a limit on this counter controls the amount of computing +resource used by a single call to pcre2_match(). The limit can be changed +at run time, as described in the +pcre2api +documentation. The default is 10 million, but this can be changed by adding a +setting such as +

    +  --with-match-limit=500000
    +
    +to the configure command. This setting also applies to the +pcre2_dfa_match() matching function, and to JIT matching (though the +counting is done differently). +

    +

    +The pcre2_match() function starts out using a 20KiB vector on the system +stack to record backtracking points. The more nested backtracking points there +are (that is, the deeper the search tree), the more memory is needed. If the +initial vector is not large enough, heap memory is used, up to a certain limit, +which is specified in kibibytes (units of 1024 bytes). The limit can be changed +at run time, as described in the +pcre2api +documentation. The default limit (in effect unlimited) is 20 million. You can +change this by a setting such as +

    +  --with-heap-limit=500
    +
    +which limits the amount of heap to 500 KiB. This limit applies only to +interpretive matching in pcre2_match() and pcre2_dfa_match(), which +may also use the heap for internal workspace when processing complicated +patterns. This limit does not apply when JIT (which has its own memory +arrangements) is used. +

    +

    +You can also explicitly limit the depth of nested backtracking in the +pcre2_match() interpreter. This limit defaults to the value that is set +for --with-match-limit. You can set a lower default limit by adding, for +example, +

    +  --with-match-limit-depth=10000
    +
    +to the configure command. This value can be overridden at run time. This +depth limit indirectly limits the amount of heap memory that is used, but +because the size of each backtracking "frame" depends on the number of +capturing parentheses in a pattern, the amount of heap that is used before the +limit is reached varies from pattern to pattern. This limit was more useful in +versions before 10.30, where function recursion was used for backtracking. +

    +

    +As well as applying to pcre2_match(), the depth limit also controls +the depth of recursive function calls in pcre2_dfa_match(). These are +used for lookaround assertions, atomic groups, and recursion within patterns. +The limit does not apply to JIT matching. +

    +
    CREATING CHARACTER TABLES AT BUILD TIME
    +

    +PCRE2 uses fixed tables for processing characters whose code points are less +than 256. By default, PCRE2 is built with a set of tables that are distributed +in the file src/pcre2_chartables.c.dist. These tables are for ASCII codes +only. If you add +

    +  --enable-rebuild-chartables
    +
    +to the configure command, the distributed tables are no longer used. +Instead, a program called pcre2_dftables is compiled and run. This +outputs the source for a new set of tables, created in the default locale of your +C run-time system. This method of replacing the tables does not work if you are +cross compiling, because pcre2_dftables needs to be run on the local +host and therefore cannot be compiled with the cross compiler. +

    +

    +If you need to create alternative tables when cross compiling, you will have to +do so "by hand". There may also be other reasons for creating tables manually. +To cause pcre2_dftables to be built on the local host, run a normal +compiling command, and then run the program with the output file as its +argument, for example: +

    +  cc src/pcre2_dftables.c -o pcre2_dftables
    +  ./pcre2_dftables src/pcre2_chartables.c
    +
    +This builds the tables in the default locale of the local host. If you want to +specify a locale, you must use the -L option: +
    +  LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c
    +
    +You can also specify -b (with or without -L). This causes the tables to be +written in binary instead of as source code. A set of binary tables can be +loaded into memory by an application and passed to pcre2_compile() in the +same way as tables created by calling pcre2_maketables(). The tables are +just a string of bytes, independent of hardware characteristics such as +endianness. This means they can be bundled with an application that runs in +different environments, to ensure consistent behaviour. +

    +
    USING EBCDIC CODE
    +

    +PCRE2 assumes by default that it will run in an environment where the character +code is ASCII or Unicode, which is a superset of ASCII. This is the case for +most computer operating systems. PCRE2 can, however, be compiled to run in an +8-bit EBCDIC environment by adding +

    +  --enable-ebcdic --disable-unicode
    +
    +to the configure command. This setting implies +--enable-rebuild-chartables. You should only use it if you know that you are in +an EBCDIC environment (for example, an IBM mainframe operating system). +

    +

    +It is not possible to support both EBCDIC and UTF-8 codes in the same version +of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually +exclusive. +

    +

    +The EBCDIC character that corresponds to an ASCII LF is assumed to have the +value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In +such an environment you should use +

    +  --enable-ebcdic-nl25
    +
    +as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the +same value as in ASCII, namely, 0x0d. Whichever of 0x15 and 0x25 is not +chosen as LF is made to correspond to the Unicode NEL character (which, in +Unicode, is 0x85). +
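The effect of --enable-ebcdic-nl25 on the newline code values can be summarised in a few lines. The type and function names are hypothetical; only the numeric values come from the text above.

```c
#include <assert.h>

/* Hypothetical summary of the EBCDIC newline-related code values. */
typedef struct {
    unsigned char cr;   /* carriage return: same value as in ASCII */
    unsigned char lf;   /* the character treated as ASCII LF */
    unsigned char nel;  /* the character treated as Unicode NEL */
} ebcdic_newlines;

/* nl25 is non-zero when --enable-ebcdic-nl25 was given. */
static ebcdic_newlines ebcdic_newline_values(int nl25)
{
    ebcdic_newlines v;
    v.cr  = 0x0d;
    v.lf  = nl25 ? 0x25 : 0x15;
    v.nel = nl25 ? 0x15 : 0x25;   /* whichever value LF did not take */
    assert(v.lf != v.nel);
    return v;
}
```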

    +

    +The options that select newline behaviour, such as --enable-newline-is-cr, +and equivalent run-time options, refer to these character values in an EBCDIC +environment. +

    +
    PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS
    +

    +By default pcre2grep supports the use of callouts with string arguments +within the patterns it is matching. There are two kinds: one that generates +output using local code, and another that calls an external program or script. +If --disable-pcre2grep-callout-fork is added to the configure command, +only the first kind of callout is supported; if --disable-pcre2grep-callout is +used, all callouts are completely ignored. For more details of pcre2grep +callouts, see the +pcre2grep +documentation. +

    +
    PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT
    +

    +By default, pcre2grep reads all files as plain text. You can build it so +that it recognizes files whose names end in .gz or .bz2, and reads +them with libz or libbz2, respectively, by adding one or both of +

    +  --enable-pcre2grep-libz
    +  --enable-pcre2grep-libbz2
    +
    +to the configure command. These options naturally require that the +relevant libraries are installed on your system. Configuration will fail if +they are not. +

    +
    PCRE2GREP BUFFER SIZE
    +

    +pcre2grep uses an internal buffer to hold a "window" on the file it is +scanning, in order to be able to output "before" and "after" lines when it +finds a match. The default starting size of the buffer is 20KiB. The buffer +itself is three times this size, but because of the way it is used for holding +"before" lines, the longest line that is guaranteed to be processable is the +notional buffer size. If a longer line is encountered, pcre2grep +automatically expands the buffer, up to a specified maximum size, whose default +is 1MiB or the starting size, whichever is the larger. You can change the +default parameter values by adding, for example, +

    +  --with-pcre2grep-bufsize=51200
    +  --with-pcre2grep-max-bufsize=2097152
    +
    +to the configure command. The caller of pcre2grep can override +these values by using --buffer-size and --max-buffer-size on the command line. +
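The relationship between the notional buffer size, the memory actually allocated, and the default maximum can be checked with a little arithmetic. The helper names are hypothetical and the code is not taken from pcre2grep; it only restates the rules given above.

```c
#include <assert.h>
#include <stddef.h>

#define ONE_MIB ((size_t)1024 * 1024)

/* Memory used for a given notional buffer size: three times that size. */
static size_t grep_allocated_bytes(size_t bufsize)
{
    assert(bufsize > 0);
    return 3 * bufsize;
}

/* Default maximum size: 1MiB or the starting size, whichever is larger. */
static size_t grep_default_max(size_t bufsize)
{
    assert(bufsize > 0);
    return bufsize > ONE_MIB ? bufsize : ONE_MIB;
}
```

For the default starting size of 20KiB (20480 bytes), grep_allocated_bytes() gives 61440 bytes and grep_default_max() gives 1048576 bytes.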

    +
    PCRE2TEST OPTION FOR LIBREADLINE SUPPORT
    +

    +If you add one of +

    +  --enable-pcre2test-libreadline
    +  --enable-pcre2test-libedit
    +
    +to the configure command, pcre2test is linked with the +libreadline or libedit library, respectively, and when its input is +from a terminal, it reads it using the readline() function. This provides +line-editing and history facilities. Note that libreadline is +GPL-licensed, so if you distribute a binary of pcre2test linked in this +way, there may be licensing issues. These can be avoided by linking instead +with libedit, which has a BSD licence. +

    +

    +Setting --enable-pcre2test-libreadline causes the -lreadline option to be +added to the pcre2test build. In many operating environments with a +system-installed readline library this is sufficient. However, in some +environments (e.g. if an unmodified distribution version of readline is in +use), some extra configuration may be necessary. The INSTALL file for +libreadline says this: +

    +  "Readline uses the termcap functions, but does not link with
    +  the termcap or curses library itself, allowing applications
    +  which link with readline the to choose an appropriate library."
    +
    +If your environment has not been set up so that an appropriate library is +automatically included, you may need to add something like +
    +  LIBS="-lncurses"
    +
    +immediately before the configure command. +

    +
    INCLUDING DEBUGGING CODE
    +

    +If you add +

    +  --enable-debug
    +
    +to the configure command, additional debugging code is included in the +build. This feature is intended for use by the PCRE2 maintainers. +

    +
    DEBUGGING WITH VALGRIND SUPPORT
    +

    +If you add +

    +  --enable-valgrind
    +
    +to the configure command, PCRE2 will use valgrind annotations to mark +certain memory regions as unaddressable. This allows it to detect invalid +memory accesses, and is mostly useful for debugging PCRE2 itself. +

    +
    CODE COVERAGE REPORTING
    +

    +If your C compiler is gcc, you can build a version of PCRE2 that can generate a +code coverage report for its test suite. To enable this, you must install +lcov version 1.6 or above. Then specify +

    +  --enable-coverage
    +
    +to the configure command and build PCRE2 in the usual way. +

    +

    +Note that using ccache (a caching C compiler) is incompatible with code +coverage reporting. If you have configured ccache to run automatically +on your system, you must set the environment variable +

    +  CCACHE_DISABLE=1
    +
    +before running make to build PCRE2, so that ccache is not used. +

    +

    +When --enable-coverage is used, the following additional targets are added to the +Makefile: +

    +  make coverage
    +
    +This creates a fresh coverage report for the PCRE2 test suite. It is equivalent +to running "make coverage-reset", "make coverage-baseline", "make check", and +then "make coverage-report". +
    +  make coverage-reset
    +
    +This zeroes the coverage counters, but does nothing else. +
    +  make coverage-baseline
    +
    +This captures baseline coverage information. +
    +  make coverage-report
    +
    +This creates the coverage report. +
    +  make coverage-clean-report
    +
    +This removes the generated coverage report without cleaning the coverage data +itself. +
    +  make coverage-clean-data
    +
    +This removes the captured coverage data without removing the coverage files +created at compile time (*.gcno). +
    +  make coverage-clean
    +
    +This cleans all coverage data including the generated coverage report. For more +information about code coverage, see the gcov and lcov +documentation. +

    +
    DISABLING THE Z AND T FORMATTING MODIFIERS
    +

    +The C99 standard defines formatting modifiers z and t for size_t and +ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in +environments other than Microsoft Visual Studio when __STDC_VERSION__ is +defined and has a value greater than or equal to 199901L (indicating C99). +However, there is at least one environment that claims to be C99 but does not +support these modifiers. If +

    +  --disable-percent-zt
    +
    +is specified, no use is made of the z or t modifiers. Instead of %td or %zu, +%lu is used, with a cast for size_t values. +
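The difference between the two formatting styles can be seen in a short sketch. The function names are hypothetical; the calls themselves are standard C. Note that on platforms where unsigned long is narrower than size_t, the cast in the fallback can truncate very large values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats a size_t using the C99 z modifier. */
static int format_size_c99(char *buf, size_t bufsize, size_t value)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, "%zu", value);
}

/* Fallback style: %lu with a cast, as used when z and t are disabled. */
static int format_size_fallback(char *buf, size_t bufsize, size_t value)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, "%lu", (unsigned long)value);
}

/* Returns 1 if both styles produce the same text for this value. */
static int size_formats_agree(size_t value)
{
    char a[32], b[32];
    format_size_c99(a, sizeof a, value);
    format_size_fallback(b, sizeof b, value);
    return strcmp(a, b) == 0;
}
```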

    +
    SUPPORT FOR FUZZERS
    +

    +There is a special option for use by people who want to run fuzzing tests on +PCRE2: +

    +  --enable-fuzz-support
    +
    +At present this applies only to the 8-bit library. If set, it causes an extra +library called libpcre2-fuzzsupport.a to be built, but not installed. This +contains a single function called LLVMFuzzerTestOneInput() whose arguments are +a pointer to a string and the length of the string. When called, this function +tries to compile the string as a pattern, and if that succeeds, to match it. +This is done both with no options and with some random option bits that are +generated from the string. +

    +

    +Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck +to be created. This is normally run under valgrind or used when PCRE2 is +compiled with address sanitizing enabled. It calls the fuzzing function and +outputs information about what it is doing. The input strings are specified by +arguments: if an argument starts with "=" the rest of it is a literal input +string. Otherwise, it is assumed to be a file name, and the contents of the +file are the test string. +

    +
    OBSOLETE OPTION
    +

    +In versions of PCRE2 prior to 10.30, there were two ways of handling +backtracking in the pcre2_match() function. The default was to use the +system stack, but if +

    +  --disable-stack-for-recursion
    +
    +was set, memory on the heap was used. From release 10.30 onwards this has +changed (the stack is no longer used) and this option now does nothing except +give a warning. +

    +
    SEE ALSO
    +

    +pcre2api(3), pcre2-config(3). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 20 March 2020 +
    +Copyright © 1997-2020 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2callout.html b/src/pcre2/doc/html/pcre2callout.html new file mode 100644 index 00000000..65db9336 --- /dev/null +++ b/src/pcre2/doc/html/pcre2callout.html @@ -0,0 +1,480 @@ + + +pcre2callout specification + + +

    pcre2callout man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    SYNOPSIS
    +

    +#include <pcre2.h> +

    +

    +int (*pcre2_callout)(pcre2_callout_block *, void *); +
    +
    +int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); +

    +
    DESCRIPTION
    +

    +PCRE2 provides a feature called "callout", which is a means of temporarily +passing control to the caller of PCRE2 in the middle of pattern matching. The +caller of PCRE2 provides an external function by putting its entry point in +a match context (see pcre2_set_callout() in the +pcre2api +documentation). +

    +

    +When using the pcre2_substitute() function, an additional callout feature +is available. This does a callout after each change to the subject string and +is described in the +pcre2api +documentation; the rest of this document is concerned with callouts during +pattern matching. +

    +

    +Within a regular expression, (?C<arg>) indicates a point at which the external +function is to be called. Different callout points can be identified by putting +a number less than 256 after the letter C. The default value is zero. +Alternatively, the argument may be a delimited string. The starting delimiter +must be one of ` ' " ^ % # $ { and the ending delimiter is the same as the +start, except for {, where the ending delimiter is }. If the ending delimiter +is needed within the string, it must be doubled. For example, this pattern has +two callout points: +

    +  (?C1)abc(?C"some ""arbitrary"" text")def
    +
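The delimiter rules for string arguments can be illustrated with a small stand-alone parser. This is a sketch for explanation only, not the code PCRE2 uses; the names callout_end_delimiter and parse_callout_string are hypothetical, and the input is assumed to start at the opening delimiter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ending delimiter for a given starting delimiter, or 0 if it is not one
   of the permitted starting characters. */
static char callout_end_delimiter(char start)
{
    if (start == '{') return '}';
    return (start != '\0' && strchr("`'\"^%#$", start) != NULL) ? start : '\0';
}

/* Hypothetical parser: src points at the starting delimiter. The string
   content is copied to out with doubled ending delimiters collapsed to one.
   Returns the content length, or -1 for a bad delimiter, an unterminated
   string, or an output buffer that is too small. */
static long parse_callout_string(const char *src, char *out, size_t outsize)
{
    char end;
    size_t n = 0;
    const char *p;

    assert(src != NULL && out != NULL);
    end = callout_end_delimiter(src[0]);
    if (end == '\0') return -1;

    for (p = src + 1; *p != '\0'; p++) {
        if (*p == end) {
            if (p[1] != end) {            /* single delimiter: end of string */
                if (n >= outsize) return -1;
                out[n] = '\0';
                return (long)n;
            }
            p++;                          /* doubled delimiter: keep one copy */
        }
        if (n + 1 >= outsize) return -1;
        out[n++] = *p;
    }
    return -1;                            /* no closing delimiter found */
}
```

Applied to the argument of the second callout in the pattern above, the parser yields the 21-character string some "arbitrary" text.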
    +If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2 +automatically inserts callouts, all with number 255, before each item in the +pattern except for immediately before or after an explicit callout. For +example, if PCRE2_AUTO_CALLOUT is used with the pattern +
    +  A(?C3)B
    +
    +it is processed as if it were +
    +  (?C255)A(?C3)B(?C255)
    +
    +Here is a more complicated example: +
    +  A(\d{2}|--)
    +
    +With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were +
    +  (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
    +
    +Notice that there is a callout before and after each parenthesis and +alternation bar. If the pattern contains a conditional group whose condition is +an assertion, an automatic callout is inserted immediately before the +condition. Such a callout may also be inserted explicitly, for example: +
    +  (?(?C9)(?=a)ab|de)  (?(?C%text%)(?!=d)ab|de)
    +
    +This applies only to assertion conditions (because they are themselves +independent groups). +

    +

    +Callouts can be useful for tracking the progress of pattern matching. The +pcre2test +program has a pattern qualifier (/auto_callout) that sets automatic callouts. +When any callouts are present, the output from pcre2test indicates how +the pattern is being matched. This is useful information when you are trying to +optimize the performance of a particular pattern. +

    +
    MISSING CALLOUTS
    +

    +You should be aware that, because of optimizations in the way PCRE2 compiles +and matches patterns, callouts sometimes do not happen exactly as you might +expect. +

    +
    +Auto-possessification +
    +

    +At compile time, PCRE2 "auto-possessifies" repeated items when it knows that +what follows cannot be part of the repeat. For example, a+[bc] is compiled as +if it were a++[bc]. The pcre2test output when this pattern is compiled +with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string +"aaaa" is: +

    +  --->aaaa
    +   +0 ^        a+
    +   +2 ^   ^    [bc]
    +  No match
    +
    +This indicates that when matching [bc] fails, there is no backtracking into a+ +(because it is being treated as a++) and therefore the callouts that would be +taken for the backtracks do not occur. You can disable the auto-possessify +feature by passing PCRE2_NO_AUTO_POSSESS to pcre2_compile(), or starting +the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this: +
    +  --->aaaa
    +   +0 ^        a+
    +   +2 ^   ^    [bc]
    +   +2 ^  ^     [bc]
    +   +2 ^ ^      [bc]
    +   +2 ^^       [bc]
    +  No match
    +
    +This time, when matching [bc] fails, the matcher backtracks into a+ and tries +again, repeatedly, until a+ itself fails. +

    +
    +Automatic .* anchoring +
    +

    +By default, an optimization is applied when .* is the first significant item in +a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the +pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can +start only after an internal newline or at the beginning of the subject, and +pcre2_compile() remembers this. If a pattern has more than one top-level +branch, automatic anchoring occurs if all branches are anchorable. +

    +

    +This optimization is disabled, however, if .* is in an atomic group or if there +is a backreference to the capture group in which it appears. It is also +disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of +callouts does not affect it. +

    +

    +For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT and +applied to the string "aa", the pcre2test output is: +

    +  --->aa
    +   +0 ^      .*
    +   +2 ^ ^    \d
    +   +2 ^^     \d
    +   +2 ^      \d
    +  No match
    +
    +This shows that all match attempts start at the beginning of the subject. In +other words, the pattern is anchored. You can disable this optimization by +passing PCRE2_NO_DOTSTAR_ANCHOR to pcre2_compile(), or starting the +pattern with (*NO_DOTSTAR_ANCHOR). In this case, the output changes to: +
    +  --->aa
    +   +0 ^      .*
    +   +2 ^ ^    \d
    +   +2 ^^     \d
    +   +2 ^      \d
    +   +0  ^     .*
    +   +2  ^^    \d
    +   +2  ^     \d
    +  No match
    +
    +This shows more match attempts, starting at the second subject character. +Another optimization, described in the next section, means that there is no +subsequent attempt to match with an empty subject. +

    +
    +Other optimizations +
    +

    +Other optimizations that provide fast "no match" results also affect callouts. +For example, if the pattern is +

    +  ab(?C4)cd
    +
    +PCRE2 knows that any matching string must contain the letter "d". If the +subject string is "abyz", the lack of "d" means that matching doesn't ever +start, and the callout is never reached. However, with "abyd", though the +result is still no match, the callout is obeyed. +

    +

    +For most patterns PCRE2 also knows the minimum length of a matching string, and +will immediately give a "no match" return without actually running a match if +the subject is not long enough, or, for unanchored patterns, if it has been +scanned far enough. +

    +

    +You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE +option to pcre2_compile(), or by starting the pattern with +(*NO_START_OPT). This slows down the matching process, but does ensure that +callouts such as the example above are obeyed. +

    +
    THE CALLOUT INTERFACE
    +

    +During matching, when PCRE2 reaches a callout point, if an external function is +provided in the match context, it is called. This applies to normal, +DFA, and JIT matching. The first argument to the callout function is a pointer +to a pcre2_callout block. The second argument is the void * callout data +that was supplied when the callout was set up by calling +pcre2_set_callout() (see the +pcre2api +documentation). The callout block structure contains the following fields, not +necessarily in this order: +

    +  uint32_t      version;
    +  uint32_t      callout_number;
    +  uint32_t      capture_top;
    +  uint32_t      capture_last;
    +  uint32_t      callout_flags;
    +  PCRE2_SIZE   *offset_vector;
    +  PCRE2_SPTR    mark;
    +  PCRE2_SPTR    subject;
    +  PCRE2_SIZE    subject_length;
    +  PCRE2_SIZE    start_match;
    +  PCRE2_SIZE    current_position;
    +  PCRE2_SIZE    pattern_position;
    +  PCRE2_SIZE    next_item_length;
    +  PCRE2_SIZE    callout_string_offset;
    +  PCRE2_SIZE    callout_string_length;
    +  PCRE2_SPTR    callout_string;
    +
    +The version field contains the version number of the block format. The +current version is 2; the three callout string fields were added for version 1, +and the callout_flags field for version 2. If you are writing an +application that might use an earlier release of PCRE2, you should check the +version number before accessing any of these fields. The version number will +increase in future if more fields are added, but the intention is never to +remove any of the existing fields. +
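The version check described above can be sketched as follows. This is a minimal illustration only: demo_callout_block is a hypothetical stand-in that carries just a few of the documented fields, so that the sketch compiles without pcre2.h. Real code would receive a pointer to a pcre2_callout_block instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for part of the callout block. The field names
   follow the list above; the struct itself is not part of PCRE2. */
typedef struct {
  uint32_t version;
  uint32_t callout_flags;        /* meaningful only when version >= 2 */
  size_t callout_string_length;  /* meaningful only when version >= 1 */
  const char *callout_string;    /* meaningful only when version >= 1 */
} demo_callout_block;

/* Return callout_flags if the block format provides it, otherwise 0. */
uint32_t safe_callout_flags(const demo_callout_block *cb)
{
  assert(cb != NULL);
  return (cb->version >= 2) ? cb->callout_flags : 0;
}

/* Return the callout string if the block format provides it, else NULL. */
const char *safe_callout_string(const demo_callout_block *cb)
{
  assert(cb != NULL);
  return (cb->version >= 1) ? cb->callout_string : NULL;
}
```

Guarding each access on the version that introduced the field keeps a callout function usable with an earlier release that supplies a shorter block.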

    +
    +Fields for numerical callouts +
    +

    +For a numerical callout, callout_string is NULL, and callout_number +contains the number of the callout, in the range 0-255. This is the number +that follows (?C for callouts that are part of the pattern; it is 255 for +automatically generated callouts. +

    +
    +Fields for string callouts +
    +

    +For callouts with string arguments, callout_number is always zero, and +callout_string points to the string that is contained within the compiled +pattern. Its length is given by callout_string_length. Duplicated ending +delimiters that were present in the original pattern string have been turned +into single characters, but there is no other processing of the callout string +argument. An additional code unit containing binary zero is present after the +string, but is not included in the length. The delimiter that was used to start +the string is also stored within the pattern, immediately before the string +itself. You can access this delimiter as callout_string[-1] if you need +it. +

    +

    +The callout_string_offset field is the code unit offset to the start of +the callout argument string within the original pattern string. This is +provided for the benefit of applications such as script languages that might +need to report errors in the callout string within the pattern. +

    +
    +Fields for all callouts +
    +

    +The remaining fields in the callout block are the same for both kinds of +callout. +

    +

    +The offset_vector field is a pointer to a vector of capturing offsets +(the "ovector"). You may read the elements in this vector, but you must not +change any of them. +

    +

    +For calls to pcre2_match(), the offset_vector field is not (since +release 10.30) a pointer to the actual ovector that was passed to the matching +function in the match data block. Instead it points to an internal ovector of a +size large enough to hold all possible captured substrings in the pattern. Note +that whenever a recursion or subroutine call within a pattern completes, the +capturing state is reset to what it was before. +

    +

    +The capture_last field contains the number of the most recently captured +substring, and the capture_top field contains one more than the number of +the highest numbered captured substring so far. If no substrings have yet been +captured, the value of capture_last is 0 and the value of +capture_top is 1. The values of these fields do not always differ by one; +for example, when the callout in the pattern ((a)(b))(?C2) is taken, +capture_last is 1 but capture_top is 4. +

    +

    +The contents of ovector[2] to ovector[<capture_top>*2-1] can be inspected in +order to extract substrings that have been matched so far, in the same way as +extracting substrings after a match has completed. The values in ovector[0] and +ovector[1] are always PCRE2_UNSET because the match is by definition not +complete. Substrings that have not been captured but whose numbers are less +than capture_top also have both of their ovector slots set to +PCRE2_UNSET. +
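As a rough sketch of the inspection described above, the helper below walks the pairs from ovector[2] up to ovector[capture_top*2-1] and prints whatever has been captured so far. It assumes 8-bit code units and uses plain size_t together with a stand-in constant DEMO_UNSET (assumed to be the all-ones value) in place of PCRE2_SIZE and PCRE2_UNSET, so that it compiles without pcre2.h. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for PCRE2_UNSET; assumed here to be the all-ones size value. */
#define DEMO_UNSET (~(size_t)0)

/* Print each substring captured so far and return how many groups are set.
   Pair 0 is skipped because the overall match is not yet complete. */
int print_captures_so_far(const char *subject, const size_t *ovector,
                          unsigned capture_top)
{
  int set_count = 0;
  assert(subject != NULL && ovector != NULL);
  for (unsigned i = 1; i < capture_top; i++) {
    size_t start = ovector[2*i];
    size_t end = ovector[2*i + 1];
    if (start == DEMO_UNSET || end == DEMO_UNSET) {
      printf("%u: <unset>\n", i);   /* numbered below capture_top, not set */
      continue;
    }
    printf("%u: %.*s\n", i, (int)(end - start), subject + start);
    set_count++;
  }
  return set_count;
}
```

For the pattern ((a)(b))(?C2) applied to "ab", a callout function would call this with capture_top equal to 4 and would see groups 1, 2, and 3 set.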

    +

    +For DFA matching, the offset_vector field points to the ovector that was +passed to the matching function in the match data block for callouts at the top +level, but to an internal ovector during the processing of pattern recursions, +lookarounds, and atomic groups. However, these ovectors hold no useful +information because pcre2_dfa_match() does not support substring +capturing. The value of capture_top is always 1 and the value of +capture_last is always 0 for DFA matching. +

    +

    +The subject and subject_length fields contain copies of the values +that were passed to the matching function. +

    +

    +The start_match field normally contains the offset within the subject at +which the current match attempt started. However, if the escape sequence \K +has been encountered, this value is changed to reflect the modified starting +point. If the pattern is not anchored, the callout function may be called +several times from the same point in the pattern for different starting points +in the subject. +

    +

    +The current_position field contains the offset within the subject of the +current match pointer. +

    +

    +The pattern_position field contains the offset in the pattern string to +the next item to be matched. +

    +

    +The next_item_length field contains the length of the next item to be +processed in the pattern string. When the callout is at the end of the pattern, +the length is zero. When the callout precedes an opening parenthesis, the +length includes meta characters that follow the parenthesis. For example, in a +callout before an assertion such as (?=ab) the length is 3. For an +alternation bar or a closing parenthesis, the length is one, unless a closing +parenthesis is followed by a quantifier, in which case its length is included. +(This changed in release 10.23. In earlier releases, before an opening +parenthesis the length was that of the entire group, and before an alternation +bar or a closing parenthesis the length was zero.) +
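One way an application might use pattern_position together with next_item_length is to extract the text of the next item for display. The sketch below assumes 8-bit code units and that the application has kept its own copy of the pattern string; copy_next_item is a hypothetical helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the next pattern item into out, whose capacity out_size includes the
   terminating zero. Returns the number of bytes copied, which is less than
   next_item_length if the item had to be truncated to fit. */
size_t copy_next_item(const char *pattern, size_t pattern_position,
                      size_t next_item_length, char *out, size_t out_size)
{
  size_t n = next_item_length;
  assert(pattern != NULL && out != NULL && out_size > 0);
  if (n > out_size - 1) n = out_size - 1;
  memcpy(out, pattern + pattern_position, n);
  out[n] = '\0';
  return n;
}
```

With the pattern (?=ab)c and a callout before the assertion, position 0 and length 3 yield the text "(?=". At the end of a pattern the length is zero and the result is an empty string.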

    +

    +The pattern_position and next_item_length fields are intended to +help in distinguishing between different automatic callouts, which all have the +same callout number. However, they are set for all callouts, and are used by +pcre2test to show the next item to be matched when displaying callout +information. +

    +

    +In callouts from pcre2_match() the mark field contains a pointer to +the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or +(*THEN) item in the match, or NULL if no such items have been passed. Instances +of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In +callouts from the DFA matching function this field always contains NULL. +

    +

    +The callout_flags field is always zero in callouts from +pcre2_dfa_match() or when JIT is being used. When pcre2_match() +without JIT is used, the following bits may be set: +

    +  PCRE2_CALLOUT_STARTMATCH
    +
    +This is set for the first callout after the start of matching for each new +starting position in the subject. +
    +  PCRE2_CALLOUT_BACKTRACK
    +
    +This is set if there has been a matching backtrack since the previous callout, +or since the start of matching if this is the first callout from a +pcre2_match() run. +

    +

    +Both bits are set when a backtrack has caused a "bumpalong" to a new starting +position in the subject. Output from pcre2test does not indicate the +presence of these bits unless the callout_extra modifier is set. +

    +

    +The information in the callout_flags field is provided so that +applications can track and tell their users how matching with backtracking is +done. This can be useful when trying to optimize patterns, or just to +understand how PCRE2 works. There is no support in pcre2_dfa_match() +because there is no backtracking in DFA matching, and there is no support in +JIT because JIT is all about maximizing matching performance. In both these +cases the callout_flags field is always zero. +
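A callout function could collect this tracking information with a small tally such as the one sketched here. The two DEMO_ constants are stand-ins with assumed values so that the sketch compiles on its own. Real code should test PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK from pcre2.h. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in bit values (assumed); use the pcre2.h constants in real code. */
#define DEMO_CALLOUT_STARTMATCH 0x1u
#define DEMO_CALLOUT_BACKTRACK  0x2u

typedef struct {
  unsigned callouts;         /* every callout seen */
  unsigned start_positions;  /* first callout at a new starting position */
  unsigned backtracks;       /* callouts that follow a backtrack */
  unsigned bumpalongs;       /* both bits set: backtrack caused a bumpalong */
} demo_callout_stats;

/* Update the tally from one callout's callout_flags value. */
void tally_callout_flags(demo_callout_stats *stats, uint32_t flags)
{
  int start = (flags & DEMO_CALLOUT_STARTMATCH) != 0;
  int back = (flags & DEMO_CALLOUT_BACKTRACK) != 0;
  assert(stats != NULL);
  stats->callouts++;
  if (start) stats->start_positions++;
  if (back) stats->backtracks++;
  if (start && back) stats->bumpalongs++;
}
```

Because the field is always zero under DFA or JIT matching, a tally like this only yields useful counts when pcre2_match() runs without JIT.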

    +
    RETURN VALUES FROM CALLOUTS
    +

    +The external callout function returns an integer to PCRE2. If the value is +zero, matching proceeds as normal. If the value is greater than zero, matching +fails at the current point, but the testing of other matching possibilities +goes ahead, just as if a lookahead assertion had failed. If the value is less +than zero, the match is abandoned, and the matching function returns the +negative value. +

    +

    +Negative values should normally be chosen from the set of PCRE2_ERROR_xxx +values. In particular, PCRE2_ERROR_NOMATCH forces a standard "no match" +failure. The error number PCRE2_ERROR_CALLOUT is reserved for use by callout +functions; it will never be used by PCRE2 itself. +
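The three kinds of return value can be illustrated with a small decision helper that a callout function might delegate to. Everything here is a sketch: demo_budget and decide_callout_return are hypothetical names, and DEMO_ERROR_CALLOUT is a placeholder (assumed value) for PCRE2_ERROR_CALLOUT, which real code should take from pcre2.h.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for PCRE2_ERROR_CALLOUT; the value is an assumption made so
   that this sketch is self-contained. */
#define DEMO_ERROR_CALLOUT (-37)

typedef struct {
  unsigned calls;        /* callouts seen so far */
  unsigned max_calls;    /* abandon the match beyond this many callouts */
  unsigned fail_number;  /* callout number at which to force a local failure */
} demo_budget;

/* Decide the value a callout function would return to PCRE2:
     zero      matching proceeds as normal
     positive  fail at this point, but keep testing other possibilities
     negative  abandon the match; the matching function returns this value */
int decide_callout_return(demo_budget *budget, unsigned callout_number)
{
  assert(budget != NULL);
  if (++budget->calls > budget->max_calls) return DEMO_ERROR_CALLOUT;
  if (callout_number == budget->fail_number) return 1;
  return 0;
}
```

Returning the reserved callout error code for the abandon case lets the caller of the matching function tell an application-imposed stop apart from errors raised by PCRE2 itself.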

    +
    CALLOUT ENUMERATION
    +

    +int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); +
    +
    +A script language that supports the use of string arguments in callouts might +like to scan all the callouts in a pattern before running the match. This can +be done by calling pcre2_callout_enumerate(). The first argument is a +pointer to a compiled pattern, the second points to a callback function, and +the third is arbitrary user data. The callback function is called for every +callout in the pattern in the order in which they appear. Its first argument is +a pointer to a callout enumeration block, and its second argument is the +user_data value that was passed to pcre2_callout_enumerate(). The +data block contains the following fields: +

    +  version                Block version number
    +  pattern_position       Offset to next item in pattern
    +  next_item_length       Length of next item in pattern
    +  callout_number         Number for numbered callouts
    +  callout_string_offset  Offset to string within pattern
    +  callout_string_length  Length of callout string
    +  callout_string         Points to callout string or is NULL
    +
    +The version number is currently 0. It will increase if new fields are ever +added to the block. The remaining fields are the same as their namesakes in the +pcre2_callout block that is used for callouts during matching, as +described +above. +

    +

    +Note that the value of pattern_position is unique for each callout. +However, if a callout occurs inside a group that is quantified with a non-zero +minimum or a fixed maximum, the group is replicated inside the compiled +pattern. For example, a pattern such as /(a){2}/ is compiled as if it were +/(a)(a)/. This means that the callout will be enumerated more than once, but +with the same value for pattern_position in each case. +

    +

    +The callback function should normally return zero. If it returns a non-zero +value, scanning the pattern stops, and that value is returned from +pcre2_callout_enumerate(). +
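Because a replicated group causes the same callout to be enumerated more than once, a callback may want to record each pattern_position only once. The sketch below shows one way to do that. demo_enumerate_block is a hypothetical stand-in carrying just the one field used here, and the fixed table size is an arbitrary choice for illustration. A real callback would take a pcre2_callout_enumerate_block pointer.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_CALLOUTS 32

/* Stand-in for the one enumeration block field used in this sketch. */
typedef struct {
  size_t pattern_position;
} demo_enumerate_block;

/* User data: the distinct pattern positions seen so far. */
typedef struct {
  size_t positions[DEMO_MAX_CALLOUTS];
  size_t count;
} demo_seen;

/* Callback in the shape described above: block first, user data second.
   Records each distinct pattern_position once. Returns zero to continue
   scanning, or 1 (an arbitrary non-zero value) when the table is full. */
int record_unique_callout(demo_enumerate_block *block, void *user_data)
{
  demo_seen *seen = (demo_seen *)user_data;
  assert(block != NULL && seen != NULL);
  for (size_t i = 0; i < seen->count; i++)
    if (seen->positions[i] == block->pattern_position) return 0;
  if (seen->count == DEMO_MAX_CALLOUTS) return 1;
  seen->positions[seen->count++] = block->pattern_position;
  return 0;
}
```

The non-zero return stops the scan and becomes the result of the enumeration call, so the caller can detect that the table overflowed.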

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 03 February 2019 +
    +Copyright © 1997-2019 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2compat.html b/src/pcre2/doc/html/pcre2compat.html new file mode 100644 index 00000000..54fb6437 --- /dev/null +++ b/src/pcre2/doc/html/pcre2compat.html @@ -0,0 +1,255 @@ + + +pcre2compat specification + + +

    pcre2compat man page

    +


    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +DIFFERENCES BETWEEN PCRE2 AND PERL +
    +

    +This document describes some of the differences in the ways that PCRE2 and Perl +handle regular expressions. The differences described here are with respect to +Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the +information may at times be out of date. +

    +

    +1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does +have are given in the +pcre2unicode +page. +

    +

    +2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but +they do not mean what you might think. For example, (?!a){3} does not assert +that the next three characters are not "a". It just asserts that the next +character is not "a" three times (in principle; PCRE2 optimizes this to run the +assertion just once). Perl allows some repeat quantifiers on other assertions, +for example, \b* (but not \b{3}, though oddly it does allow ^{3}), but these +do not seem to have any use. PCRE2 does not allow any kind of quantifier on +non-lookaround assertions. +

    +

    +3. Capture groups that occur inside negative lookaround assertions are counted, +but their entries in the offsets vector are set only when a negative assertion +is a condition that has a matching branch (that is, the condition is false). +Perl may set such capture groups in other circumstances. +

    +

    +4. The following Perl escape sequences are not supported: \F, \l, \L, \u, +\U, and \N when followed by a character name. \N on its own, matching a +non-newline character, and \N{U+dd..}, matching a Unicode code point, are +supported. The escapes that modify the case of following letters are +implemented by Perl's general string-handling and are not part of its pattern +matching engine. If any of these are encountered by PCRE2, an error is +generated by default. However, if either of the PCRE2_ALT_BSUX or +PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are interpreted as ECMAScript +interprets them. +

    +

    +5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 is +built with Unicode support (the default). The properties that can be tested +with \p and \P are limited to the general category properties such as Lu and +Nd, script names such as Greek or Han, and the derived properties Any and L&. +Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use +is limited. See the +pcre2pattern +documentation for details. The long synonyms for property names that Perl +supports (such as \p{Letter}) are not supported by PCRE2, nor is it permitted +to prefix any of these properties with "Is". +

    +

    +6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters +in between are treated as literals. However, this is slightly different from +Perl in that $ and @ are also handled as literals inside the quotes. In Perl, +they cause variable interpolation (but of course PCRE2 does not have +variables). Also, Perl does "double-quotish backslash interpolation" on any +backslashes between \Q and \E which, its documentation says, "may lead to +confusing results". PCRE2 treats a backslash between \Q and \E just like any +other character. Note the following examples: +

    +    Pattern            PCRE2 matches     Perl matches
    +
    +    \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
    +    \Qabc\$xyz\E       abc\$xyz          abc\$xyz
    +    \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
    +    \QA\B\E            A\B               A\B
    +    \Q\\E              \                 \\E
    +
    +The \Q...\E sequence is recognized both inside and outside character classes +by both PCRE2 and Perl. +
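Since PCRE2 treats everything between \Q and \E as literal, an application can quote arbitrary text for inclusion in a pattern, provided that any \E inside the text is handled. The following sketch shows one possible approach, assuming 8-bit code units. quote_literal is a hypothetical helper, not a PCRE2 function. Where the text itself contains \E, the helper closes the quoted section, emits an escaped backslash followed by E, and reopens the quoting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrap literal in \Q...\E. An embedded "\E" would end the quoting early, so
   it is emitted as \E\\E\Q instead. Returns the length of the output, or 0
   if out_size is too small for the conservative bound used below. */
size_t quote_literal(const char *literal, char *out, size_t out_size)
{
  size_t n = 0;
  assert(literal != NULL && out != NULL);
  /* Conservative bound: a two-character "\E" grows to 7 characters, so no
     input character needs more than 4; add \Q, \E and the final zero. */
  if (out_size < 4 * strlen(literal) + 5) return 0;
  out[n++] = '\\'; out[n++] = 'Q';
  for (const char *p = literal; *p != '\0'; p++) {
    if (p[0] == '\\' && p[1] == 'E') {
      memcpy(out + n, "\\E\\\\E\\Q", 7);  /* the 7 characters \E\\E\Q */
      n += 7;
      p++;                                /* skip the E as well */
    } else {
      out[n++] = *p;
    }
  }
  out[n++] = '\\'; out[n++] = 'E';
  out[n] = '\0';
  return n;
}
```

For the text abc$xyz this produces \Qabc$xyz\E, which corresponds to the first row of the table above as PCRE2 interprets it. This is safe for PCRE2 only; as the table shows, Perl would interpolate $xyz.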

    +

    +7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code}) +constructions. However, PCRE2 does have a "callout" feature, which allows an +external function to be called during pattern matching. See the +pcre2callout +documentation for details. +

    +

    +8. Subroutine calls (whether recursive or not) were treated as atomic groups up +to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking +into subroutine calls is now supported, as in Perl. +

    +

    +9. In PCRE2, if any of the backtracking control verbs are used in a group that +is called as a subroutine (whether or not recursively), their effect is +confined to that group; it does not extend to the surrounding pattern. This is +not always the case in Perl. In particular, if (*THEN) is present in a group +that is called as a subroutine, its action is limited to that group, even if +the group does not contain any | characters. Note that such groups are +processed as anchored at the point where they are tested. +

    +

    +10. If a pattern contains more than one backtracking control verb, the first +one that is backtracked onto acts. For example, in the pattern +A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C +triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the +same as PCRE2, but there are cases where it differs. +

    +

    +11. There are some differences that are concerned with the settings of captured +strings when part of a pattern is repeated. For example, matching "aba" against +the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to +"b". +

    +

    +12. PCRE2's handling of duplicate capture group numbers and names is not as +general as Perl's. This is a consequence of the fact that PCRE2 works internally +just with numbers, using an external table to translate between numbers and +names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two +capture groups have the same number but different names, is not supported, and +causes an error at compile time. If it were allowed, it would not be possible +to distinguish which group matched, because both names map to capture group +number 1. To avoid this confusing situation, an error is given at compile time. +

    +

    +13. Perl used to recognize comments in some places that PCRE2 does not, for +example, between the ( and ? at the start of a group. If the /x modifier is +set, Perl allowed white space between ( and ? though the latest Perls give an +error (for a while it was just deprecated). There may still be some cases where +Perl behaves differently. +

    +

    +14. Perl, when in warning mode, gives warnings for character classes such as +[A-\d] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no +warning features, so it gives an error in these cases because they are almost +certainly user mistakes. +

    +

    +15. In PCRE2, the upper/lower case character properties Lu and Ll are not +affected when case-independent matching is specified. For example, \p{Lu} +always matches an upper case letter. I think Perl has changed in this respect; +in the release at the time of writing (5.32), \p{Lu} and \p{Ll} match all +letters, regardless of case, when case independence is specified. +

    +

    +16. From release 5.32.0, Perl locks out the use of \K in lookaround +assertions. In PCRE2, \K is acted on when it occurs in positive assertions, +but is ignored in negative assertions. +

    +

    +17. PCRE2 provides some extensions to the Perl regular expression facilities. +Perl 5.10 included new features that were not in earlier versions of Perl, some +of which (such as named parentheses) were in PCRE2 for some time before. This +list is with respect to Perl 5.32: +
    +
    +(a) Although lookbehind assertions in PCRE2 must match fixed length strings, +each alternative toplevel branch of a lookbehind assertion can match a +different length of string. Perl requires them all to have the same length. +
    +
    +(b) From PCRE2 10.23, backreferences to groups of fixed length are supported +in lookbehinds, provided that there is no possibility of referencing a +non-unique number or name. Perl does not support backreferences in lookbehinds. +
    +
    +(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $ +meta-character matches only at the very end of the string. +
    +
    +(d) A backslash followed by a letter with no special meaning is faulted. (Perl +can be made to issue a warning.) +
    +
    +(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is +inverted, that is, by default they are not greedy, but if followed by a +question mark they are. +
    +
    +(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried +only at the first matching position in the subject string. +
    +
    +(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART +options have no Perl equivalents. +
    +
    +(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF +by the PCRE2_BSR_ANYCRLF option. +
    +
    +(i) The callout facility is PCRE2-specific. Perl supports codeblocks and +variable interpolation, but not general hooks on every match. +
    +
    +(j) The partial matching facility is PCRE2-specific. +
    +
    +(k) The alternative matching function (pcre2_dfa_match()) matches in a +different way and is not Perl-compatible. +
    +
    +(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at +the start of a pattern. These set overall options that cannot be changed within +the pattern. +
    +
    +(m) PCRE2 supports non-atomic positive lookaround assertions. This is an +extension to the lookaround facilities. The default, Perl-compatible +lookarounds are atomic. +

    +

    +18. The Perl /a modifier restricts \d numbers to pure ASCII, and the /aa +modifier restricts /i case-insensitive matching to pure ASCII, ignoring Unicode +rules. This separation cannot be represented with PCRE2_UCP. +

    +

    +19. Perl has different limits than PCRE2. See the +pcre2limit +documentation for details. With release 5.10, Perl changed from recursion to +iteration, keeping the intermediate matches on the heap, which is ~10% slower +but is not subject to any stack-overflow limit. PCRE2 made a similar change at +release 10.30, and also has many build-time and run-time customizable limits. +

    +
    +AUTHOR +
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    +REVISION +
    +

    +Last updated: 06 October 2020 +
    +Copyright © 1997-2019 University of Cambridge. +
    +


    diff --git a/src/pcre2/doc/html/pcre2convert.html b/src/pcre2/doc/html/pcre2convert.html new file mode 100644 index 00000000..871e5634 --- /dev/null +++ b/src/pcre2/doc/html/pcre2convert.html @@ -0,0 +1,191 @@ + + +pcre2convert specification + + +

    pcre2convert man page

    +


    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    EXPERIMENTAL PATTERN CONVERSION FUNCTIONS
    +

    +This document describes a set of functions that can be used to convert +"foreign" patterns into PCRE2 regular expressions. This facility is currently +experimental, and may be changed in future releases. Two kinds of pattern, +globs and POSIX patterns, are supported. +

    +
    THE CONVERT CONTEXT
    +

    +pcre2_convert_context *pcre2_convert_context_create( + pcre2_general_context *gcontext); +
    +
    +pcre2_convert_context *pcre2_convert_context_copy( + pcre2_convert_context *cvcontext); +
    +
    +void pcre2_convert_context_free(pcre2_convert_context *cvcontext); +
    +
    +int pcre2_set_glob_escape(pcre2_convert_context *cvcontext, + uint32_t escape_char); +
    +
    +int pcre2_set_glob_separator(pcre2_convert_context *cvcontext, + uint32_t separator_char); +
    +
    +A convert context is used to hold parameters that affect the way that pattern +conversion works. Like all PCRE2 contexts, you need to use a context only if +you want to override the defaults. There are the usual create, copy, and free +functions. If custom memory management functions are set in a general context +that is passed to pcre2_convert_context_create(), they are used for all +memory management within the conversion functions. +

    +

    +There are only two parameters in the convert context at present. Both apply +only to glob conversions. The escape character defaults to grave accent under +Windows, otherwise backslash. It can be set to zero, meaning no escape +character, or to any punctuation character with a code point less than 256. +The separator character defaults to backslash under Windows, otherwise forward +slash. It can be set to forward slash, backslash, or dot. +

    +

    +The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if +their second argument is invalid. +

    +
    THE CONVERSION FUNCTION
    +

    +int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, PCRE2_UCHAR **buffer, + PCRE2_SIZE *blength, pcre2_convert_context *cvcontext); +
    +
    +void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern); +
    +
    +The first two arguments of pcre2_pattern_convert() define the foreign +pattern that is to be converted. The length may be given as +PCRE2_ZERO_TERMINATED. The options argument defines how the pattern is to +be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set. +PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid. +One or more of the glob options, or one of the following POSIX options must be +set to define the type of conversion that is required: +

    +  PCRE2_CONVERT_GLOB
    +  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
    +  PCRE2_CONVERT_GLOB_NO_STARSTAR
    +  PCRE2_CONVERT_POSIX_BASIC
    +  PCRE2_CONVERT_POSIX_EXTENDED
    +
    +Details of the conversions are given below. The buffer and blength +arguments define how the output is handled: +

    +

    +If buffer is NULL, the function just returns the length of the converted +pattern via blength. This is one less than the length of buffer needed, +because a terminating zero is always added to the output. +

    +

    +If buffer points to a NULL pointer, an output buffer is obtained using +the allocator in the context or malloc() if no context is supplied. A +pointer to this buffer is placed in the variable to which buffer points. +When no longer needed the output buffer must be freed by calling +pcre2_converted_pattern_free(). If this function is called with a NULL +argument, it returns immediately without doing anything. +

    +

    +If buffer points to a non-NULL pointer, blength must be set to the +actual length of the buffer provided (in code units). +

    +

    +In all cases, after successful conversion, the variable pointed to by +blength is updated to the length actually used (in code units), excluding +the terminating zero that is always added. +

    +

    +If an error occurs, the length (via blength) is set to the offset +within the input pattern where the error was detected. Only gross syntax errors +are caught; there are plenty of errors that will get passed on for +pcre2_compile() to discover. +

    +

    +The return from pcre2_pattern_convert() is zero on success or a non-zero +PCRE2 error code. Note that PCRE2 error codes may be positive or negative: +pcre2_compile() uses mostly positive codes and pcre2_match() +negative ones; pcre2_pattern_convert() uses existing codes of both kinds. A +textual error message can be obtained by calling +pcre2_get_error_message(). +

    +
    CONVERTING GLOBS
    +

    +Globs are used to match file names, and consequently have the concept of a +"path separator", which defaults to backslash under Windows and forward slash +otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not +permitted to match separator characters, but the double-star (**) feature +(which does match separators) is supported. +

    +

    +PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to +match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with the +double-star feature disabled. These options may be given together. +

    +
    CONVERTING POSIX PATTERNS
    +

    +POSIX defines two kinds of regular expression pattern: basic and extended. +These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or +PCRE2_CONVERT_POSIX_EXTENDED, respectively. +

    +

    +In POSIX patterns, backslash is not special in a character class. Unmatched +closing parentheses are treated as literals. +

    +

    +In basic patterns, ? + | {} and () must be escaped to be recognized +as metacharacters outside a character class. If the first character in the +pattern is * it is treated as a literal. ^ is a metacharacter only at the start +of a branch. +

    +

    +In extended patterns, a backslash not in a character class always +makes the next character literal, whatever it is. There are no backreferences. +

    +

    +Note: POSIX mandates that the longest possible match at the first matching +position must be found. This is not what pcre2_match() does; it yields +the first match that is found. An application can use pcre2_dfa_match() +to find the longest match, but that does not support backreferences (but then +neither do POSIX extended patterns). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 28 June 2018 +
    +Copyright © 1997-2018 University of Cambridge. +
    +


    diff --git a/src/pcre2/doc/html/pcre2demo.html b/src/pcre2/doc/html/pcre2demo.html new file mode 100644 index 00000000..08b21901 --- /dev/null +++ b/src/pcre2/doc/html/pcre2demo.html @@ -0,0 +1,514 @@ + + +pcre2demo specification + + +

    pcre2demo man page

    +


    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

      +
    +
    +/*************************************************
    +*           PCRE2 DEMONSTRATION PROGRAM          *
    +*************************************************/
    +
    +/* This is a demonstration program to illustrate a straightforward way of
    +using the PCRE2 regular expression library from a C program. See the
    +pcre2sample documentation for a short discussion ("man pcre2sample" if you have
    +the PCRE2 man pages installed). PCRE2 is a revised API for the library, and is
    +incompatible with the original PCRE API.
    +
    +There are actually three libraries, each supporting a different code unit
    +width. This demonstration program uses the 8-bit library. The default is to
    +process each code unit as a separate character, but if the pattern begins with
    +"(*UTF)", both it and the subject are treated as UTF-8 strings, where
    +characters may occupy multiple code units.
    +
    +In Unix-like environments, if PCRE2 is installed in your standard system
    +libraries, you should be able to compile this program using this command:
    +
    +cc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo
    +
    +If PCRE2 is not installed in a standard place, it is likely to be installed
    +with support for the pkg-config mechanism. If you have pkg-config, you can
    +compile this program using this command:
    +
    +cc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo
    +
    +If you do not have pkg-config, you may have to use something like this:
    +
    +cc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \
    +  -R/usr/local/lib -lpcre2-8 -o pcre2demo
    +
    +Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
    +library files for PCRE2 are installed on your system. Only some operating
    +systems (Solaris is one) use the -R option.
    +
    +Building under Windows:
    +
    +If you want to statically link this program against a non-dll .a file, you must
    +define PCRE2_STATIC before including pcre2.h, so in this environment, uncomment
    +the following line. */
    +
    +/* #define PCRE2_STATIC */
    +
    +/* The PCRE2_CODE_UNIT_WIDTH macro must be defined before including pcre2.h.
    +For a program that uses only one code unit width, setting it to 8, 16, or 32
    +makes it possible to use generic function names such as pcre2_compile(). Note
    +that just changing 8 to 16 (for example) is not sufficient to convert this
    +program to process 16-bit characters. Even in a fully 16-bit environment, where
    +string-handling functions such as strcmp() and printf() work with 16-bit
    +characters, the code for handling the table of named substrings will still need
    +to be modified. */
    +
    +#define PCRE2_CODE_UNIT_WIDTH 8
    +
    +#include <stdio.h>
    +#include <string.h>
    +#include <pcre2.h>
    +
    +
    +/**************************************************************************
    +* Here is the program. The API includes the concept of "contexts" for     *
    +* setting up unusual interface requirements for compiling and matching,   *
    +* such as custom memory managers and non-standard newline definitions.    *
    +* This program does not do any of this, so it makes no use of contexts,   *
    +* always passing NULL where a context could be given.                     *
    +**************************************************************************/
    +
    +int main(int argc, char **argv)
    +{
    +pcre2_code *re;
    +PCRE2_SPTR pattern;     /* PCRE2_SPTR is a pointer to unsigned code units of */
    +PCRE2_SPTR subject;     /* the appropriate width (in this case, 8 bits). */
    +PCRE2_SPTR name_table;
    +
    +int crlf_is_newline;
    +int errornumber;
    +int find_all;
    +int i;
    +int rc;
    +int utf8;
    +
    +uint32_t option_bits;
    +uint32_t namecount;
    +uint32_t name_entry_size;
    +uint32_t newline;
    +
    +PCRE2_SIZE erroroffset;
    +PCRE2_SIZE *ovector;
    +PCRE2_SIZE subject_length;
    +
    +pcre2_match_data *match_data;
    +
    +
    +/**************************************************************************
    +* First, sort out the command line. There is only one possible option at  *
    +* the moment, "-g" to request repeated matching to find all occurrences,  *
    +* like Perl's /g option. We set the variable find_all to a non-zero value *
    +* if the -g option is present.                                            *
    +**************************************************************************/
    +
    +find_all = 0;
    +for (i = 1; i < argc; i++)
    +  {
    +  if (strcmp(argv[i], "-g") == 0) find_all = 1;
    +  else if (argv[i][0] == '-')
    +    {
    +    printf("Unrecognised option %s\n", argv[i]);
    +    return 1;
    +    }
    +  else break;
    +  }
    +
    +/* After the options, we require exactly two arguments, which are the pattern,
    +and the subject string. */
    +
    +if (argc - i != 2)
    +  {
    +  printf("Exactly two arguments required: a regex and a subject string\n");
    +  return 1;
    +  }
    +
    +/* Pattern and subject are char arguments, so they can be straightforwardly
    +cast to PCRE2_SPTR because we are working in 8-bit code units. The subject
    +length is cast to PCRE2_SIZE for completeness, though PCRE2_SIZE is in fact
    +defined to be size_t. */
    +
    +pattern = (PCRE2_SPTR)argv[i];
    +subject = (PCRE2_SPTR)argv[i+1];
    +subject_length = (PCRE2_SIZE)strlen((char *)subject);
    +
    +
    +/*************************************************************************
    +* Now we are going to compile the regular expression pattern, and handle *
    +* any errors that are detected.                                          *
    +*************************************************************************/
    +
    +re = pcre2_compile(
    +  pattern,               /* the pattern */
    +  PCRE2_ZERO_TERMINATED, /* indicates pattern is zero-terminated */
    +  0,                     /* default options */
    +  &errornumber,          /* for error number */
    +  &erroroffset,          /* for error offset */
    +  NULL);                 /* use default compile context */
    +
    +/* Compilation failed: print the error message and exit. */
    +
    +if (re == NULL)
    +  {
    +  PCRE2_UCHAR buffer[256];
    +  pcre2_get_error_message(errornumber, buffer, sizeof(buffer));
    +  printf("PCRE2 compilation failed at offset %d: %s\n", (int)erroroffset,
    +    buffer);
    +  return 1;
    +  }
    +
    +
    +/*************************************************************************
    +* If the compilation succeeded, we call PCRE2 again, in order to do a    *
    +* pattern match against the subject string. This does just ONE match. If *
    +* further matching is needed, it will be done below. Before running the  *
    +* match we must set up a match_data block for holding the result. Using  *
    +* pcre2_match_data_create_from_pattern() ensures that the block is       *
    +* exactly the right size for the number of capturing parentheses in the  *
    +* pattern. If you need to know the actual size of a match_data block as  *
    +* a number of bytes, you can find it like this:                          *
    +*                                                                        *
    +* PCRE2_SIZE match_data_size = pcre2_get_match_data_size(match_data);    *
    +*************************************************************************/
    +
    +match_data = pcre2_match_data_create_from_pattern(re, NULL);
    +
    +/* Now run the match. */
    +
    +rc = pcre2_match(
    +  re,                   /* the compiled pattern */
    +  subject,              /* the subject string */
    +  subject_length,       /* the length of the subject */
    +  0,                    /* start at offset 0 in the subject */
    +  0,                    /* default options */
    +  match_data,           /* block for storing the result */
    +  NULL);                /* use default match context */
    +
    +/* Matching failed: handle error cases */
    +
    +if (rc < 0)
    +  {
    +  switch(rc)
    +    {
    +    case PCRE2_ERROR_NOMATCH: printf("No match\n"); break;
    +    /*
    +    Handle other special cases if you like
    +    */
    +    default: printf("Matching error %d\n", rc); break;
    +    }
    +  pcre2_match_data_free(match_data);   /* Release memory used for the match */
    +  pcre2_code_free(re);                 /*   data and the compiled pattern. */
    +  return 1;
    +  }
    +
    +/* Match succeeded. Get a pointer to the output vector, where string offsets are
    +stored. */
    +
    +ovector = pcre2_get_ovector_pointer(match_data);
    +printf("Match succeeded at offset %d\n", (int)ovector[0]);
    +
    +
    +/*************************************************************************
    +* We have found the first match within the subject string. If the output *
    +* vector wasn't big enough, say so. Then output any substrings that were *
    +* captured.                                                              *
    +*************************************************************************/
    +
    +/* The output vector wasn't big enough. This should not happen, because we used
    +pcre2_match_data_create_from_pattern() above. */
    +
    +if (rc == 0)
    +  printf("ovector was not big enough for all the captured substrings\n");
    +
    +/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion
    +to set the start of a match later than its end. In this demonstration program,
    +we just detect this case and give up. */
    +
    +if (ovector[0] > ovector[1])
    +  {
    +  printf("\\K was used in an assertion to set the match start after its end.\n"
    +    "From end to start the match was: %.*s\n", (int)(ovector[0] - ovector[1]),
    +      (char *)(subject + ovector[1]));
    +  printf("Run abandoned\n");
    +  pcre2_match_data_free(match_data);
    +  pcre2_code_free(re);
    +  return 1;
    +  }
    +
    +/* Show substrings stored in the output vector by number. Obviously, in a real
    +application you might want to do things other than print them. */
    +
    +for (i = 0; i < rc; i++)
    +  {
    +  PCRE2_SPTR substring_start = subject + ovector[2*i];
    +  PCRE2_SIZE substring_length = ovector[2*i+1] - ovector[2*i];
    +  printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start);
    +  }
    +
    +
    +/**************************************************************************
    +* That concludes the basic part of this demonstration program. We have    *
    +* compiled a pattern, and performed a single match. The code that follows *
    +* shows first how to access named substrings, and then how to code for    *
    +* repeated matches on the same subject.                                   *
    +**************************************************************************/
    +
    +/* See if there are any named substrings, and if so, show them by name. First
    +we have to extract the count of named parentheses from the pattern. */
    +
    +(void)pcre2_pattern_info(
    +  re,                   /* the compiled pattern */
    +  PCRE2_INFO_NAMECOUNT, /* get the number of named substrings */
    +  &namecount);          /* where to put the answer */
    +
    +if (namecount == 0) printf("No named substrings\n"); else
    +  {
    +  PCRE2_SPTR tabptr;
    +  printf("Named substrings\n");
    +
    +  /* Before we can access the substrings, we must extract the table for
    +  translating names to numbers, and the size of each entry in the table. */
    +
    +  (void)pcre2_pattern_info(
    +    re,                       /* the compiled pattern */
    +    PCRE2_INFO_NAMETABLE,     /* address of the table */
    +    &name_table);             /* where to put the answer */
    +
    +  (void)pcre2_pattern_info(
    +    re,                       /* the compiled pattern */
    +    PCRE2_INFO_NAMEENTRYSIZE, /* size of each entry in the table */
    +    &name_entry_size);        /* where to put the answer */
    +
    +  /* Now we can scan the table and, for each entry, print the number, the name,
    +  and the substring itself. In the 8-bit library the number is held in two
    +  bytes, most significant first. */
    +
    +  tabptr = name_table;
    +  for (i = 0; i < namecount; i++)
    +    {
    +    int n = (tabptr[0] << 8) | tabptr[1];
    +    printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
    +      (int)(ovector[2*n+1] - ovector[2*n]), subject + ovector[2*n]);
    +    tabptr += name_entry_size;
    +    }
    +  }
    +
    +
    +/*************************************************************************
    +* If the "-g" option was given on the command line, we want to continue  *
    +* to search for additional matches in the subject string, in a similar   *
    +* way to the /g option in Perl. This turns out to be trickier than you   *
    +* might think because of the possibility of matching an empty string.    *
    +* What happens is as follows:                                            *
    +*                                                                        *
    +* If the previous match was NOT for an empty string, we can just start   *
    +* the next match at the end of the previous one.                         *
    +*                                                                        *
    +* If the previous match WAS for an empty string, we can't do that, as it *
    +* would lead to an infinite loop. Instead, a call of pcre2_match() is    *
    +* made with the PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set. The *
    +* first of these tells PCRE2 that an empty string at the start of the    *
    +* subject is not a valid match; other possibilities must be tried. The   *
    +* second flag restricts PCRE2 to one match attempt at the initial string *
    +* position. If this match succeeds, an alternative to the empty string   *
    +* match has been found, and we can print it and proceed round the loop,  *
    +* advancing by the length of whatever was found. If this match does not  *
    +* succeed, we still stay in the loop, advancing by just one character.   *
    +* In UTF-8 mode, which can be set by (*UTF) in the pattern, this may be  *
    +* more than one byte.                                                    *
    +*                                                                        *
    +* However, there is a complication concerned with newlines. When the     *
    +* newline convention is such that CRLF is a valid newline, we must       *
    +* advance by two characters rather than one. The newline convention can  *
    +* be set in the regex by (*CR), etc.; if not, we must find the default.  *
    +*************************************************************************/
    +
    +if (!find_all)     /* Check for -g */
    +  {
    +  pcre2_match_data_free(match_data);  /* Release the memory that was used */
    +  pcre2_code_free(re);                /* for the match data and the pattern. */
    +  return 0;                           /* Exit the program. */
    +  }
    +
    +/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline
    +sequence. First, find the options with which the regex was compiled and extract
    +the UTF state. */
    +
    +(void)pcre2_pattern_info(re, PCRE2_INFO_ALLOPTIONS, &option_bits);
    +utf8 = (option_bits & PCRE2_UTF) != 0;
    +
    +/* Now find the newline convention and see whether CRLF is a valid newline
    +sequence. */
    +
    +(void)pcre2_pattern_info(re, PCRE2_INFO_NEWLINE, &newline);
    +crlf_is_newline = newline == PCRE2_NEWLINE_ANY ||
    +                  newline == PCRE2_NEWLINE_CRLF ||
    +                  newline == PCRE2_NEWLINE_ANYCRLF;
    +
    +/* Loop for second and subsequent matches */
    +
    +for (;;)
    +  {
    +  uint32_t options = 0;                   /* Normally no options */
    +  PCRE2_SIZE start_offset = ovector[1];   /* Start at end of previous match */
    +
    +  /* If the previous match was for an empty string, we are finished if we are
    +  at the end of the subject. Otherwise, arrange to run another match at the
    +  same point to see if a non-empty match can be found. */
    +
    +  if (ovector[0] == ovector[1])
    +    {
    +    if (ovector[0] == subject_length) break;
    +    options = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED;
    +    }
    +
    +  /* If the previous match was not an empty string, there is one tricky case to
    +  consider. If a pattern contains \K within a lookbehind assertion at the
    +  start, the end of the matched string can be at the offset where the match
    +  started. Without special action, this leads to a loop that keeps on matching
    +  the same substring. We must detect this case and arrange to move the start on
    +  by one character. The pcre2_get_startchar() function returns the starting
    +  offset that was passed to pcre2_match(). */
    +
    +  else
    +    {
    +    PCRE2_SIZE startchar = pcre2_get_startchar(match_data);
    +    if (start_offset <= startchar)
    +      {
    +      if (startchar >= subject_length) break;   /* Reached end of subject.   */
    +      start_offset = startchar + 1;             /* Advance by one character. */
    +      if (utf8)                                 /* If UTF-8, it may be more  */
    +        {                                       /*   than one code unit.     */
    +        for (; start_offset < subject_length; start_offset++)
    +          if ((subject[start_offset] & 0xc0) != 0x80) break;
    +        }
    +      }
    +    }
    +
    +  /* Run the next matching operation */
    +
    +  rc = pcre2_match(
    +    re,                   /* the compiled pattern */
    +    subject,              /* the subject string */
    +    subject_length,       /* the length of the subject */
    +    start_offset,         /* starting offset in the subject */
    +    options,              /* options */
    +    match_data,           /* block for storing the result */
    +    NULL);                /* use default match context */
    +
    +  /* This time, a result of NOMATCH isn't an error. If the value in "options"
    +  is zero, it just means we have found all possible matches, so the loop ends.
    +  Otherwise, it means we have failed to find a non-empty-string match at a
    +  point where there was a previous empty-string match. In this case, we do what
    +  Perl does: advance the matching position by one character, and continue. We
    +  do this by setting the "end of previous match" offset, because that is picked
    +  up at the top of the loop as the point at which to start again.
    +
    +  There are two complications: (a) When CRLF is a valid newline sequence, and
    +  the current position is just before it, advance by an extra byte. (b)
    +  Otherwise we must ensure that we skip an entire UTF character if we are in
    +  UTF mode. */
    +
    +  if (rc == PCRE2_ERROR_NOMATCH)
    +    {
    +    if (options == 0) break;                    /* All matches found */
    +    ovector[1] = start_offset + 1;              /* Advance one code unit */
    +    if (crlf_is_newline &&                      /* If CRLF is a newline & */
    +        start_offset < subject_length - 1 &&    /* we are at CRLF, */
    +        subject[start_offset] == '\r' &&
    +        subject[start_offset + 1] == '\n')
    +      ovector[1] += 1;                          /* Advance by one more. */
    +    else if (utf8)                              /* Otherwise, ensure we */
    +      {                                         /* advance a whole UTF-8 */
    +      while (ovector[1] < subject_length)       /* character. */
    +        {
    +        if ((subject[ovector[1]] & 0xc0) != 0x80) break;
    +        ovector[1] += 1;
    +        }
    +      }
    +    continue;    /* Go round the loop again */
    +    }
    +
    +  /* Other matching errors are not recoverable. */
    +
    +  if (rc < 0)
    +    {
    +    printf("Matching error %d\n", rc);
    +    pcre2_match_data_free(match_data);
    +    pcre2_code_free(re);
    +    return 1;
    +    }
    +
    +  /* Match succeeded */
    +
    +  printf("\nMatch succeeded again at offset %d\n", (int)ovector[0]);
    +
    +  /* The match succeeded, but the output vector wasn't big enough. This
    +  should not happen. */
    +
    +  if (rc == 0)
    +    printf("ovector was not big enough for all the captured substrings\n");
    +
    +  /* We must guard against patterns such as /(?=.\K)/ that use \K in an
    +  assertion to set the start of a match later than its end. In this
    +  demonstration program, we just detect this case and give up. */
    +
    +  if (ovector[0] > ovector[1])
    +    {
    +    printf("\\K was used in an assertion to set the match start after its end.\n"
    +      "From end to start the match was: %.*s\n", (int)(ovector[0] - ovector[1]),
    +        (char *)(subject + ovector[1]));
    +    printf("Run abandoned\n");
    +    pcre2_match_data_free(match_data);
    +    pcre2_code_free(re);
    +    return 1;
    +    }
    +
    +  /* As before, show substrings stored in the output vector by number, and then
    +  also any named substrings. */
    +
    +  for (i = 0; i < rc; i++)
    +    {
    +    PCRE2_SPTR substring_start = subject + ovector[2*i];
    +    size_t substring_length = ovector[2*i+1] - ovector[2*i];
    +    printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start);
    +    }
    +
    +  if (namecount == 0) printf("No named substrings\n"); else
    +    {
    +    PCRE2_SPTR tabptr = name_table;
    +    printf("Named substrings\n");
    +    for (i = 0; i < namecount; i++)
    +      {
    +      int n = (tabptr[0] << 8) | tabptr[1];
    +      printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2,
    +        (int)(ovector[2*n+1] - ovector[2*n]), subject + ovector[2*n]);
    +      tabptr += name_entry_size;
    +      }
    +    }
    +  }      /* End of loop to find second and subsequent matches */
    +
    +printf("\n");
    +pcre2_match_data_free(match_data);
    +pcre2_code_free(re);
    +return 0;
    +}
    +
    +/* End of pcre2demo.c */
    +
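The demonstration program advances past a whole UTF-8 character in two places by skipping continuation bytes, which are the bytes whose top two bits are 10. The following is a minimal standalone sketch of that step, assuming an 8-bit subject held in an unsigned char array. The helper name next_char_offset is hypothetical and is not part of the PCRE2 API.

```c
#include <stddef.h>

/* Hypothetical helper (not a PCRE2 function): given an offset that points at
   the first byte of a character, return the offset of the next character.
   Continuation bytes have the bit pattern 10xxxxxx, so (byte & 0xc0) == 0x80
   identifies them, exactly as in the demo's loops. */
size_t next_char_offset(const unsigned char *subject, size_t length,
                        size_t offset)
{
    if (offset >= length) return length;   /* Already at the end */
    offset++;                              /* Step over the lead byte */
    while (offset < length && (subject[offset] & 0xc0) == 0x80)
        offset++;                          /* Skip continuation bytes */
    return offset;
}
```

As in the demo, this only makes sense when the pattern was compiled in UTF mode; otherwise each code unit is a character and advancing by one is enough.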

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2grep.html b/src/pcre2/doc/html/pcre2grep.html new file mode 100644 index 00000000..995e0eab --- /dev/null +++ b/src/pcre2/doc/html/pcre2grep.html @@ -0,0 +1,1056 @@ + + +pcre2grep specification + + +

    pcre2grep man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    SYNOPSIS
    +

    +pcre2grep [options] [long options] [pattern] [path1 path2 ...] +

    +
    DESCRIPTION
    +

    +pcre2grep searches files for character patterns, in the same way as other +grep commands do, but it uses the PCRE2 regular expression library to support +patterns that are compatible with the regular expressions of Perl 5. See +pcre2syntax(3) +for a quick-reference summary of pattern syntax, or +pcre2pattern(3) +for a full description of the syntax and semantics of the regular expressions +that PCRE2 supports. +

    +

    +Patterns, whether supplied on the command line or in a separate file, are given +without delimiters. For example: +

    +  pcre2grep Thursday /etc/motd
    +
    +If you attempt to use delimiters (for example, by surrounding a pattern with +slashes, as is common in Perl scripts), they are interpreted as part of the +pattern. Quotes can of course be used to delimit patterns on the command line +because they are interpreted by the shell, and indeed quotes are required if a +pattern contains white space or shell metacharacters. +

    +

    +The first argument that follows any option settings is treated as the single +pattern to be matched when neither -e nor -f is present. +Conversely, when one or both of these options are used to specify patterns, all +arguments are treated as path names. At least one of -e, -f, or an +argument pattern must be provided. +

    +

    +If no files are specified, pcre2grep reads the standard input. The +standard input can also be referenced by a name consisting of a single hyphen. +For example: +

    +  pcre2grep some-pattern file1 - file3
    +
    +Input files are searched line by line. By default, each line that matches a +pattern is copied to the standard output, and if there is more than one file, +the file name is output at the start of each line, followed by a colon. +However, there are options that can change how pcre2grep behaves. In +particular, the -M option makes it possible to search for strings that +span line boundaries. What defines a line boundary is controlled by the +-N (--newline) option. +

    +

    +The amount of memory used for buffering files that are being scanned is +controlled by parameters that can be set by the --buffer-size and +--max-buffer-size options. The first of these sets the size of buffer +that is obtained at the start of processing. If an input file contains very +long lines, a larger buffer may be needed; this is handled by automatically +extending the buffer, up to the limit specified by --max-buffer-size. The +default values for these parameters can be set when pcre2grep is +built; if nothing is specified, the defaults are set to 20KiB and 1MiB +respectively. An error occurs if a line is too long and the buffer can no +longer be expanded. +

    +

    +The block of memory that is actually used is three times the "buffer size", to +allow for buffering "before" and "after" lines. If the buffer size is too +small, fewer than requested "before" and "after" lines may be output. +

    +

    +Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater. +BUFSIZ is defined in <stdio.h>. When there is more than one pattern +(specified by the use of -e and/or -f), each pattern is applied to +each line in the order in which they are defined, except that all the -e +patterns are tried before the -f patterns. +

    +

    +By default, as soon as one pattern matches a line, no further patterns are +considered. However, if --colour (or --color) is used to colour the +matching substrings, or if --only-matching, --file-offsets, or +--line-offsets is used to output only the part of the line that matched +(either shown literally, or as an offset), scanning resumes immediately +following the match, so that further matches on the same line can be found. If +there are multiple patterns, they are all tried on the remainder of the line, +but patterns that follow the one that matched are not tried on the earlier +matched part of the line. +

    +

    +This behaviour means that the order in which multiple patterns are specified +can affect the output when one of the above options is used. This is no longer +the same behaviour as GNU grep, which now manages to display earlier matches +for later patterns (as long as there is no overlap). +

    +

    +Patterns that can match an empty string are accepted, but empty string +matches are never recognized. An example is the pattern "(super)?(man)?", in +which all components are optional. This pattern finds all occurrences of both +"super" and "man"; the output differs from matching with "super|man" when only +the matching substrings are being shown. +

    +

    +If the LC_ALL or LC_CTYPE environment variable is set, +pcre2grep uses the value to set a locale when calling the PCRE2 library. +The --locale option can be used to override this. +

    +
    SUPPORT FOR COMPRESSED FILES
    +

    +It is possible to compile pcre2grep so that it uses libz or +libbz2 to read compressed files whose names end in .gz or +.bz2, respectively. You can find out whether your pcre2grep binary +has support for one or both of these file types by running it with the +--help option. If the appropriate support is not present, all files are +treated as plain text. The standard input is always so treated. When input is +from a compressed .gz or .bz2 file, the --line-buffered option is +ignored. +

    +
    BINARY FILES
    +

    +By default, a file that contains a binary zero byte within the first 1024 bytes +is identified as a binary file, and is processed specially. However, if the +newline type is specified as NUL, that is, the line terminator is a binary +zero, the test for a binary file is not applied. See the --binary-files +option for a means of changing the way binary files are handled. +
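The binary-file test described above can be sketched in a few lines of C. This is an illustration of the stated rule only, not code taken from pcre2grep: the function name looks_binary, its parameters, and the BINARY_CHECK_LEN constant are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define BINARY_CHECK_LEN 1024   /* Only the first 1024 bytes are examined */

/* Hypothetical helper: returns 1 if the data should be treated as binary
   under the rule described in the text, 0 otherwise. When the newline type
   is NUL, zero bytes are line terminators, so the test is not applied. */
int looks_binary(const unsigned char *buf, size_t len, int newline_is_nul)
{
    size_t n;
    if (newline_is_nul) return 0;
    n = (len < BINARY_CHECK_LEN) ? len : BINARY_CHECK_LEN;
    return n > 0 && memchr(buf, 0, n) != NULL;
}
```

A zero byte that first appears after offset 1023 does not mark the file as binary in this sketch, which matches the "within the first 1024 bytes" wording.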

    +
    BINARY ZEROS IN PATTERNS
    +

    +Patterns passed from the command line are strings that are terminated by a +binary zero, so cannot contain internal zeros. However, patterns that are read +from a file via the -f option may contain binary zeros. +

    +
    OPTIONS
    +

    +The order in which some of the options appear can affect the output. For +example, both the -H and -l options affect the printing of file +names. Whichever comes later in the command line will be the one that takes +effect. Similarly, except where noted below, if an option is given twice, the +later setting is used. Numerical values for options may be followed by K or M, +to signify multiplication by 1024 or 1024*1024 respectively. +
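The K and M suffixes on numerical option values can be illustrated with a small parser. This is a sketch under stated assumptions, not the pcre2grep implementation: the name parse_size is hypothetical, only the uppercase suffixes mentioned in the text are accepted, and overflow is reported as an error.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal number optionally followed by K
   (multiply by 1024) or M (multiply by 1024*1024). Returns 0 and stores the
   value on success, or -1 on a malformed or out-of-range value. */
int parse_size(const char *s, unsigned long *result)
{
    char *end;
    unsigned long value;
    unsigned long mult = 1;

    if (*s < '0' || *s > '9') return -1;   /* Reject empty, sign, spaces */
    errno = 0;
    value = strtoul(s, &end, 10);
    if (errno != 0) return -1;

    if (*end == 'K') { mult = 1024UL; end++; }
    else if (*end == 'M') { mult = 1024UL * 1024UL; end++; }

    if (*end != '\0') return -1;           /* Trailing junk */
    if (value > ULONG_MAX / mult) return -1;
    *result = value * mult;
    return 0;
}
```

With this sketch, a value of "20K" yields 20480 and "1M" yields 1048576, which correspond to the 20KiB and 1MiB buffer defaults mentioned earlier.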

    +

    +-- +This terminates the list of options. It is useful if the next item on the +command line starts with a hyphen but is not an option. This allows for the +processing of patterns and file names that start with hyphens. +

    +

    +-A number, --after-context=number +Output up to number lines of context after each matching line. Fewer +lines are output if the next match or the end of the file is reached, or if the +processing buffer size has been set too small. If file names and/or line +numbers are being output, a hyphen separator is used instead of a colon for the +context lines. A line containing "--" is output between each group of lines, +unless they are in fact contiguous in the input file. The value of number +is expected to be relatively small. When -c is used, -A is ignored. +

    +

    +-a, --text +Treat binary files as text. This is equivalent to +--binary-files=text. +

    +

    +-B number, --before-context=number +Output up to number lines of context before each matching line. Fewer +lines are output if the previous match or the start of the file is within +number lines, or if the processing buffer size has been set too small. If +file names and/or line numbers are being output, a hyphen separator is used +instead of a colon for the context lines. A line containing "--" is output +between each group of lines, unless they are in fact contiguous in the input +file. The value of number is expected to be relatively small. When +-c is used, -B is ignored. +

    +

    +--binary-files=word +Specify how binary files are to be processed. If the word is "binary" (the +default), pattern matching is performed on binary files, but the only output is +"Binary file <name> matches" when a match succeeds. If the word is "text", +which is equivalent to the -a or --text option, binary files are +processed in the same way as any other file. In this case, when a match +succeeds, the output may be binary garbage, which can have nasty effects if +sent to a terminal. If the word is "without-match", which is equivalent to the +-I option, binary files are not processed at all; they are assumed not to +be of interest and are skipped without causing any output or affecting the +return code. +

    +

    +--buffer-size=number +Set the parameter that controls how much memory is obtained at the start of +processing for buffering files that are being scanned. See also +--max-buffer-size below. +

    +

    +-C number, --context=number +Output number lines of context both before and after each matching line. +This is equivalent to setting both -A and -B to the same value. +

    +

    +-c, --count +Do not output lines from the files that are being scanned; instead output the +number of lines that would have been shown, either because they matched, or, if +-v is set, because they failed to match. By default, this count is +exactly the same as the number of lines that would have been output, but if the +-M (multiline) option is used (without -v), there may be more +suppressed lines than the count (that is, the number of matches). +
    +
    +If no lines are selected, the number zero is output. If several files are +being scanned, a count is output for each of them and the -t option can +be used to cause a total to be output at the end. However, if the +--files-with-matches option is also used, only those files whose counts +are greater than zero are listed. When -c is used, the -A, +-B, and -C options are ignored. +

    +

    +--colour, --color +If this option is given without any data, it is equivalent to "--colour=auto". +If data is required, it must be given in the same shell item, separated by an +equals sign. +

    +

    +--colour=value, --color=value +This option specifies under what circumstances the parts of a line that matched +a pattern should be coloured in the output. By default, the output is not +coloured. The value (which is optional, see above) may be "never", "always", or +"auto". In the latter case, colouring happens only if the standard output is +connected to a terminal. More resources are used when colouring is enabled, +because pcre2grep has to search for all possible matches in a line, not +just one, in order to colour them all. +
    +
    +The colour that is used can be specified by setting one of the environment +variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or +PCREGREP_COLOR, which are checked in that order. If none of these are set, +pcre2grep looks for GREP_COLORS or GREP_COLOR (in that order). The value +of the variable should be a string of two numbers, separated by a semicolon, +except in the case of GREP_COLORS, which must start with "ms=" or "mt=" +followed by two semicolon-separated colours, terminated by the end of the +string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is +ignored, and GREP_COLOR is checked. +
    +
    +If the string obtained from one of the above variables contains any characters +other than semicolon or digits, the setting is ignored and the default colour +is used. The string is copied directly into the control string for setting +colour on a terminal, so it is your responsibility to ensure that the values +make sense. If no relevant environment variable is set, the default is "1;31", +which gives red. +
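The lookup order and validation described above can be sketched in Python. This is only an illustration of the documented rules, not pcre2grep's own code: the function name `pick_colour` and the use of a plain dictionary in place of the real environment are assumptions made for the example.

```python
import re

DEFAULT_COLOUR = "1;31"  # documented default, which gives red


def pick_colour(env):
    """Choose a colour string from a dict of environment variables."""
    value = None
    # The pcre2grep-specific variables are checked first, in this order.
    for name in ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                 "PCREGREP_COLOUR", "PCREGREP_COLOR"):
        if name in env:
            value = env[name]
            break
    if value is None and "GREP_COLORS" in env:
        # GREP_COLORS must start with "ms=" or "mt="; the colours run up to
        # a colon or the end of the string. Otherwise it is ignored.
        m = re.match(r"m[st]=([^:]*)", env["GREP_COLORS"])
        if m:
            value = m.group(1)
    if value is None:
        value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons means the setting is ignored.
    if value is None or not re.fullmatch(r"[0-9;]*", value):
        return DEFAULT_COLOUR
    return value


print(pick_colour({"GREP_COLORS": "ms=01;34:fn=35"}))  # 01;34
print(pick_colour({"PCREGREP_COLOR": "red"}))          # 1;31
```

Passing `dict(os.environ)` would apply the same rules to the real environment.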

    +

    +-D action, --devices=action +If an input path is not a regular file or a directory, "action" specifies how +it is to be processed. Valid values are "read" (the default) or "skip" +(silently skip the path). +

    +

    +-d action, --directories=action +If an input path is a directory, "action" specifies how it is to be processed. +Valid values are "read" (the default in non-Windows environments, for +compatibility with GNU grep), "recurse" (equivalent to the -r option), or +"skip" (silently skip the path, the default in Windows environments). In the +"read" case, directories are read as if they were ordinary files. In some +operating systems the effect of reading a directory like this is an immediate +end-of-file; in others it may provoke an error. +

    +

    +--depth-limit=number +See --match-limit below. +

    +

    +-e pattern, --regex=pattern, --regexp=pattern +Specify a pattern to be matched. This option can be used multiple times in +order to specify several patterns. It can also be used as a way of specifying a +single pattern that starts with a hyphen. When -e is used, no argument +pattern is taken from the command line; all arguments are treated as file +names. There is no limit to the number of patterns. They are applied to each +line in the order in which they are defined until one matches. +
    +
    +If -f is used with -e, the command line patterns are matched first, +followed by the patterns from the file(s), independent of the order in which +these options are specified. Note that multiple use of -e is not the same +as a single pattern with alternatives. For example, X|Y finds the first +character in a line that is X or Y, whereas if the two patterns are given +separately, with X first, pcre2grep finds X if it is present, even if it +follows Y in the line. It finds Y only if there is no X in the line. This +matters only if you are using -o or --colo(u)r to show the part(s) +of the line that matched. +
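The difference between one pattern with alternatives and two separate patterns can be seen in a small Python sketch. It uses Python's `re` module rather than PCRE2, and the helper names are hypothetical; it only mimics the "try each pattern in order until one matches" rule described above.

```python
import re


def first_match_alternation(line):
    # A single pattern X|Y finds the leftmost character that is X or Y.
    return re.search(r"X|Y", line)


def first_match_separate(line, patterns=("X", "Y")):
    # Separate patterns are tried in the order given; the first pattern
    # that matches anywhere in the line wins, wherever its match lies.
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m
    return None


line = "aYbX"
m1 = first_match_alternation(line)
m2 = first_match_separate(line)
print(m1.group(), m1.start())  # Y 1
print(m2.group(), m2.start())  # X 3
```

Both approaches select the same lines; only the reported part of the line differs, which is why the distinction matters only with -o or --colo(u)r.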

    +

    +--exclude=pattern +Files (but not directories) whose names match the pattern are skipped without +being processed. This applies to all files, whether listed on the command line, +obtained from --file-list, or by scanning a directory. The pattern is a +PCRE2 regular expression, and is matched against the final component of the +file name, not the entire path. The -F, -w, and -x options do +not apply to this pattern. The option may be given any number of times in order +to specify multiple patterns. If a file name matches both an --include +and an --exclude pattern, it is excluded. There is no short form for this +option. +

    +

    +--exclude-from=filename +Treat each non-empty line of the file as the data for an --exclude +option. What constitutes a newline when reading the file is the operating +system's default. The --newline option has no effect on this option. This +option may be given more than once in order to specify a number of files to +read. +

    +

    +--exclude-dir=pattern +Directories whose names match the pattern are skipped without being processed, +whatever the setting of the --recursive option. This applies to all +directories, whether listed on the command line, obtained from +--file-list, or by scanning a parent directory. The pattern is a PCRE2 +regular expression, and is matched against the final component of the directory +name, not the entire path. The -F, -w, and -x options do not +apply to this pattern. The option may be given any number of times in order to +specify more than one pattern. If a directory matches both --include-dir +and --exclude-dir, it is excluded. There is no short form for this +option. +

    +

    +-F, --fixed-strings +Interpret each data-matching pattern as a list of fixed strings, separated by +newlines, instead of as a regular expression. What constitutes a newline for +this purpose is controlled by the --newline option. The -w (match +as a word) and -x (match whole line) options can be used with -F. +They apply to each of the fixed strings. A line is selected if any of the fixed +strings are found in it (subject to -w or -x, if present). This +option applies only to the patterns that are matched against the contents of +files; it does not apply to patterns specified by any of the --include or +--exclude options. +

    +

    +-f filename, --file=filename +Read patterns from the file, one per line, and match them against each line of +input. As is the case with patterns on the command line, no delimiters should +be used. What constitutes a newline when reading the file is the operating +system's default interpretation of \n. The --newline option has no +effect on this option. Trailing white space is removed from each line, and +blank lines are ignored. An empty file contains no patterns and therefore +matches nothing. Patterns read from a file in this way may contain binary +zeros, which are treated as ordinary data characters. See also the comments +about multiple patterns versus a single pattern with alternatives in the +description of -e above. +
    +
    +If this option is given more than once, all the specified files are read. A +data line is output if any of the patterns match it. A file name can be given +as "-" to refer to the standard input. When -f is used, patterns +specified on the command line using -e may also be present; they are +tested before the file's patterns. However, no other pattern is taken from the +command line; all arguments are treated as the names of paths to be searched. +

    +

    +--file-list=filename +Read a list of files and/or directories that are to be scanned from the given +file, one per line. What constitutes a newline when reading the file is the +operating system's default. Trailing white space is removed from each line, and +blank lines are ignored. These paths are processed before any that are listed +on the command line. The file name can be given as "-" to refer to the standard +input. If --file and --file-list are both specified as "-", +patterns are read first. This is useful only when the standard input is a +terminal, from which further lines (the list of files) can be read after an +end-of-file indication. If this option is given more than once, all the +specified files are read. +

    +

    +--file-offsets +Instead of showing lines or parts of lines that match, show each match as an +offset from the start of the file and a length, separated by a comma. In this +mode, no context is shown. That is, the -A, -B, and -C +options are ignored. If there is more than one match in a line, each of them is +shown separately. This option is mutually exclusive with --output, +--line-offsets, and --only-matching. +
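As a rough illustration of the "offset,length" form, the following Python sketch scans text line by line and reports every match relative to the start of the data. It rests on several assumptions: offsets are taken to be zero-based, the input is ASCII so that characters and bytes coincide, lines end with "\n", and the function name `file_offsets` is made up for the example.

```python
import re


def file_offsets(data, pattern):
    """Return 'offset,length' strings for every match, one line at a time."""
    results = []
    line_start = 0  # offset of the current line from the start of the data
    for line in data.split("\n"):
        for m in re.finditer(pattern, line):
            results.append(f"{line_start + m.start()},{m.end() - m.start()}")
        line_start += len(line) + 1  # +1 for the newline that was split off
    return results


print(file_offsets("one cat\ntwo cats cat\n", r"cat"))
# ['4,3', '12,3', '17,3']
```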

    +

    +-H, --with-filename +Force the inclusion of the file name at the start of output lines when +searching a single file. By default, the file name is not shown in this case. +For matching lines, the file name is followed by a colon; for context lines, a +hyphen separator is used. If a line number is also being output, it follows the +file name. When the -M option causes a pattern to match more than one +line, only the first is preceded by the file name. This option overrides any +previous -h, -l, or -L options. +

    +

    +-h, --no-filename +Suppress the output file names when searching multiple files. By default, +file names are shown when multiple files are searched. For matching lines, the +file name is followed by a colon; for context lines, a hyphen separator is used. +If a line number is also being output, it follows the file name. This option +overrides any previous -H, -L, or -l options. +

    +

    +--heap-limit=number +See --match-limit below. +

    +

    +--help +Output a help message, giving brief details of the command options and file +type support, and then exit. Anything else on the command line is +ignored. +

    +

    +-I +Ignore binary files. This is equivalent to +--binary-files=without-match. +

    +

    +-i, --ignore-case +Ignore upper/lower case distinctions during comparisons. +

    +

    +--include=pattern +If any --include patterns are specified, the only files that are +processed are those whose names match one of the patterns and do not match an +--exclude pattern. This option does not affect directories, but it +applies to all files, whether listed on the command line, obtained from +--file-list, or by scanning a directory. The pattern is a PCRE2 regular +expression, and is matched against the final component of the file name, not +the entire path. The -F, -w, and -x options do not apply to +this pattern. The option may be given any number of times. If a file name +matches both an --include and an --exclude pattern, it is excluded. +There is no short form for this option. +

    +

    +--include-from=filename +Treat each non-empty line of the file as the data for an --include +option. What constitutes a newline for this purpose is the operating system's +default. The --newline option has no effect on this option. This option +may be given any number of times; all the files are read. +

    +

    +--include-dir=pattern +If any --include-dir patterns are specified, the only directories that +are processed are those whose names match one of the patterns and do not match +an --exclude-dir pattern. This applies to all directories, whether listed +on the command line, obtained from --file-list, or by scanning a parent +directory. The pattern is a PCRE2 regular expression, and is matched against +the final component of the directory name, not the entire path. The -F, +-w, and -x options do not apply to this pattern. The option may be +given any number of times. If a directory matches both --include-dir and +--exclude-dir, it is excluded. There is no short form for this option. +

    +

    +-L, --files-without-match +Instead of outputting lines from the files, just output the names of the files +that do not contain any lines that would have been output. Each file name is +output once, on a separate line. This option overrides any previous -H, +-h, or -l options. +

    +

    +-l, --files-with-matches +Instead of outputting lines from the files, just output the names of the files +containing lines that would have been output. Each file name is output once, on +a separate line. Searching normally stops as soon as a matching line is found +in a file. However, if the -c (count) option is also used, matching +continues in order to obtain the correct count, and those files that have at +least one match are listed along with their counts. Using this option with +-c is a way of suppressing the listing of files with no matches that +occurs with -c on its own. This option overrides any previous -H, +-h, or -L options. +

    +

    +--label=name +This option supplies a name to be used for the standard input when file names +are being output. If not supplied, "(standard input)" is used. There is no +short form for this option. +

    +

    +--line-buffered +When this option is given, non-compressed input is read and processed line by +line, and the output is flushed after each write. By default, input is read in +large chunks, unless pcre2grep can determine that it is reading from a +terminal, which is currently possible only in Unix-like environments or +Windows. Output to a terminal is normally automatically flushed by the operating +system. This option can be useful when the input or output is attached to a +pipe and you do not want pcre2grep to buffer up large amounts of data. +However, its use will affect performance, and the -M (multiline) option +ceases to work. When input is from a compressed .gz or .bz2 file, +--line-buffered is ignored. +

    +

    +--line-offsets +Instead of showing lines or parts of lines that match, show each match as a +line number, the offset from the start of the line, and a length. The line +number is terminated by a colon (as usual; see the -n option), and the +offset and length are separated by a comma. In this mode, no context is shown. +That is, the -A, -B, and -C options are ignored. If there is +more than one match in a line, each of them is shown separately. This option is +mutually exclusive with --output, --file-offsets, and +--only-matching. +

    +

    +--locale=locale-name +This option specifies a locale to be used for pattern matching. It overrides +the value in the LC_ALL or LC_CTYPE environment variables. If no +locale is specified, the PCRE2 library's default (usually the "C" locale) is +used. There is no short form for this option. +

    +

    +-M, --multiline +Allow patterns to match more than one line. When this option is set, the PCRE2 +library is called in "multiline" mode. This allows a matched string to extend +past the end of a line and continue on one or more subsequent lines. Patterns +used with -M may usefully contain literal newline characters and internal +occurrences of ^ and $ characters. The output for a successful match may +consist of more than one line. The first line is the line in which the match +started, and the last line is the line in which the match ended. If the matched +string ends with a newline sequence, the output ends at the end of that line. +If -v is set, none of the lines in a multi-line match are output. Once a +match has been handled, scanning restarts at the beginning of the line after +the one in which the match ended. +
    +
    +The newline sequence that separates multiple lines must be matched as part of +the pattern. For example, to find the phrase "regular expression" in a file +where "regular" might be at the end of a line and "expression" at the start of +the next line, you could use this command: +

    +  pcre2grep -M 'regular\s+expression' <file>
    +
    +The \s escape sequence matches any white space character, including newlines, +and is followed by + so as to match trailing white space on the first line as +well as possibly handling a two-character newline sequence. +
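The way a multi-line match is turned into output lines can be sketched as follows. This is a simplified Python model, not pcre2grep's implementation: it assumes "\n" line endings, holds the whole input in memory, and uses a hypothetical function name.

```python
import re


def multiline_match_lines(text, pattern):
    """Return, for each match, the whole lines from its start to its end."""
    rx = re.compile(pattern)
    blocks = []
    pos = 0
    while pos < len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        # Output begins at the start of the line in which the match started.
        start = text.rfind("\n", 0, m.start()) + 1
        # It ends at the end of the line holding the last matched character,
        # so a match that ends with a newline stops at that line.
        last = max(m.end() - 1, m.start())
        end = text.find("\n", last)
        if end == -1:
            end = len(text)
        blocks.append(text[start:end])
        # Scanning restarts at the beginning of the following line.
        pos = end + 1
    return blocks


sample = "a regular\nexpression here\nregular expression\nnone\n"
print(multiline_match_lines(sample, r"regular\s+expression"))
# ['a regular\nexpression here', 'regular expression']
```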
    +
    +There is a limit to the number of lines that can be matched, imposed by the way +that pcre2grep buffers the input file as it scans it. With a sufficiently +large processing buffer, this should not be a problem, but the -M option +does not work when input is read line by line (see --line-buffered). +

    +

    +-m number, --max-count=number +Stop processing after finding number matching lines, or non-matching +lines if -v is also set. Any trailing context lines are output after the +final match. In multiline mode, each multiline match counts as just one line +for this purpose. If this limit is reached when reading the standard input from +a regular file, the file is left positioned just after the last matching line. +If -c is also set, the count that is output is never greater than +number. This option has no effect if used with -L, -l, or +-q, or when just checking for a match in a binary file. +

    +

    +--match-limit=number +Processing some regular expression patterns may take a very long time to search +for all possible matching strings. Others may require a very large amount of +memory. There are three options that set resource limits for matching. +
    +
    +The --match-limit option provides a means of limiting computing resource +usage when processing patterns that are not going to match, but which have a +very large number of possibilities in their search trees. The classic example +is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a +counter that is incremented each time around its main processing loop. If the +value set by --match-limit is reached, an error occurs. +
    +
    +The --heap-limit option specifies, as a number of kibibytes (units of +1024 bytes), the amount of heap memory that may be used for matching. Heap +memory is needed only if matching the pattern requires a significant number of +nested backtracking points to be remembered. This parameter can be set to zero +to forbid the use of heap memory altogether. +
    +
    +The --depth-limit option limits the depth of nested backtracking points, +which indirectly limits the amount of memory that is used. The amount of memory +needed for each backtracking point depends on the number of capturing +parentheses in the pattern, so the amount of memory that is used before this +limit acts varies from pattern to pattern. This limit is of use only if it is +set smaller than --match-limit. +
    +
    +There are no short forms for these options. The default limits can be set +when the PCRE2 library is compiled; if they are not specified, the defaults +are very large and so effectively unlimited. +

    +

    +--max-buffer-size=number +This limits the expansion of the processing buffer, whose initial size can be +set by --buffer-size. The maximum buffer size is silently forced to be no +smaller than the starting buffer size. +

    +

    +-N newline-type, --newline=newline-type +Six different conventions for indicating the ends of lines in scanned files are +supported. For example: +

    +  pcre2grep -N CRLF 'some pattern' <file>
    +
    +The newline type may be specified in upper, lower, or mixed case. If the +newline type is NUL, lines are separated by binary zero characters. The other +types are the single-character sequences CR (carriage return) and LF +(linefeed), the two-character sequence CRLF, an "anycrlf" type, which +recognizes any of the preceding three types, and an "any" type, for which any +Unicode line ending sequence is assumed to end a line. The Unicode sequences +are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed, +U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS +(paragraph separator, U+2029). +
    +
    +When the PCRE2 library is built, a default line-ending sequence is specified. +This is normally the standard sequence for the operating system. Unless +otherwise specified by this option, pcre2grep uses the library's default. +
    +
    +This option makes it possible to use pcre2grep to scan files that have +come from other environments without having to modify their line endings. If +the data that is being scanned does not agree with the convention set by this +option, pcre2grep may behave in strange ways. Note that this option does +not apply to files specified by the -f, --exclude-from, or +--include-from options, which are expected to use the operating system's +standard newline sequence. +

    +

    +-n, --line-number +Precede each output line by its line number in the file, followed by a colon +for matching lines or a hyphen for context lines. If the file name is also +being output, it precedes the line number. When the -M option causes a +pattern to match more than one line, only the first is preceded by its line +number. This option is forced if --line-offsets is used. +

    +

    +--no-jit +If the PCRE2 library is built with support for just-in-time compiling (which +speeds up matching), pcre2grep automatically makes use of this, unless it +was explicitly disabled at build time. This option can be used to disable the +use of JIT at run time. It is provided for testing and working round problems. +It should never be needed in normal use. +

    +

    +-O text, --output=text +When there is a match, instead of outputting the line that matched, output just +the text specified in this option, followed by an operating-system standard +newline. In this mode, no context is shown. That is, the -A, -B, +and -C options are ignored. The --newline option has no effect on +this option, which is mutually exclusive with --only-matching, +--file-offsets, and --line-offsets. However, like +--only-matching, if there is more than one match in a line, each of them +causes a line of output. +
    +
    +Escape sequences starting with a dollar character may be used to insert the +contents of the matched part of the line and/or captured substrings into the +text. +
    +
    +$<digits> or ${<digits>} is replaced by the captured substring of the given +decimal number; zero substitutes the whole match. If the number is greater than +the number of capturing substrings, or if the capture is unset, the replacement +is empty. +
    +
    +$a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by +newline; $r by carriage return; $t by tab; $v by vertical tab. +
    +
    +$o<digits> or $o{<digits>} is replaced by the character whose code point is the +given octal number. In the first form, up to three octal digits are processed. +When more digits are needed in Unicode mode to specify a wide character, the +second form must be used. +
    +
    +$x<digits> or $x{<digits>} is replaced by the character represented by the +given hexadecimal number. In the first form, up to two hexadecimal digits are +processed. When more digits are needed in Unicode mode to specify a wide +character, the second form must be used. +
    +
    +Any other character is substituted by itself. In particular, $$ is replaced by +a single dollar. +
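The dollar escapes listed above can be modelled with a short Python sketch. It is a simplification for illustration only: the function name `expand_output` is hypothetical, a Python `re` match object stands in for a PCRE2 match, and malformed input such as a trailing lone dollar is left untouched here rather than diagnosed.

```python
import re

# Single-letter escapes: bell, backspace, escape, form feed, newline,
# carriage return, tab, vertical tab.
SIMPLE = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t", "v": "\v"}

TOKEN = re.compile(
    r"\$(?:"
    r"\{(?P<bnum>[0-9]+)\}|(?P<num>[0-9]+)"                      # ${n} or $n
    r"|o\{(?P<boct>[0-7]+)\}|o(?P<oct>[0-7]{1,3})"               # $o{..} or $o..
    r"|x\{(?P<bhex>[0-9A-Fa-f]+)\}|x(?P<hex>[0-9A-Fa-f]{1,2})"   # $x{..} or $x..
    r"|(?P<other>.)"                                             # anything else
    r")",
    re.DOTALL,
)


def expand_output(text, match):
    """Expand $-escapes in text using the groups of a match object."""
    def repl(tok):
        num = tok.group("bnum") or tok.group("num")
        if num is not None:
            n = int(num)
            if n > match.re.groups:
                return ""                 # no such capture group
            return match.group(n) or ""   # an unset group gives an empty string
        octal = tok.group("boct") or tok.group("oct")
        if octal is not None:
            return chr(int(octal, 8))
        hexa = tok.group("bhex") or tok.group("hex")
        if hexa is not None:
            return chr(int(hexa, 16))
        ch = tok.group("other")
        return SIMPLE.get(ch, ch)         # $$ becomes a single dollar
    return TOKEN.sub(repl, text)


m = re.search(r"(\w+)@(\w+)", "mail bob@example now")
print(expand_output("$2:$1 [$0]", m))  # example:bob [bob@example]
```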

    +

    +-o, --only-matching +Show only the part of the line that matched a pattern instead of the whole +line. In this mode, no context is shown. That is, the -A, -B, and +-C options are ignored. If there is more than one match in a line, each +of them is shown separately, on a separate line of output. If -o is +combined with -v (invert the sense of the match to find non-matching +lines), no output is generated, but the return code is set appropriately. If +the matched portion of the line is empty, nothing is output unless the file +name or line number are being printed, in which case they are shown on an +otherwise empty line. This option is mutually exclusive with --output, +--file-offsets and --line-offsets. +

    +

    +-onumber, --only-matching=number +Show only the part of the line that matched the capturing parentheses of the +given number. Up to 50 capturing parentheses are supported by default. This +limit can be changed via the --om-capture option. A pattern may contain +any number of capturing parentheses, but only those whose number is within the +limit can be accessed by -o. An error occurs if the number specified by +-o is greater than the limit. +
    +
    +-o0 is the same as -o without a number. Because these options can be +given without an argument (see above), if an argument is present, it must be +given in the same shell item, for example, -o3 or --only-matching=2. The +comments given for the non-argument case above also apply to this option. If +the specified capturing parentheses do not exist in the pattern, or were not +set in the match, nothing is output unless the file name or line number are +being output. +
    +
    +If this option is given multiple times, multiple substrings are output for each +match, in the order the options are given, and all on one line. For example, +-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and +then 3 again to be output. By default, there is no separator (but see the next +but one option). +

    +

    +--om-capture=number +Set the number of capturing parentheses that can be accessed by -o. The +default is 50. +

    +

    +--om-separator=text +Specify a separating string for multiple occurrences of -o. The default +is an empty string. Separating strings are never coloured. +
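The effect of repeating -o with capture numbers, together with a separator, can be pictured with a brief Python sketch. The function name `only_matching` and the way unset groups are joined are assumptions for illustration; the sketch does not reproduce pcre2grep's exact handling of empty substrings.

```python
import re


def only_matching(line, pattern, groups, separator=""):
    """Return one output string per match, built from the requested groups."""
    output = []
    for m in re.finditer(pattern, line):
        # Substrings appear in the order the -o options were given.
        output.append(separator.join(m.group(g) or "" for g in groups))
    return output


# Models: -o3 -o1 -o3 --om-separator=/
print(only_matching("2024-05-17", r"(\d+)-(\d+)-(\d+)", [3, 1, 3], "/"))
# ['17/2024/17']
```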

    +

    +-q, --quiet +Work quietly, that is, display nothing except error messages. The exit +status indicates whether or not any matches were found. +

    +

    +-r, --recursive +If any given path is a directory, recursively scan the files it contains, +taking note of any --include and --exclude settings. By default, a +directory is read as a normal file; in some operating systems this gives an +immediate end-of-file. This option is a shorthand for setting the -d +option to "recurse". +

    +

    +--recursion-limit=number +This is an obsolete synonym for --depth-limit. See --match-limit +above for details. +

    +

    +-s, --no-messages +Suppress error messages about non-existent or unreadable files. Such files are +quietly skipped. However, the return code is still 2, even if matches were +found in other files. +

    +

    +-t, --total-count +This option is useful when scanning more than one file. If used on its own, +-t suppresses all output except for a grand total number of matching +lines (or non-matching lines if -v is used) in all the files. If -t +is used with -c, a grand total is output except when the previous output +is just one line. In other words, it is not output when just one file's count +is listed. If file names are being output, the grand total is preceded by +"TOTAL:". Otherwise, it appears as just another number. The -t option is +ignored when used with -L (list files without matches), because the grand +total would always be zero. +

    +

    +-u, --utf +Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled +with UTF-8 support. All patterns (including those for any --exclude and +--include options) and all lines that are scanned must be valid strings +of UTF-8 characters. If an invalid UTF-8 string is encountered, an error +occurs. +

    +

    +-U, --utf-allow-invalid +As --utf, but in addition subject lines may contain invalid UTF-8 code +unit sequences. These can never form part of any pattern match. Patterns +themselves, however, must still be valid UTF-8 strings. This facility allows +valid UTF-8 strings to be sought within arbitrary byte sequences in executable +or other binary files. For more details about matching in non-valid UTF-8 +strings, see the +pcre2unicode(3) +documentation. +

    +

    +-V, --version +Write the version numbers of pcre2grep and the PCRE2 library to the +standard output and then exit. Anything else on the command line is +ignored. +

    +

    +-v, --invert-match +Invert the sense of the match, so that lines which do not match any of +the patterns are the ones that are found. When this option is set, options such +as --only-matching and --output, which specify parts of a match +that are to be output, are ignored. +

    +

    +-w, --word-regex, --word-regexp +Force the patterns only to match "words". That is, there must be a word +boundary at the start and end of each matched string. This is equivalent to +having "\b(?:" at the start of each pattern, and ")\b" at the end. This +option applies only to the patterns that are matched against the contents of +files; it does not apply to patterns specified by any of the --include or +--exclude options. +

    +

    +-x, --line-regex, --line-regexp +Force the patterns to start matching only at the beginnings of lines, and in +addition, require them to match entire lines. In multiline mode the match may +be more than one line. This is equivalent to having "^(?:" at the start of each +pattern and ")$" at the end. This option applies only to the patterns that are +matched against the contents of files; it does not apply to patterns specified +by any of the --include or --exclude options. +
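The equivalences given for -w and -x can be tried out with Python's `re` module, which supports the same `\b`, `^`, `$`, and `(?:...)` constructs. The helper names are hypothetical. The non-capturing group matters when the pattern contains alternatives, as the last line shows.

```python
import re


def as_word(pattern):
    # Equivalent of -w: a word boundary on each side of the whole pattern.
    return r"\b(?:" + pattern + r")\b"


def as_line(pattern):
    # Equivalent of -x: the whole pattern must match the entire line.
    return r"^(?:" + pattern + r")$"


print(bool(re.search(as_word("cat|dog"), "a dog barks")))   # True
print(bool(re.search(as_word("cat|dog"), "hotdog stand")))  # False
print(bool(re.search(as_line("cat|dog"), "dog")))           # True
print(bool(re.search(as_line("cat|dog"), "cats")))          # False
# Without the group, ^ and $ would bind to one alternative each:
print(bool(re.search(r"^cat|dog$", "cats")))                # True
```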

    +
    ENVIRONMENT VARIABLES
    +

    +The environment variables LC_ALL and LC_CTYPE are examined, in that +order, for a locale. The first one that is set is used. This can be overridden +by the --locale option. If no locale is set, the PCRE2 library's default +(usually the "C" locale) is used. +

    +
    NEWLINES
    +

    +The -N (--newline) option allows pcre2grep to scan files with +newline conventions that differ from the default. This option affects only the +way scanned files are processed. It does not affect the interpretation of files +specified by the -f, --file-list, --exclude-from, or +--include-from options. +

    +

    +Any parts of the scanned input files that are written to the standard output +are copied with whatever newline sequences they have in the input. However, if +the final line of a file is output, and it does not end with a newline +sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF +or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a +single NL is used. +

    +

    +The newline setting does not affect the way in which pcre2grep writes +newlines in informational messages to the standard output and error streams. +Under Windows, the standard output is set to be binary, so that "\r\n" at the +ends of output lines that are copied from the input is not converted to +"\r\r\n" by the C I/O library. This means that any messages written to the +standard output must end with "\r\n". For all other operating systems, and +for all messages to the standard error stream, "\n" is used. +

    +
    OPTIONS COMPATIBILITY
    +

    +Many of the short and long forms of pcre2grep's options are the same +as in the GNU grep program. Any long option of the form +--xxx-regexp (GNU terminology) is also available as --xxx-regex +(PCRE2 terminology). However, the --depth-limit, --file-list, +--file-offsets, --heap-limit, --include-dir, +--line-offsets, --locale, --match-limit, -M, +--multiline, -N, --newline, --om-separator, +--output, -u, --utf, -U, and --utf-allow-invalid +options are specific to pcre2grep, as is the use of the +--only-matching option with a capturing parentheses number. +

    +

    +Although most of the common options work the same way, a few are different in +pcre2grep. For example, the --include option's argument is a glob +for GNU grep, but a regular expression for pcre2grep. If both the +-c and -l options are given, GNU grep lists only file names, +without counts, but pcre2grep gives the counts as well. +

    +
    OPTIONS WITH DATA
    +

    +There are four different ways in which an option with data can be specified. +If a short form option is used, the data may follow immediately, or (with one +exception) in the next command line item. For example: +

    +  -f/some/file
    +  -f /some/file
    +
    +The exception is the -o option, which may appear with or without data. +Because of this, if data is present, it must follow immediately in the same +item, for example -o3. +

    +

    +If a long form option is used, the data may appear in the same command line +item, separated by an equals character, or (with two exceptions) it may appear +in the next command line item. For example: +

    +  --file=/some/file
    +  --file /some/file
    +
    +Note, however, that if you want to supply a file name beginning with ~ as data +in a shell command, and have the shell expand ~ to a home directory, you must +separate the file name from the option, because the shell does not treat ~ +specially unless it is at the start of an item. +

    +

    +The exceptions to the above are the --colour (or --color) and +--only-matching options, for which the data is optional. If one of these +options does have data, it must be given in the first form, using an equals +character. Otherwise pcre2grep will assume that it has no data. +

    +
    USING PCRE2'S CALLOUT FACILITY
    +

    +pcre2grep has, by default, support for calling external programs or +scripts or echoing specific strings during matching by making use of PCRE2's +callout facility. However, this support can be completely or partially disabled +when pcre2grep is built. You can find out whether your binary has support +for callouts by running it with the --help option. If callout support is +completely disabled, all callouts in patterns are ignored by pcre2grep. +If the facility is partially disabled, calling external programs is not +supported, and callouts that request it are ignored. +

    +

    +A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is +either a number or a quoted string (see the +pcre2callout +documentation for details). Numbered callouts are ignored by pcre2grep; +only callouts with string arguments are useful. +

    +
    +Echoing a specific string +
    +

    +Starting the callout string with a pipe character invokes an echoing facility +that avoids calling an external program or script. This facility is always +available, provided that callouts were not completely disabled when +pcre2grep was built. The rest of the callout string is processed as a +zero-terminated string, which means it should not contain any internal binary +zeros. It is written to the output, having first been passed through the same +escape processing as text from the --output (-O) option (see +above). However, $0 cannot be used to insert a matched substring because the +match is still in progress. Instead, the single character '0' is inserted. Any +syntax error in the string (for example, a dollar not followed by another +character) causes the callout to be ignored. No terminator is added to the +output string, so if you want a newline, you must include it explicitly using +the escape $n. For example: +

    +  pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
    +
    +Matching continues normally after the string is output. If you want to see only +the callout output but not any output from an actual match, you should end the +pattern with (*FAIL). +

    +
    +Calling external programs or scripts +
    +

    +This facility can be independently disabled when pcre2grep is built. It +is supported for Windows, where a call to _spawnvp() is used, for VMS, +where lib$spawn() is used, and for any Unix-like environment where +fork() and execv() are available. +

    +

    +If the callout string does not start with a pipe (vertical bar) character, it +is parsed into a list of substrings separated by pipe characters. The first +substring must be an executable name, with the following substrings specifying +arguments: +

    +  executable_name|arg1|arg2|...
    +
    +Any substring (including the executable name) may contain escape sequences +started by a dollar character. These are the same as for the --output +(-O) option documented above, except that $0 cannot insert the matched +string because the match is still in progress. Instead, the character '0' +is inserted. If you need a literal dollar or pipe character in any +substring, use $$ or $| respectively. Here is an example: +
    +  echo -e "abcde\n12345" | pcre2grep \
    +    '(?x)(.)(..(.))
    +    (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
    +
    +  Output:
    +
    +    Arg1: [a] [bcd] [d] Arg2: |a| ()
    +    abcde
    +    Arg1: [1] [234] [4] Arg2: |1| ()
    +    12345
    +
    +The parameters for the system call that is used to run the program or script +are zero-terminated strings. This means that binary zero characters in the +callout argument will cause premature termination of their substrings, and +therefore should not be present. Any syntax error in the string (for example, +a dollar not followed by another character) causes the callout to be ignored. +If running the program fails for any reason (including the non-existence of the +executable), a local matching failure occurs and the matcher backtracks in the +normal way. +
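The splitting rule described above can be sketched in C. This is only an illustration, not pcre2grep's own code: the function name split_callout and the limits MAX_PARTS and MAX_PART_LEN are made up for the example. It resolves only the two literal escapes ($| and $$) and copies every other dollar sequence through unchanged, so it performs no capture substitution; a fuller version would expand captures in the same pass.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PARTS 16      /* hypothetical limit, chosen for this sketch */
#define MAX_PART_LEN 256  /* hypothetical limit, chosen for this sketch */

/* Splits `callout` at unescaped pipe characters into parts[0..n-1].
   "$|" yields a literal pipe and "$$" a literal dollar; any other dollar
   sequence (such as $1 or ${1}) is copied through unchanged.
   Returns the number of parts, or -1 if there are too many parts or a
   part does not fit in MAX_PART_LEN bytes. */
int split_callout(const char *callout, char parts[MAX_PARTS][MAX_PART_LEN])
{
    int count = 0;
    size_t len = 0;

    for (const char *p = callout; ; p++) {
        if (*p == '|' || *p == '\0') {
            parts[count][len] = '\0';      /* close the current part */
            count++;
            if (*p == '\0')
                return count;
            if (count == MAX_PARTS)
                return -1;                 /* no room for another part */
            len = 0;
            continue;
        }

        char c = *p;
        if (c == '$' && (p[1] == '|' || p[1] == '$'))
            c = *++p;                      /* keep the escaped character, drop the '$' */

        if (len + 1 >= MAX_PART_LEN)
            return -1;                     /* part too long for the buffer */
        parts[count][len++] = c;
    }
}
```

Applied to the callout string from the example above, the sketch yields three parts: the executable name /bin/echo, the first argument with its $1, $2 and $3 still unexpanded, and the second argument with both $| sequences turned into literal pipes.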

    +
    MATCHING ERRORS
    +

    +It is possible to supply a regular expression that takes a very long time to +fail to match certain lines. Such patterns normally involve nested indefinite +repeats, for example: (a+)*\d when matched against a line of a's with no final +digit. The PCRE2 matching function has a resource limit that causes it to abort +in these circumstances. If this happens, pcre2grep outputs an error +message and the line that caused the problem to the standard error stream. If +there are more than 20 such errors, pcre2grep gives up. +

    +

    +The --match-limit option of pcre2grep can be used to set the +overall resource limit. There are also other limits that affect the amount of +memory used during matching; see the discussion of --heap-limit and +--depth-limit above. +
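To see why a pattern such as (a+)*\d is expensive on a line of a's, and what a resource limit buys, here is a toy backtracking matcher in C. It is hard-wired to that one pattern and is not how PCRE2 is implemented; the function name match_nested, the step counter, and the limit parameter are assumptions made for the illustration, with the counter standing in loosely for the match limit.

```c
#include <stddef.h>

/* Toy backtracking matcher for the single pattern (a+)*\d, anchored at `s`.
   Every call counts as one step; once *steps exceeds `limit` the search is
   abandoned. Returns 1 for a match, 0 for no match, -1 if the limit was hit. */
int match_nested(const char *s, unsigned long *steps, unsigned long limit)
{
    size_t run = 0;

    if (++*steps > limit)
        return -1;

    /* Length of the run of 'a' characters at the current position. */
    while (s[run] == 'a')
        run++;

    /* Greedy (a+): try the longest run first, then give characters back. */
    for (size_t len = run; len >= 1; len--) {
        int rc = match_nested(s + len, steps, limit);
        if (rc != 0)
            return rc;   /* either a match or the limit was hit */
    }

    /* No further iteration of the group: the next character must be a digit. */
    return (s[0] >= '0' && s[0] <= '9') ? 1 : 0;
}
```

In this sketch a subject of n a's with no digit takes 2^n steps before failing, so each extra character doubles the work, while a subject such as "aaa1" matches in two steps. A limit turns the runaway case into a prompt error return instead of a very long wait.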

    +
    DIAGNOSTICS
    +

    +Exit status is 0 if any matches were found, 1 if no matches were found, and 2 +for syntax errors, overlong lines, non-existent or inaccessible files (even if +matches were found in other files) or too many matching errors. Using the +-s option to suppress error messages about inaccessible files does not +affect the return code. +

    +

    +When run under VMS, the return code is placed in the symbol PCRE2GREP_RC +because VMS does not distinguish between exit(0) and exit(1). +

    +
    SEE ALSO
    +

    +pcre2pattern(3), pcre2syntax(3), pcre2callout(3), +pcre2unicode(3). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 04 October 2020 +
    +Copyright © 1997-2020 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2jit.html b/src/pcre2/doc/html/pcre2jit.html new file mode 100644 index 00000000..423dfd83 --- /dev/null +++ b/src/pcre2/doc/html/pcre2jit.html @@ -0,0 +1,474 @@ + + +pcre2jit specification + + +

    pcre2jit man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    PCRE2 JUST-IN-TIME COMPILER SUPPORT
    +

    +Just-in-time compiling is a heavyweight optimization that can greatly speed up +pattern matching. However, it comes at the cost of extra processing before the +match is performed, so it is of most benefit when the same pattern is going to +be matched many times. This does not necessarily mean many calls of a matching +function; if the pattern is not anchored, matching attempts may take place many +times at various positions in the subject, even for a single call. Therefore, +if the subject string is very long, it may still pay to use JIT even for +one-off matches. JIT support is available for all of the 8-bit, 16-bit and +32-bit PCRE2 libraries. +

    +

    +JIT support applies only to the traditional Perl-compatible matching function. +It does not apply when the DFA matching function is being used. The code for +this support was written by Zoltan Herczeg. +

    +
    AVAILABILITY OF JIT SUPPORT
    +

    +JIT support is an optional feature of PCRE2. The "configure" option +--enable-jit (or equivalent CMake option) must be set when PCRE2 is built if +you want to use JIT. The support is limited to the following hardware +platforms: +

    +  ARM 32-bit (v5, v7, and Thumb2)
    +  ARM 64-bit
    +  Intel x86 32-bit and 64-bit
    +  MIPS 32-bit and 64-bit
    +  Power PC 32-bit and 64-bit
    +  SPARC 32-bit
    +
    +If --enable-jit is set on an unsupported platform, compilation fails. +

    +

    +A program can tell if JIT support is available by calling pcre2_config() +with the PCRE2_CONFIG_JIT option. The result is 1 when JIT is available, and 0 +otherwise. However, a simple program does not need to check this in order to +use JIT. The API is implemented in a way that falls back to the interpretive +code if JIT is not available. For programs that need the best possible +performance, there is also a "fast path" API that is JIT-specific. +

    +
    SIMPLE USE OF JIT
    +

    +To make use of the JIT support in the simplest way, all you have to do is to +call pcre2_jit_compile() after successfully compiling a pattern with +pcre2_compile(). This function has two arguments: the first is the +compiled pattern pointer that was returned by pcre2_compile(), and the +second is zero or more of the following option bits: PCRE2_JIT_COMPLETE, +PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT. +

    +

    +If JIT support is not available, a call to pcre2_jit_compile() does +nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled pattern +is passed to the JIT compiler, which turns it into machine code that executes +much faster than the normal interpretive code, but yields exactly the same +results. The returned value from pcre2_jit_compile() is zero on success, +or a negative error code. +

    +

    +There is a limit to the size of pattern that JIT supports, imposed by the size +of machine stack that it uses. The exact rules are not documented because they +may change at any time, in particular, when new optimizations are introduced. +If a pattern is too big, a call to pcre2_jit_compile() returns +PCRE2_ERROR_NOMEMORY. +

    +

    +PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for complete +matches. If you want to run partial matches using the PCRE2_PARTIAL_HARD or +PCRE2_PARTIAL_SOFT options of pcre2_match(), you should set one or both +of the other options as well as, or instead of PCRE2_JIT_COMPLETE. The JIT +compiler generates different optimized code for each of the three modes +(normal, soft partial, hard partial). When pcre2_match() is called, the +appropriate code is run if it is available. Otherwise, the pattern is matched +using interpretive code. +

    +

    +You can call pcre2_jit_compile() multiple times for the same compiled +pattern. It does nothing if it has previously compiled code for any of the +option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and +(perhaps later, when you find you need partial matching) again with +PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore +PCRE2_JIT_COMPLETE and just compile code for partial matching. If +pcre2_jit_compile() is called with no option bits set, it immediately +returns zero. This is an alternative way of testing whether JIT is available. +

    +

    +At present, it is not possible to free JIT compiled code except when the entire +compiled pattern is freed by calling pcre2_code_free(). +

    +

    +In some circumstances you may need to call additional functions. These are +described in the section entitled +"Controlling the JIT stack" +below. +

    +

    +There are some pcre2_match() options that are not supported by JIT, and +there are also some pattern items that JIT cannot handle. Details are given +below. In both cases, matching automatically falls back to the interpretive +code. If you want to know whether JIT was actually used for a particular match, +you should arrange for a JIT callback function to be set up as described in the +section entitled +"Controlling the JIT stack" +below, even if you do not need to supply a non-default JIT stack. Such a +callback function is called whenever JIT code is about to be obeyed. If the +match-time options are not right for JIT execution, the callback function is +not obeyed. +

    +

    +If the JIT compiler finds an unsupported item, no JIT data is generated. You +can find out if JIT matching is available after compiling a pattern by calling +pcre2_pattern_info() with the PCRE2_INFO_JITSIZE option. A non-zero +result means that JIT compilation was successful. A result of 0 means that JIT +support is not available, or the pattern was not processed by +pcre2_jit_compile(), or the JIT compiler was not able to handle the +pattern. +

    +
    MATCHING SUBJECTS CONTAINING INVALID UTF
    +

    +When a pattern is compiled with the PCRE2_UTF option, subject strings are +normally expected to be a valid sequence of UTF code units. By default, this is +checked at the start of matching and an error is generated if invalid UTF is +detected. The PCRE2_NO_UTF_CHECK option can be passed to pcre2_match() to +skip the check (for improved performance) if you are sure that a subject string +is valid. If this option is used with an invalid string, the result is +undefined. +

    +

    +However, a way of running matches on strings that may contain invalid UTF +sequences is available. Calling pcre2_compile() with the +PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in +pcre2_match() to support invalid UTF, and, if pcre2_jit_compile() +is called, the compiled JIT code also supports invalid UTF. Details of how this +support works, in both the JIT and the interpretive cases, are given in the +pcre2unicode +documentation. +

    +

    +There is also an obsolete option for pcre2_jit_compile() called +PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility. +It is superseded by the pcre2_compile() option PCRE2_MATCH_INVALID_UTF +and should no longer be used. It may be removed in future. +

    +
    UNSUPPORTED OPTIONS AND PATTERN ITEMS
    +

    +The pcre2_match() options that are supported for JIT matching are +PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, +PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and +PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not +supported at match time. +

    +

    +If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the +use of JIT, forcing matching by the interpreter code. +

    +

    +The only unsupported pattern items are \C (match a single data unit) when +running in a UTF mode, and a callout immediately before an assertion condition +in a conditional group. +

    +
    RETURN VALUES FROM JIT MATCHING
    +

    +When a pattern is matched using JIT matching, the return values are the same +as those given by the interpretive pcre2_match() code, with the addition +of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory +used for the JIT stack was insufficient. See +"Controlling the JIT stack" +below for a discussion of JIT stack usage. +

    +

    +The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if searching +a very large pattern tree goes on for too long, as it is in the same +circumstance when JIT is not used, but the details of exactly what is counted +are not the same. The PCRE2_ERROR_DEPTHLIMIT error code is never returned +when JIT matching is used. +

    +
    CONTROLLING THE JIT STACK
    +

    +When the compiled JIT code runs, it needs a block of memory to use as a stack. +By default, it uses 32KiB on the machine stack. However, some large or +complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT +is given when there is not enough stack. Three functions are provided for +managing blocks of memory for use as JIT stacks. There is further discussion +about the use of JIT stacks in the section entitled +"JIT stack FAQ" +below. +

    +

    +The pcre2_jit_stack_create() function creates a JIT stack. Its arguments +are a starting size, a maximum size, and a general context (for memory +allocation functions, or NULL for standard memory allocation). It returns a +pointer to an opaque structure of type pcre2_jit_stack, or NULL if there +is an error. The pcre2_jit_stack_free() function is used to free a stack +that is no longer needed. If its argument is NULL, this function returns +immediately, without doing anything. (For the technically minded: the address +space is allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to +1MiB should be more than enough for any pattern. +

    +

    +The pcre2_jit_stack_assign() function specifies which stack JIT code +should use. Its arguments are as follows: +

    +  pcre2_match_context  *mcontext
    +  pcre2_jit_callback    callback
    +  void                 *data
    +
    +The first argument is a pointer to a match context. When this is subsequently +passed to a matching function, its information determines which JIT stack is +used. If this argument is NULL, the function returns immediately, without doing +anything. There are three cases for the values of the other two options: +
    +  (1) If callback is NULL and data is NULL, an internal 32KiB block
    +      on the machine stack is used. This is the default when a match
    +      context is created.
    +
    +  (2) If callback is NULL and data is not NULL, data must be
    +      a pointer to a valid JIT stack, the result of calling
    +      pcre2_jit_stack_create().
    +
    +  (3) If callback is not NULL, it must point to a function that is
    +      called with data as an argument at the start of matching, in
    +      order to set up a JIT stack. If the return from the callback
    +      function is NULL, the internal 32KiB stack is used; otherwise the
    +      return value must be a valid JIT stack, the result of calling
    +      pcre2_jit_stack_create().
    +
    +A callback function is obeyed whenever JIT code is about to be run; it is not +obeyed when pcre2_match() is called with options that are incompatible +for JIT matching. A callback function can therefore be used to determine +whether a match operation was executed by JIT or by the interpreter. +

    +

    +You may safely use the same JIT stack for more than one pattern (either by +assigning directly or by callback), as long as the patterns are matched +sequentially in the same thread. Currently, the only way to set up +non-sequential matches in one thread is to use callouts: if a callout function +starts another match, that match must use a different JIT stack to the one used +for currently suspended match(es). +

    +

    +In a multithread application, if you do not +specify a JIT stack, or if you assign or pass back NULL from a callback, that +is thread-safe, because each thread has its own machine stack. However, if you +assign or pass back a non-NULL JIT stack, this must be a different stack for +each thread so that the application is thread-safe. +

    +

    +Strictly speaking, even more is allowed. You can assign the same non-NULL stack +to a match context that is used by any number of patterns, as long as they are +not used for matching by multiple threads at the same time. For example, you +could use the same stack in all compiled patterns, with a global mutex in the +callback to wait until the stack is available for use. However, this is an +inefficient solution, and not recommended. +

    +

    +This is a suggestion for how a multithreaded program that needs to set up +non-default JIT stacks might operate: +

    +  During thread initialization
    +    thread_local_var = pcre2_jit_stack_create(...)
    +
    +  During thread exit
    +    pcre2_jit_stack_free(thread_local_var)
    +
    +  Use a one-line callback function
    +    return thread_local_var
    +
    +All the functions described in this section do nothing if JIT is not available. +

    +
    JIT STACK FAQ
    +

    +(1) Why do we need JIT stacks? +
    +
    +PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack where +the local data of the current node is pushed before checking its child nodes. +Allocating real machine stack on some platforms is difficult. For example, the +stack chain needs to be updated every time if we extend the stack on PowerPC. +Although it is possible, its updating time overhead decreases performance. So +we do the recursion in memory. +

    +

    +(2) Why don't we simply allocate blocks of memory with malloc()? +
    +
    +Modern operating systems have a nice feature: they can reserve an address space +instead of allocating memory. We can safely allocate memory pages inside this +address space, so the stack could grow without moving memory data (this is +important because of pointers). Thus we can allocate 1MiB address space, and +use only a single memory page (usually 4KiB) if that is enough. However, we can +still grow up to 1MiB anytime if needed. +

    +

    +(3) Who "owns" a JIT stack? +
    +
    +The owner of the stack is the user program, not the JIT studied pattern or +anything else. The user program must ensure that if a stack is being used by +pcre2_match(), (that is, it is assigned to a match context that is passed +to the pattern currently running), that stack must not be used by any other +threads (to avoid overwriting the same memory area). The best practice for +multithreaded programs is to allocate a stack for each thread, and return this +stack through the JIT callback function. +

    +

    +(4) When should a JIT stack be freed? +
    +
    +You can free a JIT stack at any time, as long as it will not be used by +pcre2_match() again. When you assign the stack to a match context, only a +pointer is set. There is no reference counting or any other magic. You can free +compiled patterns, contexts, and stacks in any order, anytime. +Just do not call pcre2_match() with a match context pointing to an +already freed stack, as that will cause SEGFAULT. (Also, do not free a stack +currently used by pcre2_match() in another thread). You can also replace +the stack in a context at any time when it is not in use. You should free the +previous stack before assigning a replacement. +

    +

    +(5) Should I allocate/free a stack every time before/after calling +pcre2_match()? +
    +
    +No, because this is too costly in terms of resources. However, you could +implement some clever scheme that releases the stack if it has not been used for, +say, two minutes. The JIT callback can help to achieve this without keeping a +list of patterns. +

    +

    +(6) OK, the stack is for long term memory allocation. But what happens if a +pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the +stack is freed? +
    +
    +Especially on embedded systems, it might be a good idea to release memory +sometimes without freeing the stack. There is no API for this at the moment. +Probably a function call which returns the currently allocated memory for +any stack and another which allows releasing memory (shrinking the stack) would +be a good idea if someone needs this. +

    +

    +(7) This is too much of a headache. Isn't there any better solution for JIT +stack handling? +
    +
    +No, thanks to Windows. If POSIX threads were used everywhere, we could throw +out this complicated API. +

    +
    FREEING JIT SPECULATIVE MEMORY
    +

    +void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); +

    +

    +The JIT executable allocator does not free all memory when it is possible. +It expects new allocations, and keeps some free memory around to improve +allocation speed. However, in low memory conditions, it might be better to free +all possible memory. You can cause this to happen by calling +pcre2_jit_free_unused_memory(). Its argument is a general context, for custom +memory management, or NULL for standard memory management. +

    +
    EXAMPLE CODE
    +

    +This is a single-threaded example that specifies a JIT stack without using a +callback. A real program should include error checking after all the function +calls. +

    +  int rc;
    +  pcre2_code *re;
    +  pcre2_match_data *match_data;
    +  pcre2_match_context *mcontext;
    +  pcre2_jit_stack *jit_stack;
    +
    +  re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0,
    +    &errornumber, &erroffset, NULL);
    +  rc = pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);
    +  mcontext = pcre2_match_context_create(NULL);
    +  jit_stack = pcre2_jit_stack_create(32*1024, 512*1024, NULL);
    +  pcre2_jit_stack_assign(mcontext, NULL, jit_stack);
    +  match_data = pcre2_match_data_create(re, 10);
    +  rc = pcre2_match(re, subject, length, 0, 0, match_data, mcontext);
    +  /* Process result */
    +
    +  pcre2_code_free(re);
    +  pcre2_match_data_free(match_data);
    +  pcre2_match_context_free(mcontext);
    +  pcre2_jit_stack_free(jit_stack);
    +
    +
    +

    +
    JIT FAST PATH API
    +

    +Because the API described above falls back to interpreted matching when JIT is +not available, it is convenient for programs that are written for general use +in many environments. However, calling JIT via pcre2_match() does have a +performance impact. Programs that are written for use where JIT is known to be +available, and which need the best possible performance, can instead use a +"fast path" API to call JIT matching directly instead of calling +pcre2_match() (obviously only for patterns that have been successfully +processed by pcre2_jit_compile()). +

    +

    +The fast path function is called pcre2_jit_match(), and it takes exactly +the same arguments as pcre2_match(). However, the subject string must be +specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported +option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and +PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The +return values are also the same as for pcre2_match(), plus +PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested +that was not compiled. +

    +

    +When you call pcre2_match(), as well as testing for invalid options, a +number of other sanity checks are performed on the arguments. For example, if +the subject pointer is NULL, an immediate error is given. Also, unless +PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the +interests of speed, these checks do not happen on the JIT fast path, and if +invalid data is passed, the result is undefined. +

    +

    +Bypassing the sanity checks and the pcre2_match() wrapping can give +speedups of more than 10%. +

    +
    SEE ALSO
    +

    +pcre2api(3) +

    +
    AUTHOR
    +

    +Philip Hazel (FAQ by Zoltan Herczeg) +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 23 May 2019 +
    +Copyright © 1997-2019 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2limits.html b/src/pcre2/doc/html/pcre2limits.html new file mode 100644 index 00000000..c8bc01b8 --- /dev/null +++ b/src/pcre2/doc/html/pcre2limits.html @@ -0,0 +1,95 @@ + + +pcre2limits specification + + +

    pcre2limits man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +SIZE AND OTHER LIMITATIONS +
    +

    +There are some size limitations in PCRE2 but it is hoped that they will never +in practice be relevant. +

    +

    +The maximum size of a compiled pattern is approximately 64 thousand code units +for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default +internal linkage size, which is 2 bytes for these libraries. If you want to +process regular expressions that are truly enormous, you can compile PCRE2 with +an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is +rounded up to 4). See the README file in the source distribution and the +pcre2build +documentation for details. In these cases the limit is substantially larger. +However, the speed of execution is slower. In the 32-bit library, the internal +linkage size is always 4. +

    +

    +The maximum length of a source pattern string is essentially unlimited; it is +the largest number a PCRE2_SIZE variable can hold. However, the program that +calls pcre2_compile() can specify a smaller limit. +

    +

    +The maximum length (in code units) of a subject string is one less than the +largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an unsigned +integer type, usually defined as size_t. Its maximum value (that is +~(PCRE2_SIZE)0) is reserved as a special indicator for zero-terminated strings +and unset offsets. +

    +

    +All values in repeating quantifiers must be less than 65536. +

    +

    +The maximum length of a lookbehind assertion is 65535 characters. +

    +

    +There is no limit to the number of parenthesized groups, but there can be no +more than 65535 capture groups, and there is a limit to the depth of nesting of +parenthesized subpatterns of all kinds. This is imposed in order to limit the +amount of system stack used at compile time. The default limit can be specified +when PCRE2 is built; if not, the default is set to 250. An application can +change this limit by calling pcre2_set_parens_nest_limit() to set the limit in +a compile context. +

    +

    +The maximum length of name for a named capture group is 32 code units, and the +maximum number of such groups is 10000. +

    +

    +The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb +is 255 code units for the 8-bit library and 65535 code units for the 16-bit and +32-bit libraries. +

    +

    +The maximum length of a string argument to a callout is the largest number a +32-bit unsigned integer can hold. +

    +
    +AUTHOR +
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    +REVISION +
    +

    +Last updated: 02 February 2019 +
    +Copyright © 1997-2019 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre/doc/html/pcrematching.html b/src/pcre2/doc/html/pcre2matching.html similarity index 64% rename from src/pcre/doc/html/pcrematching.html rename to src/pcre2/doc/html/pcre2matching.html index a1af39b6..4b71c8f8 100644 --- a/src/pcre/doc/html/pcrematching.html +++ b/src/pcre2/doc/html/pcre2matching.html @@ -1,19 +1,19 @@ -pcrematching specification +pcre2matching specification -

    pcrematching man page

    +

    pcre2matching man page

    -Return to the PCRE index page. +Return to the PCRE2 index page.

    -This page is part of the PCRE HTML documentation. It was generated automatically -from the original man page. If there is any nonsense in it, please consult the -man page, in case the conversion went wrong. +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong.

    -
    PCRE MATCHING ALGORITHMS
    +
    PCRE2 MATCHING ALGORITHMS

    -This document describes the two different algorithms that are available in PCRE -for matching a compiled regular expression against a given subject string. The -"standard" algorithm is the one provided by the pcre_exec(), -pcre16_exec() and pcre32_exec() functions. These work in the same -as as Perl's matching function, and provide a Perl-compatible matching operation. -The just-in-time (JIT) optimization that is described in the -pcrejit -documentation is compatible with these functions. +This document describes the two different algorithms that are available in +PCRE2 for matching a compiled regular expression against a given subject +string. The "standard" algorithm is the one provided by the pcre2_match() +function. This works in the same way as Perl's matching function, and provides a +Perl-compatible matching operation. The just-in-time (JIT) optimization that is +described in the +pcre2jit +documentation is compatible with this function.

    -An alternative algorithm is provided by the pcre_dfa_exec(), -pcre16_dfa_exec() and pcre32_dfa_exec() functions; they operate in -a different way, and are not Perl-compatible. This alternative has advantages -and disadvantages compared with the standard algorithm, and these are described -below. +An alternative algorithm is provided by the pcre2_dfa_match() function; +it operates in a different way, and is not Perl-compatible. This alternative +has advantages and disadvantages compared with the standard algorithm, and +these are described below.

    When there is only one possible way in which a given subject string can match a @@ -61,20 +60,19 @@

    pcrematching man page

    infinite size, but it is still a tree. Matching the pattern to a given subject string (from a given starting point) can be thought of as a search of the tree. There are two ways to search a tree: depth-first and breadth-first, and these -correspond to the two matching algorithms provided by PCRE. +correspond to the two matching algorithms provided by PCRE2.


    THE STANDARD MATCHING ALGORITHM

    -In the terminology of Jeffrey Friedl's book "Mastering Regular -Expressions", the standard algorithm is an "NFA algorithm". It conducts a -depth-first search of the pattern tree. That is, it proceeds along a single -path through the tree, checking that the subject matches what is required. When -there is a mismatch, the algorithm tries any alternatives at the current point, -and if they all fail, it backs up to the previous branch point in the tree, and -tries the next alternative branch at that level. This often involves backing up -(moving to the left) in the subject string as well. The order in which -repetition branches are tried is controlled by the greedy or ungreedy nature of -the quantifier. +In the terminology of Jeffrey Friedl's book "Mastering Regular Expressions", +the standard algorithm is an "NFA algorithm". It conducts a depth-first search +of the pattern tree. That is, it proceeds along a single path through the tree, +checking that the subject matches what is required. When there is a mismatch, +the algorithm tries any alternatives at the current point, and if they all +fail, it backs up to the previous branch point in the tree, and tries the next +alternative branch at that level. This often involves backing up (moving to the +left) in the subject string as well. The order in which repetition branches are +tried is controlled by the greedy or ungreedy nature of the quantifier.

    If a leaf node is reached, a matching string has been found, and at that point @@ -87,7 +85,7 @@

    pcrematching man page

    Because it ends up with a single path through the tree, it is relatively straightforward for this algorithm to keep track of the substrings that are matched by portions of the pattern in parentheses. This provides support for -capturing parentheses and back references. +capturing parentheses and backreferences.


    THE ALTERNATIVE MATCHING ALGORITHM

    @@ -120,27 +118,29 @@

    pcrematching man page

       cat(er(pillar)?)?
     
    -is matched against the string "the caterpillar catchment", the result will be -the three strings "caterpillar", "cater", and "cat" that start at the fifth +is matched against the string "the caterpillar catchment", the result is the +three strings "caterpillar", "cater", and "cat" that start at the fifth character of the subject. The algorithm does not automatically move on to find matches that start at later positions.
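The "all matches at one starting point" behaviour described above can be sketched in C. This is a toy model hard-wired to the pattern cat(er(pillar)?)? and is not how pcre2_dfa_match() is implemented; the function name all_matches_at() and its interface are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the single pattern cat(er(pillar)?)? (hypothetical helper,
   not part of PCRE2). Stores the length of every match that starts at
   subject[start], longest first, and returns how many were found (0 to 3).
   Like the alternative algorithm, it does not move on to later positions. */
size_t all_matches_at(const char *subject, size_t start, size_t lengths[3])
{
    static const char *const pieces[] = { "cat", "er", "pillar" };
    size_t found[3];
    size_t n = 0;
    size_t pos = start;

    if (start > strlen(subject))
        return 0;

    /* Each optional group can only match if everything before it matched. */
    for (size_t i = 0; i < 3; i++) {
        size_t plen = strlen(pieces[i]);
        if (strncmp(subject + pos, pieces[i], plen) != 0)
            break;
        pos += plen;
        found[n++] = pos - start;      /* one more complete match ends here */
    }

    /* Report longest first, the order used in the text above. */
    for (size_t i = 0; i < n; i++)
        lengths[i] = found[n - 1 - i];
    return n;
}
```

Called with the subject "the caterpillar catchment" and a start offset of 4 (the fifth character), the sketch reports three lengths, corresponding to "caterpillar", "cater", and "cat".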

    -PCRE's "auto-possessification" optimization usually applies to character +PCRE2's "auto-possessification" optimization usually applies to character repeats at the end of a pattern (as well as internally). For example, the pattern "a\d+" is compiled as if it were "a\d++" because there is no point even considering the possibility of backtracking into the repeated digits. For DFA matching, this means that only one possible match is found. If you really do want multiple matches in such cases, either use an ungreedy repeat -("a\d+?") or set the PCRE_NO_AUTO_POSSESS option when compiling. +("a\d+?") or set the PCRE2_NO_AUTO_POSSESS option when compiling.

    -There are a number of features of PCRE regular expressions that are not -supported by the alternative matching algorithm. They are as follows: +There are a number of features of PCRE2 regular expressions that are not +supported or behave differently in the alternative matching function. Those +that are not supported cause an error if encountered.

    1. Because the algorithm finds all possible matches, the greedy or ungreedy -nature of repetition quantifiers is not relevant. Greedy and ungreedy +nature of repetition quantifiers is not relevant (though it may affect +auto-possessification, as just described). During matching, greedy and ungreedy quantifiers are treated in exactly the same way. However, possessive quantifiers can make a difference when what follows could also match what is quantified, for example in a pattern like this: @@ -155,36 +155,43 @@

    pcrematching man page

    2. When dealing with multiple paths through the tree simultaneously, it is not straightforward to keep track of captured substrings for the different matching -possibilities, and PCRE's implementation of this algorithm does not attempt to +possibilities, and PCRE2's implementation of this algorithm does not attempt to do this. This means that no captured substrings are available.

    -3. Because no substrings are captured, back references within the pattern are -not supported, and cause errors if encountered. +3. Because no substrings are captured, backreferences within the pattern are +not supported.

    4. For the same reason, conditional expressions that use a backreference as the condition or test for a specific group recursion are not supported.

    -5. Because many paths through the tree may be active, the \K escape sequence, +5. Again for the same reason, script runs are not supported. +

    +

    +6. Because many paths through the tree may be active, the \K escape sequence, which resets the start of the match when encountered (but may be on some paths -and not on others), is not supported. It causes an error if encountered. +and not on others), is not supported.

    -6. Callouts are supported, but the value of the capture_top field is -always 1, and the value of the capture_last field is always -1. +7. Callouts are supported, but the value of the capture_top field is +always 1, and the value of the capture_last field is always 0.

    -7. The \C escape sequence, which (in the standard algorithm) always matches a -single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in -these modes, because the alternative algorithm moves through the subject string -one character (not data unit) at a time, for all active paths through the tree. +8. The \C escape sequence, which (in the standard algorithm) always matches a +single code unit, even in a UTF mode, is not supported in these modes, because +the alternative algorithm moves through the subject string one character (not +code unit) at a time, for all active paths through the tree.

    -8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not +9. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not supported. (*FAIL) is supported, and behaves like a failing negative assertion.

    +

    +10. The PCRE2_MATCH_INVALID_UTF option for pcre2_compile() is not +supported by pcre2_dfa_match(). +


    ADVANTAGES OF THE ALTERNATIVE ALGORITHM

    Using the alternative matching algorithm provides the following advantages: @@ -199,10 +206,10 @@

    pcrematching man page

    2. Because the alternative algorithm scans the subject string just once, and never needs to backtrack (except for lookbehinds), it is possible to pass very long subject strings to the matching function in several pieces, checking for -partial matching each time. Although it is possible to do multi-segment -matching using the standard algorithm by retaining partially matched +partial matching each time. Although it is also possible to do multi-segment +matching using the standard algorithm, by retaining partially matched substrings, it is more complicated. The -pcrepartial +pcre2partial documentation gives details of partial matching and discusses multi-segment matching.

    @@ -216,7 +223,8 @@

    pcrematching man page

    less susceptible to optimization.

    -2. Capturing parentheses and back references are not supported. +2. Capturing parentheses, backreferences, script runs, and matching within +invalid UTF strings are not supported.

    3. Although atomic groups are supported, their use does not provide the @@ -228,15 +236,15 @@

    pcrematching man page


    University Computing Service
    -Cambridge CB2 3QH, England. +Cambridge, England.


    REVISION

    -Last updated: 12 November 2013 +Last updated: 23 May 2019
    -Copyright © 1997-2012 University of Cambridge. +Copyright © 1997-2019 University of Cambridge.

    -Return to the PCRE index page. +Return to the PCRE2 index page.

    diff --git a/src/pcre2/doc/html/pcre2partial.html b/src/pcre2/doc/html/pcre2partial.html new file mode 100644 index 00000000..bb73b1de --- /dev/null +++ b/src/pcre2/doc/html/pcre2partial.html @@ -0,0 +1,408 @@ + + +pcre2partial specification + + +

    pcre2partial man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    PARTIAL MATCHING IN PCRE2
    +

    +In normal use of PCRE2, if there is a match up to the end of a subject string, +but more characters are needed to match the entire pattern, PCRE2_ERROR_NOMATCH +is returned, just like any other failing match. There are circumstances where +it might be helpful to distinguish this "partial match" case. +

    +

    +One example is an application where the subject string is very long, and not +all available at once. The requirement here is to be able to do the matching +segment by segment, but special action is needed when a matched substring spans +the boundary between two segments. +

    +

    +Another example is checking a user input string as it is typed, to ensure that +it conforms to a required format. Invalid characters can be immediately +diagnosed and rejected, giving instant feedback. +

    +

    +Partial matching is a PCRE2-specific feature; it is not Perl-compatible. It is +requested by setting one of the PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT +options when calling a matching function. The difference between the two +options is whether or not a partial match is preferred to an alternative +complete match, though the details differ between the two types of matching +function. If both options are set, PCRE2_PARTIAL_HARD takes precedence. +

    +

    +If you want to use partial matching with just-in-time optimized code, as well +as setting a partial match option for the matching function, you must also call +pcre2_jit_compile() with one or both of these options: +

    +  PCRE2_JIT_PARTIAL_HARD
    +  PCRE2_JIT_PARTIAL_SOFT
    +
    +PCRE2_JIT_COMPLETE should also be set if you are going to run non-partial +matches on the same pattern. Separate code is compiled for each mode. If the +appropriate JIT mode has not been compiled, interpretive matching code is used. +

    +

    +Setting a partial matching option disables two of PCRE2's standard +optimization hints. PCRE2 remembers the last literal code unit in a pattern, +and abandons matching immediately if it is not present in the subject string. +This optimization cannot be used for a subject string that might match only +partially. PCRE2 also remembers a minimum length of a matching string, and does +not bother to run the matching function on shorter strings. This optimization +is also disabled for partial matching. +

    +
    REQUIREMENTS FOR A PARTIAL MATCH
    +

    +A possible partial match occurs during matching when the end of the subject +string is reached successfully, but either more characters are needed to +complete the match, or the addition of more characters might change what is +matched. +

    +

    +Example 1: if the pattern is /abc/ and the subject is "ab", more characters are +definitely needed to complete a match. In this case both hard and soft matching +options yield a partial match. +

    +

    +Example 2: if the pattern is /ab+/ and the subject is "ab", a complete match +can be found, but the addition of more characters might change what is +matched. In this case, only PCRE2_PARTIAL_HARD returns a partial match; +PCRE2_PARTIAL_SOFT returns the complete match. +

    +

    +On reaching the end of the subject, when PCRE2_PARTIAL_HARD is set, if the next +pattern item is \z, \Z, \b, \B, or $ there is always a partial match. +Otherwise, for both options, the next pattern item must be one that inspects a +character, and at least one of the following must be true: +

    +

    +(1) At least one character has already been inspected. An inspected character +need not form part of the final matched string; lookbehind assertions and the +\K escape sequence provide ways of inspecting characters before the start of a +matched string. +

    +

    +(2) The pattern contains one or more lookbehind assertions. This condition +exists in case there is a lookbehind that inspects characters before the start +of the match. +

    +

    +(3) There is a special case when the whole pattern can match an empty string. +When the starting point is at the end of the subject, the empty string match is +a possibility, and if PCRE2_PARTIAL_SOFT is set and neither of the above +conditions is true, it is returned. However, because adding more characters +might result in a non-empty match, PCRE2_PARTIAL_HARD returns a partial match, +which in this case means "there is going to be a match at this point, but until +some more characters are added, we do not know if it will be an empty string or +something longer". +

    +
    PARTIAL MATCHING USING pcre2_match()
    +

    +When a partial matching option is set, the result of calling +pcre2_match() can be one of the following: +

    +

    +A successful match +A complete match has been found, starting and ending within this subject. +

    +

    +PCRE2_ERROR_NOMATCH +No match can start anywhere in this subject. +

    +

    +PCRE2_ERROR_PARTIAL +Adding more characters may result in a complete match that uses one or more +characters from the end of this subject. +

    +

    +When a partial match is returned, the first two elements in the ovector point +to the portion of the subject that was matched, but the values in the rest of +the ovector are undefined. The appearance of \K in the pattern has no effect +for a partial match. Consider this pattern: +

    +  /abc\K123/
    +
    +If it is matched against "456abc123xyz" the result is a complete match, and the +ovector defines the matched string as "123", because \K resets the "start of +match" point. However, if a partial match is requested and the subject string +is "456abc12", a partial match is found for the string "abc12", because all +these characters are needed for a subsequent re-match with additional +characters. +

    +

    +If there is more than one partial match, the first one that was found provides +the data that is returned. Consider this pattern: +

    +  /123\w+X|dogY/
    +
    +If this is matched against the subject string "abc123dog", both alternatives +fail to match, but the end of the subject is reached during matching, so +PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying +"123dog" as the first partial match. (In this example, there are two partial +matches, because "dog" on its own partially matches the second alternative.) +

    +
    +How a partial match is processed by pcre2_match() +
    +

    +What happens when a partial match is identified depends on which of the two +partial matching options is set. +

    +

    +If PCRE2_PARTIAL_HARD is set, PCRE2_ERROR_PARTIAL is returned as soon as a +partial match is found, without continuing to search for possible complete +matches. This option is "hard" because it prefers an earlier partial match over +a later complete match. For this reason, the assumption is made that the end of +the supplied subject string is not the true end of the available data, which is +why \z, \Z, \b, \B, and $ always give a partial match. +

    +

    +If PCRE2_PARTIAL_SOFT is set, the partial match is remembered, but matching +continues as normal, and other alternatives in the pattern are tried. If no +complete match can be found, PCRE2_ERROR_PARTIAL is returned instead of +PCRE2_ERROR_NOMATCH. This option is "soft" because it prefers a complete match +over a partial match. All the various matching items in a pattern behave as if +the subject string is potentially complete; \z, \Z, and $ match at the end of +the subject, as normal, and for \b and \B the end of the subject is treated +as a non-alphanumeric. +

    +

    +The difference between the two partial matching options can be illustrated by a +pattern such as: +

    +  /dog(sbody)?/
    +
    +This matches either "dog" or "dogsbody", greedily (that is, it prefers the +longer string if possible). If it is matched against the string "dog" with +PCRE2_PARTIAL_SOFT, it yields a complete match for "dog". However, if +PCRE2_PARTIAL_HARD is set, the result is PCRE2_ERROR_PARTIAL. On the other +hand, if the pattern is made ungreedy the result is different: +
    +  /dog(sbody)??/
    +
    +In this case the result is always a complete match because that is found first, +and matching never continues after finding a complete match. It might be easier +to follow this explanation by thinking of the two patterns like this: +
    +  /dog(sbody)?/    is the same as  /dogsbody|dog/
    +  /dog(sbody)??/   is the same as  /dog|dogsbody/
    +
    +The second pattern will never match "dogsbody", because it will always find the +shorter match first. +
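The difference between the hard and soft options can be modelled with a small C sketch that treats a pattern as an ordered list of literal alternatives, anchored at the start of the subject. This is a deliberately simplified model and not PCRE2 code: the names toy_partial_match() and enum toy_result are assumptions, and real patterns involve far more than literal alternatives.

```c
#include <stddef.h>
#include <string.h>

enum toy_result { TOY_NOMATCH, TOY_PARTIAL, TOY_COMPLETE };

/* Hypothetical model: try each literal alternative in order at the start
   of the subject. With hard != 0 the first partial match wins at once;
   otherwise a partial match is only remembered while later alternatives
   are tried for a complete match. */
enum toy_result toy_partial_match(const char *const *alts, size_t n_alts,
                                  const char *subject, int hard)
{
    size_t slen = strlen(subject);
    int saw_partial = 0;

    for (size_t i = 0; i < n_alts; i++) {
        size_t alen = strlen(alts[i]);
        if (alen <= slen) {
            /* Enough subject text: either a complete match or a mismatch. */
            if (strncmp(subject, alts[i], alen) == 0)
                return TOY_COMPLETE;
        } else if (slen > 0 && strncmp(subject, alts[i], slen) == 0) {
            /* The subject ran out while still matching: a partial match. */
            if (hard)
                return TOY_PARTIAL;
            saw_partial = 1;
        }
    }
    return saw_partial ? TOY_PARTIAL : TOY_NOMATCH;
}
```

With the alternatives in the order "dogsbody", "dog" (the greedy form) and the subject "dog", the sketch gives a complete match for the soft setting and a partial match for the hard setting. With the order "dog", "dogsbody" (the ungreedy form) it gives a complete match in both cases, which agrees with the explanation above.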

    +
    +Example of partial matching using pcre2test +
    +

    +The pcre2test data modifiers partial_hard (or ph) and +partial_soft (or ps) set PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT, +respectively, when calling pcre2_match(). Here is a run of +pcre2test using a pattern that matches the whole subject in the form of a +date: +

    +    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    +  data> 25dec3\=ph
    +  Partial match: 25dec3
    +  data> 3ju\=ph
    +  Partial match: 3ju
    +  data> 3juj\=ph
    +  No match
    +
    +This example gives the same results for both hard and soft partial matching +options. Here is an example where there is a difference: +
    +    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    +  data> 25jun04\=ps
    +   0: 25jun04
    +   1: jun
    +  data> 25jun04\=ph
    +  Partial match: 25jun04
    +
    +With PCRE2_PARTIAL_SOFT, the subject is matched completely. For +PCRE2_PARTIAL_HARD, however, the subject is assumed not to be complete, so +there is only a partial match. +

    +
    MULTI-SEGMENT MATCHING WITH pcre2_match()
    +

    +PCRE was not originally designed with multi-segment matching in mind. However, +over time, features (including partial matching) that make multi-segment +matching possible have been added. A very long string can be searched segment +by segment by calling pcre2_match() repeatedly, with the aim of achieving +the same results that would happen if the entire string was available for +searching all the time. Normally, the strings that are being sought are much +shorter than each individual segment, and are in the middle of very long +strings, so the pattern is normally not anchored. +

    +

    +Special logic must be implemented to handle a matched substring that spans a +segment boundary. PCRE2_PARTIAL_HARD should be used, because it returns a +partial match at the end of a segment whenever there is the possibility of +changing the match by adding more characters. The PCRE2_NOTBOL option should +also be set for all but the first segment. +

    +

    +When a partial match occurs, the next segment must be added to the current +subject and the match re-run, using the startoffset argument of +pcre2_match() to begin at the point where the partial match started. +For example: +

    +    re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/
    +  data> ...the date is 23ja\=ph
    +  Partial match: 23ja
    +  data> ...the date is 23jan19 and on that day...\=offset=15
    +   0: 23jan19
    +   1: jan
    +
    +Note the use of the offset modifier to start the new match where the +partial match was found. In this example, the next segment was added to the one +in which the partial match was found. This is the most straightforward +approach, typically using a memory buffer that is twice the size of each +segment. After a partial match, the first half of the buffer is discarded, the +second half is moved to the start of the buffer, and a new segment is added +before repeating the match as in the example above. After a no match, the +entire buffer can be discarded. +
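The buffer handling described above might look like the following C sketch. It covers only the bookkeeping around the matching call and does not call PCRE2 itself. The function name slide_and_append() and its parameters are assumptions, and the sketch relies on the partial match starting in the second half of the buffer, which is the case the text describes (sought strings much shorter than a segment).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. buf has room for 2 * seg_size bytes and holds
   *used bytes. partial_start is the offset at which the partial match
   began. If the buffer already holds more than one segment, the first
   half is discarded and the rest is moved down before the next segment
   is appended. Returns the adjusted offset of the partial match, which
   is the value to pass as the start offset of the re-run match. */
size_t slide_and_append(char *buf, size_t *used, size_t seg_size,
                        size_t partial_start,
                        const char *next, size_t next_len)
{
    assert(next_len <= seg_size);

    if (*used > seg_size) {
        /* The partial match must lie entirely in the retained half. */
        assert(partial_start >= seg_size);
        memmove(buf, buf + seg_size, *used - seg_size);
        *used -= seg_size;
        partial_start -= seg_size;
    }

    memcpy(buf + *used, next, next_len);
    *used += next_len;
    return partial_start;
}
```

After a "no match" result there is nothing to carry over, so a caller can simply reset the used length to zero before loading the next segment.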

    +

    +If there are memory constraints, you may want to discard text that precedes a +partial match before adding the next segment. Unfortunately, this is not at +present straightforward. In cases such as the above, where the pattern does not +contain any lookbehinds, it is sufficient to retain only the partially matched +substring. However, if the pattern contains a lookbehind assertion, characters +that precede the start of the partial match may have been inspected during the +matching process. When pcre2test displays a partial match, it indicates +these characters with '<' if the allusedtext modifier is set: +

    +    re> "(?<=123)abc"
    +  data> xx123ab\=ph,allusedtext
    +  Partial match: 123ab
    +                 <<<
    +
    +However, the allusedtext modifier is not available for JIT matching, +because JIT matching does not record the first (or last) consulted characters. +For this reason, this information is not available via the API. It is therefore +not possible in general to obtain the exact number of characters that must be +retained in order to get the right match result. If you cannot retain the +entire segment, you must find some heuristic way of choosing. +

    +

    +If you know the approximate length of the matching substrings, you can use that +to decide how much text to retain. The only lookbehind information that is +currently available via the API is the length of the longest individual +lookbehind in a pattern, but this can be misleading if there are nested +lookbehinds. The value returned by calling pcre2_pattern_info() with the +PCRE2_INFO_MAXLOOKBEHIND option is the maximum number of characters (not code +units) that any individual lookbehind moves back when it is processed. A +pattern such as "(?<=(?<!b)a)" has a maximum lookbehind value of one, but +inspects two characters before its starting point. +

    +

    +In a non-UTF or a 32-bit case, moving back is just a subtraction, but in +UTF-8 or UTF-16 you have to count characters while moving back through the code +units. +
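For UTF-8, counting characters while moving back can be sketched as follows. The helper name utf8_move_back() is an assumption, and the sketch assumes the subject is valid UTF-8, where continuation bytes always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper: starting at byte offset pos in a valid UTF-8
   subject, move back nchars characters (not code units) and return the
   new byte offset. Stops at the start of the subject if there are fewer
   than nchars characters available. */
size_t utf8_move_back(const unsigned char *subject, size_t pos, size_t nchars)
{
    while (nchars > 0 && pos > 0) {
        pos--;
        /* Skip continuation bytes to reach the first byte of the character. */
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
            pos--;
        nchars--;
    }
    return pos;
}
```

A caller could combine this with the value obtained for PCRE2_INFO_MAXLOOKBEHIND to decide how many bytes before a partial match to retain, bearing in mind the caveat about nested lookbehinds given above.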

    +
    PARTIAL MATCHING USING pcre2_dfa_match()
    +

    +The DFA function moves along the subject string character by character, without +backtracking, searching for all possible matches simultaneously. If the end of +the subject is reached before the end of the pattern, there is the possibility +of a partial match. +

    +

    +When PCRE2_PARTIAL_SOFT is set, PCRE2_ERROR_PARTIAL is returned only if there +have been no complete matches. Otherwise, the complete matches are returned. +If PCRE2_PARTIAL_HARD is set, a partial match takes precedence over any +complete matches. The portion of the string that was matched when the longest +partial match was found is set as the first matching string. +

    +

    +Because the DFA function always searches for all possible matches, and there is +no difference between greedy and ungreedy repetition, its behaviour is +different from that of pcre2_match(). Consider the string "dog" matched +against this ungreedy pattern: +

    +  /dog(sbody)??/
    +
    +Whereas the standard function stops as soon as it finds the complete match for +"dog", the DFA function also finds the partial match for "dogsbody", and so +returns that when PCRE2_PARTIAL_HARD is set. +

    +
    MULTI-SEGMENT MATCHING WITH pcre2_dfa_match()
    +

    +When a partial match has been found using the DFA matching function, it is +possible to continue the match by providing additional subject data and calling +the function again with the same compiled regular expression, this time setting +the PCRE2_DFA_RESTART option. You must pass the same working space as before, +because this is where details of the previous partial match are stored. You can +set the PCRE2_PARTIAL_SOFT or PCRE2_PARTIAL_HARD options with PCRE2_DFA_RESTART +to continue partial matching over multiple segments. Here is an example using +pcre2test: +

    +    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    +  data> 23ja\=dfa,ps
    +  Partial match: 23ja
    +  data> n05\=dfa,dfa_restart
    +   0: n05
    +
    +The first call has "23ja" as the subject, and requests partial matching; the +second call has "n05" as the subject for the continued (restarted) match. +Notice that when the match is complete, only the last part is shown; PCRE2 does +not retain the previously partially-matched string. It is up to the calling +program to do that if it needs to. This means that, for an unanchored pattern, +if a continued match fails, it is not possible to try again at a new starting +point. All this facility is capable of doing is continuing with the previous +match attempt. For example, consider this pattern: +
    +  1234|3789
    +
    +If the first part of the subject is "ABC123", a partial match of the first +alternative is found at offset 3. There is no partial match for the second +alternative, because such a match does not start at the same point in the +subject string. Attempting to continue with the string "7890" does not yield a +match because only those alternatives that match at one point in the subject +are remembered. Depending on the application, this may or may not be what you +want. +

    +

    +If you do want to allow for starting again at the next character, one way of +doing it is to retain some or all of the segment and try a new complete match, +as described for pcre2_match() above. Another possibility is to work with +two buffers. If a partial match at offset n in the first buffer is +followed by "no match" when PCRE2_DFA_RESTART is used on the second buffer, you +can then try a new match starting at offset n+1 in the first buffer. +
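The effect of retrying at later starting points in the first buffer can be illustrated with a toy C model that uses literal alternatives only. It does not use PCRE2_DFA_RESTART or any PCRE2 function; the names spans_buffers() and find_spanning_match() are assumptions made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the literal alt match at first[start],
   continuing into second if the first buffer runs out? */
static int spans_buffers(const char *alt, const char *first, size_t start,
                         const char *second)
{
    size_t alen = strlen(alt);
    size_t head = strlen(first) - start;   /* bytes left in first buffer */

    if (head >= alen)
        return strncmp(first + start, alt, alen) == 0;
    return strncmp(first + start, alt, head) == 0 &&
           strncmp(second, alt + head, alen - head) == 0;
}

/* Try every starting offset in the first buffer in turn, which is what
   retrying at offset n+1 after a failed continuation amounts to. Returns
   the offset in first at which a match begins, or -1 if there is none. */
long find_spanning_match(const char *const *alts, size_t n_alts,
                         const char *first, const char *second)
{
    size_t flen = strlen(first);

    for (size_t start = 0; start < flen; start++)
        for (size_t i = 0; i < n_alts; i++)
            if (spans_buffers(alts[i], first, start, second))
                return (long)start;
    return -1;
}
```

For the alternatives "1234" and "3789" with the buffers "ABC123" and "7890", the attempt at offset 3 fails in the second buffer, and the sketch goes on to find the match for "3789" that begins at offset 5.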

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 04 September 2019 +
    +Copyright © 1997-2019 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2pattern.html b/src/pcre2/doc/html/pcre2pattern.html new file mode 100644 index 00000000..9db15b9a --- /dev/null +++ b/src/pcre2/doc/html/pcre2pattern.html @@ -0,0 +1,3861 @@ + + +pcre2pattern specification + + +

    pcre2pattern man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    PCRE2 REGULAR EXPRESSION DETAILS
    +

    +The syntax and semantics of the regular expressions that are supported by PCRE2 +are described in detail below. There is a quick-reference syntax summary in the +pcre2syntax +page. PCRE2 tries to match Perl syntax and semantics as closely as it can. +PCRE2 also supports some alternative regular expression syntax (which does not +conflict with the Perl syntax) in order to provide some compatibility with +regular expressions in Python, .NET, and Oniguruma. +

    +

    +Perl's regular expressions are described in its own documentation, and regular +expressions in general are covered in a number of books, some of which have +copious examples. Jeffrey Friedl's "Mastering Regular Expressions", published +by O'Reilly, covers regular expressions in great detail. This description of +PCRE2's regular expressions is intended as reference material. +

    +

    +This document discusses the regular expression patterns that are supported by +PCRE2 when its main matching function, pcre2_match(), is used. PCRE2 also +has an alternative matching function, pcre2_dfa_match(), which matches +using a different algorithm that is not Perl-compatible. Some of the features +discussed below are not available when DFA matching is used. The advantages and +disadvantages of the alternative function, and how it differs from the normal +function, are discussed in the +pcre2matching +page. +

    +
    SPECIAL START-OF-PATTERN ITEMS
    +

    +A number of options that can be passed to pcre2_compile() can also be set +by special items at the start of a pattern. These are not Perl-compatible, but +are provided to make these options accessible to pattern writers who are not +able to change the program that processes the pattern. Any number of these +items may appear, but they must all be together right at the start of the +pattern string, and the letters must be in upper case. +

    +
    +UTF support +
    +

    +In the 8-bit and 16-bit PCRE2 libraries, characters may be coded either as +single code units, or as multiple UTF-8 or UTF-16 code units. UTF-32 can be +specified for the 32-bit library, in which case it constrains the character +values to valid Unicode code points. To process UTF strings, PCRE2 must be +built to include Unicode support (which is the default). When using UTF strings +you must either call the compiling function with one or both of the PCRE2_UTF +or PCRE2_MATCH_INVALID_UTF options, or the pattern must start with the special +sequence (*UTF), which is equivalent to setting the PCRE2_UTF option. How +setting a UTF mode affects pattern matching is mentioned in several places +below. There is also a summary of features in the +pcre2unicode +page. +

    +

    +Some applications that allow their users to supply patterns may wish to +restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF +option is passed to pcre2_compile(), (*UTF) is not allowed, and its +appearance in a pattern causes an error. +

    +
    +Unicode property support +
    +

    +Another special sequence that may appear at the start of a pattern is (*UCP). +This has the same effect as setting the PCRE2_UCP option: it causes sequences +such as \d and \w to use Unicode properties to determine character types, +instead of recognizing only characters with codes less than 256 via a lookup +table. It also causes upper/lower casing operations to use Unicode properties +for characters with code points greater than 127, even when UTF is not set. +

    +

    +Some applications that allow their users to supply patterns may wish to +restrict them for security reasons. If the PCRE2_NEVER_UCP option is passed to +pcre2_compile(), (*UCP) is not allowed, and its appearance in a pattern +causes an error. +

    +
    +Locking out empty string matching +
    +

    +Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same effect +as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option to whichever +matching function is subsequently called to match the pattern. These options +lock out the matching of empty strings, either entirely, or only at the start +of the subject. +

    +
    +Disabling auto-possessification +
    +

    +If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting +the PCRE2_NO_AUTO_POSSESS option. This stops PCRE2 from making quantifiers +possessive when what follows cannot match the repeated item. For example, by +default a+b is treated as a++b. For more details, see the +pcre2api +documentation. +

    +
    +Disabling start-up optimizations +
    +

    +If a pattern starts with (*NO_START_OPT), it has the same effect as setting the +PCRE2_NO_START_OPTIMIZE option. This disables several optimizations for quickly +reaching "no match" results. For more details, see the +pcre2api +documentation. +

    +
    +Disabling automatic anchoring +
    +

    +If a pattern starts with (*NO_DOTSTAR_ANCHOR), it has the same effect as +setting the PCRE2_NO_DOTSTAR_ANCHOR option. This disables optimizations that +apply to patterns whose top-level branches all start with .* (match any number +of arbitrary characters). For more details, see the +pcre2api +documentation. +

    +
    +Disabling JIT compilation +
    +

    +If a pattern that starts with (*NO_JIT) is successfully compiled, an attempt by +the application to apply the JIT optimization by calling +pcre2_jit_compile() is ignored. +

    +
    +Setting match resource limits +
    +

    +The pcre2_match() function contains a counter that is incremented every +time it goes round its main loop. The caller of pcre2_match() can set a +limit on this counter, which therefore limits the amount of computing resource +used for a match. The maximum depth of nested backtracking can also be limited; +this indirectly restricts the amount of heap memory that is used, but there is +also an explicit memory limit that can be set. +

    +

    +These facilities are provided to catch runaway matches that are provoked by +patterns with huge matching trees. A common example is a pattern with nested +unlimited repeats applied to a long string that does not match. When one of +these limits is reached, pcre2_match() gives an error return. The limits +can also be set by items at the start of the pattern of the form +

    +  (*LIMIT_HEAP=d)
    +  (*LIMIT_MATCH=d)
    +  (*LIMIT_DEPTH=d)
    +
    +where d is any number of decimal digits. However, the value of the setting must +be less than the value set (or defaulted) by the caller of pcre2_match() +for it to have any effect. In other words, the pattern writer can lower the +limits set by the programmer, but not raise them. If there is more than one +setting of one of these limits, the lower value is used. The heap limit is +specified in kibibytes (units of 1024 bytes). +
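The rule that a pattern can lower a limit but never raise it can be sketched in a few lines of Python. This is only an illustration of the rule stated above, not PCRE2 code: the names pattern_limits and effective_limit are hypothetical, and the parser assumes the limit items appear consecutively at the very start of the pattern, ignoring other leading items such as (*UTF).

```python
import re

# Hypothetical helper names; this models the documented rule, not PCRE2 internals.
_LIMIT_ITEM = re.compile(
    r"\(\*(LIMIT_HEAP|LIMIT_MATCH|LIMIT_DEPTH|LIMIT_RECURSION)=(\d+)\)"
)


def pattern_limits(pattern):
    """Collect consecutive leading (*LIMIT_...=d) items from a pattern string."""
    found = {}
    pos = 0
    while (m := _LIMIT_ITEM.match(pattern, pos)):
        # LIMIT_RECURSION is the pre-10.30 name for LIMIT_DEPTH.
        name = "LIMIT_DEPTH" if m.group(1) == "LIMIT_RECURSION" else m.group(1)
        found.setdefault(name, []).append(int(m.group(2)))
        pos = m.end()
    return found


def effective_limit(caller_limit, pattern_settings):
    """The lowest value wins; a pattern setting above the caller's has no effect."""
    return min([caller_limit, *pattern_settings])


limits = pattern_limits("(*LIMIT_MATCH=500)(*LIMIT_MATCH=2000)a+")
print(effective_limit(1000, limits.get("LIMIT_MATCH", [])))
```

With a caller limit of 1000, the two settings 500 and 2000 yield an effective limit of 500; a lone setting of 5000 would leave the caller's 1000 in force.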

    +

    +Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is +still recognized for backwards compatibility. +

    +

    +The heap limit applies only when the pcre2_match() or +pcre2_dfa_match() interpreters are used for matching. It does not apply +to JIT. The match limit is used (but in a different way) when JIT is being +used, or when pcre2_dfa_match() is called, to limit computing resource +usage by those matching functions. The depth limit is ignored by JIT but is +relevant for DFA matching, which uses function recursion for recursions within +the pattern and for lookaround assertions and atomic groups. In this case, the +depth limit controls the depth of such recursion. +

    +
    +Newline conventions +
    +

    +PCRE2 supports six different conventions for indicating line breaks in +strings: a single CR (carriage return) character, a single LF (linefeed) +character, the two-character sequence CRLF, any of the three preceding, any +Unicode newline sequence, or the NUL character (binary zero). The +pcre2api +page has +further discussion +about newlines, and shows how to set the newline convention when calling +pcre2_compile(). +

    +

    +It is also possible to specify a newline convention by starting a pattern +string with one of the following sequences: +

    +  (*CR)        carriage return
    +  (*LF)        linefeed
    +  (*CRLF)      carriage return, followed by linefeed
    +  (*ANYCRLF)   any of the three above
    +  (*ANY)       all Unicode newline sequences
    +  (*NUL)       the NUL character (binary zero)
    +
    +These override the default and the options given to the compiling function. For +example, on a Unix system where LF is the default newline sequence, the pattern +
    +  (*CR)a.b
    +
    +changes the convention to CR. That pattern matches "a\nb" because LF is no +longer a newline. If more than one of these settings is present, the last one +is used. +

    +

    +The newline convention affects where the circumflex and dollar assertions are +true. It also affects the interpretation of the dot metacharacter when +PCRE2_DOTALL is not set, and the behaviour of \N when not followed by an +opening brace. However, it does not affect what the \R escape sequence +matches. By default, this is any Unicode newline sequence, for Perl +compatibility. However, this can be changed; see the next section and the +description of \R in the section entitled +"Newline sequences" +below. A change of \R setting can be combined with a change of newline +convention. +

    +
    +Specifying what \R matches +
    +

    +It is possible to restrict \R to match only CR, LF, or CRLF (instead of the +complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF +at compile time. This effect can also be achieved by starting a pattern with +(*BSR_ANYCRLF). For completeness, (*BSR_UNICODE) is also recognized, +corresponding to PCRE2_BSR_UNICODE. +

    +
    EBCDIC CHARACTER CODES
    +

    +PCRE2 can be compiled to run in an environment that uses EBCDIC as its +character code instead of ASCII or Unicode (typically a mainframe system). In +the sections below, character code values are ASCII or Unicode; in an EBCDIC +environment these characters may have different code values, and there are no +code points greater than 255. +

    +
    CHARACTERS AND METACHARACTERS
    +

    +A regular expression is a pattern that is matched against a subject string from +left to right. Most characters stand for themselves in a pattern, and match the +corresponding characters in the subject. As a trivial example, the pattern +

    +  The quick brown fox
    +
    +matches a portion of a subject string that is identical to itself. When +caseless matching is specified (the PCRE2_CASELESS option or (?i) within the +pattern), letters are matched independently of case. Note that there are two +ASCII characters, K and S, that, in addition to their lower case ASCII +equivalents, are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F +(long S) respectively when either PCRE2_UTF or PCRE2_UCP is set. +

    +

    +The power of regular expressions comes from the ability to include wild cards, +character classes, alternatives, and repetitions in the pattern. These are +encoded in the pattern by the use of metacharacters, which do not stand +for themselves but instead are interpreted in some special way. +

    +

    +There are two different sets of metacharacters: those that are recognized +anywhere in the pattern except within square brackets, and those that are +recognized within square brackets. Outside square brackets, the metacharacters +are as follows: +

    +  \      general escape character with several uses
    +  ^      assert start of string (or line, in multiline mode)
    +  $      assert end of string (or line, in multiline mode)
    +  .      match any character except newline (by default)
    +  [      start character class definition
    +  |      start of alternative branch
    +  (      start group or control verb
    +  )      end group or control verb
    +  *      0 or more quantifier
    +  +      1 or more quantifier; also "possessive quantifier"
    +  ?      0 or 1 quantifier; also quantifier minimizer
    +  {      start min/max quantifier
    +
    +Part of a pattern that is in square brackets is called a "character class". In +a character class the only metacharacters are: +
    +  \      general escape character
    +  ^      negate the class, but only if the first character
    +  -      indicates character range
    +  [      POSIX character class (if followed by POSIX syntax)
    +  ]      terminates the character class
    +
    +If a pattern is compiled with the PCRE2_EXTENDED option, most white space in +the pattern, other than in a character class, and characters between a # +outside a character class and the next newline, inclusive, are ignored. An +escaping backslash can be used to include a white space or a # character as +part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the same +applies, but in addition unescaped space and horizontal tab characters are +ignored inside a character class. Note: only these two characters are ignored, +not the full set of pattern white space characters that are ignored outside a +character class. Option settings can be changed within a pattern; see the +section entitled +"Internal Option Setting" +below. +

    +

    +The following sections describe the use of each of the metacharacters. +

    +
    BACKSLASH
    +

    +The backslash character has several uses. Firstly, if it is followed by a +character that is not a digit or a letter, it takes away any special meaning +that character may have. This use of backslash as an escape character applies +both inside and outside character classes. +

    +

    +For example, if you want to match a * character, you must write \* in the +pattern. This escaping action applies whether or not the following character +would otherwise be interpreted as a metacharacter, so it is always safe to +precede a non-alphanumeric with backslash to specify that it stands for itself. +In particular, if you want to match a backslash, you write \\. +

    +

    +Only ASCII digits and letters have any special meaning after a backslash. All +other characters (in particular, those whose code points are greater than 127) +are treated as literals. +

    +

    +If you want to treat all characters in a sequence as literals, you can do so by +putting them between \Q and \E. This is different from Perl in that $ and @ +are handled as literals in \Q...\E sequences in PCRE2, whereas in Perl, $ and +@ cause variable interpolation. Also, Perl does "double-quotish backslash +interpolation" on any backslashes between \Q and \E which, its documentation +says, "may lead to confusing results". PCRE2 treats a backslash between \Q and +\E just like any other character. Note the following examples: +

    +  Pattern            PCRE2 matches   Perl matches
    +
    +  \Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
    +  \Qabc\$xyz\E       abc\$xyz       abc\$xyz
    +  \Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
    +  \QA\B\E            A\B            A\B
    +  \Q\\E              \              \\E
    +
    +The \Q...\E sequence is recognized both inside and outside character classes. +An isolated \E that is not preceded by \Q is ignored. If \Q is not followed +by \E later in the pattern, the literal interpretation continues to the end of +the pattern (that is, \E is assumed at the end). If the isolated \Q is inside +a character class, this causes an error, because the character class is not +terminated by a closing square bracket. +

    +
    +Non-printing characters +
    +

    +A second use of backslash provides a way of encoding non-printing characters +in patterns in a visible manner. There is no restriction on the appearance of +non-printing characters in a pattern, but when a pattern is being prepared by +text editing, it is often easier to use one of the following escape sequences +instead of the binary character it represents. In an ASCII or Unicode +environment, these escapes are as follows: +

    +  \a          alarm, that is, the BEL character (hex 07)
    +  \cx         "control-x", where x is any printable ASCII character
    +  \e          escape (hex 1B)
    +  \f          form feed (hex 0C)
    +  \n          linefeed (hex 0A)
    +  \r          carriage return (hex 0D) (but see below)
    +  \t          tab (hex 09)
    +  \0dd        character with octal code 0dd
    +  \ddd        character with octal code ddd, or backreference
    +  \o{ddd..}   character with octal code ddd..
    +  \xhh        character with hex code hh
    +  \x{hhh..}   character with hex code hhh..
    +  \N{U+hhh..} character with Unicode hex code point hhh..
    +
    +By default, after \x that is not followed by {, from zero to two hexadecimal +digits are read (letters can be in upper or lower case). Any number of +hexadecimal digits may appear between \x{ and }. If a character other than a +hexadecimal digit appears between \x{ and }, or if there is no terminating }, +an error occurs. +

    +

    +Characters whose code points are less than 256 can be defined by either of the +two syntaxes for \x or by an octal sequence. There is no difference in the way +they are handled. For example, \xdc is exactly the same as \x{dc} or \334. +However, using the braced versions does make such sequences easier to read. +

    +

    +Support is available for some ECMAScript (aka JavaScript) escape sequences via +two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \x followed +by { is not recognized. Only if \x is followed by two hexadecimal digits is it +recognized as a character escape. Otherwise it is interpreted as a literal "x" +character. In this mode, support for code points greater than 255 is provided +by \u, which must be followed by four hexadecimal digits; otherwise it is +interpreted as a literal "u" character. +

    +

    +PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition, +\u{hhh..} is recognized as the character specified by hexadecimal code point. +There may be any number of hexadecimal digits. This syntax is from ECMAScript +6. +

    +

    +The \N{U+hhh..} escape sequence is recognized only when PCRE2 is operating in +UTF mode. Perl also uses \N{name} to specify characters by Unicode name; PCRE2 +does not support this. Note that when \N is not followed by an opening brace +(curly bracket) it has an entirely different meaning, matching any character +that is not a newline. +

    +

    +There are some legacy applications where the escape sequence \r is expected to +match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \r in a +pattern is converted to \n so that it matches a LF (linefeed) instead of a CR +(carriage return) character. +

    +

    +The precise effect of \cx on ASCII characters is as follows: if x is a lower +case letter, it is converted to upper case. Then bit 6 of the character (hex +40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A), +but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the +code unit following \c has a value less than 32 or greater than 126, a +compile-time error occurs. +
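The arithmetic behind \cx in an ASCII environment can be checked with a short Python sketch. The function name control_escape is hypothetical and the code only mirrors the rule described above (upper-case a lower case letter, then invert bit 6); it does not call PCRE2.

```python
def control_escape(ch):
    """Return the code value that \\c followed by ch denotes in an ASCII environment."""
    code = ord(ch)
    if code < 32 or code > 126:
        # PCRE2 reports a compile-time error here; this sketch raises instead.
        raise ValueError("character after \\c must be printable ASCII")
    if "a" <= ch <= "z":
        code = ord(ch.upper())  # lower case letters are converted to upper case
    return code ^ 0x40          # invert bit 6 (hex 40)


for ch in "Az{;?":
    print(ch, hex(control_escape(ch)))
```

The values for A, z, { and ; reproduce the hex 01, 1A, 3B and 7B quoted in the text, and ? gives hex 7F (DEL).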

    +

    +When PCRE2 is compiled in EBCDIC mode, \N{U+hhh..} is not supported. \a, \e, +\f, \n, \r, and \t generate the appropriate EBCDIC code values. The \c +escape is processed as specified for Perl in the perlebcdic document. The +only characters that are allowed after \c are A-Z, a-z, or one of @, [, \, ], +^, _, or ?. Any other character provokes a compile-time error. The sequence +\c@ encodes character code 0; after \c the letters (in either case) encode +characters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 +(hex 1B to hex 1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F). +

    +

    +Thus, apart from \c?, these escapes generate the same character code values as +they do in an ASCII environment, though the meanings of the values mostly +differ. For example, \cG always generates code value 7, which is BEL in ASCII +but DEL in EBCDIC. +

    +

    +The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but +because 127 is not a control character in EBCDIC, Perl makes it generate the +APC character. Unfortunately, there are several variants of EBCDIC. In most of +them the APC character has the value 255 (hex FF), but in the one Perl calls +POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC +values, PCRE2 makes \c? generate 95; otherwise it generates 255. +

    +

    +After \0 up to two further octal digits are read. If there are fewer than two +digits, just those that are present are used. Thus the sequence \0\x\015 +specifies two binary zeros followed by a CR character (code value 13). Make +sure you supply two digits after the initial zero if the pattern character that +follows is itself an octal digit. +

    +

    +The escape \o must be followed by a sequence of octal digits, enclosed in +braces. An error occurs if this is not the case. This escape is a recent +addition to Perl; it provides a way of specifying character code points as octal +numbers greater than 0777, and it also allows octal numbers and backreferences +to be unambiguously specified. +

    +

    +For greater clarity and unambiguity, it is best to avoid following \ by a +digit greater than zero. Instead, use \o{} or \x{} to specify numerical +character code points, and \g{} to specify backreferences. The following +paragraphs describe the old, ambiguous syntax. +

    +

    +The handling of a backslash followed by a digit other than 0 is complicated, +and Perl has changed over time, causing PCRE2 also to change. +

    +

    +Outside a character class, PCRE2 reads the digit and any following digits as a +decimal number. If the number is less than 10, begins with the digit 8 or 9, or +if there are at least that many previous capture groups in the expression, the +entire sequence is taken as a backreference. A description of how this +works is given +later, +following the discussion of +parenthesized groups. +Otherwise, up to three octal digits are read to form a character code. +

    +

    +Inside a character class, PCRE2 handles \8 and \9 as the literal characters +"8" and "9", and otherwise reads up to three octal digits following the +backslash, using them to generate a data character. Any subsequent digits stand +for themselves. For example, outside a character class: +

    +  \040   is another way of writing an ASCII space
    +  \40    is the same, provided there are fewer than 40 previous capture groups
    +  \7     is always a backreference
    +  \11    might be a backreference, or another way of writing a tab
    +  \011   is always a tab
    +  \0113  is a tab followed by the character "3"
    +  \113   might be a backreference, otherwise the character with octal code 113
    +  \377   might be a backreference, otherwise the value 255 (decimal)
    +  \81    is always a backreference
    +
    +Note that octal values of 100 or greater that are specified using this syntax +must not be introduced by a leading zero, because no more than three octal +digits are ever read. +
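The decision rule for a backslash followed by a non-zero digit outside a character class can be written out as a small Python sketch. The function name interpret_backslash_digits and its return format are hypothetical; the logic follows the description above and does not cover the separate limits on character values in each code unit width.

```python
def interpret_backslash_digits(digits, capture_groups):
    """Classify the digits that follow a backslash outside a character class.

    digits         -- run of decimal digits after the backslash, first digit 1-9
    capture_groups -- number of capture groups that precede this point
    Returns (kind, value, literal_rest).
    """
    number = int(digits)
    if number < 10 or digits[0] in "89" or number <= capture_groups:
        return ("backreference", number, "")
    # Otherwise read up to three octal digits; the remainder is literal text.
    octal = ""
    for d in digits[:3]:
        if d not in "01234567":
            break
        octal += d
    return ("character", int(octal, 8), digits[len(octal):])


print(interpret_backslash_digits("40", 0))    # octal 40, an ASCII space
print(interpret_backslash_digits("11", 11))   # enough groups: backreference
print(interpret_backslash_digits("81", 0))    # starts with 8: backreference
```

Passing the number of preceding groups explicitly makes the ambiguity visible: the same digits change meaning as groups are added to a pattern, which is why \o{}, \x{} and \g{} are recommended instead.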

    +
    +Constraints on character values +
    +

    +Characters that are specified using octal or hexadecimal numbers are +limited to certain values, as follows: +

    +  8-bit non-UTF mode    no greater than 0xff
    +  16-bit non-UTF mode   no greater than 0xffff
    +  32-bit non-UTF mode   no greater than 0xffffffff
    +  All UTF modes         no greater than 0x10ffff and a valid code point
    +
    +Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the +so-called "surrogate" code points). The check for these can be disabled by the +caller of pcre2_compile() by setting the option +PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. However, this is possible only in UTF-8 +and UTF-32 modes, because these values are not representable in UTF-16. +
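The table and the surrogate rule can be combined into one check, sketched here in Python. The names max_code_value and is_allowed_value are hypothetical, and the allow_surrogates flag merely stands in for the effect of PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES as described above.

```python
def max_code_value(width_bits, utf):
    """Largest character value that an octal or hex escape may specify."""
    if utf:
        return 0x10FFFF
    return {8: 0xFF, 16: 0xFFFF, 32: 0xFFFFFFFF}[width_bits]


def is_allowed_value(value, width_bits, utf, allow_surrogates=False):
    """Apply the documented range limit and, in UTF modes, the surrogate check."""
    if value < 0 or value > max_code_value(width_bits, utf):
        return False
    if utf and 0xD800 <= value <= 0xDFFF:
        # Surrogates can be permitted only in UTF-8 and UTF-32 modes.
        return allow_surrogates and width_bits != 16
    return True


print(is_allowed_value(0x100, 8, utf=False))    # too large for 8-bit non-UTF
print(is_allowed_value(0xD800, 32, utf=True))   # surrogate, rejected by default
```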

    +
    +Escape sequences in character classes +
    +

    +All the sequences that define a single character value can be used both inside +and outside character classes. In addition, inside a character class, \b is +interpreted as the backspace character (hex 08). +

    +

    +When not followed by an opening brace, \N is not allowed in a character class. +\B, \R, and \X are not special inside a character class. Like other +unrecognized alphabetic escape sequences, they cause an error. Outside a +character class, these sequences have different meanings. +

    +
    +Unsupported escape sequences +
    +

    +In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its string +handler and used to modify the case of following characters. By default, PCRE2 +does not support these escape sequences in patterns. However, if either of the +PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U matches a "U" +character, and \u can be used to define a character by code point, as +described above. +

    +
    +Absolute and relative backreferences +
    +

    +The sequence \g followed by a signed or unsigned number, optionally enclosed +in braces, is an absolute or relative backreference. A named backreference +can be coded as \g{name}. Backreferences are discussed +later, +following the discussion of +parenthesized groups. +

    +
    +Absolute and relative subroutine calls +
    +

    +For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or +a number enclosed either in angle brackets or single quotes, is an alternative +syntax for referencing a capture group as a subroutine. Details are discussed +later. +Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not +synonymous. The former is a backreference; the latter is a +subroutine +call. +

    +
    +Generic character types +
    +

    +Another use of backslash is for specifying generic character types: +

    +  \d     any decimal digit
    +  \D     any character that is not a decimal digit
    +  \h     any horizontal white space character
    +  \H     any character that is not a horizontal white space character
    +  \N     any character that is not a newline
    +  \s     any white space character
    +  \S     any character that is not a white space character
    +  \v     any vertical white space character
    +  \V     any character that is not a vertical white space character
    +  \w     any "word" character
    +  \W     any "non-word" character
    +
    +The \N escape sequence has the same meaning as +the "." metacharacter +when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the +meaning of \N. Note that when \N is followed by an opening brace it has a +different meaning. See the section entitled +"Non-printing characters" +above for details. Perl also uses \N{name} to specify characters by Unicode +name; PCRE2 does not support this. +

    +

    +Each pair of lower and upper case escape sequences partitions the complete set +of characters into two disjoint sets. Any given character matches one, and only +one, of each pair. The sequences can appear both inside and outside character +classes. They each match one character of the appropriate type. If the current +matching point is at the end of the subject string, all of them fail, because +there is no character to match. +

    +

    +The default \s characters are HT (9), LF (10), VT (11), FF (12), CR (13), and +space (32), which are defined as white space in the "C" locale. This list may +vary if locale-specific matching is taking place. For example, in some locales +the "non-breaking space" character (\xA0) is recognized as white space, and in +others the VT character is not. +

    +

    +A "word" character is an underscore or any character that is a letter or digit. +By default, the definition of letters and digits is controlled by PCRE2's +low-valued character tables, and may vary if locale-specific matching is taking +place (see +"Locale support" +in the +pcre2api +page). For example, in a French locale such as "fr_FR" in Unix-like systems, +or "french" in Windows, some character codes greater than 127 are used for +accented letters, and these are then matched by \w. The use of locales with +Unicode is discouraged. +

    +

    +By default, characters whose code points are greater than 127 never match \d, +\s, or \w, and always match \D, \S, and \W, although this may be different +for characters in the range 128-255 when locale-specific matching is happening. +These escape sequences retain their original meanings from before Unicode +support was available, mainly for efficiency reasons. If the PCRE2_UCP option +is set, the behaviour is changed so that Unicode properties are used to +determine character types, as follows: +

    +  \d  any character that matches \p{Nd} (decimal digit)
    +  \s  any character that matches \p{Z} or \h or \v
    +  \w  any character that matches \p{L} or \p{N}, plus underscore
    +
    +The upper case escapes match the inverse sets of characters. Note that \d +matches only decimal digits, whereas \w matches any Unicode digit, as well as +any Unicode letter, and underscore. Note also that PCRE2_UCP affects \b and +\B because they are defined in terms of \w and \W. Matching these sequences +is noticeably slower when PCRE2_UCP is set. +

    +

    +The sequences \h, \H, \v, and \V, in contrast to the other sequences, which +match only ASCII characters by default, always match a specific list of code +points, whether or not PCRE2_UCP is set. The horizontal space characters are: +

    +  U+0009     Horizontal tab (HT)
    +  U+0020     Space
    +  U+00A0     Non-break space
    +  U+1680     Ogham space mark
    +  U+180E     Mongolian vowel separator
    +  U+2000     En quad
    +  U+2001     Em quad
    +  U+2002     En space
    +  U+2003     Em space
    +  U+2004     Three-per-em space
    +  U+2005     Four-per-em space
    +  U+2006     Six-per-em space
    +  U+2007     Figure space
    +  U+2008     Punctuation space
    +  U+2009     Thin space
    +  U+200A     Hair space
    +  U+202F     Narrow no-break space
    +  U+205F     Medium mathematical space
    +  U+3000     Ideographic space
    +
    +The vertical space characters are: +
    +  U+000A     Linefeed (LF)
    +  U+000B     Vertical tab (VT)
    +  U+000C     Form feed (FF)
    +  U+000D     Carriage return (CR)
    +  U+0085     Next line (NEL)
    +  U+2028     Line separator
    +  U+2029     Paragraph separator
    +
    +In 8-bit, non-UTF-8 mode, only the characters with code points less than 256 +are relevant. +
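Because \h and \v are defined by fixed lists, they can be modelled as plain sets. The Python sketch below copies the code points from the two lists above; the names HSPACE, VSPACE and relevant_in_8bit are hypothetical and nothing here calls PCRE2.

```python
# Code points copied from the horizontal and vertical space lists above.
HSPACE = frozenset(
    [0x0009, 0x0020, 0x00A0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))       # U+2000 (en quad) to U+200A (hair space)
    + [0x202F, 0x205F, 0x3000]
)
VSPACE = frozenset([0x000A, 0x000B, 0x000C, 0x000D, 0x0085, 0x2028, 0x2029])


def relevant_in_8bit(code_points):
    """Subset that can occur in 8-bit non-UTF-8 mode (code points below 256)."""
    return sorted(cp for cp in code_points if cp < 256)


print([hex(cp) for cp in relevant_in_8bit(HSPACE)])
print([hex(cp) for cp in relevant_in_8bit(VSPACE)])
```

The two sets are disjoint, and in 8-bit non-UTF-8 mode only three horizontal and five vertical characters remain relevant.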

    +
    +Newline sequences +
    +

    +Outside a character class, by default, the escape sequence \R matches any +Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent to the +following: +

    +  (?>\r\n|\n|\x0b|\f|\r|\x85)
    +
    +This is an example of an "atomic group", details of which are given +below. +This particular group matches either the two-character sequence CR followed by +LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, +U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next +line, U+0085). Because this is an atomic group, the two-character sequence is +treated as a single unit that cannot be split. +

    +

    +In other modes, two additional characters whose code points are greater than 255 +are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). +Unicode support is not needed for these characters to be recognized. +

    +

    +It is possible to restrict \R to match only CR, LF, or CRLF (instead of the +complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF +at compile time. (BSR is an abbreviation for "backslash R".) This can be made +the default when PCRE2 is built; if this is the case, the other behaviour can +be requested via the PCRE2_BSR_UNICODE option. It is also possible to specify +these settings by starting a pattern string with one of the following +sequences: +

    +  (*BSR_ANYCRLF)   CR, LF, or CRLF only
    +  (*BSR_UNICODE)   any Unicode newline sequence
    +
    +These override the default and the options given to the compiling function. +Note that these special settings, which are not Perl-compatible, are recognized +only at the very start of a pattern, and that they must be in upper case. If +more than one of them is present, the last one is used. They can be combined +with a change of newline convention; for example, a pattern can start with: +
    +  (*ANY)(*BSR_ANYCRLF)
    +
    +They can also be combined with the (*UTF) or (*UCP) special sequences. Inside a +character class, \R is treated as an unrecognized escape sequence, and causes +an error. +

    +
    +Unicode character properties +
    +

    +When PCRE2 is built with Unicode support (the default), three additional escape +sequences that match characters with specific properties are available. They +can be used in any mode, though in 8-bit and 16-bit non-UTF modes these +sequences are of course limited to testing characters whose code points are +less than U+0100 and U+10000, respectively. In 32-bit non-UTF mode, code points +greater than 0x10ffff (the Unicode limit) may be encountered. These are all +treated as being in the Unknown script and with an unassigned type. The extra +escape sequences are: +

    +  \p{xx}   a character with the xx property
    +  \P{xx}   a character without the xx property
    +  \X       a Unicode extended grapheme cluster
    +
    +The property names represented by xx above are case-sensitive. There is +support for Unicode script names, Unicode general category properties, "Any", +which matches any character (including newline), and some special PCRE2 +properties (described in the +next section). +Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2. +Note that \P{Any} does not match any characters, so always causes a match +failure. +

    +

    +Sets of Unicode characters are defined as belonging to certain scripts. A +character from one of these sets can be matched using a script name. For +example: +

    +  \p{Greek}
    +  \P{Han}
    +
    +Unassigned characters (and in non-UTF 32-bit mode, characters with code points +greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not +part of an identified script are lumped together as "Common". The current list +of scripts is: +

    +

    +Adlam, +Ahom, +Anatolian_Hieroglyphs, +Arabic, +Armenian, +Avestan, +Balinese, +Bamum, +Bassa_Vah, +Batak, +Bengali, +Bhaiksuki, +Bopomofo, +Brahmi, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Carian, +Caucasian_Albanian, +Chakma, +Cham, +Cherokee, +Chorasmian, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Dives_Akuru, +Dogra, +Duployan, +Egyptian_Hieroglyphs, +Elbasan, +Elymaic, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Grantha, +Greek, +Gujarati, +Gunjala_Gondi, +Gurmukhi, +Han, +Hangul, +Hanifi_Rohingya, +Hanunoo, +Hatran, +Hebrew, +Hiragana, +Imperial_Aramaic, +Inherited, +Inscriptional_Pahlavi, +Inscriptional_Parthian, +Javanese, +Kaithi, +Kannada, +Katakana, +Kayah_Li, +Kharoshthi, +Khitan_Small_Script, +Khmer, +Khojki, +Khudawadi, +Lao, +Latin, +Lepcha, +Limbu, +Linear_A, +Linear_B, +Lisu, +Lycian, +Lydian, +Mahajani, +Makasar, +Malayalam, +Mandaic, +Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, +Meetei_Mayek, +Mende_Kikakui, +Meroitic_Cursive, +Meroitic_Hieroglyphs, +Miao, +Modi, +Mongolian, +Mro, +Multani, +Myanmar, +Nabataean, +Nandinagari, +New_Tai_Lue, +Newa, +Nko, +Nushu, +Nyakeng_Puachue_Hmong, +Ogham, +Ol_Chiki, +Old_Hungarian, +Old_Italic, +Old_North_Arabian, +Old_Permic, +Old_Persian, +Old_Sogdian, +Old_South_Arabian, +Old_Turkic, +Oriya, +Osage, +Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, +Phags_Pa, +Phoenician, +Psalter_Pahlavi, +Rejang, +Runic, +Samaritan, +Saurashtra, +Sharada, +Shavian, +Siddham, +SignWriting, +Sinhala, +Sogdian, +Sora_Sompeng, +Soyombo, +Sundanese, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tai_Tham, +Tai_Viet, +Takri, +Tamil, +Tangut, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Tirhuta, +Ugaritic, +Unknown, +Vai, +Wancho, +Warang_Citi, +Yezidi, +Yi, +Zanabazar_Square. +

    +

    +Each character has exactly one Unicode general category property, specified by +a two-letter abbreviation. For compatibility with Perl, negation can be +specified by including a circumflex between the opening brace and the property +name. For example, \p{^Lu} is the same as \P{Lu}. +

    +

    +If only one letter is specified with \p or \P, it includes all the general +category properties that start with that letter. In this case, in the absence +of negation, the curly brackets in the escape sequence are optional; these two +examples have the same effect: +

    +  \p{L}
    +  \pL
    +
    +The following general category property codes are supported: +
    +  C     Other
    +  Cc    Control
    +  Cf    Format
    +  Cn    Unassigned
    +  Co    Private use
    +  Cs    Surrogate
    +
    +  L     Letter
    +  Ll    Lower case letter
    +  Lm    Modifier letter
    +  Lo    Other letter
    +  Lt    Title case letter
    +  Lu    Upper case letter
    +
    +  M     Mark
    +  Mc    Spacing mark
    +  Me    Enclosing mark
    +  Mn    Non-spacing mark
    +
    +  N     Number
    +  Nd    Decimal number
    +  Nl    Letter number
    +  No    Other number
    +
    +  P     Punctuation
    +  Pc    Connector punctuation
    +  Pd    Dash punctuation
    +  Pe    Close punctuation
    +  Pf    Final punctuation
    +  Pi    Initial punctuation
    +  Po    Other punctuation
    +  Ps    Open punctuation
    +
    +  S     Symbol
    +  Sc    Currency symbol
    +  Sk    Modifier symbol
    +  Sm    Mathematical symbol
    +  So    Other symbol
    +
    +  Z     Separator
    +  Zl    Line separator
    +  Zp    Paragraph separator
    +  Zs    Space separator
    +
    +The special property L& is also supported: it matches a character that has +the Lu, Ll, or Lt property, in other words, a letter that is not classified as +a modifier or "other". +
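The category rules above can be modelled outside PCRE2. The following is a minimal sketch in Python, not PCRE2 code: it assumes that Python's unicodedata tables are close enough to the Unicode version PCRE2 was built with, and has_property is a hypothetical helper name invented for illustration.

```python
import unicodedata

def has_property(ch: str, prop: str) -> bool:
    """Model of \\p{prop} for general category names (hypothetical helper)."""
    category = unicodedata.category(ch)  # always a two-letter code such as "Lu"
    if prop == "L&":
        # L& covers cased letters only: Lu, Ll, or Lt.
        return category in ("Lu", "Ll", "Lt")
    # A one-letter name matches every category that starts with that letter;
    # a two-letter name can only match the identical category code.
    return category.startswith(prop)

def has_negated_property(ch: str, prop: str) -> bool:
    """Model of \\P{prop}, which is the same as \\p{^prop}."""
    return not has_property(ch, prop)
```

With this model, has_property("a", "L") is true but has_property("a", "Lu") is false, and a modifier letter such as U+02B0 satisfies L but not L&.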

    +

    +The Cs (Surrogate) property applies only to characters whose code points are in +the range U+D800 to U+DFFF. These characters are no different to any other +character when PCRE2 is not in UTF mode (using the 16-bit or 32-bit library). +However, they are not valid in Unicode strings and so cannot be tested by PCRE2 +in UTF mode, unless UTF validity checking has been turned off (see the +discussion of PCRE2_NO_UTF_CHECK in the +pcre2api +page). +

    +

    +The long synonyms for property names that Perl supports (such as \p{Letter}) +are not supported by PCRE2, nor is it permitted to prefix any of these +properties with "Is". +

    +

    +No character that is in the Unicode table has the Cn (unassigned) property. +Instead, this property is assumed for any code point that is not in the +Unicode table. +

    +

    +Specifying caseless matching does not affect these escape sequences. For +example, \p{Lu} always matches only upper case letters. This is different from +the behaviour of current versions of Perl. +

    +

    +Matching characters by Unicode property is not fast, because PCRE2 has to do a +multistage table lookup in order to find a character's property. That is why +the traditional escape sequences such as \d and \w do not use Unicode +properties in PCRE2 by default, though you can make them do so by setting the +PCRE2_UCP option or by starting the pattern with (*UCP). +

    +
    +Extended grapheme clusters +
    +

    +The \X escape matches any number of Unicode characters that form an "extended +grapheme cluster", and treats the sequence as an atomic group +(see below). +Unicode supports various kinds of composite character by giving each character +a grapheme breaking property, and having rules that use these properties to +define the boundaries of extended grapheme clusters. The rules are defined in +Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 +abandoned the use of some previous properties that had been used for emojis. +Instead it introduced various emoji-specific properties. PCRE2 uses only the +Extended Pictographic property. +

    +

    +\X always matches at least one character. Then it decides whether to add +additional characters according to the following rules for ending a cluster: +

    +

    +1. End at the end of the subject string. +

    +

    +2. Do not end between CR and LF; otherwise end after any control character. +

    +

    +3. Do not break Hangul (a Korean script) syllable sequences. Hangul characters +are of five types: L, V, T, LV, and LVT. An L character may be followed by an +L, V, LV, or LVT character; an LV or V character may be followed by a V or T +character; an LVT or T character may be followed only by a T character. +

    +

    +4. Do not end before extending characters or spacing marks or the "zero-width +joiner" character. Characters with the "mark" property always have the +"extend" grapheme breaking property. +

    +

    +5. Do not end after prepend characters. +

    +

    +6. Do not break within emoji modifier sequences or emoji zwj sequences. That +is, do not break between characters with the Extended_Pictographic property. +Extend and ZWJ characters are allowed between the characters. +

    +

    +7. Do not break within emoji flag sequences. That is, do not break between +regional indicator (RI) characters if there are an odd number of RI characters +before the break point. +

    +

    +8. Otherwise, end the cluster. +
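Rule 7 is the least obvious of the rules above, so here is a small sketch of it in Python. It models only rule 7 and ignores all the others; the function names are hypothetical, and the regional indicator range U+1F1E6 to U+1F1FF is taken from the Unicode standard rather than from this document.

```python
def is_regional_indicator(ch: str) -> bool:
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def rule7_allows_break(text: str, pos: int) -> bool:
    """Return False only when rule 7 forbids ending a cluster before text[pos]."""
    if not (0 < pos < len(text)):
        return True  # string edges are the business of other rules
    if not (is_regional_indicator(text[pos - 1]) and is_regional_indicator(text[pos])):
        return True  # rule 7 applies only between two RI characters
    # Count the consecutive RI characters immediately before the break point.
    count = 0
    i = pos - 1
    while i >= 0 and is_regional_indicator(text[i]):
        count += 1
        i -= 1
    # An odd count means the pair is still open, so no break is allowed.
    return count % 2 == 0
```

For a subject consisting of four RI characters (two flags), the model allows a break only at offset 2, between the two pairs.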

    +
    +PCRE2's additional properties +
    +

    +As well as the standard Unicode properties described above, PCRE2 supports four +more that make it possible to convert traditional escape sequences such as \w +and \s to use Unicode properties. PCRE2 uses these non-standard, non-Perl +properties internally when PCRE2_UCP is set. However, they may also be used +explicitly. These properties are: +

    +  Xan   Any alphanumeric character
    +  Xps   Any POSIX space character
    +  Xsp   Any Perl space character
    +  Xwd   Any Perl "word" character
    +
    +Xan matches characters that have either the L (letter) or the N (number) +property. Xps matches the characters tab, linefeed, vertical tab, form feed, or +carriage return, and any other character that has the Z (separator) property. +Xsp is the same as Xps; in PCRE1 it used to exclude vertical tab, for Perl +compatibility, but Perl changed. Xwd matches the same characters as Xan, plus +underscore. +
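As a rough illustration of these definitions, the sketch below expresses Xan, Xps, Xsp, and Xwd as Python predicates over single characters. It is a model built on Python's unicodedata module, not PCRE2's own tables, and the function names are invented for this example.

```python
import unicodedata

def is_xan(ch: str) -> bool:
    # Xan: any character with the L (letter) or N (number) property.
    return unicodedata.category(ch)[0] in "LN"

def is_xps(ch: str) -> bool:
    # Xps: tab, linefeed, vertical tab, form feed, carriage return,
    # or any character with the Z (separator) property.
    return ch in "\t\n\x0b\x0c\r" or unicodedata.category(ch)[0] == "Z"

# Xsp is the same as Xps in PCRE2.
is_xsp = is_xps

def is_xwd(ch: str) -> bool:
    # Xwd: the same characters as Xan, plus underscore.
    return is_xan(ch) or ch == "_"
```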

    +

    +There is another non-standard property, Xuc, which matches any character that +can be represented by a Universal Character Name in C++ and other programming +languages. These are the characters $, @, ` (grave accent), and all characters +with Unicode code points greater than or equal to U+00A0, except for the +surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are +excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH +where H is a hexadecimal digit. Note that the Xuc property does not match these +sequences but the characters that they represent.) +
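The Xuc definition translates directly into a predicate on code points. The following Python sketch restates the paragraph above; is_xuc is a hypothetical name and the code is only a model, not part of PCRE2.

```python
def is_xuc(ch: str) -> bool:
    """Model of \\p{Xuc}: representable by a Universal Character Name."""
    cp = ord(ch)
    if ch in ("$", "@", "`"):
        return True  # the three ASCII characters that are included
    # Everything from U+00A0 upwards qualifies, except the surrogates.
    return cp >= 0xA0 and not (0xD800 <= cp <= 0xDFFF)
```

Note that is_xuc("A") is false: most ASCII characters are excluded.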

    +
    +Resetting the match start +
    +

    +In normal use, the escape sequence \K causes any previously matched characters +not to be included in the final matched sequence that is returned. For example, +the pattern: +

    +  foo\Kbar
    +
    +matches "foobar", but reports that it has matched "bar". \K does not interact +with anchoring in any way. The pattern: +
    +  ^foo\Kbar
    +
    +matches only when the subject begins with "foobar" (in single line mode), +though it again reports the matched string as "bar". This feature is similar to +a lookbehind assertion +(described below). +However, in this case, the part of the subject before the real match does not +have to be of fixed length, as lookbehind assertions do. The use of \K does +not interfere with the setting of +captured substrings. +For example, when the pattern +
    +  (foo)\Kbar
    +
    +matches "foobar", the first substring is still set to "foo". +
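For comparison, Python's standard re module does not support \K, but the same reported range can be obtained by capturing the part that should be kept. The sketch below is an emulation under that assumption, not a description of how PCRE2 implements \K; match_after_prefix and the group name "kept" are invented for illustration.

```python
import re

def match_after_prefix(prefix: str, wanted: str, subject: str):
    """Find prefix followed by wanted; report only the span of wanted.

    This mimics what a pattern of the form prefix\\Kwanted reports.
    """
    m = re.search(f"(?:{prefix})(?P<kept>{wanted})", subject)
    return m.span("kept") if m else None
```

For the subject "xxfoobar", match_after_prefix("foo", "bar", ...) reports the range of "bar" only, although "foo" had to be present for the match to succeed.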

    +

    +Perl used to document that the use of \K within lookaround assertions is "not +well defined", but from version 5.32.0 Perl does not support this usage at all. +In PCRE2, \K is acted upon when it occurs inside positive assertions, but is +ignored in negative assertions. Note that when a pattern such as (?=ab\K) +matches, the reported start of the match can be greater than the end of the +match. Using \K in a lookbehind assertion at the start of a pattern can also +lead to odd effects. For example, consider this pattern: +

    +  (?<=\Kfoo)bar
    +
    +If the subject is "foobar", a call to pcre2_match() with a starting +offset of 3 succeeds and reports the matching string as "foobar", that is, the +start of the reported match is earlier than where the match started. +

    +
    +Simple assertions +
    +

    +The final use of backslash is for certain simple assertions. An assertion +specifies a condition that has to be met at a particular point in a match, +without consuming any characters from the subject string. The use of +groups for more complicated assertions is described +below. +The backslashed assertions are: +

    +  \b     matches at a word boundary
    +  \B     matches when not at a word boundary
    +  \A     matches at the start of the subject
    +  \Z     matches at the end of the subject
    +          also matches before a newline at the end of the subject
    +  \z     matches only at the end of the subject
    +  \G     matches at the first matching position in the subject
    +
    +Inside a character class, \b has a different meaning; it matches the backspace +character. If any other of these assertions appears in a character class, an +"invalid escape sequence" error is generated. +

    +

    +A word boundary is a position in the subject string where the current character +and the previous character do not both match \w or \W (i.e. one matches +\w and the other matches \W), or the start or end of the string if the +first or last character matches \w, respectively. When PCRE2 is built with +Unicode support, the meanings of \w and \W can be changed by setting the +PCRE2_UCP option. When this is done, it also affects \b and \B. Neither PCRE2 +nor Perl has a separate "start of word" or "end of word" metasequence. However, +whatever follows \b normally determines which it is. For example, the fragment +\ba matches "a" at the start of a word. +
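The definition of a word boundary can be written out as a short predicate. This Python sketch assumes the default (non-UCP) meaning of \w, that is, ASCII letters, digits, and underscore; the function names are hypothetical.

```python
def is_word_char(ch: str) -> bool:
    # Default \w without PCRE2_UCP: ASCII letters, digits, and underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")

def is_word_boundary(subject: str, pos: int) -> bool:
    """Model of \\b at offset pos (between subject[pos-1] and subject[pos])."""
    before = pos > 0 and is_word_char(subject[pos - 1])
    after = pos < len(subject) and is_word_char(subject[pos])
    # A boundary exists when exactly one side is a word character. The start
    # and end of the subject count as non-word, which gives the special
    # cases for the first and last characters.
    return before != after
```

The same predicate negated is a model of \B.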

    +

    +The \A, \Z, and \z assertions differ from the traditional circumflex and +dollar (described in the next section) in that they only ever match at the very +start and end of the subject string, whatever options are set. Thus, they are +independent of multiline mode. These three assertions are not affected by the +PCRE2_NOTBOL or PCRE2_NOTEOL options, which affect only the behaviour of the +circumflex and dollar metacharacters. However, if the startoffset +argument of pcre2_match() is non-zero, indicating that matching is to +start at a point other than the beginning of the subject, \A can never match. +The difference between \Z and \z is that \Z matches before a newline at the +end of the string as well as at the very end, whereas \z matches only at the +end. +
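The difference between \Z and \z can be stated as two predicates on a position in the subject. The sketch below is a model in Python that assumes a newline convention of a single LF character; the function names are invented for this example. (Python's own re module uses these letters differently, so plain functions are used here instead of patterns.)

```python
def at_lower_z(subject: str, pos: int) -> bool:
    """Model of \\z: true only at the very end of the subject."""
    return pos == len(subject)

def at_upper_z(subject: str, pos: int) -> bool:
    """Model of \\Z: true at the very end, or just before a final newline."""
    if pos == len(subject):
        return True
    return pos == len(subject) - 1 and subject[pos] == "\n"
```

For the subject "abc\n", the \Z model is true at offsets 3 and 4, whereas the \z model is true only at offset 4.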

    +

    +The \G assertion is true only when the current matching position is at the +start point of the matching process, as specified by the startoffset +argument of pcre2_match(). It differs from \A when the value of +startoffset is non-zero. By calling pcre2_match() multiple times +with appropriate arguments, you can mimic Perl's /g option, and it is in this +kind of implementation where \G can be useful. +
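The repeated-call technique can be sketched with Python's re module, whose compiled-pattern method match(string, pos) requires a match to begin exactly at pos. That plays the role of a pattern in which every alternative starts with \G and a pcre2_match() call with a matching startoffset. This is an analogy only: the helper name is hypothetical and no PCRE2 function is called.

```python
import re

def match_all_contiguous(pattern: str, subject: str) -> list:
    """Collect successive matches, each starting where the previous one ended."""
    compiled = re.compile(pattern)
    pos = 0
    results = []
    while pos <= len(subject):
        m = compiled.match(subject, pos)  # must start exactly at pos
        if m is None:
            break  # a gap ends the sequence, as a leading \G would
        results.append(m.group())
        # After an empty match, step forward one character so the loop ends.
        pos = m.end() if m.end() > pos else pos + 1
    return results
```

With the pattern \d+\s* and the subject "12 34x56", the loop collects "12 " and "34" and then stops at "x", because the next attempt cannot start at the current position.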

    +

    +Note, however, that PCRE2's implementation of \G, being true at the starting +character of the matching process, is subtly different from Perl's, which +defines it as true at the end of the previous match. In Perl, these can be +different when the previously matched string was empty. Because PCRE2 does just +one match at a time, it cannot reproduce this behaviour. +

    +

    +If all the alternatives of a pattern begin with \G, the expression is anchored +to the starting match position, and the "anchored" flag is set in the compiled +regular expression. +

    +
    CIRCUMFLEX AND DOLLAR
    +

    +The circumflex and dollar metacharacters are zero-width assertions. That is, +they test for a particular condition being true without consuming any +characters from the subject string. These two metacharacters are concerned with +matching the starts and ends of lines. If the newline convention is set so that +only the two-character sequence CRLF is recognized as a newline, isolated CR +and LF characters are treated as ordinary data characters, and are not +recognized as newlines. +

    +

    +Outside a character class, in the default matching mode, the circumflex +character is an assertion that is true only if the current matching point is at +the start of the subject string. If the startoffset argument of +pcre2_match() is non-zero, or if PCRE2_NOTBOL is set, circumflex can +never match if the PCRE2_MULTILINE option is unset. Inside a character class, +circumflex has an entirely different meaning +(see below). +

    +

    +Circumflex need not be the first character of the pattern if a number of +alternatives are involved, but it should be the first thing in each alternative +in which it appears if the pattern is ever to match that branch. If all +possible alternatives start with a circumflex, that is, if the pattern is +constrained to match only at the start of the subject, it is said to be an +"anchored" pattern. (There are also other constructs that can cause a pattern +to be anchored.) +

    +

    +The dollar character is an assertion that is true only if the current matching +point is at the end of the subject string, or immediately before a newline at +the end of the string (by default), unless PCRE2_NOTEOL is set. Note, however, +that it does not actually match the newline. Dollar need not be the last +character of the pattern if a number of alternatives are involved, but it +should be the last item in any branch in which it appears. Dollar has no +special meaning in a character class. +

    +

    +The meaning of dollar can be changed so that it matches only at the very end of +the string, by setting the PCRE2_DOLLAR_ENDONLY option at compile time. This +does not affect the \Z assertion. +

    +

    +The meanings of the circumflex and dollar metacharacters are changed if the +PCRE2_MULTILINE option is set. When this is the case, a dollar character +matches before any newlines in the string, as well as at the very end, and a +circumflex matches immediately after internal newlines as well as at the start +of the subject string. It does not match after a newline that ends the string, +for compatibility with Perl. However, this can be changed by setting the +PCRE2_ALT_CIRCUMFLEX option. +

    +

    +For example, the pattern /^abc$/ matches the subject string "def\nabc" (where +\n represents a newline) in multiline mode, but not otherwise. Consequently, +patterns that are anchored in single line mode because all branches start with +^ are not anchored in multiline mode, and a match for circumflex is possible +when the startoffset argument of pcre2_match() is non-zero. The +PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is set. +
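The example above can be tried in any engine with Perl-style multiline semantics. The following sketch uses Python's re module, whose MULTILINE flag behaves in the same way for these particular cases when the newline is LF; it does not exercise PCRE2 itself.

```python
import re

subject = "def\nabc"

# Default mode: circumflex matches only at the start of the subject.
single_line = re.search(r"^abc$", subject)

# Multiline mode: circumflex also matches immediately after an internal newline.
multi_line = re.search(r"^abc$", subject, re.MULTILINE)

# By default, dollar also matches just before a newline that ends the subject,
# without consuming that newline.
before_final_newline = re.search(r"abc$", "abc\n")
```

Here single_line is None, multi_line covers offsets 4 to 7, and before_final_newline covers offsets 0 to 3.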

    +

    +When the newline convention (see +"Newline conventions" +below) recognizes the two-character sequence CRLF as a newline, this is +preferred, even if the single characters CR and LF are also recognized as +newlines. For example, if the newline convention is "any", a multiline mode +circumflex matches before "xyz" in the string "abc\r\nxyz" rather than after +CR, even though CR on its own is a valid newline. (It also matches at the very +start of the string, of course.) +

    +

    +Note that the sequences \A, \Z, and \z can be used to match the start and +end of the subject in both modes, and if all branches of a pattern start with +\A it is always anchored, whether or not PCRE2_MULTILINE is set. +

    +
    FULL STOP (PERIOD, DOT) AND \N
    +

    +Outside a character class, a dot in the pattern matches any one character in +the subject string except (by default) a character that signifies the end of a +line. +

    +

    +When a line ending is defined as a single character, dot never matches that +character; when the two-character sequence CRLF is used, dot does not match CR +if it is immediately followed by LF, but otherwise it matches all characters +(including isolated CRs and LFs). When any Unicode line endings are being +recognized, dot does not match CR or LF or any of the other line ending +characters. +

    +

    +The behaviour of dot with regard to newlines can be changed. If the +PCRE2_DOTALL option is set, a dot matches any one character, without exception. +If the two-character sequence CRLF is present in the subject string, it takes +two dots to match it. +
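A brief sketch of the dot behaviour, using Python's re module as a stand-in. Its DOTALL flag corresponds to PCRE2_DOTALL for these cases, but note that Python always treats only LF as a line ending, so the CRLF-specific rules above are not reproduced.

```python
import re

# Without DOTALL, dot does not match the newline between "a" and "b".
default_dot = re.fullmatch(r"a.b", "a\nb")

# With DOTALL, dot matches any one character, without exception.
dotall_dot = re.fullmatch(r"a.b", "a\nb", re.DOTALL)

# CRLF is two characters, so it takes two dots to match it.
one_dot_crlf = re.fullmatch(r".", "\r\n", re.DOTALL)
two_dots_crlf = re.fullmatch(r"..", "\r\n", re.DOTALL)
```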

    +

    +The handling of dot is entirely independent of the handling of circumflex and +dollar, the only relationship being that they both involve newlines. Dot has no +special meaning in a character class. +

    +

    +The escape sequence \N when not followed by an opening brace behaves like a +dot, except that it is not affected by the PCRE2_DOTALL option. In other words, +it matches any character except one that signifies the end of a line. +

    +

    +When \N is followed by an opening brace it has a different meaning. See the +section entitled +"Non-printing characters" +above for details. Perl also uses \N{name} to specify characters by Unicode +name; PCRE2 does not support this. +

    +
    MATCHING A SINGLE CODE UNIT
    +

    +Outside a character class, the escape sequence \C matches any one code unit, +whether or not a UTF mode is set. In the 8-bit library, one code unit is one +byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is a +32-bit unit. Unlike a dot, \C always matches line-ending characters. The +feature is provided in Perl in order to match individual bytes in UTF-8 mode, +but it is unclear how it can usefully be used. +

    +

    +Because \C breaks up characters into individual code units, matching one unit +with \C in UTF-8 or UTF-16 mode means that the rest of the string may start +with a malformed UTF character. This has undefined results, because PCRE2 +assumes that it is matching character by character in a valid UTF string (by +default it checks the subject string's validity at the start of processing +unless the PCRE2_NO_UTF_CHECK or PCRE2_MATCH_INVALID_UTF option is used). +

    +

    +An application can lock out the use of \C by setting the +PCRE2_NEVER_BACKSLASH_C option when compiling a pattern. It is also possible to +build PCRE2 with the use of \C permanently disabled. +

    +

    +PCRE2 does not allow \C to appear in lookbehind assertions +(described below) +in UTF-8 or UTF-16 modes, because this would make it impossible to calculate +the length of the lookbehind. Neither the alternative matching function +pcre2_dfa_match() nor the JIT optimizer support \C in these UTF modes. +The former gives a match-time error; the latter fails to optimize and so the +match is always run using the interpreter. +

    +

    +In the 32-bit library, however, \C is always supported (when not explicitly +locked out) because it always matches a single code unit, whether or not UTF-32 +is specified. +

    +

    +In general, the \C escape sequence is best avoided. However, one way of using +it that avoids the problem of malformed UTF-8 or UTF-16 characters is to use a +lookahead to check the length of the next character, as in this pattern, which +could be used with a UTF-8 string (ignore white space and line breaks): +

    +  (?| (?=[\x00-\x7f])(\C) |
    +      (?=[\x80-\x{7ff}])(\C)(\C) |
    +      (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) |
    +      (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))
    +
    +In this example, a group that starts with (?| resets the capturing parentheses +numbers in each alternative (see +"Duplicate Group Numbers" +below). The assertions at the start of each branch check the next UTF-8 +character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The +character's individual bytes are then captured by the appropriate number of +\C groups. +
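The code point ranges used in the four lookaheads correspond to the UTF-8 encoding lengths. The Python sketch below checks that correspondence; it does not use \C (which Python's re module lacks), and the function names are hypothetical.

```python
def utf8_length_by_range(ch: str) -> int:
    """Number of UTF-8 bytes, chosen by the same ranges as the lookaheads."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4

def utf8_code_units(ch: str) -> list:
    """The individual bytes that the \\C groups would capture, as integers."""
    return list(ch.encode("utf-8"))
```

For example, U+20AC (euro sign) falls in the third range, and its three code units are 0xE2, 0x82, and 0xAC.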

    +
    SQUARE BRACKETS AND CHARACTER CLASSES
    +

    +An opening square bracket introduces a character class, terminated by a closing +square bracket. A closing square bracket on its own is not special by default. +If a closing square bracket is required as a member of the class, it should be +the first data character in the class (after an initial circumflex, if present) +or escaped with a backslash. This means that, by default, an empty class cannot +be defined. However, if the PCRE2_ALLOW_EMPTY_CLASS option is set, a closing +square bracket at the start does end the (empty) class. +

    +

    +A character class matches a single character in the subject. A matched +character must be in the set of characters defined by the class, unless the +first character in the class definition is a circumflex, in which case the +subject character must not be in the set defined by the class. If a circumflex +is actually required as a member of the class, ensure it is not the first +character, or escape it with a backslash. +

    +

    +For example, the character class [aeiou] matches any lower case vowel, while +[^aeiou] matches any character that is not a lower case vowel. Note that a +circumflex is just a convenient notation for specifying the characters that +are in the class by enumerating those that are not. A class that starts with a +circumflex is not an assertion; it still consumes a character from the subject +string, and therefore it fails if the current pointer is at the end of the +string. +
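The point that a negated class still consumes a character is easy to demonstrate. This sketch uses Python's re module, which behaves the same way as PCRE2 in this respect; the variable names are only for illustration.

```python
import re

# [^b] must consume one character, so it fails at the end of the subject.
negated_class_at_end = re.search(r"a[^b]", "a")

# A negative lookahead is an assertion and consumes nothing, so it succeeds.
negative_lookahead_at_end = re.search(r"a(?!b)", "a")

# When a following character exists, the negated class matches and consumes it.
negated_class_mid = re.search(r"a[^b]", "ac")
```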

    +

    +Characters in a class may be specified by their code points using \o, \x, or +\N{U+hh..} in the usual way. When caseless matching is set, any letters in a +class represent both their upper case and lower case versions, so for example, +a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not +match "A", whereas a caseful version would. Note that there are two ASCII +characters, K and S, that, in addition to their lower case ASCII equivalents, +are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F (long S) +respectively when either PCRE2_UTF or PCRE2_UCP is set. +

    +

    +Characters that might indicate line breaks are never treated in any special way +when matching character classes, whatever line-ending sequence is in use, and +whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A +class such as [^a] always matches one of these characters. +

    +

    +The generic character type escape sequences \d, \D, \h, \H, \p, \P, \s, +\S, \v, \V, \w, and \W may appear in a character class, and add the +characters that they match to the class. For example, [\dABCDEF] matches any +hexadecimal digit. In UTF modes, the PCRE2_UCP option affects the meanings of +\d, \s, \w and their upper case partners, just as it does when they appear +outside a character class, as described in the section entitled +"Generic character types" +above. The escape sequence \b has a different meaning inside a character +class; it matches the backspace character. The sequences \B, \R, and \X are +not special inside a character class. Like any other unrecognized escape +sequences, they cause an error. The same is true for \N when not followed by +an opening brace. +

    +

    +The minus (hyphen) character can be used to specify a range of characters in a +character class. For example, [d-m] matches any letter between d and m, +inclusive. If a minus character is required in a class, it must be escaped with +a backslash or appear in a position where it cannot be interpreted as +indicating a range, typically as the first or last character in the class, +or immediately after a range. For example, [b-d-z] matches letters in the range +b to d, a hyphen character, or z. +

    +

    +Perl treats a hyphen as a literal if it appears before or after a POSIX class +(see below) or before or after a character type escape such as \d or \H. +However, unless the hyphen is the last character in the class, Perl outputs a +warning in its warning mode, as this is most likely a user error. As PCRE2 has +no facility for warning, an error is given in these cases. +

    +

    +It is not possible to have the literal character "]" as the end character of a +range. A pattern such as [W-]46] is interpreted as a class of two characters +("W" and "-") followed by a literal string "46]", so it would match "W46]" or +"-46]". However, if the "]" is escaped with a backslash it is interpreted as +the end of range, so [W-\]46] is interpreted as a class containing a range +followed by two other characters. The octal or hexadecimal representation of +"]" can also be used to end a range. +
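The hyphen and closing bracket cases described above can be checked experimentally. The sketch below uses Python's re module, which happens to parse these three classes in the same way; that agreement is an assumption about Python, not a statement about PCRE2's implementation.

```python
import re

def full(pattern: str, subject: str) -> bool:
    # True when the whole subject matches the pattern.
    return re.fullmatch(pattern, subject) is not None

# [b-d-z]: the range b to d, then a literal hyphen, then z.
hyphen_after_range = [full(r"[b-d-z]", s) for s in ("c", "-", "z", "e")]

# [W-]46]: a class of "W" and "-", followed by the literal string "46]".
unescaped_bracket = [full(r"[W-]46]", s) for s in ("W46]", "-46]", "X46]")]

# [W-\]46]: one class holding the range W to ], plus "4" and "6".
escaped_bracket = [full(r"[W-\]46]", s) for s in ("Z", "4", "-")]
```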

    +

    +Ranges normally include all code points between the start and end characters, +inclusive. They can also be used for code points specified numerically, for +example [\000-\037]. Ranges can include any characters that are valid for the +current mode. In any UTF mode, the so-called "surrogate" characters (those +whose code points lie between 0xd800 and 0xdfff inclusive) may not be specified +explicitly by default (the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES option disables +this check). However, ranges such as [\x{d7ff}-\x{e000}], which include the +surrogates, are always permitted. +

    +

    +There is a special case in EBCDIC environments for ranges whose end points are +both specified as literal letters in the same case. For compatibility with +Perl, EBCDIC code points within the range that are not letters are omitted. For +example, [h-k] matches only four characters, even though the codes for h and k +are 0x88 and 0x92, a range of 11 code points. However, if the range is +specified numerically, for example, [\x88-\x92] or [h-\x92], all code points +are included. +

    +

    +If a range that includes letters is used when caseless matching is set, it +matches the letters in either case. For example, [W-c] is equivalent to +[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character +tables for a French locale are in use, [\xc8-\xcb] matches accented E +characters in both cases. +

    +

    +A circumflex can conveniently be used with the upper case character types to +specify a more restricted set of characters than the matching lower case type. +For example, the class [^\W_] matches any letter or digit, but not underscore, +whereas [\w] includes underscore. A positive character class should be read as +"something OR something OR ..." and a negative class as "NOT something AND NOT +something AND NOT ...". +
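A quick check of the [^\W_] idiom, again using Python's re module as a stand-in for these simple classes:

```python
import re

def full(pattern: str, subject: str) -> bool:
    # True when the whole subject matches the pattern.
    return re.fullmatch(pattern, subject) is not None

# NOT (non-word) AND NOT underscore: letters and digits only.
letters_and_digits = [full(r"[^\W_]", s) for s in ("a", "5", "_", "-")]

# [\w] on its own includes underscore.
word_chars = [full(r"[\w]", s) for s in ("a", "5", "_", "-")]
```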

    +

    +The only metacharacters that are recognized in character classes are backslash, +hyphen (only where it can be interpreted as specifying a range), circumflex +(only at the start), opening square bracket (only when it can be interpreted as +introducing a POSIX class name, or for a special compatibility feature - see +the next two sections), and the terminating closing square bracket. However, +escaping other non-alphanumeric characters does no harm. +

    +
    POSIX CHARACTER CLASSES
    +

    +Perl supports the POSIX notation for character classes. This uses names +enclosed by [: and :] within the enclosing square brackets. PCRE2 also supports +this notation. For example, +

    +  [01[:alpha:]%]
    +
    +matches "0", "1", any alphabetic character, or "%". The supported class names +are: +
    +  alnum    letters and digits
    +  alpha    letters
    +  ascii    character codes 0 - 127
    +  blank    space or tab only
    +  cntrl    control characters
    +  digit    decimal digits (same as \d)
    +  graph    printing characters, excluding space
    +  lower    lower case letters
    +  print    printing characters, including space
    +  punct    printing characters, excluding letters and digits and space
    +  space    white space (the same as \s from PCRE2 8.34)
    +  upper    upper case letters
    +  word     "word" characters (same as \w)
    +  xdigit   hexadecimal digits
    +
    +The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), +and space (32). If locale-specific matching is taking place, the list of space +characters may be different; there may be fewer or more of them. "Space" and +\s match the same set of characters. +
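The table of class names can be modelled as sets of ASCII characters. The following Python sketch assumes the default tables with no locale-specific matching and no PCRE2_UCP; the exact membership of "cntrl" (codes 0 to 31 and 127) is an assumption based on the usual C locale, and all names in the code are invented for this example.

```python
import string

# ASCII-only model of the POSIX class names listed above.
POSIX_ASCII_CLASSES = {
    "alnum": set(string.ascii_letters + string.digits),
    "alpha": set(string.ascii_letters),
    "ascii": {chr(c) for c in range(128)},
    "blank": {" ", "\t"},
    "cntrl": {chr(c) for c in range(32)} | {chr(127)},  # assumed C-locale set
    "digit": set(string.digits),
    "graph": {chr(c) for c in range(33, 127)},
    "lower": set(string.ascii_lowercase),
    "print": {chr(c) for c in range(32, 127)},
    "punct": set(string.punctuation),
    "space": set("\t\n\x0b\x0c\r "),  # HT, LF, VT, FF, CR, space
    "upper": set(string.ascii_uppercase),
    "word": set(string.ascii_letters + string.digits + "_"),
    "xdigit": set(string.hexdigits),
}

def in_posix_class(ch: str, name: str) -> bool:
    return ch in POSIX_ASCII_CLASSES[name]

def matches_example_class(ch: str) -> bool:
    """Model of the class [01[:alpha:]%] for a single character."""
    return ch in ("0", "1", "%") or in_posix_class(ch, "alpha")
```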

    +

    +The name "word" is a Perl extension, and "blank" is a GNU extension from Perl +5.8. Another Perl extension is negation, which is indicated by a ^ character +after the colon. For example, +

    +  [12[:^digit:]]
    +
    +matches "1", "2", or any non-digit. PCRE2 (and Perl) also recognize the POSIX +syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not +supported, and an error is given if they are encountered. +

    +

    +By default, characters with values greater than 127 do not match any of the +POSIX character classes, although this may be different for characters in the +range 128-255 when locale-specific matching is happening. However, if the +PCRE2_UCP option is passed to pcre2_compile(), some of the classes are +changed so that Unicode character properties are used. This is achieved by +replacing certain POSIX classes with other sequences, as follows: +

    +  [:alnum:]  becomes  \p{Xan}
    +  [:alpha:]  becomes  \p{L}
    +  [:blank:]  becomes  \h
    +  [:cntrl:]  becomes  \p{Cc}
    +  [:digit:]  becomes  \p{Nd}
    +  [:lower:]  becomes  \p{Ll}
    +  [:space:]  becomes  \p{Xps}
    +  [:upper:]  becomes  \p{Lu}
    +  [:word:]   becomes  \p{Xwd}
    +
    +Negated versions, such as [:^alpha:] use \P instead of \p. Three other POSIX +classes are handled specially in UCP mode: +

    +

    +[:graph:] +This matches characters that have glyphs that mark the page when printed. In +Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf +properties, except for: +

    +  U+061C           Arabic Letter Mark
    +  U+180E           Mongolian Vowel Separator
    +  U+2066 - U+2069  Various "isolate"s
    +
    +
    +

    +

    +[:print:] +This matches the same characters as [:graph:] plus space characters that are +not controls, that is, characters with the Zs property. +

    +

    +[:punct:] +This matches all characters that have the Unicode P (punctuation) property, +plus those characters with code points less than 256 that have the S (Symbol) +property. +

    +

    +The other POSIX classes are unchanged, and match only characters with code +points less than 256. +

    +
    COMPATIBILITY FEATURE FOR WORD BOUNDARIES
    +

    +In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly +syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of +word". PCRE2 treats these items as follows: +

    +  [[:<:]]  is converted to  \b(?=\w)
    +  [[:>:]]  is converted to  \b(?<=\w)
    +
    +Only these exact character sequences are recognized. A sequence such as +[a[:<:]b] provokes an error for an unrecognized POSIX class name. This support is +not compatible with Perl. It is provided to help migrations from other +environments, and is best not used in any new patterns. Note that \b matches +at the start and the end of a word (see +"Simple assertions" +above), and in a Perl-style pattern the preceding or following character +normally shows which is wanted, without the need for the assertions that are +used above in order to give exactly the POSIX behaviour. +
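The two converted forms can be tried directly, since they use only \b and lookaround assertions. This sketch runs them in Python's re module, which does not recognize [[:<:]] or [[:>:]] itself but does support the replacement patterns.

```python
import re

subject = "hi there"

# \b(?=\w): a word boundary followed by a word character (start of word).
word_starts = [m.start() for m in re.finditer(r"\b(?=\w)", subject)]

# \b(?<=\w): a word boundary preceded by a word character (end of word).
word_ends = [m.start() for m in re.finditer(r"\b(?<=\w)", subject)]
```

For this subject the word starts are at offsets 0 and 3, and the word ends at offsets 2 and 8.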

    +
    VERTICAL BAR
    +

    +Vertical bar characters are used to separate alternative patterns. For example, +the pattern +

    +  gilbert|sullivan
    +
    +matches either "gilbert" or "sullivan". Any number of alternatives may appear, +and an empty alternative is permitted (matching the empty string). The matching +process tries each alternative in turn, from left to right, and the first one +that succeeds is used. If the alternatives are within a group +(defined below), +"succeeds" means matching the rest of the main pattern as well as the +alternative in the group. +
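The left-to-right rule and the meaning of "succeeds" inside a group can be seen in a short experiment. Python's re module follows the same ordered-alternative behaviour for these patterns, so it is used here as a stand-in.

```python
import re

# The first alternative that succeeds is used, even if a later one is longer.
first_wins = re.match(r"gil|gilbert", "gilbert").group()

# Inside a group, an alternative succeeds only if the rest of the pattern
# also matches, so the engine moves on to the second alternative here.
needs_rest = re.match(r"(gil|gilbert)!", "gilbert!").group(1)

# An empty alternative is permitted and matches the empty string.
empty_alternative = re.fullmatch(r"a(|b)c", "ac")
```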

    +
    INTERNAL OPTION SETTING
    +

    +The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, +PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE options can be +changed from within the pattern by a sequence of letters enclosed between "(?" +and ")". These options are Perl-compatible, and are described in detail in the +pcre2api +documentation. The option letters are: +

    +  i  for PCRE2_CASELESS
    +  m  for PCRE2_MULTILINE
    +  n  for PCRE2_NO_AUTO_CAPTURE
    +  s  for PCRE2_DOTALL
    +  x  for PCRE2_EXTENDED
    +  xx for PCRE2_EXTENDED_MORE
    +
    +For example, (?im) sets caseless, multiline matching. It is also possible to +unset these options by preceding the relevant letters with a hyphen, for +example (?-im). The two "extended" options are not independent; unsetting either +one cancels the effects of both of them. +

    +

    +A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS +and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also +permitted. Only one hyphen may appear in the options string. If a letter +appears both before and after the hyphen, the option is unset. An empty options +setting "(?)" is allowed. Needless to say, it has no effect. +

    +

    +If the first character following (? is a circumflex, it causes all of the above +options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow +the circumflex to cause some options to be re-instated, but a hyphen may not +appear. +

    +

    +The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in +the same way as the Perl-compatible options by using the characters J and U +respectively. However, these are not unset by (?^). +

    +

    +When one of these option changes occurs at top level (that is, not inside +group parentheses), the change applies to the remainder of the pattern +that follows. An option change within a group (see below for a description +of groups) affects only that part of the group that follows it, so +

    +  (a(?i)b)c
    +
    +matches abc and aBc and no other strings (assuming PCRE2_CASELESS is not used). +By this means, options can be made to have different settings in different +parts of the pattern. Any changes made in one alternative do carry on +into subsequent branches within the same group. For example, +
    +  (a(?i)b|c)
    +
    +matches "ab", "aB", "c", and "C", even though when matching "C" the first +branch is abandoned before the option setting. This is because the effects of +option settings happen at compile time. There would be some very weird +behaviour otherwise. +

    +

    +As a convenient shorthand, if any option settings are required at the start of +a non-capturing group (see the next section), the option letters may +appear between the "?" and the ":". Thus the two patterns +

    +  (?i:saturday|sunday)
    +  (?:(?i)saturday|sunday)
    +
    +match exactly the same set of strings. +

    +

    +Note: There are other PCRE2-specific options, applying to the whole +pattern, which can be set by the application when the compiling function is +called. In addition, the pattern can contain special leading sequences such as +(*CRLF) to override what the application has set or what has been defaulted. +Details are given in the section entitled +"Newline sequences" +above. There are also the (*UTF) and (*UCP) leading sequences that can be used +to set UTF and Unicode property modes; they are equivalent to setting the +PCRE2_UTF and PCRE2_UCP options, respectively. However, the application can set +the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP options, which lock out the use of the +(*UTF) and (*UCP) sequences. +

    +
    GROUPS
    +

    +Groups are delimited by parentheses (round brackets), which can be nested. +Turning part of a pattern into a group does two things: +
    +
    +1. It localizes a set of alternatives. For example, the pattern +

    +  cat(aract|erpillar|)
    +
    +matches "cataract", "caterpillar", or "cat". Without the parentheses, it would +match "cataract", "erpillar" or an empty string. +
    +
    +2. It creates a "capture group". This means that, when the whole pattern +matches, the portion of the subject string that matched the group is passed +back to the caller, separately from the portion that matched the whole pattern. +(This applies only to the traditional matching function; the DFA matching +function does not support capturing.) +

    +

    +Opening parentheses are counted from left to right (starting from 1) to obtain +numbers for capture groups. For example, if the string "the red king" is +matched against the pattern +

    +  the ((red|white) (king|queen))
    +
    +the captured substrings are "red king", "red", and "king", and are numbered 1, +2, and 3, respectively. +

    +

    +The fact that plain parentheses fulfil two functions is not always helpful. +There are often times when grouping is required without capturing. If an +opening parenthesis is followed by a question mark and a colon, the group +does not do any capturing, and is not counted when computing the number of any +subsequent capture groups. For example, if the string "the white queen" +is matched against the pattern +

    +  the ((?:red|white) (king|queen))
    +
    +the captured substrings are "white queen" and "queen", and are numbered 1 and +2. The maximum number of capture groups is 65535. +
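The numbering of capture groups described above can be reproduced with a small sketch. Python's re module is used as a stand-in for PCRE2 (an assumption; both count opening parentheses the same way for these patterns), and the helper name captures is hypothetical.

```python
import re

def captures(pattern, subject):
    """Return the tuple of captured substrings (groups 1, 2, ...) or None."""
    m = re.match(pattern, subject)
    return m.groups() if m else None

# Plain parentheses: three capture groups, numbered by opening parenthesis.
capturing = captures(r"the ((red|white) (king|queen))", "the red king")

# (?: ... ) groups without capturing, so later groups are renumbered.
non_capturing = captures(r"the ((?:red|white) (king|queen))", "the white queen")

print(capturing)      # ('red king', 'red', 'king')
print(non_capturing)  # ('white queen', 'queen')
```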

    +

    +As a convenient shorthand, if any option settings are required at the start of +a non-capturing group, the option letters may appear between the "?" and the +":". Thus the two patterns +

    +  (?i:saturday|sunday)
    +  (?:(?i)saturday|sunday)
    +
    +match exactly the same set of strings. Because alternative branches are tried +from left to right, and options are not reset until the end of the group is +reached, an option setting in one branch does affect subsequent branches, so +the above patterns match "SUNDAY" as well as "Saturday". +
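A minimal sketch of the (?i:...) shorthand follows, using Python's re module as a stand-in for PCRE2 (an assumption of this example). Only the first of the two forms is shown, because Python handles a mid-pattern (?i) differently from PCRE2. The name matches_fully is hypothetical.

```python
import re

def matches_fully(pattern, subject):
    """True if the whole subject matches the pattern."""
    return re.fullmatch(pattern, subject) is not None

WEEKEND = r"(?i:saturday|sunday)"

print(matches_fully(WEEKEND, "Saturday"))  # True
print(matches_fully(WEEKEND, "SUNDAY"))    # True: the option covers both branches

# The option is local to the group: text outside it is still caseful.
print(matches_fully(r"(?i:sat)urday", "SATurday"))  # True
print(matches_fully(r"(?i:sat)urday", "SATURDAY"))  # False
```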

    +
    DUPLICATE GROUP NUMBERS
    +

    +Perl 5.10 introduced a feature whereby each alternative in a group uses the +same numbers for its capturing parentheses. Such a group starts with (?| and is +itself a non-capturing group. For example, consider this pattern: +

    +  (?|(Sat)ur|(Sun))day
    +
    +Because the two alternatives are inside a (?| group, both sets of capturing +parentheses are numbered one. Thus, when the pattern matches, you can look +at captured substring number one, whichever alternative matched. This construct +is useful when you want to capture part, but not all, of one of a number of +alternatives. Inside a (?| group, parentheses are numbered as usual, but the +number is reset at the start of each branch. The numbers of any capturing +parentheses that follow the whole group start after the highest number used in +any branch. The following example is taken from the Perl documentation. The +numbers underneath show in which buffer the captured content will be stored. +
    +  # before  ---------------branch-reset----------- after
    +  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
    +  # 1            2         2  3        2     3     4
    +
    +A backreference to a capture group uses the most recent value that is set for +the group. The following pattern matches "abcabc" or "defdef": +
    +  /(?|(abc)|(def))\1/
    +
    +In contrast, a subroutine call to a capture group always refers to the +first one in the pattern with the given number. The following pattern matches +"abcabc" or "defabc": +
    +  /(?|(abc)|(def))(?1)/
    +
    +A relative reference such as (?-1) is no different: it is just a convenient way +of computing an absolute group number. +

    +

    +If a +condition test +for a group's having matched refers to a non-unique number, the test is +true if any group with that number has matched. +

    +

    +An alternative approach to using this "branch reset" feature is to use +duplicate named groups, as described in the next section. +

    +
    NAMED CAPTURE GROUPS
    +

    +Identifying capture groups by number is simple, but it can be very hard to keep +track of the numbers in complicated patterns. Furthermore, if an expression is +modified, the numbers may change. To help with this difficulty, PCRE2 supports +the naming of capture groups. This feature was not added to Perl until release +5.10. Python had the feature earlier, and PCRE1 introduced it at release 4.0, +using the Python syntax. PCRE2 supports both the Perl and the Python syntax. +

    +

    +In PCRE2, a capture group can be named in one of three ways: (?<name>...) or +(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to 32 +code units long. When PCRE2_UTF is not set, they may contain only ASCII +alphanumeric characters and underscores, but must start with a non-digit. When +PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode +letter or Unicode decimal digit. In other words, group names must match one of +these patterns: +

    +  ^[_A-Za-z][_A-Za-z0-9]*\z   when PCRE2_UTF is not set
    +  ^[_\p{L}][_\p{L}\p{Nd}]*\z  when PCRE2_UTF is set
    +
    +References to capture groups from other parts of the pattern, such as +backreferences, +recursion, +and +conditions, +can all be made by name as well as by number. +

    +

    +Named capture groups are allocated numbers as well as names, exactly as +if the names were not present. In both PCRE2 and Perl, capture groups +are primarily identified by numbers; any names are just aliases for these +numbers. The PCRE2 API provides function calls for extracting the complete +name-to-number translation table from a compiled pattern, as well as +convenience functions for extracting captured substrings by name. +
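The idea that names are aliases for group numbers can be illustrated as follows. This is a sketch in Python's re module, not the PCRE2 API: the Python-style (?P<name>...) syntax is common to both, but groupindex is Python's own name-to-number table and merely plays the role of the PCRE2 name table here. The pattern and the names DATE, year and month are made up for the example.

```python
import re

# Two named groups and one unnamed group; all three receive numbers.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})")

# Python's view of the name-to-number translation table.
name_to_number = dict(DATE.groupindex)

m = DATE.match("2024-05-17")

print(name_to_number)                 # {'year': 1, 'month': 2}
print(m.group("year"), m.group(1))    # 2024 2024 (same group, two ways to ask)
print(m.group(3))                     # 17 (unnamed groups are still numbered)
```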

    +

    +Warning: When more than one capture group has the same number, as +described in the previous section, a name given to one of them applies to all +of them. Perl allows identically numbered groups to have different names. +Consider this pattern, where there are two capture groups, both numbered 1: +

    +  (?|(?<AA>aa)|(?<BB>bb))
    +
    +Perl allows this, with both names AA and BB as aliases of group 1. Thus, after +a successful match, both names yield the same value (either "aa" or "bb"). +

    +

    +In an attempt to reduce confusion, PCRE2 does not allow the same group number +to be associated with more than one name. The example above provokes a +compile-time error. However, there is still scope for confusion. Consider this +pattern: +

    +  (?|(?<AA>aa)|(bb))
    +
    +Although the second group number 1 is not explicitly named, the name AA is +still an alias for any group 1. Whether the pattern matches "aa" or "bb", a +reference by name to group AA yields the matched string. +

    +

    +By default, a name must be unique within a pattern, except that duplicate names +are permitted for groups with the same number, for example: +

    +  (?|(?<AA>aa)|(?<AA>bb))
    +
    +The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES +option at compile time, or by the use of (?J) within the pattern, as described +in the section entitled +"Internal Option Setting" +above. +

    +

    +Duplicate names can be useful for patterns where only one instance of the named +capture group can match. Suppose you want to match the name of a weekday, +either as a 3-letter abbreviation or as the full name, and in both cases you +want to extract the abbreviation. This pattern (ignoring the line breaks) does +the job: +

    +  (?J)
    +  (?<DN>Mon|Fri|Sun)(?:day)?|
    +  (?<DN>Tue)(?:sday)?|
    +  (?<DN>Wed)(?:nesday)?|
    +  (?<DN>Thu)(?:rsday)?|
    +  (?<DN>Sat)(?:urday)?
    +
    +There are five capture groups, but only one is ever set after a match. The +convenience functions for extracting the data by name return the substring for +the first (and in this example, the only) group of that name that matched. This +saves searching to find which numbered group it was. (An alternative way of +solving this problem is to use a "branch reset" group, as described in the +previous section.) +

    +

    +If you make a backreference to a non-unique named group from elsewhere in the +pattern, the groups to which the name refers are checked in the order in which +they appear in the overall pattern. The first one that is set is used for the +reference. For example, this pattern matches both "foofoo" and "barbar" but not +"foobar" or "barfoo": +

    +  (?J)(?:(?<n>foo)|(?<n>bar))\k<n>
    +
    +
    +

    +

    +If you make a subroutine call to a non-unique named group, the one that +corresponds to the first occurrence of the name is used. In the absence of +duplicate numbers this is the one with the lowest number. +

    +

    +If you use a named reference in a condition +test (see the +section about conditions +below), either to check whether a capture group has matched, or to check for +recursion, all groups with the same name are tested. If the condition is true +for any one of them, the overall condition is true. This is the same behaviour +as testing by number. For further details of the interfaces for handling named +capture groups, see the +pcre2api +documentation. +

    +
    REPETITION
    +

    +Repetition is specified by quantifiers, which can follow any of the following +items: +

    +  a literal data character
    +  the dot metacharacter
    +  the \C escape sequence
    +  the \R escape sequence
    +  the \X escape sequence
    +  an escape such as \d or \pL that matches a single character
    +  a character class
    +  a backreference
    +  a parenthesized group (including lookaround assertions)
    +  a subroutine call (recursive or otherwise)
    +
    +The general repetition quantifier specifies a minimum and maximum number of +permitted matches, by giving the two numbers in curly brackets (braces), +separated by a comma. The numbers must be less than 65536, and the first must +be less than or equal to the second. For example, +
    +  z{2,4}
    +
    +matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special +character. If the second number is omitted, but the comma is present, there is +no upper limit; if the second number and the comma are both omitted, the +quantifier specifies an exact number of required matches. Thus +
    +  [aeiou]{3,}
    +
    +matches at least 3 successive vowels, but may match many more, whereas +
    +  \d{8}
    +
    +matches exactly 8 digits. An opening curly bracket that appears in a position +where a quantifier is not allowed, or one that does not match the syntax of a +quantifier, is taken as a literal character. For example, {,6} is not a +quantifier, but a literal string of four characters. +
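The three brace quantifiers shown above can be exercised with a short sketch. Python's re module stands in for PCRE2 (an assumption; the two agree on these forms). The {,6} case is deliberately left out because Python gives it a different meaning. The helper name full is hypothetical.

```python
import re

def full(pattern, subject):
    """True if the whole subject matches the pattern."""
    return re.fullmatch(pattern, subject) is not None

print(full(r"z{2,4}", "zzz"))         # True: between 2 and 4 repeats
print(full(r"z{2,4}", "zzzzz"))       # False: 5 is above the maximum
print(full(r"[aeiou]{3,}", "aeiou"))  # True: no upper limit
print(full(r"[aeiou]{3,}", "ae"))     # False: fewer than 3
print(full(r"\d{8}", "20240517"))     # True: exactly 8 digits
print(full(r"\d{8}", "2024"))         # False
```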

    +

    +In UTF modes, quantifiers apply to characters rather than to individual code +units. Thus, for example, \x{100}{2} matches two characters, each of +which is represented by a two-byte sequence in a UTF-8 string. Similarly, +\X{3} matches three Unicode extended grapheme clusters, each of which may be +several code units long (and they may be of different lengths). +

    +

    +The quantifier {0} is permitted, causing the expression to behave as if the +previous item and the quantifier were not present. This may be useful for +capture groups that are referenced as +subroutines +from elsewhere in the pattern (but see also the section entitled +"Defining capture groups for use by reference only" +below). Except for parenthesized groups, items that have a {0} quantifier are +omitted from the compiled pattern. +

    +

    +For convenience, the three most common quantifiers have single-character +abbreviations: +

    +  *    is equivalent to {0,}
    +  +    is equivalent to {1,}
    +  ?    is equivalent to {0,1}
    +
    +It is possible to construct infinite loops by following a group that can match +no characters with a quantifier that has no upper limit, for example: +
    +  (a?)*
    +
    +Earlier versions of Perl and PCRE1 used to give an error at compile time for +such patterns. However, because there are cases where this can be useful, such +patterns are now accepted, but whenever an iteration of such a group matches no +characters, matching moves on to the next item in the pattern instead of +repeatedly matching an empty string. This does not prevent backtracking into +any of the iterations if a subsequent item fails to match. +

    +

    +By default, quantifiers are "greedy", that is, they match as much as possible +(up to the maximum number of permitted times), without causing the rest of the +pattern to fail. The classic example of where this gives problems is in trying +to match comments in C programs. These appear between /* and */ and within the +comment, individual * and / characters may appear. An attempt to match C +comments by applying the pattern +

    +  /\*.*\*/
    +
    +to the string +
    +  /* first comment */  not comment  /* second comment */
    +
    +fails, because it matches the entire string owing to the greediness of the .* +item. However, if a quantifier is followed by a question mark, it ceases to be +greedy, and instead matches the minimum number of times possible, so the +pattern +
    +  /\*.*?\*/
    +
    +does the right thing with the C comments. The meaning of the various +quantifiers is not otherwise changed, just the preferred number of matches. +Do not confuse this use of question mark with its use as a quantifier in its +own right. Because it has two uses, it can sometimes appear doubled, as in +
    +  \d??\d
    +
    +which matches one digit by preference, but can match two if that is the only +way the rest of the pattern matches. +
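The C comment example can be run directly. The sketch below uses Python's re module as a stand-in for PCRE2 (an assumption; greedy and lazy quantifiers behave the same way in both for these patterns). The variable names are hypothetical.

```python
import re

SUBJECT = "/* first comment */  not comment  /* second comment */"

# Greedy .* runs to the last */ in the subject.
greedy = re.findall(r"/\*.*\*/", SUBJECT)

# Lazy .*? stops at the first */ that lets the pattern succeed.
lazy = re.findall(r"/\*.*?\*/", SUBJECT)

print(greedy)  # one match spanning the whole subject
print(lazy)    # ['/* first comment */', '/* second comment */']

# Doubled question mark: a lazy optional digit followed by a digit.
print(re.match(r"\d??\d", "12").group())      # 1  (one digit by preference)
print(re.fullmatch(r"\d??\d", "12").group())  # 12 (two when the rest requires it)
```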

    +

    +If the PCRE2_UNGREEDY option is set (an option that is not available in Perl), +the quantifiers are not greedy by default, but individual ones can be made +greedy by following them with a question mark. In other words, it inverts the +default behaviour. +

    +

    +When a parenthesized group is quantified with a minimum repeat count that +is greater than 1 or with a limited maximum, more memory is required for the +compiled pattern, in proportion to the size of the minimum or maximum. +

    +

    +If a pattern starts with .* or .{0,} and the PCRE2_DOTALL option (equivalent +to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is +implicitly anchored, because whatever follows will be tried against every +character position in the subject string, so there is no point in retrying the +overall match at any position after the first. PCRE2 normally treats such a +pattern as though it were preceded by \A. +

    +

    +In cases where it is known that the subject string contains no newlines, it is +worth setting PCRE2_DOTALL in order to obtain this optimization, or +alternatively, using ^ to indicate anchoring explicitly. +

    +

    +However, there are some cases where the optimization cannot be used. When .* +is inside capturing parentheses that are the subject of a backreference +elsewhere in the pattern, a match at the start may fail where a later one +succeeds. Consider, for example: +

    +  (.*)abc\1
    +
    +If the subject is "xyz123abc123" the match point is the fourth character. For +this reason, such a pattern is not implicitly anchored. +

    +

    +Another case where implicit anchoring is not applied is when the leading .* is +inside an atomic group. Once again, a match at the start may fail where a later +one succeeds. Consider this pattern: +

    +  (?>.*?a)b
    +
    +It matches "ab" in the subject "aab". The use of the backtracking control verbs +(*PRUNE) and (*SKIP) also disables this optimization, and there is an option, +PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly. +

    +

    +When a capture group is repeated, the value captured is the substring that +matched the final iteration. For example, after +

    +  (tweedle[dume]{3}\s*)+
    +
    +has matched "tweedledum tweedledee" the value of the captured substring is +"tweedledee". However, if there are nested capture groups, the corresponding +captured values may have been set in previous iterations. For example, after +
    +  (a|(b))+
    +
    +matches "aba" the value of the second captured substring is "b". +
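Both examples of repeated capture groups can be checked with a short sketch. Python's re module is used as a stand-in for PCRE2 (an assumption; for these two patterns it reports the same captured values as the text above). The variable names are hypothetical.

```python
import re

# The outer group is repeated; the capture holds the final iteration.
tweedle = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
print(tweedle.group(1))  # tweedledee

# The nested group (b) was last set in the second iteration and keeps
# that value even though the third iteration matched "a".
nested = re.match(r"(a|(b))+", "aba")
print(nested.groups())   # ('a', 'b')
```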

    +
    ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS
    +

    +With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") +repetition, failure of what follows normally causes the repeated item to be +re-evaluated to see if a different number of repeats allows the rest of the +pattern to match. Sometimes it is useful to prevent this, either to change the +nature of the match, or to cause it to fail earlier than it otherwise might, when +the author of the pattern knows there is no point in carrying on. +

    +

    +Consider, for example, the pattern \d+foo when applied to the subject line +

    +  123456bar
    +
    +After matching all 6 digits and then failing to match "foo", the normal +action of the matcher is to try again with only 5 digits matching the \d+ +item, and then with 4, and so on, before ultimately failing. "Atomic grouping" +(a term taken from Jeffrey Friedl's book) provides the means for specifying +that once a group has matched, it is not to be re-evaluated in this way. +

    +

    +If we use atomic grouping for the previous example, the matcher gives up +immediately on failing to match "foo" the first time. The notation is a kind of +special parenthesis, starting with (?> as in this example: +

    +  (?>\d+)foo
    +
    +Perl 5.28 introduced an experimental alphabetic form starting with (* which may +be easier to remember: +
    +  (*atomic:\d+)foo
    +
    +This kind of parenthesized group "locks up" the part of the pattern it +contains once it has matched, and a failure further into the pattern is +prevented from backtracking into it. Backtracking past it to previous items, +however, works as normal. +

    +

    +An alternative description is that a group of this type matches exactly the +string of characters that an identical standalone pattern would match, if +anchored at the current point in the subject string. +

    +

    +Atomic groups are not capture groups. Simple cases such as the above example +can be thought of as a maximizing repeat that must swallow everything it can. +So, while both \d+ and \d+? are prepared to adjust the number of digits they +match in order to make the rest of the pattern match, (?>\d+) can only match +an entire sequence of digits. +
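The difference between a backtracking repeat and an atomic group can be seen by following the digits with one more \d. The sketch below is in Python, not PCRE2, and rests on two assumptions: Python 3.11 or later accepts (?>...) directly, and on older versions the group is emulated with a lookahead plus a backreference (a common substitute, not PCRE2's implementation). The helper name compile_atomic_digits is hypothetical.

```python
import re

def compile_atomic_digits(rest):
    """Compile an atomic run of digits followed by the pattern text in rest."""
    try:
        return re.compile(r"(?>\d+)" + rest)   # accepted by Python 3.11 or later
    except re.error:
        # Assumed emulation for older Pythons: the lookahead is not
        # backtracked into, and \1 then consumes exactly what it captured.
        return re.compile(r"(?=(\d+))\1" + rest)

backtracking = re.compile(r"\d+\d")
atomic = compile_atomic_digits(r"\d")

# \d+ gives back one digit so that the final \d can match.
print(backtracking.match("12345") is not None)  # True

# The atomic group keeps all five digits, so the final \d has nothing left.
print(atomic.match("12345") is not None)        # False

# When what follows is not a digit, the atomic form matches as usual.
print(compile_atomic_digits("foo").match("123foo") is not None)  # True
```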

    +

    +Atomic groups in general can of course contain arbitrarily complicated +expressions, and can be nested. However, when the contents of an atomic +group are just a single repeated item, as in the example above, a simpler +notation, called a "possessive quantifier" can be used. This consists of an +additional + character following a quantifier. Using this notation, the +previous example can be rewritten as +

    +  \d++foo
    +
    +Note that a possessive quantifier can be used with an entire group, for +example: +
    +  (abc|xyz){2,3}+
    +
    +Possessive quantifiers are always greedy; the setting of the PCRE2_UNGREEDY +option is ignored. They are a convenient notation for the simpler forms of +atomic group. However, there is no difference in the meaning of a possessive +quantifier and the equivalent atomic group, though there may be a performance +difference; possessive quantifiers should be slightly faster. +

    +

    +The possessive quantifier syntax is an extension to the Perl 5.8 syntax. +Jeffrey Friedl originated the idea (and the name) in the first edition of his +book. Mike McCloskey liked it, so implemented it when he built Sun's Java +package, and PCRE1 copied it from there. It found its way into Perl at release +5.10. +

    +

    +PCRE2 has an optimization that automatically "possessifies" certain simple +pattern constructs. For example, the sequence A+B is treated as A++B because +there is no point in backtracking into a sequence of A's when B must follow. +This feature can be disabled by the PCRE2_NO_AUTOPOSSESS option, or starting +the pattern with (*NO_AUTO_POSSESS). +

    +

    +When a pattern contains an unlimited repeat inside a group that can itself be +repeated an unlimited number of times, the use of an atomic group is the only +way to avoid some failing matches taking a very long time indeed. The pattern +

    +  (\D+|<\d+>)*[!?]
    +
    +matches an unlimited number of substrings that either consist of non-digits, or +digits enclosed in <>, followed by either ! or ?. When it matches, it runs +quickly. However, if it is applied to +
    +  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    +
    +it takes a long time before reporting failure. This is because the string can +be divided between the internal \D+ repeat and the external * repeat in a +large number of ways, and all have to be tried. (The example uses [!?] rather +than a single character at the end, because both PCRE2 and Perl have an +optimization that allows for fast failure when a single character is used. They +remember the last single character that is required for a match, and fail early +if it is not present in the string.) If the pattern is changed so that it uses +an atomic group, like this: +
    +  ((?>\D+)|<\d+>)*[!?]
    +
    +sequences of non-digits cannot be broken, and failure happens quickly. +

    +
    BACKREFERENCES
    +

    +Outside a character class, a backslash followed by a digit greater than 0 (and +possibly further digits) is a backreference to a capture group earlier (that +is, to its left) in the pattern, provided there have been that many previous +capture groups. +

    +

    +However, if the decimal number following the backslash is less than 8, it is +always taken as a backreference, and causes an error only if there are not that +many capture groups in the entire pattern. In other words, the group that is +referenced need not be to the left of the reference for numbers less than 8. A +"forward backreference" of this type can make sense when a repetition is +involved and the group to the right has participated in an earlier iteration. +

    +

    +It is not possible to have a numerical "forward backreference" to a group whose +number is 8 or more using this syntax because a sequence such as \50 is +interpreted as a character defined in octal. See the subsection entitled +"Non-printing characters" +above +for further details of the handling of digits following a backslash. Other +forms of backreferencing do not suffer from this restriction. In particular, +there is no problem when named capture groups are used (see below). +

    +

    +Another way of avoiding the ambiguity inherent in the use of digits following a +backslash is to use the \g escape sequence. This escape must be followed by a +signed or unsigned number, optionally enclosed in braces. These examples are +all identical: +

    +  (ring), \1
    +  (ring), \g1
    +  (ring), \g{1}
    +
    +An unsigned number specifies an absolute reference without the ambiguity that +is present in the older syntax. It is also useful when literal digits follow +the reference. A signed number is a relative reference. Consider this example: +
    +  (abc(def)ghi)\g{-1}
    +
    +The sequence \g{-1} is a reference to the most recently started capture group +before \g, that is, it is equivalent to \2 in this example. Similarly, +\g{-2} would be equivalent to \1. The use of relative references can be +helpful in long patterns, and also in patterns that are created by joining +together fragments that contain references within themselves. +

    +

    +The sequence \g{+1} is a reference to the next capture group. This kind of +forward reference can be useful in patterns that repeat. Perl does not support +the use of + in this way. +

    +

    +A backreference matches whatever actually most recently matched the capture +group in the current subject string, rather than anything at all that matches +the group (see +"Groups as subroutines" +below for a way of doing that). So the pattern +

    +  (sens|respons)e and \1ibility
    +
    +matches "sense and sensibility" and "response and responsibility", but not +"sense and responsibility". If caseful matching is in force at the time of the +backreference, the case of letters is relevant. For example, +
    +  ((?i)rah)\s+\1
    +
    +matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original +capture group is matched caselessly. +
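The first of the two examples above, in which a backreference matches the text actually captured rather than anything the group could match, can be checked with a short sketch. Python's re module stands in for PCRE2 (an assumption; \1 has the same meaning in both for this pattern). The caseless "rah" example is not reproduced, because Python treats a mid-pattern (?i) differently. The names TITLE and is_title are hypothetical.

```python
import re

TITLE = re.compile(r"(sens|respons)e and \1ibility")

def is_title(subject):
    """True if the whole subject matches TITLE."""
    return TITLE.fullmatch(subject) is not None

print(is_title("sense and sensibility"))        # True
print(is_title("response and responsibility"))  # True
print(is_title("sense and responsibility"))     # False: \1 must repeat "sens"
```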

    +

    +There are several different ways of writing backreferences to named capture +groups. The .NET syntax \k{name} and the Perl syntax \k<name> or \k'name' +are supported, as is the Python syntax (?P=name). Perl 5.10's unified +backreference syntax, in which \g can be used for both numeric and named +references, is also supported. We could rewrite the above example in any of the +following ways: +

    +  (?<p1>(?i)rah)\s+\k<p1>
    +  (?'p1'(?i)rah)\s+\k{p1}
    +  (?P<p1>(?i)rah)\s+(?P=p1)
    +  (?<p1>(?i)rah)\s+\g{p1}
    +
    +A capture group that is referenced by name may appear in the pattern before or +after the reference. +

    +

    +There may be more than one backreference to the same group. If a group has not +actually been used in a particular match, backreferences to it always fail by +default. For example, the pattern +

    +  (a|(bc))\2
    +
    +always fails if it starts to match "a" rather than "bc". However, if the +PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backreference to an +unset value matches an empty string. +
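The default behaviour for a backreference to an unset group can be sketched as follows. Python's re module is used as a stand-in for PCRE2's default (an assumption); nothing here models the PCRE2_MATCH_UNSET_BACKREF option. The name UNSET_DEMO is hypothetical.

```python
import re

UNSET_DEMO = re.compile(r"(a|(bc))\2")

# Group 2 takes part in the match, so \2 has a value to repeat.
print(UNSET_DEMO.fullmatch("bcbc") is not None)  # True

# The first branch matches "a", group 2 stays unset, and \2 fails.
print(UNSET_DEMO.match("a") is not None)         # False
print(UNSET_DEMO.match("aa") is not None)        # False
```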

    +

    +Because there may be many capture groups in a pattern, all digits following a +backslash are taken as part of a potential backreference number. If the pattern +continues with a digit character, some delimiter must be used to terminate the +backreference. If the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE option is set, this +can be white space. Otherwise, the \g{} syntax or an empty comment (see +"Comments" +below) can be used. +

    +
    +Recursive backreferences +
    +

    +A backreference that occurs inside the group to which it refers fails when the +group is first used, so, for example, (a\1) never matches. However, such +references can be useful inside repeated groups. For example, the pattern +

    +  (a|b\1)+
    +
    +matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of +the group, the backreference matches the character string corresponding to the +previous iteration. In order for this to work, the pattern must be such that +the first iteration does not need to match the backreference. This can be done +using alternation, as in the example above, or by a quantifier with a minimum +of zero. +

    +

    +For versions of PCRE2 less than 10.25, backreferences of this type used to +cause the group that they reference to be treated as an +atomic group. +This restriction no longer applies, and backtracking into such groups can occur +as normal. +

    +
    ASSERTIONS
    +

    +An assertion is a test on the characters following or preceding the current +matching point that does not consume any characters. The simple assertions +coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described +above. +

    +

    +More complicated assertions are coded as parenthesized groups. There are two +kinds: those that look ahead of the current position in the subject string, and +those that look behind it, and in each case an assertion may be positive (must +match for the assertion to be true) or negative (must not match for the +assertion to be true). An assertion group is matched in the normal way, +and if it is true, matching continues after it, but with the matching position +in the subject string reset to what it was before the assertion was processed. +
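The four kinds of lookaround, and the fact that they consume no characters, can be illustrated with a short sketch. Python's re module is used as a stand-in for PCRE2 (an assumption; the (?=, (?!, (?<= and (?<! forms behave the same way for these simple patterns). The subject strings are made up for the example.

```python
import re

# Positive lookahead: "bar" must follow, but is not part of the match.
ahead = re.match(r"foo(?=bar)", "foobar")
print(ahead.group(0), ahead.end())  # foo 3

# Negative lookahead: true only when "bar" does not follow.
print(re.match(r"foo(?!bar)", "foobar") is None)  # True
print(re.match(r"foo(?!bar)", "foobaz").group())  # foo

# Positive lookbehind: "foo" must precede the match.
behind = re.search(r"(?<=foo)bar", "foobar")
print(behind.group(0), behind.start())            # bar 3

# Negative lookbehind: true only when "foo" does not precede.
print(re.search(r"(?<!foo)bar", "foobar") is None)  # True
```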

    +

    +The Perl-compatible lookaround assertions are atomic. If an assertion is true, +but there is a subsequent matching failure, there is no backtracking into the +assertion. However, there are some cases where non-atomic assertions can be +useful. PCRE2 has some support for these, described in the section entitled +"Non-atomic assertions" +below, but they are not Perl-compatible. +

    +

    +A lookaround assertion may appear as the condition in a +conditional group +(see below). In this case, the result of matching the assertion determines +which branch of the condition is followed. +

    +

    +Assertion groups are not capture groups. If an assertion contains capture +groups within it, these are counted for the purposes of numbering the capture +groups in the whole pattern. Within each branch of an assertion, locally +captured substrings may be referenced in the usual way. For example, a sequence +such as (.)\g{-1} can be used to check that two adjacent characters are the +same. +

    +

    +When a branch within an assertion fails to match, any substrings that were +captured are discarded (as happens with any pattern branch that fails to +match). A negative assertion is true only when all its branches fail to match; +this means that no captured substrings are ever retained after a successful +negative assertion. When an assertion contains a matching branch, what happens +depends on the type of assertion. +

    +

    +For a positive assertion, internally captured substrings in the successful +branch are retained, and matching continues with the next pattern item after +the assertion. For a negative assertion, a matching branch means that the +assertion is not true. If such an assertion is being used as a condition in a +conditional group +(see below), captured substrings are retained, because matching continues with +the "no" branch of the condition. For other failing negative assertions, +control passes to the previous backtracking point, thus discarding any captured +strings within the assertion. +

    +

    +Most assertion groups may be repeated; though it makes no sense to assert the +same thing several times, the side effect of capturing in positive assertions +may occasionally be useful. However, an assertion that forms the condition for +a conditional group may not be quantified. PCRE2 used to restrict the +repetition of assertions, but from release 10.35 the only restriction is that +an unlimited maximum repetition is changed to be one more than the minimum. For +example, {3,} is treated as {3,4}. +

    +
    +Alphabetic assertion names +
    +

    +Traditionally, symbolic sequences such as (?= and (?<= have been used to +specify lookaround assertions. Perl 5.28 introduced some experimental +alphabetic alternatives which might be easier to remember. They all start with +(* instead of (? and must be written using lower case letters. PCRE2 supports +the following synonyms: +

    +  (*positive_lookahead:  or (*pla: is the same as (?=
    +  (*negative_lookahead:  or (*nla: is the same as (?!
    +  (*positive_lookbehind: or (*plb: is the same as (?<=
    +  (*negative_lookbehind: or (*nlb: is the same as (?<!
    +
    +For example, (*pla:foo) is the same assertion as (?=foo). In the following +sections, the various assertions are described using the original symbolic +forms. +

    +
    +Lookahead assertions +
    +

    +Lookahead assertions start with (?= for positive assertions and (?! for +negative assertions. For example, +

    +  \w+(?=;)
    +
    +matches a word followed by a semicolon, but does not include the semicolon in +the match, and +
    +  foo(?!bar)
    +
    +matches any occurrence of "foo" that is not followed by "bar". Note that the +apparently similar pattern +
    +  (?!foo)bar
    +
    +does not find an occurrence of "bar" that is preceded by something other than +"foo"; it finds any occurrence of "bar" whatsoever, because the assertion +(?!foo) is always true when the next three characters are "bar". A +lookbehind assertion is needed to achieve the other effect. +
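The three lookahead patterns above can be tried out with a short script. This is a minimal sketch that uses Python's built-in re module rather than PCRE2 itself, on the assumption that the two engines agree for these simple lookaheads. The sample subject strings are made up for illustration.

```python
import re

# Sketch only: Python's re module stands in for PCRE2 here.

# \w+(?=;) matches a word that is followed by a semicolon; the semicolon
# is tested but is not part of the match.
word = re.search(r"\w+(?=;)", "alpha beta; gamma").group()

# foo(?!bar) matches "foo" only where it is not followed by "bar".
foo_starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]

# (?!foo)bar finds "bar" even directly after "foo": at the point where the
# assertion runs, the next three characters are "bar", not "foo".
bar_starts = [m.start() for m in re.finditer(r"(?!foo)bar", "foobar xbar")]

print(word, foo_starts, bar_starts)
```

The last result shows why a lookbehind is needed to exclude "bar" when it follows "foo": the negative lookahead never rejects either occurrence.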

    +

    +If you want to force a matching failure at some point in a pattern, the most +convenient way to do it is with (?!) because an empty string always matches, so +an assertion that requires there not to be an empty string must always fail. +The backtracking control verb (*FAIL) or (*F) is a synonym for (?!). +

    +
    +Lookbehind assertions +
    +

    +Lookbehind assertions start with (?<= for positive assertions and (?<! for +negative assertions. For example, +

    +  (?<!foo)bar
    +
    +does find an occurrence of "bar" that is not preceded by "foo". The contents of +a lookbehind assertion are restricted such that all the strings it matches must +have a fixed length. However, if there are several top-level alternatives, they +do not all have to have the same fixed length. Thus +
    +  (?<=bullock|donkey)
    +
    +is permitted, but +
    +  (?<!dogs?|cats?)
    +
    +causes an error at compile time. Branches that match different length strings +are permitted only at the top level of a lookbehind assertion. This is an +extension compared with Perl, which requires all branches to match the same +length of string. An assertion such as +
    +  (?<=ab(c|de))
    +
    +is not permitted, because its single top-level branch can match two different +lengths, but it is acceptable to PCRE2 if rewritten to use two top-level +branches: +
    +  (?<=abc|abde)
    +
    +In some cases, the escape sequence \K +(see above) +can be used instead of a lookbehind assertion to get round the fixed-length +restriction. +

    +

    +The implementation of lookbehind assertions is, for each alternative, to +temporarily move the current position back by the fixed length and then try to +match. If there are insufficient characters before the current position, the +assertion fails. +

    +

    +In UTF-8 and UTF-16 modes, PCRE2 does not allow the \C escape (which matches a +single code unit even in a UTF mode) to appear in lookbehind assertions, +because it makes it impossible to calculate the length of the lookbehind. The +\X and \R escapes, which can match different numbers of code units, are never +permitted in lookbehinds. +

    +

    +"Subroutine" +calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long +as the called capture group matches a fixed-length string. However, +recursion, +that is, a "subroutine" call into a group that is already active, +is not supported. +

    +

    +Perl does not support backreferences in lookbehinds. PCRE2 does support them, +but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option +must not be set, there must be no use of (?| in the pattern (it creates +duplicate group numbers), and if the backreference is by name, the name +must be unique. Of course, the referenced group must itself match a fixed +length substring. The following pattern matches words containing at least two +characters that begin and end with the same character: +

    +   \b(\w)\w++(?<=\1)
    +
    +

    +

    +Possessive quantifiers can be used in conjunction with lookbehind assertions to +specify efficient matching of fixed-length strings at the end of subject +strings. Consider a simple pattern such as +

    +  abcd$
    +
    +when applied to a long string that does not match. Because matching proceeds +from left to right, PCRE2 will look for each "a" in the subject and then see if +what follows matches the rest of the pattern. If the pattern is specified as +
    +  ^.*abcd$
    +
    +the initial .* matches the entire string at first, but when this fails (because +there is no following "a"), it backtracks to match all but the last character, +then all but the last two characters, and so on. Once again the search for "a" +covers the entire string, from right to left, so we are no better off. However, +if the pattern is written as +
    +  ^.*+(?<=abcd)
    +
    +there can be no backtracking for the .*+ item because of the possessive +quantifier; it can match only the entire string. The subsequent lookbehind +assertion does a single test on the last four characters. If it fails, the +match fails immediately. For long strings, this approach makes a significant +difference to the processing time. +

    +
    +Using multiple assertions +
    +

    +Several assertions (of any sort) may occur in succession. For example, +

    +  (?<=\d{3})(?<!999)foo
    +
    +matches "foo" preceded by three digits that are not "999". Notice that each of +the assertions is applied independently at the same point in the subject +string. First there is a check that the previous three characters are all +digits, and then there is a check that the same three characters are not "999". +This pattern does not match "foo" preceded by six characters, the first three +of which are digits and the last three of which are not "999". For example, it +doesn't match "123abcfoo". A pattern to do that is +
    +  (?<=\d{3}...)(?<!999)foo
    +
    +This time the first assertion looks at the preceding six characters, checking +that the first three are digits, and then the second assertion checks that the +preceding three characters are not "999". +
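The difference between the two patterns can be seen by running both against a few subjects. This is a minimal sketch using Python's re module in place of PCRE2, assuming the engines behave alike for fixed-length lookbehinds. The helper name matches and the test subjects are illustrative only.

```python
import re

# Sketch only: Python's re module stands in for PCRE2 here.

# Both lookbehinds are applied at the same point, just before "foo".
same_point = re.compile(r"(?<=\d{3})(?<!999)foo")

# The first lookbehind now spans six characters; the second still
# inspects only the three characters immediately before "foo".
six_back = re.compile(r"(?<=\d{3}...)(?<!999)foo")


def matches(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None


for subject in ("123foo", "999foo", "123abcfoo", "123999foo"):
    print(subject, matches(same_point, subject), matches(six_back, subject))
```

Only the second pattern accepts "123abcfoo", and neither accepts a subject in which "999" comes directly before "foo".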

    +

    +Assertions can be nested in any combination. For example, +

    +  (?<=(?<!foo)bar)baz
    +
    +matches an occurrence of "baz" that is preceded by "bar" which in turn is not +preceded by "foo", while +
    +  (?<=\d{3}(?!999)...)foo
    +
    +is another pattern that matches "foo" preceded by three digits and any three +characters that are not "999". +

    +
    NON-ATOMIC ASSERTIONS
    +

    +The traditional Perl-compatible lookaround assertions are atomic. That is, if +an assertion is true, but there is a subsequent matching failure, there is no +backtracking into the assertion. However, there are some cases where non-atomic +positive assertions can be useful. PCRE2 provides these using the following +syntax: +

    +  (*non_atomic_positive_lookahead:  or (*napla: or (?*
    +  (*non_atomic_positive_lookbehind: or (*naplb: or (?<*
    +
    +Consider the problem of finding the right-most word in a string that also +appears earlier in the string, that is, it must appear at least twice in total. +This pattern returns the required result as captured substring 1: +
    +  ^(?x)(*napla: .* \b(\w++)) (?> .*? \b\1\b ){2}
    +
    +For a subject such as "word1 word2 word3 word2 word3 word4" the result is +"word3". How does it work? At the start, ^(?x) anchors the pattern and sets the +"x" option, which causes white space (introduced for readability) to be +ignored. Inside the assertion, the greedy .* at first consumes the entire +string, but then has to backtrack until the rest of the assertion can match a +word, which is captured by group 1. In other words, when the assertion first +succeeds, it captures the right-most word in the string. +

    +

    +The current matching point is then reset to the start of the subject, and the +rest of the pattern match checks for two occurrences of the captured word, +using an ungreedy .*? to scan from the left. If this succeeds, we are done, but +if the last word in the string does not occur twice, this part of the pattern +fails. If a traditional atomic lookahead (?= or (*pla: had been used, the +assertion could not be re-entered, and the whole match would fail. The pattern +would succeed only if the very last word in the subject was found twice. +

    +

    +Using a non-atomic lookahead, however, means that when the last word does not +occur twice in the string, the lookahead can backtrack and find the second-last +word, and so on, until either the match succeeds, or all words have been +tested. +

    +

    +Two conditions must be met for a non-atomic assertion to be useful: the +contents of one or more capturing groups must change after a backtrack into the +assertion, and there must be a backreference to a changed group later in the +pattern. If this is not the case, the rest of the pattern match fails exactly +as before because nothing has changed, so using a non-atomic assertion just +wastes resources. +

    +

    +There is one exception to backtracking into a non-atomic assertion. If an +(*ACCEPT) control verb is triggered, the assertion succeeds atomically. That +is, a subsequent match failure cannot backtrack into the assertion. +

    +

    +Non-atomic assertions are not supported by the alternative matching function +pcre2_dfa_match(). They are supported by JIT, but only if they do not +contain any control verbs such as (*ACCEPT). (This may change in future). Note +that assertions that appear as conditions for +conditional groups +(see below) must be atomic. +

    +
    SCRIPT RUNS
    +

    +In concept, a script run is a sequence of characters that are all from the same +Unicode script such as Latin or Greek. However, because some scripts are +commonly used together, and because some diacritical and other marks are used +with multiple scripts, it is not that simple. There is a full description of +the rules that PCRE2 uses in the section entitled +"Script Runs" +in the +pcre2unicode +documentation. +

    +

    +If part of a pattern is enclosed between (*script_run: or (*sr: and a closing +parenthesis, it fails if the sequence of characters that it matches is not a +script run. After a failure, normal backtracking occurs. Script runs can be +used to detect spoofing attacks using characters that look the same, but are +from different scripts. The string "paypal.com" is an infamous example, where +the letters could be a mixture of Latin and Cyrillic. This pattern ensures that +the matched characters in a sequence of non-spaces that follow white space are +a script run: +

    +  \s+(*sr:\S+)
    +
    +To be sure that they are all from the Latin script (for example), a lookahead +can be used: +
    +  \s+(?=\p{Latin})(*sr:\S+)
    +
    +This works as long as the first character is expected to be a character in that +script, and not (for example) punctuation, which is allowed with any script. If +this is not the case, a more creative lookahead is needed. For example, if +digits, underscore, and dots are permitted at the start: +
    +  \s+(?=[0-9_.]*\p{Latin})(*sr:\S+)
    +
    +
    +

    +

    +In many cases, backtracking into a script run pattern fragment is not +desirable. The script run can employ an atomic group to prevent this. Because +this is a common requirement, a shorthand notation is provided by +(*atomic_script_run: or (*asr: +

    +  (*asr:...) is the same as (*sr:(?>...))
    +
    +Note that the atomic group is inside the script run. Putting it outside would +not prevent backtracking into the script run pattern. +

    +

    +Support for script runs is not available if PCRE2 is compiled without Unicode +support. A compile-time error is given if any of the above constructs is +encountered. Script runs are not supported by the alternate matching function, +pcre2_dfa_match() because they use the same mechanism as capturing +parentheses. +

    +

    +Warning: The (*ACCEPT) control verb +(see below) +should not be used within a script run group, because it causes an immediate +exit from the group, bypassing the script run checking. +

    +
    CONDITIONAL GROUPS
    +

    +It is possible to cause the matching process to obey a pattern fragment +conditionally or to choose between two alternative fragments, depending on +the result of an assertion, or whether a specific capture group has +already been matched. The two possible forms of conditional group are: +

    +  (?(condition)yes-pattern)
    +  (?(condition)yes-pattern|no-pattern)
    +
    +If the condition is satisfied, the yes-pattern is used; otherwise the +no-pattern (if present) is used. An absent no-pattern is equivalent to an empty +string (it always matches). If there are more than two alternatives in the +group, a compile-time error occurs. Each of the two alternatives may itself +contain nested groups of any form, including conditional groups; the +restriction to two alternatives applies only at the level of the condition +itself. This pattern fragment is an example where the alternatives are complex: +
    +  (?(1) (A|B|C) | (D | (?(2)E|F) | E) )
    +
    +
    +

    +

    +There are five kinds of condition: references to capture groups, references to +recursion, two pseudo-conditions called DEFINE and VERSION, and assertions. +

    +
    +Checking for a used capture group by number +
    +

    +If the text between the parentheses consists of a sequence of digits, the +condition is true if a capture group of that number has previously matched. If +there is more than one capture group with the same number (see the earlier +section about duplicate group numbers), +the condition is true if any of them have matched. An alternative notation is +to precede the digits with a plus or minus sign. In this case, the group number +is relative rather than absolute. The most recently opened capture group can be +referenced by (?(-1), the next most recent by (?(-2), and so on. Inside loops +it can also make sense to refer to subsequent groups. The next capture group +can be referenced as (?(+1), and so on. (The value zero in any of these forms +is not used; it provokes a compile-time error.) +

    +

    +Consider the following pattern, which contains non-significant white space to +make it more readable (assume the PCRE2_EXTENDED option) and to divide it into +three parts for ease of discussion: +

    +  ( \( )?    [^()]+    (?(1) \) )
    +
    +The first part matches an optional opening parenthesis, and if that +character is present, sets it as the first captured substring. The second part +matches one or more characters that are not parentheses. The third part is a +conditional group that tests whether or not the first capture group +matched. If it did, that is, if the subject started with an opening parenthesis, +the condition is true, and so the yes-pattern is executed and a closing +parenthesis is required. Otherwise, since no-pattern is not present, the +conditional group matches nothing. In other words, this pattern matches a +sequence of non-parentheses, optionally enclosed in parentheses. +
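The behaviour described above can be tried with a few subjects. This is a minimal sketch that uses Python's re module instead of PCRE2, assuming that its (?(1)...) conditional behaves the same way for this pattern. The names optional_parens and whole_match are chosen for the example.

```python
import re

# Sketch only: Python's re module stands in for PCRE2, and re.VERBOSE
# plays the role of PCRE2_EXTENDED.
optional_parens = re.compile(r"( \( )?    [^()]+    (?(1) \) )", re.VERBOSE)


def whole_match(subject):
    """Return True if the entire subject matches the pattern."""
    return optional_parens.fullmatch(subject) is not None


for subject in ("(abc)", "abc", "(abc", "abc)"):
    print(subject, whole_match(subject))
```

Matching the whole subject is what exposes the unbalanced cases: an opening parenthesis sets group 1 and so demands a closing one, while a stray closing parenthesis is left over when group 1 is unset.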

    +

    +If you were embedding this pattern in a larger one, you could use a relative +reference: +

    +  ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
    +
    +This makes the fragment independent of the parentheses in the larger pattern. +

    +
    +Checking for a used capture group by name +
    +

    +Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used +capture group by name. For compatibility with earlier versions of PCRE1, which +had this facility before Perl, the syntax (?(name)...) is also recognized. +Note, however, that undelimited names consisting of the letter R followed by +digits are ambiguous (see the following section). Rewriting the above example +to use a named group gives this: +

    +  (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
    +
    +If the name used in a condition of this kind is a duplicate, the test is +applied to all groups of the same name, and is true if any one of them has +matched. +

    +
    +Checking for pattern recursion +
    +

    +"Recursion" in this sense refers to any subroutine-like call from one part of +the pattern to another, whether or not it is actually recursive. See the +sections entitled +"Recursive patterns" +and +"Groups as subroutines" +below for details of recursion and subroutine calls. +

    +

    +If a condition is the string (R), and there is no capture group with the name +R, the condition is true if matching is currently in a recursion or subroutine +call to the whole pattern or any capture group. If digits follow the letter R, +and there is no group with that name, the condition is true if the most recent +call is into a group with the given number, which must exist somewhere in the +overall pattern. This is a contrived example that is equivalent to a+b: +

    +  ((?(R1)a+|(?1)b))
    +
    +However, in both cases, if there is a capture group with a matching name, the +condition tests for its being set, as described in the section above, instead +of testing for recursion. For example, creating a group with the name R1 by +adding (?<R1>) to the above pattern completely changes its meaning. +

    +

    +If a name preceded by ampersand follows the letter R, for example: +

    +  (?(R&name)...)
    +
    +the condition is true if the most recent recursion is into a group of that name +(which must exist within the pattern). +

    +

    +This condition does not check the entire recursion stack. It tests only the +current level. If the name used in a condition of this kind is a duplicate, the +test is applied to all groups of the same name, and is true if any one of +them is the most recent recursion. +

    +

    +At "top level", all these recursion test conditions are false. +

    +
    +Defining capture groups for use by reference only +
    +

    +If the condition is the string (DEFINE), the condition is always false, even if +there is a group with the name DEFINE. In this case, there may be only one +alternative in the rest of the conditional group. It is always skipped if +control reaches this point in the pattern; the idea of DEFINE is that it can be +used to define subroutines that can be referenced from elsewhere. (The use of +subroutines +is described below.) For example, a pattern to match an IPv4 address such as +"192.168.23.245" could be written like this (ignore white space and line +breaks): +

    +  (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
    +  \b (?&byte) (\.(?&byte)){3} \b
    +
    +The first part of the pattern is a DEFINE group inside which another group +named "byte" is defined. This matches an individual component of an IPv4 +address (a number less than 256). When matching takes place, this part of the +pattern is skipped because DEFINE acts like a false condition. The rest of the +pattern uses references to the named group to match the four dot-separated +components of an IPv4 address, insisting on a word boundary at each end. +
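The idea of defining a fragment once and reusing it can be sketched in a language whose regex engine lacks DEFINE. Python's re module has neither (?(DEFINE)...) nor subroutine calls, so the sketch below swaps in plain string composition to reuse the "byte" fragment. It is an approximation of the technique, not PCRE2 syntax, and the names BYTE, IPV4 and is_ipv4 are invented for the example.

```python
import re

# Sketch only: string composition replaces PCRE2's DEFINE group and
# (?&byte) subroutine calls, which Python's re module does not provide.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# re.ASCII restricts \d to 0-9; {{3}} is a literal {3} inside the f-string.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b", re.ASCII)


def is_ipv4(text):
    """Return True if the whole text is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


print(is_ipv4("192.168.23.245"), is_ipv4("256.1.1.1"), is_ipv4("1.2.3"))
```

In PCRE2 the DEFINE form keeps the fragment inside the pattern itself, which matters when the pattern is supplied as a single string to an application.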

    +
    +Checking the PCRE2 version +
    +

    +Programs that link with a PCRE2 library can check the version by calling +pcre2_config() with appropriate arguments. Users of applications that do +not have access to the underlying code cannot do this. A special "condition" +called VERSION exists to allow such users to discover which version of PCRE2 +they are dealing with by using this condition to match a string such as +"yesno". VERSION must be followed either by "=" or ">=" and a version number. +For example: +

    +  (?(VERSION>=10.4)yes|no)
    +
    +This pattern matches "yes" if the PCRE2 version is greater than or equal to 10.4, or +"no" otherwise. The fractional part of the version number may not contain more +than two digits. +

    +
    +Assertion conditions +
    +

    +If the condition is not in any of the above formats, it must be a parenthesized +assertion. This may be a positive or negative lookahead or lookbehind +assertion. However, it must be a traditional atomic assertion, not one of the +PCRE2-specific +non-atomic assertions. +

    +

    +Consider this pattern, again containing non-significant white space, and with +the two alternatives on the second line: +

    +  (?(?=[^a-z]*[a-z])
    +  \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
    +
    +The condition is a positive lookahead assertion that matches an optional +sequence of non-letters followed by a letter. In other words, it tests for the +presence of at least one letter in the subject. If a letter is found, the +subject is matched against the first alternative; otherwise it is matched +against the second. This pattern matches strings in one of the two forms +dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. +

    +

    +When an assertion that is a condition contains capture groups, any +capturing that occurs in a matching branch is retained afterwards, for both +positive and negative assertions, because matching always continues after the +assertion, whether it succeeds or fails. (Compare non-conditional assertions, +for which captures are retained only for positive assertions that succeed.) +

    +
    COMMENTS
    +

    +There are two ways of including comments in patterns that are processed by +PCRE2. In both cases, the start of the comment must not be in a character +class, nor in the middle of any other sequence of related characters such as +(?: or a group name or number. The characters that make up a comment play +no part in the pattern matching. +

    +

    +The sequence (?# marks the start of a comment that continues up to the next +closing parenthesis. Nested parentheses are not permitted. If the +PCRE2_EXTENDED or PCRE2_EXTENDED_MORE option is set, an unescaped # character +also introduces a comment, which in this case continues to immediately after +the next newline character or character sequence in the pattern. Which +characters are interpreted as newlines is controlled by an option passed to the +compiling function or by a special sequence at the start of the pattern, as +described in the section entitled +"Newline conventions" +above. Note that the end of this type of comment is a literal newline sequence +in the pattern; escape sequences that happen to represent a newline do not +count. For example, consider this pattern when PCRE2_EXTENDED is set, and the +default newline convention (a single linefeed character) is in force: +

    +  abc #comment \n still comment
    +
    +On encountering the # character, pcre2_compile() skips along, looking for +a newline in the pattern. The sequence \n is still literal at this stage, so +it does not terminate the comment. Only an actual character with the code value +0x0a (the default newline) does so. +
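The distinction between an escape sequence and a literal newline can be demonstrated directly. This is a minimal sketch using Python's re module in place of PCRE2, assuming that re.VERBOSE treats "#" comments in the same way as PCRE2_EXTENDED for this case. The variable names are illustrative.

```python
import re

# Sketch only: Python's re module stands in for PCRE2, and re.VERBOSE
# plays the role of PCRE2_EXTENDED.

# In a raw string, \n is the two characters backslash and "n", so the
# "#" comment runs to the end of the pattern.
escaped = re.compile(r"abc #comment \n still comment", re.VERBOSE)

# Here the string holds a real linefeed (0x0a), which ends the comment,
# so "def" is part of the pattern again.
literal = re.compile("abc #comment \n def", re.VERBOSE)

# A (?#...) comment needs no option and ends at the next ")".
inline = re.compile(r"abc(?#this is ignored)def")

print(escaped.fullmatch("abc") is not None)
print(literal.fullmatch("abcdef") is not None)
print(inline.fullmatch("abcdef") is not None)
```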

    +
    RECURSIVE PATTERNS
    +

    +Consider the problem of matching a string in parentheses, allowing for +unlimited nested parentheses. Without the use of recursion, the best that can +be done is to use a pattern that matches up to some fixed depth of nesting. It +is not possible to handle an arbitrary nesting depth. +
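The fixed-depth limitation can be made concrete by building such a pattern mechanically. This is a minimal sketch in Python, whose re module has no recursion. The function name parens_to_depth is invented for the example, and "depth" here counts the levels of nesting allowed inside the outermost pair.

```python
import re


def parens_to_depth(depth):
    """Build a pattern for parentheses with at most `depth` nested levels
    inside the outermost pair."""
    pattern = r"\([^()]*\)"  # innermost level: no nested parentheses
    for _ in range(depth):
        # Wrap one more level around what has been built so far.
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return re.compile(pattern)


one_level = parens_to_depth(1)
two_levels = parens_to_depth(2)

print(one_level.fullmatch("(a(b)c)") is not None)
print(one_level.fullmatch("(a(b(c)))") is not None)
print(two_levels.fullmatch("(a(b(c)))") is not None)
```

Each extra level lengthens the pattern, and any subject nested one level deeper than the chosen depth still fails to match.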

    +

    +For some time, Perl has provided a facility that allows regular expressions to +recurse (amongst other things). It does this by interpolating Perl code in the +expression at run time, and the code can refer to the expression itself. A Perl +pattern using code interpolation to solve the parentheses problem can be +created like this: +

    +  $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x;
    +
    +The (?p{...}) item interpolates Perl code at run time, and in this case refers +recursively to the pattern in which it appears. +

    +

    +Obviously, PCRE2 cannot support the interpolation of Perl code. Instead, it +supports special syntax for recursion of the entire pattern, and also for +individual capture group recursion. After its introduction in PCRE1 and Python, +this kind of recursion was subsequently introduced into Perl at release 5.10. +

    +

    +A special item that consists of (? followed by a number greater than zero and a +closing parenthesis is a recursive subroutine call of the capture group of the +given number, provided that it occurs inside that group. (If not, it is a +non-recursive subroutine +call, which is described in the next section.) The special item (?R) or (?0) is +a recursive call of the entire regular expression. +

    +

    +This PCRE2 pattern solves the nested parentheses problem (assume the +PCRE2_EXTENDED option is set so that white space is ignored): +

    +  \( ( [^()]++ | (?R) )* \)
    +
    +First it matches an opening parenthesis. Then it matches any number of +substrings which can either be a sequence of non-parentheses, or a recursive +match of the pattern itself (that is, a correctly parenthesized substring). +Finally there is a closing parenthesis. Note the use of a possessive quantifier +to avoid backtracking into sequences of non-parentheses. +

    +

    +If this were part of a larger pattern, you would not want to recurse the entire +pattern, so instead you could use this: +

    +  ( \( ( [^()]++ | (?1) )* \) )
    +
    +We have put the pattern into parentheses, and caused the recursion to refer to +them instead of the whole pattern. +

    +

    +In a larger pattern, keeping track of parenthesis numbers can be tricky. This +is made easier by the use of relative references. Instead of (?1) in the +pattern above you can write (?-2) to refer to the second most recently opened +parentheses preceding the recursion. In other words, a negative number counts +capturing parentheses leftwards from the point at which it is encountered. +

    +

    +Be aware however, that if +duplicate capture group numbers +are in use, relative references refer to the earliest group with the +appropriate number. Consider, for example: +

    +  (?|(a)|(b)) (c) (?-2)
    +
    +The first two capture groups (a) and (b) are both numbered 1, and group (c) +is number 2. When the reference (?-2) is encountered, the second most recently +opened parentheses has the number 1, but it is the first such group (the (a) +group) to which the recursion refers. This would be the same if an absolute +reference (?1) was used. In other words, relative references are just a +shorthand for computing a group number. +

    +

    +It is also possible to refer to subsequent capture groups, by writing +references such as (?+2). However, these cannot be recursive because the +reference is not inside the parentheses that are referenced. They are always +non-recursive subroutine +calls, as described in the next section. +

    +

    +An alternative approach is to use named parentheses. The Perl syntax for this +is (?&name); PCRE1's earlier syntax (?P>name) is also supported. We could +rewrite the above example as follows: +

    +  (?<pn> \( ( [^()]++ | (?&pn) )* \) )
    +
    +If there is more than one group with the same name, the earliest one is +used. +

    +

    +The example pattern that we have been looking at contains nested unlimited +repeats, and so the use of a possessive quantifier for matching strings of +non-parentheses is important when applying the pattern to strings that do not +match. For example, when this pattern is applied to +

    +  (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
    +
    +it yields "no match" quickly. However, if a possessive quantifier is not used, +the match runs for a very long time indeed because there are so many different +ways the + and * repeats can carve up the subject, and all have to be tested +before failure can be reported. +

    +

    +At the end of a match, the values of capturing parentheses are those from +the outermost level. If you want to obtain intermediate values, a callout +function can be used (see below and the +pcre2callout +documentation). If the pattern above is matched against +

    +  (ab(cd)ef)
    +
    +the value for the inner capturing parentheses (numbered 2) is "ef", which is +the last value taken on at the top level. If a capture group is not matched at +the top level, its final captured value is unset, even if it was (temporarily) +set at a deeper level during the matching process. +

    +

    +Do not confuse the (?R) item with the condition (R), which tests for recursion. +Consider this pattern, which matches text in angle brackets, allowing for +arbitrary nesting. Only digits are allowed in nested brackets (that is, when +recursing), whereas any characters are permitted at the outer level. +

    +  < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
    +
    +In this pattern, (?(R) is the start of a conditional group, with two different +alternatives for the recursive and non-recursive cases. The (?R) item is the +actual recursive call. +

    +
    +Differences in recursion processing between PCRE2 and Perl +
    +

    +Some former differences between PCRE2 and Perl no longer exist. +

    +

    +Before release 10.30, recursion processing in PCRE2 differed from Perl in that +a recursive subroutine call was always treated as an atomic group. That is, +once it had matched some of the subject string, it was never re-entered, even +if it contained untried alternatives and there was a subsequent matching +failure. (Historical note: PCRE implemented recursion before Perl did.) +

    +

    +Starting with release 10.30, recursive subroutine calls are no longer treated +as atomic. That is, they can be re-entered to try unused alternatives if there +is a matching failure later in the pattern. This is now compatible with the way +Perl works. If you want a subroutine call to be atomic, you must explicitly +enclose it in an atomic group. +

    +

    +Supporting backtracking into recursions simplifies certain types of recursive +pattern. For example, this pattern matches palindromic strings: +

    +  ^((.)(?1)\2|.?)$
    +
    +The second branch in the group matches a single central character in the +palindrome when there are an odd number of characters, or nothing when there +are an even number of characters, but in order to work it has to be able to try +the second case when the rest of the pattern match fails. If you want to match +typical palindromic phrases, the pattern has to ignore all non-word characters, +which can be done like this: +
    +  ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$
    +
    +If run with the PCRE2_CASELESS option, this pattern matches phrases such as "A +man, a plan, a canal: Panama!". Note the use of the possessive quantifier *+ to +avoid backtracking into sequences of non-word characters. Without this, PCRE2 +takes a great deal longer (ten times or more) to match typical phrases, and +Perl takes so long that you think it has gone into a loop. +
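As a rough illustration of what the two palindrome patterns accept, the following Python sketch mirrors their branch structure with an ordinary recursive function. It is not how PCRE2 matches (the regex finds the end character by backtracking, whereas this sketch looks at both ends directly), and the function names are invented for the example. It assumes subjects without newlines and ASCII word characters.

```python
import re


def is_palindrome(s: str) -> bool:
    # Second branch of the group, .? : one central character or nothing.
    if len(s) <= 1:
        return True
    # First branch, (.)(?1)\2 : the first character, a recursive match of
    # the middle, then the same character again at the end.
    return s[0] == s[-1] and is_palindrome(s[1:-1])


def is_palindromic_phrase(s: str) -> bool:
    # Approximates the \W*+ version run with PCRE2_CASELESS: non-word
    # characters are dropped and letter case is ignored.
    letters = re.sub(r"\W", "", s, flags=re.ASCII).lower()
    return is_palindrome(letters)


print(is_palindromic_phrase("A man, a plan, a canal: Panama!"))
```

The sketch only describes which strings are accepted; it says nothing about the timing differences mentioned above.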

    +

    +Another way in which PCRE2 and Perl used to differ in their recursion +processing is in the handling of captured values. Formerly in Perl, when a +group was called recursively or as a subroutine (see the next section), it +had no access to any values that were captured outside the recursion, whereas +in PCRE2 these values can be referenced. Consider this pattern: +

    +  ^(.)(\1|a(?2))
    +
    +This pattern matches "bab". The first capturing parentheses match "b", then in +the second group, when the backreference \1 fails to match "b", the second +alternative matches "a" and then recurses. In the recursion, \1 does now match +"b" and so the whole match succeeds. This match used to fail in Perl, but in +later versions (I tried 5.024) it now works. +

    +
    GROUPS AS SUBROUTINES
    +

    +If the syntax for a recursive group call (either by number or by name) is used +outside the parentheses to which it refers, it operates a bit like a subroutine +in a programming language. More accurately, PCRE2 treats the referenced group +as an independent subpattern which it tries to match at the current matching +position. The called group may be defined before or after the reference. A +numbered reference can be absolute or relative, as in these examples: +

    +  (...(absolute)...)...(?2)...
    +  (...(relative)...)...(?-1)...
    +  (...(?+1)...(relative)...
    +
    +An earlier example pointed out that the pattern +
    +  (sens|respons)e and \1ibility
    +
    +matches "sense and sensibility" and "response and responsibility", but not +"sense and responsibility". If instead the pattern +
    +  (sens|respons)e and (?1)ibility
    +
    +is used, it does match "sense and responsibility" as well as the other two +strings. Another example is given in the discussion of DEFINE above. +
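The difference between the backreference and the subroutine call can be sketched in Python. Python's re module has no (?1) syntax, so the second pattern below is only a stand-in that repeats the text of the group by hand; the variable names are assumptions made for the example.

```python
import re

GROUP = r"sens|respons"

# Backreference: \1 must match the same text that group 1 captured.
backref = re.compile(rf"({GROUP})e and \1ibility")

# Stand-in for (sens|respons)e and (?1)ibility: the group's pattern is
# matched again from scratch, so either alternative may be used.
inlined = re.compile(rf"({GROUP})e and (?:{GROUP})ibility")

for text in (
    "sense and sensibility",
    "response and responsibility",
    "sense and responsibility",
):
    print(text, bool(backref.fullmatch(text)), bool(inlined.fullmatch(text)))
```

Only the last subject distinguishes the two: the backreference rejects it, while the re-matched group accepts it.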

    +

    +Like recursions, subroutine calls used to be treated as atomic, but this +changed at PCRE2 release 10.30, so backtracking into subroutine calls can now +occur. However, any capturing parentheses that are set during the subroutine +call revert to their previous values afterwards. +

    +

    +Processing options such as case-independence are fixed when a group is +defined, so if it is used as a subroutine, such options cannot be changed for +different calls. For example, consider this pattern: +

    +  (abc)(?i:(?-1))
    +
    +It matches "abcabc". It does not match "abcABC" because the change of +processing option does not affect the called group. +

    +

    +The behaviour of +backtracking control verbs +in groups when called as subroutines is described in the section entitled +"Backtracking verbs in subroutines" +below. +

    +
    ONIGURUMA SUBROUTINE SYNTAX
    +

    +For compatibility with Oniguruma, the non-Perl syntax \g followed by a name or +a number enclosed either in angle brackets or single quotes, is an alternative +syntax for calling a group as a subroutine, possibly recursively. Here are two +of the examples used above, rewritten using this syntax: +

    +  (?<pn> \( ( (?>[^()]+) | \g<pn> )* \) )
    +  (sens|respons)e and \g'1'ibility
    +
    +PCRE2 supports an extension to Oniguruma: if a number is preceded by a +plus or a minus sign it is taken as a relative reference. For example: +
    +  (abc)(?i:\g<-1>)
    +
    +Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not +synonymous. The former is a backreference; the latter is a subroutine call. +

    +
    CALLOUTS
    +

    +Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl +code to be obeyed in the middle of matching a regular expression. This makes it +possible, amongst other things, to extract different substrings that match the +same pair of parentheses when there is a repetition. +

    +

    +PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl +code. The feature is called "callout". The caller of PCRE2 provides an external +function by putting its entry point in a match context using the function +pcre2_set_callout(), and then passing that context to pcre2_match() +or pcre2_dfa_match(). If no match context is passed, or if the callout +entry point is set to NULL, callouts are disabled. +

    +

    +Within a regular expression, (?C<arg>) indicates a point at which the external +function is to be called. There are two kinds of callout: those with a +numerical argument and those with a string argument. (?C) on its own with no +argument is treated as (?C0). A numerical argument allows the application to +distinguish between different callouts. String arguments were added for release +10.20 to make it possible for script languages that use PCRE2 to embed short +scripts within patterns in a similar way to Perl. +

    +

    +During matching, when PCRE2 reaches a callout point, the external function is +called. It is provided with the number or string argument of the callout, the +position in the pattern, and one item of data that is also set in the match +block. The callout function may cause matching to proceed, to backtrack, or to +fail. +

    +

    +By default, PCRE2 implements a number of optimizations at matching time, and +one side-effect is that sometimes callouts are skipped. If you need all +possible callouts to happen, you need to set options that disable the relevant +optimizations. More details, including a complete description of the +programming interface to the callout function, are given in the +pcre2callout +documentation. +

    +
    +Callouts with numerical arguments +
    +

    +If you just want to have a means of identifying different callout points, put a +number less than 256 after the letter C. For example, this pattern has two +callout points: +

    +  (?C1)abc(?C2)def
    +
    +If the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(), numerical +callouts are automatically installed before each item in the pattern. They are +all numbered 255. If there is a conditional group in the pattern whose +condition is an assertion, an additional callout is inserted just before the +condition. An explicit callout may also be set at this position, as in this +example: +
    +  (?(?C9)(?=a)abc|def)
    +
    +Note that this applies only to assertion conditions, not to other types of +condition. +

    +
    +Callouts with string arguments +
    +

    +A delimited string may be used instead of a number as a callout argument. The +starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is +the same as the start, except for {, where the ending delimiter is }. If the +ending delimiter is needed within the string, it must be doubled. For +example: +

    +  (?C'ab ''c'' d')xyz(?C{any text})pqr
    +
    +The doubling is removed before the string is passed to the callout function. +
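The delimiter rules above can be sketched as a small Python parser. This is an illustrative model of the rules as stated here, not PCRE2's own code; the function name and its return convention are hypothetical.

```python
OPENERS = "`'\"^%#${"


def parse_callout_string(text: str, start: int = 0):
    """Return (decoded string, index just past the closing delimiter)."""
    opener = text[start]
    if opener not in OPENERS:
        raise ValueError(f"invalid callout string delimiter: {opener!r}")
    # The closing delimiter repeats the opener, except that { closes with }.
    closer = "}" if opener == "{" else opener
    out = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == closer:
            if i + 1 < len(text) and text[i + 1] == closer:
                out.append(closer)  # doubled delimiter: keep one copy
                i += 2
                continue
            return "".join(out), i + 1
        out.append(ch)
        i += 1
    raise ValueError("missing closing delimiter")


print(parse_callout_string("'ab ''c'' d'"))
print(parse_callout_string("{any text}"))
```

Applied to the two callout arguments in the example pattern, the sketch yields the strings that would be handed to the callout function after the doubling is removed.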

    +
    BACKTRACKING CONTROL
    +

    +There are a number of special "Backtracking Control Verbs" (to use Perl's +terminology) that modify the behaviour of backtracking during matching. They +are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form, +and may behave differently depending on whether or not a name argument is +present. The names are not required to be unique within the pattern. +

    +

    +By default, for compatibility with Perl, a name is any sequence of characters +that does not include a closing parenthesis. The name is not processed in +any way, and it is not possible to include a closing parenthesis in the name. +This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result +is no longer Perl-compatible. +

    +

    +When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names +and only an unescaped closing parenthesis terminates the name. However, the +only backslash items that are permitted are \Q, \E, and sequences such as +\x{100} that define character code points. Character type escapes such as \d +are faulted. +

    +

    +A closing parenthesis can be included in a name either as \) or between \Q +and \E. In addition to backslash processing, if the PCRE2_EXTENDED or +PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb names is +skipped, and #-comments are recognized, exactly as in the rest of the pattern. +PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do not affect verb names unless +PCRE2_ALT_VERBNAMES is also set. +

    +

    +The maximum length of a name is 255 in the 8-bit library and 65535 in the +16-bit and 32-bit libraries. If the name is empty, that is, if the closing +parenthesis immediately follows the colon, the effect is as if the colon were +not there. Any number of these verbs may occur in a pattern. Except for +(*ACCEPT), they may not be quantified. +

    +

    +Since these verbs are specifically related to backtracking, most of them can be +used only when the pattern is to be matched using the traditional matching +function, because that uses a backtracking algorithm. With the exception of +(*FAIL), which behaves like a failing negative assertion, the backtracking +control verbs cause an error if encountered by the DFA matching function. +

    +

    +The behaviour of these verbs in +repeated groups, +assertions, +and in +capture groups called as subroutines +(whether or not recursively) is documented below. +

    +
    +Optimizations that affect backtracking verbs +
    +

    +PCRE2 contains some optimizations that are used to speed up matching by running +some checks at the start of each match attempt. For example, it may know the +minimum length of matching subject, or that a particular character must be +present. When one of these optimizations bypasses the running of a match, any +included backtracking verbs will not, of course, be processed. You can suppress +the start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option +when calling pcre2_compile(), or by starting the pattern with +(*NO_START_OPT). There is more discussion of this option in the section +entitled +"Compiling a pattern" +in the +pcre2api +documentation. +

    +

    +Experiments with Perl suggest that it too has similar optimizations, and like +PCRE2, turning them off can change the result of a match. +

    +
    +Verbs that act immediately +
    +

    +The following verbs act as soon as they are encountered. +

    +   (*ACCEPT) or (*ACCEPT:NAME)
    +
    +This verb causes the match to end successfully, skipping the remainder of the +pattern. However, when it is inside a capture group that is called as a +subroutine, only that group is ended successfully. Matching then continues +at the outer level. If (*ACCEPT) is triggered in a positive assertion, the +assertion succeeds; in a negative assertion, the assertion fails. +

    +

    +If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For +example: +

    +  A((?:A|B(*ACCEPT)|C)D)
    +
    +This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by +the outer parentheses. +

    +

    +(*ACCEPT) is the only backtracking verb that is allowed to be quantified +because an ungreedy quantification with a minimum of zero acts only when a +backtrack happens. Consider, for example, +

    +  (A(*ACCEPT)??B)C
    +
    +where A, B, and C may be complex expressions. After matching "A", the matcher +processes "BC"; if that fails, causing a backtrack, (*ACCEPT) is triggered and +the match succeeds. In both cases, all but C is captured. Whereas (*COMMIT) +(see below) means "fail on backtrack", a repeated (*ACCEPT) of this type means +"succeed on backtrack". +

    +

    +Warning: (*ACCEPT) should not be used within a script run group, because +it causes an immediate exit from the group, bypassing the script run checking. +

    +  (*FAIL) or (*FAIL:NAME)
    +
    +This verb causes a matching failure, forcing backtracking to occur. It may be +abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl +documentation notes that it is probably useful only when combined with (?{}) or +(??{}). Those are, of course, Perl features that are not present in PCRE2. The +nearest equivalent is the callout feature, as for example in this pattern: +
    +  a+(?C)(*FAIL)
    +
    +A match with the string "aaaa" always fails, but the callout is taken before +each backtrack happens (in this example, 10 times). +
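The figure of 10 callouts can be reproduced with a short Python model. It assumes that no optimization skips a callout and that the pattern is unanchored; the function name is invented for the example.

```python
def count_callouts(subject: str) -> int:
    """Count how often (?C) is reached for the pattern a+(?C)(*FAIL)."""
    calls = 0
    for start in range(len(subject)):
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # a+ first takes the whole run, then gives back one "a" on each
        # backtrack; every length from run down to 1 reaches (?C) once
        # before (*FAIL) forces the next backtrack.
        calls += run
    return calls


print(count_callouts("aaaa"))
```

For "aaaa" the starting positions contribute 4, 3, 2 and 1 callouts respectively, which sums to 10.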

    +

    +(*ACCEPT:NAME) and (*FAIL:NAME) behave the same as (*MARK:NAME)(*ACCEPT) and +(*MARK:NAME)(*FAIL), respectively, that is, a (*MARK) is recorded just before +the verb acts. +

    +
    +Recording which path was taken +
    +

    +There is one verb whose main purpose is to track how a match was arrived at, +though it also has a secondary use in conjunction with advancing the match +starting point (see (*SKIP) below). +

    +  (*MARK:NAME) or (*:NAME)
    +
    +A name is always required with this verb. For all the other backtracking +control verbs, a NAME argument is optional. +

    +

    +When a match succeeds, the last-encountered mark name on the +matching path is passed back to the caller as described in the section entitled +"Other information about the match" +in the +pcre2api +documentation. This applies to all instances of (*MARK) and other verbs, +including those inside assertions and atomic groups. However, there are +differences in those cases when (*MARK) is used in conjunction with (*SKIP) as +described below. +

    +

    +The mark name that was last encountered on the matching path is passed back. A +verb without a NAME argument is ignored for this purpose. Here is an example of +pcre2test output, where the "mark" modifier requests the retrieval and +outputting of (*MARK) data: +

    +    re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
    +  data> XY
    +   0: XY
    +  MK: A
    +  data> XZ
    +   0: XZ
    +  MK: B
    +
    +The (*MARK) name is tagged with "MK:" in this output, and in this example it +indicates which of the two alternatives matched. This is a more efficient way +of obtaining this information than putting each alternative in its own +capturing parentheses. +

    +

    +If a verb with a name is encountered in a positive assertion that is true, the +name is recorded and passed back if it is the last-encountered. This does not +happen for negative assertions or failing positive assertions. +

    +

    +After a partial match or a failed match, the last encountered name in the +entire match process is returned. For example: +

    +    re> /X(*MARK:A)Y|X(*MARK:B)Z/mark
    +  data> XP
    +  No match, mark = B
    +
    +Note that in this unanchored example the mark is retained from the match +attempt that started at the letter "X" in the subject. Subsequent match +attempts starting at "P" and then with an empty string do not get as far as the +(*MARK) item, but nevertheless do not reset it. +

    +

    +If you are interested in (*MARK) values after failed matches, you should +probably set the PCRE2_NO_START_OPTIMIZE option +(see above) +to ensure that the match is always attempted. +

    +
    +Verbs that act after backtracking +
    +

    +The following verbs do nothing when they are encountered. Matching continues +with what follows, but if there is a subsequent match failure, causing a +backtrack to the verb, a failure is forced. That is, backtracking cannot pass +to the left of the verb. However, when one of these verbs appears inside an +atomic group or in a lookaround assertion that is true, its effect is confined +to that group, because once the group has been matched, there is never any +backtracking into it. Backtracking from beyond an assertion or an atomic group +ignores the entire group, and seeks a preceding backtracking point. +

    +

    +These verbs differ in exactly what kind of failure occurs when backtracking +reaches them. The behaviour described below is what happens when the verb is +not in a subroutine or an assertion. Subsequent sections cover these special +cases. +

    +  (*COMMIT) or (*COMMIT:NAME)
    +
    +This verb causes the whole match to fail outright if there is a later matching +failure that causes backtracking to reach it. Even if the pattern is +unanchored, no further attempts to find a match by advancing the starting point +take place. If (*COMMIT) is the only backtracking verb that is encountered, +once it has been passed pcre2_match() is committed to finding a match at +the current starting point, or not at all. For example: +
    +  a+(*COMMIT)b
    +
    +This matches "xxaab" but not "aacaab". It can be thought of as a kind of +dynamic anchor, or "I've started, so I must finish." +

    +

    +The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is +like (*MARK:NAME) in that the name is remembered for passing back to the +caller. However, (*SKIP:NAME) searches only for names that are set with +(*MARK), ignoring those set by any of the other backtracking verbs. +

    +

    +If there is more than one backtracking verb in a pattern, a different one that +follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a +match does not always guarantee that a match must be at this starting point. +

    +

    +Note that (*COMMIT) at the start of a pattern is not the same as an anchor, +unless PCRE2's start-of-match optimizations are turned off, as shown in this +output from pcre2test: +

    +    re> /(*COMMIT)abc/
    +  data> xyzabc
    +   0: abc
    +  data>
    +    re> /(*COMMIT)abc/no_start_optimize
    +  data> xyzabc
    +  No match
    +
    +For the first pattern, PCRE2 knows that any match must start with "a", so the +optimization skips along the subject to "a" before applying the pattern to the +first set of data. The match attempt then succeeds. The second pattern disables +the optimization that skips along to the first character. The pattern is now +applied starting at "x", and so the (*COMMIT) causes the match to fail without +trying any other starting points. +
    +  (*PRUNE) or (*PRUNE:NAME)
    +
    +This verb causes the match to fail at the current starting position in the +subject if there is a later matching failure that causes backtracking to reach +it. If the pattern is unanchored, the normal "bumpalong" advance to the next +starting character then happens. Backtracking can occur as usual to the left of +(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but +if there is no match to the right, backtracking cannot cross (*PRUNE). In +simple cases, the use of (*PRUNE) is just an alternative to an atomic group or +possessive quantifier, but there are some uses of (*PRUNE) that cannot be +expressed in any other way. In an anchored pattern (*PRUNE) has the same effect +as (*COMMIT). +

    +

    +The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is +like (*MARK:NAME) in that the name is remembered for passing back to the +caller. However, (*SKIP:NAME) searches only for names set with (*MARK), +ignoring those set by other backtracking verbs. +

    +  (*SKIP)
    +
    +This verb, when given without a name, is like (*PRUNE), except that if the +pattern is unanchored, the "bumpalong" advance is not to the next character, +but to the position in the subject where (*SKIP) was encountered. (*SKIP) +signifies that whatever text was matched leading up to it cannot be part of a +successful match if there is a later mismatch. Consider: +
    +  a+(*SKIP)b
    +
    +If the subject is "aaaac...", after the first match attempt fails (starting at +the first character in the string), the starting point skips on to start the +next attempt at "c". Note that a possessive quantifier does not have the same +effect as this example; although it would suppress backtracking during the +first match attempt, the second attempt would start at the second character +instead of skipping on to "c". +

    +

    +If (*SKIP) is used to specify a new starting position that is the same as the +starting position of the current match, or (by being inside a lookbehind) +earlier, the position specified by (*SKIP) is ignored, and instead the normal +"bumpalong" occurs. +

    +  (*SKIP:NAME)
    +
    +When (*SKIP) has an associated name, its behaviour is modified. When such a +(*SKIP) is triggered, the previous path through the pattern is searched for the +most recent (*MARK) that has the same name. If one is found, the "bumpalong" +advance is to the subject position that corresponds to that (*MARK) instead of +to where (*SKIP) was encountered. If no (*MARK) with a matching name is found, +the (*SKIP) is ignored. +

    +

    +The search for a (*MARK) name uses the normal backtracking mechanism, which +means that it does not see (*MARK) settings that are inside atomic groups or +assertions, because they are never re-entered by backtracking. Compare the +following pcre2test examples: +

    +    re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/
    +  data> abc
    +   0: a
    +   1: a
    +  data>
    +    re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/
    +  data> abc
    +   0: b
    +   1: b
    +
    +In the first example, the (*MARK) setting is in an atomic group, so it is not +seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows +the second branch of the pattern to be tried at the first character position. +In the second example, the (*MARK) setting is not in an atomic group. This +allows (*SKIP:X) to find the (*MARK) when it backtracks, and this causes a new +matching attempt to start at the second character. This time, the (*MARK) is +never seen because "a" does not match "b", so the matcher immediately jumps to +the second branch of the pattern. +

    +

    +Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores +names that are set by other backtracking verbs. +

    +  (*THEN) or (*THEN:NAME)
    +
    +This verb causes a skip to the next innermost alternative when backtracking +reaches it. That is, it cancels any further backtracking within the current +alternative. Its name comes from the observation that it can be used for a +pattern-based if-then-else block: +
    +  ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ...
    +
    +If the COND1 pattern matches, FOO is tried (and possibly further items after +the end of the group if FOO succeeds); on failure, the matcher skips to the +second alternative and tries COND2, without backtracking into COND1. If that +succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no +more alternatives, so there is a backtrack to whatever came before the entire +group. If (*THEN) is not inside an alternation, it acts like (*PRUNE). +

    +

    +The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is +like (*MARK:NAME) in that the name is remembered for passing back to the +caller. However, (*SKIP:NAME) searches only for names set with (*MARK), +ignoring those set by other backtracking verbs. +

    +

    +A group that does not contain a | character is just a part of the enclosing +alternative; it is not a nested alternation with only one alternative. The +effect of (*THEN) extends beyond such a group to the enclosing alternative. +Consider this pattern, where A, B, etc. are complex pattern fragments that do +not contain any | characters at this level: +

    +  A (B(*THEN)C) | D
    +
    +If A and B are matched, but there is a failure in C, matching does not +backtrack into A; instead it moves to the next alternative, that is, D. +However, if the group containing (*THEN) is given an alternative, it +behaves differently: +
    +  A (B(*THEN)C | (*FAIL)) | D
    +
    +The effect of (*THEN) is now confined to the inner group. After a failure in C, +matching moves to (*FAIL), which causes the whole group to fail because there +are no more alternatives to try. In this case, matching does backtrack into A. +

    +

    +Note that a conditional group is not considered as having two alternatives, +because only one is ever used. In other words, the | character in a conditional +group has a different meaning. Ignoring white space, consider: +

    +  ^.*? (?(?=a) a | b(*THEN)c )
    +
    +If the subject is "ba", this pattern does not match. Because .*? is ungreedy, +it initially matches zero characters. The condition (?=a) then fails, the +character "b" is matched, but "c" is not. At this point, matching does not +backtrack to .*? as might perhaps be expected from the presence of the | +character. The conditional group is part of the single alternative that +comprises the whole pattern, and so the match fails. (If there was a backtrack +into .*?, allowing it to match "b", the match would succeed.) +

    +

    +The verbs just described provide four different "strengths" of control when +subsequent matching fails. (*THEN) is the weakest, carrying on the match at the +next alternative. (*PRUNE) comes next, failing the match at the current +starting position, but allowing an advance to the next character (for an +unanchored pattern). (*SKIP) is similar, except that the advance may be more +than one character. (*COMMIT) is the strongest, causing the entire match to +fail. +
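The difference in strength between (*PRUNE), (*SKIP) and (*COMMIT) can be seen in a small Python model of unanchored matching for the pattern a+(*VERB)b used in the examples above. This is a hand-written simulation, not PCRE2 itself: it assumes start-of-match optimizations are off, and the function name and return shape are assumptions made for the sketch.

```python
def simulate(subject: str, verb: str):
    """Model a+(*VERB)b; return (match span or None, start positions tried)."""
    starts = []
    start = 0
    while start <= len(subject):
        starts.append(start)
        end = start
        while end < len(subject) and subject[end] == "a":
            end += 1
        if end == start:
            # a+ failed before the verb was reached: normal bumpalong.
            start += 1
            continue
        if subject[end:end + 1] == "b":
            return (start, end + 1), starts
        # "b" failed, so backtracking reaches the verb straight away.
        if verb == "COMMIT":
            return None, starts      # no further starting points are tried
        elif verb == "PRUNE":
            start += 1               # advance to the next character
        elif verb == "SKIP":
            start = end              # advance to where (*SKIP) was passed
        else:
            raise ValueError(f"unsupported verb: {verb}")
    return None, starts


for verb in ("PRUNE", "SKIP", "COMMIT"):
    print(verb, simulate("aaaac", verb))
```

For the subject "aaaac" the model tries every starting position with PRUNE, jumps from the first position straight to "c" with SKIP, and gives up after the first position with COMMIT.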

    +
    +More than one backtracking verb +
    +

    +If more than one backtracking verb is present in a pattern, the one that is +backtracked onto first acts. For example, consider this pattern, where A, B, +etc. are complex pattern fragments: +

    +  (A(*COMMIT)B(*THEN)C|ABD)
    +
    +If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to +fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes +the next alternative (ABD) to be tried. This behaviour is consistent, but is +not always the same as Perl's. It means that if two or more backtracking verbs +appear in succession, all but the last of them have no effect. Consider this +example: +
    +  ...(*COMMIT)(*PRUNE)...
    +
    +If there is a matching failure to the right, backtracking onto (*PRUNE) causes +it to be triggered, and its action is taken. There can never be a backtrack +onto (*COMMIT). +

    +
    +Backtracking verbs in repeated groups +
    +

    +PCRE2 sometimes differs from Perl in its handling of backtracking verbs in +repeated groups. For example, consider: +

    +  /(a(*COMMIT)b)+ac/
    +
    +If the subject is "abac", Perl matches unless its optimizations are disabled, +but PCRE2 always fails because the (*COMMIT) in the second repeat of the group +acts. +

    +
    +Backtracking verbs in assertions +
    +

    +(*FAIL) in any assertion has its normal effect: it forces an immediate +backtrack. The behaviour of the other backtracking verbs depends on whether +the assertion is standalone or acting as the condition in a conditional +group. +

    +

    +(*ACCEPT) in a standalone positive assertion causes the assertion to succeed +without any further processing; captured strings and a mark name (if set) are +retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to +fail without any further processing; captured substrings and any mark name are +discarded. +

    +

    +If the assertion is a condition, (*ACCEPT) causes the condition to be true for +a positive assertion and false for a negative one; captured substrings are +retained in both cases. +

    +

    +The remaining verbs act only when a later failure causes a backtrack to +reach them. This means that, for the Perl-compatible assertions, their effect +is confined to the assertion, because Perl lookaround assertions are atomic. A +backtrack that occurs after such an assertion is complete does not jump back +into the assertion. Note in particular that a (*MARK) name that is set in an +assertion is not "seen" by an instance of (*SKIP:NAME) later in the pattern. +

    +

    +PCRE2 now supports non-atomic positive assertions, as described in the section +entitled +"Non-atomic assertions" +above. These assertions must be standalone (not used as conditions). They are +not Perl-compatible. For these assertions, a later backtrack does jump back +into the assertion, and therefore verbs such as (*COMMIT) can be triggered by +backtracks from later in the pattern. +

    +

    +The effect of (*THEN) is not allowed to escape beyond an assertion. If there +are no more branches to try, (*THEN) causes a positive assertion to be false, +and a negative assertion to be true. +

    +

    +The other backtracking verbs are not treated specially if they appear in a +standalone positive assertion. In a conditional positive assertion, +backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE) +causes the condition to be false. However, for both standalone and conditional +negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes +the assertion to be true, without considering any further alternative branches. +

    +
    +Backtracking verbs in subroutines +
    +

    +These behaviours occur whether or not the group is called recursively. +

    +

    +(*ACCEPT) in a group called as a subroutine causes the subroutine match to +succeed without any further processing. Matching then continues after the +subroutine call. Perl documents this behaviour. Perl's treatment of the other +verbs in subroutines is different in some cases. +

    +

    +(*FAIL) in a group called as a subroutine has its normal effect: it forces +an immediate backtrack. +

    +

    +(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when +triggered by being backtracked to in a group called as a subroutine. There is +then a backtrack at the outer level. +

    +

    +(*THEN), when triggered, skips to the next alternative in the innermost +enclosing group that has alternatives (its normal behaviour). However, if there +is no such group within the subroutine's group, the subroutine match fails and +there is a backtrack at the outer level. +

    +
    SEE ALSO
    +

    +pcre2api(3), pcre2callout(3), pcre2matching(3), +pcre2syntax(3), pcre2(3). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 06 October 2020 +
    +Copyright © 1997-2020 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2perform.html b/src/pcre2/doc/html/pcre2perform.html new file mode 100644 index 00000000..80d716c7 --- /dev/null +++ b/src/pcre2/doc/html/pcre2perform.html @@ -0,0 +1,261 @@ + + +pcre2perform specification + + +

    pcre2perform man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    PCRE2 PERFORMANCE
    +

    +Two aspects of performance are discussed below: memory usage and processing +time. The way you express your pattern as a regular expression can affect both +of them. +

    +
    COMPILED PATTERN MEMORY USAGE
    +

    +Patterns are compiled by PCRE2 into a reasonably efficient interpretive code, +so that most simple patterns do not use much memory for storing the compiled +version. However, there is one case where the memory usage of a compiled +pattern can be unexpectedly large. If a parenthesized group has a quantifier +with a minimum greater than 1 and/or a limited maximum, the whole group is +repeated in the compiled code. For example, the pattern +

    +  (abc|def){2,4}
    +
    +is compiled as if it were +
    +  (abc|def)(abc|def)((abc|def)(abc|def)?)?
    +
    +(Technical aside: It is done this way so that backtrack points within each of +the repetitions can be independently maintained.) +
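The expansion described above can be sketched as plain string manipulation. This is a textual illustration only: PCRE2 performs the replication on its compiled code, not on the pattern text, and the helper name expand_quantified_group() is hypothetical. It assumes the group is already parenthesized and that 0 <= min <= max.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Builds the "as if" expansion of group{min,max}: min mandatory copies
   followed by (max - min) nested optional copies. Returns a malloc'd
   string that the caller must free, or NULL if allocation fails. */
char *expand_quantified_group(const char *group, int min, int max)
{
    assert(group != NULL && min >= 0 && max >= min);

    size_t glen = strlen(group);
    int optional = max - min;
    /* max copies of the group, at most 3 extra characters per optional
       level, plus slack for the terminating zero. */
    size_t size = (size_t)max * glen + 3 * (size_t)optional + 2;
    char *out = malloc(size);
    if (out == NULL) return NULL;
    out[0] = '\0';

    for (int i = 0; i < min; i++) strcat(out, group);      /* mandatory copies */
    for (int i = 1; i < optional; i++) {                   /* open nested optionals */
        strcat(out, "(");
        strcat(out, group);
    }
    if (optional > 0) {                                    /* innermost optional copy */
        strcat(out, group);
        strcat(out, "?");
    }
    for (int i = 1; i < optional; i++) strcat(out, ")?");  /* close the nesting */
    return out;
}
```

Calling expand_quantified_group("(abc|def)", 2, 4) reproduces the expanded form shown above; the output length grows linearly with the maximum, which is why large or nested counts inflate the compiled pattern.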

    +

    +For regular expressions whose quantifiers use only small numbers, this is not +usually a problem. However, if the numbers are large, and particularly if such +repetitions are nested, the memory usage can become an embarrassment. For +example, the very simple pattern +

    +  ((ab){1,1000}c){1,3}
    +
    +uses over 50KiB when compiled using the 8-bit library. When PCRE2 is +compiled with its default internal pointer size of two bytes, the size limit on +a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and +this is reached with the above pattern if the outer repetition is increased +from 3 to 4. PCRE2 can be compiled to use larger internal pointers and thus +handle larger compiled patterns, but it is better to try to rewrite your +pattern to use less memory if you can. +

    +

    +One way of reducing the memory usage for such patterns is to make use of +PCRE2's +"subroutine" +facility. Re-writing the above pattern as +

    +  ((ab)(?2){0,999}c)(?1){0,2}
    +
    +reduces the memory requirements to around 16KiB, and indeed it remains under +20KiB even with the outer repetition increased to 100. However, this kind of +pattern is not always exactly equivalent, because any captures within +subroutine calls are lost when the subroutine completes. If this is not a +problem, this kind of rewriting will allow you to process patterns that PCRE2 +cannot otherwise handle. The matching performance of the two different versions +of the pattern is roughly the same. (This applies from release 10.30 - things +were different in earlier releases.) +

    +
    STACK AND HEAP USAGE AT RUN TIME
    +

    +From release 10.30, the interpretive (non-JIT) version of pcre2_match() +uses very little system stack at run time. In earlier releases recursive +function calls could use a great deal of stack, and this could cause problems, +but this usage has been eliminated. Backtracking positions are now explicitly +remembered in memory frames controlled by the code. An initial 20KiB vector of +frames is allocated on the system stack (enough for about 100 frames for small +patterns), but if this is insufficient, heap memory is used. The amount of heap +memory can be limited; if the limit is set to zero, only the initial stack +vector is used. Rewriting patterns to be time-efficient, as described below, +may also reduce the memory requirements. +

    +

    +In contrast to pcre2_match(), pcre2_dfa_match() does use recursive +function calls, but only for processing atomic groups, lookaround assertions, +and recursion within the pattern. The original version of the code used to +allocate quite large internal workspace vectors on the stack, which caused some +problems for some patterns in environments with small stacks. From release +10.32 the code for pcre2_dfa_match() has been re-factored to use heap +memory when necessary for internal workspace when recursing, though recursive +function calls are still used. +

    +

    +The "match depth" parameter can be used to limit the depth of function +recursion, and the "match heap" parameter to limit heap memory in +pcre2_dfa_match(). +

    +
    PROCESSING TIME
    +

    +Certain items in regular expression patterns are processed more efficiently +than others. It is more efficient to use a character class like [aeiou] than a +set of single-character alternatives such as (a|e|i|o|u). In general, the +simplest construction that provides the required behaviour is usually the most +efficient. Jeffrey Friedl's book contains a lot of useful general discussion +about optimizing regular expressions for efficient performance. This document +contains a few observations about PCRE2. +

    +

    +Using Unicode character properties (the \p, \P, and \X escapes) is slow, +because PCRE2 has to use a multi-stage table lookup whenever it needs a +character's property. If you can find an alternative pattern that does not use +character properties, it will probably be faster. +

    +

    +By default, the escape sequences \b, \d, \s, and \w, and the POSIX +character classes such as [:alpha:] do not use Unicode properties, partly for +backwards compatibility, and partly for performance reasons. However, you can +set the PCRE2_UCP option or start the pattern with (*UCP) if you want Unicode +character properties to be used. This can double the matching time for items +such as \d, when matched with pcre2_match(); the performance loss is +less with a DFA matching function, and in both cases there is not much +difference for \b. +

    +

    +When a pattern begins with .* not in atomic parentheses, nor in parentheses +that are the subject of a backreference, and the PCRE2_DOTALL option is set, +the pattern is implicitly anchored by PCRE2, since it can match only at the +start of a subject string. If the pattern has multiple top-level branches, they +must all be anchorable. The optimization can be disabled by the +PCRE2_NO_DOTSTAR_ANCHOR option, and is automatically disabled if the pattern +contains (*PRUNE) or (*SKIP). +

    +

    +If PCRE2_DOTALL is not set, PCRE2 cannot make this optimization, because the +dot metacharacter does not then match a newline, and if the subject string +contains newlines, the pattern may match from the character immediately +following one of them instead of from the very start. For example, the pattern +

    +  .*second
    +
    +matches the subject "first\nand second" (where \n stands for a newline +character), with the match starting at the seventh character. In order to do +this, PCRE2 has to retry the match starting after every newline in the subject. +

    +

    +If you are using such a pattern with subject strings that do not contain +newlines, the best performance is obtained by setting PCRE2_DOTALL, or starting +the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE2 +from having to scan along the subject looking for a newline to restart at. +

    +

    +Beware of patterns that contain nested indefinite repeats. These can take a +long time to run when applied to a string that does not match. Consider the +pattern fragment +

    +  ^(a+)*
    +
    +This can match "aaaa" in 16 different ways, and this number increases very +rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4 +times, and for each of those cases other than 0 or 4, the + repeats can match +different numbers of times.) When the remainder of the pattern is such that the +entire match is going to fail, PCRE2 has in principle to try every possible +variation, and this can take an extremely long time, even for relatively short +strings. +
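The count of 16 can be checked with a little arithmetic. The sketch below is not PCRE2 code; it only counts the possibilities, and the function names are hypothetical. A run of exactly k characters can be split into consecutive non-empty a+ iterations in 2^(k-1) ways (one way for k = 0, meaning zero iterations), and the group may stop after any prefix of the subject.

```c
#include <assert.h>

/* Number of ways to split exactly k "a" characters into consecutive
   non-empty pieces, each piece being one iteration of a+. */
static unsigned long long splits_of_exactly(int k)
{
    assert(k >= 0 && k < 64);
    return (k == 0) ? 1ULL : 1ULL << (k - 1);   /* 2^(k-1) for k >= 1 */
}

/* Total number of distinct ways ^(a+)* can match at the start of a run of
   n "a" characters, summed over every prefix length 0..n. */
unsigned long long ways_to_match(int n)
{
    unsigned long long total = 0;
    for (int k = 0; k <= n; k++) total += splits_of_exactly(k);
    return total;
}
```

For n = 4 the sum is 1 + 1 + 2 + 4 + 8 = 16, and in general it is 2^n, so each extra character doubles the number of variations that may have to be tried before a failure is reported.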

    +

    +An optimization catches some of the simpler cases such as +

    +  (a+)*b
    +
    +where a literal character follows. Before embarking on the standard matching +procedure, PCRE2 checks that there is a "b" later in the subject string, and if +there is not, it fails the match immediately. However, when there is no +following literal this optimization cannot be used. You can see the difference +by comparing the behaviour of +
    +  (a+)*\d
    +
    +with the pattern above. The pattern above, with its literal "b", gives a failure almost instantly when +applied to a whole line of "a" characters, whereas the \d version takes an +appreciable time with strings longer than about 20 characters. +

    +

    +In many cases, the solution to this kind of performance issue is to use an +atomic group or a possessive quantifier. This can often reduce memory +requirements as well. As another example, consider this pattern: +

    +  ([^<]|<(?!inet))+
    +
    +It matches from wherever it starts until it encounters "<inet" or the end of +the data, and is the kind of pattern that might be used when processing an XML +file. Each iteration of the outer parentheses matches either one character that +is not "<" or a "<" that is not followed by "inet". However, each time a +parenthesis is processed, a backtracking position is passed, so this +formulation uses a memory frame for each matched character. For a long string, +a lot of memory is required. Consider now this rewritten pattern, which matches +exactly the same strings: +
    +  ([^<]++|<(?!inet))+
    +
    +This runs much faster, because sequences of characters that do not contain "<" +are "swallowed" in one item inside the parentheses, and a possessive quantifier +is used to stop any backtracking into the runs of non-"<" characters. This +version also uses a lot less memory because entry to a new set of parentheses +happens only when a "<" character that is not followed by "inet" is encountered +(and we assume this is relatively rare). +

    +

    +This example shows that one way of optimizing performance when matching long +subject strings is to write repeated parenthesized subpatterns to match more +than one character whenever possible. +

    +
    +SETTING RESOURCE LIMITS +
    +

    +You can set limits on the amount of processing that takes place when matching, +and on the amount of heap memory that is used. The default values of the limits +are very large, and unlikely ever to operate. They can be changed when PCRE2 is +built, and they can also be set when pcre2_match() or +pcre2_dfa_match() is called. For details of these interfaces, see the +pcre2build +documentation and the section entitled +"The match context" +in the +pcre2api +documentation. +

    +

    +The pcre2test test program has a modifier called "find_limits" which, if +applied to a subject line, causes it to find the smallest limits that allow a +pattern to match. This is done by repeatedly matching with different limits. +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 03 February 2019 +
    +Copyright © 1997-2019 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2posix.html b/src/pcre2/doc/html/pcre2posix.html new file mode 100644 index 00000000..0ad6f9e8 --- /dev/null +++ b/src/pcre2/doc/html/pcre2posix.html @@ -0,0 +1,356 @@ + + +pcre2posix specification + + +

    pcre2posix man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    SYNOPSIS
    +

    +#include <pcre2posix.h> +

    +

    +int pcre2_regcomp(regex_t *preg, const char *pattern, + int cflags); +
    +
    +int pcre2_regexec(const regex_t *preg, const char *string, + size_t nmatch, regmatch_t pmatch[], int eflags); +
    +
    +size_t pcre2_regerror(int errcode, const regex_t *preg, + char *errbuf, size_t errbuf_size); +
    +
    +void pcre2_regfree(regex_t *preg); +

    +
    DESCRIPTION
    +

    +This set of functions provides a POSIX-style API for the PCRE2 regular +expression 8-bit library. There are no POSIX-style wrappers for PCRE2's 16-bit +and 32-bit libraries. See the +pcre2api +documentation for a description of PCRE2's native API, which contains much +additional functionality. +

    +

    +The functions described here are wrapper functions that ultimately call the +PCRE2 native API. Their prototypes are defined in the pcre2posix.h header +file, and they all have unique names starting with pcre2_. However, the +pcre2posix.h header also contains macro definitions that convert the +standard POSIX names such as regcomp() into pcre2_regcomp() etc. This +means that a program can use the usual POSIX names without running the risk of +accidentally linking with POSIX functions from a different library. +

    +

    +On Unix-like systems the PCRE2 POSIX library is called libpcre2-posix, so +can be accessed by adding -lpcre2-posix to the command for linking an +application. Because the POSIX functions call the native ones, it is also +necessary to add -lpcre2-8. +

    +

    +Although they were not defined as prototypes in pcre2posix.h, releases +10.33 to 10.36 of the library contained functions with the POSIX names +regcomp() etc. These simply passed their arguments to the PCRE2 +functions. These functions were provided for backwards compatibility with +earlier versions of PCRE2, which had only POSIX names. However, this has proved +troublesome in situations where a program links with several libraries, some of +which use PCRE2's POSIX interface while others use the real POSIX functions. +For this reason, the POSIX names have been removed since release 10.37. +

    +

    +Naming the header file pcre2posix.h avoids any conflict with other POSIX +libraries. It can, of course, be renamed or aliased as regex.h, which is +the "correct" name, if there is no clash. It provides two structure types, +regex_t for compiled internal forms, and regmatch_t for returning +captured substrings. It also defines some constants whose names start with +"REG_"; these are used for setting options and identifying error codes. +

    +
    USING THE POSIX FUNCTIONS
    +

    +Those POSIX option bits that can reasonably be mapped to PCRE2 native options +have been implemented. In addition, the option REG_EXTENDED is defined with the +value zero. This has no effect, but since programs that are written to the +POSIX interface often use it, this makes it easier to slot in PCRE2 as a +replacement library. Other POSIX options are not even defined. +

    +

    +There are also some options that are not defined by POSIX. These have been +added at the request of users who want to make use of certain PCRE2-specific +features via the POSIX calling interface or to add BSD or GNU functionality. +

    +

    +When PCRE2 is called via these functions, it is only the API that is POSIX-like +in style. The syntax and semantics of the regular expressions themselves are +still those of Perl, subject to the setting of various PCRE2 options, as +described below. "POSIX-like in style" means that the API approximates to the +POSIX definition; it is not fully POSIX-compatible, and in multi-unit encoding +domains it is probably even less compatible. +

    +

    +The descriptions below use the actual names of the functions, but, as described +above, the standard POSIX names (without the pcre2_ prefix) may also be +used. +

    +
    COMPILING A PATTERN
    +

    +The function pcre2_regcomp() is called to compile a pattern into an +internal form. By default, the pattern is a C string terminated by a binary +zero (but see REG_PEND below). The preg argument is a pointer to a +regex_t structure that is used as a base for storing information about +the compiled regular expression. (It is also used for input when REG_PEND is +set.) +

    +

    +The argument cflags is either zero, or contains one or more of the bits +defined by the following macros: +

    +  REG_DOTALL
    +
    +The PCRE2_DOTALL option is set when the regular expression is passed for +compilation to the native function. Note that REG_DOTALL is not part of the +POSIX standard. +
    +  REG_ICASE
    +
    +The PCRE2_CASELESS option is set when the regular expression is passed for +compilation to the native function. +
    +  REG_NEWLINE
    +
    +The PCRE2_MULTILINE option is set when the regular expression is passed for +compilation to the native function. Note that this does not mimic the +defined POSIX behaviour for REG_NEWLINE (see the following section). +
    +  REG_NOSPEC
    +
    +The PCRE2_LITERAL option is set when the regular expression is passed for +compilation to the native function. This disables all meta characters in the +pattern, causing it to be treated as a literal string. The only other options +that are allowed with REG_NOSPEC are REG_ICASE, REG_NOSUB, REG_PEND, and +REG_UTF. Note that REG_NOSPEC is not part of the POSIX standard. +
    +  REG_NOSUB
    +
    +When a pattern that is compiled with this flag is passed to +pcre2_regexec() for matching, the nmatch and pmatch arguments +are ignored, and no captured strings are returned. Versions of the PCRE2 library +prior to 10.22 used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this +no longer happens because it disables the use of backreferences. +
    +  REG_PEND
    +
    +If this option is set, the re_endp field in the preg structure +(which has the type const char *) must be set to point to the character beyond +the end of the pattern before calling pcre2_regcomp(). The pattern itself +may now contain binary zeros, which are treated as data characters. Without +REG_PEND, a binary zero terminates the pattern and the re_endp field is +ignored. This is a GNU extension to the POSIX standard and should be used with +caution in software intended to be portable to other systems. +
    +  REG_UCP
    +
    +The PCRE2_UCP option is set when the regular expression is passed for +compilation to the native function. This causes PCRE2 to use Unicode properties +when matching \d, \w, etc., instead of just recognizing ASCII values. Note +that REG_UCP is not part of the POSIX standard. +
    +  REG_UNGREEDY
    +
    +The PCRE2_UNGREEDY option is set when the regular expression is passed for +compilation to the native function. Note that REG_UNGREEDY is not part of the +POSIX standard. +
    +  REG_UTF
    +
    +The PCRE2_UTF option is set when the regular expression is passed for +compilation to the native function. This causes the pattern itself and all data +strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF +is not part of the POSIX standard. +

    +

    +In the absence of these flags, no options are passed to the native function. +This means that the regex is compiled with PCRE2 default semantics. In +particular, the way it handles newline characters in the subject string is the +Perl way, not the POSIX way. Note that setting PCRE2_MULTILINE has only +some of the effects specified for REG_NEWLINE. It does not affect the way +newlines are matched by the dot metacharacter (they are not) or by a negative +class such as [^a] (they are). +

    +

    +The yield of pcre2_regcomp() is zero on success, and non-zero otherwise. +The preg structure is filled in on success, and one other member of the +structure (as well as re_endp) is public: re_nsub contains the +number of capturing subpatterns in the regular expression. Various error codes +are defined in the header file. +

    +

    +NOTE: If the yield of pcre2_regcomp() is non-zero, you must not attempt +to use the contents of the preg structure. If, for example, you pass it +to pcre2_regexec(), the result is undefined and your program is likely to +crash. +

    +
    MATCHING NEWLINE CHARACTERS
    +

    +This area is not simple, because POSIX and Perl take different views of things. +It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was +never intended to be a POSIX engine. The following table lists the different +possibilities for matching newline characters in Perl and PCRE2: +

    +                          Default   Change with
    +
    +  . matches newline          no     PCRE2_DOTALL
    +  newline matches [^a]       yes    not changeable
    +  $ matches \n at end        yes    PCRE2_DOLLAR_ENDONLY
    +  $ matches \n in middle     no     PCRE2_MULTILINE
    +  ^ matches \n in middle     no     PCRE2_MULTILINE
    +
    +This is the equivalent table for a POSIX-compatible pattern matcher: +
    +                          Default   Change with
    +
    +  . matches newline          yes    REG_NEWLINE
    +  newline matches [^a]       yes    REG_NEWLINE
    +  $ matches \n at end        no     REG_NEWLINE
    +  $ matches \n in middle     no     REG_NEWLINE
    +  ^ matches \n in middle     no     REG_NEWLINE
    +
    +This behaviour is not what happens when PCRE2 is called via its POSIX +API. By default, PCRE2's behaviour is the same as Perl's, except that there is +no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there +is no way to stop newline from matching [^a]. +

    +

    +Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and +PCRE2_DOLLAR_ENDONLY when calling pcre2_compile() directly, but there is +no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using +the POSIX API, passing REG_NEWLINE to PCRE2's pcre2_regcomp() function +causes PCRE2_MULTILINE to be passed to pcre2_compile(), and REG_DOTALL +passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY. +

    +
    MATCHING A PATTERN
    +

    +The function pcre2_regexec() is called to match a compiled pattern +preg against a given string, which is by default terminated by a +zero byte (but see REG_STARTEND below), subject to the options in eflags. +These can be: +

    +  REG_NOTBOL
    +
    +The PCRE2_NOTBOL option is set when calling the underlying PCRE2 matching +function. +
    +  REG_NOTEMPTY
    +
    +The PCRE2_NOTEMPTY option is set when calling the underlying PCRE2 matching +function. Note that REG_NOTEMPTY is not part of the POSIX standard. However, +setting this option can give more POSIX-like behaviour in some situations. +
    +  REG_NOTEOL
    +
    +The PCRE2_NOTEOL option is set when calling the underlying PCRE2 matching +function. +
    +  REG_STARTEND
    +
    +When this option is set, the subject string starts at string + +pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which +should point to the first character beyond the string. There may be binary +zeros within the subject string, and indeed, using REG_STARTEND is the only +way to pass a subject string that contains a binary zero. +

    +

    +Whatever the value of pmatch[0].rm_so, the offsets of the matched string +and any captured substrings are still given relative to the start of +string itself. (Before PCRE2 release 10.30 these were given relative to +string + pmatch[0].rm_so, but this differs from other +implementations.) +

    +

    +This is a BSD extension, compatible with but not specified by IEEE Standard +1003.2 (POSIX.2), and should be used with caution in software intended to be +portable to other systems. Note that a non-zero rm_so does not imply +REG_NOTBOL; REG_STARTEND affects only the location and length of the string, +not how it is matched. Setting REG_STARTEND and passing pmatch as NULL +are mutually exclusive; the error REG_INVARG is returned. +
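A minimal sketch of the REG_STARTEND calling convention described above. It assumes a regex.h that defines REG_STARTEND (the system header is used here so that the sketch builds without PCRE2; with pcre2posix.h the same names map to the pcre2_ functions). The helper name match_in_range() is hypothetical.

```c
#include <regex.h>

/* Searches only subject[from..to) for `pattern` using REG_STARTEND.
   On a match, *so and *eo receive offsets relative to the start of
   `subject` itself, not to subject + from. Returns the regexec() result,
   or the regcomp() error code if compilation fails. */
int match_in_range(const char *pattern, const char *subject,
                   long from, long to, long *so, long *eo)
{
    regex_t re;
    regmatch_t pmatch[1];

    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) return rc;             /* re must not be used after a failure */

    pmatch[0].rm_so = (regoff_t)from;   /* input: first character to examine */
    pmatch[0].rm_eo = (regoff_t)to;     /* input: first character beyond the range */
    rc = regexec(&re, subject, 1, pmatch, REG_STARTEND);
    if (rc == 0) {
        *so = (long)pmatch[0].rm_so;    /* output: the same slot now holds the match */
        *eo = (long)pmatch[0].rm_eo;
    }
    regfree(&re);
    return rc;
}
```

Note that pmatch[0] serves as both input and output, which is why pmatch cannot be NULL when REG_STARTEND is set.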

    +

    +If the pattern was compiled with the REG_NOSUB flag, no data about any matched +strings is returned. The nmatch and pmatch arguments of +pcre2_regexec() are ignored (except possibly as input for REG_STARTEND). +

    +

    +The value of nmatch may be zero, and the value of pmatch may be NULL +(unless REG_STARTEND is set); in both these cases no data about any matched +strings is returned. +

    +

    +Otherwise, the portion of the string that was matched, and also any captured +substrings, are returned via the pmatch argument, which points to an +array of nmatch structures of type regmatch_t, containing the +members rm_so and rm_eo. These contain the byte offset to the first +character of each substring and the offset to the first character after the end +of each substring, respectively. The 0th element of the vector relates to the +entire portion of string that was matched; subsequent elements relate to +the capturing subpatterns of the regular expression. Unused entries in the +array have both structure members set to -1. +

    +

    +A successful match yields a zero return; various error codes are defined in the +header file, of which REG_NOMATCH is the "expected" failure code. +
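The compile, match, and free sequence described above might look like the following sketch. It is written against the standard POSIX names and the system regex.h so that it builds without PCRE2 installed; as the description above notes, including pcre2posix.h instead (and linking with -lpcre2-posix -lpcre2-8) maps the same names to pcre2_regcomp() and friends. The helper name find_first_match() is hypothetical.

```c
#include <regex.h>

/* Compiles `pattern`, searches `subject`, and stores the offsets of the
   whole match in *start and *end. Returns 0 on a match, REG_NOMATCH if
   there is none, or another non-zero code on error. */
int find_first_match(const char *pattern, const char *subject,
                     long *start, long *end)
{
    regex_t re;
    regmatch_t pmatch[1];

    /* The system library needs REG_EXTENDED for "|"; with pcre2posix.h
       the flag is defined as zero and has no effect. */
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0) return rc;              /* re must not be used after a failure */

    rc = regexec(&re, subject, 1, pmatch, 0);
    if (rc == 0) {
        *start = (long)pmatch[0].rm_so;  /* offset of the first matched character */
        *end = (long)pmatch[0].rm_eo;    /* offset just past the end of the match */
    }
    regfree(&re);                        /* release memory associated with re */
    return rc;
}
```

With the pattern "cat|dog" and the subject "the cat sat on the mat", the whole match is reported as offsets 4 and 7. Only pmatch[0] is requested here; a larger array and nmatch value would also return captured substrings.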

    +
    ERROR MESSAGES
    +

    +The pcre2_regerror() function maps a non-zero errorcode from either +pcre2_regcomp() or pcre2_regexec() to a printable message. If +preg is not NULL, the error should have arisen from the use of that +structure. A message terminated by a binary zero is placed in errbuf. If +the buffer is too short, only the first errbuf_size - 1 characters of the +error message are used. The yield of the function is the size of the buffer needed +to hold the whole message, including the terminating zero. This value is +greater than errbuf_size if the message was truncated. +
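Because the yield is the size needed for the whole message, one way to avoid truncation is to ask for the size first and then allocate. The sketch below uses the standard POSIX names and the system regex.h as an assumption, so that it builds without PCRE2; the helper name compile_error_message() is hypothetical.

```c
#include <regex.h>
#include <stdlib.h>

/* Returns a malloc'd copy of the full error message for a pattern that
   fails to compile. Returns NULL if the pattern compiles successfully or
   if memory cannot be allocated. The caller frees the result. */
char *compile_error_message(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);                    /* compiled fine: nothing to report */
        return NULL;
    }

    /* First call: a zero-length buffer only reports the size needed,
       including the terminating zero. */
    size_t needed = regerror(rc, &re, NULL, 0);
    char *msg = malloc(needed);
    if (msg == NULL) return NULL;

    /* Second call: the buffer is now large enough, so nothing is cut off. */
    regerror(rc, &re, msg, needed);
    return msg;
}
```

A fixed-size buffer also works; comparing the return value with the buffer size then shows whether the message was truncated.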

    +
    MEMORY USAGE
    +

    +Compiling a regular expression causes memory to be allocated and associated +with the preg structure. The function pcre2_regfree() frees all +such memory, after which preg may no longer be used as a compiled +expression. +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 26 April 2021 +
    +Copyright © 1997-2021 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2sample.html b/src/pcre2/doc/html/pcre2sample.html new file mode 100644 index 00000000..2b36f1fc --- /dev/null +++ b/src/pcre2/doc/html/pcre2sample.html @@ -0,0 +1,110 @@ + + +pcre2sample specification + + +

    pcre2sample man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +PCRE2 SAMPLE PROGRAM +
    +

    +A simple, complete demonstration program to get you started with using PCRE2 is +supplied in the file pcre2demo.c in the src directory in the PCRE2 +distribution. A listing of this program is given in the +pcre2demo +documentation. If you do not have a copy of the PCRE2 distribution, you can +save this listing to re-create the contents of pcre2demo.c. +

    +

    +The demonstration program compiles the regular expression that is its +first argument, and matches it against the subject string in its second +argument. No PCRE2 options are set, and default character tables are used. If +matching succeeds, the program outputs the portion of the subject that matched, +together with the contents of any captured substrings. +

    +

    +If the -g option is given on the command line, the program then goes on to +check for further matches of the same regular expression in the same subject +string. The logic is a little bit tricky because of the possibility of matching +an empty string. Comments in the code explain what is going on. +

    +

    +The code in pcre2demo.c is an 8-bit program that uses the PCRE2 8-bit +library. It handles strings and characters that are stored in 8-bit code units. +By default, one character corresponds to one code unit, but if the pattern +starts with "(*UTF)", both it and the subject are treated as UTF-8 strings, +where characters may occupy multiple code units. +

    +

    +If PCRE2 is installed in the standard include and library directories for your +operating system, you should be able to compile the demonstration program using +a command like this: +

    +  cc -o pcre2demo pcre2demo.c -lpcre2-8
    +
    +If PCRE2 is installed elsewhere, you may need to add additional options to the +command line. For example, on a Unix-like system that has PCRE2 installed in +/usr/local, you can compile the demonstration program using a command +like this: +
    +  cc -o pcre2demo -I/usr/local/include pcre2demo.c -L/usr/local/lib -lpcre2-8
    +
    +Once you have built the demonstration program, you can run simple tests like +this: +
    +  ./pcre2demo 'cat|dog' 'the cat sat on the mat'
    +  ./pcre2demo -g 'cat|dog' 'the dog sat on the cat'
    +
    +Note that there is a much more comprehensive test program, called +pcre2test, +which supports many more facilities for testing regular expressions using all +three PCRE2 libraries (8-bit, 16-bit, and 32-bit, though not all three need be +installed). The +pcre2demo +program is provided as a relatively simple coding example. +

    +

    +If you try to run +pcre2demo +when PCRE2 is not installed in the standard library directory, you may get an +error like this on some operating systems (e.g. Solaris): +

    +  ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file or directory
    +
    +This is caused by the way shared library support works on those systems. You +need to add +
    +  -R/usr/local/lib
    +
    +(for example) to the compile command to get round this problem. +

    +
    +AUTHOR +
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    +REVISION +
    +

    +Last updated: 02 February 2016 +
    +Copyright © 1997-2016 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2serialize.html b/src/pcre2/doc/html/pcre2serialize.html new file mode 100644 index 00000000..18a8d7fa --- /dev/null +++ b/src/pcre2/doc/html/pcre2serialize.html @@ -0,0 +1,213 @@ + + +pcre2serialize specification + + +

    pcre2serialize man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS
    +

    +int32_t pcre2_serialize_decode(pcre2_code **codes, + int32_t number_of_codes, const uint8_t *bytes, + pcre2_general_context *gcontext); +
    +
    +int32_t pcre2_serialize_encode(pcre2_code **codes, + int32_t number_of_codes, uint8_t **serialized_bytes, + PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext); +
    +
    +void pcre2_serialize_free(uint8_t *bytes); +
    +
    +int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes); +
    +
    +If you are running an application that uses a large number of regular +expression patterns, it may be useful to store them in a precompiled form +instead of having to compile them every time the application is run. However, +if you are using the just-in-time optimization feature, it is not possible to +save and reload the JIT data, because it is position-dependent. The host on +which the patterns are reloaded must be running the same version of PCRE2, with +the same code unit width, and must also have the same endianness, pointer width +and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using +PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be +reloaded using the 8-bit library. +

    +

    +Note that "serialization" in PCRE2 does not convert compiled patterns to an +abstract format like Java or .NET serialization. The serialized output is +really just a bytecode dump, which is why it can only be reloaded in the same +environment as the one that created it. Hence the restrictions mentioned above. +Applications that are not statically linked with a fixed version of PCRE2 must +be prepared to recompile patterns from their sources, in order to be immune to +PCRE2 upgrades. +

    +
    SECURITY CONCERNS
    +

    +The facility for saving and restoring compiled patterns is intended for use +within individual applications. As such, the data supplied to +pcre2_serialize_decode() is expected to be trusted data, not data from +arbitrary external sources. There is only some simple consistency checking, not +complete validation of what is being re-loaded. Corrupted data may cause +undefined results. For example, if the length field of a pattern in the +serialized data is corrupted, the deserializing code may read beyond the end of +the byte stream that is passed to it. +

    +
    SAVING COMPILED PATTERNS
    +

    +Before compiled patterns can be saved they must be serialized, which in PCRE2 +means converting the pattern to a stream of bytes. A single byte stream may +contain any number of compiled patterns, but they must all use the same +character tables. A single copy of the tables is included in the byte stream +(its size is 1088 bytes). For more details of character tables, see the +section on locale support +in the +pcre2api +documentation. +

    +

    +The function pcre2_serialize_encode() creates a serialized byte stream +from a list of compiled patterns. Its first two arguments specify the list, +being a pointer to a vector of pointers to compiled patterns, and the length of +the vector. The third and fourth arguments point to variables which are set to +point to the created byte stream and its length, respectively. The final +argument is a pointer to a general context, which can be used to specify custom +memory management functions. If this argument is NULL, malloc() is used +to obtain memory for the byte stream. The yield of the function is the number +of serialized patterns, or one of the following negative error codes: +

    +  PCRE2_ERROR_BADDATA      the number of patterns is zero or less
    +  PCRE2_ERROR_BADMAGIC     mismatch of id bytes in one of the patterns
    +  PCRE2_ERROR_MEMORY       memory allocation failed
    +  PCRE2_ERROR_MIXEDTABLES  the patterns do not all use the same tables
    +  PCRE2_ERROR_NULL         the 1st, 3rd, or 4th argument is NULL
    +
    +PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or +that a slot in the vector does not point to a compiled pattern. +

    +

    +Once a set of patterns has been serialized you can save the data in any +appropriate manner. Here is sample code that compiles two patterns and writes +them to a file. It assumes that the variable fd refers to a file that is +open for output. The error checking that should be present in a real +application has been omitted for simplicity. +

    +  int errorcode;
    +  uint8_t *bytes;
    +  PCRE2_SIZE erroroffset;
    +  PCRE2_SIZE bytescount;
    +  pcre2_code *list_of_codes[2];
    +  list_of_codes[0] = pcre2_compile("first pattern",
    +    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
    +  list_of_codes[1] = pcre2_compile("second pattern",
    +    PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL);
    +  errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes,
    +    &bytescount, NULL);
    +  errorcode = fwrite(bytes, 1, bytescount, fd);
    +
    +Note that the serialized data is binary data that may contain any of the 256 +possible byte values. On systems that make a distinction between binary and +non-binary data, be sure that the file is opened for binary output. +
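The binary-mode requirement can be illustrated with plain C file I/O. This is a sketch only: save_bytes and load_bytes are hypothetical helper names, and the byte block stands in for whatever pcre2_serialize_encode() produced. The important detail is the "b" in the fopen() mode strings, which prevents newline translation on systems that distinguish text from binary files.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical helper: write a byte block to a file opened in binary
   mode ("wb"). Returns 0 on success, -1 on failure. */
static int save_bytes(const char *path, const uint8_t *bytes, size_t count)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(bytes, 1, count, f);
    int close_failed = (fclose(f) != 0);
    return (written == count && !close_failed) ? 0 : -1;
}

/* Hypothetical helper: read a whole file in binary mode ("rb") into a
   malloc()ed block. On success returns the block and stores its size
   in *count; the caller frees it. Returns NULL on failure. */
static uint8_t *load_bytes(const char *path, size_t *count)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    uint8_t *block = NULL;
    size_t used = 0, capacity = 0;
    for (;;) {
        if (used == capacity) {           /* grow the buffer when full */
            size_t bigger = capacity ? capacity * 2 : 4096;
            uint8_t *grown = realloc(block, bigger);
            if (grown == NULL) { free(block); fclose(f); return NULL; }
            block = grown;
            capacity = bigger;
        }
        size_t got = fread(block + used, 1, capacity - used, f);
        used += got;
        if (got == 0) break;              /* end of file or read error */
    }
    if (ferror(f)) { free(block); fclose(f); return NULL; }
    fclose(f);
    *count = used;
    return block;
}
```

A block loaded this way is the kind of in-memory byte stream that the decoding functions expect, and its management remains the application's responsibility.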

    +

    +Serializing a set of patterns leaves the original data untouched, so they can +still be used for matching. Their memory must eventually be freed in the usual +way by calling pcre2_code_free(). When you have finished with the byte +stream, it too must be freed by calling pcre2_serialize_free(). If this +function is called with a NULL argument, it returns immediately without doing +anything. +

    +
    RE-USING PRECOMPILED PATTERNS
    +

    +In order to re-use a set of saved patterns you must first make the serialized +byte stream available in main memory (for example, by reading from a file). The +management of this memory block is up to the application. You can use the +pcre2_serialize_get_number_of_codes() function to find out how many +compiled patterns are in the serialized data without actually decoding the +patterns: +

    +  uint8_t *bytes = <serialized data>;
    +  int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes);
    +
    +The pcre2_serialize_decode() function reads a byte stream and recreates +the compiled patterns in new memory blocks, setting pointers to them in a +vector. The first two arguments are a pointer to a suitable vector and its +length, and the third argument points to a byte stream. The final argument is a +pointer to a general context, which can be used to specify custom memory +management functions for the decoded patterns. If this argument is NULL, +malloc() and free() are used. After deserialization, the byte +stream is no longer needed and can be discarded. +
    +  pcre2_code *list_of_codes[2];
    +  uint8_t *bytes = <serialized data>;
    +  int32_t number_of_codes =
    +    pcre2_serialize_decode(list_of_codes, 2, bytes, NULL);
    +
    +If the vector is not large enough for all the patterns in the byte stream, it +is filled with those that fit, and the remainder are ignored. The yield of the +function is the number of decoded patterns, or one of the following negative +error codes: +
    +  PCRE2_ERROR_BADDATA    second argument is zero or less
    +  PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
    +  PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
    +  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
    +  PCRE2_ERROR_MEMORY     memory allocation failed
    +  PCRE2_ERROR_NULL       first or third argument is NULL
    +
    +PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled +on a system with different endianness. +

    +

    +Decoded patterns can be used for matching in the usual way, and must be freed +by calling pcre2_code_free(). However, be aware that there is a potential +race issue if you are using multiple patterns that were decoded from a single +byte stream in a multithreaded application. A single copy of the character +tables is used by all the decoded patterns and a reference count is used to +arrange for its memory to be automatically freed when the last pattern is +freed, but there is no locking on this reference count. Therefore, if you want +to call pcre2_code_free() for these patterns in different threads, you +must arrange your own locking, and ensure that pcre2_code_free() cannot +be called by two threads at the same time. +
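The locking described above could be arranged along the following lines. This is a sketch built on a stand-in model rather than on PCRE2 itself: shared_tables, model_pattern and model_pattern_free are hypothetical types and functions that merely imitate an unlocked reference count, and locked_pattern_free shows the application-side wrapper. POSIX threads are assumed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Stand-in model, not PCRE2 code: a table block shared by several
   patterns, with an unlocked reference count. */
typedef struct { int refcount; } shared_tables;
typedef struct { shared_tables *tables; } model_pattern;

static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unsafe if two threads run it at once: the decrement and the test
   are not atomic. Returns 1 if the shared tables were freed as well. */
static int model_pattern_free(model_pattern *p)
{
    int freed_tables = 0;
    if (p == NULL) return 0;
    if (--p->tables->refcount == 0) {
        free(p->tables);
        freed_tables = 1;
    }
    free(p);
    return freed_tables;
}

/* Application-side wrapper: every thread frees through this function,
   so only one free operation can be in progress at a time. */
static int locked_pattern_free(model_pattern *p)
{
    pthread_mutex_lock(&free_lock);
    int freed_tables = model_pattern_free(p);
    pthread_mutex_unlock(&free_lock);
    return freed_tables;
}
```

In a real application the body of the wrapper would call pcre2_code_free() in place of the model function, and all threads would have to use the wrapper for the protection to hold.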

    +

    +If a pattern was processed by pcre2_jit_compile() before being +serialized, the JIT data is discarded and so is no longer available after a +save/restore cycle. You can, however, process a restored pattern with +pcre2_jit_compile() if you wish. +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 27 June 2018 +
    +Copyright © 1997-2018 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2syntax.html b/src/pcre2/doc/html/pcre2syntax.html new file mode 100644 index 00000000..7383104a --- /dev/null +++ b/src/pcre2/doc/html/pcre2syntax.html @@ -0,0 +1,698 @@ + + +pcre2syntax specification + + +

    pcre2syntax man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY
    +

    +The full syntax and semantics of the regular expressions that are supported by +PCRE2 are described in the +pcre2pattern +documentation. This document contains a quick-reference summary of the syntax. +

    +
    QUOTING
    +

    +

    +  \x         where x is non-alphanumeric is a literal x
    +  \Q...\E    treat enclosed characters as literal
    +
    +

    +
    ESCAPED CHARACTERS
    +

    +This table applies to ASCII and Unicode environments. An unrecognized escape +sequence causes an error. +

    +  \a         alarm, that is, the BEL character (hex 07)
    +  \cx        "control-x", where x is any ASCII printing character
    +  \e         escape (hex 1B)
    +  \f         form feed (hex 0C)
    +  \n         newline (hex 0A)
    +  \r         carriage return (hex 0D)
    +  \t         tab (hex 09)
    +  \0dd       character with octal code 0dd
    +  \ddd       character with octal code ddd, or backreference
    +  \o{ddd..}  character with octal code ddd..
    +  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
    +  \xhh       character with hex code hh
    +  \x{hh..}   character with hex code hh..
    +
    +If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the +following are also recognized: +
    +  \U         the character "U"
    +  \uhhhh     character with hex code hhhh
    +  \u{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
    +
    +When \x is not followed by {, from zero to two hexadecimal digits are read, +but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be +recognized as a hexadecimal escape; otherwise it matches a literal "x". +Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits +or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it +matches a literal "u". +
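The rule for \x without a following brace can be modelled in a few lines of C. This is an illustrative sketch, not PCRE2's own parser: read_backslash_x is a hypothetical name, the input pointer is assumed to sit just past the "\x", and the sketch assumes that zero digits in the default mode yield character code 0.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit; the caller guarantees isxdigit(c). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)tolower((unsigned char)c);
    return c - 'a' + 10;
}

/* Hypothetical model: s points just past "\x" and does not start with
   '{'. Stores the resulting character code in *code and returns how
   many characters of s were consumed. */
static size_t read_backslash_x(const char *s, int alt_bsux, unsigned *code)
{
    size_t digits = 0;
    while (digits < 2 && isxdigit((unsigned char)s[digits])) digits++;

    if (alt_bsux && digits < 2) {
        *code = 'x';          /* not a hex escape: literal "x" */
        return 0;
    }
    *code = 0;                /* assumption: zero digits gives code 0 */
    for (size_t i = 0; i < digits; i++)
        *code = *code * 16 + (unsigned)hex_value(s[i]);
    return digits;
}
```

Under this model "\x4z" is character 4 followed by "z" in the default mode, but a literal "x" followed by "4z" in ALT_BSUX mode.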

    +

    +Note that \0dd is always an octal code. The treatment of backslash followed by +a non-zero digit is complicated; for details see the section +"Non-printing characters" +in the +pcre2pattern +documentation, where details of escape processing in EBCDIC environments are +also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not +supported in EBCDIC environments. Note that \N not followed by an opening +curly bracket has a different meaning (see below). +

    +
    CHARACTER TYPES
    +

    +

    +  .          any character except newline;
    +               in dotall mode, any character whatsoever
    +  \C         one code unit, even in UTF mode (best avoided)
    +  \d         a decimal digit
    +  \D         a character that is not a decimal digit
    +  \h         a horizontal white space character
    +  \H         a character that is not a horizontal white space character
    +  \N         a character that is not a newline
    +  \p{xx}     a character with the xx property
    +  \P{xx}     a character without the xx property
    +  \R         a newline sequence
    +  \s         a white space character
    +  \S         a character that is not a white space character
    +  \v         a vertical white space character
    +  \V         a character that is not a vertical white space character
    +  \w         a "word" character
    +  \W         a "non-word" character
    +  \X         a Unicode extended grapheme cluster
    +
    +\C is dangerous because it may leave the current matching point in the middle +of a UTF-8 or UTF-16 character. The application can lock out the use of \C by +setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2 +with the use of \C permanently disabled. +

    +

    +By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode +or in the 16-bit and 32-bit libraries. However, if locale-specific matching is +happening, \s and \w may also match characters with code points in the range +128-255. If the PCRE2_UCP option is set, the behaviour of these escape +sequences is changed to use Unicode properties and they match many more +characters. +

    +
    GENERAL CATEGORY PROPERTIES FOR \p and \P
    +

    +

    +  C          Other
    +  Cc         Control
    +  Cf         Format
    +  Cn         Unassigned
    +  Co         Private use
    +  Cs         Surrogate
    +
    +  L          Letter
    +  Ll         Lower case letter
    +  Lm         Modifier letter
    +  Lo         Other letter
    +  Lt         Title case letter
    +  Lu         Upper case letter
    +  L&         Ll, Lu, or Lt
    +
    +  M          Mark
    +  Mc         Spacing mark
    +  Me         Enclosing mark
    +  Mn         Non-spacing mark
    +
    +  N          Number
    +  Nd         Decimal number
    +  Nl         Letter number
    +  No         Other number
    +
    +  P          Punctuation
    +  Pc         Connector punctuation
    +  Pd         Dash punctuation
    +  Pe         Close punctuation
    +  Pf         Final punctuation
    +  Pi         Initial punctuation
    +  Po         Other punctuation
    +  Ps         Open punctuation
    +
    +  S          Symbol
    +  Sc         Currency symbol
    +  Sk         Modifier symbol
    +  Sm         Mathematical symbol
    +  So         Other symbol
    +
    +  Z          Separator
    +  Zl         Line separator
    +  Zp         Paragraph separator
    +  Zs         Space separator
    +
    +

    +
    PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P
    +

    +

    +  Xan        Alphanumeric: union of properties L and N
    +  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
    +  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
    +  Xuc        Universally-named character: one that can be
    +               represented by a Universal Character Name
    +  Xwd        Perl word: property Xan or underscore
    +
    +Perl and POSIX space are now the same. Perl added VT to its space character set +at release 5.18. +

    +
    SCRIPT NAMES FOR \p AND \P
    +

    +Adlam, +Ahom, +Anatolian_Hieroglyphs, +Arabic, +Armenian, +Avestan, +Balinese, +Bamum, +Bassa_Vah, +Batak, +Bengali, +Bhaiksuki, +Bopomofo, +Brahmi, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Carian, +Caucasian_Albanian, +Chakma, +Cham, +Cherokee, +Chorasmian, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Dives_Akuru, +Dogra, +Duployan, +Egyptian_Hieroglyphs, +Elbasan, +Elymaic, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Grantha, +Greek, +Gujarati, +Gunjala_Gondi, +Gurmukhi, +Han, +Hangul, +Hanifi_Rohingya, +Hanunoo, +Hatran, +Hebrew, +Hiragana, +Imperial_Aramaic, +Inherited, +Inscriptional_Pahlavi, +Inscriptional_Parthian, +Javanese, +Kaithi, +Kannada, +Katakana, +Kayah_Li, +Kharoshthi, +Khitan_Small_Script, +Khmer, +Khojki, +Khudawadi, +Lao, +Latin, +Lepcha, +Limbu, +Linear_A, +Linear_B, +Lisu, +Lycian, +Lydian, +Mahajani, +Makasar, +Malayalam, +Mandaic, +Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, +Meetei_Mayek, +Mende_Kikakui, +Meroitic_Cursive, +Meroitic_Hieroglyphs, +Miao, +Modi, +Mongolian, +Mro, +Multani, +Myanmar, +Nabataean, +Nandinagari, +New_Tai_Lue, +Newa, +Nko, +Nushu, +Nyakeng_Puachue_Hmong, +Ogham, +Ol_Chiki, +Old_Hungarian, +Old_Italic, +Old_North_Arabian, +Old_Permic, +Old_Persian, +Old_Sogdian, +Old_South_Arabian, +Old_Turkic, +Oriya, +Osage, +Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, +Phags_Pa, +Phoenician, +Psalter_Pahlavi, +Rejang, +Runic, +Samaritan, +Saurashtra, +Sharada, +Shavian, +Siddham, +SignWriting, +Sinhala, +Sogdian, +Sora_Sompeng, +Soyombo, +Sundanese, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tai_Tham, +Tai_Viet, +Takri, +Tamil, +Tangut, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Tirhuta, +Ugaritic, +Vai, +Wancho, +Warang_Citi, +Yezidi, +Yi, +Zanabazar_Square. +

    +
    CHARACTER CLASSES
    +

    +

    +  [...]       positive character class
    +  [^...]      negative character class
    +  [x-y]       range (can be used for hex characters)
    +  [[:xxx:]]   positive POSIX named set
    +  [[:^xxx:]]  negative POSIX named set
    +
    +  alnum       alphanumeric
    +  alpha       alphabetic
    +  ascii       0-127
    +  blank       space or tab
    +  cntrl       control character
    +  digit       decimal digit
    +  graph       printing, excluding space
    +  lower       lower case letter
    +  print       printing, including space
    +  punct       printing, excluding alphanumeric
    +  space       white space
    +  upper       upper case letter
    +  word        same as \w
    +  xdigit      hexadecimal digit
    +
    +In PCRE2, POSIX character set names recognize only ASCII characters by default, +but some of them use Unicode properties if PCRE2_UCP is set. You can use +\Q...\E inside a character class. +

    +
    QUANTIFIERS
    +

    +

    +  ?           0 or 1, greedy
    +  ?+          0 or 1, possessive
    +  ??          0 or 1, lazy
    +  *           0 or more, greedy
    +  *+          0 or more, possessive
    +  *?          0 or more, lazy
    +  +           1 or more, greedy
    +  ++          1 or more, possessive
    +  +?          1 or more, lazy
    +  {n}         exactly n
    +  {n,m}       at least n, no more than m, greedy
    +  {n,m}+      at least n, no more than m, possessive
    +  {n,m}?      at least n, no more than m, lazy
    +  {n,}        n or more, greedy
    +  {n,}+       n or more, possessive
    +  {n,}?       n or more, lazy
    +
    +

    +
    ANCHORS AND SIMPLE ASSERTIONS
    +

    +

    +  \b          word boundary
    +  \B          not a word boundary
    +  ^           start of subject
    +                also after an internal newline in multiline mode
    +                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
    +  \A          start of subject
    +  $           end of subject
    +                also before newline at end of subject
    +                also before internal newline in multiline mode
    +  \Z          end of subject
    +                also before newline at end of subject
    +  \z          end of subject
    +  \G          first matching position in subject
    +
    +

    +
    REPORTED MATCH POINT SETTING
    +

    +

    +  \K          set reported start of match
    +
    +\K is honoured in positive assertions, but ignored in negative ones. +

    +
    ALTERNATION
    +

    +

    +  expr|expr|expr...
    +
    +

    +
    CAPTURING
    +

    +

    +  (...)           capture group
    +  (?<name>...)    named capture group (Perl)
    +  (?'name'...)    named capture group (Perl)
    +  (?P<name>...)   named capture group (Python)
    +  (?:...)         non-capture group
    +  (?|...)         non-capture group; reset group numbers for
    +                   capture groups in each alternative
    +
    +In non-UTF modes, names may contain underscores and ASCII letters and digits; +in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In +both cases, a name must not start with a digit. +

    +
    ATOMIC GROUPS
    +

    +

    +  (?>...)         atomic non-capture group
    +  (*atomic:...)   atomic non-capture group
    +
    +

    +
    COMMENT
    +

    +

    +  (?#....)        comment (not nestable)
    +
    +

    +
    OPTION SETTING
    +

    +Changes of these options within a group are automatically cancelled at the end +of the group. +

    +  (?i)            caseless
    +  (?J)            allow duplicate named groups
    +  (?m)            multiline
    +  (?n)            no auto capture
    +  (?s)            single line (dotall)
    +  (?U)            default ungreedy (lazy)
    +  (?x)            extended: ignore white space except in classes
    +  (?xx)           as (?x) but also ignore space and tab in classes
    +  (?-...)         unset option(s)
    +  (?^)            unset imnsx options
    +
    +Unsetting x or xx unsets both. Several options may be set at once, and a +mixture of setting and unsetting such as (?i-x) is allowed, but there may be +only one hyphen. Setting (but not unsetting) is allowed after (?^, for example +(?^in). An option setting may appear at the start of a non-capture group, for +example (?i:...). +

    +

    +The following are recognized only at the very start of a pattern or after one +of the newline or \R options with similar syntax. More than one of them may +appear. For the first three, d is a decimal number. +

    +  (*LIMIT_DEPTH=d) set the backtracking limit to d
    +  (*LIMIT_HEAP=d)  set the heap size limit to d * 1024 bytes
    +  (*LIMIT_MATCH=d) set the match limit to d
    +  (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
    +  (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
    +  (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
    +  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
    +  (*NO_JIT)       disable JIT optimization
    +  (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
    +  (*UTF)          set appropriate UTF mode for the library in use
    +  (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
    +
    +Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of +the limits set by the caller of pcre2_match() or pcre2_dfa_match(), +not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The +application can lock out the use of (*UTF) and (*UCP) by setting the +PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time. +
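The "reduce only" behaviour amounts to taking the smaller of two values. The sketch below states that arithmetic explicitly. Both function names are hypothetical and the code is a model of the rule as described here, not an excerpt from the library.

```c
#include <stdint.h>

/* Hypothetical model: a limit requested inside the pattern takes
   effect only if it is lower than the one the caller set. */
static uint32_t effective_limit(uint32_t caller_limit, uint32_t pattern_limit)
{
    return pattern_limit < caller_limit ? pattern_limit : caller_limit;
}

/* (*LIMIT_HEAP=d) is expressed in units of 1024 bytes. */
static uint64_t heap_limit_bytes(uint32_t d)
{
    return (uint64_t)d * 1024u;
}
```

For example, with a caller-supplied match limit of 1000, a pattern starting with (*LIMIT_MATCH=10) would run under a limit of 10, whereas (*LIMIT_MATCH=5000) would leave the limit at 1000.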

    +
    NEWLINE CONVENTION
    +

    +These are recognized only at the very start of the pattern or after option +settings with a similar syntax. +

    +  (*CR)           carriage return only
    +  (*LF)           linefeed only
    +  (*CRLF)         carriage return followed by linefeed
    +  (*ANYCRLF)      all three of the above
    +  (*ANY)          any Unicode newline sequence
    +  (*NUL)          the NUL character (binary zero)
    +
    +

    +
    WHAT \R MATCHES
    +

    +These are recognized only at the very start of the pattern or after option +settings with a similar syntax. +

    +  (*BSR_ANYCRLF)  CR, LF, or CRLF
    +  (*BSR_UNICODE)  any Unicode newline sequence
    +
    +

    +
    LOOKAHEAD AND LOOKBEHIND ASSERTIONS
    +

    +

    +  (?=...)                     )
    +  (*pla:...)                  ) positive lookahead
    +  (*positive_lookahead:...)   )
    +
    +  (?!...)                     )
    +  (*nla:...)                  ) negative lookahead
    +  (*negative_lookahead:...)   )
    +
    +  (?<=...)                    )
    +  (*plb:...)                  ) positive lookbehind
    +  (*positive_lookbehind:...)  )
    +
    +  (?<!...)                    )
    +  (*nlb:...)                  ) negative lookbehind
    +  (*negative_lookbehind:...)  )
    +
    +Each top-level branch of a lookbehind must be of a fixed length. +

    +
    NON-ATOMIC LOOKAROUND ASSERTIONS
    +

    +These assertions are specific to PCRE2 and are not Perl-compatible. +

    +  (?*...)                                )
    +  (*napla:...)                           ) synonyms
    +  (*non_atomic_positive_lookahead:...)   )
    +
    +  (?<*...)                               )
    +  (*naplb:...)                           ) synonyms
    +  (*non_atomic_positive_lookbehind:...)  )
    +
    +

    +
    SCRIPT RUNS
    +

    +

    +  (*script_run:...)           ) script run, can be backtracked into
    +  (*sr:...)                   )
    +
    +  (*atomic_script_run:...)    ) atomic script run
    +  (*asr:...)                  )
    +
    +

    +
    BACKREFERENCES
    +

    +

    +  \n              reference by number (can be ambiguous)
    +  \gn             reference by number
    +  \g{n}           reference by number
    +  \g+n            relative reference by number (PCRE2 extension)
    +  \g-n            relative reference by number
    +  \g{+n}          relative reference by number (PCRE2 extension)
    +  \g{-n}          relative reference by number
    +  \k<name>        reference by name (Perl)
    +  \k'name'        reference by name (Perl)
    +  \g{name}        reference by name (Perl)
    +  \k{name}        reference by name (.NET)
    +  (?P=name)       reference by name (Python)
    +
    +

    +
    SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)
    +

    +

    +  (?R)            recurse whole pattern
    +  (?n)            call subroutine by absolute number
    +  (?+n)           call subroutine by relative number
    +  (?-n)           call subroutine by relative number
    +  (?&name)        call subroutine by name (Perl)
    +  (?P>name)       call subroutine by name (Python)
    +  \g<name>        call subroutine by name (Oniguruma)
    +  \g'name'        call subroutine by name (Oniguruma)
    +  \g<n>           call subroutine by absolute number (Oniguruma)
    +  \g'n'           call subroutine by absolute number (Oniguruma)
    +  \g<+n>          call subroutine by relative number (PCRE2 extension)
    +  \g'+n'          call subroutine by relative number (PCRE2 extension)
    +  \g<-n>          call subroutine by relative number (PCRE2 extension)
    +  \g'-n'          call subroutine by relative number (PCRE2 extension)
    +
    +

    +
    CONDITIONAL PATTERNS
    +

    +

    +  (?(condition)yes-pattern)
    +  (?(condition)yes-pattern|no-pattern)
    +
    +  (?(n)               absolute reference condition
    +  (?(+n)              relative reference condition
    +  (?(-n)              relative reference condition
    +  (?(<name>)          named reference condition (Perl)
    +  (?('name')          named reference condition (Perl)
    +  (?(name)            named reference condition (PCRE2, deprecated)
    +  (?(R)               overall recursion condition
    +  (?(Rn)              specific numbered group recursion condition
    +  (?(R&name)          specific named group recursion condition
    +  (?(DEFINE)          define groups for reference
    +  (?(VERSION[>]=n.m)  test PCRE2 version
    +  (?(assert)          assertion condition
    +
    +Note the ambiguity of (?(R) and (?(Rn) which might be named reference +conditions or recursion tests. Such a condition is interpreted as a reference +condition if the relevant named group exists. +

    +
    BACKTRACKING CONTROL
    +

    +All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the +name is mandatory, for the others it is optional. (*SKIP) changes its behaviour +if :NAME is present. The others just set a name for passing back to the caller, +but this is not a name that (*SKIP) can see. The following act immediately they +are reached: +

    +  (*ACCEPT)       force successful match
    +  (*FAIL)         force backtrack; synonym (*F)
    +  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
    +
    +The following act only when a subsequent match failure causes a backtrack to +reach them. They all force a match failure, but they differ in what happens +afterwards. Those that advance the start-of-match point do so only if the +pattern is not anchored. +
    +  (*COMMIT)       overall failure, no advance of starting point
    +  (*PRUNE)        advance to next starting character
    +  (*SKIP)         advance to current matching position
    +  (*SKIP:NAME)    advance to position corresponding to an earlier
    +                  (*MARK:NAME); if not found, the (*SKIP) is ignored
    +  (*THEN)         local failure, backtrack to next alternation
    +
    +The effect of one of these verbs in a group called as a subroutine is confined +to the subroutine call. +

    +
    CALLOUTS
    +

    +

    +  (?C)            callout (assumed number 0)
    +  (?Cn)           callout with numerical data n
    +  (?C"text")      callout with string data
    +
    +The allowed string delimiters are ` ' " ^ % # $ (which are the same for the +start and the end), and the starting delimiter { matched with the ending +delimiter }. To encode the ending delimiter within the string, double it. +
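The doubling rule is easy to get wrong when a pattern is assembled programmatically, so here is a small sketch of a helper that builds a string callout. The name make_string_callout is hypothetical and the function is an application-side convenience, not something PCRE2 provides.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "(?C<open>text<close>)" with every
   occurrence of the closing delimiter in text doubled. open must be
   one of ` ' " ^ % # $ {. Returns a malloc()ed string, or NULL for a
   bad delimiter or on allocation failure. */
static char *make_string_callout(const char *text, char open)
{
    if (open == '\0' || strchr("`'\"^%#${", open) == NULL) return NULL;
    char close = (open == '{') ? '}' : open;

    size_t extra = 0;                     /* delimiters needing doubling */
    for (const char *p = text; *p != '\0'; p++)
        if (*p == close) extra++;

    /* "(?C" + open + text + doubled chars + close + ")" + NUL */
    char *out = malloc(3 + 1 + strlen(text) + extra + 1 + 1 + 1);
    if (out == NULL) return NULL;

    char *q = out;
    memcpy(q, "(?C", 3);
    q += 3;
    *q++ = open;
    for (const char *p = text; *p != '\0'; p++) {
        *q++ = *p;
        if (*p == close) *q++ = close;    /* double the ending delimiter */
    }
    *q++ = close;
    *q++ = ')';
    *q = '\0';
    return out;
}
```

With the { delimiter only the closing } is doubled, so the text a}b becomes (?C{a}}b}).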

    +
    SEE ALSO
    +

    +pcre2pattern(3), pcre2api(3), pcre2callout(3), +pcre2matching(3), pcre2(3). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 28 December 2019 +
    +Copyright © 1997-2019 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/html/pcre2test.html b/src/pcre2/doc/html/pcre2test.html new file mode 100644 index 00000000..09d3a0ec --- /dev/null +++ b/src/pcre2/doc/html/pcre2test.html @@ -0,0 +1,2133 @@ + + +pcre2test specification + + +

    pcre2test man page

    +

    +Return to the PCRE2 index page. +

    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +

    +
    SYNOPSIS
    +

    +pcre2test [options] [input file [output file]] +
    +
    +pcre2test is a test program for the PCRE2 regular expression libraries, +but it can also be used for experimenting with regular expressions. This +document describes the features of the test program; for details of the regular +expressions themselves, see the +pcre2pattern +documentation. For details of the PCRE2 library function calls and their +options, see the +pcre2api +documentation. +

    +

    +The input for pcre2test is a sequence of regular expression patterns and +subject strings to be matched. There are also command lines for setting +defaults and controlling some special actions. The output shows the result of +each match attempt. Modifiers on external or internal command lines, the +patterns, and the subject lines specify PCRE2 function options, control how the +subject is processed, and what output is produced. +

    +

    +As the original fairly simple PCRE library evolved, it acquired many different +features, and as a result, the original pcretest program ended up with a +lot of options in a messy, arcane syntax for testing all the features. The +move to the new PCRE2 API provided an opportunity to re-implement the test +program as pcre2test, with a cleaner modifier syntax. Nevertheless, there +are still many obscure modifiers, some of which are specifically designed for +use in conjunction with the test script and data files that are distributed as +part of PCRE2. All the modifiers are documented here, some without much +justification, but many of them are unlikely to be of use except when testing +the libraries. +

    +
    PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
    +

    +Different versions of the PCRE2 library can be built to support character +strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or +all three of these libraries may be simultaneously installed. The +pcre2test program can be used to test all the libraries. However, its own +input and output are always in 8-bit format. When testing the 16-bit or 32-bit +libraries, patterns and subject strings are converted to 16-bit or 32-bit +format before being passed to the library functions. Results are converted back +to 8-bit code units for output. +

    +

    +In the rest of this document, the names of library functions and structures +are given in generic form, for example, pcre2_compile(). The actual +names used in the libraries have a suffix _8, _16, or _32, as appropriate. +

    +
    INPUT ENCODING
    +

    +Input to pcre2test is processed line by line, either by calling the C +library's fgets() function, or via the libreadline library. In some +Windows environments character 26 (hex 1A) causes an immediate end of file, and +no further data is read, so this character should be avoided unless you really +want that action. +

    +

    +The input is processed using C's string functions, so must not +contain binary zeros, even though in Unix-like environments, fgets() +treats any bytes other than newline as data characters. An error is generated +if a binary zero is encountered. By default subject lines are processed for +backslash escapes, which makes it possible to include any data value in strings +that are passed to the library for matching. For patterns, there is a facility +for specifying some or all of the 8-bit input characters as hexadecimal pairs, +which makes it possible to include binary zeros. +

    +
    +Input for the 16-bit and 32-bit libraries +
    +

    +When testing the 16-bit or 32-bit libraries, there is a need to be able to +generate character code points greater than 255 in the strings that are passed +to the library. For subject lines, backslash escapes can be used. In addition, +when the utf modifier (see +"Setting compilation options" +below) is set, the pattern and any following subject lines are interpreted as +UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate. +

    +

    +For non-UTF testing of wide characters, the utf8_input modifier can be +used. This is mutually exclusive with utf, and is allowed only in 16-bit +or 32-bit mode. It causes the pattern and following subject lines to be treated +as UTF-8 according to the original definition (RFC 2279), which allows for +character values up to 0x7fffffff. Each character is placed in one 16-bit or +32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error +to occur). +

    +

    +UTF-8 (in its original definition) is not capable of encoding values greater +than 0x7fffffff, but such values can be handled by the 32-bit library. When +testing this library in non-UTF mode with utf8_input set, if any +character is preceded by the byte 0xff (which is an invalid byte in UTF-8) +0x80000000 is added to the character's value. This is the only way of passing +such code points in a pattern string. For subject strings, using an escape +sequence is preferable. +
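
The utf8_input decoding described above can be sketched in Python. This is only an illustration of the rules stated in the text (RFC 2279 sequences of up to six bytes, and a 0xff prefix that adds 0x80000000); it is not taken from the pcre2test source, and the function name decode_utf8_input is hypothetical.

```python
def decode_utf8_input(data: bytes) -> list[int]:
    """Decode RFC 2279 style UTF-8 (1 to 6 bytes per character).

    A 0xff byte in front of a character adds 0x80000000 to its value,
    as described for the 32-bit library in non-UTF mode.
    """
    code_points = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF:  # never valid in UTF-8, so usable as a marker
            extra = 0x80000000
            i += 1
            if i == len(data):
                raise ValueError("0xff must be followed by a character")
        lead = data[i]
        if lead < 0x80:
            length, value = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, value = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, value = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, value = 4, lead & 0x07
        elif 0xF8 <= lead < 0xFC:
            length, value = 5, lead & 0x03
        elif 0xFC <= lead < 0xFE:
            length, value = 6, lead & 0x01
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x}")
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        code_points.append(value + extra)
        i += length
    return code_points
```

For instance, the six-byte sequence FD BF BF BF BF BF decodes to 0x7fffffff, the largest value the original UTF-8 definition can carry, which is why the 0xff prefix is needed for larger 32-bit values.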

    +
    COMMAND LINE OPTIONS
    +

    +-8 +If the 8-bit library has been built, this option causes it to be used (this is +the default). If the 8-bit library has not been built, this option causes an +error. +

    +

    +-16 +If the 16-bit library has been built, this option causes it to be used. If only +the 16-bit library has been built, this is the default. If the 16-bit library +has not been built, this option causes an error. +

    +

    +-32 +If the 32-bit library has been built, this option causes it to be used. If only +the 32-bit library has been built, this is the default. If the 32-bit library +has not been built, this option causes an error. +

    +

    +-ac +Behave as if each pattern has the auto_callout modifier, that is, insert +automatic callouts into every pattern that is compiled. +

    +

    +-AC +As for -ac, but in addition behave as if each subject line has the +callout_extra modifier, that is, show additional information from +callouts. +

    +

    +-b +Behave as if each pattern has the fullbincode modifier; the full +internal binary form of the pattern is output after compilation. +

    +

    +-C +Output the version number of the PCRE2 library, and all available information +about the optional features that are included, and then exit with zero exit +code. All other options are ignored. If both -C and -LM are present, whichever +is first is recognized. +

    +

    +-C option +Output information about a specific build-time option, then exit. This +functionality is intended for use in scripts such as RunTest. The +following options output the value and set the exit code as indicated: +

    +  ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
    +               0x15 or 0x25
    +               0 if used in an ASCII environment
    +               exit code is always 0
    +  linksize   the configured internal link size (2, 3, or 4)
    +               exit code is set to the link size
    +  newline    the default newline setting:
    +               CR, LF, CRLF, ANYCRLF, ANY, or NUL
    +               exit code is always 0
    +  bsr        the default setting for what \R matches:
    +               ANYCRLF or ANY
    +               exit code is always 0
    +
    +The following options output 1 for true or 0 for false, and set the exit code +to the same value: +
    +  backslash-C  \C is supported (not locked out)
    +  ebcdic       compiled for an EBCDIC environment
    +  jit          just-in-time support is available
    +  pcre2-16     the 16-bit library was built
    +  pcre2-32     the 32-bit library was built
    +  pcre2-8      the 8-bit library was built
    +  unicode      Unicode support is available
    +
    +If an unknown option is given, an error message is output; the exit code is 0. +

    +

    +-d +Behave as if each pattern has the debug modifier; the internal +form and information about the compiled pattern is output after compilation; +-d is equivalent to -b -i. +

    +

    +-dfa +Behave as if each subject line has the dfa modifier; matching is done +using the pcre2_dfa_match() function instead of the default +pcre2_match(). +

    +

    +-error number[,number,...] +Call pcre2_get_error_message() for each of the error numbers in the +comma-separated list, display the resulting messages on the standard output, +then exit with zero exit code. The numbers may be positive or negative. This is +a convenience facility for PCRE2 maintainers. +

    +

    +-help +Output a brief summary of these options and then exit. +

    +

    +-i +Behave as if each pattern has the info modifier; information about the +compiled pattern is given after compilation. +

    +

    +-jit +Behave as if each pattern line has the jit modifier; after successful +compilation, each pattern is passed to the just-in-time compiler, if available. +

    +

    +-jitfast +Behave as if each pattern line has the jitfast modifier; after +successful compilation, each pattern is passed to the just-in-time compiler, if +available, and each subject line is passed directly to the JIT matcher via its +"fast path". +

    +

    +-jitverify +Behave as if each pattern line has the jitverify modifier; after +successful compilation, each pattern is passed to the just-in-time compiler, if +available, and the use of JIT for matching is verified. +

    +

    +-LM +List modifiers: write a list of available pattern and subject modifiers to the +standard output, then exit with zero exit code. All other options are ignored. +If both -C and -LM are present, whichever is first is recognized. +

    +

    +-pattern modifier-list +Behave as if each pattern line contains the given modifiers. +

    +

    +-q +Do not output the version number of pcre2test at the start of execution. +

    +

    +-S size +On Unix-like systems, set the size of the run-time stack to size +mebibytes (units of 1024*1024 bytes). +

    +

    +-subject modifier-list +Behave as if each subject line contains the given modifiers. +

    +

    +-t +Run each compile and match many times with a timer, and output the resulting +times per compile or match. When JIT is used, separate times are given for the +initial compile and the JIT compile. You can control the number of iterations +that are used for timing by following -t with a number (as a separate +item on the command line). For example, "-t 1000" iterates 1000 times. The +default is to iterate 500,000 times. +

    +

    +-tm +This is like -t except that it times only the matching phase, not the +compile phase. +

    +

    +-T -TM +These behave like -t and -tm, but in addition, at the end of a run, +the total times for all compiles and matches are output. +

    +

    +-version +Output the PCRE2 version number and then exit. +

    +
    DESCRIPTION
    +

    +If pcre2test is given two filename arguments, it reads from the first and +writes to the second. If the first name is "-", input is taken from the +standard input. If pcre2test is given only one argument, it reads from +that file and writes to stdout. Otherwise, it reads from stdin and writes to +stdout. +

    +

    +When pcre2test is built, a configuration option can specify that it +should be linked with the libreadline or libedit library. When this +is done, if the input is from a terminal, it is read using the readline() +function. This provides line-editing and history facilities. The output from +the -help option states whether or not readline() will be used. +

    +

    +The program handles any number of tests, each of which consists of a set of +input lines. Each set starts with a regular expression pattern, followed by any +number of subject lines to be matched against that pattern. In between sets of +test data, command lines that begin with # may appear. This file format, with +some restrictions, can also be processed by the perltest.sh script that +is distributed with PCRE2 as a means of checking that the behaviour of PCRE2 +and Perl is the same. For a specification of perltest.sh, see the +comments near its beginning. See also the #perltest command below. +

    +

    +When the input is a terminal, pcre2test prompts for each line of input, +using "re>" to prompt for regular expression patterns, and "data>" to prompt +for subject lines. Command lines starting with # can be entered only in +response to the "re>" prompt. +

    +

    +Each subject line is matched separately and independently. If you want to do +multi-line matches, you have to use the \n escape sequence (or \r or \r\n, +etc., depending on the newline setting) in a single line of input to encode the +newline sequences. There is no limit on the length of subject lines; the input +buffer is automatically extended if it is too small. There are replication +features that make it possible to generate long repetitive pattern or subject +lines without having to supply them explicitly. +

    +

    +An empty line or the end of the file signals the end of the subject lines for a +test, at which point a new pattern or command line is expected if there is +still input to be read. +

    +
    COMMAND LINES
    +

    +In between sets of test data, a line that begins with # is interpreted as a +command line. If the first character is followed by white space or an +exclamation mark, the line is treated as a comment, and ignored. Otherwise, the +following commands are recognized: +

    +  #forbid_utf
    +
    +Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP +options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and +the use of (*UTF) and (*UCP) at the start of patterns. This command also forces +an error if a subsequent pattern contains any occurrences of \P, \p, or \X, +which are still supported when PCRE2_UTF is not set, but which require Unicode +property support to be included in the library. +

    +

    +This is a trigger guard that is used in test files to ensure that UTF or +Unicode property tests are not accidentally added to files that are used when +Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and +PCRE2_NEVER_UCP as a default can also be obtained by the use of #pattern; +the difference is that #forbid_utf cannot be unset, and the automatic +options are not displayed in pattern information, to avoid cluttering up test +output. +

    +  #load <filename>
    +
    +This command is used to load a set of precompiled patterns from a file, as +described in the section entitled "Saving and restoring compiled patterns" +below. +
    +  #loadtables <filename>
    +
    +This command is used to load a set of binary character tables that can be +accessed by the tables=3 qualifier. Such tables can be created by the +pcre2_dftables program with the -b option. +
    +  #newline_default [<newline-list>]
    +
    +When PCRE2 is built, a default newline convention can be specified. This +determines which characters and/or character pairs are recognized as indicating +a newline in a pattern or subject string. The default can be overridden when a +pattern is compiled. The standard test files contain tests of various newline +conventions, but the majority of the tests expect a single linefeed to be +recognized as a newline by default. Without special action the tests would fail +when PCRE2 is compiled with either CR or CRLF as the default newline. +

    +

    +The #newline_default command specifies a list of newline types that are +acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF, +ANY, or NUL (in upper or lower case), for example: +

    +  #newline_default LF Any anyCRLF
    +
    +If the default newline is in the list, this command has no effect. Otherwise, +except when testing the POSIX API, a newline modifier that specifies the +first newline convention in the list (LF in the above example) is added to any +pattern that does not already have a newline modifier. If the newline +list is empty, the feature is turned off. This command is present in a number +of the standard test input files. +
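
The decision made by #newline_default can be summarized with a small Python sketch. It models only the rules stated above for the non-POSIX API; the function and parameter names are hypothetical and do not come from pcre2test itself.

```python
def newline_modifier_to_add(build_default, acceptable, pattern_has_newline=False):
    """Return the newline type to add to a pattern, or None for no change.

    build_default       -- the newline default PCRE2 was built with, e.g. "CR"
    acceptable          -- the types listed on the #newline_default line
    pattern_has_newline -- True if the pattern already has a newline modifier
    """
    wanted = [name.upper() for name in acceptable]  # case is not significant
    if not wanted:
        return None  # an empty list turns the feature off
    if build_default.upper() in wanted:
        return None  # the build default is acceptable, so nothing is added
    if pattern_has_newline:
        return None  # an explicit newline modifier is left alone
    return wanted[0]  # otherwise the first convention in the list is used
```

With the example line above (`#newline_default LF Any anyCRLF`), a library built with CR as the default would have newline=LF added to patterns, whereas a library built with ANY would be left unchanged.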

    +

    +When the POSIX API is being tested there is no way to override the default +newline convention, though it is possible to set the newline convention from +within the pattern. A warning is given if the posix or posix_nosub +modifier is used when #newline_default would set a default for the +non-POSIX API. +

    +  #pattern <modifier-list>
    +
    +This command sets a default modifier list that applies to all subsequent +patterns. Modifiers on a pattern can change these settings. +
    +  #perltest
    +
    +This line is used in test files that can also be processed by perltest.sh +to confirm that Perl gives the same results as PCRE2. Subsequent tests are +checked for the use of pcre2test features that are incompatible with the +perltest.sh script. +

    +

    +Patterns must use '/' as their delimiter, and only certain modifiers are +supported. Comment lines, #pattern commands, and #subject commands that set or +unset "mark" are recognized and acted on. The #perltest, #forbid_utf, and +#newline_default commands, which are needed in the relevant pcre2test files, +are silently ignored. All other command lines are ignored, but give a warning +message. The #perltest command helps detect tests that are accidentally +put in the wrong file or use the wrong delimiter. For more details of the +perltest.sh script see the comments it contains. +

    +  #pop [<modifiers>]
    +  #popcopy [<modifiers>]
    +
    +These commands are used to manipulate the stack of compiled patterns, as +described in the section entitled "Saving and restoring compiled patterns" +below. +
    +  #save <filename>
    +
    +This command is used to save a set of compiled patterns to a file, as described +in the section entitled "Saving and restoring compiled patterns" +below. +
    +  #subject <modifier-list>
    +
    +This command sets a default modifier list that applies to all subsequent +subject lines. Modifiers on a subject line can change these settings. +

    +
    MODIFIER SYNTAX
    +

    +Modifier lists are used with both pattern and subject lines. Items in a list +are separated by commas followed by optional white space. Trailing white space +in a modifier list is ignored. Some modifiers may be given for both patterns +and subject lines, whereas others are valid only for one or the other. Each +modifier has a long name, for example "anchored", and some of them must be +followed by an equals sign and a value, for example, "offset=12". Values cannot +contain comma characters, but may contain spaces. Modifiers that do not take +values may be preceded by a minus sign to turn off a previous setting. +

    +

    +A few of the more common modifiers can also be specified as single letters, for +example "i" for "caseless". In documentation, following the Perl convention, +these are written with a slash ("the /i modifier") for clarity. Abbreviated +modifiers must all be concatenated in the first item of a modifier list. If the +first item is not recognized as a long modifier name, it is interpreted as a +sequence of these abbreviations. For example: +

    +  /abc/ig,newline=cr,jit=3
    +
    +This is a pattern line whose modifier list starts with two one-letter modifiers +(/i and /g). The lower-case abbreviated modifiers are the same as used in Perl. +
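
The modifier-list rules above can be illustrated with a rough Python sketch. It assumes a deliberately small, hypothetical table of long names (KNOWN_LONG_NAMES) and uses only the single-letter abbreviations listed in the tables later in this document; the real pcre2test parser recognizes many more modifiers and performs validation that is omitted here.

```python
# Single-letter abbreviations taken from the modifier tables in this document.
ABBREVIATIONS = {
    "i": "caseless", "s": "dotall", "x": "extended", "m": "multiline",
    "n": "no_auto_capture", "B": "bincode", "I": "info",
}
# Hypothetical subset of long names, just enough for the examples.
KNOWN_LONG_NAMES = set(ABBREVIATIONS.values()) | {
    "anchored", "hex", "jit", "newline", "offset", "utf",
}

def parse_modifiers(text):
    """Return a list of (name, enabled, value) tuples for a modifier list."""
    result = []
    # Items are separated by commas; white space after a comma and at the
    # end of the list is ignored.
    items = [item.lstrip() for item in text.rstrip().split(",")]
    for index, item in enumerate(items):
        name, equals, value = item.partition("=")
        if index == 0 and not equals and name.lstrip("-") not in KNOWN_LONG_NAMES:
            # The first item is not a long name: treat it as abbreviations.
            for letter in name:
                if letter not in ABBREVIATIONS:
                    raise ValueError(f"unknown abbreviation {letter!r}")
                result.append((ABBREVIATIONS[letter], True, None))
            continue
        enabled = not name.startswith("-")  # a minus sign unsets a modifier
        result.append((name.lstrip("-"), enabled, value if equals else None))
    return result
```

For example, `parse_modifiers("im,newline=cr,jit=3")` yields caseless and multiline from the abbreviated first item, followed by the two long-name modifiers with their values.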

    +
    PATTERN SYNTAX
    +

    +A pattern line must start with one of the following characters (common symbols, +excluding pattern meta-characters): +

    +  / ! " ' ` - = _ : ; , % & @ ~
    +
    +This is interpreted as the pattern's delimiter. A regular expression may be +continued over several input lines, in which case the newline characters are +included within it. It is possible to include the delimiter within the pattern +by escaping it with a backslash, for example +
    +  /abc\/def/
    +
    +If you do this, the escape and the delimiter form part of the pattern, but +since the delimiters are all non-alphanumeric, this does not affect its +interpretation. If the terminating delimiter is immediately followed by a +backslash, for example, +
    +  /abc/\
    +
    +then a backslash is added to the end of the pattern. This is done to provide a +way of testing the error condition that arises if a pattern finishes with a +backslash, because +
    +  /abc\/
    +
    +is interpreted as the first line of a pattern that starts with "abc/", causing +pcre2test to read the next line as a continuation of the regular expression. +

    +

    +A pattern can be followed by a modifier list (details below). +

    +
    SUBJECT LINE SYNTAX
    +

    +Before each subject line is passed to pcre2_match() or +pcre2_dfa_match(), leading and trailing white space is removed, and the +line is scanned for backslash escapes, unless the subject_literal +modifier was set for the pattern. The following provide a means of encoding +non-printing characters in a visible way: +

    +  \a         alarm (BEL, \x07)
    +  \b         backspace (\x08)
    +  \e         escape (\x1b)
    +  \f         form feed (\x0c)
    +  \n         newline (\x0a)
    +  \r         carriage return (\x0d)
    +  \t         tab (\x09)
    +  \v         vertical tab (\x0b)
    +  \nnn       octal character (up to 3 octal digits); always
    +               a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
    +  \o{dd...}  octal character (any number of octal digits)
    +  \xhh       hexadecimal byte (up to 2 hex digits)
    +  \x{hh...}  hexadecimal character (any number of hex digits)
    +
    +The use of \x{hh...} is not dependent on the use of the utf modifier on +the pattern. It is recognized always. There may be any number of hexadecimal +digits inside the braces; invalid values provoke error messages. +

    +

    +Note that \xhh specifies one byte rather than one character in UTF-8 mode; +this makes it possible to construct invalid UTF-8 sequences for testing +purposes. On the other hand, \x{hh} is interpreted as a UTF-8 character in +UTF-8 mode, generating more than one byte if the value is greater than 127. +When testing the 8-bit library not in UTF-8 mode, \x{hh} generates one byte +for values less than 256, and causes an error for greater values. +
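
The difference between \xhh and \x{hh} for the 8-bit library can be shown with a short Python sketch. The function name x_escape is hypothetical, and the sketch relies on Python's own UTF-8 encoder, which rejects surrogate values; how pcre2test treats such values is not modeled here.

```python
def x_escape(value: int, braced: bool, utf: bool) -> bytes:
    """Bytes produced for \\xhh (braced=False) or \\x{hh...} (braced=True)
    when testing the 8-bit library."""
    if not braced:
        # \xhh always denotes exactly one byte, even in UTF-8 mode.
        if not 0 <= value <= 0xFF:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])
    if utf:
        # \x{hh...} denotes a character, encoded as UTF-8 in UTF-8 mode.
        return chr(value).encode("utf-8")
    if value > 0xFF:
        raise ValueError("value too large for the 8-bit library in non-UTF mode")
    return bytes([value])
```

In UTF-8 mode, \xe9 therefore gives the single (invalid on its own) byte E9, while \x{e9} gives the two-byte sequence C3 A9.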

    +

    +In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it +possible to construct invalid UTF-16 sequences for testing purposes. +

    +

    +In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This makes it +possible to construct invalid UTF-32 sequences for testing purposes. +

    +

    +There is a special backslash sequence that specifies replication of one or more +characters: +

    +  \[<characters>]{<count>}
    +
    +This makes it possible to test long strings without having to provide them as +part of the file. For example: +
    +  \[abc]{4}
    +
    +is converted to "abcabcabcabc". This feature does not support nesting. To +include a closing square bracket in the characters, code it as \x5D. +

    +

    +A backslash followed by an equals sign marks the end of the subject string and +the start of a modifier list. For example: +

    +  abc\=notbol,notempty
    +
    +If the subject string is empty and \= is followed by whitespace, the line is +treated as a comment line, and is not used for matching. For example: +
    +  \= This is a comment.
    +  abc\= This is an invalid modifier list.
    +
    +A backslash followed by any other non-alphanumeric character just escapes that +character. A backslash followed by anything else causes an error. However, if +the very last character in the line is a backslash (and there is no modifier +list), it is ignored. This gives a way of passing an empty line as data, since +a real empty line terminates the data input. +

    +

    +If the subject_literal modifier is set for a pattern, all subject lines +that follow are treated as literals, with no special treatment of backslashes. +No replication is possible, and any subject modifiers must be set as defaults +by a #subject command. +

    +
    PATTERN MODIFIERS
    +

    +There are several types of modifier that can appear in pattern lines. Except +where noted below, they may also be used in #pattern commands. A +pattern's modifier list can add to or override default modifiers that were set +by a previous #pattern command. +

    +
    +Setting compilation options +
    +

    +The following modifiers set options for pcre2_compile(). Most of them set +bits in the options argument of that function, but those whose names start with +PCRE2_EXTRA are additional options that are set in the compile context. For the +main options, there are some single-letter abbreviations that are the same as +Perl options. There is special handling for /x: if a second x is present, +PCRE2_EXTENDED is converted into PCRE2_EXTENDED_MORE as in Perl. A third +appearance adds PCRE2_EXTENDED as well, though this makes no difference to the +way pcre2_compile() behaves. See +pcre2api +for a description of the effects of these options. +

    +      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
    +      allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
    +      alt_bsux                  set PCRE2_ALT_BSUX
    +      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
    +      alt_verbnames             set PCRE2_ALT_VERBNAMES
    +      anchored                  set PCRE2_ANCHORED
    +      auto_callout              set PCRE2_AUTO_CALLOUT
    +      bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
    +  /i  caseless                  set PCRE2_CASELESS
    +      dollar_endonly            set PCRE2_DOLLAR_ENDONLY
    +  /s  dotall                    set PCRE2_DOTALL
    +      dupnames                  set PCRE2_DUPNAMES
    +      endanchored               set PCRE2_ENDANCHORED
    +      escaped_cr_is_lf          set PCRE2_EXTRA_ESCAPED_CR_IS_LF
    +  /x  extended                  set PCRE2_EXTENDED
    +  /xx extended_more             set PCRE2_EXTENDED_MORE
    +      extra_alt_bsux            set PCRE2_EXTRA_ALT_BSUX
    +      firstline                 set PCRE2_FIRSTLINE
    +      literal                   set PCRE2_LITERAL
    +      match_line                set PCRE2_EXTRA_MATCH_LINE
    +      match_invalid_utf         set PCRE2_MATCH_INVALID_UTF
    +      match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
    +      match_word                set PCRE2_EXTRA_MATCH_WORD
    +  /m  multiline                 set PCRE2_MULTILINE
    +      never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
    +      never_ucp                 set PCRE2_NEVER_UCP
    +      never_utf                 set PCRE2_NEVER_UTF
    +  /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
    +      no_auto_possess           set PCRE2_NO_AUTO_POSSESS
    +      no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
    +      no_start_optimize         set PCRE2_NO_START_OPTIMIZE
    +      no_utf_check              set PCRE2_NO_UTF_CHECK
    +      ucp                       set PCRE2_UCP
    +      ungreedy                  set PCRE2_UNGREEDY
    +      use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
    +      utf                       set PCRE2_UTF
    +
    +As well as turning on the PCRE2_UTF option, the utf modifier causes all +non-printing characters in output strings to be printed using the \x{hh...} +notation. Otherwise, those less than 0x100 are output in hex without the curly +brackets. Setting utf in 16-bit or 32-bit mode also causes pattern and +subject strings to be translated to UTF-16 or UTF-32, respectively, before +being passed to library functions. +

    +
    +Setting compilation controls +
    +

    +The following modifiers affect the compilation process or request information +about the pattern. There are single-letter abbreviations for some that are +heavily used in the test files. +

    +      bsr=[anycrlf|unicode]     specify \R handling
    +  /B  bincode                   show binary code without lengths
    +      callout_info              show callout information
    +      convert=<options>         request foreign pattern conversion
    +      convert_glob_escape=c     set glob escape character
    +      convert_glob_separator=c  set glob separator character
    +      convert_length            set convert buffer length
    +      debug                     same as info,fullbincode
    +      framesize                 show matching frame size
    +      fullbincode               show binary code with lengths
    +  /I  info                      show info about compiled pattern
    +      hex                       unquoted characters are hexadecimal
    +      jit[=<number>]            use JIT
    +      jitfast                   use JIT fast path
    +      jitverify                 verify JIT use
    +      locale=<name>             use this locale
    +      max_pattern_length=<n>    set the maximum pattern length
    +      memory                    show memory used
    +      newline=<type>            set newline type
    +      null_context              compile with a NULL context
    +      parens_nest_limit=<n>     set maximum parentheses depth
    +      posix                     use the POSIX API
    +      posix_nosub               use the POSIX API with REG_NOSUB
    +      push                      push compiled pattern onto the stack
    +      pushcopy                  push a copy onto the stack
    +      stackguard=<number>       test the stackguard feature
    +      subject_literal           treat all subject lines as literal
    +      tables=[0|1|2|3]          select internal tables
    +      use_length                do not zero-terminate the pattern
    +      utf8_input                treat input as UTF-8
    +
    +The effects of these modifiers are described in the following sections. +

    +
    +Newline and \R handling +
    +

    +The bsr modifier specifies what \R in a pattern should match. If it is +set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to "unicode", +\R matches any Unicode newline sequence. The default can be specified when +PCRE2 is built; if it is not, the default is set to Unicode. +

    +

    +The newline modifier specifies which characters are to be interpreted as +newlines, both in the pattern and in subject lines. The type must be one of CR, +LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case). +

    +
    +Information about a pattern +
    +

    +The debug modifier is a shorthand for info,fullbincode, requesting +all available information. +

    +

    +The bincode modifier causes a representation of the compiled code to be +output after compilation. This information does not contain length and offset +values, which ensures that the same output is generated for different internal +link sizes and different code unit widths. By using bincode, the same +regression tests can be used in different environments. +

    +

    +The fullbincode modifier, by contrast, does include length and +offset values. This is used in a few special tests that run only for specific +code unit widths and link sizes, and is also useful for one-off tests. +

    +

    +The info modifier requests information about the compiled pattern +(whether it is anchored, has a fixed first character, and so on). The +information is obtained from the pcre2_pattern_info() function. Here are +some typical examples: +

    +    re> /(?i)(^a|^b)/m,info
    +  Capture group count = 1
    +  Compile options: multiline
    +  Overall options: caseless multiline
    +  First code unit at start or follows newline
    +  Subject length lower bound = 1
    +
    +    re> /(?i)abc/info
    +  Capture group count = 0
    +  Compile options: <none>
    +  Overall options: caseless
    +  First code unit = 'a' (caseless)
    +  Last code unit = 'c' (caseless)
    +  Subject length lower bound = 3
    +
    +"Compile options" are those specified by modifiers; "overall options" have +added options that are taken or deduced from the pattern. If both sets of +options are the same, just a single "options" line is output; if there are no +options, the line is omitted. "First code unit" is where any match must start; +if there is more than one they are listed as "starting code units". "Last code +unit" is the last literal code unit that must be present in any match. This is +not necessarily the last character. These lines are omitted if no starting or +ending code units are recorded. The subject length line is omitted when +no_start_optimize is set because the minimum length is not calculated +when it can never be used. +

    +

    +The framesize modifier shows the size, in bytes, of the storage frames +used by pcre2_match() for handling backtracking. The size depends on the +number of capturing parentheses in the pattern. +

    +

    +The callout_info modifier requests information about all the callouts in +the pattern. A list of them is output at the end of any other information that +is requested. For each callout, either its number or string is given, followed +by the item that follows it in the pattern. +

    +
    +Passing a NULL context +
    +

    +Normally, pcre2test passes a context block to pcre2_compile(). If +the null_context modifier is set, however, NULL is passed. This is for +testing that pcre2_compile() behaves correctly in this case (it uses +default values). +

    +
    +Specifying pattern characters in hexadecimal +
    +

    +The hex modifier specifies that the characters of the pattern, except for +substrings enclosed in single or double quotes, are to be interpreted as pairs +of hexadecimal digits. This feature is provided as a way of creating patterns +that contain binary zeros and other non-printing characters. White space is +permitted between pairs of digits. For example, this pattern contains three +characters: +

    +  /ab 32 59/hex
    +
    +Parts of such a pattern are taken literally if quoted. This pattern contains +nine characters, only two of which are specified in hexadecimal: +
    +  /ab "literal" 32/hex
    +
    +Either single or double quotes may be used. There is no way of including +the delimiter within a substring. The hex and expand modifiers are +mutually exclusive. +
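
As an illustration of how a hex pattern is interpreted, the following Python sketch converts the text between the delimiters into bytes. It is an approximation written for this description (the name decode_hex_pattern is hypothetical) and assumes 8-bit characters in quoted substrings.

```python
import string

def decode_hex_pattern(text: str) -> bytes:
    """Convert the body of a pattern that has the hex modifier into bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # white space is allowed between pairs of digits
        elif ch in "'\"":
            end = text.find(ch, i + 1)  # a quoted substring is taken literally
            if end < 0:
                raise ValueError("unterminated quoted substring")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError(f"expected two hex digits at offset {i}")
            out.append(int(pair, 16))
            i += 2
    return bytes(out)
```

Applied to the two examples above, `ab 32 59` gives three bytes and `ab "literal" 32` gives nine, matching the character counts stated in the text.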

    +
    +Specifying the pattern's length +
    +

    +By default, patterns are passed to the compiling functions as zero-terminated +strings, but they can instead be passed with an explicit length. The +use_length modifier causes this to happen. A length is used +automatically (whether or not use_length is set) when hex is set, +because patterns specified in hexadecimal may contain binary zeros. +

    +

    +If hex or use_length is used with the POSIX wrapper API (see +"Using the POSIX wrapper API" +below), the REG_PEND extension is used to pass the pattern's length. +

    +
    +Specifying wide characters in 16-bit and 32-bit modes +
    +

    +In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and +translated to UTF-16 or UTF-32 when the utf modifier is set. For testing +the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input modifier +can be used. It is mutually exclusive with utf. Input lines are +interpreted as UTF-8 as a means of specifying wide characters. More details are +given in +"Input encoding" +above. +

    +
    +Generating long repetitive patterns +
    +

    +Some tests use long patterns that are very repetitive. Instead of creating a +very long input line for such a pattern, you can use a special repetition +feature, similar to the one described for subject lines above. If the +expand modifier is present on a pattern, parts of the pattern that have +the form +

    +  \[<characters>]{<count>}
    +
    +are expanded before the pattern is passed to pcre2_compile(). For +example, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction +cannot be nested. An initial "\[" sequence is recognized only if "]{" followed +by decimal digits and "}" is found later in the pattern. If not, the characters +remain in the pattern unaltered. The expand and hex modifiers are +mutually exclusive. +

    +

    +If part of an expanded pattern looks like an expansion, but is really part of +the actual pattern, unwanted expansion can be avoided by giving two values in +the quantifier. For example, \[AB]{6000,6000} is not recognized as an +expansion item. +
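
The expansion rule can be approximated with a regular-expression substitution in Python. This is a sketch of the behaviour described above, not the pcre2test implementation; the name expand_repeats is hypothetical.

```python
import re

# Matches \[<characters>]{<count>} where <count> is decimal digits only.
_REPEAT = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")

def expand_repeats(text: str) -> str:
    """Replace each \\[chars]{n} item with chars repeated n times."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), text)
```

Because the count must consist solely of digits, an item such as \[AB]{6000,6000} does not match and is left in the pattern unaltered, which is the escape route described above.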

    +

    +If the info modifier is set on an expanded pattern, the result of the +expansion is included in the information that is output. +

    +
    +JIT compilation +
    +

    +Just-in-time (JIT) compiling is a heavyweight optimization that can greatly +speed up pattern matching. See the +pcre2jit +documentation for details. JIT compiling happens, optionally, after a pattern +has been successfully compiled into an internal form. The JIT compiler converts +this to optimized machine code. It needs to know whether the match-time options +PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because +different code is generated for the different cases. See the partial +modifier in "Subject Modifiers" +below +for details of how these options are specified for each match attempt. +

    +

    +JIT compilation is requested by the jit pattern modifier, which may +optionally be followed by an equals sign and a number in the range 0 to 7. +The three bits that make up the number specify which of the three JIT operating +modes are to be compiled: +

    +  1  compile JIT code for non-partial matching
    +  2  compile JIT code for soft partial matching
    +  4  compile JIT code for hard partial matching
    +
    +The possible values for the jit modifier are therefore: +
    +  0  disable JIT
    +  1  normal matching only
    +  2  soft partial matching only
    +  3  normal and soft partial matching
    +  4  hard partial matching only
    +  5  normal and hard partial matching
    +  6  soft and hard partial matching only
    +  7  all three modes
    +
    +If no number is given, 7 is assumed. The phrase "partial matching" means a call +to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the +PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete +match; the options enable the possibility of a partial match, but do not +require it. Note also that if you request JIT compilation only for partial +matching (for example, jit=2) but do not set the partial modifier on a +subject line, that match will not use JIT code because none was compiled for +non-partial matching. +
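The way the three bits combine can be illustrated with a small Python sketch. The names jit_modes and jit_code_available are hypothetical helpers invented for this illustration; they model only the bit arithmetic described above, not any PCRE2 function.

```python
# Bit values taken from the table above.
JIT_NORMAL = 1
JIT_SOFT_PARTIAL = 2
JIT_HARD_PARTIAL = 4


def jit_modes(value: int = 7) -> list:
    """List the JIT modes that a jit=<value> setting asks to be compiled."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    modes = []
    if value & JIT_NORMAL:
        modes.append("normal")
    if value & JIT_SOFT_PARTIAL:
        modes.append("soft partial")
    if value & JIT_HARD_PARTIAL:
        modes.append("hard partial")
    return modes


def jit_code_available(value: int, partial=None) -> bool:
    """Say whether JIT code exists for a match of the given kind.

    partial is None for a non-partial match, or "soft" or "hard".
    """
    needed = {None: JIT_NORMAL, "soft": JIT_SOFT_PARTIAL, "hard": JIT_HARD_PARTIAL}
    return bool(value & needed[partial])


if __name__ == "__main__":
    print(jit_modes(6))
    print(jit_code_available(2, None))
```

With jit=2 and no partial modifier on the subject line, jit_code_available returns False, which corresponds to the case noted above where the match does not use JIT code.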

    +

    +If JIT compilation is successful, the compiled JIT code will automatically be +used when an appropriate type of match is run, except when incompatible +run-time options are specified. For more details, see the +pcre2jit +documentation. See also the jitstack modifier below for a way of +setting the size of the JIT stack. +

    +

    +If the jitfast modifier is specified, matching is done using the JIT +"fast path" interface, pcre2_jit_match(), which skips some of the sanity +checks that are done by pcre2_match(), and of course does not work when +JIT is not supported. If jitfast is specified without jit, jit=7 is +assumed. +

    +

    +If the jitverify modifier is specified, information about the compiled +pattern shows whether JIT compilation was or was not successful. If +jitverify is specified without jit, jit=7 is assumed. If JIT +compilation is successful when jitverify is set, the text "(JIT)" is +added to the first output line after a match or non match when JIT-compiled +code was actually used in the match. +

    +
    +Setting a locale +
    +

    +The locale modifier must specify the name of a locale, for example: +

    +  /pattern/locale=fr_FR
    +
    +The given locale is set, pcre2_maketables() is called to build a set of +character tables for the locale, and this is then passed to +pcre2_compile() when compiling the regular expression. The same tables +are used when matching the following subject lines. The locale modifier +applies only to the pattern on which it appears, but can be given in a +#pattern command if a default is needed. Setting a locale and alternate +character tables are mutually exclusive. +

    +
    +Showing pattern memory +
    +

    +The memory modifier causes the size in bytes of the memory used to hold +the compiled pattern to be output. This does not include the size of the +pcre2_code block; it is just the actual compiled data. If the pattern is +subsequently passed to the JIT compiler, the size of the JIT compiled code is +also output. Here is an example: +

    +    re> /a(b)c/jit,memory
    +  Memory allocation (code space): 21
    +  Memory allocation (JIT code): 1910
    +
    +
    +

    +
    +Limiting nested parentheses +
    +

    +The parens_nest_limit modifier sets a limit on the depth of nested +parentheses in a pattern. Breaching the limit causes a compilation error. +The default for the library is set when PCRE2 is built, but pcre2test +sets its own default of 220, which is required for running the standard test +suite. +

    +
    +Limiting the pattern length +
    +

    +The max_pattern_length modifier sets a limit, in code units, to the +length of pattern that pcre2_compile() will accept. Breaching the limit +causes a compilation error. The default is the largest number a PCRE2_SIZE +variable can hold (essentially unlimited). +

    +
    +Using the POSIX wrapper API +
    +

    +The posix and posix_nosub modifiers cause pcre2test to call +PCRE2 via the POSIX wrapper API rather than its native API. When +posix_nosub is used, the POSIX option REG_NOSUB is passed to +regcomp(). The POSIX wrapper supports only the 8-bit library. Note that +it does not imply POSIX matching semantics; for more detail see the +pcre2posix +documentation. The following pattern modifiers set options for the +regcomp() function: +

    +  caseless           REG_ICASE
    +  multiline          REG_NEWLINE
    +  dotall             REG_DOTALL     )
    +  ungreedy           REG_UNGREEDY   ) These options are not part of
    +  ucp                REG_UCP        )   the POSIX standard
    +  utf                REG_UTF8       )
    +
    +The regerror_buffsize modifier specifies a size for the error buffer that +is passed to regerror() in the event of a compilation error. For example: +
    +  /abc/posix,regerror_buffsize=20
    +
    +This provides a means of testing the behaviour of regerror() when the +buffer is too small for the error message. If this modifier has not been set, a +large buffer is used. +

    +

    +The aftertext and allaftertext subject modifiers work as described +below. All other modifiers are either ignored, with a warning message, or cause +an error. +

    +

    +The pattern is passed to regcomp() as a zero-terminated string by +default, but if the use_length or hex modifiers are set, the +REG_PEND extension is used to pass it by length. +

    +
    +Testing the stack guard feature +
    +

    +The stackguard modifier is used to test the use of +pcre2_set_compile_recursion_guard(), a function that is provided to +enable stack availability to be checked during compilation (see the +pcre2api +documentation for details). If the number specified by the modifier is greater +than zero, pcre2_set_compile_recursion_guard() is called to set up a +callback from pcre2_compile() to a local function. The argument it +receives is the current nesting parenthesis depth; if this is greater than the +value given by the modifier, non-zero is returned, causing the compilation to +be aborted. +

    +
    +Using alternative character tables +
    +

    +The value specified for the tables modifier must be one of the digits 0, +1, 2, or 3. It causes a specific set of built-in character tables to be passed +to pcre2_compile(). This is used in the PCRE2 tests to check behaviour +with different character tables. The digit specifies the tables as follows: +

    +  0   do not pass any special character tables
    +  1   the default ASCII tables, as distributed in
    +        pcre2_chartables.c.dist
    +  2   a set of tables defining ISO 8859 characters
    +  3   a set of tables loaded by the #loadtables command
    +
    +In tables 2, some characters whose codes are greater than 128 are identified as +letters, digits, spaces, etc. Tables 3 can be used only after a +#loadtables command has loaded them from a binary file. Setting alternate +character tables and a locale are mutually exclusive. +

    +
    +Setting certain match controls +
    +

    +The following modifiers are really subject modifiers, and are described under +"Subject Modifiers" below. However, they may be included in a pattern's +modifier list, in which case they are applied to every subject line that is +processed with that pattern. These modifiers do not affect the compilation +process. +

    +      aftertext                   show text after match
    +      allaftertext                show text after captures
    +      allcaptures                 show all captures
    +      allvector                   show the entire ovector
    +      allusedtext                 show all consulted text
    +      altglobal                   alternative global matching
    +  /g  global                      global matching
    +      jitstack=<n>                set size of JIT stack
    +      mark                        show mark values
    +      replace=<string>            specify a replacement string
    +      startchar                   show starting character when relevant
    +      substitute_callout          use substitution callouts
    +      substitute_extended         use PCRE2_SUBSTITUTE_EXTENDED
    +      substitute_literal          use PCRE2_SUBSTITUTE_LITERAL
    +      substitute_matched          use PCRE2_SUBSTITUTE_MATCHED
    +      substitute_overflow_length  use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
    +      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
    +      substitute_skip=<n>         skip substitution <n>
    +      substitute_stop=<n>         skip substitution <n> and following
    +      substitute_unknown_unset    use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
    +      substitute_unset_empty      use PCRE2_SUBSTITUTE_UNSET_EMPTY
    +
    +These modifiers may not appear in a #pattern command. If you want them as +defaults, set them in a #subject command. +

    +
    +Specifying literal subject lines +
    +

    +If the subject_literal modifier is present on a pattern, all the subject +lines that it matches are taken as literal strings, with no interpretation of +backslashes. It is not possible to set subject modifiers on such lines, but any +that are set as defaults by a #subject command are recognized. +

    +
    +Saving a compiled pattern +
    +

    +When a pattern with the push modifier is successfully compiled, it is +pushed onto a stack of compiled patterns, and pcre2test expects the next +line to contain a new pattern (or a command) instead of a subject line. This +facility is used when saving compiled patterns to a file, as described in the +section entitled "Saving and restoring compiled patterns" +below. +If pushcopy is used instead of push, a copy of the compiled +pattern is stacked, leaving the original as current, ready to match the +following input lines. This provides a way of testing the +pcre2_code_copy() function. +The push and pushcopy modifiers are incompatible with pattern +modifiers such as global that act at match time. Any that are specified +are ignored (for the stacked copy), with a warning message, except for +replace, which causes an error. Note that jitverify, which is +allowed, does not carry through to any subsequent matching that uses a stacked +pattern. +

    +
    +Testing foreign pattern conversion +
    +

    +The experimental foreign pattern conversion functions in PCRE2 can be tested by +setting the convert modifier. Its argument is a colon-separated list of +options, which set the equivalent option for the pcre2_pattern_convert() +function: +

    +  glob                    PCRE2_CONVERT_GLOB
    +  glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
    +  glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
    +  posix_basic             PCRE2_CONVERT_POSIX_BASIC
    +  posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
    +  unset                   Unset all options
    +
    +The "unset" value is useful for turning off a default that has been set by a +#pattern command. When one of these options is set, the input pattern is +passed to pcre2_pattern_convert(). If the conversion is successful, the +result is reflected in the output and then passed to pcre2_compile(). The +normal utf and no_utf_check options, if set, cause the +PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be passed to +pcre2_pattern_convert(). +

    +

    +By default, the conversion function is allowed to allocate a buffer for its +output. However, if the convert_length modifier is set to a value greater +than zero, pcre2test passes a buffer of the given length. This makes it +possible to test the length check. +

    +

    +The convert_glob_escape and convert_glob_separator modifiers can be +used to specify the escape and separator characters for glob processing, +overriding the defaults, which are operating-system dependent. +

    +
    SUBJECT MODIFIERS
    +

    +The modifiers that can appear in subject lines and the #subject +command are of two types. +

    +
    +Setting match options +
    +

    +The following modifiers set options for pcre2_match() or +pcre2_dfa_match(). See +pcre2api +for a description of their effects. +

    +      anchored                  set PCRE2_ANCHORED
    +      endanchored               set PCRE2_ENDANCHORED
    +      dfa_restart               set PCRE2_DFA_RESTART
    +      dfa_shortest              set PCRE2_DFA_SHORTEST
    +      no_jit                    set PCRE2_NO_JIT
    +      no_utf_check              set PCRE2_NO_UTF_CHECK
    +      notbol                    set PCRE2_NOTBOL
    +      notempty                  set PCRE2_NOTEMPTY
    +      notempty_atstart          set PCRE2_NOTEMPTY_ATSTART
    +      noteol                    set PCRE2_NOTEOL
    +      partial_hard (or ph)      set PCRE2_PARTIAL_HARD
    +      partial_soft (or ps)      set PCRE2_PARTIAL_SOFT
    +
    +The partial matching modifiers are provided with abbreviations because they +appear frequently in tests. +

    +

    +If the posix or posix_nosub modifier was present on the pattern, +causing the POSIX wrapper API to be used, the only option-setting modifiers +that have any effect are notbol, notempty, and noteol, +causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to +regexec(). The other modifiers are ignored, with a warning message. +

    +

    +There is one additional modifier that can be used with the POSIX wrapper. It is +ignored (with a warning) if used for non-POSIX matching. +

    +      posix_startend=<n>[:<m>]
    +
    +This causes the subject string to be passed to regexec() using the +REG_STARTEND option, which uses offsets to specify which part of the string is +searched. If only one number is given, the end offset is passed as the end of +the subject string. For more detail of REG_STARTEND, see the +pcre2posix +documentation. If the subject string contains binary zeros (coded as escapes +such as \x{00} because pcre2test does not support actual binary zeros in +its input), you must use posix_startend to specify its length. +

    +
    +Setting match controls +
    +

    +The following modifiers affect the matching process or request additional +information. Some of them may also be specified on a pattern line (see above), +in which case they apply to every subject line that is matched against that +pattern, but can be overridden by modifiers on the subject. +

    +      aftertext                  show text after match
    +      allaftertext               show text after captures
    +      allcaptures                show all captures
    +      allvector                  show the entire ovector
    +      allusedtext                show all consulted text (non-JIT only)
    +      altglobal                  alternative global matching
    +      callout_capture            show captures at callout time
    +      callout_data=<n>           set a value to pass via callouts
    +      callout_error=<n>[:<m>]    control callout error
    +      callout_extra              show extra callout information
    +      callout_fail=<n>[:<m>]     control callout failure
    +      callout_no_where           do not show position of a callout
    +      callout_none               do not supply a callout function
    +      copy=<number or name>      copy captured substring
    +      depth_limit=<n>            set a depth limit
    +      dfa                        use pcre2_dfa_match()
    +      find_limits                find match and depth limits
    +      get=<number or name>       extract captured substring
    +      getall                     extract all captured substrings
    +  /g  global                     global matching
    +      heap_limit=<n>             set a limit on heap memory (Kbytes)
    +      jitstack=<n>               set size of JIT stack
    +      mark                       show mark values
    +      match_limit=<n>            set a match limit
    +      memory                     show heap memory usage
    +      null_context               match with a NULL context
    +      offset=<n>                 set starting offset
    +      offset_limit=<n>           set offset limit
    +      ovector=<n>                set size of output vector
    +      recursion_limit=<n>        obsolete synonym for depth_limit
    +      replace=<string>           specify a replacement string
    +      startchar                  show startchar when relevant
    +      startoffset=<n>            same as offset=<n>
    +      substitute_callout         use substitution callouts
    +      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
    +      substitute_literal         use PCRE2_SUBSTITUTE_LITERAL
    +      substitute_matched         use PCRE2_SUBSTITUTE_MATCHED
    +      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
    +      substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
    +      substitute_skip=<n>        skip substitution number n
    +      substitute_stop=<n>        skip substitution number n and greater
    +      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
    +      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
    +      zero_terminate             pass the subject as zero-terminated
    +
    +The effects of these modifiers are described in the following sections. When +matching via the POSIX wrapper API, the aftertext, allaftertext, +and ovector subject modifiers work as described below. All other +modifiers are either ignored, with a warning message, or cause an error. +

    +
    +Showing more text +
    +

    +The aftertext modifier requests that as well as outputting the part of +the subject string that matched the entire pattern, pcre2test should in +addition output the remainder of the subject string. This is useful for tests +where the subject contains multiple copies of the same substring. The +allaftertext modifier requests the same action for captured substrings as +well as the main matched substring. In each case the remainder is output on the +following line with a plus character following the capture number. +

    +

    +The allusedtext modifier requests that all the text that was consulted +during a successful pattern match by the interpreter should be shown, for both +full and partial matches. This feature is not supported for JIT matching, and +if requested with JIT it is ignored (with a warning message). Setting this +modifier affects the output if there is a lookbehind at the start of a match, +or, for a complete match, a lookahead at the end, or if \K is used in the +pattern. Characters that precede or follow the start and end of the actual +match are indicated in the output by '<' or '>' characters underneath them. +Here is an example: +

    +    re> /(?<=pqr)abc(?=xyz)/
    +  data> 123pqrabcxyz456\=allusedtext
    +   0: pqrabcxyz
    +      <<<   >>>
    +  data> 123pqrabcxy\=ph,allusedtext
    +  Partial match: pqrabcxy
    +                 <<<
    +
    +The first, complete match shows that the matched string is "abc", with the +preceding and following strings "pqr" and "xyz" having been consulted during +the match (when processing the assertions). The partial match can indicate only +the preceding string. +

    +

    +The startchar modifier requests that the starting character for the match +be indicated, if it is different to the start of the matched string. The only +time when this occurs is when \K has been processed as part of the match. In +this situation, the output for the matched string is displayed from the +starting character instead of from the match point, with circumflex characters +under the earlier characters. For example: +

    +    re> /abc\Kxyz/
    +  data> abcxyz\=startchar
    +   0: abcxyz
    +      ^^^
    +
    +Unlike allusedtext, the startchar modifier can be used with JIT. +However, these two modifiers are mutually exclusive. +

    +
    +Showing the value of all capture groups +
    +

    +The allcaptures modifier requests that the values of all potential +captured parentheses be output after a match. By default, only those up to the +highest one actually used in the match are output (corresponding to the return +code from pcre2_match()). Groups that did not take part in the match +are output as "<unset>". This modifier is not relevant for DFA matching (which +does no capturing) and does not apply when replace is specified; it is +ignored, with a warning message, if present. +

    +
    +Showing the entire ovector, for all outcomes +
    +

    +The allvector modifier requests that the entire ovector be shown, +whatever the outcome of the match. Compare allcaptures, which shows only +up to the maximum number of capture groups for the pattern, and then only for a +successful complete non-DFA match. This modifier, which acts after any match +result, and also for DFA matching, provides a means of checking that there are +no unexpected modifications to ovector fields. Before each match attempt, the +ovector is filled with a special value, and if this is found in both elements +of a capturing pair, "<unchanged>" is output. After a successful match, this +applies to all groups after the maximum capture group for the pattern. In other +cases it applies to the entire ovector. After a partial match, the first two +elements are the only ones that should be set. After a DFA match, the amount of +ovector that is used depends on the number of matches that were found. +

    +
    +Testing pattern callouts +
    +

    +A callout function is supplied when pcre2test calls the library matching +functions, unless callout_none is specified. Its behaviour can be +controlled by various modifiers listed above whose names begin with +callout_. Details are given in the section entitled "Callouts" +below. +Testing callouts from pcre2_substitute() is described separately in +"Testing the substitution function" +below. +

    +
    +Finding all matches in a string +
    +

    +Searching for all possible matches within a subject can be requested by the +global or altglobal modifier. After finding a match, the matching +function is called again to search the remainder of the subject. The difference +between global and altglobal is that the former uses the +start_offset argument to pcre2_match() or pcre2_dfa_match() +to start searching at a new point within the entire string (which is what Perl +does), whereas the latter passes over a shortened subject. This makes a +difference to the matching process if the pattern begins with a lookbehind +assertion (including \b or \B). +

    +

    +If an empty string is matched, the next match is done with the +PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for +another, non-empty, match at the same point in the subject. If this match +fails, the start offset is advanced, and the normal match is retried. This +imitates the way Perl handles such cases when using the /g modifier or +the split() function. Normally, the start offset is advanced by one +character, but if the newline convention recognizes CRLF as a newline, and the +current character is CR followed by LF, an advance of two characters occurs. +

    +
    +Testing substring extraction functions +
    +

    +The copy and get modifiers can be used to test the +pcre2_substring_copy_xxx() and pcre2_substring_get_xxx() functions. +They can be given more than once, and each can specify a capture group name or +number, for example: +

    +   abcd\=copy=1,copy=3,get=G1
    +
    +If the #subject command is used to set default copy and/or get lists, +these can be unset by specifying a negative number to cancel all numbered +groups and an empty name to cancel all named groups. +

    +

    +The getall modifier tests pcre2_substring_list_get(), which +extracts all captured substrings. +

    +

    +If the subject line is successfully matched, the substrings extracted by the +convenience functions are output with C, G, or L after the string number +instead of a colon. This is in addition to the normal full list. The string +length (that is, the return from the extraction function) is given in +parentheses after each substring, followed by the name when the extraction was +by name. +

    +
    +Testing the substitution function +
    +

    +If the replace modifier is set, the pcre2_substitute() function is +called instead of one of the matching functions (or after one call of +pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that +replacement strings cannot contain commas, because a comma signifies the end of +a modifier. This is not thought to be an issue in a test program. +

    +

    +Specifying a completely empty replacement string disables this modifier. +However, it is possible to specify an empty replacement by providing a buffer +length, as described below, for an otherwise empty replacement. +

    +

    +Unlike subject strings, pcre2test does not process replacement strings +for escape sequences. In UTF mode, a replacement string is checked to see if it +is a valid UTF-8 string. If so, it is correctly converted to a UTF string of +the appropriate code unit width. If it is not a valid UTF-8 string, the +individual code units are copied directly. This provides a means of passing an +invalid UTF-8 string for testing purposes. +

    +

    +The following modifiers set options (in addition to the normal match options) +for pcre2_substitute(): +

    +  global                      PCRE2_SUBSTITUTE_GLOBAL
    +  substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
    +  substitute_literal          PCRE2_SUBSTITUTE_LITERAL
    +  substitute_matched          PCRE2_SUBSTITUTE_MATCHED
    +  substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
    +  substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY
    +  substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
    +  substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
    +
    +See the +pcre2api +documentation for details of these options. +

    +

    +After a successful substitution, the modified string is output, preceded by the +number of replacements. This may be zero if there were no matches. Here is a +simple example of a substitution test: +

    +  /abc/replace=xxx
    +      =abc=abc=
    +   1: =xxx=abc=
    +      =abc=abc=\=global
    +   2: =xxx=xxx=
    +
    +Subject and replacement strings should be kept relatively short (fewer than 256 +characters) for substitution tests, as fixed-size buffers are used. To make it +easy to test for buffer overflow, if the replacement string starts with a +number in square brackets, that number is passed to pcre2_substitute() as +the size of the output buffer, with the replacement string starting at the next +character. Here is an example that tests the edge case: +
    +  /abc/
    +      123abc123\=replace=[10]XYZ
    +   1: 123XYZ123
    +      123abc123\=replace=[9]XYZ
    +  Failed: error -47: no more memory
    +
    +The default action of pcre2_substitute() is to return +PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the +substitute_overflow_length modifier), pcre2_substitute() continues +to go through the motions of matching and substituting (but not doing any +callouts), in order to compute the size of buffer that is required. When this +happens, pcre2test shows the required buffer length (which includes space +for the trailing zero) as part of the error message. For example: +
    +  /abc/substitute_overflow_length
    +      123abc123\=replace=[9]XYZ
    +  Failed: error -47: no more memory: 10 code units are needed
    +
    +A replacement string is ignored with POSIX and DFA matching. Specifying partial +matching provokes an error return ("bad option value") from +pcre2_substitute(). +
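The buffer arithmetic behind the two examples above can be checked with a short Python sketch. It is only a model under stated assumptions: required_code_units and substitute_fits are hypothetical names, one character is treated as one code unit, and a single literal replacement stands in for what pcre2_substitute() does.

```python
def required_code_units(subject: str, matched: str, replacement: str) -> int:
    """Code units needed for the result of one substitution.

    The count includes one extra unit for the trailing zero, as in the
    "code units are needed" message shown above.
    """
    result = subject.replace(matched, replacement, 1)
    return len(result) + 1


def substitute_fits(buffer_size: int, subject: str, matched: str,
                    replacement: str) -> bool:
    """Say whether an output buffer of buffer_size code units is big enough."""
    return buffer_size >= required_code_units(subject, matched, replacement)


if __name__ == "__main__":
    print(required_code_units("123abc123", "abc", "XYZ"))
    print(substitute_fits(9, "123abc123", "abc", "XYZ"))
```

The result "123XYZ123" is nine code units long, so ten are needed once the trailing zero is counted. That is why [10] succeeds and [9] fails in the examples.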

    +
    +Testing substitute callouts +
    +

    +If the substitute_callout modifier is set, a substitution callout +function is set up. The null_context modifier must not be set, because +the address of the callout function is passed in a match context. When the +callout function is called (after each substitution), details of the input +and output strings are output. For example: +

    +  /abc/g,replace=<$0>,substitute_callout
    +      abcdefabcpqr
    +   1(1) Old 0 3 "abc" New 0 5 "<abc>"
    +   2(1) Old 6 9 "abc" New 8 13 "<abc>"
    +   2: <abc>def<abc>pqr
    +
    +The first number on each callout line is the count of matches. The +parenthesized number is the number of pairs that are set in the ovector (that +is, one more than the number of capturing groups that were set). Then are +listed the offsets of the old substring, its contents, and the same for the +replacement. +

    +

    +By default, the substitution callout function returns zero, which accepts the +replacement and causes matching to continue if /g was used. Two further +modifiers can be used to test other return values. If substitute_skip is +set to a value greater than zero the callout function returns +1 for the match +of that number, and similarly substitute_stop returns -1. These cause the +replacement to be rejected, and -1 causes no further matching to take place. If +either of them is set, substitute_callout is assumed. For example: +

    +  /abc/g,replace=<$0>,substitute_skip=1
    +      abcdefabcpqr
    +   1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED"
    +   2(1) Old 6 9 "abc" New 6 11 "<abc>"
    +   2: abcdef<abc>pqr
    +      abcdefabcpqr\=substitute_stop=1
    +   1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED"
    +   1: abcdefabcpqr
    +
    +If both are set for the same number, stop takes precedence. Only a single skip +or stop is supported, which is sufficient for testing that the feature works. +
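The offsets in the callout lines above can be reproduced with a small simulation. This is a sketch only: Python's re module stands in for PCRE2, simulate_substitution is a hypothetical name, and the only replacement syntax handled is a literal "$0" for the whole match.

```python
import re


def simulate_substitution(pattern, subject, template, skip=0, stop=0):
    """Model a global substitution with skip and stop callout results.

    Returns the final string and one tuple per callout:
    (match number, old start, old end, new start, new end, verdict).
    """
    pieces = []      # parts of the output string built so far
    callouts = []
    last = 0         # offset in subject up to which text has been copied
    for number, match in enumerate(re.finditer(pattern, subject), start=1):
        pieces.append(subject[last:match.start()])
        new_start = sum(len(piece) for piece in pieces)
        replacement = template.replace("$0", match.group(0))
        # Stop is tested first because it takes precedence over skip.
        if number == stop:
            verdict = "STOPPED"
        elif number == skip:
            verdict = "SKIPPED"
        else:
            verdict = ""
        callouts.append((number, match.start(), match.end(),
                         new_start, new_start + len(replacement), verdict))
        if verdict == "STOPPED":
            last = match.start()   # keep this match and the rest unchanged
            break
        # A skipped replacement is rejected, so the matched text is kept.
        pieces.append(match.group(0) if verdict else replacement)
        last = match.end()
    pieces.append(subject[last:])
    return "".join(pieces), callouts


if __name__ == "__main__":
    print(simulate_substitution("abc", "abcdefabcpqr", "<$0>", skip=1))
```

The new offsets are reported as if the replacement had been made, even when it is then rejected. That is why the skipped first match still shows "New 0 5" while the second match starts at offset 6 rather than 8.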

    +
    +Setting the JIT stack size +
    +

    +The jitstack modifier provides a way of setting the maximum stack size +that is used by the just-in-time optimization code. It is ignored if JIT +optimization is not being used. The value is a number of kibibytes (units of +1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack +that is larger than the default is necessary only for very complicated +patterns. If jitstack is set non-zero on a subject line it overrides any +value that was set on the pattern. +

    +
    +Setting heap, match, and depth limits +
    +

    +The heap_limit, match_limit, and depth_limit modifiers set +the appropriate limits in the match context. These values are ignored when the +find_limits modifier is specified. +

    +
    +Finding minimum limits +
    +

    +If the find_limits modifier is present on a subject line, pcre2test +calls the relevant matching function several times, setting different values in +the match context via pcre2_set_heap_limit(), +pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds +the minimum values for each parameter that allows the match to complete without +error. If JIT is being used, only the match limit is relevant. +
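One way to search for such a minimum is to double a trial limit until the match completes and then bisect. The Python sketch below shows that idea only. The text above does not say which search strategy pcre2test uses, so the strategy is an assumption, and find_minimum_limit and match_completes are hypothetical names. The sketch also assumes that a match which completes at one limit completes at every larger limit.

```python
def find_minimum_limit(match_completes, upper_bound=2**32 - 1):
    """Return the smallest limit for which match_completes(limit) is true.

    match_completes is a caller-supplied function that runs the match
    with the given limit and reports whether it finished without a
    limit error.
    """
    if match_completes(0):
        return 0
    low, high = 0, 1          # low is always a limit known to fail
    # Phase 1: double the trial limit until the match completes.
    while not match_completes(high):
        if high >= upper_bound:
            raise ValueError("match does not complete within upper_bound")
        low = high
        high = min(high * 2, upper_bound)
    # Phase 2: bisect between the failing and the succeeding limit.
    while high - low > 1:
        middle = (low + high) // 2
        if match_completes(middle):
            high = middle
        else:
            low = middle
    return high


if __name__ == "__main__":
    # Stand-in for a real match that needs a limit of at least 37.
    print(find_minimum_limit(lambda limit: limit >= 37))
```

In a real test each parameter would be searched separately, with match_completes setting the value under test in the match context before running the match.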

    +

    +When using this modifier, the pattern should not contain any limit settings +such as (*LIMIT_MATCH=...) within it. If such a setting is present and is +lower than the minimum matching value, the minimum value cannot be found +because pcre2_set_match_limit() etc. are only able to reduce the value of +an in-pattern limit; they cannot increase it. +

    +

    +For non-DFA matching, the minimum depth_limit number is a measure of how +much nested backtracking happens (that is, how deeply the pattern's tree is +searched). In the case of DFA matching, depth_limit controls the depth of +recursive calls of the internal function that is used for handling pattern +recursion, lookaround assertions, and atomic groups. +

    +

    +For non-DFA matching, the match_limit number is a measure of the amount +of backtracking that takes place, and learning the minimum value can be +instructive. For most simple matches, the number is quite small, but for +patterns with very large numbers of matching possibilities, it can become large +very quickly with increasing length of subject string. In the case of DFA +matching, match_limit controls the total number of calls, both recursive +and non-recursive, to the internal matching function, thus controlling the +overall amount of computing resource that is used. +

    +

    +For both kinds of matching, the heap_limit number, which is in kibibytes +(units of 1024 bytes), limits the amount of heap memory used for matching. A +value of zero disables the use of any heap memory; many simple pattern matches +can be done without using the heap, so zero is not an unreasonable setting. +

    +
    +Showing MARK names +
    +

    +The mark modifier causes the names from backtracking control verbs that +are returned from calls to pcre2_match() to be displayed. If a mark is +returned for a match, non-match, or partial match, pcre2test shows it. +For a match, it is on a line by itself, tagged with "MK:". Otherwise, it +is added to the non-match message. +

    +
    +Showing memory usage +
    +

    +The memory modifier causes pcre2test to log the sizes of all heap +memory allocation and freeing calls that occur during a call to +pcre2_match() or pcre2_dfa_match(). These occur only when a match +requires a bigger vector than the default for remembering backtracking points +(pcre2_match()) or for internal workspace (pcre2_dfa_match()). In +many cases there will be no heap memory used and therefore no additional +output. No heap memory is allocated during matching with JIT, so in that case +the memory modifier never has any effect. For this modifier to work, the +null_context modifier must not be set on both the pattern and the +subject, though it can be set on one or the other. +

    +
    +Setting a starting offset +
    +

    +The offset modifier sets an offset in the subject string at which +matching starts. Its value is a number of code units, not characters. +
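The distinction between code units and characters matters as soon as a subject contains characters that need more than one code unit. As a rough illustration in Python (the helper `code_unit_offset` and the `UNIT_BYTES` table are hypothetical, not part of pcre2test), the code unit offset of a given character position can be computed by encoding the prefix:

```python
# Bytes per code unit for the 8-bit, 16-bit and 32-bit encodings.
UNIT_BYTES = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}

def code_unit_offset(text, char_index, encoding="utf-8"):
    """Number of code units that precede the character at char_index."""
    prefix = text[:char_index]
    return len(prefix.encode(encoding)) // UNIT_BYTES[encoding]

subject = "na\u00efve"   # the third character needs two code units in UTF-8
print(code_unit_offset(subject, 3, "utf-8"))      # offset of "v" in 8-bit units
print(code_unit_offset(subject, 3, "utf-16-le"))  # offset of "v" in 16-bit units
```

For this subject the character index 3 corresponds to offset 4 in UTF-8 but offset 3 in UTF-16, so the value given to the modifier depends on the code unit width in use.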

    +
    +Setting an offset limit +
    +

    +The offset_limit modifier sets a limit for unanchored matches. If a match +cannot be found starting at or before this offset in the subject, a "no match" +return is given. The data value is a number of code units, not characters. When +this modifier is used, the use_offset_limit modifier must have been set +for the pattern; if not, an error is generated. +

    +
    +Setting the size of the output vector +
    +

    +The ovector modifier applies only to the subject line in which it +appears, though of course it can also be used to set a default in a +#subject command. It specifies the number of pairs of offsets that are +available for storing matching information. The default is 15. +

    +

    +A value of zero is useful when testing the POSIX API because it causes +regexec() to be called with a NULL capture vector. When not testing the +POSIX API, a value of zero is used to cause +pcre2_match_data_create_from_pattern() to be called, in order to create a +match block of exactly the right size for the pattern. (It is not possible to +create a match block with a zero-length ovector; there is always at least one +pair of offsets.) +

    +
    +Passing the subject as zero-terminated +
    +

    +By default, the subject string is passed to a native API matching function with +its correct length. In order to test the facility for passing a zero-terminated +string, the zero_terminate modifier is provided. It causes the length to +be passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface, +this modifier is ignored, with a warning. +

    +

    +When testing pcre2_substitute(), this modifier also has the effect of +passing the replacement string as zero-terminated. +

    +
    +Passing a NULL context +
    +

    +Normally, pcre2test passes a context block to pcre2_match(), +pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). +If the null_context modifier is set, however, NULL is passed. This is for +testing that the matching and substitution functions behave correctly in this +case (they use default values). This modifier cannot be used with the +find_limits or substitute_callout modifiers. +

    +
    THE ALTERNATIVE MATCHING FUNCTION
    +

    +By default, pcre2test uses the standard PCRE2 matching function, +pcre2_match() to match each subject line. PCRE2 also supports an +alternative matching function, pcre2_dfa_match(), which operates in a +different way, and has some restrictions. The differences between the two +functions are described in the +pcre2matching +documentation. +

    +

    +If the dfa modifier is set, the alternative matching function is used. +This function finds all possible matches at a given point in the subject. If, +however, the dfa_shortest modifier is set, processing stops after the +first match is found. This is always the shortest possible match. +

    +
    DEFAULT OUTPUT FROM pcre2test
    +

    +This section describes the output when the normal matching function, +pcre2_match(), is being used. +

    +

    +When a match succeeds, pcre2test outputs the list of captured substrings, +starting with number 0 for the string that matched the whole pattern. +Otherwise, it outputs "No match" when the return is PCRE2_ERROR_NOMATCH, or +"Partial match:" followed by the partially matching substring when the +return is PCRE2_ERROR_PARTIAL. (Note that this is the +entire substring that was inspected during the partial match; it may include +characters before the actual match start if a lookbehind assertion, \K, \b, +or \B was involved.) +

    +

    +For any other return, pcre2test outputs the PCRE2 negative error number +and a short descriptive phrase. If the error is a failed UTF string check, the +code unit offset of the start of the failing character is also output. Here is +an example of an interactive pcre2test run. +

    +  $ pcre2test
    +  PCRE2 version 10.22 2016-07-29
    +
    +    re> /^abc(\d+)/
    +  data> abc123
    +   0: abc123
    +   1: 123
    +  data> xyz
    +  No match
    +
    +Unset capturing substrings that are not followed by one that is set are not +shown by pcre2test unless the allcaptures modifier is specified. In +the following example, there are two capturing substrings, but when the first +data line is matched, the second, unset substring is not shown. An "internal" +unset substring is shown as "<unset>", as for the second data line. +
    +    re> /(a)|(b)/
    +  data> a
    +   0: a
    +   1: a
    +  data> b
    +   0: b
    +   1: <unset>
    +   2: b
    +
    +If the strings contain any non-printing characters, they are output as \xhh +escapes if the value is less than 256 and UTF mode is not set. Otherwise they +are output as \x{hh...} escapes. See below for the definition of non-printing +characters. If the aftertext modifier is set, the output for substring +0 is followed by the rest of the subject string, identified by "0+" like +this: +
    +    re> /cat/aftertext
    +  data> cataract
    +   0: cat
    +   0+ aract
    +
    +If global matching is requested, the results of successive matching attempts +are output in sequence, like this: +
    +    re> /\Bi(\w\w)/g
    +  data> Mississippi
    +   0: iss
    +   1: ss
    +   0: iss
    +   1: ss
    +   0: ipp
    +   1: pp
    +
    +"No match" is output only if the first match attempt fails. Here is an example +of a failure message (the offset 4 that is specified by the offset +modifier is past the end of the subject string): +
    +    re> /xyz/
    +  data> xyz\=offset=4
    +  Error -24 (bad offset value)
    +
    +

    +

    +Note that whereas patterns can be continued over several lines (a plain ">" +prompt is used for continuations), subject lines may not. However, newlines can +be included in a subject by means of the \n escape (or \r, \r\n, etc., +depending on the newline sequence setting). +

    +
    OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
    +

    +When the alternative matching function, pcre2_dfa_match(), is used, the +output consists of a list of all the matches that start at the first point in +the subject where there is at least one match. For example: +

    +    re> /(tang|tangerine|tan)/
    +  data> yellow tangerine\=dfa
    +   0: tangerine
    +   1: tang
    +   2: tan
    +
    +Using the normal matching function on this data finds only "tang". The +longest matching string is always given first (and numbered zero). After a +PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the +partially matching substring. Note that this is the entire substring that was +inspected during the partial match; it may include characters before the actual +match start if a lookbehind assertion, \b, or \B was involved. (\K is not +supported for DFA matching.) +
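The shape of this output can be reproduced with a toy model. The Python sketch below handles only literal alternatives and is not a DFA matcher; the function name `all_matches_at_first_point` is hypothetical. It finds the first subject position where at least one alternative matches and lists every match there, longest first.

```python
def all_matches_at_first_point(alternatives, subject):
    """Return (start, matches) for the first position where any literal
    alternative matches, with the matches ordered longest first."""
    for start in range(len(subject) + 1):
        found = [alt for alt in alternatives if subject.startswith(alt, start)]
        if found:
            # Two different strings matching at one start cannot have the
            # same length, so sorting by length gives a unique order.
            return start, sorted(set(found), key=len, reverse=True)
    return None

print(all_matches_at_first_point(["tang", "tangerine", "tan"], "yellow tangerine"))
```

For the example above this yields the three strings in the same order as the numbered output lines 0, 1 and 2.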

    +

    +If global matching is requested, the search for further matches resumes +at the end of the longest match. For example: +

    +    re> /(tang|tangerine|tan)/g
    +  data> yellow tangerine and tangy sultana\=dfa
    +   0: tangerine
    +   1: tang
    +   2: tan
    +   0: tang
    +   1: tan
    +   0: tan
    +
    +The alternative matching function does not support substring capture, so the +modifiers that are concerned with captured substrings are not relevant. +

    +
    RESTARTING AFTER A PARTIAL MATCH
    +

    +When the alternative matching function has given the PCRE2_ERROR_PARTIAL +return, indicating that the subject partially matched the pattern, you can +restart the match with additional subject data by means of the +dfa_restart modifier. For example: +

    +    re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
    +  data> 23ja\=ps,dfa
    +  Partial match: 23ja
    +  data> n05\=dfa,dfa_restart
    +   0: n05
    +
    +For further information about partial matching, see the +pcre2partial +documentation. +

    +
    CALLOUTS
    +

    +If the pattern contains any callout requests, pcre2test's callout +function is called during matching unless callout_none is specified. This +works with both matching functions, and with JIT, though there are some +differences in behaviour. The output for callouts with numerical arguments and +those with string arguments is slightly different. +

    +
    +Callouts with numerical arguments +
    +

    +By default, the callout function displays the callout number, the start and +current positions in the subject text at the callout time, and the next pattern +item to be tested. For example: +

    +  --->pqrabcdef
    +    0    ^  ^     \d
    +
    +This output indicates that callout number 0 occurred for a match attempt +starting at the fourth character of the subject string, when the pointer was at +the seventh character, and when the next pattern item was \d. Just +one circumflex is output if the start and current positions are the same, or if +the current position precedes the start position, which can happen if the +callout is in a lookbehind assertion. +
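The placement of the circumflexes follows directly from the two offsets. Here is a small Python sketch of just the marker part of the line (the leading callout number and the trailing pattern item are left out, and `position_markers` is a hypothetical name). Where the single circumflex is placed when the current position precedes the start is an assumption of this sketch: it is put at the start position.

```python
def position_markers(start, current):
    """Build the circumflex markers for a callout line.

    start and current are zero-based offsets into the displayed subject.
    """
    if current <= start:
        # Same position, or current precedes start (lookbehind): one marker.
        return " " * start + "^"
    return " " * start + "^" + " " * (current - start - 1) + "^"

subject = "pqrabcdef"
print(subject)
print(position_markers(3, 6))   # start at 4th character, pointer at 7th
```

Printed under the subject, the first marker sits below "a" and the second below "d", as in the example above.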

    +

    +Callouts numbered 255 are assumed to be automatic callouts, inserted as a +result of the auto_callout pattern modifier. In this case, instead of +showing the callout number, the offset in the pattern, preceded by a plus, is +output. For example: +

    +    re> /\d?[A-E]\*/auto_callout
    +  data> E*
    +  --->E*
    +   +0 ^      \d?
    +   +3 ^      [A-E]
    +   +8 ^^     \*
    +  +10 ^ ^
    +   0: E*
    +
    +If a pattern contains (*MARK) items, an additional line is output whenever +a change of latest mark is passed to the callout function. For example: +
    +    re> /a(*MARK:X)bc/auto_callout
    +  data> abc
    +  --->abc
    +   +0 ^       a
    +   +1 ^^      (*MARK:X)
    +  +10 ^^      b
    +  Latest Mark: X
    +  +11 ^ ^     c
    +  +12 ^  ^
    +   0: abc
    +
    +The mark changes between matching "a" and "b", but stays the same for the rest +of the match, so nothing more is output. If, as a result of backtracking, the +mark reverts to being unset, the text "<unset>" is output. +

    +
    +Callouts with string arguments +
    +

    +The output for a callout with a string argument is similar, except that instead +of outputting a callout number before the position indicators, the callout +string and its offset in the pattern string are output before the reflection of +the subject string, and the subject string is reflected for each callout. For +example: +

    +    re> /^ab(?C'first')cd(?C"second")ef/
    +  data> abcdefg
    +  Callout (7): 'first'
    +  --->abcdefg
    +      ^ ^         c
    +  Callout (20): "second"
    +  --->abcdefg
    +      ^   ^       e
    +   0: abcdef
    +
    +
    +

    +
    +Callout modifiers +
    +

    +The callout function in pcre2test returns zero (carry on matching) by +default, but you can use a callout_fail modifier in a subject line to +change this and other parameters of the callout (see below). +

    +

    +If the callout_capture modifier is set, the current captured groups are +output when a callout occurs. This is useful only for non-DFA matching, as +pcre2_dfa_match() does not support capturing, so no captures are ever +shown. +

    +

    +The normal callout output, showing the callout number or pattern offset (as +described above) is suppressed if the callout_no_where modifier is set. +

    +

    +When using the interpretive matching function pcre2_match() without JIT, +setting the callout_extra modifier causes additional output from +pcre2test's callout function to be generated. For the first callout in a +match attempt at a new starting position in the subject, "New match attempt" is +output. If there has been a backtrack since the last callout (or start of +matching if this is the first callout), "Backtrack" is output, followed by "No +other matching paths" if the backtrack ended the previous match attempt. For +example: +

    +    re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
    +  data> aac\=callout_extra
    +  New match attempt
    +  --->aac
    +   +0 ^       (
    +   +1 ^       a+
    +   +3 ^ ^     )
    +   +4 ^ ^     b
    +  Backtrack
    +  --->aac
    +   +3 ^^      )
    +   +4 ^^      b
    +  Backtrack
    +  No other matching paths
    +  New match attempt
    +  --->aac
    +   +0  ^      (
    +   +1  ^      a+
    +   +3  ^^     )
    +   +4  ^^     b
    +  Backtrack
    +  No other matching paths
    +  New match attempt
    +  --->aac
    +   +0   ^     (
    +   +1   ^     a+
    +  Backtrack
    +  No other matching paths
    +  New match attempt
    +  --->aac
    +   +0    ^    (
    +   +1    ^    a+
    +  No match
    +
    +Notice that various optimizations must be turned off if you want all possible +matching paths to be scanned. If no_start_optimize is not used, there is +an immediate "no match", without any callouts, because the starting +optimization fails to find "b" in the subject, which it knows must be present +for any match. If no_auto_possess is not used, the "a+" item is turned +into "a++", which reduces the number of backtracks. +

    +

    +The callout_extra modifier has no effect if used with the DFA matching +function, or with JIT. +

    +
    +Return values from callouts +
    +

    +The default return from the callout function is zero, which allows matching to +continue. The callout_fail modifier can be given one or two numbers. If +there is only one number, 1 is returned instead of 0 (causing matching to +backtrack) when a callout of that number is reached. If two numbers (<n>:<m>) +are given, 1 is returned when callout <n> is reached and there have been at +least <m> callouts. The callout_error modifier is similar, except that +PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be +aborted. If both these modifiers are set for the same callout number, +callout_error takes precedence. Note that callouts with string arguments +are always given the number zero. +
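The decision described above can be written as a small function. This Python sketch is a model, not pcre2test's code: `callout_return` is a hypothetical name, each setting is represented as a pair (n, m) with the one-number form written as (n, 0), the count is assumed to be the total number of callouts so far in the match, and the numeric value used for PCRE2_ERROR_CALLOUT is an assumption that should be checked against pcre2.h.

```python
PCRE2_ERROR_CALLOUT = -37   # assumed value; check pcre2.h before relying on it

def callout_return(number, callouts_so_far, fail=None, error=None):
    """Return value of the callout function for one callout.

    fail and error are (n, m) pairs or None when the modifier is not set.
    """
    def triggered(setting):
        if setting is None:
            return False
        target, minimum = setting
        return number == target and callouts_so_far >= minimum

    if triggered(error):          # callout_error takes precedence
        return PCRE2_ERROR_CALLOUT
    if triggered(fail):
        return 1                  # force a backtrack
    return 0                      # carry on matching

print(callout_return(1, 3, fail=(1, 3)))
print(callout_return(1, 3, fail=(1, 3), error=(1, 0)))
```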

    +

    +The callout_data modifier can be given an unsigned or a negative number. +This is set as the "user data" that is passed to the matching function, and +passed back when the callout function is invoked. Any value other than zero is +used as a return from pcre2test's callout function. +

    +

    +Inserting callouts can be helpful when using pcre2test to check +complicated regular expressions. For further information about callouts, see +the +pcre2callout +documentation. +

    +
    NON-PRINTING CHARACTERS
    +

    +When pcre2test is outputting text in the compiled version of a pattern, +bytes other than 32-126 are always treated as non-printing characters and are +therefore shown as hex escapes. +

    +

    +When pcre2test is outputting text that is a matched part of a subject +string, it behaves in the same way, unless a different locale has been set for +the pattern (using the locale modifier). In this case, the +isprint() function is used to distinguish printing and non-printing +characters. +
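As an illustration of these rules, the Python sketch below escapes a sequence of code points in the way described for matched substrings. It is a model only: the function name `show` is hypothetical, the `printable` parameter stands in for the default 32-126 test (or for isprint() when a locale is set), and the exact number of hex digits that pcre2test prints is an assumption.

```python
def show(code_points, utf=False, printable=lambda c: 32 <= c <= 126):
    """Render code points, escaping the non-printing ones."""
    out = []
    for c in code_points:
        if printable(c):
            out.append(chr(c))
        elif c < 256 and not utf:
            out.append("\\x%02x" % c)       # \xhh form
        else:
            out.append("\\x{%02x}" % c)     # \x{hh...} form
    return "".join(out)

print(show([0x61, 0x0A, 0x62]))             # newline in non-UTF mode
print(show([0x61, 0x0A, 0x62], utf=True))   # same data in UTF mode
print(show([0x100]))                        # value of 256 or more
```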

    +
    SAVING AND RESTORING COMPILED PATTERNS
    +

    +It is possible to save compiled patterns on disc or elsewhere, and reload them +later, subject to a number of restrictions. JIT data cannot be saved. The host +on which the patterns are reloaded must be running the same version of PCRE2, +with the same code unit width, and must also have the same endianness, pointer +width and PCRE2_SIZE type. Before compiled patterns can be saved they must be +serialized, that is, converted to a stream of bytes. A single byte stream may +contain any number of compiled patterns, but they must all use the same +character tables. A single copy of the tables is included in the byte stream +(its size is 1088 bytes). +

    +

    +The functions whose names begin with pcre2_serialize_ are used +for serializing and de-serializing. They are described in the +pcre2serialize +documentation. In this section we describe the features of pcre2test that +can be used to test these functions. +

    +

    +Note that "serialization" in PCRE2 does not convert compiled patterns to an +abstract format, as serialization in Java or .NET does. It just makes a reloadable byte code stream. +Hence the restrictions on reloading mentioned above. +

    +

    +In pcre2test, when a pattern with the push modifier is successfully +compiled, it is pushed onto a stack of compiled patterns, and pcre2test +expects the next line to contain a new pattern (or command) instead of a +subject line. By contrast, the pushcopy modifier causes a copy of the +compiled pattern to be stacked, leaving the original available for immediate +matching. By using push and/or pushcopy, a number of patterns can +be compiled and retained. These modifiers are incompatible with posix, +and control modifiers that act at match time are ignored (with a message) for +the stacked patterns. The jitverify modifier applies only at compile +time. +

    +

    +The command +

    +  #save <filename>
    +
    +causes all the stacked patterns to be serialized and the result written to the +named file. Afterwards, all the stacked patterns are freed. The command +
    +  #load <filename>
    +
    +reads the data in the file, and then arranges for it to be de-serialized, with +the resulting compiled patterns added to the pattern stack. The pattern on the +top of the stack can be retrieved by the #pop command, which must be followed +by lines of subjects that are to be matched with the pattern, terminated as +usual by an empty line or end of file. This command may be followed by a +modifier list containing only +control modifiers +that act after a pattern has been compiled. In particular, hex, +posix, posix_nosub, push, and pushcopy are not allowed, +nor are any +option-setting modifiers. +The JIT modifiers are, however, permitted. Here is an example that saves and +reloads two patterns. +
    +  /abc/push
    +  /xyz/push
    +  #save tempfile
    +  #load tempfile
    +  #pop info
    +  xyz
    +
    +  #pop jit,bincode
    +  abc
    +
    +If jitverify is used with #pop, it does not automatically imply +jit, which is different behaviour from when it is used on a pattern. +

    +

    +The #popcopy command is analogous to the pushcopy modifier in that it +makes current a copy of the topmost stack pattern, leaving the original still +on the stack. +

    +
    SEE ALSO
    +

    +pcre2(3), pcre2api(3), pcre2callout(3), +pcre2jit(3), pcre2matching(3), pcre2partial(3), +pcre2pattern(3), pcre2serialize(3). +

    +
    AUTHOR
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    REVISION
    +

    +Last updated: 28 April 2021 +
    +Copyright © 1997-2021 University of Cambridge. +
    +


    diff --git a/src/pcre2/doc/html/pcre2unicode.html b/src/pcre2/doc/html/pcre2unicode.html new file mode 100644 index 00000000..76ca6ea2 --- /dev/null +++ b/src/pcre2/doc/html/pcre2unicode.html @@ -0,0 +1,495 @@ + + +pcre2unicode specification + + +

    pcre2unicode man page

    +


    +

    +This page is part of the PCRE2 HTML documentation. It was generated +automatically from the original man page. If there is any nonsense in it, +please consult the man page, in case the conversion went wrong. +
    +
    +UNICODE AND UTF SUPPORT +
    +

    +PCRE2 is normally built with Unicode support, though if you do not need it, you +can build it without, in which case the library will be smaller. With Unicode +support, PCRE2 has knowledge of Unicode character properties and can process +strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit +width), but this is not the default. Unless specifically requested, PCRE2 +treats each code unit in a string as one character. +

    +

    +There are two ways of telling PCRE2 to switch to UTF mode, where characters may +consist of more than one code unit and the range of values is constrained. The +program can call +pcre2_compile() +with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF). +However, the latter facility can be locked out by the PCRE2_NEVER_UTF option. +That is, the programmer can prevent the supplier of the pattern from switching +to UTF mode. +

    +

    +Note that the PCRE2_MATCH_INVALID_UTF option (see +below) +forces PCRE2_UTF to be set. +

    +

    +In UTF mode, both the pattern and any subject strings that are matched against +it are treated as UTF strings instead of strings of individual one-code-unit +characters. There are also some other changes to the way characters are +handled, as documented below. +

    +
    +UNICODE PROPERTY SUPPORT +
    +

    +When PCRE2 is built with Unicode support, the escape sequences \p{..}, +\P{..}, and \X can be used. This is not dependent on the PCRE2_UTF setting. +The Unicode properties that can be tested are limited to the general category +properties such as Lu for an upper case letter or Nd for a decimal number, the +Unicode script names such as Arabic or Han, and the derived properties Any and +L&. Full lists are given in the +pcre2pattern +and +pcre2syntax +documentation. Only the short names for properties are supported. For example, +\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported. +Furthermore, in Perl, many properties may optionally be prefixed by "Is", for +compatibility with Perl 5.6. PCRE2 does not support this. +

    +
    +WIDE CHARACTERS AND UTF MODES +
    +

    +Code points less than 256 can be specified in patterns by either braced or +unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). Larger +values have to use braced sequences. Unbraced octal code points up to \777 are +also recognized; larger ones can be coded using \o{...}. +

    +

    +The escape sequence \N{U+<hex digits>} is recognized as another way of +specifying a Unicode character by code point in a UTF mode. It is not allowed +in non-UTF mode. +

    +

    +In UTF mode, repeat quantifiers apply to complete UTF characters, not to +individual code units. +

    +

    +In UTF mode, the dot metacharacter matches one UTF character instead of a +single code unit. +

    +

    +In UTF mode, capture group names are not restricted to ASCII, and may contain +any Unicode letters and decimal digits, as well as underscore. +

    +

    +The escape sequence \C can be used to match a single code unit in UTF mode, +but its use can lead to some strange effects because it breaks up multi-unit +characters (see the description of \C in the +pcre2pattern +documentation). For this reason, there is a build-time option that disables +support for \C completely. There is also a less draconian compile-time option +for locking out the use of \C when a pattern is compiled. +

    +

    +The use of \C is not supported by the alternative matching function +pcre2_dfa_match() when in UTF-8 or UTF-16 mode, that is, when a character +may consist of more than one code unit. The use of \C in these modes provokes +a match-time error. Also, the JIT optimization does not support \C in these +modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that +contains \C, it will not succeed, and so when pcre2_match() is called, +the matching will be carried out by the interpretive function. +

    +

    +The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test +characters of any code value, but, by default, the characters that PCRE2 +recognizes as digits, spaces, or word characters remain the same set as in +non-UTF mode, all with code points less than 256. This remains true even when +PCRE2 is built to include Unicode support, because to do otherwise would slow +down matching in many common cases. Note that this also applies to \b +and \B, because they are defined in terms of \w and \W. If you want +to test for a wider sense of, say, "digit", you can use explicit Unicode +property tests such as \p{Nd}. Alternatively, if you set the PCRE2_UCP option, +the way that the character escapes work is changed so that Unicode properties +are used to determine which characters match. There are more details in the +section on +generic character types +in the +pcre2pattern +documentation. +

    +

    +Similarly, characters that match the POSIX named character classes are all +low-valued characters, unless the PCRE2_UCP option is set. +

    +

    +However, the special horizontal and vertical white space matching escapes (\h, +\H, \v, and \V) do match all the appropriate Unicode characters, whether or +not PCRE2_UCP is set. +

    +
    +UNICODE CASE-EQUIVALENCE +
    +

    +If either PCRE2_UTF or PCRE2_UCP is set, upper/lower case processing makes use +of Unicode properties except for characters whose code points are less than 128 +and that have at most two case-equivalent values. For these, a direct table +lookup is used for speed. A few Unicode characters such as Greek sigma have +more than two code points that are case-equivalent, and these are treated +specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case +processing for non-UTF character encodings such as UCS-2. +
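The Greek sigma case can be seen with any Unicode-aware toolkit. The sketch below uses Python's `str.casefold()` as a stand-in for a case-equivalence test. This is an assumption made for illustration: Python performs full case folding, which is not the same mechanism PCRE2 uses, so only the single-character comparison shown here should be read across. The names `SIGMAS` and `case_equivalent` are hypothetical.

```python
# Capital sigma, small sigma, and final small sigma: three distinct
# code points that are all case-equivalent to one another.
SIGMAS = ["\u03a3", "\u03c3", "\u03c2"]

def case_equivalent(a, b):
    """True if two single characters fold to the same value."""
    return a.casefold() == b.casefold()

for first in SIGMAS:
    for second in SIGMAS:
        print("U+%04X U+%04X %s" % (ord(first), ord(second),
                                    case_equivalent(first, second)))
```

Every pairing prints True, which is why such characters cannot be handled by a simple two-way upper/lower table.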

    +
    +SCRIPT RUNS +
    +

    +The pattern constructs (*script_run:...) and (*atomic_script_run:...), with +synonyms (*sr:...) and (*asr:...), verify that the string matched within the +parentheses is a script run. In concept, a script run is a sequence of +characters that are all from the same Unicode script. However, because some +scripts are commonly used together, and because some diacritical and other +marks are used with multiple scripts, it is not that simple. +

    +

    +Every Unicode character has a Script property, mostly with a value +corresponding to the name of a script, such as Latin, Greek, or Cyrillic. There +are also three special values: +

    +

    +"Unknown" is used for code points that have not been assigned, and also for the +surrogate code points. In the PCRE2 32-bit library, characters whose code +points are greater than the Unicode maximum (U+10FFFF), which are accessible +only in non-UTF mode, are assigned the Unknown script. +

    +

    +"Common" is used for characters that are used with many scripts. These include +punctuation, emoji, mathematical, musical, and currency symbols, and the ASCII +digits 0 to 9. +

    +

    +"Inherited" is used for characters such as diacritical marks that modify a +previous character. These are considered to take on the script of the character +that they modify. +

    +

    +Some Inherited characters are used with many scripts, but many of them are only +normally used with a small number of scripts. For example, U+102E0 (Coptic +Epact thousands mark) is used only with Arabic and Coptic. In order to make it +possible to check this, a Unicode property called Script Extension exists. Its +value is a list of scripts that apply to the character. For the majority of +characters, the list contains just one script, the same one as the Script +property. However, for characters such as U+102E0 more than one Script is +listed. There are also some Common characters that have a single, non-Common +script in their Script Extension list. +

    +

    +The next section describes the basic rules for deciding whether a given string +of characters is a script run. Note, however, that there are some special cases +involving the Chinese Han script, and an additional constraint for decimal +digits. These are covered in subsequent sections. +

    +
    +Basic script run rules +
    +

    +A string that is less than two characters long is a script run. This is the +only case in which an Unknown character can be part of a script run. Longer +strings are checked using only the Script Extensions property, not the basic +Script property. +

    +

    +If a character's Script Extension property is the single value "Inherited", it +is always accepted as part of a script run. This is also true for the property +"Common", subject to the checking of decimal digits described below. All the +remaining characters in a script run must have at least one script in common in +their Script Extension lists. In set-theoretic terminology, the intersection of +all the sets of scripts must not be empty. +

    +

    +A simple example is an Internet name such as "google.com". The letters are all +in the Latin script, and the dot is Common, so this string is a script run. +However, the Cyrillic letter "o" looks exactly the same as the Latin "o"; a +string that looks the same, but with Cyrillic "o"s is not a script run. +

    +

    +More interesting examples involve characters with more than one script in their +Script Extension. Consider the following characters: +

    +  U+060C  Arabic comma
    +  U+06D4  Arabic full stop
    +
    +The first has the Script Extension list Arabic, Hanifi Rohingya, Syriac, and +Thaana; the second has just Arabic and Hanifi Rohingya. Both of them could +appear in script runs of either Arabic or Hanifi Rohingya. The first could also +appear in Syriac or Thaana script runs, but the second could not. +
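The basic rule can be sketched as a set intersection. The Python below is a simplified model under stated assumptions: the `SCRIPT_EXTENSIONS` table is hand-written for illustration (the two punctuation entries come from the text above; the two letters are assumed to belong to a single script each), `is_script_run` is a hypothetical name, and the Han special cases and the decimal digit constraint are deliberately left out.

```python
SCRIPT_EXTENSIONS = {
    "\u060c": {"Arabic", "Hanifi Rohingya", "Syriac", "Thaana"},  # Arabic comma
    "\u06d4": {"Arabic", "Hanifi Rohingya"},                       # Arabic full stop
    "\u0628": {"Arabic"},    # Arabic letter beh (assumed single-script)
    "\u0710": {"Syriac"},    # Syriac letter alaph (assumed single-script)
    ".": {"Common"},
}

def is_script_run(text, table=SCRIPT_EXTENSIONS):
    """Apply the basic script run rule to text, using the given table."""
    if len(text) < 2:
        return True                      # short strings are always script runs
    shared = None
    for ch in text:
        scripts = table[ch]
        if scripts == {"Common"} or scripts == {"Inherited"}:
            continue                     # always accepted
        shared = set(scripts) if shared is None else shared & scripts
        if not shared:
            return False                 # intersection became empty
    return True

print(is_script_run("\u0628\u060c\u06d4"))   # Arabic letter, comma, full stop
print(is_script_run("\u0710\u060c"))         # Syriac letter with Arabic comma
print(is_script_run("\u0710\u06d4"))         # Syriac letter with Arabic full stop
```

The third call is rejected because the Arabic full stop's list does not include Syriac, which is the distinction drawn in the paragraph above.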

    +
    +The Chinese Han script +
    +

    +The Chinese Han script is commonly used in conjunction with other scripts for +writing certain languages. Japanese uses the Hiragana and Katakana scripts +together with Han; Korean uses Hangul and Han; Taiwanese Mandarin uses Bopomofo +and Han. These three combinations are treated as special cases when checking +script runs and are, in effect, "virtual scripts". Thus, a script run may +contain a mixture of Hiragana, Katakana, and Han, or a mixture of Hangul and +Han, or a mixture of Bopomofo and Han, but not, for example, a mixture of +Hangul and Bopomofo and Han. PCRE2 (like Perl) follows Unicode's Technical +Standard 39 ("Unicode Security Mechanisms", http://unicode.org/reports/tr39/) +in allowing such mixtures. +

    +
    +Decimal digits +
    +

    +Unicode contains many sets of 10 decimal digits in different scripts, and some +scripts (including the Common script) contain more than one set. Some of these +decimal digits are visually indistinguishable from the common ASCII +digits. In addition to the script checking described above, if a script run +contains any decimal digits, they must all come from the same set of 10 +adjacent characters. +
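A minimal sketch of the digit constraint, assuming a tiny hand-written table of digit sets (ASCII, Arabic-Indic, Extended Arabic-Indic, Devanagari, Fullwidth). The function names are hypothetical; a real implementation would consult the full Unicode data rather than this table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First code point of a few sets of 10 adjacent decimal digits.
   Illustrative subset only. */
static const uint32_t digit_set_starts[] = {
  0x0030,   /* ASCII digits */
  0x0660,   /* Arabic-Indic digits */
  0x06F0,   /* Extended Arabic-Indic digits */
  0x0966,   /* Devanagari digits */
  0xFF10    /* Fullwidth digits */
};

/* Returns the start of the digit set containing cp, or 0 if cp is not
   a digit known to the table. */
uint32_t digit_set_start(uint32_t cp)
{
  size_t i;
  for (i = 0; i < sizeof(digit_set_starts) / sizeof(digit_set_starts[0]); i++)
    if (cp >= digit_set_starts[i] && cp <= digit_set_starts[i] + 9)
      return digit_set_starts[i];
  return 0;
}

/* True if every decimal digit in cps[] comes from the same set of 10. */
bool digits_from_one_set(const uint32_t *cps, size_t n)
{
  uint32_t seen = 0;
  size_t i;
  for (i = 0; i < n; i++) {
    uint32_t start = digit_set_start(cps[i]);
    if (start == 0) continue;            /* not a digit: no constraint */
    if (seen == 0) seen = start;         /* first digit fixes the set */
    else if (seen != start) return false;
  }
  return true;
}
```

Because each set occupies 10 adjacent code points, identifying a set by its first code point is enough to compare digits.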

    +
    +VALIDITY OF UTF STRINGS +
    +

    +When the PCRE2_UTF option is set, the strings passed as patterns and subjects +are (by default) checked for validity on entry to the relevant functions. If an +invalid UTF string is passed, a negative error code is returned. The code unit +offset to the offending character can be extracted from the match data block by +calling pcre2_get_startchar(), which is used for this purpose after a UTF +error. +

    +

    +In some situations, you may already know that your strings are valid, and +therefore want to skip these checks in order to improve performance, for +example in the case of a long subject string that is being scanned repeatedly. +If you set the PCRE2_NO_UTF_CHECK option at compile time or at match time, +PCRE2 assumes that the pattern or subject it is given (respectively) contains +only valid UTF code unit sequences. +

    +

    +If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result +is undefined and your program may crash or loop indefinitely or give incorrect +results. There is, however, one mode of matching that can handle invalid UTF +subject strings. This is enabled by passing PCRE2_MATCH_INVALID_UTF to +pcre2_compile() and is discussed below in the next section. The rest of +this section covers the case when PCRE2_MATCH_INVALID_UTF is not set. +

    +

    +Passing PCRE2_NO_UTF_CHECK to pcre2_compile() just disables the UTF check +for the pattern; it does not also apply to subject strings. If you want to +disable the check for a subject string you must pass this same option to +pcre2_match() or pcre2_dfa_match(). +

    +

    +UTF-16 and UTF-32 strings can indicate their endianness by a special code known +as a byte-order mark (BOM). The PCRE2 functions do not handle this, expecting +strings to be in host byte order. +

    +

    +Unless PCRE2_NO_UTF_CHECK is set, a UTF string is checked before any other +processing takes place. In the case of pcre2_match() and +pcre2_dfa_match() calls with a non-zero starting offset, the check is +applied only to that part of the subject that could be inspected during +matching, and there is a check that the starting offset points to the first +code unit of a character or to the end of the subject. If there are no +lookbehind assertions in the pattern, the check starts at the starting offset. +Otherwise, it starts at the length of the longest lookbehind before the +starting offset, or at the start of the subject if there are not that many +characters before the starting offset. Note that the sequences \b and \B are +one-character lookbehinds. +
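The way the start of the check depends on the longest lookbehind can be sketched for UTF-8 as follows. This is an assumption-laden illustration, not PCRE2's code: `utf8_check_start()` and `is_char_start()` are hypothetical names, and the caller is assumed to know the longest lookbehind length in characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if offset points to the first code unit of a character or to the
   end of the subject (that is, not at a continuation byte 0b10xxxxxx). */
bool is_char_start(const unsigned char *s, size_t len, size_t offset)
{
  return offset == len || (s[offset] & 0xC0) != 0x80;
}

/* Returns the code unit offset at which validity checking would begin:
   max_lookbehind characters before start_offset, or 0 if the subject
   does not contain that many characters before the starting offset. */
size_t utf8_check_start(const unsigned char *s, size_t start_offset,
                        size_t max_lookbehind)
{
  size_t pos = start_offset;
  while (max_lookbehind-- > 0 && pos > 0) {
    pos--;                                         /* step back one code unit */
    while (pos > 0 && (s[pos] & 0xC0) == 0x80)     /* skip continuation bytes */
      pos--;
  }
  return pos;
}
```

A pattern containing only \b or \B would use a `max_lookbehind` of 1 in this model, since those sequences are one-character lookbehinds.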

    +

    +In addition to checking the format of the string, there is a check to ensure +that all code points lie in the range U+0 to U+10FFFF, excluding the surrogate +area. The so-called "non-character" code points are not excluded because +Unicode corrigendum #9 makes it clear that they should not be. +

    +

    +Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16, +where they are used in pairs to encode code points with values greater than +0xFFFF. The code points that are encoded by UTF-16 pairs are available +independently in the UTF-8 and UTF-32 encodings. (In other words, the whole +surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and +UTF-32.) +

    +

    +Setting PCRE2_NO_UTF_CHECK at compile time does not disable the error that is +given if an escape sequence for an invalid Unicode code point is encountered in +the pattern. If you want to allow escape sequences such as \x{d800} (a +surrogate code point) you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra +option. However, this is possible only in UTF-8 and UTF-32 modes, because these +values are not representable in UTF-16. +

    +
    +Errors in UTF-8 strings +
    +

    +The following negative error codes are given for invalid UTF-8 strings: +

    +  PCRE2_ERROR_UTF8_ERR1
    +  PCRE2_ERROR_UTF8_ERR2
    +  PCRE2_ERROR_UTF8_ERR3
    +  PCRE2_ERROR_UTF8_ERR4
    +  PCRE2_ERROR_UTF8_ERR5
    +
    +The string ends with a truncated UTF-8 character; the code specifies how many +bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be +no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279) +allows for up to 6 bytes, and this is checked first; hence the possibility of +4 or 5 missing bytes. +
    +  PCRE2_ERROR_UTF8_ERR6
    +  PCRE2_ERROR_UTF8_ERR7
    +  PCRE2_ERROR_UTF8_ERR8
    +  PCRE2_ERROR_UTF8_ERR9
    +  PCRE2_ERROR_UTF8_ERR10
    +
    +The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the +character do not have the binary value 0b10 (that is, either the most +significant bit is 0, or the next bit is 1). +
    +  PCRE2_ERROR_UTF8_ERR11
    +  PCRE2_ERROR_UTF8_ERR12
    +
    +A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long; +these code points are excluded by RFC 3629. +
    +  PCRE2_ERROR_UTF8_ERR13
    +
    +A 4-byte character has a value greater than 0x10ffff; these code points are +excluded by RFC 3629. +
    +  PCRE2_ERROR_UTF8_ERR14
    +
    +A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of +code points is reserved by RFC 3629 for use with UTF-16, and so is excluded +from UTF-8. +
    +  PCRE2_ERROR_UTF8_ERR15
    +  PCRE2_ERROR_UTF8_ERR16
    +  PCRE2_ERROR_UTF8_ERR17
    +  PCRE2_ERROR_UTF8_ERR18
    +  PCRE2_ERROR_UTF8_ERR19
    +
    +A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a +value that can be represented by fewer bytes, which is invalid. For example, +the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just +one byte. +
    +  PCRE2_ERROR_UTF8_ERR20
    +
    +The two most significant bits of the first byte of a character have the binary +value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a +byte can only validly occur as the second or subsequent byte of a multi-byte +character. +
    +  PCRE2_ERROR_UTF8_ERR21
    +
    +The first byte of a character has the value 0xfe or 0xff. These values can +never occur in a valid UTF-8 string. +
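The error classes listed above can be illustrated with a simplified classifier. It is a sketch under stated assumptions, not PCRE2's validator: the `U8_*` names are hypothetical and collapse the 21 PCRE2 codes into one value per group, and the order of the checks within a character is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
  U8_OK,
  U8_TRUNCATED,           /* cf. ERR1 to ERR5 */
  U8_BAD_CONTINUATION,    /* cf. ERR6 to ERR10 */
  U8_TOO_LONG,            /* cf. ERR11, ERR12: 5- or 6-byte character */
  U8_TOO_BIG,             /* cf. ERR13: value above 0x10ffff */
  U8_SURROGATE,           /* cf. ERR14: 0xd800 to 0xdfff */
  U8_OVERLONG,            /* cf. ERR15 to ERR19 */
  U8_STRAY_CONTINUATION,  /* cf. ERR20: first byte is 0b10xxxxxx */
  U8_BAD_LEAD             /* cf. ERR21: first byte is 0xfe or 0xff */
} u8_status;

u8_status check_utf8(const unsigned char *s, size_t len)
{
  /* Smallest value that genuinely needs n bytes, indexed by n. */
  static const uint32_t min_value[] =
    {0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000};
  size_t i = 0;

  while (i < len) {
    unsigned char c = s[i];
    size_t n, k;           /* n is the total number of bytes in the character */
    uint32_t cp;

    if (c < 0x80) { i++; continue; }               /* single-byte character */
    if (c < 0xC0) return U8_STRAY_CONTINUATION;
    if (c >= 0xFE) return U8_BAD_LEAD;

    n = c < 0xE0 ? 2 : c < 0xF0 ? 3 : c < 0xF8 ? 4 : c < 0xFC ? 5 : 6;
    if (len - i < n) return U8_TRUNCATED;

    cp = c & (0x7Fu >> n);                         /* payload bits of lead byte */
    for (k = 1; k < n; k++) {
      if ((s[i + k] & 0xC0) != 0x80) return U8_BAD_CONTINUATION;
      cp = (cp << 6) | (s[i + k] & 0x3Fu);
    }

    if (cp < min_value[n]) return U8_OVERLONG;
    if (n > 4) return U8_TOO_LONG;
    if (cp > 0x10FFFF) return U8_TOO_BIG;
    if (cp >= 0xD800 && cp <= 0xDFFF) return U8_SURROGATE;
    i += n;
  }
  return U8_OK;
}
```

The overlong example from the text, the two bytes 0xc0, 0xae, decodes to 0x2e, which is below the two-byte minimum of 0x80 and is therefore rejected.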

    +
    +Errors in UTF-16 strings +
    +

    +The following negative error codes are given for invalid UTF-16 strings: +

    +  PCRE2_ERROR_UTF16_ERR1  Missing low surrogate at end of string
    +  PCRE2_ERROR_UTF16_ERR2  Invalid low surrogate follows high surrogate
    +  PCRE2_ERROR_UTF16_ERR3  Isolated low surrogate
    +
    +
    +
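The three UTF-16 conditions can be sketched in a few lines. The `U16_*` names and `check_utf16()` are hypothetical and stand in for the PCRE2 error codes; the input is assumed to be in host byte order, as PCRE2 expects.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
  U16_OK,
  U16_MISSING_LOW,    /* cf. ERR1: high surrogate at end of string */
  U16_INVALID_LOW,    /* cf. ERR2: high surrogate not followed by a low one */
  U16_ISOLATED_LOW    /* cf. ERR3: low surrogate with no preceding high one */
} u16_status;

u16_status check_utf16(const uint16_t *s, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++) {
    uint16_t c = s[i];
    if ((c & 0xF800) != 0xD800) continue;              /* not a surrogate */
    if ((c & 0x0400) != 0) return U16_ISOLATED_LOW;    /* 0xdc00 to 0xdfff */
    if (i + 1 == len) return U16_MISSING_LOW;
    if ((s[i + 1] & 0xFC00) != 0xDC00) return U16_INVALID_LOW;
    i++;                                               /* skip the low half */
  }
  return U16_OK;
}
```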

    +
    +Errors in UTF-32 strings +
    +

    +The following negative error codes are given for invalid UTF-32 strings: +

    +  PCRE2_ERROR_UTF32_ERR1  Surrogate character (0xd800 to 0xdfff)
    +  PCRE2_ERROR_UTF32_ERR2  Code point is greater than 0x10ffff
    +
    +
    +
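For UTF-32 the check reduces to two range tests per code unit. The sketch below uses hypothetical `U32_*` names in place of the PCRE2 error codes. Note that "non-character" code points such as U+FFFE pass, in line with the earlier remark about Unicode corrigendum #9.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
  U32_OK,
  U32_SURROGATE,   /* cf. ERR1: 0xd800 to 0xdfff */
  U32_TOO_BIG      /* cf. ERR2: greater than 0x10ffff */
} u32_status;

u32_status check_utf32(const uint32_t *s, size_t len)
{
  size_t i;
  for (i = 0; i < len; i++) {
    if (s[i] >= 0xD800 && s[i] <= 0xDFFF) return U32_SURROGATE;
    if (s[i] > 0x10FFFF) return U32_TOO_BIG;
  }
  return U32_OK;
}
```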

    +
    +MATCHING IN INVALID UTF STRINGS +
    +

    +You can run pattern matches on subject strings that may contain invalid UTF +sequences if you call pcre2_compile() with the PCRE2_MATCH_INVALID_UTF +option. This is supported by pcre2_match(), including JIT matching, but +not by pcre2_dfa_match(). When PCRE2_MATCH_INVALID_UTF is set, it forces +PCRE2_UTF to be set as well. Note, however, that the pattern itself must be a +valid UTF string. +

    +

    +Setting PCRE2_MATCH_INVALID_UTF does not affect what pcre2_compile() +generates, but if pcre2_jit_compile() is subsequently called, it does +generate different code. If JIT is not used, the option affects the behaviour +of the interpretive code in pcre2_match(). When PCRE2_MATCH_INVALID_UTF +is set at compile time, PCRE2_NO_UTF_CHECK is ignored at match time. +

    +

    +In this mode, an invalid code unit sequence in the subject never matches any +pattern item. It does not match dot, it does not match \p{Any}, it does not +even match negative items such as [^X]. A lookbehind assertion fails if it +encounters an invalid sequence while moving the current point backwards. In +other words, an invalid UTF code unit sequence acts as a barrier which no match +can cross. +

    +

    +You can also think of this as the subject being split up into fragments of +valid UTF, delimited internally by invalid code unit sequences. The pattern is +matched fragment by fragment. The result of a successful match, however, is +given as code unit offsets in the entire subject string in the usual way. There +are a few points to consider: +

    +

    +The internal boundaries are not interpreted as the beginnings or ends of lines +and so do not match circumflex or dollar characters in the pattern. +

    +

    +If pcre2_match() is called with an offset that points to an invalid +UTF-sequence, that sequence is skipped, and the match starts at the next valid +UTF character, or the end of the subject. +

    +

    +At internal fragment boundaries, \b and \B behave in the same way as at the +beginning and end of the subject. For example, a sequence such as \bWORD\b +would match an instance of WORD that is surrounded by invalid UTF code units. +

    +

    +Using PCRE2_MATCH_INVALID_UTF, an application can run matches on arbitrary +data, knowing that any matched strings that are returned are valid UTF. This +can be useful when searching for UTF text in executable or other binary files. +
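The "fragments of valid UTF" view described in this section can be sketched for UTF-8 as below. This only illustrates the idea of invalid sequences acting as barriers; it is not how PCRE2 implements PCRE2_MATCH_INVALID_UTF, and `valid_char_len()` and `next_fragment()` are hypothetical helpers. Offsets are reported relative to the entire subject, matching the way match results are given.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes (1 to 4) of the valid UTF-8 character at s, or 0 if the
   bytes at s do not start a valid character. Requires avail >= 1. */
size_t valid_char_len(const unsigned char *s, size_t avail)
{
  static const uint32_t min_value[] = {0, 0, 0x80, 0x800, 0x10000};
  unsigned char c = s[0];
  size_t n, k;
  uint32_t cp;

  if (c < 0x80) return 1;
  if (c < 0xC0 || c >= 0xF8) return 0;    /* stray continuation or bad lead */
  n = c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
  if (avail < n) return 0;                /* truncated */
  cp = c & (0x7Fu >> n);
  for (k = 1; k < n; k++) {
    if ((s[k] & 0xC0) != 0x80) return 0;
    cp = (cp << 6) | (s[k] & 0x3Fu);
  }
  if (cp < min_value[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;                             /* overlong, too big, or surrogate */
  return n;
}

/* Finds the next maximal run of valid characters at or after *pos.
   On success stores its bounds in *start and *end (end is exclusive),
   advances *pos to *end, and returns 1. Returns 0 when nothing is left. */
int next_fragment(const unsigned char *s, size_t len, size_t *pos,
                  size_t *start, size_t *end)
{
  size_t i = *pos, n;
  while (i < len && valid_char_len(s + i, len - i) == 0)
    i++;                                  /* skip invalid code units */
  if (i >= len) return 0;
  *start = i;
  while (i < len && (n = valid_char_len(s + i, len - i)) != 0)
    i += n;                               /* extend over valid characters */
  *end = i;
  *pos = i;
  return 1;
}
```

Calling `next_fragment()` in a loop until it returns 0 visits each valid fragment in turn; a match in this model could never extend past the `end` of the fragment in which it starts.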

    +
    +AUTHOR +
    +

    +Philip Hazel +
    +University Computing Service +
    +Cambridge, England. +
    +

    +
    +REVISION +
    +

    +Last updated: 23 February 2020 +
    +Copyright © 1997-2020 University of Cambridge. +
    +

    +Return to the PCRE2 index page. +

    diff --git a/src/pcre2/doc/index.html.src b/src/pcre2/doc/index.html.src new file mode 100644 index 00000000..2c7c5fb2 --- /dev/null +++ b/src/pcre2/doc/index.html.src @@ -0,0 +1,312 @@ + + + +PCRE2 specification + + +

    Perl-compatible Regular Expressions (revised API: PCRE2)

    +

    +The HTML documentation for PCRE2 consists of a number of pages that are listed +below in alphabetical order. If you are new to PCRE2, please read the first one +first. +

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    pcre2  Introductory page
    pcre2-config  Information about the installation configuration
    pcre2api  PCRE2's native API
    pcre2build  Building PCRE2
    pcre2callout  The callout facility
    pcre2compat  Compatibility with Perl
    pcre2convert  Experimental foreign pattern conversion functions
    pcre2demo  A demonstration C program that uses the PCRE2 library
    pcre2grep  The pcre2grep command
    pcre2jit  Discussion of the just-in-time optimization support
    pcre2limits  Details of size and other limits
    pcre2matching  Discussion of the two matching algorithms
    pcre2partial  Using PCRE2 for partial matching
    pcre2pattern  Specification of the regular expressions supported by PCRE2
    pcre2perform  Some comments on performance
    pcre2posix  The POSIX API to the PCRE2 8-bit library
    pcre2sample  Discussion of the pcre2demo program
    pcre2serialize  Serializing functions for saving precompiled patterns
    pcre2syntax  Syntax quick-reference summary
    pcre2test  The pcre2test command for testing PCRE2
    pcre2unicode  Discussion of Unicode and UTF-8/UTF-16/UTF-32 support
    + +

    +There are also individual pages that summarize the interface for each function +in the library. +

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    pcre2_callout_enumerate  Enumerate callouts in a compiled pattern
    pcre2_code_copy  Copy a compiled pattern
    pcre2_code_copy_with_tables  Copy a compiled pattern and its character tables
    pcre2_code_free  Free a compiled pattern
    pcre2_compile  Compile a regular expression pattern
    pcre2_compile_context_copy  Copy a compile context
    pcre2_compile_context_create  Create a compile context
    pcre2_compile_context_free  Free a compile context
    pcre2_config  Show build-time configuration options
    pcre2_convert_context_copy  Copy a convert context
    pcre2_convert_context_create  Create a convert context
    pcre2_convert_context_free  Free a convert context
    pcre2_converted_pattern_free  Free converted foreign pattern
    pcre2_dfa_match  Match a compiled pattern to a subject string + (DFA algorithm; not Perl compatible)
    pcre2_general_context_copy  Copy a general context
    pcre2_general_context_create  Create a general context
    pcre2_general_context_free  Free a general context
    pcre2_get_error_message  Get textual error message for error number
    pcre2_get_mark  Get a (*MARK) name
    pcre2_get_match_data_size  Get the size of a match data block
    pcre2_get_ovector_count  Get the ovector count
    pcre2_get_ovector_pointer  Get a pointer to the ovector
    pcre2_get_startchar  Get the starting character offset
    pcre2_jit_compile  Process a compiled pattern with the JIT compiler
    pcre2_jit_free_unused_memory  Free unused JIT memory
    pcre2_jit_match  Fast path interface to JIT matching
    pcre2_jit_stack_assign  Assign stack for JIT matching
    pcre2_jit_stack_create  Create a stack for JIT matching
    pcre2_jit_stack_free  Free a JIT matching stack
    pcre2_maketables  Build character tables in current locale
    pcre2_maketables_free  Free character tables
    pcre2_match  Match a compiled pattern to a subject string + (Perl compatible)
    pcre2_match_context_copy  Copy a match context
    pcre2_match_context_create  Create a match context
    pcre2_match_context_free  Free a match context
    pcre2_match_data_create  Create a match data block
    pcre2_match_data_create_from_pattern  Create a match data block getting size from pattern
    pcre2_match_data_free  Free a match data block
    pcre2_pattern_convert  Experimental foreign pattern converter
    pcre2_pattern_info  Extract information about a pattern
    pcre2_serialize_decode  Decode serialized compiled patterns
    pcre2_serialize_encode  Serialize compiled patterns for save/restore
    pcre2_serialize_free  Free serialized compiled patterns
    pcre2_serialize_get_number_of_codes  Get number of serialized compiled patterns
    pcre2_set_bsr  Set \R convention
    pcre2_set_callout  Set up a callout function
    pcre2_set_character_tables  Set character tables
    pcre2_set_compile_extra_options  Set compile time extra options
    pcre2_set_compile_recursion_guard  Set up a compile recursion guard function
    pcre2_set_depth_limit  Set the match backtracking depth limit
    pcre2_set_glob_escape  Set glob escape character
    pcre2_set_glob_separator  Set glob separator character
    pcre2_set_heap_limit  Set the match backtracking heap limit
    pcre2_set_match_limit  Set the match limit
    pcre2_set_max_pattern_length  Set the maximum length of pattern
    pcre2_set_newline  Set the newline convention
    pcre2_set_offset_limit  Set the offset limit
    pcre2_set_parens_nest_limit  Set the parentheses nesting limit
    pcre2_set_recursion_limit  Obsolete: use pcre2_set_depth_limit
    pcre2_set_recursion_memory_management  Obsolete function that (from 10.30 onwards) does nothing
    pcre2_substitute  Match a compiled pattern to a subject string and do + substitutions
    pcre2_substring_copy_byname  Extract named substring into given buffer
    pcre2_substring_copy_bynumber  Extract numbered substring into given buffer
    pcre2_substring_free  Free extracted substring
    pcre2_substring_get_byname  Extract named substring into new memory
    pcre2_substring_get_bynumber  Extract numbered substring into new memory
    pcre2_substring_length_byname  Find length of named substring
    pcre2_substring_length_bynumber  Find length of numbered substring
    pcre2_substring_list_free  Free list of extracted substrings
    pcre2_substring_list_get  Extract all substrings into new memory
    pcre2_substring_nametable_scan  Find table entries for given string name
    pcre2_substring_number_from_name  Convert captured string name to number
    + + + diff --git a/src/pcre2/doc/pcre2-config.1 b/src/pcre2/doc/pcre2-config.1 new file mode 100644 index 00000000..7fa0a091 --- /dev/null +++ b/src/pcre2/doc/pcre2-config.1 @@ -0,0 +1,86 @@ +.TH PCRE2-CONFIG 1 "28 September 2014" "PCRE2 10.00" +.SH NAME +pcre2-config - program to return PCRE2 configuration +.SH SYNOPSIS +.rs +.sp +.nf +.B pcre2-config [--prefix] [--exec-prefix] [--version] +.B " [--libs8] [--libs16] [--libs32] [--libs-posix]" +.B " [--cflags] [--cflags-posix]" +.fi +. +. +.SH DESCRIPTION +.rs +.sp +\fBpcre2-config\fP returns the configuration of the installed PCRE2 libraries +and the options required to compile a program to use them. Some of the options +apply only to the 8-bit, or 16-bit, or 32-bit libraries, respectively, and are +not available for libraries that have not been built. If an unavailable option +is encountered, the "usage" information is output. +. +. +.SH OPTIONS +.rs +.TP 10 +\fB--prefix\fP +Writes the directory prefix used in the PCRE2 installation for architecture +independent files (\fI/usr\fP on many systems, \fI/usr/local\fP on some +systems) to the standard output. +.TP 10 +\fB--exec-prefix\fP +Writes the directory prefix used in the PCRE2 installation for architecture +dependent files (normally the same as \fB--prefix\fP) to the standard output. +.TP 10 +\fB--version\fP +Writes the version number of the installed PCRE2 libraries to the standard +output. +.TP 10 +\fB--libs8\fP +Writes to the standard output the command line options required to link +with the 8-bit PCRE2 library (\fB-lpcre2-8\fP on many systems). +.TP 10 +\fB--libs16\fP +Writes to the standard output the command line options required to link +with the 16-bit PCRE2 library (\fB-lpcre2-16\fP on many systems). +.TP 10 +\fB--libs32\fP +Writes to the standard output the command line options required to link +with the 32-bit PCRE2 library (\fB-lpcre2-32\fP on many systems). 
+.TP 10 +\fB--libs-posix\fP +Writes to the standard output the command line options required to link with +PCRE2's POSIX API wrapper library (\fB-lpcre2-posix\fP \fB-lpcre2-8\fP on many +systems). +.TP 10 +\fB--cflags\fP +Writes to the standard output the command line options required to compile +files that use PCRE2 (this may include some \fB-I\fP options, but is blank on +many systems). +.TP 10 +\fB--cflags-posix\fP +Writes to the standard output the command line options required to compile +files that use PCRE2's POSIX API wrapper library (this may include some +\fB-I\fP options, but is blank on many systems). +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2(3)\fP +. +. +.SH AUTHOR +.rs +.sp +This manual page was originally written by Mark Baker for the Debian GNU/Linux +system. It has been subsequently revised as a generic PCRE2 man page. +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 28 September 2014 +.fi diff --git a/src/pcre2/doc/pcre2-config.txt b/src/pcre2/doc/pcre2-config.txt new file mode 100644 index 00000000..33785f4b --- /dev/null +++ b/src/pcre2/doc/pcre2-config.txt @@ -0,0 +1,81 @@ +PCRE2-CONFIG(1) General Commands Manual PCRE2-CONFIG(1) + + + +NAME + pcre2-config - program to return PCRE2 configuration + +SYNOPSIS + + pcre2-config [--prefix] [--exec-prefix] [--version] + [--libs8] [--libs16] [--libs32] [--libs-posix] + [--cflags] [--cflags-posix] + + +DESCRIPTION + + pcre2-config returns the configuration of the installed PCRE2 libraries + and the options required to compile a program to use them. Some of the + options apply only to the 8-bit, or 16-bit, or 32-bit libraries, re- + spectively, and are not available for libraries that have not been + built. If an unavailable option is encountered, the "usage" information + is output. + + +OPTIONS + + --prefix Writes the directory prefix used in the PCRE2 installation + for architecture independent files (/usr on many systems, + /usr/local on some systems) to the standard output. 
+ + --exec-prefix + Writes the directory prefix used in the PCRE2 installation + for architecture dependent files (normally the same as --pre- + fix) to the standard output. + + --version Writes the version number of the installed PCRE2 libraries to + the standard output. + + --libs8 Writes to the standard output the command line options re- + quired to link with the 8-bit PCRE2 library (-lpcre2-8 on + many systems). + + --libs16 Writes to the standard output the command line options re- + quired to link with the 16-bit PCRE2 library (-lpcre2-16 on + many systems). + + --libs32 Writes to the standard output the command line options re- + quired to link with the 32-bit PCRE2 library (-lpcre2-32 on + many systems). + + --libs-posix + Writes to the standard output the command line options re- + quired to link with PCRE2's POSIX API wrapper library + (-lpcre2-posix -lpcre2-8 on many systems). + + --cflags Writes to the standard output the command line options re- + quired to compile files that use PCRE2 (this may include some + -I options, but is blank on many systems). + + --cflags-posix + Writes to the standard output the command line options re- + quired to compile files that use PCRE2's POSIX API wrapper + library (this may include some -I options, but is blank on + many systems). + + +SEE ALSO + + pcre2(3) + + +AUTHOR + + This manual page was originally written by Mark Baker for the Debian + GNU/Linux system. It has been subsequently revised as a generic PCRE2 + man page. 
+ + +REVISION + + Last updated: 28 September 2014 diff --git a/src/pcre2/doc/pcre2.3 b/src/pcre2/doc/pcre2.3 new file mode 100644 index 00000000..efe41c55 --- /dev/null +++ b/src/pcre2/doc/pcre2.3 @@ -0,0 +1,207 @@ +.TH PCRE2 3 "28 April 2021" "PCRE2 10.37" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH INTRODUCTION +.rs +.sp +PCRE2 is the name used for a revised API for the PCRE library, which is a set +of functions, written in C, that implement regular expression pattern matching +using the same syntax and semantics as Perl, with just a few differences. After +nearly two decades, the limitations of the original API were making development +increasingly difficult. The new API is more extensible, and it was simplified +by abolishing the separate "study" optimizing function; in PCRE2, patterns are +automatically optimized where possible. Since forking from PCRE1, the code has +been extensively refactored and new features introduced. +.P +As well as Perl-style regular expression patterns, some features that appeared +in Python and the original PCRE before they appeared in Perl are available +using the Python syntax. There is also some support for one or two .NET and +Oniguruma syntax items, and there are options for requesting some minor changes +that give better ECMAScript (aka JavaScript) compatibility. +.P +The source code for PCRE2 can be compiled to support strings of 8-bit, 16-bit, +or 32-bit code units, which means that up to three separate libraries may be +installed, one for each code unit size. The size of code unit is not related to +the bit size of the underlying hardware. In a 64-bit environment that also +supports 32-bit applications, versions of PCRE2 that are compiled in both +64-bit and 32-bit modes may be needed. +.P +The original work to extend PCRE to 16-bit and 32-bit code units was done by +Zoltan Herczeg and Christian Persch, respectively. 
In all three cases, strings +can be interpreted either as one character per code unit, or as UTF-encoded +Unicode, with support for Unicode general category properties. Unicode support +is optional at build time (but is the default). However, processing strings as +UTF code units must be enabled explicitly at run time. The version of Unicode +in use can be discovered by running +.sp + pcre2test -C +.P +The three libraries contain identical sets of functions, with names ending in +_8, _16, or _32, respectively (for example, \fBpcre2_compile_8()\fP). However, +by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or 32, a program that uses just +one code unit width can be written using generic names such as +\fBpcre2_compile()\fP, and the documentation is written assuming that this is +the case. +.P +In addition to the Perl-compatible matching function, PCRE2 contains an +alternative function that matches the same compiled patterns in a different +way. In certain circumstances, the alternative function has some advantages. +For a discussion of the two matching algorithms, see the +.\" HREF +\fBpcre2matching\fP +.\" +page. +.P +Details of exactly which Perl regular expression features are and are not +supported by PCRE2 are given in separate documents. See the +.\" HREF +\fBpcre2pattern\fP +.\" +and +.\" HREF +\fBpcre2compat\fP +.\" +pages. There is a syntax summary in the +.\" HREF +\fBpcre2syntax\fP +.\" +page. +.P +Some features of PCRE2 can be included, excluded, or changed when the library +is built. The +.\" HREF +\fBpcre2_config()\fP +.\" +function makes it possible for a client to discover which features are +available. The features themselves are described in the +.\" HREF +\fBpcre2build\fP +.\" +page. Documentation about building PCRE2 for various operating systems can be +found in the +.\" HTML +.\" +\fBREADME\fP +.\" +and +.\" HTML +.\" +\fBNON-AUTOTOOLS_BUILD\fP +.\" +files in the source distribution. 
+.P +The libraries contain a number of undocumented internal functions and data +tables that are used by more than one of the exported external functions, but +which are not intended for use by external callers. Their names all begin with +"_pcre2", which hopefully will not provoke any name clashes. In some +environments, it is possible to control which external symbols are exported +when a shared library is built, and in these cases the undocumented symbols are +not exported. +. +. +.SH "SECURITY CONSIDERATIONS" +.rs +.sp +If you are using PCRE2 in a non-UTF application that permits users to supply +arbitrary patterns for compilation, you should be aware of a feature that +allows users to turn on UTF support from within a pattern. For example, an +8-bit pattern that begins with "(*UTF)" turns on UTF-8 mode, which interprets +patterns and subjects as strings of UTF-8 code units instead of individual +8-bit characters. This causes both the pattern and any data against which it is +matched to be checked for UTF-8 validity. If the data string is very long, such +a check might use sufficiently many resources as to cause your application to +lose performance. +.P +One way of guarding against this possibility is to use the +\fBpcre2_pattern_info()\fP function to check the compiled pattern's options for +PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when calling +\fBpcre2_compile()\fP. This causes a compile time error if the pattern contains +a UTF-setting sequence. +.P +The use of Unicode properties for character types such as \ed can also be +enabled from within the pattern, by specifying "(*UCP)". This feature can be +disallowed by setting the PCRE2_NEVER_UCP option. +.P +If your application is one that supports UTF, be aware that validity checking +can take time. If the same data string is to be matched many times, you can use +the PCRE2_NO_UTF_CHECK option for the second and subsequent matches to avoid +running redundant checks.
+.P +The use of the \eC escape sequence in a UTF-8 or UTF-16 pattern can lead to +problems, because it may leave the current matching point in the middle of a +multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used by an +application to lock out the use of \eC, causing a compile-time error if it is +encountered. It is also possible to build PCRE2 with the use of \eC permanently +disabled. +.P +Another way that performance can be hit is by running a pattern that has a very +large search tree against a string that will never match. Nested unlimited +repeats in a pattern are a common example. PCRE2 provides some protection +against this: see the \fBpcre2_set_match_limit()\fP function in the +.\" HREF +\fBpcre2api\fP +.\" +page. There is a similar function called \fBpcre2_set_depth_limit()\fP that can +be used to restrict the amount of memory that is used. +. +. +.SH "USER DOCUMENTATION" +.rs +.sp +The user documentation for PCRE2 comprises a number of different sections. In +the "man" format, each of these is a separate "man page". In the HTML format, +each is a separate page, linked from the index page. In the plain text format, +the descriptions of the \fBpcre2grep\fP and \fBpcre2test\fP programs are in +files called \fBpcre2grep.txt\fP and \fBpcre2test.txt\fP, respectively. The +remaining sections, except for the \fBpcre2demo\fP section (which is a program +listing), and the short pages for individual functions, are concatenated in +\fBpcre2.txt\fP, for ease of searching. 
The sections are as follows: +.sp + pcre2 this document + pcre2-config show PCRE2 installation configuration information + pcre2api details of PCRE2's native C API + pcre2build building PCRE2 + pcre2callout details of the pattern callout feature + pcre2compat discussion of Perl compatibility + pcre2convert details of pattern conversion functions + pcre2demo a demonstration C program that uses PCRE2 + pcre2grep description of the \fBpcre2grep\fP command (8-bit only) + pcre2jit discussion of just-in-time optimization support + pcre2limits details of size and other limits + pcre2matching discussion of the two matching algorithms + pcre2partial details of the partial matching facility +.\" JOIN + pcre2pattern syntax and semantics of supported regular + expression patterns + pcre2perform discussion of performance issues + pcre2posix the POSIX-compatible C API for the 8-bit library + pcre2sample discussion of the pcre2demo program + pcre2serialize details of pattern serialization + pcre2syntax quick syntax reference + pcre2test description of the \fBpcre2test\fP command + pcre2unicode discussion of Unicode and UTF support +.sp +In the "man" and HTML formats, there is also a short page for each C library +function, listing its arguments and results. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +.P +Putting an actual email address here is a spam magnet. If you want to email me, +use my two initials, followed by the two digits 10, at the domain cam.ac.uk. +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 28 April 2021 +Copyright (c) 1997-2021 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2.txt b/src/pcre2/doc/pcre2.txt new file mode 100644 index 00000000..3c3d9803 --- /dev/null +++ b/src/pcre2/doc/pcre2.txt @@ -0,0 +1,11421 @@ +----------------------------------------------------------------------------- +This file contains a concatenation of the PCRE2 man pages, converted to plain +text format for ease of searching with a text editor, or for use on systems +that do not have a man page processor. The small individual files that give +synopses of each function in the library have not been included. Neither has +the pcre2demo program. There are separate text files for the pcre2grep and +pcre2test commands. +----------------------------------------------------------------------------- + + +PCRE2(3) Library Functions Manual PCRE2(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +INTRODUCTION + + PCRE2 is the name used for a revised API for the PCRE library, which is + a set of functions, written in C, that implement regular expression + pattern matching using the same syntax and semantics as Perl, with just + a few differences. After nearly two decades, the limitations of the + original API were making development increasingly difficult. The new + API is more extensible, and it was simplified by abolishing the sepa- + rate "study" optimizing function; in PCRE2, patterns are automatically + optimized where possible. Since forking from PCRE1, the code has been + extensively refactored and new features introduced. + + As well as Perl-style regular expression patterns, some features that + appeared in Python and the original PCRE before they appeared in Perl + are available using the Python syntax. There is also some support for + one or two .NET and Oniguruma syntax items, and there are options for + requesting some minor changes that give better ECMAScript (aka Java- + Script) compatibility. 
+ + The source code for PCRE2 can be compiled to support strings of 8-bit, + 16-bit, or 32-bit code units, which means that up to three separate li- + braries may be installed, one for each code unit size. The size of code + unit is not related to the bit size of the underlying hardware. In a + 64-bit environment that also supports 32-bit applications, versions of + PCRE2 that are compiled in both 64-bit and 32-bit modes may be needed. + + The original work to extend PCRE to 16-bit and 32-bit code units was + done by Zoltan Herczeg and Christian Persch, respectively. In all three + cases, strings can be interpreted either as one character per code + unit, or as UTF-encoded Unicode, with support for Unicode general cate- + gory properties. Unicode support is optional at build time (but is the + default). However, processing strings as UTF code units must be enabled + explicitly at run time. The version of Unicode in use can be discovered + by running + + pcre2test -C + + The three libraries contain identical sets of functions, with names + ending in _8, _16, or _32, respectively (for example, pcre2_com- + pile_8()). However, by defining PCRE2_CODE_UNIT_WIDTH to be 8, 16, or + 32, a program that uses just one code unit width can be written using + generic names such as pcre2_compile(), and the documentation is written + assuming that this is the case. + + In addition to the Perl-compatible matching function, PCRE2 contains an + alternative function that matches the same compiled patterns in a dif- + ferent way. In certain circumstances, the alternative function has some + advantages. For a discussion of the two matching algorithms, see the + pcre2matching page. + + Details of exactly which Perl regular expression features are and are + not supported by PCRE2 are given in separate documents. See the + pcre2pattern and pcre2compat pages. There is a syntax summary in the + pcre2syntax page. 
+ + Some features of PCRE2 can be included, excluded, or changed when the + library is built. The pcre2_config() function makes it possible for a + client to discover which features are available. The features them- + selves are described in the pcre2build page. Documentation about build- + ing PCRE2 for various operating systems can be found in the README and + NON-AUTOTOOLS_BUILD files in the source distribution. + + The libraries contain a number of undocumented internal functions and + data tables that are used by more than one of the exported external + functions, but which are not intended for use by external callers. + Their names all begin with "_pcre2", which hopefully will not provoke + any name clashes. In some environments, it is possible to control which + external symbols are exported when a shared library is built, and in + these cases the undocumented symbols are not exported. + + +SECURITY CONSIDERATIONS + + If you are using PCRE2 in a non-UTF application that permits users to + supply arbitrary patterns for compilation, you should be aware of a + feature that allows users to turn on UTF support from within a pattern. + For example, an 8-bit pattern that begins with "(*UTF)" turns on UTF-8 + mode, which interprets patterns and subjects as strings of UTF-8 code + units instead of individual 8-bit characters. This causes both the pat- + tern and any data against which it is matched to be checked for UTF-8 + validity. If the data string is very long, such a check might use suf- + ficiently many resources as to cause your application to lose perfor- + mance. + + One way of guarding against this possibility is to use the pcre2_pat- + tern_info() function to check the compiled pattern's options for + PCRE2_UTF. Alternatively, you can set the PCRE2_NEVER_UTF option when + calling pcre2_compile(). This causes a compile-time error if the pat- + tern contains a UTF-setting sequence.
+ + The use of Unicode properties for character types such as \d can also + be enabled from within the pattern, by specifying "(*UCP)". This fea- + ture can be disallowed by setting the PCRE2_NEVER_UCP option. + + If your application is one that supports UTF, be aware that validity + checking can take time. If the same data string is to be matched many + times, you can use the PCRE2_NO_UTF_CHECK option for the second and + subsequent matches to avoid running redundant checks. + + The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead + to problems, because it may leave the current matching point in the + middle of a multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C op- + tion can be used by an application to lock out the use of \C, causing a + compile-time error if it is encountered. It is also possible to build + PCRE2 with the use of \C permanently disabled. + + Another way that performance can be hit is by running a pattern that + has a very large search tree against a string that will never match. + Nested unlimited repeats in a pattern are a common example. PCRE2 pro- + vides some protection against this: see the pcre2_set_match_limit() + function in the pcre2api page. There is a similar function called + pcre2_set_depth_limit() that can be used to restrict the amount of mem- + ory that is used. + + +USER DOCUMENTATION + + The user documentation for PCRE2 comprises a number of different sec- + tions. In the "man" format, each of these is a separate "man page". In + the HTML format, each is a separate page, linked from the index page. + In the plain text format, the descriptions of the pcre2grep and + pcre2test programs are in files called pcre2grep.txt and pcre2test.txt, + respectively. The remaining sections, except for the pcre2demo section + (which is a program listing), and the short pages for individual func- + tions, are concatenated in pcre2.txt, for ease of searching. 
The sec- + tions are as follows: + + pcre2 this document + pcre2-config show PCRE2 installation configuration information + pcre2api details of PCRE2's native C API + pcre2build building PCRE2 + pcre2callout details of the pattern callout feature + pcre2compat discussion of Perl compatibility + pcre2convert details of pattern conversion functions + pcre2demo a demonstration C program that uses PCRE2 + pcre2grep description of the pcre2grep command (8-bit only) + pcre2jit discussion of just-in-time optimization support + pcre2limits details of size and other limits + pcre2matching discussion of the two matching algorithms + pcre2partial details of the partial matching facility + pcre2pattern syntax and semantics of supported regular + expression patterns + pcre2perform discussion of performance issues + pcre2posix the POSIX-compatible C API for the 8-bit library + pcre2sample discussion of the pcre2demo program + pcre2serialize details of pattern serialization + pcre2syntax quick syntax reference + pcre2test description of the pcre2test command + pcre2unicode discussion of Unicode and UTF support + + In the "man" and HTML formats, there is also a short page for each C + library function, listing its arguments and results. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + Putting an actual email address here is a spam magnet. If you want to + email me, use my two initials, followed by the two digits 10, at the + domain cam.ac.uk. + + +REVISION + + Last updated: 28 April 2021 + Copyright (c) 1997-2021 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2API(3) Library Functions Manual PCRE2API(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + + #include <pcre2.h> + + PCRE2 is a new API for PCRE, starting at release 10.0. This document + contains a description of all its native functions.
See the pcre2 docu- + ment for an overview of all the PCRE2 documentation. + + +PCRE2 NATIVE API BASIC FUNCTIONS + + pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset, + pcre2_compile_context *ccontext); + + void pcre2_code_free(pcre2_code *code); + + pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize, + pcre2_general_context *gcontext); + + pcre2_match_data *pcre2_match_data_create_from_pattern( + const pcre2_code *code, pcre2_general_context *gcontext); + + int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); + + int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, + int *workspace, PCRE2_SIZE wscount); + + void pcre2_match_data_free(pcre2_match_data *match_data); + + +PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS + + PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data); + + uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data); + + PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data); + + PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data); + + +PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS + + pcre2_general_context *pcre2_general_context_create( + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); + + pcre2_general_context *pcre2_general_context_copy( + pcre2_general_context *gcontext); + + void pcre2_general_context_free(pcre2_general_context *gcontext); + + +PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS + + pcre2_compile_context *pcre2_compile_context_create( + pcre2_general_context *gcontext); + + pcre2_compile_context *pcre2_compile_context_copy( + pcre2_compile_context *ccontext); + + void 
pcre2_compile_context_free(pcre2_compile_context *ccontext); + + int pcre2_set_bsr(pcre2_compile_context *ccontext, + uint32_t value); + + int pcre2_set_character_tables(pcre2_compile_context *ccontext, + const uint8_t *tables); + + int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext, + uint32_t extra_options); + + int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, + PCRE2_SIZE value); + + int pcre2_set_newline(pcre2_compile_context *ccontext, + uint32_t value); + + int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, + uint32_t value); + + int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext, + int (*guard_function)(uint32_t, void *), void *user_data); + + +PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS + + pcre2_match_context *pcre2_match_context_create( + pcre2_general_context *gcontext); + + pcre2_match_context *pcre2_match_context_copy( + pcre2_match_context *mcontext); + + void pcre2_match_context_free(pcre2_match_context *mcontext); + + int pcre2_set_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_callout_block *, void *), + void *callout_data); + + int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *, void *), + void *callout_data); + + int pcre2_set_offset_limit(pcre2_match_context *mcontext, + PCRE2_SIZE value); + + int pcre2_set_heap_limit(pcre2_match_context *mcontext, + uint32_t value); + + int pcre2_set_match_limit(pcre2_match_context *mcontext, + uint32_t value); + + int pcre2_set_depth_limit(pcre2_match_context *mcontext, + uint32_t value); + + +PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS + + int pcre2_substring_copy_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen); + + int pcre2_substring_copy_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR *buffer, + PCRE2_SIZE *bufflen); + + void pcre2_substring_free(PCRE2_UCHAR *buffer); + + int 
pcre2_substring_get_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen); + + int pcre2_substring_get_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR **bufferptr, + PCRE2_SIZE *bufflen); + + int pcre2_substring_length_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_SIZE *length); + + int pcre2_substring_length_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_SIZE *length); + + int pcre2_substring_nametable_scan(const pcre2_code *code, + PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last); + + int pcre2_substring_number_from_name(const pcre2_code *code, + PCRE2_SPTR name); + + void pcre2_substring_list_free(PCRE2_SPTR *list); + + int pcre2_substring_list_get(pcre2_match_data *match_data, + PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr); + + +PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION + + int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, PCRE2_SPTR replacementz, + PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, + PCRE2_SIZE *outlengthptr); + + +PCRE2 NATIVE API JIT FUNCTIONS + + int pcre2_jit_compile(pcre2_code *code, uint32_t options); + + int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); + + void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); + + pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize, + PCRE2_SIZE maxsize, pcre2_general_context *gcontext); + + void pcre2_jit_stack_assign(pcre2_match_context *mcontext, + pcre2_jit_callback callback_function, void *callback_data); + + void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack); + + +PCRE2 NATIVE API SERIALIZATION FUNCTIONS + + int32_t pcre2_serialize_decode(pcre2_code **codes, + int32_t 
number_of_codes, const uint8_t *bytes, + pcre2_general_context *gcontext); + + int32_t pcre2_serialize_encode(const pcre2_code **codes, + int32_t number_of_codes, uint8_t **serialized_bytes, + PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext); + + void pcre2_serialize_free(uint8_t *bytes); + + int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes); + + +PCRE2 NATIVE API AUXILIARY FUNCTIONS + + pcre2_code *pcre2_code_copy(const pcre2_code *code); + + pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code); + + int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer, + PCRE2_SIZE bufflen); + + const uint8_t *pcre2_maketables(pcre2_general_context *gcontext); + + void pcre2_maketables_free(pcre2_general_context *gcontext, + const uint8_t *tables); + + int pcre2_pattern_info(const pcre2_code *code, uint32_t what, + void *where); + + int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); + + int pcre2_config(uint32_t what, void *where); + + +PCRE2 NATIVE API OBSOLETE FUNCTIONS + + int pcre2_set_recursion_limit(pcre2_match_context *mcontext, + uint32_t value); + + int pcre2_set_recursion_memory_management( + pcre2_match_context *mcontext, + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); + + These functions became obsolete at release 10.30 and are retained only + for backward compatibility. They should not be used in new code. The + first is replaced by pcre2_set_depth_limit(); the second is no longer + needed and has no effect (it always returns zero). 
+ + +PCRE2 EXPERIMENTAL PATTERN CONVERSION FUNCTIONS + + pcre2_convert_context *pcre2_convert_context_create( + pcre2_general_context *gcontext); + + pcre2_convert_context *pcre2_convert_context_copy( + pcre2_convert_context *cvcontext); + + void pcre2_convert_context_free(pcre2_convert_context *cvcontext); + + int pcre2_set_glob_escape(pcre2_convert_context *cvcontext, + uint32_t escape_char); + + int pcre2_set_glob_separator(pcre2_convert_context *cvcontext, + uint32_t separator_char); + + int pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, PCRE2_UCHAR **buffer, + PCRE2_SIZE *blength, pcre2_convert_context *cvcontext); + + void pcre2_converted_pattern_free(PCRE2_UCHAR *converted_pattern); + + These functions provide a way of converting non-PCRE2 patterns into + patterns that can be processed by pcre2_compile(). This facility is ex- + perimental and may be changed in future releases. At present, "globs" + and POSIX basic and extended patterns can be converted. Details are + given in the pcre2convert documentation. + + +PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES + + There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit + code units, respectively. However, there is just one header file, + pcre2.h. This contains the function prototypes and other definitions + for all three libraries. One, two, or all three can be installed simul- + taneously. On Unix-like systems the libraries are called libpcre2-8, + libpcre2-16, and libpcre2-32, and they can also co-exist with the orig- + inal PCRE libraries. + + Character strings are passed to and from a PCRE2 library as a sequence + of unsigned integers in code units of the appropriate width. 
Every + PCRE2 function comes in three different forms, one for each library, + for example: + + pcre2_compile_8() + pcre2_compile_16() + pcre2_compile_32() + + There are also three different sets of data types: + + PCRE2_UCHAR8, PCRE2_UCHAR16, PCRE2_UCHAR32 + PCRE2_SPTR8, PCRE2_SPTR16, PCRE2_SPTR32 + + The UCHAR types define unsigned code units of the appropriate widths. + For example, PCRE2_UCHAR16 is usually defined as `uint16_t'. The SPTR + types are constant pointers to the equivalent UCHAR types, that is, + they are pointers to vectors of unsigned code units. + + Many applications use only one code unit width. For their convenience, + macros are defined whose names are the generic forms such as pcre2_com- + pile() and PCRE2_SPTR. These macros use the value of the macro + PCRE2_CODE_UNIT_WIDTH to generate the appropriate width-specific func- + tion and macro names. PCRE2_CODE_UNIT_WIDTH is not defined by default. + An application must define it to be 8, 16, or 32 before including + pcre2.h in order to make use of the generic names. + + Applications that use more than one code unit width can be linked with + more than one PCRE2 library, but must define PCRE2_CODE_UNIT_WIDTH to + be 0 before including pcre2.h, and then use the real function names. + Any code that is to be included in an environment where the value of + PCRE2_CODE_UNIT_WIDTH is unknown should also use the real function + names. (Unfortunately, it is not possible in C code to save and restore + the value of a macro.) + + If PCRE2_CODE_UNIT_WIDTH is not defined before including pcre2.h, a + compiler error occurs. + + When using multiple libraries in an application, you must take care + when processing any particular pattern to use only functions from a + single library. For example, if you want to run a match using a pat- + tern that was compiled with pcre2_compile_16(), you must do so with + pcre2_match_16(), not pcre2_match_8() or pcre2_match_32(). 
+ + In the function summaries above, and in the rest of this document and + other PCRE2 documents, functions and data types are described using + their generic names, without the _8, _16, or _32 suffix. + + +PCRE2 API OVERVIEW + + PCRE2 has its own native API, which is described in this document. + There are also some wrapper functions for the 8-bit library that corre- + spond to the POSIX regular expression API, but they do not give access + to all the functionality of PCRE2. They are described in the pcre2posix + documentation. Both these APIs define a set of C function calls. + + The native API C data types, function prototypes, option values, and + error codes are defined in the header file pcre2.h, which also contains + definitions of PCRE2_MAJOR and PCRE2_MINOR, the major and minor release + numbers for the library. Applications can use these to include support + for different releases of PCRE2. + + In a Windows environment, if you want to statically link an application + program against a non-dll PCRE2 library, you must define PCRE2_STATIC + before including pcre2.h. + + The functions pcre2_compile() and pcre2_match() are used for compiling + and matching regular expressions in a Perl-compatible manner. A sample + program that demonstrates the simplest way of using them is provided in + the file called pcre2demo.c in the PCRE2 source distribution. A listing + of this program is given in the pcre2demo documentation, and the + pcre2sample documentation describes how to compile and run it. + + The compiling and matching functions recognize various options that are + passed as bits in an options argument. There are also some more compli- + cated parameters such as custom memory management functions and re- + source limits that are passed in "contexts" (which are just memory + blocks, described below). Simple applications do not need to make use + of contexts. 
+ + Just-in-time (JIT) compiler support is an optional feature of PCRE2 + that can be built in appropriate hardware environments. It greatly + speeds up the matching performance of many patterns. Programs can re- + quest that it be used if available by calling pcre2_jit_compile() after + a pattern has been successfully compiled by pcre2_compile(). This does + nothing if JIT support is not available. + + More complicated programs might need to make use of the specialist + functions pcre2_jit_stack_create(), pcre2_jit_stack_free(), and + pcre2_jit_stack_assign() in order to control the JIT code's memory us- + age. + + JIT matching is automatically used by pcre2_match() if it is available, + unless the PCRE2_NO_JIT option is set. There is also a direct interface + for JIT matching, which gives improved performance at the expense of + less sanity checking. The JIT-specific functions are discussed in the + pcre2jit documentation. + + A second matching function, pcre2_dfa_match(), which is not Perl-com- + patible, is also provided. This uses a different algorithm for the + matching. The alternative algorithm finds all possible matches (at a + given point in the subject), and scans the subject just once (unless + there are lookaround assertions). However, this algorithm does not re- + turn captured substrings. A description of the two matching algorithms + and their advantages and disadvantages is given in the pcre2matching + documentation. There is no JIT support for pcre2_dfa_match(). + + In addition to the main compiling and matching functions, there are + convenience functions for extracting captured substrings from a subject + string that has been matched by pcre2_match(). 
They are: + + pcre2_substring_copy_byname() + pcre2_substring_copy_bynumber() + pcre2_substring_get_byname() + pcre2_substring_get_bynumber() + pcre2_substring_list_get() + pcre2_substring_length_byname() + pcre2_substring_length_bynumber() + pcre2_substring_nametable_scan() + pcre2_substring_number_from_name() + + pcre2_substring_free() and pcre2_substring_list_free() are also pro- + vided, to free memory used for extracted strings. If either of these + functions is called with a NULL argument, the function returns immedi- + ately without doing anything. + + The function pcre2_substitute() can be called to match a pattern and + return a copy of the subject string with substitutions for parts that + were matched. + + Functions whose names begin with pcre2_serialize_ are used for saving + compiled patterns on disc or elsewhere, and reloading them later. + + Finally, there are functions for finding out information about a com- + piled pattern (pcre2_pattern_info()) and about the configuration with + which PCRE2 was built (pcre2_config()). + + Functions with names ending with _free() are used for freeing memory + blocks of various sorts. In all cases, if one of these functions is + called with a NULL argument, it does nothing. + + +STRING LENGTHS AND OFFSETS + + The PCRE2 API uses string lengths and offsets into strings of code + units in several places. These values are always of type PCRE2_SIZE, + which is an unsigned integer type, currently always defined as size_t. + The largest value that can be stored in such a type (that is + ~(PCRE2_SIZE)0) is reserved as a special indicator for zero-terminated + strings and unset offsets. Therefore, the longest string that can be + handled is one less than this maximum. 
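The reserved maximum value can be illustrated without PCRE2 itself. In this sketch, SIZE_SENTINEL and effective_length() are hypothetical stand-ins, with size_t playing the role of PCRE2_SIZE; it only mirrors the convention described above and is not how the library is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the reserved value ~(PCRE2_SIZE)0, using
   size_t in place of PCRE2_SIZE. */
#define SIZE_SENTINEL (~(size_t)0)

/* Hypothetical helper: resolve a length argument that may be the
   sentinel. The sentinel means "measure the string up to its
   terminating zero"; any other value is taken as an explicit length. */
static size_t effective_length(const char *s, size_t length)
{
    return (length == SIZE_SENTINEL) ? strlen(s) : length;
}
```

Because the all-ones value is taken by the sentinel, the largest explicit length that can be expressed is SIZE_MAX - 1.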
+ + +NEWLINES + + PCRE2 supports five different conventions for indicating line breaks in + strings: a single CR (carriage return) character, a single LF (line- + feed) character, the two-character sequence CRLF, any of the three pre- + ceding, or any Unicode newline sequence. The Unicode newline sequences + are the three just mentioned, plus the single characters VT (vertical + tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line + separator, U+2028), and PS (paragraph separator, U+2029). + + Each of the first three conventions is used by at least one operating + system as its standard newline sequence. When PCRE2 is built, a default + can be specified. If it is not, the default is set to LF, which is the + Unix standard. However, the newline convention can be changed by an ap- + plication when calling pcre2_compile(), or it can be specified by spe- + cial text at the start of the pattern itself; this overrides any other + settings. See the pcre2pattern page for details of the special charac- + ter sequences. + + In the PCRE2 documentation the word "newline" is used to mean "the + character or pair of characters that indicate a line break". The choice + of newline convention affects the handling of the dot, circumflex, and + dollar metacharacters, the handling of #-comments in /x mode, and, when + CRLF is a recognized line ending sequence, the match position advance- + ment for a non-anchored pattern. There is more detail about this in the + section on pcre2_match() options below. + + The choice of newline convention does not affect the interpretation of + the \n or \r escape sequences, nor does it affect what \R matches; this + has its own separate convention. + + +MULTITHREADING + + In a multithreaded application it is important to keep thread-specific + data separate from data that can be shared between threads. The PCRE2 + library code itself is thread-safe: it contains no static or global + variables. 
The API is designed to be fairly simple for non-threaded ap- + plications while at the same time ensuring that multithreaded applica- + tions can use it. + + There are several different blocks of data that are used to pass infor- + mation between the application and the PCRE2 libraries. + + The compiled pattern + + A pointer to the compiled form of a pattern is returned to the user + when pcre2_compile() is successful. The data in the compiled pattern is + fixed, and does not change when the pattern is matched. Therefore, it + is thread-safe, that is, the same compiled pattern can be used by more + than one thread simultaneously. For example, an application can compile + all its patterns at the start, before forking off multiple threads that + use them. However, if the just-in-time (JIT) optimization feature is + being used, it needs separate memory stack areas for each thread. See + the pcre2jit documentation for more details. + + In a more complicated situation, where patterns are compiled only when + they are first needed, but are still shared between threads, pointers + to compiled patterns must be protected from simultaneous writing by + multiple threads. This is somewhat tricky to do correctly. If you know + that writing to a pointer is atomic in your environment, you can use + logic like this: + + Get a read-only (shared) lock (mutex) for pointer + if (pointer == NULL) + { + Get a write (unique) lock for pointer + if (pointer == NULL) pointer = pcre2_compile(... + } + Release the lock + Use pointer in pcre2_match() + + Of course, testing for compilation errors should also be included in + the code. + + The reason for checking the pointer a second time is as follows: Sev- + eral threads may have acquired the shared lock and tested the pointer + for being NULL, but only one of them will be given the write lock, with + the rest kept waiting. The winning thread will compile the pattern and + store the result. 
After this thread releases the write lock, another + thread will get it, and if it does not retest pointer for being NULL, + will recompile the pattern and overwrite the pointer, creating a memory + leak and possibly causing other issues. + + In an environment where writing to a pointer may not be atomic, the + above logic is not sufficient. The thread that is doing the compiling + may be descheduled after writing only part of the pointer, which could + cause other threads to use an invalid value. Instead of checking the + pointer itself, a separate "pointer is valid" flag (that can be updated + atomically) must be used: + + Get a read-only (shared) lock (mutex) for pointer + if (!pointer_is_valid) + { + Get a write (unique) lock for pointer + if (!pointer_is_valid) + { + pointer = pcre2_compile(... + pointer_is_valid = TRUE + } + } + Release the lock + Use pointer in pcre2_match() + + If JIT is being used, but the JIT compilation is not being done immedi- + ately (perhaps waiting to see if the pattern is used often enough), + similar logic is required. JIT compilation updates a value within the + compiled code block, so a thread must gain unique write access to the + pointer before calling pcre2_jit_compile(). Alternatively, + pcre2_code_copy() or pcre2_code_copy_with_tables() can be used to ob- + tain a private copy of the compiled code before calling the JIT com- + piler. + + Context blocks + + The next main section below introduces the idea of "contexts" in which + PCRE2 functions are called. A context is nothing more than a collection + of parameters that control the way PCRE2 operates. Grouping a number of + parameters together in a context is a convenient way of passing them to + a PCRE2 function without using lots of arguments. The parameters that + are stored in contexts are in some sense "advanced features" of the + API. Many straightforward applications will not need to use contexts. 
+ + In a multithreaded application, if the parameters in a context are val- + ues that are never changed, the same context can be used by all the + threads. However, if any thread needs to change any value in a context, + it must make its own thread-specific copy. + + Match blocks + + The matching functions need a block of memory for storing the results + of a match. This includes details of what was matched, as well as addi- + tional information such as the name of a (*MARK) setting. Each thread + must provide its own copy of this memory. + + +PCRE2 CONTEXTS + + Some PCRE2 functions have a lot of parameters, many of which are used + only by specialist applications, for example, those that use custom + memory management or non-standard character tables. To keep function + argument lists at a reasonable size, and at the same time to keep the + API extensible, "uncommon" parameters are passed to certain functions + in a context instead of directly. A context is just a block of memory + that holds the parameter values. Applications that do not need to ad- + just any of the context parameters can pass NULL when a context pointer + is required. + + There are three different types of context: a general context that is + relevant for several PCRE2 operations, a compile-time context, and a + match-time context. + + The general context + + At present, this context just contains pointers to (and data for) ex- + ternal memory management functions that are called from several places + in the PCRE2 library. The context is named `general' rather than + specifically `memory' because in future other fields may be added. If + you do not want to supply your own custom memory management functions, + you do not need to bother with a general context. 
A general context is + created by: + + pcre2_general_context *pcre2_general_context_create( + void *(*private_malloc)(PCRE2_SIZE, void *), + void (*private_free)(void *, void *), void *memory_data); + + The two function pointers specify custom memory management functions, + whose prototypes are: + + void *private_malloc(PCRE2_SIZE, void *); + void private_free(void *, void *); + + Whenever code in PCRE2 calls these functions, the final argument is the + value of memory_data. Either of the first two arguments of the creation + function may be NULL, in which case the system memory management func- + tions malloc() and free() are used. (This is not currently useful, as + there are no other fields in a general context, but in future there + might be.) The private_malloc() function is used (if supplied) to ob- + tain memory for storing the context, and all three values are saved as + part of the context. + + Whenever PCRE2 creates a data block of any kind, the block contains a + pointer to the free() function that matches the malloc() function that + was used. When the time comes to free the block, this function is + called. + + A general context can be copied by calling: + + pcre2_general_context *pcre2_general_context_copy( + pcre2_general_context *gcontext); + + The memory used for a general context should be freed by calling: + + void pcre2_general_context_free(pcre2_general_context *gcontext); + + If this function is passed a NULL argument, it returns immediately + without doing anything. 
+ + The compile context + + A compile context is required if you want to provide an external func- + tion for stack checking during compilation or to change the default + values of any of the following compile-time parameters: + + What \R matches (Unicode newlines or CR, LF, CRLF only) + PCRE2's character tables + The newline character sequence + The compile time nested parentheses limit + The maximum length of the pattern string + The extra options bits (none set by default) + + A compile context is also required if you are using custom memory man- + agement. If none of these apply, just pass NULL as the context argu- + ment of pcre2_compile(). + + A compile context is created, copied, and freed by the following func- + tions: + + pcre2_compile_context *pcre2_compile_context_create( + pcre2_general_context *gcontext); + + pcre2_compile_context *pcre2_compile_context_copy( + pcre2_compile_context *ccontext); + + void pcre2_compile_context_free(pcre2_compile_context *ccontext); + + A compile context is created with default values for its parameters. + These can be changed by calling the following functions, which return 0 + on success, or PCRE2_ERROR_BADDATA if invalid data is detected. + + int pcre2_set_bsr(pcre2_compile_context *ccontext, + uint32_t value); + + The value must be PCRE2_BSR_ANYCRLF, to specify that \R matches only + CR, LF, or CRLF, or PCRE2_BSR_UNICODE, to specify that \R matches any + Unicode line ending sequence. The value is used by the JIT compiler and + by the two interpreted matching functions, pcre2_match() and + pcre2_dfa_match(). + + int pcre2_set_character_tables(pcre2_compile_context *ccontext, + const uint8_t *tables); + + The value must be the result of a call to pcre2_maketables(), whose + only argument is a general context. This function builds a set of char- + acter tables in the current locale. 
+ + int pcre2_set_compile_extra_options(pcre2_compile_context *ccontext, + uint32_t extra_options); + + As PCRE2 has developed, almost all the 32 option bits that are avail- + able in the options argument of pcre2_compile() have been used up. To + avoid running out, the compile context contains a set of extra option + bits which are used for some newer, assumed rarer, options. This func- + tion sets those bits. It always sets all the bits (either on or off). + It does not modify any existing setting. The available options are de- + fined in the section entitled "Extra compile options" below. + + int pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, + PCRE2_SIZE value); + + This sets a maximum length, in code units, for any pattern string that + is compiled with this context. If the pattern is longer, an error is + generated. This facility is provided so that applications that accept + patterns from external sources can limit their size. The default is the + largest number that a PCRE2_SIZE variable can hold, which is effec- + tively unlimited. + + int pcre2_set_newline(pcre2_compile_context *ccontext, + uint32_t value); + + This specifies which characters or character sequences are to be recog- + nized as newlines. The value must be one of PCRE2_NEWLINE_CR (carriage + return only), PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the + two-character sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any + of the above), PCRE2_NEWLINE_ANY (any Unicode newline sequence), or + PCRE2_NEWLINE_NUL (the NUL character, that is a binary zero). + + A pattern can override the value set in the compile context by starting + with a sequence such as (*CRLF). See the pcre2pattern page for details. + + When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EX- + TENDED_MORE option, the newline convention affects the recognition of + the end of internal comments starting with #. 
The value is saved with + the compiled pattern for subsequent use by the JIT compiler and by the + two interpreted matching functions, pcre2_match() and + pcre2_dfa_match(). + + int pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, + uint32_t value); + + This parameter adjusts the limit, set when PCRE2 is built (default + 250), on the depth of parenthesis nesting in a pattern. This limit + stops rogue patterns using up too much system stack when being com- + piled. The limit applies to parentheses of all kinds, not just captur- + ing parentheses. + + int pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext, + int (*guard_function)(uint32_t, void *), void *user_data); + + There is at least one application that runs PCRE2 in threads with very + limited system stack, where running out of stack is to be avoided at + all costs. The parenthesis limit above cannot take account of how much + stack is actually available during compilation. For a finer control, + you can supply a function that is called whenever pcre2_compile() + starts to compile a parenthesized part of a pattern. This function can + check the actual stack size (or anything else that it wants to, of + course). + + The first argument to the callout function gives the current depth of + nesting, and the second is user data that is set up by the last argu- + ment of pcre2_set_compile_recursion_guard(). The callout function + should return zero if all is well, or non-zero to force an error. + + The match context + + A match context is required if you want to: + + Set up a callout function + Set an offset limit for matching an unanchored pattern + Change the limit on the amount of heap used when matching + Change the backtracking match limit + Change the backtracking depth limit + Set custom memory management specifically for the match + + If none of these apply, just pass NULL as the context argument of + pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match(). 
+ + A match context is created, copied, and freed by the following func- + tions: + + pcre2_match_context *pcre2_match_context_create( + pcre2_general_context *gcontext); + + pcre2_match_context *pcre2_match_context_copy( + pcre2_match_context *mcontext); + + void pcre2_match_context_free(pcre2_match_context *mcontext); + + A match context is created with default values for its parameters. + These can be changed by calling the following functions, which return 0 + on success, or PCRE2_ERROR_BADDATA if invalid data is detected. + + int pcre2_set_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_callout_block *, void *), + void *callout_data); + + This sets up a callout function for PCRE2 to call at specified points + during a matching operation. Details are given in the pcre2callout doc- + umentation. + + int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *, void *), + void *callout_data); + + This sets up a callout function for PCRE2 to call after each substitu- + tion made by pcre2_substitute(). Details are given in the section enti- + tled "Creating a new string with substitutions" below. + + int pcre2_set_offset_limit(pcre2_match_context *mcontext, + PCRE2_SIZE value); + + The offset_limit parameter limits how far an unanchored search can ad- + vance in the subject string. The default value is PCRE2_UNSET. The + pcre2_match() and pcre2_dfa_match() functions return PCRE2_ERROR_NO- + MATCH if a match with a starting point before or at the given offset is + not found. The pcre2_substitute() function makes no more substitutions. + + For example, if the pattern /abc/ is matched against "123abc" with an + offset limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match + can never be found if the startoffset argument of pcre2_match(), + pcre2_dfa_match(), or pcre2_substitute() is greater than the offset + limit set in the match context. 
+ + When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT op- + tion when calling pcre2_compile() so that when JIT is in use, different + code can be compiled. If a match is started with a non-default offset + limit when PCRE2_USE_OFFSET_LIMIT is not set, an error is generated. + + The offset limit facility can be used to track progress when searching + large subject strings or to limit the extent of global substitutions. + See also the PCRE2_FIRSTLINE option, which requires a match to start + before or at the first newline that follows the start of matching in + the subject. If this is set with an offset limit, a match must occur in + the first line and also within the offset limit. In other words, which- + ever limit comes first is used. + + int pcre2_set_heap_limit(pcre2_match_context *mcontext, + uint32_t value); + + The heap_limit parameter specifies, in units of kibibytes (1024 bytes), + the maximum amount of heap memory that pcre2_match() may use to hold + backtracking information when running an interpretive match. This limit + also applies to pcre2_dfa_match(), which may use the heap when process- + ing patterns with a lot of nested pattern recursion or lookarounds or + atomic groups. This limit does not apply to matching with the JIT opti- + mization, which has its own memory control arrangements (see the + pcre2jit documentation for more details). If the limit is reached, the + negative error code PCRE2_ERROR_HEAPLIMIT is returned. The default + limit can be set when PCRE2 is built; if it is not, the default is set + very large and is essentially "unlimited". + + A value for the heap limit may also be supplied by an item at the start + of a pattern of the form + + (*LIMIT_HEAP=ddd) + + where ddd is a decimal number. However, such a setting is ignored un- + less ddd is less than the limit set by the caller of pcre2_match() or, + if no such limit is set, less than the default.
+ + The pcre2_match() function starts out using a 20KiB vector on the sys- + tem stack for recording backtracking points. The more nested backtrack- + ing points there are (that is, the deeper the search tree), the more + memory is needed. Heap memory is used only if the initial vector is + too small. If the heap limit is set to a value less than 21 (in partic- + ular, zero) no heap memory will be used. In this case, only patterns + that do not have a lot of nested backtracking can be successfully pro- + cessed. + + Similarly, for pcre2_dfa_match(), a vector on the system stack is used + when processing pattern recursions, lookarounds, or atomic groups, and + only if this is not big enough is heap memory used. In this case, too, + setting a value of zero disables the use of the heap. + + int pcre2_set_match_limit(pcre2_match_context *mcontext, + uint32_t value); + + The match_limit parameter provides a means of preventing PCRE2 from us- + ing up too many computing resources when processing patterns that are + not going to match, but which have a very large number of possibilities + in their search trees. The classic example is a pattern that uses + nested unlimited repeats. + + There is an internal counter in pcre2_match() that is incremented each + time round its main matching loop. If this value reaches the match + limit, pcre2_match() returns the negative value PCRE2_ERROR_MATCHLIMIT. + This has the effect of limiting the amount of backtracking that can + take place. For patterns that are not anchored, the count restarts from + zero for each position in the subject string. This limit also applies + to pcre2_dfa_match(), though the counting is done in a different way. + + When pcre2_match() is called with a pattern that was successfully pro- + cessed by pcre2_jit_compile(), the way in which matching is executed is + entirely different. 
However, there is still the possibility of runaway + matching that goes on for a very long time, and so the match_limit + value is also used in this case (but in a different way) to limit how + long the matching can continue. + + The default value for the limit can be set when PCRE2 is built; if it + is not, the default is 10 million, which handles all but the most extreme + cases. A value for the match limit may also be supplied by an item at + the start of a pattern of the form + + (*LIMIT_MATCH=ddd) + + where ddd is a decimal number. However, such a setting is ignored un- + less ddd is less than the limit set by the caller of pcre2_match() or + pcre2_dfa_match() or, if no such limit is set, less than the default. + + int pcre2_set_depth_limit(pcre2_match_context *mcontext, + uint32_t value); + + This parameter limits the depth of nested backtracking in + pcre2_match(). Each time a nested backtracking point is passed, a new + memory "frame" is used to remember the state of matching at that point. + Thus, this parameter indirectly limits the amount of memory that is + used in a match. However, because the size of each memory "frame" de- + pends on the number of capturing parentheses, the actual memory limit + varies from pattern to pattern. This limit was more useful in versions + before 10.30, where function recursion was used for backtracking. + + The depth limit is not relevant, and is ignored, when matching is done + using JIT compiled code. However, it is supported by pcre2_dfa_match(), + which uses it to limit the depth of nested internal recursive function + calls that implement atomic groups, lookaround assertions, and pattern + recursions. This limits, indirectly, the amount of system stack that is + used. It was more useful in versions before 10.32, when stack memory + was used for local workspace vectors for recursive function calls.
From + version 10.32, only local variables are allocated on the stack and as + each call uses only a few hundred bytes, even a small stack can support + quite a lot of recursion. + + If the depth of internal recursive function calls is great enough, lo- + cal workspace vectors are allocated on the heap from version 10.32 on- + wards, so the depth limit also indirectly limits the amount of heap + memory that is used. A recursive pattern such as /(.(?2))((?1)|)/, when + matched to a very long string using pcre2_dfa_match(), can use a great + deal of memory. However, it is probably better to limit heap usage di- + rectly by calling pcre2_set_heap_limit(). + + The default value for the depth limit can be set when PCRE2 is built; + if it is not, the default is set to the same value as the default for + the match limit. If the limit is exceeded, pcre2_match() or + pcre2_dfa_match() returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth + limit may also be supplied by an item at the start of a pattern of the + form + + (*LIMIT_DEPTH=ddd) + + where ddd is a decimal number. However, such a setting is ignored un- + less ddd is less than the limit set by the caller of pcre2_match() or + pcre2_dfa_match() or, if no such limit is set, less than the default. + + +CHECKING BUILD-TIME OPTIONS + + int pcre2_config(uint32_t what, void *where); + + The function pcre2_config() makes it possible for a PCRE2 client to + find the value of certain configuration parameters and to discover + which optional features have been compiled into the PCRE2 library. The + pcre2build documentation has more details about these features. + + The first argument for pcre2_config() specifies which information is + required. The second argument is a pointer to memory into which the in- + formation is placed. If NULL is passed, the function returns the amount + of memory that is needed for the requested information. 
For calls that + return numerical values, the value is in bytes; when requesting these + values, where should point to appropriately aligned memory. For calls + that return strings, the required length is given in code units, not + counting the terminating zero. + + When requesting information, the returned value from pcre2_config() is + non-negative on success, or the negative error code PCRE2_ERROR_BADOP- + TION if the value in the first argument is not recognized. The follow- + ing information is available: + + PCRE2_CONFIG_BSR + + The output is a uint32_t integer whose value indicates what character + sequences the \R escape sequence matches by default. A value of + PCRE2_BSR_UNICODE means that \R matches any Unicode line ending se- + quence; a value of PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, + or CRLF. The default can be overridden when a pattern is compiled. + + PCRE2_CONFIG_COMPILED_WIDTHS + + The output is a uint32_t integer whose lower bits indicate which code + unit widths were selected when PCRE2 was built. The 1-bit indicates + 8-bit support, and the 2-bit and 4-bit indicate 16-bit and 32-bit sup- + port, respectively. + + PCRE2_CONFIG_DEPTHLIMIT + + The output is a uint32_t integer that gives the default limit for the + depth of nested backtracking in pcre2_match() or the depth of nested + recursions, lookarounds, and atomic groups in pcre2_dfa_match(). Fur- + ther details are given with pcre2_set_depth_limit() above. + + PCRE2_CONFIG_HEAPLIMIT + + The output is a uint32_t integer that gives, in kibibytes, the default + limit for the amount of heap memory used by pcre2_match() or + pcre2_dfa_match(). Further details are given with + pcre2_set_heap_limit() above. + + PCRE2_CONFIG_JIT + + The output is a uint32_t integer that is set to one if support for + just-in-time compiling is available; otherwise it is set to zero. + + PCRE2_CONFIG_JITTARGET + + The where argument should point to a buffer that is at least 48 code + units long. 
(The exact length required can be found by calling + pcre2_config() with where set to NULL.) The buffer is filled with a + string that contains the name of the architecture for which the JIT + compiler is configured, for example "x86 32bit (little endian + un- + aligned)". If JIT support is not available, PCRE2_ERROR_BADOPTION is + returned, otherwise the number of code units used is returned. This is + the length of the string, plus one unit for the terminating zero. + + PCRE2_CONFIG_LINKSIZE + + The output is a uint32_t integer that contains the number of bytes used + for internal linkage in compiled regular expressions. When PCRE2 is + configured, the value can be set to 2, 3, or 4, with the default being + 2. This is the value that is returned by pcre2_config(). However, when + the 16-bit library is compiled, a value of 3 is rounded up to 4, and + when the 32-bit library is compiled, internal linkages always use 4 + bytes, so the configured value is not relevant. + + The default value of 2 for the 8-bit and 16-bit libraries is sufficient + for all but the most massive patterns, since it allows the size of the + compiled pattern to be up to 65535 code units. Larger values allow + larger regular expressions to be compiled by those two libraries, but + at the expense of slower matching. + + PCRE2_CONFIG_MATCHLIMIT + + The output is a uint32_t integer that gives the default match limit for + pcre2_match(). Further details are given with pcre2_set_match_limit() + above. + + PCRE2_CONFIG_NEWLINE + + The output is a uint32_t integer whose value specifies the default + character sequence that is recognized as meaning "newline". 
The values + are: + + PCRE2_NEWLINE_CR Carriage return (CR) + PCRE2_NEWLINE_LF Linefeed (LF) + PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) + PCRE2_NEWLINE_ANY Any Unicode line ending + PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF + PCRE2_NEWLINE_NUL The NUL character (binary zero) + + The default should normally correspond to the standard sequence for + your operating system. + + PCRE2_CONFIG_NEVER_BACKSLASH_C + + The output is a uint32_t integer that is set to one if the use of \C + was permanently disabled when PCRE2 was built; otherwise it is set to + zero. + + PCRE2_CONFIG_PARENSLIMIT + + The output is a uint32_t integer that gives the maximum depth of nest- + ing of parentheses (of any kind) in a pattern. This limit is imposed to + cap the amount of system stack used when a pattern is compiled. It is + specified when PCRE2 is built; the default is 250. This limit does not + take into account the stack that may already be used by the calling ap- + plication. For finer control over compilation stack usage, see + pcre2_set_compile_recursion_guard(). + + PCRE2_CONFIG_STACKRECURSE + + This parameter is obsolete and should not be used in new code. The out- + put is a uint32_t integer that is always set to zero. + + PCRE2_CONFIG_TABLES_LENGTH + + The output is a uint32_t integer that gives the length of PCRE2's char- + acter processing tables in bytes. For details of these tables see the + section on locale support below. + + PCRE2_CONFIG_UNICODE_VERSION + + The where argument should point to a buffer that is at least 24 code + units long. (The exact length required can be found by calling + pcre2_config() with where set to NULL.) If PCRE2 has been compiled + without Unicode support, the buffer is filled with the text "Unicode + not supported". Otherwise, the Unicode version string (for example, + "8.0.0") is inserted. The number of code units used is returned. This + is the length of the string plus one unit for the terminating zero. 
+ + PCRE2_CONFIG_UNICODE + + The output is a uint32_t integer that is set to one if Unicode support + is available; otherwise it is set to zero. Unicode support implies UTF + support. + + PCRE2_CONFIG_VERSION + + The where argument should point to a buffer that is at least 24 code + units long. (The exact length required can be found by calling + pcre2_config() with where set to NULL.) The buffer is filled with the + PCRE2 version string, zero-terminated. The number of code units used is + returned. This is the length of the string plus one unit for the termi- + nating zero. + + +COMPILING A PATTERN + + pcre2_code *pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE length, + uint32_t options, int *errorcode, PCRE2_SIZE *erroroffset, + pcre2_compile_context *ccontext); + + void pcre2_code_free(pcre2_code *code); + + pcre2_code *pcre2_code_copy(const pcre2_code *code); + + pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *code); + + The pcre2_compile() function compiles a pattern into an internal form. + The pattern is defined by a pointer to a string of code units and a + length (in code units). If the pattern is zero-terminated, the length + can be specified as PCRE2_ZERO_TERMINATED. The function returns a + pointer to a block of memory that contains the compiled pattern and re- + lated data, or NULL if an error occurred. + + If the compile context argument ccontext is NULL, memory for the com- + piled pattern is obtained by calling malloc(). Otherwise, it is ob- + tained from the same memory function that was used for the compile con- + text. The caller must free the memory by calling pcre2_code_free() when + it is no longer needed. If pcre2_code_free() is called with a NULL ar- + gument, it returns immediately, without doing anything. + + The function pcre2_code_copy() makes a copy of the compiled code in new + memory, using the same memory allocator as was used for the original. 
+ However, if the code has been processed by the JIT compiler (see be- + low), the JIT information cannot be copied (because it is position-de- + pendent). The new copy can initially be used only for non-JIT match- + ing, though it can be passed to pcre2_jit_compile() if required. If + pcre2_code_copy() is called with a NULL argument, it returns NULL. + + The pcre2_code_copy() function provides a way for individual threads in + a multithreaded application to acquire a private copy of shared com- + piled code. However, it does not make a copy of the character tables + used by the compiled pattern; the new pattern code points to the same + tables as the original code. (See "Locale Support" below for details + of these character tables.) In many applications the same tables are + used throughout, so this behaviour is appropriate. Nevertheless, there + are occasions when a copy of a compiled pattern and the relevant tables + are needed. The pcre2_code_copy_with_tables() provides this facility. + Copies of both the code and the tables are made, with the new code + pointing to the new tables. The memory for the new tables is automati- + cally freed when pcre2_code_free() is called for the new copy of the + compiled code. If pcre2_code_copy_with_tables() is called with a NULL + argument, it returns NULL. + + NOTE: When one of the matching functions is called, pointers to the + compiled pattern and the subject string are set in the match data block + so that they can be referenced by the substring extraction functions + after a successful match. After running a match, you must not free a + compiled pattern or a subject string until after all operations on the + match data block have taken place, unless, in the case of the subject + string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is + described in the section entitled "Option bits for pcre2_match()" be- + low. 
+ + The options argument for pcre2_compile() contains various bit settings + that affect the compilation. It should be zero if none of them are re- + quired. The available options are described below. Some of them (in + particular, those that are compatible with Perl, but some others as + well) can also be set and unset from within the pattern (see the de- + tailed description in the pcre2pattern documentation). + + For those options that can be different in different parts of the pat- + tern, the contents of the options argument specifies their settings at + the start of compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and + PCRE2_NO_UTF_CHECK options can be set at the time of matching as well + as at compile time. + + Some additional options and less frequently required compile-time pa- + rameters (for example, the newline setting) can be provided in a com- + pile context (as described above). + + If errorcode or erroroffset is NULL, pcre2_compile() returns NULL imme- + diately. Otherwise, the variables to which these point are set to an + error code and an offset (number of code units) within the pattern, re- + spectively, when pcre2_compile() returns NULL because a compilation er- + ror has occurred. The values are not defined when compilation is suc- + cessful and pcre2_compile() returns a non-NULL value. + + There are nearly 100 positive error codes that pcre2_compile() may re- + turn if it finds an error in the pattern. There are also some negative + error codes that are used for invalid UTF strings when validity check- + ing is in force. These are the same as given by pcre2_match() and + pcre2_dfa_match(), and are described in the pcre2unicode documentation. + There is no separate documentation for the positive error codes, be- + cause the textual error messages that are obtained by calling the + pcre2_get_error_message() function (see "Obtaining a textual error mes- + sage" below) should be self-explanatory. 
Macro names starting with + PCRE2_ERROR_ are defined for both positive and negative error codes in + pcre2.h. + + The value returned in erroroffset is an indication of where in the pat- + tern the error occurred. It is not necessarily the furthest point in + the pattern that was read. For example, after the error "lookbehind as- + sertion is not fixed length", the error offset points to the start of + the failing assertion. For an invalid UTF-8 or UTF-16 string, the off- + set is that of the first code unit of the failing character. + + Some errors are not detected until the whole pattern has been scanned; + in these cases, the offset passed back is the length of the pattern. + Note that the offset is in code units, not characters, even in a UTF + mode. It may sometimes point into the middle of a UTF-8 or UTF-16 char- + acter. + + This code fragment shows a typical straightforward call to pcre2_com- + pile(): + + pcre2_code *re; + PCRE2_SIZE erroffset; + int errorcode; + re = pcre2_compile( + "^A.*Z", /* the pattern */ + PCRE2_ZERO_TERMINATED, /* the pattern is zero-terminated */ + 0, /* default options */ + &errorcode, /* for error code */ + &erroffset, /* for error offset */ + NULL); /* no compile context */ + + + Main compile options + + The following names for option bits are defined in the pcre2.h header + file: + + PCRE2_ANCHORED + + If this bit is set, the pattern is forced to be "anchored", that is, it + is constrained to match only at the first matching point in the string + that is being searched (the "subject string"). This effect can also be + achieved by appropriate constructs in the pattern itself, which is the + only way to do it in Perl. + + PCRE2_ALLOW_EMPTY_CLASS + + By default, for compatibility with Perl, a closing square bracket that + immediately follows an opening one is treated as a data character for + the class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the + class, which therefore contains no characters and so can never match. 
+ + PCRE2_ALT_BSUX + + This option requests alternative handling of three escape sequences, + which makes PCRE2's behaviour more like ECMAscript (aka JavaScript). + When it is set: + + (1) \U matches an upper case "U" character; by default \U causes a com- + pile time error (Perl uses \U to upper case subsequent characters). + + (2) \u matches a lower case "u" character unless it is followed by four + hexadecimal digits, in which case the hexadecimal number defines the + code point to match. By default, \u causes a compile time error (Perl + uses it to upper case the following character). + + (3) \x matches a lower case "x" character unless it is followed by two + hexadecimal digits, in which case the hexadecimal number defines the + code point to match. By default, as in Perl, a hexadecimal number is + always expected after \x, but it may have zero, one, or two digits (so, + for example, \xz matches a binary zero character followed by z). + + ECMAscript 6 added additional functionality to \u. This can be accessed + using the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile op- + tions" below). Note that this alternative escape handling applies only + to patterns. Neither of these options affects the processing of re- + placement strings passed to pcre2_substitute(). + + PCRE2_ALT_CIRCUMFLEX + + In multiline mode (when PCRE2_MULTILINE is set), the circumflex + metacharacter matches at the start of the subject (unless PCRE2_NOTBOL + is set), and also after any internal newline. However, it does not + match after a newline at the end of the subject, for compatibility with + Perl. If you want a multiline circumflex also to match after a termi- + nating newline, you must set PCRE2_ALT_CIRCUMFLEX. + + PCRE2_ALT_VERBNAMES + + By default, for compatibility with Perl, the name in any verb sequence + such as (*MARK:NAME) is any sequence of characters that does not in- + clude a closing parenthesis.
The name is not processed in any way, and + it is not possible to include a closing parenthesis in the name. How- + ever, if the PCRE2_ALT_VERBNAMES option is set, normal backslash pro- + cessing is applied to verb names and only an unescaped closing paren- + thesis terminates the name. A closing parenthesis can be included in a + name either as \) or between \Q and \E. If the PCRE2_EXTENDED or + PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped + whitespace in verb names is skipped and #-comments are recognized, ex- + actly as in the rest of the pattern. + + PCRE2_AUTO_CALLOUT + + If this bit is set, pcre2_compile() automatically inserts callout + items, all with number 255, before each pattern item, except immedi- + ately before or after an explicit callout in the pattern. For discus- + sion of the callout facility, see the pcre2callout documentation. + + PCRE2_CASELESS + + If this bit is set, letters in the pattern match both upper and lower + case letters in the subject. It is equivalent to Perl's /i option, and + it can be changed within a pattern by a (?i) option setting. If either + PCRE2_UTF or PCRE2_UCP is set, Unicode properties are used for all + characters with more than one other case, and for all characters whose + code points are greater than U+007F. Note that there are two ASCII + characters, K and S, that, in addition to their lower case ASCII equiv- + alents, are case-equivalent with U+212A (Kelvin sign) and U+017F (long + S) respectively. For lower valued characters with only one other case, + a lookup table is used for speed. When neither PCRE2_UTF nor PCRE2_UCP + is set, a lookup table is used for all code points less than 256, and + higher code points (available only in 16-bit or 32-bit mode) are + treated as not having another case. + + PCRE2_DOLLAR_ENDONLY + + If this bit is set, a dollar metacharacter in the pattern matches only + at the end of the subject string. 
Without this option, a dollar also + matches immediately before a newline at the end of the string (but not + before any other newlines). The PCRE2_DOLLAR_ENDONLY option is ignored + if PCRE2_MULTILINE is set. There is no equivalent to this option in + Perl, and no way to set it within a pattern. + + PCRE2_DOTALL + + If this bit is set, a dot metacharacter in the pattern matches any + character, including one that indicates a newline. However, it only + ever matches one character, even if newlines are coded as CRLF. Without + this option, a dot does not match when the current position in the sub- + ject is at a newline. This option is equivalent to Perl's /s option, + and it can be changed within a pattern by a (?s) option setting. A neg- + ative class such as [^a] always matches newline characters, and the \N + escape sequence always matches a non-newline character, independent of + the setting of PCRE2_DOTALL. + + PCRE2_DUPNAMES + + If this bit is set, names used to identify capture groups need not be + unique. This can be helpful for certain types of pattern when it is + known that only one instance of the named group can ever be matched. + There are more details of named capture groups below; see also the + pcre2pattern documentation. + + PCRE2_ENDANCHORED + + If this bit is set, the end of any pattern match must be right at the + end of the string being searched (the "subject string"). If the pattern + match succeeds by reaching (*ACCEPT), but does not reach the end of the + subject, the match fails at the current starting point. For unanchored + patterns, a new match is then tried at the next starting point. How- + ever, if the match succeeds by reaching the end of the pattern, but not + the end of the subject, backtracking occurs and an alternative match + may be found. Consider these two patterns: + + .(*ACCEPT)|.. + .|.. + + If matched against "abc" with PCRE2_ENDANCHORED set, the first matches + "c" whereas the second matches "bc". 
The effect of PCRE2_ENDANCHORED + can also be achieved by appropriate constructs in the pattern itself, + which is the only way to do it in Perl. + + For DFA matching with pcre2_dfa_match(), PCRE2_ENDANCHORED applies only + to the first (that is, the longest) matched string. Other parallel + matches, which are necessarily substrings of the first one, must obvi- + ously end before the end of the subject. + + PCRE2_EXTENDED + + If this bit is set, most white space characters in the pattern are to- + tally ignored except when escaped or inside a character class. However, + white space is not allowed within sequences such as (?> that introduce + various parenthesized groups, nor within numerical quantifiers such as + {1,3}. Ignorable white space is permitted between an item and a follow- + ing quantifier and between a quantifier and a following + that indi- + cates possessiveness. PCRE2_EXTENDED is equivalent to Perl's /x option, + and it can be changed within a pattern by a (?x) option setting. + + When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recog- + nizes as white space only those characters with code points less than + 256 that are flagged as white space in its low-character table. The ta- + ble is normally created by pcre2_maketables(), which uses the isspace() + function to identify space characters. In most ASCII environments, the + relevant characters are those with code points 0x0009 (tab), 0x000A + (linefeed), 0x000B (vertical tab), 0x000C (formfeed), 0x000D (carriage + return), and 0x0020 (space). + + When PCRE2 is compiled with Unicode support, in addition to these char- + acters, five more Unicode "Pattern White Space" characters are recog- + nized by PCRE2_EXTENDED. These are U+0085 (next line), U+200E (left-to- + right mark), U+200F (right-to-left mark), U+2028 (line separator), and + U+2029 (paragraph separator). This set of characters is the same as + recognized by Perl's /x option. 
Note that the horizontal and vertical + space characters that are matched by the \h and \v escapes in patterns + are a much bigger set. + + As well as ignoring most white space, PCRE2_EXTENDED also causes char- + acters between an unescaped # outside a character class and the next + newline, inclusive, to be ignored, which makes it possible to include + comments inside complicated patterns. Note that the end of this type of + comment is a literal newline sequence in the pattern; escape sequences + that happen to represent a newline do not count. + + Which characters are interpreted as newlines can be specified by a set- + ting in the compile context that is passed to pcre2_compile() or by a + special sequence at the start of the pattern, as described in the sec- + tion entitled "Newline conventions" in the pcre2pattern documentation. + A default is defined when PCRE2 is built. + + PCRE2_EXTENDED_MORE + + This option has the effect of PCRE2_EXTENDED, but, in addition, un- + escaped space and horizontal tab characters are ignored inside a char- + acter class. Note: only these two characters are ignored, not the full + set of pattern white space characters that are ignored outside a char- + acter class. PCRE2_EXTENDED_MORE is equivalent to Perl's /xx option, + and it can be changed within a pattern by a (?xx) option setting. + + PCRE2_FIRSTLINE + + If this option is set, the start of an unanchored pattern match must be + before or at the first newline in the subject string following the + start of matching, though the matched text may continue over the new- + line. If startoffset is non-zero, the limiting newline is not necessar- + ily the first newline in the subject. For example, if the subject + string is "abc\nxyz" (where \n represents a single-character newline) a + pattern match for "yz" succeeds with PCRE2_FIRSTLINE if startoffset is + greater than 3. See also PCRE2_USE_OFFSET_LIMIT, which provides a more + general limiting facility. 
If PCRE2_FIRSTLINE is set with an offset + limit, a match must occur in the first line and also within the offset + limit. In other words, whichever limit comes first is used. + + PCRE2_LITERAL + + If this option is set, all meta-characters in the pattern are disabled, + and it is treated as a literal string. Matching literal strings with a + regular expression engine is not the most efficient way of doing it. If + you are doing a lot of literal matching and are worried about effi- + ciency, you should consider using other approaches. The only other main + options that are allowed with PCRE2_LITERAL are: PCRE2_ANCHORED, + PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT, PCRE2_CASELESS, PCRE2_FIRSTLINE, + PCRE2_MATCH_INVALID_UTF, PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, + PCRE2_UTF, and PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EX- + TRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD are also supported. Any other + options cause an error. + + PCRE2_MATCH_INVALID_UTF + + This option forces PCRE2_UTF (see below) and also enables support for + matching by pcre2_match() in subject strings that contain invalid UTF + sequences. This facility is not supported for DFA matching. For de- + tails, see the pcre2unicode documentation. + + PCRE2_MATCH_UNSET_BACKREF + + If this option is set, a backreference to an unset capture group + matches an empty string (by default this causes the current matching + alternative to fail). A pattern such as (\1)(a) succeeds when this op- + tion is set (assuming it can find an "a" in the subject), whereas it + fails by default, for Perl compatibility. Setting this option makes + PCRE2 behave more like ECMAscript (aka JavaScript). + + PCRE2_MULTILINE + + By default, for the purposes of matching "start of line" and "end of + line", PCRE2 treats the subject string as consisting of a single line + of characters, even if it actually contains newlines. 
The "start of + line" metacharacter (^) matches only at the start of the string, and + the "end of line" metacharacter ($) matches only at the end of the + string, or before a terminating newline (except when PCRE2_DOLLAR_EN- + DONLY is set). Note, however, that unless PCRE2_DOTALL is set, the "any + character" metacharacter (.) does not match at a newline. This behav- + iour (for ^, $, and dot) is the same as Perl. + + When PCRE2_MULTILINE is set, the "start of line" and "end of line" + constructs match immediately following or immediately before internal + newlines in the subject string, respectively, as well as at the very + start and end. This is equivalent to Perl's /m option, and it can be + changed within a pattern by a (?m) option setting. Note that the "start + of line" metacharacter does not match after a newline at the end of the + subject, for compatibility with Perl. However, you can change this by + setting the PCRE2_ALT_CIRCUMFLEX option. If there are no newlines in a + subject string, or no occurrences of ^ or $ in a pattern, setting + PCRE2_MULTILINE has no effect. + + PCRE2_NEVER_BACKSLASH_C + + This option locks out the use of \C in the pattern that is being com- + piled. This escape can cause unpredictable behaviour in UTF-8 or + UTF-16 modes, because it may leave the current matching point in the + middle of a multi-code-unit character. This option may be useful in ap- + plications that process patterns from external sources. Note that there + is also a build-time option that permanently locks out the use of \C. + + PCRE2_NEVER_UCP + + This option locks out the use of Unicode properties for handling \B, + \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character classes, as + described for the PCRE2_UCP option below. In particular, it prevents + the creator of the pattern from enabling this facility by starting the + pattern with (*UCP). This option may be useful in applications that + process patterns from external sources.
The option combination PCRE2_UCP + and PCRE2_NEVER_UCP causes an error. + + PCRE2_NEVER_UTF + + This option locks out interpretation of the pattern as UTF-8, UTF-16, + or UTF-32, depending on which library is in use. In particular, it pre- + vents the creator of the pattern from switching to UTF interpretation + by starting the pattern with (*UTF). This option may be useful in ap- + plications that process patterns from external sources. The combination + of PCRE2_UTF and PCRE2_NEVER_UTF causes an error. + + PCRE2_NO_AUTO_CAPTURE + + If this option is set, it disables the use of numbered capturing paren- + theses in the pattern. Any opening parenthesis that is not followed by + ? behaves as if it were followed by ?: but named parentheses can still + be used for capturing (and they acquire numbers in the usual way). This + is the same as Perl's /n option. Note that, when this option is set, + references to capture groups (backreferences or recursion/subroutine + calls) may only refer to named groups, though the reference can be by + name or by number. + + PCRE2_NO_AUTO_POSSESS + + If this option is set, it disables "auto-possessification", which is an + optimization that, for example, turns a+b into a++b in order to avoid + backtracks into a+ that can never be successful. However, if callouts + are in use, auto-possessification means that some callouts are never + taken. You can set this option if you want the matching functions to do + a full unoptimized search and run all the callouts, but it is mainly + provided for testing purposes. + + PCRE2_NO_DOTSTAR_ANCHOR + + If this option is set, it disables an optimization that is applied when + .* is the first significant item in a top-level branch of a pattern, + and all the other branches also start with .* or with \A or \G or ^.
+ The optimization is automatically disabled for .* if it is inside an + atomic group or a capture group that is the subject of a backreference, + or if the pattern contains (*PRUNE) or (*SKIP). When the optimization + is not disabled, such a pattern is automatically anchored if + PCRE2_DOTALL is set for all the .* items and PCRE2_MULTILINE is not set + for any ^ items. Otherwise, the fact that any match must start either + at the start of the subject or following a newline is remembered. Like + other optimizations, this can cause callouts to be skipped. + + PCRE2_NO_START_OPTIMIZE + + This is an option whose main effect is at matching time. It does not + change what pcre2_compile() generates, but it does affect the output of + the JIT compiler. + + There are a number of optimizations that may occur at the start of a + match, in order to speed up the process. For example, if it is known + that an unanchored match must start with a specific code unit value, + the matching code searches the subject for that value, and fails imme- + diately if it cannot find it, without actually running the main match- + ing function. This means that a special item such as (*COMMIT) at the + start of a pattern is not considered until after a suitable starting + point for the match has been found. Also, when callouts or (*MARK) + items are in use, these "start-up" optimizations can cause them to be + skipped if the pattern is never actually used. The start-up optimiza- + tions are in effect a pre-scan of the subject that takes place before + the pattern is run. + + The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations, + possibly causing performance to suffer, but ensuring that in cases + where the result is "no match", the callouts do occur, and that items + such as (*COMMIT) and (*MARK) are considered at every possible starting + position in the subject string. + + Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching + operation. 
Consider the pattern + + (*COMMIT)ABC + + When this is compiled, PCRE2 records the fact that a match must start + with the character "A". Suppose the subject string is "DEFABC". The + start-up optimization scans along the subject, finds "A" and runs the + first match attempt from there. The (*COMMIT) item means that the pat- + tern must match the current starting position, which, in this case, it + does. However, if the same match is run with PCRE2_NO_START_OPTIMIZE + set, the initial scan along the subject string does not happen. The + first match attempt is run starting from "D" and when this fails, + (*COMMIT) prevents any further matches being tried, so the overall re- + sult is "no match". + + Another start-up optimization makes use of a minimum length for a + matching subject, which is recorded when possible. Consider the pattern + + (*MARK:1)B(*MARK:2)(X|Y) + + The minimum length for a match is two characters. If the subject is + "XXBB", the "starting character" optimization skips "XX", then tries to + match "BB", which is long enough. In the process, (*MARK:2) is encoun- + tered and remembered. When the match attempt fails, the next "B" is + found, but there is only one character left, so there are no more at- + tempts, and "no match" is returned with the "last mark seen" set to + "2". If PCRE2_NO_START_OPTIMIZE is set, however, matches are tried at every + possible starting position, including at the end of the subject, where + (*MARK:1) is encountered, but there is no "B", so the "last mark seen" + that is returned is "1". In this case, the optimizations do not affect + the overall match result, which is still "no match", but they do affect + the auxiliary information that is returned. + + PCRE2_NO_UTF_CHECK + + When PCRE2_UTF is set, the validity of the pattern as a UTF string is + automatically checked. There are discussions about the validity of + UTF-8 strings, UTF-16 strings, and UTF-32 strings in the pcre2unicode + document.
If an invalid UTF sequence is found, pcre2_compile() returns + a negative error code. + + If you know that your pattern is a valid UTF string, and you want to + skip this check for performance reasons, you can set the + PCRE2_NO_UTF_CHECK option. When it is set, the effect of passing an in- + valid UTF string as a pattern is undefined. It may cause your program + to crash or loop. + + Note that this option can also be passed to pcre2_match() and + pcre2_dfa_match(), to suppress UTF validity checking of the subject + string. + + Note also that setting PCRE2_NO_UTF_CHECK at compile time does not dis- + able the error that is given if an escape sequence for an invalid Uni- + code code point is encountered in the pattern. In particular, the so- + called "surrogate" code points (0xd800 to 0xdfff) are invalid. If you + want to allow escape sequences such as \x{d800} you can set the + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra option, as described in the + section entitled "Extra compile options" below. However, this is pos- + sible only in UTF-8 and UTF-32 modes, because these values are not rep- + resentable in UTF-16. + + PCRE2_UCP + + This option has two effects. Firstly, it changes the way PCRE2 processes + \B, \b, \D, \d, \S, \s, \W, \w, and some of the POSIX character + classes. By default, only ASCII characters are recognized, but if + PCRE2_UCP is set, Unicode properties are used instead to classify char- + acters. More details are given in the section on generic character + types in the pcre2pattern page. If you set PCRE2_UCP, matching one of + the items it affects takes much longer. + + The second effect of PCRE2_UCP is to force the use of Unicode proper- + ties for upper/lower casing operations on characters with code points + greater than 127, even when PCRE2_UTF is not set. This makes it possi- + ble, for example, to process strings in the 16-bit UCS-2 code.
This op- + tion is available only if PCRE2 has been compiled with Unicode support + (which is the default). + + PCRE2_UNGREEDY + + This option inverts the "greediness" of the quantifiers so that they + are not greedy by default, but become greedy if followed by "?". It is + not compatible with Perl. It can also be set by a (?U) option setting + within the pattern. + + PCRE2_USE_OFFSET_LIMIT + + This option must be set for pcre2_compile() if pcre2_set_offset_limit() + is going to be used to set a non-default offset limit in a match con- + text for matches that use this pattern. An error is generated if an + offset limit is set without this option. For more details, see the de- + scription of pcre2_set_offset_limit() in the section that describes + match contexts. See also the PCRE2_FIRSTLINE option above. + + PCRE2_UTF + + This option causes PCRE2 to regard both the pattern and the subject + strings that are subsequently processed as strings of UTF characters + instead of single-code-unit strings. It is available when PCRE2 is + built to include Unicode support (which is the default). If Unicode + support is not available, the use of this option provokes an error. De- + tails of how PCRE2_UTF changes the behaviour of PCRE2 are given in the + pcre2unicode page. In particular, note that it changes the way + PCRE2_CASELESS handles characters with code points greater than 127. + + Extra compile options + + The option bits that can be set in a compile context by calling the + pcre2_set_compile_extra_options() function are as follows: + + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES + + This option applies when compiling a pattern in UTF-8 or UTF-32 mode. + It is forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode + "surrogate" code points in the range 0xd800 to 0xdfff are used in pairs + in UTF-16 to encode code points with values in the range 0x10000 to + 0x10ffff. The surrogates cannot therefore be represented in UTF-16. 
+ They can be represented in UTF-8 and UTF-32, but are defined as invalid + code points, and cause errors if encountered in a UTF-8 or UTF-32 + string that is being checked for validity by PCRE2. + + These values also cause errors if encountered in escape sequences such + as \x{d912} within a pattern. However, it seems that some applications, + when using PCRE2 to check for unwanted characters in UTF-8 strings, ex- + plicitly test for the surrogates using escape sequences. The + PCRE2_NO_UTF_CHECK option does not disable the error that occurs, be- + cause it applies only to the testing of input strings for UTF validity. + + If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surro- + gate code point values in UTF-8 and UTF-32 patterns no longer provoke + errors and are incorporated in the compiled pattern. However, they can + only match subject characters if the matching function is called with + PCRE2_NO_UTF_CHECK set. + + PCRE2_EXTRA_ALT_BSUX + + The original option PCRE2_ALT_BSUX causes PCRE2 to process \U, \u, and + \x in the way that ECMAscript (aka JavaScript) does. Additional func- + tionality was defined by ECMAscript 6; setting PCRE2_EXTRA_ALT_BSUX has + the effect of PCRE2_ALT_BSUX, but in addition it recognizes \u{hhh..} + as a hexadecimal character code, where hhh.. is any number of hexadeci- + mal digits. + + PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL + + This is a dangerous option. Use with care. By default, an unrecognized + escape such as \j or a malformed one such as \x{2z} causes a compile- + time error when detected by pcre2_compile(). Perl is somewhat inconsis- + tent in handling such items: for example, \j is treated as a literal + "j", and non-hexadecimal digits in \x{} are just ignored, though warn- + ings are given in both cases if Perl's warning switch is enabled. How- + ever, a malformed octal number after \o{ always causes an error in + Perl. 
+ + If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to + pcre2_compile(), all unrecognized or malformed escape sequences are + treated as single-character escapes. For example, \j is a literal "j" + and \x{2z} is treated as the literal string "x{2z}". Setting this op- + tion means that typos in patterns may go undetected and have unexpected + results. Also note that a sequence such as [\N{] is interpreted as a + malformed attempt at [\N{...}] and so is treated as [N{] whereas [\N] + gives an error because an unqualified \N is a valid escape sequence but + is not supported in a character class. To reiterate: this is a danger- + ous option. Use with great care. + + PCRE2_EXTRA_ESCAPED_CR_IS_LF + + There are some legacy applications where the escape sequence \r in a + pattern is expected to match a newline. If this option is set, \r in a + pattern is converted to \n so that it matches a LF (linefeed) instead + of a CR (carriage return) character. The option does not affect a lit- + eral CR in the pattern, nor does it affect CR specified as an explicit + code point such as \x{0D}. + + PCRE2_EXTRA_MATCH_LINE + + This option is provided for use by the -x option of pcre2grep. It + causes the pattern only to match complete lines. This is achieved by + automatically inserting the code for "^(?:" at the start of the com- + piled pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, + the matched line may be in the middle of the subject string. This op- + tion can be used with PCRE2_LITERAL. + + PCRE2_EXTRA_MATCH_WORD + + This option is provided for use by the -w option of pcre2grep. It + causes the pattern only to match strings that have a word boundary at + the start and the end. This is achieved by automatically inserting the + code for "\b(?:" at the start of the compiled pattern and ")\b" at the + end. The option may be used with PCRE2_LITERAL. However, it is ignored + if PCRE2_EXTRA_MATCH_LINE is also set. 
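The wrapping performed by PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD can be pictured as a textual transformation of the pattern. The following is a minimal sketch of that picture only: the helper names wrap_pattern(), wrap_match_line(), and wrap_match_word() are hypothetical and are not part of PCRE2. The real options insert compiled code rather than text, so they also work with PCRE2_LITERAL and are not upset by a pattern that ends with a backslash or a #-comment, which a textual wrap like this one would be.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): returns a newly allocated
   string made of prefix + pattern + suffix, or NULL if allocation fails.
   The caller releases the result with free(). */
char *wrap_pattern(const char *prefix, const char *pattern,
                   const char *suffix)
{
    size_t len = strlen(prefix) + strlen(pattern) + strlen(suffix) + 1;
    char *out = malloc(len);
    if (out != NULL)
        snprintf(out, len, "%s%s%s", prefix, pattern, suffix);
    return out;
}

/* Textual picture of PCRE2_EXTRA_MATCH_LINE: "^(?:" ... ")$" */
char *wrap_match_line(const char *pattern)
{
    return wrap_pattern("^(?:", pattern, ")$");
}

/* Textual picture of PCRE2_EXTRA_MATCH_WORD: "\b(?:" ... ")\b" */
char *wrap_match_word(const char *pattern)
{
    return wrap_pattern("\\b(?:", pattern, ")\\b");
}
```

The non-capturing group matters: without it, a pattern with top-level alternatives such as cat|dog would have the anchor or word boundary applied to only one alternative at each end.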
+ + +JUST-IN-TIME (JIT) COMPILATION + + int pcre2_jit_compile(pcre2_code *code, uint32_t options); + + int pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); + + void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); + + pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE startsize, + PCRE2_SIZE maxsize, pcre2_general_context *gcontext); + + void pcre2_jit_stack_assign(pcre2_match_context *mcontext, + pcre2_jit_callback callback_function, void *callback_data); + + void pcre2_jit_stack_free(pcre2_jit_stack *jit_stack); + + These functions provide support for JIT compilation, which, if the + just-in-time compiler is available, further processes a compiled pat- + tern into machine code that executes much faster than the pcre2_match() + interpretive matching function. Full details are given in the pcre2jit + documentation. + + JIT compilation is a heavyweight optimization. It can take some time + for patterns to be analyzed, and for one-off matches and simple pat- + terns the benefit of faster execution might be offset by a much slower + compilation time. Most (but not all) patterns can be optimized by the + JIT compiler. + + +LOCALE SUPPORT + + const uint8_t *pcre2_maketables(pcre2_general_context *gcontext); + + void pcre2_maketables_free(pcre2_general_context *gcontext, + const uint8_t *tables); + + PCRE2 handles caseless matching, and determines whether characters are + letters, digits, or whatever, by reference to a set of tables, indexed + by character code point. However, this applies only to characters whose + code points are less than 256. By default, higher-valued code points + never match escapes such as \w or \d. 
+ + When PCRE2 is built with Unicode support (the default), the Unicode + properties of all characters can be tested with \p and \P, or, alterna- + tively, the PCRE2_UCP option can be set when a pattern is compiled; + this causes \w and friends to use Unicode property support instead of + the built-in tables. PCRE2_UCP also causes upper/lower casing opera- + tions on characters with code points greater than 127 to use Unicode + properties. These effects apply even when PCRE2_UTF is not set. + + The use of locales with Unicode is discouraged. If you are handling + characters with code points greater than 127, you should either use + Unicode support, or use locales, but not try to mix the two. + + PCRE2 contains a built-in set of character tables that are used by de- + fault. These are sufficient for many applications. Normally, the in- + ternal tables recognize only ASCII characters. However, when PCRE2 is + built, it is possible to cause the internal tables to be rebuilt in the + default "C" locale of the local system, which may cause them to be dif- + ferent. + + The built-in tables can be overridden by tables supplied by the appli- + cation that calls PCRE2. These may be created in a different locale + from the default. As more and more applications change to using Uni- + code, the need for this locale support is expected to die away. + + External tables are built by calling the pcre2_maketables() function, + in the relevant locale. The only argument to this function is a general + context, which can be used to pass a custom memory allocator. If the + argument is NULL, the system malloc() is used. The result can be passed + to pcre2_compile() as often as necessary, by creating a compile context + and calling pcre2_set_character_tables() to set the tables pointer + therein. 
+ + For example, to build and use tables that are appropriate for the + French locale (where accented characters with values greater than 127 + are treated as letters), the following code could be used: + + setlocale(LC_CTYPE, "fr_FR"); + tables = pcre2_maketables(NULL); + ccontext = pcre2_compile_context_create(NULL); + pcre2_set_character_tables(ccontext, tables); + re = pcre2_compile(..., ccontext); + + The locale name "fr_FR" is used on Linux and other Unix-like systems; + if you are using Windows, the name for the French locale is "french". + + The pointer that is passed (via the compile context) to pcre2_compile() + is saved with the compiled pattern, and the same tables are used by the + matching functions. Thus, for any single pattern, compilation and + matching both happen in the same locale, but different patterns can be + processed in different locales. + + It is the caller's responsibility to ensure that the memory containing + the tables remains available while they are still in use. When they are + no longer needed, you can discard them using pcre2_maketables_free(), + which should be passed as its first parameter the same general context that + was used to create the tables. + + Saving locale tables + + The tables described above are just a sequence of binary bytes, which + makes them independent of hardware characteristics such as endianness + or whether the processor is 32-bit or 64-bit. A copy of the result of + pcre2_maketables() can therefore be saved in a file or elsewhere and + re-used later, even in a different program or on another computer. The + size of the tables (number of bytes) must be obtained by calling + pcre2_config() with the PCRE2_CONFIG_TABLES_LENGTH option because + pcre2_maketables() does not return this value. Note that the + pcre2_dftables program, which is part of the PCRE2 build system, can be + used stand-alone to create a file that contains a set of binary tables. + See the pcre2build documentation for details.
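Because the tables are a plain sequence of bytes, saving and reloading them needs nothing beyond ordinary binary file I/O. The following is a minimal sketch under stated assumptions: save_tables() and load_tables() are hypothetical helper names, not PCRE2 functions. In a real program the tables pointer would be the result of pcre2_maketables() and the length would be the value obtained from pcre2_config() with PCRE2_CONFIG_TABLES_LENGTH; neither call is made here, so the sketch has no dependency on the library.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write `length` bytes of table data to `path`.
   Returns 0 on success and -1 on any I/O failure. */
int save_tables(const char *path, const uint8_t *tables, size_t length)
{
    FILE *f = fopen(path, "wb");          /* binary mode: bytes are opaque */
    if (f == NULL)
        return -1;
    size_t written = fwrite(tables, 1, length, f);
    int close_rc = fclose(f);
    return (written == length && close_rc == 0) ? 0 : -1;
}

/* Hypothetical helper: read exactly `length` bytes back from `path`.
   Returns a malloc()ed buffer, or NULL if the file cannot be opened,
   memory cannot be obtained, or the file is too short. */
uint8_t *load_tables(const char *path, size_t length)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    uint8_t *buf = malloc(length);
    if (buf != NULL && fread(buf, 1, length, f) != length) {
        free(buf);                        /* short read: reject the file */
        buf = NULL;
    }
    fclose(f);
    return buf;
}
```

A buffer returned by load_tables() could then be handed to pcre2_set_character_tables() in the same way as the result of pcre2_maketables(). Since the buffer in this sketch comes from malloc(), it is released with free() once no compiled pattern that refers to it is still in use.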
+ + +INFORMATION ABOUT A COMPILED PATTERN + + int pcre2_pattern_info(const pcre2_code *code, uint32_t what, void *where); + + The pcre2_pattern_info() function returns general information about a + compiled pattern. For information about callouts, see the next section. + The first argument for pcre2_pattern_info() is a pointer to the com- + piled pattern. The second argument specifies which piece of information + is required, and the third argument is a pointer to a variable to re- + ceive the data. If the third argument is NULL, the first argument is + ignored, and the function returns the size in bytes of the variable + that is required for the information requested. Otherwise, the yield of + the function is zero for success, or one of the following negative num- + bers: + + PCRE2_ERROR_NULL the argument code was NULL + PCRE2_ERROR_BADMAGIC the "magic number" was not found + PCRE2_ERROR_BADOPTION the value of what was invalid + PCRE2_ERROR_UNSET the requested field is not set + + The "magic number" is placed at the start of each compiled pattern as a + simple check against passing an arbitrary memory pointer. Here is a + typical call of pcre2_pattern_info(), to obtain the length of the com- + piled pattern: + + int rc; + size_t length; + rc = pcre2_pattern_info( + re, /* result of pcre2_compile() */ + PCRE2_INFO_SIZE, /* what is required */ + &length); /* where to put the data */ + + The possible values for the second argument are defined in pcre2.h, and + are as follows: + + PCRE2_INFO_ALLOPTIONS + PCRE2_INFO_ARGOPTIONS + PCRE2_INFO_EXTRAOPTIONS + + Return copies of the pattern's options. The third argument should point + to a uint32_t variable. PCRE2_INFO_ARGOPTIONS returns exactly the op- + tions that were passed to pcre2_compile(), whereas PCRE2_INFO_ALLOP- + TIONS returns the compile options as modified by any top-level (*XXX) + option settings such as (*UTF) at the start of the pattern itself.
+ PCRE2_INFO_EXTRAOPTIONS returns the extra options that were set in the + compile context by calling the pcre2_set_compile_extra_options() func- + tion. + + For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EX- + TENDED option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED + and PCRE2_UTF. Option settings such as (?i) that can change within a + pattern do not affect the result of PCRE2_INFO_ALLOPTIONS, even if they + appear right at the start of the pattern. (This was different in some + earlier releases.) + + A pattern compiled without PCRE2_ANCHORED is automatically anchored by + PCRE2 if the first significant item in every top-level branch is one of + the following: + + ^ unless PCRE2_MULTILINE is set + \A always + \G always + .* sometimes - see below + + When .* is the first significant item, anchoring is possible only when + all the following are true: + + .* is not in an atomic group + .* is not in a capture group that is the subject + of a backreference + PCRE2_DOTALL is in force for .* + Neither (*PRUNE) nor (*SKIP) appears in the pattern + PCRE2_NO_DOTSTAR_ANCHOR is not set + + For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in + the options returned for PCRE2_INFO_ALLOPTIONS. + + PCRE2_INFO_BACKREFMAX + + Return the number of the highest backreference in the pattern. The + third argument should point to a uint32_t variable. Named capture + groups acquire numbers as well as names, and these count towards the + highest backreference. Backreferences such as \4 or \g{12} match the + captured characters of the given group, but in addition, the check that + a capture group is set in a conditional group such as (?(3)a|b) is also + a backreference. Zero is returned if there are no backreferences. + + PCRE2_INFO_BSR + + The output is a uint32_t integer whose value indicates what character + sequences the \R escape sequence matches. 
A value of PCRE2_BSR_UNICODE + means that \R matches any Unicode line ending sequence; a value of + PCRE2_BSR_ANYCRLF means that \R matches only CR, LF, or CRLF. + + PCRE2_INFO_CAPTURECOUNT + + Return the highest capture group number in the pattern. In patterns + where (?| is not used, this is also the total number of capture groups. + The third argument should point to a uint32_t variable. + + PCRE2_INFO_DEPTHLIMIT + + If the pattern set a backtracking depth limit by including an item of + the form (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The + third argument should point to a uint32_t integer. If no such value has + been set, the call to pcre2_pattern_info() returns the error PCRE2_ER- + ROR_UNSET. Note that this limit will only be used during matching if it + is less than the limit set or defaulted by the caller of the match + function. + + PCRE2_INFO_FIRSTBITMAP + + In the absence of a single first code unit for a non-anchored pattern, + pcre2_compile() may construct a 256-bit table that defines a fixed set + of values for the first code unit in any match. For example, a pattern + that starts with [abc] results in a table with three bits set. When + code unit values greater than 255 are supported, the flag bit for 255 + means "any code unit of value 255 or above". If such a table was con- + structed, a pointer to it is returned. Otherwise NULL is returned. The + third argument should point to a const uint8_t * variable. + + PCRE2_INFO_FIRSTCODETYPE + + Return information about the first code unit of any matched string, for + a non-anchored pattern. The third argument should point to a uint32_t + variable. If there is a fixed first value, for example, the letter "c" + from a pattern such as (cat|cow|coyote), 1 is returned, and the value + can be retrieved using PCRE2_INFO_FIRSTCODEUNIT. 
If there is no fixed + first value, but it is known that a match can occur only at the start + of the subject or following a newline in the subject, 2 is returned. + Otherwise, and for anchored patterns, 0 is returned. + + PCRE2_INFO_FIRSTCODEUNIT + + Return the value of the first code unit of any matched string for a + pattern where PCRE2_INFO_FIRSTCODETYPE returns 1; otherwise return 0. + The third argument should point to a uint32_t variable. In the 8-bit + library, the value is always less than 256. In the 16-bit library the + value can be up to 0xffff. In the 32-bit library in UTF-32 mode the + value can be up to 0x10ffff, and up to 0xffffffff when not using UTF-32 + mode. + + PCRE2_INFO_FRAMESIZE + + Return the size (in bytes) of the data frames that are used to remember + backtracking positions when the pattern is processed by pcre2_match() + without the use of JIT. The third argument should point to a size_t + variable. The frame size depends on the number of capturing parentheses + in the pattern. Each additional capture group adds two PCRE2_SIZE vari- + ables. + + PCRE2_INFO_HASBACKSLASHC + + Return 1 if the pattern contains any instances of \C, otherwise 0. The + third argument should point to a uint32_t variable. + + PCRE2_INFO_HASCRORLF + + Return 1 if the pattern contains any explicit matches for CR or LF + characters, otherwise 0. The third argument should point to a uint32_t + variable. An explicit match is either a literal CR or LF character, or + \r or \n or one of the equivalent hexadecimal or octal escape se- + quences. + + PCRE2_INFO_HEAPLIMIT + + If the pattern set a heap memory limit by including an item of the form + (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argu- + ment should point to a uint32_t integer. If no such value has been set, + the call to pcre2_pattern_info() returns the error PCRE2_ERROR_UNSET. 
+ Note that this limit will only be used during matching if it is less + than the limit set or defaulted by the caller of the match function. + + PCRE2_INFO_JCHANGED + + Return 1 if the (?J) or (?-J) option setting is used in the pattern, + otherwise 0. The third argument should point to a uint32_t variable. + (?J) and (?-J) set and unset the local PCRE2_DUPNAMES option, respec- + tively. + + PCRE2_INFO_JITSIZE + + If the compiled pattern was successfully processed by pcre2_jit_com- + pile(), return the size of the JIT compiled code, otherwise return + zero. The third argument should point to a size_t variable. + + PCRE2_INFO_LASTCODETYPE + + Returns 1 if there is a rightmost literal code unit that must exist in + any matched string, other than at its start. The third argument should + point to a uint32_t variable. If there is no such value, 0 is returned. + When 1 is returned, the code unit value itself can be retrieved using + PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is + recorded only if it follows something of variable length. For example, + for the pattern /^a\d+z\d+/ the returned value is 1 (with "z" returned + from PCRE2_INFO_LASTCODEUNIT), but for /^a\dz\d/ the returned value is + 0. + + PCRE2_INFO_LASTCODEUNIT + + Return the value of the rightmost literal code unit that must exist in + any matched string, other than at its start, for a pattern where + PCRE2_INFO_LASTCODETYPE returns 1. Otherwise, return 0. The third argu- + ment should point to a uint32_t variable. + + PCRE2_INFO_MATCHEMPTY + + Return 1 if the pattern might match an empty string, otherwise 0. The + third argument should point to a uint32_t variable. When a pattern con- + tains recursive subroutine calls it is not always possible to determine + whether or not it can match an empty string. PCRE2 takes a cautious ap- + proach and returns 1 in such cases. 
+ + PCRE2_INFO_MATCHLIMIT + + If the pattern set a match limit by including an item of the form + (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third ar- + gument should point to a uint32_t integer. If no such value has been + set, the call to pcre2_pattern_info() returns the error PCRE2_ERROR_UN- + SET. Note that this limit will only be used during matching if it is + less than the limit set or defaulted by the caller of the match func- + tion. + + PCRE2_INFO_MAXLOOKBEHIND + + A lookbehind assertion moves back a certain number of characters (not + code units) when it starts to process each of its branches. This re- + quest returns the largest of these backward moves. The third argument + should point to a uint32_t integer. The simple assertions \b and \B re- + quire a one-character lookbehind and cause PCRE2_INFO_MAXLOOKBEHIND to + return 1 in the absence of anything longer. \A also registers a one- + character lookbehind, though it does not actually inspect the previous + character. + + Note that this information is useful for multi-segment matching only if + the pattern contains no nested lookbehinds. For example, the pattern + (?<=a(?<=ba)c) returns a maximum lookbehind of 2, but when it is pro- + cessed, the first lookbehind moves back by two characters, matches one + character, then the nested lookbehind also moves back by two charac- + ters. This puts the matching point three characters earlier than it was + at the start. PCRE2_INFO_MAXLOOKBEHIND is really only useful as a de- + bugging tool. See the pcre2partial documentation for a discussion of + multi-segment matching. + + PCRE2_INFO_MINLENGTH + + If a minimum length for matching subject strings was computed, its + value is returned. Otherwise the returned value is 0. This value is not + computed when PCRE2_NO_START_OPTIMIZE is set. The value is a number of + characters, which in UTF mode may be different from the number of code + units. 
The third argument should point to a uint32_t variable. The + value is a lower bound to the length of any matching string. There may + not be any strings of that length that do actually match, but every + string that does match is at least that long. + + PCRE2_INFO_NAMECOUNT + PCRE2_INFO_NAMEENTRYSIZE + PCRE2_INFO_NAMETABLE + + PCRE2 supports the use of named as well as numbered capturing parenthe- + ses. The names are just an additional way of identifying the parenthe- + ses, which still acquire numbers. Several convenience functions such as + pcre2_substring_get_byname() are provided for extracting captured sub- + strings by name. It is also possible to extract the data directly, by + first converting the name to a number in order to access the correct + pointers in the output vector (described with pcre2_match() below). To + do the conversion, you need to use the name-to-number map, which is de- + scribed by these three values. + + The map consists of a number of fixed-size entries. PCRE2_INFO_NAME- + COUNT gives the number of entries, and PCRE2_INFO_NAMEENTRYSIZE gives + the size of each entry in code units; both of these return a uint32_t + value. The entry size depends on the length of the longest name. + + PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. + This is a PCRE2_SPTR pointer to a block of code units. In the 8-bit li- + brary, the first two bytes of each entry are the number of the captur- + ing parenthesis, most significant byte first. In the 16-bit library, + the pointer points to 16-bit code units, the first of which contains + the parenthesis number. In the 32-bit library, the pointer points to + 32-bit code units, the first of which contains the parenthesis number. + The rest of the entry is the corresponding name, zero terminated. + + The names are in alphabetical order. 
If (?| is used to create multiple + capture groups with the same number, as described in the section on du- + plicate group numbers in the pcre2pattern page, the groups may be given + the same name, but there is only one entry in the table. Different + names for groups of the same number are not permitted. + + Duplicate names for capture groups with different numbers are permit- + ted, but only if PCRE2_DUPNAMES is set. They appear in the table in the + order in which they were found in the pattern. In the absence of (?| + this is the order of increasing number; when (?| is used this is not + necessarily the case because later capture groups may have lower num- + bers. + + As a simple example of the name/number table, consider the following + pattern after compilation by the 8-bit library (assume PCRE2_EXTENDED + is set, so white space - including newlines - is ignored): + + (?<date> (?<year>(\d\d)?\d\d) - + (?<month>\d\d) - (?<day>\d\d) ) + + There are four named capture groups, so the table has four entries, and + each entry in the table is eight bytes long. The table is as follows, + with non-printing bytes shown in hexadecimal, and undefined bytes shown + as ??: + + 00 01 d a t e 00 ?? + 00 05 d a y 00 ?? ?? + 00 04 m o n t h 00 + 00 02 y e a r 00 ?? + + When writing code to extract data from named capture groups using the + name-to-number map, remember that the length of the entries is likely + to be different for each compiled pattern. + + PCRE2_INFO_NEWLINE + + The output is one of the following uint32_t values: + + PCRE2_NEWLINE_CR Carriage return (CR) + PCRE2_NEWLINE_LF Linefeed (LF) + PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) + PCRE2_NEWLINE_ANY Any Unicode line ending + PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF + PCRE2_NEWLINE_NUL The NUL character (binary zero) + + This identifies the character sequence that will be recognized as mean- + ing "newline" while matching.
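The name-to-number map described under PCRE2_INFO_NAMETABLE above can be searched directly. The following is a minimal sketch for the 8-bit library only, assuming the caller has already obtained the table pointer, the entry count, and the entry size from pcre2_pattern_info(). The function name name_to_number is hypothetical and is not part of PCRE2; the convenience functions mentioned above remain the usual way to do this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): search an 8-bit name table.
 * Each entry is entry_size bytes long: two bytes holding the group number,
 * most significant byte first, followed by the zero-terminated name.
 * Returns the group number, or 0 if the name is not in the table
 * (capture groups are numbered from 1, so 0 is never a valid result). */
static uint32_t name_to_number(const uint8_t *table, uint32_t name_count,
                               uint32_t entry_size, const char *name)
{
    for (uint32_t i = 0; i < name_count; i++) {
        const uint8_t *entry = table + (size_t)i * entry_size;
        uint32_t number = ((uint32_t)entry[0] << 8) | entry[1];
        if (strcmp((const char *)(entry + 2), name) == 0)
            return number;
    }
    return 0;
}
```

With the four-entry, eight-byte table shown above, looking up "month" yields 4 and looking up "day" yields 5. A linear scan is used for simplicity; it also copes with tables that contain duplicate names, where it returns the first entry found.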
+ + PCRE2_INFO_SIZE + + Return the size of the compiled pattern in bytes (for all three li- + braries). The third argument should point to a size_t variable. This + value includes the size of the general data block that precedes the + code units of the compiled pattern itself. The value that is used when + pcre2_compile() is getting memory in which to place the compiled pat- + tern may be slightly larger than the value returned by this option, be- + cause there are cases where the code that calculates the size has to + over-estimate. Processing a pattern with the JIT compiler does not al- + ter the value returned by this option. + + +INFORMATION ABOUT A PATTERN'S CALLOUTS + + int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); + + A script language that supports the use of string arguments in callouts + might like to scan all the callouts in a pattern before running the + match. This can be done by calling pcre2_callout_enumerate(). The first + argument is a pointer to a compiled pattern, the second points to a + callback function, and the third is arbitrary user data. The callback + function is called for every callout in the pattern in the order in + which they appear. Its first argument is a pointer to a callout enumer- + ation block, and its second argument is the user_data value that was + passed to pcre2_callout_enumerate(). The contents of the callout enu- + meration block are described in the pcre2callout documentation, which + also gives further details about callouts. + + +SERIALIZATION AND PRECOMPILING + + It is possible to save compiled patterns on disc or elsewhere, and + reload them later, subject to a number of restrictions. The host on + which the patterns are reloaded must be running the same version of + PCRE2, with the same code unit width, and must also have the same endi- + anness, pointer width, and PCRE2_SIZE type. 
Before compiled patterns + can be saved, they must be converted to a "serialized" form, which in + the case of PCRE2 is really just a bytecode dump. The functions whose + names begin with pcre2_serialize_ are used for converting to and from + the serialized form. They are described in the pcre2serialize documen- + tation. Note that PCRE2 serialization does not convert compiled pat- + terns to an abstract format like Java or .NET serialization. + + +THE MATCH DATA BLOCK + + pcre2_match_data *pcre2_match_data_create(uint32_t ovecsize, + pcre2_general_context *gcontext); + + pcre2_match_data *pcre2_match_data_create_from_pattern( + const pcre2_code *code, pcre2_general_context *gcontext); + + void pcre2_match_data_free(pcre2_match_data *match_data); + + Information about a successful or unsuccessful match is placed in a + match data block, which is an opaque structure that is accessed by + function calls. In particular, the match data block contains a vector + of offsets into the subject string that define the matched part of the + subject and any substrings that were captured. This is known as the + ovector. + + Before calling pcre2_match(), pcre2_dfa_match(), or pcre2_jit_match() + you must create a match data block by calling one of the creation func- + tions above. For pcre2_match_data_create(), the first argument is the + number of pairs of offsets in the ovector. One pair of offsets is re- + quired to identify the string that matched the whole pattern, with an + additional pair for each captured substring. For example, a value of 4 + creates enough space to record the matched portion of the subject plus + three captured substrings. A minimum of at least 1 pair is imposed by + pcre2_match_data_create(), so it is always possible to return the over- + all matched string. + + The second argument of pcre2_match_data_create() is a pointer to a gen- + eral context, which can specify custom memory management for obtaining + the memory for the match data block. 
If you are not using custom memory + management, pass NULL, which causes malloc() to be used. + + For pcre2_match_data_create_from_pattern(), the first argument is a + pointer to a compiled pattern. The ovector is created to be exactly the + right size to hold all the substrings a pattern might capture. The sec- + ond argument is again a pointer to a general context, but in this case + if NULL is passed, the memory is obtained using the same allocator that + was used for the compiled pattern (custom or default). + + A match data block can be used many times, with the same or different + compiled patterns. You can extract information from a match data block + after a match operation has finished, using functions that are de- + scribed in the sections on matched strings and other match data below. + + When a call of pcre2_match() fails, valid data is available in the + match block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ER- + ROR_PARTIAL, or one of the error codes for an invalid UTF string. Ex- + actly what is available depends on the error, and is detailed below. + + When one of the matching functions is called, pointers to the compiled + pattern and the subject string are set in the match data block so that + they can be referenced by the extraction functions after a successful + match. After running a match, you must not free a compiled pattern or a + subject string until after all operations on the match data block (for + that match) have taken place, unless, in the case of the subject + string, you have used the PCRE2_COPY_MATCHED_SUBJECT option, which is + described in the section entitled "Option bits for pcre2_match()" be- + low. + + When a match data block itself is no longer needed, it should be freed + by calling pcre2_match_data_free(). If this function is called with a + NULL argument, it returns immediately, without doing anything. 
+ + +MATCHING A PATTERN: THE TRADITIONAL FUNCTION + + int pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext); + + The function pcre2_match() is called to match a subject string against + a compiled pattern, which is passed in the code argument. You can call + pcre2_match() with the same code argument as many times as you like, in + order to find multiple matches in the subject string or to match dif- + ferent subject strings with the same pattern. + + This function is the main matching facility of the library, and it op- + erates in a Perl-like manner. For specialist use there is also an al- + ternative matching function, which is described below in the section + about the pcre2_dfa_match() function. + + Here is an example of a simple call to pcre2_match(): + + pcre2_match_data *md = pcre2_match_data_create(4, NULL); + int rc = pcre2_match( + re, /* result of pcre2_compile() */ + "some string", /* the subject string */ + 11, /* the length of the subject string */ + 0, /* start at offset 0 in the subject */ + 0, /* default options */ + md, /* the match data block */ + NULL); /* a match context; NULL means use defaults */ + + If the subject string is zero-terminated, the length can be given as + PCRE2_ZERO_TERMINATED. A match context must be provided if certain less + common matching parameters are to be changed. For details, see the sec- + tion on the match context above. + + The string to be matched by pcre2_match() + + The subject string is passed to pcre2_match() as a pointer in subject, + a length in length, and a starting offset in startoffset. The length + and offset are in code units, not characters. That is, they are in + bytes for the 8-bit library, 16-bit code units for the 16-bit library, + and 32-bit code units for the 32-bit library, whether or not UTF pro- + cessing is enabled. 
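Because length and startoffset are counted in code units, a caller that tracks positions in characters has to convert them first. The following is a minimal sketch for UTF-8 subjects in the 8-bit library, assuming the subject is already known to be valid UTF-8. The function name utf8_char_to_unit_offset is hypothetical and is not part of PCRE2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): convert a character index
 * into a code unit (byte) offset in a valid UTF-8 string of len bytes.
 * If char_index is beyond the last character, len is returned, which is
 * the offset of the end of the subject. */
static size_t utf8_char_to_unit_offset(const uint8_t *s, size_t len,
                                       size_t char_index)
{
    size_t off = 0;
    while (off < len && char_index > 0) {
        off++;                                   /* step over the lead byte */
        while (off < len && (s[off] & 0xC0) == 0x80)
            off++;                               /* skip continuation bytes */
        char_index--;
    }
    return off;
}
```

For the six-byte UTF-8 encoding of "h\u00e9llo", character index 2 (the first "l") corresponds to code unit offset 3, because the second character occupies two bytes.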
+ + If startoffset is greater than the length of the subject, pcre2_match() + returns PCRE2_ERROR_BADOFFSET. When the starting offset is zero, the + search for a match starts at the beginning of the subject, and this is + by far the most common case. In UTF-8 or UTF-16 mode, the starting off- + set must point to the start of a character, or to the end of the sub- + ject (in UTF-32 mode, one code unit equals one character, so all off- + sets are valid). Like the pattern string, the subject may contain bi- + nary zeros. + + A non-zero starting offset is useful when searching for another match + in the same subject by calling pcre2_match() again after a previous + success. Setting startoffset differs from passing over a shortened + string and setting PCRE2_NOTBOL in the case of a pattern that begins + with any kind of lookbehind. For example, consider the pattern + + \Biss\B + + which finds occurrences of "iss" in the middle of words. (\B matches + only if the current position in the subject is not a word boundary.) + When applied to the string "Mississippi" the first call to pcre2_match() + finds the first occurrence. If pcre2_match() is called again with just + the remainder of the subject, namely "issippi", it does not match, be- + cause \B is always false at the start of the subject, which is deemed + to be a word boundary. However, if pcre2_match() is passed the entire + string again, but with startoffset set to 4, it finds the second occur- + rence of "iss" because it is able to look behind the starting point to + discover that it is preceded by a letter. + + Finding all the matches in a subject is tricky when the pattern can + match an empty string. It is possible to emulate Perl's /g behaviour by + first trying the match again at the same offset, with the + PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED options, and then if that + fails, advancing the starting offset and trying an ordinary match + again.
There is some code that demonstrates how to do this in the + pcre2demo sample program. In the most general case, you have to check + to see if the newline convention recognizes CRLF as a newline, and if + so, and the current character is CR followed by LF, advance the start- + ing offset by two characters instead of one. + + If a non-zero starting offset is passed when the pattern is anchored, a + single attempt to match at the given offset is made. This can only suc- + ceed if the pattern does not require the match to be at the start of + the subject. In other words, the anchoring must be the result of set- + ting the PCRE2_ANCHORED option or the use of .* with PCRE2_DOTALL, not + by starting the pattern with ^ or \A. + + Option bits for pcre2_match() + + The unused bits of the options argument for pcre2_match() must be zero. + The only bits that may be set are PCRE2_ANCHORED, + PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO- + TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, + PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their + action is described below. + + Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not sup- + ported by the just-in-time (JIT) compiler. If either is set, JIT matching + is disabled and the interpretive code in pcre2_match() is run. Apart + from PCRE2_NO_JIT (obviously), the remaining options are supported for + JIT matching. + + PCRE2_ANCHORED + + The PCRE2_ANCHORED option limits pcre2_match() to matching at the first + matching position. If a pattern was compiled with PCRE2_ANCHORED, or + turned out to be anchored by virtue of its contents, it cannot be made + unanchored at matching time. Note that setting the option at match time + disables JIT matching. + + PCRE2_COPY_MATCHED_SUBJECT + + By default, a pointer to the subject is remembered in the match data + block so that, after a successful match, it can be referenced by the + substring extraction functions.
This means that the subject's memory + must not be freed until all such operations are complete. For some ap- + plications where the lifetime of the subject string is not guaranteed, + it may be necessary to make a copy of the subject string, but it is + wasteful to do this unless the match is successful. After a successful + match, if PCRE2_COPY_MATCHED_SUBJECT is set, the subject is copied and + the new pointer is remembered in the match data block instead of the + original subject pointer. The memory allocator that was used for the + match block itself is used. The copy is automatically freed when + pcre2_match_data_free() is called to free the match data block. It is + also automatically freed if the match data block is re-used for another + match operation. + + PCRE2_ENDANCHORED + + If the PCRE2_ENDANCHORED option is set, any string that pcre2_match() + matches must be right at the end of the subject string. Note that set- + ting the option at match time disables JIT matching. + + PCRE2_NOTBOL + + This option specifies that the first character of the subject string is not + the beginning of a line, so the circumflex metacharacter should not + match before it. Setting this without having set PCRE2_MULTILINE at + compile time causes circumflex never to match. This option affects only + the behaviour of the circumflex metacharacter. It does not affect \A. + + PCRE2_NOTEOL + + This option specifies that the end of the subject string is not the end + of a line, so the dollar metacharacter should not match it nor (except + in multiline mode) a newline immediately before it. Setting this with- + out having set PCRE2_MULTILINE at compile time causes dollar never to + match. This option affects only the behaviour of the dollar metacharac- + ter. It does not affect \Z or \z. + + PCRE2_NOTEMPTY + + An empty string is not considered to be a valid match if this option is + set. If there are alternatives in the pattern, they are tried.
If all + the alternatives match the empty string, the entire match fails. For + example, if the pattern + + a?b? + + is applied to a string not beginning with "a" or "b", it matches an + empty string at the start of the subject. With PCRE2_NOTEMPTY set, this + match is not valid, so pcre2_match() searches further into the string + for occurrences of "a" or "b". + + PCRE2_NOTEMPTY_ATSTART + + This is like PCRE2_NOTEMPTY, except that it locks out an empty string + match only at the first matching position, that is, at the start of the + subject plus the starting offset. An empty string match later in the + subject is permitted. If the pattern is anchored, such a match can oc- + cur only if the pattern contains \K. + + PCRE2_NO_JIT + + By default, if a pattern has been successfully processed by + pcre2_jit_compile(), JIT is automatically used when pcre2_match() is + called with options that JIT supports. Setting PCRE2_NO_JIT disables + the use of JIT; it forces matching to be done by the interpreter. + + PCRE2_NO_UTF_CHECK + + When PCRE2_UTF is set at compile time, the validity of the subject as a + UTF string is checked unless PCRE2_NO_UTF_CHECK is passed to + pcre2_match() or PCRE2_MATCH_INVALID_UTF was passed to pcre2_compile(). + The latter special case is discussed in detail in the pcre2unicode doc- + umentation. + + In the default case, if a non-zero starting offset is given, the check + is applied only to that part of the subject that could be inspected + during matching, and there is a check that the starting offset points + to the first code unit of a character or to the end of the subject. If + there are no lookbehind assertions in the pattern, the check starts at + the starting offset. Otherwise, it starts at the length of the longest + lookbehind before the starting offset, or at the start of the subject + if there are not that many characters before the starting offset. Note + that the sequences \b and \B are one-character lookbehinds. 
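The rule in the preceding paragraph for where the validity check begins can be illustrated as follows. This is a sketch for the 8-bit library in UTF-8 mode, not the library's own code. The function name utf_check_start is hypothetical, and max_lookbehind is assumed to be the value obtained with PCRE2_INFO_MAXLOOKBEHIND.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PCRE2 function): starting from start_offset,
 * move back by up to max_lookbehind characters, stopping at the start of
 * the subject if there are not that many characters available. The result
 * is the code unit offset from which a validity check would begin. */
static size_t utf_check_start(const uint8_t *subject, size_t start_offset,
                              uint32_t max_lookbehind)
{
    size_t pos = start_offset;
    while (max_lookbehind > 0 && pos > 0) {
        pos--;                                    /* back one code unit */
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80)
            pos--;                                /* back over continuation bytes */
        max_lookbehind--;
    }
    return pos;
}
```

With no lookbehind in the pattern (max_lookbehind of zero) the result is start_offset itself, which matches the statement that the check then starts at the starting offset.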
+ + The check is carried out before any other processing takes place, and a + negative error code is returned if the check fails. There are several + UTF error codes for each code unit width, corresponding to different + problems with the code unit sequence. There are discussions about the + validity of UTF-8 strings, UTF-16 strings, and UTF-32 strings in the + pcre2unicode documentation. + + If you know that your subject is valid, and you want to skip this check + for performance reasons, you can set the PCRE2_NO_UTF_CHECK option when + calling pcre2_match(). You might want to do this for the second and + subsequent calls to pcre2_match() if you are making repeated calls to + find multiple matches in the same subject string. + + Warning: Unless PCRE2_MATCH_INVALID_UTF was set at compile time, when + PCRE2_NO_UTF_CHECK is set at match time the effect of passing an in- + valid string as a subject, or an invalid value of startoffset, is unde- + fined. Your program may crash or loop indefinitely or give wrong re- + sults. + + PCRE2_PARTIAL_HARD + PCRE2_PARTIAL_SOFT + + These options turn on the partial matching feature. A partial match oc- + curs if the end of the subject string is reached successfully, but + there are not enough subject characters to complete the match. In addi- + tion, either at least one character must have been inspected or the + pattern must contain a lookbehind, or the pattern must be one that + could match an empty string. + + If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PAR- + TIAL_HARD) is set, matching continues by testing any remaining alterna- + tives. Only if no complete match can be found is PCRE2_ERROR_PARTIAL + returned instead of PCRE2_ERROR_NOMATCH. In other words, PCRE2_PAR- + TIAL_SOFT specifies that the caller is prepared to handle a partial + match, but only if no complete match can be found. + + If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. 
In this + case, if a partial match is found, pcre2_match() immediately returns + PCRE2_ERROR_PARTIAL, without considering any other alternatives. In + other words, when PCRE2_PARTIAL_HARD is set, a partial match is consid- + ered to be more important than an alternative complete match. + + There is a more detailed discussion of partial and multi-segment match- + ing, with examples, in the pcre2partial documentation. + + +NEWLINE HANDLING WHEN MATCHING + + When PCRE2 is built, a default newline convention is set; this is usu- + ally the standard convention for the operating system. The default can + be overridden in a compile context by calling pcre2_set_newline(). It + can also be overridden by starting a pattern string with, for example, + (*CRLF), as described in the section on newline conventions in the + pcre2pattern page. During matching, the newline choice affects the be- + haviour of the dot, circumflex, and dollar metacharacters. It may also + alter the way the match starting position is advanced after a match + failure for an unanchored pattern. + + When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is + set as the newline convention, and a match attempt for an unanchored + pattern fails when the current starting position is at a CRLF sequence, + and the pattern contains no explicit matches for CR or LF characters, + the match position is advanced by two characters instead of one, in + other words, to after the CRLF. + + The above rule is a compromise that makes the most common cases work as + expected. For example, if the pattern is .+A (and the PCRE2_DOTALL op- + tion is not set), it does not match the string "\r\nA" because, after + failing at the start, it skips both the CR and the LF before retrying. + However, the pattern [\r\n]A does match that string, because it con- + tains an explicit CR or LF reference, and so advances only by one char- + acter after the first failure.
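The advance rule described above can be modelled in a few lines. This is a simplified sketch for a non-UTF subject in the 8-bit library, where one code unit is one character; it is not the library's own code. The function name next_start_after_failure is hypothetical, crlf_is_newline is assumed to be non-zero when the convention is CRLF, ANYCRLF, or ANY, and pattern_has_cr_or_lf is assumed to be the value obtained with PCRE2_INFO_HASCRORLF.

```c
#include <stddef.h>

/* Hypothetical helper (not a PCRE2 function): compute the next starting
 * position after a failed match attempt at offset. The position moves past
 * a whole CRLF pair only when CRLF is a valid newline and the pattern has
 * no explicit match for CR or LF; otherwise it moves by one character. */
static size_t next_start_after_failure(const char *subject, size_t length,
                                       size_t offset, int crlf_is_newline,
                                       int pattern_has_cr_or_lf)
{
    if (offset >= length)
        return length;                 /* nothing left to advance over */
    if (crlf_is_newline && !pattern_has_cr_or_lf &&
        offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;             /* skip both CR and LF */
    return offset + 1;
}
```

For the subject "\r\nA" this gives 2 for a pattern such as .+A, which has no explicit CR or LF, and 1 for a pattern such as [\r\n]A, which does.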
+ + An explicit match for CR or LF is either a literal appearance of one of + those characters in the pattern, or one of the \r or \n or equivalent + octal or hexadecimal escape sequences. Implicit matches such as [^X] do + not count, nor does \s, even though it includes CR and LF in the char- + acters that it matches. + + Notwithstanding the above, anomalous effects may still occur when CRLF + is a valid newline sequence and explicit \r or \n escapes appear in the + pattern. + + +HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS + + uint32_t pcre2_get_ovector_count(pcre2_match_data *match_data); + + PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *match_data); + + In general, a pattern matches a certain portion of the subject, and in + addition, further substrings from the subject may be picked out by + parenthesized parts of the pattern. Following the usage in Jeffrey + Friedl's book, this is called "capturing" in what follows, and the + phrase "capture group" (Perl terminology) is used for a fragment of a + pattern that picks out a substring. PCRE2 supports several other kinds + of parenthesized group that do not cause substrings to be captured. The + pcre2_pattern_info() function can be used to find out how many capture + groups there are in a compiled pattern. + + You can use auxiliary functions for accessing captured substrings by + number or by name, as described in sections below. + + Alternatively, you can make direct use of the vector of PCRE2_SIZE val- + ues, called the ovector, which contains the offsets of captured + strings. It is part of the match data block. The function + pcre2_get_ovector_pointer() returns the address of the ovector, and + pcre2_get_ovector_count() returns the number of pairs of values it con- + tains. + + Within the ovector, the first in each pair of values is set to the off- + set of the first code unit of a substring, and the second is set to the + offset of the first code unit after the end of a substring.
These val- + ues are always code unit offsets, not character offsets. That is, they + are byte offsets in the 8-bit library, 16-bit offsets in the 16-bit li- + brary, and 32-bit offsets in the 32-bit library. + + After a partial match (error return PCRE2_ERROR_PARTIAL), only the + first pair of offsets (that is, ovector[0] and ovector[1]) are set. + They identify the part of the subject that was partially matched. See + the pcre2partial documentation for details of partial matching. + + After a fully successful match, the first pair of offsets identifies + the portion of the subject string that was matched by the entire pat- + tern. The next pair is used for the first captured substring, and so + on. The value returned by pcre2_match() is one more than the highest + numbered pair that has been set. For example, if two substrings have + been captured, the returned value is 3. If there are no captured sub- + strings, the return value from a successful match is 1, indicating that + just the first pair of offsets has been set. + + If a pattern uses the \K escape sequence within a positive assertion, + the reported start of a successful match can be greater than the end of + the match. For example, if the pattern (?=ab\K) is matched against + "ab", the start and end offset values for the match are 2 and 0. + + If a capture group is matched repeatedly within a single match opera- + tion, it is the last portion of the subject that it matched that is re- + turned. + + If the ovector is too small to hold all the captured substring offsets, + as much as possible is filled in, and the function returns a value of + zero. If captured substrings are not of interest, pcre2_match() may be + called with a match data block whose ovector is of minimum length (that + is, one pair). + + It is possible for capture group number n+1 to match some part of the + subject when group n has not been used at all. 
For example, if the + string "abc" is matched against the pattern (a|(z))(bc) the return from + the function is 4, and groups 1 and 3 are matched, but 2 is not. When + this happens, both values in the offset pairs corresponding to unused + groups are set to PCRE2_UNSET. + + Offset values that correspond to unused groups at the end of the ex- + pression are also set to PCRE2_UNSET. For example, if the string "abc" + is matched against the pattern (abc)(x(yz)?)? groups 2 and 3 are not + matched. The return from the function is 2, because the highest used + capture group number is 1. The offsets for the second and third + capture groups (assuming the vector is large enough, of course) are + set to PCRE2_UNSET. + + Elements in the ovector that do not correspond to capturing parentheses + in the pattern are never changed. That is, if a pattern contains n cap- + turing parentheses, no more than ovector[0] to ovector[2n+1] are set by + pcre2_match(). The other elements retain whatever values they previ- + ously had. After a failed match attempt, the contents of the ovector + are unchanged. + + +OTHER INFORMATION ABOUT A MATCH + + PCRE2_SPTR pcre2_get_mark(pcre2_match_data *match_data); + + PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *match_data); + + As well as the offsets in the ovector, other information about a match + is retained in the match data block and can be retrieved by the above + functions in appropriate circumstances. If they are called at other + times, the result is undefined. + + After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a + failure to match (PCRE2_ERROR_NOMATCH), a mark name may be available. + The function pcre2_get_mark() can be called to access this name, which + can be specified in the pattern by any of the backtracking control + verbs, not just (*MARK). The same function applies to all the verbs. It + returns a pointer to the zero-terminated name, which is within the com- + piled pattern.
If no name is available, NULL is returned. The length of + the name (excluding the terminating zero) is stored in the code unit + that precedes the name. You should use this length instead of relying + on the terminating zero if the name might contain a binary zero. + + After a successful match, the name that is returned is the last mark + name encountered on the matching path through the pattern. Instances of + backtracking verbs without names do not count. Thus, for example, if + the matching path contains (*MARK:A)(*PRUNE), the name "A" is returned. + After a "no match" or a partial match, the last encountered name is re- + turned. For example, consider this pattern: + + ^(*MARK:A)((*MARK:B)a|b)c + + When it matches "bc", the returned name is A. The B mark is "seen" in + the first branch of the group, but it is not on the matching path. On + the other hand, when this pattern fails to match "bx", the returned + name is B. + + Warning: By default, certain start-of-match optimizations are used to + give a fast "no match" result in some situations. For example, if the + anchoring is removed from the pattern above, there is an initial check + for the presence of "c" in the subject before running the matching en- + gine. This check fails for "bx", causing a match failure without seeing + any marks. You can disable the start-of-match optimizations by setting + the PCRE2_NO_START_OPTIMIZE option for pcre2_compile() or by starting + the pattern with (*NO_START_OPT). + + After a successful match, a partial match, or one of the invalid UTF + errors (for example, PCRE2_ERROR_UTF8_ERR5), pcre2_get_startchar() can + be called. After a successful or partial match it returns the code unit + offset of the character at which the match started. For a non-partial + match, this can be different to the value of ovector[0] if the pattern + contains the \K escape sequence. 
After a partial match, however, this + value is always the same as ovector[0] because \K does not affect the + result of a partial match. + + After a UTF check failure, pcre2_get_startchar() can be used to obtain + the code unit offset of the invalid UTF character. Details are given in + the pcre2unicode page. + + +ERROR RETURNS FROM pcre2_match() + + If pcre2_match() fails, it returns a negative number. This can be con- + verted to a text string by calling the pcre2_get_error_message() func- + tion (see "Obtaining a textual error message" below). Negative error + codes are also returned by other functions, and are documented with + them. The codes are given names in the header file. If UTF checking is + in force and an invalid UTF subject string is detected, one of a number + of UTF-specific negative error codes is returned. Details are given in + the pcre2unicode page. The following are the other errors that may be + returned by pcre2_match(): + + PCRE2_ERROR_NOMATCH + + The subject string did not match the pattern. + + PCRE2_ERROR_PARTIAL + + The subject string did not match, but it did match partially. See the + pcre2partial documentation for details of partial matching. + + PCRE2_ERROR_BADMAGIC + + PCRE2 stores a 4-byte "magic number" at the start of the compiled code, + to catch the case when it is passed a junk pointer. This is the error + that is returned when the magic number is not present. + + PCRE2_ERROR_BADMODE + + This error is given when a compiled pattern is passed to a function in + a library of a different code unit width, for example, a pattern com- + piled by the 8-bit library is passed to a 16-bit or 32-bit library + function. + + PCRE2_ERROR_BADOFFSET + + The value of startoffset was greater than the length of the subject. + + PCRE2_ERROR_BADOPTION + + An unrecognized bit was set in the options argument. 
+ + PCRE2_ERROR_BADUTFOFFSET + + The UTF code unit sequence that was passed as a subject was checked and + found to be valid (the PCRE2_NO_UTF_CHECK option was not set), but the + value of startoffset did not point to the beginning of a UTF character + or the end of the subject. + + PCRE2_ERROR_CALLOUT + + This error is never generated by pcre2_match() itself. It is provided + for use by callout functions that want to cause pcre2_match() or + pcre2_callout_enumerate() to return a distinctive error code. See the + pcre2callout documentation for details. + + PCRE2_ERROR_DEPTHLIMIT + + The nested backtracking depth limit was reached. + + PCRE2_ERROR_HEAPLIMIT + + The heap limit was reached. + + PCRE2_ERROR_INTERNAL + + An unexpected internal error has occurred. This error could be caused + by a bug in PCRE2 or by overwriting of the compiled pattern. + + PCRE2_ERROR_JIT_STACKLIMIT + + This error is returned when a pattern that was successfully studied us- + ing JIT is being matched, but the memory available for the just-in-time + processing stack is not large enough. See the pcre2jit documentation + for more details. + + PCRE2_ERROR_MATCHLIMIT + + The backtracking match limit was reached. + + PCRE2_ERROR_NOMEMORY + + If a pattern contains many nested backtracking points, heap memory is + used to remember them. This error is given when the memory allocation + function (default or custom) fails. Note that a different error, + PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds + the heap limit. PCRE2_ERROR_NOMEMORY is also returned if + PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails. + + PCRE2_ERROR_NULL + + Either the code, subject, or match_data argument was passed as NULL. + + PCRE2_ERROR_RECURSELOOP + + This error is returned when pcre2_match() detects a recursion loop + within the pattern. 
Specifically, it means that either the whole pat- + tern or a capture group has been called recursively for the second time + at the same position in the subject string. Some simple patterns that + might do this are detected and faulted at compile time, but more com- + plicated cases, in particular mutual recursions between two different + groups, cannot be detected until matching is attempted. + + +OBTAINING A TEXTUAL ERROR MESSAGE + + int pcre2_get_error_message(int errorcode, PCRE2_UCHAR *buffer, + PCRE2_SIZE bufflen); + + A text message for an error code from any PCRE2 function (compile, + match, or auxiliary) can be obtained by calling pcre2_get_error_mes- + sage(). The code is passed as the first argument, with the remaining + two arguments specifying a code unit buffer and its length in code + units, into which the text message is placed. The message is returned + in code units of the appropriate width for the library that is being + used. + + The returned message is terminated with a trailing zero, and the func- + tion returns the number of code units used, excluding the trailing + zero. If the error number is unknown, the negative error code PCRE2_ER- + ROR_BADDATA is returned. If the buffer is too small, the message is + truncated (but still with a trailing zero), and the negative error code + PCRE2_ERROR_NOMEMORY is returned. None of the messages are very long; + a buffer size of 120 code units is ample. 
+ + +EXTRACTING CAPTURED SUBSTRINGS BY NUMBER + + int pcre2_substring_length_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_SIZE *length); + + int pcre2_substring_copy_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR *buffer, + PCRE2_SIZE *bufflen); + + int pcre2_substring_get_bynumber(pcre2_match_data *match_data, + uint32_t number, PCRE2_UCHAR **bufferptr, + PCRE2_SIZE *bufflen); + + void pcre2_substring_free(PCRE2_UCHAR *buffer); + + Captured substrings can be accessed directly by using the ovector as + described above. For convenience, auxiliary functions are provided for + extracting captured substrings as new, separate, zero-terminated + strings. A substring that contains a binary zero is correctly extracted + and has a further zero added on the end, but the result is not, of + course, a C string. + + The functions in this section identify substrings by number. The number + zero refers to the entire matched substring, with higher numbers refer- + ring to substrings captured by parenthesized groups. After a partial + match, only substring zero is available. An attempt to extract any + other substring gives the error PCRE2_ERROR_PARTIAL. The next section + describes similar functions for extracting captured substrings by name. + + If a pattern uses the \K escape sequence within a positive assertion, + the reported start of a successful match can be greater than the end of + the match. For example, if the pattern (?=ab\K) is matched against + "ab", the start and end offset values for the match are 2 and 0. In + this situation, calling these functions with a zero substring number + extracts a zero-length empty string. + + You can find the length in code units of a captured substring without + extracting it by calling pcre2_substring_length_bynumber(). The first + argument is a pointer to the match data block, the second is the group + number, and the third is a pointer to a variable into which the length + is placed. 
If you just want to know whether or not the substring has + been captured, you can pass the third argument as NULL. + + The pcre2_substring_copy_bynumber() function copies a captured sub- + string into a supplied buffer, whereas pcre2_substring_get_bynumber() + copies it into new memory, obtained using the same memory allocation + function that was used for the match data block. The first two argu- + ments of these functions are a pointer to the match data block and a + capture group number. + + The final arguments of pcre2_substring_copy_bynumber() are a pointer to + the buffer and a pointer to a variable that contains its length in code + units. This is updated to contain the actual number of code units used + for the extracted substring, excluding the terminating zero. + + For pcre2_substring_get_bynumber() the third and fourth arguments point + to variables that are updated with a pointer to the new memory and the + number of code units that comprise the substring, again excluding the + terminating zero. When the substring is no longer needed, the memory + should be freed by calling pcre2_substring_free(). + + The return value from all these functions is zero for success, or a + negative error code. If the pattern match failed, the match failure + code is returned. If a substring number greater than zero is used af- + ter a partial match, PCRE2_ERROR_PARTIAL is returned. Other possible + error codes are: + + PCRE2_ERROR_NOMEMORY + + The buffer was too small for pcre2_substring_copy_bynumber(), or the + attempt to get memory failed for pcre2_substring_get_bynumber(). + + PCRE2_ERROR_NOSUBSTRING + + There is no substring with that number in the pattern, that is, the + number is greater than the number of capturing parentheses. + + PCRE2_ERROR_UNAVAILABLE + + The substring number, though not greater than the number of captures in + the pattern, is greater than the number of slots in the ovector, so the + substring could not be captured. 
+ + PCRE2_ERROR_UNSET + + The substring did not participate in the match. For example, if the + pattern is (abc)|(def) and the subject is "def", and the ovector con- + tains at least two capturing slots, substring number 1 is unset. + + +EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS + + int pcre2_substring_list_get(pcre2_match_data *match_data, + PCRE2_UCHAR ***listptr, PCRE2_SIZE **lengthsptr); + + void pcre2_substring_list_free(PCRE2_SPTR *list); + + The pcre2_substring_list_get() function extracts all available sub- + strings and builds a list of pointers to them. It also (optionally) + builds a second list that contains their lengths (in code units), ex- + cluding a terminating zero that is added to each of them. All this is + done in a single block of memory that is obtained using the same memory + allocation function that was used to get the match data block. + + This function must be called only after a successful match. If called + after a partial match, the error code PCRE2_ERROR_PARTIAL is returned. + + The address of the memory block is returned via listptr, which is also + the start of the list of string pointers. The end of the list is marked + by a NULL pointer. The address of the list of lengths is returned via + lengthsptr. If your strings do not contain binary zeros and you do not + therefore need the lengths, you may supply NULL as the lengthsptr argu- + ment to disable the creation of a list of lengths. The yield of the + function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the mem- + ory block could not be obtained. When the list is no longer needed, it + should be freed by calling pcre2_substring_list_free(). + + If this function encounters a substring that is unset, which can happen + when capture group number n+1 matches some part of the subject, but + group n has not been used at all, it returns an empty string. 
This can + be distinguished from a genuine zero-length substring by inspecting the + appropriate offset in the ovector, which contains PCRE2_UNSET for unset + substrings, or by calling pcre2_substring_length_bynumber(). + + +EXTRACTING CAPTURED SUBSTRINGS BY NAME + + int pcre2_substring_number_from_name(const pcre2_code *code, + PCRE2_SPTR name); + + int pcre2_substring_length_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_SIZE *length); + + int pcre2_substring_copy_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR *buffer, PCRE2_SIZE *bufflen); + + int pcre2_substring_get_byname(pcre2_match_data *match_data, + PCRE2_SPTR name, PCRE2_UCHAR **bufferptr, PCRE2_SIZE *bufflen); + + void pcre2_substring_free(PCRE2_UCHAR *buffer); + + To extract a substring by name, you first have to find the associated + number. For example, for this pattern: + + (a+)b(?<xxx>\d+)... + + the number of the capture group called "xxx" is 2. If the name is known + to be unique (PCRE2_DUPNAMES was not set), you can find the number from + the name by calling pcre2_substring_number_from_name(). The first argu- + ment is the compiled pattern, and the second is the name. The yield of + the function is the group number, PCRE2_ERROR_NOSUBSTRING if there is + no group with that name, or PCRE2_ERROR_NOUNIQUESUBSTRING if there is + more than one group with that name. Given the number, you can extract + the substring directly from the ovector, or use one of the "bynumber" + functions described above. + + For convenience, there are also "byname" functions that correspond to + the "bynumber" functions, the only difference being that the second ar- + gument is a name instead of a number. If PCRE2_DUPNAMES is set and + there are duplicate names, these functions scan all the groups with the + given name, and return the captured substring from the first named + group that is set. + + If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is + returned.
If all groups with the name have numbers that are greater + than the number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is re- + turned. If there is at least one group with a slot in the ovector, but + no group is found to be set, PCRE2_ERROR_UNSET is returned. + + Warning: If the pattern uses the (?| feature to set up multiple capture + groups with the same number, as described in the section on duplicate + group numbers in the pcre2pattern page, you cannot use names to distin- + guish the different capture groups, because names are not included in + the compiled code. The matching process uses only numbers. For this + reason, the use of different names for groups with the same number + causes an error at compile time. + + +CREATING A NEW STRING WITH SUBSTITUTIONS + + int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, PCRE2_SPTR replacement, + PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, + PCRE2_SIZE *outlengthptr); + + This function optionally calls pcre2_match() and then makes a copy of + the subject string in outputbuffer, replacing parts that were matched + with the replacement string, whose length is supplied in rlength. This + can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. + There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to re- + turn just the replacement string(s). The default action is to perform + just one replacement if the pattern matches, but there is an option + that requests multiple replacements (see PCRE2_SUBSTITUTE_GLOBAL be- + low). + + If successful, pcre2_substitute() returns the number of substitutions + that were carried out. This may be zero if no match was found, and is + never greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A nega- + tive value is returned if an error is detected. 
+ + Matches in which a \K item in a lookahead in the pattern causes the + match to end before it starts are not supported, and give rise to an + error return. For global replacements, matches in which \K in a lookbe- + hind causes the match to start earlier than the point that was reached + in the previous iteration are also not supported. + + The first seven arguments of pcre2_substitute() are the same as for + pcre2_match(), except that the partial matching options are not permit- + ted, and match_data may be passed as NULL, in which case a match data + block is obtained and freed within this function, using memory manage- + ment functions from the match context, if provided, or else those that + were used to allocate memory for the compiled code. + + If match_data is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the + provided block is used for all calls to pcre2_match(), and its contents + afterwards are the result of the final call. For global changes, this + will always be a no-match error. The contents of the ovector within the + match data block may or may not have been changed. + + As well as the usual options for pcre2_match(), a number of additional + options can be set in the options argument of pcre2_substitute(). One + such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external + match_data block must be provided, and it must have been used for an + external call to pcre2_match(). The data in the match_data block (re- + turn code, offset vector) is used for the first substitution instead of + calling pcre2_match() from within pcre2_substitute(). This allows an + application to check for a match before choosing to substitute, without + having to repeat the match. + + The contents of the externally supplied match data block are not + changed when PCRE2_SUBSTITUTE_MATCHED is set. 
If PCRE2_SUBSTI- + TUTE_GLOBAL is also set, pcre2_match() is called after the first sub- + stitution to check for further matches, but this is done using an in- + ternally obtained match data block, thus always leaving the external + block unchanged. + + The code argument is not used for matching before the first substitu- + tion when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, + even when PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains in- + formation such as the UTF setting and the number of capturing parenthe- + ses in the pattern. + + The default action of pcre2_substitute() is to return a copy of the + subject string with matched substrings replaced. However, if PCRE2_SUB- + STITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are + returned. In the global case, multiple replacements are concatenated in + the output buffer. Substitution callouts (see below) can be used to + separate them if necessary. + + The outlengthptr argument of pcre2_substitute() must point to a vari- + able that contains the length, in code units, of the output buffer. If + the function is successful, the value is updated to contain the length + in code units of the new string, excluding the trailing zero that is + automatically added. + + If the function is not successful, the value set via outlengthptr de- + pends on the type of error. For syntax errors in the replacement + string, the value is the offset in the replacement string where the er- + ror was detected. For other errors, the value is PCRE2_UNSET by de- + fault. This includes the case of the output buffer being too small, un- + less PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set. + + PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output + buffer is too small. The default action is to return PCRE2_ERROR_NOMEM- + ORY immediately. 
If this option is set, however, pcre2_substitute() + continues to go through the motions of matching and substituting (with- + out, of course, writing anything) in order to compute the size of buf- + fer that is needed. This value is passed back via the outlengthptr + variable, with the result of the function still being PCRE2_ER- + ROR_NOMEMORY. + + Passing a buffer size of zero is a permitted way of finding out how + much memory is needed for a given substitution. However, this does mean + that the entire operation is carried out twice. Depending on the appli- + cation, it may be more efficient to allocate a large buffer and free + the excess afterwards, instead of using PCRE2_SUBSTITUTE_OVER- + FLOW_LENGTH. + + The replacement string, which is interpreted as a UTF string in UTF + mode, is checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An + invalid UTF replacement string causes an immediate return with the rel- + evant UTF error code. + + If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not in- + terpreted in any way. By default, however, a dollar character is an es- + cape character that can specify the insertion of characters from cap- + ture groups and names from (*MARK) or other control verbs in the pat- + tern. The following forms are always recognized: + + $$ insert a dollar character + $<n> or ${<n>} insert the contents of group <n> + $*MARK or ${*MARK} insert a control verb name + + Either a group number or a group name can be given for <n>. Curly + brackets are required only if the following character would be inter- + preted as part of the number or name. The number may be zero to include + the entire matched string. For example, if the pattern a(b)c is + matched with "=abc=" and the replacement string "+$1$0$1+", the result + is "=+babcb+=". + + $*MARK inserts the name from the last encountered backtracking control + verb on the matching path that has a name. (*MARK) must always include + a name, but the other verbs need not.
For example, in the case of + (*MARK:A)(*PRUNE) the name inserted is "A", but for (*MARK:A)(*PRUNE:B) + the relevant name is "B". This facility can be used to perform simple + simultaneous substitutions, as this pcre2test example shows: + + /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK} + apple lemon + 2: pear orange + + PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject + string, replacing every matching substring. If this option is not set, + only the first matching substring is replaced. The search for matches + takes place in the original subject string (that is, previous replace- + ments do not affect it). Iteration is implemented by advancing the + startoffset value for each search, which is always passed the entire + subject string. If an offset limit is set in the match context, search- + ing stops when that limit is reached. + + You can restrict the effect of a global substitution to a portion of + the subject string by setting either or both of startoffset and an off- + set limit. Here is a pcre2test example: + + /B/g,replace=!,use_offset_limit + ABC ABC ABC ABC\=offset=3,offset_limit=12 + 2: ABC A!C A!C ABC + + When continuing with global substitutions after matching a substring + with zero length, an attempt to find a non-empty match at the same off- + set is performed. If this is not successful, the offset is advanced by + one character except when CRLF is a valid newline sequence and the next + two characters are CR, LF. In this case, the offset is advanced by two + characters. + + PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that + do not appear in the pattern to be treated as unset groups. This option + should be used with care, because it means that a typo in a group name + or number no longer causes the PCRE2_ERROR_NOSUBSTRING error. 
+ + PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including un- + known groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated + as empty strings when inserted as described above. If this option is + not set, an attempt to insert an unset group causes the PCRE2_ERROR_UN- + SET error. This option does not influence the extended substitution + syntax described below. + + PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the + replacement string. Without this option, only the dollar character is + special, and only the group insertion forms listed above are valid. + When PCRE2_SUBSTITUTE_EXTENDED is set, two things change: + + Firstly, backslash in a replacement string is interpreted as an escape + character. The usual forms such as \n or \x{ddd} can be used to specify + particular character codes, and backslash followed by any non-alphanu- + meric character quotes that character. Extended quoting can be coded + using \Q...\E, exactly as in pattern strings. + + There are also four escape sequences for forcing the case of inserted + letters. The insertion mechanism has three states: no case forcing, + force upper case, and force lower case. The escape sequences change the + current state: \U and \L change to upper or lower case forcing, respec- + tively, and \E (when not terminating a \Q quoted sequence) reverts to + no case forcing. The sequences \u and \l force the next character (if + it is a letter) to upper or lower case, respectively, and then the + state automatically reverts to no case forcing. Case forcing applies to + all inserted characters, including those from capture groups and let- + ters within \Q...\E quoted sequences. If either PCRE2_UTF or PCRE2_UCP + was set when the pattern was compiled, Unicode properties are used for + case forcing characters whose code points are greater than 127. + + Note that case forcing sequences such as \U...\E do not nest. 
For exam- + ple, the result of processing "\Uaa\LBB\Ecc\E" is "AAbbcc"; the final + \E has no effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EX- + TRA_ALT_BSUX options do not apply to replacement strings. + + The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more + flexibility to capture group substitution. The syntax is similar to + that used by Bash: + + ${<n>:-<string>} + ${<n>:+<string1>:<string2>} + + As before, <n> may be a group number or a name. The first form speci- + fies a default value. If group <n> is set, its value is inserted; if + not, <string> is expanded and the result inserted. The second form + specifies strings that are expanded and inserted when group <n> is set + or unset, respectively. The first form is just a convenient shorthand + for + + ${<n>:+${<n>}:<string>} + + Backslash can be used to escape colons and closing curly brackets in + the replacement strings. A change of the case forcing state within a + replacement string remains in force afterwards, as shown in this + pcre2test example: + + /(some)?(body)/substitute_extended,replace=${1:+\U:\L}HeLLo + body + 1: hello + somebody + 1: HELLO + + The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended + substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause un- + known groups in the extended syntax forms to be treated as unset. + + If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, + PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrele- + vant and are ignored. + + Substitution errors + + In the event of an error, pcre2_substitute() returns a negative error + code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors + from pcre2_match() are passed straight back. + + PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring inser- + tion, unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
+ + PCRE2_ERROR_UNSET is returned for an unset substring insertion (includ- + ing an unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) + when the simple (non-extended) syntax is used and PCRE2_SUBSTITUTE_UN- + SET_EMPTY is not set. + + PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big + enough. If the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size + of buffer that is needed is returned via outlengthptr. Note that this + does not happen by default. + + PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the + match_data argument is NULL. + + PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in + the replacement string, with more particular errors being PCRE2_ER- + ROR_BADREPESCAPE (invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE + (closing curly bracket not found), PCRE2_ERROR_BADSUBSTITUTION (syntax + error in extended group substitution), and PCRE2_ERROR_BADSUBSPATTERN + (the pattern match ended before it started or the match started earlier + than the current position in the subject, which can happen if \K is + used in an assertion). + + As for all PCRE2 errors, a text message that describes the error can be + obtained by calling the pcre2_get_error_message() function (see "Ob- + taining a textual error message" above). + + Substitution callouts + + int pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*callout_function)(pcre2_substitute_callout_block *, void *), + void *callout_data); + + The pcre2_set_substitute_callout() function can be used to specify a + callout function for pcre2_substitute(). This information is passed in + a match context. The callout function is called after each substitution + has been processed, but it can cause the replacement not to happen. The + callout function is not called for simulated substitutions that happen + as a result of the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option.
+ + The first argument of the callout function is a pointer to a substitute + callout block structure, which contains the following fields, not nec- + essarily in this order: + + uint32_t version; + uint32_t subscount; + PCRE2_SPTR input; + PCRE2_SPTR output; + PCRE2_SIZE *ovector; + uint32_t oveccount; + PCRE2_SIZE output_offsets[2]; + + The version field contains the version number of the block format. The + current version is 0. The version number will increase in future if + more fields are added, but the intention is never to remove any of the + existing fields. + + The subscount field is the number of the current match. It is 1 for the + first callout, 2 for the second, and so on. The input and output point- + ers are copies of the values passed to pcre2_substitute(). + + The ovector field points to the ovector, which contains the result of + the most recent match. The oveccount field contains the number of pairs + that are set in the ovector, and is always greater than zero. + + The output_offsets vector contains the offsets of the replacement in + the output string. This has already been processed for dollar and (if + requested) backslash substitutions as described above. + + The second argument of the callout function is the value passed as + callout_data when the function was registered. The value returned by + the callout function is interpreted as follows: + + If the value is zero, the replacement is accepted, and, if PCRE2_SUB- + STITUTE_GLOBAL is set, processing continues with a search for the next + match. If the value is not zero, the current replacement is not ac- + cepted. If the value is greater than zero, processing continues when + PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero + or PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is + copied to the output and the call to pcre2_substitute() exits, return- + ing the number of matches so far.
+ + +DUPLICATE CAPTURE GROUP NAMES + + int pcre2_substring_nametable_scan(const pcre2_code *code, + PCRE2_SPTR name, PCRE2_SPTR *first, PCRE2_SPTR *last); + + When a pattern is compiled with the PCRE2_DUPNAMES option, names for + capture groups are not required to be unique. Duplicate names are al- + ways allowed for groups with the same number, created by using the (?| + feature. Indeed, if such groups are named, they are required to use the + same names. + + Normally, patterns that use duplicate names are such that in any one + match, only one of each set of identically-named groups participates. + An example is shown in the pcre2pattern documentation. + + When duplicates are present, pcre2_substring_copy_byname() and + pcre2_substring_get_byname() return the first substring corresponding + to the given name that is set. Only if none are set is PCRE2_ERROR_UN- + SET returned. The pcre2_substring_number_from_name() function re- + turns the error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate + names. + + If you want to get full details of all captured substrings for a given + name, you must use the pcre2_substring_nametable_scan() function. The + first argument is the compiled pattern, and the second is the name. If + the third and fourth arguments are NULL, the function returns a group + number for a unique name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise. + + When the third and fourth arguments are not NULL, they must be pointers + to variables that are updated by the function. After it has run, they + point to the first and last entries in the name-to-number table for the + given name, and the function returns the length of each entry in code + units. In both cases, PCRE2_ERROR_NOSUBSTRING is returned if there are + no entries for the given name. + + The format of the name table is described above in the section entitled + Information about a pattern.
Given all the relevant entries for the + name, you can extract each of their numbers, and hence the captured + data. + + +FINDING ALL POSSIBLE MATCHES AT ONE POSITION + + The traditional matching function uses a similar algorithm to Perl, + which stops when it finds the first match at a given point in the sub- + ject. If you want to find all possible matches, or the longest possible + match at a given position, consider using the alternative matching + function (see below) instead. If you cannot use the alternative func- + tion, you can kludge it up by making use of the callout facility, which + is described in the pcre2callout documentation. + + What you have to do is to insert a callout right at the end of the pat- + tern. When your callout function is called, extract and save the cur- + rent matched substring. Then return 1, which forces pcre2_match() to + backtrack and try other alternatives. Ultimately, when it runs out of + matches, pcre2_match() will yield PCRE2_ERROR_NOMATCH. + + +MATCHING A PATTERN: THE ALTERNATIVE FUNCTION + + int pcre2_dfa_match(const pcre2_code *code, PCRE2_SPTR subject, + PCRE2_SIZE length, PCRE2_SIZE startoffset, + uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, + int *workspace, PCRE2_SIZE wscount); + + The function pcre2_dfa_match() is called to match a subject string + against a compiled pattern, using a matching algorithm that scans the + subject string just once (not counting lookaround assertions), and does + not backtrack. This has different characteristics to the normal algo- + rithm, and is not compatible with Perl. Some of the features of PCRE2 + patterns are not supported. Nevertheless, there are times when this + kind of matching can be useful. For a discussion of the two matching + algorithms, and a list of features that pcre2_dfa_match() does not sup- + port, see the pcre2matching documentation. 
+ + The arguments for the pcre2_dfa_match() function are the same as for + pcre2_match(), plus two extras. The ovector within the match data block + is used in a different way, and this is described below. The other com- + mon arguments are used in the same way as for pcre2_match(), so their + description is not repeated here. + + The two additional arguments provide workspace for the function. The + workspace vector should contain at least 20 elements. It is used for + keeping track of multiple paths through the pattern tree. More + workspace is needed for patterns and subjects where there are a lot of + potential matches. + + Here is an example of a simple call to pcre2_dfa_match(): + + int wspace[20]; + pcre2_match_data *md = pcre2_match_data_create(4, NULL); + int rc = pcre2_dfa_match( + re, /* result of pcre2_compile() */ + "some string", /* the subject string */ + 11, /* the length of the subject string */ + 0, /* start at offset 0 in the subject */ + 0, /* default options */ + md, /* the match data block */ + NULL, /* a match context; NULL means use defaults */ + wspace, /* working space vector */ + 20); /* number of elements (NOT size in bytes) */ + + Option bits for pcre2_dfa_match() + + The unused bits of the options argument for pcre2_dfa_match() must be + zero. The only bits that may be set are PCRE2_ANCHORED, + PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NO- + TEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, + PCRE2_PARTIAL_HARD, PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and + PCRE2_DFA_RESTART. All but the last four of these are exactly the same + as for pcre2_match(), so their description is not repeated here. + + PCRE2_PARTIAL_HARD + PCRE2_PARTIAL_SOFT + + These have the same general effect as they do for pcre2_match(), but + the details are slightly different.
When PCRE2_PARTIAL_HARD is set for + pcre2_dfa_match(), it returns PCRE2_ERROR_PARTIAL if the end of the + subject is reached and there is still at least one matching possibility + that requires additional characters. This happens even if some complete + matches have already been found. When PCRE2_PARTIAL_SOFT is set, the + return code PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL + if the end of the subject is reached, there have been no complete + matches, but there is still at least one matching possibility. The por- + tion of the string that was inspected when the longest partial match + was found is set as the first matching string in both cases. There is a + more detailed discussion of partial and multi-segment matching, with + examples, in the pcre2partial documentation. + + PCRE2_DFA_SHORTEST + + Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to + stop as soon as it has found one match. Because of the way the alterna- + tive algorithm works, this is necessarily the shortest possible match + at the first possible matching point in the subject string. + + PCRE2_DFA_RESTART + + When pcre2_dfa_match() returns a partial match, it is possible to call + it again, with additional subject characters, and have it continue with + the same match. The PCRE2_DFA_RESTART option requests this action; when + it is set, the workspace and wscount arguments must reference the same + vector as before because data about the match so far is left in them + after a partial match. There is more discussion of this facility in the + pcre2partial documentation. + + Successful returns from pcre2_dfa_match() + + When pcre2_dfa_match() succeeds, it may have matched more than one sub- + string in the subject. Note, however, that all the matches from one run + of the function start at the same point in the subject. The shorter + matches are all initial substrings of the longer matches.
For example, + if the pattern + + <.*> + + is matched against the string + + This is <something> <something else> <something further> no more + + the three matched strings are + + <something> <something else> <something further> + <something> <something else> + <something> + + On success, the yield of the function is a number greater than zero, + which is the number of matched substrings. The offsets of the sub- + strings are returned in the ovector, and can be extracted by number in + the same way as for pcre2_match(), but the numbers bear no relation to + any capture groups that may exist in the pattern, because DFA matching + does not support capturing. + + Calls to the convenience functions that extract substrings by name re- + turn the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used af- + ter a DFA match. The convenience functions that extract substrings by + number never return PCRE2_ERROR_NOSUBSTRING. + + The matched strings are stored in the ovector in reverse order of + length; that is, the longest matching string is first. If there were + too many matches to fit into the ovector, the yield of the function is + zero, and the vector is filled with the longest matches. + + NOTE: PCRE2's "auto-possessification" optimization usually applies to + character repeats at the end of a pattern (as well as internally). For + example, the pattern "a\d+" is compiled as if it were "a\d++". For DFA + matching, this means that only one possible match is found. If you re- + ally do want multiple matches in such cases, either use an ungreedy re- + peat such as "a\d+?" or set the PCRE2_NO_AUTO_POSSESS option when com- + piling. + + Error returns from pcre2_dfa_match() + + The pcre2_dfa_match() function returns a negative number when it fails. + Many of the errors are the same as for pcre2_match(), as described + above. There are in addition the following errors that are specific to + pcre2_dfa_match(): + + PCRE2_ERROR_DFA_UITEM + + This return is given if pcre2_dfa_match() encounters an item in the + pattern that it does not support, for instance, the use of \C in a UTF + mode or a backreference.
+ + PCRE2_ERROR_DFA_UCOND + + This return is given if pcre2_dfa_match() encounters a condition item + that uses a backreference for the condition, or a test for recursion in + a specific capture group. These are not supported. + + PCRE2_ERROR_DFA_UINVALID_UTF + + This return is given if pcre2_dfa_match() is called for a pattern that + was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for + DFA matching. + + PCRE2_ERROR_DFA_WSSIZE + + This return is given if pcre2_dfa_match() runs out of space in the + workspace vector. + + PCRE2_ERROR_DFA_RECURSE + + When a recursion or subroutine call is processed, the matching function + calls itself recursively, using private memory for the ovector and + workspace. This error is given if the internal ovector is not large + enough. This should be extremely rare, as a vector of size 1000 is + used. + + PCRE2_ERROR_DFA_BADRESTART + + When pcre2_dfa_match() is called with the PCRE2_DFA_RESTART option, + some plausibility checks are made on the contents of the workspace, + which should contain data about the previous partial match. If any of + these checks fail, this error is given. + + +SEE ALSO + + pcre2build(3), pcre2callout(3), pcre2demo(3), pcre2matching(3), + pcre2partial(3), pcre2posix(3), pcre2sample(3), pcre2unicode(3). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 04 November 2020 + Copyright (c) 1997-2020 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2BUILD(3) Library Functions Manual PCRE2BUILD(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +BUILDING PCRE2 + + PCRE2 is distributed with a configure script that can be used to build + the library in Unix-like environments using the applications known as + Autotools. Also in the distribution are files to support building using + CMake instead of configure. 
The text file README contains general in- + formation about building with Autotools (some of which is repeated be- + low), and also has some comments about building on various operating + systems. There is a lot more information about building PCRE2 without + using Autotools (including information about using CMake and building + "by hand") in the text file called NON-AUTOTOOLS-BUILD. You should + consult this file as well as the README file if you are building in a + non-Unix-like environment. + + +PCRE2 BUILD-TIME OPTIONS + + The rest of this document describes the optional features of PCRE2 that + can be selected when the library is compiled. It assumes use of the + configure script, where the optional features are selected or dese- + lected by providing options to configure before running the make com- + mand. However, the same options can be selected in both Unix-like and + non-Unix-like environments if you are using CMake instead of configure + to build PCRE2. + + If you are not using Autotools or CMake, option selection can be done + by editing the config.h file, or by passing parameter settings to the + compiler, as described in NON-AUTOTOOLS-BUILD. + + The complete list of options for configure (which includes the standard + ones such as the selection of the installation directory) can be ob- + tained by running + + ./configure --help + + The following sections include descriptions of "on/off" options whose + names begin with --enable or --disable. Because of the way that config- + ure works, --enable and --disable always come in pairs, so the comple- + mentary option always exists as well, but as it specifies the default, + it is not described. Options that specify values have names that start + with --with. At the end of a configure run, a summary of the configura- + tion is output. 
+ + +BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES + + By default, a library called libpcre2-8 is built, containing functions + that take string arguments contained in arrays of bytes, interpreted + either as single-byte characters, or UTF-8 strings. You can also build + two other libraries, called libpcre2-16 and libpcre2-32, which process + strings that are contained in arrays of 16-bit and 32-bit code units, + respectively. These can be interpreted either as single-unit characters + or UTF-16/UTF-32 strings. To build these additional libraries, add one + or both of the following to the configure command: + + --enable-pcre2-16 + --enable-pcre2-32 + + If you do not want the 8-bit library, add + + --disable-pcre2-8 + + as well. At least one of the three libraries must be built. Note that + the POSIX wrapper is for the 8-bit library only, and that pcre2grep is + an 8-bit program. Neither of these is built if you select only the + 16-bit or 32-bit libraries. + + +BUILDING SHARED AND STATIC LIBRARIES + + The Autotools PCRE2 building process uses libtool to build both shared + and static libraries by default. You can suppress an unwanted library + by adding one of + + --disable-shared + --disable-static + + to the configure command. + + +UNICODE AND UTF SUPPORT + + By default, PCRE2 is built with support for Unicode and UTF character + strings. To build it without Unicode support, add + + --disable-unicode + + to the configure command. This setting applies to all three libraries. + It is not possible to build one library with Unicode support and an- + other without in the same configuration. + + Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, + UTF-16 or UTF-32. To do that, applications that use the library can set + the PCRE2_UTF option when they call pcre2_compile() to compile a pat- + tern. Alternatively, patterns may be started with (*UTF) unless the + application has locked this out by setting PCRE2_NEVER_UTF.
+ + UTF support allows the libraries to process character code points up to + 0x10ffff in the strings that they handle. Unicode support also gives + access to the Unicode properties of characters, using pattern escapes + such as \P, \p, and \X. Only the general category properties such as Lu + and Nd are supported. Details are given in the pcre2pattern documenta- + tion. + + Pattern escapes such as \d and \w do not by default make use of Unicode + properties. The application can request that they do by setting the + PCRE2_UCP option. Unless the application has set PCRE2_NEVER_UCP, a + pattern may also request this by starting with (*UCP). + + +DISABLING THE USE OF \C + + The \C escape sequence, which matches a single code unit, even in a UTF + mode, can cause unpredictable behaviour because it may leave the cur- + rent matching point in the middle of a multi-code-unit character. The + application can lock it out by setting the PCRE2_NEVER_BACKSLASH_C op- + tion when calling pcre2_compile(). There is also a build-time option + + --enable-never-backslash-C + + (note the upper case C) which locks out the use of \C entirely. + + +JUST-IN-TIME COMPILER SUPPORT + + Just-in-time (JIT) compiler support is included in the build by speci- + fying + + --enable-jit + + This support is available only for certain hardware architectures. If + this option is set for an unsupported architecture, a building error + occurs. If in doubt, use + + --enable-jit=auto + + which enables JIT only if the current hardware is supported. You can + check if JIT is enabled in the configuration summary that is output at + the end of a configure run. If you are enabling JIT under SELinux you + may also want to add + + --enable-jit-sealloc + + which enables the use of an execmem allocator in JIT that is compatible + with SELinux. This has no effect if JIT is not enabled. See the + pcre2jit documentation for a discussion of JIT usage. 
When JIT support + is enabled, pcre2grep automatically makes use of it, unless you add + + --disable-pcre2grep-jit + + to the configure command. + + +NEWLINE RECOGNITION + + By default, PCRE2 interprets the linefeed (LF) character as indicating + the end of a line. This is the normal newline character on Unix-like + systems. You can compile PCRE2 to use carriage return (CR) instead, by + adding + + --enable-newline-is-cr + + to the configure command. There is also an --enable-newline-is-lf op- + tion, which explicitly specifies linefeed as the newline character. + + Alternatively, you can specify that line endings are to be indicated by + the two-character sequence CRLF (CR immediately followed by LF). If you + want this, add + + --enable-newline-is-crlf + + to the configure command. There is a fourth option, specified by + + --enable-newline-is-anycrlf + + which causes PCRE2 to recognize any of the three sequences CR, LF, or + CRLF as indicating a line ending. A fifth option, specified by + + --enable-newline-is-any + + causes PCRE2 to recognize any Unicode newline sequence. The Unicode + newline sequences are the three just mentioned, plus the single charac- + ters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, + U+0085), LS (line separator, U+2028), and PS (paragraph separator, + U+2029). The final option is + + --enable-newline-is-nul + + which causes NUL (binary zero) to be set as the default line-ending + character. + + Whatever default line ending convention is selected when PCRE2 is built + can be overridden by applications that use the library. At build time + it is recommended to use the standard for your operating system. + + +WHAT \R MATCHES + + By default, the sequence \R in a pattern matches any Unicode newline + sequence, independently of what has been selected as the line ending + sequence. If you specify + + --enable-bsr-anycrlf + + the default is changed so that \R matches only CR, LF, or CRLF. 
What- + ever is selected when PCRE2 is built can be overridden by applications + that use the library. + + +HANDLING VERY LARGE PATTERNS + + Within a compiled pattern, offset values are used to point from one + part to another (for example, from an opening parenthesis to an alter- + nation metacharacter). By default, in the 8-bit and 16-bit libraries, + two-byte values are used for these offsets, leading to a maximum size + for a compiled pattern of around 64 thousand code units. This is suffi- + cient to handle all but the most gigantic patterns. Nevertheless, some + people do want to process truly enormous patterns, so it is possible to + compile PCRE2 to use three-byte or four-byte offsets by adding a set- + ting such as + + --with-link-size=3 + + to the configure command. The value given must be 2, 3, or 4. For the + 16-bit library, a value of 3 is rounded up to 4. In these libraries, + using longer offsets slows down the operation of PCRE2 because it has + to load additional data when handling them. For the 32-bit library the + value is always 4 and cannot be overridden; the value of --with-link- + size is ignored. + + +LIMITING PCRE2 RESOURCE USAGE + + The pcre2_match() function increments a counter each time it goes round + its main loop. Putting a limit on this counter controls the amount of + computing resource used by a single call to pcre2_match(). The limit + can be changed at run time, as described in the pcre2api documentation. + The default is 10 million, but this can be changed by adding a setting + such as + + --with-match-limit=500000 + + to the configure command. This setting also applies to the + pcre2_dfa_match() matching function, and to JIT matching (though the + counting is done differently). + + The pcre2_match() function starts out using a 20KiB vector on the sys- + tem stack to record backtracking points. The more nested backtracking + points there are (that is, the deeper the search tree), the more memory + is needed. 
If the initial vector is not large enough, heap memory is + used, up to a certain limit, which is specified in kibibytes (units of + 1024 bytes). The limit can be changed at run time, as described in the + pcre2api documentation. The default limit (in effect unlimited) is 20 + million. You can change this by a setting such as + + --with-heap-limit=500 + + which limits the amount of heap to 500 KiB. This limit applies only to + interpretive matching in pcre2_match() and pcre2_dfa_match(), which may + also use the heap for internal workspace when processing complicated + patterns. This limit does not apply when JIT (which has its own memory + arrangements) is used. + + You can also explicitly limit the depth of nested backtracking in the + pcre2_match() interpreter. This limit defaults to the value that is set + for --with-match-limit. You can set a lower default limit by adding, + for example, + + --with-match-limit-depth=10000 + + to the configure command. This value can be overridden at run time. + This depth limit indirectly limits the amount of heap memory that is + used, but because the size of each backtracking "frame" depends on the + number of capturing parentheses in a pattern, the amount of heap that + is used before the limit is reached varies from pattern to pattern. + This limit was more useful in versions before 10.30, where function re- + cursion was used for backtracking. + + As well as applying to pcre2_match(), the depth limit also controls the + depth of recursive function calls in pcre2_dfa_match(). These are used + for lookaround assertions, atomic groups, and recursion within pat- + terns. The limit does not apply to JIT matching. + + +CREATING CHARACTER TABLES AT BUILD TIME + + PCRE2 uses fixed tables for processing characters whose code points are + less than 256. By default, PCRE2 is built with a set of tables that are + distributed in the file src/pcre2_chartables.c.dist. These tables are + for ASCII codes only.
If you add + + --enable-rebuild-chartables + + to the configure command, the distributed tables are no longer used. + Instead, a program called pcre2_dftables is compiled and run. This out- + puts the source for a new set of tables, created in the default locale of + your C run-time system. This method of replacing the tables does not + work if you are cross compiling, because pcre2_dftables needs to be run + on the local host and therefore not compiled with the cross compiler. + + If you need to create alternative tables when cross compiling, you will + have to do so "by hand". There may also be other reasons for creating + tables manually. To cause pcre2_dftables to be built on the local + host, run a normal compiling command, and then run the program with the + output file as its argument, for example: + + cc src/pcre2_dftables.c -o pcre2_dftables + ./pcre2_dftables src/pcre2_chartables.c + + This builds the tables in the default locale of the local host. If you + want to specify a locale, you must use the -L option: + + LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c + + You can also specify -b (with or without -L). This causes the tables to + be written in binary instead of as source code. A set of binary tables + can be loaded into memory by an application and passed to pcre2_com- + pile() in the same way as tables created by calling pcre2_maketables(). + The tables are just a string of bytes, independent of hardware charac- + teristics such as endianness. This means they can be bundled with an + application that runs in different environments, to ensure consistent + behaviour. + + +USING EBCDIC CODE + + PCRE2 assumes by default that it will run in an environment where the + character code is ASCII or Unicode, which is a superset of ASCII. This + is the case for most computer operating systems. PCRE2 can, however, be + compiled to run in an 8-bit EBCDIC environment by adding + + --enable-ebcdic --disable-unicode + + to the configure command.
This setting implies --enable-rebuild-charta- + bles. You should only use it if you know that you are in an EBCDIC en- + vironment (for example, an IBM mainframe operating system). + + It is not possible to support both EBCDIC and UTF-8 codes in the same + version of the library. Consequently, --enable-unicode and --enable- + ebcdic are mutually exclusive. + + The EBCDIC character that corresponds to an ASCII LF is assumed to have + the value 0x15 by default. However, in some EBCDIC environments, 0x25 + is used. In such an environment you should use + + --enable-ebcdic-nl25 + + as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR + has the same value as in ASCII, namely, 0x0d. Whichever of 0x15 and + 0x25 is not chosen as LF is made to correspond to the Unicode NEL char- + acter (which, in Unicode, is 0x85). + + The options that select newline behaviour, such as --enable-newline-is- + cr, and equivalent run-time options, refer to these character values in + an EBCDIC environment. + + +PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS + + By default pcre2grep supports the use of callouts with string arguments + within the patterns it is matching. There are two kinds: one that gen- + erates output using local code, and another that calls an external pro- + gram or script. If --disable-pcre2grep-callout-fork is added to the + configure command, only the first kind of callout is supported; if + --disable-pcre2grep-callout is used, all callouts are completely ig- + nored. For more details of pcre2grep callouts, see the pcre2grep docu- + mentation. + + +PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT + + By default, pcre2grep reads all files as plain text. You can build it + so that it recognizes files whose names end in .gz or .bz2, and reads + them with libz or libbz2, respectively, by adding one or both of + + --enable-pcre2grep-libz + --enable-pcre2grep-libbz2 + + to the configure command. 
These options naturally require that the rel- + evant libraries are installed on your system. Configuration will fail + if they are not. + + +PCRE2GREP BUFFER SIZE + + pcre2grep uses an internal buffer to hold a "window" on the file it is + scanning, in order to be able to output "before" and "after" lines when + it finds a match. The default starting size of the buffer is 20KiB. The + buffer itself is three times this size, but because of the way it is + used for holding "before" lines, the longest line that is guaranteed to + be processable is the notional buffer size. If a longer line is encoun- + tered, pcre2grep automatically expands the buffer, up to a specified + maximum size, whose default is 1MiB or the starting size, whichever is + the larger. You can change the default parameter values by adding, for + example, + + --with-pcre2grep-bufsize=51200 + --with-pcre2grep-max-bufsize=2097152 + + to the configure command. The caller of pcre2grep can override these + values by using --buffer-size and --max-buffer-size on the command + line. + + +PCRE2TEST OPTION FOR LIBREADLINE SUPPORT + + If you add one of + + --enable-pcre2test-libreadline + --enable-pcre2test-libedit + + to the configure command, pcre2test is linked with the libreadline or + libedit library, respectively, and when its input is from a terminal, + it reads it using the readline() function. This provides line-editing + and history facilities. Note that libreadline is GPL-licensed, so if + you distribute a binary of pcre2test linked in this way, there may be + licensing issues. These can be avoided by linking instead with libedit, + which has a BSD licence. + + Setting --enable-pcre2test-libreadline causes the -lreadline option to + be added to the pcre2test build. In many operating environments with a + system-installed readline library this is sufficient. However, in some + environments (e.g.
if an unmodified distribution version of readline is + in use), some extra configuration may be necessary. The INSTALL file + for libreadline says this: + + "Readline uses the termcap functions, but does not link with + the termcap or curses library itself, allowing applications + which link with readline the to choose an appropriate library." + + If your environment has not been set up so that an appropriate library + is automatically included, you may need to add something like + + LIBS="-lncurses" + + immediately before the configure command. + + +INCLUDING DEBUGGING CODE + + If you add + + --enable-debug + + to the configure command, additional debugging code is included in the + build. This feature is intended for use by the PCRE2 maintainers. + + +DEBUGGING WITH VALGRIND SUPPORT + + If you add + + --enable-valgrind + + to the configure command, PCRE2 will use valgrind annotations to mark + certain memory regions as unaddressable. This allows it to detect in- + valid memory accesses, and is mostly useful for debugging PCRE2 itself. + + +CODE COVERAGE REPORTING + + If your C compiler is gcc, you can build a version of PCRE2 that can + generate a code coverage report for its test suite. To enable this, you + must install lcov version 1.6 or above. Then specify + + --enable-coverage + + to the configure command and build PCRE2 in the usual way. + + Note that using ccache (a caching C compiler) is incompatible with code + coverage reporting. If you have configured ccache to run automatically + on your system, you must set the environment variable + + CCACHE_DISABLE=1 + + before running make to build PCRE2, so that ccache is not used. + + When --enable-coverage is used, the following additional targets are + added to the Makefile: + + make coverage + + This creates a fresh coverage report for the PCRE2 test suite. It is + equivalent to running "make coverage-reset", "make coverage-baseline", + "make check", and then "make coverage-report".
+ + make coverage-reset + + This zeroes the coverage counters, but does nothing else. + + make coverage-baseline + + This captures baseline coverage information. + + make coverage-report + + This creates the coverage report. + + make coverage-clean-report + + This removes the generated coverage report without cleaning the cover- + age data itself. + + make coverage-clean-data + + This removes the captured coverage data without removing the coverage + files created at compile time (*.gcno). + + make coverage-clean + + This cleans all coverage data including the generated coverage report. + For more information about code coverage, see the gcov and lcov docu- + mentation. + + +DISABLING THE Z AND T FORMATTING MODIFIERS + + The C99 standard defines formatting modifiers z and t for size_t and + ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers + in environments other than Microsoft Visual Studio when __STDC_VER- + SION__ is defined and has a value greater than or equal to 199901L (in- + dicating C99). However, there is at least one environment that claims + to be C99 but does not support these modifiers. If + + --disable-percent-zt + + is specified, no use is made of the z or t modifiers. Instead of %td or + %zu, %lu is used, with a cast for size_t values. + + +SUPPORT FOR FUZZERS + + There is a special option for use by people who want to run fuzzing + tests on PCRE2: + + --enable-fuzz-support + + At present this applies only to the 8-bit library. If set, it causes an + extra library called libpcre2-fuzzsupport.a to be built, but not in- + stalled. This contains a single function called LLVMFuzzerTestOneIn- + put() whose arguments are a pointer to a string and the length of the + string. When called, this function tries to compile the string as a + pattern, and if that succeeds, to match it. This is done both with no + options and with some random options bits that are generated from the + string. 
+ + Setting --enable-fuzz-support also causes a binary called pcre2fuz- + zcheck to be created. This is normally run under valgrind or used when + PCRE2 is compiled with address sanitizing enabled. It calls the fuzzing + function and outputs information about what it is doing. The input + strings are specified by arguments: if an argument starts with "=" the + rest of it is a literal input string. Otherwise, it is assumed to be a + file name, and the contents of the file are the test string. + + +OBSOLETE OPTION + + In versions of PCRE2 prior to 10.30, there were two ways of handling + backtracking in the pcre2_match() function. The default was to use the + system stack, but if + + --disable-stack-for-recursion + + was set, memory on the heap was used. From release 10.30 onwards this + has changed (the stack is no longer used) and this option now does + nothing except give a warning. + + +SEE ALSO + + pcre2api(3), pcre2-config(3). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 20 March 2020 + Copyright (c) 1997-2020 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2CALLOUT(3) Library Functions Manual PCRE2CALLOUT(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +SYNOPSIS + + #include <pcre2.h> + + int (*pcre2_callout)(pcre2_callout_block *, void *); + + int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); + + +DESCRIPTION + + PCRE2 provides a feature called "callout", which is a means of tempo- + rarily passing control to the caller of PCRE2 in the middle of pattern + matching. The caller of PCRE2 provides an external function by putting + its entry point in a match context (see pcre2_set_callout() in the + pcre2api documentation). + + When using the pcre2_substitute() function, an additional callout fea- + ture is available.
This does a callout after each change to the subject + string and is described in the pcre2api documentation; the rest of this + document is concerned with callouts during pattern matching. + + Within a regular expression, (?C) indicates a point at which the + external function is to be called. Different callout points can be + identified by putting a number less than 256 after the letter C. The + default value is zero. Alternatively, the argument may be a delimited + string. The starting delimiter must be one of ` ' " ^ % # $ { and the + ending delimiter is the same as the start, except for {, where the end- + ing delimiter is }. If the ending delimiter is needed within the + string, it must be doubled. For example, this pattern has two callout + points: + + (?C1)abc(?C"some ""arbitrary"" text")def + + If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, + PCRE2 automatically inserts callouts, all with number 255, before each + item in the pattern except for immediately before or after an explicit + callout. For example, if PCRE2_AUTO_CALLOUT is used with the pattern + + A(?C3)B + + it is processed as if it were + + (?C255)A(?C3)B(?C255) + + Here is a more complicated example: + + A(\d{2}|--) + + With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were + + (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255) + + Notice that there is a callout before and after each parenthesis and + alternation bar. If the pattern contains a conditional group whose con- + dition is an assertion, an automatic callout is inserted immediately + before the condition. Such a callout may also be inserted explicitly, + for example: + + (?(?C9)(?=a)ab|de) (?(?C%text%)(?!=d)ab|de) + + This applies only to assertion conditions (because they are themselves + independent groups). + + Callouts can be useful for tracking the progress of pattern matching. + The pcre2test program has a pattern qualifier (/auto_callout) that sets + automatic callouts. 
When any callouts are present, the output from + pcre2test indicates how the pattern is being matched. This is useful + information when you are trying to optimize the performance of a par- + ticular pattern. + + +MISSING CALLOUTS + + You should be aware that, because of optimizations in the way PCRE2 + compiles and matches patterns, callouts sometimes do not happen exactly + as you might expect. + + Auto-possessification + + At compile time, PCRE2 "auto-possessifies" repeated items when it knows + that what follows cannot be part of the repeat. For example, a+[bc] is + compiled as if it were a++[bc]. The pcre2test output when this pattern + is compiled with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied + to the string "aaaa" is: + + --->aaaa + +0 ^ a+ + +2 ^ ^ [bc] + No match + + This indicates that when matching [bc] fails, there is no backtracking + into a+ (because it is being treated as a++) and therefore the callouts + that would be taken for the backtracks do not occur. You can disable + the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to + pcre2_compile(), or starting the pattern with (*NO_AUTO_POSSESS). In + this case, the output changes to this: + + --->aaaa + +0 ^ a+ + +2 ^ ^ [bc] + +2 ^ ^ [bc] + +2 ^ ^ [bc] + +2 ^^ [bc] + No match + + This time, when matching [bc] fails, the matcher backtracks into a+ and + tries again, repeatedly, until a+ itself fails. + + Automatic .* anchoring + + By default, an optimization is applied when .* is the first significant + item in a pattern. If PCRE2_DOTALL is set, so that the dot can match + any character, the pattern is automatically anchored. If PCRE2_DOTALL + is not set, a match can start only after an internal newline or at the + beginning of the subject, and pcre2_compile() remembers this. If a pat- + tern has more than one top-level branch, automatic anchoring occurs if + all branches are anchorable. 
+ + This optimization is disabled, however, if .* is in an atomic group or + if there is a backreference to the capture group in which it appears. + It is also disabled if the pattern contains (*PRUNE) or (*SKIP). How- + ever, the presence of callouts does not affect it. + + For example, if the pattern .*\d is compiled with PCRE2_AUTO_CALLOUT + and applied to the string "aa", the pcre2test output is: + + --->aa + +0 ^ .* + +2 ^ ^ \d + +2 ^^ \d + +2 ^ \d + No match + + This shows that all match attempts start at the beginning of the sub- + ject. In other words, the pattern is anchored. You can disable this op- + timization by passing PCRE2_NO_DOTSTAR_ANCHOR to pcre2_compile(), or + starting the pattern with (*NO_DOTSTAR_ANCHOR). In this case, the out- + put changes to: + + --->aa + +0 ^ .* + +2 ^ ^ \d + +2 ^^ \d + +2 ^ \d + +0 ^ .* + +2 ^^ \d + +2 ^ \d + No match + + This shows more match attempts, starting at the second subject charac- + ter. Another optimization, described in the next section, means that + there is no subsequent attempt to match with an empty subject. + + Other optimizations + + Other optimizations that provide fast "no match" results also affect + callouts. For example, if the pattern is + + ab(?C4)cd + + PCRE2 knows that any matching string must contain the letter "d". If + the subject string is "abyz", the lack of "d" means that matching + doesn't ever start, and the callout is never reached. However, with + "abyd", though the result is still no match, the callout is obeyed. + + For most patterns PCRE2 also knows the minimum length of a matching + string, and will immediately give a "no match" return without actually + running a match if the subject is not long enough, or, for unanchored + patterns, if it has been scanned far enough. + + You can disable these optimizations by passing the PCRE2_NO_START_OPTI- + MIZE option to pcre2_compile(), or by starting the pattern with + (*NO_START_OPT). 
This slows down the matching process, but does ensure + that callouts such as the example above are obeyed. + + +THE CALLOUT INTERFACE + + During matching, when PCRE2 reaches a callout point, if an external + function is provided in the match context, it is called. This applies + to normal, DFA, and JIT matching. The first argument to the call- + out function is a pointer to a pcre2_callout block. The second argument + is the void * callout data that was supplied when the callout was set + up by calling pcre2_set_callout() (see the pcre2api documentation). The + callout block structure contains the following fields, not necessarily + in this order: + + uint32_t version; + uint32_t callout_number; + uint32_t capture_top; + uint32_t capture_last; + uint32_t callout_flags; + PCRE2_SIZE *offset_vector; + PCRE2_SPTR mark; + PCRE2_SPTR subject; + PCRE2_SIZE subject_length; + PCRE2_SIZE start_match; + PCRE2_SIZE current_position; + PCRE2_SIZE pattern_position; + PCRE2_SIZE next_item_length; + PCRE2_SIZE callout_string_offset; + PCRE2_SIZE callout_string_length; + PCRE2_SPTR callout_string; + + The version field contains the version number of the block format. The + current version is 2; the three callout string fields were added for + version 1, and the callout_flags field for version 2. If you are writ- + ing an application that might use an earlier release of PCRE2, you + should check the version number before accessing any of these fields. + The version number will increase in future if more fields are added, + but the intention is never to remove any of the existing fields. + + Fields for numerical callouts + + For a numerical callout, callout_string is NULL, and callout_number + contains the number of the callout, in the range 0-255. This is the + number that follows (?C for callouts that are part of the pattern; it is + 255 for automatically generated callouts.
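The version check and the numerical/string distinction described above can be sketched as follows. This is a minimal illustration only: demo_callout_block is a hypothetical stand-in that copies three field names from the list above so that the example compiles without pcre2.h, and demo_classify() is not part of the PCRE2 API. Because an explicit (?C255) is legal, treating number 255 as "automatic" is a heuristic, not something PCRE2 guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for part of pcre2_callout_block. The field names
   follow the documented list; a real program gets the type from pcre2.h. */
typedef struct {
    uint32_t version;
    uint32_t callout_number;
    const unsigned char *callout_string; /* PCRE2_SPTR in the 8-bit library */
} demo_callout_block;

typedef enum {
    DEMO_CALLOUT_AUTO,     /* number 255: probably an automatic callout */
    DEMO_CALLOUT_NUMBERED, /* explicit (?Cn) callout */
    DEMO_CALLOUT_STRING    /* callout with a string argument */
} demo_callout_kind;

/* Classify a callout, touching the string field only when the block
   version says that field exists (version 1 or later). */
demo_callout_kind demo_classify(const demo_callout_block *cb)
{
    assert(cb != NULL);
    if (cb->version >= 1 && cb->callout_string != NULL)
        return DEMO_CALLOUT_STRING;
    return cb->callout_number == 255 ? DEMO_CALLOUT_AUTO
                                     : DEMO_CALLOUT_NUMBERED;
}
```

In a real callout function the same test would be applied to the pcre2_callout_block pointer received as the first argument.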
+ + Fields for string callouts + + For callouts with string arguments, callout_number is always zero, and + callout_string points to the string that is contained within the com- + piled pattern. Its length is given by callout_string_length. Duplicated + ending delimiters that were present in the original pattern string have + been turned into single characters, but there is no other processing of + the callout string argument. An additional code unit containing binary + zero is present after the string, but is not included in the length. + The delimiter that was used to start the string is also stored within + the pattern, immediately before the string itself. You can access this + delimiter as callout_string[-1] if you need it. + + The callout_string_offset field is the code unit offset to the start of + the callout argument string within the original pattern string. This is + provided for the benefit of applications such as script languages that + might need to report errors in the callout string within the pattern. + + Fields for all callouts + + The remaining fields in the callout block are the same for both kinds + of callout. + + The offset_vector field is a pointer to a vector of capturing offsets + (the "ovector"). You may read the elements in this vector, but you must + not change any of them. + + For calls to pcre2_match(), the offset_vector field is not (since re- + lease 10.30) a pointer to the actual ovector that was passed to the + matching function in the match data block. Instead it points to an in- + ternal ovector of a size large enough to hold all possible captured + substrings in the pattern. Note that whenever a recursion or subroutine + call within a pattern completes, the capturing state is reset to what + it was before. + + The capture_last field contains the number of the most recently cap- + tured substring, and the capture_top field contains one more than the + number of the highest numbered captured substring so far. 
If no sub- + strings have yet been captured, the value of capture_last is 0 and the + value of capture_top is 1. The values of these fields do not always + differ by one; for example, when the callout in the pattern + ((a)(b))(?C2) is taken, capture_last is 1 but capture_top is 4. + + The contents of ovector[2] to ovector[capture_top*2-1] can be in- + spected in order to extract substrings that have been matched so far, + in the same way as extracting substrings after a match has completed. + The values in ovector[0] and ovector[1] are always PCRE2_UNSET because + the match is by definition not complete. Substrings that have not been + captured but whose numbers are less than capture_top also have both of + their ovector slots set to PCRE2_UNSET. + + For DFA matching, the offset_vector field points to the ovector that + was passed to the matching function in the match data block for call- + outs at the top level, but to an internal ovector during the processing + of pattern recursions, lookarounds, and atomic groups. However, these + ovectors hold no useful information because pcre2_dfa_match() does not + support substring capturing. The value of capture_top is always 1 and + the value of capture_last is always 0 for DFA matching. + + The subject and subject_length fields contain copies of the values that + were passed to the matching function. + + The start_match field normally contains the offset within the subject + at which the current match attempt started. However, if the escape se- + quence \K has been encountered, this value is changed to reflect the + modified starting point. If the pattern is not anchored, the callout + function may be called several times from the same point in the pattern + for different starting points in the subject. + + The current_position field contains the offset within the subject of + the current match pointer. + + The pattern_position field contains the offset in the pattern string to + the next item to be matched.
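The ovector inspection described above (pairs for groups 1 to capture_top-1, with unset groups marked in both slots) might be walked as in the sketch below. It is an illustration under stated assumptions: DEMO_UNSET stands in for PCRE2_UNSET, which is taken here to be the all-ones value of the size type, and demo_count_set_captures() is a hypothetical helper, not a PCRE2 function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PCRE2_UNSET; assumed to be the all-ones size value. */
#define DEMO_UNSET (~(size_t)0)

/* Count the capture groups 1..capture_top-1 whose ovector pair is set.
   Slots 0 and 1 are skipped because the overall match is not complete
   while a callout is running. */
uint32_t demo_count_set_captures(const size_t *ovector, uint32_t capture_top)
{
    assert(ovector != NULL && capture_top >= 1);
    uint32_t count = 0;
    for (uint32_t n = 1; n < capture_top; n++) {
        size_t start = ovector[2 * n];     /* offset of first code unit */
        size_t end = ovector[2 * n + 1];   /* offset just past the last */
        if (start != DEMO_UNSET && end != DEMO_UNSET)
            count++;
    }
    return count;
}
```

In a real callout the two arguments would come from the offset_vector and capture_top fields of the callout block; the substring for group n is then the range from subject + start to subject + end.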
+ + The next_item_length field contains the length of the next item to be + processed in the pattern string. When the callout is at the end of the + pattern, the length is zero. When the callout precedes an opening + parenthesis, the length includes meta characters that follow the paren- + thesis. For example, in a callout before an assertion such as (?=ab) + the length is 3. For an alternation bar or a closing parenthesis, + the length is one, unless a closing parenthesis is followed by a quan- + tifier, in which case its length is included. (This changed in release + 10.23. In earlier releases, before an opening parenthesis the length + was that of the entire group, and before an alternation bar or a clos- + ing parenthesis the length was zero.) + + The pattern_position and next_item_length fields are intended to help + in distinguishing between different automatic callouts, which all have + the same callout number. However, they are set for all callouts, and + are used by pcre2test to show the next item to be matched when display- + ing callout information. + + In callouts from pcre2_match() the mark field contains a pointer to the + zero-terminated name of the most recently passed (*MARK), (*PRUNE), or + (*THEN) item in the match, or NULL if no such items have been passed. + Instances of (*PRUNE) or (*THEN) without a name do not obliterate a + previous (*MARK). In callouts from the DFA matching function this field + always contains NULL. + + The callout_flags field is always zero in callouts from + pcre2_dfa_match() or when JIT is being used. When pcre2_match() without + JIT is used, the following bits may be set: + + PCRE2_CALLOUT_STARTMATCH + + This is set for the first callout after the start of matching for each + new starting position in the subject. + + PCRE2_CALLOUT_BACKTRACK + + This is set if there has been a matching backtrack since the previous + callout, or since the start of matching if this is the first callout + from a pcre2_match() run.
+ + Both bits are set when a backtrack has caused a "bumpalong" to a new + starting position in the subject. Output from pcre2test does not indi- + cate the presence of these bits unless the callout_extra modifier is + set. + + The information in the callout_flags field is provided so that applica- + tions can track and tell their users how matching with backtracking is + done. This can be useful when trying to optimize patterns, or just to + understand how PCRE2 works. There is no support in pcre2_dfa_match() + because there is no backtracking in DFA matching, and there is no sup- + port in JIT because JIT is all about maximizing matching performance. + In both these cases the callout_flags field is always zero. + + +RETURN VALUES FROM CALLOUTS + + The external callout function returns an integer to PCRE2. If the value + is zero, matching proceeds as normal. If the value is greater than + zero, matching fails at the current point, but the testing of other + matching possibilities goes ahead, just as if a lookahead assertion had + failed. If the value is less than zero, the match is abandoned, and the + matching function returns the negative value. + + Negative values should normally be chosen from the set of PCRE2_ER- + ROR_xxx values. In particular, PCRE2_ERROR_NOMATCH forces a standard + "no match" failure. The error number PCRE2_ERROR_CALLOUT is reserved + for use by callout functions; it will never be used by PCRE2 itself. + + +CALLOUT ENUMERATION + + int pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), + void *user_data); + + A script language that supports the use of string arguments in callouts + might like to scan all the callouts in a pattern before running the + match. This can be done by calling pcre2_callout_enumerate(). The first + argument is a pointer to a compiled pattern, the second points to a + callback function, and the third is arbitrary user data.
The callback + function is called for every callout in the pattern in the order in + which they appear. Its first argument is a pointer to a callout enumer- + ation block, and its second argument is the user_data value that was + passed to pcre2_callout_enumerate(). The data block contains the fol- + lowing fields: + + version Block version number + pattern_position Offset to next item in pattern + next_item_length Length of next item in pattern + callout_number Number for numbered callouts + callout_string_offset Offset to string within pattern + callout_string_length Length of callout string + callout_string Points to callout string or is NULL + + The version number is currently 0. It will increase if new fields are + ever added to the block. The remaining fields are the same as their + namesakes in the pcre2_callout block that is used for callouts during + matching, as described above. + + Note that the value of pattern_position is unique for each callout. + However, if a callout occurs inside a group that is quantified with a + non-zero minimum or a fixed maximum, the group is replicated inside the + compiled pattern. For example, a pattern such as /(a){2}/ is compiled + as if it were /(a)(a)/. This means that the callout will be enumerated + more than once, but with the same value for pattern_position in each + case. + + The callback function should normally return zero. If it returns a non- + zero value, scanning the pattern stops, and that value is returned from + pcre2_callout_enumerate(). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 03 February 2019 + Copyright (c) 1997-2019 University of Cambridge. 
+------------------------------------------------------------------------------ + + +PCRE2COMPAT(3) Library Functions Manual PCRE2COMPAT(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +DIFFERENCES BETWEEN PCRE2 AND PERL + + This document describes some of the differences in the ways that PCRE2 + and Perl handle regular expressions. The differences described here are + with respect to Perl version 5.32.0, but as both Perl and PCRE2 are + continually changing, the information may at times be out of date. + + 1. PCRE2 has only a subset of Perl's Unicode support. Details of what + it does have are given in the pcre2unicode page. + + 2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized asser- + tions, but they do not mean what you might think. For example, (?!a){3} + does not assert that the next three characters are not "a". It just as- + serts that the next character is not "a" three times (in principle; + PCRE2 optimizes this to run the assertion just once). Perl allows some + repeat quantifiers on other assertions, for example, \b* (but not + \b{3}, though oddly it does allow ^{3}), but these do not seem to have + any use. PCRE2 does not allow any kind of quantifier on non-lookaround + assertions. + + 3. Capture groups that occur inside negative lookaround assertions are + counted, but their entries in the offsets vector are set only when a + negative assertion is a condition that has a matching branch (that is, + the condition is false). Perl may set such capture groups in other + circumstances. + + 4. The following Perl escape sequences are not supported: \F, \l, \L, + \u, \U, and \N when followed by a character name. \N on its own, match- + ing a non-newline character, and \N{U+dd..}, matching a Unicode code + point, are supported. The escapes that modify the case of following + letters are implemented by Perl's general string-handling and are not + part of its pattern matching engine. 
If any of these are encountered by + PCRE2, an error is generated by default. However, if either of the + PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \U and \u are + interpreted as ECMAScript interprets them. + + 5. The Perl escape sequences \p, \P, and \X are supported only if PCRE2 + is built with Unicode support (the default). The properties that can be + tested with \p and \P are limited to the general category properties + such as Lu and Nd, script names such as Greek or Han, and the derived + properties Any and L&. Both PCRE2 and Perl support the Cs (surrogate) + property, but in PCRE2 its use is limited. See the pcre2pattern docu- + mentation for details. The long synonyms for property names that Perl + supports (such as \p{Letter}) are not supported by PCRE2, nor is it + permitted to prefix any of these properties with "Is". + + 6. PCRE2 supports the \Q...\E escape for quoting substrings. Characters + in between are treated as literals. However, this is slightly different + from Perl in that $ and @ are also handled as literals inside the + quotes. In Perl, they cause variable interpolation (but of course PCRE2 + does not have variables). Also, Perl does "double-quotish backslash in- + terpolation" on any backslashes between \Q and \E which, its documenta- + tion says, "may lead to confusing results". PCRE2 treats a backslash + between \Q and \E just like any other character. Note the following ex- + amples: + + Pattern PCRE2 matches Perl matches + + \Qabc$xyz\E abc$xyz abc followed by the + contents of $xyz + \Qabc\$xyz\E abc\$xyz abc\$xyz + \Qabc\E\$\Qxyz\E abc$xyz abc$xyz + \QA\B\E A\B A\B + \Q\\E \ \\E + + The \Q...\E sequence is recognized both inside and outside character + classes by both PCRE2 and Perl. + + 7. Fairly obviously, PCRE2 does not support the (?{code}) and + (??{code}) constructions. However, PCRE2 does have a "callout" feature, + which allows an external function to be called during pattern matching. 
+ See the pcre2callout documentation for details. + + 8. Subroutine calls (whether recursive or not) were treated as atomic + groups up to PCRE2 release 10.23, but from release 10.30 this changed, + and backtracking into subroutine calls is now supported, as in Perl. + + 9. In PCRE2, if any of the backtracking control verbs are used in a + group that is called as a subroutine (whether or not recursively), + their effect is confined to that group; it does not extend to the sur- + rounding pattern. This is not always the case in Perl. In particular, + if (*THEN) is present in a group that is called as a subroutine, its + action is limited to that group, even if the group does not contain any + | characters. Note that such groups are processed as anchored at the + point where they are tested. + + 10. If a pattern contains more than one backtracking control verb, the + first one that is backtracked onto acts. For example, in the pattern + A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure + in C triggers (*PRUNE). Perl's behaviour is more complex; in many cases + it is the same as PCRE2, but there are cases where it differs. + + 11. There are some differences that are concerned with the settings of + captured strings when part of a pattern is repeated. For example, + matching "aba" against the pattern /^(a(b)?)+$/ in Perl leaves $2 un- + set, but in PCRE2 it is set to "b". + + 12. PCRE2's handling of duplicate capture group numbers and names is + not as general as Perl's. This is a consequence of the fact that PCRE2 + works internally just with numbers, using an external table to trans- + late between numbers and names. In particular, a pattern such as + (?|(?<A>A)|(?<B>B)), where the two capture groups have the same number + but different names, is not supported, and causes an error at compile + time. If it were allowed, it would not be possible to distinguish which + group matched, because both names map to capture group number 1.
To + avoid this confusing situation, an error is given at compile time. + + 13. Perl used to recognize comments in some places that PCRE2 does not, + for example, between the ( and ? at the start of a group. If the /x + modifier is set, Perl allowed white space between ( and ? though the + latest Perls give an error (for a while it was just deprecated). There + may still be some cases where Perl behaves differently. + + 14. Perl, when in warning mode, gives warnings for character classes + such as [A-\d] or [a-[:digit:]]. It then treats the hyphens as liter- + als. PCRE2 has no warning features, so it gives an error in these cases + because they are almost certainly user mistakes. + + 15. In PCRE2, the upper/lower case character properties Lu and Ll are + not affected when case-independent matching is specified. For example, + \p{Lu} always matches an upper case letter. I think Perl has changed in + this respect; in the release at the time of writing (5.32), \p{Lu} and + \p{Ll} match all letters, regardless of case, when case independence is + specified. + + 16. From release 5.32.0, Perl locks out the use of \K in lookaround as- + sertions. In PCRE2, \K is acted on when it occurs in positive asser- + tions, but is ignored in negative assertions. + + 17. PCRE2 provides some extensions to the Perl regular expression fa- + cilities. Perl 5.10 included new features that were not in earlier + versions of Perl, some of which (such as named parentheses) were in + PCRE2 for some time before. This list is with respect to Perl 5.32: + + (a) Although lookbehind assertions in PCRE2 must match fixed length + strings, each alternative toplevel branch of a lookbehind assertion can + match a different length of string. Perl requires them all to have the + same length. + + (b) From PCRE2 10.23, backreferences to groups of fixed length are sup- + ported in lookbehinds, provided that there is no possibility of refer- + encing a non-unique number or name. 
Perl does not support backrefer- + ences in lookbehinds. + + (c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the + $ meta-character matches only at the very end of the string. + + (d) A backslash followed by a letter with no special meaning is + faulted. (Perl can be made to issue a warning.) + + (e) If PCRE2_UNGREEDY is set, the greediness of the repetition quanti- + fiers is inverted, that is, by default they are not greedy, but if fol- + lowed by a question mark they are. + + (f) PCRE2_ANCHORED can be used at matching time to force a pattern to + be tried only at the first matching position in the subject string. + + (g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and + PCRE2_NOTEMPTY_ATSTART options have no Perl equivalents. + + (h) The \R escape sequence can be restricted to match only CR, LF, or + CRLF by the PCRE2_BSR_ANYCRLF option. + + (i) The callout facility is PCRE2-specific. Perl supports codeblocks + and variable interpolation, but not general hooks on every match. + + (j) The partial matching facility is PCRE2-specific. + + (k) The alternative matching function (pcre2_dfa_match()) matches in a + different way and is not Perl-compatible. + + (l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) + at the start of a pattern. These set overall options that cannot be + changed within the pattern. + + (m) PCRE2 supports non-atomic positive lookaround assertions. This is + an extension to the lookaround facilities. The default, Perl-compatible + lookarounds are atomic. + + 18. The Perl /a modifier restricts \d numbers to pure ascii, and the + /aa modifier restricts /i case-insensitive matching to pure ascii, ig- + noring Unicode rules. This separation cannot be represented with + PCRE2_UCP. + + 19. Perl has different limits than PCRE2. See the pcre2limit documenta- + tion for details.
In release 5.10, Perl switched from recursion to iteration, keeping + the intermediate matches on the heap, which is ~10% slower but avoids + any stack-overflow limit. PCRE2 made a similar change at + release 10.30, and also has many build-time and run-time customizable + limits. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 06 October 2020 + Copyright (c) 1997-2019 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2JIT(3) Library Functions Manual PCRE2JIT(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +PCRE2 JUST-IN-TIME COMPILER SUPPORT + + Just-in-time compiling is a heavyweight optimization that can greatly + speed up pattern matching. However, it comes at the cost of extra pro- + cessing before the match is performed, so it is of most benefit when + the same pattern is going to be matched many times. This does not nec- + essarily mean many calls of a matching function; if the pattern is not + anchored, matching attempts may take place many times at various posi- + tions in the subject, even for a single call. Therefore, if the subject + string is very long, it may still pay to use JIT even for one-off + matches. JIT support is available for all of the 8-bit, 16-bit and + 32-bit PCRE2 libraries. + + JIT support applies only to the traditional Perl-compatible matching + function. It does not apply when the DFA matching function is being + used. The code for this support was written by Zoltan Herczeg. + + +AVAILABILITY OF JIT SUPPORT + + JIT support is an optional feature of PCRE2. The "configure" option + --enable-jit (or equivalent CMake option) must be set when PCRE2 is + built if you want to use JIT.
The support is limited to the following + hardware platforms: + + ARM 32-bit (v5, v7, and Thumb2) + ARM 64-bit + Intel x86 32-bit and 64-bit + MIPS 32-bit and 64-bit + Power PC 32-bit and 64-bit + SPARC 32-bit + + If --enable-jit is set on an unsupported platform, compilation fails. + + A program can tell if JIT support is available by calling pcre2_con- + fig() with the PCRE2_CONFIG_JIT option. The result is 1 when JIT is + available, and 0 otherwise. However, a simple program does not need to + check this in order to use JIT. The API is implemented in a way that + falls back to the interpretive code if JIT is not available. For pro- + grams that need the best possible performance, there is also a "fast + path" API that is JIT-specific. + + +SIMPLE USE OF JIT + + To make use of the JIT support in the simplest way, all you have to do + is to call pcre2_jit_compile() after successfully compiling a pattern + with pcre2_compile(). This function has two arguments: the first is the + compiled pattern pointer that was returned by pcre2_compile(), and the + second is zero or more of the following option bits: PCRE2_JIT_COM- + PLETE, PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT. + + If JIT support is not available, a call to pcre2_jit_compile() does + nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled + pattern is passed to the JIT compiler, which turns it into machine code + that executes much faster than the normal interpretive code, but yields + exactly the same results. The returned value from pcre2_jit_compile() + is zero on success, or a negative error code. + + There is a limit to the size of pattern that JIT supports, imposed by + the size of machine stack that it uses. The exact rules are not docu- + mented because they may change at any time, in particular, when new op- + timizations are introduced. If a pattern is too big, a call to + pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY. 
+ + PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for com- + plete matches. If you want to run partial matches using the PCRE2_PAR- + TIAL_HARD or PCRE2_PARTIAL_SOFT options of pcre2_match(), you should + set one or both of the other options as well as, or instead of + PCRE2_JIT_COMPLETE. The JIT compiler generates different optimized code + for each of the three modes (normal, soft partial, hard partial). When + pcre2_match() is called, the appropriate code is run if it is avail- + able. Otherwise, the pattern is matched using interpretive code. + + You can call pcre2_jit_compile() multiple times for the same compiled + pattern. It does nothing if it has previously compiled code for any of + the option bits. For example, you can call it once with PCRE2_JIT_COM- + PLETE and (perhaps later, when you find you need partial matching) + again with PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it + will ignore PCRE2_JIT_COMPLETE and just compile code for partial match- + ing. If pcre2_jit_compile() is called with no option bits set, it imme- + diately returns zero. This is an alternative way of testing whether JIT + is available. + + At present, it is not possible to free JIT compiled code except when + the entire compiled pattern is freed by calling pcre2_code_free(). + + In some circumstances you may need to call additional functions. These + are described in the section entitled "Controlling the JIT stack" be- + low. + + There are some pcre2_match() options that are not supported by JIT, and + there are also some pattern items that JIT cannot handle. Details are + given below. In both cases, matching automatically falls back to the + interpretive code. If you want to know whether JIT was actually used + for a particular match, you should arrange for a JIT callback function + to be set up as described in the section entitled "Controlling the JIT + stack" below, even if you do not need to supply a non-default JIT + stack. 
Such a callback function is called whenever JIT code is about to + be obeyed. If the match-time options are not right for JIT execution, + the callback function is not obeyed. + + If the JIT compiler finds an unsupported item, no JIT data is gener- + ated. You can find out if JIT matching is available after compiling a + pattern by calling pcre2_pattern_info() with the PCRE2_INFO_JITSIZE op- + tion. A non-zero result means that JIT compilation was successful. A + result of 0 means that JIT support is not available, or the pattern was + not processed by pcre2_jit_compile(), or the JIT compiler was not able + to handle the pattern. + + +MATCHING SUBJECTS CONTAINING INVALID UTF + + When a pattern is compiled with the PCRE2_UTF option, subject strings + are normally expected to be a valid sequence of UTF code units. By de- + fault, this is checked at the start of matching and an error is gener- + ated if invalid UTF is detected. The PCRE2_NO_UTF_CHECK option can be + passed to pcre2_match() to skip the check (for improved performance) if + you are sure that a subject string is valid. If this option is used + with an invalid string, the result is undefined. + + However, a way of running matches on strings that may contain invalid + UTF sequences is available. Calling pcre2_compile() with the + PCRE2_MATCH_INVALID_UTF option has two effects: it tells the inter- + preter in pcre2_match() to support invalid UTF, and, if pcre2_jit_com- + pile() is called, the compiled JIT code also supports invalid UTF. De- + tails of how this support works, in both the JIT and the interpretive + cases, are given in the pcre2unicode documentation. + + There is also an obsolete option for pcre2_jit_compile() called + PCRE2_JIT_INVALID_UTF, which currently exists only for backward compat- + ibility. It is superseded by the pcre2_compile() option + PCRE2_MATCH_INVALID_UTF and should no longer be used. It may be removed + in future.
+ + +UNSUPPORTED OPTIONS AND PATTERN ITEMS + + The pcre2_match() options that are supported for JIT matching are + PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, + PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and + PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options + are not supported at match time. + + If the PCRE2_NO_JIT option is passed to pcre2_match() it disables the + use of JIT, forcing matching by the interpreter code. + + The only unsupported pattern items are \C (match a single data unit) + when running in a UTF mode, and a callout immediately before an asser- + tion condition in a conditional group. + + +RETURN VALUES FROM JIT MATCHING + + When a pattern is matched using JIT matching, the return values are the + same as those given by the interpretive pcre2_match() code, with the + addition of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means + that the memory used for the JIT stack was insufficient. See "Control- + ling the JIT stack" below for a discussion of JIT stack usage. + + The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if + searching a very large pattern tree goes on for too long, as it is in + the same circumstance when JIT is not used, but the details of exactly + what is counted are not the same. The PCRE2_ERROR_DEPTHLIMIT error code + is never returned when JIT matching is used. + + +CONTROLLING THE JIT STACK + + When the compiled JIT code runs, it needs a block of memory to use as a + stack. By default, it uses 32KiB on the machine stack. However, some + large or complicated patterns need more than this. The error PCRE2_ER- + ROR_JIT_STACKLIMIT is given when there is not enough stack. Three func- + tions are provided for managing blocks of memory for use as JIT stacks. + There is further discussion about the use of JIT stacks in the section + entitled "JIT stack FAQ" below. + + The pcre2_jit_stack_create() function creates a JIT stack. 
Its argu- + ments are a starting size, a maximum size, and a general context (for + memory allocation functions, or NULL for standard memory allocation). + It returns a pointer to an opaque structure of type pcre2_jit_stack, or + NULL if there is an error. The pcre2_jit_stack_free() function is used + to free a stack that is no longer needed. If its argument is NULL, this + function returns immediately, without doing anything. (For the techni- + cally minded: the address space is allocated by mmap or VirtualAlloc.) + A maximum stack size of 512KiB to 1MiB should be more than enough for + any pattern. + + The pcre2_jit_stack_assign() function specifies which stack JIT code + should use. Its arguments are as follows: + + pcre2_match_context *mcontext + pcre2_jit_callback callback + void *data + + The first argument is a pointer to a match context. When this is subse- + quently passed to a matching function, its information determines which + JIT stack is used. If this argument is NULL, the function returns imme- + diately, without doing anything. There are three cases for the values + of the other two options: + + (1) If callback is NULL and data is NULL, an internal 32KiB block + on the machine stack is used. This is the default when a match + context is created. + + (2) If callback is NULL and data is not NULL, data must be + a pointer to a valid JIT stack, the result of calling + pcre2_jit_stack_create(). + + (3) If callback is not NULL, it must point to a function that is + called with data as an argument at the start of matching, in + order to set up a JIT stack. If the return from the callback + function is NULL, the internal 32KiB stack is used; otherwise the + return value must be a valid JIT stack, the result of calling + pcre2_jit_stack_create(). + + A callback function is obeyed whenever JIT code is about to be run; it + is not obeyed when pcre2_match() is called with options that are incom- + patible for JIT matching. 
A callback function can therefore be used to + determine whether a match operation was executed by JIT or by the in- + terpreter. + + You may safely use the same JIT stack for more than one pattern (either + by assigning directly or by callback), as long as the patterns are + matched sequentially in the same thread. Currently, the only way to set + up non-sequential matches in one thread is to use callouts: if a call- + out function starts another match, that match must use a different JIT + stack to the one used for currently suspended match(es). + + In a multithreaded application, if you do not specify a JIT stack, or if + you assign or pass back NULL from a callback, that is thread-safe, be- + cause each thread has its own machine stack. However, if you assign or + pass back a non-NULL JIT stack, this must be a different stack for each + thread so that the application is thread-safe. + + Strictly speaking, even more is allowed. You can assign the same non- + NULL stack to a match context that is used by any number of patterns, + as long as they are not used for matching by multiple threads at the + same time. For example, you could use the same stack in all compiled + patterns, with a global mutex in the callback to wait until the stack + is available for use. However, this is an inefficient solution, and not + recommended. + + This is a suggestion for how a multithreaded program that needs to set + up non-default JIT stacks might operate: + + During thread initialization + thread_local_var = pcre2_jit_stack_create(...) + + During thread exit + pcre2_jit_stack_free(thread_local_var) + + Use a one-line callback function + return thread_local_var + + All the functions described in this section do nothing if JIT is not + available. + + +JIT STACK FAQ + + (1) Why do we need JIT stacks? + + PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack + where the local data of the current node is pushed before checking its + child nodes.
Allocating real machine stack on some platforms is diffi- + cult. For example, the stack chain needs to be updated every time if we + extend the stack on PowerPC. Although it is possible, its updating + time overhead decreases performance. So we do the recursion in memory. + + (2) Why don't we simply allocate blocks of memory with malloc()? + + Modern operating systems have a nice feature: they can reserve an ad- + dress space instead of allocating memory. We can safely allocate memory + pages inside this address space, so the stack could grow without moving + memory data (this is important because of pointers). Thus we can allo- + cate 1MiB address space, and use only a single memory page (usually + 4KiB) if that is enough. However, we can still grow up to 1MiB anytime + if needed. + + (3) Who "owns" a JIT stack? + + The owner of the stack is the user program, not the JIT studied pattern + or anything else. The user program must ensure that if a stack is being + used by pcre2_match(), (that is, it is assigned to a match context that + is passed to the pattern currently running), that stack must not be + used by any other threads (to avoid overwriting the same memory area). + The best practice for multithreaded programs is to allocate a stack for + each thread, and return this stack through the JIT callback function. + + (4) When should a JIT stack be freed? + + You can free a JIT stack at any time, as long as it will not be used by + pcre2_match() again. When you assign the stack to a match context, only + a pointer is set. There is no reference counting or any other magic. + You can free compiled patterns, contexts, and stacks in any order, any- + time. Just do not call pcre2_match() with a match context pointing to + an already freed stack, as that will cause SEGFAULT. (Also, do not free + a stack currently used by pcre2_match() in another thread). You can + also replace the stack in a context at any time when it is not in use. 
+ You should free the previous stack before assigning a replacement. + + (5) Should I allocate/free a stack every time before/after calling + pcre2_match()? + + No, because this is too costly in terms of resources. However, you + could implement a scheme that releases the stack if it has not been + used for, say, two minutes. The JIT callback can help to achieve + this without keeping a list of patterns. + + (6) OK, the stack is for long-term memory allocation. But what happens + if a pattern causes stack overflow with a stack of 1MiB? Is that 1MiB + kept until the stack is freed? + + Especially on embedded systems, it might be a good idea to release mem- + ory sometimes without freeing the stack. There is no API for this at + the moment. A function call that returns the currently allocated + memory for any stack, and another that allows releasing memory + (shrinking the stack), would probably be a good idea if someone needs this. + + (7) This is too much of a headache. Isn't there any better solution for + JIT stack handling? + + No, thanks to Windows. If POSIX threads were used everywhere, we could + throw out this complicated API. + + +FREEING JIT SPECULATIVE MEMORY + + void pcre2_jit_free_unused_memory(pcre2_general_context *gcontext); + + The JIT executable allocator does not free all memory when it is possi- + ble. It expects new allocations, and keeps some free memory around to + improve allocation speed. However, in low memory conditions, it might + be better to free all possible memory. You can cause this to happen by + calling pcre2_jit_free_unused_memory(). Its argument is a general con- + text, for custom memory management, or NULL for standard memory manage- + ment. + + +EXAMPLE CODE + + This is a single-threaded example that specifies a JIT stack without + using a callback. A real program should include error checking after + all the function calls.
+ + int rc; + pcre2_code *re; + pcre2_match_data *match_data; + pcre2_match_context *mcontext; + pcre2_jit_stack *jit_stack; + + re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, + &errornumber, &erroffset, NULL); + rc = pcre2_jit_compile(re, PCRE2_JIT_COMPLETE); + mcontext = pcre2_match_context_create(NULL); + jit_stack = pcre2_jit_stack_create(32*1024, 512*1024, NULL); + pcre2_jit_stack_assign(mcontext, NULL, jit_stack); + match_data = pcre2_match_data_create(re, 10); + rc = pcre2_match(re, subject, length, 0, 0, match_data, mcontext); + /* Process result */ + + pcre2_code_free(re); + pcre2_match_data_free(match_data); + pcre2_match_context_free(mcontext); + pcre2_jit_stack_free(jit_stack); + + +JIT FAST PATH API + + Because the API described above falls back to interpreted matching when + JIT is not available, it is convenient for programs that are written + for general use in many environments. However, calling JIT via + pcre2_match() does have a performance impact. Programs that are written + for use where JIT is known to be available, and which need the best + possible performance, can instead use a "fast path" API to call JIT + matching directly instead of calling pcre2_match() (obviously only for + patterns that have been successfully processed by pcre2_jit_compile()). + + The fast path function is called pcre2_jit_match(), and it takes ex- + actly the same arguments as pcre2_match(). However, the subject string + must be specified with a length; PCRE2_ZERO_TERMINATED is not sup- + ported. Unsupported option bits (for example, PCRE2_ANCHORED, PCRE2_EN- + DANCHORED and PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the + PCRE2_NO_JIT option. The return values are also the same as for + pcre2_match(), plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (par- + tial or complete) is requested that was not compiled. + + When you call pcre2_match(), as well as testing for invalid options, a + number of other sanity checks are performed on the arguments. 
For exam- + ple, if the subject pointer is NULL, an immediate error is given. Also, + unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for + validity. In the interests of speed, these checks do not happen on the + JIT fast path, and if invalid data is passed, the result is undefined. + + Bypassing the sanity checks and the pcre2_match() wrapping can give + speedups of more than 10%. + + +SEE ALSO + + pcre2api(3) + + +AUTHOR + + Philip Hazel (FAQ by Zoltan Herczeg) + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 23 May 2019 + Copyright (c) 1997-2019 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2LIMITS(3) Library Functions Manual PCRE2LIMITS(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +SIZE AND OTHER LIMITATIONS + + There are some size limitations in PCRE2 but it is hoped that they will + never in practice be relevant. + + The maximum size of a compiled pattern is approximately 64 thousand + code units for the 8-bit and 16-bit libraries if PCRE2 is compiled with + the default internal linkage size, which is 2 bytes for these li- + braries. If you want to process regular expressions that are truly + enormous, you can compile PCRE2 with an internal linkage size of 3 or 4 + (when building the 16-bit library, 3 is rounded up to 4). See the + README file in the source distribution and the pcre2build documentation + for details. In these cases the limit is substantially larger. How- + ever, the speed of execution is slower. In the 32-bit library, the in- + ternal linkage size is always 4. + + The maximum length of a source pattern string is essentially unlimited; + it is the largest number a PCRE2_SIZE variable can hold. However, the + program that calls pcre2_compile() can specify a smaller limit. 
+ + The maximum length (in code units) of a subject string is one less than + the largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an un- + signed integer type, usually defined as size_t. Its maximum value (that + is ~(PCRE2_SIZE)0) is reserved as a special indicator for zero-termi- + nated strings and unset offsets. + + All values in repeating quantifiers must be less than 65536. + + The maximum length of a lookbehind assertion is 65535 characters. + + There is no limit to the number of parenthesized groups, but there can + be no more than 65535 capture groups, and there is a limit to the depth + of nesting of parenthesized subpatterns of all kinds. This is imposed + in order to limit the amount of system stack used at compile time. The + default limit can be specified when PCRE2 is built; if not, the default + is set to 250. An application can change this limit by calling + pcre2_set_parens_nest_limit() to set the limit in a compile context. + + The maximum length of name for a named capture group is 32 code units, + and the maximum number of such groups is 10000. + + The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or + (*THEN) verb is 255 code units for the 8-bit library and 65535 code + units for the 16-bit and 32-bit libraries. + + The maximum length of a string argument to a callout is the largest + number a 32-bit unsigned integer can hold. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 02 February 2019 + Copyright (c) 1997-2019 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2MATCHING(3) Library Functions Manual PCRE2MATCHING(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +PCRE2 MATCHING ALGORITHMS + + This document describes the two different algorithms that are available + in PCRE2 for matching a compiled regular expression against a given + subject string. 
The "standard" algorithm is the one provided by the + pcre2_match() function. This works in the same way as Perl's matching + function, and provides a Perl-compatible matching operation. The just- + in-time (JIT) optimization that is described in the pcre2jit documenta- + tion is compatible with this function. + + An alternative algorithm is provided by the pcre2_dfa_match() function; + it operates in a different way, and is not Perl-compatible. This alter- + native has advantages and disadvantages compared with the standard al- + gorithm, and these are described below. + + When there is only one possible way in which a given subject string can + match a pattern, the two algorithms give the same answer. A difference + arises, however, when there are multiple possibilities. For example, if + the pattern + + ^<.*> + + is matched against the string + + <something> <something else> <something further> + + there are three possible answers. The standard algorithm finds only one + of them, whereas the alternative algorithm finds all three. + + +REGULAR EXPRESSIONS AS TREES + + The set of strings that are matched by a regular expression can be rep- + resented as a tree structure. An unlimited repetition in the pattern + makes the tree of infinite size, but it is still a tree. Matching the + pattern to a given subject string (from a given starting point) can be + thought of as a search of the tree. There are two ways to search a + tree: depth-first and breadth-first, and these correspond to the two + matching algorithms provided by PCRE2. + + +THE STANDARD MATCHING ALGORITHM + + In the terminology of Jeffrey Friedl's book "Mastering Regular Expres- + sions", the standard algorithm is an "NFA algorithm". It conducts a + depth-first search of the pattern tree. That is, it proceeds along a + single path through the tree, checking that the subject matches what is + required.
When there is a mismatch, the algorithm tries any alterna- + tives at the current point, and if they all fail, it backs up to the + previous branch point in the tree, and tries the next alternative + branch at that level. This often involves backing up (moving to the + left) in the subject string as well. The order in which repetition + branches are tried is controlled by the greedy or ungreedy nature of + the quantifier. + + If a leaf node is reached, a matching string has been found, and at + that point the algorithm stops. Thus, if there is more than one possi- + ble match, this algorithm returns the first one that it finds. Whether + this is the shortest, the longest, or some intermediate length depends + on the way the greedy and ungreedy repetition quantifiers are specified + in the pattern. + + Because it ends up with a single path through the tree, it is rela- + tively straightforward for this algorithm to keep track of the sub- + strings that are matched by portions of the pattern in parentheses. + This provides support for capturing parentheses and backreferences. + + +THE ALTERNATIVE MATCHING ALGORITHM + + This algorithm conducts a breadth-first search of the tree. Starting + from the first matching point in the subject, it scans the subject + string from left to right, once, character by character, and as it does + this, it remembers all the paths through the tree that represent valid + matches. In Friedl's terminology, this is a kind of "DFA algorithm", + though it is not implemented as a traditional finite state machine (it + keeps multiple states active simultaneously). + + Although the general principle of this matching algorithm is that it + scans the subject string only once, without backtracking, there is one + exception: when a lookaround assertion is encountered, the characters + following or preceding the current point have to be independently in- + spected. 
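The single left-to-right scan with several states active at once, as described above, can be sketched with a small hand-built state machine. This is a toy illustration only, not PCRE2 code: the function name toy_all_matches, the three-state encoding, and the hard-wired pattern ^<.*> are assumptions made for this sketch, and it ignores lookarounds and everything else a real matcher must handle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy breadth-first matcher for the anchored pattern ^<.*>
   (hypothetical helper, not part of PCRE2).
   States: 0 = start, 1 = inside "<...", 2 = just consumed '>' (accepting).
   Bit i of `active` is set while state i is alive. The subject is scanned
   once; the length of every match found is stored in lengths[] in the
   order of discovery (shortest first). Returns the number stored. */
size_t toy_all_matches(const char *subject, size_t *lengths, size_t max)
{
    assert(subject != NULL && lengths != NULL);

    unsigned active = 1u << 0;          /* only the start state at first */
    size_t count = 0;
    size_t n = strlen(subject);

    for (size_t i = 0; i < n && active != 0; i++) {
        char c = subject[i];
        unsigned next = 0;

        if ((active & 1u) && c == '<')  next |= 1u << 1;  /* 0 -'<'-> 1 */
        if ((active & 2u) && c != '\n') next |= 1u << 1;  /* 1 - . -> 1 */
        if ((active & 2u) && c == '>')  next |= 1u << 2;  /* 1 -'>'-> 2 */

        active = next;                  /* state 2 has no way out, so it
                                           drops away on the next step */
        if ((active & 4u) && count < max)
            lengths[count++] = i + 1;   /* a path terminated here */
    }
    return count;
}
```

For the subject "<something> <something else> <something further>" the sketch records three lengths: 11, 28, and 48. It stores them in the order they are found, so a caller that wants the longest first has to read the array backwards.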
+ + The scan continues until either the end of the subject is reached, or + there are no more unterminated paths. At this point, terminated paths + represent the different matching possibilities (if there are none, the + match has failed). Thus, if there is more than one possible match, + this algorithm finds all of them, and in particular, it finds the long- + est. The matches are returned in decreasing order of length. There is + an option to stop the algorithm after the first match (which is neces- + sarily the shortest) is found. + + Note that all the matches that are found start at the same point in the + subject. If the pattern + + cat(er(pillar)?)? + + is matched against the string "the caterpillar catchment", the result + is the three strings "caterpillar", "cater", and "cat" that start at + the fifth character of the subject. The algorithm does not automati- + cally move on to find matches that start at later positions. + + PCRE2's "auto-possessification" optimization usually applies to charac- + ter repeats at the end of a pattern (as well as internally). For exam- + ple, the pattern "a\d+" is compiled as if it were "a\d++" because there + is no point even considering the possibility of backtracking into the + repeated digits. For DFA matching, this means that only one possible + match is found. If you really do want multiple matches in such cases, + either use an ungreedy repeat ("a\d+?") or set the PCRE2_NO_AUTO_POS- + SESS option when compiling. + + There are a number of features of PCRE2 regular expressions that are + not supported or behave differently in the alternative matching func- + tion. Those that are not supported cause an error if encountered. + + 1. Because the algorithm finds all possible matches, the greedy or un- + greedy nature of repetition quantifiers is not relevant (though it may + affect auto-possessification, as just described). During matching, + greedy and ungreedy quantifiers are treated in exactly the same way. 
+ However, possessive quantifiers can make a difference when what follows + could also match what is quantified, for example in a pattern like + this: + + ^a++\w! + + This pattern matches "aaab!" but not "aaa!", which would be matched by + a non-possessive quantifier. Similarly, if an atomic group is present, + it is matched as if it were a standalone pattern at the current point, + and the longest match is then "locked in" for the rest of the overall + pattern. + + 2. When dealing with multiple paths through the tree simultaneously, it + is not straightforward to keep track of captured substrings for the + different matching possibilities, and PCRE2's implementation of this + algorithm does not attempt to do this. This means that no captured sub- + strings are available. + + 3. Because no substrings are captured, backreferences within the pat- + tern are not supported. + + 4. For the same reason, conditional expressions that use a backrefer- + ence as the condition or test for a specific group recursion are not + supported. + + 5. Again for the same reason, script runs are not supported. + + 6. Because many paths through the tree may be active, the \K escape se- + quence, which resets the start of the match when encountered (but may + be on some paths and not on others), is not supported. + + 7. Callouts are supported, but the value of the capture_top field is + always 1, and the value of the capture_last field is always 0. + + 8. The \C escape sequence, which (in the standard algorithm) always + matches a single code unit, even in a UTF mode, is not supported in + these modes, because the alternative algorithm moves through the sub- + ject string one character (not code unit) at a time, for all active + paths through the tree. + + 9. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) + are not supported. (*FAIL) is supported, and behaves like a failing + negative assertion. + + 10. 
The PCRE2_MATCH_INVALID_UTF option for pcre2_compile() is not sup- + ported by pcre2_dfa_match(). + + +ADVANTAGES OF THE ALTERNATIVE ALGORITHM + + Using the alternative matching algorithm provides the following advan- + tages: + + 1. All possible matches (at a single point in the subject) are automat- + ically found, and in particular, the longest match is found. To find + more than one match using the standard algorithm, you have to do kludgy + things with callouts. + + 2. Because the alternative algorithm scans the subject string just + once, and never needs to backtrack (except for lookbehinds), it is pos- + sible to pass very long subject strings to the matching function in + several pieces, checking for partial matching each time. Although it is + also possible to do multi-segment matching using the standard algo- + rithm, by retaining partially matched substrings, it is more compli- + cated. The pcre2partial documentation gives details of partial matching + and discusses multi-segment matching. + + +DISADVANTAGES OF THE ALTERNATIVE ALGORITHM + + The alternative algorithm suffers from a number of disadvantages: + + 1. It is substantially slower than the standard algorithm. This is + partly because it has to search for all possible matches, but is also + because it is less susceptible to optimization. + + 2. Capturing parentheses, backreferences, script runs, and matching + within invalid UTF strings are not supported. + + 3. Although atomic groups are supported, their use does not provide the + performance advantage that it does for the standard algorithm. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 23 May 2019 + Copyright (c) 1997-2019 University of Cambridge.
+------------------------------------------------------------------------------ + + +PCRE2PARTIAL(3) Library Functions Manual PCRE2PARTIAL(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions + +PARTIAL MATCHING IN PCRE2 + + In normal use of PCRE2, if there is a match up to the end of a subject + string, but more characters are needed to match the entire pattern, + PCRE2_ERROR_NOMATCH is returned, just like any other failing match. + There are circumstances where it might be helpful to distinguish this + "partial match" case. + + One example is an application where the subject string is very long, + and not all available at once. The requirement here is to be able to do + the matching segment by segment, but special action is needed when a + matched substring spans the boundary between two segments. + + Another example is checking a user input string as it is typed, to en- + sure that it conforms to a required format. Invalid characters can be + immediately diagnosed and rejected, giving instant feedback. + + Partial matching is a PCRE2-specific feature; it is not Perl-compati- + ble. It is requested by setting one of the PCRE2_PARTIAL_HARD or + PCRE2_PARTIAL_SOFT options when calling a matching function. The dif- + ference between the two options is whether or not a partial match is + preferred to an alternative complete match, though the details differ + between the two types of matching function. If both options are set, + PCRE2_PARTIAL_HARD takes precedence. + + If you want to use partial matching with just-in-time optimized code, + as well as setting a partial match option for the matching function, + you must also call pcre2_jit_compile() with one or both of these op- + tions: + + PCRE2_JIT_PARTIAL_HARD + PCRE2_JIT_PARTIAL_SOFT + + PCRE2_JIT_COMPLETE should also be set if you are going to run non-par- + tial matches on the same pattern. Separate code is compiled for each + mode. 
If the appropriate JIT mode has not been compiled, interpretive + matching code is used. + + Setting a partial matching option disables two of PCRE2's standard op- + timization hints. PCRE2 remembers the last literal code unit in a pat- + tern, and abandons matching immediately if it is not present in the + subject string. This optimization cannot be used for a subject string + that might match only partially. PCRE2 also remembers a minimum length + of a matching string, and does not bother to run the matching function + on shorter strings. This optimization is also disabled for partial + matching. + + +REQUIREMENTS FOR A PARTIAL MATCH + + A possible partial match occurs during matching when the end of the + subject string is reached successfully, but either more characters are + needed to complete the match, or the addition of more characters might + change what is matched. + + Example 1: if the pattern is /abc/ and the subject is "ab", more char- + acters are definitely needed to complete a match. In this case both + hard and soft matching options yield a partial match. + + Example 2: if the pattern is /ab+/ and the subject is "ab", a complete + match can be found, but the addition of more characters might change + what is matched. In this case, only PCRE2_PARTIAL_HARD returns a par- + tial match; PCRE2_PARTIAL_SOFT returns the complete match. + + On reaching the end of the subject, when PCRE2_PARTIAL_HARD is set, if + the next pattern item is \z, \Z, \b, \B, or $ there is always a partial + match. Otherwise, for both options, the next pattern item must be one + that inspects a character, and at least one of the following must be + true: + + (1) At least one character has already been inspected. An inspected + character need not form part of the final matched string; lookbehind + assertions and the \K escape sequence provide ways of inspecting char- + acters before the start of a matched string. + + (2) The pattern contains one or more lookbehind assertions. 
This condi- + tion exists in case there is a lookbehind that inspects characters be- + fore the start of the match. + + (3) There is a special case when the whole pattern can match an empty + string. When the starting point is at the end of the subject, the + empty string match is a possibility, and if PCRE2_PARTIAL_SOFT is set + and neither of the above conditions is true, it is returned. However, + because adding more characters might result in a non-empty match, + PCRE2_PARTIAL_HARD returns a partial match, which in this case means + "there is going to be a match at this point, but until some more char- + acters are added, we do not know if it will be an empty string or some- + thing longer". + + +PARTIAL MATCHING USING pcre2_match() + + When a partial matching option is set, the result of calling + pcre2_match() can be one of the following: + + A successful match + A complete match has been found, starting and ending within this sub- + ject. + + PCRE2_ERROR_NOMATCH + No match can start anywhere in this subject. + + PCRE2_ERROR_PARTIAL + Adding more characters may result in a complete match that uses one + or more characters from the end of this subject. + + When a partial match is returned, the first two elements in the ovector + point to the portion of the subject that was matched, but the values in + the rest of the ovector are undefined. The appearance of \K in the pat- + tern has no effect for a partial match. Consider this pattern: + + /abc\K123/ + + If it is matched against "456abc123xyz" the result is a complete match, + and the ovector defines the matched string as "123", because \K resets + the "start of match" point. However, if a partial match is requested + and the subject string is "456abc12", a partial match is found for the + string "abc12", because all these characters are needed for a subse- + quent re-match with additional characters. 
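
The three outcomes listed above can be modelled for the simplest kind of pattern, a literal string. The following Python sketch is not PCRE2 code and does not call the library; the function name classify_literal and its return tuples are hypothetical, chosen only for illustration. It assumes a non-empty literal and does not model \K, so for a partial match it reports the whole partially matched text, which is the behaviour described above.

```python
def classify_literal(needle, subject):
    """Toy model of the three results for a literal pattern.

    Returns ("match", start, end), ("partial", start, end) or
    ("nomatch", None, None). This is an illustration only, not PCRE2.
    """
    if not needle:
        raise ValueError("this sketch assumes a non-empty literal")
    n = len(needle)
    for start in range(len(subject)):
        rest = subject[start:start + n]
        if rest == needle:
            # Complete match, starting and ending within this subject.
            return ("match", start, start + n)
        if len(rest) < n and needle.startswith(rest):
            # The subject ran out while every inspected character agreed,
            # so more characters might complete the match.
            return ("partial", start, len(subject))
    # No match can start anywhere in this subject.
    return ("nomatch", None, None)


print(classify_literal("abc", "xxabcxx"))      # ('match', 2, 5)
print(classify_literal("abc", "ab"))           # ('partial', 0, 2)
print(classify_literal("abc123", "456abc12"))  # ('partial', 3, 8)
print(classify_literal("abc", "xyz"))          # ('nomatch', None, None)
```

For a literal, a complete match always starts before any possible partial match, so this model gives the same answer for the hard and soft options. Patterns with quantifiers or alternatives are where the two options differ, and the sketch does not attempt to cover them.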
+ + If there is more than one partial match, the first one that was found + provides the data that is returned. Consider this pattern: + + /123\w+X|dogY/ + + If this is matched against the subject string "abc123dog", both alter- + natives fail to match, but the end of the subject is reached during + matching, so PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 + and 9, identifying "123dog" as the first partial match. (In this exam- + ple, there are two partial matches, because "dog" on its own partially + matches the second alternative.) + + How a partial match is processed by pcre2_match() + + What happens when a partial match is identified depends on which of the + two partial matching options is set. + + If PCRE2_PARTIAL_HARD is set, PCRE2_ERROR_PARTIAL is returned as soon + as a partial match is found, without continuing to search for possible + complete matches. This option is "hard" because it prefers an earlier + partial match over a later complete match. For this reason, the assump- + tion is made that the end of the supplied subject string is not the + true end of the available data, which is why \z, \Z, \b, \B, and $ al- + ways give a partial match. + + If PCRE2_PARTIAL_SOFT is set, the partial match is remembered, but + matching continues as normal, and other alternatives in the pattern are + tried. If no complete match can be found, PCRE2_ERROR_PARTIAL is re- + turned instead of PCRE2_ERROR_NOMATCH. This option is "soft" because it + prefers a complete match over a partial match. All the various matching + items in a pattern behave as if the subject string is potentially com- + plete; \z, \Z, and $ match at the end of the subject, as normal, and + for \b and \B the end of the subject is treated as a non-alphanumeric. + + The difference between the two partial matching options can be illus- + trated by a pattern such as: + + /dog(sbody)?/ + + This matches either "dog" or "dogsbody", greedily (that is, it prefers + the longer string if possible). 
If it is matched against the string + "dog" with PCRE2_PARTIAL_SOFT, it yields a complete match for "dog". + However, if PCRE2_PARTIAL_HARD is set, the result is PCRE2_ERROR_PAR- + TIAL. On the other hand, if the pattern is made ungreedy the result is + different: + + /dog(sbody)??/ + + In this case the result is always a complete match because that is + found first, and matching never continues after finding a complete + match. It might be easier to follow this explanation by thinking of the + two patterns like this: + + /dog(sbody)?/ is the same as /dogsbody|dog/ + /dog(sbody)??/ is the same as /dog|dogsbody/ + + The second pattern will never match "dogsbody", because it will always + find the shorter match first. + + Example of partial matching using pcre2test + + The pcre2test data modifiers partial_hard (or ph) and partial_soft (or + ps) set PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT, respectively, when + calling pcre2_match(). Here is a run of pcre2test using a pattern that + matches the whole subject in the form of a date: + + re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ + data> 25dec3\=ph + Partial match: 25dec3 + data> 3ju\=ph + Partial match: 3ju + data> 3juj\=ph + No match + + This example gives the same results for both hard and soft partial + matching options. Here is an example where there is a difference: + + re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ + data> 25jun04\=ps + 0: 25jun04 + 1: jun + data> 25jun04\=ph + Partial match: 25jun04 + + With PCRE2_PARTIAL_SOFT, the subject is matched completely. For + PCRE2_PARTIAL_HARD, however, the subject is assumed not to be complete, + so there is only a partial match. + + +MULTI-SEGMENT MATCHING WITH pcre2_match() + + PCRE was not originally designed with multi-segment matching in mind. + However, over time, features (including partial matching) that make + multi-segment matching possible have been added.
A very long string can + be searched segment by segment by calling pcre2_match() repeatedly, + with the aim of achieving the same results that would happen if the en- + tire string was available for searching all the time. Normally, the + strings that are being sought are much shorter than each individual + segment, and are in the middle of very long strings, so the pattern is + normally not anchored. + + Special logic must be implemented to handle a matched substring that + spans a segment boundary. PCRE2_PARTIAL_HARD should be used, because it + returns a partial match at the end of a segment whenever there is the + possibility of changing the match by adding more characters. The + PCRE2_NOTBOL option should also be set for all but the first segment. + + When a partial match occurs, the next segment must be added to the cur- + rent subject and the match re-run, using the startoffset argument of + pcre2_match() to begin at the point where the partial match started. + For example: + + re> /\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d/ + data> ...the date is 23ja\=ph + Partial match: 23ja + data> ...the date is 23jan19 and on that day...\=offset=15 + 0: 23jan19 + 1: jan + + Note the use of the offset modifier to start the new match where the + partial match was found. In this example, the next segment was added to + the one in which the partial match was found. This is the most + straightforward approach, typically using a memory buffer that is twice + the size of each segment. After a partial match, the first half of the + buffer is discarded, the second half is moved to the start of the buf- + fer, and a new segment is added before repeating the match as in the + example above. After a no match, the entire buffer can be discarded. + + If there are memory constraints, you may want to discard text that pre- + cedes a partial match before adding the next segment. Unfortunately, + this is not at present straightforward. 
In cases such as the above, + where the pattern does not contain any lookbehinds, it is sufficient to + retain only the partially matched substring. However, if the pattern + contains a lookbehind assertion, characters that precede the start of + the partial match may have been inspected during the matching process. + When pcre2test displays a partial match, it indicates these characters + with '<' if the allusedtext modifier is set: + + re> "(?<=123)abc" + data> xx123ab\=ph,allusedtext + Partial match: 123ab + <<< + + However, the allusedtext modifier is not available for JIT matching, + because JIT matching does not record the first (or last) consulted + characters. For this reason, this information is not available via the + API. It is therefore not possible in general to obtain the exact number + of characters that must be retained in order to get the right match re- + sult. If you cannot retain the entire segment, you must find some + heuristic way of choosing. + + If you know the approximate length of the matching substrings, you can + use that to decide how much text to retain. The only lookbehind infor- + mation that is currently available via the API is the length of the + longest individual lookbehind in a pattern, but this can be misleading + if there are nested lookbehinds. The value returned by calling + pcre2_pattern_info() with the PCRE2_INFO_MAXLOOKBEHIND option is the + maximum number of characters (not code units) that any individual look- + behind moves back when it is processed. A pattern such as + "(?<=(? /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ + data> 23ja\=dfa,ps + Partial match: 23ja + data> n05\=dfa,dfa_restart + 0: n05 + + The first call has "23ja" as the subject, and requests partial match- + ing; the second call has "n05" as the subject for the continued + (restarted) match. Notice that when the match is complete, only the + last part is shown; PCRE2 does not retain the previously partially- + matched string. 
It is up to the calling program to do that if it needs + to. This means that, for an unanchored pattern, if a continued match + fails, it is not possible to try again at a new starting point. All + this facility is capable of doing is continuing with the previous match + attempt. For example, consider this pattern: + + 1234|3789 + + If the first part of the subject is "ABC123", a partial match of the + first alternative is found at offset 3. There is no partial match for + the second alternative, because such a match does not start at the same + point in the subject string. Attempting to continue with the string + "7890" does not yield a match because only those alternatives that + match at one point in the subject are remembered. Depending on the ap- + plication, this may or may not be what you want. + + If you do want to allow for starting again at the next character, one + way of doing it is to retain some or all of the segment and try a new + complete match, as described for pcre2_match() above. Another possibil- + ity is to work with two buffers. If a partial match at offset n in the + first buffer is followed by "no match" when PCRE2_DFA_RESTART is used + on the second buffer, you can then try a new match starting at offset + n+1 in the first buffer. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 04 September 2019 + Copyright (c) 1997-2019 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2PATTERN(3) Library Functions Manual PCRE2PATTERN(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +PCRE2 REGULAR EXPRESSION DETAILS + + The syntax and semantics of the regular expressions that are supported + by PCRE2 are described in detail below. There is a quick-reference syn- + tax summary in the pcre2syntax page. PCRE2 tries to match Perl syntax + and semantics as closely as it can. 
PCRE2 also supports some alterna- + tive regular expression syntax (which does not conflict with the Perl + syntax) in order to provide some compatibility with regular expressions + in Python, .NET, and Oniguruma. + + Perl's regular expressions are described in its own documentation, and + regular expressions in general are covered in a number of books, some + of which have copious examples. Jeffrey Friedl's "Mastering Regular Ex- + pressions", published by O'Reilly, covers regular expressions in great + detail. This description of PCRE2's regular expressions is intended as + reference material. + + This document discusses the regular expression patterns that are sup- + ported by PCRE2 when its main matching function, pcre2_match(), is + used. PCRE2 also has an alternative matching function, + pcre2_dfa_match(), which matches using a different algorithm that is + not Perl-compatible. Some of the features discussed below are not + available when DFA matching is used. The advantages and disadvantages + of the alternative function, and how it differs from the normal func- + tion, are discussed in the pcre2matching page. + + +SPECIAL START-OF-PATTERN ITEMS + + A number of options that can be passed to pcre2_compile() can also be + set by special items at the start of a pattern. These are not Perl-com- + patible, but are provided to make these options accessible to pattern + writers who are not able to change the program that processes the pat- + tern. Any number of these items may appear, but they must all be to- + gether right at the start of the pattern string, and the letters must + be in upper case. + + UTF support + + In the 8-bit and 16-bit PCRE2 libraries, characters may be coded either + as single code units, or as multiple UTF-8 or UTF-16 code units. UTF-32 + can be specified for the 32-bit library, in which case it constrains + the character values to valid Unicode code points. 
To process UTF + strings, PCRE2 must be built to include Unicode support (which is the + default). When using UTF strings you must either call the compiling + function with one or both of the PCRE2_UTF or PCRE2_MATCH_INVALID_UTF + options, or the pattern must start with the special sequence (*UTF), + which is equivalent to setting the PCRE2_UTF option. How setting a + UTF mode affects pattern matching is mentioned in several places below. + There is also a summary of features in the pcre2unicode page. + + Some applications that allow their users to supply patterns may wish to + restrict them to non-UTF data for security reasons. If the + PCRE2_NEVER_UTF option is passed to pcre2_compile(), (*UTF) is not al- + lowed, and its appearance in a pattern causes an error. + + Unicode property support + + Another special sequence that may appear at the start of a pattern is + (*UCP). This has the same effect as setting the PCRE2_UCP option: it + causes sequences such as \d and \w to use Unicode properties to deter- + mine character types, instead of recognizing only characters with codes + less than 256 via a lookup table. It also causes upper/lower casing op- + erations to use Unicode properties for characters with code points + greater than 127, even when UTF is not set. + + Some applications that allow their users to supply patterns may wish to + restrict them for security reasons. If the PCRE2_NEVER_UCP option is + passed to pcre2_compile(), (*UCP) is not allowed, and its appearance in + a pattern causes an error. + + Locking out empty string matching + + Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same + effect as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option + to whichever matching function is subsequently called to match the pat- + tern. These options lock out the matching of empty strings, either en- + tirely, or only at the start of the subject.
+ + Disabling auto-possessification + + If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as + setting the PCRE2_NO_AUTO_POSSESS option. This stops PCRE2 from making + quantifiers possessive when what follows cannot match the repeated + item. For example, by default a+b is treated as a++b. For more details, + see the pcre2api documentation. + + Disabling start-up optimizations + + If a pattern starts with (*NO_START_OPT), it has the same effect as + setting the PCRE2_NO_START_OPTIMIZE option. This disables several opti- + mizations for quickly reaching "no match" results. For more details, + see the pcre2api documentation. + + Disabling automatic anchoring + + If a pattern starts with (*NO_DOTSTAR_ANCHOR), it has the same effect + as setting the PCRE2_NO_DOTSTAR_ANCHOR option. This disables optimiza- + tions that apply to patterns whose top-level branches all start with .* + (match any number of arbitrary characters). For more details, see the + pcre2api documentation. + + Disabling JIT compilation + + If a pattern that starts with (*NO_JIT) is successfully compiled, an + attempt by the application to apply the JIT optimization by calling + pcre2_jit_compile() is ignored. + + Setting match resource limits + + The pcre2_match() function contains a counter that is incremented every + time it goes round its main loop. The caller of pcre2_match() can set a + limit on this counter, which therefore limits the amount of computing + resource used for a match. The maximum depth of nested backtracking can + also be limited; this indirectly restricts the amount of heap memory + that is used, but there is also an explicit memory limit that can be + set. + + These facilities are provided to catch runaway matches that are pro- + voked by patterns with huge matching trees. A common example is a pat- + tern with nested unlimited repeats applied to a long string that does + not match. When one of these limits is reached, pcre2_match() gives an + error return. 
The limits can also be set by items at the start of the + pattern of the form + + (*LIMIT_HEAP=d) + (*LIMIT_MATCH=d) + (*LIMIT_DEPTH=d) + + where d is any number of decimal digits. However, the value of the set- + ting must be less than the value set (or defaulted) by the caller of + pcre2_match() for it to have any effect. In other words, the pattern + writer can lower the limits set by the programmer, but not raise them. + If there is more than one setting of one of these limits, the lower + value is used. The heap limit is specified in kibibytes (units of 1024 + bytes). + + Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This + name is still recognized for backwards compatibility. + + The heap limit applies only when the pcre2_match() or pcre2_dfa_match() + interpreters are used for matching. It does not apply to JIT. The match + limit is used (but in a different way) when JIT is being used, or when + pcre2_dfa_match() is called, to limit computing resource usage by those + matching functions. The depth limit is ignored by JIT but is relevant + for DFA matching, which uses function recursion for recursions within + the pattern and for lookaround assertions and atomic groups. In this + case, the depth limit controls the depth of such recursion. + + Newline conventions + + PCRE2 supports six different conventions for indicating line breaks in + strings: a single CR (carriage return) character, a single LF (line- + feed) character, the two-character sequence CRLF, any of the three pre- + ceding, any Unicode newline sequence, or the NUL character (binary + zero). The pcre2api page has further discussion about newlines, and + shows how to set the newline convention when calling pcre2_compile(). 
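
As an illustration of the six conventions, here is a small Python sketch (not part of PCRE2) that splits a string into lines under a named convention. The table NEWLINES and the function split_lines are hypothetical names. The list of sequences used for "ANY" (CR, LF, CRLF, VT, FF, NEL, LS, PS) is an assumption taken from PCRE2's description of Unicode newline sequences elsewhere in its documentation, not from the text above.

```python
# Each convention maps to the sequences it treats as a line break.
# Longer sequences come first so that CRLF is preferred over a lone CR.
NEWLINES = {
    "CR": ("\r",),
    "LF": ("\n",),
    "CRLF": ("\r\n",),
    "ANYCRLF": ("\r\n", "\r", "\n"),
    # Assumed set of Unicode newline sequences (see the note above).
    "ANY": ("\r\n", "\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"),
    "NUL": ("\x00",),
}


def split_lines(text, convention):
    """Split text at every line break recognized by the convention."""
    seqs = NEWLINES[convention]
    lines, current, i = [], [], 0
    while i < len(text):
        for seq in seqs:
            if text.startswith(seq, i):
                # Found a line break: close the current line and skip it.
                lines.append("".join(current))
                current = []
                i += len(seq)
                break
        else:
            # Ordinary character under this convention.
            current.append(text[i])
            i += 1
    lines.append("".join(current))
    return lines


print(split_lines("a\nb", "LF"))         # ['a', 'b']
print(split_lines("a\nb", "CR"))         # ['a\nb']
print(split_lines("a\r\nb", "ANYCRLF"))  # ['a', 'b']
print(split_lines("a\r\nb", "CR"))       # ['a', '\nb']
```

The second and fourth calls show that a character which ends a line under one convention is ordinary data under another.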
+ + It is also possible to specify a newline convention by starting a pat- + tern string with one of the following sequences: + + (*CR) carriage return + (*LF) linefeed + (*CRLF) carriage return, followed by linefeed + (*ANYCRLF) any of the three above + (*ANY) all Unicode newline sequences + (*NUL) the NUL character (binary zero) + + These override the default and the options given to the compiling func- + tion. For example, on a Unix system where LF is the default newline se- + quence, the pattern + + (*CR)a.b + + changes the convention to CR. That pattern matches "a\nb" because LF is + no longer a newline. If more than one of these settings is present, the + last one is used. + + The newline convention affects where the circumflex and dollar asser- + tions are true. It also affects the interpretation of the dot metachar- + acter when PCRE2_DOTALL is not set, and the behaviour of \N when not + followed by an opening brace. However, it does not affect what the \R + escape sequence matches. By default, this is any Unicode newline se- + quence, for Perl compatibility. However, this can be changed; see the + next section and the description of \R in the section entitled "Newline + sequences" below. A change of \R setting can be combined with a change + of newline convention. + + Specifying what \R matches + + It is possible to restrict \R to match only CR, LF, or CRLF (instead of + the complete set of Unicode line endings) by setting the option + PCRE2_BSR_ANYCRLF at compile time. This effect can also be achieved by + starting a pattern with (*BSR_ANYCRLF). For completeness, (*BSR_UNI- + CODE) is also recognized, corresponding to PCRE2_BSR_UNICODE. + + +EBCDIC CHARACTER CODES + + PCRE2 can be compiled to run in an environment that uses EBCDIC as its + character code instead of ASCII or Unicode (typically a mainframe sys- + tem). 
In the sections below, character code values are ASCII or Uni- + code; in an EBCDIC environment these characters may have different code + values, and there are no code points greater than 255. + + +CHARACTERS AND METACHARACTERS + + A regular expression is a pattern that is matched against a subject + string from left to right. Most characters stand for themselves in a + pattern, and match the corresponding characters in the subject. As a + trivial example, the pattern + + The quick brown fox + + matches a portion of a subject string that is identical to itself. When + caseless matching is specified (the PCRE2_CASELESS option or (?i) + within the pattern), letters are matched independently of case. Note + that there are two ASCII characters, K and S, that, in addition to + their lower case ASCII equivalents, are case-equivalent with Unicode + U+212A (Kelvin sign) and U+017F (long S) respectively when either + PCRE2_UTF or PCRE2_UCP is set. + + The power of regular expressions comes from the ability to include wild + cards, character classes, alternatives, and repetitions in the pattern. + These are encoded in the pattern by the use of metacharacters, which do + not stand for themselves but instead are interpreted in some special + way. + + There are two different sets of metacharacters: those that are recog- + nized anywhere in the pattern except within square brackets, and those + that are recognized within square brackets. Outside square brackets, + the metacharacters are as follows: + + \ general escape character with several uses + ^ assert start of string (or line, in multiline mode) + $ assert end of string (or line, in multiline mode) + . match any character except newline (by default) + [ start character class definition + | start of alternative branch + ( start group or control verb + ) end group or control verb + * 0 or more quantifier + + 1 or more quantifier; also "possessive quantifier" + ? 
0 or 1 quantifier; also quantifier minimizer + { start min/max quantifier + + Part of a pattern that is in square brackets is called a "character + class". In a character class the only metacharacters are: + + \ general escape character + ^ negate the class, but only if the first character + - indicates character range + [ POSIX character class (if followed by POSIX syntax) + ] terminates the character class + + If a pattern is compiled with the PCRE2_EXTENDED option, most white + space in the pattern, other than in a character class, and characters + between a # outside a character class and the next newline, inclusive, + are ignored. An escaping backslash can be used to include a white space + or a # character as part of the pattern. If the PCRE2_EXTENDED_MORE op- + tion is set, the same applies, but in addition unescaped space and hor- + izontal tab characters are ignored inside a character class. Note: only + these two characters are ignored, not the full set of pattern white + space characters that are ignored outside a character class. Option + settings can be changed within a pattern; see the section entitled "In- + ternal Option Setting" below. + + The following sections describe the use of each of the metacharacters. + + +BACKSLASH + + The backslash character has several uses. Firstly, if it is followed by + a character that is not a digit or a letter, it takes away any special + meaning that character may have. This use of backslash as an escape + character applies both inside and outside character classes. + + For example, if you want to match a * character, you must write \* in + the pattern. This escaping action applies whether or not the following + character would otherwise be interpreted as a metacharacter, so it is + always safe to precede a non-alphanumeric with backslash to specify + that it stands for itself. In particular, if you want to match a back- + slash, you write \\. 
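
Because a backslash before any non-alphanumeric is always safe, an application that needs to match arbitrary text literally can escape it mechanically. The Python sketch below shows one way to do that. The name quote_literal is a hypothetical helper, not a PCRE2 function, and the sketch only touches ASCII characters, leaving anything with a code point above 127 unchanged.

```python
def quote_literal(text):
    """Return text with every ASCII non-alphanumeric backslash-escaped.

    Letters and digits are left alone, because a backslash in front of
    them can introduce an escape sequence with a special meaning.
    """
    out = []
    for ch in text:
        if ord(ch) < 128 and not ch.isalnum():
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(quote_literal("a*b"))    # a\*b
print(quote_literal("1+1=2"))  # 1\+1\=2
print(quote_literal("\\"))     # \\
```

Some of the characters escaped here, such as "=", have no special meaning in a pattern. The sketch escapes them anyway, which keeps the rule simple and relies only on the guarantee stated above.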
+ + Only ASCII digits and letters have any special meaning after a back- + slash. All other characters (in particular, those whose code points are + greater than 127) are treated as literals. + + If you want to treat all characters in a sequence as literals, you can + do so by putting them between \Q and \E. This is different from Perl in + that $ and @ are handled as literals in \Q...\E sequences in PCRE2, + whereas in Perl, $ and @ cause variable interpolation. Also, Perl does + "double-quotish backslash interpolation" on any backslashes between \Q + and \E which, its documentation says, "may lead to confusing results". + PCRE2 treats a backslash between \Q and \E just like any other charac- + ter. Note the following examples: + + Pattern PCRE2 matches Perl matches + + \Qabc$xyz\E abc$xyz abc followed by the + contents of $xyz + \Qabc\$xyz\E abc\$xyz abc\$xyz + \Qabc\E\$\Qxyz\E abc$xyz abc$xyz + \QA\B\E A\B A\B + \Q\\E \ \\E + + The \Q...\E sequence is recognized both inside and outside character + classes. An isolated \E that is not preceded by \Q is ignored. If \Q + is not followed by \E later in the pattern, the literal interpretation + continues to the end of the pattern (that is, \E is assumed at the + end). If the isolated \Q is inside a character class, this causes an + error, because the character class is not terminated by a closing + square bracket. + + Non-printing characters + + A second use of backslash provides a way of encoding non-printing char- + acters in patterns in a visible manner. There is no restriction on the + appearance of non-printing characters in a pattern, but when a pattern + is being prepared by text editing, it is often easier to use one of the + following escape sequences instead of the binary character it repre- + sents. 
In an ASCII or Unicode environment, these escapes are as fol- + lows: + + \a alarm, that is, the BEL character (hex 07) + \cx "control-x", where x is any printable ASCII character + \e escape (hex 1B) + \f form feed (hex 0C) + \n linefeed (hex 0A) + \r carriage return (hex 0D) (but see below) + \t tab (hex 09) + \0dd character with octal code 0dd + \ddd character with octal code ddd, or backreference + \o{ddd..} character with octal code ddd.. + \xhh character with hex code hh + \x{hhh..} character with hex code hhh.. + \N{U+hhh..} character with Unicode hex code point hhh.. + + By default, after \x that is not followed by {, from zero to two hexa- + decimal digits are read (letters can be in upper or lower case). Any + number of hexadecimal digits may appear between \x{ and }. If a charac- + ter other than a hexadecimal digit appears between \x{ and }, or if + there is no terminating }, an error occurs. + + Characters whose code points are less than 256 can be defined by either + of the two syntaxes for \x or by an octal sequence. There is no differ- + ence in the way they are handled. For example, \xdc is exactly the same + as \x{dc} or \334. However, using the braced versions does make such + sequences easier to read. + + Support is available for some ECMAScript (aka JavaScript) escape se- + quences via two compile-time options. If PCRE2_ALT_BSUX is set, the se- + quence \x followed by { is not recognized. Only if \x is followed by + two hexadecimal digits is it recognized as a character escape. Other- + wise it is interpreted as a literal "x" character. In this mode, sup- + port for code points greater than 256 is provided by \u, which must be + followed by four hexadecimal digits; otherwise it is interpreted as a + literal "u" character. + + PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in ad- + dition, \u{hhh..} is recognized as the character specified by hexadeci- + mal code point. There may be any number of hexadecimal digits. 
This + syntax is from ECMAScript 6. + + The \N{U+hhh..} escape sequence is recognized only when PCRE2 is oper- + ating in UTF mode. Perl also uses \N{name} to specify characters by + Unicode name; PCRE2 does not support this. Note that when \N is not + followed by an opening brace (curly bracket) it has an entirely differ- + ent meaning, matching any character that is not a newline. + + There are some legacy applications where the escape sequence \r is ex- + pected to match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option + is set, \r in a pattern is converted to \n so that it matches a LF + (linefeed) instead of a CR (carriage return) character. + + The precise effect of \cx on ASCII characters is as follows: if x is a + lower case letter, it is converted to upper case. Then bit 6 of the + character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A + (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes + hex 7B (; is 3B). If the code unit following \c has a value less than + 32 or greater than 126, a compile-time error occurs. + + When PCRE2 is compiled in EBCDIC mode, \N{U+hhh..} is not supported. + \a, \e, \f, \n, \r, and \t generate the appropriate EBCDIC code values. + The \c escape is processed as specified for Perl in the perlebcdic doc- + ument. The only characters that are allowed after \c are A-Z, a-z, or + one of @, [, \, ], ^, _, or ?. Any other character provokes a compile- + time error. The sequence \c@ encodes character code 0; after \c the + letters (in either case) encode characters 1-26 (hex 01 to hex 1A); [, + \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and \c? be- + comes either 255 (hex FF) or 95 (hex 5F). + + Thus, apart from \c?, these escapes generate the same character code + values as they do in an ASCII environment, though the meanings of the + values mostly differ. For example, \cG always generates code value 7, + which is BEL in ASCII but DEL in EBCDIC. + + The sequence \c? 
generates DEL (127, hex 7F) in an ASCII environment, + but because 127 is not a control character in EBCDIC, Perl makes it + generate the APC character. Unfortunately, there are several variants + of EBCDIC. In most of them the APC character has the value 255 (hex + FF), but in the one Perl calls POSIX-BC its value is 95 (hex 5F). If + certain other characters have POSIX-BC values, PCRE2 makes \c? generate + 95; otherwise it generates 255. + + After \0 up to two further octal digits are read. If there are fewer + than two digits, just those that are present are used. Thus the se- + quence \0\x\015 specifies two binary zeros followed by a CR character + (code value 13). Make sure you supply two digits after the initial zero + if the pattern character that follows is itself an octal digit. + + The escape \o must be followed by a sequence of octal digits, enclosed + in braces. An error occurs if this is not the case. This escape is a + recent addition to Perl; it provides a way of specifying character code + points as octal numbers greater than 0777, and it also allows octal + numbers and backreferences to be unambiguously specified. + + For greater clarity and unambiguity, it is best to avoid following \ by + a digit greater than zero. Instead, use \o{} or \x{} to specify numeri- + cal character code points, and \g{} to specify backreferences. The fol- + lowing paragraphs describe the old, ambiguous syntax. + + The handling of a backslash followed by a digit other than 0 is compli- + cated, and Perl has changed over time, causing PCRE2 also to change. + + Outside a character class, PCRE2 reads the digit and any following dig- + its as a decimal number. If the number is less than 10, begins with the + digit 8 or 9, or if there are at least that many previous capture + groups in the expression, the entire sequence is taken as a backrefer- + ence. A description of how this works is given later, following the + discussion of parenthesized groups.
Otherwise, up to three octal dig- + its are read to form a character code. + + Inside a character class, PCRE2 handles \8 and \9 as the literal char- + acters "8" and "9", and otherwise reads up to three octal digits fol- + lowing the backslash, using them to generate a data character. Any sub- + sequent digits stand for themselves. For example, outside a character + class: + + \040 is another way of writing an ASCII space + \40 is the same, provided there are fewer than 40 + previous capture groups + \7 is always a backreference + \11 might be a backreference, or another way of + writing a tab + \011 is always a tab + \0113 is a tab followed by the character "3" + \113 might be a backreference, otherwise the + character with octal code 113 + \377 might be a backreference, otherwise + the value 255 (decimal) + \81 is always a backreference + + Note that octal values of 100 or greater that are specified using this + syntax must not be introduced by a leading zero, because no more than + three octal digits are ever read. + + Constraints on character values + + Characters that are specified using octal or hexadecimal numbers are + limited to certain values, as follows: + + 8-bit non-UTF mode no greater than 0xff + 16-bit non-UTF mode no greater than 0xffff + 32-bit non-UTF mode no greater than 0xffffffff + All UTF modes no greater than 0x10ffff and a valid code point + + Invalid Unicode code points are all those in the range 0xd800 to 0xdfff + (the so-called "surrogate" code points). The check for these can be + disabled by the caller of pcre2_compile() by setting the option + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. However, this is possible only in + UTF-8 and UTF-32 modes, because these values are not representable in + UTF-16. + + Escape sequences in character classes + + All the sequences that define a single character value can be used both + inside and outside character classes. 
In addition, inside a character + class, \b is interpreted as the backspace character (hex 08). + + When not followed by an opening brace, \N is not allowed in a character + class. \B, \R, and \X are not special inside a character class. Like + other unrecognized alphabetic escape sequences, they cause an error. + Outside a character class, these sequences have different meanings. + + Unsupported escape sequences + + In Perl, the sequences \F, \l, \L, \u, and \U are recognized by its + string handler and used to modify the case of following characters. By + default, PCRE2 does not support these escape sequences in patterns. + However, if either of the PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX op- + tions is set, \U matches a "U" character, and \u can be used to define + a character by code point, as described above. + + Absolute and relative backreferences + + The sequence \g followed by a signed or unsigned number, optionally en- + closed in braces, is an absolute or relative backreference. A named + backreference can be coded as \g{name}. Backreferences are discussed + later, following the discussion of parenthesized groups. + + Absolute and relative subroutine calls + + For compatibility with Oniguruma, the non-Perl syntax \g followed by a + name or a number enclosed either in angle brackets or single quotes, is + an alternative syntax for referencing a capture group as a subroutine. + Details are discussed later. Note that \g{...} (Perl syntax) and + \g<...> (Oniguruma syntax) are not synonymous. The former is a backref- + erence; the latter is a subroutine call. 
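The ASCII rule for \cx given earlier in this section (convert a lower case letter to upper case, then invert bit 6) can be sketched in a few lines of Python. This is only an illustrative model of the documented arithmetic, not PCRE2's implementation; the function name control_escape is hypothetical, and the EBCDIC rules are not covered.

```python
def control_escape(x: str) -> int:
    r"""Return the code value that \cx denotes in an ASCII environment."""
    code = ord(x)
    # Code units below 32 or above 126 are rejected (a compile-time error in PCRE2).
    if code < 32 or code > 126:
        raise ValueError(r"\c must be followed by a printable ASCII character")
    # A lower case letter is first converted to upper case.
    if ord("a") <= code <= ord("z"):
        code -= 0x20
    # Then bit 6 of the character (hex 40) is inverted.
    return code ^ 0x40
```

Under this model, control_escape("A") gives hex 01, control_escape("{") gives hex 3B, and control_escape(";") gives hex 7B, which agrees with the values listed in the text.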
+ + Generic character types + + Another use of backslash is for specifying generic character types: + + \d any decimal digit + \D any character that is not a decimal digit + \h any horizontal white space character + \H any character that is not a horizontal white space character + \N any character that is not a newline + \s any white space character + \S any character that is not a white space character + \v any vertical white space character + \V any character that is not a vertical white space character + \w any "word" character + \W any "non-word" character + + The \N escape sequence has the same meaning as the "." metacharacter + when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change + the meaning of \N. Note that when \N is followed by an opening brace it + has a different meaning. See the section entitled "Non-printing charac- + ters" above for details. Perl also uses \N{name} to specify characters + by Unicode name; PCRE2 does not support this. + + Each pair of lower and upper case escape sequences partitions the com- + plete set of characters into two disjoint sets. Any given character + matches one, and only one, of each pair. The sequences can appear both + inside and outside character classes. They each match one character of + the appropriate type. If the current matching point is at the end of + the subject string, all of them fail, because there is no character to + match. + + The default \s characters are HT (9), LF (10), VT (11), FF (12), CR + (13), and space (32), which are defined as white space in the "C" lo- + cale. This list may vary if locale-specific matching is taking place. + For example, in some locales the "non-breaking space" character (\xA0) + is recognized as white space, and in others the VT character is not. + + A "word" character is an underscore or any character that is a letter + or digit. 
By default, the definition of letters and digits is con- + trolled by PCRE2's low-valued character tables, and may vary if locale- + specific matching is taking place (see "Locale support" in the pcre2api + page). For example, in a French locale such as "fr_FR" in Unix-like + systems, or "french" in Windows, some character codes greater than 127 + are used for accented letters, and these are then matched by \w. The + use of locales with Unicode is discouraged. + + By default, characters whose code points are greater than 127 never + match \d, \s, or \w, and always match \D, \S, and \W, although this may + be different for characters in the range 128-255 when locale-specific + matching is happening. These escape sequences retain their original + meanings from before Unicode support was available, mainly for effi- + ciency reasons. If the PCRE2_UCP option is set, the behaviour is + changed so that Unicode properties are used to determine character + types, as follows: + + \d any character that matches \p{Nd} (decimal digit) + \s any character that matches \p{Z} or \h or \v + \w any character that matches \p{L} or \p{N}, plus underscore + + The upper case escapes match the inverse sets of characters. Note that + \d matches only decimal digits, whereas \w matches any Unicode digit, + as well as any Unicode letter, and underscore. Note also that PCRE2_UCP + affects \b, and \B because they are defined in terms of \w and \W. + Matching these sequences is noticeably slower when PCRE2_UCP is set. + + The sequences \h, \H, \v, and \V, in contrast to the other sequences, + which match only ASCII characters by default, always match a specific + list of code points, whether or not PCRE2_UCP is set. 
The horizontal + space characters are: + + U+0009 Horizontal tab (HT) + U+0020 Space + U+00A0 Non-break space + U+1680 Ogham space mark + U+180E Mongolian vowel separator + U+2000 En quad + U+2001 Em quad + U+2002 En space + U+2003 Em space + U+2004 Three-per-em space + U+2005 Four-per-em space + U+2006 Six-per-em space + U+2007 Figure space + U+2008 Punctuation space + U+2009 Thin space + U+200A Hair space + U+202F Narrow no-break space + U+205F Medium mathematical space + U+3000 Ideographic space + + The vertical space characters are: + + U+000A Linefeed (LF) + U+000B Vertical tab (VT) + U+000C Form feed (FF) + U+000D Carriage return (CR) + U+0085 Next line (NEL) + U+2028 Line separator + U+2029 Paragraph separator + + In 8-bit, non-UTF-8 mode, only the characters with code points less + than 256 are relevant. + + Newline sequences + + Outside a character class, by default, the escape sequence \R matches + any Unicode newline sequence. In 8-bit non-UTF-8 mode \R is equivalent + to the following: + + (?>\r\n|\n|\x0b|\f|\r|\x85) + + This is an example of an "atomic group", details of which are given be- + low. This particular group matches either the two-character sequence + CR followed by LF, or one of the single characters LF (linefeed, + U+000A), VT (vertical tab, U+000B), FF (form feed, U+000C), CR (car- + riage return, U+000D), or NEL (next line, U+0085). Because this is an + atomic group, the two-character sequence is treated as a single unit + that cannot be split. + + In other modes, two additional characters whose code points are greater + than 255 are added: LS (line separator, U+2028) and PS (paragraph sepa- + rator, U+2029). Unicode support is not needed for these characters to + be recognized. + + It is possible to restrict \R to match only CR, LF, or CRLF (instead of + the complete set of Unicode line endings) by setting the option + PCRE2_BSR_ANYCRLF at compile time. (BSR is an abbreviation for "back- + slash R".)
This can be made the default when PCRE2 is built; if this is + the case, the other behaviour can be requested via the PCRE2_BSR_UNI- + CODE option. It is also possible to specify these settings by starting + a pattern string with one of the following sequences: + + (*BSR_ANYCRLF) CR, LF, or CRLF only + (*BSR_UNICODE) any Unicode newline sequence + + These override the default and the options given to the compiling func- + tion. Note that these special settings, which are not Perl-compatible, + are recognized only at the very start of a pattern, and that they must + be in upper case. If more than one of them is present, the last one is + used. They can be combined with a change of newline convention; for ex- + ample, a pattern can start with: + + (*ANY)(*BSR_ANYCRLF) + + They can also be combined with the (*UTF) or (*UCP) special sequences. + Inside a character class, \R is treated as an unrecognized escape se- + quence, and causes an error. + + Unicode character properties + + When PCRE2 is built with Unicode support (the default), three addi- + tional escape sequences that match characters with specific properties + are available. They can be used in any mode, though in 8-bit and 16-bit + non-UTF modes these sequences are of course limited to testing charac- + ters whose code points are less than U+0100 and U+10000, respectively. + In 32-bit non-UTF mode, code points greater than 0x10ffff (the Unicode + limit) may be encountered. These are all treated as being in the Un- + known script and with an unassigned type. The extra escape sequences + are: + + \p{xx} a character with the xx property + \P{xx} a character without the xx property + \X a Unicode extended grapheme cluster + + The property names represented by xx above are case-sensitive. There is + support for Unicode script names, Unicode general category properties, + "Any", which matches any character (including newline), and some spe- + cial PCRE2 properties (described in the next section). 
Other Perl + properties such as "InMusicalSymbols" are not supported by PCRE2. Note + that \P{Any} does not match any characters, so always causes a match + failure. + + Sets of Unicode characters are defined as belonging to certain scripts. + A character from one of these sets can be matched using a script name. + For example: + + \p{Greek} + \P{Han} + + Unassigned characters (and in non-UTF 32-bit mode, characters with code + points greater than 0x10FFFF) are assigned the "Unknown" script. Others + that are not part of an identified script are lumped together as "Com- + mon". The current list of scripts is: + + Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali- + nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi, + Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba- + nian, Chakma, Cham, Cherokee, Chorasmian, Common, Coptic, Cuneiform, + Cypriot, Cyrillic, Deseret, Devanagari, Dives_Akuru, Dogra, Duployan, + Egyptian_Hieroglyphs, Elbasan, Elymaic, Ethiopic, Georgian, Glagolitic, + Gothic, Grantha, Greek, Gujarati, Gunjala_Gondi, Gurmukhi, Han, Hangul, + Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, + Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, + Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khitan_Small_Script, + Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Lin- + ear_B, Lisu, Lycian, Lydian, Mahajani, Makasar, Malayalam, Mandaic, + Manichaean, Marchen, Masaram_Gondi, Medefaidrin, Meetei_Mayek, + Mende_Kikakui, Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mon- + golian, Mro, Multani, Myanmar, Nabataean, Nandinagari, New_Tai_Lue, + Newa, Nko, Nushu, Nyakeng_Puachue_Hmong, Ogham, Ol_Chiki, Old_Hungar- + ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog- + dian, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pa- + hawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, + Psalter_Pahlavi, Rejang, Runic, 
Samaritan, Saurashtra, Sharada, Sha- + vian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng, Soyombo, + Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, + Tai_Viet, Takri, Tamil, Tangut, Telugu, Thaana, Thai, Tibetan, Tifi- + nagh, Tirhuta, Ugaritic, Unknown, Vai, Wancho, Warang_Citi, Yezidi, Yi, + Zanabazar_Square. + + Each character has exactly one Unicode general category property, spec- + ified by a two-letter abbreviation. For compatibility with Perl, nega- + tion can be specified by including a circumflex between the opening + brace and the property name. For example, \p{^Lu} is the same as + \P{Lu}. + + If only one letter is specified with \p or \P, it includes all the gen- + eral category properties that start with that letter. In this case, in + the absence of negation, the curly brackets in the escape sequence are + optional; these two examples have the same effect: + + \p{L} + \pL + + The following general category property codes are supported: + + C Other + Cc Control + Cf Format + Cn Unassigned + Co Private use + Cs Surrogate + + L Letter + Ll Lower case letter + Lm Modifier letter + Lo Other letter + Lt Title case letter + Lu Upper case letter + + M Mark + Mc Spacing mark + Me Enclosing mark + Mn Non-spacing mark + + N Number + Nd Decimal number + Nl Letter number + No Other number + + P Punctuation + Pc Connector punctuation + Pd Dash punctuation + Pe Close punctuation + Pf Final punctuation + Pi Initial punctuation + Po Other punctuation + Ps Open punctuation + + S Symbol + Sc Currency symbol + Sk Modifier symbol + Sm Mathematical symbol + So Other symbol + + Z Separator + Zl Line separator + Zp Paragraph separator + Zs Space separator + + The special property L& is also supported: it matches a character that + has the Lu, Ll, or Lt property, in other words, a letter that is not + classified as a modifier or "other". 
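The way one-letter codes, two-letter codes, circumflex negation, and the special L& property relate to each other can be modelled with Python's standard unicodedata module. This is a rough sketch for illustration only: has_property is a hypothetical helper, it is not how PCRE2 looks up properties, and the Unicode version bundled with a given Python interpreter may differ from the one PCRE2 was built with.

```python
import unicodedata


def has_property(ch: str, prop: str) -> bool:
    """Model \\p{prop} for general category codes such as "Lu", "L", "^Lu", "L&"."""
    # A leading circumflex negates the property, so "^Lu" behaves like \P{Lu}.
    negate = prop.startswith("^")
    if negate:
        prop = prop[1:]

    category = unicodedata.category(ch)  # two-letter code, e.g. "Lu" or "Nd"

    if prop == "L&":
        # L& covers cased letters only: Lu, Ll, or Lt.
        result = category in ("Lu", "Ll", "Lt")
    elif len(prop) == 1:
        # A single letter includes every category that starts with that letter.
        result = category.startswith(prop)
    else:
        result = category == prop

    return result != negate
```

For example, has_property("a", "L") and has_property("a", "^Lu") are both true under this model, while has_property("a", "Lu") is false.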
+ + The Cs (Surrogate) property applies only to characters whose code + points are in the range U+D800 to U+DFFF. These characters are no dif- + ferent to any other character when PCRE2 is not in UTF mode (using the + 16-bit or 32-bit library). However, they are not valid in Unicode + strings and so cannot be tested by PCRE2 in UTF mode, unless UTF valid- + ity checking has been turned off (see the discussion of + PCRE2_NO_UTF_CHECK in the pcre2api page). + + The long synonyms for property names that Perl supports (such as + \p{Letter}) are not supported by PCRE2, nor is it permitted to prefix + any of these properties with "Is". + + No character that is in the Unicode table has the Cn (unassigned) prop- + erty. Instead, this property is assumed for any code point that is not + in the Unicode table. + + Specifying caseless matching does not affect these escape sequences. + For example, \p{Lu} always matches only upper case letters. This is + different from the behaviour of current versions of Perl. + + Matching characters by Unicode property is not fast, because PCRE2 has + to do a multistage table lookup in order to find a character's prop- + erty. That is why the traditional escape sequences such as \d and \w do + not use Unicode properties in PCRE2 by default, though you can make + them do so by setting the PCRE2_UCP option or by starting the pattern + with (*UCP). + + Extended grapheme clusters + + The \X escape matches any number of Unicode characters that form an + "extended grapheme cluster", and treats the sequence as an atomic group + (see below). Unicode supports various kinds of composite character by + giving each character a grapheme breaking property, and having rules + that use these properties to define the boundaries of extended grapheme + clusters. The rules are defined in Unicode Standard Annex 29, "Unicode + Text Segmentation". Unicode 11.0.0 abandoned the use of some previous + properties that had been used for emojis. 
Instead it introduced vari- + ous emoji-specific properties. PCRE2 uses only the Extended Picto- + graphic property. + + \X always matches at least one character. Then it decides whether to + add additional characters according to the following rules for ending a + cluster: + + 1. End at the end of the subject string. + + 2. Do not end between CR and LF; otherwise end after any control char- + acter. + + 3. Do not break Hangul (a Korean script) syllable sequences. Hangul + characters are of five types: L, V, T, LV, and LVT. An L character may + be followed by an L, V, LV, or LVT character; an LV or V character may + be followed by a V or T character; an LVT or T character may be followed + only by a T character. + + 4. Do not end before extending characters or spacing marks or the + "zero-width joiner" character. Characters with the "mark" property al- + ways have the "extend" grapheme breaking property. + + 5. Do not end after prepend characters. + + 6. Do not break within emoji modifier sequences or emoji zwj sequences. + That is, do not break between characters with the Extended_Pictographic + property. Extend and ZWJ characters are allowed between the charac- + ters. + + 7. Do not break within emoji flag sequences. That is, do not break be- + tween regional indicator (RI) characters if there are an odd number of + RI characters before the break point. + + 8. Otherwise, end the cluster. + + PCRE2's additional properties + + As well as the standard Unicode properties described above, PCRE2 sup- + ports four more that make it possible to convert traditional escape se- + quences such as \w and \s to use Unicode properties. PCRE2 uses these + non-standard, non-Perl properties internally when PCRE2_UCP is set. + However, they may also be used explicitly.
These properties are: + + Xan Any alphanumeric character + Xps Any POSIX space character + Xsp Any Perl space character + Xwd Any Perl "word" character + + Xan matches characters that have either the L (letter) or the N (num- + ber) property. Xps matches the characters tab, linefeed, vertical tab, + form feed, or carriage return, and any other character that has the Z + (separator) property. Xsp is the same as Xps; in PCRE1 it used to ex- + clude vertical tab, for Perl compatibility, but Perl changed. Xwd + matches the same characters as Xan, plus underscore. + + There is another non-standard property, Xuc, which matches any charac- + ter that can be represented by a Universal Character Name in C++ and + other programming languages. These are the characters $, @, ` (grave + accent), and all characters with Unicode code points greater than or + equal to U+00A0, except for the surrogates U+D800 to U+DFFF. Note that + most base (ASCII) characters are excluded. (Universal Character Names + are of the form \uHHHH or \UHHHHHHHH where H is a hexadecimal digit. + Note that the Xuc property does not match these sequences but the char- + acters that they represent.) + + Resetting the match start + + In normal use, the escape sequence \K causes any previously matched + characters not to be included in the final matched sequence that is re- + turned. For example, the pattern: + + foo\Kbar + + matches "foobar", but reports that it has matched "bar". \K does not + interact with anchoring in any way. The pattern: + + ^foo\Kbar + + matches only when the subject begins with "foobar" (in single line + mode), though it again reports the matched string as "bar". This fea- + ture is similar to a lookbehind assertion (described below). However, + in this case, the part of the subject before the real match does not + have to be of fixed length, as lookbehind assertions do. The use of \K + does not interfere with the setting of captured substrings. 
For exam- + ple, when the pattern + + (foo)\Kbar + + matches "foobar", the first substring is still set to "foo". + + Perl used to document that the use of \K within lookaround assertions + is "not well defined", but from version 5.32.0 Perl does not support + this usage at all. In PCRE2, \K is acted upon when it occurs inside + positive assertions, but is ignored in negative assertions. Note that + when a pattern such as (?=ab\K) matches, the reported start of the + match can be greater than the end of the match. Using \K in a lookbe- + hind assertion at the start of a pattern can also lead to odd effects. + For example, consider this pattern: + + (?<=\Kfoo)bar + + If the subject is "foobar", a call to pcre2_match() with a starting + offset of 3 succeeds and reports the matching string as "foobar", that + is, the start of the reported match is earlier than where the match + started. + + Simple assertions + + The final use of backslash is for certain simple assertions. An asser- + tion specifies a condition that has to be met at a particular point in + a match, without consuming any characters from the subject string. The + use of groups for more complicated assertions is described below. The + backslashed assertions are: + + \b matches at a word boundary + \B matches when not at a word boundary + \A matches at the start of the subject + \Z matches at the end of the subject + also matches before a newline at the end of the subject + \z matches only at the end of the subject + \G matches at the first matching position in the subject + + Inside a character class, \b has a different meaning; it matches the + backspace character. If any other of these assertions appears in a + character class, an "invalid escape sequence" error is generated. + + A word boundary is a position in the subject string where the current + character and the previous character do not both match \w or \W (i.e. 
+ one matches \w and the other matches \W), or the start or end of the + string if the first or last character matches \w, respectively. When + PCRE2 is built with Unicode support, the meanings of \w and \W can be + changed by setting the PCRE2_UCP option. When this is done, it also af- + fects \b and \B. Neither PCRE2 nor Perl has a separate "start of word" + or "end of word" metasequence. However, whatever follows \b normally + determines which it is. For example, the fragment \ba matches "a" at + the start of a word. + + The \A, \Z, and \z assertions differ from the traditional circumflex + and dollar (described in the next section) in that they only ever match + at the very start and end of the subject string, whatever options are + set. Thus, they are independent of multiline mode. These three asser- + tions are not affected by the PCRE2_NOTBOL or PCRE2_NOTEOL options, + which affect only the behaviour of the circumflex and dollar metachar- + acters. However, if the startoffset argument of pcre2_match() is non- + zero, indicating that matching is to start at a point other than the + beginning of the subject, \A can never match. The difference between + \Z and \z is that \Z matches before a newline at the end of the string + as well as at the very end, whereas \z matches only at the end. + + The \G assertion is true only when the current matching position is at + the start point of the matching process, as specified by the startoff- + set argument of pcre2_match(). It differs from \A when the value of + startoffset is non-zero. By calling pcre2_match() multiple times with + appropriate arguments, you can mimic Perl's /g option, and it is in + this kind of implementation where \G can be useful. + + Note, however, that PCRE2's implementation of \G, being true at the + starting character of the matching process, is subtly different from + Perl's, which defines it as true at the end of the previous match. 
In + Perl, these can be different when the previously matched string was + empty. Because PCRE2 does just one match at a time, it cannot reproduce + this behaviour. + + If all the alternatives of a pattern begin with \G, the expression is + anchored to the starting match position, and the "anchored" flag is set + in the compiled regular expression. + + +CIRCUMFLEX AND DOLLAR + + The circumflex and dollar metacharacters are zero-width assertions. + That is, they test for a particular condition being true without con- + suming any characters from the subject string. These two metacharacters + are concerned with matching the starts and ends of lines. If the new- + line convention is set so that only the two-character sequence CRLF is + recognized as a newline, isolated CR and LF characters are treated as + ordinary data characters, and are not recognized as newlines. + + Outside a character class, in the default matching mode, the circumflex + character is an assertion that is true only if the current matching + point is at the start of the subject string. If the startoffset argu- + ment of pcre2_match() is non-zero, or if PCRE2_NOTBOL is set, circum- + flex can never match if the PCRE2_MULTILINE option is unset. Inside a + character class, circumflex has an entirely different meaning (see be- + low). + + Circumflex need not be the first character of the pattern if a number + of alternatives are involved, but it should be the first thing in each + alternative in which it appears if the pattern is ever to match that + branch. If all possible alternatives start with a circumflex, that is, + if the pattern is constrained to match only at the start of the sub- + ject, it is said to be an "anchored" pattern. (There are also other + constructs that can cause a pattern to be anchored.) 
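The word-boundary definition given under "Simple assertions" can be modelled directly. The Python sketch below assumes the default behaviour (no PCRE2_UCP, "C" locale), in which \w matches only ASCII letters, digits, and underscore; the helper names is_word_char and is_word_boundary are hypothetical and this is not PCRE2's own code.

```python
def is_word_char(ch: str) -> bool:
    # Default \w: an ASCII letter, an ASCII digit, or underscore.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def is_word_boundary(subject: str, pos: int) -> bool:
    """Return True if \\b would be true at offset pos in subject."""
    # The start and end of the string count as "not a word character", so a
    # boundary exists there only if the first or last character matches \w.
    before = pos > 0 and is_word_char(subject[pos - 1])
    after = pos < len(subject) and is_word_char(subject[pos])
    # A boundary is a position where exactly one side matches \w.
    return before != after
```

With the subject "ab cd", this model reports boundaries at offsets 0, 2, 3, and 5; \B is true at every other offset.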
+ + The dollar character is an assertion that is true only if the current + matching point is at the end of the subject string, or immediately be- + fore a newline at the end of the string (by default), unless PCRE2_NO- + TEOL is set. Note, however, that it does not actually match the new- + line. Dollar need not be the last character of the pattern if a number + of alternatives are involved, but it should be the last item in any + branch in which it appears. Dollar has no special meaning in a charac- + ter class. + + The meaning of dollar can be changed so that it matches only at the + very end of the string, by setting the PCRE2_DOLLAR_ENDONLY option at + compile time. This does not affect the \Z assertion. + + The meanings of the circumflex and dollar metacharacters are changed if + the PCRE2_MULTILINE option is set. When this is the case, a dollar + character matches before any newlines in the string, as well as at the + very end, and a circumflex matches immediately after internal newlines + as well as at the start of the subject string. It does not match after + a newline that ends the string, for compatibility with Perl. However, + this can be changed by setting the PCRE2_ALT_CIRCUMFLEX option. + + For example, the pattern /^abc$/ matches the subject string "def\nabc" + (where \n represents a newline) in multiline mode, but not otherwise. + Consequently, patterns that are anchored in single line mode because + all branches start with ^ are not anchored in multiline mode, and a + match for circumflex is possible when the startoffset argument of + pcre2_match() is non-zero. The PCRE2_DOLLAR_ENDONLY option is ignored + if PCRE2_MULTILINE is set. + + When the newline convention (see "Newline conventions" below) recog- + nizes the two-character sequence CRLF as a newline, this is preferred, + even if the single characters CR and LF are also recognized as new- + lines. 
For example, if the newline convention is "any", a multiline + mode circumflex matches before "xyz" in the string "abc\r\nxyz" rather + than after CR, even though CR on its own is a valid newline. (It also + matches at the very start of the string, of course.) + + Note that the sequences \A, \Z, and \z can be used to match the start + and end of the subject in both modes, and if all branches of a pattern + start with \A it is always anchored, whether or not PCRE2_MULTILINE is + set. + + +FULL STOP (PERIOD, DOT) AND \N + + Outside a character class, a dot in the pattern matches any one charac- + ter in the subject string except (by default) a character that signi- + fies the end of a line. + + When a line ending is defined as a single character, dot never matches + that character; when the two-character sequence CRLF is used, dot does + not match CR if it is immediately followed by LF, but otherwise it + matches all characters (including isolated CRs and LFs). When any Uni- + code line endings are being recognized, dot does not match CR or LF or + any of the other line ending characters. + + The behaviour of dot with regard to newlines can be changed. If the + PCRE2_DOTALL option is set, a dot matches any one character, without + exception. If the two-character sequence CRLF is present in the sub- + ject string, it takes two dots to match it. + + The handling of dot is entirely independent of the handling of circum- + flex and dollar, the only relationship being that they both involve + newlines. Dot has no special meaning in a character class. + + The escape sequence \N when not followed by an opening brace behaves + like a dot, except that it is not affected by the PCRE2_DOTALL option. + In other words, it matches any character except one that signifies the + end of a line. + + When \N is followed by an opening brace it has a different meaning. See + the section entitled "Non-printing characters" above for details. 
Perl + also uses \N{name} to specify characters by Unicode name; PCRE2 does + not support this. + + +MATCHING A SINGLE CODE UNIT + + Outside a character class, the escape sequence \C matches any one code + unit, whether or not a UTF mode is set. In the 8-bit library, one code + unit is one byte; in the 16-bit library it is a 16-bit unit; in the + 32-bit library it is a 32-bit unit. Unlike a dot, \C always matches + line-ending characters. The feature is provided in Perl in order to + match individual bytes in UTF-8 mode, but it is unclear how it can use- + fully be used. + + Because \C breaks up characters into individual code units, matching + one unit with \C in UTF-8 or UTF-16 mode means that the rest of the + string may start with a malformed UTF character. This has undefined re- + sults, because PCRE2 assumes that it is matching character by character + in a valid UTF string (by default it checks the subject string's valid- + ity at the start of processing unless the PCRE2_NO_UTF_CHECK or + PCRE2_MATCH_INVALID_UTF option is used). + + An application can lock out the use of \C by setting the + PCRE2_NEVER_BACKSLASH_C option when compiling a pattern. It is also + possible to build PCRE2 with the use of \C permanently disabled. + + PCRE2 does not allow \C to appear in lookbehind assertions (described + below) in UTF-8 or UTF-16 modes, because this would make it impossible + to calculate the length of the lookbehind. Neither the alternative + matching function pcre2_dfa_match() nor the JIT optimizer support \C in + these UTF modes. The former gives a match-time error; the latter fails + to optimize and so the match is always run using the interpreter. + + In the 32-bit library, however, \C is always supported (when not ex- + plicitly locked out) because it always matches a single code unit, + whether or not UTF-32 is specified. + + In general, the \C escape sequence is best avoided. 
However, one way of + using it that avoids the problem of malformed UTF-8 or UTF-16 charac- + ters is to use a lookahead to check the length of the next character, + as in this pattern, which could be used with a UTF-8 string (ignore + white space and line breaks): + + (?| (?=[\x00-\x7f])(\C) | + (?=[\x80-\x{7ff}])(\C)(\C) | + (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) | + (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C)) + + In this example, a group that starts with (?| resets the capturing + parentheses numbers in each alternative (see "Duplicate Group Numbers" + below). The assertions at the start of each branch check the next UTF-8 + character for values whose encoding uses 1, 2, 3, or 4 bytes, respec- + tively. The character's individual bytes are then captured by the ap- + propriate number of \C groups. + + +SQUARE BRACKETS AND CHARACTER CLASSES + + An opening square bracket introduces a character class, terminated by a + closing square bracket. A closing square bracket on its own is not spe- + cial by default. If a closing square bracket is required as a member + of the class, it should be the first data character in the class (after + an initial circumflex, if present) or escaped with a backslash. This + means that, by default, an empty class cannot be defined. However, if + the PCRE2_ALLOW_EMPTY_CLASS option is set, a closing square bracket at + the start does end the (empty) class. + + A character class matches a single character in the subject. A matched + character must be in the set of characters defined by the class, unless + the first character in the class definition is a circumflex, in which + case the subject character must not be in the set defined by the class. + If a circumflex is actually required as a member of the class, ensure + it is not the first character, or escape it with a backslash. + + For example, the character class [aeiou] matches any lower case vowel, + while [^aeiou] matches any character that is not a lower case vowel. 
+ Note that a circumflex is just a convenient notation for specifying the + characters that are in the class by enumerating those that are not. A + class that starts with a circumflex is not an assertion; it still con- + sumes a character from the subject string, and therefore it fails if + the current pointer is at the end of the string. + + Characters in a class may be specified by their code points using \o, + \x, or \N{U+hh..} in the usual way. When caseless matching is set, any + letters in a class represent both their upper case and lower case ver- + sions, so for example, a caseless [aeiou] matches "A" as well as "a", + and a caseless [^aeiou] does not match "A", whereas a caseful version + would. Note that there are two ASCII characters, K and S, that, in ad- + dition to their lower case ASCII equivalents, are case-equivalent with + Unicode U+212A (Kelvin sign) and U+017F (long S) respectively when ei- + ther PCRE2_UTF or PCRE2_UCP is set. + + Characters that might indicate line breaks are never treated in any + special way when matching character classes, whatever line-ending se- + quence is in use, and whatever setting of the PCRE2_DOTALL and + PCRE2_MULTILINE options is used. A class such as [^a] always matches + one of these characters. + + The generic character type escape sequences \d, \D, \h, \H, \p, \P, \s, + \S, \v, \V, \w, and \W may appear in a character class, and add the + characters that they match to the class. For example, [\dABCDEF] + matches any hexadecimal digit. In UTF modes, the PCRE2_UCP option af- + fects the meanings of \d, \s, \w and their upper case partners, just as + it does when they appear outside a character class, as described in the + section entitled "Generic character types" above. The escape sequence + \b has a different meaning inside a character class; it matches the + backspace character. The sequences \B, \R, and \X are not special in- + side a character class. 
Like any other unrecognized escape sequences, + they cause an error. The same is true for \N when not followed by an + opening brace. + + The minus (hyphen) character can be used to specify a range of charac- + ters in a character class. For example, [d-m] matches any letter be- + tween d and m, inclusive. If a minus character is required in a class, + it must be escaped with a backslash or appear in a position where it + cannot be interpreted as indicating a range, typically as the first or + last character in the class, or immediately after a range. For example, + [b-d-z] matches letters in the range b to d, a hyphen character, or z. + + Perl treats a hyphen as a literal if it appears before or after a POSIX + class (see below) or before or after a character type escape such as + \d or \H. However, unless the hyphen is the last character in the + class, Perl outputs a warning in its warning mode, as this is most + likely a user error. As PCRE2 has no facility for warning, an error is + given in these cases. + + It is not possible to have the literal character "]" as the end charac- + ter of a range. A pattern such as [W-]46] is interpreted as a class of + two characters ("W" and "-") followed by a literal string "46]", so it + would match "W46]" or "-46]". However, if the "]" is escaped with a + backslash it is interpreted as the end of range, so [W-\]46] is inter- + preted as a class containing a range followed by two other characters. + The octal or hexadecimal representation of "]" can also be used to end + a range. + + Ranges normally include all code points between the start and end char- + acters, inclusive. They can also be used for code points specified nu- + merically, for example [\000-\037]. Ranges can include any characters + that are valid for the current mode.
In any UTF mode, the so-called + "surrogate" characters (those whose code points lie between 0xd800 and + 0xdfff inclusive) may not be specified explicitly by default (the + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES option disables this check). How- + ever, ranges such as [\x{d7ff}-\x{e000}], which include the surrogates, + are always permitted. + + There is a special case in EBCDIC environments for ranges whose end + points are both specified as literal letters in the same case. For com- + patibility with Perl, EBCDIC code points within the range that are not + letters are omitted. For example, [h-k] matches only four characters, + even though the codes for h and k are 0x88 and 0x92, a range of 11 code + points. However, if the range is specified numerically, for example, + [\x88-\x92] or [h-\x92], all code points are included. + + If a range that includes letters is used when caseless matching is set, + it matches the letters in either case. For example, [W-c] is equivalent + to [][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if + character tables for a French locale are in use, [\xc8-\xcb] matches + accented E characters in both cases. + + A circumflex can conveniently be used with the upper case character + types to specify a more restricted set of characters than the matching + lower case type. For example, the class [^\W_] matches any letter or + digit, but not underscore, whereas [\w] includes underscore. A positive + character class should be read as "something OR something OR ..." and a + negative class as "NOT something AND NOT something AND NOT ...". + + The only metacharacters that are recognized in character classes are + backslash, hyphen (only where it can be interpreted as specifying a + range), circumflex (only at the start), opening square bracket (only + when it can be interpreted as introducing a POSIX class name, or for a + special compatibility feature - see the next two sections), and the + terminating closing square bracket. 
However, escaping other non-al- + phanumeric characters does no harm. + + +POSIX CHARACTER CLASSES + + Perl supports the POSIX notation for character classes. This uses names + enclosed by [: and :] within the enclosing square brackets. PCRE2 also + supports this notation. For example, + + [01[:alpha:]%] + + matches "0", "1", any alphabetic character, or "%". The supported class + names are: + + alnum letters and digits + alpha letters + ascii character codes 0 - 127 + blank space or tab only + cntrl control characters + digit decimal digits (same as \d) + graph printing characters, excluding space + lower lower case letters + print printing characters, including space + punct printing characters, excluding letters and digits and space + space white space (the same as \s from PCRE2 8.34) + upper upper case letters + word "word" characters (same as \w) + xdigit hexadecimal digits + + The default "space" characters are HT (9), LF (10), VT (11), FF (12), + CR (13), and space (32). If locale-specific matching is taking place, + the list of space characters may be different; there may be fewer or + more of them. "Space" and \s match the same set of characters. + + The name "word" is a Perl extension, and "blank" is a GNU extension + from Perl 5.8. Another Perl extension is negation, which is indicated + by a ^ character after the colon. For example, + + [12[:^digit:]] + + matches "1", "2", or any non-digit. PCRE2 (and Perl) also recognize the + POSIX syntax [.ch.] and [=ch=] where "ch" is a "collating element", but + these are not supported, and an error is given if they are encountered. + + By default, characters with values greater than 127 do not match any of + the POSIX character classes, although this may be different for charac- + ters in the range 128-255 when locale-specific matching is happening. + However, if the PCRE2_UCP option is passed to pcre2_compile(), some of + the classes are changed so that Unicode character properties are used. 
+ This is achieved by replacing certain POSIX classes with other se- + quences, as follows: + + [:alnum:] becomes \p{Xan} + [:alpha:] becomes \p{L} + [:blank:] becomes \h + [:cntrl:] becomes \p{Cc} + [:digit:] becomes \p{Nd} + [:lower:] becomes \p{Ll} + [:space:] becomes \p{Xps} + [:upper:] becomes \p{Lu} + [:word:] becomes \p{Xwd} + + Negated versions, such as [:^alpha:], use \P instead of \p. Three other + POSIX classes are handled specially in UCP mode: + + [:graph:] This matches characters that have glyphs that mark the page + when printed. In Unicode property terms, it matches all char- + acters with the L, M, N, P, S, or Cf properties, except for: + + U+061C Arabic Letter Mark + U+180E Mongolian Vowel Separator + U+2066 - U+2069 Various "isolate"s + + + [:print:] This matches the same characters as [:graph:] plus space + characters that are not controls, that is, characters with + the Zs property. + + [:punct:] This matches all characters that have the Unicode P (punctua- + tion) property, plus those characters with code points less + than 256 that have the S (Symbol) property. + + The other POSIX classes are unchanged, and match only characters with + code points less than 256. + + +COMPATIBILITY FEATURE FOR WORD BOUNDARIES + + In the POSIX.2 compliant library that was included in 4.4BSD Unix, the + ugly syntax [[:<:]] and [[:>:]] is used for matching "start of word" + and "end of word". PCRE2 treats these items as follows: + + [[:<:]] is converted to \b(?=\w) + [[:>:]] is converted to \b(?<=\w) + + Only these exact character sequences are recognized. A sequence such as + [a[:<:]b] provokes an error for an unrecognized POSIX class name. This + support is not compatible with Perl. It is provided to help migrations + from other environments, and is best not used in any new patterns.
Note + that \b matches at the start and the end of a word (see "Simple asser- + tions" above), and in a Perl-style pattern the preceding or following + character normally shows which is wanted, without the need for the as- + sertions that are used above in order to give exactly the POSIX behav- + iour. + + +VERTICAL BAR + + Vertical bar characters are used to separate alternative patterns. For + example, the pattern + + gilbert|sullivan + + matches either "gilbert" or "sullivan". Any number of alternatives may + appear, and an empty alternative is permitted (matching the empty + string). The matching process tries each alternative in turn, from left + to right, and the first one that succeeds is used. If the alternatives + are within a group (defined below), "succeeds" means matching the rest + of the main pattern as well as the alternative in the group. + + +INTERNAL OPTION SETTING + + The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, + PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE options + can be changed from within the pattern by a sequence of letters en- + closed between "(?" and ")". These options are Perl-compatible, and + are described in detail in the pcre2api documentation. The option let- + ters are: + + i for PCRE2_CASELESS + m for PCRE2_MULTILINE + n for PCRE2_NO_AUTO_CAPTURE + s for PCRE2_DOTALL + x for PCRE2_EXTENDED + xx for PCRE2_EXTENDED_MORE + + For example, (?im) sets caseless, multiline matching. It is also possi- + ble to unset these options by preceding the relevant letters with a hy- + phen, for example (?-im). The two "extended" options are not indepen- + dent; unsetting either one cancels the effects of both of them. + + A combined setting and unsetting such as (?im-sx), which sets + PCRE2_CASELESS and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and + PCRE2_EXTENDED, is also permitted. Only one hyphen may appear in the + options string. 
If a letter appears both before and after the hyphen, + the option is unset. An empty options setting "(?)" is allowed. Need- + less to say, it has no effect. + + If the first character following (? is a circumflex, it causes all of + the above options to be unset. Thus, (?^) is equivalent to (?-imnsx). + Letters may follow the circumflex to cause some options to be re-in- + stated, but a hyphen may not appear. + + The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be + changed in the same way as the Perl-compatible options by using the + characters J and U respectively. However, these are not unset by (?^). + + When one of these option changes occurs at top level (that is, not in- + side group parentheses), the change applies to the remainder of the + pattern that follows. An option change within a group (see below for a + description of groups) affects only that part of the group that follows + it, so + + (a(?i)b)c + + matches abc and aBc and no other strings (assuming PCRE2_CASELESS is + not used). By this means, options can be made to have different set- + tings in different parts of the pattern. Any changes made in one alter- + native do carry on into subsequent branches within the same group. For + example, + + (a(?i)b|c) + + matches "ab", "aB", "c", and "C", even though when matching "C" the + first branch is abandoned before the option setting. This is because + the effects of option settings happen at compile time. There would be + some very weird behaviour otherwise. + + As a convenient shorthand, if any option settings are required at the + start of a non-capturing group (see the next section), the option let- + ters may appear between the "?" and the ":". Thus the two patterns + + (?i:saturday|sunday) + (?:(?i)saturday|sunday) + + match exactly the same set of strings. + + Note: There are other PCRE2-specific options, applying to the whole + pattern, which can be set by the application when the compiling func- + tion is called. 
In addition, the pattern can contain special leading + sequences such as (*CRLF) to override what the application has set or + what has been defaulted. Details are given in the section entitled + "Newline sequences" above. There are also the (*UTF) and (*UCP) leading + sequences that can be used to set UTF and Unicode property modes; they + are equivalent to setting the PCRE2_UTF and PCRE2_UCP options, respec- + tively. However, the application can set the PCRE2_NEVER_UTF and + PCRE2_NEVER_UCP options, which lock out the use of the (*UTF) and + (*UCP) sequences. + + +GROUPS + + Groups are delimited by parentheses (round brackets), which can be + nested. Turning part of a pattern into a group does two things: + + 1. It localizes a set of alternatives. For example, the pattern + + cat(aract|erpillar|) + + matches "cataract", "caterpillar", or "cat". Without the parentheses, + it would match "cataract", "erpillar" or an empty string. + + 2. It creates a "capture group". This means that, when the whole pat- + tern matches, the portion of the subject string that matched the group + is passed back to the caller, separately from the portion that matched + the whole pattern. (This applies only to the traditional matching + function; the DFA matching function does not support capturing.) + + Opening parentheses are counted from left to right (starting from 1) to + obtain numbers for capture groups. For example, if the string "the red + king" is matched against the pattern + + the ((red|white) (king|queen)) + + the captured substrings are "red king", "red", and "king", and are num- + bered 1, 2, and 3, respectively. + + The fact that plain parentheses fulfil two functions is not always + helpful. There are often times when grouping is required without cap- + turing. If an opening parenthesis is followed by a question mark and a + colon, the group does not do any capturing, and is not counted when + computing the number of any subsequent capture groups. 
For example, if + the string "the white queen" is matched against the pattern + + the ((?:red|white) (king|queen)) + + the captured substrings are "white queen" and "queen", and are numbered + 1 and 2. The maximum number of capture groups is 65535. + + As a convenient shorthand, if any option settings are required at the + start of a non-capturing group, the option letters may appear between + the "?" and the ":". Thus the two patterns + + (?i:saturday|sunday) + (?:(?i)saturday|sunday) + + match exactly the same set of strings. Because alternative branches are + tried from left to right, and options are not reset until the end of + the group is reached, an option setting in one branch does affect sub- + sequent branches, so the above patterns match "SUNDAY" as well as "Sat- + urday". + + +DUPLICATE GROUP NUMBERS + + Perl 5.10 introduced a feature whereby each alternative in a group uses + the same numbers for its capturing parentheses. Such a group starts + with (?| and is itself a non-capturing group. For example, consider + this pattern: + + (?|(Sat)ur|(Sun))day + + Because the two alternatives are inside a (?| group, both sets of cap- + turing parentheses are numbered one. Thus, when the pattern matches, + you can look at captured substring number one, whichever alternative + matched. This construct is useful when you want to capture part, but + not all, of one of a number of alternatives. Inside a (?| group, paren- + theses are numbered as usual, but the number is reset at the start of + each branch. The numbers of any capturing parentheses that follow the + whole group start after the highest number used in any branch. The fol- + lowing example is taken from the Perl documentation. The numbers under- + neath show in which buffer the captured content will be stored. 
+ + # before ---------------branch-reset----------- after + / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x + # 1 2 2 3 2 3 4 + + A backreference to a capture group uses the most recent value that is + set for the group. The following pattern matches "abcabc" or "defdef": + + /(?|(abc)|(def))\1/ + + In contrast, a subroutine call to a capture group always refers to the + first one in the pattern with the given number. The following pattern + matches "abcabc" or "defabc": + + /(?|(abc)|(def))(?1)/ + + A relative reference such as (?-1) is no different: it is just a conve- + nient way of computing an absolute group number. + + If a condition test for a group's having matched refers to a non-unique + number, the test is true if any group with that number has matched. + + An alternative approach to using this "branch reset" feature is to use + duplicate named groups, as described in the next section. + + +NAMED CAPTURE GROUPS + + Identifying capture groups by number is simple, but it can be very hard + to keep track of the numbers in complicated patterns. Furthermore, if + an expression is modified, the numbers may change. To help with this + difficulty, PCRE2 supports the naming of capture groups. This feature + was not added to Perl until release 5.10. Python had the feature ear- + lier, and PCRE1 introduced it at release 4.0, using the Python syntax. + PCRE2 supports both the Perl and the Python syntax. + + In PCRE2, a capture group can be named in one of three ways: + (?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python. + Names may be up to 32 code units long. When PCRE2_UTF is not set, they + may contain only ASCII alphanumeric characters and underscores, but + must start with a non-digit. When PCRE2_UTF is set, the syntax of group + names is extended to allow any Unicode letter or Unicode decimal digit.
+ In other words, group names must match one of these patterns: + + ^[_A-Za-z][_A-Za-z0-9]*\z when PCRE2_UTF is not set + ^[_\p{L}][_\p{L}\p{Nd}]*\z when PCRE2_UTF is set + + References to capture groups from other parts of the pattern, such as + backreferences, recursion, and conditions, can all be made by name as + well as by number. + + Named capture groups are allocated numbers as well as names, exactly as + if the names were not present. In both PCRE2 and Perl, capture groups + are primarily identified by numbers; any names are just aliases for + these numbers. The PCRE2 API provides function calls for extracting the + complete name-to-number translation table from a compiled pattern, as + well as convenience functions for extracting captured substrings by + name. + + Warning: When more than one capture group has the same number, as de- + scribed in the previous section, a name given to one of them applies to + all of them. Perl allows identically numbered groups to have different + names. Consider this pattern, where there are two capture groups, both + numbered 1: + + (?|(?<AA>aa)|(?<BB>bb)) + + Perl allows this, with both names AA and BB as aliases of group 1. + Thus, after a successful match, both names yield the same value (either + "aa" or "bb"). + + In an attempt to reduce confusion, PCRE2 does not allow the same group + number to be associated with more than one name. The example above pro- + vokes a compile-time error. However, there is still scope for confu- + sion. Consider this pattern: + + (?|(?<AA>aa)|(bb)) + + Although the second group number 1 is not explicitly named, the name AA + is still an alias for any group 1. Whether the pattern matches "aa" or + "bb", a reference by name to group AA yields the matched string.
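
As a rough illustration of names acting as aliases for group numbers, the sketch below uses Python's standard re module rather than PCRE2 itself. That substitution is an assumption made to keep the example self-contained: re accepts only the Python-style (?P<name>...) syntax, and its groupindex attribute is only loosely comparable to the PCRE2 name-to-number table. The names DATE, year, month, and name_table are invented for the example.

```python
import re

# Python's re module, NOT PCRE2: only the (?P<name>...) form is accepted here.
# The group names "year" and "month" are hypothetical, chosen for illustration.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")


def name_table(pattern):
    """Return the name-to-number mapping of a compiled pattern as a dict."""
    return dict(pattern.groupindex)


match = DATE.match("2024-05")

# A named group is numbered exactly as if the name were not present,
# so it can be fetched by name or by the number the name stands for.
by_name = match.group("year")
by_number = match.group(name_table(DATE)["year"])
```

Both lookups return the same substring, which is the sense in which a name is only an alias for a number.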
+ + By default, a name must be unique within a pattern, except that dupli- + cate names are permitted for groups with the same number, for example: + + (?|(?<AA>aa)|(?<AA>bb)) + + The duplicate name constraint can be disabled by setting the PCRE2_DUP- + NAMES option at compile time, or by the use of (?J) within the pattern, + as described in the section entitled "Internal Option Setting" above. + + Duplicate names can be useful for patterns where only one instance of + the named capture group can match. Suppose you want to match the name + of a weekday, either as a 3-letter abbreviation or as the full name, + and in both cases you want to extract the abbreviation. This pattern + (ignoring the line breaks) does the job: + + (?J) + (?<DN>Mon|Fri|Sun)(?:day)?| + (?<DN>Tue)(?:sday)?| + (?<DN>Wed)(?:nesday)?| + (?<DN>Thu)(?:rsday)?| + (?<DN>Sat)(?:urday)? + + There are five capture groups, but only one is ever set after a match. + The convenience functions for extracting the data by name return the + substring for the first (and in this example, the only) group of that + name that matched. This saves searching to find which numbered group it + was. (An alternative way of solving this problem is to use a "branch + reset" group, as described in the previous section.) + + If you make a backreference to a non-unique named group from elsewhere + in the pattern, the groups to which the name refers are checked in the + order in which they appear in the overall pattern. The first one that + is set is used for the reference. For example, this pattern matches + both "foofoo" and "barbar" but not "foobar" or "barfoo": + + (?J)(?:(?<n>foo)|(?<n>bar))\k<n> + + + If you make a subroutine call to a non-unique named group, the one that + corresponds to the first occurrence of the name is used. In the absence + of duplicate numbers this is the one with the lowest number.
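
For comparison with the weekday example above, the sketch below shows the searching that duplicate names avoid. It is written for Python's standard re module, not PCRE2, and relies on the fact that re rejects duplicate group names. The distinct names DN1 to DN5 and the helper abbreviation are therefore assumptions introduced for illustration, and the caller has to find out which group was set.

```python
import re

# Python's re module, NOT PCRE2. re has no (?J) option and rejects duplicate
# names, so each branch gets its own hypothetical name (DN1 .. DN5).
WEEKDAY = re.compile(
    r"(?P<DN1>Mon|Fri|Sun)(?:day)?|"
    r"(?P<DN2>Tue)(?:sday)?|"
    r"(?P<DN3>Wed)(?:nesday)?|"
    r"(?P<DN4>Thu)(?:rsday)?|"
    r"(?P<DN5>Sat)(?:urday)?"
)


def abbreviation(text):
    """Return the 3-letter abbreviation if text is a weekday name, else None."""
    m = WEEKDAY.fullmatch(text)
    if m is None:
        return None
    # Exactly one of the five groups is set after a match; lastgroup names it.
    return m.group(m.lastgroup)
```

With PCRE2 and (?J), a single lookup of the shared name would replace the step that works out which of the five groups matched.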
+ + If you use a named reference in a condition test (see the section about + conditions below), either to check whether a capture group has matched, + or to check for recursion, all groups with the same name are tested. If + the condition is true for any one of them, the overall condition is + true. This is the same behaviour as testing by number. For further de- + tails of the interfaces for handling named capture groups, see the + pcre2api documentation. + + +REPETITION + + Repetition is specified by quantifiers, which can follow any of the + following items: + + a literal data character + the dot metacharacter + the \C escape sequence + the \R escape sequence + the \X escape sequence + an escape such as \d or \pL that matches a single character + a character class + a backreference + a parenthesized group (including lookaround assertions) + a subroutine call (recursive or otherwise) + + The general repetition quantifier specifies a minimum and maximum num- + ber of permitted matches, by giving the two numbers in curly brackets + (braces), separated by a comma. The numbers must be less than 65536, + and the first must be less than or equal to the second. For example, + + z{2,4} + + matches "zz", "zzz", or "zzzz". A closing brace on its own is not a + special character. If the second number is omitted, but the comma is + present, there is no upper limit; if the second number and the comma + are both omitted, the quantifier specifies an exact number of required + matches. Thus + + [aeiou]{3,} + + matches at least 3 successive vowels, but may match many more, whereas + + \d{8} + + matches exactly 8 digits. An opening curly bracket that appears in a + position where a quantifier is not allowed, or one that does not match + the syntax of a quantifier, is taken as a literal character. For exam- + ple, {,6} is not a quantifier, but a literal string of four characters. + + In UTF modes, quantifiers apply to characters rather than to individual + code units. 
Thus, for example, \x{100}{2} matches two characters, each + of which is represented by a two-byte sequence in a UTF-8 string. Simi- + larly, \X{3} matches three Unicode extended grapheme clusters, each of + which may be several code units long (and they may be of different + lengths). + + The quantifier {0} is permitted, causing the expression to behave as if + the previous item and the quantifier were not present. This may be use- + ful for capture groups that are referenced as subroutines from else- + where in the pattern (but see also the section entitled "Defining cap- + ture groups for use by reference only" below). Except for parenthesized + groups, items that have a {0} quantifier are omitted from the compiled + pattern. + + For convenience, the three most common quantifiers have single-charac- + ter abbreviations: + + * is equivalent to {0,} + + is equivalent to {1,} + ? is equivalent to {0,1} + + It is possible to construct infinite loops by following a group that + can match no characters with a quantifier that has no upper limit, for + example: + + (a?)* + + Earlier versions of Perl and PCRE1 used to give an error at compile + time for such patterns. However, because there are cases where this can + be useful, such patterns are now accepted, but whenever an iteration of + such a group matches no characters, matching moves on to the next item + in the pattern instead of repeatedly matching an empty string. This + does not prevent backtracking into any of the iterations if a subse- + quent item fails to match. + + By default, quantifiers are "greedy", that is, they match as much as + possible (up to the maximum number of permitted times), without causing + the rest of the pattern to fail. The classic example of where this + gives problems is in trying to match comments in C programs. These ap- + pear between /* and */ and within the comment, individual * and / char- + acters may appear. 
An attempt to match C comments by applying the pat- + tern + + /\*.*\*/ + + to the string + + /* first comment */ not comment /* second comment */ + + fails, because it matches the entire string owing to the greediness of + the .* item. However, if a quantifier is followed by a question mark, + it ceases to be greedy, and instead matches the minimum number of times + possible, so the pattern + + /\*.*?\*/ + + does the right thing with the C comments. The meaning of the various + quantifiers is not otherwise changed, just the preferred number of + matches. Do not confuse this use of question mark with its use as a + quantifier in its own right. Because it has two uses, it can sometimes + appear doubled, as in + + \d??\d + + which matches one digit by preference, but can match two if that is the + only way the rest of the pattern matches. + + If the PCRE2_UNGREEDY option is set (an option that is not available in + Perl), the quantifiers are not greedy by default, but individual ones + can be made greedy by following them with a question mark. In other + words, it inverts the default behaviour. + + When a parenthesized group is quantified with a minimum repeat count + that is greater than 1 or with a limited maximum, more memory is re- + quired for the compiled pattern, in proportion to the size of the mini- + mum or maximum. + + If a pattern starts with .* or .{0,} and the PCRE2_DOTALL option + (equivalent to Perl's /s) is set, thus allowing the dot to match new- + lines, the pattern is implicitly anchored, because whatever follows + will be tried against every character position in the subject string, + so there is no point in retrying the overall match at any position af- + ter the first. PCRE2 normally treats such a pattern as though it were + preceded by \A. 
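
The reasoning behind implicit anchoring can be checked with a small experiment. The sketch below uses Python's standard re module as a stand-in for PCRE2 (an assumption; nothing here shows whether either engine actually applies the optimization). It only shows the property that makes the optimization safe: with a dot that matches newlines, a successful search for a pattern starting with .* begins at offset 0, whereas without that setting the match may begin later. The helper search_start and the subject string are invented for the example.

```python
import re


def search_start(pattern, subject, flags=0):
    """Return the offset at which an unanchored search matches, or None."""
    m = re.search(pattern, subject, flags)
    return None if m is None else m.start()


SUBJECT = "xyz\n123abc"

# Dot matches newline: .* can reach "abc" from the very start of the subject.
with_dotall = search_start(r".*abc", SUBJECT, re.DOTALL)

# Dot stops at the newline: the match can only begin on the second line.
without_dotall = search_start(r".*abc", SUBJECT)
```

Here with_dotall is 0 and without_dotall is 4, which is why the pattern can be treated as anchored only in the first case.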
+ + In cases where it is known that the subject string contains no new- + lines, it is worth setting PCRE2_DOTALL in order to obtain this opti- + mization, or alternatively, using ^ to indicate anchoring explicitly. + + However, there are some cases where the optimization cannot be used. + When .* is inside capturing parentheses that are the subject of a + backreference elsewhere in the pattern, a match at the start may fail + where a later one succeeds. Consider, for example: + + (.*)abc\1 + + If the subject is "xyz123abc123" the match point is the fourth charac- + ter. For this reason, such a pattern is not implicitly anchored. + + Another case where implicit anchoring is not applied is when the lead- + ing .* is inside an atomic group. Once again, a match at the start may + fail where a later one succeeds. Consider this pattern: + + (?>.*?a)b + + It matches "ab" in the subject "aab". The use of the backtracking con- + trol verbs (*PRUNE) and (*SKIP) also disable this optimization, and + there is an option, PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly. + + When a capture group is repeated, the value captured is the substring + that matched the final iteration. For example, after + + (tweedle[dume]{3}\s*)+ + + has matched "tweedledum tweedledee" the value of the captured substring + is "tweedledee". However, if there are nested capture groups, the cor- + responding captured values may have been set in previous iterations. + For example, after + + (a|(b))+ + + matches "aba" the value of the second captured substring is "b". + + +ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS + + With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") + repetition, failure of what follows normally causes the repeated item + to be re-evaluated to see if a different number of repeats allows the + rest of the pattern to match. 
Sometimes it is useful to prevent this, + either to change the nature of the match, or to cause it to fail earlier + than it otherwise might, when the author of the pattern knows there is + no point in carrying on. + + Consider, for example, the pattern \d+foo when applied to the subject + line + + 123456bar + + After matching all 6 digits and then failing to match "foo", the normal + action of the matcher is to try again with only 5 digits matching the + \d+ item, and then with 4, and so on, before ultimately failing. + "Atomic grouping" (a term taken from Jeffrey Friedl's book) provides + the means for specifying that once a group has matched, it is not to be + re-evaluated in this way. + + If we use atomic grouping for the previous example, the matcher gives + up immediately on failing to match "foo" the first time. The notation + is a kind of special parenthesis, starting with (?> as in this example: + + (?>\d+)foo + + Perl 5.28 introduced an experimental alphabetic form starting with (* + which may be easier to remember: + + (*atomic:\d+)foo + + This kind of parenthesized group "locks up" the part of the pattern it + contains once it has matched, and a failure further into the pattern is + prevented from backtracking into it. Backtracking past it to previous + items, however, works as normal. + + An alternative description is that a group of this type matches exactly + the string of characters that an identical standalone pattern would + match, if anchored at the current point in the subject string. + + Atomic groups are not capture groups. Simple cases such as the above + example can be thought of as a maximizing repeat that must swallow ev- + erything it can. So, while both \d+ and \d+? are prepared to adjust + the number of digits they match in order to make the rest of the pat- + tern match, (?>\d+) can only match an entire sequence of digits. + + Atomic groups in general can of course contain arbitrarily complicated + expressions, and can be nested.
However, when the contents of an atomic + group is just a single repeated item, as in the example above, a sim- + pler notation, called a "possessive quantifier" can be used. This con- + sists of an additional + character following a quantifier. Using this + notation, the previous example can be rewritten as + + \d++foo + + Note that a possessive quantifier can be used with an entire group, for + example: + + (abc|xyz){2,3}+ + + Possessive quantifiers are always greedy; the setting of the PCRE2_UN- + GREEDY option is ignored. They are a convenient notation for the sim- + pler forms of atomic group. However, there is no difference in the + meaning of a possessive quantifier and the equivalent atomic group, + though there may be a performance difference; possessive quantifiers + should be slightly faster. + + The possessive quantifier syntax is an extension to the Perl 5.8 syn- + tax. Jeffrey Friedl originated the idea (and the name) in the first + edition of his book. Mike McCloskey liked it, so implemented it when he + built Sun's Java package, and PCRE1 copied it from there. It found its + way into Perl at release 5.10. + + PCRE2 has an optimization that automatically "possessifies" certain + simple pattern constructs. For example, the sequence A+B is treated as + A++B because there is no point in backtracking into a sequence of A's + when B must follow. This feature can be disabled by the PCRE2_NO_AUTO- + POSSESS option, or starting the pattern with (*NO_AUTO_POSSESS). + + When a pattern contains an unlimited repeat inside a group that can it- + self be repeated an unlimited number of times, the use of an atomic + group is the only way to avoid some failing matches taking a very long + time indeed. The pattern + + (\D+|<\d+>)*[!?] + + matches an unlimited number of substrings that either consist of non- + digits, or digits enclosed in <>, followed by either ! or ?. When it + matches, it runs quickly. 
However, if it is applied to + + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + + it takes a long time before reporting failure. This is because the + string can be divided between the internal \D+ repeat and the external + * repeat in a large number of ways, and all have to be tried. (The ex- + ample uses [!?] rather than a single character at the end, because both + PCRE2 and Perl have an optimization that allows for fast failure when a + single character is used. They remember the last single character that + is required for a match, and fail early if it is not present in the + string.) If the pattern is changed so that it uses an atomic group, + like this: + + ((?>\D+)|<\d+>)*[!?] + + sequences of non-digits cannot be broken, and failure happens quickly. + + +BACKREFERENCES + + Outside a character class, a backslash followed by a digit greater than + 0 (and possibly further digits) is a backreference to a capture group + earlier (that is, to its left) in the pattern, provided there have been + that many previous capture groups. + + However, if the decimal number following the backslash is less than 8, + it is always taken as a backreference, and causes an error only if + there are not that many capture groups in the entire pattern. In other + words, the group that is referenced need not be to the left of the ref- + erence for numbers less than 8. A "forward backreference" of this type + can make sense when a repetition is involved and the group to the right + has participated in an earlier iteration. + + It is not possible to have a numerical "forward backreference" to a + group whose number is 8 or more using this syntax because a sequence + such as \50 is interpreted as a character defined in octal. See the + subsection entitled "Non-printing characters" above for further details + of the handling of digits following a backslash. Other forms of back- + referencing do not suffer from this restriction. 
In particular, there + is no problem when named capture groups are used (see below). + + Another way of avoiding the ambiguity inherent in the use of digits + following a backslash is to use the \g escape sequence. This escape + must be followed by a signed or unsigned number, optionally enclosed in + braces. These examples are all equivalent: + + (ring), \1 + (ring), \g1 + (ring), \g{1} + + An unsigned number specifies an absolute reference without the ambigu- + ity that is present in the older syntax. It is also useful when literal + digits follow the reference. A signed number is a relative reference. + Consider this example: + + (abc(def)ghi)\g{-1} + + The sequence \g{-1} is a reference to the most recently started capture + group before \g, that is, it is equivalent to \2 in this example. Simi- + larly, \g{-2} would be equivalent to \1. The use of relative references + can be helpful in long patterns, and also in patterns that are created + by joining together fragments that contain references within them- + selves. + + The sequence \g{+1} is a reference to the next capture group. This kind + of forward reference can be useful in patterns that repeat. Perl does + not support the use of + in this way. + + A backreference matches whatever actually most recently matched the + capture group in the current subject string, rather than anything at + all that matches the group (see "Groups as subroutines" below for a way + of doing that). So the pattern + + (sens|respons)e and \1ibility + + matches "sense and sensibility" and "response and responsibility", but + not "sense and responsibility". If caseful matching is in force at the + time of the backreference, the case of letters is relevant. For exam- + ple, + + ((?i)rah)\s+\1 + + matches "rah rah" and "RAH RAH", but not "RAH rah", even though the + original capture group is matched caselessly. + + There are several different ways of writing backreferences to named + capture groups.
The .NET syntax \k{name} and the Perl syntax \k<name> + or \k'name' are supported, as is the Python syntax (?P=name). Perl + 5.10's unified backreference syntax, in which \g can be used for both + numeric and named references, is also supported. We could rewrite the + above example in any of the following ways: + + (?<p1>(?i)rah)\s+\k<p1> + (?'p1'(?i)rah)\s+\k{p1} + (?P<p1>(?i)rah)\s+(?P=p1) + (?<p1>(?i)rah)\s+\g{p1} + + A capture group that is referenced by name may appear in the pattern + before or after the reference. + + There may be more than one backreference to the same group. If a group + has not actually been used in a particular match, backreferences to it + always fail by default. For example, the pattern + + (a|(bc))\2 + + always fails if it starts to match "a" rather than "bc". However, if + the PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backref- + erence to an unset value matches an empty string. + + Because there may be many capture groups in a pattern, all digits fol- + lowing a backslash are taken as part of a potential backreference num- + ber. If the pattern continues with a digit character, some delimiter + must be used to terminate the backreference. If the PCRE2_EXTENDED or + PCRE2_EXTENDED_MORE option is set, this can be white space. Otherwise, + the \g{} syntax or an empty comment (see "Comments" below) can be used. + + Recursive backreferences + + A backreference that occurs inside the group to which it refers fails + when the group is first used, so, for example, (a\1) never matches. + However, such references can be useful inside repeated groups. For ex- + ample, the pattern + + (a|b\1)+ + + matches any number of "a"s and also "aba", "ababbaa" etc. At each iter- + ation of the group, the backreference matches the character string cor- + responding to the previous iteration. In order for this to work, the + pattern must be such that the first iteration does not need to match + the backreference.
This can be done using alternation, as in the exam- + ple above, or by a quantifier with a minimum of zero. + + For versions of PCRE2 less than 10.25, backreferences of this type used + to cause the group that they reference to be treated as an atomic + group. This restriction no longer applies, and backtracking into such + groups can occur as normal. + + +ASSERTIONS + + An assertion is a test on the characters following or preceding the + current matching point that does not consume any characters. The simple + assertions coded as \b, \B, \A, \G, \Z, \z, ^ and $ are described + above. + + More complicated assertions are coded as parenthesized groups. There + are two kinds: those that look ahead of the current position in the + subject string, and those that look behind it, and in each case an as- + sertion may be positive (must match for the assertion to be true) or + negative (must not match for the assertion to be true). An assertion + group is matched in the normal way, and if it is true, matching contin- + ues after it, but with the matching position in the subject string re- + set to what it was before the assertion was processed. + + The Perl-compatible lookaround assertions are atomic. If an assertion + is true, but there is a subsequent matching failure, there is no back- + tracking into the assertion. However, there are some cases where non- + atomic assertions can be useful. PCRE2 has some support for these, de- + scribed in the section entitled "Non-atomic assertions" below, but they + are not Perl-compatible. + + A lookaround assertion may appear as the condition in a conditional + group (see below). In this case, the result of matching the assertion + determines which branch of the condition is followed. + + Assertion groups are not capture groups. If an assertion contains cap- + ture groups within it, these are counted for the purposes of numbering + the capture groups in the whole pattern. 
Within each branch of an as- + sertion, locally captured substrings may be referenced in the usual + way. For example, a sequence such as (.)\g{-1} can be used to check + that two adjacent characters are the same. + + When a branch within an assertion fails to match, any substrings that + were captured are discarded (as happens with any pattern branch that + fails to match). A negative assertion is true only when all its + branches fail to match; this means that no captured substrings are ever + retained after a successful negative assertion. When an assertion con- + tains a matching branch, what happens depends on the type of assertion. + + For a positive assertion, internally captured substrings in the suc- + cessful branch are retained, and matching continues with the next pat- + tern item after the assertion. For a negative assertion, a matching + branch means that the assertion is not true. If such an assertion is + being used as a condition in a conditional group (see below), captured + substrings are retained, because matching continues with the "no" + branch of the condition. For other failing negative assertions, control + passes to the previous backtracking point, thus discarding any captured + strings within the assertion. + + Most assertion groups may be repeated; though it makes no sense to as- + sert the same thing several times, the side effect of capturing in pos- + itive assertions may occasionally be useful. However, an assertion that + forms the condition for a conditional group may not be quantified. + PCRE2 used to restrict the repetition of assertions, but from release + 10.35 the only restriction is that an unlimited maximum repetition is + changed to be one more than the minimum. For example, {3,} is treated + as {3,4}. + + Alphabetic assertion names + + Traditionally, symbolic sequences such as (?= and (?<= have been used + to specify lookaround assertions. 
Perl 5.28 introduced some experimen- + tal alphabetic alternatives which might be easier to remember. They all + start with (* instead of (? and must be written using lower case let- + ters. PCRE2 supports the following synonyms: + + (*positive_lookahead: or (*pla: is the same as (?= + (*negative_lookahead: or (*nla: is the same as (?! + (*positive_lookbehind: or (*plb: is the same as (?<= + (*negative_lookbehind: or (*nlb: is the same as (?<! + + Consider this pattern, which uses the non-atomic positive lookahead + (*napla: to capture a word that occurs at least twice in the subject: + + ^(?x)(*napla: .* \b(\w++)) (?> .*? \b\1\b ){2} + + For a subject such as "word1 word2 word3 word2 word3 word4" the result + is "word3". How does it work? At the start, ^(?x) anchors the pattern + and sets the "x" option, which causes white space (introduced for read- + ability) to be ignored. Inside the assertion, the greedy .* at first + consumes the entire string, but then has to backtrack until the rest of + the assertion can match a word, which is captured by group 1. In other + words, when the assertion first succeeds, it captures the right-most + word in the string. + + The current matching point is then reset to the start of the subject, + and the rest of the pattern match checks for two occurrences of the + captured word, using an ungreedy .*? to scan from the left. If this + succeeds, we are done, but if the last word in the string does not oc- + cur twice, this part of the pattern fails. If a traditional atomic + lookahead (?= or (*pla: had been used, the assertion could not be re-en- + tered, and the whole match would fail. The pattern would succeed only + if the very last word in the subject was found twice. + + Using a non-atomic lookahead, however, means that when the last word + does not occur twice in the string, the lookahead can backtrack and + find the second-last word, and so on, until either the match succeeds, + or all words have been tested.
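The search order in the walkthrough above can be sketched procedurally. Python's re module has no non-atomic lookahead, so this is not a use of that feature; it only mimics the order in which candidate words are tried, assuming a single-line subject. The function name is hypothetical.

```python
import re

def rightmost_repeated_word(subject):
    """Return the right-most word that occurs at least twice, or None."""
    words = re.findall(r"\w+", subject)
    # Try candidates from the right, as the backtracking lookahead would.
    for candidate in reversed(words):
        # Count whole-word occurrences, the job done by \b\1\b in the pattern.
        occurrences = re.findall(r"\b" + re.escape(candidate) + r"\b", subject)
        if len(occurrences) >= 2:
            return candidate
    # Every word has been tested without success.
    return None

print(rightmost_repeated_word("word1 word2 word3 word2 word3 word4"))
```

Each pass of the loop corresponds to one backtrack into the assertion: a new word is captured and the remainder of the pattern is tried again from the start of the subject.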
+ + Two conditions must be met for a non-atomic assertion to be useful: the + contents of one or more capturing groups must change after a backtrack + into the assertion, and there must be a backreference to a changed + group later in the pattern. If this is not the case, the rest of the + pattern match fails exactly as before because nothing has changed, so + using a non-atomic assertion just wastes resources. + + There is one exception to backtracking into a non-atomic assertion. If + an (*ACCEPT) control verb is triggered, the assertion succeeds atomi- + cally. That is, a subsequent match failure cannot backtrack into the + assertion. + + Non-atomic assertions are not supported by the alternative matching + function pcre2_dfa_match(). They are supported by JIT, but only if they + do not contain any control verbs such as (*ACCEPT). (This may change in + future). Note that assertions that appear as conditions for conditional + groups (see below) must be atomic. + + +SCRIPT RUNS + + In concept, a script run is a sequence of characters that are all from + the same Unicode script such as Latin or Greek. However, because some + scripts are commonly used together, and because some diacritical and + other marks are used with multiple scripts, it is not that simple. + There is a full description of the rules that PCRE2 uses in the section + entitled "Script Runs" in the pcre2unicode documentation. + + If part of a pattern is enclosed between (*script_run: or (*sr: and a + closing parenthesis, it fails if the sequence of characters that it + matches are not a script run. After a failure, normal backtracking oc- + curs. Script runs can be used to detect spoofing attacks using charac- + ters that look the same, but are from different scripts. The string + "paypal.com" is an infamous example, where the letters could be a mix- + ture of Latin and Cyrillic. 
This pattern ensures that the matched char- + acters in a sequence of non-spaces that follow white space are a script + run: + + \s+(*sr:\S+) + + To be sure that they are all from the Latin script (for example), a + lookahead can be used: + + \s+(?=\p{Latin})(*sr:\S+) + + This works as long as the first character is expected to be a character + in that script, and not (for example) punctuation, which is allowed + with any script. If this is not the case, a more creative lookahead is + needed. For example, if digits, underscore, and dots are permitted at + the start: + + \s+(?=[0-9_.]*\p{Latin})(*sr:\S+) + + + In many cases, backtracking into a script run pattern fragment is not + desirable. The script run can employ an atomic group to prevent this. + Because this is a common requirement, a shorthand notation is provided + by (*atomic_script_run: or (*asr: + + (*asr:...) is the same as (*sr:(?>...)) + + Note that the atomic group is inside the script run. Putting it outside + would not prevent backtracking into the script run pattern. + + Support for script runs is not available if PCRE2 is compiled without + Unicode support. A compile-time error is given if any of the above con- + structs is encountered. Script runs are not supported by the alternate + matching function, pcre2_dfa_match() because they use the same mecha- + nism as capturing parentheses. + + Warning: The (*ACCEPT) control verb (see below) should not be used + within a script run group, because it causes an immediate exit from the + group, bypassing the script run checking. + + +CONDITIONAL GROUPS + + It is possible to cause the matching process to obey a pattern fragment + conditionally or to choose between two alternative fragments, depending + on the result of an assertion, or whether a specific capture group has + already been matched. 
The two possible forms of conditional group are: + + (?(condition)yes-pattern) + (?(condition)yes-pattern|no-pattern) + + If the condition is satisfied, the yes-pattern is used; otherwise the + no-pattern (if present) is used. An absent no-pattern is equivalent to + an empty string (it always matches). If there are more than two alter- + natives in the group, a compile-time error occurs. Each of the two al- + ternatives may itself contain nested groups of any form, including con- + ditional groups; the restriction to two alternatives applies only at + the level of the condition itself. This pattern fragment is an example + where the alternatives are complex: + + (?(1) (A|B|C) | (D | (?(2)E|F) | E) ) + + + There are five kinds of condition: references to capture groups, refer- + ences to recursion, two pseudo-conditions called DEFINE and VERSION, + and assertions. + + Checking for a used capture group by number + + If the text between the parentheses consists of a sequence of digits, + the condition is true if a capture group of that number has previously + matched. If there is more than one capture group with the same number + (see the earlier section about duplicate group numbers), the condition + is true if any of them have matched. An alternative notation is to pre- + cede the digits with a plus or minus sign. In this case, the group num- + ber is relative rather than absolute. The most recently opened capture + group can be referenced by (?(-1), the next most recent by (?(-2), and + so on. Inside loops it can also make sense to refer to subsequent + groups. The next capture group can be referenced as (?(+1), and so on. + (The value zero in any of these forms is not used; it provokes a com- + pile-time error.) + + Consider the following pattern, which contains non-significant white + space to make it more readable (assume the PCRE2_EXTENDED option) and + to divide it into three parts for ease of discussion: + + ( \( )? 
[^()]+ (?(1) \) ) + + The first part matches an optional opening parenthesis, and if that + character is present, sets it as the first captured substring. The sec- + ond part matches one or more characters that are not parentheses. The + third part is a conditional group that tests whether or not the first + capture group matched. If it did, that is, if the subject started with + an opening parenthesis, the condition is true, and so the yes-pattern is + executed and a closing parenthesis is required. Otherwise, since no- + pattern is not present, the conditional group matches nothing. In other + words, this pattern matches a sequence of non-parentheses, optionally + enclosed in parentheses. + + If you were embedding this pattern in a larger one, you could use a + relative reference: + + ...other stuff... ( \( )? [^()]+ (?(-1) \) ) ... + + This makes the fragment independent of the parentheses in the larger + pattern. + + Checking for a used capture group by name + + Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a + used capture group by name. For compatibility with earlier versions of + PCRE1, which had this facility before Perl, the syntax (?(name)...) is + also recognized. Note, however, that undelimited names consisting of + the letter R followed by digits are ambiguous (see the following sec- + tion). Rewriting the above example to use a named group gives this: + + (?<OPEN> \( )? [^()]+ (?(<OPEN>) \) ) + + If the name used in a condition of this kind is a duplicate, the test + is applied to all groups of the same name, and is true if any one of + them has matched. + + Checking for pattern recursion + + "Recursion" in this sense refers to any subroutine-like call from one + part of the pattern to another, whether or not it is actually recur- + sive. See the sections entitled "Recursive patterns" and "Groups as + subroutines" below for details of recursion and subroutine calls.
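The optional-parentheses example from the capture group conditions above can be tried in Python, whose re module also has conditional groups. This is a sketch, not PCRE2 itself: re.VERBOSE stands in for PCRE2_EXTENDED, and Python spells the named forms (?P<OPEN>...) and (?(OPEN)...) rather than using the Perl syntax. The helper name is illustrative.

```python
import re

# re.VERBOSE plays the role of PCRE2_EXTENDED: white space in the pattern is ignored.
by_number = re.compile(r"( \( )? [^()]+ (?(1) \) )", re.VERBOSE)

# The same test written against a named group (Python's spelling of the syntax).
by_name = re.compile(r"(?P<OPEN> \( )? [^()]+ (?(OPEN) \) )", re.VERBOSE)

def accepts(pattern, subject):
    """True if the whole subject matches the pattern."""
    return pattern.fullmatch(subject) is not None

for text in ["abc", "(abc)", "(abc", "abc)"]:
    print(text, accepts(by_number, text), accepts(by_name, text))
```

A closing parenthesis is required only when the opening one was captured, so the two unbalanced subjects are rejected by both patterns.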
+ + If a condition is the string (R), and there is no capture group with + the name R, the condition is true if matching is currently in a recur- + sion or subroutine call to the whole pattern or any capture group. If + digits follow the letter R, and there is no group with that name, the + condition is true if the most recent call is into a group with the + given number, which must exist somewhere in the overall pattern. This + is a contrived example that is equivalent to a+b: + + ((?(R1)a+|(?1)b)) + + However, in both cases, if there is a capture group with a matching + name, the condition tests for its being set, as described in the sec- + tion above, instead of testing for recursion. For example, creating a + group with the name R1 by adding (?<R1>) to the above pattern com- + pletely changes its meaning. + + If a name preceded by ampersand follows the letter R, for example: + + (?(R&name)...) + + the condition is true if the most recent recursion is into a group of + that name (which must exist within the pattern). + + This condition does not check the entire recursion stack. It tests only + the current level. If the name used in a condition of this kind is a + duplicate, the test is applied to all groups of the same name, and is + true if any one of them is the most recent recursion. + + At "top level", all these recursion test conditions are false. + + Defining capture groups for use by reference only + + If the condition is the string (DEFINE), the condition is always false, + even if there is a group with the name DEFINE. In this case, there may + be only one alternative in the rest of the conditional group. It is al- + ways skipped if control reaches this point in the pattern; the idea of + DEFINE is that it can be used to define subroutines that can be refer- + enced from elsewhere. (The use of subroutines is described below.)
For + example, a pattern to match an IPv4 address such as "192.168.23.245" + could be written like this (ignore white space and line breaks): + + (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) ) + \b (?&byte) (\.(?&byte)){3} \b + + The first part of the pattern is a DEFINE group inside which another + group named "byte" is defined. This matches an individual component of + an IPv4 address (a number less than 256). When matching takes place, + this part of the pattern is skipped because DEFINE acts like a false + condition. The rest of the pattern uses references to the named group + to match the four dot-separated components of an IPv4 address, insist- + ing on a word boundary at each end. + + Checking the PCRE2 version + + Programs that link with a PCRE2 library can check the version by call- + ing pcre2_config() with appropriate arguments. Users of applications + that do not have access to the underlying code cannot do this. A spe- + cial "condition" called VERSION exists to allow such users to discover + which version of PCRE2 they are dealing with by using this condition to + match a string such as "yesno". VERSION must be followed either by "=" + or ">=" and a version number. For example: + + (?(VERSION>=10.4)yes|no) + + This pattern matches "yes" if the PCRE2 version is greater than or equal + to 10.4, or "no" otherwise. The fractional part of the version number + may not contain more than two digits. + + Assertion conditions + + If the condition is not in any of the above formats, it must be a + parenthesized assertion. This may be a positive or negative lookahead + or lookbehind assertion. However, it must be a traditional atomic as- + sertion, not one of the PCRE2-specific non-atomic assertions.
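The IPv4 example from the DEFINE discussion above relies on subroutine calls, which Python's re module does not provide. As a rough sketch of the same idea, the "byte" fragment can be held in a string and spliced into the pattern where PCRE2 would write (?&byte). The names BYTE, IPV4 and is_ipv4 are hypothetical, and re.ASCII is an added assumption that restricts \d to the digits 0 to 9.

```python
import re

# One component of an IPv4 address (0 to 255): the job of the group named "byte".
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# Splice the fragment in four times, where the PCRE2 pattern calls (?&byte).
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b", re.ASCII)

def is_ipv4(text):
    """True if the whole text is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None

print(is_ipv4("192.168.23.245"))
print(is_ipv4("256.1.1.1"))
```

String splicing duplicates the fragment in the compiled pattern, whereas a DEFINE group keeps a single definition that is called by reference.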
+ + Consider this pattern, again containing non-significant white space, + and with the two alternatives on the second line: + + (?(?=[^a-z]*[a-z]) + \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) + + The condition is a positive lookahead assertion that matches an op- + tional sequence of non-letters followed by a letter. In other words, it + tests for the presence of at least one letter in the subject. If a let- + ter is found, the subject is matched against the first alternative; + otherwise it is matched against the second. This pattern matches + strings in one of the two forms dd-aaa-dd or dd-dd-dd, where aaa are + letters and dd are digits. + + When an assertion that is a condition contains capture groups, any cap- + turing that occurs in a matching branch is retained afterwards, for + both positive and negative assertions, because matching always contin- + ues after the assertion, whether it succeeds or fails. (Compare non- + conditional assertions, for which captures are retained only for posi- + tive assertions that succeed.) + + +COMMENTS + + There are two ways of including comments in patterns that are processed + by PCRE2. In both cases, the start of the comment must not be in a + character class, nor in the middle of any other sequence of related + characters such as (?: or a group name or number. The characters that + make up a comment play no part in the pattern matching. + + The sequence (?# marks the start of a comment that continues up to the + next closing parenthesis. Nested parentheses are not permitted. If the + PCRE2_EXTENDED or PCRE2_EXTENDED_MORE option is set, an unescaped # + character also introduces a comment, which in this case continues to + immediately after the next newline character or character sequence in + the pattern. 
Which characters are interpreted as newlines is controlled + by an option passed to the compiling function or by a special sequence + at the start of the pattern, as described in the section entitled "New- + line conventions" above. Note that the end of this type of comment is a + literal newline sequence in the pattern; escape sequences that happen + to represent a newline do not count. For example, consider this pattern + when PCRE2_EXTENDED is set, and the default newline convention (a sin- + gle linefeed character) is in force: + + abc #comment \n still comment + + On encountering the # character, pcre2_compile() skips along, looking + for a newline in the pattern. The sequence \n is still literal at this + stage, so it does not terminate the comment. Only an actual character + with the code value 0x0a (the default newline) does so. + + +RECURSIVE PATTERNS + + Consider the problem of matching a string in parentheses, allowing for + unlimited nested parentheses. Without the use of recursion, the best + that can be done is to use a pattern that matches up to some fixed + depth of nesting. It is not possible to handle an arbitrary nesting + depth. + + For some time, Perl has provided a facility that allows regular expres- + sions to recurse (amongst other things). It does this by interpolating + Perl code in the expression at run time, and the code can refer to the + expression itself. A Perl pattern using code interpolation to solve the + parentheses problem can be created like this: + + $re = qr{\( (?: (?>[^()]+) | (?p{$re}) )* \)}x; + + The (?p{...}) item interpolates Perl code at run time, and in this case + refers recursively to the pattern in which it appears. + + Obviously, PCRE2 cannot support the interpolation of Perl code. In- + stead, it supports special syntax for recursion of the entire pattern, + and also for individual capture group recursion. 
After its introduction + in PCRE1 and Python, this kind of recursion was subsequently introduced + into Perl at release 5.10. + + A special item that consists of (? followed by a number greater than + zero and a closing parenthesis is a recursive subroutine call of the + capture group of the given number, provided that it occurs inside that + group. (If not, it is a non-recursive subroutine call, which is de- + scribed in the next section.) The special item (?R) or (?0) is a recur- + sive call of the entire regular expression. + + This PCRE2 pattern solves the nested parentheses problem (assume the + PCRE2_EXTENDED option is set so that white space is ignored): + + \( ( [^()]++ | (?R) )* \) + + First it matches an opening parenthesis. Then it matches any number of + substrings which can either be a sequence of non-parentheses, or a re- + cursive match of the pattern itself (that is, a correctly parenthesized + substring). Finally there is a closing parenthesis. Note the use of a + possessive quantifier to avoid backtracking into sequences of non- + parentheses. + + If this were part of a larger pattern, you would not want to recurse + the entire pattern, so instead you could use this: + + ( \( ( [^()]++ | (?1) )* \) ) + + We have put the pattern into parentheses, and caused the recursion to + refer to them instead of the whole pattern. + + In a larger pattern, keeping track of parenthesis numbers can be + tricky. This is made easier by the use of relative references. Instead + of (?1) in the pattern above you can write (?-2) to refer to the second + most recently opened parentheses preceding the recursion. In other + words, a negative number counts capturing parentheses leftwards from + the point at which it is encountered. + + Be aware however, that if duplicate capture group numbers are in use, + relative references refer to the earliest group with the appropriate + number. 
Consider, for example: + + (?|(a)|(b)) (c) (?-2) + + The first two capture groups (a) and (b) are both numbered 1, and group + (c) is number 2. When the reference (?-2) is encountered, the second + most recently opened parentheses has the number 1, but it is the first + such group (the (a) group) to which the recursion refers. This would be + the same if an absolute reference (?1) was used. In other words, rela- + tive references are just a shorthand for computing a group number. + + It is also possible to refer to subsequent capture groups, by writing + references such as (?+2). However, these cannot be recursive because + the reference is not inside the parentheses that are referenced. They + are always non-recursive subroutine calls, as described in the next + section. + + An alternative approach is to use named parentheses. The Perl syntax + for this is (?&name); PCRE1's earlier syntax (?P>name) is also sup- + ported. We could rewrite the above example as follows: + + (?<pn> \( ( [^()]++ | (?&pn) )* \) ) + + If there is more than one group with the same name, the earliest one is + used. + + The example pattern that we have been looking at contains nested unlim- + ited repeats, and so the use of a possessive quantifier for matching + strings of non-parentheses is important when applying the pattern to + strings that do not match. For example, when this pattern is applied to + + (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() + + it yields "no match" quickly. However, if a possessive quantifier is + not used, the match runs for a very long time indeed because there are + so many different ways the + and * repeats can carve up the subject, + and all have to be tested before failure can be reported. + + At the end of a match, the values of capturing parentheses are those + from the outermost level. If you want to obtain intermediate values, a + callout function can be used (see below and the pcre2callout documenta- + tion).
If the pattern above is matched against + + (ab(cd)ef) + + the value for the inner capturing parentheses (numbered 2) is "ef", + which is the last value taken on at the top level. If a capture group + is not matched at the top level, its final captured value is unset, + even if it was (temporarily) set at a deeper level during the matching + process. + + Do not confuse the (?R) item with the condition (R), which tests for + recursion. Consider this pattern, which matches text in angle brack- + ets, allowing for arbitrary nesting. Only digits are allowed in nested + brackets (that is, when recursing), whereas any characters are permit- + ted at the outer level. + + < (?: (?(R) \d++ | [^<>]*+) | (?R)) * > + + In this pattern, (?(R) is the start of a conditional group, with two + different alternatives for the recursive and non-recursive cases. The + (?R) item is the actual recursive call. + + Differences in recursion processing between PCRE2 and Perl + + Some former differences between PCRE2 and Perl no longer exist. + + Before release 10.30, recursion processing in PCRE2 differed from Perl + in that a recursive subroutine call was always treated as an atomic + group. That is, once it had matched some of the subject string, it was + never re-entered, even if it contained untried alternatives and there + was a subsequent matching failure. (Historical note: PCRE implemented + recursion before Perl did.) + + Starting with release 10.30, recursive subroutine calls are no longer + treated as atomic. That is, they can be re-entered to try unused alter- + natives if there is a matching failure later in the pattern. This is + now compatible with the way Perl works. If you want a subroutine call + to be atomic, you must explicitly enclose it in an atomic group. + + Supporting backtracking into recursions simplifies certain types of re- + cursive pattern. 
For example, this pattern matches palindromic strings: + + ^((.)(?1)\2|.?)$ + + The second branch in the group matches a single central character in + the palindrome when there are an odd number of characters, or nothing + when there are an even number of characters, but in order to work it + has to be able to try the second case when the rest of the pattern + match fails. If you want to match typical palindromic phrases, the pat- + tern has to ignore all non-word characters, which can be done like + this: + + ^\W*+((.)\W*+(?1)\W*+\2|\W*+.?)\W*+$ + + If run with the PCRE2_CASELESS option, this pattern matches phrases + such as "A man, a plan, a canal: Panama!". Note the use of the posses- + sive quantifier *+ to avoid backtracking into sequences of non-word + characters. Without this, PCRE2 takes a great deal longer (ten times or + more) to match typical phrases, and Perl takes so long that you think + it has gone into a loop. + + Another way in which PCRE2 and Perl used to differ in their recursion + processing is in the handling of captured values. Formerly in Perl, + when a group was called recursively or as a subroutine (see the next + section), it had no access to any values that were captured outside the + recursion, whereas in PCRE2 these values can be referenced. Consider + this pattern: + + ^(.)(\1|a(?2)) + + This pattern matches "bab". The first capturing parentheses match "b", + then in the second group, when the backreference \1 fails to match "b", + the second alternative matches "a" and then recurses. In the recursion, + \1 does now match "b" and so the whole match succeeds. This match used + to fail in Perl, but in later versions (I tried 5.024) it now works. + + +GROUPS AS SUBROUTINES + + If the syntax for a recursive group call (either by number or by name) + is used outside the parentheses to which it refers, it operates a bit + like a subroutine in a programming language. 
More accurately, PCRE2 + treats the referenced group as an independent subpattern which it tries + to match at the current matching position. The called group may be de- + fined before or after the reference. A numbered reference can be abso- + lute or relative, as in these examples: + + (...(absolute)...)...(?2)... + (...(relative)...)...(?-1)... + (...(?+1)...(relative)... + + An earlier example pointed out that the pattern + + (sens|respons)e and \1ibility + + matches "sense and sensibility" and "response and responsibility", but + not "sense and responsibility". If instead the pattern + + (sens|respons)e and (?1)ibility + + is used, it does match "sense and responsibility" as well as the other + two strings. Another example is given in the discussion of DEFINE + above. + + Like recursions, subroutine calls used to be treated as atomic, but + this changed at PCRE2 release 10.30, so backtracking into subroutine + calls can now occur. However, any capturing parentheses that are set + during the subroutine call revert to their previous values afterwards. + + Processing options such as case-independence are fixed when a group is + defined, so if it is used as a subroutine, such options cannot be + changed for different calls. For example, consider this pattern: + + (abc)(?i:(?-1)) + + It matches "abcabc". It does not match "abcABC" because the change of + processing option does not affect the called group. + + The behaviour of backtracking control verbs in groups when called as + subroutines is described in the section entitled "Backtracking verbs in + subroutines" below. + + +ONIGURUMA SUBROUTINE SYNTAX + + For compatibility with Oniguruma, the non-Perl syntax \g followed by a + name or a number enclosed either in angle brackets or single quotes, is + an alternative syntax for calling a group as a subroutine, possibly re- + cursively. Here are two of the examples used above, rewritten using + this syntax: + + (?<pn>
\( ( (?>[^()]+) | \g<pn> )* \) ) + (sens|respons)e and \g'1'ibility + + PCRE2 supports an extension to Oniguruma: if a number is preceded by a + plus or a minus sign it is taken as a relative reference. For example: + + (abc)(?i:\g<-1>) + + Note that \g{...} (Perl syntax) and \g<...> (Oniguruma syntax) are not + synonymous. The former is a backreference; the latter is a subroutine + call. + + +CALLOUTS + + Perl has a feature whereby using the sequence (?{...}) causes arbitrary + Perl code to be obeyed in the middle of matching a regular expression. + This makes it possible, amongst other things, to extract different sub- + strings that match the same pair of parentheses when there is a repeti- + tion. + + PCRE2 provides a similar feature, but of course it cannot obey arbi- + trary Perl code. The feature is called "callout". The caller of PCRE2 + provides an external function by putting its entry point in a match + context using the function pcre2_set_callout(), and then passing that + context to pcre2_match() or pcre2_dfa_match(). If no match context is + passed, or if the callout entry point is set to NULL, callouts are dis- + abled. + + Within a regular expression, (?C<arg>) indicates a point at which the + external function is to be called. There are two kinds of callout: + those with a numerical argument and those with a string argument. (?C) + on its own with no argument is treated as (?C0). A numerical argument + allows the application to distinguish between different callouts. + String arguments were added for release 10.20 to make it possible for + script languages that use PCRE2 to embed short scripts within patterns + in a similar way to Perl. + + During matching, when PCRE2 reaches a callout point, the external func- + tion is called. It is provided with the number or string argument of + the callout, the position in the pattern, and one item of data that is + also set in the match block.
The callout function may cause matching to + proceed, to backtrack, or to fail. + + By default, PCRE2 implements a number of optimizations at matching + time, and one side-effect is that sometimes callouts are skipped. If + you need all possible callouts to happen, you need to set options that + disable the relevant optimizations. More details, including a complete + description of the programming interface to the callout function, are + given in the pcre2callout documentation. + + Callouts with numerical arguments + + If you just want to have a means of identifying different callout + points, put a number less than 256 after the letter C. For example, + this pattern has two callout points: + + (?C1)abc(?C2)def + + If the PCRE2_AUTO_CALLOUT flag is passed to pcre2_compile(), numerical + callouts are automatically installed before each item in the pattern. + They are all numbered 255. If there is a conditional group in the pat- + tern whose condition is an assertion, an additional callout is inserted + just before the condition. An explicit callout may also be set at this + position, as in this example: + + (?(?C9)(?=a)abc|def) + + Note that this applies only to assertion conditions, not to other types + of condition. + + Callouts with string arguments + + A delimited string may be used instead of a number as a callout argu- + ment. The starting delimiter must be one of ` ' " ^ % # $ { and the + ending delimiter is the same as the start, except for {, where the end- + ing delimiter is }. If the ending delimiter is needed within the + string, it must be doubled. For example: + + (?C'ab ''c'' d')xyz(?C{any text})pqr + + The doubling is removed before the string is passed to the callout + function. + + +BACKTRACKING CONTROL + + There are a number of special "Backtracking Control Verbs" (to use + Perl's terminology) that modify the behaviour of backtracking during + matching. They are generally of the form (*VERB) or (*VERB:NAME). 
Some + verbs take either form, and may behave differently depending on whether + or not a name argument is present. The names are not required to be + unique within the pattern. + + By default, for compatibility with Perl, a name is any sequence of + characters that does not include a closing parenthesis. The name is not + processed in any way, and it is not possible to include a closing + parenthesis in the name. This can be changed by setting the + PCRE2_ALT_VERBNAMES option, but the result is no longer Perl-compati- + ble. + + When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to + verb names and only an unescaped closing parenthesis terminates the + name. However, the only backslash items that are permitted are \Q, \E, + and sequences such as \x{100} that define character code points. Char- + acter type escapes such as \d are faulted. + + A closing parenthesis can be included in a name either as \) or between + \Q and \E. In addition to backslash processing, if the PCRE2_EXTENDED + or PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb + names is skipped, and #-comments are recognized, exactly as in the rest + of the pattern. PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do not affect + verb names unless PCRE2_ALT_VERBNAMES is also set. + + The maximum length of a name is 255 in the 8-bit library and 65535 in + the 16-bit and 32-bit libraries. If the name is empty, that is, if the + closing parenthesis immediately follows the colon, the effect is as if + the colon were not there. Any number of these verbs may occur in a pat- + tern. Except for (*ACCEPT), they may not be quantified. + + Since these verbs are specifically related to backtracking, most of + them can be used only when the pattern is to be matched using the tra- + ditional matching function, because that uses a backtracking algorithm. 
+ With the exception of (*FAIL), which behaves like a failing negative + assertion, the backtracking control verbs cause an error if encountered + by the DFA matching function. + + The behaviour of these verbs in repeated groups, assertions, and in + capture groups called as subroutines (whether or not recursively) is + documented below. + + Optimizations that affect backtracking verbs + + PCRE2 contains some optimizations that are used to speed up matching by + running some checks at the start of each match attempt. For example, it + may know the minimum length of matching subject, or that a particular + character must be present. When one of these optimizations bypasses the + running of a match, any included backtracking verbs will not, of + course, be processed. You can suppress the start-of-match optimizations + by setting the PCRE2_NO_START_OPTIMIZE option when calling pcre2_com- + pile(), or by starting the pattern with (*NO_START_OPT). There is more + discussion of this option in the section entitled "Compiling a pattern" + in the pcre2api documentation. + + Experiments with Perl suggest that it too has similar optimizations, + and like PCRE2, turning them off can change the result of a match. + + Verbs that act immediately + + The following verbs act as soon as they are encountered. + + (*ACCEPT) or (*ACCEPT:NAME) + + This verb causes the match to end successfully, skipping the remainder + of the pattern. However, when it is inside a capture group that is + called as a subroutine, only that group is ended successfully. Matching + then continues at the outer level. If (*ACCEPT) is triggered in a posi- + tive assertion, the assertion succeeds; in a negative assertion, the + assertion fails. + + If (*ACCEPT) is inside capturing parentheses, the data so far is cap- + tured. For example: + + A((?:A|B(*ACCEPT)|C)D) + + This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is cap- + tured by the outer parentheses.
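The effect of (*ACCEPT) in the example A((?:A|B(*ACCEPT)|C)D) can be approximated in an engine that has no backtracking verbs. The following is a minimal sketch, assuming Python's standard re module (which does not support (*ACCEPT)); the pattern EMULATED and the helper accept_demo are hypothetical names, and the rewrite is only equivalent because nothing follows the group in the original pattern.

```python
import re

# Emulation only: Python's standard `re` module has no (*ACCEPT) verb.
# In A((?:A|B(*ACCEPT)|C)D) the B branch ends the match at once, so it is
# rewritten here as a branch that is simply not followed by D.
EMULATED = re.compile(r"A((?:A|C)D|B)")


def accept_demo(subject):
    """Return (whole match, group 1), or None when there is no match."""
    m = EMULATED.match(subject)
    return (m.group(0), m.group(1)) if m else None


if __name__ == "__main__":
    for s in ("AB", "AAD", "ACD", "AD"):
        print(s, accept_demo(s))
```

For "AB" the sketch reports "B" as the captured text, which corresponds to the statement that "B" is captured by the outer parentheses.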
+ + (*ACCEPT) is the only backtracking verb that is allowed to be quanti- + fied because an ungreedy quantification with a minimum of zero acts + only when a backtrack happens. Consider, for example, + + (A(*ACCEPT)??B)C + + where A, B, and C may be complex expressions. After matching "A", the + matcher processes "BC"; if that fails, causing a backtrack, (*ACCEPT) + is triggered and the match succeeds. In both cases, all but C is cap- + tured. Whereas (*COMMIT) (see below) means "fail on backtrack", a re- + peated (*ACCEPT) of this type means "succeed on backtrack". + + Warning: (*ACCEPT) should not be used within a script run group, be- + cause it causes an immediate exit from the group, bypassing the script + run checking. + + (*FAIL) or (*FAIL:NAME) + + This verb causes a matching failure, forcing backtracking to occur. It + may be abbreviated to (*F). It is equivalent to (?!) but easier to + read. The Perl documentation notes that it is probably useful only when + combined with (?{}) or (??{}). Those are, of course, Perl features that + are not present in PCRE2. The nearest equivalent is the callout fea- + ture, as for example in this pattern: + + a+(?C)(*FAIL) + + A match with the string "aaaa" always fails, but the callout is taken + before each backtrack happens (in this example, 10 times). + + (*ACCEPT:NAME) and (*FAIL:NAME) behave the same as (*MARK:NAME)(*AC- + CEPT) and (*MARK:NAME)(*FAIL), respectively, that is, a (*MARK) is + recorded just before the verb acts. + + Recording which path was taken + + There is one verb whose main purpose is to track how a match was ar- + rived at, though it also has a secondary use in conjunction with ad- + vancing the match starting point (see (*SKIP) below). + + (*MARK:NAME) or (*:NAME) + + A name is always required with this verb. For all the other backtrack- + ing control verbs, a NAME argument is optional. 
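The figure of 10 callouts quoted for a+(?C)(*FAIL) applied to "aaaa" can be reproduced with a small model of a backtracking matcher. This is a sketch under stated assumptions, not PCRE2 itself: the function count_callouts is a hypothetical helper, and it assumes that no start-of-match optimization skips a match attempt.

```python
def count_callouts(subject):
    """Count how often the callout in a+(?C)(*FAIL) would be reached.

    Hypothetical model of a backtracking matcher, not PCRE2 itself: it
    assumes start-of-match optimizations do not skip any attempt.
    """
    callouts = 0
    for start in range(len(subject)):  # unanchored: try each start point
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # Greedy a+ first takes the whole run, then gives back one
        # character per backtrack; every length reaches (?C), then (*FAIL).
        for _length in range(run, 0, -1):
            callouts += 1
    return callouts


if __name__ == "__main__":
    print(count_callouts("aaaa"))
```

In this model the four starting positions contribute 4, 3, 2 and 1 callouts respectively, which adds up to the 10 mentioned in the (*FAIL) description.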
+ + When a match succeeds, the name of the last-encountered mark name on + the matching path is passed back to the caller as described in the sec- + tion entitled "Other information about the match" in the pcre2api docu- + mentation. This applies to all instances of (*MARK) and other verbs, + including those inside assertions and atomic groups. However, there are + differences in those cases when (*MARK) is used in conjunction with + (*SKIP) as described below. + + The mark name that was last encountered on the matching path is passed + back. A verb without a NAME argument is ignored for this purpose. Here + is an example of pcre2test output, where the "mark" modifier requests + the retrieval and outputting of (*MARK) data: + + re> /X(*MARK:A)Y|X(*MARK:B)Z/mark + data> XY + 0: XY + MK: A + XZ + 0: XZ + MK: B + + The (*MARK) name is tagged with "MK:" in this output, and in this exam- + ple it indicates which of the two alternatives matched. This is a more + efficient way of obtaining this information than putting each alterna- + tive in its own capturing parentheses. + + If a verb with a name is encountered in a positive assertion that is + true, the name is recorded and passed back if it is the last-encoun- + tered. This does not happen for negative assertions or failing positive + assertions. + + After a partial match or a failed match, the last encountered name in + the entire match process is returned. For example: + + re> /X(*MARK:A)Y|X(*MARK:B)Z/mark + data> XP + No match, mark = B + + Note that in this unanchored example the mark is retained from the + match attempt that started at the letter "X" in the subject. Subsequent + match attempts starting at "P" and then with an empty string do not get + as far as the (*MARK) item, but nevertheless do not reset it. + + If you are interested in (*MARK) values after failed matches, you + should probably set the PCRE2_NO_START_OPTIMIZE option (see above) to + ensure that the match is always attempted. 
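For comparison, the approach that (*MARK) is said to improve on, putting each alternative in its own capturing parentheses, might look as follows. This is a minimal sketch assuming Python's standard re module, which has no (*MARK); the group names A and B and the helper which_branch are hypothetical and chosen only to mirror the mark names in the pcre2test example.

```python
import re

# Assumption: Python's standard `re` has no (*MARK). Each alternative is
# given its own named capture group instead (hypothetical names A and B).
PATTERN = re.compile(r"(?P<A>XY)|(?P<B>XZ)")


def which_branch(subject):
    """Return the name of the alternative that matched, or None."""
    m = PATTERN.search(subject)
    return m.lastgroup if m else None


if __name__ == "__main__":
    for s in ("XY", "XZ", "XP"):
        print(s, which_branch(s))
```

Unlike (*MARK), this approach reports nothing for a failed match such as "XP".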
+ + Verbs that act after backtracking + + The following verbs do nothing when they are encountered. Matching con- + tinues with what follows, but if there is a subsequent match failure, + causing a backtrack to the verb, a failure is forced. That is, back- + tracking cannot pass to the left of the verb. However, when one of + these verbs appears inside an atomic group or in a lookaround assertion + that is true, its effect is confined to that group, because once the + group has been matched, there is never any backtracking into it. Back- + tracking from beyond an assertion or an atomic group ignores the entire + group, and seeks a preceding backtracking point. + + These verbs differ in exactly what kind of failure occurs when back- + tracking reaches them. The behaviour described below is what happens + when the verb is not in a subroutine or an assertion. Subsequent sec- + tions cover these special cases. + + (*COMMIT) or (*COMMIT:NAME) + + This verb causes the whole match to fail outright if there is a later + matching failure that causes backtracking to reach it. Even if the pat- + tern is unanchored, no further attempts to find a match by advancing + the starting point take place. If (*COMMIT) is the only backtracking + verb that is encountered, once it has been passed pcre2_match() is com- + mitted to finding a match at the current starting point, or not at all. + For example: + + a+(*COMMIT)b + + This matches "xxaab" but not "aacaab". It can be thought of as a kind + of dynamic anchor, or "I've started, so I must finish." + + The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COM- + MIT). It is like (*MARK:NAME) in that the name is remembered for pass- + ing back to the caller. However, (*SKIP:NAME) searches only for names + that are set with (*MARK), ignoring those set by any of the other back- + tracking verbs. 
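The behaviour of a+(*COMMIT)b on the two subjects mentioned above can be traced with a small hand-written model. This is a sketch, not PCRE2's implementation: match_commit is a hypothetical helper, it handles only this one pattern, and it assumes that start-of-match optimizations are disabled.

```python
import re


def match_commit(subject):
    """Model a+(*COMMIT)b on an unanchored subject.

    Returns (start, end) of the match, or None. Hypothetical helper that
    assumes start-of-match optimizations are off.
    """
    for start in range(len(subject)):
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        if run == 0:
            continue  # a+ failed before (*COMMIT) was reached: bump along
        # (*COMMIT) has now been passed with a+ holding the whole run.
        end = start + run
        if subject[end:end + 1] == "b":
            return (start, end + 1)
        # The failure of "b" backtracks onto (*COMMIT) before a+ can give
        # anything back, so the whole match fails with no further attempts.
        return None
    return None


if __name__ == "__main__":
    print(match_commit("xxaab"))
    print(match_commit("aacaab"))
    # For contrast, the same pattern without the verb (plain Python re):
    print(re.search(r"a+b", "aacaab").span())
```

Without the verb, the plain pattern a+b still finds "aab" later in "aacaab", which is the further attempt that (*COMMIT) rules out.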
+ + If there is more than one backtracking verb in a pattern, a different + one that follows (*COMMIT) may be triggered first, so merely passing + (*COMMIT) during a match does not always guarantee that a match must be + at this starting point. + + Note that (*COMMIT) at the start of a pattern is not the same as an an- + chor, unless PCRE2's start-of-match optimizations are turned off, as + shown in this output from pcre2test: + + re> /(*COMMIT)abc/ + data> xyzabc + 0: abc + data> + re> /(*COMMIT)abc/no_start_optimize + data> xyzabc + No match + + For the first pattern, PCRE2 knows that any match must start with "a", + so the optimization skips along the subject to "a" before applying the + pattern to the first set of data. The match attempt then succeeds. The + second pattern disables the optimization that skips along to the first + character. The pattern is now applied starting at "x", and so the + (*COMMIT) causes the match to fail without trying any other starting + points. + + (*PRUNE) or (*PRUNE:NAME) + + This verb causes the match to fail at the current starting position in + the subject if there is a later matching failure that causes backtrack- + ing to reach it. If the pattern is unanchored, the normal "bumpalong" + advance to the next starting character then happens. Backtracking can + occur as usual to the left of (*PRUNE), before it is reached, or when + matching to the right of (*PRUNE), but if there is no match to the + right, backtracking cannot cross (*PRUNE). In simple cases, the use of + (*PRUNE) is just an alternative to an atomic group or possessive quan- + tifier, but there are some uses of (*PRUNE) that cannot be expressed in + any other way. In an anchored pattern (*PRUNE) has the same effect as + (*COMMIT). + + The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). + It is like (*MARK:NAME) in that the name is remembered for passing back + to the caller. 
However, (*SKIP:NAME) searches only for names set with + (*MARK), ignoring those set by other backtracking verbs. + + (*SKIP) + + This verb, when given without a name, is like (*PRUNE), except that if + the pattern is unanchored, the "bumpalong" advance is not to the next + character, but to the position in the subject where (*SKIP) was encoun- + tered. (*SKIP) signifies that whatever text was matched leading up to + it cannot be part of a successful match if there is a later mismatch. + Consider: + + a+(*SKIP)b + + If the subject is "aaaac...", after the first match attempt fails + (starting at the first character in the string), the starting point + skips on to start the next attempt at "c". Note that a possessive quan- + tifier does not have the same effect as this example; although it would + suppress backtracking during the first match attempt, the second at- + tempt would start at the second character instead of skipping on to + "c". + + If (*SKIP) is used to specify a new starting position that is the same + as the starting position of the current match, or (by being inside a + lookbehind) earlier, the position specified by (*SKIP) is ignored, and + instead the normal "bumpalong" occurs. + + (*SKIP:NAME) + + When (*SKIP) has an associated name, its behaviour is modified. When + such a (*SKIP) is triggered, the previous path through the pattern is + searched for the most recent (*MARK) that has the same name. If one is + found, the "bumpalong" advance is to the subject position that corre- + sponds to that (*MARK) instead of to where (*SKIP) was encountered. If + no (*MARK) with a matching name is found, the (*SKIP) is ignored. + + The search for a (*MARK) name uses the normal backtracking mechanism, + which means that it does not see (*MARK) settings that are inside + atomic groups or assertions, because they are never re-entered by back- + tracking.
Compare the following pcre2test examples: + + re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: a + 1: a + data: + re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: b + 1: b + + In the first example, the (*MARK) setting is in an atomic group, so it + is not seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. + This allows the second branch of the pattern to be tried at the first + character position. In the second example, the (*MARK) setting is not + in an atomic group. This allows (*SKIP:X) to find the (*MARK) when it + backtracks, and this causes a new matching attempt to start at the sec- + ond character. This time, the (*MARK) is never seen because "a" does + not match "b", so the matcher immediately jumps to the second branch of + the pattern. + + Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It + ignores names that are set by other backtracking verbs. + + (*THEN) or (*THEN:NAME) + + This verb causes a skip to the next innermost alternative when back- + tracking reaches it. That is, it cancels any further backtracking + within the current alternative. Its name comes from the observation + that it can be used for a pattern-based if-then-else block: + + ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... + + If the COND1 pattern matches, FOO is tried (and possibly further items + after the end of the group if FOO succeeds); on failure, the matcher + skips to the second alternative and tries COND2, without backtracking + into COND1. If that succeeds and BAR fails, COND3 is tried. If subse- + quently BAZ fails, there are no more alternatives, so there is a back- + track to whatever came before the entire group. If (*THEN) is not in- + side an alternation, it acts like (*PRUNE). + + The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). + It is like (*MARK:NAME) in that the name is remembered for passing back + to the caller. 
However, (*SKIP:NAME) searches only for names set with + (*MARK), ignoring those set by other backtracking verbs. + + A group that does not contain a | character is just a part of the en- + closing alternative; it is not a nested alternation with only one al- + ternative. The effect of (*THEN) extends beyond such a group to the en- + closing alternative. Consider this pattern, where A, B, etc. are com- + plex pattern fragments that do not contain any | characters at this + level: + + A (B(*THEN)C) | D + + If A and B are matched, but there is a failure in C, matching does not + backtrack into A; instead it moves to the next alternative, that is, D. + However, if the group containing (*THEN) is given an alternative, it + behaves differently: + + A (B(*THEN)C | (*FAIL)) | D + + The effect of (*THEN) is now confined to the inner group. After a fail- + ure in C, matching moves to (*FAIL), which causes the whole group to + fail because there are no more alternatives to try. In this case, + matching does backtrack into A. + + Note that a conditional group is not considered as having two alterna- + tives, because only one is ever used. In other words, the | character + in a conditional group has a different meaning. Ignoring white space, + consider: + + ^.*? (?(?=a) a | b(*THEN)c ) + + If the subject is "ba", this pattern does not match. Because .*? is un- + greedy, it initially matches zero characters. The condition (?=a) then + fails, the character "b" is matched, but "c" is not. At this point, + matching does not backtrack to .*? as might perhaps be expected from + the presence of the | character. The conditional group is part of the + single alternative that comprises the whole pattern, and so the match + fails. (If there was a backtrack into .*?, allowing it to match "b", + the match would succeed.) + + The verbs just described provide four different "strengths" of control + when subsequent matching fails. 
(*THEN) is the weakest, carrying on the + match at the next alternative. (*PRUNE) comes next, failing the match + at the current starting position, but allowing an advance to the next + character (for an unanchored pattern). (*SKIP) is similar, except that + the advance may be more than one character. (*COMMIT) is the strongest, + causing the entire match to fail. + + More than one backtracking verb + + If more than one backtracking verb is present in a pattern, the one + that is backtracked onto first acts. For example, consider this pat- + tern, where A, B, etc. are complex pattern fragments: + + (A(*COMMIT)B(*THEN)C|ABD) + + If A matches but B fails, the backtrack to (*COMMIT) causes the entire + match to fail. However, if A and B match, but C fails, the backtrack to + (*THEN) causes the next alternative (ABD) to be tried. This behaviour + is consistent, but is not always the same as Perl's. It means that if + two or more backtracking verbs appear in succession, all but the last + of them have no effect. Consider this example: + + ...(*COMMIT)(*PRUNE)... + + If there is a matching failure to the right, backtracking onto (*PRUNE) + causes it to be triggered, and its action is taken. There can never be + a backtrack onto (*COMMIT). + + Backtracking verbs in repeated groups + + PCRE2 sometimes differs from Perl in its handling of backtracking verbs + in repeated groups. For example, consider: + + /(a(*COMMIT)b)+ac/ + + If the subject is "abac", Perl matches unless its optimizations are + disabled, but PCRE2 always fails because the (*COMMIT) in the second + repeat of the group acts. + + Backtracking verbs in assertions + + (*FAIL) in any assertion has its normal effect: it forces an immediate + backtrack. The behaviour of the other backtracking verbs depends on + whether or not the assertion is standalone or acting as the condition + in a conditional group.
+ + (*ACCEPT) in a standalone positive assertion causes the assertion to + succeed without any further processing; captured strings and a mark + name (if set) are retained. In a standalone negative assertion, (*AC- + CEPT) causes the assertion to fail without any further processing; cap- + tured substrings and any mark name are discarded. + + If the assertion is a condition, (*ACCEPT) causes the condition to be + true for a positive assertion and false for a negative one; captured + substrings are retained in both cases. + + The remaining verbs act only when a later failure causes a backtrack to + reach them. This means that, for the Perl-compatible assertions, their + effect is confined to the assertion, because Perl lookaround assertions + are atomic. A backtrack that occurs after such an assertion is complete + does not jump back into the assertion. Note in particular that a + (*MARK) name that is set in an assertion is not "seen" by an instance + of (*SKIP:NAME) later in the pattern. + + PCRE2 now supports non-atomic positive assertions, as described in the + section entitled "Non-atomic assertions" above. These assertions must + be standalone (not used as conditions). They are not Perl-compatible. + For these assertions, a later backtrack does jump back into the asser- + tion, and therefore verbs such as (*COMMIT) can be triggered by back- + tracks from later in the pattern. + + The effect of (*THEN) is not allowed to escape beyond an assertion. If + there are no more branches to try, (*THEN) causes a positive assertion + to be false, and a negative assertion to be true. + + The other backtracking verbs are not treated specially if they appear + in a standalone positive assertion. In a conditional positive asser- + tion, backtracking (from within the assertion) into (*COMMIT), (*SKIP), + or (*PRUNE) causes the condition to be false. 
However, for both stand- + alone and conditional negative assertions, backtracking into (*COMMIT), + (*SKIP), or (*PRUNE) causes the assertion to be true, without consider- + ing any further alternative branches. + + Backtracking verbs in subroutines + + These behaviours occur whether or not the group is called recursively. + + (*ACCEPT) in a group called as a subroutine causes the subroutine match + to succeed without any further processing. Matching then continues af- + ter the subroutine call. Perl documents this behaviour. Perl's treat- + ment of the other verbs in subroutines is different in some cases. + + (*FAIL) in a group called as a subroutine has its normal effect: it + forces an immediate backtrack. + + (*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail + when triggered by being backtracked to in a group called as a subrou- + tine. There is then a backtrack at the outer level. + + (*THEN), when triggered, skips to the next alternative in the innermost + enclosing group that has alternatives (its normal behaviour). However, + if there is no such group within the subroutine's group, the subroutine + match fails and there is a backtrack at the outer level. + + +SEE ALSO + + pcre2api(3), pcre2callout(3), pcre2matching(3), pcre2syntax(3), + pcre2(3). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 06 October 2020 + Copyright (c) 1997-2020 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2PERFORM(3) Library Functions Manual PCRE2PERFORM(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +PCRE2 PERFORMANCE + + Two aspects of performance are discussed below: memory usage and pro- + cessing time. The way you express your pattern as a regular expression + can affect both of them. 
+ + +COMPILED PATTERN MEMORY USAGE + + Patterns are compiled by PCRE2 into a reasonably efficient interpretive + code, so that most simple patterns do not use much memory for storing + the compiled version. However, there is one case where the memory usage + of a compiled pattern can be unexpectedly large. If a parenthesized + group has a quantifier with a minimum greater than 1 and/or a limited + maximum, the whole group is repeated in the compiled code. For example, + the pattern + + (abc|def){2,4} + + is compiled as if it were + + (abc|def)(abc|def)((abc|def)(abc|def)?)? + + (Technical aside: It is done this way so that backtrack points within + each of the repetitions can be independently maintained.) + + For regular expressions whose quantifiers use only small numbers, this + is not usually a problem. However, if the numbers are large, and par- + ticularly if such repetitions are nested, the memory usage can become + an embarrassment. For example, the very simple pattern + + ((ab){1,1000}c){1,3} + + uses over 50KiB when compiled using the 8-bit library. When PCRE2 is + compiled with its default internal pointer size of two bytes, the size + limit on a compiled pattern is 65535 code units in the 8-bit and 16-bit + libraries, and this is reached with the above pattern if the outer rep- + etition is increased from 3 to 4. PCRE2 can be compiled to use larger + internal pointers and thus handle larger compiled patterns, but it is + better to try to rewrite your pattern to use less memory if you can. + + One way of reducing the memory usage for such patterns is to make use + of PCRE2's "subroutine" facility. Re-writing the above pattern as + + ((ab)(?2){0,999}c)(?1){0,2} + + reduces the memory requirements to around 16KiB, and indeed it remains + under 20KiB even with the outer repetition increased to 100. However, + this kind of pattern is not always exactly equivalent, because any cap- + tures within subroutine calls are lost when the subroutine completes. 
+ If this is not a problem, this kind of rewriting will allow you to + process patterns that PCRE2 cannot otherwise handle. The matching per- + formance of the two different versions of the pattern is roughly the + same. (This applies from release 10.30 - things were different in ear- + lier releases.) + + +STACK AND HEAP USAGE AT RUN TIME + + From release 10.30, the interpretive (non-JIT) version of pcre2_match() + uses very little system stack at run time. In earlier releases recur- + sive function calls could use a great deal of stack, and this could + cause problems, but this usage has been eliminated. Backtracking posi- + tions are now explicitly remembered in memory frames controlled by the + code. An initial 20KiB vector of frames is allocated on the system + stack (enough for about 100 frames for small patterns), but if this is + insufficient, heap memory is used. The amount of heap memory can be + limited; if the limit is set to zero, only the initial stack vector is + used. Rewriting patterns to be time-efficient, as described below, may + also reduce the memory requirements. + + In contrast to pcre2_match(), pcre2_dfa_match() does use recursive + function calls, but only for processing atomic groups, lookaround as- + sertions, and recursion within the pattern. The original version of the + code used to allocate quite large internal workspace vectors on the + stack, which caused some problems for some patterns in environments + with small stacks. From release 10.32 the code for pcre2_dfa_match() + has been re-factored to use heap memory when necessary for internal + workspace when recursing, though recursive function calls are still + used. + + The "match depth" parameter can be used to limit the depth of function + recursion, and the "match heap" parameter to limit heap memory in + pcre2_dfa_match(). + + +PROCESSING TIME + + Certain items in regular expression patterns are processed more effi- + ciently than others.
It is more efficient to use a character class like + [aeiou] than a set of single-character alternatives such as + (a|e|i|o|u). In general, the simplest construction that provides the + required behaviour is usually the most efficient. Jeffrey Friedl's book + contains a lot of useful general discussion about optimizing regular + expressions for efficient performance. This document contains a few ob- + servations about PCRE2. + + Using Unicode character properties (the \p, \P, and \X escapes) is + slow, because PCRE2 has to use a multi-stage table lookup whenever it + needs a character's property. If you can find an alternative pattern + that does not use character properties, it will probably be faster. + + By default, the escape sequences \b, \d, \s, and \w, and the POSIX + character classes such as [:alpha:] do not use Unicode properties, + partly for backwards compatibility, and partly for performance reasons. + However, you can set the PCRE2_UCP option or start the pattern with + (*UCP) if you want Unicode character properties to be used. This can + double the matching time for items such as \d, when matched with + pcre2_match(); the performance loss is less with a DFA matching func- + tion, and in both cases there is not much difference for \b. + + When a pattern begins with .* not in atomic parentheses, nor in paren- + theses that are the subject of a backreference, and the PCRE2_DOTALL + option is set, the pattern is implicitly anchored by PCRE2, since it + can match only at the start of a subject string. If the pattern has + multiple top-level branches, they must all be anchorable. The optimiza- + tion can be disabled by the PCRE2_NO_DOTSTAR_ANCHOR option, and is au- + tomatically disabled if the pattern contains (*PRUNE) or (*SKIP). 
+ + If PCRE2_DOTALL is not set, PCRE2 cannot make this optimization, be- + cause the dot metacharacter does not then match a newline, and if the + subject string contains newlines, the pattern may match from the char- + acter immediately following one of them instead of from the very start. + For example, the pattern + + .*second + + matches the subject "first\nand second" (where \n stands for a newline + character), with the match starting at the seventh character. In order + to do this, PCRE2 has to retry the match starting after every newline + in the subject. + + If you are using such a pattern with subject strings that do not con- + tain newlines, the best performance is obtained by setting + PCRE2_DOTALL, or starting the pattern with ^.* or ^.*? to indicate ex- + plicit anchoring. That saves PCRE2 from having to scan along the sub- + ject looking for a newline to restart at. + + Beware of patterns that contain nested indefinite repeats. These can + take a long time to run when applied to a string that does not match. + Consider the pattern fragment + + ^(a+)* + + This can match "aaaa" in 16 different ways, and this number increases + very rapidly as the string gets longer. (The * repeat can match 0, 1, + 2, 3, or 4 times, and for each of those cases other than 0 or 4, the + + repeats can match different numbers of times.) When the remainder of + the pattern is such that the entire match is going to fail, PCRE2 has + in principle to try every possible variation, and this can take an ex- + tremely long time, even for relatively short strings. + + An optimization catches some of the more simple cases such as + + (a+)*b + + where a literal character follows. Before embarking on the standard + matching procedure, PCRE2 checks that there is a "b" later in the sub- + ject string, and if there is not, it fails the match immediately. How- + ever, when there is no following literal this optimization cannot be + used. 
You can see the difference by comparing the behaviour of + + (a+)*\d + + with the pattern above. The pattern above gives a failure almost instantly + when applied to a whole line of "a" characters, whereas this one + takes an appreciable time with strings longer than about 20 characters. + + In many cases, the solution to this kind of performance issue is to use + an atomic group or a possessive quantifier. This can often reduce mem- + ory requirements as well. As another example, consider this pattern: + + ([^<]|<(?!inet))+ + + It matches from wherever it starts until it encounters " + + int pcre2_regcomp(regex_t *preg, const char *pattern, + int cflags); + + int pcre2_regexec(const regex_t *preg, const char *string, + size_t nmatch, regmatch_t pmatch[], int eflags); + + size_t pcre2_regerror(int errcode, const regex_t *preg, + char *errbuf, size_t errbuf_size); + + void pcre2_regfree(regex_t *preg); + + +DESCRIPTION + + This set of functions provides a POSIX-style API for the PCRE2 regular + expression 8-bit library. There are no POSIX-style wrappers for PCRE2's + 16-bit and 32-bit libraries. See the pcre2api documentation for a de- + scription of PCRE2's native API, which contains much additional func- + tionality. + + The functions described here are wrapper functions that ultimately call + the PCRE2 native API. Their prototypes are defined in the pcre2posix.h + header file, and they all have unique names starting with pcre2_. How- + ever, the pcre2posix.h header also contains macro definitions that con- + vert the standard POSIX names such as regcomp() into pcre2_regcomp() etc. + This means that a program can use the usual POSIX names without running + the risk of accidentally linking with POSIX functions from a different + library. + + On Unix-like systems the PCRE2 POSIX library is called libpcre2-posix, + so can be accessed by adding -lpcre2-posix to the command for linking + an application.
Because the POSIX functions call the native ones, it is + also necessary to add -lpcre2-8. + + Although they were not defined as prototypes in pcre2posix.h, releases + 10.33 to 10.36 of the library contained functions with the POSIX names + regcomp() etc. These simply passed their arguments to the PCRE2 func- + tions. These functions were provided for backwards compatibility with + earlier versions of PCRE2, which had only POSIX names. However, this + has proved troublesome in situations where a program links with several + libraries, some of which use PCRE2's POSIX interface while others use + the real POSIX functions. For this reason, the POSIX names have been + removed since release 10.37. + + Calling the header file pcre2posix.h avoids any conflict with other + POSIX libraries. It can, of course, be renamed or aliased as regex.h, + which is the "correct" name, if there is no clash. It provides two + structure types, regex_t for compiled internal forms, and regmatch_t + for returning captured substrings. It also defines some constants whose + names start with "REG_"; these are used for setting options and identi- + fying error codes. + + +USING THE POSIX FUNCTIONS + + Those POSIX option bits that can reasonably be mapped to PCRE2 native + options have been implemented. In addition, the option REG_EXTENDED is + defined with the value zero. This has no effect, but since programs + that are written to the POSIX interface often use it, this makes it + easier to slot in PCRE2 as a replacement library. Other POSIX options + are not even defined. + + There are also some options that are not defined by POSIX. These have + been added at the request of users who want to make use of certain + PCRE2-specific features via the POSIX calling interface or to add BSD + or GNU functionality. + + When PCRE2 is called via these functions, it is only the API that is + POSIX-like in style.
The syntax and semantics of the regular expres- + sions themselves are still those of Perl, subject to the setting of + various PCRE2 options, as described below. "POSIX-like in style" means + that the API approximates to the POSIX definition; it is not fully + POSIX-compatible, and in multi-unit encoding domains it is probably + even less compatible. + + The descriptions below use the actual names of the functions, but, as + described above, the standard POSIX names (without the pcre2_ prefix) + may also be used. + + +COMPILING A PATTERN + + The function pcre2_regcomp() is called to compile a pattern into an in- + ternal form. By default, the pattern is a C string terminated by a bi- + nary zero (but see REG_PEND below). The preg argument is a pointer to a + regex_t structure that is used as a base for storing information about + the compiled regular expression. (It is also used for input when + REG_PEND is set.) + + The argument cflags is either zero, or contains one or more of the bits + defined by the following macros: + + REG_DOTALL + + The PCRE2_DOTALL option is set when the regular expression is passed + for compilation to the native function. Note that REG_DOTALL is not + part of the POSIX standard. + + REG_ICASE + + The PCRE2_CASELESS option is set when the regular expression is passed + for compilation to the native function. + + REG_NEWLINE + + The PCRE2_MULTILINE option is set when the regular expression is passed + for compilation to the native function. Note that this does not mimic + the defined POSIX behaviour for REG_NEWLINE (see the following sec- + tion). + + REG_NOSPEC + + The PCRE2_LITERAL option is set when the regular expression is passed + for compilation to the native function. This disables all meta charac- + ters in the pattern, causing it to be treated as a literal string. The + only other options that are allowed with REG_NOSPEC are REG_ICASE, + REG_NOSUB, REG_PEND, and REG_UTF. Note that REG_NOSPEC is not part of + the POSIX standard. 
+ + REG_NOSUB + + When a pattern that is compiled with this flag is passed to + pcre2_regexec() for matching, the nmatch and pmatch arguments are ig- + nored, and no captured strings are returned. Versions of the PCRE2 li- + brary prior to 10.22 used to set the PCRE2_NO_AUTO_CAPTURE compile op- + tion, but this no longer happens because it disables the use of back- + references. + + REG_PEND + + If this option is set, the re_endp field in the preg structure (which + has the type const char *) must be set to point to the character beyond + the end of the pattern before calling pcre2_regcomp(). The pattern it- + self may now contain binary zeros, which are treated as data charac- + ters. Without REG_PEND, a binary zero terminates the pattern and the + re_endp field is ignored. This is a GNU extension to the POSIX standard + and should be used with caution in software intended to be portable to + other systems. + + REG_UCP + + The PCRE2_UCP option is set when the regular expression is passed for + compilation to the native function. This causes PCRE2 to use Unicode + properties when matching \d, \w, etc., instead of just recognizing + ASCII values. Note that REG_UCP is not part of the POSIX standard. + + REG_UNGREEDY + + The PCRE2_UNGREEDY option is set when the regular expression is passed + for compilation to the native function. Note that REG_UNGREEDY is not + part of the POSIX standard. + + REG_UTF + + The PCRE2_UTF option is set when the regular expression is passed for + compilation to the native function. This causes the pattern itself and + all data strings used for matching it to be treated as UTF-8 strings. + Note that REG_UTF is not part of the POSIX standard. + + In the absence of these flags, no options are passed to the native + function. This means that the regex is compiled with PCRE2 default se- + mantics. In particular, the way it handles newline characters in the + subject string is the Perl way, not the POSIX way.
Note that setting + PCRE2_MULTILINE has only some of the effects specified for REG_NEWLINE. + It does not affect the way newlines are matched by the dot metacharac- + ter (they are not) or by a negative class such as [^a] (they are). + + The yield of pcre2_regcomp() is zero on success, and non-zero other- + wise. The preg structure is filled in on success, and one other member + of the structure (as well as re_endp) is public: re_nsub contains the + number of capturing subpatterns in the regular expression. Various er- + ror codes are defined in the header file. + + NOTE: If the yield of pcre2_regcomp() is non-zero, you must not attempt + to use the contents of the preg structure. If, for example, you pass it + to pcre2_regexec(), the result is undefined and your program is likely + to crash. + + +MATCHING NEWLINE CHARACTERS + + This area is not simple, because POSIX and Perl take different views of + things. It is not possible to get PCRE2 to obey POSIX semantics, but + then PCRE2 was never intended to be a POSIX engine. The following table + lists the different possibilities for matching newline characters in + Perl and PCRE2: + + Default Change with + + . matches newline no PCRE2_DOTALL + newline matches [^a] yes not changeable + $ matches \n at end yes PCRE2_DOLLAR_ENDONLY + $ matches \n in middle no PCRE2_MULTILINE + ^ matches \n in middle no PCRE2_MULTILINE + + This is the equivalent table for a POSIX-compatible pattern matcher: + + Default Change with + + . matches newline yes REG_NEWLINE + newline matches [^a] yes REG_NEWLINE + $ matches \n at end no REG_NEWLINE + $ matches \n in middle no REG_NEWLINE + ^ matches \n in middle no REG_NEWLINE + + This behaviour is not what happens when PCRE2 is called via its POSIX + API. By default, PCRE2's behaviour is the same as Perl's, except that + there is no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 + and Perl, there is no way to stop newline from matching [^a]. 
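For comparison, the POSIX column of the table above can be observed with the operating system's own regex.h rather than PCRE2. The following is a minimal sketch under that assumption: it needs a POSIX system, and posix_matches() is a hypothetical helper introduced here for illustration, not a PCRE2 or POSIX function.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE2 or POSIX): returns 1 if the
   pattern matches anywhere in the subject, 0 if it does not. It uses the
   system's POSIX regex.h, so it shows POSIX semantics, not PCRE2's. */
int posix_matches(const char *pattern, const char *subject, int cflags)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is wanted, no captured offsets. */
    rc = regcomp(&re, pattern, cflags | REG_NOSUB);
    assert(rc == 0);   /* the demonstration patterns are assumed valid */

    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With a POSIX-compatible system library, calling posix_matches("a.b", "a\nb", REG_EXTENDED) should report a match, and adding REG_NEWLINE should stop the dot from matching the newline. Built against pcre2posix.h instead, the same default call would be expected to fail, because PCRE2's dot does not match a newline unless REG_DOTALL is given.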
+ + Default POSIX newline handling can be obtained by setting PCRE2_DOTALL + and PCRE2_DOLLAR_ENDONLY when calling pcre2_compile() directly, but + there is no way to make PCRE2 behave exactly as for the REG_NEWLINE ac- + tion. When using the POSIX API, passing REG_NEWLINE to PCRE2's + pcre2_regcomp() function causes PCRE2_MULTILINE to be passed to + pcre2_compile(), and REG_DOTALL passes PCRE2_DOTALL. There is no way to + pass PCRE2_DOLLAR_ENDONLY. + + +MATCHING A PATTERN + + The function pcre2_regexec() is called to match a compiled pattern preg + against a given string, which is by default terminated by a zero byte + (but see REG_STARTEND below), subject to the options in eflags. These + can be: + + REG_NOTBOL + + The PCRE2_NOTBOL option is set when calling the underlying PCRE2 match- + ing function. + + REG_NOTEMPTY + + The PCRE2_NOTEMPTY option is set when calling the underlying PCRE2 + matching function. Note that REG_NOTEMPTY is not part of the POSIX + standard. However, setting this option can give more POSIX-like behav- + iour in some situations. + + REG_NOTEOL + + The PCRE2_NOTEOL option is set when calling the underlying PCRE2 match- + ing function. + + REG_STARTEND + + When this option is set, the subject string starts at string + + pmatch[0].rm_so and ends at string + pmatch[0].rm_eo, which should + point to the first character beyond the string. There may be binary ze- + ros within the subject string, and indeed, using REG_STARTEND is the + only way to pass a subject string that contains a binary zero. + + Whatever the value of pmatch[0].rm_so, the offsets of the matched + string and any captured substrings are still given relative to the + start of string itself. (Before PCRE2 release 10.30 these were given + relative to string + pmatch[0].rm_so, but this differs from other im- + plementations.) 
+ + This is a BSD extension, compatible with but not specified by IEEE + Standard 1003.2 (POSIX.2), and should be used with caution in software + intended to be portable to other systems. Note that a non-zero rm_so + does not imply REG_NOTBOL; REG_STARTEND affects only the location and + length of the string, not how it is matched. Setting REG_STARTEND and + passing pmatch as NULL are mutually exclusive; the error REG_INVARG is + returned. + + If the pattern was compiled with the REG_NOSUB flag, no data about any + matched strings is returned. The nmatch and pmatch arguments of + pcre2_regexec() are ignored (except possibly as input for REG_STAR- + TEND). + + The value of nmatch may be zero, and the value pmatch may be NULL (un- + less REG_STARTEND is set); in both these cases no data about any + matched strings is returned. + + Otherwise, the portion of the string that was matched, and also any + captured substrings, are returned via the pmatch argument, which points + to an array of nmatch structures of type regmatch_t, containing the + members rm_so and rm_eo. These contain the byte offset to the first + character of each substring and the offset to the first character after + the end of each substring, respectively. The 0th element of the vector + relates to the entire portion of string that was matched; subsequent + elements relate to the capturing subpatterns of the regular expression. + Unused entries in the array have both structure members set to -1. + + A successful match yields a zero return; various error codes are de- + fined in the header file, of which REG_NOMATCH is the "expected" fail- + ure code. + + +ERROR MESSAGES + + The pcre2_regerror() function maps a non-zero errorcode from either + pcre2_regcomp() or pcre2_regexec() to a printable message. If preg is + not NULL, the error should have arisen from the use of that structure. + A message terminated by a binary zero is placed in errbuf. 
If the buf- + fer is too short, only the first errbuf_size - 1 characters of the er- + ror message are used. The yield of the function is the size of buffer + needed to hold the whole message, including the terminating zero. This + value is greater than errbuf_size if the message was truncated. + + +MEMORY USAGE + + Compiling a regular expression causes memory to be allocated and asso- + ciated with the preg structure. The function pcre2_regfree() frees all + such memory, after which preg may no longer be used as a compiled ex- + pression. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 26 April 2021 + Copyright (c) 1997-2021 University of Cambridge. +------------------------------------------------------------------------------ + + +PCRE2SAMPLE(3) Library Functions Manual PCRE2SAMPLE(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +PCRE2 SAMPLE PROGRAM + + A simple, complete demonstration program to get you started with using + PCRE2 is supplied in the file pcre2demo.c in the src directory in the + PCRE2 distribution. A listing of this program is given in the pcre2demo + documentation. If you do not have a copy of the PCRE2 distribution, you + can save this listing to re-create the contents of pcre2demo.c. + + The demonstration program compiles the regular expression that is its + first argument, and matches it against the subject string in its second + argument. No PCRE2 options are set, and default character tables are + used. If matching succeeds, the program outputs the portion of the sub- + ject that matched, together with the contents of any captured sub- + strings. + + If the -g option is given on the command line, the program then goes on + to check for further matches of the same regular expression in the same + subject string. The logic is a little bit tricky because of the possi- + bility of matching an empty string. 
Comments in the code explain what + is going on. + + The code in pcre2demo.c is an 8-bit program that uses the PCRE2 8-bit + library. It handles strings and characters that are stored in 8-bit + code units. By default, one character corresponds to one code unit, + but if the pattern starts with "(*UTF)", both it and the subject are + treated as UTF-8 strings, where characters may occupy multiple code + units. + + If PCRE2 is installed in the standard include and library directories + for your operating system, you should be able to compile the demonstra- + tion program using a command like this: + + cc -o pcre2demo pcre2demo.c -lpcre2-8 + + If PCRE2 is installed elsewhere, you may need to add additional options + to the command line. For example, on a Unix-like system that has PCRE2 + installed in /usr/local, you can compile the demonstration program us- + ing a command like this: + + cc -o pcre2demo -I/usr/local/include pcre2demo.c \ + -L/usr/local/lib -lpcre2-8 + + Once you have built the demonstration program, you can run simple tests + like this: + + ./pcre2demo 'cat|dog' 'the cat sat on the mat' + ./pcre2demo -g 'cat|dog' 'the dog sat on the cat' + + Note that there is a much more comprehensive test program, called + pcre2test, which supports many more facilities for testing regular ex- + pressions using all three PCRE2 libraries (8-bit, 16-bit, and 32-bit, + though not all three need be installed). The pcre2demo program is pro- + vided as a relatively simple coding example. + + If you try to run pcre2demo when PCRE2 is not installed in the standard + library directory, you may get an error like this on some operating + systems (e.g. Solaris): + + ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file + or directory + + This is caused by the way shared library support works on those sys- + tems. You need to add + + -R/usr/local/lib + + (for example) to the compile command to get round this problem. 
+ + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 02 February 2016 + Copyright (c) 1997-2016 University of Cambridge. +------------------------------------------------------------------------------ +PCRE2SERIALIZE(3) Library Functions Manual PCRE2SERIALIZE(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS + + int32_t pcre2_serialize_decode(pcre2_code **codes, + int32_t number_of_codes, const uint8_t *bytes, + pcre2_general_context *gcontext); + + int32_t pcre2_serialize_encode(pcre2_code **codes, + int32_t number_of_codes, uint8_t **serialized_bytes, + PCRE2_SIZE *serialized_size, pcre2_general_context *gcontext); + + void pcre2_serialize_free(uint8_t *bytes); + + int32_t pcre2_serialize_get_number_of_codes(const uint8_t *bytes); + + If you are running an application that uses a large number of regular + expression patterns, it may be useful to store them in a precompiled + form instead of having to compile them every time the application is + run. However, if you are using the just-in-time optimization feature, + it is not possible to save and reload the JIT data, because it is posi- + tion-dependent. The host on which the patterns are reloaded must be + running the same version of PCRE2, with the same code unit width, and + must also have the same endianness, pointer width and PCRE2_SIZE type. + For example, patterns compiled on a 32-bit system using PCRE2's 16-bit + library cannot be reloaded on a 64-bit system, nor can they be reloaded + using the 8-bit library. + + Note that "serialization" in PCRE2 does not convert compiled patterns + to an abstract format like Java or .NET serialization. The serialized + output is really just a bytecode dump, which is why it can only be + reloaded in the same environment as the one that created it. Hence the + restrictions mentioned above.
Applications that are not statically + linked with a fixed version of PCRE2 must be prepared to recompile pat- + terns from their sources, in order to be immune to PCRE2 upgrades. + + +SECURITY CONCERNS + + The facility for saving and restoring compiled patterns is intended for + use within individual applications. As such, the data supplied to + pcre2_serialize_decode() is expected to be trusted data, not data from + arbitrary external sources. There is only some simple consistency + checking, not complete validation of what is being re-loaded. Corrupted + data may cause undefined results. For example, if the length field of a + pattern in the serialized data is corrupted, the deserializing code may + read beyond the end of the byte stream that is passed to it. + + +SAVING COMPILED PATTERNS + + Before compiled patterns can be saved they must be serialized, which in + PCRE2 means converting the pattern to a stream of bytes. A single byte + stream may contain any number of compiled patterns, but they must all + use the same character tables. A single copy of the tables is included + in the byte stream (its size is 1088 bytes). For more details of char- + acter tables, see the section on locale support in the pcre2api docu- + mentation. + + The function pcre2_serialize_encode() creates a serialized byte stream + from a list of compiled patterns. Its first two arguments specify the + list, being a pointer to a vector of pointers to compiled patterns, and + the length of the vector. The third and fourth arguments point to vari- + ables which are set to point to the created byte stream and its length, + respectively. The final argument is a pointer to a general context, + which can be used to specify custom memory management functions. If + this argument is NULL, malloc() is used to obtain memory for the byte + stream.
The yield of the function is the number of serialized patterns, + or one of the following negative error codes: + + PCRE2_ERROR_BADDATA the number of patterns is zero or less + PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns + PCRE2_ERROR_MEMORY memory allocation failed + PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables + PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL + + PCRE2_ERROR_BADMAGIC means either that a pattern's code has been cor- + rupted, or that a slot in the vector does not point to a compiled pat- + tern. + + Once a set of patterns has been serialized you can save the data in any + appropriate manner. Here is sample code that compiles two patterns and + writes them to a file. It assumes that the variable fd refers to a file + that is open for output. The error checking that should be present in a + real application has been omitted for simplicity. + + int errorcode; + uint8_t *bytes; + PCRE2_SIZE erroroffset; + PCRE2_SIZE bytescount; + pcre2_code *list_of_codes[2]; + list_of_codes[0] = pcre2_compile("first pattern", + PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL); + list_of_codes[1] = pcre2_compile("second pattern", + PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL); + errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes, + &bytescount, NULL); + errorcode = fwrite(bytes, 1, bytescount, fd); + + Note that the serialized data is binary data that may contain any of + the 256 possible byte values. On systems that make a distinction be- + tween binary and non-binary data, be sure that the file is opened for + binary output. + + Serializing a set of patterns leaves the original data untouched, so + they can still be used for matching. Their memory must eventually be + freed in the usual way by calling pcre2_code_free(). When you have fin- + ished with the byte stream, it too must be freed by calling pcre2_seri- + alize_free(). 
If this function is called with a NULL argument, it re- + turns immediately without doing anything. + + +RE-USING PRECOMPILED PATTERNS + + In order to re-use a set of saved patterns you must first make the se- + rialized byte stream available in main memory (for example, by reading + from a file). The management of this memory block is up to the applica- + tion. You can use the pcre2_serialize_get_number_of_codes() function to + find out how many compiled patterns are in the serialized data without + actually decoding the patterns: + + uint8_t *bytes = <serialized data>; + int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes); + + The pcre2_serialize_decode() function reads a byte stream and recreates + the compiled patterns in new memory blocks, setting pointers to them in + a vector. The first two arguments are a pointer to a suitable vector + and its length, and the third argument points to a byte stream. The fi- + nal argument is a pointer to a general context, which can be used to + specify custom memory management functions for the decoded patterns. + If this argument is NULL, malloc() and free() are used. After deserial- + ization, the byte stream is no longer needed and can be discarded. + + pcre2_code *list_of_codes[2]; + uint8_t *bytes = <serialized data>; + int32_t number_of_codes = + pcre2_serialize_decode(list_of_codes, 2, bytes, NULL); + + If the vector is not large enough for all the patterns in the byte + stream, it is filled with those that fit, and the remainder are ig- + nored.
The yield of the function is the number of decoded patterns, or + one of the following negative error codes: + + PCRE2_ERROR_BADDATA second argument is zero or less + PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data + PCRE2_ERROR_BADMODE mismatch of code unit size or PCRE2 version + PCRE2_ERROR_BADSERIALIZEDDATA other sanity check failure + PCRE2_ERROR_MEMORY memory allocation failed + PCRE2_ERROR_NULL first or third argument is NULL + + PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was + compiled on a system with different endianness. + + Decoded patterns can be used for matching in the usual way, and must be + freed by calling pcre2_code_free(). However, be aware that there is a + potential race issue if you are using multiple patterns that were de- + coded from a single byte stream in a multithreaded application. A sin- + gle copy of the character tables is used by all the decoded patterns + and a reference count is used to arrange for its memory to be automati- + cally freed when the last pattern is freed, but there is no locking on + this reference count. Therefore, if you want to call pcre2_code_free() + for these patterns in different threads, you must arrange your own + locking, and ensure that pcre2_code_free() cannot be called by two + threads at the same time. + + If a pattern was processed by pcre2_jit_compile() before being serial- + ized, the JIT data is discarded and so is no longer available after a + save/restore cycle. You can, however, process a restored pattern with + pcre2_jit_compile() if you wish. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 27 June 2018 + Copyright (c) 1997-2018 University of Cambridge. 
+------------------------------------------------------------------------------ + + +PCRE2SYNTAX(3) Library Functions Manual PCRE2SYNTAX(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY + + The full syntax and semantics of the regular expressions that are sup- + ported by PCRE2 are described in the pcre2pattern documentation. This + document contains a quick-reference summary of the syntax. + + +QUOTING + + \x where x is non-alphanumeric is a literal x + \Q...\E treat enclosed characters as literal + + +ESCAPED CHARACTERS + + This table applies to ASCII and Unicode environments. An unrecognized + escape sequence causes an error. + + \a alarm, that is, the BEL character (hex 07) + \cx "control-x", where x is any ASCII printing character + \e escape (hex 1B) + \f form feed (hex 0C) + \n newline (hex 0A) + \r carriage return (hex 0D) + \t tab (hex 09) + \0dd character with octal code 0dd + \ddd character with octal code ddd, or backreference + \o{ddd..} character with octal code ddd.. + \N{U+hh..} character with Unicode code point hh.. (Unicode mode only) + \xhh character with hex code hh + \x{hh..} character with hex code hh.. + + If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the + following are also recognized: + + \U the character "U" + \uhhhh character with hex code hhhh + \u{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX + + When \x is not followed by {, from zero to two hexadecimal digits are + read, but in ALT_BSUX mode \x must be followed by two hexadecimal dig- + its to be recognized as a hexadecimal escape; otherwise it matches a + literal "x". Likewise, if \u (in ALT_BSUX mode) is not followed by + four hexadecimal digits or (in EXTRA_ALT_BSUX mode) a sequence of hex + digits in curly brackets, it matches a literal "u". + + Note that \0dd is always an octal code. 
The treatment of backslash fol- + lowed by a non-zero digit is complicated; for details see the section + "Non-printing characters" in the pcre2pattern documentation, where de- + tails of escape processing in EBCDIC environments are also given. + \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not supported in + EBCDIC environments. Note that \N not followed by an opening curly + bracket has a different meaning (see below). + + +CHARACTER TYPES + + . any character except newline; + in dotall mode, any character whatsoever + \C one code unit, even in UTF mode (best avoided) + \d a decimal digit + \D a character that is not a decimal digit + \h a horizontal white space character + \H a character that is not a horizontal white space character + \N a character that is not a newline + \p{xx} a character with the xx property + \P{xx} a character without the xx property + \R a newline sequence + \s a white space character + \S a character that is not a white space character + \v a vertical white space character + \V a character that is not a vertical white space character + \w a "word" character + \W a "non-word" character + \X a Unicode extended grapheme cluster + + \C is dangerous because it may leave the current matching point in the + middle of a UTF-8 or UTF-16 character. The application can lock out the + use of \C by setting the PCRE2_NEVER_BACKSLASH_C option. It is also + possible to build PCRE2 with the use of \C permanently disabled. + + By default, \d, \s, and \w match only ASCII characters, even in UTF-8 + mode or in the 16-bit and 32-bit libraries. However, if locale-specific + matching is happening, \s and \w may also match characters with code + points in the range 128-255. If the PCRE2_UCP option is set, the behav- + iour of these escape sequences is changed to use Unicode properties and + they match many more characters. 
+ + +GENERAL CATEGORY PROPERTIES FOR \p and \P + + C Other + Cc Control + Cf Format + Cn Unassigned + Co Private use + Cs Surrogate + + L Letter + Ll Lower case letter + Lm Modifier letter + Lo Other letter + Lt Title case letter + Lu Upper case letter + L& Ll, Lu, or Lt + + M Mark + Mc Spacing mark + Me Enclosing mark + Mn Non-spacing mark + + N Number + Nd Decimal number + Nl Letter number + No Other number + + P Punctuation + Pc Connector punctuation + Pd Dash punctuation + Pe Close punctuation + Pf Final punctuation + Pi Initial punctuation + Po Other punctuation + Ps Open punctuation + + S Symbol + Sc Currency symbol + Sk Modifier symbol + Sm Mathematical symbol + So Other symbol + + Z Separator + Zl Line separator + Zp Paragraph separator + Zs Space separator + + +PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P + + Xan Alphanumeric: union of properties L and N + Xps POSIX space: property Z or tab, NL, VT, FF, CR + Xsp Perl space: property Z or tab, NL, VT, FF, CR + Xuc Universally-named character: one that can be + represented by a Universal Character Name + Xwd Perl word: property Xan or underscore + + Perl and POSIX space are now the same. Perl added VT to its space char- + acter set at release 5.18.
+ + +SCRIPT NAMES FOR \p AND \P + + Adlam, Ahom, Anatolian_Hieroglyphs, Arabic, Armenian, Avestan, Bali- + nese, Bamum, Bassa_Vah, Batak, Bengali, Bhaiksuki, Bopomofo, Brahmi, + Braille, Buginese, Buhid, Canadian_Aboriginal, Carian, Caucasian_Alba- + nian, Chakma, Cham, Cherokee, Chorasmian, Common, Coptic, Cuneiform, + Cypriot, Cyrillic, Deseret, Devanagari, Dives_Akuru, Dogra, Duployan, + Egyptian_Hieroglyphs, Elbasan, Elymaic, Ethiopic, Georgian, Glagolitic, + Gothic, Grantha, Greek, Gujarati, Gunjala_Gondi, Gurmukhi, Han, Hangul, + Hanifi_Rohingya, Hanunoo, Hatran, Hebrew, Hiragana, Imperial_Aramaic, + Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, + Kaithi, Kannada, Katakana, Kayah_Li, Kharoshthi, Khitan_Small_Script, + Khmer, Khojki, Khudawadi, Lao, Latin, Lepcha, Limbu, Linear_A, Lin- + ear_B, Lisu, Lycian, Lydian, Mahajani, Makasar, Malayalam, Mandaic, + Manichaean, Marchen, Masaram_Gondi, Medefaidrin, Meetei_Mayek, + Mende_Kikakui, Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mon- + golian, Mro, Multani, Myanmar, Nabataean, Nandinagari, New_Tai_Lue, + Newa, Nko, Nushu, Nyakeng_Puachue_Hmong, Ogham, Ol_Chiki, Old_Hungar- + ian, Old_Italic, Old_North_Arabian, Old_Permic, Old_Persian, Old_Sog- + dian, Old_South_Arabian, Old_Turkic, Oriya, Osage, Osmanya, Pa- + hawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, + Psalter_Pahlavi, Rejang, Runic, Samaritan, Saurashtra, Sharada, Sha- + vian, Siddham, SignWriting, Sinhala, Sogdian, Sora_Sompeng, Soyombo, + Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le, Tai_Tham, + Tai_Viet, Takri, Tamil, Tangut, Telugu, Thaana, Thai, Tibetan, Tifi- + nagh, Tirhuta, Ugaritic, Vai, Wancho, Warang_Citi, Yezidi, Yi, Zan- + abazar_Square. + + +CHARACTER CLASSES + + [...] positive character class + [^...] 
negative character class + [x-y] range (can be used for hex characters) + [[:xxx:]] positive POSIX named set + [[:^xxx:]] negative POSIX named set + + alnum alphanumeric + alpha alphabetic + ascii 0-127 + blank space or tab + cntrl control character + digit decimal digit + graph printing, excluding space + lower lower case letter + print printing, including space + punct printing, excluding alphanumeric + space white space + upper upper case letter + word same as \w + xdigit hexadecimal digit + + In PCRE2, POSIX character set names recognize only ASCII characters by + default, but some of them use Unicode properties if PCRE2_UCP is set. + You can use \Q...\E inside a character class. + + +QUANTIFIERS + + ? 0 or 1, greedy + ?+ 0 or 1, possessive + ?? 0 or 1, lazy + * 0 or more, greedy + *+ 0 or more, possessive + *? 0 or more, lazy + + 1 or more, greedy + ++ 1 or more, possessive + +? 1 or more, lazy + {n} exactly n + {n,m} at least n, no more than m, greedy + {n,m}+ at least n, no more than m, possessive + {n,m}? at least n, no more than m, lazy + {n,} n or more, greedy + {n,}+ n or more, possessive + {n,}? n or more, lazy + + +ANCHORS AND SIMPLE ASSERTIONS + + \b word boundary + \B not a word boundary + ^ start of subject + also after an internal newline in multiline mode + (after any newline if PCRE2_ALT_CIRCUMFLEX is set) + \A start of subject + $ end of subject + also before newline at end of subject + also before internal newline in multiline mode + \Z end of subject + also before newline at end of subject + \z end of subject + \G first matching position in subject + + +REPORTED MATCH POINT SETTING + + \K set reported start of match + + \K is honoured in positive assertions, but ignored in negative ones. + + +ALTERNATION + + expr|expr|expr... + + +CAPTURING + + (...) capture group + (?<name>...) named capture group (Perl) + (?'name'...) named capture group (Perl) + (?P<name>...) named capture group (Python) + (?:...) non-capture group + (?|...)
non-capture group; reset group numbers for + capture groups in each alternative + + In non-UTF modes, names may contain underscores and ASCII letters and + digits; in UTF modes, any Unicode letters and Unicode decimal digits + are permitted. In both cases, a name must not start with a digit. + + +ATOMIC GROUPS + + (?>...) atomic non-capture group + (*atomic:...) atomic non-capture group + + +COMMENT + + (?#....) comment (not nestable) + + +OPTION SETTING + Changes of these options within a group are automatically cancelled at + the end of the group. + + (?i) caseless + (?J) allow duplicate named groups + (?m) multiline + (?n) no auto capture + (?s) single line (dotall) + (?U) default ungreedy (lazy) + (?x) extended: ignore white space except in classes + (?xx) as (?x) but also ignore space and tab in classes + (?-...) unset option(s) + (?^) unset imnsx options + + Unsetting x or xx unsets both. Several options may be set at once, and + a mixture of setting and unsetting such as (?i-x) is allowed, but there + may be only one hyphen. Setting (but no unsetting) is allowed after (?^ + for example (?^in). An option setting may appear at the start of a non- + capture group, for example (?i:...). + + The following are recognized only at the very start of a pattern or af- + ter one of the newline or \R options with similar syntax. More than one + of them may appear. For the first three, d is a decimal number. 
+ + (*LIMIT_DEPTH=d) set the backtracking limit to d + (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes + (*LIMIT_MATCH=d) set the match limit to d + (*NOTEMPTY) set PCRE2_NOTEMPTY when matching + (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching + (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) + (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR) + (*NO_JIT) disable JIT optimization + (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE) + (*UTF) set appropriate UTF mode for the library in use + (*UCP) set PCRE2_UCP (use Unicode properties for \d etc) + + Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the + value of the limits set by the caller of pcre2_match() or + pcre2_dfa_match(), not increase them. LIMIT_RECURSION is an obsolete + synonym for LIMIT_DEPTH. The application can lock out the use of (*UTF) + and (*UCP) by setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, + respectively, at compile time. + + +NEWLINE CONVENTION + + These are recognized only at the very start of the pattern or after op- + tion settings with a similar syntax. + + (*CR) carriage return only + (*LF) linefeed only + (*CRLF) carriage return followed by linefeed + (*ANYCRLF) all three of the above + (*ANY) any Unicode newline sequence + (*NUL) the NUL character (binary zero) + + +WHAT \R MATCHES + + These are recognized only at the very start of the pattern or after op- + tion setting with a similar syntax. + + (*BSR_ANYCRLF) CR, LF, or CRLF + (*BSR_UNICODE) any Unicode newline sequence + + +LOOKAHEAD AND LOOKBEHIND ASSERTIONS + + (?=...) ) + (*pla:...) ) positive lookahead + (*positive_lookahead:...) ) + + (?!...) ) + (*nla:...) ) negative lookahead + (*negative_lookahead:...) ) + + (?<=...) ) + (*plb:...) ) positive lookbehind + (*positive_lookbehind:...) ) + + (? 
\k<name> reference by name (Perl) + \k'name' reference by name (Perl) + \g{name} reference by name (Perl) + \k{name} reference by name (.NET) + (?P=name) reference by name (Python) + + +SUBROUTINE REFERENCES (POSSIBLY RECURSIVE) + + (?R) recurse whole pattern + (?n) call subroutine by absolute number + (?+n) call subroutine by relative number + (?-n) call subroutine by relative number + (?&name) call subroutine by name (Perl) + (?P>name) call subroutine by name (Python) + \g<name> call subroutine by name (Oniguruma) + \g'name' call subroutine by name (Oniguruma) + \g<n> call subroutine by absolute number (Oniguruma) + \g'n' call subroutine by absolute number (Oniguruma) + \g<+n> call subroutine by relative number (PCRE2 extension) + \g'+n' call subroutine by relative number (PCRE2 extension) + \g<-n> call subroutine by relative number (PCRE2 extension) + \g'-n' call subroutine by relative number (PCRE2 extension) + + +CONDITIONAL PATTERNS + + (?(condition)yes-pattern) + (?(condition)yes-pattern|no-pattern) + + (?(n) absolute reference condition + (?(+n) relative reference condition + (?(-n) relative reference condition + (?(<name>) named reference condition (Perl) + (?('name') named reference condition (Perl) + (?(name) named reference condition (PCRE2, deprecated) + (?(R) overall recursion condition + (?(Rn) specific numbered group recursion condition + (?(R&name) specific named group recursion condition + (?(DEFINE) define groups for reference + (?(VERSION[>]=n.m) test PCRE2 version + (?(assert) assertion condition + + Note the ambiguity of (?(R) and (?(Rn) which might be named reference + conditions or recursion tests. Such a condition is interpreted as a + reference condition if the relevant named group exists. + + +BACKTRACKING CONTROL + + All backtracking control verbs may be in the form (*VERB:NAME). For + (*MARK) the name is mandatory, for the others it is optional. (*SKIP) + changes its behaviour if :NAME is present.
The others just set a name + for passing back to the caller, but this is not a name that (*SKIP) can + see. The following act immediately they are reached: + + (*ACCEPT) force successful match + (*FAIL) force backtrack; synonym (*F) + (*MARK:NAME) set name to be passed back; synonym (*:NAME) + + The following act only when a subsequent match failure causes a back- + track to reach them. They all force a match failure, but they differ in + what happens afterwards. Those that advance the start-of-match point do + so only if the pattern is not anchored. + + (*COMMIT) overall failure, no advance of starting point + (*PRUNE) advance to next starting character + (*SKIP) advance to current matching position + (*SKIP:NAME) advance to position corresponding to an earlier + (*MARK:NAME); if not found, the (*SKIP) is ignored + (*THEN) local failure, backtrack to next alternation + + The effect of one of these verbs in a group called as a subroutine is + confined to the subroutine call. + + +CALLOUTS + + (?C) callout (assumed number 0) + (?Cn) callout with numerical data n + (?C"text") callout with string data + + The allowed string delimiters are ` ' " ^ % # $ (which are the same for + the start and the end), and the starting delimiter { matched with the + ending delimiter }. To encode the ending delimiter within the string, + double it. + + +SEE ALSO + + pcre2pattern(3), pcre2api(3), pcre2callout(3), pcre2matching(3), + pcre2(3). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 28 December 2019 + Copyright (c) 1997-2019 University of Cambridge. 
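The delimiter rule in the CALLOUTS section of pcre2syntax above (double the ending delimiter to include it in the string) can be sketched as a small helper. This is only an illustration: format_callout() is a hypothetical name, not part of the PCRE2 API, and it builds the pattern text only, without compiling anything.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a PCRE2 function): writes (?C<delim>text<end>)
   into out, doubling every occurrence of the ending delimiter inside text.
   Returns the number of characters written (excluding the terminating NUL),
   or 0 if the delimiter is not one of the allowed ones or out is too small. */
size_t format_callout(const char *text, char delim, char *out, size_t outsize)
{
    char end;
    size_t n = 0;

    /* Allowed starting delimiters: ` ' " ^ % # $ and { */
    if (delim == '\0' || strchr("`'\"^%#${", delim) == NULL) return 0;
    end = (delim == '{') ? '}' : delim;

    /* Worst case: every character doubled, plus "(?C", two delimiters,
       the closing parenthesis and the terminating NUL. */
    if (outsize < 2 * strlen(text) + 7) return 0;

    out[n++] = '(';
    out[n++] = '?';
    out[n++] = 'C';
    out[n++] = delim;
    for (; *text != '\0'; text++) {
        out[n++] = *text;
        if (*text == end) out[n++] = end;   /* double the ending delimiter */
    }
    out[n++] = end;
    out[n++] = ')';
    out[n] = '\0';
    return n;
}
```

With the { delimiter, for example, the text a}b would be written as (?C{a}}b}).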
+------------------------------------------------------------------------------ + + +PCRE2UNICODE(3) Library Functions Manual PCRE2UNICODE(3) + + + +NAME + PCRE2 - Perl-compatible regular expressions (revised API) + +UNICODE AND UTF SUPPORT + + PCRE2 is normally built with Unicode support, though if you do not need + it, you can build it without, in which case the library will be + smaller. With Unicode support, PCRE2 has knowledge of Unicode character + properties and can process strings of text in UTF-8, UTF-16, and UTF-32 + format (depending on the code unit width), but this is not the default. + Unless specifically requested, PCRE2 treats each code unit in a string + as one character. + + There are two ways of telling PCRE2 to switch to UTF mode, where char- + acters may consist of more than one code unit and the range of values + is constrained. The program can call pcre2_compile() with the PCRE2_UTF + option, or the pattern may start with the sequence (*UTF). However, + the latter facility can be locked out by the PCRE2_NEVER_UTF option. + That is, the programmer can prevent the supplier of the pattern from + switching to UTF mode. + + Note that the PCRE2_MATCH_INVALID_UTF option (see below) forces + PCRE2_UTF to be set. + + In UTF mode, both the pattern and any subject strings that are matched + against it are treated as UTF strings instead of strings of individual + one-code-unit characters. There are also some other changes to the way + characters are handled, as documented below. + + +UNICODE PROPERTY SUPPORT + + When PCRE2 is built with Unicode support, the escape sequences \p{..}, + \P{..}, and \X can be used. This is not dependent on the PCRE2_UTF set- + ting. The Unicode properties that can be tested are limited to the + general category properties such as Lu for an upper case letter or Nd + for a decimal number, the Unicode script names such as Arabic or Han, + and the derived properties Any and L&.
Full lists are given in the + pcre2pattern and pcre2syntax documentation. Only the short names for + properties are supported. For example, \p{L} matches a letter. Its Perl + synonym, \p{Letter}, is not supported. Furthermore, in Perl, many + properties may optionally be prefixed by "Is", for compatibility with + Perl 5.6. PCRE2 does not support this. + + +WIDE CHARACTERS AND UTF MODES + + Code points less than 256 can be specified in patterns by either braced + or unbraced hexadecimal escape sequences (for example, \x{b3} or \xb3). + Larger values have to use braced sequences. Unbraced octal code points + up to \777 are also recognized; larger ones can be coded using \o{...}. + + The escape sequence \N{U+<hex digits>} is recognized as another way of + specifying a Unicode character by code point in a UTF mode. It is not + allowed in non-UTF mode. + + In UTF mode, repeat quantifiers apply to complete UTF characters, not + to individual code units. + + In UTF mode, the dot metacharacter matches one UTF character instead of + a single code unit. + + In UTF mode, capture group names are not restricted to ASCII, and may + contain any Unicode letters and decimal digits, as well as underscore. + + The escape sequence \C can be used to match a single code unit in UTF + mode, but its use can lead to some strange effects because it breaks up + multi-unit characters (see the description of \C in the pcre2pattern + documentation). For this reason, there is a build-time option that dis- + ables support for \C completely. There is also a less draconian com- + pile-time option for locking out the use of \C when a pattern is com- + piled. + + The use of \C is not supported by the alternative matching function + pcre2_dfa_match() when in UTF-8 or UTF-16 mode, that is, when a charac- + ter may consist of more than one code unit. The use of \C in these + modes provokes a match-time error. Also, the JIT optimization does not + support \C in these modes.
If JIT optimization is requested for a UTF-8 + or UTF-16 pattern that contains \C, it will not succeed, and so when + pcre2_match() is called, the matching will be carried out by the inter- + pretive function. + + The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test + characters of any code value, but, by default, the characters that + PCRE2 recognizes as digits, spaces, or word characters remain the same + set as in non-UTF mode, all with code points less than 256. This re- + mains true even when PCRE2 is built to include Unicode support, because + to do otherwise would slow down matching in many common cases. Note + that this also applies to \b and \B, because they are defined in terms + of \w and \W. If you want to test for a wider sense of, say, "digit", + you can use explicit Unicode property tests such as \p{Nd}. Alterna- + tively, if you set the PCRE2_UCP option, the way that the character es- + capes work is changed so that Unicode properties are used to determine + which characters match. There are more details in the section on + generic character types in the pcre2pattern documentation. + + Similarly, characters that match the POSIX named character classes are + all low-valued characters, unless the PCRE2_UCP option is set. + + However, the special horizontal and vertical white space matching es- + capes (\h, \H, \v, and \V) do match all the appropriate Unicode charac- + ters, whether or not PCRE2_UCP is set. + + +UNICODE CASE-EQUIVALENCE + + If either PCRE2_UTF or PCRE2_UCP is set, upper/lower case processing + makes use of Unicode properties except for characters whose code points + are less than 128 and that have at most two case-equivalent values. For + these, a direct table lookup is used for speed. A few Unicode charac- + ters such as Greek sigma have more than two code points that are case- + equivalent, and these are treated specially. 
Setting PCRE2_UCP without + PCRE2_UTF allows Unicode-style case processing for non-UTF character + encodings such as UCS-2. + + +SCRIPT RUNS + + The pattern constructs (*script_run:...) and (*atomic_script_run:...), + with synonyms (*sr:...) and (*asr:...), verify that the string matched + within the parentheses is a script run. In concept, a script run is a + sequence of characters that are all from the same Unicode script. How- + ever, because some scripts are commonly used together, and because some + diacritical and other marks are used with multiple scripts, it is not + that simple. + + Every Unicode character has a Script property, mostly with a value cor- + responding to the name of a script, such as Latin, Greek, or Cyrillic. + There are also three special values: + + "Unknown" is used for code points that have not been assigned, and also + for the surrogate code points. In the PCRE2 32-bit library, characters + whose code points are greater than the Unicode maximum (U+10FFFF), + which are accessible only in non-UTF mode, are assigned the Unknown + script. + + "Common" is used for characters that are used with many scripts. These + include punctuation, emoji, mathematical, musical, and currency sym- + bols, and the ASCII digits 0 to 9. + + "Inherited" is used for characters such as diacritical marks that mod- + ify a previous character. These are considered to take on the script of + the character that they modify. + + Some Inherited characters are used with many scripts, but many of them + are only normally used with a small number of scripts. For example, + U+102E0 (Coptic Epact thousands mark) is used only with Arabic and Cop- + tic. In order to make it possible to check this, a Unicode property + called Script Extension exists. Its value is a list of scripts that ap- + ply to the character. For the majority of characters, the list contains + just one script, the same one as the Script property. 
However, for + characters such as U+102E0 more than one Script is listed. There are + also some Common characters that have a single, non-Common script in + their Script Extension list. + + The next section describes the basic rules for deciding whether a given + string of characters is a script run. Note, however, that there are + some special cases involving the Chinese Han script, and an additional + constraint for decimal digits. These are covered in subsequent sec- + tions. + + Basic script run rules + + A string that is less than two characters long is a script run. This is + the only case in which an Unknown character can be part of a script + run. Longer strings are checked using only the Script Extensions prop- + erty, not the basic Script property. + + If a character's Script Extension property is the single value "Inher- + ited", it is always accepted as part of a script run. This is also true + for the property "Common", subject to the checking of decimal digits + described below. All the remaining characters in a script run must have + at least one script in common in their Script Extension lists. In set- + theoretic terminology, the intersection of all the sets of scripts must + not be empty. + + A simple example is an Internet name such as "google.com". The letters + are all in the Latin script, and the dot is Common, so this string is a + script run. However, the Cyrillic letter "o" looks exactly the same as + the Latin "o"; a string that looks the same, but with Cyrillic "o"s is + not a script run. + + More interesting examples involve characters with more than one script + in their Script Extension. Consider the following characters: + + U+060C Arabic comma + U+06D4 Arabic full stop + + The first has the Script Extension list Arabic, Hanifi Rohingya, Syr- + iac, and Thaana; the second has just Arabic and Hanifi Rohingya. Both + of them could appear in script runs of either Arabic or Hanifi Ro- + hingya. 
The first could also appear in Syriac or Thaana script runs, + but the second could not. + + The Chinese Han script + + The Chinese Han script is commonly used in conjunction with other + scripts for writing certain languages. Japanese uses the Hiragana and + Katakana scripts together with Han; Korean uses Hangul and Han; Tai- + wanese Mandarin uses Bopomofo and Han. These three combinations are + treated as special cases when checking script runs and are, in effect, + "virtual scripts". Thus, a script run may contain a mixture of Hira- + gana, Katakana, and Han, or a mixture of Hangul and Han, or a mixture + of Bopomofo and Han, but not, for example, a mixture of Hangul and + Bopomofo and Han. PCRE2 (like Perl) follows Unicode's Technical Stan- + dard 39 ("Unicode Security Mechanisms", http://unicode.org/re- + ports/tr39/) in allowing such mixtures. + + Decimal digits + + Unicode contains many sets of 10 decimal digits in different scripts, + and some scripts (including the Common script) contain more than one + set. Some of these decimal digits are visually indistinguishable + from the common ASCII digits. In addition to the script checking de- + scribed above, if a script run contains any decimal digits, they must + all come from the same set of 10 adjacent characters. + + +VALIDITY OF UTF STRINGS + + When the PCRE2_UTF option is set, the strings passed as patterns and + subjects are (by default) checked for validity on entry to the relevant + functions. If an invalid UTF string is passed, a negative error code is + returned. The code unit offset to the offending character can be ex- + tracted from the match data block by calling pcre2_get_startchar(), + which is used for this purpose after a UTF error. + + In some situations, you may already know that your strings are valid, + and therefore want to skip these checks in order to improve perfor- + mance, for example in the case of a long subject string that is being + scanned repeatedly.
If you set the PCRE2_NO_UTF_CHECK option at com- + pile time or at match time, PCRE2 assumes that the pattern or subject + it is given (respectively) contains only valid UTF code unit sequences. + + If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the + result is undefined and your program may crash or loop indefinitely or + give incorrect results. There is, however, one mode of matching that + can handle invalid UTF subject strings. This is enabled by passing + PCRE2_MATCH_INVALID_UTF to pcre2_compile() and is discussed below in + the next section. The rest of this section covers the case when + PCRE2_MATCH_INVALID_UTF is not set. + + Passing PCRE2_NO_UTF_CHECK to pcre2_compile() just disables the UTF + check for the pattern; it does not also apply to subject strings. If + you want to disable the check for a subject string you must pass this + same option to pcre2_match() or pcre2_dfa_match(). + + UTF-16 and UTF-32 strings can indicate their endianness by a special code + known as a byte-order mark (BOM). The PCRE2 functions do not handle + this, expecting strings to be in host byte order. + + Unless PCRE2_NO_UTF_CHECK is set, a UTF string is checked before any + other processing takes place. In the case of pcre2_match() and + pcre2_dfa_match() calls with a non-zero starting offset, the check is + applied only to that part of the subject that could be inspected during + matching, and there is a check that the starting offset points to the + first code unit of a character or to the end of the subject. If there + are no lookbehind assertions in the pattern, the check starts at the + starting offset. Otherwise, it starts at the length of the longest + lookbehind before the starting offset, or at the start of the subject + if there are not that many characters before the starting offset. Note + that the sequences \b and \B are one-character lookbehinds. 
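The starting-offset requirement described above can be illustrated for UTF-8 with a small stand-alone check. This is only a sketch under stated assumptions: utf8_offset_is_boundary() is a hypothetical helper, not a PCRE2 function, and it assumes a subject made of 8-bit code units. PCRE2 performs its own check internally, so an application would need something like this only if it wants to validate an offset before calling the matching function.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2. Returns 1 if `offset` points to
   the first code unit of a UTF-8 character or to the end of the subject,
   and 0 otherwise. It does not check that the subject is valid UTF-8. */
static int utf8_offset_is_boundary(const unsigned char *subject,
                                   size_t length, size_t offset)
{
  if (offset > length) return 0;      /* beyond the end of the subject */
  if (offset == length) return 1;     /* the end of the subject is allowed */
  /* Second and subsequent bytes of a character have the form 0b10xxxxxx. */
  return (subject[offset] & 0xc0) != 0x80;
}
```

For the five-byte subject 0x61 0xe2 0x82 0xac 0x62 (the letter "a", the euro sign, the letter "b"), offsets 0, 1, 4, and 5 are accepted, whereas offsets 2 and 3 fall inside the three-byte euro sign and are rejected.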
+ + In addition to checking the format of the string, there is a check to + ensure that all code points lie in the range U+0 to U+10FFFF, excluding + the surrogate area. The so-called "non-character" code points are not + excluded because Unicode corrigendum #9 makes it clear that they should + not be. + + Characters in the "Surrogate Area" of Unicode are reserved for use by + UTF-16, where they are used in pairs to encode code points with values + greater than 0xFFFF. The code points that are encoded by UTF-16 pairs + are available independently in the UTF-8 and UTF-32 encodings. (In + other words, the whole surrogate thing is a fudge for UTF-16 which un- + fortunately messes up UTF-8 and UTF-32.) + + Setting PCRE2_NO_UTF_CHECK at compile time does not disable the error + that is given if an escape sequence for an invalid Unicode code point + is encountered in the pattern. If you want to allow escape sequences + such as \x{d800} (a surrogate code point) you can set the PCRE2_EX- + TRA_ALLOW_SURROGATE_ESCAPES extra option. However, this is possible + only in UTF-8 and UTF-32 modes, because these values are not repre- + sentable in UTF-16. + + Errors in UTF-8 strings + + The following negative error codes are given for invalid UTF-8 strings: + + PCRE2_ERROR_UTF8_ERR1 + PCRE2_ERROR_UTF8_ERR2 + PCRE2_ERROR_UTF8_ERR3 + PCRE2_ERROR_UTF8_ERR4 + PCRE2_ERROR_UTF8_ERR5 + + The string ends with a truncated UTF-8 character; the code specifies + how many bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 + characters to be no longer than 4 bytes, the encoding scheme (origi- + nally defined by RFC 2279) allows for up to 6 bytes, and this is + checked first; hence the possibility of 4 or 5 missing bytes. 
+ + PCRE2_ERROR_UTF8_ERR6 + PCRE2_ERROR_UTF8_ERR7 + PCRE2_ERROR_UTF8_ERR8 + PCRE2_ERROR_UTF8_ERR9 + PCRE2_ERROR_UTF8_ERR10 + + The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of + the character do not have the binary value 0b10 (that is, either the + most significant bit is 0, or the next bit is 1). + + PCRE2_ERROR_UTF8_ERR11 + PCRE2_ERROR_UTF8_ERR12 + + A character that is valid by the RFC 2279 rules is either 5 or 6 bytes + long; these code points are excluded by RFC 3629. + + PCRE2_ERROR_UTF8_ERR13 + + A 4-byte character has a value greater than 0x10ffff; these code points + are excluded by RFC 3629. + + PCRE2_ERROR_UTF8_ERR14 + + A 3-byte character has a value in the range 0xd800 to 0xdfff; this + range of code points is reserved by RFC 3629 for use with UTF-16, and + so is excluded from UTF-8. + + PCRE2_ERROR_UTF8_ERR15 + PCRE2_ERROR_UTF8_ERR16 + PCRE2_ERROR_UTF8_ERR17 + PCRE2_ERROR_UTF8_ERR18 + PCRE2_ERROR_UTF8_ERR19 + + A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes + for a value that can be represented by fewer bytes, which is invalid. + For example, the two bytes 0xc0, 0xae give the value 0x2e, whose cor- + rect coding uses just one byte. + + PCRE2_ERROR_UTF8_ERR20 + + The two most significant bits of the first byte of a character have the + binary value 0b10 (that is, the most significant bit is 1 and the sec- + ond is 0). Such a byte can only validly occur as the second or subse- + quent byte of a multi-byte character. + + PCRE2_ERROR_UTF8_ERR21 + + The first byte of a character has the value 0xfe or 0xff. These values + can never occur in a valid UTF-8 string. 
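The UTF-8 error descriptions above can be read as a decision procedure. The following is a minimal stand-alone sketch of that procedure, not PCRE2's own implementation: utf8_first_error() is a hypothetical helper that returns the number N of the PCRE2_ERROR_UTF8_ERRN description that applies (or 0 for a valid string), rather than the library's negative error codes. The truncation test comes first, as the text states; the order of the remaining tests is an assumption of this sketch, so a character with several faults may be classified differently by the library.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2. Returns 0 if s[0..len) is valid
   UTF-8; otherwise returns N, where PCRE2_ERROR_UTF8_ERRN is the code whose
   description applies, and sets *offset to the start of the bad character. */
static int utf8_first_error(const unsigned char *s, size_t len, size_t *offset)
{
  size_t i = 0;

  while (i < len) {
    unsigned char c = s[i], d;
    size_t extra, avail, k;

    *offset = i;
    if (c < 0x80) { i++; continue; }   /* single-byte (ASCII) character */
    if (c < 0xc0) return 20;           /* 0b10xxxxxx cannot start a character */
    if (c >= 0xfe) return 21;          /* 0xfe and 0xff are never valid */

    /* Number of additional bytes implied by the first byte (RFC 2279). */
    extra = (c < 0xe0) ? 1 : (c < 0xf0) ? 2 : (c < 0xf8) ? 3 : (c < 0xfc) ? 4 : 5;

    avail = len - i - 1;               /* bytes remaining after the first */
    if (avail < extra) return (int)(extra - avail);   /* ERR1 to ERR5 */

    for (k = 1; k <= extra; k++)       /* ERR6 to ERR10 */
      if ((s[i + k] & 0xc0) != 0x80) return 5 + (int)k;

    d = s[i + 1];
    switch (extra) {
      case 1:
        if ((c & 0x3e) == 0) return 15;               /* overlong 2-byte */
        break;
      case 2:
        if (c == 0xe0 && (d & 0x20) == 0) return 16;  /* overlong 3-byte */
        if (c == 0xed && d >= 0xa0) return 14;        /* 0xd800 to 0xdfff */
        break;
      case 3:
        if (c == 0xf0 && (d & 0x30) == 0) return 17;  /* overlong 4-byte */
        if (c > 0xf4 || (c == 0xf4 && d > 0x8f)) return 13;  /* > 0x10ffff */
        break;
      case 4:
        if (c == 0xf8 && (d & 0x38) == 0) return 18;  /* overlong 5-byte */
        return 11;                                    /* 5-byte character */
      default:
        if (c == 0xfc && (d & 0x3c) == 0) return 19;  /* overlong 6-byte */
        return 12;                                    /* 6-byte character */
    }
    i += extra + 1;
  }
  return 0;
}
```

For example, passing the two bytes 0xc0, 0xae returns 15, which corresponds to the overlong case described for PCRE2_ERROR_UTF8_ERR15, and passing 0xed, 0xa0, 0x80 (an encoding of the surrogate 0xd800) returns 14.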
+ + Errors in UTF-16 strings + + The following negative error codes are given for invalid UTF-16 + strings: + + PCRE2_ERROR_UTF16_ERR1 Missing low surrogate at end of string + PCRE2_ERROR_UTF16_ERR2 Invalid low surrogate follows high surrogate + PCRE2_ERROR_UTF16_ERR3 Isolated low surrogate + + + Errors in UTF-32 strings + + The following negative error codes are given for invalid UTF-32 + strings: + + PCRE2_ERROR_UTF32_ERR1 Surrogate character (0xd800 to 0xdfff) + PCRE2_ERROR_UTF32_ERR2 Code point is greater than 0x10ffff + + +MATCHING IN INVALID UTF STRINGS + + You can run pattern matches on subject strings that may contain invalid + UTF sequences if you call pcre2_compile() with the PCRE2_MATCH_IN- + VALID_UTF option. This is supported by pcre2_match(), including JIT + matching, but not by pcre2_dfa_match(). When PCRE2_MATCH_INVALID_UTF is + set, it forces PCRE2_UTF to be set as well. Note, however, that the + pattern itself must be a valid UTF string. + + Setting PCRE2_MATCH_INVALID_UTF does not affect what pcre2_compile() + generates, but if pcre2_jit_compile() is subsequently called, it does + generate different code. If JIT is not used, the option affects the be- + haviour of the interpretive code in pcre2_match(). When PCRE2_MATCH_IN- + VALID_UTF is set at compile time, PCRE2_NO_UTF_CHECK is ignored at + match time. + + In this mode, an invalid code unit sequence in the subject never + matches any pattern item. It does not match dot, it does not match + \p{Any}, it does not even match negative items such as [^X]. A lookbe- + hind assertion fails if it encounters an invalid sequence while moving + the current point backwards. In other words, an invalid UTF code unit + sequence acts as a barrier which no match can cross. + + You can also think of this as the subject being split up into fragments + of valid UTF, delimited internally by invalid code unit sequences. The + pattern is matched fragment by fragment. 
The result of a successful + match, however, is given as code unit offsets in the entire subject + string in the usual way. There are a few points to consider: + + The internal boundaries are not interpreted as the beginnings or ends + of lines and so do not match circumflex or dollar characters in the + pattern. + + If pcre2_match() is called with an offset that points to an invalid + UTF-sequence, that sequence is skipped, and the match starts at the + next valid UTF character, or the end of the subject. + + At internal fragment boundaries, \b and \B behave in the same way as at + the beginning and end of the subject. For example, a sequence such as + \bWORD\b would match an instance of WORD that is surrounded by invalid + UTF code units. + + Using PCRE2_MATCH_INVALID_UTF, an application can run matches on arbi- + trary data, knowing that any matched strings that are returned are + valid UTF. This can be useful when searching for UTF text in executable + or other binary files. + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 23 February 2020 + Copyright (c) 1997-2020 University of Cambridge. +------------------------------------------------------------------------------ + + diff --git a/src/pcre2/doc/pcre2_callout_enumerate.3 b/src/pcre2/doc/pcre2_callout_enumerate.3 new file mode 100644 index 00000000..109c9bec --- /dev/null +++ b/src/pcre2/doc/pcre2_callout_enumerate.3 @@ -0,0 +1,51 @@ +.TH PCRE2_CALLOUT_ENUMERATE 3 "23 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP, +.B " int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *)," +.B " void *\fIcallout_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function scans a compiled regular expression and calls the \fIcallback()\fP +function for each callout within the pattern. 
The yield of the function is zero +for success and non-zero otherwise. The arguments are: +.sp + \fIcode\fP Points to the compiled pattern + \fIcallback\fP The callback function + \fIcallout_data\fP User data that is passed to the callback +.sp +The \fIcallback()\fP function is passed a pointer to a data block containing +the following fields (not necessarily in this order): +.sp + uint32_t \fIversion\fP Block version number + uint32_t \fIcallout_number\fP Number for numbered callouts + PCRE2_SIZE \fIpattern_position\fP Offset to next item in pattern + PCRE2_SIZE \fInext_item_length\fP Length of next item in pattern + PCRE2_SIZE \fIcallout_string_offset\fP Offset to string within pattern + PCRE2_SIZE \fIcallout_string_length\fP Length of callout string + PCRE2_SPTR \fIcallout_string\fP Points to callout string or is NULL +.sp +The second argument passed to the \fBcallback()\fP function is the callout data +that was passed to \fBpcre2_callout_enumerate()\fP. The \fBcallback()\fP +function must return zero for success. Any other value causes the pattern scan +to stop, with the value being passed back as the result of +\fBpcre2_callout_enumerate()\fP. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_code_copy.3 b/src/pcre2/doc/pcre2_code_copy.3 new file mode 100644 index 00000000..09b47054 --- /dev/null +++ b/src/pcre2/doc/pcre2_code_copy.3 @@ -0,0 +1,31 @@ +.TH PCRE2_CODE_COPY 3 "22 November 2016" "PCRE2 10.23" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_code *pcre2_code_copy(const pcre2_code *\fIcode\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function makes a copy of the memory used for a compiled pattern, excluding +any memory used by the JIT compiler. 
Without a subsequent call to +\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching. The +pointer to the character tables is copied, not the tables themselves (see +\fBpcre2_code_copy_with_tables()\fP). The yield of the function is NULL if +\fIcode\fP is NULL or if sufficient memory cannot be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_code_copy_with_tables.3 b/src/pcre2/doc/pcre2_code_copy_with_tables.3 new file mode 100644 index 00000000..cfbddb33 --- /dev/null +++ b/src/pcre2/doc/pcre2_code_copy_with_tables.3 @@ -0,0 +1,32 @@ +.TH PCRE2_CODE_COPY_WITH_TABLES 3 "22 November 2016" "PCRE2 10.23" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function makes a copy of the memory used for a compiled pattern, excluding +any memory used by the JIT compiler. Without a subsequent call to +\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching. +Unlike \fBpcre2_code_copy()\fP, a separate copy of the character tables is also +made, with the new code pointing to it. This memory will be automatically freed +when \fBpcre2_code_free()\fP is called. The yield of the function is NULL if +\fIcode\fP is NULL or if sufficient memory cannot be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_code_free.3 b/src/pcre2/doc/pcre2_code_free.3 new file mode 100644 index 00000000..9e0ad3c0 --- /dev/null +++ b/src/pcre2/doc/pcre2_code_free.3 @@ -0,0 +1,30 @@ +.TH PCRE2_CODE_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B void pcre2_code_free(pcre2_code *\fIcode\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +If \fIcode\fP is NULL, this function does nothing. Otherwise, \fIcode\fP must +point to a compiled pattern. This function frees its memory, including any +memory used by the JIT compiler. If the compiled pattern was created by a call +to \fBpcre2_code_copy_with_tables()\fP, the memory for the character tables is +also freed. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_compile.3 b/src/pcre2/doc/pcre2_compile.3 new file mode 100644 index 00000000..58a60c1d --- /dev/null +++ b/src/pcre2/doc/pcre2_compile.3 @@ -0,0 +1,95 @@ +.TH PCRE2_COMPILE 3 "23 May 2019" "PCRE2 10.34" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_code *pcre2_compile(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, +.B " uint32_t \fIoptions\fP, int *\fIerrorcode\fP, PCRE2_SIZE *\fIerroroffset,\fP" +.B " pcre2_compile_context *\fIccontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function compiles a regular expression pattern into an internal form. 
Its +arguments are: +.sp + \fIpattern\fP A string containing the expression to be compiled + \fIlength\fP The length of the string or PCRE2_ZERO_TERMINATED + \fIoptions\fP Option bits + \fIerrorcode\fP Where to put an error code + \fIerroroffset\fP Where to put an error offset + \fIccontext\fP Pointer to a compile context or NULL +.sp +The length of the pattern and any error offset that is returned are in code +units, not characters. A compile context is needed only if you want to provide +custom memory allocation functions, or to provide an external function for +system stack size checking, or to change one or more of these parameters: +.sp + What \eR matches (Unicode newlines, or CR, LF, CRLF only); + PCRE2's character tables; + The newline character sequence; + The compile time nested parentheses limit; + The maximum pattern length (in code units) that is allowed; + The additional options bits (see pcre2_set_compile_extra_options()). +.sp +The option bits are: +.sp + PCRE2_ANCHORED Force pattern anchoring + PCRE2_ALLOW_EMPTY_CLASS Allow empty classes + PCRE2_ALT_BSUX Alternative handling of \eu, \eU, and \ex + PCRE2_ALT_CIRCUMFLEX Alternative handling of ^ in multiline mode + PCRE2_ALT_VERBNAMES Process backslashes in verb names + PCRE2_AUTO_CALLOUT Compile automatic callouts + PCRE2_CASELESS Do caseless matching + PCRE2_DOLLAR_ENDONLY $ not to match newline at end + PCRE2_DOTALL . matches anything including NL + PCRE2_DUPNAMES Allow duplicate names for subpatterns + PCRE2_ENDANCHORED Pattern can match only at end of subject + PCRE2_EXTENDED Ignore white space and # comments + PCRE2_FIRSTLINE Force matching to be before newline + PCRE2_LITERAL Pattern characters are all literal + PCRE2_MATCH_INVALID_UTF Enable support for matching invalid UTF + PCRE2_MATCH_UNSET_BACKREF Match unset backreferences + PCRE2_MULTILINE ^ and $ match newlines within data + PCRE2_NEVER_BACKSLASH_C Lock out the use of \eC in patterns + PCRE2_NEVER_UCP Lock out PCRE2_UCP, e.g. 
via (*UCP) + PCRE2_NEVER_UTF Lock out PCRE2_UTF, e.g. via (*UTF) + PCRE2_NO_AUTO_CAPTURE Disable numbered capturing paren- + theses (named ones available) + PCRE2_NO_AUTO_POSSESS Disable auto-possessification + PCRE2_NO_DOTSTAR_ANCHOR Disable automatic anchoring for .* + PCRE2_NO_START_OPTIMIZE Disable match-time start optimizations + PCRE2_NO_UTF_CHECK Do not check the pattern for UTF validity + (only relevant if PCRE2_UTF is set) + PCRE2_UCP Use Unicode properties for \ed, \ew, etc. + PCRE2_UNGREEDY Invert greediness of quantifiers + PCRE2_USE_OFFSET_LIMIT Enable offset limit for unanchored matching + PCRE2_UTF Treat pattern and subjects as UTF strings +.sp +PCRE2 must be built with Unicode support (the default) in order to use +PCRE2_UTF, PCRE2_UCP and related options. +.P +Additional options may be set in the compile context via the +.\" HREF +\fBpcre2_set_compile_extra_options\fP +.\" +function. +.P +The yield of this function is a pointer to a private data structure that +contains the compiled pattern, or NULL if an error was detected. +.P +There is a complete description of the PCRE2 native API, with more detail on +each option, in the +.\" HREF +\fBpcre2api\fP +.\" +page, and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_compile_context_copy.3 b/src/pcre2/doc/pcre2_compile_context_copy.3 new file mode 100644 index 00000000..aea11875 --- /dev/null +++ b/src/pcre2/doc/pcre2_compile_context_copy.3 @@ -0,0 +1,29 @@ +.TH PCRE2_COMPILE_CONTEXT_COPY 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_compile_context *pcre2_compile_context_copy( +.B " pcre2_compile_context *\fIccontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function makes a new copy of a compile context, using the memory +allocation function that was used for the original context. 
The result is NULL +if the memory cannot be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_compile_context_create.3 b/src/pcre2/doc/pcre2_compile_context_create.3 new file mode 100644 index 00000000..3053df43 --- /dev/null +++ b/src/pcre2/doc/pcre2_compile_context_create.3 @@ -0,0 +1,30 @@ +.TH PCRE2_COMPILE_CONTEXT_CREATE 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_compile_context *pcre2_compile_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function creates and initializes a new compile context. If its argument is +NULL, \fBmalloc()\fP is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_compile_context_free.3 b/src/pcre2/doc/pcre2_compile_context_free.3 new file mode 100644 index 00000000..e90d744f --- /dev/null +++ b/src/pcre2/doc/pcre2_compile_context_free.3 @@ -0,0 +1,29 @@ +.TH PCRE2_COMPILE_CONTEXT_FREE 3 "29 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B void pcre2_compile_context_free(pcre2_compile_context *\fIccontext\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function frees the memory occupied by a compile context, using the memory +freeing function from the general context with which it was created, or +\fBfree()\fP if that was not set. 
If the argument is NULL, the function returns +immediately without doing anything. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_config.3 b/src/pcre2/doc/pcre2_config.3 new file mode 100644 index 00000000..ab9623d2 --- /dev/null +++ b/src/pcre2/doc/pcre2_config.3 @@ -0,0 +1,76 @@ +.TH PCRE2_CONFIG 3 "16 September 2017" "PCRE2 10.31" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.SM +.B int pcre2_config(uint32_t \fIwhat\fP, void *\fIwhere\fP); +. +.SH DESCRIPTION +.rs +.sp +This function makes it possible for a client program to find out which optional +features are available in the version of the PCRE2 library it is using. The +arguments are as follows: +.sp + \fIwhat\fP A code specifying what information is required + \fIwhere\fP Points to where to put the information +.sp +If \fIwhere\fP is NULL, the function returns the amount of memory needed for +the requested information. When the information is a string, the value is in +code units; for other types of data it is in bytes. +.P +If \fIwhere\fP is not NULL, for PCRE2_CONFIG_JITTARGET, +PCRE2_CONFIG_UNICODE_VERSION, and PCRE2_CONFIG_VERSION it must point to a +buffer that is large enough to hold the string. For all other codes it must +point to a uint32_t integer variable. 
The available codes are: +.sp + PCRE2_CONFIG_BSR Indicates what \eR matches by default: + PCRE2_BSR_UNICODE + PCRE2_BSR_ANYCRLF + PCRE2_CONFIG_COMPILED_WIDTHS Which of 8/16/32 support was compiled + PCRE2_CONFIG_DEPTHLIMIT Default backtracking depth limit + PCRE2_CONFIG_HEAPLIMIT Default heap memory limit +.\" JOIN + PCRE2_CONFIG_JIT Availability of just-in-time compiler + support (1=yes 0=no) +.\" JOIN + PCRE2_CONFIG_JITTARGET Information (a string) about the target + architecture for the JIT compiler + PCRE2_CONFIG_LINKSIZE Configured internal link size (2, 3, 4) + PCRE2_CONFIG_MATCHLIMIT Default internal resource limit + PCRE2_CONFIG_NEVER_BACKSLASH_C Whether or not \eC is disabled + PCRE2_CONFIG_NEWLINE Code for the default newline sequence: + PCRE2_NEWLINE_CR + PCRE2_NEWLINE_LF + PCRE2_NEWLINE_CRLF + PCRE2_NEWLINE_ANY + PCRE2_NEWLINE_ANYCRLF + PCRE2_NEWLINE_NUL + PCRE2_CONFIG_PARENSLIMIT Default parentheses nesting limit + PCRE2_CONFIG_RECURSIONLIMIT Obsolete: use PCRE2_CONFIG_DEPTHLIMIT + PCRE2_CONFIG_STACKRECURSE Obsolete: always returns 0 +.\" JOIN + PCRE2_CONFIG_UNICODE Availability of Unicode support (1=yes + 0=no) + PCRE2_CONFIG_UNICODE_VERSION The Unicode version (a string) + PCRE2_CONFIG_VERSION The PCRE2 version (a string) +.sp +The function yields a non-negative value on success or the negative value +PCRE2_ERROR_BADOPTION otherwise. This is also the result for the +PCRE2_CONFIG_JITTARGET code if JIT support is not available. When a string is +requested, the function returns the number of code units used, including the +terminating zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_convert_context_copy.3 b/src/pcre2/doc/pcre2_convert_context_copy.3 new file mode 100644 index 00000000..827c3e99 --- /dev/null +++ b/src/pcre2/doc/pcre2_convert_context_copy.3 @@ -0,0 +1,26 @@ +.TH PCRE2_CONVERT_CONTEXT_COPY 3 "10 July 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_convert_context *pcre2_convert_context_copy( +.B " pcre2_convert_context *\fIcvcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It makes a new copy of a convert context, using the memory allocation function +that was used for the original context. The result is NULL if the memory cannot +be obtained. +.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. diff --git a/src/pcre2/doc/pcre2_convert_context_create.3 b/src/pcre2/doc/pcre2_convert_context_create.3 new file mode 100644 index 00000000..91c17fb3 --- /dev/null +++ b/src/pcre2/doc/pcre2_convert_context_create.3 @@ -0,0 +1,27 @@ +.TH PCRE2_CONVERT_CONTEXT_CREATE 3 "10 July 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_convert_context *pcre2_convert_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It creates and initializes a new convert context. If its argument is +NULL, \fBmalloc()\fP is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. 
diff --git a/src/pcre2/doc/pcre2_convert_context_free.3 b/src/pcre2/doc/pcre2_convert_context_free.3 new file mode 100644 index 00000000..3fd57830 --- /dev/null +++ b/src/pcre2/doc/pcre2_convert_context_free.3 @@ -0,0 +1,26 @@ +.TH PCRE2_CONVERT_CONTEXT_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It frees the memory occupied by a convert context, using the memory +freeing function from the general context with which it was created, or +\fBfree()\fP if that was not set. If the argument is NULL, the function returns +immediately without doing anything. +.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. diff --git a/src/pcre2/doc/pcre2_converted_pattern_free.3 b/src/pcre2/doc/pcre2_converted_pattern_free.3 new file mode 100644 index 00000000..b0645b57 --- /dev/null +++ b/src/pcre2/doc/pcre2_converted_pattern_free.3 @@ -0,0 +1,26 @@ +.TH PCRE2_CONVERTED_PATTERN_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It frees the memory occupied by a converted pattern that was obtained by +calling \fBpcre2_pattern_convert()\fP with arguments that caused it to place +the converted pattern into newly obtained heap memory. If the argument is NULL, +the function returns immediately without doing anything. +.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. 
diff --git a/src/pcre2/doc/pcre2_dfa_match.3 b/src/pcre2/doc/pcre2_dfa_match.3 new file mode 100644 index 00000000..6413cb60 --- /dev/null +++ b/src/pcre2/doc/pcre2_dfa_match.3 @@ -0,0 +1,81 @@ +.TH PCRE2_DFA_MATCH 3 "16 October 2018" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_dfa_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP," +.B " int *\fIworkspace\fP, PCRE2_SIZE \fIwscount\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function matches a compiled regular expression against a given subject +string, using an alternative matching algorithm that scans the subject string +just once (except when processing lookaround assertions). This function is +\fInot\fP Perl-compatible (the Perl-compatible matching function is +\fBpcre2_match()\fP). The arguments for this function are: +.sp + \fIcode\fP Points to the compiled pattern + \fIsubject\fP Points to the subject string + \fIlength\fP Length of the subject string + \fIstartoffset\fP Offset in the subject at which to start matching + \fIoptions\fP Option bits + \fImatch_data\fP Points to a match data block, for results + \fImcontext\fP Points to a match context, or is NULL + \fIworkspace\fP Points to a vector of ints used as working space + \fIwscount\fP Number of elements in the vector +.sp +For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set +up a callout function or specify the heap limit or the match or the recursion +depth limits. The \fIlength\fP and \fIstartoffset\fP values are code units, not +characters. 
The options are: +.sp + PCRE2_ANCHORED Match only at the first position + PCRE2_COPY_MATCHED_SUBJECT + On success, make a private subject copy + PCRE2_ENDANCHORED Pattern can match only at end of subject + PCRE2_NOTBOL Subject is not the beginning of a line + PCRE2_NOTEOL Subject is not the end of a line + PCRE2_NOTEMPTY An empty string is not a valid match +.\" JOIN + PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject + is not a valid match +.\" JOIN + PCRE2_NO_UTF_CHECK Do not check the subject for UTF + validity (only relevant if PCRE2_UTF + was set at compile time) +.\" JOIN + PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial + match even if there is a full match +.\" JOIN + PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial + match if no full matches are found + PCRE2_DFA_RESTART Restart after a partial match + PCRE2_DFA_SHORTEST Return only the shortest match +.sp +There are restrictions on what may appear in a pattern when using this matching +function. Details are given in the +.\" HREF +\fBpcre2matching\fP +.\" +documentation. For details of partial matching, see the +.\" HREF +\fBpcre2partial\fP +.\" +page. There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_general_context_copy.3 b/src/pcre2/doc/pcre2_general_context_copy.3 new file mode 100644 index 00000000..637e565f --- /dev/null +++ b/src/pcre2/doc/pcre2_general_context_copy.3 @@ -0,0 +1,30 @@ +.TH PCRE2_GENERAL_CONTEXT_COPY 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_general_context *pcre2_general_context_copy( +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +. 
+.SH DESCRIPTION +.rs +.sp +This function makes a new copy of a general context, using the memory +allocation functions in the context, if set, to get the necessary memory. +Otherwise \fBmalloc()\fP is used. The result is NULL if the memory cannot be +obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_general_context_create.3 b/src/pcre2/doc/pcre2_general_context_create.3 new file mode 100644 index 00000000..a3e6c10c --- /dev/null +++ b/src/pcre2/doc/pcre2_general_context_create.3 @@ -0,0 +1,32 @@ +.TH PCRE2_GENERAL_CONTEXT_CREATE 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_general_context *pcre2_general_context_create( +.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *)," +.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function creates and initializes a general context. The arguments define +custom memory management functions and a data value that is passed to them when +they are called. The \fBprivate_malloc()\fP function is used to get memory for +the context. If either of the first two arguments is NULL, the system memory +management function is used. The result is NULL if no memory could be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_general_context_free.3 b/src/pcre2/doc/pcre2_general_context_free.3 new file mode 100644 index 00000000..df1aa1f4 --- /dev/null +++ b/src/pcre2/doc/pcre2_general_context_free.3 @@ -0,0 +1,28 @@ +.TH PCRE2_GENERAL_CONTEXT_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B void pcre2_general_context_free(pcre2_general_context *\fIgcontext\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function frees the memory occupied by a general context, using the memory +freeing function within the context, if set. If the argument is NULL, the +function returns immediately without doing anything. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_get_error_message.3 b/src/pcre2/doc/pcre2_get_error_message.3 new file mode 100644 index 00000000..3d3e0deb --- /dev/null +++ b/src/pcre2/doc/pcre2_get_error_message.3 @@ -0,0 +1,39 @@ +.TH PCRE2_GET_ERROR_MESSAGE 3 "24 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_get_error_message(int \fIerrorcode\fP, PCRE2_UCHAR *\fIbuffer\fP, +.B " PCRE2_SIZE \fIbufflen\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function provides a textual error message for each PCRE2 error code. +Compilation errors are positive numbers; UTF formatting errors and matching +errors are negative numbers. The arguments are: +.sp + \fIerrorcode\fP an error code (positive or negative) + \fIbuffer\fP where to put the message + \fIbufflen\fP the length of the buffer (code units) +.sp +The function returns the length of the message in code units, excluding the +trailing zero, or the negative error code PCRE2_ERROR_NOMEMORY if the buffer is +too small. 
In this case, the returned message is truncated (but still with a +trailing zero). If \fIerrorcode\fP does not contain a recognized error code +number, the negative value PCRE2_ERROR_BADDATA is returned. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_get_mark.3 b/src/pcre2/doc/pcre2_get_mark.3 new file mode 100644 index 00000000..dce377d6 --- /dev/null +++ b/src/pcre2/doc/pcre2_get_mark.3 @@ -0,0 +1,34 @@ +.TH PCRE2_GET_MARK 3 "13 October 2017" "PCRE2 10.31" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +After a call of \fBpcre2_match()\fP that was passed the match block that is +this function's argument, this function returns a pointer to the last (*MARK), +(*PRUNE), or (*THEN) name that was encountered during the matching process. The +name is zero-terminated, and is within the compiled pattern. The length of the +name is in the preceding code unit. If no name is available, NULL is returned. +.P +After a successful match, the name that is returned is the last one on the +matching path. After a failed match or a partial match, the last encountered +name is returned. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_get_match_data_size.3 b/src/pcre2/doc/pcre2_get_match_data_size.3 new file mode 100644 index 00000000..cf5fa5e6 --- /dev/null +++ b/src/pcre2/doc/pcre2_get_match_data_size.3 @@ -0,0 +1,27 @@ +.TH PCRE2_GET_MATCH_DATA_SIZE 3 "16 July 2019" "PCRE2 10.34" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B PCRE2_SIZE pcre2_get_match_data_size(pcre2_match_data *\fImatch_data\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function returns the size, in bytes, of the match data block that is its +argument. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_get_ovector_count.3 b/src/pcre2/doc/pcre2_get_ovector_count.3 new file mode 100644 index 00000000..3f6d7488 --- /dev/null +++ b/src/pcre2/doc/pcre2_get_ovector_count.3 @@ -0,0 +1,27 @@ +.TH PCRE2_GET_OVECTOR_COUNT 3 "24 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B uint32_t pcre2_get_ovector_count(pcre2_match_data *\fImatch_data\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function returns the number of pairs of offsets in the ovector that forms +part of the given match data block. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_get_ovector_pointer.3 b/src/pcre2/doc/pcre2_get_ovector_pointer.3 new file mode 100644 index 00000000..261d652d --- /dev/null +++ b/src/pcre2/doc/pcre2_get_ovector_pointer.3 @@ -0,0 +1,28 @@ +.TH PCRE2_GET_OVECTOR_POINTER 3 "24 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *\fImatch_data\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function returns a pointer to the vector of offsets that forms part of the +given match data block. The number of pairs can be found by calling +\fBpcre2_get_ovector_count()\fP. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_get_startchar.3 b/src/pcre2/doc/pcre2_get_startchar.3 new file mode 100644 index 00000000..c6ac8b01 --- /dev/null +++ b/src/pcre2/doc/pcre2_get_startchar.3 @@ -0,0 +1,32 @@ +.TH PCRE2_GET_STARTCHAR 3 "24 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +After a successful call of \fBpcre2_match()\fP that was passed the match block +that is this function's argument, this function returns the code unit offset of +the character at which the successful match started. For a non-partial match, +this can be different to the value of \fIovector[0]\fP if the pattern contains +the \eK escape sequence. After a partial match, however, this value is always +the same as \fIovector[0]\fP because \eK does not affect the result of a +partial match. 
+.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_jit_compile.3 b/src/pcre2/doc/pcre2_jit_compile.3 new file mode 100644 index 00000000..6cc17880 --- /dev/null +++ b/src/pcre2/doc/pcre2_jit_compile.3 @@ -0,0 +1,51 @@ +.TH PCRE2_JIT_COMPILE 3 "29 July 2019" "PCRE2 10.34" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_jit_compile(pcre2_code *\fIcode\fP, uint32_t \fIoptions\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function requests JIT compilation, which, if the just-in-time compiler is +available, further processes a compiled pattern into machine code that executes +much faster than the \fBpcre2_match()\fP interpretive matching function. Full +details are given in the +.\" HREF +\fBpcre2jit\fP +.\" +documentation. +.P +The first argument is a pointer that was returned by a successful call to +\fBpcre2_compile()\fP, and the second must contain one or more of the following +bits: +.sp + PCRE2_JIT_COMPLETE compile code for full matching + PCRE2_JIT_PARTIAL_SOFT compile code for soft partial matching + PCRE2_JIT_PARTIAL_HARD compile code for hard partial matching +.sp +There is also an obsolete option called PCRE2_JIT_INVALID_UTF, which has been +superseded by the \fBpcre2_compile()\fP option PCRE2_MATCH_INVALID_UTF. The old +option is deprecated and may be removed in the future. +.P +The yield of the function is 0 for success, or a negative error code otherwise. +In particular, PCRE2_ERROR_JIT_BADOPTION is returned if JIT is not supported or +if an unknown bit is set in \fIoptions\fP. The function can also return +PCRE2_ERROR_NOMEMORY if JIT is unable to allocate executable memory for the +compiler, even if it was because of a system security restriction. 
+.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_jit_free_unused_memory.3 b/src/pcre2/doc/pcre2_jit_free_unused_memory.3 new file mode 100644 index 00000000..183bba0a --- /dev/null +++ b/src/pcre2/doc/pcre2_jit_free_unused_memory.3 @@ -0,0 +1,31 @@ +.TH PCRE2_JIT_FREE_UNUSED_MEMORY 3 "27 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B void pcre2_jit_free_unused_memory(pcre2_general_context *\fIgcontext\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function frees unused JIT executable memory. The argument is a general +context, for custom memory management, or NULL for standard memory management. +JIT memory allocation retains some memory in order to improve future JIT +compilation speed. In low memory conditions, +\fBpcre2_jit_free_unused_memory()\fP can be used to cause this memory to be +freed. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_jit_match.3 b/src/pcre2/doc/pcre2_jit_match.3 new file mode 100644 index 00000000..5877fcba --- /dev/null +++ b/src/pcre2/doc/pcre2_jit_match.3 @@ -0,0 +1,50 @@ +.TH PCRE2_JIT_MATCH 3 "11 February 2020" "PCRE2 10.35" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_jit_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP);" +.fi +. 
+.SH DESCRIPTION +.rs +.sp +This function matches a compiled regular expression that has been successfully +processed by the JIT compiler against a given subject string, using a matching +algorithm that is similar to Perl's. It is a "fast path" interface to JIT, and +it bypasses some of the sanity checks that \fBpcre2_match()\fP applies. +Its arguments are exactly the same as for +.\" HREF +\fBpcre2_match()\fP, +.\" +except that the subject string must be specified with a length; +PCRE2_ZERO_TERMINATED is not supported. +.P +The supported options are PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, +PCRE2_NOTEMPTY_ATSTART, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Unsupported +options are ignored. The subject string is not checked for UTF validity. +.P +The return values are the same as for \fBpcre2_match()\fP plus +PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested +that was not compiled. For details of partial matching, see the +.\" HREF +\fBpcre2partial\fP +.\" +page. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the JIT API in the +.\" HREF +\fBpcre2jit\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_jit_stack_assign.3 b/src/pcre2/doc/pcre2_jit_stack_assign.3 new file mode 100644 index 00000000..33d2e1cb --- /dev/null +++ b/src/pcre2/doc/pcre2_jit_stack_assign.3 @@ -0,0 +1,59 @@ +.TH PCRE2_JIT_STACK_ASSIGN 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B void pcre2_jit_stack_assign(pcre2_match_context *\fImcontext\fP, +.B " pcre2_jit_callback \fIcallback_function\fP, void *\fIcallback_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function provides control over the memory used by JIT as a run-time stack +when \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP is called with a pattern +that has been successfully processed by the JIT compiler. 
The information that +determines which stack is used is put into a match context that is subsequently +passed to a matching function. The arguments of this function are: +.sp + mcontext a pointer to a match context + callback a callback function + callback_data a JIT stack or a value to be passed to the callback +.P +If \fImcontext\fP is NULL, the function returns immediately, without doing +anything. +.P +If \fIcallback\fP is NULL and \fIcallback_data\fP is NULL, an internal 32KiB +block on the machine stack is used. +.P +If \fIcallback\fP is NULL and \fIcallback_data\fP is not NULL, +\fIcallback_data\fP must be a valid JIT stack, the result of calling +\fBpcre2_jit_stack_create()\fP. +.P +If \fIcallback\fP is not NULL, it is called with \fIcallback_data\fP as an +argument at the start of matching, in order to set up a JIT stack. If the +result is NULL, the internal 32KiB stack is used; otherwise the return value +must be a valid JIT stack, the result of calling +\fBpcre2_jit_stack_create()\fP. +.P +You may safely use the same JIT stack for multiple patterns, as long as they +are all matched in the same thread. In a multithread application, each thread +must use its own JIT stack. For more details, see the +.\" HREF +\fBpcre2jit\fP +.\" +page. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_jit_stack_create.3 b/src/pcre2/doc/pcre2_jit_stack_create.3 new file mode 100644 index 00000000..f0b29f0d --- /dev/null +++ b/src/pcre2/doc/pcre2_jit_stack_create.3 @@ -0,0 +1,39 @@ +.TH PCRE2_JIT_STACK_CREATE 3 "24 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE \fIstartsize\fP, +.B " PCRE2_SIZE \fImaxsize\fP, pcre2_general_context *\fIgcontext\fP);" +.fi +.
+.SH DESCRIPTION +.rs +.sp +This function is used to create a stack for use by the code compiled by the JIT +compiler. The first two arguments are a starting size for the stack, and a +maximum size to which it is allowed to grow. The final argument is a general +context, for memory allocation functions, or NULL for standard memory +allocation. The result can be passed to the JIT run-time code by calling +\fBpcre2_jit_stack_assign()\fP to associate the stack with a compiled pattern, +which can then be processed by \fBpcre2_match()\fP or \fBpcre2_jit_match()\fP. +A maximum stack size of 512KiB to 1MiB should be more than enough for any +pattern. For more details, see the +.\" HREF +\fBpcre2jit\fP +.\" +page. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_jit_stack_free.3 b/src/pcre2/doc/pcre2_jit_stack_free.3 new file mode 100644 index 00000000..2131a793 --- /dev/null +++ b/src/pcre2/doc/pcre2_jit_stack_free.3 @@ -0,0 +1,32 @@ +.TH PCRE2_JIT_STACK_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.SM +.B void pcre2_jit_stack_free(pcre2_jit_stack *\fIjit_stack\fP); +. +.SH DESCRIPTION +.rs +.sp +This function is used to free a JIT stack that was created by +\fBpcre2_jit_stack_create()\fP when it is no longer needed. If the argument is +NULL, the function returns immediately without doing anything. For more +details, see the +.\" HREF +\fBpcre2jit\fP +.\" +page. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_maketables.3 b/src/pcre2/doc/pcre2_maketables.3 new file mode 100644 index 00000000..7dc8438b --- /dev/null +++ b/src/pcre2/doc/pcre2_maketables.3 @@ -0,0 +1,36 @@ +.TH PCRE2_MAKETABLES 3 "17 April 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.SM +.B const uint8_t *pcre2_maketables(pcre2_general_context *\fIgcontext\fP); +. +.SH DESCRIPTION +.rs +.sp +This function builds a set of character tables for character code points that +are less than 256. These can be passed to \fBpcre2_compile()\fP in a compile +context in order to override the internal, built-in tables (which were either +defaulted or made by \fBpcre2_maketables()\fP when PCRE2 was compiled). See the +.\" HREF +\fBpcre2_set_character_tables()\fP +.\" +page. You might want to do this if you are using a non-standard locale. +.P +If the argument is NULL, \fBmalloc()\fP is used to get memory for the tables. +Otherwise it must point to a general context, which can supply pointers to a +custom memory manager. The function yields a pointer to the tables. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_maketables_free.3 b/src/pcre2/doc/pcre2_maketables_free.3 new file mode 100644 index 00000000..07986b97 --- /dev/null +++ b/src/pcre2/doc/pcre2_maketables_free.3 @@ -0,0 +1,31 @@ +.TH PCRE2_MAKETABLES_FREE 3 "02 September 2019" "PCRE2 10.34" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B void pcre2_maketables_free(pcre2_general_context *\fIgcontext\fP, +.B " const uint8_t *\fItables\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function discards a set of character tables that were created by a call +to +.\" HREF +\fBpcre2_maketables()\fP. 
+.\" +.P +The \fIgcontext\fP parameter should match what was used in that call to +account for any custom allocators that might be in use; if it is NULL +the system \fBfree()\fP is used. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_match.3 b/src/pcre2/doc/pcre2_match.3 new file mode 100644 index 00000000..2be2dd0a --- /dev/null +++ b/src/pcre2/doc/pcre2_match.3 @@ -0,0 +1,84 @@ +.TH PCRE2_MATCH 3 "16 October 2018" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function matches a compiled regular expression against a given subject +string, using a matching algorithm that is similar to Perl's. It returns +offsets to what it has matched and to captured substrings via the +\fBmatch_data\fP block, which can be processed by functions with names that +start with \fBpcre2_get_ovector_...()\fP or \fBpcre2_substring_...()\fP. The +return from \fBpcre2_match()\fP is one more than the highest numbered capturing +pair that has been set (for example, 1 if there are no captures), zero if the +vector of offsets is too small, or a negative error code for no match and other +errors. 
The function arguments are: +.sp + \fIcode\fP Points to the compiled pattern + \fIsubject\fP Points to the subject string + \fIlength\fP Length of the subject string + \fIstartoffset\fP Offset in the subject at which to start matching + \fIoptions\fP Option bits + \fImatch_data\fP Points to a match data block, for results + \fImcontext\fP Points to a match context, or is NULL +.sp +A match context is needed only if you want to: +.sp + Set up a callout function + Set a matching offset limit + Change the heap memory limit + Change the backtracking match limit + Change the backtracking depth limit + Set custom memory management specifically for the match +.sp +The \fIlength\fP and \fIstartoffset\fP values are code units, not characters. +The length may be given as PCRE2_ZERO_TERMINATED for a subject that is +terminated by a binary zero code unit. The options are: +.sp + PCRE2_ANCHORED Match only at the first position + PCRE2_COPY_MATCHED_SUBJECT + On success, make a private subject copy + PCRE2_ENDANCHORED Pattern can match only at end of subject + PCRE2_NOTBOL Subject string is not the beginning of a line + PCRE2_NOTEOL Subject string is not the end of a line + PCRE2_NOTEMPTY An empty string is not a valid match +.\" JOIN + PCRE2_NOTEMPTY_ATSTART An empty string at the start of the subject + is not a valid match + PCRE2_NO_JIT Do not use JIT matching +.\" JOIN + PCRE2_NO_UTF_CHECK Do not check the subject for UTF + validity (only relevant if PCRE2_UTF + was set at compile time) +.\" JOIN + PCRE2_PARTIAL_HARD Return PCRE2_ERROR_PARTIAL for a partial + match even if there is a full match +.\" JOIN + PCRE2_PARTIAL_SOFT Return PCRE2_ERROR_PARTIAL for a partial + match if no full matches are found +.sp +For details of partial matching, see the +.\" HREF +\fBpcre2partial\fP +.\" +page. There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
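The return-value convention of pcre2_match() described in this page can be sketched with two small helpers. This is not PCRE2 code: usable_pairs and copy_pair are hypothetical names, size_t stands in for PCRE2_SIZE, SIZE_MAX stands in for PCRE2_UNSET, and the layout of the ovector as consecutive start/end offset pairs is taken from the pcre2api page rather than from this page. The sketch assumes the 8-bit library, where a code unit is one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of ovector pairs that hold results after pcre2_match() returned rc.
   rc > 0  : rc pairs are set (pair 0 is the whole match)
   rc == 0 : the ovector was too small, so every available pair was filled
   rc < 0  : no match, a partial match, or an error; this sketch treats
             all of these as "zero pairs" */
uint32_t usable_pairs(int rc, uint32_t ovector_count)
{
    if (rc < 0)
        return 0;
    if (rc == 0)
        return ovector_count;
    return (uint32_t)rc;
}

/* Copy pair number n out of subject into out (capacity outlen bytes,
   including the terminating zero). Returns the substring length, or -1
   if the pair is unset, inverted, or does not fit. */
long copy_pair(const char *subject, const size_t *ovector, uint32_t n,
               char *out, size_t outlen)
{
    size_t start, end, len;

    assert(subject != NULL && ovector != NULL && out != NULL);
    start = ovector[2 * (size_t)n];
    end = ovector[2 * (size_t)n + 1];
    if (start == SIZE_MAX || end < start)
        return -1;
    len = end - start;
    if (len >= outlen)
        return -1;
    memcpy(out, subject + start, len);
    out[len] = '\0';
    return (long)len;
}
```

In a real program the ovector pointer and pair count would come from pcre2_get_ovector_pointer() and pcre2_get_ovector_count(), and the pcre2_substring_...() functions mentioned above do the same job with proper handling of all code unit widths.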
diff --git a/src/pcre2/doc/pcre2_match_context_copy.3 b/src/pcre2/doc/pcre2_match_context_copy.3 new file mode 100644 index 00000000..26c33a69 --- /dev/null +++ b/src/pcre2/doc/pcre2_match_context_copy.3 @@ -0,0 +1,29 @@ +.TH PCRE2_MATCH_CONTEXT_COPY 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B pcre2_match_context *pcre2_match_context_copy( +.B " pcre2_match_context *\fImcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function makes a new copy of a match context, using the memory +allocation function that was used for the original context. The result is NULL +if the memory cannot be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_match_context_create.3 b/src/pcre2/doc/pcre2_match_context_create.3 new file mode 100644 index 00000000..d4a26653 --- /dev/null +++ b/src/pcre2/doc/pcre2_match_context_create.3 @@ -0,0 +1,30 @@ +.TH PCRE2_MATCH_CONTEXT_CREATE 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B pcre2_match_context *pcre2_match_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function creates and initializes a new match context. If its argument is +NULL, \fBmalloc()\fP is used to get the necessary memory; otherwise the memory +allocation function within the general context is used. The result is NULL if +the memory could not be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_match_context_free.3 b/src/pcre2/doc/pcre2_match_context_free.3 new file mode 100644 index 00000000..7d19f986 --- /dev/null +++ b/src/pcre2/doc/pcre2_match_context_free.3 @@ -0,0 +1,29 @@ +.TH PCRE2_MATCH_CONTEXT_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B void pcre2_match_context_free(pcre2_match_context *\fImcontext\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function frees the memory occupied by a match context, using the memory +freeing function from the general context with which it was created, or +\fBfree()\fP if that was not set. If the argument is NULL, the function returns +immediately without doing anything. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_match_data_create.3 b/src/pcre2/doc/pcre2_match_data_create.3 new file mode 100644 index 00000000..3b0a29e1 --- /dev/null +++ b/src/pcre2/doc/pcre2_match_data_create.3 @@ -0,0 +1,36 @@ +.TH PCRE2_MATCH_DATA_CREATE 3 "29 July 2015" "PCRE2 10.21" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B pcre2_match_data *pcre2_match_data_create(uint32_t \fIovecsize\fP, +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function creates a new match data block, which is used for holding the +result of a match. The first argument specifies the number of pairs of offsets +that are required. These form the "output vector" (ovector) within the match +data block, and are used to identify the matched string and any captured +substrings. There is always one pair of offsets; if \fBovecsize\fP is zero, it +is treated as one. 
+.P +The second argument points to a general context, for custom memory management, +or is NULL for system memory management. The result of the function is NULL if +the memory for the block could not be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_match_data_create_from_pattern.3 b/src/pcre2/doc/pcre2_match_data_create_from_pattern.3 new file mode 100644 index 00000000..60bf77cc --- /dev/null +++ b/src/pcre2/doc/pcre2_match_data_create_from_pattern.3 @@ -0,0 +1,37 @@ +.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "29 July 2015" "PCRE2 10.21" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B pcre2_match_data *pcre2_match_data_create_from_pattern( +.B " const pcre2_code *\fIcode\fP, pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function creates a new match data block, which is used for holding the +result of a match. The first argument points to a compiled pattern. The number +of capturing parentheses within the pattern is used to compute the number of +pairs of offsets that are required in the match data block. These form the +"output vector" (ovector) within the match data block, and are used to identify +the matched string and any captured substrings. +.P +The second argument points to a general context, for custom memory management, +or is NULL to use the same memory allocator as was used for the compiled +pattern. The result of the function is NULL if the memory for the block could +not be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_match_data_free.3 b/src/pcre2/doc/pcre2_match_data_free.3 new file mode 100644 index 00000000..cebdef90 --- /dev/null +++ b/src/pcre2/doc/pcre2_match_data_free.3 @@ -0,0 +1,33 @@ +.TH PCRE2_MATCH_DATA_FREE 3 "16 October 2018" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +If \fImatch_data\fP is NULL, this function does nothing. Otherwise, +\fImatch_data\fP must point to a match data block, which this function frees, +using the memory freeing function from the general context or compiled pattern +with which it was created, or \fBfree()\fP if that was not set. +.P +If the PCRE2_COPY_MATCHED_SUBJECT option was used for a successful match using this +match data block, the copy of the subject that was remembered with the block is +also freed. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_pattern_convert.3 b/src/pcre2/doc/pcre2_pattern_convert.3 new file mode 100644 index 00000000..b72acb76 --- /dev/null +++ b/src/pcre2/doc/pcre2_pattern_convert.3 @@ -0,0 +1,55 @@ +.TH PCRE2_PATTERN_CONVERT 3 "11 July 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, +.B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP," +.B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It converts a foreign pattern (for example, a glob) into a PCRE2 regular +expression pattern.
Its arguments are: +.sp + \fIpattern\fP The foreign pattern + \fIlength\fP The length of the input pattern or PCRE2_ZERO_TERMINATED + \fIoptions\fP Option bits + \fIbuffer\fP Pointer to pointer to output buffer, or NULL + \fIblength\fP Pointer to output length field + \fIcvcontext\fP Pointer to a convert context or NULL +.sp +The length of the converted pattern (excluding the terminating zero) is +returned via \fIblength\fP. If \fIbuffer\fP is NULL, the function just returns +the output length. If \fIbuffer\fP points to a NULL pointer, heap memory is +obtained for the converted pattern, using the allocator in the context if +present (or else \fBmalloc()\fP), and the field pointed to by \fIbuffer\fP is +updated. If \fIbuffer\fP points to a non-NULL field, that must point to a +buffer whose size is in the variable pointed to by \fIblength\fP. This value is +updated. +.P +The option bits are: +.sp + PCRE2_CONVERT_UTF Input is UTF + PCRE2_CONVERT_NO_UTF_CHECK Do not check UTF validity + PCRE2_CONVERT_POSIX_BASIC Convert POSIX basic pattern + PCRE2_CONVERT_POSIX_EXTENDED Convert POSIX extended pattern + PCRE2_CONVERT_GLOB ) Convert + PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR ) various types + PCRE2_CONVERT_GLOB_NO_STARSTAR ) of glob +.sp +The return value from \fBpcre2_pattern_convert()\fP is zero on success or a +non-zero PCRE2 error code. +.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. diff --git a/src/pcre2/doc/pcre2_pattern_info.3 b/src/pcre2/doc/pcre2_pattern_info.3 new file mode 100644 index 00000000..edd8989d --- /dev/null +++ b/src/pcre2/doc/pcre2_pattern_info.3 @@ -0,0 +1,108 @@ +.TH PCRE2_PATTERN_INFO 3 "14 February 2019" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_pattern_info(const pcre2_code *\fIcode\fP, uint32_t \fIwhat\fP, +.B " void *\fIwhere\fP);" +.fi +. 
+.SH DESCRIPTION +.rs +.sp +This function returns information about a compiled pattern. Its arguments are: +.sp + \fIcode\fP Pointer to a compiled regular expression pattern + \fIwhat\fP What information is required + \fIwhere\fP Where to put the information +.sp +The recognized values for the \fIwhat\fP argument, and the information they +request are as follows: +.sp + PCRE2_INFO_ALLOPTIONS Final options after compiling + PCRE2_INFO_ARGOPTIONS Options passed to \fBpcre2_compile()\fP + PCRE2_INFO_BACKREFMAX Number of highest backreference + PCRE2_INFO_BSR What \eR matches: + PCRE2_BSR_UNICODE: Unicode line endings + PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only + PCRE2_INFO_CAPTURECOUNT Number of capturing subpatterns +.\" JOIN + PCRE2_INFO_DEPTHLIMIT Backtracking depth limit if set, + otherwise PCRE2_ERROR_UNSET + PCRE2_INFO_EXTRAOPTIONS Extra options that were passed in the + compile context + PCRE2_INFO_FIRSTBITMAP Bitmap of first code units, or NULL + PCRE2_INFO_FIRSTCODETYPE Type of start-of-match information + 0 nothing set + 1 first code unit is set + 2 start of string or after newline + PCRE2_INFO_FIRSTCODEUNIT First code unit when type is 1 + PCRE2_INFO_FRAMESIZE Size of backtracking frame + PCRE2_INFO_HASBACKSLASHC Return 1 if pattern contains \eC +.\" JOIN + PCRE2_INFO_HASCRORLF Return 1 if explicit CR or LF matches + exist in the pattern +.\" JOIN + PCRE2_INFO_HEAPLIMIT Heap memory limit if set, + otherwise PCRE2_ERROR_UNSET + PCRE2_INFO_JCHANGED Return 1 if (?J) or (?-J) was used + PCRE2_INFO_JITSIZE Size of JIT compiled code, or 0 + PCRE2_INFO_LASTCODETYPE Type of must-be-present information + 0 nothing set + 1 code unit is set + PCRE2_INFO_LASTCODEUNIT Last code unit when type is 1 +.\" JOIN + PCRE2_INFO_MATCHEMPTY 1 if the pattern can match an + empty string, 0 otherwise +.\" JOIN + PCRE2_INFO_MATCHLIMIT Match limit if set, + otherwise PCRE2_ERROR_UNSET +.\" JOIN + PCRE2_INFO_MAXLOOKBEHIND Length (in characters) of the longest + lookbehind assertion + 
PCRE2_INFO_MINLENGTH Lower bound length of matching strings + PCRE2_INFO_NAMECOUNT Number of named subpatterns + PCRE2_INFO_NAMEENTRYSIZE Size of name table entries + PCRE2_INFO_NAMETABLE Pointer to name table + PCRE2_INFO_NEWLINE Code for the newline sequence: + PCRE2_NEWLINE_CR + PCRE2_NEWLINE_LF + PCRE2_NEWLINE_CRLF + PCRE2_NEWLINE_ANY + PCRE2_NEWLINE_ANYCRLF + PCRE2_NEWLINE_NUL + PCRE2_INFO_RECURSIONLIMIT Obsolete synonym for PCRE2_INFO_DEPTHLIMIT + PCRE2_INFO_SIZE Size of compiled pattern +.sp +If \fIwhere\fP is NULL, the function returns the amount of memory needed for +the requested information, in bytes. Otherwise, the \fIwhere\fP argument must +point to an unsigned 32-bit integer (uint32_t variable), except for the +following \fIwhat\fP values, when it must point to a variable of the type +shown: +.sp + PCRE2_INFO_FIRSTBITMAP const uint8_t * + PCRE2_INFO_JITSIZE size_t + PCRE2_INFO_NAMETABLE PCRE2_SPTR + PCRE2_INFO_SIZE size_t +.sp +The yield of the function is zero on success or: +.sp + PCRE2_ERROR_NULL the argument \fIcode\fP is NULL + PCRE2_ERROR_BADMAGIC the "magic number" was not found + PCRE2_ERROR_BADOPTION the value of \fIwhat\fP is invalid + PCRE2_ERROR_BADMODE the pattern was compiled in the wrong mode + PCRE2_ERROR_UNSET the requested information is not set +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_serialize_decode.3 b/src/pcre2/doc/pcre2_serialize_decode.3 new file mode 100644 index 00000000..b67a112d --- /dev/null +++ b/src/pcre2/doc/pcre2_serialize_decode.3 @@ -0,0 +1,53 @@ +.TH PCRE2_SERIALIZE_DECODE 3 "27 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP, +.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP," +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function decodes a serialized set of compiled patterns back into a list of +individual patterns. This is possible only on a host that is running the same +version of PCRE2, with the same code unit width, and the host must also have +the same endianness, pointer width and PCRE2_SIZE type. The arguments for +\fBpcre2_serialize_decode()\fP are: +.sp + \fIcodes\fP pointer to a vector in which to build the list + \fInumber_of_codes\fP number of slots in the vector + \fIbytes\fP the serialized byte stream + \fIgcontext\fP pointer to a general context or NULL +.sp +The \fIbytes\fP argument must point to a block of data that was originally +created by \fBpcre2_serialize_encode()\fP, though it may have been saved on +disc or elsewhere in the meantime. If there are more codes in the serialized +data than slots in the list, only those compiled patterns that will fit are +decoded. 
The yield of the function is the number of decoded patterns, or one of +the following negative error codes: +.sp + PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less + PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP + PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE version + PCRE2_ERROR_MEMORY memory allocation failed + PCRE2_ERROR_NULL \fIcodes\fP or \fIbytes\fP is NULL +.sp +PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled +on a system with different endianness. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the serialization functions in the +.\" HREF +\fBpcre2serialize\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_serialize_encode.3 b/src/pcre2/doc/pcre2_serialize_encode.3 new file mode 100644 index 00000000..d5293608 --- /dev/null +++ b/src/pcre2/doc/pcre2_serialize_encode.3 @@ -0,0 +1,54 @@ +.TH PCRE2_SERIALIZE_ENCODE 3 "27 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int32_t pcre2_serialize_encode(const pcre2_code **\fIcodes\fP, +.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP," +.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function encodes a list of compiled patterns into a byte stream that can +be saved on disc or elsewhere. Note that this is not an abstract format like +Java or .NET. Conversion of the byte stream back into usable compiled patterns +can only happen on a host that is running the same version of PCRE2, with the +same code unit width, and the host must also have the same endianness, pointer +width and PCRE2_SIZE type. 
The arguments for \fBpcre2_serialize_encode()\fP +are: +.sp + \fIcodes\fP pointer to a vector containing the list + \fInumber_of_codes\fP number of slots in the vector + \fIserialized_bytes\fP set to point to the serialized byte stream + \fIserialized_size\fP set to the number of bytes in the byte stream + \fIgcontext\fP pointer to a general context or NULL +.sp +The context argument is used to obtain memory for the byte stream. When the +serialized data is no longer needed, it must be freed by calling +\fBpcre2_serialize_free()\fP. The yield of the function is the number of +serialized patterns, or one of the following negative error codes: +.sp + PCRE2_ERROR_BADDATA \fInumber_of_codes\fP is zero or less + PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns + PCRE2_ERROR_MEMORY memory allocation failed + PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables + PCRE2_ERROR_NULL an argument other than \fIgcontext\fP is NULL +.sp +PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or +that a slot in the vector does not point to a compiled pattern. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the serialization functions in the +.\" HREF +\fBpcre2serialize\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_serialize_free.3 b/src/pcre2/doc/pcre2_serialize_free.3 new file mode 100644 index 00000000..2c43824b --- /dev/null +++ b/src/pcre2/doc/pcre2_serialize_free.3 @@ -0,0 +1,29 @@ +.TH PCRE2_SERIALIZE_FREE 3 "27 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B void pcre2_serialize_free(uint8_t *\fIbytes\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This function frees the memory that was obtained by +\fBpcre2_serialize_encode()\fP to hold a serialized byte stream. 
The argument +must point to such a byte stream or be NULL, in which case the function returns +without doing anything. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the serialization functions in the +.\" HREF +\fBpcre2serialize\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_serialize_get_number_of_codes.3 b/src/pcre2/doc/pcre2_serialize_get_number_of_codes.3 new file mode 100644 index 00000000..f5eea540 --- /dev/null +++ b/src/pcre2/doc/pcre2_serialize_get_number_of_codes.3 @@ -0,0 +1,37 @@ +.TH PCRE2_SERIALIZE_GET_NUMBER_OF_CODES 3 "27 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +The \fIbytes\fP argument must point to a serialized byte stream that was +originally created by \fBpcre2_serialize_encode()\fP (though it may have been +saved on disc or elsewhere in the meantime). The function returns the number of +serialized patterns in the byte stream, or one of the following negative error +codes: +.sp + PCRE2_ERROR_BADMAGIC mismatch of id bytes in \fIbytes\fP + PCRE2_ERROR_BADMODE mismatch of variable unit size or PCRE version + PCRE2_ERROR_NULL the argument is NULL +.sp +PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled +on a system with different endianness. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the serialization functions in the +.\" HREF +\fBpcre2serialize\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_set_bsr.3 b/src/pcre2/doc/pcre2_set_bsr.3 new file mode 100644 index 00000000..ecf2437f --- /dev/null +++ b/src/pcre2/doc/pcre2_set_bsr.3 @@ -0,0 +1,30 @@ +.TH PCRE2_SET_BSR 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_bsr(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the convention for processing \eR within a compile context. +The second argument must be one of PCRE2_BSR_ANYCRLF or PCRE2_BSR_UNICODE. The +result is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_callout.3 b/src/pcre2/doc/pcre2_set_callout.3 new file mode 100644 index 00000000..cb48e143 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_callout.3 @@ -0,0 +1,31 @@ +.TH PCRE2_SET_CALLOUT 3 "21 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_callout_block *)," +.B " void *\fIcallout_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the callout fields in a match context (the first argument). +The second argument specifies a callout function, and the third argument is an +opaque data item that is passed to it. The result of this function is always +zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_set_character_tables.3 b/src/pcre2/doc/pcre2_set_character_tables.3 new file mode 100644 index 00000000..1ca41347 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_character_tables.3 @@ -0,0 +1,35 @@ +.TH PCRE2_SET_CHARACTER_TABLES 3 "20 March 2020" "PCRE2 10.35" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_character_tables(pcre2_compile_context *\fIccontext\fP, +.B " const uint8_t *\fItables\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets a pointer to custom character tables within a compile +context. The second argument must point to a set of PCRE2 character tables or +be NULL to request the default tables. The result is always zero. Character +tables can be created by calling \fBpcre2_maketables()\fP or by running the +\fBpcre2_dftables\fP maintenance command in binary mode (see the +.\" HREF +\fBpcre2build\fP +.\" +documentation). +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_compile_extra_options.3 b/src/pcre2/doc/pcre2_set_compile_extra_options.3 new file mode 100644 index 00000000..764a75e8 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_compile_extra_options.3 @@ -0,0 +1,42 @@ +.TH PCRE2_SET_COMPILE_EXTRA_OPTIONS 3 "11 February 2019" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIextra_options\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets additional option bits for \fBpcre2_compile()\fP that are +housed in a compile context. It completely replaces all the bits. 
The extra +options are: +.sp +.\" JOIN + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES Allow \ex{d800} to \ex{dfff} + in UTF-8 and UTF-32 modes +.\" JOIN + PCRE2_EXTRA_ALT_BSUX Extended alternate \eu, \eU, and \ex + handling +.\" JOIN + PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL Treat all invalid escapes as + a literal following character + PCRE2_EXTRA_ESCAPED_CR_IS_LF Interpret \er as \en + PCRE2_EXTRA_MATCH_LINE Pattern matches whole lines + PCRE2_EXTRA_MATCH_WORD Pattern matches "words" +.sp +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_compile_recursion_guard.3 b/src/pcre2/doc/pcre2_set_compile_recursion_guard.3 new file mode 100644 index 00000000..0575f940 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_compile_recursion_guard.3 @@ -0,0 +1,34 @@ +.TH PCRE2_SET_COMPILE_RECURSION_GUARD 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_set_compile_recursion_guard(pcre2_compile_context *\fIccontext\fP, +.B " int (*\fIguard_function\fP)(uint32_t, void *), void *\fIuser_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function defines, within a compile context, a function that is called +whenever \fBpcre2_compile()\fP starts to compile a parenthesized part of a +pattern. The first argument to the function gives the current depth of +parenthesis nesting, and the second is user data that is supplied when the +function is set up. The guard function should return zero if all is well, or +non-zero to force an error. This feature is provided so that applications can +check the available system stack space, in order to avoid running out. The +result of \fBpcre2_set_compile_recursion_guard()\fP is always zero. 
+.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_depth_limit.3 b/src/pcre2/doc/pcre2_set_depth_limit.3 new file mode 100644 index 00000000..62bc7fe1 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_depth_limit.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_DEPTH_LIMIT 3 "25 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the backtracking depth limit field in a match context. The +result is always zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_glob_escape.3 b/src/pcre2/doc/pcre2_set_glob_escape.3 new file mode 100644 index 00000000..d5637aff --- /dev/null +++ b/src/pcre2/doc/pcre2_set_glob_escape.3 @@ -0,0 +1,29 @@ +.TH PCRE2_SET_GLOB_ESCAPE 3 "11 July 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP, +.B " uint32_t \fIescape_char\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It sets the escape character that is used when converting globs. The second +argument must either be zero (meaning there is no escape character) or a +punctuation character whose code point is less than 256. The default is grave +accent if running under Windows, otherwise backslash. The result of the +function is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. 
+.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. diff --git a/src/pcre2/doc/pcre2_set_glob_separator.3 b/src/pcre2/doc/pcre2_set_glob_separator.3 new file mode 100644 index 00000000..5d78c097 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_glob_separator.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_GLOB_SEPARATOR 3 "11 July 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP, +.B " uint32_t \fIseparator_char\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is part of an experimental set of pattern conversion functions. +It sets the component separator character that is used when converting globs. +The second argument must be one of the characters forward slash, backslash, or +dot. The default is backslash when running under Windows, otherwise forward +slash. The result of the function is zero for success or PCRE2_ERROR_BADDATA if +the second argument is invalid. +.P +The pattern conversion functions are described in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. diff --git a/src/pcre2/doc/pcre2_set_heap_limit.3 b/src/pcre2/doc/pcre2_set_heap_limit.3 new file mode 100644 index 00000000..7c155a26 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_heap_limit.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_HEAP_LIMIT 3 "11 April 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_heap_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the backtracking heap limit field in a match context. The +result is always zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_set_match_limit.3 b/src/pcre2/doc/pcre2_set_match_limit.3 new file mode 100644 index 00000000..523e97f2 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_match_limit.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_MATCH_LIMIT 3 "24 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the match limit field in a match context. The result is +always zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_max_pattern_length.3 b/src/pcre2/doc/pcre2_set_max_pattern_length.3 new file mode 100644 index 00000000..7aa01c77 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_max_pattern_length.3 @@ -0,0 +1,31 @@ +.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "05 October 2016" "PCRE2 10.23" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_max_pattern_length(pcre2_compile_context *\fIccontext\fP, +.B " PCRE2_SIZE \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets, in a compile context, the maximum text length (in code +units) of the pattern that can be compiled. The result is always zero. If a +longer pattern is passed to \fBpcre2_compile()\fP there is an immediate error +return. The default is effectively unlimited, being the largest value a +PCRE2_SIZE variable can hold. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_set_newline.3 b/src/pcre2/doc/pcre2_set_newline.3 new file mode 100644 index 00000000..0bccfc7d --- /dev/null +++ b/src/pcre2/doc/pcre2_set_newline.3 @@ -0,0 +1,39 @@ +.TH PCRE2_SET_NEWLINE 3 "26 May 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the newline convention within a compile context. This +specifies which character(s) are recognized as newlines when compiling and +matching patterns. The second argument must be one of: +.sp + PCRE2_NEWLINE_CR Carriage return only + PCRE2_NEWLINE_LF Linefeed only + PCRE2_NEWLINE_CRLF CR followed by LF only + PCRE2_NEWLINE_ANYCRLF Any of the above + PCRE2_NEWLINE_ANY Any Unicode newline sequence + PCRE2_NEWLINE_NUL The NUL character (binary zero) +.sp +The result is zero for success or PCRE2_ERROR_BADDATA if the second argument is +invalid. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_offset_limit.3 b/src/pcre2/doc/pcre2_set_offset_limit.3 new file mode 100644 index 00000000..20fa1045 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_offset_limit.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_OFFSET_LIMIT 3 "22 September 2015" "PCRE2 10.21" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP, +.B " PCRE2_SIZE \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the offset limit field in a match context. The result is +always zero. 
+.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_parens_nest_limit.3 b/src/pcre2/doc/pcre2_set_parens_nest_limit.3 new file mode 100644 index 00000000..03676193 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_parens_nest_limit.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_PARENS_NEST_LIMIT 3 "22 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_parens_nest_limit(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets, in a compile context, the maximum depth of nested +parentheses in a pattern. The result is always zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_recursion_limit.3 b/src/pcre2/doc/pcre2_set_recursion_limit.3 new file mode 100644 index 00000000..26f42572 --- /dev/null +++ b/src/pcre2/doc/pcre2_set_recursion_limit.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SET_RECURSION_LIMIT 3 "25 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function is obsolete and should not be used in new code. Use +\fBpcre2_set_depth_limit()\fP instead. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. 
diff --git a/src/pcre2/doc/pcre2_set_recursion_memory_management.3 b/src/pcre2/doc/pcre2_set_recursion_memory_management.3 new file mode 100644 index 00000000..12f175db --- /dev/null +++ b/src/pcre2/doc/pcre2_set_recursion_memory_management.3 @@ -0,0 +1,30 @@ +.TH PCRE2_SET_RECURSION_MEMORY_MANAGEMENT 3 "25 March 2017" "PCRE2 10.30" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_recursion_memory_management( +.B " pcre2_match_context *\fImcontext\fP," +.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *)," +.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +From release 10.30 onwards, this function is obsolete and does nothing. The +result is always zero. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_set_substitute_callout.3 b/src/pcre2/doc/pcre2_set_substitute_callout.3 new file mode 100644 index 00000000..cdd7ac6a --- /dev/null +++ b/src/pcre2/doc/pcre2_set_substitute_callout.3 @@ -0,0 +1,31 @@ +.TH PCRE2_SET_SUBSTITUTE_CALLOUT 3 "12 November 2018" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *)," +.B " void *\fIcallout_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function sets the substitute callout fields in a match context (the first +argument). The second argument specifies a callout function, and the third +argument is an opaque data item that is passed to it. The result of this +function is always zero. 
+.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_substitute.3 b/src/pcre2/doc/pcre2_substitute.3 new file mode 100644 index 00000000..cceb7846 --- /dev/null +++ b/src/pcre2/doc/pcre2_substitute.3 @@ -0,0 +1,100 @@ +.TH PCRE2_SUBSTITUTE 3 "22 January 2020" "PCRE2 10.35" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_substitute(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP, PCRE2_SPTR \fIreplacement\fP," +.B " PCRE2_SIZE \fIrlength\fP, PCRE2_UCHAR *\fIoutputbuffer\fP," +.B " PCRE2_SIZE *\fIoutlengthptr\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function matches a compiled regular expression against a given subject +string, using a matching algorithm that is similar to Perl's. It then makes a +copy of the subject, substituting a replacement string for what was matched. +Its arguments are: +.sp + \fIcode\fP Points to the compiled pattern + \fIsubject\fP Points to the subject string + \fIlength\fP Length of the subject string + \fIstartoffset\fP Offset in the subject at which to start matching + \fIoptions\fP Option bits + \fImatch_data\fP Points to a match data block, or is NULL + \fImcontext\fP Points to a match context, or is NULL + \fIreplacement\fP Points to the replacement string + \fIrlength\fP Length of the replacement string + \fIoutputbuffer\fP Points to the output buffer + \fIoutlengthptr\fP Points to the length of the output buffer +.sp +A match data block is needed only if you want to inspect the data from the +final match that is returned in that block or if PCRE2_SUBSTITUTE_MATCHED is +set. 
A match context is needed only if you want to: +.sp + Set up a callout function + Set a matching offset limit + Change the backtracking match limit + Change the backtracking depth limit + Set custom memory management in the match context +.sp +The \fIlength\fP, \fIstartoffset\fP and \fIrlength\fP values are code units, +not characters, as is the contents of the variable pointed at by +\fIoutlengthptr\fP. This variable must contain the length of the output buffer +when the function is called. If the function is successful, the value is +changed to the length of the new string, excluding the trailing zero that is +automatically added. +.P +The subject and replacement lengths can be given as PCRE2_ZERO_TERMINATED for +zero-terminated strings. The options are: +.sp + PCRE2_ANCHORED Match only at the first position + PCRE2_ENDANCHORED Pattern can match only at end of subject + PCRE2_NOTBOL Subject is not the beginning of a line + PCRE2_NOTEOL Subject is not the end of a line + PCRE2_NOTEMPTY An empty string is not a valid match +.\" JOIN + PCRE2_NOTEMPTY_ATSTART An empty string at the start of the + subject is not a valid match + PCRE2_NO_JIT Do not use JIT matching +.\" JOIN + PCRE2_NO_UTF_CHECK Do not check the subject or replacement + for UTF validity (only relevant if + PCRE2_UTF was set at compile time) + PCRE2_SUBSTITUTE_EXTENDED Do extended replacement processing + PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject + PCRE2_SUBSTITUTE_LITERAL The replacement string is literal + PCRE2_SUBSTITUTE_MATCHED Use pre-existing match data for 1st match + PCRE2_SUBSTITUTE_OVERFLOW_LENGTH If overflow, compute needed length + PCRE2_SUBSTITUTE_REPLACEMENT_ONLY Return only replacement string(s) + PCRE2_SUBSTITUTE_UNKNOWN_UNSET Treat unknown group as unset + PCRE2_SUBSTITUTE_UNSET_EMPTY Simple unset insert = empty string +.sp +If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_EXTENDED, +PCRE2_SUBSTITUTE_UNKNOWN_UNSET, and PCRE2_SUBSTITUTE_UNSET_EMPTY are ignored. 
+.P +If PCRE2_SUBSTITUTE_MATCHED is set, \fImatch_data\fP must be non-zero; its +contents must be the result of a call to \fBpcre2_match()\fP using the same +pattern and subject. +.P +The function returns the number of substitutions, which may be zero if there +are no matches. The result may be greater than one only when +PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code +is returned. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_substring_copy_byname.3 b/src/pcre2/doc/pcre2_substring_copy_byname.3 new file mode 100644 index 00000000..d2af63bf --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_copy_byname.3 @@ -0,0 +1,46 @@ +.TH PCRE2_SUBSTRING_COPY_BYNAME 3 "21 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include +.PP +.nf +.B int pcre2_substring_copy_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_UCHAR *\fIbuffer\fP, PCRE2_SIZE *\fIbufflen\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for extracting a captured substring, identified +by name, into a given buffer. The arguments are: +.sp + \fImatch_data\fP The match data block for the match + \fIname\fP Name of the required substring + \fIbuffer\fP Buffer to receive the string + \fIbufflen\fP Length of buffer (code units) +.sp +The \fIbufflen\fP variable is updated to contain the length of the extracted +string, excluding the trailing zero. 
The yield of the function is zero for +success or one of the following error numbers: +.sp + PCRE2_ERROR_NOSUBSTRING there are no groups of that name + PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group + PCRE2_ERROR_UNSET the group did not participate in the match + PCRE2_ERROR_NOMEMORY the buffer is not big enough +.sp +If there is more than one group with the given name, the first one that is set +is returned. In this situation PCRE2_ERROR_UNSET means that no group with the +given name was set. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_substring_copy_bynumber.3 b/src/pcre2/doc/pcre2_substring_copy_bynumber.3 new file mode 100644 index 00000000..4cee2b42 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_copy_bynumber.3 @@ -0,0 +1,44 @@ +.TH PCRE2_SUBSTRING_COPY_BYNUMBER 3 "13 December 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_copy_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_UCHAR *\fIbuffer\fP," +.B " PCRE2_SIZE *\fIbufflen\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for extracting a captured substring into a given +buffer. The arguments are: +.sp + \fImatch_data\fP The match data block for the match + \fInumber\fP Number of the required substring + \fIbuffer\fP Buffer to receive the string + \fIbufflen\fP Length of buffer +.sp +The \fIbufflen\fP variable is updated with the length of the extracted string, +excluding the terminating zero. 
The yield of the function is zero for success +or one of the following error numbers: +.sp + PCRE2_ERROR_NOSUBSTRING there are no groups of that number + PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group + PCRE2_ERROR_UNSET the group did not participate in the match + PCRE2_ERROR_NOMEMORY the buffer is too small +.sp +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_substring_free.3 b/src/pcre2/doc/pcre2_substring_free.3 new file mode 100644 index 00000000..6d0fd588 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_free.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SUBSTRING_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.SM +.B void pcre2_substring_free(PCRE2_UCHAR *\fIbuffer\fP); +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for freeing the memory obtained by a previous +call to \fBpcre2_substring_get_byname()\fP or +\fBpcre2_substring_get_bynumber()\fP. Its only argument is a pointer to the +string. If the argument is NULL, the function does nothing. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page.
diff --git a/src/pcre2/doc/pcre2_substring_get_byname.3 b/src/pcre2/doc/pcre2_substring_get_byname.3 new file mode 100644 index 00000000..6c3f7d57 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_get_byname.3 @@ -0,0 +1,48 @@ +.TH PCRE2_SUBSTRING_GET_BYNAME 3 "21 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_get_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_UCHAR **\fIbufferptr\fP, PCRE2_SIZE *\fIbufflen\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for extracting a captured substring by name into +newly acquired memory. The arguments are: +.sp + \fImatch_data\fP The match data for the match + \fIname\fP Name of the required substring + \fIbufferptr\fP Where to put the string pointer + \fIbufflen\fP Where to put the string length +.sp +The memory in which the substring is placed is obtained by calling the same +memory allocation function that was used for the match data block. The +convenience function \fBpcre2_substring_free()\fP can be used to free it when +it is no longer needed. The yield of the function is zero for success or one of +the following error numbers: +.sp + PCRE2_ERROR_NOSUBSTRING there are no groups of that name + PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group + PCRE2_ERROR_UNSET the group did not participate in the match + PCRE2_ERROR_NOMEMORY memory could not be obtained +.sp +If there is more than one group with the given name, the first one that is set +is returned. In this situation PCRE2_ERROR_UNSET means that no group with the +given name was set. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page.
diff --git a/src/pcre2/doc/pcre2_substring_get_bynumber.3 b/src/pcre2/doc/pcre2_substring_get_bynumber.3 new file mode 100644 index 00000000..51b6a049 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_get_bynumber.3 @@ -0,0 +1,45 @@ +.TH PCRE2_SUBSTRING_GET_BYNUMBER 3 "13 December 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_get_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_UCHAR **\fIbufferptr\fP, PCRE2_SIZE *\fIbufflen\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for extracting a captured substring by number +into newly acquired memory. The arguments are: +.sp + \fImatch_data\fP The match data for the match + \fInumber\fP Number of the required substring + \fIbufferptr\fP Where to put the string pointer + \fIbufflen\fP Where to put the string length +.sp +The memory in which the substring is placed is obtained by calling the same +memory allocation function that was used for the match data block. The +convenience function \fBpcre2_substring_free()\fP can be used to free it when +it is no longer needed. The yield of the function is zero for success or one of +the following error numbers: +.sp + PCRE2_ERROR_NOSUBSTRING there are no groups of that number + PCRE2_ERROR_UNAVAILABLE the ovector was too small for that group + PCRE2_ERROR_UNSET the group did not participate in the match + PCRE2_ERROR_NOMEMORY memory could not be obtained +.sp +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page.
diff --git a/src/pcre2/doc/pcre2_substring_length_byname.3 b/src/pcre2/doc/pcre2_substring_length_byname.3 new file mode 100644 index 00000000..84cdc6a5 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_length_byname.3 @@ -0,0 +1,34 @@ +.TH PCRE2_SUBSTRING_LENGTH_BYNAME 3 "21 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_length_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_SIZE *\fIlength\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function returns the length of a matched substring, identified by name. +The arguments are: +.sp + \fImatch_data\fP The match data block for the match + \fIname\fP The substring name + \fIlength\fP Where to return the length +.sp +The yield is zero on success, or an error code if the substring is not found. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_substring_length_bynumber.3 b/src/pcre2/doc/pcre2_substring_length_bynumber.3 new file mode 100644 index 00000000..12778d61 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_length_bynumber.3 @@ -0,0 +1,36 @@ +.TH PCRE2_SUBSTRING_LENGTH_BYNUMBER 3 "22 December 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_length_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_SIZE *\fIlength\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This function returns the length of a matched substring, identified by number.
+The arguments are: +.sp + \fImatch_data\fP The match data block for the match + \fInumber\fP The substring number + \fIlength\fP Where to return the length, or NULL +.sp +The third argument may be NULL if all you want to know is whether or not a +substring is set. The yield is zero on success, or a negative error code +otherwise. After a partial match, only substring 0 is available. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2_substring_list_free.3 b/src/pcre2/doc/pcre2_substring_list_free.3 new file mode 100644 index 00000000..d977ed52 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_list_free.3 @@ -0,0 +1,28 @@ +.TH PCRE2_SUBSTRING_LIST_FREE 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.SM +.B void pcre2_substring_list_free(PCRE2_SPTR *\fIlist\fP); +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for freeing the store obtained by a previous +call to \fBpcre2_substring_list_get()\fP. Its only argument is a pointer to +the list of string pointers. If the argument is NULL, the function returns +immediately, without doing anything. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page.
diff --git a/src/pcre2/doc/pcre2_substring_list_get.3 b/src/pcre2/doc/pcre2_substring_list_get.3 new file mode 100644 index 00000000..bdc400ec --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_list_get.3 @@ -0,0 +1,44 @@ +.TH PCRE2_SUBSTRING_LIST_GET 3 "21 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_list_get(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_UCHAR ***\fIlistptr\fP, PCRE2_SIZE **\fIlengthsptr\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This is a convenience function for extracting all the captured substrings after +a pattern match. It builds a list of pointers to the strings, and (optionally) +a second list that contains their lengths (in code units), excluding a +terminating zero that is added to each of them. All this is done in a single +block of memory that is obtained using the same memory allocation function that +was used to get the match data block. The convenience function +\fBpcre2_substring_list_free()\fP can be used to free it when it is no longer +needed. The arguments are: +.sp + \fImatch_data\fP The match data block + \fIlistptr\fP Where to put a pointer to the list + \fIlengthsptr\fP Where to put a pointer to the lengths, or NULL +.sp +A pointer to a list of pointers is put in the variable whose address is in +\fIlistptr\fP. The list is terminated by a NULL pointer. If \fIlengthsptr\fP is +not NULL, a matching list of lengths is created, and its address is placed in +\fIlengthsptr\fP. The yield of the function is zero on success or +PCRE2_ERROR_NOMEMORY if sufficient memory could not be obtained. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page.
diff --git a/src/pcre2/doc/pcre2_substring_nametable_scan.3 b/src/pcre2/doc/pcre2_substring_nametable_scan.3 new file mode 100644 index 00000000..9ab58cdc --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_nametable_scan.3 @@ -0,0 +1,41 @@ +.TH PCRE2_SUBSTRING_NAMETABLE_SCAN 3 "03 February 2019" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_nametable_scan(const pcre2_code *\fIcode\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This convenience function finds, for a compiled pattern, the first and last +entries for a given name in the table that translates capture group names into +numbers. +.sp + \fIcode\fP Compiled regular expression + \fIname\fP Name whose entries are required + \fIfirst\fP Where to return a pointer to the first entry + \fIlast\fP Where to return a pointer to the last entry +.sp +When the name is found in the table, if \fIfirst\fP is NULL, the function +returns a group number, but if there is more than one matching entry, it is not +defined which one. Otherwise, when both pointers have been set, the yield of +the function is the length of each entry in code units. If the name is not +found, PCRE2_ERROR_NOSUBSTRING is returned. +.P +There is a complete description of the PCRE2 native API, including the format of +the table entries, in the +.\" HREF +\fBpcre2api\fP +.\" +page, and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page.
diff --git a/src/pcre2/doc/pcre2_substring_number_from_name.3 b/src/pcre2/doc/pcre2_substring_number_from_name.3 new file mode 100644 index 00000000..b077b1d2 --- /dev/null +++ b/src/pcre2/doc/pcre2_substring_number_from_name.3 @@ -0,0 +1,38 @@ +.TH PCRE2_SUBSTRING_NUMBER_FROM_NAME 3 "21 October 2014" "PCRE2 10.00" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.nf +.B int pcre2_substring_number_from_name(const pcre2_code *\fIcode\fP, +.B " PCRE2_SPTR \fIname\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +This convenience function finds the number of a named substring capturing +parenthesis in a compiled pattern, provided that it is a unique name. The +function arguments are: +.sp + \fIcode\fP Compiled regular expression + \fIname\fP Name whose number is required +.sp +The yield of the function is the number of the parenthesis if the name is +found, or PCRE2_ERROR_NOSUBSTRING if it is not found. When duplicate names are +allowed (PCRE2_DUPNAMES is set), if the name is not unique, +PCRE2_ERROR_NOUNIQUESUBSTRING is returned. You can obtain the list of numbers +with the same name by calling \fBpcre2_substring_nametable_scan()\fP. +.P +There is a complete description of the PCRE2 native API in the +.\" HREF +\fBpcre2api\fP +.\" +page and a description of the POSIX API in the +.\" HREF +\fBpcre2posix\fP +.\" +page. diff --git a/src/pcre2/doc/pcre2api.3 b/src/pcre2/doc/pcre2api.3 new file mode 100644 index 00000000..148dca62 --- /dev/null +++ b/src/pcre2/doc/pcre2api.3 @@ -0,0 +1,4005 @@ +.TH PCRE2API 3 "04 November 2020" "PCRE2 10.36" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.sp +.B #include <pcre2.h> +.sp +PCRE2 is a new API for PCRE, starting at release 10.0. This document contains a +description of all its native functions. See the +.\" HREF +\fBpcre2\fP +.\" +document for an overview of all the PCRE2 documentation. +. +.
+.SH "PCRE2 NATIVE API BASIC FUNCTIONS" +.rs +.sp +.nf +.B pcre2_code *pcre2_compile(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, +.B " uint32_t \fIoptions\fP, int *\fIerrorcode\fP, PCRE2_SIZE *\fIerroroffset,\fP" +.B " pcre2_compile_context *\fIccontext\fP);" +.sp +.B void pcre2_code_free(pcre2_code *\fIcode\fP); +.sp +.B pcre2_match_data *pcre2_match_data_create(uint32_t \fIovecsize\fP, +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_match_data *pcre2_match_data_create_from_pattern( +.B " const pcre2_code *\fIcode\fP, pcre2_general_context *\fIgcontext\fP);" +.sp +.B int pcre2_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP);" +.sp +.B int pcre2_dfa_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP," +.B " int *\fIworkspace\fP, PCRE2_SIZE \fIwscount\fP);" +.sp +.B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP); +.fi +. +. +.SH "PCRE2 NATIVE API AUXILIARY MATCH FUNCTIONS" +.rs +.sp +.nf +.B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP); +.sp +.B uint32_t pcre2_get_ovector_count(pcre2_match_data *\fImatch_data\fP); +.sp +.B PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *\fImatch_data\fP); +.sp +.B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP); +.fi +. +. 
+.SH "PCRE2 NATIVE API GENERAL CONTEXT FUNCTIONS" +.rs +.sp +.nf +.B pcre2_general_context *pcre2_general_context_create( +.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *)," +.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);" +.sp +.B pcre2_general_context *pcre2_general_context_copy( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B void pcre2_general_context_free(pcre2_general_context *\fIgcontext\fP); +.fi +. +. +.SH "PCRE2 NATIVE API COMPILE CONTEXT FUNCTIONS" +.rs +.sp +.nf +.B pcre2_compile_context *pcre2_compile_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_compile_context *pcre2_compile_context_copy( +.B " pcre2_compile_context *\fIccontext\fP);" +.sp +.B void pcre2_compile_context_free(pcre2_compile_context *\fIccontext\fP); +.sp +.B int pcre2_set_bsr(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.sp +.B int pcre2_set_character_tables(pcre2_compile_context *\fIccontext\fP, +.B " const uint8_t *\fItables\fP);" +.sp +.B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIextra_options\fP);" +.sp +.B int pcre2_set_max_pattern_length(pcre2_compile_context *\fIccontext\fP, +.B " PCRE2_SIZE \fIvalue\fP);" +.sp +.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.sp +.B int pcre2_set_parens_nest_limit(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.sp +.B int pcre2_set_compile_recursion_guard(pcre2_compile_context *\fIccontext\fP, +.B " int (*\fIguard_function\fP)(uint32_t, void *), void *\fIuser_data\fP);" +.fi +. +. 
+.SH "PCRE2 NATIVE API MATCH CONTEXT FUNCTIONS" +.rs +.sp +.nf +.B pcre2_match_context *pcre2_match_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_match_context *pcre2_match_context_copy( +.B " pcre2_match_context *\fImcontext\fP);" +.sp +.B void pcre2_match_context_free(pcre2_match_context *\fImcontext\fP); +.sp +.B int pcre2_set_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *)," +.B " void *\fIcallout_data\fP);" +.sp +.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *)," +.B " void *\fIcallout_data\fP);" +.sp +.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP, +.B " PCRE2_SIZE \fIvalue\fP);" +.sp +.B int pcre2_set_heap_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.sp +.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.sp +.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +. +. 
+.SH "PCRE2 NATIVE API STRING EXTRACTION FUNCTIONS" +.rs +.sp +.nf +.B int pcre2_substring_copy_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_UCHAR *\fIbuffer\fP, PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B int pcre2_substring_copy_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_UCHAR *\fIbuffer\fP," +.B " PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B void pcre2_substring_free(PCRE2_UCHAR *\fIbuffer\fP); +.sp +.B int pcre2_substring_get_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_UCHAR **\fIbufferptr\fP, PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B int pcre2_substring_get_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_UCHAR **\fIbufferptr\fP," +.B " PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B int pcre2_substring_length_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_SIZE *\fIlength\fP);" +.sp +.B int pcre2_substring_length_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_SIZE *\fIlength\fP);" +.sp +.B int pcre2_substring_nametable_scan(const pcre2_code *\fIcode\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);" +.sp +.B int pcre2_substring_number_from_name(const pcre2_code *\fIcode\fP, +.B " PCRE2_SPTR \fIname\fP);" +.sp +.B void pcre2_substring_list_free(PCRE2_SPTR *\fIlist\fP); +.sp +.B int pcre2_substring_list_get(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_UCHAR ***\fIlistptr\fP, PCRE2_SIZE **\fIlengthsptr\fP);" +.fi +. +.
+.SH "PCRE2 NATIVE API STRING SUBSTITUTION FUNCTION" +.rs +.sp +.nf +.B int pcre2_substitute(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP, PCRE2_SPTR \fIreplacement\fP," +.B " PCRE2_SIZE \fIrlength\fP, PCRE2_UCHAR *\fIoutputbuffer\fP," +.B " PCRE2_SIZE *\fIoutlengthptr\fP);" +.fi +. +. +.SH "PCRE2 NATIVE API JIT FUNCTIONS" +.rs +.sp +.nf +.B int pcre2_jit_compile(pcre2_code *\fIcode\fP, uint32_t \fIoptions\fP); +.sp +.B int pcre2_jit_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP);" +.sp +.B void pcre2_jit_free_unused_memory(pcre2_general_context *\fIgcontext\fP); +.sp +.B pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE \fIstartsize\fP, +.B " PCRE2_SIZE \fImaxsize\fP, pcre2_general_context *\fIgcontext\fP);" +.sp +.B void pcre2_jit_stack_assign(pcre2_match_context *\fImcontext\fP, +.B " pcre2_jit_callback \fIcallback_function\fP, void *\fIcallback_data\fP);" +.sp +.B void pcre2_jit_stack_free(pcre2_jit_stack *\fIjit_stack\fP); +.fi +. +. +.SH "PCRE2 NATIVE API SERIALIZATION FUNCTIONS" +.rs +.sp +.nf +.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP, +.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP," +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B int32_t pcre2_serialize_encode(const pcre2_code **\fIcodes\fP, +.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP," +.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);" +.sp +.B void pcre2_serialize_free(uint8_t *\fIbytes\fP); +.sp +.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP); +.fi +. +.
+.SH "PCRE2 NATIVE API AUXILIARY FUNCTIONS" +.rs +.sp +.nf +.B pcre2_code *pcre2_code_copy(const pcre2_code *\fIcode\fP); +.sp +.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP); +.sp +.B int pcre2_get_error_message(int \fIerrorcode\fP, PCRE2_UCHAR *\fIbuffer\fP, +.B " PCRE2_SIZE \fIbufflen\fP);" +.sp +.B const uint8_t *pcre2_maketables(pcre2_general_context *\fIgcontext\fP); +.sp +.B void pcre2_maketables_free(pcre2_general_context *\fIgcontext\fP, +.B " const uint8_t *\fItables\fP);" +.sp +.B int pcre2_pattern_info(const pcre2_code *\fIcode\fP, uint32_t \fIwhat\fP, +.B " void *\fIwhere\fP);" +.sp +.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP, +.B " int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *)," +.B " void *\fIuser_data\fP);" +.sp +.B int pcre2_config(uint32_t \fIwhat\fP, void *\fIwhere\fP); +.fi +. +. +.SH "PCRE2 NATIVE API OBSOLETE FUNCTIONS" +.rs +.sp +.nf +.B int pcre2_set_recursion_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.sp +.B int pcre2_set_recursion_memory_management( +.B " pcre2_match_context *\fImcontext\fP," +.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *)," +.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);" +.fi +.sp +These functions became obsolete at release 10.30 and are retained only for +backward compatibility. They should not be used in new code. The first is +replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and +has no effect (it always returns zero). +. +. 
+.SH "PCRE2 EXPERIMENTAL PATTERN CONVERSION FUNCTIONS" +.rs +.sp +.nf +.B pcre2_convert_context *pcre2_convert_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_convert_context *pcre2_convert_context_copy( +.B " pcre2_convert_context *\fIcvcontext\fP);" +.sp +.B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP); +.sp +.B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP, +.B " uint32_t \fIescape_char\fP);" +.sp +.B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP, +.B " uint32_t \fIseparator_char\fP);" +.sp +.B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, +.B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP," +.B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);" +.sp +.B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP); +.fi +.sp +These functions provide a way of converting non-PCRE2 patterns into +patterns that can be processed by \fBpcre2_compile()\fP. This facility is +experimental and may be changed in future releases. At present, "globs" and +POSIX basic and extended patterns can be converted. Details are given in the +.\" HREF +\fBpcre2convert\fP +.\" +documentation. +. +. +.SH "PCRE2 8-BIT, 16-BIT, AND 32-BIT LIBRARIES" +.rs +.sp +There are three PCRE2 libraries, supporting 8-bit, 16-bit, and 32-bit code +units, respectively. However, there is just one header file, \fBpcre2.h\fP. +This contains the function prototypes and other definitions for all three +libraries. One, two, or all three can be installed simultaneously. On Unix-like +systems the libraries are called \fBlibpcre2-8\fP, \fBlibpcre2-16\fP, and +\fBlibpcre2-32\fP, and they can also co-exist with the original PCRE libraries. +.P +Character strings are passed to and from a PCRE2 library as a sequence of +unsigned integers in code units of the appropriate width. 
Every PCRE2 function +comes in three different forms, one for each library, for example: +.sp + \fBpcre2_compile_8()\fP + \fBpcre2_compile_16()\fP + \fBpcre2_compile_32()\fP +.sp +There are also three different sets of data types: +.sp + \fBPCRE2_UCHAR8, PCRE2_UCHAR16, PCRE2_UCHAR32\fP + \fBPCRE2_SPTR8, PCRE2_SPTR16, PCRE2_SPTR32\fP +.sp +The UCHAR types define unsigned code units of the appropriate widths. For +example, PCRE2_UCHAR16 is usually defined as `uint16_t'. The SPTR types are +constant pointers to the equivalent UCHAR types, that is, they are pointers to +vectors of unsigned code units. +.P +Many applications use only one code unit width. For their convenience, macros +are defined whose names are the generic forms such as \fBpcre2_compile()\fP and +PCRE2_SPTR. These macros use the value of the macro PCRE2_CODE_UNIT_WIDTH to +generate the appropriate width-specific function and macro names. +PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it +to be 8, 16, or 32 before including \fBpcre2.h\fP in order to make use of the +generic names. +.P +Applications that use more than one code unit width can be linked with more +than one PCRE2 library, but must define PCRE2_CODE_UNIT_WIDTH to be 0 before +including \fBpcre2.h\fP, and then use the real function names. Any code that is +to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is +unknown should also use the real function names. (Unfortunately, it is not +possible in C code to save and restore the value of a macro.) +.P +If PCRE2_CODE_UNIT_WIDTH is not defined before including \fBpcre2.h\fP, a +compiler error occurs. +.P +When using multiple libraries in an application, you must take care when +processing any particular pattern to use only functions from a single library. 
+For example, if you want to run a match using a pattern that was compiled with +\fBpcre2_compile_16()\fP, you must do so with \fBpcre2_match_16()\fP, not +\fBpcre2_match_8()\fP or \fBpcre2_match_32()\fP. +.P +In the function summaries above, and in the rest of this document and other +PCRE2 documents, functions and data types are described using their generic +names, without the _8, _16, or _32 suffix. +. +. +.SH "PCRE2 API OVERVIEW" +.rs +.sp +PCRE2 has its own native API, which is described in this document. There are +also some wrapper functions for the 8-bit library that correspond to the +POSIX regular expression API, but they do not give access to all the +functionality of PCRE2. They are described in the +.\" HREF +\fBpcre2posix\fP +.\" +documentation. Both these APIs define a set of C function calls. +.P +The native API C data types, function prototypes, option values, and error +codes are defined in the header file \fBpcre2.h\fP, which also contains +definitions of PCRE2_MAJOR and PCRE2_MINOR, the major and minor release numbers +for the library. Applications can use these to include support for different +releases of PCRE2. +.P +In a Windows environment, if you want to statically link an application program +against a non-dll PCRE2 library, you must define PCRE2_STATIC before including +\fBpcre2.h\fP. +.P +The functions \fBpcre2_compile()\fP and \fBpcre2_match()\fP are used for +compiling and matching regular expressions in a Perl-compatible manner. A +sample program that demonstrates the simplest way of using them is provided in +the file called \fIpcre2demo.c\fP in the PCRE2 source distribution. A listing +of this program is given in the +.\" HREF +\fBpcre2demo\fP +.\" +documentation, and the +.\" HREF +\fBpcre2sample\fP +.\" +documentation describes how to compile and run it. +.P +The compiling and matching functions recognize various options that are passed +as bits in an options argument. 
There are also some more complicated parameters +such as custom memory management functions and resource limits that are passed +in "contexts" (which are just memory blocks, described below). Simple +applications do not need to make use of contexts. +.P +Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be +built in appropriate hardware environments. It greatly speeds up the matching +performance of many patterns. Programs can request that it be used if +available by calling \fBpcre2_jit_compile()\fP after a pattern has been +successfully compiled by \fBpcre2_compile()\fP. This does nothing if JIT +support is not available. +.P +More complicated programs might need to make use of the specialist functions +\fBpcre2_jit_stack_create()\fP, \fBpcre2_jit_stack_free()\fP, and +\fBpcre2_jit_stack_assign()\fP in order to control the JIT code's memory usage. +.P +JIT matching is automatically used by \fBpcre2_match()\fP if it is available, +unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT +matching, which gives improved performance at the expense of less sanity +checking. The JIT-specific functions are discussed in the +.\" HREF +\fBpcre2jit\fP +.\" +documentation. +.P +A second matching function, \fBpcre2_dfa_match()\fP, which is not +Perl-compatible, is also provided. This uses a different algorithm for the +matching. The alternative algorithm finds all possible matches (at a given +point in the subject), and scans the subject just once (unless there are +lookaround assertions). However, this algorithm does not return captured +substrings. A description of the two matching algorithms and their advantages +and disadvantages is given in the +.\" HREF +\fBpcre2matching\fP +.\" +documentation. There is no JIT support for \fBpcre2_dfa_match()\fP. 
+.P +In addition to the main compiling and matching functions, there are convenience +functions for extracting captured substrings from a subject string that has +been matched by \fBpcre2_match()\fP. They are: +.sp + \fBpcre2_substring_copy_byname()\fP + \fBpcre2_substring_copy_bynumber()\fP + \fBpcre2_substring_get_byname()\fP + \fBpcre2_substring_get_bynumber()\fP + \fBpcre2_substring_list_get()\fP + \fBpcre2_substring_length_byname()\fP + \fBpcre2_substring_length_bynumber()\fP + \fBpcre2_substring_nametable_scan()\fP + \fBpcre2_substring_number_from_name()\fP +.sp +\fBpcre2_substring_free()\fP and \fBpcre2_substring_list_free()\fP are also +provided, to free memory used for extracted strings. If either of these +functions is called with a NULL argument, the function returns immediately +without doing anything. +.P +The function \fBpcre2_substitute()\fP can be called to match a pattern and +return a copy of the subject string with substitutions for parts that were +matched. +.P +Functions whose names begin with \fBpcre2_serialize_\fP are used for saving +compiled patterns on disc or elsewhere, and reloading them later. +.P +Finally, there are functions for finding out information about a compiled +pattern (\fBpcre2_pattern_info()\fP) and about the configuration with which +PCRE2 was built (\fBpcre2_config()\fP). +.P +Functions with names ending with \fB_free()\fP are used for freeing memory +blocks of various sorts. In all cases, if one of these functions is called with +a NULL argument, it does nothing. +. +. +.SH "STRING LENGTHS AND OFFSETS" +.rs +.sp +The PCRE2 API uses string lengths and offsets into strings of code units in +several places. These values are always of type PCRE2_SIZE, which is an +unsigned integer type, currently always defined as \fIsize_t\fP. The largest +value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved +as a special indicator for zero-terminated strings and unset offsets. 
+Therefore, the longest string that can be handled is one less than this +maximum. +. +. +.\" HTML +.SH NEWLINES +.rs +.sp +PCRE2 supports five different conventions for indicating line breaks in +strings: a single CR (carriage return) character, a single LF (linefeed) +character, the two-character sequence CRLF, any of the three preceding, or any +Unicode newline sequence. The Unicode newline sequences are the three just +mentioned, plus the single characters VT (vertical tab, U+000B), FF (form feed, +U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS +(paragraph separator, U+2029). +.P +Each of the first three conventions is used by at least one operating system as +its standard newline sequence. When PCRE2 is built, a default can be specified. +If it is not, the default is set to LF, which is the Unix standard. However, +the newline convention can be changed by an application when calling +\fBpcre2_compile()\fP, or it can be specified by special text at the start of +the pattern itself; this overrides any other settings. See the +.\" HREF +\fBpcre2pattern\fP +.\" +page for details of the special character sequences. +.P +In the PCRE2 documentation the word "newline" is used to mean "the character or +pair of characters that indicate a line break". The choice of newline +convention affects the handling of the dot, circumflex, and dollar +metacharacters, the handling of #-comments in /x mode, and, when CRLF is a +recognized line ending sequence, the match position advancement for a +non-anchored pattern. There is more detail about this in the +.\" HTML +.\" +section on \fBpcre2_match()\fP options +.\" +below. +.P +The choice of newline convention does not affect the interpretation of +the \en or \er escape sequences, nor does it affect what \eR matches; this has +its own separate convention. +. +. 
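As an illustration of the newline conventions described above, the following C sketch recognizes a newline at a given position under the "any Unicode newline" convention. It is a standalone model and not PCRE2's own code: the names is_unicode_newline_char() and unicode_newline_length() are hypothetical, and the subject is assumed to be an array of 32-bit code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if c is one of the single characters that can begin a Unicode
   newline sequence as listed above: LF, VT, FF, CR, NEL, LS, or PS. */
int is_unicode_newline_char(uint32_t c)
{
  switch (c)
    {
    case 0x000A:  /* LF  (linefeed) */
    case 0x000B:  /* VT  (vertical tab) */
    case 0x000C:  /* FF  (form feed) */
    case 0x000D:  /* CR  (carriage return) */
    case 0x0085:  /* NEL (next line) */
    case 0x2028:  /* LS  (line separator) */
    case 0x2029:  /* PS  (paragraph separator) */
      return 1;
    default:
      return 0;
    }
}

/* Returns the length in code points (0, 1, or 2) of the newline sequence that
   starts at subject[pos]. CR followed by LF counts as a single two-character
   newline; every other newline character counts as one. */
size_t unicode_newline_length(const uint32_t *subject, size_t length,
  size_t pos)
{
  if (pos >= length || !is_unicode_newline_char(subject[pos])) return 0;
  if (subject[pos] == 0x000D && pos + 1 < length &&
      subject[pos + 1] == 0x000A)
    return 2;
  return 1;
}
```

Treating CRLF as one unit is what makes the match position advance by two code units rather than one when CRLF is a recognized line ending.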
+.SH MULTITHREADING +.rs +.sp +In a multithreaded application it is important to keep thread-specific data +separate from data that can be shared between threads. The PCRE2 library code +itself is thread-safe: it contains no static or global variables. The API is +designed to be fairly simple for non-threaded applications while at the same +time ensuring that multithreaded applications can use it. +.P +There are several different blocks of data that are used to pass information +between the application and the PCRE2 libraries. +. +. +.SS "The compiled pattern" +.rs +.sp +A pointer to the compiled form of a pattern is returned to the user when +\fBpcre2_compile()\fP is successful. The data in the compiled pattern is fixed, +and does not change when the pattern is matched. Therefore, it is thread-safe, +that is, the same compiled pattern can be used by more than one thread +simultaneously. For example, an application can compile all its patterns at the +start, before forking off multiple threads that use them. However, if the +just-in-time (JIT) optimization feature is being used, it needs separate memory +stack areas for each thread. See the +.\" HREF +\fBpcre2jit\fP +.\" +documentation for more details. +.P +In a more complicated situation, where patterns are compiled only when they are +first needed, but are still shared between threads, pointers to compiled +patterns must be protected from simultaneous writing by multiple threads. This +is somewhat tricky to do correctly. If you know that writing to a pointer is +atomic in your environment, you can use logic like this: +.sp + Get a read-only (shared) lock (mutex) for pointer + if (pointer == NULL) + { + Get a write (unique) lock for pointer + if (pointer == NULL) pointer = pcre2_compile(... + } + Release the lock + Use pointer in pcre2_match() +.sp +Of course, testing for compilation errors should also be included in the code. 
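The locking logic above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the library's recommended implementation: it swaps the shared/unique lock pair for a C11 atomic pointer plus a plain POSIX mutex, and compile_stub() and compiled_pattern are hypothetical stand-ins for pcre2_compile() and pcre2_code so that the sketch builds without the library.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern (pcre2_code in real use). */
typedef struct { int id; } compiled_pattern;

int compile_calls = 0;               /* counts how often we "compile" */
compiled_pattern the_pattern;

/* Hypothetical stand-in for pcre2_compile(); returns NULL on failure. */
compiled_pattern *compile_stub(void)
{
  compile_calls++;
  the_pattern.id = 42;
  return &the_pattern;
}

/* The shared pointer is atomic, so readers never see a half-written value. */
_Atomic(compiled_pattern *) shared_pattern = NULL;
pthread_mutex_t pattern_lock = PTHREAD_MUTEX_INITIALIZER;

compiled_pattern *get_pattern(void)
{
  /* First check, without holding the mutex. */
  compiled_pattern *p =
    atomic_load_explicit(&shared_pattern, memory_order_acquire);
  if (p == NULL)
    {
    pthread_mutex_lock(&pattern_lock);
    /* Second check: another thread may have compiled while we waited. */
    p = atomic_load_explicit(&shared_pattern, memory_order_relaxed);
    if (p == NULL)
      {
      p = compile_stub();
      /* Publish only a successful result; a failure is retried later. */
      if (p != NULL)
        atomic_store_explicit(&shared_pattern, p, memory_order_release);
      }
    pthread_mutex_unlock(&pattern_lock);
    }
  return p;                          /* NULL means compilation failed */
}
```

Each thread calls get_pattern() before matching and must still check the result for NULL, which covers the compilation error testing mentioned above.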
+.P +The reason for checking the pointer a second time is as follows: Several +threads may have acquired the shared lock and tested the pointer for being +NULL, but only one of them will be given the write lock, with the rest kept +waiting. The winning thread will compile the pattern and store the result. +After this thread releases the write lock, another thread will get it, and if +it does not retest pointer for being NULL, will recompile the pattern and +overwrite the pointer, creating a memory leak and possibly causing other +issues. +.P +In an environment where writing to a pointer may not be atomic, the above logic +is not sufficient. The thread that is doing the compiling may be descheduled +after writing only part of the pointer, which could cause other threads to use +an invalid value. Instead of checking the pointer itself, a separate "pointer +is valid" flag (that can be updated atomically) must be used: +.sp + Get a read-only (shared) lock (mutex) for pointer + if (!pointer_is_valid) + { + Get a write (unique) lock for pointer + if (!pointer_is_valid) + { + pointer = pcre2_compile(... + pointer_is_valid = TRUE + } + } + Release the lock + Use pointer in pcre2_match() +.sp +If JIT is being used, but the JIT compilation is not being done immediately +(perhaps waiting to see if the pattern is used often enough), similar logic is +required. JIT compilation updates a value within the compiled code block, so a +thread must gain unique write access to the pointer before calling +\fBpcre2_jit_compile()\fP. Alternatively, \fBpcre2_code_copy()\fP or +\fBpcre2_code_copy_with_tables()\fP can be used to obtain a private copy of the +compiled code before calling the JIT compiler. +. +. +.SS "Context blocks" +.rs +.sp +The next main section below introduces the idea of "contexts" in which PCRE2 +functions are called. A context is nothing more than a collection of parameters +that control the way PCRE2 operates. 
Grouping a number of parameters together +in a context is a convenient way of passing them to a PCRE2 function without +using lots of arguments. The parameters that are stored in contexts are in some +sense "advanced features" of the API. Many straightforward applications will +not need to use contexts. +.P +In a multithreaded application, if the parameters in a context are values that +are never changed, the same context can be used by all the threads. However, if +any thread needs to change any value in a context, it must make its own +thread-specific copy. +. +. +.SS "Match blocks" +.rs +.sp +The matching functions need a block of memory for storing the results of a +match. This includes details of what was matched, as well as additional +information such as the name of a (*MARK) setting. Each thread must provide its +own copy of this memory. +. +. +.SH "PCRE2 CONTEXTS" +.rs +.sp +Some PCRE2 functions have a lot of parameters, many of which are used only by +specialist applications, for example, those that use custom memory management +or non-standard character tables. To keep function argument lists at a +reasonable size, and at the same time to keep the API extensible, "uncommon" +parameters are passed to certain functions in a \fBcontext\fP instead of +directly. A context is just a block of memory that holds the parameter values. +Applications that do not need to adjust any of the context parameters can pass +NULL when a context pointer is required. +.P +There are three different types of context: a general context that is relevant +for several PCRE2 operations, a compile-time context, and a match-time context. +. +. +.SS "The general context" +.rs +.sp +At present, this context just contains pointers to (and data for) external +memory management functions that are called from several places in the PCRE2 +library. The context is named `general' rather than specifically `memory' +because in future other fields may be added. 
If you do not want to supply your +own custom memory management functions, you do not need to bother with a +general context. A general context is created by: +.sp +.nf +.B pcre2_general_context *pcre2_general_context_create( +.B " void *(*\fIprivate_malloc\fP)(PCRE2_SIZE, void *)," +.B " void (*\fIprivate_free\fP)(void *, void *), void *\fImemory_data\fP);" +.fi +.sp +The two function pointers specify custom memory management functions, whose +prototypes are: +.sp + \fBvoid *private_malloc(PCRE2_SIZE, void *);\fP + \fBvoid private_free(void *, void *);\fP +.sp +Whenever code in PCRE2 calls these functions, the final argument is the value +of \fImemory_data\fP. Either of the first two arguments of the creation +function may be NULL, in which case the system memory management functions +\fImalloc()\fP and \fIfree()\fP are used. (This is not currently useful, as +there are no other fields in a general context, but in future there might be.) +The \fIprivate_malloc()\fP function is used (if supplied) to obtain memory for +storing the context, and all three values are saved as part of the context. +.P +Whenever PCRE2 creates a data block of any kind, the block contains a pointer +to the \fIfree()\fP function that matches the \fImalloc()\fP function that was +used. When the time comes to free the block, this function is called. +.P +A general context can be copied by calling: +.sp +.nf +.B pcre2_general_context *pcre2_general_context_copy( +.B " pcre2_general_context *\fIgcontext\fP);" +.fi +.sp +The memory used for a general context should be freed by calling: +.sp +.nf +.B void pcre2_general_context_free(pcre2_general_context *\fIgcontext\fP); +.fi +.sp +If this function is passed a NULL argument, it returns immediately without +doing anything. +. +. 
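To illustrate the callback prototypes described above, here is a small C sketch of a pair of memory functions that count live blocks through the memory_data argument. The names alloc_stats, counting_malloc(), counting_free(), and roundtrip_live_blocks() are hypothetical, and size_t is used in place of PCRE2_SIZE so that the sketch builds without pcre2.h.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping structure, passed to PCRE2 as memory_data. */
typedef struct {
  size_t live_blocks;       /* blocks obtained and not yet freed */
  size_t total_requested;   /* sum of all sizes requested */
} alloc_stats;

/* Matches the private_malloc prototype: a size, then memory_data. */
void *counting_malloc(size_t size, void *memory_data)
{
  alloc_stats *stats = memory_data;
  void *block = malloc(size);
  if (block != NULL)
    {
    stats->live_blocks++;
    stats->total_requested += size;
    }
  return block;
}

/* Matches the private_free prototype: the block, then memory_data. */
void counting_free(void *block, void *memory_data)
{
  alloc_stats *stats = memory_data;
  if (block == NULL) return;
  stats->live_blocks--;
  free(block);
}

/* Exercises the callbacks directly and returns the number of blocks still
   live afterwards, which is zero when every block has been released. */
size_t roundtrip_live_blocks(void)
{
  alloc_stats stats = { 0, 0 };
  void *a = counting_malloc(16, &stats);
  void *b = counting_malloc(32, &stats);
  counting_free(a, &stats);
  counting_free(b, &stats);
  return stats.live_blocks;
}

/* With the library available, the callbacks would be registered with
   pcre2_general_context_create(counting_malloc, counting_free, &stats);
   the stats object must then outlive every block PCRE2 allocates. */
```

Because every PCRE2 data block remembers the matching free function, a non-zero live_blocks count after all contexts and patterns have been freed would point to a leak in the application.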
+.\" HTML +.SS "The compile context" +.rs +.sp +A compile context is required if you want to provide an external function for +stack checking during compilation or to change the default values of any of the +following compile-time parameters: +.sp + What \eR matches (Unicode newlines or CR, LF, CRLF only) + PCRE2's character tables + The newline character sequence + The compile time nested parentheses limit + The maximum length of the pattern string + The extra options bits (none set by default) +.sp +A compile context is also required if you are using custom memory management. +If none of these apply, just pass NULL as the context argument of +\fIpcre2_compile()\fP. +.P +A compile context is created, copied, and freed by the following functions: +.sp +.nf +.B pcre2_compile_context *pcre2_compile_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_compile_context *pcre2_compile_context_copy( +.B " pcre2_compile_context *\fIccontext\fP);" +.sp +.B void pcre2_compile_context_free(pcre2_compile_context *\fIccontext\fP); +.fi +.sp +A compile context is created with default values for its parameters. These can +be changed by calling the following functions, which return 0 on success, or +PCRE2_ERROR_BADDATA if invalid data is detected. +.sp +.nf +.B int pcre2_set_bsr(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +.sp +The value must be PCRE2_BSR_ANYCRLF, to specify that \eR matches only CR, LF, +or CRLF, or PCRE2_BSR_UNICODE, to specify that \eR matches any Unicode line +ending sequence. The value is used by the JIT compiler and by the two +interpreted matching functions, \fIpcre2_match()\fP and +\fIpcre2_dfa_match()\fP. +.sp +.nf +.B int pcre2_set_character_tables(pcre2_compile_context *\fIccontext\fP, +.B " const uint8_t *\fItables\fP);" +.fi +.sp +The value must be the result of a call to \fBpcre2_maketables()\fP, whose only +argument is a general context. 
This function builds a set of character tables +in the current locale. +.sp +.nf +.B int pcre2_set_compile_extra_options(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIextra_options\fP);" +.fi +.sp +As PCRE2 has developed, almost all the 32 option bits that are available in +the \fIoptions\fP argument of \fBpcre2_compile()\fP have been used up. To avoid +running out, the compile context contains a set of extra option bits which are +used for some newer, assumed rarer, options. This function sets those bits. It +always sets all the bits (either on or off); the new value replaces any +existing setting rather than being combined with it. The available options are defined in the section entitled "Extra +compile options" +.\" HTML +.\" +below. +.\" +.sp +.nf +.B int pcre2_set_max_pattern_length(pcre2_compile_context *\fIccontext\fP, +.B " PCRE2_SIZE \fIvalue\fP);" +.fi +.sp +This sets a maximum length, in code units, for any pattern string that is +compiled with this context. If the pattern is longer, an error is generated. +This facility is provided so that applications that accept patterns from +external sources can limit their size. The default is the largest number that a +PCRE2_SIZE variable can hold, which is effectively unlimited. +.sp +.nf +.B int pcre2_set_newline(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +.sp +This specifies which characters or character sequences are to be recognized as +newlines. The value must be one of PCRE2_NEWLINE_CR (carriage return only), +PCRE2_NEWLINE_LF (linefeed only), PCRE2_NEWLINE_CRLF (the two-character +sequence CR followed by LF), PCRE2_NEWLINE_ANYCRLF (any of the above), +PCRE2_NEWLINE_ANY (any Unicode newline sequence), or PCRE2_NEWLINE_NUL (the +NUL character, that is a binary zero). +.P +A pattern can override the value set in the compile context by starting with a +sequence such as (*CRLF). See the +.\" HREF +\fBpcre2pattern\fP +.\" +page for details.
+.P +When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE +option, the newline convention affects the recognition of the end of internal +comments starting with #. The value is saved with the compiled pattern for +subsequent use by the JIT compiler and by the two interpreted matching +functions, \fIpcre2_match()\fP and \fIpcre2_dfa_match()\fP. +.sp +.nf +.B int pcre2_set_parens_nest_limit(pcre2_compile_context *\fIccontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +.sp +This parameter adjusts the limit, set when PCRE2 is built (default 250), on the +depth of parenthesis nesting in a pattern. This limit stops rogue patterns +using up too much system stack when being compiled. The limit applies to +parentheses of all kinds, not just capturing parentheses. +.sp +.nf +.B int pcre2_set_compile_recursion_guard(pcre2_compile_context *\fIccontext\fP, +.B " int (*\fIguard_function\fP)(uint32_t, void *), void *\fIuser_data\fP);" +.fi +.sp +There is at least one application that runs PCRE2 in threads with very limited +system stack, where running out of stack is to be avoided at all costs. The +parenthesis limit above cannot take account of how much stack is actually +available during compilation. For finer control, you can supply a function +that is called whenever \fBpcre2_compile()\fP starts to compile a parenthesized +part of a pattern. This function can check the actual stack size (or anything +else that it wants to, of course). +.P +The first argument to the guard function gives the current depth of +nesting, and the second is user data that is set up by the last argument of +\fBpcre2_set_compile_recursion_guard()\fP. The guard function should return +zero if all is well, or non-zero to force an error. +. +.
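A guard function for pcre2_set_compile_recursion_guard() might look like the following C sketch. It only compares the nesting depth with a caller-supplied ceiling; a real guard could inspect the remaining stack instead, which is platform-specific and not shown. The name depth_guard() and the choice of a uint32_t ceiling as user data are assumptions made for illustration.

```c
#include <stdint.h>

/* Guard with the prototype described above. user_data is assumed to point
   to a uint32_t holding the deepest nesting the caller will tolerate. */
int depth_guard(uint32_t depth, void *user_data)
{
  const uint32_t *max_depth = user_data;
  /* Zero lets compilation continue; non-zero forces a compile error. */
  return (depth > *max_depth) ? 1 : 0;
}

/* With the library available, the guard would be installed by calling
   pcre2_set_compile_recursion_guard(ccontext, depth_guard, &max_depth);
   where max_depth must remain valid for as long as ccontext is used. */
```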
+.\" HTML +.SS "The match context" +.rs +.sp +A match context is required if you want to: +.sp + Set up a callout function + Set an offset limit for matching an unanchored pattern + Change the limit on the amount of heap used when matching + Change the backtracking match limit + Change the backtracking depth limit + Set custom memory management specifically for the match +.sp +If none of these apply, just pass NULL as the context argument of +\fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or \fBpcre2_jit_match()\fP. +.P +A match context is created, copied, and freed by the following functions: +.sp +.nf +.B pcre2_match_context *pcre2_match_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_match_context *pcre2_match_context_copy( +.B " pcre2_match_context *\fImcontext\fP);" +.sp +.B void pcre2_match_context_free(pcre2_match_context *\fImcontext\fP); +.fi +.sp +A match context is created with default values for its parameters. These can +be changed by calling the following functions, which return 0 on success, or +PCRE2_ERROR_BADDATA if invalid data is detected. +.sp +.nf +.B int pcre2_set_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_callout_block *, void *)," +.B " void *\fIcallout_data\fP);" +.fi +.sp +This sets up a callout function for PCRE2 to call at specified points +during a matching operation. Details are given in the +.\" HREF +\fBpcre2callout\fP +.\" +documentation. +.sp +.nf +.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *)," +.B " void *\fIcallout_data\fP);" +.fi +.sp +This sets up a callout function for PCRE2 to call after each substitution +made by \fBpcre2_substitute()\fP. Details are given in the section entitled +"Creating a new string with substitutions" +.\" HTML +.\" +below. 
+.\" +.sp +.nf +.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP, +.B " PCRE2_SIZE \fIvalue\fP);" +.fi +.sp +The \fIoffset_limit\fP parameter limits how far an unanchored search can +advance in the subject string. The default value is PCRE2_UNSET. The +\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP functions return +PCRE2_ERROR_NOMATCH if a match with a starting point before or at the given +offset is not found. The \fBpcre2_substitute()\fP function makes no more +substitutions. +.P +For example, if the pattern /abc/ is matched against "123abc" with an offset +limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be +found if the \fIstartoffset\fP argument of \fBpcre2_match()\fP, +\fBpcre2_dfa_match()\fP, or \fBpcre2_substitute()\fP is greater than the offset +limit set in the match context. +.P +When using this facility, you must set the PCRE2_USE_OFFSET_LIMIT option when +calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be +compiled. If a match is started with a non-default offset limit when +PCRE2_USE_OFFSET_LIMIT is not set, an error is generated. +.P +The offset limit facility can be used to track progress when searching large +subject strings or to limit the extent of global substitutions. See also the +PCRE2_FIRSTLINE option, which requires a match to start before or at the first +newline that follows the start of matching in the subject. If this is set with +an offset limit, a match must occur in the first line and also within the +offset limit. In other words, whichever limit comes first is used. +.sp +.nf +.B int pcre2_set_heap_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +.sp +The \fIheap_limit\fP parameter specifies, in units of kibibytes (1024 bytes), +the maximum amount of heap memory that \fBpcre2_match()\fP may use to hold +backtracking information when running an interpretive match.
This limit also +applies to \fBpcre2_dfa_match()\fP, which may use the heap when processing +patterns with a lot of nested pattern recursion or lookarounds or atomic +groups. This limit does not apply to matching with the JIT optimization, which +has its own memory control arrangements (see the +.\" HREF +\fBpcre2jit\fP +.\" +documentation for more details). If the limit is reached, the negative error +code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2 +is built; if it is not, the default is set very large and is essentially +"unlimited". +.P +A value for the heap limit may also be supplied by an item at the start of a +pattern of the form +.sp + (*LIMIT_HEAP=ddd) +.sp +where ddd is a decimal number. However, such a setting is ignored unless ddd is +less than the limit set by the caller of \fBpcre2_match()\fP or, if no such +limit is set, less than the default. +.P +The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system +stack for recording backtracking points. The more nested backtracking points +there are (that is, the deeper the search tree), the more memory is needed. +Heap memory is used only if the initial vector is too small. If the heap limit +is set to a value less than 21 (in particular, zero) no heap memory will be +used. In this case, only patterns that do not have a lot of nested backtracking +can be successfully processed. +.P +Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used +when processing pattern recursions, lookarounds, or atomic groups, and only if +this is not big enough is heap memory used. In this case, too, setting a value +of zero disables the use of the heap. 
+.sp +.nf +.B int pcre2_set_match_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +.sp +The \fImatch_limit\fP parameter provides a means of preventing PCRE2 from using +up too many computing resources when processing patterns that are not going to +match, but which have a very large number of possibilities in their search +trees. The classic example is a pattern that uses nested unlimited repeats. +.P +There is an internal counter in \fBpcre2_match()\fP that is incremented each +time round its main matching loop. If this value reaches the match limit, +\fBpcre2_match()\fP returns the negative value PCRE2_ERROR_MATCHLIMIT. This has +the effect of limiting the amount of backtracking that can take place. For +patterns that are not anchored, the count restarts from zero for each position +in the subject string. This limit also applies to \fBpcre2_dfa_match()\fP, +though the counting is done in a different way. +.P +When \fBpcre2_match()\fP is called with a pattern that was successfully +processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed +is entirely different. However, there is still the possibility of runaway +matching that goes on for a very long time, and so the \fImatch_limit\fP value +is also used in this case (but in a different way) to limit how long the +matching can continue. +.P +The default value for the limit can be set when PCRE2 is built; if it is not, +the default is 10 million, which handles all but the most extreme cases. A value +for the match limit may also be supplied by an item at the start of a pattern +of the form +.sp + (*LIMIT_MATCH=ddd) +.sp +where ddd is a decimal number. However, such a setting is ignored unless ddd is +less than the limit set by the caller of \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
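The rule for an in-pattern item such as (*LIMIT_MATCH=ddd) amounts to taking the smaller of two values: a pattern can lower the limit but never raise it. The C sketch below models that rule only; effective_limit() is a hypothetical helper, not a PCRE2 function, and its first argument stands for the caller's limit or, if the caller set none, the build-time default.

```c
#include <stdint.h>

/* Returns the limit that is actually enforced.
   caller_limit:      limit set in the match context, or the default
   pattern_has_limit: non-zero if the pattern starts with a (*LIMIT_...) item
   pattern_limit:     the ddd value from that item (ignored if absent) */
uint32_t effective_limit(uint32_t caller_limit, int pattern_has_limit,
  uint32_t pattern_limit)
{
  /* The pattern's value is honoured only when it is strictly smaller. */
  if (pattern_has_limit && pattern_limit < caller_limit) return pattern_limit;
  return caller_limit;
}
```

This is why an application that accepts patterns from untrusted sources can rely on its own limit as an upper bound.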
+.sp +.nf +.B int pcre2_set_depth_limit(pcre2_match_context *\fImcontext\fP, +.B " uint32_t \fIvalue\fP);" +.fi +.sp +This parameter limits the depth of nested backtracking in \fBpcre2_match()\fP. +Each time a nested backtracking point is passed, a new memory "frame" is used +to remember the state of matching at that point. Thus, this parameter +indirectly limits the amount of memory that is used in a match. However, +because the size of each memory "frame" depends on the number of capturing +parentheses, the actual memory limit varies from pattern to pattern. This limit +was more useful in versions before 10.30, where function recursion was used for +backtracking. +.P +The depth limit is not relevant, and is ignored, when matching is done using +JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which +uses it to limit the depth of nested internal recursive function calls that +implement atomic groups, lookaround assertions, and pattern recursions. This +limits, indirectly, the amount of system stack that is used. It was more useful +in versions before 10.32, when stack memory was used for local workspace +vectors for recursive function calls. From version 10.32, only local variables +are allocated on the stack and as each call uses only a few hundred bytes, even +a small stack can support quite a lot of recursion. +.P +If the depth of internal recursive function calls is great enough, local +workspace vectors are allocated on the heap from version 10.32 onwards, so the +depth limit also indirectly limits the amount of heap memory that is used. A +recursive pattern such as /(.(?2))((?1)|)/, when matched to a very long string +using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is +probably better to limit heap usage directly by calling +\fBpcre2_set_heap_limit()\fP. 
+.P +The default value for the depth limit can be set when PCRE2 is built; if it is +not, the default is set to the same value as the default for the match limit. +If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP +returns PCRE2_ERROR_DEPTHLIMIT. A value for the depth limit may also be +supplied by an item at the start of a pattern of the form +.sp + (*LIMIT_DEPTH=ddd) +.sp +where ddd is a decimal number. However, such a setting is ignored unless ddd is +less than the limit set by the caller of \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default. +. +. +.SH "CHECKING BUILD-TIME OPTIONS" +.rs +.sp +.B int pcre2_config(uint32_t \fIwhat\fP, void *\fIwhere\fP); +.P +The function \fBpcre2_config()\fP makes it possible for a PCRE2 client to find +the value of certain configuration parameters and to discover which optional +features have been compiled into the PCRE2 library. The +.\" HREF +\fBpcre2build\fP +.\" +documentation has more details about these features. +.P +The first argument for \fBpcre2_config()\fP specifies which information is +required. The second argument is a pointer to memory into which the information +is placed. If NULL is passed, the function returns the amount of memory that is +needed for the requested information. For calls that return numerical values, +the value is in bytes; when requesting these values, \fIwhere\fP should point +to appropriately aligned memory. For calls that return strings, the required +length is given in code units, not counting the terminating zero. +.P +When requesting information, the returned value from \fBpcre2_config()\fP is +non-negative on success, or the negative error code PCRE2_ERROR_BADOPTION if +the value in the first argument is not recognized. The following information is +available: +.sp + PCRE2_CONFIG_BSR +.sp +The output is a uint32_t integer whose value indicates what character +sequences the \eR escape sequence matches by default. 
A value of +PCRE2_BSR_UNICODE means that \eR matches any Unicode line ending sequence; a +value of PCRE2_BSR_ANYCRLF means that \eR matches only CR, LF, or CRLF. The +default can be overridden when a pattern is compiled. +.sp + PCRE2_CONFIG_COMPILED_WIDTHS +.sp +The output is a uint32_t integer whose lower bits indicate which code unit +widths were selected when PCRE2 was built. The 1-bit indicates 8-bit support, +and the 2-bit and 4-bit indicate 16-bit and 32-bit support, respectively. +.sp + PCRE2_CONFIG_DEPTHLIMIT +.sp +The output is a uint32_t integer that gives the default limit for the depth of +nested backtracking in \fBpcre2_match()\fP or the depth of nested recursions, +lookarounds, and atomic groups in \fBpcre2_dfa_match()\fP. Further details are +given with \fBpcre2_set_depth_limit()\fP above. +.sp + PCRE2_CONFIG_HEAPLIMIT +.sp +The output is a uint32_t integer that gives, in kibibytes, the default limit +for the amount of heap memory used by \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP. Further details are given with +\fBpcre2_set_heap_limit()\fP above. +.sp + PCRE2_CONFIG_JIT +.sp +The output is a uint32_t integer that is set to one if support for just-in-time +compiling is available; otherwise it is set to zero. +.sp + PCRE2_CONFIG_JITTARGET +.sp +The \fIwhere\fP argument should point to a buffer that is at least 48 code +units long. (The exact length required can be found by calling +\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with a +string that contains the name of the architecture for which the JIT compiler is +configured, for example "x86 32bit (little endian + unaligned)". If JIT support +is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of +code units used is returned. This is the length of the string, plus one unit +for the terminating zero. 
+.sp + PCRE2_CONFIG_LINKSIZE +.sp +The output is a uint32_t integer that contains the number of bytes used for +internal linkage in compiled regular expressions. When PCRE2 is configured, the +value can be set to 2, 3, or 4, with the default being 2. This is the value +that is returned by \fBpcre2_config()\fP. However, when the 16-bit library is +compiled, a value of 3 is rounded up to 4, and when the 32-bit library is +compiled, internal linkages always use 4 bytes, so the configured value is not +relevant. +.P +The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all +but the most massive patterns, since it allows the size of the compiled pattern +to be up to 65535 code units. Larger values allow larger regular expressions to +be compiled by those two libraries, but at the expense of slower matching. +.sp + PCRE2_CONFIG_MATCHLIMIT +.sp +The output is a uint32_t integer that gives the default match limit for +\fBpcre2_match()\fP. Further details are given with +\fBpcre2_set_match_limit()\fP above. +.sp + PCRE2_CONFIG_NEWLINE +.sp +The output is a uint32_t integer whose value specifies the default character +sequence that is recognized as meaning "newline". The values are: +.sp + PCRE2_NEWLINE_CR Carriage return (CR) + PCRE2_NEWLINE_LF Linefeed (LF) + PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) + PCRE2_NEWLINE_ANY Any Unicode line ending + PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF + PCRE2_NEWLINE_NUL The NUL character (binary zero) +.sp +The default should normally correspond to the standard sequence for your +operating system. +.sp + PCRE2_CONFIG_NEVER_BACKSLASH_C +.sp +The output is a uint32_t integer that is set to one if the use of \eC was +permanently disabled when PCRE2 was built; otherwise it is set to zero. +.sp + PCRE2_CONFIG_PARENSLIMIT +.sp +The output is a uint32_t integer that gives the maximum depth of nesting +of parentheses (of any kind) in a pattern. 
This limit is imposed to cap the +amount of system stack used when a pattern is compiled. It is specified when +PCRE2 is built; the default is 250. This limit does not take into account the +stack that may already be used by the calling application. For finer control +over compilation stack usage, see \fBpcre2_set_compile_recursion_guard()\fP. +.sp + PCRE2_CONFIG_STACKRECURSE +.sp +This parameter is obsolete and should not be used in new code. The output is a +uint32_t integer that is always set to zero. +.sp + PCRE2_CONFIG_TABLES_LENGTH +.sp +The output is a uint32_t integer that gives the length of PCRE2's character +processing tables in bytes. For details of these tables see the +.\" HTML +.\" +section on locale support +.\" +below. +.sp + PCRE2_CONFIG_UNICODE_VERSION +.sp +The \fIwhere\fP argument should point to a buffer that is at least 24 code +units long. (The exact length required can be found by calling +\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) If PCRE2 has been compiled +without Unicode support, the buffer is filled with the text "Unicode not +supported". Otherwise, the Unicode version string (for example, "8.0.0") is +inserted. The number of code units used is returned. This is the length of the +string plus one unit for the terminating zero. +.sp + PCRE2_CONFIG_UNICODE +.sp +The output is a uint32_t integer that is set to one if Unicode support is +available; otherwise it is set to zero. Unicode support implies UTF support. +.sp + PCRE2_CONFIG_VERSION +.sp +The \fIwhere\fP argument should point to a buffer that is at least 24 code +units long. (The exact length required can be found by calling +\fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with +the PCRE2 version string, zero-terminated. The number of code units used is +returned. This is the length of the string plus one unit for the terminating +zero. +. +. 
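Two of the values described above need a little interpretation once they have been retrieved: the PCRE2_CONFIG_COMPILED_WIDTHS bit mask and the PCRE2_CONFIG_LINKSIZE value. The C sketch below decodes them from plain integers, so it does not call pcre2_config() itself; all the function names are hypothetical helpers introduced for illustration.

```c
#include <stdint.h>

/* Bit meanings as described for PCRE2_CONFIG_COMPILED_WIDTHS:
   the 1-bit is 8-bit support, the 2-bit 16-bit, the 4-bit 32-bit. */
int has_8bit_library(uint32_t widths)  { return (widths & 1u) != 0; }
int has_16bit_library(uint32_t widths) { return (widths & 2u) != 0; }
int has_32bit_library(uint32_t widths) { return (widths & 4u) != 0; }

/* Link size in bytes that a library actually uses, given the configured
   value (2, 3, or 4) reported for PCRE2_CONFIG_LINKSIZE and the code unit
   width of the library (8, 16, or 32). */
uint32_t effective_link_size(uint32_t configured, uint32_t code_unit_width)
{
  if (code_unit_width == 32) return 4;                     /* always 4 */
  if (code_unit_width == 16 && configured == 3) return 4;  /* rounded up */
  return configured;
}
```

For example, a widths value of 5 describes a build that provides the 8-bit and 32-bit libraries but not the 16-bit one.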
+.\" HTML +.SH "COMPILING A PATTERN" +.rs +.sp +.nf +.B pcre2_code *pcre2_compile(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, +.B " uint32_t \fIoptions\fP, int *\fIerrorcode\fP, PCRE2_SIZE *\fIerroroffset,\fP" +.B " pcre2_compile_context *\fIccontext\fP);" +.sp +.B void pcre2_code_free(pcre2_code *\fIcode\fP); +.sp +.B pcre2_code *pcre2_code_copy(const pcre2_code *\fIcode\fP); +.sp +.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP); +.fi +.P +The \fBpcre2_compile()\fP function compiles a pattern into an internal form. +The pattern is defined by a pointer to a string of code units and a length (in +code units). If the pattern is zero-terminated, the length can be specified as +PCRE2_ZERO_TERMINATED. The function returns a pointer to a block of memory that +contains the compiled pattern and related data, or NULL if an error occurred. +.P +If the compile context argument \fIccontext\fP is NULL, memory for the compiled +pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from +the same memory function that was used for the compile context. The caller must +free the memory by calling \fBpcre2_code_free()\fP when it is no longer needed. +If \fBpcre2_code_free()\fP is called with a NULL argument, it returns +immediately, without doing anything. +.P +The function \fBpcre2_code_copy()\fP makes a copy of the compiled code in new +memory, using the same memory allocator as was used for the original. However, +if the code has been processed by the JIT compiler (see +.\" HTML +.\" +below), +.\" +the JIT information cannot be copied (because it is position-dependent). +The new copy can initially be used only for non-JIT matching, though it can be +passed to \fBpcre2_jit_compile()\fP if required. If \fBpcre2_code_copy()\fP is +called with a NULL argument, it returns NULL. 
+.P +The \fBpcre2_code_copy()\fP function provides a way for individual threads in a +multithreaded application to acquire a private copy of shared compiled code. +However, it does not make a copy of the character tables used by the compiled +pattern; the new pattern code points to the same tables as the original code. +(See +.\" HTML +.\" +"Locale Support" +.\" +below for details of these character tables.) In many applications the same +tables are used throughout, so this behaviour is appropriate. Nevertheless, +there are occasions when a copy of a compiled pattern and the relevant tables +are needed. The \fBpcre2_code_copy_with_tables()\fP provides this facility. +Copies of both the code and the tables are made, with the new code pointing to +the new tables. The memory for the new tables is automatically freed when +\fBpcre2_code_free()\fP is called for the new copy of the compiled code. If +\fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns +NULL. +.P +NOTE: When one of the matching functions is called, pointers to the compiled +pattern and the subject string are set in the match data block so that they can +be referenced by the substring extraction functions after a successful match. +After running a match, you must not free a compiled pattern or a subject string +until after all operations on the +.\" HTML +.\" +match data block +.\" +have taken place, unless, in the case of the subject string, you have used the +PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled +"Option bits for \fBpcre2_match()\fP" +.\" HTML +.\" +below. +.\" +.P +The \fIoptions\fP argument for \fBpcre2_compile()\fP contains various bit +settings that affect the compilation. It should be zero if none of them are +required. The available options are described below. 
Some of them (in +particular, those that are compatible with Perl, but some others as well) can +also be set and unset from within the pattern (see the detailed description in +the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation). +.P +For those options that can be different in different parts of the pattern, the +contents of the \fIoptions\fP argument specifies their settings at the start of +compilation. The PCRE2_ANCHORED, PCRE2_ENDANCHORED, and PCRE2_NO_UTF_CHECK +options can be set at the time of matching as well as at compile time. +.P +Some additional options and less frequently required compile-time parameters +(for example, the newline setting) can be provided in a compile context (as +described +.\" HTML +.\" +above). +.\" +.P +If \fIerrorcode\fP or \fIerroroffset\fP is NULL, \fBpcre2_compile()\fP returns +NULL immediately. Otherwise, the variables to which these point are set to an +error code and an offset (number of code units) within the pattern, +respectively, when \fBpcre2_compile()\fP returns NULL because a compilation +error has occurred. The values are not defined when compilation is successful +and \fBpcre2_compile()\fP returns a non-NULL value. +.P +There are nearly 100 positive error codes that \fBpcre2_compile()\fP may return +if it finds an error in the pattern. There are also some negative error codes +that are used for invalid UTF strings when validity checking is in force. These +are the same as given by \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, and +are described in the +.\" HREF +\fBpcre2unicode\fP +.\" +documentation. There is no separate documentation for the positive error codes, +because the textual error messages that are obtained by calling the +\fBpcre2_get_error_message()\fP function (see "Obtaining a textual error +message" +.\" HTML +.\" +below) +.\" +should be self-explanatory. Macro names starting with PCRE2_ERROR_ are defined +for both positive and negative error codes in \fBpcre2.h\fP. 
+.P +The value returned in \fIerroroffset\fP is an indication of where in the +pattern the error occurred. It is not necessarily the furthest point in the +pattern that was read. For example, after the error "lookbehind assertion is +not fixed length", the error offset points to the start of the failing +assertion. For an invalid UTF-8 or UTF-16 string, the offset is that of the +first code unit of the failing character. +.P +Some errors are not detected until the whole pattern has been scanned; in these +cases, the offset passed back is the length of the pattern. Note that the +offset is in code units, not characters, even in a UTF mode. It may sometimes +point into the middle of a UTF-8 or UTF-16 character. +.P +This code fragment shows a typical straightforward call to +\fBpcre2_compile()\fP: +.sp + pcre2_code *re; + PCRE2_SIZE erroffset; + int errorcode; + re = pcre2_compile( + "^A.*Z", /* the pattern */ + PCRE2_ZERO_TERMINATED, /* the pattern is zero-terminated */ + 0, /* default options */ + &errorcode, /* for error code */ + &erroffset, /* for error offset */ + NULL); /* no compile context */ +.sp +. +. +.SS "Main compile options" +.rs +.sp +The following names for option bits are defined in the \fBpcre2.h\fP header +file: +.sp + PCRE2_ANCHORED +.sp +If this bit is set, the pattern is forced to be "anchored", that is, it is +constrained to match only at the first matching point in the string that is +being searched (the "subject string"). This effect can also be achieved by +appropriate constructs in the pattern itself, which is the only way to do it in +Perl. +.sp + PCRE2_ALLOW_EMPTY_CLASS +.sp +By default, for compatibility with Perl, a closing square bracket that +immediately follows an opening one is treated as a data character for the +class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which +therefore contains no characters and so can never match. 
+.sp + PCRE2_ALT_BSUX +.sp +This option requests alternative handling of three escape sequences, which +makes PCRE2's behaviour more like ECMAscript (aka JavaScript). When it is set: +.P +(1) \eU matches an upper case "U" character; by default \eU causes a compile +time error (Perl uses \eU to upper case subsequent characters). +.P +(2) \eu matches a lower case "u" character unless it is followed by four +hexadecimal digits, in which case the hexadecimal number defines the code point +to match. By default, \eu causes a compile time error (Perl uses it to upper +case the following character). +.P +(3) \ex matches a lower case "x" character unless it is followed by two +hexadecimal digits, in which case the hexadecimal number defines the code point +to match. By default, as in Perl, a hexadecimal number is always expected after +\ex, but it may have zero, one, or two digits (so, for example, \exz matches a +binary zero character followed by z). +.P +ECMAscript 6 added additional functionality to \eu. This can be accessed using +the PCRE2_EXTRA_ALT_BSUX extra option (see "Extra compile options" +.\" HTML +.\" +below). +.\" +Note that this alternative escape handling applies only to patterns. Neither of +these options affects the processing of replacement strings passed to +\fBpcre2_substitute()\fP. +.sp + PCRE2_ALT_CIRCUMFLEX +.sp +In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter +matches at the start of the subject (unless PCRE2_NOTBOL is set), and also +after any internal newline. However, it does not match after a newline at the +end of the subject, for compatibility with Perl. If you want a multiline +circumflex also to match after a terminating newline, you must set +PCRE2_ALT_CIRCUMFLEX. +.sp + PCRE2_ALT_VERBNAMES +.sp +By default, for compatibility with Perl, the name in any verb sequence such as +(*MARK:NAME) is any sequence of characters that does not include a closing +parenthesis.
The name is not processed in any way, and it is not possible to +include a closing parenthesis in the name. However, if the PCRE2_ALT_VERBNAMES +option is set, normal backslash processing is applied to verb names and only an +unescaped closing parenthesis terminates the name. A closing parenthesis can be +included in a name either as \e) or between \eQ and \eE. If the PCRE2_EXTENDED +or PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped +whitespace in verb names is skipped and #-comments are recognized, exactly as +in the rest of the pattern. +.sp + PCRE2_AUTO_CALLOUT +.sp +If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items, +all with number 255, before each pattern item, except immediately before or +after an explicit callout in the pattern. For discussion of the callout +facility, see the +.\" HREF +\fBpcre2callout\fP +.\" +documentation. +.sp + PCRE2_CASELESS +.sp +If this bit is set, letters in the pattern match both upper and lower case +letters in the subject. It is equivalent to Perl's /i option, and it can be +changed within a pattern by a (?i) option setting. If either PCRE2_UTF or +PCRE2_UCP is set, Unicode properties are used for all characters with more than +one other case, and for all characters whose code points are greater than +U+007F. Note that there are two ASCII characters, K and S, that, in addition to +their lower case ASCII equivalents, are case-equivalent with U+212A (Kelvin +sign) and U+017F (long S) respectively. For lower valued characters with only +one other case, a lookup table is used for speed. When neither PCRE2_UTF nor +PCRE2_UCP is set, a lookup table is used for all code points less than 256, and +higher code points (available only in 16-bit or 32-bit mode) are treated as not +having another case. +.sp + PCRE2_DOLLAR_ENDONLY +.sp +If this bit is set, a dollar metacharacter in the pattern matches only at the +end of the subject string. 
Without this option, a dollar also matches +immediately before a newline at the end of the string (but not before any other +newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is +set. There is no equivalent to this option in Perl, and no way to set it within +a pattern. +.sp + PCRE2_DOTALL +.sp +If this bit is set, a dot metacharacter in the pattern matches any character, +including one that indicates a newline. However, it only ever matches one +character, even if newlines are coded as CRLF. Without this option, a dot does +not match when the current position in the subject is at a newline. This option +is equivalent to Perl's /s option, and it can be changed within a pattern by a +(?s) option setting. A negative class such as [^a] always matches newline +characters, and the \eN escape sequence always matches a non-newline character, +independent of the setting of PCRE2_DOTALL. +.sp + PCRE2_DUPNAMES +.sp +If this bit is set, names used to identify capture groups need not be unique. +This can be helpful for certain types of pattern when it is known that only one +instance of the named group can ever be matched. There are more details of +named capture groups below; see also the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. +.sp + PCRE2_ENDANCHORED +.sp +If this bit is set, the end of any pattern match must be right at the end of +the string being searched (the "subject string"). If the pattern match +succeeds by reaching (*ACCEPT), but does not reach the end of the subject, the +match fails at the current starting point. For unanchored patterns, a new match +is then tried at the next starting point. However, if the match succeeds by +reaching the end of the pattern, but not the end of the subject, backtracking +occurs and an alternative match may be found. Consider these two patterns: +.sp + .(*ACCEPT)|.. + .|.. +.sp +If matched against "abc" with PCRE2_ENDANCHORED set, the first matches "c" +whereas the second matches "bc". 
The effect of PCRE2_ENDANCHORED can also be +achieved by appropriate constructs in the pattern itself, which is the only way +to do it in Perl. +.P +For DFA matching with \fBpcre2_dfa_match()\fP, PCRE2_ENDANCHORED applies only +to the first (that is, the longest) matched string. Other parallel matches, +which are necessarily substrings of the first one, must obviously end before +the end of the subject. +.sp + PCRE2_EXTENDED +.sp +If this bit is set, most white space characters in the pattern are totally +ignored except when escaped or inside a character class. However, white space +is not allowed within sequences such as (?> that introduce various +parenthesized groups, nor within numerical quantifiers such as {1,3}. Ignorable +white space is permitted between an item and a following quantifier and between +a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is +equivalent to Perl's /x option, and it can be changed within a pattern by a +(?x) option setting. +.P +When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as +white space only those characters with code points less than 256 that are +flagged as white space in its low-character table. The table is normally +created by +.\" HREF +\fBpcre2_maketables()\fP, +.\" +which uses the \fBisspace()\fP function to identify space characters. In most +ASCII environments, the relevant characters are those with code points 0x0009 +(tab), 0x000A (linefeed), 0x000B (vertical tab), 0x000C (formfeed), 0x000D +(carriage return), and 0x0020 (space). +.P +When PCRE2 is compiled with Unicode support, in addition to these characters, +five more Unicode "Pattern White Space" characters are recognized by +PCRE2_EXTENDED. These are U+0085 (next line), U+200E (left-to-right mark), +U+200F (right-to-left mark), U+2028 (line separator), and U+2029 (paragraph +separator). This set of characters is the same as recognized by Perl's /x +option. 
Note that the horizontal and vertical space characters that are matched +by the \eh and \ev escapes in patterns are a much bigger set. +.P +As well as ignoring most white space, PCRE2_EXTENDED also causes characters +between an unescaped # outside a character class and the next newline, +inclusive, to be ignored, which makes it possible to include comments inside +complicated patterns. Note that the end of this type of comment is a literal +newline sequence in the pattern; escape sequences that happen to represent a +newline do not count. +.P +Which characters are interpreted as newlines can be specified by a setting in +the compile context that is passed to \fBpcre2_compile()\fP or by a special +sequence at the start of the pattern, as described in the section entitled +.\" HTML +.\" +"Newline conventions" +.\" +in the \fBpcre2pattern\fP documentation. A default is defined when PCRE2 is +built. +.sp + PCRE2_EXTENDED_MORE +.sp +This option has the effect of PCRE2_EXTENDED, but, in addition, unescaped space +and horizontal tab characters are ignored inside a character class. Note: only +these two characters are ignored, not the full set of pattern white space +characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is +equivalent to Perl's /xx option, and it can be changed within a pattern by a +(?xx) option setting. +.sp + PCRE2_FIRSTLINE +.sp +If this option is set, the start of an unanchored pattern match must be before +or at the first newline in the subject string following the start of matching, +though the matched text may continue over the newline. If \fIstartoffset\fP is +non-zero, the limiting newline is not necessarily the first newline in the +subject. For example, if the subject string is "abc\enxyz" (where \en +represents a single-character newline) a pattern match for "yz" succeeds with +PCRE2_FIRSTLINE if \fIstartoffset\fP is greater than 3. See also +PCRE2_USE_OFFSET_LIMIT, which provides a more general limiting facility. 
If +PCRE2_FIRSTLINE is set with an offset limit, a match must occur in the first +line and also within the offset limit. In other words, whichever limit comes +first is used. +.sp + PCRE2_LITERAL +.sp +If this option is set, all meta-characters in the pattern are disabled, and it +is treated as a literal string. Matching literal strings with a regular +expression engine is not the most efficient way of doing it. If you are doing a +lot of literal matching and are worried about efficiency, you should consider +using other approaches. The only other main options that are allowed with +PCRE2_LITERAL are: PCRE2_ANCHORED, PCRE2_ENDANCHORED, PCRE2_AUTO_CALLOUT, +PCRE2_CASELESS, PCRE2_FIRSTLINE, PCRE2_MATCH_INVALID_UTF, +PCRE2_NO_START_OPTIMIZE, PCRE2_NO_UTF_CHECK, PCRE2_UTF, and +PCRE2_USE_OFFSET_LIMIT. The extra options PCRE2_EXTRA_MATCH_LINE and +PCRE2_EXTRA_MATCH_WORD are also supported. Any other options cause an error. +.sp + PCRE2_MATCH_INVALID_UTF +.sp +This option forces PCRE2_UTF (see below) and also enables support for matching +by \fBpcre2_match()\fP in subject strings that contain invalid UTF sequences. +This facility is not supported for DFA matching. For details, see the +.\" HREF +\fBpcre2unicode\fP +.\" +documentation. +.sp + PCRE2_MATCH_UNSET_BACKREF +.sp +If this option is set, a backreference to an unset capture group matches an +empty string (by default this causes the current matching alternative to fail). +A pattern such as (\e1)(a) succeeds when this option is set (assuming it can +find an "a" in the subject), whereas it fails by default, for Perl +compatibility. Setting this option makes PCRE2 behave more like ECMAscript (aka +JavaScript). +.sp + PCRE2_MULTILINE +.sp +By default, for the purposes of matching "start of line" and "end of line", +PCRE2 treats the subject string as consisting of a single line of characters, +even if it actually contains newlines. 
The "start of line" metacharacter (^) +matches only at the start of the string, and the "end of line" metacharacter +($) matches only at the end of the string, or before a terminating newline +(except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless +PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a +newline. This behaviour (for ^, $, and dot) is the same as Perl. +.P +When PCRE2_MULTILINE is set, the "start of line" and "end of line" +constructs match immediately following or immediately before internal newlines +in the subject string, respectively, as well as at the very start and end. This +is equivalent to Perl's /m option, and it can be changed within a pattern by a +(?m) option setting. Note that the "start of line" metacharacter does not match +after a newline at the end of the subject, for compatibility with Perl. +However, you can change this by setting the PCRE2_ALT_CIRCUMFLEX option. If +there are no newlines in a subject string, or no occurrences of ^ or $ in a +pattern, setting PCRE2_MULTILINE has no effect. +.sp + PCRE2_NEVER_BACKSLASH_C +.sp +This option locks out the use of \eC in the pattern that is being compiled. +This escape can cause unpredictable behaviour in UTF-8 or UTF-16 modes, because +it may leave the current matching point in the middle of a multi-code-unit +character. This option may be useful in applications that process patterns from +external sources. Note that there is also a build-time option that permanently +locks out the use of \eC. +.sp + PCRE2_NEVER_UCP +.sp +This option locks out the use of Unicode properties for handling \eB, \eb, \eD, +\ed, \eS, \es, \eW, \ew, and some of the POSIX character classes, as described +for the PCRE2_UCP option below. In particular, it prevents the creator of the +pattern from enabling this facility by starting the pattern with (*UCP). This +option may be useful in applications that process patterns from external +sources.
The option combination PCRE2_UCP and PCRE2_NEVER_UCP causes an error. +.sp + PCRE2_NEVER_UTF +.sp +This option locks out interpretation of the pattern as UTF-8, UTF-16, or +UTF-32, depending on which library is in use. In particular, it prevents the +creator of the pattern from switching to UTF interpretation by starting the +pattern with (*UTF). This option may be useful in applications that process +patterns from external sources. The combination of PCRE2_UTF and +PCRE2_NEVER_UTF causes an error. +.sp + PCRE2_NO_AUTO_CAPTURE +.sp +If this option is set, it disables the use of numbered capturing parentheses in +the pattern. Any opening parenthesis that is not followed by ? behaves as if it +were followed by ?: but named parentheses can still be used for capturing (and +they acquire numbers in the usual way). This is the same as Perl's /n option. +Note that, when this option is set, references to capture groups +(backreferences or recursion/subroutine calls) may only refer to named groups, +though the reference can be by name or by number. +.sp + PCRE2_NO_AUTO_POSSESS +.sp +If this option is set, it disables "auto-possessification", which is an +optimization that, for example, turns a+b into a++b in order to avoid +backtracks into a+ that can never be successful. However, if callouts are in +use, auto-possessification means that some callouts are never taken. You can +set this option if you want the matching functions to do a full unoptimized +search and run all the callouts, but it is mainly provided for testing +purposes. +.sp + PCRE2_NO_DOTSTAR_ANCHOR +.sp +If this option is set, it disables an optimization that is applied when .* is +the first significant item in a top-level branch of a pattern, and all the +other branches also start with .* or with \eA or \eG or ^. The optimization is +automatically disabled for .* if it is inside an atomic group or a capture +group that is the subject of a backreference, or if the pattern contains +(*PRUNE) or (*SKIP).
When the optimization is not disabled, such a pattern is +automatically anchored if PCRE2_DOTALL is set for all the .* items and +PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match +must start either at the start of the subject or following a newline is +remembered. Like other optimizations, this can cause callouts to be skipped. +.sp + PCRE2_NO_START_OPTIMIZE +.sp +This is an option whose main effect is at matching time. It does not change +what \fBpcre2_compile()\fP generates, but it does affect the output of the JIT +compiler. +.P +There are a number of optimizations that may occur at the start of a match, in +order to speed up the process. For example, if it is known that an unanchored +match must start with a specific code unit value, the matching code searches +the subject for that value, and fails immediately if it cannot find it, without +actually running the main matching function. This means that a special item +such as (*COMMIT) at the start of a pattern is not considered until after a +suitable starting point for the match has been found. Also, when callouts or +(*MARK) items are in use, these "start-up" optimizations can cause them to be +skipped if the pattern is never actually used. The start-up optimizations are +in effect a pre-scan of the subject that takes place before the pattern is run. +.P +The PCRE2_NO_START_OPTIMIZE option disables the start-up optimizations, +possibly causing performance to suffer, but ensuring that in cases where the +result is "no match", the callouts do occur, and that items such as (*COMMIT) +and (*MARK) are considered at every possible starting position in the subject +string. +.P +Setting PCRE2_NO_START_OPTIMIZE may change the outcome of a matching operation. +Consider the pattern +.sp + (*COMMIT)ABC +.sp +When this is compiled, PCRE2 records the fact that a match must start with the +character "A". Suppose the subject string is "DEFABC". 
The start-up +optimization scans along the subject, finds "A" and runs the first match +attempt from there. The (*COMMIT) item means that the pattern must match the +current starting position, which in this case, it does. However, if the same +match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the +subject string does not happen. The first match attempt is run starting from +"D" and when this fails, (*COMMIT) prevents any further matches being tried, so +the overall result is "no match". +.P +Another start-up optimization makes use of a minimum length for a matching +subject, which is recorded when possible. Consider the pattern +.sp + (*MARK:1)B(*MARK:2)(X|Y) +.sp +The minimum length for a match is two characters. If the subject is "XXBB", the +"starting character" optimization skips "XX", then tries to match "BB", which +is long enough. In the process, (*MARK:2) is encountered and remembered. When +the match attempt fails, the next "B" is found, but there is only one character +left, so there are no more attempts, and "no match" is returned with the "last +mark seen" set to "2". If PCRE2_NO_START_OPTIMIZE is set, however, matches are tried +at every possible starting position, including at the end of the subject, where +(*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is +returned is "1". In this case, the optimizations do not affect the overall +match result, which is still "no match", but they do affect the auxiliary +information that is returned. +.sp + PCRE2_NO_UTF_CHECK +.sp +When PCRE2_UTF is set, the validity of the pattern as a UTF string is +automatically checked. There are discussions about the validity of +.\" HTML +.\" +UTF-8 strings, +.\" +.\" HTML +.\" +UTF-16 strings, +.\" +and +.\" HTML +.\" +UTF-32 strings +.\" +in the +.\" HREF +\fBpcre2unicode\fP +.\" +document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a +negative error code.
+.P +If you know that your pattern is a valid UTF string, and you want to skip this +check for performance reasons, you can set the PCRE2_NO_UTF_CHECK option. When +it is set, the effect of passing an invalid UTF string as a pattern is +undefined. It may cause your program to crash or loop. +.P +Note that this option can also be passed to \fBpcre2_match()\fP and +\fBpcre2_dfa_match()\fP, to suppress UTF validity checking of the subject +string. +.P +Note also that setting PCRE2_NO_UTF_CHECK at compile time does not disable the +error that is given if an escape sequence for an invalid Unicode code point is +encountered in the pattern. In particular, the so-called "surrogate" code +points (0xd800 to 0xdfff) are invalid. If you want to allow escape sequences +such as \ex{d800} you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra +option, as described in the section entitled "Extra compile options" +.\" HTML +.\" +below. +.\" +However, this is possible only in UTF-8 and UTF-32 modes, because these values +are not representable in UTF-16. +.sp + PCRE2_UCP +.sp +This option has two effects. Firstly, it changes the way PCRE2 processes \eB, +\eb, \eD, \ed, \eS, \es, \eW, \ew, and some of the POSIX character classes. By +default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode +properties are used instead to classify characters. More details are given in +the section on +.\" HTML +.\" +generic character types +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +page. If you set PCRE2_UCP, matching one of the items it affects takes much +longer. +.P +The second effect of PCRE2_UCP is to force the use of Unicode properties for +upper/lower casing operations on characters with code points greater than 127, +even when PCRE2_UTF is not set. This makes it possible, for example, to process +strings in the 16-bit UCS-2 code. This option is available only if PCRE2 has +been compiled with Unicode support (which is the default).
+.sp + PCRE2_UNGREEDY +.sp +This option inverts the "greediness" of the quantifiers so that they are not +greedy by default, but become greedy if followed by "?". It is not compatible +with Perl. It can also be set by a (?U) option setting within the pattern. +.sp + PCRE2_USE_OFFSET_LIMIT +.sp +This option must be set for \fBpcre2_compile()\fP if +\fBpcre2_set_offset_limit()\fP is going to be used to set a non-default offset +limit in a match context for matches that use this pattern. An error is +generated if an offset limit is set without this option. For more details, see +the description of \fBpcre2_set_offset_limit()\fP in the +.\" HTML +.\" +section +.\" +that describes match contexts. See also the PCRE2_FIRSTLINE +option above. +.sp + PCRE2_UTF +.sp +This option causes PCRE2 to regard both the pattern and the subject strings +that are subsequently processed as strings of UTF characters instead of +single-code-unit strings. It is available when PCRE2 is built to include +Unicode support (which is the default). If Unicode support is not available, +the use of this option provokes an error. Details of how PCRE2_UTF changes the +behaviour of PCRE2 are given in the +.\" HREF +\fBpcre2unicode\fP +.\" +page. In particular, note that it changes the way PCRE2_CASELESS handles +characters with code points greater than 127. +. +. +.\" HTML +.SS "Extra compile options" +.rs +.sp +The option bits that can be set in a compile context by calling the +\fBpcre2_set_compile_extra_options()\fP function are as follows: +.sp + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES +.sp +This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is +forbidden in UTF-16 mode, and ignored in non-UTF modes. Unicode "surrogate" +code points in the range 0xd800 to 0xdfff are used in pairs in UTF-16 to encode +code points with values in the range 0x10000 to 0x10ffff. The surrogates cannot +therefore be represented in UTF-16. 
They can be represented in UTF-8 and +UTF-32, but are defined as invalid code points, and cause errors if encountered +in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2. +.P +These values also cause errors if encountered in escape sequences such as +\ex{d912} within a pattern. However, it seems that some applications, when +using PCRE2 to check for unwanted characters in UTF-8 strings, explicitly test +for the surrogates using escape sequences. The PCRE2_NO_UTF_CHECK option does +not disable the error that occurs, because it applies only to the testing of +input strings for UTF validity. +.P +If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code +point values in UTF-8 and UTF-32 patterns no longer provoke errors and are +incorporated in the compiled pattern. However, they can only match subject +characters if the matching function is called with PCRE2_NO_UTF_CHECK set. +.sp + PCRE2_EXTRA_ALT_BSUX +.sp +The original option PCRE2_ALT_BSUX causes PCRE2 to process \eU, \eu, and \ex in +the way that ECMAscript (aka JavaScript) does. Additional functionality was +defined by ECMAscript 6; setting PCRE2_EXTRA_ALT_BSUX has the effect of +PCRE2_ALT_BSUX, but in addition it recognizes \eu{hhh..} as a hexadecimal +character code, where hhh.. is any number of hexadecimal digits. +.sp + PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL +.sp +This is a dangerous option. Use with care. By default, an unrecognized escape +such as \ej or a malformed one such as \ex{2z} causes a compile-time error when +detected by \fBpcre2_compile()\fP. Perl is somewhat inconsistent in handling +such items: for example, \ej is treated as a literal "j", and non-hexadecimal +digits in \ex{} are just ignored, though warnings are given in both cases if +Perl's warning switch is enabled. However, a malformed octal number after \eo{ +always causes an error in Perl. 
+.P +If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to +\fBpcre2_compile()\fP, all unrecognized or malformed escape sequences are +treated as single-character escapes. For example, \ej is a literal "j" and +\ex{2z} is treated as the literal string "x{2z}". Setting this option means +that typos in patterns may go undetected and have unexpected results. Also note +that a sequence such as [\eN{] is interpreted as a malformed attempt at +[\eN{...}] and so is treated as [N{] whereas [\eN] gives an error because an +unqualified \eN is a valid escape sequence but is not supported in a character +class. To reiterate: this is a dangerous option. Use with great care. +.sp + PCRE2_EXTRA_ESCAPED_CR_IS_LF +.sp +There are some legacy applications where the escape sequence \er in a pattern +is expected to match a newline. If this option is set, \er in a pattern is +converted to \en so that it matches a LF (linefeed) instead of a CR (carriage +return) character. The option does not affect a literal CR in the pattern, nor +does it affect CR specified as an explicit code point such as \ex{0D}. +.sp + PCRE2_EXTRA_MATCH_LINE +.sp +This option is provided for use by the \fB-x\fP option of \fBpcre2grep\fP. It +causes the pattern only to match complete lines. This is achieved by +automatically inserting the code for "^(?:" at the start of the compiled +pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched +line may be in the middle of the subject string. This option can be used with +PCRE2_LITERAL. +.sp + PCRE2_EXTRA_MATCH_WORD +.sp +This option is provided for use by the \fB-w\fP option of \fBpcre2grep\fP. It +causes the pattern only to match strings that have a word boundary at the start +and the end. This is achieved by automatically inserting the code for "\eb(?:" +at the start of the compiled pattern and ")\eb" at the end. The option may be +used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is +also set. +. +. 
+.\" HTML +.SH "JUST-IN-TIME (JIT) COMPILATION" +.rs +.sp +.nf +.B int pcre2_jit_compile(pcre2_code *\fIcode\fP, uint32_t \fIoptions\fP); +.sp +.B int pcre2_jit_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP);" +.sp +.B void pcre2_jit_free_unused_memory(pcre2_general_context *\fIgcontext\fP); +.sp +.B pcre2_jit_stack *pcre2_jit_stack_create(PCRE2_SIZE \fIstartsize\fP, +.B " PCRE2_SIZE \fImaxsize\fP, pcre2_general_context *\fIgcontext\fP);" +.sp +.B void pcre2_jit_stack_assign(pcre2_match_context *\fImcontext\fP, +.B " pcre2_jit_callback \fIcallback_function\fP, void *\fIcallback_data\fP);" +.sp +.B void pcre2_jit_stack_free(pcre2_jit_stack *\fIjit_stack\fP); +.fi +.P +These functions provide support for JIT compilation, which, if the just-in-time +compiler is available, further processes a compiled pattern into machine code +that executes much faster than the \fBpcre2_match()\fP interpretive matching +function. Full details are given in the +.\" HREF +\fBpcre2jit\fP +.\" +documentation. +.P +JIT compilation is a heavyweight optimization. It can take some time for +patterns to be analyzed, and for one-off matches and simple patterns the +benefit of faster execution might be offset by a much slower compilation time. +Most (but not all) patterns can be optimized by the JIT compiler. +. +. +.\" HTML +.SH "LOCALE SUPPORT" +.rs +.sp +.nf +.B const uint8_t *pcre2_maketables(pcre2_general_context *\fIgcontext\fP); +.sp +.B void pcre2_maketables_free(pcre2_general_context *\fIgcontext\fP, +.B " const uint8_t *\fItables\fP);" +.fi +.P +PCRE2 handles caseless matching, and determines whether characters are letters, +digits, or whatever, by reference to a set of tables, indexed by character code +point. However, this applies only to characters whose code points are less than +256. 
By default, higher-valued code points never match escapes such as \ew or +\ed. +.P +When PCRE2 is built with Unicode support (the default), the Unicode properties +of all characters can be tested with \ep and \eP, or, alternatively, the +PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and +friends to use Unicode property support instead of the built-in tables. +PCRE2_UCP also causes upper/lower casing operations on characters with code +points greater than 127 to use Unicode properties. These effects apply even +when PCRE2_UTF is not set. +.P +The use of locales with Unicode is discouraged. If you are handling characters +with code points greater than 127, you should either use Unicode support, or +use locales, but not try to mix the two. +.P +PCRE2 contains a built-in set of character tables that are used by default. +These are sufficient for many applications. Normally, the internal tables +recognize only ASCII characters. However, when PCRE2 is built, it is possible +to cause the internal tables to be rebuilt in the default "C" locale of the +local system, which may cause them to be different. +.P +The built-in tables can be overridden by tables supplied by the application +that calls PCRE2. These may be created in a different locale from the default. +As more and more applications change to using Unicode, the need for this locale +support is expected to die away. +.P +External tables are built by calling the \fBpcre2_maketables()\fP function, in +the relevant locale. The only argument to this function is a general context, +which can be used to pass a custom memory allocator. If the argument is NULL, +the system \fBmalloc()\fP is used. The result can be passed to +\fBpcre2_compile()\fP as often as necessary, by creating a compile context and +calling \fBpcre2_set_character_tables()\fP to set the tables pointer therein. 
+.P +For example, to build and use tables that are appropriate for the French locale +(where accented characters with values greater than 127 are treated as +letters), the following code could be used: +.sp + setlocale(LC_CTYPE, "fr_FR"); + tables = pcre2_maketables(NULL); + ccontext = pcre2_compile_context_create(NULL); + pcre2_set_character_tables(ccontext, tables); + re = pcre2_compile(..., ccontext); +.sp +The locale name "fr_FR" is used on Linux and other Unix-like systems; if you +are using Windows, the name for the French locale is "french". +.P +The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP +is saved with the compiled pattern, and the same tables are used by the +matching functions. Thus, for any single pattern, compilation and matching both +happen in the same locale, but different patterns can be processed in different +locales. +.P +It is the caller's responsibility to ensure that the memory containing the +tables remains available while they are still in use. When they are no longer +needed, you can discard them using \fBpcre2_maketables_free()\fP, passing as +its first parameter the same general context that was used to create the +tables. +. +. +.SS "Saving locale tables" +.rs +.sp +The tables described above are just a sequence of binary bytes, which makes +them independent of hardware characteristics such as endianness or whether the +processor is 32-bit or 64-bit. A copy of the result of \fBpcre2_maketables()\fP +can therefore be saved in a file or elsewhere and re-used later, even in a +different program or on another computer. The size of the tables (number of +bytes) must be obtained by calling \fBpcre2_config()\fP with the +PCRE2_CONFIG_TABLES_LENGTH option because \fBpcre2_maketables()\fP does not +return this value. Note that the \fBpcre2_dftables\fP program, which is part of +the PCRE2 build system, can be used stand-alone to create a file that contains +a set of binary tables. 
See the +.\" HTML +.\" +\fBpcre2build\fP +.\" +documentation for details. +. +. +.\" HTML +.SH "INFORMATION ABOUT A COMPILED PATTERN" +.rs +.sp +.nf +.B int pcre2_pattern_info(const pcre2_code *\fIcode\fP, uint32_t \fIwhat\fP, void *\fIwhere\fP); +.fi +.P +The \fBpcre2_pattern_info()\fP function returns general information about a +compiled pattern. For information about callouts, see the +.\" HTML +.\" +next section. +.\" +The first argument for \fBpcre2_pattern_info()\fP is a pointer to the compiled +pattern. The second argument specifies which piece of information is required, +and the third argument is a pointer to a variable to receive the data. If the +third argument is NULL, the first argument is ignored, and the function returns +the size in bytes of the variable that is required for the information +requested. Otherwise, the yield of the function is zero for success, or one of +the following negative numbers: +.sp + PCRE2_ERROR_NULL the argument \fIcode\fP was NULL + PCRE2_ERROR_BADMAGIC the "magic number" was not found + PCRE2_ERROR_BADOPTION the value of \fIwhat\fP was invalid + PCRE2_ERROR_UNSET the requested field is not set +.sp +The "magic number" is placed at the start of each compiled pattern as a simple +check against passing an arbitrary memory pointer. Here is a typical call of +\fBpcre2_pattern_info()\fP, to obtain the length of the compiled pattern: +.sp + int rc; + size_t length; + rc = pcre2_pattern_info( + re, /* result of pcre2_compile() */ + PCRE2_INFO_SIZE, /* what is required */ + &length); /* where to put the data */ +.sp +The possible values for the second argument are defined in \fBpcre2.h\fP, and +are as follows: +.sp + PCRE2_INFO_ALLOPTIONS + PCRE2_INFO_ARGOPTIONS + PCRE2_INFO_EXTRAOPTIONS +.sp +Return copies of the pattern's options. The third argument should point to a +\fBuint32_t\fP variable. 
PCRE2_INFO_ARGOPTIONS returns exactly the options that +were passed to \fBpcre2_compile()\fP, whereas PCRE2_INFO_ALLOPTIONS returns +the compile options as modified by any top-level (*XXX) option settings such as +(*UTF) at the start of the pattern itself. PCRE2_INFO_EXTRAOPTIONS returns the +extra options that were set in the compile context by calling the +pcre2_set_compile_extra_options() function. +.P +For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED +option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF. +Option settings such as (?i) that can change within a pattern do not affect the +result of PCRE2_INFO_ALLOPTIONS, even if they appear right at the start of the +pattern. (This was different in some earlier releases.) +.P +A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if +the first significant item in every top-level branch is one of the following: +.sp + ^ unless PCRE2_MULTILINE is set + \eA always + \eG always + .* sometimes - see below +.sp +When .* is the first significant item, anchoring is possible only when all the +following are true: +.sp + .* is not in an atomic group +.\" JOIN + .* is not in a capture group that is the subject + of a backreference + PCRE2_DOTALL is in force for .* + Neither (*PRUNE) nor (*SKIP) appears in the pattern + PCRE2_NO_DOTSTAR_ANCHOR is not set +.sp +For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the +options returned for PCRE2_INFO_ALLOPTIONS. +.sp + PCRE2_INFO_BACKREFMAX +.sp +Return the number of the highest backreference in the pattern. The third +argument should point to a \fBuint32_t\fP variable. Named capture groups +acquire numbers as well as names, and these count towards the highest +backreference. Backreferences such as \e4 or \eg{12} match the captured +characters of the given group, but in addition, the check that a capture +group is set in a conditional group such as (?(3)a|b) is also a backreference. 
+Zero is returned if there are no backreferences. +.sp + PCRE2_INFO_BSR +.sp +The output is a uint32_t integer whose value indicates what character sequences +the \eR escape sequence matches. A value of PCRE2_BSR_UNICODE means that \eR +matches any Unicode line ending sequence; a value of PCRE2_BSR_ANYCRLF means +that \eR matches only CR, LF, or CRLF. +.sp + PCRE2_INFO_CAPTURECOUNT +.sp +Return the highest capture group number in the pattern. In patterns where (?| +is not used, this is also the total number of capture groups. The third +argument should point to a \fBuint32_t\fP variable. +.sp + PCRE2_INFO_DEPTHLIMIT +.sp +If the pattern set a backtracking depth limit by including an item of the form +(*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument +should point to a uint32_t integer. If no such value has been set, the call to +\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this +limit will only be used during matching if it is less than the limit set or +defaulted by the caller of the match function. +.sp + PCRE2_INFO_FIRSTBITMAP +.sp +In the absence of a single first code unit for a non-anchored pattern, +\fBpcre2_compile()\fP may construct a 256-bit table that defines a fixed set of +values for the first code unit in any match. For example, a pattern that starts +with [abc] results in a table with three bits set. When code unit values +greater than 255 are supported, the flag bit for 255 means "any code unit of +value 255 or above". If such a table was constructed, a pointer to it is +returned. Otherwise NULL is returned. The third argument should point to a +\fBconst uint8_t *\fP variable. +.sp + PCRE2_INFO_FIRSTCODETYPE +.sp +Return information about the first code unit of any matched string, for a +non-anchored pattern. The third argument should point to a \fBuint32_t\fP +variable. 
If there is a fixed first value, for example, the letter "c" from a +pattern such as (cat|cow|coyote), 1 is returned, and the value can be retrieved +using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but it is +known that a match can occur only at the start of the subject or following a +newline in the subject, 2 is returned. Otherwise, and for anchored patterns, 0 +is returned. +.sp + PCRE2_INFO_FIRSTCODEUNIT +.sp +Return the value of the first code unit of any matched string for a pattern +where PCRE2_INFO_FIRSTCODETYPE returns 1; otherwise return 0. The third +argument should point to a \fBuint32_t\fP variable. In the 8-bit library, the +value is always less than 256. In the 16-bit library the value can be up to +0xffff. In the 32-bit library in UTF-32 mode the value can be up to 0x10ffff, +and up to 0xffffffff when not using UTF-32 mode. +.sp + PCRE2_INFO_FRAMESIZE +.sp +Return the size (in bytes) of the data frames that are used to remember +backtracking positions when the pattern is processed by \fBpcre2_match()\fP +without the use of JIT. The third argument should point to a \fBsize_t\fP +variable. The frame size depends on the number of capturing parentheses in the +pattern. Each additional capture group adds two PCRE2_SIZE variables. +.sp + PCRE2_INFO_HASBACKSLASHC +.sp +Return 1 if the pattern contains any instances of \eC, otherwise 0. The third +argument should point to a \fBuint32_t\fP variable. +.sp + PCRE2_INFO_HASCRORLF +.sp +Return 1 if the pattern contains any explicit matches for CR or LF characters, +otherwise 0. The third argument should point to a \fBuint32_t\fP variable. An +explicit match is either a literal CR or LF character, or \er or \en or one of +the equivalent hexadecimal or octal escape sequences. +.sp + PCRE2_INFO_HEAPLIMIT +.sp +If the pattern set a heap memory limit by including an item of the form +(*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument +should point to a uint32_t integer. 
If no such value has been set, the call to +\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this +limit will only be used during matching if it is less than the limit set or +defaulted by the caller of the match function. +.sp + PCRE2_INFO_JCHANGED +.sp +Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise +0. The third argument should point to a \fBuint32_t\fP variable. (?J) and +(?-J) set and unset the local PCRE2_DUPNAMES option, respectively. +.sp + PCRE2_INFO_JITSIZE +.sp +If the compiled pattern was successfully processed by +\fBpcre2_jit_compile()\fP, return the size of the JIT compiled code, otherwise +return zero. The third argument should point to a \fBsize_t\fP variable. +.sp + PCRE2_INFO_LASTCODETYPE +.sp +Returns 1 if there is a rightmost literal code unit that must exist in any +matched string, other than at its start. The third argument should point to a +\fBuint32_t\fP variable. If there is no such value, 0 is returned. When 1 is +returned, the code unit value itself can be retrieved using +PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is +recorded only if it follows something of variable length. For example, for the +pattern /^a\ed+z\ed+/ the returned value is 1 (with "z" returned from +PCRE2_INFO_LASTCODEUNIT), but for /^a\edz\ed/ the returned value is 0. +.sp + PCRE2_INFO_LASTCODEUNIT +.sp +Return the value of the rightmost literal code unit that must exist in any +matched string, other than at its start, for a pattern where +PCRE2_INFO_LASTCODETYPE returns 1. Otherwise, return 0. The third argument +should point to a \fBuint32_t\fP variable. +.sp + PCRE2_INFO_MATCHEMPTY +.sp +Return 1 if the pattern might match an empty string, otherwise 0. The third +argument should point to a \fBuint32_t\fP variable. When a pattern contains +recursive subroutine calls it is not always possible to determine whether or +not it can match an empty string. 
PCRE2 takes a cautious approach and returns 1 +in such cases. +.sp + PCRE2_INFO_MATCHLIMIT +.sp +If the pattern set a match limit by including an item of the form +(*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument +should point to a uint32_t integer. If no such value has been set, the call to +\fBpcre2_pattern_info()\fP returns the error PCRE2_ERROR_UNSET. Note that this +limit will only be used during matching if it is less than the limit set or +defaulted by the caller of the match function. +.sp + PCRE2_INFO_MAXLOOKBEHIND +.sp +A lookbehind assertion moves back a certain number of characters (not code +units) when it starts to process each of its branches. This request returns the +largest of these backward moves. The third argument should point to a uint32_t +integer. The simple assertions \eb and \eB require a one-character lookbehind +and cause PCRE2_INFO_MAXLOOKBEHIND to return 1 in the absence of anything +longer. \eA also registers a one-character lookbehind, though it does not +actually inspect the previous character. +.P +Note that this information is useful for multi-segment matching only +if the pattern contains no nested lookbehinds. For example, the pattern +(?<=a(?<=ba)c) returns a maximum lookbehind of 2, but when it is processed, the +first lookbehind moves back by two characters, matches one character, then the +nested lookbehind also moves back by two characters. This puts the matching +point three characters earlier than it was at the start. +PCRE2_INFO_MAXLOOKBEHIND is really only useful as a debugging tool. See the +.\" HREF +\fBpcre2partial\fP +.\" +documentation for a discussion of multi-segment matching. +.sp + PCRE2_INFO_MINLENGTH +.sp +If a minimum length for matching subject strings was computed, its value is +returned. Otherwise the returned value is 0. This value is not computed when +PCRE2_NO_START_OPTIMIZE is set. 
The value is a number of characters, which in +UTF mode may be different from the number of code units. The third argument +should point to a \fBuint32_t\fP variable. The value is a lower bound to the +length of any matching string. There may not be any strings of that length that +do actually match, but every string that does match is at least that long. +.sp + PCRE2_INFO_NAMECOUNT + PCRE2_INFO_NAMEENTRYSIZE + PCRE2_INFO_NAMETABLE +.sp +PCRE2 supports the use of named as well as numbered capturing parentheses. The +names are just an additional way of identifying the parentheses, which still +acquire numbers. Several convenience functions such as +\fBpcre2_substring_get_byname()\fP are provided for extracting captured +substrings by name. It is also possible to extract the data directly, by first +converting the name to a number in order to access the correct pointers in the +output vector (described with \fBpcre2_match()\fP below). To do the conversion, +you need to use the name-to-number map, which is described by these three +values. +.P +The map consists of a number of fixed-size entries. PCRE2_INFO_NAMECOUNT gives +the number of entries, and PCRE2_INFO_NAMEENTRYSIZE gives the size of each +entry in code units; both of these return a \fBuint32_t\fP value. The entry +size depends on the length of the longest name. +.P +PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is +a PCRE2_SPTR pointer to a block of code units. In the 8-bit library, the first +two bytes of each entry are the number of the capturing parenthesis, most +significant byte first. In the 16-bit library, the pointer points to 16-bit +code units, the first of which contains the parenthesis number. In the 32-bit +library, the pointer points to 32-bit code units, the first of which contains +the parenthesis number. The rest of the entry is the corresponding name, zero +terminated. +.P +The names are in alphabetical order. 
If (?| is used to create multiple capture +groups with the same number, as described in the +.\" HTML +.\" +section on duplicate group numbers +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +page, the groups may be given the same name, but there is only one entry in the +table. Different names for groups of the same number are not permitted. +.P +Duplicate names for capture groups with different numbers are permitted, but +only if PCRE2_DUPNAMES is set. They appear in the table in the order in which +they were found in the pattern. In the absence of (?| this is the order of +increasing number; when (?| is used this is not necessarily the case because +later capture groups may have lower numbers. +.P +As a simple example of the name/number table, consider the following pattern +after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white +space - including newlines - is ignored): +.sp +.\" JOIN + (?<date> (?<year>(\ed\ed)?\ed\ed) - + (?<month>\ed\ed) - (?<day>\ed\ed) ) +.sp +There are four named capture groups, so the table has four entries, and each +entry in the table is eight bytes long. The table is as follows, with +non-printing bytes shown in hexadecimal, and undefined bytes shown as ??: +.sp + 00 01 d a t e 00 ?? + 00 05 d a y 00 ?? ?? + 00 04 m o n t h 00 + 00 02 y e a r 00 ?? +.sp +When writing code to extract data from named capture groups using the +name-to-number map, remember that the length of the entries is likely to be +different for each compiled pattern. +.sp + PCRE2_INFO_NEWLINE +.sp +The output is one of the following \fBuint32_t\fP values: +.sp + PCRE2_NEWLINE_CR Carriage return (CR) + PCRE2_NEWLINE_LF Linefeed (LF) + PCRE2_NEWLINE_CRLF Carriage return, linefeed (CRLF) + PCRE2_NEWLINE_ANY Any Unicode line ending + PCRE2_NEWLINE_ANYCRLF Any of CR, LF, or CRLF + PCRE2_NEWLINE_NUL The NUL character (binary zero) +.sp +This identifies the character sequence that will be recognized as meaning +"newline" while matching. 
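The 8-bit name-to-number map described under PCRE2_INFO_NAMETABLE above can be searched with a few lines of C. The following is only a sketch: the helper lookup_group_number() and the array example_table are hypothetical names that are not part of PCRE2, and the table is a hand-built copy of the four-entry example, with the undefined "??" bytes filled with zeros. In a real program the table pointer, entry count, and entry size would be obtained from \fBpcre2_pattern_info()\fP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hand-built copy of the four-entry, eight-byte-per-entry example table.
   The "??" padding bytes are arbitrary in a real table; zeros are used
   here purely for illustration. */
const uint8_t example_table[] = {
  0x00, 0x01, 'd', 'a', 't', 'e', 0x00, 0x00,
  0x00, 0x05, 'd', 'a', 'y', 0x00, 0x00, 0x00,
  0x00, 0x04, 'm', 'o', 'n', 't', 'h', 0x00,
  0x00, 0x02, 'y', 'e', 'a', 'r', 0x00, 0x00
};

/* Hypothetical helper (not a PCRE2 function): scan an 8-bit name table
   for a group name. Returns the group number, or -1 if the name is not
   present. */
int lookup_group_number(const uint8_t *table, uint32_t count,
                        uint32_t entry_size, const char *name)
{
  assert(table != NULL || count == 0);
  for (uint32_t i = 0; i < count; i++)
    {
    const uint8_t *entry = table + (size_t)i * entry_size;
    /* The first two bytes hold the group number, most significant
       byte first. */
    int number = (entry[0] << 8) | entry[1];
    /* The rest of the entry is the zero-terminated name. */
    if (strcmp((const char *)(entry + 2), name) == 0) return number;
    }
  return -1;
}
```

With this table, a call such as lookup_group_number(example_table, 4, 8, "month") yields 4. The entry size is passed in by the caller instead of being hard-coded because it differs from one compiled pattern to another. A linear scan is used for simplicity; it does not rely on the ordering of the entries.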
+.sp + PCRE2_INFO_SIZE +.sp +Return the size of the compiled pattern in bytes (for all three libraries). The +third argument should point to a \fBsize_t\fP variable. This value includes the +size of the general data block that precedes the code units of the compiled +pattern itself. The value that is used when \fBpcre2_compile()\fP is getting +memory in which to place the compiled pattern may be slightly larger than the +value returned by this option, because there are cases where the code that +calculates the size has to over-estimate. Processing a pattern with the JIT +compiler does not alter the value returned by this option. +. +. +.\" HTML +.SH "INFORMATION ABOUT A PATTERN'S CALLOUTS" +.rs +.sp +.nf +.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP, +.B " int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *)," +.B " void *\fIuser_data\fP);" +.fi +.sp +A script language that supports the use of string arguments in callouts might +like to scan all the callouts in a pattern before running the match. This can +be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a +pointer to a compiled pattern, the second points to a callback function, and +the third is arbitrary user data. The callback function is called for every +callout in the pattern in the order in which they appear. Its first argument is +a pointer to a callout enumeration block, and its second argument is the +\fIuser_data\fP value that was passed to \fBpcre2_callout_enumerate()\fP. The +contents of the callout enumeration block are described in the +.\" HREF +\fBpcre2callout\fP +.\" +documentation, which also gives further details about callouts. +. +. +.SH "SERIALIZATION AND PRECOMPILING" +.rs +.sp +It is possible to save compiled patterns on disc or elsewhere, and reload them +later, subject to a number of restrictions. 
The host on which the patterns are +reloaded must be running the same version of PCRE2, with the same code unit +width, and must also have the same endianness, pointer width, and PCRE2_SIZE +type. Before compiled patterns can be saved, they must be converted to a +"serialized" form, which in the case of PCRE2 is really just a bytecode dump. +The functions whose names begin with \fBpcre2_serialize_\fP are used for +converting to and from the serialized form. They are described in the +.\" HREF +\fBpcre2serialize\fP +.\" +documentation. Note that PCRE2 serialization does not convert compiled patterns +to an abstract format like Java or .NET serialization. +. +. +.\" HTML +.SH "THE MATCH DATA BLOCK" +.rs +.sp +.nf +.B pcre2_match_data *pcre2_match_data_create(uint32_t \fIovecsize\fP, +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_match_data *pcre2_match_data_create_from_pattern( +.B " const pcre2_code *\fIcode\fP, pcre2_general_context *\fIgcontext\fP);" +.sp +.B void pcre2_match_data_free(pcre2_match_data *\fImatch_data\fP); +.fi +.P +Information about a successful or unsuccessful match is placed in a match +data block, which is an opaque structure that is accessed by function calls. In +particular, the match data block contains a vector of offsets into the subject +string that define the matched part of the subject and any substrings that were +captured. This is known as the \fIovector\fP. +.P +Before calling \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP, or +\fBpcre2_jit_match()\fP you must create a match data block by calling one of +the creation functions above. For \fBpcre2_match_data_create()\fP, the first +argument is the number of pairs of offsets in the \fIovector\fP. One pair of +offsets is required to identify the string that matched the whole pattern, with +an additional pair for each captured substring. For example, a value of 4 +creates enough space to record the matched portion of the subject plus three +captured substrings. 
A minimum of 1 pair is imposed by +\fBpcre2_match_data_create()\fP, so it is always possible to return the overall +matched string. +.P +The second argument of \fBpcre2_match_data_create()\fP is a pointer to a +general context, which can specify custom memory management for obtaining the +memory for the match data block. If you are not using custom memory management, +pass NULL, which causes \fBmalloc()\fP to be used. +.P +For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a +pointer to a compiled pattern. The ovector is created to be exactly the right +size to hold all the substrings a pattern might capture. The second argument is +again a pointer to a general context, but in this case if NULL is passed, the +memory is obtained using the same allocator that was used for the compiled +pattern (custom or default). +.P +A match data block can be used many times, with the same or different compiled +patterns. You can extract information from a match data block after a match +operation has finished, using functions that are described in the sections on +.\" HTML +.\" +matched strings +.\" +and +.\" HTML +.\" +other match data +.\" +below. +.P +When a call of \fBpcre2_match()\fP fails, valid data is available in the match +block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one +of the error codes for an invalid UTF string. Exactly what is available depends +on the error, and is detailed below. +.P +When one of the matching functions is called, pointers to the compiled pattern +and the subject string are set in the match data block so that they can be +referenced by the extraction functions after a successful match. 
After running +a match, you must not free a compiled pattern or a subject string until after +all operations on the match data block (for that match) have taken place, +unless, in the case of the subject string, you have used the +PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled +"Option bits for \fBpcre2_match()\fP" +.\" HTML +.\" +below. +.\" +.P +When a match data block itself is no longer needed, it should be freed by +calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL +argument, it returns immediately, without doing anything. +. +. +.SH "MATCHING A PATTERN: THE TRADITIONAL FUNCTION" +.rs +.sp +.nf +.B int pcre2_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP);" +.fi +.P +The function \fBpcre2_match()\fP is called to match a subject string against a +compiled pattern, which is passed in the \fIcode\fP argument. You can call +\fBpcre2_match()\fP with the same \fIcode\fP argument as many times as you +like, in order to find multiple matches in the subject string or to match +different subject strings with the same pattern. +.P +This function is the main matching facility of the library, and it operates in +a Perl-like manner. For specialist use there is also an alternative matching +function, which is described +.\" HTML +.\" +below +.\" +in the section about the \fBpcre2_dfa_match()\fP function. 
+.P +Here is an example of a simple call to \fBpcre2_match()\fP: +.sp + pcre2_match_data *md = pcre2_match_data_create(4, NULL); + int rc = pcre2_match( + re, /* result of pcre2_compile() */ + "some string", /* the subject string */ + 11, /* the length of the subject string */ + 0, /* start at offset 0 in the subject */ + 0, /* default options */ + md, /* the match data block */ + NULL); /* a match context; NULL means use defaults */ +.sp +If the subject string is zero-terminated, the length can be given as +PCRE2_ZERO_TERMINATED. A match context must be provided if certain less common +matching parameters are to be changed. For details, see the section on +.\" HTML +.\" +the match context +.\" +above. +. +. +.SS "The string to be matched by \fBpcre2_match()\fP" +.rs +.sp +The subject string is passed to \fBpcre2_match()\fP as a pointer in +\fIsubject\fP, a length in \fIlength\fP, and a starting offset in +\fIstartoffset\fP. The length and offset are in code units, not characters. +That is, they are in bytes for the 8-bit library, 16-bit code units for the +16-bit library, and 32-bit code units for the 32-bit library, whether or not +UTF processing is enabled. +.P +If \fIstartoffset\fP is greater than the length of the subject, +\fBpcre2_match()\fP returns PCRE2_ERROR_BADOFFSET. When the starting offset is +zero, the search for a match starts at the beginning of the subject, and this +is by far the most common case. In UTF-8 or UTF-16 mode, the starting offset +must point to the start of a character, or to the end of the subject (in UTF-32 +mode, one code unit equals one character, so all offsets are valid). Like the +pattern string, the subject may contain binary zeros. +.P +A non-zero starting offset is useful when searching for another match in the +same subject by calling \fBpcre2_match()\fP again after a previous success. 
+Setting \fIstartoffset\fP differs from passing over a shortened string and +setting PCRE2_NOTBOL in the case of a pattern that begins with any kind of +lookbehind. For example, consider the pattern +.sp + \eBiss\eB +.sp +which finds occurrences of "iss" in the middle of words. (\eB matches only if +the current position in the subject is not a word boundary.) When applied to +the string "Mississippi" the first call to \fBpcre2_match()\fP finds the first +occurrence. If \fBpcre2_match()\fP is called again with just the remainder of +the subject, namely "issippi", it does not match, because \eB is always false at +the start of the subject, which is deemed to be a word boundary. However, if +\fBpcre2_match()\fP is passed the entire string again, but with +\fIstartoffset\fP set to 4, it finds the second occurrence of "iss" because it +is able to look behind the starting point to discover that it is preceded by a +letter. +.P +Finding all the matches in a subject is tricky when the pattern can match an +empty string. It is possible to emulate Perl's /g behaviour by first trying the +match again at the same offset, with the PCRE2_NOTEMPTY_ATSTART and +PCRE2_ANCHORED options, and then if that fails, advancing the starting offset +and trying an ordinary match again. There is some code that demonstrates how to +do this in the +.\" HREF +\fBpcre2demo\fP +.\" +sample program. In the most general case, you have to check to see if the +newline convention recognizes CRLF as a newline, and if so, and the current +character is CR followed by LF, advance the starting offset by two characters +instead of one. +.P +If a non-zero starting offset is passed when the pattern is anchored, a single +attempt to match at the given offset is made. This can only succeed if the +pattern does not require the match to be at the start of the subject. 
In other +words, the anchoring must be the result of setting the PCRE2_ANCHORED option or +the use of .* with PCRE2_DOTALL, not by starting the pattern with ^ or \eA. +. +. +.\" HTML +.SS "Option bits for \fBpcre2_match()\fP" +.rs +.sp +The unused bits of the \fIoptions\fP argument for \fBpcre2_match()\fP must be +zero. The only bits that may be set are PCRE2_ANCHORED, +PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, +PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_JIT, PCRE2_NO_UTF_CHECK, +PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below. +.P +Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by +the just-in-time (JIT) compiler. If either is set, JIT matching is disabled and the +interpretive code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT +(obviously), the remaining options are supported for JIT matching. +.sp + PCRE2_ANCHORED +.sp +The PCRE2_ANCHORED option limits \fBpcre2_match()\fP to matching at the first +matching position. If a pattern was compiled with PCRE2_ANCHORED, or turned out +to be anchored by virtue of its contents, it cannot be made unanchored at +matching time. Note that setting the option at match time disables JIT +matching. +.sp + PCRE2_COPY_MATCHED_SUBJECT +.sp +By default, a pointer to the subject is remembered in the match data block so +that, after a successful match, it can be referenced by the substring +extraction functions. This means that the subject's memory must not be freed +until all such operations are complete. For some applications where the +lifetime of the subject string is not guaranteed, it may be necessary to make a +copy of the subject string, but it is wasteful to do this unless the match is +successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the +subject is copied and the new pointer is remembered in the match data block +instead of the original subject pointer.
The memory allocator that was used for +the match block itself is used. The copy is automatically freed when +\fBpcre2_match_data_free()\fP is called to free the match data block. It is also +automatically freed if the match data block is re-used for another match +operation. +.sp + PCRE2_ENDANCHORED +.sp +If the PCRE2_ENDANCHORED option is set, any string that \fBpcre2_match()\fP +matches must be right at the end of the subject string. Note that setting the +option at match time disables JIT matching. +.sp + PCRE2_NOTBOL +.sp +This option specifies that the first character of the subject string is not the +beginning of a line, so the circumflex metacharacter should not match before +it. Setting this without having set PCRE2_MULTILINE at compile time causes +circumflex never to match. This option affects only the behaviour of the +circumflex metacharacter. It does not affect \eA. +.sp + PCRE2_NOTEOL +.sp +This option specifies that the end of the subject string is not the end of a +line, so the dollar metacharacter should not match it nor (except in multiline +mode) a newline immediately before it. Setting this without having set +PCRE2_MULTILINE at compile time causes dollar never to match. This option +affects only the behaviour of the dollar metacharacter. It does not affect \eZ +or \ez. +.sp + PCRE2_NOTEMPTY +.sp +An empty string is not considered to be a valid match if this option is set. If +there are alternatives in the pattern, they are tried. If all the alternatives +match the empty string, the entire match fails. For example, if the pattern +.sp + a?b? +.sp +is applied to a string not beginning with "a" or "b", it matches an empty +string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not +valid, so \fBpcre2_match()\fP searches further into the string for occurrences +of "a" or "b".
+.sp + PCRE2_NOTEMPTY_ATSTART +.sp +This is like PCRE2_NOTEMPTY, except that it locks out an empty string match +only at the first matching position, that is, at the start of the subject plus +the starting offset. An empty string match later in the subject is permitted. +If the pattern is anchored, such a match can occur only if the pattern contains +\eK. +.sp + PCRE2_NO_JIT +.sp +By default, if a pattern has been successfully processed by +\fBpcre2_jit_compile()\fP, JIT is automatically used when \fBpcre2_match()\fP +is called with options that JIT supports. Setting PCRE2_NO_JIT disables the use +of JIT; it forces matching to be done by the interpreter. +.sp + PCRE2_NO_UTF_CHECK +.sp +When PCRE2_UTF is set at compile time, the validity of the subject as a UTF +string is checked unless PCRE2_NO_UTF_CHECK is passed to \fBpcre2_match()\fP or +PCRE2_MATCH_INVALID_UTF was passed to \fBpcre2_compile()\fP. The latter special +case is discussed in detail in the +.\" HREF +\fBpcre2unicode\fP +.\" +documentation. +.P +In the default case, if a non-zero starting offset is given, the check is +applied only to that part of the subject that could be inspected during +matching, and there is a check that the starting offset points to the first +code unit of a character or to the end of the subject. If there are no +lookbehind assertions in the pattern, the check starts at the starting offset. +Otherwise, it starts at the length of the longest lookbehind before the +starting offset, or at the start of the subject if there are not that many +characters before the starting offset. Note that the sequences \eb and \eB are +one-character lookbehinds. +.P +The check is carried out before any other processing takes place, and a +negative error code is returned if the check fails. There are several UTF error +codes for each code unit width, corresponding to different problems with the +code unit sequence. 
There are discussions about the validity of +.\" HTML +.\" +UTF-8 strings, +.\" +.\" HTML +.\" +UTF-16 strings, +.\" +and +.\" HTML +.\" +UTF-32 strings +.\" +in the +.\" HREF +\fBpcre2unicode\fP +.\" +documentation. +.P +If you know that your subject is valid, and you want to skip this check for +performance reasons, you can set the PCRE2_NO_UTF_CHECK option when calling +\fBpcre2_match()\fP. You might want to do this for the second and subsequent +calls to \fBpcre2_match()\fP if you are making repeated calls to find multiple +matches in the same subject string. +.P +\fBWarning:\fP Unless PCRE2_MATCH_INVALID_UTF was set at compile time, when +PCRE2_NO_UTF_CHECK is set at match time the effect of passing an invalid +string as a subject, or an invalid value of \fIstartoffset\fP, is undefined. +Your program may crash or loop indefinitely or give wrong results. +.sp + PCRE2_PARTIAL_HARD + PCRE2_PARTIAL_SOFT +.sp +These options turn on the partial matching feature. A partial match occurs if +the end of the subject string is reached successfully, but there are not enough +subject characters to complete the match. In addition, either at least one +character must have been inspected or the pattern must contain a lookbehind, or +the pattern must be one that could match an empty string. +.P +If this situation arises when PCRE2_PARTIAL_SOFT (but not PCRE2_PARTIAL_HARD) +is set, matching continues by testing any remaining alternatives. Only if no +complete match can be found is PCRE2_ERROR_PARTIAL returned instead of +PCRE2_ERROR_NOMATCH. In other words, PCRE2_PARTIAL_SOFT specifies that the +caller is prepared to handle a partial match, but only if no complete match can +be found. +.P +If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if +a partial match is found, \fBpcre2_match()\fP immediately returns +PCRE2_ERROR_PARTIAL, without considering any other alternatives. 
In other +words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more +important than an alternative complete match. +.P +There is a more detailed discussion of partial and multi-segment matching, with +examples, in the +.\" HREF +\fBpcre2partial\fP +.\" +documentation. +. +. +. +.SH "NEWLINE HANDLING WHEN MATCHING" +.rs +.sp +When PCRE2 is built, a default newline convention is set; this is usually the +standard convention for the operating system. The default can be overridden in +a +.\" HTML +.\" +compile context +.\" +by calling \fBpcre2_set_newline()\fP. It can also be overridden by starting a +pattern string with, for example, (*CRLF), as described in the +.\" HTML +.\" +section on newline conventions +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +page. During matching, the newline choice affects the behaviour of the dot, +circumflex, and dollar metacharacters. It may also alter the way the match +starting position is advanced after a match failure for an unanchored pattern. +.P +When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as +the newline convention, and a match attempt for an unanchored pattern fails +when the current starting position is at a CRLF sequence, and the pattern +contains no explicit matches for CR or LF characters, the match position is +advanced by two characters instead of one, in other words, to after the CRLF. +.P +The above rule is a compromise that makes the most common cases work as +expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is +not set), it does not match the string "\er\enA" because, after failing at the +start, it skips both the CR and the LF before retrying. However, the pattern +[\er\en]A does match that string, because it contains an explicit CR or LF +reference, and so advances only by one character after the first failure.
+.P +An explicit match for CR or LF is either a literal appearance of one of those +characters in the pattern, or one of the \er or \en or equivalent octal or +hexadecimal escape sequences. Implicit matches such as [^X] do not count, nor +does \es, even though it includes CR and LF in the characters that it matches. +.P +Notwithstanding the above, anomalous effects may still occur when CRLF is a +valid newline sequence and explicit \er or \en escapes appear in the pattern. +. +. +.\" HTML +.SH "HOW PCRE2_MATCH() RETURNS A STRING AND CAPTURED SUBSTRINGS" +.rs +.sp +.nf +.B uint32_t pcre2_get_ovector_count(pcre2_match_data *\fImatch_data\fP); +.sp +.B PCRE2_SIZE *pcre2_get_ovector_pointer(pcre2_match_data *\fImatch_data\fP); +.fi +.P +In general, a pattern matches a certain portion of the subject, and in +addition, further substrings from the subject may be picked out by +parenthesized parts of the pattern. Following the usage in Jeffrey Friedl's +book, this is called "capturing" in what follows, and the phrase "capture +group" (Perl terminology) is used for a fragment of a pattern that picks out a +substring. PCRE2 supports several other kinds of parenthesized group that do +not cause substrings to be captured. The \fBpcre2_pattern_info()\fP function +can be used to find out how many capture groups there are in a compiled +pattern. +.P +You can use auxiliary functions for accessing captured substrings +.\" HTML +.\" +by number +.\" +or +.\" HTML +.\" +by name, +.\" +as described in sections below. +.P +Alternatively, you can make direct use of the vector of PCRE2_SIZE values, +called the \fBovector\fP, which contains the offsets of captured strings. It is +part of the +.\" HTML +.\" +match data block. +.\" +The function \fBpcre2_get_ovector_pointer()\fP returns the address of the +ovector, and \fBpcre2_get_ovector_count()\fP returns the number of pairs of +values it contains.
+.P +Within the ovector, the first in each pair of values is set to the offset of +the first code unit of a substring, and the second is set to the offset of the +first code unit after the end of a substring. These values are always code unit +offsets, not character offsets. That is, they are byte offsets in the 8-bit +library, 16-bit offsets in the 16-bit library, and 32-bit offsets in the 32-bit +library. +.P +After a partial match (error return PCRE2_ERROR_PARTIAL), only the first pair +of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They +identify the part of the subject that was partially matched. See the +.\" HREF +\fBpcre2partial\fP +.\" +documentation for details of partial matching. +.P +After a fully successful match, the first pair of offsets identifies the +portion of the subject string that was matched by the entire pattern. The next +pair is used for the first captured substring, and so on. The value returned by +\fBpcre2_match()\fP is one more than the highest numbered pair that has been +set. For example, if two substrings have been captured, the returned value is +3. If there are no captured substrings, the return value from a successful +match is 1, indicating that just the first pair of offsets has been set. +.P +If a pattern uses the \eK escape sequence within a positive assertion, the +reported start of a successful match can be greater than the end of the match. +For example, if the pattern (?=ab\eK) is matched against "ab", the start and +end offset values for the match are 2 and 0. +.P +If a capture group is matched repeatedly within a single match operation, it is +the last portion of the subject that it matched that is returned. +.P +If the ovector is too small to hold all the captured substring offsets, as much +as possible is filled in, and the function returns a value of zero. 
If captured +substrings are not of interest, \fBpcre2_match()\fP may be called with a match +data block whose ovector is of minimum length (that is, one pair). +.P +It is possible for capture group number \fIn+1\fP to match some part of the +subject when group \fIn\fP has not been used at all. For example, if the string +"abc" is matched against the pattern (a|(z))(bc) the return from the function +is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both +values in the offset pairs corresponding to unused groups are set to +PCRE2_UNSET. +.P +Offset values that correspond to unused groups at the end of the expression are +also set to PCRE2_UNSET. For example, if the string "abc" is matched against +the pattern (abc)(x(yz)?)? groups 2 and 3 are not matched. The return from the +function is 2, because the highest used capture group number is 1. The offsets +for the second and third capture groups (assuming the vector is large +enough, of course) are set to PCRE2_UNSET. +.P +Elements in the ovector that do not correspond to capturing parentheses in the +pattern are never changed. That is, if a pattern contains \fIn\fP capturing +parentheses, no more than \fIovector[0]\fP to \fIovector[2n+1]\fP are set by +\fBpcre2_match()\fP. The other elements retain whatever values they previously +had. After a failed match attempt, the contents of the ovector are unchanged. +. +. +.\" HTML +.SH "OTHER INFORMATION ABOUT A MATCH" +.rs +.sp +.nf +.B PCRE2_SPTR pcre2_get_mark(pcre2_match_data *\fImatch_data\fP); +.sp +.B PCRE2_SIZE pcre2_get_startchar(pcre2_match_data *\fImatch_data\fP); +.fi +.P +As well as the offsets in the ovector, other information about a match is +retained in the match data block and can be retrieved by the above functions in +appropriate circumstances. If they are called at other times, the result is +undefined.
+.P +After a successful match, a partial match (PCRE2_ERROR_PARTIAL), or a failure +to match (PCRE2_ERROR_NOMATCH), a mark name may be available. The function +\fBpcre2_get_mark()\fP can be called to access this name, which can be +specified in the pattern by any of the backtracking control verbs, not just +(*MARK). The same function applies to all the verbs. It returns a pointer to +the zero-terminated name, which is within the compiled pattern. If no name is +available, NULL is returned. The length of the name (excluding the terminating +zero) is stored in the code unit that precedes the name. You should use this +length instead of relying on the terminating zero if the name might contain a +binary zero. +.P +After a successful match, the name that is returned is the last mark name +encountered on the matching path through the pattern. Instances of backtracking +verbs without names do not count. Thus, for example, if the matching path +contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a +partial match, the last encountered name is returned. For example, consider +this pattern: +.sp + ^(*MARK:A)((*MARK:B)a|b)c +.sp +When it matches "bc", the returned name is A. The B mark is "seen" in the first +branch of the group, but it is not on the matching path. On the other hand, +when this pattern fails to match "bx", the returned name is B. +.P +\fBWarning:\fP By default, certain start-of-match optimizations are used to +give a fast "no match" result in some situations. For example, if the anchoring +is removed from the pattern above, there is an initial check for the presence +of "c" in the subject before running the matching engine. This check fails for +"bx", causing a match failure without seeing any marks. You can disable the +start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option for +\fBpcre2_compile()\fP or by starting the pattern with (*NO_START_OPT). 
+.P +After a successful match, a partial match, or one of the invalid UTF errors +(for example, PCRE2_ERROR_UTF8_ERR5), \fBpcre2_get_startchar()\fP can be +called. After a successful or partial match it returns the code unit offset of +the character at which the match started. For a non-partial match, this can be +different to the value of \fIovector[0]\fP if the pattern contains the \eK +escape sequence. After a partial match, however, this value is always the same +as \fIovector[0]\fP because \eK does not affect the result of a partial match. +.P +After a UTF check failure, \fBpcre2_get_startchar()\fP can be used to obtain +the code unit offset of the invalid UTF character. Details are given in the +.\" HREF +\fBpcre2unicode\fP +.\" +page. +. +. +.\" HTML +.SH "ERROR RETURNS FROM \fBpcre2_match()\fP" +.rs +.sp +If \fBpcre2_match()\fP fails, it returns a negative number. This can be +converted to a text string by calling the \fBpcre2_get_error_message()\fP +function (see "Obtaining a textual error message" +.\" HTML +.\" +below). +.\" +Negative error codes are also returned by other functions, and are documented +with them. The codes are given names in the header file. If UTF checking is in +force and an invalid UTF subject string is detected, one of a number of +UTF-specific negative error codes is returned. Details are given in the +.\" HREF +\fBpcre2unicode\fP +.\" +page. The following are the other errors that may be returned by +\fBpcre2_match()\fP: +.sp + PCRE2_ERROR_NOMATCH +.sp +The subject string did not match the pattern. +.sp + PCRE2_ERROR_PARTIAL +.sp +The subject string did not match, but it did match partially. See the +.\" HREF +\fBpcre2partial\fP +.\" +documentation for details of partial matching. +.sp + PCRE2_ERROR_BADMAGIC +.sp +PCRE2 stores a 4-byte "magic number" at the start of the compiled code, to +catch the case when it is passed a junk pointer. This is the error that is +returned when the magic number is not present. 
+.sp + PCRE2_ERROR_BADMODE +.sp +This error is given when a compiled pattern is passed to a function in a +library of a different code unit width, for example, a pattern compiled by +the 8-bit library is passed to a 16-bit or 32-bit library function. +.sp + PCRE2_ERROR_BADOFFSET +.sp +The value of \fIstartoffset\fP was greater than the length of the subject. +.sp + PCRE2_ERROR_BADOPTION +.sp +An unrecognized bit was set in the \fIoptions\fP argument. +.sp + PCRE2_ERROR_BADUTFOFFSET +.sp +The UTF code unit sequence that was passed as a subject was checked and found +to be valid (the PCRE2_NO_UTF_CHECK option was not set), but the value of +\fIstartoffset\fP did not point to the beginning of a UTF character or the end +of the subject. +.sp + PCRE2_ERROR_CALLOUT +.sp +This error is never generated by \fBpcre2_match()\fP itself. It is provided for +use by callout functions that want to cause \fBpcre2_match()\fP or +\fBpcre2_callout_enumerate()\fP to return a distinctive error code. See the +.\" HREF +\fBpcre2callout\fP +.\" +documentation for details. +.sp + PCRE2_ERROR_DEPTHLIMIT +.sp +The nested backtracking depth limit was reached. +.sp + PCRE2_ERROR_HEAPLIMIT +.sp +The heap limit was reached. +.sp + PCRE2_ERROR_INTERNAL +.sp +An unexpected internal error has occurred. This error could be caused by a bug +in PCRE2 or by overwriting of the compiled pattern. +.sp + PCRE2_ERROR_JIT_STACKLIMIT +.sp +This error is returned when a pattern that was successfully studied using JIT +is being matched, but the memory available for the just-in-time processing +stack is not large enough. See the +.\" HREF +\fBpcre2jit\fP +.\" +documentation for more details. +.sp + PCRE2_ERROR_MATCHLIMIT +.sp +The backtracking match limit was reached. +.sp + PCRE2_ERROR_NOMEMORY +.sp +If a pattern contains many nested backtracking points, heap memory is used to +remember them. This error is given when the memory allocation function (default +or custom) fails. 
Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given +if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is +also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails. +.sp + PCRE2_ERROR_NULL +.sp +Either the \fIcode\fP, \fIsubject\fP, or \fImatch_data\fP argument was passed +as NULL. +.sp + PCRE2_ERROR_RECURSELOOP +.sp +This error is returned when \fBpcre2_match()\fP detects a recursion loop within +the pattern. Specifically, it means that either the whole pattern or a +capture group has been called recursively for the second time at the same +position in the subject string. Some simple patterns that might do this are +detected and faulted at compile time, but more complicated cases, in particular +mutual recursions between two different groups, cannot be detected until +matching is attempted. +. +. +.\" HTML +.SH "OBTAINING A TEXTUAL ERROR MESSAGE" +.rs +.sp +.nf +.B int pcre2_get_error_message(int \fIerrorcode\fP, PCRE2_UCHAR *\fIbuffer\fP, +.B " PCRE2_SIZE \fIbufflen\fP);" +.fi +.P +A text message for an error code from any PCRE2 function (compile, match, or +auxiliary) can be obtained by calling \fBpcre2_get_error_message()\fP. The code +is passed as the first argument, with the remaining two arguments specifying a +code unit buffer and its length in code units, into which the text message is +placed. The message is returned in code units of the appropriate width for the +library that is being used. +.P +The returned message is terminated with a trailing zero, and the function +returns the number of code units used, excluding the trailing zero. If the +error number is unknown, the negative error code PCRE2_ERROR_BADDATA is +returned. If the buffer is too small, the message is truncated (but still with +a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned. +None of the messages are very long; a buffer size of 120 code units is ample. +. +. 
+.\" HTML +.SH "EXTRACTING CAPTURED SUBSTRINGS BY NUMBER" +.rs +.sp +.nf +.B int pcre2_substring_length_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_SIZE *\fIlength\fP);" +.sp +.B int pcre2_substring_copy_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_UCHAR *\fIbuffer\fP," +.B " PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B int pcre2_substring_get_bynumber(pcre2_match_data *\fImatch_data\fP, +.B " uint32_t \fInumber\fP, PCRE2_UCHAR **\fIbufferptr\fP," +.B " PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B void pcre2_substring_free(PCRE2_UCHAR *\fIbuffer\fP); +.fi +.P +Captured substrings can be accessed directly by using the ovector as described +.\" HTML +.\" +above. +.\" +For convenience, auxiliary functions are provided for extracting captured +substrings as new, separate, zero-terminated strings. A substring that contains +a binary zero is correctly extracted and has a further zero added on the end, +but the result is not, of course, a C string. +.P +The functions in this section identify substrings by number. The number zero +refers to the entire matched substring, with higher numbers referring to +substrings captured by parenthesized groups. After a partial match, only +substring zero is available. An attempt to extract any other substring gives +the error PCRE2_ERROR_PARTIAL. The next section describes similar functions for +extracting captured substrings by name. +.P +If a pattern uses the \eK escape sequence within a positive assertion, the +reported start of a successful match can be greater than the end of the match. +For example, if the pattern (?=ab\eK) is matched against "ab", the start and +end offset values for the match are 2 and 0. In this situation, calling these +functions with a zero substring number extracts a zero-length empty string. +.P +You can find the length in code units of a captured substring without +extracting it by calling \fBpcre2_substring_length_bynumber()\fP. 
The first +argument is a pointer to the match data block, the second is the group number, +and the third is a pointer to a variable into which the length is placed. If +you just want to know whether or not the substring has been captured, you can +pass the third argument as NULL. +.P +The \fBpcre2_substring_copy_bynumber()\fP function copies a captured substring +into a supplied buffer, whereas \fBpcre2_substring_get_bynumber()\fP copies it +into new memory, obtained using the same memory allocation function that was +used for the match data block. The first two arguments of these functions are a +pointer to the match data block and a capture group number. +.P +The final arguments of \fBpcre2_substring_copy_bynumber()\fP are a pointer to +the buffer and a pointer to a variable that contains its length in code units. +This is updated to contain the actual number of code units used for the +extracted substring, excluding the terminating zero. +.P +For \fBpcre2_substring_get_bynumber()\fP the third and fourth arguments point +to variables that are updated with a pointer to the new memory and the number +of code units that comprise the substring, again excluding the terminating +zero. When the substring is no longer needed, the memory should be freed by +calling \fBpcre2_substring_free()\fP. +.P +The return value from all these functions is zero for success, or a negative +error code. If the pattern match failed, the match failure code is returned. +If a substring number greater than zero is used after a partial match, +PCRE2_ERROR_PARTIAL is returned. Other possible error codes are: +.sp + PCRE2_ERROR_NOMEMORY +.sp +The buffer was too small for \fBpcre2_substring_copy_bynumber()\fP, or the +attempt to get memory failed for \fBpcre2_substring_get_bynumber()\fP. +.sp + PCRE2_ERROR_NOSUBSTRING +.sp +There is no substring with that number in the pattern, that is, the number is +greater than the number of capturing parentheses. 
+.sp + PCRE2_ERROR_UNAVAILABLE +.sp +The substring number, though not greater than the number of captures in the +pattern, is greater than the number of slots in the ovector, so the substring +could not be captured. +.sp + PCRE2_ERROR_UNSET +.sp +The substring did not participate in the match. For example, if the pattern is +(abc)|(def) and the subject is "def", and the ovector contains at least two +capturing slots, substring number 1 is unset. +. +. +.SH "EXTRACTING A LIST OF ALL CAPTURED SUBSTRINGS" +.rs +.sp +.nf +.B int pcre2_substring_list_get(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_UCHAR ***\fIlistptr\fP, PCRE2_SIZE **\fIlengthsptr\fP);" +.sp +.B void pcre2_substring_list_free(PCRE2_SPTR *\fIlist\fP); +.fi +.P +The \fBpcre2_substring_list_get()\fP function extracts all available substrings +and builds a list of pointers to them. It also (optionally) builds a second +list that contains their lengths (in code units), excluding a terminating zero +that is added to each of them. All this is done in a single block of memory +that is obtained using the same memory allocation function that was used to get +the match data block. +.P +This function must be called only after a successful match. If called after a +partial match, the error code PCRE2_ERROR_PARTIAL is returned. +.P +The address of the memory block is returned via \fIlistptr\fP, which is also +the start of the list of string pointers. The end of the list is marked by a +NULL pointer. The address of the list of lengths is returned via +\fIlengthsptr\fP. If your strings do not contain binary zeros and you do not +therefore need the lengths, you may supply NULL as the \fIlengthsptr\fP +argument to disable the creation of a list of lengths. The yield of the +function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block +could not be obtained. When the list is no longer needed, it should be freed by +calling \fBpcre2_substring_list_free()\fP.
+.P +If this function encounters a substring that is unset, which can happen when +capture group number \fIn+1\fP matches some part of the subject, but group +\fIn\fP has not been used at all, it returns an empty string. This can be +distinguished from a genuine zero-length substring by inspecting the +appropriate offset in the ovector, which contains PCRE2_UNSET for unset +substrings, or by calling \fBpcre2_substring_length_bynumber()\fP. +. +. +.\" HTML +.SH "EXTRACTING CAPTURED SUBSTRINGS BY NAME" +.rs +.sp +.nf +.B int pcre2_substring_number_from_name(const pcre2_code *\fIcode\fP, +.B " PCRE2_SPTR \fIname\fP);" +.sp +.B int pcre2_substring_length_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_SIZE *\fIlength\fP);" +.sp +.B int pcre2_substring_copy_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_UCHAR *\fIbuffer\fP, PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B int pcre2_substring_get_byname(pcre2_match_data *\fImatch_data\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_UCHAR **\fIbufferptr\fP, PCRE2_SIZE *\fIbufflen\fP);" +.sp +.B void pcre2_substring_free(PCRE2_UCHAR *\fIbuffer\fP); +.fi +.P +To extract a substring by name, you first have to find the associated number. +For example, for this pattern: +.sp + (a+)b(?<xxx>\ed+)... +.sp +the number of the capture group called "xxx" is 2. If the name is known to be +unique (PCRE2_DUPNAMES was not set), you can find the number from the name by +calling \fBpcre2_substring_number_from_name()\fP. The first argument is the +compiled pattern, and the second is the name. The yield of the function is the +group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or +PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name. +Given the number, you can extract the substring directly from the ovector, or +use one of the "bynumber" functions described above.
+.P +For convenience, there are also "byname" functions that correspond to the +"bynumber" functions, the only difference being that the second argument is a +name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate +names, these functions scan all the groups with the given name, and return the +captured substring from the first named group that is set. +.P +If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is +returned. If all groups with the name have numbers that are greater than the +number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there +is at least one group with a slot in the ovector, but no group is found to be +set, PCRE2_ERROR_UNSET is returned. +.P +\fBWarning:\fP If the pattern uses the (?| feature to set up multiple +capture groups with the same number, as described in the +.\" HTML +.\" +section on duplicate group numbers +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +page, you cannot use names to distinguish the different capture groups, because +names are not included in the compiled code. The matching process uses only +numbers. For this reason, the use of different names for groups with the +same number causes an error at compile time. +. +. +.\" HTML +.SH "CREATING A NEW STRING WITH SUBSTITUTIONS" +.rs +.sp +.nf +.B int pcre2_substitute(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP, PCRE2_SPTR \fIreplacement\fP," +.B " PCRE2_SIZE \fIrlength\fP, PCRE2_UCHAR *\fIoutputbuffer\fP," +.B " PCRE2_SIZE *\fIoutlengthptr\fP);" +.fi +.P +This function optionally calls \fBpcre2_match()\fP and then makes a copy of the +subject string in \fIoutputbuffer\fP, replacing parts that were matched with +the \fIreplacement\fP string, whose length is supplied in \fIrlength\fP.
This +can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an +option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just the +replacement string(s). The default action is to perform just one replacement if +the pattern matches, but there is an option that requests multiple replacements +(see PCRE2_SUBSTITUTE_GLOBAL below). +.P +If successful, \fBpcre2_substitute()\fP returns the number of substitutions +that were carried out. This may be zero if no match was found, and is never +greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is +returned if an error is detected. +.P +Matches in which a \eK item in a lookahead in the pattern causes the match to +end before it starts are not supported, and give rise to an error return. For +global replacements, matches in which \eK in a lookbehind causes the match to +start earlier than the point that was reached in the previous iteration are +also not supported. +.P +The first seven arguments of \fBpcre2_substitute()\fP are the same as for +\fBpcre2_match()\fP, except that the partial matching options are not +permitted, and \fImatch_data\fP may be passed as NULL, in which case a match +data block is obtained and freed within this function, using memory management +functions from the match context, if provided, or else those that were used to +allocate memory for the compiled code. +.P +If \fImatch_data\fP is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the +provided block is used for all calls to \fBpcre2_match()\fP, and its contents +afterwards are the result of the final call. For global changes, this will +always be a no-match error. The contents of the ovector within the match data +block may or may not have been changed. +.P +As well as the usual options for \fBpcre2_match()\fP, a number of additional +options can be set in the \fIoptions\fP argument of \fBpcre2_substitute()\fP. +One such option is PCRE2_SUBSTITUTE_MATCHED. 
When this is set, an external +\fImatch_data\fP block must be provided, and it must have been used for an +external call to \fBpcre2_match()\fP. The data in the \fImatch_data\fP block +(return code, offset vector) is used for the first substitution instead of +calling \fBpcre2_match()\fP from within \fBpcre2_substitute()\fP. This allows +an application to check for a match before choosing to substitute, without +having to repeat the match. +.P +The contents of the externally supplied match data block are not changed when +PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set, +\fBpcre2_match()\fP is called after the first substitution to check for further +matches, but this is done using an internally obtained match data block, thus +always leaving the external block unchanged. +.P +The \fIcode\fP argument is not used for matching before the first substitution +when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when +PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the +UTF setting and the number of capturing parentheses in the pattern. +.P +The default action of \fBpcre2_substitute()\fP is to return a copy of the +subject string with matched substrings replaced. However, if +PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are +returned. In the global case, multiple replacements are concatenated in the +output buffer. Substitution callouts (see +.\" HTML +.\" +below) +.\" +can be used to separate them if necessary. +.P +The \fIoutlengthptr\fP argument of \fBpcre2_substitute()\fP must point to a +variable that contains the length, in code units, of the output buffer. If the +function is successful, the value is updated to contain the length in code +units of the new string, excluding the trailing zero that is automatically +added. +.P +If the function is not successful, the value set via \fIoutlengthptr\fP depends +on the type of error. 
For syntax errors in the replacement string, the value is +the offset in the replacement string where the error was detected. For other +errors, the value is PCRE2_UNSET by default. This includes the case of the +output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set. +.P +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is +too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If +this option is set, however, \fBpcre2_substitute()\fP continues to go through +the motions of matching and substituting (without, of course, writing anything) +in order to compute the size of buffer that is needed. This value is passed +back via the \fIoutlengthptr\fP variable, with the result of the function still +being PCRE2_ERROR_NOMEMORY. +.P +Passing a buffer size of zero is a permitted way of finding out how much memory +is needed for a given substitution. However, this does mean that the entire +operation is carried out twice. Depending on the application, it may be more +efficient to allocate a large buffer and free the excess afterwards, instead of +using PCRE2_SUBSTITUTE_OVERFLOW_LENGTH. +.P +The replacement string, which is interpreted as a UTF string in UTF mode, is +checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF +replacement string causes an immediate return with the relevant UTF error code. +.P +If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted +in any way. By default, however, a dollar character is an escape character that +can specify the insertion of characters from capture groups and names from +(*MARK) or other control verbs in the pattern. The following forms are always +recognized: +.sp + $$ insert a dollar character + $<n> or ${<n>} insert the contents of group <n> + $*MARK or ${*MARK} insert a control verb name +.sp +Either a group number or a group name can be given for <n>.
Curly brackets are +required only if the following character would be interpreted as part of the +number or name. The number may be zero to include the entire matched string. +For example, if the pattern a(b)c is matched with "=abc=" and the replacement +string "+$1$0$1+", the result is "=+babcb+=". +.P +$*MARK inserts the name from the last encountered backtracking control verb on +the matching path that has a name. (*MARK) must always include a name, but the +other verbs need not. For example, in the case of (*MARK:A)(*PRUNE) the name +inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This +facility can be used to perform simple simultaneous substitutions, as this +\fBpcre2test\fP example shows: +.sp + /(*MARK:pear)apple|(*MARK:orange)lemon/g,replace=${*MARK} + apple lemon + 2: pear orange +.sp +PCRE2_SUBSTITUTE_GLOBAL causes the function to iterate over the subject string, +replacing every matching substring. If this option is not set, only the first +matching substring is replaced. The search for matches takes place in the +original subject string (that is, previous replacements do not affect it). +Iteration is implemented by advancing the \fIstartoffset\fP value for each +search, which is always passed the entire subject string. If an offset limit is +set in the match context, searching stops when that limit is reached. +.P +You can restrict the effect of a global substitution to a portion of the +subject string by setting either or both of \fIstartoffset\fP and an offset +limit. Here is a \fBpcre2test\fP example: +.sp + /B/g,replace=!,use_offset_limit + ABC ABC ABC ABC\e=offset=3,offset_limit=12 + 2: ABC A!C A!C ABC +.sp +When continuing with global substitutions after matching a substring with zero +length, an attempt to find a non-empty match at the same offset is performed. +If this is not successful, the offset is advanced by one character except when +CRLF is a valid newline sequence and the next two characters are CR, LF. 
In +this case, the offset is advanced by two characters. +.P +PCRE2_SUBSTITUTE_UNKNOWN_UNSET causes references to capture groups that do +not appear in the pattern to be treated as unset groups. This option should be +used with care, because it means that a typo in a group name or number no +longer causes the PCRE2_ERROR_NOSUBSTRING error. +.P +PCRE2_SUBSTITUTE_UNSET_EMPTY causes unset capture groups (including unknown +groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty +strings when inserted as described above. If this option is not set, an attempt +to insert an unset group causes the PCRE2_ERROR_UNSET error. This option does +not influence the extended substitution syntax described below. +.P +PCRE2_SUBSTITUTE_EXTENDED causes extra processing to be applied to the +replacement string. Without this option, only the dollar character is special, +and only the group insertion forms listed above are valid. When +PCRE2_SUBSTITUTE_EXTENDED is set, two things change: +.P +Firstly, backslash in a replacement string is interpreted as an escape +character. The usual forms such as \en or \ex{ddd} can be used to specify +particular character codes, and backslash followed by any non-alphanumeric +character quotes that character. Extended quoting can be coded using \eQ...\eE, +exactly as in pattern strings. +.P +There are also four escape sequences for forcing the case of inserted letters. +The insertion mechanism has three states: no case forcing, force upper case, +and force lower case. The escape sequences change the current state: \eU and +\eL change to upper or lower case forcing, respectively, and \eE (when not +terminating a \eQ quoted sequence) reverts to no case forcing. The sequences +\eu and \el force the next character (if it is a letter) to upper or lower +case, respectively, and then the state automatically reverts to no case +forcing. 
Case forcing applies to all inserted characters, including those from +capture groups and letters within \eQ...\eE quoted sequences. If either +PCRE2_UTF or PCRE2_UCP was set when the pattern was compiled, Unicode +properties are used for case forcing characters whose code points are greater +than 127. +.P +Note that case forcing sequences such as \eU...\eE do not nest. For example, +the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no +effect. Note also that the PCRE2_ALT_BSUX and PCRE2_EXTRA_ALT_BSUX options do +not apply to replacement strings. +.P +The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more +flexibility to capture group substitution. The syntax is similar to that used +by Bash: +.sp + ${<n>:-<string>} + ${<n>:+<string1>:<string2>} +.sp +As before, <n> may be a group number or a name. The first form specifies a +default value. If group <n> is set, its value is inserted; if not, <string> is +expanded and the result inserted. The second form specifies strings that are +expanded and inserted when group <n> is set or unset, respectively. The first +form is just a convenient shorthand for +.sp + ${<n>:+${<n>}:<string>} +.sp +Backslash can be used to escape colons and closing curly brackets in the +replacement strings. A change of the case forcing state within a replacement +string remains in force afterwards, as shown in this \fBpcre2test\fP example: +.sp + /(some)?(body)/substitute_extended,replace=${1:+\eU:\eL}HeLLo + body + 1: hello + somebody + 1: HELLO +.sp +The PCRE2_SUBSTITUTE_UNSET_EMPTY option does not affect these extended +substitutions. However, PCRE2_SUBSTITUTE_UNKNOWN_UNSET does cause unknown +groups in the extended syntax forms to be treated as unset. +.P +If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET, +PCRE2_SUBSTITUTE_UNSET_EMPTY, and PCRE2_SUBSTITUTE_EXTENDED are irrelevant and +are ignored. +. +. +.SS "Substitution errors" +.rs +.sp +In the event of an error, \fBpcre2_substitute()\fP returns a negative error +code.
Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from +\fBpcre2_match()\fP are passed straight back. +.P +PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion, +unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set. +.P +PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an +unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple +(non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set. +.P +PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is +needed is returned via \fIoutlengthptr\fP. Note that this does not happen by +default. +.P +PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the +\fImatch_data\fP argument is NULL. +.P +PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the +replacement string, with more particular errors being PCRE2_ERROR_BADREPESCAPE +(invalid escape sequence), PCRE2_ERROR_REPMISSINGBRACE (closing curly bracket +not found), PCRE2_ERROR_BADSUBSTITUTION (syntax error in extended group +substitution), and PCRE2_ERROR_BADSUBSPATTERN (the pattern match ended before +it started or the match started earlier than the current position in the +subject, which can happen if \eK is used in an assertion). +.P +As for all PCRE2 errors, a text message that describes the error can be +obtained by calling the \fBpcre2_get_error_message()\fP function (see +"Obtaining a textual error message" +.\" HTML +.\" +above). +.\" +. +. +.\" HTML +.SS "Substitution callouts" +.rs +.sp +.nf +.B int pcre2_set_substitute_callout(pcre2_match_context *\fImcontext\fP, +.B " int (*\fIcallout_function\fP)(pcre2_substitute_callout_block *, void *)," +.B " void *\fIcallout_data\fP);" +.fi +.sp +The \fBpcre2_set_substitute_callout()\fP function can be used to specify a +callout function for \fBpcre2_substitute()\fP.
This information is passed in +a match context. The callout function is called after each substitution has +been processed, but it can cause the replacement not to happen. The callout +function is not called for simulated substitutions that happen as a result of +the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option. +.P +The first argument of the callout function is a pointer to a substitute callout +block structure, which contains the following fields, not necessarily in this +order: +.sp + uint32_t \fIversion\fP; + uint32_t \fIsubscount\fP; + PCRE2_SPTR \fIinput\fP; + PCRE2_SPTR \fIoutput\fP; + PCRE2_SIZE \fI*ovector\fP; + uint32_t \fIoveccount\fP; + PCRE2_SIZE \fIoutput_offsets[2]\fP; +.sp +The \fIversion\fP field contains the version number of the block format. The +current version is 0. The version number will increase in future if more fields +are added, but the intention is never to remove any of the existing fields. +.P +The \fIsubscount\fP field is the number of the current match. It is 1 for the +first callout, 2 for the second, and so on. The \fIinput\fP and \fIoutput\fP +pointers are copies of the values passed to \fBpcre2_substitute()\fP. +.P +The \fIovector\fP field points to the ovector, which contains the result of the +most recent match. The \fIoveccount\fP field contains the number of pairs that +are set in the ovector, and is always greater than zero. +.P +The \fIoutput_offsets\fP vector contains the offsets of the replacement in the +output string. This has already been processed for dollar and (if requested) +backslash substitutions as described above. +.P +The second argument of the callout function is the value passed as +\fIcallout_data\fP when the function was registered. The value returned by the +callout function is interpreted as follows: +.P +If the value is zero, the replacement is accepted, and, if +PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next +match. 
If the value is not zero, the current replacement is not accepted. If +the value is greater than zero, processing continues when +PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or +PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the +output and the call to \fBpcre2_substitute()\fP exits, returning the number of +matches so far. +. +. +.SH "DUPLICATE CAPTURE GROUP NAMES" +.rs +.sp +.nf +.B int pcre2_substring_nametable_scan(const pcre2_code *\fIcode\fP, +.B " PCRE2_SPTR \fIname\fP, PCRE2_SPTR *\fIfirst\fP, PCRE2_SPTR *\fIlast\fP);" +.fi +.P +When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture +groups are not required to be unique. Duplicate names are always allowed for +groups with the same number, created by using the (?| feature. Indeed, if such +groups are named, they are required to use the same names. +.P +Normally, patterns that use duplicate names are such that in any one match, +only one of each set of identically-named groups participates. An example is +shown in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. +.P +When duplicates are present, \fBpcre2_substring_copy_byname()\fP and +\fBpcre2_substring_get_byname()\fP return the first substring corresponding to +the given name that is set. Only if none are set is PCRE2_ERROR_UNSET +returned. The \fBpcre2_substring_number_from_name()\fP function returns the +error PCRE2_ERROR_NOUNIQUESUBSTRING when there are duplicate names. +.P +If you want to get full details of all captured substrings for a given name, +you must use the \fBpcre2_substring_nametable_scan()\fP function. The first +argument is the compiled pattern, and the second is the name. If the third and +fourth arguments are NULL, the function returns a group number for a unique +name, or PCRE2_ERROR_NOUNIQUESUBSTRING otherwise. +.P +When the third and fourth arguments are not NULL, they must be pointers to +variables that are updated by the function.
After it has run, they point to the +first and last entries in the name-to-number table for the given name, and the +function returns the length of each entry in code units. In both cases, +PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name. +.P +The format of the name table is described +.\" HTML +.\" +above +.\" +in the section entitled \fIInformation about a pattern\fP. Given all the +relevant entries for the name, you can extract each of their numbers, and hence +the captured data. +. +. +.SH "FINDING ALL POSSIBLE MATCHES AT ONE POSITION" +.rs +.sp +The traditional matching function uses a similar algorithm to Perl, which stops +when it finds the first match at a given point in the subject. If you want to +find all possible matches, or the longest possible match at a given position, +consider using the alternative matching function (see below) instead. If you +cannot use the alternative function, you can kludge it up by making use of the +callout facility, which is described in the +.\" HREF +\fBpcre2callout\fP +.\" +documentation. +.P +What you have to do is to insert a callout right at the end of the pattern. +When your callout function is called, extract and save the current matched +substring. Then return 1, which forces \fBpcre2_match()\fP to backtrack and try +other alternatives. Ultimately, when it runs out of matches, +\fBpcre2_match()\fP will yield PCRE2_ERROR_NOMATCH. +. +. 
+.\" HTML +.SH "MATCHING A PATTERN: THE ALTERNATIVE FUNCTION" +.rs +.sp +.nf +.B int pcre2_dfa_match(const pcre2_code *\fIcode\fP, PCRE2_SPTR \fIsubject\fP, +.B " PCRE2_SIZE \fIlength\fP, PCRE2_SIZE \fIstartoffset\fP," +.B " uint32_t \fIoptions\fP, pcre2_match_data *\fImatch_data\fP," +.B " pcre2_match_context *\fImcontext\fP," +.B " int *\fIworkspace\fP, PCRE2_SIZE \fIwscount\fP);" +.fi +.P +The function \fBpcre2_dfa_match()\fP is called to match a subject string +against a compiled pattern, using a matching algorithm that scans the subject +string just once (not counting lookaround assertions), and does not backtrack. +This has different characteristics to the normal algorithm, and is not +compatible with Perl. Some of the features of PCRE2 patterns are not supported. +Nevertheless, there are times when this kind of matching can be useful. For a +discussion of the two matching algorithms, and a list of features that +\fBpcre2_dfa_match()\fP does not support, see the +.\" HREF +\fBpcre2matching\fP +.\" +documentation. +.P +The arguments for the \fBpcre2_dfa_match()\fP function are the same as for +\fBpcre2_match()\fP, plus two extras. The ovector within the match data block +is used in a different way, and this is described below. The other common +arguments are used in the same way as for \fBpcre2_match()\fP, so their +description is not repeated here. +.P +The two additional arguments provide workspace for the function. The workspace +vector should contain at least 20 elements. It is used for keeping track of +multiple paths through the pattern tree. More workspace is needed for patterns +and subjects where there are a lot of potential matches. 
+.P +Here is an example of a simple call to \fBpcre2_dfa_match()\fP: +.sp + int wspace[20]; + pcre2_match_data *md = pcre2_match_data_create(4, NULL); + int rc = pcre2_dfa_match( + re, /* result of pcre2_compile() */ + "some string", /* the subject string */ + 11, /* the length of the subject string */ + 0, /* start at offset 0 in the subject */ + 0, /* default options */ + md, /* the match data block */ + NULL, /* a match context; NULL means use defaults */ + wspace, /* working space vector */ + 20); /* number of elements (NOT size in bytes) */ +. +.SS "Option bits for \fBpcre2_dfa_match()\fP" +.rs +.sp +The unused bits of the \fIoptions\fP argument for \fBpcre2_dfa_match()\fP must +be zero. The only bits that may be set are PCRE2_ANCHORED, +PCRE2_COPY_MATCHED_SUBJECT, PCRE2_ENDANCHORED, PCRE2_NOTBOL, PCRE2_NOTEOL, +PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, +PCRE2_PARTIAL_SOFT, PCRE2_DFA_SHORTEST, and PCRE2_DFA_RESTART. All but the last +four of these are exactly the same as for \fBpcre2_match()\fP, so their +description is not repeated here. +.sp + PCRE2_PARTIAL_HARD + PCRE2_PARTIAL_SOFT +.sp +These have the same general effect as they do for \fBpcre2_match()\fP, but the +details are slightly different. When PCRE2_PARTIAL_HARD is set for +\fBpcre2_dfa_match()\fP, it returns PCRE2_ERROR_PARTIAL if the end of the +subject is reached and there is still at least one matching possibility that +requires additional characters. This happens even if some complete matches have +already been found. When PCRE2_PARTIAL_SOFT is set, the return code +PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the +subject is reached, there have been no complete matches, but there is still at +least one matching possibility. The portion of the string that was inspected +when the longest partial match was found is set as the first matching string in +both cases.
There is a more detailed discussion of partial and multi-segment +matching, with examples, in the +.\" HREF +\fBpcre2partial\fP +.\" +documentation. +.sp + PCRE2_DFA_SHORTEST +.sp +Setting the PCRE2_DFA_SHORTEST option causes the matching algorithm to stop as +soon as it has found one match. Because of the way the alternative algorithm +works, this is necessarily the shortest possible match at the first possible +matching point in the subject string. +.sp + PCRE2_DFA_RESTART +.sp +When \fBpcre2_dfa_match()\fP returns a partial match, it is possible to call it +again, with additional subject characters, and have it continue with the same +match. The PCRE2_DFA_RESTART option requests this action; when it is set, the +\fIworkspace\fP and \fIwscount\fP arguments must reference the same vector as +before because data about the match so far is left in them after a partial +match. There is more discussion of this facility in the +.\" HREF +\fBpcre2partial\fP +.\" +documentation. +. +. +.SS "Successful returns from \fBpcre2_dfa_match()\fP" +.rs +.sp +When \fBpcre2_dfa_match()\fP succeeds, it may have matched more than one +substring in the subject. Note, however, that all the matches from one run of +the function start at the same point in the subject. The shorter matches are +all initial substrings of the longer matches. For example, if the pattern +.sp + <.*> +.sp +is matched against the string +.sp + This is <something> <something else> <something further> no more +.sp +the three matched strings are +.sp + <something> <something else> <something further> + <something> <something else> + <something> +.sp +On success, the yield of the function is a number greater than zero, which is +the number of matched substrings. The offsets of the substrings are returned in +the ovector, and can be extracted by number in the same way as for +\fBpcre2_match()\fP, but the numbers bear no relation to any capture groups +that may exist in the pattern, because DFA matching does not support capturing.
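To make the layout of the ovector after a successful DFA match more concrete, here is a small C sketch that walks the offset pairs and checks the two properties stated above: every match starts at the same offset, and each pair gives one match. It does not call PCRE2 and uses plain size_t offsets in place of PCRE2_SIZE; sketch_dfa_lengths() is a hypothetical helper, and it additionally assumes that the matches are stored longest first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2: given the ovector filled in by
   a successful DFA match and its positive return value rc (the number of
   matched substrings), store each match length in lengths[0..rc-1] and
   return the start offset shared by all the matches. */
size_t sketch_dfa_lengths(const size_t *ovector, int rc, size_t *lengths)
{
    assert(rc > 0);                 /* zero and negative yields are not handled */

    size_t start = ovector[0];
    for (int i = 0; i < rc; i++) {
        /* All matches from one run start at the same point. */
        assert(ovector[2 * i] == start);

        lengths[i] = ovector[2 * i + 1] - start;

        /* Assumption: matches are stored in decreasing order of length. */
        assert(i == 0 || lengths[i] <= lengths[i - 1]);
    }
    return start;
}
```

For a subject such as "This is <something> <something else> <something further> no more" matched against <.*>, the ovector would hold the pairs (8,56), (8,36), (8,19), so the helper returns 8 and reports lengths 48, 28 and 11.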
+.P +Calls to the convenience functions that extract substrings by name +return the error PCRE2_ERROR_DFA_UFUNC (unsupported function) if used after a +DFA match. The convenience functions that extract substrings by number never +return PCRE2_ERROR_NOSUBSTRING. +.P +The matched strings are stored in the ovector in reverse order of length; that +is, the longest matching string is first. If there were too many matches to fit +into the ovector, the yield of the function is zero, and the vector is filled +with the longest matches. +.P +NOTE: PCRE2's "auto-possessification" optimization usually applies to character +repeats at the end of a pattern (as well as internally). For example, the +pattern "a\ed+" is compiled as if it were "a\ed++". For DFA matching, this +means that only one possible match is found. If you really do want multiple +matches in such cases, either use an ungreedy repeat such as "a\ed+?" or set +the PCRE2_NO_AUTO_POSSESS option when compiling. +. +. +.SS "Error returns from \fBpcre2_dfa_match()\fP" +.rs +.sp +The \fBpcre2_dfa_match()\fP function returns a negative number when it fails. +Many of the errors are the same as for \fBpcre2_match()\fP, as described +.\" HTML +.\" +above. +.\" +There are in addition the following errors that are specific to +\fBpcre2_dfa_match()\fP: +.sp + PCRE2_ERROR_DFA_UITEM +.sp +This return is given if \fBpcre2_dfa_match()\fP encounters an item in the +pattern that it does not support, for instance, the use of \eC in a UTF mode or +a backreference. +.sp + PCRE2_ERROR_DFA_UCOND +.sp +This return is given if \fBpcre2_dfa_match()\fP encounters a condition item +that uses a backreference for the condition, or a test for recursion in a +specific capture group. These are not supported. +.sp + PCRE2_ERROR_DFA_UINVALID_UTF +.sp +This return is given if \fBpcre2_dfa_match()\fP is called for a pattern that +was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for DFA +matching. 
+.sp + PCRE2_ERROR_DFA_WSSIZE +.sp +This return is given if \fBpcre2_dfa_match()\fP runs out of space in the +\fIworkspace\fP vector. +.sp + PCRE2_ERROR_DFA_RECURSE +.sp +When a recursion or subroutine call is processed, the matching function calls +itself recursively, using private memory for the ovector and \fIworkspace\fP. +This error is given if the internal ovector is not large enough. This should be +extremely rare, as a vector of size 1000 is used. +.sp + PCRE2_ERROR_DFA_BADRESTART +.sp +When \fBpcre2_dfa_match()\fP is called with the \fBPCRE2_DFA_RESTART\fP option, +some plausibility checks are made on the contents of the workspace, which +should contain data about the previous partial match. If any of these checks +fail, this error is given. +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2build\fP(3), \fBpcre2callout\fP(3), \fBpcre2demo\fP(3), +\fBpcre2matching\fP(3), \fBpcre2partial\fP(3), \fBpcre2posix\fP(3), +\fBpcre2sample\fP(3), \fBpcre2unicode\fP(3). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 04 November 2020 +Copyright (c) 1997-2020 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2build.3 b/src/pcre2/doc/pcre2build.3 new file mode 100644 index 00000000..edea2223 --- /dev/null +++ b/src/pcre2/doc/pcre2build.3 @@ -0,0 +1,637 @@ +.TH PCRE2BUILD 3 "20 March 2020" "PCRE2 10.35" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +. +. +.SH "BUILDING PCRE2" +.rs +.sp +PCRE2 is distributed with a \fBconfigure\fP script that can be used to build +the library in Unix-like environments using the applications known as +Autotools. Also in the distribution are files to support building using +\fBCMake\fP instead of \fBconfigure\fP.
The text file +.\" HTML +.\" +\fBREADME\fP +.\" +contains general information about building with Autotools (some of which is +repeated below), and also has some comments about building on various operating +systems. There is a lot more information about building PCRE2 without using +Autotools (including information about using \fBCMake\fP and building "by +hand") in the text file called +.\" HTML +.\" +\fBNON-AUTOTOOLS-BUILD\fP. +.\" +You should consult this file as well as the +.\" HTML +.\" +\fBREADME\fP +.\" +file if you are building in a non-Unix-like environment. +. +. +.SH "PCRE2 BUILD-TIME OPTIONS" +.rs +.sp +The rest of this document describes the optional features of PCRE2 that can be +selected when the library is compiled. It assumes use of the \fBconfigure\fP +script, where the optional features are selected or deselected by providing +options to \fBconfigure\fP before running the \fBmake\fP command. However, the +same options can be selected in both Unix-like and non-Unix-like environments +if you are using \fBCMake\fP instead of \fBconfigure\fP to build PCRE2. +.P +If you are not using Autotools or \fBCMake\fP, option selection can be done by +editing the \fBconfig.h\fP file, or by passing parameter settings to the +compiler, as described in +.\" HTML +.\" +\fBNON-AUTOTOOLS-BUILD\fP. +.\" +.P +The complete list of options for \fBconfigure\fP (which includes the standard +ones such as the selection of the installation directory) can be obtained by +running +.sp + ./configure --help +.sp +The following sections include descriptions of "on/off" options whose names +begin with --enable or --disable. Because of the way that \fBconfigure\fP +works, --enable and --disable always come in pairs, so the complementary option +always exists as well, but as it specifies the default, it is not described. +Options that specify values have names that start with --with. At the end of a +\fBconfigure\fP run, a summary of the configuration is output. +. +. 
+.SH "BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES" +.rs +.sp +By default, a library called \fBlibpcre2-8\fP is built, containing functions +that take string arguments contained in arrays of bytes, interpreted either as +single-byte characters, or UTF-8 strings. You can also build two other +libraries, called \fBlibpcre2-16\fP and \fBlibpcre2-32\fP, which process +strings that are contained in arrays of 16-bit and 32-bit code units, +respectively. These can be interpreted either as single-unit characters or +UTF-16/UTF-32 strings. To build these additional libraries, add one or both of +the following to the \fBconfigure\fP command: +.sp + --enable-pcre2-16 + --enable-pcre2-32 +.sp +If you do not want the 8-bit library, add +.sp + --disable-pcre2-8 +.sp +as well. At least one of the three libraries must be built. Note that the POSIX +wrapper is for the 8-bit library only, and that \fBpcre2grep\fP is an 8-bit +program. Neither of these are built if you select only the 16-bit or 32-bit +libraries. +. +. +.SH "BUILDING SHARED AND STATIC LIBRARIES" +.rs +.sp +The Autotools PCRE2 building process uses \fBlibtool\fP to build both shared +and static libraries by default. You can suppress an unwanted library by adding +one of +.sp + --disable-shared + --disable-static +.sp +to the \fBconfigure\fP command. +. +. +.SH "UNICODE AND UTF SUPPORT" +.rs +.sp +By default, PCRE2 is built with support for Unicode and UTF character strings. +To build it without Unicode support, add +.sp + --disable-unicode +.sp +to the \fBconfigure\fP command. This setting applies to all three libraries. It +is not possible to build one library with Unicode support and another without +in the same configuration. +.P +Of itself, Unicode support does not make PCRE2 treat strings as UTF-8, UTF-16 +or UTF-32. To do that, applications that use the library can set the PCRE2_UTF +option when they call \fBpcre2_compile()\fP to compile a pattern. 
+Alternatively, patterns may be started with (*UTF) unless the application has +locked this out by setting PCRE2_NEVER_UTF. +.P +UTF support allows the libraries to process character code points up to +0x10ffff in the strings that they handle. Unicode support also gives access to +the Unicode properties of characters, using pattern escapes such as \eP, \ep, +and \eX. Only the general category properties such as \fILu\fP and \fINd\fP are +supported. Details are given in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. +.P +Pattern escapes such as \ed and \ew do not by default make use of Unicode +properties. The application can request that they do by setting the PCRE2_UCP +option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also +request this by starting with (*UCP). +. +. +.SH "DISABLING THE USE OF \eC" +.rs +.sp +The \eC escape sequence, which matches a single code unit, even in a UTF mode, +can cause unpredictable behaviour because it may leave the current matching +point in the middle of a multi-code-unit character. The application can lock it +out by setting the PCRE2_NEVER_BACKSLASH_C option when calling +\fBpcre2_compile()\fP. There is also a build-time option +.sp + --enable-never-backslash-C +.sp +(note the upper case C) which locks out the use of \eC entirely. +. +. +.SH "JUST-IN-TIME COMPILER SUPPORT" +.rs +.sp +Just-in-time (JIT) compiler support is included in the build by specifying +.sp + --enable-jit +.sp +This support is available only for certain hardware architectures. If this +option is set for an unsupported architecture, a building error occurs. +If in doubt, use +.sp + --enable-jit=auto +.sp +which enables JIT only if the current hardware is supported. You can check +if JIT is enabled in the configuration summary that is output at the end of a +\fBconfigure\fP run. 
If you are enabling JIT under SELinux you may also want to +add +.sp + --enable-jit-sealloc +.sp +which enables the use of an execmem allocator in JIT that is compatible with +SELinux. This has no effect if JIT is not enabled. See the +.\" HREF +\fBpcre2jit\fP +.\" +documentation for a discussion of JIT usage. When JIT support is enabled, +\fBpcre2grep\fP automatically makes use of it, unless you add +.sp + --disable-pcre2grep-jit +.sp +to the \fBconfigure\fP command. +. +. +.SH "NEWLINE RECOGNITION" +.rs +.sp +By default, PCRE2 interprets the linefeed (LF) character as indicating the end +of a line. This is the normal newline character on Unix-like systems. You can +compile PCRE2 to use carriage return (CR) instead, by adding +.sp + --enable-newline-is-cr +.sp +to the \fBconfigure\fP command. There is also an --enable-newline-is-lf option, +which explicitly specifies linefeed as the newline character. +.P +Alternatively, you can specify that line endings are to be indicated by the +two-character sequence CRLF (CR immediately followed by LF). If you want this, +add +.sp + --enable-newline-is-crlf +.sp +to the \fBconfigure\fP command. There is a fourth option, specified by +.sp + --enable-newline-is-anycrlf +.sp +which causes PCRE2 to recognize any of the three sequences CR, LF, or CRLF as +indicating a line ending. A fifth option, specified by +.sp + --enable-newline-is-any +.sp +causes PCRE2 to recognize any Unicode newline sequence. The Unicode newline +sequences are the three just mentioned, plus the single characters VT (vertical +tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line +separator, U+2028), and PS (paragraph separator, U+2029). The final option is +.sp + --enable-newline-is-nul +.sp +which causes NUL (binary zero) to be set as the default line-ending character. +.P +Whatever default line ending convention is selected when PCRE2 is built can be +overridden by applications that use the library. 
At build time it is +recommended to use the standard for your operating system. +. +. +.SH "WHAT \eR MATCHES" +.rs +.sp +By default, the sequence \eR in a pattern matches any Unicode newline sequence, +independently of what has been selected as the line ending sequence. If you +specify +.sp + --enable-bsr-anycrlf +.sp +the default is changed so that \eR matches only CR, LF, or CRLF. Whatever is +selected when PCRE2 is built can be overridden by applications that use the +library. +. +. +.SH "HANDLING VERY LARGE PATTERNS" +.rs +.sp +Within a compiled pattern, offset values are used to point from one part to +another (for example, from an opening parenthesis to an alternation +metacharacter). By default, in the 8-bit and 16-bit libraries, two-byte values +are used for these offsets, leading to a maximum size for a compiled pattern of +around 64 thousand code units. This is sufficient to handle all but the most +gigantic patterns. Nevertheless, some people do want to process truly enormous +patterns, so it is possible to compile PCRE2 to use three-byte or four-byte +offsets by adding a setting such as +.sp + --with-link-size=3 +.sp +to the \fBconfigure\fP command. The value given must be 2, 3, or 4. For the +16-bit library, a value of 3 is rounded up to 4. In these libraries, using +longer offsets slows down the operation of PCRE2 because it has to load +additional data when handling them. For the 32-bit library the value is always +4 and cannot be overridden; the value of --with-link-size is ignored. +. +. +.SH "LIMITING PCRE2 RESOURCE USAGE" +.rs +.sp +The \fBpcre2_match()\fP function increments a counter each time it goes round +its main loop. Putting a limit on this counter controls the amount of computing +resource used by a single call to \fBpcre2_match()\fP. The limit can be changed +at run time, as described in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. 
The default is 10 million, but this can be changed by adding a +setting such as +.sp + --with-match-limit=500000 +.sp +to the \fBconfigure\fP command. This setting also applies to the +\fBpcre2_dfa_match()\fP matching function, and to JIT matching (though the +counting is done differently). +.P +The \fBpcre2_match()\fP function starts out using a 20KiB vector on the system +stack to record backtracking points. The more nested backtracking points there +are (that is, the deeper the search tree), the more memory is needed. If the +initial vector is not large enough, heap memory is used, up to a certain limit, +which is specified in kibibytes (units of 1024 bytes). The limit can be changed +at run time, as described in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. The default limit (in effect unlimited) is 20 million. You can +change this by a setting such as +.sp + --with-heap-limit=500 +.sp +which limits the amount of heap to 500 KiB. This limit applies only to +interpretive matching in \fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP, which +may also use the heap for internal workspace when processing complicated +patterns. This limit does not apply when JIT (which has its own memory +arrangements) is used. +.P +You can also explicitly limit the depth of nested backtracking in the +\fBpcre2_match()\fP interpreter. This limit defaults to the value that is set +for --with-match-limit. You can set a lower default limit by adding, for +example, +.sp + --with-match-limit-depth=10000 +.sp +to the \fBconfigure\fP command. This value can be overridden at run time. This +depth limit indirectly limits the amount of heap memory that is used, but +because the size of each backtracking "frame" depends on the number of +capturing parentheses in a pattern, the amount of heap that is used before the +limit is reached varies from pattern to pattern. This limit was more useful in +versions before 10.30, where function recursion was used for backtracking. 
+.P +As well as applying to \fBpcre2_match()\fP, the depth limit also controls +the depth of recursive function calls in \fBpcre2_dfa_match()\fP. These are +used for lookaround assertions, atomic groups, and recursion within patterns. +The limit does not apply to JIT matching. +. +. +.\" HTML +.SH "CREATING CHARACTER TABLES AT BUILD TIME" +.rs +.sp +PCRE2 uses fixed tables for processing characters whose code points are less +than 256. By default, PCRE2 is built with a set of tables that are distributed +in the file \fIsrc/pcre2_chartables.c.dist\fP. These tables are for ASCII codes +only. If you add +.sp + --enable-rebuild-chartables +.sp +to the \fBconfigure\fP command, the distributed tables are no longer used. +Instead, a program called \fBpcre2_dftables\fP is compiled and run. This +outputs the source for a new set of tables, created in the default locale of your +C run-time system. This method of replacing the tables does not work if you are +cross compiling, because \fBpcre2_dftables\fP needs to be run on the local +host and therefore must not be compiled with the cross compiler. +.P +If you need to create alternative tables when cross compiling, you will have to +do so "by hand". There may also be other reasons for creating tables manually. +To cause \fBpcre2_dftables\fP to be built on the local host, run a normal +compiling command, and then run the program with the output file as its +argument, for example: +.sp + cc src/pcre2_dftables.c -o pcre2_dftables + ./pcre2_dftables src/pcre2_chartables.c +.sp +This builds the tables in the default locale of the local host. If you want to +specify a locale, you must use the -L option: +.sp + LC_ALL=fr_FR ./pcre2_dftables -L src/pcre2_chartables.c +.sp +You can also specify -b (with or without -L). This causes the tables to be +written in binary instead of as source code. 
A set of binary tables can be +loaded into memory by an application and passed to \fBpcre2_compile()\fP in the +same way as tables created by calling \fBpcre2_maketables()\fP. The tables are +just a string of bytes, independent of hardware characteristics such as +endianness. This means they can be bundled with an application that runs in +different environments, to ensure consistent behaviour. +. +. +.SH "USING EBCDIC CODE" +.rs +.sp +PCRE2 assumes by default that it will run in an environment where the character +code is ASCII or Unicode, which is a superset of ASCII. This is the case for +most computer operating systems. PCRE2 can, however, be compiled to run in an +8-bit EBCDIC environment by adding +.sp + --enable-ebcdic --disable-unicode +.sp +to the \fBconfigure\fP command. This setting implies +--enable-rebuild-chartables. You should only use it if you know that you are in +an EBCDIC environment (for example, an IBM mainframe operating system). +.P +It is not possible to support both EBCDIC and UTF-8 codes in the same version +of the library. Consequently, --enable-unicode and --enable-ebcdic are mutually +exclusive. +.P +The EBCDIC character that corresponds to an ASCII LF is assumed to have the +value 0x15 by default. However, in some EBCDIC environments, 0x25 is used. In +such an environment you should use +.sp + --enable-ebcdic-nl25 +.sp +as well as, or instead of, --enable-ebcdic. The EBCDIC character for CR has the +same value as in ASCII, namely, 0x0d. Whichever of 0x15 and 0x25 is \fInot\fP +chosen as LF is made to correspond to the Unicode NEL character (which, in +Unicode, is 0x85). +.P +The options that select newline behaviour, such as --enable-newline-is-cr, +and equivalent run-time options, refer to these character values in an EBCDIC +environment. +. +. +.SH "PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS" +.rs +.sp +By default \fBpcre2grep\fP supports the use of callouts with string arguments +within the patterns it is matching. 
There are two kinds: one that generates +output using local code, and another that calls an external program or script. +If --disable-pcre2grep-callout-fork is added to the \fBconfigure\fP command, +only the first kind of callout is supported; if --disable-pcre2grep-callout is +used, all callouts are completely ignored. For more details of \fBpcre2grep\fP +callouts, see the +.\" HREF +\fBpcre2grep\fP +.\" +documentation. +. +. +.SH "PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT" +.rs +.sp +By default, \fBpcre2grep\fP reads all files as plain text. You can build it so +that it recognizes files whose names end in \fB.gz\fP or \fB.bz2\fP, and reads +them with \fBlibz\fP or \fBlibbz2\fP, respectively, by adding one or both of +.sp + --enable-pcre2grep-libz + --enable-pcre2grep-libbz2 +.sp +to the \fBconfigure\fP command. These options naturally require that the +relevant libraries are installed on your system. Configuration will fail if +they are not. +. +. +.SH "PCRE2GREP BUFFER SIZE" +.rs +.sp +\fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is +scanning, in order to be able to output "before" and "after" lines when it +finds a match. The default starting size of the buffer is 20KiB. The buffer +itself is three times this size, but because of the way it is used for holding +"before" lines, the longest line that is guaranteed to be processable is the +notional buffer size. If a longer line is encountered, \fBpcre2grep\fP +automatically expands the buffer, up to a specified maximum size, whose default +is 1MiB or the starting size, whichever is the larger. You can change the +default parameter values by adding, for example, +.sp + --with-pcre2grep-bufsize=51200 + --with-pcre2grep-max-bufsize=2097152 +.sp +to the \fBconfigure\fP command. The caller of \fBpcre2grep\fP can override +these values by using --buffer-size and --max-buffer-size on the command line. +. +. 
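The buffer arithmetic described under "PCRE2GREP BUFFER SIZE" can be sketched in a few lines of C. This is only an illustration of the stated rules (the allocation is three times the notional size, and the default maximum is 1MiB or the starting size, whichever is larger). The function names are hypothetical and are not taken from the pcre2grep source.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of pcre2grep. */

/* The buffer that is actually allocated is three times the notional size. */
static size_t sketch_allocated_bytes(size_t notional_size)
{
  return 3 * notional_size;
}

/* The default maximum is 1MiB or the starting size, whichever is the larger. */
static size_t sketch_default_max(size_t starting_size)
{
  const size_t one_mib = (size_t)1024 * 1024;
  return (starting_size > one_mib) ? starting_size : one_mib;
}
```

With the default starting size of 20KiB (20480 bytes), the allocation under these assumptions is 61440 bytes and the default maximum is 1048576 bytes.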
+.SH "PCRE2TEST OPTION FOR LIBREADLINE SUPPORT" +.rs +.sp +If you add one of +.sp + --enable-pcre2test-libreadline + --enable-pcre2test-libedit +.sp +to the \fBconfigure\fP command, \fBpcre2test\fP is linked with the +\fBlibreadline\fP or \fBlibedit\fP library, respectively, and when its input is +from a terminal, it reads it using the \fBreadline()\fP function. This provides +line-editing and history facilities. Note that \fBlibreadline\fP is +GPL-licensed, so if you distribute a binary of \fBpcre2test\fP linked in this +way, there may be licensing issues. These can be avoided by linking instead +with \fBlibedit\fP, which has a BSD licence. +.P +Setting --enable-pcre2test-libreadline causes the \fB-lreadline\fP option to be +added to the \fBpcre2test\fP build. In many operating environments with a +system-installed readline library this is sufficient. However, in some +environments (e.g. if an unmodified distribution version of readline is in +use), some extra configuration may be necessary. The INSTALL file for +\fBlibreadline\fP says this: +.sp + "Readline uses the termcap functions, but does not link with + the termcap or curses library itself, allowing applications + which link with readline the to choose an appropriate library." +.sp +If your environment has not been set up so that an appropriate library is +automatically included, you may need to add something like +.sp + LIBS="-lncurses" +.sp +immediately before the \fBconfigure\fP command. +. +. +.SH "INCLUDING DEBUGGING CODE" +.rs +.sp +If you add +.sp + --enable-debug +.sp +to the \fBconfigure\fP command, additional debugging code is included in the +build. This feature is intended for use by the PCRE2 maintainers. +. +. +.SH "DEBUGGING WITH VALGRIND SUPPORT" +.rs +.sp +If you add +.sp + --enable-valgrind +.sp +to the \fBconfigure\fP command, PCRE2 will use valgrind annotations to mark +certain memory regions as unaddressable. 
This allows it to detect invalid +memory accesses, and is mostly useful for debugging PCRE2 itself. +. +. +.SH "CODE COVERAGE REPORTING" +.rs +.sp +If your C compiler is gcc, you can build a version of PCRE2 that can generate a +code coverage report for its test suite. To enable this, you must install +\fBlcov\fP version 1.6 or above. Then specify +.sp + --enable-coverage +.sp +to the \fBconfigure\fP command and build PCRE2 in the usual way. +.P +Note that using \fBccache\fP (a caching C compiler) is incompatible with code +coverage reporting. If you have configured \fBccache\fP to run automatically +on your system, you must set the environment variable +.sp + CCACHE_DISABLE=1 +.sp +before running \fBmake\fP to build PCRE2, so that \fBccache\fP is not used. +.P +When --enable-coverage is used, the following additional targets are added to the +\fIMakefile\fP: +.sp + make coverage +.sp +This creates a fresh coverage report for the PCRE2 test suite. It is equivalent +to running "make coverage-reset", "make coverage-baseline", "make check", and +then "make coverage-report". +.sp + make coverage-reset +.sp +This zeroes the coverage counters, but does nothing else. +.sp + make coverage-baseline +.sp +This captures baseline coverage information. +.sp + make coverage-report +.sp +This creates the coverage report. +.sp + make coverage-clean-report +.sp +This removes the generated coverage report without cleaning the coverage data +itself. +.sp + make coverage-clean-data +.sp +This removes the captured coverage data without removing the coverage files +created at compile time (*.gcno). +.sp + make coverage-clean +.sp +This cleans all coverage data including the generated coverage report. For more +information about code coverage, see the \fBgcov\fP and \fBlcov\fP +documentation. +. +. +.SH "DISABLING THE Z AND T FORMATTING MODIFIERS" +.rs +.sp +The C99 standard defines formatting modifiers z and t for size_t and +ptrdiff_t values, respectively. 
By default, PCRE2 uses these modifiers in +environments other than Microsoft Visual Studio when __STDC_VERSION__ is +defined and has a value greater than or equal to 199901L (indicating C99). +However, there is at least one environment that claims to be C99 but does not +support these modifiers. If +.sp + --disable-percent-zt +.sp +is specified, no use is made of the z or t modifiers. Instead of %td or %zu, +%lu is used, with a cast for size_t values. +. +. +.SH "SUPPORT FOR FUZZERS" +.rs +.sp +There is a special option for use by people who want to run fuzzing tests on +PCRE2: +.sp + --enable-fuzz-support +.sp +At present this applies only to the 8-bit library. If set, it causes an extra +library called libpcre2-fuzzsupport.a to be built, but not installed. This +contains a single function called LLVMFuzzerTestOneInput() whose arguments are +a pointer to a string and the length of the string. When called, this function +tries to compile the string as a pattern, and if that succeeds, to match it. +This is done both with no options and with some random option bits that are +generated from the string. +.P +Setting --enable-fuzz-support also causes a binary called \fBpcre2fuzzcheck\fP +to be created. This is normally run under valgrind or used when PCRE2 is +compiled with address sanitizing enabled. It calls the fuzzing function and +outputs information about what it is doing. The input strings are specified by +arguments: if an argument starts with "=" the rest of it is a literal input +string. Otherwise, it is assumed to be a file name, and the contents of the +file are the test string. +. +. +.SH "OBSOLETE OPTION" +.rs +.sp +In versions of PCRE2 prior to 10.30, there were two ways of handling +backtracking in the \fBpcre2_match()\fP function. The default was to use the +system stack, but if +.sp + --disable-stack-for-recursion +.sp +was set, memory on the heap was used. 
From release 10.30 onwards this has +changed (the stack is no longer used) and this option now does nothing except +give a warning. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2api\fP(3), \fBpcre2-config\fP(3). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 20 March 2020 +Copyright (c) 1997-2020 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2callout.3 b/src/pcre2/doc/pcre2callout.3 new file mode 100644 index 00000000..adb411b5 --- /dev/null +++ b/src/pcre2/doc/pcre2callout.3 @@ -0,0 +1,457 @@ +.TH PCRE2CALLOUT 3 "03 February 2019" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH SYNOPSIS +.rs +.sp +.B #include <pcre2.h> +.PP +.SM +.nf +.B int (*pcre2_callout)(pcre2_callout_block *, void *); +.sp +.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP, +.B " int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *)," +.B " void *\fIuser_data\fP);" +.fi +. +.SH DESCRIPTION +.rs +.sp +PCRE2 provides a feature called "callout", which is a means of temporarily +passing control to the caller of PCRE2 in the middle of pattern matching. The +caller of PCRE2 provides an external function by putting its entry point in +a match context (see \fBpcre2_set_callout()\fP in the +.\" HREF +\fBpcre2api\fP +.\" +documentation). +.P +When using the \fBpcre2_substitute()\fP function, an additional callout feature +is available. This does a callout after each change to the subject string and +is described in the +.\" HREF +\fBpcre2api\fP +.\" +documentation; the rest of this document is concerned with callouts during +pattern matching. +.P +Within a regular expression, (?C) indicates a point at which the external +function is to be called. Different callout points can be identified by putting +a number less than 256 after the letter C. The default value is zero. +Alternatively, the argument may be a delimited string. 
The starting delimiter +must be one of ` ' " ^ % # $ { and the ending delimiter is the same as the +start, except for {, where the ending delimiter is }. If the ending delimiter +is needed within the string, it must be doubled. For example, this pattern has +two callout points: +.sp + (?C1)abc(?C"some ""arbitrary"" text")def +.sp +If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2 +automatically inserts callouts, all with number 255, before each item in the +pattern except for immediately before or after an explicit callout. For +example, if PCRE2_AUTO_CALLOUT is used with the pattern +.sp + A(?C3)B +.sp +it is processed as if it were +.sp + (?C255)A(?C3)B(?C255) +.sp +Here is a more complicated example: +.sp + A(\ed{2}|--) +.sp +With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were +.sp + (?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255) +.sp +Notice that there is a callout before and after each parenthesis and +alternation bar. If the pattern contains a conditional group whose condition is +an assertion, an automatic callout is inserted immediately before the +condition. Such a callout may also be inserted explicitly, for example: +.sp + (?(?C9)(?=a)ab|de) (?(?C%text%)(?!=d)ab|de) +.sp +This applies only to assertion conditions (because they are themselves +independent groups). +.P +Callouts can be useful for tracking the progress of pattern matching. The +.\" HREF +\fBpcre2test\fP +.\" +program has a pattern qualifier (/auto_callout) that sets automatic callouts. +When any callouts are present, the output from \fBpcre2test\fP indicates how +the pattern is being matched. This is useful information when you are trying to +optimize the performance of a particular pattern. +. +. +.SH "MISSING CALLOUTS" +.rs +.sp +You should be aware that, because of optimizations in the way PCRE2 compiles +and matches patterns, callouts sometimes do not happen exactly as you might +expect. +. +. 
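The delimiter rules for string callout arguments described in the DESCRIPTION section can be sketched as a small decoder. This is a minimal illustration of the rules as stated there (eight permitted starting delimiters, } closing {, a doubled ending delimiter standing for one literal character). The function name and its error convention are hypothetical. PCRE2 does this work itself inside \fBpcre2_compile()\fP, and this sketch is not its implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a PCRE2 function. 'arg' points just after "(?C".
   On success the decoded string is written to 'out' (zero-terminated) and its
   length is returned. The result is -1 for a malformed argument or if 'out'
   is too small. */
static int sketch_decode_callout_string(const char *arg, char *out,
  size_t outsize)
{
  char end;
  size_t n = 0;

  if (outsize == 0 || *arg == '\0') return -1;
  if (strchr("`'\"^%#${", *arg) == NULL) return -1;  /* not a delimiter */
  end = (*arg == '{') ? '}' : *arg;                  /* { is closed by } */
  arg++;

  for (;;)
    {
    if (*arg == '\0') return -1;       /* ran out before the ending delimiter */
    if (*arg == end)
      {
      if (arg[1] != end) break;        /* single delimiter ends the string */
      arg++;                           /* doubled delimiter: keep one copy */
      }
    if (n + 1 >= outsize) return -1;   /* leave room for the terminator */
    out[n++] = *arg++;
    }

  if (arg[1] != ')') return -1;        /* the callout item must then close */
  out[n] = '\0';
  return (int)n;
}
```

Applied to the text that follows (?C in the second callout of the example pattern above, the decoder yields the 21-character string some "arbitrary" text.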
+.SS "Auto-possessification" +.rs +.sp +At compile time, PCRE2 "auto-possessifies" repeated items when it knows that +what follows cannot be part of the repeat. For example, a+[bc] is compiled as +if it were a++[bc]. The \fBpcre2test\fP output when this pattern is compiled +with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string +"aaaa" is: +.sp + --->aaaa + +0 ^ a+ + +2 ^ ^ [bc] + No match +.sp +This indicates that when matching [bc] fails, there is no backtracking into a+ +(because it is being treated as a++) and therefore the callouts that would be +taken for the backtracks do not occur. You can disable the auto-possessify +feature by passing PCRE2_NO_AUTO_POSSESS to \fBpcre2_compile()\fP, or starting +the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this: +.sp + --->aaaa + +0 ^ a+ + +2 ^ ^ [bc] + +2 ^ ^ [bc] + +2 ^ ^ [bc] + +2 ^^ [bc] + No match +.sp +This time, when matching [bc] fails, the matcher backtracks into a+ and tries +again, repeatedly, until a+ itself fails. +. +. +.SS "Automatic .* anchoring" +.rs +.sp +By default, an optimization is applied when .* is the first significant item in +a pattern. If PCRE2_DOTALL is set, so that the dot can match any character, the +pattern is automatically anchored. If PCRE2_DOTALL is not set, a match can +start only after an internal newline or at the beginning of the subject, and +\fBpcre2_compile()\fP remembers this. If a pattern has more than one top-level +branch, automatic anchoring occurs if all branches are anchorable. +.P +This optimization is disabled, however, if .* is in an atomic group or if there +is a backreference to the capture group in which it appears. It is also +disabled if the pattern contains (*PRUNE) or (*SKIP). However, the presence of +callouts does not affect it. 
+.P +For example, if the pattern .*\ed is compiled with PCRE2_AUTO_CALLOUT and +applied to the string "aa", the \fBpcre2test\fP output is: +.sp + --->aa + +0 ^ .* + +2 ^ ^ \ed + +2 ^^ \ed + +2 ^ \ed + No match +.sp +This shows that all match attempts start at the beginning of the subject. In +other words, the pattern is anchored. You can disable this optimization by +passing PCRE2_NO_DOTSTAR_ANCHOR to \fBpcre2_compile()\fP, or starting the +pattern with (*NO_DOTSTAR_ANCHOR). In this case, the output changes to: +.sp + --->aa + +0 ^ .* + +2 ^ ^ \ed + +2 ^^ \ed + +2 ^ \ed + +0 ^ .* + +2 ^^ \ed + +2 ^ \ed + No match +.sp +This shows more match attempts, starting at the second subject character. +Another optimization, described in the next section, means that there is no +subsequent attempt to match with an empty subject. +. +. +.SS "Other optimizations" +.rs +.sp +Other optimizations that provide fast "no match" results also affect callouts. +For example, if the pattern is +.sp + ab(?C4)cd +.sp +PCRE2 knows that any matching string must contain the letter "d". If the +subject string is "abyz", the lack of "d" means that matching doesn't ever +start, and the callout is never reached. However, with "abyd", though the +result is still no match, the callout is obeyed. +.P +For most patterns PCRE2 also knows the minimum length of a matching string, and +will immediately give a "no match" return without actually running a match if +the subject is not long enough, or, for unanchored patterns, if it has been +scanned far enough. +.P +You can disable these optimizations by passing the PCRE2_NO_START_OPTIMIZE +option to \fBpcre2_compile()\fP, or by starting the pattern with +(*NO_START_OPT). This slows down the matching process, but does ensure that +callouts such as the example above are obeyed. +. +. +.\" HTML +.SH "THE CALLOUT INTERFACE" +.rs +.sp +During matching, when PCRE2 reaches a callout point, if an external function is +provided in the match context, it is called. 
This applies to normal, +DFA, and JIT matching. The first argument to the callout function is a pointer +to a \fBpcre2_callout\fP block. The second argument is the void * callout data +that was supplied when the callout was set up by calling +\fBpcre2_set_callout()\fP (see the +.\" HREF +\fBpcre2api\fP +.\" +documentation). The callout block structure contains the following fields, not +necessarily in this order: +.sp + uint32_t \fIversion\fP; + uint32_t \fIcallout_number\fP; + uint32_t \fIcapture_top\fP; + uint32_t \fIcapture_last\fP; + uint32_t \fIcallout_flags\fP; + PCRE2_SIZE *\fIoffset_vector\fP; + PCRE2_SPTR \fImark\fP; + PCRE2_SPTR \fIsubject\fP; + PCRE2_SIZE \fIsubject_length\fP; + PCRE2_SIZE \fIstart_match\fP; + PCRE2_SIZE \fIcurrent_position\fP; + PCRE2_SIZE \fIpattern_position\fP; + PCRE2_SIZE \fInext_item_length\fP; + PCRE2_SIZE \fIcallout_string_offset\fP; + PCRE2_SIZE \fIcallout_string_length\fP; + PCRE2_SPTR \fIcallout_string\fP; +.sp +The \fIversion\fP field contains the version number of the block format. The +current version is 2; the three callout string fields were added for version 1, +and the \fIcallout_flags\fP field for version 2. If you are writing an +application that might use an earlier release of PCRE2, you should check the +version number before accessing any of these fields. The version number will +increase in future if more fields are added, but the intention is never to +remove any of the existing fields. +. +. +.SS "Fields for numerical callouts" +.rs +.sp +For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP +contains the number of the callout, in the range 0-255. This is the number +that follows (?C for callouts that are part of the pattern; it is 255 for +automatically generated callouts. +. +. 
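The version check recommended above, together with the NULL test that separates numerical callouts from string callouts, might look like the following sketch. The structure here is a cut-down stand-in that copies a few field names from the list above so that the example is self-contained. Real code would include pcre2.h and receive a pointer to the library's own callout block. The sketch_ names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a few callout block fields, for illustration only. The real
   block is defined in pcre2.h and uses PCRE2_SPTR and PCRE2_SIZE. */
typedef struct {
  uint32_t version;
  uint32_t callout_number;
  uint32_t callout_flags;        /* present from block version 2 */
  const char *callout_string;    /* present from block version 1 */
  size_t callout_string_length;  /* present from block version 1 */
} sketch_callout_block;

typedef enum { SKETCH_NUMERICAL, SKETCH_STRING } sketch_callout_kind;

/* A callout is a string callout only if the block is new enough to carry the
   string fields and callout_string is not NULL. */
static sketch_callout_kind sketch_classify(const sketch_callout_block *cb)
{
  if (cb->version >= 1 && cb->callout_string != NULL) return SKETCH_STRING;
  return SKETCH_NUMERICAL;
}

/* Read callout_flags only when the block version says the field exists. */
static uint32_t sketch_flags(const sketch_callout_block *cb)
{
  return (cb->version >= 2) ? cb->callout_flags : 0;
}
```

Treating an old block as a numerical callout with no flags is a design choice of this sketch, made so that the caller never reads a field that an earlier release did not provide.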
+.SS "Fields for string callouts" +.rs +.sp +For callouts with string arguments, \fIcallout_number\fP is always zero, and +\fIcallout_string\fP points to the string that is contained within the compiled +pattern. Its length is given by \fIcallout_string_length\fP. Duplicated ending +delimiters that were present in the original pattern string have been turned +into single characters, but there is no other processing of the callout string +argument. An additional code unit containing binary zero is present after the +string, but is not included in the length. The delimiter that was used to start +the string is also stored within the pattern, immediately before the string +itself. You can access this delimiter as \fIcallout_string\fP[-1] if you need +it. +.P +The \fIcallout_string_offset\fP field is the code unit offset to the start of +the callout argument string within the original pattern string. This is +provided for the benefit of applications such as script languages that might +need to report errors in the callout string within the pattern. +. +. +.SS "Fields for all callouts" +.rs +.sp +The remaining fields in the callout block are the same for both kinds of +callout. +.P +The \fIoffset_vector\fP field is a pointer to a vector of capturing offsets +(the "ovector"). You may read the elements in this vector, but you must not +change any of them. +.P +For calls to \fBpcre2_match()\fP, the \fIoffset_vector\fP field is not (since +release 10.30) a pointer to the actual ovector that was passed to the matching +function in the match data block. Instead it points to an internal ovector of a +size large enough to hold all possible captured substrings in the pattern. Note +that whenever a recursion or subroutine call within a pattern completes, the +capturing state is reset to what it was before. 
+.P +The \fIcapture_last\fP field contains the number of the most recently captured +substring, and the \fIcapture_top\fP field contains one more than the number of +the highest numbered captured substring so far. If no substrings have yet been +captured, the value of \fIcapture_last\fP is 0 and the value of +\fIcapture_top\fP is 1. The values of these fields do not always differ by one; +for example, when the callout in the pattern ((a)(b))(?C2) is taken, +\fIcapture_last\fP is 1 but \fIcapture_top\fP is 4. +.P +The contents of ovector[2] to ovector[\fIcapture_top\fP*2-1] can be inspected in +order to extract substrings that have been matched so far, in the same way as +extracting substrings after a match has completed. The values in ovector[0] and +ovector[1] are always PCRE2_UNSET because the match is by definition not +complete. Substrings that have not been captured but whose numbers are less +than \fIcapture_top\fP also have both of their ovector slots set to +PCRE2_UNSET. +.P +For DFA matching, the \fIoffset_vector\fP field points to the ovector that was +passed to the matching function in the match data block for callouts at the top +level, but to an internal ovector during the processing of pattern recursions, +lookarounds, and atomic groups. However, these ovectors hold no useful +information because \fBpcre2_dfa_match()\fP does not support substring +capturing. The value of \fIcapture_top\fP is always 1 and the value of +\fIcapture_last\fP is always 0 for DFA matching. +.P +The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values +that were passed to the matching function. +.P +The \fIstart_match\fP field normally contains the offset within the subject at +which the current match attempt started. However, if the escape sequence \eK +has been encountered, this value is changed to reflect the modified starting +point. 
If the pattern is not anchored, the callout function may be called +several times from the same point in the pattern for different starting points +in the subject. +.P +The \fIcurrent_position\fP field contains the offset within the subject of the +current match pointer. +.P +The \fIpattern_position\fP field contains the offset in the pattern string to +the next item to be matched. +.P +The \fInext_item_length\fP field contains the length of the next item to be +processed in the pattern string. When the callout is at the end of the pattern, +the length is zero. When the callout precedes an opening parenthesis, the +length includes meta characters that follow the parenthesis. For example, in a +callout before an assertion such as (?=ab) the length is 3. For an +alternation bar or a closing parenthesis, the length is one, unless a closing +parenthesis is followed by a quantifier, in which case its length is included. +(This changed in release 10.23. In earlier releases, before an opening +parenthesis the length was that of the entire group, and before an alternation +bar or a closing parenthesis the length was zero.) +.P +The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to +help in distinguishing between different automatic callouts, which all have the +same callout number. However, they are set for all callouts, and are used by +\fBpcre2test\fP to show the next item to be matched when displaying callout +information. +.P +In callouts from \fBpcre2_match()\fP the \fImark\fP field contains a pointer to +the zero-terminated name of the most recently passed (*MARK), (*PRUNE), or +(*THEN) item in the match, or NULL if no such items have been passed. Instances +of (*PRUNE) or (*THEN) without a name do not obliterate a previous (*MARK). In +callouts from the DFA matching function this field always contains NULL. +.P +The \fIcallout_flags\fP field is always zero in callouts from +\fBpcre2_dfa_match()\fP or when JIT is being used.
When \fBpcre2_match()\fP +without JIT is used, the following bits may be set: +.sp + PCRE2_CALLOUT_STARTMATCH +.sp +This is set for the first callout after the start of matching for each new +starting position in the subject. +.sp + PCRE2_CALLOUT_BACKTRACK +.sp +This is set if there has been a matching backtrack since the previous callout, +or since the start of matching if this is the first callout from a +\fBpcre2_match()\fP run. +.P +Both bits are set when a backtrack has caused a "bumpalong" to a new starting +position in the subject. Output from \fBpcre2test\fP does not indicate the +presence of these bits unless the \fBcallout_extra\fP modifier is set. +.P +The information in the \fBcallout_flags\fP field is provided so that +applications can track and tell their users how matching with backtracking is +done. This can be useful when trying to optimize patterns, or just to +understand how PCRE2 works. There is no support in \fBpcre2_dfa_match()\fP +because there is no backtracking in DFA matching, and there is no support in +JIT because JIT is all about maximizing matching performance. In both these +cases the \fBcallout_flags\fP field is always zero. +. +. +.SH "RETURN VALUES FROM CALLOUTS" +.rs +.sp +The external callout function returns an integer to PCRE2. If the value is +zero, matching proceeds as normal. If the value is greater than zero, matching +fails at the current point, but the testing of other matching possibilities +goes ahead, just as if a lookahead assertion had failed. If the value is less +than zero, the match is abandoned, and the matching function returns the +negative value. +.P +Negative values should normally be chosen from the set of PCRE2_ERROR_xxx +values. In particular, PCRE2_ERROR_NOMATCH forces a standard "no match" +failure. The error number PCRE2_ERROR_CALLOUT is reserved for use by callout +functions; it will never be used by PCRE2 itself. +. +.
+.SH "CALLOUT ENUMERATION" +.rs +.sp +.nf +.B int pcre2_callout_enumerate(const pcre2_code *\fIcode\fP, +.B " int (*\fIcallback\fP)(pcre2_callout_enumerate_block *, void *)," +.B " void *\fIuser_data\fP);" +.fi +.sp +A script language that supports the use of string arguments in callouts might +like to scan all the callouts in a pattern before running the match. This can +be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a +pointer to a compiled pattern, the second points to a callback function, and +the third is arbitrary user data. The callback function is called for every +callout in the pattern in the order in which they appear. Its first argument is +a pointer to a callout enumeration block, and its second argument is the +\fIuser_data\fP value that was passed to \fBpcre2_callout_enumerate()\fP. The +data block contains the following fields: +.sp + \fIversion\fP Block version number + \fIpattern_position\fP Offset to next item in pattern + \fInext_item_length\fP Length of next item in pattern + \fIcallout_number\fP Number for numbered callouts + \fIcallout_string_offset\fP Offset to string within pattern + \fIcallout_string_length\fP Length of callout string + \fIcallout_string\fP Points to callout string or is NULL +.sp +The version number is currently 0. It will increase if new fields are ever +added to the block. The remaining fields are the same as their namesakes in the +\fBpcre2_callout\fP block that is used for callouts during matching, as +described +.\" HTML +.\" +above. +.\" +.P +Note that the value of \fIpattern_position\fP is unique for each callout. +However, if a callout occurs inside a group that is quantified with a non-zero +minimum or a fixed maximum, the group is replicated inside the compiled +pattern. For example, a pattern such as /(a){2}/ is compiled as if it were +/(a)(a)/. This means that the callout will be enumerated more than once, but +with the same value for \fIpattern_position\fP in each case. 
+.P +The callback function should normally return zero. If it returns a non-zero +value, scanning the pattern stops, and that value is returned from +\fBpcre2_callout_enumerate()\fP. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 03 February 2019 +Copyright (c) 1997-2019 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2compat.3 b/src/pcre2/doc/pcre2compat.3 new file mode 100644 index 00000000..026e6648 --- /dev/null +++ b/src/pcre2/doc/pcre2compat.3 @@ -0,0 +1,217 @@ +.TH PCRE2COMPAT 3 "06 October 2020" "PCRE2 10.36" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "DIFFERENCES BETWEEN PCRE2 AND PERL" +.rs +.sp +This document describes some of the differences in the ways that PCRE2 and Perl +handle regular expressions. The differences described here are with respect to +Perl version 5.32.0, but as both Perl and PCRE2 are continually changing, the +information may at times be out of date. +.P +1. PCRE2 has only a subset of Perl's Unicode support. Details of what it does +have are given in the +.\" HREF +\fBpcre2unicode\fP +.\" +page. +.P +2. Like Perl, PCRE2 allows repeat quantifiers on parenthesized assertions, but +they do not mean what you might think. For example, (?!a){3} does not assert +that the next three characters are not "a". It just asserts that the next +character is not "a" three times (in principle; PCRE2 optimizes this to run the +assertion just once). Perl allows some repeat quantifiers on other assertions, +for example, \eb* (but not \eb{3}, though oddly it does allow ^{3}), but these +do not seem to have any use. PCRE2 does not allow any kind of quantifier on +non-lookaround assertions. +.P +3. 
Capture groups that occur inside negative lookaround assertions are counted, +but their entries in the offsets vector are set only when a negative assertion +is a condition that has a matching branch (that is, the condition is false). +Perl may set such capture groups in other circumstances. +.P +4. The following Perl escape sequences are not supported: \eF, \el, \eL, \eu, +\eU, and \eN when followed by a character name. \eN on its own, matching a +non-newline character, and \eN{U+dd..}, matching a Unicode code point, are +supported. The escapes that modify the case of following letters are +implemented by Perl's general string-handling and are not part of its pattern +matching engine. If any of these are encountered by PCRE2, an error is +generated by default. However, if either of the PCRE2_ALT_BSUX or +PCRE2_EXTRA_ALT_BSUX options is set, \eU and \eu are interpreted as ECMAScript +interprets them. +.P +5. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE2 is +built with Unicode support (the default). The properties that can be tested +with \ep and \eP are limited to the general category properties such as Lu and +Nd, script names such as Greek or Han, and the derived properties Any and L&. +Both PCRE2 and Perl support the Cs (surrogate) property, but in PCRE2 its use +is limited. See the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation for details. The long synonyms for property names that Perl +supports (such as \ep{Letter}) are not supported by PCRE2, nor is it permitted +to prefix any of these properties with "Is". +.P +6. PCRE2 supports the \eQ...\eE escape for quoting substrings. Characters +in between are treated as literals. However, this is slightly different from +Perl in that $ and @ are also handled as literals inside the quotes. In Perl, +they cause variable interpolation (but of course PCRE2 does not have +variables). 
Also, Perl does "double-quotish backslash interpolation" on any +backslashes between \eQ and \eE which, its documentation says, "may lead to +confusing results". PCRE2 treats a backslash between \eQ and \eE just like any +other character. Note the following examples: +.sp + Pattern PCRE2 matches Perl matches +.sp +.\" JOIN + \eQabc$xyz\eE abc$xyz abc followed by the + contents of $xyz + \eQabc\e$xyz\eE abc\e$xyz abc\e$xyz + \eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz + \eQA\eB\eE A\eB A\eB + \eQ\e\eE \e \e\eE +.sp +The \eQ...\eE sequence is recognized both inside and outside character classes +by both PCRE2 and Perl. +.P +7. Fairly obviously, PCRE2 does not support the (?{code}) and (??{code}) +constructions. However, PCRE2 does have a "callout" feature, which allows an +external function to be called during pattern matching. See the +.\" HREF +\fBpcre2callout\fP +.\" +documentation for details. +.P +8. Subroutine calls (whether recursive or not) were treated as atomic groups up +to PCRE2 release 10.23, but from release 10.30 this changed, and backtracking +into subroutine calls is now supported, as in Perl. +.P +9. In PCRE2, if any of the backtracking control verbs are used in a group that +is called as a subroutine (whether or not recursively), their effect is +confined to that group; it does not extend to the surrounding pattern. This is +not always the case in Perl. In particular, if (*THEN) is present in a group +that is called as a subroutine, its action is limited to that group, even if +the group does not contain any | characters. Note that such groups are +processed as anchored at the point where they are tested. +.P +10. If a pattern contains more than one backtracking control verb, the first +one that is backtracked onto acts. For example, in the pattern +A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C +triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the +same as PCRE2, but there are cases where it differs. 
+.P +11. There are some differences that are concerned with the settings of captured +strings when part of a pattern is repeated. For example, matching "aba" against +the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to +"b". +.P +12. PCRE2's handling of duplicate capture group numbers and names is not as +general as Perl's. This is a consequence of the fact that PCRE2 works internally +just with numbers, using an external table to translate between numbers and +names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)), where the two +capture groups have the same number but different names, is not supported, and +causes an error at compile time. If it were allowed, it would not be possible +to distinguish which group matched, because both names map to capture group +number 1. To avoid this confusing situation, an error is given at compile time. +.P +13. Perl used to recognize comments in some places that PCRE2 does not, for +example, between the ( and ? at the start of a group. If the /x modifier is +set, Perl allowed white space between ( and ? though the latest Perls give an +error (for a while it was just deprecated). There may still be some cases where +Perl behaves differently. +.P +14. Perl, when in warning mode, gives warnings for character classes such as +[A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no +warning features, so it gives an error in these cases because they are almost +certainly user mistakes. +.P +15. In PCRE2, the upper/lower case character properties Lu and Ll are not +affected when case-independent matching is specified. For example, \ep{Lu} +always matches an upper case letter. I think Perl has changed in this respect; +in the release at the time of writing (5.32), \ep{Lu} and \ep{Ll} match all +letters, regardless of case, when case independence is specified. +.P +16. From release 5.32.0, Perl locks out the use of \eK in lookaround +assertions.
In PCRE2, \eK is acted on when it occurs in positive assertions, +but is ignored in negative assertions. +.P +17. PCRE2 provides some extensions to the Perl regular expression facilities. +Perl 5.10 included new features that were not in earlier versions of Perl, some +of which (such as named parentheses) were in PCRE2 for some time before. This +list is with respect to Perl 5.32: +.sp +(a) Although lookbehind assertions in PCRE2 must match fixed length strings, +each alternative toplevel branch of a lookbehind assertion can match a +different length of string. Perl requires them all to have the same length. +.sp +(b) From PCRE2 10.23, backreferences to groups of fixed length are supported +in lookbehinds, provided that there is no possibility of referencing a +non-unique number or name. Perl does not support backreferences in lookbehinds. +.sp +(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $ +meta-character matches only at the very end of the string. +.sp +(d) A backslash followed by a letter with no special meaning is faulted. (Perl +can be made to issue a warning.) +.sp +(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is +inverted, that is, by default they are not greedy, but if followed by a +question mark they are. +.sp +(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried +only at the first matching position in the subject string. +.sp +(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART +options have no Perl equivalents. +.sp +(h) The \eR escape sequence can be restricted to match only CR, LF, or CRLF +by the PCRE2_BSR_ANYCRLF option. +.sp +(i) The callout facility is PCRE2-specific. Perl supports codeblocks and +variable interpolation, but not general hooks on every match. +.sp +(j) The partial matching facility is PCRE2-specific. +.sp +(k) The alternative matching function (\fBpcre2_dfa_match()\fP) matches in a +different way and is not Perl-compatible.
+.sp +(l) PCRE2 recognizes some special sequences such as (*CR) or (*NO_JIT) at +the start of a pattern. These set overall options that cannot be changed within +the pattern. +.sp +(m) PCRE2 supports non-atomic positive lookaround assertions. This is an +extension to the lookaround facilities. The default, Perl-compatible +lookarounds are atomic. +.P +18. The Perl /a modifier restricts \ed numbers to pure ASCII, and the /aa +modifier restricts /i case-insensitive matching to pure ASCII, ignoring Unicode +rules. This separation cannot be represented with PCRE2_UCP. +.P +19. Perl has different limits than PCRE2. See the +.\" HREF +\fBpcre2limit\fP +.\" +documentation for details. In release 5.10, Perl switched from recursion to +iteration, keeping the intermediate matches on the heap, which is ~10% slower +but cannot hit a stack-overflow limit. PCRE2 made a similar change at release +10.30, and also has many build-time and run-time customizable limits. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 06 October 2020 +Copyright (c) 1997-2019 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2convert.3 b/src/pcre2/doc/pcre2convert.3 new file mode 100644 index 00000000..34beaf0f --- /dev/null +++ b/src/pcre2/doc/pcre2convert.3 @@ -0,0 +1,164 @@ +.TH PCRE2CONVERT 3 "28 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "EXPERIMENTAL PATTERN CONVERSION FUNCTIONS" +.rs +.sp +This document describes a set of functions that can be used to convert +"foreign" patterns into PCRE2 regular expressions. This facility is currently +experimental, and may be changed in future releases. Two kinds of pattern, +globs and POSIX patterns, are supported. +. +.
+.SH "THE CONVERT CONTEXT" +.rs +.sp +.nf +.B pcre2_convert_context *pcre2_convert_context_create( +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B pcre2_convert_context *pcre2_convert_context_copy( +.B " pcre2_convert_context *\fIcvcontext\fP);" +.sp +.B void pcre2_convert_context_free(pcre2_convert_context *\fIcvcontext\fP); +.sp +.B int pcre2_set_glob_escape(pcre2_convert_context *\fIcvcontext\fP, +.B " uint32_t \fIescape_char\fP);" +.sp +.B int pcre2_set_glob_separator(pcre2_convert_context *\fIcvcontext\fP, +.B " uint32_t \fIseparator_char\fP);" +.fi +.sp +A convert context is used to hold parameters that affect the way that pattern +conversion works. Like all PCRE2 contexts, you need to use a context only if +you want to override the defaults. There are the usual create, copy, and free +functions. If custom memory management functions are set in a general context +that is passed to \fBpcre2_convert_context_create()\fP, they are used for all +memory management within the conversion functions. +.P +There are only two parameters in the convert context at present. Both apply +only to glob conversions. The escape character defaults to grave accent under +Windows, otherwise backslash. It can be set to zero, meaning no escape +character, or to any punctuation character with a code point less than 256. +The separator character defaults to backslash under Windows, otherwise forward +slash. It can be set to forward slash, backslash, or dot. +.P +The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if +their second argument is invalid. +. +. 
+.SH "THE CONVERSION FUNCTION" +.rs +.sp +.nf +.B int pcre2_pattern_convert(PCRE2_SPTR \fIpattern\fP, PCRE2_SIZE \fIlength\fP, +.B " uint32_t \fIoptions\fP, PCRE2_UCHAR **\fIbuffer\fP," +.B " PCRE2_SIZE *\fIblength\fP, pcre2_convert_context *\fIcvcontext\fP);" +.sp +.B void pcre2_converted_pattern_free(PCRE2_UCHAR *\fIconverted_pattern\fP); +.fi +.sp +The first two arguments of \fBpcre2_pattern_convert()\fP define the foreign +pattern that is to be converted. The length may be given as +PCRE2_ZERO_TERMINATED. The \fBoptions\fP argument defines how the pattern is to +be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set. +PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid. +One or more of the glob options, or one of the following POSIX options must be +set to define the type of conversion that is required: +.sp + PCRE2_CONVERT_GLOB + PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR + PCRE2_CONVERT_GLOB_NO_STARSTAR + PCRE2_CONVERT_POSIX_BASIC + PCRE2_CONVERT_POSIX_EXTENDED +.sp +Details of the conversions are given below. The \fBbuffer\fP and \fBblength\fP +arguments define how the output is handled: +.P +If \fBbuffer\fP is NULL, the function just returns the length of the converted +pattern via \fBblength\fP. This is one less than the length of buffer needed, +because a terminating zero is always added to the output. +.P +If \fBbuffer\fP points to a NULL pointer, an output buffer is obtained using +the allocator in the context or \fBmalloc()\fP if no context is supplied. A +pointer to this buffer is placed in the variable to which \fBbuffer\fP points. +When no longer needed the output buffer must be freed by calling +\fBpcre2_converted_pattern_free()\fP. If this function is called with a NULL +argument, it returns immediately without doing anything. +.P +If \fBbuffer\fP points to a non-NULL pointer, \fBblength\fP must be set to the +actual length of the buffer provided (in code units). 
+.P +In all cases, after successful conversion, the variable pointed to by +\fBblength\fP is updated to the length actually used (in code units), excluding +the terminating zero that is always added. +.P +If an error occurs, the length (via \fBblength\fP) is set to the offset +within the input pattern where the error was detected. Only gross syntax errors +are caught; there are plenty of errors that will get passed on for +\fBpcre2_compile()\fP to discover. +.P +The return from \fBpcre2_pattern_convert()\fP is zero on success or a non-zero +PCRE2 error code. Note that PCRE2 error codes may be positive or negative: +\fBpcre2_compile()\fP uses mostly positive codes and \fBpcre2_match()\fP +negative ones; \fBpcre2_pattern_convert()\fP uses existing codes of both kinds. A +textual error message can be obtained by calling +\fBpcre2_get_error_message()\fP. +. +. +.SH "CONVERTING GLOBS" +.rs +.sp +Globs are used to match file names, and consequently have the concept of a +"path separator", which defaults to backslash under Windows and forward slash +otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not +permitted to match separator characters, but the double-star (**) feature +(which does match separators) is supported. +.P +PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to +match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with the +double-star feature disabled. These options may be given together. +. +. +.SH "CONVERTING POSIX PATTERNS" +.rs +.sp +POSIX defines two kinds of regular expression pattern: basic and extended. +These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or +PCRE2_CONVERT_POSIX_EXTENDED, respectively. +.P +In POSIX patterns, backslash is not special in a character class. Unmatched +closing parentheses are treated as literals. +.P +In basic patterns, ? + | {} and () must be escaped to be recognized +as metacharacters outside a character class.
If the first character in the +pattern is * it is treated as a literal. ^ is a metacharacter only at the start +of a branch. +.P +In extended patterns, a backslash not in a character class always +makes the next character literal, whatever it is. There are no backreferences. +.P +Note: POSIX mandates that the longest possible match at the first matching +position must be found. This is not what \fBpcre2_match()\fP does; it yields +the first match that is found. An application can use \fBpcre2_dfa_match()\fP +to find the longest match, but that does not support backreferences (but then +neither do POSIX extended patterns). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 28 June 2018 +Copyright (c) 1997-2018 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2demo.3 b/src/pcre2/doc/pcre2demo.3 new file mode 100644 index 00000000..0d301459 --- /dev/null +++ b/src/pcre2/doc/pcre2demo.3 @@ -0,0 +1,512 @@ +.\" Start example. +.de EX +. nr mE \\n(.f +. nf +. nh +. ft CW +.. +. +. +.\" End example. +.de EE +. ft \\n(mE +. fi +. hy \\n(HY +.. +. +.EX +/************************************************* +* PCRE2 DEMONSTRATION PROGRAM * +*************************************************/ + +/* This is a demonstration program to illustrate a straightforward way of +using the PCRE2 regular expression library from a C program. See the +pcre2sample documentation for a short discussion ("man pcre2sample" if you have +the PCRE2 man pages installed). PCRE2 is a revised API for the library, and is +incompatible with the original PCRE API. + +There are actually three libraries, each supporting a different code unit +width. This demonstration program uses the 8-bit library. 
The default is to +process each code unit as a separate character, but if the pattern begins with +"(*UTF)", both it and the subject are treated as UTF-8 strings, where +characters may occupy multiple code units. + +In Unix-like environments, if PCRE2 is installed in your standard system +libraries, you should be able to compile this program using this command: + +cc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo + +If PCRE2 is not installed in a standard place, it is likely to be installed +with support for the pkg-config mechanism. If you have pkg-config, you can +compile this program using this command: + +cc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo + +If you do not have pkg-config, you may have to use something like this: + +cc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \e + -R/usr/local/lib -lpcre2-8 -o pcre2demo + +Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and +library files for PCRE2 are installed on your system. Only some operating +systems (Solaris is one) use the -R option. + +Building under Windows: + +If you want to statically link this program against a non-dll .a file, you must +define PCRE2_STATIC before including pcre2.h, so in this environment, uncomment +the following line. */ + +/* #define PCRE2_STATIC */ + +/* The PCRE2_CODE_UNIT_WIDTH macro must be defined before including pcre2.h. +For a program that uses only one code unit width, setting it to 8, 16, or 32 +makes it possible to use generic function names such as pcre2_compile(). Note +that just changing 8 to 16 (for example) is not sufficient to convert this +program to process 16-bit characters. Even in a fully 16-bit environment, where +string-handling functions such as strcmp() and printf() work with 16-bit +characters, the code for handling the table of named substrings will still need +to be modified. 
*/ + +#define PCRE2_CODE_UNIT_WIDTH 8 + +#include <stdio.h> +#include <string.h> +#include <pcre2.h> + + +/************************************************************************** +* Here is the program. The API includes the concept of "contexts" for * +* setting up unusual interface requirements for compiling and matching, * +* such as custom memory managers and non-standard newline definitions. * +* This program does not do any of this, so it makes no use of contexts, * +* always passing NULL where a context could be given. * +**************************************************************************/ + +int main(int argc, char **argv) +{ +pcre2_code *re; +PCRE2_SPTR pattern; /* PCRE2_SPTR is a pointer to unsigned code units of */ +PCRE2_SPTR subject; /* the appropriate width (in this case, 8 bits). */ +PCRE2_SPTR name_table; + +int crlf_is_newline; +int errornumber; +int find_all; +int i; +int rc; +int utf8; + +uint32_t option_bits; +uint32_t namecount; +uint32_t name_entry_size; +uint32_t newline; + +PCRE2_SIZE erroroffset; +PCRE2_SIZE *ovector; +PCRE2_SIZE subject_length; + +pcre2_match_data *match_data; + + +/************************************************************************** +* First, sort out the command line. There is only one possible option at * +* the moment, "-g" to request repeated matching to find all occurrences, * +* like Perl's /g option. We set the variable find_all to a non-zero value * +* if the -g option is present. * +**************************************************************************/ + +find_all = 0; +for (i = 1; i < argc; i++) + { + if (strcmp(argv[i], "-g") == 0) find_all = 1; + else if (argv[i][0] == '-') + { + printf("Unrecognised option %s\en", argv[i]); + return 1; + } + else break; + } + +/* After the options, we require exactly two arguments, which are the pattern, +and the subject string.
*/ + +if (argc - i != 2) + { + printf("Exactly two arguments required: a regex and a subject string\en"); + return 1; + } + +/* Pattern and subject are char arguments, so they can be straightforwardly +cast to PCRE2_SPTR because we are working in 8-bit code units. The subject +length is cast to PCRE2_SIZE for completeness, though PCRE2_SIZE is in fact +defined to be size_t. */ + +pattern = (PCRE2_SPTR)argv[i]; +subject = (PCRE2_SPTR)argv[i+1]; +subject_length = (PCRE2_SIZE)strlen((char *)subject); + + +/************************************************************************* +* Now we are going to compile the regular expression pattern, and handle * +* any errors that are detected. * +*************************************************************************/ + +re = pcre2_compile( + pattern, /* the pattern */ + PCRE2_ZERO_TERMINATED, /* indicates pattern is zero-terminated */ + 0, /* default options */ + &errornumber, /* for error number */ + &erroroffset, /* for error offset */ + NULL); /* use default compile context */ + +/* Compilation failed: print the error message and exit. */ + +if (re == NULL) + { + PCRE2_UCHAR buffer[256]; + pcre2_get_error_message(errornumber, buffer, sizeof(buffer)); + printf("PCRE2 compilation failed at offset %d: %s\en", (int)erroroffset, + buffer); + return 1; + } + + +/************************************************************************* +* If the compilation succeeded, we call PCRE2 again, in order to do a * +* pattern match against the subject string. This does just ONE match. If * +* further matching is needed, it will be done below. Before running the * +* match we must set up a match_data block for holding the result. Using * +* pcre2_match_data_create_from_pattern() ensures that the block is * +* exactly the right size for the number of capturing parentheses in the * +* pattern. 
If you need to know the actual size of a match_data block as * +* a number of bytes, you can find it like this: * +* * +* PCRE2_SIZE match_data_size = pcre2_get_match_data_size(match_data); * +*************************************************************************/ + +match_data = pcre2_match_data_create_from_pattern(re, NULL); + +/* Now run the match. */ + +rc = pcre2_match( + re, /* the compiled pattern */ + subject, /* the subject string */ + subject_length, /* the length of the subject */ + 0, /* start at offset 0 in the subject */ + 0, /* default options */ + match_data, /* block for storing the result */ + NULL); /* use default match context */ + +/* Matching failed: handle error cases */ + +if (rc < 0) + { + switch(rc) + { + case PCRE2_ERROR_NOMATCH: printf("No match\en"); break; + /* + Handle other special cases if you like + */ + default: printf("Matching error %d\en", rc); break; + } + pcre2_match_data_free(match_data); /* Release memory used for the match */ + pcre2_code_free(re); /* data and the compiled pattern. */ + return 1; + } + +/* Match succeeded. Get a pointer to the output vector, where string offsets are +stored. */ + +ovector = pcre2_get_ovector_pointer(match_data); +printf("Match succeeded at offset %d\en", (int)ovector[0]); + + +/************************************************************************* +* We have found the first match within the subject string. If the output * +* vector wasn't big enough, say so. Then output any substrings that were * +* captured. * +*************************************************************************/ + +/* The output vector wasn't big enough. This should not happen, because we used +pcre2_match_data_create_from_pattern() above. */ + +if (rc == 0) + printf("ovector was not big enough for all the captured substrings\en"); + +/* We must guard against patterns such as /(?=.\eK)/ that use \eK in an assertion +to set the start of a match later than its end.
In this demonstration program, +we just detect this case and give up. */ + +if (ovector[0] > ovector[1]) + { + printf("\e\eK was used in an assertion to set the match start after its end.\en" + "From end to start the match was: %.*s\en", (int)(ovector[0] - ovector[1]), + (char *)(subject + ovector[1])); + printf("Run abandoned\en"); + pcre2_match_data_free(match_data); + pcre2_code_free(re); + return 1; + } + +/* Show substrings stored in the output vector by number. Obviously, in a real +application you might want to do things other than print them. */ + +for (i = 0; i < rc; i++) + { + PCRE2_SPTR substring_start = subject + ovector[2*i]; + PCRE2_SIZE substring_length = ovector[2*i+1] - ovector[2*i]; + printf("%2d: %.*s\en", i, (int)substring_length, (char *)substring_start); + } + + +/************************************************************************** +* That concludes the basic part of this demonstration program. We have * +* compiled a pattern, and performed a single match. The code that follows * +* shows first how to access named substrings, and then how to code for * +* repeated matches on the same subject. * +**************************************************************************/ + +/* See if there are any named substrings, and if so, show them by name. First +we have to extract the count of named parentheses from the pattern. */ + +(void)pcre2_pattern_info( + re, /* the compiled pattern */ + PCRE2_INFO_NAMECOUNT, /* get the number of named substrings */ + &namecount); /* where to put the answer */ + +if (namecount == 0) printf("No named substrings\en"); else + { + PCRE2_SPTR tabptr; + printf("Named substrings\en"); + + /* Before we can access the substrings, we must extract the table for + translating names to numbers, and the size of each entry in the table. 
*/ + + (void)pcre2_pattern_info( + re, /* the compiled pattern */ + PCRE2_INFO_NAMETABLE, /* address of the table */ + &name_table); /* where to put the answer */ + + (void)pcre2_pattern_info( + re, /* the compiled pattern */ + PCRE2_INFO_NAMEENTRYSIZE, /* size of each entry in the table */ + &name_entry_size); /* where to put the answer */ + + /* Now we can scan the table and, for each entry, print the number, the name, + and the substring itself. In the 8-bit library the number is held in two + bytes, most significant first. */ + + tabptr = name_table; + for (i = 0; i < namecount; i++) + { + int n = (tabptr[0] << 8) | tabptr[1]; + printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2, + (int)(ovector[2*n+1] - ovector[2*n]), subject + ovector[2*n]); + tabptr += name_entry_size; + } + } + + +/************************************************************************* +* If the "-g" option was given on the command line, we want to continue * +* to search for additional matches in the subject string, in a similar * +* way to the /g option in Perl. This turns out to be trickier than you * +* might think because of the possibility of matching an empty string. * +* What happens is as follows: * +* * +* If the previous match was NOT for an empty string, we can just start * +* the next match at the end of the previous one. * +* * +* If the previous match WAS for an empty string, we can't do that, as it * +* would lead to an infinite loop. Instead, a call of pcre2_match() is * +* made with the PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set. The * +* first of these tells PCRE2 that an empty string at the start of the * +* subject is not a valid match; other possibilities must be tried. The * +* second flag restricts PCRE2 to one match attempt at the initial string * +* position. If this match succeeds, an alternative to the empty string * +* match has been found, and we can print it and proceed round the loop, * +* advancing by the length of whatever was found. 
If this match does not * +* succeed, we still stay in the loop, advancing by just one character. * +* In UTF-8 mode, which can be set by (*UTF) in the pattern, this may be * +* more than one byte. * +* * +* However, there is a complication concerned with newlines. When the * +* newline convention is such that CRLF is a valid newline, we must * +* advance by two characters rather than one. The newline convention can * +* be set in the regex by (*CR), etc.; if not, we must find the default. * +*************************************************************************/ + +if (!find_all) /* Check for -g */ + { + pcre2_match_data_free(match_data); /* Release the memory that was used */ + pcre2_code_free(re); /* for the match data and the pattern. */ + return 0; /* Exit the program. */ + } + +/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline +sequence. First, find the options with which the regex was compiled and extract +the UTF state. */ + +(void)pcre2_pattern_info(re, PCRE2_INFO_ALLOPTIONS, &option_bits); +utf8 = (option_bits & PCRE2_UTF) != 0; + +/* Now find the newline convention and see whether CRLF is a valid newline +sequence. */ + +(void)pcre2_pattern_info(re, PCRE2_INFO_NEWLINE, &newline); +crlf_is_newline = newline == PCRE2_NEWLINE_ANY || + newline == PCRE2_NEWLINE_CRLF || + newline == PCRE2_NEWLINE_ANYCRLF; + +/* Loop for second and subsequent matches */ + +for (;;) + { + uint32_t options = 0; /* Normally no options */ + PCRE2_SIZE start_offset = ovector[1]; /* Start at end of previous match */ + + /* If the previous match was for an empty string, we are finished if we are + at the end of the subject. Otherwise, arrange to run another match at the + same point to see if a non-empty match can be found. 
*/ + + if (ovector[0] == ovector[1]) + { + if (ovector[0] == subject_length) break; + options = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED; + } + + /* If the previous match was not an empty string, there is one tricky case to + consider. If a pattern contains \eK within a lookbehind assertion at the + start, the end of the matched string can be at the offset where the match + started. Without special action, this leads to a loop that keeps on matching + the same substring. We must detect this case and arrange to move the start on + by one character. The pcre2_get_startchar() function returns the starting + offset that was passed to pcre2_match(). */ + + else + { + PCRE2_SIZE startchar = pcre2_get_startchar(match_data); + if (start_offset <= startchar) + { + if (startchar >= subject_length) break; /* Reached end of subject. */ + start_offset = startchar + 1; /* Advance by one character. */ + if (utf8) /* If UTF-8, it may be more */ + { /* than one code unit. */ + for (; start_offset < subject_length; start_offset++) + if ((subject[start_offset] & 0xc0) != 0x80) break; + } + } + } + + /* Run the next matching operation */ + + rc = pcre2_match( + re, /* the compiled pattern */ + subject, /* the subject string */ + subject_length, /* the length of the subject */ + start_offset, /* starting offset in the subject */ + options, /* options */ + match_data, /* block for storing the result */ + NULL); /* use default match context */ + + /* This time, a result of NOMATCH isn't an error. If the value in "options" + is zero, it just means we have found all possible matches, so the loop ends. + Otherwise, it means we have failed to find a non-empty-string match at a + point where there was a previous empty-string match. In this case, we do what + Perl does: advance the matching position by one character, and continue. We + do this by setting the "end of previous match" offset, because that is picked + up at the top of the loop as the point at which to start again. 
+ + There are two complications: (a) When CRLF is a valid newline sequence, and + the current position is just before it, advance by an extra byte. (b) + Otherwise we must ensure that we skip an entire UTF character if we are in + UTF mode. */ + + if (rc == PCRE2_ERROR_NOMATCH) + { + if (options == 0) break; /* All matches found */ + ovector[1] = start_offset + 1; /* Advance one code unit */ + if (crlf_is_newline && /* If CRLF is a newline & */ + start_offset < subject_length - 1 && /* we are at CRLF, */ + subject[start_offset] == '\er' && + subject[start_offset + 1] == '\en') + ovector[1] += 1; /* Advance by one more. */ + else if (utf8) /* Otherwise, ensure we */ + { /* advance a whole UTF-8 */ + while (ovector[1] < subject_length) /* character. */ + { + if ((subject[ovector[1]] & 0xc0) != 0x80) break; + ovector[1] += 1; + } + } + continue; /* Go round the loop again */ + } + + /* Other matching errors are not recoverable. */ + + if (rc < 0) + { + printf("Matching error %d\en", rc); + pcre2_match_data_free(match_data); + pcre2_code_free(re); + return 1; + } + + /* Match succeeded */ + + printf("\enMatch succeeded again at offset %d\en", (int)ovector[0]); + + /* The match succeeded, but the output vector wasn't big enough. This + should not happen. */ + + if (rc == 0) + printf("ovector was not big enough for all the captured substrings\en"); + + /* We must guard against patterns such as /(?=.\eK)/ that use \eK in an + assertion to set the start of a match later than its end. In this + demonstration program, we just detect this case and give up.
*/ + + if (ovector[0] > ovector[1]) + { + printf("\e\eK was used in an assertion to set the match start after its end.\en" + "From end to start the match was: %.*s\en", (int)(ovector[0] - ovector[1]), + (char *)(subject + ovector[1])); + printf("Run abandoned\en"); + pcre2_match_data_free(match_data); + pcre2_code_free(re); + return 1; + } + + /* As before, show substrings stored in the output vector by number, and then + also any named substrings. */ + + for (i = 0; i < rc; i++) + { + PCRE2_SPTR substring_start = subject + ovector[2*i]; + size_t substring_length = ovector[2*i+1] - ovector[2*i]; + printf("%2d: %.*s\en", i, (int)substring_length, (char *)substring_start); + } + + if (namecount == 0) printf("No named substrings\en"); else + { + PCRE2_SPTR tabptr = name_table; + printf("Named substrings\en"); + for (i = 0; i < namecount; i++) + { + int n = (tabptr[0] << 8) | tabptr[1]; + printf("(%d) %*s: %.*s\en", n, name_entry_size - 3, tabptr + 2, + (int)(ovector[2*n+1] - ovector[2*n]), subject + ovector[2*n]); + tabptr += name_entry_size; + } + } + } /* End of loop to find second and subsequent matches */ + +printf("\en"); +pcre2_match_data_free(match_data); +pcre2_code_free(re); +return 0; +} + +/* End of pcre2demo.c */ +.EE diff --git a/src/pcre2/doc/pcre2grep.1 b/src/pcre2/doc/pcre2grep.1 new file mode 100644 index 00000000..66377ce2 --- /dev/null +++ b/src/pcre2/doc/pcre2grep.1 @@ -0,0 +1,960 @@ +.TH PCRE2GREP 1 "04 October 2020" "PCRE2 10.36" +.SH NAME +pcre2grep - a grep with Perl-compatible regular expressions. +.SH SYNOPSIS +.B pcre2grep [options] [long options] [pattern] [path1 path2 ...] +. +.SH DESCRIPTION +.rs +.sp +\fBpcre2grep\fP searches files for character patterns, in the same way as other +grep commands do, but it uses the PCRE2 regular expression library to support +patterns that are compatible with the regular expressions of Perl 5. 
See +.\" HREF +\fBpcre2syntax\fP(3) +.\" +for a quick-reference summary of pattern syntax, or +.\" HREF +\fBpcre2pattern\fP(3) +.\" +for a full description of the syntax and semantics of the regular expressions +that PCRE2 supports. +.P +Patterns, whether supplied on the command line or in a separate file, are given +without delimiters. For example: +.sp + pcre2grep Thursday /etc/motd +.sp +If you attempt to use delimiters (for example, by surrounding a pattern with +slashes, as is common in Perl scripts), they are interpreted as part of the +pattern. Quotes can of course be used to delimit patterns on the command line +because they are interpreted by the shell, and indeed quotes are required if a +pattern contains white space or shell metacharacters. +.P +The first argument that follows any option settings is treated as the single +pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present. +Conversely, when one or both of these options are used to specify patterns, all +arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an +argument pattern must be provided. +.P +If no files are specified, \fBpcre2grep\fP reads the standard input. The +standard input can also be referenced by a name consisting of a single hyphen. +For example: +.sp + pcre2grep some-pattern file1 - file3 +.sp +Input files are searched line by line. By default, each line that matches a +pattern is copied to the standard output, and if there is more than one file, +the file name is output at the start of each line, followed by a colon. +However, there are options that can change how \fBpcre2grep\fP behaves. In +particular, the \fB-M\fP option makes it possible to search for strings that +span line boundaries. What defines a line boundary is controlled by the +\fB-N\fP (\fB--newline\fP) option. 
+.P +The amount of memory used for buffering files that are being scanned is +controlled by parameters that can be set by the \fB--buffer-size\fP and +\fB--max-buffer-size\fP options. The first of these sets the size of buffer +that is obtained at the start of processing. If an input file contains very +long lines, a larger buffer may be needed; this is handled by automatically +extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The +default values for these parameters can be set when \fBpcre2grep\fP is +built; if nothing is specified, the defaults are set to 20KiB and 1MiB +respectively. An error occurs if a line is too long and the buffer can no +longer be expanded. +.P +The block of memory that is actually used is three times the "buffer size", to +allow for buffering "before" and "after" lines. If the buffer size is too +small, fewer than requested "before" and "after" lines may be output. +.P +Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the greater. +BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern +(specified by the use of \fB-e\fP and/or \fB-f\fP), each pattern is applied to +each line in the order in which they are defined, except that all the \fB-e\fP +patterns are tried before the \fB-f\fP patterns. +.P +By default, as soon as one pattern matches a line, no further patterns are +considered. However, if \fB--colour\fP (or \fB--color\fP) is used to colour the +matching substrings, or if \fB--only-matching\fP, \fB--file-offsets\fP, or +\fB--line-offsets\fP is used to output only the part of the line that matched +(either shown literally, or as an offset), scanning resumes immediately +following the match, so that further matches on the same line can be found. If +there are multiple patterns, they are all tried on the remainder of the line, +but patterns that follow the one that matched are not tried on the earlier +matched part of the line.
+.P +This behaviour means that the order in which multiple patterns are specified +can affect the output when one of the above options is used. This is no longer +the same behaviour as GNU grep, which now manages to display earlier matches +for later patterns (as long as there is no overlap). +.P +Patterns that can match an empty string are accepted, but empty string +matches are never recognized. An example is the pattern "(super)?(man)?", in +which all components are optional. This pattern finds all occurrences of both +"super" and "man"; the output differs from matching with "super|man" when only +the matching substrings are being shown. +.P +If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set, +\fBpcre2grep\fP uses the value to set a locale when calling the PCRE2 library. +The \fB--locale\fP option can be used to override this. +. +. +.SH "SUPPORT FOR COMPRESSED FILES" +.rs +.sp +It is possible to compile \fBpcre2grep\fP so that it uses \fBlibz\fP or +\fBlibbz2\fP to read compressed files whose names end in \fB.gz\fP or +\fB.bz2\fP, respectively. You can find out whether your \fBpcre2grep\fP binary +has support for one or both of these file types by running it with the +\fB--help\fP option. If the appropriate support is not present, all files are +treated as plain text. The standard input is always so treated. When input is +from a compressed .gz or .bz2 file, the \fB--line-buffered\fP option is +ignored. +. +. +.SH "BINARY FILES" +.rs +.sp +By default, a file that contains a binary zero byte within the first 1024 bytes +is identified as a binary file, and is processed specially. However, if the +newline type is specified as NUL, that is, the line terminator is a binary +zero, the test for a binary file is not applied. See the \fB--binary-files\fP +option for a means of changing the way binary files are handled. +. +. 
+.SH "BINARY ZEROS IN PATTERNS" +.rs +.sp +Patterns passed from the command line are strings that are terminated by a +binary zero, so cannot contain internal zeros. However, patterns that are read +from a file via the \fB-f\fP option may contain binary zeros. +. +. +.SH OPTIONS +.rs +.sp +The order in which some of the options appear can affect the output. For +example, both the \fB-H\fP and \fB-l\fP options affect the printing of file +names. Whichever comes later in the command line will be the one that takes +effect. Similarly, except where noted below, if an option is given twice, the +later setting is used. Numerical values for options may be followed by K or M, +to signify multiplication by 1024 or 1024*1024 respectively. +.TP 10 +\fB--\fP +This terminates the list of options. It is useful if the next item on the +command line starts with a hyphen but is not an option. This allows for the +processing of patterns and file names that start with hyphens. +.TP +\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP +Output up to \fInumber\fP lines of context after each matching line. Fewer +lines are output if the next match or the end of the file is reached, or if the +processing buffer size has been set too small. If file names and/or line +numbers are being output, a hyphen separator is used instead of a colon for the +context lines. A line containing "--" is output between each group of lines, +unless they are in fact contiguous in the input file. The value of \fInumber\fP +is expected to be relatively small. When \fB-c\fP is used, \fB-A\fP is ignored. +.TP +\fB-a\fP, \fB--text\fP +Treat binary files as text. This is equivalent to +\fB--binary-files\fP=\fItext\fP. +.TP +\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP +Output up to \fInumber\fP lines of context before each matching line. Fewer +lines are output if the previous match or the start of the file is within +\fInumber\fP lines, or if the processing buffer size has been set too small. 
If +file names and/or line numbers are being output, a hyphen separator is used +instead of a colon for the context lines. A line containing "--" is output +between each group of lines, unless they are in fact contiguous in the input +file. The value of \fInumber\fP is expected to be relatively small. When +\fB-c\fP is used, \fB-B\fP is ignored. +.TP +\fB--binary-files=\fP\fIword\fP +Specify how binary files are to be processed. If the word is "binary" (the +default), pattern matching is performed on binary files, but the only output is +"Binary file <name> matches" when a match succeeds. If the word is "text", +which is equivalent to the \fB-a\fP or \fB--text\fP option, binary files are +processed in the same way as any other file. In this case, when a match +succeeds, the output may be binary garbage, which can have nasty effects if +sent to a terminal. If the word is "without-match", which is equivalent to the +\fB-I\fP option, binary files are not processed at all; they are assumed not to +be of interest and are skipped without causing any output or affecting the +return code. +.TP +\fB--buffer-size=\fP\fInumber\fP +Set the parameter that controls how much memory is obtained at the start of +processing for buffering files that are being scanned. See also +\fB--max-buffer-size\fP below. +.TP +\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP +Output \fInumber\fP lines of context both before and after each matching line. +This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value. +.TP +\fB-c\fP, \fB--count\fP +Do not output lines from the files that are being scanned; instead output the +number of lines that would have been shown, either because they matched, or, if +\fB-v\fP is set, because they failed to match.
By default, this count is +exactly the same as the number of lines that would have been output, but if the +\fB-M\fP (multiline) option is used (without \fB-v\fP), there may be more +suppressed lines than the count (that is, the number of matches). +.sp +If no lines are selected, the number zero is output. If several files are +being scanned, a count is output for each of them and the \fB-t\fP option can +be used to cause a total to be output at the end. However, if the +\fB--files-with-matches\fP option is also used, only those files whose counts +are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP, +\fB-B\fP, and \fB-C\fP options are ignored. +.TP +\fB--colour\fP, \fB--color\fP +If this option is given without any data, it is equivalent to "--colour=auto". +If data is required, it must be given in the same shell item, separated by an +equals sign. +.TP +\fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP +This option specifies under what circumstances the parts of a line that matched +a pattern should be coloured in the output. By default, the output is not +coloured. The value (which is optional, see above) may be "never", "always", or +"auto". In the latter case, colouring happens only if the standard output is +connected to a terminal. More resources are used when colouring is enabled, +because \fBpcre2grep\fP has to search for all possible matches in a line, not +just one, in order to colour them all. +.sp +The colour that is used can be specified by setting one of the environment +variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or +PCREGREP_COLOR, which are checked in that order. If none of these are set, +\fBpcre2grep\fP looks for GREP_COLORS or GREP_COLOR (in that order).
The value +of the variable should be a string of two numbers, separated by a semicolon, +except in the case of GREP_COLORS, which must start with "ms=" or "mt=" +followed by two semicolon-separated colours, terminated by the end of the +string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is +ignored, and GREP_COLOR is checked. +.sp +If the string obtained from one of the above variables contains any characters +other than semicolon or digits, the setting is ignored and the default colour +is used. The string is copied directly into the control string for setting +colour on a terminal, so it is your responsibility to ensure that the values +make sense. If no relevant environment variable is set, the default is "1;31", +which gives red. +.TP +\fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP +If an input path is not a regular file or a directory, "action" specifies how +it is to be processed. Valid values are "read" (the default) or "skip" +(silently skip the path). +.TP +\fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP +If an input path is a directory, "action" specifies how it is to be processed. +Valid values are "read" (the default in non-Windows environments, for +compatibility with GNU grep), "recurse" (equivalent to the \fB-r\fP option), or +"skip" (silently skip the path, the default in Windows environments). In the +"read" case, directories are read as if they were ordinary files. In some +operating systems the effect of reading a directory like this is an immediate +end-of-file; in others it may provoke an error. +.TP +\fB--depth-limit\fP=\fInumber\fP +See \fB--match-limit\fP below. +.TP +\fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP, \fB--regexp=\fP\fIpattern\fP +Specify a pattern to be matched. This option can be used multiple times in +order to specify several patterns. It can also be used as a way of specifying a +single pattern that starts with a hyphen. 
When \fB-e\fP is used, no argument +pattern is taken from the command line; all arguments are treated as file +names. There is no limit to the number of patterns. They are applied to each +line in the order in which they are defined until one matches. +.sp +If \fB-f\fP is used with \fB-e\fP, the command line patterns are matched first, +followed by the patterns from the file(s), independent of the order in which +these options are specified. Note that multiple use of \fB-e\fP is not the same +as a single pattern with alternatives. For example, X|Y finds the first +character in a line that is X or Y, whereas if the two patterns are given +separately, with X first, \fBpcre2grep\fP finds X if it is present, even if it +follows Y in the line. It finds Y only if there is no X in the line. This +matters only if you are using \fB-o\fP or \fB--colo(u)r\fP to show the part(s) +of the line that matched. +.TP +\fB--exclude\fP=\fIpattern\fP +Files (but not directories) whose names match the pattern are skipped without +being processed. This applies to all files, whether listed on the command line, +obtained from \fB--file-list\fP, or by scanning a directory. The pattern is a +PCRE2 regular expression, and is matched against the final component of the +file name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do +not apply to this pattern. The option may be given any number of times in order +to specify multiple patterns. If a file name matches both an \fB--include\fP +and an \fB--exclude\fP pattern, it is excluded. There is no short form for this +option. +.TP +\fB--exclude-from=\fP\fIfilename\fP +Treat each non-empty line of the file as the data for an \fB--exclude\fP +option. What constitutes a newline when reading the file is the operating +system's default. The \fB--newline\fP option has no effect on this option. This +option may be given more than once in order to specify a number of files to +read. 
+.TP +\fB--exclude-dir\fP=\fIpattern\fP +Directories whose names match the pattern are skipped without being processed, +whatever the setting of the \fB--recursive\fP option. This applies to all +directories, whether listed on the command line, obtained from +\fB--file-list\fP, or by scanning a parent directory. The pattern is a PCRE2 +regular expression, and is matched against the final component of the directory +name, not the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not +apply to this pattern. The option may be given any number of times in order to +specify more than one pattern. If a directory matches both \fB--include-dir\fP +and \fB--exclude-dir\fP, it is excluded. There is no short form for this +option. +.TP +\fB-F\fP, \fB--fixed-strings\fP +Interpret each data-matching pattern as a list of fixed strings, separated by +newlines, instead of as a regular expression. What constitutes a newline for +this purpose is controlled by the \fB--newline\fP option. The \fB-w\fP (match +as a word) and \fB-x\fP (match whole line) options can be used with \fB-F\fP. +They apply to each of the fixed strings. A line is selected if any of the fixed +strings are found in it (subject to \fB-w\fP or \fB-x\fP, if present). This +option applies only to the patterns that are matched against the contents of +files; it does not apply to patterns specified by any of the \fB--include\fP or +\fB--exclude\fP options. +.TP +\fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP +Read patterns from the file, one per line, and match them against each line of +input. As is the case with patterns on the command line, no delimiters should +be used. What constitutes a newline when reading the file is the operating +system's default interpretation of \en. The \fB--newline\fP option has no +effect on this option. Trailing white space is removed from each line, and +blank lines are ignored. An empty file contains no patterns and therefore +matches nothing. 
Patterns read from a file in this way may contain binary +zeros, which are treated as ordinary data characters. See also the comments +about multiple patterns versus a single pattern with alternatives in the +description of \fB-e\fP above. +.sp +If this option is given more than once, all the specified files are read. A +data line is output if any of the patterns match it. A file name can be given +as "-" to refer to the standard input. When \fB-f\fP is used, patterns +specified on the command line using \fB-e\fP may also be present; they are +tested before the file's patterns. However, no other pattern is taken from the +command line; all arguments are treated as the names of paths to be searched. +.TP +\fB--file-list\fP=\fIfilename\fP +Read a list of files and/or directories that are to be scanned from the given +file, one per line. What constitutes a newline when reading the file is the +operating system's default. Trailing white space is removed from each line, and +blank lines are ignored. These paths are processed before any that are listed +on the command line. The file name can be given as "-" to refer to the standard +input. If \fB--file\fP and \fB--file-list\fP are both specified as "-", +patterns are read first. This is useful only when the standard input is a +terminal, from which further lines (the list of files) can be read after an +end-of-file indication. If this option is given more than once, all the +specified files are read. +.TP +\fB--file-offsets\fP +Instead of showing lines or parts of lines that match, show each match as an +offset from the start of the file and a length, separated by a comma. In this +mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP +options are ignored. If there is more than one match in a line, each of them is +shown separately. This option is mutually exclusive with \fB--output\fP, +\fB--line-offsets\fP, and \fB--only-matching\fP. 
+.TP +\fB-H\fP, \fB--with-filename\fP +Force the inclusion of the file name at the start of output lines when +searching a single file. By default, the file name is not shown in this case. +For matching lines, the file name is followed by a colon; for context lines, a +hyphen separator is used. If a line number is also being output, it follows the +file name. When the \fB-M\fP option causes a pattern to match more than one +line, only the first is preceded by the file name. This option overrides any +previous \fB-h\fP, \fB-l\fP, or \fB-L\fP options. +.TP +\fB-h\fP, \fB--no-filename\fP +Suppress the output of file names when searching multiple files. By default, +file names are shown when multiple files are searched. For matching lines, the +file name is followed by a colon; for context lines, a hyphen separator is used. +If a line number is also being output, it follows the file name. This option +overrides any previous \fB-H\fP, \fB-L\fP, or \fB-l\fP options. +.TP +\fB--heap-limit\fP=\fInumber\fP +See \fB--match-limit\fP below. +.TP +\fB--help\fP +Output a help message, giving brief details of the command options and file +type support, and then exit. Anything else on the command line is +ignored. +.TP +\fB-I\fP +Ignore binary files. This is equivalent to +\fB--binary-files\fP=\fIwithout-match\fP. +.TP +\fB-i\fP, \fB--ignore-case\fP +Ignore upper/lower case distinctions during comparisons. +.TP +\fB--include\fP=\fIpattern\fP +If any \fB--include\fP patterns are specified, the only files that are +processed are those whose names match one of the patterns and do not match an +\fB--exclude\fP pattern. This option does not affect directories, but it +applies to all files, whether listed on the command line, obtained from +\fB--file-list\fP, or by scanning a directory. The pattern is a PCRE2 regular +expression, and is matched against the final component of the file name, not +the entire path. The \fB-F\fP, \fB-w\fP, and \fB-x\fP options do not apply to +this pattern.
The option may be given any number of times. If a file name +matches both an \fB--include\fP and an \fB--exclude\fP pattern, it is excluded. +There is no short form for this option. +.TP +\fB--include-from=\fP\fIfilename\fP +Treat each non-empty line of the file as the data for an \fB--include\fP +option. What constitutes a newline for this purpose is the operating system's +default. The \fB--newline\fP option has no effect on this option. This option +may be given any number of times; all the files are read. +.TP +\fB--include-dir\fP=\fIpattern\fP +If any \fB--include-dir\fP patterns are specified, the only directories that +are processed are those whose names match one of the patterns and do not match +an \fB--exclude-dir\fP pattern. This applies to all directories, whether listed +on the command line, obtained from \fB--file-list\fP, or by scanning a parent +directory. The pattern is a PCRE2 regular expression, and is matched against +the final component of the directory name, not the entire path. The \fB-F\fP, +\fB-w\fP, and \fB-x\fP options do not apply to this pattern. The option may be +given any number of times. If a directory matches both \fB--include-dir\fP and +\fB--exclude-dir\fP, it is excluded. There is no short form for this option. +.TP +\fB-L\fP, \fB--files-without-match\fP +Instead of outputting lines from the files, just output the names of the files +that do not contain any lines that would have been output. Each file name is +output once, on a separate line. This option overrides any previous \fB-H\fP, +\fB-h\fP, or \fB-l\fP options. +.TP +\fB-l\fP, \fB--files-with-matches\fP +Instead of outputting lines from the files, just output the names of the files +containing lines that would have been output. Each file name is output once, on +a separate line. Searching normally stops as soon as a matching line is found +in a file. 
However, if the \fB-c\fP (count) option is also used, matching +continues in order to obtain the correct count, and those files that have at +least one match are listed along with their counts. Using this option with +\fB-c\fP is a way of suppressing the listing of files with no matches that +occurs with \fB-c\fP on its own. This option overrides any previous \fB-H\fP, +\fB-h\fP, or \fB-L\fP options. +.TP +\fB--label\fP=\fIname\fP +This option supplies a name to be used for the standard input when file names +are being output. If not supplied, "(standard input)" is used. There is no +short form for this option. +.TP +\fB--line-buffered\fP +When this option is given, non-compressed input is read and processed line by +line, and the output is flushed after each write. By default, input is read in +large chunks, unless \fBpcre2grep\fP can determine that it is reading from a +terminal, which is currently possible only in Unix-like environments or +Windows. Output to terminal is normally automatically flushed by the operating +system. This option can be useful when the input or output is attached to a +pipe and you do not want \fBpcre2grep\fP to buffer up large amounts of data. +However, its use will affect performance, and the \fB-M\fP (multiline) option +ceases to work. When input is from a compressed .gz or .bz2 file, +\fB--line-buffered\fP is ignored. +.TP +\fB--line-offsets\fP +Instead of showing lines or parts of lines that match, show each match as a +line number, the offset from the start of the line, and a length. The line +number is terminated by a colon (as usual; see the \fB-n\fP option), and the +offset and length are separated by a comma. In this mode, no context is shown. +That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are ignored. If there is +more than one match in a line, each of them is shown separately. This option is +mutually exclusive with \fB--output\fP, \fB--file-offsets\fP, and +\fB--only-matching\fP. 
+.TP +\fB--locale\fP=\fIlocale-name\fP +This option specifies a locale to be used for pattern matching. It overrides +the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no +locale is specified, the PCRE2 library's default (usually the "C" locale) is +used. There is no short form for this option. +.TP +\fB-M\fP, \fB--multiline\fP +Allow patterns to match more than one line. When this option is set, the PCRE2 +library is called in "multiline" mode. This allows a matched string to extend +past the end of a line and continue on one or more subsequent lines. Patterns +used with \fB-M\fP may usefully contain literal newline characters and internal +occurrences of ^ and $ characters. The output for a successful match may +consist of more than one line. The first line is the line in which the match +started, and the last line is the line in which the match ended. If the matched +string ends with a newline sequence, the output ends at the end of that line. +If \fB-v\fP is set, none of the lines in a multi-line match are output. Once a +match has been handled, scanning restarts at the beginning of the line after +the one in which the match ended. +.sp +The newline sequence that separates multiple lines must be matched as part of +the pattern. For example, to find the phrase "regular expression" in a file +where "regular" might be at the end of a line and "expression" at the start of +the next line, you could use this command: +.sp + pcre2grep -M 'regular\es+expression' +.sp +The \es escape sequence matches any white space character, including newlines, +and is followed by + so as to match trailing white space on the first line as +well as possibly handling a two-character newline sequence. +.sp +There is a limit to the number of lines that can be matched, imposed by the way +that \fBpcre2grep\fP buffers the input file as it scans it. 
With a sufficiently +large processing buffer, this should not be a problem, but the \fB-M\fP option +does not work when input is read line by line (see \fB--line-buffered\fP.) +.TP +\fB-m\fP \fInumber\fP, \fB--max-count\fP=\fInumber\fP +Stop processing after finding \fInumber\fP matching lines, or non-matching +lines if \fB-v\fP is also set. Any trailing context lines are output after the +final match. In multiline mode, each multiline match counts as just one line +for this purpose. If this limit is reached when reading the standard input from +a regular file, the file is left positioned just after the last matching line. +If \fB-c\fP is also set, the count that is output is never greater than +\fInumber\fP. This option has no effect if used with \fB-L\fP, \fB-l\fP, or +\fB-q\fP, or when just checking for a match in a binary file. +.TP +\fB--match-limit\fP=\fInumber\fP +Processing some regular expression patterns may take a very long time to search +for all possible matching strings. Others may require a very large amount of +memory. There are three options that set resource limits for matching. +.sp +The \fB--match-limit\fP option provides a means of limiting computing resource +usage when processing patterns that are not going to match, but which have a +very large number of possibilities in their search trees. The classic example +is a pattern that uses nested unlimited repeats. Internally, PCRE2 has a +counter that is incremented each time around its main processing loop. If the +value set by \fB--match-limit\fP is reached, an error occurs. +.sp +The \fB--heap-limit\fP option specifies, as a number of kibibytes (units of +1024 bytes), the amount of heap memory that may be used for matching. Heap +memory is needed only if matching the pattern requires a significant number of +nested backtracking points to be remembered. This parameter can be set to zero +to forbid the use of heap memory altogether. 
+.sp +The \fB--depth-limit\fP option limits the depth of nested backtracking points, +which indirectly limits the amount of memory that is used. The amount of memory +needed for each backtracking point depends on the number of capturing +parentheses in the pattern, so the amount of memory that is used before this +limit acts varies from pattern to pattern. This limit is of use only if it is +set smaller than \fB--match-limit\fP. +.sp +There are no short forms for these options. The default limits can be set +when the PCRE2 library is compiled; if they are not specified, the defaults +are very large and so effectively unlimited. +.TP +\fB--max-buffer-size\fP=\fInumber\fP +This limits the expansion of the processing buffer, whose initial size can be +set by \fB--buffer-size\fP. The maximum buffer size is silently forced to be no +smaller than the starting buffer size. +.TP +\fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP +Six different conventions for indicating the ends of lines in scanned files are +supported. For example: +.sp + pcre2grep -N CRLF 'some pattern' +.sp +The newline type may be specified in upper, lower, or mixed case. If the +newline type is NUL, lines are separated by binary zero characters. The other +types are the single-character sequences CR (carriage return) and LF +(linefeed), the two-character sequence CRLF, an "anycrlf" type, which +recognizes any of the preceding three types, and an "any" type, for which any +Unicode line ending sequence is assumed to end a line. The Unicode sequences +are the three just mentioned, plus VT (vertical tab, U+000B), FF (form feed, +U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS +(paragraph separator, U+2029). +.sp +When the PCRE2 library is built, a default line-ending sequence is specified. +This is normally the standard sequence for the operating system. Unless +otherwise specified by this option, \fBpcre2grep\fP uses the library's default. 
+.sp +This option makes it possible to use \fBpcre2grep\fP to scan files that have +come from other environments without having to modify their line endings. If +the data that is being scanned does not agree with the convention set by this +option, \fBpcre2grep\fP may behave in strange ways. Note that this option does +not apply to files specified by the \fB-f\fP, \fB--exclude-from\fP, or +\fB--include-from\fP options, which are expected to use the operating system's +standard newline sequence. +.TP +\fB-n\fP, \fB--line-number\fP +Precede each output line by its line number in the file, followed by a colon +for matching lines or a hyphen for context lines. If the file name is also +being output, it precedes the line number. When the \fB-M\fP option causes a +pattern to match more than one line, only the first is preceded by its line +number. This option is forced if \fB--line-offsets\fP is used. +.TP +\fB--no-jit\fP +If the PCRE2 library is built with support for just-in-time compiling (which +speeds up matching), \fBpcre2grep\fP automatically makes use of this, unless it +was explicitly disabled at build time. This option can be used to disable the +use of JIT at run time. It is provided for testing and working round problems. +It should never be needed in normal use. +.TP +\fB-O\fP \fItext\fP, \fB--output\fP=\fItext\fP +When there is a match, instead of outputting the line that matched, output just +the text specified in this option, followed by an operating-system standard +newline. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, +and \fB-C\fP options are ignored. The \fB--newline\fP option has no effect on +this option, which is mutually exclusive with \fB--only-matching\fP, +\fB--file-offsets\fP, and \fB--line-offsets\fP. However, like +\fB--only-matching\fP, if there is more than one match in a line, each of them +causes a line of output. 
+.sp +Escape sequences starting with a dollar character may be used to insert the +contents of the matched part of the line and/or captured substrings into the +text. +.sp +$<digits> or ${<digits>} is replaced by the captured substring of the given +decimal number; zero substitutes the whole match. If the number is greater than +the number of capturing substrings, or if the capture is unset, the replacement +is empty. +.sp +$a is replaced by bell; $b by backspace; $e by escape; $f by form feed; $n by +newline; $r by carriage return; $t by tab; $v by vertical tab. +.sp +$o<digits> or $o{<digits>} is replaced by the character whose code point is the +given octal number. In the first form, up to three octal digits are processed. +When more digits are needed in Unicode mode to specify a wide character, the +second form must be used. +.sp +$x<digits> or $x{<digits>} is replaced by the character represented by the +given hexadecimal number. In the first form, up to two hexadecimal digits are +processed. When more digits are needed in Unicode mode to specify a wide +character, the second form must be used. +.sp +Any other character is substituted by itself. In particular, $$ is replaced by +a single dollar. +.TP +\fB-o\fP, \fB--only-matching\fP +Show only the part of the line that matched a pattern instead of the whole +line. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and +\fB-C\fP options are ignored. If there is more than one match in a line, each +of them is shown separately, on a separate line of output. If \fB-o\fP is +combined with \fB-v\fP (invert the sense of the match to find non-matching +lines), no output is generated, but the return code is set appropriately. If +the matched portion of the line is empty, nothing is output unless the file +name or line number are being printed, in which case they are shown on an +otherwise empty line. This option is mutually exclusive with \fB--output\fP, +\fB--file-offsets\fP and \fB--line-offsets\fP.
+.TP +\fB-o\fP\fInumber\fP, \fB--only-matching\fP=\fInumber\fP +Show only the part of the line that matched the capturing parentheses of the +given number. Up to 50 capturing parentheses are supported by default. This +limit can be changed via the \fB--om-capture\fP option. A pattern may contain +any number of capturing parentheses, but only those whose number is within the +limit can be accessed by \fB-o\fP. An error occurs if the number specified by +\fB-o\fP is greater than the limit. +.sp +-o0 is the same as \fB-o\fP without a number. Because these options can be +given without an argument (see above), if an argument is present, it must be +given in the same shell item, for example, -o3 or --only-matching=2. The +comments given for the non-argument case above also apply to this option. If +the specified capturing parentheses do not exist in the pattern, or were not +set in the match, nothing is output unless the file name or line number are +being output. +.sp +If this option is given multiple times, multiple substrings are output for each +match, in the order the options are given, and all on one line. For example, +-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and +then 3 again to be output. By default, there is no separator (but see the next +but one option). +.TP +\fB--om-capture\fP=\fInumber\fP +Set the number of capturing parentheses that can be accessed by \fB-o\fP. The +default is 50. +.TP +\fB--om-separator\fP=\fItext\fP +Specify a separating string for multiple occurrences of \fB-o\fP. The default +is an empty string. Separating strings are never coloured. +.TP +\fB-q\fP, \fB--quiet\fP +Work quietly, that is, display nothing except error messages. The exit +status indicates whether or not any matches were found. +.TP +\fB-r\fP, \fB--recursive\fP +If any given path is a directory, recursively scan the files it contains, +taking note of any \fB--include\fP and \fB--exclude\fP settings. 
By default, a +directory is read as a normal file; in some operating systems this gives an +immediate end-of-file. This option is a shorthand for setting the \fB-d\fP +option to "recurse". +.TP +\fB--recursion-limit\fP=\fInumber\fP +This is an obsolete synonym for \fB--depth-limit\fP. See \fB--match-limit\fP +above for details. +.TP +\fB-s\fP, \fB--no-messages\fP +Suppress error messages about non-existent or unreadable files. Such files are +quietly skipped. However, the return code is still 2, even if matches were +found in other files. +.TP +\fB-t\fP, \fB--total-count\fP +This option is useful when scanning more than one file. If used on its own, +\fB-t\fP suppresses all output except for a grand total number of matching +lines (or non-matching lines if \fB-v\fP is used) in all the files. If \fB-t\fP +is used with \fB-c\fP, a grand total is output except when the previous output +is just one line. In other words, it is not output when just one file's count +is listed. If file names are being output, the grand total is preceded by +"TOTAL:". Otherwise, it appears as just another number. The \fB-t\fP option is +ignored when used with \fB-L\fP (list files without matches), because the grand +total would always be zero. +.TP +\fB-u\fP, \fB--utf\fP +Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled +with UTF-8 support. All patterns (including those for any \fB--exclude\fP and +\fB--include\fP options) and all lines that are scanned must be valid strings +of UTF-8 characters. If an invalid UTF-8 string is encountered, an error +occurs. +.TP +\fB-U\fP, \fB--utf-allow-invalid\fP +As \fB--utf\fP, but in addition subject lines may contain invalid UTF-8 code +unit sequences. These can never form part of any pattern match. Patterns +themselves, however, must still be valid UTF-8 strings. This facility allows +valid UTF-8 strings to be sought within arbitrary byte sequences in executable +or other binary files. 
For more details about matching in non-valid UTF-8 +strings, see the +.\" HREF +\fBpcre2unicode\fP(3) +.\" +documentation. +.TP +\fB-V\fP, \fB--version\fP +Write the version numbers of \fBpcre2grep\fP and the PCRE2 library to the +standard output and then exit. Anything else on the command line is +ignored. +.TP +\fB-v\fP, \fB--invert-match\fP +Invert the sense of the match, so that lines which do \fInot\fP match any of +the patterns are the ones that are found. When this option is set, options such +as \fB--only-matching\fP and \fB--output\fP, which specify parts of a match +that are to be output, are ignored. +.TP +\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP +Force the patterns only to match "words". That is, there must be a word +boundary at the start and end of each matched string. This is equivalent to +having "\eb(?:" at the start of each pattern, and ")\eb" at the end. This +option applies only to the patterns that are matched against the contents of +files; it does not apply to patterns specified by any of the \fB--include\fP or +\fB--exclude\fP options. +.TP +\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP +Force the patterns to start matching only at the beginnings of lines, and in +addition, require them to match entire lines. In multiline mode the match may +be more than one line. This is equivalent to having "^(?:" at the start of each +pattern and ")$" at the end. This option applies only to the patterns that are +matched against the contents of files; it does not apply to patterns specified +by any of the \fB--include\fP or \fB--exclude\fP options. +. +. +.SH "ENVIRONMENT VARIABLES" +.rs +.sp +The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that +order, for a locale. The first one that is set is used. This can be overridden +by the \fB--locale\fP option. If no locale is set, the PCRE2 library's default +(usually the "C" locale) is used. +. +. 
+.SH "NEWLINES" +.rs +.sp +The \fB-N\fP (\fB--newline\fP) option allows \fBpcre2grep\fP to scan files with +newline conventions that differ from the default. This option affects only the +way scanned files are processed. It does not affect the interpretation of files +specified by the \fB-f\fP, \fB--file-list\fP, \fB--exclude-from\fP, or +\fB--include-from\fP options. +.P +Any parts of the scanned input files that are written to the standard output +are copied with whatever newline sequences they have in the input. However, if +the final line of a file is output, and it does not end with a newline +sequence, a newline sequence is added. If the newline setting is CR, LF, CRLF +or NUL, that line ending is output; for the other settings (ANYCRLF or ANY) a +single NL is used. +.P +The newline setting does not affect the way in which \fBpcre2grep\fP writes +newlines in informational messages to the standard output and error streams. +Under Windows, the standard output is set to be binary, so that "\er\en" at the +ends of output lines that are copied from the input is not converted to +"\er\er\en" by the C I/O library. This means that any messages written to the +standard output must end with "\er\en". For all other operating systems, and +for all messages to the standard error stream, "\en" is used. +. +. +.SH "OPTIONS COMPATIBILITY" +.rs +.sp +Many of the short and long forms of \fBpcre2grep\fP's options are the same +as in the GNU \fBgrep\fP program. Any long option of the form +\fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP +(PCRE2 terminology). 
However, the \fB--depth-limit\fP, \fB--file-list\fP, +\fB--file-offsets\fP, \fB--heap-limit\fP, \fB--include-dir\fP, +\fB--line-offsets\fP, \fB--locale\fP, \fB--match-limit\fP, \fB-M\fP, +\fB--multiline\fP, \fB-N\fP, \fB--newline\fP, \fB--om-separator\fP, +\fB--output\fP, \fB-u\fP, \fB--utf\fP, \fB-U\fP, and \fB--utf-allow-invalid\fP +options are specific to \fBpcre2grep\fP, as is the use of the +\fB--only-matching\fP option with a capturing parentheses number. +.P +Although most of the common options work the same way, a few are different in +\fBpcre2grep\fP. For example, the \fB--include\fP option's argument is a glob +for GNU \fBgrep\fP, but a regular expression for \fBpcre2grep\fP. If both the +\fB-c\fP and \fB-l\fP options are given, GNU grep lists only file names, +without counts, but \fBpcre2grep\fP gives the counts as well. +. +. +.SH "OPTIONS WITH DATA" +.rs +.sp +There are four different ways in which an option with data can be specified. +If a short form option is used, the data may follow immediately, or (with one +exception) in the next command line item. For example: +.sp + -f/some/file + -f /some/file +.sp +The exception is the \fB-o\fP option, which may appear with or without data. +Because of this, if data is present, it must follow immediately in the same +item, for example -o3. +.P +If a long form option is used, the data may appear in the same command line +item, separated by an equals character, or (with two exceptions) it may appear +in the next command line item. For example: +.sp + --file=/some/file + --file /some/file +.sp +Note, however, that if you want to supply a file name beginning with ~ as data +in a shell command, and have the shell expand ~ to a home directory, you must +separate the file name from the option, because the shell does not treat ~ +specially unless it is at the start of an item. 
+.P +The exceptions to the above are the \fB--colour\fP (or \fB--color\fP) and +\fB--only-matching\fP options, for which the data is optional. If one of these +options does have data, it must be given in the first form, using an equals +character. Otherwise \fBpcre2grep\fP will assume that it has no data. +. +. +.SH "USING PCRE2'S CALLOUT FACILITY" +.rs +.sp +\fBpcre2grep\fP has, by default, support for calling external programs or +scripts or echoing specific strings during matching by making use of PCRE2's +callout facility. However, this support can be completely or partially disabled +when \fBpcre2grep\fP is built. You can find out whether your binary has support +for callouts by running it with the \fB--help\fP option. If callout support is +completely disabled, all callouts in patterns are ignored by \fBpcre2grep\fP. +If the facility is partially disabled, calling external programs is not +supported, and callouts that request it are ignored. +.P +A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is +either a number or a quoted string (see the +.\" HREF +\fBpcre2callout\fP +.\" +documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP; +only callouts with string arguments are useful. +. +. +.SS "Echoing a specific string" +.rs +.sp +Starting the callout string with a pipe character invokes an echoing facility +that avoids calling an external program or script. This facility is always +available, provided that callouts were not completely disabled when +\fBpcre2grep\fP was built. The rest of the callout string is processed as a +zero-terminated string, which means it should not contain any internal binary +zeros. It is written to the output, having first been passed through the same +escape processing as text from the \fB--output\fP (\fB-O\fP) option (see +above). However, $0 cannot be used to insert a matched substring because the +match is still in progress. Instead, the single character '0' is inserted.
Any +syntax errors in the string (for example, a dollar not followed by another +character) causes the callout to be ignored. No terminator is added to the +output string, so if you want a newline, you must include it explicitly using +the escape $n. For example: +.sp + pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' +.sp +Matching continues normally after the string is output. If you want to see only +the callout output but not any output from an actual match, you should end the +pattern with (*FAIL). +. +. +.SS "Calling external programs or scripts" +.rs +.sp +This facility can be independently disabled when \fBpcre2grep\fP is built. It +is supported for Windows, where a call to \fB_spawnvp()\fP is used, for VMS, +where \fBlib$spawn()\fP is used, and for any Unix-like environment where +\fBfork()\fP and \fBexecv()\fP are available. +.P +If the callout string does not start with a pipe (vertical bar) character, it +is parsed into a list of substrings separated by pipe characters. The first +substring must be an executable name, with the following substrings specifying +arguments: +.sp + executable_name|arg1|arg2|... +.sp +Any substring (including the executable name) may contain escape sequences +started by a dollar character. These are the same as for the \fB--output\fP +(\fB-O\fP) option documented above, except that $0 cannot insert the matched +string because the match is still in progress. Instead, the character '0' +is inserted. If you need a literal dollar or pipe character in any +substring, use $$ or $| respectively. Here is an example: +.sp + echo -e "abcde\en12345" | pcre2grep \e + '(?x)(.)(..(.)) + (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' - +.sp + Output: +.sp + Arg1: [a] [bcd] [d] Arg2: |a| () + abcde + Arg1: [1] [234] [4] Arg2: |1| () + 12345 +.sp +The parameters for the system call that is used to run the program or script +are zero-terminated strings. 
This means that binary zero characters in the +callout argument will cause premature termination of their substrings, and +therefore should not be present. Any syntax errors in the string (for example, +a dollar not followed by another character) causes the callout to be ignored. +If running the program fails for any reason (including the non-existence of the +executable), a local matching failure occurs and the matcher backtracks in the +normal way. +. +. +.SH "MATCHING ERRORS" +.rs +.sp +It is possible to supply a regular expression that takes a very long time to +fail to match certain lines. Such patterns normally involve nested indefinite +repeats, for example: (a+)*\ed when matched against a line of a's with no final +digit. The PCRE2 matching function has a resource limit that causes it to abort +in these circumstances. If this happens, \fBpcre2grep\fP outputs an error +message and the line that caused the problem to the standard error stream. If +there are more than 20 such errors, \fBpcre2grep\fP gives up. +.P +The \fB--match-limit\fP option of \fBpcre2grep\fP can be used to set the +overall resource limit. There are also other limits that affect the amount of +memory used during matching; see the discussion of \fB--heap-limit\fP and +\fB--depth-limit\fP above. +. +. +.SH DIAGNOSTICS +.rs +.sp +Exit status is 0 if any matches were found, 1 if no matches were found, and 2 +for syntax errors, overlong lines, non-existent or inaccessible files (even if +matches were found in other files) or too many matching errors. Using the +\fB-s\fP option to suppress error messages about inaccessible files does not +affect the return code. +.P +When run under VMS, the return code is placed in the symbol PCRE2GREP_RC +because VMS does not distinguish between exit(0) and exit(1). +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2callout\fP(3), +\fBpcre2unicode\fP(3). +. +. 
+.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 04 October 2020 +Copyright (c) 1997-2020 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2grep.txt b/src/pcre2/doc/pcre2grep.txt new file mode 100644 index 00000000..0e839c70 --- /dev/null +++ b/src/pcre2/doc/pcre2grep.txt @@ -0,0 +1,1020 @@ +PCRE2GREP(1) General Commands Manual PCRE2GREP(1) + + + +NAME + pcre2grep - a grep with Perl-compatible regular expressions. + +SYNOPSIS + pcre2grep [options] [long options] [pattern] [path1 path2 ...] + + +DESCRIPTION + + pcre2grep searches files for character patterns, in the same way as + other grep commands do, but it uses the PCRE2 regular expression li- + brary to support patterns that are compatible with the regular expres- + sions of Perl 5. See pcre2syntax(3) for a quick-reference summary of + pattern syntax, or pcre2pattern(3) for a full description of the syntax + and semantics of the regular expressions that PCRE2 supports. + + Patterns, whether supplied on the command line or in a separate file, + are given without delimiters. For example: + + pcre2grep Thursday /etc/motd + + If you attempt to use delimiters (for example, by surrounding a pattern + with slashes, as is common in Perl scripts), they are interpreted as + part of the pattern. Quotes can of course be used to delimit patterns + on the command line because they are interpreted by the shell, and in- + deed quotes are required if a pattern contains white space or shell + metacharacters. + + The first argument that follows any option settings is treated as the + single pattern to be matched when neither -e nor -f is present. Con- + versely, when one or both of these options are used to specify pat- + terns, all arguments are treated as path names. At least one of -e, -f, + or an argument pattern must be provided. + + If no files are specified, pcre2grep reads the standard input. 
The + standard input can also be referenced by a name consisting of a single + hyphen. For example: + + pcre2grep some-pattern file1 - file3 + + Input files are searched line by line. By default, each line that + matches a pattern is copied to the standard output, and if there is + more than one file, the file name is output at the start of each line, + followed by a colon. However, there are options that can change how + pcre2grep behaves. In particular, the -M option makes it possible to + search for strings that span line boundaries. What defines a line + boundary is controlled by the -N (--newline) option. + + The amount of memory used for buffering files that are being scanned is + controlled by parameters that can be set by the --buffer-size and + --max-buffer-size options. The first of these sets the size of buffer + that is obtained at the start of processing. If an input file contains + very long lines, a larger buffer may be needed; this is handled by au- + tomatically extending the buffer, up to the limit specified by --max- + buffer-size. The default values for these parameters can be set when + pcre2grep is built; if nothing is specified, the defaults are set to + 20KiB and 1MiB respectively. An error occurs if a line is too long and + the buffer can no longer be expanded. + + The block of memory that is actually used is three times the "buffer + size", to allow for buffering "before" and "after" lines. If the buffer + size is too small, fewer than requested "before" and "after" lines may + be output. + + Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the + greater. BUFSIZ is defined in <stdio.h>. When there is more than one + pattern (specified by the use of -e and/or -f), each pattern is applied + to each line in the order in which they are defined, except that all + the -e patterns are tried before the -f patterns. + + By default, as soon as one pattern matches a line, no further patterns + are considered.
However, if --colour (or --color) is used to colour the + matching substrings, or if --only-matching, --file-offsets, or --line- + offsets is used to output only the part of the line that matched (ei- + ther shown literally, or as an offset), scanning resumes immediately + following the match, so that further matches on the same line can be + found. If there are multiple patterns, they are all tried on the re- + mainder of the line, but patterns that follow the one that matched are + not tried on the earlier matched part of the line. + + This behaviour means that the order in which multiple patterns are + specified can affect the output when one of the above options is used. + This is no longer the same behaviour as GNU grep, which now manages to + display earlier matches for later patterns (as long as there is no + overlap). + + Patterns that can match an empty string are accepted, but empty string + matches are never recognized. An example is the pattern "(su- + per)?(man)?", in which all components are optional. This pattern finds + all occurrences of both "super" and "man"; the output differs from + matching with "super|man" when only the matching substrings are being + shown. + + If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses + the value to set a locale when calling the PCRE2 library. The --locale + option can be used to override this. + + +SUPPORT FOR COMPRESSED FILES + + It is possible to compile pcre2grep so that it uses libz or libbz2 to + read compressed files whose names end in .gz or .bz2, respectively. You + can find out whether your pcre2grep binary has support for one or both + of these file types by running it with the --help option. If the appro- + priate support is not present, all files are treated as plain text. The + standard input is always so treated. When input is from a compressed + .gz or .bz2 file, the --line-buffered option is ignored. 
+ + +BINARY FILES + + By default, a file that contains a binary zero byte within the first + 1024 bytes is identified as a binary file, and is processed specially. + However, if the newline type is specified as NUL, that is, the line + terminator is a binary zero, the test for a binary file is not applied. + See the --binary-files option for a means of changing the way binary + files are handled. + + +BINARY ZEROS IN PATTERNS + + Patterns passed from the command line are strings that are terminated + by a binary zero, so cannot contain internal zeros. However, patterns + that are read from a file via the -f option may contain binary zeros. + + +OPTIONS + + The order in which some of the options appear can affect the output. + For example, both the -H and -l options affect the printing of file + names. Whichever comes later in the command line will be the one that + takes effect. Similarly, except where noted below, if an option is + given twice, the later setting is used. Numerical values for options + may be followed by K or M, to signify multiplication by 1024 or + 1024*1024 respectively. + + -- This terminates the list of options. It is useful if the next + item on the command line starts with a hyphen but is not an + option. This allows for the processing of patterns and file + names that start with hyphens. + + -A number, --after-context=number + Output up to number lines of context after each matching + line. Fewer lines are output if the next match or the end of + the file is reached, or if the processing buffer size has + been set too small. If file names and/or line numbers are be- + ing output, a hyphen separator is used instead of a colon for + the context lines. A line containing "--" is output between + each group of lines, unless they are in fact contiguous in + the input file. The value of number is expected to be rela- + tively small. When -c is used, -A is ignored. + + -a, --text + Treat binary files as text. 
This is equivalent to --binary- + files=text. + + -B number, --before-context=number + Output up to number lines of context before each matching + line. Fewer lines are output if the previous match or the + start of the file is within number lines, or if the process- + ing buffer size has been set too small. If file names and/or + line numbers are being output, a hyphen separator is used in- + stead of a colon for the context lines. A line containing + "--" is output between each group of lines, unless they are + in fact contiguous in the input file. The value of number is + expected to be relatively small. When -c is used, -B is ig- + nored. + + --binary-files=word + Specify how binary files are to be processed. If the word is + "binary" (the default), pattern matching is performed on bi- + nary files, but the only output is "Binary file <name> + matches" when a match succeeds. If the word is "text", which + is equivalent to the -a or --text option, binary files are + processed in the same way as any other file. In this case, + when a match succeeds, the output may be binary garbage, + which can have nasty effects if sent to a terminal. If the + word is "without-match", which is equivalent to the -I op- + tion, binary files are not processed at all; they are assumed + not to be of interest and are skipped without causing any + output or affecting the return code. + + --buffer-size=number + Set the parameter that controls how much memory is obtained + at the start of processing for buffering files that are being + scanned. See also --max-buffer-size below. + + -C number, --context=number + Output number lines of context both before and after each + matching line. This is equivalent to setting both -A and -B + to the same value. + + -c, --count + Do not output lines from the files that are being scanned; + instead output the number of lines that would have been + shown, either because they matched, or, if -v is set, because + they failed to match.
By default, this count is exactly the + same as the number of lines that would have been output, but + if the -M (multiline) option is used (without -v), there may + be more suppressed lines than the count (that is, the number + of matches). + + If no lines are selected, the number zero is output. If sev- + eral files are being scanned, a count is output for each + of them and the -t option can be used to cause a total to be + output at the end. However, if the --files-with-matches op- + tion is also used, only those files whose counts are greater + than zero are listed. When -c is used, the -A, -B, and -C op- + tions are ignored. + + --colour, --color + If this option is given without any data, it is equivalent to + "--colour=auto". If data is required, it must be given in + the same shell item, separated by an equals sign. + + --colour=value, --color=value + This option specifies under what circumstances the parts of a + line that matched a pattern should be coloured in the output. + By default, the output is not coloured. The value (which is + optional, see above) may be "never", "always", or "auto". In + the latter case, colouring happens only if the standard out- + put is connected to a terminal. More resources are used when + colouring is enabled, because pcre2grep has to search for all + possible matches in a line, not just one, in order to colour + them all. + + The colour that is used can be specified by setting one of + the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, + PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that + order. If none of these are set, pcre2grep looks for + GREP_COLORS or GREP_COLOR (in that order). The value of the + variable should be a string of two numbers, separated by a + semicolon, except in the case of GREP_COLORS, which must + start with "ms=" or "mt=" followed by two semicolon-separated + colours, terminated by the end of the string or by a colon.
+ If GREP_COLORS does not start with "ms=" or "mt=" it is ig- + nored, and GREP_COLOR is checked. + + If the string obtained from one of the above variables con- + tains any characters other than semicolon or digits, the set- + ting is ignored and the default colour is used. The string is + copied directly into the control string for setting colour on + a terminal, so it is your responsibility to ensure that the + values make sense. If no relevant environment variable is + set, the default is "1;31", which gives red. + + -D action, --devices=action + If an input path is not a regular file or a directory, "ac- + tion" specifies how it is to be processed. Valid values are + "read" (the default) or "skip" (silently skip the path). + + -d action, --directories=action + If an input path is a directory, "action" specifies how it is + to be processed. Valid values are "read" (the default in + non-Windows environments, for compatibility with GNU grep), + "recurse" (equivalent to the -r option), or "skip" (silently + skip the path, the default in Windows environments). In the + "read" case, directories are read as if they were ordinary + files. In some operating systems the effect of reading a di- + rectory like this is an immediate end-of-file; in others it + may provoke an error. + + --depth-limit=number + See --match-limit below. + + -e pattern, --regex=pattern, --regexp=pattern + Specify a pattern to be matched. This option can be used mul- + tiple times in order to specify several patterns. It can also + be used as a way of specifying a single pattern that starts + with a hyphen. When -e is used, no argument pattern is taken + from the command line; all arguments are treated as file + names. There is no limit to the number of patterns. They are + applied to each line in the order in which they are defined + until one matches. 
+ + If -f is used with -e, the command line patterns are matched + first, followed by the patterns from the file(s), independent + of the order in which these options are specified. Note that + multiple use of -e is not the same as a single pattern with + alternatives. For example, X|Y finds the first character in a + line that is X or Y, whereas if the two patterns are given + separately, with X first, pcre2grep finds X if it is present, + even if it follows Y in the line. It finds Y only if there is + no X in the line. This matters only if you are using -o or + --colo(u)r to show the part(s) of the line that matched. + + --exclude=pattern + Files (but not directories) whose names match the pattern are + skipped without being processed. This applies to all files, + whether listed on the command line, obtained from --file- + list, or by scanning a directory. The pattern is a PCRE2 reg- + ular expression, and is matched against the final component + of the file name, not the entire path. The -F, -w, and -x op- + tions do not apply to this pattern. The option may be given + any number of times in order to specify multiple patterns. If + a file name matches both an --include and an --exclude pat- + tern, it is excluded. There is no short form for this option. + + --exclude-from=filename + Treat each non-empty line of the file as the data for an + --exclude option. What constitutes a newline when reading the + file is the operating system's default. The --newline option + has no effect on this option. This option may be given more + than once in order to specify a number of files to read. + + --exclude-dir=pattern + Directories whose names match the pattern are skipped without + being processed, whatever the setting of the --recursive op- + tion. This applies to all directories, whether listed on the + command line, obtained from --file-list, or by scanning a + parent directory. 
The pattern is a PCRE2 regular expression, + and is matched against the final component of the directory + name, not the entire path. The -F, -w, and -x options do not + apply to this pattern. The option may be given any number of + times in order to specify more than one pattern. If a direc- + tory matches both --include-dir and --exclude-dir, it is ex- + cluded. There is no short form for this option. + + -F, --fixed-strings + Interpret each data-matching pattern as a list of fixed + strings, separated by newlines, instead of as a regular ex- + pression. What constitutes a newline for this purpose is con- + trolled by the --newline option. The -w (match as a word) and + -x (match whole line) options can be used with -F. They ap- + ply to each of the fixed strings. A line is selected if any + of the fixed strings are found in it (subject to -w or -x, if + present). This option applies only to the patterns that are + matched against the contents of files; it does not apply to + patterns specified by any of the --include or --exclude op- + tions. + + -f filename, --file=filename + Read patterns from the file, one per line, and match them + against each line of input. As is the case with patterns on + the command line, no delimiters should be used. What consti- + tutes a newline when reading the file is the operating sys- + tem's default interpretation of \n. The --newline option has + no effect on this option. Trailing white space is removed + from each line, and blank lines are ignored. An empty file + contains no patterns and therefore matches nothing. Patterns + read from a file in this way may contain binary zeros, which + are treated as ordinary data characters. See also the com- + ments about multiple patterns versus a single pattern with + alternatives in the description of -e above. + + If this option is given more than once, all the specified + files are read. A data line is output if any of the patterns + match it. 
A file name can be given as "-" to refer to the + standard input. When -f is used, patterns specified on the + command line using -e may also be present; they are tested + before the file's patterns. However, no other pattern is + taken from the command line; all arguments are treated as the + names of paths to be searched. + + --file-list=filename + Read a list of files and/or directories that are to be + scanned from the given file, one per line. What constitutes a + newline when reading the file is the operating system's de- + fault. Trailing white space is removed from each line, and + blank lines are ignored. These paths are processed before any + that are listed on the command line. The file name can be + given as "-" to refer to the standard input. If --file and + --file-list are both specified as "-", patterns are read + first. This is useful only when the standard input is a ter- + minal, from which further lines (the list of files) can be + read after an end-of-file indication. If this option is given + more than once, all the specified files are read. + + --file-offsets + Instead of showing lines or parts of lines that match, show + each match as an offset from the start of the file and a + length, separated by a comma. In this mode, no context is + shown. That is, the -A, -B, and -C options are ignored. If + there is more than one match in a line, each of them is shown + separately. This option is mutually exclusive with --output, + --line-offsets, and --only-matching. + + -H, --with-filename + Force the inclusion of the file name at the start of output + lines when searching a single file. By default, the file name + is not shown in this case. For matching lines, the file name + is followed by a colon; for context lines, a hyphen separator + is used. If a line number is also being output, it follows + the file name. When the -M option causes a pattern to match + more than one line, only the first is preceded by the file + name. 
This option overrides any previous -h, -l, or -L op- + tions. + + -h, --no-filename + Suppress the output file names when searching multiple files. + By default, file names are shown when multiple files are + searched. For matching lines, the file name is followed by a + colon; for context lines, a hyphen separator is used. If a + line number is also being output, it follows the file name. + This option overrides any previous -H, -L, or -l options. + + --heap-limit=number + See --match-limit below. + + --help Output a help message, giving brief details of the command + options and file type support, and then exit. Anything else + on the command line is ignored. + + -I Ignore binary files. This is equivalent to --binary- + files=without-match. + + -i, --ignore-case + Ignore upper/lower case distinctions during comparisons. + + --include=pattern + If any --include patterns are specified, the only files that + are processed are those whose names match one of the patterns + and do not match an --exclude pattern. This option does not + affect directories, but it applies to all files, whether + listed on the command line, obtained from --file-list, or by + scanning a directory. The pattern is a PCRE2 regular expres- + sion, and is matched against the final component of the file + name, not the entire path. The -F, -w, and -x options do not + apply to this pattern. The option may be given any number of + times. If a file name matches both an --include and an --ex- + clude pattern, it is excluded. There is no short form for + this option. + + --include-from=filename + Treat each non-empty line of the file as the data for an + --include option. What constitutes a newline for this purpose + is the operating system's default. The --newline option has + no effect on this option. This option may be given any number + of times; all the files are read. 
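The interaction of --include and --exclude described above can be sketched in Python. This is only an illustration of the stated rules (final path component only, exclusion wins over inclusion), not pcre2grep's implementation. The function name is_selected is hypothetical, and the unanchored re.search call is an assumption standing in for a PCRE2 match.

```python
import os
import re

def is_selected(path, includes=(), excludes=()):
    """Decide whether a file path would be processed.

    Patterns are tested against the final path component only.
    Assumption: re.search (unanchored) stands in for a PCRE2 match.
    """
    name = os.path.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False  # a name matching both kinds of pattern is excluded
    if includes:
        return any(re.search(p, name) for p in includes)
    return True  # no --include patterns: everything not excluded is kept

print(is_selected("src/test_main.c", includes=[r"\.c$"], excludes=[r"^test_"]))  # False
print(is_selected("test_dir/main.c", excludes=[r"^test_"]))                      # True
```

The second call returns True because only "main.c", not the directory part of the path, is compared with the pattern.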
+ + --include-dir=pattern + If any --include-dir patterns are specified, the only direc- + tories that are processed are those whose names match one of + the patterns and do not match an --exclude-dir pattern. This + applies to all directories, whether listed on the command + line, obtained from --file-list, or by scanning a parent di- + rectory. The pattern is a PCRE2 regular expression, and is + matched against the final component of the directory name, + not the entire path. The -F, -w, and -x options do not apply + to this pattern. The option may be given any number of times. + If a directory matches both --include-dir and --exclude-dir, + it is excluded. There is no short form for this option. + + -L, --files-without-match + Instead of outputting lines from the files, just output the + names of the files that do not contain any lines that would + have been output. Each file name is output once, on a sepa- + rate line. This option overrides any previous -H, -h, or -l + options. + + -l, --files-with-matches + Instead of outputting lines from the files, just output the + names of the files containing lines that would have been out- + put. Each file name is output once, on a separate line. + Searching normally stops as soon as a matching line is found + in a file. However, if the -c (count) option is also used, + matching continues in order to obtain the correct count, and + those files that have at least one match are listed along + with their counts. Using this option with -c is a way of sup- + pressing the listing of files with no matches that occurs + with -c on its own. This option overrides any previous -H, + -h, or -L options. + + --label=name + This option supplies a name to be used for the standard input + when file names are being output. If not supplied, "(standard + input)" is used. There is no short form for this option. 
+ + --line-buffered + When this option is given, non-compressed input is read and + processed line by line, and the output is flushed after each + write. By default, input is read in large chunks, unless + pcre2grep can determine that it is reading from a terminal, + which is currently possible only in Unix-like environments or + Windows. Output to terminal is normally automatically flushed + by the operating system. This option can be useful when the + input or output is attached to a pipe and you do not want + pcre2grep to buffer up large amounts of data. However, its + use will affect performance, and the -M (multiline) option + ceases to work. When input is from a compressed .gz or .bz2 + file, --line-buffered is ignored. + + --line-offsets + Instead of showing lines or parts of lines that match, show + each match as a line number, the offset from the start of the + line, and a length. The line number is terminated by a colon + (as usual; see the -n option), and the offset and length are + separated by a comma. In this mode, no context is shown. + That is, the -A, -B, and -C options are ignored. If there is + more than one match in a line, each of them is shown sepa- + rately. This option is mutually exclusive with --output, + --file-offsets, and --only-matching. + + --locale=locale-name + This option specifies a locale to be used for pattern match- + ing. It overrides the value in the LC_ALL or LC_CTYPE envi- + ronment variables. If no locale is specified, the PCRE2 li- + brary's default (usually the "C" locale) is used. There is no + short form for this option. + + -M, --multiline + Allow patterns to match more than one line. When this option + is set, the PCRE2 library is called in "multiline" mode. This + allows a matched string to extend past the end of a line and + continue on one or more subsequent lines. Patterns used with + -M may usefully contain literal newline characters and inter- + nal occurrences of ^ and $ characters. 
The output for a suc- + cessful match may consist of more than one line. The first + line is the line in which the match started, and the last + line is the line in which the match ended. If the matched + string ends with a newline sequence, the output ends at the + end of that line. If -v is set, none of the lines in a + multi-line match are output. Once a match has been handled, + scanning restarts at the beginning of the line after the one + in which the match ended. + + The newline sequence that separates multiple lines must be + matched as part of the pattern. For example, to find the + phrase "regular expression" in a file where "regular" might + be at the end of a line and "expression" at the start of the + next line, you could use this command: + + pcre2grep -M 'regular\s+expression' + + The \s escape sequence matches any white space character, in- + cluding newlines, and is followed by + so as to match trail- + ing white space on the first line as well as possibly han- + dling a two-character newline sequence. + + There is a limit to the number of lines that can be matched, + imposed by the way that pcre2grep buffers the input file as + it scans it. With a sufficiently large processing buffer, + this should not be a problem, but the -M option does not work + when input is read line by line (see --line-buffered.) + + -m number, --max-count=number + Stop processing after finding number matching lines, or non- + matching lines if -v is also set. Any trailing context lines + are output after the final match. In multiline mode, each + multiline match counts as just one line for this purpose. If + this limit is reached when reading the standard input from a + regular file, the file is left positioned just after the last + matching line. If -c is also set, the count that is output + is never greater than number. This option has no effect if + used with -L, -l, or -q, or when just checking for a match in + a binary file. 
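The 'regular\s+expression' example given under -M above can be tried out with a small Python sketch. Python's re module stands in for PCRE2 here, so this shows only why \s+ copes with trailing white space and a two-character newline sequence; it does not reproduce pcre2grep's buffering or output. The helper name spans_lines is hypothetical.

```python
import re

PATTERN = re.compile(r"regular\s+expression")

def spans_lines(text):
    """Return the matched text if the phrase is found, possibly across a line break."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(repr(spans_lines("a regular \r\nexpression here")))  # 'regular \r\nexpression'
print(repr(spans_lines("a regular\nexpression here")))     # 'regular\nexpression'
print(repr(spans_lines("a regular-expression here")))      # None
```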
+ + --match-limit=number + Processing some regular expression patterns may take a very + long time to search for all possible matching strings. Others + may require a very large amount of memory. There are three + options that set resource limits for matching. + + The --match-limit option provides a means of limiting comput- + ing resource usage when processing patterns that are not go- + ing to match, but which have a very large number of possibil- + ities in their search trees. The classic example is a pattern + that uses nested unlimited repeats. Internally, PCRE2 has a + counter that is incremented each time around its main pro- + cessing loop. If the value set by --match-limit is reached, + an error occurs. + + The --heap-limit option specifies, as a number of kibibytes + (units of 1024 bytes), the amount of heap memory that may be + used for matching. Heap memory is needed only if matching the + pattern requires a significant number of nested backtracking + points to be remembered. This parameter can be set to zero to + forbid the use of heap memory altogether. + + The --depth-limit option limits the depth of nested back- + tracking points, which indirectly limits the amount of memory + that is used. The amount of memory needed for each backtrack- + ing point depends on the number of capturing parentheses in + the pattern, so the amount of memory that is used before this + limit acts varies from pattern to pattern. This limit is of + use only if it is set smaller than --match-limit. + + There are no short forms for these options. The default lim- + its can be set when the PCRE2 library is compiled; if they + are not specified, the defaults are very large and so effec- + tively unlimited. + + --max-buffer-size=number + This limits the expansion of the processing buffer, whose + initial size can be set by --buffer-size. The maximum buffer + size is silently forced to be no smaller than the starting + buffer size. 
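The buffer-size rules can be summarised in a short Python sketch: numerical values may carry a K or M suffix, the maximum is forced to be no smaller than the starting size, and the block actually obtained is three times the buffer size. The function names parse_size and buffer_plan are hypothetical, and the default strings simply restate the documented 20KiB and 1MiB defaults; a particular build may use different values.

```python
def parse_size(text):
    """Parse a numerical option value with an optional K or M suffix."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    suffix = text[-1:]
    if suffix in multipliers:
        return int(text[:-1]) * multipliers[suffix]
    return int(text)

def buffer_plan(buffer_size="20K", max_buffer_size="1M"):
    """Return the starting size, the expansion limit, and the initial block size."""
    start = parse_size(buffer_size)
    # The maximum is silently raised to the starting size if it is smaller.
    limit = max(parse_size(max_buffer_size), start)
    # Three times the buffer size, to hold "before" and "after" lines.
    return {"start": start, "limit": limit, "initial_block": 3 * start}

print(buffer_plan())            # {'start': 20480, 'limit': 1048576, 'initial_block': 61440}
print(buffer_plan("2M", "1M"))  # limit is raised to 2097152
```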
+ + -N newline-type, --newline=newline-type + Six different conventions for indicating the ends of lines in + scanned files are supported. For example: + + pcre2grep -N CRLF 'some pattern' + + The newline type may be specified in upper, lower, or mixed + case. If the newline type is NUL, lines are separated by bi- + nary zero characters. The other types are the single-charac- + ter sequences CR (carriage return) and LF (linefeed), the + two-character sequence CRLF, an "anycrlf" type, which recog- + nizes any of the preceding three types, and an "any" type, + for which any Unicode line ending sequence is assumed to end + a line. The Unicode sequences are the three just mentioned, + plus VT (vertical tab, U+000B), FF (form feed, U+000C), NEL + (next line, U+0085), LS (line separator, U+2028), and PS + (paragraph separator, U+2029). + + When the PCRE2 library is built, a default line-ending se- + quence is specified. This is normally the standard sequence + for the operating system. Unless otherwise specified by this + option, pcre2grep uses the library's default. + + This option makes it possible to use pcre2grep to scan files + that have come from other environments without having to mod- + ify their line endings. If the data that is being scanned + does not agree with the convention set by this option, + pcre2grep may behave in strange ways. Note that this option + does not apply to files specified by the -f, --exclude-from, + or --include-from options, which are expected to use the op- + erating system's standard newline sequence. + + -n, --line-number + Precede each output line by its line number in the file, fol- + lowed by a colon for matching lines or a hyphen for context + lines. If the file name is also being output, it precedes the + line number. When the -M option causes a pattern to match + more than one line, only the first is preceded by its line + number. This option is forced if --line-offsets is used. 
+ + --no-jit If the PCRE2 library is built with support for just-in-time + compiling (which speeds up matching), pcre2grep automatically + makes use of this, unless it was explicitly disabled at build + time. This option can be used to disable the use of JIT at + run time. It is provided for testing and working round prob- + lems. It should never be needed in normal use. + + -O text, --output=text + When there is a match, instead of outputting the line that + matched, output just the text specified in this option, fol- + lowed by an operating-system standard newline. In this mode, + no context is shown. That is, the -A, -B, and -C options are + ignored. The --newline option has no effect on this option, + which is mutually exclusive with --only-matching, --file-off- + sets, and --line-offsets. However, like --only-matching, if + there is more than one match in a line, each of them causes a + line of output. + + Escape sequences starting with a dollar character may be used + to insert the contents of the matched part of the line and/or + captured substrings into the text. + + $<digits> or ${<digits>} is replaced by the captured sub- + string of the given decimal number; zero substitutes the + whole match. If the number is greater than the number of cap- + turing substrings, or if the capture is unset, the replace- + ment is empty. + + $a is replaced by bell; $b by backspace; $e by escape; $f by + form feed; $n by newline; $r by carriage return; $t by tab; + $v by vertical tab. + + $o<digits> or $o{<digits>} is replaced by the character whose + code point is the given octal number. In the first form, up + to three octal digits are processed. When more digits are + needed in Unicode mode to specify a wide character, the sec- + ond form must be used. + + $x<digits> or $x{<digits>} is replaced by the character rep- + resented by the given hexadecimal number. In the first form, + up to two hexadecimal digits are processed.
When more digits + are needed in Unicode mode to specify a wide character, the + second form must be used. + + Any other character is substituted by itself. In particular, + $$ is replaced by a single dollar. + + -o, --only-matching + Show only the part of the line that matched a pattern instead + of the whole line. In this mode, no context is shown. That + is, the -A, -B, and -C options are ignored. If there is more + than one match in a line, each of them is shown separately, + on a separate line of output. If -o is combined with -v (in- + vert the sense of the match to find non-matching lines), no + output is generated, but the return code is set appropri- + ately. If the matched portion of the line is empty, nothing + is output unless the file name or line number are being + printed, in which case they are shown on an otherwise empty + line. This option is mutually exclusive with --output, + --file-offsets and --line-offsets. + + -onumber, --only-matching=number + Show only the part of the line that matched the capturing + parentheses of the given number. Up to 50 capturing parenthe- + ses are supported by default. This limit can be changed via + the --om-capture option. A pattern may contain any number of + capturing parentheses, but only those whose number is within + the limit can be accessed by -o. An error occurs if the num- + ber specified by -o is greater than the limit. + + -o0 is the same as -o without a number. Because these options + can be given without an argument (see above), if an argument + is present, it must be given in the same shell item, for ex- + ample, -o3 or --only-matching=2. The comments given for the + non-argument case above also apply to this option. If the + specified capturing parentheses do not exist in the pattern, + or were not set in the match, nothing is output unless the + file name or line number are being output. 
+ + If this option is given multiple times, multiple substrings + are output for each match, in the order the options are + given, and all on one line. For example, -o3 -o1 -o3 causes + the substrings matched by capturing parentheses 3 and 1 and + then 3 again to be output. By default, there is no separator + (but see the next but one option). + + --om-capture=number + Set the number of capturing parentheses that can be accessed + by -o. The default is 50. + + --om-separator=text + Specify a separating string for multiple occurrences of -o. + The default is an empty string. Separating strings are never + coloured. + + -q, --quiet + Work quietly, that is, display nothing except error messages. + The exit status indicates whether or not any matches were + found. + + -r, --recursive + If any given path is a directory, recursively scan the files + it contains, taking note of any --include and --exclude set- + tings. By default, a directory is read as a normal file; in + some operating systems this gives an immediate end-of-file. + This option is a shorthand for setting the -d option to "re- + curse". + + --recursion-limit=number + This is an obsolete synonym for --depth-limit. See --match- + limit above for details. + + -s, --no-messages + Suppress error messages about non-existent or unreadable + files. Such files are quietly skipped. However, the return + code is still 2, even if matches were found in other files. + + -t, --total-count + This option is useful when scanning more than one file. If + used on its own, -t suppresses all output except for a grand + total number of matching lines (or non-matching lines if -v + is used) in all the files. If -t is used with -c, a grand to- + tal is output except when the previous output is just one + line. In other words, it is not output when just one file's + count is listed. If file names are being output, the grand + total is preceded by "TOTAL:". Otherwise, it appears as just + another number. 
The -t option is ignored when used with -L + (list files without matches), because the grand total would + always be zero. + + -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2 + has been compiled with UTF-8 support. All patterns (including + those for any --exclude and --include options) and all lines + that are scanned must be valid strings of UTF-8 characters. + If an invalid UTF-8 string is encountered, an error occurs. + + -U, --utf-allow-invalid + As --utf, but in addition subject lines may contain invalid + UTF-8 code unit sequences. These can never form part of any + pattern match. Patterns themselves, however, must still be + valid UTF-8 strings. This facility allows valid UTF-8 strings + to be sought within arbitrary byte sequences in executable or + other binary files. For more details about matching in non- + valid UTF-8 strings, see the pcre2unicode(3) documentation. + + -V, --version + Write the version numbers of pcre2grep and the PCRE2 library + to the standard output and then exit. Anything else on the + command line is ignored. + + -v, --invert-match + Invert the sense of the match, so that lines which do not + match any of the patterns are the ones that are found. When + this option is set, options such as --only-matching and + --output, which specify parts of a match that are to be out- + put, are ignored. + + -w, --word-regex, --word-regexp + Force the patterns only to match "words". That is, there must + be a word boundary at the start and end of each matched + string. This is equivalent to having "\b(?:" at the start of + each pattern, and ")\b" at the end. This option applies only + to the patterns that are matched against the contents of + files; it does not apply to patterns specified by any of the + --include or --exclude options. + + -x, --line-regex, --line-regexp + Force the patterns to start matching only at the beginnings + of lines, and in addition, require them to match entire + lines. 
In multiline mode the match may be more than one line. + This is equivalent to having "^(?:" at the start of each pat- + tern and ")$" at the end. This option applies only to the + patterns that are matched against the contents of files; it + does not apply to patterns specified by any of the --include + or --exclude options. + + +ENVIRONMENT VARIABLES + + The environment variables LC_ALL and LC_CTYPE are examined, in that or- + der, for a locale. The first one that is set is used. This can be over- + ridden by the --locale option. If no locale is set, the PCRE2 library's + default (usually the "C" locale) is used. + + +NEWLINES + + The -N (--newline) option allows pcre2grep to scan files with newline + conventions that differ from the default. This option affects only the + way scanned files are processed. It does not affect the interpretation + of files specified by the -f, --file-list, --exclude-from, or --in- + clude-from options. + + Any parts of the scanned input files that are written to the standard + output are copied with whatever newline sequences they have in the in- + put. However, if the final line of a file is output, and it does not + end with a newline sequence, a newline sequence is added. If the new- + line setting is CR, LF, CRLF or NUL, that line ending is output; for + the other settings (ANYCRLF or ANY) a single NL is used. + + The newline setting does not affect the way in which pcre2grep writes + newlines in informational messages to the standard output and error + streams. Under Windows, the standard output is set to be binary, so + that "\r\n" at the ends of output lines that are copied from the input + is not converted to "\r\r\n" by the C I/O library. This means that any + messages written to the standard output must end with "\r\n". For all + other operating systems, and for all messages to the standard error + stream, "\n" is used. 
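The rule above for a final line that lacks a newline sequence can be sketched in C as follows. This is only an illustration of the documented behaviour, not pcre2grep's own code: the enum, its member names, and the function `final_line_terminator()` are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical names for the settings accepted by -N (--newline). */
enum newline_setting { NL_CR, NL_LF, NL_CRLF, NL_NUL, NL_ANYCRLF, NL_ANY };

/* Return the bytes to append when the final line of a file is output but
   does not end with a newline sequence. The length is returned through
   *len, because the NUL terminator is itself a zero byte and so cannot be
   measured with strlen(). */
static const char *final_line_terminator(enum newline_setting nl, size_t *len)
{
    switch (nl) {
    case NL_CR:   *len = 1; return "\r";
    case NL_LF:   *len = 1; return "\n";
    case NL_CRLF: *len = 2; return "\r\n";
    case NL_NUL:  *len = 1; return "\0";
    default:      *len = 1; return "\n";   /* ANYCRLF and ANY: a single NL */
    }
}
```

A caller would write exactly `len` bytes from the returned pointer, for example with fwrite(), rather than treating the result as a C string.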
+ + +OPTIONS COMPATIBILITY + + Many of the short and long forms of pcre2grep's options are the same as + in the GNU grep program. Any long option of the form --xxx-regexp (GNU + terminology) is also available as --xxx-regex (PCRE2 terminology). How- + ever, the --depth-limit, --file-list, --file-offsets, --heap-limit, + --include-dir, --line-offsets, --locale, --match-limit, -M, --multi- + line, -N, --newline, --om-separator, --output, -u, --utf, -U, and + --utf-allow-invalid options are specific to pcre2grep, as is the use of + the --only-matching option with a capturing parentheses number. + + Although most of the common options work the same way, a few are dif- + ferent in pcre2grep. For example, the --include option's argument is a + glob for GNU grep, but a regular expression for pcre2grep. If both the + -c and -l options are given, GNU grep lists only file names, without + counts, but pcre2grep gives the counts as well. + + +OPTIONS WITH DATA + + There are four different ways in which an option with data can be spec- + ified. If a short form option is used, the data may follow immedi- + ately, or (with one exception) in the next command line item. For exam- + ple: + + -f/some/file + -f /some/file + + The exception is the -o option, which may appear with or without data. + Because of this, if data is present, it must follow immediately in the + same item, for example -o3. + + If a long form option is used, the data may appear in the same command + line item, separated by an equals character, or (with two exceptions) + it may appear in the next command line item. For example: + + --file=/some/file + --file /some/file + + Note, however, that if you want to supply a file name beginning with ~ + as data in a shell command, and have the shell expand ~ to a home di- + rectory, you must separate the file name from the option, because the + shell does not treat ~ specially unless it is at the start of an item. 
+ + The exceptions to the above are the --colour (or --color) and --only- + matching options, for which the data is optional. If one of these op- + tions does have data, it must be given in the first form, using an + equals character. Otherwise pcre2grep will assume that it has no data. + + +USING PCRE2'S CALLOUT FACILITY + + pcre2grep has, by default, support for calling external programs or + scripts or echoing specific strings during matching by making use of + PCRE2's callout facility. However, this support can be completely or + partially disabled when pcre2grep is built. You can find out whether + your binary has support for callouts by running it with the --help op- + tion. If callout support is completely disabled, all callouts in pat- + terns are ignored by pcre2grep. If the facility is partially disabled, + calling external programs is not supported, and callouts that request + it are ignored. + + A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu- + ment is either a number or a quoted string (see the pcre2callout docu- + mentation for details). Numbered callouts are ignored by pcre2grep; + only callouts with string arguments are useful. + + Echoing a specific string + + Starting the callout string with a pipe character invokes an echoing + facility that avoids calling an external program or script. This facil- + ity is always available, provided that callouts were not completely + disabled when pcre2grep was built. The rest of the callout string is + processed as a zero-terminated string, which means it should not con- + tain any internal binary zeros. It is written to the output, having + first been passed through the same escape processing as text from the + --output (-O) option (see above). However, $0 cannot be used to insert + a matched substring because the match is still in progress. Instead, + the single character '0' is inserted.
Any syntax errors in the string + (for example, a dollar not followed by another character) causes the + callout to be ignored. No terminator is added to the output string, so + if you want a newline, you must include it explicitly using the escape + $n. For example: + + pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' + + Matching continues normally after the string is output. If you want to + see only the callout output but not any output from an actual match, + you should end the pattern with (*FAIL). + + Calling external programs or scripts + + This facility can be independently disabled when pcre2grep is built. It + is supported for Windows, where a call to _spawnvp() is used, for VMS, + where lib$spawn() is used, and for any Unix-like environment where + fork() and execv() are available. + + If the callout string does not start with a pipe (vertical bar) charac- + ter, it is parsed into a list of substrings separated by pipe charac- + ters. The first substring must be an executable name, with the follow- + ing substrings specifying arguments: + + executable_name|arg1|arg2|... + + Any substring (including the executable name) may contain escape se- + quences started by a dollar character. These are the same as for the + --output (-O) option documented above, except that $0 cannot insert the + matched string because the match is still in progress. Instead, the + character '0' is inserted. If you need a literal dollar or pipe charac- + ter in any substring, use $$ or $| respectively. Here is an example: + + echo -e "abcde\n12345" | pcre2grep \ + '(?x)(.)(..(.)) + (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' - + + Output: + + Arg1: [a] [bcd] [d] Arg2: |a| () + abcde + Arg1: [1] [234] [4] Arg2: |1| () + 12345 + + The parameters for the system call that is used to run the program or + script are zero-terminated strings. 
This means that binary zero charac- + ters in the callout argument will cause premature termination of their + substrings, and therefore should not be present. Any syntax errors in + the string (for example, a dollar not followed by another character) + causes the callout to be ignored. If running the program fails for any + reason (including the non-existence of the executable), a local match- + ing failure occurs and the matcher backtracks in the normal way. + + +MATCHING ERRORS + + It is possible to supply a regular expression that takes a very long + time to fail to match certain lines. Such patterns normally involve + nested indefinite repeats, for example: (a+)*\d when matched against a + line of a's with no final digit. The PCRE2 matching function has a re- + source limit that causes it to abort in these circumstances. If this + happens, pcre2grep outputs an error message and the line that caused + the problem to the standard error stream. If there are more than 20 + such errors, pcre2grep gives up. + + The --match-limit option of pcre2grep can be used to set the overall + resource limit. There are also other limits that affect the amount of + memory used during matching; see the discussion of --heap-limit and + --depth-limit above. + + +DIAGNOSTICS + + Exit status is 0 if any matches were found, 1 if no matches were found, + and 2 for syntax errors, overlong lines, non-existent or inaccessible + files (even if matches were found in other files) or too many matching + errors. Using the -s option to suppress error messages about inaccessi- + ble files does not affect the return code. + + When run under VMS, the return code is placed in the symbol + PCRE2GREP_RC because VMS does not distinguish between exit(0) and + exit(1). + + +SEE ALSO + + pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. 
+ + +REVISION + + Last updated: 04 October 2020 + Copyright (c) 1997-2020 University of Cambridge. diff --git a/src/pcre2/doc/pcre2jit.3 b/src/pcre2/doc/pcre2jit.3 new file mode 100644 index 00000000..fab83666 --- /dev/null +++ b/src/pcre2/doc/pcre2jit.3 @@ -0,0 +1,449 @@ +.TH PCRE2JIT 3 "23 May 2019" "PCRE2 10.34" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT" +.rs +.sp +Just-in-time compiling is a heavyweight optimization that can greatly speed up +pattern matching. However, it comes at the cost of extra processing before the +match is performed, so it is of most benefit when the same pattern is going to +be matched many times. This does not necessarily mean many calls of a matching +function; if the pattern is not anchored, matching attempts may take place many +times at various positions in the subject, even for a single call. Therefore, +if the subject string is very long, it may still pay to use JIT even for +one-off matches. JIT support is available for all of the 8-bit, 16-bit and +32-bit PCRE2 libraries. +.P +JIT support applies only to the traditional Perl-compatible matching function. +It does not apply when the DFA matching function is being used. The code for +this support was written by Zoltan Herczeg. +. +. +.SH "AVAILABILITY OF JIT SUPPORT" +.rs +.sp +JIT support is an optional feature of PCRE2. The "configure" option +--enable-jit (or equivalent CMake option) must be set when PCRE2 is built if +you want to use JIT. The support is limited to the following hardware +platforms: +.sp + ARM 32-bit (v5, v7, and Thumb2) + ARM 64-bit + Intel x86 32-bit and 64-bit + MIPS 32-bit and 64-bit + Power PC 32-bit and 64-bit + SPARC 32-bit +.sp +If --enable-jit is set on an unsupported platform, compilation fails. +.P +A program can tell if JIT support is available by calling \fBpcre2_config()\fP +with the PCRE2_CONFIG_JIT option. The result is 1 when JIT is available, and 0 +otherwise. 
However, a simple program does not need to check this in order to +use JIT. The API is implemented in a way that falls back to the interpretive +code if JIT is not available. For programs that need the best possible +performance, there is also a "fast path" API that is JIT-specific. +. +. +.SH "SIMPLE USE OF JIT" +.rs +.sp +To make use of the JIT support in the simplest way, all you have to do is to +call \fBpcre2_jit_compile()\fP after successfully compiling a pattern with +\fBpcre2_compile()\fP. This function has two arguments: the first is the +compiled pattern pointer that was returned by \fBpcre2_compile()\fP, and the +second is zero or more of the following option bits: PCRE2_JIT_COMPLETE, +PCRE2_JIT_PARTIAL_HARD, or PCRE2_JIT_PARTIAL_SOFT. +.P +If JIT support is not available, a call to \fBpcre2_jit_compile()\fP does +nothing and returns PCRE2_ERROR_JIT_BADOPTION. Otherwise, the compiled pattern +is passed to the JIT compiler, which turns it into machine code that executes +much faster than the normal interpretive code, but yields exactly the same +results. The returned value from \fBpcre2_jit_compile()\fP is zero on success, +or a negative error code. +.P +There is a limit to the size of pattern that JIT supports, imposed by the size +of machine stack that it uses. The exact rules are not documented because they +may change at any time, in particular, when new optimizations are introduced. +If a pattern is too big, a call to \fBpcre2_jit_compile()\fP returns +PCRE2_ERROR_NOMEMORY. +.P +PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for complete +matches. If you want to run partial matches using the PCRE2_PARTIAL_HARD or +PCRE2_PARTIAL_SOFT options of \fBpcre2_match()\fP, you should set one or both +of the other options as well as, or instead of PCRE2_JIT_COMPLETE. The JIT +compiler generates different optimized code for each of the three modes +(normal, soft partial, hard partial). 
When \fBpcre2_match()\fP is called, the +appropriate code is run if it is available. Otherwise, the pattern is matched +using interpretive code. +.P +You can call \fBpcre2_jit_compile()\fP multiple times for the same compiled +pattern. It does nothing if it has previously compiled code for any of the +option bits. For example, you can call it once with PCRE2_JIT_COMPLETE and +(perhaps later, when you find you need partial matching) again with +PCRE2_JIT_COMPLETE and PCRE2_JIT_PARTIAL_HARD. This time it will ignore +PCRE2_JIT_COMPLETE and just compile code for partial matching. If +\fBpcre2_jit_compile()\fP is called with no option bits set, it immediately +returns zero. This is an alternative way of testing whether JIT is available. +.P +At present, it is not possible to free JIT compiled code except when the entire +compiled pattern is freed by calling \fBpcre2_code_free()\fP. +.P +In some circumstances you may need to call additional functions. These are +described in the section entitled +.\" HTML +.\" +"Controlling the JIT stack" +.\" +below. +.P +There are some \fBpcre2_match()\fP options that are not supported by JIT, and +there are also some pattern items that JIT cannot handle. Details are given +below. In both cases, matching automatically falls back to the interpretive +code. If you want to know whether JIT was actually used for a particular match, +you should arrange for a JIT callback function to be set up as described in the +section entitled +.\" HTML +.\" +"Controlling the JIT stack" +.\" +below, even if you do not need to supply a non-default JIT stack. Such a +callback function is called whenever JIT code is about to be obeyed. If the +match-time options are not right for JIT execution, the callback function is +not obeyed. +.P +If the JIT compiler finds an unsupported item, no JIT data is generated. You +can find out if JIT matching is available after compiling a pattern by calling +\fBpcre2_pattern_info()\fP with the PCRE2_INFO_JITSIZE option. 
A non-zero +result means that JIT compilation was successful. A result of 0 means that JIT +support is not available, or the pattern was not processed by +\fBpcre2_jit_compile()\fP, or the JIT compiler was not able to handle the +pattern. +. +. +.SH "MATCHING SUBJECTS CONTAINING INVALID UTF" +.rs +.sp +When a pattern is compiled with the PCRE2_UTF option, subject strings are +normally expected to be a valid sequence of UTF code units. By default, this is +checked at the start of matching and an error is generated if invalid UTF is +detected. The PCRE2_NO_UTF_CHECK option can be passed to \fBpcre2_match()\fP to +skip the check (for improved performance) if you are sure that a subject string +is valid. If this option is used with an invalid string, the result is +undefined. +.P +However, a way of running matches on strings that may contain invalid UTF +sequences is available. Calling \fBpcre2_compile()\fP with the +PCRE2_MATCH_INVALID_UTF option has two effects: it tells the interpreter in +\fBpcre2_match()\fP to support invalid UTF, and, if \fBpcre2_jit_compile()\fP +is called, the compiled JIT code also supports invalid UTF. Details of how this +support works, in both the JIT and the interpretive cases, is given in the +.\" HREF +\fBpcre2unicode\fP +.\" +documentation. +.P +There is also an obsolete option for \fBpcre2_jit_compile()\fP called +PCRE2_JIT_INVALID_UTF, which currently exists only for backward compatibility. +It is superseded by the \fBpcre2_compile()\fP option PCRE2_MATCH_INVALID_UTF +and should no longer be used. It may be removed in future. +. +. +.SH "UNSUPPORTED OPTIONS AND PATTERN ITEMS" +.rs +.sp +The \fBpcre2_match()\fP options that are supported for JIT matching are +PCRE2_COPY_MATCHED_SUBJECT, PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, +PCRE2_NOTEMPTY_ATSTART, PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and +PCRE2_PARTIAL_SOFT. The PCRE2_ANCHORED and PCRE2_ENDANCHORED options are not +supported at match time. 
+.P +If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the +use of JIT, forcing matching by the interpreter code. +.P +The only unsupported pattern items are \eC (match a single data unit) when +running in a UTF mode, and a callout immediately before an assertion condition +in a conditional group. +. +. +.SH "RETURN VALUES FROM JIT MATCHING" +.rs +.sp +When a pattern is matched using JIT matching, the return values are the same +as those given by the interpretive \fBpcre2_match()\fP code, with the addition +of one new error code: PCRE2_ERROR_JIT_STACKLIMIT. This means that the memory +used for the JIT stack was insufficient. See +.\" HTML +.\" +"Controlling the JIT stack" +.\" +below for a discussion of JIT stack usage. +.P +The error code PCRE2_ERROR_MATCHLIMIT is returned by the JIT code if searching +a very large pattern tree goes on for too long, as it is in the same +circumstance when JIT is not used, but the details of exactly what is counted +are not the same. The PCRE2_ERROR_DEPTHLIMIT error code is never returned +when JIT matching is used. +. +. +.\" HTML +.SH "CONTROLLING THE JIT STACK" +.rs +.sp +When the compiled JIT code runs, it needs a block of memory to use as a stack. +By default, it uses 32KiB on the machine stack. However, some large or +complicated patterns need more than this. The error PCRE2_ERROR_JIT_STACKLIMIT +is given when there is not enough stack. Three functions are provided for +managing blocks of memory for use as JIT stacks. There is further discussion +about the use of JIT stacks in the section entitled +.\" HTML +.\" +"JIT stack FAQ" +.\" +below. +.P +The \fBpcre2_jit_stack_create()\fP function creates a JIT stack. Its arguments +are a starting size, a maximum size, and a general context (for memory +allocation functions, or NULL for standard memory allocation). It returns a +pointer to an opaque structure of type \fBpcre2_jit_stack\fP, or NULL if there +is an error. 
The \fBpcre2_jit_stack_free()\fP function is used to free a stack +that is no longer needed. If its argument is NULL, this function returns +immediately, without doing anything. (For the technically minded: the address +space is allocated by mmap or VirtualAlloc.) A maximum stack size of 512KiB to +1MiB should be more than enough for any pattern. +.P +The \fBpcre2_jit_stack_assign()\fP function specifies which stack JIT code +should use. Its arguments are as follows: +.sp + pcre2_match_context *mcontext + pcre2_jit_callback callback + void *data +.sp +The first argument is a pointer to a match context. When this is subsequently +passed to a matching function, its information determines which JIT stack is +used. If this argument is NULL, the function returns immediately, without doing +anything. There are three cases for the values of the other two options: +.sp + (1) If \fIcallback\fP is NULL and \fIdata\fP is NULL, an internal 32KiB block + on the machine stack is used. This is the default when a match + context is created. +.sp + (2) If \fIcallback\fP is NULL and \fIdata\fP is not NULL, \fIdata\fP must be + a pointer to a valid JIT stack, the result of calling + \fBpcre2_jit_stack_create()\fP. +.sp + (3) If \fIcallback\fP is not NULL, it must point to a function that is + called with \fIdata\fP as an argument at the start of matching, in + order to set up a JIT stack. If the return from the callback + function is NULL, the internal 32KiB stack is used; otherwise the + return value must be a valid JIT stack, the result of calling + \fBpcre2_jit_stack_create()\fP. +.sp +A callback function is obeyed whenever JIT code is about to be run; it is not +obeyed when \fBpcre2_match()\fP is called with options that are incompatible +for JIT matching. A callback function can therefore be used to determine +whether a match operation was executed by JIT or by the interpreter. 
+.P +You may safely use the same JIT stack for more than one pattern (either by +assigning directly or by callback), as long as the patterns are matched +sequentially in the same thread. Currently, the only way to set up +non-sequential matches in one thread is to use callouts: if a callout function +starts another match, that match must use a different JIT stack to the one used +for the currently suspended match(es). +.P +In a multithreaded application, if you do not +specify a JIT stack, or if you assign or pass back NULL from a callback, that +is thread-safe, because each thread has its own machine stack. However, if you +assign or pass back a non-NULL JIT stack, this must be a different stack for +each thread so that the application is thread-safe. +.P +Strictly speaking, even more is allowed. You can assign the same non-NULL stack +to a match context that is used by any number of patterns, as long as they are +not used for matching by multiple threads at the same time. For example, you +could use the same stack in all compiled patterns, with a global mutex in the +callback to wait until the stack is available for use. However, this is an +inefficient solution, and not recommended. +.P +This is a suggestion for how a multithreaded program that needs to set up +non-default JIT stacks might operate: +.sp + During thread initialization + thread_local_var = pcre2_jit_stack_create(...) +.sp + During thread exit + pcre2_jit_stack_free(thread_local_var) +.sp + Use a one-line callback function + return thread_local_var +.sp +All the functions described in this section do nothing if JIT is not available. +. +. +.\" HTML +.SH "JIT STACK FAQ" +.rs +.sp +(1) Why do we need JIT stacks? +.sp +PCRE2 (and JIT) is a recursive, depth-first engine, so it needs a stack where +the local data of the current node is pushed before checking its child nodes. +Allocating real machine stack on some platforms is difficult.
For example, the +stack chain needs to be updated every time if we extend the stack on PowerPC. +Although it is possible, its updating time overhead decreases performance. So +we do the recursion in memory. +.P +(2) Why don't we simply allocate blocks of memory with \fBmalloc()\fP? +.sp +Modern operating systems have a nice feature: they can reserve an address space +instead of allocating memory. We can safely allocate memory pages inside this +address space, so the stack could grow without moving memory data (this is +important because of pointers). Thus we can allocate 1MiB address space, and +use only a single memory page (usually 4KiB) if that is enough. However, we can +still grow up to 1MiB anytime if needed. +.P +(3) Who "owns" a JIT stack? +.sp +The owner of the stack is the user program, not the JIT studied pattern or +anything else. The user program must ensure that if a stack is being used by +\fBpcre2_match()\fP, (that is, it is assigned to a match context that is passed +to the pattern currently running), that stack must not be used by any other +threads (to avoid overwriting the same memory area). The best practice for +multithreaded programs is to allocate a stack for each thread, and return this +stack through the JIT callback function. +.P +(4) When should a JIT stack be freed? +.sp +You can free a JIT stack at any time, as long as it will not be used by +\fBpcre2_match()\fP again. When you assign the stack to a match context, only a +pointer is set. There is no reference counting or any other magic. You can free +compiled patterns, contexts, and stacks in any order, anytime. +Just \fIdo not\fP call \fBpcre2_match()\fP with a match context pointing to an +already freed stack, as that will cause SEGFAULT. (Also, do not free a stack +currently used by \fBpcre2_match()\fP in another thread). You can also replace +the stack in a context at any time when it is not in use. You should free the +previous stack before assigning a replacement. 
+.P +(5) Should I allocate/free a stack every time before/after calling +\fBpcre2_match()\fP? +.sp +No, because this is too costly in terms of resources. However, you could +implement a scheme that releases the stack if it has not been used for, say, +two minutes. The JIT callback can help to achieve this without keeping a +list of patterns. +.P +(6) OK, the stack is for long term memory allocation. But what happens if a +pattern causes stack overflow with a stack of 1MiB? Is that 1MiB kept until the +stack is freed? +.sp +Especially on embedded systems, it might be a good idea to release memory +sometimes without freeing the stack. There is no API for this at the moment. +Probably a function call which returns with the currently allocated memory for +any stack and another which allows releasing memory (shrinking the stack) would +be a good idea if someone needs this. +.P +(7) This is too much of a headache. Isn't there any better solution for JIT +stack handling? +.sp +No, thanks to Windows. If POSIX threads were used everywhere, we could throw +out this complicated API. +. +. +.SH "FREEING JIT SPECULATIVE MEMORY" +.rs +.sp +.nf +.B void pcre2_jit_free_unused_memory(pcre2_general_context *\fIgcontext\fP); +.fi +.P +The JIT executable allocator does not free all memory when it is possible to +do so. It expects new allocations, and keeps some free memory around to improve +allocation speed. However, in low memory conditions, it might be better to free +all possible memory. You can cause this to happen by calling +pcre2_jit_free_unused_memory(). Its argument is a general context, for custom +memory management, or NULL for standard memory management. +. +. +.SH "EXAMPLE CODE" +.rs +.sp +This is a single-threaded example that specifies a JIT stack without using a +callback. A real program should include error checking after all the function +calls.
+.sp + int rc; + pcre2_code *re; + pcre2_match_data *match_data; + pcre2_match_context *mcontext; + pcre2_jit_stack *jit_stack; +.sp + re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, + &errornumber, &erroffset, NULL); + rc = pcre2_jit_compile(re, PCRE2_JIT_COMPLETE); + mcontext = pcre2_match_context_create(NULL); + jit_stack = pcre2_jit_stack_create(32*1024, 512*1024, NULL); + pcre2_jit_stack_assign(mcontext, NULL, jit_stack); + match_data = pcre2_match_data_create(re, 10); + rc = pcre2_match(re, subject, length, 0, 0, match_data, mcontext); + /* Process result */ +.sp + pcre2_code_free(re); + pcre2_match_data_free(match_data); + pcre2_match_context_free(mcontext); + pcre2_jit_stack_free(jit_stack); +.sp +. +. +.SH "JIT FAST PATH API" +.rs +.sp +Because the API described above falls back to interpreted matching when JIT is +not available, it is convenient for programs that are written for general use +in many environments. However, calling JIT via \fBpcre2_match()\fP does have a +performance impact. Programs that are written for use where JIT is known to be +available, and which need the best possible performance, can instead use a +"fast path" API to call JIT matching directly instead of calling +\fBpcre2_match()\fP (obviously only for patterns that have been successfully +processed by \fBpcre2_jit_compile()\fP). +.P +The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly +the same arguments as \fBpcre2_match()\fP. However, the subject string must be +specified with a length; PCRE2_ZERO_TERMINATED is not supported. Unsupported +option bits (for example, PCRE2_ANCHORED, PCRE2_ENDANCHORED and +PCRE2_COPY_MATCHED_SUBJECT) are ignored, as is the PCRE2_NO_JIT option. The +return values are also the same as for \fBpcre2_match()\fP, plus +PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is requested +that was not compiled. 
+.P +When you call \fBpcre2_match()\fP, as well as testing for invalid options, a +number of other sanity checks are performed on the arguments. For example, if +the subject pointer is NULL, an immediate error is given. Also, unless +PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the +interests of speed, these checks do not happen on the JIT fast path, and if +invalid data is passed, the result is undefined. +.P +Bypassing the sanity checks and the \fBpcre2_match()\fP wrapping can give +speedups of more than 10%. +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2api\fP(3) +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel (FAQ by Zoltan Herczeg) +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 23 May 2019 +Copyright (c) 1997-2019 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2limits.3 b/src/pcre2/doc/pcre2limits.3 new file mode 100644 index 00000000..9bf3626d --- /dev/null +++ b/src/pcre2/doc/pcre2limits.3 @@ -0,0 +1,72 @@ +.TH PCRE2LIMITS 3 "03 February 2019" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "SIZE AND OTHER LIMITATIONS" +.rs +.sp +There are some size limitations in PCRE2 but it is hoped that they will never +in practice be relevant. +.P +The maximum size of a compiled pattern is approximately 64 thousand code units +for the 8-bit and 16-bit libraries if PCRE2 is compiled with the default +internal linkage size, which is 2 bytes for these libraries. If you want to +process regular expressions that are truly enormous, you can compile PCRE2 with +an internal linkage size of 3 or 4 (when building the 16-bit library, 3 is +rounded up to 4). See the \fBREADME\fP file in the source distribution and the +.\" HREF +\fBpcre2build\fP +.\" +documentation for details. In these cases the limit is substantially larger. +However, the speed of execution is slower. In the 32-bit library, the internal +linkage size is always 4. 
+.P +The maximum length of a source pattern string is essentially unlimited; it is +the largest number a PCRE2_SIZE variable can hold. However, the program that +calls \fBpcre2_compile()\fP can specify a smaller limit. +.P +The maximum length (in code units) of a subject string is one less than the +largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an unsigned +integer type, usually defined as size_t. Its maximum value (that is +~(PCRE2_SIZE)0) is reserved as a special indicator for zero-terminated strings +and unset offsets. +.P +All values in repeating quantifiers must be less than 65536. +.P +The maximum length of a lookbehind assertion is 65535 characters. +.P +There is no limit to the number of parenthesized groups, but there can be no +more than 65535 capture groups, and there is a limit to the depth of nesting of +parenthesized subpatterns of all kinds. This is imposed in order to limit the +amount of system stack used at compile time. The default limit can be specified +when PCRE2 is built; if not, the default is set to 250. An application can +change this limit by calling pcre2_set_parens_nest_limit() to set the limit in +a compile context. +.P +The maximum length of name for a named capture group is 32 code units, and the +maximum number of such groups is 10000. +.P +The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb +is 255 code units for the 8-bit library and 65535 code units for the 16-bit and +32-bit libraries. +.P +The maximum length of a string argument to a callout is the largest number a +32-bit unsigned integer can hold. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 02 February 2019 +Copyright (c) 1997-2019 University of Cambridge. 
+.fi diff --git a/src/pcre/doc/pcrematching.3 b/src/pcre2/doc/pcre2matching.3 similarity index 64% rename from src/pcre/doc/pcrematching.3 rename to src/pcre2/doc/pcre2matching.3 index 268baf9b..7f9bbac7 100644 --- a/src/pcre/doc/pcrematching.3 +++ b/src/pcre2/doc/pcre2matching.3 @@ -1,25 +1,24 @@ -.TH PCREMATCHING 3 "12 November 2013" "PCRE 8.34" +.TH PCRE2MATCHING 3 "23 May 2019" "PCRE2 10.34" .SH NAME -PCRE - Perl-compatible regular expressions -.SH "PCRE MATCHING ALGORITHMS" +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 MATCHING ALGORITHMS" .rs .sp -This document describes the two different algorithms that are available in PCRE -for matching a compiled regular expression against a given subject string. The -"standard" algorithm is the one provided by the \fBpcre_exec()\fP, -\fBpcre16_exec()\fP and \fBpcre32_exec()\fP functions. These work in the same -as as Perl's matching function, and provide a Perl-compatible matching operation. -The just-in-time (JIT) optimization that is described in the +This document describes the two different algorithms that are available in +PCRE2 for matching a compiled regular expression against a given subject +string. The "standard" algorithm is the one provided by the \fBpcre2_match()\fP +function. This works in the same way as Perl's matching function, and provides a +Perl-compatible matching operation. The just-in-time (JIT) optimization that is +described in the .\" HREF -\fBpcrejit\fP +\fBpcre2jit\fP .\" -documentation is compatible with these functions. +documentation is compatible with this function. .P -An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP, -\fBpcre16_dfa_exec()\fP and \fBpcre32_dfa_exec()\fP functions; they operate in -a different way, and are not Perl-compatible. This alternative has advantages -and disadvantages compared with the standard algorithm, and these are described -below.
+An alternative algorithm is provided by the \fBpcre2_dfa_match()\fP function; +it operates in a different way, and is not Perl-compatible. This alternative +has advantages and disadvantages compared with the standard algorithm, and +these are described below. .P When there is only one possible way in which a given subject string can match a pattern, the two algorithms give the same answer. A difference arises, however, @@ -43,22 +42,21 @@ as a tree structure. An unlimited repetition in the pattern makes the tree of infinite size, but it is still a tree. Matching the pattern to a given subject string (from a given starting point) can be thought of as a search of the tree. There are two ways to search a tree: depth-first and breadth-first, and these -correspond to the two matching algorithms provided by PCRE. +correspond to the two matching algorithms provided by PCRE2. . . .SH "THE STANDARD MATCHING ALGORITHM" .rs .sp -In the terminology of Jeffrey Friedl's book "Mastering Regular -Expressions", the standard algorithm is an "NFA algorithm". It conducts a -depth-first search of the pattern tree. That is, it proceeds along a single -path through the tree, checking that the subject matches what is required. When -there is a mismatch, the algorithm tries any alternatives at the current point, -and if they all fail, it backs up to the previous branch point in the tree, and -tries the next alternative branch at that level. This often involves backing up -(moving to the left) in the subject string as well. The order in which -repetition branches are tried is controlled by the greedy or ungreedy nature of -the quantifier. +In the terminology of Jeffrey Friedl's book "Mastering Regular Expressions", +the standard algorithm is an "NFA algorithm". It conducts a depth-first search +of the pattern tree. That is, it proceeds along a single path through the tree, +checking that the subject matches what is required. 
When there is a mismatch, +the algorithm tries any alternatives at the current point, and if they all +fail, it backs up to the previous branch point in the tree, and tries the next +alternative branch at that level. This often involves backing up (moving to the +left) in the subject string as well. The order in which repetition branches are +tried is controlled by the greedy or ungreedy nature of the quantifier. .P If a leaf node is reached, a matching string has been found, and at that point the algorithm stops. Thus, if there is more than one possible match, this @@ -69,7 +67,7 @@ ungreedy repetition quantifiers are specified in the pattern. Because it ends up with a single path through the tree, it is relatively straightforward for this algorithm to keep track of the substrings that are matched by portions of the pattern in parentheses. This provides support for -capturing parentheses and back references. +capturing parentheses and backreferences. . . .SH "THE ALTERNATIVE MATCHING ALGORITHM" @@ -101,24 +99,26 @@ subject. If the pattern .sp cat(er(pillar)?)? .sp -is matched against the string "the caterpillar catchment", the result will be -the three strings "caterpillar", "cater", and "cat" that start at the fifth +is matched against the string "the caterpillar catchment", the result is the +three strings "caterpillar", "cater", and "cat" that start at the fifth character of the subject. The algorithm does not automatically move on to find matches that start at later positions. .P -PCRE's "auto-possessification" optimization usually applies to character +PCRE2's "auto-possessification" optimization usually applies to character repeats at the end of a pattern (as well as internally). For example, the pattern "a\ed+" is compiled as if it were "a\ed++" because there is no point even considering the possibility of backtracking into the repeated digits. For DFA matching, this means that only one possible match is found. 
If you really do want multiple matches in such cases, either use an ungreedy repeat -("a\ed+?") or set the PCRE_NO_AUTO_POSSESS option when compiling. +("a\ed+?") or set the PCRE2_NO_AUTO_POSSESS option when compiling. .P -There are a number of features of PCRE regular expressions that are not -supported by the alternative matching algorithm. They are as follows: +There are a number of features of PCRE2 regular expressions that are not +supported or behave differently in the alternative matching function. Those +that are not supported cause an error if encountered. .P 1. Because the algorithm finds all possible matches, the greedy or ungreedy -nature of repetition quantifiers is not relevant. Greedy and ungreedy +nature of repetition quantifiers is not relevant (though it may affect +auto-possessification, as just described). During matching, greedy and ungreedy quantifiers are treated in exactly the same way. However, possessive quantifiers can make a difference when what follows could also match what is quantified, for example in a pattern like this: @@ -132,29 +132,34 @@ longest match is then "locked in" for the rest of the overall pattern. .P 2. When dealing with multiple paths through the tree simultaneously, it is not straightforward to keep track of captured substrings for the different matching -possibilities, and PCRE's implementation of this algorithm does not attempt to +possibilities, and PCRE2's implementation of this algorithm does not attempt to do this. This means that no captured substrings are available. .P -3. Because no substrings are captured, back references within the pattern are -not supported, and cause errors if encountered. +3. Because no substrings are captured, backreferences within the pattern are +not supported. .P 4. For the same reason, conditional expressions that use a backreference as the condition or test for a specific group recursion are not supported. .P -5. 
Because many paths through the tree may be active, the \eK escape sequence, +5. Again for the same reason, script runs are not supported. +.P +6. Because many paths through the tree may be active, the \eK escape sequence, which resets the start of the match when encountered (but may be on some paths -and not on others), is not supported. It causes an error if encountered. +and not on others), is not supported. .P -6. Callouts are supported, but the value of the \fIcapture_top\fP field is -always 1, and the value of the \fIcapture_last\fP field is always -1. +7. Callouts are supported, but the value of the \fIcapture_top\fP field is +always 1, and the value of the \fIcapture_last\fP field is always 0. .P -7. The \eC escape sequence, which (in the standard algorithm) always matches a -single data unit, even in UTF-8, UTF-16 or UTF-32 modes, is not supported in -these modes, because the alternative algorithm moves through the subject string -one character (not data unit) at a time, for all active paths through the tree. +8. The \eC escape sequence, which (in the standard algorithm) always matches a +single code unit, even in a UTF mode, is not supported in these modes, because +the alternative algorithm moves through the subject string one character (not +code unit) at a time, for all active paths through the tree. .P -8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not +9. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not supported. (*FAIL) is supported, and behaves like a failing negative assertion. +.P +10. The PCRE2_MATCH_INVALID_UTF option for \fBpcre2_compile()\fP is not +supported by \fBpcre2_dfa_match()\fP. . . .SH "ADVANTAGES OF THE ALTERNATIVE ALGORITHM" @@ -170,11 +175,11 @@ callouts. 2. 
Because the alternative algorithm scans the subject string just once, and never needs to backtrack (except for lookbehinds), it is possible to pass very long subject strings to the matching function in several pieces, checking for -partial matching each time. Although it is possible to do multi-segment -matching using the standard algorithm by retaining partially matched +partial matching each time. Although it is also possible to do multi-segment +matching using the standard algorithm, by retaining partially matched substrings, it is more complicated. The .\" HREF -\fBpcrepartial\fP +\fBpcre2partial\fP .\" documentation gives details of partial matching and discusses multi-segment matching. @@ -189,7 +194,8 @@ The alternative algorithm suffers from a number of disadvantages: because it has to search for all possible matches, but is also because it is less susceptible to optimization. .P -2. Capturing parentheses and back references are not supported. +2. Capturing parentheses, backreferences, script runs, and matching within +invalid UTF strings are not supported. .P 3. Although atomic groups are supported, their use does not provide the performance advantage that it does for the standard algorithm. @@ -201,7 +207,7 @@ performance advantage that it does for the standard algorithm. .nf Philip Hazel University Computing Service -Cambridge CB2 3QH, England. +Cambridge, England. .fi . . @@ -209,6 +215,6 @@ Cambridge CB2 3QH, England. .rs .sp .nf -Last updated: 12 November 2013 -Copyright (c) 1997-2012 University of Cambridge. +Last updated: 23 May 2019 +Copyright (c) 1997-2019 University of Cambridge.
.fi diff --git a/src/pcre2/doc/pcre2partial.3 b/src/pcre2/doc/pcre2partial.3 new file mode 100644 index 00000000..892906a7 --- /dev/null +++ b/src/pcre2/doc/pcre2partial.3 @@ -0,0 +1,373 @@ +.TH PCRE2PARTIAL 3 "04 September 2019" "PCRE2 10.34" +.SH NAME +PCRE2 - Perl-compatible regular expressions +.SH "PARTIAL MATCHING IN PCRE2" +.rs +.sp +In normal use of PCRE2, if there is a match up to the end of a subject string, +but more characters are needed to match the entire pattern, PCRE2_ERROR_NOMATCH +is returned, just like any other failing match. There are circumstances where +it might be helpful to distinguish this "partial match" case. +.P +One example is an application where the subject string is very long, and not +all available at once. The requirement here is to be able to do the matching +segment by segment, but special action is needed when a matched substring spans +the boundary between two segments. +.P +Another example is checking a user input string as it is typed, to ensure that +it conforms to a required format. Invalid characters can be immediately +diagnosed and rejected, giving instant feedback. +.P +Partial matching is a PCRE2-specific feature; it is not Perl-compatible. It is +requested by setting one of the PCRE2_PARTIAL_HARD or PCRE2_PARTIAL_SOFT +options when calling a matching function. The difference between the two +options is whether or not a partial match is preferred to an alternative +complete match, though the details differ between the two types of matching +function. If both options are set, PCRE2_PARTIAL_HARD takes precedence. +.P +If you want to use partial matching with just-in-time optimized code, as well +as setting a partial match option for the matching function, you must also call +\fBpcre2_jit_compile()\fP with one or both of these options: +.sp + PCRE2_JIT_PARTIAL_HARD + PCRE2_JIT_PARTIAL_SOFT +.sp +PCRE2_JIT_COMPLETE should also be set if you are going to run non-partial +matches on the same pattern. 
Separate code is compiled for each mode. If the +appropriate JIT mode has not been compiled, interpretive matching code is used. +.P +Setting a partial matching option disables two of PCRE2's standard +optimization hints. PCRE2 remembers the last literal code unit in a pattern, +and abandons matching immediately if it is not present in the subject string. +This optimization cannot be used for a subject string that might match only +partially. PCRE2 also remembers a minimum length of a matching string, and does +not bother to run the matching function on shorter strings. This optimization +is also disabled for partial matching. +. +. +.SH "REQUIREMENTS FOR A PARTIAL MATCH" +.rs +.sp +A possible partial match occurs during matching when the end of the subject +string is reached successfully, but either more characters are needed to +complete the match, or the addition of more characters might change what is +matched. +.P +Example 1: if the pattern is /abc/ and the subject is "ab", more characters are +definitely needed to complete a match. In this case both hard and soft matching +options yield a partial match. +.P +Example 2: if the pattern is /ab+/ and the subject is "ab", a complete match +can be found, but the addition of more characters might change what is +matched. In this case, only PCRE2_PARTIAL_HARD returns a partial match; +PCRE2_PARTIAL_SOFT returns the complete match. +.P +On reaching the end of the subject, when PCRE2_PARTIAL_HARD is set, if the next +pattern item is \ez, \eZ, \eb, \eB, or $ there is always a partial match. +Otherwise, for both options, the next pattern item must be one that inspects a +character, and at least one of the following must be true: +.P +(1) At least one character has already been inspected. An inspected character +need not form part of the final matched string; lookbehind assertions and the +\eK escape sequence provide ways of inspecting characters before the start of a +matched string. 
+.P +(2) The pattern contains one or more lookbehind assertions. This condition +exists in case there is a lookbehind that inspects characters before the start +of the match. +.P +(3) There is a special case when the whole pattern can match an empty string. +When the starting point is at the end of the subject, the empty string match is +a possibility, and if PCRE2_PARTIAL_SOFT is set and neither of the above +conditions is true, it is returned. However, because adding more characters +might result in a non-empty match, PCRE2_PARTIAL_HARD returns a partial match, +which in this case means "there is going to be a match at this point, but until +some more characters are added, we do not know if it will be an empty string or +something longer". +. +. +. +.SH "PARTIAL MATCHING USING pcre2_match()" +.rs +.sp +When a partial matching option is set, the result of calling +\fBpcre2_match()\fP can be one of the following: +.TP 2 +\fBA successful match\fP +A complete match has been found, starting and ending within this subject. +.TP +\fBPCRE2_ERROR_NOMATCH\fP +No match can start anywhere in this subject. +.TP +\fBPCRE2_ERROR_PARTIAL\fP +Adding more characters may result in a complete match that uses one or more +characters from the end of this subject. +.P +When a partial match is returned, the first two elements in the ovector point +to the portion of the subject that was matched, but the values in the rest of +the ovector are undefined. The appearance of \eK in the pattern has no effect +for a partial match. Consider this pattern: +.sp + /abc\eK123/ +.sp +If it is matched against "456abc123xyz" the result is a complete match, and the +ovector defines the matched string as "123", because \eK resets the "start of +match" point. However, if a partial match is requested and the subject string +is "456abc12", a partial match is found for the string "abc12", because all +these characters are needed for a subsequent re-match with additional +characters. 
+.P +If there is more than one partial match, the first one that was found provides +the data that is returned. Consider this pattern: +.sp + /123\ew+X|dogY/ +.sp +If this is matched against the subject string "abc123dog", both alternatives +fail to match, but the end of the subject is reached during matching, so +PCRE2_ERROR_PARTIAL is returned. The offsets are set to 3 and 9, identifying +"123dog" as the first partial match. (In this example, there are two partial +matches, because "dog" on its own partially matches the second alternative.) +. +. +.SS "How a partial match is processed by pcre2_match()" +.rs +.sp +What happens when a partial match is identified depends on which of the two +partial matching options is set. +.P +If PCRE2_PARTIAL_HARD is set, PCRE2_ERROR_PARTIAL is returned as soon as a +partial match is found, without continuing to search for possible complete +matches. This option is "hard" because it prefers an earlier partial match over +a later complete match. For this reason, the assumption is made that the end of +the supplied subject string is not the true end of the available data, which is +why \ez, \eZ, \eb, \eB, and $ always give a partial match. +.P +If PCRE2_PARTIAL_SOFT is set, the partial match is remembered, but matching +continues as normal, and other alternatives in the pattern are tried. If no +complete match can be found, PCRE2_ERROR_PARTIAL is returned instead of +PCRE2_ERROR_NOMATCH. This option is "soft" because it prefers a complete match +over a partial match. All the various matching items in a pattern behave as if +the subject string is potentially complete; \ez, \eZ, and $ match at the end of +the subject, as normal, and for \eb and \eB the end of the subject is treated +as a non-alphanumeric. 
+.P +The difference between the two partial matching options can be illustrated by a +pattern such as: +.sp + /dog(sbody)?/ +.sp +This matches either "dog" or "dogsbody", greedily (that is, it prefers the +longer string if possible). If it is matched against the string "dog" with +PCRE2_PARTIAL_SOFT, it yields a complete match for "dog". However, if +PCRE2_PARTIAL_HARD is set, the result is PCRE2_ERROR_PARTIAL. On the other +hand, if the pattern is made ungreedy the result is different: +.sp + /dog(sbody)??/ +.sp +In this case the result is always a complete match because that is found first, +and matching never continues after finding a complete match. It might be easier +to follow this explanation by thinking of the two patterns like this: +.sp + /dog(sbody)?/ is the same as /dogsbody|dog/ + /dog(sbody)??/ is the same as /dog|dogsbody/ +.sp +The second pattern will never match "dogsbody", because it will always find the +shorter match first. +. +. +.SS "Example of partial matching using pcre2test" +.rs +.sp +The \fBpcre2test\fP data modifiers \fBpartial_hard\fP (or \fBph\fP) and +\fBpartial_soft\fP (or \fBps\fP) set PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT, +respectively, when calling \fBpcre2_match()\fP. Here is a run of +\fBpcre2test\fP using a pattern that matches the whole subject in the form of a +date: +.sp + re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ + data> 25dec3\e=ph + Partial match: 25dec3 + data> 3ju\e=ph + Partial match: 3ju + data> 3juj\e=ph + No match +.sp +This example gives the same results for both hard and soft partial matching +options. Here is an example where there is a difference: +.sp + re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ + data> 25jun04\e=ps + 0: 25jun04 + 1: jun + data> 25jun04\e=ph + Partial match: 25jun04 +.sp +With PCRE2_PARTIAL_SOFT, the subject is matched completely.
For +PCRE2_PARTIAL_HARD, however, the subject is assumed not to be complete, so +there is only a partial match. +. +. +. +.SH "MULTI-SEGMENT MATCHING WITH pcre2_match()" +.rs +.sp +PCRE was not originally designed with multi-segment matching in mind. However, +over time, features (including partial matching) that make multi-segment +matching possible have been added. A very long string can be searched segment +by segment by calling \fBpcre2_match()\fP repeatedly, with the aim of achieving +the same results that would happen if the entire string was available for +searching all the time. Normally, the strings that are being sought are much +shorter than each individual segment, and are in the middle of very long +strings, so the pattern is normally not anchored. +.P +Special logic must be implemented to handle a matched substring that spans a +segment boundary. PCRE2_PARTIAL_HARD should be used, because it returns a +partial match at the end of a segment whenever there is the possibility of +changing the match by adding more characters. The PCRE2_NOTBOL option should +also be set for all but the first segment. +.P +When a partial match occurs, the next segment must be added to the current +subject and the match re-run, using the \fIstartoffset\fP argument of +\fBpcre2_match()\fP to begin at the point where the partial match started. +For example: +.sp + re> /\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed/ + data> ...the date is 23ja\e=ph + Partial match: 23ja + data> ...the date is 23jan19 and on that day...\e=offset=15 + 0: 23jan19 + 1: jan +.sp +Note the use of the \fBoffset\fP modifier to start the new match where the +partial match was found. In this example, the next segment was added to the one +in which the partial match was found. This is the most straightforward +approach, typically using a memory buffer that is twice the size of each +segment. 
After a partial match, the first half of the buffer is discarded, the +second half is moved to the start of the buffer, and a new segment is added +before repeating the match as in the example above. After a no match, the +entire buffer can be discarded. +.P +If there are memory constraints, you may want to discard text that precedes a +partial match before adding the next segment. Unfortunately, this is not at +present straightforward. In cases such as the above, where the pattern does not +contain any lookbehinds, it is sufficient to retain only the partially matched +substring. However, if the pattern contains a lookbehind assertion, characters +that precede the start of the partial match may have been inspected during the +matching process. When \fBpcre2test\fP displays a partial match, it indicates +these characters with '<' if the \fBallusedtext\fP modifier is set: +.sp + re> "(?<=123)abc" + data> xx123ab\e=ph,allusedtext + Partial match: 123ab + <<< +.sp +However, the \fBallusedtext\fP modifier is not available for JIT matching, +because JIT matching does not record the first (or last) consulted characters. +For this reason, this information is not available via the API. It is therefore +not possible in general to obtain the exact number of characters that must be +retained in order to get the right match result. If you cannot retain the +entire segment, you must find some heuristic way of choosing. +.P +If you know the approximate length of the matching substrings, you can use that +to decide how much text to retain. The only lookbehind information that is +currently available via the API is the length of the longest individual +lookbehind in a pattern, but this can be misleading if there are nested +lookbehinds. The value returned by calling \fBpcre2_pattern_info()\fP with the +PCRE2_INFO_MAXLOOKBEHIND option is the maximum number of characters (not code +units) that any individual lookbehind moves back when it is processed. A +pattern such as "(?<=(? 
/^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ + data> 23ja\e=dfa,ps + Partial match: 23ja + data> n05\e=dfa,dfa_restart + 0: n05 +.sp +The first call has "23ja" as the subject, and requests partial matching; the +second call has "n05" as the subject for the continued (restarted) match. +Notice that when the match is complete, only the last part is shown; PCRE2 does +not retain the previously partially-matched string. It is up to the calling +program to do that if it needs to. This means that, for an unanchored pattern, +if a continued match fails, it is not possible to try again at a new starting +point. All this facility is capable of doing is continuing with the previous +match attempt. For example, consider this pattern: +.sp + 1234|3789 +.sp +If the first part of the subject is "ABC123", a partial match of the first +alternative is found at offset 3. There is no partial match for the second +alternative, because such a match does not start at the same point in the +subject string. Attempting to continue with the string "7890" does not yield a +match because only those alternatives that match at one point in the subject +are remembered. Depending on the application, this may or may not be what you +want. +.P +If you do want to allow for starting again at the next character, one way of +doing it is to retain some or all of the segment and try a new complete match, +as described for \fBpcre2_match()\fP above. Another possibility is to work with +two buffers. If a partial match at offset \fIn\fP in the first buffer is +followed by "no match" when PCRE2_DFA_RESTART is used on the second buffer, you +can then try a new match starting at offset \fIn+1\fP in the first buffer. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 04 September 2019 +Copyright (c) 1997-2019 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2pattern.3 b/src/pcre2/doc/pcre2pattern.3 new file mode 100644 index 00000000..dc78e4d1 --- /dev/null +++ b/src/pcre2/doc/pcre2pattern.3 @@ -0,0 +1,3903 @@ +.TH PCRE2PATTERN 3 "06 October 2020" "PCRE2 10.35" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 REGULAR EXPRESSION DETAILS" +.rs +.sp +The syntax and semantics of the regular expressions that are supported by PCRE2 +are described in detail below. There is a quick-reference syntax summary in the +.\" HREF +\fBpcre2syntax\fP +.\" +page. PCRE2 tries to match Perl syntax and semantics as closely as it can. +PCRE2 also supports some alternative regular expression syntax (which does not +conflict with the Perl syntax) in order to provide some compatibility with +regular expressions in Python, .NET, and Oniguruma. +.P +Perl's regular expressions are described in its own documentation, and regular +expressions in general are covered in a number of books, some of which have +copious examples. Jeffrey Friedl's "Mastering Regular Expressions", published +by O'Reilly, covers regular expressions in great detail. This description of +PCRE2's regular expressions is intended as reference material. +.P +This document discusses the regular expression patterns that are supported by +PCRE2 when its main matching function, \fBpcre2_match()\fP, is used. PCRE2 also +has an alternative matching function, \fBpcre2_dfa_match()\fP, which matches +using a different algorithm that is not Perl-compatible. Some of the features +discussed below are not available when DFA matching is used. The advantages and +disadvantages of the alternative function, and how it differs from the normal +function, are discussed in the +.\" HREF +\fBpcre2matching\fP +.\" +page. +. +. +.SH "SPECIAL START-OF-PATTERN ITEMS" +.rs +.sp +A number of options that can be passed to \fBpcre2_compile()\fP can also be set +by special items at the start of a pattern. 
These are not Perl-compatible, but +are provided to make these options accessible to pattern writers who are not +able to change the program that processes the pattern. Any number of these +items may appear, but they must all be together right at the start of the +pattern string, and the letters must be in upper case. +. +. +.SS "UTF support" +.rs +.sp +In the 8-bit and 16-bit PCRE2 libraries, characters may be coded either as +single code units, or as multiple UTF-8 or UTF-16 code units. UTF-32 can be +specified for the 32-bit library, in which case it constrains the character +values to valid Unicode code points. To process UTF strings, PCRE2 must be +built to include Unicode support (which is the default). When using UTF strings +you must either call the compiling function with one or both of the PCRE2_UTF +or PCRE2_MATCH_INVALID_UTF options, or the pattern must start with the special +sequence (*UTF), which is equivalent to setting the PCRE2_UTF option. How +setting a UTF mode affects pattern matching is mentioned in several places +below. There is also a summary of features in the +.\" HREF +\fBpcre2unicode\fP +.\" +page. +.P +Some applications that allow their users to supply patterns may wish to +restrict them to non-UTF data for security reasons. If the PCRE2_NEVER_UTF +option is passed to \fBpcre2_compile()\fP, (*UTF) is not allowed, and its +appearance in a pattern causes an error. +. +. +.SS "Unicode property support" +.rs +.sp +Another special sequence that may appear at the start of a pattern is (*UCP). +This has the same effect as setting the PCRE2_UCP option: it causes sequences +such as \ed and \ew to use Unicode properties to determine character types, +instead of recognizing only characters with codes less than 256 via a lookup +table. It also causes upper/lower casing operations to use Unicode properties +for characters with code points greater than 127, even when UTF is not set.
+.P +Some applications that allow their users to supply patterns may wish to +restrict them for security reasons. If the PCRE2_NEVER_UCP option is passed to +\fBpcre2_compile()\fP, (*UCP) is not allowed, and its appearance in a pattern +causes an error. +. +. +.SS "Locking out empty string matching" +.rs +.sp +Starting a pattern with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) has the same effect +as passing the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART option to whichever +matching function is subsequently called to match the pattern. These options +lock out the matching of empty strings, either entirely, or only at the start +of the subject. +. +. +.SS "Disabling auto-possessification" +.rs +.sp +If a pattern starts with (*NO_AUTO_POSSESS), it has the same effect as setting +the PCRE2_NO_AUTO_POSSESS option. This stops PCRE2 from making quantifiers +possessive when what follows cannot match the repeated item. For example, by +default a+b is treated as a++b. For more details, see the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +. +. +.SS "Disabling start-up optimizations" +.rs +.sp +If a pattern starts with (*NO_START_OPT), it has the same effect as setting the +PCRE2_NO_START_OPTIMIZE option. This disables several optimizations for quickly +reaching "no match" results. For more details, see the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +. +. +.SS "Disabling automatic anchoring" +.rs +.sp +If a pattern starts with (*NO_DOTSTAR_ANCHOR), it has the same effect as +setting the PCRE2_NO_DOTSTAR_ANCHOR option. This disables optimizations that +apply to patterns whose top-level branches all start with .* (match any number +of arbitrary characters). For more details, see the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +. +. +.SS "Disabling JIT compilation" +.rs +.sp +If a pattern that starts with (*NO_JIT) is successfully compiled, an attempt by +the application to apply the JIT optimization by calling +\fBpcre2_jit_compile()\fP is ignored. +. +. 
+.SS "Setting match resource limits" +.rs +.sp +The \fBpcre2_match()\fP function contains a counter that is incremented every +time it goes round its main loop. The caller of \fBpcre2_match()\fP can set a +limit on this counter, which therefore limits the amount of computing resource +used for a match. The maximum depth of nested backtracking can also be limited; +this indirectly restricts the amount of heap memory that is used, but there is +also an explicit memory limit that can be set. +.P +These facilities are provided to catch runaway matches that are provoked by +patterns with huge matching trees. A common example is a pattern with nested +unlimited repeats applied to a long string that does not match. When one of +these limits is reached, \fBpcre2_match()\fP gives an error return. The limits +can also be set by items at the start of the pattern of the form +.sp + (*LIMIT_HEAP=d) + (*LIMIT_MATCH=d) + (*LIMIT_DEPTH=d) +.sp +where d is any number of decimal digits. However, the value of the setting must +be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP +for it to have any effect. In other words, the pattern writer can lower the +limits set by the programmer, but not raise them. If there is more than one +setting of one of these limits, the lower value is used. The heap limit is +specified in kibibytes (units of 1024 bytes). +.P +Prior to release 10.30, LIMIT_DEPTH was called LIMIT_RECURSION. This name is +still recognized for backwards compatibility. +.P +The heap limit applies only when the \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP interpreters are used for matching. It does not apply +to JIT. The match limit is used (but in a different way) when JIT is being +used, or when \fBpcre2_dfa_match()\fP is called, to limit computing resource +usage by those matching functions. 
The depth limit is ignored by JIT but is +relevant for DFA matching, which uses function recursion for recursions within +the pattern and for lookaround assertions and atomic groups. In this case, the +depth limit controls the depth of such recursion. +. +. +.\" HTML +.SS "Newline conventions" +.rs +.sp +PCRE2 supports six different conventions for indicating line breaks in +strings: a single CR (carriage return) character, a single LF (linefeed) +character, the two-character sequence CRLF, any of the three preceding, any +Unicode newline sequence, or the NUL character (binary zero). The +.\" HREF +\fBpcre2api\fP +.\" +page has +.\" HTML +.\" +further discussion +.\" +about newlines, and shows how to set the newline convention when calling +\fBpcre2_compile()\fP. +.P +It is also possible to specify a newline convention by starting a pattern +string with one of the following sequences: +.sp + (*CR) carriage return + (*LF) linefeed + (*CRLF) carriage return, followed by linefeed + (*ANYCRLF) any of the three above + (*ANY) all Unicode newline sequences + (*NUL) the NUL character (binary zero) +.sp +These override the default and the options given to the compiling function. For +example, on a Unix system where LF is the default newline sequence, the pattern +.sp + (*CR)a.b +.sp +changes the convention to CR. That pattern matches "a\enb" because LF is no +longer a newline. If more than one of these settings is present, the last one +is used. +.P +The newline convention affects where the circumflex and dollar assertions are +true. It also affects the interpretation of the dot metacharacter when +PCRE2_DOTALL is not set, and the behaviour of \eN when not followed by an +opening brace. However, it does not affect what the \eR escape sequence +matches. By default, this is any Unicode newline sequence, for Perl +compatibility. 
However, this can be changed; see the next section and the +description of \eR in the section entitled +.\" HTML +.\" +"Newline sequences" +.\" +below. A change of \eR setting can be combined with a change of newline +convention. +. +. +.SS "Specifying what \eR matches" +.rs +.sp +It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the +complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF +at compile time. This effect can also be achieved by starting a pattern with +(*BSR_ANYCRLF). For completeness, (*BSR_UNICODE) is also recognized, +corresponding to PCRE2_BSR_UNICODE. +. +. +.SH "EBCDIC CHARACTER CODES" +.rs +.sp +PCRE2 can be compiled to run in an environment that uses EBCDIC as its +character code instead of ASCII or Unicode (typically a mainframe system). In +the sections below, character code values are ASCII or Unicode; in an EBCDIC +environment these characters may have different code values, and there are no +code points greater than 255. +. +. +.SH "CHARACTERS AND METACHARACTERS" +.rs +.sp +A regular expression is a pattern that is matched against a subject string from +left to right. Most characters stand for themselves in a pattern, and match the +corresponding characters in the subject. As a trivial example, the pattern +.sp + The quick brown fox +.sp +matches a portion of a subject string that is identical to itself. When +caseless matching is specified (the PCRE2_CASELESS option or (?i) within the +pattern), letters are matched independently of case. Note that there are two +ASCII characters, K and S, that, in addition to their lower case ASCII +equivalents, are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F +(long S) respectively when either PCRE2_UTF or PCRE2_UCP is set. +.P +The power of regular expressions comes from the ability to include wild cards, +character classes, alternatives, and repetitions in the pattern. 
These are +encoded in the pattern by the use of \fImetacharacters\fP, which do not stand +for themselves but instead are interpreted in some special way. +.P +There are two different sets of metacharacters: those that are recognized +anywhere in the pattern except within square brackets, and those that are +recognized within square brackets. Outside square brackets, the metacharacters +are as follows: +.sp + \e general escape character with several uses + ^ assert start of string (or line, in multiline mode) + $ assert end of string (or line, in multiline mode) + . match any character except newline (by default) + [ start character class definition + | start of alternative branch + ( start group or control verb + ) end group or control verb + * 0 or more quantifier + + 1 or more quantifier; also "possessive quantifier" + ? 0 or 1 quantifier; also quantifier minimizer + { start min/max quantifier +.sp +Part of a pattern that is in square brackets is called a "character class". In +a character class the only metacharacters are: +.sp + \e general escape character + ^ negate the class, but only if the first character + - indicates character range + [ POSIX character class (if followed by POSIX syntax) + ] terminates the character class +.sp +If a pattern is compiled with the PCRE2_EXTENDED option, most white space in +the pattern, other than in a character class, and characters between a # +outside a character class and the next newline, inclusive, are ignored. An +escaping backslash can be used to include a white space or a # character as +part of the pattern. If the PCRE2_EXTENDED_MORE option is set, the same +applies, but in addition unescaped space and horizontal tab characters are +ignored inside a character class. Note: only these two characters are ignored, +not the full set of pattern white space characters that are ignored outside a +character class. 
Option settings can be changed within a pattern; see the +section entitled +.\" HTML +.\" +"Internal Option Setting" +.\" +below. +.P +The following sections describe the use of each of the metacharacters. +. +. +.SH BACKSLASH +.rs +.sp +The backslash character has several uses. Firstly, if it is followed by a +character that is not a digit or a letter, it takes away any special meaning +that character may have. This use of backslash as an escape character applies +both inside and outside character classes. +.P +For example, if you want to match a * character, you must write \e* in the +pattern. This escaping action applies whether or not the following character +would otherwise be interpreted as a metacharacter, so it is always safe to +precede a non-alphanumeric with backslash to specify that it stands for itself. +In particular, if you want to match a backslash, you write \e\e. +.P +Only ASCII digits and letters have any special meaning after a backslash. All +other characters (in particular, those whose code points are greater than 127) +are treated as literals. +.P +If you want to treat all characters in a sequence as literals, you can do so by +putting them between \eQ and \eE. This is different from Perl in that $ and @ +are handled as literals in \eQ...\eE sequences in PCRE2, whereas in Perl, $ and +@ cause variable interpolation. Also, Perl does "double-quotish backslash +interpolation" on any backslashes between \eQ and \eE which, its documentation +says, "may lead to confusing results". PCRE2 treats a backslash between \eQ and +\eE just like any other character. Note the following examples: +.sp + Pattern PCRE2 matches Perl matches +.sp +.\" JOIN + \eQabc$xyz\eE abc$xyz abc followed by the + contents of $xyz + \eQabc\e$xyz\eE abc\e$xyz abc\e$xyz + \eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz + \eQA\eB\eE A\eB A\eB + \eQ\e\eE \e \e\eE +.sp +The \eQ...\eE sequence is recognized both inside and outside character classes. 
+An isolated \eE that is not preceded by \eQ is ignored. If \eQ is not followed +by \eE later in the pattern, the literal interpretation continues to the end of +the pattern (that is, \eE is assumed at the end). If the isolated \eQ is inside +a character class, this causes an error, because the character class is not +terminated by a closing square bracket. +. +. +.\" HTML +.SS "Non-printing characters" +.rs +.sp +A second use of backslash provides a way of encoding non-printing characters +in patterns in a visible manner. There is no restriction on the appearance of +non-printing characters in a pattern, but when a pattern is being prepared by +text editing, it is often easier to use one of the following escape sequences +instead of the binary character it represents. In an ASCII or Unicode +environment, these escapes are as follows: +.sp + \ea alarm, that is, the BEL character (hex 07) + \ecx "control-x", where x is any printable ASCII character + \ee escape (hex 1B) + \ef form feed (hex 0C) + \en linefeed (hex 0A) + \er carriage return (hex 0D) (but see below) + \et tab (hex 09) + \e0dd character with octal code 0dd + \eddd character with octal code ddd, or backreference + \eo{ddd..} character with octal code ddd.. + \exhh character with hex code hh + \ex{hhh..} character with hex code hhh.. + \eN{U+hhh..} character with Unicode hex code point hhh.. +.sp +By default, after \ex that is not followed by {, from zero to two hexadecimal +digits are read (letters can be in upper or lower case). Any number of +hexadecimal digits may appear between \ex{ and }. If a character other than a +hexadecimal digit appears between \ex{ and }, or if there is no terminating }, +an error occurs. +.P +Characters whose code points are less than 256 can be defined by either of the +two syntaxes for \ex or by an octal sequence. There is no difference in the way +they are handled. For example, \exdc is exactly the same as \ex{dc} or \e334. 
+However, using the braced versions does make such sequences easier to read. +.P +Support is available for some ECMAScript (aka JavaScript) escape sequences via +two compile-time options. If PCRE2_ALT_BSUX is set, the sequence \ex followed +by { is not recognized. Only if \ex is followed by two hexadecimal digits is it +recognized as a character escape. Otherwise it is interpreted as a literal "x" +character. In this mode, support for code points greater than 255 is provided +by \eu, which must be followed by four hexadecimal digits; otherwise it is +interpreted as a literal "u" character. +.P +PCRE2_EXTRA_ALT_BSUX has the same effect as PCRE2_ALT_BSUX and, in addition, +\eu{hhh..} is recognized as the character specified by its hexadecimal code +point. There may be any number of hexadecimal digits. This syntax is from +ECMAScript 6. +.P +The \eN{U+hhh..} escape sequence is recognized only when PCRE2 is operating in +UTF mode. Perl also uses \eN{name} to specify characters by Unicode name; PCRE2 +does not support this. Note that when \eN is not followed by an opening brace +(curly bracket) it has an entirely different meaning, matching any character +that is not a newline. +.P +There are some legacy applications where the escape sequence \er is expected to +match a newline. If the PCRE2_EXTRA_ESCAPED_CR_IS_LF option is set, \er in a +pattern is converted to \en so that it matches a LF (linefeed) instead of a CR +(carriage return) character. +.P +The precise effect of \ecx on ASCII characters is as follows: if x is a lower +case letter, it is converted to upper case. Then bit 6 of the character (hex +40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A), +but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the +code unit following \ec has a value less than 32 or greater than 126, a +compile-time error occurs. +.P +When PCRE2 is compiled in EBCDIC mode, \eN{U+hhh..} is not supported.
\ea, \ee, +\ef, \en, \er, and \et generate the appropriate EBCDIC code values. The \ec +escape is processed as specified for Perl in the \fBperlebcdic\fP document. The +only characters that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], +^, _, or ?. Any other character provokes a compile-time error. The sequence +\ec@ encodes character code 0; after \ec the letters (in either case) encode +characters 1-26 (hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 +(hex 1B to hex 1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F). +.P +Thus, apart from \ec?, these escapes generate the same character code values as +they do in an ASCII environment, though the meanings of the values mostly +differ. For example, \ecG always generates code value 7, which is BEL in ASCII +but DEL in EBCDIC. +.P +The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but +because 127 is not a control character in EBCDIC, Perl makes it generate the +APC character. Unfortunately, there are several variants of EBCDIC. In most of +them the APC character has the value 255 (hex FF), but in the one Perl calls +POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC +values, PCRE2 makes \ec? generate 95; otherwise it generates 255. +.P +After \e0 up to two further octal digits are read. If there are fewer than two +digits, just those that are present are used. Thus the sequence \e0\ex\e015 +specifies two binary zeros followed by a CR character (code value 13). Make +sure you supply two digits after the initial zero if the pattern character that +follows is itself an octal digit. +.P +The escape \eo must be followed by a sequence of octal digits, enclosed in +braces. An error occurs if this is not the case. This escape is a recent +addition to Perl; it provides a way of specifying character code points as octal +numbers greater than 0777, and it also allows octal numbers and backreferences +to be unambiguously specified.
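The ASCII rule for \cx stated earlier (convert a lower case letter to upper case, then invert bit 6, hex 40) is simple enough to check by hand. The following Python sketch reproduces only that arithmetic; the function name control_escape is hypothetical, and the EBCDIC behaviour described above is deliberately not modelled.

```python
def control_escape(x):
    """Code value produced by \\cx in an ASCII environment.

    Sketch of the rule as described in the text, not PCRE2 source code.
    """
    code = ord(x)
    # The text says values below 32 or above 126 are a compile-time error.
    if code < 32 or code > 126:
        raise ValueError("character after \\c must be printable ASCII")
    # Lower case letters are first converted to upper case.
    if "a" <= x <= "z":
        code = ord(x.upper())
    # Then bit 6 (hex 40) is inverted.
    return code ^ 0x40
```

With this sketch, control_escape("A") and control_escape("a") both give hex 01, control_escape("{") gives hex 3B, control_escape(";") gives hex 7B, and control_escape("?") gives hex 7F (DEL), in agreement with the values quoted in the text.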
+.P +For greater clarity and unambiguity, it is best to avoid following \e by a +digit greater than zero. Instead, use \eo{} or \ex{} to specify numerical +character code points, and \eg{} to specify backreferences. The following +paragraphs describe the old, ambiguous syntax. +.P +The handling of a backslash followed by a digit other than 0 is complicated, +and Perl has changed over time, causing PCRE2 also to change. +.P +Outside a character class, PCRE2 reads the digit and any following digits as a +decimal number. If the number is less than 10, begins with the digit 8 or 9, or +if there are at least that many previous capture groups in the expression, the +entire sequence is taken as a \fIbackreference\fP. A description of how this +works is given +.\" HTML +.\" +later, +.\" +following the discussion of +.\" HTML +.\" +parenthesized groups. +.\" +Otherwise, up to three octal digits are read to form a character code. +.P +Inside a character class, PCRE2 handles \e8 and \e9 as the literal characters +"8" and "9", and otherwise reads up to three octal digits following the +backslash, using them to generate a data character. Any subsequent digits stand +for themselves. For example, outside a character class: +.sp + \e040 is another way of writing an ASCII space +.\" JOIN + \e40 is the same, provided there are fewer than 40 + previous capture groups + \e7 is always a backreference +.\" JOIN + \e11 might be a backreference, or another way of + writing a tab + \e011 is always a tab + \e0113 is a tab followed by the character "3" +.\" JOIN + \e113 might be a backreference, otherwise the + character with octal code 113 +.\" JOIN + \e377 might be a backreference, otherwise + the value 255 (decimal) +.\" JOIN + \e81 is always a backreference +.sp +Note that octal values of 100 or greater that are specified using this syntax +must not be introduced by a leading zero, because no more than three octal +digits are ever read. +. +. 
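The decision procedure for a backslash followed by digits, outside a character class, can be sketched as follows. This is an illustration of the rules as worded above, not PCRE2's implementation: the function name and the (kind, value, rest) return shape are hypothetical, and mode-specific limits on character values are ignored.

```python
import re


def interpret_backslash_digits(text, group_count):
    """Interpret the digits that follow a backslash outside a character class.

    text        -- pattern text immediately after the backslash
    group_count -- number of capture groups that precede this point
    Returns (kind, value, rest), where kind is "backref" or "char" and rest
    is the text left over (any remaining digits stand for themselves).
    """
    m = re.match(r"[0-9]+", text)
    if m is None:
        raise ValueError("text must start with a digit")
    digits = m.group()

    if digits[0] == "0":
        # \0 is followed by up to two further octal digits.
        octal = re.match(r"0[0-7]{0,2}", text).group()
        return ("char", int(octal, 8), text[len(octal):])

    number = int(digits)
    # Backreference if less than 10, if it begins with 8 or 9, or if there
    # are at least that many previous capture groups.
    if number < 10 or digits[0] in "89" or number <= group_count:
        return ("backref", number, text[len(digits):])

    # Otherwise up to three octal digits form a character code.
    octal = re.match(r"[0-7]{1,3}", text).group()
    return ("char", int(octal, 8), text[len(octal):])
```

Applied to the table above with no previous capture groups, "40" yields character 32 (space), "11" yields character 9 (tab), "0113" yields a tab with "3" left over, and "7" and "81" are backreferences. With eleven or more previous groups, "11" becomes a backreference instead.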
+.SS "Constraints on character values" +.rs +.sp +Characters that are specified using octal or hexadecimal numbers are +limited to certain values, as follows: +.sp + 8-bit non-UTF mode no greater than 0xff + 16-bit non-UTF mode no greater than 0xffff + 32-bit non-UTF mode no greater than 0xffffffff + All UTF modes no greater than 0x10ffff and a valid code point +.sp +Invalid Unicode code points are all those in the range 0xd800 to 0xdfff (the +so-called "surrogate" code points). The check for these can be disabled by the +caller of \fBpcre2_compile()\fP by setting the option +PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES. However, this is possible only in UTF-8 +and UTF-32 modes, because these values are not representable in UTF-16. +. +. +.SS "Escape sequences in character classes" +.rs +.sp +All the sequences that define a single character value can be used both inside +and outside character classes. In addition, inside a character class, \eb is +interpreted as the backspace character (hex 08). +.P +When not followed by an opening brace, \eN is not allowed in a character class. +\eB, \eR, and \eX are not special inside a character class. Like other +unrecognized alphabetic escape sequences, they cause an error. Outside a +character class, these sequences have different meanings. +. +. +.SS "Unsupported escape sequences" +.rs +.sp +In Perl, the sequences \eF, \el, \eL, \eu, and \eU are recognized by its string +handler and used to modify the case of following characters. By default, PCRE2 +does not support these escape sequences in patterns. However, if either of the +PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX options is set, \eU matches a "U" +character, and \eu can be used to define a character by code point, as +described above. +. +. +.SS "Absolute and relative backreferences" +.rs +.sp +The sequence \eg followed by a signed or unsigned number, optionally enclosed +in braces, is an absolute or relative backreference. A named backreference +can be coded as \eg{name}. 
Backreferences are discussed +.\" HTML +.\" +later, +.\" +following the discussion of +.\" HTML +.\" +parenthesized groups. +.\" +. +. +.SS "Absolute and relative subroutine calls" +.rs +.sp +For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or +a number enclosed either in angle brackets or single quotes, is an alternative +syntax for referencing a capture group as a subroutine. Details are discussed +.\" HTML +.\" +later. +.\" +Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP +synonymous. The former is a backreference; the latter is a +.\" HTML +.\" +subroutine +.\" +call. +. +. +.\" HTML +.SS "Generic character types" +.rs +.sp +Another use of backslash is for specifying generic character types: +.sp + \ed any decimal digit + \eD any character that is not a decimal digit + \eh any horizontal white space character + \eH any character that is not a horizontal white space character + \eN any character that is not a newline + \es any white space character + \eS any character that is not a white space character + \ev any vertical white space character + \eV any character that is not a vertical white space character + \ew any "word" character + \eW any "non-word" character +.sp +The \eN escape sequence has the same meaning as +.\" HTML +.\" +the "." metacharacter +.\" +when PCRE2_DOTALL is not set, but setting PCRE2_DOTALL does not change the +meaning of \eN. Note that when \eN is followed by an opening brace it has a +different meaning. See the section entitled +.\" HTML +.\" +"Non-printing characters" +.\" +above for details. Perl also uses \eN{name} to specify characters by Unicode +name; PCRE2 does not support this. +.P +Each pair of lower and upper case escape sequences partitions the complete set +of characters into two disjoint sets. Any given character matches one, and only +one, of each pair. The sequences can appear both inside and outside character +classes. 
They each match one character of the appropriate type. If the current +matching point is at the end of the subject string, all of them fail, because +there is no character to match. +.P +The default \es characters are HT (9), LF (10), VT (11), FF (12), CR (13), and +space (32), which are defined as white space in the "C" locale. This list may +vary if locale-specific matching is taking place. For example, in some locales +the "non-breaking space" character (\exA0) is recognized as white space, and in +others the VT character is not. +.P +A "word" character is an underscore or any character that is a letter or digit. +By default, the definition of letters and digits is controlled by PCRE2's +low-valued character tables, and may vary if locale-specific matching is taking +place (see +.\" HTML +.\" +"Locale support" +.\" +in the +.\" HREF +\fBpcre2api\fP +.\" +page). For example, in a French locale such as "fr_FR" in Unix-like systems, +or "french" in Windows, some character codes greater than 127 are used for +accented letters, and these are then matched by \ew. The use of locales with +Unicode is discouraged. +.P +By default, characters whose code points are greater than 127 never match \ed, +\es, or \ew, and always match \eD, \eS, and \eW, although this may be different +for characters in the range 128-255 when locale-specific matching is happening. +These escape sequences retain their original meanings from before Unicode +support was available, mainly for efficiency reasons. If the PCRE2_UCP option +is set, the behaviour is changed so that Unicode properties are used to +determine character types, as follows: +.sp + \ed any character that matches \ep{Nd} (decimal digit) + \es any character that matches \ep{Z} or \eh or \ev + \ew any character that matches \ep{L} or \ep{N}, plus underscore +.sp +The upper case escapes match the inverse sets of characters. 
Note that \ed +matches only decimal digits, whereas \ew matches any Unicode digit, as well as +any Unicode letter, and underscore. Note also that PCRE2_UCP affects \eb and +\eB because they are defined in terms of \ew and \eW. Matching these sequences +is noticeably slower when PCRE2_UCP is set. +.P +The sequences \eh, \eH, \ev, and \eV, in contrast to the other sequences, which +match only ASCII characters by default, always match a specific list of code +points, whether or not PCRE2_UCP is set. The horizontal space characters are: +.sp + U+0009 Horizontal tab (HT) + U+0020 Space + U+00A0 Non-break space + U+1680 Ogham space mark + U+180E Mongolian vowel separator + U+2000 En quad + U+2001 Em quad + U+2002 En space + U+2003 Em space + U+2004 Three-per-em space + U+2005 Four-per-em space + U+2006 Six-per-em space + U+2007 Figure space + U+2008 Punctuation space + U+2009 Thin space + U+200A Hair space + U+202F Narrow no-break space + U+205F Medium mathematical space + U+3000 Ideographic space +.sp +The vertical space characters are: +.sp + U+000A Linefeed (LF) + U+000B Vertical tab (VT) + U+000C Form feed (FF) + U+000D Carriage return (CR) + U+0085 Next line (NEL) + U+2028 Line separator + U+2029 Paragraph separator +.sp +In 8-bit, non-UTF-8 mode, only the characters with code points less than 256 +are relevant. +. +. +.\" HTML +.SS "Newline sequences" +.rs +.sp +Outside a character class, by default, the escape sequence \eR matches any +Unicode newline sequence. In 8-bit non-UTF-8 mode \eR is equivalent to the +following: +.sp + (?>\er\en|\en|\ex0b|\ef|\er|\ex85) +.sp +This is an example of an "atomic group", details of which are given +.\" HTML +.\" +below. +.\" +This particular group matches either the two-character sequence CR followed by +LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, +U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next +line, U+0085).
Because this is an atomic group, the two-character sequence is +treated as a single unit that cannot be split. +.P +In other modes, two additional characters whose code points are greater than 255 +are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). +Unicode support is not needed for these characters to be recognized. +.P +It is possible to restrict \eR to match only CR, LF, or CRLF (instead of the +complete set of Unicode line endings) by setting the option PCRE2_BSR_ANYCRLF +at compile time. (BSR is an abbreviation for "backslash R".) This can be made +the default when PCRE2 is built; if this is the case, the other behaviour can +be requested via the PCRE2_BSR_UNICODE option. It is also possible to specify +these settings by starting a pattern string with one of the following +sequences: +.sp + (*BSR_ANYCRLF) CR, LF, or CRLF only + (*BSR_UNICODE) any Unicode newline sequence +.sp +These override the default and the options given to the compiling function. +Note that these special settings, which are not Perl-compatible, are recognized +only at the very start of a pattern, and that they must be in upper case. If +more than one of them is present, the last one is used. They can be combined +with a change of newline convention; for example, a pattern can start with: +.sp + (*ANY)(*BSR_ANYCRLF) +.sp +They can also be combined with the (*UTF) or (*UCP) special sequences. Inside a +character class, \eR is treated as an unrecognized escape sequence, and causes +an error. +. +. +.\" HTML +.SS Unicode character properties +.rs +.sp +When PCRE2 is built with Unicode support (the default), three additional escape +sequences that match characters with specific properties are available. They +can be used in any mode, though in 8-bit and 16-bit non-UTF modes these +sequences are of course limited to testing characters whose code points are +less than U+0100 and U+10000, respectively.
In 32-bit non-UTF mode, code points +greater than 0x10ffff (the Unicode limit) may be encountered. These are all +treated as being in the Unknown script and with an unassigned type. The extra +escape sequences are: +.sp + \ep{\fIxx\fP} a character with the \fIxx\fP property + \eP{\fIxx\fP} a character without the \fIxx\fP property + \eX a Unicode extended grapheme cluster +.sp +The property names represented by \fIxx\fP above are case-sensitive. There is +support for Unicode script names, Unicode general category properties, "Any", +which matches any character (including newline), and some special PCRE2 +properties (described in the +.\" HTML +.\" +next section). +.\" +Other Perl properties such as "InMusicalSymbols" are not supported by PCRE2. +Note that \eP{Any} does not match any characters, so always causes a match +failure. +.P +Sets of Unicode characters are defined as belonging to certain scripts. A +character from one of these sets can be matched using a script name. For +example: +.sp + \ep{Greek} + \eP{Han} +.sp +Unassigned characters (and in non-UTF 32-bit mode, characters with code points +greater than 0x10FFFF) are assigned the "Unknown" script. Others that are not +part of an identified script are lumped together as "Common". 
The current list +of scripts is: +.P +Adlam, +Ahom, +Anatolian_Hieroglyphs, +Arabic, +Armenian, +Avestan, +Balinese, +Bamum, +Bassa_Vah, +Batak, +Bengali, +Bhaiksuki, +Bopomofo, +Brahmi, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Carian, +Caucasian_Albanian, +Chakma, +Cham, +Cherokee, +Chorasmian, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Dives_Akuru, +Dogra, +Duployan, +Egyptian_Hieroglyphs, +Elbasan, +Elymaic, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Grantha, +Greek, +Gujarati, +Gunjala_Gondi, +Gurmukhi, +Han, +Hangul, +Hanifi_Rohingya, +Hanunoo, +Hatran, +Hebrew, +Hiragana, +Imperial_Aramaic, +Inherited, +Inscriptional_Pahlavi, +Inscriptional_Parthian, +Javanese, +Kaithi, +Kannada, +Katakana, +Kayah_Li, +Kharoshthi, +Khitan_Small_Script, +Khmer, +Khojki, +Khudawadi, +Lao, +Latin, +Lepcha, +Limbu, +Linear_A, +Linear_B, +Lisu, +Lycian, +Lydian, +Mahajani, +Makasar, +Malayalam, +Mandaic, +Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, +Meetei_Mayek, +Mende_Kikakui, +Meroitic_Cursive, +Meroitic_Hieroglyphs, +Miao, +Modi, +Mongolian, +Mro, +Multani, +Myanmar, +Nabataean, +Nandinagari, +New_Tai_Lue, +Newa, +Nko, +Nushu, +Nyakeng_Puachue_Hmong, +Ogham, +Ol_Chiki, +Old_Hungarian, +Old_Italic, +Old_North_Arabian, +Old_Permic, +Old_Persian, +Old_Sogdian, +Old_South_Arabian, +Old_Turkic, +Oriya, +Osage, +Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, +Phags_Pa, +Phoenician, +Psalter_Pahlavi, +Rejang, +Runic, +Samaritan, +Saurashtra, +Sharada, +Shavian, +Siddham, +SignWriting, +Sinhala, +Sogdian, +Sora_Sompeng, +Soyombo, +Sundanese, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tai_Tham, +Tai_Viet, +Takri, +Tamil, +Tangut, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Tirhuta, +Ugaritic, +Unknown, +Vai, +Wancho, +Warang_Citi, +Yezidi, +Yi, +Zanabazar_Square. +.P +Each character has exactly one Unicode general category property, specified by +a two-letter abbreviation. 
For compatibility with Perl, negation can be +specified by including a circumflex between the opening brace and the property +name. For example, \ep{^Lu} is the same as \eP{Lu}. +.P +If only one letter is specified with \ep or \eP, it includes all the general +category properties that start with that letter. In this case, in the absence +of negation, the curly brackets in the escape sequence are optional; these two +examples have the same effect: +.sp + \ep{L} + \epL +.sp +The following general category property codes are supported: +.sp + C Other + Cc Control + Cf Format + Cn Unassigned + Co Private use + Cs Surrogate +.sp + L Letter + Ll Lower case letter + Lm Modifier letter + Lo Other letter + Lt Title case letter + Lu Upper case letter +.sp + M Mark + Mc Spacing mark + Me Enclosing mark + Mn Non-spacing mark +.sp + N Number + Nd Decimal number + Nl Letter number + No Other number +.sp + P Punctuation + Pc Connector punctuation + Pd Dash punctuation + Pe Close punctuation + Pf Final punctuation + Pi Initial punctuation + Po Other punctuation + Ps Open punctuation +.sp + S Symbol + Sc Currency symbol + Sk Modifier symbol + Sm Mathematical symbol + So Other symbol +.sp + Z Separator + Zl Line separator + Zp Paragraph separator + Zs Space separator +.sp +The special property L& is also supported: it matches a character that has +the Lu, Ll, or Lt property, in other words, a letter that is not classified as +a modifier or "other". +.P +The Cs (Surrogate) property applies only to characters whose code points are in +the range U+D800 to U+DFFF. These characters are no different to any other +character when PCRE2 is not in UTF mode (using the 16-bit or 32-bit library). +However, they are not valid in Unicode strings and so cannot be tested by PCRE2 +in UTF mode, unless UTF validity checking has been turned off (see the +discussion of PCRE2_NO_UTF_CHECK in the +.\" HREF +\fBpcre2api\fP +.\" +page). 
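The way one-letter codes, two-letter codes, the circumflex negation, and L& relate to each other can be sketched in Python using the standard unicodedata module. This is only an approximation for illustration: the function name has_general_category is hypothetical, the "Any" property and script names are not handled, and Python's Unicode database may be a different version from the one PCRE2 was built with.

```python
import unicodedata


def has_general_category(ch, prop):
    """Approximate \\p{prop} for general category codes only.

    prop is a one- or two-letter code such as "L" or "Lu", the special
    name "L&", or any of these preceded by "^" for negation.
    """
    negate = prop.startswith("^")
    if negate:
        prop = prop[1:]

    category = unicodedata.category(ch)  # always a two-letter code
    if prop == "L&":
        # A letter that is not a modifier letter or "other" letter.
        matched = category in ("Lu", "Ll", "Lt")
    else:
        # A one-letter code covers every category starting with that letter;
        # a two-letter code must match exactly.
        matched = category.startswith(prop)

    return matched != negate
```

For instance, has_general_category("a", "^Lu") is true in the same way that \ep{^Lu} and \eP{Lu} match a lower case letter, and has_general_category("\u01c5", "L&") is true because U+01C5 is a title case letter (Lt).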
+.P +The long synonyms for property names that Perl supports (such as \ep{Letter}) +are not supported by PCRE2, nor is it permitted to prefix any of these +properties with "Is". +.P +No character that is in the Unicode table has the Cn (unassigned) property. +Instead, this property is assumed for any code point that is not in the +Unicode table. +.P +Specifying caseless matching does not affect these escape sequences. For +example, \ep{Lu} always matches only upper case letters. This is different from +the behaviour of current versions of Perl. +.P +Matching characters by Unicode property is not fast, because PCRE2 has to do a +multistage table lookup in order to find a character's property. That is why +the traditional escape sequences such as \ed and \ew do not use Unicode +properties in PCRE2 by default, though you can make them do so by setting the +PCRE2_UCP option or by starting the pattern with (*UCP). +. +. +.SS Extended grapheme clusters +.rs +.sp +The \eX escape matches any number of Unicode characters that form an "extended +grapheme cluster", and treats the sequence as an atomic group +.\" HTML +.\" +(see below). +.\" +Unicode supports various kinds of composite character by giving each character +a grapheme breaking property, and having rules that use these properties to +define the boundaries of extended grapheme clusters. The rules are defined in +Unicode Standard Annex 29, "Unicode Text Segmentation". Unicode 11.0.0 +abandoned the use of some previous properties that had been used for emojis. +Instead it introduced various emoji-specific properties. PCRE2 uses only the +Extended Pictographic property. +.P +\eX always matches at least one character. Then it decides whether to add +additional characters according to the following rules for ending a cluster: +.P +1. End at the end of the subject string. +.P +2. Do not end between CR and LF; otherwise end after any control character. +.P +3. Do not break Hangul (a Korean script) syllable sequences. 
Hangul characters +are of five types: L, V, T, LV, and LVT. An L character may be followed by an +L, V, LV, or LVT character; an LV or V character may be followed by a V or T +character; an LVT or T character may be followed only by a T character. +.P +4. Do not end before extending characters or spacing marks or the "zero-width +joiner" character. Characters with the "mark" property always have the +"extend" grapheme breaking property. +.P +5. Do not end after prepend characters. +.P +6. Do not break within emoji modifier sequences or emoji zwj sequences. That +is, do not break between characters with the Extended_Pictographic property. +Extend and ZWJ characters are allowed between the characters. +.P +7. Do not break within emoji flag sequences. That is, do not break between +regional indicator (RI) characters if there are an odd number of RI characters +before the break point. +.P +8. Otherwise, end the cluster. +. +. +.\" HTML +.SS PCRE2's additional properties +.rs +.sp +As well as the standard Unicode properties described above, PCRE2 supports four +more that make it possible to convert traditional escape sequences such as \ew +and \es to use Unicode properties. PCRE2 uses these non-standard, non-Perl +properties internally when PCRE2_UCP is set. However, they may also be used +explicitly. These properties are: +.sp + Xan Any alphanumeric character + Xps Any POSIX space character + Xsp Any Perl space character + Xwd Any Perl "word" character +.sp +Xan matches characters that have either the L (letter) or the N (number) +property. Xps matches the characters tab, linefeed, vertical tab, form feed, or +carriage return, and any other character that has the Z (separator) property. +Xsp is the same as Xps; in PCRE1 it used to exclude vertical tab, for Perl +compatibility, but Perl changed. Xwd matches the same characters as Xan, plus +underscore.
+.P +There is another non-standard property, Xuc, which matches any character that +can be represented by a Universal Character Name in C++ and other programming +languages. These are the characters $, @, ` (grave accent), and all characters +with Unicode code points greater than or equal to U+00A0, except for the +surrogates U+D800 to U+DFFF. Note that most base (ASCII) characters are +excluded. (Universal Character Names are of the form \euHHHH or \eUHHHHHHHH +where H is a hexadecimal digit. Note that the Xuc property does not match these +sequences but the characters that they represent.) +. +. +.\" HTML +.SS "Resetting the match start" +.rs +.sp +In normal use, the escape sequence \eK causes any previously matched characters +not to be included in the final matched sequence that is returned. For example, +the pattern: +.sp + foo\eKbar +.sp +matches "foobar", but reports that it has matched "bar". \eK does not interact +with anchoring in any way. The pattern: +.sp + ^foo\eKbar +.sp +matches only when the subject begins with "foobar" (in single line mode), +though it again reports the matched string as "bar". This feature is similar to +a lookbehind assertion +.\" HTML +.\" +(described below). +.\" +However, in this case, the part of the subject before the real match does not +have to be of fixed length, as lookbehind assertions do. The use of \eK does +not interfere with the setting of +.\" HTML +.\" +captured substrings. +.\" +For example, when the pattern +.sp + (foo)\eKbar +.sp +matches "foobar", the first substring is still set to "foo". +.P +Perl used to document that the use of \eK within lookaround assertions is "not +well defined", but from version 5.32.0 Perl does not support this usage at all. +In PCRE2, \eK is acted upon when it occurs inside positive assertions, but is +ignored in negative assertions. Note that when a pattern such as (?=ab\eK) +matches, the reported start of the match can be greater than the end of the +match. 
Using \eK in a lookbehind assertion at the start of a pattern can also +lead to odd effects. For example, consider this pattern: +.sp + (?<=\eKfoo)bar +.sp +If the subject is "foobar", a call to \fBpcre2_match()\fP with a starting +offset of 3 succeeds and reports the matching string as "foobar", that is, the +start of the reported match is earlier than where the match started. +. +. +.\" HTML +.SS "Simple assertions" +.rs +.sp +The final use of backslash is for certain simple assertions. An assertion +specifies a condition that has to be met at a particular point in a match, +without consuming any characters from the subject string. The use of +groups for more complicated assertions is described +.\" HTML +.\" +below. +.\" +The backslashed assertions are: +.sp + \eb matches at a word boundary + \eB matches when not at a word boundary + \eA matches at the start of the subject + \eZ matches at the end of the subject + also matches before a newline at the end of the subject + \ez matches only at the end of the subject + \eG matches at the first matching position in the subject +.sp +Inside a character class, \eb has a different meaning; it matches the backspace +character. If any other of these assertions appears in a character class, an +"invalid escape sequence" error is generated. +.P +A word boundary is a position in the subject string where the current character +and the previous character do not both match \ew or \eW (i.e. one matches +\ew and the other matches \eW), or the start or end of the string if the +first or last character matches \ew, respectively. When PCRE2 is built with +Unicode support, the meanings of \ew and \eW can be changed by setting the +PCRE2_UCP option. When this is done, it also affects \eb and \eB. Neither PCRE2 +nor Perl has a separate "start of word" or "end of word" metasequence. However, +whatever follows \eb normally determines which it is. For example, the fragment +\eba matches "a" at the start of a word. 
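The way that the item following or preceding \eb selects "start of word" or "end of word" can be seen in a small experiment. This sketch uses Python's standard re module as a stand-in, which is an assumption of the example; its \eb follows the same definition for this ASCII subject, but it is not PCRE2. The subject string and variable names are illustrative only.

```python
import re

subject = "a banana at sea"

# \ba: the boundary comes first, so only an "a" that begins a word matches.
starts = [m.start() for m in re.finditer(r"\ba", subject)]

# a\b: the boundary comes last, so only an "a" that ends a word matches.
ends = [m.start() for m in re.finditer(r"a\b", subject)]

print(starts, ends)
```

The single-letter word "a" at offset 0 appears in both lists, because it both begins and ends a word.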
+.P +The \eA, \eZ, and \ez assertions differ from the traditional circumflex and +dollar (described in the next section) in that they only ever match at the very +start and end of the subject string, whatever options are set. Thus, they are +independent of multiline mode. These three assertions are not affected by the +PCRE2_NOTBOL or PCRE2_NOTEOL options, which affect only the behaviour of the +circumflex and dollar metacharacters. However, if the \fIstartoffset\fP +argument of \fBpcre2_match()\fP is non-zero, indicating that matching is to +start at a point other than the beginning of the subject, \eA can never match. +The difference between \eZ and \ez is that \eZ matches before a newline at the +end of the string as well as at the very end, whereas \ez matches only at the +end. +.P +The \eG assertion is true only when the current matching position is at the +start point of the matching process, as specified by the \fIstartoffset\fP +argument of \fBpcre2_match()\fP. It differs from \eA when the value of +\fIstartoffset\fP is non-zero. By calling \fBpcre2_match()\fP multiple times +with appropriate arguments, you can mimic Perl's /g option, and it is in this +kind of implementation where \eG can be useful. +.P +Note, however, that PCRE2's implementation of \eG, being true at the starting +character of the matching process, is subtly different from Perl's, which +defines it as true at the end of the previous match. In Perl, these can be +different when the previously matched string was empty. Because PCRE2 does just +one match at a time, it cannot reproduce this behaviour. +.P +If all the alternatives of a pattern begin with \eG, the expression is anchored +to the starting match position, and the "anchored" flag is set in the compiled +regular expression. +. +. +.SH "CIRCUMFLEX AND DOLLAR" +.rs +.sp +The circumflex and dollar metacharacters are zero-width assertions. 
That is, +they test for a particular condition being true without consuming any +characters from the subject string. These two metacharacters are concerned with +matching the starts and ends of lines. If the newline convention is set so that +only the two-character sequence CRLF is recognized as a newline, isolated CR +and LF characters are treated as ordinary data characters, and are not +recognized as newlines. +.P +Outside a character class, in the default matching mode, the circumflex +character is an assertion that is true only if the current matching point is at +the start of the subject string. If the \fIstartoffset\fP argument of +\fBpcre2_match()\fP is non-zero, or if PCRE2_NOTBOL is set, circumflex can +never match if the PCRE2_MULTILINE option is unset. Inside a character class, +circumflex has an entirely different meaning +.\" HTML +.\" +(see below). +.\" +.P +Circumflex need not be the first character of the pattern if a number of +alternatives are involved, but it should be the first thing in each alternative +in which it appears if the pattern is ever to match that branch. If all +possible alternatives start with a circumflex, that is, if the pattern is +constrained to match only at the start of the subject, it is said to be an +"anchored" pattern. (There are also other constructs that can cause a pattern +to be anchored.) +.P +The dollar character is an assertion that is true only if the current matching +point is at the end of the subject string, or immediately before a newline at +the end of the string (by default), unless PCRE2_NOTEOL is set. Note, however, +that it does not actually match the newline. Dollar need not be the last +character of the pattern if a number of alternatives are involved, but it +should be the last item in any branch in which it appears. Dollar has no +special meaning in a character class. 
+.P +The meaning of dollar can be changed so that it matches only at the very end of +the string, by setting the PCRE2_DOLLAR_ENDONLY option at compile time. This +does not affect the \eZ assertion. +.P +The meanings of the circumflex and dollar metacharacters are changed if the +PCRE2_MULTILINE option is set. When this is the case, a dollar character +matches before any newlines in the string, as well as at the very end, and a +circumflex matches immediately after internal newlines as well as at the start +of the subject string. It does not match after a newline that ends the string, +for compatibility with Perl. However, this can be changed by setting the +PCRE2_ALT_CIRCUMFLEX option. +.P +For example, the pattern /^abc$/ matches the subject string "def\enabc" (where +\en represents a newline) in multiline mode, but not otherwise. Consequently, +patterns that are anchored in single line mode because all branches start with +^ are not anchored in multiline mode, and a match for circumflex is possible +when the \fIstartoffset\fP argument of \fBpcre2_match()\fP is non-zero. The +PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is set. +.P +When the newline convention (see +.\" HTML +.\" +"Newline conventions" +.\" +below) recognizes the two-character sequence CRLF as a newline, this is +preferred, even if the single characters CR and LF are also recognized as +newlines. For example, if the newline convention is "any", a multiline mode +circumflex matches before "xyz" in the string "abc\er\enxyz" rather than after +CR, even though CR on its own is a valid newline. (It also matches at the very +start of the string, of course.) +.P +Note that the sequences \eA, \eZ, and \ez can be used to match the start and +end of the subject in both modes, and if all branches of a pattern start with +\eA it is always anchored, whether or not PCRE2_MULTILINE is set. +. +. 
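The /^abc$/ example from this section can be tried with any engine that follows the Perl conventions for circumflex and dollar. The sketch below uses Python's standard re module, which is an assumption of the example and not PCRE2; re.MULTILINE plays the role of PCRE2_MULTILINE here. Other details, such as PCRE2_DOLLAR_ENDONLY and the newline conventions, have no counterpart in this sketch.

```python
import re

subject = "def\nabc"

# Default mode: circumflex matches only at the very start of the subject,
# so the pattern cannot match the second line.
single = re.search(r"^abc$", subject)

# Multiline mode: circumflex also matches immediately after the internal newline.
multi = re.search(r"^abc$", subject, re.MULTILINE)

# In default mode, dollar also matches just before a newline that ends the
# subject, without consuming that newline.
before_final_newline = re.search(r"abc$", "abc\n")
```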
+.\" HTML +.SH "FULL STOP (PERIOD, DOT) AND \eN" +.rs +.sp +Outside a character class, a dot in the pattern matches any one character in +the subject string except (by default) a character that signifies the end of a +line. +.P +When a line ending is defined as a single character, dot never matches that +character; when the two-character sequence CRLF is used, dot does not match CR +if it is immediately followed by LF, but otherwise it matches all characters +(including isolated CRs and LFs). When any Unicode line endings are being +recognized, dot does not match CR or LF or any of the other line ending +characters. +.P +The behaviour of dot with regard to newlines can be changed. If the +PCRE2_DOTALL option is set, a dot matches any one character, without exception. +If the two-character sequence CRLF is present in the subject string, it takes +two dots to match it. +.P +The handling of dot is entirely independent of the handling of circumflex and +dollar, the only relationship being that they both involve newlines. Dot has no +special meaning in a character class. +.P +The escape sequence \eN when not followed by an opening brace behaves like a +dot, except that it is not affected by the PCRE2_DOTALL option. In other words, +it matches any character except one that signifies the end of a line. +.P +When \eN is followed by an opening brace it has a different meaning. See the +section entitled +.\" HTML +.\" +"Non-printing characters" +.\" +above for details. Perl also uses \eN{name} to specify characters by Unicode +name; PCRE2 does not support this. +. +. +.SH "MATCHING A SINGLE CODE UNIT" +.rs +.sp +Outside a character class, the escape sequence \eC matches any one code unit, +whether or not a UTF mode is set. In the 8-bit library, one code unit is one +byte; in the 16-bit library it is a 16-bit unit; in the 32-bit library it is a +32-bit unit. Unlike a dot, \eC always matches line-ending characters. 
The +feature is provided in Perl in order to match individual bytes in UTF-8 mode, +but it is unclear how it can usefully be used. +.P +Because \eC breaks up characters into individual code units, matching one unit +with \eC in UTF-8 or UTF-16 mode means that the rest of the string may start +with a malformed UTF character. This has undefined results, because PCRE2 +assumes that it is matching character by character in a valid UTF string (by +default it checks the subject string's validity at the start of processing +unless the PCRE2_NO_UTF_CHECK or PCRE2_MATCH_INVALID_UTF option is used). +.P +An application can lock out the use of \eC by setting the +PCRE2_NEVER_BACKSLASH_C option when compiling a pattern. It is also possible to +build PCRE2 with the use of \eC permanently disabled. +.P +PCRE2 does not allow \eC to appear in lookbehind assertions +.\" HTML +.\" +(described below) +.\" +in UTF-8 or UTF-16 modes, because this would make it impossible to calculate +the length of the lookbehind. Neither the alternative matching function +\fBpcre2_dfa_match()\fP nor the JIT optimizer support \eC in these UTF modes. +The former gives a match-time error; the latter fails to optimize and so the +match is always run using the interpreter. +.P +In the 32-bit library, however, \eC is always supported (when not explicitly +locked out) because it always matches a single code unit, whether or not UTF-32 +is specified. +.P +In general, the \eC escape sequence is best avoided. 
However, one way of using +it that avoids the problem of malformed UTF-8 or UTF-16 characters is to use a +lookahead to check the length of the next character, as in this pattern, which +could be used with a UTF-8 string (ignore white space and line breaks): +.sp + (?| (?=[\ex00-\ex7f])(\eC) | + (?=[\ex80-\ex{7ff}])(\eC)(\eC) | + (?=[\ex{800}-\ex{ffff}])(\eC)(\eC)(\eC) | + (?=[\ex{10000}-\ex{1fffff}])(\eC)(\eC)(\eC)(\eC)) +.sp +In this example, a group that starts with (?| resets the capturing parentheses +numbers in each alternative (see +.\" HTML +.\" +"Duplicate Group Numbers" +.\" +below). The assertions at the start of each branch check the next UTF-8 +character for values whose encoding uses 1, 2, 3, or 4 bytes, respectively. The +character's individual bytes are then captured by the appropriate number of +\eC groups. +. +. +.\" HTML +.SH "SQUARE BRACKETS AND CHARACTER CLASSES" +.rs +.sp +An opening square bracket introduces a character class, terminated by a closing +square bracket. A closing square bracket on its own is not special by default. +If a closing square bracket is required as a member of the class, it should be +the first data character in the class (after an initial circumflex, if present) +or escaped with a backslash. This means that, by default, an empty class cannot +be defined. However, if the PCRE2_ALLOW_EMPTY_CLASS option is set, a closing +square bracket at the start does end the (empty) class. +.P +A character class matches a single character in the subject. A matched +character must be in the set of characters defined by the class, unless the +first character in the class definition is a circumflex, in which case the +subject character must not be in the set defined by the class. If a circumflex +is actually required as a member of the class, ensure it is not the first +character, or escape it with a backslash. 
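The placement rules for a closing square bracket and a circumflex inside a class can be checked with a short experiment. This sketch uses Python's standard re module, which is an assumption of the example; it shares these particular rules with the default PCRE2 behaviour described above, but it has no equivalent of PCRE2_ALLOW_EMPTY_CLASS.

```python
import re

subject = "a]^b"

# "]" placed first is a literal member of the class; "^" is literal because
# it is not the first character.
members = re.findall(r"[]^a]", subject)

# With an initial circumflex the class is negated, and "]" is still literal
# when it is the first data character after the circumflex.
non_members = re.findall(r"[^]a]", subject)
```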
+.P +For example, the character class [aeiou] matches any lower case vowel, while +[^aeiou] matches any character that is not a lower case vowel. Note that a +circumflex is just a convenient notation for specifying the characters that +are in the class by enumerating those that are not. A class that starts with a +circumflex is not an assertion; it still consumes a character from the subject +string, and therefore it fails if the current pointer is at the end of the +string. +.P +Characters in a class may be specified by their code points using \eo, \ex, or +\eN{U+hh..} in the usual way. When caseless matching is set, any letters in a +class represent both their upper case and lower case versions, so for example, +a caseless [aeiou] matches "A" as well as "a", and a caseless [^aeiou] does not +match "A", whereas a caseful version would. Note that there are two ASCII +characters, K and S, that, in addition to their lower case ASCII equivalents, +are case-equivalent with Unicode U+212A (Kelvin sign) and U+017F (long S) +respectively when either PCRE2_UTF or PCRE2_UCP is set. +.P +Characters that might indicate line breaks are never treated in any special way +when matching character classes, whatever line-ending sequence is in use, and +whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A +class such as [^a] always matches one of these characters. +.P +The generic character type escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, +\eS, \ev, \eV, \ew, and \eW may appear in a character class, and add the +characters that they match to the class. For example, [\edABCDEF] matches any +hexadecimal digit. In UTF modes, the PCRE2_UCP option affects the meanings of +\ed, \es, \ew and their upper case partners, just as it does when they appear +outside a character class, as described in the section entitled +.\" HTML +.\" +"Generic character types" +.\" +above. 
The escape sequence \eb has a different meaning inside a character +class; it matches the backspace character. The sequences \eB, \eR, and \eX are +not special inside a character class. Like any other unrecognized escape +sequences, they cause an error. The same is true for \eN when not followed by +an opening brace. +.P +The minus (hyphen) character can be used to specify a range of characters in a +character class. For example, [d-m] matches any letter between d and m, +inclusive. If a minus character is required in a class, it must be escaped with +a backslash or appear in a position where it cannot be interpreted as +indicating a range, typically as the first or last character in the class, +or immediately after a range. For example, [b-d-z] matches letters in the range +b to d, a hyphen character, or z. +.P +Perl treats a hyphen as a literal if it appears before or after a POSIX class +(see below) or before or after a character type escape such as \ed or \eH. +However, unless the hyphen is the last character in the class, Perl outputs a +warning in its warning mode, as this is most likely a user error. As PCRE2 has +no facility for warning, an error is given in these cases. +.P +It is not possible to have the literal character "]" as the end character of a +range. A pattern such as [W-]46] is interpreted as a class of two characters +("W" and "-") followed by a literal string "46]", so it would match "W46]" or +"-46]". However, if the "]" is escaped with a backslash it is interpreted as +the end of the range, so [W-\e]46] is interpreted as a class containing a range +followed by two other characters. The octal or hexadecimal representation of +"]" can also be used to end a range. +.P +Ranges normally include all code points between the start and end characters, +inclusive. They can also be used for code points specified numerically, for +example [\e000-\e037]. Ranges can include any characters that are valid for the +current mode.
In any UTF mode, the so-called "surrogate" characters (those +whose code points lie between 0xd800 and 0xdfff inclusive) may not be specified +explicitly by default (the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES option disables +this check). However, ranges such as [\ex{d7ff}-\ex{e000}], which include the +surrogates, are always permitted. +.P +There is a special case in EBCDIC environments for ranges whose end points are +both specified as literal letters in the same case. For compatibility with +Perl, EBCDIC code points within the range that are not letters are omitted. For +example, [h-k] matches only four characters, even though the codes for h and k +are 0x88 and 0x92, a range of 11 code points. However, if the range is +specified numerically, for example, [\ex88-\ex92] or [h-\ex92], all code points +are included. +.P +If a range that includes letters is used when caseless matching is set, it +matches the letters in either case. For example, [W-c] is equivalent to +[][\e\e^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character +tables for a French locale are in use, [\exc8-\excb] matches accented E +characters in both cases. +.P +A circumflex can conveniently be used with the upper case character types to +specify a more restricted set of characters than the matching lower case type. +For example, the class [^\eW_] matches any letter or digit, but not underscore, +whereas [\ew] includes underscore. A positive character class should be read as +"something OR something OR ..." and a negative class as "NOT something AND NOT +something AND NOT ...". +.P +The only metacharacters that are recognized in character classes are backslash, +hyphen (only where it can be interpreted as specifying a range), circumflex +(only at the start), opening square bracket (only when it can be interpreted as +introducing a POSIX class name, or for a special compatibility feature - see +the next two sections), and the terminating closing square bracket. 
However, +escaping other non-alphanumeric characters does no harm. +. +. +.SH "POSIX CHARACTER CLASSES" +.rs +.sp +Perl supports the POSIX notation for character classes. This uses names +enclosed by [: and :] within the enclosing square brackets. PCRE2 also supports +this notation. For example, +.sp + [01[:alpha:]%] +.sp +matches "0", "1", any alphabetic character, or "%". The supported class names +are: +.sp + alnum letters and digits + alpha letters + ascii character codes 0 - 127 + blank space or tab only + cntrl control characters + digit decimal digits (same as \ed) + graph printing characters, excluding space + lower lower case letters + print printing characters, including space + punct printing characters, excluding letters and digits and space + space white space (the same as \es from PCRE2 8.34) + upper upper case letters + word "word" characters (same as \ew) + xdigit hexadecimal digits +.sp +The default "space" characters are HT (9), LF (10), VT (11), FF (12), CR (13), +and space (32). If locale-specific matching is taking place, the list of space +characters may be different; there may be fewer or more of them. "Space" and +\es match the same set of characters. +.P +The name "word" is a Perl extension, and "blank" is a GNU extension from Perl +5.8. Another Perl extension is negation, which is indicated by a ^ character +after the colon. For example, +.sp + [12[:^digit:]] +.sp +matches "1", "2", or any non-digit. PCRE2 (and Perl) also recognize the POSIX +syntax [.ch.] and [=ch=] where "ch" is a "collating element", but these are not +supported, and an error is given if they are encountered. +.P +By default, characters with values greater than 127 do not match any of the +POSIX character classes, although this may be different for characters in the +range 128-255 when locale-specific matching is happening. 
However, if the +PCRE2_UCP option is passed to \fBpcre2_compile()\fP, some of the classes are +changed so that Unicode character properties are used. This is achieved by +replacing certain POSIX classes with other sequences, as follows: +.sp + [:alnum:] becomes \ep{Xan} + [:alpha:] becomes \ep{L} + [:blank:] becomes \eh + [:cntrl:] becomes \ep{Cc} + [:digit:] becomes \ep{Nd} + [:lower:] becomes \ep{Ll} + [:space:] becomes \ep{Xps} + [:upper:] becomes \ep{Lu} + [:word:] becomes \ep{Xwd} +.sp +Negated versions, such as [:^alpha:], use \eP instead of \ep. Three other POSIX +classes are handled specially in UCP mode: +.TP 10 +[:graph:] +This matches characters that have glyphs that mark the page when printed. In +Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf +properties, except for: +.sp + U+061C Arabic Letter Mark + U+180E Mongolian Vowel Separator + U+2066 - U+2069 Various "isolate"s +.sp +.TP 10 +[:print:] +This matches the same characters as [:graph:] plus space characters that are +not controls, that is, characters with the Zs property. +.TP 10 +[:punct:] +This matches all characters that have the Unicode P (punctuation) property, +plus those characters with code points less than 256 that have the S (Symbol) +property. +.P +The other POSIX classes are unchanged, and match only characters with code +points less than 256. +. +. +.SH "COMPATIBILITY FEATURE FOR WORD BOUNDARIES" +.rs +.sp +In the POSIX.2 compliant library that was included in 4.4BSD Unix, the ugly +syntax [[:<:]] and [[:>:]] is used for matching "start of word" and "end of +word". PCRE2 treats these items as follows: +.sp + [[:<:]] is converted to \eb(?=\ew) + [[:>:]] is converted to \eb(?<=\ew) +.sp +Only these exact character sequences are recognized. A sequence such as +[a[:<:]b] provokes an error for an unrecognized POSIX class name. This support is +not compatible with Perl.
It is provided to help migrations from other +environments, and is best not used in any new patterns. Note that \eb matches +at the start and the end of a word (see +.\" HTML +.\" +"Simple assertions" +.\" +above), and in a Perl-style pattern the preceding or following character +normally shows which is wanted, without the need for the assertions that are +used above in order to give exactly the POSIX behaviour. +. +. +.SH "VERTICAL BAR" +.rs +.sp +Vertical bar characters are used to separate alternative patterns. For example, +the pattern +.sp + gilbert|sullivan +.sp +matches either "gilbert" or "sullivan". Any number of alternatives may appear, +and an empty alternative is permitted (matching the empty string). The matching +process tries each alternative in turn, from left to right, and the first one +that succeeds is used. If the alternatives are within a group +.\" HTML +.\" +(defined below), +.\" +"succeeds" means matching the rest of the main pattern as well as the +alternative in the group. +. +. +.\" HTML +.SH "INTERNAL OPTION SETTING" +.rs +.sp +The settings of the PCRE2_CASELESS, PCRE2_MULTILINE, PCRE2_DOTALL, +PCRE2_EXTENDED, PCRE2_EXTENDED_MORE, and PCRE2_NO_AUTO_CAPTURE options can be +changed from within the pattern by a sequence of letters enclosed between "(?" +and ")". These options are Perl-compatible, and are described in detail in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. The option letters are: +.sp + i for PCRE2_CASELESS + m for PCRE2_MULTILINE + n for PCRE2_NO_AUTO_CAPTURE + s for PCRE2_DOTALL + x for PCRE2_EXTENDED + xx for PCRE2_EXTENDED_MORE +.sp +For example, (?im) sets caseless, multiline matching. It is also possible to +unset these options by preceding the relevant letters with a hyphen, for +example (?-im). The two "extended" options are not independent; unsetting either +one cancels the effects of both of them. 
+.P +A combined setting and unsetting such as (?im-sx), which sets PCRE2_CASELESS +and PCRE2_MULTILINE while unsetting PCRE2_DOTALL and PCRE2_EXTENDED, is also +permitted. Only one hyphen may appear in the options string. If a letter +appears both before and after the hyphen, the option is unset. An empty options +setting "(?)" is allowed. Needless to say, it has no effect. +.P +If the first character following (? is a circumflex, it causes all of the above +options to be unset. Thus, (?^) is equivalent to (?-imnsx). Letters may follow +the circumflex to cause some options to be re-instated, but a hyphen may not +appear. +.P +The PCRE2-specific options PCRE2_DUPNAMES and PCRE2_UNGREEDY can be changed in +the same way as the Perl-compatible options by using the characters J and U +respectively. However, these are not unset by (?^). +.P +When one of these option changes occurs at top level (that is, not inside +group parentheses), the change applies to the remainder of the pattern +that follows. An option change within a group (see below for a description +of groups) affects only that part of the group that follows it, so +.sp + (a(?i)b)c +.sp +matches abc and aBc and no other strings (assuming PCRE2_CASELESS is not used). +By this means, options can be made to have different settings in different +parts of the pattern. Any changes made in one alternative do carry on +into subsequent branches within the same group. For example, +.sp + (a(?i)b|c) +.sp +matches "ab", "aB", "c", and "C", even though when matching "C" the first +branch is abandoned before the option setting. This is because the effects of +option settings happen at compile time. There would be some very weird +behaviour otherwise. +.P +As a convenient shorthand, if any option settings are required at the start of +a non-capturing group (see the next section), the option letters may +appear between the "?" and the ":". 
Thus the two patterns +.sp + (?i:saturday|sunday) + (?:(?i)saturday|sunday) +.sp +match exactly the same set of strings. +.P +\fBNote:\fP There are other PCRE2-specific options, applying to the whole +pattern, which can be set by the application when the compiling function is +called. In addition, the pattern can contain special leading sequences such as +(*CRLF) to override what the application has set or what has been defaulted. +Details are given in the section entitled +.\" HTML +.\" +"Newline sequences" +.\" +above. There are also the (*UTF) and (*UCP) leading sequences that can be used +to set UTF and Unicode property modes; they are equivalent to setting the +PCRE2_UTF and PCRE2_UCP options, respectively. However, the application can set +the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP options, which lock out the use of the +(*UTF) and (*UCP) sequences. +. +. +.\" HTML +.SH GROUPS +.rs +.sp +Groups are delimited by parentheses (round brackets), which can be nested. +Turning part of a pattern into a group does two things: +.sp +1. It localizes a set of alternatives. For example, the pattern +.sp + cat(aract|erpillar|) +.sp +matches "cataract", "caterpillar", or "cat". Without the parentheses, it would +match "cataract", "erpillar" or an empty string. +.sp +2. It creates a "capture group". This means that, when the whole pattern +matches, the portion of the subject string that matched the group is passed +back to the caller, separately from the portion that matched the whole pattern. +(This applies only to the traditional matching function; the DFA matching +function does not support capturing.) +.P +Opening parentheses are counted from left to right (starting from 1) to obtain +numbers for capture groups. For example, if the string "the red king" is +matched against the pattern +.sp + the ((red|white) (king|queen)) +.sp +the captured substrings are "red king", "red", and "king", and are numbered 1, +2, and 3, respectively. 
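The numbering rule above can be checked with a short sketch. It uses Python's standard re module as a stand-in for PCRE2 (an assumption for the sake of a runnable example; the two engines agree on how plain parentheses are numbered), and the variable names are hypothetical.

```python
import re

# Capture groups are numbered by the position of their opening
# parenthesis, counted from left to right starting at 1.
match = re.search(r"the ((red|white) (king|queen))", "the red king")
groups = match.groups()  # (group 1, group 2, group 3)

# Parentheses also localize a set of alternatives; the empty third
# alternative lets the group match nothing, so plain "cat" is accepted.
accepted = [s for s in ("cataract", "caterpillar", "cat")
            if re.fullmatch(r"cat(aract|erpillar|)", s)]
```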
+.P +The fact that plain parentheses fulfil two functions is not always helpful. +There are often times when grouping is required without capturing. If an +opening parenthesis is followed by a question mark and a colon, the group +does not do any capturing, and is not counted when computing the number of any +subsequent capture groups. For example, if the string "the white queen" +is matched against the pattern +.sp + the ((?:red|white) (king|queen)) +.sp +the captured substrings are "white queen" and "queen", and are numbered 1 and +2. The maximum number of capture groups is 65535. +.P +As a convenient shorthand, if any option settings are required at the start of +a non-capturing group, the option letters may appear between the "?" and the +":". Thus the two patterns +.sp + (?i:saturday|sunday) + (?:(?i)saturday|sunday) +.sp +match exactly the same set of strings. Because alternative branches are tried +from left to right, and options are not reset until the end of the group is +reached, an option setting in one branch does affect subsequent branches, so +the above patterns match "SUNDAY" as well as "Saturday". +. +. +.\" HTML +.SH "DUPLICATE GROUP NUMBERS" +.rs +.sp +Perl 5.10 introduced a feature whereby each alternative in a group uses the +same numbers for its capturing parentheses. Such a group starts with (?| and is +itself a non-capturing group. For example, consider this pattern: +.sp + (?|(Sat)ur|(Sun))day +.sp +Because the two alternatives are inside a (?| group, both sets of capturing +parentheses are numbered one. Thus, when the pattern matches, you can look +at captured substring number one, whichever alternative matched. This construct +is useful when you want to capture part, but not all, of one of a number of +alternatives. Inside a (?| group, parentheses are numbered as usual, but the +number is reset at the start of each branch. 
The numbers of any capturing +parentheses that follow the whole group start after the highest number used in +any branch. The following example is taken from the Perl documentation. The +numbers underneath show in which buffer the captured content will be stored. +.sp + # before ---------------branch-reset----------- after + / ( a ) (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x + # 1 2 2 3 2 3 4 +.sp +A backreference to a capture group uses the most recent value that is set for +the group. The following pattern matches "abcabc" or "defdef": +.sp + /(?|(abc)|(def))\e1/ +.sp +In contrast, a subroutine call to a capture group always refers to the +first one in the pattern with the given number. The following pattern matches +"abcabc" or "defabc": +.sp + /(?|(abc)|(def))(?1)/ +.sp +A relative reference such as (?-1) is no different: it is just a convenient way +of computing an absolute group number. +.P +If a +.\" HTML +.\" +condition test +.\" +for a group's having matched refers to a non-unique number, the test is +true if any group with that number has matched. +.P +An alternative approach to using this "branch reset" feature is to use +duplicate named groups, as described in the next section. +. +. +.SH "NAMED CAPTURE GROUPS" +.rs +.sp +Identifying capture groups by number is simple, but it can be very hard to keep +track of the numbers in complicated patterns. Furthermore, if an expression is +modified, the numbers may change. To help with this difficulty, PCRE2 supports +the naming of capture groups. This feature was not added to Perl until release +5.10. Python had the feature earlier, and PCRE1 introduced it at release 4.0, +using the Python syntax. PCRE2 supports both the Perl and the Python syntax. +.P +In PCRE2, a capture group can be named in one of three ways: (?<name>...) or +(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to 32 +code units long. 
When PCRE2_UTF is not set, they may contain only ASCII +alphanumeric characters and underscores, but must start with a non-digit. When +PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode +letter or Unicode decimal digit. In other words, group names must match one of +these patterns: +.sp + ^[_A-Za-z][_A-Za-z0-9]*\ez when PCRE2_UTF is not set + ^[_\ep{L}][_\ep{L}\ep{Nd}]*\ez when PCRE2_UTF is set +.sp +References to capture groups from other parts of the pattern, such as +.\" HTML +.\" +backreferences, +.\" +.\" HTML +.\" +recursion, +.\" +and +.\" HTML +.\" +conditions, +.\" +can all be made by name as well as by number. +.P +Named capture groups are allocated numbers as well as names, exactly as +if the names were not present. In both PCRE2 and Perl, capture groups +are primarily identified by numbers; any names are just aliases for these +numbers. The PCRE2 API provides function calls for extracting the complete +name-to-number translation table from a compiled pattern, as well as +convenience functions for extracting captured substrings by name. +.P +\fBWarning:\fP When more than one capture group has the same number, as +described in the previous section, a name given to one of them applies to all +of them. Perl allows identically numbered groups to have different names. +Consider this pattern, where there are two capture groups, both numbered 1: +.sp + (?|(?<AA>aa)|(?<BB>bb)) +.sp +Perl allows this, with both names AA and BB as aliases of group 1. Thus, after +a successful match, both names yield the same value (either "aa" or "bb"). +.P +In an attempt to reduce confusion, PCRE2 does not allow the same group number +to be associated with more than one name. The example above provokes a +compile-time error. However, there is still scope for confusion. Consider this +pattern: +.sp + (?|(?<AA>aa)|(bb)) +.sp +Although the second group number 1 is not explicitly named, the name AA is +still an alias for any group 1. 
Whether the pattern matches "aa" or "bb", a +reference by name to group AA yields the matched string. +.P +By default, a name must be unique within a pattern, except that duplicate names +are permitted for groups with the same number, for example: +.sp + (?|(?<AA>aa)|(?<AA>bb)) +.sp +The duplicate name constraint can be disabled by setting the PCRE2_DUPNAMES +option at compile time, or by the use of (?J) within the pattern, as described +in the section entitled +.\" HTML +.\" +"Internal Option Setting" +.\" +above. +.P +Duplicate names can be useful for patterns where only one instance of the named +capture group can match. Suppose you want to match the name of a weekday, +either as a 3-letter abbreviation or as the full name, and in both cases you +want to extract the abbreviation. This pattern (ignoring the line breaks) does +the job: +.sp + (?J) + (?<DN>Mon|Fri|Sun)(?:day)?| + (?<DN>Tue)(?:sday)?| + (?<DN>Wed)(?:nesday)?| + (?<DN>Thu)(?:rsday)?| + (?<DN>Sat)(?:urday)? +.sp +There are five capture groups, but only one is ever set after a match. The +convenience functions for extracting the data by name return the substring for +the first (and in this example, the only) group of that name that matched. This +saves searching to find which numbered group it was. (An alternative way of +solving this problem is to use a "branch reset" group, as described in the +previous section.) +.P +If you make a backreference to a non-unique named group from elsewhere in the +pattern, the groups to which the name refers are checked in the order in which +they appear in the overall pattern. The first one that is set is used for the +reference. For example, this pattern matches both "foofoo" and "barbar" but not +"foobar" or "barfoo": +.sp + (?J)(?:(?<n>foo)|(?<n>bar))\ek<n> +.sp +.P +If you make a subroutine call to a non-unique named group, the one that +corresponds to the first occurrence of the name is used. In the absence of +duplicate numbers this is the one with the lowest number. 
+.P +If you use a named reference in a condition +test (see the +.\" +.\" HTML +.\" +section about conditions +.\" +below), either to check whether a capture group has matched, or to check for +recursion, all groups with the same name are tested. If the condition is true +for any one of them, the overall condition is true. This is the same behaviour +as testing by number. For further details of the interfaces for handling named +capture groups, see the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +. +. +.SH REPETITION +.rs +.sp +Repetition is specified by quantifiers, which can follow any of the following +items: +.sp + a literal data character + the dot metacharacter + the \eC escape sequence + the \eR escape sequence + the \eX escape sequence + an escape such as \ed or \epL that matches a single character + a character class + a backreference + a parenthesized group (including lookaround assertions) + a subroutine call (recursive or otherwise) +.sp +The general repetition quantifier specifies a minimum and maximum number of +permitted matches, by giving the two numbers in curly brackets (braces), +separated by a comma. The numbers must be less than 65536, and the first must +be less than or equal to the second. For example, +.sp + z{2,4} +.sp +matches "zz", "zzz", or "zzzz". A closing brace on its own is not a special +character. If the second number is omitted, but the comma is present, there is +no upper limit; if the second number and the comma are both omitted, the +quantifier specifies an exact number of required matches. Thus +.sp + [aeiou]{3,} +.sp +matches at least 3 successive vowels, but may match many more, whereas +.sp + \ed{8} +.sp +matches exactly 8 digits. An opening curly bracket that appears in a position +where a quantifier is not allowed, or one that does not match the syntax of a +quantifier, is taken as a literal character. For example, {,6} is not a +quantifier, but a literal string of four characters. 
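The three quantifier examples above can be exercised with a brief sketch. It uses Python's standard re module as a stand-in for PCRE2, which is an assumption made for convenience; the helper name full and the test strings are hypothetical. The {,6} case is deliberately left out, because Python's re treats it as a quantifier rather than as the literal string described above.

```python
import re

def full(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# z{2,4}: between two and four repetitions; tried on "z" through "zzzzz".
bounded = [full(r"z{2,4}", "z" * n) for n in range(1, 6)]

# [aeiou]{3,}: at least three vowels, with no upper limit.
many_vowels = full(r"[aeiou]{3,}", "aeiouaeiou")
two_vowels = full(r"[aeiou]{3,}", "ae")

# \d{8}: exactly eight digits.
eight_digits = full(r"\d{8}", "20240131")
nine_digits = full(r"\d{8}", "202401311")
```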
+.P +In UTF modes, quantifiers apply to characters rather than to individual code +units. Thus, for example, \ex{100}{2} matches two characters, each of +which is represented by a two-byte sequence in a UTF-8 string. Similarly, +\eX{3} matches three Unicode extended grapheme clusters, each of which may be +several code units long (and they may be of different lengths). +.P +The quantifier {0} is permitted, causing the expression to behave as if the +previous item and the quantifier were not present. This may be useful for +capture groups that are referenced as +.\" HTML +.\" +subroutines +.\" +from elsewhere in the pattern (but see also the section entitled +.\" HTML +.\" +"Defining capture groups for use by reference only" +.\" +below). Except for parenthesized groups, items that have a {0} quantifier are +omitted from the compiled pattern. +.P +For convenience, the three most common quantifiers have single-character +abbreviations: +.sp + * is equivalent to {0,} + + is equivalent to {1,} + ? is equivalent to {0,1} +.sp +It is possible to construct infinite loops by following a group that can match +no characters with a quantifier that has no upper limit, for example: +.sp + (a?)* +.sp +Earlier versions of Perl and PCRE1 used to give an error at compile time for +such patterns. However, because there are cases where this can be useful, such +patterns are now accepted, but whenever an iteration of such a group matches no +characters, matching moves on to the next item in the pattern instead of +repeatedly matching an empty string. This does not prevent backtracking into +any of the iterations if a subsequent item fails to match. +.P +By default, quantifiers are "greedy", that is, they match as much as possible +(up to the maximum number of permitted times), without causing the rest of the +pattern to fail. The classic example of where this gives problems is in trying +to match comments in C programs. 
These appear between /* and */ and within the +comment, individual * and / characters may appear. An attempt to match C +comments by applying the pattern +.sp + /\e*.*\e*/ +.sp +to the string +.sp + /* first comment */ not comment /* second comment */ +.sp +fails, because it matches the entire string owing to the greediness of the .* +item. However, if a quantifier is followed by a question mark, it ceases to be +greedy, and instead matches the minimum number of times possible, so the +pattern +.sp + /\e*.*?\e*/ +.sp +does the right thing with the C comments. The meaning of the various +quantifiers is not otherwise changed, just the preferred number of matches. +Do not confuse this use of question mark with its use as a quantifier in its +own right. Because it has two uses, it can sometimes appear doubled, as in +.sp + \ed??\ed +.sp +which matches one digit by preference, but can match two if that is the only +way the rest of the pattern matches. +.P +If the PCRE2_UNGREEDY option is set (an option that is not available in Perl), +the quantifiers are not greedy by default, but individual ones can be made +greedy by following them with a question mark. In other words, it inverts the +default behaviour. +.P +When a parenthesized group is quantified with a minimum repeat count that +is greater than 1 or with a limited maximum, more memory is required for the +compiled pattern, in proportion to the size of the minimum or maximum. +.P +If a pattern starts with .* or .{0,} and the PCRE2_DOTALL option (equivalent +to Perl's /s) is set, thus allowing the dot to match newlines, the pattern is +implicitly anchored, because whatever follows will be tried against every +character position in the subject string, so there is no point in retrying the +overall match at any position after the first. PCRE2 normally treats such a +pattern as though it were preceded by \eA. 
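Returning to the C comment example earlier in this section, the difference between the greedy and the minimizing quantifier can be seen in a small sketch. It uses Python's standard re module in place of PCRE2 (an assumption for the sake of a self-contained example; both engines treat .* and .*? the same way here), and the variable names are hypothetical.

```python
import re

subject = "/* first comment */ not comment /* second comment */"

# Greedy .* runs to the last */ in the subject, so one match spans everything.
greedy = re.findall(r"/\*.*\*/", subject)

# Minimizing .*? stops at the first */ it can, giving one match per comment.
lazy = re.findall(r"/\*.*?\*/", subject)

# \d??\d matches one digit by preference...
one_digit = re.match(r"\d??\d", "12").group(0)
# ...but matches two when that is the only way the rest of the pattern matches.
two_digits = re.match(r"\d??\d$", "12").group(0)
```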
+.P +In cases where it is known that the subject string contains no newlines, it is +worth setting PCRE2_DOTALL in order to obtain this optimization, or +alternatively, using ^ to indicate anchoring explicitly. +.P +However, there are some cases where the optimization cannot be used. When .* +is inside capturing parentheses that are the subject of a backreference +elsewhere in the pattern, a match at the start may fail where a later one +succeeds. Consider, for example: +.sp + (.*)abc\e1 +.sp +If the subject is "xyz123abc123" the match point is the fourth character. For +this reason, such a pattern is not implicitly anchored. +.P +Another case where implicit anchoring is not applied is when the leading .* is +inside an atomic group. Once again, a match at the start may fail where a later +one succeeds. Consider this pattern: +.sp + (?>.*?a)b +.sp +It matches "ab" in the subject "aab". The use of the backtracking control verbs +(*PRUNE) and (*SKIP) also disables this optimization, and there is an option, +PCRE2_NO_DOTSTAR_ANCHOR, to do so explicitly. +.P +When a capture group is repeated, the value captured is the substring that +matched the final iteration. For example, after +.sp + (tweedle[dume]{3}\es*)+ +.sp +has matched "tweedledum tweedledee" the value of the captured substring is +"tweedledee". However, if there are nested capture groups, the corresponding +captured values may have been set in previous iterations. For example, after +.sp + (a|(b))+ +.sp +matches "aba" the value of the second captured substring is "b". +. +. +.\" HTML +.SH "ATOMIC GROUPING AND POSSESSIVE QUANTIFIERS" +.rs +.sp +With both maximizing ("greedy") and minimizing ("ungreedy" or "lazy") +repetition, failure of what follows normally causes the repeated item to be +re-evaluated to see if a different number of repeats allows the rest of the +pattern to match. 
Sometimes it is useful to prevent this, either to change the +nature of the match, or to cause it to fail earlier than it otherwise might, when +the author of the pattern knows there is no point in carrying on. +.P +Consider, for example, the pattern \ed+foo when applied to the subject line +.sp + 123456bar +.sp +After matching all 6 digits and then failing to match "foo", the normal +action of the matcher is to try again with only 5 digits matching the \ed+ +item, and then with 4, and so on, before ultimately failing. "Atomic grouping" +(a term taken from Jeffrey Friedl's book) provides the means for specifying +that once a group has matched, it is not to be re-evaluated in this way. +.P +If we use atomic grouping for the previous example, the matcher gives up +immediately on failing to match "foo" the first time. The notation is a kind of +special parenthesis, starting with (?> as in this example: +.sp + (?>\ed+)foo +.sp +Perl 5.28 introduced an experimental alphabetic form starting with (* which may +be easier to remember: +.sp + (*atomic:\ed+)foo +.sp +This kind of parenthesized group "locks up" the part of the pattern it +contains once it has matched, and a failure further into the pattern is +prevented from backtracking into it. Backtracking past it to previous items, +however, works as normal. +.P +An alternative description is that a group of this type matches exactly the +string of characters that an identical standalone pattern would match, if +anchored at the current point in the subject string. +.P +Atomic groups are not capture groups. Simple cases such as the above example +can be thought of as a maximizing repeat that must swallow everything it can. +So, while both \ed+ and \ed+? are prepared to adjust the number of digits they +match in order to make the rest of the pattern match, (?>\ed+) can only match +an entire sequence of digits. +.P +Atomic groups in general can of course contain arbitrarily complicated +expressions, and can be nested. 
However, when the contents of an atomic +group is just a single repeated item, as in the example above, a simpler +notation, called a "possessive quantifier" can be used. This consists of an +additional + character following a quantifier. Using this notation, the +previous example can be rewritten as +.sp + \ed++foo +.sp +Note that a possessive quantifier can be used with an entire group, for +example: +.sp + (abc|xyz){2,3}+ +.sp +Possessive quantifiers are always greedy; the setting of the PCRE2_UNGREEDY +option is ignored. They are a convenient notation for the simpler forms of +atomic group. However, there is no difference in the meaning of a possessive +quantifier and the equivalent atomic group, though there may be a performance +difference; possessive quantifiers should be slightly faster. +.P +The possessive quantifier syntax is an extension to the Perl 5.8 syntax. +Jeffrey Friedl originated the idea (and the name) in the first edition of his +book. Mike McCloskey liked it, so implemented it when he built Sun's Java +package, and PCRE1 copied it from there. It found its way into Perl at release +5.10. +.P +PCRE2 has an optimization that automatically "possessifies" certain simple +pattern constructs. For example, the sequence A+B is treated as A++B because +there is no point in backtracking into a sequence of A's when B must follow. +This feature can be disabled by the PCRE2_NO_AUTOPOSSESS option, or starting +the pattern with (*NO_AUTO_POSSESS). +.P +When a pattern contains an unlimited repeat inside a group that can itself be +repeated an unlimited number of times, the use of an atomic group is the only +way to avoid some failing matches taking a very long time indeed. The pattern +.sp + (\eD+|<\ed+>)*[!?] +.sp +matches an unlimited number of substrings that either consist of non-digits, or +digits enclosed in <>, followed by either ! or ?. When it matches, it runs +quickly. 
However, if it is applied to +.sp + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +.sp +it takes a long time before reporting failure. This is because the string can +be divided between the internal \eD+ repeat and the external * repeat in a +large number of ways, and all have to be tried. (The example uses [!?] rather +than a single character at the end, because both PCRE2 and Perl have an +optimization that allows for fast failure when a single character is used. They +remember the last single character that is required for a match, and fail early +if it is not present in the string.) If the pattern is changed so that it uses +an atomic group, like this: +.sp + ((?>\eD+)|<\ed+>)*[!?] +.sp +sequences of non-digits cannot be broken, and failure happens quickly. +. +. +.\" HTML +.SH "BACKREFERENCES" +.rs +.sp +Outside a character class, a backslash followed by a digit greater than 0 (and +possibly further digits) is a backreference to a capture group earlier (that +is, to its left) in the pattern, provided there have been that many previous +capture groups. +.P +However, if the decimal number following the backslash is less than 8, it is +always taken as a backreference, and causes an error only if there are not that +many capture groups in the entire pattern. In other words, the group that is +referenced need not be to the left of the reference for numbers less than 8. A +"forward backreference" of this type can make sense when a repetition is +involved and the group to the right has participated in an earlier iteration. +.P +It is not possible to have a numerical "forward backreference" to a group whose +number is 8 or more using this syntax because a sequence such as \e50 is +interpreted as a character defined in octal. See the subsection entitled +"Non-printing characters" +.\" HTML +.\" +above +.\" +for further details of the handling of digits following a backslash. Other +forms of backreferencing do not suffer from this restriction. 
In particular, +there is no problem when named capture groups are used (see below). +.P +Another way of avoiding the ambiguity inherent in the use of digits following a +backslash is to use the \eg escape sequence. This escape must be followed by a +signed or unsigned number, optionally enclosed in braces. These examples are +all identical: +.sp + (ring), \e1 + (ring), \eg1 + (ring), \eg{1} +.sp +An unsigned number specifies an absolute reference without the ambiguity that +is present in the older syntax. It is also useful when literal digits follow +the reference. A signed number is a relative reference. Consider this example: +.sp + (abc(def)ghi)\eg{-1} +.sp +The sequence \eg{-1} is a reference to the most recently started capture group +before \eg, that is, it is equivalent to \e2 in this example. Similarly, +\eg{-2} would be equivalent to \e1. The use of relative references can be +helpful in long patterns, and also in patterns that are created by joining +together fragments that contain references within themselves. +.P +The sequence \eg{+1} is a reference to the next capture group. This kind of +forward reference can be useful in patterns that repeat. Perl does not support +the use of + in this way. +.P +A backreference matches whatever actually most recently matched the capture +group in the current subject string, rather than anything at all that matches +the group (see +.\" HTML +.\" +"Groups as subroutines" +.\" +below for a way of doing that). So the pattern +.sp + (sens|respons)e and \e1ibility +.sp +matches "sense and sensibility" and "response and responsibility", but not +"sense and responsibility". If caseful matching is in force at the time of the +backreference, the case of letters is relevant. For example, +.sp + ((?i)rah)\es+\e1 +.sp +matches "rah rah" and "RAH RAH", but not "RAH rah", even though the original +capture group is matched caselessly. +.P +There are several different ways of writing backreferences to named capture +groups. 
The .NET syntax \ek{name} and the Perl syntax \ek<name> or \ek'name' +are supported, as is the Python syntax (?P=name). Perl 5.10's unified +backreference syntax, in which \eg can be used for both numeric and named +references, is also supported. We could rewrite the above example in any of the +following ways: +.sp + (?<p1>(?i)rah)\es+\ek<p1> + (?'p1'(?i)rah)\es+\ek{p1} + (?P<p1>(?i)rah)\es+(?P=p1) + (?<p1>(?i)rah)\es+\eg{p1} +.sp +A capture group that is referenced by name may appear in the pattern before or +after the reference. +.P +There may be more than one backreference to the same group. If a group has not +actually been used in a particular match, backreferences to it always fail by +default. For example, the pattern +.sp + (a|(bc))\e2 +.sp +always fails if it starts to match "a" rather than "bc". However, if the +PCRE2_MATCH_UNSET_BACKREF option is set at compile time, a backreference to an +unset value matches an empty string. +.P +Because there may be many capture groups in a pattern, all digits following a +backslash are taken as part of a potential backreference number. If the pattern +continues with a digit character, some delimiter must be used to terminate the +backreference. If the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE option is set, this +can be white space. Otherwise, the \eg{} syntax or an empty comment (see +.\" HTML +.\" +"Comments" +.\" +below) can be used. +. +. +.SS "Recursive backreferences" +.rs +.sp +A backreference that occurs inside the group to which it refers fails when the +group is first used, so, for example, (a\e1) never matches. However, such +references can be useful inside repeated groups. For example, the pattern +.sp + (a|b\e1)+ +.sp +matches any number of "a"s and also "aba", "ababbaa" etc. At each iteration of +the group, the backreference matches the character string corresponding to the +previous iteration. In order for this to work, the pattern must be such that +the first iteration does not need to match the backreference. 
This can be done +using alternation, as in the example above, or by a quantifier with a minimum +of zero. +.P +For versions of PCRE2 less than 10.25, backreferences of this type used to +cause the group that they reference to be treated as an +.\" HTML +.\" +atomic group. +.\" +This restriction no longer applies, and backtracking into such groups can occur +as normal. +. +. +.\" HTML +.SH ASSERTIONS +.rs +.sp +An assertion is a test on the characters following or preceding the current +matching point that does not consume any characters. The simple assertions +coded as \eb, \eB, \eA, \eG, \eZ, \ez, ^ and $ are described +.\" HTML +.\" +above. +.\" +.P +More complicated assertions are coded as parenthesized groups. There are two +kinds: those that look ahead of the current position in the subject string, and +those that look behind it, and in each case an assertion may be positive (must +match for the assertion to be true) or negative (must not match for the +assertion to be true). An assertion group is matched in the normal way, +and if it is true, matching continues after it, but with the matching position +in the subject string reset to what it was before the assertion was processed. +.P +The Perl-compatible lookaround assertions are atomic. If an assertion is true, +but there is a subsequent matching failure, there is no backtracking into the +assertion. However, there are some cases where non-atomic assertions can be +useful. PCRE2 has some support for these, described in the section entitled +.\" HTML +.\" +"Non-atomic assertions" +.\" +below, but they are not Perl-compatible. +.P +A lookaround assertion may appear as the condition in a +.\" HTML +.\" +conditional group +.\" +(see below). In this case, the result of matching the assertion determines +which branch of the condition is followed. +.P +Assertion groups are not capture groups. 
If an assertion contains capture +groups within it, these are counted for the purposes of numbering the capture +groups in the whole pattern. Within each branch of an assertion, locally +captured substrings may be referenced in the usual way. For example, a sequence +such as (.)\eg{-1} can be used to check that two adjacent characters are the +same. +.P +When a branch within an assertion fails to match, any substrings that were +captured are discarded (as happens with any pattern branch that fails to +match). A negative assertion is true only when all its branches fail to match; +this means that no captured substrings are ever retained after a successful +negative assertion. When an assertion contains a matching branch, what happens +depends on the type of assertion. +.P +For a positive assertion, internally captured substrings in the successful +branch are retained, and matching continues with the next pattern item after +the assertion. For a negative assertion, a matching branch means that the +assertion is not true. If such an assertion is being used as a condition in a +.\" HTML +.\" +conditional group +.\" +(see below), captured substrings are retained, because matching continues with +the "no" branch of the condition. For other failing negative assertions, +control passes to the previous backtracking point, thus discarding any captured +strings within the assertion. +.P +Most assertion groups may be repeated; though it makes no sense to assert the +same thing several times, the side effect of capturing in positive assertions +may occasionally be useful. However, an assertion that forms the condition for +a conditional group may not be quantified. PCRE2 used to restrict the +repetition of assertions, but from release 10.35 the only restriction is that +an unlimited maximum repetition is changed to be one more than the minimum. For +example, {3,} is treated as {3,4}. +. +. 
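The rules above about captures inside assertions can be illustrated with a minimal sketch. It uses Python's standard re module rather than PCRE2, which is an assumption made only so that the example runs without extra libraries; the patterns, subjects, and variable names are invented for the illustration, and only the lookahead behaviour common to both engines is shown.

```python
import re

# A positive lookahead consumes no characters, but a substring captured
# inside its successful branch is retained after the assertion.
positive = re.match(r"(?=(\w+))", "abc def")
lookahead_capture = positive.group(1)
position_after = positive.end()  # the matching position is reset

# A negative lookahead is true only when its contents fail to match,
# so nothing captured inside it survives a successful assertion.
negative = re.match(r"(?!(a))b", "b")
negative_capture = negative.group(1)
```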
+.SS "Alphabetic assertion names" +.rs +.sp +Traditionally, symbolic sequences such as (?= and (?<= have been used to +specify lookaround assertions. Perl 5.28 introduced some experimental +alphabetic alternatives which might be easier to remember. They all start with +(* instead of (? and must be written using lower case letters. PCRE2 supports +the following synonyms: +.sp + (*positive_lookahead: or (*pla: is the same as (?= + (*negative_lookahead: or (*nla: is the same as (?! + (*positive_lookbehind: or (*plb: is the same as (?<= + (*negative_lookbehind: or (*nlb: is the same as (? +.SS "Lookbehind assertions" +.rs +.sp +Lookbehind assertions start with (?<= for positive assertions and (? +.\" +(see above) +.\" +can be used instead of a lookbehind assertion to get round the fixed-length +restriction. +.P +The implementation of lookbehind assertions is, for each alternative, to +temporarily move the current position back by the fixed length and then try to +match. If there are insufficient characters before the current position, the +assertion fails. +.P +In UTF-8 and UTF-16 modes, PCRE2 does not allow the \eC escape (which matches a +single code unit even in a UTF mode) to appear in lookbehind assertions, +because it makes it impossible to calculate the length of the lookbehind. The +\eX and \eR escapes, which can match different numbers of code units, are never +permitted in lookbehinds. +.P +.\" HTML +.\" +"Subroutine" +.\" +calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long +as the called capture group matches a fixed-length string. However, +.\" HTML +.\" +recursion, +.\" +that is, a "subroutine" call into a group that is already active, +is not supported. +.P +Perl does not support backreferences in lookbehinds. PCRE2 does support them, +but only if certain conditions are met. 
The PCRE2_MATCH_UNSET_BACKREF option +must not be set, there must be no use of (?| in the pattern (it creates +duplicate group numbers), and if the backreference is by name, the name +must be unique. Of course, the referenced group must itself match a fixed +length substring. The following pattern matches words containing at least two +characters that begin and end with the same character: +.sp + \eb(\ew)\ew++(?<=\e1) +.P +Possessive quantifiers can be used in conjunction with lookbehind assertions to +specify efficient matching of fixed-length strings at the end of subject +strings. Consider a simple pattern such as +.sp + abcd$ +.sp +when applied to a long string that does not match. Because matching proceeds +from left to right, PCRE2 will look for each "a" in the subject and then see if +what follows matches the rest of the pattern. If the pattern is specified as +.sp + ^.*abcd$ +.sp +the initial .* matches the entire string at first, but when this fails (because +there is no following "a"), it backtracks to match all but the last character, +then all but the last two characters, and so on. Once again the search for "a" +covers the entire string, from right to left, so we are no better off. However, +if the pattern is written as +.sp + ^.*+(?<=abcd) +.sp +there can be no backtracking for the .*+ item because of the possessive +quantifier; it can match only the entire string. The subsequent lookbehind +assertion does a single test on the last four characters. If it fails, the +match fails immediately. For long strings, this approach makes a significant +difference to the processing time. +. +. +.SS "Using multiple assertions" +.rs +.sp +Several assertions (of any sort) may occur in succession. For example, +.sp + (?<=\ed{3})(? +.SH "NON-ATOMIC ASSERTIONS" +.rs +.sp +The traditional Perl-compatible lookaround assertions are atomic. That is, if +an assertion is true, but there is a subsequent matching failure, there is no +backtracking into the assertion. 
However, there are some cases where non-atomic +positive assertions can be useful. PCRE2 provides these using the following +syntax: +.sp + (*non_atomic_positive_lookahead: or (*napla: or (?* + (*non_atomic_positive_lookbehind: or (*naplb: or (?<* +.sp +Consider the problem of finding the right-most word in a string that also +appears earlier in the string, that is, it must appear at least twice in total. +This pattern returns the required result as captured substring 1: +.sp + ^(?x)(*napla: .* \eb(\ew++)) (?> .*? \eb\e1\eb ){2} +.sp +For a subject such as "word1 word2 word3 word2 word3 word4" the result is +"word3". How does it work? At the start, ^(?x) anchors the pattern and sets the +"x" option, which causes white space (introduced for readability) to be +ignored. Inside the assertion, the greedy .* at first consumes the entire +string, but then has to backtrack until the rest of the assertion can match a +word, which is captured by group 1. In other words, when the assertion first +succeeds, it captures the right-most word in the string. +.P +The current matching point is then reset to the start of the subject, and the +rest of the pattern match checks for two occurrences of the captured word, +using an ungreedy .*? to scan from the left. If this succeeds, we are done, but +if the last word in the string does not occur twice, this part of the pattern +fails. If a traditional atomic lookahead (?= or (*pla: had been used, the +assertion could not be re-entered, and the whole match would fail. The pattern +would succeed only if the very last word in the subject was found twice. +.P +Using a non-atomic lookahead, however, means that when the last word does not +occur twice in the string, the lookahead can backtrack and find the second-last +word, and so on, until either the match succeeds, or all words have been +tested. 
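What the pattern is meant to compute can be cross-checked with a plain procedural sketch. The Python below does not use a non-atomic assertion (Python's re module has none); it only mimics the backtracking order by trying candidate words from right to left and accepting the first one that occurs at least twice as a whole word. The function name rightmost_repeated_word is hypothetical, and the sketch assumes a single-line subject.

```python
import re

def rightmost_repeated_word(subject):
    """Return the right-most word that occurs at least twice, or None.

    Hypothetical helper: it emulates the result of the pattern, not the
    PCRE2 matching algorithm itself.
    """
    # Every maximal run of word characters, in left-to-right order.
    words = [m.group() for m in re.finditer(r"\w+", subject)]
    # The non-atomic lookahead backtracks from the last word leftwards,
    # so candidates are tried in reverse order.
    for word in reversed(words):
        occurrences = re.findall(r"\b" + re.escape(word) + r"\b", subject)
        if len(occurrences) >= 2:
            return word
    return None

result = rightmost_repeated_word("word1 word2 word3 word2 word3 word4")
nothing = rightmost_repeated_word("one two three")
```

With an atomic lookahead only the first candidate (the very last word) would ever be tried, which corresponds to stopping the loop after its first iteration.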
+.P +Two conditions must be met for a non-atomic assertion to be useful: the +contents of one or more capturing groups must change after a backtrack into the +assertion, and there must be a backreference to a changed group later in the +pattern. If this is not the case, the rest of the pattern match fails exactly +as before because nothing has changed, so using a non-atomic assertion just +wastes resources. +.P +There is one exception to backtracking into a non-atomic assertion. If an +(*ACCEPT) control verb is triggered, the assertion succeeds atomically. That +is, a subsequent match failure cannot backtrack into the assertion. +.P +Non-atomic assertions are not supported by the alternative matching function +\fBpcre2_dfa_match()\fP. They are supported by JIT, but only if they do not +contain any control verbs such as (*ACCEPT). (This may change in future). Note +that assertions that appear as conditions for +.\" HTML +.\" +conditional groups +.\" +(see below) must be atomic. +. +. +.SH "SCRIPT RUNS" +.rs +.sp +In concept, a script run is a sequence of characters that are all from the same +Unicode script such as Latin or Greek. However, because some scripts are +commonly used together, and because some diacritical and other marks are used +with multiple scripts, it is not that simple. There is a full description of +the rules that PCRE2 uses in the section entitled +.\" HTML +.\" +"Script Runs" +.\" +in the +.\" HREF +\fBpcre2unicode\fP +.\" +documentation. +.P +If part of a pattern is enclosed between (*script_run: or (*sr: and a closing +parenthesis, it fails if the sequence of characters that it matches are not a +script run. After a failure, normal backtracking occurs. Script runs can be +used to detect spoofing attacks using characters that look the same, but are +from different scripts. The string "paypal.com" is an infamous example, where +the letters could be a mixture of Latin and Cyrillic. 
This pattern ensures that +the matched characters in a sequence of non-spaces that follow white space are +a script run: +.sp + \es+(*sr:\eS+) +.sp +To be sure that they are all from the Latin script (for example), a lookahead +can be used: +.sp + \es+(?=\ep{Latin})(*sr:\eS+) +.sp +This works as long as the first character is expected to be a character in that +script, and not (for example) punctuation, which is allowed with any script. If +this is not the case, a more creative lookahead is needed. For example, if +digits, underscore, and dots are permitted at the start: +.sp + \es+(?=[0-9_.]*\ep{Latin})(*sr:\eS+) +.sp +.P +In many cases, backtracking into a script run pattern fragment is not +desirable. The script run can employ an atomic group to prevent this. Because +this is a common requirement, a shorthand notation is provided by +(*atomic_script_run: or (*asr: +.sp + (*asr:...) is the same as (*sr:(?>...)) +.sp +Note that the atomic group is inside the script run. Putting it outside would +not prevent backtracking into the script run pattern. +.P +Support for script runs is not available if PCRE2 is compiled without Unicode +support. A compile-time error is given if any of the above constructs is +encountered. Script runs are not supported by the alternate matching function, +\fBpcre2_dfa_match()\fP because they use the same mechanism as capturing +parentheses. +.P +\fBWarning:\fP The (*ACCEPT) control verb +.\" HTML +.\" +(see below) +.\" +should not be used within a script run group, because it causes an immediate +exit from the group, bypassing the script run checking. +. +. +.\" HTML +.SH "CONDITIONAL GROUPS" +.rs +.sp +It is possible to cause the matching process to obey a pattern fragment +conditionally or to choose between two alternative fragments, depending on +the result of an assertion, or whether a specific capture group has +already been matched. 
The two possible forms of conditional group are: +.sp + (?(condition)yes-pattern) + (?(condition)yes-pattern|no-pattern) +.sp +If the condition is satisfied, the yes-pattern is used; otherwise the +no-pattern (if present) is used. An absent no-pattern is equivalent to an empty +string (it always matches). If there are more than two alternatives in the +group, a compile-time error occurs. Each of the two alternatives may itself +contain nested groups of any form, including conditional groups; the +restriction to two alternatives applies only at the level of the condition +itself. This pattern fragment is an example where the alternatives are complex: +.sp + (?(1) (A|B|C) | (D | (?(2)E|F) | E) ) +.sp +.P +There are five kinds of condition: references to capture groups, references to +recursion, two pseudo-conditions called DEFINE and VERSION, and assertions. +. +. +.SS "Checking for a used capture group by number" +.rs +.sp +If the text between the parentheses consists of a sequence of digits, the +condition is true if a capture group of that number has previously matched. If +there is more than one capture group with the same number (see the earlier +.\" +.\" HTML +.\" +section about duplicate group numbers), +.\" +the condition is true if any of them have matched. An alternative notation is +to precede the digits with a plus or minus sign. In this case, the group number +is relative rather than absolute. The most recently opened capture group can be +referenced by (?(-1), the next most recent by (?(-2), and so on. Inside loops +it can also make sense to refer to subsequent groups. The next capture group +can be referenced as (?(+1), and so on. (The value zero in any of these forms +is not used; it provokes a compile-time error.) +.P +Consider the following pattern, which contains non-significant white space to +make it more readable (assume the PCRE2_EXTENDED option) and to divide it into +three parts for ease of discussion: +.sp + ( \e( )? 
[^()]+ (?(1) \e) ) +.sp +The first part matches an optional opening parenthesis, and if that +character is present, sets it as the first captured substring. The second part +matches one or more characters that are not parentheses. The third part is a +conditional group that tests whether or not the first capture group +matched. If it did, that is, if the subject started with an opening parenthesis, +the condition is true, and so the yes-pattern is executed and a closing +parenthesis is required. Otherwise, since no-pattern is not present, the +conditional group matches nothing. In other words, this pattern matches a +sequence of non-parentheses, optionally enclosed in parentheses. +.P +If you were embedding this pattern in a larger one, you could use a relative +reference: +.sp + ...other stuff... ( \e( )? [^()]+ (?(-1) \e) ) ... +.sp +This makes the fragment independent of the parentheses in the larger pattern. +. +. +.SS "Checking for a used capture group by name" +.rs +.sp +Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used +capture group by name. For compatibility with earlier versions of PCRE1, which +had this facility before Perl, the syntax (?(name)...) is also recognized. +Note, however, that undelimited names consisting of the letter R followed by +digits are ambiguous (see the following section). Rewriting the above example +to use a named group gives this: +.sp + (?<OPEN> \e( )? [^()]+ (?(<OPEN>) \e) ) +.sp +If the name used in a condition of this kind is a duplicate, the test is +applied to all groups of the same name, and is true if any one of them has +matched. +. +. +.SS "Checking for pattern recursion" +.rs +.sp +"Recursion" in this sense refers to any subroutine-like call from one part of +the pattern to another, whether or not it is actually recursive. See the +sections entitled +.\" HTML +.\" +"Recursive patterns" +.\" +and +.\" HTML +.\" +"Groups as subroutines" +.\" +below for details of recursion and subroutine calls. 
+.P +If a condition is the string (R), and there is no capture group with the name +R, the condition is true if matching is currently in a recursion or subroutine +call to the whole pattern or any capture group. If digits follow the letter R, +and there is no group with that name, the condition is true if the most recent +call is into a group with the given number, which must exist somewhere in the +overall pattern. This is a contrived example that is equivalent to a+b: +.sp + ((?(R1)a+|(?1)b)) +.sp +However, in both cases, if there is a capture group with a matching name, the +condition tests for its being set, as described in the section above, instead +of testing for recursion. For example, creating a group with the name R1 by +adding (?<R1>) to the above pattern completely changes its meaning. +.P +If a name preceded by ampersand follows the letter R, for example: +.sp + (?(R&name)...) +.sp +the condition is true if the most recent recursion is into a group of that name +(which must exist within the pattern). +.P +This condition does not check the entire recursion stack. It tests only the +current level. If the name used in a condition of this kind is a duplicate, the +test is applied to all groups of the same name, and is true if any one of +them is the most recent recursion. +.P +At "top level", all these recursion test conditions are false. +. +. +.\" HTML +.SS "Defining capture groups for use by reference only" +.rs +.sp +If the condition is the string (DEFINE), the condition is always false, even if +there is a group with the name DEFINE. In this case, there may be only one +alternative in the rest of the conditional group. It is always skipped if +control reaches this point in the pattern; the idea of DEFINE is that it can be +used to define subroutines that can be referenced from elsewhere. (The use of +.\" HTML +.\" +subroutines +.\" +is described below.) 
For example, a pattern to match an IPv4 address such as +"192.168.23.245" could be written like this (ignore white space and line +breaks): +.sp + (?(DEFINE) (?<byte> 2[0-4]\ed | 25[0-5] | 1\ed\ed | [1-9]?\ed) ) + \eb (?&byte) (\e.(?&byte)){3} \eb +.sp +The first part of the pattern is a DEFINE group inside which another group +named "byte" is defined. This matches an individual component of an IPv4 +address (a number less than 256). When matching takes place, this part of the +pattern is skipped because DEFINE acts like a false condition. The rest of the +pattern uses references to the named group to match the four dot-separated +components of an IPv4 address, insisting on a word boundary at each end. +. +. +.SS "Checking the PCRE2 version" +.rs +.sp +Programs that link with a PCRE2 library can check the version by calling +\fBpcre2_config()\fP with appropriate arguments. Users of applications that do +not have access to the underlying code cannot do this. A special "condition" +called VERSION exists to allow such users to discover which version of PCRE2 +they are dealing with by using this condition to match a string such as +"yesno". VERSION must be followed either by "=" or ">=" and a version number. +For example: +.sp + (?(VERSION>=10.4)yes|no) +.sp +This pattern matches "yes" if the PCRE2 version is greater than or equal to 10.4, or +"no" otherwise. The fractional part of the version number may not contain more +than two digits. +. +. +.SS "Assertion conditions" +.rs +.sp +If the condition is not in any of the above formats, it must be a parenthesized +assertion. This may be a positive or negative lookahead or lookbehind +assertion. However, it must be a traditional atomic assertion, not one of the +PCRE2-specific +.\" HTML +.\" +non-atomic assertions. 
+.\" +.P +Consider this pattern, again containing non-significant white space, and with +the two alternatives on the second line: +.sp + (?(?=[^a-z]*[a-z]) + \ed{2}-[a-z]{3}-\ed{2} | \ed{2}-\ed{2}-\ed{2} ) +.sp +The condition is a positive lookahead assertion that matches an optional +sequence of non-letters followed by a letter. In other words, it tests for the +presence of at least one letter in the subject. If a letter is found, the +subject is matched against the first alternative; otherwise it is matched +against the second. This pattern matches strings in one of the two forms +dd-aaa-dd or dd-dd-dd, where aaa are letters and dd are digits. +.P +When an assertion that is a condition contains capture groups, any +capturing that occurs in a matching branch is retained afterwards, for both +positive and negative assertions, because matching always continues after the +assertion, whether it succeeds or fails. (Compare non-conditional assertions, +for which captures are retained only for positive assertions that succeed.) +. +. +.\" HTML +.SH COMMENTS +.rs +.sp +There are two ways of including comments in patterns that are processed by +PCRE2. In both cases, the start of the comment must not be in a character +class, nor in the middle of any other sequence of related characters such as +(?: or a group name or number. The characters that make up a comment play +no part in the pattern matching. +.P +The sequence (?# marks the start of a comment that continues up to the next +closing parenthesis. Nested parentheses are not permitted. If the +PCRE2_EXTENDED or PCRE2_EXTENDED_MORE option is set, an unescaped # character +also introduces a comment, which in this case continues to immediately after +the next newline character or character sequence in the pattern. 
Which +characters are interpreted as newlines is controlled by an option passed to the +compiling function or by a special sequence at the start of the pattern, as +described in the section entitled +.\" HTML +.\" +"Newline conventions" +.\" +above. Note that the end of this type of comment is a literal newline sequence +in the pattern; escape sequences that happen to represent a newline do not +count. For example, consider this pattern when PCRE2_EXTENDED is set, and the +default newline convention (a single linefeed character) is in force: +.sp + abc #comment \en still comment +.sp +On encountering the # character, \fBpcre2_compile()\fP skips along, looking for +a newline in the pattern. The sequence \en is still literal at this stage, so +it does not terminate the comment. Only an actual character with the code value +0x0a (the default newline) does so. +. +. +.\" HTML +.SH "RECURSIVE PATTERNS" +.rs +.sp +Consider the problem of matching a string in parentheses, allowing for +unlimited nested parentheses. Without the use of recursion, the best that can +be done is to use a pattern that matches up to some fixed depth of nesting. It +is not possible to handle an arbitrary nesting depth. +.P +For some time, Perl has provided a facility that allows regular expressions to +recurse (amongst other things). It does this by interpolating Perl code in the +expression at run time, and the code can refer to the expression itself. A Perl +pattern using code interpolation to solve the parentheses problem can be +created like this: +.sp + $re = qr{\e( (?: (?>[^()]+) | (?p{$re}) )* \e)}x; +.sp +The (?p{...}) item interpolates Perl code at run time, and in this case refers +recursively to the pattern in which it appears. +.P +Obviously, PCRE2 cannot support the interpolation of Perl code. Instead, it +supports special syntax for recursion of the entire pattern, and also for +individual capture group recursion. 
After its introduction in PCRE1 and Python, +this kind of recursion was subsequently introduced into Perl at release 5.10. +.P +A special item that consists of (? followed by a number greater than zero and a +closing parenthesis is a recursive subroutine call of the capture group of the +given number, provided that it occurs inside that group. (If not, it is a +.\" HTML +.\" +non-recursive subroutine +.\" +call, which is described in the next section.) The special item (?R) or (?0) is +a recursive call of the entire regular expression. +.P +This PCRE2 pattern solves the nested parentheses problem (assume the +PCRE2_EXTENDED option is set so that white space is ignored): +.sp + \e( ( [^()]++ | (?R) )* \e) +.sp +First it matches an opening parenthesis. Then it matches any number of +substrings which can either be a sequence of non-parentheses, or a recursive +match of the pattern itself (that is, a correctly parenthesized substring). +Finally there is a closing parenthesis. Note the use of a possessive quantifier +to avoid backtracking into sequences of non-parentheses. +.P +If this were part of a larger pattern, you would not want to recurse the entire +pattern, so instead you could use this: +.sp + ( \e( ( [^()]++ | (?1) )* \e) ) +.sp +We have put the pattern into parentheses, and caused the recursion to refer to +them instead of the whole pattern. +.P +In a larger pattern, keeping track of parenthesis numbers can be tricky. This +is made easier by the use of relative references. Instead of (?1) in the +pattern above you can write (?-2) to refer to the second most recently opened +parentheses preceding the recursion. In other words, a negative number counts +capturing parentheses leftwards from the point at which it is encountered. +.P +Be aware however, that if +.\" HTML +.\" +duplicate capture group numbers +.\" +are in use, relative references refer to the earliest group with the +appropriate number. 
Consider, for example: +.sp + (?|(a)|(b)) (c) (?-2) +.sp +The first two capture groups (a) and (b) are both numbered 1, and group (c) +is number 2. When the reference (?-2) is encountered, the second most recently +opened parentheses has the number 1, but it is the first such group (the (a) +group) to which the recursion refers. This would be the same if an absolute +reference (?1) was used. In other words, relative references are just a +shorthand for computing a group number. +.P +It is also possible to refer to subsequent capture groups, by writing +references such as (?+2). However, these cannot be recursive because the +reference is not inside the parentheses that are referenced. They are always +.\" HTML +.\" +non-recursive subroutine +.\" +calls, as described in the next section. +.P +An alternative approach is to use named parentheses. The Perl syntax for this +is (?&name); PCRE1's earlier syntax (?P>name) is also supported. We could +rewrite the above example as follows: +.sp + (?<pn> \e( ( [^()]++ | (?&pn) )* \e) ) +.sp +If there is more than one group with the same name, the earliest one is +used. +.P +The example pattern that we have been looking at contains nested unlimited +repeats, and so the use of a possessive quantifier for matching strings of +non-parentheses is important when applying the pattern to strings that do not +match. For example, when this pattern is applied to +.sp + (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() +.sp +it yields "no match" quickly. However, if a possessive quantifier is not used, +the match runs for a very long time indeed because there are so many different +ways the + and * repeats can carve up the subject, and all have to be tested +before failure can be reported. +.P +At the end of a match, the values of capturing parentheses are those from +the outermost level. If you want to obtain intermediate values, a callout +function can be used (see below and the +.\" HREF +\fBpcre2callout\fP +.\" +documentation). 
If the pattern above is matched against +.sp + (ab(cd)ef) +.sp +the value for the inner capturing parentheses (numbered 2) is "ef", which is +the last value taken on at the top level. If a capture group is not matched at +the top level, its final captured value is unset, even if it was (temporarily) +set at a deeper level during the matching process. +.P +Do not confuse the (?R) item with the condition (R), which tests for recursion. +Consider this pattern, which matches text in angle brackets, allowing for +arbitrary nesting. Only digits are allowed in nested brackets (that is, when +recursing), whereas any characters are permitted at the outer level. +.sp + < (?: (?(R) \ed++ | [^<>]*+) | (?R)) * > +.sp +In this pattern, (?(R) is the start of a conditional group, with two different +alternatives for the recursive and non-recursive cases. The (?R) item is the +actual recursive call. +. +. +.\" HTML +.SS "Differences in recursion processing between PCRE2 and Perl" +.rs +.sp +Some former differences between PCRE2 and Perl no longer exist. +.P +Before release 10.30, recursion processing in PCRE2 differed from Perl in that +a recursive subroutine call was always treated as an atomic group. That is, +once it had matched some of the subject string, it was never re-entered, even +if it contained untried alternatives and there was a subsequent matching +failure. (Historical note: PCRE implemented recursion before Perl did.) +.P +Starting with release 10.30, recursive subroutine calls are no longer treated +as atomic. That is, they can be re-entered to try unused alternatives if there +is a matching failure later in the pattern. This is now compatible with the way +Perl works. If you want a subroutine call to be atomic, you must explicitly +enclose it in an atomic group. +.P +Supporting backtracking into recursions simplifies certain types of recursive +pattern. 
For example, this pattern matches palindromic strings: +.sp + ^((.)(?1)\e2|.?)$ +.sp +The second branch in the group matches a single central character in the +palindrome when there are an odd number of characters, or nothing when there +are an even number of characters, but in order to work it has to be able to try +the second case when the rest of the pattern match fails. If you want to match +typical palindromic phrases, the pattern has to ignore all non-word characters, +which can be done like this: +.sp + ^\eW*+((.)\eW*+(?1)\eW*+\e2|\eW*+.?)\eW*+$ +.sp +If run with the PCRE2_CASELESS option, this pattern matches phrases such as "A +man, a plan, a canal: Panama!". Note the use of the possessive quantifier *+ to +avoid backtracking into sequences of non-word characters. Without this, PCRE2 +takes a great deal longer (ten times or more) to match typical phrases, and +Perl takes so long that you think it has gone into a loop. +.P +Another way in which PCRE2 and Perl used to differ in their recursion +processing is in the handling of captured values. Formerly in Perl, when a +group was called recursively or as a subroutine (see the next section), it +had no access to any values that were captured outside the recursion, whereas +in PCRE2 these values can be referenced. Consider this pattern: +.sp + ^(.)(\e1|a(?2)) +.sp +This pattern matches "bab". The first capturing parentheses match "b", then in +the second group, when the backreference \e1 fails to match "b", the second +alternative matches "a" and then recurses. In the recursion, \e1 does now match +"b" and so the whole match succeeds. This match used to fail in Perl, but in +later versions (I tried 5.024) it now works. +. +. +.\" HTML +.SH "GROUPS AS SUBROUTINES" +.rs +.sp +If the syntax for a recursive group call (either by number or by name) is used +outside the parentheses to which it refers, it operates a bit like a subroutine +in a programming language. 
More accurately, PCRE2 treats the referenced group +as an independent subpattern which it tries to match at the current matching +position. The called group may be defined before or after the reference. A +numbered reference can be absolute or relative, as in these examples: +.sp + (...(absolute)...)...(?2)... + (...(relative)...)...(?-1)... + (...(?+1)...(relative)... +.sp +An earlier example pointed out that the pattern +.sp + (sens|respons)e and \e1ibility +.sp +matches "sense and sensibility" and "response and responsibility", but not +"sense and responsibility". If instead the pattern +.sp + (sens|respons)e and (?1)ibility +.sp +is used, it does match "sense and responsibility" as well as the other two +strings. Another example is given in the discussion of DEFINE above. +.P +Like recursions, subroutine calls used to be treated as atomic, but this +changed at PCRE2 release 10.30, so backtracking into subroutine calls can now +occur. However, any capturing parentheses that are set during the subroutine +call revert to their previous values afterwards. +.P +Processing options such as case-independence are fixed when a group is +defined, so if it is used as a subroutine, such options cannot be changed for +different calls. For example, consider this pattern: +.sp + (abc)(?i:(?-1)) +.sp +It matches "abcabc". It does not match "abcABC" because the change of +processing option does not affect the called group. +.P +The behaviour of +.\" HTML +.\" +backtracking control verbs +.\" +in groups when called as subroutines is described in the section entitled +.\" HTML +.\" +"Backtracking verbs in subroutines" +.\" +below. +. +. +.\" HTML +.SH "ONIGURUMA SUBROUTINE SYNTAX" +.rs +.sp +For compatibility with Oniguruma, the non-Perl syntax \eg followed by a name or +a number enclosed either in angle brackets or single quotes, is an alternative +syntax for calling a group as a subroutine, possibly recursively. 
Here are two +of the examples used above, rewritten using this syntax: +.sp + (?<pn> \e( ( (?>[^()]+) | \eg<pn> )* \e) ) + (sens|respons)e and \eg'1'ibility +.sp +PCRE2 supports an extension to Oniguruma: if a number is preceded by a +plus or a minus sign it is taken as a relative reference. For example: +.sp + (abc)(?i:\eg<-1>) +.sp +Note that \eg{...} (Perl syntax) and \eg<...> (Oniguruma syntax) are \fInot\fP +synonymous. The former is a backreference; the latter is a subroutine call. +. +. +.SH CALLOUTS +.rs +.sp +Perl has a feature whereby using the sequence (?{...}) causes arbitrary Perl +code to be obeyed in the middle of matching a regular expression. This makes it +possible, amongst other things, to extract different substrings that match the +same pair of parentheses when there is a repetition. +.P +PCRE2 provides a similar feature, but of course it cannot obey arbitrary Perl +code. The feature is called "callout". The caller of PCRE2 provides an external +function by putting its entry point in a match context using the function +\fBpcre2_set_callout()\fP, and then passing that context to \fBpcre2_match()\fP +or \fBpcre2_dfa_match()\fP. If no match context is passed, or if the callout +entry point is set to NULL, callouts are disabled. +.P +Within a regular expression, (?C) indicates a point at which the external +function is to be called. There are two kinds of callout: those with a +numerical argument and those with a string argument. (?C) on its own with no +argument is treated as (?C0). A numerical argument allows the application to +distinguish between different callouts. String arguments were added for release +10.20 to make it possible for script languages that use PCRE2 to embed short +scripts within patterns in a similar way to Perl. +.P +During matching, when PCRE2 reaches a callout point, the external function is +called. 
It is provided with the number or string argument of the callout, the +position in the pattern, and one item of data that is also set in the match +block. The callout function may cause matching to proceed, to backtrack, or to +fail. +.P +By default, PCRE2 implements a number of optimizations at matching time, and +one side-effect is that sometimes callouts are skipped. If you need all +possible callouts to happen, you need to set options that disable the relevant +optimizations. More details, including a complete description of the +programming interface to the callout function, are given in the +.\" HREF +\fBpcre2callout\fP +.\" +documentation. +. +. +.SS "Callouts with numerical arguments" +.rs +.sp +If you just want to have a means of identifying different callout points, put a +number less than 256 after the letter C. For example, this pattern has two +callout points: +.sp + (?C1)abc(?C2)def +.sp +If the PCRE2_AUTO_CALLOUT flag is passed to \fBpcre2_compile()\fP, numerical +callouts are automatically installed before each item in the pattern. They are +all numbered 255. If there is a conditional group in the pattern whose +condition is an assertion, an additional callout is inserted just before the +condition. An explicit callout may also be set at this position, as in this +example: +.sp + (?(?C9)(?=a)abc|def) +.sp +Note that this applies only to assertion conditions, not to other types of +condition. +. +. +.SS "Callouts with string arguments" +.rs +.sp +A delimited string may be used instead of a number as a callout argument. The +starting delimiter must be one of ` ' " ^ % # $ { and the ending delimiter is +the same as the start, except for {, where the ending delimiter is }. If the +ending delimiter is needed within the string, it must be doubled. For +example: +.sp + (?C'ab ''c'' d')xyz(?C{any text})pqr +.sp +The doubling is removed before the string is passed to the callout function. +. +. 
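The delimiter rule for string arguments can be sketched in a few lines of Python. This is not PCRE2's own code: the function name parse_callout_string and its return shape are hypothetical, and the sketch only models the rule stated above (same closing delimiter as opening, except that { is closed by }, and a doubled closing delimiter stands for one literal character).

```python
# Opening delimiters and their matching closing delimiters, as listed above.
DELIMITERS = {"`": "`", "'": "'", '"': '"', "^": "^",
              "%": "%", "#": "#", "$": "$", "{": "}"}

def parse_callout_string(text):
    """Parse a delimited callout string at the start of `text`.

    Hypothetical helper: returns (argument, remainder), where `argument`
    has the doubling removed and `remainder` is what follows the string.
    """
    if not text or text[0] not in DELIMITERS:
        raise ValueError("missing or invalid starting delimiter")
    closing = DELIMITERS[text[0]]
    pieces = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == closing:
            # A doubled closing delimiter is one literal character.
            if i + 1 < len(text) and text[i + 1] == closing:
                pieces.append(closing)
                i += 2
                continue
            return "".join(pieces), text[i + 1:]
        pieces.append(ch)
        i += 1
    raise ValueError("unterminated callout string")

quoted = parse_callout_string("'ab ''c'' d')xyz")
braced = parse_callout_string("{any text})pqr")
```

The two calls correspond to the two callouts in the example pattern: the text after (?C is passed in, and the remainder begins with the closing parenthesis of the callout item.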
+.\" HTML +.SH "BACKTRACKING CONTROL" +.rs +.sp +There are a number of special "Backtracking Control Verbs" (to use Perl's +terminology) that modify the behaviour of backtracking during matching. They +are generally of the form (*VERB) or (*VERB:NAME). Some verbs take either form, +and may behave differently depending on whether or not a name argument is +present. The names are not required to be unique within the pattern. +.P +By default, for compatibility with Perl, a name is any sequence of characters +that does not include a closing parenthesis. The name is not processed in +any way, and it is not possible to include a closing parenthesis in the name. +This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result +is no longer Perl-compatible. +.P +When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names +and only an unescaped closing parenthesis terminates the name. However, the +only backslash items that are permitted are \eQ, \eE, and sequences such as +\ex{100} that define character code points. Character type escapes such as \ed +are faulted. +.P +A closing parenthesis can be included in a name either as \e) or between \eQ +and \eE. In addition to backslash processing, if the PCRE2_EXTENDED or +PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb names is +skipped, and #-comments are recognized, exactly as in the rest of the pattern. +PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do not affect verb names unless +PCRE2_ALT_VERBNAMES is also set. +.P +The maximum length of a name is 255 in the 8-bit library and 65535 in the +16-bit and 32-bit libraries. If the name is empty, that is, if the closing +parenthesis immediately follows the colon, the effect is as if the colon were +not there. Any number of these verbs may occur in a pattern. Except for +(*ACCEPT), they may not be quantified. 
+.P +Since these verbs are specifically related to backtracking, most of them can be +used only when the pattern is to be matched using the traditional matching +function, because that uses a backtracking algorithm. With the exception of +(*FAIL), which behaves like a failing negative assertion, the backtracking +control verbs cause an error if encountered by the DFA matching function. +.P +The behaviour of these verbs in +.\" HTML +.\" +repeated groups, +.\" +.\" HTML +.\" +assertions, +.\" +and in +.\" HTML +.\" +capture groups called as subroutines +.\" +(whether or not recursively) is documented below. +. +. +.\" HTML +.SS "Optimizations that affect backtracking verbs" +.rs +.sp +PCRE2 contains some optimizations that are used to speed up matching by running +some checks at the start of each match attempt. For example, it may know the +minimum length of matching subject, or that a particular character must be +present. When one of these optimizations bypasses the running of a match, any +included backtracking verbs will not, of course, be processed. You can suppress +the start-of-match optimizations by setting the PCRE2_NO_START_OPTIMIZE option +when calling \fBpcre2_compile()\fP, or by starting the pattern with +(*NO_START_OPT). There is more discussion of this option in the section +entitled +.\" HTML +.\" +"Compiling a pattern" +.\" +in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +.P +Experiments with Perl suggest that it too has similar optimizations, and like +PCRE2, turning them off can change the result of a match. +. +. +.\" HTML +.SS "Verbs that act immediately" +.rs +.sp +The following verbs act as soon as they are encountered. +.sp + (*ACCEPT) or (*ACCEPT:NAME) +.sp +This verb causes the match to end successfully, skipping the remainder of the +pattern. However, when it is inside a capture group that is called as a +subroutine, only that group is ended successfully. Matching then continues +at the outer level. 
If (*ACCEPT) is triggered in a positive assertion, the +assertion succeeds; in a negative assertion, the assertion fails. +.P +If (*ACCEPT) is inside capturing parentheses, the data so far is captured. For +example: +.sp + A((?:A|B(*ACCEPT)|C)D) +.sp +This matches "AB", "AAD", or "ACD"; when it matches "AB", "B" is captured by +the outer parentheses. +.P +(*ACCEPT) is the only backtracking verb that is allowed to be quantified +because an ungreedy quantification with a minimum of zero acts only when a +backtrack happens. Consider, for example, +.sp + (A(*ACCEPT)??B)C +.sp +where A, B, and C may be complex expressions. After matching "A", the matcher +processes "BC"; if that fails, causing a backtrack, (*ACCEPT) is triggered and +the match succeeds. In both cases, all but C is captured. Whereas (*COMMIT) +(see below) means "fail on backtrack", a repeated (*ACCEPT) of this type means +"succeed on backtrack". +.P +\fBWarning:\fP (*ACCEPT) should not be used within a script run group, because +it causes an immediate exit from the group, bypassing the script run checking. +.sp + (*FAIL) or (*FAIL:NAME) +.sp +This verb causes a matching failure, forcing backtracking to occur. It may be +abbreviated to (*F). It is equivalent to (?!) but easier to read. The Perl +documentation notes that it is probably useful only when combined with (?{}) or +(??{}). Those are, of course, Perl features that are not present in PCRE2. The +nearest equivalent is the callout feature, as for example in this pattern: +.sp + a+(?C)(*FAIL) +.sp +A match with the string "aaaa" always fails, but the callout is taken before +each backtrack happens (in this example, 10 times). +.P +(*ACCEPT:NAME) and (*FAIL:NAME) behave the same as (*MARK:NAME)(*ACCEPT) and +(*MARK:NAME)(*FAIL), respectively, that is, a (*MARK) is recorded just before +the verb acts. +. +.
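The figure of 10 callouts quoted above for a+(?C)(*FAIL) against "aaaa" can be checked with a small simulation. The sketch below is plain Python that mimics a greedy a+ giving back one character per backtrack at every starting position; it does not call PCRE2, it assumes no start-of-match optimization skips an attempt, and the name count_callouts is made up for this illustration.

```python
def count_callouts(subject):
    """Simulate a+(?C)(*FAIL) and count how often the callout point is reached."""
    callouts = 0
    # Unanchored pattern: a match attempt is made at every starting position.
    for start in range(len(subject) + 1):
        # Length of the run of "a" characters beginning at this position.
        run = 0
        while start + run < len(subject) and subject[start + run] == "a":
            run += 1
        # Greedy a+ first takes the whole run, then gives back one character
        # per backtrack until only its minimum of one "a" is left.
        for _length in range(run, 0, -1):
            callouts += 1  # (?C) is reached, then (*FAIL) forces a backtrack
    return callouts


print(count_callouts("aaaa"))  # 10
```

The total is 4 + 3 + 2 + 1: four callouts for the attempt starting at the first character, three for the next, and so on.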
+.SS "Recording which path was taken" +.rs +.sp +There is one verb whose main purpose is to track how a match was arrived at, +though it also has a secondary use in conjunction with advancing the match +starting point (see (*SKIP) below). +.sp + (*MARK:NAME) or (*:NAME) +.sp +A name is always required with this verb. For all the other backtracking +control verbs, a NAME argument is optional. +.P +When a match succeeds, the name of the last-encountered mark name on the +matching path is passed back to the caller as described in the section entitled +.\" HTML +.\" +"Other information about the match" +.\" +in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. This applies to all instances of (*MARK) and other verbs, +including those inside assertions and atomic groups. However, there are +differences in those cases when (*MARK) is used in conjunction with (*SKIP) as +described below. +.P +The mark name that was last encountered on the matching path is passed back. A +verb without a NAME argument is ignored for this purpose. Here is an example of +\fBpcre2test\fP output, where the "mark" modifier requests the retrieval and +outputting of (*MARK) data: +.sp + re> /X(*MARK:A)Y|X(*MARK:B)Z/mark + data> XY + 0: XY + MK: A + XZ + 0: XZ + MK: B +.sp +The (*MARK) name is tagged with "MK:" in this output, and in this example it +indicates which of the two alternatives matched. This is a more efficient way +of obtaining this information than putting each alternative in its own +capturing parentheses. +.P +If a verb with a name is encountered in a positive assertion that is true, the +name is recorded and passed back if it is the last-encountered. This does not +happen for negative assertions or failing positive assertions. +.P +After a partial match or a failed match, the last encountered name in the +entire match process is returned. 
For example: +.sp + re> /X(*MARK:A)Y|X(*MARK:B)Z/mark + data> XP + No match, mark = B +.sp +Note that in this unanchored example the mark is retained from the match +attempt that started at the letter "X" in the subject. Subsequent match +attempts starting at "P" and then with an empty string do not get as far as the +(*MARK) item, but nevertheless do not reset it. +.P +If you are interested in (*MARK) values after failed matches, you should +probably set the PCRE2_NO_START_OPTIMIZE option +.\" HTML +.\" +(see above) +.\" +to ensure that the match is always attempted. +. +. +.SS "Verbs that act after backtracking" +.rs +.sp +The following verbs do nothing when they are encountered. Matching continues +with what follows, but if there is a subsequent match failure, causing a +backtrack to the verb, a failure is forced. That is, backtracking cannot pass +to the left of the verb. However, when one of these verbs appears inside an +atomic group or in a lookaround assertion that is true, its effect is confined +to that group, because once the group has been matched, there is never any +backtracking into it. Backtracking from beyond an assertion or an atomic group +ignores the entire group, and seeks a preceding backtracking point. +.P +These verbs differ in exactly what kind of failure occurs when backtracking +reaches them. The behaviour described below is what happens when the verb is +not in a subroutine or an assertion. Subsequent sections cover these special +cases. +.sp + (*COMMIT) or (*COMMIT:NAME) +.sp +This verb causes the whole match to fail outright if there is a later matching +failure that causes backtracking to reach it. Even if the pattern is +unanchored, no further attempts to find a match by advancing the starting point +take place. If (*COMMIT) is the only backtracking verb that is encountered, +once it has been passed \fBpcre2_match()\fP is committed to finding a match at +the current starting point, or not at all. 
For example: +.sp + a+(*COMMIT)b +.sp +This matches "xxaab" but not "aacaab". It can be thought of as a kind of +dynamic anchor, or "I've started, so I must finish." +.P +The behaviour of (*COMMIT:NAME) is not the same as (*MARK:NAME)(*COMMIT). It is +like (*MARK:NAME) in that the name is remembered for passing back to the +caller. However, (*SKIP:NAME) searches only for names that are set with +(*MARK), ignoring those set by any of the other backtracking verbs. +.P +If there is more than one backtracking verb in a pattern, a different one that +follows (*COMMIT) may be triggered first, so merely passing (*COMMIT) during a +match does not always guarantee that a match must be at this starting point. +.P +Note that (*COMMIT) at the start of a pattern is not the same as an anchor, +unless PCRE2's start-of-match optimizations are turned off, as shown in this +output from \fBpcre2test\fP: +.sp + re> /(*COMMIT)abc/ + data> xyzabc + 0: abc + data> + re> /(*COMMIT)abc/no_start_optimize + data> xyzabc + No match +.sp +For the first pattern, PCRE2 knows that any match must start with "a", so the +optimization skips along the subject to "a" before applying the pattern to the +first set of data. The match attempt then succeeds. The second pattern disables +the optimization that skips along to the first character. The pattern is now +applied starting at "x", and so the (*COMMIT) causes the match to fail without +trying any other starting points. +.sp + (*PRUNE) or (*PRUNE:NAME) +.sp +This verb causes the match to fail at the current starting position in the +subject if there is a later matching failure that causes backtracking to reach +it. If the pattern is unanchored, the normal "bumpalong" advance to the next +starting character then happens. Backtracking can occur as usual to the left of +(*PRUNE), before it is reached, or when matching to the right of (*PRUNE), but +if there is no match to the right, backtracking cannot cross (*PRUNE). 
In +simple cases, the use of (*PRUNE) is just an alternative to an atomic group or +possessive quantifier, but there are some uses of (*PRUNE) that cannot be +expressed in any other way. In an anchored pattern (*PRUNE) has the same effect +as (*COMMIT). +.P +The behaviour of (*PRUNE:NAME) is not the same as (*MARK:NAME)(*PRUNE). It is +like (*MARK:NAME) in that the name is remembered for passing back to the +caller. However, (*SKIP:NAME) searches only for names set with (*MARK), +ignoring those set by other backtracking verbs. +.sp + (*SKIP) +.sp +This verb, when given without a name, is like (*PRUNE), except that if the +pattern is unanchored, the "bumpalong" advance is not to the next character, +but to the position in the subject where (*SKIP) was encountered. (*SKIP) +signifies that whatever text was matched leading up to it cannot be part of a +successful match if there is a later mismatch. Consider: +.sp + a+(*SKIP)b +.sp +If the subject is "aaaac...", after the first match attempt fails (starting at +the first character in the string), the starting point skips on to start the +next attempt at "c". Note that a possessive quantifier does not have the same +effect as this example; although it would suppress backtracking during the +first match attempt, the second attempt would start at the second character +instead of skipping on to "c". +.P +If (*SKIP) is used to specify a new starting position that is the same as the +starting position of the current match, or (by being inside a lookbehind) +earlier, the position specified by (*SKIP) is ignored, and instead the normal +"bumpalong" occurs. +.sp + (*SKIP:NAME) +.sp +When (*SKIP) has an associated name, its behaviour is modified. When such a +(*SKIP) is triggered, the previous path through the pattern is searched for the +most recent (*MARK) that has the same name. If one is found, the "bumpalong" +advance is to the subject position that corresponds to that (*MARK) instead of +to where (*SKIP) was encountered.
If no (*MARK) with a matching name is found, +the (*SKIP) is ignored. +.P +The search for a (*MARK) name uses the normal backtracking mechanism, which +means that it does not see (*MARK) settings that are inside atomic groups or +assertions, because they are never re-entered by backtracking. Compare the +following \fBpcre2test\fP examples: +.sp + re> /a(?>(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: a + 1: a + data: + re> /a(?:(*MARK:X))(*SKIP:X)(*F)|(.)/ + data: abc + 0: b + 1: b +.sp +In the first example, the (*MARK) setting is in an atomic group, so it is not +seen when (*SKIP:X) triggers, causing the (*SKIP) to be ignored. This allows +the second branch of the pattern to be tried at the first character position. +In the second example, the (*MARK) setting is not in an atomic group. This +allows (*SKIP:X) to find the (*MARK) when it backtracks, and this causes a new +matching attempt to start at the second character. This time, the (*MARK) is +never seen because "a" does not match "b", so the matcher immediately jumps to +the second branch of the pattern. +.P +Note that (*SKIP:NAME) searches only for names set by (*MARK:NAME). It ignores +names that are set by other backtracking verbs. +.sp + (*THEN) or (*THEN:NAME) +.sp +This verb causes a skip to the next innermost alternative when backtracking +reaches it. That is, it cancels any further backtracking within the current +alternative. Its name comes from the observation that it can be used for a +pattern-based if-then-else block: +.sp + ( COND1 (*THEN) FOO | COND2 (*THEN) BAR | COND3 (*THEN) BAZ ) ... +.sp +If the COND1 pattern matches, FOO is tried (and possibly further items after +the end of the group if FOO succeeds); on failure, the matcher skips to the +second alternative and tries COND2, without backtracking into COND1. If that +succeeds and BAR fails, COND3 is tried. If subsequently BAZ fails, there are no +more alternatives, so there is a backtrack to whatever came before the entire +group. 
If (*THEN) is not inside an alternation, it acts like (*PRUNE). +.P +The behaviour of (*THEN:NAME) is not the same as (*MARK:NAME)(*THEN). It is +like (*MARK:NAME) in that the name is remembered for passing back to the +caller. However, (*SKIP:NAME) searches only for names set with (*MARK), +ignoring those set by other backtracking verbs. +.P +A group that does not contain a | character is just a part of the enclosing +alternative; it is not a nested alternation with only one alternative. The +effect of (*THEN) extends beyond such a group to the enclosing alternative. +Consider this pattern, where A, B, etc. are complex pattern fragments that do +not contain any | characters at this level: +.sp + A (B(*THEN)C) | D +.sp +If A and B are matched, but there is a failure in C, matching does not +backtrack into A; instead it moves to the next alternative, that is, D. +However, if the group containing (*THEN) is given an alternative, it +behaves differently: +.sp + A (B(*THEN)C | (*FAIL)) | D +.sp +The effect of (*THEN) is now confined to the inner group. After a failure in C, +matching moves to (*FAIL), which causes the whole group to fail because there +are no more alternatives to try. In this case, matching does backtrack into A. +.P +Note that a conditional group is not considered as having two alternatives, +because only one is ever used. In other words, the | character in a conditional +group has a different meaning. Ignoring white space, consider: +.sp + ^.*? (?(?=a) a | b(*THEN)c ) +.sp +If the subject is "ba", this pattern does not match. Because .*? is ungreedy, +it initially matches zero characters. The condition (?=a) then fails, the +character "b" is matched, but "c" is not. At this point, matching does not +backtrack to .*? as might perhaps be expected from the presence of the | +character. The conditional group is part of the single alternative that +comprises the whole pattern, and so the match fails. 
(If there was a backtrack +into .*?, allowing it to match "b", the match would succeed.) +.P +The verbs just described provide four different "strengths" of control when +subsequent matching fails. (*THEN) is the weakest, carrying on the match at the +next alternative. (*PRUNE) comes next, failing the match at the current +starting position, but allowing an advance to the next character (for an +unanchored pattern). (*SKIP) is similar, except that the advance may be more +than one character. (*COMMIT) is the strongest, causing the entire match to +fail. +. +. +.SS "More than one backtracking verb" +.rs +.sp +If more than one backtracking verb is present in a pattern, the one that is +backtracked onto first acts. For example, consider this pattern, where A, B, +etc. are complex pattern fragments: +.sp + (A(*COMMIT)B(*THEN)C|ABD) +.sp +If A matches but B fails, the backtrack to (*COMMIT) causes the entire match to +fail. However, if A and B match, but C fails, the backtrack to (*THEN) causes +the next alternative (ABD) to be tried. This behaviour is consistent, but is +not always the same as Perl's. It means that if two or more backtracking verbs +appear in succession, all but the last of them have no effect. Consider this +example: +.sp + ...(*COMMIT)(*PRUNE)... +.sp +If there is a matching failure to the right, backtracking onto (*PRUNE) causes +it to be triggered, and its action is taken. There can never be a backtrack +onto (*COMMIT). +. +. +.\" HTML +.SS "Backtracking verbs in repeated groups" +.rs +.sp +PCRE2 sometimes differs from Perl in its handling of backtracking verbs in +repeated groups. For example, consider: +.sp + /(a(*COMMIT)b)+ac/ +.sp +If the subject is "abac", Perl matches unless its optimizations are disabled, +but PCRE2 always fails because the (*COMMIT) in the second repeat of the group +acts. +. +. +.\" HTML +.SS "Backtracking verbs in assertions" +.rs +.sp +(*FAIL) in any assertion has its normal effect: it forces an immediate +backtrack.
The behaviour of the other backtracking verbs depends on whether or +not the assertion is standalone or acting as the condition in a conditional +group. +.P +(*ACCEPT) in a standalone positive assertion causes the assertion to succeed +without any further processing; captured strings and a mark name (if set) are +retained. In a standalone negative assertion, (*ACCEPT) causes the assertion to +fail without any further processing; captured substrings and any mark name are +discarded. +.P +If the assertion is a condition, (*ACCEPT) causes the condition to be true for +a positive assertion and false for a negative one; captured substrings are +retained in both cases. +.P +The remaining verbs act only when a later failure causes a backtrack to +reach them. This means that, for the Perl-compatible assertions, their effect +is confined to the assertion, because Perl lookaround assertions are atomic. A +backtrack that occurs after such an assertion is complete does not jump back +into the assertion. Note in particular that a (*MARK) name that is set in an +assertion is not "seen" by an instance of (*SKIP:NAME) later in the pattern. +.P +PCRE2 now supports non-atomic positive assertions, as described in the section +entitled +.\" HTML +.\" +"Non-atomic assertions" +.\" +above. These assertions must be standalone (not used as conditions). They are +not Perl-compatible. For these assertions, a later backtrack does jump back +into the assertion, and therefore verbs such as (*COMMIT) can be triggered by +backtracks from later in the pattern. +.P +The effect of (*THEN) is not allowed to escape beyond an assertion. If there +are no more branches to try, (*THEN) causes a positive assertion to be false, +and a negative assertion to be true. +.P +The other backtracking verbs are not treated specially if they appear in a +standalone positive assertion. 
In a conditional positive assertion, +backtracking (from within the assertion) into (*COMMIT), (*SKIP), or (*PRUNE) +causes the condition to be false. However, for both standalone and conditional +negative assertions, backtracking into (*COMMIT), (*SKIP), or (*PRUNE) causes +the assertion to be true, without considering any further alternative branches. +. +. +.\" HTML +.SS "Backtracking verbs in subroutines" +.rs +.sp +These behaviours occur whether or not the group is called recursively. +.P +(*ACCEPT) in a group called as a subroutine causes the subroutine match to +succeed without any further processing. Matching then continues after the +subroutine call. Perl documents this behaviour. Perl's treatment of the other +verbs in subroutines is different in some cases. +.P +(*FAIL) in a group called as a subroutine has its normal effect: it forces +an immediate backtrack. +.P +(*COMMIT), (*SKIP), and (*PRUNE) cause the subroutine match to fail when +triggered by being backtracked to in a group called as a subroutine. There is +then a backtrack at the outer level. +.P +(*THEN), when triggered, skips to the next alternative in the innermost +enclosing group that has alternatives (its normal behaviour). However, if there +is no such group within the subroutine's group, the subroutine match fails and +there is a backtrack at the outer level. +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2api\fP(3), \fBpcre2callout\fP(3), \fBpcre2matching\fP(3), +\fBpcre2syntax\fP(3), \fBpcre2\fP(3). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 06 October 2020 +Copyright (c) 1997-2020 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2perform.3 b/src/pcre2/doc/pcre2perform.3 new file mode 100644 index 00000000..040369a7 --- /dev/null +++ b/src/pcre2/doc/pcre2perform.3 @@ -0,0 +1,244 @@ +.TH PCRE2PERFORM 3 "03 February 2019" "PCRE2 10.33" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 PERFORMANCE" +.rs +.sp +Two aspects of performance are discussed below: memory usage and processing +time. The way you express your pattern as a regular expression can affect both +of them. +. +.SH "COMPILED PATTERN MEMORY USAGE" +.rs +.sp +Patterns are compiled by PCRE2 into a reasonably efficient interpretive code, +so that most simple patterns do not use much memory for storing the compiled +version. However, there is one case where the memory usage of a compiled +pattern can be unexpectedly large. If a parenthesized group has a quantifier +with a minimum greater than 1 and/or a limited maximum, the whole group is +repeated in the compiled code. For example, the pattern +.sp + (abc|def){2,4} +.sp +is compiled as if it were +.sp + (abc|def)(abc|def)((abc|def)(abc|def)?)? +.sp +(Technical aside: It is done this way so that backtrack points within each of +the repetitions can be independently maintained.) +.P +For regular expressions whose quantifiers use only small numbers, this is not +usually a problem. However, if the numbers are large, and particularly if such +repetitions are nested, the memory usage can become an embarrassment. For +example, the very simple pattern +.sp + ((ab){1,1000}c){1,3} +.sp +uses over 50KiB when compiled using the 8-bit library. When PCRE2 is +compiled with its default internal pointer size of two bytes, the size limit on +a compiled pattern is 65535 code units in the 8-bit and 16-bit libraries, and +this is reached with the above pattern if the outer repetition is increased +from 3 to 4. 
PCRE2 can be compiled to use larger internal pointers and thus +handle larger compiled patterns, but it is better to try to rewrite your +pattern to use less memory if you can. +.P +One way of reducing the memory usage for such patterns is to make use of +PCRE2's +.\" HTML +.\" +"subroutine" +.\" +facility. Re-writing the above pattern as +.sp + ((ab)(?2){0,999}c)(?1){0,2} +.sp +reduces the memory requirements to around 16KiB, and indeed it remains under +20KiB even with the outer repetition increased to 100. However, this kind of +pattern is not always exactly equivalent, because any captures within +subroutine calls are lost when the subroutine completes. If this is not a +problem, this kind of rewriting will allow you to process patterns that PCRE2 +cannot otherwise handle. The matching performance of the two different versions +of the pattern is roughly the same. (This applies from release 10.30 - things +were different in earlier releases.) +. +. +.SH "STACK AND HEAP USAGE AT RUN TIME" +.rs +.sp +From release 10.30, the interpretive (non-JIT) version of \fBpcre2_match()\fP +uses very little system stack at run time. In earlier releases recursive +function calls could use a great deal of stack, and this could cause problems, +but this usage has been eliminated. Backtracking positions are now explicitly +remembered in memory frames controlled by the code. An initial 20KiB vector of +frames is allocated on the system stack (enough for about 100 frames for small +patterns), but if this is insufficient, heap memory is used. The amount of heap +memory can be limited; if the limit is set to zero, only the initial stack +vector is used. Rewriting patterns to be time-efficient, as described below, +may also reduce the memory requirements. +.P +In contrast to \fBpcre2_match()\fP, \fBpcre2_dfa_match()\fP does use recursive +function calls, but only for processing atomic groups, lookaround assertions, +and recursion within the pattern.
The original version of the code used to +allocate quite large internal workspace vectors on the stack, which caused some +problems for some patterns in environments with small stacks. From release +10.32 the code for \fBpcre2_dfa_match()\fP has been re-factored to use heap +memory when necessary for internal workspace when recursing, though recursive +function calls are still used. +.P +The "match depth" parameter can be used to limit the depth of function +recursion, and the "match heap" parameter to limit heap memory in +\fBpcre2_dfa_match()\fP. +. +. +.SH "PROCESSING TIME" +.rs +.sp +Certain items in regular expression patterns are processed more efficiently +than others. It is more efficient to use a character class like [aeiou] than a +set of single-character alternatives such as (a|e|i|o|u). In general, the +simplest construction that provides the required behaviour is usually the most +efficient. Jeffrey Friedl's book contains a lot of useful general discussion +about optimizing regular expressions for efficient performance. This document +contains a few observations about PCRE2. +.P +Using Unicode character properties (the \ep, \eP, and \eX escapes) is slow, +because PCRE2 has to use a multi-stage table lookup whenever it needs a +character's property. If you can find an alternative pattern that does not use +character properties, it will probably be faster. +.P +By default, the escape sequences \eb, \ed, \es, and \ew, and the POSIX +character classes such as [:alpha:] do not use Unicode properties, partly for +backwards compatibility, and partly for performance reasons. However, you can +set the PCRE2_UCP option or start the pattern with (*UCP) if you want Unicode +character properties to be used. This can double the matching time for items +such as \ed, when matched with \fBpcre2_match()\fP; the performance loss is +less with a DFA matching function, and in both cases there is not much +difference for \eb. 
+.P +When a pattern begins with .* not in atomic parentheses, nor in parentheses +that are the subject of a backreference, and the PCRE2_DOTALL option is set, +the pattern is implicitly anchored by PCRE2, since it can match only at the +start of a subject string. If the pattern has multiple top-level branches, they +must all be anchorable. The optimization can be disabled by the +PCRE2_NO_DOTSTAR_ANCHOR option, and is automatically disabled if the pattern +contains (*PRUNE) or (*SKIP). +.P +If PCRE2_DOTALL is not set, PCRE2 cannot make this optimization, because the +dot metacharacter does not then match a newline, and if the subject string +contains newlines, the pattern may match from the character immediately +following one of them instead of from the very start. For example, the pattern +.sp + .*second +.sp +matches the subject "first\enand second" (where \en stands for a newline +character), with the match starting at the seventh character. In order to do +this, PCRE2 has to retry the match starting after every newline in the subject. +.P +If you are using such a pattern with subject strings that do not contain +newlines, the best performance is obtained by setting PCRE2_DOTALL, or starting +the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE2 +from having to scan along the subject looking for a newline to restart at. +.P +Beware of patterns that contain nested indefinite repeats. These can take a +long time to run when applied to a string that does not match. Consider the +pattern fragment +.sp + ^(a+)* +.sp +This can match "aaaa" in 16 different ways, and this number increases very +rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4 +times, and for each of those cases other than 0 or 4, the + repeats can match +different numbers of times.) 
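The count of 16 given above for ^(a+)* against "aaaa" can be reproduced by brute-force enumeration. The following Python sketch is only an illustration of the arithmetic, not a model of PCRE2's matcher; count_ways and splits are names invented here. It counts, for every prefix of the run of "a" characters, the number of ways that prefix can be cut into non-empty pieces, one piece per iteration of the group.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def splits(n):
    """Ways to cut exactly n characters into an ordered list of non-empty pieces."""
    if n == 0:
        return 1  # zero iterations of the group: the empty match
    # Choose the length of the first piece, then split whatever remains.
    return sum(splits(n - first) for first in range(1, n + 1))


def count_ways(subject):
    """Count the distinct ways ^(a+)* can match at the start of `subject`."""
    run = 0
    while run < len(subject) and subject[run] == "a":
        run += 1
    # The group may stop after any prefix of the run, including the empty one.
    return sum(splits(prefix) for prefix in range(run + 1))


print(count_ways("aaaa"))    # 16
print(count_ways("a" * 20))  # 1048576
```

Each extra "a" doubles the total, which is why a failing match against even a short run of "a" characters can become very slow.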
When the remainder of the pattern is such that the +entire match is going to fail, PCRE2 has in principle to try every possible +variation, and this can take an extremely long time, even for relatively short +strings. +.P +An optimization catches some of the more simple cases such as +.sp + (a+)*b +.sp +where a literal character follows. Before embarking on the standard matching +procedure, PCRE2 checks that there is a "b" later in the subject string, and if +there is not, it fails the match immediately. However, when there is no +following literal this optimization cannot be used. You can see the difference +by comparing the behaviour of +.sp + (a+)*\ed +.sp +with the pattern above. The former gives a failure almost instantly when +applied to a whole line of "a" characters, whereas the latter takes an +appreciable time with strings longer than about 20 characters. +.P +In many cases, the solution to this kind of performance issue is to use an +atomic group or a possessive quantifier. This can often reduce memory +requirements as well. As another example, consider this pattern: +.sp + ([^<]|<(?!inet))+ +.sp +It matches from wherever it starts until it encounters " +.\" +"The match context" +.\" +in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +.P +The \fBpcre2test\fP test program has a modifier called "find_limits" which, if +applied to a subject line, causes it to find the smallest limits that allow a +pattern to match. This is done by repeatedly matching with different limits. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 03 February 2019 +Copyright (c) 1997-2019 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2posix.3 b/src/pcre2/doc/pcre2posix.3 new file mode 100644 index 00000000..6cfede7e --- /dev/null +++ b/src/pcre2/doc/pcre2posix.3 @@ -0,0 +1,329 @@ +.TH PCRE2POSIX 3 "26 April 2021" "PCRE2 10.37" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "SYNOPSIS" +.rs +.sp +.B #include <pcre2posix.h> +.PP +.nf +.B int pcre2_regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP, +.B " int \fIcflags\fP);" +.sp +.B int pcre2_regexec(const regex_t *\fIpreg\fP, const char *\fIstring\fP, +.B " size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);" +.sp +.B "size_t pcre2_regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP," +.B " char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);" +.sp +.B void pcre2_regfree(regex_t *\fIpreg\fP); +.fi +. +.SH DESCRIPTION +.rs +.sp +This set of functions provides a POSIX-style API for the PCRE2 regular +expression 8-bit library. There are no POSIX-style wrappers for PCRE2's 16-bit +and 32-bit libraries. See the +.\" HREF +\fBpcre2api\fP +.\" +documentation for a description of PCRE2's native API, which contains much +additional functionality. +.P +The functions described here are wrapper functions that ultimately call the +PCRE2 native API. Their prototypes are defined in the \fBpcre2posix.h\fP header +file, and they all have unique names starting with \fBpcre2_\fP. However, the +\fBpcre2posix.h\fP header also contains macro definitions that convert the +standard POSIX names such as \fBregcomp()\fP into \fBpcre2_regcomp()\fP etc. This +means that a program can use the usual POSIX names without running the risk of +accidentally linking with POSIX functions from a different library. +.P +On Unix-like systems the PCRE2 POSIX library is called \fBlibpcre2-posix\fP, so +can be accessed by adding \fB-lpcre2-posix\fP to the command for linking an +application. Because the POSIX functions call the native ones, it is also +necessary to add \fB-lpcre2-8\fP.
+.P +Although they were not defined as prototypes in \fBpcre2posix.h\fP, releases +10.33 to 10.36 of the library contained functions with the POSIX names +\fBregcomp()\fP etc. These simply passed their arguments to the PCRE2 +functions. These functions were provided for backwards compatibility with +earlier versions of PCRE2, which had only POSIX names. However, this has proved +troublesome in situations where a program links with several libraries, some of +which use PCRE2's POSIX interface while others use the real POSIX functions. +For this reason, the POSIX names have been removed since release 10.37. +.P +Calling the header file \fBpcre2posix.h\fP avoids any conflict with other POSIX +libraries. It can, of course, be renamed or aliased as \fBregex.h\fP, which is +the "correct" name, if there is no clash. It provides two structure types, +\fIregex_t\fP for compiled internal forms, and \fIregmatch_t\fP for returning +captured substrings. It also defines some constants whose names start with +"REG_"; these are used for setting options and identifying error codes. +. +. +.SH "USING THE POSIX FUNCTIONS" +.rs +.sp +Those POSIX option bits that can reasonably be mapped to PCRE2 native options +have been implemented. In addition, the option REG_EXTENDED is defined with the +value zero. This has no effect, but since programs that are written to the +POSIX interface often use it, this makes it easier to slot in PCRE2 as a +replacement library. Other POSIX options are not even defined. +.P +There are also some options that are not defined by POSIX. These have been +added at the request of users who want to make use of certain PCRE2-specific +features via the POSIX calling interface or to add BSD or GNU functionality. +.P +When PCRE2 is called via these functions, it is only the API that is POSIX-like +in style. The syntax and semantics of the regular expressions themselves are +still those of Perl, subject to the setting of various PCRE2 options, as +described below.
"POSIX-like in style" means that the API approximates to the +POSIX definition; it is not fully POSIX-compatible, and in multi-unit encoding +domains it is probably even less compatible. +.P +The descriptions below use the actual names of the functions, but, as described +above, the standard POSIX names (without the \fBpcre2_\fP prefix) may also be +used. +. +. +.SH "COMPILING A PATTERN" +.rs +.sp +The function \fBpcre2_regcomp()\fP is called to compile a pattern into an +internal form. By default, the pattern is a C string terminated by a binary +zero (but see REG_PEND below). The \fIpreg\fP argument is a pointer to a +\fBregex_t\fP structure that is used as a base for storing information about +the compiled regular expression. (It is also used for input when REG_PEND is +set.) +.P +The argument \fIcflags\fP is either zero, or contains one or more of the bits +defined by the following macros: +.sp + REG_DOTALL +.sp +The PCRE2_DOTALL option is set when the regular expression is passed for +compilation to the native function. Note that REG_DOTALL is not part of the +POSIX standard. +.sp + REG_ICASE +.sp +The PCRE2_CASELESS option is set when the regular expression is passed for +compilation to the native function. +.sp + REG_NEWLINE +.sp +The PCRE2_MULTILINE option is set when the regular expression is passed for +compilation to the native function. Note that this does \fInot\fP mimic the +defined POSIX behaviour for REG_NEWLINE (see the following section). +.sp + REG_NOSPEC +.sp +The PCRE2_LITERAL option is set when the regular expression is passed for +compilation to the native function. This disables all meta characters in the +pattern, causing it to be treated as a literal string. The only other options +that are allowed with REG_NOSPEC are REG_ICASE, REG_NOSUB, REG_PEND, and +REG_UTF. Note that REG_NOSPEC is not part of the POSIX standard. 
+.sp + REG_NOSUB +.sp +When a pattern that is compiled with this flag is passed to +\fBpcre2_regexec()\fP for matching, the \fInmatch\fP and \fIpmatch\fP arguments +are ignored, and no captured strings are returned. Versions of the PCRE2 library +prior to 10.22 used to set the PCRE2_NO_AUTO_CAPTURE compile option, but this +no longer happens because it disables the use of backreferences. +.sp + REG_PEND +.sp +If this option is set, the \fBre_endp\fP field in the \fIpreg\fP structure +(which has the type const char *) must be set to point to the character beyond +the end of the pattern before calling \fBpcre2_regcomp()\fP. The pattern itself +may now contain binary zeros, which are treated as data characters. Without +REG_PEND, a binary zero terminates the pattern and the \fBre_endp\fP field is +ignored. This is a GNU extension to the POSIX standard and should be used with +caution in software intended to be portable to other systems. +.sp + REG_UCP +.sp +The PCRE2_UCP option is set when the regular expression is passed for +compilation to the native function. This causes PCRE2 to use Unicode properties +when matching \ed, \ew, etc., instead of just recognizing ASCII values. Note +that REG_UCP is not part of the POSIX standard. +.sp + REG_UNGREEDY +.sp +The PCRE2_UNGREEDY option is set when the regular expression is passed for +compilation to the native function. Note that REG_UNGREEDY is not part of the +POSIX standard. +.sp + REG_UTF +.sp +The PCRE2_UTF option is set when the regular expression is passed for +compilation to the native function. This causes the pattern itself and all data +strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF +is not part of the POSIX standard. +.P +In the absence of these flags, no options are passed to the native function. +This means that the regex is compiled with PCRE2 default semantics. In +particular, the way it handles newline characters in the subject string is the +Perl way, not the POSIX way.
Note that setting PCRE2_MULTILINE has only +\fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way +newlines are matched by the dot metacharacter (they are not) or by a negative +class such as [^a] (they are). +.P +The yield of \fBpcre2_regcomp()\fP is zero on success, and non-zero otherwise. +The \fIpreg\fP structure is filled in on success, and one other member of the +structure (as well as \fIre_endp\fP) is public: \fIre_nsub\fP contains the +number of capturing subpatterns in the regular expression. Various error codes +are defined in the header file. +.P +NOTE: If the yield of \fBpcre2_regcomp()\fP is non-zero, you must not attempt +to use the contents of the \fIpreg\fP structure. If, for example, you pass it +to \fBpcre2_regexec()\fP, the result is undefined and your program is likely to +crash. +. +. +.SH "MATCHING NEWLINE CHARACTERS" +.rs +.sp +This area is not simple, because POSIX and Perl take different views of things. +It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was +never intended to be a POSIX engine. The following table lists the different +possibilities for matching newline characters in Perl and PCRE2: +.sp + Default Change with +.sp + . matches newline no PCRE2_DOTALL + newline matches [^a] yes not changeable + $ matches \en at end yes PCRE2_DOLLAR_ENDONLY + $ matches \en in middle no PCRE2_MULTILINE + ^ matches \en in middle no PCRE2_MULTILINE +.sp +This is the equivalent table for a POSIX-compatible pattern matcher: +.sp + Default Change with +.sp + . matches newline yes REG_NEWLINE + newline matches [^a] yes REG_NEWLINE + $ matches \en at end no REG_NEWLINE + $ matches \en in middle no REG_NEWLINE + ^ matches \en in middle no REG_NEWLINE +.sp +This behaviour is not what happens when PCRE2 is called via its POSIX +API. By default, PCRE2's behaviour is the same as Perl's, except that there is +no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. 
In both PCRE2 and Perl, there +is no way to stop newline from matching [^a]. +.P +Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and +PCRE2_DOLLAR_ENDONLY when calling \fBpcre2_compile()\fP directly, but there is +no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using +the POSIX API, passing REG_NEWLINE to PCRE2's \fBpcre2_regcomp()\fP function +causes PCRE2_MULTILINE to be passed to \fBpcre2_compile()\fP, and REG_DOTALL +passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY. +. +. +.SH "MATCHING A PATTERN" +.rs +.sp +The function \fBpcre2_regexec()\fP is called to match a compiled pattern +\fIpreg\fP against a given \fIstring\fP, which is by default terminated by a +zero byte (but see REG_STARTEND below), subject to the options in \fIeflags\fP. +These can be: +.sp + REG_NOTBOL +.sp +The PCRE2_NOTBOL option is set when calling the underlying PCRE2 matching +function. +.sp + REG_NOTEMPTY +.sp +The PCRE2_NOTEMPTY option is set when calling the underlying PCRE2 matching +function. Note that REG_NOTEMPTY is not part of the POSIX standard. However, +setting this option can give more POSIX-like behaviour in some situations. +.sp + REG_NOTEOL +.sp +The PCRE2_NOTEOL option is set when calling the underlying PCRE2 matching +function. +.sp + REG_STARTEND +.sp +When this option is set, the subject string starts at \fIstring\fP + +\fIpmatch[0].rm_so\fP and ends at \fIstring\fP + \fIpmatch[0].rm_eo\fP, which +should point to the first character beyond the string. There may be binary +zeros within the subject string, and indeed, using REG_STARTEND is the only +way to pass a subject string that contains a binary zero. +.P +Whatever the value of \fIpmatch[0].rm_so\fP, the offsets of the matched string +and any captured substrings are still given relative to the start of +\fIstring\fP itself. 
(Before PCRE2 release 10.30 these were given relative to +\fIstring\fP + \fIpmatch[0].rm_so\fP, but this differs from other +implementations.) +.P +This is a BSD extension, compatible with but not specified by IEEE Standard +1003.2 (POSIX.2), and should be used with caution in software intended to be +portable to other systems. Note that a non-zero \fIrm_so\fP does not imply +REG_NOTBOL; REG_STARTEND affects only the location and length of the string, +not how it is matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL +are mutually exclusive; the error REG_INVARG is returned. +.P +If the pattern was compiled with the REG_NOSUB flag, no data about any matched +strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of +\fBpcre2_regexec()\fP are ignored (except possibly as input for REG_STARTEND). +.P +The value of \fInmatch\fP may be zero, and the value \fIpmatch\fP may be NULL +(unless REG_STARTEND is set); in both these cases no data about any matched +strings is returned. +.P +Otherwise, the portion of the string that was matched, and also any captured +substrings, are returned via the \fIpmatch\fP argument, which points to an +array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the +members \fIrm_so\fP and \fIrm_eo\fP. These contain the byte offset to the first +character of each substring and the offset to the first character after the end +of each substring, respectively. The 0th element of the vector relates to the +entire portion of \fIstring\fP that was matched; subsequent elements relate to +the capturing subpatterns of the regular expression. Unused entries in the +array have both structure members set to -1. +.P +A successful match yields a zero return; various error codes are defined in the +header file, of which REG_NOMATCH is the "expected" failure code. +. +. 
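The compile, match and free sequence described above can be sketched in C as follows. This is a minimal illustration, not code from the PCRE2 distribution. It is written against the standard POSIX names in <regex.h> so that it builds without PCRE2 installed; the assumption is that including <pcre2posix.h> instead, and linking with -lpcre2-posix -lpcre2-8, maps the same calls onto pcre2_regcomp(), pcre2_regexec(), pcre2_regerror() and pcre2_regfree(). The helper name first_capture() is hypothetical.

```c
/* Sketch only: standard POSIX names, so it builds against the system
   <regex.h>. Assumption: with <pcre2posix.h> included instead, the same
   calls are routed to the pcre2_-prefixed wrapper functions. */
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compiles pattern, matches it against subject, and
   copies the text of capture group 1 into out (truncating if needed).
   Returns 0 on a match, REG_NOMATCH when nothing matched, or another
   non-zero code if compilation failed. */
int first_capture(const char *pattern, const char *subject,
                  char *out, size_t outsize)
{
  regex_t re;
  regmatch_t pmatch[2];   /* [0] = whole match, [1] = first capture group */
  int rc;

  assert(out != NULL && outsize > 0);
  out[0] = '\0';

  /* REG_EXTENDED is defined as zero by PCRE2, but POSIX libraries need it
     for the (a|b) syntax used in the usage note. */
  rc = regcomp(&re, pattern, REG_EXTENDED);
  if (rc != 0) {
    char errbuf[128];
    regerror(rc, &re, errbuf, sizeof errbuf);
    fprintf(stderr, "compile failed: %s\n", errbuf);
    return rc;            /* re must not be used after a failed compile */
  }

  rc = regexec(&re, subject, 2, pmatch, 0);
  if (rc == 0 && pmatch[1].rm_so != -1) {
    /* rm_so and rm_eo are byte offsets relative to the start of subject. */
    size_t len = (size_t)(pmatch[1].rm_eo - pmatch[1].rm_so);
    if (len >= outsize) len = outsize - 1;
    memcpy(out, subject + pmatch[1].rm_so, len);
    out[len] = '\0';
  }

  regfree(&re);           /* release the memory attached to re */
  return rc;
}
```

Calling first_capture("(cat|dog)", "the cat sat on the mat", buf, sizeof buf) should return zero and leave "cat" in buf; a subject that contains neither word should yield REG_NOMATCH.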
+.SH "ERROR MESSAGES" +.rs +.sp +The \fBpcre2_regerror()\fP function maps a non-zero errorcode from either +\fBpcre2_regcomp()\fP or \fBpcre2_regexec()\fP to a printable message. If +\fIpreg\fP is not NULL, the error should have arisen from the use of that +structure. A message terminated by a binary zero is placed in \fIerrbuf\fP. If +the buffer is too short, only the first \fIerrbuf_size\fP - 1 characters of the +error message are used. The yield of the function is the size of buffer needed +to hold the whole message, including the terminating zero. This value is +greater than \fIerrbuf_size\fP if the message was truncated. +. +. +.SH MEMORY USAGE +.rs +.sp +Compiling a regular expression causes memory to be allocated and associated +with the \fIpreg\fP structure. The function \fBpcre2_regfree()\fP frees all +such memory, after which \fIpreg\fP may no longer be used as a compiled +expression. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 26 April 2021 +Copyright (c) 1997-2021 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2sample.3 b/src/pcre2/doc/pcre2sample.3 new file mode 100644 index 00000000..661e3927 --- /dev/null +++ b/src/pcre2/doc/pcre2sample.3 @@ -0,0 +1,99 @@ +.TH PCRE2SAMPLE 3 "02 February 2016" "PCRE2 10.22" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 SAMPLE PROGRAM" +.rs +.sp +A simple, complete demonstration program to get you started with using PCRE2 is +supplied in the file \fIpcre2demo.c\fP in the \fBsrc\fP directory in the PCRE2 +distribution. A listing of this program is given in the +.\" HREF +\fBpcre2demo\fP +.\" +documentation. If you do not have a copy of the PCRE2 distribution, you can +save this listing to re-create the contents of \fIpcre2demo.c\fP. 
+.P +The demonstration program compiles the regular expression that is its +first argument, and matches it against the subject string in its second +argument. No PCRE2 options are set, and default character tables are used. If +matching succeeds, the program outputs the portion of the subject that matched, +together with the contents of any captured substrings. +.P +If the -g option is given on the command line, the program then goes on to +check for further matches of the same regular expression in the same subject +string. The logic is a little bit tricky because of the possibility of matching +an empty string. Comments in the code explain what is going on. +.P +The code in \fBpcre2demo.c\fP is an 8-bit program that uses the PCRE2 8-bit +library. It handles strings and characters that are stored in 8-bit code units. +By default, one character corresponds to one code unit, but if the pattern +starts with "(*UTF)", both it and the subject are treated as UTF-8 strings, +where characters may occupy multiple code units. +.P +If PCRE2 is installed in the standard include and library directories for your +operating system, you should be able to compile the demonstration program using +a command like this: +.sp + cc -o pcre2demo pcre2demo.c -lpcre2-8 +.sp +If PCRE2 is installed elsewhere, you may need to add additional options to the +command line. 
For example, on a Unix-like system that has PCRE2 installed in +\fI/usr/local\fP, you can compile the demonstration program using a command +like this: +.sp +.\" JOINSH + cc -o pcre2demo -I/usr/local/include pcre2demo.c \e + -L/usr/local/lib -lpcre2-8 +.sp +Once you have built the demonstration program, you can run simple tests like +this: +.sp + ./pcre2demo 'cat|dog' 'the cat sat on the mat' + ./pcre2demo -g 'cat|dog' 'the dog sat on the cat' +.sp +Note that there is a much more comprehensive test program, called +.\" HREF +\fBpcre2test\fP, +.\" +which supports many more facilities for testing regular expressions using all +three PCRE2 libraries (8-bit, 16-bit, and 32-bit, though not all three need be +installed). The +.\" HREF +\fBpcre2demo\fP +.\" +program is provided as a relatively simple coding example. +.P +If you try to run +.\" HREF +\fBpcre2demo\fP +.\" +when PCRE2 is not installed in the standard library directory, you may get an +error like this on some operating systems (e.g. Solaris): +.sp + ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file or directory +.sp +This is caused by the way shared library support works on those systems. You +need to add +.sp + -R/usr/local/lib +.sp +(for example) to the compile command to get round this problem. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 02 February 2016 +Copyright (c) 1997-2016 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2serialize.3 b/src/pcre2/doc/pcre2serialize.3 new file mode 100644 index 00000000..85aee9ba --- /dev/null +++ b/src/pcre2/doc/pcre2serialize.3 @@ -0,0 +1,199 @@ +.TH PCRE2SERIALIZE 3 "27 June 2018" "PCRE2 10.32" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS" +.rs +.sp +.nf +.B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP, +.B " int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP," +.B " pcre2_general_context *\fIgcontext\fP);" +.sp +.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP, +.B " int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP," +.B " PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);" +.sp +.B void pcre2_serialize_free(uint8_t *\fIbytes\fP); +.sp +.B int32_t pcre2_serialize_get_number_of_codes(const uint8_t *\fIbytes\fP); +.fi +.sp +If you are running an application that uses a large number of regular +expression patterns, it may be useful to store them in a precompiled form +instead of having to compile them every time the application is run. However, +if you are using the just-in-time optimization feature, it is not possible to +save and reload the JIT data, because it is position-dependent. The host on +which the patterns are reloaded must be running the same version of PCRE2, with +the same code unit width, and must also have the same endianness, pointer width +and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using +PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be +reloaded using the 8-bit library. +.P +Note that "serialization" in PCRE2 does not convert compiled patterns to an +abstract format like Java or .NET serialization. The serialized output is +really just a bytecode dump, which is why it can only be reloaded in the same +environment as the one that created it. Hence the restrictions mentioned above.
+Applications that are not statically linked with a fixed version of PCRE2 must +be prepared to recompile patterns from their sources, in order to be immune to +PCRE2 upgrades. +. +. +.SH "SECURITY CONCERNS" +.rs +.sp +The facility for saving and restoring compiled patterns is intended for use +within individual applications. As such, the data supplied to +\fBpcre2_serialize_decode()\fP is expected to be trusted data, not data from +arbitrary external sources. There is only some simple consistency checking, not +complete validation of what is being re-loaded. Corrupted data may cause +undefined results. For example, if the length field of a pattern in the +serialized data is corrupted, the deserializing code may read beyond the end of +the byte stream that is passed to it. +. +. +.SH "SAVING COMPILED PATTERNS" +.rs +.sp +Before compiled patterns can be saved they must be serialized, which in PCRE2 +means converting the pattern to a stream of bytes. A single byte stream may +contain any number of compiled patterns, but they must all use the same +character tables. A single copy of the tables is included in the byte stream +(its size is 1088 bytes). For more details of character tables, see the +.\" HTML +.\" +section on locale support +.\" +in the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +.P +The function \fBpcre2_serialize_encode()\fP creates a serialized byte stream +from a list of compiled patterns. Its first two arguments specify the list, +being a pointer to a vector of pointers to compiled patterns, and the length of +the vector. The third and fourth arguments point to variables which are set to +point to the created byte stream and its length, respectively. The final +argument is a pointer to a general context, which can be used to specify custom +memory management functions. If this argument is NULL, \fBmalloc()\fP is used +to obtain memory for the byte stream.
The yield of the function is the number +of serialized patterns, or one of the following negative error codes: +.sp + PCRE2_ERROR_BADDATA the number of patterns is zero or less + PCRE2_ERROR_BADMAGIC mismatch of id bytes in one of the patterns + PCRE2_ERROR_MEMORY memory allocation failed + PCRE2_ERROR_MIXEDTABLES the patterns do not all use the same tables + PCRE2_ERROR_NULL the 1st, 3rd, or 4th argument is NULL +.sp +PCRE2_ERROR_BADMAGIC means either that a pattern's code has been corrupted, or +that a slot in the vector does not point to a compiled pattern. +.P +Once a set of patterns has been serialized you can save the data in any +appropriate manner. Here is sample code that compiles two patterns and writes +them to a file. It assumes that the variable \fIfd\fP refers to a file that is +open for output. The error checking that should be present in a real +application has been omitted for simplicity. +.sp + int errorcode; + uint8_t *bytes; + PCRE2_SIZE erroroffset; + PCRE2_SIZE bytescount; + pcre2_code *list_of_codes[2]; + list_of_codes[0] = pcre2_compile("first pattern", + PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL); + list_of_codes[1] = pcre2_compile("second pattern", + PCRE2_ZERO_TERMINATED, 0, &errorcode, &erroroffset, NULL); + errorcode = pcre2_serialize_encode(list_of_codes, 2, &bytes, + &bytescount, NULL); + errorcode = fwrite(bytes, 1, bytescount, fd); +.sp +Note that the serialized data is binary data that may contain any of the 256 +possible byte values. On systems that make a distinction between binary and +non-binary data, be sure that the file is opened for binary output. +.P +Serializing a set of patterns leaves the original data untouched, so they can +still be used for matching. Their memory must eventually be freed in the usual +way by calling \fBpcre2_code_free()\fP. When you have finished with the byte +stream, it too must be freed by calling \fBpcre2_serialize_free()\fP. 
If this +function is called with a NULL argument, it returns immediately without doing +anything. +. +. +.SH "RE-USING PRECOMPILED PATTERNS" +.rs +.sp +In order to re-use a set of saved patterns you must first make the serialized +byte stream available in main memory (for example, by reading from a file). The +management of this memory block is up to the application. You can use the +\fBpcre2_serialize_get_number_of_codes()\fP function to find out how many +compiled patterns are in the serialized data without actually decoding the +patterns: +.sp + uint8_t *bytes = ; + int32_t number_of_codes = pcre2_serialize_get_number_of_codes(bytes); +.sp +The \fBpcre2_serialize_decode()\fP function reads a byte stream and recreates +the compiled patterns in new memory blocks, setting pointers to them in a +vector. The first two arguments are a pointer to a suitable vector and its +length, and the third argument points to a byte stream. The final argument is a +pointer to a general context, which can be used to specify custom memory +management functions for the decoded patterns. If this argument is NULL, +\fBmalloc()\fP and \fBfree()\fP are used. After deserialization, the byte +stream is no longer needed and can be discarded. +.sp + pcre2_code *list_of_codes[2]; + uint8_t *bytes = ; + int32_t number_of_codes = + pcre2_serialize_decode(list_of_codes, 2, bytes, NULL); +.sp +If the vector is not large enough for all the patterns in the byte stream, it +is filled with those that fit, and the remainder are ignored.
The yield of the +function is the number of decoded patterns, or one of the following negative +error codes: +.sp + PCRE2_ERROR_BADDATA second argument is zero or less + PCRE2_ERROR_BADMAGIC mismatch of id bytes in the data + PCRE2_ERROR_BADMODE mismatch of code unit size or PCRE2 version + PCRE2_ERROR_BADSERIALIZEDDATA other sanity check failure + PCRE2_ERROR_MEMORY memory allocation failed + PCRE2_ERROR_NULL first or third argument is NULL +.sp +PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled +on a system with different endianness. +.P +Decoded patterns can be used for matching in the usual way, and must be freed +by calling \fBpcre2_code_free()\fP. However, be aware that there is a potential +race issue if you are using multiple patterns that were decoded from a single +byte stream in a multithreaded application. A single copy of the character +tables is used by all the decoded patterns and a reference count is used to +arrange for its memory to be automatically freed when the last pattern is +freed, but there is no locking on this reference count. Therefore, if you want +to call \fBpcre2_code_free()\fP for these patterns in different threads, you +must arrange your own locking, and ensure that \fBpcre2_code_free()\fP cannot +be called by two threads at the same time. +.P +If a pattern was processed by \fBpcre2_jit_compile()\fP before being +serialized, the JIT data is discarded and so is no longer available after a +save/restore cycle. You can, however, process a restored pattern with +\fBpcre2_jit_compile()\fP if you wish. +. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 27 June 2018 +Copyright (c) 1997-2018 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2syntax.3 b/src/pcre2/doc/pcre2syntax.3 new file mode 100644 index 00000000..70764620 --- /dev/null +++ b/src/pcre2/doc/pcre2syntax.3 @@ -0,0 +1,681 @@ +.TH PCRE2SYNTAX 3 "28 December 2019" "PCRE2 10.35" +.SH NAME +PCRE2 - Perl-compatible regular expressions (revised API) +.SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY" +.rs +.sp +The full syntax and semantics of the regular expressions that are supported by +PCRE2 are described in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. This document contains a quick-reference summary of the syntax. +. +. +.SH "QUOTING" +.rs +.sp + \ex where x is non-alphanumeric is a literal x + \eQ...\eE treat enclosed characters as literal +. +. +.SH "ESCAPED CHARACTERS" +.rs +.sp +This table applies to ASCII and Unicode environments. An unrecognized escape +sequence causes an error. +.sp + \ea alarm, that is, the BEL character (hex 07) + \ecx "control-x", where x is any ASCII printing character + \ee escape (hex 1B) + \ef form feed (hex 0C) + \en newline (hex 0A) + \er carriage return (hex 0D) + \et tab (hex 09) + \e0dd character with octal code 0dd + \eddd character with octal code ddd, or backreference + \eo{ddd..} character with octal code ddd.. + \eN{U+hh..} character with Unicode code point hh.. (Unicode mode only) + \exhh character with hex code hh + \ex{hh..} character with hex code hh.. +.sp +If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the +following are also recognized: +.sp + \eU the character "U" + \euhhhh character with hex code hhhh + \eu{hh..} character with hex code hh.. but only for EXTRA_ALT_BSUX +.sp +When \ex is not followed by {, from zero to two hexadecimal digits are read, +but in ALT_BSUX mode \ex must be followed by two hexadecimal digits to be +recognized as a hexadecimal escape; otherwise it matches a literal "x". 
+Likewise, if \eu (in ALT_BSUX mode) is not followed by four hexadecimal digits +or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it +matches a literal "u". +.P +Note that \e0dd is always an octal code. The treatment of backslash followed by +a non-zero digit is complicated; for details see the section +.\" HTML +.\" +"Non-printing characters" +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation, where details of escape processing in EBCDIC environments are +also given. \eN{U+hh..} is synonymous with \ex{hh..} in PCRE2 but is not +supported in EBCDIC environments. Note that \eN not followed by an opening +curly bracket has a different meaning (see below). +. +. +.SH "CHARACTER TYPES" +.rs +.sp + . any character except newline; + in dotall mode, any character whatsoever + \eC one code unit, even in UTF mode (best avoided) + \ed a decimal digit + \eD a character that is not a decimal digit + \eh a horizontal white space character + \eH a character that is not a horizontal white space character + \eN a character that is not a newline + \ep{\fIxx\fP} a character with the \fIxx\fP property + \eP{\fIxx\fP} a character without the \fIxx\fP property + \eR a newline sequence + \es a white space character + \eS a character that is not a white space character + \ev a vertical white space character + \eV a character that is not a vertical white space character + \ew a "word" character + \eW a "non-word" character + \eX a Unicode extended grapheme cluster +.sp +\eC is dangerous because it may leave the current matching point in the middle +of a UTF-8 or UTF-16 character. The application can lock out the use of \eC by +setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2 +with the use of \eC permanently disabled. +.P +By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode +or in the 16-bit and 32-bit libraries. 
However, if locale-specific matching is +happening, \es and \ew may also match characters with code points in the range +128-255. If the PCRE2_UCP option is set, the behaviour of these escape +sequences is changed to use Unicode properties and they match many more +characters. +. +. +.SH "GENERAL CATEGORY PROPERTIES FOR \ep and \eP" +.rs +.sp + C Other + Cc Control + Cf Format + Cn Unassigned + Co Private use + Cs Surrogate +.sp + L Letter + Ll Lower case letter + Lm Modifier letter + Lo Other letter + Lt Title case letter + Lu Upper case letter + L& Ll, Lu, or Lt +.sp + M Mark + Mc Spacing mark + Me Enclosing mark + Mn Non-spacing mark +.sp + N Number + Nd Decimal number + Nl Letter number + No Other number +.sp + P Punctuation + Pc Connector punctuation + Pd Dash punctuation + Pe Close punctuation + Pf Final punctuation + Pi Initial punctuation + Po Other punctuation + Ps Open punctuation +.sp + S Symbol + Sc Currency symbol + Sk Modifier symbol + Sm Mathematical symbol + So Other symbol +.sp + Z Separator + Zl Line separator + Zp Paragraph separator + Zs Space separator +. +. +.SH "PCRE2 SPECIAL CATEGORY PROPERTIES FOR \ep and \eP" +.rs +.sp + Xan Alphanumeric: union of properties L and N + Xps POSIX space: property Z or tab, NL, VT, FF, CR + Xsp Perl space: property Z or tab, NL, VT, FF, CR + Xuc Universally-named character: one that can be + represented by a Universal Character Name + Xwd Perl word: property Xan or underscore +.sp +Perl and POSIX space are now the same. Perl added VT to its space character set +at release 5.18. +. +.
+.SH "SCRIPT NAMES FOR \ep AND \eP" +.rs +.sp +Adlam, +Ahom, +Anatolian_Hieroglyphs, +Arabic, +Armenian, +Avestan, +Balinese, +Bamum, +Bassa_Vah, +Batak, +Bengali, +Bhaiksuki, +Bopomofo, +Brahmi, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Carian, +Caucasian_Albanian, +Chakma, +Cham, +Cherokee, +Chorasmian, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Dives_Akuru, +Dogra, +Duployan, +Egyptian_Hieroglyphs, +Elbasan, +Elymaic, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Grantha, +Greek, +Gujarati, +Gunjala_Gondi, +Gurmukhi, +Han, +Hangul, +Hanifi_Rohingya, +Hanunoo, +Hatran, +Hebrew, +Hiragana, +Imperial_Aramaic, +Inherited, +Inscriptional_Pahlavi, +Inscriptional_Parthian, +Javanese, +Kaithi, +Kannada, +Katakana, +Kayah_Li, +Kharoshthi, +Khitan_Small_Script, +Khmer, +Khojki, +Khudawadi, +Lao, +Latin, +Lepcha, +Limbu, +Linear_A, +Linear_B, +Lisu, +Lycian, +Lydian, +Mahajani, +Makasar, +Malayalam, +Mandaic, +Manichaean, +Marchen, +Masaram_Gondi, +Medefaidrin, +Meetei_Mayek, +Mende_Kikakui, +Meroitic_Cursive, +Meroitic_Hieroglyphs, +Miao, +Modi, +Mongolian, +Mro, +Multani, +Myanmar, +Nabataean, +Nandinagari, +New_Tai_Lue, +Newa, +Nko, +Nushu, +Nyakeng_Puachue_Hmong, +Ogham, +Ol_Chiki, +Old_Hungarian, +Old_Italic, +Old_North_Arabian, +Old_Permic, +Old_Persian, +Old_Sogdian, +Old_South_Arabian, +Old_Turkic, +Oriya, +Osage, +Osmanya, +Pahawh_Hmong, +Palmyrene, +Pau_Cin_Hau, +Phags_Pa, +Phoenician, +Psalter_Pahlavi, +Rejang, +Runic, +Samaritan, +Saurashtra, +Sharada, +Shavian, +Siddham, +SignWriting, +Sinhala, +Sogdian, +Sora_Sompeng, +Soyombo, +Sundanese, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tai_Tham, +Tai_Viet, +Takri, +Tamil, +Tangut, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Tirhuta, +Ugaritic, +Vai, +Wancho, +Warang_Citi, +Yezidi, +Yi, +Zanabazar_Square. +. +. +.SH "CHARACTER CLASSES" +.rs +.sp + [...] positive character class + [^...] 
negative character class + [x-y] range (can be used for hex characters) + [[:xxx:]] positive POSIX named set + [[:^xxx:]] negative POSIX named set +.sp + alnum alphanumeric + alpha alphabetic + ascii 0-127 + blank space or tab + cntrl control character + digit decimal digit + graph printing, excluding space + lower lower case letter + print printing, including space + punct printing, excluding alphanumeric + space white space + upper upper case letter + word same as \ew + xdigit hexadecimal digit +.sp +In PCRE2, POSIX character set names recognize only ASCII characters by default, +but some of them use Unicode properties if PCRE2_UCP is set. You can use +\eQ...\eE inside a character class. +. +. +.SH "QUANTIFIERS" +.rs +.sp + ? 0 or 1, greedy + ?+ 0 or 1, possessive + ?? 0 or 1, lazy + * 0 or more, greedy + *+ 0 or more, possessive + *? 0 or more, lazy + + 1 or more, greedy + ++ 1 or more, possessive + +? 1 or more, lazy + {n} exactly n + {n,m} at least n, no more than m, greedy + {n,m}+ at least n, no more than m, possessive + {n,m}? at least n, no more than m, lazy + {n,} n or more, greedy + {n,}+ n or more, possessive + {n,}? n or more, lazy +. +. +.SH "ANCHORS AND SIMPLE ASSERTIONS" +.rs +.sp + \eb word boundary + \eB not a word boundary + ^ start of subject + also after an internal newline in multiline mode + (after any newline if PCRE2_ALT_CIRCUMFLEX is set) + \eA start of subject + $ end of subject + also before newline at end of subject + also before internal newline in multiline mode + \eZ end of subject + also before newline at end of subject + \ez end of subject + \eG first matching position in subject +. +. +.SH "REPORTED MATCH POINT SETTING" +.rs +.sp + \eK set reported start of match +.sp +\eK is honoured in positive assertions, but ignored in negative ones. +. +. +.SH "ALTERNATION" +.rs +.sp + expr|expr|expr... +. +. +.SH "CAPTURING" +.rs +.sp + (...) capture group + (?<name>...) named capture group (Perl) + (?'name'...) 
named capture group (Perl) + (?P<name>...) named capture group (Python) + (?:...) non-capture group + (?|...) non-capture group; reset group numbers for + capture groups in each alternative +.sp +In non-UTF modes, names may contain underscores and ASCII letters and digits; +in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In +both cases, a name must not start with a digit. +. +. +.SH "ATOMIC GROUPS" +.rs +.sp + (?>...) atomic non-capture group + (*atomic:...) atomic non-capture group +. +. +.SH "COMMENT" +.rs +.sp + (?#....) comment (not nestable) +. +. +.SH "OPTION SETTING" +.rs +Changes of these options within a group are automatically cancelled at the end +of the group. +.sp + (?i) caseless + (?J) allow duplicate named groups + (?m) multiline + (?n) no auto capture + (?s) single line (dotall) + (?U) default ungreedy (lazy) + (?x) extended: ignore white space except in classes + (?xx) as (?x) but also ignore space and tab in classes + (?-...) unset option(s) + (?^) unset imnsx options +.sp +Unsetting x or xx unsets both. Several options may be set at once, and a +mixture of setting and unsetting such as (?i-x) is allowed, but there may be +only one hyphen. Setting (but not unsetting) is allowed after (?^, for example +(?^in). An option setting may appear at the start of a non-capture group, for +example (?i:...). +.P +The following are recognized only at the very start of a pattern or after one +of the newline or \eR options with similar syntax. More than one of them may +appear. For the first three, d is a decimal number. 
+.sp + (*LIMIT_DEPTH=d) set the backtracking limit to d + (*LIMIT_HEAP=d) set the heap size limit to d * 1024 bytes + (*LIMIT_MATCH=d) set the match limit to d + (*NOTEMPTY) set PCRE2_NOTEMPTY when matching + (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching + (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS) + (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR) + (*NO_JIT) disable JIT optimization + (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE) + (*UTF) set appropriate UTF mode for the library in use + (*UCP) set PCRE2_UCP (use Unicode properties for \ed etc) +.sp +Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of +the limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, +not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The +application can lock out the use of (*UTF) and (*UCP) by setting the +PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time. +. +. +.SH "NEWLINE CONVENTION" +.rs +.sp +These are recognized only at the very start of the pattern or after option +settings with a similar syntax. +.sp + (*CR) carriage return only + (*LF) linefeed only + (*CRLF) carriage return followed by linefeed + (*ANYCRLF) all three of the above + (*ANY) any Unicode newline sequence + (*NUL) the NUL character (binary zero) +. +. +.SH "WHAT \eR MATCHES" +.rs +.sp +These are recognized only at the very start of the pattern or after option +setting with a similar syntax. +.sp + (*BSR_ANYCRLF) CR, LF, or CRLF + (*BSR_UNICODE) any Unicode newline sequence +. +. +.SH "LOOKAHEAD AND LOOKBEHIND ASSERTIONS" +.rs +.sp + (?=...) ) + (*pla:...) ) positive lookahead + (*positive_lookahead:...) ) +.sp + (?!...) ) + (*nla:...) ) negative lookahead + (*negative_lookahead:...) ) +.sp + (?<=...) ) + (*plb:...) ) positive lookbehind + (*positive_lookbehind:...) ) +.sp + (? 
reference by name (Perl) + \ek'name' reference by name (Perl) + \eg{name} reference by name (Perl) + \ek{name} reference by name (.NET) + (?P=name) reference by name (Python) +. +. +.SH "SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)" +.rs +.sp + (?R) recurse whole pattern + (?n) call subroutine by absolute number + (?+n) call subroutine by relative number + (?-n) call subroutine by relative number + (?&name) call subroutine by name (Perl) + (?P>name) call subroutine by name (Python) + \eg<name> call subroutine by name (Oniguruma) + \eg'name' call subroutine by name (Oniguruma) + \eg<n> call subroutine by absolute number (Oniguruma) + \eg'n' call subroutine by absolute number (Oniguruma) + \eg<+n> call subroutine by relative number (PCRE2 extension) + \eg'+n' call subroutine by relative number (PCRE2 extension) + \eg<-n> call subroutine by relative number (PCRE2 extension) + \eg'-n' call subroutine by relative number (PCRE2 extension) +. +. +.SH "CONDITIONAL PATTERNS" +.rs +.sp + (?(condition)yes-pattern) + (?(condition)yes-pattern|no-pattern) +.sp + (?(n) absolute reference condition + (?(+n) relative reference condition + (?(-n) relative reference condition + (?(<name>) named reference condition (Perl) + (?('name') named reference condition (Perl) + (?(name) named reference condition (PCRE2, deprecated) + (?(R) overall recursion condition + (?(Rn) specific numbered group recursion condition + (?(R&name) specific named group recursion condition + (?(DEFINE) define groups for reference + (?(VERSION[>]=n.m) test PCRE2 version + (?(assert) assertion condition +.sp +Note the ambiguity of (?(R) and (?(Rn) which might be named reference +conditions or recursion tests. Such a condition is interpreted as a reference +condition if the relevant named group exists. +. +. +.SH "BACKTRACKING CONTROL" +.rs +.sp +All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the +name is mandatory, for the others it is optional. (*SKIP) changes its behaviour +if :NAME is present. 
The others just set a name for passing back to the caller, +but this is not a name that (*SKIP) can see. The following act immediately they +are reached: +.sp + (*ACCEPT) force successful match + (*FAIL) force backtrack; synonym (*F) + (*MARK:NAME) set name to be passed back; synonym (*:NAME) +.sp +The following act only when a subsequent match failure causes a backtrack to +reach them. They all force a match failure, but they differ in what happens +afterwards. Those that advance the start-of-match point do so only if the +pattern is not anchored. +.sp + (*COMMIT) overall failure, no advance of starting point + (*PRUNE) advance to next starting character + (*SKIP) advance to current matching position + (*SKIP:NAME) advance to position corresponding to an earlier + (*MARK:NAME); if not found, the (*SKIP) is ignored + (*THEN) local failure, backtrack to next alternation +.sp +The effect of one of these verbs in a group called as a subroutine is confined +to the subroutine call. +. +. +.SH "CALLOUTS" +.rs +.sp + (?C) callout (assumed number 0) + (?Cn) callout with numerical data n + (?C"text") callout with string data +.sp +The allowed string delimiters are ` ' " ^ % # $ (which are the same for the +start and the end), and the starting delimiter { matched with the ending +delimiter }. To encode the ending delimiter within the string, double it. +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2pattern\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3), +\fBpcre2matching\fP(3), \fBpcre2\fP(3). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 28 December 2019 +Copyright (c) 1997-2019 University of Cambridge. 
+.fi diff --git a/src/pcre2/doc/pcre2test.1 b/src/pcre2/doc/pcre2test.1 new file mode 100644 index 00000000..627f95a6 --- /dev/null +++ b/src/pcre2/doc/pcre2test.1 @@ -0,0 +1,2110 @@ +.TH PCRE2TEST 1 "28 April 2021" "PCRE 10.37" +.SH NAME +pcre2test - a program for testing Perl-compatible regular expressions. +.SH SYNOPSIS +.rs +.sp +.B pcre2test "[options] [input file [output file]]" +.sp +\fBpcre2test\fP is a test program for the PCRE2 regular expression libraries, +but it can also be used for experimenting with regular expressions. This +document describes the features of the test program; for details of the regular +expressions themselves, see the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. For details of the PCRE2 library function calls and their +options, see the +.\" HREF +\fBpcre2api\fP +.\" +documentation. +.P +The input for \fBpcre2test\fP is a sequence of regular expression patterns and +subject strings to be matched. There are also command lines for setting +defaults and controlling some special actions. The output shows the result of +each match attempt. Modifiers on external or internal command lines, the +patterns, and the subject lines specify PCRE2 function options, control how the +subject is processed, and what output is produced. +.P +As the original fairly simple PCRE library evolved, it acquired many different +features, and as a result, the original \fBpcretest\fP program ended up with a +lot of options in a messy, arcane syntax for testing all the features. The +move to the new PCRE2 API provided an opportunity to re-implement the test +program as \fBpcre2test\fP, with a cleaner modifier syntax. Nevertheless, there +are still many obscure modifiers, some of which are specifically designed for +use in conjunction with the test script and data files that are distributed as +part of PCRE2. 
All the modifiers are documented here, some without much +justification, but many of them are unlikely to be of use except when testing +the libraries. +. +. +.SH "PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES" +.rs +.sp +Different versions of the PCRE2 library can be built to support character +strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or +all three of these libraries may be simultaneously installed. The +\fBpcre2test\fP program can be used to test all the libraries. However, its own +input and output are always in 8-bit format. When testing the 16-bit or 32-bit +libraries, patterns and subject strings are converted to 16-bit or 32-bit +format before being passed to the library functions. Results are converted back +to 8-bit code units for output. +.P +In the rest of this document, the names of library functions and structures +are given in generic form, for example, \fBpcre2_compile()\fP. The actual +names used in the libraries have a suffix _8, _16, or _32, as appropriate. +. +. +.\" HTML +.SH "INPUT ENCODING" +.rs +.sp +Input to \fBpcre2test\fP is processed line by line, either by calling the C +library's \fBfgets()\fP function, or via the \fBlibreadline\fP library. In some +Windows environments character 26 (hex 1A) causes an immediate end of file, and +no further data is read, so this character should be avoided unless you really +want that action. +.P +The input is processed using C's string functions, so must not +contain binary zeros, even though in Unix-like environments, \fBfgets()\fP +treats any bytes other than newline as data characters. An error is generated +if a binary zero is encountered. By default subject lines are processed for +backslash escapes, which makes it possible to include any data value in strings +that are passed to the library for matching. 
For patterns, there is a facility +for specifying some or all of the 8-bit input characters as hexadecimal pairs, +which makes it possible to include binary zeros. +. +. +.SS "Input for the 16-bit and 32-bit libraries" +.rs +.sp +When testing the 16-bit or 32-bit libraries, there is a need to be able to +generate character code points greater than 255 in the strings that are passed +to the library. For subject lines, backslash escapes can be used. In addition, +when the \fButf\fP modifier (see +.\" HTML +.\" +"Setting compilation options" +.\" +below) is set, the pattern and any following subject lines are interpreted as +UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate. +.P +For non-UTF testing of wide characters, the \fButf8_input\fP modifier can be +used. This is mutually exclusive with \fButf\fP, and is allowed only in 16-bit +or 32-bit mode. It causes the pattern and following subject lines to be treated +as UTF-8 according to the original definition (RFC 2279), which allows for +character values up to 0x7fffffff. Each character is placed in one 16-bit or +32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error +to occur). +.P +UTF-8 (in its original definition) is not capable of encoding values greater +than 0x7fffffff, but such values can be handled by the 32-bit library. When +testing this library in non-UTF mode with \fButf8_input\fP set, if any +character is preceded by the byte 0xff (which is an invalid byte in UTF-8) +0x80000000 is added to the character's value. This is the only way of passing +such code points in a pattern string. For subject strings, using an escape +sequence is preferable. +. +. +.SH "COMMAND LINE OPTIONS" +.rs +.TP 10 +\fB-8\fP +If the 8-bit library has been built, this option causes it to be used (this is +the default). If the 8-bit library has not been built, this option causes an +error. +.TP 10 +\fB-16\fP +If the 16-bit library has been built, this option causes it to be used. 
If only +the 16-bit library has been built, this is the default. If the 16-bit library +has not been built, this option causes an error. +.TP 10 +\fB-32\fP +If the 32-bit library has been built, this option causes it to be used. If only +the 32-bit library has been built, this is the default. If the 32-bit library +has not been built, this option causes an error. +.TP 10 +\fB-ac\fP +Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert +automatic callouts into every pattern that is compiled. +.TP 10 +\fB-AC\fP +As for \fB-ac\fP, but in addition behave as if each subject line has the +\fBcallout_extra\fP modifier, that is, show additional information from +callouts. +.TP 10 +\fB-b\fP +Behave as if each pattern has the \fBfullbincode\fP modifier; the full +internal binary form of the pattern is output after compilation. +.TP 10 +\fB-C\fP +Output the version number of the PCRE2 library, and all available information +about the optional features that are included, and then exit with zero exit +code. All other options are ignored. If both -C and -LM are present, whichever +is first is recognized. +.TP 10 +\fB-C\fP \fIoption\fP +Output information about a specific build-time option, then exit. This +functionality is intended for use in scripts such as \fBRunTest\fP. 
The +following options output the value and set the exit code as indicated: +.sp + ebcdic-nl the code for LF (= NL) in an EBCDIC environment: + 0x15 or 0x25 + 0 if used in an ASCII environment + exit code is always 0 + linksize the configured internal link size (2, 3, or 4) + exit code is set to the link size + newline the default newline setting: + CR, LF, CRLF, ANYCRLF, ANY, or NUL + exit code is always 0 + bsr the default setting for what \eR matches: + ANYCRLF or ANY + exit code is always 0 +.sp +The following options output 1 for true or 0 for false, and set the exit code +to the same value: +.sp + backslash-C \eC is supported (not locked out) + ebcdic compiled for an EBCDIC environment + jit just-in-time support is available + pcre2-16 the 16-bit library was built + pcre2-32 the 32-bit library was built + pcre2-8 the 8-bit library was built + unicode Unicode support is available +.sp +If an unknown option is given, an error message is output; the exit code is 0. +.TP 10 +\fB-d\fP +Behave as if each pattern has the \fBdebug\fP modifier; the internal +form and information about the compiled pattern is output after compilation; +\fB-d\fP is equivalent to \fB-b -i\fP. +.TP 10 +\fB-dfa\fP +Behave as if each subject line has the \fBdfa\fP modifier; matching is done +using the \fBpcre2_dfa_match()\fP function instead of the default +\fBpcre2_match()\fP. +.TP 10 +\fB-error\fP \fInumber[,number,...]\fP +Call \fBpcre2_get_error_message()\fP for each of the error numbers in the +comma-separated list, display the resulting messages on the standard output, +then exit with zero exit code. The numbers may be positive or negative. This is +a convenience facility for PCRE2 maintainers. +.TP 10 +\fB-help\fP +Output a brief summary of these options and then exit. +.TP 10 +\fB-i\fP +Behave as if each pattern has the \fBinfo\fP modifier; information about the +compiled pattern is given after compilation. 
+.TP 10 +\fB-jit\fP +Behave as if each pattern line has the \fBjit\fP modifier; after successful +compilation, each pattern is passed to the just-in-time compiler, if available. +.TP 10 +\fB-jitfast\fP +Behave as if each pattern line has the \fBjitfast\fP modifier; after +successful compilation, each pattern is passed to the just-in-time compiler, if +available, and each subject line is passed directly to the JIT matcher via its +"fast path". +.TP 10 +\fB-jitverify\fP +Behave as if each pattern line has the \fBjitverify\fP modifier; after +successful compilation, each pattern is passed to the just-in-time compiler, if +available, and the use of JIT for matching is verified. +.TP 10 +\fB-LM\fP +List modifiers: write a list of available pattern and subject modifiers to the +standard output, then exit with zero exit code. All other options are ignored. +If both -C and -LM are present, whichever is first is recognized. +.TP 10 +\fB-pattern\fP \fImodifier-list\fP +Behave as if each pattern line contains the given modifiers. +.TP 10 +\fB-q\fP +Do not output the version number of \fBpcre2test\fP at the start of execution. +.TP 10 +\fB-S\fP \fIsize\fP +On Unix-like systems, set the size of the run-time stack to \fIsize\fP +mebibytes (units of 1024*1024 bytes). +.TP 10 +\fB-subject\fP \fImodifier-list\fP +Behave as if each subject line contains the given modifiers. +.TP 10 +\fB-t\fP +Run each compile and match many times with a timer, and output the resulting +times per compile or match. When JIT is used, separate times are given for the +initial compile and the JIT compile. You can control the number of iterations +that are used for timing by following \fB-t\fP with a number (as a separate +item on the command line). For example, "-t 1000" iterates 1000 times. The +default is to iterate 500,000 times. +.TP 10 +\fB-tm\fP +This is like \fB-t\fP except that it times only the matching phase, not the +compile phase. 
+.TP 10 +\fB-T\fP \fB-TM\fP +These behave like \fB-t\fP and \fB-tm\fP, but in addition, at the end of a run, +the total times for all compiles and matches are output. +.TP 10 +\fB-version\fP +Output the PCRE2 version number and then exit. +. +. +.SH "DESCRIPTION" +.rs +.sp +If \fBpcre2test\fP is given two filename arguments, it reads from the first and +writes to the second. If the first name is "-", input is taken from the +standard input. If \fBpcre2test\fP is given only one argument, it reads from +that file and writes to stdout. Otherwise, it reads from stdin and writes to +stdout. +.P +When \fBpcre2test\fP is built, a configuration option can specify that it +should be linked with the \fBlibreadline\fP or \fBlibedit\fP library. When this +is done, if the input is from a terminal, it is read using the \fBreadline()\fP +function. This provides line-editing and history facilities. The output from +the \fB-help\fP option states whether or not \fBreadline()\fP will be used. +.P +The program handles any number of tests, each of which consists of a set of +input lines. Each set starts with a regular expression pattern, followed by any +number of subject lines to be matched against that pattern. In between sets of +test data, command lines that begin with # may appear. This file format, with +some restrictions, can also be processed by the \fBperltest.sh\fP script that +is distributed with PCRE2 as a means of checking that the behaviour of PCRE2 +and Perl is the same. For a specification of \fBperltest.sh\fP, see the +comments near its beginning. See also the #perltest command below. +.P +When the input is a terminal, \fBpcre2test\fP prompts for each line of input, +using "re>" to prompt for regular expression patterns, and "data>" to prompt +for subject lines. Command lines starting with # can be entered only in +response to the "re>" prompt. +.P +Each subject line is matched separately and independently. 
If you want to do +multi-line matches, you have to use the \en escape sequence (or \er or \er\en, +etc., depending on the newline setting) in a single line of input to encode the +newline sequences. There is no limit on the length of subject lines; the input +buffer is automatically extended if it is too small. There are replication +features that make it possible to generate long repetitive pattern or subject +lines without having to supply them explicitly. +.P +An empty line or the end of the file signals the end of the subject lines for a +test, at which point a new pattern or command line is expected if there is +still input to be read. +. +. +.SH "COMMAND LINES" +.rs +.sp +In between sets of test data, a line that begins with # is interpreted as a +command line. If the first character is followed by white space or an +exclamation mark, the line is treated as a comment, and ignored. Otherwise, the +following commands are recognized: +.sp + #forbid_utf +.sp +Subsequent patterns automatically have the PCRE2_NEVER_UTF and PCRE2_NEVER_UCP +options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and +the use of (*UTF) and (*UCP) at the start of patterns. This command also forces +an error if a subsequent pattern contains any occurrences of \eP, \ep, or \eX, +which are still supported when PCRE2_UTF is not set, but which require Unicode +property support to be included in the library. +.P +This is a trigger guard that is used in test files to ensure that UTF or +Unicode property tests are not accidentally added to files that are used when +Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and +PCRE2_NEVER_UCP as a default can also be obtained by the use of \fB#pattern\fP; +the difference is that \fB#forbid_utf\fP cannot be unset, and the automatic +options are not displayed in pattern information, to avoid cluttering up test +output. 
+.sp + #load <filename> +.sp +This command is used to load a set of precompiled patterns from a file, as +described in the section entitled "Saving and restoring compiled patterns" +.\" HTML +.\" +below. +.\" +.sp + #loadtables <filename> +.sp +This command is used to load a set of binary character tables that can be +accessed by the tables=3 qualifier. Such tables can be created by the +\fBpcre2_dftables\fP program with the -b option. +.sp + #newline_default [<newline-list>] +.sp +When PCRE2 is built, a default newline convention can be specified. This +determines which characters and/or character pairs are recognized as indicating +a newline in a pattern or subject string. The default can be overridden when a +pattern is compiled. The standard test files contain tests of various newline +conventions, but the majority of the tests expect a single linefeed to be +recognized as a newline by default. Without special action the tests would fail +when PCRE2 is compiled with either CR or CRLF as the default newline. +.P +The #newline_default command specifies a list of newline types that are +acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF, +ANY, or NUL (in upper or lower case), for example: +.sp + #newline_default LF Any anyCRLF +.sp +If the default newline is in the list, this command has no effect. Otherwise, +except when testing the POSIX API, a \fBnewline\fP modifier that specifies the +first newline convention in the list (LF in the above example) is added to any +pattern that does not already have a \fBnewline\fP modifier. If the newline +list is empty, the feature is turned off. This command is present in a number +of the standard test input files. +.P +When the POSIX API is being tested there is no way to override the default +newline convention, though it is possible to set the newline convention from +within the pattern. 
A warning is given if the \fBposix\fP or \fBposix_nosub\fP +modifier is used when \fB#newline_default\fP would set a default for the +non-POSIX API. 
+.sp + #pattern <modifier-list> +.sp +This command sets a default modifier list that applies to all subsequent +patterns. Modifiers on a pattern can change these settings. +.sp + #perltest +.sp +This line is used in test files that can also be processed by \fBperltest.sh\fP +to confirm that Perl gives the same results as PCRE2. Subsequent tests are +checked for the use of \fBpcre2test\fP features that are incompatible with the +\fBperltest.sh\fP script. +.P +Patterns must use '/' as their delimiter, and only certain modifiers are +supported. Comment lines, #pattern commands, and #subject commands that set or +unset "mark" are recognized and acted on. The #perltest, #forbid_utf, and +#newline_default commands, which are needed in the relevant pcre2test files, +are silently ignored. All other command lines are ignored, but give a warning +message. The \fB#perltest\fP command helps detect tests that are accidentally +put in the wrong file or use the wrong delimiter. For more details of the +\fBperltest.sh\fP script see the comments it contains. +.sp + #pop [<modifiers>] + #popcopy [<modifiers>] +.sp +These commands are used to manipulate the stack of compiled patterns, as +described in the section entitled "Saving and restoring compiled patterns" +.\" HTML +.\" +below. +.\" +.sp + #save <filename> +.sp +This command is used to save a set of compiled patterns to a file, as described +in the section entitled "Saving and restoring compiled patterns" +.\" HTML +.\" +below. +.\" +.sp + #subject <modifier-list> +.sp +This command sets a default modifier list that applies to all subsequent +subject lines. Modifiers on a subject line can change these settings. +. +. +.SH "MODIFIER SYNTAX" +.rs +.sp +Modifier lists are used with both pattern and subject lines. Items in a list +are separated by commas followed by optional white space. Trailing whitespace +in a modifier list is ignored. 
Some modifiers may be given for both patterns +and subject lines, whereas others are valid only for one or the other. 
Each +modifier has a long name, for example "anchored", and some of them must be +followed by an equals sign and a value, for example, "offset=12". Values cannot +contain comma characters, but may contain spaces. Modifiers that do not take +values may be preceded by a minus sign to turn off a previous setting. +.P +A few of the more common modifiers can also be specified as single letters, for +example "i" for "caseless". In documentation, following the Perl convention, +these are written with a slash ("the /i modifier") for clarity. Abbreviated +modifiers must all be concatenated in the first item of a modifier list. If the +first item is not recognized as a long modifier name, it is interpreted as a +sequence of these abbreviations. For example: +.sp + /abc/ig,newline=cr,jit=3 +.sp +This is a pattern line whose modifier list starts with two one-letter modifiers +(/i and /g). The lower-case abbreviated modifiers are the same as used in Perl. +. +. +.SH "PATTERN SYNTAX" +.rs +.sp +A pattern line must start with one of the following characters (common symbols, +excluding pattern meta-characters): +.sp + / ! " ' ` - = _ : ; , % & @ ~ +.sp +This is interpreted as the pattern's delimiter. A regular expression may be +continued over several input lines, in which case the newline characters are +included within it. It is possible to include the delimiter within the pattern +by escaping it with a backslash, for example +.sp + /abc\e/def/ +.sp +If you do this, the escape and the delimiter form part of the pattern, but +since the delimiters are all non-alphanumeric, this does not affect its +interpretation. If the terminating delimiter is immediately followed by a +backslash, for example, +.sp + /abc/\e +.sp +then a backslash is added to the end of the pattern. 
This is done to provide a +way of testing the error condition that arises if a pattern finishes with a +backslash, because +.sp + /abc\e/ +.sp +is interpreted as the first line of a pattern that starts with "abc/", causing +pcre2test to read the next line as a continuation of the regular expression. +.P +A pattern can be followed by a modifier list (details below). +. +. +.SH "SUBJECT LINE SYNTAX" +.rs +.sp +Before each subject line is passed to \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP, leading and trailing white space is removed, and the +line is scanned for backslash escapes, unless the \fBsubject_literal\fP +modifier was set for the pattern. The following provide a means of encoding +non-printing characters in a visible way: +.sp + \ea alarm (BEL, \ex07) + \eb backspace (\ex08) + \ee escape (\ex1b) + \ef form feed (\ex0c) + \en newline (\ex0a) + \er carriage return (\ex0d) + \et tab (\ex09) + \ev vertical tab (\ex0b) + \ennn octal character (up to 3 octal digits); always + a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode + \eo{dd...} octal character (any number of octal digits) + \exhh hexadecimal byte (up to 2 hex digits) + \ex{hh...} hexadecimal character (any number of hex digits) +.sp +The use of \ex{hh...} is not dependent on the use of the \fButf\fP modifier on +the pattern. It is recognized always. There may be any number of hexadecimal +digits inside the braces; invalid values provoke error messages. +.P +Note that \exhh specifies one byte rather than one character in UTF-8 mode; +this makes it possible to construct invalid UTF-8 sequences for testing +purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in +UTF-8 mode, generating more than one byte if the value is greater than 127. +When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte +for values less than 256, and causes an error for greater values. +.P +In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. 
This makes it +possible to construct invalid UTF-16 sequences for testing purposes. +.P +In UTF-32 mode, all 4- to 8-digit \ex{...} values are accepted. This makes it +possible to construct invalid UTF-32 sequences for testing purposes. +.P +There is a special backslash sequence that specifies replication of one or more +characters: +.sp + \e[<characters>]{<count>} +.sp +This makes it possible to test long strings without having to provide them as +part of the file. For example: +.sp + \e[abc]{4} +.sp +is converted to "abcabcabcabc". This feature does not support nesting. To +include a closing square bracket in the characters, code it as \ex5D. +.P +A backslash followed by an equals sign marks the end of the subject string and +the start of a modifier list. For example: +.sp + abc\e=notbol,notempty +.sp +If the subject string is empty and \e= is followed by whitespace, the line is +treated as a comment line, and is not used for matching. For example: +.sp + \e= This is a comment. + abc\e= This is an invalid modifier list. +.sp +A backslash followed by any other non-alphanumeric character just escapes that +character. A backslash followed by anything else causes an error. However, if +the very last character in the line is a backslash (and there is no modifier +list), it is ignored. This gives a way of passing an empty line as data, since +a real empty line terminates the data input. +.P +If the \fBsubject_literal\fP modifier is set for a pattern, all subject lines +that follow are treated as literals, with no special treatment of backslashes. +No replication is possible, and any subject modifiers must be set as defaults +by a \fB#subject\fP command. +. +. +.SH "PATTERN MODIFIERS" +.rs +.sp +There are several types of modifier that can appear in pattern lines. Except +where noted below, they may also be used in \fB#pattern\fP commands. A +pattern's modifier list can add to or override default modifiers that were set +by a previous \fB#pattern\fP command. +. +. 
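The subject-line replication sequence described under "SUBJECT LINE SYNTAX" above can be illustrated with a minimal Python sketch. The function name expand_replication is hypothetical, and the sketch models only the documented backslash, bracketed characters, braced count form. It is not how pcre2test itself is implemented, and it leaves every other backslash escape untouched.

```python
import re

# Matches a backslash, then [characters], then {count}. The character run may
# not contain ']' (the text says to code a closing bracket as \x5D), so
# nesting is not handled, matching the documented restriction.
_REPLICATION = re.compile(r"\\\[([^\]]+)\]\{(\d+)\}")


def expand_replication(subject: str) -> str:
    """Expand each replication item into its repeated characters."""
    return _REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), subject)


if __name__ == "__main__":
    print(expand_replication(r"\[abc]{4}"))  # abcabcabcabc
```

Text outside a replication item passes through unchanged, so a subject can mix literal characters with one or more replicated runs.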
+.\" HTML +.SS "Setting compilation options" +.rs +.sp +The following modifiers set options for \fBpcre2_compile()\fP. Most of them set +bits in the options argument of that function, but those whose names start with +PCRE2_EXTRA are additional options that are set in the compile context. For the +main options, there are some single-letter abbreviations that are the same as +Perl options. There is special handling for /x: if a second x is present, +PCRE2_EXTENDED is converted into PCRE2_EXTENDED_MORE as in Perl. A third +appearance adds PCRE2_EXTENDED as well, though this makes no difference to the +way \fBpcre2_compile()\fP behaves. See +.\" HREF +\fBpcre2api\fP +.\" +for a description of the effects of these options. +.sp + allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS + allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES + alt_bsux set PCRE2_ALT_BSUX + alt_circumflex set PCRE2_ALT_CIRCUMFLEX + alt_verbnames set PCRE2_ALT_VERBNAMES + anchored set PCRE2_ANCHORED + auto_callout set PCRE2_AUTO_CALLOUT + bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL + /i caseless set PCRE2_CASELESS + dollar_endonly set PCRE2_DOLLAR_ENDONLY + /s dotall set PCRE2_DOTALL + dupnames set PCRE2_DUPNAMES + endanchored set PCRE2_ENDANCHORED + escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF + /x extended set PCRE2_EXTENDED + /xx extended_more set PCRE2_EXTENDED_MORE + extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX + firstline set PCRE2_FIRSTLINE + literal set PCRE2_LITERAL + match_line set PCRE2_EXTRA_MATCH_LINE + match_invalid_utf set PCRE2_MATCH_INVALID_UTF + match_unset_backref set PCRE2_MATCH_UNSET_BACKREF + match_word set PCRE2_EXTRA_MATCH_WORD + /m multiline set PCRE2_MULTILINE + never_backslash_c set PCRE2_NEVER_BACKSLASH_C + never_ucp set PCRE2_NEVER_UCP + never_utf set PCRE2_NEVER_UTF + /n no_auto_capture set PCRE2_NO_AUTO_CAPTURE + no_auto_possess set PCRE2_NO_AUTO_POSSESS + no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR + no_start_optimize set 
PCRE2_NO_START_OPTIMIZE + no_utf_check set PCRE2_NO_UTF_CHECK + ucp set PCRE2_UCP + ungreedy set PCRE2_UNGREEDY + use_offset_limit set PCRE2_USE_OFFSET_LIMIT + utf set PCRE2_UTF +.sp +As well as turning on the PCRE2_UTF option, the \fButf\fP modifier causes all +non-printing characters in output strings to be printed using the \ex{hh...} +notation. Otherwise, those less than 0x100 are output in hex without the curly +brackets. Setting \fButf\fP in 16-bit or 32-bit mode also causes pattern and +subject strings to be translated to UTF-16 or UTF-32, respectively, before +being passed to library functions. +. +. +.\" HTML +.SS "Setting compilation controls" +.rs +.sp +The following modifiers affect the compilation process or request information +about the pattern. There are single-letter abbreviations for some that are +heavily used in the test files. +.sp + bsr=[anycrlf|unicode] specify \eR handling + /B bincode show binary code without lengths + callout_info show callout information + convert= request foreign pattern conversion + convert_glob_escape=c set glob escape character + convert_glob_separator=c set glob separator character + convert_length set convert buffer length + debug same as info,fullbincode + framesize show matching frame size + fullbincode show binary code with lengths + /I info show info about compiled pattern + hex unquoted characters are hexadecimal + jit[=] use JIT + jitfast use JIT fast path + jitverify verify JIT use + locale= use this locale + max_pattern_length= set the maximum pattern length + memory show memory used + newline= set newline type + null_context compile with a NULL context + parens_nest_limit= set maximum parentheses depth + posix use the POSIX API + posix_nosub use the POSIX API with REG_NOSUB + push push compiled pattern onto the stack + pushcopy push a copy onto the stack + stackguard= test the stackguard feature + subject_literal treat all subject lines as literal + tables=[0|1|2|3] select internal tables + use_length do 
not zero-terminate the pattern + utf8_input treat input as UTF-8 +.sp +The effects of these modifiers are described in the following sections. +. +. +.SS "Newline and \eR handling" +.rs +.sp +The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is +set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode", +\eR matches any Unicode newline sequence. The default can be specified when +PCRE2 is built; if it is not, the default is set to Unicode. +.P +The \fBnewline\fP modifier specifies which characters are to be interpreted as +newlines, both in the pattern and in subject lines. The type must be one of CR, +LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case). +. +. +.SS "Information about a pattern" +.rs +.sp +The \fBdebug\fP modifier is a shorthand for \fBinfo,fullbincode\fP, requesting +all available information. +.P +The \fBbincode\fP modifier causes a representation of the compiled code to be +output after compilation. This information does not contain length and offset +values, which ensures that the same output is generated for different internal +link sizes and different code unit widths. By using \fBbincode\fP, the same +regression tests can be used in different environments. +.P +The \fBfullbincode\fP modifier, by contrast, \fIdoes\fP include length and +offset values. This is used in a few special tests that run only for specific +code unit widths and link sizes, and is also useful for one-off tests. +.P +The \fBinfo\fP modifier requests information about the compiled pattern +(whether it is anchored, has a fixed first character, and so on). The +information is obtained from the \fBpcre2_pattern_info()\fP function. 
Here are +some typical examples: +.sp + re> /(?i)(^a|^b)/m,info + Capture group count = 1 + Compile options: multiline + Overall options: caseless multiline + First code unit at start or follows newline + Subject length lower bound = 1 +.sp + re> /(?i)abc/info + Capture group count = 0 + Compile options: + Overall options: caseless + First code unit = 'a' (caseless) + Last code unit = 'c' (caseless) + Subject length lower bound = 3 +.sp +"Compile options" are those specified by modifiers; "overall options" have +added options that are taken or deduced from the pattern. If both sets of +options are the same, just a single "options" line is output; if there are no +options, the line is omitted. "First code unit" is where any match must start; +if there is more than one they are listed as "starting code units". "Last code +unit" is the last literal code unit that must be present in any match. This is +not necessarily the last character. These lines are omitted if no starting or +ending code units are recorded. The subject length line is omitted when +\fBno_start_optimize\fP is set because the minimum length is not calculated +when it can never be used. +.P +The \fBframesize\fP modifier shows the size, in bytes, of the storage frames +used by \fBpcre2_match()\fP for handling backtracking. The size depends on the +number of capturing parentheses in the pattern. +.P +The \fBcallout_info\fP modifier requests information about all the callouts in +the pattern. A list of them is output at the end of any other information that +is requested. For each callout, either its number or string is given, followed +by the item that follows it in the pattern. +. +. +.SS "Passing a NULL context" +.rs +.sp +Normally, \fBpcre2test\fP passes a context block to \fBpcre2_compile()\fP. If +the \fBnull_context\fP modifier is set, however, NULL is passed. This is for +testing that \fBpcre2_compile()\fP behaves correctly in this case (it uses +default values). +. +. 
+.SS "Specifying pattern characters in hexadecimal" +.rs +.sp +The \fBhex\fP modifier specifies that the characters of the pattern, except for +substrings enclosed in single or double quotes, are to be interpreted as pairs +of hexadecimal digits. This feature is provided as a way of creating patterns +that contain binary zeros and other non-printing characters. White space is +permitted between pairs of digits. For example, this pattern contains three +characters: +.sp + /ab 32 59/hex +.sp +Parts of such a pattern are taken literally if quoted. This pattern contains +nine characters, only two of which are specified in hexadecimal: +.sp + /ab "literal" 32/hex +.sp +Either single or double quotes may be used. There is no way of including +the delimiter within a substring. The \fBhex\fP and \fBexpand\fP modifiers are +mutually exclusive. +. +. +.SS "Specifying the pattern's length" +.rs +.sp +By default, patterns are passed to the compiling functions as zero-terminated +strings but can be passed by length instead of being zero-terminated. The +\fBuse_length\fP modifier causes this to happen. Using a length happens +automatically (whether or not \fBuse_length\fP is set) when \fBhex\fP is set, +because patterns specified in hexadecimal may contain binary zeros. +.P +If \fBhex\fP or \fBuse_length\fP is used with the POSIX wrapper API (see +.\" HTML +.\" +"Using the POSIX wrapper API" +.\" +below), the REG_PEND extension is used to pass the pattern's length. +. +. +.SS "Specifying wide characters in 16-bit and 32-bit modes" +.rs +.sp +In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and +translated to UTF-16 or UTF-32 when the \fButf\fP modifier is set. For testing +the 16-bit and 32-bit libraries in non-UTF mode, the \fButf8_input\fP modifier +can be used. It is mutually exclusive with \fButf\fP. Input lines are +interpreted as UTF-8 as a means of specifying wide characters. More details are +given in +.\" HTML +.\" +"Input encoding" +.\" +above. 
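The hex modifier described earlier in this section can be illustrated with a minimal Python sketch that turns the text between the pattern delimiters into a byte string. The function name decode_hex_pattern is hypothetical, the sketch assumes the 8-bit library (one code unit per byte), and its error handling is an assumption rather than a description of pcre2test's own messages.

```python
import string


def decode_hex_pattern(text: str) -> bytes:
    """Decode hex pairs, copying quoted substrings literally."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # white space is permitted between pairs of digits
        elif ch in "'\"":
            end = text.find(ch, i + 1)  # matching closing quote
            if end < 0:
                raise ValueError("unterminated quoted substring")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError(f"expected two hex digits at offset {i}")
            out.append(int(pair, 16))
            i += 2
    return bytes(out)


if __name__ == "__main__":
    print(len(decode_hex_pattern("ab 32 59")))         # 3
    print(len(decode_hex_pattern('ab "literal" 32')))  # 9
```

The two calls correspond to the two example patterns given for the hex modifier: three characters in the first case, and nine in the second, of which only two come from hexadecimal pairs.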
+. +. +.SS "Generating long repetitive patterns" +.rs +.sp +Some tests use long patterns that are very repetitive. Instead of creating a +very long input line for such a pattern, you can use a special repetition +feature, similar to the one described for subject lines above. If the +\fBexpand\fP modifier is present on a pattern, parts of the pattern that have +the form +.sp + \e[]{} +.sp +are expanded before the pattern is passed to \fBpcre2_compile()\fP. For +example, \e[AB]{6000} is expanded to "ABAB..." 6000 times. This construction +cannot be nested. An initial "\e[" sequence is recognized only if "]{" followed +by decimal digits and "}" is found later in the pattern. If not, the characters +remain in the pattern unaltered. The \fBexpand\fP and \fBhex\fP modifiers are +mutually exclusive. +.P +If part of an expanded pattern looks like an expansion, but is really part of +the actual pattern, unwanted expansion can be avoided by giving two values in +the quantifier. For example, \e[AB]{6000,6000} is not recognized as an +expansion item. +.P +If the \fBinfo\fP modifier is set on an expanded pattern, the result of the +expansion is included in the information that is output. +. +. +.SS "JIT compilation" +.rs +.sp +Just-in-time (JIT) compiling is a heavyweight optimization that can greatly +speed up pattern matching. See the +.\" HREF +\fBpcre2jit\fP +.\" +documentation for details. JIT compiling happens, optionally, after a pattern +has been successfully compiled into an internal form. The JIT compiler converts +this to optimized machine code. It needs to know whether the match-time options +PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because +different code is generated for the different cases. See the \fBpartial\fP +modifier in "Subject Modifiers" +.\" HTML +.\" +below +.\" +for details of how these options are specified for each match attempt. 
+.P +JIT compilation is requested by the \fBjit\fP pattern modifier, which may +optionally be followed by an equals sign and a number in the range 0 to 7. +The three bits that make up the number specify which of the three JIT operating +modes are to be compiled: +.sp + 1 compile JIT code for non-partial matching + 2 compile JIT code for soft partial matching + 4 compile JIT code for hard partial matching +.sp +The possible values for the \fBjit\fP modifier are therefore: +.sp + 0 disable JIT + 1 normal matching only + 2 soft partial matching only + 3 normal and soft partial matching + 4 hard partial matching only + 5 normal and hard partial matching + 6 soft and hard partial matching only + 7 all three modes +.sp +If no number is given, 7 is assumed. The phrase "partial matching" means a call +to \fBpcre2_match()\fP with either the PCRE2_PARTIAL_SOFT or the +PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete +match; the options enable the possibility of a partial match, but do not +require it. Note also that if you request JIT compilation only for partial +matching (for example, jit=2) but do not set a partial matching modifier +(\fBpartial_hard\fP or \fBpartial_soft\fP) on a subject line, that match will +not use JIT code because none was compiled for non-partial matching. +.P +If JIT compilation is successful, the compiled JIT code will automatically be +used when an appropriate type of match is run, except when incompatible +run-time options are specified. For more details, see the +.\" HREF +\fBpcre2jit\fP +.\" +documentation. See also the \fBjitstack\fP modifier below for a way of +setting the size of the JIT stack. +.P +If the \fBjitfast\fP modifier is specified, matching is done using the JIT +"fast path" interface, \fBpcre2_jit_match()\fP, which skips some of the sanity +checks that are done by \fBpcre2_match()\fP, and of course does not work when +JIT is not supported. If \fBjitfast\fP is specified without \fBjit\fP, jit=7 is +assumed. 
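The three-bit encoding of the jit modifier described above can be modelled with a short Python sketch. The names JitMode, parse_jit and jit_code_available are hypothetical and chosen for this illustration; the sketch only mirrors the bit values 1, 2 and 4 given in the text and does not call the PCRE2 library.

```python
from enum import IntFlag
from typing import Optional


class JitMode(IntFlag):
    COMPLETE = 1      # non-partial matching
    PARTIAL_SOFT = 2  # soft partial matching
    PARTIAL_HARD = 4  # hard partial matching


def parse_jit(value: Optional[str] = None) -> JitMode:
    """Interpret the optional number after 'jit='; 7 is assumed if absent."""
    if value is None:
        return JitMode(7)
    number = int(value)
    if not 0 <= number <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return JitMode(number)


def jit_code_available(compiled: JitMode, partial: Optional[str] = None) -> bool:
    """True if JIT code was compiled for the requested kind of match."""
    needed = {
        None: JitMode.COMPLETE,
        "soft": JitMode.PARTIAL_SOFT,
        "hard": JitMode.PARTIAL_HARD,
    }[partial]
    return bool(compiled & needed)


if __name__ == "__main__":
    print(jit_code_available(parse_jit("2")))          # False
    print(jit_code_available(parse_jit("2"), "soft"))  # True
```

The first call reproduces the case noted in the text: with jit=2 and no partial matching modifier on the subject line, no JIT code exists for the non-partial match.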
+.P +If the \fBjitverify\fP modifier is specified, information about the compiled +pattern shows whether JIT compilation was or was not successful. If +\fBjitverify\fP is specified without \fBjit\fP, jit=7 is assumed. If JIT +compilation is successful when \fBjitverify\fP is set, the text "(JIT)" is +added to the first output line after a match or non match when JIT-compiled +code was actually used in the match. +. +. +.SS "Setting a locale" +.rs +.sp +The \fBlocale\fP modifier must specify the name of a locale, for example: +.sp + /pattern/locale=fr_FR +.sp +The given locale is set, \fBpcre2_maketables()\fP is called to build a set of +character tables for the locale, and this is then passed to +\fBpcre2_compile()\fP when compiling the regular expression. The same tables +are used when matching the following subject lines. The \fBlocale\fP modifier +applies only to the pattern on which it appears, but can be given in a +\fB#pattern\fP command if a default is needed. Setting a locale and alternate +character tables are mutually exclusive. +. +. +.SS "Showing pattern memory" +.rs +.sp +The \fBmemory\fP modifier causes the size in bytes of the memory used to hold +the compiled pattern to be output. This does not include the size of the +\fBpcre2_code\fP block; it is just the actual compiled data. If the pattern is +subsequently passed to the JIT compiler, the size of the JIT compiled code is +also output. Here is an example: +.sp + re> /a(b)c/jit,memory + Memory allocation (code space): 21 + Memory allocation (JIT code): 1910 +.sp +. +. +.SS "Limiting nested parentheses" +.rs +.sp +The \fBparens_nest_limit\fP modifier sets a limit on the depth of nested +parentheses in a pattern. Breaching the limit causes a compilation error. +The default for the library is set when PCRE2 is built, but \fBpcre2test\fP +sets its own default of 220, which is required for running the standard test +suite. +. +. 
+.SS "Limiting the pattern length" +.rs +.sp +The \fBmax_pattern_length\fP modifier sets a limit, in code units, to the +length of pattern that \fBpcre2_compile()\fP will accept. Breaching the limit +causes a compilation error. The default is the largest number a PCRE2_SIZE +variable can hold (essentially unlimited). +. +. +.\" HTML +.SS "Using the POSIX wrapper API" +.rs +.sp +The \fBposix\fP and \fBposix_nosub\fP modifiers cause \fBpcre2test\fP to call +PCRE2 via the POSIX wrapper API rather than its native API. When +\fBposix_nosub\fP is used, the POSIX option REG_NOSUB is passed to +\fBregcomp()\fP. The POSIX wrapper supports only the 8-bit library. Note that +it does not imply POSIX matching semantics; for more detail see the +.\" HREF +\fBpcre2posix\fP +.\" +documentation. The following pattern modifiers set options for the +\fBregcomp()\fP function: +.sp + caseless REG_ICASE + multiline REG_NEWLINE + dotall REG_DOTALL ) + ungreedy REG_UNGREEDY ) These options are not part of + ucp REG_UCP ) the POSIX standard + utf REG_UTF8 ) +.sp +The \fBregerror_buffsize\fP modifier specifies a size for the error buffer that +is passed to \fBregerror()\fP in the event of a compilation error. For example: +.sp + /abc/posix,regerror_buffsize=20 +.sp +This provides a means of testing the behaviour of \fBregerror()\fP when the +buffer is too small for the error message. If this modifier has not been set, a +large buffer is used. +.P +The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described +below. All other modifiers are either ignored, with a warning message, or cause +an error. +.P +The pattern is passed to \fBregcomp()\fP as a zero-terminated string by +default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the +REG_PEND extension is used to pass it by length. +. +. 
+.SS "Testing the stack guard feature" +.rs +.sp +The \fBstackguard\fP modifier is used to test the use of +\fBpcre2_set_compile_recursion_guard()\fP, a function that is provided to +enable stack availability to be checked during compilation (see the +.\" HREF +\fBpcre2api\fP +.\" +documentation for details). If the number specified by the modifier is greater +than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up +callback from \fBpcre2_compile()\fP to a local function. The argument it +receives is the current nesting parenthesis depth; if this is greater than the +value given by the modifier, non-zero is returned, causing the compilation to +be aborted. +. +. +.SS "Using alternative character tables" +.rs +.sp +The value specified for the \fBtables\fP modifier must be one of the digits 0, +1, 2, or 3. It causes a specific set of built-in character tables to be passed +to \fBpcre2_compile()\fP. This is used in the PCRE2 tests to check behaviour +with different character tables. The digit specifies the tables as follows: +.sp + 0 do not pass any special character tables + 1 the default ASCII tables, as distributed in + pcre2_chartables.c.dist + 2 a set of tables defining ISO 8859 characters + 3 a set of tables loaded by the #loadtables command +.sp +In tables 2, some characters whose codes are greater than 128 are identified as +letters, digits, spaces, etc. Tables 3 can be used only after a +\fB#loadtables\fP command has loaded them from a binary file. Setting alternate +character tables and a locale are mutually exclusive. +. +. +.SS "Setting certain match controls" +.rs +.sp +The following modifiers are really subject modifiers, and are described under +"Subject Modifiers" below. However, they may be included in a pattern's +modifier list, in which case they are applied to every subject line that is +processed with that pattern. These modifiers do not affect the compilation +process. 
+.sp + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allvector show the entire ovector + allusedtext show all consulted text + altglobal alternative global matching + /g global global matching + jitstack= set size of JIT stack + mark show mark values + replace= specify a replacement string + startchar show starting character when relevant + substitute_callout use substitution callouts + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_literal use PCRE2_SUBSTITUTE_LITERAL + substitute_matched use PCRE2_SUBSTITUTE_MATCHED + substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY + substitute_skip= skip substitution + substitute_stop= skip substitution and following + substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY +.sp +These modifiers may not appear in a \fB#pattern\fP command. If you want them as +defaults, set them in a \fB#subject\fP command. +. +. +.SS "Specifying literal subject lines" +.rs +.sp +If the \fBsubject_literal\fP modifier is present on a pattern, all the subject +lines that it matches are taken as literal strings, with no interpretation of +backslashes. It is not possible to set subject modifiers on such lines, but any +that are set as defaults by a \fB#subject\fP command are recognized. +. +. +.SS "Saving a compiled pattern" +.rs +.sp +When a pattern with the \fBpush\fP modifier is successfully compiled, it is +pushed onto a stack of compiled patterns, and \fBpcre2test\fP expects the next +line to contain a new pattern (or a command) instead of a subject line. This +facility is used when saving compiled patterns to a file, as described in the +section entitled "Saving and restoring compiled patterns" +.\" HTML +.\" +below. 
+.\" +If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled +pattern is stacked, leaving the original as current, ready to match the +following input lines. This provides a way of testing the +\fBpcre2_code_copy()\fP function. +.\" +The \fBpush\fP and \fBpushcopy\fP modifiers are incompatible with +modifiers such as \fBglobal\fP that act at match time. Any that are specified +are ignored (for the stacked copy), with a warning message, except for +\fBreplace\fP, which causes an error. Note that \fBjitverify\fP, which is +allowed, does not carry through to any subsequent matching that uses a stacked +pattern. +. +. +.SS "Testing foreign pattern conversion" +.rs +.sp +The experimental foreign pattern conversion functions in PCRE2 can be tested by +setting the \fBconvert\fP modifier. Its argument is a colon-separated list of +options, which set the equivalent option for the \fBpcre2_pattern_convert()\fP +function: +.sp + glob PCRE2_CONVERT_GLOB + glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR + glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR + posix_basic PCRE2_CONVERT_POSIX_BASIC + posix_extended PCRE2_CONVERT_POSIX_EXTENDED + unset Unset all options +.sp +The "unset" value is useful for turning off a default that has been set by a +\fB#pattern\fP command. When one of these options is set, the input pattern is +passed to \fBpcre2_pattern_convert()\fP. If the conversion is successful, the +result is reflected in the output and then passed to \fBpcre2_compile()\fP. The +normal \fButf\fP and \fBno_utf_check\fP options, if set, cause the +PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be passed to +\fBpcre2_pattern_convert()\fP. +.P +By default, the conversion function is allowed to allocate a buffer for its +output. However, if the \fBconvert_length\fP modifier is set to a value greater +than zero, \fBpcre2test\fP passes a buffer of the given length. This makes it +possible to test the length check. 
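The colon-separated argument of the convert modifier described above can be sketched in Python as a mapping from keywords to the option names in the table. The function name parse_convert is hypothetical. Treating "unset" as clearing everything accumulated so far, including defaults, is an assumption made for this sketch; the text states only that it turns off a default set by a #pattern command.

```python
from typing import Iterable, Set

# Keyword-to-option mapping copied from the table in the text.
CONVERT_OPTIONS = {
    "glob": "PCRE2_CONVERT_GLOB",
    "glob_no_starstar": "PCRE2_CONVERT_GLOB_NO_STARSTAR",
    "glob_no_wild_separator": "PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR",
    "posix_basic": "PCRE2_CONVERT_POSIX_BASIC",
    "posix_extended": "PCRE2_CONVERT_POSIX_EXTENDED",
}


def parse_convert(argument: str, defaults: Iterable[str] = ()) -> Set[str]:
    """Return the set of option names selected by a convert= argument."""
    selected = set(defaults)
    for keyword in argument.split(":"):
        if keyword == "unset":
            selected.clear()  # assumption: drops defaults and earlier items
        elif keyword in CONVERT_OPTIONS:
            selected.add(CONVERT_OPTIONS[keyword])
        else:
            raise ValueError(f"unknown convert option: {keyword}")
    return selected


if __name__ == "__main__":
    print(sorted(parse_convert("glob:glob_no_starstar")))
```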
+.P +The \fBconvert_glob_escape\fP and \fBconvert_glob_separator\fP modifiers can be +used to specify the escape and separator characters for glob processing, +overriding the defaults, which are operating-system dependent. +. +. +.\" HTML +.SH "SUBJECT MODIFIERS" +.rs +.sp +The modifiers that can appear in subject lines and the \fB#subject\fP +command are of two types. +. +. +.SS "Setting match options" +.rs +.sp +The following modifiers set options for \fBpcre2_match()\fP or +\fBpcre2_dfa_match()\fP. See +.\" HREF +\fBpcre2api\fP +.\" +for a description of their effects. +.sp + anchored set PCRE2_ANCHORED + endanchored set PCRE2_ENDANCHORED + dfa_restart set PCRE2_DFA_RESTART + dfa_shortest set PCRE2_DFA_SHORTEST + no_jit set PCRE2_NO_JIT + no_utf_check set PCRE2_NO_UTF_CHECK + notbol set PCRE2_NOTBOL + notempty set PCRE2_NOTEMPTY + notempty_atstart set PCRE2_NOTEMPTY_ATSTART + noteol set PCRE2_NOTEOL + partial_hard (or ph) set PCRE2_PARTIAL_HARD + partial_soft (or ps) set PCRE2_PARTIAL_SOFT +.sp +The partial matching modifiers are provided with abbreviations because they +appear frequently in tests. +.P +If the \fBposix\fP or \fBposix_nosub\fP modifier was present on the pattern, +causing the POSIX wrapper API to be used, the only option-setting modifiers +that have any effect are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP, +causing REG_NOTBOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to +\fBregexec()\fP. The other modifiers are ignored, with a warning message. +.P +There is one additional modifier that can be used with the POSIX wrapper. It is +ignored (with a warning) if used for non-POSIX matching. +.sp + posix_startend=<n>[:<m>] +.sp +This causes the subject string to be passed to \fBregexec()\fP using the +REG_STARTEND option, which uses offsets to specify which part of the string is +searched. If only one number is given, the end offset is passed as the end of +the subject string. 
For more detail of REG_STARTEND, see the +.\" HREF +\fBpcre2posix\fP +.\" +documentation. If the subject string contains binary zeros (coded as escapes +such as \ex{00} because \fBpcre2test\fP does not support actual binary zeros in +its input), you must use \fBposix_startend\fP to specify its length. +. +. +.SS "Setting match controls" +.rs +.sp +The following modifiers affect the matching process or request additional +information. Some of them may also be specified on a pattern line (see above), +in which case they apply to every subject line that is matched against that +pattern, but can be overridden by modifiers on the subject. +.sp + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allvector show the entire ovector + allusedtext show all consulted text (non-JIT only) + altglobal alternative global matching + callout_capture show captures at callout time + callout_data= set a value to pass via callouts + callout_error=[:] control callout error + callout_extra show extra callout information + callout_fail=[:] control callout failure + callout_no_where do not show position of a callout + callout_none do not supply a callout function + copy= copy captured substring + depth_limit= set a depth limit + dfa use \fBpcre2_dfa_match()\fP + find_limits find match and depth limits + get= extract captured substring + getall extract all captured substrings + /g global global matching + heap_limit= set a limit on heap memory (Kbytes) + jitstack= set size of JIT stack + mark show mark values + match_limit= set a match limit + memory show heap memory usage + null_context match with a NULL context + offset= set starting offset + offset_limit= set offset limit + ovector= set size of output vector + recursion_limit= obsolete synonym for depth_limit + replace= specify a replacement string + startchar show startchar when relevant + startoffset= same as offset= + substitute_callout use substitution callouts + 
substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_literal use PCRE2_SUBSTITUTE_LITERAL + substitute_matched use PCRE2_SUBSTITUTE_MATCHED + substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY + substitute_skip=<n> skip substitution number n + substitute_stop=<n> skip substitution number n and greater + substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY + zero_terminate pass the subject as zero-terminated +.sp +The effects of these modifiers are described in the following sections. When +matching via the POSIX wrapper API, the \fBaftertext\fP, \fBallaftertext\fP, +and \fBovector\fP subject modifiers work as described below. All other +modifiers are either ignored, with a warning message, or cause an error. +. +. +.SS "Showing more text" +.rs +.sp +The \fBaftertext\fP modifier requests that as well as outputting the part of +the subject string that matched the entire pattern, \fBpcre2test\fP should in +addition output the remainder of the subject string. This is useful for tests +where the subject contains multiple copies of the same substring. The +\fBallaftertext\fP modifier requests the same action for captured substrings as +well as the main matched substring. In each case the remainder is output on the +following line with a plus character following the capture number. +.P +The \fBallusedtext\fP modifier requests that all the text that was consulted +during a successful pattern match by the interpreter should be shown, for both +full and partial matches. This feature is not supported for JIT matching, and +if requested with JIT it is ignored (with a warning message). Setting this +modifier affects the output if there is a lookbehind at the start of a match, +or, for a complete match, a lookahead at the end, or if \eK is used in the +pattern. 
Characters that precede or follow the start and end of the actual +match are indicated in the output by '<' or '>' characters underneath them. +Here is an example: +.sp + re> /(?<=pqr)abc(?=xyz)/ + data> 123pqrabcxyz456\e=allusedtext + 0: pqrabcxyz + <<< >>> + data> 123pqrabcxy\e=ph,allusedtext + Partial match: pqrabcxy + <<< +.sp +The first, complete match shows that the matched string is "abc", with the +preceding and following strings "pqr" and "xyz" having been consulted during +the match (when processing the assertions). The partial match can indicate only +the preceding string. +.P +The \fBstartchar\fP modifier requests that the starting character for the match +be indicated, if it is different to the start of the matched string. The only +time when this occurs is when \eK has been processed as part of the match. In +this situation, the output for the matched string is displayed from the +starting character instead of from the match point, with circumflex characters +under the earlier characters. For example: +.sp + re> /abc\eKxyz/ + data> abcxyz\e=startchar + 0: abcxyz + ^^^ +.sp +Unlike \fBallusedtext\fP, the \fBstartchar\fP modifier can be used with JIT. +However, these two modifiers are mutually exclusive. +. +. +.SS "Showing the value of all capture groups" +.rs +.sp +The \fBallcaptures\fP modifier requests that the values of all potential +captured parentheses be output after a match. By default, only those up to the +highest one actually used in the match are output (corresponding to the return +code from \fBpcre2_match()\fP). Groups that did not take part in the match +are output as "". This modifier is not relevant for DFA matching (which +does no capturing) and does not apply when \fBreplace\fP is specified; it is +ignored, with a warning message, if present. +. +. +.SS "Showing the entire ovector, for all outcomes" +.rs +.sp +The \fBallvector\fP modifier requests that the entire ovector be shown, +whatever the outcome of the match. 
Compare \fBallcaptures\fP, which shows only +up to the maximum number of capture groups for the pattern, and then only for a +successful complete non-DFA match. This modifier, which acts after any match +result, and also for DFA matching, provides a means of checking that there are +no unexpected modifications to ovector fields. Before each match attempt, the +ovector is filled with a special value, and if this is found in both elements +of a capturing pair, "<unchanged>" is output. After a successful match, this +applies to all groups after the maximum capture group for the pattern. In other +cases it applies to the entire ovector. After a partial match, the first two +elements are the only ones that should be set. After a DFA match, the amount of +ovector that is used depends on the number of matches that were found. +. +. +.SS "Testing pattern callouts" +.rs +.sp +A callout function is supplied when \fBpcre2test\fP calls the library matching +functions, unless \fBcallout_none\fP is specified. Its behaviour can be +controlled by various modifiers listed above whose names begin with +\fBcallout_\fP. Details are given in the section entitled "Callouts" +.\" HTML +.\" +below. +.\" +Testing callouts from \fBpcre2_substitute()\fP is described separately in +"Testing the substitution function" +.\" HTML +.\" +below. +.\" +. +. +.SS "Finding all matches in a string" +.rs +.sp +Searching for all possible matches within a subject can be requested by the +\fBglobal\fP or \fBaltglobal\fP modifier. After finding a match, the matching +function is called again to search the remainder of the subject. The difference +between \fBglobal\fP and \fBaltglobal\fP is that the former uses the +\fIstart_offset\fP argument to \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP +to start searching at a new point within the entire string (which is what Perl +does), whereas the latter passes over a shortened subject.
This makes a +difference to the matching process if the pattern begins with a lookbehind +assertion (including \eb or \eB). +.P +If an empty string is matched, the next match is done with the +PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for +another, non-empty, match at the same point in the subject. If this match +fails, the start offset is advanced, and the normal match is retried. This +imitates the way Perl handles such cases when using the \fB/g\fP modifier or +the \fBsplit()\fP function. Normally, the start offset is advanced by one +character, but if the newline convention recognizes CRLF as a newline, and the +current character is CR followed by LF, an advance of two characters occurs. +. +. +.SS "Testing substring extraction functions" +.rs +.sp +The \fBcopy\fP and \fBget\fP modifiers can be used to test the +\fBpcre2_substring_copy_xxx()\fP and \fBpcre2_substring_get_xxx()\fP functions. +They can be given more than once, and each can specify a capture group name or +number, for example: +.sp + abcd\e=copy=1,copy=3,get=G1 +.sp +If the \fB#subject\fP command is used to set default copy and/or get lists, +these can be unset by specifying a negative number to cancel all numbered +groups and an empty name to cancel all named groups. +.P +The \fBgetall\fP modifier tests \fBpcre2_substring_list_get()\fP, which +extracts all captured substrings. +.P +If the subject line is successfully matched, the substrings extracted by the +convenience functions are output with C, G, or L after the string number +instead of a colon. This is in addition to the normal full list. The string +length (that is, the return from the extraction function) is given in +parentheses after each substring, followed by the name when the extraction was +by name. +. +. 
+.\" HTML +.SS "Testing the substitution function" +.rs +.sp +If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is +called instead of one of the matching functions (or after one call of +\fBpcre2_match()\fP in the case of PCRE2_SUBSTITUTE_MATCHED). Note that +replacement strings cannot contain commas, because a comma signifies the end of +a modifier. This is not thought to be an issue in a test program. +.P +Specifying a completely empty replacement string disables this modifier. +However, it is possible to specify an empty replacement by providing a buffer +length, as described below, for an otherwise empty replacement. +.P +Unlike subject strings, \fBpcre2test\fP does not process replacement strings +for escape sequences. In UTF mode, a replacement string is checked to see if it +is a valid UTF-8 string. If so, it is correctly converted to a UTF string of +the appropriate code unit width. If it is not a valid UTF-8 string, the +individual code units are copied directly. This provides a means of passing an +invalid UTF-8 string for testing purposes. +.P +The following modifiers set options (in addition to the normal match options) +for \fBpcre2_substitute()\fP: +.sp + global PCRE2_SUBSTITUTE_GLOBAL + substitute_extended PCRE2_SUBSTITUTE_EXTENDED + substitute_literal PCRE2_SUBSTITUTE_LITERAL + substitute_matched PCRE2_SUBSTITUTE_MATCHED + substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY + substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY +.sp +See the +.\" HREF +\fBpcre2api\fP +.\" +documentation for details of these options. +.P +After a successful substitution, the modified string is output, preceded by the +number of replacements. This may be zero if there were no matches.
Here is a +simple example of a substitution test: +.sp + /abc/replace=xxx + =abc=abc= + 1: =xxx=abc= + =abc=abc=\e=global + 2: =xxx=xxx= +.sp +Subject and replacement strings should be kept relatively short (fewer than 256 +characters) for substitution tests, as fixed-size buffers are used. To make it +easy to test for buffer overflow, if the replacement string starts with a +number in square brackets, that number is passed to \fBpcre2_substitute()\fP as +the size of the output buffer, with the replacement string starting at the next +character. Here is an example that tests the edge case: +.sp + /abc/ + 123abc123\e=replace=[10]XYZ + 1: 123XYZ123 + 123abc123\e=replace=[9]XYZ + Failed: error -47: no more memory +.sp +The default action of \fBpcre2_substitute()\fP is to return +PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the +PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the +\fBsubstitute_overflow_length\fP modifier), \fBpcre2_substitute()\fP continues +to go through the motions of matching and substituting (but not doing any +callouts), in order to compute the size of buffer that is required. When this +happens, \fBpcre2test\fP shows the required buffer length (which includes space +for the trailing zero) as part of the error message. For example: +.sp + /abc/substitute_overflow_length + 123abc123\e=replace=[9]XYZ + Failed: error -47: no more memory: 10 code units are needed +.sp +A replacement string is ignored with POSIX and DFA matching. Specifying partial +matching provokes an error return ("bad option value") from +\fBpcre2_substitute()\fP. +. +. +.SS "Testing substitute callouts" +.rs +.sp +If the \fBsubstitute_callout\fP modifier is set, a substitution callout +function is set up. The \fBnull_context\fP modifier must not be set, because +the address of the callout function is passed in a match context. When the +callout function is called (after each substitution), details of the input +and output strings are output.
For example: +.sp + /abc/g,replace=<$0>,substitute_callout + abcdefabcpqr + 1(1) Old 0 3 "abc" New 0 5 "<abc>" + 2(1) Old 6 9 "abc" New 8 13 "<abc>" + 2: <abc>def<abc>pqr +.sp +The first number on each callout line is the count of matches. The +parenthesized number is the number of pairs that are set in the ovector (that +is, one more than the number of capturing groups that were set). Then are +listed the offsets of the old substring, its contents, and the same for the +replacement. +.P +By default, the substitution callout function returns zero, which accepts the +replacement and causes matching to continue if /g was used. Two further +modifiers can be used to test other return values. If \fBsubstitute_skip\fP is +set to a value greater than zero the callout function returns +1 for the match +of that number, and similarly \fBsubstitute_stop\fP returns -1. These cause the +replacement to be rejected, and -1 causes no further matching to take place. If +either of them is set, \fBsubstitute_callout\fP is assumed. For example: +.sp + /abc/g,replace=<$0>,substitute_skip=1 + abcdefabcpqr + 1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" + 2(1) Old 6 9 "abc" New 6 11 "<abc>" + 2: abcdef<abc>pqr + abcdefabcpqr\e=substitute_stop=1 + 1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" + 1: abcdefabcpqr +.sp +If both are set for the same number, stop takes precedence. Only a single skip +or stop is supported, which is sufficient for testing that the feature works. +. +. +.SS "Setting the JIT stack size" +.rs +.sp +The \fBjitstack\fP modifier provides a way of setting the maximum stack size +that is used by the just-in-time optimization code. It is ignored if JIT +optimization is not being used. The value is a number of kibibytes (units of +1024 bytes). Setting zero reverts to the default of 32KiB. Providing a stack +that is larger than the default is necessary only for very complicated +patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any +value that was set on the pattern. +. +.
+.SS "Setting heap, match, and depth limits" +.rs +.sp +The \fBheap_limit\fP, \fBmatch_limit\fP, and \fBdepth_limit\fP modifiers set +the appropriate limits in the match context. These values are ignored when the +\fBfind_limits\fP modifier is specified. +. +. +.SS "Finding minimum limits" +.rs +.sp +If the \fBfind_limits\fP modifier is present on a subject line, \fBpcre2test\fP +calls the relevant matching function several times, setting different values in +the match context via \fBpcre2_set_heap_limit()\fP, +\fBpcre2_set_match_limit()\fP, or \fBpcre2_set_depth_limit()\fP until it finds +the minimum values for each parameter that allow the match to complete without +error. If JIT is being used, only the match limit is relevant. +.P +When using this modifier, the pattern should not contain any limit settings +such as (*LIMIT_MATCH=...) within it. If such a setting is present and is +lower than the minimum matching value, the minimum value cannot be found +because \fBpcre2_set_match_limit()\fP etc. are only able to reduce the value of +an in-pattern limit; they cannot increase it. +.P +For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how +much nested backtracking happens (that is, how deeply the pattern's tree is +searched). In the case of DFA matching, \fIdepth_limit\fP controls the depth of +recursive calls of the internal function that is used for handling pattern +recursion, lookaround assertions, and atomic groups. +.P +For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount +of backtracking that takes place, and learning the minimum value can be +instructive. For most simple matches, the number is quite small, but for +patterns with very large numbers of matching possibilities, it can become large +very quickly with increasing length of subject string.
In the case of DFA +matching, \fImatch_limit\fP controls the total number of calls, both recursive +and non-recursive, to the internal matching function, thus controlling the +overall amount of computing resource that is used. +.P +For both kinds of matching, the \fIheap_limit\fP number, which is in kibibytes +(units of 1024 bytes), limits the amount of heap memory used for matching. A +value of zero disables the use of any heap memory; many simple pattern matches +can be done without using the heap, so zero is not an unreasonable setting. +. +. +.SS "Showing MARK names" +.rs +.sp +.P +The \fBmark\fP modifier causes the names from backtracking control verbs that +are returned from calls to \fBpcre2_match()\fP to be displayed. If a mark is +returned for a match, non-match, or partial match, \fBpcre2test\fP shows it. +For a match, it is on a line by itself, tagged with "MK:". Otherwise, it +is added to the non-match message. +. +. +.SS "Showing memory usage" +.rs +.sp +The \fBmemory\fP modifier causes \fBpcre2test\fP to log the sizes of all heap +memory allocation and freeing calls that occur during a call to +\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. These occur only when a match +requires a bigger vector than the default for remembering backtracking points +(\fBpcre2_match()\fP) or for internal workspace (\fBpcre2_dfa_match()\fP). In +many cases there will be no heap memory used and therefore no additional +output. No heap memory is allocated during matching with JIT, so in that case +the \fBmemory\fP modifier never has any effect. For this modifier to work, the +\fBnull_context\fP modifier must not be set on both the pattern and the +subject, though it can be set on one or the other. +. +. +.SS "Setting a starting offset" +.rs +.sp +The \fBoffset\fP modifier sets an offset in the subject string at which +matching starts. Its value is a number of code units, not characters. +. +. 
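Because the offset value counts code units rather than characters, the number to supply for a given character position depends on which library width is being tested. The following Python sketch is an illustration only: the helper name code_unit_offset is hypothetical (it is not part of pcre2test or PCRE2), and it assumes a UTF subject encoded as UTF-8, UTF-16, or UTF-32 for the 8-bit, 16-bit, and 32-bit libraries respectively.

```python
def code_unit_offset(subject: str, char_index: int, width: int = 8) -> int:
    """Return the number of code units that precede character `char_index`.

    `width` is the code unit width in bits (8, 16, or 32). The subject is
    assumed to be encoded as UTF-8, UTF-16, or UTF-32 respectively.
    """
    if not 0 <= char_index <= len(subject):
        raise ValueError("char_index is outside the subject")
    prefix = subject[:char_index]
    if width == 8:
        # One code unit per UTF-8 byte.
        return len(prefix.encode("utf-8"))
    if width == 16:
        # Each UTF-16 code unit is two bytes; characters above U+FFFF
        # occupy two code units (a surrogate pair).
        return len(prefix.encode("utf-16-le")) // 2
    if width == 32:
        # One code unit per character.
        return len(prefix)
    raise ValueError("width must be 8, 16, or 32")


if __name__ == "__main__":
    subject = "caf\u00e9 \U0001F600 abc"
    index = subject.index("abc")  # character index 7
    for width in (8, 16, 32):
        print(width, code_unit_offset(subject, index, width))
```

For this subject, the character position of "abc" corresponds to an offset of 11 in the 8-bit library, 8 in the 16-bit library, and 7 in the 32-bit library.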
+.SS "Setting an offset limit" +.rs +.sp +The \fBoffset_limit\fP modifier sets a limit for unanchored matches. If a match +cannot be found starting at or before this offset in the subject, a "no match" +return is given. The data value is a number of code units, not characters. When +this modifier is used, the \fBuse_offset_limit\fP modifier must have been set +for the pattern; if not, an error is generated. +. +. +.SS "Setting the size of the output vector" +.rs +.sp +The \fBovector\fP modifier applies only to the subject line in which it +appears, though of course it can also be used to set a default in a +\fB#subject\fP command. It specifies the number of pairs of offsets that are +available for storing matching information. The default is 15. +.P +A value of zero is useful when testing the POSIX API because it causes +\fBregexec()\fP to be called with a NULL capture vector. When not testing the +POSIX API, a value of zero is used to cause +\fBpcre2_match_data_create_from_pattern()\fP to be called, in order to create a +match block of exactly the right size for the pattern. (It is not possible to +create a match block with a zero-length ovector; there is always at least one +pair of offsets.) +. +. +.SS "Passing the subject as zero-terminated" +.rs +.sp +By default, the subject string is passed to a native API matching function with +its correct length. In order to test the facility for passing a zero-terminated +string, the \fBzero_terminate\fP modifier is provided. It causes the length to +be passed as PCRE2_ZERO_TERMINATED. When matching via the POSIX interface, +this modifier is ignored, with a warning. +.P +When testing \fBpcre2_substitute()\fP, this modifier also has the effect of +passing the replacement string as zero-terminated. +. +. +.SS "Passing a NULL context" +.rs +.sp +Normally, \fBpcre2test\fP passes a context block to \fBpcre2_match()\fP, +\fBpcre2_dfa_match()\fP, \fBpcre2_jit_match()\fP or \fBpcre2_substitute()\fP. 
+If the \fBnull_context\fP modifier is set, however, NULL is passed. This is for +testing that the matching and substitution functions behave correctly in this +case (they use default values). This modifier cannot be used with the +\fBfind_limits\fP or \fBsubstitute_callout\fP modifiers. +. +. +.SH "THE ALTERNATIVE MATCHING FUNCTION" +.rs +.sp +By default, \fBpcre2test\fP uses the standard PCRE2 matching function, +\fBpcre2_match()\fP to match each subject line. PCRE2 also supports an +alternative matching function, \fBpcre2_dfa_match()\fP, which operates in a +different way, and has some restrictions. The differences between the two +functions are described in the +.\" HREF +\fBpcre2matching\fP +.\" +documentation. +.P +If the \fBdfa\fP modifier is set, the alternative matching function is used. +This function finds all possible matches at a given point in the subject. If, +however, the \fBdfa_shortest\fP modifier is set, processing stops after the +first match is found. This is always the shortest possible match. +. +. +.SH "DEFAULT OUTPUT FROM pcre2test" +.rs +.sp +This section describes the output when the normal matching function, +\fBpcre2_match()\fP, is being used. +.P +When a match succeeds, \fBpcre2test\fP outputs the list of captured substrings, +starting with number 0 for the string that matched the whole pattern. +Otherwise, it outputs "No match" when the return is PCRE2_ERROR_NOMATCH, or +"Partial match:" followed by the partially matching substring when the +return is PCRE2_ERROR_PARTIAL. (Note that this is the +entire substring that was inspected during the partial match; it may include +characters before the actual match start if a lookbehind assertion, \eK, \eb, +or \eB was involved.) +.P +For any other return, \fBpcre2test\fP outputs the PCRE2 negative error number +and a short descriptive phrase. If the error is a failed UTF string check, the +code unit offset of the start of the failing character is also output. 
Here is +an example of an interactive \fBpcre2test\fP run. +.sp + $ pcre2test + PCRE2 version 10.22 2016-07-29 +.sp + re> /^abc(\ed+)/ + data> abc123 + 0: abc123 + 1: 123 + data> xyz + No match +.sp +Unset capturing substrings that are not followed by one that is set are not +shown by \fBpcre2test\fP unless the \fBallcaptures\fP modifier is specified. In +the following example, there are two capturing substrings, but when the first +data line is matched, the second, unset substring is not shown. An "internal" +unset substring is shown as "<unset>", as for the second data line. +.sp + re> /(a)|(b)/ + data> a + 0: a + 1: a + data> b + 0: b + 1: <unset> + 2: b +.sp +If the strings contain any non-printing characters, they are output as \exhh +escapes if the value is less than 256 and UTF mode is not set. Otherwise they +are output as \ex{hh...} escapes. See below for the definition of non-printing +characters. If the \fBaftertext\fP modifier is set, the output for substring +0 is followed by the rest of the subject string, identified by "0+" like +this: +.sp + re> /cat/aftertext + data> cataract + 0: cat + 0+ aract +.sp +If global matching is requested, the results of successive matching attempts +are output in sequence, like this: +.sp + re> /\eBi(\ew\ew)/g + data> Mississippi + 0: iss + 1: ss + 0: iss + 1: ss + 0: ipp + 1: pp +.sp +"No match" is output only if the first match attempt fails. Here is an example +of a failure message (the offset 4 that is specified by the \fBoffset\fP +modifier is past the end of the subject string): +.sp + re> /xyz/ + data> xyz\e=offset=4 + Error -24 (bad offset value) +.P +Note that whereas patterns can be continued over several lines (a plain ">" +prompt is used for continuations), subject lines may not. However newlines can +be included in a subject by means of the \en escape (or \er, \er\en, etc., +depending on the newline sequence setting). +. +. +.
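The escaping rule for non-printing characters described earlier in this section can be sketched as follows. This is an illustration in Python, not pcre2test's actual code: the function name show is hypothetical, characters 32-126 are assumed to be the printing ones (the default when no locale is set), and the two-digit lowercase hexadecimal formatting is an assumption.

```python
def show(code_units, utf=False):
    """Render a sequence of character values the way the text describes.

    Printing characters are copied as they are. A non-printing value is
    shown as \\xhh when it is below 256 and UTF mode is off, and as
    \\x{hh...} otherwise.
    """
    out = []
    for value in code_units:
        if 32 <= value <= 126:
            out.append(chr(value))            # printing character
        elif value < 256 and not utf:
            out.append("\\x%02x" % value)     # short escape form
        else:
            out.append("\\x{%02x}" % value)   # braced escape form
    return "".join(out)


if __name__ == "__main__":
    print(show([97, 10, 98]))        # a, newline, b
    print(show([0x100]))             # a value above 255
    print(show([10], utf=True))      # newline with UTF mode set
```

The three calls print a\x0ab, \x{100}, and \x{0a} respectively under the assumptions stated above.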
+.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION" +.rs +.sp +When the alternative matching function, \fBpcre2_dfa_match()\fP, is used, the +output consists of a list of all the matches that start at the first point in +the subject where there is at least one match. For example: +.sp + re> /(tang|tangerine|tan)/ + data> yellow tangerine\e=dfa + 0: tangerine + 1: tang + 2: tan +.sp +Using the normal matching function on this data finds only "tang". The +longest matching string is always given first (and numbered zero). After a +PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the +partially matching substring. Note that this is the entire substring that was +inspected during the partial match; it may include characters before the actual +match start if a lookbehind assertion, \eb, or \eB was involved. (\eK is not +supported for DFA matching.) +.P +If global matching is requested, the search for further matches resumes +at the end of the longest match. For example: +.sp + re> /(tang|tangerine|tan)/g + data> yellow tangerine and tangy sultana\e=dfa + 0: tangerine + 1: tang + 2: tan + 0: tang + 1: tan + 0: tan +.sp +The alternative matching function does not support substring capture, so the +modifiers that are concerned with captured substrings are not relevant. +. +. +.SH "RESTARTING AFTER A PARTIAL MATCH" +.rs +.sp +When the alternative matching function has given the PCRE2_ERROR_PARTIAL +return, indicating that the subject partially matched the pattern, you can +restart the match with additional subject data by means of the +\fBdfa_restart\fP modifier. For example: +.sp + re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/ + data> 23ja\e=ps,dfa + Partial match: 23ja + data> n05\e=dfa,dfa_restart + 0: n05 +.sp +For further information about partial matching, see the +.\" HREF +\fBpcre2partial\fP +.\" +documentation. +. +. 
+.\" HTML +.SH CALLOUTS +.rs +.sp +If the pattern contains any callout requests, \fBpcre2test\fP's callout +function is called during matching unless \fBcallout_none\fP is specified. This +works with both matching functions, and with JIT, though there are some +differences in behaviour. The output for callouts with numerical arguments and +those with string arguments is slightly different. +. +. +.SS "Callouts with numerical arguments" +.rs +.sp +By default, the callout function displays the callout number, the start and +current positions in the subject text at the callout time, and the next pattern +item to be tested. For example: +.sp + --->pqrabcdef + 0 ^ ^ \ed +.sp +This output indicates that callout number 0 occurred for a match attempt +starting at the fourth character of the subject string, when the pointer was at +the seventh character, and when the next pattern item was \ed. Just +one circumflex is output if the start and current positions are the same, or if +the current position precedes the start position, which can happen if the +callout is in a lookbehind assertion. +.P +Callouts numbered 255 are assumed to be automatic callouts, inserted as a +result of the \fBauto_callout\fP pattern modifier. In this case, instead of +showing the callout number, the offset in the pattern, preceded by a plus, is +output. For example: +.sp + re> /\ed?[A-E]\e*/auto_callout + data> E* + --->E* + +0 ^ \ed? + +3 ^ [A-E] + +8 ^^ \e* + +10 ^ ^ + 0: E* +.sp +If a pattern contains (*MARK) items, an additional line is output whenever +a change of latest mark is passed to the callout function. For example: +.sp + re> /a(*MARK:X)bc/auto_callout + data> abc + --->abc + +0 ^ a + +1 ^^ (*MARK:X) + +10 ^^ b + Latest Mark: X + +11 ^ ^ c + +12 ^ ^ + 0: abc +.sp +The mark changes between matching "a" and "b", but stays the same for the rest +of the match, so nothing more is output. If, as a result of backtracking, the +mark reverts to being unset, the text "<unset>" is output. +. +.
+.SS "Callouts with string arguments" +.rs +.sp +The output for a callout with a string argument is similar, except that instead +of outputting a callout number before the position indicators, the callout +string and its offset in the pattern string are output before the reflection of +the subject string, and the subject string is reflected for each callout. For +example: +.sp + re> /^ab(?C'first')cd(?C"second")ef/ + data> abcdefg + Callout (7): 'first' + --->abcdefg + ^ ^ c + Callout (20): "second" + --->abcdefg + ^ ^ e + 0: abcdef +.sp +. +. +.SS "Callout modifiers" +.rs +.sp +The callout function in \fBpcre2test\fP returns zero (carry on matching) by +default, but you can use a \fBcallout_fail\fP modifier in a subject line to +change this and other parameters of the callout (see below). +.P +If the \fBcallout_capture\fP modifier is set, the current captured groups are +output when a callout occurs. This is useful only for non-DFA matching, as +\fBpcre2_dfa_match()\fP does not support capturing, so no captures are ever +shown. +.P +The normal callout output, showing the callout number or pattern offset (as +described above) is suppressed if the \fBcallout_no_where\fP modifier is set. +.P +When using the interpretive matching function \fBpcre2_match()\fP without JIT, +setting the \fBcallout_extra\fP modifier causes additional output from +\fBpcre2test\fP's callout function to be generated. For the first callout in a +match attempt at a new starting position in the subject, "New match attempt" is +output. If there has been a backtrack since the last callout (or start of +matching if this is the first callout), "Backtrack" is output, followed by "No +other matching paths" if the backtrack ended the previous match attempt. 
For +example: +.sp + re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess + data> aac\e=callout_extra + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + +3 ^ ^ ) + +4 ^ ^ b + Backtrack + --->aac + +3 ^^ ) + +4 ^^ b + Backtrack + No other matching paths + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + +3 ^^ ) + +4 ^^ b + Backtrack + No other matching paths + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + Backtrack + No other matching paths + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + No match +.sp +Notice that various optimizations must be turned off if you want all possible +matching paths to be scanned. If \fBno_start_optimize\fP is not used, there is +an immediate "no match", without any callouts, because the starting +optimization fails to find "b" in the subject, which it knows must be present +for any match. If \fBno_auto_possess\fP is not used, the "a+" item is turned +into "a++", which reduces the number of backtracks. +.P +The \fBcallout_extra\fP modifier has no effect if used with the DFA matching +function, or with JIT. +. +. +.SS "Return values from callouts" +.rs +.sp +The default return from the callout function is zero, which allows matching to +continue. The \fBcallout_fail\fP modifier can be given one or two numbers. If +there is only one number, 1 is returned instead of 0 (causing matching to +backtrack) when a callout of that number is reached. If two numbers (<n>:<m>) +are given, 1 is returned when callout <n> is reached and there have been at +least <m> callouts. The \fBcallout_error\fP modifier is similar, except that +PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be +aborted. If both these modifiers are set for the same callout number, +\fBcallout_error\fP takes precedence. Note that callouts with string arguments +are always given the number zero. +.P +The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
+This is set as the "user data" that is passed to the matching function, and +passed back when the callout function is invoked. Any value other than zero is +used as a return from \fBpcre2test\fP's callout function. +.P +Inserting callouts can be helpful when using \fBpcre2test\fP to check +complicated regular expressions. For further information about callouts, see +the +.\" HREF +\fBpcre2callout\fP +.\" +documentation. +. +. +. +.SH "NON-PRINTING CHARACTERS" +.rs +.sp +When \fBpcre2test\fP is outputting text in the compiled version of a pattern, +bytes other than 32-126 are always treated as non-printing characters and are +therefore shown as hex escapes. +.P +When \fBpcre2test\fP is outputting text that is a matched part of a subject +string, it behaves in the same way, unless a different locale has been set for +the pattern (using the \fBlocale\fP modifier). In this case, the +\fBisprint()\fP function is used to distinguish printing and non-printing +characters. +. +. +. +.\" HTML +.SH "SAVING AND RESTORING COMPILED PATTERNS" +.rs +.sp +It is possible to save compiled patterns on disc or elsewhere, and reload them +later, subject to a number of restrictions. JIT data cannot be saved. The host +on which the patterns are reloaded must be running the same version of PCRE2, +with the same code unit width, and must also have the same endianness, pointer +width and PCRE2_SIZE type. Before compiled patterns can be saved they must be +serialized, that is, converted to a stream of bytes. A single byte stream may +contain any number of compiled patterns, but they must all use the same +character tables. A single copy of the tables is included in the byte stream +(its size is 1088 bytes). +.P +The functions whose names begin with \fBpcre2_serialize_\fP are used +for serializing and de-serializing. They are described in the +.\" HREF +\fBpcre2serialize\fP +.\" +documentation. 
In this section we describe the features of \fBpcre2test\fP that +can be used to test these functions. +.P +Note that "serialization" in PCRE2 does not convert compiled patterns to an +abstract format like Java or .NET. It just makes a reloadable byte code stream. +Hence the restrictions on reloading mentioned above. +.P +In \fBpcre2test\fP, when a pattern with the \fBpush\fP modifier is successfully +compiled, it is pushed onto a stack of compiled patterns, and \fBpcre2test\fP +expects the next line to contain a new pattern (or command) instead of a +subject line. By contrast, the \fBpushcopy\fP modifier causes a copy of the +compiled pattern to be stacked, leaving the original available for immediate +matching. By using \fBpush\fP and/or \fBpushcopy\fP, a number of patterns can +be compiled and retained. These modifiers are incompatible with \fBposix\fP, +and control modifiers that act at match time are ignored (with a message) for +the stacked patterns. The \fBjitverify\fP modifier applies only at compile +time. +.P +The command +.sp + #save <filename> +.sp +causes all the stacked patterns to be serialized and the result written to the +named file. Afterwards, all the stacked patterns are freed. The command +.sp + #load <filename> +.sp +reads the data in the file, and then arranges for it to be de-serialized, with +the resulting compiled patterns added to the pattern stack. The pattern on the +top of the stack can be retrieved by the #pop command, which must be followed +by lines of subjects that are to be matched with the pattern, terminated as +usual by an empty line or end of file. This command may be followed by a +modifier list containing only +.\" HTML +.\" +control modifiers +.\" +that act after a pattern has been compiled. In particular, \fBhex\fP, +\fBposix\fP, \fBposix_nosub\fP, \fBpush\fP, and \fBpushcopy\fP are not allowed, +nor are any +.\" HTML +.\" +option-setting modifiers. +.\" +The JIT modifiers are, however, permitted.
Here is an example that saves and +reloads two patterns. +.sp + /abc/push + /xyz/push + #save tempfile + #load tempfile + #pop info + xyz +.sp + #pop jit,bincode + abc +.sp +If \fBjitverify\fP is used with #pop, it does not automatically imply +\fBjit\fP, which is different behaviour from when it is used on a pattern. +.P +The #popcopy command is analogous to the \fBpushcopy\fP modifier in that it +makes current a copy of the topmost stack pattern, leaving the original still +on the stack. +. +. +. +.SH "SEE ALSO" +.rs +.sp +\fBpcre2\fP(3), \fBpcre2api\fP(3), \fBpcre2callout\fP(3), +\fBpcre2jit\fP(3), \fBpcre2matching\fP(3), \fBpcre2partial\fP(3), +\fBpcre2pattern\fP(3), \fBpcre2serialize\fP(3). +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 28 April 2021 +Copyright (c) 1997-2021 University of Cambridge. +.fi diff --git a/src/pcre2/doc/pcre2test.txt b/src/pcre2/doc/pcre2test.txt new file mode 100644 index 00000000..a91f356c --- /dev/null +++ b/src/pcre2/doc/pcre2test.txt @@ -0,0 +1,1939 @@ +PCRE2TEST(1) General Commands Manual PCRE2TEST(1) + + + +NAME + pcre2test - a program for testing Perl-compatible regular expressions. + +SYNOPSIS + + pcre2test [options] [input file [output file]] + + pcre2test is a test program for the PCRE2 regular expression libraries, + but it can also be used for experimenting with regular expressions. + This document describes the features of the test program; for details + of the regular expressions themselves, see the pcre2pattern documenta- + tion. For details of the PCRE2 library function calls and their op- + tions, see the pcre2api documentation. + + The input for pcre2test is a sequence of regular expression patterns + and subject strings to be matched. There are also command lines for + setting defaults and controlling some special actions. The output shows + the result of each match attempt.
Modifiers on external or internal + command lines, the patterns, and the subject lines specify PCRE2 func- + tion options, control how the subject is processed, and what output is + produced. + + As the original fairly simple PCRE library evolved, it acquired many + different features, and as a result, the original pcretest program + ended up with a lot of options in a messy, arcane syntax for testing + all the features. The move to the new PCRE2 API provided an opportunity + to re-implement the test program as pcre2test, with a cleaner modifier + syntax. Nevertheless, there are still many obscure modifiers, some of + which are specifically designed for use in conjunction with the test + script and data files that are distributed as part of PCRE2. All the + modifiers are documented here, some without much justification, but + many of them are unlikely to be of use except when testing the li- + braries. + + +PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES + + Different versions of the PCRE2 library can be built to support charac- + ter strings that are encoded in 8-bit, 16-bit, or 32-bit code units. + One, two, or all three of these libraries may be simultaneously in- + stalled. The pcre2test program can be used to test all the libraries. + However, its own input and output are always in 8-bit format. When + testing the 16-bit or 32-bit libraries, patterns and subject strings + are converted to 16-bit or 32-bit format before being passed to the li- + brary functions. Results are converted back to 8-bit code units for + output. + + In the rest of this document, the names of library functions and struc- + tures are given in generic form, for example, pcre2_compile(). The ac- + tual names used in the libraries have a suffix _8, _16, or _32, as ap- + propriate. + + +INPUT ENCODING + + Input to pcre2test is processed line by line, either by calling the C + library's fgets() function, or via the libreadline library. 
In some + Windows environments character 26 (hex 1A) causes an immediate end of + file, and no further data is read, so this character should be avoided + unless you really want that action. + + The input is processed using C's string functions, so must not + contain binary zeros, even though in Unix-like environments, fgets() + treats any bytes other than newline as data characters. An error is + generated if a binary zero is encountered. By default subject lines are + processed for backslash escapes, which makes it possible to include any + data value in strings that are passed to the library for matching. For + patterns, there is a facility for specifying some or all of the 8-bit + input characters as hexadecimal pairs, which makes it possible to in- + clude binary zeros. + + Input for the 16-bit and 32-bit libraries + + When testing the 16-bit or 32-bit libraries, there is a need to be able + to generate character code points greater than 255 in the strings that + are passed to the library. For subject lines, backslash escapes can be + used. In addition, when the utf modifier (see "Setting compilation op- + tions" below) is set, the pattern and any following subject lines are + interpreted as UTF-8 strings and translated to UTF-16 or UTF-32 as ap- + propriate. + + For non-UTF testing of wide characters, the utf8_input modifier can be + used. This is mutually exclusive with utf, and is allowed only in + 16-bit or 32-bit mode. It causes the pattern and following subject + lines to be treated as UTF-8 according to the original definition (RFC + 2279), which allows for character values up to 0x7fffffff. Each charac- + ter is placed in one 16-bit or 32-bit code unit (in the 16-bit case, + values greater than 0xffff cause an error to occur). + + UTF-8 (in its original definition) is not capable of encoding values + greater than 0x7fffffff, but such values can be handled by the 32-bit + library. 
When testing this library in non-UTF mode with utf8_input set, + if any character is preceded by the byte 0xff (which is an invalid byte + in UTF-8) 0x80000000 is added to the character's value. This is the + only way of passing such code points in a pattern string. For subject + strings, using an escape sequence is preferable. + + +COMMAND LINE OPTIONS + + -8 If the 8-bit library has been built, this option causes it to + be used (this is the default). If the 8-bit library has not + been built, this option causes an error. + + -16 If the 16-bit library has been built, this option causes it + to be used. If only the 16-bit library has been built, this + is the default. If the 16-bit library has not been built, + this option causes an error. + + -32 If the 32-bit library has been built, this option causes it + to be used. If only the 32-bit library has been built, this + is the default. If the 32-bit library has not been built, + this option causes an error. + + -ac Behave as if each pattern has the auto_callout modifier, that + is, insert automatic callouts into every pattern that is com- + piled. + + -AC As for -ac, but in addition behave as if each subject line + has the callout_extra modifier, that is, show additional in- + formation from callouts. + + -b Behave as if each pattern has the fullbincode modifier; the + full internal binary form of the pattern is output after com- + pilation. + + -C Output the version number of the PCRE2 library, and all + available information about the optional features that are + included, and then exit with zero exit code. All other op- + tions are ignored. If both -C and -LM are present, whichever + is first is recognized. + + -C option Output information about a specific build-time option, then + exit. This functionality is intended for use in scripts such + as RunTest. 
The following options output the value and set + the exit code as indicated: + + ebcdic-nl the code for LF (= NL) in an EBCDIC environment: + 0x15 or 0x25 + 0 if used in an ASCII environment + exit code is always 0 + linksize the configured internal link size (2, 3, or 4) + exit code is set to the link size + newline the default newline setting: + CR, LF, CRLF, ANYCRLF, ANY, or NUL + exit code is always 0 + bsr the default setting for what \R matches: + ANYCRLF or ANY + exit code is always 0 + + The following options output 1 for true or 0 for false, and + set the exit code to the same value: + + backslash-C \C is supported (not locked out) + ebcdic compiled for an EBCDIC environment + jit just-in-time support is available + pcre2-16 the 16-bit library was built + pcre2-32 the 32-bit library was built + pcre2-8 the 8-bit library was built + unicode Unicode support is available + + If an unknown option is given, an error message is output; + the exit code is 0. + + -d Behave as if each pattern has the debug modifier; the inter- + nal form and information about the compiled pattern is output + after compilation; -d is equivalent to -b -i. + + -dfa Behave as if each subject line has the dfa modifier; matching + is done using the pcre2_dfa_match() function instead of the + default pcre2_match(). + + -error number[,number,...] + Call pcre2_get_error_message() for each of the error numbers + in the comma-separated list, display the resulting messages + on the standard output, then exit with zero exit code. The + numbers may be positive or negative. This is a convenience + facility for PCRE2 maintainers. + + -help Output a brief summary of these options and then exit. + + -i Behave as if each pattern has the info modifier; information + about the compiled pattern is given after compilation. + + -jit Behave as if each pattern line has the jit modifier; after + successful compilation, each pattern is passed to the just- + in-time compiler, if available. 
+ + -jitfast Behave as if each pattern line has the jitfast modifier; af- + ter successful compilation, each pattern is passed to the + just-in-time compiler, if available, and each subject line is + passed directly to the JIT matcher via its "fast path". + + -jitverify + Behave as if each pattern line has the jitverify modifier; + after successful compilation, each pattern is passed to the + just-in-time compiler, if available, and the use of JIT for + matching is verified. + + -LM List modifiers: write a list of available pattern and subject + modifiers to the standard output, then exit with zero exit + code. All other options are ignored. If both -C and -LM are + present, whichever is first is recognized. + + -pattern modifier-list + Behave as if each pattern line contains the given modifiers. + + -q Do not output the version number of pcre2test at the start of + execution. + + -S size On Unix-like systems, set the size of the run-time stack to + size mebibytes (units of 1024*1024 bytes). + + -subject modifier-list + Behave as if each subject line contains the given modifiers. + + -t Run each compile and match many times with a timer, and out- + put the resulting times per compile or match. When JIT is + used, separate times are given for the initial compile and + the JIT compile. You can control the number of iterations + that are used for timing by following -t with a number (as a + separate item on the command line). For example, "-t 1000" + iterates 1000 times. The default is to iterate 500,000 times. + + -tm This is like -t except that it times only the matching phase, + not the compile phase. + + -T -TM These behave like -t and -tm, but in addition, at the end of + a run, the total times for all compiles and matches are out- + put. + + -version Output the PCRE2 version number and then exit. + + +DESCRIPTION + + If pcre2test is given two filename arguments, it reads from the first + and writes to the second. 
If the first name is "-", input is taken from + the standard input. If pcre2test is given only one argument, it reads + from that file and writes to stdout. Otherwise, it reads from stdin and + writes to stdout. + + When pcre2test is built, a configuration option can specify that it + should be linked with the libreadline or libedit library. When this is + done, if the input is from a terminal, it is read using the readline() + function. This provides line-editing and history facilities. The output + from the -help option states whether or not readline() will be used. + + The program handles any number of tests, each of which consists of a + set of input lines. Each set starts with a regular expression pattern, + followed by any number of subject lines to be matched against that pat- + tern. In between sets of test data, command lines that begin with # may + appear. This file format, with some restrictions, can also be processed + by the perltest.sh script that is distributed with PCRE2 as a means of + checking that the behaviour of PCRE2 and Perl is the same. For a speci- + fication of perltest.sh, see the comments near its beginning. See also + the #perltest command below. + + When the input is a terminal, pcre2test prompts for each line of input, + using "re>" to prompt for regular expression patterns, and "data>" to + prompt for subject lines. Command lines starting with # can be entered + only in response to the "re>" prompt. + + Each subject line is matched separately and independently. If you want + to do multi-line matches, you have to use the \n escape sequence (or \r + or \r\n, etc., depending on the newline setting) in a single line of + input to encode the newline sequences. There is no limit on the length + of subject lines; the input buffer is automatically extended if it is + too small. There are replication features that make it possible to + generate long repetitive pattern or subject lines without having to + supply them explicitly. 
+ + An empty line or the end of the file signals the end of the subject + lines for a test, at which point a new pattern or command line is ex- + pected if there is still input to be read. + + +COMMAND LINES + + In between sets of test data, a line that begins with # is interpreted + as a command line. If the first character is followed by white space or + an exclamation mark, the line is treated as a comment, and ignored. + Otherwise, the following commands are recognized: + + #forbid_utf + + Subsequent patterns automatically have the PCRE2_NEVER_UTF and + PCRE2_NEVER_UCP options set, which locks out the use of the PCRE2_UTF + and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start of + patterns. This command also forces an error if a subsequent pattern + contains any occurrences of \P, \p, or \X, which are still supported + when PCRE2_UTF is not set, but which require Unicode property support + to be included in the library. + + This is a trigger guard that is used in test files to ensure that UTF + or Unicode property tests are not accidentally added to files that are + used when Unicode support is not included in the library. Setting + PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default can also be obtained + by the use of #pattern; the difference is that #forbid_utf cannot be + unset, and the automatic options are not displayed in pattern informa- + tion, to avoid cluttering up test output. + + #load + + This command is used to load a set of precompiled patterns from a file, + as described in the section entitled "Saving and restoring compiled + patterns" below. + + #loadtables + + This command is used to load a set of binary character tables that can + be accessed by the tables=3 qualifier. Such tables can be created by + the pcre2_dftables program with the -b option. + + #newline_default [] + + When PCRE2 is built, a default newline convention can be specified. 
+ This determines which characters and/or character pairs are recognized + as indicating a newline in a pattern or subject string. The default can + be overridden when a pattern is compiled. The standard test files con- + tain tests of various newline conventions, but the majority of the + tests expect a single linefeed to be recognized as a newline by de- + fault. Without special action the tests would fail when PCRE2 is com- + piled with either CR or CRLF as the default newline. + + The #newline_default command specifies a list of newline types that are + acceptable as the default. The types must be one of CR, LF, CRLF, ANY- + CRLF, ANY, or NUL (in upper or lower case), for example: + + #newline_default LF Any anyCRLF + + If the default newline is in the list, this command has no effect. Oth- + erwise, except when testing the POSIX API, a newline modifier that + specifies the first newline convention in the list (LF in the above ex- + ample) is added to any pattern that does not already have a newline + modifier. If the newline list is empty, the feature is turned off. This + command is present in a number of the standard test input files. + + When the POSIX API is being tested there is no way to override the de- + fault newline convention, though it is possible to set the newline con- + vention from within the pattern. A warning is given if the posix or + posix_nosub modifier is used when #newline_default would set a default + for the non-POSIX API. + + #pattern + + This command sets a default modifier list that applies to all subse- + quent patterns. Modifiers on a pattern can change these settings. + + #perltest + + This line is used in test files that can also be processed by perl- + test.sh to confirm that Perl gives the same results as PCRE2. Subse- + quent tests are checked for the use of pcre2test features that are in- + compatible with the perltest.sh script. + + Patterns must use '/' as their delimiter, and only certain modifiers + are supported. 
Comment lines, #pattern commands, and #subject commands + that set or unset "mark" are recognized and acted on. The #perltest, + #forbid_utf, and #newline_default commands, which are needed in the + relevant pcre2test files, are silently ignored. All other command lines + are ignored, but give a warning message. The #perltest command helps + detect tests that are accidentally put in the wrong file or use the + wrong delimiter. For more details of the perltest.sh script see the + comments it contains. + + #pop [] + #popcopy [] + + These commands are used to manipulate the stack of compiled patterns, + as described in the section entitled "Saving and restoring compiled + patterns" below. + + #save + + This command is used to save a set of compiled patterns to a file, as + described in the section entitled "Saving and restoring compiled pat- + terns" below. + + #subject + + This command sets a default modifier list that applies to all subse- + quent subject lines. Modifiers on a subject line can change these set- + tings. + + +MODIFIER SYNTAX + + Modifier lists are used with both pattern and subject lines. Items in a + list are separated by commas followed by optional white space. Trailing + whitespace in a modifier list is ignored. Some modifiers may be given + for both patterns and subject lines, whereas others are valid only for + one or the other. Each modifier has a long name, for example "an- + chored", and some of them must be followed by an equals sign and a + value, for example, "offset=12". Values cannot contain comma charac- + ters, but may contain spaces. Modifiers that do not take values may be + preceded by a minus sign to turn off a previous setting. + + A few of the more common modifiers can also be specified as single let- + ters, for example "i" for "caseless". In documentation, following the + Perl convention, these are written with a slash ("the /i modifier") for + clarity. 
Abbreviated modifiers must all be concatenated in the first + item of a modifier list. If the first item is not recognized as a long + modifier name, it is interpreted as a sequence of these abbreviations. + For example: + + /abc/ig,newline=cr,jit=3 + + This is a pattern line whose modifier list starts with two one-letter + modifiers (/i and /g). The lower-case abbreviated modifiers are the + same as used in Perl. + + +PATTERN SYNTAX + + A pattern line must start with one of the following characters (common + symbols, excluding pattern meta-characters): + + / ! " ' ` - = _ : ; , % & @ ~ + + This is interpreted as the pattern's delimiter. A regular expression + may be continued over several input lines, in which case the newline + characters are included within it. It is possible to include the delim- + iter within the pattern by escaping it with a backslash, for example + + /abc\/def/ + + If you do this, the escape and the delimiter form part of the pattern, + but since the delimiters are all non-alphanumeric, this does not affect + its interpretation. If the terminating delimiter is immediately fol- + lowed by a backslash, for example, + + /abc/\ + + then a backslash is added to the end of the pattern. This is done to + provide a way of testing the error condition that arises if a pattern + finishes with a backslash, because + + /abc\/ + + is interpreted as the first line of a pattern that starts with "abc/", + causing pcre2test to read the next line as a continuation of the regu- + lar expression. + + A pattern can be followed by a modifier list (details below). + + +SUBJECT LINE SYNTAX + + Before each subject line is passed to pcre2_match() or + pcre2_dfa_match(), leading and trailing white space is removed, and the + line is scanned for backslash escapes, unless the subject_literal modi- + fier was set for the pattern. 
The following provide a means of encoding + non-printing characters in a visible way: + + \a alarm (BEL, \x07) + \b backspace (\x08) + \e escape (\x1b) + \f form feed (\x0c) + \n newline (\x0a) + \r carriage return (\x0d) + \t tab (\x09) + \v vertical tab (\x0b) + \nnn octal character (up to 3 octal digits); always + a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode + \o{dd...} octal character (any number of octal digits) + \xhh hexadecimal byte (up to 2 hex digits) + \x{hh...} hexadecimal character (any number of hex digits) + + The use of \x{hh...} is not dependent on the use of the utf modifier on + the pattern. It is recognized always. There may be any number of hexa- + decimal digits inside the braces; invalid values provoke error mes- + sages. + + Note that \xhh specifies one byte rather than one character in UTF-8 + mode; this makes it possible to construct invalid UTF-8 sequences for + testing purposes. On the other hand, \x{hh} is interpreted as a UTF-8 + character in UTF-8 mode, generating more than one byte if the value is + greater than 127. When testing the 8-bit library not in UTF-8 mode, + \x{hh} generates one byte for values less than 256, and causes an error + for greater values. + + In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it + possible to construct invalid UTF-16 sequences for testing purposes. + + In UTF-32 mode, all 4- to 8-digit \x{...} values are accepted. This + makes it possible to construct invalid UTF-32 sequences for testing + purposes. + + There is a special backslash sequence that specifies replication of one + or more characters: + + \[]{} + + This makes it possible to test long strings without having to provide + them as part of the file. For example: + + \[abc]{4} + + is converted to "abcabcabcabc". This feature does not support nesting. + To include a closing square bracket in the characters, code it as \x5D. 
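The replication sequence described above can be illustrated with a minimal Python sketch. This is not pcre2test's own code: the helper name expand_replication and the regular expression are assumptions made for illustration, and the sketch does not process other backslash escapes (so a closing bracket written as \x5D is left untouched here).

```python
import re

# Hypothetical helper, not part of pcre2test: matches \[chars]{count},
# where chars contains no closing square bracket (no nesting).
_REPLICATION = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_replication(line):
    """Replace each \\[chars]{count} item with chars repeated count times."""
    return _REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), line)


if __name__ == "__main__":
    print(expand_replication(r"\[abc]{4}"))
```

Text that does not contain a complete replication item passes through this sketch unchanged.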
+ + A backslash followed by an equals sign marks the end of the subject + string and the start of a modifier list. For example: + + abc\=notbol,notempty + + If the subject string is empty and \= is followed by whitespace, the + line is treated as a comment line, and is not used for matching. For + example: + + \= This is a comment. + abc\= This is an invalid modifier list. + + A backslash followed by any other non-alphanumeric character just es- + capes that character. A backslash followed by anything else causes an + error. However, if the very last character in the line is a backslash + (and there is no modifier list), it is ignored. This gives a way of + passing an empty line as data, since a real empty line terminates the + data input. + + If the subject_literal modifier is set for a pattern, all subject lines + that follow are treated as literals, with no special treatment of back- + slashes. No replication is possible, and any subject modifiers must be + set as defaults by a #subject command. + + +PATTERN MODIFIERS + + There are several types of modifier that can appear in pattern lines. + Except where noted below, they may also be used in #pattern commands. A + pattern's modifier list can add to or override default modifiers that + were set by a previous #pattern command. + + Setting compilation options + + The following modifiers set options for pcre2_compile(). Most of them + set bits in the options argument of that function, but those whose + names start with PCRE2_EXTRA are additional options that are set in the + compile context. For the main options, there are some single-letter ab- + breviations that are the same as Perl options. There is special han- + dling for /x: if a second x is present, PCRE2_EXTENDED is converted + into PCRE2_EXTENDED_MORE as in Perl. A third appearance adds PCRE2_EX- + TENDED as well, though this makes no difference to the way pcre2_com- + pile() behaves. See pcre2api for a description of the effects of these + options. 
+ + allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS + allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES + alt_bsux set PCRE2_ALT_BSUX + alt_circumflex set PCRE2_ALT_CIRCUMFLEX + alt_verbnames set PCRE2_ALT_VERBNAMES + anchored set PCRE2_ANCHORED + auto_callout set PCRE2_AUTO_CALLOUT + bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL + /i caseless set PCRE2_CASELESS + dollar_endonly set PCRE2_DOLLAR_ENDONLY + /s dotall set PCRE2_DOTALL + dupnames set PCRE2_DUPNAMES + endanchored set PCRE2_ENDANCHORED + escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF + /x extended set PCRE2_EXTENDED + /xx extended_more set PCRE2_EXTENDED_MORE + extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX + firstline set PCRE2_FIRSTLINE + literal set PCRE2_LITERAL + match_line set PCRE2_EXTRA_MATCH_LINE + match_invalid_utf set PCRE2_MATCH_INVALID_UTF + match_unset_backref set PCRE2_MATCH_UNSET_BACKREF + match_word set PCRE2_EXTRA_MATCH_WORD + /m multiline set PCRE2_MULTILINE + never_backslash_c set PCRE2_NEVER_BACKSLASH_C + never_ucp set PCRE2_NEVER_UCP + never_utf set PCRE2_NEVER_UTF + /n no_auto_capture set PCRE2_NO_AUTO_CAPTURE + no_auto_possess set PCRE2_NO_AUTO_POSSESS + no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR + no_start_optimize set PCRE2_NO_START_OPTIMIZE + no_utf_check set PCRE2_NO_UTF_CHECK + ucp set PCRE2_UCP + ungreedy set PCRE2_UNGREEDY + use_offset_limit set PCRE2_USE_OFFSET_LIMIT + utf set PCRE2_UTF + + As well as turning on the PCRE2_UTF option, the utf modifier causes all + non-printing characters in output strings to be printed using the + \x{hh...} notation. Otherwise, those less than 0x100 are output in hex + without the curly brackets. Setting utf in 16-bit or 32-bit mode also + causes pattern and subject strings to be translated to UTF-16 or + UTF-32, respectively, before being passed to library functions. + + Setting compilation controls + + The following modifiers affect the compilation process or request in- + formation about the pattern. 
There are single-letter abbreviations for + some that are heavily used in the test files. + + bsr=[anycrlf|unicode] specify \R handling + /B bincode show binary code without lengths + callout_info show callout information + convert= request foreign pattern conversion + convert_glob_escape=c set glob escape character + convert_glob_separator=c set glob separator character + convert_length set convert buffer length + debug same as info,fullbincode + framesize show matching frame size + fullbincode show binary code with lengths + /I info show info about compiled pattern + hex unquoted characters are hexadecimal + jit[=] use JIT + jitfast use JIT fast path + jitverify verify JIT use + locale= use this locale + max_pattern_length= set the maximum pattern length + memory show memory used + newline= set newline type + null_context compile with a NULL context + parens_nest_limit= set maximum parentheses depth + posix use the POSIX API + posix_nosub use the POSIX API with REG_NOSUB + push push compiled pattern onto the stack + pushcopy push a copy onto the stack + stackguard= test the stackguard feature + subject_literal treat all subject lines as literal + tables=[0|1|2|3] select internal tables + use_length do not zero-terminate the pattern + utf8_input treat input as UTF-8 + + The effects of these modifiers are described in the following sections. + + Newline and \R handling + + The bsr modifier specifies what \R in a pattern should match. If it is + set to "anycrlf", \R matches CR, LF, or CRLF only. If it is set to + "unicode", \R matches any Unicode newline sequence. The default can be + specified when PCRE2 is built; if it is not, the default is set to Uni- + code. + + The newline modifier specifies which characters are to be interpreted + as newlines, both in the pattern and in subject lines. The type must be + one of CR, LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case). 
+ + Information about a pattern + + The debug modifier is a shorthand for info,fullbincode, requesting all + available information. + + The bincode modifier causes a representation of the compiled code to be + output after compilation. This information does not contain length and + offset values, which ensures that the same output is generated for dif- + ferent internal link sizes and different code unit widths. By using + bincode, the same regression tests can be used in different environ- + ments. + + The fullbincode modifier, by contrast, does include length and offset + values. This is used in a few special tests that run only for specific + code unit widths and link sizes, and is also useful for one-off tests. + + The info modifier requests information about the compiled pattern + (whether it is anchored, has a fixed first character, and so on). The + information is obtained from the pcre2_pattern_info() function. Here + are some typical examples: + + re> /(?i)(^a|^b)/m,info + Capture group count = 1 + Compile options: multiline + Overall options: caseless multiline + First code unit at start or follows newline + Subject length lower bound = 1 + + re> /(?i)abc/info + Capture group count = 0 + Compile options: + Overall options: caseless + First code unit = 'a' (caseless) + Last code unit = 'c' (caseless) + Subject length lower bound = 3 + + "Compile options" are those specified by modifiers; "overall options" + have added options that are taken or deduced from the pattern. If both + sets of options are the same, just a single "options" line is output; + if there are no options, the line is omitted. "First code unit" is + where any match must start; if there is more than one they are listed + as "starting code units". "Last code unit" is the last literal code + unit that must be present in any match. This is not necessarily the + last character. These lines are omitted if no starting or ending code + units are recorded. 
The subject length line is omitted when + no_start_optimize is set because the minimum length is not calculated + when it can never be used. + + The framesize modifier shows the size, in bytes, of the storage frames + used by pcre2_match() for handling backtracking. The size depends on + the number of capturing parentheses in the pattern. + + The callout_info modifier requests information about all the callouts + in the pattern. A list of them is output at the end of any other infor- + mation that is requested. For each callout, either its number or string + is given, followed by the item that follows it in the pattern. + + Passing a NULL context + + Normally, pcre2test passes a context block to pcre2_compile(). If the + null_context modifier is set, however, NULL is passed. This is for + testing that pcre2_compile() behaves correctly in this case (it uses + default values). + + Specifying pattern characters in hexadecimal + + The hex modifier specifies that the characters of the pattern, except + for substrings enclosed in single or double quotes, are to be inter- + preted as pairs of hexadecimal digits. This feature is provided as a + way of creating patterns that contain binary zeros and other non-print- + ing characters. White space is permitted between pairs of digits. For + example, this pattern contains three characters: + + /ab 32 59/hex + + Parts of such a pattern are taken literally if quoted. This pattern + contains nine characters, only two of which are specified in hexadeci- + mal: + + /ab "literal" 32/hex + + Either single or double quotes may be used. There is no way of includ- + ing the delimiter within a substring. The hex and expand modifiers are + mutually exclusive. + + Specifying the pattern's length + + By default, patterns are passed to the compiling functions as zero-ter- + minated strings but can be passed by length instead of being zero-ter- + minated. The use_length modifier causes this to happen. 
Using a length + happens automatically (whether or not use_length is set) when hex is + set, because patterns specified in hexadecimal may contain binary ze- + ros. + + If hex or use_length is used with the POSIX wrapper API (see "Using the + POSIX wrapper API" below), the REG_PEND extension is used to pass the + pattern's length. + + Specifying wide characters in 16-bit and 32-bit modes + + In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 + and translated to UTF-16 or UTF-32 when the utf modifier is set. For + testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input + modifier can be used. It is mutually exclusive with utf. Input lines + are interpreted as UTF-8 as a means of specifying wide characters. More + details are given in "Input encoding" above. + + Generating long repetitive patterns + + Some tests use long patterns that are very repetitive. Instead of cre- + ating a very long input line for such a pattern, you can use a special + repetition feature, similar to the one described for subject lines + above. If the expand modifier is present on a pattern, parts of the + pattern that have the form + + \[]{} + + are expanded before the pattern is passed to pcre2_compile(). For exam- + ple, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction + cannot be nested. An initial "\[" sequence is recognized only if "]{" + followed by decimal digits and "}" is found later in the pattern. If + not, the characters remain in the pattern unaltered. The expand and hex + modifiers are mutually exclusive. + + If part of an expanded pattern looks like an expansion, but is really + part of the actual pattern, unwanted expansion can be avoided by giving + two values in the quantifier. For example, \[AB]{6000,6000} is not rec- + ognized as an expansion item. + + If the info modifier is set on an expanded pattern, the result of the + expansion is included in the information that is output. 
+ + JIT compilation + + Just-in-time (JIT) compiling is a heavyweight optimization that can + greatly speed up pattern matching. See the pcre2jit documentation for + details. JIT compiling happens, optionally, after a pattern has been + successfully compiled into an internal form. The JIT compiler converts + this to optimized machine code. It needs to know whether the match-time + options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, + because different code is generated for the different cases. See the + partial modifier in "Subject Modifiers" below for details of how these + options are specified for each match attempt. + + JIT compilation is requested by the jit pattern modifier, which may op- + tionally be followed by an equals sign and a number in the range 0 to + 7. The three bits that make up the number specify which of the three + JIT operating modes are to be compiled: + + 1 compile JIT code for non-partial matching + 2 compile JIT code for soft partial matching + 4 compile JIT code for hard partial matching + + The possible values for the jit modifier are therefore: + + 0 disable JIT + 1 normal matching only + 2 soft partial matching only + 3 normal and soft partial matching + 4 hard partial matching only + 6 soft and hard partial matching only + 7 all three modes + + If no number is given, 7 is assumed. The phrase "partial matching" + means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the + PCRE2_PARTIAL_HARD option set. Note that such a call may return a com- + plete match; the options enable the possibility of a partial match, but + do not require it. Note also that if you request JIT compilation only + for partial matching (for example, jit=2) but do not set the partial + modifier on a subject line, that match will not use JIT code because + none was compiled for non-partial matching. 
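The three bits of the jit value can be read as a small bitmask. The Python sketch below is illustrative only: the constant names, the function jit_code_available, and the "none"/"soft"/"hard" labels are assumptions made for this example, and the sketch ignores other reasons why JIT code might not be used (such as incompatible run-time options).

```python
# Bit values for the jit modifier, as listed in the text above.
JIT_NORMAL = 1        # non-partial matching
JIT_PARTIAL_SOFT = 2  # soft partial matching
JIT_PARTIAL_HARD = 4  # hard partial matching

# Hypothetical labels for the kind of match being attempted.
_NEEDED_BIT = {"none": JIT_NORMAL, "soft": JIT_PARTIAL_SOFT, "hard": JIT_PARTIAL_HARD}


def jit_code_available(jit_value: int, partial: str = "none") -> bool:
    """Return True if JIT code was compiled for the given kind of match."""
    if not 0 <= jit_value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return bool(jit_value & _NEEDED_BIT[partial])


print(jit_code_available(7))          # True: all three modes were compiled
print(jit_code_available(2))          # False: jit=2 without a partial modifier
print(jit_code_available(2, "soft"))  # True: soft partial code was compiled
```

The second call reproduces the case noted above: with jit=2 and no partial modifier on the subject line, no JIT code exists for the match.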
+ + If JIT compilation is successful, the compiled JIT code will automati- + cally be used when an appropriate type of match is run, except when in- + compatible run-time options are specified. For more details, see the + pcre2jit documentation. See also the jitstack modifier below for a way + of setting the size of the JIT stack. + + If the jitfast modifier is specified, matching is done using the JIT + "fast path" interface, pcre2_jit_match(), which skips some of the san- + ity checks that are done by pcre2_match(), and of course does not work + when JIT is not supported. If jitfast is specified without jit, jit=7 + is assumed. + + If the jitverify modifier is specified, information about the compiled + pattern shows whether JIT compilation was or was not successful. If + jitverify is specified without jit, jit=7 is assumed. If JIT compila- + tion is successful when jitverify is set, the text "(JIT)" is added to + the first output line after a match or non match when JIT-compiled code + was actually used in the match. + + Setting a locale + + The locale modifier must specify the name of a locale, for example: + + /pattern/locale=fr_FR + + The given locale is set, pcre2_maketables() is called to build a set of + character tables for the locale, and this is then passed to pcre2_com- + pile() when compiling the regular expression. The same tables are used + when matching the following subject lines. The locale modifier applies + only to the pattern on which it appears, but can be given in a #pattern + command if a default is needed. Setting a locale and alternate charac- + ter tables are mutually exclusive. + + Showing pattern memory + + The memory modifier causes the size in bytes of the memory used to hold + the compiled pattern to be output. This does not include the size of + the pcre2_code block; it is just the actual compiled data. If the pat- + tern is subsequently passed to the JIT compiler, the size of the JIT + compiled code is also output. 
Here is an example: + + re> /a(b)c/jit,memory + Memory allocation (code space): 21 + Memory allocation (JIT code): 1910 + + + Limiting nested parentheses + + The parens_nest_limit modifier sets a limit on the depth of nested + parentheses in a pattern. Breaching the limit causes a compilation er- + ror. The default for the library is set when PCRE2 is built, but + pcre2test sets its own default of 220, which is required for running + the standard test suite. + + Limiting the pattern length + + The max_pattern_length modifier sets a limit, in code units, to the + length of pattern that pcre2_compile() will accept. Breaching the limit + causes a compilation error. The default is the largest number a + PCRE2_SIZE variable can hold (essentially unlimited). + + Using the POSIX wrapper API + + The posix and posix_nosub modifiers cause pcre2test to call PCRE2 via + the POSIX wrapper API rather than its native API. When posix_nosub is + used, the POSIX option REG_NOSUB is passed to regcomp(). The POSIX + wrapper supports only the 8-bit library. Note that it does not imply + POSIX matching semantics; for more detail see the pcre2posix documenta- + tion. The following pattern modifiers set options for the regcomp() + function: + + caseless REG_ICASE + multiline REG_NEWLINE + dotall REG_DOTALL ) + ungreedy REG_UNGREEDY ) These options are not part of + ucp REG_UCP ) the POSIX standard + utf REG_UTF8 ) + + The regerror_buffsize modifier specifies a size for the error buffer + that is passed to regerror() in the event of a compilation error. For + example: + + /abc/posix,regerror_buffsize=20 + + This provides a means of testing the behaviour of regerror() when the + buffer is too small for the error message. If this modifier has not + been set, a large buffer is used. + + The aftertext and allaftertext subject modifiers work as described be- + low. All other modifiers are either ignored, with a warning message, or + cause an error. 
+ + The pattern is passed to regcomp() as a zero-terminated string by de- + fault, but if the use_length or hex modifiers are set, the REG_PEND ex- + tension is used to pass it by length. + + Testing the stack guard feature + + The stackguard modifier is used to test the use of pcre2_set_com- + pile_recursion_guard(), a function that is provided to enable stack + availability to be checked during compilation (see the pcre2api docu- + mentation for details). If the number specified by the modifier is + greater than zero, pcre2_set_compile_recursion_guard() is called to set + up callback from pcre2_compile() to a local function. The argument it + receives is the current nesting parenthesis depth; if this is greater + than the value given by the modifier, non-zero is returned, causing the + compilation to be aborted. + + Using alternative character tables + + The value specified for the tables modifier must be one of the digits + 0, 1, 2, or 3. It causes a specific set of built-in character tables to + be passed to pcre2_compile(). This is used in the PCRE2 tests to check + behaviour with different character tables. The digit specifies the ta- + bles as follows: + + 0 do not pass any special character tables + 1 the default ASCII tables, as distributed in + pcre2_chartables.c.dist + 2 a set of tables defining ISO 8859 characters + 3 a set of tables loaded by the #loadtables command + + In tables 2, some characters whose codes are greater than 128 are iden- + tified as letters, digits, spaces, etc. Tables 3 can be used only after + a #loadtables command has loaded them from a binary file. Setting al- + ternate character tables and a locale are mutually exclusive. + + Setting certain match controls + + The following modifiers are really subject modifiers, and are described + under "Subject Modifiers" below. However, they may be included in a + pattern's modifier list, in which case they are applied to every sub- + ject line that is processed with that pattern. 
These modifiers do not + affect the compilation process. + + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allvector show the entire ovector + allusedtext show all consulted text + altglobal alternative global matching + /g global global matching + jitstack=<n> set size of JIT stack + mark show mark values + replace=<string> specify a replacement string + startchar show starting character when relevant + substitute_callout use substitution callouts + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_literal use PCRE2_SUBSTITUTE_LITERAL + substitute_matched use PCRE2_SUBSTITUTE_MATCHED + substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY + substitute_skip=<n> skip substitution <n> + substitute_stop=<n> skip substitution <n> and following + substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY + + These modifiers may not appear in a #pattern command. If you want them + as defaults, set them in a #subject command. + + Specifying literal subject lines + + If the subject_literal modifier is present on a pattern, all the sub- + ject lines that it matches are taken as literal strings, with no inter- + pretation of backslashes. It is not possible to set subject modifiers + on such lines, but any that are set as defaults by a #subject command + are recognized. + + Saving a compiled pattern + + When a pattern with the push modifier is successfully compiled, it is + pushed onto a stack of compiled patterns, and pcre2test expects the + next line to contain a new pattern (or a command) instead of a subject + line. This facility is used when saving compiled patterns to a file, as + described in the section entitled "Saving and restoring compiled pat- + terns" below.
If pushcopy is used instead of push, a copy of the com- + piled pattern is stacked, leaving the original as current, ready to + match the following input lines. This provides a way of testing the + pcre2_code_copy() function. The push and pushcopy modifiers are in- + compatible with compilation modifiers such as global that act at match + time. Any that are specified are ignored (for the stacked copy), with a + warning message, except for replace, which causes an error. Note that + jitverify, which is allowed, does not carry through to any subsequent + matching that uses a stacked pattern. + + Testing foreign pattern conversion + + The experimental foreign pattern conversion functions in PCRE2 can be + tested by setting the convert modifier. Its argument is a colon-sepa- + rated list of options, which set the equivalent option for the + pcre2_pattern_convert() function: + + glob PCRE2_CONVERT_GLOB + glob_no_starstar PCRE2_CONVERT_GLOB_NO_STARSTAR + glob_no_wild_separator PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR + posix_basic PCRE2_CONVERT_POSIX_BASIC + posix_extended PCRE2_CONVERT_POSIX_EXTENDED + unset Unset all options + + The "unset" value is useful for turning off a default that has been set + by a #pattern command. When one of these options is set, the input pat- + tern is passed to pcre2_pattern_convert(). If the conversion is suc- + cessful, the result is reflected in the output and then passed to + pcre2_compile(). The normal utf and no_utf_check options, if set, cause + the PCRE2_CONVERT_UTF and PCRE2_CONVERT_NO_UTF_CHECK options to be + passed to pcre2_pattern_convert(). + + By default, the conversion function is allowed to allocate a buffer for + its output. However, if the convert_length modifier is set to a value + greater than zero, pcre2test passes a buffer of the given length. This + makes it possible to test the length check. 
+ + The convert_glob_escape and convert_glob_separator modifiers can be + used to specify the escape and separator characters for glob process- + ing, overriding the defaults, which are operating-system dependent. + + +SUBJECT MODIFIERS + + The modifiers that can appear in subject lines and the #subject command + are of two types. + + Setting match options + + The following modifiers set options for pcre2_match() or + pcre2_dfa_match(). See pcre2api for a description of their effects. + + anchored set PCRE2_ANCHORED + endanchored set PCRE2_ENDANCHORED + dfa_restart set PCRE2_DFA_RESTART + dfa_shortest set PCRE2_DFA_SHORTEST + no_jit set PCRE2_NO_JIT + no_utf_check set PCRE2_NO_UTF_CHECK + notbol set PCRE2_NOTBOL + notempty set PCRE2_NOTEMPTY + notempty_atstart set PCRE2_NOTEMPTY_ATSTART + noteol set PCRE2_NOTEOL + partial_hard (or ph) set PCRE2_PARTIAL_HARD + partial_soft (or ps) set PCRE2_PARTIAL_SOFT + + The partial matching modifiers are provided with abbreviations because + they appear frequently in tests. + + If the posix or posix_nosub modifier was present on the pattern, caus- + ing the POSIX wrapper API to be used, the only option-setting modifiers + that have any effect are notbol, notempty, and noteol, causing REG_NOT- + BOL, REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to + regexec(). The other modifiers are ignored, with a warning message. + + There is one additional modifier that can be used with the POSIX wrap- + per. It is ignored (with a warning) if used for non-POSIX matching. + + posix_startend=<n>[:<m>] + + This causes the subject string to be passed to regexec() using the + REG_STARTEND option, which uses offsets to specify which part of the + string is searched. If only one number is given, the end offset is + passed as the end of the subject string. For more detail of REG_STAR- + TEND, see the pcre2posix documentation.
If the subject string contains + binary zeros (coded as escapes such as \x{00} because pcre2test does + not support actual binary zeros in its input), you must use posix_star- + tend to specify its length. + + Setting match controls + + The following modifiers affect the matching process or request addi- + tional information. Some of them may also be specified on a pattern + line (see above), in which case they apply to every subject line that + is matched against that pattern, but can be overridden by modifiers on + the subject. + + aftertext show text after match + allaftertext show text after captures + allcaptures show all captures + allvector show the entire ovector + allusedtext show all consulted text (non-JIT only) + altglobal alternative global matching + callout_capture show captures at callout time + callout_data=<n> set a value to pass via callouts + callout_error=<n>[:<m>] control callout error + callout_extra show extra callout information + callout_fail=<n>[:<m>] control callout failure + callout_no_where do not show position of a callout + callout_none do not supply a callout function + copy=<number or name> copy captured substring + depth_limit=<n> set a depth limit + dfa use pcre2_dfa_match() + find_limits find match and depth limits + get=<number or name> extract captured substring + getall extract all captured substrings + /g global global matching + heap_limit=<n> set a limit on heap memory (Kbytes) + jitstack=<n> set size of JIT stack + mark show mark values + match_limit=<n> set a match limit + memory show heap memory usage + null_context match with a NULL context + offset=<n> set starting offset + offset_limit=<n> set offset limit + ovector=<n> set size of output vector + recursion_limit=<n> obsolete synonym for depth_limit + replace=<string> specify a replacement string + startchar show startchar when relevant + startoffset=<n> same as offset=<n> + substitute_callout use substitution callouts + substitute_extended use PCRE2_SUBSTITUTE_EXTENDED + substitute_literal use PCRE2_SUBSTITUTE_LITERAL + substitute_matched use
PCRE2_SUBSTITUTE_MATCHED + substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_replacement_only use PCRE2_SUBSTITUTE_REPLACEMENT_ONLY + substitute_skip=<n> skip substitution number n + substitute_stop=<n> skip substitution number n and greater + substitute_unknown_unset use PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty use PCRE2_SUBSTITUTE_UNSET_EMPTY + zero_terminate pass the subject as zero-terminated + + The effects of these modifiers are described in the following sections. + When matching via the POSIX wrapper API, the aftertext, allaftertext, + and ovector subject modifiers work as described below. All other modi- + fiers are either ignored, with a warning message, or cause an error. + + Showing more text + + The aftertext modifier requests that as well as outputting the part of + the subject string that matched the entire pattern, pcre2test should in + addition output the remainder of the subject string. This is useful for + tests where the subject contains multiple copies of the same substring. + The allaftertext modifier requests the same action for captured sub- + strings as well as the main matched substring. In each case the remain- + der is output on the following line with a plus character following the + capture number. + + The allusedtext modifier requests that all the text that was consulted + during a successful pattern match by the interpreter should be shown, + for both full and partial matches. This feature is not supported for + JIT matching, and if requested with JIT it is ignored (with a warning + message). Setting this modifier affects the output if there is a look- + behind at the start of a match, or, for a complete match, a lookahead + at the end, or if \K is used in the pattern. Characters that precede or + follow the start and end of the actual match are indicated in the out- + put by '<' or '>' characters underneath them.
Here is an example: + + re> /(?<=pqr)abc(?=xyz)/ + data> 123pqrabcxyz456\=allusedtext + 0: pqrabcxyz + <<< >>> + data> 123pqrabcxy\=ph,allusedtext + Partial match: pqrabcxy + <<< + + The first, complete match shows that the matched string is "abc", with + the preceding and following strings "pqr" and "xyz" having been con- + sulted during the match (when processing the assertions). The partial + match can indicate only the preceding string. + + The startchar modifier requests that the starting character for the + match be indicated, if it is different to the start of the matched + string. The only time when this occurs is when \K has been processed as + part of the match. In this situation, the output for the matched string + is displayed from the starting character instead of from the match + point, with circumflex characters under the earlier characters. For ex- + ample: + + re> /abc\Kxyz/ + data> abcxyz\=startchar + 0: abcxyz + ^^^ + + Unlike allusedtext, the startchar modifier can be used with JIT. How- + ever, these two modifiers are mutually exclusive. + + Showing the value of all capture groups + + The allcaptures modifier requests that the values of all potential cap- + tured parentheses be output after a match. By default, only those up to + the highest one actually used in the match are output (corresponding to + the return code from pcre2_match()). Groups that did not take part in + the match are output as "<unset>". This modifier is not relevant for + DFA matching (which does no capturing) and does not apply when replace + is specified; it is ignored, with a warning message, if present. + + Showing the entire ovector, for all outcomes + + The allvector modifier requests that the entire ovector be shown, what- + ever the outcome of the match. Compare allcaptures, which shows only up + to the maximum number of capture groups for the pattern, and then only + for a successful complete non-DFA match.
This modifier, which acts af- + ter any match result, and also for DFA matching, provides a means of + checking that there are no unexpected modifications to ovector fields. + Before each match attempt, the ovector is filled with a special value, + and if this is found in both elements of a capturing pair, "<unchanged>" + is output. After a successful match, this applies to all + groups after the maximum capture group for the pattern. In other cases + it applies to the entire ovector. After a partial match, the first two + elements are the only ones that should be set. After a DFA match, the + amount of ovector that is used depends on the number of matches that + were found. + + Testing pattern callouts + + A callout function is supplied when pcre2test calls the library match- + ing functions, unless callout_none is specified. Its behaviour can be + controlled by various modifiers listed above whose names begin with + callout_. Details are given in the section entitled "Callouts" below. + Testing callouts from pcre2_substitute() is described separately in + "Testing the substitution function" below. + + Finding all matches in a string + + Searching for all possible matches within a subject can be requested by + the global or altglobal modifier. After finding a match, the matching + function is called again to search the remainder of the subject. The + difference between global and altglobal is that the former uses the + start_offset argument to pcre2_match() or pcre2_dfa_match() to start + searching at a new point within the entire string (which is what Perl + does), whereas the latter passes over a shortened subject. This makes a + difference to the matching process if the pattern begins with a lookbe- + hind assertion (including \b or \B). + + If an empty string is matched, the next match is done with the + PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search + for another, non-empty, match at the same point in the subject.
If this + match fails, the start offset is advanced, and the normal match is re- + tried. This imitates the way Perl handles such cases when using the /g + modifier or the split() function. Normally, the start offset is ad- + vanced by one character, but if the newline convention recognizes CRLF + as a newline, and the current character is CR followed by LF, an ad- + vance of two characters occurs. + + Testing substring extraction functions + + The copy and get modifiers can be used to test the pcre2_sub- + string_copy_xxx() and pcre2_substring_get_xxx() functions. They can be + given more than once, and each can specify a capture group name or num- + ber, for example: + + abcd\=copy=1,copy=3,get=G1 + + If the #subject command is used to set default copy and/or get lists, + these can be unset by specifying a negative number to cancel all num- + bered groups and an empty name to cancel all named groups. + + The getall modifier tests pcre2_substring_list_get(), which extracts + all captured substrings. + + If the subject line is successfully matched, the substrings extracted + by the convenience functions are output with C, G, or L after the + string number instead of a colon. This is in addition to the normal + full list. The string length (that is, the return from the extraction + function) is given in parentheses after each substring, followed by the + name when the extraction was by name. + + Testing the substitution function + + If the replace modifier is set, the pcre2_substitute() function is + called instead of one of the matching functions (or after one call of + pcre2_match() in the case of PCRE2_SUBSTITUTE_MATCHED). Note that re- + placement strings cannot contain commas, because a comma signifies the + end of a modifier. This is not thought to be an issue in a test pro- + gram. + + Specifying a completely empty replacement string disables this modi- + fier. 
However, it is possible to specify an empty replacement by pro- + viding a buffer length, as described below, for an otherwise empty re- + placement. + + Unlike subject strings, pcre2test does not process replacement strings + for escape sequences. In UTF mode, a replacement string is checked to + see if it is a valid UTF-8 string. If so, it is correctly converted to + a UTF string of the appropriate code unit width. If it is not a valid + UTF-8 string, the individual code units are copied directly. This pro- + vides a means of passing an invalid UTF-8 string for testing purposes. + + The following modifiers set options (in addition to the normal match + options) for pcre2_substitute(): + + global PCRE2_SUBSTITUTE_GLOBAL + substitute_extended PCRE2_SUBSTITUTE_EXTENDED + substitute_literal PCRE2_SUBSTITUTE_LITERAL + substitute_matched PCRE2_SUBSTITUTE_MATCHED + substitute_overflow_length PCRE2_SUBSTITUTE_OVERFLOW_LENGTH + substitute_replacement_only PCRE2_SUBSTITUTE_REPLACEMENT_ONLY + substitute_unknown_unset PCRE2_SUBSTITUTE_UNKNOWN_UNSET + substitute_unset_empty PCRE2_SUBSTITUTE_UNSET_EMPTY + + See the pcre2api documentation for details of these options. + + After a successful substitution, the modified string is output, pre- + ceded by the number of replacements. This may be zero if there were no + matches. Here is a simple example of a substitution test: + + /abc/replace=xxx + =abc=abc= + 1: =xxx=abc= + =abc=abc=\=global + 2: =xxx=xxx= + + Subject and replacement strings should be kept relatively short (fewer + than 256 characters) for substitution tests, as fixed-size buffers are + used. To make it easy to test for buffer overflow, if the replacement + string starts with a number in square brackets, that number is passed + to pcre2_substitute() as the size of the output buffer, with the re- + placement string starting at the next character.
Here is an example + that tests the edge case: + + /abc/ + 123abc123\=replace=[10]XYZ + 1: 123XYZ123 + 123abc123\=replace=[9]XYZ + Failed: error -47: no more memory + + The default action of pcre2_substitute() is to return PCRE2_ER- + ROR_NOMEMORY when the output buffer is too small. However, if the + PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the substi- + tute_overflow_length modifier), pcre2_substitute() continues to go + through the motions of matching and substituting (but not doing any + callouts), in order to compute the size of buffer that is required. + When this happens, pcre2test shows the required buffer length (which + includes space for the trailing zero) as part of the error message. For + example: + + /abc/substitute_overflow_length + 123abc123\=replace=[9]XYZ + Failed: error -47: no more memory: 10 code units are needed + + A replacement string is ignored with POSIX and DFA matching. Specifying + partial matching provokes an error return ("bad option value") from + pcre2_substitute(). + + Testing substitute callouts + + If the substitute_callout modifier is set, a substitution callout func- + tion is set up. The null_context modifier must not be set, because the + address of the callout function is passed in a match context. When the + callout function is called (after each substitution), details of + the input and output strings are output. For example: + + /abc/g,replace=<$0>,substitute_callout + abcdefabcpqr + 1(1) Old 0 3 "abc" New 0 5 "<abc>" + 2(1) Old 6 9 "abc" New 8 13 "<abc>" + 2: <abc>def<abc>pqr + + The first number on each callout line is the count of matches. The + parenthesized number is the number of pairs that are set in the ovector + (that is, one more than the number of capturing groups that were set). + Then are listed the offsets of the old substring, its contents, and the + same for the replacement.
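The relationship between the old and new offsets on each callout line can be worked through with a small Python sketch. It is an illustration under stated assumptions, not a rendering of pcre2_substitute(): the function name substitute_with_callouts is hypothetical, Python's re module stands in for PCRE2, the replacement is treated as literal text (so "<abc>" is passed where the test above uses <$0>), and the pair count is approximated from the highest group that matched.

```python
import re


def substitute_with_callouts(pattern: str, subject: str, replacement: str):
    """Globally replace matches, recording old and new offsets for each one."""
    pieces = []    # parts of the output string, in order
    out_len = 0    # length of the output built so far
    last = 0       # end of the previous match in the subject
    lines = []     # one callout-style line per substitution
    for count, m in enumerate(re.finditer(pattern, subject), start=1):
        # Copy the unmatched text between the previous match and this one.
        pieces.append(subject[last:m.start()])
        out_len += m.start() - last
        new_start, new_end = out_len, out_len + len(replacement)
        pairs = 1 + (m.lastindex or 0)  # whole match plus highest set group
        lines.append(
            f'{count}({pairs}) Old {m.start()} {m.end()} "{m.group(0)}" '
            f'New {new_start} {new_end} "{replacement}"'
        )
        pieces.append(replacement)
        out_len = new_end
        last = m.end()
    pieces.append(subject[last:])
    return lines, "".join(pieces)


lines, result = substitute_with_callouts("abc", "abcdefabcpqr", "<abc>")
for line in lines:
    print(line)
print(f"{len(lines)}: {result}")
```

The second replacement starts at output offset 8 rather than 6 because the first replacement is two code units longer than the text it replaced.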
+ + By default, the substitution callout function returns zero, which ac- + cepts the replacement and causes matching to continue if /g was used. + Two further modifiers can be used to test other return values. If sub- + stitute_skip is set to a value greater than zero the callout function + returns +1 for the match of that number, and similarly substitute_stop + returns -1. These cause the replacement to be rejected, and -1 causes + no further matching to take place. If either of them is set, substi- + tute_callout is assumed. For example: + + /abc/g,replace=<$0>,substitute_skip=1 + abcdefabcpqr + 1(1) Old 0 3 "abc" New 0 5 "<abc> SKIPPED" + 2(1) Old 6 9 "abc" New 6 11 "<abc>" + 2: abcdef<abc>pqr + abcdefabcpqr\=substitute_stop=1 + 1(1) Old 0 3 "abc" New 0 5 "<abc> STOPPED" + 1: abcdefabcpqr + + If both are set for the same number, stop takes precedence. Only a sin- + gle skip or stop is supported, which is sufficient for testing that the + feature works. + + Setting the JIT stack size + + The jitstack modifier provides a way of setting the maximum stack size + that is used by the just-in-time optimization code. It is ignored if + JIT optimization is not being used. The value is a number of kibibytes + (units of 1024 bytes). Setting zero reverts to the default of 32KiB. + Providing a stack that is larger than the default is necessary only for + very complicated patterns. If jitstack is set non-zero on a subject + line it overrides any value that was set on the pattern. + + Setting heap, match, and depth limits + + The heap_limit, match_limit, and depth_limit modifiers set the appro- + priate limits in the match context. These values are ignored when the + find_limits modifier is specified.
+ + Finding minimum limits + + If the find_limits modifier is present on a subject line, pcre2test + calls the relevant matching function several times, setting different + values in the match context via pcre2_set_heap_limit(), + pcre2_set_match_limit(), or pcre2_set_depth_limit() until it finds the + minimum values for each parameter that allows the match to complete + without error. If JIT is being used, only the match limit is relevant. + + When using this modifier, the pattern should not contain any limit set- + tings such as (*LIMIT_MATCH=...) within it. If such a setting is + present and is lower than the minimum matching value, the minimum value + cannot be found because pcre2_set_match_limit() etc. are only able to + reduce the value of an in-pattern limit; they cannot increase it. + + For non-DFA matching, the minimum depth_limit number is a measure of + how much nested backtracking happens (that is, how deeply the pattern's + tree is searched). In the case of DFA matching, depth_limit controls + the depth of recursive calls of the internal function that is used for + handling pattern recursion, lookaround assertions, and atomic groups. + + For non-DFA matching, the match_limit number is a measure of the amount + of backtracking that takes place, and learning the minimum value can be + instructive. For most simple matches, the number is quite small, but + for patterns with very large numbers of matching possibilities, it can + become large very quickly with increasing length of subject string. In + the case of DFA matching, match_limit controls the total number of + calls, both recursive and non-recursive, to the internal matching func- + tion, thus controlling the overall amount of computing resource that is + used. + + For both kinds of matching, the heap_limit number, which is in + kibibytes (units of 1024 bytes), limits the amount of heap memory used + for matching. 
A value of zero disables the use of any heap memory; many + simple pattern matches can be done without using the heap, so zero is + not an unreasonable setting. + + Showing MARK names + + + The mark modifier causes the names from backtracking control verbs that + are returned from calls to pcre2_match() to be displayed. If a mark is + returned for a match, non-match, or partial match, pcre2test shows it. + For a match, it is on a line by itself, tagged with "MK:". Otherwise, + it is added to the non-match message. + + Showing memory usage + + The memory modifier causes pcre2test to log the sizes of all heap mem- + ory allocation and freeing calls that occur during a call to + pcre2_match() or pcre2_dfa_match(). These occur only when a match re- + quires a bigger vector than the default for remembering backtracking + points (pcre2_match()) or for internal workspace (pcre2_dfa_match()). + In many cases there will be no heap memory used and therefore no addi- + tional output. No heap memory is allocated during matching with JIT, so + in that case the memory modifier never has any effect. For this modi- + fier to work, the null_context modifier must not be set on both the + pattern and the subject, though it can be set on one or the other. + + Setting a starting offset + + The offset modifier sets an offset in the subject string at which + matching starts. Its value is a number of code units, not characters. + + Setting an offset limit + + The offset_limit modifier sets a limit for unanchored matches. If a + match cannot be found starting at or before this offset in the subject, + a "no match" return is given. The data value is a number of code units, + not characters. When this modifier is used, the use_offset_limit modi- + fier must have been set for the pattern; if not, an error is generated. 
+ + Setting the size of the output vector + + The ovector modifier applies only to the subject line in which it ap- + pears, though of course it can also be used to set a default in a #sub- + ject command. It specifies the number of pairs of offsets that are + available for storing matching information. The default is 15. + + A value of zero is useful when testing the POSIX API because it causes + regexec() to be called with a NULL capture vector. When not testing the + POSIX API, a value of zero is used to cause pcre2_match_data_cre- + ate_from_pattern() to be called, in order to create a match block of + exactly the right size for the pattern. (It is not possible to create a + match block with a zero-length ovector; there is always at least one + pair of offsets.) + + Passing the subject as zero-terminated + + By default, the subject string is passed to a native API matching func- + tion with its correct length. In order to test the facility for passing + a zero-terminated string, the zero_terminate modifier is provided. It + causes the length to be passed as PCRE2_ZERO_TERMINATED. When matching + via the POSIX interface, this modifier is ignored, with a warning. + + When testing pcre2_substitute(), this modifier also has the effect of + passing the replacement string as zero-terminated. + + Passing a NULL context + + Normally, pcre2test passes a context block to pcre2_match(), + pcre2_dfa_match(), pcre2_jit_match() or pcre2_substitute(). If the + null_context modifier is set, however, NULL is passed. This is for + testing that the matching and substitution functions behave correctly + in this case (they use default values). This modifier cannot be used + with the find_limits or substitute_callout modifiers. + + +THE ALTERNATIVE MATCHING FUNCTION + + By default, pcre2test uses the standard PCRE2 matching function, + pcre2_match() to match each subject line. 
PCRE2 also supports an alter- + native matching function, pcre2_dfa_match(), which operates in a dif- + ferent way, and has some restrictions. The differences between the two + functions are described in the pcre2matching documentation. + + If the dfa modifier is set, the alternative matching function is used. + This function finds all possible matches at a given point in the sub- + ject. If, however, the dfa_shortest modifier is set, processing stops + after the first match is found. This is always the shortest possible + match. + + +DEFAULT OUTPUT FROM pcre2test + + This section describes the output when the normal matching function, + pcre2_match(), is being used. + + When a match succeeds, pcre2test outputs the list of captured sub- + strings, starting with number 0 for the string that matched the whole + pattern. Otherwise, it outputs "No match" when the return is PCRE2_ER- + ROR_NOMATCH, or "Partial match:" followed by the partially matching + substring when the return is PCRE2_ERROR_PARTIAL. (Note that this is + the entire substring that was inspected during the partial match; it + may include characters before the actual match start if a lookbehind + assertion, \K, \b, or \B was involved.) + + For any other return, pcre2test outputs the PCRE2 negative error number + and a short descriptive phrase. If the error is a failed UTF string + check, the code unit offset of the start of the failing character is + also output. Here is an example of an interactive pcre2test run. + + $ pcre2test + PCRE2 version 10.22 2016-07-29 + + re> /^abc(\d+)/ + data> abc123 + 0: abc123 + 1: 123 + data> xyz + No match + + Unset capturing substrings that are not followed by one that is set are + not shown by pcre2test unless the allcaptures modifier is specified. In + the following example, there are two capturing substrings, but when the + first data line is matched, the second, unset substring is not shown. 
+ An "internal" unset substring is shown as "<unset>", as for the second + data line. + + re> /(a)|(b)/ + data> a + 0: a + 1: a + data> b + 0: b + 1: <unset> + 2: b + + If the strings contain any non-printing characters, they are output as + \xhh escapes if the value is less than 256 and UTF mode is not set. + Otherwise they are output as \x{hh...} escapes. See below for the defi- + nition of non-printing characters. If the aftertext modifier is set, + the output for substring 0 is followed by the rest of the subject + string, identified by "0+" like this: + + re> /cat/aftertext + data> cataract + 0: cat + 0+ aract + + If global matching is requested, the results of successive matching at- + tempts are output in sequence, like this: + + re> /\Bi(\w\w)/g + data> Mississippi + 0: iss + 1: ss + 0: iss + 1: ss + 0: ipp + 1: pp + + "No match" is output only if the first match attempt fails. Here is an + example of a failure message (the offset 4 that is specified by the + offset modifier is past the end of the subject string): + + re> /xyz/ + data> xyz\=offset=4 + Error -24 (bad offset value) + + Note that whereas patterns can be continued over several lines (a plain + ">" prompt is used for continuations), subject lines may not. However, + newlines can be included in a subject by means of the \n escape (or \r, + \r\n, etc., depending on the newline sequence setting). + + +OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION + + When the alternative matching function, pcre2_dfa_match(), is used, the + output consists of a list of all the matches that start at the first + point in the subject where there is at least one match. For example: + + re> /(tang|tangerine|tan)/ + data> yellow tangerine\=dfa + 0: tangerine + 1: tang + 2: tan + + Using the normal matching function on this data finds only "tang". The + longest matching string is always given first (and numbered zero).
Af- + ter a PCRE2_ERROR_PARTIAL return, the output is "Partial match:", fol- + lowed by the partially matching substring. Note that this is the entire + substring that was inspected during the partial match; it may include + characters before the actual match start if a lookbehind assertion, \b, + or \B was involved. (\K is not supported for DFA matching.) + + If global matching is requested, the search for further matches resumes + at the end of the longest match. For example: + + re> /(tang|tangerine|tan)/g + data> yellow tangerine and tangy sultana\=dfa + 0: tangerine + 1: tang + 2: tan + 0: tang + 1: tan + 0: tan + + The alternative matching function does not support substring capture, + so the modifiers that are concerned with captured substrings are not + relevant. + + +RESTARTING AFTER A PARTIAL MATCH + + When the alternative matching function has given the PCRE2_ERROR_PAR- + TIAL return, indicating that the subject partially matched the pattern, + you can restart the match with additional subject data by means of the + dfa_restart modifier. For example: + + re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/ + data> 23ja\=ps,dfa + Partial match: 23ja + data> n05\=dfa,dfa_restart + 0: n05 + + For further information about partial matching, see the pcre2partial + documentation. + + +CALLOUTS + + If the pattern contains any callout requests, pcre2test's callout func- + tion is called during matching unless callout_none is specified. This + works with both matching functions, and with JIT, though there are some + differences in behaviour. The output for callouts with numerical argu- + ments and those with string arguments is slightly different. + + Callouts with numerical arguments + + By default, the callout function displays the callout number, the start + and current positions in the subject text at the callout time, and the + next pattern item to be tested. 
For example: + + --->pqrabcdef + 0 ^ ^ \d + + This output indicates that callout number 0 occurred for a match at- + tempt starting at the fourth character of the subject string, when the + pointer was at the seventh character, and when the next pattern item + was \d. Just one circumflex is output if the start and current posi- + tions are the same, or if the current position precedes the start posi- + tion, which can happen if the callout is in a lookbehind assertion. + + Callouts numbered 255 are assumed to be automatic callouts, inserted as + a result of the auto_callout pattern modifier. In this case, instead of + showing the callout number, the offset in the pattern, preceded by a + plus, is output. For example: + + re> /\d?[A-E]\*/auto_callout + data> E* + --->E* + +0 ^ \d? + +3 ^ [A-E] + +8 ^^ \* + +10 ^ ^ + 0: E* + + If a pattern contains (*MARK) items, an additional line is output when- + ever a change of latest mark is passed to the callout function. For ex- + ample: + + re> /a(*MARK:X)bc/auto_callout + data> abc + --->abc + +0 ^ a + +1 ^^ (*MARK:X) + +10 ^^ b + Latest Mark: X + +11 ^ ^ c + +12 ^ ^ + 0: abc + + The mark changes between matching "a" and "b", but stays the same for + the rest of the match, so nothing more is output. If, as a result of + backtracking, the mark reverts to being unset, the text "<unset>" is + output. + + Callouts with string arguments + + The output for a callout with a string argument is similar, except that + instead of outputting a callout number before the position indicators, + the callout string and its offset in the pattern string are output be- + fore the reflection of the subject string, and the subject string is + reflected for each callout.
For example: + + re> /^ab(?C'first')cd(?C"second")ef/ + data> abcdefg + Callout (7): 'first' + --->abcdefg + ^ ^ c + Callout (20): "second" + --->abcdefg + ^ ^ e + 0: abcdef + + + Callout modifiers + + The callout function in pcre2test returns zero (carry on matching) by + default, but you can use a callout_fail modifier in a subject line to + change this and other parameters of the callout (see below). + + If the callout_capture modifier is set, the current captured groups are + output when a callout occurs. This is useful only for non-DFA matching, + as pcre2_dfa_match() does not support capturing, so no captures are + ever shown. + + The normal callout output, showing the callout number or pattern offset + (as described above) is suppressed if the callout_no_where modifier is + set. + + When using the interpretive matching function pcre2_match() without + JIT, setting the callout_extra modifier causes additional output from + pcre2test's callout function to be generated. For the first callout in + a match attempt at a new starting position in the subject, "New match + attempt" is output. If there has been a backtrack since the last call- + out (or start of matching if this is the first callout), "Backtrack" is + output, followed by "No other matching paths" if the backtrack ended + the previous match attempt. For example: + + re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess + data> aac\=callout_extra + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + +3 ^ ^ ) + +4 ^ ^ b + Backtrack + --->aac + +3 ^^ ) + +4 ^^ b + Backtrack + No other matching paths + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + +3 ^^ ) + +4 ^^ b + Backtrack + No other matching paths + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + Backtrack + No other matching paths + New match attempt + --->aac + +0 ^ ( + +1 ^ a+ + No match + + Notice that various optimizations must be turned off if you want all + possible matching paths to be scanned. 
If no_start_optimize is not + used, there is an immediate "no match", without any callouts, because + the starting optimization fails to find "b" in the subject, which it + knows must be present for any match. If no_auto_possess is not used, + the "a+" item is turned into "a++", which reduces the number of back- + tracks. + + The callout_extra modifier has no effect if used with the DFA matching + function, or with JIT. + + Return values from callouts + + The default return from the callout function is zero, which allows + matching to continue. The callout_fail modifier can be given one or two + numbers. If there is only one number, 1 is returned instead of 0 (caus- + ing matching to backtrack) when a callout of that number is reached. If + two numbers (<n>:<m>) are given, 1 is returned when callout <n> is + reached and there have been at least <m> callouts. The callout_error + modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus- + ing the entire matching process to be aborted. If both these modifiers + are set for the same callout number, callout_error takes precedence. + Note that callouts with string arguments are always given the number + zero. + + The callout_data modifier can be given an unsigned or a negative num- + ber. This is set as the "user data" that is passed to the matching + function, and passed back when the callout function is invoked. Any + value other than zero is used as a return from pcre2test's callout + function. + + Inserting callouts can be helpful when using pcre2test to check compli- + cated regular expressions. For further information about callouts, see + the pcre2callout documentation. + + +NON-PRINTING CHARACTERS + + When pcre2test is outputting text in the compiled version of a pattern, + bytes other than 32-126 are always treated as non-printing characters + and are therefore shown as hex escapes.
+ + When pcre2test is outputting text that is a matched part of a subject + string, it behaves in the same way, unless a different locale has been + set for the pattern (using the locale modifier). In this case, the is- + print() function is used to distinguish printing and non-printing char- + acters. + + +SAVING AND RESTORING COMPILED PATTERNS + + It is possible to save compiled patterns on disc or elsewhere, and + reload them later, subject to a number of restrictions. JIT data cannot + be saved. The host on which the patterns are reloaded must be running + the same version of PCRE2, with the same code unit width, and must also + have the same endianness, pointer width and PCRE2_SIZE type. Before + compiled patterns can be saved they must be serialized, that is, con- + verted to a stream of bytes. A single byte stream may contain any num- + ber of compiled patterns, but they must all use the same character ta- + bles. A single copy of the tables is included in the byte stream (its + size is 1088 bytes). + + The functions whose names begin with pcre2_serialize_ are used for se- + rializing and de-serializing. They are described in the pcre2serialize + documentation. In this section we describe the features of pcre2test + that can be used to test these functions. + + Note that "serialization" in PCRE2 does not convert compiled patterns + to an abstract format like Java or .NET. It just makes a reloadable + byte code stream. Hence the restrictions on reloading mentioned above. + + In pcre2test, when a pattern with push modifier is successfully com- + piled, it is pushed onto a stack of compiled patterns, and pcre2test + expects the next line to contain a new pattern (or command) instead of + a subject line. By contrast, the pushcopy modifier causes a copy of the + compiled pattern to be stacked, leaving the original available for im- + mediate matching. By using push and/or pushcopy, a number of patterns + can be compiled and retained. 
These modifiers are incompatible with + posix, and control modifiers that act at match time are ignored (with a + message) for the stacked patterns. The jitverify modifier applies only + at compile time. + + The command + + #save <filename> + + causes all the stacked patterns to be serialized and the result written + to the named file. Afterwards, all the stacked patterns are freed. The + command + + #load <filename> + + reads the data in the file, and then arranges for it to be de-serial- + ized, with the resulting compiled patterns added to the pattern stack. + The pattern on the top of the stack can be retrieved by the #pop com- + mand, which must be followed by lines of subjects that are to be + matched with the pattern, terminated as usual by an empty line or end + of file. This command may be followed by a modifier list containing + only control modifiers that act after a pattern has been compiled. In + particular, hex, posix, posix_nosub, push, and pushcopy are not al- + lowed, nor are any option-setting modifiers. The JIT modifiers are, + however, permitted. Here is an example that saves and reloads two pat- + terns. + + /abc/push + /xyz/push + #save tempfile + #load tempfile + #pop info + xyz + + #pop jit,bincode + abc + + If jitverify is used with #pop, it does not automatically imply jit, + which is different behaviour from when it is used on a pattern. + + The #popcopy command is analogous to the pushcopy modifier in that it + makes current a copy of the topmost stack pattern, leaving the original + still on the stack. + + +SEE ALSO + + pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit, pcre2matching(3), + pcre2partial(3), pcre2pattern(3), pcre2serialize(3). + + +AUTHOR + + Philip Hazel + University Computing Service + Cambridge, England. + + +REVISION + + Last updated: 28 April 2021 + Copyright (c) 1997-2021 University of Cambridge.
diff --git a/src/pcre2/doc/pcre2unicode.3 b/src/pcre2/doc/pcre2unicode.3 new file mode 100644 index 00000000..055a4ce4 --- /dev/null +++ b/src/pcre2/doc/pcre2unicode.3 @@ -0,0 +1,462 @@ +.TH PCRE2UNICODE 3 "23 February 2020" "PCRE2 10.35" +.SH NAME +PCRE - Perl-compatible regular expressions (revised API) +.SH "UNICODE AND UTF SUPPORT" +.rs +.sp +PCRE2 is normally built with Unicode support, though if you do not need it, you +can build it without, in which case the library will be smaller. With Unicode +support, PCRE2 has knowledge of Unicode character properties and can process +strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit +width), but this is not the default. Unless specifically requested, PCRE2 +treats each code unit in a string as one character. +.P +There are two ways of telling PCRE2 to switch to UTF mode, where characters may +consist of more than one code unit and the range of values is constrained. The +program can call +.\" HREF +\fBpcre2_compile()\fP +.\" +with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF). +However, the latter facility can be locked out by the PCRE2_NEVER_UTF option. +That is, the programmer can prevent the supplier of the pattern from switching +to UTF mode. +.P +Note that the PCRE2_MATCH_INVALID_UTF option (see +.\" HTML +.\" +below) +.\" +forces PCRE2_UTF to be set. +.P +In UTF mode, both the pattern and any subject strings that are matched against +it are treated as UTF strings instead of strings of individual one-code-unit +characters. There are also some other changes to the way characters are +handled, as documented below. +. +. +.SH "UNICODE PROPERTY SUPPORT" +.rs +.sp +When PCRE2 is built with Unicode support, the escape sequences \ep{..}, +\eP{..}, and \eX can be used. This is not dependent on the PCRE2_UTF setting. 
+The Unicode properties that can be tested are limited to the general category +properties such as Lu for an upper case letter or Nd for a decimal number, the +Unicode script names such as Arabic or Han, and the derived properties Any and +L&. Full lists are given in the +.\" HREF +\fBpcre2pattern\fP +.\" +and +.\" HREF +\fBpcre2syntax\fP +.\" +documentation. Only the short names for properties are supported. For example, +\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported. +Furthermore, in Perl, many properties may optionally be prefixed by "Is", for +compatibility with Perl 5.6. PCRE2 does not support this. +. +. +.SH "WIDE CHARACTERS AND UTF MODES" +.rs +.sp +Code points less than 256 can be specified in patterns by either braced or +unbraced hexadecimal escape sequences (for example, \ex{b3} or \exb3). Larger +values have to use braced sequences. Unbraced octal code points up to \e777 are +also recognized; larger ones can be coded using \eo{...}. +.P +The escape sequence \eN{U+<hex digits>} is recognized as another way of +specifying a Unicode character by code point in a UTF mode. It is not allowed +in non-UTF mode. +.P +In UTF mode, repeat quantifiers apply to complete UTF characters, not to +individual code units. +.P +In UTF mode, the dot metacharacter matches one UTF character instead of a +single code unit. +.P +In UTF mode, capture group names are not restricted to ASCII, and may contain +any Unicode letters and decimal digits, as well as underscore. +.P +The escape sequence \eC can be used to match a single code unit in UTF mode, +but its use can lead to some strange effects because it breaks up multi-unit +characters (see the description of \eC in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation). For this reason, there is a build-time option that disables +support for \eC completely. There is also a less draconian compile-time option +for locking out the use of \eC when a pattern is compiled.
+.P +The use of \eC is not supported by the alternative matching function +\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character +may consist of more than one code unit. The use of \eC in these modes provokes +a match-time error. Also, the JIT optimization does not support \eC in these +modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that +contains \eC, it will not succeed, and so when \fBpcre2_match()\fP is called, +the matching will be carried out by the interpretive function. +.P +The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly test +characters of any code value, but, by default, the characters that PCRE2 +recognizes as digits, spaces, or word characters remain the same set as in +non-UTF mode, all with code points less than 256. This remains true even when +PCRE2 is built to include Unicode support, because to do otherwise would slow +down matching in many common cases. Note that this also applies to \eb +and \eB, because they are defined in terms of \ew and \eW. If you want +to test for a wider sense of, say, "digit", you can use explicit Unicode +property tests such as \ep{Nd}. Alternatively, if you set the PCRE2_UCP option, +the way that the character escapes work is changed so that Unicode properties +are used to determine which characters match. There are more details in the +section on +.\" HTML +.\" +generic character types +.\" +in the +.\" HREF +\fBpcre2pattern\fP +.\" +documentation. +.P +Similarly, characters that match the POSIX named character classes are all +low-valued characters, unless the PCRE2_UCP option is set. +.P +However, the special horizontal and vertical white space matching escapes (\eh, +\eH, \ev, and \eV) do match all the appropriate Unicode characters, whether or +not PCRE2_UCP is set. +. +. 
+.SH "UNICODE CASE-EQUIVALENCE" +.rs +.sp +If either PCRE2_UTF or PCRE2_UCP is set, upper/lower case processing makes use +of Unicode properties except for characters whose code points are less than 128 +and that have at most two case-equivalent values. For these, a direct table +lookup is used for speed. A few Unicode characters such as Greek sigma have +more than two code points that are case-equivalent, and these are treated +specially. Setting PCRE2_UCP without PCRE2_UTF allows Unicode-style case +processing for non-UTF character encodings such as UCS-2. +. +. +.\" HTML +.SH "SCRIPT RUNS" +.rs +.sp +The pattern constructs (*script_run:...) and (*atomic_script_run:...), with +synonyms (*sr:...) and (*asr:...), verify that the string matched within the +parentheses is a script run. In concept, a script run is a sequence of +characters that are all from the same Unicode script. However, because some +scripts are commonly used together, and because some diacritical and other +marks are used with multiple scripts, it is not that simple. +.P +Every Unicode character has a Script property, mostly with a value +corresponding to the name of a script, such as Latin, Greek, or Cyrillic. There +are also three special values: +.P +"Unknown" is used for code points that have not been assigned, and also for the +surrogate code points. In the PCRE2 32-bit library, characters whose code +points are greater than the Unicode maximum (U+10FFFF), which are accessible +only in non-UTF mode, are assigned the Unknown script. +.P +"Common" is used for characters that are used with many scripts. These include +punctuation, emoji, mathematical, musical, and currency symbols, and the ASCII +digits 0 to 9. +.P +"Inherited" is used for characters such as diacritical marks that modify a +previous character. These are considered to take on the script of the character +that they modify. 
+.P +Some Inherited characters are used with many scripts, but many of them are only +normally used with a small number of scripts. For example, U+102E0 (Coptic +Epact thousands mark) is used only with Arabic and Coptic. In order to make it +possible to check this, a Unicode property called Script Extension exists. Its +value is a list of scripts that apply to the character. For the majority of +characters, the list contains just one script, the same one as the Script +property. However, for characters such as U+102E0 more than one Script is +listed. There are also some Common characters that have a single, non-Common +script in their Script Extension list. +.P +The next section describes the basic rules for deciding whether a given string +of characters is a script run. Note, however, that there are some special cases +involving the Chinese Han script, and an additional constraint for decimal +digits. These are covered in subsequent sections. +. +. +.SS "Basic script run rules" +.rs +.sp +A string that is less than two characters long is a script run. This is the +only case in which an Unknown character can be part of a script run. Longer +strings are checked using only the Script Extensions property, not the basic +Script property. +.P +If a character's Script Extension property is the single value "Inherited", it +is always accepted as part of a script run. This is also true for the property +"Common", subject to the checking of decimal digits described below. All the +remaining characters in a script run must have at least one script in common in +their Script Extension lists. In set-theoretic terminology, the intersection of +all the sets of scripts must not be empty. +.P +A simple example is an Internet name such as "google.com". The letters are all +in the Latin script, and the dot is Common, so this string is a script run. 
+However, the Cyrillic letter "o" looks exactly the same as the Latin "o"; a +string that looks the same, but with Cyrillic "o"s, is not a script run. +.P +More interesting examples involve characters with more than one script in their +Script Extension. Consider the following characters: +.sp + U+060C Arabic comma + U+06D4 Arabic full stop +.sp +The first has the Script Extension list Arabic, Hanifi Rohingya, Syriac, and +Thaana; the second has just Arabic and Hanifi Rohingya. Both of them could +appear in script runs of either Arabic or Hanifi Rohingya. The first could also +appear in Syriac or Thaana script runs, but the second could not. +. +. +.SS "The Chinese Han script" +.rs +.sp +The Chinese Han script is commonly used in conjunction with other scripts for +writing certain languages. Japanese uses the Hiragana and Katakana scripts +together with Han; Korean uses Hangul and Han; Taiwanese Mandarin uses Bopomofo +and Han. These three combinations are treated as special cases when checking +script runs and are, in effect, "virtual scripts". Thus, a script run may +contain a mixture of Hiragana, Katakana, and Han, or a mixture of Hangul and +Han, or a mixture of Bopomofo and Han, but not, for example, a mixture of +Hangul and Bopomofo and Han. PCRE2 (like Perl) follows Unicode's Technical +Standard 39 ("Unicode Security Mechanisms", http://unicode.org/reports/tr39/) +in allowing such mixtures. +. +. +.SS "Decimal digits" +.rs +.sp +Unicode contains many sets of 10 decimal digits in different scripts, and some +scripts (including the Common script) contain more than one set. Some of these +decimal digits are visually indistinguishable from the common ASCII +digits. In addition to the script checking described above, if a script run +contains any decimal digits, they must all come from the same set of 10 +adjacent characters. +. +.
+.SH "VALIDITY OF UTF STRINGS" +.rs +.sp +When the PCRE2_UTF option is set, the strings passed as patterns and subjects +are (by default) checked for validity on entry to the relevant functions. If an +invalid UTF string is passed, a negative error code is returned. The code unit +offset to the offending character can be extracted from the match data block by +calling \fBpcre2_get_startchar()\fP, which is used for this purpose after a UTF +error. +.P +In some situations, you may already know that your strings are valid, and +therefore want to skip these checks in order to improve performance, for +example in the case of a long subject string that is being scanned repeatedly. +If you set the PCRE2_NO_UTF_CHECK option at compile time or at match time, +PCRE2 assumes that the pattern or subject it is given (respectively) contains +only valid UTF code unit sequences. +.P +If you pass an invalid UTF string when PCRE2_NO_UTF_CHECK is set, the result +is undefined and your program may crash or loop indefinitely or give incorrect +results. There is, however, one mode of matching that can handle invalid UTF +subject strings. This is enabled by passing PCRE2_MATCH_INVALID_UTF to +\fBpcre2_compile()\fP and is discussed in the next section. The rest of +this section covers the case when PCRE2_MATCH_INVALID_UTF is not set. +.P +Passing PCRE2_NO_UTF_CHECK to \fBpcre2_compile()\fP just disables the UTF check +for the pattern; it does not also apply to subject strings. If you want to +disable the check for a subject string you must pass this same option to +\fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP. +.P +UTF-16 and UTF-32 strings can indicate their endianness by a special code known +as a byte-order mark (BOM). The PCRE2 functions do not handle this, expecting +strings to be in host byte order. +.P +Unless PCRE2_NO_UTF_CHECK is set, a UTF string is checked before any other +processing takes place.
In the case of \fBpcre2_match()\fP and +\fBpcre2_dfa_match()\fP calls with a non-zero starting offset, the check is +applied only to that part of the subject that could be inspected during +matching, and there is a check that the starting offset points to the first +code unit of a character or to the end of the subject. If there are no +lookbehind assertions in the pattern, the check starts at the starting offset. +Otherwise, it starts at the length of the longest lookbehind before the +starting offset, or at the start of the subject if there are not that many +characters before the starting offset. Note that the sequences \eb and \eB are +one-character lookbehinds. +.P +In addition to checking the format of the string, there is a check to ensure +that all code points lie in the range U+0 to U+10FFFF, excluding the surrogate +area. The so-called "non-character" code points are not excluded because +Unicode corrigendum #9 makes it clear that they should not be. +.P +Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16, +where they are used in pairs to encode code points with values greater than +0xFFFF. The code points that are encoded by UTF-16 pairs are available +independently in the UTF-8 and UTF-32 encodings. (In other words, the whole +surrogate thing is a fudge for UTF-16 which unfortunately messes up UTF-8 and +UTF-32.) +.P +Setting PCRE2_NO_UTF_CHECK at compile time does not disable the error that is +given if an escape sequence for an invalid Unicode code point is encountered in +the pattern. If you want to allow escape sequences such as \ex{d800} (a +surrogate code point) you can set the PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES extra +option. However, this is possible only in UTF-8 and UTF-32 modes, because these +values are not representable in UTF-16. +. +. 
+.\" HTML +.SS "Errors in UTF-8 strings" +.rs +.sp +The following negative error codes are given for invalid UTF-8 strings: +.sp + PCRE2_ERROR_UTF8_ERR1 + PCRE2_ERROR_UTF8_ERR2 + PCRE2_ERROR_UTF8_ERR3 + PCRE2_ERROR_UTF8_ERR4 + PCRE2_ERROR_UTF8_ERR5 +.sp +The string ends with a truncated UTF-8 character; the code specifies how many +bytes are missing (1 to 5). Although RFC 3629 restricts UTF-8 characters to be +no longer than 4 bytes, the encoding scheme (originally defined by RFC 2279) +allows for up to 6 bytes, and this is checked first; hence the possibility of +4 or 5 missing bytes. +.sp + PCRE2_ERROR_UTF8_ERR6 + PCRE2_ERROR_UTF8_ERR7 + PCRE2_ERROR_UTF8_ERR8 + PCRE2_ERROR_UTF8_ERR9 + PCRE2_ERROR_UTF8_ERR10 +.sp +The two most significant bits of the 2nd, 3rd, 4th, 5th, or 6th byte of the +character do not have the binary value 0b10 (that is, either the most +significant bit is 0, or the next bit is 1). +.sp + PCRE2_ERROR_UTF8_ERR11 + PCRE2_ERROR_UTF8_ERR12 +.sp +A character that is valid by the RFC 2279 rules is either 5 or 6 bytes long; +these code points are excluded by RFC 3629. +.sp + PCRE2_ERROR_UTF8_ERR13 +.sp +A 4-byte character has a value greater than 0x10ffff; these code points are +excluded by RFC 3629. +.sp + PCRE2_ERROR_UTF8_ERR14 +.sp +A 3-byte character has a value in the range 0xd800 to 0xdfff; this range of +code points are reserved by RFC 3629 for use with UTF-16, and so are excluded +from UTF-8. +.sp + PCRE2_ERROR_UTF8_ERR15 + PCRE2_ERROR_UTF8_ERR16 + PCRE2_ERROR_UTF8_ERR17 + PCRE2_ERROR_UTF8_ERR18 + PCRE2_ERROR_UTF8_ERR19 +.sp +A 2-, 3-, 4-, 5-, or 6-byte character is "overlong", that is, it codes for a +value that can be represented by fewer bytes, which is invalid. For example, +the two bytes 0xc0, 0xae give the value 0x2e, whose correct coding uses just +one byte. 
+.sp + PCRE2_ERROR_UTF8_ERR20 +.sp +The two most significant bits of the first byte of a character have the binary +value 0b10 (that is, the most significant bit is 1 and the second is 0). Such a +byte can only validly occur as the second or subsequent byte of a multi-byte +character. +.sp + PCRE2_ERROR_UTF8_ERR21 +.sp +The first byte of a character has the value 0xfe or 0xff. These values can +never occur in a valid UTF-8 string. +. +. +.\" HTML +.SS "Errors in UTF-16 strings" +.rs +.sp +The following negative error codes are given for invalid UTF-16 strings: +.sp + PCRE2_ERROR_UTF16_ERR1 Missing low surrogate at end of string + PCRE2_ERROR_UTF16_ERR2 Invalid low surrogate follows high surrogate + PCRE2_ERROR_UTF16_ERR3 Isolated low surrogate +.sp +. +. +.\" HTML +.SS "Errors in UTF-32 strings" +.rs +.sp +The following negative error codes are given for invalid UTF-32 strings: +.sp + PCRE2_ERROR_UTF32_ERR1 Surrogate character (0xd800 to 0xdfff) + PCRE2_ERROR_UTF32_ERR2 Code point is greater than 0x10ffff +.sp +. +. +.\" HTML +.SH "MATCHING IN INVALID UTF STRINGS" +.rs +.sp +You can run pattern matches on subject strings that may contain invalid UTF +sequences if you call \fBpcre2_compile()\fP with the PCRE2_MATCH_INVALID_UTF +option. This is supported by \fBpcre2_match()\fP, including JIT matching, but +not by \fBpcre2_dfa_match()\fP. When PCRE2_MATCH_INVALID_UTF is set, it forces +PCRE2_UTF to be set as well. Note, however, that the pattern itself must be a +valid UTF string. +.P +Setting PCRE2_MATCH_INVALID_UTF does not affect what \fBpcre2_compile()\fP +generates, but if \fBpcre2_jit_compile()\fP is subsequently called, it does +generate different code. If JIT is not used, the option affects the behaviour +of the interpretive code in \fBpcre2_match()\fP. When PCRE2_MATCH_INVALID_UTF +is set at compile time, PCRE2_NO_UTF_CHECK is ignored at match time. +.P +In this mode, an invalid code unit sequence in the subject never matches any +pattern item. 
It does not match dot, it does not match \ep{Any}, it does not +even match negative items such as [^X]. A lookbehind assertion fails if it +encounters an invalid sequence while moving the current point backwards. In +other words, an invalid UTF code unit sequence acts as a barrier which no match +can cross. +.P +You can also think of this as the subject being split up into fragments of +valid UTF, delimited internally by invalid code unit sequences. The pattern is +matched fragment by fragment. The result of a successful match, however, is +given as code unit offsets in the entire subject string in the usual way. There +are a few points to consider: +.P +The internal boundaries are not interpreted as the beginnings or ends of lines +and so do not match circumflex or dollar characters in the pattern. +.P +If \fBpcre2_match()\fP is called with an offset that points to an invalid +UTF-sequence, that sequence is skipped, and the match starts at the next valid +UTF character, or the end of the subject. +.P +At internal fragment boundaries, \eb and \eB behave in the same way as at the +beginning and end of the subject. For example, a sequence such as \ebWORD\eb +would match an instance of WORD that is surrounded by invalid UTF code units. +.P +Using PCRE2_MATCH_INVALID_UTF, an application can run matches on arbitrary +data, knowing that any matched strings that are returned are valid UTF. This +can be useful when searching for UTF text in executable or other binary files. +. +. +.SH AUTHOR +.rs +.sp +.nf +Philip Hazel +University Computing Service +Cambridge, England. +.fi +. +. +.SH REVISION +.rs +.sp +.nf +Last updated: 23 February 2020 +Copyright (c) 1997-2020 University of Cambridge. 
+.fi diff --git a/src/pcre/install-sh b/src/pcre2/install-sh similarity index 79% rename from src/pcre/install-sh rename to src/pcre2/install-sh index 8175c640..ec298b53 100755 --- a/src/pcre/install-sh +++ b/src/pcre2/install-sh @@ -1,7 +1,7 @@ #!/bin/sh # install - install a program, script, or datafile -scriptversion=2018-03-11.20; # UTC +scriptversion=2020-11-14.01; # UTC # This originates from X11R5 (mit/util/scripts/install.sh), which was # later released in X11R6 (xc/config/util/install.sh) with the @@ -69,6 +69,11 @@ posix_mkdir= # Desired mode of installed file. mode=0755 +# Create dirs (including intermediate dirs) using mode 755. +# This is like GNU 'install' as of coreutils 8.32 (2020). +mkdir_umask=22 + +backupsuffix= chgrpcmd= chmodcmd=$chmodprog chowncmd= @@ -99,18 +104,28 @@ Options: --version display version info and exit. -c (ignored) - -C install only if different (preserve the last data modification time) + -C install only if different (preserve data modification time) -d create directories instead of installing files. -g GROUP $chgrpprog installed files to GROUP. -m MODE $chmodprog installed files to MODE. -o USER $chownprog installed files to USER. + -p pass -p to $cpprog. -s $stripprog installed files. + -S SUFFIX attempt to back up existing files, with suffix SUFFIX. -t DIRECTORY install into DIRECTORY. -T report an error if DSTFILE is a directory. Environment variables override the default commands: CHGRPPROG CHMODPROG CHOWNPROG CMPPROG CPPROG MKDIRPROG MVPROG RMPROG STRIPPROG + +By default, rm is invoked with -f; when overridden with RMPROG, +it's up to you to specify -f if you want it. + +If -S is not specified, no backups are attempted. + +Email bug reports to bug-automake@gnu.org. 
+Automake home page: https://www.gnu.org/software/automake/ " while test $# -ne 0; do @@ -137,8 +152,13 @@ while test $# -ne 0; do -o) chowncmd="$chownprog $2" shift;; + -p) cpprog="$cpprog -p";; + -s) stripcmd=$stripprog;; + -S) backupsuffix="$2" + shift;; + -t) is_target_a_directory=always dst_arg=$2 @@ -255,6 +275,10 @@ do dstdir=$dst test -d "$dstdir" dstdir_status=$? + # Don't chown directories that already exist. + if test $dstdir_status = 0; then + chowncmd="" + fi else # Waiting for this to be detected by the "$cpprog $src $dsttmp" command @@ -301,22 +325,6 @@ do if test $dstdir_status != 0; then case $posix_mkdir in '') - # Create intermediate dirs using mode 755 as modified by the umask. - # This is like FreeBSD 'install' as of 1997-10-28. - umask=`umask` - case $stripcmd.$umask in - # Optimize common cases. - *[2367][2367]) mkdir_umask=$umask;; - .*0[02][02] | .[02][02] | .[02]) mkdir_umask=22;; - - *[0-7]) - mkdir_umask=`expr $umask + 22 \ - - $umask % 100 % 40 + $umask % 20 \ - - $umask % 10 % 4 + $umask % 2 - `;; - *) mkdir_umask=$umask,go-w;; - esac - # With -d, create the new directory with the user-specified mode. # Otherwise, rely on $mkdir_umask. if test -n "$dir_arg"; then @@ -326,52 +334,49 @@ do fi posix_mkdir=false - case $umask in - *[123567][0-7][0-7]) - # POSIX mkdir -p sets u+wx bits regardless of umask, which - # is incompatible with FreeBSD 'install' when (umask & 300) != 0. - ;; - *) - # Note that $RANDOM variable is not portable (e.g. dash); Use it - # here however when possible just to lower collision chance. - tmpdir=${TMPDIR-/tmp}/ins$RANDOM-$$ - - trap 'ret=$?; rmdir "$tmpdir/a/b" "$tmpdir/a" "$tmpdir" 2>/dev/null; exit $ret' 0 - - # Because "mkdir -p" follows existing symlinks and we likely work - # directly in world-writeable /tmp, make sure that the '$tmpdir' - # directory is successfully created first before we actually test - # 'mkdir -p' feature. 
- if (umask $mkdir_umask && - $mkdirprog $mkdir_mode "$tmpdir" && - exec $mkdirprog $mkdir_mode -p -- "$tmpdir/a/b") >/dev/null 2>&1 - then - if test -z "$dir_arg" || { - # Check for POSIX incompatibilities with -m. - # HP-UX 11.23 and IRIX 6.5 mkdir -m -p sets group- or - # other-writable bit of parent directory when it shouldn't. - # FreeBSD 6.1 mkdir -m -p sets mode of existing directory. - test_tmpdir="$tmpdir/a" - ls_ld_tmpdir=`ls -ld "$test_tmpdir"` - case $ls_ld_tmpdir in - d????-?r-*) different_mode=700;; - d????-?--*) different_mode=755;; - *) false;; - esac && - $mkdirprog -m$different_mode -p -- "$test_tmpdir" && { - ls_ld_tmpdir_1=`ls -ld "$test_tmpdir"` - test "$ls_ld_tmpdir" = "$ls_ld_tmpdir_1" - } - } - then posix_mkdir=: - fi - rmdir "$tmpdir/a/b" "$tmpdir/a" "$tmpdir" - else - # Remove any dirs left behind by ancient mkdir implementations. - rmdir ./$mkdir_mode ./-p ./-- "$tmpdir" 2>/dev/null - fi - trap '' 0;; - esac;; + # The $RANDOM variable is not portable (e.g., dash). Use it + # here however when possible just to lower collision chance. + tmpdir=${TMPDIR-/tmp}/ins$RANDOM-$$ + + trap ' + ret=$? + rmdir "$tmpdir/a/b" "$tmpdir/a" "$tmpdir" 2>/dev/null + exit $ret + ' 0 + + # Because "mkdir -p" follows existing symlinks and we likely work + # directly in world-writeable /tmp, make sure that the '$tmpdir' + # directory is successfully created first before we actually test + # 'mkdir -p'. + if (umask $mkdir_umask && + $mkdirprog $mkdir_mode "$tmpdir" && + exec $mkdirprog $mkdir_mode -p -- "$tmpdir/a/b") >/dev/null 2>&1 + then + if test -z "$dir_arg" || { + # Check for POSIX incompatibilities with -m. + # HP-UX 11.23 and IRIX 6.5 mkdir -m -p sets group- or + # other-writable bit of parent directory when it shouldn't. + # FreeBSD 6.1 mkdir -m -p sets mode of existing directory. 
+ test_tmpdir="$tmpdir/a" + ls_ld_tmpdir=`ls -ld "$test_tmpdir"` + case $ls_ld_tmpdir in + d????-?r-*) different_mode=700;; + d????-?--*) different_mode=755;; + *) false;; + esac && + $mkdirprog -m$different_mode -p -- "$test_tmpdir" && { + ls_ld_tmpdir_1=`ls -ld "$test_tmpdir"` + test "$ls_ld_tmpdir" = "$ls_ld_tmpdir_1" + } + } + then posix_mkdir=: + fi + rmdir "$tmpdir/a/b" "$tmpdir/a" "$tmpdir" + else + # Remove any dirs left behind by ancient mkdir implementations. + rmdir ./$mkdir_mode ./-p ./-- "$tmpdir" 2>/dev/null + fi + trap '' 0;; esac if @@ -382,7 +387,7 @@ do then : else - # The umask is ridiculous, or mkdir does not conform to POSIX, + # mkdir does not conform to POSIX, # or it failed possibly due to a race condition. Create the # directory the slow way, step by step, checking for races as we go. @@ -411,7 +416,7 @@ do prefixes= else if $posix_mkdir; then - (umask=$mkdir_umask && + (umask $mkdir_umask && $doit_exec $mkdirprog $mkdir_mode -p -- "$dstdir") && break # Don't fail if two instances are running concurrently. test -d "$prefix" || exit 1 @@ -451,7 +456,18 @@ do trap 'ret=$?; rm -f "$dsttmp" "$rmtmp" && exit $ret' 0 # Copy the file name to the temp name. - (umask $cp_umask && $doit_exec $cpprog "$src" "$dsttmp") && + (umask $cp_umask && + { test -z "$stripcmd" || { + # Create $dsttmp read-write so that cp doesn't create it read-only, + # which would cause strip to fail. + if test -z "$doit"; then + : >"$dsttmp" # No need to fork-exec 'touch'. + else + $doit touch "$dsttmp" + fi + } + } && + $doit_exec $cpprog "$src" "$dsttmp") && # and set any options; do chmod last to preserve setuid bits. # @@ -477,6 +493,13 @@ do then rm -f "$dsttmp" else + # If $backupsuffix is set, and the file being installed + # already exists, attempt a backup. Don't worry if it fails, + # e.g., if mv doesn't support -f. 
+ if test -n "$backupsuffix" && test -f "$dst"; then + $doit $mvcmd -f "$dst" "$dst$backupsuffix" 2>/dev/null + fi + # Rename the file to the real destination. $doit $mvcmd -f "$dsttmp" "$dst" 2>/dev/null || @@ -491,9 +514,9 @@ do # file should still install successfully. { test ! -f "$dst" || - $doit $rmcmd -f "$dst" 2>/dev/null || + $doit $rmcmd "$dst" 2>/dev/null || { $doit $mvcmd -f "$dst" "$rmtmp" 2>/dev/null && - { $doit $rmcmd -f "$rmtmp" 2>/dev/null; :; } + { $doit $rmcmd "$rmtmp" 2>/dev/null; :; } } || { echo "$0: cannot unlink or rename $dst" >&2 (exit 1); exit 1 diff --git a/src/pcre2/libpcre2-16.pc.in b/src/pcre2/libpcre2-16.pc.in new file mode 100644 index 00000000..bacb4665 --- /dev/null +++ b/src/pcre2/libpcre2-16.pc.in @@ -0,0 +1,13 @@ +# Package Information for pkg-config + +prefix=@prefix@ +exec_prefix=@exec_prefix@ +libdir=@libdir@ +includedir=@includedir@ + +Name: libpcre2-16 +Description: PCRE2 - Perl compatible regular expressions C library (2nd API) with 16 bit character support +Version: @PACKAGE_VERSION@ +Libs: -L${libdir} -lpcre2-16@LIB_POSTFIX@ +Libs.private: @PTHREAD_CFLAGS@ @PTHREAD_LIBS@ +Cflags: -I${includedir} @PCRE2_STATIC_CFLAG@ diff --git a/src/pcre2/libpcre2-32.pc.in b/src/pcre2/libpcre2-32.pc.in new file mode 100644 index 00000000..06241f06 --- /dev/null +++ b/src/pcre2/libpcre2-32.pc.in @@ -0,0 +1,13 @@ +# Package Information for pkg-config + +prefix=@prefix@ +exec_prefix=@exec_prefix@ +libdir=@libdir@ +includedir=@includedir@ + +Name: libpcre2-32 +Description: PCRE2 - Perl compatible regular expressions C library (2nd API) with 32 bit character support +Version: @PACKAGE_VERSION@ +Libs: -L${libdir} -lpcre2-32@LIB_POSTFIX@ +Libs.private: @PTHREAD_CFLAGS@ @PTHREAD_LIBS@ +Cflags: -I${includedir} @PCRE2_STATIC_CFLAG@ diff --git a/src/pcre2/libpcre2-8.pc.in b/src/pcre2/libpcre2-8.pc.in new file mode 100644 index 00000000..246bb9ea --- /dev/null +++ b/src/pcre2/libpcre2-8.pc.in @@ -0,0 +1,13 @@ +# Package Information for pkg-config 
+ +prefix=@prefix@ +exec_prefix=@exec_prefix@ +libdir=@libdir@ +includedir=@includedir@ + +Name: libpcre2-8 +Description: PCRE2 - Perl compatible regular expressions C library (2nd API) with 8 bit character support +Version: @PACKAGE_VERSION@ +Libs: -L${libdir} -lpcre2-8@LIB_POSTFIX@ +Libs.private: @PTHREAD_CFLAGS@ @PTHREAD_LIBS@ +Cflags: -I${includedir} @PCRE2_STATIC_CFLAG@ diff --git a/src/pcre2/libpcre2-posix.pc.in b/src/pcre2/libpcre2-posix.pc.in new file mode 100644 index 00000000..758c3068 --- /dev/null +++ b/src/pcre2/libpcre2-posix.pc.in @@ -0,0 +1,13 @@ +# Package Information for pkg-config + +prefix=@prefix@ +exec_prefix=@exec_prefix@ +libdir=@libdir@ +includedir=@includedir@ + +Name: libpcre2-posix +Description: Posix compatible interface to libpcre2-8 +Version: @PACKAGE_VERSION@ +Libs: -L${libdir} -lpcre2-posix@LIB_POSTFIX@ +Cflags: -I${includedir} @PCRE2_STATIC_CFLAG@ +Requires.private: libpcre2-8 diff --git a/src/pcre/ltmain.sh b/src/pcre2/ltmain.sh similarity index 99% rename from src/pcre/ltmain.sh rename to src/pcre2/ltmain.sh index d3ab94d6..48cea9b0 100644 --- a/src/pcre/ltmain.sh +++ b/src/pcre2/ltmain.sh @@ -2,7 +2,7 @@ ## DO NOT EDIT - This file generated from ./build-aux/ltmain.in ## by inline-source v2018-07-24.06 -# libtool (GNU libtool) 2.4.6.42-b88ce +# libtool (GNU libtool) 2.4.6.42-b88ce-dirty # Provide generalized library-building support services. # Written by Gordon Matzigkeit , 1996 @@ -31,7 +31,7 @@ PROGRAM=libtool PACKAGE=libtool -VERSION=2.4.6.42-b88ce +VERSION=2.4.6.42-b88ce-dirty package_revision=2.4.6.42 @@ -2176,7 +2176,7 @@ func_version () # End: # Set a version string. -scriptversion='(GNU libtool) 2.4.6.42-b88ce' +scriptversion='(GNU libtool) 2.4.6.42-b88ce-dirty' # func_echo ARG... @@ -2267,7 +2267,7 @@ include the following information: compiler: $LTCC compiler flags: $LTCFLAGS linker: $LD (gnu? 
$with_gnu_ld) - version: $progname (GNU libtool) 2.4.6.42-b88ce + version: $progname (GNU libtool) 2.4.6.42-b88ce-dirty automake: `($AUTOMAKE --version) 2>/dev/null |$SED 1q` autoconf: `($AUTOCONF --version) 2>/dev/null |$SED 1q` diff --git a/src/pcre/m4/ax_pthread.m4 b/src/pcre2/m4/ax_pthread.m4 similarity index 100% rename from src/pcre/m4/ax_pthread.m4 rename to src/pcre2/m4/ax_pthread.m4 diff --git a/src/pcre/m4/libtool.m4 b/src/pcre2/m4/libtool.m4 similarity index 99% rename from src/pcre/m4/libtool.m4 rename to src/pcre2/m4/libtool.m4 index b55a6e57..2b73e384 100644 --- a/src/pcre/m4/libtool.m4 +++ b/src/pcre2/m4/libtool.m4 @@ -728,7 +728,6 @@ _LT_CONFIG_SAVE_COMMANDS([ cat <<_LT_EOF >> "$cfgfile" #! $SHELL # Generated automatically by $as_me ($PACKAGE) $VERSION -# Libtool was configured on host `(hostname || uname -n) 2>/dev/null | sed 1q`: # NOTE: Changes made to this file will be lost: look at ltmain.sh. # Provide generalized library-building support services. diff --git a/src/pcre/m4/ltoptions.m4 b/src/pcre2/m4/ltoptions.m4 similarity index 100% rename from src/pcre/m4/ltoptions.m4 rename to src/pcre2/m4/ltoptions.m4 diff --git a/src/pcre/m4/ltsugar.m4 b/src/pcre2/m4/ltsugar.m4 similarity index 100% rename from src/pcre/m4/ltsugar.m4 rename to src/pcre2/m4/ltsugar.m4 diff --git a/src/pcre/m4/ltversion.m4 b/src/pcre2/m4/ltversion.m4 similarity index 87% rename from src/pcre/m4/ltversion.m4 rename to src/pcre2/m4/ltversion.m4 index 86b2ad72..7f9a3ada 100644 --- a/src/pcre/m4/ltversion.m4 +++ b/src/pcre2/m4/ltversion.m4 @@ -12,11 +12,11 @@ # serial 4221 ltversion.m4 # This file is part of GNU Libtool -m4_define([LT_PACKAGE_VERSION], [2.4.6.42-b88ce]) +m4_define([LT_PACKAGE_VERSION], [2.4.6.42-b88ce-dirty]) m4_define([LT_PACKAGE_REVISION], [2.4.6.42]) AC_DEFUN([LTVERSION_VERSION], -[macro_version='2.4.6.42-b88ce' +[macro_version='2.4.6.42-b88ce-dirty' macro_revision='2.4.6.42' _LT_DECL(, macro_version, 0, [Which release of libtool.m4 was used?]) _LT_DECL(, 
macro_revision, 0) diff --git a/src/pcre/m4/lt~obsolete.m4 b/src/pcre2/m4/lt~obsolete.m4 similarity index 100% rename from src/pcre/m4/lt~obsolete.m4 rename to src/pcre2/m4/lt~obsolete.m4 diff --git a/src/pcre/m4/pcre_visibility.m4 b/src/pcre2/m4/pcre2_visibility.m4 similarity index 68% rename from src/pcre/m4/pcre_visibility.m4 rename to src/pcre2/m4/pcre2_visibility.m4 index 30aff871..480f2eef 100644 --- a/src/pcre/m4/pcre_visibility.m4 +++ b/src/pcre2/m4/pcre2_visibility.m4 @@ -21,8 +21,9 @@ dnl Set the variable CFLAG_VISIBILITY. dnl Defines and sets the variable HAVE_VISIBILITY. dnl Modified to fit with PCRE build environment by Cristian Rodríguez. +dnl Adjusted for PCRE2 by PH -AC_DEFUN([PCRE_VISIBILITY], +AC_DEFUN([PCRE2_VISIBILITY], [ AC_REQUIRE([AC_PROG_CC]) VISIBILITY_CFLAGS= @@ -33,26 +34,26 @@ AC_DEFUN([PCRE_VISIBILITY], dnl whether it leads to an error because of some other option that the dnl user has put into $CC $CFLAGS $CPPFLAGS. AC_MSG_CHECKING([whether the -Werror option is usable]) - AC_CACHE_VAL([pcre_cv_cc_vis_werror], [ - pcre_save_CFLAGS="$CFLAGS" + AC_CACHE_VAL([pcre2_cv_cc_vis_werror], [ + pcre2_save_CFLAGS="$CFLAGS" CFLAGS="$CFLAGS -Werror" AC_COMPILE_IFELSE( [AC_LANG_PROGRAM([[]], [[]])], - [pcre_cv_cc_vis_werror=yes], - [pcre_cv_cc_vis_werror=no]) - CFLAGS="$pcre_save_CFLAGS"]) - AC_MSG_RESULT([$pcre_cv_cc_vis_werror]) + [pcre2_cv_cc_vis_werror=yes], + [pcre2_cv_cc_vis_werror=no]) + CFLAGS="$pcre2_save_CFLAGS"]) + AC_MSG_RESULT([$pcre2_cv_cc_vis_werror]) dnl Now check whether visibility declarations are supported. 
AC_MSG_CHECKING([for simple visibility declarations]) - AC_CACHE_VAL([pcre_cv_cc_visibility], [ - pcre_save_CFLAGS="$CFLAGS" + AC_CACHE_VAL([pcre2_cv_cc_visibility], [ + pcre2_save_CFLAGS="$CFLAGS" CFLAGS="$CFLAGS -fvisibility=hidden" dnl We use the option -Werror and a function dummyfunc, because on some dnl platforms (Cygwin 1.7) the use of -fvisibility triggers a warning dnl "visibility attribute not supported in this configuration; ignored" dnl at the first function definition in every compilation unit, and we dnl don't want to use the option in this case. - if test $pcre_cv_cc_vis_werror = yes; then + if test $pcre2_cv_cc_vis_werror = yes; then CFLAGS="$CFLAGS -Werror" fi AC_COMPILE_IFELSE( @@ -64,21 +65,18 @@ AC_DEFUN([PCRE_VISIBILITY], void dummyfunc (void) {} ]], [[]])], - [pcre_cv_cc_visibility=yes], - [pcre_cv_cc_visibility=no]) - CFLAGS="$pcre_save_CFLAGS"]) - AC_MSG_RESULT([$pcre_cv_cc_visibility]) - if test $pcre_cv_cc_visibility = yes; then + [pcre2_cv_cc_visibility=yes], + [pcre2_cv_cc_visibility=no]) + CFLAGS="$pcre2_save_CFLAGS"]) + AC_MSG_RESULT([$pcre2_cv_cc_visibility]) + if test $pcre2_cv_cc_visibility = yes; then VISIBILITY_CFLAGS="-fvisibility=hidden" VISIBILITY_CXXFLAGS="-fvisibility=hidden -fvisibility-inlines-hidden" HAVE_VISIBILITY=1 - AC_DEFINE(PCRE_EXP_DECL, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) - AC_DEFINE(PCRE_EXP_DEFN, [__attribute__ ((visibility ("default")))], [to make a symbol visible]) - AC_DEFINE(PCRE_EXP_DATA_DEFN, [__attribute__ ((visibility ("default")))], [to make a symbol visible]) - AC_DEFINE(PCREPOSIX_EXP_DECL, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) - AC_DEFINE(PCREPOSIX_EXP_DEFN, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) - AC_DEFINE(PCRECPP_EXP_DECL, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) - AC_DEFINE(PCRECPP_EXP_DEFN, [__attribute__ ((visibility ("default")))], [to 
make a symbol visible]) + AC_DEFINE(PCRE2_EXP_DECL, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) + AC_DEFINE(PCRE2_EXP_DEFN, [__attribute__ ((visibility ("default")))], [to make a symbol visible]) + AC_DEFINE(PCRE2POSIX_EXP_DECL, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) + AC_DEFINE(PCRE2POSIX_EXP_DEFN, [extern __attribute__ ((visibility ("default")))], [to make a symbol visible]) fi fi AC_SUBST([VISIBILITY_CFLAGS]) diff --git a/src/pcre/missing b/src/pcre2/missing similarity index 99% rename from src/pcre/missing rename to src/pcre2/missing index 625aeb11..8d0eaad2 100755 --- a/src/pcre/missing +++ b/src/pcre2/missing @@ -3,7 +3,7 @@ scriptversion=2018-03-07.03; # UTC -# Copyright (C) 1996-2018 Free Software Foundation, Inc. +# Copyright (C) 1996-2020 Free Software Foundation, Inc. # Originally written by Fran,cois Pinard , 1996. # This program is free software; you can redistribute it and/or modify diff --git a/src/pcre/pcre-config.in b/src/pcre2/pcre2-config.in similarity index 63% rename from src/pcre/pcre-config.in rename to src/pcre2/pcre2-config.in index ac06a332..bacea876 100644 --- a/src/pcre/pcre-config.in +++ b/src/pcre2/pcre2-config.in @@ -5,27 +5,22 @@ exec_prefix=@exec_prefix@ exec_prefix_set=no cflags="[--cflags]" +libs= -if test @enable_cpp@ = yes ; then - libs="[--libs-cpp]" -else - libs= -fi - -if test @enable_pcre16@ = yes ; then +if test @enable_pcre2_16@ = yes ; then libs="[--libs16] $libs" fi -if test @enable_pcre32@ = yes ; then +if test @enable_pcre2_32@ = yes ; then libs="[--libs32] $libs" fi -if test @enable_pcre8@ = yes ; then - libs="[--libs] [--libs-posix] $libs" +if test @enable_pcre2_8@ = yes ; then + libs="[--libs8] [--libs-posix] $libs" cflags="$cflags [--cflags-posix]" fi -usage="Usage: pcre-config [--prefix] [--exec-prefix] [--version] $libs $cflags" +usage="Usage: pcre2-config [--prefix] [--exec-prefix] [--version] $libs $cflags" if test $# -eq 0; then echo 
"${usage}" 1>&2 @@ -77,49 +72,42 @@ while test $# -gt 0; do if test @includedir@ != /usr/include ; then includes=-I@includedir@ fi - echo $includes @PCRE_STATIC_CFLAG@ + echo $includes @PCRE2_STATIC_CFLAG@ ;; --cflags-posix) - if test @enable_pcre8@ = yes ; then + if test @enable_pcre2_8@ = yes ; then if test @includedir@ != /usr/include ; then includes=-I@includedir@ fi - echo $includes @PCRE_STATIC_CFLAG@ + echo $includes @PCRE2_STATIC_CFLAG@ else echo "${usage}" 1>&2 fi ;; --libs-posix) - if test @enable_pcre8@ = yes ; then - echo $libS$libR -lpcreposix -lpcre + if test @enable_pcre2_8@ = yes ; then + echo $libS$libR -lpcre2-posix@LIB_POSTFIX@ -lpcre2-8@LIB_POSTFIX@ else echo "${usage}" 1>&2 fi ;; - --libs) - if test @enable_pcre8@ = yes ; then - echo $libS$libR -lpcre + --libs8) + if test @enable_pcre2_8@ = yes ; then + echo $libS$libR -lpcre2-8@LIB_POSTFIX@ else echo "${usage}" 1>&2 fi ;; --libs16) - if test @enable_pcre16@ = yes ; then - echo $libS$libR -lpcre16 + if test @enable_pcre2_16@ = yes ; then + echo $libS$libR -lpcre2-16@LIB_POSTFIX@ else echo "${usage}" 1>&2 fi ;; --libs32) - if test @enable_pcre32@ = yes ; then - echo $libS$libR -lpcre32 - else - echo "${usage}" 1>&2 - fi - ;; - --libs-cpp) - if test @enable_cpp@ = yes ; then - echo $libS$libR -lpcrecpp -lpcre + if test @enable_pcre2_32@ = yes ; then + echo $libS$libR -lpcre2-32@LIB_POSTFIX@ else echo "${usage}" 1>&2 fi diff --git a/src/pcre2/perltest.sh b/src/pcre2/perltest.sh new file mode 100755 index 00000000..31406c52 --- /dev/null +++ b/src/pcre2/perltest.sh @@ -0,0 +1,400 @@ +#! /bin/sh + +# Script for testing regular expressions with perl to check that PCRE2 handles +# them the same. For testing with different versions of Perl, if the first +# argument is -perl then the second is taken as the Perl command to use, and +# both are then removed. If the next argument is "-w", Perl is called with +# "-w", which turns on its warning mode. 
+# +# The Perl code has to have "use utf8" and "require Encode" at the start when +# running UTF-8 tests, but *not* for non-utf8 tests. (The "require" would +# actually be OK for non-utf8-tests, but is not always installed, so this way +# the script will always run for these tests.) +# +# The desired effect is achieved by making this a shell script that passes the +# Perl script to Perl through a pipe. If the next argument is "-utf8", a +# suitable prefix is set up. +# +# The remaining arguments, if any, are passed to Perl. They are an input file +# and an output file. If there is one argument, the output is written to +# STDOUT. If Perl receives no arguments, it opens /dev/tty as input, and writes +# output to STDOUT. (I haven't found a way of getting it to use STDIN, because +# of the contorted piping input.) + +perl=perl +perlarg='' +prefix='' + +if [ $# -gt 1 -a "$1" = "-perl" ] ; then + shift + perl=$1 + shift +fi + +if [ $# -gt 0 -a "$1" = "-w" ] ; then + perlarg="-w" + shift +fi + +if [ $# -gt 0 -a "$1" = "-utf8" ] ; then + prefix="use utf8; require Encode;" + shift +fi + + +# The Perl script that follows has a similar specification to pcre2test, and so +# can be given identical input, except that input patterns can be followed only +# by Perl's lower case modifiers and certain other pcre2test modifiers that are +# either handled or ignored: +# +# aftertext interpreted as "print $' afterwards" +# allaftertext ignored +# dupnames ignored (Perl always allows) +# jitstack ignored +# mark show mark information +# no_auto_possess ignored +# no_start_optimize insert (??{""}) at pattern start (disables optimizing) +# -no_start_optimize ignored +# subject_literal does not process subjects for escapes +# ucp sets Perl's /u modifier +# utf invoke UTF-8 functionality +# +# Comment lines are ignored. The #pattern command can be used to set modifiers +# that will be added to each subsequent pattern, after any modifiers it may +# already have. 
NOTE: this is different to pcre2test where #pattern sets +# defaults which can be overridden on individual patterns. The #subject command +# may be used to set or unset a default "mark" modifier for data lines. This is +# the only use of #subject that is supported. The #perltest, #forbid_utf, and +# #newline_default commands, which are needed in the relevant pcre2test files, +# are ignored. Any other #-command is ignored, with a warning message. +# +# The pattern lines should use only / as the delimiter. The other characters +# that pcre2test supports cause problems with this script. +# +# The data lines must not have any pcre2test modifiers. Unless +# "subject_literal" is on the pattern, data lines are processed as +# Perl double-quoted strings, so if they contain " $ or @ characters, these +# have to be escaped. For this reason, all such characters in the +# Perl-compatible testinput1 and testinput4 files are escaped so that they can +# be used for perltest as well as for pcre2test. The output from this script +# should be the same as from pcre2test, apart from the initial identifying banner. +# +# The other testinput files are not suitable for feeding to perltest.sh, +# because they make use of the special modifiers that pcre2test uses for +# testing features of PCRE2. Some of these files also contain malformed regular +# expressions, in order to check that PCRE2 diagnoses them correctly. + +(echo "$prefix" ; cat <<'PERLEND' + +# The alpha assertions currently give warnings even when -w is not specified. + +no warnings "experimental::alpha_assertions"; +no warnings "experimental::script_run"; + +# Function for turning a string into a string of printing chars. 
+ +sub pchars { +my($t) = ""; +if ($utf8) + { + @p = unpack('U*', $_[0]); + foreach $c (@p) + { + if ($c >= 32 && $c < 127) { $t .= chr $c; } + else { $t .= sprintf("\\x{%02x}", $c); + } + } + } +else + { + foreach $c (split(//, $_[0])) + { + if (ord $c >= 32 && ord $c < 127) { $t .= $c; } + else { $t .= sprintf("\\x%02x", ord $c); } + } + } +$t; +} + + +# Read lines from a named file or stdin and write to a named file or stdout; +# lines consist of a regular expression, in delimiters and optionally followed +# by options, followed by a set of test data, terminated by an empty line. + +# Sort out the input and output files + +if (@ARGV > 0) + { + open(INFILE, "<$ARGV[0]") || die "Failed to open $ARGV[0]\n"; + $infile = "INFILE"; + $interact = 0; + } +else + { + open(INFILE, "</dev/tty") || die "Failed to open /dev/tty\n"; + $infile = "INFILE"; + $interact = 1; + } + +if (@ARGV > 1) + { + open(OUTFILE, ">$ARGV[1]") || die "Failed to open $ARGV[1]\n"; + $outfile = "OUTFILE"; + } +else { $outfile = "STDOUT"; } + +printf($outfile "Perl $^V\n\n"); + +$extra_modifiers = ""; +$default_show_mark = 0; + +# Main loop + +NEXT_RE: +for (;;) + { + printf " re> " if $interact; + last if ! ($_ = <$infile>); + printf $outfile "$_" if ! $interact; + next if ($_ =~ /^\s*$/ || $_ =~ /^#[\s!]/); + + # A few of pcre2test's #-commands are supported, or just ignored. Any others + # are ignored, with a warning. 
+ + if ($_ =~ /^#pattern(.*)/) + { + $extra_modifiers = $1; + chomp($extra_modifiers); + $extra_modifiers =~ s/\s+$//; + next; + } + elsif ($_ =~ /^#subject(.*)/) + { + $mod = $1; + chomp($mod); + $mod =~ s/\s+$//; + if ($mod =~ s/(-?)mark,?//) + { + $minus = $1; + $default_show_mark = ($minus =~ /^$/); + } + if ($mod !~ /^\s*$/) + { + printf $outfile "** Warning: \"$mod\" in #subject ignored\n"; + } + next; + } + elsif ($_ =~ /^#/) + { + if ($_ !~ /^#newline_default|^#perltest|^#forbid_utf/) + { + printf $outfile "** Warning: #-command ignored: %s", $_; + } + next; + } + + $pattern = $_; + + while ($pattern !~ /^\s*(.).*\1/s) + { + printf " > " if $interact; + last if ! 
($_ = <$infile>); + printf $outfile "$_" if ! $interact; + $pattern .= $_; + } + + chomp($pattern); + $pattern =~ s/\s+$//; + + # Split the pattern from the modifiers and adjust them as necessary. + + $pattern =~ /^\s*((.).*\2)(.*)$/s; + $pat = $1; + $del = $2; + $mod = "$3,$extra_modifiers"; + $mod =~ s/^,\s*//; + + # The private "aftertext" modifier means "print $' afterwards". + + $showrest = ($mod =~ s/aftertext,?//); + + # The "subject_literal" modifier disables escapes in subjects. + + $subject_literal = ($mod =~ s/subject_literal,?//); + + # "allaftertext" is used by pcre2test to print remainders after captures + + $mod =~ s/allaftertext,?//; + + # Detect utf + + $utf8 = $mod =~ s/utf,?//; + + # Remove "dupnames". + + $mod =~ s/dupnames,?//; + + # Remove "jitstack". + + $mod =~ s/jitstack=\d+,?//; + + # The "mark" modifier requests checking of MARK data + + $show_mark = $default_show_mark | ($mod =~ s/mark,?//); + + # "ucp" asks pcre2test to set PCRE2_UCP; change this to /u for Perl + + $mod =~ s/ucp,?/u/; + + # Remove "no_auto_possess". + + $mod =~ s/no_auto_possess,?//; + + # Use no_start_optimize (disable PCRE2 start-up optimization) to disable Perl + # optimization by inserting (??{""}) at the start of the pattern. We may + # also encounter -no_start_optimize from a #pattern setting. + + $mod =~ s/-no_start_optimize,?//; + if ($mod =~ s/no_start_optimize,?//) { $pat =~ s/$del/$del(??{""})/; } + + # Add back retained modifiers and check that the pattern is valid. + + $mod =~ s/,//g; + $pattern = "$pat$mod"; + eval "\$_ =~ ${pattern}"; + if ($@) + { + printf $outfile "Error: $@"; + if (! $interact) + { + for (;;) + { + last if ! ($_ = <$infile>); + last if $_ =~ /^\s*$/; + } + } + next NEXT_RE; + } + + # If the /g modifier is present, we want to put a loop round the matching; + # otherwise just a single "if". + + $cmd = ($pattern =~ /g[a-z]*$/)? 
"while" : "if"; + + # If the pattern is actually the null string, Perl uses the most recently + # executed (and successfully compiled) regex instead. This is a + # nasty trap for the unwary! The PCRE2 test suite does contain null strings + # in places - if they are allowed through here all sorts of weird and + # unexpected effects happen. To avoid this, we replace such patterns with + # a non-null pattern that has the same effect. + + $pattern = "/(?#)/$2" if ($pattern =~ /^(.)\1(.*)$/); + + # Read data lines and test them + + for (;;) + { + printf "data> " if $interact; + last NEXT_RE if ! ($_ = <$infile>); + chomp; + printf $outfile "%s", "$_\n" if ! $interact; + + s/\s+$//; # Remove trailing space + s/^\s+//; # Remove leading space + + last if ($_ eq ""); + next if $_ =~ /^\\=(?:\s|$)/; # Comment line + + if ($subject_literal) + { + $x = $_; + } + else + { + $x = eval "\"$_\""; # To get escapes processed + } + + # Empty array for holding results, ensure $REGERROR and $REGMARK are + # unset, then do the matching. + + @subs = (); + + $pushes = "push \@subs,\$&;" . + "push \@subs,\$1;" . + "push \@subs,\$2;" . + "push \@subs,\$3;" . + "push \@subs,\$4;" . + "push \@subs,\$5;" . + "push \@subs,\$6;" . + "push \@subs,\$7;" . + "push \@subs,\$8;" . + "push \@subs,\$9;" . + "push \@subs,\$10;" . + "push \@subs,\$11;" . + "push \@subs,\$12;" . + "push \@subs,\$13;" . + "push \@subs,\$14;" . + "push \@subs,\$15;" . + "push \@subs,\$16;" . + "push \@subs,\$'; }"; + + undef $REGERROR; + undef $REGMARK; + + eval "${cmd} (\$x =~ ${pattern}) {" . 
$pushes; + + if ($@) + { + printf $outfile "Error: $@\n"; + next NEXT_RE; + } + elsif (scalar(@subs) == 0) + { + printf $outfile "No match"; + if ($show_mark && defined $REGERROR && $REGERROR != 1) + { printf $outfile (", mark = %s", &pchars($REGERROR)); } + printf $outfile "\n"; + } + else + { + while (scalar(@subs) != 0) + { + printf $outfile (" 0: %s\n", &pchars($subs[0])); + printf $outfile (" 0+ %s\n", &pchars($subs[17])) if $showrest; + $last_printed = 0; + for ($i = 1; $i <= 16; $i++) + { + if (defined $subs[$i]) + { + while ($last_printed++ < $i-1) + { printf $outfile ("%2d: \n", $last_printed); } + printf $outfile ("%2d: %s\n", $i, &pchars($subs[$i])); + $last_printed = $i; + } + } + splice(@subs, 0, 18); + } + + # It seems that $REGMARK is not marked as UTF-8 even when use utf8 is + # set and the input pattern was a UTF-8 string. We can, however, force + # it to be so marked. + + if ($show_mark && defined $REGMARK && $REGMARK != 1) + { + $xx = $REGMARK; + $xx = Encode::decode_utf8($xx) if $utf8; + printf $outfile ("MK: %s\n", &pchars($xx)); + } + } + } + } + +# By closing OUTFILE explicitly, we avoid a Perl warning in -w mode +# "main::OUTFILE" used only once". + +close(OUTFILE) if $outfile eq "OUTFILE"; + +PERLEND +) | $perl $perlarg - $@ + +# End diff --git a/src/pcre2/src/config.h.generic b/src/pcre2/src/config.h.generic new file mode 100644 index 00000000..e620bb0e --- /dev/null +++ b/src/pcre2/src/config.h.generic @@ -0,0 +1,445 @@ +/* src/config.h. Generated from config.h.in by configure. */ +/* src/config.h.in. Generated from configure.ac by autoheader. */ + +/* PCRE2 is written in Standard C, but there are a few non-standard things it +can cope with, allowing it to run on SunOS4 and other "close to standard" +systems. + +In environments that support the GNU autotools, config.h.in is converted into +config.h by the "configure" script. In environments that use CMake, +config-cmake.in is converted into config.h. 
If you are going to build PCRE2 "by +hand" without using "configure" or CMake, you should copy the distributed +config.h.generic to config.h, and edit the macro definitions to be the way you +need them. You must then add -DHAVE_CONFIG_H to all of your compile commands, +so that config.h is included at the start of every source. + +Alternatively, you can avoid editing by using -D on the compiler command line +to set the macro values. In this case, you do not have to set -DHAVE_CONFIG_H, +but if you do, default values will be taken from config.h for non-boolean +macros that are not defined on the command line. + +Boolean macros such as HAVE_STDLIB_H and SUPPORT_PCRE2_8 should either be +defined (conventionally to 1) for TRUE, and not defined at all for FALSE. All +such macros are listed as a commented #undef in config.h.generic. Macros such +as MATCH_LIMIT, whose actual value is relevant, have defaults defined, but are +surrounded by #ifndef/#endif lines so that the value can be overridden by -D. + +PCRE2 uses memmove() if HAVE_MEMMOVE is defined; otherwise it uses bcopy() if +HAVE_BCOPY is defined. If your system has neither bcopy() nor memmove(), make +sure both macros are undefined; an emulation function will then be used. */ + +/* By default, the \R escape sequence matches any Unicode line ending + character or sequence of characters. If BSR_ANYCRLF is defined (to any + value), this is changed so that backslash-R matches only CR, LF, or CRLF. + The build-time default can be overridden by the user of PCRE2 at runtime. + */ +/* #undef BSR_ANYCRLF */ + +/* Define to any value to disable the use of the z and t modifiers in + formatting settings such as %zu or %td (this is rarely needed). */ +/* #undef DISABLE_PERCENT_ZT */ + +/* If you are compiling for a system that uses EBCDIC instead of ASCII + character codes, define this macro to any value. When EBCDIC is set, PCRE2 + assumes that all input strings are in EBCDIC. 
If you do not define this + macro, PCRE2 will assume input strings are ASCII or UTF-8/16/32 Unicode. It + is not possible to build a version of PCRE2 that supports both EBCDIC and + UTF-8/16/32. */ +/* #undef EBCDIC */ + +/* In an EBCDIC environment, define this macro to any value to arrange for the + NL character to be 0x25 instead of the default 0x15. NL plays the role that + LF does in an ASCII/Unicode environment. */ +/* #undef EBCDIC_NL25 */ + +/* Define this if your compiler supports __attribute__((uninitialized)) */ +/* #undef HAVE_ATTRIBUTE_UNINITIALIZED */ + +/* Define to 1 if you have the `bcopy' function. */ +/* #undef HAVE_BCOPY */ + +/* Define to 1 if you have the <bzlib.h> header file. */ +/* #undef HAVE_BZLIB_H */ + +/* Define to 1 if you have the <dirent.h> header file. */ +/* #undef HAVE_DIRENT_H */ + +/* Define to 1 if you have the <dlfcn.h> header file. */ +/* #undef HAVE_DLFCN_H */ + +/* Define to 1 if you have the <editline/readline.h> header file. */ +/* #undef HAVE_EDITLINE_READLINE_H */ + +/* Define to 1 if you have the <edit/readline/readline.h> header file. */ +/* #undef HAVE_EDIT_READLINE_READLINE_H */ + +/* Define to 1 if you have the <inttypes.h> header file. */ +/* #undef HAVE_INTTYPES_H */ + +/* Define to 1 if you have the <limits.h> header file. */ +/* #undef HAVE_LIMITS_H */ + +/* Define to 1 if you have the `memfd_create' function. */ +/* #undef HAVE_MEMFD_CREATE */ + +/* Define to 1 if you have the `memmove' function. */ +/* #undef HAVE_MEMMOVE */ + +/* Define to 1 if you have the <minix/config.h> header file. */ +/* #undef HAVE_MINIX_CONFIG_H */ + +/* Define to 1 if you have the `mkostemp' function. */ +/* #undef HAVE_MKOSTEMP */ + +/* Define if you have POSIX threads libraries and header files. */ +/* #undef HAVE_PTHREAD */ + +/* Have PTHREAD_PRIO_INHERIT. */ +/* #undef HAVE_PTHREAD_PRIO_INHERIT */ + +/* Define to 1 if you have the <readline/history.h> header file. */ +/* #undef HAVE_READLINE_HISTORY_H */ + +/* Define to 1 if you have the <readline/readline.h> header file. */ +/* #undef HAVE_READLINE_READLINE_H */ + +/* Define to 1 if you have the `secure_getenv' function.
*/ +/* #undef HAVE_SECURE_GETENV */ + +/* Define to 1 if you have the <stdint.h> header file. */ +/* #undef HAVE_STDINT_H */ + +/* Define to 1 if you have the <stdio.h> header file. */ +/* #undef HAVE_STDIO_H */ + +/* Define to 1 if you have the <stdlib.h> header file. */ +/* #undef HAVE_STDLIB_H */ + +/* Define to 1 if you have the `strerror' function. */ +/* #undef HAVE_STRERROR */ + +/* Define to 1 if you have the <strings.h> header file. */ +/* #undef HAVE_STRINGS_H */ + +/* Define to 1 if you have the <string.h> header file. */ +/* #undef HAVE_STRING_H */ + +/* Define to 1 if you have the <sys/stat.h> header file. */ +/* #undef HAVE_SYS_STAT_H */ + +/* Define to 1 if you have the <sys/types.h> header file. */ +/* #undef HAVE_SYS_TYPES_H */ + +/* Define to 1 if you have the <sys/wait.h> header file. */ +/* #undef HAVE_SYS_WAIT_H */ + +/* Define to 1 if you have the <unistd.h> header file. */ +/* #undef HAVE_UNISTD_H */ + +/* Define to 1 if the compiler supports simple visibility declarations. */ +/* #undef HAVE_VISIBILITY */ + +/* Define to 1 if you have the <wchar.h> header file. */ +/* #undef HAVE_WCHAR_H */ + +/* Define to 1 if you have the <windows.h> header file. */ +/* #undef HAVE_WINDOWS_H */ + +/* Define to 1 if you have the <zlib.h> header file. */ +/* #undef HAVE_ZLIB_H */ + +/* This limits the amount of memory that may be used while matching a pattern. + It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply + to JIT matching. The value is in kibibytes (units of 1024 bytes). */ +#ifndef HEAP_LIMIT +#define HEAP_LIMIT 20000000 +#endif + +/* The value of LINK_SIZE determines the number of bytes used to store links + as offsets within the compiled regex. The default is 2, which allows for + compiled patterns up to 65535 code units long. This covers the vast + majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes + instead. This allows for longer patterns in extreme cases. */ +#ifndef LINK_SIZE +#define LINK_SIZE 2 +#endif + +/* Define to the sub-directory where libtool stores uninstalled libraries.
*/ +/* This is ignored unless you are using libtool. */ +#ifndef LT_OBJDIR +#define LT_OBJDIR ".libs/" +#endif + +/* The value of MATCH_LIMIT determines the default number of times the + pcre2_match() function can record a backtrack position during a single + matching attempt. The value is also used to limit a loop counter in + pcre2_dfa_match(). There is a runtime interface for setting a different + limit. The limit exists in order to catch runaway regular expressions that + take for ever to determine that they do not match. The default is set very + large so that it does not accidentally catch legitimate cases. */ +#ifndef MATCH_LIMIT +#define MATCH_LIMIT 10000000 +#endif + +/* The above limit applies to all backtracks, whether or not they are nested. + In some environments it is desirable to limit the nesting of backtracking + (that is, the depth of tree that is searched) more strictly, in order to + restrict the maximum amount of heap memory that is used. The value of + MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it + must be less than the value of MATCH_LIMIT. The default is to use the same + value as MATCH_LIMIT. There is a runtime method for setting a different + limit. In the case of pcre2_dfa_match(), this limit controls the depth of + the internal nested function calls that are used for pattern recursions, + lookarounds, and atomic groups. */ +#ifndef MATCH_LIMIT_DEPTH +#define MATCH_LIMIT_DEPTH MATCH_LIMIT +#endif + +/* This limit is parameterized just in case anybody ever wants to change it. + Care must be taken if it is increased, because it guards against integer + overflow caused by enormously large patterns. */ +#ifndef MAX_NAME_COUNT +#define MAX_NAME_COUNT 10000 +#endif + +/* This limit is parameterized just in case anybody ever wants to change it. + Care must be taken if it is increased, because it guards against integer + overflow caused by enormously large patterns. 
*/ +#ifndef MAX_NAME_SIZE +#define MAX_NAME_SIZE 32 +#endif + +/* Defining NEVER_BACKSLASH_C locks out the use of \C in all patterns. */ +/* #undef NEVER_BACKSLASH_C */ + +/* The value of NEWLINE_DEFAULT determines the default newline character + sequence. PCRE2 client programs can override this by selecting other values + at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5 + (ANYCRLF), and 6 (NUL). */ +#ifndef NEWLINE_DEFAULT +#define NEWLINE_DEFAULT 2 +#endif + +/* Name of package */ +#define PACKAGE "pcre2" + +/* Define to the address where bug reports for this package should be sent. */ +#define PACKAGE_BUGREPORT "" + +/* Define to the full name of this package. */ +#define PACKAGE_NAME "PCRE2" + +/* Define to the full name and version of this package. */ +#define PACKAGE_STRING "PCRE2 10.37" + +/* Define to the one symbol short name of this package. */ +#define PACKAGE_TARNAME "pcre2" + +/* Define to the home page for this package. */ +#define PACKAGE_URL "" + +/* Define to the version of this package. */ +#define PACKAGE_VERSION "10.37" + +/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested + parentheses (of any kind) in a pattern. This limits the amount of system + stack that is used while compiling a pattern. */ +#ifndef PARENS_NEST_LIMIT +#define PARENS_NEST_LIMIT 250 +#endif + +/* The value of PCRE2GREP_BUFSIZE is the starting size of the buffer used by + pcre2grep to hold parts of the file it is searching. The buffer will be + expanded up to PCRE2GREP_MAX_BUFSIZE if necessary, for files containing + very long lines. The actual amount of memory used by pcre2grep is three + times this number, because it allows for the buffering of "before" and + "after" lines. */ +#ifndef PCRE2GREP_BUFSIZE +#define PCRE2GREP_BUFSIZE 20480 +#endif + +/* The value of PCRE2GREP_MAX_BUFSIZE specifies the maximum size of the buffer + used by pcre2grep to hold parts of the file it is searching. 
The actual + amount of memory used by pcre2grep is three times this number, because it + allows for the buffering of "before" and "after" lines. */ +#ifndef PCRE2GREP_MAX_BUFSIZE +#define PCRE2GREP_MAX_BUFSIZE 1048576 +#endif + +/* Define to any value to include debugging code. */ +/* #undef PCRE2_DEBUG */ + +/* If you are compiling for a system other than a Unix-like system or + Win32, and it needs some magic to be inserted before the definition + of a function that is exported by the library, define this macro to + contain the relevant magic. If you do not define this macro, a suitable + __declspec value is used for Windows systems; in other environments + "extern" is used for a C compiler and "extern C" for a C++ compiler. + This macro appears at the start of every exported function that is part + of the external API. It does not appear on functions that are "external" + in the C sense, but which are internal to the library. */ +/* #undef PCRE2_EXP_DEFN */ + +/* Define to any value if linking statically (TODO: make nice with Libtool) */ +/* #undef PCRE2_STATIC */ + +/* Define to necessary symbol if this constant uses a non-standard name on + your system. */ +/* #undef PTHREAD_CREATE_JOINABLE */ + +/* Define to any non-zero number to enable support for SELinux compatible + executable memory allocator in JIT. Note that this will have no effect + unless SUPPORT_JIT is also defined. */ +/* #undef SLJIT_PROT_EXECUTABLE_ALLOCATOR */ + +/* Define to 1 if all of the C90 standard headers exist (not just the ones + required in a freestanding environment). This macro is provided for + backward compatibility; new code need not use it. */ +/* #undef STDC_HEADERS */ + +/* Define to any value to enable support for Just-In-Time compiling. */ +/* #undef SUPPORT_JIT */ + +/* Define to any value to allow pcre2grep to be linked with libbz2, so that it + is able to handle .bz2 files.
*/ +/* #undef SUPPORT_LIBBZ2 */ + +/* Define to any value to allow pcre2test to be linked with libedit. */ +/* #undef SUPPORT_LIBEDIT */ + +/* Define to any value to allow pcre2test to be linked with libreadline. */ +/* #undef SUPPORT_LIBREADLINE */ + +/* Define to any value to allow pcre2grep to be linked with libz, so that it + is able to handle .gz files. */ +/* #undef SUPPORT_LIBZ */ + +/* Define to any value to enable callout script support in pcre2grep. */ +/* #undef SUPPORT_PCRE2GREP_CALLOUT */ + +/* Define to any value to enable fork support in pcre2grep callout scripts. + This will have no effect unless SUPPORT_PCRE2GREP_CALLOUT is also defined. + */ +/* #undef SUPPORT_PCRE2GREP_CALLOUT_FORK */ + +/* Define to any value to enable JIT support in pcre2grep. Note that this will + have no effect unless SUPPORT_JIT is also defined. */ +/* #undef SUPPORT_PCRE2GREP_JIT */ + +/* Define to any value to enable the 16 bit PCRE2 library. */ +/* #undef SUPPORT_PCRE2_16 */ + +/* Define to any value to enable the 32 bit PCRE2 library. */ +/* #undef SUPPORT_PCRE2_32 */ + +/* Define to any value to enable the 8 bit PCRE2 library. */ +/* #undef SUPPORT_PCRE2_8 */ + +/* Define to any value to enable support for Unicode and UTF encoding. This + will work even in an EBCDIC environment, but it is incompatible with the + EBCDIC macro. That is, PCRE2 can support *either* EBCDIC code *or* + ASCII/Unicode, but not both at once. */ +/* #undef SUPPORT_UNICODE */ + +/* Define to any value for valgrind support to find invalid memory reads. */ +/* #undef SUPPORT_VALGRIND */ + +/* Enable extensions on AIX 3, Interix. */ +#ifndef _ALL_SOURCE +# define _ALL_SOURCE 1 +#endif +/* Enable general extensions on macOS. */ +#ifndef _DARWIN_C_SOURCE +# define _DARWIN_C_SOURCE 1 +#endif +/* Enable general extensions on Solaris. */ +#ifndef __EXTENSIONS__ +# define __EXTENSIONS__ 1 +#endif +/* Enable GNU extensions on systems that have them. 
*/ +#ifndef _GNU_SOURCE +# define _GNU_SOURCE 1 +#endif +/* Enable X/Open compliant socket functions that do not require linking + with -lxnet on HP-UX 11.11. */ +#ifndef _HPUX_ALT_XOPEN_SOCKET_API +# define _HPUX_ALT_XOPEN_SOCKET_API 1 +#endif +/* Identify the host operating system as Minix. + This macro does not affect the system headers' behavior. + A future release of Autoconf may stop defining this macro. */ +#ifndef _MINIX +/* # undef _MINIX */ +#endif +/* Enable general extensions on NetBSD. + Enable NetBSD compatibility extensions on Minix. */ +#ifndef _NETBSD_SOURCE +# define _NETBSD_SOURCE 1 +#endif +/* Enable OpenBSD compatibility extensions on NetBSD. + Oddly enough, this does nothing on OpenBSD. */ +#ifndef _OPENBSD_SOURCE +# define _OPENBSD_SOURCE 1 +#endif +/* Define to 1 if needed for POSIX-compatible behavior. */ +#ifndef _POSIX_SOURCE +/* # undef _POSIX_SOURCE */ +#endif +/* Define to 2 if needed for POSIX-compatible behavior. */ +#ifndef _POSIX_1_SOURCE +/* # undef _POSIX_1_SOURCE */ +#endif +/* Enable POSIX-compatible threading on Solaris. */ +#ifndef _POSIX_PTHREAD_SEMANTICS +# define _POSIX_PTHREAD_SEMANTICS 1 +#endif +/* Enable extensions specified by ISO/IEC TS 18661-5:2014. */ +#ifndef __STDC_WANT_IEC_60559_ATTRIBS_EXT__ +# define __STDC_WANT_IEC_60559_ATTRIBS_EXT__ 1 +#endif +/* Enable extensions specified by ISO/IEC TS 18661-1:2014. */ +#ifndef __STDC_WANT_IEC_60559_BFP_EXT__ +# define __STDC_WANT_IEC_60559_BFP_EXT__ 1 +#endif +/* Enable extensions specified by ISO/IEC TS 18661-2:2015. */ +#ifndef __STDC_WANT_IEC_60559_DFP_EXT__ +# define __STDC_WANT_IEC_60559_DFP_EXT__ 1 +#endif +/* Enable extensions specified by ISO/IEC TS 18661-4:2015. */ +#ifndef __STDC_WANT_IEC_60559_FUNCS_EXT__ +# define __STDC_WANT_IEC_60559_FUNCS_EXT__ 1 +#endif +/* Enable extensions specified by ISO/IEC TS 18661-3:2015. 
*/ +#ifndef __STDC_WANT_IEC_60559_TYPES_EXT__ +# define __STDC_WANT_IEC_60559_TYPES_EXT__ 1 +#endif +/* Enable extensions specified by ISO/IEC TR 24731-2:2010. */ +#ifndef __STDC_WANT_LIB_EXT2__ +# define __STDC_WANT_LIB_EXT2__ 1 +#endif +/* Enable extensions specified by ISO/IEC 24747:2009. */ +#ifndef __STDC_WANT_MATH_SPEC_FUNCS__ +# define __STDC_WANT_MATH_SPEC_FUNCS__ 1 +#endif +/* Enable extensions on HP NonStop. */ +#ifndef _TANDEM_SOURCE +# define _TANDEM_SOURCE 1 +#endif +/* Enable X/Open extensions. Define to 500 only if necessary + to make mbstate_t available. */ +#ifndef _XOPEN_SOURCE +/* # undef _XOPEN_SOURCE */ +#endif + +/* Version number of package */ +#define VERSION "10.37" + +/* Define to empty if `const' does not conform to ANSI C. */ +/* #undef const */ + +/* Define to the type of a signed integer type of width exactly 64 bits if + such a type exists and the standard includes do not define it. */ +/* #undef int64_t */ + +/* Define to `unsigned int' if <sys/types.h> does not define. */ +/* #undef size_t */ diff --git a/src/pcre2/src/config.h.in b/src/pcre2/src/config.h.in new file mode 100644 index 00000000..e7ab0640 --- /dev/null +++ b/src/pcre2/src/config.h.in @@ -0,0 +1,433 @@ +/* src/config.h.in. Generated from configure.ac by autoheader. */ + + +/* PCRE2 is written in Standard C, but there are a few non-standard things it +can cope with, allowing it to run on SunOS4 and other "close to standard" +systems. + +In environments that support the GNU autotools, config.h.in is converted into +config.h by the "configure" script. In environments that use CMake, +config-cmake.in is converted into config.h. If you are going to build PCRE2 "by +hand" without using "configure" or CMake, you should copy the distributed +config.h.generic to config.h, and edit the macro definitions to be the way you +need them. You must then add -DHAVE_CONFIG_H to all of your compile commands, +so that config.h is included at the start of every source.
+ +Alternatively, you can avoid editing by using -D on the compiler command line +to set the macro values. In this case, you do not have to set -DHAVE_CONFIG_H, +but if you do, default values will be taken from config.h for non-boolean +macros that are not defined on the command line. + +Boolean macros such as HAVE_STDLIB_H and SUPPORT_PCRE2_8 should either be +defined (conventionally to 1) for TRUE, and not defined at all for FALSE. All +such macros are listed as a commented #undef in config.h.generic. Macros such +as MATCH_LIMIT, whose actual value is relevant, have defaults defined, but are +surrounded by #ifndef/#endif lines so that the value can be overridden by -D. + +PCRE2 uses memmove() if HAVE_MEMMOVE is defined; otherwise it uses bcopy() if +HAVE_BCOPY is defined. If your system has neither bcopy() nor memmove(), make +sure both macros are undefined; an emulation function will then be used. */ + +/* By default, the \R escape sequence matches any Unicode line ending + character or sequence of characters. If BSR_ANYCRLF is defined (to any + value), this is changed so that backslash-R matches only CR, LF, or CRLF. + The build-time default can be overridden by the user of PCRE2 at runtime. + */ +#undef BSR_ANYCRLF + +/* Define to any value to disable the use of the z and t modifiers in + formatting settings such as %zu or %td (this is rarely needed). */ +#undef DISABLE_PERCENT_ZT + +/* If you are compiling for a system that uses EBCDIC instead of ASCII + character codes, define this macro to any value. When EBCDIC is set, PCRE2 + assumes that all input strings are in EBCDIC. If you do not define this + macro, PCRE2 will assume input strings are ASCII or UTF-8/16/32 Unicode. It + is not possible to build a version of PCRE2 that supports both EBCDIC and + UTF-8/16/32. */ +#undef EBCDIC + +/* In an EBCDIC environment, define this macro to any value to arrange for the + NL character to be 0x25 instead of the default 0x15. 
NL plays the role that + LF does in an ASCII/Unicode environment. */ +#undef EBCDIC_NL25 + +/* Define this if your compiler supports __attribute__((uninitialized)) */ +#undef HAVE_ATTRIBUTE_UNINITIALIZED + +/* Define to 1 if you have the `bcopy' function. */ +#undef HAVE_BCOPY + +/* Define to 1 if you have the <bzlib.h> header file. */ +#undef HAVE_BZLIB_H + +/* Define to 1 if you have the <dirent.h> header file. */ +#undef HAVE_DIRENT_H + +/* Define to 1 if you have the <dlfcn.h> header file. */ +#undef HAVE_DLFCN_H + +/* Define to 1 if you have the <editline/readline.h> header file. */ +#undef HAVE_EDITLINE_READLINE_H + +/* Define to 1 if you have the <edit/readline/readline.h> header file. */ +#undef HAVE_EDIT_READLINE_READLINE_H + +/* Define to 1 if you have the <inttypes.h> header file. */ +#undef HAVE_INTTYPES_H + +/* Define to 1 if you have the <limits.h> header file. */ +#undef HAVE_LIMITS_H + +/* Define to 1 if you have the `memfd_create' function. */ +#undef HAVE_MEMFD_CREATE + +/* Define to 1 if you have the `memmove' function. */ +#undef HAVE_MEMMOVE + +/* Define to 1 if you have the <minix/config.h> header file. */ +#undef HAVE_MINIX_CONFIG_H + +/* Define to 1 if you have the `mkostemp' function. */ +#undef HAVE_MKOSTEMP + +/* Define if you have POSIX threads libraries and header files. */ +#undef HAVE_PTHREAD + +/* Have PTHREAD_PRIO_INHERIT. */ +#undef HAVE_PTHREAD_PRIO_INHERIT + +/* Define to 1 if you have the <readline/history.h> header file. */ +#undef HAVE_READLINE_HISTORY_H + +/* Define to 1 if you have the <readline/readline.h> header file. */ +#undef HAVE_READLINE_READLINE_H + +/* Define to 1 if you have the `secure_getenv' function. */ +#undef HAVE_SECURE_GETENV + +/* Define to 1 if you have the <stdint.h> header file. */ +#undef HAVE_STDINT_H + +/* Define to 1 if you have the <stdio.h> header file. */ +#undef HAVE_STDIO_H + +/* Define to 1 if you have the <stdlib.h> header file. */ +#undef HAVE_STDLIB_H + +/* Define to 1 if you have the `strerror' function. */ +#undef HAVE_STRERROR + +/* Define to 1 if you have the <strings.h> header file. */ +#undef HAVE_STRINGS_H + +/* Define to 1 if you have the <string.h> header file.
*/ +#undef HAVE_STRING_H + +/* Define to 1 if you have the <sys/stat.h> header file. */ +#undef HAVE_SYS_STAT_H + +/* Define to 1 if you have the <sys/types.h> header file. */ +#undef HAVE_SYS_TYPES_H + +/* Define to 1 if you have the <sys/wait.h> header file. */ +#undef HAVE_SYS_WAIT_H + +/* Define to 1 if you have the <unistd.h> header file. */ +#undef HAVE_UNISTD_H + +/* Define to 1 if the compiler supports simple visibility declarations. */ +#undef HAVE_VISIBILITY + +/* Define to 1 if you have the <wchar.h> header file. */ +#undef HAVE_WCHAR_H + +/* Define to 1 if you have the <windows.h> header file. */ +#undef HAVE_WINDOWS_H + +/* Define to 1 if you have the <zlib.h> header file. */ +#undef HAVE_ZLIB_H + +/* This limits the amount of memory that may be used while matching a pattern. + It applies to both pcre2_match() and pcre2_dfa_match(). It does not apply + to JIT matching. The value is in kibibytes (units of 1024 bytes). */ +#undef HEAP_LIMIT + +/* The value of LINK_SIZE determines the number of bytes used to store links + as offsets within the compiled regex. The default is 2, which allows for + compiled patterns up to 65535 code units long. This covers the vast + majority of cases. However, PCRE2 can also be compiled to use 3 or 4 bytes + instead. This allows for longer patterns in extreme cases. */ +#undef LINK_SIZE + +/* Define to the sub-directory where libtool stores uninstalled libraries. */ +#undef LT_OBJDIR + +/* The value of MATCH_LIMIT determines the default number of times the + pcre2_match() function can record a backtrack position during a single + matching attempt. The value is also used to limit a loop counter in + pcre2_dfa_match(). There is a runtime interface for setting a different + limit. The limit exists in order to catch runaway regular expressions that + take for ever to determine that they do not match. The default is set very + large so that it does not accidentally catch legitimate cases. */ +#undef MATCH_LIMIT + +/* The above limit applies to all backtracks, whether or not they are nested.
+ In some environments it is desirable to limit the nesting of backtracking + (that is, the depth of tree that is searched) more strictly, in order to + restrict the maximum amount of heap memory that is used. The value of + MATCH_LIMIT_DEPTH provides this facility. To have any useful effect, it + must be less than the value of MATCH_LIMIT. The default is to use the same + value as MATCH_LIMIT. There is a runtime method for setting a different + limit. In the case of pcre2_dfa_match(), this limit controls the depth of + the internal nested function calls that are used for pattern recursions, + lookarounds, and atomic groups. */ +#undef MATCH_LIMIT_DEPTH + +/* This limit is parameterized just in case anybody ever wants to change it. + Care must be taken if it is increased, because it guards against integer + overflow caused by enormously large patterns. */ +#undef MAX_NAME_COUNT + +/* This limit is parameterized just in case anybody ever wants to change it. + Care must be taken if it is increased, because it guards against integer + overflow caused by enormously large patterns. */ +#undef MAX_NAME_SIZE + +/* Defining NEVER_BACKSLASH_C locks out the use of \C in all patterns. */ +#undef NEVER_BACKSLASH_C + +/* The value of NEWLINE_DEFAULT determines the default newline character + sequence. PCRE2 client programs can override this by selecting other values + at run time. The valid values are 1 (CR), 2 (LF), 3 (CRLF), 4 (ANY), 5 + (ANYCRLF), and 6 (NUL). */ +#undef NEWLINE_DEFAULT + +/* Name of package */ +#undef PACKAGE + +/* Define to the address where bug reports for this package should be sent. */ +#undef PACKAGE_BUGREPORT + +/* Define to the full name of this package. */ +#undef PACKAGE_NAME + +/* Define to the full name and version of this package. */ +#undef PACKAGE_STRING + +/* Define to the one symbol short name of this package. */ +#undef PACKAGE_TARNAME + +/* Define to the home page for this package. 
*/ +#undef PACKAGE_URL + +/* Define to the version of this package. */ +#undef PACKAGE_VERSION + +/* The value of PARENS_NEST_LIMIT specifies the maximum depth of nested + parentheses (of any kind) in a pattern. This limits the amount of system + stack that is used while compiling a pattern. */ +#undef PARENS_NEST_LIMIT + +/* The value of PCRE2GREP_BUFSIZE is the starting size of the buffer used by + pcre2grep to hold parts of the file it is searching. The buffer will be + expanded up to PCRE2GREP_MAX_BUFSIZE if necessary, for files containing + very long lines. The actual amount of memory used by pcre2grep is three + times this number, because it allows for the buffering of "before" and + "after" lines. */ +#undef PCRE2GREP_BUFSIZE + +/* The value of PCRE2GREP_MAX_BUFSIZE specifies the maximum size of the buffer + used by pcre2grep to hold parts of the file it is searching. The actual + amount of memory used by pcre2grep is three times this number, because it + allows for the buffering of "before" and "after" lines. */ +#undef PCRE2GREP_MAX_BUFSIZE + +/* to make a symbol visible */ +#undef PCRE2POSIX_EXP_DECL + +/* to make a symbol visible */ +#undef PCRE2POSIX_EXP_DEFN + +/* Define to any value to include debugging code. */ +#undef PCRE2_DEBUG + +/* to make a symbol visible */ +#undef PCRE2_EXP_DECL + + +/* If you are compiling for a system other than a Unix-like system or + Win32, and it needs some magic to be inserted before the definition + of a function that is exported by the library, define this macro to + contain the relevant magic. If you do not define this macro, a suitable + __declspec value is used for Windows systems; in other environments + "extern" is used for a C compiler and "extern C" for a C++ compiler. + This macro appears at the start of every exported function that is part + of the external API. It does not appear on functions that are "external" + in the C sense, but which are internal to the library.
*/ +#undef PCRE2_EXP_DEFN + +/* Define to any value if linking statically (TODO: make nice with Libtool) */ +#undef PCRE2_STATIC + +/* Define to necessary symbol if this constant uses a non-standard name on + your system. */ +#undef PTHREAD_CREATE_JOINABLE + +/* Define to any non-zero number to enable support for SELinux compatible + executable memory allocator in JIT. Note that this will have no effect + unless SUPPORT_JIT is also defined. */ +#undef SLJIT_PROT_EXECUTABLE_ALLOCATOR + +/* Define to 1 if all of the C90 standard headers exist (not just the ones + required in a freestanding environment). This macro is provided for + backward compatibility; new code need not use it. */ +#undef STDC_HEADERS + +/* Define to any value to enable support for Just-In-Time compiling. */ +#undef SUPPORT_JIT + +/* Define to any value to allow pcre2grep to be linked with libbz2, so that it + is able to handle .bz2 files. */ +#undef SUPPORT_LIBBZ2 + +/* Define to any value to allow pcre2test to be linked with libedit. */ +#undef SUPPORT_LIBEDIT + +/* Define to any value to allow pcre2test to be linked with libreadline. */ +#undef SUPPORT_LIBREADLINE + +/* Define to any value to allow pcre2grep to be linked with libz, so that it + is able to handle .gz files. */ +#undef SUPPORT_LIBZ + +/* Define to any value to enable callout script support in pcre2grep. */ +#undef SUPPORT_PCRE2GREP_CALLOUT + +/* Define to any value to enable fork support in pcre2grep callout scripts. + This will have no effect unless SUPPORT_PCRE2GREP_CALLOUT is also defined. + */ +#undef SUPPORT_PCRE2GREP_CALLOUT_FORK + +/* Define to any value to enable JIT support in pcre2grep. Note that this will + have no effect unless SUPPORT_JIT is also defined. */ +#undef SUPPORT_PCRE2GREP_JIT + +/* Define to any value to enable the 16 bit PCRE2 library. */ +#undef SUPPORT_PCRE2_16 + +/* Define to any value to enable the 32 bit PCRE2 library. 
*/ +#undef SUPPORT_PCRE2_32 + +/* Define to any value to enable the 8 bit PCRE2 library. */ +#undef SUPPORT_PCRE2_8 + +/* Define to any value to enable support for Unicode and UTF encoding. This + will work even in an EBCDIC environment, but it is incompatible with the + EBCDIC macro. That is, PCRE2 can support *either* EBCDIC code *or* + ASCII/Unicode, but not both at once. */ +#undef SUPPORT_UNICODE + +/* Define to any value for valgrind support to find invalid memory reads. */ +#undef SUPPORT_VALGRIND + +/* Enable extensions on AIX 3, Interix. */ +#ifndef _ALL_SOURCE +# undef _ALL_SOURCE +#endif +/* Enable general extensions on macOS. */ +#ifndef _DARWIN_C_SOURCE +# undef _DARWIN_C_SOURCE +#endif +/* Enable general extensions on Solaris. */ +#ifndef __EXTENSIONS__ +# undef __EXTENSIONS__ +#endif +/* Enable GNU extensions on systems that have them. */ +#ifndef _GNU_SOURCE +# undef _GNU_SOURCE +#endif +/* Enable X/Open compliant socket functions that do not require linking + with -lxnet on HP-UX 11.11. */ +#ifndef _HPUX_ALT_XOPEN_SOCKET_API +# undef _HPUX_ALT_XOPEN_SOCKET_API +#endif +/* Identify the host operating system as Minix. + This macro does not affect the system headers' behavior. + A future release of Autoconf may stop defining this macro. */ +#ifndef _MINIX +# undef _MINIX +#endif +/* Enable general extensions on NetBSD. + Enable NetBSD compatibility extensions on Minix. */ +#ifndef _NETBSD_SOURCE +# undef _NETBSD_SOURCE +#endif +/* Enable OpenBSD compatibility extensions on NetBSD. + Oddly enough, this does nothing on OpenBSD. */ +#ifndef _OPENBSD_SOURCE +# undef _OPENBSD_SOURCE +#endif +/* Define to 1 if needed for POSIX-compatible behavior. */ +#ifndef _POSIX_SOURCE +# undef _POSIX_SOURCE +#endif +/* Define to 2 if needed for POSIX-compatible behavior. */ +#ifndef _POSIX_1_SOURCE +# undef _POSIX_1_SOURCE +#endif +/* Enable POSIX-compatible threading on Solaris. 
*/ +#ifndef _POSIX_PTHREAD_SEMANTICS +# undef _POSIX_PTHREAD_SEMANTICS +#endif +/* Enable extensions specified by ISO/IEC TS 18661-5:2014. */ +#ifndef __STDC_WANT_IEC_60559_ATTRIBS_EXT__ +# undef __STDC_WANT_IEC_60559_ATTRIBS_EXT__ +#endif +/* Enable extensions specified by ISO/IEC TS 18661-1:2014. */ +#ifndef __STDC_WANT_IEC_60559_BFP_EXT__ +# undef __STDC_WANT_IEC_60559_BFP_EXT__ +#endif +/* Enable extensions specified by ISO/IEC TS 18661-2:2015. */ +#ifndef __STDC_WANT_IEC_60559_DFP_EXT__ +# undef __STDC_WANT_IEC_60559_DFP_EXT__ +#endif +/* Enable extensions specified by ISO/IEC TS 18661-4:2015. */ +#ifndef __STDC_WANT_IEC_60559_FUNCS_EXT__ +# undef __STDC_WANT_IEC_60559_FUNCS_EXT__ +#endif +/* Enable extensions specified by ISO/IEC TS 18661-3:2015. */ +#ifndef __STDC_WANT_IEC_60559_TYPES_EXT__ +# undef __STDC_WANT_IEC_60559_TYPES_EXT__ +#endif +/* Enable extensions specified by ISO/IEC TR 24731-2:2010. */ +#ifndef __STDC_WANT_LIB_EXT2__ +# undef __STDC_WANT_LIB_EXT2__ +#endif +/* Enable extensions specified by ISO/IEC 24747:2009. */ +#ifndef __STDC_WANT_MATH_SPEC_FUNCS__ +# undef __STDC_WANT_MATH_SPEC_FUNCS__ +#endif +/* Enable extensions on HP NonStop. */ +#ifndef _TANDEM_SOURCE +# undef _TANDEM_SOURCE +#endif +/* Enable X/Open extensions. Define to 500 only if necessary + to make mbstate_t available. */ +#ifndef _XOPEN_SOURCE +# undef _XOPEN_SOURCE +#endif + + +/* Version number of package */ +#undef VERSION + +/* Define to empty if `const' does not conform to ANSI C. */ +#undef const + +/* Define to the type of a signed integer type of width exactly 64 bits if + such a type exists and the standard includes do not define it. */ +#undef int64_t + +/* Define to `unsigned int' if <sys/types.h> does not define.
*/ +#undef size_t diff --git a/src/pcre2/src/pcre2.h.generic b/src/pcre2/src/pcre2.h.generic new file mode 100644 index 00000000..7ab6b39a --- /dev/null +++ b/src/pcre2/src/pcre2.h.generic @@ -0,0 +1,991 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* This is the public header file for the PCRE library, second API, to be +#included by applications that call PCRE2 functions. + + Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifndef PCRE2_H_IDEMPOTENT_GUARD +#define PCRE2_H_IDEMPOTENT_GUARD + +/* The current PCRE version information. */ + +#define PCRE2_MAJOR 10 +#define PCRE2_MINOR 37 +#define PCRE2_PRERELEASE +#define PCRE2_DATE 2021-05-26 + +/* When an application links to a PCRE DLL in Windows, the symbols that are +imported have to be identified as such. When building PCRE2, the appropriate +export setting is defined in pcre2_internal.h, which includes this file. So we +don't change existing definitions of PCRE2_EXP_DECL. */ + +#if defined(_WIN32) && !defined(PCRE2_STATIC) +# ifndef PCRE2_EXP_DECL +# define PCRE2_EXP_DECL extern __declspec(dllimport) +# endif +#endif + +/* By default, we use the standard "extern" declarations. */ + +#ifndef PCRE2_EXP_DECL +# ifdef __cplusplus +# define PCRE2_EXP_DECL extern "C" +# else +# define PCRE2_EXP_DECL extern +# endif +#endif + +/* When compiling with the MSVC compiler, it is sometimes necessary to include +a "calling convention" before exported function names. (This is secondhand +information; I know nothing about MSVC myself). For example, something like + + void __cdecl function(....) + +might be needed. In order to make this easy, all the exported functions have +PCRE2_CALL_CONVENTION just before their names. It is rarely needed; if not +set, we ensure here that it has no effect.
*/ +#ifndef PCRE2_CALL_CONVENTION +#define PCRE2_CALL_CONVENTION +#endif + +/* Have to include limits.h, stdlib.h, and inttypes.h to ensure that size_t and +uint8_t, UCHAR_MAX, etc are defined. Some systems that do have inttypes.h do +not have stdint.h, which is why we use inttypes.h, which according to the C +standard is a superset of stdint.h. If none of these headers are available, +the relevant values must be provided by some other means. */ + +#include <limits.h> +#include <stdlib.h> +#include <inttypes.h> + +/* Allow for C++ users compiling this directly. */ + +#ifdef __cplusplus +extern "C" { +#endif + +/* The following option bits can be passed to pcre2_compile(), pcre2_match(), +or pcre2_dfa_match(). PCRE2_NO_UTF_CHECK affects only the function to which it +is passed. Put these bits at the most significant end of the options word so +others can be added next to them */ + +#define PCRE2_ANCHORED 0x80000000u +#define PCRE2_NO_UTF_CHECK 0x40000000u +#define PCRE2_ENDANCHORED 0x20000000u + +/* The following option bits can be passed only to pcre2_compile(). However, +they may affect compilation, JIT compilation, and/or interpretive execution.
+The following tags indicate which: + +C alters what is compiled by pcre2_compile() +J alters what is compiled by pcre2_jit_compile() +M is inspected during pcre2_match() execution +D is inspected during pcre2_dfa_match() execution +*/ + +#define PCRE2_ALLOW_EMPTY_CLASS 0x00000001u /* C */ +#define PCRE2_ALT_BSUX 0x00000002u /* C */ +#define PCRE2_AUTO_CALLOUT 0x00000004u /* C */ +#define PCRE2_CASELESS 0x00000008u /* C */ +#define PCRE2_DOLLAR_ENDONLY 0x00000010u /* J M D */ +#define PCRE2_DOTALL 0x00000020u /* C */ +#define PCRE2_DUPNAMES 0x00000040u /* C */ +#define PCRE2_EXTENDED 0x00000080u /* C */ +#define PCRE2_FIRSTLINE 0x00000100u /* J M D */ +#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u /* C J M */ +#define PCRE2_MULTILINE 0x00000400u /* C */ +#define PCRE2_NEVER_UCP 0x00000800u /* C */ +#define PCRE2_NEVER_UTF 0x00001000u /* C */ +#define PCRE2_NO_AUTO_CAPTURE 0x00002000u /* C */ +#define PCRE2_NO_AUTO_POSSESS 0x00004000u /* C */ +#define PCRE2_NO_DOTSTAR_ANCHOR 0x00008000u /* C */ +#define PCRE2_NO_START_OPTIMIZE 0x00010000u /* J M D */ +#define PCRE2_UCP 0x00020000u /* C J M D */ +#define PCRE2_UNGREEDY 0x00040000u /* C */ +#define PCRE2_UTF 0x00080000u /* C J M D */ +#define PCRE2_NEVER_BACKSLASH_C 0x00100000u /* C */ +#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */ +#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */ +#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */ +#define PCRE2_EXTENDED_MORE 0x01000000u /* C */ +#define PCRE2_LITERAL 0x02000000u /* C */ +#define PCRE2_MATCH_INVALID_UTF 0x04000000u /* J M D */ + +/* An additional compile options word is available in the compile context. 
*/ + +#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */ +#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */ +#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */ +#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */ +#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */ +#define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */ + +/* These are for pcre2_jit_compile(). */ + +#define PCRE2_JIT_COMPLETE 0x00000001u /* For full matching */ +#define PCRE2_JIT_PARTIAL_SOFT 0x00000002u +#define PCRE2_JIT_PARTIAL_HARD 0x00000004u +#define PCRE2_JIT_INVALID_UTF 0x00000100u + +/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and +pcre2_substitute(). Some are allowed only for one of the functions, and in +these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and +PCRE2_NO_UTF_CHECK can also be passed to these functions (though +pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */ + +#define PCRE2_NOTBOL 0x00000001u +#define PCRE2_NOTEOL 0x00000002u +#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */ +#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. 
*/ +#define PCRE2_PARTIAL_SOFT 0x00000010u +#define PCRE2_PARTIAL_HARD 0x00000020u +#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */ +#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */ +#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */ +#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */ +#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u +#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u /* pcre2_substitute() only */ + +/* Options for pcre2_pattern_convert(). */ + +#define PCRE2_CONVERT_UTF 0x00000001u +#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u +#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u +#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u +#define PCRE2_CONVERT_GLOB 0x00000010u +#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u +#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u + +/* Newline and \R settings, for use in compile contexts. The newline values +must be kept in step with values set in config.h and both sets must all be +greater than zero. */ + +#define PCRE2_NEWLINE_CR 1 +#define PCRE2_NEWLINE_LF 2 +#define PCRE2_NEWLINE_CRLF 3 +#define PCRE2_NEWLINE_ANY 4 +#define PCRE2_NEWLINE_ANYCRLF 5 +#define PCRE2_NEWLINE_NUL 6 + +#define PCRE2_BSR_UNICODE 1 +#define PCRE2_BSR_ANYCRLF 2 + +/* Error codes for pcre2_compile(). Some of these are also used by +pcre2_pattern_convert(). 
*/ + +#define PCRE2_ERROR_END_BACKSLASH 101 +#define PCRE2_ERROR_END_BACKSLASH_C 102 +#define PCRE2_ERROR_UNKNOWN_ESCAPE 103 +#define PCRE2_ERROR_QUANTIFIER_OUT_OF_ORDER 104 +#define PCRE2_ERROR_QUANTIFIER_TOO_BIG 105 +#define PCRE2_ERROR_MISSING_SQUARE_BRACKET 106 +#define PCRE2_ERROR_ESCAPE_INVALID_IN_CLASS 107 +#define PCRE2_ERROR_CLASS_RANGE_ORDER 108 +#define PCRE2_ERROR_QUANTIFIER_INVALID 109 +#define PCRE2_ERROR_INTERNAL_UNEXPECTED_REPEAT 110 +#define PCRE2_ERROR_INVALID_AFTER_PARENS_QUERY 111 +#define PCRE2_ERROR_POSIX_CLASS_NOT_IN_CLASS 112 +#define PCRE2_ERROR_POSIX_NO_SUPPORT_COLLATING 113 +#define PCRE2_ERROR_MISSING_CLOSING_PARENTHESIS 114 +#define PCRE2_ERROR_BAD_SUBPATTERN_REFERENCE 115 +#define PCRE2_ERROR_NULL_PATTERN 116 +#define PCRE2_ERROR_BAD_OPTIONS 117 +#define PCRE2_ERROR_MISSING_COMMENT_CLOSING 118 +#define PCRE2_ERROR_PARENTHESES_NEST_TOO_DEEP 119 +#define PCRE2_ERROR_PATTERN_TOO_LARGE 120 +#define PCRE2_ERROR_HEAP_FAILED 121 +#define PCRE2_ERROR_UNMATCHED_CLOSING_PARENTHESIS 122 +#define PCRE2_ERROR_INTERNAL_CODE_OVERFLOW 123 +#define PCRE2_ERROR_MISSING_CONDITION_CLOSING 124 +#define PCRE2_ERROR_LOOKBEHIND_NOT_FIXED_LENGTH 125 +#define PCRE2_ERROR_ZERO_RELATIVE_REFERENCE 126 +#define PCRE2_ERROR_TOO_MANY_CONDITION_BRANCHES 127 +#define PCRE2_ERROR_CONDITION_ASSERTION_EXPECTED 128 +#define PCRE2_ERROR_BAD_RELATIVE_REFERENCE 129 +#define PCRE2_ERROR_UNKNOWN_POSIX_CLASS 130 +#define PCRE2_ERROR_INTERNAL_STUDY_ERROR 131 +#define PCRE2_ERROR_UNICODE_NOT_SUPPORTED 132 +#define PCRE2_ERROR_PARENTHESES_STACK_CHECK 133 +#define PCRE2_ERROR_CODE_POINT_TOO_BIG 134 +#define PCRE2_ERROR_LOOKBEHIND_TOO_COMPLICATED 135 +#define PCRE2_ERROR_LOOKBEHIND_INVALID_BACKSLASH_C 136 +#define PCRE2_ERROR_UNSUPPORTED_ESCAPE_SEQUENCE 137 +#define PCRE2_ERROR_CALLOUT_NUMBER_TOO_BIG 138 +#define PCRE2_ERROR_MISSING_CALLOUT_CLOSING 139 +#define PCRE2_ERROR_ESCAPE_INVALID_IN_VERB 140 +#define PCRE2_ERROR_UNRECOGNIZED_AFTER_QUERY_P 141 +#define 
PCRE2_ERROR_MISSING_NAME_TERMINATOR 142 +#define PCRE2_ERROR_DUPLICATE_SUBPATTERN_NAME 143 +#define PCRE2_ERROR_INVALID_SUBPATTERN_NAME 144 +#define PCRE2_ERROR_UNICODE_PROPERTIES_UNAVAILABLE 145 +#define PCRE2_ERROR_MALFORMED_UNICODE_PROPERTY 146 +#define PCRE2_ERROR_UNKNOWN_UNICODE_PROPERTY 147 +#define PCRE2_ERROR_SUBPATTERN_NAME_TOO_LONG 148 +#define PCRE2_ERROR_TOO_MANY_NAMED_SUBPATTERNS 149 +#define PCRE2_ERROR_CLASS_INVALID_RANGE 150 +#define PCRE2_ERROR_OCTAL_BYTE_TOO_BIG 151 +#define PCRE2_ERROR_INTERNAL_OVERRAN_WORKSPACE 152 +#define PCRE2_ERROR_INTERNAL_MISSING_SUBPATTERN 153 +#define PCRE2_ERROR_DEFINE_TOO_MANY_BRANCHES 154 +#define PCRE2_ERROR_BACKSLASH_O_MISSING_BRACE 155 +#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156 +#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157 +#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158 +/* Error 159 is obsolete and should now never occur */ +#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159 +#define PCRE2_ERROR_VERB_UNKNOWN 160 +#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161 +#define PCRE2_ERROR_SUBPATTERN_NAME_EXPECTED 162 +#define PCRE2_ERROR_INTERNAL_PARSED_OVERFLOW 163 +#define PCRE2_ERROR_INVALID_OCTAL 164 +#define PCRE2_ERROR_SUBPATTERN_NAMES_MISMATCH 165 +#define PCRE2_ERROR_MARK_MISSING_ARGUMENT 166 +#define PCRE2_ERROR_INVALID_HEXADECIMAL 167 +#define PCRE2_ERROR_BACKSLASH_C_SYNTAX 168 +#define PCRE2_ERROR_BACKSLASH_K_SYNTAX 169 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_LOOKBEHINDS 170 +#define PCRE2_ERROR_BACKSLASH_N_IN_CLASS 171 +#define PCRE2_ERROR_CALLOUT_STRING_TOO_LONG 172 +#define PCRE2_ERROR_UNICODE_DISALLOWED_CODE_POINT 173 +#define PCRE2_ERROR_UTF_IS_DISABLED 174 +#define PCRE2_ERROR_UCP_IS_DISABLED 175 +#define PCRE2_ERROR_VERB_NAME_TOO_LONG 176 +#define PCRE2_ERROR_BACKSLASH_U_CODE_POINT_TOO_BIG 177 +#define PCRE2_ERROR_MISSING_OCTAL_OR_HEX_DIGITS 178 +#define PCRE2_ERROR_VERSION_CONDITION_SYNTAX 179 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_AUTO_POSSESS 180 +#define 
PCRE2_ERROR_CALLOUT_NO_STRING_DELIMITER 181 +#define PCRE2_ERROR_CALLOUT_BAD_STRING_DELIMITER 182 +#define PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED 183 +#define PCRE2_ERROR_QUERY_BARJX_NEST_TOO_DEEP 184 +#define PCRE2_ERROR_BACKSLASH_C_LIBRARY_DISABLED 185 +#define PCRE2_ERROR_PATTERN_TOO_COMPLICATED 186 +#define PCRE2_ERROR_LOOKBEHIND_TOO_LONG 187 +#define PCRE2_ERROR_PATTERN_STRING_TOO_LONG 188 +#define PCRE2_ERROR_INTERNAL_BAD_CODE 189 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP 190 +#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191 +#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192 +#define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193 +#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194 +#define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN 195 +#define PCRE2_ERROR_SCRIPT_RUN_NOT_AVAILABLE 196 +#define PCRE2_ERROR_TOO_MANY_CAPTURES 197 +#define PCRE2_ERROR_CONDITION_ATOMIC_ASSERTION_EXPECTED 198 + + +/* "Expected" matching error codes: no match and partial match. */ + +#define PCRE2_ERROR_NOMATCH (-1) +#define PCRE2_ERROR_PARTIAL (-2) + +/* Error codes for UTF-8 validity checks */ + +#define PCRE2_ERROR_UTF8_ERR1 (-3) +#define PCRE2_ERROR_UTF8_ERR2 (-4) +#define PCRE2_ERROR_UTF8_ERR3 (-5) +#define PCRE2_ERROR_UTF8_ERR4 (-6) +#define PCRE2_ERROR_UTF8_ERR5 (-7) +#define PCRE2_ERROR_UTF8_ERR6 (-8) +#define PCRE2_ERROR_UTF8_ERR7 (-9) +#define PCRE2_ERROR_UTF8_ERR8 (-10) +#define PCRE2_ERROR_UTF8_ERR9 (-11) +#define PCRE2_ERROR_UTF8_ERR10 (-12) +#define PCRE2_ERROR_UTF8_ERR11 (-13) +#define PCRE2_ERROR_UTF8_ERR12 (-14) +#define PCRE2_ERROR_UTF8_ERR13 (-15) +#define PCRE2_ERROR_UTF8_ERR14 (-16) +#define PCRE2_ERROR_UTF8_ERR15 (-17) +#define PCRE2_ERROR_UTF8_ERR16 (-18) +#define PCRE2_ERROR_UTF8_ERR17 (-19) +#define PCRE2_ERROR_UTF8_ERR18 (-20) +#define PCRE2_ERROR_UTF8_ERR19 (-21) +#define PCRE2_ERROR_UTF8_ERR20 (-22) +#define PCRE2_ERROR_UTF8_ERR21 (-23) + +/* Error codes for UTF-16 validity checks */ + +#define PCRE2_ERROR_UTF16_ERR1 (-24) +#define PCRE2_ERROR_UTF16_ERR2 (-25) +#define 
PCRE2_ERROR_UTF16_ERR3 (-26) + +/* Error codes for UTF-32 validity checks */ + +#define PCRE2_ERROR_UTF32_ERR1 (-27) +#define PCRE2_ERROR_UTF32_ERR2 (-28) + +/* Miscellaneous error codes for pcre2[_dfa]_match(), substring extraction +functions, context functions, and serializing functions. They are in numerical +order. Originally they were in alphabetical order too, but now that PCRE2 is +released, the numbers must not be changed. */ + +#define PCRE2_ERROR_BADDATA (-29) +#define PCRE2_ERROR_MIXEDTABLES (-30) /* Name was changed */ +#define PCRE2_ERROR_BADMAGIC (-31) +#define PCRE2_ERROR_BADMODE (-32) +#define PCRE2_ERROR_BADOFFSET (-33) +#define PCRE2_ERROR_BADOPTION (-34) +#define PCRE2_ERROR_BADREPLACEMENT (-35) +#define PCRE2_ERROR_BADUTFOFFSET (-36) +#define PCRE2_ERROR_CALLOUT (-37) /* Never used by PCRE2 itself */ +#define PCRE2_ERROR_DFA_BADRESTART (-38) +#define PCRE2_ERROR_DFA_RECURSE (-39) +#define PCRE2_ERROR_DFA_UCOND (-40) +#define PCRE2_ERROR_DFA_UFUNC (-41) +#define PCRE2_ERROR_DFA_UITEM (-42) +#define PCRE2_ERROR_DFA_WSSIZE (-43) +#define PCRE2_ERROR_INTERNAL (-44) +#define PCRE2_ERROR_JIT_BADOPTION (-45) +#define PCRE2_ERROR_JIT_STACKLIMIT (-46) +#define PCRE2_ERROR_MATCHLIMIT (-47) +#define PCRE2_ERROR_NOMEMORY (-48) +#define PCRE2_ERROR_NOSUBSTRING (-49) +#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50) +#define PCRE2_ERROR_NULL (-51) +#define PCRE2_ERROR_RECURSELOOP (-52) +#define PCRE2_ERROR_DEPTHLIMIT (-53) +#define PCRE2_ERROR_RECURSIONLIMIT (-53) /* Obsolete synonym */ +#define PCRE2_ERROR_UNAVAILABLE (-54) +#define PCRE2_ERROR_UNSET (-55) +#define PCRE2_ERROR_BADOFFSETLIMIT (-56) +#define PCRE2_ERROR_BADREPESCAPE (-57) +#define PCRE2_ERROR_REPMISSINGBRACE (-58) +#define PCRE2_ERROR_BADSUBSTITUTION (-59) +#define PCRE2_ERROR_BADSUBSPATTERN (-60) +#define PCRE2_ERROR_TOOMANYREPLACE (-61) +#define PCRE2_ERROR_BADSERIALIZEDDATA (-62) +#define PCRE2_ERROR_HEAPLIMIT (-63) +#define PCRE2_ERROR_CONVERT_SYNTAX (-64) +#define PCRE2_ERROR_INTERNAL_DUPMATCH 
(-65) +#define PCRE2_ERROR_DFA_UINVALID_UTF (-66) + + +/* Request types for pcre2_pattern_info() */ + +#define PCRE2_INFO_ALLOPTIONS 0 +#define PCRE2_INFO_ARGOPTIONS 1 +#define PCRE2_INFO_BACKREFMAX 2 +#define PCRE2_INFO_BSR 3 +#define PCRE2_INFO_CAPTURECOUNT 4 +#define PCRE2_INFO_FIRSTCODEUNIT 5 +#define PCRE2_INFO_FIRSTCODETYPE 6 +#define PCRE2_INFO_FIRSTBITMAP 7 +#define PCRE2_INFO_HASCRORLF 8 +#define PCRE2_INFO_JCHANGED 9 +#define PCRE2_INFO_JITSIZE 10 +#define PCRE2_INFO_LASTCODEUNIT 11 +#define PCRE2_INFO_LASTCODETYPE 12 +#define PCRE2_INFO_MATCHEMPTY 13 +#define PCRE2_INFO_MATCHLIMIT 14 +#define PCRE2_INFO_MAXLOOKBEHIND 15 +#define PCRE2_INFO_MINLENGTH 16 +#define PCRE2_INFO_NAMECOUNT 17 +#define PCRE2_INFO_NAMEENTRYSIZE 18 +#define PCRE2_INFO_NAMETABLE 19 +#define PCRE2_INFO_NEWLINE 20 +#define PCRE2_INFO_DEPTHLIMIT 21 +#define PCRE2_INFO_RECURSIONLIMIT 21 /* Obsolete synonym */ +#define PCRE2_INFO_SIZE 22 +#define PCRE2_INFO_HASBACKSLASHC 23 +#define PCRE2_INFO_FRAMESIZE 24 +#define PCRE2_INFO_HEAPLIMIT 25 +#define PCRE2_INFO_EXTRAOPTIONS 26 + +/* Request types for pcre2_config(). */ + +#define PCRE2_CONFIG_BSR 0 +#define PCRE2_CONFIG_JIT 1 +#define PCRE2_CONFIG_JITTARGET 2 +#define PCRE2_CONFIG_LINKSIZE 3 +#define PCRE2_CONFIG_MATCHLIMIT 4 +#define PCRE2_CONFIG_NEWLINE 5 +#define PCRE2_CONFIG_PARENSLIMIT 6 +#define PCRE2_CONFIG_DEPTHLIMIT 7 +#define PCRE2_CONFIG_RECURSIONLIMIT 7 /* Obsolete synonym */ +#define PCRE2_CONFIG_STACKRECURSE 8 /* Obsolete */ +#define PCRE2_CONFIG_UNICODE 9 +#define PCRE2_CONFIG_UNICODE_VERSION 10 +#define PCRE2_CONFIG_VERSION 11 +#define PCRE2_CONFIG_HEAPLIMIT 12 +#define PCRE2_CONFIG_NEVER_BACKSLASH_C 13 +#define PCRE2_CONFIG_COMPILED_WIDTHS 14 +#define PCRE2_CONFIG_TABLES_LENGTH 15 + + +/* Types for code units in patterns and subject strings. 
*/ + +typedef uint8_t PCRE2_UCHAR8; +typedef uint16_t PCRE2_UCHAR16; +typedef uint32_t PCRE2_UCHAR32; + +typedef const PCRE2_UCHAR8 *PCRE2_SPTR8; +typedef const PCRE2_UCHAR16 *PCRE2_SPTR16; +typedef const PCRE2_UCHAR32 *PCRE2_SPTR32; + +/* The PCRE2_SIZE type is used for all string lengths and offsets in PCRE2, +including pattern offsets for errors and subject offsets after a match. We +define special values to indicate zero-terminated strings and unset offsets in +the offset vector (ovector). */ + +#define PCRE2_SIZE size_t +#define PCRE2_SIZE_MAX SIZE_MAX +#define PCRE2_ZERO_TERMINATED (~(PCRE2_SIZE)0) +#define PCRE2_UNSET (~(PCRE2_SIZE)0) + +/* Generic types for opaque structures and JIT callback functions. These +declarations are defined in a macro that is expanded for each width later. */ + +#define PCRE2_TYPES_LIST \ +struct pcre2_real_general_context; \ +typedef struct pcre2_real_general_context pcre2_general_context; \ +\ +struct pcre2_real_compile_context; \ +typedef struct pcre2_real_compile_context pcre2_compile_context; \ +\ +struct pcre2_real_match_context; \ +typedef struct pcre2_real_match_context pcre2_match_context; \ +\ +struct pcre2_real_convert_context; \ +typedef struct pcre2_real_convert_context pcre2_convert_context; \ +\ +struct pcre2_real_code; \ +typedef struct pcre2_real_code pcre2_code; \ +\ +struct pcre2_real_match_data; \ +typedef struct pcre2_real_match_data pcre2_match_data; \ +\ +struct pcre2_real_jit_stack; \ +typedef struct pcre2_real_jit_stack pcre2_jit_stack; \ +\ +typedef pcre2_jit_stack *(*pcre2_jit_callback)(void *); + + +/* The structures for passing out data via callout functions. We use structures +so that new fields can be added on the end in future versions, without changing +the API of the function, thereby allowing old clients to work without +modification. Define the generic versions in a macro; the width-specific +versions are generated from this macro below. */ + +/* Flags for the callout_flags field. 
These are cleared after a callout. */ + +#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */ +#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */ + +#define PCRE2_STRUCTURE_LIST \ +typedef struct pcre2_callout_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + uint32_t callout_number; /* Number compiled into pattern */ \ + uint32_t capture_top; /* Max current capture */ \ + uint32_t capture_last; /* Most recently closed capture */ \ + PCRE2_SIZE *offset_vector; /* The offset vector */ \ + PCRE2_SPTR mark; /* Pointer to current mark or NULL */ \ + PCRE2_SPTR subject; /* The subject being matched */ \ + PCRE2_SIZE subject_length; /* The length of the subject */ \ + PCRE2_SIZE start_match; /* Offset to start of this match attempt */ \ + PCRE2_SIZE current_position; /* Where we currently are in the subject */ \ + PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \ + PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ + /* ------------------- Added for Version 1 -------------------------- */ \ + PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ + PCRE2_SPTR callout_string; /* String compiled into pattern */ \ + /* ------------------- Added for Version 2 -------------------------- */ \ + uint32_t callout_flags; /* See above for list */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_callout_block; \ +\ +typedef struct pcre2_callout_enumerate_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \ + PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ + uint32_t callout_number; 
/* Number compiled into pattern */ \ + PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ + PCRE2_SPTR callout_string; /* String compiled into pattern */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_callout_enumerate_block; \ +\ +typedef struct pcre2_substitute_callout_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + PCRE2_SPTR input; /* Pointer to input subject string */ \ + PCRE2_SPTR output; /* Pointer to output buffer */ \ + PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \ + PCRE2_SIZE *ovector; /* Pointer to current ovector */ \ + uint32_t oveccount; /* Count of pairs set in ovector */ \ + uint32_t subscount; /* Substitution number */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_substitute_callout_block; + + +/* List the generic forms of all other functions in macros, which will be +expanded for each width below. Start with functions that give general +information. */ + +#define PCRE2_GENERAL_INFO_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION pcre2_config(uint32_t, void *); + + +/* Functions for manipulating contexts. 
*/ + +#define PCRE2_GENERAL_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_general_context PCRE2_CALL_CONVENTION \ + *pcre2_general_context_copy(pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_general_context PCRE2_CALL_CONVENTION \ + *pcre2_general_context_create(void *(*)(PCRE2_SIZE, void *), \ + void (*)(void *, void *), void *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_general_context_free(pcre2_general_context *); + +#define PCRE2_COMPILE_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_compile_context PCRE2_CALL_CONVENTION \ + *pcre2_compile_context_copy(pcre2_compile_context *); \ +PCRE2_EXP_DECL pcre2_compile_context PCRE2_CALL_CONVENTION \ + *pcre2_compile_context_create(pcre2_general_context *);\ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_compile_context_free(pcre2_compile_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_bsr(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_character_tables(pcre2_compile_context *, const uint8_t *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_newline(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_parens_nest_limit(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_compile_recursion_guard(pcre2_compile_context *, \ + int (*)(uint32_t, void *), void *); + +#define PCRE2_MATCH_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_match_context PCRE2_CALL_CONVENTION \ + *pcre2_match_context_copy(pcre2_match_context *); \ +PCRE2_EXP_DECL pcre2_match_context PCRE2_CALL_CONVENTION \ + *pcre2_match_context_create(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_match_context_free(pcre2_match_context *); 
\ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_callout(pcre2_match_context *, \ + int (*)(pcre2_callout_block *, void *), void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_substitute_callout(pcre2_match_context *, \ + int (*)(pcre2_substitute_callout_block *, void *), void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_heap_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_match_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_offset_limit(pcre2_match_context *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_recursion_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_recursion_memory_management(pcre2_match_context *, \ + void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *); + +#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \ + *pcre2_convert_context_copy(pcre2_convert_context *); \ +PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \ + *pcre2_convert_context_create(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_convert_context_free(pcre2_convert_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_glob_separator(pcre2_convert_context *, uint32_t); + + +/* Functions concerned with compiling a pattern to PCRE internal code. 
*/ + +#define PCRE2_COMPILE_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_compile(PCRE2_SPTR, PCRE2_SIZE, uint32_t, int *, PCRE2_SIZE *, \ + pcre2_compile_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_code_free(pcre2_code *); \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_code_copy(const pcre2_code *); \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_code_copy_with_tables(const pcre2_code *); + + +/* Functions that give information about a compiled pattern. */ + +#define PCRE2_PATTERN_INFO_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_pattern_info(const pcre2_code *, uint32_t, void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_callout_enumerate(const pcre2_code *, \ + int (*)(pcre2_callout_enumerate_block *, void *), void *); + + +/* Functions for running a match and inspecting the result. */ + +#define PCRE2_MATCH_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_match_data PCRE2_CALL_CONVENTION \ + *pcre2_match_data_create(uint32_t, pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_match_data PCRE2_CALL_CONVENTION \ + *pcre2_match_data_create_from_pattern(const pcre2_code *, \ + pcre2_general_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_dfa_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *, int *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_match_data_free(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SPTR PCRE2_CALL_CONVENTION \ + pcre2_get_mark(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \ + pcre2_get_match_data_size(pcre2_match_data *); \ +PCRE2_EXP_DECL uint32_t PCRE2_CALL_CONVENTION \ + pcre2_get_ovector_count(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE 
PCRE2_CALL_CONVENTION \ + *pcre2_get_ovector_pointer(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \ + pcre2_get_startchar(pcre2_match_data *); + + +/* Convenience functions for handling matched substrings. */ + +#define PCRE2_SUBSTRING_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_copy_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_UCHAR *, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_copy_bynumber(pcre2_match_data *, uint32_t, PCRE2_UCHAR *, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_substring_free(PCRE2_UCHAR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_get_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_UCHAR **, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_get_bynumber(pcre2_match_data *, uint32_t, PCRE2_UCHAR **, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_length_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_length_bynumber(pcre2_match_data *, uint32_t, PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_nametable_scan(const pcre2_code *, PCRE2_SPTR, PCRE2_SPTR *, \ + PCRE2_SPTR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_number_from_name(const pcre2_code *, PCRE2_SPTR); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_substring_list_free(PCRE2_SPTR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_list_get(pcre2_match_data *, PCRE2_UCHAR ***, PCRE2_SIZE **); + +/* Functions for serializing / deserializing compiled patterns. 
*/ + +#define PCRE2_SERIALIZE_FUNCTIONS \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_encode(const pcre2_code **, int32_t, uint8_t **, \ + PCRE2_SIZE *, pcre2_general_context *); \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_decode(pcre2_code **, int32_t, const uint8_t *, \ + pcre2_general_context *); \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_get_number_of_codes(const uint8_t *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_serialize_free(uint8_t *); + + +/* Convenience function for match + substitute. */ + +#define PCRE2_SUBSTITUTE_FUNCTION \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substitute(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *, PCRE2_SPTR, \ + PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *); + + +/* Functions for converting pattern source strings. */ + +#define PCRE2_CONVERT_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \ + PCRE2_SIZE *, pcre2_convert_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_converted_pattern_free(PCRE2_UCHAR *); + + +/* Functions for JIT processing */ + +#define PCRE2_JIT_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_jit_compile(pcre2_code *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_jit_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_jit_free_unused_memory(pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_jit_stack PCRE2_CALL_CONVENTION \ + *pcre2_jit_stack_create(PCRE2_SIZE, PCRE2_SIZE, pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_jit_stack_assign(pcre2_match_context *, pcre2_jit_callback, void *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + 
pcre2_jit_stack_free(pcre2_jit_stack *); + + +/* Other miscellaneous functions. */ + +#define PCRE2_OTHER_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_get_error_message(int, PCRE2_UCHAR *, PCRE2_SIZE); \ +PCRE2_EXP_DECL const uint8_t PCRE2_CALL_CONVENTION \ + *pcre2_maketables(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_maketables_free(pcre2_general_context *, const uint8_t *); + +/* Define macros that generate width-specific names from generic versions. The +three-level macro scheme is necessary to get the macros expanded when we want +them to be. First we get the width from PCRE2_LOCAL_WIDTH, which is used for +generating three versions of everything below. After that, PCRE2_SUFFIX will be +re-defined to use PCRE2_CODE_UNIT_WIDTH, for use when macros such as +pcre2_compile are called by application code. */ + +#define PCRE2_JOIN(a,b) a ## b +#define PCRE2_GLUE(a,b) PCRE2_JOIN(a,b) +#define PCRE2_SUFFIX(a) PCRE2_GLUE(a,PCRE2_LOCAL_WIDTH) + + +/* Data types */ + +#define PCRE2_UCHAR PCRE2_SUFFIX(PCRE2_UCHAR) +#define PCRE2_SPTR PCRE2_SUFFIX(PCRE2_SPTR) + +#define pcre2_code PCRE2_SUFFIX(pcre2_code_) +#define pcre2_jit_callback PCRE2_SUFFIX(pcre2_jit_callback_) +#define pcre2_jit_stack PCRE2_SUFFIX(pcre2_jit_stack_) + +#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_) +#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_) +#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_) +#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_) +#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_) +#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_) +#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_) + + +/* Data blocks */ + +#define pcre2_callout_block PCRE2_SUFFIX(pcre2_callout_block_) +#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_) +#define 
pcre2_substitute_callout_block PCRE2_SUFFIX(pcre2_substitute_callout_block_) +#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_) +#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_) +#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_) +#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_) +#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_) + + +/* Functions: the complete list in alphabetical order */ + +#define pcre2_callout_enumerate PCRE2_SUFFIX(pcre2_callout_enumerate_) +#define pcre2_code_copy PCRE2_SUFFIX(pcre2_code_copy_) +#define pcre2_code_copy_with_tables PCRE2_SUFFIX(pcre2_code_copy_with_tables_) +#define pcre2_code_free PCRE2_SUFFIX(pcre2_code_free_) +#define pcre2_compile PCRE2_SUFFIX(pcre2_compile_) +#define pcre2_compile_context_copy PCRE2_SUFFIX(pcre2_compile_context_copy_) +#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_) +#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_) +#define pcre2_config PCRE2_SUFFIX(pcre2_config_) +#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_) +#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_) +#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_) +#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_) +#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_) +#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_) +#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_) +#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_) +#define pcre2_get_error_message PCRE2_SUFFIX(pcre2_get_error_message_) +#define pcre2_get_mark PCRE2_SUFFIX(pcre2_get_mark_) +#define pcre2_get_match_data_size PCRE2_SUFFIX(pcre2_get_match_data_size_) +#define pcre2_get_ovector_pointer PCRE2_SUFFIX(pcre2_get_ovector_pointer_) +#define pcre2_get_ovector_count 
PCRE2_SUFFIX(pcre2_get_ovector_count_) +#define pcre2_get_startchar PCRE2_SUFFIX(pcre2_get_startchar_) +#define pcre2_jit_compile PCRE2_SUFFIX(pcre2_jit_compile_) +#define pcre2_jit_match PCRE2_SUFFIX(pcre2_jit_match_) +#define pcre2_jit_free_unused_memory PCRE2_SUFFIX(pcre2_jit_free_unused_memory_) +#define pcre2_jit_stack_assign PCRE2_SUFFIX(pcre2_jit_stack_assign_) +#define pcre2_jit_stack_create PCRE2_SUFFIX(pcre2_jit_stack_create_) +#define pcre2_jit_stack_free PCRE2_SUFFIX(pcre2_jit_stack_free_) +#define pcre2_maketables PCRE2_SUFFIX(pcre2_maketables_) +#define pcre2_maketables_free PCRE2_SUFFIX(pcre2_maketables_free_) +#define pcre2_match PCRE2_SUFFIX(pcre2_match_) +#define pcre2_match_context_copy PCRE2_SUFFIX(pcre2_match_context_copy_) +#define pcre2_match_context_create PCRE2_SUFFIX(pcre2_match_context_create_) +#define pcre2_match_context_free PCRE2_SUFFIX(pcre2_match_context_free_) +#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_) +#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_) +#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_) +#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_) +#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_) +#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_) +#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_) +#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_) +#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_) +#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_) +#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_) +#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_) +#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_) +#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_) +#define pcre2_set_depth_limit 
PCRE2_SUFFIX(pcre2_set_depth_limit_) +#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_) +#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_) +#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_) +#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_) +#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_) +#define pcre2_set_newline PCRE2_SUFFIX(pcre2_set_newline_) +#define pcre2_set_parens_nest_limit PCRE2_SUFFIX(pcre2_set_parens_nest_limit_) +#define pcre2_set_offset_limit PCRE2_SUFFIX(pcre2_set_offset_limit_) +#define pcre2_set_substitute_callout PCRE2_SUFFIX(pcre2_set_substitute_callout_) +#define pcre2_substitute PCRE2_SUFFIX(pcre2_substitute_) +#define pcre2_substring_copy_byname PCRE2_SUFFIX(pcre2_substring_copy_byname_) +#define pcre2_substring_copy_bynumber PCRE2_SUFFIX(pcre2_substring_copy_bynumber_) +#define pcre2_substring_free PCRE2_SUFFIX(pcre2_substring_free_) +#define pcre2_substring_get_byname PCRE2_SUFFIX(pcre2_substring_get_byname_) +#define pcre2_substring_get_bynumber PCRE2_SUFFIX(pcre2_substring_get_bynumber_) +#define pcre2_substring_length_byname PCRE2_SUFFIX(pcre2_substring_length_byname_) +#define pcre2_substring_length_bynumber PCRE2_SUFFIX(pcre2_substring_length_bynumber_) +#define pcre2_substring_list_get PCRE2_SUFFIX(pcre2_substring_list_get_) +#define pcre2_substring_list_free PCRE2_SUFFIX(pcre2_substring_list_free_) +#define pcre2_substring_nametable_scan PCRE2_SUFFIX(pcre2_substring_nametable_scan_) +#define pcre2_substring_number_from_name PCRE2_SUFFIX(pcre2_substring_number_from_name_) + +/* Keep this old function name for backwards compatibility */ +#define pcre2_set_recursion_limit PCRE2_SUFFIX(pcre2_set_recursion_limit_) + +/* Keep this obsolete function for backwards compatibility: it is now a noop. 
*/ +#define pcre2_set_recursion_memory_management PCRE2_SUFFIX(pcre2_set_recursion_memory_management_) + +/* Now generate all three sets of width-specific structures and function +prototypes. */ + +#define PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS \ +PCRE2_TYPES_LIST \ +PCRE2_STRUCTURE_LIST \ +PCRE2_GENERAL_INFO_FUNCTIONS \ +PCRE2_GENERAL_CONTEXT_FUNCTIONS \ +PCRE2_COMPILE_CONTEXT_FUNCTIONS \ +PCRE2_CONVERT_CONTEXT_FUNCTIONS \ +PCRE2_CONVERT_FUNCTIONS \ +PCRE2_MATCH_CONTEXT_FUNCTIONS \ +PCRE2_COMPILE_FUNCTIONS \ +PCRE2_PATTERN_INFO_FUNCTIONS \ +PCRE2_MATCH_FUNCTIONS \ +PCRE2_SUBSTRING_FUNCTIONS \ +PCRE2_SERIALIZE_FUNCTIONS \ +PCRE2_SUBSTITUTE_FUNCTION \ +PCRE2_JIT_FUNCTIONS \ +PCRE2_OTHER_FUNCTIONS + +#define PCRE2_LOCAL_WIDTH 8 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +#define PCRE2_LOCAL_WIDTH 16 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +#define PCRE2_LOCAL_WIDTH 32 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +/* Undefine the list macros; they are no longer needed. */ + +#undef PCRE2_TYPES_LIST +#undef PCRE2_STRUCTURE_LIST +#undef PCRE2_GENERAL_INFO_FUNCTIONS +#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS +#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS +#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS +#undef PCRE2_MATCH_CONTEXT_FUNCTIONS +#undef PCRE2_COMPILE_FUNCTIONS +#undef PCRE2_PATTERN_INFO_FUNCTIONS +#undef PCRE2_MATCH_FUNCTIONS +#undef PCRE2_SUBSTRING_FUNCTIONS +#undef PCRE2_SERIALIZE_FUNCTIONS +#undef PCRE2_SUBSTITUTE_FUNCTION +#undef PCRE2_JIT_FUNCTIONS +#undef PCRE2_OTHER_FUNCTIONS +#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS + +/* PCRE2_CODE_UNIT_WIDTH must be defined. If it is 8, 16, or 32, redefine +PCRE2_SUFFIX to use it. If it is 0, undefine the other macros and make +PCRE2_SUFFIX a no-op. Otherwise, generate an error. */ + +#undef PCRE2_SUFFIX +#ifndef PCRE2_CODE_UNIT_WIDTH +#error PCRE2_CODE_UNIT_WIDTH must be defined before including pcre2.h. +#error Use 8, 16, or 32; or 0 for a multi-width application. 
+#else /* PCRE2_CODE_UNIT_WIDTH is defined */ +#if PCRE2_CODE_UNIT_WIDTH == 8 || \ + PCRE2_CODE_UNIT_WIDTH == 16 || \ + PCRE2_CODE_UNIT_WIDTH == 32 +#define PCRE2_SUFFIX(a) PCRE2_GLUE(a, PCRE2_CODE_UNIT_WIDTH) +#elif PCRE2_CODE_UNIT_WIDTH == 0 +#undef PCRE2_JOIN +#undef PCRE2_GLUE +#define PCRE2_SUFFIX(a) a +#else +#error PCRE2_CODE_UNIT_WIDTH must be 0, 8, 16, or 32. +#endif +#endif /* PCRE2_CODE_UNIT_WIDTH is defined */ + +#ifdef __cplusplus +} /* extern "C" */ +#endif + +#endif /* PCRE2_H_IDEMPOTENT_GUARD */ + +/* End of pcre2.h */ diff --git a/src/pcre2/src/pcre2.h.in b/src/pcre2/src/pcre2.h.in new file mode 100644 index 00000000..4fd6a1e3 --- /dev/null +++ b/src/pcre2/src/pcre2.h.in @@ -0,0 +1,991 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* This is the public header file for the PCRE library, second API, to be +#included by applications that call PCRE2 functions. + + Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifndef PCRE2_H_IDEMPOTENT_GUARD +#define PCRE2_H_IDEMPOTENT_GUARD + +/* The current PCRE version information. */ + +#define PCRE2_MAJOR @PCRE2_MAJOR@ +#define PCRE2_MINOR @PCRE2_MINOR@ +#define PCRE2_PRERELEASE @PCRE2_PRERELEASE@ +#define PCRE2_DATE @PCRE2_DATE@ + +/* When an application links to a PCRE DLL in Windows, the symbols that are +imported have to be identified as such. When building PCRE2, the appropriate +export setting is defined in pcre2_internal.h, which includes this file. So we +don't change existing definitions of PCRE2_EXP_DECL. */ + +#if defined(_WIN32) && !defined(PCRE2_STATIC) +# ifndef PCRE2_EXP_DECL +# define PCRE2_EXP_DECL extern __declspec(dllimport) +# endif +#endif + +/* By default, we use the standard "extern" declarations. */ + +#ifndef PCRE2_EXP_DECL +# ifdef __cplusplus +# define PCRE2_EXP_DECL extern "C" +# else +# define PCRE2_EXP_DECL extern +# endif +#endif + +/* When compiling with the MSVC compiler, it is sometimes necessary to include +a "calling convention" before exported function names. (This is secondhand +information; I know nothing about MSVC myself). 
For example, something like + + void __cdecl function(....) + +might be needed. In order so make this easy, all the exported functions have +PCRE2_CALL_CONVENTION just before their names. It is rarely needed; if not +set, we ensure here that it has no effect. */ + +#ifndef PCRE2_CALL_CONVENTION +#define PCRE2_CALL_CONVENTION +#endif + +/* Have to include limits.h, stdlib.h, and inttypes.h to ensure that size_t and +uint8_t, UCHAR_MAX, etc are defined. Some systems that do have inttypes.h do +not have stdint.h, which is why we use inttypes.h, which according to the C +standard is a superset of stdint.h. If none of these headers are available, +the relevant values must be provided by some other means. */ + +#include <limits.h> +#include <stdlib.h> +#include <inttypes.h> + +/* Allow for C++ users compiling this directly. */ + +#ifdef __cplusplus +extern "C" { +#endif + +/* The following option bits can be passed to pcre2_compile(), pcre2_match(), +or pcre2_dfa_match(). PCRE2_NO_UTF_CHECK affects only the function to which it +is passed. Put these bits at the most significant end of the options word so +others can be added next to them */ + +#define PCRE2_ANCHORED 0x80000000u +#define PCRE2_NO_UTF_CHECK 0x40000000u +#define PCRE2_ENDANCHORED 0x20000000u + +/* The following option bits can be passed only to pcre2_compile(). However, +they may affect compilation, JIT compilation, and/or interpretive execution. 
+The following tags indicate which: + +C alters what is compiled by pcre2_compile() +J alters what is compiled by pcre2_jit_compile() +M is inspected during pcre2_match() execution +D is inspected during pcre2_dfa_match() execution +*/ + +#define PCRE2_ALLOW_EMPTY_CLASS 0x00000001u /* C */ +#define PCRE2_ALT_BSUX 0x00000002u /* C */ +#define PCRE2_AUTO_CALLOUT 0x00000004u /* C */ +#define PCRE2_CASELESS 0x00000008u /* C */ +#define PCRE2_DOLLAR_ENDONLY 0x00000010u /* J M D */ +#define PCRE2_DOTALL 0x00000020u /* C */ +#define PCRE2_DUPNAMES 0x00000040u /* C */ +#define PCRE2_EXTENDED 0x00000080u /* C */ +#define PCRE2_FIRSTLINE 0x00000100u /* J M D */ +#define PCRE2_MATCH_UNSET_BACKREF 0x00000200u /* C J M */ +#define PCRE2_MULTILINE 0x00000400u /* C */ +#define PCRE2_NEVER_UCP 0x00000800u /* C */ +#define PCRE2_NEVER_UTF 0x00001000u /* C */ +#define PCRE2_NO_AUTO_CAPTURE 0x00002000u /* C */ +#define PCRE2_NO_AUTO_POSSESS 0x00004000u /* C */ +#define PCRE2_NO_DOTSTAR_ANCHOR 0x00008000u /* C */ +#define PCRE2_NO_START_OPTIMIZE 0x00010000u /* J M D */ +#define PCRE2_UCP 0x00020000u /* C J M D */ +#define PCRE2_UNGREEDY 0x00040000u /* C */ +#define PCRE2_UTF 0x00080000u /* C J M D */ +#define PCRE2_NEVER_BACKSLASH_C 0x00100000u /* C */ +#define PCRE2_ALT_CIRCUMFLEX 0x00200000u /* J M D */ +#define PCRE2_ALT_VERBNAMES 0x00400000u /* C */ +#define PCRE2_USE_OFFSET_LIMIT 0x00800000u /* J M D */ +#define PCRE2_EXTENDED_MORE 0x01000000u /* C */ +#define PCRE2_LITERAL 0x02000000u /* C */ +#define PCRE2_MATCH_INVALID_UTF 0x04000000u /* J M D */ + +/* An additional compile options word is available in the compile context. 
*/ + +#define PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES 0x00000001u /* C */ +#define PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL 0x00000002u /* C */ +#define PCRE2_EXTRA_MATCH_WORD 0x00000004u /* C */ +#define PCRE2_EXTRA_MATCH_LINE 0x00000008u /* C */ +#define PCRE2_EXTRA_ESCAPED_CR_IS_LF 0x00000010u /* C */ +#define PCRE2_EXTRA_ALT_BSUX 0x00000020u /* C */ + +/* These are for pcre2_jit_compile(). */ + +#define PCRE2_JIT_COMPLETE 0x00000001u /* For full matching */ +#define PCRE2_JIT_PARTIAL_SOFT 0x00000002u +#define PCRE2_JIT_PARTIAL_HARD 0x00000004u +#define PCRE2_JIT_INVALID_UTF 0x00000100u + +/* These are for pcre2_match(), pcre2_dfa_match(), pcre2_jit_match(), and +pcre2_substitute(). Some are allowed only for one of the functions, and in +these cases it is noted below. Note that PCRE2_ANCHORED, PCRE2_ENDANCHORED and +PCRE2_NO_UTF_CHECK can also be passed to these functions (though +pcre2_jit_match() ignores the latter since it bypasses all sanity checks). */ + +#define PCRE2_NOTBOL 0x00000001u +#define PCRE2_NOTEOL 0x00000002u +#define PCRE2_NOTEMPTY 0x00000004u /* ) These two must be kept */ +#define PCRE2_NOTEMPTY_ATSTART 0x00000008u /* ) adjacent to each other. 
*/ +#define PCRE2_PARTIAL_SOFT 0x00000010u +#define PCRE2_PARTIAL_HARD 0x00000020u +#define PCRE2_DFA_RESTART 0x00000040u /* pcre2_dfa_match() only */ +#define PCRE2_DFA_SHORTEST 0x00000080u /* pcre2_dfa_match() only */ +#define PCRE2_SUBSTITUTE_GLOBAL 0x00000100u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_EXTENDED 0x00000200u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_UNSET_EMPTY 0x00000400u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_UNKNOWN_UNSET 0x00000800u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_OVERFLOW_LENGTH 0x00001000u /* pcre2_substitute() only */ +#define PCRE2_NO_JIT 0x00002000u /* Not for pcre2_dfa_match() */ +#define PCRE2_COPY_MATCHED_SUBJECT 0x00004000u +#define PCRE2_SUBSTITUTE_LITERAL 0x00008000u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_MATCHED 0x00010000u /* pcre2_substitute() only */ +#define PCRE2_SUBSTITUTE_REPLACEMENT_ONLY 0x00020000u /* pcre2_substitute() only */ + +/* Options for pcre2_pattern_convert(). */ + +#define PCRE2_CONVERT_UTF 0x00000001u +#define PCRE2_CONVERT_NO_UTF_CHECK 0x00000002u +#define PCRE2_CONVERT_POSIX_BASIC 0x00000004u +#define PCRE2_CONVERT_POSIX_EXTENDED 0x00000008u +#define PCRE2_CONVERT_GLOB 0x00000010u +#define PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR 0x00000030u +#define PCRE2_CONVERT_GLOB_NO_STARSTAR 0x00000050u + +/* Newline and \R settings, for use in compile contexts. The newline values +must be kept in step with values set in config.h and both sets must all be +greater than zero. */ + +#define PCRE2_NEWLINE_CR 1 +#define PCRE2_NEWLINE_LF 2 +#define PCRE2_NEWLINE_CRLF 3 +#define PCRE2_NEWLINE_ANY 4 +#define PCRE2_NEWLINE_ANYCRLF 5 +#define PCRE2_NEWLINE_NUL 6 + +#define PCRE2_BSR_UNICODE 1 +#define PCRE2_BSR_ANYCRLF 2 + +/* Error codes for pcre2_compile(). Some of these are also used by +pcre2_pattern_convert(). 
*/ + +#define PCRE2_ERROR_END_BACKSLASH 101 +#define PCRE2_ERROR_END_BACKSLASH_C 102 +#define PCRE2_ERROR_UNKNOWN_ESCAPE 103 +#define PCRE2_ERROR_QUANTIFIER_OUT_OF_ORDER 104 +#define PCRE2_ERROR_QUANTIFIER_TOO_BIG 105 +#define PCRE2_ERROR_MISSING_SQUARE_BRACKET 106 +#define PCRE2_ERROR_ESCAPE_INVALID_IN_CLASS 107 +#define PCRE2_ERROR_CLASS_RANGE_ORDER 108 +#define PCRE2_ERROR_QUANTIFIER_INVALID 109 +#define PCRE2_ERROR_INTERNAL_UNEXPECTED_REPEAT 110 +#define PCRE2_ERROR_INVALID_AFTER_PARENS_QUERY 111 +#define PCRE2_ERROR_POSIX_CLASS_NOT_IN_CLASS 112 +#define PCRE2_ERROR_POSIX_NO_SUPPORT_COLLATING 113 +#define PCRE2_ERROR_MISSING_CLOSING_PARENTHESIS 114 +#define PCRE2_ERROR_BAD_SUBPATTERN_REFERENCE 115 +#define PCRE2_ERROR_NULL_PATTERN 116 +#define PCRE2_ERROR_BAD_OPTIONS 117 +#define PCRE2_ERROR_MISSING_COMMENT_CLOSING 118 +#define PCRE2_ERROR_PARENTHESES_NEST_TOO_DEEP 119 +#define PCRE2_ERROR_PATTERN_TOO_LARGE 120 +#define PCRE2_ERROR_HEAP_FAILED 121 +#define PCRE2_ERROR_UNMATCHED_CLOSING_PARENTHESIS 122 +#define PCRE2_ERROR_INTERNAL_CODE_OVERFLOW 123 +#define PCRE2_ERROR_MISSING_CONDITION_CLOSING 124 +#define PCRE2_ERROR_LOOKBEHIND_NOT_FIXED_LENGTH 125 +#define PCRE2_ERROR_ZERO_RELATIVE_REFERENCE 126 +#define PCRE2_ERROR_TOO_MANY_CONDITION_BRANCHES 127 +#define PCRE2_ERROR_CONDITION_ASSERTION_EXPECTED 128 +#define PCRE2_ERROR_BAD_RELATIVE_REFERENCE 129 +#define PCRE2_ERROR_UNKNOWN_POSIX_CLASS 130 +#define PCRE2_ERROR_INTERNAL_STUDY_ERROR 131 +#define PCRE2_ERROR_UNICODE_NOT_SUPPORTED 132 +#define PCRE2_ERROR_PARENTHESES_STACK_CHECK 133 +#define PCRE2_ERROR_CODE_POINT_TOO_BIG 134 +#define PCRE2_ERROR_LOOKBEHIND_TOO_COMPLICATED 135 +#define PCRE2_ERROR_LOOKBEHIND_INVALID_BACKSLASH_C 136 +#define PCRE2_ERROR_UNSUPPORTED_ESCAPE_SEQUENCE 137 +#define PCRE2_ERROR_CALLOUT_NUMBER_TOO_BIG 138 +#define PCRE2_ERROR_MISSING_CALLOUT_CLOSING 139 +#define PCRE2_ERROR_ESCAPE_INVALID_IN_VERB 140 +#define PCRE2_ERROR_UNRECOGNIZED_AFTER_QUERY_P 141 +#define 
PCRE2_ERROR_MISSING_NAME_TERMINATOR 142 +#define PCRE2_ERROR_DUPLICATE_SUBPATTERN_NAME 143 +#define PCRE2_ERROR_INVALID_SUBPATTERN_NAME 144 +#define PCRE2_ERROR_UNICODE_PROPERTIES_UNAVAILABLE 145 +#define PCRE2_ERROR_MALFORMED_UNICODE_PROPERTY 146 +#define PCRE2_ERROR_UNKNOWN_UNICODE_PROPERTY 147 +#define PCRE2_ERROR_SUBPATTERN_NAME_TOO_LONG 148 +#define PCRE2_ERROR_TOO_MANY_NAMED_SUBPATTERNS 149 +#define PCRE2_ERROR_CLASS_INVALID_RANGE 150 +#define PCRE2_ERROR_OCTAL_BYTE_TOO_BIG 151 +#define PCRE2_ERROR_INTERNAL_OVERRAN_WORKSPACE 152 +#define PCRE2_ERROR_INTERNAL_MISSING_SUBPATTERN 153 +#define PCRE2_ERROR_DEFINE_TOO_MANY_BRANCHES 154 +#define PCRE2_ERROR_BACKSLASH_O_MISSING_BRACE 155 +#define PCRE2_ERROR_INTERNAL_UNKNOWN_NEWLINE 156 +#define PCRE2_ERROR_BACKSLASH_G_SYNTAX 157 +#define PCRE2_ERROR_PARENS_QUERY_R_MISSING_CLOSING 158 +/* Error 159 is obsolete and should now never occur */ +#define PCRE2_ERROR_VERB_ARGUMENT_NOT_ALLOWED 159 +#define PCRE2_ERROR_VERB_UNKNOWN 160 +#define PCRE2_ERROR_SUBPATTERN_NUMBER_TOO_BIG 161 +#define PCRE2_ERROR_SUBPATTERN_NAME_EXPECTED 162 +#define PCRE2_ERROR_INTERNAL_PARSED_OVERFLOW 163 +#define PCRE2_ERROR_INVALID_OCTAL 164 +#define PCRE2_ERROR_SUBPATTERN_NAMES_MISMATCH 165 +#define PCRE2_ERROR_MARK_MISSING_ARGUMENT 166 +#define PCRE2_ERROR_INVALID_HEXADECIMAL 167 +#define PCRE2_ERROR_BACKSLASH_C_SYNTAX 168 +#define PCRE2_ERROR_BACKSLASH_K_SYNTAX 169 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_LOOKBEHINDS 170 +#define PCRE2_ERROR_BACKSLASH_N_IN_CLASS 171 +#define PCRE2_ERROR_CALLOUT_STRING_TOO_LONG 172 +#define PCRE2_ERROR_UNICODE_DISALLOWED_CODE_POINT 173 +#define PCRE2_ERROR_UTF_IS_DISABLED 174 +#define PCRE2_ERROR_UCP_IS_DISABLED 175 +#define PCRE2_ERROR_VERB_NAME_TOO_LONG 176 +#define PCRE2_ERROR_BACKSLASH_U_CODE_POINT_TOO_BIG 177 +#define PCRE2_ERROR_MISSING_OCTAL_OR_HEX_DIGITS 178 +#define PCRE2_ERROR_VERSION_CONDITION_SYNTAX 179 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_AUTO_POSSESS 180 +#define 
PCRE2_ERROR_CALLOUT_NO_STRING_DELIMITER 181 +#define PCRE2_ERROR_CALLOUT_BAD_STRING_DELIMITER 182 +#define PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED 183 +#define PCRE2_ERROR_QUERY_BARJX_NEST_TOO_DEEP 184 +#define PCRE2_ERROR_BACKSLASH_C_LIBRARY_DISABLED 185 +#define PCRE2_ERROR_PATTERN_TOO_COMPLICATED 186 +#define PCRE2_ERROR_LOOKBEHIND_TOO_LONG 187 +#define PCRE2_ERROR_PATTERN_STRING_TOO_LONG 188 +#define PCRE2_ERROR_INTERNAL_BAD_CODE 189 +#define PCRE2_ERROR_INTERNAL_BAD_CODE_IN_SKIP 190 +#define PCRE2_ERROR_NO_SURROGATES_IN_UTF16 191 +#define PCRE2_ERROR_BAD_LITERAL_OPTIONS 192 +#define PCRE2_ERROR_SUPPORTED_ONLY_IN_UNICODE 193 +#define PCRE2_ERROR_INVALID_HYPHEN_IN_OPTIONS 194 +#define PCRE2_ERROR_ALPHA_ASSERTION_UNKNOWN 195 +#define PCRE2_ERROR_SCRIPT_RUN_NOT_AVAILABLE 196 +#define PCRE2_ERROR_TOO_MANY_CAPTURES 197 +#define PCRE2_ERROR_CONDITION_ATOMIC_ASSERTION_EXPECTED 198 + + +/* "Expected" matching error codes: no match and partial match. */ + +#define PCRE2_ERROR_NOMATCH (-1) +#define PCRE2_ERROR_PARTIAL (-2) + +/* Error codes for UTF-8 validity checks */ + +#define PCRE2_ERROR_UTF8_ERR1 (-3) +#define PCRE2_ERROR_UTF8_ERR2 (-4) +#define PCRE2_ERROR_UTF8_ERR3 (-5) +#define PCRE2_ERROR_UTF8_ERR4 (-6) +#define PCRE2_ERROR_UTF8_ERR5 (-7) +#define PCRE2_ERROR_UTF8_ERR6 (-8) +#define PCRE2_ERROR_UTF8_ERR7 (-9) +#define PCRE2_ERROR_UTF8_ERR8 (-10) +#define PCRE2_ERROR_UTF8_ERR9 (-11) +#define PCRE2_ERROR_UTF8_ERR10 (-12) +#define PCRE2_ERROR_UTF8_ERR11 (-13) +#define PCRE2_ERROR_UTF8_ERR12 (-14) +#define PCRE2_ERROR_UTF8_ERR13 (-15) +#define PCRE2_ERROR_UTF8_ERR14 (-16) +#define PCRE2_ERROR_UTF8_ERR15 (-17) +#define PCRE2_ERROR_UTF8_ERR16 (-18) +#define PCRE2_ERROR_UTF8_ERR17 (-19) +#define PCRE2_ERROR_UTF8_ERR18 (-20) +#define PCRE2_ERROR_UTF8_ERR19 (-21) +#define PCRE2_ERROR_UTF8_ERR20 (-22) +#define PCRE2_ERROR_UTF8_ERR21 (-23) + +/* Error codes for UTF-16 validity checks */ + +#define PCRE2_ERROR_UTF16_ERR1 (-24) +#define PCRE2_ERROR_UTF16_ERR2 (-25) +#define 
PCRE2_ERROR_UTF16_ERR3 (-26) + +/* Error codes for UTF-32 validity checks */ + +#define PCRE2_ERROR_UTF32_ERR1 (-27) +#define PCRE2_ERROR_UTF32_ERR2 (-28) + +/* Miscellaneous error codes for pcre2[_dfa]_match(), substring extraction +functions, context functions, and serializing functions. They are in numerical +order. Originally they were in alphabetical order too, but now that PCRE2 is +released, the numbers must not be changed. */ + +#define PCRE2_ERROR_BADDATA (-29) +#define PCRE2_ERROR_MIXEDTABLES (-30) /* Name was changed */ +#define PCRE2_ERROR_BADMAGIC (-31) +#define PCRE2_ERROR_BADMODE (-32) +#define PCRE2_ERROR_BADOFFSET (-33) +#define PCRE2_ERROR_BADOPTION (-34) +#define PCRE2_ERROR_BADREPLACEMENT (-35) +#define PCRE2_ERROR_BADUTFOFFSET (-36) +#define PCRE2_ERROR_CALLOUT (-37) /* Never used by PCRE2 itself */ +#define PCRE2_ERROR_DFA_BADRESTART (-38) +#define PCRE2_ERROR_DFA_RECURSE (-39) +#define PCRE2_ERROR_DFA_UCOND (-40) +#define PCRE2_ERROR_DFA_UFUNC (-41) +#define PCRE2_ERROR_DFA_UITEM (-42) +#define PCRE2_ERROR_DFA_WSSIZE (-43) +#define PCRE2_ERROR_INTERNAL (-44) +#define PCRE2_ERROR_JIT_BADOPTION (-45) +#define PCRE2_ERROR_JIT_STACKLIMIT (-46) +#define PCRE2_ERROR_MATCHLIMIT (-47) +#define PCRE2_ERROR_NOMEMORY (-48) +#define PCRE2_ERROR_NOSUBSTRING (-49) +#define PCRE2_ERROR_NOUNIQUESUBSTRING (-50) +#define PCRE2_ERROR_NULL (-51) +#define PCRE2_ERROR_RECURSELOOP (-52) +#define PCRE2_ERROR_DEPTHLIMIT (-53) +#define PCRE2_ERROR_RECURSIONLIMIT (-53) /* Obsolete synonym */ +#define PCRE2_ERROR_UNAVAILABLE (-54) +#define PCRE2_ERROR_UNSET (-55) +#define PCRE2_ERROR_BADOFFSETLIMIT (-56) +#define PCRE2_ERROR_BADREPESCAPE (-57) +#define PCRE2_ERROR_REPMISSINGBRACE (-58) +#define PCRE2_ERROR_BADSUBSTITUTION (-59) +#define PCRE2_ERROR_BADSUBSPATTERN (-60) +#define PCRE2_ERROR_TOOMANYREPLACE (-61) +#define PCRE2_ERROR_BADSERIALIZEDDATA (-62) +#define PCRE2_ERROR_HEAPLIMIT (-63) +#define PCRE2_ERROR_CONVERT_SYNTAX (-64) +#define PCRE2_ERROR_INTERNAL_DUPMATCH 
(-65) +#define PCRE2_ERROR_DFA_UINVALID_UTF (-66) + + +/* Request types for pcre2_pattern_info() */ + +#define PCRE2_INFO_ALLOPTIONS 0 +#define PCRE2_INFO_ARGOPTIONS 1 +#define PCRE2_INFO_BACKREFMAX 2 +#define PCRE2_INFO_BSR 3 +#define PCRE2_INFO_CAPTURECOUNT 4 +#define PCRE2_INFO_FIRSTCODEUNIT 5 +#define PCRE2_INFO_FIRSTCODETYPE 6 +#define PCRE2_INFO_FIRSTBITMAP 7 +#define PCRE2_INFO_HASCRORLF 8 +#define PCRE2_INFO_JCHANGED 9 +#define PCRE2_INFO_JITSIZE 10 +#define PCRE2_INFO_LASTCODEUNIT 11 +#define PCRE2_INFO_LASTCODETYPE 12 +#define PCRE2_INFO_MATCHEMPTY 13 +#define PCRE2_INFO_MATCHLIMIT 14 +#define PCRE2_INFO_MAXLOOKBEHIND 15 +#define PCRE2_INFO_MINLENGTH 16 +#define PCRE2_INFO_NAMECOUNT 17 +#define PCRE2_INFO_NAMEENTRYSIZE 18 +#define PCRE2_INFO_NAMETABLE 19 +#define PCRE2_INFO_NEWLINE 20 +#define PCRE2_INFO_DEPTHLIMIT 21 +#define PCRE2_INFO_RECURSIONLIMIT 21 /* Obsolete synonym */ +#define PCRE2_INFO_SIZE 22 +#define PCRE2_INFO_HASBACKSLASHC 23 +#define PCRE2_INFO_FRAMESIZE 24 +#define PCRE2_INFO_HEAPLIMIT 25 +#define PCRE2_INFO_EXTRAOPTIONS 26 + +/* Request types for pcre2_config(). */ + +#define PCRE2_CONFIG_BSR 0 +#define PCRE2_CONFIG_JIT 1 +#define PCRE2_CONFIG_JITTARGET 2 +#define PCRE2_CONFIG_LINKSIZE 3 +#define PCRE2_CONFIG_MATCHLIMIT 4 +#define PCRE2_CONFIG_NEWLINE 5 +#define PCRE2_CONFIG_PARENSLIMIT 6 +#define PCRE2_CONFIG_DEPTHLIMIT 7 +#define PCRE2_CONFIG_RECURSIONLIMIT 7 /* Obsolete synonym */ +#define PCRE2_CONFIG_STACKRECURSE 8 /* Obsolete */ +#define PCRE2_CONFIG_UNICODE 9 +#define PCRE2_CONFIG_UNICODE_VERSION 10 +#define PCRE2_CONFIG_VERSION 11 +#define PCRE2_CONFIG_HEAPLIMIT 12 +#define PCRE2_CONFIG_NEVER_BACKSLASH_C 13 +#define PCRE2_CONFIG_COMPILED_WIDTHS 14 +#define PCRE2_CONFIG_TABLES_LENGTH 15 + + +/* Types for code units in patterns and subject strings. 
*/ + +typedef uint8_t PCRE2_UCHAR8; +typedef uint16_t PCRE2_UCHAR16; +typedef uint32_t PCRE2_UCHAR32; + +typedef const PCRE2_UCHAR8 *PCRE2_SPTR8; +typedef const PCRE2_UCHAR16 *PCRE2_SPTR16; +typedef const PCRE2_UCHAR32 *PCRE2_SPTR32; + +/* The PCRE2_SIZE type is used for all string lengths and offsets in PCRE2, +including pattern offsets for errors and subject offsets after a match. We +define special values to indicate zero-terminated strings and unset offsets in +the offset vector (ovector). */ + +#define PCRE2_SIZE size_t +#define PCRE2_SIZE_MAX SIZE_MAX +#define PCRE2_ZERO_TERMINATED (~(PCRE2_SIZE)0) +#define PCRE2_UNSET (~(PCRE2_SIZE)0) + +/* Generic types for opaque structures and JIT callback functions. These +declarations are defined in a macro that is expanded for each width later. */ + +#define PCRE2_TYPES_LIST \ +struct pcre2_real_general_context; \ +typedef struct pcre2_real_general_context pcre2_general_context; \ +\ +struct pcre2_real_compile_context; \ +typedef struct pcre2_real_compile_context pcre2_compile_context; \ +\ +struct pcre2_real_match_context; \ +typedef struct pcre2_real_match_context pcre2_match_context; \ +\ +struct pcre2_real_convert_context; \ +typedef struct pcre2_real_convert_context pcre2_convert_context; \ +\ +struct pcre2_real_code; \ +typedef struct pcre2_real_code pcre2_code; \ +\ +struct pcre2_real_match_data; \ +typedef struct pcre2_real_match_data pcre2_match_data; \ +\ +struct pcre2_real_jit_stack; \ +typedef struct pcre2_real_jit_stack pcre2_jit_stack; \ +\ +typedef pcre2_jit_stack *(*pcre2_jit_callback)(void *); + + +/* The structures for passing out data via callout functions. We use structures +so that new fields can be added on the end in future versions, without changing +the API of the function, thereby allowing old clients to work without +modification. Define the generic versions in a macro; the width-specific +versions are generated from this macro below. */ + +/* Flags for the callout_flags field. 
These are cleared after a callout. */ + +#define PCRE2_CALLOUT_STARTMATCH 0x00000001u /* Set for each bumpalong */ +#define PCRE2_CALLOUT_BACKTRACK 0x00000002u /* Set after a backtrack */ + +#define PCRE2_STRUCTURE_LIST \ +typedef struct pcre2_callout_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + uint32_t callout_number; /* Number compiled into pattern */ \ + uint32_t capture_top; /* Max current capture */ \ + uint32_t capture_last; /* Most recently closed capture */ \ + PCRE2_SIZE *offset_vector; /* The offset vector */ \ + PCRE2_SPTR mark; /* Pointer to current mark or NULL */ \ + PCRE2_SPTR subject; /* The subject being matched */ \ + PCRE2_SIZE subject_length; /* The length of the subject */ \ + PCRE2_SIZE start_match; /* Offset to start of this match attempt */ \ + PCRE2_SIZE current_position; /* Where we currently are in the subject */ \ + PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \ + PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ + /* ------------------- Added for Version 1 -------------------------- */ \ + PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ + PCRE2_SPTR callout_string; /* String compiled into pattern */ \ + /* ------------------- Added for Version 2 -------------------------- */ \ + uint32_t callout_flags; /* See above for list */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_callout_block; \ +\ +typedef struct pcre2_callout_enumerate_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + PCRE2_SIZE pattern_position; /* Offset to next item in the pattern */ \ + PCRE2_SIZE next_item_length; /* Length of next item in the pattern */ \ + uint32_t callout_number; 
/* Number compiled into pattern */ \ + PCRE2_SIZE callout_string_offset; /* Offset to string within pattern */ \ + PCRE2_SIZE callout_string_length; /* Length of string compiled into pattern */ \ + PCRE2_SPTR callout_string; /* String compiled into pattern */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_callout_enumerate_block; \ +\ +typedef struct pcre2_substitute_callout_block { \ + uint32_t version; /* Identifies version of block */ \ + /* ------------------------ Version 0 ------------------------------- */ \ + PCRE2_SPTR input; /* Pointer to input subject string */ \ + PCRE2_SPTR output; /* Pointer to output buffer */ \ + PCRE2_SIZE output_offsets[2]; /* Changed portion of the output */ \ + PCRE2_SIZE *ovector; /* Pointer to current ovector */ \ + uint32_t oveccount; /* Count of pairs set in ovector */ \ + uint32_t subscount; /* Substitution number */ \ + /* ------------------------------------------------------------------ */ \ +} pcre2_substitute_callout_block; + + +/* List the generic forms of all other functions in macros, which will be +expanded for each width below. Start with functions that give general +information. */ + +#define PCRE2_GENERAL_INFO_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION pcre2_config(uint32_t, void *); + + +/* Functions for manipulating contexts. 
*/ + +#define PCRE2_GENERAL_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_general_context PCRE2_CALL_CONVENTION \ + *pcre2_general_context_copy(pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_general_context PCRE2_CALL_CONVENTION \ + *pcre2_general_context_create(void *(*)(PCRE2_SIZE, void *), \ + void (*)(void *, void *), void *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_general_context_free(pcre2_general_context *); + +#define PCRE2_COMPILE_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_compile_context PCRE2_CALL_CONVENTION \ + *pcre2_compile_context_copy(pcre2_compile_context *); \ +PCRE2_EXP_DECL pcre2_compile_context PCRE2_CALL_CONVENTION \ + *pcre2_compile_context_create(pcre2_general_context *);\ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_compile_context_free(pcre2_compile_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_bsr(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_character_tables(pcre2_compile_context *, const uint8_t *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_compile_extra_options(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_max_pattern_length(pcre2_compile_context *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_newline(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_parens_nest_limit(pcre2_compile_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_compile_recursion_guard(pcre2_compile_context *, \ + int (*)(uint32_t, void *), void *); + +#define PCRE2_MATCH_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_match_context PCRE2_CALL_CONVENTION \ + *pcre2_match_context_copy(pcre2_match_context *); \ +PCRE2_EXP_DECL pcre2_match_context PCRE2_CALL_CONVENTION \ + *pcre2_match_context_create(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_match_context_free(pcre2_match_context *); 
\ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_callout(pcre2_match_context *, \ + int (*)(pcre2_callout_block *, void *), void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_substitute_callout(pcre2_match_context *, \ + int (*)(pcre2_substitute_callout_block *, void *), void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_depth_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_heap_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_match_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_offset_limit(pcre2_match_context *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_recursion_limit(pcre2_match_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_recursion_memory_management(pcre2_match_context *, \ + void *(*)(PCRE2_SIZE, void *), void (*)(void *, void *), void *); + +#define PCRE2_CONVERT_CONTEXT_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \ + *pcre2_convert_context_copy(pcre2_convert_context *); \ +PCRE2_EXP_DECL pcre2_convert_context PCRE2_CALL_CONVENTION \ + *pcre2_convert_context_create(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_convert_context_free(pcre2_convert_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_glob_escape(pcre2_convert_context *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_set_glob_separator(pcre2_convert_context *, uint32_t); + + +/* Functions concerned with compiling a pattern to PCRE internal code. 
*/ + +#define PCRE2_COMPILE_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_compile(PCRE2_SPTR, PCRE2_SIZE, uint32_t, int *, PCRE2_SIZE *, \ + pcre2_compile_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_code_free(pcre2_code *); \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_code_copy(const pcre2_code *); \ +PCRE2_EXP_DECL pcre2_code PCRE2_CALL_CONVENTION \ + *pcre2_code_copy_with_tables(const pcre2_code *); + + +/* Functions that give information about a compiled pattern. */ + +#define PCRE2_PATTERN_INFO_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_pattern_info(const pcre2_code *, uint32_t, void *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_callout_enumerate(const pcre2_code *, \ + int (*)(pcre2_callout_enumerate_block *, void *), void *); + + +/* Functions for running a match and inspecting the result. */ + +#define PCRE2_MATCH_FUNCTIONS \ +PCRE2_EXP_DECL pcre2_match_data PCRE2_CALL_CONVENTION \ + *pcre2_match_data_create(uint32_t, pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_match_data PCRE2_CALL_CONVENTION \ + *pcre2_match_data_create_from_pattern(const pcre2_code *, \ + pcre2_general_context *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_dfa_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *, int *, PCRE2_SIZE); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_match_data_free(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SPTR PCRE2_CALL_CONVENTION \ + pcre2_get_mark(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \ + pcre2_get_match_data_size(pcre2_match_data *); \ +PCRE2_EXP_DECL uint32_t PCRE2_CALL_CONVENTION \ + pcre2_get_ovector_count(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE 
PCRE2_CALL_CONVENTION \ + *pcre2_get_ovector_pointer(pcre2_match_data *); \ +PCRE2_EXP_DECL PCRE2_SIZE PCRE2_CALL_CONVENTION \ + pcre2_get_startchar(pcre2_match_data *); + + +/* Convenience functions for handling matched substrings. */ + +#define PCRE2_SUBSTRING_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_copy_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_UCHAR *, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_copy_bynumber(pcre2_match_data *, uint32_t, PCRE2_UCHAR *, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_substring_free(PCRE2_UCHAR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_get_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_UCHAR **, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_get_bynumber(pcre2_match_data *, uint32_t, PCRE2_UCHAR **, \ + PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_length_byname(pcre2_match_data *, PCRE2_SPTR, PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_length_bynumber(pcre2_match_data *, uint32_t, PCRE2_SIZE *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_nametable_scan(const pcre2_code *, PCRE2_SPTR, PCRE2_SPTR *, \ + PCRE2_SPTR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_number_from_name(const pcre2_code *, PCRE2_SPTR); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_substring_list_free(PCRE2_SPTR *); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substring_list_get(pcre2_match_data *, PCRE2_UCHAR ***, PCRE2_SIZE **); + +/* Functions for serializing / deserializing compiled patterns. 
*/ + +#define PCRE2_SERIALIZE_FUNCTIONS \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_encode(const pcre2_code **, int32_t, uint8_t **, \ + PCRE2_SIZE *, pcre2_general_context *); \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_decode(pcre2_code **, int32_t, const uint8_t *, \ + pcre2_general_context *); \ +PCRE2_EXP_DECL int32_t PCRE2_CALL_CONVENTION \ + pcre2_serialize_get_number_of_codes(const uint8_t *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_serialize_free(uint8_t *); + + +/* Convenience function for match + substitute. */ + +#define PCRE2_SUBSTITUTE_FUNCTION \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_substitute(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *, PCRE2_SPTR, \ + PCRE2_SIZE, PCRE2_UCHAR *, PCRE2_SIZE *); + + +/* Functions for converting pattern source strings. */ + +#define PCRE2_CONVERT_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_pattern_convert(PCRE2_SPTR, PCRE2_SIZE, uint32_t, PCRE2_UCHAR **, \ + PCRE2_SIZE *, pcre2_convert_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_converted_pattern_free(PCRE2_UCHAR *); + + +/* Functions for JIT processing */ + +#define PCRE2_JIT_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_jit_compile(pcre2_code *, uint32_t); \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_jit_match(const pcre2_code *, PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE, \ + uint32_t, pcre2_match_data *, pcre2_match_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_jit_free_unused_memory(pcre2_general_context *); \ +PCRE2_EXP_DECL pcre2_jit_stack PCRE2_CALL_CONVENTION \ + *pcre2_jit_stack_create(PCRE2_SIZE, PCRE2_SIZE, pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_jit_stack_assign(pcre2_match_context *, pcre2_jit_callback, void *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + 
pcre2_jit_stack_free(pcre2_jit_stack *); + + +/* Other miscellaneous functions. */ + +#define PCRE2_OTHER_FUNCTIONS \ +PCRE2_EXP_DECL int PCRE2_CALL_CONVENTION \ + pcre2_get_error_message(int, PCRE2_UCHAR *, PCRE2_SIZE); \ +PCRE2_EXP_DECL const uint8_t PCRE2_CALL_CONVENTION \ + *pcre2_maketables(pcre2_general_context *); \ +PCRE2_EXP_DECL void PCRE2_CALL_CONVENTION \ + pcre2_maketables_free(pcre2_general_context *, const uint8_t *); + +/* Define macros that generate width-specific names from generic versions. The +three-level macro scheme is necessary to get the macros expanded when we want +them to be. First we get the width from PCRE2_LOCAL_WIDTH, which is used for +generating three versions of everything below. After that, PCRE2_SUFFIX will be +re-defined to use PCRE2_CODE_UNIT_WIDTH, for use when macros such as +pcre2_compile are called by application code. */ + +#define PCRE2_JOIN(a,b) a ## b +#define PCRE2_GLUE(a,b) PCRE2_JOIN(a,b) +#define PCRE2_SUFFIX(a) PCRE2_GLUE(a,PCRE2_LOCAL_WIDTH) + + +/* Data types */ + +#define PCRE2_UCHAR PCRE2_SUFFIX(PCRE2_UCHAR) +#define PCRE2_SPTR PCRE2_SUFFIX(PCRE2_SPTR) + +#define pcre2_code PCRE2_SUFFIX(pcre2_code_) +#define pcre2_jit_callback PCRE2_SUFFIX(pcre2_jit_callback_) +#define pcre2_jit_stack PCRE2_SUFFIX(pcre2_jit_stack_) + +#define pcre2_real_code PCRE2_SUFFIX(pcre2_real_code_) +#define pcre2_real_general_context PCRE2_SUFFIX(pcre2_real_general_context_) +#define pcre2_real_compile_context PCRE2_SUFFIX(pcre2_real_compile_context_) +#define pcre2_real_convert_context PCRE2_SUFFIX(pcre2_real_convert_context_) +#define pcre2_real_match_context PCRE2_SUFFIX(pcre2_real_match_context_) +#define pcre2_real_jit_stack PCRE2_SUFFIX(pcre2_real_jit_stack_) +#define pcre2_real_match_data PCRE2_SUFFIX(pcre2_real_match_data_) + + +/* Data blocks */ + +#define pcre2_callout_block PCRE2_SUFFIX(pcre2_callout_block_) +#define pcre2_callout_enumerate_block PCRE2_SUFFIX(pcre2_callout_enumerate_block_) +#define 
pcre2_substitute_callout_block PCRE2_SUFFIX(pcre2_substitute_callout_block_) +#define pcre2_general_context PCRE2_SUFFIX(pcre2_general_context_) +#define pcre2_compile_context PCRE2_SUFFIX(pcre2_compile_context_) +#define pcre2_convert_context PCRE2_SUFFIX(pcre2_convert_context_) +#define pcre2_match_context PCRE2_SUFFIX(pcre2_match_context_) +#define pcre2_match_data PCRE2_SUFFIX(pcre2_match_data_) + + +/* Functions: the complete list in alphabetical order */ + +#define pcre2_callout_enumerate PCRE2_SUFFIX(pcre2_callout_enumerate_) +#define pcre2_code_copy PCRE2_SUFFIX(pcre2_code_copy_) +#define pcre2_code_copy_with_tables PCRE2_SUFFIX(pcre2_code_copy_with_tables_) +#define pcre2_code_free PCRE2_SUFFIX(pcre2_code_free_) +#define pcre2_compile PCRE2_SUFFIX(pcre2_compile_) +#define pcre2_compile_context_copy PCRE2_SUFFIX(pcre2_compile_context_copy_) +#define pcre2_compile_context_create PCRE2_SUFFIX(pcre2_compile_context_create_) +#define pcre2_compile_context_free PCRE2_SUFFIX(pcre2_compile_context_free_) +#define pcre2_config PCRE2_SUFFIX(pcre2_config_) +#define pcre2_convert_context_copy PCRE2_SUFFIX(pcre2_convert_context_copy_) +#define pcre2_convert_context_create PCRE2_SUFFIX(pcre2_convert_context_create_) +#define pcre2_convert_context_free PCRE2_SUFFIX(pcre2_convert_context_free_) +#define pcre2_converted_pattern_free PCRE2_SUFFIX(pcre2_converted_pattern_free_) +#define pcre2_dfa_match PCRE2_SUFFIX(pcre2_dfa_match_) +#define pcre2_general_context_copy PCRE2_SUFFIX(pcre2_general_context_copy_) +#define pcre2_general_context_create PCRE2_SUFFIX(pcre2_general_context_create_) +#define pcre2_general_context_free PCRE2_SUFFIX(pcre2_general_context_free_) +#define pcre2_get_error_message PCRE2_SUFFIX(pcre2_get_error_message_) +#define pcre2_get_mark PCRE2_SUFFIX(pcre2_get_mark_) +#define pcre2_get_match_data_size PCRE2_SUFFIX(pcre2_get_match_data_size_) +#define pcre2_get_ovector_pointer PCRE2_SUFFIX(pcre2_get_ovector_pointer_) +#define pcre2_get_ovector_count 
PCRE2_SUFFIX(pcre2_get_ovector_count_) +#define pcre2_get_startchar PCRE2_SUFFIX(pcre2_get_startchar_) +#define pcre2_jit_compile PCRE2_SUFFIX(pcre2_jit_compile_) +#define pcre2_jit_match PCRE2_SUFFIX(pcre2_jit_match_) +#define pcre2_jit_free_unused_memory PCRE2_SUFFIX(pcre2_jit_free_unused_memory_) +#define pcre2_jit_stack_assign PCRE2_SUFFIX(pcre2_jit_stack_assign_) +#define pcre2_jit_stack_create PCRE2_SUFFIX(pcre2_jit_stack_create_) +#define pcre2_jit_stack_free PCRE2_SUFFIX(pcre2_jit_stack_free_) +#define pcre2_maketables PCRE2_SUFFIX(pcre2_maketables_) +#define pcre2_maketables_free PCRE2_SUFFIX(pcre2_maketables_free_) +#define pcre2_match PCRE2_SUFFIX(pcre2_match_) +#define pcre2_match_context_copy PCRE2_SUFFIX(pcre2_match_context_copy_) +#define pcre2_match_context_create PCRE2_SUFFIX(pcre2_match_context_create_) +#define pcre2_match_context_free PCRE2_SUFFIX(pcre2_match_context_free_) +#define pcre2_match_data_create PCRE2_SUFFIX(pcre2_match_data_create_) +#define pcre2_match_data_create_from_pattern PCRE2_SUFFIX(pcre2_match_data_create_from_pattern_) +#define pcre2_match_data_free PCRE2_SUFFIX(pcre2_match_data_free_) +#define pcre2_pattern_convert PCRE2_SUFFIX(pcre2_pattern_convert_) +#define pcre2_pattern_info PCRE2_SUFFIX(pcre2_pattern_info_) +#define pcre2_serialize_decode PCRE2_SUFFIX(pcre2_serialize_decode_) +#define pcre2_serialize_encode PCRE2_SUFFIX(pcre2_serialize_encode_) +#define pcre2_serialize_free PCRE2_SUFFIX(pcre2_serialize_free_) +#define pcre2_serialize_get_number_of_codes PCRE2_SUFFIX(pcre2_serialize_get_number_of_codes_) +#define pcre2_set_bsr PCRE2_SUFFIX(pcre2_set_bsr_) +#define pcre2_set_callout PCRE2_SUFFIX(pcre2_set_callout_) +#define pcre2_set_character_tables PCRE2_SUFFIX(pcre2_set_character_tables_) +#define pcre2_set_compile_extra_options PCRE2_SUFFIX(pcre2_set_compile_extra_options_) +#define pcre2_set_compile_recursion_guard PCRE2_SUFFIX(pcre2_set_compile_recursion_guard_) +#define pcre2_set_depth_limit 
PCRE2_SUFFIX(pcre2_set_depth_limit_) +#define pcre2_set_glob_escape PCRE2_SUFFIX(pcre2_set_glob_escape_) +#define pcre2_set_glob_separator PCRE2_SUFFIX(pcre2_set_glob_separator_) +#define pcre2_set_heap_limit PCRE2_SUFFIX(pcre2_set_heap_limit_) +#define pcre2_set_match_limit PCRE2_SUFFIX(pcre2_set_match_limit_) +#define pcre2_set_max_pattern_length PCRE2_SUFFIX(pcre2_set_max_pattern_length_) +#define pcre2_set_newline PCRE2_SUFFIX(pcre2_set_newline_) +#define pcre2_set_parens_nest_limit PCRE2_SUFFIX(pcre2_set_parens_nest_limit_) +#define pcre2_set_offset_limit PCRE2_SUFFIX(pcre2_set_offset_limit_) +#define pcre2_set_substitute_callout PCRE2_SUFFIX(pcre2_set_substitute_callout_) +#define pcre2_substitute PCRE2_SUFFIX(pcre2_substitute_) +#define pcre2_substring_copy_byname PCRE2_SUFFIX(pcre2_substring_copy_byname_) +#define pcre2_substring_copy_bynumber PCRE2_SUFFIX(pcre2_substring_copy_bynumber_) +#define pcre2_substring_free PCRE2_SUFFIX(pcre2_substring_free_) +#define pcre2_substring_get_byname PCRE2_SUFFIX(pcre2_substring_get_byname_) +#define pcre2_substring_get_bynumber PCRE2_SUFFIX(pcre2_substring_get_bynumber_) +#define pcre2_substring_length_byname PCRE2_SUFFIX(pcre2_substring_length_byname_) +#define pcre2_substring_length_bynumber PCRE2_SUFFIX(pcre2_substring_length_bynumber_) +#define pcre2_substring_list_get PCRE2_SUFFIX(pcre2_substring_list_get_) +#define pcre2_substring_list_free PCRE2_SUFFIX(pcre2_substring_list_free_) +#define pcre2_substring_nametable_scan PCRE2_SUFFIX(pcre2_substring_nametable_scan_) +#define pcre2_substring_number_from_name PCRE2_SUFFIX(pcre2_substring_number_from_name_) + +/* Keep this old function name for backwards compatibility */ +#define pcre2_set_recursion_limit PCRE2_SUFFIX(pcre2_set_recursion_limit_) + +/* Keep this obsolete function for backwards compatibility: it is now a noop. 
*/ +#define pcre2_set_recursion_memory_management PCRE2_SUFFIX(pcre2_set_recursion_memory_management_) + +/* Now generate all three sets of width-specific structures and function +prototypes. */ + +#define PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS \ +PCRE2_TYPES_LIST \ +PCRE2_STRUCTURE_LIST \ +PCRE2_GENERAL_INFO_FUNCTIONS \ +PCRE2_GENERAL_CONTEXT_FUNCTIONS \ +PCRE2_COMPILE_CONTEXT_FUNCTIONS \ +PCRE2_CONVERT_CONTEXT_FUNCTIONS \ +PCRE2_CONVERT_FUNCTIONS \ +PCRE2_MATCH_CONTEXT_FUNCTIONS \ +PCRE2_COMPILE_FUNCTIONS \ +PCRE2_PATTERN_INFO_FUNCTIONS \ +PCRE2_MATCH_FUNCTIONS \ +PCRE2_SUBSTRING_FUNCTIONS \ +PCRE2_SERIALIZE_FUNCTIONS \ +PCRE2_SUBSTITUTE_FUNCTION \ +PCRE2_JIT_FUNCTIONS \ +PCRE2_OTHER_FUNCTIONS + +#define PCRE2_LOCAL_WIDTH 8 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +#define PCRE2_LOCAL_WIDTH 16 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +#define PCRE2_LOCAL_WIDTH 32 +PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS +#undef PCRE2_LOCAL_WIDTH + +/* Undefine the list macros; they are no longer needed. */ + +#undef PCRE2_TYPES_LIST +#undef PCRE2_STRUCTURE_LIST +#undef PCRE2_GENERAL_INFO_FUNCTIONS +#undef PCRE2_GENERAL_CONTEXT_FUNCTIONS +#undef PCRE2_COMPILE_CONTEXT_FUNCTIONS +#undef PCRE2_CONVERT_CONTEXT_FUNCTIONS +#undef PCRE2_MATCH_CONTEXT_FUNCTIONS +#undef PCRE2_COMPILE_FUNCTIONS +#undef PCRE2_PATTERN_INFO_FUNCTIONS +#undef PCRE2_MATCH_FUNCTIONS +#undef PCRE2_SUBSTRING_FUNCTIONS +#undef PCRE2_SERIALIZE_FUNCTIONS +#undef PCRE2_SUBSTITUTE_FUNCTION +#undef PCRE2_JIT_FUNCTIONS +#undef PCRE2_OTHER_FUNCTIONS +#undef PCRE2_TYPES_STRUCTURES_AND_FUNCTIONS + +/* PCRE2_CODE_UNIT_WIDTH must be defined. If it is 8, 16, or 32, redefine +PCRE2_SUFFIX to use it. If it is 0, undefine the other macros and make +PCRE2_SUFFIX a no-op. Otherwise, generate an error. */ + +#undef PCRE2_SUFFIX +#ifndef PCRE2_CODE_UNIT_WIDTH +#error PCRE2_CODE_UNIT_WIDTH must be defined before including pcre2.h. +#error Use 8, 16, or 32; or 0 for a multi-width application. 
+#else /* PCRE2_CODE_UNIT_WIDTH is defined */ +#if PCRE2_CODE_UNIT_WIDTH == 8 || \ + PCRE2_CODE_UNIT_WIDTH == 16 || \ + PCRE2_CODE_UNIT_WIDTH == 32 +#define PCRE2_SUFFIX(a) PCRE2_GLUE(a, PCRE2_CODE_UNIT_WIDTH) +#elif PCRE2_CODE_UNIT_WIDTH == 0 +#undef PCRE2_JOIN +#undef PCRE2_GLUE +#define PCRE2_SUFFIX(a) a +#else +#error PCRE2_CODE_UNIT_WIDTH must be 0, 8, 16, or 32. +#endif +#endif /* PCRE2_CODE_UNIT_WIDTH is defined */ + +#ifdef __cplusplus +} /* extern "C" */ +#endif + +#endif /* PCRE2_H_IDEMPOTENT_GUARD */ + +/* End of pcre2.h */ diff --git a/src/pcre2/src/pcre2_auto_possess.c b/src/pcre2/src/pcre2_auto_possess.c new file mode 100644 index 00000000..e5e08956 --- /dev/null +++ b/src/pcre2/src/pcre2_auto_possess.c @@ -0,0 +1,1348 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2021 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +/* This module contains functions that scan a compiled pattern and change +repeats into possessive repeats where possible. */ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + + +#include "pcre2_internal.h" + + +/************************************************* +* Tables for auto-possessification * +*************************************************/ + +/* This table is used to check whether auto-possessification is possible +between adjacent character-type opcodes. The left-hand (repeated) opcode is +used to select the row, and the right-hand opcode is used to select the column. +A value of 1 means that auto-possessification is OK. For example, the second +value in the first row means that \D+\d can be turned into \D++\d. + +The Unicode property types (\P and \p) have to be present to fill out the table +because of what their opcode values are, but the table values should always be +zero because property types are handled separately in the code. The last four +columns apply to items that cannot be repeated, so there is no need to have +rows for them. Note that OP_DIGIT etc. are generated only when PCRE_UCP is +*not* set.
When it is set, \d etc. are converted into OP_(NOT_)PROP codes. */ + +#define APTROWS (LAST_AUTOTAB_LEFT_OP - FIRST_AUTOTAB_OP + 1) +#define APTCOLS (LAST_AUTOTAB_RIGHT_OP - FIRST_AUTOTAB_OP + 1) + +static const uint8_t autoposstab[APTROWS][APTCOLS] = { +/* \D \d \S \s \W \w . .+ \C \P \p \R \H \h \V \v \X \Z \z $ $M */ + { 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \D */ + { 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1 }, /* \d */ + { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1 }, /* \S */ + { 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \s */ + { 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \W */ + { 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1 }, /* \w */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* . */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* .+ */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 }, /* \C */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* \P */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* \p */ + { 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0 }, /* \R */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0 }, /* \H */ + { 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0 }, /* \h */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0 }, /* \V */ + { 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0 }, /* \v */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 } /* \X */ +}; + +#ifdef SUPPORT_UNICODE +/* This table is used to check whether auto-possessification is possible +between adjacent Unicode property opcodes (OP_PROP and OP_NOTPROP). The +left-hand (repeated) opcode is used to select the row, and the right-hand +opcode is used to select the column. 
The values are as follows: + + 0 Always return FALSE (never auto-possessify) + 1 Character groups are distinct (possessify if both are OP_PROP) + 2 Check character categories in the same group (general or particular) + 3 TRUE if the two opcodes are not the same (PROP vs NOTPROP) + + 4 Check left general category vs right particular category + 5 Check right general category vs left particular category + + 6 Left alphanum vs right general category + 7 Left space vs right general category + 8 Left word vs right general category + + 9 Right alphanum vs left general category + 10 Right space vs left general category + 11 Right word vs left general category + + 12 Left alphanum vs right particular category + 13 Left space vs right particular category + 14 Left word vs right particular category + + 15 Right alphanum vs left particular category + 16 Right space vs left particular category + 17 Right word vs left particular category +*/ + +static const uint8_t propposstab[PT_TABSIZE][PT_TABSIZE] = { +/* ANY LAMP GC PC SC ALNUM SPACE PXSPACE WORD CLIST UCNC */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_ANY */ + { 0, 3, 0, 0, 0, 3, 1, 1, 0, 0, 0 }, /* PT_LAMP */ + { 0, 0, 2, 4, 0, 9, 10, 10, 11, 0, 0 }, /* PT_GC */ + { 0, 0, 5, 2, 0, 15, 16, 16, 17, 0, 0 }, /* PT_PC */ + { 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0 }, /* PT_SC */ + { 0, 3, 6, 12, 0, 3, 1, 1, 0, 0, 0 }, /* PT_ALNUM */ + { 0, 1, 7, 13, 0, 1, 3, 3, 1, 0, 0 }, /* PT_SPACE */ + { 0, 1, 7, 13, 0, 1, 3, 3, 1, 0, 0 }, /* PT_PXSPACE */ + { 0, 0, 8, 14, 0, 0, 1, 1, 3, 0, 0 }, /* PT_WORD */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }, /* PT_CLIST */ + { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3 } /* PT_UCNC */ +}; + +/* This table is used to check whether auto-possessification is possible +between adjacent Unicode property opcodes (OP_PROP and OP_NOTPROP) when one +specifies a general category and the other specifies a particular category. The +row is selected by the general category and the column by the particular +category. 
The value is 1 if the particular category is not part of the general +category. */ + +static const uint8_t catposstab[7][30] = { +/* Cc Cf Cn Co Cs Ll Lm Lo Lt Lu Mc Me Mn Nd Nl No Pc Pd Pe Pf Pi Po Ps Sc Sk Sm So Zl Zp Zs */ + { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* C */ + { 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* L */ + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* M */ + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, /* N */ + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1 }, /* P */ + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1 }, /* S */ + { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0 } /* Z */ +}; + +/* This table is used when checking ALNUM, (PX)SPACE, SPACE, and WORD against +a general or particular category. The properties in each row are those +that apply to the character set in question. Duplication means that a little +unnecessary work is done when checking, but this keeps things much simpler +because they can all use the same code. For more details see the comment where +this table is used. + +Note: SPACE and PXSPACE used to be different because Perl excluded VT from +"space", but from Perl 5.18 it's included, so both categories are treated the +same here. 
*/ + +static const uint8_t posspropstab[3][4] = { + { ucp_L, ucp_N, ucp_N, ucp_Nl }, /* ALNUM, 3rd and 4th values redundant */ + { ucp_Z, ucp_Z, ucp_C, ucp_Cc }, /* SPACE and PXSPACE, 2nd value redundant */ + { ucp_L, ucp_N, ucp_P, ucp_Po } /* WORD */ +}; +#endif /* SUPPORT_UNICODE */ + + + +#ifdef SUPPORT_UNICODE +/************************************************* +* Check a character and a property * +*************************************************/ + +/* This function is called by compare_opcodes() when a property item is +adjacent to a fixed character. + +Arguments: + c the character + ptype the property type + pdata the data for the type + negated TRUE if it's a negated property (\P or \p{^) + +Returns: TRUE if auto-possessifying is OK +*/ + +static BOOL +check_char_prop(uint32_t c, unsigned int ptype, unsigned int pdata, + BOOL negated) +{ +const uint32_t *p; +const ucd_record *prop = GET_UCD(c); + +switch(ptype) + { + case PT_LAMP: + return (prop->chartype == ucp_Lu || + prop->chartype == ucp_Ll || + prop->chartype == ucp_Lt) == negated; + + case PT_GC: + return (pdata == PRIV(ucp_gentype)[prop->chartype]) == negated; + + case PT_PC: + return (pdata == prop->chartype) == negated; + + case PT_SC: + return (pdata == prop->script) == negated; + + /* These are specials */ + + case PT_ALNUM: + return (PRIV(ucp_gentype)[prop->chartype] == ucp_L || + PRIV(ucp_gentype)[prop->chartype] == ucp_N) == negated; + + /* Perl space used to exclude VT, but from Perl 5.18 it is included, which + means that Perl space and POSIX space are now identical. PCRE was changed + at release 8.34. 
*/ + + case PT_SPACE: /* Perl space */ + case PT_PXSPACE: /* POSIX space */ + switch(c) + { + HSPACE_CASES: + VSPACE_CASES: + return negated; + + default: + return (PRIV(ucp_gentype)[prop->chartype] == ucp_Z) == negated; + } + break; /* Control never reaches here */ + + case PT_WORD: + return (PRIV(ucp_gentype)[prop->chartype] == ucp_L || + PRIV(ucp_gentype)[prop->chartype] == ucp_N || + c == CHAR_UNDERSCORE) == negated; + + case PT_CLIST: + p = PRIV(ucd_caseless_sets) + prop->caseset; + for (;;) + { + if (c < *p) return !negated; + if (c == *p++) return negated; + } + break; /* Control never reaches here */ + } + +return FALSE; +} +#endif /* SUPPORT_UNICODE */ + + + +/************************************************* +* Base opcode of repeated opcodes * +*************************************************/ + +/* Returns the base opcode for repeated single character type opcodes. If the +opcode is not a repeated character type, it returns with the original value. + +Arguments: c opcode +Returns: base opcode for the type +*/ + +static PCRE2_UCHAR +get_repeat_base(PCRE2_UCHAR c) +{ +return (c > OP_TYPEPOSUPTO)? c : + (c >= OP_TYPESTAR)? OP_TYPESTAR : + (c >= OP_NOTSTARI)? OP_NOTSTARI : + (c >= OP_NOTSTAR)? OP_NOTSTAR : + (c >= OP_STARI)? OP_STARI : + OP_STAR; +} + + +/************************************************* +* Fill the character property list * +*************************************************/ + +/* Checks whether the code points to an opcode that can take part in auto- +possessification, and if so, fills a list with its properties. 
+ +Arguments: + code points to start of expression + utf TRUE if in UTF mode + ucp TRUE if in UCP mode + fcc points to the case-flipping table + list points to output list + list[0] will be filled with the opcode + list[1] will be non-zero if this opcode + can match an empty character string + list[2..7] depends on the opcode + +Returns: points to the start of the next opcode if *code is accepted + NULL if *code is not accepted +*/ + +static PCRE2_SPTR +get_chr_property_list(PCRE2_SPTR code, BOOL utf, BOOL ucp, const uint8_t *fcc, + uint32_t *list) +{ +PCRE2_UCHAR c = *code; +PCRE2_UCHAR base; +PCRE2_SPTR end; +uint32_t chr; + +#ifdef SUPPORT_UNICODE +uint32_t *clist_dest; +const uint32_t *clist_src; +#else +(void)utf; /* Suppress "unused parameter" compiler warnings */ +(void)ucp; +#endif + +list[0] = c; +list[1] = FALSE; +code++; + +if (c >= OP_STAR && c <= OP_TYPEPOSUPTO) + { + base = get_repeat_base(c); + c -= (base - OP_STAR); + + if (c == OP_UPTO || c == OP_MINUPTO || c == OP_EXACT || c == OP_POSUPTO) + code += IMM2_SIZE; + + list[1] = (c != OP_PLUS && c != OP_MINPLUS && c != OP_EXACT && + c != OP_POSPLUS); + + switch(base) + { + case OP_STAR: + list[0] = OP_CHAR; + break; + + case OP_STARI: + list[0] = OP_CHARI; + break; + + case OP_NOTSTAR: + list[0] = OP_NOT; + break; + + case OP_NOTSTARI: + list[0] = OP_NOTI; + break; + + case OP_TYPESTAR: + list[0] = *code; + code++; + break; + } + c = list[0]; + } + +switch(c) + { + case OP_NOT_DIGIT: + case OP_DIGIT: + case OP_NOT_WHITESPACE: + case OP_WHITESPACE: + case OP_NOT_WORDCHAR: + case OP_WORDCHAR: + case OP_ANY: + case OP_ALLANY: + case OP_ANYNL: + case OP_NOT_HSPACE: + case OP_HSPACE: + case OP_NOT_VSPACE: + case OP_VSPACE: + case OP_EXTUNI: + case OP_EODN: + case OP_EOD: + case OP_DOLL: + case OP_DOLLM: + return code; + + case OP_CHAR: + case OP_NOT: + GETCHARINCTEST(chr, code); + list[2] = chr; + list[3] = NOTACHAR; + return code; + + case OP_CHARI: + case OP_NOTI: + list[0] = (c == OP_CHARI) ? 
OP_CHAR : OP_NOT; + GETCHARINCTEST(chr, code); + list[2] = chr; + +#ifdef SUPPORT_UNICODE + if (chr < 128 || (chr < 256 && !utf && !ucp)) + list[3] = fcc[chr]; + else + list[3] = UCD_OTHERCASE(chr); +#elif defined SUPPORT_WIDE_CHARS + list[3] = (chr < 256) ? fcc[chr] : chr; +#else + list[3] = fcc[chr]; +#endif + + /* The othercase might be the same value. */ + + if (chr == list[3]) + list[3] = NOTACHAR; + else + list[4] = NOTACHAR; + return code; + +#ifdef SUPPORT_UNICODE + case OP_PROP: + case OP_NOTPROP: + if (code[0] != PT_CLIST) + { + list[2] = code[0]; + list[3] = code[1]; + return code + 2; + } + + /* Convert only if we have enough space. */ + + clist_src = PRIV(ucd_caseless_sets) + code[1]; + clist_dest = list + 2; + code += 2; + + do { + if (clist_dest >= list + 8) + { + /* Early return if there is not enough space. This should never + happen, since all clists are shorter than 5 characters now. */ + list[2] = code[0]; + list[3] = code[1]; + return code; + } + *clist_dest++ = *clist_src; + } + while(*clist_src++ != NOTACHAR); + + /* All characters are stored. The terminating NOTACHAR is copied from the + clist itself. */ + + list[0] = (c == OP_PROP) ?
OP_CHAR : OP_NOT; + return code; +#endif + + case OP_NCLASS: + case OP_CLASS: +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + if (c == OP_XCLASS) + end = code + GET(code, 0) - 1; + else +#endif + end = code + 32 / sizeof(PCRE2_UCHAR); + + switch(*end) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSSTAR: + case OP_CRPOSQUERY: + list[1] = TRUE; + end++; + break; + + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRPOSPLUS: + end++; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + list[1] = (GET2(end, 1) == 0); + end += 1 + 2 * IMM2_SIZE; + break; + } + list[2] = (uint32_t)(end - code); + return end; + } + +return NULL; /* Opcode not accepted */ +} + + + +/************************************************* +* Scan further character sets for match * +*************************************************/ + +/* Checks whether the base and the current opcode have a common character, in +which case the base cannot be possessified. 
+ +Arguments: + code points to the byte code + utf TRUE in UTF mode + ucp TRUE in UCP mode + cb compile data block + base_list the data list of the base opcode + base_end the end of the base opcode + rec_limit points to recursion depth counter + +Returns: TRUE if the auto-possessification is possible +*/ + +static BOOL +compare_opcodes(PCRE2_SPTR code, BOOL utf, BOOL ucp, const compile_block *cb, + const uint32_t *base_list, PCRE2_SPTR base_end, int *rec_limit) +{ +PCRE2_UCHAR c; +uint32_t list[8]; +const uint32_t *chr_ptr; +const uint32_t *ochr_ptr; +const uint32_t *list_ptr; +PCRE2_SPTR next_code; +#ifdef SUPPORT_WIDE_CHARS +PCRE2_SPTR xclass_flags; +#endif +const uint8_t *class_bitset; +const uint8_t *set1, *set2, *set_end; +uint32_t chr; +BOOL accepted, invert_bits; +BOOL entered_a_group = FALSE; + +if (--(*rec_limit) <= 0) return FALSE; /* Recursion has gone too deep */ + +/* Note: the base_list[1] contains whether the current opcode has a greedy +(represented by a non-zero value) quantifier. This is different from +other character type lists, which store here that the character iterator +matches an empty string (also represented by a non-zero value). */ + +for(;;) + { + /* All operations move the code pointer forward. + Therefore infinite recursions are not possible. */ + + c = *code; + + /* Skip over callouts */ + + if (c == OP_CALLOUT) + { + code += PRIV(OP_lengths)[c]; + continue; + } + + if (c == OP_CALLOUT_STR) + { + code += GET(code, 1 + 2*LINK_SIZE); + continue; + } + + /* At the end of a branch, skip to the end of the group. */ + + if (c == OP_ALT) + { + do code += GET(code, 1); while (*code == OP_ALT); + c = *code; + } + + /* Inspect the next opcode. */ + + switch(c) + { + /* We can always possessify a greedy iterator at the end of the pattern, + which is reached after skipping over the final OP_KET. A non-greedy + iterator must never be possessified.
*/ + + case OP_END: + return base_list[1] != 0; + + /* When an iterator is at the end of certain kinds of group we can inspect + what follows the group by skipping over the closing ket. Note that this + does not apply to OP_KETRMAX or OP_KETRMIN because what follows any given + iteration is variable (could be another iteration or could be the next + item). As these two opcodes are not listed in the next switch, they will + end up as the next code to inspect, and return FALSE by virtue of being + unsupported. */ + + case OP_KET: + case OP_KETRPOS: + /* The non-greedy case cannot be converted to a possessive form. */ + + if (base_list[1] == 0) return FALSE; + + /* If the bracket is capturing it might be referenced by an OP_RECURSE + so its last iterator can never be possessified if the pattern contains + recursions. (This could be improved by keeping a list of group numbers that + are called by recursion.) */ + + switch(*(code - GET(code, 1))) + { + case OP_CBRA: + case OP_SCBRA: + case OP_CBRAPOS: + case OP_SCBRAPOS: + if (cb->had_recurse) return FALSE; + break; + + /* A script run might have to backtrack if the iterated item can match + characters from more than one script. So give up unless repeating an + explicit character. */ + + case OP_SCRIPT_RUN: + if (base_list[0] != OP_CHAR && base_list[0] != OP_CHARI) + return FALSE; + break; + + /* Atomic sub-patterns and assertions can always auto-possessify their + last iterator. However, if the group was entered as a result of checking + a previous iterator, this is not possible. */ + + case OP_ASSERT: + case OP_ASSERT_NOT: + case OP_ASSERTBACK: + case OP_ASSERTBACK_NOT: + case OP_ONCE: + return !entered_a_group; + + /* Non-atomic assertions - don't possessify last iterator. This needs + more thought. */ + + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: + return FALSE; + } + + /* Skip over the bracket and inspect what comes next. 
*/ + + code += PRIV(OP_lengths)[c]; + continue; + + /* Handle cases where the next item is a group. */ + + case OP_ONCE: + case OP_BRA: + case OP_CBRA: + next_code = code + GET(code, 1); + code += PRIV(OP_lengths)[c]; + + /* Check each branch. We have to recurse a level for all but the last + branch. */ + + while (*next_code == OP_ALT) + { + if (!compare_opcodes(code, utf, ucp, cb, base_list, base_end, rec_limit)) + return FALSE; + code = next_code + 1 + LINK_SIZE; + next_code += GET(next_code, 1); + } + + entered_a_group = TRUE; + continue; + + case OP_BRAZERO: + case OP_BRAMINZERO: + + next_code = code + 1; + if (*next_code != OP_BRA && *next_code != OP_CBRA && + *next_code != OP_ONCE) return FALSE; + + do next_code += GET(next_code, 1); while (*next_code == OP_ALT); + + /* The bracket content will be checked by the OP_BRA/OP_CBRA case above. */ + + next_code += 1 + LINK_SIZE; + if (!compare_opcodes(next_code, utf, ucp, cb, base_list, base_end, + rec_limit)) + return FALSE; + + code += PRIV(OP_lengths)[c]; + continue; + + /* The next opcode does not need special handling; fall through and use it + to see if the base can be possessified. */ + + default: + break; + } + + /* We now have the next appropriate opcode to compare with the base. Check + for a supported opcode, and load its properties. */ + + code = get_chr_property_list(code, utf, ucp, cb->fcc, list); + if (code == NULL) return FALSE; /* Unsupported */ + + /* If either opcode is a small character list, set pointers for comparing + characters from that list with another list, or with a property. */ + + if (base_list[0] == OP_CHAR) + { + chr_ptr = base_list + 2; + list_ptr = list; + } + else if (list[0] == OP_CHAR) + { + chr_ptr = list + 2; + list_ptr = base_list; + } + + /* Character bitsets can also be compared to certain opcodes. */ + + else if (base_list[0] == OP_CLASS || list[0] == OP_CLASS +#if PCRE2_CODE_UNIT_WIDTH == 8 + /* In 8 bit, non-UTF mode, OP_CLASS and OP_NCLASS are the same. 
*/ + || (!utf && (base_list[0] == OP_NCLASS || list[0] == OP_NCLASS)) +#endif + ) + { +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (base_list[0] == OP_CLASS || (!utf && base_list[0] == OP_NCLASS)) +#else + if (base_list[0] == OP_CLASS) +#endif + { + set1 = (uint8_t *)(base_end - base_list[2]); + list_ptr = list; + } + else + { + set1 = (uint8_t *)(code - list[2]); + list_ptr = base_list; + } + + invert_bits = FALSE; + switch(list_ptr[0]) + { + case OP_CLASS: + case OP_NCLASS: + set2 = (uint8_t *) + ((list_ptr == list ? code : base_end) - list_ptr[2]); + break; + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + xclass_flags = (list_ptr == list ? code : base_end) - list_ptr[2] + LINK_SIZE; + if ((*xclass_flags & XCL_HASPROP) != 0) return FALSE; + if ((*xclass_flags & XCL_MAP) == 0) + { + /* No bits are set for characters < 256. */ + if (list[1] == 0) return (*xclass_flags & XCL_NOT) == 0; + /* Might be an empty repeat. */ + continue; + } + set2 = (uint8_t *)(xclass_flags + 1); + break; +#endif + + case OP_NOT_DIGIT: + invert_bits = TRUE; + /* Fall through */ + case OP_DIGIT: + set2 = (uint8_t *)(cb->cbits + cbit_digit); + break; + + case OP_NOT_WHITESPACE: + invert_bits = TRUE; + /* Fall through */ + case OP_WHITESPACE: + set2 = (uint8_t *)(cb->cbits + cbit_space); + break; + + case OP_NOT_WORDCHAR: + invert_bits = TRUE; + /* Fall through */ + case OP_WORDCHAR: + set2 = (uint8_t *)(cb->cbits + cbit_word); + break; + + default: + return FALSE; + } + + /* Because the bit sets are unaligned bytes, we need to perform byte + comparison here. */ + + set_end = set1 + 32; + if (invert_bits) + { + do + { + if ((*set1++ & ~(*set2++)) != 0) return FALSE; + } + while (set1 < set_end); + } + else + { + do + { + if ((*set1++ & *set2++) != 0) return FALSE; + } + while (set1 < set_end); + } + + if (list[1] == 0) return TRUE; + /* Might be an empty repeat. */ + continue; + } + + /* Some property combinations also acceptable. 
Unicode property opcodes are + processed specially; the rest can be handled with a lookup table. */ + + else + { + uint32_t leftop, rightop; + + leftop = base_list[0]; + rightop = list[0]; + +#ifdef SUPPORT_UNICODE + accepted = FALSE; /* Always set in non-unicode case. */ + if (leftop == OP_PROP || leftop == OP_NOTPROP) + { + if (rightop == OP_EOD) + accepted = TRUE; + else if (rightop == OP_PROP || rightop == OP_NOTPROP) + { + int n; + const uint8_t *p; + BOOL same = leftop == rightop; + BOOL lisprop = leftop == OP_PROP; + BOOL risprop = rightop == OP_PROP; + BOOL bothprop = lisprop && risprop; + + /* There's a table that specifies how each combination is to be + processed: + 0 Always return FALSE (never auto-possessify) + 1 Character groups are distinct (possessify if both are OP_PROP) + 2 Check character categories in the same group (general or particular) + 3 Return TRUE if the two opcodes are not the same + ... see comments below + */ + + n = propposstab[base_list[2]][list[2]]; + switch(n) + { + case 0: break; + case 1: accepted = bothprop; break; + case 2: accepted = (base_list[3] == list[3]) != same; break; + case 3: accepted = !same; break; + + case 4: /* Left general category, right particular category */ + accepted = risprop && catposstab[base_list[3]][list[3]] == same; + break; + + case 5: /* Right general category, left particular category */ + accepted = lisprop && catposstab[list[3]][base_list[3]] == same; + break; + + /* This code is logically tricky. Think hard before fiddling with it. + The posspropstab table has four entries per row. Each row relates to + one of PCRE's special properties such as ALNUM or SPACE or WORD. + Only WORD actually needs all four entries, but using repeats for the + others means they can all use the same code below. + + The first two entries in each row are Unicode general categories, and + apply always, because all the characters they include are part of the + PCRE character set. 
The third and fourth entries are a general and a + particular category, respectively, that include one or more relevant + characters. One or the other is used, depending on whether the check + is for a general or a particular category. However, in both cases the + category contains more characters than the specials that are defined + for the property being tested against. Therefore, it cannot be used + in a NOTPROP case. + + Example: the row for WORD contains ucp_L, ucp_N, ucp_P, ucp_Po. + Underscore is covered by ucp_P or ucp_Po. */ + + case 6: /* Left alphanum vs right general category */ + case 7: /* Left space vs right general category */ + case 8: /* Left word vs right general category */ + p = posspropstab[n-6]; + accepted = risprop && lisprop == + (list[3] != p[0] && + list[3] != p[1] && + (list[3] != p[2] || !lisprop)); + break; + + case 9: /* Right alphanum vs left general category */ + case 10: /* Right space vs left general category */ + case 11: /* Right word vs left general category */ + p = posspropstab[n-9]; + accepted = lisprop && risprop == + (base_list[3] != p[0] && + base_list[3] != p[1] && + (base_list[3] != p[2] || !risprop)); + break; + + case 12: /* Left alphanum vs right particular category */ + case 13: /* Left space vs right particular category */ + case 14: /* Left word vs right particular category */ + p = posspropstab[n-12]; + accepted = risprop && lisprop == + (catposstab[p[0]][list[3]] && + catposstab[p[1]][list[3]] && + (list[3] != p[3] || !lisprop)); + break; + + case 15: /* Right alphanum vs left particular category */ + case 16: /* Right space vs left particular category */ + case 17: /* Right word vs left particular category */ + p = posspropstab[n-15]; + accepted = lisprop && risprop == + (catposstab[p[0]][base_list[3]] && + catposstab[p[1]][base_list[3]] && + (base_list[3] != p[3] || !risprop)); + break; + } + } + } + + else +#endif /* SUPPORT_UNICODE */ + + accepted = leftop >= FIRST_AUTOTAB_OP && leftop <= 
LAST_AUTOTAB_LEFT_OP && + rightop >= FIRST_AUTOTAB_OP && rightop <= LAST_AUTOTAB_RIGHT_OP && + autoposstab[leftop - FIRST_AUTOTAB_OP][rightop - FIRST_AUTOTAB_OP]; + + if (!accepted) return FALSE; + + if (list[1] == 0) return TRUE; + /* Might be an empty repeat. */ + continue; + } + + /* Control reaches here only if one of the items is a small character list. + All characters are checked against the other side. */ + + do + { + chr = *chr_ptr; + + switch(list_ptr[0]) + { + case OP_CHAR: + ochr_ptr = list_ptr + 2; + do + { + if (chr == *ochr_ptr) return FALSE; + ochr_ptr++; + } + while(*ochr_ptr != NOTACHAR); + break; + + case OP_NOT: + ochr_ptr = list_ptr + 2; + do + { + if (chr == *ochr_ptr) + break; + ochr_ptr++; + } + while(*ochr_ptr != NOTACHAR); + if (*ochr_ptr == NOTACHAR) return FALSE; /* Not found */ + break; + + /* Note that OP_DIGIT etc. are generated only when PCRE2_UCP is *not* + set. When it is set, \d etc. are converted into OP_(NOT_)PROP codes. */ + + case OP_DIGIT: + if (chr < 256 && (cb->ctypes[chr] & ctype_digit) != 0) return FALSE; + break; + + case OP_NOT_DIGIT: + if (chr > 255 || (cb->ctypes[chr] & ctype_digit) == 0) return FALSE; + break; + + case OP_WHITESPACE: + if (chr < 256 && (cb->ctypes[chr] & ctype_space) != 0) return FALSE; + break; + + case OP_NOT_WHITESPACE: + if (chr > 255 || (cb->ctypes[chr] & ctype_space) == 0) return FALSE; + break; + + case OP_WORDCHAR: + if (chr < 256 && (cb->ctypes[chr] & ctype_word) != 0) return FALSE; + break; + + case OP_NOT_WORDCHAR: + if (chr > 255 || (cb->ctypes[chr] & ctype_word) == 0) return FALSE; + break; + + case OP_HSPACE: + switch(chr) + { + HSPACE_CASES: return FALSE; + default: break; + } + break; + + case OP_NOT_HSPACE: + switch(chr) + { + HSPACE_CASES: break; + default: return FALSE; + } + break; + + case OP_ANYNL: + case OP_VSPACE: + switch(chr) + { + VSPACE_CASES: return FALSE; + default: break; + } + break; + + case OP_NOT_VSPACE: + switch(chr) + { + VSPACE_CASES: break; + default: return
FALSE; + } + break; + + case OP_DOLL: + case OP_EODN: + switch (chr) + { + case CHAR_CR: + case CHAR_LF: + case CHAR_VT: + case CHAR_FF: + case CHAR_NEL: +#ifndef EBCDIC + case 0x2028: + case 0x2029: +#endif /* Not EBCDIC */ + return FALSE; + } + break; + + case OP_EOD: /* Can always possessify before \z */ + break; + +#ifdef SUPPORT_UNICODE + case OP_PROP: + case OP_NOTPROP: + if (!check_char_prop(chr, list_ptr[2], list_ptr[3], + list_ptr[0] == OP_NOTPROP)) + return FALSE; + break; +#endif + + case OP_NCLASS: + if (chr > 255) return FALSE; + /* Fall through */ + + case OP_CLASS: + if (chr > 255) break; + class_bitset = (uint8_t *) + ((list_ptr == list ? code : base_end) - list_ptr[2]); + if ((class_bitset[chr >> 3] & (1u << (chr & 7))) != 0) return FALSE; + break; + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + if (PRIV(xclass)(chr, (list_ptr == list ? code : base_end) - + list_ptr[2] + LINK_SIZE, utf)) return FALSE; + break; +#endif + + default: + return FALSE; + } + + chr_ptr++; + } + while(*chr_ptr != NOTACHAR); + + /* At least one character must be matched from this opcode. */ + + if (list[1] == 0) return TRUE; + } + +/* Control never reaches here. There used to be a fail-safe return FALSE; here, +but some compilers complain about an unreachable statement. */ +} + + + +/************************************************* +* Scan compiled regex for auto-possession * +*************************************************/ + +/* Replaces single character iterations with their possessive alternatives +if appropriate. This function modifies the compiled opcode! Hitting a +non-existent opcode may indicate a bug in PCRE2, but it can also be caused if a +bad UTF string was compiled with PCRE2_NO_UTF_CHECK. The rec_limit catches +overly complicated or large patterns. In these cases, the check just stops, +leaving the remainder of the pattern unpossessified.
+ +Arguments: + code points to start of the byte code + cb compile data block + +Returns: 0 for success + -1 if a non-existent opcode is encountered +*/ + +int +PRIV(auto_possessify)(PCRE2_UCHAR *code, const compile_block *cb) +{ +PCRE2_UCHAR c; +PCRE2_SPTR end; +PCRE2_UCHAR *repeat_opcode; +uint32_t list[8]; +int rec_limit = 1000; /* Was 10,000 but clang+ASAN uses a lot of stack. */ +BOOL utf = (cb->external_options & PCRE2_UTF) != 0; +BOOL ucp = (cb->external_options & PCRE2_UCP) != 0; + +for (;;) + { + c = *code; + + if (c >= OP_TABLE_LENGTH) return -1; /* Something has gone wrong */ + + if (c >= OP_STAR && c <= OP_TYPEPOSUPTO) + { + c -= get_repeat_base(c) - OP_STAR; + end = (c <= OP_MINUPTO) ? + get_chr_property_list(code, utf, ucp, cb->fcc, list) : NULL; + list[1] = c == OP_STAR || c == OP_PLUS || c == OP_QUERY || c == OP_UPTO; + + if (end != NULL && compare_opcodes(end, utf, ucp, cb, list, end, + &rec_limit)) + { + switch(c) + { + case OP_STAR: + *code += OP_POSSTAR - OP_STAR; + break; + + case OP_MINSTAR: + *code += OP_POSSTAR - OP_MINSTAR; + break; + + case OP_PLUS: + *code += OP_POSPLUS - OP_PLUS; + break; + + case OP_MINPLUS: + *code += OP_POSPLUS - OP_MINPLUS; + break; + + case OP_QUERY: + *code += OP_POSQUERY - OP_QUERY; + break; + + case OP_MINQUERY: + *code += OP_POSQUERY - OP_MINQUERY; + break; + + case OP_UPTO: + *code += OP_POSUPTO - OP_UPTO; + break; + + case OP_MINUPTO: + *code += OP_POSUPTO - OP_MINUPTO; + break; + } + } + c = *code; + } + else if (c == OP_CLASS || c == OP_NCLASS || c == OP_XCLASS) + { +#ifdef SUPPORT_WIDE_CHARS + if (c == OP_XCLASS) + repeat_opcode = code + GET(code, 1); + else +#endif + repeat_opcode = code + 1 + (32 / sizeof(PCRE2_UCHAR)); + + c = *repeat_opcode; + if (c >= OP_CRSTAR && c <= OP_CRMINRANGE) + { + /* The return from get_chr_property_list() will never be NULL when + *code (aka c) is one of the three class opcodes. However, gcc with + -fanalyzer notes that a NULL return is possible, and grumbles.
Hence we + put in a check. */ + + end = get_chr_property_list(code, utf, ucp, cb->fcc, list); + list[1] = (c & 1) == 0; + + if (end != NULL && + compare_opcodes(end, utf, ucp, cb, list, end, &rec_limit)) + { + switch (c) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + *repeat_opcode = OP_CRPOSSTAR; + break; + + case OP_CRPLUS: + case OP_CRMINPLUS: + *repeat_opcode = OP_CRPOSPLUS; + break; + + case OP_CRQUERY: + case OP_CRMINQUERY: + *repeat_opcode = OP_CRPOSQUERY; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + *repeat_opcode = OP_CRPOSRANGE; + break; + } + } + } + c = *code; + } + + switch(c) + { + case OP_END: + return 0; + + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEPOSSTAR: + case OP_TYPEPOSPLUS: + case OP_TYPEPOSQUERY: + if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; + break; + + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEEXACT: + case OP_TYPEPOSUPTO: + if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) + code += 2; + break; + + case OP_CALLOUT_STR: + code += GET(code, 1 + 2*LINK_SIZE); + break; + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + code += GET(code, 1); + break; +#endif + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_SKIP_ARG: + case OP_THEN_ARG: + code += code[1]; + break; + } + + /* Add in the fixed length from the table */ + + code += PRIV(OP_lengths)[c]; + + /* In UTF-8 and UTF-16 modes, opcodes that are followed by a character may be + followed by a multi-byte character. The length in the table is a minimum, so + we have to arrange to skip the extra code units. 
*/ + +#ifdef MAYBE_UTF_MULTI + if (utf) switch(c) + { + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + case OP_STAR: + case OP_MINSTAR: + case OP_PLUS: + case OP_MINPLUS: + case OP_QUERY: + case OP_MINQUERY: + case OP_UPTO: + case OP_MINUPTO: + case OP_EXACT: + case OP_POSSTAR: + case OP_POSPLUS: + case OP_POSQUERY: + case OP_POSUPTO: + case OP_STARI: + case OP_MINSTARI: + case OP_PLUSI: + case OP_MINPLUSI: + case OP_QUERYI: + case OP_MINQUERYI: + case OP_UPTOI: + case OP_MINUPTOI: + case OP_EXACTI: + case OP_POSSTARI: + case OP_POSPLUSI: + case OP_POSQUERYI: + case OP_POSUPTOI: + case OP_NOTSTAR: + case OP_NOTMINSTAR: + case OP_NOTPLUS: + case OP_NOTMINPLUS: + case OP_NOTQUERY: + case OP_NOTMINQUERY: + case OP_NOTUPTO: + case OP_NOTMINUPTO: + case OP_NOTEXACT: + case OP_NOTPOSSTAR: + case OP_NOTPOSPLUS: + case OP_NOTPOSQUERY: + case OP_NOTPOSUPTO: + case OP_NOTSTARI: + case OP_NOTMINSTARI: + case OP_NOTPLUSI: + case OP_NOTMINPLUSI: + case OP_NOTQUERYI: + case OP_NOTMINQUERYI: + case OP_NOTUPTOI: + case OP_NOTMINUPTOI: + case OP_NOTEXACTI: + case OP_NOTPOSSTARI: + case OP_NOTPOSPLUSI: + case OP_NOTPOSQUERYI: + case OP_NOTPOSUPTOI: + if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]); + break; + } +#else + (void)(utf); /* Keep compiler happy by referencing function argument */ +#endif /* MAYBE_UTF_MULTI */ + } +} + +/* End of pcre2_auto_possess.c */ diff --git a/src/pcre/pcre_chartables.c.dist b/src/pcre2/src/pcre2_chartables.c.dist similarity index 67% rename from src/pcre/pcre_chartables.c.dist rename to src/pcre2/src/pcre2_chartables.c.dist index 1e20ec29..861914d1 100644 --- a/src/pcre/pcre_chartables.c.dist +++ b/src/pcre2/src/pcre2_chartables.c.dist @@ -2,31 +2,36 @@ * Perl-Compatible Regular Expressions * *************************************************/ -/* This file contains character tables that are used when no external tables -are passed to PCRE by the application that calls it.
The tables are used only -for characters whose code values are less than 256. - -This is a default version of the tables that assumes ASCII encoding. A program -called dftables (which is distributed with PCRE) can be used to build -alternative versions of this file. This is necessary if you are running in an -EBCDIC environment, or if you want to default to a different encoding, for -example ISO-8859-1. When dftables is run, it creates these tables in the -current locale. If PCRE is configured with --enable-rebuild-chartables, this -happens automatically. - -The following #includes are present because without them gcc 4.x may remove the -array definition from the final binary if PCRE is built into a static library -and dead code stripping is activated. This leads to link errors. Pulling in the -header ensures that the array gets flagged as "someone outside this compilation -unit might reference this" and so it will always be supplied to the linker. */ +/* This file was automatically written by the pcre2_dftables auxiliary +program. It contains character tables that are used when no external +tables are passed to PCRE2 by the application that calls it. The tables +are used only for characters whose code values are less than 256. */ + +/* This set of tables was written in the C locale. */ + +/* The pcre2_dftables program (which is distributed with PCRE2) can be used +to build alternative versions of this file. This is necessary if you are +running in an EBCDIC environment, or if you want to default to a different +encoding, for example ISO-8859-1. When pcre2_dftables is run, it creates +these tables in the "C" locale by default. This happens automatically if +PCRE2 is configured with --enable-rebuild-chartables. However, you can run +pcre2_dftables manually with the -L option to build tables using the LC_ALL +locale.
*/ + +/* The following #include is present because without it gcc 4.x may remove +the array definition from the final binary if PCRE2 is built into a static +library and dead code stripping is activated. This leads to link errors. +Pulling in the header ensures that the array gets flagged as "someone +outside this compilation unit might reference this" and so it will always +be supplied to the linker. */ #ifdef HAVE_CONFIG_H #include "config.h" #endif -#include "pcre_internal.h" +#include "pcre2_internal.h" -const pcre_uint8 PRIV(default_tables)[] = { +const uint8_t PRIV(default_tables)[] = { /* This table is a lower casing table. */ @@ -103,52 +108,52 @@ bytes long and the bits run from the least significant end of each byte. The classes that have their own maps are: space, xdigit, digit, upper, lower, word, graph, print, punct, and cntrl. Other classes are built from combinations. */ - 0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00, + 0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00, /* space */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, + 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, /* xdigit */ 0x7e,0x00,0x00,0x00,0x7e,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, + 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, /* digit */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* upper */ 0xfe,0xff,0xff,0x07,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* lower */ 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0x07, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, + 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, /* word */ 0xfe,0xff,0xff,0x87,0xfe,0xff,0xff,0x07, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0xff, + 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0xff, /* graph */ 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0xff,0xff,0xff,0xff, + 0x00,0x00,0x00,0x00,0xff,0xff,0xff,0xff, /* print */ 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0x00,0x00,0x00,0x00,0xfe,0xff,0x00,0xfc, + 0x00,0x00,0x00,0x00,0xfe,0xff,0x00,0xfc, /* punct */ 0x01,0x00,0x00,0xf8,0x01,0x00,0x00,0x78, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, - 0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00, + 0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00, /* cntrl */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, @@ -156,28 +161,27 @@ graph, print, punct, and cntrl. Other classes are built from combinations. 
*/ /* This table identifies various classes of character by individual bits: 0x01 white space character 0x02 letter - 0x04 decimal digit - 0x08 hexadecimal digit + 0x04 lower case letter + 0x08 decimal digit 0x10 alphanumeric or '_' - 0x80 regular expression metacharacter or binary zero */ - 0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */ 0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */ - 0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */ - 0x80,0x80,0x80,0x80,0x00,0x00,0x80,0x00, /* ( - / */ - 0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */ - 0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x80, /* 8 - ? */ - 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */ + 0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* - ' */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* ( - / */ + 0x18,0x18,0x18,0x18,0x18,0x18,0x18,0x18, /* 0 - 7 */ + 0x18,0x18,0x00,0x00,0x00,0x00,0x00,0x00, /* 8 - ? */ + 0x00,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* @ - G */ 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */ 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */ - 0x12,0x12,0x12,0x80,0x80,0x00,0x80,0x10, /* X - _ */ - 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */ - 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */ - 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */ - 0x12,0x12,0x12,0x80,0x80,0x00,0x00,0x00, /* x -127 */ + 0x12,0x12,0x12,0x00,0x00,0x00,0x00,0x10, /* X - _ */ + 0x00,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* ` - g */ + 0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* h - o */ + 0x16,0x16,0x16,0x16,0x16,0x16,0x16,0x16, /* p - w */ + 0x16,0x16,0x16,0x00,0x00,0x00,0x00,0x00, /* x -127 */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */ @@ -195,4 +199,4 @@ graph, print, punct, and cntrl. 
Other classes are built from combinations. */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */ 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */ -/* End of pcre_chartables.c */ +/* End of pcre2_chartables.c */ diff --git a/src/pcre2/src/pcre2_compile.c b/src/pcre2/src/pcre2_compile.c new file mode 100644 index 00000000..da449ae9 --- /dev/null +++ b/src/pcre2/src/pcre2_compile.c @@ -0,0 +1,10517 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#define NLBLOCK cb /* Block containing newline information */ +#define PSSTART start_pattern /* Field containing processed string start */ +#define PSEND end_pattern /* Field containing processed string end */ + +#include "pcre2_internal.h" + +/* In rare error cases debugging might require calling pcre2_printint(). */ + +#if 0 +#ifdef EBCDIC +#define PRINTABLE(c) ((c) >= 64 && (c) < 255) +#else +#define PRINTABLE(c) ((c) >= 32 && (c) < 127) +#endif +#include "pcre2_printint.c" +#define DEBUG_CALL_PRINTINT +#endif + +/* Other debugging code can be enabled by these defines. */ + +/* #define DEBUG_SHOW_CAPTURES */ +/* #define DEBUG_SHOW_PARSED */ + +/* There are a few things that vary with different code unit sizes. Handle them +by defining macros in order to minimize #if usage. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define STRING_UTFn_RIGHTPAR STRING_UTF8_RIGHTPAR, 5 +#define XDIGIT(c) xdigitab[c] + +#else /* Either 16-bit or 32-bit */ +#define XDIGIT(c) (MAX_255(c)? xdigitab[c] : 0xff) + +#if PCRE2_CODE_UNIT_WIDTH == 16 +#define STRING_UTFn_RIGHTPAR STRING_UTF16_RIGHTPAR, 6 + +#else /* 32-bit */ +#define STRING_UTFn_RIGHTPAR STRING_UTF32_RIGHTPAR, 6 +#endif +#endif + +/* Macros to store and retrieve a PCRE2_SIZE value in the parsed pattern, which +consists of uint32_t elements. 
Assume that if uint32_t can't hold it, two of +them will be able to (i.e. assume a 64-bit world). */ + +#if PCRE2_SIZE_MAX <= UINT32_MAX +#define PUTOFFSET(s,p) *p++ = s +#define GETOFFSET(s,p) s = *p++ +#define GETPLUSOFFSET(s,p) s = *(++p) +#define READPLUSOFFSET(s,p) s = p[1] +#define SKIPOFFSET(p) p++ +#define SIZEOFFSET 1 +#else +#define PUTOFFSET(s,p) \ + { *p++ = (uint32_t)(s >> 32); *p++ = (uint32_t)(s & 0xffffffff); } +#define GETOFFSET(s,p) \ + { s = ((PCRE2_SIZE)p[0] << 32) | (PCRE2_SIZE)p[1]; p += 2; } +#define GETPLUSOFFSET(s,p) \ + { s = ((PCRE2_SIZE)p[1] << 32) | (PCRE2_SIZE)p[2]; p += 2; } +#define READPLUSOFFSET(s,p) \ + { s = ((PCRE2_SIZE)p[1] << 32) | (PCRE2_SIZE)p[2]; } +#define SKIPOFFSET(p) p += 2 +#define SIZEOFFSET 2 +#endif + +/* Macros for manipulating elements of the parsed pattern vector. */ + +#define META_CODE(x) (x & 0xffff0000u) +#define META_DATA(x) (x & 0x0000ffffu) +#define META_DIFF(x,y) ((x-y)>>16) + +/* Function definitions to allow mutual recursion */ + +#ifdef SUPPORT_UNICODE +static unsigned int + add_list_to_class_internal(uint8_t *, PCRE2_UCHAR **, uint32_t, + compile_block *, const uint32_t *, unsigned int); +#endif + +static int + compile_regex(uint32_t, PCRE2_UCHAR **, uint32_t **, int *, uint32_t, + uint32_t *, int32_t *, uint32_t *, int32_t *, branch_chain *, + compile_block *, PCRE2_SIZE *); + +static int + get_branchlength(uint32_t **, int *, int *, parsed_recurse_check *, + compile_block *); + +static BOOL + set_lookbehind_lengths(uint32_t **, int *, int *, parsed_recurse_check *, + compile_block *); + +static int + check_lookbehinds(uint32_t *, uint32_t **, parsed_recurse_check *, + compile_block *); + + +/************************************************* +* Code parameters and static tables * +*************************************************/ + +#define MAX_GROUP_NUMBER 65535u +#define MAX_REPEAT_COUNT 65535u +#define REPEAT_UNLIMITED (MAX_REPEAT_COUNT+1) + +/* COMPILE_WORK_SIZE specifies the size of stack 
workspace, which is used in +different ways in the different pattern scans. The parsing and group- +identifying pre-scan uses it to handle nesting, and needs it to be 16-bit +aligned for this. Having defined the size in code units, we set up +C16_WORK_SIZE as the number of elements in the 16-bit vector. + +During the first compiling phase, when determining how much memory is required, +the regex is partly compiled into this space, but the compiled parts are +discarded as soon as they can be, so that hopefully there will never be an +overrun. The code does, however, check for an overrun, which can occur for +pathological patterns. The size of the workspace depends on LINK_SIZE because +the length of compiled items varies with this. + +In the real compile phase, this workspace is not currently used. */ + +#define COMPILE_WORK_SIZE (3000*LINK_SIZE) /* Size in code units */ + +#define C16_WORK_SIZE \ + ((COMPILE_WORK_SIZE * sizeof(PCRE2_UCHAR))/sizeof(uint16_t)) + +/* A uint32_t vector is used for caching information about the size of +capturing groups, to improve performance. A default is created on the stack of +this size. */ + +#define GROUPINFO_DEFAULT_SIZE 256 + +/* The overrun tests check for a slightly smaller size so that they detect the +overrun before it actually does run off the end of the data block. */ + +#define WORK_SIZE_SAFETY_MARGIN (100) + +/* This value determines the size of the initial vector that is used for +remembering named groups during the pre-compile. It is allocated on the stack, +but if it is too small, it is expanded, in a similar way to the workspace. The +value is the number of slots in the list. */ + +#define NAMED_GROUP_LIST_SIZE 20 + +/* The pre-compiling pass over the pattern creates a parsed pattern in a vector +of uint32_t. For short patterns this lives on the stack, with this size. Heap +memory is used for longer patterns. 
*/ + +#define PARSED_PATTERN_DEFAULT_SIZE 1024 + +/* Maximum length value to check against when making sure that the variable +that holds the compiled pattern length does not overflow. We make it a bit less +than INT_MAX to allow for adding in group terminating code units, so that we +don't have to check them every time. */ + +#define OFLOW_MAX (INT_MAX - 20) + +/* Code values for parsed patterns, which are stored in a vector of 32-bit +unsigned ints. Values less than META_END are literal data values. The coding +for identifying the item is in the top 16-bits, leaving 16 bits for the +additional data that some of them need. The META_CODE, META_DATA, and META_DIFF +macros are used to manipulate parsed pattern elements. + +NOTE: When these definitions are changed, the table of extra lengths for each +code (meta_extra_lengths, just below) must be updated to remain in step. */ + +#define META_END 0x80000000u /* End of pattern */ + +#define META_ALT 0x80010000u /* alternation */ +#define META_ATOMIC 0x80020000u /* atomic group */ +#define META_BACKREF 0x80030000u /* Back ref */ +#define META_BACKREF_BYNAME 0x80040000u /* \k'name' */ +#define META_BIGVALUE 0x80050000u /* Next is a literal > META_END */ +#define META_CALLOUT_NUMBER 0x80060000u /* (?C with numerical argument */ +#define META_CALLOUT_STRING 0x80070000u /* (?C with string argument */ +#define META_CAPTURE 0x80080000u /* Capturing parenthesis */ +#define META_CIRCUMFLEX 0x80090000u /* ^ metacharacter */ +#define META_CLASS 0x800a0000u /* start non-empty class */ +#define META_CLASS_EMPTY 0x800b0000u /* empty class */ +#define META_CLASS_EMPTY_NOT 0x800c0000u /* negative empty class */ +#define META_CLASS_END 0x800d0000u /* end of non-empty class */ +#define META_CLASS_NOT 0x800e0000u /* start non-empty negative class */ +#define META_COND_ASSERT 0x800f0000u /* (?(?assertion)... */ +#define META_COND_DEFINE 0x80100000u /* (?(DEFINE)... */ +#define META_COND_NAME 0x80110000u /* (?(<name>)...
*/ +#define META_COND_NUMBER 0x80120000u /* (?(digits)... */ +#define META_COND_RNAME 0x80130000u /* (?(R&name)... */ +#define META_COND_RNUMBER 0x80140000u /* (?(Rdigits)... */ +#define META_COND_VERSION 0x80150000u /* (?(VERSIONx.y)... */ +#define META_DOLLAR 0x80160000u /* $ metacharacter */ +#define META_DOT 0x80170000u /* . metacharacter */ +#define META_ESCAPE 0x80180000u /* \d and friends */ +#define META_KET 0x80190000u /* closing parenthesis */ +#define META_NOCAPTURE 0x801a0000u /* no capture parens */ +#define META_OPTIONS 0x801b0000u /* (?i) and friends */ +#define META_POSIX 0x801c0000u /* POSIX class item */ +#define META_POSIX_NEG 0x801d0000u /* negative POSIX class item */ +#define META_RANGE_ESCAPED 0x801e0000u /* range with at least one escape */ +#define META_RANGE_LITERAL 0x801f0000u /* range defined literally */ +#define META_RECURSE 0x80200000u /* Recursion */ +#define META_RECURSE_BYNAME 0x80210000u /* (?&name) */ +#define META_SCRIPT_RUN 0x80220000u /* (*script_run:...) */ + +/* These must be kept together to make it easy to check that an assertion +is present where expected in a conditional group. */ + +#define META_LOOKAHEAD 0x80230000u /* (?= */ +#define META_LOOKAHEADNOT 0x80240000u /* (?! 
*/ +#define META_LOOKBEHIND 0x80250000u /* (?<= */ +#define META_LOOKBEHINDNOT 0x80260000u /* (?= 10 */ + 1+SIZEOFFSET, /* META_BACKREF_BYNAME */ + 1, /* META_BIGVALUE */ + 3, /* META_CALLOUT_NUMBER */ + 3+SIZEOFFSET, /* META_CALLOUT_STRING */ + 0, /* META_CAPTURE */ + 0, /* META_CIRCUMFLEX */ + 0, /* META_CLASS */ + 0, /* META_CLASS_EMPTY */ + 0, /* META_CLASS_EMPTY_NOT */ + 0, /* META_CLASS_END */ + 0, /* META_CLASS_NOT */ + 0, /* META_COND_ASSERT */ + SIZEOFFSET, /* META_COND_DEFINE */ + 1+SIZEOFFSET, /* META_COND_NAME */ + 1+SIZEOFFSET, /* META_COND_NUMBER */ + 1+SIZEOFFSET, /* META_COND_RNAME */ + 1+SIZEOFFSET, /* META_COND_RNUMBER */ + 3, /* META_COND_VERSION */ + 0, /* META_DOLLAR */ + 0, /* META_DOT */ + 0, /* META_ESCAPE - more for ESC_P, ESC_p, ESC_g, ESC_k */ + 0, /* META_KET */ + 0, /* META_NOCAPTURE */ + 1, /* META_OPTIONS */ + 1, /* META_POSIX */ + 1, /* META_POSIX_NEG */ + 0, /* META_RANGE_ESCAPED */ + 0, /* META_RANGE_LITERAL */ + SIZEOFFSET, /* META_RECURSE */ + 1+SIZEOFFSET, /* META_RECURSE_BYNAME */ + 0, /* META_SCRIPT_RUN */ + 0, /* META_LOOKAHEAD */ + 0, /* META_LOOKAHEADNOT */ + SIZEOFFSET, /* META_LOOKBEHIND */ + SIZEOFFSET, /* META_LOOKBEHINDNOT */ + 0, /* META_LOOKAHEAD_NA */ + SIZEOFFSET, /* META_LOOKBEHIND_NA */ + 1, /* META_MARK - plus the string length */ + 0, /* META_ACCEPT */ + 0, /* META_FAIL */ + 0, /* META_COMMIT */ + 1, /* META_COMMIT_ARG - plus the string length */ + 0, /* META_PRUNE */ + 1, /* META_PRUNE_ARG - plus the string length */ + 0, /* META_SKIP */ + 1, /* META_SKIP_ARG - plus the string length */ + 0, /* META_THEN */ + 1, /* META_THEN_ARG - plus the string length */ + 0, /* META_ASTERISK */ + 0, /* META_ASTERISK_PLUS */ + 0, /* META_ASTERISK_QUERY */ + 0, /* META_PLUS */ + 0, /* META_PLUS_PLUS */ + 0, /* META_PLUS_QUERY */ + 0, /* META_QUERY */ + 0, /* META_QUERY_PLUS */ + 0, /* META_QUERY_QUERY */ + 2, /* META_MINMAX */ + 2, /* META_MINMAX_PLUS */ + 2 /* META_MINMAX_QUERY */ +}; + +/* Types for skipping parts of a 
parsed pattern. */ + +enum { PSKIP_ALT, PSKIP_CLASS, PSKIP_KET }; + +/* Macro for setting individual bits in class bitmaps. It took some +experimenting to figure out how to stop gcc 5.3.0 from warning with +-Wconversion. This version gets a warning: + + #define SETBIT(a,b) a[(b)/8] |= (uint8_t)(1u << ((b)&7)) + +Let's hope the apparently less efficient version isn't actually so bad if the +compiler is clever with identical subexpressions. */ + +#define SETBIT(a,b) a[(b)/8] = (uint8_t)(a[(b)/8] | (1u << ((b)&7))) + +/* Private flags added to firstcu and reqcu. */ + +#define REQ_CASELESS (1u << 0) /* Indicates caselessness */ +#define REQ_VARY (1u << 1) /* reqcu followed non-literal item */ +/* Negative values for the firstcu and reqcu flags */ +#define REQ_UNSET (-2) /* Not yet found anything */ +#define REQ_NONE (-1) /* Found not fixed char */ + +/* These flags are used in the groupinfo vector. */ + +#define GI_SET_FIXED_LENGTH 0x80000000u +#define GI_NOT_FIXED_LENGTH 0x40000000u +#define GI_FIXED_LENGTH_MASK 0x0000ffffu + +/* This simple test for a decimal digit works for both ASCII/Unicode and EBCDIC +and is fast (a good compiler can turn it into a subtraction and unsigned +comparison). */ + +#define IS_DIGIT(x) ((x) >= CHAR_0 && (x) <= CHAR_9) + +/* Table to identify hex digits. The tables in chartables are dependent on the +locale, and may mark arbitrary characters as digits. We want to recognize only +0-9, a-f, and A-F as hex digits, which is why we have a private table here. It +costs 256 bytes, but it is a lot faster than doing character value tests (at +least in some simple cases I timed), and in some applications one wants PCRE2 +to compile efficiently as well as match efficiently. The value in the table is +the binary hex digit value, or 0xff for non-hex digits. */ + +/* This is the "normal" case, for ASCII systems, and EBCDIC systems running in +UTF-8 mode.
*/ + +#ifndef EBCDIC +static const uint8_t xdigitab[] = + { + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 0- 7 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 8- 15 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 16- 23 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 24- 31 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* - ' */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* ( - / */ + 0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07, /* 0 - 7 */ + 0x08,0x09,0xff,0xff,0xff,0xff,0xff,0xff, /* 8 - ? */ + 0xff,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0xff, /* @ - G */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* H - O */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* P - W */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* X - _ */ + 0xff,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0xff, /* ` - g */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* h - o */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* p - w */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* x -127 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 128-135 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 136-143 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 144-151 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 152-159 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 160-167 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 168-175 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 176-183 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 184-191 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 192-199 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 200-207 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 208-215 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 216-223 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 224-231 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 232-239 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 240-247 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff};/* 248-255 */ + +#else + +/* This is the "abnormal" case, for EBCDIC systems not running in UTF-8 mode.
*/ + +static const uint8_t xdigitab[] = + { + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 0- 7 0 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 8- 15 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 16- 23 10 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 24- 31 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 32- 39 20 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 40- 47 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 48- 55 30 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 56- 63 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* - 71 40 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 72- | */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* & - 87 50 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 88- 95 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* - -103 60 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 104- ? */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 112-119 70 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 120- " */ + 0xff,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0xff, /* 128- g 80 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* h -143 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 144- p 90 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* q -159 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 160- x A0 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* y -175 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* ^ -183 B0 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* 184-191 */ + 0xff,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0xff, /* { - G C0 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* H -207 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* } - P D0 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* Q -223 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* \ - X E0 */ + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff, /* Y -239 */ + 0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07, /* 0 - 7 F0 */ + 0x08,0x09,0xff,0xff,0xff,0xff,0xff,0xff};/* 8 -255 */ +#endif /* EBCDIC */ + + +/* Table for handling alphanumeric escaped characters. 
Positive returns are +simple data values; negative values are for special things like \d and so on. +Zero means further processing is needed (for things like \x), or the escape is +invalid. */ + +/* This is the "normal" table for ASCII systems or for EBCDIC systems running +in UTF-8 mode. It runs from '0' to 'z'. */ + +#ifndef EBCDIC +#define ESCAPES_FIRST CHAR_0 +#define ESCAPES_LAST CHAR_z +#define UPPER_CASE(c) (c-32) + +static const short int escapes[] = { + 0, 0, + 0, 0, + 0, 0, + 0, 0, + 0, 0, + CHAR_COLON, CHAR_SEMICOLON, + CHAR_LESS_THAN_SIGN, CHAR_EQUALS_SIGN, + CHAR_GREATER_THAN_SIGN, CHAR_QUESTION_MARK, + CHAR_COMMERCIAL_AT, -ESC_A, + -ESC_B, -ESC_C, + -ESC_D, -ESC_E, + 0, -ESC_G, + -ESC_H, 0, + 0, -ESC_K, + 0, 0, + -ESC_N, 0, + -ESC_P, -ESC_Q, + -ESC_R, -ESC_S, + 0, 0, + -ESC_V, -ESC_W, + -ESC_X, 0, + -ESC_Z, CHAR_LEFT_SQUARE_BRACKET, + CHAR_BACKSLASH, CHAR_RIGHT_SQUARE_BRACKET, + CHAR_CIRCUMFLEX_ACCENT, CHAR_UNDERSCORE, + CHAR_GRAVE_ACCENT, CHAR_BEL, + -ESC_b, 0, + -ESC_d, CHAR_ESC, + CHAR_FF, 0, + -ESC_h, 0, + 0, -ESC_k, + 0, 0, + CHAR_LF, 0, + -ESC_p, 0, + CHAR_CR, -ESC_s, + CHAR_HT, 0, + -ESC_v, -ESC_w, + 0, 0, + -ESC_z +}; + +#else + +/* This is the "abnormal" table for EBCDIC systems without UTF-8 support. +It runs from 'a' to '9'. For some minimal testing of EBCDIC features, the code +is sometimes compiled on an ASCII system. In this case, we must not use CHAR_a +because it is defined as 'a', which of course picks up the ASCII value. 
*/ + +#if 'a' == 0x81 /* Check for a real EBCDIC environment */ +#define ESCAPES_FIRST CHAR_a +#define ESCAPES_LAST CHAR_9 +#define UPPER_CASE(c) (c+64) +#else /* Testing in an ASCII environment */ +#define ESCAPES_FIRST ((unsigned char)'\x81') /* EBCDIC 'a' */ +#define ESCAPES_LAST ((unsigned char)'\xf9') /* EBCDIC '9' */ +#define UPPER_CASE(c) (c-32) +#endif + +static const short int escapes[] = { +/* 80 */ CHAR_BEL, -ESC_b, 0, -ESC_d, CHAR_ESC, CHAR_FF, 0, +/* 88 */ -ESC_h, 0, 0, '{', 0, 0, 0, 0, +/* 90 */ 0, 0, -ESC_k, 0, 0, CHAR_LF, 0, -ESC_p, +/* 98 */ 0, CHAR_CR, 0, '}', 0, 0, 0, 0, +/* A0 */ 0, '~', -ESC_s, CHAR_HT, 0, -ESC_v, -ESC_w, 0, +/* A8 */ 0, -ESC_z, 0, 0, 0, '[', 0, 0, +/* B0 */ 0, 0, 0, 0, 0, 0, 0, 0, +/* B8 */ 0, 0, 0, 0, 0, ']', '=', '-', +/* C0 */ '{', -ESC_A, -ESC_B, -ESC_C, -ESC_D, -ESC_E, 0, -ESC_G, +/* C8 */ -ESC_H, 0, 0, 0, 0, 0, 0, 0, +/* D0 */ '}', 0, -ESC_K, 0, 0, -ESC_N, 0, -ESC_P, +/* D8 */ -ESC_Q, -ESC_R, 0, 0, 0, 0, 0, 0, +/* E0 */ '\\', 0, -ESC_S, 0, 0, -ESC_V, -ESC_W, -ESC_X, +/* E8 */ 0, -ESC_Z, 0, 0, 0, 0, 0, 0, +/* F0 */ 0, 0, 0, 0, 0, 0, 0, 0, +/* F8 */ 0, 0 +}; + +/* We also need a table of characters that may follow \c in an EBCDIC +environment for characters 0-31. */ + +static unsigned char ebcdic_escape_c[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"; + +#endif /* EBCDIC */ + + +/* Table of special "verbs" like (*PRUNE). This is a short table, so it is +searched linearly. Put all the names into a single string, in order to reduce +the number of relocations when a shared library is dynamically linked. The +string is built from string macros so that it works in UTF-8 mode on EBCDIC +platforms. 
*/ + +typedef struct verbitem { + unsigned int len; /* Length of verb name */ + uint32_t meta; /* Base META_ code */ + int has_arg; /* Argument requirement */ +} verbitem; + +static const char verbnames[] = + "\0" /* Empty name is a shorthand for MARK */ + STRING_MARK0 + STRING_ACCEPT0 + STRING_F0 + STRING_FAIL0 + STRING_COMMIT0 + STRING_PRUNE0 + STRING_SKIP0 + STRING_THEN; + +static const verbitem verbs[] = { + { 0, META_MARK, +1 }, /* > 0 => must have an argument */ + { 4, META_MARK, +1 }, + { 6, META_ACCEPT, -1 }, /* < 0 => Optional argument, convert to pre-MARK */ + { 1, META_FAIL, -1 }, + { 4, META_FAIL, -1 }, + { 6, META_COMMIT, 0 }, + { 5, META_PRUNE, 0 }, /* Optional argument; bump META code if found */ + { 4, META_SKIP, 0 }, + { 4, META_THEN, 0 } +}; + +static const int verbcount = sizeof(verbs)/sizeof(verbitem); + +/* Verb opcodes, indexed by their META code offset from META_MARK. */ + +static const uint32_t verbops[] = { + OP_MARK, OP_ACCEPT, OP_FAIL, OP_COMMIT, OP_COMMIT_ARG, OP_PRUNE, + OP_PRUNE_ARG, OP_SKIP, OP_SKIP_ARG, OP_THEN, OP_THEN_ARG }; + +/* Table of "alpha assertions" like (*pla:...), similar to the (*VERB) table. 
*/ + +typedef struct alasitem { + unsigned int len; /* Length of name */ + uint32_t meta; /* Base META_ code */ +} alasitem; + +static const char alasnames[] = + STRING_pla0 + STRING_plb0 + STRING_napla0 + STRING_naplb0 + STRING_nla0 + STRING_nlb0 + STRING_positive_lookahead0 + STRING_positive_lookbehind0 + STRING_non_atomic_positive_lookahead0 + STRING_non_atomic_positive_lookbehind0 + STRING_negative_lookahead0 + STRING_negative_lookbehind0 + STRING_atomic0 + STRING_sr0 + STRING_asr0 + STRING_script_run0 + STRING_atomic_script_run; + +static const alasitem alasmeta[] = { + { 3, META_LOOKAHEAD }, + { 3, META_LOOKBEHIND }, + { 5, META_LOOKAHEAD_NA }, + { 5, META_LOOKBEHIND_NA }, + { 3, META_LOOKAHEADNOT }, + { 3, META_LOOKBEHINDNOT }, + { 18, META_LOOKAHEAD }, + { 19, META_LOOKBEHIND }, + { 29, META_LOOKAHEAD_NA }, + { 30, META_LOOKBEHIND_NA }, + { 18, META_LOOKAHEADNOT }, + { 19, META_LOOKBEHINDNOT }, + { 6, META_ATOMIC }, + { 2, META_SCRIPT_RUN }, /* sr = script run */ + { 3, META_ATOMIC_SCRIPT_RUN }, /* asr = atomic script run */ + { 10, META_SCRIPT_RUN }, /* script run */ + { 17, META_ATOMIC_SCRIPT_RUN } /* atomic script run */ +}; + +static const int alascount = sizeof(alasmeta)/sizeof(alasitem); + +/* Offsets from OP_STAR for case-independent and negative repeat opcodes. */ + +static uint32_t chartypeoffset[] = { + OP_STAR - OP_STAR, OP_STARI - OP_STAR, + OP_NOTSTAR - OP_STAR, OP_NOTSTARI - OP_STAR }; + +/* Tables of names of POSIX character classes and their lengths. The names are +now all in a single string, to reduce the number of relocations when a shared +library is dynamically loaded. The list of lengths is terminated by a zero +length entry. The first three must be alpha, lower, upper, as this is assumed +for handling case independence. The indices for graph, print, and punct are +needed, so identify them. 
*/ + +static const char posix_names[] = + STRING_alpha0 STRING_lower0 STRING_upper0 STRING_alnum0 + STRING_ascii0 STRING_blank0 STRING_cntrl0 STRING_digit0 + STRING_graph0 STRING_print0 STRING_punct0 STRING_space0 + STRING_word0 STRING_xdigit; + +static const uint8_t posix_name_lengths[] = { + 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 6, 0 }; + +#define PC_GRAPH 8 +#define PC_PRINT 9 +#define PC_PUNCT 10 + +/* Table of class bit maps for each POSIX class. Each class is formed from a +base map, with an optional addition or removal of another map. Then, for some +classes, there is some additional tweaking: for [:blank:] the vertical space +characters are removed, and for [:alpha:] and [:alnum:] the underscore +character is removed. The triples in the table consist of the base map offset, +second map offset or -1 if no second map, and a non-negative value for map +addition or a negative value for map subtraction (if there are two maps). The +absolute value of the third field has these meanings: 0 => no tweaking, 1 => +remove vertical space characters, 2 => remove underscore. */ + +static const int posix_class_maps[] = { + cbit_word, cbit_digit, -2, /* alpha */ + cbit_lower, -1, 0, /* lower */ + cbit_upper, -1, 0, /* upper */ + cbit_word, -1, 2, /* alnum - word without underscore */ + cbit_print, cbit_cntrl, 0, /* ascii */ + cbit_space, -1, 1, /* blank - a GNU extension */ + cbit_cntrl, -1, 0, /* cntrl */ + cbit_digit, -1, 0, /* digit */ + cbit_graph, -1, 0, /* graph */ + cbit_print, -1, 0, /* print */ + cbit_punct, -1, 0, /* punct */ + cbit_space, -1, 0, /* space */ + cbit_word, -1, 0, /* word - a Perl extension */ + cbit_xdigit,-1, 0 /* xdigit */ +}; + +#ifdef SUPPORT_UNICODE + +/* The POSIX class Unicode property substitutes that are used in UCP mode must +be in the order of the POSIX class names, defined above. 
*/ + +static int posix_substitutes[] = { + PT_GC, ucp_L, /* alpha */ + PT_PC, ucp_Ll, /* lower */ + PT_PC, ucp_Lu, /* upper */ + PT_ALNUM, 0, /* alnum */ + -1, 0, /* ascii, treat as non-UCP */ + -1, 1, /* blank, treat as \h */ + PT_PC, ucp_Cc, /* cntrl */ + PT_PC, ucp_Nd, /* digit */ + PT_PXGRAPH, 0, /* graph */ + PT_PXPRINT, 0, /* print */ + PT_PXPUNCT, 0, /* punct */ + PT_PXSPACE, 0, /* space */ /* Xps is POSIX space, but from 8.34 */ + PT_WORD, 0, /* word */ /* Perl and POSIX space are the same */ + -1, 0 /* xdigit, treat as non-UCP */ +}; +#define POSIX_SUBSIZE (sizeof(posix_substitutes) / (2*sizeof(uint32_t))) +#endif /* SUPPORT_UNICODE */ + +/* Masks for checking option settings. When PCRE2_LITERAL is set, only a subset +are allowed. */ + +#define PUBLIC_LITERAL_COMPILE_OPTIONS \ + (PCRE2_ANCHORED|PCRE2_AUTO_CALLOUT|PCRE2_CASELESS|PCRE2_ENDANCHORED| \ + PCRE2_FIRSTLINE|PCRE2_LITERAL|PCRE2_MATCH_INVALID_UTF| \ + PCRE2_NO_START_OPTIMIZE|PCRE2_NO_UTF_CHECK|PCRE2_USE_OFFSET_LIMIT|PCRE2_UTF) + +#define PUBLIC_COMPILE_OPTIONS \ + (PUBLIC_LITERAL_COMPILE_OPTIONS| \ + PCRE2_ALLOW_EMPTY_CLASS|PCRE2_ALT_BSUX|PCRE2_ALT_CIRCUMFLEX| \ + PCRE2_ALT_VERBNAMES|PCRE2_DOLLAR_ENDONLY|PCRE2_DOTALL|PCRE2_DUPNAMES| \ + PCRE2_EXTENDED|PCRE2_EXTENDED_MORE|PCRE2_MATCH_UNSET_BACKREF| \ + PCRE2_MULTILINE|PCRE2_NEVER_BACKSLASH_C|PCRE2_NEVER_UCP| \ + PCRE2_NEVER_UTF|PCRE2_NO_AUTO_CAPTURE|PCRE2_NO_AUTO_POSSESS| \ + PCRE2_NO_DOTSTAR_ANCHOR|PCRE2_UCP|PCRE2_UNGREEDY) + +#define PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS \ + (PCRE2_EXTRA_MATCH_LINE|PCRE2_EXTRA_MATCH_WORD) + +#define PUBLIC_COMPILE_EXTRA_OPTIONS \ + (PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS| \ + PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES|PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL| \ + PCRE2_EXTRA_ESCAPED_CR_IS_LF|PCRE2_EXTRA_ALT_BSUX) + +/* Compile time error code numbers. They are given names so that they can more +easily be tracked. 
When a new number is added, the tables called eint1 and +eint2 in pcre2posix.c may need to be updated, and a new error text must be +added to compile_error_texts in pcre2_error.c. */ + +enum { ERR0 = COMPILE_ERROR_BASE, + ERR1, ERR2, ERR3, ERR4, ERR5, ERR6, ERR7, ERR8, ERR9, ERR10, + ERR11, ERR12, ERR13, ERR14, ERR15, ERR16, ERR17, ERR18, ERR19, ERR20, + ERR21, ERR22, ERR23, ERR24, ERR25, ERR26, ERR27, ERR28, ERR29, ERR30, + ERR31, ERR32, ERR33, ERR34, ERR35, ERR36, ERR37, ERR38, ERR39, ERR40, + ERR41, ERR42, ERR43, ERR44, ERR45, ERR46, ERR47, ERR48, ERR49, ERR50, + ERR51, ERR52, ERR53, ERR54, ERR55, ERR56, ERR57, ERR58, ERR59, ERR60, + ERR61, ERR62, ERR63, ERR64, ERR65, ERR66, ERR67, ERR68, ERR69, ERR70, + ERR71, ERR72, ERR73, ERR74, ERR75, ERR76, ERR77, ERR78, ERR79, ERR80, + ERR81, ERR82, ERR83, ERR84, ERR85, ERR86, ERR87, ERR88, ERR89, ERR90, + ERR91, ERR92, ERR93, ERR94, ERR95, ERR96, ERR97, ERR98 }; + +/* This is a table of start-of-pattern options such as (*UTF) and settings such +as (*LIMIT_MATCH=nnnn) and (*CRLF). For completeness and backward +compatibility, (*UTFn) is supported in the relevant libraries, but (*UTF) is +generic and always supported. 
*/ + +enum { PSO_OPT, /* Value is an option bit */ + PSO_FLG, /* Value is a flag bit */ + PSO_NL, /* Value is a newline type */ + PSO_BSR, /* Value is a \R type */ + PSO_LIMH, /* Read integer value for heap limit */ + PSO_LIMM, /* Read integer value for match limit */ + PSO_LIMD }; /* Read integer value for depth limit */ + +typedef struct pso { + const uint8_t *name; + uint16_t length; + uint16_t type; + uint32_t value; +} pso; + +/* NB: STRING_UTFn_RIGHTPAR contains the length as well */ + +static pso pso_list[] = { + { (uint8_t *)STRING_UTFn_RIGHTPAR, PSO_OPT, PCRE2_UTF }, + { (uint8_t *)STRING_UTF_RIGHTPAR, 4, PSO_OPT, PCRE2_UTF }, + { (uint8_t *)STRING_UCP_RIGHTPAR, 4, PSO_OPT, PCRE2_UCP }, + { (uint8_t *)STRING_NOTEMPTY_RIGHTPAR, 9, PSO_FLG, PCRE2_NOTEMPTY_SET }, + { (uint8_t *)STRING_NOTEMPTY_ATSTART_RIGHTPAR, 17, PSO_FLG, PCRE2_NE_ATST_SET }, + { (uint8_t *)STRING_NO_AUTO_POSSESS_RIGHTPAR, 16, PSO_OPT, PCRE2_NO_AUTO_POSSESS }, + { (uint8_t *)STRING_NO_DOTSTAR_ANCHOR_RIGHTPAR, 18, PSO_OPT, PCRE2_NO_DOTSTAR_ANCHOR }, + { (uint8_t *)STRING_NO_JIT_RIGHTPAR, 7, PSO_FLG, PCRE2_NOJIT }, + { (uint8_t *)STRING_NO_START_OPT_RIGHTPAR, 13, PSO_OPT, PCRE2_NO_START_OPTIMIZE }, + { (uint8_t *)STRING_LIMIT_HEAP_EQ, 11, PSO_LIMH, 0 }, + { (uint8_t *)STRING_LIMIT_MATCH_EQ, 12, PSO_LIMM, 0 }, + { (uint8_t *)STRING_LIMIT_DEPTH_EQ, 12, PSO_LIMD, 0 }, + { (uint8_t *)STRING_LIMIT_RECURSION_EQ, 16, PSO_LIMD, 0 }, + { (uint8_t *)STRING_CR_RIGHTPAR, 3, PSO_NL, PCRE2_NEWLINE_CR }, + { (uint8_t *)STRING_LF_RIGHTPAR, 3, PSO_NL, PCRE2_NEWLINE_LF }, + { (uint8_t *)STRING_CRLF_RIGHTPAR, 5, PSO_NL, PCRE2_NEWLINE_CRLF }, + { (uint8_t *)STRING_ANY_RIGHTPAR, 4, PSO_NL, PCRE2_NEWLINE_ANY }, + { (uint8_t *)STRING_NUL_RIGHTPAR, 4, PSO_NL, PCRE2_NEWLINE_NUL }, + { (uint8_t *)STRING_ANYCRLF_RIGHTPAR, 8, PSO_NL, PCRE2_NEWLINE_ANYCRLF }, + { (uint8_t *)STRING_BSR_ANYCRLF_RIGHTPAR, 12, PSO_BSR, PCRE2_BSR_ANYCRLF }, + { (uint8_t *)STRING_BSR_UNICODE_RIGHTPAR, 12, PSO_BSR, PCRE2_BSR_UNICODE } +}; + +/* 
This table is used when converting repeating opcodes into possessified +versions as a result of an explicit possessive quantifier such as ++. A zero +value means there is no possessified version - in those cases the item in +question must be wrapped in ONCE brackets. The table is truncated at OP_CALLOUT +because all relevant opcodes are less than that. */ + +static const uint8_t opcode_possessify[] = { + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0 - 15 */ + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 16 - 31 */ + + 0, /* NOTI */ + OP_POSSTAR, 0, /* STAR, MINSTAR */ + OP_POSPLUS, 0, /* PLUS, MINPLUS */ + OP_POSQUERY, 0, /* QUERY, MINQUERY */ + OP_POSUPTO, 0, /* UPTO, MINUPTO */ + 0, /* EXACT */ + 0, 0, 0, 0, /* POS{STAR,PLUS,QUERY,UPTO} */ + + OP_POSSTARI, 0, /* STARI, MINSTARI */ + OP_POSPLUSI, 0, /* PLUSI, MINPLUSI */ + OP_POSQUERYI, 0, /* QUERYI, MINQUERYI */ + OP_POSUPTOI, 0, /* UPTOI, MINUPTOI */ + 0, /* EXACTI */ + 0, 0, 0, 0, /* POS{STARI,PLUSI,QUERYI,UPTOI} */ + + OP_NOTPOSSTAR, 0, /* NOTSTAR, NOTMINSTAR */ + OP_NOTPOSPLUS, 0, /* NOTPLUS, NOTMINPLUS */ + OP_NOTPOSQUERY, 0, /* NOTQUERY, NOTMINQUERY */ + OP_NOTPOSUPTO, 0, /* NOTUPTO, NOTMINUPTO */ + 0, /* NOTEXACT */ + 0, 0, 0, 0, /* NOTPOS{STAR,PLUS,QUERY,UPTO} */ + + OP_NOTPOSSTARI, 0, /* NOTSTARI, NOTMINSTARI */ + OP_NOTPOSPLUSI, 0, /* NOTPLUSI, NOTMINPLUSI */ + OP_NOTPOSQUERYI, 0, /* NOTQUERYI, NOTMINQUERYI */ + OP_NOTPOSUPTOI, 0, /* NOTUPTOI, NOTMINUPTOI */ + 0, /* NOTEXACTI */ + 0, 0, 0, 0, /* NOTPOS{STARI,PLUSI,QUERYI,UPTOI} */ + + OP_TYPEPOSSTAR, 0, /* TYPESTAR, TYPEMINSTAR */ + OP_TYPEPOSPLUS, 0, /* TYPEPLUS, TYPEMINPLUS */ + OP_TYPEPOSQUERY, 0, /* TYPEQUERY, TYPEMINQUERY */ + OP_TYPEPOSUPTO, 0, /* TYPEUPTO, TYPEMINUPTO */ + 0, /* TYPEEXACT */ + 0, 0, 0, 0, /* TYPEPOS{STAR,PLUS,QUERY,UPTO} */ + + OP_CRPOSSTAR, 0, /* CRSTAR, CRMINSTAR */ + OP_CRPOSPLUS, 0, /* CRPLUS, CRMINPLUS */ + OP_CRPOSQUERY, 0, /* CRQUERY, CRMINQUERY */ + OP_CRPOSRANGE, 0, /* CRRANGE, CRMINRANGE */ + 0, 0, 0, 0, /* 
CRPOS{STAR,PLUS,QUERY,RANGE} */ + + 0, 0, 0, /* CLASS, NCLASS, XCLASS */ + 0, 0, /* REF, REFI */ + 0, 0, /* DNREF, DNREFI */ + 0, 0 /* RECURSE, CALLOUT */ +}; + + +#ifdef DEBUG_SHOW_PARSED +/************************************************* +* Show the parsed pattern for debugging * +*************************************************/ + +/* For debugging the pre-scan, this code, which outputs the parsed data vector, +can be enabled. */ + +static void show_parsed(compile_block *cb) +{ +uint32_t *pptr = cb->parsed_pattern; + +for (;;) + { + int max, min; + PCRE2_SIZE offset; + uint32_t i; + uint32_t length; + uint32_t meta_arg = META_DATA(*pptr); + + fprintf(stderr, "+++ %02d %.8x ", (int)(pptr - cb->parsed_pattern), *pptr); + + if (*pptr < META_END) + { + if (*pptr > 32 && *pptr < 128) fprintf(stderr, "%c", *pptr); + pptr++; + } + + else switch (META_CODE(*pptr++)) + { + default: + fprintf(stderr, "**** OOPS - unknown META value - giving up ****\n"); + return; + + case META_END: + fprintf(stderr, "META_END\n"); + return; + + case META_CAPTURE: + fprintf(stderr, "META_CAPTURE %d", meta_arg); + break; + + case META_RECURSE: + GETOFFSET(offset, pptr); + fprintf(stderr, "META_RECURSE %d %zd", meta_arg, offset); + break; + + case META_BACKREF: + if (meta_arg < 10) + offset = cb->small_ref_offset[meta_arg]; + else + GETOFFSET(offset, pptr); + fprintf(stderr, "META_BACKREF %d %zd", meta_arg, offset); + break; + + case META_ESCAPE: + if (meta_arg == ESC_P || meta_arg == ESC_p) + { + uint32_t ptype = *pptr >> 16; + uint32_t pvalue = *pptr++ & 0xffff; + fprintf(stderr, "META \\%c %d %d", (meta_arg == ESC_P)? 'P':'p', + ptype, pvalue); + } + else + { + uint32_t cc; + /* There's just one escape we might have here that isn't negated in the + escapes table. 
*/ + if (meta_arg == ESC_g) cc = CHAR_g; + else for (cc = ESCAPES_FIRST; cc <= ESCAPES_LAST; cc++) + { + if (meta_arg == (uint32_t)(-escapes[cc - ESCAPES_FIRST])) break; + } + if (cc > ESCAPES_LAST) cc = CHAR_QUESTION_MARK; + fprintf(stderr, "META \\%c", cc); + } + break; + + case META_MINMAX: + min = *pptr++; + max = *pptr++; + if (max != REPEAT_UNLIMITED) + fprintf(stderr, "META {%d,%d}", min, max); + else + fprintf(stderr, "META {%d,}", min); + break; + + case META_MINMAX_QUERY: + min = *pptr++; + max = *pptr++; + if (max != REPEAT_UNLIMITED) + fprintf(stderr, "META {%d,%d}?", min, max); + else + fprintf(stderr, "META {%d,}?", min); + break; + + case META_MINMAX_PLUS: + min = *pptr++; + max = *pptr++; + if (max != REPEAT_UNLIMITED) + fprintf(stderr, "META {%d,%d}+", min, max); + else + fprintf(stderr, "META {%d,}+", min); + break; + + case META_BIGVALUE: fprintf(stderr, "META_BIGVALUE %.8x", *pptr++); break; + case META_CIRCUMFLEX: fprintf(stderr, "META_CIRCUMFLEX"); break; + case META_COND_ASSERT: fprintf(stderr, "META_COND_ASSERT"); break; + case META_DOLLAR: fprintf(stderr, "META_DOLLAR"); break; + case META_DOT: fprintf(stderr, "META_DOT"); break; + case META_ASTERISK: fprintf(stderr, "META *"); break; + case META_ASTERISK_QUERY: fprintf(stderr, "META *?"); break; + case META_ASTERISK_PLUS: fprintf(stderr, "META *+"); break; + case META_PLUS: fprintf(stderr, "META +"); break; + case META_PLUS_QUERY: fprintf(stderr, "META +?"); break; + case META_PLUS_PLUS: fprintf(stderr, "META ++"); break; + case META_QUERY: fprintf(stderr, "META ?"); break; + case META_QUERY_QUERY: fprintf(stderr, "META ??"); break; + case META_QUERY_PLUS: fprintf(stderr, "META ?+"); break; + + case META_ATOMIC: fprintf(stderr, "META (?>"); break; + case META_NOCAPTURE: fprintf(stderr, "META (?:"); break; + case META_LOOKAHEAD: fprintf(stderr, "META (?="); break; + case META_LOOKAHEADNOT: fprintf(stderr, "META (?!"); break; + case META_LOOKAHEAD_NA: fprintf(stderr, "META (*napla:"); break; 
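+ case META_SCRIPT_RUN: fprintf(stderr, "META (*sr:"); break; + case META_KET: fprintf(stderr, "META )"); break; + case META_ALT: fprintf(stderr, "META | %d", meta_arg); break; + + case META_CLASS: fprintf(stderr, "META ["); break; + case META_CLASS_NOT: fprintf(stderr, "META [^"); break; + case META_CLASS_END: fprintf(stderr, "META ]"); break; + case META_CLASS_EMPTY: fprintf(stderr, "META []"); break; + case META_CLASS_EMPTY_NOT: fprintf(stderr, "META [^]"); break; + + case META_RANGE_LITERAL: fprintf(stderr, "META - (literal)"); break; + case META_RANGE_ESCAPED: fprintf(stderr, "META - (escaped)"); break; + + case META_POSIX: fprintf(stderr, "META_POSIX %d", *pptr++); break; + case META_POSIX_NEG: fprintf(stderr, "META_POSIX_NEG %d", *pptr++); break; + + case META_ACCEPT: fprintf(stderr, "META (*ACCEPT)"); break; + case META_FAIL: fprintf(stderr, "META (*FAIL)"); break; + case META_COMMIT: fprintf(stderr, "META (*COMMIT)"); break; + case META_PRUNE: fprintf(stderr, "META (*PRUNE)"); break; + case META_SKIP: fprintf(stderr, "META (*SKIP)"); break; + case META_THEN: fprintf(stderr, "META (*THEN)"); break; + + case META_OPTIONS: fprintf(stderr, "META_OPTIONS 0x%02x", *pptr++); break; + + case META_LOOKBEHIND: + fprintf(stderr, "META (?<= %d offset=", meta_arg); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_LOOKBEHIND_NA: + fprintf(stderr, "META (*naplb: %d offset=", meta_arg); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_LOOKBEHINDNOT: + fprintf(stderr, "META (?<! %d offset=", meta_arg); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_CALLOUT_NUMBER: + fprintf(stderr, "META (?C%d) next=%d/%d", pptr[2], pptr[0], + pptr[1]); + pptr += 3; + break; + + case META_CALLOUT_STRING: + { + uint32_t patoffset = *pptr++; /* Offset of next pattern item */ + uint32_t patlength = *pptr++; /* Length of next pattern item */ + fprintf(stderr, "META (?Cstring) length=%d offset=", *pptr++); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd next=%d/%d", offset, patoffset, patlength); + } + break; + + case META_RECURSE_BYNAME: + fprintf(stderr, "META (?(&name) length=%d offset=", *pptr++); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_BACKREF_BYNAME: + fprintf(stderr, "META_BACKREF_BYNAME length=%d offset=", *pptr++); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_COND_NUMBER: + fprintf(stderr, "META_COND_NUMBER %d offset=", pptr[SIZEOFFSET]); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + pptr++; + break; + + case META_COND_DEFINE: + fprintf(stderr, "META (?(DEFINE) offset="); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_COND_VERSION: + fprintf(stderr, "META (?(VERSION%s", (*pptr++ == 0)? "=" : ">="); + fprintf(stderr, "%d.", *pptr++); + fprintf(stderr, "%d)", *pptr++); + break; + + case META_COND_NAME: + fprintf(stderr, "META (?(<name>) length=%d offset=", *pptr++); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_COND_RNAME: + fprintf(stderr, "META (?(R&name) length=%d offset=", *pptr++); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + /* This is kept as a name, 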
because it might be. */ + + case META_COND_RNUMBER: + fprintf(stderr, "META (?(Rnumber) length=%d offset=", *pptr++); + GETOFFSET(offset, pptr); + fprintf(stderr, "%zd", offset); + break; + + case META_MARK: + fprintf(stderr, "META (*MARK:"); + goto SHOWARG; + + case META_COMMIT_ARG: + fprintf(stderr, "META (*COMMIT:"); + goto SHOWARG; + + case META_PRUNE_ARG: + fprintf(stderr, "META (*PRUNE:"); + goto SHOWARG; + + case META_SKIP_ARG: + fprintf(stderr, "META (*SKIP:"); + goto SHOWARG; + + case META_THEN_ARG: + fprintf(stderr, "META (*THEN:"); + SHOWARG: + length = *pptr++; + for (i = 0; i < length; i++) + { + uint32_t cc = *pptr++; + if (cc > 32 && cc < 128) fprintf(stderr, "%c", cc); + else fprintf(stderr, "\\x{%x}", cc); + } + fprintf(stderr, ") length=%u", length); + break; + } + fprintf(stderr, "\n"); + } +return; +} +#endif /* DEBUG_SHOW_PARSED */ + + + +/************************************************* +* Copy compiled code * +*************************************************/ + +/* Compiled JIT code cannot be copied, so the new compiled block has no +associated JIT data. */ + +PCRE2_EXP_DEFN pcre2_code * PCRE2_CALL_CONVENTION +pcre2_code_copy(const pcre2_code *code) +{ +PCRE2_SIZE* ref_count; +pcre2_code *newcode; + +if (code == NULL) return NULL; +newcode = code->memctl.malloc(code->blocksize, code->memctl.memory_data); +if (newcode == NULL) return NULL; +memcpy(newcode, code, code->blocksize); +newcode->executable_jit = NULL; + +/* If the code is one that has been deserialized, increment the reference count +in the decoded tables. */ + +if ((code->flags & PCRE2_DEREF_TABLES) != 0) + { + ref_count = (PCRE2_SIZE *)(code->tables + TABLES_LENGTH); + (*ref_count)++; + } + +return newcode; +} + + + +/************************************************* +* Copy compiled code and character tables * +*************************************************/ + +/* Compiled JIT code cannot be copied, so the new compiled block has no +associated JIT data. 
This version of code_copy also makes a separate copy of +the character tables. */ + +PCRE2_EXP_DEFN pcre2_code * PCRE2_CALL_CONVENTION +pcre2_code_copy_with_tables(const pcre2_code *code) +{ +PCRE2_SIZE* ref_count; +pcre2_code *newcode; +uint8_t *newtables; + +if (code == NULL) return NULL; +newcode = code->memctl.malloc(code->blocksize, code->memctl.memory_data); +if (newcode == NULL) return NULL; +memcpy(newcode, code, code->blocksize); +newcode->executable_jit = NULL; + +newtables = code->memctl.malloc(TABLES_LENGTH + sizeof(PCRE2_SIZE), + code->memctl.memory_data); +if (newtables == NULL) + { + code->memctl.free((void *)newcode, code->memctl.memory_data); + return NULL; + } +memcpy(newtables, code->tables, TABLES_LENGTH); +ref_count = (PCRE2_SIZE *)(newtables + TABLES_LENGTH); +*ref_count = 1; + +newcode->tables = newtables; +newcode->flags |= PCRE2_DEREF_TABLES; +return newcode; +} + + + +/************************************************* +* Free compiled code * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_code_free(pcre2_code *code) +{ +PCRE2_SIZE* ref_count; + +if (code != NULL) + { + if (code->executable_jit != NULL) + PRIV(jit_free)(code->executable_jit, &code->memctl); + + if ((code->flags & PCRE2_DEREF_TABLES) != 0) + { + /* Decoded tables belong to the codes after deserialization, and they must + be freed when there are no more references to them. The *ref_count should + always be > 0. */ + + ref_count = (PCRE2_SIZE *)(code->tables + TABLES_LENGTH); + if (*ref_count > 0) + { + (*ref_count)--; + if (*ref_count == 0) + code->memctl.free((void *)code->tables, code->memctl.memory_data); + } + } + + code->memctl.free(code, code->memctl.memory_data); + } +} + + + +/************************************************* +* Read a number, possibly signed * +*************************************************/ + +/* This function is used to read numbers in the pattern. 
The initial pointer +must be the sign or first digit of the number. When relative values (introduced +by + or -) are allowed, they are relative group numbers, and the result must be +greater than zero. + +Arguments: + ptrptr points to the character pointer variable + ptrend points to the end of the input string + allow_sign if < 0, sign not allowed; if >= 0, sign is relative to this + max_value the largest number allowed + max_error the error to give for an over-large number + intptr where to put the result + errorcodeptr where to put an error code + +Returns: TRUE - a number was read + FALSE - errorcode == 0 => no number was found + errorcode != 0 => an error occurred +*/ + +static BOOL +read_number(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, int32_t allow_sign, + uint32_t max_value, uint32_t max_error, int *intptr, int *errorcodeptr) +{ +int sign = 0; +uint32_t n = 0; +PCRE2_SPTR ptr = *ptrptr; +BOOL yield = FALSE; + +*errorcodeptr = 0; + +if (allow_sign >= 0 && ptr < ptrend) + { + if (*ptr == CHAR_PLUS) + { + sign = +1; + max_value -= allow_sign; + ptr++; + } + else if (*ptr == CHAR_MINUS) + { + sign = -1; + ptr++; + } + } + +if (ptr >= ptrend || !IS_DIGIT(*ptr)) return FALSE; +while (ptr < ptrend && IS_DIGIT(*ptr)) + { + n = n * 10 + *ptr++ - CHAR_0; + if (n > max_value) + { + *errorcodeptr = max_error; + goto EXIT; + } + } + +if (allow_sign >= 0 && sign != 0) + { + if (n == 0) + { + *errorcodeptr = ERR26; /* +0 and -0 are not allowed */ + goto EXIT; + } + + if (sign > 0) n += allow_sign; + else if ((int)n > allow_sign) + { + *errorcodeptr = ERR15; /* Non-existent subpattern */ + goto EXIT; + } + else n = allow_sign + 1 - n; + } + +yield = TRUE; + +EXIT: +*intptr = n; +*ptrptr = ptr; +return yield; +} + + + +/************************************************* +* Read repeat counts * +*************************************************/ + +/* Read an item of the form {n,m} and return the values if non-NULL pointers +are supplied. 
Repeat counts must be less than 65536 (MAX_REPEAT_COUNT); a +larger value is used for "unlimited". We have to use signed arguments for +read_number() because it is capable of returning a signed value. + +Arguments: + ptrptr points to pointer to character after '{' + ptrend pointer to end of input + minp if not NULL, pointer to uint32_t for min + maxp if not NULL, pointer to uint32_t for max + (REPEAT_UNLIMITED if there is no max) + errorcodeptr points to error code variable + +Returns: FALSE if not a repeat quantifier, errorcode set zero + FALSE on error, with errorcode set non-zero + TRUE on success, with pointer updated to point after '}' +*/ + +static BOOL +read_repeat_counts(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *minp, + uint32_t *maxp, int *errorcodeptr) +{ +PCRE2_SPTR p; +BOOL yield = FALSE; +BOOL had_comma = FALSE; +int32_t min = 0; +int32_t max = REPEAT_UNLIMITED; /* This value is larger than MAX_REPEAT_COUNT */ + +/* Check the syntax */ + +*errorcodeptr = 0; +for (p = *ptrptr;; p++) + { + uint32_t c; + if (p >= ptrend) return FALSE; + c = *p; + if (IS_DIGIT(c)) continue; + if (c == CHAR_RIGHT_CURLY_BRACKET) break; + if (c == CHAR_COMMA) + { + if (had_comma) return FALSE; + had_comma = TRUE; + } + else return FALSE; + } + +/* The only error from read_number() is for a number that is too big. 
*/ + +p = *ptrptr; +if (!read_number(&p, ptrend, -1, MAX_REPEAT_COUNT, ERR5, &min, errorcodeptr)) + goto EXIT; + +if (*p == CHAR_RIGHT_CURLY_BRACKET) + { + p++; + max = min; + } +else + { + if (*(++p) != CHAR_RIGHT_CURLY_BRACKET) + { + if (!read_number(&p, ptrend, -1, MAX_REPEAT_COUNT, ERR5, &max, + errorcodeptr)) + goto EXIT; + if (max < min) + { + *errorcodeptr = ERR4; + goto EXIT; + } + } + p++; + } + +yield = TRUE; +if (minp != NULL) *minp = (uint32_t)min; +if (maxp != NULL) *maxp = (uint32_t)max; + +/* Update the pattern pointer */ + +EXIT: +*ptrptr = p; +return yield; +} + + + +/************************************************* +* Handle escapes * +*************************************************/ + +/* This function is called when a \ has been encountered. It either returns a +positive value for a simple escape such as \d, or 0 for a data character, which +is placed in chptr. A backreference to group n is returned as negative n. On +entry, ptr is pointing at the character after \. On exit, it points after the +final code unit of the escape sequence. + +This function is also called from pcre2_substitute() to handle escape sequences +in replacement strings. In this case, the cb argument is NULL, and in the case +of escapes that have further processing, only sequences that define a data +character are recognised. The isclass argument is not relevant; the options +argument is the final value of the compiled pattern's options. 
+ +Arguments: + ptrptr points to the input position pointer + ptrend points to the end of the input + chptr points to a returned data character + errorcodeptr points to the errorcode variable (containing zero) + options the current options bits + extra_options the current extra options bits + isclass TRUE if inside a character class + cb compile data block or NULL when called from pcre2_substitute() + +Returns: zero => a data character + positive => a special escape sequence + negative => a numerical back reference + on error, errorcodeptr is set non-zero +*/ + +int +PRIV(check_escape)(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, uint32_t *chptr, + int *errorcodeptr, uint32_t options, uint32_t extra_options, BOOL isclass, + compile_block *cb) +{ +BOOL utf = (options & PCRE2_UTF) != 0; +PCRE2_SPTR ptr = *ptrptr; +uint32_t c, cc; +int escape = 0; +int i; + +/* If backslash is at the end of the string, it's an error. */ + +if (ptr >= ptrend) + { + *errorcodeptr = ERR1; + return 0; + } + +GETCHARINCTEST(c, ptr); /* Get character value, increment pointer */ +*errorcodeptr = 0; /* Be optimistic */ + +/* Non-alphanumerics are literals, so we just leave the value in c. An initial +value test saves a memory lookup for code points outside the alphanumeric +range. */ + +if (c < ESCAPES_FIRST || c > ESCAPES_LAST) {} /* Definitely literal */ + +/* Otherwise, do a table lookup. Non-zero values need little processing here. A +positive value is a literal value for something like \n. A negative value is +the negation of one of the ESC_ macros that is passed back for handling by the +calling function. Some extra checking is needed for \N because only \N{U+dddd} +is supported. If the value is zero, further processing is handled below. 
*/ + +else if ((i = escapes[c - ESCAPES_FIRST]) != 0) + { + if (i > 0) + { + c = (uint32_t)i; + if (c == CHAR_CR && (extra_options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0) + c = CHAR_LF; + } + else /* Negative table entry */ + { + escape = -i; /* Else return a special escape */ + if (cb != NULL && (escape == ESC_P || escape == ESC_p || escape == ESC_X)) + cb->external_flags |= PCRE2_HASBKPORX; /* Note \P, \p, or \X */ + + /* Perl supports \N{name} for character names and \N{U+dddd} for numerical + Unicode code points, as well as plain \N for "not newline". PCRE does not + support \N{name}. However, it does support quantification such as \N{2,3}, + so if \N{ is not followed by U+dddd we check for a quantifier. */ + + if (escape == ESC_N && ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET) + { + PCRE2_SPTR p = ptr + 1; + + /* \N{U+ can be handled by the \x{ code. However, this construction is + not valid in EBCDIC environments because it specifies a Unicode + character, not a codepoint in the local code. For example \N{U+0041} + must be "A" in all environments. Also, in Perl, \N{U+ forces Unicode + casing semantics for the entire pattern, so allow it only in UTF (i.e. + Unicode) mode. */ + + if (ptrend - p > 1 && *p == CHAR_U && p[1] == CHAR_PLUS) + { +#ifdef EBCDIC + *errorcodeptr = ERR93; +#else + if (utf) + { + ptr = p + 1; + escape = 0; /* Not a fancy escape after all */ + goto COME_FROM_NU; + } + else *errorcodeptr = ERR93; +#endif + } + + /* Give an error if what follows is not a quantifier, but don't override + an error set by the quantifier reader (e.g. number overflow). */ + + else + { + if (!read_repeat_counts(&p, ptrend, NULL, NULL, errorcodeptr) && + *errorcodeptr == 0) + *errorcodeptr = ERR37; + } + } + } + } + +/* Escapes that need further processing, including those that are unknown, have +a zero entry in the lookup table. 
When called from pcre2_substitute(), only \c, +\o, and \x are recognized (\u and \U can never appear as they are used for case +forcing). */ + +else + { + int s; + PCRE2_SPTR oldptr; + BOOL overflow; + BOOL alt_bsux = + ((options & PCRE2_ALT_BSUX) | (extra_options & PCRE2_EXTRA_ALT_BSUX)) != 0; + + /* Filter calls from pcre2_substitute(). */ + + if (cb == NULL) + { + if (c != CHAR_c && c != CHAR_o && c != CHAR_x) + { + *errorcodeptr = ERR3; + return 0; + } + alt_bsux = FALSE; /* Do not modify \x handling */ + } + + switch (c) + { + /* A number of Perl escapes are not handled by PCRE. We give an explicit + error. */ + + case CHAR_F: + case CHAR_l: + case CHAR_L: + *errorcodeptr = ERR37; + break; + + /* \u is unrecognized when neither PCRE2_ALT_BSUX nor PCRE2_EXTRA_ALT_BSUX + is set. Otherwise, \u must be followed by exactly four hex digits or, if + PCRE2_EXTRA_ALT_BSUX is set, by any number of hex digits in braces. + Otherwise it is a lowercase u letter. This gives some compatibility with + ECMAScript (aka JavaScript). 
*/ + + case CHAR_u: + if (!alt_bsux) *errorcodeptr = ERR37; else + { + uint32_t xc; + + if (ptr >= ptrend) break; + if (*ptr == CHAR_LEFT_CURLY_BRACKET && + (extra_options & PCRE2_EXTRA_ALT_BSUX) != 0) + { + PCRE2_SPTR hptr = ptr + 1; + cc = 0; + + while (hptr < ptrend && (xc = XDIGIT(*hptr)) != 0xff) + { + if ((cc & 0xf0000000) != 0) /* Test for 32-bit overflow */ + { + *errorcodeptr = ERR77; + ptr = hptr; /* Show where */ + break; /* *hptr != } will cause another break below */ + } + cc = (cc << 4) | xc; + hptr++; + } + + if (hptr == ptr + 1 || /* No hex digits */ + hptr >= ptrend || /* Hit end of input */ + *hptr != CHAR_RIGHT_CURLY_BRACKET) /* No } terminator */ + break; /* Hex escape not recognized */ + + c = cc; /* Accept the code point */ + ptr = hptr + 1; + } + + else /* Must be exactly 4 hex digits */ + { + if (ptrend - ptr < 4) break; /* Less than 4 chars */ + if ((cc = XDIGIT(ptr[0])) == 0xff) break; /* Not a hex digit */ + if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */ + cc = (cc << 4) | xc; + if ((xc = XDIGIT(ptr[2])) == 0xff) break; /* Not a hex digit */ + cc = (cc << 4) | xc; + if ((xc = XDIGIT(ptr[3])) == 0xff) break; /* Not a hex digit */ + c = (cc << 4) | xc; + ptr += 4; + } + + if (utf) + { + if (c > 0x10ffffU) *errorcodeptr = ERR77; + else + if (c >= 0xd800 && c <= 0xdfff && + (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0) + *errorcodeptr = ERR73; + } + else if (c > MAX_NON_UTF_CHAR) *errorcodeptr = ERR77; + } + break; + + /* \U is unrecognized unless PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set, + in which case it is an upper case letter. */ + + case CHAR_U: + if (!alt_bsux) *errorcodeptr = ERR37; + break; + + /* In a character class, \g is just a literal "g". Outside a character + class, \g must be followed by one of a number of specific things: + + (1) A number, either plain or braced. If positive, it is an absolute + backreference. If negative, it is a relative backreference. This is a Perl + 5.10 feature. 
+ + (2) Perl 5.10 also supports \g{name} as a reference to a named group. This + is part of Perl's movement towards a unified syntax for back references. As + this is synonymous with \k{name}, we fudge it up by pretending it really + was \k{name}. + + (3) For Oniguruma compatibility we also support \g followed by a name or a + number either in angle brackets or in single quotes. However, these are + (possibly recursive) subroutine calls, _not_ backreferences. We return + the ESC_g code. + + Summary: Return a negative number for a numerical back reference, ESC_k for + a named back reference, and ESC_g for a named or numbered subroutine call. + */ + + case CHAR_g: + if (isclass) break; + + if (ptr >= ptrend) + { + *errorcodeptr = ERR57; + break; + } + + if (*ptr == CHAR_LESS_THAN_SIGN || *ptr == CHAR_APOSTROPHE) + { + escape = ESC_g; + break; + } + + /* If there is a brace delimiter, try to read a numerical reference. If + there isn't one, assume we have a name and treat it as \k. */ + + if (*ptr == CHAR_LEFT_CURLY_BRACKET) + { + PCRE2_SPTR p = ptr + 1; + if (!read_number(&p, ptrend, cb->bracount, MAX_GROUP_NUMBER, ERR61, &s, + errorcodeptr)) + { + if (*errorcodeptr == 0) escape = ESC_k; /* No number found */ + break; + } + if (p >= ptrend || *p != CHAR_RIGHT_CURLY_BRACKET) + { + *errorcodeptr = ERR57; + break; + } + ptr = p + 1; + } + + /* Read an undelimited number */ + + else + { + if (!read_number(&ptr, ptrend, cb->bracount, MAX_GROUP_NUMBER, ERR61, &s, + errorcodeptr)) + { + if (*errorcodeptr == 0) *errorcodeptr = ERR57; /* No number found */ + break; + } + } + + if (s <= 0) + { + *errorcodeptr = ERR15; + break; + } + + escape = -s; + break; + + /* The handling of escape sequences consisting of a string of digits + starting with one that is not zero is not straightforward. Perl has changed + over the years. Nowadays \g{} for backreferences and \o{} for octal are + recommended to avoid the ambiguities in the old syntax. 
+ + Outside a character class, the digits are read as a decimal number. If the + number is less than 10, or if there are that many previous extracting left + brackets, it is a back reference. Otherwise, up to three octal digits are + read to form an escaped character code. Thus \123 is likely to be octal 123 + (cf \0123, which is octal 012 followed by the literal 3). + + Inside a character class, \ followed by a digit is always either a literal + 8 or 9 or an octal number. */ + + case CHAR_1: case CHAR_2: case CHAR_3: case CHAR_4: case CHAR_5: + case CHAR_6: case CHAR_7: case CHAR_8: case CHAR_9: + + if (!isclass) + { + oldptr = ptr; + ptr--; /* Back to the digit */ + + /* As we know we are at a digit, the only possible error from + read_number() is a number that is too large to be a group number. In this + case we fall through and handle this as not a group reference. If we have + read a small enough number, check for a back reference. + + \1 to \9 are always back references. \8x and \9x are too; \1x to \7x + are octal escapes if there are not that many previous captures. */ + + if (read_number(&ptr, ptrend, -1, INT_MAX/10 - 1, 0, &s, errorcodeptr) && + (s < 10 || oldptr[-1] >= CHAR_8 || s <= (int)cb->bracount)) + { + if (s > (int)MAX_GROUP_NUMBER) *errorcodeptr = ERR61; + else escape = -s; /* Indicates a back reference */ + break; + } + + ptr = oldptr; /* Put the pointer back and fall through */ + } + + /* Handle a digit following \ when the number is not a back reference, or + we are within a character class. If the first digit is 8 or 9, Perl used to + generate a binary zero and then treat the digit as a following literal. At + least by Perl 5.18 this changed so as not to insert the binary zero. */ + + if (c >= CHAR_8) break; + + /* Fall through */ + + /* \0 always starts an octal number, but we may drop through to here with a + larger first octal digit. 
The original code used just to take the least + significant 8 bits of octal numbers (I think this is what early Perls used + to do). Nowadays we allow for larger numbers in UTF-8 mode and 16-bit mode, + but no more than 3 octal digits. */ + + case CHAR_0: + c -= CHAR_0; + while(i++ < 2 && ptr < ptrend && *ptr >= CHAR_0 && *ptr <= CHAR_7) + c = c * 8 + *ptr++ - CHAR_0; +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (!utf && c > 0xff) *errorcodeptr = ERR51; +#endif + break; + + /* \o is a relatively new Perl feature, supporting a more general way of + specifying character codes in octal. The only supported form is \o{ddd}. */ + + case CHAR_o: + if (ptr >= ptrend || *ptr++ != CHAR_LEFT_CURLY_BRACKET) + { + ptr--; + *errorcodeptr = ERR55; + } + else if (ptr >= ptrend || *ptr == CHAR_RIGHT_CURLY_BRACKET) + *errorcodeptr = ERR78; + else + { + c = 0; + overflow = FALSE; + while (ptr < ptrend && *ptr >= CHAR_0 && *ptr <= CHAR_7) + { + cc = *ptr++; + if (c == 0 && cc == CHAR_0) continue; /* Leading zeroes */ +#if PCRE2_CODE_UNIT_WIDTH == 32 + if (c >= 0x20000000l) { overflow = TRUE; break; } +#endif + c = (c << 3) + (cc - CHAR_0); +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (c > (utf ? 0x10ffffU : 0xffU)) { overflow = TRUE; break; } +#elif PCRE2_CODE_UNIT_WIDTH == 16 + if (c > (utf ? 0x10ffffU : 0xffffU)) { overflow = TRUE; break; } +#elif PCRE2_CODE_UNIT_WIDTH == 32 + if (utf && c > 0x10ffffU) { overflow = TRUE; break; } +#endif + } + if (overflow) + { + while (ptr < ptrend && *ptr >= CHAR_0 && *ptr <= CHAR_7) ptr++; + *errorcodeptr = ERR34; + } + else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET) + { + if (utf && c >= 0xd800 && c <= 0xdfff && + (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0) + { + ptr--; + *errorcodeptr = ERR73; + } + } + else + { + ptr--; + *errorcodeptr = ERR64; + } + } + break; + + /* When PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set, \x must be followed + by two hexadecimal digits. Otherwise it is a lowercase x letter. 
*/ + + case CHAR_x: + if (alt_bsux) + { + uint32_t xc; + if (ptrend - ptr < 2) break; /* Less than 2 characters */ + if ((cc = XDIGIT(ptr[0])) == 0xff) break; /* Not a hex digit */ + if ((xc = XDIGIT(ptr[1])) == 0xff) break; /* Not a hex digit */ + c = (cc << 4) | xc; + ptr += 2; + } + + /* Handle \x in Perl's style. \x{ddd} is a character code which can be + greater than 0xff in UTF-8 or non-8bit mode, but only if the ddd are hex + digits. If not, { used to be treated as a data character. However, Perl + seems to read hex digits up to the first non-such, and ignore the rest, so + that, for example \x{zz} matches a binary zero. This seems crazy, so PCRE + now gives an error. */ + + else + { + if (ptr < ptrend && *ptr == CHAR_LEFT_CURLY_BRACKET) + { +#ifndef EBCDIC + COME_FROM_NU: +#endif + if (++ptr >= ptrend || *ptr == CHAR_RIGHT_CURLY_BRACKET) + { + *errorcodeptr = ERR78; + break; + } + c = 0; + overflow = FALSE; + + while (ptr < ptrend && (cc = XDIGIT(*ptr)) != 0xff) + { + ptr++; + if (c == 0 && cc == 0) continue; /* Leading zeroes */ +#if PCRE2_CODE_UNIT_WIDTH == 32 + if (c >= 0x10000000l) { overflow = TRUE; break; } +#endif + c = (c << 4) | cc; + if ((utf && c > 0x10ffffU) || (!utf && c > MAX_NON_UTF_CHAR)) + { + overflow = TRUE; + break; + } + } + + if (overflow) + { + while (ptr < ptrend && XDIGIT(*ptr) != 0xff) ptr++; + *errorcodeptr = ERR34; + } + else if (ptr < ptrend && *ptr++ == CHAR_RIGHT_CURLY_BRACKET) + { + if (utf && c >= 0xd800 && c <= 0xdfff && + (extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) == 0) + { + ptr--; + *errorcodeptr = ERR73; + } + } + + /* If the sequence of hex digits does not end with '}', give an error. + We used just to recognize this construct and fall through to the normal + \x handling, but nowadays Perl gives an error, which seems much more + sensible, so we do too. 
*/ + + else + { + ptr--; + *errorcodeptr = ERR67; + } + } /* End of \x{} processing */ + + /* Read up to two hex digits after \x */ + + else + { + c = 0; + if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff) break; /* Not a hex digit */ + ptr++; + c = cc; + if (ptr >= ptrend || (cc = XDIGIT(*ptr)) == 0xff) break; /* Not a hex digit */ + ptr++; + c = (c << 4) | cc; + } /* End of \xdd handling */ + } /* End of Perl-style \x handling */ + break; + + /* The handling of \c is different in ASCII and EBCDIC environments. In an + ASCII (or Unicode) environment, an error is given if the character + following \c is not a printable ASCII character. Otherwise, the following + character is upper-cased if it is a letter, and after that the 0x40 bit is + flipped. The result is the value of the escape. + + In an EBCDIC environment the handling of \c is compatible with the + specification in the perlebcdic document. The following character must be + a letter or one of a small number of special characters. These provide a + means of defining the character values 0-31. + + For testing the EBCDIC handling of \c in an ASCII environment, recognize + the EBCDIC value of 'c' explicitly. */ + +#if defined EBCDIC && 'a' != 0x81 + case 0x83: +#else + case CHAR_c: +#endif + if (ptr >= ptrend) + { + *errorcodeptr = ERR2; + break; + } + c = *ptr; + if (c >= CHAR_a && c <= CHAR_z) c = UPPER_CASE(c); + + /* Handle \c in an ASCII/Unicode environment. */ + +#ifndef EBCDIC /* ASCII/UTF-8 coding */ + if (c < 32 || c > 126) /* Excludes all non-printable ASCII */ + { + *errorcodeptr = ERR68; + break; + } + c ^= 0x40; + + /* Handle \c in an EBCDIC environment. The special case \c? is converted to + 255 (0xff) or 95 (0x5f) if other characters suggest we are using the + POSIX-BC encoding. (This is the way Perl indicates that it handles \c?.) + The other valid sequences correspond to a list of specific characters. */ + +#else + if (c == CHAR_QUESTION_MARK) + c = ('\\' == 188 && '`' == 74)? 
0x5f : 0xff; + else + { + for (i = 0; i < 32; i++) + { + if (c == ebcdic_escape_c[i]) break; + } + if (i < 32) c = i; else *errorcodeptr = ERR68; + } +#endif /* EBCDIC */ + + ptr++; + break; + + /* Any other alphanumeric following \ is an error. Perl gives an error only + if in warning mode, but PCRE doesn't have a warning mode. */ + + default: + *errorcodeptr = ERR3; + *ptrptr = ptr - 1; /* Point to the character at fault */ + return 0; + } + } + +/* Set the pointer to the next character before returning. */ + +*ptrptr = ptr; +*chptr = c; +return escape; +} + + + +#ifdef SUPPORT_UNICODE +/************************************************* +* Handle \P and \p * +*************************************************/ + +/* This function is called after \P or \p has been encountered, provided that +PCRE2 is compiled with support for UTF and Unicode properties. On entry, the +contents of ptrptr are pointing after the P or p. On exit, it is left pointing +after the final code unit of the escape sequence. + +Arguments: + ptrptr the pattern position pointer + negptr a boolean that is set TRUE for negation else FALSE + ptypeptr an unsigned int that is set to the type value + pdataptr an unsigned int that is set to the detailed property value + errorcodeptr the error code variable + cb the compile data + +Returns: TRUE if the type value was found, or FALSE for an invalid type +*/ + +static BOOL +get_ucp(PCRE2_SPTR *ptrptr, BOOL *negptr, uint16_t *ptypeptr, + uint16_t *pdataptr, int *errorcodeptr, compile_block *cb) +{ +PCRE2_UCHAR c; +PCRE2_SIZE i, bot, top; +PCRE2_SPTR ptr = *ptrptr; +PCRE2_UCHAR name[32]; + +if (ptr >= cb->end_pattern) goto ERROR_RETURN; +c = *ptr++; +*negptr = FALSE; + +/* \P or \p can be followed by a name in {}, optionally preceded by ^ for +negation. 
*/ + +if (c == CHAR_LEFT_CURLY_BRACKET) + { + if (ptr >= cb->end_pattern) goto ERROR_RETURN; + if (*ptr == CHAR_CIRCUMFLEX_ACCENT) + { + *negptr = TRUE; + ptr++; + } + for (i = 0; i < (int)(sizeof(name) / sizeof(PCRE2_UCHAR)) - 1; i++) + { + if (ptr >= cb->end_pattern) goto ERROR_RETURN; + c = *ptr++; + if (c == CHAR_NUL) goto ERROR_RETURN; + if (c == CHAR_RIGHT_CURLY_BRACKET) break; + name[i] = c; + } + if (c != CHAR_RIGHT_CURLY_BRACKET) goto ERROR_RETURN; + name[i] = 0; + } + +/* Otherwise there is just one following character, which must be an ASCII +letter. */ + +else if (MAX_255(c) && (cb->ctypes[c] & ctype_letter) != 0) + { + name[0] = c; + name[1] = 0; + } +else goto ERROR_RETURN; + +*ptrptr = ptr; + +/* Search for a recognized property name using binary chop. */ + +bot = 0; +top = PRIV(utt_size); + +while (bot < top) + { + int r; + i = (bot + top) >> 1; + r = PRIV(strcmp_c8)(name, PRIV(utt_names) + PRIV(utt)[i].name_offset); + if (r == 0) + { + *ptypeptr = PRIV(utt)[i].type; + *pdataptr = PRIV(utt)[i].value; + return TRUE; + } + if (r > 0) bot = i + 1; else top = i; + } +*errorcodeptr = ERR47; /* Unrecognized name */ +return FALSE; + +ERROR_RETURN: /* Malformed \P or \p */ +*errorcodeptr = ERR46; +*ptrptr = ptr; +return FALSE; +} +#endif + + + +/************************************************* +* Check for POSIX class syntax * +*************************************************/ + +/* This function is called when the sequence "[:" or "[." or "[=" is +encountered in a character class. It checks whether this is followed by a +sequence of characters terminated by a matching ":]" or ".]" or "=]". If we +reach an unescaped ']' without the special preceding character, return FALSE. + +Originally, this function only recognized a sequence of letters between the +terminators, but it seems that Perl recognizes any sequence of characters, +though of course unknown POSIX names are subsequently rejected. 
Perl gives an +"Unknown POSIX class" error for [:f\oo:] for example, where previously PCRE +didn't consider this to be a POSIX class. Likewise for [:1234:]. + +The problem in trying to be exactly like Perl is in the handling of escapes. We +have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX +class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code +below handles the special cases \\ and \], but does not try to do any other +escape processing. This makes it different from Perl for cases such as +[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does +not recognize "l\ower". This is a lesser evil than not diagnosing bad classes +when Perl does, I think. + +A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not. +It seems that the appearance of a nested POSIX class supersedes an apparent +external class. For example, [:a[:digit:]b:] matches "a", "b", ":", or +a digit. This is handled by returning FALSE if the start of a new group with +the same terminator is encountered, since the next closing sequence must close +the nested group, not the outer one. + +In Perl, unescaped square brackets may also appear as part of class names. For +example, [:a[:abc]b:] gives unknown POSIX class "[:abc]b:]". However, for +[:a[:abc]b][b:] it gives unknown POSIX class "[:abc]b][b:]", which does not +seem right at all. PCRE does not allow closing square brackets in POSIX class +names. + +Arguments: + ptr pointer to the character after the initial [ (colon, dot, equals) + ptrend pointer to the end of the pattern + endptr where to return a pointer to the terminating ':', '.', or '=' + +Returns: TRUE or FALSE +*/ + +static BOOL +check_posix_syntax(PCRE2_SPTR ptr, PCRE2_SPTR ptrend, PCRE2_SPTR *endptr) +{ +PCRE2_UCHAR terminator; /* Don't combine these lines; the Solaris cc */ +terminator = *ptr++; /* compiler warns about "non-constant" initializer. 
*/ + +for (; ptrend - ptr >= 2; ptr++) + { + if (*ptr == CHAR_BACKSLASH && + (ptr[1] == CHAR_RIGHT_SQUARE_BRACKET || ptr[1] == CHAR_BACKSLASH)) + ptr++; + + else if ((*ptr == CHAR_LEFT_SQUARE_BRACKET && ptr[1] == terminator) || + *ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE; + + else if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET) + { + *endptr = ptr; + return TRUE; + } + } + +return FALSE; +} + + + +/************************************************* +* Check POSIX class name * +*************************************************/ + +/* This function is called to check the name given in a POSIX-style class entry +such as [:alnum:]. + +Arguments: + ptr points to the first letter + len the length of the name + +Returns: a value representing the name, or -1 if unknown +*/ + +static int +check_posix_name(PCRE2_SPTR ptr, int len) +{ +const char *pn = posix_names; +int yield = 0; +while (posix_name_lengths[yield] != 0) + { + if (len == posix_name_lengths[yield] && + PRIV(strncmp_c8)(ptr, pn, (unsigned int)len) == 0) return yield; + pn += posix_name_lengths[yield] + 1; + yield++; + } +return -1; +} + + + +/************************************************* +* Read a subpattern or VERB name * +*************************************************/ + +/* This function is called from parse_regex() below whenever it needs to read +the name of a subpattern or a (*VERB) or an (*alpha_assertion). The initial +pointer must be to the character before the name. If that character is '*' we +are reading a verb or alpha assertion name. The pointer is updated to point +after the name, for a VERB or alpha assertion name, or after the name's +terminator for a subpattern name. Returning both the offset and the name +pointer is redundant information, but some callers use one and some the other, +so it is simplest just to return both. 
+ +Arguments: + ptrptr points to the character pointer variable + ptrend points to the end of the input string + utf true if the input is UTF-encoded + terminator the terminator of a subpattern name must be this + offsetptr where to put the offset from the start of the pattern + nameptr where to put a pointer to the name in the input + namelenptr where to put the length of the name + errcodeptr where to put an error code + cb pointer to the compile data block + +Returns: TRUE if a name was read + FALSE otherwise, with error code set +*/ + +static BOOL +read_name(PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, BOOL utf, uint32_t terminator, + PCRE2_SIZE *offsetptr, PCRE2_SPTR *nameptr, uint32_t *namelenptr, + int *errorcodeptr, compile_block *cb) +{ +PCRE2_SPTR ptr = *ptrptr; +BOOL is_group = (*ptr != CHAR_ASTERISK); + +if (++ptr >= ptrend) /* No characters in name */ + { + *errorcodeptr = is_group? ERR62: /* Subpattern name expected */ + ERR60; /* Verb not recognized or malformed */ + goto FAILED; + } + +*nameptr = ptr; +*offsetptr = (PCRE2_SIZE)(ptr - cb->start_pattern); + +/* In UTF mode, a group name may contain letters and decimal digits as defined +by Unicode properties, and underscores, but must not start with a digit. */ + +#ifdef SUPPORT_UNICODE +if (utf && is_group) + { + uint32_t c, type; + + GETCHAR(c, ptr); + type = UCD_CHARTYPE(c); + + if (type == ucp_Nd) + { + *errorcodeptr = ERR44; + goto FAILED; + } + + for(;;) + { + if (type != ucp_Nd && PRIV(ucp_gentype)[type] != ucp_L && + c != CHAR_UNDERSCORE) break; + ptr++; + FORWARDCHARTEST(ptr, ptrend); + if (ptr >= ptrend) break; + GETCHAR(c, ptr); + type = UCD_CHARTYPE(c); + } + } +else +#else +(void)utf; /* Avoid compiler warning */ +#endif /* SUPPORT_UNICODE */ + +/* Handle non-group names and group names in non-UTF modes. A group name must +not start with a digit. If either of the others start with a digit it just +won't be recognized. 
*/ + + { + if (is_group && IS_DIGIT(*ptr)) + { + *errorcodeptr = ERR44; + goto FAILED; + } + + while (ptr < ptrend && MAX_255(*ptr) && (cb->ctypes[*ptr] & ctype_word) != 0) + { + ptr++; + } + } + +/* Check name length */ + +if (ptr > *nameptr + MAX_NAME_SIZE) + { + *errorcodeptr = ERR48; + goto FAILED; + } +*namelenptr = (uint32_t)(ptr - *nameptr); + +/* Subpattern names must not be empty, and their terminator is checked here. +(What follows a verb or alpha assertion name is checked separately.) */ + +if (is_group) + { + if (ptr == *nameptr) + { + *errorcodeptr = ERR62; /* Subpattern name expected */ + goto FAILED; + } + if (ptr >= ptrend || *ptr != (PCRE2_UCHAR)terminator) + { + *errorcodeptr = ERR42; + goto FAILED; + } + ptr++; + } + +*ptrptr = ptr; +return TRUE; + +FAILED: +*ptrptr = ptr; +return FALSE; +} + + + +/************************************************* +* Manage callouts at start of cycle * +*************************************************/ + +/* At the start of a new item in parse_regex() we are able to record the +details of the previous item in a prior callout, and also to set up an +automatic callout if enabled. Avoid having two adjacent automatic callouts, +which would otherwise happen for items such as \Q that contribute nothing to +the parsed pattern. + +Arguments: + ptr current pattern pointer + pcalloutptr points to a pointer to previous callout, or NULL + auto_callout TRUE if auto_callouts are enabled + parsed_pattern the parsed pattern pointer + cb compile block + +Returns: possibly updated parsed_pattern pointer. 
+*/ + +static uint32_t * +manage_callouts(PCRE2_SPTR ptr, uint32_t **pcalloutptr, BOOL auto_callout, + uint32_t *parsed_pattern, compile_block *cb) +{ +uint32_t *previous_callout = *pcalloutptr; + +if (previous_callout != NULL) previous_callout[2] = (uint32_t)(ptr - + cb->start_pattern - (PCRE2_SIZE)previous_callout[1]); + +if (!auto_callout) previous_callout = NULL; else + { + if (previous_callout == NULL || + previous_callout != parsed_pattern - 4 || + previous_callout[3] != 255) + { + previous_callout = parsed_pattern; /* Set up new automatic callout */ + parsed_pattern += 4; + previous_callout[0] = META_CALLOUT_NUMBER; + previous_callout[2] = 0; + previous_callout[3] = 255; + } + previous_callout[1] = (uint32_t)(ptr - cb->start_pattern); + } + +*pcalloutptr = previous_callout; +return parsed_pattern; +} + + + +/************************************************* +* Parse regex and identify named groups * +*************************************************/ + +/* This function is called first of all. It scans the pattern and does two +things: (1) It identifies capturing groups and makes a table of named capturing +groups so that information about them is fully available to both the compiling +scans. (2) It writes a parsed version of the pattern with comments omitted and +escapes processed into the parsed_pattern vector. + +Arguments: + ptr points to the start of the pattern + options compiling dynamic options (may change during the scan) + has_lookbehind points to a boolean, set TRUE if a lookbehind is found + cb pointer to the compile data block + +Returns: zero on success or a non-zero error code, with the + error offset placed in the cb field +*/ + +/* A structure and some flags for dealing with nested groups. 
*/ + +typedef struct nest_save { + uint16_t nest_depth; + uint16_t reset_group; + uint16_t max_group; + uint16_t flags; + uint32_t options; +} nest_save; + +#define NSF_RESET 0x0001u +#define NSF_CONDASSERT 0x0002u +#define NSF_ATOMICSR 0x0004u + +/* Options that are changeable within the pattern must be tracked during +parsing. Some (e.g. PCRE2_EXTENDED) are implemented entirely during parsing, +but all must be tracked so that META_OPTIONS items set the correct values for +the main compiling phase. */ + +#define PARSE_TRACKED_OPTIONS (PCRE2_CASELESS|PCRE2_DOTALL|PCRE2_DUPNAMES| \ + PCRE2_EXTENDED|PCRE2_EXTENDED_MORE|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE| \ + PCRE2_UNGREEDY) + +/* States used for analyzing ranges in character classes. The two OK values +must be last. */ + +enum { RANGE_NO, RANGE_STARTED, RANGE_OK_ESCAPED, RANGE_OK_LITERAL }; + +/* Only in 32-bit mode can there be literals > META_END. A macro encapsulates +the storing of literal values in the main parsed pattern, where they can always +be quantified. */ + +#if PCRE2_CODE_UNIT_WIDTH == 32 +#define PARSED_LITERAL(c, p) \ + { \ + if (c >= META_END) *p++ = META_BIGVALUE; \ + *p++ = c; \ + okquantifier = TRUE; \ + } +#else +#define PARSED_LITERAL(c, p) *p++ = c; okquantifier = TRUE; +#endif + +/* Here's the actual function. 
*/ + +static int parse_regex(PCRE2_SPTR ptr, uint32_t options, BOOL *has_lookbehind, + compile_block *cb) +{ +uint32_t c; +uint32_t delimiter; +uint32_t namelen; +uint32_t class_range_state; +uint32_t *verblengthptr = NULL; /* Value avoids compiler warning */ +uint32_t *verbstartptr = NULL; +uint32_t *previous_callout = NULL; +uint32_t *parsed_pattern = cb->parsed_pattern; +uint32_t *parsed_pattern_end = cb->parsed_pattern_end; +uint32_t meta_quantifier = 0; +uint32_t add_after_mark = 0; +uint32_t extra_options = cb->cx->extra_options; +uint16_t nest_depth = 0; +int after_manual_callout = 0; +int expect_cond_assert = 0; +int errorcode = 0; +int escape; +int i; +BOOL inescq = FALSE; +BOOL inverbname = FALSE; +BOOL utf = (options & PCRE2_UTF) != 0; +BOOL auto_callout = (options & PCRE2_AUTO_CALLOUT) != 0; +BOOL isdupname; +BOOL negate_class; +BOOL okquantifier = FALSE; +PCRE2_SPTR thisptr; +PCRE2_SPTR name; +PCRE2_SPTR ptrend = cb->end_pattern; +PCRE2_SPTR verbnamestart = NULL; /* Value avoids compiler warning */ +named_group *ng; +nest_save *top_nest, *end_nests; + +/* Insert leading items for word and line matching (features provided for the +benefit of pcre2grep). */ + +if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0) + { + *parsed_pattern++ = META_CIRCUMFLEX; + *parsed_pattern++ = META_NOCAPTURE; + } +else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0) + { + *parsed_pattern++ = META_ESCAPE + ESC_b; + *parsed_pattern++ = META_NOCAPTURE; + } + +/* If the pattern is actually a literal string, process it separately to avoid +cluttering up the main loop. 
*/ + +if ((options & PCRE2_LITERAL) != 0) + { + while (ptr < ptrend) + { + if (parsed_pattern >= parsed_pattern_end) + { + errorcode = ERR63; /* Internal error (parsed pattern overflow) */ + goto FAILED; + } + thisptr = ptr; + GETCHARINCTEST(c, ptr); + if (auto_callout) + parsed_pattern = manage_callouts(thisptr, &previous_callout, + auto_callout, parsed_pattern, cb); + PARSED_LITERAL(c, parsed_pattern); + } + goto PARSED_END; + } + +/* Process a real regex which may contain meta-characters. */ + +top_nest = NULL; +end_nests = (nest_save *)(cb->start_workspace + cb->workspace_size); + +/* The size of the nest_save structure might not be a factor of the size of the +workspace. Therefore we must round down end_nests so as to correctly avoid +creating a nest_save that spans the end of the workspace. */ + +end_nests = (nest_save *)((char *)end_nests - + ((cb->workspace_size * sizeof(PCRE2_UCHAR)) % sizeof(nest_save))); + +/* PCRE2_EXTENDED_MORE implies PCRE2_EXTENDED */ + +if ((options & PCRE2_EXTENDED_MORE) != 0) options |= PCRE2_EXTENDED; + +/* Now scan the pattern */ + +while (ptr < ptrend) + { + int prev_expect_cond_assert; + uint32_t min_repeat, max_repeat; + uint32_t set, unset, *optset; + uint32_t terminator; + uint32_t prev_meta_quantifier; + BOOL prev_okquantifier; + PCRE2_SPTR tempptr; + PCRE2_SIZE offset; + + if (parsed_pattern >= parsed_pattern_end) + { + errorcode = ERR63; /* Internal error (parsed pattern overflow) */ + goto FAILED; + } + + if (nest_depth > cb->cx->parens_nest_limit) + { + errorcode = ERR19; + goto FAILED; /* Parentheses too deeply nested */ + } + + /* Get next input character, save its position for callout handling. */ + + thisptr = ptr; + GETCHARINCTEST(c, ptr); + + /* Copy quoted literals until \E, allowing for the possibility of automatic + callouts, except when processing a (*VERB) "name". 
*/ + + if (inescq) + { + if (c == CHAR_BACKSLASH && ptr < ptrend && *ptr == CHAR_E) + { + inescq = FALSE; + ptr++; /* Skip E */ + } + else + { + if (expect_cond_assert > 0) /* A literal is not allowed if we are */ + { /* expecting a conditional assertion, */ + ptr--; /* but an empty \Q\E sequence is OK. */ + errorcode = ERR28; + goto FAILED; + } + if (inverbname) + { /* Don't use PARSED_LITERAL() because it */ +#if PCRE2_CODE_UNIT_WIDTH == 32 /* sets okquantifier. */ + if (c >= META_END) *parsed_pattern++ = META_BIGVALUE; +#endif + *parsed_pattern++ = c; + } + else + { + if (after_manual_callout-- <= 0) + parsed_pattern = manage_callouts(thisptr, &previous_callout, + auto_callout, parsed_pattern, cb); + PARSED_LITERAL(c, parsed_pattern); + } + meta_quantifier = 0; + } + continue; /* Next character */ + } + + /* If we are processing the "name" part of a (*VERB:NAME) item, all + characters up to the closing parenthesis are literals except when + PCRE2_ALT_VERBNAMES is set. That causes backslash interpretation, but only \Q + and \E and escaped characters are allowed (no character types such as \d). If + PCRE2_EXTENDED is also set, we must ignore white space and # comments. Do + this by not entering the special (*VERB:NAME) processing - they are then + picked up below. Note that c is a character, not a code unit, so we must not + use MAX_255 to test its size because MAX_255 tests code units and is assumed + TRUE in 8-bit mode. 
*/ + + if (inverbname && + ( + /* EITHER: not both options set */ + ((options & (PCRE2_EXTENDED | PCRE2_ALT_VERBNAMES)) != + (PCRE2_EXTENDED | PCRE2_ALT_VERBNAMES)) || +#ifdef SUPPORT_UNICODE + /* OR: character > 255 AND not Unicode Pattern White Space */ + (c > 255 && (c|1) != 0x200f && (c|1) != 0x2029) || +#endif + /* OR: not a # comment or isspace() white space */ + (c < 256 && c != CHAR_NUMBER_SIGN && (cb->ctypes[c] & ctype_space) == 0 +#ifdef SUPPORT_UNICODE + /* and not CHAR_NEL when Unicode is supported */ + && c != CHAR_NEL +#endif + ))) + { + PCRE2_SIZE verbnamelength; + + switch(c) + { + default: /* Don't use PARSED_LITERAL() because it */ +#if PCRE2_CODE_UNIT_WIDTH == 32 /* sets okquantifier. */ + if (c >= META_END) *parsed_pattern++ = META_BIGVALUE; +#endif + *parsed_pattern++ = c; + break; + + case CHAR_RIGHT_PARENTHESIS: + inverbname = FALSE; + /* This is the length in characters */ + verbnamelength = (PCRE2_SIZE)(parsed_pattern - verblengthptr - 1); + /* But the limit on the length is in code units */ + if (ptr - verbnamestart - 1 > (int)MAX_MARK) + { + ptr--; + errorcode = ERR76; + goto FAILED; + } + *verblengthptr = (uint32_t)verbnamelength; + + /* If this name was on a verb such as (*ACCEPT) which does not continue, + a (*MARK) was generated for the name. We now add the original verb as the + next item. */ + + if (add_after_mark != 0) + { + *parsed_pattern++ = add_after_mark; + add_after_mark = 0; + } + break; + + case CHAR_BACKSLASH: + if ((options & PCRE2_ALT_VERBNAMES) != 0) + { + escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options, + cb->cx->extra_options, FALSE, cb); + if (errorcode != 0) goto FAILED; + } + else escape = 0; /* Treat all as literal */ + + switch(escape) + { + case 0: /* Don't use PARSED_LITERAL() because it */ +#if PCRE2_CODE_UNIT_WIDTH == 32 /* sets okquantifier. 
*/ + if (c >= META_END) *parsed_pattern++ = META_BIGVALUE; +#endif + *parsed_pattern++ = c; + break; + + case ESC_Q: + inescq = TRUE; + break; + + case ESC_E: /* Ignore */ + break; + + default: + errorcode = ERR40; /* Invalid in verb name */ + goto FAILED; + } + } + continue; /* Next character in pattern */ + } + + /* Not a verb name character. At this point we must process everything that + must not change the quantification state. This is mainly comments, but we + handle \Q and \E here as well, so that an item such as A\Q\E+ is treated as + A+, as in Perl. An isolated \E is ignored. */ + + if (c == CHAR_BACKSLASH && ptr < ptrend) + { + if (*ptr == CHAR_Q || *ptr == CHAR_E) + { + inescq = *ptr == CHAR_Q; + ptr++; + continue; + } + } + + /* Skip over whitespace and # comments in extended mode. Note that c is a + character, not a code unit, so we must not use MAX_255 to test its size + because MAX_255 tests code units and is assumed TRUE in 8-bit mode. The + whitespace characters are those designated as "Pattern White Space" by + Unicode, which are the isspace() characters plus CHAR_NEL (newline), which is + U+0085 in Unicode, plus U+200E, U+200F, U+2028, and U+2029. These are a + subset of space characters that match \h and \v. */ + + if ((options & PCRE2_EXTENDED) != 0) + { + if (c < 256 && (cb->ctypes[c] & ctype_space) != 0) continue; +#ifdef SUPPORT_UNICODE + if (c == CHAR_NEL || (c|1) == 0x200f || (c|1) == 0x2029) continue; +#endif + if (c == CHAR_NUMBER_SIGN) + { + while (ptr < ptrend) + { + if (IS_NEWLINE(ptr)) /* For non-fixed-length newline cases, */ + { /* IS_NEWLINE sets cb->nllen. 
*/ + ptr += cb->nllen; + break; + } + ptr++; +#ifdef SUPPORT_UNICODE + if (utf) FORWARDCHARTEST(ptr, ptrend); +#endif + } + continue; /* Next character in pattern */ + } + } + + /* Skip over bracketed comments */ + + if (c == CHAR_LEFT_PARENTHESIS && ptrend - ptr >= 2 && + ptr[0] == CHAR_QUESTION_MARK && ptr[1] == CHAR_NUMBER_SIGN) + { + while (++ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS); + if (ptr >= ptrend) + { + errorcode = ERR18; /* A special error for missing ) in a comment */ + goto FAILED; /* to make it easier to debug. */ + } + ptr++; + continue; /* Next character in pattern */ + } + + /* If the next item is not a quantifier, fill in length of any previous + callout and create an auto callout if required. */ + + if (c != CHAR_ASTERISK && c != CHAR_PLUS && c != CHAR_QUESTION_MARK && + (c != CHAR_LEFT_CURLY_BRACKET || + (tempptr = ptr, + !read_repeat_counts(&tempptr, ptrend, NULL, NULL, &errorcode)))) + { + if (after_manual_callout-- <= 0) + parsed_pattern = manage_callouts(thisptr, &previous_callout, auto_callout, + parsed_pattern, cb); + } + + /* If expect_cond_assert is 2, we have just passed (?( and are expecting an + assertion, possibly preceded by a callout. If the value is 1, we have just + had the callout and expect an assertion. There must be at least 3 more + characters in all cases. When expect_cond_assert is 2, we know that the + current character is an opening parenthesis, as otherwise we wouldn't be + here. However, when it is 1, we need to check, and it's easiest just to check + always. Note that expect_cond_assert may be negative, since all callouts just + decrement it. 
*/ + + if (expect_cond_assert > 0) + { + BOOL ok = c == CHAR_LEFT_PARENTHESIS && ptrend - ptr >= 3 && + (ptr[0] == CHAR_QUESTION_MARK || ptr[0] == CHAR_ASTERISK); + if (ok) + { + if (ptr[0] == CHAR_ASTERISK) /* New alpha assertion format, possibly */ + { + ok = MAX_255(ptr[1]) && (cb->ctypes[ptr[1]] & ctype_lcletter) != 0; + } + else switch(ptr[1]) /* Traditional symbolic format */ + { + case CHAR_C: + ok = expect_cond_assert == 2; + break; + + case CHAR_EQUALS_SIGN: + case CHAR_EXCLAMATION_MARK: + break; + + case CHAR_LESS_THAN_SIGN: + ok = ptr[2] == CHAR_EQUALS_SIGN || ptr[2] == CHAR_EXCLAMATION_MARK; + break; + + default: + ok = FALSE; + } + } + + if (!ok) + { + ptr--; /* Adjust error offset */ + errorcode = ERR28; + goto FAILED; + } + } + + /* Remember whether we are expecting a conditional assertion, and set the + default for this item. */ + + prev_expect_cond_assert = expect_cond_assert; + expect_cond_assert = 0; + + /* Remember quantification status for the previous significant item, then set + default for this item. */ + + prev_okquantifier = okquantifier; + prev_meta_quantifier = meta_quantifier; + okquantifier = FALSE; + meta_quantifier = 0; + + /* If the previous significant item was a quantifier, adjust the parsed code + if there is a following modifier. The base meta value is always followed by + the PLUS and QUERY values, in that order. We do this here rather than after + reading a quantifier so that intervening comments and /x whitespace can be + ignored without having to replicate code. */ + + if (prev_meta_quantifier != 0 && (c == CHAR_QUESTION_MARK || c == CHAR_PLUS)) + { + parsed_pattern[(prev_meta_quantifier == META_MINMAX)? -3 : -1] = + prev_meta_quantifier + ((c == CHAR_QUESTION_MARK)? + 0x00020000u : 0x00010000u); + continue; /* Next character in pattern */ + } + + + /* Process the next item in the main part of a pattern. 
*/ + + switch(c) + { + default: /* Non-special character */ + PARSED_LITERAL(c, parsed_pattern); + break; + + + /* ---- Escape sequence ---- */ + + case CHAR_BACKSLASH: + tempptr = ptr; + escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options, + cb->cx->extra_options, FALSE, cb); + if (errorcode != 0) + { + ESCAPE_FAILED: + if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0) + goto FAILED; + ptr = tempptr; + if (ptr >= ptrend) c = CHAR_BACKSLASH; else + { + GETCHARINCTEST(c, ptr); /* Get character value, increment pointer */ + } + escape = 0; /* Treat as literal character */ + } + + /* The escape was a data escape or literal character. */ + + if (escape == 0) + { + PARSED_LITERAL(c, parsed_pattern); + } + + /* The escape was a back (or forward) reference. We keep the offset in + order to give a more useful diagnostic for a bad forward reference. For + references to groups numbered less than 10 we can't use more than two items + in parsed_pattern because they may be just two characters in the input (and + in a 64-bit world an offset may need two elements). So for them, the offset + of the first occurrence is held in a special vector. */ + + else if (escape < 0) + { + offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 1); + escape = -escape; + *parsed_pattern++ = META_BACKREF | (uint32_t)escape; + if (escape < 10) + { + if (cb->small_ref_offset[escape] == PCRE2_UNSET) + cb->small_ref_offset[escape] = offset; + } + else + { + PUTOFFSET(offset, parsed_pattern); + } + okquantifier = TRUE; + } + + /* The escape was a character class such as \d etc. or other special + escape indicator such as \A or \X. Most of them generate just a single + parsed item, but \P and \p are followed by a 16-bit type and a 16-bit + value. They are supported only when Unicode is available. The type and + value are packed into a single 32-bit value so that the whole sequence + uses only two elements in the parsed_vector. 
This is because the same + coding is used if \d (for example) is turned into \p{Nd} when PCRE2_UCP is + set. + + There are also some cases where the escape sequence is followed by a name: + \k{name}, \k<name>, and \k'name' are backreferences by name, and \g<name> + and \g'name' are subroutine calls by name; \g{name} is a synonym for + \k{name}. Note that \g<number> and \g'number' are handled by check_escape() + and returned as a negative value (handled above). A name is coded as an + offset into the pattern and a length. */ + + else switch (escape) + { + case ESC_C: +#ifdef NEVER_BACKSLASH_C + errorcode = ERR85; + goto ESCAPE_FAILED; +#else + if ((options & PCRE2_NEVER_BACKSLASH_C) != 0) + { + errorcode = ERR83; + goto ESCAPE_FAILED; + } +#endif + okquantifier = TRUE; + *parsed_pattern++ = META_ESCAPE + escape; + break; + + case ESC_X: +#ifndef SUPPORT_UNICODE + errorcode = ERR45; /* Supported only with Unicode support */ + goto ESCAPE_FAILED; +#endif + case ESC_H: + case ESC_h: + case ESC_N: + case ESC_R: + case ESC_V: + case ESC_v: + okquantifier = TRUE; + *parsed_pattern++ = META_ESCAPE + escape; + break; + + default: /* \A, \B, \b, \G, \K, \Z, \z cannot be quantified. */ + *parsed_pattern++ = META_ESCAPE + escape; + break; + + /* Escapes that change in UCP mode. Note that PCRE2_UCP will never be set + without Unicode support because it is checked when pcre2_compile() is + called. */ + + case ESC_d: + case ESC_D: + case ESC_s: + case ESC_S: + case ESC_w: + case ESC_W: + okquantifier = TRUE; + if ((options & PCRE2_UCP) == 0) + { + *parsed_pattern++ = META_ESCAPE + escape; + } + else + { + *parsed_pattern++ = META_ESCAPE + + ((escape == ESC_d || escape == ESC_s || escape == ESC_w)? 
+ ESC_p : ESC_P); + switch(escape) + { + case ESC_d: + case ESC_D: + *parsed_pattern++ = (PT_PC << 16) | ucp_Nd; + break; + + case ESC_s: + case ESC_S: + *parsed_pattern++ = PT_SPACE << 16; + break; + + case ESC_w: + case ESC_W: + *parsed_pattern++ = PT_WORD << 16; + break; + } + } + break; + + /* Unicode property matching */ + + case ESC_P: + case ESC_p: +#ifdef SUPPORT_UNICODE + { + BOOL negated; + uint16_t ptype = 0, pdata = 0; + if (!get_ucp(&ptr, &negated, &ptype, &pdata, &errorcode, cb)) + goto ESCAPE_FAILED; + if (negated) escape = (escape == ESC_P)? ESC_p : ESC_P; + *parsed_pattern++ = META_ESCAPE + escape; + *parsed_pattern++ = (ptype << 16) | pdata; + okquantifier = TRUE; + } +#else + errorcode = ERR45; + goto ESCAPE_FAILED; +#endif + break; /* End \P and \p */ + + /* When \g is used with quotes or angle brackets as delimiters, it is a + numerical or named subroutine call, and control comes here. When used + with brace delimiters it is a numerical back reference and does not come + here because check_escape() returns it directly as a reference. \k is + always a named back reference. */ + + case ESC_g: + case ESC_k: + if (ptr >= ptrend || (*ptr != CHAR_LEFT_CURLY_BRACKET && + *ptr != CHAR_LESS_THAN_SIGN && *ptr != CHAR_APOSTROPHE)) + { + errorcode = (escape == ESC_g)? ERR57 : ERR69; + goto ESCAPE_FAILED; + } + terminator = (*ptr == CHAR_LESS_THAN_SIGN)? + CHAR_GREATER_THAN_SIGN : (*ptr == CHAR_APOSTROPHE)? + CHAR_APOSTROPHE : CHAR_RIGHT_CURLY_BRACKET; + + /* For a non-braced \g, check for a numerical recursion. 
*/ + + if (escape == ESC_g && terminator != CHAR_RIGHT_CURLY_BRACKET) + { + PCRE2_SPTR p = ptr + 1; + + if (read_number(&p, ptrend, cb->bracount, MAX_GROUP_NUMBER, ERR61, &i, + &errorcode)) + { + if (p >= ptrend || *p != terminator) + { + errorcode = ERR57; + goto ESCAPE_FAILED; + } + ptr = p; + goto SET_RECURSION; + } + if (errorcode != 0) goto ESCAPE_FAILED; + } + + /* Not a numerical recursion */ + + if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen, + &errorcode, cb)) goto ESCAPE_FAILED; + + /* \k and \g when used with braces are back references, whereas \g used + with quotes or angle brackets is a recursion */ + + *parsed_pattern++ = + (escape == ESC_k || terminator == CHAR_RIGHT_CURLY_BRACKET)? + META_BACKREF_BYNAME : META_RECURSE_BYNAME; + *parsed_pattern++ = namelen; + + PUTOFFSET(offset, parsed_pattern); + okquantifier = TRUE; + break; /* End special escape processing */ + } + break; /* End escape sequence processing */ + + + /* ---- Single-character special items ---- */ + + case CHAR_CIRCUMFLEX_ACCENT: + *parsed_pattern++ = META_CIRCUMFLEX; + break; + + case CHAR_DOLLAR_SIGN: + *parsed_pattern++ = META_DOLLAR; + break; + + case CHAR_DOT: + *parsed_pattern++ = META_DOT; + okquantifier = TRUE; + break; + + + /* ---- Single-character quantifiers ---- */ + + case CHAR_ASTERISK: + meta_quantifier = META_ASTERISK; + goto CHECK_QUANTIFIER; + + case CHAR_PLUS: + meta_quantifier = META_PLUS; + goto CHECK_QUANTIFIER; + + case CHAR_QUESTION_MARK: + meta_quantifier = META_QUERY; + goto CHECK_QUANTIFIER; + + + /* ---- Potential {n,m} quantifier ---- */ + + case CHAR_LEFT_CURLY_BRACKET: + if (!read_repeat_counts(&ptr, ptrend, &min_repeat, &max_repeat, + &errorcode)) + { + if (errorcode != 0) goto FAILED; /* Error in quantifier. 
*/ + PARSED_LITERAL(c, parsed_pattern); /* Not a quantifier */ + break; /* No more quantifier processing */ + } + meta_quantifier = META_MINMAX; + /* Fall through */ + + + /* ---- Quantifier post-processing ---- */ + + /* Check that a quantifier is allowed after the previous item. */ + + CHECK_QUANTIFIER: + if (!prev_okquantifier) + { + errorcode = ERR9; + goto FAILED_BACK; + } + + /* Most (*VERB)s are not allowed to be quantified, but an ungreedy + quantifier can be useful for (*ACCEPT) - meaning "succeed on backtrack", a + sort of negated (*COMMIT). We therefore allow (*ACCEPT) to be quantified by + wrapping it in non-capturing brackets, but we have to allow for a preceding + (*MARK) for when (*ACCEPT) has an argument. */ + + if (parsed_pattern[-1] == META_ACCEPT) + { + uint32_t *p; + for (p = parsed_pattern - 1; p >= verbstartptr; p--) p[1] = p[0]; + *verbstartptr = META_NOCAPTURE; + parsed_pattern[1] = META_KET; + parsed_pattern += 2; + } + + /* Now we can put the quantifier into the parsed pattern vector. At this + stage, we have only the basic quantifier. The check for a following + or ? + modifier happens at the top of the loop, after any intervening comments + have been removed. */ + + *parsed_pattern++ = meta_quantifier; + if (c == CHAR_LEFT_CURLY_BRACKET) + { + *parsed_pattern++ = min_repeat; + *parsed_pattern++ = max_repeat; + } + break; + + + /* ---- Character class ---- */ + + case CHAR_LEFT_SQUARE_BRACKET: + okquantifier = TRUE; + + /* In another (POSIX) regex library, the ugly syntax [[:<:]] and [[:>:]] is + used for "start of word" and "end of word". As these are otherwise illegal + sequences, we don't break anything by recognizing them. They are replaced + by \b(?=\w) and \b(?<=\w) respectively. Sequences like [a[:<:]] are + erroneous and are handled by the normal code below. 
*/ + + if (ptrend - ptr >= 6 && + (PRIV(strncmp_c8)(ptr, STRING_WEIRD_STARTWORD, 6) == 0 || + PRIV(strncmp_c8)(ptr, STRING_WEIRD_ENDWORD, 6) == 0)) + { + *parsed_pattern++ = META_ESCAPE + ESC_b; + + if (ptr[2] == CHAR_LESS_THAN_SIGN) + { + *parsed_pattern++ = META_LOOKAHEAD; + } + else + { + *parsed_pattern++ = META_LOOKBEHIND; + *has_lookbehind = TRUE; + + /* The offset is used only for the "non-fixed length" error; this won't + occur here, so just store zero. */ + + PUTOFFSET((PCRE2_SIZE)0, parsed_pattern); + } + + if ((options & PCRE2_UCP) == 0) + *parsed_pattern++ = META_ESCAPE + ESC_w; + else + { + *parsed_pattern++ = META_ESCAPE + ESC_p; + *parsed_pattern++ = PT_WORD << 16; + } + *parsed_pattern++ = META_KET; + ptr += 6; + break; + } + + /* PCRE supports POSIX class stuff inside a class. Perl gives an error if + they are encountered at the top level, so we'll do that too. */ + + if (ptr < ptrend && (*ptr == CHAR_COLON || *ptr == CHAR_DOT || + *ptr == CHAR_EQUALS_SIGN) && + check_posix_syntax(ptr, ptrend, &tempptr)) + { + errorcode = (*ptr-- == CHAR_COLON)? ERR12 : ERR13; + goto FAILED; + } + + /* Process a regular character class. If the first character is '^', set + the negation flag. If the first few characters (either before or after ^) + are \Q\E or \E or space or tab in extended-more mode, we skip them too. + This makes for compatibility with Perl. */ + + negate_class = FALSE; + while (ptr < ptrend) + { + GETCHARINCTEST(c, ptr); + if (c == CHAR_BACKSLASH) + { + if (ptr < ptrend && *ptr == CHAR_E) ptr++; + else if (ptrend - ptr >= 3 && + PRIV(strncmp_c8)(ptr, STR_Q STR_BACKSLASH STR_E, 3) == 0) + ptr += 3; + else + break; + } + else if ((options & PCRE2_EXTENDED_MORE) != 0 && + (c == CHAR_SPACE || c == CHAR_HT)) /* Note: just these two */ + continue; + else if (!negate_class && c == CHAR_CIRCUMFLEX_ACCENT) + negate_class = TRUE; + else break; + } + + /* Now the real contents of the class; c has the first "real" character. 
+ Empty classes are permitted only if the option is set. */ + + if (c == CHAR_RIGHT_SQUARE_BRACKET && + (cb->external_options & PCRE2_ALLOW_EMPTY_CLASS) != 0) + { + *parsed_pattern++ = negate_class? META_CLASS_EMPTY_NOT : META_CLASS_EMPTY; + break; /* End of class processing */ + } + + /* Process a non-empty class. */ + + *parsed_pattern++ = negate_class? META_CLASS_NOT : META_CLASS; + class_range_state = RANGE_NO; + + /* In an EBCDIC environment, Perl treats alphabetic ranges specially + because there are holes in the encoding, and simply using the range A-Z + (for example) would include the characters in the holes. This applies only + to ranges where both values are literal; [\xC1-\xE9] is different to [A-Z] + in this respect. In order to accommodate this, we keep track of whether + character values are literal or not, and a state variable for handling + ranges. */ + + /* Loop for the contents of the class */ + + for (;;) + { + BOOL char_is_literal = TRUE; + + /* Inside \Q...\E everything is literal except \E */ + + if (inescq) + { + if (c == CHAR_BACKSLASH && ptr < ptrend && *ptr == CHAR_E) + { + inescq = FALSE; /* Reset literal state */ + ptr++; /* Skip the 'E' */ + goto CLASS_CONTINUE; + } + goto CLASS_LITERAL; + } + + /* Skip over space and tab (only) in extended-more mode. */ + + if ((options & PCRE2_EXTENDED_MORE) != 0 && + (c == CHAR_SPACE || c == CHAR_HT)) + goto CLASS_CONTINUE; + + /* Handle POSIX class names. Perl allows a negation extension of the + form [:^name:]. A square bracket that doesn't match the syntax is + treated as a literal. We also recognize the POSIX constructions + [.ch.] and [=ch=] ("collating elements") and fault them, as Perl + 5.6 and 5.8 do. 
*/ + + if (c == CHAR_LEFT_SQUARE_BRACKET && + ptrend - ptr >= 3 && + (*ptr == CHAR_COLON || *ptr == CHAR_DOT || + *ptr == CHAR_EQUALS_SIGN) && + check_posix_syntax(ptr, ptrend, &tempptr)) + { + BOOL posix_negate = FALSE; + int posix_class; + + /* Perl treats a hyphen before a POSIX class as a literal, not the + start of a range. However, it gives a warning in its warning mode. PCRE + does not have a warning mode, so we give an error, because this is + likely an error on the user's part. */ + + if (class_range_state == RANGE_STARTED) + { + errorcode = ERR50; + goto FAILED; + } + + if (*ptr != CHAR_COLON) + { + errorcode = ERR13; + goto FAILED_BACK; + } + + if (*(++ptr) == CHAR_CIRCUMFLEX_ACCENT) + { + posix_negate = TRUE; + ptr++; + } + + posix_class = check_posix_name(ptr, (int)(tempptr - ptr)); + if (posix_class < 0) + { + errorcode = ERR30; + goto FAILED; + } + ptr = tempptr + 2; + + /* Perl treats a hyphen after a POSIX class as a literal, not the + start of a range. However, it gives a warning in its warning mode + unless the hyphen is the last character in the class. PCRE does not + have a warning mode, so we give an error, because this is likely an + error on the user's part. */ + + if (ptr < ptrend - 1 && *ptr == CHAR_MINUS && + ptr[1] != CHAR_RIGHT_SQUARE_BRACKET) + { + errorcode = ERR50; + goto FAILED; + } + + /* Set "a hyphen is not the start of a range" for the -] case, and also + in case the POSIX class is followed by \E or \Q\E (possibly repeated - + fuzzers do that kind of thing) and *then* a hyphen. This causes that + hyphen to be treated as a literal. I don't think it's worth setting up + special apparatus to do otherwise. */ + + class_range_state = RANGE_NO; + + /* When PCRE2_UCP is set, some of the POSIX classes are converted to + use Unicode properties \p or \P or, in one case, \h or \H. The + substitutes table has two values per class, containing the type and + value of a \p or \P item. 
The special cases are specified with a + negative type: a non-zero value causes \h or \H to be used, and a zero + value falls through to behave like a non-UCP POSIX class. */ + +#ifdef SUPPORT_UNICODE + if ((options & PCRE2_UCP) != 0) + { + int ptype = posix_substitutes[2*posix_class]; + int pvalue = posix_substitutes[2*posix_class + 1]; + if (ptype >= 0) + { + *parsed_pattern++ = META_ESCAPE + (posix_negate? ESC_P : ESC_p); + *parsed_pattern++ = (ptype << 16) | pvalue; + goto CLASS_CONTINUE; + } + + if (pvalue != 0) + { + *parsed_pattern++ = META_ESCAPE + (posix_negate? ESC_H : ESC_h); + goto CLASS_CONTINUE; + } + + /* Fall through */ + } +#endif /* SUPPORT_UNICODE */ + + /* Non-UCP POSIX class */ + + *parsed_pattern++ = posix_negate? META_POSIX_NEG : META_POSIX; + *parsed_pattern++ = posix_class; + } + + /* Handle potential start of range */ + + else if (c == CHAR_MINUS && class_range_state >= RANGE_OK_ESCAPED) + { + *parsed_pattern++ = (class_range_state == RANGE_OK_LITERAL)? + META_RANGE_LITERAL : META_RANGE_ESCAPED; + class_range_state = RANGE_STARTED; + } + + /* Handle a literal character */ + + else if (c != CHAR_BACKSLASH) + { + CLASS_LITERAL: + if (class_range_state == RANGE_STARTED) + { + if (c == parsed_pattern[-2]) /* Optimize one-char range */ + parsed_pattern--; + else if (parsed_pattern[-2] > c) /* Check range is in order */ + { + errorcode = ERR8; + goto FAILED_BACK; + } + else + { + if (!char_is_literal && parsed_pattern[-1] == META_RANGE_LITERAL) + parsed_pattern[-1] = META_RANGE_ESCAPED; + PARSED_LITERAL(c, parsed_pattern); + } + class_range_state = RANGE_NO; + } + else /* Potential start of range */ + { + class_range_state = char_is_literal? 
+ RANGE_OK_LITERAL : RANGE_OK_ESCAPED; + PARSED_LITERAL(c, parsed_pattern); + } + } + + /* Handle escapes in a class */ + + else + { + tempptr = ptr; + escape = PRIV(check_escape)(&ptr, ptrend, &c, &errorcode, options, + cb->cx->extra_options, TRUE, cb); + + if (errorcode != 0) + { + if ((extra_options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) == 0) + goto FAILED; + ptr = tempptr; + if (ptr >= ptrend) c = CHAR_BACKSLASH; else + { + GETCHARINCTEST(c, ptr); /* Get character value, increment pointer */ + } + escape = 0; /* Treat as literal character */ + } + + switch(escape) + { + case 0: /* Escaped character code point is in c */ + char_is_literal = FALSE; + goto CLASS_LITERAL; + + case ESC_b: + c = CHAR_BS; /* \b is backspace in a class */ + char_is_literal = FALSE; + goto CLASS_LITERAL; + + case ESC_Q: + inescq = TRUE; /* Enter literal mode */ + goto CLASS_CONTINUE; + + case ESC_E: /* Ignore orphan \E */ + goto CLASS_CONTINUE; + + case ESC_B: /* Always an error in a class */ + case ESC_R: + case ESC_X: + errorcode = ERR7; + ptr--; + goto FAILED; + } + + /* The second part of a range can be a single-character escape + sequence (detected above), but not any of the other escapes. Perl + treats a hyphen as a literal in such circumstances. However, in Perl's + warning mode, a warning is given, so PCRE now faults it, as it is + almost certainly a mistake on the user's part. */ + + if (class_range_state == RANGE_STARTED) + { + errorcode = ERR50; + goto FAILED; /* Not CLASS_ESCAPE_FAILED; always an error */ + } + + /* Of the remaining escapes, only those that define characters are + allowed in a class. None may start a range. */ + + class_range_state = RANGE_NO; + switch(escape) + { + case ESC_N: + errorcode = ERR71; + goto FAILED; + + case ESC_H: + case ESC_h: + case ESC_V: + case ESC_v: + *parsed_pattern++ = META_ESCAPE + escape; + break; + + /* These escapes are converted to Unicode property tests when + PCRE2_UCP is set. 
*/ + + case ESC_d: + case ESC_D: + case ESC_s: + case ESC_S: + case ESC_w: + case ESC_W: + if ((options & PCRE2_UCP) == 0) + { + *parsed_pattern++ = META_ESCAPE + escape; + } + else + { + *parsed_pattern++ = META_ESCAPE + + ((escape == ESC_d || escape == ESC_s || escape == ESC_w)? + ESC_p : ESC_P); + switch(escape) + { + case ESC_d: + case ESC_D: + *parsed_pattern++ = (PT_PC << 16) | ucp_Nd; + break; + + case ESC_s: + case ESC_S: + *parsed_pattern++ = PT_SPACE << 16; + break; + + case ESC_w: + case ESC_W: + *parsed_pattern++ = PT_WORD << 16; + break; + } + } + break; + + /* Explicit Unicode property matching */ + + case ESC_P: + case ESC_p: +#ifdef SUPPORT_UNICODE + { + BOOL negated; + uint16_t ptype = 0, pdata = 0; + if (!get_ucp(&ptr, &negated, &ptype, &pdata, &errorcode, cb)) + goto FAILED; + if (negated) escape = (escape == ESC_P)? ESC_p : ESC_P; + *parsed_pattern++ = META_ESCAPE + escape; + *parsed_pattern++ = (ptype << 16) | pdata; + } +#else + errorcode = ERR45; + goto FAILED; +#endif + break; /* End \P and \p */ + + default: /* All others are not allowed in a class */ + errorcode = ERR7; + ptr--; + goto FAILED; + } + + /* Perl gives a warning unless a following hyphen is the last character + in the class. PCRE throws an error. */ + + if (ptr < ptrend - 1 && *ptr == CHAR_MINUS && + ptr[1] != CHAR_RIGHT_SQUARE_BRACKET) + { + errorcode = ERR50; + goto FAILED; + } + } + + /* Proceed to next thing in the class. 
*/ + + CLASS_CONTINUE: + if (ptr >= ptrend) + { + errorcode = ERR6; /* Missing terminating ']' */ + goto FAILED; + } + GETCHARINCTEST(c, ptr); + if (c == CHAR_RIGHT_SQUARE_BRACKET && !inescq) break; + } /* End of class-processing loop */ + + /* -] at the end of a class is a literal '-' */ + + if (class_range_state == RANGE_STARTED) + { + parsed_pattern[-1] = CHAR_MINUS; + class_range_state = RANGE_NO; + } + + *parsed_pattern++ = META_CLASS_END; + break; /* End of character class */ + + + /* ---- Opening parenthesis ---- */ + + case CHAR_LEFT_PARENTHESIS: + if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS; + + /* If ( is not followed by ? it is either a capture or a special verb or an + alpha assertion or a positive non-atomic lookahead. */ + + if (*ptr != CHAR_QUESTION_MARK) + { + const char *vn; + + /* Handle capturing brackets (or non-capturing if auto-capture is turned + off). */ + + if (*ptr != CHAR_ASTERISK) + { + nest_depth++; + if ((options & PCRE2_NO_AUTO_CAPTURE) == 0) + { + if (cb->bracount >= MAX_GROUP_NUMBER) + { + errorcode = ERR97; + goto FAILED; + } + cb->bracount++; + *parsed_pattern++ = META_CAPTURE | cb->bracount; + } + else *parsed_pattern++ = META_NOCAPTURE; + } + + /* Do nothing for (* followed by end of pattern or ) so it gives a "bad + quantifier" error rather than "(*MARK) must have an argument". */ + + else if (ptrend - ptr <= 1 || (c = ptr[1]) == CHAR_RIGHT_PARENTHESIS) + break; + + /* Handle "alpha assertions" such as (*pla:...). Most of these are + synonyms for the historical symbolic assertions, but the script run and + non-atomic lookaround ones are new. They are distinguished by starting + with a lower case letter. Checking both ends of the alphabet makes this + work in all character codes. 
*/ + + else if (CHMAX_255(c) && (cb->ctypes[c] & ctype_lcletter) != 0) + { + uint32_t meta; + + vn = alasnames; + if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen, + &errorcode, cb)) goto FAILED; + if (ptr >= ptrend || *ptr != CHAR_COLON) + { + errorcode = ERR95; /* Malformed */ + goto FAILED; + } + + /* Scan the table of alpha assertion names */ + + for (i = 0; i < alascount; i++) + { + if (namelen == alasmeta[i].len && + PRIV(strncmp_c8)(name, vn, namelen) == 0) + break; + vn += alasmeta[i].len + 1; + } + + if (i >= alascount) + { + errorcode = ERR95; /* Alpha assertion not recognized */ + goto FAILED; + } + + /* Check for expecting an assertion condition. If so, only atomic + lookaround assertions are valid. */ + + meta = alasmeta[i].meta; + if (prev_expect_cond_assert > 0 && + (meta < META_LOOKAHEAD || meta > META_LOOKBEHINDNOT)) + { + errorcode = (meta == META_LOOKAHEAD_NA || meta == META_LOOKBEHIND_NA)? + ERR98 : ERR28; /* (Atomic) assertion expected */ + goto FAILED; + } + + /* The lookaround alphabetic synonyms can mostly be handled by jumping + to the code that handles the traditional symbolic forms. */ + + switch(meta) + { + default: + errorcode = ERR89; /* Unknown code; should never occur because */ + goto FAILED; /* the meta values come from a table above. */ + + case META_ATOMIC: + goto ATOMIC_GROUP; + + case META_LOOKAHEAD: + goto POSITIVE_LOOK_AHEAD; + + case META_LOOKAHEAD_NA: + goto POSITIVE_NONATOMIC_LOOK_AHEAD; + + case META_LOOKAHEADNOT: + goto NEGATIVE_LOOK_AHEAD; + + case META_LOOKBEHIND: + case META_LOOKBEHINDNOT: + case META_LOOKBEHIND_NA: + *parsed_pattern++ = meta; + ptr--; + goto POST_LOOKBEHIND; + + /* The script run facilities are handled here. Unicode support is + required (give an error if not, as this is a security issue). Always + record a META_SCRIPT_RUN item. Then, for the atomic version, insert + META_ATOMIC and remember that we need two META_KETs at the end. 
*/ + + case META_SCRIPT_RUN: + case META_ATOMIC_SCRIPT_RUN: +#ifdef SUPPORT_UNICODE + *parsed_pattern++ = META_SCRIPT_RUN; + nest_depth++; + ptr++; + if (meta == META_ATOMIC_SCRIPT_RUN) + { + *parsed_pattern++ = META_ATOMIC; + if (top_nest == NULL) top_nest = (nest_save *)(cb->start_workspace); + else if (++top_nest >= end_nests) + { + errorcode = ERR84; + goto FAILED; + } + top_nest->nest_depth = nest_depth; + top_nest->flags = NSF_ATOMICSR; + top_nest->options = options & PARSE_TRACKED_OPTIONS; + } + break; +#else /* SUPPORT_UNICODE */ + errorcode = ERR96; + goto FAILED; +#endif + } + } + + + /* ---- Handle (*VERB) and (*VERB:NAME) ---- */ + + else + { + vn = verbnames; + if (!read_name(&ptr, ptrend, utf, 0, &offset, &name, &namelen, + &errorcode, cb)) goto FAILED; + if (ptr >= ptrend || (*ptr != CHAR_COLON && + *ptr != CHAR_RIGHT_PARENTHESIS)) + { + errorcode = ERR60; /* Malformed */ + goto FAILED; + } + + /* Scan the table of verb names */ + + for (i = 0; i < verbcount; i++) + { + if (namelen == verbs[i].len && + PRIV(strncmp_c8)(name, vn, namelen) == 0) + break; + vn += verbs[i].len + 1; + } + + if (i >= verbcount) + { + errorcode = ERR60; /* Verb not recognized */ + goto FAILED; + } + + /* An empty argument is treated as no argument. */ + + if (*ptr == CHAR_COLON && ptr + 1 < ptrend && + ptr[1] == CHAR_RIGHT_PARENTHESIS) + ptr++; /* Advance to the closing parens */ + + /* Check for mandatory non-empty argument; this is (*MARK) */ + + if (verbs[i].has_arg > 0 && *ptr != CHAR_COLON) + { + errorcode = ERR66; + goto FAILED; + } + + /* Remember where this verb, possibly with a preceding (*MARK), starts, + for handling quantified (*ACCEPT). */ + + verbstartptr = parsed_pattern; + okquantifier = (verbs[i].meta == META_ACCEPT); + + /* It appears that Perl allows any characters whatsoever, other than a + closing parenthesis, to appear in arguments ("names"), so we no longer + insist on letters, digits, and underscores. 
Perl does not, however, do + any interpretation within arguments, and has no means of including a + closing parenthesis. PCRE supports escape processing but only when it + is requested by an option. We set inverbname TRUE here, and let the + main loop take care of this so that escape and \x processing is done by + the main code above. */ + + if (*ptr++ == CHAR_COLON) /* Skip past : or ) */ + { + /* Some optional arguments can be treated as a preceding (*MARK) */ + + if (verbs[i].has_arg < 0) + { + add_after_mark = verbs[i].meta; + *parsed_pattern++ = META_MARK; + } + + /* The remaining verbs with arguments (except *MARK) need a different + opcode. */ + + else + { + *parsed_pattern++ = verbs[i].meta + + ((verbs[i].meta != META_MARK)? 0x00010000u:0); + } + + /* Set up for reading the name in the main loop. */ + + verblengthptr = parsed_pattern++; + verbnamestart = ptr; + inverbname = TRUE; + } + else /* No verb "name" argument */ + { + *parsed_pattern++ = verbs[i].meta; + } + } /* End of (*VERB) handling */ + break; /* Done with this parenthesis */ + } /* End of groups that don't start with (? */ + + + /* ---- Items starting (? ---- */ + + /* The type of item is determined by what follows (?. Handle (?| and option + changes under "default" because both need a new block on the nest stack. + Comments starting with (?# are handled above. Note that there is some + ambiguity about the sequence (?- because if a digit follows it's a relative + recursion or subroutine call whereas otherwise it's an option unsetting. */ + + if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS; + + switch(*ptr) + { + default: + if (*ptr == CHAR_MINUS && ptrend - ptr > 1 && IS_DIGIT(ptr[1])) + goto RECURSION_BYNUMBER; /* The + case is handled by CHAR_PLUS */ + + /* We now have either (?| or a (possibly empty) option setting, + optionally followed by a non-capturing group. 
*/ + + nest_depth++; + if (top_nest == NULL) top_nest = (nest_save *)(cb->start_workspace); + else if (++top_nest >= end_nests) + { + errorcode = ERR84; + goto FAILED; + } + top_nest->nest_depth = nest_depth; + top_nest->flags = 0; + top_nest->options = options & PARSE_TRACKED_OPTIONS; + + /* Start of non-capturing group that resets the capture count for each + branch. */ + + if (*ptr == CHAR_VERTICAL_LINE) + { + top_nest->reset_group = (uint16_t)cb->bracount; + top_nest->max_group = (uint16_t)cb->bracount; + top_nest->flags |= NSF_RESET; + cb->external_flags |= PCRE2_DUPCAPUSED; + *parsed_pattern++ = META_NOCAPTURE; + ptr++; + } + + /* Scan for options imnsxJU to be set or unset. */ + + else + { + BOOL hyphenok = TRUE; + uint32_t oldoptions = options; + + top_nest->reset_group = 0; + top_nest->max_group = 0; + set = unset = 0; + optset = &set; + + /* ^ at the start unsets imnsx and disables the subsequent use of - */ + + if (ptr < ptrend && *ptr == CHAR_CIRCUMFLEX_ACCENT) + { + options &= ~(PCRE2_CASELESS|PCRE2_MULTILINE|PCRE2_NO_AUTO_CAPTURE| + PCRE2_DOTALL|PCRE2_EXTENDED|PCRE2_EXTENDED_MORE); + hyphenok = FALSE; + ptr++; + } + + while (ptr < ptrend && *ptr != CHAR_RIGHT_PARENTHESIS && + *ptr != CHAR_COLON) + { + switch (*ptr++) + { + case CHAR_MINUS: + if (!hyphenok) + { + errorcode = ERR94; + ptr--; /* Correct the offset */ + goto FAILED; + } + optset = &unset; + hyphenok = FALSE; + break; + + case CHAR_J: /* Record that it changed in the external options */ + *optset |= PCRE2_DUPNAMES; + cb->external_flags |= PCRE2_JCHANGED; + break; + + case CHAR_i: *optset |= PCRE2_CASELESS; break; + case CHAR_m: *optset |= PCRE2_MULTILINE; break; + case CHAR_n: *optset |= PCRE2_NO_AUTO_CAPTURE; break; + case CHAR_s: *optset |= PCRE2_DOTALL; break; + case CHAR_U: *optset |= PCRE2_UNGREEDY; break; + + /* If x appears twice it sets the extended extended option. 
*/ + + case CHAR_x: + *optset |= PCRE2_EXTENDED; + if (ptr < ptrend && *ptr == CHAR_x) + { + *optset |= PCRE2_EXTENDED_MORE; + ptr++; + } + break; + + default: + errorcode = ERR11; + ptr--; /* Correct the offset */ + goto FAILED; + } + } + + /* If we are setting extended without extended-more, ensure that any + existing extended-more gets unset. Also, unsetting extended must also + unset extended-more. */ + + if ((set & (PCRE2_EXTENDED|PCRE2_EXTENDED_MORE)) == PCRE2_EXTENDED || + (unset & PCRE2_EXTENDED) != 0) + unset |= PCRE2_EXTENDED_MORE; + + options = (options | set) & (~unset); + + /* If the options ended with ')' this is not the start of a nested + group with option changes, so the options change at this level. + In this case, if the previous level set up a nest block, discard the + one we have just created. Otherwise adjust it for the previous level. + If the options ended with ':' we are starting a non-capturing group, + possibly with an options setting. */ + + if (ptr >= ptrend) goto UNCLOSED_PARENTHESIS; + if (*ptr++ == CHAR_RIGHT_PARENTHESIS) + { + nest_depth--; /* This is not a nested group after all. */ + if (top_nest > (nest_save *)(cb->start_workspace) && + (top_nest-1)->nest_depth == nest_depth) top_nest--; + else top_nest->nest_depth = nest_depth; + } + else *parsed_pattern++ = META_NOCAPTURE; + + /* If nothing changed, no need to record. */ + + if (options != oldoptions) + { + *parsed_pattern++ = META_OPTIONS; + *parsed_pattern++ = options; + } + } /* End options processing */ + break; /* End default case after (? */ + + + /* ---- Python syntax support ---- */ + + case CHAR_P: + if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS; + + /* (?P<name> is the same as (?<name>, which defines a named group. */ + + if (*ptr == CHAR_LESS_THAN_SIGN) + { + terminator = CHAR_GREATER_THAN_SIGN; + goto DEFINE_NAME; + } + + /* (?P>name) is the same as (?&name), which is a recursion or subroutine + call.
*/ + + if (*ptr == CHAR_GREATER_THAN_SIGN) goto RECURSE_BY_NAME; + + /* (?P=name) is the same as \k<name>, a back reference by name. Anything + else after (?P is an error. */ + + if (*ptr != CHAR_EQUALS_SIGN) + { + errorcode = ERR41; + goto FAILED; + } + if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name, + &namelen, &errorcode, cb)) goto FAILED; + *parsed_pattern++ = META_BACKREF_BYNAME; + *parsed_pattern++ = namelen; + PUTOFFSET(offset, parsed_pattern); + okquantifier = TRUE; + break; /* End of (?P processing */ + + + /* ---- Recursion/subroutine calls by number ---- */ + + case CHAR_R: + i = 0; /* (?R) == (?R0) */ + ptr++; + if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS) + { + errorcode = ERR58; + goto FAILED; + } + goto SET_RECURSION; + + /* An item starting (?- followed by a digit comes here via the "default" + case because (?- followed by a non-digit is an options setting. */ + + case CHAR_PLUS: + if (ptrend - ptr < 2 || !IS_DIGIT(ptr[1])) + { + errorcode = ERR29; /* Missing number */ + goto FAILED; + } + /* Fall through */ + + case CHAR_0: case CHAR_1: case CHAR_2: case CHAR_3: case CHAR_4: + case CHAR_5: case CHAR_6: case CHAR_7: case CHAR_8: case CHAR_9: + RECURSION_BYNUMBER: + if (!read_number(&ptr, ptrend, + (IS_DIGIT(*ptr))?
-1:(int)(cb->bracount), /* + and - are relative */ + MAX_GROUP_NUMBER, ERR61, + &i, &errorcode)) goto FAILED; + if (i < 0) /* NB (?0) is permitted */ + { + errorcode = ERR15; /* Unknown group */ + goto FAILED_BACK; + } + if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS) + goto UNCLOSED_PARENTHESIS; + + SET_RECURSION: + *parsed_pattern++ = META_RECURSE | (uint32_t)i; + offset = (PCRE2_SIZE)(ptr - cb->start_pattern); + ptr++; + PUTOFFSET(offset, parsed_pattern); + okquantifier = TRUE; + break; /* End of recursive call by number handling */ + + + /* ---- Recursion/subroutine calls by name ---- */ + + case CHAR_AMPERSAND: + RECURSE_BY_NAME: + if (!read_name(&ptr, ptrend, utf, CHAR_RIGHT_PARENTHESIS, &offset, &name, + &namelen, &errorcode, cb)) goto FAILED; + *parsed_pattern++ = META_RECURSE_BYNAME; + *parsed_pattern++ = namelen; + PUTOFFSET(offset, parsed_pattern); + okquantifier = TRUE; + break; + + /* ---- Callout with numerical or string argument ---- */ + + case CHAR_C: + if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS; + + /* If the previous item was a condition starting (?(? an assertion, + optionally preceded by a callout, is expected. This is checked later on, + during actual compilation. However we need to identify this kind of + assertion in this pass because it must not be qualified. The value of + expect_cond_assert is set to 2 after (?(? is processed. We decrement it + for a callout - still leaving a positive value that identifies the + assertion. Multiple callouts or any other items will make it zero or + less, which doesn't matter because they will cause an error later. */ + + expect_cond_assert = prev_expect_cond_assert - 1; + + /* If previous_callout is not NULL, it means this follows a previous + callout. If it was a manual callout, do nothing; this means its "length + of next pattern item" field will remain zero. If it was an automatic + callout, abolish it. 
*/ + + if (previous_callout != NULL && (options & PCRE2_AUTO_CALLOUT) != 0 && + previous_callout == parsed_pattern - 4 && + parsed_pattern[-1] == 255) + parsed_pattern = previous_callout; + + /* Save for updating next pattern item length, and skip one item before + completing. */ + + previous_callout = parsed_pattern; + after_manual_callout = 1; + + /* Handle a string argument; specific delimiter is required. */ + + if (*ptr != CHAR_RIGHT_PARENTHESIS && !IS_DIGIT(*ptr)) + { + PCRE2_SIZE calloutlength; + PCRE2_SPTR startptr = ptr; + + delimiter = 0; + for (i = 0; PRIV(callout_start_delims)[i] != 0; i++) + { + if (*ptr == PRIV(callout_start_delims)[i]) + { + delimiter = PRIV(callout_end_delims)[i]; + break; + } + } + if (delimiter == 0) + { + errorcode = ERR82; + goto FAILED; + } + + *parsed_pattern = META_CALLOUT_STRING; + parsed_pattern += 3; /* Skip pattern info */ + + for (;;) + { + if (++ptr >= ptrend) + { + errorcode = ERR81; + ptr = startptr; /* To give a more useful message */ + goto FAILED; + } + if (*ptr == delimiter && (++ptr >= ptrend || *ptr != delimiter)) + break; + } + + calloutlength = (PCRE2_SIZE)(ptr - startptr); + if (calloutlength > UINT32_MAX) + { + errorcode = ERR72; + goto FAILED; + } + *parsed_pattern++ = (uint32_t)calloutlength; + offset = (PCRE2_SIZE)(startptr - cb->start_pattern); + PUTOFFSET(offset, parsed_pattern); + } + + /* Handle a callout with an optional numerical argument, which must be + less than or equal to 255. A missing argument gives 0. 
*/ + + else + { + int n = 0; + *parsed_pattern = META_CALLOUT_NUMBER; /* Numerical callout */ + parsed_pattern += 3; /* Skip pattern info */ + while (ptr < ptrend && IS_DIGIT(*ptr)) + { + n = n * 10 + *ptr++ - CHAR_0; + if (n > 255) + { + errorcode = ERR38; + goto FAILED; + } + } + *parsed_pattern++ = n; + } + + /* Both formats must have a closing parenthesis */ + + if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS) + { + errorcode = ERR39; + goto FAILED; + } + ptr++; + + /* Remember the offset to the next item in the pattern, and set a default + length. This should get updated after the next item is read. */ + + previous_callout[1] = (uint32_t)(ptr - cb->start_pattern); + previous_callout[2] = 0; + break; /* End callout */ + + + /* ---- Conditional group ---- */ + + /* A condition can be an assertion, a number (referring to a numbered + group's having been set), a name (referring to a named group), or 'R', + referring to overall recursion. R<digits> and R&name are also permitted + for recursion state tests. Numbers may be preceded by + or - to specify a + relative group number. + + There are several syntaxes for testing a named group: (?(name)) is used + by Python; Perl 5.10 onwards uses (?(<name>) or (?('name')). + + There are two unfortunate ambiguities. 'R' can be the recursive thing or + the name 'R' (and similarly for 'R' followed by digits). 'DEFINE' can be + the Perl DEFINE feature or the Python named test. We look for a name + first; if not found, we try the other case. + + For compatibility with auto-callouts, we allow a callout to be specified + before a condition that is an assertion. */ + + case CHAR_LEFT_PARENTHESIS: + if (++ptr >= ptrend) goto UNCLOSED_PARENTHESIS; + nest_depth++; + + /* If the next character is ? or * there must be an assertion next + (optionally preceded by a callout). We do not check this here, but + instead we set expect_cond_assert to 2.
If this is still greater than + zero (callouts decrement it) when the next assertion is read, it will be + marked as a condition that must not be repeated. A value greater than + zero also causes checking that an assertion (possibly with callout) + follows. */ + + if (*ptr == CHAR_QUESTION_MARK || *ptr == CHAR_ASTERISK) + { + *parsed_pattern++ = META_COND_ASSERT; + ptr--; /* Pull pointer back to the opening parenthesis. */ + expect_cond_assert = 2; + break; /* End of conditional */ + } + + /* Handle (?([+-]number)... */ + + if (read_number(&ptr, ptrend, cb->bracount, MAX_GROUP_NUMBER, ERR61, &i, + &errorcode)) + { + if (i <= 0) + { + errorcode = ERR15; + goto FAILED; + } + *parsed_pattern++ = META_COND_NUMBER; + offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2); + PUTOFFSET(offset, parsed_pattern); + *parsed_pattern++ = i; + } + else if (errorcode != 0) goto FAILED; /* Number too big */ + + /* No number found. Handle the special case (?(VERSION[>]=n.m)... */ + + else if (ptrend - ptr >= 10 && + PRIV(strncmp_c8)(ptr, STRING_VERSION, 7) == 0 && + ptr[7] != CHAR_RIGHT_PARENTHESIS) + { + uint32_t ge = 0; + int major = 0; + int minor = 0; + + ptr += 7; + if (*ptr == CHAR_GREATER_THAN_SIGN) + { + ge = 1; + ptr++; + } + + /* NOTE: cannot write IS_DIGIT(*(++ptr)) here because IS_DIGIT + references its argument twice. 
*/ + + if (*ptr != CHAR_EQUALS_SIGN || (ptr++, !IS_DIGIT(*ptr))) + goto BAD_VERSION_CONDITION; + + if (!read_number(&ptr, ptrend, -1, 1000, ERR79, &major, &errorcode)) + goto FAILED; + + if (ptr >= ptrend) goto BAD_VERSION_CONDITION; + if (*ptr == CHAR_DOT) + { + if (++ptr >= ptrend || !IS_DIGIT(*ptr)) goto BAD_VERSION_CONDITION; + minor = (*ptr++ - CHAR_0) * 10; + if (ptr >= ptrend) goto BAD_VERSION_CONDITION; + if (IS_DIGIT(*ptr)) minor += *ptr++ - CHAR_0; + if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS) + goto BAD_VERSION_CONDITION; + } + + *parsed_pattern++ = META_COND_VERSION; + *parsed_pattern++ = ge; + *parsed_pattern++ = major; + *parsed_pattern++ = minor; + } + + /* All the remaining cases now require us to read a name. We cannot at + this stage distinguish ambiguous cases such as (?(R12) which might be a + recursion test by number or a name, because the named groups have not yet + all been identified. Those cases are treated as names, but given a + different META code. */ + + else + { + BOOL was_r_ampersand = FALSE; + + if (*ptr == CHAR_R && ptrend - ptr > 1 && ptr[1] == CHAR_AMPERSAND) + { + terminator = CHAR_RIGHT_PARENTHESIS; + was_r_ampersand = TRUE; + ptr++; + } + else if (*ptr == CHAR_LESS_THAN_SIGN) + terminator = CHAR_GREATER_THAN_SIGN; + else if (*ptr == CHAR_APOSTROPHE) + terminator = CHAR_APOSTROPHE; + else + { + terminator = CHAR_RIGHT_PARENTHESIS; + ptr--; /* Point to char before name */ + } + if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen, + &errorcode, cb)) goto FAILED; + + /* Handle (?(R&name) */ + + if (was_r_ampersand) + { + *parsed_pattern = META_COND_RNAME; + ptr--; /* Back to closing parens */ + } + + /* Handle (?(name). If the name is "DEFINE" we identify it with a + special code. Likewise if the name consists of R followed only by + digits. Otherwise, handle it like a quoted name. 
*/ + + else if (terminator == CHAR_RIGHT_PARENTHESIS) + { + if (namelen == 6 && PRIV(strncmp_c8)(name, STRING_DEFINE, 6) == 0) + *parsed_pattern = META_COND_DEFINE; + else + { + for (i = 1; i < (int)namelen; i++) + if (!IS_DIGIT(name[i])) break; + *parsed_pattern = (*name == CHAR_R && i >= (int)namelen)? + META_COND_RNUMBER : META_COND_NAME; + } + ptr--; /* Back to closing parens */ + } + + /* Handle (?('name') or (?(<name>) */ + + else *parsed_pattern = META_COND_NAME; + + /* All these cases except DEFINE end with the name length and offset; + DEFINE just has an offset (for the "too many branches" error). */ + + if (*parsed_pattern++ != META_COND_DEFINE) *parsed_pattern++ = namelen; + PUTOFFSET(offset, parsed_pattern); + } /* End cases that read a name */ + + /* Check the closing parenthesis of the condition */ + + if (ptr >= ptrend || *ptr != CHAR_RIGHT_PARENTHESIS) + { + errorcode = ERR24; + goto FAILED; + } + ptr++; + break; /* End of condition processing */ + + + /* ---- Atomic group ---- */ + + case CHAR_GREATER_THAN_SIGN: + ATOMIC_GROUP: /* Come from (*atomic: */ + *parsed_pattern++ = META_ATOMIC; + nest_depth++; + ptr++; + break; + + + /* ---- Lookahead assertions ---- */ + + case CHAR_EQUALS_SIGN: + POSITIVE_LOOK_AHEAD: /* Come from (*pla: */ + *parsed_pattern++ = META_LOOKAHEAD; + ptr++; + goto POST_ASSERTION; + + case CHAR_ASTERISK: + POSITIVE_NONATOMIC_LOOK_AHEAD: /* Come from (?* */ + *parsed_pattern++ = META_LOOKAHEAD_NA; + ptr++; + goto POST_ASSERTION; + + case CHAR_EXCLAMATION_MARK: + NEGATIVE_LOOK_AHEAD: /* Come from (*nla: */ + *parsed_pattern++ = META_LOOKAHEADNOT; + ptr++; + goto POST_ASSERTION; + + + /* ---- Lookbehind assertions ---- */ + + /* (?< followed by = or ! or * is a lookbehind assertion. Otherwise (?< + is the start of the name of a capturing group.
*/ + + case CHAR_LESS_THAN_SIGN: + if (ptrend - ptr <= 1 || + (ptr[1] != CHAR_EQUALS_SIGN && + ptr[1] != CHAR_EXCLAMATION_MARK && + ptr[1] != CHAR_ASTERISK)) + { + terminator = CHAR_GREATER_THAN_SIGN; + goto DEFINE_NAME; + } + *parsed_pattern++ = (ptr[1] == CHAR_EQUALS_SIGN)? + META_LOOKBEHIND : (ptr[1] == CHAR_EXCLAMATION_MARK)? + META_LOOKBEHINDNOT : META_LOOKBEHIND_NA; + + POST_LOOKBEHIND: /* Come from (*plb: (*naplb: and (*nlb: */ + *has_lookbehind = TRUE; + offset = (PCRE2_SIZE)(ptr - cb->start_pattern - 2); + PUTOFFSET(offset, parsed_pattern); + ptr += 2; + /* Fall through */ + + /* If the previous item was a condition starting (?(? an assertion, + optionally preceded by a callout, is expected. This is checked later on, + during actual compilation. However we need to identify this kind of + assertion in this pass because it must not be quantified. The value of + expect_cond_assert is set to 2 after (?(? is processed. We decrement it + for a callout - still leaving a positive value that identifies the + assertion. Multiple callouts or any other items will make it zero or + less, which doesn't matter because they will cause an error later. */ + + POST_ASSERTION: + nest_depth++; + if (prev_expect_cond_assert > 0) + { + if (top_nest == NULL) top_nest = (nest_save *)(cb->start_workspace); + else if (++top_nest >= end_nests) + { + errorcode = ERR84; + goto FAILED; + } + top_nest->nest_depth = nest_depth; + top_nest->flags = NSF_CONDASSERT; + top_nest->options = options & PARSE_TRACKED_OPTIONS; + } + break; + + + /* ---- Define a named group ---- */ + + /* A named group may be defined as (?'name') or (?<name>). In the latter + case we jump to DEFINE_NAME from the disambiguation of (?< above with the + terminator set to '>'. */ + + case CHAR_APOSTROPHE: + terminator = CHAR_APOSTROPHE; /* Terminator */ + + DEFINE_NAME: + if (!read_name(&ptr, ptrend, utf, terminator, &offset, &name, &namelen, + &errorcode, cb)) goto FAILED; + + /* We have a name for this capturing group.
It is also assigned a number, + which is its primary means of identification. */ + + if (cb->bracount >= MAX_GROUP_NUMBER) + { + errorcode = ERR97; + goto FAILED; + } + cb->bracount++; + *parsed_pattern++ = META_CAPTURE | cb->bracount; + nest_depth++; + + /* Check not too many names */ + + if (cb->names_found >= MAX_NAME_COUNT) + { + errorcode = ERR49; + goto FAILED; + } + + /* Adjust the entry size to accommodate the longest name found. */ + + if (namelen + IMM2_SIZE + 1 > cb->name_entry_size) + cb->name_entry_size = (uint16_t)(namelen + IMM2_SIZE + 1); + + /* Scan the list to check for duplicates. For duplicate names, if the + number is the same, break the loop, which causes the name to be + discarded; otherwise, if DUPNAMES is not set, give an error. + If it is set, allow the name with a different number, but continue + scanning in case this is a duplicate with the same number. For + non-duplicate names, give an error if the number is duplicated. */ + + isdupname = FALSE; + ng = cb->named_groups; + for (i = 0; i < cb->names_found; i++, ng++) + { + if (namelen == ng->length && + PRIV(strncmp)(name, ng->name, (PCRE2_SIZE)namelen) == 0) + { + if (ng->number == cb->bracount) break; + if ((options & PCRE2_DUPNAMES) == 0) + { + errorcode = ERR43; + goto FAILED; + } + isdupname = ng->isdup = TRUE; /* Mark as a duplicate */ + cb->dupnames = TRUE; /* Duplicate names exist */ + } + else if (ng->number == cb->bracount) + { + errorcode = ERR65; + goto FAILED; + } + } + + if (i < cb->names_found) break; /* Ignore duplicate with same number */ + + /* Increase the list size if necessary */ + + if (cb->names_found >= cb->named_group_list_size) + { + uint32_t newsize = cb->named_group_list_size * 2; + named_group *newspace = + cb->cx->memctl.malloc(newsize * sizeof(named_group), + cb->cx->memctl.memory_data); + if (newspace == NULL) + { + errorcode = ERR21; + goto FAILED; + } + + memcpy(newspace, cb->named_groups, + cb->named_group_list_size * sizeof(named_group)); + if 
(cb->named_group_list_size > NAMED_GROUP_LIST_SIZE) + cb->cx->memctl.free((void *)cb->named_groups, + cb->cx->memctl.memory_data); + cb->named_groups = newspace; + cb->named_group_list_size = newsize; + } + + /* Add this name to the list */ + + cb->named_groups[cb->names_found].name = name; + cb->named_groups[cb->names_found].length = (uint16_t)namelen; + cb->named_groups[cb->names_found].number = cb->bracount; + cb->named_groups[cb->names_found].isdup = (uint16_t)isdupname; + cb->names_found++; + break; + } /* End of (? switch */ + break; /* End of ( handling */ + + + /* ---- Branch terminators ---- */ + + /* Alternation: reset the capture count if we are in a (?| group. */ + + case CHAR_VERTICAL_LINE: + if (top_nest != NULL && top_nest->nest_depth == nest_depth && + (top_nest->flags & NSF_RESET) != 0) + { + if (cb->bracount > top_nest->max_group) + top_nest->max_group = (uint16_t)cb->bracount; + cb->bracount = top_nest->reset_group; + } + *parsed_pattern++ = META_ALT; + break; + + /* End of group; reset the capture count to the maximum if we are in a (?| + group and/or reset the options that are tracked during parsing. Disallow + quantifier for a condition that is an assertion. 
*/ + + case CHAR_RIGHT_PARENTHESIS: + okquantifier = TRUE; + if (top_nest != NULL && top_nest->nest_depth == nest_depth) + { + options = (options & ~PARSE_TRACKED_OPTIONS) | top_nest->options; + if ((top_nest->flags & NSF_RESET) != 0 && + top_nest->max_group > cb->bracount) + cb->bracount = top_nest->max_group; + if ((top_nest->flags & NSF_CONDASSERT) != 0) + okquantifier = FALSE; + + if ((top_nest->flags & NSF_ATOMICSR) != 0) + { + *parsed_pattern++ = META_KET; + } + + if (top_nest == (nest_save *)(cb->start_workspace)) top_nest = NULL; + else top_nest--; + } + if (nest_depth == 0) /* Unmatched closing parenthesis */ + { + errorcode = ERR22; + goto FAILED_BACK; + } + nest_depth--; + *parsed_pattern++ = META_KET; + break; + } /* End of switch on pattern character */ + } /* End of main character scan loop */ + +/* End of pattern reached. Check for missing ) at the end of a verb name. */ + +if (inverbname && ptr >= ptrend) + { + errorcode = ERR60; + goto FAILED; + } + +/* Manage callout for the final item */ + +PARSED_END: +parsed_pattern = manage_callouts(ptr, &previous_callout, auto_callout, + parsed_pattern, cb); + +/* Insert trailing items for word and line matching (features provided for the +benefit of pcre2grep). */ + +if ((extra_options & PCRE2_EXTRA_MATCH_LINE) != 0) + { + *parsed_pattern++ = META_KET; + *parsed_pattern++ = META_DOLLAR; + } +else if ((extra_options & PCRE2_EXTRA_MATCH_WORD) != 0) + { + *parsed_pattern++ = META_KET; + *parsed_pattern++ = META_ESCAPE + ESC_b; + } + +/* Terminate the parsed pattern, then return success if all groups are closed. +Otherwise we have unclosed parentheses. */ + +if (parsed_pattern >= parsed_pattern_end) + { + errorcode = ERR63; /* Internal error (parsed pattern overflow) */ + goto FAILED; + } + +*parsed_pattern = META_END; +if (nest_depth == 0) return 0; + +UNCLOSED_PARENTHESIS: +errorcode = ERR14; + +/* Come here for all failures. 
*/ + +FAILED: +cb->erroroffset = (PCRE2_SIZE)(ptr - cb->start_pattern); +return errorcode; + +/* Some errors need to indicate the previous character. */ + +FAILED_BACK: +ptr--; +goto FAILED; + +/* This failure happens several times. */ + +BAD_VERSION_CONDITION: +errorcode = ERR79; +goto FAILED; +} + + + +/************************************************* +* Find first significant opcode * +*************************************************/ + +/* This is called by several functions that scan a compiled expression looking +for a fixed first character, or an anchoring opcode etc. It skips over things +that do not influence this. For some calls, it makes sense to skip negative +forward and all backward assertions, and also the \b assertion; for others it +does not. + +Arguments: + code pointer to the start of the group + skipassert TRUE if certain assertions are to be skipped + +Returns: pointer to the first significant opcode +*/ + +static const PCRE2_UCHAR* +first_significant_code(PCRE2_SPTR code, BOOL skipassert) +{ +for (;;) + { + switch ((int)*code) + { + case OP_ASSERT_NOT: + case OP_ASSERTBACK: + case OP_ASSERTBACK_NOT: + case OP_ASSERTBACK_NA: + if (!skipassert) return code; + do code += GET(code, 1); while (*code == OP_ALT); + code += PRIV(OP_lengths)[*code]; + break; + + case OP_WORD_BOUNDARY: + case OP_NOT_WORD_BOUNDARY: + if (!skipassert) return code; + /* Fall through */ + + case OP_CALLOUT: + case OP_CREF: + case OP_DNCREF: + case OP_RREF: + case OP_DNRREF: + case OP_FALSE: + case OP_TRUE: + code += PRIV(OP_lengths)[*code]; + break; + + case OP_CALLOUT_STR: + code += GET(code, 1 + 2*LINK_SIZE); + break; + + case OP_SKIPZERO: + code += 2 + GET(code, 2) + LINK_SIZE; + break; + + case OP_COND: + case OP_SCOND: + if (code[1+LINK_SIZE] != OP_FALSE || /* Not DEFINE */ + code[GET(code, 1)] != OP_KET) /* More than one branch */ + return code; + code += GET(code, 1) + 1 + LINK_SIZE; + break; + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case 
OP_SKIP_ARG: + case OP_THEN_ARG: + code += code[1] + PRIV(OP_lengths)[*code]; + break; + + default: + return code; + } + } +/* Control never reaches here */ +} + + + +#ifdef SUPPORT_UNICODE +/************************************************* +* Get othercase range * +*************************************************/ + +/* This function is passed the start and end of a class range in UCP mode. It +searches up the characters, looking for ranges of characters in the "other" +case. Each call returns the next one, updating the start address. A character +with multiple other cases is returned on its own with a special return value. + +Arguments: + cptr points to starting character value; updated + d end value + ocptr where to put start of othercase range + odptr where to put end of othercase range + +Yield: -1 when no more + 0 when a range is returned + >0 the CASESET offset for char with multiple other cases + in this case, ocptr contains the original +*/ + +static int +get_othercase_range(uint32_t *cptr, uint32_t d, uint32_t *ocptr, + uint32_t *odptr) +{ +uint32_t c, othercase, next; +unsigned int co; + +/* Find the first character that has an other case. If it has multiple other +cases, return its case offset value. */ + +for (c = *cptr; c <= d; c++) + { + if ((co = UCD_CASESET(c)) != 0) + { + *ocptr = c++; /* Character that has the set */ + *cptr = c; /* Rest of input range */ + return (int)co; + } + if ((othercase = UCD_OTHERCASE(c)) != c) break; + } + +if (c > d) return -1; /* Reached end of range */ + +/* Found a character that has a single other case. Search for the end of the +range, which is either the end of the input range, or a character that has zero +or more than one other cases. 
*/ + +*ocptr = othercase; +next = othercase + 1; + +for (++c; c <= d; c++) + { + if ((co = UCD_CASESET(c)) != 0 || UCD_OTHERCASE(c) != next) break; + next++; + } + +*odptr = next - 1; /* End of othercase range */ +*cptr = c; /* Rest of input range */ +return 0; +} +#endif /* SUPPORT_UNICODE */ + + + +/************************************************* +* Add a character or range to a class (internal) * +*************************************************/ + +/* This function packages up the logic of adding a character or range of +characters to a class. The character values in the arguments will be within the +valid values for the current mode (8-bit, 16-bit, UTF, etc). This function is +called only from within the "add to class" group of functions, some of which +are recursive and mutually recursive. The external entry point is +add_to_class(). + +Arguments: + classbits the bit map for characters < 256 + uchardptr points to the pointer for extra data + options the options word + cb compile data + start start of range character + end end of range character + +Returns: the number of < 256 characters added + the pointer to extra data is updated +*/ + +static unsigned int +add_to_class_internal(uint8_t *classbits, PCRE2_UCHAR **uchardptr, + uint32_t options, compile_block *cb, uint32_t start, uint32_t end) +{ +uint32_t c; +uint32_t classbits_end = (end <= 0xff ? end : 0xff); +unsigned int n8 = 0; + +/* If caseless matching is required, scan the range and process alternate +cases. In Unicode, there are 8-bit characters that have alternate cases that +are greater than 255 and vice-versa. Sometimes we can just extend the original +range. 
*/ + +if ((options & PCRE2_CASELESS) != 0) + { +#ifdef SUPPORT_UNICODE + if ((options & (PCRE2_UTF|PCRE2_UCP)) != 0) + { + int rc; + uint32_t oc, od; + + options &= ~PCRE2_CASELESS; /* Remove for recursive calls */ + c = start; + + while ((rc = get_othercase_range(&c, end, &oc, &od)) >= 0) + { + /* Handle a single character that has more than one other case. */ + + if (rc > 0) n8 += add_list_to_class_internal(classbits, uchardptr, options, cb, + PRIV(ucd_caseless_sets) + rc, oc); + + /* Do nothing if the other case range is within the original range. */ + + else if (oc >= cb->class_range_start && od <= cb->class_range_end) continue; + + /* Extend the original range if there is overlap, noting that if oc < c, we + can't have od > end because a subrange is always shorter than the basic + range. Otherwise, use a recursive call to add the additional range. */ + + else if (oc < start && od >= start - 1) start = oc; /* Extend downwards */ + else if (od > end && oc <= end + 1) + { + end = od; /* Extend upwards */ + if (end > classbits_end) classbits_end = (end <= 0xff ? end : 0xff); + } + else n8 += add_to_class_internal(classbits, uchardptr, options, cb, oc, od); + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF mode */ + + for (c = start; c <= classbits_end; c++) + { + SETBIT(classbits, cb->fcc[c]); + n8++; + } + } + +/* Now handle the originally supplied range. Adjust the final value according +to the bit length - this means that the same lists of (e.g.) horizontal spaces +can be used in all cases. */ + +if ((options & PCRE2_UTF) == 0 && end > MAX_NON_UTF_CHAR) + end = MAX_NON_UTF_CHAR; + +if (start > cb->class_range_start && end < cb->class_range_end) return n8; + +/* Use the bitmap for characters < 256. Otherwise use extra data.*/ + +for (c = start; c <= classbits_end; c++) + { + /* Regardless of start, c will always be <= 255. 
*/ + SETBIT(classbits, c); + n8++; + } + +#ifdef SUPPORT_WIDE_CHARS +if (start <= 0xff) start = 0xff + 1; + +if (end >= start) + { + PCRE2_UCHAR *uchardata = *uchardptr; + +#ifdef SUPPORT_UNICODE + if ((options & PCRE2_UTF) != 0) + { + if (start < end) + { + *uchardata++ = XCL_RANGE; + uchardata += PRIV(ord2utf)(start, uchardata); + uchardata += PRIV(ord2utf)(end, uchardata); + } + else if (start == end) + { + *uchardata++ = XCL_SINGLE; + uchardata += PRIV(ord2utf)(start, uchardata); + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* Without UTF support, character values are constrained by the bit length, + and can only be > 255 for 16-bit and 32-bit libraries. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 + {} +#else + if (start < end) + { + *uchardata++ = XCL_RANGE; + *uchardata++ = start; + *uchardata++ = end; + } + else if (start == end) + { + *uchardata++ = XCL_SINGLE; + *uchardata++ = start; + } +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ + *uchardptr = uchardata; /* Update extra data pointer */ + } +#else /* SUPPORT_WIDE_CHARS */ + (void)uchardptr; /* Avoid compiler warning */ +#endif /* SUPPORT_WIDE_CHARS */ + +return n8; /* Number of 8-bit characters */ +} + + + +#ifdef SUPPORT_UNICODE +/************************************************* +* Add a list of characters to a class (internal) * +*************************************************/ + +/* This function is used for adding a list of case-equivalent characters to a +class when in UTF mode. This function is called only from within +add_to_class_internal(), with which it is mutually recursive. + +Arguments: + classbits the bit map for characters < 256 + uchardptr points to the pointer for extra data + options the options word + cb contains pointers to tables etc.
+ p points to row of 32-bit values, terminated by NOTACHAR + except character to omit; this is used when adding lists of + case-equivalent characters to avoid including the one we + already know about + +Returns: the number of < 256 characters added + the pointer to extra data is updated +*/ + +static unsigned int +add_list_to_class_internal(uint8_t *classbits, PCRE2_UCHAR **uchardptr, + uint32_t options, compile_block *cb, const uint32_t *p, unsigned int except) +{ +unsigned int n8 = 0; +while (p[0] < NOTACHAR) + { + unsigned int n = 0; + if (p[0] != except) + { + while(p[n+1] == p[0] + n + 1) n++; + n8 += add_to_class_internal(classbits, uchardptr, options, cb, p[0], p[n]); + } + p += n + 1; + } +return n8; +} +#endif + + + +/************************************************* +* External entry point for add range to class * +*************************************************/ + +/* This function sets the overall range so that the internal functions can try +to avoid duplication when handling case-independence. + +Arguments: + classbits the bit map for characters < 256 + uchardptr points to the pointer for extra data + options the options word + cb compile data + start start of range character + end end of range character + +Returns: the number of < 256 characters added + the pointer to extra data is updated +*/ + +static unsigned int +add_to_class(uint8_t *classbits, PCRE2_UCHAR **uchardptr, uint32_t options, + compile_block *cb, uint32_t start, uint32_t end) +{ +cb->class_range_start = start; +cb->class_range_end = end; +return add_to_class_internal(classbits, uchardptr, options, cb, start, end); +} + + +/************************************************* +* External entry point for add list to class * +*************************************************/ + +/* This function is used for adding a list of horizontal or vertical whitespace +characters to a class. The list must be in order so that ranges of characters +can be detected and handled appropriately. 
This function sets the overall range +so that the internal functions can try to avoid duplication when handling +case-independence. + +Arguments: + classbits the bit map for characters < 256 + uchardptr points to the pointer for extra data + options the options word + cb contains pointers to tables etc. + p points to row of 32-bit values, terminated by NOTACHAR + except character to omit; this is used when adding lists of + case-equivalent characters to avoid including the one we + already know about + +Returns: the number of < 256 characters added + the pointer to extra data is updated +*/ + +static unsigned int +add_list_to_class(uint8_t *classbits, PCRE2_UCHAR **uchardptr, uint32_t options, + compile_block *cb, const uint32_t *p, unsigned int except) +{ +unsigned int n8 = 0; +while (p[0] < NOTACHAR) + { + unsigned int n = 0; + if (p[0] != except) + { + while(p[n+1] == p[0] + n + 1) n++; + cb->class_range_start = p[0]; + cb->class_range_end = p[n]; + n8 += add_to_class_internal(classbits, uchardptr, options, cb, p[0], p[n]); + } + p += n + 1; + } +return n8; +} + + + +/************************************************* +* Add characters not in a list to a class * +*************************************************/ + +/* This function is used for adding the complement of a list of horizontal or +vertical whitespace to a class. The list must be in order. + +Arguments: + classbits the bit map for characters < 256 + uchardptr points to the pointer for extra data + options the options word + cb contains pointers to tables etc. 
+ p points to row of 32-bit values, terminated by NOTACHAR + +Returns: the number of < 256 characters added + the pointer to extra data is updated +*/ + +static unsigned int +add_not_list_to_class(uint8_t *classbits, PCRE2_UCHAR **uchardptr, + uint32_t options, compile_block *cb, const uint32_t *p) +{ +BOOL utf = (options & PCRE2_UTF) != 0; +unsigned int n8 = 0; +if (p[0] > 0) + n8 += add_to_class(classbits, uchardptr, options, cb, 0, p[0] - 1); +while (p[0] < NOTACHAR) + { + while (p[1] == p[0] + 1) p++; + n8 += add_to_class(classbits, uchardptr, options, cb, p[0] + 1, + (p[1] == NOTACHAR) ? (utf ? 0x10ffffu : 0xffffffffu) : p[1] - 1); + p++; + } +return n8; +} + + + +/************************************************* +* Find details of duplicate group names * +*************************************************/ + +/* This is called from compile_branch() when it needs to know the index and +count of duplicates in the names table when processing named backreferences, +either directly, or as conditions. + +Arguments: + name points to the name + length the length of the name + indexptr where to put the index + countptr where to put the count of duplicates + errorcodeptr where to put an error code + cb the compile block + +Returns: TRUE if OK, FALSE if not, error code set +*/ + +static BOOL +find_dupname_details(PCRE2_SPTR name, uint32_t length, int *indexptr, + int *countptr, int *errorcodeptr, compile_block *cb) +{ +uint32_t i, groupnumber; +int count; +PCRE2_UCHAR *slot = cb->name_table; + +/* Find the first entry in the table */ + +for (i = 0; i < cb->names_found; i++) + { + if (PRIV(strncmp)(name, slot+IMM2_SIZE, length) == 0 && + slot[IMM2_SIZE+length] == 0) break; + slot += cb->name_entry_size; + } + +/* This should not occur, because this function is called only when we know we +have duplicate names. Give an internal error. 
*/ + +if (i >= cb->names_found) + { + *errorcodeptr = ERR53; + cb->erroroffset = name - cb->start_pattern; + return FALSE; + } + +/* Record the index and then see how many duplicates there are, updating the +backref map and maximum back reference as we do. */ + +*indexptr = i; +count = 0; + +for (;;) + { + count++; + groupnumber = GET2(slot,0); + cb->backref_map |= (groupnumber < 32)? (1u << groupnumber) : 1; + if (groupnumber > cb->top_backref) cb->top_backref = groupnumber; + if (++i >= cb->names_found) break; + slot += cb->name_entry_size; + if (PRIV(strncmp)(name, slot+IMM2_SIZE, length) != 0 || + (slot+IMM2_SIZE)[length] != 0) break; + } + +*countptr = count; +return TRUE; +} + + + +/************************************************* +* Compile one branch * +*************************************************/ + +/* Scan the parsed pattern, compiling it into a vector of PCRE2_UCHAR. If +the options are changed during the branch, the pointer is used to change the +external options bits. This function is used during the pre-compile phase when +we are trying to find out the amount of memory needed, as well as during the +real compile phase. The value of lengthptr distinguishes the two phases. + +Arguments: + optionsptr pointer to the option bits + codeptr points to the pointer to the current code point + pptrptr points to the current parsed pattern pointer + errorcodeptr points to error code variable + firstcuptr place to put the first required code unit + firstcuflagsptr place to put the first code unit flags, or a negative number + reqcuptr place to put the last required code unit + reqcuflagsptr place to put the last required code unit flags, or a negative number + bcptr points to current branch chain + cb contains pointers to tables etc.
+ lengthptr NULL during the real compile phase + points to length accumulator during pre-compile phase + +Returns: 0 There's been an error, *errorcodeptr is non-zero + +1 Success, this branch must match at least one character + -1 Success, this branch may match an empty string +*/ + +static int +compile_branch(uint32_t *optionsptr, PCRE2_UCHAR **codeptr, uint32_t **pptrptr, + int *errorcodeptr, uint32_t *firstcuptr, int32_t *firstcuflagsptr, + uint32_t *reqcuptr, int32_t *reqcuflagsptr, branch_chain *bcptr, + compile_block *cb, PCRE2_SIZE *lengthptr) +{ +int bravalue = 0; +int okreturn = -1; +int group_return = 0; +uint32_t repeat_min = 0, repeat_max = 0; /* To please picky compilers */ +uint32_t greedy_default, greedy_non_default; +uint32_t repeat_type, op_type; +uint32_t options = *optionsptr; /* May change dynamically */ +uint32_t firstcu, reqcu; +uint32_t zeroreqcu, zerofirstcu; +uint32_t escape; +uint32_t *pptr = *pptrptr; +uint32_t meta, meta_arg; +int32_t firstcuflags, reqcuflags; +int32_t zeroreqcuflags, zerofirstcuflags; +int32_t req_caseopt, reqvary, tempreqvary; +PCRE2_SIZE offset = 0; +PCRE2_SIZE length_prevgroup = 0; +PCRE2_UCHAR *code = *codeptr; +PCRE2_UCHAR *last_code = code; +PCRE2_UCHAR *orig_code = code; +PCRE2_UCHAR *tempcode; +PCRE2_UCHAR *previous = NULL; +PCRE2_UCHAR op_previous; +BOOL groupsetfirstcu = FALSE; +BOOL had_accept = FALSE; +BOOL matched_char = FALSE; +BOOL previous_matched_char = FALSE; +BOOL reset_caseful = FALSE; +const uint8_t *cbits = cb->cbits; +uint8_t classbits[32]; + +/* We can fish out the UTF setting once and for all into a BOOL, but we must +not do this for other options (e.g. PCRE2_EXTENDED) because they may change +dynamically as we process the pattern. */ + +#ifdef SUPPORT_UNICODE +BOOL utf = (options & PCRE2_UTF) != 0; +BOOL ucp = (options & PCRE2_UCP) != 0; +#else /* No Unicode support */ +BOOL utf = FALSE; +#endif + +/* Helper variables for OP_XCLASS opcode (for characters > 255). 
We define +class_uchardata always so that it can be passed to add_to_class() always, +though it will not be used in non-UTF 8-bit cases. This avoids having to supply +alternative calls for the different cases. */ + +PCRE2_UCHAR *class_uchardata; +#ifdef SUPPORT_WIDE_CHARS +BOOL xclass; +PCRE2_UCHAR *class_uchardata_base; +#endif + +/* Set up the default and non-default settings for greediness */ + +greedy_default = ((options & PCRE2_UNGREEDY) != 0); +greedy_non_default = greedy_default ^ 1; + +/* Initialize no first unit, no required unit. REQ_UNSET means "no char +matching encountered yet". It gets changed to REQ_NONE if we hit something that +matches a non-fixed first unit; reqcu just remains unset if we never find one. + +When we hit a repeat whose minimum is zero, we may have to adjust these values +to take the zero repeat into account. This is implemented by setting them to +zerofirstcu and zeroreqcu when such a repeat is encountered. The individual +item types that can be repeated set these backoff variables appropriately. */ + +firstcu = reqcu = zerofirstcu = zeroreqcu = 0; +firstcuflags = reqcuflags = zerofirstcuflags = zeroreqcuflags = REQ_UNSET; + +/* The variable req_caseopt contains either the REQ_CASELESS value or zero, +according to the current setting of the caseless flag. The REQ_CASELESS value +leaves the lower 28 bits empty. It is added into the firstcu or reqcu variables +to record the case status of the value. This is used only for ASCII characters. +*/ + +req_caseopt = ((options & PCRE2_CASELESS) != 0)?
REQ_CASELESS:0; + +/* Switch on next META item until the end of the branch */ + +for (;; pptr++) + { +#ifdef SUPPORT_WIDE_CHARS + BOOL xclass_has_prop; +#endif + BOOL negate_class; + BOOL should_flip_negation; + BOOL match_all_or_no_wide_chars; + BOOL possessive_quantifier; + BOOL note_group_empty; + int class_has_8bitchar; + int i; + uint32_t mclength; + uint32_t skipunits; + uint32_t subreqcu, subfirstcu; + uint32_t groupnumber; + uint32_t verbarglen, verbculen; + int32_t subreqcuflags, subfirstcuflags; /* Must be signed */ + open_capitem *oc; + PCRE2_UCHAR mcbuffer[8]; + + /* Get next META item in the pattern and its potential argument. */ + + meta = META_CODE(*pptr); + meta_arg = META_DATA(*pptr); + + /* If we are in the pre-compile phase, accumulate the length used for the + previous cycle of this loop, unless the next item is a quantifier. */ + + if (lengthptr != NULL) + { + if (code > cb->start_workspace + cb->workspace_size - + WORK_SIZE_SAFETY_MARGIN) /* Check for overrun */ + { + *errorcodeptr = (code >= cb->start_workspace + cb->workspace_size)? + ERR52 : ERR86; + return 0; + } + + /* There is at least one situation where code goes backwards: this is the + case of a zero quantifier after a class (e.g. [ab]{0}). When the quantifier + is processed, the whole class is eliminated. However, it is created first, + so we have to allow memory for it. Therefore, don't ever reduce the length + at this point. */ + + if (code < last_code) code = last_code; + + /* If the next thing is not a quantifier, we add the length of the previous + item into the total, and reset the code pointer to the start of the + workspace. Otherwise leave the previous item available to be quantified. 
*/ + + if (meta < META_ASTERISK || meta > META_MINMAX_QUERY) + { + if (OFLOW_MAX - *lengthptr < (PCRE2_SIZE)(code - orig_code)) + { + *errorcodeptr = ERR20; /* Integer overflow */ + return 0; + } + *lengthptr += (PCRE2_SIZE)(code - orig_code); + if (*lengthptr > MAX_PATTERN_SIZE) + { + *errorcodeptr = ERR20; /* Pattern is too large */ + return 0; + } + code = orig_code; + } + + /* Remember where this code item starts so we can catch the "backwards" + case above next time round. */ + + last_code = code; + } + + /* Process the next parsed pattern item. If it is not a quantifier, remember + where it starts so that it can be quantified when a quantifier follows. + Checking for the legality of quantifiers happens in parse_regex(), except for + a quantifier after an assertion that is a condition. */ + + if (meta < META_ASTERISK || meta > META_MINMAX_QUERY) + { + previous = code; + if (matched_char && !had_accept) okreturn = 1; + } + + previous_matched_char = matched_char; + matched_char = FALSE; + note_group_empty = FALSE; + skipunits = 0; /* Default value for most subgroups */ + + switch(meta) + { + /* ===================================================================*/ + /* The branch terminates at pattern end or | or ) */ + + case META_END: + case META_ALT: + case META_KET: + *firstcuptr = firstcu; + *firstcuflagsptr = firstcuflags; + *reqcuptr = reqcu; + *reqcuflagsptr = reqcuflags; + *codeptr = code; + *pptrptr = pptr; + return okreturn; + + + /* ===================================================================*/ + /* Handle single-character metacharacters. In multiline mode, ^ disables + the setting of any following char as a first character. */ + + case META_CIRCUMFLEX: + if ((options & PCRE2_MULTILINE) != 0) + { + if (firstcuflags == REQ_UNSET) + zerofirstcuflags = firstcuflags = REQ_NONE; + *code++ = OP_CIRCM; + } + else *code++ = OP_CIRC; + break; + + case META_DOLLAR: + *code++ = ((options & PCRE2_MULTILINE) != 0)? 
OP_DOLLM : OP_DOLL; + break; + + /* There can never be a first char if '.' is first, whatever happens about + repeats. The value of reqcu doesn't change either. */ + + case META_DOT: + matched_char = TRUE; + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + *code++ = ((options & PCRE2_DOTALL) != 0)? OP_ALLANY: OP_ANY; + break; + + + /* ===================================================================*/ + /* Empty character classes are allowed if PCRE2_ALLOW_EMPTY_CLASS is set. + Otherwise, an initial ']' is taken as a data character. When empty classes + are allowed, [] must always fail, so generate OP_FAIL, whereas [^] must + match any character, so generate OP_ALLANY. */ + + case META_CLASS_EMPTY: + case META_CLASS_EMPTY_NOT: + matched_char = TRUE; + *code++ = (meta == META_CLASS_EMPTY_NOT)? OP_ALLANY : OP_FAIL; + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + break; + + + /* ===================================================================*/ + /* Non-empty character class. If the included characters are all < 256, we + build a 32-byte bitmap of the permitted characters, except in the special + case where there is only one such character. For negated classes, we build + the map as usual, then invert it at the end. However, we use a different + opcode so that data characters > 255 can be handled correctly. + + If the class contains characters outside the 0-255 range, a different + opcode is compiled. It may optionally have a bit map for characters < 256, + but those above are explicitly listed afterwards. A flag code unit + tells whether the bitmap is present, and whether this is a negated class or + not.
*/ + + case META_CLASS_NOT: + case META_CLASS: + matched_char = TRUE; + negate_class = meta == META_CLASS_NOT; + + /* We can optimize the case of a single character in a class by generating + OP_CHAR or OP_CHARI if it's positive, or OP_NOT or OP_NOTI if it's + negative. In the negative case there can be no first char if this item is + first, whatever repeat count may follow. In the case of reqcu, save the + previous value for reinstating. */ + + /* NOTE: at present this optimization is not effective if the only + character in a class in 32-bit, non-UCP mode has its top bit set. */ + + if (pptr[1] < META_END && pptr[2] == META_CLASS_END) + { +#ifdef SUPPORT_UNICODE + uint32_t d; +#endif + uint32_t c = pptr[1]; + + pptr += 2; /* Move on to class end */ + if (meta == META_CLASS) /* A positive one-char class can be */ + { /* handled as a normal literal character. */ + meta = c; /* Set up the character */ + goto NORMAL_CHAR_SET; + } + + /* Handle a negative one-character class */ + + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + + /* For caseless UTF or UCP mode, check whether this character has more + than one other case. If so, generate a special OP_NOTPROP item instead of + OP_NOTI. */ + +#ifdef SUPPORT_UNICODE + if ((utf||ucp) && (options & PCRE2_CASELESS) != 0 && + (d = UCD_CASESET(c)) != 0) + { + *code++ = OP_NOTPROP; + *code++ = PT_CLIST; + *code++ = d; + break; /* We are finished with this class */ + } +#endif + /* Char has only one other case, or UCP not available */ + + *code++ = ((options & PCRE2_CASELESS) != 0)? OP_NOTI: OP_NOT; + code += PUTCHAR(c, code); + break; /* We are finished with this class */ + } /* End of 1-char optimization */ + + /* Handle character classes that contain more than just one literal + character. If there are exactly two characters in a positive class, see if + they are case partners. 
This can be optimized to generate a caseless single + character match (which also sets first/required code units if relevant). */ + + if (meta == META_CLASS && pptr[1] < META_END && pptr[2] < META_END && + pptr[3] == META_CLASS_END) + { + uint32_t c = pptr[1]; + +#ifdef SUPPORT_UNICODE + if (UCD_CASESET(c) == 0) +#endif + { + uint32_t d; + +#ifdef SUPPORT_UNICODE + if ((utf || ucp) && c > 127) d = UCD_OTHERCASE(c); else +#endif + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (c > 255) d = c; else +#endif + d = TABLE_GET(c, cb->fcc, c); + } + + if (c != d && pptr[2] == d) + { + pptr += 3; /* Move on to class end */ + meta = c; + if ((options & PCRE2_CASELESS) == 0) + { + reset_caseful = TRUE; + options |= PCRE2_CASELESS; + req_caseopt = REQ_CASELESS; + } + goto CLASS_CASELESS_CHAR; + } + } + } + + /* If a non-extended class contains a negative special such as \S, we need + to flip the negation flag at the end, so that support for characters > 255 + works correctly (they are all included in the class). An extended class may + need to insert specific matching or non-matching code for wide characters. + */ + + should_flip_negation = match_all_or_no_wide_chars = FALSE; + + /* Extended class (xclass) will be used when characters > 255 + might match. */ + +#ifdef SUPPORT_WIDE_CHARS + xclass = FALSE; + class_uchardata = code + LINK_SIZE + 2; /* For XCLASS items */ + class_uchardata_base = class_uchardata; /* Save the start */ +#endif + + /* For optimization purposes, we track some properties of the class: + class_has_8bitchar will be non-zero if the class contains at least one + character with a code point less than 256; xclass_has_prop will be TRUE if + Unicode property checks are present in the class. */ + + class_has_8bitchar = 0; +#ifdef SUPPORT_WIDE_CHARS + xclass_has_prop = FALSE; +#endif + + /* Initialize the 256-bit (32-byte) bit map to all zeros. 
We build the map + in a temporary bit of memory, in case the class contains fewer than two + 8-bit characters because in that case the compiled code doesn't use the bit + map. */ + + memset(classbits, 0, 32 * sizeof(uint8_t)); + + /* Process items until META_CLASS_END is reached. */ + + while ((meta = *(++pptr)) != META_CLASS_END) + { + /* Handle POSIX classes such as [:alpha:] etc. */ + + if (meta == META_POSIX || meta == META_POSIX_NEG) + { + BOOL local_negate = (meta == META_POSIX_NEG); + int posix_class = *(++pptr); + int taboffset, tabopt; + uint8_t pbits[32]; + + should_flip_negation = local_negate; /* Note negative special */ + + /* If matching is caseless, upper and lower are converted to alpha. + This relies on the fact that the class table starts with alpha, + lower, upper as the first 3 entries. */ + + if ((options & PCRE2_CASELESS) != 0 && posix_class <= 2) + posix_class = 0; + + /* When PCRE2_UCP is set, some of the POSIX classes are converted to + different escape sequences that use Unicode properties \p or \P. + Others that are not available via \p or \P have to generate + XCL_PROP/XCL_NOTPROP directly, which is done here. */ + +#ifdef SUPPORT_UNICODE + if ((options & PCRE2_UCP) != 0) switch(posix_class) + { + case PC_GRAPH: + case PC_PRINT: + case PC_PUNCT: + *class_uchardata++ = local_negate? XCL_NOTPROP : XCL_PROP; + *class_uchardata++ = (PCRE2_UCHAR) + ((posix_class == PC_GRAPH)? PT_PXGRAPH : + (posix_class == PC_PRINT)? PT_PXPRINT : PT_PXPUNCT); + *class_uchardata++ = 0; + xclass_has_prop = TRUE; + goto CONTINUE_CLASS; + + /* For the other POSIX classes (ascii, xdigit) we are going to + fall through to the non-UCP case and build a bit map for + characters with code points less than 256. However, if we are in + a negated POSIX class, characters with code points greater than + 255 must either all match or all not match, depending on whether + the whole class is not or is negated. For example, for + [[:^ascii:]... 
they must all match, whereas for [^[:^xdigit:]... + they must not. + + In the special case where there are no xclass items, this is + automatically handled by the use of OP_CLASS or OP_NCLASS, but an + explicit range is needed for OP_XCLASS. Setting a flag here + causes the range to be generated later when it is known that + OP_XCLASS is required. In the 8-bit library this is relevant only in + utf mode, since no wide characters can exist otherwise. */ + + default: +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (utf) +#endif + match_all_or_no_wide_chars |= local_negate; + break; + } +#endif /* SUPPORT_UNICODE */ + + /* In the non-UCP case, or when UCP makes no difference, we build the + bit map for the POSIX class in a chunk of local store because we may + be adding and subtracting from it, and we don't want to subtract bits + that may be in the main map already. At the end we or the result into + the bit map that is being built. */ + + posix_class *= 3; + + /* Copy in the first table (always present) */ + + memcpy(pbits, cbits + posix_class_maps[posix_class], + 32 * sizeof(uint8_t)); + + /* If there is a second table, add or remove it as required. */ + + taboffset = posix_class_maps[posix_class + 1]; + tabopt = posix_class_maps[posix_class + 2]; + + if (taboffset >= 0) + { + if (tabopt >= 0) + for (i = 0; i < 32; i++) pbits[i] |= cbits[(int)i + taboffset]; + else + for (i = 0; i < 32; i++) pbits[i] &= ~cbits[(int)i + taboffset]; + } + + /* Now see if we need to remove any special characters. An option + value of 1 removes vertical space and 2 removes underscore. */ + + if (tabopt < 0) tabopt = -tabopt; + if (tabopt == 1) pbits[1] &= ~0x3c; + else if (tabopt == 2) pbits[11] &= 0x7f; + + /* Add the POSIX table or its complement into the main table that is + being built and we are done. 
*/ + + if (local_negate) + for (i = 0; i < 32; i++) classbits[i] |= ~pbits[i]; + else + for (i = 0; i < 32; i++) classbits[i] |= pbits[i]; + + /* Every class contains at least one < 256 character. */ + + class_has_8bitchar = 1; + goto CONTINUE_CLASS; /* End of POSIX handling */ + } + + /* Other than POSIX classes, the only items we should encounter are + \d-type escapes and literal characters (possibly as ranges). */ + + if (meta == META_BIGVALUE) + { + meta = *(++pptr); + goto CLASS_LITERAL; + } + + /* Any other non-literal must be an escape */ + + if (meta >= META_END) + { + if (META_CODE(meta) != META_ESCAPE) + { +#ifdef DEBUG_SHOW_PARSED + fprintf(stderr, "** Unrecognized parsed pattern item 0x%.8x " + "in character class\n", meta); +#endif + *errorcodeptr = ERR89; /* Internal error - unrecognized. */ + return 0; + } + escape = META_DATA(meta); + + /* Every class contains at least one < 256 character. */ + + class_has_8bitchar++; + + switch(escape) + { + case ESC_d: + for (i = 0; i < 32; i++) classbits[i] |= cbits[i+cbit_digit]; + break; + + case ESC_D: + should_flip_negation = TRUE; + for (i = 0; i < 32; i++) classbits[i] |= ~cbits[i+cbit_digit]; + break; + + case ESC_w: + for (i = 0; i < 32; i++) classbits[i] |= cbits[i+cbit_word]; + break; + + case ESC_W: + should_flip_negation = TRUE; + for (i = 0; i < 32; i++) classbits[i] |= ~cbits[i+cbit_word]; + break; + + /* Perl 5.004 onwards omitted VT from \s, but restored it at Perl + 5.18. Before PCRE 8.34, we had to preserve the VT bit if it was + previously set by something earlier in the character class. + Luckily, the value of CHAR_VT is 0x0b in both ASCII and EBCDIC, so + we could just adjust the appropriate bit. From PCRE 8.34 we no + longer treat \s and \S specially. 
*/ + + case ESC_s: + for (i = 0; i < 32; i++) classbits[i] |= cbits[i+cbit_space]; + break; + + case ESC_S: + should_flip_negation = TRUE; + for (i = 0; i < 32; i++) classbits[i] |= ~cbits[i+cbit_space]; + break; + + /* When adding the horizontal or vertical space lists to a class, or + their complements, disable PCRE2_CASELESS, because it just wastes + time, and in the "not-x" UTF cases can create unwanted duplicates in + the XCLASS list (provoked by characters that have more than one other + case and by both cases being in the same "not-x" sublist). */ + + case ESC_h: + (void)add_list_to_class(classbits, &class_uchardata, + options & ~PCRE2_CASELESS, cb, PRIV(hspace_list), NOTACHAR); + break; + + case ESC_H: + (void)add_not_list_to_class(classbits, &class_uchardata, + options & ~PCRE2_CASELESS, cb, PRIV(hspace_list)); + break; + + case ESC_v: + (void)add_list_to_class(classbits, &class_uchardata, + options & ~PCRE2_CASELESS, cb, PRIV(vspace_list), NOTACHAR); + break; + + case ESC_V: + (void)add_not_list_to_class(classbits, &class_uchardata, + options & ~PCRE2_CASELESS, cb, PRIV(vspace_list)); + break; + + /* If Unicode is not supported, \P and \p are not allowed and are + faulted at parse time, so will never appear here. */ + +#ifdef SUPPORT_UNICODE + case ESC_p: + case ESC_P: + { + uint32_t ptype = *(++pptr) >> 16; + uint32_t pdata = *pptr & 0xffff; + *class_uchardata++ = (escape == ESC_p)? XCL_PROP : XCL_NOTPROP; + *class_uchardata++ = ptype; + *class_uchardata++ = pdata; + xclass_has_prop = TRUE; + class_has_8bitchar--; /* Undo! */ + } + break; +#endif + } + + goto CONTINUE_CLASS; + } /* End handling \d-type escapes */ + + /* A literal character may be followed by a range meta. At parse time + there are checks for out-of-order characters, for ranges where the two + characters are equal, and for hyphens that cannot indicate a range. At + this point, therefore, no checking is needed. 
*/ + + else + { + uint32_t c, d; + + CLASS_LITERAL: + c = d = meta; + + /* Remember if \r or \n were explicitly used */ + + if (c == CHAR_CR || c == CHAR_NL) cb->external_flags |= PCRE2_HASCRORLF; + + /* Process a character range */ + + if (pptr[1] == META_RANGE_LITERAL || pptr[1] == META_RANGE_ESCAPED) + { +#ifdef EBCDIC + BOOL range_is_literal = (pptr[1] == META_RANGE_LITERAL); +#endif + pptr += 2; + d = *pptr; + if (d == META_BIGVALUE) d = *(++pptr); + + /* Remember an explicit \r or \n, and add the range to the class. */ + + if (d == CHAR_CR || d == CHAR_NL) cb->external_flags |= PCRE2_HASCRORLF; + + /* In an EBCDIC environment, Perl treats alphabetic ranges specially + because there are holes in the encoding, and simply using the range + A-Z (for example) would include the characters in the holes. This + applies only to literal ranges; [\xC1-\xE9] is different to [A-Z]. */ + +#ifdef EBCDIC + if (range_is_literal && + (cb->ctypes[c] & ctype_letter) != 0 && + (cb->ctypes[d] & ctype_letter) != 0 && + (c <= CHAR_z) == (d <= CHAR_z)) + { + uint32_t uc = (d <= CHAR_z)? 0 : 64; + uint32_t C = c - uc; + uint32_t D = d - uc; + + if (C <= CHAR_i) + { + class_has_8bitchar += + add_to_class(classbits, &class_uchardata, options, cb, C + uc, + ((D < CHAR_i)? D : CHAR_i) + uc); + C = CHAR_j; + } + + if (C <= D && C <= CHAR_r) + { + class_has_8bitchar += + add_to_class(classbits, &class_uchardata, options, cb, C + uc, + ((D < CHAR_r)? D : CHAR_r) + uc); + C = CHAR_s; + } + + if (C <= D) + { + class_has_8bitchar += + add_to_class(classbits, &class_uchardata, options, cb, C + uc, + D + uc); + } + } + else +#endif + /* Not an EBCDIC special range */ + + class_has_8bitchar += + add_to_class(classbits, &class_uchardata, options, cb, c, d); + goto CONTINUE_CLASS; /* Go get the next char in the class */ + } /* End of range handling */ + + + /* Handle a single character. 
*/ + + class_has_8bitchar += + add_to_class(classbits, &class_uchardata, options, cb, meta, meta); + } + + /* Continue to the next item in the class. */ + + CONTINUE_CLASS: + +#ifdef SUPPORT_WIDE_CHARS + /* If any wide characters or Unicode properties have been encountered, + set xclass = TRUE. Then, in the pre-compile phase, accumulate the length + of the extra data and reset the pointer. This is so that very large + classes that contain a zillion wide characters or Unicode property tests + do not overwrite the workspace (which is on the stack). */ + + if (class_uchardata > class_uchardata_base) + { + xclass = TRUE; + if (lengthptr != NULL) + { + *lengthptr += class_uchardata - class_uchardata_base; + class_uchardata = class_uchardata_base; + } + } +#endif + + continue; /* Needed to avoid error when not supporting wide chars */ + } /* End of main class-processing loop */ + + /* If this class is the first thing in the branch, there can be no first + char setting, whatever the repeat count. Any reqcu setting must remain + unchanged after any kind of repeat. */ + + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + + /* If there are characters with values > 255, or Unicode property settings + (\p or \P), we have to compile an extended class, with its own opcode, + unless there were no property settings and there was a negated special such + as \S in the class, and PCRE2_UCP is not set, because in that case all + characters > 255 are in or not in the class, so any that were explicitly + given as well can be ignored. + + In the UCP case, if certain negated POSIX classes ([:^ascii:] or + [^:xdigit:]) were present in a class, we either have to match or not match + all wide characters (depending on whether the whole class is or is not + negated). This requirement is indicated by match_all_or_no_wide_chars being + true. 
We do this by including an explicit range, which works in both cases. + This applies only in UTF and 16-bit and 32-bit non-UTF modes, since there + cannot be any wide characters in 8-bit non-UTF mode. + + When there *are* properties in a positive UTF-8 or any 16-bit or 32-bit + class where \S etc is present without PCRE2_UCP, causing an extended class + to be compiled, we make sure that all characters > 255 are included by + forcing match_all_or_no_wide_chars to be true. + + If, when generating an xclass, there are no characters < 256, we can omit + the bitmap in the actual compiled code. */ + +#ifdef SUPPORT_WIDE_CHARS /* Defined for 16/32 bits, or 8-bit with Unicode */ + if (xclass && ( +#ifdef SUPPORT_UNICODE + (options & PCRE2_UCP) != 0 || +#endif + xclass_has_prop || !should_flip_negation)) + { + if (match_all_or_no_wide_chars || ( +#if PCRE2_CODE_UNIT_WIDTH == 8 + utf && +#endif + should_flip_negation && !negate_class && (options & PCRE2_UCP) == 0)) + { + *class_uchardata++ = XCL_RANGE; + if (utf) /* Will always be utf in the 8-bit library */ + { + class_uchardata += PRIV(ord2utf)(0x100, class_uchardata); + class_uchardata += PRIV(ord2utf)(MAX_UTF_CODE_POINT, class_uchardata); + } + else /* Can only happen for the 16-bit & 32-bit libraries */ + { +#if PCRE2_CODE_UNIT_WIDTH == 16 + *class_uchardata++ = 0x100; + *class_uchardata++ = 0xffffu; +#elif PCRE2_CODE_UNIT_WIDTH == 32 + *class_uchardata++ = 0x100; + *class_uchardata++ = 0xffffffffu; +#endif + } + } + *class_uchardata++ = XCL_END; /* Marks the end of extra data */ + *code++ = OP_XCLASS; + code += LINK_SIZE; + *code = negate_class? XCL_NOT:0; + if (xclass_has_prop) *code |= XCL_HASPROP; + + /* If the map is required, move up the extra data to make room for it; + otherwise just move the code pointer to the end of the extra data. 
*/ + + if (class_has_8bitchar > 0) + { + *code++ |= XCL_MAP; + (void)memmove(code + (32 / sizeof(PCRE2_UCHAR)), code, + CU2BYTES(class_uchardata - code)); + if (negate_class && !xclass_has_prop) + { + /* Using 255 ^ instead of ~ avoids clang sanitize warning. */ + for (i = 0; i < 32; i++) classbits[i] = 255 ^ classbits[i]; + } + memcpy(code, classbits, 32); + code = class_uchardata + (32 / sizeof(PCRE2_UCHAR)); + } + else code = class_uchardata; + + /* Now fill in the complete length of the item */ + + PUT(previous, 1, (int)(code - previous)); + break; /* End of class handling */ + } +#endif /* SUPPORT_WIDE_CHARS */ + + /* If there are no characters > 255, or they are all to be included or + excluded, set the opcode to OP_CLASS or OP_NCLASS, depending on whether the + whole class was negated and whether there were negative specials such as \S + (non-UCP) in the class. Then copy the 32-byte map into the code vector, + negating it if necessary. */ + + *code++ = (negate_class == should_flip_negation) ? OP_CLASS : OP_NCLASS; + if (lengthptr == NULL) /* Save time in the pre-compile phase */ + { + if (negate_class) + { + /* Using 255 ^ instead of ~ avoids clang sanitize warning. */ + for (i = 0; i < 32; i++) classbits[i] = 255 ^ classbits[i]; + } + memcpy(code, classbits, 32); + } + code += 32 / sizeof(PCRE2_UCHAR); + break; /* End of class processing */ + + + /* ===================================================================*/ + /* Deal with (*VERB)s. */ + + /* Check for open captures before ACCEPT and close those that are within + the same assertion level, also converting ACCEPT to ASSERT_ACCEPT in an + assertion. In the first pass, just accumulate the length required; + otherwise hitting (*ACCEPT) inside many nested parentheses can cause + workspace overflow. Do not set firstcu after *ACCEPT. 
*/ + + case META_ACCEPT: + cb->had_accept = had_accept = TRUE; + for (oc = cb->open_caps; + oc != NULL && oc->assert_depth >= cb->assert_depth; + oc = oc->next) + { + if (lengthptr != NULL) + { + *lengthptr += CU2BYTES(1) + IMM2_SIZE; + } + else + { + *code++ = OP_CLOSE; + PUT2INC(code, 0, oc->number); + } + } + *code++ = (cb->assert_depth > 0)? OP_ASSERT_ACCEPT : OP_ACCEPT; + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + break; + + case META_PRUNE: + case META_SKIP: + cb->had_pruneorskip = TRUE; + /* Fall through */ + case META_COMMIT: + case META_FAIL: + *code++ = verbops[(meta - META_MARK) >> 16]; + break; + + case META_THEN: + cb->external_flags |= PCRE2_HASTHEN; + *code++ = OP_THEN; + break; + + /* Handle verbs with arguments. Arguments can be very long, especially in + 16- and 32-bit modes, and can overflow the workspace in the first pass. + However, the argument length is constrained to be small enough to fit in + one code unit. This check happens in parse_regex(). In the first pass, + instead of putting the argument into memory, we just update the length + counter and set up an empty argument. */ + + case META_THEN_ARG: + cb->external_flags |= PCRE2_HASTHEN; + goto VERB_ARG; + + case META_PRUNE_ARG: + case META_SKIP_ARG: + cb->had_pruneorskip = TRUE; + /* Fall through */ + case META_MARK: + case META_COMMIT_ARG: + VERB_ARG: + *code++ = verbops[(meta - META_MARK) >> 16]; + /* The length is in characters. 
*/ + verbarglen = *(++pptr); + verbculen = 0; + tempcode = code++; + for (i = 0; i < (int)verbarglen; i++) + { + meta = *(++pptr); +#ifdef SUPPORT_UNICODE + if (utf) mclength = PRIV(ord2utf)(meta, mcbuffer); else +#endif + { + mclength = 1; + mcbuffer[0] = meta; + } + if (lengthptr != NULL) *lengthptr += mclength; else + { + memcpy(code, mcbuffer, CU2BYTES(mclength)); + code += mclength; + verbculen += mclength; + } + } + + *tempcode = verbculen; /* Fill in the code unit length */ + *code++ = 0; /* Terminating zero */ + break; + + + /* ===================================================================*/ + /* Handle options change. The new setting must be passed back for use in + subsequent branches. Reset the greedy defaults and the case value for + firstcu and reqcu. */ + + case META_OPTIONS: + *optionsptr = options = *(++pptr); + greedy_default = ((options & PCRE2_UNGREEDY) != 0); + greedy_non_default = greedy_default ^ 1; + req_caseopt = ((options & PCRE2_CASELESS) != 0)? REQ_CASELESS : 0; + break; + + + /* ===================================================================*/ + /* Handle conditional subpatterns. The case of (?(Rdigits) is ambiguous + because it could be a numerical check on recursion, or a name check on a + group's being set. The pre-pass sets up META_COND_RNUMBER as a name so that + we can handle it either way. We first try for a name; if not found, process + the number. */ + + case META_COND_RNUMBER: /* (?(Rdigits) */ + case META_COND_NAME: /* (?(name) or (?'name') or ?() */ + case META_COND_RNAME: /* (?(R&name) - test for recursion */ + bravalue = OP_COND; + { + int count, index; + PCRE2_SPTR name; + named_group *ng = cb->named_groups; + uint32_t length = *(++pptr); + + GETPLUSOFFSET(offset, pptr); + name = cb->start_pattern + offset; + + /* In the first pass, the names generated in the pre-pass are available, + but the main name table has not yet been created. 
Scan the list of names + generated in the pre-pass in order to get a number and whether or not + this name is duplicated. If it is not duplicated, we can handle it as a + numerical group. */ + + for (i = 0; i < cb->names_found; i++, ng++) + { + if (length == ng->length && + PRIV(strncmp)(name, ng->name, length) == 0) + { + if (!ng->isdup) + { + code[1+LINK_SIZE] = (meta == META_COND_RNAME)? OP_RREF : OP_CREF; + PUT2(code, 2+LINK_SIZE, ng->number); + if (ng->number > cb->top_backref) cb->top_backref = ng->number; + skipunits = 1+IMM2_SIZE; + goto GROUP_PROCESS_NOTE_EMPTY; + } + break; /* Found a duplicated name */ + } + } + + /* If the name was not found we have a bad reference, unless we are + dealing with R, which is treated as a recursion test by number. + */ + + if (i >= cb->names_found) + { + groupnumber = 0; + if (meta == META_COND_RNUMBER) + { + for (i = 1; i < (int)length; i++) + { + groupnumber = groupnumber * 10 + name[i] - CHAR_0; + if (groupnumber > MAX_GROUP_NUMBER) + { + *errorcodeptr = ERR61; + cb->erroroffset = offset + i; + return 0; + } + } + } + + if (meta != META_COND_RNUMBER || groupnumber > cb->bracount) + { + *errorcodeptr = ERR15; + cb->erroroffset = offset; + return 0; + } + + /* (?Rdigits) treated as a recursion reference by number. A value of + zero (which is the result of both (?R) and (?R0)) means "any", and is + translated into RREF_ANY (which is 0xffff). */ + + if (groupnumber == 0) groupnumber = RREF_ANY; + code[1+LINK_SIZE] = OP_RREF; + PUT2(code, 2+LINK_SIZE, groupnumber); + skipunits = 1+IMM2_SIZE; + goto GROUP_PROCESS_NOTE_EMPTY; + } + + /* A duplicated name was found. Note that if an R name is found + (META_COND_RNUMBER), it is a reference test, not a recursion test. */ + + code[1+LINK_SIZE] = (meta == META_COND_RNAME)? OP_RREF : OP_CREF; + + /* We have a duplicated name. In the compile pass we have to search the + main table in order to get the index and count values. 
*/ + + count = 0; /* Values for first pass (avoids compiler warning) */ + index = 0; + if (lengthptr == NULL && !find_dupname_details(name, length, &index, + &count, errorcodeptr, cb)) return 0; + + /* Add one to the opcode to change CREF/RREF into DNCREF/DNRREF and + insert appropriate data values. */ + + code[1+LINK_SIZE]++; + skipunits = 1+2*IMM2_SIZE; + PUT2(code, 2+LINK_SIZE, index); + PUT2(code, 2+LINK_SIZE+IMM2_SIZE, count); + } + goto GROUP_PROCESS_NOTE_EMPTY; + + /* The DEFINE condition is always false. Its internal groups may never + be called, so matched_char must remain false, hence the jump to + GROUP_PROCESS rather than GROUP_PROCESS_NOTE_EMPTY. */ + + case META_COND_DEFINE: + bravalue = OP_COND; + GETPLUSOFFSET(offset, pptr); + code[1+LINK_SIZE] = OP_DEFINE; + skipunits = 1; + goto GROUP_PROCESS; + + /* Conditional test of a group's being set. */ + + case META_COND_NUMBER: + bravalue = OP_COND; + GETPLUSOFFSET(offset, pptr); + groupnumber = *(++pptr); + if (groupnumber > cb->bracount) + { + *errorcodeptr = ERR15; + cb->erroroffset = offset; + return 0; + } + if (groupnumber > cb->top_backref) cb->top_backref = groupnumber; + offset -= 2; /* Point at initial ( for too many branches error */ + code[1+LINK_SIZE] = OP_CREF; + skipunits = 1+IMM2_SIZE; + PUT2(code, 2+LINK_SIZE, groupnumber); + goto GROUP_PROCESS_NOTE_EMPTY; + + /* Test for the PCRE2 version. */ + + case META_COND_VERSION: + bravalue = OP_COND; + if (pptr[1] > 0) + code[1+LINK_SIZE] = ((PCRE2_MAJOR > pptr[2]) || + (PCRE2_MAJOR == pptr[2] && PCRE2_MINOR >= pptr[3]))? + OP_TRUE : OP_FALSE; + else + code[1+LINK_SIZE] = (PCRE2_MAJOR == pptr[2] && PCRE2_MINOR == pptr[3])? + OP_TRUE : OP_FALSE; + skipunits = 1; + pptr += 3; + goto GROUP_PROCESS_NOTE_EMPTY; + + /* The condition is an assertion, possibly preceded by a callout. 
*/ + + case META_COND_ASSERT: + bravalue = OP_COND; + goto GROUP_PROCESS_NOTE_EMPTY; + + + /* ===================================================================*/ + /* Handle all kinds of nested bracketed groups. The non-capturing, + non-conditional cases are here; others come to GROUP_PROCESS via goto. */ + + case META_LOOKAHEAD: + bravalue = OP_ASSERT; + cb->assert_depth += 1; + goto GROUP_PROCESS; + + case META_LOOKAHEAD_NA: + bravalue = OP_ASSERT_NA; + cb->assert_depth += 1; + goto GROUP_PROCESS; + + /* Optimize (?!) to (*FAIL) unless it is quantified - which is a weird + thing to do, but Perl allows all assertions to be quantified, and when + they contain capturing parentheses there may be a potential use for + this feature. Not that that applies to a quantified (?!) but we allow + it for uniformity. */ + + case META_LOOKAHEADNOT: + if (pptr[1] == META_KET && + (pptr[2] < META_ASTERISK || pptr[2] > META_MINMAX_QUERY)) + { + *code++ = OP_FAIL; + pptr++; + } + else + { + bravalue = OP_ASSERT_NOT; + cb->assert_depth += 1; + goto GROUP_PROCESS; + } + break; + + case META_LOOKBEHIND: + bravalue = OP_ASSERTBACK; + cb->assert_depth += 1; + goto GROUP_PROCESS; + + case META_LOOKBEHINDNOT: + bravalue = OP_ASSERTBACK_NOT; + cb->assert_depth += 1; + goto GROUP_PROCESS; + + case META_LOOKBEHIND_NA: + bravalue = OP_ASSERTBACK_NA; + cb->assert_depth += 1; + goto GROUP_PROCESS; + + case META_ATOMIC: + bravalue = OP_ONCE; + goto GROUP_PROCESS_NOTE_EMPTY; + + case META_SCRIPT_RUN: + bravalue = OP_SCRIPT_RUN; + goto GROUP_PROCESS_NOTE_EMPTY; + + case META_NOCAPTURE: + bravalue = OP_BRA; + /* Fall through */ + + /* Process nested bracketed regex. The nesting depth is maintained for the + benefit of the stackguard function. The test for too deep nesting is now + done in parse_regex(). Assertion and DEFINE groups come to GROUP_PROCESS; + others come to GROUP_PROCESS_NOTE_EMPTY, to indicate that we need to take + note of whether or not they may match an empty string. 
*/ + + GROUP_PROCESS_NOTE_EMPTY: + note_group_empty = TRUE; + + GROUP_PROCESS: + cb->parens_depth += 1; + *code = bravalue; + pptr++; + tempcode = code; + tempreqvary = cb->req_varyopt; /* Save value before group */ + length_prevgroup = 0; /* Initialize for pre-compile phase */ + + if ((group_return = + compile_regex( + options, /* The option state */ + &tempcode, /* Where to put code (updated) */ + &pptr, /* Input pointer (updated) */ + errorcodeptr, /* Where to put an error message */ + skipunits, /* Skip over bracket number */ + &subfirstcu, /* For possible first char */ + &subfirstcuflags, + &subreqcu, /* For possible last char */ + &subreqcuflags, + bcptr, /* Current branch chain */ + cb, /* Compile data block */ + (lengthptr == NULL)? NULL : /* Actual compile phase */ + &length_prevgroup /* Pre-compile phase */ + )) == 0) + return 0; /* Error */ + + cb->parens_depth -= 1; + + /* If that was a non-conditional significant group (not an assertion, not a + DEFINE) that matches at least one character, then the current item matches + a character. Conditionals are handled below. */ + + if (note_group_empty && bravalue != OP_COND && group_return > 0) + matched_char = TRUE; + + /* If we've just compiled an assertion, pop the assert depth. */ + + if (bravalue >= OP_ASSERT && bravalue <= OP_ASSERTBACK_NA) + cb->assert_depth -= 1; + + /* At the end of compiling, code is still pointing to the start of the + group, while tempcode has been updated to point past the end of the group. + The parsed pattern pointer (pptr) is on the closing META_KET. + + If this is a conditional bracket, check that there are no more than + two branches in the group, or just one if it's a DEFINE group. We do this + in the real compile phase, not in the pre-pass, where the whole group may + not be available. 
*/ + + if (bravalue == OP_COND && lengthptr == NULL) + { + PCRE2_UCHAR *tc = code; + int condcount = 0; + + do { + condcount++; + tc += GET(tc,1); + } + while (*tc != OP_KET); + + /* A DEFINE group is never obeyed inline (the "condition" is always + false). It must have only one branch. Having checked this, change the + opcode to OP_FALSE. */ + + if (code[LINK_SIZE+1] == OP_DEFINE) + { + if (condcount > 1) + { + cb->erroroffset = offset; + *errorcodeptr = ERR54; + return 0; + } + code[LINK_SIZE+1] = OP_FALSE; + bravalue = OP_DEFINE; /* A flag to suppress char handling below */ + } + + /* A "normal" conditional group. If there is just one branch, we must not + make use of its firstcu or reqcu, because this is equivalent to an + empty second branch. Also, it may match an empty string. If there are two + branches, this item must match a character if the group must. */ + + else + { + if (condcount > 2) + { + cb->erroroffset = offset; + *errorcodeptr = ERR27; + return 0; + } + if (condcount == 1) subfirstcuflags = subreqcuflags = REQ_NONE; + else if (group_return > 0) matched_char = TRUE; + } + } + + /* In the pre-compile phase, update the length by the length of the group, + less the brackets at either end. Then reduce the compiled code to just a + set of non-capturing brackets so that it doesn't use much memory if it is + duplicated by a quantifier.*/ + + if (lengthptr != NULL) + { + if (OFLOW_MAX - *lengthptr < length_prevgroup - 2 - 2*LINK_SIZE) + { + *errorcodeptr = ERR20; + return 0; + } + *lengthptr += length_prevgroup - 2 - 2*LINK_SIZE; + code++; /* This already contains bravalue */ + PUTINC(code, 0, 1 + LINK_SIZE); + *code++ = OP_KET; + PUTINC(code, 0, 1 + LINK_SIZE); + break; /* No need to waste time with special character handling */ + } + + /* Otherwise update the main code pointer to the end of the group. */ + + code = tempcode; + + /* For a DEFINE group, required and first character settings are not + relevant. 
*/ + + if (bravalue == OP_DEFINE) break; + + /* Handle updating of the required and first code units for other types of + group. Update for normal brackets of all kinds, and conditions with two + branches (see code above). If the bracket is followed by a quantifier with + zero repeat, we have to back off. Hence the definition of zeroreqcu and + zerofirstcu outside the main loop so that they can be accessed for the back + off. */ + + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + groupsetfirstcu = FALSE; + + if (bravalue >= OP_ONCE) /* Not an assertion */ + { + /* If we have not yet set a firstcu in this branch, take it from the + subpattern, remembering that it was set here so that a repeat of more + than one can replicate it as reqcu if necessary. If the subpattern has + no firstcu, set "none" for the whole branch. In both cases, a zero + repeat forces firstcu to "none". */ + + if (firstcuflags == REQ_UNSET && subfirstcuflags != REQ_UNSET) + { + if (subfirstcuflags >= 0) + { + firstcu = subfirstcu; + firstcuflags = subfirstcuflags; + groupsetfirstcu = TRUE; + } + else firstcuflags = REQ_NONE; + zerofirstcuflags = REQ_NONE; + } + + /* If firstcu was previously set, convert the subpattern's firstcu + into reqcu if there wasn't one, using the vary flag that was in + existence beforehand. */ + + else if (subfirstcuflags >= 0 && subreqcuflags < 0) + { + subreqcu = subfirstcu; + subreqcuflags = subfirstcuflags | tempreqvary; + } + + /* If the subpattern set a required code unit (or set a first code unit + that isn't really the first code unit - see above), set it. */ + + if (subreqcuflags >= 0) + { + reqcu = subreqcu; + reqcuflags = subreqcuflags; + } + } + + /* For a forward assertion, we take the reqcu, if set, provided that the + group has also set a firstcu. This can be helpful if the pattern that + follows the assertion doesn't set a different char. For example, it's + useful for /(?=abcde).+/. 
We can't set firstcu for an assertion, however + because it leads to incorrect effect for patterns such as /(?=a)a.+/ when + the "real" "a" would then become a reqcu instead of a firstcu. This is + overcome by a scan at the end if there's no firstcu, looking for an + asserted first char. A similar effect for patterns like /(?=.*X)X$/ means + we must only take the reqcu when the group also set a firstcu. Otherwise, + in that example, 'X' ends up set for both. */ + + else if ((bravalue == OP_ASSERT || bravalue == OP_ASSERT_NA) && + subreqcuflags >= 0 && subfirstcuflags >= 0) + { + reqcu = subreqcu; + reqcuflags = subreqcuflags; + } + + break; /* End of nested group handling */ + + + /* ===================================================================*/ + /* Handle named backreferences and recursions. */ + + case META_BACKREF_BYNAME: + case META_RECURSE_BYNAME: + { + int count, index; + PCRE2_SPTR name; + BOOL is_dupname = FALSE; + named_group *ng = cb->named_groups; + uint32_t length = *(++pptr); + + GETPLUSOFFSET(offset, pptr); + name = cb->start_pattern + offset; + + /* In the first pass, the names generated in the pre-pass are available, + but the main name table has not yet been created. Scan the list of names + generated in the pre-pass in order to get a number and whether or not + this name is duplicated. */ + + groupnumber = 0; + for (i = 0; i < cb->names_found; i++, ng++) + { + if (length == ng->length && + PRIV(strncmp)(name, ng->name, length) == 0) + { + is_dupname = ng->isdup; + groupnumber = ng->number; + + /* For a recursion, that's all that is needed. We can now go to + the code that handles numerical recursion, applying it to the first + group with the given name. */ + + if (meta == META_RECURSE_BYNAME) + { + meta_arg = groupnumber; + goto HANDLE_NUMERICAL_RECURSION; + } + + /* For a back reference, update the back reference map and the + maximum back reference. */ + + cb->backref_map |= (groupnumber < 32)? 
(1u << groupnumber) : 1; + if (groupnumber > cb->top_backref) + cb->top_backref = groupnumber; + } + } + + /* If the name was not found we have a bad reference. */ + + if (groupnumber == 0) + { + *errorcodeptr = ERR15; + cb->erroroffset = offset; + return 0; + } + + /* If a back reference name is not duplicated, we can handle it as + a numerical reference. */ + + if (!is_dupname) + { + meta_arg = groupnumber; + goto HANDLE_SINGLE_REFERENCE; + } + + /* If a back reference name is duplicated, we generate a different + opcode to a numerical back reference. In the second pass we must + search for the index and count in the final name table. */ + + count = 0; /* Values for first pass (avoids compiler warning) */ + index = 0; + if (lengthptr == NULL && !find_dupname_details(name, length, &index, + &count, errorcodeptr, cb)) return 0; + + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + *code++ = ((options & PCRE2_CASELESS) != 0)? OP_DNREFI : OP_DNREF; + PUT2INC(code, 0, index); + PUT2INC(code, 0, count); + } + break; + + + /* ===================================================================*/ + /* Handle a numerical callout. */ + + case META_CALLOUT_NUMBER: + code[0] = OP_CALLOUT; + PUT(code, 1, pptr[1]); /* Offset to next pattern item */ + PUT(code, 1 + LINK_SIZE, pptr[2]); /* Length of next pattern item */ + code[1 + 2*LINK_SIZE] = pptr[3]; + pptr += 3; + code += PRIV(OP_lengths)[OP_CALLOUT]; + break; + + + /* ===================================================================*/ + /* Handle a callout with a string argument. In the pre-pass we just compute + the length without generating anything. The length in pptr[3] includes both + delimiters; in the actual compile only the first one is copied, but a + terminating zero is added. Any doubled delimiters within the string make + this an overestimate, but it is not worth bothering about. 
*/ + + case META_CALLOUT_STRING: + if (lengthptr != NULL) + { + *lengthptr += pptr[3] + (1 + 4*LINK_SIZE); + pptr += 3; + SKIPOFFSET(pptr); + } + + /* In the real compile we can copy the string. The starting delimiter is + included so that the client can discover it if they want. We also pass the + start offset to help a script language give better error messages. */ + + else + { + PCRE2_SPTR pp; + uint32_t delimiter; + uint32_t length = pptr[3]; + PCRE2_UCHAR *callout_string = code + (1 + 4*LINK_SIZE); + + code[0] = OP_CALLOUT_STR; + PUT(code, 1, pptr[1]); /* Offset to next pattern item */ + PUT(code, 1 + LINK_SIZE, pptr[2]); /* Length of next pattern item */ + + pptr += 3; + GETPLUSOFFSET(offset, pptr); /* Offset to string in pattern */ + pp = cb->start_pattern + offset; + delimiter = *callout_string++ = *pp++; + if (delimiter == CHAR_LEFT_CURLY_BRACKET) + delimiter = CHAR_RIGHT_CURLY_BRACKET; + PUT(code, 1 + 3*LINK_SIZE, (int)(offset + 1)); /* One after delimiter */ + + /* The syntax of the pattern was checked in the parsing scan. The length + includes both delimiters, but we have passed the opening one just above, + so we reduce length before testing it. The test is for > 1 because we do + not want to copy the final delimiter. This also ensures that pp[1] is + accessible. */ + + while (--length > 1) + { + if (*pp == delimiter && pp[1] == delimiter) + { + *callout_string++ = delimiter; + pp += 2; + length--; + } + else *callout_string++ = *pp++; + } + *callout_string++ = CHAR_NUL; + + /* Set the length of the entire item, then advance to its end. */ + + PUT(code, 1 + 2*LINK_SIZE, (int)(callout_string - code)); + code = callout_string; + } + break; + + + /* ===================================================================*/ + /* Handle repetition. The different types are all sorted out in the parsing + pass.
*/ + + case META_MINMAX_PLUS: + case META_MINMAX_QUERY: + case META_MINMAX: + repeat_min = *(++pptr); + repeat_max = *(++pptr); + goto REPEAT; + + case META_ASTERISK: + case META_ASTERISK_PLUS: + case META_ASTERISK_QUERY: + repeat_min = 0; + repeat_max = REPEAT_UNLIMITED; + goto REPEAT; + + case META_PLUS: + case META_PLUS_PLUS: + case META_PLUS_QUERY: + repeat_min = 1; + repeat_max = REPEAT_UNLIMITED; + goto REPEAT; + + case META_QUERY: + case META_QUERY_PLUS: + case META_QUERY_QUERY: + repeat_min = 0; + repeat_max = 1; + + REPEAT: + if (previous_matched_char && repeat_min > 0) matched_char = TRUE; + + /* Remember whether this is a variable length repeat, and default to + single-char opcodes. */ + + reqvary = (repeat_min == repeat_max)? 0 : REQ_VARY; + op_type = 0; + + /* Adjust first and required code units for a zero repeat. */ + + if (repeat_min == 0) + { + firstcu = zerofirstcu; + firstcuflags = zerofirstcuflags; + reqcu = zeroreqcu; + reqcuflags = zeroreqcuflags; + } + + /* Note the greediness and possessiveness. */ + + switch (meta) + { + case META_MINMAX_PLUS: + case META_ASTERISK_PLUS: + case META_PLUS_PLUS: + case META_QUERY_PLUS: + repeat_type = 0; /* Force greedy */ + possessive_quantifier = TRUE; + break; + + case META_MINMAX_QUERY: + case META_ASTERISK_QUERY: + case META_PLUS_QUERY: + case META_QUERY_QUERY: + repeat_type = greedy_non_default; + possessive_quantifier = FALSE; + break; + + default: + repeat_type = greedy_default; + possessive_quantifier = FALSE; + break; + } + + /* Save start of previous item, in case we have to move it up in order to + insert something before it, and remember what it was. */ + + tempcode = previous; + op_previous = *previous; + + /* Now handle repetition for the different types of item. If the repeat + minimum and the repeat maximum are both 1, we can ignore the quantifier for + non-parenthesized items, as they have only one alternative. For anything in + parentheses, we must not ignore if {1} is possessive. 
*/ + + switch (op_previous) + { + /* If previous was a character or negated character match, abolish the + item and generate a repeat item instead. If a char item has a minimum of + more than one, ensure that it is set in reqcu - it might not be if a + sequence such as x{3} is the first thing in a branch because the x will + have gone into firstcu instead. */ + + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + if (repeat_max == 1 && repeat_min == 1) goto END_REPEAT; + op_type = chartypeoffset[op_previous - OP_CHAR]; + + /* Deal with UTF characters that take up more than one code unit. */ + +#ifdef MAYBE_UTF_MULTI + if (utf && NOT_FIRSTCU(code[-1])) + { + PCRE2_UCHAR *lastchar = code - 1; + BACKCHAR(lastchar); + mclength = (uint32_t)(code - lastchar); /* Length of UTF character */ + memcpy(mcbuffer, lastchar, CU2BYTES(mclength)); /* Save the char */ + } + else +#endif /* MAYBE_UTF_MULTI */ + + /* Handle the case of a single code unit - either with no UTF support, or + with UTF disabled, or for a single-code-unit UTF character. */ + { + mcbuffer[0] = code[-1]; + mclength = 1; + if (op_previous <= OP_CHARI && repeat_min > 1) + { + reqcu = mcbuffer[0]; + reqcuflags = req_caseopt | cb->req_varyopt; + } + } + goto OUTPUT_SINGLE_REPEAT; /* Code shared with single character types */ + + /* If previous was a character class or a back reference, we put the + repeat stuff after it, but just skip the item if the repeat was {0,0}. 
*/ + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: +#endif + case OP_CLASS: + case OP_NCLASS: + case OP_REF: + case OP_REFI: + case OP_DNREF: + case OP_DNREFI: + + if (repeat_max == 0) + { + code = previous; + goto END_REPEAT; + } + if (repeat_max == 1 && repeat_min == 1) goto END_REPEAT; + + if (repeat_min == 0 && repeat_max == REPEAT_UNLIMITED) + *code++ = OP_CRSTAR + repeat_type; + else if (repeat_min == 1 && repeat_max == REPEAT_UNLIMITED) + *code++ = OP_CRPLUS + repeat_type; + else if (repeat_min == 0 && repeat_max == 1) + *code++ = OP_CRQUERY + repeat_type; + else + { + *code++ = OP_CRRANGE + repeat_type; + PUT2INC(code, 0, repeat_min); + if (repeat_max == REPEAT_UNLIMITED) repeat_max = 0; /* 2-byte encoding for max */ + PUT2INC(code, 0, repeat_max); + } + break; + + /* If previous is OP_FAIL, it was generated by an empty class [] + (PCRE2_ALLOW_EMPTY_CLASS is set). The other ways in which OP_FAIL can be + generated, that is by (*FAIL) or (?!), disallow a quantifier at parse + time. We can just ignore this repeat. */ + + case OP_FAIL: + goto END_REPEAT; + + /* Prior to 10.30, repeated recursions were wrapped in OP_ONCE brackets + because pcre2_match() could not handle backtracking into recursively + called groups. Now that this backtracking is available, we no longer need + to do this. However, we still need to replicate recursions as we do for + groups so as to have independent backtracking points. We can replicate + for the minimum number of repeats directly. For optional repeats we now + wrap the recursion in OP_BRA brackets and make use of the bracket + repetition. */ + + case OP_RECURSE: + if (repeat_max == 1 && repeat_min == 1 && !possessive_quantifier) + goto END_REPEAT; + + /* Generate unwrapped repeats for a non-zero minimum, except when the + minimum is 1 and the maximum unlimited, because that can be handled with + OP_BRA terminated by OP_KETRMAX/MIN. 
When the maximum is equal to the + minimum, we just need to generate the appropriate additional copies. + Otherwise we need to generate one more, to simulate the situation when + the minimum is zero. */ + + if (repeat_min > 0 && (repeat_min != 1 || repeat_max != REPEAT_UNLIMITED)) + { + int replicate = repeat_min; + if (repeat_min == repeat_max) replicate--; + + /* In the pre-compile phase, we don't actually do the replication. We + just adjust the length as if we had. Do some paranoid checks for + potential integer overflow. The INT64_OR_DOUBLE type is a 64-bit + integer type when available, otherwise double. */ + + if (lengthptr != NULL) + { + PCRE2_SIZE delta = replicate*(1 + LINK_SIZE); + if ((INT64_OR_DOUBLE)replicate* + (INT64_OR_DOUBLE)(1 + LINK_SIZE) > + (INT64_OR_DOUBLE)INT_MAX || + OFLOW_MAX - *lengthptr < delta) + { + *errorcodeptr = ERR20; + return 0; + } + *lengthptr += delta; + } + + else for (i = 0; i < replicate; i++) + { + memcpy(code, previous, CU2BYTES(1 + LINK_SIZE)); + previous = code; + code += 1 + LINK_SIZE; + } + + /* If the number of repeats is fixed, we are done. Otherwise, adjust + the counts and fall through. */ + + if (repeat_min == repeat_max) break; + if (repeat_max != REPEAT_UNLIMITED) repeat_max -= repeat_min; + repeat_min = 0; + } + + /* Wrap the recursion call in OP_BRA brackets. */ + + (void)memmove(previous + 1 + LINK_SIZE, previous, CU2BYTES(1 + LINK_SIZE)); + op_previous = *previous = OP_BRA; + PUT(previous, 1, 2 + 2*LINK_SIZE); + previous[2 + 2*LINK_SIZE] = OP_KET; + PUT(previous, 3 + 2*LINK_SIZE, 2 + 2*LINK_SIZE); + code += 2 + 2 * LINK_SIZE; + length_prevgroup = 3 + 3*LINK_SIZE; + group_return = -1; /* Set "may match empty string" */ + + /* Now treat as a repeated OP_BRA. */ + /* Fall through */ + + /* If previous was a bracket group, we may have to replicate it in + certain cases. 
Note that at this point we can encounter only the "basic" + bracket opcodes such as BRA and CBRA, as this is the place where they get + converted into the more special varieties such as BRAPOS and SBRA. + Originally, PCRE did not allow repetition of assertions, but now it does, + for Perl compatibility. */ + + case OP_ASSERT: + case OP_ASSERT_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK: + case OP_ASSERTBACK_NOT: + case OP_ASSERTBACK_NA: + case OP_ONCE: + case OP_SCRIPT_RUN: + case OP_BRA: + case OP_CBRA: + case OP_COND: + { + int len = (int)(code - previous); + PCRE2_UCHAR *bralink = NULL; + PCRE2_UCHAR *brazeroptr = NULL; + + if (repeat_max == 1 && repeat_min == 1 && !possessive_quantifier) + goto END_REPEAT; + + /* Repeating a DEFINE group (or any group where the condition is always + FALSE and there is only one branch) is pointless, but Perl allows the + syntax, so we just ignore the repeat. */ + + if (op_previous == OP_COND && previous[LINK_SIZE+1] == OP_FALSE && + previous[GET(previous, 1)] != OP_ALT) + goto END_REPEAT; + + /* Perl allows all assertions to be quantified, and when they contain + capturing parentheses and/or are optional there are potential uses for + this feature. PCRE2 used to force the maximum quantifier to 1 on the + invalid grounds that further repetition was never useful. This was + always a bit pointless, since an assertion could be wrapped with a + repeated group to achieve the effect. General repetition is now + permitted, but if the maximum is unlimited it is set to one more than + the minimum. */ + + if (op_previous < OP_ONCE) /* Assertion */ + { + if (repeat_max == REPEAT_UNLIMITED) repeat_max = repeat_min + 1; + } + + /* The case of a zero minimum is special because of the need to stick + OP_BRAZERO in front of it, and because the group appears once in the + data, whereas in other cases it appears the minimum number of times. 
For + this reason, it is simplest to treat this case separately, as otherwise + the code gets far too messy. There are several special subcases when the + minimum is zero. */ + + if (repeat_min == 0) + { + /* If the maximum is also zero, we used to just omit the group from + the output altogether, like this: + + ** if (repeat_max == 0) + ** { + ** code = previous; + ** goto END_REPEAT; + ** } + + However, that fails when a group or a subgroup within it is + referenced as a subroutine from elsewhere in the pattern, so now we + stick in OP_SKIPZERO in front of it so that it is skipped on + execution. As we don't have a list of which groups are referenced, we + cannot do this selectively. + + If the maximum is 1 or unlimited, we just have to stick in the + BRAZERO and do no more at this point. */ + + if (repeat_max <= 1 || repeat_max == REPEAT_UNLIMITED) + { + (void)memmove(previous + 1, previous, CU2BYTES(len)); + code++; + if (repeat_max == 0) + { + *previous++ = OP_SKIPZERO; + goto END_REPEAT; + } + brazeroptr = previous; /* Save for possessive optimizing */ + *previous++ = OP_BRAZERO + repeat_type; + } + + /* If the maximum is greater than 1 and limited, we have to replicate + in a nested fashion, sticking OP_BRAZERO before each set of brackets. + The first one has to be handled carefully because it's the original + copy, which has to be moved up. The remainder can be handled by code + that is common with the non-zero minimum case below. We have to + adjust the value of repeat_max, since one less copy is required. */ + + else + { + int linkoffset; + (void)memmove(previous + 2 + LINK_SIZE, previous, CU2BYTES(len)); + code += 2 + LINK_SIZE; + *previous++ = OP_BRAZERO + repeat_type; + *previous++ = OP_BRA; + + /* We chain together the bracket link offset fields that have to be + filled in later when the ends of the brackets are reached. */ + + linkoffset = (bralink == NULL)?
0 : (int)(previous - bralink); + bralink = previous; + PUTINC(previous, 0, linkoffset); + } + + if (repeat_max != REPEAT_UNLIMITED) repeat_max--; + } + + /* If the minimum is greater than zero, replicate the group as many + times as necessary, and adjust the maximum to the number of subsequent + copies that we need. */ + + else + { + if (repeat_min > 1) + { + /* In the pre-compile phase, we don't actually do the replication. + We just adjust the length as if we had. Do some paranoid checks for + potential integer overflow. The INT64_OR_DOUBLE type is a 64-bit + integer type when available, otherwise double. */ + + if (lengthptr != NULL) + { + PCRE2_SIZE delta = (repeat_min - 1)*length_prevgroup; + if ((INT64_OR_DOUBLE)(repeat_min - 1)* + (INT64_OR_DOUBLE)length_prevgroup > + (INT64_OR_DOUBLE)INT_MAX || + OFLOW_MAX - *lengthptr < delta) + { + *errorcodeptr = ERR20; + return 0; + } + *lengthptr += delta; + } + + /* This is compiling for real. If there is a set first code unit + for the group, and we have not yet set a "required code unit", set + it. */ + + else + { + if (groupsetfirstcu && reqcuflags < 0) + { + reqcu = firstcu; + reqcuflags = firstcuflags; + } + for (i = 1; (uint32_t)i < repeat_min; i++) + { + memcpy(code, previous, CU2BYTES(len)); + code += len; + } + } + } + + if (repeat_max != REPEAT_UNLIMITED) repeat_max -= repeat_min; + } + + /* This code is common to both the zero and non-zero minimum cases. If + the maximum is limited, it replicates the group in a nested fashion, + remembering the bracket starts on a stack. In the case of a zero + minimum, the first one was set up above. In all cases the repeat_max + now specifies the number of additional copies needed. Again, we must + remember to replicate entries on the forward reference list. */ + + if (repeat_max != REPEAT_UNLIMITED) + { + /* In the pre-compile phase, we don't actually do the replication. We + just adjust the length as if we had. 
For each repetition we must add + 1 to the length for BRAZERO and for all but the last repetition we + must add 2 + 2*LINKSIZE to allow for the nesting that occurs. Do some + paranoid checks to avoid integer overflow. The INT64_OR_DOUBLE type + is a 64-bit integer type when available, otherwise double. */ + + if (lengthptr != NULL && repeat_max > 0) + { + PCRE2_SIZE delta = repeat_max*(length_prevgroup + 1 + 2 + 2*LINK_SIZE) - + 2 - 2*LINK_SIZE; /* Last one doesn't nest */ + if ((INT64_OR_DOUBLE)repeat_max * + (INT64_OR_DOUBLE)(length_prevgroup + 1 + 2 + 2*LINK_SIZE) + > (INT64_OR_DOUBLE)INT_MAX || + OFLOW_MAX - *lengthptr < delta) + { + *errorcodeptr = ERR20; + return 0; + } + *lengthptr += delta; + } + + /* This is compiling for real */ + + else for (i = repeat_max - 1; i >= 0; i--) + { + *code++ = OP_BRAZERO + repeat_type; + + /* All but the final copy start a new nesting, maintaining the + chain of brackets outstanding. */ + + if (i != 0) + { + int linkoffset; + *code++ = OP_BRA; + linkoffset = (bralink == NULL)? 0 : (int)(code - bralink); + bralink = code; + PUTINC(code, 0, linkoffset); + } + + memcpy(code, previous, CU2BYTES(len)); + code += len; + } + + /* Now chain through the pending brackets, and fill in their length + fields (which are holding the chain links pro tem). */ + + while (bralink != NULL) + { + int oldlinkoffset; + int linkoffset = (int)(code - bralink + 1); + PCRE2_UCHAR *bra = code - linkoffset; + oldlinkoffset = GET(bra, 1); + bralink = (oldlinkoffset == 0)? NULL : bralink - oldlinkoffset; + *code++ = OP_KET; + PUTINC(code, 0, linkoffset); + PUT(bra, 1, linkoffset); + } + } + + /* If the maximum is unlimited, set a repeater in the final copy. For + SCRIPT_RUN and ONCE brackets, that's all we need to do. However, + possessively repeated ONCE brackets can be converted into non-capturing + brackets, as the behaviour of (?:xx)++ is the same as (?>xx)++ and this + saves having to deal with possessive ONCEs specially. 
+ + Otherwise, when we are doing the actual compile phase, check to see + whether this group is one that could match an empty string. If so, + convert the initial operator to the S form (e.g. OP_BRA -> OP_SBRA) so + that runtime checking can be done. [This check is also applied to ONCE + and SCRIPT_RUN groups at runtime, but in a different way.] + + Then, if the quantifier was possessive and the bracket is not a + conditional, we convert the BRA code to the POS form, and the KET code + to KETRPOS. (It turns out to be convenient at runtime to detect this + kind of subpattern at both the start and at the end.) The use of + special opcodes makes it possible to reduce greatly the stack usage in + pcre2_match(). If the group is preceded by OP_BRAZERO, convert this to + OP_BRAPOSZERO. + + Then, if the minimum number of matches is 1 or 0, cancel the possessive + flag so that the default action below, of wrapping everything inside + atomic brackets, does not happen. When the minimum is greater than 1, + there will be earlier copies of the group, and so we still have to wrap + the whole thing. */ + + else + { + PCRE2_UCHAR *ketcode = code - 1 - LINK_SIZE; + PCRE2_UCHAR *bracode = ketcode - GET(ketcode, 1); + + /* Convert possessive ONCE brackets to non-capturing */ + + if (*bracode == OP_ONCE && possessive_quantifier) *bracode = OP_BRA; + + /* For non-possessive ONCE and for SCRIPT_RUN brackets, all we need + to do is to set the KET. */ + + if (*bracode == OP_ONCE || *bracode == OP_SCRIPT_RUN) + *ketcode = OP_KETRMAX + repeat_type; + + /* Handle non-SCRIPT_RUN and non-ONCE brackets and possessive ONCEs + (which have been converted to non-capturing above). */ + + else + { + /* In the compile phase, adjust the opcode if the group can match + an empty string. For a conditional group with only one branch, the + value of group_return will not show "could be empty", so we must + check that separately. 
*/ + + if (lengthptr == NULL) + { + if (group_return < 0) *bracode += OP_SBRA - OP_BRA; + if (*bracode == OP_COND && bracode[GET(bracode,1)] != OP_ALT) + *bracode = OP_SCOND; + } + + /* Handle possessive quantifiers. */ + + if (possessive_quantifier) + { + /* For COND brackets, we wrap the whole thing in a possessively + repeated non-capturing bracket, because we have not invented POS + versions of the COND opcodes. */ + + if (*bracode == OP_COND || *bracode == OP_SCOND) + { + int nlen = (int)(code - bracode); + (void)memmove(bracode + 1 + LINK_SIZE, bracode, CU2BYTES(nlen)); + code += 1 + LINK_SIZE; + nlen += 1 + LINK_SIZE; + *bracode = (*bracode == OP_COND)? OP_BRAPOS : OP_SBRAPOS; + *code++ = OP_KETRPOS; + PUTINC(code, 0, nlen); + PUT(bracode, 1, nlen); + } + + /* For non-COND brackets, we modify the BRA code and use KETRPOS. */ + + else + { + *bracode += 1; /* Switch to xxxPOS opcodes */ + *ketcode = OP_KETRPOS; + } + + /* If the minimum is zero, mark it as possessive, then unset the + possessive flag when the minimum is 0 or 1. */ + + if (brazeroptr != NULL) *brazeroptr = OP_BRAPOSZERO; + if (repeat_min < 2) possessive_quantifier = FALSE; + } + + /* Non-possessive quantifier */ + + else *ketcode = OP_KETRMAX + repeat_type; + } + } + } + break; + + /* If previous was a character type match (\d or similar), abolish it and + create a suitable repeat item. The code is shared with single-character + repeats by setting op_type to add a suitable offset into repeat_type. + Note that the Unicode property types will be present only when + SUPPORT_UNICODE is defined, but we don't wrap the little bits of code + here because it just makes it horribly messy.
*/ + + default: + if (op_previous >= OP_EODN) /* Not a character type - internal error */ + { + *errorcodeptr = ERR10; + return 0; + } + else + { + int prop_type, prop_value; + PCRE2_UCHAR *oldcode; + + if (repeat_max == 1 && repeat_min == 1) goto END_REPEAT; + + op_type = OP_TYPESTAR - OP_STAR; /* Use type opcodes */ + mclength = 0; /* Not a character */ + + if (op_previous == OP_PROP || op_previous == OP_NOTPROP) + { + prop_type = previous[1]; + prop_value = previous[2]; + } + else + { + /* Come here from just above with a character in mcbuffer/mclength. */ + OUTPUT_SINGLE_REPEAT: + prop_type = prop_value = -1; + } + + /* At this point, if prop_type == prop_value == -1 we either have a + character in mcbuffer when mclength is greater than zero, or we have + mclength zero, in which case there is a non-property character type in + op_previous. If prop_type/value are not negative, we have a property + character type in op_previous. */ + + oldcode = code; /* Save where we were */ + code = previous; /* Usually overwrite previous item */ + + /* If the maximum is zero then the minimum must also be zero; Perl allows + this case, so we do too - by simply omitting the item altogether. */ + + if (repeat_max == 0) goto END_REPEAT; + + /* Combine the op_type with the repeat_type */ + + repeat_type += op_type; + + /* A minimum of zero is handled either as the special case * or ?, or as + an UPTO, with the maximum given. */ + + if (repeat_min == 0) + { + if (repeat_max == REPEAT_UNLIMITED) *code++ = OP_STAR + repeat_type; + else if (repeat_max == 1) *code++ = OP_QUERY + repeat_type; + else + { + *code++ = OP_UPTO + repeat_type; + PUT2INC(code, 0, repeat_max); + } + } + + /* A repeat minimum of 1 is optimized into some special cases. If the + maximum is unlimited, we use OP_PLUS. Otherwise, the original item is + left in place and, if the maximum is greater than 1, we use OP_UPTO with + one less than the maximum. 
*/ + + else if (repeat_min == 1) + { + if (repeat_max == REPEAT_UNLIMITED) + *code++ = OP_PLUS + repeat_type; + else + { + code = oldcode; /* Leave previous item in place */ + if (repeat_max == 1) goto END_REPEAT; + *code++ = OP_UPTO + repeat_type; + PUT2INC(code, 0, repeat_max - 1); + } + } + + /* The case {n,n} is just an EXACT, while the general case {n,m} is + handled as an EXACT followed by an UPTO or STAR or QUERY. */ + + else + { + *code++ = OP_EXACT + op_type; /* NB EXACT doesn't have repeat_type */ + PUT2INC(code, 0, repeat_min); + + /* Unless repeat_max equals repeat_min, fill in the data for EXACT, + and then generate the second opcode. For a repeated Unicode property + match, there are two extra values that define the required property, + and mclength is set zero to indicate this. */ + + if (repeat_max != repeat_min) + { + if (mclength > 0) + { + memcpy(code, mcbuffer, CU2BYTES(mclength)); + code += mclength; + } + else + { + *code++ = op_previous; + if (prop_type >= 0) + { + *code++ = prop_type; + *code++ = prop_value; + } + } + + /* Now set up the following opcode */ + + if (repeat_max == REPEAT_UNLIMITED) + *code++ = OP_STAR + repeat_type; + else + { + repeat_max -= repeat_min; + if (repeat_max == 1) + { + *code++ = OP_QUERY + repeat_type; + } + else + { + *code++ = OP_UPTO + repeat_type; + PUT2INC(code, 0, repeat_max); + } + } + } + } + + /* Fill in the character or character type for the final opcode. */ + + if (mclength > 0) + { + memcpy(code, mcbuffer, CU2BYTES(mclength)); + code += mclength; + } + else + { + *code++ = op_previous; + if (prop_type >= 0) + { + *code++ = prop_type; + *code++ = prop_value; + } + } + } + break; + } /* End of switch on different op_previous values */ + + + /* If the character following a repeat is '+', possessive_quantifier is + TRUE. For some opcodes, there are special alternative opcodes for this + case. For anything else, we wrap the entire repeated item inside OP_ONCE + brackets. 
Logically, the '+' notation is just syntactic sugar, taken from + Sun's Java package, but the special opcodes can optimize it. + + Some (but not all) possessively repeated subpatterns have already been + completely handled in the code just above. For them, possessive_quantifier + is always FALSE at this stage. Note that the repeated item starts at + tempcode, not at previous, which might be the first part of a string whose + (former) last char we repeated. */ + + if (possessive_quantifier) + { + int len; + + /* Possessifying an EXACT quantifier has no effect, so we can ignore it. + However, QUERY, STAR, or UPTO may follow (for quantifiers such as {5,6}, + {5,}, or {5,10}). We skip over an EXACT item; if the length of what + remains is greater than zero, there's a further opcode that can be + handled. If not, do nothing, leaving the EXACT alone. */ + + switch(*tempcode) + { + case OP_TYPEEXACT: + tempcode += PRIV(OP_lengths)[*tempcode] + + ((tempcode[1 + IMM2_SIZE] == OP_PROP + || tempcode[1 + IMM2_SIZE] == OP_NOTPROP)? 2 : 0); + break; + + /* CHAR opcodes are used for exacts whose count is 1. */ + + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + case OP_EXACT: + case OP_EXACTI: + case OP_NOTEXACT: + case OP_NOTEXACTI: + tempcode += PRIV(OP_lengths)[*tempcode]; +#ifdef SUPPORT_UNICODE + if (utf && HAS_EXTRALEN(tempcode[-1])) + tempcode += GET_EXTRALEN(tempcode[-1]); +#endif + break; + + /* For the class opcodes, the repeat operator appears at the end; + adjust tempcode to point to it. */ + + case OP_CLASS: + case OP_NCLASS: + tempcode += 1 + 32/sizeof(PCRE2_UCHAR); + break; + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + tempcode += GET(tempcode, 1); + break; +#endif + } + + /* If tempcode is equal to code (which points to the end of the repeated + item), it means we have skipped an EXACT item but there is no following + QUERY, STAR, or UPTO; the value of len will be 0, and we do nothing. 
In + all other cases, tempcode will be pointing to the repeat opcode, and will + be less than code, so the value of len will be greater than 0. */ + + len = (int)(code - tempcode); + if (len > 0) + { + unsigned int repcode = *tempcode; + + /* There is a table for possessifying opcodes, all of which are less + than OP_CALLOUT. A zero entry means there is no possessified version. + */ + + if (repcode < OP_CALLOUT && opcode_possessify[repcode] > 0) + *tempcode = opcode_possessify[repcode]; + + /* For opcode without a special possessified version, wrap the item in + ONCE brackets. */ + + else + { + (void)memmove(tempcode + 1 + LINK_SIZE, tempcode, CU2BYTES(len)); + code += 1 + LINK_SIZE; + len += 1 + LINK_SIZE; + tempcode[0] = OP_ONCE; + *code++ = OP_KET; + PUTINC(code, 0, len); + PUT(tempcode, 1, len); + } + } + } + + /* We set the "follows varying string" flag for subsequently encountered + reqcus if it isn't already set and we have just passed a varying length + item. */ + + END_REPEAT: + cb->req_varyopt |= reqvary; + break; + + + /* ===================================================================*/ + /* Handle a 32-bit data character with a value greater than META_END. */ + + case META_BIGVALUE: + pptr++; + goto NORMAL_CHAR; + + + /* ===============================================================*/ + /* Handle a back reference by number, which is the meta argument. The + pattern offsets for back references to group numbers less than 10 are held + in a special vector, to avoid using more than two parsed pattern elements + in 64-bit environments. We only need the offset to the first occurrence, + because if that doesn't fail, subsequent ones will also be OK. 
*/ + + case META_BACKREF: + if (meta_arg < 10) offset = cb->small_ref_offset[meta_arg]; + else GETPLUSOFFSET(offset, pptr); + + if (meta_arg > cb->bracount) + { + cb->erroroffset = offset; + *errorcodeptr = ERR15; /* Non-existent subpattern */ + return 0; + } + + /* Come here from named backref handling when the reference is to a + single group (that is, not to a duplicated name). The back reference + data will have already been updated. We must disable firstcu if not + set, to cope with cases like (?=(\w+))\1: which would otherwise set ':' + later. */ + + HANDLE_SINGLE_REFERENCE: + if (firstcuflags == REQ_UNSET) zerofirstcuflags = firstcuflags = REQ_NONE; + *code++ = ((options & PCRE2_CASELESS) != 0)? OP_REFI : OP_REF; + PUT2INC(code, 0, meta_arg); + + /* Update the map of back references, and keep the highest one. We + could do this in parse_regex() for numerical back references, but not + for named back references, because we don't know the numbers to which + named back references refer. So we do it all in this function. */ + + cb->backref_map |= (meta_arg < 32)? (1u << meta_arg) : 1; + if (meta_arg > cb->top_backref) cb->top_backref = meta_arg; + break; + + + /* ===============================================================*/ + /* Handle recursion by inserting the number of the called group (which is + the meta argument) after OP_RECURSE. At the end of compiling the pattern is + scanned and these numbers are replaced by offsets within the pattern. It is + done like this to avoid problems with forward references and adjusting + offsets when groups are duplicated and moved (as discovered in previous + implementations). Note that a recursion does not have a set first + character. 
*/ + + case META_RECURSE: + GETPLUSOFFSET(offset, pptr); + if (meta_arg > cb->bracount) + { + cb->erroroffset = offset; + *errorcodeptr = ERR15; /* Non-existent subpattern */ + return 0; + } + HANDLE_NUMERICAL_RECURSION: + *code = OP_RECURSE; + PUT(code, 1, meta_arg); + code += 1 + LINK_SIZE; + groupsetfirstcu = FALSE; + cb->had_recurse = TRUE; + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + break; + + + /* ===============================================================*/ + /* Handle capturing parentheses; the number is the meta argument. */ + + case META_CAPTURE: + bravalue = OP_CBRA; + skipunits = IMM2_SIZE; + PUT2(code, 1+LINK_SIZE, meta_arg); + cb->lastcapture = meta_arg; + goto GROUP_PROCESS_NOTE_EMPTY; + + + /* ===============================================================*/ + /* Handle escape sequence items. For ones like \d, the ESC_values are + arranged to be the same as the corresponding OP_values in the default case + when PCRE2_UCP is not set (which is the only case in which they will appear + here). + + Note: \Q and \E are never seen here, as they were dealt with in + parse_pattern(). Neither are numerical back references or recursions, which + were turned into META_BACKREF or META_RECURSE items, respectively. \k and + \g, when followed by names, are turned into META_BACKREF_BYNAME or + META_RECURSE_BYNAME. */ + + case META_ESCAPE: + + /* We can test for escape sequences that consume a character because their + values lie between ESC_b and ESC_Z; this may have to change if any new ones + are ever created. For these sequences, we disable the setting of a first + character if it hasn't already been set. */ + + if (meta_arg > ESC_b && meta_arg < ESC_Z) + { + matched_char = TRUE; + if (firstcuflags == REQ_UNSET) firstcuflags = REQ_NONE; + } + + /* Set values to reset to if this is followed by a zero repeat. 
*/ + + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + + /* If Unicode is not supported, \P and \p are not allowed and are + faulted at parse time, so will never appear here. */ + +#ifdef SUPPORT_UNICODE + if (meta_arg == ESC_P || meta_arg == ESC_p) + { + uint32_t ptype = *(++pptr) >> 16; + uint32_t pdata = *pptr & 0xffff; + + /* The special case of \p{Any} is compiled to OP_ALLANY so as to benefit + from the auto-anchoring code. */ + + if (meta_arg == ESC_p && ptype == PT_ANY) + { + *code++ = OP_ALLANY; + } + else + { + *code++ = (meta_arg == ESC_p)? OP_PROP : OP_NOTPROP; + *code++ = ptype; + *code++ = pdata; + } + break; /* End META_ESCAPE */ + } +#endif + + /* For the rest (including \X when Unicode is supported - if not it's + faulted at parse time), the OP value is the escape value when PCRE2_UCP is + not set; if it is set, these escapes do not show up here because they are + converted into Unicode property tests in parse_regex(). Note that \b and \B + do a one-character lookbehind, and \A also behaves as if it does. */ + + if (meta_arg == ESC_C) cb->external_flags |= PCRE2_HASBKC; /* Record */ + if ((meta_arg == ESC_b || meta_arg == ESC_B || meta_arg == ESC_A) && + cb->max_lookbehind == 0) + cb->max_lookbehind = 1; + + /* In non-UTF mode, and for both 32-bit modes, we turn \C into OP_ALLANY + instead of OP_ANYBYTE so that it works in DFA mode and in lookbehinds. */ + +#if PCRE2_CODE_UNIT_WIDTH == 32 + *code++ = (meta_arg == ESC_C)? OP_ALLANY : meta_arg; +#else + *code++ = (!utf && meta_arg == ESC_C)? OP_ALLANY : meta_arg; +#endif + break; /* End META_ESCAPE */ + + + /* ===================================================================*/ + /* Handle an unrecognized meta value. A parsed pattern value less than + META_END is a literal. Otherwise we have a problem. 
*/ + + default: + if (meta >= META_END) + { +#ifdef DEBUG_SHOW_PARSED + fprintf(stderr, "** Unrecognized parsed pattern item 0x%.8x\n", *pptr); +#endif + *errorcodeptr = ERR89; /* Internal error - unrecognized. */ + return 0; + } + + /* Handle a literal character. We come here by goto in the case of a + 32-bit, non-UTF character whose value is greater than META_END. */ + + NORMAL_CHAR: + meta = *pptr; /* Get the full 32 bits */ + NORMAL_CHAR_SET: /* Character is already in meta */ + matched_char = TRUE; + + /* For caseless UTF or UCP mode, check whether this character has more than + one other case. If so, generate a special OP_PROP item instead of OP_CHARI. + */ + +#ifdef SUPPORT_UNICODE + if ((utf||ucp) && (options & PCRE2_CASELESS) != 0) + { + uint32_t caseset = UCD_CASESET(meta); + if (caseset != 0) + { + *code++ = OP_PROP; + *code++ = PT_CLIST; + *code++ = caseset; + if (firstcuflags == REQ_UNSET) + firstcuflags = zerofirstcuflags = REQ_NONE; + break; /* End handling this meta item */ + } + } +#endif + + /* Caseful matches, or caseless and not one of the multicase characters. We + come here by goto in the case of a positive class that contains only + case-partners of a character with just two cases; matched_char has already + been set TRUE and options fudged if necessary. */ + + CLASS_CASELESS_CHAR: + + /* Get the character's code units into mcbuffer, with the length in + mclength. When not in UTF mode, the length is always 1. */ + +#ifdef SUPPORT_UNICODE + if (utf) mclength = PRIV(ord2utf)(meta, mcbuffer); else +#endif + { + mclength = 1; + mcbuffer[0] = meta; + } + + /* Generate the appropriate code */ + + *code++ = ((options & PCRE2_CASELESS) != 0)? OP_CHARI : OP_CHAR; + memcpy(code, mcbuffer, CU2BYTES(mclength)); + code += mclength; + + /* Remember if \r or \n were seen */ + + if (mcbuffer[0] == CHAR_CR || mcbuffer[0] == CHAR_NL) + cb->external_flags |= PCRE2_HASCRORLF; + + /* Set the first and required code units appropriately. 
If no previous + first code unit, set it from this character, but revert to none on a zero + repeat. Otherwise, leave the firstcu value alone, and don't change it on + a zero repeat. */ + + if (firstcuflags == REQ_UNSET) + { + zerofirstcuflags = REQ_NONE; + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + + /* If the character is more than one code unit long, we can set a single + firstcu only if it is not to be matched caselessly. Multiple possible + starting code units may be picked up later in the studying code. */ + + if (mclength == 1 || req_caseopt == 0) + { + firstcu = mcbuffer[0]; + firstcuflags = req_caseopt; + if (mclength != 1) + { + reqcu = code[-1]; + reqcuflags = cb->req_varyopt; + } + } + else firstcuflags = reqcuflags = REQ_NONE; + } + + /* firstcu was previously set; we can set reqcu only if the length is + 1 or the matching is caseful. */ + + else + { + zerofirstcu = firstcu; + zerofirstcuflags = firstcuflags; + zeroreqcu = reqcu; + zeroreqcuflags = reqcuflags; + if (mclength == 1 || req_caseopt == 0) + { + reqcu = code[-1]; + reqcuflags = req_caseopt | cb->req_varyopt; + } + } + + /* If caselessness was temporarily instated, reset it. */ + + if (reset_caseful) + { + options &= ~PCRE2_CASELESS; + req_caseopt = 0; + reset_caseful = FALSE; + } + + break; /* End literal character handling */ + } /* End of big switch */ + } /* End of big loop */ + +/* Control never reaches here. */ +} + + + +/************************************************* +* Compile regex: a sequence of alternatives * +*************************************************/ + +/* On entry, pptr is pointing past the bracket meta, but on return it points to +the closing bracket or META_END. The code variable is pointing at the code unit +into which the BRA operator has been stored. This function is used during the +pre-compile phase when we are trying to find out the amount of memory needed, +as well as during the real compile phase. 
The value of lengthptr distinguishes +the two phases. + +Arguments: + options option bits, including any changes for this subpattern + codeptr -> the address of the current code pointer + pptrptr -> the address of the current parsed pattern pointer + errorcodeptr -> pointer to error code variable + skipunits skip this many code units at start (for brackets and OP_COND) + firstcuptr place to put the first required code unit + firstcuflagsptr place to put the first code unit flags, or a negative number + reqcuptr place to put the last required code unit + reqcuflagsptr place to put the last required code unit flags, or a negative number + bcptr pointer to the chain of currently open branches + cb points to the data block with tables pointers etc. + lengthptr NULL during the real compile phase + points to length accumulator during pre-compile phase + +Returns: 0 There has been an error + +1 Success, this group must match at least one character + -1 Success, this group may match an empty string +*/ + +static int +compile_regex(uint32_t options, PCRE2_UCHAR **codeptr, uint32_t **pptrptr, + int *errorcodeptr, uint32_t skipunits, uint32_t *firstcuptr, + int32_t *firstcuflagsptr, uint32_t *reqcuptr,int32_t *reqcuflagsptr, + branch_chain *bcptr, compile_block *cb, PCRE2_SIZE *lengthptr) +{ +PCRE2_UCHAR *code = *codeptr; +PCRE2_UCHAR *last_branch = code; +PCRE2_UCHAR *start_bracket = code; +BOOL lookbehind; +open_capitem capitem; +int capnumber = 0; +int okreturn = 1; +uint32_t *pptr = *pptrptr; +uint32_t firstcu, reqcu; +uint32_t lookbehindlength; +int32_t firstcuflags, reqcuflags; +uint32_t branchfirstcu, branchreqcu; +int32_t branchfirstcuflags, branchreqcuflags; +PCRE2_SIZE length; +branch_chain bc; + +/* If set, call the external function that checks for stack availability. 
*/ + +if (cb->cx->stack_guard != NULL && + cb->cx->stack_guard(cb->parens_depth, cb->cx->stack_guard_data)) + { + *errorcodeptr= ERR33; + return 0; + } + +/* Miscellaneous initialization */ + +bc.outer = bcptr; +bc.current_branch = code; + +firstcu = reqcu = 0; +firstcuflags = reqcuflags = REQ_UNSET; + +/* Accumulate the length for use in the pre-compile phase. Start with the +length of the BRA and KET and any extra code units that are required at the +beginning. We accumulate in a local variable to save frequent testing of +lengthptr for NULL. We cannot do this by looking at the value of 'code' at the +start and end of each alternative, because compiled items are discarded during +the pre-compile phase so that the workspace is not exceeded. */ + +length = 2 + 2*LINK_SIZE + skipunits; + +/* Remember if this is a lookbehind assertion, and if it is, save its length +and skip over the pattern offset. */ + +lookbehind = *code == OP_ASSERTBACK || + *code == OP_ASSERTBACK_NOT || + *code == OP_ASSERTBACK_NA; + +if (lookbehind) + { + lookbehindlength = META_DATA(pptr[-1]); + pptr += SIZEOFFSET; + } +else lookbehindlength = 0; + +/* If this is a capturing subpattern, add to the chain of open capturing items +so that we can detect them if (*ACCEPT) is encountered. Note that only OP_CBRA +need be tested here; changing this opcode to one of its variants, e.g. +OP_SCBRAPOS, happens later, after the group has been compiled. */ + +if (*code == OP_CBRA) + { + capnumber = GET2(code, 1 + LINK_SIZE); + capitem.number = capnumber; + capitem.next = cb->open_caps; + capitem.assert_depth = cb->assert_depth; + cb->open_caps = &capitem; + } + +/* Offset is set zero to mark that this bracket is still open */ + +PUT(code, 1, 0); +code += 1 + LINK_SIZE + skipunits; + +/* Loop for each alternative branch */ + +for (;;) + { + int branch_return; + + /* Insert OP_REVERSE if this is a lookbehind assertion.
*/ + + if (lookbehind && lookbehindlength > 0) + { + *code++ = OP_REVERSE; + PUTINC(code, 0, lookbehindlength); + length += 1 + LINK_SIZE; + } + + /* Now compile the branch; in the pre-compile phase its length gets added + into the length. */ + + if ((branch_return = + compile_branch(&options, &code, &pptr, errorcodeptr, &branchfirstcu, + &branchfirstcuflags, &branchreqcu, &branchreqcuflags, &bc, + cb, (lengthptr == NULL)? NULL : &length)) == 0) + return 0; + + /* If a branch can match an empty string, so can the whole group. */ + + if (branch_return < 0) okreturn = -1; + + /* In the real compile phase, there is some post-processing to be done. */ + + if (lengthptr == NULL) + { + /* If this is the first branch, the firstcu and reqcu values for the + branch become the values for the regex. */ + + if (*last_branch != OP_ALT) + { + firstcu = branchfirstcu; + firstcuflags = branchfirstcuflags; + reqcu = branchreqcu; + reqcuflags = branchreqcuflags; + } + + /* If this is not the first branch, the first char and reqcu have to + match the values from all the previous branches, except that if the + previous value for reqcu didn't have REQ_VARY set, it can still match, + and we set REQ_VARY for the group from this branch's value. */ + + else + { + /* If we previously had a firstcu, but it doesn't match the new branch, + we have to abandon the firstcu for the regex, but if there was + previously no reqcu, it takes on the value of the old firstcu. */ + + if (firstcuflags != branchfirstcuflags || firstcu != branchfirstcu) + { + if (firstcuflags >= 0) + { + if (reqcuflags < 0) + { + reqcu = firstcu; + reqcuflags = firstcuflags; + } + } + firstcuflags = REQ_NONE; + } + + /* If we (now or from before) have no firstcu, a firstcu from the + branch becomes a reqcu if there isn't a branch reqcu. 
*/ + + if (firstcuflags < 0 && branchfirstcuflags >= 0 && + branchreqcuflags < 0) + { + branchreqcu = branchfirstcu; + branchreqcuflags = branchfirstcuflags; + } + + /* Now ensure that the reqcus match */ + + if (((reqcuflags & ~REQ_VARY) != (branchreqcuflags & ~REQ_VARY)) || + reqcu != branchreqcu) + reqcuflags = REQ_NONE; + else + { + reqcu = branchreqcu; + reqcuflags |= branchreqcuflags; /* To "or" REQ_VARY if present */ + } + } + } + + /* Handle reaching the end of the expression, either ')' or end of pattern. + In the real compile phase, go back through the alternative branches and + reverse the chain of offsets, with the field in the BRA item now becoming an + offset to the first alternative. If there are no alternatives, it points to + the end of the group. The length in the terminating ket is always the length + of the whole bracketed item. Return leaving the pointer at the terminating + char. */ + + if (META_CODE(*pptr) != META_ALT) + { + if (lengthptr == NULL) + { + PCRE2_SIZE branch_length = code - last_branch; + do + { + PCRE2_SIZE prev_length = GET(last_branch, 1); + PUT(last_branch, 1, branch_length); + branch_length = prev_length; + last_branch -= branch_length; + } + while (branch_length > 0); + } + + /* Fill in the ket */ + + *code = OP_KET; + PUT(code, 1, (int)(code - start_bracket)); + code += 1 + LINK_SIZE; + + /* If it was a capturing subpattern, remove the block from the chain. */ + + if (capnumber > 0) cb->open_caps = cb->open_caps->next; + + /* Set values to pass back */ + + *codeptr = code; + *pptrptr = pptr; + *firstcuptr = firstcu; + *firstcuflagsptr = firstcuflags; + *reqcuptr = reqcu; + *reqcuflagsptr = reqcuflags; + if (lengthptr != NULL) + { + if (OFLOW_MAX - *lengthptr < length) + { + *errorcodeptr = ERR20; + return 0; + } + *lengthptr += length; + } + return okreturn; + } + + /* Another branch follows. In the pre-compile phase, we can move the code + pointer back to where it was for the start of the first branch. 
(That is, + pretend that each branch is the only one.) + + In the real compile phase, insert an ALT node. Its length field points back + to the previous branch while the bracket remains open. At the end the chain + is reversed. It's done like this so that the start of the bracket has a + zero offset until it is closed, making it possible to detect recursion. */ + + if (lengthptr != NULL) + { + code = *codeptr + 1 + LINK_SIZE + skipunits; + length += 1 + LINK_SIZE; + } + else + { + *code = OP_ALT; + PUT(code, 1, (int)(code - last_branch)); + bc.current_branch = last_branch = code; + code += 1 + LINK_SIZE; + } + + /* Set the lookbehind length (if not in a lookbehind the value will be zero) + and then advance past the vertical bar. */ + + lookbehindlength = META_DATA(*pptr); + pptr++; + } +/* Control never reaches here */ +} + + + +/************************************************* +* Check for anchored pattern * +*************************************************/ + +/* Try to find out if this is an anchored regular expression. Consider each +alternative branch. If they all start with OP_SOD or OP_CIRC, or with a bracket +all of whose alternatives start with OP_SOD or OP_CIRC (recurse ad lib), then +it's anchored. However, if this is a multiline pattern, then only OP_SOD will +be found, because ^ generates OP_CIRCM in that mode. + +We can also consider a regex to be anchored if OP_SOM starts all its branches. +This is the code for \G, which means "match at start of match position, taking +into account the match offset". + +A branch is also implicitly anchored if it starts with .* and DOTALL is set, +because that will try the rest of the pattern at all possible matching points, +so there is no point trying again.... er .... + +.... except when the .* appears inside capturing parentheses, and there is a +subsequent back reference to those parentheses. We haven't enough information +to catch that case precisely. 
+ +At first, the best we could do was to detect when .* was in capturing brackets +and the highest back reference was greater than or equal to that level. +However, by keeping a bitmap of the first 31 back references, we can catch some +of the more common cases more precisely. + +... A second exception is when the .* appears inside an atomic group, because +this prevents the number of characters it matches from being adjusted. + +Arguments: + code points to start of the compiled pattern + bracket_map a bitmap of which brackets we are inside while testing; this + handles up to substring 31; after that we just have to take + the less precise approach + cb points to the compile data block + atomcount atomic group level + inassert TRUE if in an assertion + +Returns: TRUE or FALSE +*/ + +static BOOL +is_anchored(PCRE2_SPTR code, unsigned int bracket_map, compile_block *cb, + int atomcount, BOOL inassert) +{ +do { + PCRE2_SPTR scode = first_significant_code( + code + PRIV(OP_lengths)[*code], FALSE); + int op = *scode; + + /* Non-capturing brackets */ + + if (op == OP_BRA || op == OP_BRAPOS || + op == OP_SBRA || op == OP_SBRAPOS) + { + if (!is_anchored(scode, bracket_map, cb, atomcount, inassert)) + return FALSE; + } + + /* Capturing brackets */ + + else if (op == OP_CBRA || op == OP_CBRAPOS || + op == OP_SCBRA || op == OP_SCBRAPOS) + { + int n = GET2(scode, 1+LINK_SIZE); + int new_map = bracket_map | ((n < 32)? (1u << n) : 1); + if (!is_anchored(scode, new_map, cb, atomcount, inassert)) return FALSE; + } + + /* Positive forward assertion */ + + else if (op == OP_ASSERT || op == OP_ASSERT_NA) + { + if (!is_anchored(scode, bracket_map, cb, atomcount, TRUE)) return FALSE; + } + + /* Condition. If there is no second branch, it can't be anchored. 
*/ + + else if (op == OP_COND || op == OP_SCOND) + { + if (scode[GET(scode,1)] != OP_ALT) return FALSE; + if (!is_anchored(scode, bracket_map, cb, atomcount, inassert)) + return FALSE; + } + + /* Atomic groups */ + + else if (op == OP_ONCE) + { + if (!is_anchored(scode, bracket_map, cb, atomcount + 1, inassert)) + return FALSE; + } + + /* .* is not anchored unless DOTALL is set (which generates OP_ALLANY) and + it isn't in brackets that are or may be referenced or inside an atomic + group or an assertion. Also the pattern must not contain *PRUNE or *SKIP, + because these break the feature. Consider, for example, /(?s).*?(*PRUNE)b/ + with the subject "aab", which matches "b", i.e. not at the start of a line. + There is also an option that disables auto-anchoring. */ + + else if ((op == OP_TYPESTAR || op == OP_TYPEMINSTAR || + op == OP_TYPEPOSSTAR)) + { + if (scode[1] != OP_ALLANY || (bracket_map & cb->backref_map) != 0 || + atomcount > 0 || cb->had_pruneorskip || inassert || + (cb->external_options & PCRE2_NO_DOTSTAR_ANCHOR) != 0) + return FALSE; + } + + /* Check for explicit anchoring */ + + else if (op != OP_SOD && op != OP_SOM && op != OP_CIRC) return FALSE; + + code += GET(code, 1); + } +while (*code == OP_ALT); /* Loop for each alternative */ +return TRUE; +} + + + +/************************************************* +* Check for starting with ^ or .* * +*************************************************/ + +/* This is called to find out if every branch starts with ^ or .* so that +"first char" processing can be done to speed things up in multiline +matching and for non-DOTALL patterns that start with .* (which must start at +the beginning or after \n). As in the case of is_anchored() (see above), we +have to take account of back references to capturing brackets that contain .* +because in that case we can't make the assumption. 
Also, the appearance of .* +inside atomic brackets or in an assertion, or in a pattern that contains *PRUNE +or *SKIP does not count, because once again the assumption no longer holds. + +Arguments: + code points to start of the compiled pattern or a group + bracket_map a bitmap of which brackets we are inside while testing; this + handles up to substring 31; after that we just have to take + the less precise approach + cb points to the compile data + atomcount atomic group level + inassert TRUE if in an assertion + +Returns: TRUE or FALSE +*/ + +static BOOL +is_startline(PCRE2_SPTR code, unsigned int bracket_map, compile_block *cb, + int atomcount, BOOL inassert) +{ +do { + PCRE2_SPTR scode = first_significant_code( + code + PRIV(OP_lengths)[*code], FALSE); + int op = *scode; + + /* If we are at the start of a conditional assertion group, *both* the + conditional assertion *and* what follows the condition must satisfy the test + for start of line. Other kinds of condition fail. Note that there may be an + auto-callout at the start of a condition. 
*/ + + if (op == OP_COND) + { + scode += 1 + LINK_SIZE; + + if (*scode == OP_CALLOUT) scode += PRIV(OP_lengths)[OP_CALLOUT]; + else if (*scode == OP_CALLOUT_STR) scode += GET(scode, 1 + 2*LINK_SIZE); + + switch (*scode) + { + case OP_CREF: + case OP_DNCREF: + case OP_RREF: + case OP_DNRREF: + case OP_FAIL: + case OP_FALSE: + case OP_TRUE: + return FALSE; + + default: /* Assertion */ + if (!is_startline(scode, bracket_map, cb, atomcount, TRUE)) return FALSE; + do scode += GET(scode, 1); while (*scode == OP_ALT); + scode += 1 + LINK_SIZE; + break; + } + scode = first_significant_code(scode, FALSE); + op = *scode; + } + + /* Non-capturing brackets */ + + if (op == OP_BRA || op == OP_BRAPOS || + op == OP_SBRA || op == OP_SBRAPOS) + { + if (!is_startline(scode, bracket_map, cb, atomcount, inassert)) + return FALSE; + } + + /* Capturing brackets */ + + else if (op == OP_CBRA || op == OP_CBRAPOS || + op == OP_SCBRA || op == OP_SCBRAPOS) + { + int n = GET2(scode, 1+LINK_SIZE); + int new_map = bracket_map | ((n < 32)? (1u << n) : 1); + if (!is_startline(scode, new_map, cb, atomcount, inassert)) return FALSE; + } + + /* Positive forward assertions */ + + else if (op == OP_ASSERT || op == OP_ASSERT_NA) + { + if (!is_startline(scode, bracket_map, cb, atomcount, TRUE)) + return FALSE; + } + + /* Atomic brackets */ + + else if (op == OP_ONCE) + { + if (!is_startline(scode, bracket_map, cb, atomcount + 1, inassert)) + return FALSE; + } + + /* .* means "start at start or after \n" if it isn't in atomic brackets or + brackets that may be referenced or an assertion, and as long as the pattern + does not contain *PRUNE or *SKIP, because these break the feature. Consider, + for example, /.*?a(*PRUNE)b/ with the subject "aab", which matches "ab", + i.e. not at the start of a line. There is also an option that disables this + optimization. 
*/ + + else if (op == OP_TYPESTAR || op == OP_TYPEMINSTAR || op == OP_TYPEPOSSTAR) + { + if (scode[1] != OP_ANY || (bracket_map & cb->backref_map) != 0 || + atomcount > 0 || cb->had_pruneorskip || inassert || + (cb->external_options & PCRE2_NO_DOTSTAR_ANCHOR) != 0) + return FALSE; + } + + /* Check for explicit circumflex; anything else gives a FALSE result. Note + in particular that this includes atomic brackets OP_ONCE because the number + of characters matched by .* cannot be adjusted inside them. */ + + else if (op != OP_CIRC && op != OP_CIRCM) return FALSE; + + /* Move on to the next alternative */ + + code += GET(code, 1); + } +while (*code == OP_ALT); /* Loop for each alternative */ +return TRUE; +} + + + +/************************************************* +* Scan compiled regex for recursion reference * +*************************************************/ + +/* This function scans through a compiled pattern until it finds an instance of +OP_RECURSE. + +Arguments: + code points to start of expression + utf TRUE in UTF mode + +Returns: pointer to the opcode for OP_RECURSE, or NULL if not found +*/ + +static PCRE2_SPTR +find_recurse(PCRE2_SPTR code, BOOL utf) +{ +for (;;) + { + PCRE2_UCHAR c = *code; + if (c == OP_END) return NULL; + if (c == OP_RECURSE) return code; + + /* XCLASS is used for classes that cannot be represented just by a bit map. + This includes negated single high-valued characters. CALLOUT_STR is used for + callouts with string arguments. In both cases the length in the table is + zero; the actual length is stored in the compiled code. */ + + if (c == OP_XCLASS) code += GET(code, 1); + else if (c == OP_CALLOUT_STR) code += GET(code, 1 + 2*LINK_SIZE); + + /* Otherwise, we can get the item's length from the table, except that for + repeated character types, we have to test for \p and \P, which have an extra + two code units of parameters, and for MARK/PRUNE/SKIP/THEN with an argument, + we must add in its length. 
*/ + + else + { + switch(c) + { + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEPOSSTAR: + case OP_TYPEPOSPLUS: + case OP_TYPEPOSQUERY: + if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; + break; + + case OP_TYPEPOSUPTO: + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEEXACT: + if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) + code += 2; + break; + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_SKIP_ARG: + case OP_THEN_ARG: + code += code[1]; + break; + } + + /* Add in the fixed length from the table */ + + code += PRIV(OP_lengths)[c]; + + /* In UTF-8 and UTF-16 modes, opcodes that are followed by a character may + be followed by a multi-unit character. The length in the table is a + minimum, so we have to arrange to skip the extra units. */ + +#ifdef MAYBE_UTF_MULTI + if (utf) switch(c) + { + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + case OP_EXACT: + case OP_EXACTI: + case OP_NOTEXACT: + case OP_NOTEXACTI: + case OP_UPTO: + case OP_UPTOI: + case OP_NOTUPTO: + case OP_NOTUPTOI: + case OP_MINUPTO: + case OP_MINUPTOI: + case OP_NOTMINUPTO: + case OP_NOTMINUPTOI: + case OP_POSUPTO: + case OP_POSUPTOI: + case OP_NOTPOSUPTO: + case OP_NOTPOSUPTOI: + case OP_STAR: + case OP_STARI: + case OP_NOTSTAR: + case OP_NOTSTARI: + case OP_MINSTAR: + case OP_MINSTARI: + case OP_NOTMINSTAR: + case OP_NOTMINSTARI: + case OP_POSSTAR: + case OP_POSSTARI: + case OP_NOTPOSSTAR: + case OP_NOTPOSSTARI: + case OP_PLUS: + case OP_PLUSI: + case OP_NOTPLUS: + case OP_NOTPLUSI: + case OP_MINPLUS: + case OP_MINPLUSI: + case OP_NOTMINPLUS: + case OP_NOTMINPLUSI: + case OP_POSPLUS: + case OP_POSPLUSI: + case OP_NOTPOSPLUS: + case OP_NOTPOSPLUSI: + case OP_QUERY: + case OP_QUERYI: + case OP_NOTQUERY: + case OP_NOTQUERYI: + case OP_MINQUERY: + case OP_MINQUERYI: + case OP_NOTMINQUERY: + case OP_NOTMINQUERYI: + case 
OP_POSQUERY: + case OP_POSQUERYI: + case OP_NOTPOSQUERY: + case OP_NOTPOSQUERYI: + if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]); + break; + } +#else + (void)(utf); /* Keep compiler happy by referencing function argument */ +#endif /* MAYBE_UTF_MULTI */ + } + } +} + + + +/************************************************* +* Check for asserted fixed first code unit * +*************************************************/ + +/* During compilation, the "first code unit" settings from forward assertions +are discarded, because they can cause conflicts with actual literals that +follow. However, if we end up without a first code unit setting for an +unanchored pattern, it is worth scanning the regex to see if there is an +initial asserted first code unit. If all branches start with the same asserted +code unit, or with a non-conditional bracket all of whose alternatives start +with the same asserted code unit (recurse ad lib), then we return that code +unit, with the flags set to zero or REQ_CASELESS; otherwise return zero with +REQ_NONE in the flags. + +Arguments: + code points to start of compiled pattern + flags points to the first code unit flags + inassert non-zero if in an assertion + +Returns: the fixed first code unit, or 0 with REQ_NONE in flags +*/ + +static uint32_t +find_firstassertedcu(PCRE2_SPTR code, int32_t *flags, uint32_t inassert) +{ +uint32_t c = 0; +int cflags = REQ_NONE; + +*flags = REQ_NONE; +do { + uint32_t d; + int dflags; + int xl = (*code == OP_CBRA || *code == OP_SCBRA || + *code == OP_CBRAPOS || *code == OP_SCBRAPOS)? 
IMM2_SIZE:0; + PCRE2_SPTR scode = first_significant_code(code + 1+LINK_SIZE + xl, TRUE); + PCRE2_UCHAR op = *scode; + + switch(op) + { + default: + return 0; + + case OP_BRA: + case OP_BRAPOS: + case OP_CBRA: + case OP_SCBRA: + case OP_CBRAPOS: + case OP_SCBRAPOS: + case OP_ASSERT: + case OP_ASSERT_NA: + case OP_ONCE: + case OP_SCRIPT_RUN: + d = find_firstassertedcu(scode, &dflags, inassert + + ((op == OP_ASSERT || op == OP_ASSERT_NA)?1:0)); + if (dflags < 0) + return 0; + if (cflags < 0) { c = d; cflags = dflags; } + else if (c != d || cflags != dflags) return 0; + break; + + case OP_EXACT: + scode += IMM2_SIZE; + /* Fall through */ + + case OP_CHAR: + case OP_PLUS: + case OP_MINPLUS: + case OP_POSPLUS: + if (inassert == 0) return 0; + if (cflags < 0) { c = scode[1]; cflags = 0; } + else if (c != scode[1]) return 0; + break; + + case OP_EXACTI: + scode += IMM2_SIZE; + /* Fall through */ + + case OP_CHARI: + case OP_PLUSI: + case OP_MINPLUSI: + case OP_POSPLUSI: + if (inassert == 0) return 0; + + /* If the character is more than one code unit long, we cannot set its + first code unit when matching caselessly. Later scanning may pick up + multiple code units. */ + +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (scode[1] >= 0x80) return 0; +#elif PCRE2_CODE_UNIT_WIDTH == 16 + if (scode[1] >= 0xd800 && scode[1] <= 0xdfff) return 0; +#endif +#endif + + if (cflags < 0) { c = scode[1]; cflags = REQ_CASELESS; } + else if (c != scode[1]) return 0; + break; + } + + code += GET(code, 1); + } +while (*code == OP_ALT); + +*flags = cflags; +return c; +} + + + +/************************************************* +* Add an entry to the name/number table * +*************************************************/ + +/* This function is called between compiling passes to add an entry to the +name/number table, maintaining alphabetical order. Checking for permitted +and forbidden duplicates has already been done. 
+ +Arguments: + cb the compile data block + name the name to add + length the length of the name + groupno the group number + tablecount the count of names in the table so far + +Returns: nothing +*/ + +static void +add_name_to_table(compile_block *cb, PCRE2_SPTR name, int length, + unsigned int groupno, uint32_t tablecount) +{ +uint32_t i; +PCRE2_UCHAR *slot = cb->name_table; + +for (i = 0; i < tablecount; i++) + { + int crc = memcmp(name, slot+IMM2_SIZE, CU2BYTES(length)); + if (crc == 0 && slot[IMM2_SIZE+length] != 0) + crc = -1; /* Current name is a substring */ + + /* Make space in the table and break the loop for an earlier name. For a + duplicate or later name, carry on. We do this for duplicates so that in the + simple case (when ?(| is not used) they are in order of their numbers. In all + cases they are in the order in which they appear in the pattern. */ + + if (crc < 0) + { + (void)memmove(slot + cb->name_entry_size, slot, + CU2BYTES((tablecount - i) * cb->name_entry_size)); + break; + } + + /* Continue the loop for a later or duplicate name */ + + slot += cb->name_entry_size; + } + +PUT2(slot, 0, groupno); +memcpy(slot + IMM2_SIZE, name, CU2BYTES(length)); + +/* Add a terminating zero and fill the rest of the slot with zeroes so that +the memory is all initialized. Otherwise valgrind moans about uninitialized +memory when saving serialized compiled patterns. */ + +memset(slot + IMM2_SIZE + length, 0, + CU2BYTES(cb->name_entry_size - length - IMM2_SIZE)); +} + + + +/************************************************* +* Skip in parsed pattern * +*************************************************/ + +/* This function is called to skip parts of the parsed pattern when finding the +length of a lookbehind branch. 
It is called after (*ACCEPT) and (*FAIL) to find +the end of the branch, it is called to skip over an internal lookaround or +(DEFINE) group, and it is also called to skip to the end of a class, during +which it will never encounter nested groups (but there's no need to have +special code for that). + +When called to find the end of a branch or group, pptr must point to the first +meta code inside the branch, not the branch-starting code. In other cases it +can point to the item that causes the function to be called. + +Arguments: + pptr current pointer to skip from + skiptype PSKIP_CLASS when skipping to end of class + PSKIP_ALT when META_ALT ends the skip + PSKIP_KET when only META_KET ends the skip + +Returns: new value of pptr + NULL if META_END is reached - should never occur + or for an unknown meta value - likewise +*/ + +static uint32_t * +parsed_skip(uint32_t *pptr, uint32_t skiptype) +{ +uint32_t nestlevel = 0; + +for (;; pptr++) + { + uint32_t meta = META_CODE(*pptr); + + switch(meta) + { + default: /* Just skip over most items */ + if (meta < META_END) continue; /* Literal */ + break; + + /* This should never occur. */ + + case META_END: + return NULL; + + /* The data for these items is variable in length. */ + + case META_BACKREF: /* Offset is present only if group >= 10 */ + if (META_DATA(*pptr) >= 10) pptr += SIZEOFFSET; + break; + + case META_ESCAPE: /* A few escapes are followed by data items. */ + switch (META_DATA(*pptr)) + { + case ESC_P: + case ESC_p: + pptr += 1; + break; + + case ESC_g: + case ESC_k: + pptr += 1 + SIZEOFFSET; + break; + } + break; + + case META_MARK: /* Add the length of the name. */ + case META_COMMIT_ARG: + case META_PRUNE_ARG: + case META_SKIP_ARG: + case META_THEN_ARG: + pptr += pptr[1]; + break; + + /* These are the "active" items in this loop. 
*/ + + case META_CLASS_END: + if (skiptype == PSKIP_CLASS) return pptr; + break; + + case META_ATOMIC: + case META_CAPTURE: + case META_COND_ASSERT: + case META_COND_DEFINE: + case META_COND_NAME: + case META_COND_NUMBER: + case META_COND_RNAME: + case META_COND_RNUMBER: + case META_COND_VERSION: + case META_LOOKAHEAD: + case META_LOOKAHEADNOT: + case META_LOOKAHEAD_NA: + case META_LOOKBEHIND: + case META_LOOKBEHINDNOT: + case META_LOOKBEHIND_NA: + case META_NOCAPTURE: + case META_SCRIPT_RUN: + nestlevel++; + break; + + case META_ALT: + if (nestlevel == 0 && skiptype == PSKIP_ALT) return pptr; + break; + + case META_KET: + if (nestlevel == 0) return pptr; + nestlevel--; + break; + } + + /* The extra data item length for each meta is in a table. */ + + meta = (meta >> 16) & 0x7fff; + if (meta >= sizeof(meta_extra_lengths)) return NULL; + pptr += meta_extra_lengths[meta]; + } +/* Control never reaches here */ +return pptr; +} + + + +/************************************************* +* Find length of a parsed group * +*************************************************/ + +/* This is called for nested groups within a branch of a lookbehind whose +length is being computed. If all the branches in the nested group have the same +length, that is OK. On entry, the pointer must be at the first element after +the group initializing code. On exit it points to OP_KET. Caching is used to +improve processing speed when the same capturing group occurs many times. 
+ +Arguments: + pptrptr pointer to pointer in the parsed pattern + isinline FALSE if a reference or recursion; TRUE for inline group + errcodeptr pointer to the errorcode + lcptr pointer to the loop counter + group number of captured group or -1 for a non-capturing group + recurses chain of recurse_check to catch mutual recursion + cb pointer to the compile data + +Returns: the group length or a negative number +*/ + +static int +get_grouplength(uint32_t **pptrptr, BOOL isinline, int *errcodeptr, int *lcptr, + int group, parsed_recurse_check *recurses, compile_block *cb) +{ +int branchlength; +int grouplength = -1; + +/* The cache can be used only if there is no possibility of there being two +groups with the same number. We do not need to set the end pointer for a group +that is being processed as a back reference or recursion, but we must do so for +an inline group. */ + +if (group > 0 && (cb->external_flags & PCRE2_DUPCAPUSED) == 0) + { + uint32_t groupinfo = cb->groupinfo[group]; + if ((groupinfo & GI_NOT_FIXED_LENGTH) != 0) return -1; + if ((groupinfo & GI_SET_FIXED_LENGTH) != 0) + { + if (isinline) *pptrptr = parsed_skip(*pptrptr, PSKIP_KET); + return groupinfo & GI_FIXED_LENGTH_MASK; + } + } + +/* Scan the group. In this case we find the end pointer of necessity. 
*/ + +for(;;) + { + branchlength = get_branchlength(pptrptr, errcodeptr, lcptr, recurses, cb); + if (branchlength < 0) goto ISNOTFIXED; + if (grouplength == -1) grouplength = branchlength; + else if (grouplength != branchlength) goto ISNOTFIXED; + if (**pptrptr == META_KET) break; + *pptrptr += 1; /* Skip META_ALT */ + } + +if (group > 0) + cb->groupinfo[group] |= (uint32_t)(GI_SET_FIXED_LENGTH | grouplength); +return grouplength; + +ISNOTFIXED: +if (group > 0) cb->groupinfo[group] |= GI_NOT_FIXED_LENGTH; +return -1; +} + + + +/************************************************* +* Find length of a parsed branch * +*************************************************/ + +/* Return a fixed length for a branch in a lookbehind, giving an error if the +length is not fixed. On entry, *pptrptr points to the first element inside the +branch. On exit it is set to point to the ALT or KET. + +Arguments: + pptrptr pointer to pointer in the parsed pattern + errcodeptr pointer to error code + lcptr pointer to loop counter + recurses chain of recurse_check to catch mutual recursion + cb pointer to compile block + +Returns: the length, or a negative value on error +*/ + +static int +get_branchlength(uint32_t **pptrptr, int *errcodeptr, int *lcptr, + parsed_recurse_check *recurses, compile_block *cb) +{ +int branchlength = 0; +int grouplength; +uint32_t lastitemlength = 0; +uint32_t *pptr = *pptrptr; +PCRE2_SIZE offset; +parsed_recurse_check this_recurse; + +/* A large and/or complex regex can take too long to process. This can happen +more often when (?| groups are present in the pattern because their length +cannot be cached. */ + +if ((*lcptr)++ > 2000) + { + *errcodeptr = ERR35; /* Lookbehind is too complicated */ + return -1; + } + +/* Scan the branch, accumulating the length. 
*/ + +for (;; pptr++) + { + parsed_recurse_check *r; + uint32_t *gptr, *gptrend; + uint32_t escape; + uint32_t group = 0; + uint32_t itemlength = 0; + + if (*pptr < META_END) + { + itemlength = 1; + } + + else switch (META_CODE(*pptr)) + { + case META_KET: + case META_ALT: + goto EXIT; + + /* (*ACCEPT) and (*FAIL) terminate the branch, but we must skip to the + actual termination. */ + + case META_ACCEPT: + case META_FAIL: + pptr = parsed_skip(pptr, PSKIP_ALT); + if (pptr == NULL) goto PARSED_SKIP_FAILED; + goto EXIT; + + case META_MARK: + case META_COMMIT_ARG: + case META_PRUNE_ARG: + case META_SKIP_ARG: + case META_THEN_ARG: + pptr += pptr[1] + 1; + break; + + case META_CIRCUMFLEX: + case META_COMMIT: + case META_DOLLAR: + case META_PRUNE: + case META_SKIP: + case META_THEN: + break; + + case META_OPTIONS: + pptr += 1; + break; + + case META_BIGVALUE: + itemlength = 1; + pptr += 1; + break; + + case META_CLASS: + case META_CLASS_NOT: + itemlength = 1; + pptr = parsed_skip(pptr, PSKIP_CLASS); + if (pptr == NULL) goto PARSED_SKIP_FAILED; + break; + + case META_CLASS_EMPTY_NOT: + case META_DOT: + itemlength = 1; + break; + + case META_CALLOUT_NUMBER: + pptr += 3; + break; + + case META_CALLOUT_STRING: + pptr += 3 + SIZEOFFSET; + break; + + /* Only some escapes consume a character. Of those, \R and \X are never + allowed because they might match more than one character. \C is allowed only in + 32-bit and non-UTF 8/16-bit modes. */ + + case META_ESCAPE: + escape = META_DATA(*pptr); + if (escape == ESC_R || escape == ESC_X) return -1; + if (escape > ESC_b && escape < ESC_Z) + { +#if PCRE2_CODE_UNIT_WIDTH != 32 + if ((cb->external_options & PCRE2_UTF) != 0 && escape == ESC_C) + { + *errcodeptr = ERR36; + return -1; + } +#endif + itemlength = 1; + if (escape == ESC_p || escape == ESC_P) pptr++; /* Skip prop data */ + } + break; + + /* Lookaheads do not contribute to the length of this branch, but they may + contain lookbehinds within them whose lengths need to be set. 
*/ + + case META_LOOKAHEAD: + case META_LOOKAHEADNOT: + case META_LOOKAHEAD_NA: + *errcodeptr = check_lookbehinds(pptr + 1, &pptr, recurses, cb); + if (*errcodeptr != 0) return -1; + + /* Ignore any qualifiers that follow a lookahead assertion. */ + + switch (pptr[1]) + { + case META_ASTERISK: + case META_ASTERISK_PLUS: + case META_ASTERISK_QUERY: + case META_PLUS: + case META_PLUS_PLUS: + case META_PLUS_QUERY: + case META_QUERY: + case META_QUERY_PLUS: + case META_QUERY_QUERY: + pptr++; + break; + + case META_MINMAX: + case META_MINMAX_PLUS: + case META_MINMAX_QUERY: + pptr += 3; + break; + + default: + break; + } + break; + + /* A nested lookbehind does not contribute any length to this lookbehind, + but must itself be checked and have its lengths set. */ + + case META_LOOKBEHIND: + case META_LOOKBEHINDNOT: + case META_LOOKBEHIND_NA: + if (!set_lookbehind_lengths(&pptr, errcodeptr, lcptr, recurses, cb)) + return -1; + break; + + /* Back references and recursions are handled by very similar code. At this + stage, the names generated in the parsing pass are available, but the main + name table has not yet been created. So for the named varieties, scan the + list of names in order to get the number of the first one in the pattern, + and whether or not this name is duplicated. 
*/ + + case META_BACKREF_BYNAME: + if ((cb->external_options & PCRE2_MATCH_UNSET_BACKREF) != 0) + goto ISNOTFIXED; + /* Fall through */ + + case META_RECURSE_BYNAME: + { + int i; + PCRE2_SPTR name; + BOOL is_dupname = FALSE; + named_group *ng = cb->named_groups; + uint32_t meta_code = META_CODE(*pptr); + uint32_t length = *(++pptr); + + GETPLUSOFFSET(offset, pptr); + name = cb->start_pattern + offset; + for (i = 0; i < cb->names_found; i++, ng++) + { + if (length == ng->length && PRIV(strncmp)(name, ng->name, length) == 0) + { + group = ng->number; + is_dupname = ng->isdup; + break; + } + } + + if (group == 0) + { + *errcodeptr = ERR15; /* Non-existent subpattern */ + cb->erroroffset = offset; + return -1; + } + + /* A numerical back reference can be fixed length if duplicate capturing + groups are not being used. A non-duplicate named back reference can also + be handled. */ + + if (meta_code == META_RECURSE_BYNAME || + (!is_dupname && (cb->external_flags & PCRE2_DUPCAPUSED) == 0)) + goto RECURSE_OR_BACKREF_LENGTH; /* Handle as a numbered version. */ + } + goto ISNOTFIXED; /* Duplicate name or number */ + + /* The offset values for back references < 10 are in a separate vector + because otherwise they would use more than two parsed pattern elements on + 64-bit systems. */ + + case META_BACKREF: + if ((cb->external_options & PCRE2_MATCH_UNSET_BACKREF) != 0 || + (cb->external_flags & PCRE2_DUPCAPUSED) != 0) + goto ISNOTFIXED; + group = META_DATA(*pptr); + if (group < 10) + { + offset = cb->small_ref_offset[group]; + goto RECURSE_OR_BACKREF_LENGTH; + } + + /* Fall through */ + /* For groups >= 10 - picking up group twice does no harm. */ + + /* A true recursion implies not fixed length, but a subroutine call may + be OK. Back reference "recursions" are also failed. 
*/ + + case META_RECURSE: + group = META_DATA(*pptr); + GETPLUSOFFSET(offset, pptr); + + RECURSE_OR_BACKREF_LENGTH: + if (group > cb->bracount) + { + cb->erroroffset = offset; + *errcodeptr = ERR15; /* Non-existent subpattern */ + return -1; + } + if (group == 0) goto ISNOTFIXED; /* Local recursion */ + for (gptr = cb->parsed_pattern; *gptr != META_END; gptr++) + { + if (META_CODE(*gptr) == META_BIGVALUE) gptr++; + else if (*gptr == (META_CAPTURE | group)) break; + } + + /* We must start the search for the end of the group at the first meta code + inside the group. Otherwise it will be treated as an enclosed group. */ + + gptrend = parsed_skip(gptr + 1, PSKIP_KET); + if (gptrend == NULL) goto PARSED_SKIP_FAILED; + if (pptr > gptr && pptr < gptrend) goto ISNOTFIXED; /* Local recursion */ + for (r = recurses; r != NULL; r = r->prev) if (r->groupptr == gptr) break; + if (r != NULL) goto ISNOTFIXED; /* Mutual recursion */ + this_recurse.prev = recurses; + this_recurse.groupptr = gptr; + + /* We do not need to know the position of the end of the group, that is, + gptr is not used after the call to get_grouplength(). Setting the second + argument FALSE stops it scanning for the end when the length can be found + in the cache. */ + + gptr++; + grouplength = get_grouplength(&gptr, FALSE, errcodeptr, lcptr, group, + &this_recurse, cb); + if (grouplength < 0) + { + if (*errcodeptr == 0) goto ISNOTFIXED; + return -1; /* Error already set */ + } + itemlength = grouplength; + break; + + /* A (DEFINE) group is never obeyed inline and so it does not contribute to + the length of this branch. Skip from the following item to the next + unpaired ket. */ + + case META_COND_DEFINE: + pptr = parsed_skip(pptr + 1, PSKIP_KET); + break; + + /* Check other nested groups - advance past the initial data for each type + and then seek a fixed length with get_grouplength(). 
*/ + + case META_COND_NAME: + case META_COND_NUMBER: + case META_COND_RNAME: + case META_COND_RNUMBER: + pptr += 2 + SIZEOFFSET; + goto CHECK_GROUP; + + case META_COND_ASSERT: + pptr += 1; + goto CHECK_GROUP; + + case META_COND_VERSION: + pptr += 4; + goto CHECK_GROUP; + + case META_CAPTURE: + group = META_DATA(*pptr); + /* Fall through */ + + case META_ATOMIC: + case META_NOCAPTURE: + case META_SCRIPT_RUN: + pptr++; + CHECK_GROUP: + grouplength = get_grouplength(&pptr, TRUE, errcodeptr, lcptr, group, + recurses, cb); + if (grouplength < 0) return -1; + itemlength = grouplength; + break; + + /* Exact repetition is OK; variable repetition is not. A repetition of zero + must subtract the length that has already been added. */ + + case META_MINMAX: + case META_MINMAX_PLUS: + case META_MINMAX_QUERY: + if (pptr[1] == pptr[2]) + { + switch(pptr[1]) + { + case 0: + branchlength -= lastitemlength; + break; + + case 1: + itemlength = 0; + break; + + default: /* Check for integer overflow */ + if (lastitemlength != 0 && /* Should not occur, but just in case */ + INT_MAX/lastitemlength < pptr[1] - 1) + { + *errcodeptr = ERR87; /* Integer overflow; lookbehind too big */ + return -1; + } + itemlength = (pptr[1] - 1) * lastitemlength; + break; + } + pptr += 2; + break; + } + /* Fall through */ + + /* Any other item means this branch does not have a fixed length. */ + + default: + ISNOTFIXED: + *errcodeptr = ERR25; /* Not fixed length */ + return -1; + } + + /* Add the item length to the branchlength, checking for integer overflow and + for the branch length exceeding the limit. */ + + if (INT_MAX - branchlength < (int)itemlength || + (branchlength += itemlength) > LOOKBEHIND_MAX) + { + *errcodeptr = ERR87; + return -1; + } + + /* Save this item length for use if the next item is a quantifier. 
*/ + + lastitemlength = itemlength; + } + +EXIT: +*pptrptr = pptr; +return branchlength; + +PARSED_SKIP_FAILED: +*errcodeptr = ERR90; +return -1; +} + + + +/************************************************* +* Set lengths in a lookbehind * +*************************************************/ + +/* This function is called for each lookbehind, to set the lengths in its +branches. An error occurs if any branch does not have a fixed length that is +less than the maximum (65535). On exit, the pointer must be left on the final +ket. + +The function also maintains the max_lookbehind value. Any lookbehind branch +that contains a nested lookbehind may actually look further back than the +length of the branch. The additional amount is passed back from +get_branchlength() as an "extra" value. + +Arguments: + pptrptr pointer to pointer in the parsed pattern + errcodeptr pointer to error code + lcptr pointer to loop counter + recurses chain of recurse_check to catch mutual recursion + cb pointer to compile block + +Returns: TRUE if all is well + FALSE otherwise, with error code and offset set +*/ + +static BOOL +set_lookbehind_lengths(uint32_t **pptrptr, int *errcodeptr, int *lcptr, + parsed_recurse_check *recurses, compile_block *cb) +{ +PCRE2_SIZE offset; +int branchlength; +uint32_t *bptr = *pptrptr; + +READPLUSOFFSET(offset, bptr); /* Offset for error messages */ +*pptrptr += SIZEOFFSET; + +do + { + *pptrptr += 1; + branchlength = get_branchlength(pptrptr, errcodeptr, lcptr, recurses, cb); + if (branchlength < 0) + { + /* The errorcode and offset may already be set from a nested lookbehind. 
*/ + if (*errcodeptr == 0) *errcodeptr = ERR25; + if (cb->erroroffset == PCRE2_UNSET) cb->erroroffset = offset; + return FALSE; + } + if (branchlength > cb->max_lookbehind) cb->max_lookbehind = branchlength; + *bptr |= branchlength; /* branchlength never more than 65535 */ + bptr = *pptrptr; + } +while (*bptr == META_ALT); + +return TRUE; +} + + + +/************************************************* +* Check parsed pattern lookbehinds * +*************************************************/ + +/* This function is called at the end of parsing a pattern if any lookbehinds +were encountered. It scans the parsed pattern for them, calling +set_lookbehind_lengths() for each one. At the start, the errorcode is zero and +the error offset is marked unset. This enables the functions above not to +override settings from deeper nestings. + +This function is called recursively from get_branchlength() for lookaheads in +order to process any lookbehinds that they may contain. It stops when it hits a +non-nested closing parenthesis in this case, returning a pointer to it. 
+ +Arguments + pptr points to where to start (start of pattern or start of lookahead) + retptr if not NULL, return the ket pointer here + recurses chain of recurse_check to catch mutual recursion + cb points to the compile block + +Returns: 0 on success, or an errorcode (cb->erroroffset will be set) +*/ + +static int +check_lookbehinds(uint32_t *pptr, uint32_t **retptr, + parsed_recurse_check *recurses, compile_block *cb) +{ +int errorcode = 0; +int loopcount = 0; +int nestlevel = 0; + +cb->erroroffset = PCRE2_UNSET; + +for (; *pptr != META_END; pptr++) + { + if (*pptr < META_END) continue; /* Literal */ + + switch (META_CODE(*pptr)) + { + default: + return ERR70; /* Unrecognized meta code */ + + case META_ESCAPE: + if (*pptr - META_ESCAPE == ESC_P || *pptr - META_ESCAPE == ESC_p) + pptr += 1; + break; + + case META_KET: + if (--nestlevel < 0) + { + if (retptr != NULL) *retptr = pptr; + return 0; + } + break; + + case META_ATOMIC: + case META_CAPTURE: + case META_COND_ASSERT: + case META_LOOKAHEAD: + case META_LOOKAHEADNOT: + case META_LOOKAHEAD_NA: + case META_NOCAPTURE: + case META_SCRIPT_RUN: + nestlevel++; + break; + + case META_ACCEPT: + case META_ALT: + case META_ASTERISK: + case META_ASTERISK_PLUS: + case META_ASTERISK_QUERY: + case META_BACKREF: + case META_CIRCUMFLEX: + case META_CLASS: + case META_CLASS_EMPTY: + case META_CLASS_EMPTY_NOT: + case META_CLASS_END: + case META_CLASS_NOT: + case META_COMMIT: + case META_DOLLAR: + case META_DOT: + case META_FAIL: + case META_PLUS: + case META_PLUS_PLUS: + case META_PLUS_QUERY: + case META_PRUNE: + case META_QUERY: + case META_QUERY_PLUS: + case META_QUERY_QUERY: + case META_RANGE_ESCAPED: + case META_RANGE_LITERAL: + case META_SKIP: + case META_THEN: + break; + + case META_RECURSE: + pptr += SIZEOFFSET; + break; + + case META_BACKREF_BYNAME: + case META_RECURSE_BYNAME: + pptr += 1 + SIZEOFFSET; + break; + + case META_COND_DEFINE: + pptr += SIZEOFFSET; + nestlevel++; + break; + + case META_COND_NAME: + case 
META_COND_NUMBER: + case META_COND_RNAME: + case META_COND_RNUMBER: + pptr += 1 + SIZEOFFSET; + nestlevel++; + break; + + case META_COND_VERSION: + pptr += 3; + nestlevel++; + break; + + case META_CALLOUT_STRING: + pptr += 3 + SIZEOFFSET; + break; + + case META_BIGVALUE: + case META_OPTIONS: + case META_POSIX: + case META_POSIX_NEG: + pptr += 1; + break; + + case META_MINMAX: + case META_MINMAX_QUERY: + case META_MINMAX_PLUS: + pptr += 2; + break; + + case META_CALLOUT_NUMBER: + pptr += 3; + break; + + case META_MARK: + case META_COMMIT_ARG: + case META_PRUNE_ARG: + case META_SKIP_ARG: + case META_THEN_ARG: + pptr += 1 + pptr[1]; + break; + + case META_LOOKBEHIND: + case META_LOOKBEHINDNOT: + case META_LOOKBEHIND_NA: + if (!set_lookbehind_lengths(&pptr, &errorcode, &loopcount, recurses, cb)) + return errorcode; + break; + } + } + +return 0; +} + + + +/************************************************* +* External function to compile a pattern * +*************************************************/ + +/* This function reads a regular expression in the form of a string and returns +a pointer to a block of store holding a compiled version of the expression. 
+ +Arguments: + pattern the regular expression + patlen the length of the pattern, or PCRE2_ZERO_TERMINATED + options option bits + errorptr pointer to errorcode + erroroffset pointer to error offset + ccontext points to a compile context or is NULL + +Returns: pointer to compiled data block, or NULL on error, + with errorcode and erroroffset set +*/ + +PCRE2_EXP_DEFN pcre2_code * PCRE2_CALL_CONVENTION +pcre2_compile(PCRE2_SPTR pattern, PCRE2_SIZE patlen, uint32_t options, + int *errorptr, PCRE2_SIZE *erroroffset, pcre2_compile_context *ccontext) +{ +BOOL utf; /* Set TRUE for UTF mode */ +BOOL ucp; /* Set TRUE for UCP mode */ +BOOL has_lookbehind = FALSE; /* Set TRUE if a lookbehind is found */ +BOOL zero_terminated; /* Set TRUE for zero-terminated pattern */ +pcre2_real_code *re = NULL; /* What we will return */ +compile_block cb; /* "Static" compile-time data */ +const uint8_t *tables; /* Char tables base pointer */ + +PCRE2_UCHAR *code; /* Current pointer in compiled code */ +PCRE2_SPTR codestart; /* Start of compiled code */ +PCRE2_SPTR ptr; /* Current pointer in pattern */ +uint32_t *pptr; /* Current pointer in parsed pattern */ + +PCRE2_SIZE length = 1; /* Allow for final END opcode */ +PCRE2_SIZE usedlength; /* Actual length used */ +PCRE2_SIZE re_blocksize; /* Size of memory block */ +PCRE2_SIZE big32count = 0; /* 32-bit literals >= 0x80000000 */ +PCRE2_SIZE parsed_size_needed; /* Needed for parsed pattern */ + +int32_t firstcuflags, reqcuflags; /* Type of first/req code unit */ +uint32_t firstcu, reqcu; /* Value of first/req code unit */ +uint32_t setflags = 0; /* NL and BSR set flags */ + +uint32_t skipatstart; /* When checking (*UTF) etc */ +uint32_t limit_heap = UINT32_MAX; +uint32_t limit_match = UINT32_MAX; /* Unset match limits */ +uint32_t limit_depth = UINT32_MAX; + +int newline = 0; /* Unset; can be set by the pattern */ +int bsr = 0; /* Unset; can be set by the pattern */ +int errorcode = 0; /* Initialize to avoid compiler warn */ +int regexrc; 
/* Return from compile */ + +uint32_t i; /* Local loop counter */ + +/* Comments at the head of this file explain about these variables. */ + +uint32_t stack_groupinfo[GROUPINFO_DEFAULT_SIZE]; +uint32_t stack_parsed_pattern[PARSED_PATTERN_DEFAULT_SIZE]; +named_group named_groups[NAMED_GROUP_LIST_SIZE]; + +/* The workspace is used in different ways in the different compiling phases. +It needs to be 16-bit aligned for the preliminary parsing scan. */ + +uint32_t c16workspace[C16_WORK_SIZE]; +PCRE2_UCHAR *cworkspace = (PCRE2_UCHAR *)c16workspace; + + +/* -------------- Check arguments and set up the pattern ----------------- */ + +/* There must be error code and offset pointers. */ + +if (errorptr == NULL || erroroffset == NULL) return NULL; +*errorptr = ERR0; +*erroroffset = 0; + +/* There must be a pattern! */ + +if (pattern == NULL) + { + *errorptr = ERR16; + return NULL; + } + +/* A NULL compile context means "use a default context" */ + +if (ccontext == NULL) + ccontext = (pcre2_compile_context *)(&PRIV(default_compile_context)); + +/* PCRE2_MATCH_INVALID_UTF implies UTF */ + +if ((options & PCRE2_MATCH_INVALID_UTF) != 0) options |= PCRE2_UTF; + +/* Check that all undefined public option bits are zero. */ + +if ((options & ~PUBLIC_COMPILE_OPTIONS) != 0 || + (ccontext->extra_options & ~PUBLIC_COMPILE_EXTRA_OPTIONS) != 0) + { + *errorptr = ERR17; + return NULL; + } + +if ((options & PCRE2_LITERAL) != 0 && + ((options & ~PUBLIC_LITERAL_COMPILE_OPTIONS) != 0 || + (ccontext->extra_options & ~PUBLIC_LITERAL_COMPILE_EXTRA_OPTIONS) != 0)) + { + *errorptr = ERR92; + return NULL; + } + +/* A zero-terminated pattern is indicated by the special length value +PCRE2_ZERO_TERMINATED. Check for an overlong pattern. 
*/ + +if ((zero_terminated = (patlen == PCRE2_ZERO_TERMINATED))) + patlen = PRIV(strlen)(pattern); + +if (patlen > ccontext->max_pattern_length) + { + *errorptr = ERR88; + return NULL; + } + +/* From here on, all returns from this function should end up going via the +EXIT label. */ + + +/* ------------ Initialize the "static" compile data -------------- */ + +tables = (ccontext->tables != NULL)? ccontext->tables : PRIV(default_tables); + +cb.lcc = tables + lcc_offset; /* Individual */ +cb.fcc = tables + fcc_offset; /* character */ +cb.cbits = tables + cbits_offset; /* tables */ +cb.ctypes = tables + ctypes_offset; + +cb.assert_depth = 0; +cb.bracount = 0; +cb.cx = ccontext; +cb.dupnames = FALSE; +cb.end_pattern = pattern + patlen; +cb.erroroffset = 0; +cb.external_flags = 0; +cb.external_options = options; +cb.groupinfo = stack_groupinfo; +cb.had_recurse = FALSE; +cb.lastcapture = 0; +cb.max_lookbehind = 0; +cb.name_entry_size = 0; +cb.name_table = NULL; +cb.named_groups = named_groups; +cb.named_group_list_size = NAMED_GROUP_LIST_SIZE; +cb.names_found = 0; +cb.open_caps = NULL; +cb.parens_depth = 0; +cb.parsed_pattern = stack_parsed_pattern; +cb.req_varyopt = 0; +cb.start_code = cworkspace; +cb.start_pattern = pattern; +cb.start_workspace = cworkspace; +cb.workspace_size = COMPILE_WORK_SIZE; + +/* Maximum back reference and backref bitmap. The bitmap records up to 31 back +references to help in deciding whether (.*) can be treated as anchored or not. +*/ + +cb.top_backref = 0; +cb.backref_map = 0; + +/* Escape sequences \1 to \9 are always back references, but as they are only +two characters long, only two elements can be used in the parsed_pattern +vector. The first contains the reference, and we'd like to use the second to +record the offset in the pattern, so that forward references to non-existent +groups can be diagnosed later with an offset. However, on 64-bit systems, +PCRE2_SIZE won't fit. 
Instead, we have a vector of offsets for the first +occurrence of \1 to \9, indexed by the second parsed_pattern value. All other +references have enough space for the offset to be put into the parsed pattern. +*/ + +for (i = 0; i < 10; i++) cb.small_ref_offset[i] = PCRE2_UNSET; + + +/* --------------- Start looking at the pattern --------------- */ + +/* Unless PCRE2_LITERAL is set, check for global one-time option settings at +the start of the pattern, and remember the offset to the actual regex. With +valgrind support, make the terminator of a zero-terminated pattern +inaccessible. This catches bugs that would otherwise only show up for +non-zero-terminated patterns. */ + +#ifdef SUPPORT_VALGRIND +if (zero_terminated) VALGRIND_MAKE_MEM_NOACCESS(pattern + patlen, CU2BYTES(1)); +#endif + +ptr = pattern; +skipatstart = 0; + +if ((options & PCRE2_LITERAL) == 0) + { + while (patlen - skipatstart >= 2 && + ptr[skipatstart] == CHAR_LEFT_PARENTHESIS && + ptr[skipatstart+1] == CHAR_ASTERISK) + { + for (i = 0; i < sizeof(pso_list)/sizeof(pso); i++) + { + uint32_t c, pp; + pso *p = pso_list + i; + + if (patlen - skipatstart - 2 >= p->length && + PRIV(strncmp_c8)(ptr + skipatstart + 2, (char *)(p->name), + p->length) == 0) + { + skipatstart += p->length + 2; + switch(p->type) + { + case PSO_OPT: + cb.external_options |= p->value; + break; + + case PSO_FLG: + setflags |= p->value; + break; + + case PSO_NL: + newline = p->value; + setflags |= PCRE2_NL_SET; + break; + + case PSO_BSR: + bsr = p->value; + setflags |= PCRE2_BSR_SET; + break; + + case PSO_LIMM: + case PSO_LIMD: + case PSO_LIMH: + c = 0; + pp = skipatstart; + if (!IS_DIGIT(ptr[pp])) + { + errorcode = ERR60; + ptr += pp; + goto HAD_EARLY_ERROR; + } + while (IS_DIGIT(ptr[pp])) + { + if (c > UINT32_MAX / 10 - 1) break; /* Integer overflow */ + c = c*10 + (ptr[pp++] - CHAR_0); + } + if (ptr[pp++] != CHAR_RIGHT_PARENTHESIS) + { + errorcode = ERR60; + ptr += pp; + goto HAD_EARLY_ERROR; + } + if (p->type == PSO_LIMH) 
limit_heap = c; + else if (p->type == PSO_LIMM) limit_match = c; + else limit_depth = c; + skipatstart += pp - skipatstart; + break; + } + break; /* Out of the table scan loop */ + } + } + if (i >= sizeof(pso_list)/sizeof(pso)) break; /* Out of pso loop */ + } + } + +/* End of pattern-start options; advance to start of real regex. */ + +ptr += skipatstart; + +/* Can't support UTF or UCP if PCRE2 was built without Unicode support. */ + +#ifndef SUPPORT_UNICODE +if ((cb.external_options & (PCRE2_UTF|PCRE2_UCP)) != 0) + { + errorcode = ERR32; + goto HAD_EARLY_ERROR; + } +#endif + +/* Check UTF. We have the original options in 'options', with that value as +modified by (*UTF) etc in cb->external_options. The extra option +PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not permitted in UTF-16 mode because the +surrogate code points cannot be represented in UTF-16. */ + +utf = (cb.external_options & PCRE2_UTF) != 0; +if (utf) + { + if ((options & PCRE2_NEVER_UTF) != 0) + { + errorcode = ERR74; + goto HAD_EARLY_ERROR; + } + if ((options & PCRE2_NO_UTF_CHECK) == 0 && + (errorcode = PRIV(valid_utf)(pattern, patlen, erroroffset)) != 0) + goto HAD_ERROR; /* Offset was set by valid_utf() */ + +#if PCRE2_CODE_UNIT_WIDTH == 16 + if ((ccontext->extra_options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0) + { + errorcode = ERR91; + goto HAD_EARLY_ERROR; + } +#endif + } + +/* Check UCP lockout. */ + +ucp = (cb.external_options & PCRE2_UCP) != 0; +if (ucp && (cb.external_options & PCRE2_NEVER_UCP) != 0) + { + errorcode = ERR75; + goto HAD_EARLY_ERROR; + } + +/* Process the BSR setting. */ + +if (bsr == 0) bsr = ccontext->bsr_convention; + +/* Process the newline setting. 
*/ + +if (newline == 0) newline = ccontext->newline_convention; +cb.nltype = NLTYPE_FIXED; +switch(newline) + { + case PCRE2_NEWLINE_CR: + cb.nllen = 1; + cb.nl[0] = CHAR_CR; + break; + + case PCRE2_NEWLINE_LF: + cb.nllen = 1; + cb.nl[0] = CHAR_NL; + break; + + case PCRE2_NEWLINE_NUL: + cb.nllen = 1; + cb.nl[0] = CHAR_NUL; + break; + + case PCRE2_NEWLINE_CRLF: + cb.nllen = 2; + cb.nl[0] = CHAR_CR; + cb.nl[1] = CHAR_NL; + break; + + case PCRE2_NEWLINE_ANY: + cb.nltype = NLTYPE_ANY; + break; + + case PCRE2_NEWLINE_ANYCRLF: + cb.nltype = NLTYPE_ANYCRLF; + break; + + default: + errorcode = ERR56; + goto HAD_EARLY_ERROR; + } + +/* Pre-scan the pattern to do two things: (1) Discover the named groups and +their numerical equivalents, so that this information is always available for +the remaining processing. (2) At the same time, parse the pattern and put a +processed version into the parsed_pattern vector. This has escapes interpreted +and comments removed (amongst other things). + +In all but one case, when PCRE2_AUTO_CALLOUT is not set, the number of unsigned +32-bit ints in the parsed pattern is bounded by the length of the pattern plus +one (for the terminator) plus four if PCRE2_EXTRA_MATCH_WORD or PCRE2_EXTRA_MATCH_LINE is +set. The exceptional case is when running in 32-bit, non-UTF mode, when literal +characters greater than META_END (0x80000000) have to be coded as two units. In +this case, therefore, we scan the pattern to check for such values. */ + +#if PCRE2_CODE_UNIT_WIDTH == 32 +if (!utf) + { + PCRE2_SPTR p; + for (p = ptr; p < cb.end_pattern; p++) if (*p >= META_END) big32count++; + } +#endif + +/* Ensure that the parsed pattern buffer is big enough. When PCRE2_AUTO_CALLOUT +is set we have to assume a numerical callout (4 elements) for each character +plus one at the end. This is overkill, but memory is plentiful these days. For +many smaller patterns the vector on the stack (which was set up above) can be +used. 
*/ + +parsed_size_needed = patlen - skipatstart + big32count; + +if ((ccontext->extra_options & + (PCRE2_EXTRA_MATCH_WORD|PCRE2_EXTRA_MATCH_LINE)) != 0) + parsed_size_needed += 4; + +if ((options & PCRE2_AUTO_CALLOUT) != 0) + parsed_size_needed = (parsed_size_needed + 1) * 5; + +if (parsed_size_needed >= PARSED_PATTERN_DEFAULT_SIZE) + { + uint32_t *heap_parsed_pattern = ccontext->memctl.malloc( + (parsed_size_needed + 1) * sizeof(uint32_t), ccontext->memctl.memory_data); + if (heap_parsed_pattern == NULL) + { + *errorptr = ERR21; + goto EXIT; + } + cb.parsed_pattern = heap_parsed_pattern; + } +cb.parsed_pattern_end = cb.parsed_pattern + parsed_size_needed + 1; + +/* Do the parsing scan. */ + +errorcode = parse_regex(ptr, cb.external_options, &has_lookbehind, &cb); +if (errorcode != 0) goto HAD_CB_ERROR; + +/* Workspace is needed to remember information about numbered groups: whether a +group can match an empty string and what its fixed length is. This is done to +avoid the possibility of recursive references causing very long compile times +when checking these features. Unnumbered groups do not have this exposure since +they cannot be referenced. We use an indexed vector for this purpose. If there +are sufficiently few groups, the default vector on the stack, as set up above, +can be used. Otherwise we have to get/free a special vector. The vector must be +initialized to zero. */ + +if (cb.bracount >= GROUPINFO_DEFAULT_SIZE) + { + cb.groupinfo = ccontext->memctl.malloc( + (cb.bracount + 1)*sizeof(uint32_t), ccontext->memctl.memory_data); + if (cb.groupinfo == NULL) + { + errorcode = ERR21; + cb.erroroffset = 0; + goto HAD_CB_ERROR; + } + } +memset(cb.groupinfo, 0, (cb.bracount + 1) * sizeof(uint32_t)); + +/* If there were any lookbehinds, scan the parsed pattern to figure out their +lengths. 
*/ + +if (has_lookbehind) + { + errorcode = check_lookbehinds(cb.parsed_pattern, NULL, NULL, &cb); + if (errorcode != 0) goto HAD_CB_ERROR; + } + +/* For debugging, there is a function that shows the parsed data vector. */ + +#ifdef DEBUG_SHOW_PARSED +fprintf(stderr, "+++ Pre-scan complete:\n"); +show_parsed(&cb); +#endif + +/* For debugging capturing information this code can be enabled. */ + +#ifdef DEBUG_SHOW_CAPTURES + { + named_group *ng = cb.named_groups; + fprintf(stderr, "+++Captures: %d\n", cb.bracount); + for (i = 0; i < cb.names_found; i++, ng++) + { + fprintf(stderr, "+++%3d %.*s\n", ng->number, ng->length, ng->name); + } + } +#endif + +/* Pretend to compile the pattern while actually just accumulating the amount +of memory required in the 'length' variable. This behaviour is triggered by +passing a non-NULL final argument to compile_regex(). We pass a block of +workspace (cworkspace) for it to compile parts of the pattern into; the +compiled code is discarded when it is no longer needed, so hopefully this +workspace will never overflow, though there is a test for its doing so. + +On error, errorcode will be set non-zero, so we don't need to look at the +result of the function. The initial options have been put into the cb block, +but we still have to pass a separate options variable (the first argument) +because the options may change as the pattern is processed. */ + +cb.erroroffset = patlen; /* For any subsequent errors that do not set it */ +pptr = cb.parsed_pattern; +code = cworkspace; +*code = OP_BRA; + +(void)compile_regex(cb.external_options, &code, &pptr, &errorcode, 0, &firstcu, + &firstcuflags, &reqcu, &reqcuflags, NULL, &cb, &length); + +if (errorcode != 0) goto HAD_CB_ERROR; /* Offset is in cb.erroroffset */ + +/* This should be caught in compile_regex(), but just in case... 
*/ + +if (length > MAX_PATTERN_SIZE) + { + errorcode = ERR20; + goto HAD_CB_ERROR; + } + +/* Compute the size of, and then get and initialize, the data block for storing +the compiled pattern and names table. Integer overflow should no longer be +possible because nowadays we limit the maximum value of cb.names_found and +cb.name_entry_size. */ + +re_blocksize = sizeof(pcre2_real_code) + + CU2BYTES(length + + (PCRE2_SIZE)cb.names_found * (PCRE2_SIZE)cb.name_entry_size); +re = (pcre2_real_code *) + ccontext->memctl.malloc(re_blocksize, ccontext->memctl.memory_data); +if (re == NULL) + { + errorcode = ERR21; + goto HAD_CB_ERROR; + } + +/* The compiler may put padding at the end of the pcre2_real_code structure in +order to round it up to a multiple of 4 or 8 bytes. This means that when a +compiled pattern is copied (for example, when serialized) undefined bytes are +read, and this annoys debuggers such as valgrind. To avoid this, we explicitly +write to the last 8 bytes of the structure before setting the fields. */ + +memset((char *)re + sizeof(pcre2_real_code) - 8, 0, 8); +re->memctl = ccontext->memctl; +re->tables = tables; +re->executable_jit = NULL; +memset(re->start_bitmap, 0, 32 * sizeof(uint8_t)); +re->blocksize = re_blocksize; +re->magic_number = MAGIC_NUMBER; +re->compile_options = options; +re->overall_options = cb.external_options; +re->extra_options = ccontext->extra_options; +re->flags = PCRE2_CODE_UNIT_WIDTH/8 | cb.external_flags | setflags; +re->limit_heap = limit_heap; +re->limit_match = limit_match; +re->limit_depth = limit_depth; +re->first_codeunit = 0; +re->last_codeunit = 0; +re->bsr_convention = bsr; +re->newline_convention = newline; +re->max_lookbehind = 0; +re->minlength = 0; +re->top_bracket = 0; +re->top_backref = 0; +re->name_entry_size = cb.name_entry_size; +re->name_count = cb.names_found; + +/* The basic block is immediately followed by the name table, and the compiled +code follows after that. 
*/ + +codestart = (PCRE2_SPTR)((uint8_t *)re + sizeof(pcre2_real_code)) + + re->name_entry_size * re->name_count; + +/* Update the compile data block for the actual compile. The starting points of +the name/number translation table and of the code are passed around in the +compile data block. The start/end pattern and initial options are already set +from the pre-compile phase, as is the name_entry_size field. */ + +cb.parens_depth = 0; +cb.assert_depth = 0; +cb.lastcapture = 0; +cb.name_table = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)); +cb.start_code = codestart; +cb.req_varyopt = 0; +cb.had_accept = FALSE; +cb.had_pruneorskip = FALSE; +cb.open_caps = NULL; + +/* If any named groups were found, create the name/number table from the list +created in the pre-pass. */ + +if (cb.names_found > 0) + { + named_group *ng = cb.named_groups; + for (i = 0; i < cb.names_found; i++, ng++) + add_name_to_table(&cb, ng->name, ng->length, ng->number, i); + } + +/* Set up a starting, non-extracting bracket, then compile the expression. On +error, errorcode will be set non-zero, so we don't need to look at the result +of the function here. */ + +pptr = cb.parsed_pattern; +code = (PCRE2_UCHAR *)codestart; +*code = OP_BRA; +regexrc = compile_regex(re->overall_options, &code, &pptr, &errorcode, 0, + &firstcu, &firstcuflags, &reqcu, &reqcuflags, NULL, &cb, NULL); +if (regexrc < 0) re->flags |= PCRE2_MATCH_EMPTY; +re->top_bracket = cb.bracount; +re->top_backref = cb.top_backref; +re->max_lookbehind = cb.max_lookbehind; + +if (cb.had_accept) + { + reqcu = 0; /* Must disable after (*ACCEPT) */ + reqcuflags = REQ_NONE; + re->flags |= PCRE2_HASACCEPT; /* Disables minimum length */ + } + +/* Fill in the final opcode and check for disastrous overflow. 
If no overflow, +but the estimated length exceeds the really used length, adjust the value of +re->blocksize, and if valgrind support is configured, mark the extra allocated +memory as unaddressable, so that any out-of-bound reads can be detected. */ + +*code++ = OP_END; +usedlength = code - codestart; +if (usedlength > length) errorcode = ERR23; else + { + re->blocksize -= CU2BYTES(length - usedlength); +#ifdef SUPPORT_VALGRIND + VALGRIND_MAKE_MEM_NOACCESS(code, CU2BYTES(length - usedlength)); +#endif + } + +/* Scan the pattern for recursion/subroutine calls and convert the group +numbers into offsets. Maintain a small cache so that repeated groups containing +recursions are efficiently handled. */ + +#define RSCAN_CACHE_SIZE 8 + +if (errorcode == 0 && cb.had_recurse) + { + PCRE2_UCHAR *rcode; + PCRE2_SPTR rgroup; + unsigned int ccount = 0; + int start = RSCAN_CACHE_SIZE; + recurse_cache rc[RSCAN_CACHE_SIZE]; + + for (rcode = (PCRE2_UCHAR *)find_recurse(codestart, utf); + rcode != NULL; + rcode = (PCRE2_UCHAR *)find_recurse(rcode + 1 + LINK_SIZE, utf)) + { + int p, groupnumber; + + groupnumber = (int)GET(rcode, 1); + if (groupnumber == 0) rgroup = codestart; else + { + PCRE2_SPTR search_from = codestart; + rgroup = NULL; + for (i = 0, p = start; i < ccount; i++, p = (p + 1) & 7) + { + if (groupnumber == rc[p].groupnumber) + { + rgroup = rc[p].group; + break; + } + + /* Group n+1 must always start to the right of group n, so we can save + search time below when the new group number is greater than any of the + previously found groups. 
*/ + + if (groupnumber > rc[p].groupnumber) search_from = rc[p].group; + } + + if (rgroup == NULL) + { + rgroup = PRIV(find_bracket)(search_from, utf, groupnumber); + if (rgroup == NULL) + { + errorcode = ERR53; + break; + } + if (--start < 0) start = RSCAN_CACHE_SIZE - 1; + rc[start].groupnumber = groupnumber; + rc[start].group = rgroup; + if (ccount < RSCAN_CACHE_SIZE) ccount++; + } + } + + PUT(rcode, 1, rgroup - codestart); + } + } + +/* In rare debugging situations we sometimes need to look at the compiled code +at this stage. */ + +#ifdef DEBUG_CALL_PRINTINT +pcre2_printint(re, stderr, TRUE); +fprintf(stderr, "Length=%lu Used=%lu\n", length, usedlength); +#endif + +/* Unless disabled, check whether any single character iterators can be +auto-possessified. The function overwrites the appropriate opcode values, so +the type of the pointer must be cast. NOTE: the intermediate variable "temp" is +used in this code because at least one compiler gives a warning about loss of +"const" attribute if the cast (PCRE2_UCHAR *)codestart is used directly in the +function call. */ + +if (errorcode == 0 && (re->overall_options & PCRE2_NO_AUTO_POSSESS) == 0) + { + PCRE2_UCHAR *temp = (PCRE2_UCHAR *)codestart; + if (PRIV(auto_possessify)(temp, &cb) != 0) errorcode = ERR80; + } + +/* Failed to compile, or error while post-processing. */ + +if (errorcode != 0) goto HAD_CB_ERROR; + +/* Successful compile. If the anchored option was not passed, set it if +we can determine that the pattern is anchored by virtue of ^ characters or \A +or anything else, such as starting with non-atomic .* when DOTALL is set and +there are no occurrences of *PRUNE or *SKIP (though there is an option to +disable this case). */ + +if ((re->overall_options & PCRE2_ANCHORED) == 0 && + is_anchored(codestart, 0, &cb, 0, FALSE)) + re->overall_options |= PCRE2_ANCHORED; + +/* Set up the first code unit or startline flag, the required code unit, and +then study the pattern. 
This code need not be obeyed if PCRE2_NO_START_OPTIMIZE +is set, as the data it would create will not be used. Note that a first code +unit (but not the startline flag) is useful for anchored patterns because it +can still give a quick "no match" and also avoid searching for a last code +unit. */ + +if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0) + { + int minminlength = 0; /* For minimal minlength from first/required CU */ + + /* If we do not have a first code unit, see if there is one that is asserted + (these are not saved during the compile because they can cause conflicts with + actual literals that follow). */ + + if (firstcuflags < 0) + firstcu = find_firstassertedcu(codestart, &firstcuflags, 0); + + /* Save the data for a first code unit. The existence of one means the + minimum length must be at least 1. */ + + if (firstcuflags >= 0) + { + re->first_codeunit = firstcu; + re->flags |= PCRE2_FIRSTSET; + minminlength++; + + /* Handle caseless first code units. */ + + if ((firstcuflags & REQ_CASELESS) != 0) + { + if (firstcu < 128 || (!utf && !ucp && firstcu < 255)) + { + if (cb.fcc[firstcu] != firstcu) re->flags |= PCRE2_FIRSTCASELESS; + } + + /* The first code unit is > 128 in UTF or UCP mode, or > 255 otherwise. + In 8-bit UTF mode, codepoints in the range 128-255 are introductory code + points and cannot have another case, but if UCP is set they may do. */ + +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + else if (ucp && !utf && UCD_OTHERCASE(firstcu) != firstcu) + re->flags |= PCRE2_FIRSTCASELESS; +#else + else if ((utf || ucp) && firstcu <= MAX_UTF_CODE_POINT && + UCD_OTHERCASE(firstcu) != firstcu) + re->flags |= PCRE2_FIRSTCASELESS; +#endif +#endif /* SUPPORT_UNICODE */ + } + } + + /* When there is no first code unit, for non-anchored patterns, see if we can + set the PCRE2_STARTLINE flag. 
This is helpful for multiline matches when all + branches start with ^ and also when all branches start with non-atomic .* for + non-DOTALL matches when *PRUNE and SKIP are not present. (There is an option + that disables this case.) */ + + else if ((re->overall_options & PCRE2_ANCHORED) == 0 && + is_startline(codestart, 0, &cb, 0, FALSE)) + re->flags |= PCRE2_STARTLINE; + + /* Handle the "required code unit", if one is set. In the UTF case we can + increment the minimum minimum length only if we are sure this really is a + different character and not a non-starting code unit of the first character, + because the minimum length count is in characters, not code units. */ + + if (reqcuflags >= 0) + { +#if PCRE2_CODE_UNIT_WIDTH == 16 + if ((re->overall_options & PCRE2_UTF) == 0 || /* Not UTF */ + firstcuflags < 0 || /* First not set */ + (firstcu & 0xf800) != 0xd800 || /* First not surrogate */ + (reqcu & 0xfc00) != 0xdc00) /* Req not low surrogate */ +#elif PCRE2_CODE_UNIT_WIDTH == 8 + if ((re->overall_options & PCRE2_UTF) == 0 || /* Not UTF */ + firstcuflags < 0 || /* First not set */ + (firstcu & 0x80) == 0 || /* First is ASCII */ + (reqcu & 0x80) == 0) /* Req is ASCII */ +#endif + { + minminlength++; + } + + /* In the case of an anchored pattern, set up the value only if it follows + a variable length item in the pattern. */ + + if ((re->overall_options & PCRE2_ANCHORED) == 0 || + (reqcuflags & REQ_VARY) != 0) + { + re->last_codeunit = reqcu; + re->flags |= PCRE2_LASTSET; + + /* Handle caseless required code units as for first code units (above). 
*/ + + if ((reqcuflags & REQ_CASELESS) != 0) + { + if (reqcu < 128 || (!utf && !ucp && reqcu < 255)) + { + if (cb.fcc[reqcu] != reqcu) re->flags |= PCRE2_LASTCASELESS; + } +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + else if (ucp && !utf && UCD_OTHERCASE(reqcu) != reqcu) + re->flags |= PCRE2_LASTCASELESS; +#else + else if ((utf || ucp) && reqcu <= MAX_UTF_CODE_POINT && + UCD_OTHERCASE(reqcu) != reqcu) + re->flags |= PCRE2_LASTCASELESS; +#endif +#endif /* SUPPORT_UNICODE */ + } + } + } + + /* Study the compiled pattern to set up information such as a bitmap of + starting code units and a minimum matching length. */ + + if (PRIV(study)(re) != 0) + { + errorcode = ERR31; + goto HAD_CB_ERROR; + } + + /* If study() set a bitmap of starting code units, it implies a minimum + length of at least one. */ + + if ((re->flags & PCRE2_FIRSTMAPSET) != 0 && minminlength == 0) + minminlength = 1; + + /* If the minimum length set (or not set) by study() is less than the minimum + implied by required code units, override it. */ + + if (re->minlength < minminlength) re->minlength = minminlength; + } /* End of start-of-match optimizations. */ + +/* Control ends up here in all cases. When running under valgrind, make a +pattern's terminating zero defined again. If memory was obtained for the parsed +version of the pattern, free it before returning. Also free the list of named +groups if a larger one had to be obtained, and likewise the group information +vector. 
*/ + +EXIT: +#ifdef SUPPORT_VALGRIND +if (zero_terminated) VALGRIND_MAKE_MEM_DEFINED(pattern + patlen, CU2BYTES(1)); +#endif +if (cb.parsed_pattern != stack_parsed_pattern) + ccontext->memctl.free(cb.parsed_pattern, ccontext->memctl.memory_data); +if (cb.named_group_list_size > NAMED_GROUP_LIST_SIZE) + ccontext->memctl.free((void *)cb.named_groups, ccontext->memctl.memory_data); +if (cb.groupinfo != stack_groupinfo) + ccontext->memctl.free((void *)cb.groupinfo, ccontext->memctl.memory_data); +return re; /* Will be NULL after an error */ + +/* Errors discovered in parse_regex() set the offset value in the compile +block. Errors discovered before it is called must compute it from the ptr +value. After parse_regex() is called, the offset in the compile block is set to +the end of the pattern, but certain errors in compile_regex() may reset it if +an offset is available in the parsed pattern. */ + +HAD_CB_ERROR: +ptr = pattern + cb.erroroffset; + +HAD_EARLY_ERROR: +*erroroffset = ptr - pattern; + +HAD_ERROR: +*errorptr = errorcode; +pcre2_code_free(re); +re = NULL; +goto EXIT; +} + +/* End of pcre2_compile.c */ diff --git a/src/pcre2/src/pcre2_config.c b/src/pcre2/src/pcre2_config.c new file mode 100644 index 00000000..5ef103ca --- /dev/null +++ b/src/pcre2/src/pcre2_config.c @@ -0,0 +1,252 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +/* Save the configured link size, which is in bytes. 
In 16-bit and 32-bit modes +its value gets changed by pcre2_intmodedep.h (included by pcre2_internal.h) to +be in code units. */ + +static int configured_link_size = LINK_SIZE; + +#include "pcre2_internal.h" + +/* These macros are the standard way of turning unquoted text into C strings. +They allow macros like PCRE2_MAJOR to be defined without quotes, which is +convenient for user programs that want to test their values. */ + +#define STRING(a) # a +#define XSTRING(s) STRING(s) + + +/************************************************* +* Return info about what features are configured * +*************************************************/ + +/* If where is NULL, the length of memory required is returned. + +Arguments: + what what information is required + where where to put the information + +Returns: 0 if a numerical value is returned + >= 0 if a string value + PCRE2_ERROR_BADOPTION if "where" not recognized + or JIT target requested when JIT not enabled +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_config(uint32_t what, void *where) +{ +if (where == NULL) /* Requests a length */ + { + switch(what) + { + default: + return PCRE2_ERROR_BADOPTION; + + case PCRE2_CONFIG_BSR: + case PCRE2_CONFIG_COMPILED_WIDTHS: + case PCRE2_CONFIG_DEPTHLIMIT: + case PCRE2_CONFIG_HEAPLIMIT: + case PCRE2_CONFIG_JIT: + case PCRE2_CONFIG_LINKSIZE: + case PCRE2_CONFIG_MATCHLIMIT: + case PCRE2_CONFIG_NEVER_BACKSLASH_C: + case PCRE2_CONFIG_NEWLINE: + case PCRE2_CONFIG_PARENSLIMIT: + case PCRE2_CONFIG_STACKRECURSE: /* Obsolete */ + case PCRE2_CONFIG_TABLES_LENGTH: + case PCRE2_CONFIG_UNICODE: + return sizeof(uint32_t); + + /* These are handled below */ + + case PCRE2_CONFIG_JITTARGET: + case PCRE2_CONFIG_UNICODE_VERSION: + case PCRE2_CONFIG_VERSION: + break; + } + } + +switch (what) + { + default: + return PCRE2_ERROR_BADOPTION; + + case PCRE2_CONFIG_BSR: +#ifdef BSR_ANYCRLF + *((uint32_t *)where) = PCRE2_BSR_ANYCRLF; +#else + *((uint32_t *)where) = PCRE2_BSR_UNICODE; +#endif + break; + 
+ case PCRE2_CONFIG_COMPILED_WIDTHS: + *((uint32_t *)where) = 0 +#ifdef SUPPORT_PCRE2_8 + + 1 +#endif +#ifdef SUPPORT_PCRE2_16 + + 2 +#endif +#ifdef SUPPORT_PCRE2_32 + + 4 +#endif + ; + break; + + case PCRE2_CONFIG_DEPTHLIMIT: + *((uint32_t *)where) = MATCH_LIMIT_DEPTH; + break; + + case PCRE2_CONFIG_HEAPLIMIT: + *((uint32_t *)where) = HEAP_LIMIT; + break; + + case PCRE2_CONFIG_JIT: +#ifdef SUPPORT_JIT + *((uint32_t *)where) = 1; +#else + *((uint32_t *)where) = 0; +#endif + break; + + case PCRE2_CONFIG_JITTARGET: +#ifdef SUPPORT_JIT + { + const char *v = PRIV(jit_get_target)(); + return (int)(1 + ((where == NULL)? + strlen(v) : PRIV(strcpy_c8)((PCRE2_UCHAR *)where, v))); + } +#else + return PCRE2_ERROR_BADOPTION; +#endif + + case PCRE2_CONFIG_LINKSIZE: + *((uint32_t *)where) = (uint32_t)configured_link_size; + break; + + case PCRE2_CONFIG_MATCHLIMIT: + *((uint32_t *)where) = MATCH_LIMIT; + break; + + case PCRE2_CONFIG_NEWLINE: + *((uint32_t *)where) = NEWLINE_DEFAULT; + break; + + case PCRE2_CONFIG_NEVER_BACKSLASH_C: +#ifdef NEVER_BACKSLASH_C + *((uint32_t *)where) = 1; +#else + *((uint32_t *)where) = 0; +#endif + break; + + case PCRE2_CONFIG_PARENSLIMIT: + *((uint32_t *)where) = PARENS_NEST_LIMIT; + break; + + /* This is now obsolete. The stack is no longer used via recursion for + handling backtracking in pcre2_match(). */ + + case PCRE2_CONFIG_STACKRECURSE: + *((uint32_t *)where) = 0; + break; + + case PCRE2_CONFIG_TABLES_LENGTH: + *((uint32_t *)where) = TABLES_LENGTH; + break; + + case PCRE2_CONFIG_UNICODE_VERSION: + { +#if defined SUPPORT_UNICODE + const char *v = PRIV(unicode_version); +#else + const char *v = "Unicode not supported"; +#endif + return (int)(1 + ((where == NULL)? 
+ strlen(v) : PRIV(strcpy_c8)((PCRE2_UCHAR *)where, v))); + } + break; + + case PCRE2_CONFIG_UNICODE: +#if defined SUPPORT_UNICODE + *((uint32_t *)where) = 1; +#else + *((uint32_t *)where) = 0; +#endif + break; + + /* The hackery in setting "v" below is to cope with the case when + PCRE2_PRERELEASE is set to an empty string (which it is for real releases). + If the second alternative is used in this case, it does not leave a space + before the date. On the other hand, if all four macros are put into a single + XSTRING when PCRE2_PRERELEASE is not empty, an unwanted space is inserted. + There are problems using an "obvious" approach like this: + + XSTRING(PCRE2_MAJOR) "." XSTRING(PCRE_MINOR) + XSTRING(PCRE2_PRERELEASE) " " XSTRING(PCRE_DATE) + + because, when PCRE2_PRERELEASE is empty, this leads to an attempted expansion + of STRING(). The C standard states: "If (before argument substitution) any + argument consists of no preprocessing tokens, the behavior is undefined." It + turns out the gcc treats this case as a single empty string - which is what + we really want - but Visual C grumbles about the lack of an argument for the + macro. Unfortunately, both are within their rights. As there seems to be no + way to test for a macro's value being empty at compile time, we have to + resort to a runtime test. */ + + case PCRE2_CONFIG_VERSION: + { + const char *v = (XSTRING(Z PCRE2_PRERELEASE)[1] == 0)? + XSTRING(PCRE2_MAJOR.PCRE2_MINOR PCRE2_DATE) : + XSTRING(PCRE2_MAJOR.PCRE2_MINOR) XSTRING(PCRE2_PRERELEASE PCRE2_DATE); + return (int)(1 + ((where == NULL)? 
+ strlen(v) : PRIV(strcpy_c8)((PCRE2_UCHAR *)where, v))); + } + } + +return 0; +} + +/* End of pcre2_config.c */ diff --git a/src/pcre2/src/pcre2_context.c b/src/pcre2/src/pcre2_context.c new file mode 100644 index 00000000..f904a494 --- /dev/null +++ b/src/pcre2/src/pcre2_context.c @@ -0,0 +1,488 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + + +/************************************************* +* Default malloc/free functions * +*************************************************/ + +/* Ignore the "user data" argument in each case. */ + +static void *default_malloc(size_t size, void *data) +{ +(void)data; +return malloc(size); +} + + +static void default_free(void *block, void *data) +{ +(void)data; +free(block); +} + + + +/************************************************* +* Get a block and save memory control * +*************************************************/ + +/* This internal function is called to get a block of memory in which the +memory control data is to be stored at the start for future use. + +Arguments: + size amount of memory required + memctl pointer to a memctl block or NULL + +Returns: pointer to memory or NULL on failure +*/ + +extern void * +PRIV(memctl_malloc)(size_t size, pcre2_memctl *memctl) +{ +pcre2_memctl *newmemctl; +void *yield = (memctl == NULL)? 
malloc(size) : + memctl->malloc(size, memctl->memory_data); +if (yield == NULL) return NULL; +newmemctl = (pcre2_memctl *)yield; +if (memctl == NULL) + { + newmemctl->malloc = default_malloc; + newmemctl->free = default_free; + newmemctl->memory_data = NULL; + } +else *newmemctl = *memctl; +return yield; +} + + + +/************************************************* +* Create and initialize contexts * +*************************************************/ + +/* Initializing for compile and match contexts is done in separate, private +functions so that these can be called from functions such as pcre2_compile() +when an external context is not supplied. The initializing functions have an +option to set up default memory management. */ + +PCRE2_EXP_DEFN pcre2_general_context * PCRE2_CALL_CONVENTION +pcre2_general_context_create(void *(*private_malloc)(size_t, void *), + void (*private_free)(void *, void *), void *memory_data) +{ +pcre2_general_context *gcontext; +if (private_malloc == NULL) private_malloc = default_malloc; +if (private_free == NULL) private_free = default_free; +gcontext = private_malloc(sizeof(pcre2_real_general_context), memory_data); +if (gcontext == NULL) return NULL; +gcontext->memctl.malloc = private_malloc; +gcontext->memctl.free = private_free; +gcontext->memctl.memory_data = memory_data; +return gcontext; +} + + +/* A default compile context is set up to save having to initialize at run time +when no context is supplied to the compile function. 
*/ + +const pcre2_compile_context PRIV(default_compile_context) = { + { default_malloc, default_free, NULL }, /* Default memory handling */ + NULL, /* Stack guard */ + NULL, /* Stack guard data */ + PRIV(default_tables), /* Character tables */ + PCRE2_UNSET, /* Max pattern length */ + BSR_DEFAULT, /* Backslash R default */ + NEWLINE_DEFAULT, /* Newline convention */ + PARENS_NEST_LIMIT, /* As it says */ + 0 }; /* Extra options */ + +/* The create function copies the default into the new memory, but must +override the default memory handling functions if a gcontext was provided. */ + +PCRE2_EXP_DEFN pcre2_compile_context * PCRE2_CALL_CONVENTION +pcre2_compile_context_create(pcre2_general_context *gcontext) +{ +pcre2_compile_context *ccontext = PRIV(memctl_malloc)( + sizeof(pcre2_real_compile_context), (pcre2_memctl *)gcontext); +if (ccontext == NULL) return NULL; +*ccontext = PRIV(default_compile_context); +if (gcontext != NULL) + *((pcre2_memctl *)ccontext) = *((pcre2_memctl *)gcontext); +return ccontext; +} + + +/* A default match context is set up to save having to initialize at run time +when no context is supplied to a match function. */ + +const pcre2_match_context PRIV(default_match_context) = { + { default_malloc, default_free, NULL }, +#ifdef SUPPORT_JIT + NULL, /* JIT callback */ + NULL, /* JIT callback data */ +#endif + NULL, /* Callout function */ + NULL, /* Callout data */ + NULL, /* Substitute callout function */ + NULL, /* Substitute callout data */ + PCRE2_UNSET, /* Offset limit */ + HEAP_LIMIT, + MATCH_LIMIT, + MATCH_LIMIT_DEPTH }; + +/* The create function copies the default into the new memory, but must +override the default memory handling functions if a gcontext was provided. 
*/ + +PCRE2_EXP_DEFN pcre2_match_context * PCRE2_CALL_CONVENTION +pcre2_match_context_create(pcre2_general_context *gcontext) +{ +pcre2_match_context *mcontext = PRIV(memctl_malloc)( + sizeof(pcre2_real_match_context), (pcre2_memctl *)gcontext); +if (mcontext == NULL) return NULL; +*mcontext = PRIV(default_match_context); +if (gcontext != NULL) + *((pcre2_memctl *)mcontext) = *((pcre2_memctl *)gcontext); +return mcontext; +} + + +/* A default convert context is set up to save having to initialize at run time +when no context is supplied to the convert function. */ + +const pcre2_convert_context PRIV(default_convert_context) = { + { default_malloc, default_free, NULL }, /* Default memory handling */ +#ifdef _WIN32 + CHAR_BACKSLASH, /* Default path separator */ + CHAR_GRAVE_ACCENT /* Default escape character */ +#else /* Not Windows */ + CHAR_SLASH, /* Default path separator */ + CHAR_BACKSLASH /* Default escape character */ +#endif + }; + +/* The create function copies the default into the new memory, but must +override the default memory handling functions if a gcontext was provided. 
*/ + +PCRE2_EXP_DEFN pcre2_convert_context * PCRE2_CALL_CONVENTION +pcre2_convert_context_create(pcre2_general_context *gcontext) +{ +pcre2_convert_context *ccontext = PRIV(memctl_malloc)( + sizeof(pcre2_real_convert_context), (pcre2_memctl *)gcontext); +if (ccontext == NULL) return NULL; +*ccontext = PRIV(default_convert_context); +if (gcontext != NULL) + *((pcre2_memctl *)ccontext) = *((pcre2_memctl *)gcontext); +return ccontext; +} + + +/************************************************* +* Context copy functions * +*************************************************/ + +PCRE2_EXP_DEFN pcre2_general_context * PCRE2_CALL_CONVENTION +pcre2_general_context_copy(pcre2_general_context *gcontext) +{ +pcre2_general_context *new = + gcontext->memctl.malloc(sizeof(pcre2_real_general_context), + gcontext->memctl.memory_data); +if (new == NULL) return NULL; +memcpy(new, gcontext, sizeof(pcre2_real_general_context)); +return new; +} + + +PCRE2_EXP_DEFN pcre2_compile_context * PCRE2_CALL_CONVENTION +pcre2_compile_context_copy(pcre2_compile_context *ccontext) +{ +pcre2_compile_context *new = + ccontext->memctl.malloc(sizeof(pcre2_real_compile_context), + ccontext->memctl.memory_data); +if (new == NULL) return NULL; +memcpy(new, ccontext, sizeof(pcre2_real_compile_context)); +return new; +} + + +PCRE2_EXP_DEFN pcre2_match_context * PCRE2_CALL_CONVENTION +pcre2_match_context_copy(pcre2_match_context *mcontext) +{ +pcre2_match_context *new = + mcontext->memctl.malloc(sizeof(pcre2_real_match_context), + mcontext->memctl.memory_data); +if (new == NULL) return NULL; +memcpy(new, mcontext, sizeof(pcre2_real_match_context)); +return new; +} + + + +PCRE2_EXP_DEFN pcre2_convert_context * PCRE2_CALL_CONVENTION +pcre2_convert_context_copy(pcre2_convert_context *ccontext) +{ +pcre2_convert_context *new = + ccontext->memctl.malloc(sizeof(pcre2_real_convert_context), + ccontext->memctl.memory_data); +if (new == NULL) return NULL; +memcpy(new, ccontext, sizeof(pcre2_real_convert_context)); 
+return new; +} + + +/************************************************* +* Context free functions * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_general_context_free(pcre2_general_context *gcontext) +{ +if (gcontext != NULL) + gcontext->memctl.free(gcontext, gcontext->memctl.memory_data); +} + + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_compile_context_free(pcre2_compile_context *ccontext) +{ +if (ccontext != NULL) + ccontext->memctl.free(ccontext, ccontext->memctl.memory_data); +} + + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_match_context_free(pcre2_match_context *mcontext) +{ +if (mcontext != NULL) + mcontext->memctl.free(mcontext, mcontext->memctl.memory_data); +} + + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_convert_context_free(pcre2_convert_context *ccontext) +{ +if (ccontext != NULL) + ccontext->memctl.free(ccontext, ccontext->memctl.memory_data); +} + + +/************************************************* +* Set values in contexts * +*************************************************/ + +/* All these functions return 0 for success or PCRE2_ERROR_BADDATA if invalid +data is given. Only some of the functions are able to test the validity of the +data. 
*/ + + +/* ------------ Compile context ------------ */ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_character_tables(pcre2_compile_context *ccontext, + const uint8_t *tables) +{ +ccontext->tables = tables; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_bsr(pcre2_compile_context *ccontext, uint32_t value) +{ +switch(value) + { + case PCRE2_BSR_ANYCRLF: + case PCRE2_BSR_UNICODE: + ccontext->bsr_convention = value; + return 0; + + default: + return PCRE2_ERROR_BADDATA; + } +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_max_pattern_length(pcre2_compile_context *ccontext, PCRE2_SIZE length) +{ +ccontext->max_pattern_length = length; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_newline(pcre2_compile_context *ccontext, uint32_t newline) +{ +switch(newline) + { + case PCRE2_NEWLINE_CR: + case PCRE2_NEWLINE_LF: + case PCRE2_NEWLINE_CRLF: + case PCRE2_NEWLINE_ANY: + case PCRE2_NEWLINE_ANYCRLF: + case PCRE2_NEWLINE_NUL: + ccontext->newline_convention = newline; + return 0; + + default: + return PCRE2_ERROR_BADDATA; + } +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_parens_nest_limit(pcre2_compile_context *ccontext, uint32_t limit) +{ +ccontext->parens_nest_limit = limit; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_compile_extra_options(pcre2_compile_context *ccontext, uint32_t options) +{ +ccontext->extra_options = options; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_compile_recursion_guard(pcre2_compile_context *ccontext, + int (*guard)(uint32_t, void *), void *user_data) +{ +ccontext->stack_guard = guard; +ccontext->stack_guard_data = user_data; +return 0; +} + + +/* ------------ Match context ------------ */ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_callout(pcre2_match_context *mcontext, + int (*callout)(pcre2_callout_block *, void *), void *callout_data) +{ +mcontext->callout = callout; +mcontext->callout_data = callout_data; 
+return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_substitute_callout(pcre2_match_context *mcontext, + int (*substitute_callout)(pcre2_substitute_callout_block *, void *), + void *substitute_callout_data) +{ +mcontext->substitute_callout = substitute_callout; +mcontext->substitute_callout_data = substitute_callout_data; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_heap_limit(pcre2_match_context *mcontext, uint32_t limit) +{ +mcontext->heap_limit = limit; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_match_limit(pcre2_match_context *mcontext, uint32_t limit) +{ +mcontext->match_limit = limit; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_depth_limit(pcre2_match_context *mcontext, uint32_t limit) +{ +mcontext->depth_limit = limit; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_offset_limit(pcre2_match_context *mcontext, PCRE2_SIZE limit) +{ +mcontext->offset_limit = limit; +return 0; +} + +/* This function became obsolete at release 10.30. It is kept as a synonym for +backwards compatibility. 
*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_recursion_limit(pcre2_match_context *mcontext, uint32_t limit) +{ +return pcre2_set_depth_limit(mcontext, limit); +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_recursion_memory_management(pcre2_match_context *mcontext, + void *(*mymalloc)(size_t, void *), void (*myfree)(void *, void *), + void *mydata) +{ +(void)mcontext; +(void)mymalloc; +(void)myfree; +(void)mydata; +return 0; +} + +/* ------------ Convert context ------------ */ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_glob_separator(pcre2_convert_context *ccontext, uint32_t separator) +{ +if (separator != CHAR_SLASH && separator != CHAR_BACKSLASH && + separator != CHAR_DOT) return PCRE2_ERROR_BADDATA; +ccontext->glob_separator = separator; +return 0; +} + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_set_glob_escape(pcre2_convert_context *ccontext, uint32_t escape) +{ +if (escape > 255 || (escape != 0 && !ispunct(escape))) + return PCRE2_ERROR_BADDATA; +ccontext->glob_escape = escape; +return 0; +} + +/* End of pcre2_context.c */ + diff --git a/src/pcre2/src/pcre2_convert.c b/src/pcre2/src/pcre2_convert.c new file mode 100644 index 00000000..d45b6fee --- /dev/null +++ b/src/pcre2/src/pcre2_convert.c @@ -0,0 +1,1182 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. 
+----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + +#define TYPE_OPTIONS (PCRE2_CONVERT_GLOB| \ + PCRE2_CONVERT_POSIX_BASIC|PCRE2_CONVERT_POSIX_EXTENDED) + +#define ALL_OPTIONS (PCRE2_CONVERT_UTF|PCRE2_CONVERT_NO_UTF_CHECK| \ + PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR| \ + PCRE2_CONVERT_GLOB_NO_STARSTAR| \ + TYPE_OPTIONS) + +#define DUMMY_BUFFER_SIZE 100 + +/* Generated pattern fragments */ + +#define STR_BACKSLASH_A STR_BACKSLASH STR_A +#define STR_BACKSLASH_z STR_BACKSLASH STR_z +#define STR_COLON_RIGHT_SQUARE_BRACKET STR_COLON STR_RIGHT_SQUARE_BRACKET +#define STR_DOT_STAR_LOOKBEHIND STR_DOT STR_ASTERISK STR_LEFT_PARENTHESIS STR_QUESTION_MARK STR_LESS_THAN_SIGN STR_EQUALS_SIGN +#define STR_LOOKAHEAD_NOT_DOT STR_LEFT_PARENTHESIS STR_QUESTION_MARK STR_EXCLAMATION_MARK STR_BACKSLASH STR_DOT STR_RIGHT_PARENTHESIS +#define STR_QUERY_s STR_LEFT_PARENTHESIS STR_QUESTION_MARK STR_s STR_RIGHT_PARENTHESIS +#define STR_STAR_NUL STR_LEFT_PARENTHESIS STR_ASTERISK STR_N STR_U STR_L STR_RIGHT_PARENTHESIS + +/* States for range and POSIX processing */ + +enum { RANGE_NOT_STARTED, RANGE_STARTING, RANGE_STARTED }; +enum { POSIX_START_REGEX, POSIX_ANCHORED, POSIX_NOT_BRACKET, + POSIX_CLASS_NOT_STARTED, POSIX_CLASS_STARTING, POSIX_CLASS_STARTED }; + +/* Macro to add a character string to the output buffer, checking for overflow. */ + +#define PUTCHARS(string) \ + { \ + for (s = (char *)(string); *s != 0; s++) \ + { \ + if (p >= endp) return PCRE2_ERROR_NOMEMORY; \ + *p++ = *s; \ + } \ + } + +/* Literals that must be escaped: \ ? * + | . 
^ $ { } [ ] ( ) */ + +static const char *pcre2_escaped_literals = + STR_BACKSLASH STR_QUESTION_MARK STR_ASTERISK STR_PLUS + STR_VERTICAL_LINE STR_DOT STR_CIRCUMFLEX_ACCENT STR_DOLLAR_SIGN + STR_LEFT_CURLY_BRACKET STR_RIGHT_CURLY_BRACKET + STR_LEFT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET + STR_LEFT_PARENTHESIS STR_RIGHT_PARENTHESIS; + +/* Recognized escaped metacharacters in POSIX basic patterns. */ + +static const char *posix_meta_escapes = + STR_LEFT_PARENTHESIS STR_RIGHT_PARENTHESIS + STR_LEFT_CURLY_BRACKET STR_RIGHT_CURLY_BRACKET + STR_1 STR_2 STR_3 STR_4 STR_5 STR_6 STR_7 STR_8 STR_9; + + + +/************************************************* +* Convert a POSIX pattern * +*************************************************/ + +/* This function handles both basic and extended POSIX patterns. + +Arguments: + pattype the pattern type + pattern the pattern + plength length in code units + utf TRUE if UTF + use_buffer where to put the output + use_length length of use_buffer + bufflenptr where to put the used length + dummyrun TRUE if a dummy run + ccontext the convert context + +Returns: 0 => success + !0 => error code +*/ + +static int +convert_posix(uint32_t pattype, PCRE2_SPTR pattern, PCRE2_SIZE plength, + BOOL utf, PCRE2_UCHAR *use_buffer, PCRE2_SIZE use_length, + PCRE2_SIZE *bufflenptr, BOOL dummyrun, pcre2_convert_context *ccontext) +{ +char *s; +PCRE2_SPTR posix = pattern; +PCRE2_UCHAR *p = use_buffer; +PCRE2_UCHAR *pp = p; +PCRE2_UCHAR *endp = p + use_length - 1; /* Allow for trailing zero */ +PCRE2_SIZE convlength = 0; + +uint32_t bracount = 0; +uint32_t posix_state = POSIX_START_REGEX; +uint32_t lastspecial = 0; +BOOL extended = (pattype & PCRE2_CONVERT_POSIX_EXTENDED) != 0; +BOOL nextisliteral = FALSE; + +(void)utf; /* Not used when Unicode not supported */ +(void)ccontext; /* Not currently used */ + +/* Initialize default for error offset as end of input. */ + +*bufflenptr = plength; +PUTCHARS(STR_STAR_NUL); + +/* Now scan the input. 
*/ + +while (plength > 0) + { + uint32_t c, sc; + int clength = 1; + + /* Add in the length of the last item, then, if in the dummy run, pull the + pointer back to the start of the (temporary) buffer and then remember the + start of the next item. */ + + convlength += p - pp; + if (dummyrun) p = use_buffer; + pp = p; + + /* Pick up the next character */ + +#ifndef SUPPORT_UNICODE + c = *posix; +#else + GETCHARLENTEST(c, posix, clength); +#endif + posix += clength; + plength -= clength; + + sc = nextisliteral? 0 : c; + nextisliteral = FALSE; + + /* Handle a character within a class. */ + + if (posix_state >= POSIX_CLASS_NOT_STARTED) + { + if (c == CHAR_RIGHT_SQUARE_BRACKET) + { + PUTCHARS(STR_RIGHT_SQUARE_BRACKET); + posix_state = POSIX_NOT_BRACKET; + } + + /* Not the end of the class */ + + else + { + switch (posix_state) + { + case POSIX_CLASS_STARTED: + if (c <= 127 && islower(c)) break; /* Remain in started state */ + posix_state = POSIX_CLASS_NOT_STARTED; + if (c == CHAR_COLON && plength > 0 && + *posix == CHAR_RIGHT_SQUARE_BRACKET) + { + PUTCHARS(STR_COLON_RIGHT_SQUARE_BRACKET); + plength--; + posix++; + continue; /* With next character after :] */ + } + /* Fall through */ + + case POSIX_CLASS_NOT_STARTED: + if (c == CHAR_LEFT_SQUARE_BRACKET) + posix_state = POSIX_CLASS_STARTING; + break; + + case POSIX_CLASS_STARTING: + if (c == CHAR_COLON) posix_state = POSIX_CLASS_STARTED; + break; + } + + if (c == CHAR_BACKSLASH) PUTCHARS(STR_BACKSLASH); + if (p + clength > endp) return PCRE2_ERROR_NOMEMORY; + memcpy(p, posix - clength, CU2BYTES(clength)); + p += clength; + } + } + + /* Handle a character not within a class. */ + + else switch(sc) + { + case CHAR_LEFT_SQUARE_BRACKET: + PUTCHARS(STR_LEFT_SQUARE_BRACKET); + +#ifdef NEVER + /* We could handle special cases [[:<:]] and [[:>:]] (which PCRE does + support) but they are not part of POSIX 1003.1. 
*/ + + if (plength >= 6) + { + if (posix[0] == CHAR_LEFT_SQUARE_BRACKET && + posix[1] == CHAR_COLON && + (posix[2] == CHAR_LESS_THAN_SIGN || + posix[2] == CHAR_GREATER_THAN_SIGN) && + posix[3] == CHAR_COLON && + posix[4] == CHAR_RIGHT_SQUARE_BRACKET && + posix[5] == CHAR_RIGHT_SQUARE_BRACKET) + { + if (p + 6 > endp) return PCRE2_ERROR_NOMEMORY; + memcpy(p, posix, CU2BYTES(6)); + p += 6; + posix += 6; + plength -= 6; + continue; /* With next character */ + } + } +#endif + + /* Handle start of "normal" character classes */ + + posix_state = POSIX_CLASS_NOT_STARTED; + + /* Handle ^ and ] as first characters */ + + if (plength > 0) + { + if (*posix == CHAR_CIRCUMFLEX_ACCENT) + { + posix++; + plength--; + PUTCHARS(STR_CIRCUMFLEX_ACCENT); + } + if (plength > 0 && *posix == CHAR_RIGHT_SQUARE_BRACKET) + { + posix++; + plength--; + PUTCHARS(STR_RIGHT_SQUARE_BRACKET); + } + } + break; + + case CHAR_BACKSLASH: + if (plength == 0) return PCRE2_ERROR_END_BACKSLASH; + if (extended) nextisliteral = TRUE; else + { + if (*posix < 127 && strchr(posix_meta_escapes, *posix) != NULL) + { + if (isdigit(*posix)) PUTCHARS(STR_BACKSLASH); + if (p + 1 > endp) return PCRE2_ERROR_NOMEMORY; + lastspecial = *p++ = *posix++; + plength--; + } + else nextisliteral = TRUE; + } + break; + + case CHAR_RIGHT_PARENTHESIS: + if (!extended || bracount == 0) goto ESCAPE_LITERAL; + bracount--; + goto COPY_SPECIAL; + + case CHAR_LEFT_PARENTHESIS: + bracount++; + /* Fall through */ + + case CHAR_QUESTION_MARK: + case CHAR_PLUS: + case CHAR_LEFT_CURLY_BRACKET: + case CHAR_RIGHT_CURLY_BRACKET: + case CHAR_VERTICAL_LINE: + if (!extended) goto ESCAPE_LITERAL; + /* Fall through */ + + case CHAR_DOT: + case CHAR_DOLLAR_SIGN: + posix_state = POSIX_NOT_BRACKET; + COPY_SPECIAL: + lastspecial = c; + if (p + 1 > endp) return PCRE2_ERROR_NOMEMORY; + *p++ = c; + break; + + case CHAR_ASTERISK: + if (lastspecial != CHAR_ASTERISK) + { + if (!extended && (posix_state < POSIX_NOT_BRACKET || + lastspecial == 
CHAR_LEFT_PARENTHESIS)) + goto ESCAPE_LITERAL; + goto COPY_SPECIAL; + } + break; /* Ignore second and subsequent asterisks */ + + case CHAR_CIRCUMFLEX_ACCENT: + if (extended) goto COPY_SPECIAL; + if (posix_state == POSIX_START_REGEX || + lastspecial == CHAR_LEFT_PARENTHESIS) + { + posix_state = POSIX_ANCHORED; + goto COPY_SPECIAL; + } + /* Fall through */ + + default: + if (c < 128 && strchr(pcre2_escaped_literals, c) != NULL) + { + ESCAPE_LITERAL: + PUTCHARS(STR_BACKSLASH); + } + lastspecial = 0xff; /* Indicates nothing special */ + if (p + clength > endp) return PCRE2_ERROR_NOMEMORY; + memcpy(p, posix - clength, CU2BYTES(clength)); + p += clength; + posix_state = POSIX_NOT_BRACKET; + break; + } + } + +if (posix_state >= POSIX_CLASS_NOT_STARTED) + return PCRE2_ERROR_MISSING_SQUARE_BRACKET; +convlength += p - pp; /* Final segment */ +*bufflenptr = convlength; +*p++ = 0; +return 0; +} + + +/************************************************* +* Convert a glob pattern * +*************************************************/ + +/* Context for writing the output into a buffer. */ + +typedef struct pcre2_output_context { + PCRE2_UCHAR *output; /* current output position */ + PCRE2_SPTR output_end; /* output end */ + PCRE2_SIZE output_size; /* size of the output */ + uint8_t out_str[8]; /* string copied to the output */ +} pcre2_output_context; + + +/* Write a character into the output. + +Arguments: + out output context + chr the next character +*/ + +static void +convert_glob_write(pcre2_output_context *out, PCRE2_UCHAR chr) +{ +out->output_size++; + +if (out->output < out->output_end) + *out->output++ = chr; +} + + +/* Write a string into the output. 
+ +Arguments: + out output context + length length of out->out_str +*/ + +static void +convert_glob_write_str(pcre2_output_context *out, PCRE2_SIZE length) +{ +uint8_t *out_str = out->out_str; +PCRE2_UCHAR *output = out->output; +PCRE2_SPTR output_end = out->output_end; +PCRE2_SIZE output_size = out->output_size; + +do + { + output_size++; + + if (output < output_end) + *output++ = *out_str++; + } +while (--length != 0); + +out->output = output; +out->output_size = output_size; +} + + +/* Prints the separator into the output. + +Arguments: + out output context + separator glob separator + with_escape backslash is needed before separator +*/ + +static void +convert_glob_print_separator(pcre2_output_context *out, + PCRE2_UCHAR separator, BOOL with_escape) +{ +if (with_escape) + convert_glob_write(out, CHAR_BACKSLASH); + +convert_glob_write(out, separator); +} + + +/* Prints a wildcard into the output. + +Arguments: + out output context + separator glob separator + with_escape backslash is needed before separator +*/ + +static void +convert_glob_print_wildcard(pcre2_output_context *out, + PCRE2_UCHAR separator, BOOL with_escape) +{ +out->out_str[0] = CHAR_LEFT_SQUARE_BRACKET; +out->out_str[1] = CHAR_CIRCUMFLEX_ACCENT; +convert_glob_write_str(out, 2); + +convert_glob_print_separator(out, separator, with_escape); + +convert_glob_write(out, CHAR_RIGHT_SQUARE_BRACKET); +} + + +/* Parse a posix class. 
+ +Arguments: + from starting point of scanning the range + pattern_end end of pattern + out output context + +Returns: >0 => class index + 0 => malformed class +*/ + +static int +convert_glob_parse_class(PCRE2_SPTR *from, PCRE2_SPTR pattern_end, + pcre2_output_context *out) +{ +static const char *posix_classes = "alnum:alpha:ascii:blank:cntrl:digit:" + "graph:lower:print:punct:space:upper:word:xdigit:"; +PCRE2_SPTR start = *from + 1; +PCRE2_SPTR pattern = start; +const char *class_ptr; +PCRE2_UCHAR c; +int class_index; + +while (TRUE) + { + if (pattern >= pattern_end) return 0; + + c = *pattern++; + + if (c < CHAR_a || c > CHAR_z) break; + } + +if (c != CHAR_COLON || pattern >= pattern_end || + *pattern != CHAR_RIGHT_SQUARE_BRACKET) + return 0; + +class_ptr = posix_classes; +class_index = 1; + +while (TRUE) + { + if (*class_ptr == CHAR_NUL) return 0; + + pattern = start; + + while (*pattern == (PCRE2_UCHAR) *class_ptr) + { + if (*pattern == CHAR_COLON) + { + pattern += 2; + start -= 2; + + do convert_glob_write(out, *start++); while (start < pattern); + + *from = pattern; + return class_index; + } + pattern++; + class_ptr++; + } + + while (*class_ptr != CHAR_COLON) class_ptr++; + class_ptr++; + class_index++; + } +} + +/* Checks whether the character is in the class. 
+ +Arguments: + class_index class index + c character + +Returns: !0 => character is found in the class + 0 => otherwise +*/ + +static BOOL +convert_glob_char_in_class(int class_index, PCRE2_UCHAR c) +{ +switch (class_index) + { + case 1: return isalnum(c); + case 2: return isalpha(c); + case 3: return 1; + case 4: return c == CHAR_HT || c == CHAR_SPACE; + case 5: return iscntrl(c); + case 6: return isdigit(c); + case 7: return isgraph(c); + case 8: return islower(c); + case 9: return isprint(c); + case 10: return ispunct(c); + case 11: return isspace(c); + case 12: return isupper(c); + case 13: return isalnum(c) || c == CHAR_UNDERSCORE; + default: return isxdigit(c); + } +} + +/* Parse a range of characters. + +Arguments: + from starting point of scanning the range + pattern_end end of pattern + out output context + separator glob separator + with_escape backslash is needed before separator + +Returns: 0 => success + !0 => error code +*/ + +static int +convert_glob_parse_range(PCRE2_SPTR *from, PCRE2_SPTR pattern_end, + pcre2_output_context *out, BOOL utf, PCRE2_UCHAR separator, + BOOL with_escape, PCRE2_UCHAR escape, BOOL no_wildsep) +{ +BOOL is_negative = FALSE; +BOOL separator_seen = FALSE; +BOOL has_prev_c; +PCRE2_SPTR pattern = *from; +PCRE2_SPTR char_start = NULL; +uint32_t c, prev_c; +int len, class_index; + +(void)utf; /* Avoid compiler warning. 
*/ + +if (pattern >= pattern_end) + { + *from = pattern; + return PCRE2_ERROR_MISSING_SQUARE_BRACKET; + } + +if (*pattern == CHAR_EXCLAMATION_MARK + || *pattern == CHAR_CIRCUMFLEX_ACCENT) + { + pattern++; + + if (pattern >= pattern_end) + { + *from = pattern; + return PCRE2_ERROR_MISSING_SQUARE_BRACKET; + } + + is_negative = TRUE; + + out->out_str[0] = CHAR_LEFT_SQUARE_BRACKET; + out->out_str[1] = CHAR_CIRCUMFLEX_ACCENT; + len = 2; + + if (!no_wildsep) + { + if (with_escape) + { + out->out_str[len] = CHAR_BACKSLASH; + len++; + } + out->out_str[len] = (uint8_t) separator; + } + + convert_glob_write_str(out, len + 1); + } +else + convert_glob_write(out, CHAR_LEFT_SQUARE_BRACKET); + +has_prev_c = FALSE; +prev_c = 0; + +if (*pattern == CHAR_RIGHT_SQUARE_BRACKET) + { + out->out_str[0] = CHAR_BACKSLASH; + out->out_str[1] = CHAR_RIGHT_SQUARE_BRACKET; + convert_glob_write_str(out, 2); + has_prev_c = TRUE; + prev_c = CHAR_RIGHT_SQUARE_BRACKET; + pattern++; + } + +while (pattern < pattern_end) + { + char_start = pattern; + GETCHARINCTEST(c, pattern); + + if (c == CHAR_RIGHT_SQUARE_BRACKET) + { + convert_glob_write(out, c); + + if (!is_negative && !no_wildsep && separator_seen) + { + out->out_str[0] = CHAR_LEFT_PARENTHESIS; + out->out_str[1] = CHAR_QUESTION_MARK; + out->out_str[2] = CHAR_LESS_THAN_SIGN; + out->out_str[3] = CHAR_EXCLAMATION_MARK; + convert_glob_write_str(out, 4); + + convert_glob_print_separator(out, separator, with_escape); + convert_glob_write(out, CHAR_RIGHT_PARENTHESIS); + } + + *from = pattern; + return 0; + } + + if (pattern >= pattern_end) break; + + if (c == CHAR_LEFT_SQUARE_BRACKET && *pattern == CHAR_COLON) + { + *from = pattern; + class_index = convert_glob_parse_class(from, pattern_end, out); + + if (class_index != 0) + { + pattern = *from; + + has_prev_c = FALSE; + prev_c = 0; + + if (!is_negative && + convert_glob_char_in_class (class_index, separator)) + separator_seen = TRUE; + continue; + } + } + else if (c == CHAR_MINUS && has_prev_c && + 
*pattern != CHAR_RIGHT_SQUARE_BRACKET) + { + convert_glob_write(out, CHAR_MINUS); + + char_start = pattern; + GETCHARINCTEST(c, pattern); + + if (pattern >= pattern_end) break; + + if (escape != 0 && c == escape) + { + char_start = pattern; + GETCHARINCTEST(c, pattern); + } + else if (c == CHAR_LEFT_SQUARE_BRACKET && *pattern == CHAR_COLON) + { + *from = pattern; + return PCRE2_ERROR_CONVERT_SYNTAX; + } + + if (prev_c > c) + { + *from = pattern; + return PCRE2_ERROR_CONVERT_SYNTAX; + } + + if (prev_c < separator && separator < c) separator_seen = TRUE; + + has_prev_c = FALSE; + prev_c = 0; + } + else + { + if (escape != 0 && c == escape) + { + char_start = pattern; + GETCHARINCTEST(c, pattern); + + if (pattern >= pattern_end) break; + } + + has_prev_c = TRUE; + prev_c = c; + } + + if (c == CHAR_LEFT_SQUARE_BRACKET || c == CHAR_RIGHT_SQUARE_BRACKET || + c == CHAR_BACKSLASH || c == CHAR_MINUS) + convert_glob_write(out, CHAR_BACKSLASH); + + if (c == separator) separator_seen = TRUE; + + do convert_glob_write(out, *char_start++); while (char_start < pattern); + } + +*from = pattern; +return PCRE2_ERROR_MISSING_SQUARE_BRACKET; +} + + +/* Prints a (*COMMIT) into the output. + +Arguments: + out output context +*/ + +static void +convert_glob_print_commit(pcre2_output_context *out) +{ +out->out_str[0] = CHAR_LEFT_PARENTHESIS; +out->out_str[1] = CHAR_ASTERISK; +out->out_str[2] = CHAR_C; +out->out_str[3] = CHAR_O; +out->out_str[4] = CHAR_M; +out->out_str[5] = CHAR_M; +out->out_str[6] = CHAR_I; +out->out_str[7] = CHAR_T; +convert_glob_write_str(out, 8); +convert_glob_write(out, CHAR_RIGHT_PARENTHESIS); +} + + +/* Bash glob converter. 
+ +Arguments: + options the option bits (pattern type bit removed) + pattern the pattern + plength length in code units + utf TRUE if UTF + use_buffer where to put the output + use_length length of use_buffer + bufflenptr where to put the used length + dummyrun TRUE if a dummy run + ccontext the convert context + +Returns: 0 => success + !0 => error code +*/ + +static int +convert_glob(uint32_t options, PCRE2_SPTR pattern, PCRE2_SIZE plength, + BOOL utf, PCRE2_UCHAR *use_buffer, PCRE2_SIZE use_length, + PCRE2_SIZE *bufflenptr, BOOL dummyrun, pcre2_convert_context *ccontext) +{ +pcre2_output_context out; +PCRE2_SPTR pattern_start = pattern; +PCRE2_SPTR pattern_end = pattern + plength; +PCRE2_UCHAR separator = ccontext->glob_separator; +PCRE2_UCHAR escape = ccontext->glob_escape; +PCRE2_UCHAR c; +BOOL no_wildsep = (options & PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR) != 0; +BOOL no_starstar = (options & PCRE2_CONVERT_GLOB_NO_STARSTAR) != 0; +BOOL in_atomic = FALSE; +BOOL after_starstar = FALSE; +BOOL no_slash_z = FALSE; +BOOL with_escape, is_start, after_separator; +int result = 0; + +(void)utf; /* Avoid compiler warning. */ + +#ifdef SUPPORT_UNICODE +if (utf && (separator >= 128 || escape >= 128)) + { + /* Currently only ASCII characters are supported. */ + *bufflenptr = 0; + return PCRE2_ERROR_CONVERT_SYNTAX; + } +#endif + +with_escape = strchr(pcre2_escaped_literals, separator) != NULL; + +/* Set up the output context.
*/ +out.output = use_buffer; +out.output_end = use_buffer + use_length; +out.output_size = 0; + +out.out_str[0] = CHAR_LEFT_PARENTHESIS; +out.out_str[1] = CHAR_QUESTION_MARK; +out.out_str[2] = CHAR_s; +out.out_str[3] = CHAR_RIGHT_PARENTHESIS; +convert_glob_write_str(&out, 4); + +is_start = TRUE; + +if (pattern < pattern_end && pattern[0] == CHAR_ASTERISK) + { + if (no_wildsep) + is_start = FALSE; + else if (!no_starstar && pattern + 1 < pattern_end && + pattern[1] == CHAR_ASTERISK) + is_start = FALSE; + } + +if (is_start) + { + out.out_str[0] = CHAR_BACKSLASH; + out.out_str[1] = CHAR_A; + convert_glob_write_str(&out, 2); + } + +while (pattern < pattern_end) + { + c = *pattern++; + + if (c == CHAR_ASTERISK) + { + is_start = pattern == pattern_start + 1; + + if (in_atomic) + { + convert_glob_write(&out, CHAR_RIGHT_PARENTHESIS); + in_atomic = FALSE; + } + + if (!no_starstar && pattern < pattern_end && *pattern == CHAR_ASTERISK) + { + after_separator = is_start || (pattern[-2] == separator); + + do pattern++; while (pattern < pattern_end && + *pattern == CHAR_ASTERISK); + + if (pattern >= pattern_end) + { + no_slash_z = TRUE; + break; + } + + after_starstar = TRUE; + + if (after_separator && escape != 0 && *pattern == escape && + pattern + 1 < pattern_end && pattern[1] == separator) + pattern++; + + if (is_start) + { + if (*pattern != separator) continue; + + out.out_str[0] = CHAR_LEFT_PARENTHESIS; + out.out_str[1] = CHAR_QUESTION_MARK; + out.out_str[2] = CHAR_COLON; + out.out_str[3] = CHAR_BACKSLASH; + out.out_str[4] = CHAR_A; + out.out_str[5] = CHAR_VERTICAL_LINE; + convert_glob_write_str(&out, 6); + + convert_glob_print_separator(&out, separator, with_escape); + convert_glob_write(&out, CHAR_RIGHT_PARENTHESIS); + + pattern++; + continue; + } + + convert_glob_print_commit(&out); + + if (!after_separator || *pattern != separator) + { + out.out_str[0] = CHAR_DOT; + out.out_str[1] = CHAR_ASTERISK; + out.out_str[2] = CHAR_QUESTION_MARK; + convert_glob_write_str(&out, 3); 
+ continue; + } + + out.out_str[0] = CHAR_LEFT_PARENTHESIS; + out.out_str[1] = CHAR_QUESTION_MARK; + out.out_str[2] = CHAR_COLON; + out.out_str[3] = CHAR_DOT; + out.out_str[4] = CHAR_ASTERISK; + out.out_str[5] = CHAR_QUESTION_MARK; + + convert_glob_write_str(&out, 6); + + convert_glob_print_separator(&out, separator, with_escape); + + out.out_str[0] = CHAR_RIGHT_PARENTHESIS; + out.out_str[1] = CHAR_QUESTION_MARK; + out.out_str[2] = CHAR_QUESTION_MARK; + convert_glob_write_str(&out, 3); + + pattern++; + continue; + } + + if (pattern < pattern_end && *pattern == CHAR_ASTERISK) + { + do pattern++; while (pattern < pattern_end && + *pattern == CHAR_ASTERISK); + } + + if (no_wildsep) + { + if (pattern >= pattern_end) + { + no_slash_z = TRUE; + break; + } + + /* Start check must be after the end check. */ + if (is_start) continue; + } + + if (!is_start) + { + if (after_starstar) + { + out.out_str[0] = CHAR_LEFT_PARENTHESIS; + out.out_str[1] = CHAR_QUESTION_MARK; + out.out_str[2] = CHAR_GREATER_THAN_SIGN; + convert_glob_write_str(&out, 3); + in_atomic = TRUE; + } + else + convert_glob_print_commit(&out); + } + + if (no_wildsep) + convert_glob_write(&out, CHAR_DOT); + else + convert_glob_print_wildcard(&out, separator, with_escape); + + out.out_str[0] = CHAR_ASTERISK; + out.out_str[1] = CHAR_QUESTION_MARK; + if (pattern >= pattern_end) + out.out_str[1] = CHAR_PLUS; + convert_glob_write_str(&out, 2); + continue; + } + + if (c == CHAR_QUESTION_MARK) + { + if (no_wildsep) + convert_glob_write(&out, CHAR_DOT); + else + convert_glob_print_wildcard(&out, separator, with_escape); + continue; + } + + if (c == CHAR_LEFT_SQUARE_BRACKET) + { + result = convert_glob_parse_range(&pattern, pattern_end, + &out, utf, separator, with_escape, escape, no_wildsep); + if (result != 0) break; + continue; + } + + if (escape != 0 && c == escape) + { + if (pattern >= pattern_end) + { + result = PCRE2_ERROR_CONVERT_SYNTAX; + break; + } + c = *pattern++; + } + + if (c < 128 && 
strchr(pcre2_escaped_literals, c) != NULL) + convert_glob_write(&out, CHAR_BACKSLASH); + + convert_glob_write(&out, c); + } + +if (result == 0) + { + if (!no_slash_z) + { + out.out_str[0] = CHAR_BACKSLASH; + out.out_str[1] = CHAR_z; + convert_glob_write_str(&out, 2); + } + + if (in_atomic) + convert_glob_write(&out, CHAR_RIGHT_PARENTHESIS); + + convert_glob_write(&out, CHAR_NUL); + + if (!dummyrun && out.output_size != (PCRE2_SIZE) (out.output - use_buffer)) + result = PCRE2_ERROR_NOMEMORY; + } + +if (result != 0) + { + *bufflenptr = pattern - pattern_start; + return result; + } + +*bufflenptr = out.output_size - 1; +return 0; +} + + +/************************************************* +* Convert pattern * +*************************************************/ + +/* This is the external-facing function for converting other forms of pattern +into PCRE2 regular expression patterns. On error, the bufflenptr argument is +used to return an offset in the original pattern. + +Arguments: + pattern the input pattern + plength length of input, or PCRE2_ZERO_TERMINATED + options options bits + buffptr pointer to pointer to output buffer + bufflenptr pointer to length of output buffer + ccontext convert context or NULL + +Returns: 0 for success, else an error code (+ve or -ve) +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_pattern_convert(PCRE2_SPTR pattern, PCRE2_SIZE plength, uint32_t options, + PCRE2_UCHAR **buffptr, PCRE2_SIZE *bufflenptr, + pcre2_convert_context *ccontext) +{ +int i, rc; +PCRE2_UCHAR dummy_buffer[DUMMY_BUFFER_SIZE]; +PCRE2_UCHAR *use_buffer = dummy_buffer; +PCRE2_SIZE use_length = DUMMY_BUFFER_SIZE; +BOOL utf = (options & PCRE2_CONVERT_UTF) != 0; +uint32_t pattype = options & TYPE_OPTIONS; + +if (pattern == NULL || bufflenptr == NULL) return PCRE2_ERROR_NULL; + +if ((options & ~ALL_OPTIONS) != 0 || /* Undefined bit set */ + (pattype & (~pattype+1)) != pattype || /* More than one type set */ + pattype == 0) /* No type set */ + { + *bufflenptr = 0; /* 
Error offset */ + return PCRE2_ERROR_BADOPTION; + } + +if (plength == PCRE2_ZERO_TERMINATED) plength = PRIV(strlen)(pattern); +if (ccontext == NULL) ccontext = + (pcre2_convert_context *)(&PRIV(default_convert_context)); + +/* Check UTF if required. */ + +#ifndef SUPPORT_UNICODE +if (utf) + { + *bufflenptr = 0; /* Error offset */ + return PCRE2_ERROR_UNICODE_NOT_SUPPORTED; + } +#else +if (utf && (options & PCRE2_CONVERT_NO_UTF_CHECK) == 0) + { + PCRE2_SIZE erroroffset; + rc = PRIV(valid_utf)(pattern, plength, &erroroffset); + if (rc != 0) + { + *bufflenptr = erroroffset; + return rc; + } + } +#endif + +/* If buffptr is not NULL, and what it points to is not NULL, we are being +provided with a buffer and a length, so set them as the buffer to use. */ + +if (buffptr != NULL && *buffptr != NULL) + { + use_buffer = *buffptr; + use_length = *bufflenptr; + } + +/* Call an individual converter, either just once (if a buffer was provided or +just the length is needed), or twice (if a memory allocation is required). */ + +for (i = 0; i < 2; i++) + { + PCRE2_UCHAR *allocated; + BOOL dummyrun = buffptr == NULL || *buffptr == NULL; + + switch(pattype) + { + case PCRE2_CONVERT_GLOB: + rc = convert_glob(options & ~PCRE2_CONVERT_GLOB, pattern, plength, utf, + use_buffer, use_length, bufflenptr, dummyrun, ccontext); + break; + + case PCRE2_CONVERT_POSIX_BASIC: + case PCRE2_CONVERT_POSIX_EXTENDED: + rc = convert_posix(pattype, pattern, plength, utf, use_buffer, use_length, + bufflenptr, dummyrun, ccontext); + break; + + default: + *bufflenptr = 0; /* Error offset */ + return PCRE2_ERROR_INTERNAL; + } + + if (rc != 0 || /* Error */ + buffptr == NULL || /* Just the length is required */ + *buffptr != NULL) /* Buffer was provided or allocated */ + return rc; + + /* Allocate memory for the buffer, with hidden space for an allocator at + the start. The next time round the loop runs the conversion for real. 
*/ + + allocated = PRIV(memctl_malloc)(sizeof(pcre2_memctl) + + (*bufflenptr + 1)*PCRE2_CODE_UNIT_WIDTH, (pcre2_memctl *)ccontext); + if (allocated == NULL) return PCRE2_ERROR_NOMEMORY; + *buffptr = (PCRE2_UCHAR *)(((char *)allocated) + sizeof(pcre2_memctl)); + + use_buffer = *buffptr; + use_length = *bufflenptr + 1; + } + +/* Control should never get here. */ + +return PCRE2_ERROR_INTERNAL; +} + + +/************************************************* +* Free converted pattern * +*************************************************/ + +/* This frees a converted pattern that was put in newly-allocated memory. + +Argument: the converted pattern +Returns: nothing +*/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_converted_pattern_free(PCRE2_UCHAR *converted) +{ +if (converted != NULL) + { + pcre2_memctl *memctl = + (pcre2_memctl *)((char *)converted - sizeof(pcre2_memctl)); + memctl->free(memctl, memctl->memory_data); + } +} + +/* End of pcre2_convert.c */ diff --git a/src/pcre/pcre_dfa_exec.c b/src/pcre2/src/pcre2_dfa_match.c similarity index 65% rename from src/pcre/pcre_dfa_exec.c rename to src/pcre2/src/pcre2_dfa_match.c index f333381d..625695b7 100644 --- a/src/pcre/pcre_dfa_exec.c +++ b/src/pcre2/src/pcre2_dfa_match.c @@ -3,11 +3,11 @@ *************************************************/ /* PCRE is a library of functions to support regular expressions whose syntax -and semantics are as close as possible to those of the Perl 5 language (but see -below for why this module is different). +and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2017 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -38,7 +38,8 @@ POSSIBILITY OF SUCH DAMAGE. 
----------------------------------------------------------------------------- */ -/* This module contains the external function pcre_dfa_exec(), which is an + +/* This module contains the external function pcre2_dfa_match(), which is an alternative matching function that uses a sort of DFA algorithm (not a true FSM). This is NOT Perl-compatible, but it has advantages in certain applications. */ @@ -64,28 +65,28 @@ on the stack. It did give a 13% improvement with one specially constructed pattern for certain subject strings, but on other strings and on many of the simpler patterns in the test suite it did worse. The major problem, I think, was the extra time to initialize the index. This had to be done for each call -of internal_dfa_exec(). (The supplied patch used a static vector, initialized +of internal_dfa_match(). (The supplied patch used a static vector, initialized only once - I suspect this was the cause of the problems with the tests.) Overall, I concluded that the gains in some cases did not outweigh the losses in others, so I abandoned this code. */ - #ifdef HAVE_CONFIG_H #include "config.h" #endif -#define NLBLOCK md /* Block containing newline information */ +#define NLBLOCK mb /* Block containing newline information */ #define PSSTART start_subject /* Field containing processed string start */ #define PSEND end_subject /* Field containing processed string end */ -#include "pcre_internal.h" - +#include "pcre2_internal.h" -/* For use to indent debugging output */ - -#define SP " " +#define PUBLIC_DFA_MATCH_OPTIONS \ + (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \ + PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \ + PCRE2_PARTIAL_SOFT|PCRE2_DFA_SHORTEST|PCRE2_DFA_RESTART| \ + PCRE2_COPY_MATCHED_SUBJECT) /************************************************* @@ -112,7 +113,7 @@ small value. Non-zero values in the table are the offsets from the opcode where the character is to be found. 
***NOTE*** If the start of this table is modified, the three tables that follow must also be modified. */ -static const pcre_uint8 coptable[] = { +static const uint8_t coptable[] = { 0, /* End */ 0, 0, 0, 0, 0, /* \A, \G, \K, \B, \b */ 0, 0, 0, 0, 0, 0, /* \D, \d, \S, \s, \W, \w */ @@ -161,6 +162,7 @@ static const pcre_uint8 coptable[] = { 0, /* DNREFI */ 0, /* RECURSE */ 0, /* CALLOUT */ + 0, /* CALLOUT_STR */ 0, /* Alt */ 0, /* Ket */ 0, /* KetRmax */ @@ -171,17 +173,21 @@ static const pcre_uint8 coptable[] = { 0, /* Assert not */ 0, /* Assert behind */ 0, /* Assert behind not */ - 0, 0, /* ONCE, ONCE_NC */ + 0, /* NA assert */ + 0, /* NA assert behind */ + 0, /* ONCE */ + 0, /* SCRIPT_RUN */ 0, 0, 0, 0, 0, /* BRA, BRAPOS, CBRA, CBRAPOS, COND */ 0, 0, 0, 0, 0, /* SBRA, SBRAPOS, SCBRA, SCBRAPOS, SCOND */ 0, 0, /* CREF, DNCREF */ 0, 0, /* RREF, DNRREF */ - 0, /* DEF */ + 0, 0, /* FALSE, TRUE */ 0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */ 0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */ 0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */ - 0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ - 0, 0 /* CLOSE, SKIPZERO */ + 0, 0, /* COMMIT, COMMIT_ARG */ + 0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */ + 0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */ }; /* This table identifies those opcodes that inspect a character. It is used to @@ -189,7 +195,7 @@ remember the fact that a character could have been inspected when the end of the subject is reached. ***NOTE*** If the start of this table is modified, the two tables that follow must also be modified. 
*/ -static const pcre_uint8 poptable[] = { +static const uint8_t poptable[] = { 0, /* End */ 0, 0, 0, 1, 1, /* \A, \G, \K, \B, \b */ 1, 1, 1, 1, 1, 1, /* \D, \d, \S, \s, \W, \w */ @@ -233,6 +239,7 @@ static const pcre_uint8 poptable[] = { 0, /* DNREFI */ 0, /* RECURSE */ 0, /* CALLOUT */ + 0, /* CALLOUT_STR */ 0, /* Alt */ 0, /* Ket */ 0, /* KetRmax */ @@ -243,23 +250,27 @@ static const pcre_uint8 poptable[] = { 0, /* Assert not */ 0, /* Assert behind */ 0, /* Assert behind not */ - 0, 0, /* ONCE, ONCE_NC */ + 0, /* NA assert */ + 0, /* NA assert behind */ + 0, /* ONCE */ + 0, /* SCRIPT_RUN */ 0, 0, 0, 0, 0, /* BRA, BRAPOS, CBRA, CBRAPOS, COND */ 0, 0, 0, 0, 0, /* SBRA, SBRAPOS, SCBRA, SCBRAPOS, SCOND */ 0, 0, /* CREF, DNCREF */ 0, 0, /* RREF, DNRREF */ - 0, /* DEF */ + 0, 0, /* FALSE, TRUE */ 0, 0, 0, /* BRAZERO, BRAMINZERO, BRAPOSZERO */ 0, 0, 0, /* MARK, PRUNE, PRUNE_ARG */ 0, 0, 0, 0, /* SKIP, SKIP_ARG, THEN, THEN_ARG */ - 0, 0, 0, 0, /* COMMIT, FAIL, ACCEPT, ASSERT_ACCEPT */ - 0, 0 /* CLOSE, SKIPZERO */ + 0, 0, /* COMMIT, COMMIT_ARG */ + 0, 0, 0, /* FAIL, ACCEPT, ASSERT_ACCEPT */ + 0, 0, 0 /* CLOSE, SKIPZERO, DEFINE */ }; /* These 2 tables allow for compact code for testing for \D, \d, \S, \s, \W, and \w */ -static const pcre_uint8 toptable1[] = { +static const uint8_t toptable1[] = { 0, 0, 0, 0, 0, 0, ctype_digit, ctype_digit, ctype_space, ctype_space, @@ -267,7 +278,7 @@ static const pcre_uint8 toptable1[] = { 0, 0 /* OP_ANY, OP_ALLANY */ }; -static const pcre_uint8 toptable2[] = { +static const uint8_t toptable2[] = { 0, 0, 0, 0, 0, 0, ctype_digit, 0, ctype_space, 0, @@ -282,7 +293,7 @@ entirely of ints because the working vector we are passed, and which we put these structures in, is a vector of ints. 
*/ typedef struct stateblock { - int offset; /* Offset to opcode */ + int offset; /* Offset to opcode (-ve has meaning) */ int count; /* Count for repeats */ int data; /* Some use extra data */ } stateblock; @@ -290,39 +301,157 @@ typedef struct stateblock { #define INTS_PER_STATEBLOCK (int)(sizeof(stateblock)/sizeof(int)) -#ifdef PCRE_DEBUG +/* Before version 10.32 the recursive calls of internal_dfa_match() were passed +local working space and output vectors that were created on the stack. This has +caused issues for some patterns, especially in small-stack environments such as +Windows. A new scheme is now in use which sets up a vector on the stack, but if +this is too small, heap memory is used, up to the heap_limit. The main +parameters are all numbers of ints because the workspace is a vector of ints. + +The size of the starting stack vector, DFA_START_RWS_SIZE, is in bytes, and is +defined in pcre2_internal.h so as to be available to pcre2test when it is +finding the minimum heap requirement for a match. */ + +#define OVEC_UNIT (sizeof(PCRE2_SIZE)/sizeof(int)) + +#define RWS_BASE_SIZE (DFA_START_RWS_SIZE/sizeof(int)) /* Stack vector */ +#define RWS_RSIZE 1000 /* Work size for recursion */ +#define RWS_OVEC_RSIZE (1000*OVEC_UNIT) /* Ovector for recursion */ +#define RWS_OVEC_OSIZE (2*OVEC_UNIT) /* Ovector in other cases */ + +/* This structure is at the start of each workspace block. */ + +typedef struct RWS_anchor { + struct RWS_anchor *next; + uint32_t size; /* Number of ints */ + uint32_t free; /* Number of ints */ +} RWS_anchor; + +#define RWS_ANCHOR_SIZE (sizeof(RWS_anchor)/sizeof(int)) + + + /************************************************* -* Print character string * +* Process a callout * *************************************************/ -/* Character string printing function for debugging. +/* This function is called to perform a callout. 
Arguments: - p points to string - length number of bytes - f where to print + code current code pointer + offsets points to current capture offsets + current_subject start of current subject match + ptr current position in subject + mb the match block + extracode extra code offset when called from condition + lengthptr where to return the callout length + +Returns: the return from the callout +*/ + +static int +do_callout(PCRE2_SPTR code, PCRE2_SIZE *offsets, PCRE2_SPTR current_subject, + PCRE2_SPTR ptr, dfa_match_block *mb, PCRE2_SIZE extracode, + PCRE2_SIZE *lengthptr) +{ +pcre2_callout_block *cb = mb->cb; + +*lengthptr = (code[extracode] == OP_CALLOUT)? + (PCRE2_SIZE)PRIV(OP_lengths)[OP_CALLOUT] : + (PCRE2_SIZE)GET(code, 1 + 2*LINK_SIZE + extracode); + +if (mb->callout == NULL) return 0; /* No callout provided */ + +/* Fixed fields in the callout block are set once and for all at the start of +matching. */ + +cb->offset_vector = offsets; +cb->start_match = (PCRE2_SIZE)(current_subject - mb->start_subject); +cb->current_position = (PCRE2_SIZE)(ptr - mb->start_subject); +cb->pattern_position = GET(code, 1 + extracode); +cb->next_item_length = GET(code, 1 + LINK_SIZE + extracode); + +if (code[extracode] == OP_CALLOUT) + { + cb->callout_number = code[1 + 2*LINK_SIZE + extracode]; + cb->callout_string_offset = 0; + cb->callout_string = NULL; + cb->callout_string_length = 0; + } +else + { + cb->callout_number = 0; + cb->callout_string_offset = GET(code, 1 + 3*LINK_SIZE + extracode); + cb->callout_string = code + (1 + 4*LINK_SIZE + extracode) + 1; + cb->callout_string_length = *lengthptr - (1 + 4*LINK_SIZE) - 2; + } + +return (mb->callout)(cb, mb->callout_data); +} + + + +/************************************************* +* Expand local workspace memory * +*************************************************/ -Returns: nothing +/* This function is called when internal_dfa_match() is about to be called +recursively and there is insufficient working space left in the 
current +workspace block. If there's an existing next block, use it; otherwise get a new +block unless the heap limit is reached. + +Arguments: + rwsptr pointer to block pointer (updated) + ovecsize space needed for an ovector + mb the match block + +Returns: 0 rwsptr has been updated + !0 an error code */ -static void -pchars(const pcre_uchar *p, int length, FILE *f) +static int +more_workspace(RWS_anchor **rwsptr, unsigned int ovecsize, dfa_match_block *mb) { -pcre_uint32 c; -while (length-- > 0) +RWS_anchor *rws = *rwsptr; +RWS_anchor *new; + +if (rws->next != NULL) { - if (isprint(c = *(p++))) - fprintf(f, "%c", c); - else - fprintf(f, "\\x{%02x}", c); + new = rws->next; } + +/* Sizes in the RWS_anchor blocks are in units of sizeof(int), but +mb->heap_limit and mb->heap_used are in kibibytes. Play carefully, to avoid +overflow. */ + +else + { + uint32_t newsize = (rws->size >= UINT32_MAX/2)? UINT32_MAX/2 : rws->size * 2; + uint32_t newsizeK = newsize/(1024/sizeof(int)); + + if (newsizeK + mb->heap_used > mb->heap_limit) + newsizeK = (uint32_t)(mb->heap_limit - mb->heap_used); + newsize = newsizeK*(1024/sizeof(int)); + + if (newsize < RWS_RSIZE + ovecsize + RWS_ANCHOR_SIZE) + return PCRE2_ERROR_HEAPLIMIT; + new = mb->memctl.malloc(newsize*sizeof(int), mb->memctl.memory_data); + if (new == NULL) return PCRE2_ERROR_NOMEMORY; + mb->heap_used += newsizeK; + new->next = NULL; + new->size = newsize; + rws->next = new; + } + +new->free = new->size - RWS_ANCHOR_SIZE; +*rwsptr = new; +return 0; } -#endif /************************************************* -* Execute a Regular Expression - DFA engine * +* Match a Regular Expression - DFA engine * *************************************************/ /* This internal function applies a compiled pattern to a subject string, @@ -331,7 +460,7 @@ external one, possibly multiple times if the pattern is not anchored. The function calls itself recursively for some kinds of subpattern. 
Arguments: - md the match_data block with fixed information + mb the match_data block with fixed information this_start_code the opening bracket of this subexpression's code current_subject where we currently are in the subject string start_offset start offset in the subject string @@ -355,9 +484,8 @@ for the current character, one for the following character). */ next_active_state->offset = (x); \ next_active_state->count = (y); \ next_active_state++; \ - DPRINTF(("%.*sADD_ACTIVE(%d,%d)\n", rlevel*2-2, SP, (x), (y))); \ } \ - else return PCRE_ERROR_DFA_WSSIZE + else return PCRE2_ERROR_DFA_WSSIZE #define ADD_ACTIVE_DATA(x,y,z) \ if (active_count++ < wscount) \ @@ -366,9 +494,8 @@ for the current character, one for the following character). */ next_active_state->count = (y); \ next_active_state->data = (z); \ next_active_state++; \ - DPRINTF(("%.*sADD_ACTIVE_DATA(%d,%d,%d)\n", rlevel*2-2, SP, (x), (y), (z))); \ } \ - else return PCRE_ERROR_DFA_WSSIZE + else return PCRE2_ERROR_DFA_WSSIZE #define ADD_NEW(x,y) \ if (new_count++ < wscount) \ @@ -376,9 +503,8 @@ for the current character, one for the following character). */ next_new_state->offset = (x); \ next_new_state->count = (y); \ next_new_state++; \ - DPRINTF(("%.*sADD_NEW(%d,%d)\n", rlevel*2-2, SP, (x), (y))); \ } \ - else return PCRE_ERROR_DFA_WSSIZE + else return PCRE2_ERROR_DFA_WSSIZE #define ADD_NEW_DATA(x,y,z) \ if (new_count++ < wscount) \ @@ -387,95 +513,83 @@ for the current character, one for the following character). 
*/ next_new_state->count = (y); \ next_new_state->data = (z); \ next_new_state++; \ - DPRINTF(("%.*sADD_NEW_DATA(%d,%d,%d) line %d\n", rlevel*2-2, SP, \ - (x), (y), (z), __LINE__)); \ } \ - else return PCRE_ERROR_DFA_WSSIZE + else return PCRE2_ERROR_DFA_WSSIZE /* And now, here is the code */ static int -internal_dfa_exec( - dfa_match_data *md, - const pcre_uchar *this_start_code, - const pcre_uchar *current_subject, - int start_offset, - int *offsets, - int offsetcount, +internal_dfa_match( + dfa_match_block *mb, + PCRE2_SPTR this_start_code, + PCRE2_SPTR current_subject, + PCRE2_SIZE start_offset, + PCRE2_SIZE *offsets, + uint32_t offsetcount, int *workspace, int wscount, - int rlevel) + uint32_t rlevel, + int *RWS) { stateblock *active_states, *new_states, *temp_states; stateblock *next_active_state, *next_new_state; - -const pcre_uint8 *ctypes, *lcc, *fcc; -const pcre_uchar *ptr; -const pcre_uchar *end_code, *first_op; - +const uint8_t *ctypes, *lcc, *fcc; +PCRE2_SPTR ptr; +PCRE2_SPTR end_code; dfa_recursion_info new_recursive; - int active_count, new_count, match_count; -/* Some fields in the md block are frequently referenced, so we load them into +/* Some fields in the mb block are frequently referenced, so we load them into independent variables in the hope that this will perform better. 
*/ -const pcre_uchar *start_subject = md->start_subject; -const pcre_uchar *end_subject = md->end_subject; -const pcre_uchar *start_code = md->start_code; +PCRE2_SPTR start_subject = mb->start_subject; +PCRE2_SPTR end_subject = mb->end_subject; +PCRE2_SPTR start_code = mb->start_code; -#ifdef SUPPORT_UTF -BOOL utf = (md->poptions & PCRE_UTF8) != 0; +#ifdef SUPPORT_UNICODE +BOOL utf = (mb->poptions & PCRE2_UTF) != 0; +BOOL utf_or_ucp = utf || (mb->poptions & PCRE2_UCP) != 0; #else BOOL utf = FALSE; #endif BOOL reset_could_continue = FALSE; -rlevel++; -offsetcount &= (-2); +if (mb->match_call_count++ >= mb->match_limit) return PCRE2_ERROR_MATCHLIMIT; +if (rlevel++ > mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT; +offsetcount &= (uint32_t)(-2); /* Round down */ wscount -= 2; wscount = (wscount - (wscount % (INTS_PER_STATEBLOCK * 2))) / (2 * INTS_PER_STATEBLOCK); -DPRINTF(("\n%.*s---------------------\n" - "%.*sCall to internal_dfa_exec f=%d\n", - rlevel*2-2, SP, rlevel*2-2, SP, rlevel)); - -ctypes = md->tables + ctypes_offset; -lcc = md->tables + lcc_offset; -fcc = md->tables + fcc_offset; +ctypes = mb->tables + ctypes_offset; +lcc = mb->tables + lcc_offset; +fcc = mb->tables + fcc_offset; -match_count = PCRE_ERROR_NOMATCH; /* A negative number */ +match_count = PCRE2_ERROR_NOMATCH; /* A negative number */ active_states = (stateblock *)(workspace + 2); next_new_state = new_states = active_states + wscount; new_count = 0; -first_op = this_start_code + 1 + LINK_SIZE + - ((*this_start_code == OP_CBRA || *this_start_code == OP_SCBRA || - *this_start_code == OP_CBRAPOS || *this_start_code == OP_SCBRAPOS) - ? IMM2_SIZE:0); - /* The first thing in any (sub) pattern is a bracket of some sort. Push all the alternative states onto the list, and find out where the end is. This makes is possible to use this function recursively, when we want to stop at a matching internal ket rather than at the end. 
-If the first opcode in the first alternative is OP_REVERSE, we are dealing with -a backward assertion. In that case, we have to find out the maximum amount to -move back, and set up each alternative appropriately. */ +If we are dealing with a backward assertion we have to find out the maximum +amount to move back, and set up each alternative appropriately. */ -if (*first_op == OP_REVERSE) +if (*this_start_code == OP_ASSERTBACK || *this_start_code == OP_ASSERTBACK_NOT) { - int max_back = 0; - int gone_back; + size_t max_back = 0; + size_t gone_back; end_code = this_start_code; do { - int back = GET(end_code, 2+LINK_SIZE); + size_t back = (size_t)GET(end_code, 2+LINK_SIZE); if (back > max_back) max_back = back; end_code += GET(end_code, 1); } @@ -484,7 +598,7 @@ if (*first_op == OP_REVERSE) /* If we can't go back the amount required for the longest lookbehind pattern, go back as far as we can; some alternatives may still be viable. */ -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE /* In character mode we have to step back character by character */ if (utf) @@ -493,7 +607,8 @@ if (*first_op == OP_REVERSE) { if (current_subject <= start_subject) break; current_subject--; - ACROSSCHAR(current_subject > start_subject, *current_subject, current_subject--); + ACROSSCHAR(current_subject > start_subject, current_subject, + current_subject--); } } else @@ -502,26 +617,28 @@ if (*first_op == OP_REVERSE) /* In byte-mode we can do this quickly. */ { - gone_back = (current_subject - max_back < start_subject)? - (int)(current_subject - start_subject) : max_back; + size_t current_offset = (size_t)(current_subject - start_subject); + gone_back = (current_offset < max_back)? 
current_offset : max_back; current_subject -= gone_back; } /* Save the earliest consulted character */ - if (current_subject < md->start_used_ptr) - md->start_used_ptr = current_subject; + if (current_subject < mb->start_used_ptr) + mb->start_used_ptr = current_subject; - /* Now we can process the individual branches. */ + /* Now we can process the individual branches. There will be an OP_REVERSE at + the start of each branch, except when the length of the branch is zero. */ end_code = this_start_code; do { - int back = GET(end_code, 2+LINK_SIZE); + uint32_t revlen = (end_code[1+LINK_SIZE] == OP_REVERSE)? 1 + LINK_SIZE : 0; + size_t back = (revlen == 0)? 0 : (size_t)GET(end_code, 2+LINK_SIZE); if (back <= gone_back) { - int bstate = (int)(end_code - start_code + 2 + 2*LINK_SIZE); - ADD_NEW_DATA(-bstate, 0, gone_back - back); + int bstate = (int)(end_code - start_code + 1 + LINK_SIZE + revlen); + ADD_NEW_DATA(-bstate, 0, (int)(gone_back - back)); } end_code += GET(end_code, 1); } @@ -540,12 +657,12 @@ else /* Restarting */ - if (rlevel == 1 && (md->moptions & PCRE_DFA_RESTART) != 0) + if (rlevel == 1 && (mb->moptions & PCRE2_DFA_RESTART) != 0) { do { end_code += GET(end_code, 1); } while (*end_code == OP_ALT); new_count = workspace[1]; if (!workspace[0]) - memcpy(new_states, active_states, new_count * sizeof(stateblock)); + memcpy(new_states, active_states, (size_t)new_count * sizeof(stateblock)); } /* Not restarting */ @@ -568,8 +685,6 @@ else workspace[0] = 0; /* Bit indicating which vector is current */ -DPRINTF(("%.*sEnd state = %d\n", rlevel*2-2, SP, (int)(end_code - start_code))); - /* Loop for scanning the subject */ ptr = current_subject; @@ -577,12 +692,14 @@ for (;;) { int i, j; int clen, dlen; - pcre_uint32 c, d; + uint32_t c, d; int forced_fail = 0; BOOL partial_newline = FALSE; BOOL could_continue = reset_could_continue; reset_could_continue = FALSE; + if (ptr > mb->last_used_ptr) mb->last_used_ptr = ptr; + /* Make the new state list into the active 
state list and empty the new state list. */ @@ -595,17 +712,6 @@ for (;;) workspace[0] ^= 1; /* Remember for the restarting feature */ workspace[1] = active_count; -#ifdef PCRE_DEBUG - printf("%.*sNext character: rest of subject = \"", rlevel*2-2, SP); - pchars(ptr, STRLEN_UC(ptr), stdout); - printf("\"\n"); - - printf("%.*sActive states: ", rlevel*2-2, SP); - for (i = 0; i < active_count; i++) - printf("%d/%d ", active_states[i].offset, active_states[i].count); - printf("\n"); -#endif - /* Set the pointers for adding new states */ next_active_state = active_states + active_count; @@ -618,11 +724,11 @@ for (;;) if (ptr < end_subject) { clen = 1; /* Number of data items in the character */ -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE GETCHARLENTEST(c, ptr, clen); #else c = *ptr; -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ } else { @@ -639,18 +745,12 @@ for (;;) { stateblock *current_state = active_states + i; BOOL caseless = FALSE; - const pcre_uchar *code; + PCRE2_SPTR code; + uint32_t codevalue; int state_offset = current_state->offset; - int codevalue, rrc; + int rrc; int count; -#ifdef PCRE_DEBUG - printf ("%.*sProcessing state %d c=", rlevel*2-2, SP, state_offset); - if (clen == 0) printf("EOL\n"); - else if (c > 32 && c < 127) printf("'%c'\n", c); - else printf("0x%02x\n", c); -#endif - /* A negative offset is a special case meaning "hold off going to this (negated) state until the number of characters in the data field have been skipped". 
If the could_continue flag was passed over from a previous @@ -660,7 +760,6 @@ for (;;) { if (current_state->data > 0) { - DPRINTF(("%.*sSkipping this character\n", rlevel*2-2, SP)); ADD_NEW_DATA(state_offset, current_state->count, current_state->data - 1); if (could_continue) reset_could_continue = TRUE; @@ -680,10 +779,7 @@ for (;;) { if (active_states[j].offset == state_offset && active_states[j].count == current_state->count) - { - DPRINTF(("%.*sDuplicate state: skipped\n", rlevel*2-2, SP)); goto NEXT_ACTIVE_STATE; - } } /* The state offset is the offset to the opcode */ @@ -711,15 +807,15 @@ for (;;) if (coptable[codevalue] > 0) { dlen = 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (utf) { GETCHARLEN(d, (code + coptable[codevalue]), dlen); } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ d = code[coptable[codevalue]]; if (codevalue >= OP_TYPESTAR) { switch(d) { - case OP_ANYBYTE: return PCRE_ERROR_DFA_UITEM; + case OP_ANYBYTE: return PCRE2_ERROR_DFA_UITEM; case OP_NOTPROP: case OP_PROP: codevalue += OP_PROP_EXTRA; break; case OP_ANYNL: codevalue += OP_ANYNL_EXTRA; break; @@ -754,7 +850,7 @@ for (;;) case OP_TABLE_LENGTH + ((sizeof(coptable) == OP_TABLE_LENGTH) && (sizeof(poptable) == OP_TABLE_LENGTH)): - break; + return 0; /* ========================================================================== */ /* Reached a closing bracket. If not at the end of the pattern, carry @@ -764,7 +860,7 @@ for (;;) using recursive calls. Thus, it never adds any new states. At the end of the (sub)pattern, unless we have an empty string and - PCRE_NOTEMPTY is set, or PCRE_NOTEMPTY_ATSTART is set and we are at the + PCRE2_NOTEMPTY is set, or PCRE2_NOTEMPTY_ATSTART is set and we are at the start of the subject, save the match data, shifting up all previous matches so we always have the longest first. 
*/ @@ -777,35 +873,28 @@ for (;;) ADD_ACTIVE(state_offset + 1 + LINK_SIZE, 0); if (codevalue != OP_KET) { - ADD_ACTIVE(state_offset - GET(code, 1), 0); + ADD_ACTIVE(state_offset - (int)GET(code, 1), 0); } } else { if (ptr > current_subject || - ((md->moptions & PCRE_NOTEMPTY) == 0 && - ((md->moptions & PCRE_NOTEMPTY_ATSTART) == 0 || - current_subject > start_subject + md->start_offset))) + ((mb->moptions & PCRE2_NOTEMPTY) == 0 && + ((mb->moptions & PCRE2_NOTEMPTY_ATSTART) == 0 || + current_subject > start_subject + mb->start_offset))) { if (match_count < 0) match_count = (offsetcount >= 2)? 1 : 0; - else if (match_count > 0 && ++match_count * 2 > offsetcount) + else if (match_count > 0 && ++match_count * 2 > (int)offsetcount) match_count = 0; - count = ((match_count == 0)? offsetcount : match_count * 2) - 2; - if (count > 0) memmove(offsets + 2, offsets, count * sizeof(int)); + count = ((match_count == 0)? (int)offsetcount : match_count * 2) - 2; + if (count > 0) (void)memmove(offsets + 2, offsets, + (size_t)count * sizeof(PCRE2_SIZE)); if (offsetcount >= 2) { - offsets[0] = (int)(current_subject - start_subject); - offsets[1] = (int)(ptr - start_subject); - DPRINTF(("%.*sSet matched string = \"%.*s\"\n", rlevel*2-2, SP, - offsets[1] - offsets[0], (char *)current_subject)); - } - if ((md->moptions & PCRE_DFA_SHORTEST) != 0) - { - DPRINTF(("%.*sEnd of internal_dfa_exec %d: returning %d\n" - "%.*s---------------------\n\n", rlevel*2-2, SP, rlevel, - match_count, rlevel*2-2, SP)); - return match_count; + offsets[0] = (PCRE2_SIZE)(current_subject - start_subject); + offsets[1] = (PCRE2_SIZE)(ptr - start_subject); } + if ((mb->moptions & PCRE2_DFA_SHORTEST) != 0) return match_count; } } break; @@ -861,14 +950,15 @@ for (;;) /*-----------------------------------------------------------------*/ case OP_CIRC: - if (ptr == start_subject && (md->moptions & PCRE_NOTBOL) == 0) + if (ptr == start_subject && (mb->moptions & PCRE2_NOTBOL) == 0) { ADD_ACTIVE(state_offset + 1, 0); 
} break; /*-----------------------------------------------------------------*/ case OP_CIRCM: - if ((ptr == start_subject && (md->moptions & PCRE_NOTBOL) == 0) || - (ptr != end_subject && WAS_NEWLINE(ptr))) + if ((ptr == start_subject && (mb->moptions & PCRE2_NOTBOL) == 0) || + ((ptr != end_subject || (mb->poptions & PCRE2_ALT_CIRCUMFLEX) != 0 ) + && WAS_NEWLINE(ptr))) { ADD_ACTIVE(state_offset + 1, 0); } break; @@ -876,8 +966,8 @@ for (;;) case OP_EOD: if (ptr >= end_subject) { - if ((md->moptions & PCRE_PARTIAL_HARD) != 0) - could_continue = TRUE; + if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0) + return PCRE2_ERROR_PARTIAL; else { ADD_ACTIVE(state_offset + 1, 0); } } break; @@ -903,8 +993,8 @@ for (;;) case OP_ANY: if (clen > 0 && !IS_NEWLINE(ptr)) { - if (ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD)) != 0 && + if (ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) @@ -926,30 +1016,32 @@ for (;;) /*-----------------------------------------------------------------*/ case OP_EODN: - if (clen == 0 && (md->moptions & PCRE_PARTIAL_HARD) != 0) - could_continue = TRUE; - else if (clen == 0 || (IS_NEWLINE(ptr) && ptr == end_subject - md->nllen)) - { ADD_ACTIVE(state_offset + 1, 0); } + if (clen == 0 || (IS_NEWLINE(ptr) && ptr == end_subject - mb->nllen)) + { + if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0) + return PCRE2_ERROR_PARTIAL; + ADD_ACTIVE(state_offset + 1, 0); + } break; /*-----------------------------------------------------------------*/ case OP_DOLL: - if ((md->moptions & PCRE_NOTEOL) == 0) + if ((mb->moptions & PCRE2_NOTEOL) == 0) { - if (clen == 0 && (md->moptions & PCRE_PARTIAL_HARD) != 0) + if (clen == 0 && (mb->moptions & PCRE2_PARTIAL_HARD) != 0) could_continue = TRUE; else if (clen == 0 || - ((md->poptions & PCRE_DOLLAR_ENDONLY) == 0 && IS_NEWLINE(ptr) && - (ptr == end_subject - md->nllen) + ((mb->poptions & 
PCRE2_DOLLAR_ENDONLY) == 0 && IS_NEWLINE(ptr) && + (ptr == end_subject - mb->nllen) )) { ADD_ACTIVE(state_offset + 1, 0); } - else if (ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) != 0 && + else if (ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) { - if ((md->moptions & PCRE_PARTIAL_HARD) != 0) + if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0) { reset_could_continue = TRUE; ADD_NEW_DATA(-(state_offset + 1), 0, 1); @@ -961,20 +1053,20 @@ for (;;) /*-----------------------------------------------------------------*/ case OP_DOLLM: - if ((md->moptions & PCRE_NOTEOL) == 0) + if ((mb->moptions & PCRE2_NOTEOL) == 0) { - if (clen == 0 && (md->moptions & PCRE_PARTIAL_HARD) != 0) + if (clen == 0 && (mb->moptions & PCRE2_PARTIAL_HARD) != 0) could_continue = TRUE; else if (clen == 0 || - ((md->poptions & PCRE_DOLLAR_ENDONLY) == 0 && IS_NEWLINE(ptr))) + ((mb->poptions & PCRE2_DOLLAR_ENDONLY) == 0 && IS_NEWLINE(ptr))) { ADD_ACTIVE(state_offset + 1, 0); } - else if (ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) != 0 && + else if (ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) { - if ((md->moptions & PCRE_PARTIAL_HARD) != 0) + if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0) { reset_could_continue = TRUE; ADD_NEW_DATA(-(state_offset + 1), 0, 1); @@ -1013,18 +1105,18 @@ for (;;) if (ptr > start_subject) { - const pcre_uchar *temp = ptr - 1; - if (temp < md->start_used_ptr) md->start_used_ptr = temp; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 + PCRE2_SPTR temp = ptr - 1; + if (temp < mb->start_used_ptr) mb->start_used_ptr = temp; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (utf) { BACKCHAR(temp); } #endif GETCHARTEST(d, 
temp); -#ifdef SUPPORT_UCP - if ((md->poptions & PCRE_UCP) != 0) +#ifdef SUPPORT_UNICODE + if ((mb->poptions & PCRE2_UCP) != 0) { if (d == '_') left_word = TRUE; else { - int cat = UCD_CATEGORY(d); + uint32_t cat = UCD_CATEGORY(d); left_word = (cat == ucp_L || cat == ucp_N); } } @@ -1036,12 +1128,20 @@ for (;;) if (clen > 0) { -#ifdef SUPPORT_UCP - if ((md->poptions & PCRE_UCP) != 0) + if (ptr >= mb->last_used_ptr) + { + PCRE2_SPTR temp = ptr + 1; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (utf) { FORWARDCHARTEST(temp, mb->end_subject); } +#endif + mb->last_used_ptr = temp; + } +#ifdef SUPPORT_UNICODE + if ((mb->poptions & PCRE2_UCP) != 0) { if (c == '_') right_word = TRUE; else { - int cat = UCD_CATEGORY(c); + uint32_t cat = UCD_CATEGORY(c); right_word = (cat == ucp_L || cat == ucp_N); } } @@ -1062,13 +1162,13 @@ for (;;) if the support is in the binary; otherwise a compile-time error occurs. */ -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE case OP_PROP: case OP_NOTPROP: if (clen > 0) { BOOL OK; - const pcre_uint32 *cp; + const uint32_t *cp; const ucd_record * prop = GET_UCD(c); switch(code[1]) { @@ -1167,8 +1267,8 @@ for (;;) if (count > 0) { ADD_ACTIVE(state_offset + 2, 0); } if (clen > 0) { - if (d == OP_ANY && ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD)) != 0 && + if (d == OP_ANY && ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) @@ -1198,8 +1298,8 @@ for (;;) ADD_ACTIVE(state_offset + 2, 0); if (clen > 0) { - if (d == OP_ANY && ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD)) != 0 && + if (d == OP_ANY && ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) @@ -1228,8 +1328,8 @@ for (;;) ADD_ACTIVE(state_offset + 2, 0); if (clen > 0) { - if (d == OP_ANY && ptr + 1 >= md->end_subject && - 
(md->moptions & (PCRE_PARTIAL_HARD)) != 0 && + if (d == OP_ANY && ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) @@ -1256,8 +1356,8 @@ for (;;) count = current_state->count; /* Number already matched */ if (clen > 0) { - if (d == OP_ANY && ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD)) != 0 && + if (d == OP_ANY && ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) @@ -1285,8 +1385,8 @@ for (;;) count = current_state->count; /* Number already matched */ if (clen > 0) { - if (d == OP_ANY && ptr + 1 >= md->end_subject && - (md->moptions & (PCRE_PARTIAL_HARD)) != 0 && + if (d == OP_ANY && ptr + 1 >= mb->end_subject && + (mb->moptions & (PCRE2_PARTIAL_HARD)) != 0 && NLBLOCK->nltype == NLTYPE_FIXED && NLBLOCK->nllen == 2 && c == NLBLOCK->nl[0]) @@ -1317,7 +1417,7 @@ for (;;) argument. It keeps the code above fast for the other cases. The argument is in the d variable. 
*/ -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE case OP_PROP_EXTRA + OP_TYPEPLUS: case OP_PROP_EXTRA + OP_TYPEMINPLUS: case OP_PROP_EXTRA + OP_TYPEPOSPLUS: @@ -1326,7 +1426,7 @@ for (;;) if (clen > 0) { BOOL OK; - const pcre_uint32 *cp; + const uint32_t *cp; const ucd_record * prop = GET_UCD(c); switch(code[2]) { @@ -1426,25 +1526,14 @@ for (;;) if (count > 0) { ADD_ACTIVE(state_offset + 2, 0); } if (clen > 0) { - int lgb, rgb; - const pcre_uchar *nptr = ptr + clen; int ncount = 0; if (count > 0 && codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSPLUS) { active_count--; /* Remove non-match possibility */ next_active_state--; } - lgb = UCD_GRAPHBREAK(c); - while (nptr < end_subject) - { - dlen = 1; - if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); } - rgb = UCD_GRAPHBREAK(d); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - ncount++; - lgb = rgb; - nptr += dlen; - } + (void)PRIV(extuni)(c, ptr + clen, mb->start_subject, end_subject, utf, + &ncount); count++; ADD_NEW_DATA(-state_offset, count, ncount); } @@ -1469,7 +1558,7 @@ for (;;) case 0x2028: case 0x2029: #endif /* Not EBCDIC */ - if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break; + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) break; goto ANYNL01; case CHAR_CR: @@ -1560,7 +1649,7 @@ for (;;) break; /*-----------------------------------------------------------------*/ -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE case OP_PROP_EXTRA + OP_TYPEQUERY: case OP_PROP_EXTRA + OP_TYPEMINQUERY: case OP_PROP_EXTRA + OP_TYPEPOSQUERY: @@ -1578,7 +1667,7 @@ for (;;) if (clen > 0) { BOOL OK; - const pcre_uint32 *cp; + const uint32_t *cp; const ucd_record * prop = GET_UCD(c); switch(code[2]) { @@ -1687,8 +1776,6 @@ for (;;) ADD_ACTIVE(state_offset + 2, 0); if (clen > 0) { - int lgb, rgb; - const pcre_uchar *nptr = ptr + clen; int ncount = 0; if (codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSSTAR || codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSQUERY) @@ -1696,17 +1783,8 @@ for (;;) active_count--; /* Remove non-match possibility */ 
next_active_state--; } - lgb = UCD_GRAPHBREAK(c); - while (nptr < end_subject) - { - dlen = 1; - if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); } - rgb = UCD_GRAPHBREAK(d); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - ncount++; - lgb = rgb; - nptr += dlen; - } + (void)PRIV(extuni)(c, ptr + clen, mb->start_subject, end_subject, utf, + &ncount); ADD_NEW_DATA(-(state_offset + count), 0, ncount); } break; @@ -1738,7 +1816,7 @@ for (;;) case 0x2028: case 0x2029: #endif /* Not EBCDIC */ - if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break; + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) break; goto ANYNL02; case CHAR_CR: @@ -1844,7 +1922,7 @@ for (;;) break; /*-----------------------------------------------------------------*/ -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE case OP_PROP_EXTRA + OP_TYPEEXACT: case OP_PROP_EXTRA + OP_TYPEUPTO: case OP_PROP_EXTRA + OP_TYPEMINUPTO: @@ -1855,7 +1933,7 @@ for (;;) if (clen > 0) { BOOL OK; - const pcre_uint32 *cp; + const uint32_t *cp; const ucd_record * prop = GET_UCD(c); switch(code[1 + IMM2_SIZE + 1]) { @@ -1959,26 +2037,16 @@ for (;;) count = current_state->count; /* Number already matched */ if (clen > 0) { - int lgb, rgb; - const pcre_uchar *nptr = ptr + clen; + PCRE2_SPTR nptr; int ncount = 0; if (codevalue == OP_EXTUNI_EXTRA + OP_TYPEPOSUPTO) { active_count--; /* Remove non-match possibility */ next_active_state--; } - lgb = UCD_GRAPHBREAK(c); - while (nptr < end_subject) - { - dlen = 1; - if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); } - rgb = UCD_GRAPHBREAK(d); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - ncount++; - lgb = rgb; - nptr += dlen; - } - if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0) + nptr = PRIV(extuni)(c, ptr + clen, mb->start_subject, end_subject, utf, + &ncount); + if (nptr >= end_subject && (mb->moptions & PCRE2_PARTIAL_HARD) != 0) reset_could_continue = TRUE; if (++count >= (int)GET2(code, 1)) { ADD_NEW_DATA(-(state_offset + 2 + IMM2_SIZE), 
0, ncount); } @@ -2008,7 +2076,7 @@ for (;;) case 0x2028: case 0x2029: #endif /* Not EBCDIC */ - if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break; + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) break; goto ANYNL03; case CHAR_CR: @@ -2122,8 +2190,8 @@ for (;;) case OP_CHARI: if (clen == 0) break; -#ifdef SUPPORT_UTF - if (utf) +#ifdef SUPPORT_UNICODE + if (utf_or_ucp) { if (c == d) { ADD_NEW(state_offset + dlen + 1, 0); } else { @@ -2131,20 +2199,13 @@ for (;;) if (c < 128) othercase = fcc[c]; else - /* If we have Unicode property support, we can use it to test the - other case of the character. */ -#ifdef SUPPORT_UCP othercase = UCD_OTHERCASE(c); -#else - othercase = NOTACHAR; -#endif - if (d == othercase) { ADD_NEW(state_offset + dlen + 1, 0); } } } else -#endif /* SUPPORT_UTF */ - /* Not UTF mode */ +#endif /* SUPPORT_UNICODE */ + /* Not UTF or UCP mode */ { if (TABLE_GET(c, lcc, c) == TABLE_GET(d, lcc, d)) { ADD_NEW(state_offset + 2, 0); } @@ -2152,7 +2213,7 @@ for (;;) break; -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE /*-----------------------------------------------------------------*/ /* This is a tricky one because it can match more than one character. 
Find out how many characters to skip, and then set up a negative state @@ -2161,21 +2222,10 @@ for (;;) case OP_EXTUNI: if (clen > 0) { - int lgb, rgb; - const pcre_uchar *nptr = ptr + clen; int ncount = 0; - lgb = UCD_GRAPHBREAK(c); - while (nptr < end_subject) - { - dlen = 1; - if (!utf) d = *nptr; else { GETCHARLEN(d, nptr, dlen); } - rgb = UCD_GRAPHBREAK(d); - if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) break; - ncount++; - lgb = rgb; - nptr += dlen; - } - if (nptr >= end_subject && (md->moptions & PCRE_PARTIAL_HARD) != 0) + PCRE2_SPTR nptr = PRIV(extuni)(c, ptr + clen, mb->start_subject, + end_subject, utf, &ncount); + if (nptr >= end_subject && (mb->moptions & PCRE2_PARTIAL_HARD) != 0) reset_could_continue = TRUE; ADD_NEW_DATA(-(state_offset + 1), 0, ncount); } @@ -2197,7 +2247,8 @@ for (;;) case 0x2028: case 0x2029: #endif /* Not EBCDIC */ - if ((md->moptions & PCRE_BSR_ANYCRLF) != 0) break; + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) break; + /* Fall through */ case CHAR_LF: ADD_NEW(state_offset + 1, 0); @@ -2207,7 +2258,7 @@ for (;;) if (ptr + 1 >= end_subject) { ADD_NEW(state_offset + 1, 0); - if ((md->moptions & PCRE_PARTIAL_HARD) != 0) + if ((mb->moptions & PCRE2_PARTIAL_HARD) != 0) reset_could_continue = TRUE; } else if (UCHAR21TEST(ptr + 1) == CHAR_LF) @@ -2287,18 +2338,12 @@ for (;;) case OP_NOTI: if (clen > 0) { - pcre_uint32 otherd; -#ifdef SUPPORT_UTF - if (utf && d >= 128) - { -#ifdef SUPPORT_UCP + uint32_t otherd; +#ifdef SUPPORT_UNICODE + if (utf_or_ucp && d >= 128) otherd = UCD_OTHERCASE(d); -#else - otherd = d; -#endif /* SUPPORT_UCP */ - } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ otherd = TABLE_GET(d, fcc, d); if (c != d && c != otherd) { ADD_NEW(state_offset + dlen + 1, 0); } @@ -2326,18 +2371,14 @@ for (;;) if (count > 0) { ADD_ACTIVE(state_offset + dlen + 1, 0); } if (clen > 0) { - pcre_uint32 otherd = NOTACHAR; + uint32_t otherd = NOTACHAR; if (caseless) { -#ifdef SUPPORT_UTF - if (utf && d >= 128) - { -#ifdef 
SUPPORT_UCP +#ifdef SUPPORT_UNICODE + if (utf_or_ucp && d >= 128) otherd = UCD_OTHERCASE(d); -#endif /* SUPPORT_UCP */ - } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ otherd = TABLE_GET(d, fcc, d); } if ((c == d || c == otherd) == (codevalue < OP_NOTSTAR)) @@ -2373,18 +2414,14 @@ for (;;) ADD_ACTIVE(state_offset + dlen + 1, 0); if (clen > 0) { - pcre_uint32 otherd = NOTACHAR; + uint32_t otherd = NOTACHAR; if (caseless) { -#ifdef SUPPORT_UTF - if (utf && d >= 128) - { -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE + if (utf_or_ucp && d >= 128) otherd = UCD_OTHERCASE(d); -#endif /* SUPPORT_UCP */ - } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ otherd = TABLE_GET(d, fcc, d); } if ((c == d || c == otherd) == (codevalue < OP_NOTSTAR)) @@ -2418,18 +2455,14 @@ for (;;) ADD_ACTIVE(state_offset + dlen + 1, 0); if (clen > 0) { - pcre_uint32 otherd = NOTACHAR; + uint32_t otherd = NOTACHAR; if (caseless) { -#ifdef SUPPORT_UTF - if (utf && d >= 128) - { -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE + if (utf_or_ucp && d >= 128) otherd = UCD_OTHERCASE(d); -#endif /* SUPPORT_UCP */ - } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ otherd = TABLE_GET(d, fcc, d); } if ((c == d || c == otherd) == (codevalue < OP_NOTSTAR)) @@ -2455,18 +2488,14 @@ for (;;) count = current_state->count; /* Number already matched */ if (clen > 0) { - pcre_uint32 otherd = NOTACHAR; + uint32_t otherd = NOTACHAR; if (caseless) { -#ifdef SUPPORT_UTF - if (utf && d >= 128) - { -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE + if (utf_or_ucp && d >= 128) otherd = UCD_OTHERCASE(d); -#endif /* SUPPORT_UCP */ - } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ otherd = TABLE_GET(d, fcc, d); } if ((c == d || c == otherd) == (codevalue < OP_NOTSTAR)) @@ -2499,18 +2528,14 @@ for (;;) count = current_state->count; /* Number already matched */ if (clen > 0) { - pcre_uint32 otherd = NOTACHAR; + uint32_t otherd = NOTACHAR; if (caseless) { -#ifdef SUPPORT_UTF - if 
(utf && d >= 128) - { -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE + if (utf_or_ucp && d >= 128) otherd = UCD_OTHERCASE(d); -#endif /* SUPPORT_UCP */ - } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ otherd = TABLE_GET(d, fcc, d); } if ((c == d || c == otherd) == (codevalue < OP_NOTSTAR)) @@ -2538,18 +2563,18 @@ for (;;) { BOOL isinclass = FALSE; int next_state_offset; - const pcre_uchar *ecode; + PCRE2_SPTR ecode; /* For a simple class, there is always just a 32-byte table, and we can set isinclass from it. */ if (codevalue != OP_XCLASS) { - ecode = code + 1 + (32 / sizeof(pcre_uchar)); + ecode = code + 1 + (32 / sizeof(PCRE2_UCHAR)); if (clen > 0) { isinclass = (c > 255)? (codevalue == OP_NCLASS) : - ((((pcre_uint8 *)(code + 1))[c/8] & (1 << (c&7))) != 0); + ((((uint8_t *)(code + 1))[c/8] & (1u << (c&7))) != 0); } } @@ -2627,11 +2652,13 @@ for (;;) if (isinclass) { int max = (int)GET2(ecode, 1 + IMM2_SIZE); + if (*ecode == OP_CRPOSRANGE && count >= (int)GET2(ecode, 1)) { active_count--; /* Remove non-match possibility */ next_active_state--; } + if (++count >= max && max != 0) /* Max 0 => no limit */ { ADD_NEW(next_state_offset + 1 + 2 * IMM2_SIZE, 0); } else @@ -2662,24 +2689,39 @@ for (;;) case OP_ASSERTBACK_NOT: { int rc; - int local_offsets[2]; - int local_workspace[1000]; - const pcre_uchar *endasscode = code + GET(code, 1); + int *local_workspace; + PCRE2_SIZE *local_offsets; + PCRE2_SPTR endasscode = code + GET(code, 1); + RWS_anchor *rws = (RWS_anchor *)RWS; + + if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE) + { + rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb); + if (rc != 0) return rc; + RWS = (int *)rws; + } + + local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free); + local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE; + rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE; while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1); - rc = internal_dfa_exec( - md, /* static match data */ + rc = internal_dfa_match( + mb, /* static match 
data */ code, /* this subexpression's code */ ptr, /* where we currently are */ - (int)(ptr - start_subject), /* start offset */ + (PCRE2_SIZE)(ptr - start_subject), /* start offset */ local_offsets, /* offset vector */ - sizeof(local_offsets)/sizeof(int), /* size of same */ + RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */ local_workspace, /* workspace vector */ - sizeof(local_workspace)/sizeof(int), /* size of same */ - rlevel); /* function recursion level */ + RWS_RSIZE, /* size of same */ + rlevel, /* function recursion level */ + RWS); /* recursion workspace */ + + rws->free += RWS_RSIZE + RWS_OVEC_OSIZE; - if (rc == PCRE_ERROR_DFA_UITEM) return rc; + if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc; if ((rc >= 0) == (codevalue == OP_ASSERT || codevalue == OP_ASSERTBACK)) { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); } } @@ -2689,44 +2731,22 @@ for (;;) case OP_COND: case OP_SCOND: { - int local_offsets[1000]; - int local_workspace[1000]; - int codelink = GET(code, 1); - int condcode; + int codelink = (int)GET(code, 1); + PCRE2_UCHAR condcode; /* Because of the way auto-callout works during compile, a callout item is inserted between OP_COND and an assertion condition. This does not happen for the other conditions. 
*/ - if (code[LINK_SIZE+1] == OP_CALLOUT) + if (code[LINK_SIZE + 1] == OP_CALLOUT + || code[LINK_SIZE + 1] == OP_CALLOUT_STR) { - rrc = 0; - if (PUBL(callout) != NULL) - { - PUBL(callout_block) cb; - cb.version = 1; /* Version 1 of the callout block */ - cb.callout_number = code[LINK_SIZE+2]; - cb.offset_vector = offsets; -#if defined COMPILE_PCRE8 - cb.subject = (PCRE_SPTR)start_subject; -#elif defined COMPILE_PCRE16 - cb.subject = (PCRE_SPTR16)start_subject; -#elif defined COMPILE_PCRE32 - cb.subject = (PCRE_SPTR32)start_subject; -#endif - cb.subject_length = (int)(end_subject - start_subject); - cb.start_match = (int)(current_subject - start_subject); - cb.current_position = (int)(ptr - start_subject); - cb.pattern_position = GET(code, LINK_SIZE + 3); - cb.next_item_length = GET(code, 3 + 2*LINK_SIZE); - cb.capture_top = 1; - cb.capture_last = -1; - cb.callout_data = md->callout_data; - cb.mark = NULL; /* No (*MARK) support */ - if ((rrc = (*PUBL(callout))(&cb)) < 0) return rrc; /* Abandon */ - } + PCRE2_SIZE callout_length; + rrc = do_callout(code, offsets, current_subject, ptr, mb, + 1 + LINK_SIZE, &callout_length); + if (rrc < 0) return rrc; /* Abandon */ if (rrc > 0) break; /* Fail this thread */ - code += PRIV(OP_lengths)[OP_CALLOUT]; /* Skip callout data */ + code += callout_length; /* Skip callout data */ } condcode = code[LINK_SIZE+1]; @@ -2736,23 +2756,28 @@ for (;;) if (condcode == OP_CREF || condcode == OP_DNCREF || condcode == OP_DNRREF) - return PCRE_ERROR_DFA_UCOND; + return PCRE2_ERROR_DFA_UCOND; /* The DEFINE condition is always false, and the assertion (?!) is converted to OP_FAIL. 
*/ - if (condcode == OP_DEF || condcode == OP_FAIL) + if (condcode == OP_FALSE || condcode == OP_FAIL) { ADD_ACTIVE(state_offset + codelink + LINK_SIZE + 1, 0); } + /* There is also an always-true condition */ + + else if (condcode == OP_TRUE) + { ADD_ACTIVE(state_offset + LINK_SIZE + 2, 0); } + /* The only supported version of OP_RREF is for the value RREF_ANY, which means "test if in any recursion". We can't test for specifically recursed groups. */ else if (condcode == OP_RREF) { - int value = GET2(code, LINK_SIZE + 2); - if (value != RREF_ANY) return PCRE_ERROR_DFA_UCOND; - if (md->recursive != NULL) + unsigned int value = GET2(code, LINK_SIZE + 2); + if (value != RREF_ANY) return PCRE2_ERROR_DFA_UCOND; + if (mb->recursive != NULL) { ADD_ACTIVE(state_offset + LINK_SIZE + 2 + IMM2_SIZE, 0); } else { ADD_ACTIVE(state_offset + codelink + LINK_SIZE + 1, 0); } } @@ -2762,23 +2787,40 @@ for (;;) else { int rc; - const pcre_uchar *asscode = code + LINK_SIZE + 1; - const pcre_uchar *endasscode = asscode + GET(asscode, 1); + int *local_workspace; + PCRE2_SIZE *local_offsets; + PCRE2_SPTR asscode = code + LINK_SIZE + 1; + PCRE2_SPTR endasscode = asscode + GET(asscode, 1); + RWS_anchor *rws = (RWS_anchor *)RWS; + + if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE) + { + rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb); + if (rc != 0) return rc; + RWS = (int *)rws; + } + + local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free); + local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE; + rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE; while (*endasscode == OP_ALT) endasscode += GET(endasscode, 1); - rc = internal_dfa_exec( - md, /* fixed match data */ + rc = internal_dfa_match( + mb, /* fixed match data */ asscode, /* this subexpression's code */ ptr, /* where we currently are */ - (int)(ptr - start_subject), /* start offset */ + (PCRE2_SIZE)(ptr - start_subject), /* start offset */ local_offsets, /* offset vector */ - sizeof(local_offsets)/sizeof(int), /* size of same */ + 
RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */ local_workspace, /* workspace vector */ - sizeof(local_workspace)/sizeof(int), /* size of same */ - rlevel); /* function recursion level */ + RWS_RSIZE, /* size of same */ + rlevel, /* function recursion level */ + RWS); /* recursion workspace */ - if (rc == PCRE_ERROR_DFA_UITEM) return rc; + rws->free += RWS_RSIZE + RWS_OVEC_OSIZE; + + if (rc < 0 && rc != PCRE2_ERROR_NOMATCH) return rc; if ((rc >= 0) == (condcode == OP_ASSERT || condcode == OP_ASSERTBACK)) { ADD_ACTIVE((int)(endasscode + LINK_SIZE + 1 - start_code), 0); } @@ -2791,51 +2833,60 @@ for (;;) /*-----------------------------------------------------------------*/ case OP_RECURSE: { + int rc; + int *local_workspace; + PCRE2_SIZE *local_offsets; + RWS_anchor *rws = (RWS_anchor *)RWS; dfa_recursion_info *ri; - int local_offsets[1000]; - int local_workspace[1000]; - const pcre_uchar *callpat = start_code + GET(code, 1); - int recno = (callpat == md->start_code)? 0 : + PCRE2_SPTR callpat = start_code + GET(code, 1); + uint32_t recno = (callpat == mb->start_code)? 0 : GET2(callpat, 1 + LINK_SIZE); - int rc; - DPRINTF(("%.*sStarting regex recursion\n", rlevel*2-2, SP)); + if (rws->free < RWS_RSIZE + RWS_OVEC_RSIZE) + { + rc = more_workspace(&rws, RWS_OVEC_RSIZE, mb); + if (rc != 0) return rc; + RWS = (int *)rws; + } + + local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free); + local_workspace = ((int *)local_offsets) + RWS_OVEC_RSIZE; + rws->free -= RWS_RSIZE + RWS_OVEC_RSIZE; /* Check for repeating a recursion without advancing the subject pointer. This should catch convoluted mutual recursions. (Some simple cases are caught at compile time.) */ - for (ri = md->recursive; ri != NULL; ri = ri->prevrec) + for (ri = mb->recursive; ri != NULL; ri = ri->prevrec) if (recno == ri->group_num && ptr == ri->subject_position) - return PCRE_ERROR_RECURSELOOP; + return PCRE2_ERROR_RECURSELOOP; /* Remember this recursion and where we started it so as to catch infinite loops. 
*/ new_recursive.group_num = recno; new_recursive.subject_position = ptr; - new_recursive.prevrec = md->recursive; - md->recursive = &new_recursive; + new_recursive.prevrec = mb->recursive; + mb->recursive = &new_recursive; - rc = internal_dfa_exec( - md, /* fixed match data */ + rc = internal_dfa_match( + mb, /* fixed match data */ callpat, /* this subexpression's code */ ptr, /* where we currently are */ - (int)(ptr - start_subject), /* start offset */ + (PCRE2_SIZE)(ptr - start_subject), /* start offset */ local_offsets, /* offset vector */ - sizeof(local_offsets)/sizeof(int), /* size of same */ + RWS_OVEC_RSIZE/OVEC_UNIT, /* size of same */ local_workspace, /* workspace vector */ - sizeof(local_workspace)/sizeof(int), /* size of same */ - rlevel); /* function recursion level */ + RWS_RSIZE, /* size of same */ + rlevel, /* function recursion level */ + RWS); /* recursion workspace */ - md->recursive = new_recursive.prevrec; /* Done this recursion */ - - DPRINTF(("%.*sReturn from regex recursion: rc=%d\n", rlevel*2-2, SP, - rc)); + rws->free += RWS_RSIZE + RWS_OVEC_RSIZE; + mb->recursive = new_recursive.prevrec; /* Done this recursion */ /* Ran out of internal offsets */ - if (rc == 0) return PCRE_ERROR_DFA_RECURSE; + if (rc == 0) return PCRE2_ERROR_DFA_RECURSE; /* For each successful matched substring, set up the next state with a count of characters to skip before trying it. 
Note that the count is in @@ -2845,18 +2896,19 @@ for (;;) { for (rc = rc*2 - 2; rc >= 0; rc -= 2) { - int charcount = local_offsets[rc+1] - local_offsets[rc]; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 + PCRE2_SIZE charcount = local_offsets[rc+1] - local_offsets[rc]; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (utf) { - const pcre_uchar *p = start_subject + local_offsets[rc]; - const pcre_uchar *pp = start_subject + local_offsets[rc+1]; - while (p < pp) if (NOT_FIRSTCHAR(*p++)) charcount--; + PCRE2_SPTR p = start_subject + local_offsets[rc]; + PCRE2_SPTR pp = start_subject + local_offsets[rc+1]; + while (p < pp) if (NOT_FIRSTCU(*p++)) charcount--; } #endif if (charcount > 0) { - ADD_NEW_DATA(-(state_offset + LINK_SIZE + 1), 0, (charcount - 1)); + ADD_NEW_DATA(-(state_offset + LINK_SIZE + 1), 0, + (int)(charcount - 1)); } else { @@ -2864,7 +2916,7 @@ for (;;) } } } - else if (rc != PCRE_ERROR_NOMATCH) return rc; + else if (rc != PCRE2_ERROR_NOMATCH) return rc; } break; @@ -2875,10 +2927,25 @@ for (;;) case OP_SCBRAPOS: case OP_BRAPOSZERO: { - int charcount, matched_count; - const pcre_uchar *local_ptr = ptr; + int rc; + int *local_workspace; + PCRE2_SIZE *local_offsets; + PCRE2_SIZE charcount, matched_count; + PCRE2_SPTR local_ptr = ptr; + RWS_anchor *rws = (RWS_anchor *)RWS; BOOL allow_zero; + if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE) + { + rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb); + if (rc != 0) return rc; + RWS = (int *)rws; + } + + local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free); + local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE; + rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE; + if (codevalue == OP_BRAPOSZERO) { allow_zero = TRUE; @@ -2891,25 +2958,23 @@ for (;;) for (matched_count = 0;; matched_count++) { - int local_offsets[2]; - int local_workspace[1000]; - - int rc = internal_dfa_exec( - md, /* fixed match data */ + rc = internal_dfa_match( + mb, /* fixed match data */ code, /* this subexpression's code */ 
local_ptr, /* where we currently are */ - (int)(ptr - start_subject), /* start offset */ + (PCRE2_SIZE)(ptr - start_subject), /* start offset */ local_offsets, /* offset vector */ - sizeof(local_offsets)/sizeof(int), /* size of same */ + RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */ local_workspace, /* workspace vector */ - sizeof(local_workspace)/sizeof(int), /* size of same */ - rlevel); /* function recursion level */ + RWS_RSIZE, /* size of same */ + rlevel, /* function recursion level */ + RWS); /* recursion workspace */ /* Failed to match */ if (rc < 0) { - if (rc != PCRE_ERROR_NOMATCH) return rc; + if (rc != PCRE2_ERROR_NOMATCH) return rc; break; } @@ -2920,13 +2985,15 @@ for (;;) local_ptr += charcount; /* Advance temporary position ptr */ } + rws->free += RWS_RSIZE + RWS_OVEC_OSIZE; + /* At this point we have matched the subpattern matched_count times, and local_ptr is pointing to the character after the end of the last match. */ if (matched_count > 0 || allow_zero) { - const pcre_uchar *end_subpattern = code; + PCRE2_SPTR end_subpattern = code; int next_state_offset; do { end_subpattern += GET(end_subpattern, 1); } @@ -2947,13 +3014,13 @@ for (;;) } else { - const pcre_uchar *p = ptr; - const pcre_uchar *pp = local_ptr; - charcount = (int)(pp - p); -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (utf) while (p < pp) if (NOT_FIRSTCHAR(*p++)) charcount--; + PCRE2_SPTR p = ptr; + PCRE2_SPTR pp = local_ptr; + charcount = (PCRE2_SIZE)(pp - p); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (utf) while (p < pp) if (NOT_FIRSTCU(*p++)) charcount--; #endif - ADD_NEW_DATA(-next_state_offset, 0, (charcount - 1)); + ADD_NEW_DATA(-next_state_offset, 0, (int)(charcount - 1)); } } } @@ -2961,26 +3028,41 @@ for (;;) /*-----------------------------------------------------------------*/ case OP_ONCE: - case OP_ONCE_NC: { - int local_offsets[2]; - int local_workspace[1000]; + int rc; + int *local_workspace; + PCRE2_SIZE *local_offsets; + RWS_anchor 
*rws = (RWS_anchor *)RWS; + + if (rws->free < RWS_RSIZE + RWS_OVEC_OSIZE) + { + rc = more_workspace(&rws, RWS_OVEC_OSIZE, mb); + if (rc != 0) return rc; + RWS = (int *)rws; + } - int rc = internal_dfa_exec( - md, /* fixed match data */ + local_offsets = (PCRE2_SIZE *)(RWS + rws->size - rws->free); + local_workspace = ((int *)local_offsets) + RWS_OVEC_OSIZE; + rws->free -= RWS_RSIZE + RWS_OVEC_OSIZE; + + rc = internal_dfa_match( + mb, /* fixed match data */ code, /* this subexpression's code */ ptr, /* where we currently are */ - (int)(ptr - start_subject), /* start offset */ + (PCRE2_SIZE)(ptr - start_subject), /* start offset */ local_offsets, /* offset vector */ - sizeof(local_offsets)/sizeof(int), /* size of same */ + RWS_OVEC_OSIZE/OVEC_UNIT, /* size of same */ local_workspace, /* workspace vector */ - sizeof(local_workspace)/sizeof(int), /* size of same */ - rlevel); /* function recursion level */ + RWS_RSIZE, /* size of same */ + rlevel, /* function recursion level */ + RWS); /* recursion workspace */ + + rws->free += RWS_RSIZE + RWS_OVEC_OSIZE; if (rc >= 0) { - const pcre_uchar *end_subpattern = code; - int charcount = local_offsets[1] - local_offsets[0]; + PCRE2_SPTR end_subpattern = code; + PCRE2_SIZE charcount = local_offsets[1] - local_offsets[0]; int next_state_offset, repeat_state_offset; do { end_subpattern += GET(end_subpattern, 1); } @@ -3032,20 +3114,20 @@ for (;;) } else { -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (utf) { - const pcre_uchar *p = start_subject + local_offsets[0]; - const pcre_uchar *pp = start_subject + local_offsets[1]; - while (p < pp) if (NOT_FIRSTCHAR(*p++)) charcount--; + PCRE2_SPTR p = start_subject + local_offsets[0]; + PCRE2_SPTR pp = start_subject + local_offsets[1]; + while (p < pp) if (NOT_FIRSTCU(*p++)) charcount--; } #endif - ADD_NEW_DATA(-next_state_offset, 0, (charcount - 1)); + ADD_NEW_DATA(-next_state_offset, 0, (int)(charcount - 1)); if 
(repeat_state_offset >= 0) - { ADD_NEW_DATA(-repeat_state_offset, 0, (charcount - 1)); } + { ADD_NEW_DATA(-repeat_state_offset, 0, (int)(charcount - 1)); } } } - else if (rc != PCRE_ERROR_NOMATCH) return rc; + else if (rc != PCRE2_ERROR_NOMATCH) return rc; } break; @@ -3054,39 +3136,21 @@ for (;;) /* Handle callouts */ case OP_CALLOUT: - rrc = 0; - if (PUBL(callout) != NULL) + case OP_CALLOUT_STR: { - PUBL(callout_block) cb; - cb.version = 1; /* Version 1 of the callout block */ - cb.callout_number = code[1]; - cb.offset_vector = offsets; -#if defined COMPILE_PCRE8 - cb.subject = (PCRE_SPTR)start_subject; -#elif defined COMPILE_PCRE16 - cb.subject = (PCRE_SPTR16)start_subject; -#elif defined COMPILE_PCRE32 - cb.subject = (PCRE_SPTR32)start_subject; -#endif - cb.subject_length = (int)(end_subject - start_subject); - cb.start_match = (int)(current_subject - start_subject); - cb.current_position = (int)(ptr - start_subject); - cb.pattern_position = GET(code, 2); - cb.next_item_length = GET(code, 2 + LINK_SIZE); - cb.capture_top = 1; - cb.capture_last = -1; - cb.callout_data = md->callout_data; - cb.mark = NULL; /* No (*MARK) support */ - if ((rrc = (*PUBL(callout))(&cb)) < 0) return rrc; /* Abandon */ + PCRE2_SIZE callout_length; + rrc = do_callout(code, offsets, current_subject, ptr, mb, 0, + &callout_length); + if (rrc < 0) return rrc; /* Abandon */ + if (rrc == 0) + { ADD_ACTIVE(state_offset + (int)callout_length, 0); } } - if (rrc == 0) - { ADD_ACTIVE(state_offset + PRIV(OP_lengths)[OP_CALLOUT], 0); } break; /* ========================================================================== */ default: /* Unsupported opcode */ - return PCRE_ERROR_DFA_UITEM; + return PCRE2_ERROR_DFA_UITEM; } NEXT_ACTIVE_STATE: continue; @@ -3095,8 +3159,8 @@ for (;;) /* We have finished the processing at the current subject character. If no new states have been set for the next character, we have found all the - matches that we are going to find. 
If we are at the top level and partial - matching has been requested, check for appropriate conditions. + matches that we are going to find. If partial matching has been requested, + check for appropriate conditions. The "forced_ fail" variable counts the number of (*F) encountered for the character. If it is equal to the original active_count (saved in @@ -3108,27 +3172,26 @@ for (;;) if (new_count <= 0) { - if (rlevel == 1 && /* Top level, and */ - could_continue && /* Some could go on, and */ + if (could_continue && /* Some could go on, and */ forced_fail != workspace[1] && /* Not all forced fail & */ ( /* either... */ - (md->moptions & PCRE_PARTIAL_HARD) != 0 /* Hard partial */ + (mb->moptions & PCRE2_PARTIAL_HARD) != 0 /* Hard partial */ || /* or... */ - ((md->moptions & PCRE_PARTIAL_SOFT) != 0 && /* Soft partial and */ - match_count < 0) /* no matches */ + ((mb->moptions & PCRE2_PARTIAL_SOFT) != 0 && /* Soft partial and */ + match_count < 0) /* no matches */ ) && /* And... */ ( - partial_newline || /* Either partial NL */ - ( /* or ... */ - ptr >= end_subject && /* End of subject and */ - ptr > md->start_used_ptr) /* Inspected non-empty string */ + partial_newline || /* Either partial NL */ + ( /* or ... */ + ptr >= end_subject && /* End of subject and */ + ( /* either */ + ptr > mb->start_used_ptr || /* Inspected non-empty string */ + mb->allowemptypartial /* or pattern has lookbehind */ + ) /* or could match empty */ ) - ) - match_count = PCRE_ERROR_PARTIAL; - DPRINTF(("%.*sEnd of internal_dfa_exec %d: returning %d\n" - "%.*s---------------------\n\n", rlevel*2-2, SP, rlevel, match_count, - rlevel*2-2, SP)); - break; /* In effect, "return", but see the comment below */ + )) + match_count = PCRE2_ERROR_PARTIAL; + break; /* Exit from loop along the subject string */ } /* One or more states are active for the next character. 
*/ @@ -3136,541 +3199,784 @@ for (;;) ptr += clen; /* Advance to next subject character */ } /* Loop to move along the subject string */ -/* Control gets here from "break" a few lines above. We do it this way because -if we use "return" above, we have compiler trouble. Some compilers warn if -there's nothing here because they think the function doesn't return a value. On -the other hand, if we put a dummy statement here, some more clever compilers -complain that it can't be reached. Sigh. */ +/* Control gets here from "break" a few lines above. If we have a match and +PCRE2_ENDANCHORED is set, the match fails. */ + +if (match_count >= 0 && + ((mb->moptions | mb->poptions) & PCRE2_ENDANCHORED) != 0 && + ptr < end_subject) + match_count = PCRE2_ERROR_NOMATCH; return match_count; } - /************************************************* -* Execute a Regular Expression - DFA engine * +* Match a pattern using the DFA algorithm * *************************************************/ -/* This external function applies a compiled re to a subject string using a DFA -engine. This function calls the internal function multiple times if the pattern -is not anchored. +/* This function matches a compiled pattern to a subject string, using the +alternate matching algorithm that finds all matches at once. 
Arguments: - argument_re points to the compiled expression - extra_data points to extra data or is NULL - subject points to the subject string - length length of subject string (may contain binary zeros) - start_offset where to start in the subject string - options option bits - offsets vector of match offsets - offsetcount size of same - workspace workspace vector - wscount size of same - -Returns: > 0 => number of match offset pairs placed in offsets - = 0 => offsets overflowed; longest matches are present - -1 => failed to match - < -1 => some kind of unexpected problem + code points to the compiled pattern + subject subject string + length length of subject string + startoffset where to start matching in the subject + options option bits + match_data points to a match data structure + gcontext points to a match context + workspace pointer to workspace + wscount size of workspace + +Returns: > 0 => number of match offset pairs placed in offsets + = 0 => offsets overflowed; longest matches are present + -1 => failed to match + < -1 => some kind of unexpected problem */ -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_dfa_exec(const pcre *argument_re, const pcre_extra *extra_data, - const char *subject, int length, int start_offset, int options, int *offsets, - int offsetcount, int *workspace, int wscount) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_dfa_exec(const pcre16 *argument_re, const pcre16_extra *extra_data, - PCRE_SPTR16 subject, int length, int start_offset, int options, int *offsets, - int offsetcount, int *workspace, int wscount) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_dfa_exec(const pcre32 *argument_re, const pcre32_extra *extra_data, - PCRE_SPTR32 subject, int length, int start_offset, int options, int *offsets, - int offsetcount, int *workspace, int wscount) -#endif +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_dfa_match(const pcre2_code *code, 
PCRE2_SPTR subject, PCRE2_SIZE length, + PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, int *workspace, PCRE2_SIZE wscount) { -REAL_PCRE *re = (REAL_PCRE *)argument_re; -dfa_match_data match_block; -dfa_match_data *md = &match_block; +int rc; +int was_zero_terminated = 0; + +const pcre2_real_code *re = (const pcre2_real_code *)code; + +PCRE2_SPTR start_match; +PCRE2_SPTR end_subject; +PCRE2_SPTR bumpalong_limit; +PCRE2_SPTR req_cu_ptr; + BOOL utf, anchored, startline, firstline; -const pcre_uchar *current_subject, *end_subject; -const pcre_study_data *study = NULL; - -const pcre_uchar *req_char_ptr; -const pcre_uint8 *start_bits = NULL; -BOOL has_first_char = FALSE; -BOOL has_req_char = FALSE; -pcre_uchar first_char = 0; -pcre_uchar first_char2 = 0; -pcre_uchar req_char = 0; -pcre_uchar req_char2 = 0; -int newline; +BOOL has_first_cu = FALSE; +BOOL has_req_cu = FALSE; -/* Plausibility checks */ +#if PCRE2_CODE_UNIT_WIDTH == 8 +BOOL memchr_not_found_first_cu = FALSE; +BOOL memchr_not_found_first_cu2 = FALSE; +#endif -if ((options & ~PUBLIC_DFA_EXEC_OPTIONS) != 0) return PCRE_ERROR_BADOPTION; -if (re == NULL || subject == NULL || workspace == NULL || - (offsets == NULL && offsetcount > 0)) return PCRE_ERROR_NULL; -if (offsetcount < 0) return PCRE_ERROR_BADCOUNT; -if (wscount < 20) return PCRE_ERROR_DFA_WSSIZE; -if (length < 0) return PCRE_ERROR_BADLENGTH; -if (start_offset < 0 || start_offset > length) return PCRE_ERROR_BADOFFSET; +PCRE2_UCHAR first_cu = 0; +PCRE2_UCHAR first_cu2 = 0; +PCRE2_UCHAR req_cu = 0; +PCRE2_UCHAR req_cu2 = 0; -/* Check that the first field in the block is the magic number. If it is not, -return with PCRE_ERROR_BADMAGIC. However, if the magic number is equal to -REVERSED_MAGIC_NUMBER we return with PCRE_ERROR_BADENDIANNESS, which -means that the pattern is likely compiled with different endianness. 
*/ +const uint8_t *start_bits = NULL; -if (re->magic_number != MAGIC_NUMBER) - return re->magic_number == REVERSED_MAGIC_NUMBER? - PCRE_ERROR_BADENDIANNESS:PCRE_ERROR_BADMAGIC; -if ((re->flags & PCRE_MODE) == 0) return PCRE_ERROR_BADMODE; +/* We need to have mb pointing to a match block, because the IS_NEWLINE macro +is used below, and it expects NLBLOCK to be defined as a pointer. */ -/* If restarting after a partial match, do some sanity checks on the contents -of the workspace. */ +pcre2_callout_block cb; +dfa_match_block actual_match_block; +dfa_match_block *mb = &actual_match_block; -if ((options & PCRE_DFA_RESTART) != 0) - { - if ((workspace[0] & (-2)) != 0 || workspace[1] < 1 || - workspace[1] > (wscount - 2)/INTS_PER_STATEBLOCK) - return PCRE_ERROR_DFA_BADRESTART; - } +/* Set up a starting block of memory for use during recursive calls to +internal_dfa_match(). By putting this on the stack, it minimizes resource use +in the case when it is not needed. If this is too small, more memory is +obtained from the heap. At the start of each block is an anchor structure.*/ -/* Set up study, callout, and table data */ +int base_recursion_workspace[RWS_BASE_SIZE]; +RWS_anchor *rws = (RWS_anchor *)base_recursion_workspace; +rws->next = NULL; +rws->size = RWS_BASE_SIZE; +rws->free = RWS_BASE_SIZE - RWS_ANCHOR_SIZE; -md->tables = re->tables; -md->callout_data = NULL; +/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated +subject string. 
*/ -if (extra_data != NULL) +if (length == PCRE2_ZERO_TERMINATED) { - unsigned long int flags = extra_data->flags; - if ((flags & PCRE_EXTRA_STUDY_DATA) != 0) - study = (const pcre_study_data *)extra_data->study_data; - if ((flags & PCRE_EXTRA_MATCH_LIMIT) != 0) return PCRE_ERROR_DFA_UMLIMIT; - if ((flags & PCRE_EXTRA_MATCH_LIMIT_RECURSION) != 0) - return PCRE_ERROR_DFA_UMLIMIT; - if ((flags & PCRE_EXTRA_CALLOUT_DATA) != 0) - md->callout_data = extra_data->callout_data; - if ((flags & PCRE_EXTRA_TABLES) != 0) - md->tables = extra_data->tables; + length = PRIV(strlen)(subject); + was_zero_terminated = 1; } -/* Set some local values */ +/* Plausibility checks */ -current_subject = (const pcre_uchar *)subject + start_offset; -end_subject = (const pcre_uchar *)subject + length; -req_char_ptr = current_subject - 1; +if ((options & ~PUBLIC_DFA_MATCH_OPTIONS) != 0) return PCRE2_ERROR_BADOPTION; +if (re == NULL || subject == NULL || workspace == NULL || match_data == NULL) + return PCRE2_ERROR_NULL; +if (wscount < 20) return PCRE2_ERROR_DFA_WSSIZE; +if (start_offset > length) return PCRE2_ERROR_BADOFFSET; -#ifdef SUPPORT_UTF -/* PCRE_UTF(16|32) have the same value as PCRE_UTF8. */ -utf = (re->options & PCRE_UTF8) != 0; -#else -utf = FALSE; -#endif +/* Partial matching and PCRE2_ENDANCHORED are currently not allowed at the same +time. */ -anchored = (options & (PCRE_ANCHORED|PCRE_DFA_RESTART)) != 0 || - (re->options & PCRE_ANCHORED) != 0; +if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0 && + ((re->overall_options | options) & PCRE2_ENDANCHORED) != 0) + return PCRE2_ERROR_BADOPTION; -/* The remaining fixed data for passing around. */ +/* Invalid UTF support is not available for DFA matching. 
*/ -md->start_code = (const pcre_uchar *)argument_re + - re->name_table_offset + re->name_count * re->name_entry_size; -md->start_subject = (const pcre_uchar *)subject; -md->end_subject = end_subject; -md->start_offset = start_offset; -md->moptions = options; -md->poptions = re->options; +if ((re->overall_options & PCRE2_MATCH_INVALID_UTF) != 0) + return PCRE2_ERROR_DFA_UINVALID_UTF; -/* If the BSR option is not set at match time, copy what was set -at compile time. */ +/* Check that the first field in the block is the magic number. If it is not, +return with PCRE2_ERROR_BADMAGIC. */ -if ((md->moptions & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) == 0) - { - if ((re->options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE)) != 0) - md->moptions |= re->options & (PCRE_BSR_ANYCRLF|PCRE_BSR_UNICODE); -#ifdef BSR_ANYCRLF - else md->moptions |= PCRE_BSR_ANYCRLF; -#endif - } +if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC; -/* Handle different types of newline. The three bits give eight cases. If -nothing is set at run time, whatever was used at compile time applies. */ +/* Check the code unit width. */ -switch ((((options & PCRE_NEWLINE_BITS) == 0)? re->options : (pcre_uint32)options) & - PCRE_NEWLINE_BITS) - { - case 0: newline = NEWLINE; break; /* Compile-time default */ - case PCRE_NEWLINE_CR: newline = CHAR_CR; break; - case PCRE_NEWLINE_LF: newline = CHAR_NL; break; - case PCRE_NEWLINE_CR+ - PCRE_NEWLINE_LF: newline = (CHAR_CR << 8) | CHAR_NL; break; - case PCRE_NEWLINE_ANY: newline = -1; break; - case PCRE_NEWLINE_ANYCRLF: newline = -2; break; - default: return PCRE_ERROR_BADNEWLINE; - } +if ((re->flags & PCRE2_MODE_MASK) != PCRE2_CODE_UNIT_WIDTH/8) + return PCRE2_ERROR_BADMODE; + +/* PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART are match-time flags in the +options variable for this function. 
Users of PCRE2 who are not calling the +function directly would like to have a way of setting these flags, in the same +way that they can set pcre2_compile() flags like PCRE2_NO_AUTOPOSSESS with +constructions like (*NO_AUTOPOSSESS). To enable this, (*NOTEMPTY) and +(*NOTEMPTY_ATSTART) set bits in the pattern's "flag" function which can now be +transferred to the options for this function. The bits are guaranteed to be +adjacent, but do not have the same values. This bit of Boolean trickery assumes +that the match-time bits are not more significant than the flag bits. If by +accident this is not the case, a compile-time division by zero error will +occur. */ -if (newline == -2) +#define FF (PCRE2_NOTEMPTY_SET|PCRE2_NE_ATST_SET) +#define OO (PCRE2_NOTEMPTY|PCRE2_NOTEMPTY_ATSTART) +options |= (re->flags & FF) / ((FF & (~FF+1)) / (OO & (~OO+1))); +#undef FF +#undef OO + +/* If restarting after a partial match, do some sanity checks on the contents +of the workspace. */ + +if ((options & PCRE2_DFA_RESTART) != 0) { - md->nltype = NLTYPE_ANYCRLF; + if ((workspace[0] & (-2)) != 0 || workspace[1] < 1 || + workspace[1] > (int)((wscount - 2)/INTS_PER_STATEBLOCK)) + return PCRE2_ERROR_DFA_BADRESTART; } -else if (newline < 0) + +/* Set some local values */ + +utf = (re->overall_options & PCRE2_UTF) != 0; +start_match = subject + start_offset; +end_subject = subject + length; +req_cu_ptr = start_match - 1; +anchored = (options & (PCRE2_ANCHORED|PCRE2_DFA_RESTART)) != 0 || + (re->overall_options & PCRE2_ANCHORED) != 0; + +/* The "must be at the start of a line" flags are used in a loop when finding +where to start. */ + +startline = (re->flags & PCRE2_STARTLINE) != 0; +firstline = (re->overall_options & PCRE2_FIRSTLINE) != 0; +bumpalong_limit = end_subject; + +/* Initialize and set up the fixed fields in the callout block, with a pointer +in the match block. 
*/ + +mb->cb = &cb; +cb.version = 2; +cb.subject = subject; +cb.subject_length = (PCRE2_SIZE)(end_subject - subject); +cb.callout_flags = 0; +cb.capture_top = 1; /* No capture support */ +cb.capture_last = 0; +cb.mark = NULL; /* No (*MARK) support */ + +/* Get data from the match context, if present, and fill in the remaining +fields in the match block. It is an error to set an offset limit without +setting the flag at compile time. */ + +if (mcontext == NULL) { - md->nltype = NLTYPE_ANY; + mb->callout = NULL; + mb->memctl = re->memctl; + mb->match_limit = PRIV(default_match_context).match_limit; + mb->match_limit_depth = PRIV(default_match_context).depth_limit; + mb->heap_limit = PRIV(default_match_context).heap_limit; } else { - md->nltype = NLTYPE_FIXED; - if (newline > 255) - { - md->nllen = 2; - md->nl[0] = (newline >> 8) & 255; - md->nl[1] = newline & 255; - } - else + if (mcontext->offset_limit != PCRE2_UNSET) { - md->nllen = 1; - md->nl[0] = newline; + if ((re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0) + return PCRE2_ERROR_BADOFFSETLIMIT; + bumpalong_limit = subject + mcontext->offset_limit; } + mb->callout = mcontext->callout; + mb->callout_data = mcontext->callout_data; + mb->memctl = mcontext->memctl; + mb->match_limit = mcontext->match_limit; + mb->match_limit_depth = mcontext->depth_limit; + mb->heap_limit = mcontext->heap_limit; } -/* Check a UTF-8 string if required. Unfortunately there's no way of passing -back the character offset. 
*/ +if (mb->match_limit > re->limit_match) + mb->match_limit = re->limit_match; + +if (mb->match_limit_depth > re->limit_depth) + mb->match_limit_depth = re->limit_depth; + +if (mb->heap_limit > re->limit_heap) + mb->heap_limit = re->limit_heap; + +mb->start_code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) + + re->name_count * re->name_entry_size; +mb->tables = re->tables; +mb->start_subject = subject; +mb->end_subject = end_subject; +mb->start_offset = start_offset; +mb->allowemptypartial = (re->max_lookbehind > 0) || + (re->flags & PCRE2_MATCH_EMPTY) != 0; +mb->moptions = options; +mb->poptions = re->overall_options; +mb->match_call_count = 0; +mb->heap_used = 0; + +/* Process the \R and newline settings. */ + +mb->bsr_convention = re->bsr_convention; +mb->nltype = NLTYPE_FIXED; +switch(re->newline_convention) + { + case PCRE2_NEWLINE_CR: + mb->nllen = 1; + mb->nl[0] = CHAR_CR; + break; + + case PCRE2_NEWLINE_LF: + mb->nllen = 1; + mb->nl[0] = CHAR_NL; + break; + + case PCRE2_NEWLINE_NUL: + mb->nllen = 1; + mb->nl[0] = CHAR_NUL; + break; + + case PCRE2_NEWLINE_CRLF: + mb->nllen = 2; + mb->nl[0] = CHAR_CR; + mb->nl[1] = CHAR_NL; + break; + + case PCRE2_NEWLINE_ANY: + mb->nltype = NLTYPE_ANY; + break; + + case PCRE2_NEWLINE_ANYCRLF: + mb->nltype = NLTYPE_ANYCRLF; + break; + + default: return PCRE2_ERROR_INTERNAL; + } -#ifdef SUPPORT_UTF -if (utf && (options & PCRE_NO_UTF8_CHECK) == 0) +/* Check a UTF string for validity if required. For 8-bit and 16-bit strings, +we must also check that a starting offset does not point into the middle of a +multiunit character. We check only the portion of the subject that is going to +be inspected during matching - from the offset minus the maximum back reference +to the given length. This saves time when a small part of a large subject is +being matched by the use of a starting offset. Note that the maximum lookbehind +is a number of characters, not code units. 
*/ + +#ifdef SUPPORT_UNICODE +if (utf && (options & PCRE2_NO_UTF_CHECK) == 0) { - int erroroffset; - int errorcode = PRIV(valid_utf)((pcre_uchar *)subject, length, &erroroffset); - if (errorcode != 0) + PCRE2_SPTR check_subject = start_match; /* start_match includes offset */ + + if (start_offset > 0) { - if (offsetcount >= 2) +#if PCRE2_CODE_UNIT_WIDTH != 32 + unsigned int i; + if (start_match < end_subject && NOT_FIRSTCU(*start_match)) + return PCRE2_ERROR_BADUTFOFFSET; + for (i = re->max_lookbehind; i > 0 && check_subject > subject; i--) { - offsets[0] = erroroffset; - offsets[1] = errorcode; + check_subject--; + while (check_subject > subject && +#if PCRE2_CODE_UNIT_WIDTH == 8 + (*check_subject & 0xc0) == 0x80) +#else /* 16-bit */ + (*check_subject & 0xfc00) == 0xdc00) +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ + check_subject--; } -#if defined COMPILE_PCRE8 - return (errorcode <= PCRE_UTF8_ERR5 && (options & PCRE_PARTIAL_HARD) != 0) ? - PCRE_ERROR_SHORTUTF8 : PCRE_ERROR_BADUTF8; -#elif defined COMPILE_PCRE16 - return (errorcode <= PCRE_UTF16_ERR1 && (options & PCRE_PARTIAL_HARD) != 0) ? - PCRE_ERROR_SHORTUTF16 : PCRE_ERROR_BADUTF16; -#elif defined COMPILE_PCRE32 - return PCRE_ERROR_BADUTF32; -#endif +#else /* In the 32-bit library, one code unit equals one character. */ + check_subject -= re->max_lookbehind; + if (check_subject < subject) check_subject = subject; +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ } -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16 - if (start_offset > 0 && start_offset < length && - NOT_FIRSTCHAR(((PCRE_PUCHAR)subject)[start_offset])) - return PCRE_ERROR_BADUTF8_OFFSET; -#endif - } -#endif -/* If the exec call supplied NULL for tables, use the inbuilt ones. This -is a feature that makes it possible to save compiled regex and re-use them -in other programs later. */ + /* Validate the relevant portion of the subject. After an error, adjust the + offset to be an absolute offset in the whole string. 
*/ -if (md->tables == NULL) md->tables = PRIV(default_tables); - -/* The "must be at the start of a line" flags are used in a loop when finding -where to start. */ - -startline = (re->flags & PCRE_STARTLINE) != 0; -firstline = (re->options & PCRE_FIRSTLINE) != 0; + match_data->rc = PRIV(valid_utf)(check_subject, + length - (PCRE2_SIZE)(check_subject - subject), &(match_data->startchar)); + if (match_data->rc != 0) + { + match_data->startchar += (PCRE2_SIZE)(check_subject - subject); + return match_data->rc; + } + } +#endif /* SUPPORT_UNICODE */ -/* Set up the first character to match, if available. The first_byte value is -never set for an anchored regular expression, but the anchoring may be forced -at run time, so we have to test for anchoring. The first char may be unset for -an unanchored pattern, of course. If there's no first char and the pattern was -studied, there may be a bitmap of possible first characters. */ +/* Set up the first code unit to match, if available. If there's no first code +unit there may be a bitmap of possible first characters. 
*/ -if (!anchored) +if ((re->flags & PCRE2_FIRSTSET) != 0) { - if ((re->flags & PCRE_FIRSTSET) != 0) + has_first_cu = TRUE; + first_cu = first_cu2 = (PCRE2_UCHAR)(re->first_codeunit); + if ((re->flags & PCRE2_FIRSTCASELESS) != 0) { - has_first_char = TRUE; - first_char = first_char2 = (pcre_uchar)(re->first_char); - if ((re->flags & PCRE_FCH_CASELESS) != 0) - { - first_char2 = TABLE_GET(first_char, md->tables + fcc_offset, first_char); -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - if (utf && first_char > 127) - first_char2 = UCD_OTHERCASE(first_char); + first_cu2 = TABLE_GET(first_cu, mb->tables + fcc_offset, first_cu); +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (first_cu > 127 && !utf && (re->overall_options & PCRE2_UCP) != 0) + first_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(first_cu); +#else + if (first_cu > 127 && (utf || (re->overall_options & PCRE2_UCP) != 0)) + first_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(first_cu); #endif - } - } - else - { - if (!startline && study != NULL && - (study->flags & PCRE_STUDY_MAPPED) != 0) - start_bits = study->start_bits; +#endif /* SUPPORT_UNICODE */ } } +else + if (!startline && (re->flags & PCRE2_FIRSTMAPSET) != 0) + start_bits = re->start_bitmap; -/* For anchored or unanchored matches, there may be a "last known required -character" set. */ +/* There may be a "last known required code unit" set. 
*/ -if ((re->flags & PCRE_REQCHSET) != 0) +if ((re->flags & PCRE2_LASTSET) != 0) { - has_req_char = TRUE; - req_char = req_char2 = (pcre_uchar)(re->req_char); - if ((re->flags & PCRE_RCH_CASELESS) != 0) + has_req_cu = TRUE; + req_cu = req_cu2 = (PCRE2_UCHAR)(re->last_codeunit); + if ((re->flags & PCRE2_LASTCASELESS) != 0) { - req_char2 = TABLE_GET(req_char, md->tables + fcc_offset, req_char); -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - if (utf && req_char > 127) - req_char2 = UCD_OTHERCASE(req_char); + req_cu2 = TABLE_GET(req_cu, mb->tables + fcc_offset, req_cu); +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (req_cu > 127 && !utf && (re->overall_options & PCRE2_UCP) != 0) + req_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(req_cu); +#else + if (req_cu > 127 && (utf || (re->overall_options & PCRE2_UCP) != 0)) + req_cu2 = (PCRE2_UCHAR)UCD_OTHERCASE(req_cu); #endif +#endif /* SUPPORT_UNICODE */ } } +/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT, +free the memory that was obtained. */ + +if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0) + { + match_data->memctl.free((void *)match_data->subject, + match_data->memctl.memory_data); + match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT; + } + +/* Fill in fields that are always returned in the match data. */ + +match_data->code = re; +match_data->subject = NULL; /* Default for no match */ +match_data->mark = NULL; +match_data->matchedby = PCRE2_MATCHEDBY_DFA_INTERPRETER; + /* Call the main matching function, looping for a non-anchored regex after a failed match. If not restarting, perform certain optimizations at the start of a match. */ for (;;) { - int rc; + /* ----------------- Start of match optimizations ---------------- */ - if ((options & PCRE_DFA_RESTART) == 0) - { - const pcre_uchar *save_end_subject = end_subject; + /* There are some optimizations that avoid running the match if a known + starting point is not found, or if a known later code unit is not present. 
+ However, there is an option (settable at compile time) that disables + these, for testing and for ensuring that all callouts do actually occur. + The optimizations must also be avoided when restarting a DFA match. */ + if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0 && + (options & PCRE2_DFA_RESTART) == 0) + { /* If firstline is TRUE, the start of the match is constrained to the first - line of a multiline string. Implement this by temporarily adjusting - end_subject so that we stop scanning at a newline. If the match fails at - the newline, later code breaks this loop. */ + line of a multiline string. That is, the match must be before or at the + first newline following the start of matching. Temporarily adjust + end_subject so that we stop the optimization scans for a first code unit + immediately after the first character of a newline (the first code unit can + legitimately be a newline). If the match fails at the newline, later code + breaks this loop. */ if (firstline) { - PCRE_PUCHAR t = current_subject; -#ifdef SUPPORT_UTF + PCRE2_SPTR t = start_match; +#ifdef SUPPORT_UNICODE if (utf) { - while (t < md->end_subject && !IS_NEWLINE(t)) + while (t < end_subject && !IS_NEWLINE(t)) { t++; - ACROSSCHAR(t < end_subject, *t, t++); + ACROSSCHAR(t < end_subject, t, t++); } } else #endif - while (t < md->end_subject && !IS_NEWLINE(t)) t++; + while (t < end_subject && !IS_NEWLINE(t)) t++; end_subject = t; } - /* There are some optimizations that avoid running the match if a known - starting point is not found. However, there is an option that disables - these, for testing and for ensuring that all callouts do actually occur. - The option can be set in the regex by (*NO_START_OPT) or passed in - match-time options. */ + /* Anchored: check the first code unit if one is recorded. This may seem + pointless but it can help in detecting a no match case without scanning for + the required code unit. 
*/ - if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0) + if (anchored) { - /* Advance to a known first pcre_uchar (i.e. data item) */ + if (has_first_cu || start_bits != NULL) + { + BOOL ok = start_match < end_subject; + if (ok) + { + PCRE2_UCHAR c = UCHAR21TEST(start_match); + ok = has_first_cu && (c == first_cu || c == first_cu2); + if (!ok && start_bits != NULL) + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (c > 255) c = 255; +#endif + ok = (start_bits[c/8] & (1u << (c&7))) != 0; + } + } + if (!ok) break; + } + } - if (has_first_char) + /* Not anchored. Advance to a unique first code unit if there is one. In + 8-bit mode, the use of memchr() gives a big speed up, even though we have + to call it twice in caseless mode, in order to find the earliest occurrence + of the character in either of its cases. If a call to memchr() that + searches the rest of the subject fails to find one case, remember that in + order not to keep on repeating the search. This can make a huge difference + when the strings are very long and only one case is present. 
*/ + + else + { + if (has_first_cu) { - if (first_char != first_char2) + if (first_cu != first_cu2) /* Caseless */ { - pcre_uchar csc; - while (current_subject < end_subject && - (csc = UCHAR21TEST(current_subject)) != first_char && csc != first_char2) - current_subject++; +#if PCRE2_CODE_UNIT_WIDTH != 8 + PCRE2_UCHAR smc; + while (start_match < end_subject && + (smc = UCHAR21TEST(start_match)) != first_cu && + smc != first_cu2) + start_match++; + +#else /* 8-bit code units */ + PCRE2_SPTR pp1 = NULL; + PCRE2_SPTR pp2 = NULL; + PCRE2_SIZE cu2size = end_subject - start_match; + + if (!memchr_not_found_first_cu) + { + pp1 = memchr(start_match, first_cu, end_subject - start_match); + if (pp1 == NULL) memchr_not_found_first_cu = TRUE; + else cu2size = pp1 - start_match; + } + + /* If pp1 is not NULL, we have arranged to search only as far as pp1, + to see if the other case is earlier, so we can set "not found" only + when both searches have returned NULL. */ + + if (!memchr_not_found_first_cu2) + { + pp2 = memchr(start_match, first_cu2, cu2size); + memchr_not_found_first_cu2 = (pp2 == NULL && pp1 == NULL); + } + + if (pp1 == NULL) + start_match = (pp2 == NULL)? end_subject : pp2; + else + start_match = (pp2 == NULL || pp1 < pp2)? pp1 : pp2; +#endif } + + /* The caseful case */ + else - while (current_subject < end_subject && - UCHAR21TEST(current_subject) != first_char) - current_subject++; + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + while (start_match < end_subject && UCHAR21TEST(start_match) != + first_cu) + start_match++; +#else /* 8-bit code units */ + start_match = memchr(start_match, first_cu, end_subject - start_match); + if (start_match == NULL) start_match = end_subject; +#endif + } + + /* If we can't find the required code unit, having reached the true end + of the subject, break the bumpalong loop, to force a match failure, + except when doing partial matching, when we let the next cycle run at + the end of the subject. 
To see why, consider the pattern /(?<=abc)def/, + which partially matches "abc", even though the string does not contain + the starting character "d". If we have not reached the true end of the + subject (PCRE2_FIRSTLINE caused end_subject to be temporarily modified) + we also let the cycle run, because the matching string is legitimately + allowed to start with the first code unit of a newline. */ + + if ((mb->moptions & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) == 0 && + start_match >= mb->end_subject) + break; } - /* Or to just after a linebreak for a multiline match if possible */ + /* If there's no first code unit, advance to just after a linebreak for a + multiline match if required. */ else if (startline) { - if (current_subject > md->start_subject + start_offset) + if (start_match > mb->start_subject + start_offset) { -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (utf) { - while (current_subject < end_subject && - !WAS_NEWLINE(current_subject)) + while (start_match < end_subject && !WAS_NEWLINE(start_match)) { - current_subject++; - ACROSSCHAR(current_subject < end_subject, *current_subject, - current_subject++); + start_match++; + ACROSSCHAR(start_match < end_subject, start_match, start_match++); } } else #endif - while (current_subject < end_subject && !WAS_NEWLINE(current_subject)) - current_subject++; + while (start_match < end_subject && !WAS_NEWLINE(start_match)) + start_match++; /* If we have just passed a CR and the newline option is ANY or ANYCRLF, and we are now at a LF, advance the match position by one - more character. */ + more code unit. 
*/ - if (UCHAR21TEST(current_subject - 1) == CHAR_CR && - (md->nltype == NLTYPE_ANY || md->nltype == NLTYPE_ANYCRLF) && - current_subject < end_subject && - UCHAR21TEST(current_subject) == CHAR_NL) - current_subject++; + if (start_match[-1] == CHAR_CR && + (mb->nltype == NLTYPE_ANY || mb->nltype == NLTYPE_ANYCRLF) && + start_match < end_subject && + UCHAR21TEST(start_match) == CHAR_NL) + start_match++; } } - /* Advance to a non-unique first pcre_uchar after study */ + /* If there's no first code unit or a requirement for a multiline line + start, advance to a non-unique first code unit if any have been + identified. The bitmap contains only 256 bits. When code units are 16 or + 32 bits wide, all code units greater than 254 set the 255 bit. */ else if (start_bits != NULL) { - while (current_subject < end_subject) + while (start_match < end_subject) { - register pcre_uint32 c = UCHAR21TEST(current_subject); -#ifndef COMPILE_PCRE8 + uint32_t c = UCHAR21TEST(start_match); +#if PCRE2_CODE_UNIT_WIDTH != 8 if (c > 255) c = 255; #endif - if ((start_bits[c/8] & (1 << (c&7))) != 0) break; - current_subject++; + if ((start_bits[c/8] & (1u << (c&7))) != 0) break; + start_match++; } + + /* See comment above in first_cu checking about the next line. */ + + if ((mb->moptions & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) == 0 && + start_match >= mb->end_subject) + break; } - } + } /* End of first code unit handling */ /* Restore fudged end_subject */ - end_subject = save_end_subject; + end_subject = mb->end_subject; - /* The following two optimizations are disabled for partial matching or if - disabling is explicitly requested (and of course, by the test above, this - code is not obeyed when restarting after a partial match). */ + /* The following two optimizations are disabled for partial matching. 
*/ - if (((options | re->options) & PCRE_NO_START_OPTIMIZE) == 0 && - (options & (PCRE_PARTIAL_HARD|PCRE_PARTIAL_SOFT)) == 0) + if ((mb->moptions & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) == 0) { - /* If the pattern was studied, a minimum subject length may be set. This - is a lower bound; no actual string of that length may actually match the - pattern. Although the value is, strictly, in characters, we treat it as - in pcre_uchar units to avoid spending too much time in this optimization. - */ - - if (study != NULL && (study->flags & PCRE_STUDY_MINLEN) != 0 && - (pcre_uint32)(end_subject - current_subject) < study->minlength) - return PCRE_ERROR_NOMATCH; - - /* If req_char is set, we know that that pcre_uchar must appear in the - subject for the match to succeed. If the first pcre_uchar is set, - req_char must be later in the subject; otherwise the test starts at the - match point. This optimization can save a huge amount of work in patterns - with nested unlimited repeats that aren't going to match. Writing - separate code for cased/caseless versions makes it go faster, as does - using an autoincrement and backing off on a match. + PCRE2_SPTR p; + + /* The minimum matching length is a lower bound; no actual string of that + length may actually match the pattern. Although the value is, strictly, + in characters, we treat it as code units to avoid spending too much time + in this optimization. */ + + if (end_subject - start_match < re->minlength) goto NOMATCH_EXIT; + + /* If req_cu is set, we know that that code unit must appear in the + subject for the match to succeed. If the first code unit is set, req_cu + must be later in the subject; otherwise the test starts at the match + point. This optimization can save a huge amount of backtracking in + patterns with nested unlimited repeats that aren't going to match. + Writing separate code for cased/caseless versions makes it go faster, as + does using an autoincrement and backing off on a match. 
As in the case of + the first code unit, using memchr() in the 8-bit library gives a big + speed up. Unlike the first_cu check above, we do not need to call + memchr() twice in the caseless case because we only need to check for the + presence of the character in either case, not find the first occurrence. + + The search can be skipped if the code unit was found later than the + current starting point in a previous iteration of the bumpalong loop. HOWEVER: when the subject string is very, very long, searching to its end can take a long time, and give bad performance on quite ordinary - patterns. This showed up when somebody was matching /^C/ on a 32-megabyte - string... so we don't do this when the string is sufficiently long. */ + patterns. This showed up when somebody was matching something like + /^\d+C/ on a 32-megabyte string... so we don't do this when the string is + sufficiently long, but it's worth searching a lot more for unanchored + patterns. */ - if (has_req_char && end_subject - current_subject < REQ_BYTE_MAX) + p = start_match + (has_first_cu? 1:0); + if (has_req_cu && p > req_cu_ptr) { - register PCRE_PUCHAR p = current_subject + (has_first_char? 1:0); + PCRE2_SIZE check_length = end_subject - start_match; - /* We don't need to repeat the search if we haven't yet reached the - place we found it at last time. 
*/ - - if (p > req_char_ptr) + if (check_length < REQ_CU_MAX || + (!anchored && check_length < REQ_CU_MAX * 1000)) { - if (req_char != req_char2) + if (req_cu != req_cu2) /* Caseless */ { +#if PCRE2_CODE_UNIT_WIDTH != 8 while (p < end_subject) { - register pcre_uint32 pp = UCHAR21INCTEST(p); - if (pp == req_char || pp == req_char2) { p--; break; } + uint32_t pp = UCHAR21INCTEST(p); + if (pp == req_cu || pp == req_cu2) { p--; break; } + } +#else /* 8-bit code units */ + PCRE2_SPTR pp = p; + p = memchr(pp, req_cu, end_subject - pp); + if (p == NULL) + { + p = memchr(pp, req_cu2, end_subject - pp); + if (p == NULL) p = end_subject; } +#endif /* PCRE2_CODE_UNIT_WIDTH != 8 */ } + + /* The caseful case */ + else { +#if PCRE2_CODE_UNIT_WIDTH != 8 while (p < end_subject) { - if (UCHAR21INCTEST(p) == req_char) { p--; break; } + if (UCHAR21INCTEST(p) == req_cu) { p--; break; } } + +#else /* 8-bit code units */ + p = memchr(p, req_cu, end_subject - p); + if (p == NULL) p = end_subject; +#endif } - /* If we can't find the required pcre_uchar, break the matching loop, - which will cause a return or PCRE_ERROR_NOMATCH. */ + /* If we can't find the required code unit, break the matching loop, + forcing a match failure. */ if (p >= end_subject) break; - /* If we have found the required pcre_uchar, save the point where we + /* If we have found the required code unit, save the point where we found it, so that we don't search again next time round the loop if - the start hasn't passed this point yet. */ + the start hasn't passed this code unit yet. */ - req_char_ptr = p; + req_cu_ptr = p; } } } - } /* End of optimizations that are done when not restarting */ + } - /* OK, now we can do the business */ + /* ------------ End of start of match optimizations ------------ */ + + /* Give no match if we have passed the bumpalong limit. 
*/ - md->start_used_ptr = current_subject; - md->recursive = NULL; + if (start_match > bumpalong_limit) break; - rc = internal_dfa_exec( - md, /* fixed match data */ - md->start_code, /* this subexpression's code */ - current_subject, /* where we currently are */ - start_offset, /* start offset in subject */ - offsets, /* offset vector */ - offsetcount, /* size of same */ - workspace, /* workspace vector */ - wscount, /* size of same */ - 0); /* function recurse level */ + /* OK, now we can do the business */ + + mb->start_used_ptr = start_match; + mb->last_used_ptr = start_match; + mb->recursive = NULL; + + rc = internal_dfa_match( + mb, /* fixed match data */ + mb->start_code, /* this subexpression's code */ + start_match, /* where we currently are */ + start_offset, /* start offset in subject */ + match_data->ovector, /* offset vector */ + (uint32_t)match_data->oveccount * 2, /* actual size of same */ + workspace, /* workspace vector */ + (int)wscount, /* size of same */ + 0, /* function recurse level */ + base_recursion_workspace); /* initial workspace for recursion */ /* Anything other than "no match" means we are done, always; otherwise, carry on only if not anchored. 
*/ - if (rc != PCRE_ERROR_NOMATCH || anchored) + if (rc != PCRE2_ERROR_NOMATCH || anchored) { - if (rc == PCRE_ERROR_PARTIAL && offsetcount >= 2) + if (rc == PCRE2_ERROR_PARTIAL && match_data->oveccount > 0) + { + match_data->ovector[0] = (PCRE2_SIZE)(start_match - subject); + match_data->ovector[1] = (PCRE2_SIZE)(end_subject - subject); + } + match_data->leftchar = (PCRE2_SIZE)(mb->start_used_ptr - subject); + match_data->rightchar = (PCRE2_SIZE)( mb->last_used_ptr - subject); + match_data->startchar = (PCRE2_SIZE)(start_match - subject); + match_data->rc = rc; + + if (rc >= 0 &&(options & PCRE2_COPY_MATCHED_SUBJECT) != 0) + { + length = CU2BYTES(length + was_zero_terminated); + match_data->subject = match_data->memctl.malloc(length, + match_data->memctl.memory_data); + if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY; + memcpy((void *)match_data->subject, subject, length); + match_data->flags |= PCRE2_MD_COPIED_SUBJECT; + } + else { - offsets[0] = (int)(md->start_used_ptr - (PCRE_PUCHAR)subject); - offsets[1] = (int)(end_subject - (PCRE_PUCHAR)subject); - if (offsetcount > 2) - offsets[2] = (int)(current_subject - (PCRE_PUCHAR)subject); + if (rc >= 0 || rc == PCRE2_ERROR_PARTIAL) match_data->subject = subject; } - return rc; + goto EXIT; } /* Advance to the next subject character unless we are at the end of a line and firstline is set. 
*/ - if (firstline && IS_NEWLINE(current_subject)) break; - current_subject++; -#ifdef SUPPORT_UTF + if (firstline && IS_NEWLINE(start_match)) break; + start_match++; +#ifdef SUPPORT_UNICODE if (utf) { - ACROSSCHAR(current_subject < end_subject, *current_subject, - current_subject++); + ACROSSCHAR(start_match < end_subject, start_match, start_match++); } #endif - if (current_subject > end_subject) break; + if (start_match > end_subject) break; /* If we have just passed a CR and we are now at a LF, and the pattern does not contain any explicit matches for \r or \n, and the newline option is CRLF or ANY or ANYCRLF, advance the match position by one more character. */ - if (UCHAR21TEST(current_subject - 1) == CHAR_CR && - current_subject < end_subject && - UCHAR21TEST(current_subject) == CHAR_NL && - (re->flags & PCRE_HASCRORLF) == 0 && - (md->nltype == NLTYPE_ANY || - md->nltype == NLTYPE_ANYCRLF || - md->nllen == 2)) - current_subject++; + if (UCHAR21TEST(start_match - 1) == CHAR_CR && + start_match < end_subject && + UCHAR21TEST(start_match) == CHAR_NL && + (re->flags & PCRE2_HASCRORLF) == 0 && + (mb->nltype == NLTYPE_ANY || + mb->nltype == NLTYPE_ANYCRLF || + mb->nllen == 2)) + start_match++; } /* "Bumpalong" loop */ -return PCRE_ERROR_NOMATCH; +NOMATCH_EXIT: +rc = PCRE2_ERROR_NOMATCH; + +EXIT: +while (rws->next != NULL) + { + RWS_anchor *next = rws->next; + rws->next = next->next; + mb->memctl.free(next, mb->memctl.memory_data); + } + +return rc; } -/* End of pcre_dfa_exec.c */ +/* End of pcre2_dfa_match.c */ diff --git a/src/pcre2/src/pcre2_dftables.c b/src/pcre2/src/pcre2_dftables.c new file mode 100644 index 00000000..71b90ce8 --- /dev/null +++ b/src/pcre2/src/pcre2_dftables.c @@ -0,0 +1,303 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to 
those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +/* This is a freestanding support program to generate a file containing +character tables for PCRE2. 
The tables are built using the pcre2_maketables() +function, which is part of the PCRE2 API. By default, the system's "C" locale +is used rather than what the building user happens to have set, but the -L +option can be used to select the current locale from the LC_ALL environment +variable. By default, the tables are written in source form, but if -b is +given, they are written in binary. */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <ctype.h> +#include <stdio.h> +#include <string.h> +#include <locale.h> + +#define PCRE2_CODE_UNIT_WIDTH 0 /* Must be set, but not relevant here */ +#include "pcre2_internal.h" + +#define PCRE2_DFTABLES /* pcre2_maketables.c notices this */ +#include "pcre2_maketables.c" + + +static const char *classlist[] = + { + "space", "xdigit", "digit", "upper", "lower", + "word", "graph", "print", "punct", "cntrl" + }; + + + +/************************************************* +* Usage * +*************************************************/ + +static void +usage(void) +{ +(void)fprintf(stderr, + "Usage: pcre2_dftables [options] <output file>\n" + " -b Write output in binary (default is source code)\n" + " -L Use locale from LC_ALL (default is \"C\" locale)\n" + ); +} + + + +/************************************************* +* Entry point * +*************************************************/ + +int main(int argc, char **argv) +{ +FILE *f; +int i; +int nclass = 0; +BOOL binary = FALSE; +char *env = (char *)"C"; +const unsigned char *tables; +const unsigned char *base_of_tables; + +/* Process options */ + +for (i = 1; i < argc; i++) + { + char *arg = argv[i]; + if (*arg != '-') break; + + if (strcmp(arg, "-help") == 0 || strcmp(arg, "--help") == 0) + { + usage(); + return 0; + } + + else if (strcmp(arg, "-L") == 0) + { + if (setlocale(LC_ALL, "") == NULL) + { + (void)fprintf(stderr, "pcre2_dftables: setlocale() failed\n"); + return 1; + } + env = getenv("LC_ALL"); + } + + else if (strcmp(arg, "-b") == 0) + binary = TRUE; + + else + { + (void)fprintf(stderr, "pcre2_dftables: 
unrecognized option %s\n", arg); + return 1; + } + } + +if (i != argc - 1) + { + (void)fprintf(stderr, "pcre2_dftables: one filename argument is required\n"); + return 1; + } + +/* Make the tables */ + +tables = maketables(); +base_of_tables = tables; + +f = fopen(argv[i], "wb"); +if (f == NULL) + { + fprintf(stderr, "pcre2_dftables: failed to open %s for writing\n", argv[1]); + return 1; + } + +/* If -b was specified, we write the tables in binary. */ + +if (binary) + { + int yield = 0; + size_t len = fwrite(tables, 1, TABLES_LENGTH, f); + if (len != TABLES_LENGTH) + { + (void)fprintf(stderr, "pcre2_dftables: fwrite() returned wrong length %d " + "instead of %d\n", (int)len, TABLES_LENGTH); + yield = 1; + } + fclose(f); + free((void *)base_of_tables); + return yield; + } + +/* Write the tables as source code for inclusion in the PCRE2 library. There +are several fprintf() calls here, because gcc in pedantic mode complains about +the very long string otherwise. */ + +(void)fprintf(f, + "/*************************************************\n" + "* Perl-Compatible Regular Expressions *\n" + "*************************************************/\n\n" + "/* This file was automatically written by the pcre2_dftables auxiliary\n" + "program. It contains character tables that are used when no external\n" + "tables are passed to PCRE2 by the application that calls it. The tables\n" + "are used only for characters whose code values are less than 256. */\n\n"); + +(void)fprintf(f, + "/* This set of tables was written in the %s locale. */\n\n", env); + +(void)fprintf(f, + "/* The pcre2_ftables program (which is distributed with PCRE2) can be used\n" + "to build alternative versions of this file. This is necessary if you are\n" + "running in an EBCDIC environment, or if you want to default to a different\n" + "encoding, for example ISO-8859-1. When pcre2_dftables is run, it creates\n" + "these tables in the \"C\" locale by default. 
This happens automatically if\n" + "PCRE2 is configured with --enable-rebuild-chartables. However, you can run\n" + "pcre2_dftables manually with the -L option to build tables using the LC_ALL\n" + "locale. */\n\n"); + +/* Force config.h in z/OS */ + +#if defined NATIVE_ZOS +(void)fprintf(f, + "/* For z/OS, config.h is forced */\n" + "#ifndef HAVE_CONFIG_H\n" + "#define HAVE_CONFIG_H 1\n" + "#endif\n\n"); +#endif + +(void)fprintf(f, + "/* The following #include is present because without it gcc 4.x may remove\n" + "the array definition from the final binary if PCRE2 is built into a static\n" + "library and dead code stripping is activated. This leads to link errors.\n" + "Pulling in the header ensures that the array gets flagged as \"someone\n" + "outside this compilation unit might reference this\" and so it will always\n" + "be supplied to the linker. */\n\n"); + +(void)fprintf(f, + "#ifdef HAVE_CONFIG_H\n" + "#include \"config.h\"\n" + "#endif\n\n" + "#include \"pcre2_internal.h\"\n\n"); + +(void)fprintf(f, + "const uint8_t PRIV(default_tables)[] = {\n\n" + "/* This table is a lower casing table. */\n\n"); + +(void)fprintf(f, " "); +for (i = 0; i < 256; i++) + { + if ((i & 7) == 0 && i != 0) fprintf(f, "\n "); + fprintf(f, "%3d", *tables++); + if (i != 255) fprintf(f, ","); + } +(void)fprintf(f, ",\n\n"); + +(void)fprintf(f, "/* This table is a case flipping table. */\n\n"); + +(void)fprintf(f, " "); +for (i = 0; i < 256; i++) + { + if ((i & 7) == 0 && i != 0) fprintf(f, "\n "); + fprintf(f, "%3d", *tables++); + if (i != 255) fprintf(f, ","); + } +(void)fprintf(f, ",\n\n"); + +(void)fprintf(f, + "/* This table contains bit maps for various character classes. Each map is 32\n" + "bytes long and the bits run from the least significant end of each byte. The\n" + "classes that have their own maps are: space, xdigit, digit, upper, lower, word,\n" + "graph, print, punct, and cntrl. Other classes are built from combinations. 
*/\n\n"); + +(void)fprintf(f, " "); +for (i = 0; i < cbit_length; i++) + { + if ((i & 7) == 0 && i != 0) + { + if ((i & 31) == 0) (void)fprintf(f, "\n"); + if ((i & 24) == 8) (void)fprintf(f, " /* %s */", classlist[nclass++]); + (void)fprintf(f, "\n "); + } + (void)fprintf(f, "0x%02x", *tables++); + if (i != cbit_length - 1) (void)fprintf(f, ","); + } +(void)fprintf(f, ",\n\n"); + +(void)fprintf(f, + "/* This table identifies various classes of character by individual bits:\n" + " 0x%02x white space character\n" + " 0x%02x letter\n" + " 0x%02x lower case letter\n" + " 0x%02x decimal digit\n" + " 0x%02x alphanumeric or '_'\n*/\n\n", + ctype_space, ctype_letter, ctype_lcletter, ctype_digit, ctype_word); + +(void)fprintf(f, " "); +for (i = 0; i < 256; i++) + { + if ((i & 7) == 0 && i != 0) + { + (void)fprintf(f, " /* "); + if (isprint(i-8)) (void)fprintf(f, " %c -", i-8); + else (void)fprintf(f, "%3d-", i-8); + if (isprint(i-1)) (void)fprintf(f, " %c ", i-1); + else (void)fprintf(f, "%3d", i-1); + (void)fprintf(f, " */\n "); + } + (void)fprintf(f, "0x%02x", *tables++); + if (i != 255) (void)fprintf(f, ","); + } + +(void)fprintf(f, "};/* "); +if (isprint(i-8)) (void)fprintf(f, " %c -", i-8); + else (void)fprintf(f, "%3d-", i-8); +if (isprint(i-1)) (void)fprintf(f, " %c ", i-1); + else (void)fprintf(f, "%3d", i-1); +(void)fprintf(f, " */\n\n/* End of pcre2_chartables.c */\n"); + +fclose(f); +free((void *)base_of_tables); +return 0; +} + +/* End of pcre2_dftables.c */ diff --git a/src/pcre2/src/pcre2_error.c b/src/pcre2/src/pcre2_error.c new file mode 100644 index 00000000..c61648cb --- /dev/null +++ b/src/pcre2/src/pcre2_error.c @@ -0,0 +1,340 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + +#define STRING(a) # a +#define XSTRING(s) STRING(s) + +/* The texts of compile-time error messages. 
Compile-time error numbers start +at COMPILE_ERROR_BASE (100). + +This used to be a table of strings, but in order to reduce the number of +relocations needed when a shared library is loaded dynamically, it is now one +long string. We cannot use a table of offsets, because the lengths of inserts +such as XSTRING(MAX_NAME_SIZE) are not known. Instead, +pcre2_get_error_message() counts through to the one it wants - this isn't a +performance issue because these strings are used only when there is an error. + +Each substring ends with \0 to insert a null character. This includes the final +substring, so that the whole string ends with \0\0, which can be detected when +counting through. */ + +static const unsigned char compile_error_texts[] = + "no error\0" + "\\ at end of pattern\0" + "\\c at end of pattern\0" + "unrecognized character follows \\\0" + "numbers out of order in {} quantifier\0" + /* 5 */ + "number too big in {} quantifier\0" + "missing terminating ] for character class\0" + "escape sequence is invalid in character class\0" + "range out of order in character class\0" + "quantifier does not follow a repeatable item\0" + /* 10 */ + "internal error: unexpected repeat\0" + "unrecognized character after (? 
or (?-\0" + "POSIX named classes are supported only within a class\0" + "POSIX collating elements are not supported\0" + "missing closing parenthesis\0" + /* 15 */ + "reference to non-existent subpattern\0" + "pattern passed as NULL\0" + "unrecognised compile-time option bit(s)\0" + "missing ) after (?# comment\0" + "parentheses are too deeply nested\0" + /* 20 */ + "regular expression is too large\0" + "failed to allocate heap memory\0" + "unmatched closing parenthesis\0" + "internal error: code overflow\0" + "missing closing parenthesis for condition\0" + /* 25 */ + "lookbehind assertion is not fixed length\0" + "a relative value of zero is not allowed\0" + "conditional subpattern contains more than two branches\0" + "assertion expected after (?( or (?(?C)\0" + "digit expected after (?+ or (?-\0" + /* 30 */ + "unknown POSIX class name\0" + "internal error in pcre2_study(): should not occur\0" + "this version of PCRE2 does not have Unicode support\0" + "parentheses are too deeply nested (stack check)\0" + "character code point value in \\x{} or \\o{} is too large\0" + /* 35 */ + "lookbehind is too complicated\0" + "\\C is not allowed in a lookbehind assertion in UTF-" XSTRING(PCRE2_CODE_UNIT_WIDTH) " mode\0" + "PCRE2 does not support \\F, \\L, \\l, \\N{name}, \\U, or \\u\0" + "number after (?C is greater than 255\0" + "closing parenthesis for (?C expected\0" + /* 40 */ + "invalid escape sequence in (*VERB) name\0" + "unrecognized character after (?P\0" + "syntax error in subpattern name (missing terminator?)\0" + "two named subpatterns have the same name (PCRE2_DUPNAMES not set)\0" + "subpattern name must start with a non-digit\0" + /* 45 */ + "this version of PCRE2 does not have support for \\P, \\p, or \\X\0" + "malformed \\P or \\p sequence\0" + "unknown property name after \\P or \\p\0" + "subpattern name is too long (maximum " XSTRING(MAX_NAME_SIZE) " code units)\0" + "too many named subpatterns (maximum " XSTRING(MAX_NAME_COUNT) ")\0" + /* 50 */ + "invalid 
range in character class\0" + "octal value is greater than \\377 in 8-bit non-UTF-8 mode\0" + "internal error: overran compiling workspace\0" + "internal error: previously-checked referenced subpattern not found\0" + "DEFINE subpattern contains more than one branch\0" + /* 55 */ + "missing opening brace after \\o\0" + "internal error: unknown newline setting\0" + "\\g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number\0" + "(?R (recursive pattern call) must be followed by a closing parenthesis\0" + /* "an argument is not allowed for (*ACCEPT), (*FAIL), or (*COMMIT)\0" */ + "obsolete error (should not occur)\0" /* Was the above */ + /* 60 */ + "(*VERB) not recognized or malformed\0" + "subpattern number is too big\0" + "subpattern name expected\0" + "internal error: parsed pattern overflow\0" + "non-octal character in \\o{} (closing brace missing?)\0" + /* 65 */ + "different names for subpatterns of the same number are not allowed\0" + "(*MARK) must have an argument\0" + "non-hex character in \\x{} (closing brace missing?)\0" +#ifndef EBCDIC + "\\c must be followed by a printable ASCII character\0" +#else + "\\c must be followed by a letter or one of [\\]^_?\0" +#endif + "\\k is not followed by a braced, angle-bracketed, or quoted name\0" + /* 70 */ + "internal error: unknown meta code in check_lookbehinds()\0" + "\\N is not supported in a class\0" + "callout string is too long\0" + "disallowed Unicode code point (>= 0xd800 && <= 0xdfff)\0" + "using UTF is disabled by the application\0" + /* 75 */ + "using UCP is disabled by the application\0" + "name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)\0" + "character code point value in \\u.... 
sequence is too large\0" + "digits missing in \\x{} or \\o{} or \\N{U+}\0" + "syntax error or number too big in (?(VERSION condition\0" + /* 80 */ + "internal error: unknown opcode in auto_possessify()\0" + "missing terminating delimiter for callout with string argument\0" + "unrecognized string delimiter follows (?C\0" + "using \\C is disabled by the application\0" + "(?| and/or (?J: or (?x: parentheses are too deeply nested\0" + /* 85 */ + "using \\C is disabled in this PCRE2 library\0" + "regular expression is too complicated\0" + "lookbehind assertion is too long\0" + "pattern string is longer than the limit set by the application\0" + "internal error: unknown code in parsed pattern\0" + /* 90 */ + "internal error: bad code value in parsed_skip()\0" + "PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode\0" + "invalid option bits with PCRE2_LITERAL\0" + "\\N{U+dddd} is supported only in Unicode (UTF) mode\0" + "invalid hyphen in option setting\0" + /* 95 */ + "(*alpha_assertion) not recognized\0" + "script runs require Unicode support, which this version of PCRE2 does not have\0" + "too many capturing groups (maximum 65535)\0" + "atomic assertion expected after (?( or (?(?C)\0" + ; + +/* Match-time and UTF error texts are in the same format. 
*/ + +static const unsigned char match_error_texts[] = + "no error\0" + "no match\0" + "partial match\0" + "UTF-8 error: 1 byte missing at end\0" + "UTF-8 error: 2 bytes missing at end\0" + /* 5 */ + "UTF-8 error: 3 bytes missing at end\0" + "UTF-8 error: 4 bytes missing at end\0" + "UTF-8 error: 5 bytes missing at end\0" + "UTF-8 error: byte 2 top bits not 0x80\0" + "UTF-8 error: byte 3 top bits not 0x80\0" + /* 10 */ + "UTF-8 error: byte 4 top bits not 0x80\0" + "UTF-8 error: byte 5 top bits not 0x80\0" + "UTF-8 error: byte 6 top bits not 0x80\0" + "UTF-8 error: 5-byte character is not allowed (RFC 3629)\0" + "UTF-8 error: 6-byte character is not allowed (RFC 3629)\0" + /* 15 */ + "UTF-8 error: code points greater than 0x10ffff are not defined\0" + "UTF-8 error: code points 0xd800-0xdfff are not defined\0" + "UTF-8 error: overlong 2-byte sequence\0" + "UTF-8 error: overlong 3-byte sequence\0" + "UTF-8 error: overlong 4-byte sequence\0" + /* 20 */ + "UTF-8 error: overlong 5-byte sequence\0" + "UTF-8 error: overlong 6-byte sequence\0" + "UTF-8 error: isolated byte with 0x80 bit set\0" + "UTF-8 error: illegal byte (0xfe or 0xff)\0" + "UTF-16 error: missing low surrogate at end\0" + /* 25 */ + "UTF-16 error: invalid low surrogate\0" + "UTF-16 error: isolated low surrogate\0" + "UTF-32 error: code points 0xd800-0xdfff are not defined\0" + "UTF-32 error: code points greater than 0x10ffff are not defined\0" + "bad data value\0" + /* 30 */ + "patterns do not all use the same character tables\0" + "magic number missing\0" + "pattern compiled in wrong mode: 8/16/32-bit error\0" + "bad offset value\0" + "bad option value\0" + /* 35 */ + "invalid replacement string\0" + "bad offset into UTF string\0" + "callout error code\0" /* Never returned by PCRE2 itself */ + "invalid data in workspace for DFA restart\0" + "too much recursion for DFA matching\0" + /* 40 */ + "backreference condition or recursion test is not supported for DFA matching\0" + "function is not supported for 
DFA matching\0" + "pattern contains an item that is not supported for DFA matching\0" + "workspace size exceeded in DFA matching\0" + "internal error - pattern overwritten?\0" + /* 45 */ + "bad JIT option\0" + "JIT stack limit reached\0" + "match limit exceeded\0" + "no more memory\0" + "unknown substring\0" + /* 50 */ + "non-unique substring name\0" + "NULL argument passed\0" + "nested recursion at the same subject position\0" + "matching depth limit exceeded\0" + "requested value is not available\0" + /* 55 */ + "requested value is not set\0" + "offset limit set without PCRE2_USE_OFFSET_LIMIT\0" + "bad escape sequence in replacement string\0" + "expected closing curly bracket in replacement string\0" + "bad substitution in replacement string\0" + /* 60 */ + "match with end before start or start moved backwards is not supported\0" + "too many replacements (more than INT_MAX)\0" + "bad serialized data\0" + "heap limit exceeded\0" + "invalid syntax\0" + /* 65 */ + "internal error - duplicate substitution match\0" + "PCRE2_MATCH_INVALID_UTF is not supported for DFA matching\0" + ; + + +/************************************************* +* Return error message * +*************************************************/ + +/* This function copies an error message into a buffer whose units are of an +appropriate width. Error numbers are positive for compile-time errors, and +negative for match-time errors (except for UTF errors), but the numbers are all +distinct. 
+ +Arguments: + enumber error number + buffer where to put the message (zero terminated) + size size of the buffer in code units + +Returns: length of message if all is well + negative on error +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_get_error_message(int enumber, PCRE2_UCHAR *buffer, PCRE2_SIZE size) +{ +const unsigned char *message; +PCRE2_SIZE i; +int n; + +if (size == 0) return PCRE2_ERROR_NOMEMORY; + +if (enumber >= COMPILE_ERROR_BASE) /* Compile error */ + { + message = compile_error_texts; + n = enumber - COMPILE_ERROR_BASE; + } +else if (enumber < 0) /* Match or UTF error */ + { + message = match_error_texts; + n = -enumber; + } +else /* Invalid error number */ + { + message = (unsigned char *)"\0"; /* Empty message list */ + n = 1; + } + +for (; n > 0; n--) + { + while (*message++ != CHAR_NUL) {}; + if (*message == CHAR_NUL) return PCRE2_ERROR_BADDATA; + } + +for (i = 0; *message != 0; i++) + { + if (i >= size - 1) + { + buffer[i] = 0; /* Terminate partial message */ + return PCRE2_ERROR_NOMEMORY; + } + buffer[i] = *message++; + } + +buffer[i] = 0; +return (int)i; +} + +/* End of pcre2_error.c */ diff --git a/src/pcre2/src/pcre2_extuni.c b/src/pcre2/src/pcre2_extuni.c new file mode 100644 index 00000000..5a719e9c --- /dev/null +++ b/src/pcre2/src/pcre2_extuni.c @@ -0,0 +1,148 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +/* This module contains an internal function that is used to match a Unicode +extended grapheme sequence. It is used by both pcre2_match() and +pcre2_def_match(). 
However, it is called only when Unicode support is being +compiled. Nevertheless, we provide a dummy function when there is no Unicode +support, because some compilers do not like functionless source files. */ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + + +#include "pcre2_internal.h" + + +/* Dummy function */ + +#ifndef SUPPORT_UNICODE +PCRE2_SPTR +PRIV(extuni)(uint32_t c, PCRE2_SPTR eptr, PCRE2_SPTR start_subject, + PCRE2_SPTR end_subject, BOOL utf, int *xcount) +{ +(void)c; +(void)eptr; +(void)start_subject; +(void)end_subject; +(void)utf; +(void)xcount; +return NULL; +} +#else + + +/************************************************* +* Match an extended grapheme sequence * +*************************************************/ + +/* +Arguments: + c the first character + eptr pointer to next character + start_subject pointer to start of subject + end_subject pointer to end of subject + utf TRUE if in UTF mode + xcount pointer to count of additional characters, + or NULL if count not needed + +Returns: pointer after the end of the sequence +*/ + +PCRE2_SPTR +PRIV(extuni)(uint32_t c, PCRE2_SPTR eptr, PCRE2_SPTR start_subject, + PCRE2_SPTR end_subject, BOOL utf, int *xcount) +{ +int lgb = UCD_GRAPHBREAK(c); + +while (eptr < end_subject) + { + int rgb; + int len = 1; + if (!utf) c = *eptr; else { GETCHARLEN(c, eptr, len); } + rgb = UCD_GRAPHBREAK(c); + if ((PRIV(ucp_gbtable)[lgb] & (1u << rgb)) == 0) break; + + /* Not breaking between Regional Indicators is allowed only if there + are an even number of preceding RIs. 
*/ + + if (lgb == ucp_gbRegionalIndicator && rgb == ucp_gbRegionalIndicator) + { + int ricount = 0; + PCRE2_SPTR bptr = eptr - 1; + if (utf) BACKCHAR(bptr); + + /* bptr is pointing to the left-hand character */ + + while (bptr > start_subject) + { + bptr--; + if (utf) + { + BACKCHAR(bptr); + GETCHAR(c, bptr); + } + else + c = *bptr; + if (UCD_GRAPHBREAK(c) != ucp_gbRegionalIndicator) break; + ricount++; + } + if ((ricount & 1) != 0) break; /* Grapheme break required */ + } + + /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this + allows any number of them before a following Extended_Pictographic. */ + + if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) || + lgb != ucp_gbExtended_Pictographic) + lgb = rgb; + + eptr += len; + if (xcount != NULL) *xcount += 1; + } + +return eptr; +} + +#endif /* SUPPORT_UNICODE */ + +/* End of pcre2_extuni.c */ diff --git a/src/pcre2/src/pcre2_find_bracket.c b/src/pcre2/src/pcre2_find_bracket.c new file mode 100644 index 00000000..70baa139 --- /dev/null +++ b/src/pcre2/src/pcre2_find_bracket.c @@ -0,0 +1,219 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. 
+ + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +/* This module contains a single function that scans through a compiled pattern +until it finds a capturing bracket with the given number, or, if the number is +negative, an instance of OP_REVERSE for a lookbehind. The function is called +from pcre2_compile.c and also from pcre2_study.c when finding the minimum +matching length. 
*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + +/************************************************* +* Scan compiled regex for specific bracket * +*************************************************/ + +/* +Arguments: + code points to start of expression + utf TRUE in UTF mode + number the required bracket number or negative to find a lookbehind + +Returns: pointer to the opcode for the bracket, or NULL if not found +*/ + +PCRE2_SPTR +PRIV(find_bracket)(PCRE2_SPTR code, BOOL utf, int number) +{ +for (;;) + { + PCRE2_UCHAR c = *code; + + if (c == OP_END) return NULL; + + /* XCLASS is used for classes that cannot be represented just by a bit map. + This includes negated single high-valued characters. CALLOUT_STR is used for + callouts with string arguments. In both cases the length in the table is + zero; the actual length is stored in the compiled code. */ + + if (c == OP_XCLASS) code += GET(code, 1); + else if (c == OP_CALLOUT_STR) code += GET(code, 1 + 2*LINK_SIZE); + + /* Handle lookbehind */ + + else if (c == OP_REVERSE) + { + if (number < 0) return (PCRE2_UCHAR *)code; + code += PRIV(OP_lengths)[c]; + } + + /* Handle capturing bracket */ + + else if (c == OP_CBRA || c == OP_SCBRA || + c == OP_CBRAPOS || c == OP_SCBRAPOS) + { + int n = (int)GET2(code, 1+LINK_SIZE); + if (n == number) return (PCRE2_UCHAR *)code; + code += PRIV(OP_lengths)[c]; + } + + /* Otherwise, we can get the item's length from the table, except that for + repeated character types, we have to test for \p and \P, which have an extra + two bytes of parameters, and for MARK/PRUNE/SKIP/THEN with an argument, we + must add in its length. 
*/ + + else + { + switch(c) + { + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEPOSSTAR: + case OP_TYPEPOSPLUS: + case OP_TYPEPOSQUERY: + if (code[1] == OP_PROP || code[1] == OP_NOTPROP) code += 2; + break; + + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEEXACT: + case OP_TYPEPOSUPTO: + if (code[1 + IMM2_SIZE] == OP_PROP || code[1 + IMM2_SIZE] == OP_NOTPROP) + code += 2; + break; + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_SKIP_ARG: + case OP_THEN_ARG: + code += code[1]; + break; + } + + /* Add in the fixed length from the table */ + + code += PRIV(OP_lengths)[c]; + + /* In UTF-8 and UTF-16 modes, opcodes that are followed by a character may be + followed by a multi-byte character. The length in the table is a minimum, so + we have to arrange to skip the extra bytes. */ + +#ifdef MAYBE_UTF_MULTI + if (utf) switch(c) + { + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + case OP_EXACT: + case OP_EXACTI: + case OP_NOTEXACT: + case OP_NOTEXACTI: + case OP_UPTO: + case OP_UPTOI: + case OP_NOTUPTO: + case OP_NOTUPTOI: + case OP_MINUPTO: + case OP_MINUPTOI: + case OP_NOTMINUPTO: + case OP_NOTMINUPTOI: + case OP_POSUPTO: + case OP_POSUPTOI: + case OP_NOTPOSUPTO: + case OP_NOTPOSUPTOI: + case OP_STAR: + case OP_STARI: + case OP_NOTSTAR: + case OP_NOTSTARI: + case OP_MINSTAR: + case OP_MINSTARI: + case OP_NOTMINSTAR: + case OP_NOTMINSTARI: + case OP_POSSTAR: + case OP_POSSTARI: + case OP_NOTPOSSTAR: + case OP_NOTPOSSTARI: + case OP_PLUS: + case OP_PLUSI: + case OP_NOTPLUS: + case OP_NOTPLUSI: + case OP_MINPLUS: + case OP_MINPLUSI: + case OP_NOTMINPLUS: + case OP_NOTMINPLUSI: + case OP_POSPLUS: + case OP_POSPLUSI: + case OP_NOTPOSPLUS: + case OP_NOTPOSPLUSI: + case OP_QUERY: + case OP_QUERYI: + case OP_NOTQUERY: + case OP_NOTQUERYI: + case OP_MINQUERY: + case OP_MINQUERYI: + case OP_NOTMINQUERY: + case OP_NOTMINQUERYI: + case 
OP_POSQUERY: + case OP_POSQUERYI: + case OP_NOTPOSQUERY: + case OP_NOTPOSQUERYI: + if (HAS_EXTRALEN(code[-1])) code += GET_EXTRALEN(code[-1]); + break; + } +#else + (void)(utf); /* Keep compiler happy by referencing function argument */ +#endif /* MAYBE_UTF_MULTI */ + } + } +} + +/* End of pcre2_find_bracket.c */ diff --git a/src/pcre2/src/pcre2_fuzzsupport.c b/src/pcre2/src/pcre2_fuzzsupport.c new file mode 100644 index 00000000..48781ffc --- /dev/null +++ b/src/pcre2/src/pcre2_fuzzsupport.c @@ -0,0 +1,365 @@ +/*************************************************************************** +Fuzzer driver for PCRE2. Given an arbitrary string of bytes and a length, it +tries to compile and match it, deriving options from the string itself. If +STANDALONE is defined, a main program that calls the driver with the contents +of specified files is compiled, and commentary on what is happening is output. +If an argument starts with '=' the rest of it is taken as a literal string +rather than a file name. This allows easy testing of short strings.
+ +Written by Philip Hazel, October 2016 +***************************************************************************/ + +#include <errno.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#define PCRE2_CODE_UNIT_WIDTH 8 +#include "pcre2.h" + +#define MAX_MATCH_SIZE 1000 + +#define DFA_WORKSPACE_COUNT 100 + +#define ALLOWED_COMPILE_OPTIONS \ + (PCRE2_ANCHORED|PCRE2_ALLOW_EMPTY_CLASS|PCRE2_ALT_BSUX|PCRE2_ALT_CIRCUMFLEX| \ + PCRE2_ALT_VERBNAMES|PCRE2_AUTO_CALLOUT|PCRE2_CASELESS|PCRE2_DOLLAR_ENDONLY| \ + PCRE2_DOTALL|PCRE2_DUPNAMES|PCRE2_ENDANCHORED|PCRE2_EXTENDED|PCRE2_FIRSTLINE| \ + PCRE2_MATCH_UNSET_BACKREF|PCRE2_MULTILINE|PCRE2_NEVER_BACKSLASH_C| \ + PCRE2_NO_AUTO_CAPTURE| \ + PCRE2_NO_AUTO_POSSESS|PCRE2_NO_DOTSTAR_ANCHOR|PCRE2_NO_START_OPTIMIZE| \ + PCRE2_UCP|PCRE2_UNGREEDY|PCRE2_USE_OFFSET_LIMIT| \ + PCRE2_UTF) + +#define ALLOWED_MATCH_OPTIONS \ + (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \ + PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_HARD| \ + PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT) + +/* This is the callout function. Its only purpose is to halt matching if there +are more than 100 callouts, as one way of stopping too much time being spent on +fruitless matches. The callout data is a pointer to the counter. */ + +static int callout_function(pcre2_callout_block *cb, void *callout_data) +{ +(void)cb; /* Avoid unused parameter warning */ +*((uint32_t *)callout_data) += 1; +return (*((uint32_t *)callout_data) > 100)? PCRE2_ERROR_CALLOUT : 0; +} + +/* Putting in this apparently unnecessary prototype prevents gcc from giving a +"no previous prototype" warning when compiling at high warning level. */ + +int LLVMFuzzerTestOneInput(const unsigned char *, size_t); + +/* Here's the driving function.
*/ + +int LLVMFuzzerTestOneInput(const unsigned char *data, size_t size) +{ +uint32_t compile_options; +uint32_t match_options; +pcre2_match_data *match_data = NULL; +pcre2_match_context *match_context = NULL; +size_t match_size; +int dfa_workspace[DFA_WORKSPACE_COUNT]; +int r1, r2; +int i; + +if (size < 1) return 0; + +/* Limiting the length of the subject for matching stops fruitless searches +in large trees taking too much time. */ + +match_size = (size > MAX_MATCH_SIZE)? MAX_MATCH_SIZE : size; + +/* Figure out some options to use. Initialize the random number to ensure +repeatability. Ensure that we get a 32-bit unsigned random number for testing +options. (RAND_MAX is required to be at least 32767, but is commonly +2147483647, which excludes the top bit.) */ + +srand((unsigned int)(data[size/2])); +r1 = rand(); +r2 = rand(); + +/* Ensure that all undefined option bits are zero (waste of time trying them) +and also that PCRE2_NO_UTF_CHECK is unset, as there is no guarantee that the +input is UTF-8. Also unset PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as there is no +reason to disallow UTF and UCP. Force PCRE2_NEVER_BACKSLASH_C to be set because +\C in random patterns is highly likely to cause a crash. */ + +compile_options = + ((((uint32_t)r1 << 16) | ((uint32_t)r2 & 0xffff)) & ALLOWED_COMPILE_OPTIONS) | + PCRE2_NEVER_BACKSLASH_C; + +match_options = + ((((uint32_t)r1 << 16) | ((uint32_t)r2 & 0xffff)) & ALLOWED_MATCH_OPTIONS); + +/* Discard partial matching if PCRE2_ENDANCHORED is set, because they are not +allowed together and just give an immediate error return. */ + +if (((compile_options|match_options) & PCRE2_ENDANCHORED) != 0) + match_options &= ~(PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT); + +/* Do the compile with and without the options, and after a successful compile, +likewise do the match with and without the options. 
*/ + +for (i = 0; i < 2; i++) + { + uint32_t callout_count; + int errorcode; + PCRE2_SIZE erroroffset; + pcre2_code *code; + +#ifdef STANDALONE + printf("Compile options %.8x never_backslash_c", compile_options); + printf("%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", + ((compile_options & PCRE2_ALT_BSUX) != 0)? ",alt_bsux" : "", + ((compile_options & PCRE2_ALT_CIRCUMFLEX) != 0)? ",alt_circumflex" : "", + ((compile_options & PCRE2_ALT_VERBNAMES) != 0)? ",alt_verbnames" : "", + ((compile_options & PCRE2_ALLOW_EMPTY_CLASS) != 0)? ",allow_empty_class" : "", + ((compile_options & PCRE2_ANCHORED) != 0)? ",anchored" : "", + ((compile_options & PCRE2_AUTO_CALLOUT) != 0)? ",auto_callout" : "", + ((compile_options & PCRE2_CASELESS) != 0)? ",caseless" : "", + ((compile_options & PCRE2_DOLLAR_ENDONLY) != 0)? ",dollar_endonly" : "", + ((compile_options & PCRE2_DOTALL) != 0)? ",dotall" : "", + ((compile_options & PCRE2_DUPNAMES) != 0)? ",dupnames" : "", + ((compile_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "", + ((compile_options & PCRE2_EXTENDED) != 0)? ",extended" : "", + ((compile_options & PCRE2_FIRSTLINE) != 0)? ",firstline" : "", + ((compile_options & PCRE2_MATCH_UNSET_BACKREF) != 0)? ",match_unset_backref" : "", + ((compile_options & PCRE2_MULTILINE) != 0)? ",multiline" : "", + ((compile_options & PCRE2_NEVER_UCP) != 0)? ",never_ucp" : "", + ((compile_options & PCRE2_NEVER_UTF) != 0)? ",never_utf" : "", + ((compile_options & PCRE2_NO_AUTO_CAPTURE) != 0)? ",no_auto_capture" : "", + ((compile_options & PCRE2_NO_AUTO_POSSESS) != 0)? ",no_auto_possess" : "", + ((compile_options & PCRE2_NO_DOTSTAR_ANCHOR) != 0)? ",no_dotstar_anchor" : "", + ((compile_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "", + ((compile_options & PCRE2_NO_START_OPTIMIZE) != 0)? ",no_start_optimize" : "", + ((compile_options & PCRE2_UCP) != 0)? ",ucp" : "", + ((compile_options & PCRE2_UNGREEDY) != 0)? ",ungreedy" : "", + ((compile_options & PCRE2_USE_OFFSET_LIMIT) != 0)? 
",use_offset_limit" : "", + ((compile_options & PCRE2_UTF) != 0)? ",utf" : ""); +#endif + + code = pcre2_compile((PCRE2_SPTR)data, (PCRE2_SIZE)size, compile_options, + &errorcode, &erroroffset, NULL); + + /* Compilation succeeded */ + + if (code != NULL) + { + int j; + uint32_t save_match_options = match_options; + + /* Create match data and context blocks only when we first need them. Set + low match and depth limits to avoid wasting too much searching large + pattern trees. Almost all matches are going to fail. */ + + if (match_data == NULL) + { + match_data = pcre2_match_data_create(32, NULL); + if (match_data == NULL) + { +#ifdef STANDALONE + printf("** Failed to create match data block\n"); +#endif + return 0; + } + } + + if (match_context == NULL) + { + match_context = pcre2_match_context_create(NULL); + if (match_context == NULL) + { +#ifdef STANDALONE + printf("** Failed to create match context block\n"); +#endif + return 0; + } + (void)pcre2_set_match_limit(match_context, 100); + (void)pcre2_set_depth_limit(match_context, 100); + (void)pcre2_set_callout(match_context, callout_function, &callout_count); + } + + /* Match twice, with and without options. */ + + for (j = 0; j < 2; j++) + { +#ifdef STANDALONE + printf("Match options %.8x", match_options); + printf("%s%s%s%s%s%s%s%s%s%s\n", + ((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "", + ((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "", + ((match_options & PCRE2_NO_JIT) != 0)? ",no_jit" : "", + ((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "", + ((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "", + ((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "", + ((match_options & PCRE2_NOTEMPTY_ATSTART) != 0)? ",notempty_atstart" : "", + ((match_options & PCRE2_NOTEOL) != 0)? ",noteol" : "", + ((match_options & PCRE2_PARTIAL_HARD) != 0)? ",partial_hard" : "", + ((match_options & PCRE2_PARTIAL_SOFT) != 0)? 
",partial_soft" : ""); +#endif + + callout_count = 0; + errorcode = pcre2_match(code, (PCRE2_SPTR)data, (PCRE2_SIZE)match_size, 0, + match_options, match_data, match_context); + +#ifdef STANDALONE + if (errorcode >= 0) printf("Match returned %d\n", errorcode); else + { + unsigned char buffer[256]; + pcre2_get_error_message(errorcode, buffer, 256); + printf("Match failed: error %d: %s\n", errorcode, buffer); + } +#endif + + match_options = 0; /* For second time */ + } + + /* Match with DFA twice, with and without options. */ + + match_options = save_match_options & ~PCRE2_NO_JIT; /* Not valid for DFA */ + + for (j = 0; j < 2; j++) + { +#ifdef STANDALONE + printf("DFA match options %.8x", match_options); + printf("%s%s%s%s%s%s%s%s%s\n", + ((match_options & PCRE2_ANCHORED) != 0)? ",anchored" : "", + ((match_options & PCRE2_ENDANCHORED) != 0)? ",endanchored" : "", + ((match_options & PCRE2_NO_UTF_CHECK) != 0)? ",no_utf_check" : "", + ((match_options & PCRE2_NOTBOL) != 0)? ",notbol" : "", + ((match_options & PCRE2_NOTEMPTY) != 0)? ",notempty" : "", + ((match_options & PCRE2_NOTEMPTY_ATSTART) != 0)? ",notempty_atstart" : "", + ((match_options & PCRE2_NOTEOL) != 0)? ",noteol" : "", + ((match_options & PCRE2_PARTIAL_HARD) != 0)? ",partial_hard" : "", + ((match_options & PCRE2_PARTIAL_SOFT) != 0)? 
",partial_soft" : ""); +#endif + + callout_count = 0; + errorcode = pcre2_dfa_match(code, (PCRE2_SPTR)data, + (PCRE2_SIZE)match_size, 0, match_options, match_data, match_context, + dfa_workspace, DFA_WORKSPACE_COUNT); + +#ifdef STANDALONE + if (errorcode >= 0) printf("Match returned %d\n", errorcode); else + { + unsigned char buffer[256]; + pcre2_get_error_message(errorcode, buffer, 256); + printf("Match failed: error %d: %s\n", errorcode, buffer); + } +#endif + + match_options = 0; /* For second time */ + } + + match_options = save_match_options; /* Reset for the second compile */ + pcre2_code_free(code); + } + + /* Compilation failed */ + + else + { + unsigned char buffer[256]; + pcre2_get_error_message(errorcode, buffer, 256); +#ifdef STANDALONE + printf("Error %d at offset %lu: %s\n", errorcode, erroroffset, buffer); +#else + if (strstr((const char *)buffer, "internal error") != NULL) abort(); +#endif + } + + compile_options = PCRE2_NEVER_BACKSLASH_C; /* For second time */ + } + +if (match_data != NULL) pcre2_match_data_free(match_data); +if (match_context != NULL) pcre2_match_context_free(match_context); + +return 0; +} + + +/* Optional main program. */ + +#ifdef STANDALONE +int main(int argc, char **argv) +{ +int i; + +if (argc < 2) + { + printf("** No arguments given\n"); + return 0; + } + +for (i = 1; i < argc; i++) + { + size_t filelen; + size_t readsize; + unsigned char *buffer; + FILE *f; + + /* Handle a literal string. Copy to an exact size buffer so that checks for + overrunning work. 
*/ + + if (argv[i][0] == '=') + { + readsize = strlen(argv[i]) - 1; + printf("------ ------\n"); + printf("Length = %lu\n", readsize); + printf("%.*s\n", (int)readsize, argv[i]+1); + buffer = (unsigned char *)malloc(readsize); + if (buffer == NULL) + printf("** Failed to allocate %lu bytes of memory\n", readsize); + else + { + memcpy(buffer, argv[i]+1, readsize); + LLVMFuzzerTestOneInput(buffer, readsize); + free(buffer); + } + continue; + } + + /* Handle a string given in a file */ + + f = fopen(argv[i], "rb"); + if (f == NULL) + { + printf("** Failed to open %s: %s\n", argv[i], strerror(errno)); + continue; + } + + printf("------ %s ------\n", argv[i]); + + fseek(f, 0, SEEK_END); + filelen = ftell(f); + fseek(f, 0, SEEK_SET); + + buffer = (unsigned char *)malloc(filelen); + if (buffer == NULL) + { + printf("** Failed to allocate %lu bytes of memory\n", filelen); + fclose(f); + continue; + } + + readsize = fread(buffer, 1, filelen, f); + fclose(f); + + if (readsize != filelen) + printf("** File size is %lu but fread() returned %lu\n", filelen, readsize); + else + { + printf("Length = %lu\n", filelen); + LLVMFuzzerTestOneInput(buffer, filelen); + } + free(buffer); + } + +return 0; +} +#endif /* STANDALONE */ + +/* End */ diff --git a/src/pcre2/src/pcre2_internal.h b/src/pcre2/src/pcre2_internal.h new file mode 100644 index 00000000..d8fad1e9 --- /dev/null +++ b/src/pcre2/src/pcre2_internal.h @@ -0,0 +1,2004 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE2 is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifndef PCRE2_INTERNAL_H_IDEMPOTENT_GUARD +#define PCRE2_INTERNAL_H_IDEMPOTENT_GUARD + +/* We do not support both EBCDIC and Unicode at the same time. 
The "configure" +script prevents both being selected, but not everybody uses "configure". EBCDIC +is only supported for the 8-bit library, but the check for this has to be later +in this file, because the first part is not width-dependent, and is included by +pcre2test.c with CODE_UNIT_WIDTH == 0. */ + +#if defined EBCDIC && defined SUPPORT_UNICODE +#error The use of both EBCDIC and SUPPORT_UNICODE is not supported. +#endif + +/* Standard C headers */ + +#include <ctype.h> +#include <limits.h> +#include <stddef.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +/* Macros to make boolean values more obvious. The #ifndef is to pacify +compiler warnings in environments where these macros are defined elsewhere. +Unfortunately, there is no way to do the same for the typedef. */ + +typedef int BOOL; +#ifndef FALSE +#define FALSE 0 +#define TRUE 1 +#endif + +/* Valgrind (memcheck) support */ + +#ifdef SUPPORT_VALGRIND +#include <valgrind/memcheck.h> +#endif + +/* The -ftrivial-auto-var-init option supports initializing all local variables +to avoid some classes of bug, but this can cause an unacceptable slowdown +for large on-stack arrays in hot functions. This macro lets us annotate +such arrays. */ + +#ifdef HAVE_ATTRIBUTE_UNINITIALIZED +#define PCRE2_KEEP_UNINITIALIZED __attribute__((uninitialized)) +#else +#define PCRE2_KEEP_UNINITIALIZED +#endif + +/* Older versions of MSVC lack snprintf(). This define allows for +warning/error-free compilation and testing with MSVC compilers back to at least +MSVC 10/2010. Except for VC6 (which is missing some fundamentals and fails). */ + +#if defined(_MSC_VER) && (_MSC_VER < 1900) +#define snprintf _snprintf +#endif + +/* When compiling a DLL for Windows, the exported symbols have to be declared +using some MS magic. I found some useful information on this web page: +http://msdn2.microsoft.com/en-us/library/y4h7bcy6(VS.80).aspx. According to the +information there, using __declspec(dllexport) without "extern" we have a +definition; with "extern" we have a declaration.
The settings here override the +setting in pcre2.h (which is included below); it defines only PCRE2_EXP_DECL, +which is all that is needed for applications (they just import the symbols). We +use: + + PCRE2_EXP_DECL for declarations + PCRE2_EXP_DEFN for definitions + +The reason for wrapping this in #ifndef PCRE2_EXP_DECL is so that pcre2test, +which is an application, but needs to import this file in order to "peek" at +internals, can #include pcre2.h first to get an application's-eye view. + +In principle, people compiling for non-Windows, non-Unix-like (i.e. uncommon, +special-purpose environments) might want to stick other stuff in front of +exported symbols. That's why, in the non-Windows case, we set PCRE2_EXP_DEFN +only if it is not already set. */ + +#ifndef PCRE2_EXP_DECL +# ifdef _WIN32 +# ifndef PCRE2_STATIC +# define PCRE2_EXP_DECL extern __declspec(dllexport) +# define PCRE2_EXP_DEFN __declspec(dllexport) +# else +# define PCRE2_EXP_DECL extern +# define PCRE2_EXP_DEFN +# endif +# else +# ifdef __cplusplus +# define PCRE2_EXP_DECL extern "C" +# else +# define PCRE2_EXP_DECL extern +# endif +# ifndef PCRE2_EXP_DEFN +# define PCRE2_EXP_DEFN PCRE2_EXP_DECL +# endif +# endif +#endif + +/* Include the public PCRE2 header and the definitions of UCP character +property values. This must follow the setting of PCRE2_EXP_DECL above. */ + +#include "pcre2.h" +#include "pcre2_ucp.h" + +/* When PCRE2 is compiled as a C++ library, the subject pointer can be replaced +with a custom type. This makes it possible, for example, to allow pcre2_match() +to process subject strings that are discontinuous by using a smart pointer +class. It must always be possible to inspect all of the subject string in +pcre2_match() because of the way it backtracks. */ + +/* WARNING: This is as yet untested for PCRE2. 
*/ + +#ifdef CUSTOM_SUBJECT_PTR +#undef PCRE2_SPTR +#define PCRE2_SPTR CUSTOM_SUBJECT_PTR +#endif + +/* When checking for integer overflow in pcre2_compile(), we need to handle +large integers. If a 64-bit integer type is available, we can use that. +Otherwise we have to cast to double, which of course requires floating point +arithmetic. Handle this by defining a macro for the appropriate type. */ + +#if defined INT64_MAX || defined int64_t +#define INT64_OR_DOUBLE int64_t +#else +#define INT64_OR_DOUBLE double +#endif + +/* External (in the C sense) functions and tables that are private to the +libraries are always referenced using the PRIV macro. This makes it possible +for pcre2test.c to include some of the source files from the libraries using a +different PRIV definition to avoid name clashes. It also makes it clear in the +code that a non-static object is being referenced. */ + +#ifndef PRIV +#define PRIV(name) _pcre2_##name +#endif + +/* When compiling for use with the Virtual Pascal compiler, these functions +need to have their names changed. PCRE2 must be compiled with the -DVPCOMPAT +option on the command line. */ + +#ifdef VPCOMPAT +#define strlen(s) _strlen(s) +#define strncmp(s1,s2,m) _strncmp(s1,s2,m) +#define memcmp(s,c,n) _memcmp(s,c,n) +#define memcpy(d,s,n) _memcpy(d,s,n) +#define memmove(d,s,n) _memmove(d,s,n) +#define memset(s,c,n) _memset(s,c,n) +#else /* VPCOMPAT */ + +/* Otherwise, to cope with SunOS4 and other systems that lack memmove(), define +a macro that calls an emulating function. */ + +#ifndef HAVE_MEMMOVE +#undef memmove /* Some systems may have a macro */ +#define memmove(a, b, c) PRIV(memmove)(a, b, c) +#endif /* not HAVE_MEMMOVE */ +#endif /* not VPCOMPAT */ + +/* This is an unsigned int value that no UTF character can ever have, as +Unicode doesn't go beyond 0x0010ffff. */ + +#define NOTACHAR 0xffffffff + +/* This is the largest valid UTF/Unicode code point. 
*/ + +#define MAX_UTF_CODE_POINT 0x10ffff + +/* Compile-time positive error numbers (all except UTF errors, which are +negative) start at this value. It should probably never be changed, in case +some application is checking for specific numbers. There is a copy of this +#define in pcre2posix.c (which now no longer includes this file). Ideally, a +way of having a single definition should be found, but as the number is +unlikely to change, this is not a pressing issue. The original reason for +having a base other than 0 was to keep the absolute values of compile-time and +run-time error numbers numerically different, but in the event the code does +not rely on this. */ + +#define COMPILE_ERROR_BASE 100 + +/* The initial frames vector for remembering backtracking points in +pcre2_match() is allocated on the system stack, of this size (bytes). The size +must be a multiple of sizeof(PCRE2_SPTR) in all environments, so making it a +multiple of 8 is best. Typical frame sizes are a few hundred bytes (it depends +on the number of capturing parentheses) so 20KiB handles quite a few frames. A +larger vector on the heap is obtained for patterns that need more frames. The +maximum size of this can be limited. */ + +#define START_FRAMES_SIZE 20480 + +/* Similarly, for DFA matching, an initial internal workspace vector is +allocated on the stack. */ + +#define DFA_START_RWS_SIZE 30720 + +/* Define the default BSR convention. */ + +#ifdef BSR_ANYCRLF +#define BSR_DEFAULT PCRE2_BSR_ANYCRLF +#else +#define BSR_DEFAULT PCRE2_BSR_UNICODE +#endif + + +/* ---------------- Basic UTF-8 macros ---------------- */ + +/* These UTF-8 macros are always defined because they are used in pcre2test for +handling wide characters in 16-bit and 32-bit modes, even if an 8-bit library +is not supported. */ + +/* Tests whether a UTF-8 code point needs extra bytes to decode. 
*/ + +#define HASUTF8EXTRALEN(c) ((c) >= 0xc0) + +/* The following macros were originally written in the form of loops that used +data from the tables whose names start with PRIV(utf8_table). They were +rewritten by a user so as not to use loops, because in some environments this +gives a significant performance advantage, and it seems never to do any harm. +*/ + +/* Base macro to pick up the remaining bytes of a UTF-8 character, not +advancing the pointer. */ + +#define GETUTF8(c, eptr) \ + { \ + if ((c & 0x20u) == 0) \ + c = ((c & 0x1fu) << 6) | (eptr[1] & 0x3fu); \ + else if ((c & 0x10u) == 0) \ + c = ((c & 0x0fu) << 12) | ((eptr[1] & 0x3fu) << 6) | (eptr[2] & 0x3fu); \ + else if ((c & 0x08u) == 0) \ + c = ((c & 0x07u) << 18) | ((eptr[1] & 0x3fu) << 12) | \ + ((eptr[2] & 0x3fu) << 6) | (eptr[3] & 0x3fu); \ + else if ((c & 0x04u) == 0) \ + c = ((c & 0x03u) << 24) | ((eptr[1] & 0x3fu) << 18) | \ + ((eptr[2] & 0x3fu) << 12) | ((eptr[3] & 0x3fu) << 6) | \ + (eptr[4] & 0x3fu); \ + else \ + c = ((c & 0x01u) << 30) | ((eptr[1] & 0x3fu) << 24) | \ + ((eptr[2] & 0x3fu) << 18) | ((eptr[3] & 0x3fu) << 12) | \ + ((eptr[4] & 0x3fu) << 6) | (eptr[5] & 0x3fu); \ + } + +/* Base macro to pick up the remaining bytes of a UTF-8 character, advancing +the pointer. 
*/ + +#define GETUTF8INC(c, eptr) \ + { \ + if ((c & 0x20u) == 0) \ + c = ((c & 0x1fu) << 6) | (*eptr++ & 0x3fu); \ + else if ((c & 0x10u) == 0) \ + { \ + c = ((c & 0x0fu) << 12) | ((*eptr & 0x3fu) << 6) | (eptr[1] & 0x3fu); \ + eptr += 2; \ + } \ + else if ((c & 0x08u) == 0) \ + { \ + c = ((c & 0x07u) << 18) | ((*eptr & 0x3fu) << 12) | \ + ((eptr[1] & 0x3fu) << 6) | (eptr[2] & 0x3fu); \ + eptr += 3; \ + } \ + else if ((c & 0x04u) == 0) \ + { \ + c = ((c & 0x03u) << 24) | ((*eptr & 0x3fu) << 18) | \ + ((eptr[1] & 0x3fu) << 12) | ((eptr[2] & 0x3fu) << 6) | \ + (eptr[3] & 0x3fu); \ + eptr += 4; \ + } \ + else \ + { \ + c = ((c & 0x01u) << 30) | ((*eptr & 0x3fu) << 24) | \ + ((eptr[1] & 0x3fu) << 18) | ((eptr[2] & 0x3fu) << 12) | \ + ((eptr[3] & 0x3fu) << 6) | (eptr[4] & 0x3fu); \ + eptr += 5; \ + } \ + } + +/* Base macro to pick up the remaining bytes of a UTF-8 character, not +advancing the pointer, incrementing the length. */ + +#define GETUTF8LEN(c, eptr, len) \ + { \ + if ((c & 0x20u) == 0) \ + { \ + c = ((c & 0x1fu) << 6) | (eptr[1] & 0x3fu); \ + len++; \ + } \ + else if ((c & 0x10u) == 0) \ + { \ + c = ((c & 0x0fu) << 12) | ((eptr[1] & 0x3fu) << 6) | (eptr[2] & 0x3fu); \ + len += 2; \ + } \ + else if ((c & 0x08u) == 0) \ + {\ + c = ((c & 0x07u) << 18) | ((eptr[1] & 0x3fu) << 12) | \ + ((eptr[2] & 0x3fu) << 6) | (eptr[3] & 0x3fu); \ + len += 3; \ + } \ + else if ((c & 0x04u) == 0) \ + { \ + c = ((c & 0x03u) << 24) | ((eptr[1] & 0x3fu) << 18) | \ + ((eptr[2] & 0x3fu) << 12) | ((eptr[3] & 0x3fu) << 6) | \ + (eptr[4] & 0x3fu); \ + len += 4; \ + } \ + else \ + {\ + c = ((c & 0x01u) << 30) | ((eptr[1] & 0x3fu) << 24) | \ + ((eptr[2] & 0x3fu) << 18) | ((eptr[3] & 0x3fu) << 12) | \ + ((eptr[4] & 0x3fu) << 6) | (eptr[5] & 0x3fu); \ + len += 5; \ + } \ + } + +/* --------------- Whitespace macros ---------------- */ + +/* Tests for Unicode horizontal and vertical whitespace characters must check a +number of different values. 
Using a switch statement for this generates the +fastest code (no loop, no memory access), and there are several places in the +interpreter code where this happens. In order to ensure that all the case lists +remain in step, we use macros so that there is only one place where the lists +are defined. + +These values are also required as lists in pcre2_compile.c when processing \h, +\H, \v and \V in a character class. The lists are defined in pcre2_tables.c, +but macros that define the values are here so that all the definitions are +together. The lists must be in ascending character order, terminated by +NOTACHAR (which is 0xffffffff). + +Any changes should ensure that the various macros are kept in step with each +other. NOTE: The values also appear in pcre2_jit_compile.c. */ + +/* -------------- ASCII/Unicode environments -------------- */ + +#ifndef EBCDIC + +/* Character U+180E (Mongolian Vowel Separator) is not included in the list of +spaces in the Unicode file PropList.txt, and Perl does not recognize it as a +space. However, in many other sources it is listed as a space and has been in +PCRE (both APIs) for a long time. 
*/ + +#define HSPACE_LIST \ + CHAR_HT, CHAR_SPACE, CHAR_NBSP, \ + 0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \ + 0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \ + NOTACHAR + +#define HSPACE_MULTIBYTE_CASES \ + case 0x1680: /* OGHAM SPACE MARK */ \ + case 0x180e: /* MONGOLIAN VOWEL SEPARATOR */ \ + case 0x2000: /* EN QUAD */ \ + case 0x2001: /* EM QUAD */ \ + case 0x2002: /* EN SPACE */ \ + case 0x2003: /* EM SPACE */ \ + case 0x2004: /* THREE-PER-EM SPACE */ \ + case 0x2005: /* FOUR-PER-EM SPACE */ \ + case 0x2006: /* SIX-PER-EM SPACE */ \ + case 0x2007: /* FIGURE SPACE */ \ + case 0x2008: /* PUNCTUATION SPACE */ \ + case 0x2009: /* THIN SPACE */ \ + case 0x200A: /* HAIR SPACE */ \ + case 0x202f: /* NARROW NO-BREAK SPACE */ \ + case 0x205f: /* MEDIUM MATHEMATICAL SPACE */ \ + case 0x3000 /* IDEOGRAPHIC SPACE */ + +#define HSPACE_BYTE_CASES \ + case CHAR_HT: \ + case CHAR_SPACE: \ + case CHAR_NBSP + +#define HSPACE_CASES \ + HSPACE_BYTE_CASES: \ + HSPACE_MULTIBYTE_CASES + +#define VSPACE_LIST \ + CHAR_LF, CHAR_VT, CHAR_FF, CHAR_CR, CHAR_NEL, 0x2028, 0x2029, NOTACHAR + +#define VSPACE_MULTIBYTE_CASES \ + case 0x2028: /* LINE SEPARATOR */ \ + case 0x2029 /* PARAGRAPH SEPARATOR */ + +#define VSPACE_BYTE_CASES \ + case CHAR_LF: \ + case CHAR_VT: \ + case CHAR_FF: \ + case CHAR_CR: \ + case CHAR_NEL + +#define VSPACE_CASES \ + VSPACE_BYTE_CASES: \ + VSPACE_MULTIBYTE_CASES + +/* -------------- EBCDIC environments -------------- */ + +#else +#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR + +#define HSPACE_BYTE_CASES \ + case CHAR_HT: \ + case CHAR_SPACE: \ + case CHAR_NBSP + +#define HSPACE_CASES HSPACE_BYTE_CASES + +#ifdef EBCDIC_NL25 +#define VSPACE_LIST \ + CHAR_VT, CHAR_FF, CHAR_CR, CHAR_NEL, CHAR_LF, NOTACHAR +#else +#define VSPACE_LIST \ + CHAR_VT, CHAR_FF, CHAR_CR, CHAR_LF, CHAR_NEL, NOTACHAR +#endif + +#define VSPACE_BYTE_CASES \ + case CHAR_LF: \ + case CHAR_VT: \ + case CHAR_FF: \ + case CHAR_CR: \ + case 
CHAR_NEL + +#define VSPACE_CASES VSPACE_BYTE_CASES +#endif /* EBCDIC */ + +/* -------------- End of whitespace macros -------------- */ + + +/* PCRE2 is able to support several different kinds of newline (CR, LF, CRLF, +"any" and "anycrlf" at present). The following macros are used to package up +testing for newlines. NLBLOCK, PSSTART, and PSEND are defined in the various +modules to indicate in which datablock the parameters exist, and what the +start/end of string field names are. */ + +#define NLTYPE_FIXED 0 /* Newline is a fixed length string */ +#define NLTYPE_ANY 1 /* Newline is any Unicode line ending */ +#define NLTYPE_ANYCRLF 2 /* Newline is CR, LF, or CRLF */ + +/* This macro checks for a newline at the given position */ + +#define IS_NEWLINE(p) \ + ((NLBLOCK->nltype != NLTYPE_FIXED)? \ + ((p) < NLBLOCK->PSEND && \ + PRIV(is_newline)((p), NLBLOCK->nltype, NLBLOCK->PSEND, \ + &(NLBLOCK->nllen), utf)) \ + : \ + ((p) <= NLBLOCK->PSEND - NLBLOCK->nllen && \ + UCHAR21TEST(p) == NLBLOCK->nl[0] && \ + (NLBLOCK->nllen == 1 || UCHAR21TEST(p+1) == NLBLOCK->nl[1]) \ + ) \ + ) + +/* This macro checks for a newline immediately preceding the given position */ + +#define WAS_NEWLINE(p) \ + ((NLBLOCK->nltype != NLTYPE_FIXED)? \ + ((p) > NLBLOCK->PSSTART && \ + PRIV(was_newline)((p), NLBLOCK->nltype, NLBLOCK->PSSTART, \ + &(NLBLOCK->nllen), utf)) \ + : \ + ((p) >= NLBLOCK->PSSTART + NLBLOCK->nllen && \ + UCHAR21TEST(p - NLBLOCK->nllen) == NLBLOCK->nl[0] && \ + (NLBLOCK->nllen == 1 || UCHAR21TEST(p - NLBLOCK->nllen + 1) == NLBLOCK->nl[1]) \ + ) \ + ) + +/* Private flags containing information about the compiled pattern. The first +three must not be changed, because whichever is set is actually the number of +bytes in a code unit in that mode. 
*/ + +#define PCRE2_MODE8 0x00000001 /* compiled in 8 bit mode */ +#define PCRE2_MODE16 0x00000002 /* compiled in 16 bit mode */ +#define PCRE2_MODE32 0x00000004 /* compiled in 32 bit mode */ +#define PCRE2_FIRSTSET 0x00000010 /* first_code unit is set */ +#define PCRE2_FIRSTCASELESS 0x00000020 /* caseless first code unit */ +#define PCRE2_FIRSTMAPSET 0x00000040 /* bitmap of first code units is set */ +#define PCRE2_LASTSET 0x00000080 /* last code unit is set */ +#define PCRE2_LASTCASELESS 0x00000100 /* caseless last code unit */ +#define PCRE2_STARTLINE 0x00000200 /* start after \n for multiline */ +#define PCRE2_JCHANGED 0x00000400 /* j option used in pattern */ +#define PCRE2_HASCRORLF 0x00000800 /* explicit \r or \n in pattern */ +#define PCRE2_HASTHEN 0x00001000 /* pattern contains (*THEN) */ +#define PCRE2_MATCH_EMPTY 0x00002000 /* pattern can match empty string */ +#define PCRE2_BSR_SET 0x00004000 /* BSR was set in the pattern */ +#define PCRE2_NL_SET 0x00008000 /* newline was set in the pattern */ +#define PCRE2_NOTEMPTY_SET 0x00010000 /* (*NOTEMPTY) used ) keep */ +#define PCRE2_NE_ATST_SET 0x00020000 /* (*NOTEMPTY_ATSTART) used) together */ +#define PCRE2_DEREF_TABLES 0x00040000 /* release character tables */ +#define PCRE2_NOJIT 0x00080000 /* (*NOJIT) used */ +#define PCRE2_HASBKPORX 0x00100000 /* contains \P, \p, or \X */ +#define PCRE2_DUPCAPUSED 0x00200000 /* contains (?| */ +#define PCRE2_HASBKC 0x00400000 /* contains \C */ +#define PCRE2_HASACCEPT 0x00800000 /* contains (*ACCEPT) */ + +#define PCRE2_MODE_MASK (PCRE2_MODE8 | PCRE2_MODE16 | PCRE2_MODE32) + +/* Values for the matchedby field in a match data block. */ + +enum { PCRE2_MATCHEDBY_INTERPRETER, /* pcre2_match() */ + PCRE2_MATCHEDBY_DFA_INTERPRETER, /* pcre2_dfa_match() */ + PCRE2_MATCHEDBY_JIT }; /* pcre2_jit_match() */ + +/* Values for the flags field in a match data block. */ + +#define PCRE2_MD_COPIED_SUBJECT 0x01u + +/* Magic number to provide a small check against being handed junk. 
*/ + +#define MAGIC_NUMBER 0x50435245UL /* 'PCRE' */ + +/* The maximum remaining length of subject we are prepared to search for a +req_unit match from an anchored pattern. In 8-bit mode, memchr() is used and is +much faster than the search loop that has to be used in 16-bit and 32-bit +modes. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define REQ_CU_MAX 5000 +#else +#define REQ_CU_MAX 2000 +#endif + +/* Offsets for the bitmap tables in the cbits set of tables. Each table +contains a set of bits for a class map. Some classes are built by combining +these tables. */ + +#define cbit_space 0 /* [:space:] or \s */ +#define cbit_xdigit 32 /* [:xdigit:] */ +#define cbit_digit 64 /* [:digit:] or \d */ +#define cbit_upper 96 /* [:upper:] */ +#define cbit_lower 128 /* [:lower:] */ +#define cbit_word 160 /* [:word:] or \w */ +#define cbit_graph 192 /* [:graph:] */ +#define cbit_print 224 /* [:print:] */ +#define cbit_punct 256 /* [:punct:] */ +#define cbit_cntrl 288 /* [:cntrl:] */ +#define cbit_length 320 /* Length of the cbits table */ + +/* Bit definitions for entries in the ctypes table. Do not change these values +without checking pcre2_jit_compile.c, which has an assertion to ensure that +ctype_word has the value 16. */ + +#define ctype_space 0x01 +#define ctype_letter 0x02 +#define ctype_lcletter 0x04 +#define ctype_digit 0x08 +#define ctype_word 0x10 /* alphanumeric or '_' */ + +/* Offsets of the various tables from the base tables pointer, and +total length of the tables. 
*/ + +#define lcc_offset 0 /* Lower case */ +#define fcc_offset 256 /* Flip case */ +#define cbits_offset 512 /* Character classes */ +#define ctypes_offset (cbits_offset + cbit_length) /* Character types */ +#define TABLES_LENGTH (ctypes_offset + 256) + + +/* -------------------- Character and string names ------------------------ */ + +/* If PCRE2 is to support UTF-8 on EBCDIC platforms, we cannot use normal +character constants like '*' because the compiler would emit their EBCDIC code, +which is different from their ASCII/UTF-8 code. Instead we define macros for +the characters so that they always use the ASCII/UTF-8 code when UTF-8 support +is enabled. When UTF-8 support is not enabled, the definitions use character +literals. Both character and string versions of each character are needed, and +there are some longer strings as well. + +This means that, on EBCDIC platforms, the PCRE2 library can handle either +EBCDIC, or UTF-8, but not both. To support both in the same compiled library +would need different lookups depending on whether PCRE2_UTF was set or not. +This would make it impossible to use characters in switch/case statements, +which would reduce performance. For a theoretical use (which nobody has asked +for) in a minority area (EBCDIC platforms), this is not sensible. Any +application that did need both could compile two versions of the library, using +macros to give the functions distinct names. */ + +#ifndef SUPPORT_UNICODE + +/* UTF-8 support is not enabled; use the platform-dependent character literals +so that PCRE2 works in both ASCII and EBCDIC environments, but only in non-UTF +mode. Newline characters are problematic in EBCDIC. Though it has CR and LF +characters, a common practice has been to use its NL (0x15) character as the +line terminator in C-like processing environments. 
However, sometimes the LF +(0x25) character is used instead, according to this Unicode document: + +http://unicode.org/standard/reports/tr13/tr13-5.html + +PCRE2 defaults EBCDIC NL to 0x15, but has a build-time option to select 0x25 +instead. Whichever is *not* chosen is defined as NEL. + +In both ASCII and EBCDIC environments, CHAR_NL and CHAR_LF are synonyms for the +same code point. */ + +#ifdef EBCDIC + +#ifndef EBCDIC_NL25 +#define CHAR_NL '\x15' +#define CHAR_NEL '\x25' +#define STR_NL "\x15" +#define STR_NEL "\x25" +#else +#define CHAR_NL '\x25' +#define CHAR_NEL '\x15' +#define STR_NL "\x25" +#define STR_NEL "\x15" +#endif + +#define CHAR_LF CHAR_NL +#define STR_LF STR_NL + +#define CHAR_ESC '\047' +#define CHAR_DEL '\007' +#define CHAR_NBSP ((unsigned char)'\x41') +#define STR_ESC "\047" +#define STR_DEL "\007" + +#else /* Not EBCDIC */ + +/* In ASCII/Unicode, linefeed is '\n' and we equate this to NL for +compatibility. NEL is the Unicode newline character; make sure it is +a positive value. */ + +#define CHAR_LF '\n' +#define CHAR_NL CHAR_LF +#define CHAR_NEL ((unsigned char)'\x85') +#define CHAR_ESC '\033' +#define CHAR_DEL '\177' +#define CHAR_NBSP ((unsigned char)'\xa0') + +#define STR_LF "\n" +#define STR_NL STR_LF +#define STR_NEL "\x85" +#define STR_ESC "\033" +#define STR_DEL "\177" + +#endif /* EBCDIC */ + +/* The remaining definitions work in both environments. */ + +#define CHAR_NUL '\0' +#define CHAR_HT '\t' +#define CHAR_VT '\v' +#define CHAR_FF '\f' +#define CHAR_CR '\r' +#define CHAR_BS '\b' +#define CHAR_BEL '\a' + +#define CHAR_SPACE ' ' +#define CHAR_EXCLAMATION_MARK '!' +#define CHAR_QUOTATION_MARK '"' +#define CHAR_NUMBER_SIGN '#' +#define CHAR_DOLLAR_SIGN '$' +#define CHAR_PERCENT_SIGN '%' +#define CHAR_AMPERSAND '&' +#define CHAR_APOSTROPHE '\'' +#define CHAR_LEFT_PARENTHESIS '(' +#define CHAR_RIGHT_PARENTHESIS ')' +#define CHAR_ASTERISK '*' +#define CHAR_PLUS '+' +#define CHAR_COMMA ',' +#define CHAR_MINUS '-' +#define CHAR_DOT '.' 
+#define CHAR_SLASH '/' +#define CHAR_0 '0' +#define CHAR_1 '1' +#define CHAR_2 '2' +#define CHAR_3 '3' +#define CHAR_4 '4' +#define CHAR_5 '5' +#define CHAR_6 '6' +#define CHAR_7 '7' +#define CHAR_8 '8' +#define CHAR_9 '9' +#define CHAR_COLON ':' +#define CHAR_SEMICOLON ';' +#define CHAR_LESS_THAN_SIGN '<' +#define CHAR_EQUALS_SIGN '=' +#define CHAR_GREATER_THAN_SIGN '>' +#define CHAR_QUESTION_MARK '?' +#define CHAR_COMMERCIAL_AT '@' +#define CHAR_A 'A' +#define CHAR_B 'B' +#define CHAR_C 'C' +#define CHAR_D 'D' +#define CHAR_E 'E' +#define CHAR_F 'F' +#define CHAR_G 'G' +#define CHAR_H 'H' +#define CHAR_I 'I' +#define CHAR_J 'J' +#define CHAR_K 'K' +#define CHAR_L 'L' +#define CHAR_M 'M' +#define CHAR_N 'N' +#define CHAR_O 'O' +#define CHAR_P 'P' +#define CHAR_Q 'Q' +#define CHAR_R 'R' +#define CHAR_S 'S' +#define CHAR_T 'T' +#define CHAR_U 'U' +#define CHAR_V 'V' +#define CHAR_W 'W' +#define CHAR_X 'X' +#define CHAR_Y 'Y' +#define CHAR_Z 'Z' +#define CHAR_LEFT_SQUARE_BRACKET '[' +#define CHAR_BACKSLASH '\\' +#define CHAR_RIGHT_SQUARE_BRACKET ']' +#define CHAR_CIRCUMFLEX_ACCENT '^' +#define CHAR_UNDERSCORE '_' +#define CHAR_GRAVE_ACCENT '`' +#define CHAR_a 'a' +#define CHAR_b 'b' +#define CHAR_c 'c' +#define CHAR_d 'd' +#define CHAR_e 'e' +#define CHAR_f 'f' +#define CHAR_g 'g' +#define CHAR_h 'h' +#define CHAR_i 'i' +#define CHAR_j 'j' +#define CHAR_k 'k' +#define CHAR_l 'l' +#define CHAR_m 'm' +#define CHAR_n 'n' +#define CHAR_o 'o' +#define CHAR_p 'p' +#define CHAR_q 'q' +#define CHAR_r 'r' +#define CHAR_s 's' +#define CHAR_t 't' +#define CHAR_u 'u' +#define CHAR_v 'v' +#define CHAR_w 'w' +#define CHAR_x 'x' +#define CHAR_y 'y' +#define CHAR_z 'z' +#define CHAR_LEFT_CURLY_BRACKET '{' +#define CHAR_VERTICAL_LINE '|' +#define CHAR_RIGHT_CURLY_BRACKET '}' +#define CHAR_TILDE '~' + +#define STR_HT "\t" +#define STR_VT "\v" +#define STR_FF "\f" +#define STR_CR "\r" +#define STR_BS "\b" +#define STR_BEL "\a" + +#define STR_SPACE " " +#define STR_EXCLAMATION_MARK "!" 
+#define STR_QUOTATION_MARK "\"" +#define STR_NUMBER_SIGN "#" +#define STR_DOLLAR_SIGN "$" +#define STR_PERCENT_SIGN "%" +#define STR_AMPERSAND "&" +#define STR_APOSTROPHE "'" +#define STR_LEFT_PARENTHESIS "(" +#define STR_RIGHT_PARENTHESIS ")" +#define STR_ASTERISK "*" +#define STR_PLUS "+" +#define STR_COMMA "," +#define STR_MINUS "-" +#define STR_DOT "." +#define STR_SLASH "/" +#define STR_0 "0" +#define STR_1 "1" +#define STR_2 "2" +#define STR_3 "3" +#define STR_4 "4" +#define STR_5 "5" +#define STR_6 "6" +#define STR_7 "7" +#define STR_8 "8" +#define STR_9 "9" +#define STR_COLON ":" +#define STR_SEMICOLON ";" +#define STR_LESS_THAN_SIGN "<" +#define STR_EQUALS_SIGN "=" +#define STR_GREATER_THAN_SIGN ">" +#define STR_QUESTION_MARK "?" +#define STR_COMMERCIAL_AT "@" +#define STR_A "A" +#define STR_B "B" +#define STR_C "C" +#define STR_D "D" +#define STR_E "E" +#define STR_F "F" +#define STR_G "G" +#define STR_H "H" +#define STR_I "I" +#define STR_J "J" +#define STR_K "K" +#define STR_L "L" +#define STR_M "M" +#define STR_N "N" +#define STR_O "O" +#define STR_P "P" +#define STR_Q "Q" +#define STR_R "R" +#define STR_S "S" +#define STR_T "T" +#define STR_U "U" +#define STR_V "V" +#define STR_W "W" +#define STR_X "X" +#define STR_Y "Y" +#define STR_Z "Z" +#define STR_LEFT_SQUARE_BRACKET "[" +#define STR_BACKSLASH "\\" +#define STR_RIGHT_SQUARE_BRACKET "]" +#define STR_CIRCUMFLEX_ACCENT "^" +#define STR_UNDERSCORE "_" +#define STR_GRAVE_ACCENT "`" +#define STR_a "a" +#define STR_b "b" +#define STR_c "c" +#define STR_d "d" +#define STR_e "e" +#define STR_f "f" +#define STR_g "g" +#define STR_h "h" +#define STR_i "i" +#define STR_j "j" +#define STR_k "k" +#define STR_l "l" +#define STR_m "m" +#define STR_n "n" +#define STR_o "o" +#define STR_p "p" +#define STR_q "q" +#define STR_r "r" +#define STR_s "s" +#define STR_t "t" +#define STR_u "u" +#define STR_v "v" +#define STR_w "w" +#define STR_x "x" +#define STR_y "y" +#define STR_z "z" +#define STR_LEFT_CURLY_BRACKET 
"{" +#define STR_VERTICAL_LINE "|" +#define STR_RIGHT_CURLY_BRACKET "}" +#define STR_TILDE "~" + +#define STRING_ACCEPT0 "ACCEPT\0" +#define STRING_COMMIT0 "COMMIT\0" +#define STRING_F0 "F\0" +#define STRING_FAIL0 "FAIL\0" +#define STRING_MARK0 "MARK\0" +#define STRING_PRUNE0 "PRUNE\0" +#define STRING_SKIP0 "SKIP\0" +#define STRING_THEN "THEN" + +#define STRING_atomic0 "atomic\0" +#define STRING_pla0 "pla\0" +#define STRING_plb0 "plb\0" +#define STRING_napla0 "napla\0" +#define STRING_naplb0 "naplb\0" +#define STRING_nla0 "nla\0" +#define STRING_nlb0 "nlb\0" +#define STRING_sr0 "sr\0" +#define STRING_asr0 "asr\0" +#define STRING_positive_lookahead0 "positive_lookahead\0" +#define STRING_positive_lookbehind0 "positive_lookbehind\0" +#define STRING_non_atomic_positive_lookahead0 "non_atomic_positive_lookahead\0" +#define STRING_non_atomic_positive_lookbehind0 "non_atomic_positive_lookbehind\0" +#define STRING_negative_lookahead0 "negative_lookahead\0" +#define STRING_negative_lookbehind0 "negative_lookbehind\0" +#define STRING_script_run0 "script_run\0" +#define STRING_atomic_script_run "atomic_script_run" + +#define STRING_alpha0 "alpha\0" +#define STRING_lower0 "lower\0" +#define STRING_upper0 "upper\0" +#define STRING_alnum0 "alnum\0" +#define STRING_ascii0 "ascii\0" +#define STRING_blank0 "blank\0" +#define STRING_cntrl0 "cntrl\0" +#define STRING_digit0 "digit\0" +#define STRING_graph0 "graph\0" +#define STRING_print0 "print\0" +#define STRING_punct0 "punct\0" +#define STRING_space0 "space\0" +#define STRING_word0 "word\0" +#define STRING_xdigit "xdigit" + +#define STRING_DEFINE "DEFINE" +#define STRING_VERSION "VERSION" +#define STRING_WEIRD_STARTWORD "[:<:]]" +#define STRING_WEIRD_ENDWORD "[:>:]]" + +#define STRING_CR_RIGHTPAR "CR)" +#define STRING_LF_RIGHTPAR "LF)" +#define STRING_CRLF_RIGHTPAR "CRLF)" +#define STRING_ANY_RIGHTPAR "ANY)" +#define STRING_ANYCRLF_RIGHTPAR "ANYCRLF)" +#define STRING_NUL_RIGHTPAR "NUL)" +#define STRING_BSR_ANYCRLF_RIGHTPAR 
"BSR_ANYCRLF)" +#define STRING_BSR_UNICODE_RIGHTPAR "BSR_UNICODE)" +#define STRING_UTF8_RIGHTPAR "UTF8)" +#define STRING_UTF16_RIGHTPAR "UTF16)" +#define STRING_UTF32_RIGHTPAR "UTF32)" +#define STRING_UTF_RIGHTPAR "UTF)" +#define STRING_UCP_RIGHTPAR "UCP)" +#define STRING_NO_AUTO_POSSESS_RIGHTPAR "NO_AUTO_POSSESS)" +#define STRING_NO_DOTSTAR_ANCHOR_RIGHTPAR "NO_DOTSTAR_ANCHOR)" +#define STRING_NO_JIT_RIGHTPAR "NO_JIT)" +#define STRING_NO_START_OPT_RIGHTPAR "NO_START_OPT)" +#define STRING_NOTEMPTY_RIGHTPAR "NOTEMPTY)" +#define STRING_NOTEMPTY_ATSTART_RIGHTPAR "NOTEMPTY_ATSTART)" +#define STRING_LIMIT_HEAP_EQ "LIMIT_HEAP=" +#define STRING_LIMIT_MATCH_EQ "LIMIT_MATCH=" +#define STRING_LIMIT_DEPTH_EQ "LIMIT_DEPTH=" +#define STRING_LIMIT_RECURSION_EQ "LIMIT_RECURSION=" +#define STRING_MARK "MARK" + +#else /* SUPPORT_UNICODE */ + +/* UTF-8 support is enabled; always use UTF-8 (=ASCII) character codes. This +works in both modes non-EBCDIC platforms, and on EBCDIC platforms in UTF-8 mode +only. 
*/ + +#define CHAR_HT '\011' +#define CHAR_VT '\013' +#define CHAR_FF '\014' +#define CHAR_CR '\015' +#define CHAR_LF '\012' +#define CHAR_NL CHAR_LF +#define CHAR_NEL ((unsigned char)'\x85') +#define CHAR_BS '\010' +#define CHAR_BEL '\007' +#define CHAR_ESC '\033' +#define CHAR_DEL '\177' + +#define CHAR_NUL '\0' +#define CHAR_SPACE '\040' +#define CHAR_EXCLAMATION_MARK '\041' +#define CHAR_QUOTATION_MARK '\042' +#define CHAR_NUMBER_SIGN '\043' +#define CHAR_DOLLAR_SIGN '\044' +#define CHAR_PERCENT_SIGN '\045' +#define CHAR_AMPERSAND '\046' +#define CHAR_APOSTROPHE '\047' +#define CHAR_LEFT_PARENTHESIS '\050' +#define CHAR_RIGHT_PARENTHESIS '\051' +#define CHAR_ASTERISK '\052' +#define CHAR_PLUS '\053' +#define CHAR_COMMA '\054' +#define CHAR_MINUS '\055' +#define CHAR_DOT '\056' +#define CHAR_SLASH '\057' +#define CHAR_0 '\060' +#define CHAR_1 '\061' +#define CHAR_2 '\062' +#define CHAR_3 '\063' +#define CHAR_4 '\064' +#define CHAR_5 '\065' +#define CHAR_6 '\066' +#define CHAR_7 '\067' +#define CHAR_8 '\070' +#define CHAR_9 '\071' +#define CHAR_COLON '\072' +#define CHAR_SEMICOLON '\073' +#define CHAR_LESS_THAN_SIGN '\074' +#define CHAR_EQUALS_SIGN '\075' +#define CHAR_GREATER_THAN_SIGN '\076' +#define CHAR_QUESTION_MARK '\077' +#define CHAR_COMMERCIAL_AT '\100' +#define CHAR_A '\101' +#define CHAR_B '\102' +#define CHAR_C '\103' +#define CHAR_D '\104' +#define CHAR_E '\105' +#define CHAR_F '\106' +#define CHAR_G '\107' +#define CHAR_H '\110' +#define CHAR_I '\111' +#define CHAR_J '\112' +#define CHAR_K '\113' +#define CHAR_L '\114' +#define CHAR_M '\115' +#define CHAR_N '\116' +#define CHAR_O '\117' +#define CHAR_P '\120' +#define CHAR_Q '\121' +#define CHAR_R '\122' +#define CHAR_S '\123' +#define CHAR_T '\124' +#define CHAR_U '\125' +#define CHAR_V '\126' +#define CHAR_W '\127' +#define CHAR_X '\130' +#define CHAR_Y '\131' +#define CHAR_Z '\132' +#define CHAR_LEFT_SQUARE_BRACKET '\133' +#define CHAR_BACKSLASH '\134' +#define CHAR_RIGHT_SQUARE_BRACKET '\135' 
+#define CHAR_CIRCUMFLEX_ACCENT '\136' +#define CHAR_UNDERSCORE '\137' +#define CHAR_GRAVE_ACCENT '\140' +#define CHAR_a '\141' +#define CHAR_b '\142' +#define CHAR_c '\143' +#define CHAR_d '\144' +#define CHAR_e '\145' +#define CHAR_f '\146' +#define CHAR_g '\147' +#define CHAR_h '\150' +#define CHAR_i '\151' +#define CHAR_j '\152' +#define CHAR_k '\153' +#define CHAR_l '\154' +#define CHAR_m '\155' +#define CHAR_n '\156' +#define CHAR_o '\157' +#define CHAR_p '\160' +#define CHAR_q '\161' +#define CHAR_r '\162' +#define CHAR_s '\163' +#define CHAR_t '\164' +#define CHAR_u '\165' +#define CHAR_v '\166' +#define CHAR_w '\167' +#define CHAR_x '\170' +#define CHAR_y '\171' +#define CHAR_z '\172' +#define CHAR_LEFT_CURLY_BRACKET '\173' +#define CHAR_VERTICAL_LINE '\174' +#define CHAR_RIGHT_CURLY_BRACKET '\175' +#define CHAR_TILDE '\176' +#define CHAR_NBSP ((unsigned char)'\xa0') + +#define STR_HT "\011" +#define STR_VT "\013" +#define STR_FF "\014" +#define STR_CR "\015" +#define STR_NL "\012" +#define STR_BS "\010" +#define STR_BEL "\007" +#define STR_ESC "\033" +#define STR_DEL "\177" + +#define STR_SPACE "\040" +#define STR_EXCLAMATION_MARK "\041" +#define STR_QUOTATION_MARK "\042" +#define STR_NUMBER_SIGN "\043" +#define STR_DOLLAR_SIGN "\044" +#define STR_PERCENT_SIGN "\045" +#define STR_AMPERSAND "\046" +#define STR_APOSTROPHE "\047" +#define STR_LEFT_PARENTHESIS "\050" +#define STR_RIGHT_PARENTHESIS "\051" +#define STR_ASTERISK "\052" +#define STR_PLUS "\053" +#define STR_COMMA "\054" +#define STR_MINUS "\055" +#define STR_DOT "\056" +#define STR_SLASH "\057" +#define STR_0 "\060" +#define STR_1 "\061" +#define STR_2 "\062" +#define STR_3 "\063" +#define STR_4 "\064" +#define STR_5 "\065" +#define STR_6 "\066" +#define STR_7 "\067" +#define STR_8 "\070" +#define STR_9 "\071" +#define STR_COLON "\072" +#define STR_SEMICOLON "\073" +#define STR_LESS_THAN_SIGN "\074" +#define STR_EQUALS_SIGN "\075" +#define STR_GREATER_THAN_SIGN "\076" +#define STR_QUESTION_MARK 
"\077" +#define STR_COMMERCIAL_AT "\100" +#define STR_A "\101" +#define STR_B "\102" +#define STR_C "\103" +#define STR_D "\104" +#define STR_E "\105" +#define STR_F "\106" +#define STR_G "\107" +#define STR_H "\110" +#define STR_I "\111" +#define STR_J "\112" +#define STR_K "\113" +#define STR_L "\114" +#define STR_M "\115" +#define STR_N "\116" +#define STR_O "\117" +#define STR_P "\120" +#define STR_Q "\121" +#define STR_R "\122" +#define STR_S "\123" +#define STR_T "\124" +#define STR_U "\125" +#define STR_V "\126" +#define STR_W "\127" +#define STR_X "\130" +#define STR_Y "\131" +#define STR_Z "\132" +#define STR_LEFT_SQUARE_BRACKET "\133" +#define STR_BACKSLASH "\134" +#define STR_RIGHT_SQUARE_BRACKET "\135" +#define STR_CIRCUMFLEX_ACCENT "\136" +#define STR_UNDERSCORE "\137" +#define STR_GRAVE_ACCENT "\140" +#define STR_a "\141" +#define STR_b "\142" +#define STR_c "\143" +#define STR_d "\144" +#define STR_e "\145" +#define STR_f "\146" +#define STR_g "\147" +#define STR_h "\150" +#define STR_i "\151" +#define STR_j "\152" +#define STR_k "\153" +#define STR_l "\154" +#define STR_m "\155" +#define STR_n "\156" +#define STR_o "\157" +#define STR_p "\160" +#define STR_q "\161" +#define STR_r "\162" +#define STR_s "\163" +#define STR_t "\164" +#define STR_u "\165" +#define STR_v "\166" +#define STR_w "\167" +#define STR_x "\170" +#define STR_y "\171" +#define STR_z "\172" +#define STR_LEFT_CURLY_BRACKET "\173" +#define STR_VERTICAL_LINE "\174" +#define STR_RIGHT_CURLY_BRACKET "\175" +#define STR_TILDE "\176" + +#define STRING_ACCEPT0 STR_A STR_C STR_C STR_E STR_P STR_T "\0" +#define STRING_COMMIT0 STR_C STR_O STR_M STR_M STR_I STR_T "\0" +#define STRING_F0 STR_F "\0" +#define STRING_FAIL0 STR_F STR_A STR_I STR_L "\0" +#define STRING_MARK0 STR_M STR_A STR_R STR_K "\0" +#define STRING_PRUNE0 STR_P STR_R STR_U STR_N STR_E "\0" +#define STRING_SKIP0 STR_S STR_K STR_I STR_P "\0" +#define STRING_THEN STR_T STR_H STR_E STR_N + +#define STRING_atomic0 STR_a STR_t STR_o 
STR_m STR_i STR_c "\0" +#define STRING_pla0 STR_p STR_l STR_a "\0" +#define STRING_plb0 STR_p STR_l STR_b "\0" +#define STRING_napla0 STR_n STR_a STR_p STR_l STR_a "\0" +#define STRING_naplb0 STR_n STR_a STR_p STR_l STR_b "\0" +#define STRING_nla0 STR_n STR_l STR_a "\0" +#define STRING_nlb0 STR_n STR_l STR_b "\0" +#define STRING_sr0 STR_s STR_r "\0" +#define STRING_asr0 STR_a STR_s STR_r "\0" +#define STRING_positive_lookahead0 STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0" +#define STRING_positive_lookbehind0 STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0" +#define STRING_non_atomic_positive_lookahead0 STR_n STR_o STR_n STR_UNDERSCORE STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0" +#define STRING_non_atomic_positive_lookbehind0 STR_n STR_o STR_n STR_UNDERSCORE STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_p STR_o STR_s STR_i STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0" +#define STRING_negative_lookahead0 STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_a STR_h STR_e STR_a STR_d "\0" +#define STRING_negative_lookbehind0 STR_n STR_e STR_g STR_a STR_t STR_i STR_v STR_e STR_UNDERSCORE STR_l STR_o STR_o STR_k STR_b STR_e STR_h STR_i STR_n STR_d "\0" +#define STRING_script_run0 STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n "\0" +#define STRING_atomic_script_run STR_a STR_t STR_o STR_m STR_i STR_c STR_UNDERSCORE STR_s STR_c STR_r STR_i STR_p STR_t STR_UNDERSCORE STR_r STR_u STR_n + +#define STRING_alpha0 STR_a STR_l STR_p STR_h STR_a "\0" +#define STRING_lower0 STR_l STR_o STR_w STR_e STR_r "\0" +#define STRING_upper0 STR_u STR_p STR_p STR_e STR_r "\0" +#define 
STRING_alnum0 STR_a STR_l STR_n STR_u STR_m "\0" +#define STRING_ascii0 STR_a STR_s STR_c STR_i STR_i "\0" +#define STRING_blank0 STR_b STR_l STR_a STR_n STR_k "\0" +#define STRING_cntrl0 STR_c STR_n STR_t STR_r STR_l "\0" +#define STRING_digit0 STR_d STR_i STR_g STR_i STR_t "\0" +#define STRING_graph0 STR_g STR_r STR_a STR_p STR_h "\0" +#define STRING_print0 STR_p STR_r STR_i STR_n STR_t "\0" +#define STRING_punct0 STR_p STR_u STR_n STR_c STR_t "\0" +#define STRING_space0 STR_s STR_p STR_a STR_c STR_e "\0" +#define STRING_word0 STR_w STR_o STR_r STR_d "\0" +#define STRING_xdigit STR_x STR_d STR_i STR_g STR_i STR_t + +#define STRING_DEFINE STR_D STR_E STR_F STR_I STR_N STR_E +#define STRING_VERSION STR_V STR_E STR_R STR_S STR_I STR_O STR_N +#define STRING_WEIRD_STARTWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_LESS_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET +#define STRING_WEIRD_ENDWORD STR_LEFT_SQUARE_BRACKET STR_COLON STR_GREATER_THAN_SIGN STR_COLON STR_RIGHT_SQUARE_BRACKET STR_RIGHT_SQUARE_BRACKET + +#define STRING_CR_RIGHTPAR STR_C STR_R STR_RIGHT_PARENTHESIS +#define STRING_LF_RIGHTPAR STR_L STR_F STR_RIGHT_PARENTHESIS +#define STRING_CRLF_RIGHTPAR STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS +#define STRING_ANY_RIGHTPAR STR_A STR_N STR_Y STR_RIGHT_PARENTHESIS +#define STRING_ANYCRLF_RIGHTPAR STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS +#define STRING_NUL_RIGHTPAR STR_N STR_U STR_L STR_RIGHT_PARENTHESIS +#define STRING_BSR_ANYCRLF_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_A STR_N STR_Y STR_C STR_R STR_L STR_F STR_RIGHT_PARENTHESIS +#define STRING_BSR_UNICODE_RIGHTPAR STR_B STR_S STR_R STR_UNDERSCORE STR_U STR_N STR_I STR_C STR_O STR_D STR_E STR_RIGHT_PARENTHESIS +#define STRING_UTF8_RIGHTPAR STR_U STR_T STR_F STR_8 STR_RIGHT_PARENTHESIS +#define STRING_UTF16_RIGHTPAR STR_U STR_T STR_F STR_1 STR_6 STR_RIGHT_PARENTHESIS +#define STRING_UTF32_RIGHTPAR STR_U STR_T STR_F STR_3 STR_2 STR_RIGHT_PARENTHESIS +#define 
STRING_UTF_RIGHTPAR STR_U STR_T STR_F STR_RIGHT_PARENTHESIS +#define STRING_UCP_RIGHTPAR STR_U STR_C STR_P STR_RIGHT_PARENTHESIS +#define STRING_NO_AUTO_POSSESS_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_A STR_U STR_T STR_O STR_UNDERSCORE STR_P STR_O STR_S STR_S STR_E STR_S STR_S STR_RIGHT_PARENTHESIS +#define STRING_NO_DOTSTAR_ANCHOR_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_D STR_O STR_T STR_S STR_T STR_A STR_R STR_UNDERSCORE STR_A STR_N STR_C STR_H STR_O STR_R STR_RIGHT_PARENTHESIS +#define STRING_NO_JIT_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_J STR_I STR_T STR_RIGHT_PARENTHESIS +#define STRING_NO_START_OPT_RIGHTPAR STR_N STR_O STR_UNDERSCORE STR_S STR_T STR_A STR_R STR_T STR_UNDERSCORE STR_O STR_P STR_T STR_RIGHT_PARENTHESIS +#define STRING_NOTEMPTY_RIGHTPAR STR_N STR_O STR_T STR_E STR_M STR_P STR_T STR_Y STR_RIGHT_PARENTHESIS +#define STRING_NOTEMPTY_ATSTART_RIGHTPAR STR_N STR_O STR_T STR_E STR_M STR_P STR_T STR_Y STR_UNDERSCORE STR_A STR_T STR_S STR_T STR_A STR_R STR_T STR_RIGHT_PARENTHESIS +#define STRING_LIMIT_HEAP_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_H STR_E STR_A STR_P STR_EQUALS_SIGN +#define STRING_LIMIT_MATCH_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_M STR_A STR_T STR_C STR_H STR_EQUALS_SIGN +#define STRING_LIMIT_DEPTH_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_D STR_E STR_P STR_T STR_H STR_EQUALS_SIGN +#define STRING_LIMIT_RECURSION_EQ STR_L STR_I STR_M STR_I STR_T STR_UNDERSCORE STR_R STR_E STR_C STR_U STR_R STR_S STR_I STR_O STR_N STR_EQUALS_SIGN +#define STRING_MARK STR_M STR_A STR_R STR_K + +#endif /* SUPPORT_UNICODE */ + +/* -------------------- End of character and string names -------------------*/ + +/* -------------------- Definitions for compiled patterns -------------------*/ + +/* Codes for different types of Unicode property */ + +#define PT_ANY 0 /* Any property - matches all chars */ +#define PT_LAMP 1 /* L& - the union of Lu, Ll, Lt */ +#define PT_GC 2 /* Specified general characteristic (e.g. 
L) */ +#define PT_PC 3 /* Specified particular characteristic (e.g. Lu) */ +#define PT_SC 4 /* Script (e.g. Han) */ +#define PT_ALNUM 5 /* Alphanumeric - the union of L and N */ +#define PT_SPACE 6 /* Perl space - Z plus 9,10,12,13 */ +#define PT_PXSPACE 7 /* POSIX space - Z plus 9,10,11,12,13 */ +#define PT_WORD 8 /* Word - L plus N plus underscore */ +#define PT_CLIST 9 /* Pseudo-property: match character list */ +#define PT_UCNC 10 /* Universal Character nameable character */ +#define PT_TABSIZE 11 /* Size of square table for autopossessify tests */ + +/* The following special properties are used only in XCLASS items, when POSIX +classes are specified and PCRE2_UCP is set - in other words, for Unicode +handling of these classes. They are not available via the \p or \P escapes like +those in the above list, and so they do not take part in the autopossessifying +table. */ + +#define PT_PXGRAPH 11 /* [:graph:] - characters that mark the paper */ +#define PT_PXPRINT 12 /* [:print:] - [:graph:] plus non-control spaces */ +#define PT_PXPUNCT 13 /* [:punct:] - punctuation characters */ + +/* Flag bits and data types for the extended class (OP_XCLASS) for classes that +contain characters with values greater than 255. */ + +#define XCL_NOT 0x01 /* Flag: this is a negative class */ +#define XCL_MAP 0x02 /* Flag: a 32-byte map is present */ +#define XCL_HASPROP 0x04 /* Flag: property checks are present. */ + +#define XCL_END 0 /* Marks end of individual items */ +#define XCL_SINGLE 1 /* Single item (one multibyte char) follows */ +#define XCL_RANGE 2 /* A range (two multibyte chars) follows */ +#define XCL_PROP 3 /* Unicode property (2-byte property code follows) */ +#define XCL_NOTPROP 4 /* Unicode inverted property (ditto) */ + +/* These are escaped items that aren't just an encoding of a particular data +value such as \n. They must have non-zero values, as check_escape() returns 0 +for a data character. 
In the escapes[] table in pcre2_compile.c their values +are negated in order to distinguish them from data values. + +They must appear here in the same order as in the opcode definitions below, up +to ESC_z. There's a dummy for OP_ALLANY because it corresponds to "." in DOTALL +mode rather than an escape sequence. It is also used for [^] in JavaScript +compatibility mode, and for \C in non-utf mode. In non-DOTALL mode, "." behaves +like \N. + +Negative numbers are used to encode a backreference (\1, \2, \3, etc.) in +check_escape(). There are tests in the code for an escape greater than ESC_b +and less than ESC_Z to detect the types that may be repeated. These are the +types that consume characters. If any new escapes are put in between that don't +consume a character, that code will have to change. */ + +enum { ESC_A = 1, ESC_G, ESC_K, ESC_B, ESC_b, ESC_D, ESC_d, ESC_S, ESC_s, + ESC_W, ESC_w, ESC_N, ESC_dum, ESC_C, ESC_P, ESC_p, ESC_R, ESC_H, + ESC_h, ESC_V, ESC_v, ESC_X, ESC_Z, ESC_z, + ESC_E, ESC_Q, ESC_g, ESC_k }; + + +/********************** Opcode definitions ******************/ + +/****** NOTE NOTE NOTE ****** + +Starting from 1 (i.e. after OP_END), the values up to OP_EOD must correspond in +order to the list of escapes immediately above. Furthermore, values up to +OP_DOLLM must not be changed without adjusting the table called autoposstab in +pcre2_auto_possess.c. + +Whenever this list is updated, the two macro definitions that follow must be +updated to match. The possessification table called "opcode_possessify" in +pcre2_compile.c must also be updated, and also the tables called "coptable" +and "poptable" in pcre2_dfa_match.c. + +****** NOTE NOTE NOTE ******/ + + +/* The values between FIRST_AUTOTAB_OP and LAST_AUTOTAB_RIGHT_OP, inclusive, +are used in a table for deciding whether a repeated character type can be +auto-possessified. 
*/ + +#define FIRST_AUTOTAB_OP OP_NOT_DIGIT +#define LAST_AUTOTAB_LEFT_OP OP_EXTUNI +#define LAST_AUTOTAB_RIGHT_OP OP_DOLLM + +enum { + OP_END, /* 0 End of pattern */ + + /* Values corresponding to backslashed metacharacters */ + + OP_SOD, /* 1 Start of data: \A */ + OP_SOM, /* 2 Start of match (subject + offset): \G */ + OP_SET_SOM, /* 3 Set start of match (\K) */ + OP_NOT_WORD_BOUNDARY, /* 4 \B */ + OP_WORD_BOUNDARY, /* 5 \b */ + OP_NOT_DIGIT, /* 6 \D */ + OP_DIGIT, /* 7 \d */ + OP_NOT_WHITESPACE, /* 8 \S */ + OP_WHITESPACE, /* 9 \s */ + OP_NOT_WORDCHAR, /* 10 \W */ + OP_WORDCHAR, /* 11 \w */ + + OP_ANY, /* 12 Match any character except newline (\N) */ + OP_ALLANY, /* 13 Match any character */ + OP_ANYBYTE, /* 14 Match any byte (\C); different to OP_ANY for UTF-8 */ + OP_NOTPROP, /* 15 \P (not Unicode property) */ + OP_PROP, /* 16 \p (Unicode property) */ + OP_ANYNL, /* 17 \R (any newline sequence) */ + OP_NOT_HSPACE, /* 18 \H (not horizontal whitespace) */ + OP_HSPACE, /* 19 \h (horizontal whitespace) */ + OP_NOT_VSPACE, /* 20 \V (not vertical whitespace) */ + OP_VSPACE, /* 21 \v (vertical whitespace) */ + OP_EXTUNI, /* 22 \X (extended Unicode sequence) */ + OP_EODN, /* 23 End of data or \n at end of data (\Z) */ + OP_EOD, /* 24 End of data (\z) */ + + /* Line end assertions */ + + OP_DOLL, /* 25 End of line - not multiline */ + OP_DOLLM, /* 26 End of line - multiline */ + OP_CIRC, /* 27 Start of line - not multiline */ + OP_CIRCM, /* 28 Start of line - multiline */ + + /* Single characters; caseful must precede the caseless ones, and these + must remain in this order, and adjacent. 
*/ + + OP_CHAR, /* 29 Match one character, casefully */ + OP_CHARI, /* 30 Match one character, caselessly */ + OP_NOT, /* 31 Match one character, not the given one, casefully */ + OP_NOTI, /* 32 Match one character, not the given one, caselessly */ + + /* The following sets of 13 opcodes must always be kept in step because + the offset from the first one is used to generate the others. */ + + /* Repeated characters; caseful must precede the caseless ones */ + + OP_STAR, /* 33 The maximizing and minimizing versions of */ + OP_MINSTAR, /* 34 these six opcodes must come in pairs, with */ + OP_PLUS, /* 35 the minimizing one second. */ + OP_MINPLUS, /* 36 */ + OP_QUERY, /* 37 */ + OP_MINQUERY, /* 38 */ + + OP_UPTO, /* 39 From 0 to n matches of one character, caseful*/ + OP_MINUPTO, /* 40 */ + OP_EXACT, /* 41 Exactly n matches */ + + OP_POSSTAR, /* 42 Possessified star, caseful */ + OP_POSPLUS, /* 43 Possessified plus, caseful */ + OP_POSQUERY, /* 44 Possessified query, caseful */ + OP_POSUPTO, /* 45 Possessified upto, caseful */ + + /* Repeated characters; caseless must follow the caseful ones */ + + OP_STARI, /* 46 */ + OP_MINSTARI, /* 47 */ + OP_PLUSI, /* 48 */ + OP_MINPLUSI, /* 49 */ + OP_QUERYI, /* 50 */ + OP_MINQUERYI, /* 51 */ + + OP_UPTOI, /* 52 From 0 to n matches of one character, caseless */ + OP_MINUPTOI, /* 53 */ + OP_EXACTI, /* 54 */ + + OP_POSSTARI, /* 55 Possessified star, caseless */ + OP_POSPLUSI, /* 56 Possessified plus, caseless */ + OP_POSQUERYI, /* 57 Possessified query, caseless */ + OP_POSUPTOI, /* 58 Possessified upto, caseless */ + + /* The negated ones must follow the non-negated ones, and match them */ + /* Negated repeated character, caseful; must precede the caseless ones */ + + OP_NOTSTAR, /* 59 The maximizing and minimizing versions of */ + OP_NOTMINSTAR, /* 60 these six opcodes must come in pairs, with */ + OP_NOTPLUS, /* 61 the minimizing one second. They must be in */ + OP_NOTMINPLUS, /* 62 exactly the same order as those above. 
*/ + OP_NOTQUERY, /* 63 */ + OP_NOTMINQUERY, /* 64 */ + + OP_NOTUPTO, /* 65 From 0 to n matches, caseful */ + OP_NOTMINUPTO, /* 66 */ + OP_NOTEXACT, /* 67 Exactly n matches */ + + OP_NOTPOSSTAR, /* 68 Possessified versions, caseful */ + OP_NOTPOSPLUS, /* 69 */ + OP_NOTPOSQUERY, /* 70 */ + OP_NOTPOSUPTO, /* 71 */ + + /* Negated repeated character, caseless; must follow the caseful ones */ + + OP_NOTSTARI, /* 72 */ + OP_NOTMINSTARI, /* 73 */ + OP_NOTPLUSI, /* 74 */ + OP_NOTMINPLUSI, /* 75 */ + OP_NOTQUERYI, /* 76 */ + OP_NOTMINQUERYI, /* 77 */ + + OP_NOTUPTOI, /* 78 From 0 to n matches, caseless */ + OP_NOTMINUPTOI, /* 79 */ + OP_NOTEXACTI, /* 80 Exactly n matches */ + + OP_NOTPOSSTARI, /* 81 Possessified versions, caseless */ + OP_NOTPOSPLUSI, /* 82 */ + OP_NOTPOSQUERYI, /* 83 */ + OP_NOTPOSUPTOI, /* 84 */ + + /* Character types */ + + OP_TYPESTAR, /* 85 The maximizing and minimizing versions of */ + OP_TYPEMINSTAR, /* 86 these six opcodes must come in pairs, with */ + OP_TYPEPLUS, /* 87 the minimizing one second. These codes must */ + OP_TYPEMINPLUS, /* 88 be in exactly the same order as those above. */ + OP_TYPEQUERY, /* 89 */ + OP_TYPEMINQUERY, /* 90 */ + + OP_TYPEUPTO, /* 91 From 0 to n matches */ + OP_TYPEMINUPTO, /* 92 */ + OP_TYPEEXACT, /* 93 Exactly n matches */ + + OP_TYPEPOSSTAR, /* 94 Possessified versions */ + OP_TYPEPOSPLUS, /* 95 */ + OP_TYPEPOSQUERY, /* 96 */ + OP_TYPEPOSUPTO, /* 97 */ + + /* These are used for character classes and back references; only the + first six are the same as the sets above. */ + + OP_CRSTAR, /* 98 The maximizing and minimizing versions of */ + OP_CRMINSTAR, /* 99 all these opcodes must come in pairs, with */ + OP_CRPLUS, /* 100 the minimizing one second. These codes must */ + OP_CRMINPLUS, /* 101 be in exactly the same order as those above. */ + OP_CRQUERY, /* 102 */ + OP_CRMINQUERY, /* 103 */ + + OP_CRRANGE, /* 104 These are different to the three sets above. 
*/ + OP_CRMINRANGE, /* 105 */ + + OP_CRPOSSTAR, /* 106 Possessified versions */ + OP_CRPOSPLUS, /* 107 */ + OP_CRPOSQUERY, /* 108 */ + OP_CRPOSRANGE, /* 109 */ + + /* End of quantifier opcodes */ + + OP_CLASS, /* 110 Match a character class, chars < 256 only */ + OP_NCLASS, /* 111 Same, but the bitmap was created from a negative + class - the difference is relevant only when a + character > 255 is encountered. */ + OP_XCLASS, /* 112 Extended class for handling > 255 chars within the + class. This does both positive and negative. */ + OP_REF, /* 113 Match a back reference, casefully */ + OP_REFI, /* 114 Match a back reference, caselessly */ + OP_DNREF, /* 115 Match a duplicate name backref, casefully */ + OP_DNREFI, /* 116 Match a duplicate name backref, caselessly */ + OP_RECURSE, /* 117 Match a numbered subpattern (possibly recursive) */ + OP_CALLOUT, /* 118 Call out to external function if provided */ + OP_CALLOUT_STR, /* 119 Call out with string argument */ + + OP_ALT, /* 120 Start of alternation */ + OP_KET, /* 121 End of group that doesn't have an unbounded repeat */ + OP_KETRMAX, /* 122 These two must remain together and in this */ + OP_KETRMIN, /* 123 order. They are for groups that repeat for ever. */ + OP_KETRPOS, /* 124 Possessive unlimited repeat. */ + + /* The assertions must come before BRA, CBRA, ONCE, and COND. */ + + OP_REVERSE, /* 125 Move pointer back - used in lookbehind assertions */ + OP_ASSERT, /* 126 Positive lookahead */ + OP_ASSERT_NOT, /* 127 Negative lookahead */ + OP_ASSERTBACK, /* 128 Positive lookbehind */ + OP_ASSERTBACK_NOT, /* 129 Negative lookbehind */ + OP_ASSERT_NA, /* 130 Positive non-atomic lookahead */ + OP_ASSERTBACK_NA, /* 131 Positive non-atomic lookbehind */ + + /* ONCE, SCRIPT_RUN, BRA, BRAPOS, CBRA, CBRAPOS, and COND must come + immediately after the assertions, with ONCE first, as there's a test for >= + ONCE for a subpattern that isn't an assertion. 
The POS versions must + immediately follow the non-POS versions in each case. */ + + OP_ONCE, /* 132 Atomic group, contains captures */ + OP_SCRIPT_RUN, /* 133 Non-capture, but check characters' scripts */ + OP_BRA, /* 134 Start of non-capturing bracket */ + OP_BRAPOS, /* 135 Ditto, with unlimited, possessive repeat */ + OP_CBRA, /* 136 Start of capturing bracket */ + OP_CBRAPOS, /* 137 Ditto, with unlimited, possessive repeat */ + OP_COND, /* 138 Conditional group */ + + /* These five must follow the previous five, in the same order. There's a + check for >= SBRA to distinguish the two sets. */ + + OP_SBRA, /* 139 Start of non-capturing bracket, check empty */ + OP_SBRAPOS, /* 140 Ditto, with unlimited, possessive repeat */ + OP_SCBRA, /* 141 Start of capturing bracket, check empty */ + OP_SCBRAPOS, /* 142 Ditto, with unlimited, possessive repeat */ + OP_SCOND, /* 143 Conditional group, check empty */ + + /* The next two pairs must (respectively) be kept together. */ + + OP_CREF, /* 144 Used to hold a capture number as condition */ + OP_DNCREF, /* 145 Used to point to duplicate names as a condition */ + OP_RREF, /* 146 Used to hold a recursion number as condition */ + OP_DNRREF, /* 147 Used to point to duplicate names as a condition */ + OP_FALSE, /* 148 Always false (used by DEFINE and VERSION) */ + OP_TRUE, /* 149 Always true (used by VERSION) */ + + OP_BRAZERO, /* 150 These two must remain together and in this */ + OP_BRAMINZERO, /* 151 order. */ + OP_BRAPOSZERO, /* 152 */ + + /* These are backtracking control verbs */ + + OP_MARK, /* 153 always has an argument */ + OP_PRUNE, /* 154 */ + OP_PRUNE_ARG, /* 155 same, but with argument */ + OP_SKIP, /* 156 */ + OP_SKIP_ARG, /* 157 same, but with argument */ + OP_THEN, /* 158 */ + OP_THEN_ARG, /* 159 same, but with argument */ + OP_COMMIT, /* 160 */ + OP_COMMIT_ARG, /* 161 same, but with argument */ + + /* These are forced failure and success verbs. 
FAIL and ACCEPT do accept an + argument, but these cases can be compiled as, for example, (*MARK:X)(*FAIL) + without the need for a special opcode. */ + + OP_FAIL, /* 162 */ + OP_ACCEPT, /* 163 */ + OP_ASSERT_ACCEPT, /* 164 Used inside assertions */ + OP_CLOSE, /* 165 Used before OP_ACCEPT to close open captures */ + + /* This is used to skip a subpattern with a {0} quantifier */ + + OP_SKIPZERO, /* 166 */ + + /* This is used to identify a DEFINE group during compilation so that it can + be checked for having only one branch. It is changed to OP_FALSE before + compilation finishes. */ + + OP_DEFINE, /* 167 */ + + /* This is not an opcode, but is used to check that tables indexed by opcode + are the correct length, in order to catch updating errors - there have been + some in the past. */ + + OP_TABLE_LENGTH + +}; + +/* *** NOTE NOTE NOTE *** Whenever the list above is updated, the two macro +definitions that follow must also be updated to match. There are also tables +called "opcode_possessify" in pcre2_compile.c and "coptable" and "poptable" in +pcre2_dfa_match.c that must be updated. */ + + +/* This macro defines textual names for all the opcodes. These are used only +for debugging, and some of them are only partial names. The macro is referenced +only in pcre2_printint.c, which fills out the full names in many cases (and in +some cases doesn't actually use these names at all). 
*/ + +#define OP_NAME_LIST \ + "End", "\\A", "\\G", "\\K", "\\B", "\\b", "\\D", "\\d", \ + "\\S", "\\s", "\\W", "\\w", "Any", "AllAny", "Anybyte", \ + "notprop", "prop", "\\R", "\\H", "\\h", "\\V", "\\v", \ + "extuni", "\\Z", "\\z", \ + "$", "$", "^", "^", "char", "chari", "not", "noti", \ + "*", "*?", "+", "+?", "?", "??", \ + "{", "{", "{", \ + "*+","++", "?+", "{", \ + "*", "*?", "+", "+?", "?", "??", \ + "{", "{", "{", \ + "*+","++", "?+", "{", \ + "*", "*?", "+", "+?", "?", "??", \ + "{", "{", "{", \ + "*+","++", "?+", "{", \ + "*", "*?", "+", "+?", "?", "??", \ + "{", "{", "{", \ + "*+","++", "?+", "{", \ + "*", "*?", "+", "+?", "?", "??", "{", "{", "{", \ + "*+","++", "?+", "{", \ + "*", "*?", "+", "+?", "?", "??", "{", "{", \ + "*+","++", "?+", "{", \ + "class", "nclass", "xclass", "Ref", "Refi", "DnRef", "DnRefi", \ + "Recurse", "Callout", "CalloutStr", \ + "Alt", "Ket", "KetRmax", "KetRmin", "KetRpos", \ + "Reverse", "Assert", "Assert not", \ + "Assert back", "Assert back not", \ + "Non-atomic assert", "Non-atomic assert back", \ + "Once", \ + "Script run", \ + "Bra", "BraPos", "CBra", "CBraPos", \ + "Cond", \ + "SBra", "SBraPos", "SCBra", "SCBraPos", \ + "SCond", \ + "Cond ref", "Cond dnref", "Cond rec", "Cond dnrec", \ + "Cond false", "Cond true", \ + "Brazero", "Braminzero", "Braposzero", \ + "*MARK", "*PRUNE", "*PRUNE", "*SKIP", "*SKIP", \ + "*THEN", "*THEN", "*COMMIT", "*COMMIT", "*FAIL", \ + "*ACCEPT", "*ASSERT_ACCEPT", \ + "Close", "Skip zero", "Define" + + +/* This macro defines the length of fixed length operations in the compiled +regex. The lengths are used when searching for specific things, and also in the +debugging printing of a compiled regex. We use a macro so that it can be +defined close to the definitions of the opcodes themselves. + +As things have been extended, some of these are no longer fixed lengths, but are +minima instead. For example, the length of a single-character repeat may vary +in UTF-8 mode. 
The code that uses this table must know about such things. */ + +#define OP_LENGTHS \ + 1, /* End */ \ + 1, 1, 1, 1, 1, /* \A, \G, \K, \B, \b */ \ + 1, 1, 1, 1, 1, 1, /* \D, \d, \S, \s, \W, \w */ \ + 1, 1, 1, /* Any, AllAny, Anybyte */ \ + 3, 3, /* \P, \p */ \ + 1, 1, 1, 1, 1, /* \R, \H, \h, \V, \v */ \ + 1, /* \X */ \ + 1, 1, 1, 1, 1, 1, /* \Z, \z, $, $M ^, ^M */ \ + 2, /* Char - the minimum length */ \ + 2, /* Chari - the minimum length */ \ + 2, /* not */ \ + 2, /* noti */ \ + /* Positive single-char repeats ** These are */ \ + 2, 2, 2, 2, 2, 2, /* *, *?, +, +?, ?, ?? ** minima in */ \ + 2+IMM2_SIZE, 2+IMM2_SIZE, /* upto, minupto ** mode */ \ + 2+IMM2_SIZE, /* exact */ \ + 2, 2, 2, 2+IMM2_SIZE, /* *+, ++, ?+, upto+ */ \ + 2, 2, 2, 2, 2, 2, /* *I, *?I, +I, +?I, ?I, ??I ** UTF-8 */ \ + 2+IMM2_SIZE, 2+IMM2_SIZE, /* upto I, minupto I */ \ + 2+IMM2_SIZE, /* exact I */ \ + 2, 2, 2, 2+IMM2_SIZE, /* *+I, ++I, ?+I, upto+I */ \ + /* Negative single-char repeats - only for chars < 256 */ \ + 2, 2, 2, 2, 2, 2, /* NOT *, *?, +, +?, ?, ?? */ \ + 2+IMM2_SIZE, 2+IMM2_SIZE, /* NOT upto, minupto */ \ + 2+IMM2_SIZE, /* NOT exact */ \ + 2, 2, 2, 2+IMM2_SIZE, /* Possessive NOT *, +, ?, upto */ \ + 2, 2, 2, 2, 2, 2, /* NOT *I, *?I, +I, +?I, ?I, ??I */ \ + 2+IMM2_SIZE, 2+IMM2_SIZE, /* NOT upto I, minupto I */ \ + 2+IMM2_SIZE, /* NOT exact I */ \ + 2, 2, 2, 2+IMM2_SIZE, /* Possessive NOT *I, +I, ?I, upto I */ \ + /* Positive type repeats */ \ + 2, 2, 2, 2, 2, 2, /* Type *, *?, +, +?, ?, ?? */ \ + 2+IMM2_SIZE, 2+IMM2_SIZE, /* Type upto, minupto */ \ + 2+IMM2_SIZE, /* Type exact */ \ + 2, 2, 2, 2+IMM2_SIZE, /* Possessive *+, ++, ?+, upto+ */ \ + /* Character class & ref repeats */ \ + 1, 1, 1, 1, 1, 1, /* *, *?, +, +?, ?, ?? 
*/ \ + 1+2*IMM2_SIZE, 1+2*IMM2_SIZE, /* CRRANGE, CRMINRANGE */ \ + 1, 1, 1, 1+2*IMM2_SIZE, /* Possessive *+, ++, ?+, CRPOSRANGE */ \ + 1+(32/sizeof(PCRE2_UCHAR)), /* CLASS */ \ + 1+(32/sizeof(PCRE2_UCHAR)), /* NCLASS */ \ + 0, /* XCLASS - variable length */ \ + 1+IMM2_SIZE, /* REF */ \ + 1+IMM2_SIZE, /* REFI */ \ + 1+2*IMM2_SIZE, /* DNREF */ \ + 1+2*IMM2_SIZE, /* DNREFI */ \ + 1+LINK_SIZE, /* RECURSE */ \ + 1+2*LINK_SIZE+1, /* CALLOUT */ \ + 0, /* CALLOUT_STR - variable length */ \ + 1+LINK_SIZE, /* Alt */ \ + 1+LINK_SIZE, /* Ket */ \ + 1+LINK_SIZE, /* KetRmax */ \ + 1+LINK_SIZE, /* KetRmin */ \ + 1+LINK_SIZE, /* KetRpos */ \ + 1+LINK_SIZE, /* Reverse */ \ + 1+LINK_SIZE, /* Assert */ \ + 1+LINK_SIZE, /* Assert not */ \ + 1+LINK_SIZE, /* Assert behind */ \ + 1+LINK_SIZE, /* Assert behind not */ \ + 1+LINK_SIZE, /* NA Assert */ \ + 1+LINK_SIZE, /* NA Assert behind */ \ + 1+LINK_SIZE, /* ONCE */ \ + 1+LINK_SIZE, /* SCRIPT_RUN */ \ + 1+LINK_SIZE, /* BRA */ \ + 1+LINK_SIZE, /* BRAPOS */ \ + 1+LINK_SIZE+IMM2_SIZE, /* CBRA */ \ + 1+LINK_SIZE+IMM2_SIZE, /* CBRAPOS */ \ + 1+LINK_SIZE, /* COND */ \ + 1+LINK_SIZE, /* SBRA */ \ + 1+LINK_SIZE, /* SBRAPOS */ \ + 1+LINK_SIZE+IMM2_SIZE, /* SCBRA */ \ + 1+LINK_SIZE+IMM2_SIZE, /* SCBRAPOS */ \ + 1+LINK_SIZE, /* SCOND */ \ + 1+IMM2_SIZE, 1+2*IMM2_SIZE, /* CREF, DNCREF */ \ + 1+IMM2_SIZE, 1+2*IMM2_SIZE, /* RREF, DNRREF */ \ + 1, 1, /* FALSE, TRUE */ \ + 1, 1, 1, /* BRAZERO, BRAMINZERO, BRAPOSZERO */ \ + 3, 1, 3, /* MARK, PRUNE, PRUNE_ARG */ \ + 1, 3, /* SKIP, SKIP_ARG */ \ + 1, 3, /* THEN, THEN_ARG */ \ + 1, 3, /* COMMIT, COMMIT_ARG */ \ + 1, 1, 1, /* FAIL, ACCEPT, ASSERT_ACCEPT */ \ + 1+IMM2_SIZE, 1, /* CLOSE, SKIPZERO */ \ + 1 /* DEFINE */ + +/* A magic value for OP_RREF to indicate the "any recursion" condition. */ + +#define RREF_ANY 0xffff + + +/* ---------- Private structures that are mode-independent. ---------- */ + +/* Structure to hold data for custom memory management. 
*/ + +typedef struct pcre2_memctl { + void * (*malloc)(size_t, void *); + void (*free)(void *, void *); + void *memory_data; +} pcre2_memctl; + +/* Structure for building a chain of open capturing subpatterns during +compiling, so that instructions to close them can be compiled when (*ACCEPT) is +encountered. */ + +typedef struct open_capitem { + struct open_capitem *next; /* Chain link */ + uint16_t number; /* Capture number */ + uint16_t assert_depth; /* Assertion depth when opened */ +} open_capitem; + +/* Layout of the UCP type table that translates property names into types and +codes. Each entry used to point directly to a name, but to reduce the number of +relocations in shared libraries, it now has an offset into a single string +instead. */ + +typedef struct { + uint16_t name_offset; + uint16_t type; + uint16_t value; +} ucp_type_table; + +/* Unicode character database (UCD) record format */ + +typedef struct { + uint8_t script; /* ucp_Arabic, etc. */ + uint8_t chartype; /* ucp_Cc, etc. (general categories) */ + uint8_t gbprop; /* ucp_gbControl, etc. (grapheme break property) */ + uint8_t caseset; /* offset to multichar other cases or zero */ + int32_t other_case; /* offset to other case, or zero if none */ + int16_t scriptx; /* script extension value */ + int16_t dummy; /* spare - to round to multiple of 4 bytes */ +} ucd_record; + +/* UCD access macros */ + +#define UCD_BLOCK_SIZE 128 +#define REAL_GET_UCD(ch) (PRIV(ucd_records) + \ + PRIV(ucd_stage2)[PRIV(ucd_stage1)[(int)(ch) / UCD_BLOCK_SIZE] * \ + UCD_BLOCK_SIZE + (int)(ch) % UCD_BLOCK_SIZE]) + +#if PCRE2_CODE_UNIT_WIDTH == 32 +#define GET_UCD(ch) ((ch > MAX_UTF_CODE_POINT)? 
\ + PRIV(dummy_ucd_record) : REAL_GET_UCD(ch)) +#else +#define GET_UCD(ch) REAL_GET_UCD(ch) +#endif + +#define UCD_CHARTYPE(ch) GET_UCD(ch)->chartype +#define UCD_SCRIPT(ch) GET_UCD(ch)->script +#define UCD_CATEGORY(ch) PRIV(ucp_gentype)[UCD_CHARTYPE(ch)] +#define UCD_GRAPHBREAK(ch) GET_UCD(ch)->gbprop +#define UCD_CASESET(ch) GET_UCD(ch)->caseset +#define UCD_OTHERCASE(ch) ((uint32_t)((int)ch + (int)(GET_UCD(ch)->other_case))) +#define UCD_SCRIPTX(ch) GET_UCD(ch)->scriptx + +/* Header for serialized pcre2 codes. */ + +typedef struct pcre2_serialized_data { + uint32_t magic; + uint32_t version; + uint32_t config; + int32_t number_of_codes; +} pcre2_serialized_data; + + + +/* ----------------- Items that need PCRE2_CODE_UNIT_WIDTH ----------------- */ + +/* When this file is included by pcre2test, PCRE2_CODE_UNIT_WIDTH is defined as +0, so the following items are omitted. */ + +#if defined PCRE2_CODE_UNIT_WIDTH && PCRE2_CODE_UNIT_WIDTH != 0 + +/* EBCDIC is supported only for the 8-bit library. */ + +#if defined EBCDIC && PCRE2_CODE_UNIT_WIDTH != 8 +#error EBCDIC is not supported for the 16-bit or 32-bit libraries +#endif + +/* This is the largest non-UTF code point. */ + +#define MAX_NON_UTF_CHAR (0xffffffffU >> (32 - PCRE2_CODE_UNIT_WIDTH)) + +/* Internal shared data tables and variables. These are used by more than one +of the exported public functions. They have to be "external" in the C sense, +but are not part of the PCRE2 public API. Although the data for some of them is +identical in all libraries, they must have different names so that multiple +libraries can be simultaneously linked to a single application. However, UTF-8 +tables are needed only when compiling the 8-bit library. 
*/ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +extern const int PRIV(utf8_table1)[]; +extern const int PRIV(utf8_table1_size); +extern const int PRIV(utf8_table2)[]; +extern const int PRIV(utf8_table3)[]; +extern const uint8_t PRIV(utf8_table4)[]; +#endif + +#define _pcre2_OP_lengths PCRE2_SUFFIX(_pcre2_OP_lengths_) +#define _pcre2_callout_end_delims PCRE2_SUFFIX(_pcre2_callout_end_delims_) +#define _pcre2_callout_start_delims PCRE2_SUFFIX(_pcre2_callout_start_delims_) +#define _pcre2_default_compile_context PCRE2_SUFFIX(_pcre2_default_compile_context_) +#define _pcre2_default_convert_context PCRE2_SUFFIX(_pcre2_default_convert_context_) +#define _pcre2_default_match_context PCRE2_SUFFIX(_pcre2_default_match_context_) +#define _pcre2_default_tables PCRE2_SUFFIX(_pcre2_default_tables_) +#if PCRE2_CODE_UNIT_WIDTH == 32 +#define _pcre2_dummy_ucd_record PCRE2_SUFFIX(_pcre2_dummy_ucd_record_) +#endif +#define _pcre2_hspace_list PCRE2_SUFFIX(_pcre2_hspace_list_) +#define _pcre2_vspace_list PCRE2_SUFFIX(_pcre2_vspace_list_) +#define _pcre2_ucd_caseless_sets PCRE2_SUFFIX(_pcre2_ucd_caseless_sets_) +#define _pcre2_ucd_digit_sets PCRE2_SUFFIX(_pcre2_ucd_digit_sets_) +#define _pcre2_ucd_script_sets PCRE2_SUFFIX(_pcre2_ucd_script_sets_) +#define _pcre2_ucd_records PCRE2_SUFFIX(_pcre2_ucd_records_) +#define _pcre2_ucd_stage1 PCRE2_SUFFIX(_pcre2_ucd_stage1_) +#define _pcre2_ucd_stage2 PCRE2_SUFFIX(_pcre2_ucd_stage2_) +#define _pcre2_ucp_gbtable PCRE2_SUFFIX(_pcre2_ucp_gbtable_) +#define _pcre2_ucp_gentype PCRE2_SUFFIX(_pcre2_ucp_gentype_) +#define _pcre2_ucp_typerange PCRE2_SUFFIX(_pcre2_ucp_typerange_) +#define _pcre2_unicode_version PCRE2_SUFFIX(_pcre2_unicode_version_) +#define _pcre2_utt PCRE2_SUFFIX(_pcre2_utt_) +#define _pcre2_utt_names PCRE2_SUFFIX(_pcre2_utt_names_) +#define _pcre2_utt_size PCRE2_SUFFIX(_pcre2_utt_size_) + +extern const uint8_t PRIV(OP_lengths)[]; +extern const uint32_t PRIV(callout_end_delims)[]; +extern const uint32_t PRIV(callout_start_delims)[]; +extern const 
pcre2_compile_context PRIV(default_compile_context); +extern const pcre2_convert_context PRIV(default_convert_context); +extern const pcre2_match_context PRIV(default_match_context); +extern const uint8_t PRIV(default_tables)[]; +extern const uint32_t PRIV(hspace_list)[]; +extern const uint32_t PRIV(vspace_list)[]; +extern const uint32_t PRIV(ucd_caseless_sets)[]; +extern const uint32_t PRIV(ucd_digit_sets)[]; +extern const uint8_t PRIV(ucd_script_sets)[]; +extern const ucd_record PRIV(ucd_records)[]; +#if PCRE2_CODE_UNIT_WIDTH == 32 +extern const ucd_record PRIV(dummy_ucd_record)[]; +#endif +extern const uint16_t PRIV(ucd_stage1)[]; +extern const uint16_t PRIV(ucd_stage2)[]; +extern const uint32_t PRIV(ucp_gbtable)[]; +extern const uint32_t PRIV(ucp_gentype)[]; +#ifdef SUPPORT_JIT +extern const int PRIV(ucp_typerange)[]; +#endif +extern const char *PRIV(unicode_version); +extern const ucp_type_table PRIV(utt)[]; +extern const char PRIV(utt_names)[]; +extern const size_t PRIV(utt_size); + +/* Mode-dependent macros and hidden and private structures are defined in a +separate file so that pcre2test can include them at all supported widths. When +compiling the library, PCRE2_CODE_UNIT_WIDTH will be defined, and we can +include them at the appropriate width, after setting up suffix macros for the +private structures. */ + +#define branch_chain PCRE2_SUFFIX(branch_chain_) +#define compile_block PCRE2_SUFFIX(compile_block_) +#define dfa_match_block PCRE2_SUFFIX(dfa_match_block_) +#define match_block PCRE2_SUFFIX(match_block_) +#define named_group PCRE2_SUFFIX(named_group_) + +#include "pcre2_intmodedep.h" + +/* Private "external" functions. These are internal functions that are called +from modules other than the one in which they are defined. They have to be +"external" in the C sense, but are not part of the PCRE2 public API. They are +not referenced from pcre2test, and must not be defined when no code unit width +is available. 
*/ + +#define _pcre2_auto_possessify PCRE2_SUFFIX(_pcre2_auto_possessify_) +#define _pcre2_check_escape PCRE2_SUFFIX(_pcre2_check_escape_) +#define _pcre2_extuni PCRE2_SUFFIX(_pcre2_extuni_) +#define _pcre2_find_bracket PCRE2_SUFFIX(_pcre2_find_bracket_) +#define _pcre2_is_newline PCRE2_SUFFIX(_pcre2_is_newline_) +#define _pcre2_jit_free_rodata PCRE2_SUFFIX(_pcre2_jit_free_rodata_) +#define _pcre2_jit_free PCRE2_SUFFIX(_pcre2_jit_free_) +#define _pcre2_jit_get_size PCRE2_SUFFIX(_pcre2_jit_get_size_) +#define _pcre2_jit_get_target PCRE2_SUFFIX(_pcre2_jit_get_target_) +#define _pcre2_memctl_malloc PCRE2_SUFFIX(_pcre2_memctl_malloc_) +#define _pcre2_ord2utf PCRE2_SUFFIX(_pcre2_ord2utf_) +#define _pcre2_script_run PCRE2_SUFFIX(_pcre2_script_run_) +#define _pcre2_strcmp PCRE2_SUFFIX(_pcre2_strcmp_) +#define _pcre2_strcmp_c8 PCRE2_SUFFIX(_pcre2_strcmp_c8_) +#define _pcre2_strcpy_c8 PCRE2_SUFFIX(_pcre2_strcpy_c8_) +#define _pcre2_strlen PCRE2_SUFFIX(_pcre2_strlen_) +#define _pcre2_strncmp PCRE2_SUFFIX(_pcre2_strncmp_) +#define _pcre2_strncmp_c8 PCRE2_SUFFIX(_pcre2_strncmp_c8_) +#define _pcre2_study PCRE2_SUFFIX(_pcre2_study_) +#define _pcre2_valid_utf PCRE2_SUFFIX(_pcre2_valid_utf_) +#define _pcre2_was_newline PCRE2_SUFFIX(_pcre2_was_newline_) +#define _pcre2_xclass PCRE2_SUFFIX(_pcre2_xclass_) + +extern int _pcre2_auto_possessify(PCRE2_UCHAR *, + const compile_block *); +extern int _pcre2_check_escape(PCRE2_SPTR *, PCRE2_SPTR, uint32_t *, + int *, uint32_t, uint32_t, BOOL, compile_block *); +extern PCRE2_SPTR _pcre2_extuni(uint32_t, PCRE2_SPTR, PCRE2_SPTR, PCRE2_SPTR, + BOOL, int *); +extern PCRE2_SPTR _pcre2_find_bracket(PCRE2_SPTR, BOOL, int); +extern BOOL _pcre2_is_newline(PCRE2_SPTR, uint32_t, PCRE2_SPTR, + uint32_t *, BOOL); +extern void _pcre2_jit_free_rodata(void *, void *); +extern void _pcre2_jit_free(void *, pcre2_memctl *); +extern size_t _pcre2_jit_get_size(void *); +const char * _pcre2_jit_get_target(void); +extern void * _pcre2_memctl_malloc(size_t, 
pcre2_memctl *); +extern unsigned int _pcre2_ord2utf(uint32_t, PCRE2_UCHAR *); +extern BOOL _pcre2_script_run(PCRE2_SPTR, PCRE2_SPTR, BOOL); +extern int _pcre2_strcmp(PCRE2_SPTR, PCRE2_SPTR); +extern int _pcre2_strcmp_c8(PCRE2_SPTR, const char *); +extern PCRE2_SIZE _pcre2_strcpy_c8(PCRE2_UCHAR *, const char *); +extern PCRE2_SIZE _pcre2_strlen(PCRE2_SPTR); +extern int _pcre2_strncmp(PCRE2_SPTR, PCRE2_SPTR, size_t); +extern int _pcre2_strncmp_c8(PCRE2_SPTR, const char *, size_t); +extern int _pcre2_study(pcre2_real_code *); +extern int _pcre2_valid_utf(PCRE2_SPTR, PCRE2_SIZE, PCRE2_SIZE *); +extern BOOL _pcre2_was_newline(PCRE2_SPTR, uint32_t, PCRE2_SPTR, + uint32_t *, BOOL); +extern BOOL _pcre2_xclass(uint32_t, PCRE2_SPTR, BOOL); + +/* This function is needed only when memmove() is not available. */ + +#if !defined(VPCOMPAT) && !defined(HAVE_MEMMOVE) +#define _pcre2_memmove PCRE2_SUFFIX(_pcre2_memmove) +extern void * _pcre2_memmove(void *, const void *, size_t); +#endif + +#endif /* PCRE2_CODE_UNIT_WIDTH */ +#endif /* PCRE2_INTERNAL_H_IDEMPOTENT_GUARD */ + +/* End of pcre2_internal.h */ diff --git a/src/pcre2/src/pcre2_intmodedep.h b/src/pcre2/src/pcre2_intmodedep.h new file mode 100644 index 00000000..ea3b3ec6 --- /dev/null +++ b/src/pcre2/src/pcre2_intmodedep.h @@ -0,0 +1,923 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +/* This module contains mode-dependent macro and structure definitions. The +file is #included by pcre2_internal.h if PCRE2_CODE_UNIT_WIDTH is defined. 
+These mode-dependent items are kept in a separate file so that they can also be +#included multiple times for different code unit widths by pcre2test in order +to have access to the hidden structures at all supported widths. + +Some of the mode-dependent macros are required at different widths for +different parts of the pcre2test code (in particular, the included +pcre_printint.c file). We undefine them here so that they can be re-defined for +multiple inclusions. Not all of these are used in pcre2test, but it's easier +just to undefine them all. */ + +#undef ACROSSCHAR +#undef BACKCHAR +#undef BYTES2CU +#undef CHMAX_255 +#undef CU2BYTES +#undef FORWARDCHAR +#undef FORWARDCHARTEST +#undef GET +#undef GET2 +#undef GETCHAR +#undef GETCHARINC +#undef GETCHARINCTEST +#undef GETCHARLEN +#undef GETCHARLENTEST +#undef GETCHARTEST +#undef GET_EXTRALEN +#undef HAS_EXTRALEN +#undef IMM2_SIZE +#undef MAX_255 +#undef MAX_MARK +#undef MAX_PATTERN_SIZE +#undef MAX_UTF_SINGLE_CU +#undef NOT_FIRSTCU +#undef PUT +#undef PUT2 +#undef PUT2INC +#undef PUTCHAR +#undef PUTINC +#undef TABLE_GET + + + +/* -------------------------- MACROS ----------------------------- */ + +/* PCRE keeps offsets in its compiled code as at least 16-bit quantities +(always stored in big-endian order in 8-bit mode) by default. These are used, +for example, to link from the start of a subpattern to its alternatives and its +end. The use of 16 bits per offset limits the size of an 8-bit compiled regex +to around 64K, which is big enough for almost everybody. However, I received a +request for an even bigger limit. For this reason, and also to make the code +easier to maintain, the storing and loading of offsets from the compiled code +unit string is now handled by the macros that are defined here. + +The macros are controlled by the value of LINK_SIZE. This defaults to 2, but +values of 3 or 4 are also supported. 
*/ + +/* ------------------- 8-bit support ------------------ */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 + +#if LINK_SIZE == 2 +#define PUT(a,n,d) \ + (a[n] = (PCRE2_UCHAR)((d) >> 8)), \ + (a[(n)+1] = (PCRE2_UCHAR)((d) & 255)) +#define GET(a,n) \ + (unsigned int)(((a)[n] << 8) | (a)[(n)+1]) +#define MAX_PATTERN_SIZE (1 << 16) + +#elif LINK_SIZE == 3 +#define PUT(a,n,d) \ + (a[n] = (PCRE2_UCHAR)((d) >> 16)), \ + (a[(n)+1] = (PCRE2_UCHAR)((d) >> 8)), \ + (a[(n)+2] = (PCRE2_UCHAR)((d) & 255)) +#define GET(a,n) \ + (unsigned int)(((a)[n] << 16) | ((a)[(n)+1] << 8) | (a)[(n)+2]) +#define MAX_PATTERN_SIZE (1 << 24) + +#elif LINK_SIZE == 4 +#define PUT(a,n,d) \ + (a[n] = (PCRE2_UCHAR)((d) >> 24)), \ + (a[(n)+1] = (PCRE2_UCHAR)((d) >> 16)), \ + (a[(n)+2] = (PCRE2_UCHAR)((d) >> 8)), \ + (a[(n)+3] = (PCRE2_UCHAR)((d) & 255)) +#define GET(a,n) \ + (unsigned int)(((a)[n] << 24) | ((a)[(n)+1] << 16) | ((a)[(n)+2] << 8) | (a)[(n)+3]) +#define MAX_PATTERN_SIZE (1 << 30) /* Keep it positive */ + +#else +#error LINK_SIZE must be 2, 3, or 4 +#endif + + +/* ------------------- 16-bit support ------------------ */ + +#elif PCRE2_CODE_UNIT_WIDTH == 16 + +#if LINK_SIZE == 2 +#undef LINK_SIZE +#define LINK_SIZE 1 +#define PUT(a,n,d) \ + (a[n] = (PCRE2_UCHAR)(d)) +#define GET(a,n) \ + (a[n]) +#define MAX_PATTERN_SIZE (1 << 16) + +#elif LINK_SIZE == 3 || LINK_SIZE == 4 +#undef LINK_SIZE +#define LINK_SIZE 2 +#define PUT(a,n,d) \ + (a[n] = (PCRE2_UCHAR)((d) >> 16)), \ + (a[(n)+1] = (PCRE2_UCHAR)((d) & 65535)) +#define GET(a,n) \ + (unsigned int)(((a)[n] << 16) | (a)[(n)+1]) +#define MAX_PATTERN_SIZE (1 << 30) /* Keep it positive */ + +#else +#error LINK_SIZE must be 2, 3, or 4 +#endif + + +/* ------------------- 32-bit support ------------------ */ + +#elif PCRE2_CODE_UNIT_WIDTH == 32 +#undef LINK_SIZE +#define LINK_SIZE 1 +#define PUT(a,n,d) \ + (a[n] = (d)) +#define GET(a,n) \ + (a[n]) +#define MAX_PATTERN_SIZE (1 << 30) /* Keep it positive */ + +#else +#error Unsupported compiling mode +#endif 
+ + +/* --------------- Other mode-specific macros ----------------- */ + +/* PCRE uses some other (at least) 16-bit quantities that do not change when +the size of offsets changes. These are used for repeat counts and for other +things such as capturing parenthesis numbers in back references. + +Define the number of code units required to hold a 16-bit count/offset, and +macros to load and store such a value. For reasons that I do not understand, +the expression in the 8-bit GET2 macro is treated by gcc as a signed +expression, even when a is declared as unsigned. It seems that any kind of +arithmetic results in a signed value. Hence the cast. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define IMM2_SIZE 2 +#define GET2(a,n) (unsigned int)(((a)[n] << 8) | (a)[(n)+1]) +#define PUT2(a,n,d) a[n] = (d) >> 8, a[(n)+1] = (d) & 255 + +#else /* Code units are 16 or 32 bits */ +#define IMM2_SIZE 1 +#define GET2(a,n) a[n] +#define PUT2(a,n,d) a[n] = d +#endif + +/* Other macros that are different for 8-bit mode. The MAX_255 macro checks +whether its argument, which is assumed to be one code unit, is less than 256. +The CHMAX_255 macro does not assume one code unit. The maximum length of a MARK +name must fit in one code unit; currently it is set to 255 or 65535. The +TABLE_GET macro is used to access elements of tables containing exactly 256 +items. Its argument is a code unit. When code points can be greater than 255, a +check is needed before accessing these tables.
*/ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define MAX_255(c) TRUE +#define MAX_MARK ((1u << 8) - 1) +#define TABLE_GET(c, table, default) ((table)[c]) +#ifdef SUPPORT_UNICODE +#define SUPPORT_WIDE_CHARS +#define CHMAX_255(c) ((c) <= 255u) +#else +#define CHMAX_255(c) TRUE +#endif /* SUPPORT_UNICODE */ + +#else /* Code units are 16 or 32 bits */ +#define CHMAX_255(c) ((c) <= 255u) +#define MAX_255(c) ((c) <= 255u) +#define MAX_MARK ((1u << 16) - 1) +#define SUPPORT_WIDE_CHARS +#define TABLE_GET(c, table, default) (MAX_255(c)? ((table)[c]):(default)) +#endif + + +/* ----------------- Character-handling macros ----------------- */ + +/* There is a proposed future special "UTF-21" mode, in which only the lowest +21 bits of a 32-bit character are interpreted as UTF, with the remaining 11 +high-order bits available to the application for other uses. In preparation for +the future implementation of this mode, there are macros that load a data item +and, if in this special mode, mask it to 21 bits. These macros all have names +starting with UCHAR21. In all other modes, including the normal 32-bit +library, the macros all have the same simple definitions. When the new mode is +implemented, it is expected that these definitions will be varied appropriately +using #ifdef when compiling the library that supports the special mode. */ + +#define UCHAR21(eptr) (*(eptr)) +#define UCHAR21TEST(eptr) (*(eptr)) +#define UCHAR21INC(eptr) (*(eptr)++) +#define UCHAR21INCTEST(eptr) (*(eptr)++) + +/* When UTF encoding is being used, a character is no longer just a single +byte in 8-bit mode or a single short in 16-bit mode. The macros for character +handling generate simple sequences when used in the basic mode, and more +complicated ones for UTF characters. GETCHARLENTEST and other macros are not +used when UTF is not supported. To make sure they can never even appear when +UTF support is omitted, we don't even define them. 
*/ + +#ifndef SUPPORT_UNICODE + +/* #define MAX_UTF_SINGLE_CU */ +/* #define HAS_EXTRALEN(c) */ +/* #define GET_EXTRALEN(c) */ +/* #define NOT_FIRSTCU(c) */ +#define GETCHAR(c, eptr) c = *eptr; +#define GETCHARTEST(c, eptr) c = *eptr; +#define GETCHARINC(c, eptr) c = *eptr++; +#define GETCHARINCTEST(c, eptr) c = *eptr++; +#define GETCHARLEN(c, eptr, len) c = *eptr; +#define PUTCHAR(c, p) (*p = c, 1) +/* #define GETCHARLENTEST(c, eptr, len) */ +/* #define BACKCHAR(eptr) */ +/* #define FORWARDCHAR(eptr) */ +/* #define FORWARCCHARTEST(eptr,end) */ +/* #define ACROSSCHAR(condition, eptr, action) */ + +#else /* SUPPORT_UNICODE */ + +/* ------------------- 8-bit support ------------------ */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define MAYBE_UTF_MULTI /* UTF chars may use multiple code units */ + +/* The largest UTF code point that can be encoded as a single code unit. */ + +#define MAX_UTF_SINGLE_CU 127 + +/* Tests whether the code point needs extra characters to decode. */ + +#define HAS_EXTRALEN(c) HASUTF8EXTRALEN(c) + +/* Returns with the additional number of characters if IS_MULTICHAR(c) is TRUE. +Otherwise it has an undefined behaviour. */ + +#define GET_EXTRALEN(c) (PRIV(utf8_table4)[(c) & 0x3fu]) + +/* Returns TRUE, if the given value is not the first code unit of a UTF +sequence. */ + +#define NOT_FIRSTCU(c) (((c) & 0xc0u) == 0x80u) + +/* Get the next UTF-8 character, not advancing the pointer. This is called when +we know we are in UTF-8 mode. */ + +#define GETCHAR(c, eptr) \ + c = *eptr; \ + if (c >= 0xc0u) GETUTF8(c, eptr); + +/* Get the next UTF-8 character, testing for UTF-8 mode, and not advancing the +pointer. */ + +#define GETCHARTEST(c, eptr) \ + c = *eptr; \ + if (utf && c >= 0xc0u) GETUTF8(c, eptr); + +/* Get the next UTF-8 character, advancing the pointer. This is called when we +know we are in UTF-8 mode. 
*/ + +#define GETCHARINC(c, eptr) \ + c = *eptr++; \ + if (c >= 0xc0u) GETUTF8INC(c, eptr); + +/* Get the next character, testing for UTF-8 mode, and advancing the pointer. +This is called when we don't know if we are in UTF-8 mode. */ + +#define GETCHARINCTEST(c, eptr) \ + c = *eptr++; \ + if (utf && c >= 0xc0u) GETUTF8INC(c, eptr); + +/* Get the next UTF-8 character, not advancing the pointer, incrementing length +if there are extra bytes. This is called when we know we are in UTF-8 mode. */ + +#define GETCHARLEN(c, eptr, len) \ + c = *eptr; \ + if (c >= 0xc0u) GETUTF8LEN(c, eptr, len); + +/* Get the next UTF-8 character, testing for UTF-8 mode, not advancing the +pointer, incrementing length if there are extra bytes. This is called when we +do not know if we are in UTF-8 mode. */ + +#define GETCHARLENTEST(c, eptr, len) \ + c = *eptr; \ + if (utf && c >= 0xc0u) GETUTF8LEN(c, eptr, len); + +/* If the pointer is not at the start of a character, move it back until +it is. This is called only in UTF-8 mode - we don't put a test within the macro +because almost all calls are already within a block of UTF-8 only code. */ + +#define BACKCHAR(eptr) while((*eptr & 0xc0u) == 0x80u) eptr-- + +/* Same as above, just in the other direction. */ +#define FORWARDCHAR(eptr) while((*eptr & 0xc0u) == 0x80u) eptr++ +#define FORWARDCHARTEST(eptr,end) while(eptr < end && (*eptr & 0xc0u) == 0x80u) eptr++ + +/* Same as above, but it allows a fully customizable form. */ +#define ACROSSCHAR(condition, eptr, action) \ + while((condition) && ((*eptr) & 0xc0u) == 0x80u) action + +/* Deposit a character into memory, returning the number of code units. */ + +#define PUTCHAR(c, p) ((utf && c > MAX_UTF_SINGLE_CU)? \ + PRIV(ord2utf)(c,p) : (*p = c, 1)) + + +/* ------------------- 16-bit support ------------------ */ + +#elif PCRE2_CODE_UNIT_WIDTH == 16 +#define MAYBE_UTF_MULTI /* UTF chars may use multiple code units */ + +/* The largest UTF code point that can be encoded as a single code unit. 
*/ + +#define MAX_UTF_SINGLE_CU 65535 + +/* Tests whether the code point needs extra characters to decode. */ + +#define HAS_EXTRALEN(c) (((c) & 0xfc00u) == 0xd800u) + +/* Returns with the additional number of characters if IS_MULTICHAR(c) is TRUE. +Otherwise it has an undefined behaviour. */ + +#define GET_EXTRALEN(c) 1 + +/* Returns TRUE, if the given value is not the first code unit of a UTF +sequence. */ + +#define NOT_FIRSTCU(c) (((c) & 0xfc00u) == 0xdc00u) + +/* Base macro to pick up the low surrogate of a UTF-16 character, not +advancing the pointer. */ + +#define GETUTF16(c, eptr) \ + { c = (((c & 0x3ffu) << 10) | (eptr[1] & 0x3ffu)) + 0x10000u; } + +/* Get the next UTF-16 character, not advancing the pointer. This is called when +we know we are in UTF-16 mode. */ + +#define GETCHAR(c, eptr) \ + c = *eptr; \ + if ((c & 0xfc00u) == 0xd800u) GETUTF16(c, eptr); + +/* Get the next UTF-16 character, testing for UTF-16 mode, and not advancing the +pointer. */ + +#define GETCHARTEST(c, eptr) \ + c = *eptr; \ + if (utf && (c & 0xfc00u) == 0xd800u) GETUTF16(c, eptr); + +/* Base macro to pick up the low surrogate of a UTF-16 character, advancing +the pointer. */ + +#define GETUTF16INC(c, eptr) \ + { c = (((c & 0x3ffu) << 10) | (*eptr++ & 0x3ffu)) + 0x10000u; } + +/* Get the next UTF-16 character, advancing the pointer. This is called when we +know we are in UTF-16 mode. */ + +#define GETCHARINC(c, eptr) \ + c = *eptr++; \ + if ((c & 0xfc00u) == 0xd800u) GETUTF16INC(c, eptr); + +/* Get the next character, testing for UTF-16 mode, and advancing the pointer. +This is called when we don't know if we are in UTF-16 mode. */ + +#define GETCHARINCTEST(c, eptr) \ + c = *eptr++; \ + if (utf && (c & 0xfc00u) == 0xd800u) GETUTF16INC(c, eptr); + +/* Base macro to pick up the low surrogate of a UTF-16 character, not +advancing the pointer, incrementing the length. 
*/ + +#define GETUTF16LEN(c, eptr, len) \ + { c = (((c & 0x3ffu) << 10) | (eptr[1] & 0x3ffu)) + 0x10000u; len++; } + +/* Get the next UTF-16 character, not advancing the pointer, incrementing +length if there is a low surrogate. This is called when we know we are in +UTF-16 mode. */ + +#define GETCHARLEN(c, eptr, len) \ + c = *eptr; \ + if ((c & 0xfc00u) == 0xd800u) GETUTF16LEN(c, eptr, len); + +/* Get the next UTF-16 character, testing for UTF-16 mode, not advancing the +pointer, incrementing length if there is a low surrogate. This is called when +we do not know if we are in UTF-16 mode. */ + +#define GETCHARLENTEST(c, eptr, len) \ + c = *eptr; \ + if (utf && (c & 0xfc00u) == 0xd800u) GETUTF16LEN(c, eptr, len); + +/* If the pointer is not at the start of a character, move it back until +it is. This is called only in UTF-16 mode - we don't put a test within the +macro because almost all calls are already within a block of UTF-16 only +code. */ + +#define BACKCHAR(eptr) if ((*eptr & 0xfc00u) == 0xdc00u) eptr-- + +/* Same as above, just in the other direction. */ +#define FORWARDCHAR(eptr) if ((*eptr & 0xfc00u) == 0xdc00u) eptr++ +#define FORWARDCHARTEST(eptr,end) if (eptr < end && (*eptr & 0xfc00u) == 0xdc00u) eptr++ + +/* Same as above, but it allows a fully customizable form. */ +#define ACROSSCHAR(condition, eptr, action) \ + if ((condition) && ((*eptr) & 0xfc00u) == 0xdc00u) action + +/* Deposit a character into memory, returning the number of code units. */ + +#define PUTCHAR(c, p) ((utf && c > MAX_UTF_SINGLE_CU)? \ + PRIV(ord2utf)(c,p) : (*p = c, 1)) + + +/* ------------------- 32-bit support ------------------ */ + +#else + +/* These are trivial for the 32-bit library, since all UTF-32 characters fit +into one PCRE2_UCHAR unit. */ + +#define MAX_UTF_SINGLE_CU (0x10ffffu) +#define HAS_EXTRALEN(c) (0) +#define GET_EXTRALEN(c) (0) +#define NOT_FIRSTCU(c) (0) + +/* Get the next UTF-32 character, not advancing the pointer.
This is called when +we know we are in UTF-32 mode. */ + +#define GETCHAR(c, eptr) \ + c = *(eptr); + +/* Get the next UTF-32 character, testing for UTF-32 mode, and not advancing the +pointer. */ + +#define GETCHARTEST(c, eptr) \ + c = *(eptr); + +/* Get the next UTF-32 character, advancing the pointer. This is called when we +know we are in UTF-32 mode. */ + +#define GETCHARINC(c, eptr) \ + c = *((eptr)++); + +/* Get the next character, testing for UTF-32 mode, and advancing the pointer. +This is called when we don't know if we are in UTF-32 mode. */ + +#define GETCHARINCTEST(c, eptr) \ + c = *((eptr)++); + +/* Get the next UTF-32 character, not advancing the pointer, not incrementing +length (since all UTF-32 is of length 1). This is called when we know we are in +UTF-32 mode. */ + +#define GETCHARLEN(c, eptr, len) \ + GETCHAR(c, eptr) + +/* Get the next UTF-32 character, testing for UTF-32 mode, not advancing the +pointer, not incrementing the length (since all UTF-32 is of length 1). +This is called when we do not know if we are in UTF-32 mode. */ + +#define GETCHARLENTEST(c, eptr, len) \ + GETCHARTEST(c, eptr) + +/* If the pointer is not at the start of a character, move it back until +it is. This is called only in UTF-32 mode - we don't put a test within the +macro because almost all calls are already within a block of UTF-32 only +code. + +These are all no-ops since all UTF-32 characters fit into one PCRE2_UCHAR. */ + +#define BACKCHAR(eptr) do { } while (0) + +/* Same as above, just in the other direction. */ + +#define FORWARDCHAR(eptr) do { } while (0) +#define FORWARDCHARTEST(eptr,end) do { } while (0) + +/* Same as above, but it allows a fully customizable form. */ + +#define ACROSSCHAR(condition, eptr, action) do { } while (0) + +/* Deposit a character into memory, returning the number of code units.
*/ + +#define PUTCHAR(c, p) (*p = c, 1) + +#endif /* UTF-32 character handling */ +#endif /* SUPPORT_UNICODE */ + + +/* Mode-dependent macros that have the same definition in all modes. */ + +#define CU2BYTES(x) ((x)*((PCRE2_CODE_UNIT_WIDTH/8))) +#define BYTES2CU(x) ((x)/((PCRE2_CODE_UNIT_WIDTH/8))) +#define PUTINC(a,n,d) PUT(a,n,d), a += LINK_SIZE +#define PUT2INC(a,n,d) PUT2(a,n,d), a += IMM2_SIZE + + +/* ----------------------- HIDDEN STRUCTURES ----------------------------- */ + +/* NOTE: All these structures *must* start with a pcre2_memctl structure. The +code that uses them is simpler because it assumes this. */ + +/* The real general context structure. At present it holds only data for custom +memory control. */ + +typedef struct pcre2_real_general_context { + pcre2_memctl memctl; +} pcre2_real_general_context; + +/* The real compile context structure */ + +typedef struct pcre2_real_compile_context { + pcre2_memctl memctl; + int (*stack_guard)(uint32_t, void *); + void *stack_guard_data; + const uint8_t *tables; + PCRE2_SIZE max_pattern_length; + uint16_t bsr_convention; + uint16_t newline_convention; + uint32_t parens_nest_limit; + uint32_t extra_options; +} pcre2_real_compile_context; + +/* The real match context structure. */ + +typedef struct pcre2_real_match_context { + pcre2_memctl memctl; +#ifdef SUPPORT_JIT + pcre2_jit_callback jit_callback; + void *jit_callback_data; +#endif + int (*callout)(pcre2_callout_block *, void *); + void *callout_data; + int (*substitute_callout)(pcre2_substitute_callout_block *, void *); + void *substitute_callout_data; + PCRE2_SIZE offset_limit; + uint32_t heap_limit; + uint32_t match_limit; + uint32_t depth_limit; +} pcre2_real_match_context; + +/* The real convert context structure. */ + +typedef struct pcre2_real_convert_context { + pcre2_memctl memctl; + uint32_t glob_separator; + uint32_t glob_escape; +} pcre2_real_convert_context; + +/* The real compiled code structure. 
The type for the blocksize field is +defined specially because it is required in pcre2_serialize_decode() when +copying the size from possibly unaligned memory into a variable of the same +type. Use a macro rather than a typedef to avoid compiler warnings when this +file is included multiple times by pcre2test. LOOKBEHIND_MAX specifies the +largest lookbehind that is supported. (OP_REVERSE in a pattern has a 16-bit +argument in 8-bit and 16-bit modes, so we need no more than a 16-bit field +here.) */ + +#undef CODE_BLOCKSIZE_TYPE +#define CODE_BLOCKSIZE_TYPE size_t + +#undef LOOKBEHIND_MAX +#define LOOKBEHIND_MAX UINT16_MAX + +typedef struct pcre2_real_code { + pcre2_memctl memctl; /* Memory control fields */ + const uint8_t *tables; /* The character tables */ + void *executable_jit; /* Pointer to JIT code */ + uint8_t start_bitmap[32]; /* Bitmap for starting code unit < 256 */ + CODE_BLOCKSIZE_TYPE blocksize; /* Total (bytes) that was malloc-ed */ + uint32_t magic_number; /* Paranoid and endianness check */ + uint32_t compile_options; /* Options passed to pcre2_compile() */ + uint32_t overall_options; /* Options after processing the pattern */ + uint32_t extra_options; /* Taken from compile_context */ + uint32_t flags; /* Various state flags */ + uint32_t limit_heap; /* Limit set in the pattern */ + uint32_t limit_match; /* Limit set in the pattern */ + uint32_t limit_depth; /* Limit set in the pattern */ + uint32_t first_codeunit; /* Starting code unit */ + uint32_t last_codeunit; /* This codeunit must be seen */ + uint16_t bsr_convention; /* What \R matches */ + uint16_t newline_convention; /* What is a newline? 
*/ + uint16_t max_lookbehind; /* Longest lookbehind (characters) */ + uint16_t minlength; /* Minimum length of match */ + uint16_t top_bracket; /* Highest numbered group */ + uint16_t top_backref; /* Highest numbered back reference */ + uint16_t name_entry_size; /* Size (code units) of table entries */ + uint16_t name_count; /* Number of name entries in the table */ +} pcre2_real_code; + +/* The real match data structure. Define ovector as large as it can ever +actually be so that array bound checkers don't grumble. Memory for this +structure is obtained by calling pcre2_match_data_create(), which sets the size +as the offset of ovector plus a pair of elements for each capturable string, so +the size varies from call to call. As the maximum number of capturing +subpatterns is 65535 we must allow for 65536 strings to include the overall +match. (See also the heapframe structure below.) */ + +typedef struct pcre2_real_match_data { + pcre2_memctl memctl; + const pcre2_real_code *code; /* The pattern used for the match */ + PCRE2_SPTR subject; /* The subject that was matched */ + PCRE2_SPTR mark; /* Pointer to last mark */ + PCRE2_SIZE leftchar; /* Offset to leftmost code unit */ + PCRE2_SIZE rightchar; /* Offset to rightmost code unit */ + PCRE2_SIZE startchar; /* Offset to starting code unit */ + uint8_t matchedby; /* Type of match (normal, JIT, DFA) */ + uint8_t flags; /* Various flags */ + uint16_t oveccount; /* Number of pairs */ + int rc; /* The return code from the match */ + PCRE2_SIZE ovector[131072]; /* Must be last in the structure */ +} pcre2_real_match_data; + + +/* ----------------------- PRIVATE STRUCTURES ----------------------------- */ + +/* These structures are not needed for pcre2test. */ + +#ifndef PCRE2_PCRE2TEST + +/* Structures for checking for mutual recursion when scanning compiled or +parsed code. 
*/ + +typedef struct recurse_check { + struct recurse_check *prev; + PCRE2_SPTR group; +} recurse_check; + +typedef struct parsed_recurse_check { + struct parsed_recurse_check *prev; + uint32_t *groupptr; +} parsed_recurse_check; + +/* Structure for building a cache when filling in recursion offsets. */ + +typedef struct recurse_cache { + PCRE2_SPTR group; + int groupnumber; +} recurse_cache; + +/* Structure for maintaining a chain of pointers to the currently incomplete +branches, for testing for left recursion while compiling. */ + +typedef struct branch_chain { + struct branch_chain *outer; + PCRE2_UCHAR *current_branch; +} branch_chain; + +/* Structure for building a list of named groups during the first pass of +compiling. */ + +typedef struct named_group { + PCRE2_SPTR name; /* Points to the name in the pattern */ + uint32_t number; /* Group number */ + uint16_t length; /* Length of the name */ + uint16_t isdup; /* TRUE if a duplicate */ +} named_group; + +/* Structure for passing "static" information around between the functions +doing the compiling, so that they are thread-safe. 
*/ + +typedef struct compile_block { + pcre2_real_compile_context *cx; /* Points to the compile context */ + const uint8_t *lcc; /* Points to lower casing table */ + const uint8_t *fcc; /* Points to case-flipping table */ + const uint8_t *cbits; /* Points to character type table */ + const uint8_t *ctypes; /* Points to table of type maps */ + PCRE2_SPTR start_workspace; /* The start of working space */ + PCRE2_SPTR start_code; /* The start of the compiled code */ + PCRE2_SPTR start_pattern; /* The start of the pattern */ + PCRE2_SPTR end_pattern; /* The end of the pattern */ + PCRE2_UCHAR *name_table; /* The name/number table */ + PCRE2_SIZE workspace_size; /* Size of workspace */ + PCRE2_SIZE small_ref_offset[10]; /* Offsets for \1 to \9 */ + PCRE2_SIZE erroroffset; /* Offset of error in pattern */ + uint16_t names_found; /* Number of entries so far */ + uint16_t name_entry_size; /* Size of each entry */ + uint16_t parens_depth; /* Depth of nested parentheses */ + uint16_t assert_depth; /* Depth of nested assertions */ + open_capitem *open_caps; /* Chain of open capture items */ + named_group *named_groups; /* Points to vector in pre-compile */ + uint32_t named_group_list_size; /* Number of entries in the list */ + uint32_t external_options; /* External (initial) options */ + uint32_t external_flags; /* External flag bits to be set */ + uint32_t bracount; /* Count of capturing parentheses */ + uint32_t lastcapture; /* Last capture encountered */ + uint32_t *parsed_pattern; /* Parsed pattern buffer */ + uint32_t *parsed_pattern_end; /* Parsed pattern should not get here */ + uint32_t *groupinfo; /* Group info vector */ + uint32_t top_backref; /* Maximum back reference */ + uint32_t backref_map; /* Bitmap of low back refs */ + uint32_t nltype; /* Newline type */ + uint32_t nllen; /* Newline string length */ + uint32_t class_range_start; /* Overall class range start */ + uint32_t class_range_end; /* Overall class range end */ + PCRE2_UCHAR nl[4]; /* Newline string 
when fixed length */ + int max_lookbehind; /* Maximum lookbehind (characters) */ + int req_varyopt; /* "After variable item" flag for reqbyte */ + BOOL had_accept; /* (*ACCEPT) encountered */ + BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */ + BOOL had_recurse; /* Had a recursion or subroutine call */ + BOOL dupnames; /* Duplicate names exist */ +} compile_block; + +/* Structure for keeping the properties of the in-memory stack used +by the JIT matcher. */ + +typedef struct pcre2_real_jit_stack { + pcre2_memctl memctl; + void* stack; +} pcre2_real_jit_stack; + +/* Structure for items in a linked list that represents an explicit recursive +call within the pattern when running pcre_dfa_match(). */ + +typedef struct dfa_recursion_info { + struct dfa_recursion_info *prevrec; + PCRE2_SPTR subject_position; + uint32_t group_num; +} dfa_recursion_info; + +/* Structure for "stack" frames that are used for remembering backtracking +positions during matching. As these are used in a vector, with the ovector item +being extended, the size of the structure must be a multiple of PCRE2_SIZE. The +only way to check this at compile time is to force an error by generating an +array with a negative size. By putting this in a typedef (which is never used), +we don't generate any code when all is well. */ + +typedef struct heapframe { + + /* The first set of fields are variables that have to be preserved over calls + to RRMATCH(), but which do not need to be copied to new frames. 
*/ + + PCRE2_SPTR ecode; /* The current position in the pattern */ + PCRE2_SPTR temp_sptr[2]; /* Used for short-term PCRE_SPTR values */ + PCRE2_SIZE length; /* Used for character, string, or code lengths */ + PCRE2_SIZE back_frame; /* Amount to subtract on RRETURN */ + PCRE2_SIZE temp_size; /* Used for short-term PCRE2_SIZE values */ + uint32_t rdepth; /* "Recursion" depth */ + uint32_t group_frame_type; /* Type information for group frames */ + uint32_t temp_32[4]; /* Used for short-term 32-bit or BOOL values */ + uint8_t return_id; /* Where to go on in internal "return" */ + uint8_t op; /* Processing opcode */ + + /* At this point, the structure is 16-bit aligned. On most architectures + the alignment requirement for a pointer will ensure that the eptr field below + is 32-bit or 64-bit aligned. However, on m68k it is fine to have a pointer + that is 16-bit aligned. We must therefore ensure that what comes between here + and eptr is an odd multiple of 16 bits so as to get back into 32-bit + alignment. This happens naturally when PCRE2_UCHAR is 8 bits wide, but needs + fudges in the other cases. In the 32-bit case the padding comes first so that + the occu field itself is 32-bit aligned. Without the padding, this structure + is no longer a multiple of PCRE2_SIZE on m68k, and the check below fails. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 + PCRE2_UCHAR occu[6]; /* Used for other case code units */ +#elif PCRE2_CODE_UNIT_WIDTH == 16 + PCRE2_UCHAR occu[2]; /* Used for other case code units */ + uint8_t unused[2]; /* Ensure 32-bit alignment (see above) */ +#else + uint8_t unused[2]; /* Ensure 32-bit alignment (see above) */ + PCRE2_UCHAR occu[1]; /* Used for other case code units */ +#endif + + /* The rest have to be copied from the previous frame whenever a new frame + becomes current. The final field is specified as a large vector so that + runtime array bound checks don't catch references to it. 
However, for any + specific call to pcre2_match() the memory allocated for each frame structure + allows for exactly the right size ovector for the number of capturing + parentheses. (See also the comment for pcre2_real_match_data above.) */ + + PCRE2_SPTR eptr; /* MUST BE FIRST */ + PCRE2_SPTR start_match; /* Can be adjusted by \K */ + PCRE2_SPTR mark; /* Most recent mark on the success path */ + uint32_t current_recurse; /* Current (deepest) recursion number */ + uint32_t capture_last; /* Most recent capture */ + PCRE2_SIZE last_group_offset; /* Saved offset to most recent group frame */ + PCRE2_SIZE offset_top; /* Offset after highest capture */ + PCRE2_SIZE ovector[131072]; /* Must be last in the structure */ +} heapframe; + +/* This typedef is a check that the size of the heapframe structure is a +multiple of PCRE2_SIZE. See various comments above. */ + +typedef char check_heapframe_size[ + ((sizeof(heapframe) % sizeof(PCRE2_SIZE)) == 0)? (+1):(-1)]; + +/* Structure for passing "static" information around between the functions +doing traditional NFA matching (pcre2_match() and friends). 
*/ + +typedef struct match_block { + pcre2_memctl memctl; /* For general use */ + PCRE2_SIZE frame_vector_size; /* Size of a backtracking frame */ + heapframe *match_frames; /* Points to vector of frames */ + heapframe *match_frames_top; /* Points after the end of the vector */ + heapframe *stack_frames; /* The original vector on the stack */ + PCRE2_SIZE heap_limit; /* As it says */ + uint32_t match_limit; /* As it says */ + uint32_t match_limit_depth; /* As it says */ + uint32_t match_call_count; /* Number of times a new frame is created */ + BOOL hitend; /* Hit the end of the subject at some point */ + BOOL hasthen; /* Pattern contains (*THEN) */ + BOOL allowemptypartial; /* Allow empty hard partial */ + const uint8_t *lcc; /* Points to lower casing table */ + const uint8_t *fcc; /* Points to case-flipping table */ + const uint8_t *ctypes; /* Points to table of type maps */ + PCRE2_SIZE start_offset; /* The start offset value */ + PCRE2_SIZE end_offset_top; /* Highwater mark at end of match */ + uint16_t partial; /* PARTIAL options */ + uint16_t bsr_convention; /* \R interpretation */ + uint16_t name_count; /* Number of names in name table */ + uint16_t name_entry_size; /* Size of entry in names table */ + PCRE2_SPTR name_table; /* Table of group names */ + PCRE2_SPTR start_code; /* For use when recursing */ + PCRE2_SPTR start_subject; /* Start of the subject string */ + PCRE2_SPTR check_subject; /* Where UTF-checked from */ + PCRE2_SPTR end_subject; /* End of the subject string */ + PCRE2_SPTR end_match_ptr; /* Subject position at end match */ + PCRE2_SPTR start_used_ptr; /* Earliest consulted character */ + PCRE2_SPTR last_used_ptr; /* Latest consulted character */ + PCRE2_SPTR mark; /* Mark pointer to pass back on success */ + PCRE2_SPTR nomatch_mark; /* Mark pointer to pass back on failure */ + PCRE2_SPTR verb_ecode_ptr; /* For passing back info */ + PCRE2_SPTR verb_skip_ptr; /* For passing back a (*SKIP) name */ + uint32_t verb_current_recurse; /* Current 
recurse when (*VERB) happens */ + uint32_t moptions; /* Match options */ + uint32_t poptions; /* Pattern options */ + uint32_t skip_arg_count; /* For counting SKIP_ARGs */ + uint32_t ignore_skip_arg; /* For re-run when SKIP arg name not found */ + uint32_t nltype; /* Newline type */ + uint32_t nllen; /* Newline string length */ + PCRE2_UCHAR nl[4]; /* Newline string when fixed */ + pcre2_callout_block *cb; /* Points to a callout block */ + void *callout_data; /* To pass back to callouts */ + int (*callout)(pcre2_callout_block *,void *); /* Callout function or NULL */ +} match_block; + +/* A similar structure is used for the same purpose by the DFA matching +functions. */ + +typedef struct dfa_match_block { + pcre2_memctl memctl; /* For general use */ + PCRE2_SPTR start_code; /* Start of the compiled pattern */ + PCRE2_SPTR start_subject ; /* Start of the subject string */ + PCRE2_SPTR end_subject; /* End of subject string */ + PCRE2_SPTR start_used_ptr; /* Earliest consulted character */ + PCRE2_SPTR last_used_ptr; /* Latest consulted character */ + const uint8_t *tables; /* Character tables */ + PCRE2_SIZE start_offset; /* The start offset value */ + PCRE2_SIZE heap_limit; /* As it says */ + PCRE2_SIZE heap_used; /* As it says */ + uint32_t match_limit; /* As it says */ + uint32_t match_limit_depth; /* As it says */ + uint32_t match_call_count; /* Number of calls of internal function */ + uint32_t moptions; /* Match options */ + uint32_t poptions; /* Pattern options */ + uint32_t nltype; /* Newline type */ + uint32_t nllen; /* Newline string length */ + BOOL allowemptypartial; /* Allow empty hard partial */ + PCRE2_UCHAR nl[4]; /* Newline string when fixed */ + uint16_t bsr_convention; /* \R interpretation */ + pcre2_callout_block *cb; /* Points to a callout block */ + void *callout_data; /* To pass back to callouts */ + int (*callout)(pcre2_callout_block *,void *); /* Callout function or NULL */ + dfa_recursion_info *recursive; /* Linked list of recursion data */ 
+} dfa_match_block; + +#endif /* PCRE2_PCRE2TEST */ + +/* End of pcre2_intmodedep.h */ diff --git a/src/pcre/pcre_jit_compile.c b/src/pcre2/src/pcre2_jit_compile.c similarity index 60% rename from src/pcre/pcre_jit_compile.c rename to src/pcre2/src/pcre2_jit_compile.c index bc5f9c01..f3a26aee 100644 --- a/src/pcre/pcre_jit_compile.c +++ b/src/pcre2/src/pcre2_jit_compile.c @@ -6,10 +6,9 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2013 University of Cambridge - - The machine code generator part (this module) was written by Zoltan Herczeg - Copyright (c) 2010-2013 + This module by Zoltan Herczeg + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -44,20 +43,38 @@ POSSIBILITY OF SUCH DAMAGE. #include "config.h" #endif -#include "pcre_internal.h" +#include "pcre2_internal.h" -#if defined SUPPORT_JIT +#ifdef SUPPORT_JIT /* All-in-one: Since we use the JIT compiler only from here, we just include it. This way we don't need to touch the build system files. 
*/ -#define SLJIT_MALLOC(size, allocator_data) (PUBL(malloc))(size) -#define SLJIT_FREE(ptr, allocator_data) (PUBL(free))(ptr) #define SLJIT_CONFIG_AUTO 1 #define SLJIT_CONFIG_STATIC 1 #define SLJIT_VERBOSE 0 + +#ifdef PCRE2_DEBUG +#define SLJIT_DEBUG 1 +#else #define SLJIT_DEBUG 0 +#endif + +#define SLJIT_MALLOC(size, allocator_data) pcre2_jit_malloc(size, allocator_data) +#define SLJIT_FREE(ptr, allocator_data) pcre2_jit_free(ptr, allocator_data) + +static void * pcre2_jit_malloc(size_t size, void *allocator_data) +{ +pcre2_memctl *allocator = ((pcre2_memctl*)allocator_data); +return allocator->malloc(size, allocator->memory_data); +} + +static void pcre2_jit_free(void *ptr, void *allocator_data) +{ +pcre2_memctl *allocator = ((pcre2_memctl*)allocator_data); +allocator->free(ptr, allocator->memory_data); +} #include "sljit/sljitLir.c" @@ -160,28 +177,27 @@ Thus we can restore the private data to a particular point in the stack. typedef struct jit_arguments { /* Pointers first. */ struct sljit_stack *stack; - const pcre_uchar *str; - const pcre_uchar *begin; - const pcre_uchar *end; - int *offsets; - pcre_uchar *mark_ptr; + PCRE2_SPTR str; + PCRE2_SPTR begin; + PCRE2_SPTR end; + pcre2_match_data *match_data; + PCRE2_SPTR startchar_ptr; + PCRE2_UCHAR *mark_ptr; + int (*callout)(pcre2_callout_block *, void *); void *callout_data; /* Everything else after. 
*/ + sljit_uw offset_limit; sljit_u32 limit_match; - int real_offset_count; - int offset_count; - sljit_u8 notbol; - sljit_u8 noteol; - sljit_u8 notempty; - sljit_u8 notempty_atstart; + sljit_u32 oveccount; + sljit_u32 options; } jit_arguments; +#define JIT_NUMBER_OF_COMPILE_MODES 3 + typedef struct executable_functions { void *executable_funcs[JIT_NUMBER_OF_COMPILE_MODES]; void *read_only_data_heads[JIT_NUMBER_OF_COMPILE_MODES]; sljit_uw executable_sizes[JIT_NUMBER_OF_COMPILE_MODES]; - PUBL(jit_callback) callback; - void *userdata; sljit_u32 top_bracket; sljit_u32 limit_match; } executable_functions; @@ -197,12 +213,6 @@ typedef struct stub_list { struct stub_list *next; } stub_list; -typedef struct label_addr_list { - struct sljit_label *label; - sljit_uw *update_addr; - struct label_addr_list *next; -} label_addr_list; - enum frame_types { no_frame = -1, no_stack = -2 @@ -213,6 +223,12 @@ enum control_types { type_then_trap = 1 }; +enum early_fail_types { + type_skip = 0, + type_fail = 1, + type_fail_range = 2 +}; + typedef int (SLJIT_FUNC *jit_function)(jit_arguments *args); /* The following structure is the key data type for the recursive @@ -227,7 +243,7 @@ typedef struct backtrack_common { struct backtrack_common *top; jump_list *topbacktracks; /* Opcode pointer. */ - pcre_uchar *cc; + PCRE2_SPTR cc; } backtrack_common; typedef struct assert_backtrack { @@ -256,6 +272,8 @@ typedef struct bracket_backtrack { assert_backtrack *assert; /* For OP_ONCE. Less than 0 if not needed. */ int framesize; + /* For brackets with >3 alternatives. */ + struct sljit_put_label *matching_put_label; } u; /* Points to our private memory word on the stack. 
*/ int private_data_ptr; @@ -284,7 +302,7 @@ typedef struct char_iterator_backtrack { jump_list *backtracks; struct { unsigned int othercasebit; - pcre_uchar chr; + PCRE2_UCHAR chr; BOOL enabled; } charpos; } u; @@ -298,16 +316,25 @@ typedef struct ref_iterator_backtrack { typedef struct recurse_entry { struct recurse_entry *next; - /* Contains the function entry. */ - struct sljit_label *entry; - /* Collects the calls until the function is not created. */ - jump_list *calls; + /* Contains the function entry label. */ + struct sljit_label *entry_label; + /* Contains the function entry label. */ + struct sljit_label *backtrack_label; + /* Collects the entry calls until the function is not created. */ + jump_list *entry_calls; + /* Collects the backtrack calls until the function is not created. */ + jump_list *backtrack_calls; /* Points to the starting opcode. */ sljit_sw start; } recurse_entry; typedef struct recurse_backtrack { backtrack_common common; + /* Return to the matching path. */ + struct sljit_label *matchingpath; + /* Recursive pattern. */ + recurse_entry *entry; + /* Pattern is inlined. */ BOOL inlined_pattern; } recurse_backtrack; @@ -326,13 +353,28 @@ typedef struct then_trap_backtrack { int framesize; } then_trap_backtrack; -#define MAX_RANGE_SIZE 4 +#define MAX_N_CHARS 12 +#define MAX_DIFF_CHARS 5 + +typedef struct fast_forward_char_data { + /* Number of characters in the chars array, 255 for any character. */ + sljit_u8 count; + /* Number of last UTF-8 characters in the chars array. */ + sljit_u8 last_count; + /* Available characters in the current position. */ + PCRE2_UCHAR chars[MAX_DIFF_CHARS]; +} fast_forward_char_data; + +#define MAX_CLASS_RANGE_SIZE 4 +#define MAX_CLASS_CHARS_SIZE 3 typedef struct compiler_common { /* The sljit ceneric compiler. */ struct sljit_compiler *compiler; + /* Compiled regular expression. */ + pcre2_real_code *re; /* First byte code. 
*/ - pcre_uchar *start; + PCRE2_SPTR start; /* Maps private data offset to each opcode. */ sljit_s32 *private_data_ptrs; /* Chain list of read-only data ptrs. */ @@ -367,16 +409,18 @@ typedef struct compiler_common { /* Points to the last matched capture block index. */ sljit_s32 capture_last_ptr; /* Fast forward skipping byte code pointer. */ - pcre_uchar *fast_forward_bc_ptr; + PCRE2_SPTR fast_forward_bc_ptr; /* Locals used by fast fail optimization. */ - sljit_s32 fast_fail_start_ptr; - sljit_s32 fast_fail_end_ptr; + sljit_s32 early_fail_start_ptr; + sljit_s32 early_fail_end_ptr; /* Flipped and lower case tables. */ const sljit_u8 *fcc; sljit_sw lcc; - /* Mode can be PCRE_STUDY_JIT_COMPILE and others. */ + /* Mode can be PCRE2_JIT_COMPLETE and others. */ int mode; + /* TRUE, when empty match is accepted for partial matching. */ + BOOL allow_empty_partial; /* TRUE, when minlength is greater than 0. */ BOOL might_be_empty; /* \K is found in the pattern. */ @@ -387,10 +431,10 @@ typedef struct compiler_common { BOOL has_then; /* (*SKIP) or (*SKIP:arg) is found in lookbehind assertion. */ BOOL has_skip_in_assert_back; - /* Currently in recurse or negative assert. */ - BOOL local_exit; - /* Currently in a positive assert. */ - BOOL positive_assert; + /* Quit is redirected by recurse, negative assertion, or positive assertion in conditional block. */ + BOOL local_quit_available; + /* Currently in a positive assertion. */ + BOOL in_positive_assertion; /* Newline control. */ int nltype; sljit_u32 nlmax; @@ -404,24 +448,24 @@ typedef struct compiler_common { /* Tables. */ sljit_sw ctypes; /* Named capturing brackets. */ - pcre_uchar *name_table; + PCRE2_SPTR name_table; sljit_sw name_count; sljit_sw name_entry_size; /* Labels and jump lists. 
*/ struct sljit_label *partialmatchlabel; struct sljit_label *quit_label; - struct sljit_label *forced_quit_label; + struct sljit_label *abort_label; struct sljit_label *accept_label; struct sljit_label *ff_newline_shortcut; stub_list *stubs; - label_addr_list *label_addrs; recurse_entry *entries; recurse_entry *currententry; jump_list *partialmatch; jump_list *quit; - jump_list *positive_assert_quit; - jump_list *forced_quit; + jump_list *positive_assertion_quit; + jump_list *abort; + jump_list *failed_match; jump_list *accept; jump_list *calllimit; jump_list *stackalloc; @@ -433,19 +477,28 @@ typedef struct compiler_common { jump_list *casefulcmp; jump_list *caselesscmp; jump_list *reset_match; - BOOL jscript_compat; -#ifdef SUPPORT_UTF + BOOL unset_backref; + BOOL alt_circumflex; +#ifdef SUPPORT_UNICODE BOOL utf; -#ifdef SUPPORT_UCP - BOOL use_ucp; + BOOL invalid_utf; + BOOL ucp; + /* Points to saving area for iref. */ + sljit_s32 iref_ptr; jump_list *getucd; -#endif -#ifdef COMPILE_PCRE8 + jump_list *getucdtype; +#if PCRE2_CODE_UNIT_WIDTH == 8 jump_list *utfreadchar; - jump_list *utfreadchar16; jump_list *utfreadtype8; + jump_list *utfpeakcharback; +#endif +#if PCRE2_CODE_UNIT_WIDTH == 8 || PCRE2_CODE_UNIT_WIDTH == 16 + jump_list *utfreadchar_invalid; + jump_list *utfreadnewline_invalid; + jump_list *utfmoveback_invalid; + jump_list *utfpeakcharback_invalid; #endif -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ } compiler_common; /* For byte_sequence_compare. 
*/ @@ -458,24 +511,24 @@ typedef struct compare_context { union { sljit_s32 asint; sljit_u16 asushort; -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 sljit_u8 asbyte; sljit_u8 asuchars[4]; -#elif defined COMPILE_PCRE16 +#elif PCRE2_CODE_UNIT_WIDTH == 16 sljit_u16 asuchars[2]; -#elif defined COMPILE_PCRE32 +#elif PCRE2_CODE_UNIT_WIDTH == 32 sljit_u32 asuchars[1]; #endif } c; union { sljit_s32 asint; sljit_u16 asushort; -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 sljit_u8 asbyte; sljit_u8 asuchars[4]; -#elif defined COMPILE_PCRE16 +#elif PCRE2_CODE_UNIT_WIDTH == 16 sljit_u16 asuchars[2]; -#elif defined COMPILE_PCRE32 +#elif PCRE2_CODE_UNIT_WIDTH == 32 sljit_u32 asuchars[1]; #endif } oc; @@ -506,14 +559,20 @@ typedef struct compare_context { #define TMP2 SLJIT_R2 #define TMP3 SLJIT_R3 #endif -#define STR_PTR SLJIT_S0 -#define STR_END SLJIT_S1 -#define STACK_TOP SLJIT_R1 +#define STR_PTR SLJIT_R1 +#define STR_END SLJIT_S0 +#define STACK_TOP SLJIT_S1 #define STACK_LIMIT SLJIT_S2 #define COUNT_MATCH SLJIT_S3 #define ARGUMENTS SLJIT_S4 #define RETURN_ADDR SLJIT_R4 +#if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) +#define HAS_VIRTUAL_REGISTERS 1 +#else +#define HAS_VIRTUAL_REGISTERS 0 +#endif + /* Local space layout. */ /* These two locals can be used by the current opcode. */ #define LOCALS0 (0 * sizeof(sljit_sw)) @@ -532,12 +591,17 @@ the start pointers when the end of the capturing group has not yet reached. 
*/ #define OVECTOR_PRIV(i) (common->cbra_ptr + (i) * (sljit_sw)sizeof(sljit_sw)) #define PRIVATE_DATA(cc) (common->private_data_ptrs[(cc) - common->start]) -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 #define MOV_UCHAR SLJIT_MOV_U8 -#elif defined COMPILE_PCRE16 +#define IN_UCHARS(x) (x) +#elif PCRE2_CODE_UNIT_WIDTH == 16 #define MOV_UCHAR SLJIT_MOV_U16 -#elif defined COMPILE_PCRE32 +#define UCHAR_SHIFT (1) +#define IN_UCHARS(x) ((x) * 2) +#elif PCRE2_CODE_UNIT_WIDTH == 32 #define MOV_UCHAR SLJIT_MOV_U32 +#define UCHAR_SHIFT (2) +#define IN_UCHARS(x) ((x) * 4) #else #error Unsupported compiling mode #endif @@ -549,6 +613,8 @@ the start pointers when the end of the capturing group has not yet reached. */ sljit_emit_op1(compiler, (op), (dst), (dstw), (src), (srcw)) #define OP2(op, dst, dstw, src1, src1w, src2, src2w) \ sljit_emit_op2(compiler, (op), (dst), (dstw), (src1), (src1w), (src2), (src2w)) +#define OP_SRC(op, src, srcw) \ + sljit_emit_op_src(compiler, (op), (src), (srcw)) #define LABEL() \ sljit_emit_label(compiler) #define JUMP(type) \ @@ -565,26 +631,217 @@ the start pointers when the end of the capturing group has not yet reached. 
*/ sljit_set_label(sljit_emit_cmp(compiler, (type), (src1), (src1w), (src2), (src2w)), (label)) #define OP_FLAGS(op, dst, dstw, type) \ sljit_emit_op_flags(compiler, (op), (dst), (dstw), (type)) +#define CMOV(type, dst_reg, src, srcw) \ + sljit_emit_cmov(compiler, (type), (dst_reg), (src), (srcw)) #define GET_LOCAL_BASE(dst, dstw, offset) \ sljit_get_local_base(compiler, (dst), (dstw), (offset)) #define READ_CHAR_MAX 0x7fffffff -#define INVALID_UTF_CHAR 888 +#define INVALID_UTF_CHAR -1 +#define UNASSIGNED_UTF_CHAR 888 + +#if defined SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + +#define GETCHARINC_INVALID(c, ptr, end, invalid_action) \ + { \ + if (ptr[0] <= 0x7f) \ + c = *ptr++; \ + else if (ptr + 1 < end && ptr[1] >= 0x80 && ptr[1] < 0xc0) \ + { \ + c = ptr[1] - 0x80; \ + \ + if (ptr[0] >= 0xc2 && ptr[0] <= 0xdf) \ + { \ + c |= (ptr[0] - 0xc0) << 6; \ + ptr += 2; \ + } \ + else if (ptr + 2 < end && ptr[2] >= 0x80 && ptr[2] < 0xc0) \ + { \ + c = c << 6 | (ptr[2] - 0x80); \ + \ + if (ptr[0] >= 0xe0 && ptr[0] <= 0xef) \ + { \ + c |= (ptr[0] - 0xe0) << 12; \ + ptr += 3; \ + \ + if (c < 0x800 || (c >= 0xd800 && c < 0xe000)) \ + { \ + invalid_action; \ + } \ + } \ + else if (ptr + 3 < end && ptr[3] >= 0x80 && ptr[3] < 0xc0) \ + { \ + c = c << 6 | (ptr[3] - 0x80); \ + \ + if (ptr[0] >= 0xf0 && ptr[0] <= 0xf4) \ + { \ + c |= (ptr[0] - 0xf0) << 18; \ + ptr += 4; \ + \ + if (c >= 0x110000 || c < 0x10000) \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } + +#define GETCHARBACK_INVALID(c, ptr, start, invalid_action) \ + { \ + c = ptr[-1]; \ + if (c <= 0x7f) \ + ptr--; \ + else if (ptr - 1 > start && ptr[-1] >= 0x80 && ptr[-1] < 0xc0) \ + { \ + c -= 0x80; \ + \ + if (ptr[-2] >= 0xc2 && ptr[-2] <= 0xdf) \ + { \ + c |= (ptr[-2] - 0xc0) << 6; \ + ptr -= 2; \ + } \ + else if (ptr - 2 > start && ptr[-2] 
>= 0x80 && ptr[-2] < 0xc0) \ + { \ + c = c << 6 | (ptr[-2] - 0x80); \ + \ + if (ptr[-3] >= 0xe0 && ptr[-3] <= 0xef) \ + { \ + c |= (ptr[-3] - 0xe0) << 12; \ + ptr -= 3; \ + \ + if (c < 0x800 || (c >= 0xd800 && c < 0xe000)) \ + { \ + invalid_action; \ + } \ + } \ + else if (ptr - 3 > start && ptr[-3] >= 0x80 && ptr[-3] < 0xc0) \ + { \ + c = c << 6 | (ptr[-3] - 0x80); \ + \ + if (ptr[-4] >= 0xf0 && ptr[-4] <= 0xf4) \ + { \ + c |= (ptr[-4] - 0xf0) << 18; \ + ptr -= 4; \ + \ + if (c >= 0x110000 || c < 0x10000) \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } \ + else \ + { \ + invalid_action; \ + } \ + } + +#elif PCRE2_CODE_UNIT_WIDTH == 16 + +#define GETCHARINC_INVALID(c, ptr, end, invalid_action) \ + { \ + if (ptr[0] < 0xd800 || ptr[0] >= 0xe000) \ + c = *ptr++; \ + else if (ptr[0] < 0xdc00 && ptr + 1 < end && ptr[1] >= 0xdc00 && ptr[1] < 0xe000) \ + { \ + c = (((ptr[0] - 0xd800) << 10) | (ptr[1] - 0xdc00)) + 0x10000; \ + ptr += 2; \ + } \ + else \ + { \ + invalid_action; \ + } \ + } + +#define GETCHARBACK_INVALID(c, ptr, start, invalid_action) \ + { \ + c = ptr[-1]; \ + if (c < 0xd800 || c >= 0xe000) \ + ptr--; \ + else if (c >= 0xdc00 && ptr - 1 > start && ptr[-2] >= 0xd800 && ptr[-2] < 0xdc00) \ + { \ + c = (((ptr[-2] - 0xd800) << 10) | (c - 0xdc00)) + 0x10000; \ + ptr -= 2; \ + } \ + else \ + { \ + invalid_action; \ + } \ + } + + +#elif PCRE2_CODE_UNIT_WIDTH == 32 + +#define GETCHARINC_INVALID(c, ptr, end, invalid_action) \ + { \ + if (ptr[0] < 0xd800 || (ptr[0] >= 0xe000 && ptr[0] < 0x110000)) \ + c = *ptr++; \ + else \ + { \ + invalid_action; \ + } \ + } + +#define GETCHARBACK_INVALID(c, ptr, start, invalid_action) \ + { \ + c = ptr[-1]; \ + if (ptr[-1] < 0xd800 || (ptr[-1] >= 0xe000 && ptr[-1] < 0x110000)) \ + ptr--; \ + else \ + { \ + invalid_action; \ + } \ + } + +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ +#endif /* 
SUPPORT_UNICODE */ -static pcre_uchar *bracketend(pcre_uchar *cc) +static PCRE2_SPTR bracketend(PCRE2_SPTR cc) { -SLJIT_ASSERT((*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NOT) || (*cc >= OP_ONCE && *cc <= OP_SCOND)); +SLJIT_ASSERT((*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NA) || (*cc >= OP_ONCE && *cc <= OP_SCOND)); do cc += GET(cc, 1); while (*cc == OP_ALT); SLJIT_ASSERT(*cc >= OP_KET && *cc <= OP_KETRPOS); cc += 1 + LINK_SIZE; return cc; } -static int no_alternatives(pcre_uchar *cc) +static int no_alternatives(PCRE2_SPTR cc) { int count = 0; -SLJIT_ASSERT((*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NOT) || (*cc >= OP_ONCE && *cc <= OP_SCOND)); +SLJIT_ASSERT((*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NA) || (*cc >= OP_ONCE && *cc <= OP_SCOND)); do { cc += GET(cc, 1); @@ -601,13 +858,13 @@ return count; set_private_data_ptrs get_framesize init_frame - get_private_data_copy_length - copy_private_data + get_recurse_data_length + copy_recurse_data compile_matchingpath compile_backtrackingpath */ -static pcre_uchar *next_opcode(compiler_common *common, pcre_uchar *cc) +static PCRE2_SPTR next_opcode(compiler_common *common, PCRE2_SPTR cc) { SLJIT_UNUSED_ARG(common); switch(*cc) @@ -669,8 +926,10 @@ switch(*cc) case OP_ASSERT_NOT: case OP_ASSERTBACK: case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_BRA: case OP_BRAPOS: case OP_CBRA: @@ -685,7 +944,8 @@ switch(*cc) case OP_DNCREF: case OP_RREF: case OP_DNRREF: - case OP_DEF: + case OP_FALSE: + case OP_TRUE: case OP_BRAZERO: case OP_BRAMINZERO: case OP_BRAPOSZERO: @@ -757,7 +1017,7 @@ switch(*cc) case OP_NOTPOSQUERYI: case OP_NOTPOSUPTOI: cc += PRIV(OP_lengths)[*cc]; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif return cc; @@ -779,34 +1039,38 @@ switch(*cc) return cc + PRIV(OP_lengths)[*cc] - 1; case OP_ANYBYTE: -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf) 
return NULL; #endif return cc + 1; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 + case OP_CALLOUT_STR: + return cc + GET(cc, 1 + 2*LINK_SIZE); + +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 case OP_XCLASS: return cc + GET(cc, 1); #endif case OP_MARK: + case OP_COMMIT_ARG: case OP_PRUNE_ARG: case OP_SKIP_ARG: case OP_THEN_ARG: return cc + 1 + 2 + cc[1]; default: - /* All opcodes are supported now! */ SLJIT_UNREACHABLE(); return NULL; } } -static BOOL check_opcode_types(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend) +static BOOL check_opcode_types(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend) { int count; -pcre_uchar *slot; -pcre_uchar *assert_back_end = cc - 1; +PCRE2_SPTR slot; +PCRE2_SPTR assert_back_end = cc - 1; +PCRE2_SPTR assert_na_end = cc - 1; /* Calculate important variables (like stack size) and checks whether all opcodes are supported. */ while (cc < ccend) @@ -819,12 +1083,28 @@ while (cc < ccend) cc += 1; break; - case OP_REF: case OP_REFI: +#ifdef SUPPORT_UNICODE + if (common->iref_ptr == 0) + { + common->iref_ptr = common->ovector_start; + common->ovector_start += 3 * sizeof(sljit_sw); + } +#endif /* SUPPORT_UNICODE */ + /* Fall through. */ + case OP_REF: common->optimized_cbracket[GET2(cc, 1)] = 0; cc += 1 + IMM2_SIZE; break; + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: + slot = bracketend(cc); + if (slot > assert_na_end) + assert_na_end = slot; + cc += 1 + LINK_SIZE; + break; + case OP_CBRAPOS: case OP_SCBRAPOS: common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] = 0; @@ -835,7 +1115,7 @@ while (cc < ccend) case OP_SCOND: /* Only AUTO_CALLOUT can insert this opcode. We do not intend to support this case. 
*/ - if (cc[1 + LINK_SIZE] == OP_CALLOUT) + if (cc[1 + LINK_SIZE] == OP_CALLOUT || cc[1 + LINK_SIZE] == OP_CALLOUT_STR) return FALSE; cc += 1 + LINK_SIZE; break; @@ -869,12 +1149,13 @@ while (cc < ccend) break; case OP_CALLOUT: + case OP_CALLOUT_STR: if (common->capture_last_ptr == 0) { common->capture_last_ptr = common->ovector_start; common->ovector_start += sizeof(sljit_sw); } - cc += 2 + 2 * LINK_SIZE; + cc += (*cc == OP_CALLOUT) ? PRIV(OP_lengths)[OP_CALLOUT] : GET(cc, 1 + 2*LINK_SIZE); break; case OP_ASSERTBACK: @@ -889,7 +1170,11 @@ while (cc < ccend) common->control_head_ptr = 1; /* Fall through. */ + case OP_COMMIT_ARG: case OP_PRUNE_ARG: + if (cc < assert_na_end) + return FALSE; + /* Fall through */ case OP_MARK: if (common->mark_ptr == 0) { @@ -908,6 +1193,8 @@ while (cc < ccend) case OP_SKIP: if (cc < assert_back_end) common->has_skip_in_assert_back = TRUE; + if (cc < assert_na_end) + return FALSE; cc += 1; break; @@ -916,9 +1203,19 @@ while (cc < ccend) common->has_skip_arg = TRUE; if (cc < assert_back_end) common->has_skip_in_assert_back = TRUE; + if (cc < assert_na_end) + return FALSE; cc += 1 + 2 + cc[1]; break; + case OP_PRUNE: + case OP_COMMIT: + case OP_ASSERT_ACCEPT: + if (cc < assert_na_end) + return FALSE; + cc++; + break; + default: cc = next_opcode(common, cc); if (cc == NULL) @@ -929,186 +1226,349 @@ while (cc < ccend) return TRUE; } -static BOOL is_accelerated_repeat(pcre_uchar *cc) +#define EARLY_FAIL_ENHANCE_MAX (1 + 3) + +/* +start: + 0 - skip / early fail allowed + 1 - only early fail with range allowed + >1 - (start - 1) early fail is processed + +return: current number of iterators enhanced with fast fail +*/ +static int detect_early_fail(compiler_common *common, PCRE2_SPTR cc, int *private_data_start, sljit_s32 depth, int start) { -switch(*cc) - { - case OP_TYPESTAR: - case OP_TYPEMINSTAR: - case OP_TYPEPLUS: - case OP_TYPEMINPLUS: - case OP_TYPEPOSSTAR: - case OP_TYPEPOSPLUS: - return (cc[1] != OP_ANYNL && cc[1] != OP_EXTUNI); 
+PCRE2_SPTR begin = cc; +PCRE2_SPTR next_alt; +PCRE2_SPTR end; +PCRE2_SPTR accelerated_start; +int result = 0; +int count; +BOOL fast_forward_allowed = TRUE; - case OP_STAR: - case OP_MINSTAR: - case OP_PLUS: - case OP_MINPLUS: - case OP_POSSTAR: - case OP_POSPLUS: +SLJIT_ASSERT(*cc == OP_ONCE || *cc == OP_BRA || *cc == OP_CBRA); +SLJIT_ASSERT(*cc != OP_CBRA || common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] != 0); +SLJIT_ASSERT(start < EARLY_FAIL_ENHANCE_MAX); - case OP_STARI: - case OP_MINSTARI: - case OP_PLUSI: - case OP_MINPLUSI: - case OP_POSSTARI: - case OP_POSPLUSI: +do + { + count = start; + next_alt = cc + GET(cc, 1); + cc += 1 + LINK_SIZE + ((*cc == OP_CBRA) ? IMM2_SIZE : 0); - case OP_NOTSTAR: - case OP_NOTMINSTAR: - case OP_NOTPLUS: - case OP_NOTMINPLUS: - case OP_NOTPOSSTAR: - case OP_NOTPOSPLUS: + while (TRUE) + { + accelerated_start = NULL; - case OP_NOTSTARI: - case OP_NOTMINSTARI: - case OP_NOTPLUSI: - case OP_NOTMINPLUSI: - case OP_NOTPOSSTARI: - case OP_NOTPOSPLUSI: - return TRUE; + switch(*cc) + { + case OP_SOD: + case OP_SOM: + case OP_SET_SOM: + case OP_NOT_WORD_BOUNDARY: + case OP_WORD_BOUNDARY: + case OP_EODN: + case OP_EOD: + case OP_CIRC: + case OP_CIRCM: + case OP_DOLL: + case OP_DOLLM: + /* Zero width assertions. */ + cc++; + continue; + + case OP_NOT_DIGIT: + case OP_DIGIT: + case OP_NOT_WHITESPACE: + case OP_WHITESPACE: + case OP_NOT_WORDCHAR: + case OP_WORDCHAR: + case OP_ANY: + case OP_ALLANY: + case OP_ANYBYTE: + case OP_NOT_HSPACE: + case OP_HSPACE: + case OP_NOT_VSPACE: + case OP_VSPACE: + fast_forward_allowed = FALSE; + cc++; + continue; - case OP_CLASS: - case OP_NCLASS: -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 - case OP_XCLASS: - cc += (*cc == OP_XCLASS) ? 
GET(cc, 1) : (int)(1 + (32 / sizeof(pcre_uchar))); -#else - cc += (1 + (32 / sizeof(pcre_uchar))); + case OP_ANYNL: + case OP_EXTUNI: + fast_forward_allowed = FALSE; + if (count == 0) + count = 1; + cc++; + continue; + + case OP_NOTPROP: + case OP_PROP: + fast_forward_allowed = FALSE; + cc += 1 + 2; + continue; + + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + fast_forward_allowed = FALSE; + cc += 2; +#ifdef SUPPORT_UNICODE + if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif + continue; - switch(*cc) - { - case OP_CRSTAR: - case OP_CRMINSTAR: - case OP_CRPLUS: - case OP_CRMINPLUS: - case OP_CRPOSSTAR: - case OP_CRPOSPLUS: - return TRUE; - } - break; - } -return FALSE; -} + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEPOSSTAR: + case OP_TYPEPOSPLUS: + /* The type or prop opcode is skipped in the next iteration. */ + cc += 1; -static SLJIT_INLINE BOOL detect_fast_forward_skip(compiler_common *common, int *private_data_start) -{ -pcre_uchar *cc = common->start; -pcre_uchar *end; + if (cc[0] != OP_ANYNL && cc[0] != OP_EXTUNI) + { + accelerated_start = cc - 1; + break; + } -/* Skip not repeated brackets. */ -while (TRUE) - { - switch(*cc) - { - case OP_SOD: - case OP_SOM: - case OP_SET_SOM: - case OP_NOT_WORD_BOUNDARY: - case OP_WORD_BOUNDARY: - case OP_EODN: - case OP_EOD: - case OP_CIRC: - case OP_CIRCM: - case OP_DOLL: - case OP_DOLLM: - /* Zero width assertions. */ - cc++; - continue; - } + if (count == 0) + count = 1; + fast_forward_allowed = FALSE; + continue; - if (*cc != OP_BRA && *cc != OP_CBRA) - break; + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEEXACT: + case OP_TYPEPOSUPTO: + cc += IMM2_SIZE; + /* Fall through */ + + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEPOSQUERY: + /* The type or prop opcode is skipped in the next iteration. 
*/ + fast_forward_allowed = FALSE; + if (count == 0) + count = 1; + cc += 1; + continue; + + case OP_STAR: + case OP_MINSTAR: + case OP_PLUS: + case OP_MINPLUS: + case OP_POSSTAR: + case OP_POSPLUS: + + case OP_STARI: + case OP_MINSTARI: + case OP_PLUSI: + case OP_MINPLUSI: + case OP_POSSTARI: + case OP_POSPLUSI: + + case OP_NOTSTAR: + case OP_NOTMINSTAR: + case OP_NOTPLUS: + case OP_NOTMINPLUS: + case OP_NOTPOSSTAR: + case OP_NOTPOSPLUS: + + case OP_NOTSTARI: + case OP_NOTMINSTARI: + case OP_NOTPLUSI: + case OP_NOTMINPLUSI: + case OP_NOTPOSSTARI: + case OP_NOTPOSPLUSI: + accelerated_start = cc; + cc += 2; +#ifdef SUPPORT_UNICODE + if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); +#endif + break; - end = cc + GET(cc, 1); - if (*end != OP_KET || PRIVATE_DATA(end) != 0) - return FALSE; - if (*cc == OP_CBRA) - { - if (common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] == 0) - return FALSE; - cc += IMM2_SIZE; - } - cc += 1 + LINK_SIZE; - } + case OP_UPTO: + case OP_MINUPTO: + case OP_EXACT: + case OP_POSUPTO: + case OP_UPTOI: + case OP_MINUPTOI: + case OP_EXACTI: + case OP_POSUPTOI: + case OP_NOTUPTO: + case OP_NOTMINUPTO: + case OP_NOTEXACT: + case OP_NOTPOSUPTO: + case OP_NOTUPTOI: + case OP_NOTMINUPTOI: + case OP_NOTEXACTI: + case OP_NOTPOSUPTOI: + cc += IMM2_SIZE; + /* Fall through */ + + case OP_QUERY: + case OP_MINQUERY: + case OP_POSQUERY: + case OP_QUERYI: + case OP_MINQUERYI: + case OP_POSQUERYI: + case OP_NOTQUERY: + case OP_NOTMINQUERY: + case OP_NOTPOSQUERY: + case OP_NOTQUERYI: + case OP_NOTMINQUERYI: + case OP_NOTPOSQUERYI: + fast_forward_allowed = FALSE; + if (count == 0) + count = 1; + cc += 2; +#ifdef SUPPORT_UNICODE + if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); +#endif + continue; -if (is_accelerated_repeat(cc)) - { - common->fast_forward_bc_ptr = cc; - common->private_data_ptrs[(cc + 1) - common->start] = *private_data_start; - *private_data_start += sizeof(sljit_sw); - return TRUE; - } -return FALSE; -} 
+ case OP_CLASS: + case OP_NCLASS: +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 + case OP_XCLASS: + accelerated_start = cc; + cc += ((*cc == OP_XCLASS) ? GET(cc, 1) : (unsigned int)(1 + (32 / sizeof(PCRE2_UCHAR)))); +#else + accelerated_start = cc; + cc += (1 + (32 / sizeof(PCRE2_UCHAR))); +#endif -static SLJIT_INLINE void detect_fast_fail(compiler_common *common, pcre_uchar *cc, int *private_data_start, sljit_s32 depth) -{ - pcre_uchar *next_alt; + switch (*cc) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRPOSSTAR: + case OP_CRPOSPLUS: + cc++; + break; - SLJIT_ASSERT(*cc == OP_BRA || *cc == OP_CBRA); + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + cc += 2 * IMM2_SIZE; + /* Fall through */ + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSQUERY: + cc++; + if (count == 0) + count = 1; + /* Fall through */ + default: + accelerated_start = NULL; + fast_forward_allowed = FALSE; + continue; + } + break; - if (*cc == OP_CBRA && common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] == 0) - return; + case OP_ONCE: + case OP_BRA: + case OP_CBRA: + end = cc + GET(cc, 1); - next_alt = bracketend(cc) - (1 + LINK_SIZE); - if (*next_alt != OP_KET || PRIVATE_DATA(next_alt) != 0) - return; + fast_forward_allowed = FALSE; + if (depth >= 4) + break; - do - { - next_alt = cc + GET(cc, 1); + end = bracketend(cc) - (1 + LINK_SIZE); + if (*end != OP_KET || (*cc == OP_CBRA && common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] == 0)) + break; - cc += 1 + LINK_SIZE + ((*cc == OP_CBRA) ? 
IMM2_SIZE : 0); + count = detect_early_fail(common, cc, private_data_start, depth + 1, count); - while (TRUE) - { - switch(*cc) + if (PRIVATE_DATA(cc) != 0) + common->private_data_ptrs[begin - common->start] = 1; + + if (count < EARLY_FAIL_ENHANCE_MAX) { - case OP_SOD: - case OP_SOM: - case OP_SET_SOM: - case OP_NOT_WORD_BOUNDARY: - case OP_WORD_BOUNDARY: - case OP_EODN: - case OP_EOD: - case OP_CIRC: - case OP_CIRCM: - case OP_DOLL: - case OP_DOLLM: - /* Zero width assertions. */ - cc++; + cc = end + (1 + LINK_SIZE); continue; } break; - } - if (depth > 0 && (*cc == OP_BRA || *cc == OP_CBRA)) - detect_fast_fail(common, cc, private_data_start, depth - 1); + case OP_KET: + SLJIT_ASSERT(PRIVATE_DATA(cc) == 0); + if (cc >= next_alt) + break; + cc += 1 + LINK_SIZE; + continue; + } - if (is_accelerated_repeat(cc)) + if (accelerated_start != NULL) { - common->private_data_ptrs[(cc + 1) - common->start] = *private_data_start; + if (count == 0) + { + count++; + + if (fast_forward_allowed && *next_alt == OP_KET) + { + common->fast_forward_bc_ptr = accelerated_start; + common->private_data_ptrs[(accelerated_start + 1) - common->start] = ((*private_data_start) << 3) | type_skip; + *private_data_start += sizeof(sljit_sw); + } + else + { + common->private_data_ptrs[(accelerated_start + 1) - common->start] = ((*private_data_start) << 3) | type_fail; - if (common->fast_fail_start_ptr == 0) - common->fast_fail_start_ptr = *private_data_start; + if (common->early_fail_start_ptr == 0) + common->early_fail_start_ptr = *private_data_start; - *private_data_start += sizeof(sljit_sw); - common->fast_fail_end_ptr = *private_data_start; + *private_data_start += sizeof(sljit_sw); + common->early_fail_end_ptr = *private_data_start; - if (*private_data_start > SLJIT_MAX_LOCAL_SIZE) - return; + if (*private_data_start > SLJIT_MAX_LOCAL_SIZE) + return EARLY_FAIL_ENHANCE_MAX; + } + } + else + { + common->private_data_ptrs[(accelerated_start + 1) - common->start] = ((*private_data_start) << 3) | 
type_fail_range; + + if (common->early_fail_start_ptr == 0) + common->early_fail_start_ptr = *private_data_start; + + *private_data_start += 2 * sizeof(sljit_sw); + common->early_fail_end_ptr = *private_data_start; + + if (*private_data_start > SLJIT_MAX_LOCAL_SIZE) + return EARLY_FAIL_ENHANCE_MAX; + } + + /* Cannot be part of a repeat. */ + common->private_data_ptrs[begin - common->start] = 1; + count++; + + if (count < EARLY_FAIL_ENHANCE_MAX) + continue; } - cc = next_alt; + break; } - while (*cc == OP_ALT); + + if (*cc != OP_ALT && *cc != OP_KET) + result = EARLY_FAIL_ENHANCE_MAX; + else if (result < count) + result = count; + + fast_forward_allowed = FALSE; + cc = next_alt; + } +while (*cc == OP_ALT); + +return result; } -static int get_class_iterator_size(pcre_uchar *cc) +static int get_class_iterator_size(PCRE2_SPTR cc) { sljit_u32 min; sljit_u32 max; @@ -1140,22 +1600,23 @@ switch(*cc) } } -static BOOL detect_repeat(compiler_common *common, pcre_uchar *begin) +static BOOL detect_repeat(compiler_common *common, PCRE2_SPTR begin) { -pcre_uchar *end = bracketend(begin); -pcre_uchar *next; -pcre_uchar *next_end; -pcre_uchar *max_end; -pcre_uchar type; +PCRE2_SPTR end = bracketend(begin); +PCRE2_SPTR next; +PCRE2_SPTR next_end; +PCRE2_SPTR max_end; +PCRE2_UCHAR type; sljit_sw length = end - begin; -int min, max, i; +sljit_s32 min, max, i; /* Detect fixed iterations first. */ -if (end[-(1 + LINK_SIZE)] != OP_KET) +if (end[-(1 + LINK_SIZE)] != OP_KET || PRIVATE_DATA(begin) != 0) return FALSE; -/* Already detected repeat. */ -if (common->private_data_ptrs[end - common->start - LINK_SIZE] != 0) +/* /(?:AB){4,6}/ is currently converted to /(?:AB){3}(?AB){1,3}/ + * Skip the check of the second part. 
*/ +if (PRIVATE_DATA(end - LINK_SIZE) == 0) return TRUE; next = end; @@ -1277,11 +1738,11 @@ return FALSE; case OP_TYPEUPTO: \ case OP_TYPEMINUPTO: -static void set_private_data_ptrs(compiler_common *common, int *private_data_start, pcre_uchar *ccend) +static void set_private_data_ptrs(compiler_common *common, int *private_data_start, PCRE2_SPTR ccend) { -pcre_uchar *cc = common->start; -pcre_uchar *alternative; -pcre_uchar *end = NULL; +PCRE2_SPTR cc = common->start; +PCRE2_SPTR alternative; +PCRE2_SPTR end = NULL; int private_data_ptr = *private_data_start; int space, size, bracketlen; BOOL repeat_check = TRUE; @@ -1294,7 +1755,8 @@ while (cc < ccend) if (private_data_ptr > SLJIT_MAX_LOCAL_SIZE) break; - if (repeat_check && (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)) + /* When the bracket is prefixed by a zero iteration, skip the repeat check (at this point). */ + if (repeat_check && (*cc == OP_ONCE || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)) { if (detect_repeat(common, cc)) { @@ -1322,8 +1784,10 @@ while (cc < ccend) case OP_ASSERT_NOT: case OP_ASSERTBACK: case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_BRAPOS: case OP_SBRA: case OP_SBRAPOS: @@ -1342,6 +1806,7 @@ while (cc < ccend) case OP_COND: /* Might be a hidden SCOND. 
*/ + common->private_data_ptrs[cc - common->start] = 0; alternative = cc + GET(cc, 1); if (*alternative == OP_KETRMAX || *alternative == OP_KETRMIN) { @@ -1363,57 +1828,57 @@ while (cc < ccend) case OP_BRAZERO: case OP_BRAMINZERO: case OP_BRAPOSZERO: - repeat_check = FALSE; size = 1; + repeat_check = FALSE; break; CASE_ITERATOR_PRIVATE_DATA_1 - space = 1; size = -2; + space = 1; break; CASE_ITERATOR_PRIVATE_DATA_2A - space = 2; size = -2; + space = 2; break; CASE_ITERATOR_PRIVATE_DATA_2B - space = 2; size = -(2 + IMM2_SIZE); + space = 2; break; CASE_ITERATOR_TYPE_PRIVATE_DATA_1 - space = 1; size = 1; + space = 1; break; CASE_ITERATOR_TYPE_PRIVATE_DATA_2A + size = 1; if (cc[1] != OP_ANYNL && cc[1] != OP_EXTUNI) space = 2; - size = 1; break; case OP_TYPEUPTO: + size = 1 + IMM2_SIZE; if (cc[1 + IMM2_SIZE] != OP_ANYNL && cc[1 + IMM2_SIZE] != OP_EXTUNI) space = 2; - size = 1 + IMM2_SIZE; break; case OP_TYPEMINUPTO: - space = 2; size = 1 + IMM2_SIZE; + space = 2; break; case OP_CLASS: case OP_NCLASS: + size = 1 + 32 / sizeof(PCRE2_UCHAR); space = get_class_iterator_size(cc + size); - size = 1 + 32 / sizeof(pcre_uchar); break; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 case OP_XCLASS: - space = get_class_iterator_size(cc + size); size = GET(cc, 1); + space = get_class_iterator_size(cc + size); break; #endif @@ -1436,7 +1901,7 @@ while (cc < ccend) if (size < 0) { cc += -size; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif } @@ -1459,7 +1924,7 @@ while (cc < ccend) } /* Returns with a frame_types (always < 0) if no need for frame. 
*/ -static int get_framesize(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, BOOL recursive, BOOL *needs_control_head) +static int get_framesize(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, BOOL recursive, BOOL *needs_control_head) { int length = 0; int possessive = 0; @@ -1504,6 +1969,7 @@ while (cc < ccend) break; case OP_MARK: + case OP_COMMIT_ARG: case OP_PRUNE_ARG: case OP_THEN_ARG: SLJIT_ASSERT(common->mark_ptr != 0); @@ -1626,7 +2092,9 @@ while (cc < ccend) case OP_CLASS: case OP_NCLASS: case OP_XCLASS: + case OP_CALLOUT: + case OP_CALLOUT_STR: cc = next_opcode(common, cc); SLJIT_ASSERT(cc != NULL); @@ -1642,11 +2110,11 @@ if (length > 0) return stack_restore ? no_frame : no_stack; } -static void init_frame(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, int stackpos, int stacktop, BOOL recursive) +static void init_frame(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, int stackpos, int stacktop) { DEFINE_COMPILER; -BOOL setsom_found = recursive; -BOOL setmark_found = recursive; +BOOL setsom_found = FALSE; +BOOL setmark_found = FALSE; /* The last capture is a local variable even for recursions. 
*/ BOOL capture_last_found = FALSE; int offset; @@ -1659,7 +2127,7 @@ stackpos = STACK(stackpos); if (ccend == NULL) { ccend = bracketend(cc) - (1 + LINK_SIZE); - if (recursive || (*cc != OP_CBRAPOS && *cc != OP_SCBRAPOS)) + if (*cc != OP_CBRAPOS && *cc != OP_SCBRAPOS) cc = next_opcode(common, cc); } @@ -1682,6 +2150,7 @@ while (cc < ccend) break; case OP_MARK: + case OP_COMMIT_ARG: case OP_PRUNE_ARG: case OP_THEN_ARG: SLJIT_ASSERT(common->mark_ptr != 0); @@ -1764,21 +2233,127 @@ OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackpos, SLJIT_IMM, 0); SLJIT_ASSERT(stackpos == STACK(stacktop)); } -static SLJIT_INLINE int get_private_data_copy_length(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, BOOL needs_control_head) +#define RECURSE_TMP_REG_COUNT 3 + +typedef struct delayed_mem_copy_status { + struct sljit_compiler *compiler; + int store_bases[RECURSE_TMP_REG_COUNT]; + int store_offsets[RECURSE_TMP_REG_COUNT]; + int tmp_regs[RECURSE_TMP_REG_COUNT]; + int saved_tmp_regs[RECURSE_TMP_REG_COUNT]; + int next_tmp_reg; +} delayed_mem_copy_status; + +static void delayed_mem_copy_init(delayed_mem_copy_status *status, compiler_common *common) +{ +int i; + +for (i = 0; i < RECURSE_TMP_REG_COUNT; i++) + { + SLJIT_ASSERT(status->tmp_regs[i] >= 0); + SLJIT_ASSERT(sljit_get_register_index(status->saved_tmp_regs[i]) < 0 || status->tmp_regs[i] == status->saved_tmp_regs[i]); + + status->store_bases[i] = -1; + } +status->next_tmp_reg = 0; +status->compiler = common->compiler; +} + +static void delayed_mem_copy_move(delayed_mem_copy_status *status, int load_base, sljit_sw load_offset, + int store_base, sljit_sw store_offset) +{ +struct sljit_compiler *compiler = status->compiler; +int next_tmp_reg = status->next_tmp_reg; +int tmp_reg = status->tmp_regs[next_tmp_reg]; + +SLJIT_ASSERT(load_base > 0 && store_base > 0); + +if (status->store_bases[next_tmp_reg] == -1) + { + /* Preserve virtual registers. 
*/ + if (sljit_get_register_index(status->saved_tmp_regs[next_tmp_reg]) < 0) + OP1(SLJIT_MOV, status->saved_tmp_regs[next_tmp_reg], 0, tmp_reg, 0); + } +else + OP1(SLJIT_MOV, SLJIT_MEM1(status->store_bases[next_tmp_reg]), status->store_offsets[next_tmp_reg], tmp_reg, 0); + +OP1(SLJIT_MOV, tmp_reg, 0, SLJIT_MEM1(load_base), load_offset); +status->store_bases[next_tmp_reg] = store_base; +status->store_offsets[next_tmp_reg] = store_offset; + +status->next_tmp_reg = (next_tmp_reg + 1) % RECURSE_TMP_REG_COUNT; +} + +static void delayed_mem_copy_finish(delayed_mem_copy_status *status) +{ +struct sljit_compiler *compiler = status->compiler; +int next_tmp_reg = status->next_tmp_reg; +int tmp_reg, saved_tmp_reg, i; + +for (i = 0; i < RECURSE_TMP_REG_COUNT; i++) + { + if (status->store_bases[next_tmp_reg] != -1) + { + tmp_reg = status->tmp_regs[next_tmp_reg]; + saved_tmp_reg = status->saved_tmp_regs[next_tmp_reg]; + + OP1(SLJIT_MOV, SLJIT_MEM1(status->store_bases[next_tmp_reg]), status->store_offsets[next_tmp_reg], tmp_reg, 0); + + /* Restore virtual registers. */ + if (sljit_get_register_index(saved_tmp_reg) < 0) + OP1(SLJIT_MOV, tmp_reg, 0, saved_tmp_reg, 0); + } + + next_tmp_reg = (next_tmp_reg + 1) % RECURSE_TMP_REG_COUNT; + } +} + +#undef RECURSE_TMP_REG_COUNT + +static int get_recurse_data_length(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, + BOOL *needs_control_head, BOOL *has_quit, BOOL *has_accept) { -int private_data_length = needs_control_head ? 3 : 2; +int length = 1; int size; -pcre_uchar *alternative; +PCRE2_SPTR alternative; +BOOL quit_found = FALSE; +BOOL accept_found = FALSE; +BOOL setsom_found = FALSE; +BOOL setmark_found = FALSE; +BOOL capture_last_found = FALSE; +BOOL control_head_found = FALSE; + +#if defined DEBUG_FORCE_CONTROL_HEAD && DEBUG_FORCE_CONTROL_HEAD +SLJIT_ASSERT(common->control_head_ptr != 0); +control_head_found = TRUE; +#endif + /* Calculate the sum of the private machine words. 
*/ while (cc < ccend) { size = 0; switch(*cc) { + case OP_SET_SOM: + SLJIT_ASSERT(common->has_set_som); + setsom_found = TRUE; + cc += 1; + break; + + case OP_RECURSE: + if (common->has_set_som) + setsom_found = TRUE; + if (common->mark_ptr != 0) + setmark_found = TRUE; + if (common->capture_last_ptr != 0) + capture_last_found = TRUE; + cc += 1 + LINK_SIZE; + break; + case OP_KET: if (PRIVATE_DATA(cc) != 0) { - private_data_length++; + length++; SLJIT_ASSERT(PRIVATE_DATA(cc + 1) != 0); cc += PRIVATE_DATA(cc + 1); } @@ -1789,27 +2364,34 @@ while (cc < ccend) case OP_ASSERT_NOT: case OP_ASSERTBACK: case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_BRAPOS: case OP_SBRA: case OP_SBRAPOS: case OP_SCOND: - private_data_length++; + length++; SLJIT_ASSERT(PRIVATE_DATA(cc) != 0); cc += 1 + LINK_SIZE; break; case OP_CBRA: case OP_SCBRA: + length += 2; + if (common->capture_last_ptr != 0) + capture_last_found = TRUE; if (common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] == 0) - private_data_length++; + length++; cc += 1 + LINK_SIZE + IMM2_SIZE; break; case OP_CBRAPOS: case OP_SCBRAPOS: - private_data_length += 2; + length += 2 + 2; + if (common->capture_last_ptr != 0) + capture_last_found = TRUE; cc += 1 + LINK_SIZE + IMM2_SIZE; break; @@ -1817,68 +2399,109 @@ while (cc < ccend) /* Might be a hidden SCOND. 
*/ alternative = cc + GET(cc, 1); if (*alternative == OP_KETRMAX || *alternative == OP_KETRMIN) - private_data_length++; + length++; cc += 1 + LINK_SIZE; break; CASE_ITERATOR_PRIVATE_DATA_1 - if (PRIVATE_DATA(cc)) - private_data_length++; + if (PRIVATE_DATA(cc) != 0) + length++; cc += 2; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif break; CASE_ITERATOR_PRIVATE_DATA_2A - if (PRIVATE_DATA(cc)) - private_data_length += 2; + if (PRIVATE_DATA(cc) != 0) + length += 2; cc += 2; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif break; CASE_ITERATOR_PRIVATE_DATA_2B - if (PRIVATE_DATA(cc)) - private_data_length += 2; + if (PRIVATE_DATA(cc) != 0) + length += 2; cc += 2 + IMM2_SIZE; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif break; CASE_ITERATOR_TYPE_PRIVATE_DATA_1 - if (PRIVATE_DATA(cc)) - private_data_length++; + if (PRIVATE_DATA(cc) != 0) + length++; cc += 1; break; CASE_ITERATOR_TYPE_PRIVATE_DATA_2A - if (PRIVATE_DATA(cc)) - private_data_length += 2; + if (PRIVATE_DATA(cc) != 0) + length += 2; cc += 1; break; CASE_ITERATOR_TYPE_PRIVATE_DATA_2B - if (PRIVATE_DATA(cc)) - private_data_length += 2; + if (PRIVATE_DATA(cc) != 0) + length += 2; cc += 1 + IMM2_SIZE; break; case OP_CLASS: case OP_NCLASS: -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 case OP_XCLASS: - size = (*cc == OP_XCLASS) ? GET(cc, 1) : 1 + 32 / (int)sizeof(pcre_uchar); + size = (*cc == OP_XCLASS) ? 
GET(cc, 1) : 1 + 32 / (int)sizeof(PCRE2_UCHAR); #else - size = 1 + 32 / (int)sizeof(pcre_uchar); + size = 1 + 32 / (int)sizeof(PCRE2_UCHAR); #endif - if (PRIVATE_DATA(cc)) - private_data_length += get_class_iterator_size(cc + size); + if (PRIVATE_DATA(cc) != 0) + length += get_class_iterator_size(cc + size); cc += size; break; + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_THEN_ARG: + SLJIT_ASSERT(common->mark_ptr != 0); + if (!setmark_found) + setmark_found = TRUE; + if (common->control_head_ptr != 0) + control_head_found = TRUE; + if (*cc != OP_MARK) + quit_found = TRUE; + + cc += 1 + 2 + cc[1]; + break; + + case OP_PRUNE: + case OP_SKIP: + case OP_COMMIT: + quit_found = TRUE; + cc++; + break; + + case OP_SKIP_ARG: + quit_found = TRUE; + cc += 1 + 2 + cc[1]; + break; + + case OP_THEN: + SLJIT_ASSERT(common->control_head_ptr != 0); + quit_found = TRUE; + if (!control_head_found) + control_head_found = TRUE; + cc++; + break; + + case OP_ACCEPT: + case OP_ASSERT_ACCEPT: + accept_found = TRUE; + cc++; + break; + default: cc = next_opcode(common, cc); SLJIT_ASSERT(cc != NULL); @@ -1886,74 +2509,177 @@ while (cc < ccend) } } SLJIT_ASSERT(cc == ccend); -return private_data_length; + +if (control_head_found) + length++; +if (capture_last_found) + length++; +if (quit_found) + { + if (setsom_found) + length++; + if (setmark_found) + length++; + } + +*needs_control_head = control_head_found; +*has_quit = quit_found; +*has_accept = accept_found; +return length; } -static void copy_private_data(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, - BOOL save, int stackptr, int stacktop, BOOL needs_control_head) +enum copy_recurse_data_types { + recurse_copy_from_global, + recurse_copy_private_to_global, + recurse_copy_shared_to_global, + recurse_copy_kept_shared_to_global, + recurse_swap_global +}; + +static void copy_recurse_data(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, + int type, int stackptr, int stacktop, BOOL has_quit) { 
-DEFINE_COMPILER; -int srcw[2]; -int count, size; -BOOL tmp1next = TRUE; -BOOL tmp1empty = TRUE; -BOOL tmp2empty = TRUE; -pcre_uchar *alternative; -enum { - loop, - end -} status; - -status = loop; +delayed_mem_copy_status status; +PCRE2_SPTR alternative; +sljit_sw private_srcw[2]; +sljit_sw shared_srcw[3]; +sljit_sw kept_shared_srcw[2]; +int private_count, shared_count, kept_shared_count; +int from_sp, base_reg, offset, i; +BOOL setsom_found = FALSE; +BOOL setmark_found = FALSE; +BOOL capture_last_found = FALSE; +BOOL control_head_found = FALSE; + +#if defined DEBUG_FORCE_CONTROL_HEAD && DEBUG_FORCE_CONTROL_HEAD +SLJIT_ASSERT(common->control_head_ptr != 0); +control_head_found = TRUE; +#endif + +switch (type) + { + case recurse_copy_from_global: + from_sp = TRUE; + base_reg = STACK_TOP; + break; + + case recurse_copy_private_to_global: + case recurse_copy_shared_to_global: + case recurse_copy_kept_shared_to_global: + from_sp = FALSE; + base_reg = STACK_TOP; + break; + + default: + SLJIT_ASSERT(type == recurse_swap_global); + from_sp = FALSE; + base_reg = TMP2; + break; + } + stackptr = STACK(stackptr); -stacktop = STACK(stacktop - 1); +stacktop = STACK(stacktop); -if (!save) +status.tmp_regs[0] = TMP1; +status.saved_tmp_regs[0] = TMP1; + +if (base_reg != TMP2) { - stacktop -= (needs_control_head ? 2 : 1) * sizeof(sljit_sw); - if (stackptr < stacktop) - { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), stackptr); - stackptr += sizeof(sljit_sw); - tmp1empty = FALSE; - } - if (stackptr < stacktop) - { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), stackptr); - stackptr += sizeof(sljit_sw); - tmp2empty = FALSE; - } - /* The tmp1next must be TRUE in either way. 
*/ + status.tmp_regs[1] = TMP2; + status.saved_tmp_regs[1] = TMP2; + } +else + { + status.saved_tmp_regs[1] = RETURN_ADDR; + if (HAS_VIRTUAL_REGISTERS) + status.tmp_regs[1] = STR_PTR; + else + status.tmp_regs[1] = RETURN_ADDR; + } + +status.saved_tmp_regs[2] = TMP3; +if (HAS_VIRTUAL_REGISTERS) + status.tmp_regs[2] = STR_END; +else + status.tmp_regs[2] = TMP3; + +delayed_mem_copy_init(&status, common); + +if (type != recurse_copy_shared_to_global && type != recurse_copy_kept_shared_to_global) + { + SLJIT_ASSERT(type == recurse_copy_from_global || type == recurse_copy_private_to_global || type == recurse_swap_global); + + if (!from_sp) + delayed_mem_copy_move(&status, base_reg, stackptr, SLJIT_SP, common->recursive_head_ptr); + + if (from_sp || type == recurse_swap_global) + delayed_mem_copy_move(&status, SLJIT_SP, common->recursive_head_ptr, base_reg, stackptr); } -SLJIT_ASSERT(common->recursive_head_ptr != 0); +stackptr += sizeof(sljit_sw); -do +#if defined DEBUG_FORCE_CONTROL_HEAD && DEBUG_FORCE_CONTROL_HEAD +if (type != recurse_copy_shared_to_global) { - count = 0; - if (cc >= ccend) + if (!from_sp) + delayed_mem_copy_move(&status, base_reg, stackptr, SLJIT_SP, common->control_head_ptr); + + if (from_sp || type == recurse_swap_global) + delayed_mem_copy_move(&status, SLJIT_SP, common->control_head_ptr, base_reg, stackptr); + } + +stackptr += sizeof(sljit_sw); +#endif + +while (cc < ccend) + { + private_count = 0; + shared_count = 0; + kept_shared_count = 0; + + switch(*cc) { - if (!save) - break; + case OP_SET_SOM: + SLJIT_ASSERT(common->has_set_som); + if (has_quit && !setsom_found) + { + kept_shared_srcw[0] = OVECTOR(0); + kept_shared_count = 1; + setsom_found = TRUE; + } + cc += 1; + break; - count = 1; - srcw[0] = common->recursive_head_ptr; - if (needs_control_head) + case OP_RECURSE: + if (has_quit) + { + if (common->has_set_som && !setsom_found) + { + kept_shared_srcw[0] = OVECTOR(0); + kept_shared_count = 1; + setsom_found = TRUE; + } + if 
(common->mark_ptr != 0 && !setmark_found) + { + kept_shared_srcw[kept_shared_count] = common->mark_ptr; + kept_shared_count++; + setmark_found = TRUE; + } + } + if (common->capture_last_ptr != 0 && !capture_last_found) { - SLJIT_ASSERT(common->control_head_ptr != 0); - count = 2; - srcw[0] = common->control_head_ptr; - srcw[1] = common->recursive_head_ptr; + shared_srcw[0] = common->capture_last_ptr; + shared_count = 1; + capture_last_found = TRUE; } - status = end; - } - else switch(*cc) - { + cc += 1 + LINK_SIZE; + break; + case OP_KET: if (PRIVATE_DATA(cc) != 0) { - count = 1; - srcw[0] = PRIVATE_DATA(cc); + private_count = 1; + private_srcw[0] = PRIVATE_DATA(cc); SLJIT_ASSERT(PRIVATE_DATA(cc + 1) != 0); cc += PRIVATE_DATA(cc + 1); } @@ -1964,34 +2690,58 @@ do case OP_ASSERT_NOT: case OP_ASSERTBACK: case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_BRAPOS: case OP_SBRA: case OP_SBRAPOS: case OP_SCOND: - count = 1; - srcw[0] = PRIVATE_DATA(cc); - SLJIT_ASSERT(srcw[0] != 0); + private_count = 1; + private_srcw[0] = PRIVATE_DATA(cc); cc += 1 + LINK_SIZE; break; case OP_CBRA: case OP_SCBRA: + offset = (GET2(cc, 1 + LINK_SIZE)) << 1; + shared_srcw[0] = OVECTOR(offset); + shared_srcw[1] = OVECTOR(offset + 1); + shared_count = 2; + + if (common->capture_last_ptr != 0 && !capture_last_found) + { + shared_srcw[2] = common->capture_last_ptr; + shared_count = 3; + capture_last_found = TRUE; + } + if (common->optimized_cbracket[GET2(cc, 1 + LINK_SIZE)] == 0) { - count = 1; - srcw[0] = OVECTOR_PRIV(GET2(cc, 1 + LINK_SIZE)); + private_count = 1; + private_srcw[0] = OVECTOR_PRIV(GET2(cc, 1 + LINK_SIZE)); } cc += 1 + LINK_SIZE + IMM2_SIZE; break; case OP_CBRAPOS: case OP_SCBRAPOS: - count = 2; - srcw[0] = PRIVATE_DATA(cc); - srcw[1] = OVECTOR_PRIV(GET2(cc, 1 + LINK_SIZE)); - SLJIT_ASSERT(srcw[0] != 0 && srcw[1] != 0); + offset = (GET2(cc, 1 + LINK_SIZE)) << 1; + shared_srcw[0] = OVECTOR(offset); 
+ shared_srcw[1] = OVECTOR(offset + 1); + shared_count = 2; + + if (common->capture_last_ptr != 0 && !capture_last_found) + { + shared_srcw[2] = common->capture_last_ptr; + shared_count = 3; + capture_last_found = TRUE; + } + + private_count = 2; + private_srcw[0] = PRIVATE_DATA(cc); + private_srcw[1] = OVECTOR_PRIV(GET2(cc, 1 + LINK_SIZE)); cc += 1 + LINK_SIZE + IMM2_SIZE; break; @@ -2000,9 +2750,8 @@ do alternative = cc + GET(cc, 1); if (*alternative == OP_KETRMAX || *alternative == OP_KETRMIN) { - count = 1; - srcw[0] = PRIVATE_DATA(cc); - SLJIT_ASSERT(srcw[0] != 0); + private_count = 1; + private_srcw[0] = PRIVATE_DATA(cc); } cc += 1 + LINK_SIZE; break; @@ -2010,11 +2759,11 @@ do CASE_ITERATOR_PRIVATE_DATA_1 if (PRIVATE_DATA(cc)) { - count = 1; - srcw[0] = PRIVATE_DATA(cc); + private_count = 1; + private_srcw[0] = PRIVATE_DATA(cc); } cc += 2; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif break; @@ -2022,12 +2771,12 @@ do CASE_ITERATOR_PRIVATE_DATA_2A if (PRIVATE_DATA(cc)) { - count = 2; - srcw[0] = PRIVATE_DATA(cc); - srcw[1] = PRIVATE_DATA(cc) + sizeof(sljit_sw); + private_count = 2; + private_srcw[0] = PRIVATE_DATA(cc); + private_srcw[1] = PRIVATE_DATA(cc) + sizeof(sljit_sw); } cc += 2; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif break; @@ -2035,12 +2784,12 @@ do CASE_ITERATOR_PRIVATE_DATA_2B if (PRIVATE_DATA(cc)) { - count = 2; - srcw[0] = PRIVATE_DATA(cc); - srcw[1] = PRIVATE_DATA(cc) + sizeof(sljit_sw); + private_count = 2; + private_srcw[0] = PRIVATE_DATA(cc); + private_srcw[1] = PRIVATE_DATA(cc) + sizeof(sljit_sw); } cc += 2 + IMM2_SIZE; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); #endif break; @@ -2048,8 +2797,8 @@ do CASE_ITERATOR_TYPE_PRIVATE_DATA_1 if (PRIVATE_DATA(cc)) { - count = 1; - srcw[0] = PRIVATE_DATA(cc); + private_count = 1; 
+ private_srcw[0] = PRIVATE_DATA(cc); } cc += 1; break; @@ -2057,9 +2806,9 @@ do CASE_ITERATOR_TYPE_PRIVATE_DATA_2A if (PRIVATE_DATA(cc)) { - count = 2; - srcw[0] = PRIVATE_DATA(cc); - srcw[1] = srcw[0] + sizeof(sljit_sw); + private_count = 2; + private_srcw[0] = PRIVATE_DATA(cc); + private_srcw[1] = private_srcw[0] + sizeof(sljit_sw); } cc += 1; break; @@ -2067,40 +2816,71 @@ do CASE_ITERATOR_TYPE_PRIVATE_DATA_2B if (PRIVATE_DATA(cc)) { - count = 2; - srcw[0] = PRIVATE_DATA(cc); - srcw[1] = srcw[0] + sizeof(sljit_sw); + private_count = 2; + private_srcw[0] = PRIVATE_DATA(cc); + private_srcw[1] = private_srcw[0] + sizeof(sljit_sw); } cc += 1 + IMM2_SIZE; break; case OP_CLASS: case OP_NCLASS: -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 case OP_XCLASS: - size = (*cc == OP_XCLASS) ? GET(cc, 1) : 1 + 32 / (int)sizeof(pcre_uchar); + i = (*cc == OP_XCLASS) ? GET(cc, 1) : 1 + 32 / (int)sizeof(PCRE2_UCHAR); #else - size = 1 + 32 / (int)sizeof(pcre_uchar); + i = 1 + 32 / (int)sizeof(PCRE2_UCHAR); #endif - if (PRIVATE_DATA(cc)) - switch(get_class_iterator_size(cc + size)) + if (PRIVATE_DATA(cc) != 0) + switch(get_class_iterator_size(cc + i)) { case 1: - count = 1; - srcw[0] = PRIVATE_DATA(cc); + private_count = 1; + private_srcw[0] = PRIVATE_DATA(cc); break; case 2: - count = 2; - srcw[0] = PRIVATE_DATA(cc); - srcw[1] = srcw[0] + sizeof(sljit_sw); + private_count = 2; + private_srcw[0] = PRIVATE_DATA(cc); + private_srcw[1] = private_srcw[0] + sizeof(sljit_sw); break; default: SLJIT_UNREACHABLE(); break; } - cc += size; + cc += i; + break; + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_THEN_ARG: + SLJIT_ASSERT(common->mark_ptr != 0); + if (has_quit && !setmark_found) + { + kept_shared_srcw[0] = common->mark_ptr; + kept_shared_count = 1; + setmark_found = TRUE; + } + if (common->control_head_ptr != 0 && !control_head_found) + { + private_srcw[0] = common->control_head_ptr; + private_count 
= 1; + control_head_found = TRUE; + } + cc += 1 + 2 + cc[1]; + break; + + case OP_THEN: + SLJIT_ASSERT(common->control_head_ptr != 0); + if (!control_head_found) + { + private_srcw[0] = common->control_head_ptr; + private_count = 1; + control_head_found = TRUE; + } + cc++; break; default: @@ -2109,104 +2889,79 @@ do break; } - while (count > 0) + if (type != recurse_copy_shared_to_global && type != recurse_copy_kept_shared_to_global) { - count--; - if (save) - { - if (tmp1next) - { - if (!tmp1empty) - { - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackptr, TMP1, 0); - stackptr += sizeof(sljit_sw); - } - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), srcw[count]); - tmp1empty = FALSE; - tmp1next = FALSE; - } - else - { - if (!tmp2empty) - { - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackptr, TMP2, 0); - stackptr += sizeof(sljit_sw); - } - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), srcw[count]); - tmp2empty = FALSE; - tmp1next = TRUE; - } - } - else + SLJIT_ASSERT(type == recurse_copy_from_global || type == recurse_copy_private_to_global || type == recurse_swap_global); + + for (i = 0; i < private_count; i++) { - if (tmp1next) - { - SLJIT_ASSERT(!tmp1empty); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), srcw[count], TMP1, 0); - tmp1empty = stackptr >= stacktop; - if (!tmp1empty) - { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), stackptr); - stackptr += sizeof(sljit_sw); - } - tmp1next = FALSE; - } - else - { - SLJIT_ASSERT(!tmp2empty); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), srcw[count], TMP2, 0); - tmp2empty = stackptr >= stacktop; - if (!tmp2empty) - { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), stackptr); - stackptr += sizeof(sljit_sw); - } - tmp1next = TRUE; - } + SLJIT_ASSERT(private_srcw[i] != 0); + + if (!from_sp) + delayed_mem_copy_move(&status, base_reg, stackptr, SLJIT_SP, private_srcw[i]); + + if (from_sp || type == recurse_swap_global) + delayed_mem_copy_move(&status, SLJIT_SP, private_srcw[i], base_reg, stackptr); + + stackptr += sizeof(sljit_sw); } } - } 
-while (status != end); + else + stackptr += sizeof(sljit_sw) * private_count; -if (save) - { - if (tmp1next) + if (type != recurse_copy_private_to_global && type != recurse_copy_kept_shared_to_global) { - if (!tmp1empty) + SLJIT_ASSERT(type == recurse_copy_from_global || type == recurse_copy_shared_to_global || type == recurse_swap_global); + + for (i = 0; i < shared_count; i++) { - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackptr, TMP1, 0); - stackptr += sizeof(sljit_sw); - } - if (!tmp2empty) - { - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackptr, TMP2, 0); + SLJIT_ASSERT(shared_srcw[i] != 0); + + if (!from_sp) + delayed_mem_copy_move(&status, base_reg, stackptr, SLJIT_SP, shared_srcw[i]); + + if (from_sp || type == recurse_swap_global) + delayed_mem_copy_move(&status, SLJIT_SP, shared_srcw[i], base_reg, stackptr); + stackptr += sizeof(sljit_sw); } } else + stackptr += sizeof(sljit_sw) * shared_count; + + if (type != recurse_copy_private_to_global && type != recurse_swap_global) { - if (!tmp2empty) - { - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackptr, TMP2, 0); - stackptr += sizeof(sljit_sw); - } - if (!tmp1empty) + SLJIT_ASSERT(type == recurse_copy_from_global || type == recurse_copy_shared_to_global || type == recurse_copy_kept_shared_to_global); + + for (i = 0; i < kept_shared_count; i++) { - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), stackptr, TMP1, 0); + SLJIT_ASSERT(kept_shared_srcw[i] != 0); + + if (!from_sp) + delayed_mem_copy_move(&status, base_reg, stackptr, SLJIT_SP, kept_shared_srcw[i]); + + if (from_sp || type == recurse_swap_global) + delayed_mem_copy_move(&status, SLJIT_SP, kept_shared_srcw[i], base_reg, stackptr); + stackptr += sizeof(sljit_sw); } } + else + stackptr += sizeof(sljit_sw) * kept_shared_count; } -SLJIT_ASSERT(cc == ccend && stackptr == stacktop && (save || (tmp1empty && tmp2empty))); + +SLJIT_ASSERT(cc == ccend && stackptr == stacktop); + +delayed_mem_copy_finish(&status); } -static SLJIT_INLINE pcre_uchar *set_then_offsets(compiler_common 
*common, pcre_uchar *cc, sljit_u8 *current_offset) +static SLJIT_INLINE PCRE2_SPTR set_then_offsets(compiler_common *common, PCRE2_SPTR cc, sljit_u8 *current_offset) { -pcre_uchar *end = bracketend(cc); +PCRE2_SPTR end = bracketend(cc); BOOL has_alternatives = cc[GET(cc, 1)] == OP_ALT; /* Assert captures then. */ -if (*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NOT) +if (*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NA) current_offset = NULL; /* Conditional block does not. */ if (*cc == OP_COND || *cc == OP_SCOND) @@ -2218,7 +2973,7 @@ if (has_alternatives) while (cc < end) { - if ((*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NOT) || (*cc >= OP_ONCE && *cc <= OP_SCOND)) + if ((*cc >= OP_ASSERT && *cc <= OP_ASSERTBACK_NA) || (*cc >= OP_ONCE && *cc <= OP_SCOND)) cc = set_then_offsets(common, cc, current_offset); else { @@ -2296,20 +3051,6 @@ while (list_item) common->stubs = NULL; } -static void add_label_addr(compiler_common *common, sljit_uw *update_addr) -{ -DEFINE_COMPILER; -label_addr_list *label_addr; - -label_addr = sljit_alloc_memory(compiler, sizeof(label_addr_list)); -if (label_addr == NULL) - return; -label_addr->label = LABEL(); -label_addr->update_addr = update_addr; -label_addr->next = common->label_addrs; -common->label_addrs = label_addr; -} - static SLJIT_INLINE void count_match(compiler_common *common) { DEFINE_COMPILER; @@ -2363,25 +3104,11 @@ common->read_only_data_head = (void *)result; return result + 1; } -static void free_read_only_data(void *current, void *allocator_data) -{ -void *next; - -SLJIT_UNUSED_ARG(allocator_data); - -while (current != NULL) - { - next = *(void**)current; - SLJIT_FREE(current, allocator_data); - current = next; - } -} - static SLJIT_INLINE void reset_ovector(compiler_common *common, int length) { DEFINE_COMPILER; struct sljit_label *loop; -int i; +sljit_s32 i; /* At this point we can freely use all temporary registers. 
*/ SLJIT_ASSERT(length > 1); @@ -2416,16 +3143,54 @@ else } } -static SLJIT_INLINE void reset_fast_fail(compiler_common *common) +static SLJIT_INLINE void reset_early_fail(compiler_common *common) { DEFINE_COMPILER; +sljit_u32 size = (sljit_u32)(common->early_fail_end_ptr - common->early_fail_start_ptr); +sljit_u32 uncleared_size; +sljit_s32 src = SLJIT_IMM; sljit_s32 i; +struct sljit_label *loop; + +SLJIT_ASSERT(common->early_fail_start_ptr < common->early_fail_end_ptr); -SLJIT_ASSERT(common->fast_fail_start_ptr < common->fast_fail_end_ptr); +if (size == sizeof(sljit_sw)) + { + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->early_fail_start_ptr, SLJIT_IMM, 0); + return; + } + +if (sljit_get_register_index(TMP3) >= 0 && !sljit_has_cpu_feature(SLJIT_HAS_ZERO_REGISTER)) + { + OP1(SLJIT_MOV, TMP3, 0, SLJIT_IMM, 0); + src = TMP3; + } + +if (size <= 6 * sizeof(sljit_sw)) + { + for (i = common->early_fail_start_ptr; i < common->early_fail_end_ptr; i += sizeof(sljit_sw)) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), i, src, 0); + return; + } + +GET_LOCAL_BASE(TMP1, 0, common->early_fail_start_ptr); + +uncleared_size = ((size / sizeof(sljit_sw)) % 3) * sizeof(sljit_sw); + +OP2(SLJIT_ADD, TMP2, 0, TMP1, 0, SLJIT_IMM, size - uncleared_size); + +loop = LABEL(); +OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), 0, src, 0); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 3 * sizeof(sljit_sw)); +OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), -2 * (sljit_sw)sizeof(sljit_sw), src, 0); +OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), -1 * (sljit_sw)sizeof(sljit_sw), src, 0); +CMPTO(SLJIT_LESS, TMP1, 0, TMP2, 0, loop); -OP2(SLJIT_SUB, TMP1, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -for (i = common->fast_fail_start_ptr; i < common->fast_fail_end_ptr; i += sizeof(sljit_sw)) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), i, TMP1, 0); +if (uncleared_size >= sizeof(sljit_sw)) + OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), 0, src, 0); + +if (uncleared_size >= 2 * sizeof(sljit_sw)) + OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), sizeof(sljit_sw), src, 0); } static SLJIT_INLINE 
void do_reset_match(compiler_common *common, int length) @@ -2466,17 +3231,23 @@ else } } -OP1(SLJIT_MOV, STACK_TOP, 0, ARGUMENTS, 0); +if (!HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, stack)); +else + OP1(SLJIT_MOV, STACK_TOP, 0, ARGUMENTS, 0); + if (common->mark_ptr != 0) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->mark_ptr, SLJIT_IMM, 0); if (common->control_head_ptr != 0) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_IMM, 0); -OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(STACK_TOP), SLJIT_OFFSETOF(jit_arguments, stack)); +if (HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(STACK_TOP), SLJIT_OFFSETOF(jit_arguments, stack)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->start_ptr); OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(STACK_TOP), SLJIT_OFFSETOF(struct sljit_stack, end)); } -static sljit_sw SLJIT_FUNC do_search_mark(sljit_sw *current, const pcre_uchar *skip_arg) +static sljit_sw SLJIT_FUNC do_search_mark(sljit_sw *current, PCRE2_SPTR skip_arg) { while (current != NULL) { @@ -2486,7 +3257,7 @@ while (current != NULL) break; case type_mark: - if (STRCMP_UC_UC(skip_arg, (pcre_uchar *)current[2]) == 0) + if (PRIV(strcmp)(skip_arg, (PCRE2_SPTR)current[2]) == 0) return current[3]; break; @@ -2504,27 +3275,43 @@ static SLJIT_INLINE void copy_ovector(compiler_common *common, int topbracket) { DEFINE_COMPILER; struct sljit_label *loop; -struct sljit_jump *early_quit; BOOL has_pre; /* At this point we can freely use all registers. 
*/ OP1(SLJIT_MOV, SLJIT_S2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(1)); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), OVECTOR(1), STR_PTR, 0); -OP1(SLJIT_MOV, SLJIT_R0, 0, ARGUMENTS, 0); -if (common->mark_ptr != 0) - OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_SP), common->mark_ptr); -OP1(SLJIT_MOV_S32, SLJIT_R1, 0, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, offset_count)); -if (common->mark_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, mark_ptr), SLJIT_R2, 0); -OP2(SLJIT_SUB, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, offsets), SLJIT_IMM, sizeof(int)); -OP1(SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, begin)); +if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, SLJIT_R0, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, SLJIT_S0, 0, SLJIT_MEM1(SLJIT_SP), common->start_ptr); + if (common->mark_ptr != 0) + OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_SP), common->mark_ptr); + OP1(SLJIT_MOV_U32, SLJIT_R1, 0, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, oveccount)); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, startchar_ptr), SLJIT_S0, 0); + if (common->mark_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, mark_ptr), SLJIT_R2, 0); + OP2(SLJIT_ADD, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, match_data), + SLJIT_IMM, SLJIT_OFFSETOF(pcre2_match_data, ovector) - sizeof(PCRE2_SIZE)); + } +else + { + OP1(SLJIT_MOV, SLJIT_S0, 0, SLJIT_MEM1(SLJIT_SP), common->start_ptr); + OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, match_data)); + if (common->mark_ptr != 0) + OP1(SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_SP), common->mark_ptr); + OP1(SLJIT_MOV_U32, SLJIT_R1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, oveccount)); + OP1(SLJIT_MOV, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, startchar_ptr), SLJIT_S0, 0); + if (common->mark_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(ARGUMENTS), 
SLJIT_OFFSETOF(jit_arguments, mark_ptr), SLJIT_R0, 0); + OP2(SLJIT_ADD, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_IMM, SLJIT_OFFSETOF(pcre2_match_data, ovector) - sizeof(PCRE2_SIZE)); + } has_pre = sljit_emit_mem(compiler, SLJIT_MOV | SLJIT_MEM_SUPP | SLJIT_MEM_PRE, SLJIT_S1, SLJIT_MEM1(SLJIT_S0), sizeof(sljit_sw)) == SLJIT_SUCCESS; + GET_LOCAL_BASE(SLJIT_S0, 0, OVECTOR_START - (has_pre ? sizeof(sljit_sw) : 0)); +OP1(SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(HAS_VIRTUAL_REGISTERS ? SLJIT_R0 : ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); -/* Unlikely, but possible */ -early_quit = CMP(SLJIT_EQUAL, SLJIT_R1, 0, SLJIT_IMM, 0); loop = LABEL(); if (has_pre) @@ -2535,17 +3322,18 @@ else OP2(SLJIT_ADD, SLJIT_S0, 0, SLJIT_S0, 0, SLJIT_IMM, sizeof(sljit_sw)); } -OP2(SLJIT_ADD, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_IMM, sizeof(int)); +OP2(SLJIT_ADD, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_IMM, sizeof(PCRE2_SIZE)); OP2(SLJIT_SUB, SLJIT_S1, 0, SLJIT_S1, 0, SLJIT_R0, 0); /* Copy the integer value to the output buffer */ -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 OP2(SLJIT_ASHR, SLJIT_S1, 0, SLJIT_S1, 0, SLJIT_IMM, UCHAR_SHIFT); #endif -OP1(SLJIT_MOV_S32, SLJIT_MEM1(SLJIT_R2), 0, SLJIT_S1, 0); +SLJIT_ASSERT(sizeof(PCRE2_SIZE) == 4 || sizeof(PCRE2_SIZE) == 8); +OP1(((sizeof(PCRE2_SIZE) == 4) ? SLJIT_MOV_U32 : SLJIT_MOV), SLJIT_MEM1(SLJIT_R2), 0, SLJIT_S1, 0); + OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 1); JUMPTO(SLJIT_NOT_ZERO, loop); -JUMPHERE(early_quit); /* Calculate the return value, which is the maximum ovector value. 
*/ if (topbracket > 1) @@ -2560,6 +3348,7 @@ if (topbracket > 1) sljit_emit_mem(compiler, SLJIT_MOV | SLJIT_MEM_PRE, SLJIT_R2, SLJIT_MEM1(SLJIT_R0), -(2 * (sljit_sw)sizeof(sljit_sw))); OP2(SLJIT_SUB, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 1); CMPTO(SLJIT_EQUAL, SLJIT_R2, 0, SLJIT_S2, 0, loop); + OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_R1, 0); } else { @@ -2572,8 +3361,8 @@ if (topbracket > 1) OP2(SLJIT_SUB, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_IMM, 2 * (sljit_sw)sizeof(sljit_sw)); OP2(SLJIT_SUB, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 1); CMPTO(SLJIT_EQUAL, SLJIT_R2, 0, SLJIT_S2, 0, loop); + OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_R1, 0); } - OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_R1, 0); } else OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, 1); @@ -2582,41 +3371,37 @@ else static SLJIT_INLINE void return_with_partial_match(compiler_common *common, struct sljit_label *quit) { DEFINE_COMPILER; -struct sljit_jump *jump; +sljit_s32 mov_opcode; +sljit_s32 arguments_reg = !HAS_VIRTUAL_REGISTERS ? ARGUMENTS : SLJIT_R1; -SLJIT_COMPILE_ASSERT(STR_END == SLJIT_S1, str_end_must_be_saved_reg2); +SLJIT_COMPILE_ASSERT(STR_END == SLJIT_S0, str_end_must_be_saved_reg0); SLJIT_ASSERT(common->start_used_ptr != 0 && common->start_ptr != 0 - && (common->mode == JIT_PARTIAL_SOFT_COMPILE ? common->hit_start != 0 : common->hit_start == 0)); + && (common->mode == PCRE2_JIT_PARTIAL_SOFT ? common->hit_start != 0 : common->hit_start == 0)); -OP1(SLJIT_MOV, SLJIT_R1, 0, ARGUMENTS, 0); -OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_PARTIAL); -OP1(SLJIT_MOV_S32, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_R1), SLJIT_OFFSETOF(jit_arguments, real_offset_count)); -CMPTO(SLJIT_SIG_LESS, SLJIT_R2, 0, SLJIT_IMM, 2, quit); +if (arguments_reg != ARGUMENTS) + OP1(SLJIT_MOV, arguments_reg, 0, ARGUMENTS, 0); +OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_SP), + common->mode == PCRE2_JIT_PARTIAL_SOFT ? 
common->hit_start : common->start_ptr); +OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_PARTIAL); /* Store match begin and end. */ -OP1(SLJIT_MOV, SLJIT_S0, 0, SLJIT_MEM1(SLJIT_R1), SLJIT_OFFSETOF(jit_arguments, begin)); -OP1(SLJIT_MOV, SLJIT_R1, 0, SLJIT_MEM1(SLJIT_R1), SLJIT_OFFSETOF(jit_arguments, offsets)); +OP1(SLJIT_MOV, SLJIT_S1, 0, SLJIT_MEM1(arguments_reg), SLJIT_OFFSETOF(jit_arguments, begin)); +OP1(SLJIT_MOV, SLJIT_MEM1(arguments_reg), SLJIT_OFFSETOF(jit_arguments, startchar_ptr), SLJIT_R2, 0); +OP1(SLJIT_MOV, SLJIT_R1, 0, SLJIT_MEM1(arguments_reg), SLJIT_OFFSETOF(jit_arguments, match_data)); -jump = CMP(SLJIT_SIG_LESS, SLJIT_R2, 0, SLJIT_IMM, 3); -OP2(SLJIT_SUB, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_SP), common->mode == JIT_PARTIAL_HARD_COMPILE ? common->start_ptr : (common->hit_start + (int)sizeof(sljit_sw)), SLJIT_S0, 0); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -OP2(SLJIT_ASHR, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_IMM, UCHAR_SHIFT); -#endif -OP1(SLJIT_MOV_S32, SLJIT_MEM1(SLJIT_R1), 2 * sizeof(int), SLJIT_R2, 0); -JUMPHERE(jump); +mov_opcode = (sizeof(PCRE2_SIZE) == 4) ? SLJIT_MOV_U32 : SLJIT_MOV; -OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_SP), common->mode == JIT_PARTIAL_HARD_COMPILE ? 
common->start_used_ptr : common->hit_start); -OP2(SLJIT_SUB, SLJIT_S1, 0, STR_END, 0, SLJIT_S0, 0); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -OP2(SLJIT_ASHR, SLJIT_S1, 0, SLJIT_S1, 0, SLJIT_IMM, UCHAR_SHIFT); +OP2(SLJIT_SUB, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_S1, 0); +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 +OP2(SLJIT_ASHR, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_IMM, UCHAR_SHIFT); #endif -OP1(SLJIT_MOV_S32, SLJIT_MEM1(SLJIT_R1), sizeof(int), SLJIT_S1, 0); +OP1(mov_opcode, SLJIT_MEM1(SLJIT_R1), SLJIT_OFFSETOF(pcre2_match_data, ovector), SLJIT_R2, 0); -OP2(SLJIT_SUB, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_S0, 0); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -OP2(SLJIT_ASHR, SLJIT_R2, 0, SLJIT_R2, 0, SLJIT_IMM, UCHAR_SHIFT); +OP2(SLJIT_SUB, STR_END, 0, STR_END, 0, SLJIT_S1, 0); +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 +OP2(SLJIT_ASHR, STR_END, 0, STR_END, 0, SLJIT_IMM, UCHAR_SHIFT); #endif -OP1(SLJIT_MOV_S32, SLJIT_MEM1(SLJIT_R1), 0, SLJIT_R2, 0); +OP1(mov_opcode, SLJIT_MEM1(SLJIT_R1), SLJIT_OFFSETOF(pcre2_match_data, ovector) + sizeof(PCRE2_SIZE), STR_END, 0); JUMPTO(SLJIT_JUMP, quit); } @@ -2627,7 +3412,7 @@ static SLJIT_INLINE void check_start_used_ptr(compiler_common *common) DEFINE_COMPILER; struct sljit_jump *jump; -if (common->mode == JIT_PARTIAL_SOFT_COMPILE) +if (common->mode == PCRE2_JIT_PARTIAL_SOFT) { /* The value of -1 must be kept for start_used_ptr! 
*/ OP2(SLJIT_ADD, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, SLJIT_IMM, 1); @@ -2637,7 +3422,7 @@ if (common->mode == JIT_PARTIAL_SOFT_COMPILE) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); JUMPHERE(jump); } -else if (common->mode == JIT_PARTIAL_HARD_COMPILE) +else if (common->mode == PCRE2_JIT_PARTIAL_HARD) { jump = CMP(SLJIT_LESS_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); @@ -2645,26 +3430,25 @@ else if (common->mode == JIT_PARTIAL_HARD_COMPILE) } } -static SLJIT_INLINE BOOL char_has_othercase(compiler_common *common, pcre_uchar *cc) +static SLJIT_INLINE BOOL char_has_othercase(compiler_common *common, PCRE2_SPTR cc) { /* Detects if the character has an othercase. */ unsigned int c; -#ifdef SUPPORT_UTF -if (common->utf) +#ifdef SUPPORT_UNICODE +if (common->utf || common->ucp) { - GETCHAR(c, cc); - if (c > 127) + if (common->utf) { -#ifdef SUPPORT_UCP - return c != UCD_OTHERCASE(c); -#else - return FALSE; -#endif + GETCHAR(c, cc); } -#ifndef COMPILE_PCRE8 + else + c = *cc; + + if (c > 127) + return c != UCD_OTHERCASE(c); + return common->fcc[c] != c; -#endif } else #endif @@ -2675,41 +3459,35 @@ return MAX_255(c) ? common->fcc[c] != c : FALSE; static SLJIT_INLINE unsigned int char_othercase(compiler_common *common, unsigned int c) { /* Returns with the othercase. */ -#ifdef SUPPORT_UTF -if (common->utf && c > 127) - { -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE +if ((common->utf || common->ucp) && c > 127) return UCD_OTHERCASE(c); -#else - return c; -#endif - } #endif return TABLE_GET(c, common->fcc, c); } -static unsigned int char_get_othercase_bit(compiler_common *common, pcre_uchar *cc) +static unsigned int char_get_othercase_bit(compiler_common *common, PCRE2_SPTR cc) { /* Detects if the character and its othercase has only 1 bit difference. 
*/ unsigned int c, oc, bit; -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 int n; #endif -#ifdef SUPPORT_UTF -if (common->utf) +#ifdef SUPPORT_UNICODE +if (common->utf || common->ucp) { - GETCHAR(c, cc); + if (common->utf) + { + GETCHAR(c, cc); + } + else + c = *cc; + if (c <= 127) oc = common->fcc[c]; else - { -#ifdef SUPPORT_UCP oc = UCD_OTHERCASE(c); -#else - oc = c; -#endif - } } else { @@ -2732,9 +3510,9 @@ if (c <= 127 && bit == 0x20) if (!is_powerof2(bit)) return 0; -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && c > 127) { n = GET_EXTRALEN(*cc); @@ -2745,23 +3523,23 @@ if (common->utf && c > 127) } return (n << 8) | bit; } -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ return (0 << 8) | bit; -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#elif PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && c > 65535) { - if (bit >= (1 << 10)) + if (bit >= (1u << 10)) bit >>= 10; else return (bit < 256) ? ((2 << 8) | bit) : ((3 << 8) | (bit >> 8)); } -#endif /* SUPPORT_UTF */ -return (bit < 256) ? ((0 << 8) | bit) : ((1 << 8) | (bit >> 8)); +#endif /* SUPPORT_UNICODE */ +return (bit < 256) ? 
((0u << 8) | bit) : ((1u << 8) | (bit >> 8)); -#endif /* COMPILE_PCRE[8|16|32] */ +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ } static void check_partial(compiler_common *common, BOOL force) @@ -2770,17 +3548,17 @@ static void check_partial(compiler_common *common, BOOL force) DEFINE_COMPILER; struct sljit_jump *jump = NULL; -SLJIT_ASSERT(!force || common->mode != JIT_COMPILE); +SLJIT_ASSERT(!force || common->mode != PCRE2_JIT_COMPLETE); -if (common->mode == JIT_COMPILE) +if (common->mode == PCRE2_JIT_COMPLETE) return; -if (!force) +if (!force && !common->allow_empty_partial) jump = CMP(SLJIT_GREATER_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); -else if (common->mode == JIT_PARTIAL_SOFT_COMPILE) +else if (common->mode == PCRE2_JIT_PARTIAL_SOFT) jump = CMP(SLJIT_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, SLJIT_IMM, -1); -if (common->mode == JIT_PARTIAL_SOFT_COMPILE) +if (common->mode == PCRE2_JIT_PARTIAL_SOFT) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, 0); else { @@ -2800,14 +3578,14 @@ static void check_str_end(compiler_common *common, jump_list **end_reached) DEFINE_COMPILER; struct sljit_jump *jump; -if (common->mode == JIT_COMPILE) +if (common->mode == PCRE2_JIT_COMPLETE) { add_jump(compiler, end_reached, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); return; } jump = CMP(SLJIT_LESS, STR_PTR, 0, STR_END, 0); -if (common->mode == JIT_PARTIAL_SOFT_COMPILE) +if (common->mode == PCRE2_JIT_PARTIAL_SOFT) { add_jump(compiler, end_reached, CMP(SLJIT_GREATER_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0)); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, 0); @@ -2829,7 +3607,7 @@ static void detect_partial_match(compiler_common *common, jump_list **backtracks DEFINE_COMPILER; struct sljit_jump *jump; -if (common->mode == JIT_COMPILE) +if (common->mode == PCRE2_JIT_COMPLETE) { add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); return; @@ 
-2837,8 +3615,12 @@ if (common->mode == JIT_COMPILE) /* Partial matching mode. */ jump = CMP(SLJIT_LESS, STR_PTR, 0, STR_END, 0); -add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0)); -if (common->mode == JIT_PARTIAL_SOFT_COMPILE) +if (!common->allow_empty_partial) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0)); +else if (common->mode == PCRE2_JIT_PARTIAL_SOFT) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, SLJIT_IMM, -1)); + +if (common->mode == PCRE2_JIT_PARTIAL_SOFT) { OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, 0); add_jump(compiler, backtracks, JUMP(SLJIT_JUMP)); @@ -2853,125 +3635,241 @@ else JUMPHERE(jump); } -static void peek_char(compiler_common *common, sljit_u32 max) +static void process_partial_match(compiler_common *common) +{ +DEFINE_COMPILER; +struct sljit_jump *jump; + +/* Partial matching mode. */ +if (common->mode == PCRE2_JIT_PARTIAL_SOFT) + { + jump = CMP(SLJIT_GREATER_EQUAL, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, 0); + JUMPHERE(jump); + } +else if (common->mode == PCRE2_JIT_PARTIAL_HARD) + { + if (common->partialmatchlabel != NULL) + CMPTO(SLJIT_LESS, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0, common->partialmatchlabel); + else + add_jump(compiler, &common->partialmatch, CMP(SLJIT_LESS, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0)); + } +} + +static void detect_partial_match_to(compiler_common *common, struct sljit_label *label) +{ +DEFINE_COMPILER; + +CMPTO(SLJIT_LESS, STR_PTR, 0, STR_END, 0, label); +process_partial_match(common); +} + +static void peek_char(compiler_common *common, sljit_u32 max, sljit_s32 dst, sljit_sw dstw, jump_list **backtracks) { /* Reads the character into TMP1, keeps STR_PTR. -Does not check STR_END. 
TMP2 Destroyed. */ +Does not check STR_END. TMP2, dst, RETURN_ADDR Destroyed. */ DEFINE_COMPILER; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 struct sljit_jump *jump; -#endif +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 */ SLJIT_UNUSED_ARG(max); +SLJIT_UNUSED_ARG(dst); +SLJIT_UNUSED_ARG(dstw); +SLJIT_UNUSED_ARG(backtracks); -OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); + +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) { if (max < 128) return; - jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xc0); + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x80); + OP1(SLJIT_MOV, dst, dstw, STR_PTR, 0); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - add_jump(compiler, &common->utfreadchar, JUMP(SLJIT_FAST_CALL)); - OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + add_jump(compiler, common->invalid_utf ? &common->utfreadchar_invalid : &common->utfreadchar, JUMP(SLJIT_FAST_CALL)); + OP1(SLJIT_MOV, STR_PTR, 0, dst, dstw); + if (backtracks && common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); JUMPHERE(jump); } -#endif /* SUPPORT_UTF && !COMPILE_PCRE32 */ - -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 +#elif PCRE2_CODE_UNIT_WIDTH == 16 if (common->utf) { if (max < 0xd800) return; OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); - jump = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 0xdc00 - 0xd800 - 1); - /* TMP2 contains the high surrogate. 
*/ - OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x40); - OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 10); - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3ff); - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); - JUMPHERE(jump); - } -#endif -} - -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 -static BOOL is_char7_bitset(const sljit_u8 *bitset, BOOL nclass) -{ -/* Tells whether the character codes below 128 are enough -to determine a match. */ -const sljit_u8 value = nclass ? 0xff : 0; -const sljit_u8 *end = bitset + 32; + if (common->invalid_utf) + { + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800); + OP1(SLJIT_MOV, dst, dstw, STR_PTR, 0); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + add_jump(compiler, &common->utfreadchar_invalid, JUMP(SLJIT_FAST_CALL)); + OP1(SLJIT_MOV, STR_PTR, 0, dst, dstw); + if (backtracks && common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + } + else + { + /* TMP2 contains the high surrogate. 
*/ + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xdc00 - 0xd800); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 10); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000 - 0xdc00); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + } -bitset += 16; -do + JUMPHERE(jump); + } +#elif PCRE2_CODE_UNIT_WIDTH == 32 +if (common->invalid_utf) { - if (*bitset++ != value) - return FALSE; + if (max < 0xd800) return; + + if (backtracks != NULL) + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x110000)); + add_jump(compiler, backtracks, CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800)); + } + else + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x110000); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800); + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + } } -while (bitset < end); -return TRUE; +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ +#endif /* SUPPORT_UNICODE */ } -static void read_char7_type(compiler_common *common, BOOL full_read) +static void peek_char_back(compiler_common *common, sljit_u32 max, jump_list **backtracks) { -/* Reads the precise character type of a character into TMP1, if the character -is less than 128. Otherwise it returns with zero. Does not check STR_END. The -full_read argument tells whether characters above max are accepted or not. */ +/* Reads one character back without moving STR_PTR. TMP2 must +contain the start of the subject buffer. Affects TMP1, TMP2, and RETURN_ADDR. 
*/ DEFINE_COMPILER; -struct sljit_jump *jump; -SLJIT_ASSERT(common->utf); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_jump *jump; +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 */ -OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), 0); -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); +SLJIT_UNUSED_ARG(max); +SLJIT_UNUSED_ARG(backtracks); -OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); -if (full_read) +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 +if (common->utf) { - jump = CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0xc0); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(utf8_table4) - 0xc0); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + if (max < 128) return; + + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x80); + if (common->invalid_utf) + { + add_jump(compiler, &common->utfpeakcharback_invalid, JUMP(SLJIT_FAST_CALL)); + if (backtracks != NULL) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + } + else + add_jump(compiler, &common->utfpeakcharback, JUMP(SLJIT_FAST_CALL)); JUMPHERE(jump); } +#elif PCRE2_CODE_UNIT_WIDTH == 16 +if (common->utf) + { + if (max < 0xd800) return; + + if (common->invalid_utf) + { + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xd800); + add_jump(compiler, &common->utfpeakcharback_invalid, JUMP(SLJIT_FAST_CALL)); + if (backtracks != NULL) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + } + else + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xdc00); + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xe000 - 0xdc00); + /* TMP2 contains the low surrogate. 
*/ + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x10000); + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); + OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 10); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + } + JUMPHERE(jump); + } +#elif PCRE2_CODE_UNIT_WIDTH == 32 +if (common->invalid_utf) + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x110000)); + add_jump(compiler, backtracks, CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800)); + } +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ +#endif /* SUPPORT_UNICODE */ } -#endif /* SUPPORT_UTF && COMPILE_PCRE8 */ +#define READ_CHAR_UPDATE_STR_PTR 0x1 +#define READ_CHAR_UTF8_NEWLINE 0x2 +#define READ_CHAR_NEWLINE (READ_CHAR_UPDATE_STR_PTR | READ_CHAR_UTF8_NEWLINE) +#define READ_CHAR_VALID_UTF 0x4 -static void read_char_range(compiler_common *common, sljit_u32 min, sljit_u32 max, BOOL update_str_ptr) +static void read_char(compiler_common *common, sljit_u32 min, sljit_u32 max, + jump_list **backtracks, sljit_u32 options) { /* Reads the precise value of a character into TMP1, if the character is between min and max (c >= min && c <= max). Otherwise it returns with a value outside the range. Does not check STR_END. 
*/ DEFINE_COMPILER; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 struct sljit_jump *jump; #endif -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 struct sljit_jump *jump2; #endif -SLJIT_UNUSED_ARG(update_str_ptr); SLJIT_UNUSED_ARG(min); SLJIT_UNUSED_ARG(max); +SLJIT_UNUSED_ARG(backtracks); +SLJIT_UNUSED_ARG(options); SLJIT_ASSERT(min <= max); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) { - if (max < 128 && !update_str_ptr) return; + if (max < 128 && !(options & READ_CHAR_UPDATE_STR_PTR)) return; + + if (common->invalid_utf && !(options & READ_CHAR_VALID_UTF)) + { + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x80); + + if (options & READ_CHAR_UTF8_NEWLINE) + add_jump(compiler, &common->utfreadnewline_invalid, JUMP(SLJIT_FAST_CALL)); + else + add_jump(compiler, &common->utfreadchar_invalid, JUMP(SLJIT_FAST_CALL)); + + if (backtracks != NULL) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + JUMPHERE(jump); + return; + } jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xc0); if (min >= 0x10000) { OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xf0); - if (update_str_ptr) + if (options & READ_CHAR_UPDATE_STR_PTR) OP1(SLJIT_MOV_U8, RETURN_ADDR, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); jump2 = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 0x7); @@ -2983,19 +3881,19 @@ if (common->utf) OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(2)); - if (!update_str_ptr) + if (!(options & READ_CHAR_UPDATE_STR_PTR)) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, 
SLJIT_IMM, IN_UCHARS(3)); OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); JUMPHERE(jump2); - if (update_str_ptr) + if (options & READ_CHAR_UPDATE_STR_PTR) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, RETURN_ADDR, 0); } else if (min >= 0x800 && max <= 0xffff) { OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xe0); - if (update_str_ptr) + if (options & READ_CHAR_UPDATE_STR_PTR) OP1(SLJIT_MOV_U8, RETURN_ADDR, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); jump2 = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 0xf); @@ -3003,17 +3901,19 @@ if (common->utf) OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)); - if (!update_str_ptr) + if (!(options & READ_CHAR_UPDATE_STR_PTR)) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); JUMPHERE(jump2); - if (update_str_ptr) + if (options & READ_CHAR_UPDATE_STR_PTR) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, RETURN_ADDR, 0); } else if (max >= 0x800) - add_jump(compiler, (max < 0x10000) ? 
&common->utfreadchar16 : &common->utfreadchar, JUMP(SLJIT_FAST_CALL)); + { + add_jump(compiler, &common->utfreadchar, JUMP(SLJIT_FAST_CALL)); + } else if (max < 128) { OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); @@ -3022,7 +3922,7 @@ if (common->utf) else { OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - if (!update_str_ptr) + if (!(options & READ_CHAR_UPDATE_STR_PTR)) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); else OP1(SLJIT_MOV_U8, RETURN_ADDR, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); @@ -3030,81 +3930,199 @@ if (common->utf) OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); - if (update_str_ptr) + if (options & READ_CHAR_UPDATE_STR_PTR) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, RETURN_ADDR, 0); } JUMPHERE(jump); } -#endif - -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 +#elif PCRE2_CODE_UNIT_WIDTH == 16 if (common->utf) { + if (max < 0xd800 && !(options & READ_CHAR_UPDATE_STR_PTR)) return; + + if (common->invalid_utf && !(options & READ_CHAR_VALID_UTF)) + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800); + + if (options & READ_CHAR_UTF8_NEWLINE) + add_jump(compiler, &common->utfreadnewline_invalid, JUMP(SLJIT_FAST_CALL)); + else + add_jump(compiler, &common->utfreadchar_invalid, JUMP(SLJIT_FAST_CALL)); + + if (backtracks != NULL) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + JUMPHERE(jump); + return; + } + if (max >= 0x10000) { OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); - jump = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 0xdc00 - 0xd800 - 1); + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xdc00 - 0xd800); /* TMP2 contains the high surrogate. 
*/ OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x40); OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 10); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3ff); - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000 - 0xdc00); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); JUMPHERE(jump); return; } - if (max < 0xd800 && !update_str_ptr) return; - /* Skip low surrogate if necessary. */ OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); - jump = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 0xdc00 - 0xd800 - 1); - if (update_str_ptr) - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - if (max >= 0xd800) - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0x10000); - JUMPHERE(jump); + + if (sljit_has_cpu_feature(SLJIT_HAS_CMOV) && !HAS_VIRTUAL_REGISTERS) + { + if (options & READ_CHAR_UPDATE_STR_PTR) + OP2(SLJIT_ADD, RETURN_ADDR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0x400); + if (options & READ_CHAR_UPDATE_STR_PTR) + CMOV(SLJIT_LESS, STR_PTR, RETURN_ADDR, 0); + if (max >= 0xd800) + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, 0x10000); + } + else + { + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x400); + if (options & READ_CHAR_UPDATE_STR_PTR) + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + if (max >= 0xd800) + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0x10000); + JUMPHERE(jump); + } } -#endif +#elif PCRE2_CODE_UNIT_WIDTH == 32 +if (common->invalid_utf) + { + if (backtracks != NULL) + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x110000)); + add_jump(compiler, backtracks, CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800)); + } + else + { + OP2(SLJIT_SUB, TMP2, 0, TMP1, 0, SLJIT_IMM, 0xd800); + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, 
SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x110000); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800); + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + } + } +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ +#endif /* SUPPORT_UNICODE */ } -static SLJIT_INLINE void read_char(compiler_common *common) -{ -read_char_range(common, 0, READ_CHAR_MAX, TRUE); -} +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 -static void read_char8_type(compiler_common *common, BOOL update_str_ptr) +static BOOL is_char7_bitset(const sljit_u8 *bitset, BOOL nclass) { -/* Reads the character type into TMP1, updates STR_PTR. Does not check STR_END. */ -DEFINE_COMPILER; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 -struct sljit_jump *jump; -#endif -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 -struct sljit_jump *jump2; -#endif - -SLJIT_UNUSED_ARG(update_str_ptr); +/* Tells whether the character codes below 128 are enough +to determine a match. */ +const sljit_u8 value = nclass ? 0xff : 0; +const sljit_u8 *end = bitset + 32; + +bitset += 16; +do + { + if (*bitset++ != value) + return FALSE; + } +while (bitset < end); +return TRUE; +} + +static void read_char7_type(compiler_common *common, jump_list **backtracks, BOOL negated) +{ +/* Reads the precise character type of a character into TMP1, if the character +is less than 128. Otherwise it returns with zero. Does not check STR_END. The +full_read argument tells whether characters above max are accepted or not. */ +DEFINE_COMPILER; +struct sljit_jump *jump; + +SLJIT_ASSERT(common->utf); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), 0); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +/* All values > 127 are zero in ctypes. 
*/ +OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); + +if (negated) + { + jump = CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0x80); + + if (common->invalid_utf) + { + add_jump(compiler, &common->utfreadchar_invalid, JUMP(SLJIT_FAST_CALL)); + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); + } + else + { + OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(utf8_table4) - 0xc0); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + } + JUMPHERE(jump); + } +} + +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 */ + +static void read_char8_type(compiler_common *common, jump_list **backtracks, BOOL negated) +{ +/* Reads the character type into TMP1, updates STR_PTR. Does not check STR_END. */ +DEFINE_COMPILER; +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 +struct sljit_jump *jump; +#endif +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 +struct sljit_jump *jump2; +#endif + +SLJIT_UNUSED_ARG(backtracks); +SLJIT_UNUSED_ARG(negated); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), 0); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) { - /* This can be an extra read in some situations, but hopefully - it is needed in most cases. */ + /* The result of this read may be unused, but saves an "else" part. 
*/ OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); - jump = CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0xc0); - if (!update_str_ptr) + jump = CMP(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0x80); + + if (!negated) { + if (common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); + OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc2); + if (common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xe0 - 0xc2)); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 6); - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3f); - OP2(SLJIT_OR, TMP2, 0, TMP2, 0, TMP1, 0); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x80); + if (common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40)); + + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); + jump2 = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 255); + OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); + JUMPHERE(jump2); + } + else if (common->invalid_utf) + { + add_jump(compiler, &common->utfreadchar_invalid, JUMP(SLJIT_FAST_CALL)); + OP1(SLJIT_MOV, TMP2, 0, TMP1, 0); + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); jump2 = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 255); OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); @@ -3112,43 +4130,98 @@ if (common->utf) } else add_jump(compiler, &common->utfreadtype8, JUMP(SLJIT_FAST_CALL)); + JUMPHERE(jump); return; } -#endif /* SUPPORT_UTF && COMPILE_PCRE8 */ +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 */ + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 32 +if (common->invalid_utf && negated) + add_jump(compiler, backtracks, 
CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x110000)); +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 32 */ -#if !defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 /* The ctypes array contains only 256 values. */ OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); jump = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 255); -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH != 8 */ OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); -#if !defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 JUMPHERE(jump); -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH != 8 */ -#if defined SUPPORT_UTF && defined COMPILE_PCRE16 -if (common->utf && update_str_ptr) +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 16 +if (common->utf && negated) { /* Skip low surrogate if necessary. */ + if (!common->invalid_utf) + { + OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xd800); + + if (sljit_has_cpu_feature(SLJIT_HAS_CMOV) && !HAS_VIRTUAL_REGISTERS) + { + OP2(SLJIT_ADD, RETURN_ADDR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0x400); + CMOV(SLJIT_LESS, STR_PTR, RETURN_ADDR, 0); + } + else + { + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x400); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + JUMPHERE(jump); + } + return; + } + OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xd800); - jump = CMP(SLJIT_GREATER, TMP2, 0, SLJIT_IMM, 0xdc00 - 0xd800 - 1); + jump = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0xe000 - 0xd800); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x400)); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + + OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xdc00); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x400)); + JUMPHERE(jump); + return; } -#endif /* 
SUPPORT_UTF && COMPILE_PCRE16 */ +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 16 */ } -static void skip_char_back(compiler_common *common) +static void move_back(compiler_common *common, jump_list **backtracks, BOOL must_be_valid) { -/* Goes one character back. Affects STR_PTR and TMP1. Does not check begin. */ +/* Goes one character back. Affects STR_PTR and TMP1. If must_be_valid is TRUE, +TMP2 is not used. Otherwise TMP2 must contain the start of the subject buffer, +and it is destroyed. Does not modify STR_PTR for invalid character sequences. */ DEFINE_COMPILER; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 -#if defined COMPILE_PCRE8 + +SLJIT_UNUSED_ARG(backtracks); +SLJIT_UNUSED_ARG(must_be_valid); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_jump *jump; +#endif + +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 struct sljit_label *label; if (common->utf) { + if (!must_be_valid && common->invalid_utf) + { + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -IN_UCHARS(1)); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x80); + add_jump(compiler, &common->utfmoveback_invalid, JUMP(SLJIT_FAST_CALL)); + if (backtracks != NULL) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0)); + JUMPHERE(jump); + return; + } + label = LABEL(); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -IN_UCHARS(1)); OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); @@ -3156,21 +4229,50 @@ if (common->utf) CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0x80, label); return; } -#elif defined COMPILE_PCRE16 +#elif PCRE2_CODE_UNIT_WIDTH == 16 if (common->utf) { OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -IN_UCHARS(1)); OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + + if (!must_be_valid && common->invalid_utf) + { + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); + jump = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xe000 - 
0xd800); + add_jump(compiler, &common->utfmoveback_invalid, JUMP(SLJIT_FAST_CALL)); + if (backtracks != NULL) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0)); + JUMPHERE(jump); + return; + } + /* Skip low surrogate if necessary. */ OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xdc00); OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_EQUAL); - OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 1); + OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP1, 0); + return; + } +#elif PCRE2_CODE_UNIT_WIDTH == 32 +if (common->invalid_utf && !must_be_valid) + { + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -IN_UCHARS(1)); + if (backtracks != NULL) + { + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x110000)); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + return; + } + + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x110000); + OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_LESS); + OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP1, 0); return; } -#endif /* COMPILE_PCRE[8|16] */ -#endif /* SUPPORT_UTF && !COMPILE_PCRE32 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ +#endif /* SUPPORT_UNICODE */ OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); } @@ -3207,19 +4309,18 @@ else } } -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 static void do_utfreadchar(compiler_common *common) { /* Fast decoding a UTF-8 character. TMP1 contains the first byte -of the character (>= 0xc0). Return char value in TMP1, length in TMP2. */ +of the character (>= 0xc0). Return char value in TMP1. 
*/ DEFINE_COMPILER; struct sljit_jump *jump; sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); -OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); @@ -3228,13 +4329,12 @@ OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); jump = JUMP(SLJIT_NOT_ZERO); /* Two byte sequence. */ +OP2(SLJIT_XOR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3000); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, IN_UCHARS(2)); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); JUMPHERE(jump); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)); -OP2(SLJIT_XOR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x800); OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); @@ -3242,56 +4342,19 @@ OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x10000); jump = JUMP(SLJIT_NOT_ZERO); /* Three byte sequence. */ +OP2(SLJIT_XOR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xe0000); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); -OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, IN_UCHARS(3)); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); /* Four byte sequence. 
*/ JUMPHERE(jump); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(2)); -OP2(SLJIT_XOR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000); -OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +OP2(SLJIT_XOR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xf0000); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(3)); -OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); -OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); -OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, IN_UCHARS(4)); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); -} - -static void do_utfreadchar16(compiler_common *common) -{ -/* Fast decoding a UTF-8 character. TMP1 contains the first byte -of the character (>= 0xc0). Return value in TMP1. */ -DEFINE_COMPILER; -struct sljit_jump *jump; - -sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); -OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); -OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); - -/* Searching for the first zero. */ -OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); -jump = JUMP(SLJIT_NOT_ZERO); -/* Two byte sequence. */ -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); - -JUMPHERE(jump); -OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x400); -OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_NOT_ZERO); -/* This code runs only in 8 bit mode. No need to shift the value. */ -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); -OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)); -OP2(SLJIT_XOR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x800); -OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); -OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x3f); -OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); -/* Three byte sequence. 
*/ -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); } static void do_utfreadtype8(compiler_common *common) @@ -3316,25 +4379,664 @@ OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 6); OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x3f); OP2(SLJIT_OR, TMP2, 0, TMP2, 0, TMP1, 0); OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP2), common->ctypes); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); JUMPHERE(compare); OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); /* We only have types for characters less than 256. */ JUMPHERE(jump); OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(utf8_table4) - 0xc0); OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +static void do_utfreadchar_invalid(compiler_common *common) +{ +/* Slow decoding a UTF-8 character. TMP1 contains the first byte +of the character (>= 0xc0). Return char value in TMP1. STR_PTR is +undefined for invalid characters. */ +DEFINE_COMPILER; +sljit_s32 i; +sljit_s32 has_cmov = sljit_has_cpu_feature(SLJIT_HAS_CMOV); +struct sljit_jump *jump; +struct sljit_jump *buffer_end_close; +struct sljit_label *three_byte_entry; +struct sljit_label *exit_invalid_label; +struct sljit_jump *exit_invalid[11]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc2); + +/* Usually more than 3 characters remained in the subject buffer. */ +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(3)); + +/* Not a valid start of a multi-byte sequence, no more bytes read. 
*/ +exit_invalid[0] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xf5 - 0xc2); + +buffer_end_close = CMP(SLJIT_GREATER, STR_PTR, 0, STR_END, 0); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-3)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +/* If TMP2 is in 0x80-0xbf range, TMP1 is also increased by (0x2 << 6). */ +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +exit_invalid[1] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); + +OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); +jump = JUMP(SLJIT_NOT_ZERO); + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(jump); + +/* Three-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0x40); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, 0x20000); + exit_invalid[2] = NULL; + } +else + exit_invalid[2] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); + +OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x10000); +jump = JUMP(SLJIT_NOT_ZERO); + +three_byte_entry = LABEL(); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x2d800); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, INVALID_UTF_CHAR - 0xd800); + exit_invalid[3] = NULL; + } +else + exit_invalid[3] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x800); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + exit_invalid[4] = 
NULL; + } +else + exit_invalid[4] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x800); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(jump); + +/* Four-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0x40); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, 0); + exit_invalid[5] = NULL; + } +else + exit_invalid[5] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc10000); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x100000); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, INVALID_UTF_CHAR - 0x10000); + exit_invalid[6] = NULL; + } +else + exit_invalid[6] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x100000); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(buffer_end_close); +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); +exit_invalid[7] = CMP(SLJIT_GREATER, STR_PTR, 0, STR_END, 0); + +/* Two-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +/* If TMP2 is in 0x80-0xbf range, TMP1 is also increased by (0x2 << 6). */ +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +exit_invalid[8] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); + +OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); +jump = JUMP(SLJIT_NOT_ZERO); + +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +/* Three-byte sequence. 
*/ +JUMPHERE(jump); +exit_invalid[9] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0x40); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + exit_invalid[10] = NULL; + } +else + exit_invalid[10] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); + +/* One will be substracted from STR_PTR later. */ +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); + +/* Four byte sequences are not possible. */ +CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x30000, three_byte_entry); + +exit_invalid_label = LABEL(); +for (i = 0; i < 11; i++) + sljit_set_label(exit_invalid[i], exit_invalid_label); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +static void do_utfreadnewline_invalid(compiler_common *common) +{ +/* Slow decoding a UTF-8 character, specialized for newlines. +TMP1 contains the first byte of the character (>= 0xc0). Return +char value in TMP1. */ +DEFINE_COMPILER; +struct sljit_label *loop; +struct sljit_label *skip_start; +struct sljit_label *three_byte_exit; +struct sljit_jump *jump[5]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +if (common->nltype != NLTYPE_ANY) + { + SLJIT_ASSERT(common->nltype != NLTYPE_FIXED || common->newline < 128); + + /* All newlines are ascii, just skip intermediate octets. 
*/ + jump[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + loop = LABEL(); + if (sljit_emit_mem(compiler, MOV_UCHAR | SLJIT_MEM_SUPP | SLJIT_MEM_POST, TMP2, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)) == SLJIT_SUCCESS) + sljit_emit_mem(compiler, MOV_UCHAR | SLJIT_MEM_POST, TMP2, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)); + else + { + OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + } + + OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc0); + CMPTO(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, 0x80, loop); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + + JUMPHERE(jump[0]); + + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); + OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + return; + } + +jump[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +jump[1] = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0xc2); +jump[2] = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0xe2); + +skip_start = LABEL(); +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc0); +jump[3] = CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0x80); + +/* Skip intermediate octets. */ +loop = LABEL(); +jump[4] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc0); +CMPTO(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, 0x80, loop); + +JUMPHERE(jump[3]); +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +three_byte_exit = LABEL(); +JUMPHERE(jump[0]); +JUMPHERE(jump[4]); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +/* Two byte long newline: 0x85. 
*/ +JUMPHERE(jump[1]); +CMPTO(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0x85, skip_start); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0x85); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +/* Three byte long newlines: 0x2028 and 0x2029. */ +JUMPHERE(jump[2]); +CMPTO(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0x80, skip_start); +CMPTO(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0, three_byte_exit); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +OP2(SLJIT_SUB, TMP1, 0, TMP2, 0, SLJIT_IMM, 0x80); +CMPTO(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x40, skip_start); + +OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 0x2000); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +static void do_utfmoveback_invalid(compiler_common *common) +{ +/* Goes one character back. */ +DEFINE_COMPILER; +sljit_s32 i; +struct sljit_jump *jump; +struct sljit_jump *buffer_start_close; +struct sljit_label *exit_ok_label; +struct sljit_label *exit_invalid_label; +struct sljit_jump *exit_invalid[7]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(3)); +exit_invalid[0] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xc0); + +/* Two-byte sequence. */ +buffer_start_close = CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(2)); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc0); +jump = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x20); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 1); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +/* Three-byte sequence. 
*/ +JUMPHERE(jump); +exit_invalid[1] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, -0x40); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(1)); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xe0); +jump = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x10); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 1); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +/* Four-byte sequence. */ +JUMPHERE(jump); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xe0 - 0x80); +exit_invalid[2] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x40); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xf0); +exit_invalid[3] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x05); + +exit_ok_label = LABEL(); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 1); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +/* Two-byte sequence. */ +JUMPHERE(buffer_start_close); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); + +exit_invalid[4] = CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc0); +CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x20, exit_ok_label); + +/* Three-byte sequence. */ +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); +exit_invalid[5] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, -0x40); +exit_invalid[6] = CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xe0); +CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x10, exit_ok_label); + +/* Four-byte sequences are not possible. 
*/ + +exit_invalid_label = LABEL(); +sljit_set_label(exit_invalid[5], exit_invalid_label); +sljit_set_label(exit_invalid[6], exit_invalid_label); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(3)); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(exit_invalid[4]); +/* -2 + 4 = 2 */ +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); + +exit_invalid_label = LABEL(); +for (i = 0; i < 4; i++) + sljit_set_label(exit_invalid[i], exit_invalid_label); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(4)); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +static void do_utfpeakcharback(compiler_common *common) +{ +/* Peak a character back. Does not modify STR_PTR. */ +DEFINE_COMPILER; +struct sljit_jump *jump[2]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc0); +jump[0] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x20); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-3)); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xe0); +jump[1] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x10); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-4)); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xe0 - 0x80); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xf0); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 6); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +JUMPHERE(jump[1]); +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +JUMPHERE(jump[0]); +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); +OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 6); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x80); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 
0); +} + +static void do_utfpeakcharback_invalid(compiler_common *common) +{ +/* Peak a character back. Does not modify STR_PTR. */ +DEFINE_COMPILER; +sljit_s32 i; +sljit_s32 has_cmov = sljit_has_cpu_feature(SLJIT_HAS_CMOV); +struct sljit_jump *jump[2]; +struct sljit_label *two_byte_entry; +struct sljit_label *three_byte_entry; +struct sljit_label *exit_invalid_label; +struct sljit_jump *exit_invalid[8]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, IN_UCHARS(3)); +exit_invalid[0] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xc0); +jump[0] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, STR_PTR, 0); + +/* Two-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc2); +jump[1] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x1e); + +two_byte_entry = LABEL(); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 6); +/* If TMP1 is in 0x80-0xbf range, TMP1 is also increased by (0x2 << 6). */ +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(jump[1]); +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc2 - 0x80); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x80); +exit_invalid[1] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 6); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +/* Three-byte sequence. 
*/ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-3)); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xe0); +jump[1] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x10); + +three_byte_entry = LABEL(); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 12); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, -0xd800); + exit_invalid[2] = NULL; + } +else + exit_invalid[2] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x800); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x800); + CMOV(SLJIT_LESS, TMP1, SLJIT_IMM, INVALID_UTF_CHAR); + exit_invalid[3] = NULL; + } +else + exit_invalid[3] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x800); + +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(jump[1]); +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xe0 - 0x80); +exit_invalid[4] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 12); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +/* Four-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-4)); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xf0); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 18); +/* ADD is used instead of OR because of the SUB 0x10000 above. 
*/ +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + +if (has_cmov) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x100000); + CMOV(SLJIT_GREATER_EQUAL, TMP1, SLJIT_IMM, INVALID_UTF_CHAR - 0x10000); + exit_invalid[5] = NULL; + } +else + exit_invalid[5] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x100000); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(jump[0]); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, IN_UCHARS(1)); +jump[0] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, STR_PTR, 0); + +/* Two-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc2); +CMPTO(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0x1e, two_byte_entry); + +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc2 - 0x80); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x80); +exit_invalid[6] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x40); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 6); +OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); + +/* Three-byte sequence. */ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-3)); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xe0); +CMPTO(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0x10, three_byte_entry); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(jump[0]); +exit_invalid[7] = CMP(SLJIT_GREATER, TMP2, 0, STR_PTR, 0); + +/* Two-byte sequence. 
*/ +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xc2); +CMPTO(SLJIT_LESS, TMP2, 0, SLJIT_IMM, 0x1e, two_byte_entry); + +exit_invalid_label = LABEL(); +for (i = 0; i < 8; i++) + sljit_set_label(exit_invalid[i], exit_invalid_label); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ + +#if PCRE2_CODE_UNIT_WIDTH == 16 + +static void do_utfreadchar_invalid(compiler_common *common) +{ +/* Slow decoding a UTF-16 character. TMP1 contains the first half +of the character (>= 0xd800). Return char value in TMP1. STR_PTR is +undefined for invalid characters. */ +DEFINE_COMPILER; +struct sljit_jump *exit_invalid[3]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +/* TMP2 contains the high surrogate. */ +exit_invalid[0] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xdc00); +exit_invalid[1] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 10); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xdc00); +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 0x10000); +exit_invalid[2] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x400); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(exit_invalid[0]); +JUMPHERE(exit_invalid[1]); +JUMPHERE(exit_invalid[2]); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +static void do_utfreadnewline_invalid(compiler_common *common) +{ +/* Slow decoding a UTF-16 character, specialized for newlines. +TMP1 contains the first half of the character (>= 0xd800). Return +char value in TMP1. 
*/ + +DEFINE_COMPILER; +struct sljit_jump *exit_invalid[2]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +/* TMP2 contains the high surrogate. */ +exit_invalid[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); +exit_invalid[1] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xdc00); + +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xdc00); +OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, 0x400); +OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_LESS); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0x10000); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCHAR_SHIFT); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(exit_invalid[0]); +JUMPHERE(exit_invalid[1]); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +static void do_utfmoveback_invalid(compiler_common *common) +{ +/* Goes one character back. */ +DEFINE_COMPILER; +struct sljit_jump *exit_invalid[3]; + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +exit_invalid[0] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x400); +exit_invalid[1] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, STR_PTR, 0); + +OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); +OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); +exit_invalid[2] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0x400); + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 1); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(exit_invalid[0]); +JUMPHERE(exit_invalid[1]); +JUMPHERE(exit_invalid[2]); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); } -#endif /* COMPILE_PCRE8 */ +static void do_utfpeakcharback_invalid(compiler_common *common) +{ +/* Peak a character back. Does not modify STR_PTR. 
*/ +DEFINE_COMPILER; +struct sljit_jump *jump; +struct sljit_jump *exit_invalid[3]; -#endif /* SUPPORT_UTF */ +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); -#ifdef SUPPORT_UCP +jump = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 0xe000); +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, IN_UCHARS(1)); +exit_invalid[0] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xdc00); +exit_invalid[1] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, STR_PTR, 0); + +OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x10000 - 0xdc00); +OP2(SLJIT_SUB, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xd800); +exit_invalid[2] = CMP(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 0x400); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 10); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + +JUMPHERE(jump); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); + +JUMPHERE(exit_invalid[0]); +JUMPHERE(exit_invalid[1]); +JUMPHERE(exit_invalid[2]); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +#endif /* PCRE2_CODE_UNIT_WIDTH == 16 */ /* UCD_BLOCK_SIZE must be 128 (see the assert below). */ #define UCD_BLOCK_MASK 127 @@ -3345,44 +5047,91 @@ static void do_getucd(compiler_common *common) /* Search the UCD record for the character comes in TMP1. Returns chartype in TMP1 and UCD offset in TMP2. 
*/ DEFINE_COMPILER; -#ifdef COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH == 32 struct sljit_jump *jump; #endif #if defined SLJIT_DEBUG && SLJIT_DEBUG /* dummy_ucd_record */ -const ucd_record *record = GET_UCD(INVALID_UTF_CHAR); -SLJIT_ASSERT(record->script == ucp_Common && record->chartype == ucp_Cn && record->gbprop == ucp_gbOther); +const ucd_record *record = GET_UCD(UNASSIGNED_UTF_CHAR); +SLJIT_ASSERT(record->script == ucp_Unknown && record->chartype == ucp_Cn && record->gbprop == ucp_gbOther); SLJIT_ASSERT(record->caseset == 0 && record->other_case == 0); #endif -SLJIT_ASSERT(UCD_BLOCK_SIZE == 128 && sizeof(ucd_record) == 8); +SLJIT_ASSERT(UCD_BLOCK_SIZE == 128 && sizeof(ucd_record) == 12); sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); -#ifdef COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH == 32 if (!common->utf) { - jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x10ffff + 1); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, MAX_UTF_CODE_POINT + 1); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, UNASSIGNED_UTF_CHAR); JUMPHERE(jump); } #endif OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT); -OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1)); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 1); +OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1)); OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK); OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT); OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_stage2)); OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM2(TMP2, TMP1), 1); -OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, chartype)); -OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM2(TMP1, TMP2), 3); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); } + +static void do_getucdtype(compiler_common *common) +{ +/* Search the UCD record for the 
character comes in TMP1. +Returns chartype in TMP1 and UCD offset in TMP2. */ +DEFINE_COMPILER; +#if PCRE2_CODE_UNIT_WIDTH == 32 +struct sljit_jump *jump; #endif -static SLJIT_INLINE struct sljit_label *mainloop_entry(compiler_common *common, BOOL hascrorlf) +#if defined SLJIT_DEBUG && SLJIT_DEBUG +/* dummy_ucd_record */ +const ucd_record *record = GET_UCD(UNASSIGNED_UTF_CHAR); +SLJIT_ASSERT(record->script == ucp_Unknown && record->chartype == ucp_Cn && record->gbprop == ucp_gbOther); +SLJIT_ASSERT(record->caseset == 0 && record->other_case == 0); +#endif + +SLJIT_ASSERT(UCD_BLOCK_SIZE == 128 && sizeof(ucd_record) == 12); + +sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); + +#if PCRE2_CODE_UNIT_WIDTH == 32 +if (!common->utf) + { + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, MAX_UTF_CODE_POINT + 1); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, UNASSIGNED_UTF_CHAR); + JUMPHERE(jump); + } +#endif + +OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 1); +OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1)); +OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); +OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_stage2)); +OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM2(TMP2, TMP1), 1); + +/* TMP2 is multiplied by 12. Same as (TMP2 << 2) + ((TMP2 << 2) << 1). 
*/ +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, chartype)); +OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 2); +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); +OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM2(TMP1, TMP2), 1); + +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); +} + +#endif /* SUPPORT_UNICODE */ + +static SLJIT_INLINE struct sljit_label *mainloop_entry(compiler_common *common) { DEFINE_COMPILER; struct sljit_label *mainloop; @@ -3390,20 +5139,26 @@ struct sljit_label *newlinelabel = NULL; struct sljit_jump *start; struct sljit_jump *end = NULL; struct sljit_jump *end2 = NULL; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 -struct sljit_jump *singlechar; -#endif +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_label *loop; +struct sljit_jump *jump; +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 */ jump_list *newline = NULL; +sljit_u32 overall_options = common->re->overall_options; +BOOL hascrorlf = (common->re->flags & PCRE2_HASCRORLF) != 0; BOOL newlinecheck = FALSE; BOOL readuchar = FALSE; -if (!(hascrorlf || (common->match_end_ptr != 0)) && - (common->nltype == NLTYPE_ANY || common->nltype == NLTYPE_ANYCRLF || common->newline > 255)) +if (!(hascrorlf || (overall_options & PCRE2_FIRSTLINE) != 0) + && (common->nltype == NLTYPE_ANY || common->nltype == NLTYPE_ANYCRLF || common->newline > 255)) newlinecheck = TRUE; -if (common->match_end_ptr != 0) +SLJIT_ASSERT(common->abort_label == NULL); + +if ((overall_options & PCRE2_FIRSTLINE) != 0) { /* Search for the end of the first line. */ + SLJIT_ASSERT(common->match_end_ptr != 0); OP1(SLJIT_MOV, TMP3, 0, STR_PTR, 0); if (common->nltype == NLTYPE_FIXED && common->newline > 255) @@ -3424,7 +5179,7 @@ if (common->match_end_ptr != 0) mainloop = LABEL(); /* Continual stores does not cause data dependency. 
*/ OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr, STR_PTR, 0); - read_char_range(common, common->nlmin, common->nlmax, TRUE); + read_char(common, common->nlmin, common->nlmax, NULL, READ_CHAR_NEWLINE); check_newlinechar(common, common->nltype, &newline, TRUE); CMPTO(SLJIT_LESS, STR_PTR, 0, STR_END, 0, mainloop); JUMPHERE(end); @@ -3434,6 +5189,41 @@ if (common->match_end_ptr != 0) OP1(SLJIT_MOV, STR_PTR, 0, TMP3, 0); } +else if ((overall_options & PCRE2_USE_OFFSET_LIMIT) != 0) + { + /* Check whether offset limit is set and valid. */ + SLJIT_ASSERT(common->match_end_ptr != 0); + + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, offset_limit)); + } + else + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, offset_limit)); + + OP1(SLJIT_MOV, TMP2, 0, STR_END, 0); + end = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, (sljit_sw) PCRE2_UNSET); + if (HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); + else + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); + +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 + OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); +#endif /* PCRE2_CODE_UNIT_WIDTH == [16|32] */ + if (HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, begin)); + + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); + end2 = CMP(SLJIT_LESS_EQUAL, TMP2, 0, STR_END, 0); + OP1(SLJIT_MOV, TMP2, 0, STR_END, 0); + JUMPHERE(end2); + OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH); + add_jump(compiler, &common->abort, CMP(SLJIT_LESS, TMP2, 0, STR_PTR, 0)); + JUMPHERE(end); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr, TMP2, 0); + } start = JUMP(SLJIT_JUMP); @@ -3445,9 +5235,9 @@ if (newlinecheck) OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, 
SLJIT_IMM, common->newline & 0xff); OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_EQUAL); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH == [16|32] */ OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); end2 = JUMP(SLJIT_JUMP); } @@ -3455,9 +5245,9 @@ if (newlinecheck) mainloop = LABEL(); /* Increasing the STR_PTR here requires one less jump in the most common case. */ -#ifdef SUPPORT_UTF -if (common->utf) readuchar = TRUE; -#endif +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +if (common->utf && !common->invalid_utf) readuchar = TRUE; +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 */ if (newlinecheck) readuchar = TRUE; if (readuchar) @@ -3467,28 +5257,60 @@ if (newlinecheck) CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, (common->newline >> 8) & 0xff, newlinelabel); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 -#if defined COMPILE_PCRE8 -if (common->utf) +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +#if PCRE2_CODE_UNIT_WIDTH == 8 +if (common->invalid_utf) { - singlechar = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xc0); + /* Skip continuation code units. 
*/ + loop = LABEL(); + jump = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x80); + CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x40, loop); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + JUMPHERE(jump); + } +else if (common->utf) + { + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xc0); OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); - JUMPHERE(singlechar); + JUMPHERE(jump); + } +#elif PCRE2_CODE_UNIT_WIDTH == 16 +if (common->invalid_utf) + { + /* Skip continuation code units. */ + loop = LABEL(); + jump = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xdc00); + CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x400, loop); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + JUMPHERE(jump); } -#elif defined COMPILE_PCRE16 -if (common->utf) +else if (common->utf) { - singlechar = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xd800); - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); - OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xd800); - OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_EQUAL); - OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 1); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); - JUMPHERE(singlechar); + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xd800); + + if (sljit_has_cpu_feature(SLJIT_HAS_CMOV)) + { + OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x400); + CMOV(SLJIT_LESS, STR_PTR, TMP2, 0); + } + else + { + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x400); + OP_FLAGS(SLJIT_MOV, TMP1, 0, 
SLJIT_LESS); + OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); + } } -#endif /* COMPILE_PCRE[8|16] */ -#endif /* SUPPORT_UTF && !COMPILE_PCRE32 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16] */ +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 */ JUMPHERE(start); if (newlinecheck) @@ -3500,53 +5322,55 @@ if (newlinecheck) return mainloop; } -#define MAX_N_CHARS 16 -#define MAX_DIFF_CHARS 6 -static SLJIT_INLINE void add_prefix_char(pcre_uchar chr, pcre_uchar *chars) +static SLJIT_INLINE void add_prefix_char(PCRE2_UCHAR chr, fast_forward_char_data *chars, BOOL last) { -pcre_uchar i, len; +sljit_u32 i, count = chars->count; -len = chars[0]; -if (len == 255) +if (count == 255) return; -if (len == 0) +if (count == 0) { - chars[0] = 1; - chars[1] = chr; + chars->count = 1; + chars->chars[0] = chr; + + if (last) + chars->last_count = 1; return; } -for (i = len; i > 0; i--) - if (chars[i] == chr) +for (i = 0; i < count; i++) + if (chars->chars[i] == chr) return; -if (len >= MAX_DIFF_CHARS - 1) +if (count >= MAX_DIFF_CHARS) { - chars[0] = 255; + chars->count = 255; return; } -len++; -chars[len] = chr; -chars[0] = len; +chars->chars[count] = chr; +chars->count = count + 1; + +if (last) + chars->last_count++; } -static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uchar *chars, int max_chars, sljit_u32 *rec_count) +static int scan_prefix(compiler_common *common, PCRE2_SPTR cc, fast_forward_char_data *chars, int max_chars, sljit_u32 *rec_count) { /* Recursive function, which scans prefix literals. */ BOOL last, any, class, caseless; int len, repeat, len_save, consumed = 0; sljit_u32 chr; /* Any unicode character. 
*/ sljit_u8 *bytes, *bytes_end, byte; -pcre_uchar *alternative, *cc_save, *oc; -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 -pcre_uchar othercase[8]; -#elif defined SUPPORT_UTF && defined COMPILE_PCRE16 -pcre_uchar othercase[2]; +PCRE2_SPTR alternative, cc_save, oc; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 +PCRE2_UCHAR othercase[4]; +#elif defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 16 +PCRE2_UCHAR othercase[2]; #else -pcre_uchar othercase[1]; +PCRE2_UCHAR othercase[1]; #endif repeat = 1; @@ -3565,6 +5389,7 @@ while (TRUE) { case OP_CHARI: caseless = TRUE; + /* Fall through */ case OP_CHAR: last = FALSE; cc++; @@ -3589,6 +5414,8 @@ while (TRUE) case OP_ASSERT_NOT: case OP_ASSERTBACK: case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: cc = bracketend(cc); continue; @@ -3596,6 +5423,7 @@ while (TRUE) case OP_MINPLUSI: case OP_POSPLUSI: caseless = TRUE; + /* Fall through */ case OP_PLUS: case OP_MINPLUS: case OP_POSPLUS: @@ -3604,6 +5432,7 @@ while (TRUE) case OP_EXACTI: caseless = TRUE; + /* Fall through */ case OP_EXACT: repeat = GET2(cc, 1); last = FALSE; @@ -3614,12 +5443,13 @@ while (TRUE) case OP_MINQUERYI: case OP_POSQUERYI: caseless = TRUE; + /* Fall through */ case OP_QUERY: case OP_MINQUERY: case OP_POSQUERY: len = 1; cc++; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc); #endif max_chars = scan_prefix(common, cc + len, chars, max_chars, rec_count); @@ -3637,7 +5467,6 @@ while (TRUE) continue; case OP_ONCE: - case OP_ONCE_NC: case OP_BRA: case OP_BRAPOS: case OP_CBRA: @@ -3657,7 +5486,7 @@ while (TRUE) continue; case OP_CLASS: -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf && !is_char7_bitset((const sljit_u8 *)(cc + 1), FALSE)) return consumed; #endif @@ -3665,15 +5494,15 @@ while (TRUE) break; case OP_NCLASS: -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined 
SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) return consumed; #endif class = TRUE; break; -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 case OP_XCLASS: -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) return consumed; #endif any = TRUE; @@ -3682,7 +5511,7 @@ while (TRUE) #endif case OP_DIGIT: -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf && !is_char7_bitset((const sljit_u8 *)common->ctypes - cbit_length + cbit_digit, FALSE)) return consumed; #endif @@ -3691,7 +5520,7 @@ while (TRUE) break; case OP_WHITESPACE: -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf && !is_char7_bitset((const sljit_u8 *)common->ctypes - cbit_length + cbit_space, FALSE)) return consumed; #endif @@ -3700,7 +5529,7 @@ while (TRUE) break; case OP_WORDCHAR: -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf && !is_char7_bitset((const sljit_u8 *)common->ctypes - cbit_length + cbit_word, FALSE)) return consumed; #endif @@ -3717,17 +5546,17 @@ while (TRUE) case OP_NOT_WORDCHAR: case OP_ANY: case OP_ALLANY: -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) return consumed; #endif any = TRUE; cc++; break; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE case OP_NOTPROP: case OP_PROP: -#ifndef COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) return consumed; #endif any = TRUE; @@ -3742,7 +5571,7 @@ while (TRUE) case OP_NOTEXACT: case OP_NOTEXACTI: -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) return consumed; #endif any = TRUE; @@ -3758,12 
+5587,12 @@ while (TRUE) { do { - chars[0] = 255; + chars->count = 255; consumed++; if (--max_chars == 0) return consumed; - chars += MAX_DIFF_CHARS; + chars++; } while (--repeat > 0); @@ -3774,7 +5603,7 @@ while (TRUE) if (class) { bytes = (sljit_u8*) (cc + 1); - cc += 1 + 32 / sizeof(pcre_uchar); + cc += 1 + 32 / sizeof(PCRE2_UCHAR); switch (*cc) { @@ -3807,8 +5636,8 @@ while (TRUE) do { if (bytes[31] & 0x80) - chars[0] = 255; - else if (chars[0] != 255) + chars->count = 255; + else if (chars->count != 255) { bytes_end = bytes + 32; chr = 0; @@ -3823,7 +5652,7 @@ while (TRUE) do { if ((byte & 0x1) != 0) - add_prefix_char(chr, chars); + add_prefix_char(chr, chars, TRUE); byte >>= 1; chr++; } @@ -3831,14 +5660,14 @@ while (TRUE) chr = (chr + 7) & ~7; } } - while (chars[0] != 255 && bytes < bytes_end); + while (chars->count != 255 && bytes < bytes_end); bytes = bytes_end - 32; } consumed++; if (--max_chars == 0) return consumed; - chars += MAX_DIFF_CHARS; + chars++; } while (--repeat > 0); @@ -3869,13 +5698,13 @@ while (TRUE) } len = 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc); #endif if (caseless && char_has_othercase(common, cc)) { -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf) { GETCHAR(chr, cc); @@ -3886,7 +5715,12 @@ while (TRUE) #endif { chr = *cc; - othercase[0] = TABLE_GET(chr, common->fcc, chr); +#ifdef SUPPORT_UNICODE + if (common->ucp && chr > 127) + othercase[0] = UCD_OTHERCASE(chr); + else +#endif + othercase[0] = TABLE_GET(chr, common->fcc, chr); } } else @@ -3902,17 +5736,18 @@ while (TRUE) oc = othercase; do { + len--; + consumed++; + chr = *cc; - add_prefix_char(*cc, chars); + add_prefix_char(*cc, chars, len == 0); if (caseless) - add_prefix_char(*oc, chars); + add_prefix_char(*oc, chars, len == 0); - len--; - consumed++; if (--max_chars == 0) return consumed; - chars += MAX_DIFF_CHARS; + chars++; cc++; oc++; } @@ -3931,302 +5766,86 @@ while (TRUE) } } -#if (defined 
SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) && !(defined SUPPORT_VALGRIND) - -static sljit_s32 character_to_int32(pcre_uchar chr) +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +static void jumpto_if_not_utf_char_start(struct sljit_compiler *compiler, sljit_s32 reg, struct sljit_label *label) { -sljit_s32 value = (sljit_s32)chr; -#if defined COMPILE_PCRE8 -#define SSE2_COMPARE_TYPE_INDEX 0 -return (value << 24) | (value << 16) | (value << 8) | value; -#elif defined COMPILE_PCRE16 -#define SSE2_COMPARE_TYPE_INDEX 1 -return (value << 16) | value; -#elif defined COMPILE_PCRE32 -#define SSE2_COMPARE_TYPE_INDEX 2 -return value; +#if PCRE2_CODE_UNIT_WIDTH == 8 +OP2(SLJIT_AND, reg, 0, reg, 0, SLJIT_IMM, 0xc0); +CMPTO(SLJIT_EQUAL, reg, 0, SLJIT_IMM, 0x80, label); +#elif PCRE2_CODE_UNIT_WIDTH == 16 +OP2(SLJIT_AND, reg, 0, reg, 0, SLJIT_IMM, 0xfc00); +CMPTO(SLJIT_EQUAL, reg, 0, SLJIT_IMM, 0xdc00, label); #else -#error "Unsupported unit width" +#error "Unknown code width" #endif } - -static SLJIT_INLINE void fast_forward_first_char2_sse2(compiler_common *common, pcre_uchar char1, pcre_uchar char2) -{ -DEFINE_COMPILER; -struct sljit_label *start; -struct sljit_jump *quit[3]; -struct sljit_jump *nomatch; -sljit_u8 instruction[8]; -sljit_s32 tmp1_ind = sljit_get_register_index(TMP1); -sljit_s32 tmp2_ind = sljit_get_register_index(TMP2); -sljit_s32 str_ptr_ind = sljit_get_register_index(STR_PTR); -BOOL load_twice = FALSE; -pcre_uchar bit; - -bit = char1 ^ char2; -if (!is_powerof2(bit)) - bit = 0; - -if ((char1 != char2) && bit == 0) - load_twice = TRUE; - -quit[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); - -/* First part (unaligned start) */ - -OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit)); - -SLJIT_ASSERT(tmp1_ind < 8 && tmp2_ind == 1); - -/* MOVD xmm, r/m32 */ -instruction[0] = 0x66; -instruction[1] = 0x0f; -instruction[2] = 0x6e; -instruction[3] = 0xc0 | (2 << 3) | tmp1_ind; -sljit_emit_op_custom(compiler, instruction, 4); - -if (char1 
!= char2) - { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(bit != 0 ? bit : char2)); - - /* MOVD xmm, r/m32 */ - instruction[3] = 0xc0 | (3 << 3) | tmp1_ind; - sljit_emit_op_custom(compiler, instruction, 4); - } - -/* PSHUFD xmm1, xmm2/m128, imm8 */ -instruction[2] = 0x70; -instruction[3] = 0xc0 | (2 << 3) | 2; -instruction[4] = 0; -sljit_emit_op_custom(compiler, instruction, 5); - -if (char1 != char2) - { - /* PSHUFD xmm1, xmm2/m128, imm8 */ - instruction[3] = 0xc0 | (3 << 3) | 3; - instruction[4] = 0; - sljit_emit_op_custom(compiler, instruction, 5); - } - -OP2(SLJIT_AND, TMP2, 0, STR_PTR, 0, SLJIT_IMM, 0xf); -OP2(SLJIT_AND, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, ~0xf); - -/* MOVDQA xmm1, xmm2/m128 */ -#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) - -if (str_ptr_ind < 8) - { - instruction[2] = 0x6f; - instruction[3] = (0 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 4); - - if (load_twice) - { - instruction[3] = (1 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 4); - } - } -else - { - instruction[1] = 0x41; - instruction[2] = 0x0f; - instruction[3] = 0x6f; - instruction[4] = (0 << 3) | (str_ptr_ind & 0x7); - sljit_emit_op_custom(compiler, instruction, 5); - - if (load_twice) - { - instruction[4] = (1 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 5); - } - instruction[1] = 0x0f; - } - -#else - -instruction[2] = 0x6f; -instruction[3] = (0 << 3) | str_ptr_ind; -sljit_emit_op_custom(compiler, instruction, 4); - -if (load_twice) - { - instruction[3] = (1 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 4); - } - #endif -if (bit != 0) - { - /* POR xmm1, xmm2/m128 */ - instruction[2] = 0xeb; - instruction[3] = 0xc0 | (0 << 3) | 3; - sljit_emit_op_custom(compiler, instruction, 4); - } - -/* PCMPEQB/W/D xmm1, xmm2/m128 */ -instruction[2] = 0x74 + SSE2_COMPARE_TYPE_INDEX; -instruction[3] = 0xc0 | (0 << 3) | 2; -sljit_emit_op_custom(compiler, instruction, 4); - -if (load_twice) 
- { - instruction[3] = 0xc0 | (1 << 3) | 3; - sljit_emit_op_custom(compiler, instruction, 4); - } - -/* PMOVMSKB reg, xmm */ -instruction[2] = 0xd7; -instruction[3] = 0xc0 | (tmp1_ind << 3) | 0; -sljit_emit_op_custom(compiler, instruction, 4); - -if (load_twice) - { - OP1(SLJIT_MOV, RETURN_ADDR, 0, TMP2, 0); - instruction[3] = 0xc0 | (tmp2_ind << 3) | 1; - sljit_emit_op_custom(compiler, instruction, 4); - - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); - OP1(SLJIT_MOV, TMP2, 0, RETURN_ADDR, 0); - } - -OP2(SLJIT_ASHR, TMP1, 0, TMP1, 0, TMP2, 0); - -/* BSF r32, r/m32 */ -instruction[0] = 0x0f; -instruction[1] = 0xbc; -instruction[2] = 0xc0 | (tmp1_ind << 3) | tmp1_ind; -sljit_emit_op_custom(compiler, instruction, 3); -sljit_set_current_flags(compiler, SLJIT_SET_Z); - -nomatch = JUMP(SLJIT_ZERO); - -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); -quit[1] = JUMP(SLJIT_JUMP); - -JUMPHERE(nomatch); +#include "pcre2_jit_simd_inc.h" -start = LABEL(); -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, 16); -quit[2] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); - -/* Second part (aligned) */ - -instruction[0] = 0x66; -instruction[1] = 0x0f; - -/* MOVDQA xmm1, xmm2/m128 */ -#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) +#ifdef JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD -if (str_ptr_ind < 8) - { - instruction[2] = 0x6f; - instruction[3] = (0 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 4); - - if (load_twice) - { - instruction[3] = (1 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 4); - } - } -else - { - instruction[1] = 0x41; - instruction[2] = 0x0f; - instruction[3] = 0x6f; - instruction[4] = (0 << 3) | (str_ptr_ind & 0x7); - sljit_emit_op_custom(compiler, instruction, 5); +static BOOL check_fast_forward_char_pair_simd(compiler_common *common, fast_forward_char_data *chars, int max) +{ + sljit_s32 i, j, max_i = 0, max_j = 0; + sljit_u32 max_pri = 0; + PCRE2_UCHAR a1, a2, 
a_pri, b1, b2, b_pri; - if (load_twice) + for (i = max - 1; i >= 1; i--) { - instruction[4] = (1 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 5); - } - instruction[1] = 0x0f; - } - -#else - -instruction[2] = 0x6f; -instruction[3] = (0 << 3) | str_ptr_ind; -sljit_emit_op_custom(compiler, instruction, 4); - -if (load_twice) - { - instruction[3] = (1 << 3) | str_ptr_ind; - sljit_emit_op_custom(compiler, instruction, 4); - } - -#endif - -if (bit != 0) - { - /* POR xmm1, xmm2/m128 */ - instruction[2] = 0xeb; - instruction[3] = 0xc0 | (0 << 3) | 3; - sljit_emit_op_custom(compiler, instruction, 4); - } - -/* PCMPEQB/W/D xmm1, xmm2/m128 */ -instruction[2] = 0x74 + SSE2_COMPARE_TYPE_INDEX; -instruction[3] = 0xc0 | (0 << 3) | 2; -sljit_emit_op_custom(compiler, instruction, 4); - -if (load_twice) - { - instruction[3] = 0xc0 | (1 << 3) | 3; - sljit_emit_op_custom(compiler, instruction, 4); - } - -/* PMOVMSKB reg, xmm */ -instruction[2] = 0xd7; -instruction[3] = 0xc0 | (tmp1_ind << 3) | 0; -sljit_emit_op_custom(compiler, instruction, 4); - -if (load_twice) - { - instruction[3] = 0xc0 | (tmp2_ind << 3) | 1; - sljit_emit_op_custom(compiler, instruction, 4); + if (chars[i].last_count > 2) + { + a1 = chars[i].chars[0]; + a2 = chars[i].chars[1]; + a_pri = chars[i].last_count; - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, TMP2, 0); - } + j = i - max_fast_forward_char_pair_offset(); + if (j < 0) + j = 0; -/* BSF r32, r/m32 */ -instruction[0] = 0x0f; -instruction[1] = 0xbc; -instruction[2] = 0xc0 | (tmp1_ind << 3) | tmp1_ind; -sljit_emit_op_custom(compiler, instruction, 3); -sljit_set_current_flags(compiler, SLJIT_SET_Z); + while (j < i) + { + b_pri = chars[j].last_count; + if (b_pri > 2 && a_pri + b_pri >= max_pri) + { + b1 = chars[j].chars[0]; + b2 = chars[j].chars[1]; -JUMPTO(SLJIT_ZERO, start); + if (a1 != b1 && a1 != b2 && a2 != b1 && a2 != b2) + { + max_pri = a_pri + b_pri; + max_i = i; + max_j = j; + } + } + j++; + } + } + } -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, 
TMP1, 0); +if (max_pri == 0) + return FALSE; -start = LABEL(); -SET_LABEL(quit[0], start); -SET_LABEL(quit[1], start); -SET_LABEL(quit[2], start); +fast_forward_char_pair_simd(common, max_i, chars[max_i].chars[0], chars[max_i].chars[1], max_j, chars[max_j].chars[0], chars[max_j].chars[1]); +return TRUE; } -#undef SSE2_COMPARE_TYPE_INDEX - -#endif +#endif /* JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD */ -static void fast_forward_first_char2(compiler_common *common, pcre_uchar char1, pcre_uchar char2, sljit_s32 offset) +static void fast_forward_first_char2(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2, sljit_s32 offset) { DEFINE_COMPILER; struct sljit_label *start; -struct sljit_jump *quit; -struct sljit_jump *found; -pcre_uchar mask; -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 -struct sljit_label *utf_start = NULL; -struct sljit_jump *utf_quit = NULL; -#endif +struct sljit_jump *match; +struct sljit_jump *partial_quit; +PCRE2_UCHAR mask; BOOL has_match_end = (common->match_end_ptr != 0); +SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE || offset == 0); + +if (has_match_end) + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); + if (offset > 0) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offset)); @@ -4234,66 +5853,19 @@ if (has_match_end) { OP1(SLJIT_MOV, TMP3, 0, STR_END, 0); - OP2(SLJIT_ADD, STR_END, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr, SLJIT_IMM, IN_UCHARS(offset + 1)); - OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_END, 0, TMP3, 0); - sljit_emit_cmov(compiler, SLJIT_GREATER, STR_END, TMP3, 0); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(offset + 1)); + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_END, 0, TMP1, 0); + CMOV(SLJIT_GREATER, STR_END, TMP1, 0); } -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 -if (common->utf && offset > 0) - utf_start = LABEL(); -#endif - -#if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) && !(defined SUPPORT_VALGRIND) +#ifdef 
JIT_HAS_FAST_FORWARD_CHAR_SIMD -/* SSE2 accelerated first character search. */ - -if (sljit_has_cpu_feature(SLJIT_HAS_SSE2)) +if (JIT_HAS_FAST_FORWARD_CHAR_SIMD) { - fast_forward_first_char2_sse2(common, char1, char2); - - SLJIT_ASSERT(common->mode == JIT_COMPILE || offset == 0); - if (common->mode == JIT_COMPILE) - { - /* In complete mode, we don't need to run a match when STR_PTR == STR_END. */ - SLJIT_ASSERT(common->forced_quit_label == NULL); - OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_NOMATCH); - add_jump(compiler, &common->forced_quit, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - if (common->utf && offset > 0) - { - SLJIT_ASSERT(common->mode == JIT_COMPILE); - - OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-offset)); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -#if defined COMPILE_PCRE8 - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc0); - CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0x80, utf_start); -#elif defined COMPILE_PCRE16 - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); - CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0xdc00, utf_start); -#else -#error "Unknown code width" -#endif - OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - } -#endif + fast_forward_char_simd(common, char1, char2, offset); - if (offset > 0) - OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offset)); - } - else - { - OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, STR_PTR, 0, STR_END, 0); - if (has_match_end) - { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); - sljit_emit_cmov(compiler, SLJIT_GREATER_EQUAL, STR_PTR, TMP1, 0); - } - else - sljit_emit_cmov(compiler, SLJIT_GREATER_EQUAL, STR_PTR, STR_END, 0); - } + if (offset > 0) + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offset)); if (has_match_end) OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); @@ -4302,88 +5874,59 @@ if 
(sljit_has_cpu_feature(SLJIT_HAS_SSE2)) #endif -quit = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); - start = LABEL(); + +partial_quit = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); if (char1 == char2) - found = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, char1); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, char1, start); else { mask = char1 ^ char2; if (is_powerof2(mask)) { OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, mask); - found = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, char1 | mask); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, char1 | mask, start); } else { - OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, char1); - OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_EQUAL); - OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, char2); - OP_FLAGS(SLJIT_OR | SLJIT_SET_Z, TMP2, 0, SLJIT_EQUAL); - found = JUMP(SLJIT_NOT_ZERO); + match = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, char1); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, char2, start); + JUMPHERE(match); } } -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -CMPTO(SLJIT_LESS, STR_PTR, 0, STR_END, 0, start); - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 -if (common->utf && offset > 0) - utf_quit = JUMP(SLJIT_JUMP); -#endif - -JUMPHERE(found); - -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf && offset > 0) { - OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-offset)); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -#if defined COMPILE_PCRE8 - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc0); - CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0x80, utf_start); -#elif defined COMPILE_PCRE16 - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); - CMPTO(SLJIT_EQUAL, 
TMP1, 0, SLJIT_IMM, 0xdc00, utf_start); -#else -#error "Unknown code width" -#endif - OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - JUMPHERE(utf_quit); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-(offset + 1))); + jumpto_if_not_utf_char_start(compiler, TMP1, start); } #endif -JUMPHERE(quit); +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offset + 1)); + +if (common->mode != PCRE2_JIT_COMPLETE) + JUMPHERE(partial_quit); if (has_match_end) - { - quit = CMP(SLJIT_LESS, STR_PTR, 0, STR_END, 0); - OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); - if (offset > 0) - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offset)); - JUMPHERE(quit); OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); - } - -if (offset > 0) - OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offset)); } static SLJIT_INLINE BOOL fast_forward_first_n_chars(compiler_common *common) { DEFINE_COMPILER; struct sljit_label *start; -struct sljit_jump *quit; struct sljit_jump *match; -/* bytes[0] represent the number of characters between 0 -and MAX_N_BYTES - 1, 255 represents any character. */ -pcre_uchar chars[MAX_N_CHARS * MAX_DIFF_CHARS]; +fast_forward_char_data chars[MAX_N_CHARS]; sljit_s32 offset; -pcre_uchar mask; -pcre_uchar *char_set, *char_set_end; +PCRE2_UCHAR mask; +PCRE2_UCHAR *char_set, *char_set_end; int i, max, from; int range_right = -1, range_len; sljit_u8 *update_table = NULL; @@ -4391,7 +5934,10 @@ BOOL in_range; sljit_u32 rec_count; for (i = 0; i < MAX_N_CHARS; i++) - chars[i * MAX_DIFF_CHARS] = 0; + { + chars[i].count = 0; + chars[i].last_count = 0; + } rec_count = 10000; max = scan_prefix(common, common->start, chars, MAX_N_CHARS, &rec_count); @@ -4399,21 +5945,50 @@ max = scan_prefix(common, common->start, chars, MAX_N_CHARS, &rec_count); if (max < 1) return FALSE; +/* Convert last_count to priority. 
*/ +for (i = 0; i < max; i++) + { + SLJIT_ASSERT(chars[i].count > 0 && chars[i].last_count <= chars[i].count); + + if (chars[i].count == 1) + { + chars[i].last_count = (chars[i].last_count == 1) ? 7 : 5; + /* Simplifies algorithms later. */ + chars[i].chars[1] = chars[i].chars[0]; + } + else if (chars[i].count == 2) + { + SLJIT_ASSERT(chars[i].chars[0] != chars[i].chars[1]); + + if (is_powerof2(chars[i].chars[0] ^ chars[i].chars[1])) + chars[i].last_count = (chars[i].last_count == 2) ? 6 : 4; + else + chars[i].last_count = (chars[i].last_count == 2) ? 3 : 2; + } + else + chars[i].last_count = (chars[i].count == 255) ? 0 : 1; + } + +#ifdef JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD +if (JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD && check_fast_forward_char_pair_simd(common, chars, max)) + return TRUE; +#endif + in_range = FALSE; /* Prevent compiler "uninitialized" warning */ from = 0; range_len = 4 /* minimum length */ - 1; for (i = 0; i <= max; i++) { - if (in_range && (i - from) > range_len && (chars[(i - 1) * MAX_DIFF_CHARS] < 255)) + if (in_range && (i - from) > range_len && (chars[i - 1].count < 255)) { range_len = i - from; range_right = i - 1; } - if (i < max && chars[i * MAX_DIFF_CHARS] < 255) + if (i < max && chars[i].count < 255) { - SLJIT_ASSERT(chars[i * MAX_DIFF_CHARS] > 0); + SLJIT_ASSERT(chars[i].count > 0); if (!in_range) { in_range = TRUE; @@ -4433,16 +6008,17 @@ if (range_right >= 0) for (i = 0; i < range_len; i++) { - char_set = chars + ((range_right - i) * MAX_DIFF_CHARS); - SLJIT_ASSERT(char_set[0] > 0 && char_set[0] < 255); - char_set_end = char_set + char_set[0]; - char_set++; - while (char_set <= char_set_end) + SLJIT_ASSERT(chars[range_right - i].count > 0 && chars[range_right - i].count < 255); + + char_set = chars[range_right - i].chars; + char_set_end = char_set + chars[range_right - i].count; + do { if (update_table[(*char_set) & 0xff] > IN_UCHARS(i)) update_table[(*char_set) & 0xff] = IN_UCHARS(i); char_set++; } + while (char_set < char_set_end); } } 
@@ -4450,78 +6026,65 @@ offset = -1; /* Scan forward. */ for (i = 0; i < max; i++) { + if (range_right == i) + continue; + if (offset == -1) { - if (chars[i * MAX_DIFF_CHARS] <= 2) + if (chars[i].last_count >= 2) offset = i; } - else if (chars[offset * MAX_DIFF_CHARS] == 2 && chars[i * MAX_DIFF_CHARS] <= 2) - { - if (chars[i * MAX_DIFF_CHARS] == 1) - offset = i; - else - { - mask = chars[offset * MAX_DIFF_CHARS + 1] ^ chars[offset * MAX_DIFF_CHARS + 2]; - if (!is_powerof2(mask)) - { - mask = chars[i * MAX_DIFF_CHARS + 1] ^ chars[i * MAX_DIFF_CHARS + 2]; - if (is_powerof2(mask)) - offset = i; - } - } - } + else if (chars[offset].last_count < chars[i].last_count) + offset = i; } +SLJIT_ASSERT(offset == -1 || (chars[offset].count >= 1 && chars[offset].count <= 2)); + if (range_right < 0) { if (offset < 0) return FALSE; - SLJIT_ASSERT(chars[offset * MAX_DIFF_CHARS] >= 1 && chars[offset * MAX_DIFF_CHARS] <= 2); /* Works regardless the value is 1 or 2. */ - mask = chars[offset * MAX_DIFF_CHARS + chars[offset * MAX_DIFF_CHARS]]; - fast_forward_first_char2(common, chars[offset * MAX_DIFF_CHARS + 1], mask, offset); + fast_forward_first_char2(common, chars[offset].chars[0], chars[offset].chars[1], offset); return TRUE; } -if (range_right == offset) - offset = -1; - -SLJIT_ASSERT(offset == -1 || (chars[offset * MAX_DIFF_CHARS] >= 1 && chars[offset * MAX_DIFF_CHARS] <= 2)); +SLJIT_ASSERT(range_right != offset); -max -= 1; -SLJIT_ASSERT(max > 0); if (common->match_end_ptr != 0) { OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); OP1(SLJIT_MOV, TMP3, 0, STR_END, 0); - OP2(SLJIT_SUB, STR_END, 0, STR_END, 0, SLJIT_IMM, IN_UCHARS(max)); - quit = CMP(SLJIT_LESS_EQUAL, STR_END, 0, TMP1, 0); - OP1(SLJIT_MOV, STR_END, 0, TMP1, 0); - JUMPHERE(quit); + OP2(SLJIT_SUB | SLJIT_SET_LESS, STR_END, 0, STR_END, 0, SLJIT_IMM, IN_UCHARS(max)); + add_jump(compiler, &common->failed_match, JUMP(SLJIT_LESS)); + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_END, 0, TMP1, 
0); + CMOV(SLJIT_GREATER, STR_END, TMP1, 0); } else - OP2(SLJIT_SUB, STR_END, 0, STR_END, 0, SLJIT_IMM, IN_UCHARS(max)); + { + OP2(SLJIT_SUB | SLJIT_SET_LESS, STR_END, 0, STR_END, 0, SLJIT_IMM, IN_UCHARS(max)); + add_jump(compiler, &common->failed_match, JUMP(SLJIT_LESS)); + } SLJIT_ASSERT(range_right >= 0); -#if !(defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) -OP1(SLJIT_MOV, RETURN_ADDR, 0, SLJIT_IMM, (sljit_sw)update_table); -#endif +if (!HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, RETURN_ADDR, 0, SLJIT_IMM, (sljit_sw)update_table); start = LABEL(); -quit = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER, STR_PTR, 0, STR_END, 0)); -#if defined COMPILE_PCRE8 || (defined SLJIT_LITTLE_ENDIAN && SLJIT_LITTLE_ENDIAN) +#if PCRE2_CODE_UNIT_WIDTH == 8 || (defined SLJIT_LITTLE_ENDIAN && SLJIT_LITTLE_ENDIAN) OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(range_right)); #else OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(range_right + 1) - 1); #endif -#if !(defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) -OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM2(RETURN_ADDR, TMP1), 0); -#else -OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)update_table); -#endif +if (!HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM2(RETURN_ADDR, TMP1), 0); +else + OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)update_table); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 0, start); @@ -4530,26 +6093,26 @@ if (offset >= 0) OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(offset)); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - if (chars[offset * MAX_DIFF_CHARS] == 1) - CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset * MAX_DIFF_CHARS + 1], start); + if (chars[offset].count == 1) + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset].chars[0], start); else { - mask = chars[offset * MAX_DIFF_CHARS + 1] ^ 
chars[offset * MAX_DIFF_CHARS + 2]; + mask = chars[offset].chars[0] ^ chars[offset].chars[1]; if (is_powerof2(mask)) { OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, mask); - CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset * MAX_DIFF_CHARS + 1] | mask, start); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset].chars[0] | mask, start); } else { - match = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset * MAX_DIFF_CHARS + 1]); - CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset * MAX_DIFF_CHARS + 2], start); + match = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset].chars[0]); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, chars[offset].chars[1], start); JUMPHERE(match); } } } -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf && offset != 0) { if (offset < 0) @@ -4559,15 +6122,9 @@ if (common->utf && offset != 0) } else OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); -#if defined COMPILE_PCRE8 - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xc0); - CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0x80, start); -#elif defined COMPILE_PCRE16 - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); - CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0xdc00, start); -#else -#error "Unknown code width" -#endif + + jumpto_if_not_utf_char_start(compiler, TMP1, start); + if (offset < 0) OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); } @@ -4576,38 +6133,24 @@ if (common->utf && offset != 0) if (offset >= 0) OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -JUMPHERE(quit); - if (common->match_end_ptr != 0) - { - if (range_right >= 0) - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); - if (range_right >= 0) - { - quit = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP1, 0); - OP1(SLJIT_MOV, STR_PTR, 0, TMP1, 0); - JUMPHERE(quit); - } - } else OP2(SLJIT_ADD, STR_END, 0, STR_END, 0, SLJIT_IMM, IN_UCHARS(max)); 
return TRUE; } -#undef MAX_N_CHARS -#undef MAX_DIFF_CHARS - -static SLJIT_INLINE void fast_forward_first_char(compiler_common *common, pcre_uchar first_char, BOOL caseless) +static SLJIT_INLINE void fast_forward_first_char(compiler_common *common) { -pcre_uchar oc; +PCRE2_UCHAR first_char = (PCRE2_UCHAR)(common->re->first_codeunit); +PCRE2_UCHAR oc; oc = first_char; -if (caseless) +if ((common->re->flags & PCRE2_FIRSTCASELESS) != 0) { oc = TABLE_GET(first_char, common->fcc, first_char); -#if defined SUPPORT_UCP && !defined COMPILE_PCRE8 - if (first_char > 127 && common->utf) +#if defined SUPPORT_UNICODE + if (first_char > 127 && (common->utf || common->ucp)) oc = UCD_OTHERCASE(first_char); #endif } @@ -4619,9 +6162,9 @@ static SLJIT_INLINE void fast_forward_newline(compiler_common *common) { DEFINE_COMPILER; struct sljit_label *loop; -struct sljit_jump *lastchar; +struct sljit_jump *lastchar = NULL; struct sljit_jump *firstchar; -struct sljit_jump *quit; +struct sljit_jump *quit = NULL; struct sljit_jump *foundcr = NULL; struct sljit_jump *notfoundnl; jump_list *newline = NULL; @@ -4632,215 +6175,310 @@ if (common->match_end_ptr != 0) OP1(SLJIT_MOV, STR_END, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); } -if (common->nltype == NLTYPE_FIXED && common->newline > 255) - { - lastchar = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); - OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); - firstchar = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP2, 0); +if (common->nltype == NLTYPE_FIXED && common->newline > 255) + { +#ifdef JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD + if (JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD && common->mode == PCRE2_JIT_COMPLETE) + { + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); + OP1(SLJIT_MOV, TMP1, 0, 
SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); + } + else + { + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); + } + firstchar = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP2, 0); + + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, STR_PTR, 0, TMP1, 0); + OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_NOT_EQUAL); +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 + OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); +#endif + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP1, 0); + + fast_forward_char_pair_simd(common, 1, common->newline & 0xff, common->newline & 0xff, 0, (common->newline >> 8) & 0xff, (common->newline >> 8) & 0xff); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); + } + else +#endif /* JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD */ + { + lastchar = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); + } + else + { + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); + } + firstchar = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP2, 0); - OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(2)); - OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, STR_PTR, 0, TMP1, 0); - OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_GREATER_EQUAL); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCHAR_SHIFT); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(2)); + OP2(SLJIT_SUB | SLJIT_SET_GREATER_EQUAL, SLJIT_UNUSED, 0, STR_PTR, 0, TMP1, 0); + OP_FLAGS(SLJIT_MOV, 
TMP2, 0, SLJIT_GREATER_EQUAL); +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCHAR_SHIFT); #endif - OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP2, 0); - loop = LABEL(); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); - quit = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); - OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); - OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); - CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, (common->newline >> 8) & 0xff, loop); - CMPTO(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, common->newline & 0xff, loop); + loop = LABEL(); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + quit = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); + OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, (common->newline >> 8) & 0xff, loop); + CMPTO(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, common->newline & 0xff, loop); + + JUMPHERE(quit); + JUMPHERE(lastchar); + } - JUMPHERE(quit); JUMPHERE(firstchar); - JUMPHERE(lastchar); if (common->match_end_ptr != 0) OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); return; } -OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); -OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); +if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); + } +else + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str)); + +/* Example: match /^/ to \r\n from offset 1. 
*/ firstchar = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP2, 0); -skip_char_back(common); + +if (common->nltype == NLTYPE_ANY) + move_back(common, NULL, FALSE); +else + OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); loop = LABEL(); common->ff_newline_shortcut = loop; -read_char_range(common, common->nlmin, common->nlmax, TRUE); -lastchar = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); -if (common->nltype == NLTYPE_ANY || common->nltype == NLTYPE_ANYCRLF) - foundcr = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_CR); -check_newlinechar(common, common->nltype, &newline, FALSE); -set_jumps(newline, loop); +#ifdef JIT_HAS_FAST_FORWARD_CHAR_SIMD +if (JIT_HAS_FAST_FORWARD_CHAR_SIMD && (common->nltype == NLTYPE_FIXED || common->nltype == NLTYPE_ANYCRLF)) + { + if (common->nltype == NLTYPE_ANYCRLF) + { + fast_forward_char_simd(common, CHAR_CR, CHAR_LF, 0); + if (common->mode != PCRE2_JIT_COMPLETE) + lastchar = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + quit = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_CR); + } + else + { + fast_forward_char_simd(common, common->newline, common->newline, 0); + + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + if (common->mode != PCRE2_JIT_COMPLETE) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_PTR, 0, STR_END, 0); + CMOV(SLJIT_GREATER, STR_PTR, STR_END, 0); + } + } + } +else +#endif /* JIT_HAS_FAST_FORWARD_CHAR_SIMD */ + { + read_char(common, common->nlmin, common->nlmax, NULL, READ_CHAR_NEWLINE); + lastchar = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + if (common->nltype == NLTYPE_ANY || common->nltype == NLTYPE_ANYCRLF) + foundcr = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_CR); + check_newlinechar(common, common->nltype, &newline, FALSE); + set_jumps(newline, loop); + } if (common->nltype == NLTYPE_ANY || common->nltype == NLTYPE_ANYCRLF) { - quit = 
JUMP(SLJIT_JUMP); - JUMPHERE(foundcr); + if (quit == NULL) + { + quit = JUMP(SLJIT_JUMP); + JUMPHERE(foundcr); + } + notfoundnl = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, CHAR_NL); OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_EQUAL); -#if defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, UCHAR_SHIFT); #endif OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); JUMPHERE(notfoundnl); JUMPHERE(quit); } -JUMPHERE(lastchar); + +if (lastchar) + JUMPHERE(lastchar); JUMPHERE(firstchar); if (common->match_end_ptr != 0) OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); } -static BOOL check_class_ranges(compiler_common *common, const sljit_u8 *bits, BOOL nclass, BOOL invert, jump_list **backtracks); +static BOOL optimize_class(compiler_common *common, const sljit_u8 *bits, BOOL nclass, BOOL invert, jump_list **backtracks); -static SLJIT_INLINE void fast_forward_start_bits(compiler_common *common, const sljit_u8 *start_bits) +static SLJIT_INLINE void fast_forward_start_bits(compiler_common *common) { DEFINE_COMPILER; +const sljit_u8 *start_bits = common->re->start_bitmap; struct sljit_label *start; -struct sljit_jump *quit; +struct sljit_jump *partial_quit; +#if PCRE2_CODE_UNIT_WIDTH != 8 struct sljit_jump *found = NULL; -jump_list *matches = NULL; -#ifndef COMPILE_PCRE8 -struct sljit_jump *jump; #endif +jump_list *matches = NULL; if (common->match_end_ptr != 0) { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); OP1(SLJIT_MOV, RETURN_ADDR, 0, STR_END, 0); - OP1(SLJIT_MOV, STR_END, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(1)); + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_END, 0, TMP1, 0); + CMOV(SLJIT_GREATER, STR_END, TMP1, 0); } start = LABEL(); -quit = CMP(SLJIT_GREATER_EQUAL, 
STR_PTR, 0, STR_END, 0); + +partial_quit = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); -#ifdef SUPPORT_UTF -if (common->utf) - OP1(SLJIT_MOV, TMP3, 0, TMP1, 0); -#endif +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -if (!check_class_ranges(common, start_bits, (start_bits[31] & 0x80) != 0, TRUE, &matches)) +if (!optimize_class(common, start_bits, (start_bits[31] & 0x80) != 0, FALSE, &matches)) { -#ifndef COMPILE_PCRE8 - jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 255); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 255); - JUMPHERE(jump); +#if PCRE2_CODE_UNIT_WIDTH != 8 + if ((start_bits[31] & 0x80) != 0) + found = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 255); + else + CMPTO(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 255, start); +#elif defined SUPPORT_UNICODE + if (common->utf && is_char7_bitset(start_bits, FALSE)) + CMPTO(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 127, start); #endif OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7); OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 3); OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)start_bits); - OP2(SLJIT_SHL, TMP2, 0, SLJIT_IMM, 1, TMP2, 0); - OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0); - found = JUMP(SLJIT_NOT_ZERO); + if (!HAS_VIRTUAL_REGISTERS) + { + OP2(SLJIT_SHL, TMP3, 0, SLJIT_IMM, 1, TMP2, 0); + OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, TMP3, 0); + } + else + { + OP2(SLJIT_SHL, TMP2, 0, SLJIT_IMM, 1, TMP2, 0); + OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0); + } + JUMPTO(SLJIT_ZERO, start); } +else + set_jumps(matches, start); -#ifdef SUPPORT_UTF -if (common->utf) - OP1(SLJIT_MOV, TMP1, 0, TMP3, 0); -#endif -OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); -#ifdef SUPPORT_UTF -#if defined COMPILE_PCRE8 -if (common->utf) - { - CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xc0, start); 
- OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); - } -#elif defined COMPILE_PCRE16 -if (common->utf) - { - CMPTO(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xd800, start); - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); - OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xd800); - OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_EQUAL); - OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 1); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); - } -#endif /* COMPILE_PCRE[8|16] */ -#endif /* SUPPORT_UTF */ -JUMPTO(SLJIT_JUMP, start); +#if PCRE2_CODE_UNIT_WIDTH != 8 if (found != NULL) JUMPHERE(found); -if (matches != NULL) - set_jumps(matches, LABEL()); -JUMPHERE(quit); +#endif + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + +if (common->mode != PCRE2_JIT_COMPLETE) + JUMPHERE(partial_quit); if (common->match_end_ptr != 0) OP1(SLJIT_MOV, STR_END, 0, RETURN_ADDR, 0); } -static SLJIT_INLINE struct sljit_jump *search_requested_char(compiler_common *common, pcre_uchar req_char, BOOL caseless, BOOL has_firstchar) +static SLJIT_INLINE jump_list *search_requested_char(compiler_common *common, PCRE2_UCHAR req_char, BOOL caseless, BOOL has_firstchar) { DEFINE_COMPILER; struct sljit_label *loop; struct sljit_jump *toolong; -struct sljit_jump *alreadyfound; +struct sljit_jump *already_found; struct sljit_jump *found; -struct sljit_jump *foundoc = NULL; -struct sljit_jump *notfound; +struct sljit_jump *found_oc = NULL; +jump_list *not_found = NULL; sljit_u32 oc, bit; SLJIT_ASSERT(common->req_char_ptr != 0); -OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), common->req_char_ptr); -OP2(SLJIT_ADD, TMP1, 0, STR_PTR, 0, SLJIT_IMM, REQ_BYTE_MAX); -toolong = CMP(SLJIT_LESS, TMP1, 0, STR_END, 0); -alreadyfound = CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0); +OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(REQ_CU_MAX) * 100); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), 
common->req_char_ptr); +toolong = CMP(SLJIT_LESS, TMP2, 0, STR_END, 0); +already_found = CMP(SLJIT_LESS, STR_PTR, 0, TMP1, 0); if (has_firstchar) OP2(SLJIT_ADD, TMP1, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); else OP1(SLJIT_MOV, TMP1, 0, STR_PTR, 0); -loop = LABEL(); -notfound = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0); - -OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(TMP1), 0); oc = req_char; if (caseless) { oc = TABLE_GET(req_char, common->fcc, req_char); -#if defined SUPPORT_UCP && !(defined COMPILE_PCRE8) - if (req_char > 127 && common->utf) +#if defined SUPPORT_UNICODE + if (req_char > 127 && (common->utf || common->ucp)) oc = UCD_OTHERCASE(req_char); #endif } -if (req_char == oc) - found = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, req_char); + +#ifdef JIT_HAS_FAST_REQUESTED_CHAR_SIMD +if (JIT_HAS_FAST_REQUESTED_CHAR_SIMD) + { + not_found = fast_requested_char_simd(common, req_char, oc); + } else +#endif { - bit = req_char ^ oc; - if (is_powerof2(bit)) - { - OP2(SLJIT_OR, TMP2, 0, TMP2, 0, SLJIT_IMM, bit); - found = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, req_char | bit); - } + loop = LABEL(); + add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0)); + + OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(TMP1), 0); + + if (req_char == oc) + found = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, req_char); else { - found = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, req_char); - foundoc = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, oc); + bit = req_char ^ oc; + if (is_powerof2(bit)) + { + OP2(SLJIT_OR, TMP2, 0, TMP2, 0, SLJIT_IMM, bit); + found = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, req_char | bit); + } + else + { + found = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, req_char); + found_oc = CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, oc); + } } + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(1)); + JUMPTO(SLJIT_JUMP, loop); + + JUMPHERE(found); + if (found_oc) + JUMPHERE(found_oc); } -OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(1)); -JUMPTO(SLJIT_JUMP, loop); -JUMPHERE(found); -if (foundoc)
- JUMPHERE(foundoc); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->req_char_ptr, TMP1, 0); -JUMPHERE(alreadyfound); + +JUMPHERE(already_found); JUMPHERE(toolong); -return notfound; +return not_found; } static void do_revertframes(compiler_common *common) @@ -4850,7 +6488,6 @@ struct sljit_jump *jump; struct sljit_label *mainloop; sljit_emit_fast_enter(compiler, RETURN_ADDR, 0); -OP1(SLJIT_MOV, TMP3, 0, STACK_TOP, 0); GET_LOCAL_BASE(TMP1, 0, 0); /* Drop frames until we reach STACK_TOP. */ @@ -4859,22 +6496,42 @@ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), -sizeof(sljit_sw)); jump = CMP(SLJIT_SIG_LESS_EQUAL, TMP2, 0, SLJIT_IMM, 0); OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); -OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), 0, SLJIT_MEM1(STACK_TOP), -2 * sizeof(sljit_sw)); -OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), sizeof(sljit_sw), SLJIT_MEM1(STACK_TOP), -3 * sizeof(sljit_sw)); -OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 3 * sizeof(sljit_sw)); +if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), 0, SLJIT_MEM1(STACK_TOP), -(2 * sizeof(sljit_sw))); + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), sizeof(sljit_sw), SLJIT_MEM1(STACK_TOP), -(3 * sizeof(sljit_sw))); + OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 3 * sizeof(sljit_sw)); + } +else + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), -(2 * sizeof(sljit_sw))); + OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(STACK_TOP), -(3 * sizeof(sljit_sw))); + OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 3 * sizeof(sljit_sw)); + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), 0, TMP1, 0); + GET_LOCAL_BASE(TMP1, 0, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), sizeof(sljit_sw), TMP3, 0); + } JUMPTO(SLJIT_JUMP, mainloop); JUMPHERE(jump); jump = CMP(SLJIT_NOT_ZERO /* SIG_LESS */, TMP2, 0, SLJIT_IMM, 0); /* End of reverting values. 
*/ -OP1(SLJIT_MOV, STACK_TOP, 0, TMP3, 0); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); JUMPHERE(jump); OP1(SLJIT_NEG, TMP2, 0, TMP2, 0); OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); -OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), 0, SLJIT_MEM1(STACK_TOP), -2 * sizeof(sljit_sw)); -OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 2 * sizeof(sljit_sw)); +if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), 0, SLJIT_MEM1(STACK_TOP), -(2 * sizeof(sljit_sw))); + OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 2 * sizeof(sljit_sw)); + } +else + { + OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(STACK_TOP), -(2 * sizeof(sljit_sw))); + OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 2 * sizeof(sljit_sw)); + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), 0, TMP3, 0); + } JUMPTO(SLJIT_JUMP, mainloop); } @@ -4883,29 +6540,59 @@ static void check_wordboundary(compiler_common *common) DEFINE_COMPILER; struct sljit_jump *skipread; jump_list *skipread_list = NULL; -#if !(defined COMPILE_PCRE8) || defined SUPPORT_UTF +#ifdef SUPPORT_UNICODE +struct sljit_label *valid_utf; +jump_list *invalid_utf1 = NULL; +#endif /* SUPPORT_UNICODE */ +jump_list *invalid_utf2 = NULL; +#if PCRE2_CODE_UNIT_WIDTH != 8 || defined SUPPORT_UNICODE struct sljit_jump *jump; -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH != 8 || SUPPORT_UNICODE */ SLJIT_COMPILE_ASSERT(ctype_word == 0x10, ctype_word_must_be_16); sljit_emit_fast_enter(compiler, SLJIT_MEM1(SLJIT_SP), LOCALS0); -/* Get type of the previous char, and put it to LOCALS1. */ +/* Get type of the previous char, and put it to TMP3. 
*/ OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); -OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); -OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, SLJIT_IMM, 0); -skipread = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP1, 0); -skip_char_back(common); -check_start_used_ptr(common); -read_char(common); +OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); +OP1(SLJIT_MOV, TMP3, 0, SLJIT_IMM, 0); +skipread = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP2, 0); + +#ifdef SUPPORT_UNICODE +if (common->invalid_utf) + { + peek_char_back(common, READ_CHAR_MAX, &invalid_utf1); + + if (common->mode != PCRE2_JIT_COMPLETE) + { + OP1(SLJIT_MOV, RETURN_ADDR, 0, TMP1, 0); + OP1(SLJIT_MOV, TMP2, 0, STR_PTR, 0); + move_back(common, NULL, TRUE); + check_start_used_ptr(common); + OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); + OP1(SLJIT_MOV, STR_PTR, 0, TMP2, 0); + } + } +else +#endif /* SUPPORT_UNICODE */ + { + if (common->mode == PCRE2_JIT_COMPLETE) + peek_char_back(common, READ_CHAR_MAX, NULL); + else + { + move_back(common, NULL, TRUE); + check_start_used_ptr(common); + read_char(common, 0, READ_CHAR_MAX, NULL, READ_CHAR_UPDATE_STR_PTR); + } + } /* Testing char type. 
*/ -#ifdef SUPPORT_UCP -if (common->use_ucp) +#ifdef SUPPORT_UNICODE +if (common->ucp) { OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 1); jump = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_UNDERSCORE); - add_jump(compiler, &common->getucd, JUMP(SLJIT_FAST_CALL)); + add_jump(compiler, &common->getucdtype, JUMP(SLJIT_FAST_CALL)); OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, ucp_Ll); OP2(SLJIT_SUB | SLJIT_SET_LESS_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, ucp_Lu - ucp_Ll); OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_LESS_EQUAL); @@ -4913,43 +6600,45 @@ if (common->use_ucp) OP2(SLJIT_SUB | SLJIT_SET_LESS_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, ucp_No - ucp_Nd); OP_FLAGS(SLJIT_OR, TMP2, 0, SLJIT_LESS_EQUAL); JUMPHERE(jump); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, TMP2, 0); + OP1(SLJIT_MOV, TMP3, 0, TMP2, 0); } else -#endif +#endif /* SUPPORT_UNICODE */ { -#ifndef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255); -#elif defined SUPPORT_UTF - /* Here LOCALS1 has already been zeroed. */ +#elif defined SUPPORT_UNICODE + /* Here TMP3 has already been zeroed. 
*/ jump = NULL; if (common->utf) jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255); -#endif /* COMPILE_PCRE8 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), common->ctypes); OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 4 /* ctype_word */); - OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 1); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, TMP1, 0); -#ifndef COMPILE_PCRE8 + OP2(SLJIT_AND, TMP3, 0, TMP1, 0, SLJIT_IMM, 1); +#if PCRE2_CODE_UNIT_WIDTH != 8 JUMPHERE(jump); -#elif defined SUPPORT_UTF +#elif defined SUPPORT_UNICODE if (jump != NULL) JUMPHERE(jump); -#endif /* COMPILE_PCRE8 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ } JUMPHERE(skipread); OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 0); check_str_end(common, &skipread_list); -peek_char(common, READ_CHAR_MAX); +peek_char(common, READ_CHAR_MAX, SLJIT_MEM1(SLJIT_SP), LOCALS1, &invalid_utf2); /* Testing char type. This is a code duplication. */ -#ifdef SUPPORT_UCP -if (common->use_ucp) +#ifdef SUPPORT_UNICODE + +valid_utf = LABEL(); + +if (common->ucp) { OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 1); jump = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_UNDERSCORE); - add_jump(compiler, &common->getucd, JUMP(SLJIT_FAST_CALL)); + add_jump(compiler, &common->getucdtype, JUMP(SLJIT_FAST_CALL)); OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, ucp_Ll); OP2(SLJIT_SUB | SLJIT_SET_LESS_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, ucp_Lu - ucp_Ll); OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_LESS_EQUAL); @@ -4959,13 +6648,13 @@ if (common->use_ucp) JUMPHERE(jump); } else -#endif +#endif /* SUPPORT_UNICODE */ { -#ifndef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 /* TMP2 may be destroyed by peek_char. 
*/ OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 0); jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255); -#elif defined SUPPORT_UTF +#elif defined SUPPORT_UNICODE OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 0); jump = NULL; if (common->utf) @@ -4974,24 +6663,44 @@ else OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP1), common->ctypes); OP2(SLJIT_LSHR, TMP2, 0, TMP2, 0, SLJIT_IMM, 4 /* ctype_word */); OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 1); -#ifndef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 JUMPHERE(jump); -#elif defined SUPPORT_UTF +#elif defined SUPPORT_UNICODE if (jump != NULL) JUMPHERE(jump); -#endif /* COMPILE_PCRE8 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ } set_jumps(skipread_list, LABEL()); -OP2(SLJIT_XOR | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_MEM1(SLJIT_SP), LOCALS1); -sljit_emit_fast_return(compiler, SLJIT_MEM1(SLJIT_SP), LOCALS0); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); +OP2(SLJIT_XOR | SLJIT_SET_Z, TMP2, 0, TMP2, 0, TMP3, 0); +OP_SRC(SLJIT_FAST_RETURN, TMP1, 0); + +#ifdef SUPPORT_UNICODE +if (common->invalid_utf) + { + set_jumps(invalid_utf1, LABEL()); + + peek_char(common, READ_CHAR_MAX, SLJIT_MEM1(SLJIT_SP), LOCALS1, NULL); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR, valid_utf); + + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, -1); + OP_SRC(SLJIT_FAST_RETURN, TMP1, 0); + + set_jumps(invalid_utf2, LABEL()); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); + OP1(SLJIT_MOV, TMP2, 0, TMP3, 0); + OP_SRC(SLJIT_FAST_RETURN, TMP1, 0); + } +#endif /* SUPPORT_UNICODE */ } -static BOOL check_class_ranges(compiler_common *common, const sljit_u8 *bits, BOOL nclass, BOOL invert, jump_list **backtracks) +static BOOL optimize_class_ranges(compiler_common *common, const sljit_u8 *bits, BOOL nclass, BOOL invert, jump_list **backtracks) { /* May destroy TMP1. 
*/ DEFINE_COMPILER; -int ranges[MAX_RANGE_SIZE]; +int ranges[MAX_CLASS_RANGE_SIZE]; sljit_u8 bit, cbit, all; int i, byte, length = 0; @@ -5009,7 +6718,7 @@ for (i = 0; i < 256; ) cbit = (bits[byte] >> (i & 0x7)) & 0x1; if (cbit != bit) { - if (length >= MAX_RANGE_SIZE) + if (length >= MAX_CLASS_RANGE_SIZE) return FALSE; ranges[length] = i; length++; @@ -5022,7 +6731,7 @@ for (i = 0; i < 256; ) if (((bit == 0) && nclass) || ((bit == 1) && !nclass)) { - if (length >= MAX_RANGE_SIZE) + if (length >= MAX_CLASS_RANGE_SIZE) return FALSE; ranges[length] = 256; length++; @@ -5139,6 +6848,115 @@ switch(length) } } +static BOOL optimize_class_chars(compiler_common *common, const sljit_u8 *bits, BOOL nclass, BOOL invert, jump_list **backtracks) +{ +/* May destroy TMP1. */ +DEFINE_COMPILER; +uint16_t char_list[MAX_CLASS_CHARS_SIZE]; +uint8_t byte; +sljit_s32 type; +int i, j, k, len, c; + +if (!sljit_has_cpu_feature(SLJIT_HAS_CMOV)) + return FALSE; + +len = 0; + +for (i = 0; i < 32; i++) + { + byte = bits[i]; + + if (nclass) + byte = ~byte; + + j = 0; + while (byte != 0) + { + if (byte & 0x1) + { + c = i * 8 + j; + + k = len; + + if ((c & 0x20) != 0) + { + for (k = 0; k < len; k++) + if (char_list[k] == c - 0x20) + { + char_list[k] |= 0x120; + break; + } + } + + if (k == len) + { + if (len >= MAX_CLASS_CHARS_SIZE) + return FALSE; + + char_list[len++] = (uint16_t) c; + } + } + + byte >>= 1; + j++; + } + } + +if (len == 0) return FALSE; /* Should never occur, but stops analyzers complaining. 
*/ + +i = 0; +j = 0; + +if (char_list[0] == 0) + { + i++; + OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0); + OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_ZERO); + } +else + OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, 0); + +while (i < len) + { + if ((char_list[i] & 0x100) != 0) + j++; + else + { + OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, char_list[i]); + CMOV(SLJIT_ZERO, TMP2, TMP1, 0); + } + i++; + } + +if (j != 0) + { + OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x20); + + for (i = 0; i < len; i++) + if ((char_list[i] & 0x100) != 0) + { + j--; + OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, char_list[i] & 0xff); + CMOV(SLJIT_ZERO, TMP2, TMP1, 0); + } + } + +if (invert) + nclass = !nclass; + +type = nclass ? SLJIT_NOT_EQUAL : SLJIT_EQUAL; +add_jump(compiler, backtracks, CMP(type, TMP2, 0, SLJIT_IMM, 0)); +return TRUE; +} + +static BOOL optimize_class(compiler_common *common, const sljit_u8 *bits, BOOL nclass, BOOL invert, jump_list **backtracks) +{ +/* May destroy TMP1. */ +if (optimize_class_ranges(common, bits, nclass, invert, backtracks)) + return TRUE; +return optimize_class_chars(common, bits, nclass, invert, backtracks); +} + static void check_anynewline(compiler_common *common) { /* Check whether TMP1 contains a newline character. TMP2 destroyed. 
*/ @@ -5150,20 +6968,20 @@ OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x0a); OP2(SLJIT_SUB | SLJIT_SET_LESS_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x0d - 0x0a); OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_LESS_EQUAL); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x85 - 0x0a); -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -#ifdef COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) { #endif OP_FLAGS(SLJIT_OR, TMP2, 0, SLJIT_EQUAL); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x1); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x2029 - 0x0a); -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 } #endif -#endif /* SUPPORT_UTF || COMPILE_PCRE16 || COMPILE_PCRE32 */ +#endif /* SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == [16|32] */ OP_FLAGS(SLJIT_OR | SLJIT_SET_Z, TMP2, 0, SLJIT_EQUAL); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); } static void check_hspace(compiler_common *common) @@ -5178,8 +6996,8 @@ OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_EQUAL); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x20); OP_FLAGS(SLJIT_OR, TMP2, 0, SLJIT_EQUAL); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xa0); -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -#ifdef COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) { #endif @@ -5196,13 +7014,13 @@ if (common->utf) OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x205f - 0x2000); OP_FLAGS(SLJIT_OR, TMP2, 0, SLJIT_EQUAL); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x3000 - 0x2000); -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 } #endif -#endif /* SUPPORT_UTF || COMPILE_PCRE16 || COMPILE_PCRE32 */ +#endif /* 
SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == [16|32] */ OP_FLAGS(SLJIT_OR | SLJIT_SET_Z, TMP2, 0, SLJIT_EQUAL); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); } static void check_vspace(compiler_common *common) @@ -5216,21 +7034,21 @@ OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x0a); OP2(SLJIT_SUB | SLJIT_SET_LESS_EQUAL, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x0d - 0x0a); OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_LESS_EQUAL); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x85 - 0x0a); -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 -#ifdef COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) { #endif OP_FLAGS(SLJIT_OR, TMP2, 0, SLJIT_EQUAL); OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, 0x1); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x2029 - 0x0a); -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 } #endif -#endif /* SUPPORT_UTF || COMPILE_PCRE16 || COMPILE_PCRE32 */ +#endif /* SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == [16|32] */ OP_FLAGS(SLJIT_OR | SLJIT_SET_Z, TMP2, 0, SLJIT_EQUAL); -sljit_emit_fast_return(compiler, RETURN_ADDR, 0); +OP_SRC(SLJIT_FAST_RETURN, RETURN_ADDR, 0); } static void do_casefulcmp(compiler_common *common) @@ -5241,7 +7059,7 @@ struct sljit_label *label; int char1_reg; int char2_reg; -if (sljit_get_register_index(TMP3) < 0) +if (HAS_VIRTUAL_REGISTERS) { char1_reg = STR_END; char2_reg = STACK_TOP; @@ -5310,7 +7128,7 @@ if (char1_reg == STR_END) OP1(SLJIT_MOV, char2_reg, 0, RETURN_ADDR, 0); } -sljit_emit_fast_return(compiler, TMP1, 0); +OP_SRC(SLJIT_FAST_RETURN, TMP1, 0); } static void do_caselesscmp(compiler_common *common) @@ -5323,7 +7141,7 @@ int char2_reg; int lcc_table; int opt_type = 0; -if (sljit_get_register_index(TMP3) < 0) +if (HAS_VIRTUAL_REGISTERS) { char2_reg = STACK_TOP; lcc_table = STACK_LIMIT; @@ -5375,16 
+7193,16 @@ else OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(1)); } -#ifndef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 jump = CMP(SLJIT_GREATER, char1_reg, 0, SLJIT_IMM, 255); #endif OP1(SLJIT_MOV_U8, char1_reg, 0, SLJIT_MEM2(lcc_table, char1_reg), 0); -#ifndef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 JUMPHERE(jump); jump = CMP(SLJIT_GREATER, char2_reg, 0, SLJIT_IMM, 255); #endif OP1(SLJIT_MOV_U8, char2_reg, 0, SLJIT_MEM2(lcc_table, char2_reg), 0); -#ifndef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH != 8 JUMPHERE(jump); #endif @@ -5408,47 +7226,16 @@ if (char2_reg == STACK_TOP) } OP1(SLJIT_MOV, char1_reg, 0, SLJIT_MEM1(SLJIT_SP), LOCALS1); -sljit_emit_fast_return(compiler, TMP1, 0); -} - -#if defined SUPPORT_UTF && defined SUPPORT_UCP - -static const pcre_uchar * SLJIT_FUNC do_utf_caselesscmp(pcre_uchar *src1, pcre_uchar *src2, pcre_uchar *end1, pcre_uchar *end2) -{ -/* This function would be ineffective to do in JIT level. */ -sljit_u32 c1, c2; -const ucd_record *ur; -const sljit_u32 *pp; - -while (src1 < end1) - { - if (src2 >= end2) - return (pcre_uchar*)1; - GETCHARINC(c1, src1); - GETCHARINC(c2, src2); - ur = GET_UCD(c2); - if (c1 != c2 && c1 != c2 + ur->other_case) - { - pp = PRIV(ucd_caseless_sets) + ur->caseset; - for (;;) - { - if (c1 < *pp) return NULL; - if (c1 == *pp++) break; - } - } - } -return src2; +OP_SRC(SLJIT_FAST_RETURN, TMP1, 0); } -#endif /* SUPPORT_UTF && SUPPORT_UCP */ - -static pcre_uchar *byte_sequence_compare(compiler_common *common, BOOL caseless, pcre_uchar *cc, +static PCRE2_SPTR byte_sequence_compare(compiler_common *common, BOOL caseless, PCRE2_SPTR cc, compare_context *context, jump_list **backtracks) { DEFINE_COMPILER; unsigned int othercasebit = 0; -pcre_uchar *othercasechar = NULL; -#ifdef SUPPORT_UTF +PCRE2_SPTR othercasechar = NULL; +#ifdef SUPPORT_UNICODE int utflength; #endif @@ -5457,25 +7244,25 @@ if (caseless && char_has_othercase(common, cc)) othercasebit = char_get_othercase_bit(common, cc); 
SLJIT_ASSERT(othercasebit); /* Extracting bit difference info. */ -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 othercasechar = cc + (othercasebit >> 8); othercasebit &= 0xff; -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#elif PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 /* Note that this code only handles characters in the BMP. If there ever are characters outside the BMP whose othercase differs in only one bit from itself (there currently are none), this code will need to be - revised for COMPILE_PCRE32. */ + revised for PCRE2_CODE_UNIT_WIDTH == 32. */ othercasechar = cc + (othercasebit >> 9); if ((othercasebit & 0x100) != 0) othercasebit = (othercasebit & 0xff) << 8; else othercasebit &= 0xff; -#endif /* COMPILE_PCRE[8|16|32] */ +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ } if (context->sourcereg == -1) { -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 #if defined SLJIT_UNALIGNED && SLJIT_UNALIGNED if (context->length >= 4) OP1(SLJIT_MOV_S32, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); @@ -5483,21 +7270,21 @@ if (context->sourcereg == -1) OP1(SLJIT_MOV_U16, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); else #endif - OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); -#elif defined COMPILE_PCRE16 + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); +#elif PCRE2_CODE_UNIT_WIDTH == 16 #if defined SLJIT_UNALIGNED && SLJIT_UNALIGNED if (context->length >= 4) OP1(SLJIT_MOV_S32, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); else #endif OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); -#elif defined COMPILE_PCRE32 +#elif PCRE2_CODE_UNIT_WIDTH == 32 OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), -context->length); -#endif /* COMPILE_PCRE[8|16|32] */ +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16|32] */ context->sourcereg = TMP2; } -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE utflength = 1; if (common->utf && HAS_EXTRALEN(*cc)) utflength += GET_EXTRALEN(*cc); @@ -5507,7 
+7294,7 @@ do #endif context->length -= IN_UCHARS(1); -#if (defined SLJIT_UNALIGNED && SLJIT_UNALIGNED) && (defined COMPILE_PCRE8 || defined COMPILE_PCRE16) +#if (defined SLJIT_UNALIGNED && SLJIT_UNALIGNED) && (PCRE2_CODE_UNIT_WIDTH == 8 || PCRE2_CODE_UNIT_WIDTH == 16) /* Unaligned read is supported. */ if (othercasebit != 0 && othercasechar == cc) @@ -5522,7 +7309,7 @@ do } context->ucharptr++; -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 if (context->ucharptr >= 4 || context->length == 0 || (context->ucharptr == 2 && context->length == 1)) #else if (context->ucharptr >= 2 || context->length == 0) @@ -5532,27 +7319,27 @@ do OP1(SLJIT_MOV_S32, context->sourcereg, 0, SLJIT_MEM1(STR_PTR), -context->length); else if (context->length >= 2) OP1(SLJIT_MOV_U16, context->sourcereg, 0, SLJIT_MEM1(STR_PTR), -context->length); -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 else if (context->length >= 1) OP1(SLJIT_MOV_U8, context->sourcereg, 0, SLJIT_MEM1(STR_PTR), -context->length); -#endif /* COMPILE_PCRE8 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ context->sourcereg = context->sourcereg == TMP1 ? 
TMP2 : TMP1; switch(context->ucharptr) { - case 4 / sizeof(pcre_uchar): + case 4 / sizeof(PCRE2_UCHAR): if (context->oc.asint != 0) OP2(SLJIT_OR, context->sourcereg, 0, context->sourcereg, 0, SLJIT_IMM, context->oc.asint); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, context->sourcereg, 0, SLJIT_IMM, context->c.asint | context->oc.asint)); break; - case 2 / sizeof(pcre_uchar): + case 2 / sizeof(PCRE2_UCHAR): if (context->oc.asushort != 0) OP2(SLJIT_OR, context->sourcereg, 0, context->sourcereg, 0, SLJIT_IMM, context->oc.asushort); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, context->sourcereg, 0, SLJIT_IMM, context->c.asushort | context->oc.asushort)); break; -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 case 1: if (context->oc.asbyte != 0) OP2(SLJIT_OR, context->sourcereg, 0, context->sourcereg, 0, SLJIT_IMM, context->oc.asbyte); @@ -5586,7 +7373,7 @@ do #endif cc++; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE utflength--; } while (utflength > 0); @@ -5595,7 +7382,7 @@ while (utflength > 0); return cc; } -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 #define SET_TYPE_OFFSET(value) \ if ((value) != typeoffset) \ @@ -5617,37 +7404,38 @@ return cc; } \ charoffset = (value); -static pcre_uchar *compile_char1_matchingpath(compiler_common *common, pcre_uchar type, pcre_uchar *cc, jump_list **backtracks, BOOL check_str_ptr); +static PCRE2_SPTR compile_char1_matchingpath(compiler_common *common, PCRE2_UCHAR type, PCRE2_SPTR cc, jump_list **backtracks, BOOL check_str_ptr); -static void compile_xclass_matchingpath(compiler_common *common, pcre_uchar *cc, jump_list **backtracks) +static void compile_xclass_matchingpath(compiler_common *common, PCRE2_SPTR cc, jump_list **backtracks) { DEFINE_COMPILER; jump_list *found = NULL; jump_list **list = (cc[0] & XCL_NOT) == 0 ? 
&found : backtracks; sljit_uw c, charoffset, max = 256, min = READ_CHAR_MAX; struct sljit_jump *jump = NULL; -pcre_uchar *ccbegin; +PCRE2_SPTR ccbegin; int compares, invertcmp, numberofcmps; -#if defined SUPPORT_UTF && (defined COMPILE_PCRE8 || defined COMPILE_PCRE16) +#if defined SUPPORT_UNICODE && (PCRE2_CODE_UNIT_WIDTH == 8 || PCRE2_CODE_UNIT_WIDTH == 16) BOOL utf = common->utf; -#endif +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == [8|16] */ -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE BOOL needstype = FALSE, needsscript = FALSE, needschar = FALSE; BOOL charsaved = FALSE; int typereg = TMP1; const sljit_u32 *other_cases; sljit_uw typeoffset; -#endif +#endif /* SUPPORT_UNICODE */ /* Scanning the necessary info. */ cc++; ccbegin = cc; compares = 0; + if (cc[-1] & XCL_MAP) { min = 0; - cc += 32 / sizeof(pcre_uchar); + cc += 32 / sizeof(PCRE2_UCHAR); } while (*cc != XCL_END) @@ -5659,9 +7447,9 @@ while (*cc != XCL_END) GETCHARINCTEST(c, cc); if (c > max) max = c; if (c < min) min = c; -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE needschar = TRUE; -#endif +#endif /* SUPPORT_UNICODE */ } else if (*cc == XCL_RANGE) { @@ -5670,11 +7458,11 @@ while (*cc != XCL_END) if (c < min) min = c; GETCHARINCTEST(c, cc); if (c > max) max = c; -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE needschar = TRUE; -#endif +#endif /* SUPPORT_UNICODE */ } -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE else { SLJIT_ASSERT(*cc == XCL_PROP || *cc == XCL_NOTPROP); @@ -5740,20 +7528,29 @@ while (*cc != XCL_END) } cc += 2; } -#endif +#endif /* SUPPORT_UNICODE */ } SLJIT_ASSERT(compares > 0); /* We are not necessary in utf mode even in 8 bit mode. */ cc = ccbegin; -read_char_range(common, min, max, (cc[-1] & XCL_NOT) != 0); +if ((cc[-1] & XCL_NOT) != 0) + read_char(common, min, max, backtracks, READ_CHAR_UPDATE_STR_PTR); +else + { +#ifdef SUPPORT_UNICODE + read_char(common, min, max, (needstype || needsscript) ? 
backtracks : NULL, 0); +#else /* !SUPPORT_UNICODE */ + read_char(common, min, max, NULL, 0); +#endif /* SUPPORT_UNICODE */ + } if ((cc[-1] & XCL_HASPROP) == 0) { if ((cc[-1] & XCL_MAP) != 0) { jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255); - if (!check_class_ranges(common, (const sljit_u8 *)cc, (((const sljit_u8 *)cc)[31] & 0x80) != 0, TRUE, &found)) + if (!optimize_class(common, (const sljit_u8 *)cc, (((const sljit_u8 *)cc)[31] & 0x80) != 0, TRUE, &found)) { OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7); OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 3); @@ -5766,7 +7563,7 @@ if ((cc[-1] & XCL_HASPROP) == 0) add_jump(compiler, backtracks, JUMP(SLJIT_JUMP)); JUMPHERE(jump); - cc += 32 / sizeof(pcre_uchar); + cc += 32 / sizeof(PCRE2_UCHAR); } else { @@ -5777,15 +7574,15 @@ if ((cc[-1] & XCL_HASPROP) == 0) else if ((cc[-1] & XCL_MAP) != 0) { OP1(SLJIT_MOV, RETURN_ADDR, 0, TMP1, 0); -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE charsaved = TRUE; -#endif - if (!check_class_ranges(common, (const sljit_u8 *)cc, FALSE, TRUE, list)) +#endif /* SUPPORT_UNICODE */ + if (!optimize_class(common, (const sljit_u8 *)cc, FALSE, TRUE, list)) { -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 jump = NULL; if (common->utf) -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255); OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7); @@ -5795,33 +7592,34 @@ else if ((cc[-1] & XCL_MAP) != 0) OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0); add_jump(compiler, list, JUMP(SLJIT_NOT_ZERO)); -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utf) -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ JUMPHERE(jump); } OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); - cc += 32 / sizeof(pcre_uchar); + cc += 32 / sizeof(PCRE2_UCHAR); } -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE if (needstype || needsscript) { if (needschar && !charsaved) OP1(SLJIT_MOV, RETURN_ADDR, 0, TMP1, 0); -#ifdef COMPILE_PCRE32 +#if PCRE2_CODE_UNIT_WIDTH 
== 32 if (!common->utf) { - jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0x10ffff + 1); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, INVALID_UTF_CHAR); + jump = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, MAX_UTF_CODE_POINT + 1); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, UNASSIGNED_UTF_CHAR); JUMPHERE(jump); } -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH == 32 */ OP2(SLJIT_LSHR, TMP2, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_SHIFT); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1)); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 1); + OP1(SLJIT_MOV_U16, TMP2, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_stage1)); OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, UCD_BLOCK_MASK); OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, UCD_BLOCK_SHIFT); OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); @@ -5831,8 +7629,11 @@ if (needstype || needsscript) /* Before anything else, we deal with scripts. */ if (needsscript) { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, script)); - OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM2(TMP1, TMP2), 3); + OP2(SLJIT_SHL, TMP1, 0, TMP2, 0, SLJIT_IMM, 3); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 2); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + + OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, script)); ccbegin = cc; @@ -5864,39 +7665,64 @@ if (needstype || needsscript) } cc += 2; } - } + } + + cc = ccbegin; + + if (needstype) + { + /* TMP2 has already been shifted by 2 */ + if (!needschar) + { + OP2(SLJIT_ADD, TMP1, 0, TMP2, 0, TMP2, 0); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + + OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, chartype)); + } + else + { + OP2(SLJIT_ADD, TMP1, 0, TMP2, 0, TMP2, 0); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); - cc = ccbegin; + OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); + OP1(SLJIT_MOV_U8, RETURN_ADDR, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, 
chartype)); + typereg = RETURN_ADDR; + } + } + else if (needschar) + OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); } - - if (needschar) + else if (needstype) { - OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); - } + OP2(SLJIT_SHL, TMP1, 0, TMP2, 0, SLJIT_IMM, 3); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 2); - if (needstype) - { if (!needschar) { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, chartype)); - OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM2(TMP1, TMP2), 3); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP2, 0); + + OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, chartype)); } else { - OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 3); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); + + OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); OP1(SLJIT_MOV_U8, RETURN_ADDR, 0, SLJIT_MEM1(TMP2), (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, chartype)); typereg = RETURN_ADDR; } } + else if (needschar) + OP1(SLJIT_MOV, TMP1, 0, RETURN_ADDR, 0); } -#endif +#endif /* SUPPORT_UNICODE */ /* Generating code. */ charoffset = 0; numberofcmps = 0; -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE typeoffset = 0; -#endif +#endif /* SUPPORT_UNICODE */ while (*cc != XCL_END) { @@ -5954,7 +7780,7 @@ while (*cc != XCL_END) numberofcmps = 0; } } -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE else { SLJIT_ASSERT(*cc == XCL_PROP || *cc == XCL_NOTPROP); @@ -6163,7 +7989,7 @@ while (*cc != XCL_END) } cc += 2; } -#endif +#endif /* SUPPORT_UNICODE */ if (jump != NULL) add_jump(compiler, compares > 0 ? 
list : backtracks, jump); @@ -6178,32 +8004,49 @@ if (found != NULL) #endif -static pcre_uchar *compile_simple_assertion_matchingpath(compiler_common *common, pcre_uchar type, pcre_uchar *cc, jump_list **backtracks) +static PCRE2_SPTR compile_simple_assertion_matchingpath(compiler_common *common, PCRE2_UCHAR type, PCRE2_SPTR cc, jump_list **backtracks) { DEFINE_COMPILER; int length; struct sljit_jump *jump[4]; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE struct sljit_label *label; -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ switch(type) { case OP_SOD: - OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); + } + else + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, TMP1, 0)); return cc; case OP_SOM: - OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); + } + else + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str)); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, TMP1, 0)); return cc; case OP_NOT_WORD_BOUNDARY: case OP_WORD_BOUNDARY: add_jump(compiler, &common->wordboundary, JUMP(SLJIT_FAST_CALL)); +#ifdef SUPPORT_UNICODE + if (common->invalid_utf) + { + add_jump(compiler, backtracks, CMP((type == OP_NOT_WORD_BOUNDARY) ? SLJIT_NOT_EQUAL : SLJIT_SIG_LESS_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + return cc; + } +#endif /* SUPPORT_UNICODE */ sljit_set_current_flags(compiler, SLJIT_SET_Z); add_jump(compiler, backtracks, JUMP(type == OP_NOT_WORD_BOUNDARY ? 
SLJIT_NOT_ZERO : SLJIT_ZERO)); return cc; @@ -6215,7 +8058,7 @@ switch(type) { OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - if (common->mode == JIT_COMPILE) + if (common->mode == PCRE2_JIT_COMPLETE) add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, STR_END, 0)); else { @@ -6262,30 +8105,37 @@ switch(type) } else { - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, STR_PTR, 0); - read_char_range(common, common->nlmin, common->nlmax, TRUE); + OP1(SLJIT_MOV, TMP3, 0, STR_PTR, 0); + read_char(common, common->nlmin, common->nlmax, backtracks, READ_CHAR_UPDATE_STR_PTR); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, STR_END, 0)); add_jump(compiler, &common->anynewline, JUMP(SLJIT_FAST_CALL)); sljit_set_current_flags(compiler, SLJIT_SET_Z); add_jump(compiler, backtracks, JUMP(SLJIT_ZERO)); - OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), LOCALS1); + OP1(SLJIT_MOV, STR_PTR, 0, TMP3, 0); } JUMPHERE(jump[2]); JUMPHERE(jump[3]); } JUMPHERE(jump[0]); - check_partial(common, FALSE); + if (common->mode != PCRE2_JIT_COMPLETE) + check_partial(common, TRUE); return cc; case OP_EOD: add_jump(compiler, backtracks, CMP(SLJIT_LESS, STR_PTR, 0, STR_END, 0)); - check_partial(common, FALSE); + if (common->mode != PCRE2_JIT_COMPLETE) + check_partial(common, TRUE); return cc; case OP_DOLL: - OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, noteol)); - add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTEOL); + } + else + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTEOL); + add_jump(compiler, backtracks, 
JUMP(SLJIT_NOT_ZERO32)); if (!common->endonly) compile_simple_assertion_matchingpath(common, OP_EODN, cc, backtracks); @@ -6298,9 +8148,14 @@ switch(type) case OP_DOLLM: jump[1] = CMP(SLJIT_LESS, STR_PTR, 0, STR_END, 0); - OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, noteol)); - add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTEOL); + } + else + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTEOL); + add_jump(compiler, backtracks, JUMP(SLJIT_NOT_ZERO32)); check_partial(common, FALSE); jump[0] = JUMP(SLJIT_JUMP); JUMPHERE(jump[1]); @@ -6309,7 +8164,7 @@ switch(type) { OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - if (common->mode == JIT_COMPILE) + if (common->mode == PCRE2_JIT_COMPLETE) add_jump(compiler, backtracks, CMP(SLJIT_GREATER, TMP2, 0, STR_END, 0)); else { @@ -6327,34 +8182,56 @@ switch(type) } else { - peek_char(common, common->nlmax); + peek_char(common, common->nlmax, TMP3, 0, NULL); check_newlinechar(common, common->nltype, backtracks, FALSE); } JUMPHERE(jump[0]); return cc; case OP_CIRC: - OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, begin)); - add_jump(compiler, backtracks, CMP(SLJIT_GREATER, STR_PTR, 0, TMP1, 0)); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, notbol)); - add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, begin)); 
+ add_jump(compiler, backtracks, CMP(SLJIT_GREATER, STR_PTR, 0, TMP1, 0)); + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTBOL); + add_jump(compiler, backtracks, JUMP(SLJIT_NOT_ZERO32)); + } + else + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); + add_jump(compiler, backtracks, CMP(SLJIT_GREATER, STR_PTR, 0, TMP1, 0)); + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTBOL); + add_jump(compiler, backtracks, JUMP(SLJIT_NOT_ZERO32)); + } return cc; case OP_CIRCM: - OP1(SLJIT_MOV, TMP2, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, begin)); - jump[1] = CMP(SLJIT_GREATER, STR_PTR, 0, TMP1, 0); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(jit_arguments, notbol)); - add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + /* TMP2 might be used by peek_char_back. 
*/ + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); + jump[1] = CMP(SLJIT_GREATER, STR_PTR, 0, TMP2, 0); + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTBOL); + } + else + { + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); + jump[1] = CMP(SLJIT_GREATER, STR_PTR, 0, TMP2, 0); + OP2(SLJIT_AND32 | SLJIT_SET_Z, SLJIT_UNUSED, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, options), SLJIT_IMM, PCRE2_NOTBOL); + } + add_jump(compiler, backtracks, JUMP(SLJIT_NOT_ZERO32)); jump[0] = JUMP(SLJIT_JUMP); JUMPHERE(jump[1]); - add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + if (!common->alt_circumflex) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + if (common->nltype == NLTYPE_FIXED && common->newline > 255) { - OP2(SLJIT_SUB, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); - add_jump(compiler, backtracks, CMP(SLJIT_LESS, TMP2, 0, TMP1, 0)); + OP2(SLJIT_SUB, TMP1, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(2)); + add_jump(compiler, backtracks, CMP(SLJIT_LESS, TMP1, 0, TMP2, 0)); OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-2)); OP1(MOV_UCHAR, TMP2, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-1)); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, (common->newline >> 8) & 0xff)); @@ -6362,8 +8239,7 @@ switch(type) } else { - skip_char_back(common); - read_char_range(common, common->nlmin, common->nlmax, TRUE); + peek_char_back(common, common->nlmax, backtracks); check_newlinechar(common, common->nltype, backtracks, FALSE); } JUMPHERE(jump[0]); @@ -6373,24 +8249,28 @@ switch(type) length = GET(cc, 0); if (length == 0) return cc + LINK_SIZE; - OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); -#ifdef SUPPORT_UTF + if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, 
ARGUMENTS, 0); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); + } + else + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, begin)); +#ifdef SUPPORT_UNICODE if (common->utf) { - OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); - OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, length); + OP1(SLJIT_MOV, TMP3, 0, SLJIT_IMM, length); label = LABEL(); - add_jump(compiler, backtracks, CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP3, 0)); - skip_char_back(common); - OP2(SLJIT_SUB | SLJIT_SET_Z, TMP2, 0, TMP2, 0, SLJIT_IMM, 1); + add_jump(compiler, backtracks, CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, TMP2, 0)); + move_back(common, backtracks, FALSE); + OP2(SLJIT_SUB | SLJIT_SET_Z, TMP3, 0, TMP3, 0, SLJIT_IMM, 1); JUMPTO(SLJIT_NOT_ZERO, label); } else #endif { - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, begin)); OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(length)); - add_jump(compiler, backtracks, CMP(SLJIT_LESS, STR_PTR, 0, TMP1, 0)); + add_jump(compiler, backtracks, CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0)); } check_start_used_ptr(common); return cc + LINK_SIZE; @@ -6399,7 +8279,216 @@ SLJIT_UNREACHABLE(); return cc; } -static pcre_uchar *compile_char1_matchingpath(compiler_common *common, pcre_uchar type, pcre_uchar *cc, jump_list **backtracks, BOOL check_str_ptr) +#ifdef SUPPORT_UNICODE + +#if PCRE2_CODE_UNIT_WIDTH != 32 + +static PCRE2_SPTR SLJIT_FUNC do_extuni_utf(jit_arguments *args, PCRE2_SPTR cc) +{ +PCRE2_SPTR start_subject = args->begin; +PCRE2_SPTR end_subject = args->end; +int lgb, rgb, ricount; +PCRE2_SPTR prevcc, endcc, bptr; +BOOL first = TRUE; +uint32_t c; + +prevcc = cc; +endcc = NULL; +do + { + GETCHARINC(c, cc); + rgb = UCD_GRAPHBREAK(c); + + if (first) + { + lgb = rgb; + endcc = cc; + first = FALSE; + continue; + } + + if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) + break; + + /* Not breaking between Regional Indicators is allowed only if there 
+ are an even number of preceding RIs. */ + + if (lgb == ucp_gbRegionalIndicator && rgb == ucp_gbRegionalIndicator) + { + ricount = 0; + bptr = prevcc; + + /* bptr is pointing to the left-hand character */ + while (bptr > start_subject) + { + bptr--; + BACKCHAR(bptr); + GETCHAR(c, bptr); + + if (UCD_GRAPHBREAK(c) != ucp_gbRegionalIndicator) + break; + + ricount++; + } + + if ((ricount & 1) != 0) break; /* Grapheme break required */ + } + + /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this + allows any number of them before a following Extended_Pictographic. */ + + if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) || + lgb != ucp_gbExtended_Pictographic) + lgb = rgb; + + prevcc = endcc; + endcc = cc; + } +while (cc < end_subject); + +return endcc; +} + +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ + +static PCRE2_SPTR SLJIT_FUNC do_extuni_utf_invalid(jit_arguments *args, PCRE2_SPTR cc) +{ +PCRE2_SPTR start_subject = args->begin; +PCRE2_SPTR end_subject = args->end; +int lgb, rgb, ricount; +PCRE2_SPTR prevcc, endcc, bptr; +BOOL first = TRUE; +uint32_t c; + +prevcc = cc; +endcc = NULL; +do + { + GETCHARINC_INVALID(c, cc, end_subject, break); + rgb = UCD_GRAPHBREAK(c); + + if (first) + { + lgb = rgb; + endcc = cc; + first = FALSE; + continue; + } + + if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) + break; + + /* Not breaking between Regional Indicators is allowed only if there + are an even number of preceding RIs. 
*/ + + if (lgb == ucp_gbRegionalIndicator && rgb == ucp_gbRegionalIndicator) + { + ricount = 0; + bptr = prevcc; + + /* bptr is pointing to the left-hand character */ + while (bptr > start_subject) + { + GETCHARBACK_INVALID(c, bptr, start_subject, break); + + if (UCD_GRAPHBREAK(c) != ucp_gbRegionalIndicator) + break; + + ricount++; + } + + if ((ricount & 1) != 0) + break; /* Grapheme break required */ + } + + /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this + allows any number of them before a following Extended_Pictographic. */ + + if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) || + lgb != ucp_gbExtended_Pictographic) + lgb = rgb; + + prevcc = endcc; + endcc = cc; + } +while (cc < end_subject); + +return endcc; +} + +static PCRE2_SPTR SLJIT_FUNC do_extuni_no_utf(jit_arguments *args, PCRE2_SPTR cc) +{ +PCRE2_SPTR start_subject = args->begin; +PCRE2_SPTR end_subject = args->end; +int lgb, rgb, ricount; +PCRE2_SPTR bptr; +uint32_t c; + +/* Patch by PH */ +/* GETCHARINC(c, cc); */ +c = *cc++; + +#if PCRE2_CODE_UNIT_WIDTH == 32 +if (c >= 0x110000) + return NULL; +#endif /* PCRE2_CODE_UNIT_WIDTH == 32 */ +lgb = UCD_GRAPHBREAK(c); + +while (cc < end_subject) + { + c = *cc; +#if PCRE2_CODE_UNIT_WIDTH == 32 + if (c >= 0x110000) + break; +#endif /* PCRE2_CODE_UNIT_WIDTH == 32 */ + rgb = UCD_GRAPHBREAK(c); + + if ((PRIV(ucp_gbtable)[lgb] & (1 << rgb)) == 0) + break; + + /* Not breaking between Regional Indicators is allowed only if there + are an even number of preceding RIs. 
*/ + + if (lgb == ucp_gbRegionalIndicator && rgb == ucp_gbRegionalIndicator) + { + ricount = 0; + bptr = cc - 1; + + /* bptr is pointing to the left-hand character */ + while (bptr > start_subject) + { + bptr--; + c = *bptr; +#if PCRE2_CODE_UNIT_WIDTH == 32 + if (c >= 0x110000) + break; +#endif /* PCRE2_CODE_UNIT_WIDTH == 32 */ + + if (UCD_GRAPHBREAK(c) != ucp_gbRegionalIndicator) break; + + ricount++; + } + + if ((ricount & 1) != 0) + break; /* Grapheme break required */ + } + + /* If Extend or ZWJ follows Extended_Pictographic, do not update lgb; this + allows any number of them before a following Extended_Pictographic. */ + + if ((rgb != ucp_gbExtend && rgb != ucp_gbZWJ) || + lgb != ucp_gbExtended_Pictographic) + lgb = rgb; + + cc++; + } + +return cc; +} + +#endif /* SUPPORT_UNICODE */ + +static PCRE2_SPTR compile_char1_matchingpath(compiler_common *common, PCRE2_UCHAR type, PCRE2_SPTR cc, jump_list **backtracks, BOOL check_str_ptr) { DEFINE_COMPILER; int length; @@ -6407,12 +8496,9 @@ unsigned int c, oc, bit; compare_context context; struct sljit_jump *jump[3]; jump_list *end_list; -#ifdef SUPPORT_UTF -struct sljit_label *label; -#ifdef SUPPORT_UCP -pcre_uchar propdata[5]; -#endif -#endif /* SUPPORT_UTF */ +#ifdef SUPPORT_UNICODE +PCRE2_UCHAR propdata[5]; +#endif /* SUPPORT_UNICODE */ switch(type) { @@ -6421,12 +8507,12 @@ switch(type) /* Digits are usually 0-9, so it is worth to optimize them. 
*/ if (check_str_ptr) detect_partial_match(common, backtracks); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 - if (common->utf && is_char7_bitset((const sljit_u8 *)common->ctypes - cbit_length + cbit_digit, FALSE)) - read_char7_type(common, type == OP_NOT_DIGIT); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 + if (common->utf && is_char7_bitset((const sljit_u8*)common->ctypes - cbit_length + cbit_digit, FALSE)) + read_char7_type(common, backtracks, type == OP_NOT_DIGIT); else #endif - read_char8_type(common, type == OP_NOT_DIGIT); + read_char8_type(common, backtracks, type == OP_NOT_DIGIT); /* Flip the starting bit in the negative case. */ OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, ctype_digit); add_jump(compiler, backtracks, JUMP(type == OP_DIGIT ? SLJIT_ZERO : SLJIT_NOT_ZERO)); @@ -6436,12 +8522,12 @@ switch(type) case OP_WHITESPACE: if (check_str_ptr) detect_partial_match(common, backtracks); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 - if (common->utf && is_char7_bitset((const sljit_u8 *)common->ctypes - cbit_length + cbit_space, FALSE)) - read_char7_type(common, type == OP_NOT_WHITESPACE); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 + if (common->utf && is_char7_bitset((const sljit_u8*)common->ctypes - cbit_length + cbit_space, FALSE)) + read_char7_type(common, backtracks, type == OP_NOT_WHITESPACE); else #endif - read_char8_type(common, type == OP_NOT_WHITESPACE); + read_char8_type(common, backtracks, type == OP_NOT_WHITESPACE); OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, ctype_space); add_jump(compiler, backtracks, JUMP(type == OP_WHITESPACE ? 
SLJIT_ZERO : SLJIT_NOT_ZERO)); return cc; @@ -6450,12 +8536,12 @@ switch(type) case OP_WORDCHAR: if (check_str_ptr) detect_partial_match(common, backtracks); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 - if (common->utf && is_char7_bitset((const sljit_u8 *)common->ctypes - cbit_length + cbit_word, FALSE)) - read_char7_type(common, type == OP_NOT_WORDCHAR); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 + if (common->utf && is_char7_bitset((const sljit_u8*)common->ctypes - cbit_length + cbit_word, FALSE)) + read_char7_type(common, backtracks, type == OP_NOT_WORDCHAR); else #endif - read_char8_type(common, type == OP_NOT_WORDCHAR); + read_char8_type(common, backtracks, type == OP_NOT_WORDCHAR); OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, ctype_word); add_jump(compiler, backtracks, JUMP(type == OP_WORDCHAR ? SLJIT_ZERO : SLJIT_NOT_ZERO)); return cc; @@ -6463,12 +8549,12 @@ switch(type) case OP_ANY: if (check_str_ptr) detect_partial_match(common, backtracks); - read_char_range(common, common->nlmin, common->nlmax, TRUE); + read_char(common, common->nlmin, common->nlmax, backtracks, READ_CHAR_UPDATE_STR_PTR); if (common->nltype == NLTYPE_FIXED && common->newline > 255) { jump[0] = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, (common->newline >> 8) & 0xff); end_list = NULL; - if (common->mode != JIT_PARTIAL_HARD_COMPILE) + if (common->mode != PCRE2_JIT_PARTIAL_HARD) add_jump(compiler, &end_list, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); else check_str_end(common, &end_list); @@ -6485,29 +8571,35 @@ switch(type) case OP_ALLANY: if (check_str_ptr) detect_partial_match(common, backtracks); -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf) { + if (common->invalid_utf) + { + read_char(common, 0, READ_CHAR_MAX, backtracks, READ_CHAR_UPDATE_STR_PTR); + return cc; + } + +#if PCRE2_CODE_UNIT_WIDTH == 8 || PCRE2_CODE_UNIT_WIDTH == 16 OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, 
SLJIT_IMM, IN_UCHARS(1)); -#if defined COMPILE_PCRE8 || defined COMPILE_PCRE16 -#if defined COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 jump[0] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xc0); OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(TMP1), (sljit_sw)PRIV(utf8_table4) - 0xc0); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); -#elif defined COMPILE_PCRE16 +#elif PCRE2_CODE_UNIT_WIDTH == 16 jump[0] = CMP(SLJIT_LESS, TMP1, 0, SLJIT_IMM, 0xd800); OP2(SLJIT_AND, TMP1, 0, TMP1, 0, SLJIT_IMM, 0xfc00); OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xd800); OP_FLAGS(SLJIT_MOV, TMP1, 0, SLJIT_EQUAL); OP2(SLJIT_SHL, TMP1, 0, TMP1, 0, SLJIT_IMM, 1); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); -#endif +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ JUMPHERE(jump[0]); -#endif /* COMPILE_PCRE[8|16] */ return cc; +#endif /* PCRE2_CODE_UNIT_WIDTH == [8|16] */ } -#endif +#endif /* SUPPORT_UNICODE */ OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); return cc; @@ -6517,8 +8609,7 @@ switch(type) OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); return cc; -#ifdef SUPPORT_UTF -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE case OP_NOTPROP: case OP_PROP: propdata[0] = XCL_HASPROP; @@ -6530,17 +8621,16 @@ switch(type) detect_partial_match(common, backtracks); compile_xclass_matchingpath(common, propdata, backtracks); return cc + 2; -#endif #endif case OP_ANYNL: if (check_str_ptr) detect_partial_match(common, backtracks); - read_char_range(common, common->bsr_nlmin, common->bsr_nlmax, FALSE); + read_char(common, common->bsr_nlmin, common->bsr_nlmax, NULL, 0); jump[0] = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, CHAR_CR); /* We don't need to handle soft partial matching case. 
*/ end_list = NULL; - if (common->mode != JIT_PARTIAL_HARD_COMPILE) + if (common->mode != PCRE2_JIT_PARTIAL_HARD) add_jump(compiler, &end_list, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); else check_str_end(common, &end_list); @@ -6559,7 +8649,12 @@ switch(type) case OP_HSPACE: if (check_str_ptr) detect_partial_match(common, backtracks); - read_char_range(common, 0x9, 0x3000, type == OP_NOT_HSPACE); + + if (type == OP_NOT_HSPACE) + read_char(common, 0x9, 0x3000, backtracks, READ_CHAR_UPDATE_STR_PTR); + else + read_char(common, 0x9, 0x3000, NULL, 0); + add_jump(compiler, &common->hspace, JUMP(SLJIT_FAST_CALL)); sljit_set_current_flags(compiler, SLJIT_SET_Z); add_jump(compiler, backtracks, JUMP(type == OP_NOT_HSPACE ? SLJIT_NOT_ZERO : SLJIT_ZERO)); @@ -6569,45 +8664,42 @@ switch(type) case OP_VSPACE: if (check_str_ptr) detect_partial_match(common, backtracks); - read_char_range(common, 0xa, 0x2029, type == OP_NOT_VSPACE); + + if (type == OP_NOT_VSPACE) + read_char(common, 0xa, 0x2029, backtracks, READ_CHAR_UPDATE_STR_PTR); + else + read_char(common, 0xa, 0x2029, NULL, 0); + add_jump(compiler, &common->vspace, JUMP(SLJIT_FAST_CALL)); sljit_set_current_flags(compiler, SLJIT_SET_Z); add_jump(compiler, backtracks, JUMP(type == OP_NOT_VSPACE ? SLJIT_NOT_ZERO : SLJIT_ZERO)); return cc; -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE case OP_EXTUNI: if (check_str_ptr) detect_partial_match(common, backtracks); - read_char(common); - add_jump(compiler, &common->getucd, JUMP(SLJIT_FAST_CALL)); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, gbprop)); - /* Optimize register allocation: use a real register. 
*/ - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STACK_TOP, 0); - OP1(SLJIT_MOV_U8, STACK_TOP, 0, SLJIT_MEM2(TMP1, TMP2), 3); - label = LABEL(); - jump[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); - OP1(SLJIT_MOV, TMP3, 0, STR_PTR, 0); - read_char(common); - add_jump(compiler, &common->getucd, JUMP(SLJIT_FAST_CALL)); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records) + SLJIT_OFFSETOF(ucd_record, gbprop)); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM2(TMP1, TMP2), 3); + SLJIT_ASSERT(TMP1 == SLJIT_R0 && STR_PTR == SLJIT_R1); + OP1(SLJIT_MOV, SLJIT_R0, 0, ARGUMENTS, 0); - OP2(SLJIT_SHL, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, 2); - OP1(SLJIT_MOV_U32, TMP1, 0, SLJIT_MEM1(STACK_TOP), (sljit_sw)PRIV(ucp_gbtable)); - OP1(SLJIT_MOV, STACK_TOP, 0, TMP2, 0); - OP2(SLJIT_SHL, TMP2, 0, SLJIT_IMM, 1, TMP2, 0); - OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0); - JUMPTO(SLJIT_NOT_ZERO, label); +#if PCRE2_CODE_UNIT_WIDTH != 32 + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW), SLJIT_IMM, + common->utf ? (common->invalid_utf ? SLJIT_FUNC_OFFSET(do_extuni_utf_invalid) : SLJIT_FUNC_OFFSET(do_extuni_utf)) : SLJIT_FUNC_OFFSET(do_extuni_no_utf)); + if (common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0)); +#else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW), SLJIT_IMM, + common->invalid_utf ? 
SLJIT_FUNC_OFFSET(do_extuni_utf_invalid) : SLJIT_FUNC_OFFSET(do_extuni_no_utf)); + if (!common->utf || common->invalid_utf) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0)); +#endif - OP1(SLJIT_MOV, STR_PTR, 0, TMP3, 0); - JUMPHERE(jump[0]); - OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_RETURN_REG, 0); - if (common->mode == JIT_PARTIAL_HARD_COMPILE) + if (common->mode == PCRE2_JIT_PARTIAL_HARD) { - jump[0] = CMP(SLJIT_LESS, STR_PTR, 0, STR_END, 0); + jump[0] = CMP(SLJIT_LESS, SLJIT_RETURN_REG, 0, STR_END, 0); /* Since we successfully read a char above, partial matching must occure. */ check_partial(common, TRUE); JUMPHERE(jump[0]); @@ -6618,14 +8710,18 @@ switch(type) case OP_CHAR: case OP_CHARI: length = 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(*cc)) length += GET_EXTRALEN(*cc); #endif - if (common->mode == JIT_COMPILE && check_str_ptr - && (type == OP_CHAR || !char_has_othercase(common, cc) || char_get_othercase_bit(common, cc) != 0)) + + if (check_str_ptr && common->mode != PCRE2_JIT_COMPLETE) + detect_partial_match(common, backtracks); + + if (type == OP_CHAR || !char_has_othercase(common, cc) || char_get_othercase_bit(common, cc) != 0) { OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(length)); - add_jump(compiler, backtracks, CMP(SLJIT_GREATER, STR_PTR, 0, STR_END, 0)); + if (length > 1 || (check_str_ptr && common->mode == PCRE2_JIT_COMPLETE)) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER, STR_PTR, 0, STR_END, 0)); context.length = IN_UCHARS(length); context.sourcereg = -1; @@ -6635,9 +8731,7 @@ switch(type) return byte_sequence_compare(common, type == OP_CHARI, cc, &context, backtracks); } - if (check_str_ptr) - detect_partial_match(common, backtracks); -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf) { GETCHAR(c, cc); @@ -6646,37 +8740,42 @@ switch(type) #endif c = *cc; - if (type == OP_CHAR || 
!char_has_othercase(common, cc)) + SLJIT_ASSERT(type == OP_CHARI && char_has_othercase(common, cc)); + + if (check_str_ptr && common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, backtracks, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + + oc = char_othercase(common, c); + read_char(common, c < oc ? c : oc, c > oc ? c : oc, NULL, 0); + + SLJIT_ASSERT(!is_powerof2(c ^ oc)); + + if (sljit_has_cpu_feature(SLJIT_HAS_CMOV)) { - read_char_range(common, c, c, FALSE); + OP2(SLJIT_SUB | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, oc); + CMOV(SLJIT_EQUAL, TMP1, SLJIT_IMM, c); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, c)); - return cc + length; } - oc = char_othercase(common, c); - read_char_range(common, c < oc ? c : oc, c > oc ? c : oc, FALSE); - bit = c ^ oc; - if (is_powerof2(bit)) + else { - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, bit); - add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, c | bit)); - return cc + length; + jump[0] = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, c); + add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, oc)); + JUMPHERE(jump[0]); } - jump[0] = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, c); - add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, oc)); - JUMPHERE(jump[0]); return cc + length; case OP_NOT: case OP_NOTI: if (check_str_ptr) detect_partial_match(common, backtracks); + length = 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf) { -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 c = *cc; - if (c < 128) + if (c < 128 && !common->invalid_utf) { OP1(SLJIT_MOV_U8, TMP1, 0, SLJIT_MEM1(STR_PTR), 0); if (type == OP_NOT || !char_has_othercase(common, cc)) @@ -6696,24 +8795,24 @@ switch(type) return cc + 1; } else -#endif /* COMPILE_PCRE8 */ +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ { GETCHARLEN(c, cc, length); } } else -#endif /* SUPPORT_UTF */ +#endif /* SUPPORT_UNICODE */ c = *cc; if (type == OP_NOT || 
!char_has_othercase(common, cc)) { - read_char_range(common, c, c, TRUE); + read_char(common, c, c, backtracks, READ_CHAR_UPDATE_STR_PTR); add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, c)); } else { oc = char_othercase(common, c); - read_char_range(common, c < oc ? c : oc, c > oc ? c : oc, TRUE); + read_char(common, c < oc ? c : oc, c > oc ? c : oc, backtracks, READ_CHAR_UPDATE_STR_PTR); bit = c ^ oc; if (is_powerof2(bit)) { @@ -6733,17 +8832,23 @@ switch(type) if (check_str_ptr) detect_partial_match(common, backtracks); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 bit = (common->utf && is_char7_bitset((const sljit_u8 *)cc, type == OP_NCLASS)) ? 127 : 255; - read_char_range(common, 0, bit, type == OP_NCLASS); + if (type == OP_NCLASS) + read_char(common, 0, bit, backtracks, READ_CHAR_UPDATE_STR_PTR); + else + read_char(common, 0, bit, NULL, 0); #else - read_char_range(common, 0, 255, type == OP_NCLASS); + if (type == OP_NCLASS) + read_char(common, 0, 255, backtracks, READ_CHAR_UPDATE_STR_PTR); + else + read_char(common, 0, 255, NULL, 0); #endif - if (check_class_ranges(common, (const sljit_u8 *)cc, type == OP_NCLASS, FALSE, backtracks)) - return cc + 32 / sizeof(pcre_uchar); + if (optimize_class(common, (const sljit_u8 *)cc, type == OP_NCLASS, FALSE, backtracks)) + return cc + 32 / sizeof(PCRE2_UCHAR); -#if defined SUPPORT_UTF && defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 jump[0] = NULL; if (common->utf) { @@ -6754,14 +8859,14 @@ switch(type) jump[0] = NULL; } } -#elif !defined COMPILE_PCRE8 +#elif PCRE2_CODE_UNIT_WIDTH != 8 jump[0] = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255); if (type == OP_CLASS) { add_jump(compiler, backtracks, jump[0]); jump[0] = NULL; } -#endif /* SUPPORT_UTF && COMPILE_PCRE8 */ +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 */ OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7); OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, 
SLJIT_IMM, 3); @@ -6770,13 +8875,13 @@ switch(type) OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0); add_jump(compiler, backtracks, JUMP(SLJIT_ZERO)); -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 if (jump[0] != NULL) JUMPHERE(jump[0]); #endif - return cc + 32 / sizeof(pcre_uchar); + return cc + 32 / sizeof(PCRE2_UCHAR); -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 case OP_XCLASS: if (check_str_ptr) detect_partial_match(common, backtracks); @@ -6788,12 +8893,12 @@ SLJIT_UNREACHABLE(); return cc; } -static SLJIT_INLINE pcre_uchar *compile_charn_matchingpath(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, jump_list **backtracks) +static SLJIT_INLINE PCRE2_SPTR compile_charn_matchingpath(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, jump_list **backtracks) { /* This function consumes at least one input character. */ /* To decrease the number of length checks, we try to concatenate the fixed length character sequences. */ DEFINE_COMPILER; -pcre_uchar *ccbegin = cc; +PCRE2_SPTR ccbegin = cc; compare_context context; int size; @@ -6806,7 +8911,7 @@ do if (*cc == OP_CHAR) { size = 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(cc[1])) size += GET_EXTRALEN(cc[1]); #endif @@ -6814,7 +8919,7 @@ do else if (*cc == OP_CHARI) { size = 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf) { if (char_has_othercase(common, cc + 1) && char_get_othercase_bit(common, cc + 1) == 0) @@ -6855,7 +8960,7 @@ return compile_char1_matchingpath(common, *cc, cc + 1, backtracks, TRUE); } /* Forward definitions. 
*/ -static void compile_matchingpath(compiler_common *, pcre_uchar *, pcre_uchar *, backtrack_common *); +static void compile_matchingpath(compiler_common *, PCRE2_SPTR, PCRE2_SPTR, backtrack_common *); static void compile_backtrackingpath(compiler_common *, struct backtrack_common *); #define PUSH_BACKTRACK(size, ccstart, error) \ @@ -6886,12 +8991,12 @@ static void compile_backtrackingpath(compiler_common *, struct backtrack_common #define BACKTRACK_AS(type) ((type *)backtrack) -static void compile_dnref_search(compiler_common *common, pcre_uchar *cc, jump_list **backtracks) +static void compile_dnref_search(compiler_common *common, PCRE2_SPTR cc, jump_list **backtracks) { /* The OVECTOR offset goes to TMP2. */ DEFINE_COMPILER; int count = GET2(cc, 1 + IMM2_SIZE); -pcre_uchar *slot = common->name_table + GET2(cc, 1) * common->name_entry_size; +PCRE2_SPTR slot = common->name_table + GET2(cc, 1) * common->name_entry_size; unsigned int offset; jump_list *found = NULL; @@ -6910,13 +9015,13 @@ while (count-- > 0) offset = GET2(slot, 0) << 1; GET_LOCAL_BASE(TMP2, 0, OVECTOR(offset)); -if (backtracks != NULL && !common->jscript_compat) +if (backtracks != NULL && !common->unset_backref) add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, SLJIT_MEM1(SLJIT_SP), OVECTOR(offset), TMP1, 0)); set_jumps(found, LABEL()); } -static void compile_ref_matchingpath(compiler_common *common, pcre_uchar *cc, jump_list **backtracks, BOOL withchecks, BOOL emptyfail) +static void compile_ref_matchingpath(compiler_common *common, PCRE2_SPTR cc, jump_list **backtracks, BOOL withchecks, BOOL emptyfail) { DEFINE_COMPILER; BOOL ref = (*cc == OP_REF || *cc == OP_REFI); @@ -6924,55 +9029,119 @@ int offset = 0; struct sljit_jump *jump = NULL; struct sljit_jump *partial; struct sljit_jump *nopartial; +#if defined SUPPORT_UNICODE +struct sljit_label *loop; +struct sljit_label *caseless_loop; +jump_list *no_match = NULL; +int source_reg = COUNT_MATCH; +int source_end_reg = ARGUMENTS; +int char1_reg = 
STACK_LIMIT; +#endif /* SUPPORT_UNICODE */ if (ref) { offset = GET2(cc, 1) << 1; OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(offset)); /* OVECTOR(1) contains the "string begin - 1" constant. */ - if (withchecks && !common->jscript_compat) + if (withchecks && !common->unset_backref) add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(1))); } else OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(TMP2), 0); -#if defined SUPPORT_UTF && defined SUPPORT_UCP +#if defined SUPPORT_UNICODE if (common->utf && *cc == OP_REFI) { - SLJIT_ASSERT(TMP1 == SLJIT_R0 && STACK_TOP == SLJIT_R1); + SLJIT_ASSERT(common->iref_ptr != 0); + if (ref) - OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(offset + 1)); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(offset + 1)); else - OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_MEM1(TMP2), sizeof(sljit_sw)); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP2), sizeof(sljit_sw)); - if (withchecks) - jump = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_R2, 0); - - /* No free saved registers so save data on stack. 
*/ - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STACK_TOP, 0); - OP1(SLJIT_MOV, SLJIT_R1, 0, STR_PTR, 0); - OP1(SLJIT_MOV, SLJIT_R3, 0, STR_END, 0); - sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), SLJIT_IMM, SLJIT_FUNC_OFFSET(do_utf_caselesscmp)); - OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); - OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_RETURN_REG, 0); + if (withchecks && emptyfail) + add_jump(compiler, backtracks, CMP(SLJIT_EQUAL, TMP1, 0, TMP2, 0)); - if (common->mode == JIT_COMPILE) - add_jump(compiler, backtracks, CMP(SLJIT_LESS_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 1)); - else - { - OP2(SLJIT_SUB | SLJIT_SET_Z | SLJIT_SET_LESS, SLJIT_UNUSED, 0, SLJIT_RETURN_REG, 0, SLJIT_IMM, 1); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->iref_ptr, source_reg, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw), source_end_reg, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw) * 2, char1_reg, 0); - add_jump(compiler, backtracks, JUMP(SLJIT_LESS)); + OP1(SLJIT_MOV, source_reg, 0, TMP1, 0); + OP1(SLJIT_MOV, source_end_reg, 0, TMP2, 0); + + loop = LABEL(); + jump = CMP(SLJIT_GREATER_EQUAL, source_reg, 0, source_end_reg, 0); + partial = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); + + /* Read original character. It must be a valid UTF character. */ + OP1(SLJIT_MOV, TMP3, 0, STR_PTR, 0); + OP1(SLJIT_MOV, STR_PTR, 0, source_reg, 0); + + read_char(common, 0, READ_CHAR_MAX, NULL, READ_CHAR_UPDATE_STR_PTR | READ_CHAR_VALID_UTF); + + OP1(SLJIT_MOV, source_reg, 0, STR_PTR, 0); + OP1(SLJIT_MOV, STR_PTR, 0, TMP3, 0); + OP1(SLJIT_MOV, char1_reg, 0, TMP1, 0); + + /* Read second character. 
*/ + read_char(common, 0, READ_CHAR_MAX, &no_match, READ_CHAR_UPDATE_STR_PTR); + + CMPTO(SLJIT_EQUAL, TMP1, 0, char1_reg, 0, loop); + + OP1(SLJIT_MOV, TMP3, 0, TMP1, 0); + + add_jump(compiler, &common->getucd, JUMP(SLJIT_FAST_CALL)); + + OP2(SLJIT_SHL, TMP1, 0, TMP2, 0, SLJIT_IMM, 2); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 3); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, TMP1, 0); + + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_records)); + + OP1(SLJIT_MOV_S32, TMP1, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(ucd_record, other_case)); + OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(ucd_record, caseset)); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP3, 0); + CMPTO(SLJIT_EQUAL, TMP1, 0, char1_reg, 0, loop); + + add_jump(compiler, &no_match, CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + OP2(SLJIT_SHL, TMP2, 0, TMP2, 0, SLJIT_IMM, 2); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, (sljit_sw)PRIV(ucd_caseless_sets)); + + caseless_loop = LABEL(); + OP1(SLJIT_MOV_U32, TMP1, 0, SLJIT_MEM1(TMP2), 0); + OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, sizeof(uint32_t)); + OP2(SLJIT_SUB | SLJIT_SET_Z | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, char1_reg, 0); + JUMPTO(SLJIT_EQUAL, loop); + JUMPTO(SLJIT_LESS, caseless_loop); + + set_jumps(no_match, LABEL()); + if (common->mode == PCRE2_JIT_COMPLETE) + JUMPHERE(partial); + + OP1(SLJIT_MOV, source_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr); + OP1(SLJIT_MOV, source_end_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw)); + OP1(SLJIT_MOV, char1_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw) * 2); + add_jump(compiler, backtracks, JUMP(SLJIT_JUMP)); + + if (common->mode != PCRE2_JIT_COMPLETE) + { + JUMPHERE(partial); + OP1(SLJIT_MOV, source_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr); + OP1(SLJIT_MOV, source_end_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw)); + OP1(SLJIT_MOV, char1_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw) * 2); - nopartial 
= JUMP(SLJIT_NOT_EQUAL); - OP1(SLJIT_MOV, STR_PTR, 0, STR_END, 0); check_partial(common, FALSE); add_jump(compiler, backtracks, JUMP(SLJIT_JUMP)); - JUMPHERE(nopartial); } + + JUMPHERE(jump); + OP1(SLJIT_MOV, source_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr); + OP1(SLJIT_MOV, source_end_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw)); + OP1(SLJIT_MOV, char1_reg, 0, SLJIT_MEM1(SLJIT_SP), common->iref_ptr + sizeof(sljit_sw) * 2); + return; } else -#endif /* SUPPORT_UTF && SUPPORT_UCP */ +#endif /* SUPPORT_UNICODE */ { if (ref) OP2(SLJIT_SUB | SLJIT_SET_Z, TMP2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(offset + 1), TMP1, 0); @@ -6984,13 +9153,13 @@ else OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); partial = CMP(SLJIT_GREATER, STR_PTR, 0, STR_END, 0); - if (common->mode == JIT_COMPILE) + if (common->mode == PCRE2_JIT_COMPLETE) add_jump(compiler, backtracks, partial); add_jump(compiler, *cc == OP_REF ? &common->casefulcmp : &common->caselesscmp, JUMP(SLJIT_FAST_CALL)); add_jump(compiler, backtracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); - if (common->mode != JIT_COMPILE) + if (common->mode != PCRE2_JIT_COMPLETE) { nopartial = JUMP(SLJIT_JUMP); JUMPHERE(partial); @@ -7017,17 +9186,17 @@ if (jump != NULL) } } -static SLJIT_INLINE pcre_uchar *compile_ref_iterator_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static SLJIT_INLINE PCRE2_SPTR compile_ref_iterator_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; BOOL ref = (*cc == OP_REF || *cc == OP_REFI); backtrack_common *backtrack; -pcre_uchar type; +PCRE2_UCHAR type; int offset = 0; struct sljit_label *label; struct sljit_jump *zerolength; struct sljit_jump *jump = NULL; -pcre_uchar *ccbegin = cc; +PCRE2_SPTR ccbegin = cc; int min = 0, max = 0; BOOL minimize; @@ -7224,14 +9393,14 @@ count_match(common); return cc; } -static SLJIT_INLINE pcre_uchar *compile_recurse_matchingpath(compiler_common *common, pcre_uchar 
*cc, backtrack_common *parent) +static SLJIT_INLINE PCRE2_SPTR compile_recurse_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; recurse_entry *entry = common->entries; recurse_entry *prev = NULL; sljit_sw start = GET(cc, 1); -pcre_uchar *start_cc; +PCRE2_SPTR start_cc; BOOL needs_control_head; PUSH_BACKTRACK(sizeof(recurse_backtrack), cc, NULL); @@ -7259,8 +9428,10 @@ if (entry == NULL) if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) return NULL; entry->next = NULL; - entry->entry = NULL; - entry->calls = NULL; + entry->entry_label = NULL; + entry->backtrack_label = NULL; + entry->entry_calls = NULL; + entry->backtrack_calls = NULL; entry->start = start; if (prev != NULL) @@ -7269,135 +9440,162 @@ if (entry == NULL) common->entries = entry; } -if (common->has_set_som && common->mark_ptr != 0) - { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0)); - allocate_stack(common, 2); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->mark_ptr); - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), TMP2, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), TMP1, 0); - } -else if (common->has_set_som || common->mark_ptr != 0) - { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), common->has_set_som ? (int)(OVECTOR(0)) : common->mark_ptr); - allocate_stack(common, 1); - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), TMP2, 0); - } +BACKTRACK_AS(recurse_backtrack)->entry = entry; -if (entry->entry == NULL) - add_jump(compiler, &entry->calls, JUMP(SLJIT_FAST_CALL)); +if (entry->entry_label == NULL) + add_jump(compiler, &entry->entry_calls, JUMP(SLJIT_FAST_CALL)); else - JUMPTO(SLJIT_FAST_CALL, entry->entry); + JUMPTO(SLJIT_FAST_CALL, entry->entry_label); /* Leave if the match is failed. 
*/ add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, 0)); +BACKTRACK_AS(recurse_backtrack)->matchingpath = LABEL(); return cc + 1 + LINK_SIZE; } -static sljit_s32 SLJIT_FUNC do_callout(struct jit_arguments *arguments, PUBL(callout_block) *callout_block, pcre_uchar **jit_ovector) +static sljit_s32 SLJIT_FUNC do_callout(struct jit_arguments *arguments, pcre2_callout_block *callout_block, PCRE2_SPTR *jit_ovector) { -const pcre_uchar *begin = arguments->begin; -int *offset_vector = arguments->offsets; -int offset_count = arguments->offset_count; -int i; +PCRE2_SPTR begin; +PCRE2_SIZE *ovector; +sljit_u32 oveccount, capture_top; -if (PUBL(callout) == NULL) +if (arguments->callout == NULL) return 0; -callout_block->version = 2; -callout_block->callout_data = arguments->callout_data; +SLJIT_COMPILE_ASSERT(sizeof (PCRE2_SIZE) <= sizeof (sljit_sw), pcre2_size_must_be_lower_than_sljit_sw_size); + +begin = arguments->begin; +ovector = (PCRE2_SIZE*)(callout_block + 1); +oveccount = callout_block->capture_top; + +SLJIT_ASSERT(oveccount >= 1); + +callout_block->version = 2; +callout_block->callout_flags = 0; + +/* Offsets in subject. */ +callout_block->subject_length = arguments->end - arguments->begin; +callout_block->start_match = jit_ovector[0] - begin; +callout_block->current_position = (PCRE2_SPTR)callout_block->offset_vector - begin; +callout_block->subject = begin; + +/* Convert and copy the JIT offset vector to the ovector array. */ +callout_block->capture_top = 1; +callout_block->offset_vector = ovector; + +ovector[0] = PCRE2_UNSET; +ovector[1] = PCRE2_UNSET; +ovector += 2; +jit_ovector += 2; +capture_top = 1; + +/* Convert pointers to sizes. 
*/ +while (--oveccount != 0) + { + capture_top++; + + ovector[0] = (PCRE2_SIZE)(jit_ovector[0] - begin); + ovector[1] = (PCRE2_SIZE)(jit_ovector[1] - begin); + + if (ovector[0] != PCRE2_UNSET) + callout_block->capture_top = capture_top; + + ovector += 2; + jit_ovector += 2; + } -/* Offsets in subject. */ -callout_block->subject_length = arguments->end - arguments->begin; -callout_block->start_match = (pcre_uchar*)callout_block->subject - arguments->begin; -callout_block->current_position = (pcre_uchar*)callout_block->offset_vector - arguments->begin; -#if defined COMPILE_PCRE8 -callout_block->subject = (PCRE_SPTR)begin; -#elif defined COMPILE_PCRE16 -callout_block->subject = (PCRE_SPTR16)begin; -#elif defined COMPILE_PCRE32 -callout_block->subject = (PCRE_SPTR32)begin; -#endif - -/* Convert and copy the JIT offset vector to the offset_vector array. */ -callout_block->capture_top = 0; -callout_block->offset_vector = offset_vector; -for (i = 2; i < offset_count; i += 2) - { - offset_vector[i] = jit_ovector[i] - begin; - offset_vector[i + 1] = jit_ovector[i + 1] - begin; - if (jit_ovector[i] >= begin) - callout_block->capture_top = i; - } - -callout_block->capture_top = (callout_block->capture_top >> 1) + 1; -if (offset_count > 0) - offset_vector[0] = -1; -if (offset_count > 1) - offset_vector[1] = -1; -return (*PUBL(callout))(callout_block); +return (arguments->callout)(callout_block, arguments->callout_data); } -/* Aligning to 8 byte. 
*/ -#define CALLOUT_ARG_SIZE \ - (((int)sizeof(PUBL(callout_block)) + 7) & ~7) - #define CALLOUT_ARG_OFFSET(arg) \ - SLJIT_OFFSETOF(PUBL(callout_block), arg) + SLJIT_OFFSETOF(pcre2_callout_block, arg) -static SLJIT_INLINE pcre_uchar *compile_callout_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static SLJIT_INLINE PCRE2_SPTR compile_callout_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; +sljit_s32 mov_opcode; +unsigned int callout_length = (*cc == OP_CALLOUT) + ? PRIV(OP_lengths)[OP_CALLOUT] : GET(cc, 1 + 2 * LINK_SIZE); +sljit_sw value1; +sljit_sw value2; +sljit_sw value3; +sljit_uw callout_arg_size = (common->re->top_bracket + 1) * 2 * sizeof(sljit_sw); PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL); -allocate_stack(common, CALLOUT_ARG_SIZE / sizeof(sljit_sw)); +callout_arg_size = (sizeof(pcre2_callout_block) + callout_arg_size + sizeof(sljit_sw) - 1) / sizeof(sljit_sw); + +allocate_stack(common, callout_arg_size); SLJIT_ASSERT(common->capture_last_ptr != 0); OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), common->capture_last_ptr); OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); -OP1(SLJIT_MOV_S32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(callout_number), SLJIT_IMM, cc[1]); -OP1(SLJIT_MOV_S32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(capture_last), TMP2, 0); +value1 = (*cc == OP_CALLOUT) ? cc[1 + 2 * LINK_SIZE] : 0; +OP1(SLJIT_MOV_U32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(callout_number), SLJIT_IMM, value1); +OP1(SLJIT_MOV_U32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(capture_last), TMP2, 0); +OP1(SLJIT_MOV_U32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(capture_top), SLJIT_IMM, common->re->top_bracket + 1); /* These pointer sized fields temporarly stores internal variables. 
*/ -OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0)); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(offset_vector), STR_PTR, 0); -OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(subject), TMP2, 0); if (common->mark_ptr != 0) OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, mark_ptr)); -OP1(SLJIT_MOV_S32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(pattern_position), SLJIT_IMM, GET(cc, 2)); -OP1(SLJIT_MOV_S32, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(next_item_length), SLJIT_IMM, GET(cc, 2 + LINK_SIZE)); +mov_opcode = (sizeof(PCRE2_SIZE) == 4) ? SLJIT_MOV_U32 : SLJIT_MOV; +OP1(mov_opcode, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(pattern_position), SLJIT_IMM, GET(cc, 1)); +OP1(mov_opcode, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(next_item_length), SLJIT_IMM, GET(cc, 1 + LINK_SIZE)); + +if (*cc == OP_CALLOUT) + { + value1 = 0; + value2 = 0; + value3 = 0; + } +else + { + value1 = (sljit_sw) (cc + (1 + 4*LINK_SIZE) + 1); + value2 = (callout_length - (1 + 4*LINK_SIZE + 2)); + value3 = (sljit_sw) (GET(cc, 1 + 3*LINK_SIZE)); + } + +OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(callout_string), SLJIT_IMM, value1); +OP1(mov_opcode, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(callout_string_length), SLJIT_IMM, value2); +OP1(mov_opcode, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(callout_string_offset), SLJIT_IMM, value3); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), CALLOUT_ARG_OFFSET(mark), (common->mark_ptr != 0) ? TMP2 : SLJIT_IMM, 0); +SLJIT_ASSERT(TMP1 == SLJIT_R0 && STR_PTR == SLJIT_R1); + /* Needed to save important temporary registers. 
*/ -OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STACK_TOP, 0); +OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STR_PTR, 0); /* SLJIT_R0 = arguments */ OP1(SLJIT_MOV, SLJIT_R1, 0, STACK_TOP, 0); GET_LOCAL_BASE(SLJIT_R2, 0, OVECTOR_START); sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(S32) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW), SLJIT_IMM, SLJIT_FUNC_OFFSET(do_callout)); -OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); -free_stack(common, CALLOUT_ARG_SIZE / sizeof(sljit_sw)); +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); +free_stack(common, callout_arg_size); /* Check return value. */ OP2(SLJIT_SUB32 | SLJIT_SET_Z | SLJIT_SET_SIG_GREATER, SLJIT_UNUSED, 0, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0); add_jump(compiler, &backtrack->topbacktracks, JUMP(SLJIT_SIG_GREATER32)); -if (common->forced_quit_label == NULL) - add_jump(compiler, &common->forced_quit, JUMP(SLJIT_NOT_EQUAL32) /* SIG_LESS */); +if (common->abort_label == NULL) + add_jump(compiler, &common->abort, JUMP(SLJIT_NOT_EQUAL32) /* SIG_LESS */); else - JUMPTO(SLJIT_NOT_EQUAL32 /* SIG_LESS */, common->forced_quit_label); -return cc + 2 + 2 * LINK_SIZE; + JUMPTO(SLJIT_NOT_EQUAL32 /* SIG_LESS */, common->abort_label); +return cc + callout_length; } #undef CALLOUT_ARG_SIZE #undef CALLOUT_ARG_OFFSET -static SLJIT_INLINE BOOL assert_needs_str_ptr_saving(pcre_uchar *cc) +static SLJIT_INLINE BOOL assert_needs_str_ptr_saving(PCRE2_SPTR cc) { while (TRUE) { switch (*cc) { + case OP_CALLOUT_STR: + cc += GET(cc, 1 + 2*LINK_SIZE); + break; + case OP_NOT_WORD_BOUNDARY: case OP_WORD_BOUNDARY: case OP_CIRC: @@ -7418,28 +9616,29 @@ while (TRUE) } } -static pcre_uchar *compile_assert_matchingpath(compiler_common *common, pcre_uchar *cc, assert_backtrack *backtrack, BOOL conditional) +static PCRE2_SPTR compile_assert_matchingpath(compiler_common *common, PCRE2_SPTR cc, assert_backtrack *backtrack, BOOL conditional) { DEFINE_COMPILER; int framesize; int extrasize; +BOOL local_quit_available = 
FALSE; BOOL needs_control_head; int private_data_ptr; backtrack_common altbacktrack; -pcre_uchar *ccbegin; -pcre_uchar opcode; -pcre_uchar bra = OP_BRA; +PCRE2_SPTR ccbegin; +PCRE2_UCHAR opcode; +PCRE2_UCHAR bra = OP_BRA; jump_list *tmp = NULL; jump_list **target = (conditional) ? &backtrack->condfailed : &backtrack->common.topbacktracks; jump_list **found; /* Saving previous accept variables. */ -BOOL save_local_exit = common->local_exit; -BOOL save_positive_assert = common->positive_assert; +BOOL save_local_quit_available = common->local_quit_available; +BOOL save_in_positive_assertion = common->in_positive_assertion; then_trap_backtrack *save_then_trap = common->then_trap; struct sljit_label *save_quit_label = common->quit_label; struct sljit_label *save_accept_label = common->accept_label; jump_list *save_quit = common->quit; -jump_list *save_positive_assert_quit = common->positive_assert_quit; +jump_list *save_positive_assertion_quit = common->positive_assertion_quit; jump_list *save_accept = common->accept; struct sljit_jump *jump; struct sljit_jump *brajump = NULL; @@ -7521,21 +9720,21 @@ else else OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), TMP1, 0); - init_frame(common, ccbegin, NULL, framesize + extrasize - 1, extrasize, FALSE); + init_frame(common, ccbegin, NULL, framesize + extrasize - 1, extrasize); } memset(&altbacktrack, 0, sizeof(backtrack_common)); -if (opcode == OP_ASSERT_NOT || opcode == OP_ASSERTBACK_NOT) +if (conditional || (opcode == OP_ASSERT_NOT || opcode == OP_ASSERTBACK_NOT)) { - /* Negative assert is stronger than positive assert. */ - common->local_exit = TRUE; + /* Control verbs cannot escape from these asserts. 
*/ + local_quit_available = TRUE; + common->local_quit_available = TRUE; common->quit_label = NULL; common->quit = NULL; - common->positive_assert = FALSE; } -else - common->positive_assert = TRUE; -common->positive_assert_quit = NULL; + +common->in_positive_assertion = (opcode == OP_ASSERT || opcode == OP_ASSERTBACK); +common->positive_assertion_quit = NULL; while (1) { @@ -7551,16 +9750,16 @@ while (1) compile_matchingpath(common, ccbegin + 1 + LINK_SIZE, cc, &altbacktrack); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) { - if (opcode == OP_ASSERT_NOT || opcode == OP_ASSERTBACK_NOT) + if (local_quit_available) { - common->local_exit = save_local_exit; + common->local_quit_available = save_local_quit_available; common->quit_label = save_quit_label; common->quit = save_quit; } - common->positive_assert = save_positive_assert; + common->in_positive_assertion = save_in_positive_assertion; common->then_trap = save_then_trap; common->accept_label = save_accept_label; - common->positive_assert_quit = save_positive_assert_quit; + common->positive_assertion_quit = save_positive_assertion_quit; common->accept = save_accept; return NULL; } @@ -7594,6 +9793,7 @@ while (1) if (needs_control_head) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(-framesize - 2)); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize - 1) * sizeof(sljit_sw)); } } @@ -7629,16 +9829,16 @@ while (1) compile_backtrackingpath(common, altbacktrack.top); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) { - if (opcode == OP_ASSERT_NOT || opcode == OP_ASSERTBACK_NOT) + if (local_quit_available) { - common->local_exit = save_local_exit; + common->local_quit_available = save_local_quit_available; common->quit_label = save_quit_label; common->quit = save_quit; } - common->positive_assert = save_positive_assert; + common->in_positive_assertion = 
save_in_positive_assertion; common->then_trap = save_then_trap; common->accept_label = save_accept_label; - common->positive_assert_quit = save_positive_assert_quit; + common->positive_assertion_quit = save_positive_assertion_quit; common->accept = save_accept; return NULL; } @@ -7651,18 +9851,18 @@ while (1) cc += GET(cc, 1); } -if (opcode == OP_ASSERT_NOT || opcode == OP_ASSERTBACK_NOT) +if (local_quit_available) { - SLJIT_ASSERT(common->positive_assert_quit == NULL); + SLJIT_ASSERT(common->positive_assertion_quit == NULL); /* Makes the check less complicated below. */ - common->positive_assert_quit = common->quit; + common->positive_assertion_quit = common->quit; } /* None of them matched. */ -if (common->positive_assert_quit != NULL) +if (common->positive_assertion_quit != NULL) { jump = JUMP(SLJIT_JUMP); - set_jumps(common->positive_assert_quit, LABEL()); + set_jumps(common->positive_assertion_quit, LABEL()); SLJIT_ASSERT(framesize != no_stack); if (framesize < 0) OP2(SLJIT_SUB, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr, SLJIT_IMM, extrasize * sizeof(sljit_sw)); @@ -7670,7 +9870,7 @@ if (common->positive_assert_quit != NULL) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize + extrasize) * sizeof(sljit_sw)); + OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (extrasize + 1) * sizeof(sljit_sw)); } JUMPHERE(jump); } @@ -7754,7 +9954,8 @@ if (opcode == OP_ASSERT || opcode == OP_ASSERTBACK) } else { - OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), 0); + SLJIT_ASSERT(extrasize == 3); + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), STACK(-1)); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), bra == OP_BRAZERO ? 
STR_PTR : SLJIT_IMM, 0); } } @@ -7773,7 +9974,9 @@ if (opcode == OP_ASSERT || opcode == OP_ASSERTBACK) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), private_data_ptr, SLJIT_MEM1(STACK_TOP), STACK(-framesize - 1)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(-2)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize - 1) * sizeof(sljit_sw)); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), private_data_ptr, TMP1, 0); } set_jumps(backtrack->common.topbacktracks, LABEL()); } @@ -7826,21 +10029,21 @@ else } } -if (opcode == OP_ASSERT_NOT || opcode == OP_ASSERTBACK_NOT) +if (local_quit_available) { - common->local_exit = save_local_exit; + common->local_quit_available = save_local_quit_available; common->quit_label = save_quit_label; common->quit = save_quit; } -common->positive_assert = save_positive_assert; +common->in_positive_assertion = save_in_positive_assertion; common->then_trap = save_then_trap; common->accept_label = save_accept_label; -common->positive_assert_quit = save_positive_assert_quit; +common->positive_assertion_quit = save_positive_assertion_quit; common->accept = save_accept; return cc + 1 + LINK_SIZE; } -static SLJIT_INLINE void match_once_common(compiler_common *common, pcre_uchar ket, int framesize, int private_data_ptr, BOOL has_alternatives, BOOL needs_control_head) +static SLJIT_INLINE void match_once_common(compiler_common *common, PCRE2_UCHAR ket, int framesize, int private_data_ptr, BOOL has_alternatives, BOOL needs_control_head) { DEFINE_COMPILER; int stacksize; @@ -7913,6 +10116,42 @@ if (common->optimized_cbracket[offset >> 1] == 0) return stacksize; } +static PCRE2_SPTR SLJIT_FUNC do_script_run(PCRE2_SPTR ptr, PCRE2_SPTR endptr) +{ + if (PRIV(script_run)(ptr, endptr, FALSE)) + return endptr; + return NULL; +} + +#ifdef SUPPORT_UNICODE + +static PCRE2_SPTR SLJIT_FUNC 
do_script_run_utf(PCRE2_SPTR ptr, PCRE2_SPTR endptr) +{ + if (PRIV(script_run)(ptr, endptr, TRUE)) + return endptr; + return NULL; +} + +#endif /* SUPPORT_UNICODE */ + +static SLJIT_INLINE void match_script_run_common(compiler_common *common, int private_data_ptr, backtrack_common *parent) +{ +DEFINE_COMPILER; + +SLJIT_ASSERT(TMP1 == SLJIT_R0 && STR_PTR == SLJIT_R1); + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); +#ifdef SUPPORT_UNICODE +sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW), SLJIT_IMM, + common->utf ? SLJIT_FUNC_OFFSET(do_script_run_utf) : SLJIT_FUNC_OFFSET(do_script_run)); +#else +sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW), SLJIT_IMM, SLJIT_FUNC_OFFSET(do_script_run)); +#endif + +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_RETURN_REG, 0); +add_jump(compiler, parent->top != NULL ? &parent->top->nextbacktracks : &parent->topbacktracks, CMP(SLJIT_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0)); +} + /* Handling bracketed expressions is probably the most complex part. @@ -7963,25 +10202,24 @@ return stacksize; (|) OP_*BRA | OP_ALT ... M A (?()|) OP_*COND | OP_ALT M A (?>|) OP_ONCE | OP_ALT ... [stack trace] M A - (?>|) OP_ONCE_NC | OP_ALT ... 
[stack trace] M A Or nothing, if trace is unnecessary */ -static pcre_uchar *compile_bracket_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static PCRE2_SPTR compile_bracket_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; -pcre_uchar opcode; +PCRE2_UCHAR opcode; int private_data_ptr = 0; int offset = 0; int i, stacksize; int repeat_ptr = 0, repeat_length = 0; int repeat_type = 0, repeat_count = 0; -pcre_uchar *ccbegin; -pcre_uchar *matchingpath; -pcre_uchar *slot; -pcre_uchar bra = OP_BRA; -pcre_uchar ket; +PCRE2_SPTR ccbegin; +PCRE2_SPTR matchingpath; +PCRE2_SPTR slot; +PCRE2_UCHAR bra = OP_BRA; +PCRE2_UCHAR ket; assert_backtrack *assert; BOOL has_alternatives; BOOL needs_control_head = FALSE; @@ -8016,13 +10254,6 @@ if (ket == OP_KET && PRIVATE_DATA(matchingpath) != 0) ket = OP_KETRMIN; } -if ((opcode == OP_COND || opcode == OP_SCOND) && cc[1 + LINK_SIZE] == OP_DEF) - { - /* Drop this bracket_backtrack. */ - parent->top = backtrack->prev; - return matchingpath + 1 + LINK_SIZE + repeat_length; - } - matchingpath = ccbegin + 1 + LINK_SIZE; SLJIT_ASSERT(ket == OP_KET || ket == OP_KETRMAX || ket == OP_KETRMIN); SLJIT_ASSERT(!((bra == OP_BRAZERO && ket == OP_KETRMIN) || (bra == OP_BRAMINZERO && ket == OP_KETRMAX))); @@ -8030,12 +10261,14 @@ cc += GET(cc, 1); has_alternatives = *cc == OP_ALT; if (SLJIT_UNLIKELY(opcode == OP_COND || opcode == OP_SCOND)) - has_alternatives = (*matchingpath == OP_RREF || *matchingpath == OP_DNRREF || *matchingpath == OP_FAIL) ? FALSE : TRUE; + { + SLJIT_COMPILE_ASSERT(OP_DNRREF == OP_RREF + 1 && OP_FALSE == OP_RREF + 2 && OP_TRUE == OP_RREF + 3, + compile_time_checks_must_be_grouped_together); + has_alternatives = ((*matchingpath >= OP_RREF && *matchingpath <= OP_TRUE) || *matchingpath == OP_FAIL) ? 
FALSE : TRUE; + } if (SLJIT_UNLIKELY(opcode == OP_COND) && (*cc == OP_KETRMAX || *cc == OP_KETRMIN)) opcode = OP_SCOND; -if (SLJIT_UNLIKELY(opcode == OP_ONCE_NC)) - opcode = OP_ONCE; if (opcode == OP_CBRA || opcode == OP_SCBRA) { @@ -8054,7 +10287,7 @@ if (opcode == OP_CBRA || opcode == OP_SCBRA) BACKTRACK_AS(bracket_backtrack)->private_data_ptr = private_data_ptr; matchingpath += IMM2_SIZE; } -else if (opcode == OP_ONCE || opcode == OP_SBRA || opcode == OP_SCOND) +else if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA || opcode == OP_ONCE || opcode == OP_SCRIPT_RUN || opcode == OP_SBRA || opcode == OP_SCOND) { /* Other brackets simply allocate the next entry. */ private_data_ptr = PRIVATE_DATA(ccbegin); @@ -8093,35 +10326,32 @@ if (bra == OP_BRAMINZERO) free_stack(common, 1); braminzero = CMP(SLJIT_EQUAL, STR_PTR, 0, SLJIT_IMM, 0); } - else + else if (opcode == OP_ONCE || opcode >= OP_SBRA) { - if (opcode == OP_ONCE || opcode >= OP_SBRA) + jump = CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_IMM, 0); + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), STACK(1)); + /* Nothing stored during the first run. */ + skip = JUMP(SLJIT_JUMP); + JUMPHERE(jump); + /* Checking zero-length iteration. */ + if (opcode != OP_ONCE || BACKTRACK_AS(bracket_backtrack)->u.framesize < 0) { - jump = CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_IMM, 0); - OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), STACK(1)); - /* Nothing stored during the first run. */ - skip = JUMP(SLJIT_JUMP); - JUMPHERE(jump); - /* Checking zero-length iteration. */ - if (opcode != OP_ONCE || BACKTRACK_AS(bracket_backtrack)->u.framesize < 0) - { - /* When we come from outside, private_data_ptr contains the previous STR_PTR. */ - braminzero = CMP(SLJIT_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); - } - else - { - /* Except when the whole stack frame must be saved. 
*/ - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); - braminzero = CMP(SLJIT_EQUAL, STR_PTR, 0, SLJIT_MEM1(TMP1), STACK(-BACKTRACK_AS(bracket_backtrack)->u.framesize - 2)); - } - JUMPHERE(skip); + /* When we come from outside, private_data_ptr contains the previous STR_PTR. */ + braminzero = CMP(SLJIT_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); } else { - jump = CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_IMM, 0); - OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), STACK(1)); - JUMPHERE(jump); + /* Except when the whole stack frame must be saved. */ + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); + braminzero = CMP(SLJIT_EQUAL, STR_PTR, 0, SLJIT_MEM1(TMP1), STACK(-BACKTRACK_AS(bracket_backtrack)->u.framesize - 2)); } + JUMPHERE(skip); + } + else + { + jump = CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_IMM, 0); + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(STACK_TOP), STACK(1)); + JUMPHERE(jump); } } @@ -8138,7 +10368,7 @@ if (ket == OP_KETRMIN) if (ket == OP_KETRMAX) { rmax_label = LABEL(); - if (has_alternatives && opcode != OP_ONCE && opcode < OP_SBRA && repeat_type == 0) + if (has_alternatives && opcode >= OP_BRA && opcode < OP_SBRA && repeat_type == 0) BACKTRACK_AS(bracket_backtrack)->alternative_matchingpath = rmax_label; } @@ -8218,7 +10448,7 @@ if (opcode == OP_ONCE) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), private_data_ptr, TMP2, 0); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(stacksize), TMP1, 0); } - init_frame(common, ccbegin, NULL, BACKTRACK_AS(bracket_backtrack)->u.framesize + stacksize, stacksize + 1, FALSE); + init_frame(common, ccbegin, NULL, BACKTRACK_AS(bracket_backtrack)->u.framesize + stacksize, stacksize + 1); } } else if (opcode == OP_CBRA || opcode == OP_SCBRA) @@ -8242,7 +10472,7 @@ else if (opcode == OP_CBRA || opcode == OP_SCBRA) OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), TMP2, 0); } } -else if (opcode == OP_SBRA || opcode == OP_SCOND) +else if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA || 
opcode == OP_SCRIPT_RUN || opcode == OP_SBRA || opcode == OP_SCOND) { /* Saving the previous value. */ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); @@ -8288,13 +10518,18 @@ if (opcode == OP_COND || opcode == OP_SCOND) add_jump(compiler, &(BACKTRACK_AS(bracket_backtrack)->u.condfailed), JUMP(SLJIT_ZERO)); matchingpath += 1 + 2 * IMM2_SIZE; } - else if (*matchingpath == OP_RREF || *matchingpath == OP_DNRREF || *matchingpath == OP_FAIL) + else if ((*matchingpath >= OP_RREF && *matchingpath <= OP_TRUE) || *matchingpath == OP_FAIL) { /* Never has other case. */ BACKTRACK_AS(bracket_backtrack)->u.condfailed = NULL; SLJIT_ASSERT(!has_alternatives); - if (*matchingpath == OP_FAIL) + if (*matchingpath == OP_TRUE) + { + stacksize = 1; + matchingpath++; + } + else if (*matchingpath == OP_FALSE || *matchingpath == OP_FAIL) stacksize = 0; else if (*matchingpath == OP_RREF) { @@ -8363,9 +10598,15 @@ compile_matchingpath(common, matchingpath, cc, backtrack); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) return NULL; +if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA) + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); + if (opcode == OP_ONCE) match_once_common(common, ket, BACKTRACK_AS(bracket_backtrack)->u.framesize, private_data_ptr, has_alternatives, needs_control_head); +if (opcode == OP_SCRIPT_RUN) + match_script_run_common(common, private_data_ptr, backtrack); + stacksize = 0; if (repeat_type == OP_MINUPTO) { @@ -8408,10 +10649,23 @@ if (ket != OP_KET || bra != OP_BRA) if (offset != 0) stacksize = match_capture_common(common, stacksize, offset, private_data_ptr); +/* Skip and count the other alternatives. 
*/ +i = 1; +while (*cc == OP_ALT) + { + cc += GET(cc, 1); + i++; + } + if (has_alternatives) { if (opcode != OP_ONCE) - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(stacksize), SLJIT_IMM, 0); + { + if (i <= 3) + OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(stacksize), SLJIT_IMM, 0); + else + BACKTRACK_AS(bracket_backtrack)->u.matching_put_label = sljit_emit_put_label(compiler, SLJIT_MEM1(STACK_TOP), STACK(stacksize)); + } if (ket != OP_KETRMAX) BACKTRACK_AS(bracket_backtrack)->alternative_matchingpath = LABEL(); } @@ -8435,13 +10689,15 @@ if (ket == OP_KETRMAX) if (opcode != OP_ONCE) free_stack(common, 1); } - else if (opcode == OP_ONCE || opcode >= OP_SBRA) + else if (opcode < OP_BRA || opcode >= OP_SBRA) { if (has_alternatives) BACKTRACK_AS(bracket_backtrack)->alternative_matchingpath = LABEL(); + /* Checking zero-length iteration. */ if (opcode != OP_ONCE) { + /* This case includes opcodes such as OP_SCRIPT_RUN. */ CMPTO(SLJIT_NOT_EQUAL, SLJIT_MEM1(SLJIT_SP), private_data_ptr, STR_PTR, 0, rmax_label); /* Drop STR_PTR for greedy plus quantifier. */ if (bra != OP_BRAZERO) @@ -8487,6 +10743,7 @@ if (bra == OP_BRAMINZERO) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (BACKTRACK_AS(bracket_backtrack)->u.framesize - 1) * sizeof(sljit_sw)); } else if (ket == OP_KETRMIN && opcode != OP_ONCE) free_stack(common, 1); @@ -8497,9 +10754,6 @@ if (bra == OP_BRAMINZERO) if ((ket != OP_KET && bra != OP_BRAMINZERO) || bra == OP_BRAZERO) count_match(common); -/* Skip the other alternatives. */ -while (*cc == OP_ALT) - cc += GET(cc, 1); cc += 1 + LINK_SIZE; if (opcode == OP_ONCE) @@ -8507,16 +10761,16 @@ if (opcode == OP_ONCE) /* We temporarily encode the needs_control_head in the lowest bit. Note: on the target architectures of SLJIT the ((x << 1) >> 1) returns the same value for small signed numbers (including negative numbers). 
*/ - BACKTRACK_AS(bracket_backtrack)->u.framesize = (BACKTRACK_AS(bracket_backtrack)->u.framesize << 1) | (needs_control_head ? 1 : 0); + BACKTRACK_AS(bracket_backtrack)->u.framesize = (int)((unsigned)BACKTRACK_AS(bracket_backtrack)->u.framesize << 1) | (needs_control_head ? 1 : 0); } return cc + repeat_length; } -static pcre_uchar *compile_bracketpos_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static PCRE2_SPTR compile_bracketpos_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; -pcre_uchar opcode; +PCRE2_UCHAR opcode; int private_data_ptr; int cbraprivptr = 0; BOOL needs_control_head; @@ -8524,7 +10778,7 @@ int framesize; int stacksize; int offset = 0; BOOL zero = FALSE; -pcre_uchar *ccbegin = NULL; +PCRE2_SPTR ccbegin = NULL; int stack; /* Also contains the offset of control head. */ struct sljit_label *loop = NULL; struct jump_list *emptymatch = NULL; @@ -8656,7 +10910,7 @@ else stack++; } OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(stack), TMP1, 0); - init_frame(common, cc, NULL, stacksize - 1, stacksize - framesize, FALSE); + init_frame(common, cc, NULL, stacksize - 1, stacksize - framesize); stack -= 1 + (offset == 0); } @@ -8795,7 +11049,7 @@ count_match(common); return cc + 1 + LINK_SIZE; } -static SLJIT_INLINE pcre_uchar *get_iterator_parameters(compiler_common *common, pcre_uchar *cc, pcre_uchar *opcode, pcre_uchar *type, sljit_u32 *max, sljit_u32 *exact, pcre_uchar **end) +static SLJIT_INLINE PCRE2_SPTR get_iterator_parameters(compiler_common *common, PCRE2_SPTR cc, PCRE2_UCHAR *opcode, PCRE2_UCHAR *type, sljit_u32 *max, sljit_u32 *exact, PCRE2_SPTR *end) { int class_len; @@ -8836,7 +11090,7 @@ else SLJIT_ASSERT(*opcode == OP_CLASS || *opcode == OP_NCLASS || *opcode == OP_XCLASS); *type = *opcode; cc++; - class_len = (*type < OP_XCLASS) ? (int)(1 + (32 / sizeof(pcre_uchar))) : GET(cc, 0); + class_len = (*type < OP_XCLASS) ? 
(int)(1 + (32 / sizeof(PCRE2_UCHAR))) : GET(cc, 0); *opcode = cc[class_len - 1]; if (*opcode >= OP_CRSTAR && *opcode <= OP_CRMINQUERY) @@ -8934,25 +11188,25 @@ if (*type == OP_END) } *end = cc + 1; -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (common->utf && HAS_EXTRALEN(*cc)) *end += GET_EXTRALEN(*cc); #endif return cc; } -static pcre_uchar *compile_iterator_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static PCRE2_SPTR compile_iterator_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; -pcre_uchar opcode; -pcre_uchar type; +PCRE2_UCHAR opcode; +PCRE2_UCHAR type; sljit_u32 max = 0, exact; -BOOL fast_fail; -sljit_s32 fast_str_ptr; +sljit_s32 early_fail_ptr = PRIVATE_DATA(cc + 1); +sljit_s32 early_fail_type; BOOL charpos_enabled; -pcre_uchar charpos_char; +PCRE2_UCHAR charpos_char; unsigned int charpos_othercasebit; -pcre_uchar *end; +PCRE2_SPTR end; jump_list *no_match = NULL; jump_list *no_char1_match = NULL; struct sljit_jump *jump = NULL; @@ -8962,21 +11216,27 @@ int base = (private_data_ptr == 0) ? SLJIT_MEM1(STACK_TOP) : SLJIT_MEM1(SLJIT_SP int offset0 = (private_data_ptr == 0) ? STACK(0) : private_data_ptr; int offset1 = (private_data_ptr == 0) ? STACK(1) : private_data_ptr + (int)sizeof(sljit_sw); int tmp_base, tmp_offset; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +BOOL use_tmp; +#endif PUSH_BACKTRACK(sizeof(char_iterator_backtrack), cc, NULL); -fast_str_ptr = PRIVATE_DATA(cc + 1); -fast_fail = TRUE; +early_fail_type = (early_fail_ptr & 0x7); +early_fail_ptr >>= 3; -SLJIT_ASSERT(common->fast_forward_bc_ptr == NULL || fast_str_ptr == 0 || cc == common->fast_forward_bc_ptr); +/* During recursion, these optimizations are disabled. 
*/ +if (common->early_fail_start_ptr == 0) + { + early_fail_ptr = 0; + early_fail_type = type_skip; + } -if (cc == common->fast_forward_bc_ptr) - fast_fail = FALSE; -else if (common->fast_fail_start_ptr == 0) - fast_str_ptr = 0; +SLJIT_ASSERT(common->fast_forward_bc_ptr != NULL || early_fail_ptr == 0 + || (early_fail_ptr >= common->early_fail_start_ptr && early_fail_ptr <= common->early_fail_end_ptr)); -SLJIT_ASSERT(common->fast_forward_bc_ptr != NULL || fast_str_ptr == 0 - || (fast_str_ptr >= common->fast_fail_start_ptr && fast_str_ptr <= common->fast_fail_end_ptr)); +if (early_fail_type == type_fail) + add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), early_fail_ptr)); cc = get_iterator_parameters(common, cc, &opcode, &type, &max, &exact, &end); @@ -8991,15 +11251,13 @@ else tmp_offset = POSSESSIVE0; } -if (fast_fail && fast_str_ptr != 0) - add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), fast_str_ptr)); - /* Handle fixed part first. 
*/ if (exact > 1) { - SLJIT_ASSERT(fast_str_ptr == 0); - if (common->mode == JIT_COMPILE -#ifdef SUPPORT_UTF + SLJIT_ASSERT(early_fail_ptr == 0); + + if (common->mode == PCRE2_JIT_COMPLETE +#ifdef SUPPORT_UNICODE && !common->utf #endif && type != OP_ANYNL && type != OP_EXTUNI) @@ -9022,18 +11280,31 @@ if (exact > 1) } } else if (exact == 1) + { compile_char1_matchingpath(common, type, cc, &backtrack->topbacktracks, TRUE); + if (early_fail_type == type_fail_range) + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), early_fail_ptr); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), early_fail_ptr + (int)sizeof(sljit_sw)); + OP2(SLJIT_SUB, TMP1, 0, TMP1, 0, TMP2, 0); + OP2(SLJIT_SUB, TMP2, 0, STR_PTR, 0, TMP2, 0); + add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_LESS_EQUAL, TMP2, 0, TMP1, 0)); + + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr + (int)sizeof(sljit_sw), STR_PTR, 0); + } + } + switch(opcode) { case OP_STAR: case OP_UPTO: - SLJIT_ASSERT(fast_str_ptr == 0 || opcode == OP_STAR); + SLJIT_ASSERT(early_fail_ptr == 0 || opcode == OP_STAR); if (type == OP_ANYNL || type == OP_EXTUNI) { SLJIT_ASSERT(private_data_ptr == 0); - SLJIT_ASSERT(fast_str_ptr == 0); + SLJIT_ASSERT(early_fail_ptr == 0); allocate_stack(common, 2); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0); @@ -9052,180 +11323,231 @@ switch(opcode) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), POSSESSIVE0, TMP1, 0); } - /* We cannot use TMP3 because of this allocate_stack. */ + /* We cannot use TMP3 because of allocate_stack. 
*/ allocate_stack(common, 1); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0); JUMPTO(SLJIT_JUMP, label); if (jump != NULL) JUMPHERE(jump); + BACKTRACK_AS(char_iterator_backtrack)->matchingpath = LABEL(); + break; } - else - { - charpos_enabled = FALSE; - charpos_char = 0; - charpos_othercasebit = 0; - - if ((type != OP_CHAR && type != OP_CHARI) && (*end == OP_CHAR || *end == OP_CHARI)) - { - charpos_enabled = TRUE; -#ifdef SUPPORT_UTF - charpos_enabled = !common->utf || !HAS_EXTRALEN(end[1]); +#ifdef SUPPORT_UNICODE + else if (type == OP_ALLANY && !common->invalid_utf) +#else + else if (type == OP_ALLANY) #endif - if (charpos_enabled && *end == OP_CHARI && char_has_othercase(common, end + 1)) - { - charpos_othercasebit = char_get_othercase_bit(common, end + 1); - if (charpos_othercasebit == 0) - charpos_enabled = FALSE; - } - - if (charpos_enabled) - { - charpos_char = end[1]; - /* Consumpe the OP_CHAR opcode. */ - end += 2; -#if defined COMPILE_PCRE8 - SLJIT_ASSERT((charpos_othercasebit >> 8) == 0); -#elif defined COMPILE_PCRE16 || defined COMPILE_PCRE32 - SLJIT_ASSERT((charpos_othercasebit >> 9) == 0); - if ((charpos_othercasebit & 0x100) != 0) - charpos_othercasebit = (charpos_othercasebit & 0xff) << 8; -#endif - if (charpos_othercasebit != 0) - charpos_char |= charpos_othercasebit; - - BACKTRACK_AS(char_iterator_backtrack)->u.charpos.enabled = TRUE; - BACKTRACK_AS(char_iterator_backtrack)->u.charpos.chr = charpos_char; - BACKTRACK_AS(char_iterator_backtrack)->u.charpos.othercasebit = charpos_othercasebit; - } - } - - if (charpos_enabled) + { + if (opcode == OP_STAR) { - if (opcode == OP_UPTO) - OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max + 1); + if (private_data_ptr == 0) + allocate_stack(common, 2); - /* Search the first instance of charpos_char. 
*/ - jump = JUMP(SLJIT_JUMP); - label = LABEL(); - if (opcode == OP_UPTO) - { - OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); - add_jump(compiler, &backtrack->topbacktracks, JUMP(SLJIT_ZERO)); - } - compile_char1_matchingpath(common, type, cc, &backtrack->topbacktracks, FALSE); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); - JUMPHERE(jump); + OP1(SLJIT_MOV, base, offset0, STR_END, 0); + OP1(SLJIT_MOV, base, offset1, STR_PTR, 0); - detect_partial_match(common, &backtrack->topbacktracks); - OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - if (charpos_othercasebit != 0) - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, charpos_othercasebit); - CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, charpos_char, label); + OP1(SLJIT_MOV, STR_PTR, 0, STR_END, 0); + process_partial_match(common); + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_END, 0); + BACKTRACK_AS(char_iterator_backtrack)->matchingpath = LABEL(); + break; + } +#ifdef SUPPORT_UNICODE + else if (!common->utf) +#else + else +#endif + { if (private_data_ptr == 0) allocate_stack(common, 2); - OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + OP1(SLJIT_MOV, base, offset1, STR_PTR, 0); - if (opcode == OP_UPTO) - { - OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); - add_jump(compiler, &no_match, JUMP(SLJIT_ZERO)); - } + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(max)); - /* Search the last instance of charpos_char. 
*/ - label = LABEL(); - compile_char1_matchingpath(common, type, cc, &no_match, FALSE); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); - detect_partial_match(common, &no_match); - OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); - if (charpos_othercasebit != 0) - OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, charpos_othercasebit); - if (opcode == OP_STAR) + if (common->mode == PCRE2_JIT_COMPLETE) { - CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, charpos_char, label); - OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_PTR, 0, STR_END, 0); + CMOV(SLJIT_GREATER, STR_PTR, STR_END, 0); } else { - jump = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, charpos_char); - OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + jump = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, STR_END, 0); + process_partial_match(common); JUMPHERE(jump); } - if (opcode == OP_UPTO) - { - OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); - JUMPTO(SLJIT_NOT_ZERO, label); - } - else - JUMPTO(SLJIT_JUMP, label); - - set_jumps(no_match, LABEL()); - OP1(SLJIT_MOV, STR_PTR, 0, base, offset0); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); + BACKTRACK_AS(char_iterator_backtrack)->matchingpath = LABEL(); + break; } -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 - else if (common->utf) + } + + charpos_enabled = FALSE; + charpos_char = 0; + charpos_othercasebit = 0; + + if ((type != OP_CHAR && type != OP_CHARI) && (*end == OP_CHAR || *end == OP_CHARI)) + { +#ifdef SUPPORT_UNICODE + charpos_enabled = !common->utf || !HAS_EXTRALEN(end[1]); +#else + charpos_enabled = TRUE; +#endif + if (charpos_enabled && *end == OP_CHARI && char_has_othercase(common, end + 1)) { - if (private_data_ptr == 0) - allocate_stack(common, 2); + 
charpos_othercasebit = char_get_othercase_bit(common, end + 1); + if (charpos_othercasebit == 0) + charpos_enabled = FALSE; + } - OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); - OP1(SLJIT_MOV, base, offset1, STR_PTR, 0); + if (charpos_enabled) + { + charpos_char = end[1]; + /* Consume the OP_CHAR opcode. */ + end += 2; +#if PCRE2_CODE_UNIT_WIDTH == 8 + SLJIT_ASSERT((charpos_othercasebit >> 8) == 0); +#elif PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 + SLJIT_ASSERT((charpos_othercasebit >> 9) == 0); + if ((charpos_othercasebit & 0x100) != 0) + charpos_othercasebit = (charpos_othercasebit & 0xff) << 8; +#endif + if (charpos_othercasebit != 0) + charpos_char |= charpos_othercasebit; - if (opcode == OP_UPTO) - OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max); + BACKTRACK_AS(char_iterator_backtrack)->u.charpos.enabled = TRUE; + BACKTRACK_AS(char_iterator_backtrack)->u.charpos.chr = charpos_char; + BACKTRACK_AS(char_iterator_backtrack)->u.charpos.othercasebit = charpos_othercasebit; + } + } - label = LABEL(); - compile_char1_matchingpath(common, type, cc, &no_match, TRUE); - OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + if (charpos_enabled) + { + if (opcode == OP_UPTO) + OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max + 1); - if (opcode == OP_UPTO) - { - OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); - JUMPTO(SLJIT_NOT_ZERO, label); - } - else - JUMPTO(SLJIT_JUMP, label); + /* Search the first instance of charpos_char. 
*/ + jump = JUMP(SLJIT_JUMP); + label = LABEL(); + if (opcode == OP_UPTO) + { + OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); + add_jump(compiler, &backtrack->topbacktracks, JUMP(SLJIT_ZERO)); + } + compile_char1_matchingpath(common, type, cc, &backtrack->topbacktracks, FALSE); + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); + JUMPHERE(jump); - set_jumps(no_match, LABEL()); - OP1(SLJIT_MOV, STR_PTR, 0, base, offset0); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); + detect_partial_match(common, &backtrack->topbacktracks); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); + if (charpos_othercasebit != 0) + OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, charpos_othercasebit); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, charpos_char, label); + + if (private_data_ptr == 0) + allocate_stack(common, 2); + OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + OP1(SLJIT_MOV, base, offset1, STR_PTR, 0); + + if (opcode == OP_UPTO) + { + OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); + add_jump(compiler, &no_match, JUMP(SLJIT_ZERO)); + } + + /* Search the last instance of charpos_char. 
*/ + label = LABEL(); + compile_char1_matchingpath(common, type, cc, &no_match, FALSE); + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); + detect_partial_match(common, &no_match); + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(0)); + if (charpos_othercasebit != 0) + OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, charpos_othercasebit); + + if (opcode == OP_STAR) + { + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, charpos_char, label); + OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + JUMPTO(SLJIT_JUMP, label); } -#endif else { - if (private_data_ptr == 0) - allocate_stack(common, 2); + jump = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, charpos_char); + OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + JUMPHERE(jump); + OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); + JUMPTO(SLJIT_NOT_ZERO, label); + } - OP1(SLJIT_MOV, base, offset1, STR_PTR, 0); - if (opcode == OP_UPTO) - OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max); + set_jumps(no_match, LABEL()); + OP2(SLJIT_ADD, STR_PTR, 0, base, offset0, SLJIT_IMM, IN_UCHARS(1)); + OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); + } + else + { + if (private_data_ptr == 0) + allocate_stack(common, 2); - label = LABEL(); - detect_partial_match(common, &no_match); - compile_char1_matchingpath(common, type, cc, &no_char1_match, FALSE); - if (opcode == OP_UPTO) + OP1(SLJIT_MOV, base, offset1, STR_PTR, 0); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + use_tmp = (!HAS_VIRTUAL_REGISTERS && opcode == OP_STAR); + SLJIT_ASSERT(!use_tmp || tmp_base == TMP3); + + if (common->utf) + OP1(SLJIT_MOV, use_tmp ? TMP3 : base, use_tmp ? 
0 : offset0, STR_PTR, 0); +#endif + if (opcode == OP_UPTO) + OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max); + + detect_partial_match(common, &no_match); + label = LABEL(); + compile_char1_matchingpath(common, type, cc, &no_char1_match, FALSE); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf) + OP1(SLJIT_MOV, use_tmp ? TMP3 : base, use_tmp ? 0 : offset0, STR_PTR, 0); +#endif + + if (opcode == OP_UPTO) + { + OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); + add_jump(compiler, &no_match, JUMP(SLJIT_ZERO)); + } + + detect_partial_match_to(common, label); + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + + set_jumps(no_char1_match, LABEL()); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf) + { + set_jumps(no_match, LABEL()); + if (use_tmp) { - OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); - JUMPTO(SLJIT_NOT_ZERO, label); - OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + OP1(SLJIT_MOV, STR_PTR, 0, TMP3, 0); + OP1(SLJIT_MOV, base, offset0, TMP3, 0); } else - JUMPTO(SLJIT_JUMP, label); - - set_jumps(no_char1_match, LABEL()); + OP1(SLJIT_MOV, STR_PTR, 0, base, offset0); + } + else +#endif + { OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); set_jumps(no_match, LABEL()); OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); } + + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); } + BACKTRACK_AS(char_iterator_backtrack)->matchingpath = LABEL(); break; @@ -9234,12 +11556,12 @@ switch(opcode) allocate_stack(common, 1); OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); BACKTRACK_AS(char_iterator_backtrack)->matchingpath = LABEL(); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, 
SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); break; case OP_MINUPTO: - SLJIT_ASSERT(fast_str_ptr == 0); + SLJIT_ASSERT(early_fail_ptr == 0); if (private_data_ptr == 0) allocate_stack(common, 2); OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); @@ -9249,7 +11571,7 @@ switch(opcode) case OP_QUERY: case OP_MINQUERY: - SLJIT_ASSERT(fast_str_ptr == 0); + SLJIT_ASSERT(early_fail_ptr == 0); if (private_data_ptr == 0) allocate_stack(common, 1); OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); @@ -9262,63 +11584,112 @@ switch(opcode) break; case OP_POSSTAR: -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE + if (type == OP_ALLANY && !common->invalid_utf) +#else + if (type == OP_ALLANY) +#endif + { + OP1(SLJIT_MOV, STR_PTR, 0, STR_END, 0); + process_partial_match(common); + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_END, 0); + break; + } + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) { OP1(SLJIT_MOV, tmp_base, tmp_offset, STR_PTR, 0); + detect_partial_match(common, &no_match); label = LABEL(); - compile_char1_matchingpath(common, type, cc, &no_match, TRUE); + compile_char1_matchingpath(common, type, cc, &no_match, FALSE); OP1(SLJIT_MOV, tmp_base, tmp_offset, STR_PTR, 0); - JUMPTO(SLJIT_JUMP, label); + detect_partial_match_to(common, label); + set_jumps(no_match, LABEL()); OP1(SLJIT_MOV, STR_PTR, 0, tmp_base, tmp_offset); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); + if (early_fail_ptr != 0) + { + if (!HAS_VIRTUAL_REGISTERS && tmp_base == TMP3) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, TMP3, 0); + else + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); + } break; } #endif - label = LABEL(); + detect_partial_match(common, &no_match); + label = LABEL(); compile_char1_matchingpath(common, type, cc, &no_char1_match, FALSE); - JUMPTO(SLJIT_JUMP, label); + detect_partial_match_to(common, label); + 
OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + set_jumps(no_char1_match, LABEL()); OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); set_jumps(no_match, LABEL()); - if (fast_str_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), fast_str_ptr, STR_PTR, 0); + if (early_fail_ptr != 0) + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), early_fail_ptr, STR_PTR, 0); break; case OP_POSUPTO: - SLJIT_ASSERT(fast_str_ptr == 0); -#if defined SUPPORT_UTF && !defined COMPILE_PCRE32 + SLJIT_ASSERT(early_fail_ptr == 0); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 if (common->utf) { OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), POSSESSIVE1, STR_PTR, 0); OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max); + + detect_partial_match(common, &no_match); label = LABEL(); - compile_char1_matchingpath(common, type, cc, &no_match, TRUE); + compile_char1_matchingpath(common, type, cc, &no_match, FALSE); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), POSSESSIVE1, STR_PTR, 0); OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, SLJIT_IMM, 1); - JUMPTO(SLJIT_NOT_ZERO, label); + add_jump(compiler, &no_match, JUMP(SLJIT_ZERO)); + detect_partial_match_to(common, label); + set_jumps(no_match, LABEL()); OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), POSSESSIVE1); break; } #endif + + if (type == OP_ALLANY) + { + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(max)); + + if (common->mode == PCRE2_JIT_COMPLETE) + { + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_PTR, 0, STR_END, 0); + CMOV(SLJIT_GREATER, STR_PTR, STR_END, 0); + } + else + { + jump = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, STR_END, 0); + process_partial_match(common); + JUMPHERE(jump); + } + break; + } + OP1(SLJIT_MOV, tmp_base, tmp_offset, SLJIT_IMM, max); - label = LABEL(); + detect_partial_match(common, &no_match); + label = LABEL(); compile_char1_matchingpath(common, type, cc, &no_char1_match, FALSE); OP2(SLJIT_SUB | SLJIT_SET_Z, tmp_base, tmp_offset, tmp_base, tmp_offset, 
SLJIT_IMM, 1); - JUMPTO(SLJIT_NOT_ZERO, label); + add_jump(compiler, &no_match, JUMP(SLJIT_ZERO)); + detect_partial_match_to(common, label); OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + set_jumps(no_char1_match, LABEL()); OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); set_jumps(no_match, LABEL()); break; case OP_POSQUERY: - SLJIT_ASSERT(fast_str_ptr == 0); + SLJIT_ASSERT(early_fail_ptr == 0); OP1(SLJIT_MOV, tmp_base, tmp_offset, STR_PTR, 0); compile_char1_matchingpath(common, type, cc, &no_match, TRUE); OP1(SLJIT_MOV, tmp_base, tmp_offset, STR_PTR, 0); @@ -9335,7 +11706,7 @@ count_match(common); return end; } -static SLJIT_INLINE pcre_uchar *compile_fail_accept_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static SLJIT_INLINE PCRE2_SPTR compile_fail_accept_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; @@ -9348,6 +11719,9 @@ if (*cc == OP_FAIL) return cc + 1; } +if (*cc == OP_ACCEPT && common->currententry == NULL && (common->re->overall_options & PCRE2_ENDANCHORED) != 0) + add_jump(compiler, &common->reset_match, CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, STR_END, 0)); + if (*cc == OP_ASSERT_ACCEPT || common->currententry != NULL || !common->might_be_empty) { /* No need to check notempty conditions. 
*/ @@ -9362,15 +11736,24 @@ if (common->accept_label == NULL) add_jump(compiler, &common->accept, CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0))); else CMPTO(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0), common->accept_label); -OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); -OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, notempty)); -add_jump(compiler, &backtrack->topbacktracks, CMP(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); -OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, notempty_atstart)); + +if (HAS_VIRTUAL_REGISTERS) + { + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + OP1(SLJIT_MOV_U32, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, options)); + } +else + OP1(SLJIT_MOV_U32, TMP2, 0, SLJIT_MEM1(ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, options)); + +OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, PCRE2_NOTEMPTY); +add_jump(compiler, &backtrack->topbacktracks, JUMP(SLJIT_NOT_ZERO)); +OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, PCRE2_NOTEMPTY_ATSTART); if (common->accept_label == NULL) - add_jump(compiler, &common->accept, CMP(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, 0)); + add_jump(compiler, &common->accept, JUMP(SLJIT_ZERO)); else - CMPTO(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, 0, common->accept_label); -OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); + JUMPTO(SLJIT_ZERO, common->accept_label); + +OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(HAS_VIRTUAL_REGISTERS ? 
TMP1 : ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, str)); if (common->accept_label == NULL) add_jump(compiler, &common->accept, CMP(SLJIT_NOT_EQUAL, TMP2, 0, STR_PTR, 0)); else @@ -9379,7 +11762,7 @@ add_jump(compiler, &backtrack->topbacktracks, JUMP(SLJIT_JUMP)); return cc + 1; } -static SLJIT_INLINE pcre_uchar *compile_close_matchingpath(compiler_common *common, pcre_uchar *cc) +static SLJIT_INLINE PCRE2_SPTR compile_close_matchingpath(compiler_common *common, PCRE2_SPTR cc) { DEFINE_COMPILER; int offset = GET2(cc, 1); @@ -9398,14 +11781,15 @@ if (!optimized_cbracket) return cc + 1 + IMM2_SIZE; } -static SLJIT_INLINE pcre_uchar *compile_control_verb_matchingpath(compiler_common *common, pcre_uchar *cc, backtrack_common *parent) +static SLJIT_INLINE PCRE2_SPTR compile_control_verb_matchingpath(compiler_common *common, PCRE2_SPTR cc, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; -pcre_uchar opcode = *cc; -pcre_uchar *ccend = cc + 1; +PCRE2_UCHAR opcode = *cc; +PCRE2_SPTR ccend = cc + 1; -if (opcode == OP_PRUNE_ARG || opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG) +if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || + opcode == OP_SKIP_ARG || opcode == OP_THEN_ARG) ccend += 2 + cc[1]; PUSH_BACKTRACK(sizeof(backtrack_common), cc, NULL); @@ -9417,20 +11801,21 @@ if (opcode == OP_SKIP) return ccend; } -if (opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG) +if (opcode == OP_COMMIT_ARG || opcode == OP_PRUNE_ARG || opcode == OP_THEN_ARG) { - OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + if (HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2)); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->mark_ptr, TMP2, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, mark_ptr), TMP2, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(HAS_VIRTUAL_REGISTERS ? 
TMP1 : ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, mark_ptr), TMP2, 0); } return ccend; } -static pcre_uchar then_trap_opcode[1] = { OP_THEN_TRAP }; +static PCRE2_UCHAR then_trap_opcode[1] = { OP_THEN_TRAP }; -static SLJIT_INLINE void compile_then_trap_matchingpath(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, backtrack_common *parent) +static SLJIT_INLINE void compile_then_trap_matchingpath(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; @@ -9458,10 +11843,10 @@ OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(size - 3), TMP2, 0); size = BACKTRACK_AS(then_trap_backtrack)->framesize; if (size >= 0) - init_frame(common, cc, ccend, size - 1, 0, FALSE); + init_frame(common, cc, ccend, size - 1, 0); } -static void compile_matchingpath(compiler_common *common, pcre_uchar *cc, pcre_uchar *ccend, backtrack_common *parent) +static void compile_matchingpath(compiler_common *common, PCRE2_SPTR cc, PCRE2_SPTR ccend, backtrack_common *parent) { DEFINE_COMPILER; backtrack_common *backtrack; @@ -9530,7 +11915,7 @@ while (cc < ccend) case OP_CHAR: case OP_CHARI: - if (common->mode == JIT_COMPILE) + if (common->mode == PCRE2_JIT_COMPLETE) cc = compile_charn_matchingpath(common, cc, ccend, parent->top != NULL ? &parent->top->nextbacktracks : &parent->topbacktracks); else cc = compile_char1_matchingpath(common, *cc, cc + 1, parent->top != NULL ? &parent->top->nextbacktracks : &parent->topbacktracks, TRUE); @@ -9606,13 +11991,13 @@ while (cc < ccend) case OP_CLASS: case OP_NCLASS: - if (cc[1 + (32 / sizeof(pcre_uchar))] >= OP_CRSTAR && cc[1 + (32 / sizeof(pcre_uchar))] <= OP_CRPOSRANGE) + if (cc[1 + (32 / sizeof(PCRE2_UCHAR))] >= OP_CRSTAR && cc[1 + (32 / sizeof(PCRE2_UCHAR))] <= OP_CRPOSRANGE) cc = compile_iterator_matchingpath(common, cc, parent); else cc = compile_char1_matchingpath(common, *cc, cc + 1, parent->top != NULL ? 
&parent->top->nextbacktracks : &parent->topbacktracks, TRUE); break; -#if defined SUPPORT_UTF || defined COMPILE_PCRE16 || defined COMPILE_PCRE32 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH == 16 || PCRE2_CODE_UNIT_WIDTH == 32 case OP_XCLASS: if (*(cc + GET(cc, 1)) >= OP_CRSTAR && *(cc + GET(cc, 1)) <= OP_CRPOSRANGE) cc = compile_iterator_matchingpath(common, cc, parent); @@ -9649,6 +12034,7 @@ while (cc < ccend) break; case OP_CALLOUT: + case OP_CALLOUT_STR: cc = compile_callout_matchingpath(common, cc, parent); break; @@ -9678,8 +12064,10 @@ while (cc < ccend) count_match(common); break; + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_BRA: case OP_CBRA: case OP_COND: @@ -9712,11 +12100,12 @@ while (cc < ccend) SLJIT_ASSERT(common->mark_ptr != 0); OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), common->mark_ptr); allocate_stack(common, common->has_skip_arg ? 5 : 1); - OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); + if (HAS_VIRTUAL_REGISTERS) + OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(common->has_skip_arg ? 4 : 0), TMP2, 0); OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, (sljit_sw)(cc + 2)); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->mark_ptr, TMP2, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, mark_ptr), TMP2, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(HAS_VIRTUAL_REGISTERS ? 
TMP1 : ARGUMENTS), SLJIT_OFFSETOF(jit_arguments, mark_ptr), TMP2, 0); if (common->has_skip_arg) { OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr); @@ -9736,6 +12125,7 @@ while (cc < ccend) case OP_THEN: case OP_THEN_ARG: case OP_COMMIT: + case OP_COMMIT_ARG: cc = compile_control_verb_matchingpath(common, cc, parent); break; @@ -9790,14 +12180,14 @@ SLJIT_ASSERT(cc == ccend); static void compile_iterator_backtrackingpath(compiler_common *common, struct backtrack_common *current) { DEFINE_COMPILER; -pcre_uchar *cc = current->cc; -pcre_uchar opcode; -pcre_uchar type; +PCRE2_SPTR cc = current->cc; +PCRE2_UCHAR opcode; +PCRE2_UCHAR type; sljit_u32 max = 0, exact; struct sljit_label *label = NULL; struct sljit_jump *jump = NULL; jump_list *jumplist = NULL; -pcre_uchar *end; +PCRE2_SPTR end; int private_data_ptr = PRIVATE_DATA(cc); int base = (private_data_ptr == 0) ? SLJIT_MEM1(STACK_TOP) : SLJIT_MEM1(SLJIT_SP); int offset0 = (private_data_ptr == 0) ? STACK(0) : private_data_ptr; @@ -9832,14 +12222,14 @@ switch(opcode) if (CURRENT_AS(char_iterator_backtrack)->u.charpos.othercasebit != 0) OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, CURRENT_AS(char_iterator_backtrack)->u.charpos.othercasebit); CMPTO(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, CURRENT_AS(char_iterator_backtrack)->u.charpos.chr, CURRENT_AS(char_iterator_backtrack)->matchingpath); - skip_char_back(common); + move_back(common, NULL, TRUE); CMPTO(SLJIT_GREATER, STR_PTR, 0, TMP2, 0, label); } else { OP1(SLJIT_MOV, STR_PTR, 0, base, offset0); jump = CMP(SLJIT_LESS_EQUAL, STR_PTR, 0, base, offset1); - skip_char_back(common); + move_back(common, NULL, TRUE); OP1(SLJIT_MOV, base, offset0, STR_PTR, 0); JUMPTO(SLJIT_JUMP, CURRENT_AS(char_iterator_backtrack)->matchingpath); } @@ -9918,9 +12308,9 @@ set_jumps(current->topbacktracks, LABEL()); static SLJIT_INLINE void compile_ref_iterator_backtrackingpath(compiler_common *common, struct backtrack_common *current) { DEFINE_COMPILER; -pcre_uchar *cc = 
current->cc; +PCRE2_SPTR cc = current->cc; BOOL ref = (*cc == OP_REF || *cc == OP_REFI); -pcre_uchar type; +PCRE2_UCHAR type; type = cc[ref ? 1 + IMM2_SIZE : 1 + 2 * IMM2_SIZE]; @@ -9943,34 +12333,28 @@ free_stack(common, ref ? 2 : 3); static SLJIT_INLINE void compile_recurse_backtrackingpath(compiler_common *common, struct backtrack_common *current) { DEFINE_COMPILER; +recurse_entry *entry; -if (CURRENT_AS(recurse_backtrack)->inlined_pattern) - compile_backtrackingpath(common, current->top); -set_jumps(current->topbacktracks, LABEL()); -if (CURRENT_AS(recurse_backtrack)->inlined_pattern) - return; - -if (common->has_set_som && common->mark_ptr != 0) - { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(0)); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(1)); - free_stack(common, 2); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), OVECTOR(0), TMP2, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->mark_ptr, TMP1, 0); - } -else if (common->has_set_som || common->mark_ptr != 0) +if (!CURRENT_AS(recurse_backtrack)->inlined_pattern) { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(0)); - free_stack(common, 1); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->has_set_som ? 
(int)(OVECTOR(0)) : common->mark_ptr, TMP2, 0); + entry = CURRENT_AS(recurse_backtrack)->entry; + if (entry->backtrack_label == NULL) + add_jump(compiler, &entry->backtrack_calls, JUMP(SLJIT_FAST_CALL)); + else + JUMPTO(SLJIT_FAST_CALL, entry->backtrack_label); + CMPTO(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 0, CURRENT_AS(recurse_backtrack)->matchingpath); } +else + compile_backtrackingpath(common, current->top); + +set_jumps(current->topbacktracks, LABEL()); } static void compile_assert_backtrackingpath(compiler_common *common, struct backtrack_common *current) { DEFINE_COMPILER; -pcre_uchar *cc = current->cc; -pcre_uchar bra = OP_BRA; +PCRE2_SPTR cc = current->cc; +PCRE2_UCHAR bra = OP_BRA; struct sljit_jump *brajump = NULL; SLJIT_ASSERT(*cc != OP_BRAMINZERO); @@ -10016,7 +12400,9 @@ if (*cc == OP_ASSERT || *cc == OP_ASSERTBACK) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), CURRENT_AS(assert_backtrack)->private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), CURRENT_AS(assert_backtrack)->private_data_ptr, SLJIT_MEM1(STACK_TOP), STACK(-CURRENT_AS(assert_backtrack)->framesize - 1)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(-2)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (CURRENT_AS(assert_backtrack)->framesize - 1) * sizeof(sljit_sw)); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), CURRENT_AS(assert_backtrack)->private_data_ptr, TMP1, 0); set_jumps(current->topbacktracks, LABEL()); } @@ -10040,22 +12426,21 @@ int opcode, stacksize, alt_count, alt_max; int offset = 0; int private_data_ptr = CURRENT_AS(bracket_backtrack)->private_data_ptr; int repeat_ptr = 0, repeat_type = 0, repeat_count = 0; -pcre_uchar *cc = current->cc; -pcre_uchar *ccbegin; -pcre_uchar *ccprev; -pcre_uchar bra = OP_BRA; -pcre_uchar ket; +PCRE2_SPTR cc = current->cc; +PCRE2_SPTR ccbegin; +PCRE2_SPTR ccprev; +PCRE2_UCHAR bra = OP_BRA; +PCRE2_UCHAR ket; assert_backtrack *assert; -sljit_uw *next_update_addr = 
NULL; BOOL has_alternatives; BOOL needs_control_head = FALSE; struct sljit_jump *brazero = NULL; -struct sljit_jump *alt1 = NULL; -struct sljit_jump *alt2 = NULL; +struct sljit_jump *next_alt = NULL; struct sljit_jump *once = NULL; struct sljit_jump *cond = NULL; struct sljit_label *rmin_label = NULL; struct sljit_label *exact_label = NULL; +struct sljit_put_label *put_label = NULL; if (*cc == OP_BRAZERO || *cc == OP_BRAMINZERO) { @@ -10086,8 +12471,6 @@ if (opcode == OP_CBRA || opcode == OP_SCBRA) offset = (GET2(ccbegin, 1 + LINK_SIZE)) << 1; if (SLJIT_UNLIKELY(opcode == OP_COND) && (*cc == OP_KETRMAX || *cc == OP_KETRMIN)) opcode = OP_SCOND; -if (SLJIT_UNLIKELY(opcode == OP_ONCE_NC)) - opcode = OP_ONCE; alt_max = has_alternatives ? no_alternatives(ccbegin) : 0; @@ -10193,6 +12576,7 @@ if (SLJIT_UNLIKELY(opcode == OP_ONCE)) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (CURRENT_AS(bracket_backtrack)->u.framesize - 1) * sizeof(sljit_sw)); } once = JUMP(SLJIT_JUMP); } @@ -10205,7 +12589,7 @@ else if (SLJIT_UNLIKELY(opcode == OP_COND) || SLJIT_UNLIKELY(opcode == OP_SCOND) free_stack(common, 1); alt_max = 2; - alt1 = CMP(SLJIT_EQUAL, TMP1, 0, SLJIT_IMM, sizeof(sljit_uw)); + next_alt = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 0); } } else if (has_alternatives) @@ -10213,21 +12597,16 @@ else if (has_alternatives) OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(0)); free_stack(common, 1); - if (alt_max > 4) + if (alt_max > 3) { - /* Table jump if alt_max is greater than 4. 
*/ - next_update_addr = allocate_read_only_data(common, alt_max * sizeof(sljit_uw)); - if (SLJIT_UNLIKELY(next_update_addr == NULL)) - return; - sljit_emit_ijump(compiler, SLJIT_JUMP, SLJIT_MEM1(TMP1), (sljit_sw)next_update_addr); - add_label_addr(common, next_update_addr++); + sljit_emit_ijump(compiler, SLJIT_JUMP, TMP1, 0); + + SLJIT_ASSERT(CURRENT_AS(bracket_backtrack)->u.matching_put_label); + sljit_set_put_label(CURRENT_AS(bracket_backtrack)->u.matching_put_label, LABEL()); + sljit_emit_op0(compiler, SLJIT_ENDBR); } else - { - if (alt_max == 4) - alt2 = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 2 * sizeof(sljit_uw)); - alt1 = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, sizeof(sljit_uw)); - } + next_alt = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 0); } COMPILE_BACKTRACKINGPATH(current->top); @@ -10245,7 +12624,9 @@ if (SLJIT_UNLIKELY(opcode == OP_COND) || SLJIT_UNLIKELY(opcode == OP_SCOND)) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), assert->private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), assert->private_data_ptr, SLJIT_MEM1(STACK_TOP), STACK(-assert->framesize - 1)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(-2)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (assert->framesize - 1) * sizeof(sljit_sw)); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), assert->private_data_ptr, TMP1, 0); } cond = JUMP(SLJIT_JUMP); set_jumps(CURRENT_AS(bracket_backtrack)->u.assert->condfailed, LABEL()); @@ -10262,7 +12643,7 @@ if (SLJIT_UNLIKELY(opcode == OP_COND) || SLJIT_UNLIKELY(opcode == OP_SCOND)) if (has_alternatives) { - alt_count = sizeof(sljit_uw); + alt_count = 1; do { current->top = NULL; @@ -10288,6 +12669,12 @@ if (has_alternatives) compile_matchingpath(common, ccprev, cc, current); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) return; + + if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA) + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), 
private_data_ptr); + + if (opcode == OP_SCRIPT_RUN) + match_script_run_common(common, private_data_ptr, current); } /* Instructions after the current alternative is successfully matched. */ @@ -10338,7 +12725,12 @@ if (has_alternatives) stacksize = match_capture_common(common, stacksize, offset, private_data_ptr); if (opcode != OP_ONCE) - OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(stacksize), SLJIT_IMM, alt_count); + { + if (alt_max <= 3) + OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(stacksize), SLJIT_IMM, alt_count); + else + put_label = sljit_emit_put_label(compiler, SLJIT_MEM1(STACK_TOP), STACK(stacksize)); + } if (offset != 0 && ket == OP_KETRMAX && common->optimized_cbracket[offset >> 1] != 0) { @@ -10351,24 +12743,21 @@ if (has_alternatives) if (opcode != OP_ONCE) { - if (alt_max > 4) - add_label_addr(common, next_update_addr++); - else + if (alt_max <= 3) { - if (alt_count != 2 * sizeof(sljit_uw)) - { - JUMPHERE(alt1); - if (alt_max == 3 && alt_count == sizeof(sljit_uw)) - alt2 = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 2 * sizeof(sljit_uw)); - } - else + JUMPHERE(next_alt); + alt_count++; + if (alt_count < alt_max) { - JUMPHERE(alt2); - if (alt_max == 4) - alt1 = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 3 * sizeof(sljit_uw)); + SLJIT_ASSERT(alt_count == 2 && alt_max == 3); + next_alt = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 1); } } - alt_count += sizeof(sljit_uw); + else + { + sljit_set_put_label(put_label, LABEL()); + sljit_emit_op0(compiler, SLJIT_ENDBR); + } } COMPILE_BACKTRACKINGPATH(current->top); @@ -10386,7 +12775,9 @@ if (has_alternatives) { OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), assert->private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), assert->private_data_ptr, SLJIT_MEM1(STACK_TOP), STACK(-assert->framesize - 1)); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(-2)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (assert->framesize - 1) 
* sizeof(sljit_sw)); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), assert->private_data_ptr, TMP1, 0); } JUMPHERE(cond); } @@ -10414,7 +12805,7 @@ if (offset != 0) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), private_data_ptr, TMP1, 0); } } -else if (opcode == OP_SBRA || opcode == OP_SCOND) +else if (opcode == OP_ASSERT_NA || opcode == OP_ASSERTBACK_NA || opcode == OP_SCRIPT_RUN || opcode == OP_SBRA || opcode == OP_SCOND) { OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), private_data_ptr, SLJIT_MEM1(STACK_TOP), STACK(0)); free_stack(common, 1); @@ -10522,6 +12913,7 @@ if (CURRENT_AS(bracketpos_backtrack)->framesize < 0) OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), CURRENT_AS(bracketpos_backtrack)->private_data_ptr); add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); +OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (CURRENT_AS(bracketpos_backtrack)->framesize - 1) * sizeof(sljit_sw)); if (current->topbacktracks) { @@ -10561,7 +12953,7 @@ SLJIT_ASSERT(!current->nextbacktracks && !current->topbacktracks); static SLJIT_INLINE void compile_control_verb_backtrackingpath(compiler_common *common, struct backtrack_common *current) { DEFINE_COMPILER; -pcre_uchar opcode = *current->cc; +PCRE2_UCHAR opcode = *current->cc; struct sljit_label *loop; struct sljit_jump *jump; @@ -10584,15 +12976,16 @@ if (opcode == OP_THEN || opcode == OP_THEN_ARG) add_jump(compiler, &common->then_trap->quit, JUMP(SLJIT_JUMP)); return; } - else if (common->positive_assert) + else if (!common->local_quit_available && common->in_positive_assertion) { - add_jump(compiler, &common->positive_assert_quit, JUMP(SLJIT_JUMP)); + add_jump(compiler, &common->positive_assertion_quit, JUMP(SLJIT_JUMP)); return; } } -if (common->local_exit) +if (common->local_quit_available) { + /* Abort match with a fail. 
*/ if (common->quit_label == NULL) add_jump(compiler, &common->quit, JUMP(SLJIT_JUMP)); else @@ -10602,15 +12995,13 @@ if (common->local_exit) if (opcode == OP_SKIP_ARG) { - SLJIT_ASSERT(common->control_head_ptr != 0); + SLJIT_ASSERT(common->control_head_ptr != 0 && TMP1 == SLJIT_R0 && STR_PTR == SLJIT_R1); OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STACK_TOP, 0); - OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_IMM, (sljit_sw)(current->cc + 2)); + OP1(SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, (sljit_sw)(current->cc + 2)); sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW), SLJIT_IMM, SLJIT_FUNC_OFFSET(do_search_mark)); - OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); - OP1(SLJIT_MOV, STR_PTR, 0, TMP1, 0); - add_jump(compiler, &common->reset_match, CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, SLJIT_IMM, 0)); + OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_R0, 0); + add_jump(compiler, &common->reset_match, CMP(SLJIT_NOT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0)); return; } @@ -10643,7 +13034,10 @@ jump = JUMP(SLJIT_JUMP); set_jumps(CURRENT_AS(then_trap_backtrack)->quit, LABEL()); /* STACK_TOP is set by THEN. 
*/ if (CURRENT_AS(then_trap_backtrack)->framesize >= 0) + { add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); + OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (CURRENT_AS(then_trap_backtrack)->framesize - 1) * sizeof(sljit_sw)); + } OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(0)); free_stack(common, 3); @@ -10735,7 +13129,7 @@ while (current) case OP_TYPEPOSUPTO: case OP_CLASS: case OP_NCLASS: -#if defined SUPPORT_UTF || !defined COMPILE_PCRE8 +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 case OP_XCLASS: #endif compile_iterator_backtrackingpath(common, current); @@ -10759,8 +13153,10 @@ while (current) compile_assert_backtrackingpath(common, current); break; + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_BRA: case OP_CBRA: case OP_COND: @@ -10809,8 +13205,9 @@ while (current) break; case OP_COMMIT: - if (!common->local_exit) - OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_NOMATCH); + case OP_COMMIT_ARG: + if (!common->local_quit_available) + OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH); if (common->quit_label == NULL) add_jump(compiler, &common->quit, JUMP(SLJIT_JUMP)); else @@ -10818,6 +13215,7 @@ while (current) break; case OP_CALLOUT: + case OP_CALLOUT_STR: case OP_FAIL: case OP_ACCEPT: case OP_ASSERT_ACCEPT: @@ -10841,42 +13239,55 @@ common->then_trap = save_then_trap; static SLJIT_INLINE void compile_recurse(compiler_common *common) { DEFINE_COMPILER; -pcre_uchar *cc = common->start + common->currententry->start; -pcre_uchar *ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 0 : IMM2_SIZE); -pcre_uchar *ccend = bracketend(cc) - (1 + LINK_SIZE); +PCRE2_SPTR cc = common->start + common->currententry->start; +PCRE2_SPTR ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 
0 : IMM2_SIZE); +PCRE2_SPTR ccend = bracketend(cc) - (1 + LINK_SIZE); BOOL needs_control_head; -int framesize = get_framesize(common, cc, NULL, TRUE, &needs_control_head); -int private_data_size = get_private_data_copy_length(common, ccbegin, ccend, needs_control_head); -int alternativesize; -BOOL needs_frame; +BOOL has_quit; +BOOL has_accept; +int private_data_size = get_recurse_data_length(common, ccbegin, ccend, &needs_control_head, &has_quit, &has_accept); +int alt_count, alt_max, local_size; backtrack_common altbacktrack; -struct sljit_jump *jump; +jump_list *match = NULL; +struct sljit_jump *next_alt = NULL; +struct sljit_jump *accept_exit = NULL; +struct sljit_label *quit; +struct sljit_put_label *put_label = NULL; /* Recurse captures then. */ common->then_trap = NULL; SLJIT_ASSERT(*cc == OP_BRA || *cc == OP_CBRA || *cc == OP_CBRAPOS || *cc == OP_SCBRA || *cc == OP_SCBRAPOS); -needs_frame = framesize >= 0; -if (!needs_frame) - framesize = 0; -alternativesize = *(cc + GET(cc, 1)) == OP_ALT ? 1 : 0; -SLJIT_ASSERT(common->currententry->entry == NULL && common->recursive_head_ptr != 0); -common->currententry->entry = LABEL(); -set_jumps(common->currententry->calls, common->currententry->entry); +alt_max = no_alternatives(cc); +alt_count = 0; + +/* Matching path. */ +SLJIT_ASSERT(common->currententry->entry_label == NULL && common->recursive_head_ptr != 0); +common->currententry->entry_label = LABEL(); +set_jumps(common->currententry->entry_calls, common->currententry->entry_label); sljit_emit_fast_enter(compiler, TMP2, 0); count_match(common); -allocate_stack(common, private_data_size + framesize + alternativesize); -OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(private_data_size + framesize + alternativesize - 1), TMP2, 0); -copy_private_data(common, ccbegin, ccend, TRUE, framesize + alternativesize, private_data_size + framesize + alternativesize, needs_control_head); + +local_size = (alt_max > 1) ? 
2 : 1; + +/* (Reversed) stack layout: + [private data][return address][optional: str ptr] ... [optional: alternative index][recursive_head_ptr] */ + +allocate_stack(common, private_data_size + local_size); +/* Save return address. */ +OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(local_size - 1), TMP2, 0); + +copy_recurse_data(common, ccbegin, ccend, recurse_copy_from_global, local_size, private_data_size + local_size, has_quit); + +/* This variable is saved and restored all time when we enter or exit from a recursive context. */ +OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr, STACK_TOP, 0); + if (needs_control_head) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_IMM, 0); -OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr, STACK_TOP, 0); -if (needs_frame) - init_frame(common, cc, NULL, framesize + alternativesize - 1, alternativesize, TRUE); -if (alternativesize > 0) +if (alt_max > 1) OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0); memset(&altbacktrack, 0, sizeof(backtrack_common)); @@ -10898,7 +13309,70 @@ while (1) if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) return; - add_jump(compiler, &common->accept, JUMP(SLJIT_JUMP)); + allocate_stack(common, (alt_max > 1 || has_accept) ? 2 : 1); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr); + + if (alt_max > 1 || has_accept) + { + if (alt_max > 3) + put_label = sljit_emit_put_label(compiler, SLJIT_MEM1(STACK_TOP), STACK(1)); + else + OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), SLJIT_IMM, alt_count); + } + + add_jump(compiler, &match, JUMP(SLJIT_JUMP)); + + if (alt_count == 0) + { + /* Backtracking path entry. 
*/ + SLJIT_ASSERT(common->currententry->backtrack_label == NULL); + common->currententry->backtrack_label = LABEL(); + set_jumps(common->currententry->backtrack_calls, common->currententry->backtrack_label); + + sljit_emit_fast_enter(compiler, TMP1, 0); + + if (has_accept) + accept_exit = CMP(SLJIT_EQUAL, SLJIT_MEM1(STACK_TOP), STACK(1), SLJIT_IMM, -1); + + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(0)); + /* Save return address. */ + OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), STACK(local_size - 1), TMP1, 0); + + copy_recurse_data(common, ccbegin, ccend, recurse_swap_global, local_size, private_data_size + local_size, has_quit); + + if (alt_max > 1) + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(STACK_TOP), STACK(1)); + free_stack(common, 2); + + if (alt_max > 3) + { + sljit_emit_ijump(compiler, SLJIT_JUMP, TMP1, 0); + sljit_set_put_label(put_label, LABEL()); + sljit_emit_op0(compiler, SLJIT_ENDBR); + } + else + next_alt = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 0); + } + else + free_stack(common, has_accept ? 2 : 1); + } + else if (alt_max > 3) + { + sljit_set_put_label(put_label, LABEL()); + sljit_emit_op0(compiler, SLJIT_ENDBR); + } + else + { + JUMPHERE(next_alt); + if (alt_count + 1 < alt_max) + { + SLJIT_ASSERT(alt_count == 1 && alt_max == 3); + next_alt = CMP(SLJIT_NOT_EQUAL, TMP1, 0, SLJIT_IMM, 1); + } + } + + alt_count++; compile_backtrackingpath(common, altbacktrack.top); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) @@ -10912,76 +13386,88 @@ while (1) cc += GET(cc, 1); } -/* None of them matched. */ -OP1(SLJIT_MOV, TMP3, 0, SLJIT_IMM, 0); -jump = JUMP(SLJIT_JUMP); +/* No alternative is matched. 
*/ + +quit = LABEL(); + +copy_recurse_data(common, ccbegin, ccend, recurse_copy_private_to_global, local_size, private_data_size + local_size, has_quit); + +OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(local_size - 1)); +free_stack(common, private_data_size + local_size); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); +OP_SRC(SLJIT_FAST_RETURN, TMP2, 0); if (common->quit != NULL) { + SLJIT_ASSERT(has_quit); + set_jumps(common->quit, LABEL()); OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr); - if (needs_frame) - { - OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize + alternativesize) * sizeof(sljit_sw)); - add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize + alternativesize) * sizeof(sljit_sw)); - } - OP1(SLJIT_MOV, TMP3, 0, SLJIT_IMM, 0); - common->quit = NULL; - add_jump(compiler, &common->quit, JUMP(SLJIT_JUMP)); + copy_recurse_data(common, ccbegin, ccend, recurse_copy_shared_to_global, local_size, private_data_size + local_size, has_quit); + JUMPTO(SLJIT_JUMP, quit); } -set_jumps(common->accept, LABEL()); -OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr); -if (needs_frame) +if (has_accept) { - OP2(SLJIT_ADD, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize + alternativesize) * sizeof(sljit_sw)); - add_jump(compiler, &common->revertframes, JUMP(SLJIT_FAST_CALL)); - OP2(SLJIT_SUB, STACK_TOP, 0, STACK_TOP, 0, SLJIT_IMM, (framesize + alternativesize) * sizeof(sljit_sw)); - } -OP1(SLJIT_MOV, TMP3, 0, SLJIT_IMM, 1); + JUMPHERE(accept_exit); + free_stack(common, 2); -JUMPHERE(jump); -if (common->quit != NULL) - set_jumps(common->quit, LABEL()); -copy_private_data(common, ccbegin, ccend, FALSE, framesize + alternativesize, private_data_size + framesize + alternativesize, needs_control_head); -free_stack(common, private_data_size + framesize + alternativesize); -if (needs_control_head) - { - OP1(SLJIT_MOV, TMP1, 
0, SLJIT_MEM1(STACK_TOP), STACK(-3)); - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(-2)); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr, TMP1, 0); - OP1(SLJIT_MOV, TMP1, 0, TMP3, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, TMP2, 0); + /* Save return address. */ + OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(local_size - 1), TMP1, 0); + + copy_recurse_data(common, ccbegin, ccend, recurse_copy_kept_shared_to_global, local_size, private_data_size + local_size, has_quit); + + OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(local_size - 1)); + free_stack(common, private_data_size + local_size); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 0); + OP_SRC(SLJIT_FAST_RETURN, TMP2, 0); } -else + +if (common->accept != NULL) { - OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(STACK_TOP), STACK(-2)); - OP1(SLJIT_MOV, TMP1, 0, TMP3, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr, TMP2, 0); + SLJIT_ASSERT(has_accept); + + set_jumps(common->accept, LABEL()); + + OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), common->recursive_head_ptr); + OP1(SLJIT_MOV, TMP2, 0, STACK_TOP, 0); + + allocate_stack(common, 2); + OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), SLJIT_IMM, -1); } -sljit_emit_fast_return(compiler, SLJIT_MEM1(STACK_TOP), STACK(-1)); + +set_jumps(match, LABEL()); + +OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), TMP2, 0); + +copy_recurse_data(common, ccbegin, ccend, recurse_swap_global, local_size, private_data_size + local_size, has_quit); + +OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP2), STACK(local_size - 1)); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, 1); +OP_SRC(SLJIT_FAST_RETURN, TMP2, 0); } #undef COMPILE_BACKTRACKINGPATH #undef CURRENT_AS -void -PRIV(jit_compile)(const REAL_PCRE *re, PUBL(extra) *extra, int mode) +#define PUBLIC_JIT_COMPILE_CONFIGURATION_OPTIONS \ + (PCRE2_JIT_INVALID_UTF) + +static int jit_compile(pcre2_code *code, sljit_u32 mode) { +pcre2_real_code *re = (pcre2_real_code *)code; struct 
sljit_compiler *compiler; backtrack_common rootbacktrack; compiler_common common_data; compiler_common *common = &common_data; const sljit_u8 *tables = re->tables; -pcre_study_data *study; +void *allocator_data = &re->memctl; int private_data_size; -pcre_uchar *ccend; +PCRE2_SPTR ccend; executable_functions *functions; void *executable_func; sljit_uw executable_size; sljit_uw total_length; -label_addr_list *label_addr; struct sljit_label *mainloop_label = NULL; struct sljit_label *continue_match_label; struct sljit_label *empty_match_found_label = NULL; @@ -10990,51 +13476,55 @@ struct sljit_label *reset_match_label; struct sljit_label *quit_label; struct sljit_jump *jump; struct sljit_jump *minlength_check_failed = NULL; -struct sljit_jump *reqbyte_notfound = NULL; struct sljit_jump *empty_match = NULL; +struct sljit_jump *end_anchor_failed = NULL; +jump_list *reqcu_not_found = NULL; -SLJIT_ASSERT((extra->flags & PCRE_EXTRA_STUDY_DATA) != 0); -study = extra->study_data; +SLJIT_ASSERT(tables); -if (!tables) - tables = PRIV(default_tables); +#if HAS_VIRTUAL_REGISTERS == 1 +SLJIT_ASSERT(sljit_get_register_index(TMP3) < 0 && sljit_get_register_index(ARGUMENTS) < 0 && sljit_get_register_index(RETURN_ADDR) < 0); +#elif HAS_VIRTUAL_REGISTERS == 0 +SLJIT_ASSERT(sljit_get_register_index(TMP3) >= 0 && sljit_get_register_index(ARGUMENTS) >= 0 && sljit_get_register_index(RETURN_ADDR) >= 0); +#else +#error "Invalid value for HAS_VIRTUAL_REGISTERS" +#endif memset(&rootbacktrack, 0, sizeof(backtrack_common)); memset(common, 0, sizeof(compiler_common)); -rootbacktrack.cc = (pcre_uchar *)re + re->name_table_offset + re->name_count * re->name_entry_size; +common->re = re; +common->name_table = (PCRE2_SPTR)((uint8_t *)re + sizeof(pcre2_real_code)); +rootbacktrack.cc = common->name_table + re->name_count * re->name_entry_size; + +#ifdef SUPPORT_UNICODE +common->invalid_utf = (mode & PCRE2_JIT_INVALID_UTF) != 0; +#endif /* SUPPORT_UNICODE */ +mode &= 
~PUBLIC_JIT_COMPILE_CONFIGURATION_OPTIONS; common->start = rootbacktrack.cc; common->read_only_data_head = NULL; common->fcc = tables + fcc_offset; common->lcc = (sljit_sw)(tables + lcc_offset); common->mode = mode; -common->might_be_empty = study->minlength == 0; +common->might_be_empty = (re->minlength == 0) || (re->flags & PCRE2_MATCH_EMPTY); +common->allow_empty_partial = (re->max_lookbehind > 0) || (re->flags & PCRE2_MATCH_EMPTY); common->nltype = NLTYPE_FIXED; -switch(re->options & PCRE_NEWLINE_BITS) +switch(re->newline_convention) { - case 0: - /* Compile-time default */ - switch(NEWLINE) - { - case -1: common->newline = (CHAR_CR << 8) | CHAR_NL; common->nltype = NLTYPE_ANY; break; - case -2: common->newline = (CHAR_CR << 8) | CHAR_NL; common->nltype = NLTYPE_ANYCRLF; break; - default: common->newline = NEWLINE; break; - } - break; - case PCRE_NEWLINE_CR: common->newline = CHAR_CR; break; - case PCRE_NEWLINE_LF: common->newline = CHAR_NL; break; - case PCRE_NEWLINE_CR+ - PCRE_NEWLINE_LF: common->newline = (CHAR_CR << 8) | CHAR_NL; break; - case PCRE_NEWLINE_ANY: common->newline = (CHAR_CR << 8) | CHAR_NL; common->nltype = NLTYPE_ANY; break; - case PCRE_NEWLINE_ANYCRLF: common->newline = (CHAR_CR << 8) | CHAR_NL; common->nltype = NLTYPE_ANYCRLF; break; - default: return; + case PCRE2_NEWLINE_CR: common->newline = CHAR_CR; break; + case PCRE2_NEWLINE_LF: common->newline = CHAR_NL; break; + case PCRE2_NEWLINE_CRLF: common->newline = (CHAR_CR << 8) | CHAR_NL; break; + case PCRE2_NEWLINE_ANY: common->newline = (CHAR_CR << 8) | CHAR_NL; common->nltype = NLTYPE_ANY; break; + case PCRE2_NEWLINE_ANYCRLF: common->newline = (CHAR_CR << 8) | CHAR_NL; common->nltype = NLTYPE_ANYCRLF; break; + case PCRE2_NEWLINE_NUL: common->newline = CHAR_NUL; break; + default: return PCRE2_ERROR_INTERNAL; } common->nlmax = READ_CHAR_MAX; common->nlmin = 0; -if ((re->options & PCRE_BSR_ANYCRLF) != 0) - common->bsr_nltype = NLTYPE_ANYCRLF; -else if ((re->options & PCRE_BSR_UNICODE) != 0) 
+if (re->bsr_convention == PCRE2_BSR_UNICODE) common->bsr_nltype = NLTYPE_ANY; +else if (re->bsr_convention == PCRE2_BSR_ANYCRLF) + common->bsr_nltype = NLTYPE_ANYCRLF; else { #ifdef BSR_ANYCRLF @@ -11045,18 +13535,16 @@ else } common->bsr_nlmax = READ_CHAR_MAX; common->bsr_nlmin = 0; -common->endonly = (re->options & PCRE_DOLLAR_ENDONLY) != 0; +common->endonly = (re->overall_options & PCRE2_DOLLAR_ENDONLY) != 0; common->ctypes = (sljit_sw)(tables + ctypes_offset); -common->name_table = ((pcre_uchar *)re) + re->name_table_offset; common->name_count = re->name_count; common->name_entry_size = re->name_entry_size; -common->jscript_compat = (re->options & PCRE_JAVASCRIPT_COMPAT) != 0; -#ifdef SUPPORT_UTF +common->unset_backref = (re->overall_options & PCRE2_MATCH_UNSET_BACKREF) != 0; +common->alt_circumflex = (re->overall_options & PCRE2_ALT_CIRCUMFLEX) != 0; +#ifdef SUPPORT_UNICODE /* PCRE_UTF[16|32] have the same value as PCRE_UTF8. */ -common->utf = (re->options & PCRE_UTF8) != 0; -#ifdef SUPPORT_UCP -common->use_ucp = (re->options & PCRE_UCP) != 0; -#endif +common->utf = (re->overall_options & PCRE2_UTF) != 0; +common->ucp = (re->overall_options & PCRE2_UCP) != 0; if (common->utf) { if (common->nltype == NLTYPE_ANY) @@ -11080,14 +13568,16 @@ if (common->utf) common->bsr_nlmax = (CHAR_CR > CHAR_NL) ? CHAR_CR : CHAR_NL; common->bsr_nlmin = (CHAR_CR < CHAR_NL) ? CHAR_CR : CHAR_NL; } -#endif /* SUPPORT_UTF */ +else + common->invalid_utf = FALSE; +#endif /* SUPPORT_UNICODE */ ccend = bracketend(common->start); /* Calculate the local space size on the stack. 
*/ common->ovector_start = LIMIT_MATCH + sizeof(sljit_sw); -common->optimized_cbracket = (sljit_u8 *)SLJIT_MALLOC(re->top_bracket + 1, compiler->allocator_data); +common->optimized_cbracket = (sljit_u8 *)SLJIT_MALLOC(re->top_bracket + 1, allocator_data); if (!common->optimized_cbracket) - return; + return PCRE2_ERROR_NOMEMORY; #if defined DEBUG_FORCE_UNOPTIMIZED_CBRAS && DEBUG_FORCE_UNOPTIMIZED_CBRAS == 1 memset(common->optimized_cbracket, 0, re->top_bracket + 1); #else @@ -11101,27 +13591,27 @@ common->ovector_start += sizeof(sljit_sw); #endif if (!check_opcode_types(common, common->start, ccend)) { - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - return; + SLJIT_FREE(common->optimized_cbracket, allocator_data); + return PCRE2_ERROR_NOMEMORY; } /* Checking flags and updating ovector_start. */ -if (mode == JIT_COMPILE && (re->flags & PCRE_REQCHSET) != 0 && (re->options & PCRE_NO_START_OPTIMIZE) == 0) +if (mode == PCRE2_JIT_COMPLETE && (re->flags & PCRE2_LASTSET) != 0 && (re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0) { common->req_char_ptr = common->ovector_start; common->ovector_start += sizeof(sljit_sw); } -if (mode != JIT_COMPILE) +if (mode != PCRE2_JIT_COMPLETE) { common->start_used_ptr = common->ovector_start; common->ovector_start += sizeof(sljit_sw); - if (mode == JIT_PARTIAL_SOFT_COMPILE) + if (mode == PCRE2_JIT_PARTIAL_SOFT) { common->hit_start = common->ovector_start; - common->ovector_start += 2 * sizeof(sljit_sw); + common->ovector_start += sizeof(sljit_sw); } } -if ((re->options & PCRE_FIRSTLINE) != 0) +if ((re->overall_options & (PCRE2_FIRSTLINE | PCRE2_USE_OFFSET_LIMIT)) != 0) { common->match_end_ptr = common->ovector_start; common->ovector_start += sizeof(sljit_sw); @@ -11156,29 +13646,28 @@ SLJIT_ASSERT(!(common->req_char_ptr != 0 && common->start_used_ptr != 0)); common->cbra_ptr = OVECTOR_START + (re->top_bracket + 1) * 2 * sizeof(sljit_sw); total_length = ccend - common->start; -common->private_data_ptrs = (sljit_s32 
*)SLJIT_MALLOC(total_length * (sizeof(sljit_s32) + (common->has_then ? 1 : 0)), compiler->allocator_data); +common->private_data_ptrs = (sljit_s32 *)SLJIT_MALLOC(total_length * (sizeof(sljit_s32) + (common->has_then ? 1 : 0)), allocator_data); if (!common->private_data_ptrs) { - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - return; + SLJIT_FREE(common->optimized_cbracket, allocator_data); + return PCRE2_ERROR_NOMEMORY; } memset(common->private_data_ptrs, 0, total_length * sizeof(sljit_s32)); private_data_size = common->cbra_ptr + (re->top_bracket + 1) * sizeof(sljit_sw); + +if ((re->overall_options & PCRE2_ANCHORED) == 0 && (re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0 && !common->has_skip_in_assert_back) + detect_early_fail(common, common->start, &private_data_size, 0, 0); + set_private_data_ptrs(common, &private_data_size, ccend); -if ((re->options & PCRE_ANCHORED) == 0 && (re->options & PCRE_NO_START_OPTIMIZE) == 0) - { - if (!detect_fast_forward_skip(common, &private_data_size) && !common->has_skip_in_assert_back) - detect_fast_fail(common, common->start, &private_data_size, 4); - } -SLJIT_ASSERT(common->fast_fail_start_ptr <= common->fast_fail_end_ptr); +SLJIT_ASSERT(common->early_fail_start_ptr <= common->early_fail_end_ptr); if (private_data_size > SLJIT_MAX_LOCAL_SIZE) { - SLJIT_FREE(common->private_data_ptrs, compiler->allocator_data); - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - return; + SLJIT_FREE(common->private_data_ptrs, allocator_data); + SLJIT_FREE(common->optimized_cbracket, allocator_data); + return PCRE2_ERROR_NOMEMORY; } if (common->has_then) @@ -11188,12 +13677,12 @@ if (common->has_then) set_then_offsets(common, common->start, NULL); } -compiler = sljit_create_compiler(NULL); +compiler = sljit_create_compiler(allocator_data, NULL); if (!compiler) { - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - SLJIT_FREE(common->private_data_ptrs, compiler->allocator_data); - 
return; + SLJIT_FREE(common->optimized_cbracket, allocator_data); + SLJIT_FREE(common->private_data_ptrs, allocator_data); + return PCRE2_ERROR_NOMEMORY; } common->compiler = compiler; @@ -11216,10 +13705,10 @@ OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_sta OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 1); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH, TMP1, 0); -if (common->fast_fail_start_ptr < common->fast_fail_end_ptr) - reset_fast_fail(common); +if (common->early_fail_start_ptr < common->early_fail_end_ptr) + reset_early_fail(common); -if (mode == JIT_PARTIAL_SOFT_COMPILE) +if (mode == PCRE2_JIT_PARTIAL_SOFT) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, -1); if (common->mark_ptr != 0) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->mark_ptr, SLJIT_IMM, 0); @@ -11227,68 +13716,70 @@ if (common->control_head_ptr != 0) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_IMM, 0); /* Main part of the matching */ -if ((re->options & PCRE_ANCHORED) == 0) +if ((re->overall_options & PCRE2_ANCHORED) == 0) { - mainloop_label = mainloop_entry(common, (re->flags & PCRE_HASCRORLF) != 0); + mainloop_label = mainloop_entry(common); continue_match_label = LABEL(); /* Forward search if possible. 
*/ - if ((re->options & PCRE_NO_START_OPTIMIZE) == 0) + if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0) { - if (mode == JIT_COMPILE && fast_forward_first_n_chars(common)) + if (mode == PCRE2_JIT_COMPLETE && fast_forward_first_n_chars(common)) ; - else if ((re->flags & PCRE_FIRSTSET) != 0) - fast_forward_first_char(common, (pcre_uchar)re->first_char, (re->flags & PCRE_FCH_CASELESS) != 0); - else if ((re->flags & PCRE_STARTLINE) != 0) + else if ((re->flags & PCRE2_FIRSTSET) != 0) + fast_forward_first_char(common); + else if ((re->flags & PCRE2_STARTLINE) != 0) fast_forward_newline(common); - else if (study != NULL && (study->flags & PCRE_STUDY_MAPPED) != 0) - fast_forward_start_bits(common, study->start_bits); + else if ((re->flags & PCRE2_FIRSTMAPSET) != 0) + fast_forward_start_bits(common); } } else continue_match_label = LABEL(); -if (mode == JIT_COMPILE && study->minlength > 0 && (re->options & PCRE_NO_START_OPTIMIZE) == 0) +if (mode == PCRE2_JIT_COMPLETE && re->minlength > 0 && (re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0) { - OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_NOMATCH); - OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(study->minlength)); + OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH); + OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(re->minlength)); minlength_check_failed = CMP(SLJIT_GREATER, TMP2, 0, STR_END, 0); } if (common->req_char_ptr != 0) - reqbyte_notfound = search_requested_char(common, (pcre_uchar)re->req_char, (re->flags & PCRE_RCH_CASELESS) != 0, (re->flags & PCRE_FIRSTSET) != 0); + reqcu_not_found = search_requested_char(common, (PCRE2_UCHAR)(re->last_codeunit), (re->flags & PCRE2_LASTCASELESS) != 0, (re->flags & PCRE2_FIRSTSET) != 0); /* Store the current STR_PTR in OVECTOR(0). */ OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), OVECTOR(0), STR_PTR, 0); /* Copy the limit of allowed recursions. 
*/ OP1(SLJIT_MOV, COUNT_MATCH, 0, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH); if (common->capture_last_ptr != 0) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->capture_last_ptr, SLJIT_IMM, -1); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->capture_last_ptr, SLJIT_IMM, 0); if (common->fast_forward_bc_ptr != NULL) - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), PRIVATE_DATA(common->fast_forward_bc_ptr + 1), STR_PTR, 0); + OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), PRIVATE_DATA(common->fast_forward_bc_ptr + 1) >> 3, STR_PTR, 0); if (common->start_ptr != OVECTOR(0)) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->start_ptr, STR_PTR, 0); /* Copy the beginning of the string. */ -if (mode == JIT_PARTIAL_SOFT_COMPILE) +if (mode == PCRE2_JIT_PARTIAL_SOFT) { jump = CMP(SLJIT_NOT_EQUAL, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, -1); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); - OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start + sizeof(sljit_sw), STR_PTR, 0); JUMPHERE(jump); } -else if (mode == JIT_PARTIAL_HARD_COMPILE) +else if (mode == PCRE2_JIT_PARTIAL_HARD) OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, STR_PTR, 0); compile_matchingpath(common, common->start, ccend, &rootbacktrack); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) { sljit_free_compiler(compiler); - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - SLJIT_FREE(common->private_data_ptrs, compiler->allocator_data); - free_read_only_data(common->read_only_data_head, compiler->allocator_data); - return; + SLJIT_FREE(common->optimized_cbracket, allocator_data); + SLJIT_FREE(common->private_data_ptrs, allocator_data); + PRIV(jit_free_rodata)(common->read_only_data_head, allocator_data); + return PCRE2_ERROR_NOMEMORY; } +if ((re->overall_options & PCRE2_ENDANCHORED) != 0) + end_anchor_failed = CMP(SLJIT_NOT_EQUAL, STR_PTR, 0, STR_END, 0); + if (common->might_be_empty) { empty_match = CMP(SLJIT_EQUAL, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), OVECTOR(0)); @@ 
-11301,16 +13792,29 @@ if (common->accept != NULL) /* This means we have a match. Update the ovector. */ copy_ovector(common, re->top_bracket + 1); -common->quit_label = common->forced_quit_label = LABEL(); +common->quit_label = common->abort_label = LABEL(); if (common->quit != NULL) set_jumps(common->quit, common->quit_label); -if (common->forced_quit != NULL) - set_jumps(common->forced_quit, common->forced_quit_label); +if (common->abort != NULL) + set_jumps(common->abort, common->abort_label); if (minlength_check_failed != NULL) - SET_LABEL(minlength_check_failed, common->forced_quit_label); + SET_LABEL(minlength_check_failed, common->abort_label); + +sljit_emit_op0(compiler, SLJIT_SKIP_FRAMES_BEFORE_RETURN); sljit_emit_return(compiler, SLJIT_MOV, SLJIT_RETURN_REG, 0); -if (mode != JIT_COMPILE) +if (common->failed_match != NULL) + { + SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE); + set_jumps(common->failed_match, LABEL()); + OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH); + JUMPTO(SLJIT_JUMP, common->abort_label); + } + +if ((re->overall_options & PCRE2_ENDANCHORED) != 0) + JUMPHERE(end_anchor_failed); + +if (mode != PCRE2_JIT_COMPLETE) { common->partialmatchlabel = LABEL(); set_jumps(common->partialmatch, common->partialmatchlabel); @@ -11323,55 +13827,64 @@ compile_backtrackingpath(common, rootbacktrack.top); if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) { sljit_free_compiler(compiler); - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - SLJIT_FREE(common->private_data_ptrs, compiler->allocator_data); - free_read_only_data(common->read_only_data_head, compiler->allocator_data); - return; + SLJIT_FREE(common->optimized_cbracket, allocator_data); + SLJIT_FREE(common->private_data_ptrs, allocator_data); + PRIV(jit_free_rodata)(common->read_only_data_head, allocator_data); + return PCRE2_ERROR_NOMEMORY; } SLJIT_ASSERT(rootbacktrack.prev == NULL); reset_match_label = LABEL(); -if (mode == 
JIT_PARTIAL_SOFT_COMPILE) +if (mode == PCRE2_JIT_PARTIAL_SOFT) { /* Update hit_start only in the first time. */ jump = CMP(SLJIT_NOT_EQUAL, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, 0); - OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr); + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->start_ptr); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->start_used_ptr, SLJIT_IMM, -1); OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->hit_start, TMP1, 0); JUMPHERE(jump); } /* Check we have remaining characters. */ -if ((re->options & PCRE_ANCHORED) == 0 && (re->options & PCRE_FIRSTLINE) != 0) +if ((re->overall_options & PCRE2_ANCHORED) == 0 && common->match_end_ptr != 0) { - SLJIT_ASSERT(common->match_end_ptr != 0); OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); } OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), - (common->fast_forward_bc_ptr != NULL) ? (PRIVATE_DATA(common->fast_forward_bc_ptr + 1)) : common->start_ptr); + (common->fast_forward_bc_ptr != NULL) ? (PRIVATE_DATA(common->fast_forward_bc_ptr + 1) >> 3) : common->start_ptr); -if ((re->options & PCRE_ANCHORED) == 0) +if ((re->overall_options & PCRE2_ANCHORED) == 0) { if (common->ff_newline_shortcut != NULL) { - if ((re->options & PCRE_FIRSTLINE) == 0) - CMPTO(SLJIT_LESS, STR_PTR, 0, STR_END, 0, common->ff_newline_shortcut); - /* There cannot be more newlines here. */ + /* There cannot be more newlines if PCRE2_FIRSTLINE is set. */ + if ((re->overall_options & PCRE2_FIRSTLINE) == 0) + { + if (common->match_end_ptr != 0) + { + OP1(SLJIT_MOV, TMP3, 0, STR_END, 0); + OP1(SLJIT_MOV, STR_END, 0, TMP1, 0); + CMPTO(SLJIT_LESS, STR_PTR, 0, TMP1, 0, common->ff_newline_shortcut); + OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); + } + else + CMPTO(SLJIT_LESS, STR_PTR, 0, STR_END, 0, common->ff_newline_shortcut); + } } else - CMPTO(SLJIT_LESS, STR_PTR, 0, ((re->options & PCRE_FIRSTLINE) == 0) ? 
STR_END : TMP1, 0, mainloop_label); + CMPTO(SLJIT_LESS, STR_PTR, 0, (common->match_end_ptr == 0) ? STR_END : TMP1, 0, mainloop_label); } /* No more remaining characters. */ -if (reqbyte_notfound != NULL) - JUMPHERE(reqbyte_notfound); +if (reqcu_not_found != NULL) + set_jumps(reqcu_not_found, LABEL()); -if (mode == JIT_PARTIAL_SOFT_COMPILE) +if (mode == PCRE2_JIT_PARTIAL_SOFT) CMPTO(SLJIT_NOT_EQUAL, SLJIT_MEM1(SLJIT_SP), common->hit_start, SLJIT_IMM, -1, common->partialmatchlabel); -OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_NOMATCH); +OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_NOMATCH); JUMPTO(SLJIT_JUMP, common->quit_label); flush_stubs(common); @@ -11380,20 +13893,21 @@ if (common->might_be_empty) { JUMPHERE(empty_match); OP1(SLJIT_MOV, TMP1, 0, ARGUMENTS, 0); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, notempty)); - CMPTO(SLJIT_NOT_EQUAL, TMP2, 0, SLJIT_IMM, 0, empty_match_backtrack_label); - OP1(SLJIT_MOV_U8, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, notempty_atstart)); - CMPTO(SLJIT_EQUAL, TMP2, 0, SLJIT_IMM, 0, empty_match_found_label); + OP1(SLJIT_MOV_U32, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, options)); + OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, PCRE2_NOTEMPTY); + JUMPTO(SLJIT_NOT_ZERO, empty_match_backtrack_label); + OP2(SLJIT_AND | SLJIT_SET_Z, SLJIT_UNUSED, 0, TMP2, 0, SLJIT_IMM, PCRE2_NOTEMPTY_ATSTART); + JUMPTO(SLJIT_ZERO, empty_match_found_label); OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, str)); CMPTO(SLJIT_NOT_EQUAL, TMP2, 0, STR_PTR, 0, empty_match_found_label); JUMPTO(SLJIT_JUMP, empty_match_backtrack_label); } common->fast_forward_bc_ptr = NULL; -common->fast_fail_start_ptr = 0; -common->fast_fail_end_ptr = 0; +common->early_fail_start_ptr = 0; +common->early_fail_end_ptr = 0; common->currententry = common->entries; -common->local_exit = TRUE; +common->local_quit_available = TRUE; quit_label = 
common->quit_label; while (common->currententry != NULL) { @@ -11402,15 +13916,15 @@ while (common->currententry != NULL) if (SLJIT_UNLIKELY(sljit_get_compiler_error(compiler))) { sljit_free_compiler(compiler); - SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); - SLJIT_FREE(common->private_data_ptrs, compiler->allocator_data); - free_read_only_data(common->read_only_data_head, compiler->allocator_data); - return; + SLJIT_FREE(common->optimized_cbracket, allocator_data); + SLJIT_FREE(common->private_data_ptrs, allocator_data); + PRIV(jit_free_rodata)(common->read_only_data_head, allocator_data); + return PCRE2_ERROR_NOMEMORY; } flush_stubs(common); common->currententry = common->currententry->next; } -common->local_exit = FALSE; +common->local_quit_available = FALSE; common->quit_label = quit_label; /* Allocating stack, returns with PCRE_ERROR_JIT_STACKLIMIT if fails. */ @@ -11419,31 +13933,32 @@ set_jumps(common->stackalloc, LABEL()); /* RETURN_ADDR is not a saved register. 
*/ sljit_emit_fast_enter(compiler, SLJIT_MEM1(SLJIT_SP), LOCALS0); -SLJIT_ASSERT(TMP1 == SLJIT_R0 && STACK_TOP == SLJIT_R1); +SLJIT_ASSERT(TMP1 == SLJIT_R0 && STR_PTR == SLJIT_R1); -OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, STACK_TOP, 0); +OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, STR_PTR, 0); OP1(SLJIT_MOV, SLJIT_R0, 0, ARGUMENTS, 0); OP2(SLJIT_SUB, SLJIT_R1, 0, STACK_LIMIT, 0, SLJIT_IMM, STACK_GROWTH_RATE); OP1(SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R0), SLJIT_OFFSETOF(jit_arguments, stack)); OP1(SLJIT_MOV, STACK_LIMIT, 0, TMP2, 0); sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW), SLJIT_IMM, SLJIT_FUNC_OFFSET(sljit_stack_resize)); + jump = CMP(SLJIT_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0); OP1(SLJIT_MOV, TMP2, 0, STACK_LIMIT, 0); OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_RETURN_REG, 0); OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); -OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(SLJIT_SP), LOCALS1); -sljit_emit_fast_return(compiler, TMP1, 0); +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), LOCALS1); +OP_SRC(SLJIT_FAST_RETURN, TMP1, 0); /* Allocation failed. */ JUMPHERE(jump); /* We break the return address cache here, but this is a really rare case. */ -OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_JIT_STACKLIMIT); +OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_JIT_STACKLIMIT); JUMPTO(SLJIT_JUMP, common->quit_label); /* Call limit reached. 
*/ set_jumps(common->calllimit, LABEL()); -OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE_ERROR_MATCHLIMIT); +OP1(SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, PCRE2_ERROR_MATCHLIMIT); JUMPTO(SLJIT_JUMP, common->quit_label); if (common->revertframes != NULL) @@ -11489,425 +14004,247 @@ if (common->reset_match != NULL) OP1(SLJIT_MOV, STR_PTR, 0, TMP1, 0); JUMPTO(SLJIT_JUMP, reset_match_label); } -#ifdef SUPPORT_UTF -#ifdef COMPILE_PCRE8 +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 if (common->utfreadchar != NULL) { set_jumps(common->utfreadchar, LABEL()); do_utfreadchar(common); } -if (common->utfreadchar16 != NULL) - { - set_jumps(common->utfreadchar16, LABEL()); - do_utfreadchar16(common); - } if (common->utfreadtype8 != NULL) { set_jumps(common->utfreadtype8, LABEL()); do_utfreadtype8(common); } -#endif /* COMPILE_PCRE8 */ -#endif /* SUPPORT_UTF */ -#ifdef SUPPORT_UCP +if (common->utfpeakcharback != NULL) + { + set_jumps(common->utfpeakcharback, LABEL()); + do_utfpeakcharback(common); + } +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ +#if PCRE2_CODE_UNIT_WIDTH == 8 || PCRE2_CODE_UNIT_WIDTH == 16 +if (common->utfreadchar_invalid != NULL) + { + set_jumps(common->utfreadchar_invalid, LABEL()); + do_utfreadchar_invalid(common); + } +if (common->utfreadnewline_invalid != NULL) + { + set_jumps(common->utfreadnewline_invalid, LABEL()); + do_utfreadnewline_invalid(common); + } +if (common->utfmoveback_invalid) + { + set_jumps(common->utfmoveback_invalid, LABEL()); + do_utfmoveback_invalid(common); + } +if (common->utfpeakcharback_invalid) + { + set_jumps(common->utfpeakcharback_invalid, LABEL()); + do_utfpeakcharback_invalid(common); + } +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 || PCRE2_CODE_UNIT_WIDTH == 16 */ if (common->getucd != NULL) { set_jumps(common->getucd, LABEL()); do_getucd(common); } -#endif +if (common->getucdtype != NULL) + { + set_jumps(common->getucdtype, LABEL()); + do_getucdtype(common); + } +#endif /* SUPPORT_UNICODE */ 
-SLJIT_FREE(common->optimized_cbracket, compiler->allocator_data); -SLJIT_FREE(common->private_data_ptrs, compiler->allocator_data); +SLJIT_FREE(common->optimized_cbracket, allocator_data); +SLJIT_FREE(common->private_data_ptrs, allocator_data); executable_func = sljit_generate_code(compiler); executable_size = sljit_get_generated_code_size(compiler); -label_addr = common->label_addrs; -while (label_addr != NULL) - { - *label_addr->update_addr = sljit_get_label_addr(label_addr->label); - label_addr = label_addr->next; - } sljit_free_compiler(compiler); + if (executable_func == NULL) { - free_read_only_data(common->read_only_data_head, compiler->allocator_data); - return; + PRIV(jit_free_rodata)(common->read_only_data_head, allocator_data); + return PCRE2_ERROR_NOMEMORY; } /* Reuse the function descriptor if possible. */ -if ((extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 && extra->executable_jit != NULL) - functions = (executable_functions *)extra->executable_jit; +if (re->executable_jit != NULL) + functions = (executable_functions *)re->executable_jit; else { - /* Note: If your memory-checker has flagged the allocation below as a - * memory leak, it is probably because you either forgot to call - * pcre_free_study() (or pcre16_free_study()) on the pcre_extra (or - * pcre16_extra) object, or you called said function after having - * cleared the PCRE_EXTRA_EXECUTABLE_JIT bit from the "flags" field - * of the object. (The function will only free the JIT data if the - * bit remains set, as the bit indicates that the pointer to the data - * is valid.) - */ - functions = SLJIT_MALLOC(sizeof(executable_functions), compiler->allocator_data); + functions = SLJIT_MALLOC(sizeof(executable_functions), allocator_data); if (functions == NULL) { /* This case is highly unlikely since we just recently freed a lot of memory. Not impossible though. 
*/ - sljit_free_code(executable_func); - free_read_only_data(common->read_only_data_head, compiler->allocator_data); - return; + sljit_free_code(executable_func, NULL); + PRIV(jit_free_rodata)(common->read_only_data_head, allocator_data); + return PCRE2_ERROR_NOMEMORY; } memset(functions, 0, sizeof(executable_functions)); - functions->top_bracket = (re->top_bracket + 1) * 2; - functions->limit_match = (re->flags & PCRE_MLSET) != 0 ? re->limit_match : 0; - extra->executable_jit = functions; - extra->flags |= PCRE_EXTRA_EXECUTABLE_JIT; + functions->top_bracket = re->top_bracket + 1; + functions->limit_match = re->limit_match; + re->executable_jit = functions; } +/* Turn mode into an index. */ +if (mode == PCRE2_JIT_COMPLETE) + mode = 0; +else + mode = (mode == PCRE2_JIT_PARTIAL_SOFT) ? 1 : 2; + +SLJIT_ASSERT(mode < JIT_NUMBER_OF_COMPILE_MODES); functions->executable_funcs[mode] = executable_func; functions->read_only_data_heads[mode] = common->read_only_data_head; functions->executable_sizes[mode] = executable_size; +return 0; } -static SLJIT_NOINLINE int jit_machine_stack_exec(jit_arguments *arguments, void *executable_func) -{ -union { - void *executable_func; - jit_function call_executable_func; -} convert_executable_func; -sljit_u8 local_space[MACHINE_STACK_SIZE]; -struct sljit_stack local_stack; - -local_stack.min_start = local_space; -local_stack.start = local_space; -local_stack.end = local_space + MACHINE_STACK_SIZE; -local_stack.top = local_space + MACHINE_STACK_SIZE; -arguments->stack = &local_stack; -convert_executable_func.executable_func = executable_func; -return convert_executable_func.call_executable_func(arguments); -} +#endif + +/************************************************* +* JIT compile a Regular Expression * +*************************************************/ + +/* This function uses JIT to convert a previously-compiled pattern into machine +code. 
+ +Arguments: + code a compiled pattern + options JIT option bits + +Returns: 0: success or (*NOJIT) was used + <0: an error code +*/ + +#define PUBLIC_JIT_COMPILE_OPTIONS \ + (PCRE2_JIT_COMPLETE|PCRE2_JIT_PARTIAL_SOFT|PCRE2_JIT_PARTIAL_HARD|PCRE2_JIT_INVALID_UTF) -int -PRIV(jit_exec)(const PUBL(extra) *extra_data, const pcre_uchar *subject, - int length, int start_offset, int options, int *offsets, int offset_count) +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_jit_compile(pcre2_code *code, uint32_t options) { -executable_functions *functions = (executable_functions *)extra_data->executable_jit; -union { - void *executable_func; - jit_function call_executable_func; -} convert_executable_func; -jit_arguments arguments; -int max_offset_count; -int retval; -int mode = JIT_COMPILE; - -if ((options & PCRE_PARTIAL_HARD) != 0) - mode = JIT_PARTIAL_HARD_COMPILE; -else if ((options & PCRE_PARTIAL_SOFT) != 0) - mode = JIT_PARTIAL_SOFT_COMPILE; - -if (functions->executable_funcs[mode] == NULL) - return PCRE_ERROR_JIT_BADOPTION; - -/* Sanity checks should be handled by pcre_exec. */ -arguments.str = subject + start_offset; -arguments.begin = subject; -arguments.end = subject + length; -arguments.mark_ptr = NULL; -/* JIT decreases this value less frequently than the interpreter. */ -arguments.limit_match = ((extra_data->flags & PCRE_EXTRA_MATCH_LIMIT) == 0) ? MATCH_LIMIT : (sljit_u32)(extra_data->match_limit); -if (functions->limit_match != 0 && functions->limit_match < arguments.limit_match) - arguments.limit_match = functions->limit_match; -arguments.notbol = (options & PCRE_NOTBOL) != 0; -arguments.noteol = (options & PCRE_NOTEOL) != 0; -arguments.notempty = (options & PCRE_NOTEMPTY) != 0; -arguments.notempty_atstart = (options & PCRE_NOTEMPTY_ATSTART) != 0; -arguments.offsets = offsets; -arguments.callout_data = (extra_data->flags & PCRE_EXTRA_CALLOUT_DATA) != 0 ? 
extra_data->callout_data : NULL; -arguments.real_offset_count = offset_count; - -/* pcre_exec() rounds offset_count to a multiple of 3, and then uses only 2/3 of -the output vector for storing captured strings, with the remainder used as -workspace. We don't need the workspace here. For compatibility, we limit the -number of captured strings in the same way as pcre_exec(), so that the user -gets the same result with and without JIT. */ - -if (offset_count != 2) - offset_count = ((offset_count - (offset_count % 3)) * 2) / 3; -max_offset_count = functions->top_bracket; -if (offset_count > max_offset_count) - offset_count = max_offset_count; -arguments.offset_count = offset_count; - -if (functions->callback) - arguments.stack = (struct sljit_stack *)functions->callback(functions->userdata); -else - arguments.stack = (struct sljit_stack *)functions->userdata; +pcre2_real_code *re = (pcre2_real_code *)code; -if (arguments.stack == NULL) - retval = jit_machine_stack_exec(&arguments, functions->executable_funcs[mode]); -else - { - convert_executable_func.executable_func = functions->executable_funcs[mode]; - retval = convert_executable_func.call_executable_func(&arguments); - } +if (code == NULL) + return PCRE2_ERROR_NULL; -if (retval * 2 > offset_count) - retval = 0; -if ((extra_data->flags & PCRE_EXTRA_MARK) != 0) - *(extra_data->mark) = arguments.mark_ptr; +if ((options & ~PUBLIC_JIT_COMPILE_OPTIONS) != 0) + return PCRE2_ERROR_JIT_BADOPTION; -return retval; -} +/* Support for invalid UTF was first introduced in JIT, with the option +PCRE2_JIT_INVALID_UTF. Later, support was added to the interpreter, and the +compile-time option PCRE2_MATCH_INVALID_UTF was created. This is now the +preferred feature, with the earlier option deprecated. However, for backward +compatibility, if the earlier option is set, it forces the new option so that +if JIT matching falls back to the interpreter, there is still support for +invalid UTF. 
However, if this function has already been successfully called +without PCRE2_JIT_INVALID_UTF and without PCRE2_MATCH_INVALID_UTF (meaning that +non-invalid-supporting JIT code was compiled), give an error. + +If in the future support for PCRE2_JIT_INVALID_UTF is withdrawn, the following +actions are needed: -#if defined COMPILE_PCRE8 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre_jit_exec(const pcre *argument_re, const pcre_extra *extra_data, - PCRE_SPTR subject, int length, int start_offset, int options, - int *offsets, int offset_count, pcre_jit_stack *stack) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre16_jit_exec(const pcre16 *argument_re, const pcre16_extra *extra_data, - PCRE_SPTR16 subject, int length, int start_offset, int options, - int *offsets, int offset_count, pcre16_jit_stack *stack) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DEFN int PCRE_CALL_CONVENTION -pcre32_jit_exec(const pcre32 *argument_re, const pcre32_extra *extra_data, - PCRE_SPTR32 subject, int length, int start_offset, int options, - int *offsets, int offset_count, pcre32_jit_stack *stack) + 1. Remove the definition from pcre2.h.in and from the list in + PUBLIC_JIT_COMPILE_OPTIONS above. + + 2. Replace PCRE2_JIT_INVALID_UTF with a local flag in this module. + + 3. Replace PCRE2_JIT_INVALID_UTF in pcre2_jit_test.c. + + 4. Delete the following short block of code. The setting of "re" and + "functions" can be moved into the JIT-only block below, but if that is + done, (void)re and (void)functions will be needed in the non-JIT case, to + avoid compiler warnings. 
+*/ + +#ifdef SUPPORT_JIT +executable_functions *functions = (executable_functions *)re->executable_jit; +static int executable_allocator_is_working = 0; #endif -{ -pcre_uchar *subject_ptr = (pcre_uchar *)subject; -executable_functions *functions = (executable_functions *)extra_data->executable_jit; -union { - void *executable_func; - jit_function call_executable_func; -} convert_executable_func; -jit_arguments arguments; -int max_offset_count; -int retval; -int mode = JIT_COMPILE; - -SLJIT_UNUSED_ARG(argument_re); - -/* Plausibility checks */ -if ((options & ~PUBLIC_JIT_EXEC_OPTIONS) != 0) return PCRE_ERROR_JIT_BADOPTION; - -if ((options & PCRE_PARTIAL_HARD) != 0) - mode = JIT_PARTIAL_HARD_COMPILE; -else if ((options & PCRE_PARTIAL_SOFT) != 0) - mode = JIT_PARTIAL_SOFT_COMPILE; - -if (functions == NULL || functions->executable_funcs[mode] == NULL) - return PCRE_ERROR_JIT_BADOPTION; - -/* Sanity checks should be handled by pcre_exec. */ -arguments.stack = (struct sljit_stack *)stack; -arguments.str = subject_ptr + start_offset; -arguments.begin = subject_ptr; -arguments.end = subject_ptr + length; -arguments.mark_ptr = NULL; -/* JIT decreases this value less frequently than the interpreter. */ -arguments.limit_match = ((extra_data->flags & PCRE_EXTRA_MATCH_LIMIT) == 0) ? MATCH_LIMIT : (sljit_u32)(extra_data->match_limit); -if (functions->limit_match != 0 && functions->limit_match < arguments.limit_match) - arguments.limit_match = functions->limit_match; -arguments.notbol = (options & PCRE_NOTBOL) != 0; -arguments.noteol = (options & PCRE_NOTEOL) != 0; -arguments.notempty = (options & PCRE_NOTEMPTY) != 0; -arguments.notempty_atstart = (options & PCRE_NOTEMPTY_ATSTART) != 0; -arguments.offsets = offsets; -arguments.callout_data = (extra_data->flags & PCRE_EXTRA_CALLOUT_DATA) != 0 ? 
extra_data->callout_data : NULL; -arguments.real_offset_count = offset_count; - -/* pcre_exec() rounds offset_count to a multiple of 3, and then uses only 2/3 of -the output vector for storing captured strings, with the remainder used as -workspace. We don't need the workspace here. For compatibility, we limit the -number of captured strings in the same way as pcre_exec(), so that the user -gets the same result with and without JIT. */ - -if (offset_count != 2) - offset_count = ((offset_count - (offset_count % 3)) * 2) / 3; -max_offset_count = functions->top_bracket; -if (offset_count > max_offset_count) - offset_count = max_offset_count; -arguments.offset_count = offset_count; - -convert_executable_func.executable_func = functions->executable_funcs[mode]; -retval = convert_executable_func.call_executable_func(&arguments); - -if (retval * 2 > offset_count) - retval = 0; -if ((extra_data->flags & PCRE_EXTRA_MARK) != 0) - *(extra_data->mark) = arguments.mark_ptr; - -return retval; -} -void -PRIV(jit_free)(void *executable_funcs) -{ -int i; -executable_functions *functions = (executable_functions *)executable_funcs; -for (i = 0; i < JIT_NUMBER_OF_COMPILE_MODES; i++) +if ((options & PCRE2_JIT_INVALID_UTF) != 0) { - if (functions->executable_funcs[i] != NULL) - sljit_free_code(functions->executable_funcs[i]); - free_read_only_data(functions->read_only_data_heads[i], NULL); + if ((re->overall_options & PCRE2_MATCH_INVALID_UTF) == 0) + { +#ifdef SUPPORT_JIT + if (functions != NULL) return PCRE2_ERROR_JIT_BADOPTION; +#endif + re->overall_options |= PCRE2_MATCH_INVALID_UTF; + } } -SLJIT_FREE(functions, compiler->allocator_data); -} -int -PRIV(jit_get_size)(void *executable_funcs) -{ -int i; -sljit_uw size = 0; -sljit_uw *executable_sizes = ((executable_functions *)executable_funcs)->executable_sizes; -for (i = 0; i < JIT_NUMBER_OF_COMPILE_MODES; i++) - size += executable_sizes[i]; -return (int)size; -} +/* The above tests are run with and without JIT support. 
This means that +PCRE2_JIT_INVALID_UTF propagates back into the regex options (ensuring +interpreter support) even in the absence of JIT. But now, if there is no JIT +support, give an error return. */ -const char* -PRIV(jit_get_target)(void) -{ -return sljit_get_platform_name(); -} +#ifndef SUPPORT_JIT +return PCRE2_ERROR_JIT_BADOPTION; +#else /* SUPPORT_JIT */ -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL pcre_jit_stack * -pcre_jit_stack_alloc(int startsize, int maxsize) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL pcre16_jit_stack * -pcre16_jit_stack_alloc(int startsize, int maxsize) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL pcre32_jit_stack * -pcre32_jit_stack_alloc(int startsize, int maxsize) -#endif -{ -if (startsize < 1 || maxsize < 1) - return NULL; -if (startsize > maxsize) - startsize = maxsize; -startsize = (startsize + STACK_GROWTH_RATE - 1) & ~(STACK_GROWTH_RATE - 1); -maxsize = (maxsize + STACK_GROWTH_RATE - 1) & ~(STACK_GROWTH_RATE - 1); -return (PUBL(jit_stack)*)sljit_allocate_stack(startsize, maxsize, NULL); -} +/* There is JIT support. Do the necessary. 
*/ -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL void -pcre_jit_stack_free(pcre_jit_stack *stack) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL void -pcre16_jit_stack_free(pcre16_jit_stack *stack) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL void -pcre32_jit_stack_free(pcre32_jit_stack *stack) -#endif -{ -sljit_free_stack((struct sljit_stack *)stack, NULL); -} +if ((re->flags & PCRE2_NOJIT) != 0) return 0; -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL void -pcre_assign_jit_stack(pcre_extra *extra, pcre_jit_callback callback, void *userdata) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL void -pcre16_assign_jit_stack(pcre16_extra *extra, pcre16_jit_callback callback, void *userdata) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL void -pcre32_assign_jit_stack(pcre32_extra *extra, pcre32_jit_callback callback, void *userdata) -#endif -{ -executable_functions *functions; -if (extra != NULL && - (extra->flags & PCRE_EXTRA_EXECUTABLE_JIT) != 0 && - extra->executable_jit != NULL) +if (executable_allocator_is_working == 0) { - functions = (executable_functions *)extra->executable_jit; - functions->callback = callback; - functions->userdata = userdata; + /* Checks whether the executable allocator is working. This check + might run multiple times in multi-threaded environments, but the + result should not be affected by it. 
*/ + void *ptr = SLJIT_MALLOC_EXEC(32, NULL); + + executable_allocator_is_working = -1; + + if (ptr != NULL) + { + SLJIT_FREE_EXEC(((sljit_u8*)(ptr)) + SLJIT_EXEC_OFFSET(ptr), NULL); + executable_allocator_is_working = 1; + } } -} -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL void -pcre_jit_free_unused_memory(void) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL void -pcre16_jit_free_unused_memory(void) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL void -pcre32_jit_free_unused_memory(void) -#endif -{ -sljit_free_unused_memory_exec(); -} +if (executable_allocator_is_working < 0) + return PCRE2_ERROR_NOMEMORY; -#else /* SUPPORT_JIT */ +if ((re->overall_options & PCRE2_MATCH_INVALID_UTF) != 0) + options |= PCRE2_JIT_INVALID_UTF; -/* These are dummy functions to avoid linking errors when JIT support is not -being compiled. */ +if ((options & PCRE2_JIT_COMPLETE) != 0 && (functions == NULL + || functions->executable_funcs[0] == NULL)) { + uint32_t excluded_options = (PCRE2_JIT_PARTIAL_SOFT | PCRE2_JIT_PARTIAL_HARD); + int result = jit_compile(code, options & ~excluded_options); + if (result != 0) + return result; + } -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL pcre_jit_stack * -pcre_jit_stack_alloc(int startsize, int maxsize) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL pcre16_jit_stack * -pcre16_jit_stack_alloc(int startsize, int maxsize) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL pcre32_jit_stack * -pcre32_jit_stack_alloc(int startsize, int maxsize) -#endif -{ -(void)startsize; -(void)maxsize; -return NULL; -} +if ((options & PCRE2_JIT_PARTIAL_SOFT) != 0 && (functions == NULL + || functions->executable_funcs[1] == NULL)) { + uint32_t excluded_options = (PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_HARD); + int result = jit_compile(code, options & ~excluded_options); + if (result != 0) + return result; + } -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL void -pcre_jit_stack_free(pcre_jit_stack *stack) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL void -pcre16_jit_stack_free(pcre16_jit_stack 
*stack) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL void -pcre32_jit_stack_free(pcre32_jit_stack *stack) -#endif -{ -(void)stack; -} +if ((options & PCRE2_JIT_PARTIAL_HARD) != 0 && (functions == NULL + || functions->executable_funcs[2] == NULL)) { + uint32_t excluded_options = (PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_SOFT); + int result = jit_compile(code, options & ~excluded_options); + if (result != 0) + return result; + } -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL void -pcre_assign_jit_stack(pcre_extra *extra, pcre_jit_callback callback, void *userdata) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL void -pcre16_assign_jit_stack(pcre16_extra *extra, pcre16_jit_callback callback, void *userdata) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL void -pcre32_assign_jit_stack(pcre32_extra *extra, pcre32_jit_callback callback, void *userdata) -#endif -{ -(void)extra; -(void)callback; -(void)userdata; -} +return 0; -#if defined COMPILE_PCRE8 -PCRE_EXP_DECL void -pcre_jit_free_unused_memory(void) -#elif defined COMPILE_PCRE16 -PCRE_EXP_DECL void -pcre16_jit_free_unused_memory(void) -#elif defined COMPILE_PCRE32 -PCRE_EXP_DECL void -pcre32_jit_free_unused_memory(void) -#endif -{ +#endif /* SUPPORT_JIT */ } -#endif +/* JIT compiler uses an all-in-one approach. This improves security, + since the code generator functions are not exported. 
*/ + +#define INCLUDED_FROM_PCRE2_JIT_COMPILE + +#include "pcre2_jit_match.c" +#include "pcre2_jit_misc.c" -/* End of pcre_jit_compile.c */ +/* End of pcre2_jit_compile.c */ diff --git a/src/pcre2/src/pcre2_jit_match.c b/src/pcre2/src/pcre2_jit_match.c new file mode 100644 index 00000000..7e13b8cf --- /dev/null +++ b/src/pcre2/src/pcre2_jit_match.c @@ -0,0 +1,186 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifndef INCLUDED_FROM_PCRE2_JIT_COMPILE +#error This file must be included from pcre2_jit_compile.c. +#endif + +#ifdef SUPPORT_JIT + +static SLJIT_NOINLINE int jit_machine_stack_exec(jit_arguments *arguments, jit_function executable_func) +{ +sljit_u8 local_space[MACHINE_STACK_SIZE]; +struct sljit_stack local_stack; + +local_stack.min_start = local_space; +local_stack.start = local_space; +local_stack.end = local_space + MACHINE_STACK_SIZE; +local_stack.top = local_space + MACHINE_STACK_SIZE; +arguments->stack = &local_stack; +return executable_func(arguments); +} + +#endif + + +/************************************************* +* Do a JIT pattern match * +*************************************************/ + +/* This function runs a JIT pattern match. 
+ +Arguments: + code points to the compiled expression + subject points to the subject string + length length of subject string (may contain binary zeros) + start_offset where to start in the subject string + options option bits + match_data points to a match_data block + mcontext points to a match context + +Returns: > 0 => success; value is the number of ovector pairs filled + = 0 => success, but ovector is not big enough + -1 => failed to match (PCRE_ERROR_NOMATCH) + < -1 => some kind of unexpected problem +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_jit_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, + PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext) +{ +#ifndef SUPPORT_JIT + +(void)code; +(void)subject; +(void)length; +(void)start_offset; +(void)options; +(void)match_data; +(void)mcontext; +return PCRE2_ERROR_JIT_BADOPTION; + +#else /* SUPPORT_JIT */ + +pcre2_real_code *re = (pcre2_real_code *)code; +executable_functions *functions = (executable_functions *)re->executable_jit; +pcre2_jit_stack *jit_stack; +uint32_t oveccount = match_data->oveccount; +uint32_t max_oveccount; +union { + void *executable_func; + jit_function call_executable_func; +} convert_executable_func; +jit_arguments arguments; +int rc; +int index = 0; + +if ((options & PCRE2_PARTIAL_HARD) != 0) + index = 2; +else if ((options & PCRE2_PARTIAL_SOFT) != 0) + index = 1; + +if (functions == NULL || functions->executable_funcs[index] == NULL) + return PCRE2_ERROR_JIT_BADOPTION; + +/* Sanity checks should be handled by pcre_exec. 
*/ +arguments.str = subject + start_offset; +arguments.begin = subject; +arguments.end = subject + length; +arguments.match_data = match_data; +arguments.startchar_ptr = subject; +arguments.mark_ptr = NULL; +arguments.options = options; + +if (mcontext != NULL) + { + arguments.callout = mcontext->callout; + arguments.callout_data = mcontext->callout_data; + arguments.offset_limit = mcontext->offset_limit; + arguments.limit_match = (mcontext->match_limit < re->limit_match)? + mcontext->match_limit : re->limit_match; + if (mcontext->jit_callback != NULL) + jit_stack = mcontext->jit_callback(mcontext->jit_callback_data); + else + jit_stack = (pcre2_jit_stack *)mcontext->jit_callback_data; + } +else + { + arguments.callout = NULL; + arguments.callout_data = NULL; + arguments.offset_limit = PCRE2_UNSET; + arguments.limit_match = (MATCH_LIMIT < re->limit_match)? + MATCH_LIMIT : re->limit_match; + jit_stack = NULL; + } + + +max_oveccount = functions->top_bracket; +if (oveccount > max_oveccount) + oveccount = max_oveccount; +arguments.oveccount = oveccount << 1; + + +convert_executable_func.executable_func = functions->executable_funcs[index]; +if (jit_stack != NULL) + { + arguments.stack = (struct sljit_stack *)(jit_stack->stack); + rc = convert_executable_func.call_executable_func(&arguments); + } +else + rc = jit_machine_stack_exec(&arguments, convert_executable_func.call_executable_func); + +if (rc > (int)oveccount) + rc = 0; +match_data->code = re; +match_data->subject = (rc >= 0 || rc == PCRE2_ERROR_PARTIAL)? 
subject : NULL; +match_data->rc = rc; +match_data->startchar = arguments.startchar_ptr - subject; +match_data->leftchar = 0; +match_data->rightchar = 0; +match_data->mark = arguments.mark_ptr; +match_data->matchedby = PCRE2_MATCHEDBY_JIT; + +return match_data->rc; + +#endif /* SUPPORT_JIT */ +} + +/* End of pcre2_jit_match.c */ diff --git a/src/pcre2/src/pcre2_jit_misc.c b/src/pcre2/src/pcre2_jit_misc.c new file mode 100644 index 00000000..ec924e0f --- /dev/null +++ b/src/pcre2/src/pcre2_jit_misc.c @@ -0,0 +1,232 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifndef INCLUDED_FROM_PCRE2_JIT_COMPILE +#error This file must be included from pcre2_jit_compile.c. +#endif + + + +/************************************************* +* Free JIT read-only data * +*************************************************/ + +void +PRIV(jit_free_rodata)(void *current, void *allocator_data) +{ +#ifndef SUPPORT_JIT +(void)current; +(void)allocator_data; +#else /* SUPPORT_JIT */ +void *next; + +SLJIT_UNUSED_ARG(allocator_data); + +while (current != NULL) + { + next = *(void**)current; + SLJIT_FREE(current, allocator_data); + current = next; + } + +#endif /* SUPPORT_JIT */ +} + +/************************************************* +* Free JIT compiled code * +*************************************************/ + +void +PRIV(jit_free)(void *executable_jit, pcre2_memctl *memctl) +{ +#ifndef SUPPORT_JIT +(void)executable_jit; +(void)memctl; +#else /* SUPPORT_JIT */ + +executable_functions *functions = (executable_functions *)executable_jit; +void *allocator_data = memctl; +int i; + +for (i = 0; i < JIT_NUMBER_OF_COMPILE_MODES; i++) + { + if (functions->executable_funcs[i] != NULL) + sljit_free_code(functions->executable_funcs[i], NULL); + PRIV(jit_free_rodata)(functions->read_only_data_heads[i], allocator_data); + } + +SLJIT_FREE(functions, allocator_data); + +#endif /* SUPPORT_JIT */ +} + + +/************************************************* +* Free 
unused JIT memory * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_jit_free_unused_memory(pcre2_general_context *gcontext) +{ +#ifndef SUPPORT_JIT +(void)gcontext; /* Suppress warning */ +#else /* SUPPORT_JIT */ +SLJIT_UNUSED_ARG(gcontext); +sljit_free_unused_memory_exec(); +#endif /* SUPPORT_JIT */ +} + + + +/************************************************* +* Allocate a JIT stack * +*************************************************/ + +PCRE2_EXP_DEFN pcre2_jit_stack * PCRE2_CALL_CONVENTION +pcre2_jit_stack_create(size_t startsize, size_t maxsize, + pcre2_general_context *gcontext) +{ +#ifndef SUPPORT_JIT + +(void)gcontext; +(void)startsize; +(void)maxsize; +return NULL; + +#else /* SUPPORT_JIT */ + +pcre2_jit_stack *jit_stack; + +if (startsize < 1 || maxsize < 1) + return NULL; +if (startsize > maxsize) + startsize = maxsize; +startsize = (startsize + STACK_GROWTH_RATE - 1) & ~(STACK_GROWTH_RATE - 1); +maxsize = (maxsize + STACK_GROWTH_RATE - 1) & ~(STACK_GROWTH_RATE - 1); + +jit_stack = PRIV(memctl_malloc)(sizeof(pcre2_real_jit_stack), (pcre2_memctl *)gcontext); +if (jit_stack == NULL) return NULL; +jit_stack->stack = sljit_allocate_stack(startsize, maxsize, &jit_stack->memctl); +if (jit_stack->stack == NULL) + { + jit_stack->memctl.free(jit_stack, jit_stack->memctl.memory_data); + return NULL; + } +return jit_stack; + +#endif +} + + +/************************************************* +* Assign a JIT stack to a pattern * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_jit_stack_assign(pcre2_match_context *mcontext, pcre2_jit_callback callback, + void *callback_data) +{ +#ifndef SUPPORT_JIT +(void)mcontext; +(void)callback; +(void)callback_data; +#else /* SUPPORT_JIT */ + +if (mcontext == NULL) return; +mcontext->jit_callback = callback; +mcontext->jit_callback_data = callback_data; + +#endif /* SUPPORT_JIT */ +} + + 
+/************************************************* +* Free a JIT stack * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_jit_stack_free(pcre2_jit_stack *jit_stack) +{ +#ifndef SUPPORT_JIT +(void)jit_stack; +#else /* SUPPORT_JIT */ +if (jit_stack != NULL) + { + sljit_free_stack((struct sljit_stack *)(jit_stack->stack), &jit_stack->memctl); + jit_stack->memctl.free(jit_stack, jit_stack->memctl.memory_data); + } +#endif /* SUPPORT_JIT */ +} + + +/************************************************* +* Get target CPU type * +*************************************************/ + +const char* +PRIV(jit_get_target)(void) +{ +#ifndef SUPPORT_JIT +return "JIT is not supported"; +#else /* SUPPORT_JIT */ +return sljit_get_platform_name(); +#endif /* SUPPORT_JIT */ +} + + +/************************************************* +* Get size of JIT code * +*************************************************/ + +size_t +PRIV(jit_get_size)(void *executable_jit) +{ +#ifndef SUPPORT_JIT +(void)executable_jit; +return 0; +#else /* SUPPORT_JIT */ +sljit_uw *executable_sizes = ((executable_functions *)executable_jit)->executable_sizes; +SLJIT_COMPILE_ASSERT(JIT_NUMBER_OF_COMPILE_MODES == 3, number_of_compile_modes_changed); +return executable_sizes[0] + executable_sizes[1] + executable_sizes[2]; +#endif +} + +/* End of pcre2_jit_misc.c */ diff --git a/src/pcre2/src/pcre2_jit_neon_inc.h b/src/pcre2/src/pcre2_jit_neon_inc.h new file mode 100644 index 00000000..150da29e --- /dev/null +++ b/src/pcre2/src/pcre2_jit_neon_inc.h @@ -0,0 +1,347 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + This module by Zoltan Herczeg and Sebastian Pop + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. 
+----------------------------------------------------------------------------- +*/ + +# if defined(FFCS) +# if defined(FF_UTF) +# define FF_FUN ffcs_utf +# else +# define FF_FUN ffcs +# endif + +# elif defined(FFCS_2) +# if defined(FF_UTF) +# define FF_FUN ffcs_2_utf +# else +# define FF_FUN ffcs_2 +# endif + +# elif defined(FFCS_MASK) +# if defined(FF_UTF) +# define FF_FUN ffcs_mask_utf +# else +# define FF_FUN ffcs_mask +# endif + +# elif defined(FFCPS_0) +# if defined (FF_UTF) +# define FF_FUN ffcps_0_utf +# else +# define FF_FUN ffcps_0 +# endif + +# elif defined (FFCPS_1) +# if defined (FF_UTF) +# define FF_FUN ffcps_1_utf +# else +# define FF_FUN ffcps_1 +# endif + +# elif defined (FFCPS_DEFAULT) +# if defined (FF_UTF) +# define FF_FUN ffcps_default_utf +# else +# define FF_FUN ffcps_default +# endif +# endif + +static sljit_u8* SLJIT_FUNC FF_FUN(sljit_u8 *str_end, sljit_u8 *str_ptr, sljit_uw offs1, sljit_uw offs2, sljit_uw chars) +#undef FF_FUN +{ +quad_word qw; +int_char ic; + +SLJIT_UNUSED_ARG(offs1); +SLJIT_UNUSED_ARG(offs2); + +ic.x = chars; + +#if defined(FFCS) +sljit_u8 c1 = ic.c.c1; +vect_t vc1 = VDUPQ(c1); + +#elif defined(FFCS_2) +sljit_u8 c1 = ic.c.c1; +vect_t vc1 = VDUPQ(c1); +sljit_u8 c2 = ic.c.c2; +vect_t vc2 = VDUPQ(c2); + +#elif defined(FFCS_MASK) +sljit_u8 c1 = ic.c.c1; +vect_t vc1 = VDUPQ(c1); +sljit_u8 mask = ic.c.c2; +vect_t vmask = VDUPQ(mask); +#endif + +#if defined(FFCPS) +compare_type compare1_type = compare_match1; +compare_type compare2_type = compare_match1; +vect_t cmp1a, cmp1b, cmp2a, cmp2b; +const sljit_u32 diff = IN_UCHARS(offs1 - offs2); +PCRE2_UCHAR char1a = ic.c.c1; +PCRE2_UCHAR char2a = ic.c.c3; + +# ifdef FFCPS_CHAR1A2A +cmp1a = VDUPQ(char1a); +cmp2a = VDUPQ(char2a); +cmp1b = VDUPQ(0); /* to avoid errors on older compilers -Werror=maybe-uninitialized */ +cmp2b = VDUPQ(0); /* to avoid errors on older compilers -Werror=maybe-uninitialized */ +# else +PCRE2_UCHAR char1b = ic.c.c2; +PCRE2_UCHAR char2b = ic.c.c4; +if (char1a == 
char1b) + { + cmp1a = VDUPQ(char1a); + cmp1b = VDUPQ(0); /* to avoid errors on older compilers -Werror=maybe-uninitialized */ + } +else + { + sljit_u32 bit1 = char1a ^ char1b; + if (is_powerof2(bit1)) + { + compare1_type = compare_match1i; + cmp1a = VDUPQ(char1a | bit1); + cmp1b = VDUPQ(bit1); + } + else + { + compare1_type = compare_match2; + cmp1a = VDUPQ(char1a); + cmp1b = VDUPQ(char1b); + } + } + +if (char2a == char2b) + { + cmp2a = VDUPQ(char2a); + cmp2b = VDUPQ(0); /* to avoid errors on older compilers -Werror=maybe-uninitialized */ + } +else + { + sljit_u32 bit2 = char2a ^ char2b; + if (is_powerof2(bit2)) + { + compare2_type = compare_match1i; + cmp2a = VDUPQ(char2a | bit2); + cmp2b = VDUPQ(bit2); + } + else + { + compare2_type = compare_match2; + cmp2a = VDUPQ(char2a); + cmp2b = VDUPQ(char2b); + } + } +# endif + +str_ptr += IN_UCHARS(offs1); +#endif + +#if PCRE2_CODE_UNIT_WIDTH != 8 +vect_t char_mask = VDUPQ(0xff); +#endif + +#if defined(FF_UTF) +restart:; +#endif + +#if defined(FFCPS) +sljit_u8 *p1 = str_ptr - diff; +#endif +sljit_s32 align_offset = ((uint64_t)str_ptr & 0xf); +str_ptr = (sljit_u8 *) ((uint64_t)str_ptr & ~0xf); +vect_t data = VLD1Q(str_ptr); +#if PCRE2_CODE_UNIT_WIDTH != 8 +data = VANDQ(data, char_mask); +#endif + +#if defined(FFCS) +vect_t eq = VCEQQ(data, vc1); + +#elif defined(FFCS_2) +vect_t eq1 = VCEQQ(data, vc1); +vect_t eq2 = VCEQQ(data, vc2); +vect_t eq = VORRQ(eq1, eq2); + +#elif defined(FFCS_MASK) +vect_t eq = VORRQ(data, vmask); +eq = VCEQQ(eq, vc1); + +#elif defined(FFCPS) +# if defined(FFCPS_DIFF1) +vect_t prev_data = data; +# endif + +vect_t data2; +if (p1 < str_ptr) + { + data2 = VLD1Q(str_ptr - diff); +#if PCRE2_CODE_UNIT_WIDTH != 8 + data2 = VANDQ(data2, char_mask); +#endif + } +else + data2 = shift_left_n_lanes(data, offs1 - offs2); + +if (compare1_type == compare_match1) + data = VCEQQ(data, cmp1a); +else + data = fast_forward_char_pair_compare(compare1_type, data, cmp1a, cmp1b); + +if (compare2_type == compare_match1) + 
data2 = VCEQQ(data2, cmp2a); +else + data2 = fast_forward_char_pair_compare(compare2_type, data2, cmp2a, cmp2b); + +vect_t eq = VANDQ(data, data2); +#endif + +VST1Q(qw.mem, eq); +/* Ignore matches before the first STR_PTR. */ +if (align_offset < 8) + { + qw.dw[0] >>= align_offset * 8; + if (qw.dw[0]) + { + str_ptr += align_offset + __builtin_ctzll(qw.dw[0]) / 8; + goto match; + } + if (qw.dw[1]) + { + str_ptr += 8 + __builtin_ctzll(qw.dw[1]) / 8; + goto match; + } + } +else + { + qw.dw[1] >>= (align_offset - 8) * 8; + if (qw.dw[1]) + { + str_ptr += align_offset + __builtin_ctzll(qw.dw[1]) / 8; + goto match; + } + } +str_ptr += 16; + +while (str_ptr < str_end) + { + vect_t orig_data = VLD1Q(str_ptr); +#if PCRE2_CODE_UNIT_WIDTH != 8 + orig_data = VANDQ(orig_data, char_mask); +#endif + data = orig_data; + +#if defined(FFCS) + eq = VCEQQ(data, vc1); + +#elif defined(FFCS_2) + eq1 = VCEQQ(data, vc1); + eq2 = VCEQQ(data, vc2); + eq = VORRQ(eq1, eq2); + +#elif defined(FFCS_MASK) + eq = VORRQ(data, vmask); + eq = VCEQQ(eq, vc1); +#endif + +#if defined(FFCPS) +# if defined (FFCPS_DIFF1) + data2 = VEXTQ(prev_data, data, VECTOR_FACTOR - 1); +# else + data2 = VLD1Q(str_ptr - diff); +# if PCRE2_CODE_UNIT_WIDTH != 8 + data2 = VANDQ(data2, char_mask); +# endif +# endif + +# ifdef FFCPS_CHAR1A2A + data = VCEQQ(data, cmp1a); + data2 = VCEQQ(data2, cmp2a); +# else + if (compare1_type == compare_match1) + data = VCEQQ(data, cmp1a); + else + data = fast_forward_char_pair_compare(compare1_type, data, cmp1a, cmp1b); + if (compare2_type == compare_match1) + data2 = VCEQQ(data2, cmp2a); + else + data2 = fast_forward_char_pair_compare(compare2_type, data2, cmp2a, cmp2b); +# endif + + eq = VANDQ(data, data2); +#endif + + VST1Q(qw.mem, eq); + if (qw.dw[0]) + str_ptr += __builtin_ctzll(qw.dw[0]) / 8; + else if (qw.dw[1]) + str_ptr += 8 + __builtin_ctzll(qw.dw[1]) / 8; + else { + str_ptr += 16; +#if defined (FFCPS_DIFF1) + prev_data = orig_data; +#endif + continue; + } + +match:; + if (str_ptr 
>= str_end) + /* Failed match. */ + return NULL; + +#if defined(FF_UTF) + if (utf_continue(str_ptr + IN_UCHARS(-offs1))) + { + /* Not a match. */ + str_ptr += IN_UCHARS(1); + goto restart; + } +#endif + + /* Match. */ +#if defined (FFCPS) + str_ptr -= IN_UCHARS(offs1); +#endif + return str_ptr; + } + +/* Failed match. */ +return NULL; +} diff --git a/src/pcre2/src/pcre2_jit_simd_inc.h b/src/pcre2/src/pcre2_jit_simd_inc.h new file mode 100644 index 00000000..5fd97b15 --- /dev/null +++ b/src/pcre2/src/pcre2_jit_simd_inc.h @@ -0,0 +1,1867 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + This module by Zoltan Herczeg + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#if !(defined SUPPORT_VALGRIND) + +#if ((defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) \ + || (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X)) + +typedef enum { + vector_compare_match1, + vector_compare_match1i, + vector_compare_match2, +} vector_compare_type; + +static SLJIT_INLINE sljit_u32 max_fast_forward_char_pair_offset(void) +{ +#if PCRE2_CODE_UNIT_WIDTH == 8 +return 15; +#elif PCRE2_CODE_UNIT_WIDTH == 16 +return 7; +#elif PCRE2_CODE_UNIT_WIDTH == 32 +return 3; +#else +#error "Unsupported unit width" +#endif +} + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +static struct sljit_jump *jump_if_utf_char_start(struct sljit_compiler *compiler, sljit_s32 reg) +{ +#if PCRE2_CODE_UNIT_WIDTH == 8 +OP2(SLJIT_AND, reg, 0, reg, 0, SLJIT_IMM, 0xc0); +return CMP(SLJIT_NOT_EQUAL, reg, 0, SLJIT_IMM, 0x80); +#elif PCRE2_CODE_UNIT_WIDTH == 16 +OP2(SLJIT_AND, reg, 0, reg, 0, SLJIT_IMM, 0xfc00); +return CMP(SLJIT_NOT_EQUAL, reg, 0, SLJIT_IMM, 0xdc00); +#else +#error "Unknown code width" +#endif +} +#endif + +#endif /* SLJIT_CONFIG_X86 || SLJIT_CONFIG_S390X */ + +#if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) + +static 
sljit_s32 character_to_int32(PCRE2_UCHAR chr) +{ +sljit_u32 value = chr; +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define SSE2_COMPARE_TYPE_INDEX 0 +return (sljit_s32)((value << 24) | (value << 16) | (value << 8) | value); +#elif PCRE2_CODE_UNIT_WIDTH == 16 +#define SSE2_COMPARE_TYPE_INDEX 1 +return (sljit_s32)((value << 16) | value); +#elif PCRE2_CODE_UNIT_WIDTH == 32 +#define SSE2_COMPARE_TYPE_INDEX 2 +return (sljit_s32)(value); +#else +#error "Unsupported unit width" +#endif +} + +static void load_from_mem_sse2(struct sljit_compiler *compiler, sljit_s32 dst_xmm_reg, sljit_s32 src_general_reg, sljit_s8 offset) +{ +sljit_u8 instruction[5]; + +SLJIT_ASSERT(dst_xmm_reg < 8); +SLJIT_ASSERT(src_general_reg < 8); + +/* MOVDQA xmm1, xmm2/m128 */ +instruction[0] = ((sljit_u8)offset & 0xf) == 0 ? 0x66 : 0xf3; +instruction[1] = 0x0f; +instruction[2] = 0x6f; + +if (offset == 0) + { + instruction[3] = (dst_xmm_reg << 3) | src_general_reg; + sljit_emit_op_custom(compiler, instruction, 4); + return; + } + +instruction[3] = 0x40 | (dst_xmm_reg << 3) | src_general_reg; +instruction[4] = (sljit_u8)offset; +sljit_emit_op_custom(compiler, instruction, 5); +} + +static void fast_forward_char_pair_sse2_compare(struct sljit_compiler *compiler, vector_compare_type compare_type, + int step, sljit_s32 dst_ind, sljit_s32 cmp1_ind, sljit_s32 cmp2_ind, sljit_s32 tmp_ind) +{ +sljit_u8 instruction[4]; +instruction[0] = 0x66; +instruction[1] = 0x0f; + +SLJIT_ASSERT(step >= 0 && step <= 3); + +if (compare_type != vector_compare_match2) + { + if (step == 0) + { + if (compare_type == vector_compare_match1i) + { + /* POR xmm1, xmm2/m128 */ + /* instruction[0] = 0x66; */ + /* instruction[1] = 0x0f; */ + instruction[2] = 0xeb; + instruction[3] = 0xc0 | (dst_ind << 3) | cmp2_ind; + sljit_emit_op_custom(compiler, instruction, 4); + } + return; + } + + if (step != 2) + return; + + /* PCMPEQB/W/D xmm1, xmm2/m128 */ + /* instruction[0] = 0x66; */ + /* instruction[1] = 0x0f; */ + instruction[2] = 0x74 + 
SSE2_COMPARE_TYPE_INDEX; + instruction[3] = 0xc0 | (dst_ind << 3) | cmp1_ind; + sljit_emit_op_custom(compiler, instruction, 4); + return; + } + +switch (step) + { + case 0: + /* MOVDQA xmm1, xmm2/m128 */ + /* instruction[0] = 0x66; */ + /* instruction[1] = 0x0f; */ + instruction[2] = 0x6f; + instruction[3] = 0xc0 | (tmp_ind << 3) | dst_ind; + sljit_emit_op_custom(compiler, instruction, 4); + return; + + case 1: + /* PCMPEQB/W/D xmm1, xmm2/m128 */ + /* instruction[0] = 0x66; */ + /* instruction[1] = 0x0f; */ + instruction[2] = 0x74 + SSE2_COMPARE_TYPE_INDEX; + instruction[3] = 0xc0 | (dst_ind << 3) | cmp1_ind; + sljit_emit_op_custom(compiler, instruction, 4); + return; + + case 2: + /* PCMPEQB/W/D xmm1, xmm2/m128 */ + /* instruction[0] = 0x66; */ + /* instruction[1] = 0x0f; */ + instruction[2] = 0x74 + SSE2_COMPARE_TYPE_INDEX; + instruction[3] = 0xc0 | (tmp_ind << 3) | cmp2_ind; + sljit_emit_op_custom(compiler, instruction, 4); + return; + + case 3: + /* POR xmm1, xmm2/m128 */ + /* instruction[0] = 0x66; */ + /* instruction[1] = 0x0f; */ + instruction[2] = 0xeb; + instruction[3] = 0xc0 | (dst_ind << 3) | tmp_ind; + sljit_emit_op_custom(compiler, instruction, 4); + return; + } +} + +#define JIT_HAS_FAST_FORWARD_CHAR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_SSE2)) + +static void fast_forward_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2, sljit_s32 offset) +{ +DEFINE_COMPILER; +sljit_u8 instruction[8]; +struct sljit_label *start; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_label *restart; +#endif +struct sljit_jump *quit; +struct sljit_jump *partial_quit[2]; +vector_compare_type compare_type = vector_compare_match1; +sljit_s32 tmp1_reg_ind = sljit_get_register_index(TMP1); +sljit_s32 str_ptr_reg_ind = sljit_get_register_index(STR_PTR); +sljit_s32 data_ind = 0; +sljit_s32 tmp_ind = 1; +sljit_s32 cmp1_ind = 2; +sljit_s32 cmp2_ind = 3; +sljit_u32 bit = 0; +int i; + +SLJIT_UNUSED_ARG(offset); + +if (char1 != char2) + { + 
bit = char1 ^ char2; + compare_type = vector_compare_match1i; + + if (!is_powerof2(bit)) + { + bit = 0; + compare_type = vector_compare_match2; + } + } + +partial_quit[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit[0]); + +/* First part (unaligned start) */ + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit)); + +SLJIT_ASSERT(tmp1_reg_ind < 8); + +/* MOVD xmm, r/m32 */ +instruction[0] = 0x66; +instruction[1] = 0x0f; +instruction[2] = 0x6e; +instruction[3] = 0xc0 | (cmp1_ind << 3) | tmp1_reg_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +if (char1 != char2) + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(bit != 0 ? bit : char2)); + + /* MOVD xmm, r/m32 */ + instruction[3] = 0xc0 | (cmp2_ind << 3) | tmp1_reg_ind; + sljit_emit_op_custom(compiler, instruction, 4); + } + +OP1(SLJIT_MOV, TMP2, 0, STR_PTR, 0); + +/* PSHUFD xmm1, xmm2/m128, imm8 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0x70; +instruction[3] = 0xc0 | (cmp1_ind << 3) | cmp1_ind; +instruction[4] = 0; +sljit_emit_op_custom(compiler, instruction, 5); + +if (char1 != char2) + { + /* PSHUFD xmm1, xmm2/m128, imm8 */ + instruction[3] = 0xc0 | (cmp2_ind << 3) | cmp2_ind; + sljit_emit_op_custom(compiler, instruction, 5); + } + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +restart = LABEL(); +#endif +OP2(SLJIT_AND, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, ~0xf); +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xf); + +load_from_mem_sse2(compiler, data_ind, str_ptr_reg_ind, 0); +for (i = 0; i < 4; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + +/* PMOVMSKB reg, xmm */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xd7; +instruction[3] = 0xc0 | (tmp1_reg_ind << 3) | data_ind; +sljit_emit_op_custom(compiler, instruction, 4); + 
+OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); +OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, TMP2, 0); + +quit = CMP(SLJIT_NOT_ZERO, TMP1, 0, SLJIT_IMM, 0); + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + +/* Second part (aligned) */ +start = LABEL(); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, 16); + +partial_quit[1] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit[1]); + +load_from_mem_sse2(compiler, data_ind, str_ptr_reg_ind, 0); +for (i = 0; i < 4; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + +/* PMOVMSKB reg, xmm */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xd7; +instruction[3] = 0xc0 | (tmp1_reg_ind << 3) | data_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +CMPTO(SLJIT_ZERO, TMP1, 0, SLJIT_IMM, 0, start); + +JUMPHERE(quit); + +/* BSF r32, r/m32 */ +instruction[0] = 0x0f; +instruction[1] = 0xbc; +instruction[2] = 0xc0 | (tmp1_reg_ind << 3) | tmp1_reg_ind; +sljit_emit_op_custom(compiler, instruction, 3); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); + +if (common->mode != PCRE2_JIT_COMPLETE) + { + JUMPHERE(partial_quit[0]); + JUMPHERE(partial_quit[1]); + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_PTR, 0, STR_END, 0); + CMOV(SLJIT_GREATER, STR_PTR, STR_END, 0); + } +else + add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +if (common->utf && offset > 0) + { + SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE); + + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-offset)); + + quit = jump_if_utf_char_start(compiler, TMP1); + + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + OP1(SLJIT_MOV, TMP2, 0, STR_PTR, 
0); + JUMPTO(SLJIT_JUMP, restart); + + JUMPHERE(quit); + } +#endif +} + +#define JIT_HAS_FAST_REQUESTED_CHAR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_SSE2)) + +static jump_list *fast_requested_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2) +{ +DEFINE_COMPILER; +sljit_u8 instruction[8]; +struct sljit_label *start; +struct sljit_jump *quit; +jump_list *not_found = NULL; +vector_compare_type compare_type = vector_compare_match1; +sljit_s32 tmp1_reg_ind = sljit_get_register_index(TMP1); +sljit_s32 str_ptr_reg_ind = sljit_get_register_index(STR_PTR); +sljit_s32 data_ind = 0; +sljit_s32 tmp_ind = 1; +sljit_s32 cmp1_ind = 2; +sljit_s32 cmp2_ind = 3; +sljit_u32 bit = 0; +int i; + +if (char1 != char2) + { + bit = char1 ^ char2; + compare_type = vector_compare_match1i; + + if (!is_powerof2(bit)) + { + bit = 0; + compare_type = vector_compare_match2; + } + } + +add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0)); +OP1(SLJIT_MOV, TMP2, 0, TMP1, 0); +OP1(SLJIT_MOV, TMP3, 0, STR_PTR, 0); + +/* First part (unaligned start) */ + +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1 | bit)); + +SLJIT_ASSERT(tmp1_reg_ind < 8); + +/* MOVD xmm, r/m32 */ +instruction[0] = 0x66; +instruction[1] = 0x0f; +instruction[2] = 0x6e; +instruction[3] = 0xc0 | (cmp1_ind << 3) | tmp1_reg_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +if (char1 != char2) + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(bit != 0 ?
bit : char2)); + + /* MOVD xmm, r/m32 */ + instruction[3] = 0xc0 | (cmp2_ind << 3) | tmp1_reg_ind; + sljit_emit_op_custom(compiler, instruction, 4); + } + +OP1(SLJIT_MOV, STR_PTR, 0, TMP2, 0); + +/* PSHUFD xmm1, xmm2/m128, imm8 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0x70; +instruction[3] = 0xc0 | (cmp1_ind << 3) | cmp1_ind; +instruction[4] = 0; +sljit_emit_op_custom(compiler, instruction, 5); + +if (char1 != char2) + { + /* PSHUFD xmm1, xmm2/m128, imm8 */ + instruction[3] = 0xc0 | (cmp2_ind << 3) | cmp2_ind; + sljit_emit_op_custom(compiler, instruction, 5); + } + +OP2(SLJIT_AND, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, ~0xf); +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xf); + +load_from_mem_sse2(compiler, data_ind, str_ptr_reg_ind, 0); +for (i = 0; i < 4; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + +/* PMOVMSKB reg, xmm */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xd7; +instruction[3] = 0xc0 | (tmp1_reg_ind << 3) | data_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); +OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, TMP2, 0); + +quit = CMP(SLJIT_NOT_ZERO, TMP1, 0, SLJIT_IMM, 0); + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + +/* Second part (aligned) */ +start = LABEL(); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, 16); + +add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +load_from_mem_sse2(compiler, data_ind, str_ptr_reg_ind, 0); +for (i = 0; i < 4; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + +/* PMOVMSKB reg, xmm */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xd7; +instruction[3] = 0xc0 | (tmp1_reg_ind << 3) | data_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +CMPTO(SLJIT_ZERO, TMP1, 0, SLJIT_IMM, 0, start); +
+JUMPHERE(quit); + +/* BSF r32, r/m32 */ +instruction[0] = 0x0f; +instruction[1] = 0xbc; +instruction[2] = 0xc0 | (tmp1_reg_ind << 3) | tmp1_reg_ind; +sljit_emit_op_custom(compiler, instruction, 3); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, STR_PTR, 0); +add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0)); + +OP1(SLJIT_MOV, STR_PTR, 0, TMP3, 0); +return not_found; +} + +#ifndef _WIN64 + +#define JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD (sljit_has_cpu_feature(SLJIT_HAS_SSE2)) + +static void fast_forward_char_pair_simd(compiler_common *common, sljit_s32 offs1, + PCRE2_UCHAR char1a, PCRE2_UCHAR char1b, sljit_s32 offs2, PCRE2_UCHAR char2a, PCRE2_UCHAR char2b) +{ +DEFINE_COMPILER; +sljit_u8 instruction[8]; +vector_compare_type compare1_type = vector_compare_match1; +vector_compare_type compare2_type = vector_compare_match1; +sljit_u32 bit1 = 0; +sljit_u32 bit2 = 0; +sljit_u32 diff = IN_UCHARS(offs1 - offs2); +sljit_s32 tmp1_reg_ind = sljit_get_register_index(TMP1); +sljit_s32 tmp2_reg_ind = sljit_get_register_index(TMP2); +sljit_s32 str_ptr_reg_ind = sljit_get_register_index(STR_PTR); +sljit_s32 data1_ind = 0; +sljit_s32 data2_ind = 1; +sljit_s32 tmp1_ind = 2; +sljit_s32 tmp2_ind = 3; +sljit_s32 cmp1a_ind = 4; +sljit_s32 cmp1b_ind = 5; +sljit_s32 cmp2a_ind = 6; +sljit_s32 cmp2b_ind = 7; +struct sljit_label *start; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_label *restart; +#endif +struct sljit_jump *jump[2]; +int i; + +SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE && offs1 > offs2); +SLJIT_ASSERT(diff <= IN_UCHARS(max_fast_forward_char_pair_offset())); +SLJIT_ASSERT(tmp1_reg_ind < 8 && tmp2_reg_ind == 1); + +/* Initialize.
*/ +if (common->match_end_ptr != 0) + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); + OP1(SLJIT_MOV, TMP3, 0, STR_END, 0); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(offs1 + 1)); + + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, STR_END, 0); + CMOV(SLJIT_LESS, STR_END, TMP1, 0); + } + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offs1)); +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +/* MOVD xmm, r/m32 */ +instruction[0] = 0x66; +instruction[1] = 0x0f; +instruction[2] = 0x6e; + +if (char1a == char1b) + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1a)); +else + { + bit1 = char1a ^ char1b; + if (is_powerof2(bit1)) + { + compare1_type = vector_compare_match1i; + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1a | bit1)); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, character_to_int32(bit1)); + } + else + { + compare1_type = vector_compare_match2; + bit1 = 0; + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char1a)); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, character_to_int32(char1b)); + } + } + +instruction[3] = 0xc0 | (cmp1a_ind << 3) | tmp1_reg_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +if (char1a != char1b) + { + instruction[3] = 0xc0 | (cmp1b_ind << 3) | tmp2_reg_ind; + sljit_emit_op_custom(compiler, instruction, 4); + } + +if (char2a == char2b) + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char2a)); +else + { + bit2 = char2a ^ char2b; + if (is_powerof2(bit2)) + { + compare2_type = vector_compare_match1i; + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char2a | bit2)); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, character_to_int32(bit2)); + } + else + { + compare2_type = vector_compare_match2; + bit2 = 0; + OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, character_to_int32(char2a)); + OP1(SLJIT_MOV, TMP2, 0, SLJIT_IMM, character_to_int32(char2b)); + } + } + +instruction[3] = 0xc0 | (cmp2a_ind << 3) | tmp1_reg_ind; 
+sljit_emit_op_custom(compiler, instruction, 4); + +if (char2a != char2b) + { + instruction[3] = 0xc0 | (cmp2b_ind << 3) | tmp2_reg_ind; + sljit_emit_op_custom(compiler, instruction, 4); + } + +/* PSHUFD xmm1, xmm2/m128, imm8 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0x70; +instruction[4] = 0; + +instruction[3] = 0xc0 | (cmp1a_ind << 3) | cmp1a_ind; +sljit_emit_op_custom(compiler, instruction, 5); + +if (char1a != char1b) + { + instruction[3] = 0xc0 | (cmp1b_ind << 3) | cmp1b_ind; + sljit_emit_op_custom(compiler, instruction, 5); + } + +instruction[3] = 0xc0 | (cmp2a_ind << 3) | cmp2a_ind; +sljit_emit_op_custom(compiler, instruction, 5); + +if (char2a != char2b) + { + instruction[3] = 0xc0 | (cmp2b_ind << 3) | cmp2b_ind; + sljit_emit_op_custom(compiler, instruction, 5); + } + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +restart = LABEL(); +#endif + +OP2(SLJIT_SUB, TMP1, 0, STR_PTR, 0, SLJIT_IMM, diff); +OP1(SLJIT_MOV, TMP2, 0, STR_PTR, 0); +OP2(SLJIT_AND, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, ~0xf); + +load_from_mem_sse2(compiler, data1_ind, str_ptr_reg_ind, 0); + +jump[0] = CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_PTR, 0); + +load_from_mem_sse2(compiler, data2_ind, str_ptr_reg_ind, -(sljit_s8)diff); +jump[1] = JUMP(SLJIT_JUMP); + +JUMPHERE(jump[0]); + +/* MOVDQA xmm1, xmm2/m128 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0x6f; +instruction[3] = 0xc0 | (data2_ind << 3) | data1_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +/* PSLLDQ xmm1, imm8 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0x73; +instruction[3] = 0xc0 | (7 << 3) | data2_ind; +instruction[4] = diff; +sljit_emit_op_custom(compiler, instruction, 5); + +JUMPHERE(jump[1]); + +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, 0xf); + +for (i = 0; i < 4; i++) + { + fast_forward_char_pair_sse2_compare(compiler, compare2_type, i, data2_ind, cmp2a_ind, cmp2b_ind, tmp2_ind); + 
fast_forward_char_pair_sse2_compare(compiler, compare1_type, i, data1_ind, cmp1a_ind, cmp1b_ind, tmp1_ind); + } + +/* PAND xmm1, xmm2/m128 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xdb; +instruction[3] = 0xc0 | (data1_ind << 3) | data2_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +/* PMOVMSKB reg, xmm */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xd7; +instruction[3] = 0xc0 | (tmp1_reg_ind << 3) | 0; +sljit_emit_op_custom(compiler, instruction, 4); + +/* Ignore matches before the first STR_PTR. */ +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); +OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, TMP2, 0); + +jump[0] = CMP(SLJIT_NOT_ZERO, TMP1, 0, SLJIT_IMM, 0); + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + +/* Main loop. */ +start = LABEL(); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, 16); +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +load_from_mem_sse2(compiler, data1_ind, str_ptr_reg_ind, 0); +load_from_mem_sse2(compiler, data2_ind, str_ptr_reg_ind, -(sljit_s8)diff); + +for (i = 0; i < 4; i++) + { + fast_forward_char_pair_sse2_compare(compiler, compare1_type, i, data1_ind, cmp1a_ind, cmp1b_ind, tmp2_ind); + fast_forward_char_pair_sse2_compare(compiler, compare2_type, i, data2_ind, cmp2a_ind, cmp2b_ind, tmp1_ind); + } + +/* PAND xmm1, xmm2/m128 */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xdb; +instruction[3] = 0xc0 | (data1_ind << 3) | data2_ind; +sljit_emit_op_custom(compiler, instruction, 4); + +/* PMOVMSKB reg, xmm */ +/* instruction[0] = 0x66; */ +/* instruction[1] = 0x0f; */ +instruction[2] = 0xd7; +instruction[3] = 0xc0 | (tmp1_reg_ind << 3) | 0; +sljit_emit_op_custom(compiler, instruction, 4); + +CMPTO(SLJIT_ZERO, TMP1, 0, SLJIT_IMM, 0, start); + +JUMPHERE(jump[0]); + +/* BSF r32, r/m32 */ +instruction[0] = 0x0f; +instruction[1] = 0xbc; +instruction[2] = 0xc0 | (tmp1_reg_ind << 3) | 
tmp1_reg_ind; +sljit_emit_op_custom(compiler, instruction, 3); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); + +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +if (common->utf) + { + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-offs1)); + + jump[0] = jump_if_utf_char_start(compiler, TMP1); + + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + CMPTO(SLJIT_LESS, STR_PTR, 0, STR_END, 0, restart); + + add_jump(compiler, &common->failed_match, JUMP(SLJIT_JUMP)); + + JUMPHERE(jump[0]); + } +#endif + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offs1)); + +if (common->match_end_ptr != 0) + OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); +} + +#endif /* !_WIN64 */ + +#undef SSE2_COMPARE_TYPE_INDEX + +#endif /* SLJIT_CONFIG_X86 */ + +#if (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64 && (defined __ARM_NEON || defined __ARM_NEON__)) + +#include <arm_neon.h> + +typedef union { + unsigned int x; + struct { unsigned char c1, c2, c3, c4; } c; +} int_char; + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +static SLJIT_INLINE int utf_continue(sljit_u8 *s) +{ +#if PCRE2_CODE_UNIT_WIDTH == 8 +return (*s & 0xc0) == 0x80; +#elif PCRE2_CODE_UNIT_WIDTH == 16 +return (*s & 0xfc00) == 0xdc00; +#else +#error "Unknown code width" +#endif +} +#endif /* SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +# define VECTOR_FACTOR 16 +# define vect_t uint8x16_t +# define VLD1Q(X) vld1q_u8((sljit_u8 *)(X)) +# define VCEQQ vceqq_u8 +# define VORRQ vorrq_u8 +# define VST1Q vst1q_u8 +# define VDUPQ vdupq_n_u8 +# define VEXTQ vextq_u8 +# define VANDQ vandq_u8 +typedef union { + uint8_t mem[16]; + uint64_t dw[2]; +} quad_word; +#elif PCRE2_CODE_UNIT_WIDTH == 16 +# define VECTOR_FACTOR 8 +# define vect_t uint16x8_t +# define VLD1Q(X) vld1q_u16((sljit_u16 *)(X)) +# define VCEQQ vceqq_u16 +# define VORRQ vorrq_u16 +# define
VST1Q vst1q_u16 +# define VDUPQ vdupq_n_u16 +# define VEXTQ vextq_u16 +# define VANDQ vandq_u16 +typedef union { + uint16_t mem[8]; + uint64_t dw[2]; +} quad_word; +#else +# define VECTOR_FACTOR 4 +# define vect_t uint32x4_t +# define VLD1Q(X) vld1q_u32((sljit_u32 *)(X)) +# define VCEQQ vceqq_u32 +# define VORRQ vorrq_u32 +# define VST1Q vst1q_u32 +# define VDUPQ vdupq_n_u32 +# define VEXTQ vextq_u32 +# define VANDQ vandq_u32 +typedef union { + uint32_t mem[4]; + uint64_t dw[2]; +} quad_word; +#endif + +#define FFCS +#include "pcre2_jit_neon_inc.h" +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +# define FF_UTF +# include "pcre2_jit_neon_inc.h" +# undef FF_UTF +#endif +#undef FFCS + +#define FFCS_2 +#include "pcre2_jit_neon_inc.h" +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +# define FF_UTF +# include "pcre2_jit_neon_inc.h" +# undef FF_UTF +#endif +#undef FFCS_2 + +#define FFCS_MASK +#include "pcre2_jit_neon_inc.h" +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +# define FF_UTF +# include "pcre2_jit_neon_inc.h" +# undef FF_UTF +#endif +#undef FFCS_MASK + +#define JIT_HAS_FAST_FORWARD_CHAR_SIMD 1 + +static void fast_forward_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2, sljit_s32 offset) +{ +DEFINE_COMPILER; +int_char ic; +struct sljit_jump *partial_quit; +/* Save temporary registers. 
*/ +OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STR_PTR, 0); +OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS1, TMP3, 0); + +/* Prepare function arguments */ +OP1(SLJIT_MOV, SLJIT_R0, 0, STR_END, 0); +OP1(SLJIT_MOV, SLJIT_R1, 0, STR_PTR, 0); +OP1(SLJIT_MOV, SLJIT_R2, 0, SLJIT_IMM, offset); + +if (char1 == char2) + { + ic.c.c1 = char1; + ic.c.c2 = char2; + OP1(SLJIT_MOV, SLJIT_R4, 0, SLJIT_IMM, ic.x); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf && offset > 0) + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_utf)); + else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs)); +#else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs)); +#endif + } +else + { + PCRE2_UCHAR mask = char1 ^ char2; + if (is_powerof2(mask)) + { + ic.c.c1 = char1 | mask; + ic.c.c2 = mask; + OP1(SLJIT_MOV, SLJIT_R4, 0, SLJIT_IMM, ic.x); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf && offset > 0) + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_mask_utf)); + else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_mask)); +#else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_mask)); +#endif + } + else + { + ic.c.c1 = char1; + ic.c.c2 = char2; + OP1(SLJIT_MOV, SLJIT_R4, 0, SLJIT_IMM, ic.x); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf && offset 
> 0) + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_2_utf)); + else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_2)); +#else + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(UW) | SLJIT_ARG3(UW) | SLJIT_ARG4(UW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcs_2)); +#endif + } + } +/* Restore registers. */ +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); +OP1(SLJIT_MOV, TMP3, 0, SLJIT_MEM1(SLJIT_SP), LOCALS1); + +/* Check return value. */ +partial_quit = CMP(SLJIT_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit); + +/* Fast forward STR_PTR to the result of memchr. */ +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_RETURN_REG, 0); + +if (common->mode != PCRE2_JIT_COMPLETE) + JUMPHERE(partial_quit); +} + +typedef enum { + compare_match1, + compare_match1i, + compare_match2, +} compare_type; + +static inline vect_t fast_forward_char_pair_compare(compare_type ctype, vect_t dst, vect_t cmp1, vect_t cmp2) +{ +if (ctype == compare_match2) + { + vect_t tmp = dst; + dst = VCEQQ(dst, cmp1); + tmp = VCEQQ(tmp, cmp2); + dst = VORRQ(dst, tmp); + return dst; + } + +if (ctype == compare_match1i) + dst = VORRQ(dst, cmp2); +dst = VCEQQ(dst, cmp1); +return dst; +} + +static SLJIT_INLINE sljit_u32 max_fast_forward_char_pair_offset(void) +{ +#if PCRE2_CODE_UNIT_WIDTH == 8 +return 15; +#elif PCRE2_CODE_UNIT_WIDTH == 16 +return 7; +#elif PCRE2_CODE_UNIT_WIDTH == 32 +return 3; +#else +#error "Unsupported unit width" +#endif +} + +/* ARM doesn't have a shift left across lanes. 
*/ +static SLJIT_INLINE vect_t shift_left_n_lanes(vect_t a, sljit_u8 n) +{ +vect_t zero = VDUPQ(0); +SLJIT_ASSERT(0 < n && n < VECTOR_FACTOR); +/* VEXTQ takes an immediate as last argument. */ +#define C(X) case X: return VEXTQ(zero, a, VECTOR_FACTOR - X); +switch (n) + { + C(1); C(2); C(3); +#if PCRE2_CODE_UNIT_WIDTH != 32 + C(4); C(5); C(6); C(7); +# if PCRE2_CODE_UNIT_WIDTH != 16 + C(8); C(9); C(10); C(11); C(12); C(13); C(14); C(15); +# endif +#endif + default: + /* Based on the ASSERT(0 < n && n < VECTOR_FACTOR) above, this won't + happen. The return is still here for compilers to not warn. */ + return a; + } +} + +#define FFCPS +#define FFCPS_DIFF1 +#define FFCPS_CHAR1A2A + +#define FFCPS_0 +#include "pcre2_jit_neon_inc.h" +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +# define FF_UTF +# include "pcre2_jit_neon_inc.h" +# undef FF_UTF +#endif +#undef FFCPS_0 + +#undef FFCPS_CHAR1A2A + +#define FFCPS_1 +#include "pcre2_jit_neon_inc.h" +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +# define FF_UTF +# include "pcre2_jit_neon_inc.h" +# undef FF_UTF +#endif +#undef FFCPS_1 + +#undef FFCPS_DIFF1 + +#define FFCPS_DEFAULT +#include "pcre2_jit_neon_inc.h" +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +# define FF_UTF +# include "pcre2_jit_neon_inc.h" +# undef FF_UTF +#endif +#undef FFCPS + +#define JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD 1 + +static void fast_forward_char_pair_simd(compiler_common *common, sljit_s32 offs1, + PCRE2_UCHAR char1a, PCRE2_UCHAR char1b, sljit_s32 offs2, PCRE2_UCHAR char2a, PCRE2_UCHAR char2b) +{ +DEFINE_COMPILER; +sljit_u32 diff = IN_UCHARS(offs1 - offs2); +struct sljit_jump *partial_quit; +int_char ic; +SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE && offs1 > offs2); +SLJIT_ASSERT(diff <= IN_UCHARS(max_fast_forward_char_pair_offset())); +SLJIT_ASSERT(compiler->scratches == 5); + +/* Save temporary register STR_PTR. 
*/ +OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LOCALS0, STR_PTR, 0); + +/* Prepare arguments for the function call. */ +if (common->match_end_ptr == 0) + OP1(SLJIT_MOV, SLJIT_R0, 0, STR_END, 0); +else + { + OP1(SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); + OP2(SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_IMM, IN_UCHARS(offs1 + 1)); + + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, STR_END, 0, SLJIT_R0, 0); + CMOV(SLJIT_LESS, SLJIT_R0, STR_END, 0); + } + +OP1(SLJIT_MOV, SLJIT_R1, 0, STR_PTR, 0); +OP1(SLJIT_MOV_S32, SLJIT_R2, 0, SLJIT_IMM, offs1); +OP1(SLJIT_MOV_S32, SLJIT_R3, 0, SLJIT_IMM, offs2); +ic.c.c1 = char1a; +ic.c.c2 = char1b; +ic.c.c3 = char2a; +ic.c.c4 = char2b; +OP1(SLJIT_MOV_U32, SLJIT_R4, 0, SLJIT_IMM, ic.x); + +if (diff == 1) { + if (char1a == char1b && char2a == char2b) { +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf) + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcps_0_utf)); + else +#endif + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcps_0)); + } else { +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf) + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcps_1_utf)); + else +#endif + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcps_1)); + } +} else { +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 + if (common->utf) + sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcps_default_utf)); + else +#endif + 
sljit_emit_icall(compiler, SLJIT_CALL, SLJIT_RET(SW) | SLJIT_ARG1(SW) | SLJIT_ARG2(SW) | SLJIT_ARG3(SW) | SLJIT_ARG4(SW), + SLJIT_IMM, SLJIT_FUNC_OFFSET(ffcps_default)); +} + +/* Restore STR_PTR register. */ +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_MEM1(SLJIT_SP), LOCALS0); + +/* Check return value. */ +partial_quit = CMP(SLJIT_EQUAL, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0); +add_jump(compiler, &common->failed_match, partial_quit); + +/* Fast forward STR_PTR to the result of memchr. */ +OP1(SLJIT_MOV, STR_PTR, 0, SLJIT_RETURN_REG, 0); + +JUMPHERE(partial_quit); +} + +#endif /* SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64 */ + +#if (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) + +#if PCRE2_CODE_UNIT_WIDTH == 8 +#define VECTOR_ELEMENT_SIZE 0 +#elif PCRE2_CODE_UNIT_WIDTH == 16 +#define VECTOR_ELEMENT_SIZE 1 +#elif PCRE2_CODE_UNIT_WIDTH == 32 +#define VECTOR_ELEMENT_SIZE 2 +#else +#error "Unsupported unit width" +#endif + +static void load_from_mem_vector(struct sljit_compiler *compiler, BOOL vlbb, sljit_s32 dst_vreg, + sljit_s32 base_reg, sljit_s32 index_reg) +{ +sljit_u16 instruction[3]; + +instruction[0] = (sljit_u16)(0xe700 | (dst_vreg << 4) | index_reg); +instruction[1] = (sljit_u16)(base_reg << 12); +instruction[2] = (sljit_u16)((0x8 << 8) | (vlbb ? 
0x07 : 0x06)); + +sljit_emit_op_custom(compiler, instruction, 6); +} + +#if PCRE2_CODE_UNIT_WIDTH == 32 + +static void replicate_imm_vector(struct sljit_compiler *compiler, int step, sljit_s32 dst_vreg, + PCRE2_UCHAR chr, sljit_s32 tmp_general_reg) +{ +sljit_u16 instruction[3]; + +SLJIT_ASSERT(step >= 0 && step <= 1); + +if (chr < 0x7fff) + { + if (step == 1) + return; + + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (dst_vreg << 4)); + instruction[1] = (sljit_u16)chr; + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); + sljit_emit_op_custom(compiler, instruction, 6); + return; + } + +if (step == 0) + { + OP1(SLJIT_MOV, tmp_general_reg, 0, SLJIT_IMM, chr); + + /* VLVG */ + instruction[0] = (sljit_u16)(0xe700 | (dst_vreg << 4) | sljit_get_register_index(tmp_general_reg)); + instruction[1] = 0; + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x22); + sljit_emit_op_custom(compiler, instruction, 6); + return; + } + +/* VREP */ +instruction[0] = (sljit_u16)(0xe700 | (dst_vreg << 4) | dst_vreg); +instruction[1] = 0; +instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xc << 8) | 0x4d); +sljit_emit_op_custom(compiler, instruction, 6); +} + +#endif + +static void fast_forward_char_pair_sse2_compare(struct sljit_compiler *compiler, vector_compare_type compare_type, + int step, sljit_s32 dst_ind, sljit_s32 cmp1_ind, sljit_s32 cmp2_ind, sljit_s32 tmp_ind) +{ +sljit_u16 instruction[3]; + +SLJIT_ASSERT(step >= 0 && step <= 2); + +if (step == 1) + { + /* VCEQ */ + instruction[0] = (sljit_u16)(0xe700 | (dst_ind << 4) | dst_ind); + instruction[1] = (sljit_u16)(cmp1_ind << 12); + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xe << 8) | 0xf8); + sljit_emit_op_custom(compiler, instruction, 6); + return; + } + +if (compare_type != vector_compare_match2) + { + if (step == 0 && compare_type == vector_compare_match1i) + { + /* VO */ + instruction[0] = (sljit_u16)(0xe700 | (dst_ind << 4) | dst_ind); + 
instruction[1] = (sljit_u16)(cmp2_ind << 12); + instruction[2] = (sljit_u16)((0xe << 8) | 0x6a); + sljit_emit_op_custom(compiler, instruction, 6); + } + return; + } + +switch (step) + { + case 0: + /* VCEQ */ + instruction[0] = (sljit_u16)(0xe700 | (tmp_ind << 4) | dst_ind); + instruction[1] = (sljit_u16)(cmp2_ind << 12); + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xe << 8) | 0xf8); + sljit_emit_op_custom(compiler, instruction, 6); + return; + + case 2: + /* VO */ + instruction[0] = (sljit_u16)(0xe700 | (dst_ind << 4) | dst_ind); + instruction[1] = (sljit_u16)(tmp_ind << 12); + instruction[2] = (sljit_u16)((0xe << 8) | 0x6a); + sljit_emit_op_custom(compiler, instruction, 6); + return; + } +} + +#define JIT_HAS_FAST_FORWARD_CHAR_SIMD 1 + +static void fast_forward_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2, sljit_s32 offset) +{ +DEFINE_COMPILER; +sljit_u16 instruction[3]; +struct sljit_label *start; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_label *restart; +#endif +struct sljit_jump *quit; +struct sljit_jump *partial_quit[2]; +vector_compare_type compare_type = vector_compare_match1; +sljit_s32 tmp1_reg_ind = sljit_get_register_index(TMP1); +sljit_s32 str_ptr_reg_ind = sljit_get_register_index(STR_PTR); +sljit_s32 data_ind = 0; +sljit_s32 tmp_ind = 1; +sljit_s32 cmp1_ind = 2; +sljit_s32 cmp2_ind = 3; +sljit_s32 zero_ind = 4; +sljit_u32 bit = 0; +int i; + +SLJIT_UNUSED_ARG(offset); + +if (char1 != char2) + { + bit = char1 ^ char2; + compare_type = vector_compare_match1i; + + if (!is_powerof2(bit)) + { + bit = 0; + compare_type = vector_compare_match2; + } + } + +partial_quit[0] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit[0]); + +/* First part (unaligned start) */ + +OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, 16); + +#if PCRE2_CODE_UNIT_WIDTH != 32 + +/* VREPI */ +instruction[0] = 
(sljit_u16)(0xe700 | (cmp1_ind << 4)); +instruction[1] = (sljit_u16)(char1 | bit); +instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); +sljit_emit_op_custom(compiler, instruction, 6); + +if (char1 != char2) + { + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (cmp2_ind << 4)); + instruction[1] = (sljit_u16)(bit != 0 ? bit : char2); + /* instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); */ + sljit_emit_op_custom(compiler, instruction, 6); + } + +#else /* PCRE2_CODE_UNIT_WIDTH == 32 */ + +for (int i = 0; i < 2; i++) + { + replicate_imm_vector(compiler, i, cmp1_ind, char1 | bit, TMP1); + + if (char1 != char2) + replicate_imm_vector(compiler, i, cmp2_ind, bit != 0 ? bit : char2, TMP1); + } + +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ + +if (compare_type == vector_compare_match2) + { + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (zero_ind << 4)); + instruction[1] = 0; + instruction[2] = (sljit_u16)((0x8 << 8) | 0x45); + sljit_emit_op_custom(compiler, instruction, 6); + } + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +restart = LABEL(); +#endif + +load_from_mem_vector(compiler, TRUE, data_ind, str_ptr_reg_ind, 0); +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, ~15); + +if (compare_type != vector_compare_match2) + { + if (compare_type == vector_compare_match1i) + fast_forward_char_pair_sse2_compare(compiler, compare_type, 0, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFEE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = (sljit_u16)((cmp1_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xe << 8) | 0x80); + sljit_emit_op_custom(compiler, instruction, 6); + } +else + { + for (i = 0; i < 3; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFENE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = 
(sljit_u16)((zero_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((0xe << 8) | 0x81); + sljit_emit_op_custom(compiler, instruction, 6); + } + +/* TODO: use sljit_set_current_flags */ + +/* VLGVB */ +instruction[0] = (sljit_u16)(0xe700 | (tmp1_reg_ind << 4) | data_ind); +instruction[1] = 7; +instruction[2] = (sljit_u16)((0x4 << 8) | 0x21); +sljit_emit_op_custom(compiler, instruction, 6); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); +quit = CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0); + +OP2(SLJIT_SUB, STR_PTR, 0, TMP2, 0, SLJIT_IMM, 16); + +/* Second part (aligned) */ +start = LABEL(); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, 16); + +partial_quit[1] = CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0); +if (common->mode == PCRE2_JIT_COMPLETE) + add_jump(compiler, &common->failed_match, partial_quit[1]); + +load_from_mem_vector(compiler, TRUE, data_ind, str_ptr_reg_ind, 0); + +if (compare_type != vector_compare_match2) + { + if (compare_type == vector_compare_match1i) + fast_forward_char_pair_sse2_compare(compiler, compare_type, 0, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFEE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = (sljit_u16)((cmp1_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xe << 8) | 0x80); + sljit_emit_op_custom(compiler, instruction, 6); + } +else + { + for (i = 0; i < 3; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFENE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = (sljit_u16)((zero_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((0xe << 8) | 0x81); + sljit_emit_op_custom(compiler, instruction, 6); + } + +/* TODO: use sljit_set_current_flags */ + +/* VLGVB */ +instruction[0] = (sljit_u16)(0xe700 | (tmp1_reg_ind << 4) | data_ind); +instruction[1] = 7; +instruction[2] = (sljit_u16)((0x4 << 8) | 0x21); 
+sljit_emit_op_custom(compiler, instruction, 6); + +CMPTO(SLJIT_GREATER_EQUAL, TMP1, 0, SLJIT_IMM, 16, start); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); + +JUMPHERE(quit); + +if (common->mode != PCRE2_JIT_COMPLETE) + { + JUMPHERE(partial_quit[0]); + JUMPHERE(partial_quit[1]); + OP2(SLJIT_SUB | SLJIT_SET_GREATER, SLJIT_UNUSED, 0, STR_PTR, 0, STR_END, 0); + CMOV(SLJIT_GREATER, STR_PTR, STR_END, 0); + } +else + add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +if (common->utf && offset > 0) + { + SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE); + + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-offset)); + + quit = jump_if_utf_char_start(compiler, TMP1); + + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(1)); + add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + + OP2(SLJIT_ADD, TMP2, 0, STR_PTR, 0, SLJIT_IMM, 16); + JUMPTO(SLJIT_JUMP, restart); + + JUMPHERE(quit); + } +#endif +} + +#define JIT_HAS_FAST_REQUESTED_CHAR_SIMD 1 + +static jump_list *fast_requested_char_simd(compiler_common *common, PCRE2_UCHAR char1, PCRE2_UCHAR char2) +{ +DEFINE_COMPILER; +sljit_u16 instruction[3]; +struct sljit_label *start; +struct sljit_jump *quit; +jump_list *not_found = NULL; +vector_compare_type compare_type = vector_compare_match1; +sljit_s32 tmp1_reg_ind = sljit_get_register_index(TMP1); +sljit_s32 tmp3_reg_ind = sljit_get_register_index(TMP3); +sljit_s32 data_ind = 0; +sljit_s32 tmp_ind = 1; +sljit_s32 cmp1_ind = 2; +sljit_s32 cmp2_ind = 3; +sljit_s32 zero_ind = 4; +sljit_u32 bit = 0; +int i; + +if (char1 != char2) + { + bit = char1 ^ char2; + compare_type = vector_compare_match1i; + + if (!is_powerof2(bit)) + { + bit = 0; + compare_type = vector_compare_match2; + } + } + +add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0)); + +/* First part (unaligned start) */ +
+OP2(SLJIT_ADD, TMP2, 0, TMP1, 0, SLJIT_IMM, 16); + +#if PCRE2_CODE_UNIT_WIDTH != 32 + +/* VREPI */ +instruction[0] = (sljit_u16)(0xe700 | (cmp1_ind << 4)); +instruction[1] = (sljit_u16)(char1 | bit); +instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); +sljit_emit_op_custom(compiler, instruction, 6); + +if (char1 != char2) + { + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (cmp2_ind << 4)); + instruction[1] = (sljit_u16)(bit != 0 ? bit : char2); + /* instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); */ + sljit_emit_op_custom(compiler, instruction, 6); + } + +#else /* PCRE2_CODE_UNIT_WIDTH == 32 */ + +for (int i = 0; i < 2; i++) + { + replicate_imm_vector(compiler, i, cmp1_ind, char1 | bit, TMP3); + + if (char1 != char2) + replicate_imm_vector(compiler, i, cmp2_ind, bit != 0 ? bit : char2, TMP3); + } + +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ + +if (compare_type == vector_compare_match2) + { + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (zero_ind << 4)); + instruction[1] = 0; + instruction[2] = (sljit_u16)((0x8 << 8) | 0x45); + sljit_emit_op_custom(compiler, instruction, 6); + } + +load_from_mem_vector(compiler, TRUE, data_ind, tmp1_reg_ind, 0); +OP2(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_IMM, ~15); + +if (compare_type != vector_compare_match2) + { + if (compare_type == vector_compare_match1i) + fast_forward_char_pair_sse2_compare(compiler, compare_type, 0, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFEE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = (sljit_u16)((cmp1_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xe << 8) | 0x80); + sljit_emit_op_custom(compiler, instruction, 6); + } +else + { + for (i = 0; i < 3; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFENE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + 
instruction[1] = (sljit_u16)((zero_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((0xe << 8) | 0x81); + sljit_emit_op_custom(compiler, instruction, 6); + } + +/* TODO: use sljit_set_current_flags */ + +/* VLGVB */ +instruction[0] = (sljit_u16)(0xe700 | (tmp3_reg_ind << 4) | data_ind); +instruction[1] = 7; +instruction[2] = (sljit_u16)((0x4 << 8) | 0x21); +sljit_emit_op_custom(compiler, instruction, 6); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP3, 0); +quit = CMP(SLJIT_LESS, TMP1, 0, TMP2, 0); + +OP2(SLJIT_SUB, TMP1, 0, TMP2, 0, SLJIT_IMM, 16); + +/* Second part (aligned) */ +start = LABEL(); + +OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 16); + +add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0)); + +load_from_mem_vector(compiler, TRUE, data_ind, tmp1_reg_ind, 0); + +if (compare_type != vector_compare_match2) + { + if (compare_type == vector_compare_match1i) + fast_forward_char_pair_sse2_compare(compiler, compare_type, 0, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFEE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = (sljit_u16)((cmp1_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0xe << 8) | 0x80); + sljit_emit_op_custom(compiler, instruction, 6); + } +else + { + for (i = 0; i < 3; i++) + fast_forward_char_pair_sse2_compare(compiler, compare_type, i, data_ind, cmp1_ind, cmp2_ind, tmp_ind); + + /* VFENE */ + instruction[0] = (sljit_u16)(0xe700 | (data_ind << 4) | data_ind); + instruction[1] = (sljit_u16)((zero_ind << 12) | (1 << 4)); + instruction[2] = (sljit_u16)((0xe << 8) | 0x81); + sljit_emit_op_custom(compiler, instruction, 6); + } + +/* TODO: use sljit_set_current_flags */ + +/* VLGVB */ +instruction[0] = (sljit_u16)(0xe700 | (tmp3_reg_ind << 4) | data_ind); +instruction[1] = 7; +instruction[2] = (sljit_u16)((0x4 << 8) | 0x21); +sljit_emit_op_custom(compiler, instruction, 6); + +CMPTO(SLJIT_GREATER_EQUAL, TMP3, 0, SLJIT_IMM, 16, start); +
+OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, TMP3, 0); + +JUMPHERE(quit); +add_jump(compiler, &not_found, CMP(SLJIT_GREATER_EQUAL, TMP1, 0, STR_END, 0)); + +return not_found; +} + +#define JIT_HAS_FAST_FORWARD_CHAR_PAIR_SIMD 1 + +static void fast_forward_char_pair_simd(compiler_common *common, sljit_s32 offs1, + PCRE2_UCHAR char1a, PCRE2_UCHAR char1b, sljit_s32 offs2, PCRE2_UCHAR char2a, PCRE2_UCHAR char2b) +{ +DEFINE_COMPILER; +sljit_u16 instruction[3]; +struct sljit_label *start; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +struct sljit_label *restart; +#endif +struct sljit_jump *quit; +struct sljit_jump *jump[2]; +vector_compare_type compare1_type = vector_compare_match1; +vector_compare_type compare2_type = vector_compare_match1; +sljit_u32 bit1 = 0; +sljit_u32 bit2 = 0; +sljit_s32 diff = IN_UCHARS(offs2 - offs1); +sljit_s32 tmp1_reg_ind = sljit_get_register_index(TMP1); +sljit_s32 tmp2_reg_ind = sljit_get_register_index(TMP2); +sljit_s32 str_ptr_reg_ind = sljit_get_register_index(STR_PTR); +sljit_s32 data1_ind = 0; +sljit_s32 data2_ind = 1; +sljit_s32 tmp1_ind = 2; +sljit_s32 tmp2_ind = 3; +sljit_s32 cmp1a_ind = 4; +sljit_s32 cmp1b_ind = 5; +sljit_s32 cmp2a_ind = 6; +sljit_s32 cmp2b_ind = 7; +sljit_s32 zero_ind = 8; +int i; + +SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE && offs1 > offs2); +SLJIT_ASSERT(-diff <= (sljit_s32)IN_UCHARS(max_fast_forward_char_pair_offset())); +SLJIT_ASSERT(tmp1_reg_ind != 0 && tmp2_reg_ind != 0); + +if (char1a != char1b) + { + bit1 = char1a ^ char1b; + compare1_type = vector_compare_match1i; + + if (!is_powerof2(bit1)) + { + bit1 = 0; + compare1_type = vector_compare_match2; + } + } + +if (char2a != char2b) + { + bit2 = char2a ^ char2b; + compare2_type = vector_compare_match1i; + + if (!is_powerof2(bit2)) + { + bit2 = 0; + compare2_type = vector_compare_match2; + } + } + +/* Initialize.
*/ +if (common->match_end_ptr != 0) + { + OP1(SLJIT_MOV, TMP1, 0, SLJIT_MEM1(SLJIT_SP), common->match_end_ptr); + OP1(SLJIT_MOV, TMP3, 0, STR_END, 0); + OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, IN_UCHARS(offs1 + 1)); + + OP2(SLJIT_SUB | SLJIT_SET_LESS, SLJIT_UNUSED, 0, TMP1, 0, STR_END, 0); + CMOV(SLJIT_LESS, STR_END, TMP1, 0); + } + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offs1)); +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); +OP2(SLJIT_AND, TMP2, 0, STR_PTR, 0, SLJIT_IMM, ~15); + +#if PCRE2_CODE_UNIT_WIDTH != 32 + +OP2(SLJIT_SUB, TMP1, 0, STR_PTR, 0, SLJIT_IMM, -diff); + +/* VREPI */ +instruction[0] = (sljit_u16)(0xe700 | (cmp1a_ind << 4)); +instruction[1] = (sljit_u16)(char1a | bit1); +instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); +sljit_emit_op_custom(compiler, instruction, 6); + +if (char1a != char1b) + { + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (cmp1b_ind << 4)); + instruction[1] = (sljit_u16)(bit1 != 0 ? bit1 : char1b); + /* instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); */ + sljit_emit_op_custom(compiler, instruction, 6); + } + +/* VREPI */ +instruction[0] = (sljit_u16)(0xe700 | (cmp2a_ind << 4)); +instruction[1] = (sljit_u16)(char2a | bit2); +/* instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); */ +sljit_emit_op_custom(compiler, instruction, 6); + +if (char2a != char2b) + { + /* VREPI */ + instruction[0] = (sljit_u16)(0xe700 | (cmp2b_ind << 4)); + instruction[1] = (sljit_u16)(bit2 != 0 ? bit2 : char2b); + /* instruction[2] = (sljit_u16)((VECTOR_ELEMENT_SIZE << 12) | (0x8 << 8) | 0x45); */ + sljit_emit_op_custom(compiler, instruction, 6); + } + +#else /* PCRE2_CODE_UNIT_WIDTH == 32 */ + +for (int i = 0; i < 2; i++) + { + replicate_imm_vector(compiler, i, cmp1a_ind, char1a | bit1, TMP1); + + if (char1a != char1b) + replicate_imm_vector(compiler, i, cmp1b_ind, bit1 != 0 ? 
bit1 : char1b, TMP1); + + replicate_imm_vector(compiler, i, cmp2a_ind, char2a | bit2, TMP1); + + if (char2a != char2b) + replicate_imm_vector(compiler, i, cmp2b_ind, bit2 != 0 ? bit2 : char2b, TMP1); + } + +OP2(SLJIT_SUB, TMP1, 0, STR_PTR, 0, SLJIT_IMM, -diff); + +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ + +/* VREPI */ +instruction[0] = (sljit_u16)(0xe700 | (zero_ind << 4)); +instruction[1] = 0; +instruction[2] = (sljit_u16)((0x8 << 8) | 0x45); +sljit_emit_op_custom(compiler, instruction, 6); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +restart = LABEL(); +#endif + +jump[0] = CMP(SLJIT_LESS, TMP1, 0, TMP2, 0); +load_from_mem_vector(compiler, TRUE, data2_ind, tmp1_reg_ind, 0); +jump[1] = JUMP(SLJIT_JUMP); +JUMPHERE(jump[0]); +load_from_mem_vector(compiler, FALSE, data2_ind, tmp1_reg_ind, 0); +JUMPHERE(jump[1]); + +load_from_mem_vector(compiler, TRUE, data1_ind, str_ptr_reg_ind, 0); +OP2(SLJIT_ADD, TMP2, 0, TMP2, 0, SLJIT_IMM, 16); + +for (i = 0; i < 3; i++) + { + fast_forward_char_pair_sse2_compare(compiler, compare1_type, i, data1_ind, cmp1a_ind, cmp1b_ind, tmp1_ind); + fast_forward_char_pair_sse2_compare(compiler, compare2_type, i, data2_ind, cmp2a_ind, cmp2b_ind, tmp2_ind); + } + +/* VN */ +instruction[0] = (sljit_u16)(0xe700 | (data1_ind << 4) | data1_ind); +instruction[1] = (sljit_u16)(data2_ind << 12); +instruction[2] = (sljit_u16)((0xe << 8) | 0x68); +sljit_emit_op_custom(compiler, instruction, 6); + +/* VFENE */ +instruction[0] = (sljit_u16)(0xe700 | (data1_ind << 4) | data1_ind); +instruction[1] = (sljit_u16)((zero_ind << 12) | (1 << 4)); +instruction[2] = (sljit_u16)((0xe << 8) | 0x81); +sljit_emit_op_custom(compiler, instruction, 6); + +/* TODO: use sljit_set_current_flags */ + +/* VLGVB */ +instruction[0] = (sljit_u16)(0xe700 | (tmp1_reg_ind << 4) | data1_ind); +instruction[1] = 7; +instruction[2] = (sljit_u16)((0x4 << 8) | 0x21); +sljit_emit_op_custom(compiler, instruction, 6); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP1, 0); +quit 
= CMP(SLJIT_LESS, STR_PTR, 0, TMP2, 0); + +OP2(SLJIT_SUB, STR_PTR, 0, TMP2, 0, SLJIT_IMM, 16); +OP1(SLJIT_MOV, TMP1, 0, SLJIT_IMM, diff); + +/* Main loop. */ +start = LABEL(); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, 16); +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +load_from_mem_vector(compiler, FALSE, data1_ind, str_ptr_reg_ind, 0); +load_from_mem_vector(compiler, FALSE, data2_ind, str_ptr_reg_ind, tmp1_reg_ind); + +for (i = 0; i < 3; i++) + { + fast_forward_char_pair_sse2_compare(compiler, compare1_type, i, data1_ind, cmp1a_ind, cmp1b_ind, tmp1_ind); + fast_forward_char_pair_sse2_compare(compiler, compare2_type, i, data2_ind, cmp2a_ind, cmp2b_ind, tmp2_ind); + } + +/* VN */ +instruction[0] = (sljit_u16)(0xe700 | (data1_ind << 4) | data1_ind); +instruction[1] = (sljit_u16)(data2_ind << 12); +instruction[2] = (sljit_u16)((0xe << 8) | 0x68); +sljit_emit_op_custom(compiler, instruction, 6); + +/* VFENE */ +instruction[0] = (sljit_u16)(0xe700 | (data1_ind << 4) | data1_ind); +instruction[1] = (sljit_u16)((zero_ind << 12) | (1 << 4)); +instruction[2] = (sljit_u16)((0xe << 8) | 0x81); +sljit_emit_op_custom(compiler, instruction, 6); + +/* TODO: use sljit_set_current_flags */ + +/* VLGVB */ +instruction[0] = (sljit_u16)(0xe700 | (tmp2_reg_ind << 4) | data1_ind); +instruction[1] = 7; +instruction[2] = (sljit_u16)((0x4 << 8) | 0x21); +sljit_emit_op_custom(compiler, instruction, 6); + +CMPTO(SLJIT_GREATER_EQUAL, TMP2, 0, SLJIT_IMM, 16, start); + +OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, TMP2, 0); + +JUMPHERE(quit); + +add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH != 32 +if (common->utf) + { + SLJIT_ASSERT(common->mode == PCRE2_JIT_COMPLETE); + + OP1(MOV_UCHAR, TMP1, 0, SLJIT_MEM1(STR_PTR), IN_UCHARS(-offs1)); + + quit = jump_if_utf_char_start(compiler, TMP1); + + OP2(SLJIT_ADD, STR_PTR, 0, STR_PTR, 0, 
SLJIT_IMM, IN_UCHARS(1)); + add_jump(compiler, &common->failed_match, CMP(SLJIT_GREATER_EQUAL, STR_PTR, 0, STR_END, 0)); + + /* TMP1 contains diff. */ + OP2(SLJIT_AND, TMP2, 0, STR_PTR, 0, SLJIT_IMM, ~15); + OP2(SLJIT_SUB, TMP1, 0, STR_PTR, 0, SLJIT_IMM, -diff); + JUMPTO(SLJIT_JUMP, restart); + + JUMPHERE(quit); + } +#endif + +OP2(SLJIT_SUB, STR_PTR, 0, STR_PTR, 0, SLJIT_IMM, IN_UCHARS(offs1)); + +if (common->match_end_ptr != 0) + OP1(SLJIT_MOV, STR_END, 0, TMP3, 0); +} + +#endif /* SLJIT_CONFIG_S390X */ + +#endif /* !SUPPORT_VALGRIND */ diff --git a/src/pcre2/src/pcre2_jit_test.c b/src/pcre2/src/pcre2_jit_test.c new file mode 100644 index 00000000..d9358873 --- /dev/null +++ b/src/pcre2/src/pcre2_jit_test.c @@ -0,0 +1,2501 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include <stdio.h> +#include <string.h> + +#define PCRE2_CODE_UNIT_WIDTH 0 +#include "pcre2.h" + +/* + Letter characters: + \xe6\x92\xad = 0x64ad = 25773 (kanji) + Non-letter characters: + \xc2\xa1 = 0xa1 = (Inverted Exclamation Mark) + \xf3\xa9\xb7\x80 = 0xe9dc0 = 957888 + \xed\xa0\x80 = 55296 = 0xd800 (Invalid UTF character) + \xed\xb0\x80 = 56320 = 0xdc00 (Invalid UTF character) + Newlines: + \xc2\x85 = 0x85 = 133 (NExt Line = NEL) + \xe2\x80\xa8 = 0x2028 = 8232 (Line Separator) + Othercase pairs: + \xc3\xa9 = 0xe9 = 233 (e') + \xc3\x89 = 0xc9 = 201 (E') + \xc3\xa1 = 0xe1 = 225 (a') + \xc3\x81 = 0xc1 = 193 (A') + \x53 = 0x53 = S + \x73 = 0x73 = s + \xc5\xbf = 0x17f = 383 (long S) + \xc8\xba = 0x23a = 570 + \xe2\xb1\xa5 = 0x2c65 = 11365 + \xe1\xbd\xb8 = 0x1f78 = 8056 + \xe1\xbf\xb8 = 0x1ff8 = 8184 + \xf0\x90\x90\x80 = 0x10400 = 66560 + \xf0\x90\x90\xa8 = 0x10428 = 66600 + \xc7\x84 = 0x1c4 = 452 + \xc7\x85 = 0x1c5 = 453 + \xc7\x86 = 0x1c6 = 454 + Caseless sets: + ucp_Armenian - \x{531}-\x{556} -> \x{561}-\x{586} + ucp_Coptic - \x{2c80}-\x{2ce3} -> caseless: XOR 0x1 + ucp_Latin -
\x{ff21}-\x{ff3a} -> \x{ff41}-\x{ff5a} + + Mark property: + \xcc\x8d = 0x30d = 781 + Special: + \xc2\x80 = 0x80 = 128 (lowest 2 byte character) + \xdf\xbf = 0x7ff = 2047 (highest 2 byte character) + \xe0\xa0\x80 = 0x800 = 2048 (lowest 3 byte character) + \xef\xbf\xbf = 0xffff = 65535 (highest 3 byte character) + \xf0\x90\x80\x80 = 0x10000 = 65536 (lowest 4 byte character) + \xf4\x8f\xbf\xbf = 0x10ffff = 1114111 (highest allowed utf character) +*/ + +static int regression_tests(void); +static int invalid_utf8_regression_tests(void); +static int invalid_utf16_regression_tests(void); +static int invalid_utf32_regression_tests(void); + +int main(void) +{ + int jit = 0; +#if defined SUPPORT_PCRE2_8 + pcre2_config_8(PCRE2_CONFIG_JIT, &jit); +#elif defined SUPPORT_PCRE2_16 + pcre2_config_16(PCRE2_CONFIG_JIT, &jit); +#elif defined SUPPORT_PCRE2_32 + pcre2_config_32(PCRE2_CONFIG_JIT, &jit); +#endif + if (!jit) { + printf("JIT must be enabled to run pcre_jit_test\n"); + return 1; + } + return regression_tests() + | invalid_utf8_regression_tests() + | invalid_utf16_regression_tests() + | invalid_utf32_regression_tests(); +} + +/* --------------------------------------------------------------------------------------- */ + +#if !(defined SUPPORT_PCRE2_8) && !(defined SUPPORT_PCRE2_16) && !(defined SUPPORT_PCRE2_32) +#error SUPPORT_PCRE2_8 or SUPPORT_PCRE2_16 or SUPPORT_PCRE2_32 must be defined +#endif + +#define MU (PCRE2_MULTILINE | PCRE2_UTF) +#define MUP (PCRE2_MULTILINE | PCRE2_UTF | PCRE2_UCP) +#define CMU (PCRE2_CASELESS | PCRE2_MULTILINE | PCRE2_UTF) +#define CMUP (PCRE2_CASELESS | PCRE2_MULTILINE | PCRE2_UTF | PCRE2_UCP) +#define M (PCRE2_MULTILINE) +#define MP (PCRE2_MULTILINE | PCRE2_UCP) +#define U (PCRE2_UTF) +#define CM (PCRE2_CASELESS | PCRE2_MULTILINE) + +#define BSR(x) ((x) << 16) +#define A PCRE2_NEWLINE_ANYCRLF + +#define GET_NEWLINE(x) ((x) & 0xffff) +#define GET_BSR(x) ((x) >> 16) + +#define OFFSET_MASK 0x00ffff +#define F_NO8 0x010000 +#define F_NO16
0x020000 +#define F_NO32 0x020000 +#define F_NOMATCH 0x040000 +#define F_DIFF 0x080000 +#define F_FORCECONV 0x100000 +#define F_PROPERTY 0x200000 + +struct regression_test_case { + int compile_options; + int newline; + int match_options; + int start_offset; + const char *pattern; + const char *input; +}; + +static struct regression_test_case regression_test_cases[] = { + /* Constant strings. */ + { MU, A, 0, 0, "AbC", "AbAbC" }, + { MU, A, 0, 0, "ACCEPT", "AACACCACCEACCEPACCEPTACCEPTT" }, + { CMU, A, 0, 0, "aA#\xc3\xa9\xc3\x81", "aA#Aa#\xc3\x89\xc3\xa1" }, + { M, A, 0, 0, "[^a]", "aAbB" }, + { CM, A, 0, 0, "[^m]", "mMnN" }, + { M, A, 0, 0, "a[^b][^#]", "abacd" }, + { CM, A, 0, 0, "A[^B][^E]", "abacd" }, + { CMU, A, 0, 0, "[^x][^#]", "XxBll" }, + { MU, A, 0, 0, "[^a]", "aaa\xc3\xa1#Ab" }, + { CMU, A, 0, 0, "[^A]", "aA\xe6\x92\xad" }, + { MU, A, 0, 0, "\\W(\\W)?\\w", "\r\n+bc" }, + { MU, A, 0, 0, "\\W(\\W)?\\w", "\n\r+bc" }, + { MU, A, 0, 0, "\\W(\\W)?\\w", "\r\r+bc" }, + { MU, A, 0, 0, "\\W(\\W)?\\w", "\n\n+bc" }, + { MU, A, 0, 0, "[axd]", "sAXd" }, + { CMU, A, 0, 0, "[axd]", "sAXd" }, + { CMU, A, 0, 0 | F_NOMATCH, "[^axd]", "DxA" }, + { MU, A, 0, 0, "[a-dA-C]", "\xe6\x92\xad\xc3\xa9.B" }, + { MU, A, 0, 0, "[^a-dA-C]", "\xe6\x92\xad\xc3\xa9" }, + { CMU, A, 0, 0, "[^\xc3\xa9]", "\xc3\xa9\xc3\x89." }, + { MU, A, 0, 0, "[^\xc3\xa9]", "\xc3\xa9\xc3\x89." 
}, + { MU, A, 0, 0, "[^a]", "\xc2\x80[]" }, + { CMU, A, 0, 0, "\xf0\x90\x90\xa7", "\xf0\x90\x91\x8f" }, + { CM, A, 0, 0, "1a2b3c4", "1a2B3c51A2B3C4" }, + { PCRE2_CASELESS, 0, 0, 0, "\xff#a", "\xff#\xff\xfe##\xff#A" }, + { PCRE2_CASELESS, 0, 0, 0, "\xfe", "\xff\xfc#\xfe\xfe" }, + { PCRE2_CASELESS, 0, 0, 0, "a1", "Aa1" }, +#ifndef NEVER_BACKSLASH_C + { M, A, 0, 0, "\\Ca", "cda" }, + { CM, A, 0, 0, "\\Ca", "CDA" }, + { M, A, 0, 0 | F_NOMATCH, "\\Cx", "cda" }, + { CM, A, 0, 0 | F_NOMATCH, "\\Cx", "CDA" }, +#endif /* !NEVER_BACKSLASH_C */ + { CMUP, A, 0, 0, "\xf0\x90\x90\x80\xf0\x90\x90\xa8", "\xf0\x90\x90\xa8\xf0\x90\x90\x80" }, + { CMUP, A, 0, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" }, + { CMUP, A, 0, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" }, + { CMUP, A, 0, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" }, + { M, A, 0, 0, "[3-57-9]", "5" }, + { PCRE2_AUTO_CALLOUT, A, 0, 0, "12345678901234567890123456789012345678901234567890123456789012345678901234567890", + "12345678901234567890123456789012345678901234567890123456789012345678901234567890" }, + + /* Assertions. */ + { MU, A, 0, 0, "\\b[^A]", "A_B#" }, + { M, A, 0, 0 | F_NOMATCH, "\\b\\W", "\n*" }, + { MU, A, 0, 0, "\\B[^,]\\b[^s]\\b", "#X" }, + { MP, A, 0, 0, "\\B", "_\xa1" }, + { MP, A, 0, 0 | F_PROPERTY, "\\b_\\b[,A]\\B", "_," }, + { MUP, A, 0, 0, "\\b", "\xe6\x92\xad!" 
}, + { MUP, A, 0, 0, "\\B", "_\xc2\xa1\xc3\xa1\xc2\x85" }, + { MUP, A, 0, 0, "\\b[^A]\\B[^c]\\b[^_]\\B", "_\xc3\xa1\xe2\x80\xa8" }, + { MUP, A, 0, 0, "\\b\\w+\\B", "\xc3\x89\xc2\xa1\xe6\x92\xad\xc3\x81\xc3\xa1" }, + { MU, A, 0, 0 | F_NOMATCH, "\\b.", "\xcd\xbe" }, + { CMUP, A, 0, 0, "\\By", "\xf0\x90\x90\xa8y" }, + { M, A, 0, 0 | F_NOMATCH, "\\R^", "\n" }, + { M, A, 0, 1 | F_NOMATCH, "^", "\n" }, + { 0, 0, 0, 0, "^ab", "ab" }, + { 0, 0, 0, 0 | F_NOMATCH, "^ab", "aab" }, + { M, PCRE2_NEWLINE_CRLF, 0, 0, "^a", "\r\raa\n\naa\r\naa" }, + { MU, A, 0, 0, "^-", "\xe2\x80\xa8--\xc2\x85-\r\n-" }, + { M, PCRE2_NEWLINE_ANY, 0, 0, "^-", "a--b--\x85--" }, + { MU, PCRE2_NEWLINE_ANY, 0, 0, "^-", "a--\xe2\x80\xa8--" }, + { MU, PCRE2_NEWLINE_ANY, 0, 0, "^-", "a--\xc2\x85--" }, + { 0, 0, 0, 0, "ab$", "ab" }, + { 0, 0, 0, 0 | F_NOMATCH, "ab$", "abab\n\n" }, + { PCRE2_DOLLAR_ENDONLY, 0, 0, 0 | F_NOMATCH, "ab$", "abab\r\n" }, + { M, PCRE2_NEWLINE_CRLF, 0, 0, "a$", "\r\raa\n\naa\r\naa" }, + { M, PCRE2_NEWLINE_ANY, 0, 0, "a$", "aaa" }, + { MU, PCRE2_NEWLINE_ANYCRLF, 0, 0, "#$", "#\xc2\x85###\r#" }, + { MU, PCRE2_NEWLINE_ANY, 0, 0, "#$", "#\xe2\x80\xa9" }, + { 0, PCRE2_NEWLINE_ANY, PCRE2_NOTBOL, 0 | F_NOMATCH, "^a", "aa\naa" }, + { M, PCRE2_NEWLINE_ANY, PCRE2_NOTBOL, 0, "^a", "aa\naa" }, + { 0, PCRE2_NEWLINE_ANY, PCRE2_NOTEOL, 0 | F_NOMATCH, "a$", "aa\naa" }, + { 0, PCRE2_NEWLINE_ANY, PCRE2_NOTEOL, 0 | F_NOMATCH, "a$", "aa\r\n" }, + { U | PCRE2_DOLLAR_ENDONLY, PCRE2_NEWLINE_ANY, 0, 0 | F_PROPERTY, "\\p{Any}{2,}$", "aa\r\n" }, + { M, PCRE2_NEWLINE_ANY, PCRE2_NOTEOL, 0, "a$", "aa\naa" }, + { 0, PCRE2_NEWLINE_CR, 0, 0, ".\\Z", "aaa" }, + { U, PCRE2_NEWLINE_CR, 0, 0, "a\\Z", "aaa\r" }, + { 0, PCRE2_NEWLINE_CR, 0, 0, ".\\Z", "aaa\n" }, + { 0, PCRE2_NEWLINE_CRLF, 0, 0, ".\\Z", "aaa\r" }, + { U, PCRE2_NEWLINE_CRLF, 0, 0, ".\\Z", "aaa\n" }, + { 0, PCRE2_NEWLINE_CRLF, 0, 0, ".\\Z", "aaa\r\n" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\r" 
}, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\n" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\r\n" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\xe2\x80\xa8" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\r" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\n" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".\\Z", "aaa\r\n" }, + { U, PCRE2_NEWLINE_ANY, 0, 0, ".\\Z", "aaa\xc2\x85" }, + { U, PCRE2_NEWLINE_ANY, 0, 0, ".\\Z", "aaa\xe2\x80\xa8" }, + { M, A, 0, 0, "\\Aa", "aaa" }, + { M, A, 0, 1 | F_NOMATCH, "\\Aa", "aaa" }, + { M, A, 0, 1, "\\Ga", "aaa" }, + { M, A, 0, 1 | F_NOMATCH, "\\Ga", "aba" }, + { M, A, 0, 0, "a\\z", "aaa" }, + { M, A, 0, 0 | F_NOMATCH, "a\\z", "aab" }, + + /* Brackets and alternatives. */ + { MU, A, 0, 0, "(ab|bb|cd)", "bacde" }, + { MU, A, 0, 0, "(?:ab|a)(bc|c)", "ababc" }, + { MU, A, 0, 0, "((ab|(cc))|(bb)|(?:cd|efg))", "abac" }, + { CMU, A, 0, 0, "((aB|(Cc))|(bB)|(?:cd|EFg))", "AcCe" }, + { MU, A, 0, 0, "((ab|(cc))|(bb)|(?:cd|ebg))", "acebebg" }, + { MU, A, 0, 0, "(?:(a)|(?:b))(cc|(?:d|e))(a|b)k", "accabdbbccbk" }, + { MU, A, 0, 0, "\xc7\x82|\xc6\x82", "\xf1\x83\x82\x82\xc7\x82\xc7\x83" }, + { MU, A, 0, 0, "=\xc7\x82|#\xc6\x82", "\xf1\x83\x82\x82=\xc7\x82\xc7\x83" }, + { MU, A, 0, 0, "\xc7\x82\xc7\x83|\xc6\x82\xc6\x82", "\xf1\x83\x82\x82\xc7\x82\xc7\x83" }, + { MU, A, 0, 0, "\xc6\x82\xc6\x82|\xc7\x83\xc7\x83|\xc8\x84\xc8\x84", "\xf1\x83\x82\x82\xc8\x84\xc8\x84" }, + { U, A, 0, 0, "\xe1\x81\x80|\xe2\x82\x80|\xe4\x84\x80", "\xdf\xbf\xc2\x80\xe4\x84\x80" }, + { U, A, 0, 0, "(?:\xe1\x81\x80|\xe2\x82\x80|\xe4\x84\x80)#", "\xdf\xbf\xc2\x80#\xe4\x84\x80#" }, + { CM, A, 0, 0, "ab|cd", "CD" }, + { CM, A, 0, 0, "a1277|a1377|bX487", "bx487" }, + { CM, A, 0, 0, "a1277|a1377|bx487", "bX487" }, + + /* Greedy and non-greedy ? operators. */ + { MU, A, 0, 0, "(?:a)?a", "laab" }, + { CMU, A, 0, 0, "(A)?A", "llaab" }, + { MU, A, 0, 0, "(a)?\?a", "aab" }, /* ?? 
is the prefix of trigraphs in GCC. */ + { MU, A, 0, 0, "(a)?a", "manm" }, + { CMU, A, 0, 0, "(a|b)?\?d((?:e)?)", "ABABdx" }, + { MU, A, 0, 0, "(a|b)?\?d((?:e)?)", "abcde" }, + { MU, A, 0, 0, "((?:ab)?\?g|b(?:g(nn|d)?\?)?)?\?(?:n)?m", "abgnbgnnbgdnmm" }, + + /* Greedy and non-greedy + operators */ + { MU, A, 0, 0, "(aa)+aa", "aaaaaaa" }, + { MU, A, 0, 0, "(aa)+?aa", "aaaaaaa" }, + { MU, A, 0, 0, "(?:aba|ab|a)+l", "ababamababal" }, + { MU, A, 0, 0, "(?:aba|ab|a)+?l", "ababamababal" }, + { MU, A, 0, 0, "(a(?:bc|cb|b|c)+?|ss)+e", "accssabccbcacbccbbXaccssabccbcacbccbbe" }, + { MU, A, 0, 0, "(a(?:bc|cb|b|c)+|ss)+?e", "accssabccbcacbccbbXaccssabccbcacbccbbe" }, + { MU, A, 0, 0, "(?:(b(c)+?)+)?\?(?:(bc)+|(cb)+)+(?:m)+", "bccbcccbcbccbcbPbccbcccbcbccbcbmmn" }, + + /* Greedy and non-greedy * operators */ + { CMU, A, 0, 0, "(?:AA)*AB", "aaaaaaamaaaaaaab" }, + { MU, A, 0, 0, "(?:aa)*?ab", "aaaaaaamaaaaaaab" }, + { MU, A, 0, 0, "(aa|ab)*ab", "aaabaaab" }, + { CMU, A, 0, 0, "(aa|Ab)*?aB", "aaabaaab" }, + { MU, A, 0, 0, "(a|b)*(?:a)*(?:b)*m", "abbbaaababanabbbaaababamm" }, + { MU, A, 0, 0, "(a|b)*?(?:a)*?(?:b)*?m", "abbbaaababanabbbaaababamm" }, + { M, A, 0, 0, "a(a(\\1*)a|(b)b+){0}a", "aa" }, + { M, A, 0, 0, "((?:a|)*){0}a", "a" }, + + /* Combining ? + * operators */ + { MU, A, 0, 0, "((bm)+)?\?(?:a)*(bm)+n|((am)+?)?(?:a)+(am)*n", "bmbmabmamaaamambmaman" }, + { MU, A, 0, 0, "(((ab)?cd)*ef)+g", "abcdcdefcdefefmabcdcdefcdefefgg" }, + { MU, A, 0, 0, "(((ab)?\?cd)*?ef)+?g", "abcdcdefcdefefmabcdcdefcdefefgg" }, + { MU, A, 0, 0, "(?:(ab)?c|(?:ab)+?d)*g", "ababcdccababddg" }, + { MU, A, 0, 0, "(?:(?:ab)?\?c|(ab)+d)*?g", "ababcdccababddg" }, + + /* Single character iterators.
*/ + { MU, A, 0, 0, "(a+aab)+aaaab", "aaaabcaaaabaabcaabcaaabaaaab" }, + { MU, A, 0, 0, "(a*a*aab)+x", "aaaaabaabaaabmaabx" }, + { MU, A, 0, 0, "(a*?(b|ab)a*?)+x", "aaaabcxbbaabaacbaaabaabax" }, + { MU, A, 0, 0, "(a+(ab|ad)a+)+x", "aaabaaaadaabaaabaaaadaaax" }, + { MU, A, 0, 0, "(a?(a)a?)+(aaa)", "abaaabaaaaaaaa" }, + { MU, A, 0, 0, "(a?\?(a)a?\?)+(b)", "aaaacaaacaacacbaaab" }, + { MU, A, 0, 0, "(a{0,4}(b))+d", "aaaaaabaabcaaaaabaaaaabd" }, + { MU, A, 0, 0, "(a{0,4}?[^b])+d+(a{0,4}[^b])d+", "aaaaadaaaacaadddaaddd" }, + { MU, A, 0, 0, "(ba{2})+c", "baabaaabacbaabaac" }, + { MU, A, 0, 0, "(a*+bc++)+", "aaabbcaaabcccab" }, + { MU, A, 0, 0, "(a?+[^b])+", "babaacacb" }, + { MU, A, 0, 0, "(a{0,3}+b)(a{0,3}+b)(a{0,3}+)[^c]", "abaabaaacbaabaaaac" }, + { CMU, A, 0, 0, "([a-c]+[d-f]+?)+?g", "aBdacdehAbDaFgA" }, + { CMU, A, 0, 0, "[c-f]+k", "DemmFke" }, + { MU, A, 0, 0, "([DGH]{0,4}M)+", "GGDGHDGMMHMDHHGHM" }, + { MU, A, 0, 0, "([a-c]{4,}s)+", "abasabbasbbaabsbba" }, + { CMU, A, 0, 0, "[ace]{3,7}", "AcbDAcEEcEd" }, + { CMU, A, 0, 0, "[ace]{3,7}?", "AcbDAcEEcEd" }, + { CMU, A, 0, 0, "[ace]{3,}", "AcbDAcEEcEd" }, + { CMU, A, 0, 0, "[ace]{3,}?", "AcbDAcEEcEd" }, + { MU, A, 0, 0, "[ckl]{2,}?g", "cdkkmlglglkcg" }, + { CMU, A, 0, 0, "[ace]{5}?", "AcCebDAcEEcEd" }, + { MU, A, 0, 0, "([AbC]{3,5}?d)+", "BACaAbbAEAACCbdCCbdCCAAbb" }, + { MU, A, 0, 0, "([^ab]{0,}s){2}", "abaabcdsABamsDDs" }, + { MU, A, 0, 0, "\\b\\w+\\B", "x,a_cd" }, + { MUP, A, 0, 0, "\\b[^\xc2\xa1]+\\B", "\xc3\x89\xc2\xa1\xe6\x92\xad\xc3\x81\xc3\xa1" }, + { CMU, A, 0, 0, "[^b]+(a*)([^c]?d{3})", "aaaaddd" }, + { CMUP, A, 0, 0, "\xe1\xbd\xb8{2}", "\xe1\xbf\xb8#\xe1\xbf\xb8\xe1\xbd\xb8" }, + { CMU, A, 0, 0, "[^\xf0\x90\x90\x80]{2,4}@", "\xf0\x90\x90\xa8\xf0\x90\x90\x80###\xf0\x90\x90\x80@@@" }, + { CMU, A, 0, 0, "[^\xe1\xbd\xb8][^\xc3\xa9]", "\xe1\xbd\xb8\xe1\xbf\xb8\xc3\xa9\xc3\x89#" }, + { MU, A, 0, 0, "[^\xe1\xbd\xb8][^\xc3\xa9]", "\xe1\xbd\xb8\xe1\xbf\xb8\xc3\xa9\xc3\x89#" }, + { MU, A, 0, 0, "[^\xe1\xbd\xb8]{3,}?", 
"##\xe1\xbd\xb8#\xe1\xbd\xb8#\xc3\x89#\xe1\xbd\xb8" }, + { MU, A, 0, 0, "\\d+123", "987654321,01234" }, + { MU, A, 0, 0, "abcd*|\\w+xy", "aaaaa,abxyz" }, + { MU, A, 0, 0, "(?:abc|((?:amc|\\b\\w*xy)))", "aaaaa,abxyz" }, + { MU, A, 0, 0, "a(?R)|([a-z]++)#", ".abcd.abcd#."}, + { MU, A, 0, 0, "a(?R)|([a-z]++)#", ".abcd.mbcd#."}, + { MU, A, 0, 0, ".[ab]*.", "xx" }, + { MU, A, 0, 0, ".[ab]*a", "xxa" }, + { MU, A, 0, 0, ".[ab]?.", "xx" }, + { MU, A, 0, 0, "_[ab]+_*a", "_aa" }, + + /* Bracket repeats with limit. */ + { MU, A, 0, 0, "(?:(ab){2}){5}M", "abababababababababababM" }, + { MU, A, 0, 0, "(?:ab|abab){1,5}M", "abababababababababababM" }, + { MU, A, 0, 0, "(?>ab|abab){1,5}M", "abababababababababababM" }, + { MU, A, 0, 0, "(?:ab|abab){1,5}?M", "abababababababababababM" }, + { MU, A, 0, 0, "(?>ab|abab){1,5}?M", "abababababababababababM" }, + { MU, A, 0, 0, "(?:(ab){1,4}?){1,3}?M", "abababababababababababababM" }, + { MU, A, 0, 0, "(?:(ab){1,4}){1,3}abababababababababababM", "ababababababababababababM" }, + { MU, A, 0, 0 | F_NOMATCH, "(?:(ab){1,4}){1,3}abababababababababababM", "abababababababababababM" }, + { MU, A, 0, 0, "(ab){4,6}?M", "abababababababM" }, + + /* Basic character sets. 
*/ + { MU, A, 0, 0, "(?:\\s)+(?:\\S)+", "ab \t\xc3\xa9\xe6\x92\xad " }, + { MU, A, 0, 0, "(\\w)*(k)(\\W)?\?", "abcdef abck11" }, + { MU, A, 0, 0, "\\((\\d)+\\)\\D", "a() (83 (8)2 (9)ab" }, + { MU, A, 0, 0, "\\w(\\s|(?:\\d)*,)+\\w\\wb", "a 5, 4,, bb 5, 4,, aab" }, + { MU, A, 0, 0, "(\\v+)(\\V+)", "\x0e\xc2\x85\xe2\x80\xa8\x0b\x09\xe2\x80\xa9" }, + { MU, A, 0, 0, "(\\h+)(\\H+)", "\xe2\x80\xa8\xe2\x80\x80\x20\xe2\x80\x8a\xe2\x81\x9f\xe3\x80\x80\x09\x20\xc2\xa0\x0a" }, + { MU, A, 0, 0, "x[bcef]+", "xaxdxecbfg" }, + { MU, A, 0, 0, "x[bcdghij]+", "xaxexfxdgbjk" }, + { MU, A, 0, 0, "x[^befg]+", "xbxexacdhg" }, + { MU, A, 0, 0, "x[^bcdl]+", "xlxbxaekmd" }, + { MU, A, 0, 0, "x[^bcdghi]+", "xbxdxgxaefji" }, + { MU, A, 0, 0, "x[B-Fb-f]+", "xaxAxgxbfBFG" }, + { CMU, A, 0, 0, "\\x{e9}+", "#\xf0\x90\x90\xa8\xc3\xa8\xc3\xa9\xc3\x89\xc3\x88" }, + { CMU, A, 0, 0, "[^\\x{e9}]+", "\xc3\xa9#\xf0\x90\x90\xa8\xc3\xa8\xc3\x88\xc3\x89" }, + { MU, A, 0, 0, "[\\x02\\x7e]+", "\xc3\x81\xe1\xbf\xb8\xf0\x90\x90\xa8\x01\x02\x7e\x7f" }, + { MU, A, 0, 0, "[^\\x02\\x7e]+", "\x02\xc3\x81\xe1\xbf\xb8\xf0\x90\x90\xa8\x01\x7f\x7e" }, + { MU, A, 0, 0, "[\\x{81}-\\x{7fe}]+", "#\xe1\xbf\xb8\xf0\x90\x90\xa8\xc2\x80\xc2\x81\xdf\xbe\xdf\xbf" }, + { MU, A, 0, 0, "[^\\x{81}-\\x{7fe}]+", "\xc2\x81#\xe1\xbf\xb8\xf0\x90\x90\xa8\xc2\x80\xdf\xbf\xdf\xbe" }, + { MU, A, 0, 0, "[\\x{801}-\\x{fffe}]+", "#\xc3\xa9\xf0\x90\x90\x80\xe0\xa0\x80\xe0\xa0\x81\xef\xbf\xbe\xef\xbf\xbf" }, + { MU, A, 0, 0, "[^\\x{801}-\\x{fffe}]+", "\xe0\xa0\x81#\xc3\xa9\xf0\x90\x90\x80\xe0\xa0\x80\xef\xbf\xbf\xef\xbf\xbe" }, + { MU, A, 0, 0, "[\\x{10001}-\\x{10fffe}]+", "#\xc3\xa9\xe2\xb1\xa5\xf0\x90\x80\x80\xf0\x90\x80\x81\xf4\x8f\xbf\xbe\xf4\x8f\xbf\xbf" }, + { MU, A, 0, 0, "[^\\x{10001}-\\x{10fffe}]+", "\xf0\x90\x80\x81#\xc3\xa9\xe2\xb1\xa5\xf0\x90\x80\x80\xf4\x8f\xbf\xbf\xf4\x8f\xbf\xbe" }, + { CMU, A, 0, 0 | F_NOMATCH, "^[\\x{0100}-\\x{017f}]", " " }, + + /* Unicode properties. 
*/ + { MUP, A, 0, 0, "[1-5\xc3\xa9\\w]", "\xc3\xa1_" }, + { MUP, A, 0, 0 | F_PROPERTY, "[\xc3\x81\\p{Ll}]", "A_\xc3\x89\xc3\xa1" }, + { MUP, A, 0, 0, "[\\Wd-h_x-z]+", "a\xc2\xa1#_yhzdxi" }, + { MUP, A, 0, 0 | F_NOMATCH | F_PROPERTY, "[\\P{Any}]", "abc" }, + { MUP, A, 0, 0 | F_NOMATCH | F_PROPERTY, "[^\\p{Any}]", "abc" }, + { MUP, A, 0, 0 | F_NOMATCH | F_PROPERTY, "[\\P{Any}\xc3\xa1-\xc3\xa8]", "abc" }, + { MUP, A, 0, 0 | F_NOMATCH | F_PROPERTY, "[^\\p{Any}\xc3\xa1-\xc3\xa8]", "abc" }, + { MUP, A, 0, 0 | F_NOMATCH | F_PROPERTY, "[\xc3\xa1-\xc3\xa8\\P{Any}]", "abc" }, + { MUP, A, 0, 0 | F_NOMATCH | F_PROPERTY, "[^\xc3\xa1-\xc3\xa8\\p{Any}]", "abc" }, + { MUP, A, 0, 0 | F_PROPERTY, "[\xc3\xa1-\xc3\xa8\\p{Any}]", "abc" }, + { MUP, A, 0, 0 | F_PROPERTY, "[^\xc3\xa1-\xc3\xa8\\P{Any}]", "abc" }, + { MUP, A, 0, 0, "[b-\xc3\xa9\\s]", "a\xc\xe6\x92\xad" }, + { CMUP, A, 0, 0, "[\xc2\x85-\xc2\x89\xc3\x89]", "\xc2\x84\xc3\xa9" }, + { MUP, A, 0, 0, "[^b-d^&\\s]{3,}", "db^ !a\xe2\x80\xa8_ae" }, + { MUP, A, 0, 0 | F_PROPERTY, "[^\\S\\P{Any}][\\sN]{1,3}[\\P{N}]{4}", "\xe2\x80\xaa\xa N\x9\xc3\xa9_0" }, + { MU, A, 0, 0 | F_PROPERTY, "[^\\P{L}\x9!D-F\xa]{2,3}", "\x9,.DF\xa.CG\xc3\x81" }, + { CMUP, A, 0, 0, "[\xc3\xa1-\xc3\xa9_\xe2\x80\xa0-\xe2\x80\xaf]{1,5}[^\xe2\x80\xa0-\xe2\x80\xaf]", "\xc2\xa1\xc3\x89\xc3\x89\xe2\x80\xaf_\xe2\x80\xa0" }, + { MUP, A, 0, 0 | F_PROPERTY, "[\xc3\xa2-\xc3\xa6\xc3\x81-\xc3\x84\xe2\x80\xa8-\xe2\x80\xa9\xe6\x92\xad\\p{Zs}]{2,}", "\xe2\x80\xa7\xe2\x80\xa9\xe6\x92\xad \xe6\x92\xae" }, + { MUP, A, 0, 0 | F_PROPERTY, "[\\P{L&}]{2}[^\xc2\x85-\xc2\x89\\p{Ll}\\p{Lu}]{2}", "\xc3\xa9\xe6\x92\xad.a\xe6\x92\xad|\xc2\x8a#" }, + { PCRE2_UCP, 0, 0, 0 | F_PROPERTY, "[a-b\\s]{2,5}[^a]", "AB baaa" }, + { MUP, 0, 0, 0 | F_NOMATCH, "[^\\p{Hangul}\\p{Z}]", " " }, + + /* Possible empty brackets. 
*/ + { MU, A, 0, 0, "(?:|ab||bc|a)+d", "abcxabcabd" }, + { MU, A, 0, 0, "(|ab||bc|a)+d", "abcxabcabd" }, + { MU, A, 0, 0, "(?:|ab||bc|a)*d", "abcxabcabd" }, + { MU, A, 0, 0, "(|ab||bc|a)*d", "abcxabcabd" }, + { MU, A, 0, 0, "(?:|ab||bc|a)+?d", "abcxabcabd" }, + { MU, A, 0, 0, "(|ab||bc|a)+?d", "abcxabcabd" }, + { MU, A, 0, 0, "(?:|ab||bc|a)*?d", "abcxabcabd" }, + { MU, A, 0, 0, "(|ab||bc|a)*?d", "abcxabcabd" }, + { MU, A, 0, 0, "(((a)*?|(?:ba)+)+?|(?:|c|ca)*)*m", "abaacaccabacabalabaacaccabacabamm" }, + { MU, A, 0, 0, "(?:((?:a)*|(ba)+?)+|(|c|ca)*?)*?m", "abaacaccabacabalabaacaccabacabamm" }, + + /* Start offset. */ + { MU, A, 0, 3, "(\\d|(?:\\w)*\\w)+", "0ac01Hb" }, + { MU, A, 0, 4 | F_NOMATCH, "(\\w\\W\\w)+", "ab#d" }, + { MU, A, 0, 2 | F_NOMATCH, "(\\w\\W\\w)+", "ab#d" }, + { MU, A, 0, 1, "(\\w\\W\\w)+", "ab#d" }, + + /* Newline. */ + { M, PCRE2_NEWLINE_CRLF, 0, 0, "\\W{0,2}[^#]{3}", "\r\n#....." }, + { M, PCRE2_NEWLINE_CR, 0, 0, "\\W{0,2}[^#]{3}", "\r\n#....." }, + { M, PCRE2_NEWLINE_CRLF, 0, 0, "\\W{1,3}[^#]", "\r\n##...." }, + { MU, A, PCRE2_NO_UTF_CHECK, 1, "^.a", "\n\x80\nxa" }, + { MU, A, 0, 1, "^", "\r\n" }, + { M, PCRE2_NEWLINE_CRLF, 0, 1 | F_NOMATCH, "^", "\r\n" }, + { M, PCRE2_NEWLINE_CRLF, 0, 1, "^", "\r\na" }, + + /* Any character except newline or any newline. 
*/ + { 0, PCRE2_NEWLINE_CRLF, 0, 0, ".", "\r" }, + { U, PCRE2_NEWLINE_CRLF, 0, 0, ".(.).", "a\xc3\xa1\r\n\n\r\r" }, + { 0, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".(.)", "a\rb\nc\r\n\xc2\x85\xe2\x80\xa8" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0, ".(.)", "a\rb\nc\r\n\xc2\x85\xe2\x80\xa8" }, + { U, PCRE2_NEWLINE_ANY, 0, 0, "(.).", "a\rb\nc\r\n\xc2\x85\xe2\x80\xa9$de" }, + { U, PCRE2_NEWLINE_ANYCRLF, 0, 0 | F_NOMATCH, ".(.).", "\xe2\x80\xa8\nb\r" }, + { 0, PCRE2_NEWLINE_ANY, 0, 0, "(.)(.)", "#\x85#\r#\n#\r\n#\x84" }, + { U, PCRE2_NEWLINE_ANY, 0, 0, "(.+)#", "#\rMn\xc2\x85#\n###" }, + { 0, BSR(PCRE2_BSR_ANYCRLF), 0, 0, "\\R", "\r" }, + { 0, BSR(PCRE2_BSR_ANYCRLF), 0, 0, "\\R", "\x85#\r\n#" }, + { U, BSR(PCRE2_BSR_UNICODE), 0, 0, "\\R", "ab\xe2\x80\xa8#c" }, + { U, BSR(PCRE2_BSR_UNICODE), 0, 0, "\\R", "ab\r\nc" }, + { U, PCRE2_NEWLINE_CRLF | BSR(PCRE2_BSR_UNICODE), 0, 0, "(\\R.)+", "\xc2\x85\r\n#\xe2\x80\xa8\n\r\n\r" }, + { MU, A, 0, 0 | F_NOMATCH, "\\R+", "ab" }, + { MU, A, 0, 0, "\\R+", "ab\r\n\r" }, + { MU, A, 0, 0, "\\R*", "ab\r\n\r" }, + { MU, A, 0, 0, "\\R*", "\r\n\r" }, + { MU, A, 0, 0, "\\R{2,4}", "\r\nab\r\r" }, + { MU, A, 0, 0, "\\R{2,4}", "\r\nab\n\n\n\r\r\r" }, + { MU, A, 0, 0, "\\R{2,}", "\r\nab\n\n\n\r\r\r" }, + { MU, A, 0, 0, "\\R{0,3}", "\r\n\r\n\r\n\r\n\r\n" }, + { MU, A, 0, 0 | F_NOMATCH, "\\R+\\R\\R", "\r\n\r\n" }, + { MU, A, 0, 0, "\\R+\\R\\R", "\r\r\r" }, + { MU, A, 0, 0, "\\R*\\R\\R", "\n\r" }, + { MU, A, 0, 0 | F_NOMATCH, "\\R{2,4}\\R\\R", "\r\r\r" }, + { MU, A, 0, 0, "\\R{2,4}\\R\\R", "\r\r\r\r" }, + + /* Atomic groups (no fallback from "next" direction). 
*/ + { MU, A, 0, 0 | F_NOMATCH, "(?>ab)ab", "bab" }, + { MU, A, 0, 0 | F_NOMATCH, "(?>(ab))ab", "bab" }, + { MU, A, 0, 0, "(?>ab)+abc(?>de)*def(?>gh)?ghe(?>ij)+?k(?>lm)*?n(?>op)?\?op", + "bababcdedefgheijijklmlmnop" }, + { MU, A, 0, 0, "(?>a(b)+a|(ab)?\?(b))an", "abban" }, + { MU, A, 0, 0, "(?>ab+a|(?:ab)?\?b)an", "abban" }, + { MU, A, 0, 0, "((?>ab|ad|)*?)(?>|c)*abad", "abababcababad" }, + { MU, A, 0, 0, "(?>(aa|b|)*+(?>(##)|###)*d|(aa)(?>(baa)?)m)", "aabaa#####da" }, + { MU, A, 0, 0, "((?>a|)+?)b", "aaacaaab" }, + { MU, A, 0, 0, "(?>x|)*$", "aaa" }, + { MU, A, 0, 0, "(?>(x)|)*$", "aaa" }, + { MU, A, 0, 0, "(?>x|())*$", "aaa" }, + { MU, A, 0, 0, "((?>[cxy]a|[a-d])*?)b", "aaa+ aaab" }, + { MU, A, 0, 0, "((?>[cxy](a)|[a-d])*?)b", "aaa+ aaab" }, + { MU, A, 0, 0, "(?>((?>(a+))))bab|(?>((?>(a+))))bb", "aaaabaaabaabab" }, + { MU, A, 0, 0, "(?>(?>a+))bab|(?>(?>a+))bb", "aaaabaaabaabab" }, + { MU, A, 0, 0, "(?>(a)c|(?>(c)|(a))a)b*?bab", "aaaabaaabaabab" }, + { MU, A, 0, 0, "(?>ac|(?>c|a)a)b*?bab", "aaaabaaabaabab" }, + { MU, A, 0, 0, "(?>(b)b|(a))*b(?>(c)|d)?x", "ababcaaabdbx" }, + { MU, A, 0, 0, "(?>bb|a)*b(?>c|d)?x", "ababcaaabdbx" }, + { MU, A, 0, 0, "(?>(bb)|a)*b(?>c|(d))?x", "ababcaaabdbx" }, + { MU, A, 0, 0, "(?>(a))*?(?>(a))+?(?>(a))??x", "aaaaaacccaaaaabax" }, + { MU, A, 0, 0, "(?>a)*?(?>a)+?(?>a)??x", "aaaaaacccaaaaabax" }, + { MU, A, 0, 0, "(?>(a)|)*?(?>(a)|)+?(?>(a)|)??x", "aaaaaacccaaaaabax" }, + { MU, A, 0, 0, "(?>a|)*?(?>a|)+?(?>a|)??x", "aaaaaacccaaaaabax" }, + { MU, A, 0, 0, "(?>a(?>(a{0,2}))*?b|aac)+b", "aaaaaaacaaaabaaaaacaaaabaacaaabb" }, + { CM, A, 0, 0, "(?>((?>a{32}|b+|(a*))?(?>c+|d*)?\?)+e)+?f", "aaccebbdde bbdaaaccebbdee bbdaaaccebbdeef" }, + { MU, A, 0, 0, "(?>(?:(?>aa|a||x)+?b|(?>aa|a||(x))+?c)?(?>[ad]{0,2})*?d)+d", "aaacdbaabdcabdbaaacd aacaabdbdcdcaaaadaabcbaadd" }, + { MU, A, 0, 0, "(?>(?:(?>aa|a||(x))+?b|(?>aa|a||x)+?c)?(?>[ad]{0,2})*?d)+d", "aaacdbaabdcabdbaaacd aacaabdbdcdcaaaadaabcbaadd" }, + { MU, A, 0, 0 | F_PROPERTY, "\\X", 
"\xcc\x8d\xcc\x8d" }, + { MU, A, 0, 0 | F_PROPERTY, "\\X", "\xcc\x8d\xcc\x8d#\xcc\x8d\xcc\x8d" }, + { MU, A, 0, 0 | F_PROPERTY, "\\X+..", "\xcc\x8d#\xcc\x8d#\xcc\x8d\xcc\x8d" }, + { MU, A, 0, 0 | F_PROPERTY, "\\X{2,4}", "abcdef" }, + { MU, A, 0, 0 | F_PROPERTY, "\\X{2,4}?", "abcdef" }, + { MU, A, 0, 0 | F_NOMATCH | F_PROPERTY, "\\X{2,4}..", "#\xcc\x8d##" }, + { MU, A, 0, 0 | F_PROPERTY, "\\X{2,4}..", "#\xcc\x8d#\xcc\x8d##" }, + { MU, A, 0, 0, "(c(ab)?+ab)+", "cabcababcab" }, + { MU, A, 0, 0, "(?>(a+)b)+aabab", "aaaabaaabaabab" }, + + /* Possessive quantifiers. */ + { MU, A, 0, 0, "(?:a|b)++m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "(?:a|b)*+m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "(?:a|b)*+m", "ababbaaxababbaam" }, + { MU, A, 0, 0, "(a|b)++m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "(a|b)*+m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "(a|b)*+m", "ababbaaxababbaam" }, + { MU, A, 0, 0, "(a|b(*ACCEPT))++m", "maaxab" }, + { MU, A, 0, 0, "(?:b*)++m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "(?:b*)++m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, "(?:b*)*+m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "(?:b*)*+m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, "(b*)++m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "(b*)++m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, "(b*)*+m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "(b*)*+m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, "(?:a|(b))++m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "(?:(a)|b)*+m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "(?:(a)|(b))*+m", "ababbaaxababbaam" }, + { MU, A, 0, 0, "(a|(b))++m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "((a)|b)*+m", "mababbaaxababbaam" }, + { MU, A, 0, 0, "((a)|(b))*+m", "ababbaaxababbaam" }, + { MU, A, 0, 0, "(a|(b)(*ACCEPT))++m", "maaxab" }, + { MU, A, 0, 0, "(?:(b*))++m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "(?:(b*))++m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, "(?:(b*))*+m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "(?:(b*))*+m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, "((b*))++m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "((b*))++m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0, 
"((b*))*+m", "bxbbxbbbxm" }, + { MU, A, 0, 0, "((b*))*+m", "bxbbxbbbxbbm" }, + { MU, A, 0, 0 | F_NOMATCH, "(?>(b{2,4}))(?:(?:(aa|c))++m|(?:(aa|c))+n)", "bbaacaaccaaaacxbbbmbn" }, + { MU, A, 0, 0, "((?:b)++a)+(cd)*+m", "bbababbacdcdnbbababbacdcdm" }, + { MU, A, 0, 0, "((?:(b))++a)+((c)d)*+m", "bbababbacdcdnbbababbacdcdm" }, + { MU, A, 0, 0, "(?:(?:(?:ab)*+k)++(?:n(?:cd)++)*+)*+m", "ababkkXababkkabkncXababkkabkncdcdncdXababkkabkncdcdncdkkabkncdXababkkabkncdcdncdkkabkncdm" }, + { MU, A, 0, 0, "(?:((ab)*+(k))++(n(?:c(d))++)*+)*+m", "ababkkXababkkabkncXababkkabkncdcdncdXababkkabkncdcdncdkkabkncdXababkkabkncdcdncdkkabkncdm" }, + + /* Back references. */ + { MU, A, 0, 0, "(aa|bb)(\\1*)(ll|)(\\3*)bbbbbbc", "aaaaaabbbbbbbbc" }, + { CMU, A, 0, 0, "(aa|bb)(\\1+)(ll|)(\\3+)bbbbbbc", "bBbbBbCbBbbbBbbcbbBbbbBBbbC" }, + { CM, A, 0, 0, "(a{2,4})\\1", "AaAaaAaA" }, + { MU, A, 0, 0, "(aa|bb)(\\1?)aa(\\1?)(ll|)(\\4+)bbc", "aaaaaaaabbaabbbbaabbbbc" }, + { MU, A, 0, 0, "(aa|bb)(\\1{0,5})(ll|)(\\3{0,5})cc", "bbxxbbbbxxaaaaaaaaaaaaaaaacc" }, + { MU, A, 0, 0, "(aa|bb)(\\1{3,5})(ll|)(\\3{3,5})cc", "bbbbbbbbbbbbaaaaaaccbbbbbbbbbbbbbbcc" }, + { MU, A, 0, 0, "(aa|bb)(\\1{3,})(ll|)(\\3{3,})cc", "bbbbbbbbbbbbaaaaaaccbbbbbbbbbbbbbbcc" }, + { MU, A, 0, 0, "(\\w+)b(\\1+)c", "GabGaGaDbGaDGaDc" }, + { MU, A, 0, 0, "(?:(aa)|b)\\1?b", "bb" }, + { CMU, A, 0, 0, "(aa|bb)(\\1*?)aa(\\1+?)", "bBBbaaAAaaAAaa" }, + { MU, A, 0, 0, "(aa|bb)(\\1*?)(dd|)cc(\\3+?)", "aaaaaccdd" }, + { CMU, A, 0, 0, "(?:(aa|bb)(\\1?\?)cc){2}(\\1?\?)", "aAaABBbbAAaAcCaAcCaA" }, + { MU, A, 0, 0, "(?:(aa|bb)(\\1{3,5}?)){2}(dd|)(\\3{3,5}?)", "aaaaaabbbbbbbbbbaaaaaaaaaaaaaa" }, + { CM, A, 0, 0, "(?:(aa|bb)(\\1{3,}?)){2}(dd|)(\\3{3,}?)", "aaaaaabbbbbbbbbbaaaaaaaaaaaaaa" }, + { MU, A, 0, 0, "(?:(aa|bb)(\\1{0,3}?)){2}(dd|)(\\3{0,3}?)b(\\1{0,3}?)(\\1{0,3})", "aaaaaaaaaaaaaaabaaaaa" }, + { MU, A, 0, 0, "(a(?:\\1|)a){3}b", "aaaaaaaaaaab" }, + { M, A, 0, 0, 
"(a?)b(\\1\\1*\\1+\\1?\\1*?\\1+?\\1??\\1*+\\1++\\1?+\\1{4}\\1{3,5}\\1{4,}\\1{0,5}\\1{3,5}?\\1{4,}?\\1{0,5}?\\1{3,5}+\\1{4,}+\\1{0,5}+#){2}d", "bb#b##d" }, + { MUP, A, 0, 0 | F_PROPERTY, "(\\P{N})\\1{2,}", ".www." }, + { MUP, A, 0, 0 | F_PROPERTY, "(\\P{N})\\1{0,2}", "wwwww." }, + { MUP, A, 0, 0 | F_PROPERTY, "(\\P{N})\\1{1,2}ww", "wwww" }, + { MUP, A, 0, 0 | F_PROPERTY, "(\\P{N})\\1{1,2}ww", "wwwww" }, + { PCRE2_UCP, 0, 0, 0 | F_PROPERTY, "(\\P{N})\\1{2,}", ".www." }, + { CMUP, A, 0, 0, "(\xf0\x90\x90\x80)\\1", "\xf0\x90\x90\xa8\xf0\x90\x90\xa8" }, + { MU | PCRE2_DUPNAMES, A, 0, 0 | F_NOMATCH, "\\k{1,3}(?aa)(?bb)", "aabb" }, + { MU | PCRE2_DUPNAMES | PCRE2_MATCH_UNSET_BACKREF, A, 0, 0, "\\k{1,3}(?aa)(?bb)", "aabb" }, + { MU | PCRE2_DUPNAMES | PCRE2_MATCH_UNSET_BACKREF, A, 0, 0, "\\k*(?aa)(?bb)", "aabb" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "(?aa)(?bb)\\k{0,3}aaaaaa", "aabbaaaaaa" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "(?aa)(?bb)\\k{2,5}bb", "aabbaaaabb" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?aa)|(?bb))\\k{0,3}m", "aaaaaaaabbbbaabbbbm" }, + { MU | PCRE2_DUPNAMES, A, 0, 0 | F_NOMATCH, "\\k{1,3}?(?aa)(?bb)", "aabb" }, + { MU | PCRE2_DUPNAMES | PCRE2_MATCH_UNSET_BACKREF, A, 0, 0, "\\k{1,3}?(?aa)(?bb)", "aabb" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "\\k*?(?aa)(?bb)", "aabb" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?aa)|(?bb))\\k{0,3}?m", "aaaaaabbbbbbaabbbbbbbbbbm" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?aa)|(?bb))\\k*?m", "aaaaaabbbbbbaabbbbbbbbbbm" }, + { MU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?aa)|(?bb))\\k{2,3}?", "aaaabbbbaaaabbbbbbbbbb" }, + { CMU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?AA)|(?BB))\\k{0,3}M", "aaaaaaaabbbbaabbbbm" }, + { CMU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?AA)|(?BB))\\k{1,3}M", "aaaaaaaabbbbaabbbbm" }, + { CMU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?AA)|(?BB))\\k{0,3}?M", "aaaaaabbbbbbaabbbbbbbbbbm" }, + { CMU | PCRE2_DUPNAMES, A, 0, 0, "(?:(?AA)|(?BB))\\k{2,3}?", "aaaabbbbaaaabbbbbbbbbb" }, + + /* Assertions. 
*/ + { MU, A, 0, 0, "(?=xx|yy|zz)\\w{4}", "abczzdefg" }, + { MU, A, 0, 0, "(?=((\\w+)b){3}|ab)", "dbbbb ab" }, + { MU, A, 0, 0, "(?!ab|bc|cd)[a-z]{2}", "Xabcdef" }, + { MU, A, 0, 0, "(?<=aaa|aa|a)a", "aaa" }, + { MU, A, 0, 2, "(?<=aaa|aa|a)a", "aaa" }, + { M, A, 0, 0, "(?<=aaa|aa|a)a", "aaa" }, + { M, A, 0, 2, "(?<=aaa|aa|a)a", "aaa" }, + { MU, A, 0, 0, "(\\d{2})(?!\\w+c|(((\\w?)m){2}n)+|\\1)", "x5656" }, + { MU, A, 0, 0, "((?=((\\d{2,6}\\w){2,}))\\w{5,20}K){2,}", "567v09708K12l00M00 567v09708K12l00M00K45K" }, + { MU, A, 0, 0, "(?=(?:(?=\\S+a)\\w*(b)){3})\\w+\\d", "bba bbab nbbkba nbbkba0kl" }, + { MU, A, 0, 0, "(?>a(?>(b+))a(?=(..)))*?k", "acabbcabbaabacabaabbakk" }, + { MU, A, 0, 0, "((?(?=(a))a)+k)", "bbak" }, + { MU, A, 0, 0, "((?(?=a)a)+k)", "bbak" }, + { MU, A, 0, 0 | F_NOMATCH, "(?=(?>(a))m)amk", "a k" }, + { MU, A, 0, 0 | F_NOMATCH, "(?!(?>(a))m)amk", "a k" }, + { MU, A, 0, 0 | F_NOMATCH, "(?>(?=(a))am)amk", "a k" }, + { MU, A, 0, 0, "(?=(?>a|(?=(?>(b+))a|c)[a-c]+)*?m)[a-cm]+k", "aaam bbam baaambaam abbabba baaambaamk" }, + { MU, A, 0, 0, "(?> ?\?\\b(?(?=\\w{1,4}(a))m)\\w{0,8}bc){2,}?", "bca ssbc mabd ssbc mabc" }, + { MU, A, 0, 0, "(?:(?=ab)?[^n][^n])+m", "ababcdabcdcdabnababcdabcdcdabm" }, + { MU, A, 0, 0, "(?:(?=a(b))?[^n][^n])+m", "ababcdabcdcdabnababcdabcdcdabm" }, + { MU, A, 0, 0, "(?:(?=.(.))??\\1.)+m", "aabbbcbacccanaabbbcbacccam" }, + { MU, A, 0, 0, "(?:(?=.)??[a-c])+m", "abacdcbacacdcaccam" }, + { MU, A, 0, 0, "((?!a)?(?!([^a]))?)+$", "acbab" }, + { MU, A, 0, 0, "((?!a)?\?(?!([^a]))?\?)+$", "acbab" }, + { MU, A, 0, 0, "a(?=(?C)\\B(?C`x`))b", "ab" }, + { MU, A, 0, 0, "a(?!(?C)\\B(?C`x`))bb|ab", "abb" }, + { MU, A, 0, 0, "a(?=\\b|(?C)\\B(?C`x`))b", "ab" }, + { MU, A, 0, 0, "a(?!\\b|(?C)\\B(?C`x`))bb|ab", "abb" }, + { MU, A, 0, 0, "c(?(?=(?C)\\B(?C`x`))ab|a)", "cab" }, + { MU, A, 0, 0, "c(?(?!(?C)\\B(?C`x`))ab|a)", "cab" }, + { MU, A, 0, 0, "c(?(?=\\b|(?C)\\B(?C`x`))ab|a)", "cab" }, + { MU, A, 0, 0, "c(?(?!\\b|(?C)\\B(?C`x`))ab|a)", "cab" }, + { MU, 
A, 0, 0, "a(?=)b", "ab" }, + { MU, A, 0, 0 | F_NOMATCH, "a(?!)b", "ab" }, + + /* Not empty, ACCEPT, FAIL */ + { MU, A, PCRE2_NOTEMPTY, 0 | F_NOMATCH, "a*", "bcx" }, + { MU, A, PCRE2_NOTEMPTY, 0, "a*", "bcaad" }, + { MU, A, PCRE2_NOTEMPTY, 0, "a*?", "bcaad" }, + { MU, A, PCRE2_NOTEMPTY_ATSTART, 0, "a*", "bcaad" }, + { MU, A, 0, 0, "a(*ACCEPT)b", "ab" }, + { MU, A, PCRE2_NOTEMPTY, 0 | F_NOMATCH, "a*(*ACCEPT)b", "bcx" }, + { MU, A, PCRE2_NOTEMPTY, 0, "a*(*ACCEPT)b", "bcaad" }, + { MU, A, PCRE2_NOTEMPTY, 0, "a*?(*ACCEPT)b", "bcaad" }, + { MU, A, PCRE2_NOTEMPTY, 0 | F_NOMATCH, "(?:z|a*(*ACCEPT)b)", "bcx" }, + { MU, A, PCRE2_NOTEMPTY, 0, "(?:z|a*(*ACCEPT)b)", "bcaad" }, + { MU, A, PCRE2_NOTEMPTY, 0, "(?:z|a*?(*ACCEPT)b)", "bcaad" }, + { MU, A, PCRE2_NOTEMPTY_ATSTART, 0, "a*(*ACCEPT)b", "bcx" }, + { MU, A, PCRE2_NOTEMPTY_ATSTART, 0 | F_NOMATCH, "a*(*ACCEPT)b", "" }, + { MU, A, 0, 0, "((a(*ACCEPT)b))", "ab" }, + { MU, A, 0, 0, "(a(*FAIL)a|a)", "aaa" }, + { MU, A, 0, 0, "(?=ab(*ACCEPT)b)a", "ab" }, + { MU, A, 0, 0, "(?=(?:x|ab(*ACCEPT)b))", "ab" }, + { MU, A, 0, 0, "(?=(a(b(*ACCEPT)b)))a", "ab" }, + { MU, A, PCRE2_NOTEMPTY, 0, "(?=a*(*ACCEPT))c", "c" }, + { MU, A, PCRE2_NOTEMPTY, 0 | F_NOMATCH, "(?=A)", "AB" }, + + /* Conditional blocks. 
*/ + { MU, A, 0, 0, "(?(?=(a))a|b)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?!(b))a|b)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?=a)a|b)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?!b)a|b)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?=(a))a*|b*)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?!(b))a*|b*)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?!(b))(?:aaaaaa|a)|(?:bbbbbb|b))+aaaak", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb aaaaaaak" }, + { MU, A, 0, 0, "(?(?!b)(?:aaaaaa|a)|(?:bbbbbb|b))+aaaak", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb aaaaaaak" }, + { MU, A, 0, 0 | F_DIFF, "(?(?!(b))(?:aaaaaa|a)|(?:bbbbbb|b))+bbbbk", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb bbbbbbbk" }, + { MU, A, 0, 0, "(?(?!b)(?:aaaaaa|a)|(?:bbbbbb|b))+bbbbk", "aaaaaaaaaaaaaa bbbbbbbbbbbbbbb bbbbbbbk" }, + { MU, A, 0, 0, "(?(?=a)a*|b*)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?!b)a*|b*)+k", "ababbalbbadabak" }, + { MU, A, 0, 0, "(?(?=a)ab)", "a" }, + { MU, A, 0, 0, "(?(?a)?(?Pb)?(?(Name)c|d)*l", "bc ddd abccabccl" }, + { MU, A, 0, 0, "(?Pa)?(?Pb)?(?(Name)c|d)+?dd", "bcabcacdb bdddd" }, + { MU, A, 0, 0, "(?Pa)?(?Pb)?(?(Name)c|d)+l", "ababccddabdbccd abcccl" }, + { MU, A, 0, 0, "((?:a|aa)(?(1)aaa))x", "aax" }, + { MU, A, 0, 0, "(?(?!)a|b)", "ab" }, + { MU, A, 0, 0, "(?(?!)a)", "ab" }, + { MU, A, 0, 0 | F_NOMATCH, "(?(?!)a|b)", "ac" }, + + /* Set start of match. */ + { MU, A, 0, 0, "(?:\\Ka)*aaaab", "aaaaaaaa aaaaaaabb" }, + { MU, A, 0, 0, "(?>\\Ka\\Ka)*aaaab", "aaaaaaaa aaaaaaaaaabb" }, + { MU, A, 0, 0, "a+\\K(?<=\\Gaa)a", "aaaaaa" }, + { MU, A, PCRE2_NOTEMPTY, 0 | F_NOMATCH, "a\\K(*ACCEPT)b", "aa" }, + { MU, A, PCRE2_NOTEMPTY_ATSTART, 0, "a\\K(*ACCEPT)b", "aa" }, + + /* First line. 
*/ + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_PROPERTY, "\\p{Any}a", "bb\naaa" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_NOMATCH | F_PROPERTY, "\\p{Any}a", "bb\r\naaa" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0, "(?<=a)", "a" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_NOMATCH, "[^a][^b]", "ab" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_NOMATCH, "a", "\na" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_NOMATCH, "[abc]", "\na" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_NOMATCH, "^a", "\na" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0 | F_NOMATCH, "^(?<=\n)", "\na" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0, "\xf0\x90\x90\x80", "\xf0\x90\x90\x80" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_ANY, 0, 0 | F_NOMATCH, "#", "\xc2\x85#" }, + { M | PCRE2_FIRSTLINE, PCRE2_NEWLINE_ANY, 0, 0 | F_NOMATCH, "#", "\x85#" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_ANY, 0, 0 | F_NOMATCH, "^#", "\xe2\x80\xa8#" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_CRLF, 0, 0 | F_PROPERTY, "\\p{Any}", "\r\na" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_CRLF, 0, 0, ".", "\r" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_CRLF, 0, 0, "a", "\ra" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_CRLF, 0, 0 | F_NOMATCH, "ba", "bbb\r\nba" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_CRLF, 0, 0 | F_NOMATCH | F_PROPERTY, "\\p{Any}{4}|a", "\r\na" }, + { MU | PCRE2_FIRSTLINE, PCRE2_NEWLINE_CRLF, 0, 1, ".", "\r\n" }, + { PCRE2_FIRSTLINE | PCRE2_DOTALL, PCRE2_NEWLINE_LF, 0, 0 | F_NOMATCH, "ab.", "ab" }, + { MU | PCRE2_FIRSTLINE, A, 0, 1 | F_NOMATCH, "^[a-d0-9]", "\nxx\nd" }, + { PCRE2_FIRSTLINE | PCRE2_DOTALL, PCRE2_NEWLINE_ANY, 0, 0, "....a", "012\n0a" }, + { MU | PCRE2_FIRSTLINE, A, 0, 0, "[aC]", "a" }, + + /* Recurse. 
*/ + { MU, A, 0, 0, "(a)(?1)", "aa" }, + { MU, A, 0, 0, "((a))(?1)", "aa" }, + { MU, A, 0, 0, "(b|a)(?1)", "aa" }, + { MU, A, 0, 0, "(b|(a))(?1)", "aa" }, + { MU, A, 0, 0 | F_NOMATCH, "((a)(b)(?:a*))(?1)", "aba" }, + { MU, A, 0, 0, "((a)(b)(?:a*))(?1)", "abab" }, + { MU, A, 0, 0, "((a+)c(?2))b(?1)", "aacaabaca" }, + { MU, A, 0, 0, "((?2)b|(a)){2}(?1)", "aabab" }, + { MU, A, 0, 0, "(?1)(a)*+(?2)(b(?1))", "aababa" }, + { MU, A, 0, 0, "(?1)(((a(*ACCEPT)))b)", "axaa" }, + { MU, A, 0, 0, "(?1)(?(DEFINE) (((ac(*ACCEPT)))b) )", "akaac" }, + { MU, A, 0, 0, "(a+)b(?1)b\\1", "abaaabaaaaa" }, + { MU, A, 0, 0, "(?(DEFINE)(aa|a))(?1)ab", "aab" }, + { MU, A, 0, 0, "(?(DEFINE)(a\\Kb))(?1)+ababc", "abababxabababc" }, + { MU, A, 0, 0, "(a\\Kb)(?1)+ababc", "abababxababababc" }, + { MU, A, 0, 0 | F_NOMATCH, "(a\\Kb)(?1)+ababc", "abababxababababxc" }, + { MU, A, 0, 0, "b|<(?R)*>", "<<b>" }, + { MU, A, 0, 0, "(a\\K){0}(?:(?1)b|ac)", "ac" }, + { MU, A, 0, 0, "(?(DEFINE)(a(?2)|b)(b(?1)|(a)))(?:(?1)|(?2))m", "ababababnababababaam" }, + { MU, A, 0, 0, "(a)((?(R)a|b))(?2)", "aabbabaa" }, + { MU, A, 0, 0, "(a)((?(R2)a|b))(?2)", "aabbabaa" }, + { MU, A, 0, 0, "(a)((?(R1)a|b))(?2)", "ababba" }, + { MU, A, 0, 0, "(?(R0)aa|bb(?R))", "abba aabb bbaa" }, + { MU, A, 0, 0, "((?(R)(?:aaaa|a)|(?:(aaaa)|(a)))+)(?1)$", "aaaaaaaaaa aaaa" }, + { MU, A, 0, 0, "(?P<Name>a(?(R&Name)a|b))(?1)", "aab abb abaa" }, + { MU, A, 0, 0, "((?(R)a|(?1)){3})", "XaaaaaaaaaX" }, + { MU, A, 0, 0, "((?:(?(R)a|(?1))){3})", "XaaaaaaaaaX" }, + { MU, A, 0, 0, "((?(R)a|(?1)){1,3})aaaaaa", "aaaaaaaaXaaaaaaaaa" }, + { MU, A, 0, 0, "((?(R)a|(?1)){1,3}?)M", "aaaM" }, + { MU, A, 0, 0, "((.)(?:.|\\2(?1))){0}#(?1)#", "#aabbccdde# #aabbccddee#" }, + { MU, A, 0, 0, "((.)(?:\\2|\\2{4}b)){0}#(?:(?1))+#", "#aaaab# #aaaaab#" }, + + /* 16 bit specific tests. 
*/ + { CM, A, 0, 0 | F_FORCECONV, "\xc3\xa1", "\xc3\x81\xc3\xa1" }, + { CM, A, 0, 0 | F_FORCECONV, "\xe1\xbd\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xc3\xa1]", "\xc3\x81\xc3\xa1" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xe1\xbd\xb8]", "\xe1\xbf\xb8\xe1\xbd\xb8" }, + { CM, A, 0, 0 | F_FORCECONV, "[a-\xed\xb0\x80]", "A" }, + { CM, A, 0, 0 | F_NO8 | F_FORCECONV, "[a-\\x{dc00}]", "B" }, + { CM, A, 0, 0 | F_NO8 | F_NOMATCH | F_FORCECONV, "[b-\\x{dc00}]", "a" }, + { CM, A, 0, 0 | F_NO8 | F_FORCECONV, "\xed\xa0\x80\\x{d800}\xed\xb0\x80\\x{dc00}", "\xed\xa0\x80\xed\xa0\x80\xed\xb0\x80\xed\xb0\x80" }, + { CM, A, 0, 0 | F_NO8 | F_FORCECONV, "[\xed\xa0\x80\\x{d800}]{1,2}?[\xed\xb0\x80\\x{dc00}]{1,2}?#", "\xed\xa0\x80\xed\xa0\x80\xed\xb0\x80\xed\xb0\x80#" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xed\xa0\x80\xed\xb0\x80#]{0,3}(?<=\xed\xb0\x80.)", "\xed\xa0\x80#\xed\xa0\x80##\xed\xb0\x80\xed\xa0\x80" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xed\xa0\x80-\xed\xb3\xbf]", "\xed\x9f\xbf\xed\xa0\x83" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xed\xa0\x80-\xed\xb3\xbf]", "\xed\xb4\x80\xed\xb3\xb0" }, + { CM, A, 0, 0 | F_NO8 | F_FORCECONV, "[\\x{d800}-\\x{dcff}]", "\xed\x9f\xbf\xed\xa0\x83" }, + { CM, A, 0, 0 | F_NO8 | F_FORCECONV, "[\\x{d800}-\\x{dcff}]", "\xed\xb4\x80\xed\xb3\xb0" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xed\xa0\x80-\xef\xbf\xbf]+[\x1-\xed\xb0\x80]+#", "\xed\xa0\x85\xc3\x81\xed\xa0\x85\xef\xbf\xb0\xc2\x85\xed\xa9\x89#" }, + { CM, A, 0, 0 | F_FORCECONV, "[\xed\xa0\x80][\xed\xb0\x80]{2,}", "\xed\xa0\x80\xed\xb0\x80\xed\xa0\x80\xed\xb0\x80\xed\xb0\x80\xed\xb0\x80" }, + { M, A, 0, 0 | F_FORCECONV, "[^\xed\xb0\x80]{3,}?", "##\xed\xb0\x80#\xed\xb0\x80#\xc3\x89#\xed\xb0\x80" }, + { M, A, 0, 0 | F_NO8 | F_FORCECONV, "[^\\x{dc00}]{3,}?", "##\xed\xb0\x80#\xed\xb0\x80#\xc3\x89#\xed\xb0\x80" }, + { CM, A, 0, 0 | F_FORCECONV, ".\\B.", "\xed\xa0\x80\xed\xb0\x80" }, + { CM, A, 0, 0 | F_FORCECONV, "\\D+(?:\\d+|.)\\S+(?:\\s+|.)\\W+(?:\\w+|.)\xed\xa0\x80\xed\xa0\x80", 
"\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80" }, + { CM, A, 0, 0 | F_FORCECONV, "\\d*\\s*\\w*\xed\xa0\x80\xed\xa0\x80", "\xed\xa0\x80\xed\xa0\x80" }, + { CM, A, 0, 0 | F_FORCECONV | F_NOMATCH, "\\d*?\\D*?\\s*?\\S*?\\w*?\\W*?##", "\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80\xed\xa0\x80#" }, + { CM | PCRE2_EXTENDED, A, 0, 0 | F_FORCECONV, "\xed\xa0\x80 \xed\xb0\x80 !", "\xed\xa0\x80\xed\xb0\x80!" }, + { CM, A, 0, 0 | F_FORCECONV, "\xed\xa0\x80+#[^#]+\xed\xa0\x80", "\xed\xa0\x80#a\xed\xa0\x80" }, + { CM, A, 0, 0 | F_FORCECONV, "(\xed\xa0\x80+)#\\1", "\xed\xa0\x80\xed\xa0\x80#\xed\xa0\x80\xed\xa0\x80" }, + { M, PCRE2_NEWLINE_ANY, 0, 0 | F_NO8 | F_FORCECONV, "^-", "a--\xe2\x80\xa8--" }, + { 0, BSR(PCRE2_BSR_UNICODE), 0, 0 | F_NO8 | F_FORCECONV, "\\R", "ab\xe2\x80\xa8" }, + { 0, 0, 0, 0 | F_NO8 | F_FORCECONV, "\\v", "ab\xe2\x80\xa9" }, + { 0, 0, 0, 0 | F_NO8 | F_FORCECONV, "\\h", "ab\xe1\xa0\x8e" }, + { 0, 0, 0, 0 | F_NO8 | F_FORCECONV, "\\v+?\\V+?#", "\xe2\x80\xa9\xe2\x80\xa9\xef\xbf\xbf\xef\xbf\xbf#" }, + { 0, 0, 0, 0 | F_NO8 | F_FORCECONV, "\\h+?\\H+?#", "\xe1\xa0\x8e\xe1\xa0\x8e\xef\xbf\xbf\xef\xbf\xbf#" }, + + /* Partial matching. */ + { MU, A, PCRE2_PARTIAL_SOFT, 0, "ab", "a" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "ab|a", "a" }, + { MU, A, PCRE2_PARTIAL_HARD, 0, "ab|a", "a" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "\\b#", "a" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "(?<=a)b", "a" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "abc|(?<=xxa)bc", "xxab" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "a\\B", "a" }, + { MU, A, PCRE2_PARTIAL_HARD, 0, "a\\b", "a" }, + + /* (*MARK) verb. 
*/ + { MU, A, 0, 0, "a(*MARK:aa)a", "ababaa" }, + { MU, A, 0, 0 | F_NOMATCH, "a(*:aa)a", "abab" }, + { MU, A, 0, 0, "a(*:aa)(b(*:bb)b|bc)", "abc" }, + { MU, A, 0, 0 | F_NOMATCH, "a(*:1)x|b(*:2)y", "abc" }, + { MU, A, 0, 0, "(?>a(*:aa))b|ac", "ac" }, + { MU, A, 0, 0, "(?(DEFINE)(a(*:aa)))(?1)", "a" }, + { MU, A, 0, 0 | F_NOMATCH, "(?(DEFINE)((a)(*:aa)))(?1)b", "aa" }, + { MU, A, 0, 0, "(?(DEFINE)(a(*:aa)))a(?1)b|aac", "aac" }, + { MU, A, 0, 0, "(a(*:aa)){0}(?:b(?1)b|c)+c", "babbab cc" }, + { MU, A, 0, 0, "(a(*:aa)){0}(?:b(?1)b)+", "babba" }, + { MU, A, 0, 0 | F_NOMATCH, "(a(*:aa)){0}(?:b(?1)b)+", "ba" }, + { MU, A, 0, 0, "(a\\K(*:aa)){0}(?:b(?1)b|c)+c", "babbab cc" }, + { MU, A, 0, 0, "(a\\K(*:aa)){0}(?:b(?1)b)+", "babba" }, + { MU, A, 0, 0 | F_NOMATCH, "(a\\K(*:aa)){0}(?:b(?1)b)+", "ba" }, + { MU, A, 0, 0 | F_NOMATCH, "(*:mark)m", "a" }, + + /* (*COMMIT) verb. */ + { MU, A, 0, 0 | F_NOMATCH, "a(*COMMIT)b", "ac" }, + { MU, A, 0, 0, "aa(*COMMIT)b", "xaxaab" }, + { MU, A, 0, 0 | F_NOMATCH, "a(*COMMIT)(*:msg)b|ac", "ac" }, + { MU, A, 0, 0 | F_NOMATCH, "(a(*COMMIT)b)++", "abac" }, + { MU, A, 0, 0 | F_NOMATCH, "((a)(*COMMIT)b)++", "abac" }, + { MU, A, 0, 0 | F_NOMATCH, "(?=a(*COMMIT)b)ab|ad", "ad" }, + + /* (*PRUNE) verb. 
*/ + { MU, A, 0, 0, "aa\\K(*PRUNE)b", "aaab" }, + { MU, A, 0, 0, "aa(*PRUNE:bb)b|a", "aa" }, + { MU, A, 0, 0, "(a)(a)(*PRUNE)b|(a)", "aa" }, + { MU, A, 0, 0, "(a)(a)(a)(a)(a)(a)(a)(a)(*PRUNE)b|(a)", "aaaaaaaa" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "a(*PRUNE)a|", "a" }, + { MU, A, PCRE2_PARTIAL_SOFT, 0, "a(*PRUNE)a|m", "a" }, + { MU, A, 0, 0 | F_NOMATCH, "(?=a(*PRUNE)b)ab|ad", "ad" }, + { MU, A, 0, 0, "a(*COMMIT)(*PRUNE)d|bc", "abc" }, + { MU, A, 0, 0, "(?=a(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?=a(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, + { MU, A, 0, 0, "(?=(a)(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?=(a)(*COMMIT)b)a(*PRUNE)c|bc", "abc" }, + { MU, A, 0, 0, "(a(*COMMIT)b){0}a(?1)(*PRUNE)c|bc", "abc" }, + { MU, A, 0, 0 | F_NOMATCH, "(a(*COMMIT)b){0}a(*COMMIT)(?1)(*PRUNE)c|bc", "abc" }, + { MU, A, 0, 0, "(a(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(a(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, + { MU, A, 0, 0, "((a)(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)((a)(*COMMIT)b)++(*PRUNE)d|c", "ababc" }, + { MU, A, 0, 0, "(?>a(*COMMIT)b)*abab(*PRUNE)d|ba", "ababab" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)*abab(*PRUNE)d|ba", "ababab" }, + { MU, A, 0, 0, "(?>a(*COMMIT)b)+abab(*PRUNE)d|ba", "ababab" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)+abab(*PRUNE)d|ba", "ababab" }, + { MU, A, 0, 0, "(?>a(*COMMIT)b)?ab(*PRUNE)d|ba", "aba" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)?ab(*PRUNE)d|ba", "aba" }, + { MU, A, 0, 0, "(?>a(*COMMIT)b)*?n(*PRUNE)d|ba", "abababn" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)*?n(*PRUNE)d|ba", "abababn" }, + { MU, A, 0, 0, "(?>a(*COMMIT)b)+?n(*PRUNE)d|ba", "abababn" }, + { MU, A, 0, 0 | F_NOMATCH, "(*COMMIT)(?>a(*COMMIT)b)+?n(*PRUNE)d|ba", "abababn" }, + { MU, A, 0, 0, "(?>a(*COMMIT)b)??n(*PRUNE)d|bn", "abn" }, + { MU, A, 0, 0 | F_NOMATCH, 
"(*COMMIT)(?>a(*COMMIT)b)??n(*PRUNE)d|bn", "abn" }, + + /* (*SKIP) verb. */ + { MU, A, 0, 0 | F_NOMATCH, "(?=a(*SKIP)b)ab|ad", "ad" }, + { MU, A, 0, 0, "(\\w+(*SKIP)#)", "abcd,xyz#," }, + { MU, A, 0, 0, "\\w+(*SKIP)#|mm", "abcd,xyz#," }, + { MU, A, 0, 0 | F_NOMATCH, "b+(?<=(*SKIP)#c)|b+", "#bbb" }, + + /* (*THEN) verb. */ + { MU, A, 0, 0, "((?:a(*THEN)|aab)(*THEN)c|a+)+m", "aabcaabcaabcaabcnacm" }, + { MU, A, 0, 0 | F_NOMATCH, "((?:a(*THEN)|aab)(*THEN)c|a+)+m", "aabcm" }, + { MU, A, 0, 0, "((?:a(*THEN)|aab)c|a+)+m", "aabcaabcnmaabcaabcm" }, + { MU, A, 0, 0, "((?:a|aab)(*THEN)c|a+)+m", "aam" }, + { MU, A, 0, 0, "((?:a(*COMMIT)|aab)(*THEN)c|a+)+m", "aam" }, + { MU, A, 0, 0, "(?(?=a(*THEN)b)ab|ad)", "ad" }, + { MU, A, 0, 0, "(?(?!a(*THEN)b)ad|add)", "add" }, + { MU, A, 0, 0 | F_NOMATCH, "(?(?=a)a(*THEN)b|ad)", "ad" }, + { MU, A, 0, 0, "(?!(?(?=a)ab|b(*THEN)d))bn|bnn", "bnn" }, + { MU, A, 0, 0, "(?=(*THEN: ))* ", " " }, + { MU, A, 0, 0, "a(*THEN)(?R) |", "a" }, + + /* Recurse and control verbs. */ + { MU, A, 0, 0, "(a(*ACCEPT)b){0}a(?1)b", "aacaabb" }, + { MU, A, 0, 0, "((a)\\2(*ACCEPT)b){0}a(?1)b", "aaacaaabb" }, + { MU, A, 0, 0, "((ab|a(*ACCEPT)x)+|ababababax){0}_(?1)_", "_ababababax_ _ababababa_" }, + { MU, A, 0, 0, "((.)(?:A(*ACCEPT)|(?1)\\2)){0}_(?1)_", "_bcdaAdcb_bcdaAdcb_" }, + { MU, A, 0, 0, "((*MARK:m)(?:a|a(*COMMIT)b|aa)){0}_(?1)_", "_ab_" }, + { MU, A, 0, 0, "((*MARK:m)(?:a|a(*COMMIT)b|aa)){0}_(?1)_|(_aa_)", "_aa_" }, + { MU, A, 0, 0, "(a(*COMMIT)(?:b|bb)|c(*ACCEPT)d|dd){0}_(?1)+_", "_ax_ _cd_ _abbb_ _abcd_ _abbcdd_" }, + { MU, A, 0, 0, "((.)(?:.|(*COMMIT)\\2{3}(*ACCEPT).*|.*)){0}_(?1){0,4}_", "_aaaabbbbccccddd_ _aaaabbbbccccdddd_" }, + +#ifdef SUPPORT_UNICODE + /* Script runs and iterations. 
*/ + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)*#", "!abcdefghijklmno!abcdefghijklmno!abcdef#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)+#", "!abcdefghijklmno!abcdefghijklmno!abcdef#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)*?#", "!abcdefghijklmno!abcdefghijklmno!abcdef#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)+?#", "!abcdefghijklmno!abcdefghijklmno!abcdef#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)*+#", "!abcdefghijklmno!abcdefghijklmno!abcdef#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)++#", "!abcdefghijklmno!abcdefghijklmno!abcdef#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)?#", "!ab!abc!ab!ab#" }, + { MU, A, 0, 0, "!(*sr:\\w\\w|\\w\\w\\w)??#", "!ab!abc!ab!ab#" }, +#endif + + /* Deep recursion. */ + { MU, A, 0, 0, "((((?:(?:(?:\\w)+)?)*|(?>\\w)+?)+|(?>\\w)?\?)*)?\\s", "aaaaa+ " }, + { MU, A, 0, 0, "(?:((?:(?:(?:\\w*?)+)??|(?>\\w)?|\\w*+)*)+)+?\\s", "aa+ " }, + { MU, A, 0, 0, "((a?)+)+b", "aaaaaaaaaaaa b" }, + + /* Deep recursion: Stack limit reached. */ + { M, A, 0, 0 | F_NOMATCH, "a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaa" }, + { M, A, 0, 0 | F_NOMATCH, "(?:a+)+b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, + { M, A, 0, 0 | F_NOMATCH, "(?:a+?)+?b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, + { M, A, 0, 0 | F_NOMATCH, "(?:a*)*b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, + { M, A, 0, 0 | F_NOMATCH, "(?:a*?)*?b", "aaaaaaaaaaaaaaaaaaaaaaaa b" }, + + { 0, 0, 0, 0, NULL, NULL } +}; + +#ifdef SUPPORT_PCRE2_8 +static pcre2_jit_stack_8* callback8(void *arg) +{ + return (pcre2_jit_stack_8 *)arg; +} +#endif + +#ifdef SUPPORT_PCRE2_16 +static pcre2_jit_stack_16* callback16(void *arg) +{ + return (pcre2_jit_stack_16 *)arg; +} +#endif + +#ifdef SUPPORT_PCRE2_32 +static pcre2_jit_stack_32* callback32(void *arg) +{ + return (pcre2_jit_stack_32 *)arg; +} +#endif + +#ifdef SUPPORT_PCRE2_8 +static pcre2_jit_stack_8 *stack8; + +static pcre2_jit_stack_8 *getstack8(void) +{ + if (!stack8) + stack8 = pcre2_jit_stack_create_8(1, 1024 * 1024, 
NULL); + return stack8; +} + +static void setstack8(pcre2_match_context_8 *mcontext) +{ + if (!mcontext) { + if (stack8) + pcre2_jit_stack_free_8(stack8); + stack8 = NULL; + return; + } + + pcre2_jit_stack_assign_8(mcontext, callback8, getstack8()); +} +#endif /* SUPPORT_PCRE2_8 */ + +#ifdef SUPPORT_PCRE2_16 +static pcre2_jit_stack_16 *stack16; + +static pcre2_jit_stack_16 *getstack16(void) +{ + if (!stack16) + stack16 = pcre2_jit_stack_create_16(1, 1024 * 1024, NULL); + return stack16; +} + +static void setstack16(pcre2_match_context_16 *mcontext) +{ + if (!mcontext) { + if (stack16) + pcre2_jit_stack_free_16(stack16); + stack16 = NULL; + return; + } + + pcre2_jit_stack_assign_16(mcontext, callback16, getstack16()); +} +#endif /* SUPPORT_PCRE2_16 */ + +#ifdef SUPPORT_PCRE2_32 +static pcre2_jit_stack_32 *stack32; + +static pcre2_jit_stack_32 *getstack32(void) +{ + if (!stack32) + stack32 = pcre2_jit_stack_create_32(1, 1024 * 1024, NULL); + return stack32; +} + +static void setstack32(pcre2_match_context_32 *mcontext) +{ + if (!mcontext) { + if (stack32) + pcre2_jit_stack_free_32(stack32); + stack32 = NULL; + return; + } + + pcre2_jit_stack_assign_32(mcontext, callback32, getstack32()); +} +#endif /* SUPPORT_PCRE2_32 */ + +#ifdef SUPPORT_PCRE2_16 + +static int convert_utf8_to_utf16(PCRE2_SPTR8 input, PCRE2_UCHAR16 *output, int *offsetmap, int max_length) +{ + PCRE2_SPTR8 iptr = input; + PCRE2_UCHAR16 *optr = output; + unsigned int c; + + if (max_length == 0) + return 0; + + while (*iptr && max_length > 1) { + c = 0; + if (offsetmap) + *offsetmap++ = (int)(iptr - (unsigned char*)input); + + if (*iptr < 0xc0) + c = *iptr++; + else if (!(*iptr & 0x20)) { + c = ((iptr[0] & 0x1f) << 6) | (iptr[1] & 0x3f); + iptr += 2; + } else if (!(*iptr & 0x10)) { + c = ((iptr[0] & 0x0f) << 12) | ((iptr[1] & 0x3f) << 6) | (iptr[2] & 0x3f); + iptr += 3; + } else if (!(*iptr & 0x08)) { + c = ((iptr[0] & 0x07) << 18) | ((iptr[1] & 0x3f) << 12) | ((iptr[2] & 0x3f) << 6) | (iptr[3] & 0x3f); 
+ iptr += 4; + } + + if (c < 65536) { + *optr++ = c; + max_length--; + } else if (max_length <= 2) { + *optr = '\0'; + return (int)(optr - output); + } else { + c -= 0x10000; + *optr++ = 0xd800 | ((c >> 10) & 0x3ff); + *optr++ = 0xdc00 | (c & 0x3ff); + max_length -= 2; + if (offsetmap) + offsetmap++; + } + } + if (offsetmap) + *offsetmap = (int)(iptr - (unsigned char*)input); + *optr = '\0'; + return (int)(optr - output); +} + +static int copy_char8_to_char16(PCRE2_SPTR8 input, PCRE2_UCHAR16 *output, int max_length) +{ + PCRE2_SPTR8 iptr = input; + PCRE2_UCHAR16 *optr = output; + + if (max_length == 0) + return 0; + + while (*iptr && max_length > 1) { + *optr++ = *iptr++; + max_length--; + } + *optr = '\0'; + return (int)(optr - output); +} + +#define REGTEST_MAX_LENGTH16 4096 +static PCRE2_UCHAR16 regtest_buf16[REGTEST_MAX_LENGTH16]; +static int regtest_offsetmap16[REGTEST_MAX_LENGTH16]; + +#endif /* SUPPORT_PCRE2_16 */ + +#ifdef SUPPORT_PCRE2_32 + +static int convert_utf8_to_utf32(PCRE2_SPTR8 input, PCRE2_UCHAR32 *output, int *offsetmap, int max_length) +{ + PCRE2_SPTR8 iptr = input; + PCRE2_UCHAR32 *optr = output; + unsigned int c; + + if (max_length == 0) + return 0; + + while (*iptr && max_length > 1) { + c = 0; + if (offsetmap) + *offsetmap++ = (int)(iptr - (unsigned char*)input); + + if (*iptr < 0xc0) + c = *iptr++; + else if (!(*iptr & 0x20)) { + c = ((iptr[0] & 0x1f) << 6) | (iptr[1] & 0x3f); + iptr += 2; + } else if (!(*iptr & 0x10)) { + c = ((iptr[0] & 0x0f) << 12) | ((iptr[1] & 0x3f) << 6) | (iptr[2] & 0x3f); + iptr += 3; + } else if (!(*iptr & 0x08)) { + c = ((iptr[0] & 0x07) << 18) | ((iptr[1] & 0x3f) << 12) | ((iptr[2] & 0x3f) << 6) | (iptr[3] & 0x3f); + iptr += 4; + } + + *optr++ = c; + max_length--; + } + if (offsetmap) + *offsetmap = (int)(iptr - (unsigned char*)input); + *optr = 0; + return (int)(optr - output); +} + +static int copy_char8_to_char32(PCRE2_SPTR8 input, PCRE2_UCHAR32 *output, int max_length) +{ + PCRE2_SPTR8 iptr = input; + 
PCRE2_UCHAR32 *optr = output; + + if (max_length == 0) + return 0; + + while (*iptr && max_length > 1) { + *optr++ = *iptr++; + max_length--; + } + *optr = '\0'; + return (int)(optr - output); +} + +#define REGTEST_MAX_LENGTH32 4096 +static PCRE2_UCHAR32 regtest_buf32[REGTEST_MAX_LENGTH32]; +static int regtest_offsetmap32[REGTEST_MAX_LENGTH32]; + +#endif /* SUPPORT_PCRE2_32 */ + +static int check_ascii(const char *input) +{ + const unsigned char *ptr = (unsigned char *)input; + while (*ptr) { + if (*ptr > 127) + return 0; + ptr++; + } + return 1; +} + +#define OVECTOR_SIZE 15 + +static int regression_tests(void) +{ + struct regression_test_case *current = regression_test_cases; + int error; + PCRE2_SIZE err_offs; + int is_successful; + int is_ascii; + int total = 0; + int successful = 0; + int successful_row = 0; + int counter = 0; + int jit_compile_mode; + int utf = 0; + int disabled_options = 0; + int i; +#ifdef SUPPORT_PCRE2_8 + pcre2_code_8 *re8; + pcre2_compile_context_8 *ccontext8; + pcre2_match_data_8 *mdata8_1; + pcre2_match_data_8 *mdata8_2; + pcre2_match_context_8 *mcontext8; + PCRE2_SIZE *ovector8_1 = NULL; + PCRE2_SIZE *ovector8_2 = NULL; + int return_value8[2]; +#endif +#ifdef SUPPORT_PCRE2_16 + pcre2_code_16 *re16; + pcre2_compile_context_16 *ccontext16; + pcre2_match_data_16 *mdata16_1; + pcre2_match_data_16 *mdata16_2; + pcre2_match_context_16 *mcontext16; + PCRE2_SIZE *ovector16_1 = NULL; + PCRE2_SIZE *ovector16_2 = NULL; + int return_value16[2]; + int length16; +#endif +#ifdef SUPPORT_PCRE2_32 + pcre2_code_32 *re32; + pcre2_compile_context_32 *ccontext32; + pcre2_match_data_32 *mdata32_1; + pcre2_match_data_32 *mdata32_2; + pcre2_match_context_32 *mcontext32; + PCRE2_SIZE *ovector32_1 = NULL; + PCRE2_SIZE *ovector32_2 = NULL; + int return_value32[2]; + int length32; +#endif + +#if defined SUPPORT_PCRE2_8 + PCRE2_UCHAR8 cpu_info[128]; +#elif defined SUPPORT_PCRE2_16 + PCRE2_UCHAR16 cpu_info[128]; +#elif defined SUPPORT_PCRE2_32 + PCRE2_UCHAR32 
cpu_info[128]; +#endif +#if defined SUPPORT_UNICODE && ((defined(SUPPORT_PCRE2_8) + defined(SUPPORT_PCRE2_16) + defined(SUPPORT_PCRE2_32)) >= 2) + int return_value; +#endif + + /* This test compares the behaviour of interpreter and JIT. Although disabling + utf or ucp may make tests fail, if the pcre_exec result is the SAME, it is + still considered successful from pcre_jit_test point of view. */ + +#if defined SUPPORT_PCRE2_8 + pcre2_config_8(PCRE2_CONFIG_JITTARGET, &cpu_info); +#elif defined SUPPORT_PCRE2_16 + pcre2_config_16(PCRE2_CONFIG_JITTARGET, &cpu_info); +#elif defined SUPPORT_PCRE2_32 + pcre2_config_32(PCRE2_CONFIG_JITTARGET, &cpu_info); +#endif + + printf("Running JIT regression tests\n"); + printf(" target CPU of SLJIT compiler: "); + for (i = 0; cpu_info[i]; i++) + printf("%c", (char)(cpu_info[i])); + printf("\n"); + +#if defined SUPPORT_PCRE2_8 + pcre2_config_8(PCRE2_CONFIG_UNICODE, &utf); +#elif defined SUPPORT_PCRE2_16 + pcre2_config_16(PCRE2_CONFIG_UNICODE, &utf); +#elif defined SUPPORT_PCRE2_32 + pcre2_config_32(PCRE2_CONFIG_UNICODE, &utf); +#endif + + if (!utf) + disabled_options |= PCRE2_UTF; +#ifdef SUPPORT_PCRE2_8 + printf(" in 8 bit mode with UTF-8 %s:\n", utf ? "enabled" : "disabled"); +#endif +#ifdef SUPPORT_PCRE2_16 + printf(" in 16 bit mode with UTF-16 %s:\n", utf ? "enabled" : "disabled"); +#endif +#ifdef SUPPORT_PCRE2_32 + printf(" in 32 bit mode with UTF-32 %s:\n", utf ? 
"enabled" : "disabled"); +#endif + + while (current->pattern) { + /* printf("\nPattern: %s :\n", current->pattern); */ + total++; + is_ascii = 0; + if (!(current->start_offset & F_PROPERTY)) + is_ascii = check_ascii(current->pattern) && check_ascii(current->input); + + if (current->match_options & PCRE2_PARTIAL_SOFT) + jit_compile_mode = PCRE2_JIT_PARTIAL_SOFT; + else if (current->match_options & PCRE2_PARTIAL_HARD) + jit_compile_mode = PCRE2_JIT_PARTIAL_HARD; + else + jit_compile_mode = PCRE2_JIT_COMPLETE; + error = 0; +#ifdef SUPPORT_PCRE2_8 + re8 = NULL; + ccontext8 = pcre2_compile_context_create_8(NULL); + if (ccontext8) { + if (GET_NEWLINE(current->newline)) + pcre2_set_newline_8(ccontext8, GET_NEWLINE(current->newline)); + if (GET_BSR(current->newline)) + pcre2_set_bsr_8(ccontext8, GET_BSR(current->newline)); + + if (!(current->start_offset & F_NO8)) { + re8 = pcre2_compile_8((PCRE2_SPTR8)current->pattern, PCRE2_ZERO_TERMINATED, + current->compile_options & ~disabled_options, + &error, &err_offs, ccontext8); + + if (!re8 && (utf || is_ascii)) + printf("\n8 bit: Cannot compile pattern \"%s\": %d\n", current->pattern, error); + } + pcre2_compile_context_free_8(ccontext8); + } + else + printf("\n8 bit: Cannot allocate compile context\n"); +#endif +#ifdef SUPPORT_PCRE2_16 + if ((current->compile_options & PCRE2_UTF) || (current->start_offset & F_FORCECONV)) + convert_utf8_to_utf16((PCRE2_SPTR8)current->pattern, regtest_buf16, NULL, REGTEST_MAX_LENGTH16); + else + copy_char8_to_char16((PCRE2_SPTR8)current->pattern, regtest_buf16, REGTEST_MAX_LENGTH16); + + re16 = NULL; + ccontext16 = pcre2_compile_context_create_16(NULL); + if (ccontext16) { + if (GET_NEWLINE(current->newline)) + pcre2_set_newline_16(ccontext16, GET_NEWLINE(current->newline)); + if (GET_BSR(current->newline)) + pcre2_set_bsr_16(ccontext16, GET_BSR(current->newline)); + + if (!(current->start_offset & F_NO16)) { + re16 = pcre2_compile_16(regtest_buf16, PCRE2_ZERO_TERMINATED, + 
current->compile_options & ~disabled_options, + &error, &err_offs, ccontext16); + + if (!re16 && (utf || is_ascii)) + printf("\n16 bit: Cannot compile pattern \"%s\": %d\n", current->pattern, error); + } + pcre2_compile_context_free_16(ccontext16); + } + else + printf("\n16 bit: Cannot allocate compile context\n"); +#endif +#ifdef SUPPORT_PCRE2_32 + if ((current->compile_options & PCRE2_UTF) || (current->start_offset & F_FORCECONV)) + convert_utf8_to_utf32((PCRE2_SPTR8)current->pattern, regtest_buf32, NULL, REGTEST_MAX_LENGTH32); + else + copy_char8_to_char32((PCRE2_SPTR8)current->pattern, regtest_buf32, REGTEST_MAX_LENGTH32); + + re32 = NULL; + ccontext32 = pcre2_compile_context_create_32(NULL); + if (ccontext32) { + if (GET_NEWLINE(current->newline)) + pcre2_set_newline_32(ccontext32, GET_NEWLINE(current->newline)); + if (GET_BSR(current->newline)) + pcre2_set_bsr_32(ccontext32, GET_BSR(current->newline)); + + if (!(current->start_offset & F_NO32)) { + re32 = pcre2_compile_32(regtest_buf32, PCRE2_ZERO_TERMINATED, + current->compile_options & ~disabled_options, + &error, &err_offs, ccontext32); + + if (!re32 && (utf || is_ascii)) + printf("\n32 bit: Cannot compile pattern \"%s\": %d\n", current->pattern, error); + } + pcre2_compile_context_free_32(ccontext32); + } + else + printf("\n32 bit: Cannot allocate compile context\n"); +#endif + + counter++; + if ((counter & 0x3) != 0) { +#ifdef SUPPORT_PCRE2_8 + setstack8(NULL); +#endif +#ifdef SUPPORT_PCRE2_16 + setstack16(NULL); +#endif +#ifdef SUPPORT_PCRE2_32 + setstack32(NULL); +#endif + } + +#ifdef SUPPORT_PCRE2_8 + return_value8[0] = -1000; + return_value8[1] = -1000; + mdata8_1 = pcre2_match_data_create_8(OVECTOR_SIZE, NULL); + mdata8_2 = pcre2_match_data_create_8(OVECTOR_SIZE, NULL); + mcontext8 = pcre2_match_context_create_8(NULL); + if (!mdata8_1 || !mdata8_2 || !mcontext8) { + printf("\n8 bit: Cannot allocate match data\n"); + pcre2_match_data_free_8(mdata8_1); + pcre2_match_data_free_8(mdata8_2); + 
pcre2_match_context_free_8(mcontext8); + pcre2_code_free_8(re8); + re8 = NULL; + } else { + ovector8_1 = pcre2_get_ovector_pointer_8(mdata8_1); + ovector8_2 = pcre2_get_ovector_pointer_8(mdata8_2); + for (i = 0; i < OVECTOR_SIZE * 2; ++i) + ovector8_1[i] = -2; + for (i = 0; i < OVECTOR_SIZE * 2; ++i) + ovector8_2[i] = -2; + pcre2_set_match_limit_8(mcontext8, 10000000); + } + if (re8) { + return_value8[1] = pcre2_match_8(re8, (PCRE2_SPTR8)current->input, strlen(current->input), + current->start_offset & OFFSET_MASK, current->match_options, mdata8_2, mcontext8); + + if (pcre2_jit_compile_8(re8, jit_compile_mode)) { + printf("\n8 bit: JIT compiler does not support \"%s\"\n", current->pattern); + } else if ((counter & 0x1) != 0) { + setstack8(mcontext8); + return_value8[0] = pcre2_match_8(re8, (PCRE2_SPTR8)current->input, strlen(current->input), + current->start_offset & OFFSET_MASK, current->match_options, mdata8_1, mcontext8); + } else { + pcre2_jit_stack_assign_8(mcontext8, NULL, getstack8()); + return_value8[0] = pcre2_jit_match_8(re8, (PCRE2_SPTR8)current->input, strlen(current->input), + current->start_offset & OFFSET_MASK, current->match_options, mdata8_1, mcontext8); + } + } +#endif + +#ifdef SUPPORT_PCRE2_16 + return_value16[0] = -1000; + return_value16[1] = -1000; + mdata16_1 = pcre2_match_data_create_16(OVECTOR_SIZE, NULL); + mdata16_2 = pcre2_match_data_create_16(OVECTOR_SIZE, NULL); + mcontext16 = pcre2_match_context_create_16(NULL); + if (!mdata16_1 || !mdata16_2 || !mcontext16) { + printf("\n16 bit: Cannot allocate match data\n"); + pcre2_match_data_free_16(mdata16_1); + pcre2_match_data_free_16(mdata16_2); + pcre2_match_context_free_16(mcontext16); + pcre2_code_free_16(re16); + re16 = NULL; + } else { + ovector16_1 = pcre2_get_ovector_pointer_16(mdata16_1); + ovector16_2 = pcre2_get_ovector_pointer_16(mdata16_2); + for (i = 0; i < OVECTOR_SIZE * 2; ++i) + ovector16_1[i] = -2; + for (i = 0; i < OVECTOR_SIZE * 2; ++i) + ovector16_2[i] = -2; + 
pcre2_set_match_limit_16(mcontext16, 10000000); + } + if (re16) { + if ((current->compile_options & PCRE2_UTF) || (current->start_offset & F_FORCECONV)) + length16 = convert_utf8_to_utf16((PCRE2_SPTR8)current->input, regtest_buf16, regtest_offsetmap16, REGTEST_MAX_LENGTH16); + else + length16 = copy_char8_to_char16((PCRE2_SPTR8)current->input, regtest_buf16, REGTEST_MAX_LENGTH16); + + return_value16[1] = pcre2_match_16(re16, regtest_buf16, length16, + current->start_offset & OFFSET_MASK, current->match_options, mdata16_2, mcontext16); + + if (pcre2_jit_compile_16(re16, jit_compile_mode)) { + printf("\n16 bit: JIT compiler does not support \"%s\"\n", current->pattern); + } else if ((counter & 0x1) != 0) { + setstack16(mcontext16); + return_value16[0] = pcre2_match_16(re16, regtest_buf16, length16, + current->start_offset & OFFSET_MASK, current->match_options, mdata16_1, mcontext16); + } else { + pcre2_jit_stack_assign_16(mcontext16, NULL, getstack16()); + return_value16[0] = pcre2_jit_match_16(re16, regtest_buf16, length16, + current->start_offset & OFFSET_MASK, current->match_options, mdata16_1, mcontext16); + } + } +#endif + +#ifdef SUPPORT_PCRE2_32 + return_value32[0] = -1000; + return_value32[1] = -1000; + mdata32_1 = pcre2_match_data_create_32(OVECTOR_SIZE, NULL); + mdata32_2 = pcre2_match_data_create_32(OVECTOR_SIZE, NULL); + mcontext32 = pcre2_match_context_create_32(NULL); + if (!mdata32_1 || !mdata32_2 || !mcontext32) { + printf("\n32 bit: Cannot allocate match data\n"); + pcre2_match_data_free_32(mdata32_1); + pcre2_match_data_free_32(mdata32_2); + pcre2_match_context_free_32(mcontext32); + pcre2_code_free_32(re32); + re32 = NULL; + } else { + ovector32_1 = pcre2_get_ovector_pointer_32(mdata32_1); + ovector32_2 = pcre2_get_ovector_pointer_32(mdata32_2); + for (i = 0; i < OVECTOR_SIZE * 2; ++i) + ovector32_1[i] = -2; + for (i = 0; i < OVECTOR_SIZE * 2; ++i) + ovector32_2[i] = -2; + pcre2_set_match_limit_32(mcontext32, 10000000); + } + if (re32) { + if 
((current->compile_options & PCRE2_UTF) || (current->start_offset & F_FORCECONV)) + length32 = convert_utf8_to_utf32((PCRE2_SPTR8)current->input, regtest_buf32, regtest_offsetmap32, REGTEST_MAX_LENGTH32); + else + length32 = copy_char8_to_char32((PCRE2_SPTR8)current->input, regtest_buf32, REGTEST_MAX_LENGTH32); + + return_value32[1] = pcre2_match_32(re32, regtest_buf32, length32, + current->start_offset & OFFSET_MASK, current->match_options, mdata32_2, mcontext32); + + if (pcre2_jit_compile_32(re32, jit_compile_mode)) { + printf("\n32 bit: JIT compiler does not support \"%s\"\n", current->pattern); + } else if ((counter & 0x1) != 0) { + setstack32(mcontext32); + return_value32[0] = pcre2_match_32(re32, regtest_buf32, length32, + current->start_offset & OFFSET_MASK, current->match_options, mdata32_1, mcontext32); + } else { + pcre2_jit_stack_assign_32(mcontext32, NULL, getstack32()); + return_value32[0] = pcre2_jit_match_32(re32, regtest_buf32, length32, + current->start_offset & OFFSET_MASK, current->match_options, mdata32_1, mcontext32); + } + } +#endif + + /* printf("[%d-%d-%d|%d-%d|%d-%d|%d-%d]%s", + return_value8[0], return_value16[0], return_value32[0], + (int)ovector8_1[0], (int)ovector8_1[1], + (int)ovector16_1[0], (int)ovector16_1[1], + (int)ovector32_1[0], (int)ovector32_1[1], + (current->compile_options & PCRE2_CASELESS) ? "C" : ""); */ + + /* If F_DIFF is set, just run the test, but do not compare the results. + Segfaults can still be captured. */ + + is_successful = 1; + if (!(current->start_offset & F_DIFF)) { +#if defined SUPPORT_UNICODE && ((defined(SUPPORT_PCRE2_8) + defined(SUPPORT_PCRE2_16) + defined(SUPPORT_PCRE2_32)) >= 2) + if (!(current->start_offset & F_FORCECONV)) { + + /* All results must be the same. 
*/ +#ifdef SUPPORT_PCRE2_8 + if ((return_value = return_value8[0]) != return_value8[1]) { + printf("\n8 bit: Return value differs(J8:%d,I8:%d): [%d] '%s' @ '%s'\n", + return_value8[0], return_value8[1], total, current->pattern, current->input); + is_successful = 0; + } else +#endif +#ifdef SUPPORT_PCRE2_16 + if ((return_value = return_value16[0]) != return_value16[1]) { + printf("\n16 bit: Return value differs(J16:%d,I16:%d): [%d] '%s' @ '%s'\n", + return_value16[0], return_value16[1], total, current->pattern, current->input); + is_successful = 0; + } else +#endif +#ifdef SUPPORT_PCRE2_32 + if ((return_value = return_value32[0]) != return_value32[1]) { + printf("\n32 bit: Return value differs(J32:%d,I32:%d): [%d] '%s' @ '%s'\n", + return_value32[0], return_value32[1], total, current->pattern, current->input); + is_successful = 0; + } else +#endif +#if defined SUPPORT_PCRE2_8 && defined SUPPORT_PCRE2_16 + if (return_value8[0] != return_value16[0]) { + printf("\n8 and 16 bit: Return value differs(J8:%d,J16:%d): [%d] '%s' @ '%s'\n", + return_value8[0], return_value16[0], + total, current->pattern, current->input); + is_successful = 0; + } else +#endif +#if defined SUPPORT_PCRE2_8 && defined SUPPORT_PCRE2_32 + if (return_value8[0] != return_value32[0]) { + printf("\n8 and 32 bit: Return value differs(J8:%d,J32:%d): [%d] '%s' @ '%s'\n", + return_value8[0], return_value32[0], + total, current->pattern, current->input); + is_successful = 0; + } else +#endif +#if defined SUPPORT_PCRE2_16 && defined SUPPORT_PCRE2_32 + if (return_value16[0] != return_value32[0]) { + printf("\n16 and 32 bit: Return value differs(J16:%d,J32:%d): [%d] '%s' @ '%s'\n", + return_value16[0], return_value32[0], + total, current->pattern, current->input); + is_successful = 0; + } else +#endif + if (return_value >= 0 || return_value == PCRE2_ERROR_PARTIAL) { + if (return_value == PCRE2_ERROR_PARTIAL) { + return_value = 2; + } else { + return_value *= 2; + } +#ifdef SUPPORT_PCRE2_8 + return_value8[0] = 
return_value; +#endif +#ifdef SUPPORT_PCRE2_16 + return_value16[0] = return_value; +#endif +#ifdef SUPPORT_PCRE2_32 + return_value32[0] = return_value; +#endif + /* Transform back the results. */ + if (current->compile_options & PCRE2_UTF) { +#ifdef SUPPORT_PCRE2_16 + for (i = 0; i < return_value; ++i) { + if (ovector16_1[i] != PCRE2_UNSET) + ovector16_1[i] = regtest_offsetmap16[ovector16_1[i]]; + if (ovector16_2[i] != PCRE2_UNSET) + ovector16_2[i] = regtest_offsetmap16[ovector16_2[i]]; + } +#endif +#ifdef SUPPORT_PCRE2_32 + for (i = 0; i < return_value; ++i) { + if (ovector32_1[i] != PCRE2_UNSET) + ovector32_1[i] = regtest_offsetmap32[ovector32_1[i]]; + if (ovector32_2[i] != PCRE2_UNSET) + ovector32_2[i] = regtest_offsetmap32[ovector32_2[i]]; + } +#endif + } + + for (i = 0; i < return_value; ++i) { +#if defined SUPPORT_PCRE2_8 && defined SUPPORT_PCRE2_16 + if (ovector8_1[i] != ovector8_2[i] || ovector8_1[i] != ovector16_1[i] || ovector8_1[i] != ovector16_2[i]) { + printf("\n8 and 16 bit: Ovector[%d] value differs(J8:%d,I8:%d,J16:%d,I16:%d): [%d] '%s' @ '%s' \n", + i, (int)ovector8_1[i], (int)ovector8_2[i], (int)ovector16_1[i], (int)ovector16_2[i], + total, current->pattern, current->input); + is_successful = 0; + } +#endif +#if defined SUPPORT_PCRE2_8 && defined SUPPORT_PCRE2_32 + if (ovector8_1[i] != ovector8_2[i] || ovector8_1[i] != ovector32_1[i] || ovector8_1[i] != ovector32_2[i]) { + printf("\n8 and 32 bit: Ovector[%d] value differs(J8:%d,I8:%d,J32:%d,I32:%d): [%d] '%s' @ '%s' \n", + i, (int)ovector8_1[i], (int)ovector8_2[i], (int)ovector32_1[i], (int)ovector32_2[i], + total, current->pattern, current->input); + is_successful = 0; + } +#endif +#if defined SUPPORT_PCRE2_16 && defined SUPPORT_PCRE2_32 + if (ovector16_1[i] != ovector16_2[i] || ovector16_1[i] != ovector32_1[i] || ovector16_1[i] != ovector32_2[i]) { + printf("\n16 and 32 bit: Ovector[%d] value differs(J16:%d,I16:%d,J32:%d,I32:%d): [%d] '%s' @ '%s' \n", + i, (int)ovector16_1[i], 
(int)ovector16_2[i], (int)ovector32_1[i], (int)ovector32_2[i], + total, current->pattern, current->input); + is_successful = 0; + } +#endif + } + } + } else +#endif /* more than one of SUPPORT_PCRE2_8, SUPPORT_PCRE2_16 and SUPPORT_PCRE2_32 */ + { +#ifdef SUPPORT_PCRE2_8 + if (return_value8[0] != return_value8[1]) { + printf("\n8 bit: Return value differs(%d:%d): [%d] '%s' @ '%s'\n", + return_value8[0], return_value8[1], total, current->pattern, current->input); + is_successful = 0; + } else if (return_value8[0] >= 0 || return_value8[0] == PCRE2_ERROR_PARTIAL) { + if (return_value8[0] == PCRE2_ERROR_PARTIAL) + return_value8[0] = 2; + else + return_value8[0] *= 2; + + for (i = 0; i < return_value8[0]; ++i) + if (ovector8_1[i] != ovector8_2[i]) { + printf("\n8 bit: Ovector[%d] value differs(%d:%d): [%d] '%s' @ '%s'\n", + i, (int)ovector8_1[i], (int)ovector8_2[i], total, current->pattern, current->input); + is_successful = 0; + } + } +#endif + +#ifdef SUPPORT_PCRE2_16 + if (return_value16[0] != return_value16[1]) { + printf("\n16 bit: Return value differs(%d:%d): [%d] '%s' @ '%s'\n", + return_value16[0], return_value16[1], total, current->pattern, current->input); + is_successful = 0; + } else if (return_value16[0] >= 0 || return_value16[0] == PCRE2_ERROR_PARTIAL) { + if (return_value16[0] == PCRE2_ERROR_PARTIAL) + return_value16[0] = 2; + else + return_value16[0] *= 2; + + for (i = 0; i < return_value16[0]; ++i) + if (ovector16_1[i] != ovector16_2[i]) { + printf("\n16 bit: Ovector[%d] value differs(%d:%d): [%d] '%s' @ '%s'\n", + i, (int)ovector16_1[i], (int)ovector16_2[i], total, current->pattern, current->input); + is_successful = 0; + } + } +#endif + +#ifdef SUPPORT_PCRE2_32 + if (return_value32[0] != return_value32[1]) { + printf("\n32 bit: Return value differs(%d:%d): [%d] '%s' @ '%s'\n", + return_value32[0], return_value32[1], total, current->pattern, current->input); + is_successful = 0; + } else if (return_value32[0] >= 0 || return_value32[0] == 
PCRE2_ERROR_PARTIAL) { + if (return_value32[0] == PCRE2_ERROR_PARTIAL) + return_value32[0] = 2; + else + return_value32[0] *= 2; + + for (i = 0; i < return_value32[0]; ++i) + if (ovector32_1[i] != ovector32_2[i]) { + printf("\n32 bit: Ovector[%d] value differs(%d:%d): [%d] '%s' @ '%s'\n", + i, (int)ovector32_1[i], (int)ovector32_2[i], total, current->pattern, current->input); + is_successful = 0; + } + } +#endif + } + } + + if (is_successful) { +#ifdef SUPPORT_PCRE2_8 + if (!(current->start_offset & F_NO8) && (utf || is_ascii)) { + if (return_value8[0] < 0 && !(current->start_offset & F_NOMATCH)) { + printf("8 bit: Test should match: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } + + if (return_value8[0] >= 0 && (current->start_offset & F_NOMATCH)) { + printf("8 bit: Test should not match: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } + } +#endif +#ifdef SUPPORT_PCRE2_16 + if (!(current->start_offset & F_NO16) && (utf || is_ascii)) { + if (return_value16[0] < 0 && !(current->start_offset & F_NOMATCH)) { + printf("16 bit: Test should match: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } + + if (return_value16[0] >= 0 && (current->start_offset & F_NOMATCH)) { + printf("16 bit: Test should not match: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } + } +#endif +#ifdef SUPPORT_PCRE2_32 + if (!(current->start_offset & F_NO32) && (utf || is_ascii)) { + if (return_value32[0] < 0 && !(current->start_offset & F_NOMATCH)) { + printf("32 bit: Test should match: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } + + if (return_value32[0] >= 0 && (current->start_offset & F_NOMATCH)) { + printf("32 bit: Test should not match: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } + } +#endif + } + + if (is_successful) { +#ifdef 
SUPPORT_PCRE2_8 + if (re8 && !(current->start_offset & F_NO8) && pcre2_get_mark_8(mdata8_1) != pcre2_get_mark_8(mdata8_2)) { + printf("8 bit: Mark value mismatch: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } +#endif +#ifdef SUPPORT_PCRE2_16 + if (re16 && !(current->start_offset & F_NO16) && pcre2_get_mark_16(mdata16_1) != pcre2_get_mark_16(mdata16_2)) { + printf("16 bit: Mark value mismatch: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } +#endif +#ifdef SUPPORT_PCRE2_32 + if (re32 && !(current->start_offset & F_NO32) && pcre2_get_mark_32(mdata32_1) != pcre2_get_mark_32(mdata32_2)) { + printf("32 bit: Mark value mismatch: [%d] '%s' @ '%s'\n", + total, current->pattern, current->input); + is_successful = 0; + } +#endif + } + +#ifdef SUPPORT_PCRE2_8 + pcre2_code_free_8(re8); + pcre2_match_data_free_8(mdata8_1); + pcre2_match_data_free_8(mdata8_2); + pcre2_match_context_free_8(mcontext8); +#endif +#ifdef SUPPORT_PCRE2_16 + pcre2_code_free_16(re16); + pcre2_match_data_free_16(mdata16_1); + pcre2_match_data_free_16(mdata16_2); + pcre2_match_context_free_16(mcontext16); +#endif +#ifdef SUPPORT_PCRE2_32 + pcre2_code_free_32(re32); + pcre2_match_data_free_32(mdata32_1); + pcre2_match_data_free_32(mdata32_2); + pcre2_match_context_free_32(mcontext32); +#endif + + if (is_successful) { + successful++; + successful_row++; + printf("."); + if (successful_row >= 60) { + successful_row = 0; + printf("\n"); + } + } else + successful_row = 0; + + fflush(stdout); + current++; + } +#ifdef SUPPORT_PCRE2_8 + setstack8(NULL); +#endif +#ifdef SUPPORT_PCRE2_16 + setstack16(NULL); +#endif +#ifdef SUPPORT_PCRE2_32 + setstack32(NULL); +#endif + + if (total == successful) { + printf("\nAll JIT regression tests are successfully passed.\n"); + return 0; + } else { + printf("\nSuccessful test ratio: %d%% (%d failed)\n", successful * 100 / total, total - successful); + return 1; + } +} + +#if defined 
SUPPORT_UNICODE + +static int check_invalid_utf_result(int pattern_index, const char *type, int result, + int match_start, int match_end, PCRE2_SIZE *ovector) +{ + if (match_start < 0) { + if (result != -1) { + printf("Pattern[%d] %s result is not -1.\n", pattern_index, type); + return 1; + } + return 0; + } + + if (result <= 0) { + printf("Pattern[%d] %s result (%d) is not greater than 0.\n", pattern_index, type, result); + return 1; + } + + if (ovector[0] != (PCRE2_SIZE)match_start) { + printf("Pattern[%d] %s ovector[0] is unexpected (%d instead of %d)\n", + pattern_index, type, (int)ovector[0], match_start); + return 1; + } + + if (ovector[1] != (PCRE2_SIZE)match_end) { + printf("Pattern[%d] %s ovector[1] is unexpected (%d instead of %d)\n", + pattern_index, type, (int)ovector[1], match_end); + return 1; + } + + return 0; +} + +#endif /* SUPPORT_UNICODE */ + +#if defined SUPPORT_UNICODE && defined SUPPORT_PCRE2_8 + +#define UDA (PCRE2_UTF | PCRE2_DOTALL | PCRE2_ANCHORED) +#define CI (PCRE2_JIT_COMPLETE | PCRE2_JIT_INVALID_UTF) +#define CPI (PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_SOFT | PCRE2_JIT_INVALID_UTF) + +struct invalid_utf8_regression_test_case { + int compile_options; + int jit_compile_options; + int start_offset; + int skip_left; + int skip_right; + int match_start; + int match_end; + const char *pattern[2]; + const char *input; +}; + +static const char invalid_utf8_newline_cr; + +static const struct invalid_utf8_regression_test_case invalid_utf8_regression_test_cases[] = { + { UDA, CI, 0, 0, 0, 0, 4, { ".", NULL }, "\xf4\x8f\xbf\xbf" }, + { UDA, CI, 0, 0, 0, 0, 4, { ".", NULL }, "\xf0\x90\x80\x80" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xf4\x90\x80\x80" }, + { UDA, CI, 0, 0, 1, -1, -1, { ".", NULL }, "\xf4\x8f\xbf\xbf" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xf0\x90\x80\x7f" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xf0\x90\x80\xc0" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xf0\x8f\xbf\xbf" }, + { UDA, CI, 0, 0, 0, 
0, 3, { ".", NULL }, "\xef\xbf\xbf#" }, + { UDA, CI, 0, 0, 0, 0, 3, { ".", NULL }, "\xef\xbf\xbf" }, + { UDA, CI, 0, 0, 0, 0, 3, { ".", NULL }, "\xe0\xa0\x80#" }, + { UDA, CI, 0, 0, 0, 0, 3, { ".", NULL }, "\xe0\xa0\x80" }, + { UDA, CI, 0, 0, 2, -1, -1, { ".", NULL }, "\xef\xbf\xbf#" }, + { UDA, CI, 0, 0, 1, -1, -1, { ".", NULL }, "\xef\xbf\xbf" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xef\xbf\x7f#" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xef\xbf\xc0" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xe0\x9f\xbf#" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xe0\x9f\xbf" }, + { UDA, CI, 0, 0, 0, 0, 3, { ".", NULL }, "\xed\x9f\xbf#" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xed\xa0\x80#" }, + { UDA, CI, 0, 0, 0, 0, 3, { ".", NULL }, "\xee\x80\x80#" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xed\xbf\xbf#" }, + { UDA, CI, 0, 0, 0, 0, 2, { ".", NULL }, "\xdf\xbf##" }, + { UDA, CI, 0, 0, 0, 0, 2, { ".", NULL }, "\xdf\xbf#" }, + { UDA, CI, 0, 0, 0, 0, 2, { ".", NULL }, "\xdf\xbf" }, + { UDA, CI, 0, 0, 0, 0, 2, { ".", NULL }, "\xc2\x80##" }, + { UDA, CI, 0, 0, 0, 0, 2, { ".", NULL }, "\xc2\x80#" }, + { UDA, CI, 0, 0, 0, 0, 2, { ".", NULL }, "\xc2\x80" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xe0\x80##" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xdf\xc0##" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xe0\x80" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xdf\xc0" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xc1\xbf##" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xc1\xbf" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\x80###" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\x80" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xf8###" }, + { UDA, CI, 0, 0, 0, -1, -1, { ".", NULL }, "\xf8" }, + { UDA, CI, 0, 0, 0, 0, 1, { ".", NULL }, "\x7f" }, + + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "\xf4\x8f\xbf\xbf#" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\xf4\xa0\x80\x80\xf4\xa0\x80\x80" 
}, + { UDA, CPI, 4, 1, 1, -1, -1, { "\\B", "\\b" }, "\xf4\x8f\xbf\xbf\xf4\x8f\xbf\xbf" }, + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "#\xef\xbf\xbf#" }, + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "#\xe0\xa0\x80#" }, + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "\xf0\x90\x80\x80#" }, + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "\xf3\xbf\xbf\xbf#" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\xf0\x8f\xbf\xbf\xf0\x8f\xbf\xbf" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\xf5\x80\x80\x80\xf5\x80\x80\x80" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\xf4\x90\x80\x80\xf4\x90\x80\x80" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\xf4\x8f\xbf\xff\xf4\x8f\xbf\xff" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\xf4\x8f\xff\xbf\xf4\x8f\xff\xbf" }, + { UDA, CPI, 4, 0, 1, -1, -1, { "\\B", "\\b" }, "\xef\x80\x80\x80\xef\x80\x80" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "\x80\x80\x80\x80\x80\x80\x80\x80" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "#\xe0\x9f\xbf\xe0\x9f\xbf#" }, + { UDA, CPI, 4, 2, 2, -1, -1, { "\\B", "\\b" }, "#\xe0\xa0\x80\xe0\xa0\x80#" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "#\xf0\x80\x80\xf0\x80\x80#" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "#\xed\xa0\x80\xed\xa0\x80#" }, + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "##\xdf\xbf#" }, + { UDA, CPI, 4, 2, 0, 2, 2, { "\\B", NULL }, "##\xdf\xbf#" }, + { UDA, CPI, 4, 0, 0, 4, 4, { "\\B", NULL }, "##\xc2\x80#" }, + { UDA, CPI, 4, 2, 0, 2, 2, { "\\B", NULL }, "##\xc2\x80#" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "##\xc1\xbf\xc1\xbf##" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "##\xdf\xc0\xdf\xc0##" }, + { UDA, CPI, 4, 0, 0, -1, -1, { "\\B", "\\b" }, "##\xe0\x80\xe0\x80##" }, + + { UDA, CPI, 3, 0, 0, 3, 3, { "\\B", NULL }, "\xef\xbf\xbf#" }, + { UDA, CPI, 3, 0, 0, 3, 3, { "\\B", NULL }, "\xe0\xa0\x80#" }, + { UDA, CPI, 3, 0, 0, -1, -1, { "\\B", "\\b" }, "\xe0\x9f\xbf\xe0\x9f\xbf" }, + { UDA, CPI, 3, 1, 1, 
-1, -1, { "\\B", "\\b" }, "\xef\xbf\xbf\xef\xbf\xbf" }, + { UDA, CPI, 3, 0, 1, -1, -1, { "\\B", "\\b" }, "\xdf\x80\x80\xdf\x80" }, + { UDA, CPI, 3, 0, 0, -1, -1, { "\\B", "\\b" }, "\xef\xbf\xff\xef\xbf\xff" }, + { UDA, CPI, 3, 0, 0, -1, -1, { "\\B", "\\b" }, "\xef\xff\xbf\xef\xff\xbf" }, + { UDA, CPI, 3, 0, 0, -1, -1, { "\\B", "\\b" }, "\xed\xbf\xbf\xed\xbf\xbf" }, + + { UDA, CPI, 2, 0, 0, 2, 2, { "\\B", NULL }, "\xdf\xbf#" }, + { UDA, CPI, 2, 0, 0, 2, 2, { "\\B", NULL }, "\xc2\x80#" }, + { UDA, CPI, 2, 1, 1, -1, -1, { "\\B", "\\b" }, "\xdf\xbf\xdf\xbf" }, + { UDA, CPI, 2, 0, 0, -1, -1, { "\\B", "\\b" }, "\xc1\xbf\xc1\xbf" }, + { UDA, CPI, 2, 0, 0, -1, -1, { "\\B", "\\b" }, "\xe0\x80\xe0\x80" }, + { UDA, CPI, 2, 0, 0, -1, -1, { "\\B", "\\b" }, "\xdf\xff\xdf\xff" }, + { UDA, CPI, 2, 0, 0, -1, -1, { "\\B", "\\b" }, "\xff\xbf\xff\xbf" }, + + { UDA, CPI, 1, 0, 0, 1, 1, { "\\B", NULL }, "\x7f#" }, + { UDA, CPI, 1, 0, 0, 1, 1, { "\\B", NULL }, "\x01#" }, + { UDA, CPI, 1, 0, 0, -1, -1, { "\\B", "\\b" }, "\x80\x80" }, + { UDA, CPI, 1, 0, 0, -1, -1, { "\\B", "\\b" }, "\xb0\xb0" }, + + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 2, { "(.)\\1", NULL }, "aA" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, -1, -1, { "(.)\\1", NULL }, "a\xff" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 4, { "(.)\\1", NULL }, "\xc3\xa1\xc3\x81" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 1, -1, -1, { "(.)\\1", NULL }, "\xc3\xa1\xc3\x81" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, -1, -1, { "(.)\\1", NULL }, "\xc2\x80\x80" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 6, { "(.)\\1", NULL }, "\xe1\xbd\xb8\xe1\xbf\xb8" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 1, -1, -1, { "(.)\\1", NULL }, "\xe1\xbd\xb8\xe1\xbf\xb8" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 8, { "(.)\\1", NULL }, "\xf0\x90\x90\x80\xf0\x90\x90\xa8" }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 1, -1, -1, { "(.)\\1", NULL }, "\xf0\x90\x90\x80\xf0\x90\x90\xa8" }, + + { UDA, CPI, 0, 0, 0, 0, 1, { "\\X", NULL }, "A" }, + { UDA, CPI, 0, 0, 0, -1, -1, { 
"\\X", NULL }, "\xff" }, + { UDA, CPI, 0, 0, 0, 0, 2, { "\\X", NULL }, "\xc3\xa1" }, + { UDA, CPI, 0, 0, 1, -1, -1, { "\\X", NULL }, "\xc3\xa1" }, + { UDA, CPI, 0, 0, 0, -1, -1, { "\\X", NULL }, "\xc3\x7f" }, + { UDA, CPI, 0, 0, 0, 0, 3, { "\\X", NULL }, "\xe1\xbd\xb8" }, + { UDA, CPI, 0, 0, 1, -1, -1, { "\\X", NULL }, "\xe1\xbd\xb8" }, + { UDA, CPI, 0, 0, 0, 0, 4, { "\\X", NULL }, "\xf0\x90\x90\x80" }, + { UDA, CPI, 0, 0, 1, -1, -1, { "\\X", NULL }, "\xf0\x90\x90\x80" }, + + { UDA, CPI, 0, 0, 0, -1, -1, { "[^#]", NULL }, "#" }, + { UDA, CPI, 0, 0, 0, 0, 4, { "[^#]", NULL }, "\xf4\x8f\xbf\xbf" }, + { UDA, CPI, 0, 0, 0, -1, -1, { "[^#]", NULL }, "\xf4\x90\x80\x80" }, + { UDA, CPI, 0, 0, 0, -1, -1, { "[^#]", NULL }, "\xc1\x80" }, + + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 2, 3, { "^\\W", NULL }, " \x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 14, 15, { "^\\W", NULL }, " \xc0\x8a#\xe0\x80\x8a#\xf0\x80\x80\x8a#\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 3, 4, { "^\\W", NULL }, " \xf8\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 3, 4, { "^\\W", NULL }, " \xc3\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 3, 4, { "^\\W", NULL }, " \xf1\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 4, 5, { "^\\W", NULL }, " \xf2\xbf\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 5, 6, { "^\\W", NULL }, " \xf2\xbf\xbf\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 3, 4, { "^\\W", NULL }, " \xef\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 4, 5, { "^\\W", NULL }, " \xef\xbf\x0a#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 5, 6, { "^\\W", NULL }, " \x85#\xc2\x85#"}, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 7, 8, { "^\\W", NULL }, " \xe2\x80\xf8\xe2\x80\xa8#"}, + + { PCRE2_UTF | PCRE2_FIRSTLINE, CI, 0, 0, 0, -1, -1, { "#", NULL }, "\xe2\x80\xf8\xe2\x80\xa8#"}, + { PCRE2_UTF | PCRE2_FIRSTLINE, CI, 0, 0, 0, 3, 4, { "#", NULL }, "\xe2\x80\xf8#\xe2\x80\xa8#"}, + { PCRE2_UTF | PCRE2_FIRSTLINE, CI, 0, 0, 0, -1, -1, { 
"#", NULL }, "abcd\xc2\x85#"}, + { PCRE2_UTF | PCRE2_FIRSTLINE, CI, 0, 0, 0, 1, 2, { "#", NULL }, "\x85#\xc2\x85#"}, + { PCRE2_UTF | PCRE2_FIRSTLINE, CI, 0, 0, 0, 5, 6, { "#", NULL }, "\xef,\x80,\xf8#\x0a"}, + { PCRE2_UTF | PCRE2_FIRSTLINE, CI, 0, 0, 0, -1, -1, { "#", NULL }, "\xef,\x80,\xf8\x0a#"}, + + { PCRE2_UTF | PCRE2_NO_START_OPTIMIZE, CI, 0, 0, 0, 4, 8, { "#\xc7\x85#", NULL }, "\x80\x80#\xc7#\xc7\x85#" }, + { PCRE2_UTF | PCRE2_NO_START_OPTIMIZE, CI, 0, 0, 0, 7, 11, { "#\xc7\x85#", NULL }, "\x80\x80#\xc7\x80\x80\x80#\xc7\x85#" }, + { PCRE2_UTF, CI, 0, 0, 0, 4, 8, { "#\xc7\x85#", NULL }, "\x80\x80#\xc7#\xc7\x85#" }, + { PCRE2_UTF, CI, 0, 0, 0, 7, 11, { "#\xc7\x85#", NULL }, "\x80\x80#\xc7\x80\x80\x80#\xc7\x85#" }, + + { PCRE2_UTF | PCRE2_UCP, CI, 0, 0, 0, -1, -1, { "[\\s]", NULL }, "\xed\xa0\x80" }, + + /* These two are not invalid UTF tests, but this infrastructure fits better for them. */ + { 0, PCRE2_JIT_COMPLETE, 0, 0, 1, -1, -1, { "\\X{2}", NULL }, "\r\n\n" }, + { 0, PCRE2_JIT_COMPLETE, 0, 0, 1, -1, -1, { "\\R{2}", NULL }, "\r\n\n" }, + + { PCRE2_UTF | PCRE2_MULTILINE, CI, 0, 0, 0, -1, -1, { "^.a", &invalid_utf8_newline_cr }, "\xc3\xa7#a" }, + + { 0, 0, 0, 0, 0, 0, 0, { NULL, NULL }, NULL } +}; + +#undef UDA +#undef CI +#undef CPI + +static int run_invalid_utf8_test(const struct invalid_utf8_regression_test_case *current, + int pattern_index, int i, pcre2_compile_context_8 *ccontext, pcre2_match_data_8 *mdata) +{ + pcre2_code_8 *code; + int result, errorcode; + PCRE2_SIZE length, erroroffset; + PCRE2_SIZE *ovector = pcre2_get_ovector_pointer_8(mdata); + + if (current->pattern[i] == NULL) + return 1; + + code = pcre2_compile_8((PCRE2_UCHAR8*)current->pattern[i], PCRE2_ZERO_TERMINATED, + current->compile_options, &errorcode, &erroroffset, ccontext); + + if (!code) { + printf("Pattern[%d:0] cannot be compiled. 
Error offset: %d\n", pattern_index, (int)erroroffset); + return 0; + } + + if (pcre2_jit_compile_8(code, current->jit_compile_options) != 0) { + printf("Pattern[%d:0] cannot be compiled by the JIT compiler.\n", pattern_index); + pcre2_code_free_8(code); + return 0; + } + + length = (PCRE2_SIZE)(strlen(current->input) - current->skip_left - current->skip_right); + + if (current->jit_compile_options & PCRE2_JIT_COMPLETE) { + result = pcre2_jit_match_8(code, (PCRE2_UCHAR8*)(current->input + current->skip_left), + length, current->start_offset - current->skip_left, 0, mdata, NULL); + + if (check_invalid_utf_result(pattern_index, "match", result, current->match_start, current->match_end, ovector)) { + pcre2_code_free_8(code); + return 0; + } + } + + if (current->jit_compile_options & PCRE2_JIT_PARTIAL_SOFT) { + result = pcre2_jit_match_8(code, (PCRE2_UCHAR8*)(current->input + current->skip_left), + length, current->start_offset - current->skip_left, PCRE2_PARTIAL_SOFT, mdata, NULL); + + if (check_invalid_utf_result(pattern_index, "partial match", result, current->match_start, current->match_end, ovector)) { + pcre2_code_free_8(code); + return 0; + } + } + + pcre2_code_free_8(code); + return 1; +} + +static int invalid_utf8_regression_tests(void) +{ + const struct invalid_utf8_regression_test_case *current; + pcre2_compile_context_8 *ccontext; + pcre2_match_data_8 *mdata; + int total = 0, successful = 0; + int result; + + printf("\nRunning invalid-utf8 JIT regression tests\n"); + + ccontext = pcre2_compile_context_create_8(NULL); + pcre2_set_newline_8(ccontext, PCRE2_NEWLINE_ANY); + mdata = pcre2_match_data_create_8(4, NULL); + + for (current = invalid_utf8_regression_test_cases; current->pattern[0]; current++) { + /* printf("\nPattern: %s :\n", current->pattern); */ + total++; + + result = 1; + if (current->pattern[1] != &invalid_utf8_newline_cr) + { + if (!run_invalid_utf8_test(current, total - 1, 0, ccontext, mdata)) + result = 0; + if (!run_invalid_utf8_test(current, 
total - 1, 1, ccontext, mdata)) + result = 0; + } else { + pcre2_set_newline_8(ccontext, PCRE2_NEWLINE_CR); + if (!run_invalid_utf8_test(current, total - 1, 0, ccontext, mdata)) + result = 0; + pcre2_set_newline_8(ccontext, PCRE2_NEWLINE_ANY); + } + + if (result) { + successful++; + } + + printf("."); + if ((total % 60) == 0) + printf("\n"); + } + + if ((total % 60) != 0) + printf("\n"); + + pcre2_match_data_free_8(mdata); + pcre2_compile_context_free_8(ccontext); + + if (total == successful) { + printf("\nAll invalid UTF8 JIT regression tests are successfully passed.\n"); + return 0; + } else { + printf("\nInvalid UTF8 successful test ratio: %d%% (%d failed)\n", successful * 100 / total, total - successful); + return 1; + } +} + +#else /* !SUPPORT_UNICODE || !SUPPORT_PCRE2_8 */ + +static int invalid_utf8_regression_tests(void) +{ + return 0; +} + +#endif /* SUPPORT_UNICODE && SUPPORT_PCRE2_8 */ + +#if defined SUPPORT_UNICODE && defined SUPPORT_PCRE2_16 + +#define UDA (PCRE2_UTF | PCRE2_DOTALL | PCRE2_ANCHORED) +#define CI (PCRE2_JIT_COMPLETE | PCRE2_JIT_INVALID_UTF) +#define CPI (PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_SOFT | PCRE2_JIT_INVALID_UTF) + +struct invalid_utf16_regression_test_case { + int compile_options; + int jit_compile_options; + int start_offset; + int skip_left; + int skip_right; + int match_start; + int match_end; + const PCRE2_UCHAR16 *pattern[2]; + const PCRE2_UCHAR16 *input; +}; + +static PCRE2_UCHAR16 allany16[] = { '.', 0 }; +static PCRE2_UCHAR16 non_word_boundary16[] = { '\\', 'B', 0 }; +static PCRE2_UCHAR16 word_boundary16[] = { '\\', 'b', 0 }; +static PCRE2_UCHAR16 backreference16[] = { '(', '.', ')', '\\', '1', 0 }; +static PCRE2_UCHAR16 grapheme16[] = { '\\', 'X', 0 }; +static PCRE2_UCHAR16 nothashmark16[] = { '[', '^', '#', ']', 0 }; +static PCRE2_UCHAR16 afternl16[] = { '^', '\\', 'W', 0 }; +static PCRE2_UCHAR16 generic16[] = { '#', 0xd800, 0xdc00, '#', 0 }; +static PCRE2_UCHAR16 test16_1[] = { 0xd7ff, 0xe000, 0xffff, 0x01, '#', 0 }; 
+static PCRE2_UCHAR16 test16_2[] = { 0xd800, 0xdc00, 0xd800, 0xdc00, 0 }; +static PCRE2_UCHAR16 test16_3[] = { 0xdbff, 0xdfff, 0xdbff, 0xdfff, 0 }; +static PCRE2_UCHAR16 test16_4[] = { 0xd800, 0xdbff, 0xd800, 0xdbff, 0 }; +static PCRE2_UCHAR16 test16_5[] = { '#', 0xd800, 0xdc00, '#', 0 }; +static PCRE2_UCHAR16 test16_6[] = { 'a', 'A', 0xdc28, 0 }; +static PCRE2_UCHAR16 test16_7[] = { 0xd801, 0xdc00, 0xd801, 0xdc28, 0 }; +static PCRE2_UCHAR16 test16_8[] = { '#', 0xd800, 0xdc00, 0 }; +static PCRE2_UCHAR16 test16_9[] = { ' ', 0x2028, '#', 0 }; +static PCRE2_UCHAR16 test16_10[] = { ' ', 0xdc00, 0xd800, 0x2028, '#', 0 }; +static PCRE2_UCHAR16 test16_11[] = { 0xdc00, 0xdc00, 0xd800, 0xdc00, 0xdc00, '#', 0xd800, 0xdc00, '#', 0 }; +static PCRE2_UCHAR16 test16_12[] = { '#', 0xd800, 0xdc00, 0xd800, '#', 0xd800, 0xdc00, 0xdc00, 0xdc00, '#', 0xd800, 0xdc00, '#', 0 }; + +static const struct invalid_utf16_regression_test_case invalid_utf16_regression_test_cases[] = { + { UDA, CI, 0, 0, 0, 0, 1, { allany16, NULL }, test16_1 }, + { UDA, CI, 1, 0, 0, 1, 2, { allany16, NULL }, test16_1 }, + { UDA, CI, 2, 0, 0, 2, 3, { allany16, NULL }, test16_1 }, + { UDA, CI, 3, 0, 0, 3, 4, { allany16, NULL }, test16_1 }, + { UDA, CI, 0, 0, 0, 0, 2, { allany16, NULL }, test16_2 }, + { UDA, CI, 0, 0, 3, -1, -1, { allany16, NULL }, test16_2 }, + { UDA, CI, 1, 0, 0, -1, -1, { allany16, NULL }, test16_2 }, + { UDA, CI, 0, 0, 0, 0, 2, { allany16, NULL }, test16_3 }, + { UDA, CI, 0, 0, 3, -1, -1, { allany16, NULL }, test16_3 }, + { UDA, CI, 1, 0, 0, -1, -1, { allany16, NULL }, test16_3 }, + + { UDA, CPI, 1, 0, 0, 1, 1, { non_word_boundary16, NULL }, test16_1 }, + { UDA, CPI, 2, 0, 0, 2, 2, { non_word_boundary16, NULL }, test16_1 }, + { UDA, CPI, 3, 0, 0, 3, 3, { non_word_boundary16, NULL }, test16_1 }, + { UDA, CPI, 4, 0, 0, 4, 4, { non_word_boundary16, NULL }, test16_1 }, + { UDA, CPI, 2, 0, 0, 2, 2, { non_word_boundary16, NULL }, test16_2 }, + { UDA, CPI, 2, 0, 0, 2, 2, { non_word_boundary16, NULL }, 
test16_3 }, + { UDA, CPI, 2, 1, 1, -1, -1, { non_word_boundary16, word_boundary16 }, test16_2 }, + { UDA, CPI, 2, 1, 1, -1, -1, { non_word_boundary16, word_boundary16 }, test16_3 }, + { UDA, CPI, 2, 0, 0, -1, -1, { non_word_boundary16, word_boundary16 }, test16_4 }, + { UDA, CPI, 2, 0, 0, -1, -1, { non_word_boundary16, word_boundary16 }, test16_5 }, + + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 2, { backreference16, NULL }, test16_6 }, + { UDA | PCRE2_CASELESS, CPI, 1, 0, 0, -1, -1, { backreference16, NULL }, test16_6 }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 4, { backreference16, NULL }, test16_7 }, + { UDA | PCRE2_CASELESS, CPI, 0, 0, 1, -1, -1, { backreference16, NULL }, test16_7 }, + + { UDA, CPI, 0, 0, 0, 0, 1, { grapheme16, NULL }, test16_6 }, + { UDA, CPI, 1, 0, 0, 1, 2, { grapheme16, NULL }, test16_6 }, + { UDA, CPI, 2, 0, 0, -1, -1, { grapheme16, NULL }, test16_6 }, + { UDA, CPI, 0, 0, 0, 0, 2, { grapheme16, NULL }, test16_7 }, + { UDA, CPI, 2, 0, 0, 2, 4, { grapheme16, NULL }, test16_7 }, + { UDA, CPI, 1, 0, 0, -1, -1, { grapheme16, NULL }, test16_7 }, + + { UDA, CPI, 0, 0, 0, -1, -1, { nothashmark16, NULL }, test16_8 }, + { UDA, CPI, 1, 0, 0, 1, 3, { nothashmark16, NULL }, test16_8 }, + { UDA, CPI, 2, 0, 0, -1, -1, { nothashmark16, NULL }, test16_8 }, + + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 2, 3, { afternl16, NULL }, test16_9 }, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 4, 5, { afternl16, NULL }, test16_10 }, + + { PCRE2_UTF | PCRE2_NO_START_OPTIMIZE, CI, 0, 0, 0, 5, 9, { generic16, NULL }, test16_11 }, + { PCRE2_UTF | PCRE2_NO_START_OPTIMIZE, CI, 0, 0, 0, 9, 13, { generic16, NULL }, test16_12 }, + { PCRE2_UTF, CI, 0, 0, 0, 5, 9, { generic16, NULL }, test16_11 }, + { PCRE2_UTF, CI, 0, 0, 0, 9, 13, { generic16, NULL }, test16_12 }, + + { 0, 0, 0, 0, 0, 0, 0, { NULL, NULL }, NULL } +}; + +#undef UDA +#undef CI +#undef CPI + +static int run_invalid_utf16_test(const struct invalid_utf16_regression_test_case *current, + int pattern_index, int 
i, pcre2_compile_context_16 *ccontext, pcre2_match_data_16 *mdata) +{ + pcre2_code_16 *code; + int result, errorcode; + PCRE2_SIZE length, erroroffset; + const PCRE2_UCHAR16 *input; + PCRE2_SIZE *ovector = pcre2_get_ovector_pointer_16(mdata); + + if (current->pattern[i] == NULL) + return 1; + + code = pcre2_compile_16(current->pattern[i], PCRE2_ZERO_TERMINATED, + current->compile_options, &errorcode, &erroroffset, ccontext); + + if (!code) { + printf("Pattern[%d:0] cannot be compiled. Error offset: %d\n", pattern_index, (int)erroroffset); + return 0; + } + + if (pcre2_jit_compile_16(code, current->jit_compile_options) != 0) { + printf("Pattern[%d:0] cannot be compiled by the JIT compiler.\n", pattern_index); + pcre2_code_free_16(code); + return 0; + } + + input = current->input; + length = 0; + + while (*input++ != 0) + length++; + + length -= current->skip_left + current->skip_right; + + if (current->jit_compile_options & PCRE2_JIT_COMPLETE) { + result = pcre2_jit_match_16(code, (current->input + current->skip_left), + length, current->start_offset - current->skip_left, 0, mdata, NULL); + + if (check_invalid_utf_result(pattern_index, "match", result, current->match_start, current->match_end, ovector)) { + pcre2_code_free_16(code); + return 0; + } + } + + if (current->jit_compile_options & PCRE2_JIT_PARTIAL_SOFT) { + result = pcre2_jit_match_16(code, (current->input + current->skip_left), + length, current->start_offset - current->skip_left, PCRE2_PARTIAL_SOFT, mdata, NULL); + + if (check_invalid_utf_result(pattern_index, "partial match", result, current->match_start, current->match_end, ovector)) { + pcre2_code_free_16(code); + return 0; + } + } + + pcre2_code_free_16(code); + return 1; +} + +static int invalid_utf16_regression_tests(void) +{ + const struct invalid_utf16_regression_test_case *current; + pcre2_compile_context_16 *ccontext; + pcre2_match_data_16 *mdata; + int total = 0, successful = 0; + int result; + + printf("\nRunning invalid-utf16 JIT regression 
tests\n"); + + ccontext = pcre2_compile_context_create_16(NULL); + pcre2_set_newline_16(ccontext, PCRE2_NEWLINE_ANY); + mdata = pcre2_match_data_create_16(4, NULL); + + for (current = invalid_utf16_regression_test_cases; current->pattern[0]; current++) { + /* printf("\nPattern: %s :\n", current->pattern); */ + total++; + + result = 1; + if (!run_invalid_utf16_test(current, total - 1, 0, ccontext, mdata)) + result = 0; + if (!run_invalid_utf16_test(current, total - 1, 1, ccontext, mdata)) + result = 0; + + if (result) { + successful++; + } + + printf("."); + if ((total % 60) == 0) + printf("\n"); + } + + if ((total % 60) != 0) + printf("\n"); + + pcre2_match_data_free_16(mdata); + pcre2_compile_context_free_16(ccontext); + + if (total == successful) { + printf("\nAll invalid UTF16 JIT regression tests are successfully passed.\n"); + return 0; + } else { + printf("\nInvalid UTF16 successful test ratio: %d%% (%d failed)\n", successful * 100 / total, total - successful); + return 1; + } +} + +#else /* !SUPPORT_UNICODE || !SUPPORT_PCRE2_16 */ + +static int invalid_utf16_regression_tests(void) +{ + return 0; +} + +#endif /* SUPPORT_UNICODE && SUPPORT_PCRE2_16 */ + +#if defined SUPPORT_UNICODE && defined SUPPORT_PCRE2_32 + +#define UDA (PCRE2_UTF | PCRE2_DOTALL | PCRE2_ANCHORED) +#define CI (PCRE2_JIT_COMPLETE | PCRE2_JIT_INVALID_UTF) +#define CPI (PCRE2_JIT_COMPLETE | PCRE2_JIT_PARTIAL_SOFT | PCRE2_JIT_INVALID_UTF) + +struct invalid_utf32_regression_test_case { + int compile_options; + int jit_compile_options; + int start_offset; + int skip_left; + int skip_right; + int match_start; + int match_end; + const PCRE2_UCHAR32 *pattern[2]; + const PCRE2_UCHAR32 *input; +}; + +static PCRE2_UCHAR32 allany32[] = { '.', 0 }; +static PCRE2_UCHAR32 non_word_boundary32[] = { '\\', 'B', 0 }; +static PCRE2_UCHAR32 word_boundary32[] = { '\\', 'b', 0 }; +static PCRE2_UCHAR32 backreference32[] = { '(', '.', ')', '\\', '1', 0 }; +static PCRE2_UCHAR32 grapheme32[] = { '\\', 'X', 0 }; 
+static PCRE2_UCHAR32 nothashmark32[] = { '[', '^', '#', ']', 0 }; +static PCRE2_UCHAR32 afternl32[] = { '^', '\\', 'W', 0 }; +static PCRE2_UCHAR32 test32_1[] = { 0x10ffff, 0x10ffff, 0x110000, 0x110000, 0x10ffff, 0 }; +static PCRE2_UCHAR32 test32_2[] = { 0xd7ff, 0xe000, 0xd800, 0xdfff, 0xe000, 0xdfff, 0xd800, 0 }; +static PCRE2_UCHAR32 test32_3[] = { 'a', 'A', 0x110000, 0 }; +static PCRE2_UCHAR32 test32_4[] = { '#', 0x10ffff, 0x110000, 0 }; +static PCRE2_UCHAR32 test32_5[] = { ' ', 0x2028, '#', 0 }; +static PCRE2_UCHAR32 test32_6[] = { ' ', 0x110000, 0x2028, '#', 0 }; + +static const struct invalid_utf32_regression_test_case invalid_utf32_regression_test_cases[] = { + { UDA, CI, 0, 0, 0, 0, 1, { allany32, NULL }, test32_1 }, + { UDA, CI, 2, 0, 0, -1, -1, { allany32, NULL }, test32_1 }, + { UDA, CI, 0, 0, 0, 0, 1, { allany32, NULL }, test32_2 }, + { UDA, CI, 1, 0, 0, 1, 2, { allany32, NULL }, test32_2 }, + { UDA, CI, 2, 0, 0, -1, -1, { allany32, NULL }, test32_2 }, + { UDA, CI, 3, 0, 0, -1, -1, { allany32, NULL }, test32_2 }, + + { UDA, CPI, 1, 0, 0, 1, 1, { non_word_boundary32, NULL }, test32_1 }, + { UDA, CPI, 3, 0, 0, -1, -1, { non_word_boundary32, word_boundary32 }, test32_1 }, + { UDA, CPI, 1, 0, 0, 1, 1, { non_word_boundary32, NULL }, test32_2 }, + { UDA, CPI, 3, 0, 0, -1, -1, { non_word_boundary32, word_boundary32 }, test32_2 }, + { UDA, CPI, 6, 0, 0, -1, -1, { non_word_boundary32, word_boundary32 }, test32_2 }, + + { UDA | PCRE2_CASELESS, CPI, 0, 0, 0, 0, 2, { backreference32, NULL }, test32_3 }, + { UDA | PCRE2_CASELESS, CPI, 1, 0, 0, -1, -1, { backreference32, NULL }, test32_3 }, + + { UDA, CPI, 0, 0, 0, 0, 1, { grapheme32, NULL }, test32_1 }, + { UDA, CPI, 2, 0, 0, -1, -1, { grapheme32, NULL }, test32_1 }, + { UDA, CPI, 1, 0, 0, 1, 2, { grapheme32, NULL }, test32_2 }, + { UDA, CPI, 2, 0, 0, -1, -1, { grapheme32, NULL }, test32_2 }, + { UDA, CPI, 3, 0, 0, -1, -1, { grapheme32, NULL }, test32_2 }, + { UDA, CPI, 4, 0, 0, 4, 5, { grapheme32, NULL }, test32_2 
}, + + { UDA, CPI, 0, 0, 0, -1, -1, { nothashmark32, NULL }, test32_4 }, + { UDA, CPI, 1, 0, 0, 1, 2, { nothashmark32, NULL }, test32_4 }, + { UDA, CPI, 2, 0, 0, -1, -1, { nothashmark32, NULL }, test32_4 }, + { UDA, CPI, 1, 0, 0, 1, 2, { nothashmark32, NULL }, test32_2 }, + { UDA, CPI, 2, 0, 0, -1, -1, { nothashmark32, NULL }, test32_2 }, + + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 2, 3, { afternl32, NULL }, test32_5 }, + { PCRE2_UTF | PCRE2_MULTILINE, CI, 1, 0, 0, 3, 4, { afternl32, NULL }, test32_6 }, + + { 0, 0, 0, 0, 0, 0, 0, { NULL, NULL }, NULL } +}; + +#undef UDA +#undef CI +#undef CPI + +static int run_invalid_utf32_test(const struct invalid_utf32_regression_test_case *current, + int pattern_index, int i, pcre2_compile_context_32 *ccontext, pcre2_match_data_32 *mdata) +{ + pcre2_code_32 *code; + int result, errorcode; + PCRE2_SIZE length, erroroffset; + const PCRE2_UCHAR32 *input; + PCRE2_SIZE *ovector = pcre2_get_ovector_pointer_32(mdata); + + if (current->pattern[i] == NULL) + return 1; + + code = pcre2_compile_32(current->pattern[i], PCRE2_ZERO_TERMINATED, + current->compile_options, &errorcode, &erroroffset, ccontext); + + if (!code) { + printf("Pattern[%d:0] cannot be compiled. 
Error offset: %d\n", pattern_index, (int)erroroffset); + return 0; + } + + if (pcre2_jit_compile_32(code, current->jit_compile_options) != 0) { + printf("Pattern[%d:0] cannot be compiled by the JIT compiler.\n", pattern_index); + pcre2_code_free_32(code); + return 0; + } + + input = current->input; + length = 0; + + while (*input++ != 0) + length++; + + length -= current->skip_left + current->skip_right; + + if (current->jit_compile_options & PCRE2_JIT_COMPLETE) { + result = pcre2_jit_match_32(code, (current->input + current->skip_left), + length, current->start_offset - current->skip_left, 0, mdata, NULL); + + if (check_invalid_utf_result(pattern_index, "match", result, current->match_start, current->match_end, ovector)) { + pcre2_code_free_32(code); + return 0; + } + } + + if (current->jit_compile_options & PCRE2_JIT_PARTIAL_SOFT) { + result = pcre2_jit_match_32(code, (current->input + current->skip_left), + length, current->start_offset - current->skip_left, PCRE2_PARTIAL_SOFT, mdata, NULL); + + if (check_invalid_utf_result(pattern_index, "partial match", result, current->match_start, current->match_end, ovector)) { + pcre2_code_free_32(code); + return 0; + } + } + + pcre2_code_free_32(code); + return 1; +} + +static int invalid_utf32_regression_tests(void) +{ + const struct invalid_utf32_regression_test_case *current; + pcre2_compile_context_32 *ccontext; + pcre2_match_data_32 *mdata; + int total = 0, successful = 0; + int result; + + printf("\nRunning invalid-utf32 JIT regression tests\n"); + + ccontext = pcre2_compile_context_create_32(NULL); + pcre2_set_newline_32(ccontext, PCRE2_NEWLINE_ANY); + mdata = pcre2_match_data_create_32(4, NULL); + + for (current = invalid_utf32_regression_test_cases; current->pattern[0]; current++) { + /* printf("\nPattern: %s :\n", current->pattern); */ + total++; + + result = 1; + if (!run_invalid_utf32_test(current, total - 1, 0, ccontext, mdata)) + result = 0; + if (!run_invalid_utf32_test(current, total - 1, 1, ccontext, 
mdata)) + result = 0; + + if (result) { + successful++; + } + + printf("."); + if ((total % 60) == 0) + printf("\n"); + } + + if ((total % 60) != 0) + printf("\n"); + + pcre2_match_data_free_32(mdata); + pcre2_compile_context_free_32(ccontext); + + if (total == successful) { + printf("\nAll invalid UTF32 JIT regression tests are successfully passed.\n"); + return 0; + } else { + printf("\nInvalid UTF32 successful test ratio: %d%% (%d failed)\n", successful * 100 / total, total - successful); + return 1; + } +} + +#else /* !SUPPORT_UNICODE || !SUPPORT_PCRE2_32 */ + +static int invalid_utf32_regression_tests(void) +{ + return 0; +} + +#endif /* SUPPORT_UNICODE && SUPPORT_PCRE2_32 */ + +/* End of pcre2_jit_test.c */ diff --git a/src/pcre/pcre_maketables.c b/src/pcre2/src/pcre2_maketables.c similarity index 57% rename from src/pcre/pcre_maketables.c rename to src/pcre2/src/pcre2_maketables.c index a44a6eaa..56d24940 100644 --- a/src/pcre/pcre_maketables.c +++ b/src/pcre2/src/pcre2_maketables.c @@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -38,53 +39,55 @@ POSSIBILITY OF SUCH DAMAGE. */ -/* This module contains the external function pcre_maketables(), which builds -character tables for PCRE in the current locale. The file is compiled on its -own as part of the PCRE library. However, it is also included in the -compilation of dftables.c, in which case the macro DFTABLES is defined. */ +/* This module contains the external function pcre2_maketables(), which builds +character tables for PCRE2 in the current locale. The file is compiled on its +own as part of the PCRE2 library. 
It is also included in the compilation of +pcre2_dftables.c as a freestanding program, in which case the macro +PCRE2_DFTABLES is defined. */ - -#ifndef DFTABLES +#ifndef PCRE2_DFTABLES /* Compiling the library */ # ifdef HAVE_CONFIG_H # include "config.h" # endif -# include "pcre_internal.h" +# include "pcre2_internal.h" #endif + /************************************************* -* Create PCRE character tables * +* Create PCRE2 character tables * *************************************************/ -/* This function builds a set of character tables for use by PCRE and returns +/* This function builds a set of character tables for use by PCRE2 and returns a pointer to them. They are build using the ctype functions, and consequently their contents will depend upon the current locale setting. When compiled as -part of the library, the store is obtained via PUBL(malloc)(), but when -compiled inside dftables, use malloc(). +part of the library, the store is obtained via a general context malloc, if +supplied, but when PCRE2_DFTABLES is defined (when compiling the pcre2_dftables +freestanding auxiliary program) malloc() is used, and the function has a +different name so as not to clash with the prototype in pcre2.h. 
-Arguments: none +Arguments: none when PCRE2_DFTABLES is defined + else a PCRE2 general context or NULL Returns: pointer to the contiguous block of data + else NULL if memory allocation failed */ -#if defined COMPILE_PCRE8 -const unsigned char * -pcre_maketables(void) -#elif defined COMPILE_PCRE16 -const unsigned char * -pcre16_maketables(void) -#elif defined COMPILE_PCRE32 -const unsigned char * -pcre32_maketables(void) -#endif +#ifdef PCRE2_DFTABLES /* Included in freestanding pcre2_dftables program */ +static const uint8_t *maketables(void) { -unsigned char *yield, *p; -int i; +uint8_t *yield = (uint8_t *)malloc(TABLES_LENGTH); -#ifndef DFTABLES -yield = (unsigned char*)(PUBL(malloc))(tables_length); -#else -yield = (unsigned char*)malloc(tables_length); -#endif +#else /* Not PCRE2_DFTABLES, that is, compiling the library */ +PCRE2_EXP_DEFN const uint8_t * PCRE2_CALL_CONVENTION +pcre2_maketables(pcre2_general_context *gcontext) +{ +uint8_t *yield = (uint8_t *)((gcontext != NULL)? + gcontext->memctl.malloc(TABLES_LENGTH, gcontext->memctl.memory_data) : + malloc(TABLES_LENGTH)); +#endif /* PCRE2_DFTABLES */ + +int i; +uint8_t *p; if (yield == NULL) return NULL; p = yield; @@ -102,8 +105,8 @@ exclusive ones - in some locales things may be different. Note that the table for "space" includes everything "isspace" gives, including VT in the default locale. This makes it work for the POSIX class [:space:]. -From release 8.34 is is also correct for Perl space, because Perl added VT at -release 5.18. +From PCRE1 release 8.34 and for all PCRE2 releases it is also correct for Perl +space, because Perl added VT at release 5.18. Note also that it is possible for a character to be alnum or alpha without being lower or upper, such as "male and female ordinals" (\xAA and \xBA) in the @@ -113,44 +116,48 @@ test for alnum specially. 
*/ memset(p, 0, cbit_length); for (i = 0; i < 256; i++) { - if (isdigit(i)) p[cbit_digit + i/8] |= 1 << (i&7); - if (isupper(i)) p[cbit_upper + i/8] |= 1 << (i&7); - if (islower(i)) p[cbit_lower + i/8] |= 1 << (i&7); - if (isalnum(i)) p[cbit_word + i/8] |= 1 << (i&7); - if (i == '_') p[cbit_word + i/8] |= 1 << (i&7); - if (isspace(i)) p[cbit_space + i/8] |= 1 << (i&7); - if (isxdigit(i))p[cbit_xdigit + i/8] |= 1 << (i&7); - if (isgraph(i)) p[cbit_graph + i/8] |= 1 << (i&7); - if (isprint(i)) p[cbit_print + i/8] |= 1 << (i&7); - if (ispunct(i)) p[cbit_punct + i/8] |= 1 << (i&7); - if (iscntrl(i)) p[cbit_cntrl + i/8] |= 1 << (i&7); + if (isdigit(i)) p[cbit_digit + i/8] |= 1u << (i&7); + if (isupper(i)) p[cbit_upper + i/8] |= 1u << (i&7); + if (islower(i)) p[cbit_lower + i/8] |= 1u << (i&7); + if (isalnum(i)) p[cbit_word + i/8] |= 1u << (i&7); + if (i == '_') p[cbit_word + i/8] |= 1u << (i&7); + if (isspace(i)) p[cbit_space + i/8] |= 1u << (i&7); + if (isxdigit(i)) p[cbit_xdigit + i/8] |= 1u << (i&7); + if (isgraph(i)) p[cbit_graph + i/8] |= 1u << (i&7); + if (isprint(i)) p[cbit_print + i/8] |= 1u << (i&7); + if (ispunct(i)) p[cbit_punct + i/8] |= 1u << (i&7); + if (iscntrl(i)) p[cbit_cntrl + i/8] |= 1u << (i&7); } p += cbit_length; /* Finally, the character type table. In this, we used to exclude VT from the white space chars, because Perl didn't recognize it as such for \s and for -comments within regexes. However, Perl changed at release 5.18, so PCRE changed -at release 8.34. */ +comments within regexes. However, Perl changed at release 5.18, so PCRE1 +changed at release 8.34 and it's always been this way for PCRE2. */ for (i = 0; i < 256; i++) { int x = 0; if (isspace(i)) x += ctype_space; if (isalpha(i)) x += ctype_letter; + if (islower(i)) x += ctype_lcletter; if (isdigit(i)) x += ctype_digit; - if (isxdigit(i)) x += ctype_xdigit; if (isalnum(i) || i == '_') x += ctype_word; - - /* Note: strchr includes the terminating zero in the characters it considers. 
- In this instance, that is ok because we want binary zero to be flagged as a - meta-character, which in this sense is any character that terminates a run - of data characters. */ - - if (strchr("\\*+?{^.$|()[", i) != 0) x += ctype_meta; *p++ = x; } return yield; } -/* End of pcre_maketables.c */ +#ifndef PCRE2_DFTABLES /* Compiling the library */ +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_maketables_free(pcre2_general_context *gcontext, const uint8_t *tables) +{ + if (gcontext) + gcontext->memctl.free((void *)tables, gcontext->memctl.memory_data); + else + free((void *)tables); +} +#endif + +/* End of pcre2_maketables.c */ diff --git a/src/pcre2/src/pcre2_match.c b/src/pcre2/src/pcre2_match.c new file mode 100644 index 00000000..ed605171 --- /dev/null +++ b/src/pcre2/src/pcre2_match.c @@ -0,0 +1,7313 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2015-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +/* These defines enable debugging code */ + +/* #define DEBUG_FRAMES_DISPLAY */ +/* #define DEBUG_SHOW_OPS */ +/* #define DEBUG_SHOW_RMATCH */ + +#ifdef DEBUG_FRAME_DISPLAY +#include <stdarg.h> +#endif + +/* These defines identify the name of the block containing "static" +information, and fields within it. */ + +#define NLBLOCK mb /* Block containing newline information */ +#define PSSTART start_subject /* Field containing processed string start */ +#define PSEND end_subject /* Field containing processed string end */ + +#include "pcre2_internal.h" + +#define RECURSE_UNSET 0xffffffffu /* Bigger than max group number */ + +/* Masks for identifying the public options that are permitted at match time. 
*/ + +#define PUBLIC_MATCH_OPTIONS \ + (PCRE2_ANCHORED|PCRE2_ENDANCHORED|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY| \ + PCRE2_NOTEMPTY_ATSTART|PCRE2_NO_UTF_CHECK|PCRE2_PARTIAL_HARD| \ + PCRE2_PARTIAL_SOFT|PCRE2_NO_JIT|PCRE2_COPY_MATCHED_SUBJECT) + +#define PUBLIC_JIT_MATCH_OPTIONS \ + (PCRE2_NO_UTF_CHECK|PCRE2_NOTBOL|PCRE2_NOTEOL|PCRE2_NOTEMPTY|\ + PCRE2_NOTEMPTY_ATSTART|PCRE2_PARTIAL_SOFT|PCRE2_PARTIAL_HARD|\ + PCRE2_COPY_MATCHED_SUBJECT) + +/* Non-error returns from and within the match() function. Error returns are +externally defined PCRE2_ERROR_xxx codes, which are all negative. */ + +#define MATCH_MATCH 1 +#define MATCH_NOMATCH 0 + +/* Special internal returns used in the match() function. Make them +sufficiently negative to avoid the external error codes. */ + +#define MATCH_ACCEPT (-999) +#define MATCH_KETRPOS (-998) +/* The next 5 must be kept together and in sequence so that a test that checks +for any one of them can use a range. */ +#define MATCH_COMMIT (-997) +#define MATCH_PRUNE (-996) +#define MATCH_SKIP (-995) +#define MATCH_SKIP_ARG (-994) +#define MATCH_THEN (-993) +#define MATCH_BACKTRACK_MAX MATCH_THEN +#define MATCH_BACKTRACK_MIN MATCH_COMMIT + +/* Group frame type values. Zero means the frame is not a group frame. The +lower 16 bits are used for data (e.g. the capture number). Group frames are +used for most groups so that information about the start is easily available at +the end without having to scan back through intermediate frames (backtrack +points). */ + +#define GF_CAPTURE 0x00010000u +#define GF_NOCAPTURE 0x00020000u +#define GF_CONDASSERT 0x00030000u +#define GF_RECURSE 0x00040000u + +/* Masks for the identity and data parts of the group frame type. */ + +#define GF_IDMASK(a) ((a) & 0xffff0000u) +#define GF_DATAMASK(a) ((a) & 0x0000ffffu) + +/* Repetition types */ + +enum { REPTYPE_MIN, REPTYPE_MAX, REPTYPE_POS }; + +/* Min and max values for the common repeats; a maximum of UINT32_MAX => +infinity. 
*/ + +static const uint32_t rep_min[] = { + 0, 0, /* * and *? */ + 1, 1, /* + and +? */ + 0, 0, /* ? and ?? */ + 0, 0, /* dummy placefillers for OP_CR[MIN]RANGE */ + 0, 1, 0 }; /* OP_CRPOS{STAR, PLUS, QUERY} */ + +static const uint32_t rep_max[] = { + UINT32_MAX, UINT32_MAX, /* * and *? */ + UINT32_MAX, UINT32_MAX, /* + and +? */ + 1, 1, /* ? and ?? */ + 0, 0, /* dummy placefillers for OP_CR[MIN]RANGE */ + UINT32_MAX, UINT32_MAX, 1 }; /* OP_CRPOS{STAR, PLUS, QUERY} */ + +/* Repetition types - must include OP_CRPOSRANGE (not needed above) */ + +static const uint32_t rep_typ[] = { + REPTYPE_MAX, REPTYPE_MIN, /* * and *? */ + REPTYPE_MAX, REPTYPE_MIN, /* + and +? */ + REPTYPE_MAX, REPTYPE_MIN, /* ? and ?? */ + REPTYPE_MAX, REPTYPE_MIN, /* OP_CRRANGE and OP_CRMINRANGE */ + REPTYPE_POS, REPTYPE_POS, /* OP_CRPOSSTAR, OP_CRPOSPLUS */ + REPTYPE_POS, REPTYPE_POS }; /* OP_CRPOSQUERY, OP_CRPOSRANGE */ + +/* Numbers for RMATCH calls at backtracking points. When these lists are +changed, the code at RETURN_SWITCH below must be updated in sync. */ + +enum { RM1=1, RM2, RM3, RM4, RM5, RM6, RM7, RM8, RM9, RM10, + RM11, RM12, RM13, RM14, RM15, RM16, RM17, RM18, RM19, RM20, + RM21, RM22, RM23, RM24, RM25, RM26, RM27, RM28, RM29, RM30, + RM31, RM32, RM33, RM34, RM35, RM36 }; + +#ifdef SUPPORT_WIDE_CHARS +enum { RM100=100, RM101 }; +#endif + +#ifdef SUPPORT_UNICODE +enum { RM200=200, RM201, RM202, RM203, RM204, RM205, RM206, RM207, + RM208, RM209, RM210, RM211, RM212, RM213, RM214, RM215, + RM216, RM217, RM218, RM219, RM220, RM221, RM222 }; +#endif + +/* Define short names for general fields in the current backtrack frame, which +is always pointed to by the F variable. Occasional references to fields in +other frames are written out explicitly. There are also some fields in the +current frame whose names start with "temp" that are used for short-term, +localised backtracking memory. These are #defined with Lxxx names at the point +of use and undefined afterwards. 
*/ + +#define Fback_frame F->back_frame +#define Fcapture_last F->capture_last +#define Fcurrent_recurse F->current_recurse +#define Fecode F->ecode +#define Feptr F->eptr +#define Fgroup_frame_type F->group_frame_type +#define Flast_group_offset F->last_group_offset +#define Flength F->length +#define Fmark F->mark +#define Frdepth F->rdepth +#define Fstart_match F->start_match +#define Foffset_top F->offset_top +#define Foccu F->occu +#define Fop F->op +#define Fovector F->ovector +#define Freturn_id F->return_id + + +#ifdef DEBUG_FRAMES_DISPLAY +/************************************************* +* Display current frames and contents * +*************************************************/ + +/* This debugging function displays the current set of frames and their +contents. It is not called automatically from anywhere, the intention being +that calls can be inserted where necessary when debugging frame-related +problems. + +Arguments: + f the file to write to + F the current top frame + P a previous frame of interest + frame_size the frame size + mb points to the match block + s identification text + +Returns: nothing +*/ + +static void +display_frames(FILE *f, heapframe *F, heapframe *P, PCRE2_SIZE frame_size, + match_block *mb, const char *s, ...) 
+{ +uint32_t i; +heapframe *Q; +va_list ap; +va_start(ap, s); + +fprintf(f, "FRAMES "); +vfprintf(f, s, ap); +va_end(ap); + +if (P != NULL) fprintf(f, " P=%lu", + ((char *)P - (char *)(mb->match_frames))/frame_size); +fprintf(f, "\n"); + +for (i = 0, Q = mb->match_frames; + Q <= F; + i++, Q = (heapframe *)((char *)Q + frame_size)) + { + fprintf(f, "Frame %d type=%x subj=%lu code=%d back=%lu id=%d", + i, Q->group_frame_type, Q->eptr - mb->start_subject, *(Q->ecode), + Q->back_frame, Q->return_id); + + if (Q->last_group_offset == PCRE2_UNSET) + fprintf(f, " lgoffset=unset\n"); + else + fprintf(f, " lgoffset=%lu\n", Q->last_group_offset/frame_size); + } +} + +#endif + + + +/************************************************* +* Process a callout * +*************************************************/ + +/* This function is called for all callouts, whether "standalone" or at the +start of a conditional group. Fecode will be pointing to either OP_CALLOUT or +OP_CALLOUT_STR. A callout block is allocated in pcre2_match() and initialized +with fixed values. + +Arguments: + F points to the current backtracking frame + mb points to the match block + lengthptr where to return the length of the callout item + +Returns: the return from the callout + or 0 if no callout function exists +*/ + +static int +do_callout(heapframe *F, match_block *mb, PCRE2_SIZE *lengthptr) +{ +int rc; +PCRE2_SIZE save0, save1; +PCRE2_SIZE *callout_ovector; +pcre2_callout_block *cb; + +*lengthptr = (*Fecode == OP_CALLOUT)? + PRIV(OP_lengths)[OP_CALLOUT] : GET(Fecode, 1 + 2*LINK_SIZE); + +if (mb->callout == NULL) return 0; /* No callout function provided */ + +/* The original matching code (pre 10.30) worked directly with the ovector +passed by the user, and this was passed to callouts. Now that the working +ovector is in the backtracking frame, it no longer needs to reserve space for +the overall match offsets (which would waste space in the frame).
For backward +compatibility, however, we pass capture_top and offset_vector to the callout as +if for the extended ovector, and we ensure that the first two slots are unset +by preserving and restoring their current contents. Picky compilers complain if +references such as Fovector[-2] are used directly, so we set up a separate +pointer. */ + +callout_ovector = (PCRE2_SIZE *)(Fovector) - 2; + +/* The cb->version, cb->subject, cb->subject_length, and cb->start_match fields +are set externally. The first 3 never change; the last is updated for each +bumpalong. */ + +cb = mb->cb; +cb->capture_top = (uint32_t)Foffset_top/2 + 1; +cb->capture_last = Fcapture_last; +cb->offset_vector = callout_ovector; +cb->mark = mb->nomatch_mark; +cb->current_position = (PCRE2_SIZE)(Feptr - mb->start_subject); +cb->pattern_position = GET(Fecode, 1); +cb->next_item_length = GET(Fecode, 1 + LINK_SIZE); + +if (*Fecode == OP_CALLOUT) /* Numerical callout */ + { + cb->callout_number = Fecode[1 + 2*LINK_SIZE]; + cb->callout_string_offset = 0; + cb->callout_string = NULL; + cb->callout_string_length = 0; + } +else /* String callout */ + { + cb->callout_number = 0; + cb->callout_string_offset = GET(Fecode, 1 + 3*LINK_SIZE); + cb->callout_string = Fecode + (1 + 4*LINK_SIZE) + 1; + cb->callout_string_length = + *lengthptr - (1 + 4*LINK_SIZE) - 2; + } + +save0 = callout_ovector[0]; +save1 = callout_ovector[1]; +callout_ovector[0] = callout_ovector[1] = PCRE2_UNSET; +rc = mb->callout(cb, mb->callout_data); +callout_ovector[0] = save0; +callout_ovector[1] = save1; +cb->callout_flags = 0; +return rc; +} + + + +/************************************************* +* Match a back-reference * +*************************************************/ + +/* This function is called only when it is known that the offset lies within +the offsets that have so far been used in the match. Note that in caseless +UTF-8 mode, the number of subject bytes matched may be different to the number +of reference bytes.
(In theory this could also happen in UTF-16 mode, but it +seems unlikely.) + +Arguments: + offset index into the offset vector + caseless TRUE if caseless + F the current backtracking frame pointer + mb points to match block + lengthptr pointer for returning the length matched + +Returns: = 0 successful match; number of code units matched is set + < 0 no match + > 0 partial match +*/ + +static int +match_ref(PCRE2_SIZE offset, BOOL caseless, heapframe *F, match_block *mb, + PCRE2_SIZE *lengthptr) +{ +PCRE2_SPTR p; +PCRE2_SIZE length; +PCRE2_SPTR eptr; +PCRE2_SPTR eptr_start; + +/* Deal with an unset group. The default is no match, but there is an option to +match an empty string. */ + +if (offset >= Foffset_top || Fovector[offset] == PCRE2_UNSET) + { + if ((mb->poptions & PCRE2_MATCH_UNSET_BACKREF) != 0) + { + *lengthptr = 0; + return 0; /* Match */ + } + else return -1; /* No match */ + } + +/* Separate the caseless and UTF cases for speed. */ + +eptr = eptr_start = Feptr; +p = mb->start_subject + Fovector[offset]; +length = Fovector[offset+1] - Fovector[offset]; + +if (caseless) + { +#if defined SUPPORT_UNICODE + BOOL utf = (mb->poptions & PCRE2_UTF) != 0; + + if (utf || (mb->poptions & PCRE2_UCP) != 0) + { + PCRE2_SPTR endptr = p + length; + + /* Match characters up to the end of the reference. NOTE: the number of + code units matched may differ, because in UTF-8 there are some characters + whose upper and lower case codes have different numbers of bytes. For + example, U+023A (2 bytes in UTF-8) is the upper case version of U+2C65 (3 + bytes in UTF-8); a sequence of 3 of the former uses 6 bytes, as does a + sequence of two of the latter. It is important, therefore, to check the + length along the reference, not along the subject (earlier code did this + wrong). UCP without UTF uses Unicode properties but not UTF encoding.
*/ + + while (p < endptr) + { + uint32_t c, d; + const ucd_record *ur; + if (eptr >= mb->end_subject) return 1; /* Partial match */ + + if (utf) + { + GETCHARINC(c, eptr); + GETCHARINC(d, p); + } + else + { + c = *eptr++; + d = *p++; + } + + ur = GET_UCD(d); + if (c != d && c != (uint32_t)((int)d + ur->other_case)) + { + const uint32_t *pp = PRIV(ucd_caseless_sets) + ur->caseset; + for (;;) + { + if (c < *pp) return -1; /* No match */ + if (c == *pp++) break; + } + } + } + } + else +#endif + + /* Not in UTF or UCP mode */ + { + for (; length > 0; length--) + { + uint32_t cc, cp; + if (eptr >= mb->end_subject) return 1; /* Partial match */ + cc = UCHAR21TEST(eptr); + cp = UCHAR21TEST(p); + if (TABLE_GET(cp, mb->lcc, cp) != TABLE_GET(cc, mb->lcc, cc)) + return -1; /* No match */ + p++; + eptr++; + } + } + } + +/* In the caseful case, we can just compare the code units, whether or not we +are in UTF and/or UCP mode. When partial matching, we have to do this unit by +unit. */ + +else + { + if (mb->partial != 0) + { + for (; length > 0; length--) + { + if (eptr >= mb->end_subject) return 1; /* Partial match */ + if (UCHAR21INCTEST(p) != UCHAR21INCTEST(eptr)) return -1; /* No match */ + } + } + + /* Not partial matching */ + + else + { + if ((PCRE2_SIZE)(mb->end_subject - eptr) < length) return 1; /* Partial */ + if (memcmp(p, eptr, CU2BYTES(length)) != 0) return -1; /* No match */ + eptr += length; + } + } + +*lengthptr = eptr - eptr_start; +return 0; /* Match */ +} + + + +/****************************************************************************** +******************************************************************************* + "Recursion" in the match() function + +The original match() function was highly recursive, but this proved to be the +source of a number of problems over the years, mostly because of the relatively +small system stacks that are commonly found. 
As new features were added to +patterns, various kludges were invented to reduce the amount of stack used, +making the code hard to understand in places. + +A version did exist that used individual frames on the heap instead of calling +match() recursively, but this ran substantially slower. The current version is +a refactoring that uses a vector of frames to remember backtracking points. +This runs no slower, and possibly even a bit faster than the original recursive +implementation. An initial vector of size START_FRAMES_SIZE (enough for maybe +50 frames) is allocated on the system stack. If this is not big enough, the +heap is used for a larger vector. + +******************************************************************************* +******************************************************************************/ + + + + +/************************************************* +* Macros for the match() function * +*************************************************/ + +/* These macros pack up tests that are used for partial matching several times +in the code. The second one is used when we already know we are past the end of +the subject. We set the "hit end" flag if the pointer is at the end of the +subject and either (a) the pointer is past the earliest inspected character +(i.e. something has been matched, even if not part of the actual matched +string), or (b) the pattern contains a lookbehind. These are the conditions for +which adding more characters may allow the current match to continue. + +For hard partial matching, we immediately return a partial match. Otherwise, +carrying on means that a complete match on the current subject will be sought. +A partial match is returned only if no complete match can be found. 
*/ + +#define CHECK_PARTIAL()\ + if (Feptr >= mb->end_subject) \ + { \ + SCHECK_PARTIAL(); \ + } + +#define SCHECK_PARTIAL()\ + if (mb->partial != 0 && \ + (Feptr > mb->start_used_ptr || mb->allowemptypartial)) \ + { \ + mb->hitend = TRUE; \ + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; \ + } + + +/* These macros are used to implement backtracking. They simulate a recursive +call to the match() function by means of a local vector of frames which +remember the backtracking points. */ + +#define RMATCH(ra,rb)\ + {\ + start_ecode = ra;\ + Freturn_id = rb;\ + goto MATCH_RECURSE;\ + L_##rb:;\ + } + +#define RRETURN(ra)\ + {\ + rrc = ra;\ + goto RETURN_SWITCH;\ + } + + + +/************************************************* +* Match from current position * +*************************************************/ + +/* This function is called to run one match attempt at a single starting point +in the subject. + +Performance note: It might be tempting to extract commonly used fields from the +mb structure (e.g. end_subject) into individual variables to improve +performance. Tests using gcc on a SPARC disproved this; in the first case, it +made performance worse. + +Arguments: + start_eptr starting character in subject + start_ecode starting position in compiled code + ovector pointer to the final output vector + oveccount number of pairs in ovector + top_bracket number of capturing parentheses in the pattern + frame_size size of each backtracking frame + mb pointer to "static" variables block + +Returns: MATCH_MATCH if matched ) these values are >= 0 + MATCH_NOMATCH if failed to match ) + negative MATCH_xxx value for PRUNE, SKIP, etc + negative PCRE2_ERROR_xxx value if aborted by an error condition + (e.g. 
stopped by repeated call or depth limit) +*/ + +static int +match(PCRE2_SPTR start_eptr, PCRE2_SPTR start_ecode, PCRE2_SIZE *ovector, + uint16_t oveccount, uint16_t top_bracket, PCRE2_SIZE frame_size, + match_block *mb) +{ +/* Frame-handling variables */ + +heapframe *F; /* Current frame pointer */ +heapframe *N = NULL; /* Temporary frame pointers */ +heapframe *P = NULL; +heapframe *assert_accept_frame = NULL; /* For passing back a frame with captures */ +PCRE2_SIZE frame_copy_size; /* Amount to copy when creating a new frame */ + +/* Local variables that do not need to be preserved over calls to RMATCH(). */ + +PCRE2_SPTR bracode; /* Temp pointer to start of group */ +PCRE2_SIZE offset; /* Used for group offsets */ +PCRE2_SIZE length; /* Used for various length calculations */ + +int rrc; /* Return from functions & backtracking "recursions" */ +#ifdef SUPPORT_UNICODE +int proptype; /* Type of character property */ +#endif + +uint32_t i; /* Used for local loops */ +uint32_t fc; /* Character values */ +uint32_t number; /* Used for group and other numbers */ +uint32_t reptype = 0; /* Type of repetition (0 to avoid compiler warning) */ +uint32_t group_frame_type; /* Specifies type for new group frames */ + +BOOL condition; /* Used in conditional groups */ +BOOL cur_is_word; /* Used in "word" tests */ +BOOL prev_is_word; /* Used in "word" tests */ + +/* UTF and UCP flags */ + +#ifdef SUPPORT_UNICODE +BOOL utf = (mb->poptions & PCRE2_UTF) != 0; +BOOL ucp = (mb->poptions & PCRE2_UCP) != 0; +#else +BOOL utf = FALSE; /* Required for convenience even when no Unicode support */ +#endif + +/* This is the length of the last part of a backtracking frame that must be +copied when a new frame is created. */ + +frame_copy_size = frame_size - offsetof(heapframe, eptr); + +/* Set up the first current frame at the start of the vector, and initialize +fields that are not reset for new frames.
*/ + +F = mb->match_frames; +Frdepth = 0; /* "Recursion" depth */ +Fcapture_last = 0; /* Number of most recent capture */ +Fcurrent_recurse = RECURSE_UNSET; /* Not pattern recursing. */ +Fstart_match = Feptr = start_eptr; /* Current data pointer and start match */ +Fmark = NULL; /* Most recent mark */ +Foffset_top = 0; /* End of captures within the frame */ +Flast_group_offset = PCRE2_UNSET; /* Saved frame of most recent group */ +group_frame_type = 0; /* Not a start of group frame */ +goto NEW_FRAME; /* Start processing with this frame */ + +/* Come back here when we want to create a new frame for remembering a +backtracking point. */ + +MATCH_RECURSE: + +/* Set up a new backtracking frame. If the vector is full, get a new one +on the heap, doubling the size, but constrained by the heap limit. */ + +N = (heapframe *)((char *)F + frame_size); +if (N >= mb->match_frames_top) + { + PCRE2_SIZE newsize = mb->frame_vector_size * 2; + heapframe *new; + + if ((newsize / 1024) > mb->heap_limit) + { + PCRE2_SIZE maxsize = ((mb->heap_limit * 1024)/frame_size) * frame_size; + if (mb->frame_vector_size >= maxsize) return PCRE2_ERROR_HEAPLIMIT; + newsize = maxsize; + } + + new = mb->memctl.malloc(newsize, mb->memctl.memory_data); + if (new == NULL) return PCRE2_ERROR_NOMEMORY; + memcpy(new, mb->match_frames, mb->frame_vector_size); + + F = (heapframe *)((char *)new + ((char *)F - (char *)mb->match_frames)); + N = (heapframe *)((char *)F + frame_size); + + if (mb->match_frames != mb->stack_frames) + mb->memctl.free(mb->match_frames, mb->memctl.memory_data); + mb->match_frames = new; + mb->match_frames_top = (heapframe *)((char *)mb->match_frames + newsize); + mb->frame_vector_size = newsize; + } + +#ifdef DEBUG_SHOW_RMATCH +fprintf(stderr, "++ RMATCH %2d frame=%d", Freturn_id, Frdepth + 1); +if (group_frame_type != 0) + { + fprintf(stderr, " type=%x ", group_frame_type); + switch (GF_IDMASK(group_frame_type)) + { + case GF_CAPTURE: + fprintf(stderr, "capture=%d", 
GF_DATAMASK(group_frame_type)); + break; + + case GF_NOCAPTURE: + fprintf(stderr, "nocapture op=%d", GF_DATAMASK(group_frame_type)); + break; + + case GF_CONDASSERT: + fprintf(stderr, "condassert op=%d", GF_DATAMASK(group_frame_type)); + break; + + case GF_RECURSE: + fprintf(stderr, "recurse=%d", GF_DATAMASK(group_frame_type)); + break; + + default: + fprintf(stderr, "*** unknown ***"); + break; + } + } +fprintf(stderr, "\n"); +#endif + +/* Copy those fields that must be copied into the new frame, increase the +"recursion" depth (i.e. the new frame's index) and then make the new frame +current. */ + +memcpy((char *)N + offsetof(heapframe, eptr), + (char *)F + offsetof(heapframe, eptr), + frame_copy_size); + +N->rdepth = Frdepth + 1; +F = N; + +/* Carry on processing with a new frame. */ + +NEW_FRAME: +Fgroup_frame_type = group_frame_type; +Fecode = start_ecode; /* Starting code pointer */ +Fback_frame = frame_size; /* Default is go back one frame */ + +/* If this is a special type of group frame, remember its offset for quick +access at the end of the group. If this is a recursion, set a new current +recursion value. */ + +if (group_frame_type != 0) + { + Flast_group_offset = (char *)F - (char *)mb->match_frames; + if (GF_IDMASK(group_frame_type) == GF_RECURSE) + Fcurrent_recurse = GF_DATAMASK(group_frame_type); + group_frame_type = 0; + } + + +/* ========================================================================= */ +/* This is the main processing loop. First check that we haven't recorded too +many backtracks (search tree is too large), or that we haven't exceeded the +recursive depth limit (used too many backtracking frames). If not, process the +opcodes. 
*/ + +if (mb->match_call_count++ >= mb->match_limit) return PCRE2_ERROR_MATCHLIMIT; +if (Frdepth >= mb->match_limit_depth) return PCRE2_ERROR_DEPTHLIMIT; + +for (;;) + { +#ifdef DEBUG_SHOW_OPS +fprintf(stderr, "++ op=%d\n", *Fecode); +#endif + + Fop = (uint8_t)(*Fecode); /* Cast needed for 16-bit and 32-bit modes */ + switch(Fop) + { + /* ===================================================================== */ + /* Before OP_ACCEPT there may be any number of OP_CLOSE opcodes, to close + any currently open capturing brackets. Unlike reaching the end of a group, + where we know the starting frame is at the top of the chained frames, in + this case we have to search back for the relevant frame in case other types + of group that use chained frames have intervened. Multiple OP_CLOSEs always + come innermost first, which matches the chain order. We can ignore this in + a recursion, because captures are not passed out of recursions. */ + + case OP_CLOSE: + if (Fcurrent_recurse == RECURSE_UNSET) + { + number = GET2(Fecode, 1); + offset = Flast_group_offset; + for(;;) + { + if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL; + N = (heapframe *)((char *)mb->match_frames + offset); + P = (heapframe *)((char *)N - frame_size); + if (N->group_frame_type == (GF_CAPTURE | number)) break; + offset = P->last_group_offset; + } + offset = (number << 1) - 2; + Fcapture_last = number; + Fovector[offset] = P->eptr - mb->start_subject; + Fovector[offset+1] = Feptr - mb->start_subject; + if (offset >= Foffset_top) Foffset_top = offset + 2; + } + Fecode += PRIV(OP_lengths)[*Fecode]; + break; + + + /* ===================================================================== */ + /* Real or forced end of the pattern, assertion, or recursion. In an + assertion ACCEPT, update the last used pointer and remember the current + frame so that the captures and mark can be fished out of it. 
*/ + + case OP_ASSERT_ACCEPT: + if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr; + assert_accept_frame = F; + RRETURN(MATCH_ACCEPT); + + /* If recursing, we have to find the most recent recursion. */ + + case OP_ACCEPT: + case OP_END: + + /* Handle end of a recursion. */ + + if (Fcurrent_recurse != RECURSE_UNSET) + { + offset = Flast_group_offset; + for(;;) + { + if (offset == PCRE2_UNSET) return PCRE2_ERROR_INTERNAL; + N = (heapframe *)((char *)mb->match_frames + offset); + P = (heapframe *)((char *)N - frame_size); + if (GF_IDMASK(N->group_frame_type) == GF_RECURSE) break; + offset = P->last_group_offset; + } + + /* N is now the frame of the recursion; the previous frame is at the + OP_RECURSE position. Go back there, copying the current subject position + and mark, and the start_match position (\K might have changed it), and + then move on past the OP_RECURSE. */ + + P->eptr = Feptr; + P->mark = Fmark; + P->start_match = Fstart_match; + F = P; + Fecode += 1 + LINK_SIZE; + continue; + } + + /* Not a recursion. Fail for an empty string match if either PCRE2_NOTEMPTY + is set, or if PCRE2_NOTEMPTY_ATSTART is set and we have matched at the + start of the subject. In both cases, backtracking will then try other + alternatives, if any. */ + + if (Feptr == Fstart_match && + ((mb->moptions & PCRE2_NOTEMPTY) != 0 || + ((mb->moptions & PCRE2_NOTEMPTY_ATSTART) != 0 && + Fstart_match == mb->start_subject + mb->start_offset))) + RRETURN(MATCH_NOMATCH); + + /* Also fail if PCRE2_ENDANCHORED is set and the end of the match is not + the end of the subject. After (*ACCEPT) we fail the entire match (at this + position) but backtrack on reaching the end of the pattern. */ + + if (Feptr < mb->end_subject && + ((mb->moptions | mb->poptions) & PCRE2_ENDANCHORED) != 0) + { + if (Fop == OP_END) RRETURN(MATCH_NOMATCH); + return MATCH_NOMATCH; + } + + /* We have a successful match of the whole pattern. Record the result and + then do a direct return from the function. 
If there is space in the offset + vector, set any pairs that follow the highest-numbered captured string but + are less than the number of capturing groups in the pattern to PCRE2_UNSET. + It is documented that this happens. "Gaps" are set to PCRE2_UNSET + dynamically. It is only those at the end that need setting here. */ + + mb->end_match_ptr = Feptr; /* Record where we ended */ + mb->end_offset_top = Foffset_top; /* and how many extracts were taken */ + mb->mark = Fmark; /* and the last success mark */ + if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr; + + ovector[0] = Fstart_match - mb->start_subject; + ovector[1] = Feptr - mb->start_subject; + + /* Set i to the smaller of the sizes of the external and frame ovectors. */ + + i = 2 * ((top_bracket + 1 > oveccount)? oveccount : top_bracket + 1); + memcpy(ovector + 2, Fovector, (i - 2) * sizeof(PCRE2_SIZE)); + while (--i >= Foffset_top + 2) ovector[i] = PCRE2_UNSET; + return MATCH_MATCH; /* Note: NOT RRETURN */ + + + /*===================================================================== */ + /* Match any single character type except newline; have to take care with + CRLF newlines and partial matching. */ + + case OP_ANY: + if (IS_NEWLINE(Feptr)) RRETURN(MATCH_NOMATCH); + if (mb->partial != 0 && + Feptr == mb->end_subject - 1 && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + UCHAR21TEST(Feptr) == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + /* Fall through */ + + /* Match any single character whatsoever. */ + + case OP_ALLANY: + if (Feptr >= mb->end_subject) /* DO NOT merge the Feptr++ here; it must */ + { /* not be updated before SCHECK_PARTIAL. 
*/ + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + Feptr++; +#ifdef SUPPORT_UNICODE + if (utf) ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); +#endif + Fecode++; + break; + + + /* ===================================================================== */ + /* Match a single code unit, even in UTF mode. This opcode really does + match any code unit, even newline. (It really should be called ANYCODEUNIT, + of course - the byte name is from pre-16 bit days.) */ + + case OP_ANYBYTE: + if (Feptr >= mb->end_subject) /* DO NOT merge the Feptr++ here; it must */ + { /* not be updated before SCHECK_PARTIAL. */ + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + Feptr++; + Fecode++; + break; + + + /* ===================================================================== */ + /* Match a single character, casefully */ + + case OP_CHAR: +#ifdef SUPPORT_UNICODE + if (utf) + { + Flength = 1; + Fecode++; + GETCHARLEN(fc, Fecode, Flength); + if (Flength > (PCRE2_SIZE)(mb->end_subject - Feptr)) + { + CHECK_PARTIAL(); /* Not SCHECK_PARTIAL() */ + RRETURN(MATCH_NOMATCH); + } + for (; Flength > 0; Flength--) + { + if (*Fecode++ != UCHAR21INC(Feptr)) RRETURN(MATCH_NOMATCH); + } + } + else +#endif + + /* Not UTF mode */ + { + if (mb->end_subject - Feptr < 1) + { + SCHECK_PARTIAL(); /* This one can use SCHECK_PARTIAL() */ + RRETURN(MATCH_NOMATCH); + } + if (Fecode[1] != *Feptr++) RRETURN(MATCH_NOMATCH); + Fecode += 2; + } + break; + + + /* ===================================================================== */ + /* Match a single character, caselessly. If we are at the end of the + subject, give up immediately. We get here only when the pattern character + has at most one other case. Characters with more than two cases are coded + as OP_PROP with the pseudo-property PT_CLIST. 
*/ + + case OP_CHARI: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + +#ifdef SUPPORT_UNICODE + if (utf) + { + Flength = 1; + Fecode++; + GETCHARLEN(fc, Fecode, Flength); + + /* If the pattern character's value is < 128, we know that its other case + (if any) is also < 128 (and therefore only one code unit long in all + code-unit widths), so we can use the fast lookup table. We checked above + that there is at least one character left in the subject. */ + + if (fc < 128) + { + uint32_t cc = UCHAR21(Feptr); + if (mb->lcc[fc] != TABLE_GET(cc, mb->lcc, cc)) RRETURN(MATCH_NOMATCH); + Fecode++; + Feptr++; + } + + /* Otherwise we must pick up the subject character and use Unicode + property support to test its other case. Note that we cannot use the + value of "Flength" to check for sufficient bytes left, because the other + case of the character may have more or fewer code units. */ + + else + { + uint32_t dc; + GETCHARINC(dc, Feptr); + Fecode += Flength; + if (dc != fc && dc != UCD_OTHERCASE(fc)) RRETURN(MATCH_NOMATCH); + } + } + + /* If UCP is set without UTF we must do the same as above, but with one + character per code unit. */ + + else if (ucp) + { + uint32_t cc = UCHAR21(Feptr); + fc = Fecode[1]; + if (fc < 128) + { + if (mb->lcc[fc] != TABLE_GET(cc, mb->lcc, cc)) RRETURN(MATCH_NOMATCH); + } + else + { + if (cc != fc && cc != UCD_OTHERCASE(fc)) RRETURN(MATCH_NOMATCH); + } + Feptr++; + Fecode += 2; + } + + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF or UCP mode; use the table for characters < 256. */ + { + if (TABLE_GET(Fecode[1], mb->lcc, Fecode[1]) + != TABLE_GET(*Feptr, mb->lcc, *Feptr)) RRETURN(MATCH_NOMATCH); + Feptr++; + Fecode += 2; + } + break; + + + /* ===================================================================== */ + /* Match not a single character. 
*/ + + case OP_NOT: + case OP_NOTI: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t ch; + Fecode++; + GETCHARINC(ch, Fecode); + GETCHARINC(fc, Feptr); + if (ch == fc) + { + RRETURN(MATCH_NOMATCH); /* Caseful match */ + } + else if (Fop == OP_NOTI) /* If caseless */ + { + if (ch > 127) + ch = UCD_OTHERCASE(ch); + else + ch = (mb->fcc)[ch]; + if (ch == fc) RRETURN(MATCH_NOMATCH); + } + } + + /* UCP without UTF is as above, but with one character per code unit. */ + + else if (ucp) + { + uint32_t ch; + fc = UCHAR21INC(Feptr); + ch = Fecode[1]; + Fecode += 2; + + if (ch == fc) + { + RRETURN(MATCH_NOMATCH); /* Caseful match */ + } + else if (Fop == OP_NOTI) /* If caseless */ + { + if (ch > 127) + ch = UCD_OTHERCASE(ch); + else + ch = (mb->fcc)[ch]; + if (ch == fc) RRETURN(MATCH_NOMATCH); + } + } + + else +#endif /* SUPPORT_UNICODE */ + + /* Neither UTF nor UCP is set */ + + { + uint32_t ch = Fecode[1]; + fc = UCHAR21INC(Feptr); + if (ch == fc || (Fop == OP_NOTI && TABLE_GET(ch, mb->fcc, ch) == fc)) + RRETURN(MATCH_NOMATCH); + Fecode += 2; + } + break; + + + /* ===================================================================== */ + /* Match a single character repeatedly. 
*/ + +#define Loclength F->temp_size +#define Lstart_eptr F->temp_sptr[0] +#define Lcharptr F->temp_sptr[1] +#define Lmin F->temp_32[0] +#define Lmax F->temp_32[1] +#define Lc F->temp_32[2] +#define Loc F->temp_32[3] + + case OP_EXACT: + case OP_EXACTI: + Lmin = Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATCHAR; + + case OP_POSUPTO: + case OP_POSUPTOI: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATCHAR; + + case OP_UPTO: + case OP_UPTOI: + reptype = REPTYPE_MAX; + Lmin = 0; + Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATCHAR; + + case OP_MINUPTO: + case OP_MINUPTOI: + reptype = REPTYPE_MIN; + Lmin = 0; + Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATCHAR; + + case OP_POSSTAR: + case OP_POSSTARI: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = UINT32_MAX; + Fecode++; + goto REPEATCHAR; + + case OP_POSPLUS: + case OP_POSPLUSI: + reptype = REPTYPE_POS; + Lmin = 1; + Lmax = UINT32_MAX; + Fecode++; + goto REPEATCHAR; + + case OP_POSQUERY: + case OP_POSQUERYI: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = 1; + Fecode++; + goto REPEATCHAR; + + case OP_STAR: + case OP_STARI: + case OP_MINSTAR: + case OP_MINSTARI: + case OP_PLUS: + case OP_PLUSI: + case OP_MINPLUS: + case OP_MINPLUSI: + case OP_QUERY: + case OP_QUERYI: + case OP_MINQUERY: + case OP_MINQUERYI: + fc = *Fecode++ - ((Fop < OP_STARI)? OP_STAR : OP_STARI); + Lmin = rep_min[fc]; + Lmax = rep_max[fc]; + reptype = rep_typ[fc]; + + /* Common code for all repeated single-character matches. We first check + for the minimum number of characters. If the minimum equals the maximum, we + are done. Otherwise, if minimizing, check the rest of the pattern for a + match; if there isn't one, advance up to the maximum, one character at a + time. + + If maximizing, advance up to the maximum number of matching characters, + until Feptr is past the end of the maximum run. If possessive, we are + then done (no backing up). 
Otherwise, match at this position; anything + other than no match is immediately returned. For nomatch, back up one + character, unless we are matching \R and the last thing matched was + \r\n, in which case, back up two code units until we reach the first + optional character position. + + The various UTF/non-UTF and caseful/caseless cases are handled separately, + for speed. */ + + REPEATCHAR: +#ifdef SUPPORT_UNICODE + if (utf) + { + Flength = 1; + Lcharptr = Fecode; + GETCHARLEN(fc, Fecode, Flength); + Fecode += Flength; + + /* Handle multi-code-unit character matching, caseful and caseless. */ + + if (Flength > 1) + { + uint32_t othercase; + + if (Fop >= OP_STARI && /* Caseless */ + (othercase = UCD_OTHERCASE(fc)) != fc) + Loclength = PRIV(ord2utf)(othercase, Foccu); + else Loclength = 0; + + for (i = 1; i <= Lmin; i++) + { + if (Feptr <= mb->end_subject - Flength && + memcmp(Feptr, Lcharptr, CU2BYTES(Flength)) == 0) Feptr += Flength; + else if (Loclength > 0 && + Feptr <= mb->end_subject - Loclength && + memcmp(Feptr, Foccu, CU2BYTES(Loclength)) == 0) + Feptr += Loclength; + else + { + CHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + } + + if (Lmin == Lmax) continue; + + if (reptype == REPTYPE_MIN) + { + for (;;) + { + RMATCH(Fecode, RM202); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr <= mb->end_subject - Flength && + memcmp(Feptr, Lcharptr, CU2BYTES(Flength)) == 0) Feptr += Flength; + else if (Loclength > 0 && + Feptr <= mb->end_subject - Loclength && + memcmp(Feptr, Foccu, CU2BYTES(Loclength)) == 0) + Feptr += Loclength; + else + { + CHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + } + /* Control never gets here */ + } + + else /* Maximize */ + { + Lstart_eptr = Feptr; + for (i = Lmin; i < Lmax; i++) + { + if (Feptr <= mb->end_subject - Flength && + memcmp(Feptr, Lcharptr, CU2BYTES(Flength)) == 0) + Feptr += Flength; + else if (Loclength > 0 && + Feptr <= mb->end_subject - Loclength && + memcmp(Feptr, 
Foccu, CU2BYTES(Loclength)) == 0) + Feptr += Loclength; + else + { + CHECK_PARTIAL(); + break; + } + } + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a + Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't + go too far. */ + + if (reptype != REPTYPE_POS) for(;;) + { + if (Feptr <= Lstart_eptr) break; + RMATCH(Fecode, RM203); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + BACKCHAR(Feptr); + } + } + break; /* End of repeated wide character handling */ + } + + /* Length of UTF character is 1. Put it into the preserved variable and + fall through to the non-UTF code. */ + + Lc = fc; + } + else +#endif /* SUPPORT_UNICODE */ + + /* When not in UTF mode, load a single-code-unit character. Then proceed as + above, using Unicode casing if either UTF or UCP is set. */ + + Lc = *Fecode++; + + /* Caseless comparison */ + + if (Fop >= OP_STARI) + { +#if PCRE2_CODE_UNIT_WIDTH == 8 +#ifdef SUPPORT_UNICODE + if (ucp && !utf && Lc > 127) Loc = UCD_OTHERCASE(Lc); + else +#endif /* SUPPORT_UNICODE */ + /* Lc will be < 128 in UTF-8 mode. 
*/ + Loc = mb->fcc[Lc]; +#else /* 16-bit & 32-bit */ +#ifdef SUPPORT_UNICODE + if ((utf || ucp) && Lc > 127) Loc = UCD_OTHERCASE(Lc); + else +#endif /* SUPPORT_UNICODE */ + Loc = TABLE_GET(Lc, mb->fcc, Lc); +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ + + for (i = 1; i <= Lmin; i++) + { + uint32_t cc; /* Faster than PCRE2_UCHAR */ + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21TEST(Feptr); + if (Lc != cc && Loc != cc) RRETURN(MATCH_NOMATCH); + Feptr++; + } + if (Lmin == Lmax) continue; + + if (reptype == REPTYPE_MIN) + { + for (;;) + { + uint32_t cc; /* Faster than PCRE2_UCHAR */ + RMATCH(Fecode, RM25); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21TEST(Feptr); + if (Lc != cc && Loc != cc) RRETURN(MATCH_NOMATCH); + Feptr++; + } + /* Control never gets here */ + } + + else /* Maximize */ + { + Lstart_eptr = Feptr; + for (i = Lmin; i < Lmax; i++) + { + uint32_t cc; /* Faster than PCRE2_UCHAR */ + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + cc = UCHAR21TEST(Feptr); + if (Lc != cc && Loc != cc) break; + Feptr++; + } + if (reptype != REPTYPE_POS) for (;;) + { + if (Feptr == Lstart_eptr) break; + RMATCH(Fecode, RM26); + Feptr--; + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + } + } + } + + /* Caseful comparisons (includes all multi-byte characters) */ + + else + { + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lc != UCHAR21INCTEST(Feptr)) RRETURN(MATCH_NOMATCH); + } + + if (Lmin == Lmax) continue; + + if (reptype == REPTYPE_MIN) + { + for (;;) + { + RMATCH(Fecode, RM27); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lc != UCHAR21INCTEST(Feptr)) 
RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + } + else /* Maximize */ + { + Lstart_eptr = Feptr; + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + + if (Lc != UCHAR21TEST(Feptr)) break; + Feptr++; + } + + if (reptype != REPTYPE_POS) for (;;) + { + if (Feptr <= Lstart_eptr) break; + RMATCH(Fecode, RM28); + Feptr--; + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + } + } + } + break; + +#undef Loclength +#undef Lstart_eptr +#undef Lcharptr +#undef Lmin +#undef Lmax +#undef Lc +#undef Loc + + + /* ===================================================================== */ + /* Match a negated single one-byte character repeatedly. This is almost a + repeat of the code for a repeated single character, but I haven't found a + nice way of commoning these up that doesn't require a test of the + positive/negative option for each character match. Maybe that wouldn't add + very much to the time taken, but character matching *is* what this is all + about... 
*/ + +#define Lstart_eptr F->temp_sptr[0] +#define Lmin F->temp_32[0] +#define Lmax F->temp_32[1] +#define Lc F->temp_32[2] +#define Loc F->temp_32[3] + + case OP_NOTEXACT: + case OP_NOTEXACTI: + Lmin = Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATNOTCHAR; + + case OP_NOTUPTO: + case OP_NOTUPTOI: + Lmin = 0; + Lmax = GET2(Fecode, 1); + reptype = REPTYPE_MAX; + Fecode += 1 + IMM2_SIZE; + goto REPEATNOTCHAR; + + case OP_NOTMINUPTO: + case OP_NOTMINUPTOI: + Lmin = 0; + Lmax = GET2(Fecode, 1); + reptype = REPTYPE_MIN; + Fecode += 1 + IMM2_SIZE; + goto REPEATNOTCHAR; + + case OP_NOTPOSSTAR: + case OP_NOTPOSSTARI: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = UINT32_MAX; + Fecode++; + goto REPEATNOTCHAR; + + case OP_NOTPOSPLUS: + case OP_NOTPOSPLUSI: + reptype = REPTYPE_POS; + Lmin = 1; + Lmax = UINT32_MAX; + Fecode++; + goto REPEATNOTCHAR; + + case OP_NOTPOSQUERY: + case OP_NOTPOSQUERYI: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = 1; + Fecode++; + goto REPEATNOTCHAR; + + case OP_NOTPOSUPTO: + case OP_NOTPOSUPTOI: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATNOTCHAR; + + case OP_NOTSTAR: + case OP_NOTSTARI: + case OP_NOTMINSTAR: + case OP_NOTMINSTARI: + case OP_NOTPLUS: + case OP_NOTPLUSI: + case OP_NOTMINPLUS: + case OP_NOTMINPLUSI: + case OP_NOTQUERY: + case OP_NOTQUERYI: + case OP_NOTMINQUERY: + case OP_NOTMINQUERYI: + fc = *Fecode++ - ((Fop >= OP_NOTSTARI)? OP_NOTSTARI: OP_NOTSTAR); + Lmin = rep_min[fc]; + Lmax = rep_max[fc]; + reptype = rep_typ[fc]; + + /* Common code for all repeated single-character non-matches. */ + + REPEATNOTCHAR: + GETCHARINCTEST(Lc, Fecode); + + /* The code is duplicated for the caseless and caseful cases, for speed, + since matching characters is likely to be quite common. First, ensure the + minimum number of matches are present. If Lmin = Lmax, we are done. 
+ Otherwise, if minimizing, keep trying the rest of the expression and + advancing one matching character if failing, up to the maximum. + Alternatively, if maximizing, find the maximum number of characters and + work backwards. */ + + if (Fop >= OP_NOTSTARI) /* Caseless */ + { +#ifdef SUPPORT_UNICODE + if ((utf || ucp) && Lc > 127) + Loc = UCD_OTHERCASE(Lc); + else +#endif /* SUPPORT_UNICODE */ + + Loc = TABLE_GET(Lc, mb->fcc, Lc); /* Other case from table */ + +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t d; + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(d, Feptr); + if (Lc == d || Loc == d) RRETURN(MATCH_NOMATCH); + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF mode */ + { + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lc == *Feptr || Loc == *Feptr) RRETURN(MATCH_NOMATCH); + Feptr++; + } + } + + if (Lmin == Lmax) continue; /* Finished for exact count */ + + if (reptype == REPTYPE_MIN) + { +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t d; + for (;;) + { + RMATCH(Fecode, RM204); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(d, Feptr); + if (Lc == d || Loc == d) RRETURN(MATCH_NOMATCH); + } + } + else +#endif /*SUPPORT_UNICODE */ + + /* Not UTF mode */ + { + for (;;) + { + RMATCH(Fecode, RM29); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lc == *Feptr || Loc == *Feptr) RRETURN(MATCH_NOMATCH); + Feptr++; + } + } + /* Control never gets here */ + } + + /* Maximize case */ + + else + { + Lstart_eptr = Feptr; + +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t d; + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if 
(Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(d, Feptr, len); + if (Lc == d || Loc == d) break; + Feptr += len; + } + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a + Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't + go too far. */ + + if (reptype != REPTYPE_POS) for(;;) + { + if (Feptr <= Lstart_eptr) break; + RMATCH(Fecode, RM205); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + BACKCHAR(Feptr); + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF mode */ + { + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (Lc == *Feptr || Loc == *Feptr) break; + Feptr++; + } + if (reptype != REPTYPE_POS) for (;;) + { + if (Feptr == Lstart_eptr) break; + RMATCH(Fecode, RM30); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + } + } + } + } + + /* Caseful comparisons */ + + else + { +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t d; + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(d, Feptr); + if (Lc == d) RRETURN(MATCH_NOMATCH); + } + } + else +#endif + /* Not UTF mode */ + { + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lc == *Feptr++) RRETURN(MATCH_NOMATCH); + } + } + + if (Lmin == Lmax) continue; + + if (reptype == REPTYPE_MIN) + { +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t d; + for (;;) + { + RMATCH(Fecode, RM206); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(d, Feptr); + if (Lc == d) RRETURN(MATCH_NOMATCH); + } + } + else +#endif + /* Not UTF mode */ + { + for (;;) + { + RMATCH(Fecode, RM31); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + 
SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lc == *Feptr++) RRETURN(MATCH_NOMATCH); + } + } + /* Control never gets here */ + } + + /* Maximize case */ + + else + { + Lstart_eptr = Feptr; + +#ifdef SUPPORT_UNICODE + if (utf) + { + uint32_t d; + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(d, Feptr, len); + if (Lc == d) break; + Feptr += len; + } + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a + Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't + go too far. */ + + if (reptype != REPTYPE_POS) for(;;) + { + if (Feptr <= Lstart_eptr) break; + RMATCH(Fecode, RM207); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + BACKCHAR(Feptr); + } + } + else +#endif + /* Not UTF mode */ + { + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (Lc == *Feptr) break; + Feptr++; + } + if (reptype != REPTYPE_POS) for (;;) + { + if (Feptr == Lstart_eptr) break; + RMATCH(Fecode, RM32); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + } + } + } + } + break; + +#undef Lstart_eptr +#undef Lmin +#undef Lmax +#undef Lc +#undef Loc + + + /* ===================================================================== */ + /* Match a bit-mapped character class, possibly repeatedly. These opcodes + are used when all the characters in the class have values in the range + 0-255, and either the matching is caseful, or the characters are in the + range 0-127 when UTF processing is enabled. The only difference between + OP_CLASS and OP_NCLASS occurs when a data character outside the range is + encountered. 
*/ + +#define Lmin F->temp_32[0] +#define Lmax F->temp_32[1] +#define Lstart_eptr F->temp_sptr[0] +#define Lbyte_map_address F->temp_sptr[1] +#define Lbyte_map ((unsigned char *)Lbyte_map_address) + + case OP_NCLASS: + case OP_CLASS: + { + Lbyte_map_address = Fecode + 1; /* Save for matching */ + Fecode += 1 + (32 / sizeof(PCRE2_UCHAR)); /* Advance past the item */ + + /* Look past the end of the item to see if there is repeat information + following. Then obey similar code to character type repeats. */ + + switch (*Fecode) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSSTAR: + case OP_CRPOSPLUS: + case OP_CRPOSQUERY: + fc = *Fecode++ - OP_CRSTAR; + Lmin = rep_min[fc]; + Lmax = rep_max[fc]; + reptype = rep_typ[fc]; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + Lmin = GET2(Fecode, 1); + Lmax = GET2(Fecode, 1 + IMM2_SIZE); + if (Lmax == 0) Lmax = UINT32_MAX; /* Max 0 => infinity */ + reptype = rep_typ[*Fecode - OP_CRSTAR]; + Fecode += 1 + 2 * IMM2_SIZE; + break; + + default: /* No repeat follows */ + Lmin = Lmax = 1; + break; + } + + /* First, ensure the minimum number of matches are present. */ + +#ifdef SUPPORT_UNICODE + if (utf) + { + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + if (fc > 255) + { + if (Fop == OP_CLASS) RRETURN(MATCH_NOMATCH); + } + else + if ((Lbyte_map[fc/8] & (1u << (fc&7))) == 0) RRETURN(MATCH_NOMATCH); + } + } + else +#endif + /* Not UTF mode */ + { + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + fc = *Feptr++; +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (fc > 255) + { + if (Fop == OP_CLASS) RRETURN(MATCH_NOMATCH); + } + else +#endif + if ((Lbyte_map[fc/8] & (1u << (fc&7))) == 0) RRETURN(MATCH_NOMATCH); + } + } + + /* If Lmax == Lmin we are done. 
Continue with main loop. */ + + if (Lmin == Lmax) continue; + + /* If minimizing, keep testing the rest of the expression and advancing + the pointer while it matches the class. */ + + if (reptype == REPTYPE_MIN) + { +#ifdef SUPPORT_UNICODE + if (utf) + { + for (;;) + { + RMATCH(Fecode, RM200); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + if (fc > 255) + { + if (Fop == OP_CLASS) RRETURN(MATCH_NOMATCH); + } + else + if ((Lbyte_map[fc/8] & (1u << (fc&7))) == 0) RRETURN(MATCH_NOMATCH); + } + } + else +#endif + /* Not UTF mode */ + { + for (;;) + { + RMATCH(Fecode, RM23); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + fc = *Feptr++; +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (fc > 255) + { + if (Fop == OP_CLASS) RRETURN(MATCH_NOMATCH); + } + else +#endif + if ((Lbyte_map[fc/8] & (1u << (fc&7))) == 0) RRETURN(MATCH_NOMATCH); + } + } + /* Control never gets here */ + } + + /* If maximizing, find the longest possible run, then work backwards. */ + + else + { + Lstart_eptr = Feptr; + +#ifdef SUPPORT_UNICODE + if (utf) + { + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc > 255) + { + if (Fop == OP_CLASS) break; + } + else + if ((Lbyte_map[fc/8] & (1u << (fc&7))) == 0) break; + Feptr += len; + } + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a + Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't + go too far. 
*/ + + for (;;) + { + RMATCH(Fecode, RM201); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Feptr-- <= Lstart_eptr) break; /* Tried at original position */ + BACKCHAR(Feptr); + } + } + else +#endif + /* Not UTF mode */ + { + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + fc = *Feptr; +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (fc > 255) + { + if (Fop == OP_CLASS) break; + } + else +#endif + if ((Lbyte_map[fc/8] & (1u << (fc&7))) == 0) break; + Feptr++; + } + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + while (Feptr >= Lstart_eptr) + { + RMATCH(Fecode, RM24); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + } + } + + RRETURN(MATCH_NOMATCH); + } + } + /* Control never gets here */ + +#undef Lbyte_map_address +#undef Lbyte_map +#undef Lstart_eptr +#undef Lmin +#undef Lmax + + + /* ===================================================================== */ + /* Match an extended character class. In the 8-bit library, this opcode is + encountered only when UTF-8 mode is supported. In the 16-bit and + 32-bit libraries, codepoints greater than 255 may be encountered even when + UTF is not supported. 
*/ + +#define Lstart_eptr F->temp_sptr[0] +#define Lxclass_data F->temp_sptr[1] +#define Lmin F->temp_32[0] +#define Lmax F->temp_32[1] + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + { + Lxclass_data = Fecode + 1 + LINK_SIZE; /* Save for matching */ + Fecode += GET(Fecode, 1); /* Advance past the item */ + + switch (*Fecode) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSSTAR: + case OP_CRPOSPLUS: + case OP_CRPOSQUERY: + fc = *Fecode++ - OP_CRSTAR; + Lmin = rep_min[fc]; + Lmax = rep_max[fc]; + reptype = rep_typ[fc]; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + Lmin = GET2(Fecode, 1); + Lmax = GET2(Fecode, 1 + IMM2_SIZE); + if (Lmax == 0) Lmax = UINT32_MAX; /* Max 0 => infinity */ + reptype = rep_typ[*Fecode - OP_CRSTAR]; + Fecode += 1 + 2 * IMM2_SIZE; + break; + + default: /* No repeat follows */ + Lmin = Lmax = 1; + break; + } + + /* First, ensure the minimum number of matches are present. */ + + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (!PRIV(xclass)(fc, Lxclass_data, utf)) RRETURN(MATCH_NOMATCH); + } + + /* If Lmax == Lmin we can just continue with the main loop. */ + + if (Lmin == Lmax) continue; + + /* If minimizing, keep testing the rest of the expression and advancing + the pointer while it matches the class. */ + + if (reptype == REPTYPE_MIN) + { + for (;;) + { + RMATCH(Fecode, RM100); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (!PRIV(xclass)(fc, Lxclass_data, utf)) RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + } + + /* If maximizing, find the longest possible run, then work backwards. 
*/ + + else + { + Lstart_eptr = Feptr; + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } +#ifdef SUPPORT_UNICODE + GETCHARLENTEST(fc, Feptr, len); +#else + fc = *Feptr; +#endif + if (!PRIV(xclass)(fc, Lxclass_data, utf)) break; + Feptr += len; + } + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a + Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't + go too far. */ + + for(;;) + { + RMATCH(Fecode, RM101); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Feptr-- <= Lstart_eptr) break; /* Tried at original position */ +#ifdef SUPPORT_UNICODE + if (utf) BACKCHAR(Feptr); +#endif + } + RRETURN(MATCH_NOMATCH); + } + + /* Control never gets here */ + } +#endif /* SUPPORT_WIDE_CHARS: end of XCLASS */ + +#undef Lstart_eptr +#undef Lxclass_data +#undef Lmin +#undef Lmax + + + /* ===================================================================== */ + /* Match various character types when PCRE2_UCP is not set. These opcodes + are not generated when PCRE2_UCP is set - instead appropriate property + tests are compiled. 
*/ + + case OP_NOT_DIGIT: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (CHMAX_255(fc) && (mb->ctypes[fc] & ctype_digit) != 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_DIGIT: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (!CHMAX_255(fc) || (mb->ctypes[fc] & ctype_digit) == 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_NOT_WHITESPACE: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (CHMAX_255(fc) && (mb->ctypes[fc] & ctype_space) != 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_WHITESPACE: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (!CHMAX_255(fc) || (mb->ctypes[fc] & ctype_space) == 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_NOT_WORDCHAR: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (CHMAX_255(fc) && (mb->ctypes[fc] & ctype_word) != 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_WORDCHAR: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (!CHMAX_255(fc) || (mb->ctypes[fc] & ctype_word) == 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_ANYNL: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + default: RRETURN(MATCH_NOMATCH); + + case CHAR_CR: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + } + else if (UCHAR21TEST(Feptr) == CHAR_LF) Feptr++; + break; + + case CHAR_LF: + break; + + case CHAR_VT: + case CHAR_FF: + case CHAR_NEL: +#ifndef EBCDIC + case 0x2028: + case 0x2029: +#endif /* Not EBCDIC */ + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) 
RRETURN(MATCH_NOMATCH); + break; + } + Fecode++; + break; + + case OP_NOT_HSPACE: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + HSPACE_CASES: RRETURN(MATCH_NOMATCH); /* Byte and multibyte cases */ + default: break; + } + Fecode++; + break; + + case OP_HSPACE: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + HSPACE_CASES: break; /* Byte and multibyte cases */ + default: RRETURN(MATCH_NOMATCH); + } + Fecode++; + break; + + case OP_NOT_VSPACE: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + VSPACE_CASES: RRETURN(MATCH_NOMATCH); + default: break; + } + Fecode++; + break; + + case OP_VSPACE: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + VSPACE_CASES: break; + default: RRETURN(MATCH_NOMATCH); + } + Fecode++; + break; + + +#ifdef SUPPORT_UNICODE + + /* ===================================================================== */ + /* Check the next character by Unicode property. We will get here only + if the support is in the binary; otherwise a compile-time error occurs. 
*/ + + case OP_PROP: + case OP_NOTPROP: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + { + const uint32_t *cp; + const ucd_record *prop = GET_UCD(fc); + + switch(Fecode[1]) + { + case PT_ANY: + if (Fop == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + break; + + case PT_LAMP: + if ((prop->chartype == ucp_Lu || + prop->chartype == ucp_Ll || + prop->chartype == ucp_Lt) == (Fop == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + break; + + case PT_GC: + if ((Fecode[2] != PRIV(ucp_gentype)[prop->chartype]) == (Fop == OP_PROP)) + RRETURN(MATCH_NOMATCH); + break; + + case PT_PC: + if ((Fecode[2] != prop->chartype) == (Fop == OP_PROP)) + RRETURN(MATCH_NOMATCH); + break; + + case PT_SC: + if ((Fecode[2] != prop->script) == (Fop == OP_PROP)) + RRETURN(MATCH_NOMATCH); + break; + + /* These are specials */ + + case PT_ALNUM: + if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L || + PRIV(ucp_gentype)[prop->chartype] == ucp_N) == (Fop == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + break; + + /* Perl space used to exclude VT, but from Perl 5.18 it is included, + which means that Perl space and POSIX space are now identical. PCRE + was changed at release 8.34. 
*/ + + case PT_SPACE: /* Perl space */ + case PT_PXSPACE: /* POSIX space */ + switch(fc) + { + HSPACE_CASES: + VSPACE_CASES: + if (Fop == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + break; + + default: + if ((PRIV(ucp_gentype)[prop->chartype] == ucp_Z) == + (Fop == OP_NOTPROP)) RRETURN(MATCH_NOMATCH); + break; + } + break; + + case PT_WORD: + if ((PRIV(ucp_gentype)[prop->chartype] == ucp_L || + PRIV(ucp_gentype)[prop->chartype] == ucp_N || + fc == CHAR_UNDERSCORE) == (Fop == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + break; + + case PT_CLIST: + cp = PRIV(ucd_caseless_sets) + Fecode[2]; + for (;;) + { + if (fc < *cp) + { if (Fop == OP_PROP) { RRETURN(MATCH_NOMATCH); } else break; } + if (fc == *cp++) + { if (Fop == OP_PROP) break; else { RRETURN(MATCH_NOMATCH); } } + } + break; + + case PT_UCNC: + if ((fc == CHAR_DOLLAR_SIGN || fc == CHAR_COMMERCIAL_AT || + fc == CHAR_GRAVE_ACCENT || (fc >= 0xa0 && fc <= 0xd7ff) || + fc >= 0xe000) == (Fop == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + break; + + /* This should never occur */ + + default: + return PCRE2_ERROR_INTERNAL; + } + + Fecode += 3; + } + break; + + + /* ===================================================================== */ + /* Match an extended Unicode sequence. We will get here only if the support + is in the binary; otherwise a compile-time error occurs. */ + + case OP_EXTUNI: + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + else + { + GETCHARINCTEST(fc, Feptr); + Feptr = PRIV(extuni)(fc, Feptr, mb->start_subject, mb->end_subject, utf, + NULL); + } + CHECK_PARTIAL(); + Fecode++; + break; + +#endif /* SUPPORT_UNICODE */ + + + /* ===================================================================== */ + /* Match a single character type repeatedly. Note that the property type + does not need to be in a stack frame as it is not used within an RMATCH() + loop. 
*/ + +#define Lstart_eptr F->temp_sptr[0] +#define Lmin F->temp_32[0] +#define Lmax F->temp_32[1] +#define Lctype F->temp_32[2] +#define Lpropvalue F->temp_32[3] + + case OP_TYPEEXACT: + Lmin = Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATTYPE; + + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + Lmin = 0; + Lmax = GET2(Fecode, 1); + reptype = (*Fecode == OP_TYPEMINUPTO)? REPTYPE_MIN : REPTYPE_MAX; + Fecode += 1 + IMM2_SIZE; + goto REPEATTYPE; + + case OP_TYPEPOSSTAR: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = UINT32_MAX; + Fecode++; + goto REPEATTYPE; + + case OP_TYPEPOSPLUS: + reptype = REPTYPE_POS; + Lmin = 1; + Lmax = UINT32_MAX; + Fecode++; + goto REPEATTYPE; + + case OP_TYPEPOSQUERY: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = 1; + Fecode++; + goto REPEATTYPE; + + case OP_TYPEPOSUPTO: + reptype = REPTYPE_POS; + Lmin = 0; + Lmax = GET2(Fecode, 1); + Fecode += 1 + IMM2_SIZE; + goto REPEATTYPE; + + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + fc = *Fecode++ - OP_TYPESTAR; + Lmin = rep_min[fc]; + Lmax = rep_max[fc]; + reptype = rep_typ[fc]; + + /* Common code for all repeated character type matches. */ + + REPEATTYPE: + Lctype = *Fecode++; /* Code for the character type */ + +#ifdef SUPPORT_UNICODE + if (Lctype == OP_PROP || Lctype == OP_NOTPROP) + { + proptype = *Fecode++; + Lpropvalue = *Fecode++; + } + else proptype = -1; +#endif + + /* First, ensure the minimum number of matches are present. Use inline + code for maximizing the speed, and do the type test once at the start + (i.e. keep it out of the loop). The code for UTF mode is separated out for + tidiness, except for Unicode property tests. 
*/ + + if (Lmin > 0) + { +#ifdef SUPPORT_UNICODE + if (proptype >= 0) /* Property tests in all modes */ + { + switch(proptype) + { + case PT_ANY: + if (Lctype == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + } + break; + + case PT_LAMP: + for (i = 1; i <= Lmin; i++) + { + int chartype; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + chartype = UCD_CHARTYPE(fc); + if ((chartype == ucp_Lu || + chartype == ucp_Ll || + chartype == ucp_Lt) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + case PT_GC: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((UCD_CATEGORY(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + case PT_PC: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((UCD_CHARTYPE(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + case PT_SC: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((UCD_SCRIPT(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + case PT_ALNUM: + for (i = 1; i <= Lmin; i++) + { + int category; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + category = UCD_CATEGORY(fc); + if ((category == ucp_L || category == ucp_N) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + /* Perl space used to exclude VT, but from Perl 5.18 it is included, + which means that Perl space and POSIX space are now identical. 
PCRE + was changed at release 8.34. */ + + case PT_SPACE: /* Perl space */ + case PT_PXSPACE: /* POSIX space */ + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + HSPACE_CASES: + VSPACE_CASES: + if (Lctype == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + break; + + default: + if ((UCD_CATEGORY(fc) == ucp_Z) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + break; + } + } + break; + + case PT_WORD: + for (i = 1; i <= Lmin; i++) + { + int category; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + category = UCD_CATEGORY(fc); + if ((category == ucp_L || category == ucp_N || + fc == CHAR_UNDERSCORE) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + case PT_CLIST: + for (i = 1; i <= Lmin; i++) + { + const uint32_t *cp; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + cp = PRIV(ucd_caseless_sets) + Lpropvalue; + for (;;) + { + if (fc < *cp) + { + if (Lctype == OP_NOTPROP) break; + RRETURN(MATCH_NOMATCH); + } + if (fc == *cp++) + { + if (Lctype == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + break; + } + } + } + break; + + case PT_UCNC: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((fc == CHAR_DOLLAR_SIGN || fc == CHAR_COMMERCIAL_AT || + fc == CHAR_GRAVE_ACCENT || (fc >= 0xa0 && fc <= 0xd7ff) || + fc >= 0xe000) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + break; + + /* This should not occur */ + + default: + return PCRE2_ERROR_INTERNAL; + } + } + + /* Match extended Unicode sequences. We will get here only if the + support is in the binary; otherwise a compile-time error occurs. 
*/ + + else if (Lctype == OP_EXTUNI) + { + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + else + { + GETCHARINCTEST(fc, Feptr); + Feptr = PRIV(extuni)(fc, Feptr, mb->start_subject, + mb->end_subject, utf, NULL); + } + CHECK_PARTIAL(); + } + } + else +#endif /* SUPPORT_UNICODE */ + +/* Handle all other cases in UTF mode */ + +#ifdef SUPPORT_UNICODE + if (utf) switch(Lctype) + { + case OP_ANY: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (IS_NEWLINE(Feptr)) RRETURN(MATCH_NOMATCH); + if (mb->partial != 0 && + Feptr + 1 >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + UCHAR21(Feptr) == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + Feptr++; + ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); + } + break; + + case OP_ALLANY: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + Feptr++; + ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); + } + break; + + case OP_ANYBYTE: + if (Feptr > mb->end_subject - Lmin) RRETURN(MATCH_NOMATCH); + Feptr += Lmin; + break; + + case OP_ANYNL: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + switch(fc) + { + default: RRETURN(MATCH_NOMATCH); + + case CHAR_CR: + if (Feptr < mb->end_subject && UCHAR21(Feptr) == CHAR_LF) Feptr++; + break; + + case CHAR_LF: + break; + + case CHAR_VT: + case CHAR_FF: + case CHAR_NEL: +#ifndef EBCDIC + case 0x2028: + case 0x2029: +#endif /* Not EBCDIC */ + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) RRETURN(MATCH_NOMATCH); + break; + } + } + break; + + case OP_NOT_HSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + 
GETCHARINC(fc, Feptr); + switch(fc) + { + HSPACE_CASES: RRETURN(MATCH_NOMATCH); + default: break; + } + } + break; + + case OP_HSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + switch(fc) + { + HSPACE_CASES: break; + default: RRETURN(MATCH_NOMATCH); + } + } + break; + + case OP_NOT_VSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + switch(fc) + { + VSPACE_CASES: RRETURN(MATCH_NOMATCH); + default: break; + } + } + break; + + case OP_VSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + switch(fc) + { + VSPACE_CASES: break; + default: RRETURN(MATCH_NOMATCH); + } + } + break; + + case OP_NOT_DIGIT: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINC(fc, Feptr); + if (fc < 128 && (mb->ctypes[fc] & ctype_digit) != 0) + RRETURN(MATCH_NOMATCH); + } + break; + + case OP_DIGIT: + for (i = 1; i <= Lmin; i++) + { + uint32_t cc; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21(Feptr); + if (cc >= 128 || (mb->ctypes[cc] & ctype_digit) == 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + /* No need to skip more code units - we know it has only one. 
*/ + } + break; + + case OP_NOT_WHITESPACE: + for (i = 1; i <= Lmin; i++) + { + uint32_t cc; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21(Feptr); + if (cc < 128 && (mb->ctypes[cc] & ctype_space) != 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); + } + break; + + case OP_WHITESPACE: + for (i = 1; i <= Lmin; i++) + { + uint32_t cc; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21(Feptr); + if (cc >= 128 || (mb->ctypes[cc] & ctype_space) == 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + /* No need to skip more code units - we know it has only one. */ + } + break; + + case OP_NOT_WORDCHAR: + for (i = 1; i <= Lmin; i++) + { + uint32_t cc; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21(Feptr); + if (cc < 128 && (mb->ctypes[cc] & ctype_word) != 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); + } + break; + + case OP_WORDCHAR: + for (i = 1; i <= Lmin; i++) + { + uint32_t cc; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + cc = UCHAR21(Feptr); + if (cc >= 128 || (mb->ctypes[cc] & ctype_word) == 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + /* No need to skip more code units - we know it has only one. */ + } + break; + + default: + return PCRE2_ERROR_INTERNAL; + } /* End switch(Lctype) */ + + else +#endif /* SUPPORT_UNICODE */ + + /* Code for the non-UTF case for minimum matching of operators other + than OP_PROP and OP_NOTPROP. 
*/ + + switch(Lctype) + { + case OP_ANY: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (IS_NEWLINE(Feptr)) RRETURN(MATCH_NOMATCH); + if (mb->partial != 0 && + Feptr + 1 >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + *Feptr == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + Feptr++; + } + break; + + case OP_ALLANY: + if (Feptr > mb->end_subject - Lmin) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + Feptr += Lmin; + break; + + /* This OP_ANYBYTE case will never be reached because \C gets turned + into OP_ALLANY in non-UTF mode. Cut out the code so that coverage + reports don't complain about its never being used. */ + +/* case OP_ANYBYTE: +* if (Feptr > mb->end_subject - Lmin) +* { +* SCHECK_PARTIAL(); +* RRETURN(MATCH_NOMATCH); +* } +* Feptr += Lmin; +* break; +*/ + case OP_ANYNL: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + switch(*Feptr++) + { + default: RRETURN(MATCH_NOMATCH); + + case CHAR_CR: + if (Feptr < mb->end_subject && *Feptr == CHAR_LF) Feptr++; + break; + + case CHAR_LF: + break; + + case CHAR_VT: + case CHAR_FF: + case CHAR_NEL: +#if PCRE2_CODE_UNIT_WIDTH != 8 + case 0x2028: + case 0x2029: +#endif + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) RRETURN(MATCH_NOMATCH); + break; + } + } + break; + + case OP_NOT_HSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + switch(*Feptr++) + { + default: break; + HSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + HSPACE_MULTIBYTE_CASES: +#endif + RRETURN(MATCH_NOMATCH); + } + } + break; + + case OP_HSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + switch(*Feptr++) + { + default: RRETURN(MATCH_NOMATCH); +
HSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + HSPACE_MULTIBYTE_CASES: +#endif + break; + } + } + break; + + case OP_NOT_VSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + switch(*Feptr++) + { + VSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + VSPACE_MULTIBYTE_CASES: +#endif + RRETURN(MATCH_NOMATCH); + default: break; + } + } + break; + + case OP_VSPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + switch(*Feptr++) + { + default: RRETURN(MATCH_NOMATCH); + VSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + VSPACE_MULTIBYTE_CASES: +#endif + break; + } + } + break; + + case OP_NOT_DIGIT: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (MAX_255(*Feptr) && (mb->ctypes[*Feptr] & ctype_digit) != 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + } + break; + + case OP_DIGIT: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (!MAX_255(*Feptr) || (mb->ctypes[*Feptr] & ctype_digit) == 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + } + break; + + case OP_NOT_WHITESPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (MAX_255(*Feptr) && (mb->ctypes[*Feptr] & ctype_space) != 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + } + break; + + case OP_WHITESPACE: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (!MAX_255(*Feptr) || (mb->ctypes[*Feptr] & ctype_space) == 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + } + break; + + case OP_NOT_WORDCHAR: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (MAX_255(*Feptr) && (mb->ctypes[*Feptr] & ctype_word) != 0) + 
RRETURN(MATCH_NOMATCH); + Feptr++; + } + break; + + case OP_WORDCHAR: + for (i = 1; i <= Lmin; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (!MAX_255(*Feptr) || (mb->ctypes[*Feptr] & ctype_word) == 0) + RRETURN(MATCH_NOMATCH); + Feptr++; + } + break; + + default: + return PCRE2_ERROR_INTERNAL; + } + } + + /* If Lmin = Lmax we are done. Continue with the main loop. */ + + if (Lmin == Lmax) continue; + + /* If minimizing, we have to test the rest of the pattern before each + subsequent match. */ + + if (reptype == REPTYPE_MIN) + { +#ifdef SUPPORT_UNICODE + if (proptype >= 0) + { + switch(proptype) + { + case PT_ANY: + for (;;) + { + RMATCH(Fecode, RM208); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if (Lctype == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + case PT_LAMP: + for (;;) + { + int chartype; + RMATCH(Fecode, RM209); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + chartype = UCD_CHARTYPE(fc); + if ((chartype == ucp_Lu || + chartype == ucp_Ll || + chartype == ucp_Lt) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + case PT_GC: + for (;;) + { + RMATCH(Fecode, RM210); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((UCD_CATEGORY(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + case PT_PC: + for (;;) + { + RMATCH(Fecode, RM211); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) 
RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((UCD_CHARTYPE(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + case PT_SC: + for (;;) + { + RMATCH(Fecode, RM212); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((UCD_SCRIPT(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + case PT_ALNUM: + for (;;) + { + int category; + RMATCH(Fecode, RM213); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + category = UCD_CATEGORY(fc); + if ((category == ucp_L || category == ucp_N) == + (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + /* Perl space used to exclude VT, but from Perl 5.18 it is included, + which means that Perl space and POSIX space are now identical. PCRE + was changed at release 8.34. 
*/ + + case PT_SPACE: /* Perl space */ + case PT_PXSPACE: /* POSIX space */ + for (;;) + { + RMATCH(Fecode, RM214); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + switch(fc) + { + HSPACE_CASES: + VSPACE_CASES: + if (Lctype == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + break; + + default: + if ((UCD_CATEGORY(fc) == ucp_Z) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + break; + } + } + /* Control never gets here */ + + case PT_WORD: + for (;;) + { + int category; + RMATCH(Fecode, RM215); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + category = UCD_CATEGORY(fc); + if ((category == ucp_L || + category == ucp_N || + fc == CHAR_UNDERSCORE) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + case PT_CLIST: + for (;;) + { + const uint32_t *cp; + RMATCH(Fecode, RM216); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + cp = PRIV(ucd_caseless_sets) + Lpropvalue; + for (;;) + { + if (fc < *cp) + { + if (Lctype == OP_NOTPROP) break; + RRETURN(MATCH_NOMATCH); + } + if (fc == *cp++) + { + if (Lctype == OP_NOTPROP) RRETURN(MATCH_NOMATCH); + break; + } + } + } + /* Control never gets here */ + + case PT_UCNC: + for (;;) + { + RMATCH(Fecode, RM217); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + GETCHARINCTEST(fc, Feptr); + if ((fc == CHAR_DOLLAR_SIGN || fc == CHAR_COMMERCIAL_AT || + fc == CHAR_GRAVE_ACCENT || (fc >= 0xa0 && fc <= 0xd7ff) || + fc >= 
0xe000) == (Lctype == OP_NOTPROP)) + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + + /* This should never occur */ + default: + return PCRE2_ERROR_INTERNAL; + } + } + + /* Match extended Unicode sequences. We will get here only if the + support is in the binary; otherwise a compile-time error occurs. */ + + else if (Lctype == OP_EXTUNI) + { + for (;;) + { + RMATCH(Fecode, RM218); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + else + { + GETCHARINCTEST(fc, Feptr); + Feptr = PRIV(extuni)(fc, Feptr, mb->start_subject, mb->end_subject, + utf, NULL); + } + CHECK_PARTIAL(); + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* UTF mode for non-property testing character types. */ + +#ifdef SUPPORT_UNICODE + if (utf) + { + for (;;) + { + RMATCH(Fecode, RM219); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lctype == OP_ANY && IS_NEWLINE(Feptr)) RRETURN(MATCH_NOMATCH); + GETCHARINC(fc, Feptr); + switch(Lctype) + { + case OP_ANY: /* This is the non-NL case */ + if (mb->partial != 0 && /* Take care with CRLF partial */ + Feptr >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + fc == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + break; + + case OP_ALLANY: + case OP_ANYBYTE: + break; + + case OP_ANYNL: + switch(fc) + { + default: RRETURN(MATCH_NOMATCH); + + case CHAR_CR: + if (Feptr < mb->end_subject && UCHAR21(Feptr) == CHAR_LF) Feptr++; + break; + + case CHAR_LF: + break; + + case CHAR_VT: + case CHAR_FF: + case CHAR_NEL: +#ifndef EBCDIC + case 0x2028: + case 0x2029: +#endif /* Not EBCDIC */ + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) + RRETURN(MATCH_NOMATCH); + break; + } + break; + + case OP_NOT_HSPACE: + 
switch(fc) + { + HSPACE_CASES: RRETURN(MATCH_NOMATCH); + default: break; + } + break; + + case OP_HSPACE: + switch(fc) + { + HSPACE_CASES: break; + default: RRETURN(MATCH_NOMATCH); + } + break; + + case OP_NOT_VSPACE: + switch(fc) + { + VSPACE_CASES: RRETURN(MATCH_NOMATCH); + default: break; + } + break; + + case OP_VSPACE: + switch(fc) + { + VSPACE_CASES: break; + default: RRETURN(MATCH_NOMATCH); + } + break; + + case OP_NOT_DIGIT: + if (fc < 256 && (mb->ctypes[fc] & ctype_digit) != 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_DIGIT: + if (fc >= 256 || (mb->ctypes[fc] & ctype_digit) == 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_NOT_WHITESPACE: + if (fc < 256 && (mb->ctypes[fc] & ctype_space) != 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_WHITESPACE: + if (fc >= 256 || (mb->ctypes[fc] & ctype_space) == 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_NOT_WORDCHAR: + if (fc < 256 && (mb->ctypes[fc] & ctype_word) != 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_WORDCHAR: + if (fc >= 256 || (mb->ctypes[fc] & ctype_word) == 0) + RRETURN(MATCH_NOMATCH); + break; + + default: + return PCRE2_ERROR_INTERNAL; + } + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF mode */ + { + for (;;) + { + RMATCH(Fecode, RM33); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + if (Lctype == OP_ANY && IS_NEWLINE(Feptr)) + RRETURN(MATCH_NOMATCH); + fc = *Feptr++; + switch(Lctype) + { + case OP_ANY: /* This is the non-NL case */ + if (mb->partial != 0 && /* Take care with CRLF partial */ + Feptr >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + fc == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + break; + + case OP_ALLANY: + case OP_ANYBYTE: + break; + + case OP_ANYNL: + switch(fc) + { + default: RRETURN(MATCH_NOMATCH); + + case CHAR_CR: + if (Feptr < 
mb->end_subject && *Feptr == CHAR_LF) Feptr++; + break; + + case CHAR_LF: + break; + + case CHAR_VT: + case CHAR_FF: + case CHAR_NEL: +#if PCRE2_CODE_UNIT_WIDTH != 8 + case 0x2028: + case 0x2029: +#endif + if (mb->bsr_convention == PCRE2_BSR_ANYCRLF) + RRETURN(MATCH_NOMATCH); + break; + } + break; + + case OP_NOT_HSPACE: + switch(fc) + { + default: break; + HSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + HSPACE_MULTIBYTE_CASES: +#endif + RRETURN(MATCH_NOMATCH); + } + break; + + case OP_HSPACE: + switch(fc) + { + default: RRETURN(MATCH_NOMATCH); + HSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + HSPACE_MULTIBYTE_CASES: +#endif + break; + } + break; + + case OP_NOT_VSPACE: + switch(fc) + { + default: break; + VSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + VSPACE_MULTIBYTE_CASES: +#endif + RRETURN(MATCH_NOMATCH); + } + break; + + case OP_VSPACE: + switch(fc) + { + default: RRETURN(MATCH_NOMATCH); + VSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + VSPACE_MULTIBYTE_CASES: +#endif + break; + } + break; + + case OP_NOT_DIGIT: + if (MAX_255(fc) && (mb->ctypes[fc] & ctype_digit) != 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_DIGIT: + if (!MAX_255(fc) || (mb->ctypes[fc] & ctype_digit) == 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_NOT_WHITESPACE: + if (MAX_255(fc) && (mb->ctypes[fc] & ctype_space) != 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_WHITESPACE: + if (!MAX_255(fc) || (mb->ctypes[fc] & ctype_space) == 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_NOT_WORDCHAR: + if (MAX_255(fc) && (mb->ctypes[fc] & ctype_word) != 0) + RRETURN(MATCH_NOMATCH); + break; + + case OP_WORDCHAR: + if (!MAX_255(fc) || (mb->ctypes[fc] & ctype_word) == 0) + RRETURN(MATCH_NOMATCH); + break; + + default: + return PCRE2_ERROR_INTERNAL; + } + } + } + /* Control never gets here */ + } + + /* If maximizing, it is worth using inline code for speed, doing the type + test once at the start (i.e. keep it out of the loop). 
*/ + + else + { + Lstart_eptr = Feptr; /* Remember where we started */ + +#ifdef SUPPORT_UNICODE + if (proptype >= 0) + { + switch(proptype) + { + case PT_ANY: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + if (Lctype == OP_NOTPROP) break; + Feptr+= len; + } + break; + + case PT_LAMP: + for (i = Lmin; i < Lmax; i++) + { + int chartype; + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + chartype = UCD_CHARTYPE(fc); + if ((chartype == ucp_Lu || + chartype == ucp_Ll || + chartype == ucp_Lt) == (Lctype == OP_NOTPROP)) + break; + Feptr+= len; + } + break; + + case PT_GC: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + if ((UCD_CATEGORY(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + break; + Feptr+= len; + } + break; + + case PT_PC: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + if ((UCD_CHARTYPE(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + break; + Feptr+= len; + } + break; + + case PT_SC: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + if ((UCD_SCRIPT(fc) == Lpropvalue) == (Lctype == OP_NOTPROP)) + break; + Feptr+= len; + } + break; + + case PT_ALNUM: + for (i = Lmin; i < Lmax; i++) + { + int category; + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + category = UCD_CATEGORY(fc); + if ((category == ucp_L || category == ucp_N) == + (Lctype == OP_NOTPROP)) + break; + Feptr+= len; + } + break; + + /* Perl space used to exclude VT, but from Perl 5.18 it is included, + which means that Perl space and POSIX 
space are now identical. PCRE + was changed at release 8.34. */ + + case PT_SPACE: /* Perl space */ + case PT_PXSPACE: /* POSIX space */ + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + switch(fc) + { + HSPACE_CASES: + VSPACE_CASES: + if (Lctype == OP_NOTPROP) goto ENDLOOP99; /* Break the loop */ + break; + + default: + if ((UCD_CATEGORY(fc) == ucp_Z) == (Lctype == OP_NOTPROP)) + goto ENDLOOP99; /* Break the loop */ + break; + } + Feptr+= len; + } + ENDLOOP99: + break; + + case PT_WORD: + for (i = Lmin; i < Lmax; i++) + { + int category; + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + category = UCD_CATEGORY(fc); + if ((category == ucp_L || category == ucp_N || + fc == CHAR_UNDERSCORE) == (Lctype == OP_NOTPROP)) + break; + Feptr+= len; + } + break; + + case PT_CLIST: + for (i = Lmin; i < Lmax; i++) + { + const uint32_t *cp; + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + cp = PRIV(ucd_caseless_sets) + Lpropvalue; + for (;;) + { + if (fc < *cp) + { if (Lctype == OP_NOTPROP) break; else goto GOT_MAX; } + if (fc == *cp++) + { if (Lctype == OP_NOTPROP) goto GOT_MAX; else break; } + } + Feptr += len; + } + GOT_MAX: + break; + + case PT_UCNC: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLENTEST(fc, Feptr, len); + if ((fc == CHAR_DOLLAR_SIGN || fc == CHAR_COMMERCIAL_AT || + fc == CHAR_GRAVE_ACCENT || (fc >= 0xa0 && fc <= 0xd7ff) || + fc >= 0xe000) == (Lctype == OP_NOTPROP)) + break; + Feptr += len; + } + break; + + default: + return PCRE2_ERROR_INTERNAL; + } + + /* Feptr is now past the end of the maximum run */ + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a 
+ Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't + go too far. */ + + for(;;) + { + if (Feptr <= Lstart_eptr) break; + RMATCH(Fecode, RM222); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + if (utf) BACKCHAR(Feptr); + } + } + + /* Match extended Unicode grapheme clusters. We will get here only if the + support is in the binary; otherwise a compile-time error occurs. */ + + else if (Lctype == OP_EXTUNI) + { + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + else + { + GETCHARINCTEST(fc, Feptr); + Feptr = PRIV(extuni)(fc, Feptr, mb->start_subject, mb->end_subject, + utf, NULL); + } + CHECK_PARTIAL(); + } + + /* Feptr is now past the end of the maximum run */ + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + /* We use <= Lstart_eptr rather than == Lstart_eptr to detect the start + of the run while backtracking because the use of \C in UTF mode can + cause BACKCHAR to move back past Lstart_eptr. This is just palliative; + the use of \C in UTF mode is fraught with danger. */ + + for(;;) + { + int lgb, rgb; + PCRE2_SPTR fptr; + + if (Feptr <= Lstart_eptr) break; /* At start of char run */ + RMATCH(Fecode, RM220); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + + /* Backtracking over an extended grapheme cluster involves inspecting + the previous two characters (if present) to see if a break is + permitted between them. 
*/ + + Feptr--; + if (!utf) fc = *Feptr; else + { + BACKCHAR(Feptr); + GETCHAR(fc, Feptr); + } + rgb = UCD_GRAPHBREAK(fc); + + for (;;) + { + if (Feptr <= Lstart_eptr) break; /* At start of char run */ + fptr = Feptr - 1; + if (!utf) fc = *fptr; else + { + BACKCHAR(fptr); + GETCHAR(fc, fptr); + } + lgb = UCD_GRAPHBREAK(fc); + if ((PRIV(ucp_gbtable)[lgb] & (1u << rgb)) == 0) break; + Feptr = fptr; + rgb = lgb; + } + } + } + + else +#endif /* SUPPORT_UNICODE */ + +#ifdef SUPPORT_UNICODE + if (utf) + { + switch(Lctype) + { + case OP_ANY: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (IS_NEWLINE(Feptr)) break; + if (mb->partial != 0 && /* Take care with CRLF partial */ + Feptr + 1 >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + UCHAR21(Feptr) == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + Feptr++; + ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); + } + break; + + case OP_ALLANY: + if (Lmax < UINT32_MAX) + { + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + Feptr++; + ACROSSCHAR(Feptr < mb->end_subject, Feptr, Feptr++); + } + } + else + { + Feptr = mb->end_subject; /* Unlimited UTF-8 repeat */ + SCHECK_PARTIAL(); + } + break; + + /* The "byte" (i.e. 
"code unit") case is the same as non-UTF */ + + case OP_ANYBYTE: + fc = Lmax - Lmin; + if (fc > (uint32_t)(mb->end_subject - Feptr)) + { + Feptr = mb->end_subject; + SCHECK_PARTIAL(); + } + else Feptr += fc; + break; + + case OP_ANYNL: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc == CHAR_CR) + { + if (++Feptr >= mb->end_subject) break; + if (UCHAR21(Feptr) == CHAR_LF) Feptr++; + } + else + { + if (fc != CHAR_LF && + (mb->bsr_convention == PCRE2_BSR_ANYCRLF || + (fc != CHAR_VT && fc != CHAR_FF && fc != CHAR_NEL +#ifndef EBCDIC + && fc != 0x2028 && fc != 0x2029 +#endif /* Not EBCDIC */ + ))) + break; + Feptr += len; + } + } + break; + + case OP_NOT_HSPACE: + case OP_HSPACE: + for (i = Lmin; i < Lmax; i++) + { + BOOL gotspace; + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + switch(fc) + { + HSPACE_CASES: gotspace = TRUE; break; + default: gotspace = FALSE; break; + } + if (gotspace == (Lctype == OP_NOT_HSPACE)) break; + Feptr += len; + } + break; + + case OP_NOT_VSPACE: + case OP_VSPACE: + for (i = Lmin; i < Lmax; i++) + { + BOOL gotspace; + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + switch(fc) + { + VSPACE_CASES: gotspace = TRUE; break; + default: gotspace = FALSE; break; + } + if (gotspace == (Lctype == OP_NOT_VSPACE)) break; + Feptr += len; + } + break; + + case OP_NOT_DIGIT: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc < 256 && (mb->ctypes[fc] & ctype_digit) != 0) break; + Feptr+= len; + } + break; + + case OP_DIGIT: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc >= 256 ||(mb->ctypes[fc] & 
ctype_digit) == 0) break; + Feptr+= len; + } + break; + + case OP_NOT_WHITESPACE: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc < 256 && (mb->ctypes[fc] & ctype_space) != 0) break; + Feptr+= len; + } + break; + + case OP_WHITESPACE: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc >= 256 ||(mb->ctypes[fc] & ctype_space) == 0) break; + Feptr+= len; + } + break; + + case OP_NOT_WORDCHAR: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc < 256 && (mb->ctypes[fc] & ctype_word) != 0) break; + Feptr+= len; + } + break; + + case OP_WORDCHAR: + for (i = Lmin; i < Lmax; i++) + { + int len = 1; + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + GETCHARLEN(fc, Feptr, len); + if (fc >= 256 || (mb->ctypes[fc] & ctype_word) == 0) break; + Feptr+= len; + } + break; + + default: + return PCRE2_ERROR_INTERNAL; + } + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + /* After \C in UTF mode, Lstart_eptr might be in the middle of a + Unicode character. Use <= Lstart_eptr to ensure backtracking doesn't go + too far. 
*/ + + for(;;) + { + if (Feptr <= Lstart_eptr) break; + RMATCH(Fecode, RM221); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + BACKCHAR(Feptr); + if (Lctype == OP_ANYNL && Feptr > Lstart_eptr && + UCHAR21(Feptr) == CHAR_NL && UCHAR21(Feptr - 1) == CHAR_CR) + Feptr--; + } + } + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF mode */ + { + switch(Lctype) + { + case OP_ANY: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (IS_NEWLINE(Feptr)) break; + if (mb->partial != 0 && /* Take care with CRLF partial */ + Feptr + 1 >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + *Feptr == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + Feptr++; + } + break; + + case OP_ALLANY: + case OP_ANYBYTE: + fc = Lmax - Lmin; + if (fc > (uint32_t)(mb->end_subject - Feptr)) + { + Feptr = mb->end_subject; + SCHECK_PARTIAL(); + } + else Feptr += fc; + break; + + case OP_ANYNL: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + fc = *Feptr; + if (fc == CHAR_CR) + { + if (++Feptr >= mb->end_subject) break; + if (*Feptr == CHAR_LF) Feptr++; + } + else + { + if (fc != CHAR_LF && (mb->bsr_convention == PCRE2_BSR_ANYCRLF || + (fc != CHAR_VT && fc != CHAR_FF && fc != CHAR_NEL +#if PCRE2_CODE_UNIT_WIDTH != 8 + && fc != 0x2028 && fc != 0x2029 +#endif + ))) break; + Feptr++; + } + } + break; + + case OP_NOT_HSPACE: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + switch(*Feptr) + { + default: Feptr++; break; + HSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + HSPACE_MULTIBYTE_CASES: +#endif + goto ENDLOOP00; + } + } + ENDLOOP00: + break; + + case OP_HSPACE: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + switch(*Feptr) + { + default: goto ENDLOOP01; + HSPACE_BYTE_CASES: +#if 
PCRE2_CODE_UNIT_WIDTH != 8 + HSPACE_MULTIBYTE_CASES: +#endif + Feptr++; break; + } + } + ENDLOOP01: + break; + + case OP_NOT_VSPACE: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + switch(*Feptr) + { + default: Feptr++; break; + VSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + VSPACE_MULTIBYTE_CASES: +#endif + goto ENDLOOP02; + } + } + ENDLOOP02: + break; + + case OP_VSPACE: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + switch(*Feptr) + { + default: goto ENDLOOP03; + VSPACE_BYTE_CASES: +#if PCRE2_CODE_UNIT_WIDTH != 8 + VSPACE_MULTIBYTE_CASES: +#endif + Feptr++; break; + } + } + ENDLOOP03: + break; + + case OP_NOT_DIGIT: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (MAX_255(*Feptr) && (mb->ctypes[*Feptr] & ctype_digit) != 0) + break; + Feptr++; + } + break; + + case OP_DIGIT: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (!MAX_255(*Feptr) || (mb->ctypes[*Feptr] & ctype_digit) == 0) + break; + Feptr++; + } + break; + + case OP_NOT_WHITESPACE: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (MAX_255(*Feptr) && (mb->ctypes[*Feptr] & ctype_space) != 0) + break; + Feptr++; + } + break; + + case OP_WHITESPACE: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (!MAX_255(*Feptr) || (mb->ctypes[*Feptr] & ctype_space) == 0) + break; + Feptr++; + } + break; + + case OP_NOT_WORDCHAR: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + if (MAX_255(*Feptr) && (mb->ctypes[*Feptr] & ctype_word) != 0) + break; + Feptr++; + } + break; + + case OP_WORDCHAR: + for (i = Lmin; i < Lmax; i++) + { + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + break; + } + 
if (!MAX_255(*Feptr) || (mb->ctypes[*Feptr] & ctype_word) == 0) + break; + Feptr++; + } + break; + + default: + return PCRE2_ERROR_INTERNAL; + } + + if (reptype == REPTYPE_POS) continue; /* No backtracking */ + + for (;;) + { + if (Feptr == Lstart_eptr) break; + RMATCH(Fecode, RM34); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr--; + if (Lctype == OP_ANYNL && Feptr > Lstart_eptr && *Feptr == CHAR_LF && + Feptr[-1] == CHAR_CR) Feptr--; + } + } + } + break; /* End of repeat character type processing */ + +#undef Lstart_eptr +#undef Lmin +#undef Lmax +#undef Lctype +#undef Lpropvalue + + + /* ===================================================================== */ + /* Match a back reference, possibly repeatedly. Look past the end of the + item to see if there is repeat information following. The OP_REF and + OP_REFI opcodes are used for a reference to a numbered group or to a + non-duplicated named group. For a duplicated named group, OP_DNREF and + OP_DNREFI are used. In this case we must scan the list of groups to which + the name refers, and use the first one that is set. */ + +#define Lmin F->temp_32[0] +#define Lmax F->temp_32[1] +#define Lcaseless F->temp_32[2] +#define Lstart F->temp_sptr[0] +#define Loffset F->temp_size + + case OP_DNREF: + case OP_DNREFI: + Lcaseless = (Fop == OP_DNREFI); + { + int count = GET2(Fecode, 1+IMM2_SIZE); + PCRE2_SPTR slot = mb->name_table + GET2(Fecode, 1) * mb->name_entry_size; + Fecode += 1 + 2*IMM2_SIZE; + + while (count-- > 0) + { + Loffset = (GET2(slot, 0) << 1) - 2; + if (Loffset < Foffset_top && Fovector[Loffset] != PCRE2_UNSET) break; + slot += mb->name_entry_size; + } + } + goto REF_REPEAT; + + case OP_REF: + case OP_REFI: + Lcaseless = (Fop == OP_REFI); + Loffset = (GET2(Fecode, 1) << 1) - 2; + Fecode += 1 + IMM2_SIZE; + + /* Set up for repetition, or handle the non-repeated case. The maximum and + minimum must be in the heap frame, but as they are short-term values, we + use temporary fields. 
*/ + + REF_REPEAT: + switch (*Fecode) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRQUERY: + case OP_CRMINQUERY: + fc = *Fecode++ - OP_CRSTAR; + Lmin = rep_min[fc]; + Lmax = rep_max[fc]; + reptype = rep_typ[fc]; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + Lmin = GET2(Fecode, 1); + Lmax = GET2(Fecode, 1 + IMM2_SIZE); + reptype = rep_typ[*Fecode - OP_CRSTAR]; + if (Lmax == 0) Lmax = UINT32_MAX; /* Max 0 => infinity */ + Fecode += 1 + 2 * IMM2_SIZE; + break; + + default: /* No repeat follows */ + { + rrc = match_ref(Loffset, Lcaseless, F, mb, &length); + if (rrc != 0) + { + if (rrc > 0) Feptr = mb->end_subject; /* Partial match */ + CHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + } + Feptr += length; + continue; /* With the main loop */ + } + + /* Handle repeated back references. If a set group has length zero, just + continue with the main loop, because it matches however many times. For an + unset reference, if the minimum is zero, we can also just continue. We can + also continue if PCRE2_MATCH_UNSET_BACKREF is set, because this makes unset + group behave as a zero-length group. For any other unset cases, carrying + on will result in NOMATCH. */ + + if (Loffset < Foffset_top && Fovector[Loffset] != PCRE2_UNSET) + { + if (Fovector[Loffset] == Fovector[Loffset + 1]) continue; + } + else /* Group is not set */ + { + if (Lmin == 0 || (mb->poptions & PCRE2_MATCH_UNSET_BACKREF) != 0) + continue; + } + + /* First, ensure the minimum number of matches are present. */ + + for (i = 1; i <= Lmin; i++) + { + PCRE2_SIZE slength; + rrc = match_ref(Loffset, Lcaseless, F, mb, &slength); + if (rrc != 0) + { + if (rrc > 0) Feptr = mb->end_subject; /* Partial match */ + CHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + Feptr += slength; + } + + /* If min = max, we are done. They are not both allowed to be zero. */ + + if (Lmin == Lmax) continue; + + /* If minimizing, keep trying and advancing the pointer. 
*/ + + if (reptype == REPTYPE_MIN) + { + for (;;) + { + PCRE2_SIZE slength; + RMATCH(Fecode, RM20); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Lmin++ >= Lmax) RRETURN(MATCH_NOMATCH); + rrc = match_ref(Loffset, Lcaseless, F, mb, &slength); + if (rrc != 0) + { + if (rrc > 0) Feptr = mb->end_subject; /* Partial match */ + CHECK_PARTIAL(); + RRETURN(MATCH_NOMATCH); + } + Feptr += slength; + } + /* Control never gets here */ + } + + /* If maximizing, find the longest string and work backwards, as long as + the matched lengths for each iteration are the same. */ + + else + { + BOOL samelengths = TRUE; + Lstart = Feptr; /* Starting position */ + Flength = Fovector[Loffset+1] - Fovector[Loffset]; + + for (i = Lmin; i < Lmax; i++) + { + PCRE2_SIZE slength; + rrc = match_ref(Loffset, Lcaseless, F, mb, &slength); + if (rrc != 0) + { + /* Can't use CHECK_PARTIAL because we don't want to update Feptr in + the soft partial matching case. */ + + if (rrc > 0 && mb->partial != 0 && + mb->end_subject > mb->start_used_ptr) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + break; + } + + if (slength != Flength) samelengths = FALSE; + Feptr += slength; + } + + /* If the length matched for each repetition is the same as the length of + the captured group, we can easily work backwards. This is the normal + case. However, in caseless UTF-8 mode there are pairs of case-equivalent + characters whose lengths (in terms of code units) differ. However, this + is very rare, so we handle it by re-matching fewer and fewer times. */ + + if (samelengths) + { + while (Feptr >= Lstart) + { + RMATCH(Fecode, RM21); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Feptr -= Flength; + } + } + + /* The rare case of non-matching lengths. Re-scan the repetition for each + iteration. We know that match_ref() will succeed every time. 
*/ + + else + { + Lmax = i; + for (;;) + { + RMATCH(Fecode, RM22); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + if (Feptr == Lstart) break; /* Failed after minimal repetition */ + Feptr = Lstart; + Lmax--; + for (i = Lmin; i < Lmax; i++) + { + PCRE2_SIZE slength; + (void)match_ref(Loffset, Lcaseless, F, mb, &slength); + Feptr += slength; + } + } + } + + RRETURN(MATCH_NOMATCH); + } + /* Control never gets here */ + +#undef Lcaseless +#undef Lmin +#undef Lmax +#undef Lstart +#undef Loffset + + + +/* ========================================================================= */ +/* Opcodes for the start of various parenthesized items */ +/* ========================================================================= */ + + /* In all cases, if the result of RMATCH() is MATCH_THEN, check whether the + (*THEN) is within the current branch by comparing the address of OP_THEN + that is passed back with the end of the branch. If (*THEN) is within the + current branch, and the branch is one of two or more alternatives (it + either starts or ends with OP_ALT), we have reached the limit of THEN's + action, so convert the return code to NOMATCH, which will cause normal + backtracking to happen from now on. Otherwise, THEN is passed back to an + outer alternative. This implements Perl's treatment of parenthesized + groups, where a group not containing | does not affect the current + alternative, that is, (X) is NOT the same as (X|(*F)). */ + + + /* ===================================================================== */ + /* BRAZERO, BRAMINZERO and SKIPZERO occur just before a non-possessive + bracket group, indicating that it may occur zero times. It may repeat + infinitely, or not at all - i.e. it could be ()* or ()? or even (){0} in + the pattern. Brackets with fixed upper repeat limits are compiled as a + number of copies, with the optional ones preceded by BRAZERO or BRAMINZERO. + Possessive groups with possible zero repeats are preceded by BRAPOSZERO. 
*/ + +#define Lnext_ecode F->temp_sptr[0] + + case OP_BRAZERO: + Lnext_ecode = Fecode + 1; + RMATCH(Lnext_ecode, RM9); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + do Lnext_ecode += GET(Lnext_ecode, 1); while (*Lnext_ecode == OP_ALT); + Fecode = Lnext_ecode + 1 + LINK_SIZE; + break; + + case OP_BRAMINZERO: + Lnext_ecode = Fecode + 1; + do Lnext_ecode += GET(Lnext_ecode, 1); while (*Lnext_ecode == OP_ALT); + RMATCH(Lnext_ecode + 1 + LINK_SIZE, RM10); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Fecode++; + break; + +#undef Lnext_ecode + + case OP_SKIPZERO: + Fecode++; + do Fecode += GET(Fecode,1); while (*Fecode == OP_ALT); + Fecode += 1 + LINK_SIZE; + break; + + + /* ===================================================================== */ + /* Handle possessive brackets with an unlimited repeat. The end of these + brackets will always be OP_KETRPOS, which returns MATCH_KETRPOS without + going further in the pattern. */ + +#define Lframe_type F->temp_32[0] +#define Lmatched_once F->temp_32[1] +#define Lzero_allowed F->temp_32[2] +#define Lstart_eptr F->temp_sptr[0] +#define Lstart_group F->temp_sptr[1] + + case OP_BRAPOSZERO: + Lzero_allowed = TRUE; /* Zero repeat is allowed */ + Fecode += 1; + if (*Fecode == OP_CBRAPOS || *Fecode == OP_SCBRAPOS) + goto POSSESSIVE_CAPTURE; + goto POSSESSIVE_NON_CAPTURE; + + case OP_BRAPOS: + case OP_SBRAPOS: + Lzero_allowed = FALSE; /* Zero repeat not allowed */ + + POSSESSIVE_NON_CAPTURE: + Lframe_type = GF_NOCAPTURE; /* Remembered frame type */ + goto POSSESSIVE_GROUP; + + case OP_CBRAPOS: + case OP_SCBRAPOS: + Lzero_allowed = FALSE; /* Zero repeat not allowed */ + + POSSESSIVE_CAPTURE: + number = GET2(Fecode, 1+LINK_SIZE); + Lframe_type = GF_CAPTURE | number; /* Remembered frame type */ + + POSSESSIVE_GROUP: + Lmatched_once = FALSE; /* Never matched */ + Lstart_group = Fecode; /* Start of this group */ + + for (;;) + { + Lstart_eptr = Feptr; /* Position at group start */ + group_frame_type = Lframe_type; + RMATCH(Fecode + 
PRIV(OP_lengths)[*Fecode], RM8); + if (rrc == MATCH_KETRPOS) + { + Lmatched_once = TRUE; /* Matched at least once */ + if (Feptr == Lstart_eptr) /* Empty match; skip to end */ + { + do Fecode += GET(Fecode, 1); while (*Fecode == OP_ALT); + break; + } + + Fecode = Lstart_group; + continue; + } + + /* See comment above about handling THEN. */ + + if (rrc == MATCH_THEN) + { + PCRE2_SPTR next_ecode = Fecode + GET(Fecode,1); + if (mb->verb_ecode_ptr < next_ecode && + (*Fecode == OP_ALT || *next_ecode == OP_ALT)) + rrc = MATCH_NOMATCH; + } + + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Fecode += GET(Fecode, 1); + if (*Fecode != OP_ALT) break; + } + + /* Success if matched something or zero repeat allowed */ + + if (Lmatched_once || Lzero_allowed) + { + Fecode += 1 + LINK_SIZE; + break; + } + + RRETURN(MATCH_NOMATCH); + +#undef Lmatched_once +#undef Lzero_allowed +#undef Lframe_type +#undef Lstart_eptr +#undef Lstart_group + + + /* ===================================================================== */ + /* Handle non-capturing brackets that cannot match an empty string. When we + get to the final alternative within the brackets, as long as there are no + THEN's in the pattern, we can optimize by not recording a new backtracking + point. (Ideally we should test for a THEN within this group, but we don't + have that information.) Don't do this if we are at the very top level, + however, because that would make handling assertions and once-only brackets + messier when there is nothing to go back to. */ + +#define Lframe_type F->temp_32[0] /* Set for all that use GROUPLOOP */ +#define Lnext_branch F->temp_sptr[0] /* Used only in OP_BRA handling */ + + case OP_BRA: + if (mb->hasthen || Frdepth == 0) + { + Lframe_type = 0; + goto GROUPLOOP; + } + + for (;;) + { + Lnext_branch = Fecode + GET(Fecode, 1); + if (*Lnext_branch != OP_ALT) break; + + /* This is never the final branch. 
We do not need to test for MATCH_THEN + here because this code is not used when there is a THEN in the pattern. */ + + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM1); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Fecode = Lnext_branch; + } + + /* Hit the start of the final branch. Continue at this level. */ + + Fecode += PRIV(OP_lengths)[*Fecode]; + break; + +#undef Lnext_branch + + + /* ===================================================================== */ + /* Handle a capturing bracket, other than those that are possessive with an + unlimited repeat. */ + + case OP_CBRA: + case OP_SCBRA: + Lframe_type = GF_CAPTURE | GET2(Fecode, 1+LINK_SIZE); + goto GROUPLOOP; + + + /* ===================================================================== */ + /* Atomic groups and non-capturing brackets that can match an empty string + must record a backtracking point and also set up a chained frame. */ + + case OP_ONCE: + case OP_SCRIPT_RUN: + case OP_SBRA: + Lframe_type = GF_NOCAPTURE | Fop; + + GROUPLOOP: + for (;;) + { + group_frame_type = Lframe_type; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM2); + if (rrc == MATCH_THEN) + { + PCRE2_SPTR next_ecode = Fecode + GET(Fecode,1); + if (mb->verb_ecode_ptr < next_ecode && + (*Fecode == OP_ALT || *next_ecode == OP_ALT)) + rrc = MATCH_NOMATCH; + } + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Fecode += GET(Fecode, 1); + if (*Fecode != OP_ALT) RRETURN(MATCH_NOMATCH); + } + /* Control never reaches here. */ + +#undef Lframe_type + + + /* ===================================================================== */ + /* Recursion either matches the current regex, or some subexpression. The + offset data is the offset to the starting bracket from the start of the + whole pattern. (This is so that it works from duplicated subpatterns.) */ + +#define Lframe_type F->temp_32[0] +#define Lstart_branch F->temp_sptr[0] + + case OP_RECURSE: + bracode = mb->start_code + GET(Fecode, 1); + number = (bracode == mb->start_code)? 
0 : GET2(bracode, 1 + LINK_SIZE); + + /* If we are already in a recursion, check for repeating the same one + without advancing the subject pointer. This should catch convoluted mutual + recursions. (Some simple cases are caught at compile time.) */ + + if (Fcurrent_recurse != RECURSE_UNSET) + { + offset = Flast_group_offset; + while (offset != PCRE2_UNSET) + { + N = (heapframe *)((char *)mb->match_frames + offset); + P = (heapframe *)((char *)N - frame_size); + if (N->group_frame_type == (GF_RECURSE | number)) + { + if (Feptr == P->eptr) return PCRE2_ERROR_RECURSELOOP; + break; + } + offset = P->last_group_offset; + } + } + + /* Now run the recursion, branch by branch. */ + + Lstart_branch = bracode; + Lframe_type = GF_RECURSE | number; + + for (;;) + { + PCRE2_SPTR next_ecode; + + group_frame_type = Lframe_type; + RMATCH(Lstart_branch + PRIV(OP_lengths)[*Lstart_branch], RM11); + next_ecode = Lstart_branch + GET(Lstart_branch,1); + + /* Handle backtracking verbs, which are defined in a range that can + easily be tested for. PCRE does not allow THEN, SKIP, PRUNE or COMMIT to + escape beyond a recursion; they cause a NOMATCH for the entire recursion. + + When one of these verbs triggers, the current recursion group number is + recorded. If it matches the recursion we are processing, the verb + happened within the recursion and we must deal with it. Otherwise it must + have happened after the recursion completed, and so has to be passed + back. See comment above about handling THEN. */ + + if (rrc >= MATCH_BACKTRACK_MIN && rrc <= MATCH_BACKTRACK_MAX && + mb->verb_current_recurse == (Lframe_type ^ GF_RECURSE)) + { + if (rrc == MATCH_THEN && mb->verb_ecode_ptr < next_ecode && + (*Lstart_branch == OP_ALT || *next_ecode == OP_ALT)) + rrc = MATCH_NOMATCH; + else RRETURN(MATCH_NOMATCH); + } + + /* Note that carrying on after (*ACCEPT) in a recursion is handled in the + OP_ACCEPT code. Nothing needs to be done here. 
*/ + + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Lstart_branch = next_ecode; + if (*Lstart_branch != OP_ALT) RRETURN(MATCH_NOMATCH); + } + /* Control never reaches here. */ + +#undef Lframe_type +#undef Lstart_branch + + + /* ===================================================================== */ + /* Positive assertions are like other groups except that PCRE doesn't allow + the effect of (*THEN) to escape beyond an assertion; it is therefore + treated as NOMATCH. (*ACCEPT) is treated as successful assertion, with its + captures and mark retained. Any other return is an error. */ + +#define Lframe_type F->temp_32[0] + + case OP_ASSERT: + case OP_ASSERTBACK: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: + Lframe_type = GF_NOCAPTURE | Fop; + for (;;) + { + group_frame_type = Lframe_type; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM3); + if (rrc == MATCH_ACCEPT) + { + memcpy(Fovector, + (char *)assert_accept_frame + offsetof(heapframe, ovector), + assert_accept_frame->offset_top * sizeof(PCRE2_SIZE)); + Foffset_top = assert_accept_frame->offset_top; + Fmark = assert_accept_frame->mark; + break; + } + if (rrc != MATCH_NOMATCH && rrc != MATCH_THEN) RRETURN(rrc); + Fecode += GET(Fecode, 1); + if (*Fecode != OP_ALT) RRETURN(MATCH_NOMATCH); + } + + do Fecode += GET(Fecode, 1); while (*Fecode == OP_ALT); + Fecode += 1 + LINK_SIZE; + break; + +#undef Lframe_type + + + /* ===================================================================== */ + /* Handle negative assertions. Loop for each non-matching branch as for + positive assertions. */ + +#define Lframe_type F->temp_32[0] + + case OP_ASSERT_NOT: + case OP_ASSERTBACK_NOT: + Lframe_type = GF_NOCAPTURE | Fop; + + for (;;) + { + group_frame_type = Lframe_type; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM4); + switch(rrc) + { + case MATCH_ACCEPT: /* Assertion matched, therefore it fails. */ + case MATCH_MATCH: + RRETURN (MATCH_NOMATCH); + + case MATCH_NOMATCH: /* Branch failed, try next if present. 
*/ + case MATCH_THEN: + Fecode += GET(Fecode, 1); + if (*Fecode != OP_ALT) goto ASSERT_NOT_FAILED; + break; + + case MATCH_COMMIT: /* Assertion forced to fail, therefore continue. */ + case MATCH_SKIP: + case MATCH_PRUNE: + do Fecode += GET(Fecode, 1); while (*Fecode == OP_ALT); + goto ASSERT_NOT_FAILED; + + default: /* Pass back any other return */ + RRETURN(rrc); + } + } + + /* None of the branches have matched or there was a backtrack to (*COMMIT), + (*SKIP), (*PRUNE), or (*THEN) in the last branch. This is success for a + negative assertion, so carry on. */ + + ASSERT_NOT_FAILED: + Fecode += 1 + LINK_SIZE; + break; + +#undef Lframe_type + + + /* ===================================================================== */ + /* The callout item calls an external function, if one is provided, passing + details of the match so far. This is mainly for debugging, though the + function is able to force a failure. */ + + case OP_CALLOUT: + case OP_CALLOUT_STR: + rrc = do_callout(F, mb, &length); + if (rrc > 0) RRETURN(MATCH_NOMATCH); + if (rrc < 0) RRETURN(rrc); + Fecode += length; + break; + + + /* ===================================================================== */ + /* Conditional group: compilation checked that there are no more than two + branches. If the condition is false, skipping the first branch takes us + past the end of the item if there is only one branch, but that's exactly + what we want. */ + + case OP_COND: + case OP_SCOND: + + /* The variable Flength will be added to Fecode when the condition is + false, to get to the second branch. Setting it to the offset to the ALT or + KET, then incrementing Fecode achieves this effect. However, if the second + branch is non-existent, we must point to the KET so that the end of the + group is correctly processed. We now have Fecode pointing to the condition + or callout. 
*/ + + Flength = GET(Fecode, 1); /* Offset to the second branch */ + if (Fecode[Flength] != OP_ALT) Flength -= 1 + LINK_SIZE; + Fecode += 1 + LINK_SIZE; /* From this opcode */ + + /* Because of the way auto-callout works during compile, a callout item is + inserted between OP_COND and an assertion condition. Such a callout can + also be inserted manually. */ + + if (*Fecode == OP_CALLOUT || *Fecode == OP_CALLOUT_STR) + { + rrc = do_callout(F, mb, &length); + if (rrc > 0) RRETURN(MATCH_NOMATCH); + if (rrc < 0) RRETURN(rrc); + + /* Advance Fecode past the callout, so it now points to the condition. We + must adjust Flength so that the value of Fecode+Flength is unchanged. */ + + Fecode += length; + Flength -= length; + } + + /* Test the various possible conditions */ + + condition = FALSE; + switch(*Fecode) + { + case OP_RREF: /* Group recursion test */ + if (Fcurrent_recurse != RECURSE_UNSET) + { + number = GET2(Fecode, 1); + condition = (number == RREF_ANY || number == Fcurrent_recurse); + } + break; + + case OP_DNRREF: /* Duplicate named group recursion test */ + if (Fcurrent_recurse != RECURSE_UNSET) + { + int count = GET2(Fecode, 1 + IMM2_SIZE); + PCRE2_SPTR slot = mb->name_table + GET2(Fecode, 1) * mb->name_entry_size; + while (count-- > 0) + { + number = GET2(slot, 0); + condition = number == Fcurrent_recurse; + if (condition) break; + slot += mb->name_entry_size; + } + } + break; + + case OP_CREF: /* Numbered group used test */ + offset = (GET2(Fecode, 1) << 1) - 2; /* Doubled ref number */ + condition = offset < Foffset_top && Fovector[offset] != PCRE2_UNSET; + break; + + case OP_DNCREF: /* Duplicate named group used test */ + { + int count = GET2(Fecode, 1 + IMM2_SIZE); + PCRE2_SPTR slot = mb->name_table + GET2(Fecode, 1) * mb->name_entry_size; + while (count-- > 0) + { + offset = (GET2(slot, 0) << 1) - 2; + condition = offset < Foffset_top && Fovector[offset] != PCRE2_UNSET; + if (condition) break; + slot += mb->name_entry_size; + } + } + break; + + case 
OP_FALSE: + case OP_FAIL: /* The assertion (?!) becomes OP_FAIL */ + break; + + case OP_TRUE: + condition = TRUE; + break; + + /* The condition is an assertion. Run code similar to the assertion code + above. */ + +#define Lpositive F->temp_32[0] +#define Lstart_branch F->temp_sptr[0] + + default: + Lpositive = (*Fecode == OP_ASSERT || *Fecode == OP_ASSERTBACK); + Lstart_branch = Fecode; + + for (;;) + { + group_frame_type = GF_CONDASSERT | *Fecode; + RMATCH(Lstart_branch + PRIV(OP_lengths)[*Lstart_branch], RM5); + + switch(rrc) + { + case MATCH_ACCEPT: /* Save captures */ + memcpy(Fovector, + (char *)assert_accept_frame + offsetof(heapframe, ovector), + assert_accept_frame->offset_top * sizeof(PCRE2_SIZE)); + Foffset_top = assert_accept_frame->offset_top; + + /* Fall through */ + /* In the case of a match, the captures have already been put into + the current frame. */ + + case MATCH_MATCH: + condition = Lpositive; /* TRUE for positive assertion */ + break; + + /* PCRE doesn't allow the effect of (*THEN) to escape beyond an + assertion; it is therefore always treated as NOMATCH. */ + + case MATCH_NOMATCH: + case MATCH_THEN: + Lstart_branch += GET(Lstart_branch, 1); + if (*Lstart_branch == OP_ALT) continue; /* Try next branch */ + condition = !Lpositive; /* TRUE for negative assertion */ + break; + + /* These force no match without checking other branches. */ + + case MATCH_COMMIT: + case MATCH_SKIP: + case MATCH_PRUNE: + condition = !Lpositive; + break; + + default: + RRETURN(rrc); + } + break; /* Out of the branch loop */ + } + + /* If the condition is true, find the end of the assertion so that + advancing past it gets us to the start of the first branch. */ + + if (condition) + { + do Fecode += GET(Fecode, 1); while (*Fecode == OP_ALT); + } + break; /* End of assertion condition */ + } + +#undef Lpositive +#undef Lstart_branch + + /* Choose branch according to the condition. */ + + Fecode += condition? 
PRIV(OP_lengths)[*Fecode] : Flength; + + /* If the opcode is OP_SCOND it means we are at a repeated conditional + group that might match an empty string. We must therefore descend a level + so that the start is remembered for checking. For OP_COND we can just + continue at this level. */ + + if (Fop == OP_SCOND) + { + group_frame_type = GF_NOCAPTURE | Fop; + RMATCH(Fecode, RM35); + RRETURN(rrc); + } + break; + + + +/* ========================================================================= */ +/* End of start of parenthesis opcodes */ +/* ========================================================================= */ + + + /* ===================================================================== */ + /* Move the subject pointer back. This occurs only at the start of each + branch of a lookbehind assertion. If we are too close to the start to move + back, fail. When working with UTF-8 we move back a number of characters, + not bytes. */ + + case OP_REVERSE: + number = GET(Fecode, 1); +#ifdef SUPPORT_UNICODE + if (utf) + { + while (number-- > 0) + { + if (Feptr <= mb->check_subject) RRETURN(MATCH_NOMATCH); + Feptr--; + BACKCHAR(Feptr); + } + } + else +#endif + + /* No UTF-8 support, or not in UTF-8 mode: count is code unit count */ + + { + if ((ptrdiff_t)number > Feptr - mb->start_subject) RRETURN(MATCH_NOMATCH); + Feptr -= number; + } + + /* Save the earliest consulted character, then skip to next opcode */ + + if (Feptr < mb->start_used_ptr) mb->start_used_ptr = Feptr; + Fecode += 1 + LINK_SIZE; + break; + + + /* ===================================================================== */ + /* An alternation is the end of a branch; scan along to find the end of the + bracketed group. */ + + case OP_ALT: + do Fecode += GET(Fecode,1); while (*Fecode == OP_ALT); + break; + + + /* ===================================================================== */ + /* The end of a parenthesized group. 
For all but OP_BRA and OP_COND, the + starting frame was added to the chained frames in order to remember the + starting subject position for the group. */ + + case OP_KET: + case OP_KETRMIN: + case OP_KETRMAX: + case OP_KETRPOS: + + bracode = Fecode - GET(Fecode, 1); + + /* Point N to the frame at the start of the most recent group. + Remember the subject pointer at the start of the group. */ + + if (*bracode != OP_BRA && *bracode != OP_COND) + { + N = (heapframe *)((char *)mb->match_frames + Flast_group_offset); + P = (heapframe *)((char *)N - frame_size); + Flast_group_offset = P->last_group_offset; + +#ifdef DEBUG_SHOW_RMATCH + fprintf(stderr, "++ KET for frame=%d type=%x prev char offset=%lu\n", + N->rdepth, N->group_frame_type, + (char *)P->eptr - (char *)mb->start_subject); +#endif + + /* If we are at the end of an assertion that is a condition, return a + match, discarding any intermediate backtracking points. Copy back the + mark setting and the captures into the frame before N so that they are + set on return. Doing this for all assertions, both positive and negative, + seems to match what Perl does. */ + + if (GF_IDMASK(N->group_frame_type) == GF_CONDASSERT) + { + memcpy((char *)P + offsetof(heapframe, ovector), Fovector, + Foffset_top * sizeof(PCRE2_SIZE)); + P->offset_top = Foffset_top; + P->mark = Fmark; + Fback_frame = (char *)F - (char *)P; + RRETURN(MATCH_MATCH); + } + } + else P = NULL; /* Indicates starting frame not recorded */ + + /* The group was not a conditional assertion. */ + + switch (*bracode) + { + case OP_BRA: /* No need to do anything for these */ + case OP_COND: + case OP_SCOND: + break; + + /* Non-atomic positive assertions are like OP_BRA, except that the + subject pointer must be put back to where it was at the start of the + assertion. 
*/ + + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: + if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr; + Feptr = P->eptr; + break; + + /* Atomic positive assertions are like OP_ONCE, except that in addition + the subject pointer must be put back to where it was at the start of the + assertion. */ + + case OP_ASSERT: + case OP_ASSERTBACK: + if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr; + Feptr = P->eptr; + /* Fall through */ + + /* For an atomic group, discard internal backtracking points. We must + also ensure that any remaining branches within the top-level of the group + are not tried. Do this by adjusting the code pointer within the backtrack + frame so that it points to the final branch. */ + + case OP_ONCE: + Fback_frame = ((char *)F - (char *)P); + for (;;) + { + uint32_t y = GET(P->ecode,1); + if ((P->ecode)[y] != OP_ALT) break; + P->ecode += y; + } + break; + + /* A matching negative assertion returns MATCH, which is turned into + NOMATCH at the assertion level. */ + + case OP_ASSERT_NOT: + case OP_ASSERTBACK_NOT: + RRETURN(MATCH_MATCH); + + /* At the end of a script run, apply the script-checking rules. This code + will never be exercised if Unicode support is not compiled, because in + that environment script runs cause an error at compile time. */ + + case OP_SCRIPT_RUN: + if (!PRIV(script_run)(P->eptr, Feptr, utf)) RRETURN(MATCH_NOMATCH); + break; + + /* Whole-pattern recursion is coded as a recurse into group 0, so it + won't be picked up here. Instead, we catch it when the OP_END is reached. + Other recursion is handled here. */ + + case OP_CBRA: + case OP_CBRAPOS: + case OP_SCBRA: + case OP_SCBRAPOS: + number = GET2(bracode, 1+LINK_SIZE); + + /* Handle a recursively called group. We reinstate the previous set of + captures and then carry on after the recursion call. 
*/ + + if (Fcurrent_recurse == number) + { + P = (heapframe *)((char *)N - frame_size); + memcpy((char *)F + offsetof(heapframe, ovector), P->ovector, + P->offset_top * sizeof(PCRE2_SIZE)); + Foffset_top = P->offset_top; + Fcapture_last = P->capture_last; + Fcurrent_recurse = P->current_recurse; + Fecode = P->ecode + 1 + LINK_SIZE; + continue; /* With next opcode */ + } + + /* Deal with actual capturing. */ + + offset = (number << 1) - 2; + Fcapture_last = number; + Fovector[offset] = P->eptr - mb->start_subject; + Fovector[offset+1] = Feptr - mb->start_subject; + if (offset >= Foffset_top) Foffset_top = offset + 2; + break; + } /* End actions relating to the starting opcode */ + + /* OP_KETRPOS is a possessive repeating ket. Remember the current position, + and return the MATCH_KETRPOS. This makes it possible to do the repeats one + at a time from the outer level. This must precede the empty string test - + in this case that test is done at the outer level. */ + + if (*Fecode == OP_KETRPOS) + { + memcpy((char *)P + offsetof(heapframe, eptr), + (char *)F + offsetof(heapframe, eptr), + frame_copy_size); + RRETURN(MATCH_KETRPOS); + } + + /* Handle the different kinds of closing brackets. A non-repeating ket + needs no special action, just continuing at this level. This also happens + for the repeating kets if the group matched no characters, in order to + forcibly break infinite loops. Otherwise, the repeating kets try the rest + of the pattern or restart from the preceding bracket, in the appropriate + order. 
*/ + + if (Fop != OP_KET && (P == NULL || Feptr != P->eptr)) + { + if (Fop == OP_KETRMIN) + { + RMATCH(Fecode + 1 + LINK_SIZE, RM6); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + Fecode -= GET(Fecode, 1); + break; /* End of ket processing */ + } + + /* Repeat the maximum number of times (KETRMAX) */ + + RMATCH(bracode, RM7); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + } + + /* Carry on at this level for a non-repeating ket, or after matching an + empty string, or after repeating for a maximum number of times. */ + + Fecode += 1 + LINK_SIZE; + break; + + + /* ===================================================================== */ + /* Start and end of line assertions, not multiline mode. */ + + case OP_CIRC: /* Start of line, unless PCRE2_NOTBOL is set. */ + if (Feptr != mb->start_subject || (mb->moptions & PCRE2_NOTBOL) != 0) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + case OP_SOD: /* Unconditional start of subject */ + if (Feptr != mb->start_subject) RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + /* When PCRE2_NOTEOL is unset, assert before the subject end, or a + terminating newline unless PCRE2_DOLLAR_ENDONLY is set. 
*/ + + case OP_DOLL: + if ((mb->moptions & PCRE2_NOTEOL) != 0) RRETURN(MATCH_NOMATCH); + if ((mb->poptions & PCRE2_DOLLAR_ENDONLY) == 0) goto ASSERT_NL_OR_EOS; + + /* Fall through */ + /* Unconditional end of subject assertion (\z) */ + + case OP_EOD: + if (Feptr < mb->end_subject) RRETURN(MATCH_NOMATCH); + if (mb->partial != 0) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + Fecode++; + break; + + /* End of subject or ending \n assertion (\Z) */ + + case OP_EODN: + ASSERT_NL_OR_EOS: + if (Feptr < mb->end_subject && + (!IS_NEWLINE(Feptr) || Feptr != mb->end_subject - mb->nllen)) + { + if (mb->partial != 0 && + Feptr + 1 >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + UCHAR21TEST(Feptr) == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + RRETURN(MATCH_NOMATCH); + } + + /* Either at end of string or \n before end. */ + + if (mb->partial != 0) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + Fecode++; + break; + + + /* ===================================================================== */ + /* Start and end of line assertions, multiline mode. */ + + /* Start of subject unless notbol, or after any newline except for one at + the very end, unless PCRE2_ALT_CIRCUMFLEX is set. */ + + case OP_CIRCM: + if ((mb->moptions & PCRE2_NOTBOL) != 0 && Feptr == mb->start_subject) + RRETURN(MATCH_NOMATCH); + if (Feptr != mb->start_subject && + ((Feptr == mb->end_subject && + (mb->poptions & PCRE2_ALT_CIRCUMFLEX) == 0) || + !WAS_NEWLINE(Feptr))) + RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + /* Assert before any newline, or before end of subject unless noteol is + set. 
*/ + + case OP_DOLLM: + if (Feptr < mb->end_subject) + { + if (!IS_NEWLINE(Feptr)) + { + if (mb->partial != 0 && + Feptr + 1 >= mb->end_subject && + NLBLOCK->nltype == NLTYPE_FIXED && + NLBLOCK->nllen == 2 && + UCHAR21TEST(Feptr) == NLBLOCK->nl[0]) + { + mb->hitend = TRUE; + if (mb->partial > 1) return PCRE2_ERROR_PARTIAL; + } + RRETURN(MATCH_NOMATCH); + } + } + else + { + if ((mb->moptions & PCRE2_NOTEOL) != 0) RRETURN(MATCH_NOMATCH); + SCHECK_PARTIAL(); + } + Fecode++; + break; + + + /* ===================================================================== */ + /* Start of match assertion */ + + case OP_SOM: + if (Feptr != mb->start_subject + mb->start_offset) RRETURN(MATCH_NOMATCH); + Fecode++; + break; + + + /* ===================================================================== */ + /* Reset the start of match point */ + + case OP_SET_SOM: + Fstart_match = Feptr; + Fecode++; + break; + + + /* ===================================================================== */ + /* Word boundary assertions. Find out if the previous and current + characters are "word" characters. It takes a bit more work in UTF mode. + Characters > 255 are assumed to be "non-word" characters when PCRE2_UCP is + not set. When it is set, use Unicode properties if available, even when not + in UTF mode. Remember the earliest and latest consulted characters. 
*/ + + case OP_NOT_WORD_BOUNDARY: + case OP_WORD_BOUNDARY: + if (Feptr == mb->check_subject) prev_is_word = FALSE; else + { + PCRE2_SPTR lastptr = Feptr - 1; +#ifdef SUPPORT_UNICODE + if (utf) + { + BACKCHAR(lastptr); + GETCHAR(fc, lastptr); + } + else +#endif /* SUPPORT_UNICODE */ + fc = *lastptr; + if (lastptr < mb->start_used_ptr) mb->start_used_ptr = lastptr; +#ifdef SUPPORT_UNICODE + if ((mb->poptions & PCRE2_UCP) != 0) + { + if (fc == '_') prev_is_word = TRUE; else + { + int cat = UCD_CATEGORY(fc); + prev_is_word = (cat == ucp_L || cat == ucp_N); + } + } + else +#endif /* SUPPORT_UNICODE */ + prev_is_word = CHMAX_255(fc) && (mb->ctypes[fc] & ctype_word) != 0; + } + + /* Get status of next character */ + + if (Feptr >= mb->end_subject) + { + SCHECK_PARTIAL(); + cur_is_word = FALSE; + } + else + { + PCRE2_SPTR nextptr = Feptr + 1; +#ifdef SUPPORT_UNICODE + if (utf) + { + FORWARDCHARTEST(nextptr, mb->end_subject); + GETCHAR(fc, Feptr); + } + else +#endif /* SUPPORT_UNICODE */ + fc = *Feptr; + if (nextptr > mb->last_used_ptr) mb->last_used_ptr = nextptr; +#ifdef SUPPORT_UNICODE + if ((mb->poptions & PCRE2_UCP) != 0) + { + if (fc == '_') cur_is_word = TRUE; else + { + int cat = UCD_CATEGORY(fc); + cur_is_word = (cat == ucp_L || cat == ucp_N); + } + } + else +#endif /* SUPPORT_UNICODE */ + cur_is_word = CHMAX_255(fc) && (mb->ctypes[fc] & ctype_word) != 0; + } + + /* Now see if the situation is what we want */ + + if ((*Fecode++ == OP_WORD_BOUNDARY)? + cur_is_word == prev_is_word : cur_is_word != prev_is_word) + RRETURN(MATCH_NOMATCH); + break; + + + /* ===================================================================== */ + /* Backtracking (*VERB)s, with and without arguments. Note that if the + pattern is successfully matched, we do not come back from RMATCH. 
*/ + + case OP_MARK: + Fmark = mb->nomatch_mark = Fecode + 2; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM12); + + /* A return of MATCH_SKIP_ARG means that matching failed at SKIP with an + argument, and we must check whether that argument matches this MARK's + argument. It is passed back in mb->verb_skip_ptr. If it does match, we + return MATCH_SKIP with mb->verb_skip_ptr now pointing to the subject + position that corresponds to this mark. Otherwise, pass back the return + code unaltered. */ + + if (rrc == MATCH_SKIP_ARG && + PRIV(strcmp)(Fecode + 2, mb->verb_skip_ptr) == 0) + { + mb->verb_skip_ptr = Feptr; /* Pass back current position */ + RRETURN(MATCH_SKIP); + } + RRETURN(rrc); + + case OP_FAIL: + RRETURN(MATCH_NOMATCH); + + /* Record the current recursing group number in mb->verb_current_recurse + when a backtracking return such as MATCH_COMMIT is given. This enables the + recurse processing to catch verbs from within the recursion. */ + + case OP_COMMIT: + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM13); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_COMMIT); + + case OP_COMMIT_ARG: + Fmark = mb->nomatch_mark = Fecode + 2; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM36); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_COMMIT); + + case OP_PRUNE: + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM14); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_PRUNE); + + case OP_PRUNE_ARG: + Fmark = mb->nomatch_mark = Fecode + 2; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM15); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_PRUNE); + + case OP_SKIP: + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM16); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_skip_ptr = Feptr; /* Pass back current position */ 
+ mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_SKIP); + + /* Note that, for Perl compatibility, SKIP with an argument does NOT set + nomatch_mark. When a pattern match ends with a SKIP_ARG for which there was + not a matching mark, we have to re-run the match, ignoring the SKIP_ARG + that failed and any that precede it (either they also failed, or were not + triggered). To do this, we maintain a count of executed SKIP_ARGs. If a + SKIP_ARG gets to top level, the match is re-run with mb->ignore_skip_arg + set to the count of the one that failed. */ + + case OP_SKIP_ARG: + mb->skip_arg_count++; + if (mb->skip_arg_count <= mb->ignore_skip_arg) + { + Fecode += PRIV(OP_lengths)[*Fecode] + Fecode[1]; + break; + } + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM17); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + + /* Pass back the current skip name and return the special MATCH_SKIP_ARG + return code. This will either be caught by a matching MARK, or get to the + top, where it causes a rematch with mb->ignore_skip_arg set to the value of + mb->skip_arg_count. */ + + mb->verb_skip_ptr = Fecode + 2; + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_SKIP_ARG); + + /* For THEN (and THEN_ARG) we pass back the address of the opcode, so that + the branch in which it occurs can be determined. */ + + case OP_THEN: + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode], RM18); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_ecode_ptr = Fecode; + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_THEN); + + case OP_THEN_ARG: + Fmark = mb->nomatch_mark = Fecode + 2; + RMATCH(Fecode + PRIV(OP_lengths)[*Fecode] + Fecode[1], RM19); + if (rrc != MATCH_NOMATCH) RRETURN(rrc); + mb->verb_ecode_ptr = Fecode; + mb->verb_current_recurse = Fcurrent_recurse; + RRETURN(MATCH_THEN); + + + /* ===================================================================== */ + /* There's been some horrible disaster. 
Arrival here can only mean there is + something seriously wrong in the code above or the OP_xxx definitions. */ + + default: + return PCRE2_ERROR_INTERNAL; + } + + /* Do not insert any code in here without much thought; it is assumed + that "continue" in the code above comes out to here to repeat the main + loop. */ + + } /* End of main loop */ +/* Control never reaches here */ + + +/* ========================================================================= */ +/* The RRETURN() macro jumps here. The number that is saved in Freturn_id +indicates which label we actually want to return to. The value in Frdepth is +the index number of the frame in the vector. The return value has been placed +in rrc. */ + +#define LBL(val) case val: goto L_RM##val; + +RETURN_SWITCH: +if (Feptr > mb->last_used_ptr) mb->last_used_ptr = Feptr; +if (Frdepth == 0) return rrc; /* Exit from the top level */ +F = (heapframe *)((char *)F - Fback_frame); /* Backtrack */ +mb->cb->callout_flags |= PCRE2_CALLOUT_BACKTRACK; /* Note for callouts */ + +#ifdef DEBUG_SHOW_RMATCH +fprintf(stderr, "++ RETURN %d to %d\n", rrc, Freturn_id); +#endif + +switch (Freturn_id) + { + LBL( 1) LBL( 2) LBL( 3) LBL( 4) LBL( 5) LBL( 6) LBL( 7) LBL( 8) + LBL( 9) LBL(10) LBL(11) LBL(12) LBL(13) LBL(14) LBL(15) LBL(16) + LBL(17) LBL(18) LBL(19) LBL(20) LBL(21) LBL(22) LBL(23) LBL(24) + LBL(25) LBL(26) LBL(27) LBL(28) LBL(29) LBL(30) LBL(31) LBL(32) + LBL(33) LBL(34) LBL(35) LBL(36) + +#ifdef SUPPORT_WIDE_CHARS + LBL(100) LBL(101) +#endif + +#ifdef SUPPORT_UNICODE + LBL(200) LBL(201) LBL(202) LBL(203) LBL(204) LBL(205) LBL(206) + LBL(207) LBL(208) LBL(209) LBL(210) LBL(211) LBL(212) LBL(213) + LBL(214) LBL(215) LBL(216) LBL(217) LBL(218) LBL(219) LBL(220) + LBL(221) LBL(222) +#endif + + default: + return PCRE2_ERROR_INTERNAL; + } +#undef LBL +} + + +/************************************************* +* Match a Regular Expression * +*************************************************/ + +/* This function applies a compiled 
pattern to a subject string and picks out +portions of the string if it matches. Two elements in the vector are set for +each substring: the offsets to the start and end of the substring. + +Arguments: + code points to the compiled expression + subject points to the subject string + length length of subject string (may contain binary zeros) + start_offset where to start in the subject string + options option bits + match_data points to a match_data block + mcontext points to a PCRE2 context + +Returns: > 0 => success; value is the number of ovector pairs filled + = 0 => success, but ovector is not big enough + = -1 => failed to match (PCRE2_ERROR_NOMATCH) + = -2 => partial match (PCRE2_ERROR_PARTIAL) + < -2 => some kind of unexpected problem +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_match(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, + PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext) +{ +int rc; +int was_zero_terminated = 0; +const uint8_t *start_bits = NULL; +const pcre2_real_code *re = (const pcre2_real_code *)code; + +BOOL anchored; +BOOL firstline; +BOOL has_first_cu = FALSE; +BOOL has_req_cu = FALSE; +BOOL startline; + +#if PCRE2_CODE_UNIT_WIDTH == 8 +BOOL memchr_not_found_first_cu; +BOOL memchr_not_found_first_cu2; +#endif + +PCRE2_UCHAR first_cu = 0; +PCRE2_UCHAR first_cu2 = 0; +PCRE2_UCHAR req_cu = 0; +PCRE2_UCHAR req_cu2 = 0; + +PCRE2_SPTR bumpalong_limit; +PCRE2_SPTR end_subject; +PCRE2_SPTR true_end_subject; +PCRE2_SPTR start_match = subject + start_offset; +PCRE2_SPTR req_cu_ptr = start_match - 1; +PCRE2_SPTR start_partial; +PCRE2_SPTR match_partial; + +#ifdef SUPPORT_JIT +BOOL use_jit; +#endif + +/* This flag is needed even when Unicode is not supported for convenience +(it is used by the IS_NEWLINE macro). 
*/ + +BOOL utf = FALSE; + +#ifdef SUPPORT_UNICODE +BOOL ucp = FALSE; +BOOL allow_invalid; +uint32_t fragment_options = 0; +#ifdef SUPPORT_JIT +BOOL jit_checked_utf = FALSE; +#endif +#endif /* SUPPORT_UNICODE */ + +PCRE2_SIZE frame_size; + +/* We need to have mb as a pointer to a match block, because the IS_NEWLINE +macro is used below, and it expects NLBLOCK to be defined as a pointer. */ + +pcre2_callout_block cb; +match_block actual_match_block; +match_block *mb = &actual_match_block; + +/* Allocate an initial vector of backtracking frames on the stack. If this +proves to be too small, it is replaced by a larger one on the heap. To get a +vector of the size required that is aligned for pointers, allocate it as a +vector of pointers. */ + +PCRE2_SPTR stack_frames_vector[START_FRAMES_SIZE/sizeof(PCRE2_SPTR)] + PCRE2_KEEP_UNINITIALIZED; +mb->stack_frames = (heapframe *)stack_frames_vector; + +/* A length equal to PCRE2_ZERO_TERMINATED implies a zero-terminated +subject string. */ + +if (length == PCRE2_ZERO_TERMINATED) + { + length = PRIV(strlen)(subject); + was_zero_terminated = 1; + } +true_end_subject = end_subject = subject + length; + +/* Plausibility checks */ + +if ((options & ~PUBLIC_MATCH_OPTIONS) != 0) return PCRE2_ERROR_BADOPTION; +if (code == NULL || subject == NULL || match_data == NULL) + return PCRE2_ERROR_NULL; +if (start_offset > length) return PCRE2_ERROR_BADOFFSET; + +/* Check that the first field in the block is the magic number. */ + +if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC; + +/* Check the code unit width. */ + +if ((re->flags & PCRE2_MODE_MASK) != PCRE2_CODE_UNIT_WIDTH/8) + return PCRE2_ERROR_BADMODE; + +/* PCRE2_NOTEMPTY and PCRE2_NOTEMPTY_ATSTART are match-time flags in the +options variable for this function. 
Users of PCRE2 who are not calling the +function directly would like to have a way of setting these flags, in the same +way that they can set pcre2_compile() flags like PCRE2_NO_AUTOPOSSESS with +constructions like (*NO_AUTOPOSSESS). To enable this, (*NOTEMPTY) and +(*NOTEMPTY_ATSTART) set bits in the pattern's "flags" field which we now +transfer to the options for this function. The bits are guaranteed to be +adjacent, but do not have the same values. This bit of Boolean trickery assumes +that the match-time bits are not more significant than the flag bits. If by +accident this is not the case, a compile-time division by zero error will +occur. */ + +#define FF (PCRE2_NOTEMPTY_SET|PCRE2_NE_ATST_SET) +#define OO (PCRE2_NOTEMPTY|PCRE2_NOTEMPTY_ATSTART) +options |= (re->flags & FF) / ((FF & (~FF+1)) / (OO & (~OO+1))); +#undef FF +#undef OO + +/* If the pattern was successfully studied with JIT support, we will run the +JIT executable instead of the rest of this function. Most options must be set +at compile time for the JIT code to be usable. */ + +#ifdef SUPPORT_JIT +use_jit = (re->executable_jit != NULL && + (options & ~PUBLIC_JIT_MATCH_OPTIONS) == 0); +#endif + +/* Initialize UTF/UCP parameters. */ + +#ifdef SUPPORT_UNICODE +utf = (re->overall_options & PCRE2_UTF) != 0; +allow_invalid = (re->overall_options & PCRE2_MATCH_INVALID_UTF) != 0; +ucp = (re->overall_options & PCRE2_UCP) != 0; +#endif /* SUPPORT_UNICODE */ + +/* Convert the partial matching flags into an integer. */ + +mb->partial = ((options & PCRE2_PARTIAL_HARD) != 0)? 2 : + ((options & PCRE2_PARTIAL_SOFT) != 0)? 1 : 0; + +/* Partial matching and PCRE2_ENDANCHORED are currently not allowed at the same +time. */ + +if (mb->partial != 0 && + ((re->overall_options | options) & PCRE2_ENDANCHORED) != 0) + return PCRE2_ERROR_BADOPTION; + +/* It is an error to set an offset limit without setting the flag at compile +time. 
*/ + +if (mcontext != NULL && mcontext->offset_limit != PCRE2_UNSET && + (re->overall_options & PCRE2_USE_OFFSET_LIMIT) == 0) + return PCRE2_ERROR_BADOFFSETLIMIT; + +/* If the match data block was previously used with PCRE2_COPY_MATCHED_SUBJECT, +free the memory that was obtained. Set the field to NULL for no match cases. */ + +if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0) + { + match_data->memctl.free((void *)match_data->subject, + match_data->memctl.memory_data); + match_data->flags &= ~PCRE2_MD_COPIED_SUBJECT; + } +match_data->subject = NULL; + +/* Zero the error offset in case the first code unit is invalid UTF. */ + +match_data->startchar = 0; + + +/* ============================= JIT matching ============================== */ + +/* Prepare for JIT matching. Check a UTF string for validity unless no check is +requested or invalid UTF can be handled. We check only the portion of the +subject that might be inspected during matching - from the offset minus the +maximum lookbehind to the given length. This saves time when a small part of a +large subject is being matched by the use of a starting offset. Note that the +maximum lookbehind is a number of characters, not code units. */ + +#ifdef SUPPORT_JIT +if (use_jit) + { +#ifdef SUPPORT_UNICODE + if (utf && (options & PCRE2_NO_UTF_CHECK) == 0 && !allow_invalid) + { +#if PCRE2_CODE_UNIT_WIDTH != 32 + unsigned int i; +#endif + + /* For 8-bit and 16-bit UTF, check that the first code unit is a valid + character start. */ + +#if PCRE2_CODE_UNIT_WIDTH != 32 + if (start_match < end_subject && NOT_FIRSTCU(*start_match)) + { + if (start_offset > 0) return PCRE2_ERROR_BADUTFOFFSET; +#if PCRE2_CODE_UNIT_WIDTH == 8 + return PCRE2_ERROR_UTF8_ERR20; /* Isolated 0x80 byte */ +#else + return PCRE2_ERROR_UTF16_ERR3; /* Isolated low surrogate */ +#endif + } +#endif /* WIDTH != 32 */ + + /* Move back by the maximum lookbehind, just in case it happens at the very + start of matching. 
*/ + +#if PCRE2_CODE_UNIT_WIDTH != 32 + for (i = re->max_lookbehind; i > 0 && start_match > subject; i--) + { + start_match--; + while (start_match > subject && +#if PCRE2_CODE_UNIT_WIDTH == 8 + (*start_match & 0xc0) == 0x80) +#else /* 16-bit */ + (*start_match & 0xfc00) == 0xdc00) +#endif + start_match--; + } +#else /* PCRE2_CODE_UNIT_WIDTH != 32 */ + + /* In the 32-bit library, one code unit equals one character. However, + we cannot just subtract the lookbehind and then compare pointers, because + a very large lookbehind could create an invalid pointer. */ + + if (start_offset >= re->max_lookbehind) + start_match -= re->max_lookbehind; + else + start_match = subject; +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ + + /* Validate the relevant portion of the subject. Adjust the offset of an + invalid code point to be an absolute offset in the whole string. */ + + match_data->rc = PRIV(valid_utf)(start_match, + length - (start_match - subject), &(match_data->startchar)); + if (match_data->rc != 0) + { + match_data->startchar += start_match - subject; + return match_data->rc; + } + jit_checked_utf = TRUE; + } +#endif /* SUPPORT_UNICODE */ + + /* If JIT returns BADOPTION, which means that the selected complete or + partial matching mode was not compiled, fall through to the interpreter. */ + + rc = pcre2_jit_match(code, subject, length, start_offset, options, + match_data, mcontext); + if (rc != PCRE2_ERROR_JIT_BADOPTION) + { + if (rc >= 0 && (options & PCRE2_COPY_MATCHED_SUBJECT) != 0) + { + length = CU2BYTES(length + was_zero_terminated); + match_data->subject = match_data->memctl.malloc(length, + match_data->memctl.memory_data); + if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY; + memcpy((void *)match_data->subject, subject, length); + match_data->flags |= PCRE2_MD_COPIED_SUBJECT; + } + return rc; + } + } +#endif /* SUPPORT_JIT */ + +/* ========================= End of JIT matching ========================== */ + + +/* Proceed with non-JIT matching. 
The default is to allow lookbehinds to the +start of the subject. A UTF check when there is a non-zero offset may change +this. */ + +mb->check_subject = subject; + +/* If a UTF subject string was not checked for validity in the JIT code above, +check it here, and handle support for invalid UTF strings. The check above +happens only when invalid UTF is not supported and PCRE2_NO_UTF_CHECK is unset. +If we get here in those circumstances, it means the subject string is valid, +but for some reason JIT matching was not successful. There is no need to check +the subject again. + +We check only the portion of the subject that might be inspected during +matching - from the offset minus the maximum lookbehind to the given length. +This saves time when a small part of a large subject is being matched by the +use of a starting offset. Note that the maximum lookbehind is a number of +characters, not code units. + +Note also that support for invalid UTF forces a check, overriding the setting +of PCRE2_NO_UTF_CHECK. */ + +#ifdef SUPPORT_UNICODE +if (utf && +#ifdef SUPPORT_JIT + !jit_checked_utf && +#endif + ((options & PCRE2_NO_UTF_CHECK) == 0 || allow_invalid)) + { +#if PCRE2_CODE_UNIT_WIDTH != 32 + BOOL skipped_bad_start = FALSE; +#endif + + /* For 8-bit and 16-bit UTF, check that the first code unit is a valid + character start. If we are handling invalid UTF, just skip over such code + units. Otherwise, give an appropriate error. 
*/ + +#if PCRE2_CODE_UNIT_WIDTH != 32 + if (allow_invalid) + { + while (start_match < end_subject && NOT_FIRSTCU(*start_match)) + { + start_match++; + skipped_bad_start = TRUE; + } + } + else if (start_match < end_subject && NOT_FIRSTCU(*start_match)) + { + if (start_offset > 0) return PCRE2_ERROR_BADUTFOFFSET; +#if PCRE2_CODE_UNIT_WIDTH == 8 + return PCRE2_ERROR_UTF8_ERR20; /* Isolated 0x80 byte */ +#else + return PCRE2_ERROR_UTF16_ERR3; /* Isolated low surrogate */ +#endif + } +#endif /* WIDTH != 32 */ + + /* The mb->check_subject field points to the start of UTF checking; + lookbehinds can go back no further than this. */ + + mb->check_subject = start_match; + + /* Move back by the maximum lookbehind, just in case it happens at the very + start of matching, but don't do this if we skipped bad 8-bit or 16-bit code + units above. */ + +#if PCRE2_CODE_UNIT_WIDTH != 32 + if (!skipped_bad_start) + { + unsigned int i; + for (i = re->max_lookbehind; i > 0 && mb->check_subject > subject; i--) + { + mb->check_subject--; + while (mb->check_subject > subject && +#if PCRE2_CODE_UNIT_WIDTH == 8 + (*mb->check_subject & 0xc0) == 0x80) +#else /* 16-bit */ + (*mb->check_subject & 0xfc00) == 0xdc00) +#endif + mb->check_subject--; + } + } +#else /* PCRE2_CODE_UNIT_WIDTH != 32 */ + + /* In the 32-bit library, one code unit equals one character. However, + we cannot just subtract the lookbehind and then compare pointers, because + a very large lookbehind could create an invalid pointer. */ + + if (start_offset >= re->max_lookbehind) + mb->check_subject -= re->max_lookbehind; + else + mb->check_subject = subject; +#endif /* PCRE2_CODE_UNIT_WIDTH != 32 */ + + /* Validate the relevant portion of the subject. There's a loop in case we + encounter bad UTF in the characters preceding start_match which we are + scanning because of a lookbehind. 
*/ + + for (;;) + { + match_data->rc = PRIV(valid_utf)(mb->check_subject, + length - (mb->check_subject - subject), &(match_data->startchar)); + + if (match_data->rc == 0) break; /* Valid UTF string */ + + /* Invalid UTF string. Adjust the offset to be an absolute offset in the + whole string. If we are handling invalid UTF strings, set end_subject to + stop before the bad code unit, and set the options to "not end of line". + Otherwise return the error. */ + + match_data->startchar += mb->check_subject - subject; + if (!allow_invalid || match_data->rc > 0) return match_data->rc; + end_subject = subject + match_data->startchar; + + /* If the end precedes start_match, it means there is invalid UTF in the + extra code units we reversed over because of a lookbehind. Advance past the + first bad code unit, and then skip invalid character starting code units in + 8-bit and 16-bit modes, and try again. */ + + if (end_subject < start_match) + { + mb->check_subject = end_subject + 1; +#if PCRE2_CODE_UNIT_WIDTH != 32 + while (mb->check_subject < start_match && NOT_FIRSTCU(*mb->check_subject)) + mb->check_subject++; +#endif + } + + /* Otherwise, set the not end of line option, and do the match. */ + + else + { + fragment_options = PCRE2_NOTEOL; + break; + } + } + } +#endif /* SUPPORT_UNICODE */ + +/* A NULL match context means "use a default context", but we take the memory +control functions from the pattern. */ + +if (mcontext == NULL) + { + mcontext = (pcre2_match_context *)(&PRIV(default_match_context)); + mb->memctl = re->memctl; + } +else mb->memctl = mcontext->memctl; + +anchored = ((re->overall_options | options) & PCRE2_ANCHORED) != 0; +firstline = (re->overall_options & PCRE2_FIRSTLINE) != 0; +startline = (re->flags & PCRE2_STARTLINE) != 0; +bumpalong_limit = (mcontext->offset_limit == PCRE2_UNSET)? + true_end_subject : subject + mcontext->offset_limit; + +/* Initialize and set up the fixed fields in the callout block, with a pointer +in the match block. 
*/ + +mb->cb = &cb; +cb.version = 2; +cb.subject = subject; +cb.subject_length = (PCRE2_SIZE)(end_subject - subject); +cb.callout_flags = 0; + +/* Fill in the remaining fields in the match block, except for moptions, which +gets set later. */ + +mb->callout = mcontext->callout; +mb->callout_data = mcontext->callout_data; + +mb->start_subject = subject; +mb->start_offset = start_offset; +mb->end_subject = end_subject; +mb->hasthen = (re->flags & PCRE2_HASTHEN) != 0; +mb->allowemptypartial = (re->max_lookbehind > 0) || + (re->flags & PCRE2_MATCH_EMPTY) != 0; +mb->poptions = re->overall_options; /* Pattern options */ +mb->ignore_skip_arg = 0; +mb->mark = mb->nomatch_mark = NULL; /* In case never set */ + +/* The name table is needed for finding all the numbers associated with a +given name, for condition testing. The code follows the name table. */ + +mb->name_table = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)); +mb->name_count = re->name_count; +mb->name_entry_size = re->name_entry_size; +mb->start_code = mb->name_table + re->name_count * re->name_entry_size; + +/* Process the \R and newline settings. */ + +mb->bsr_convention = re->bsr_convention; +mb->nltype = NLTYPE_FIXED; +switch(re->newline_convention) + { + case PCRE2_NEWLINE_CR: + mb->nllen = 1; + mb->nl[0] = CHAR_CR; + break; + + case PCRE2_NEWLINE_LF: + mb->nllen = 1; + mb->nl[0] = CHAR_NL; + break; + + case PCRE2_NEWLINE_NUL: + mb->nllen = 1; + mb->nl[0] = CHAR_NUL; + break; + + case PCRE2_NEWLINE_CRLF: + mb->nllen = 2; + mb->nl[0] = CHAR_CR; + mb->nl[1] = CHAR_NL; + break; + + case PCRE2_NEWLINE_ANY: + mb->nltype = NLTYPE_ANY; + break; + + case PCRE2_NEWLINE_ANYCRLF: + mb->nltype = NLTYPE_ANYCRLF; + break; + + default: return PCRE2_ERROR_INTERNAL; + } + +/* The backtracking frames have fixed data at the front, and a PCRE2_SIZE +vector at the end, whose size depends on the number of capturing parentheses in +the pattern. It is not used at all if there are no capturing parentheses. 
+ + frame_size is the total size of each frame + mb->frame_vector_size is the total usable size of the vector (rounded down + to a whole number of frames) + +The last of these is changed within the match() function if the frame vector +has to be expanded. We therefore put it into the match block so that it is +correct when calling match() more than once for non-anchored patterns. */ + +frame_size = offsetof(heapframe, ovector) + + re->top_bracket * 2 * sizeof(PCRE2_SIZE); + +/* Limits set in the pattern override the match context only if they are +smaller. */ + +mb->heap_limit = (mcontext->heap_limit < re->limit_heap)? + mcontext->heap_limit : re->limit_heap; + +mb->match_limit = (mcontext->match_limit < re->limit_match)? + mcontext->match_limit : re->limit_match; + +mb->match_limit_depth = (mcontext->depth_limit < re->limit_depth)? + mcontext->depth_limit : re->limit_depth; + +/* If a pattern has very many capturing parentheses, the frame size may be very +large. Ensure that there are at least 10 available frames by getting an initial +vector on the heap if necessary, except when the heap limit prevents this. Get +fewer if possible. (The heap limit is in kibibytes.) 
*/ + +if (frame_size <= START_FRAMES_SIZE/10) + { + mb->match_frames = mb->stack_frames; /* Initial frame vector on the stack */ + mb->frame_vector_size = ((START_FRAMES_SIZE/frame_size) * frame_size); + } +else + { + mb->frame_vector_size = frame_size * 10; + if ((mb->frame_vector_size / 1024) > mb->heap_limit) + { + if (frame_size > mb->heap_limit * 1024) return PCRE2_ERROR_HEAPLIMIT; + mb->frame_vector_size = ((mb->heap_limit * 1024)/frame_size) * frame_size; + } + mb->match_frames = mb->memctl.malloc(mb->frame_vector_size, + mb->memctl.memory_data); + if (mb->match_frames == NULL) return PCRE2_ERROR_NOMEMORY; + } + +mb->match_frames_top = + (heapframe *)((char *)mb->match_frames + mb->frame_vector_size); + +/* Write to the ovector within the first frame to mark every capture unset and +to avoid uninitialized memory read errors when it is copied to a new frame. */ + +memset((char *)(mb->match_frames) + offsetof(heapframe, ovector), 0xff, + re->top_bracket * 2 * sizeof(PCRE2_SIZE)); + +/* Pointers to the individual character tables */ + +mb->lcc = re->tables + lcc_offset; +mb->fcc = re->tables + fcc_offset; +mb->ctypes = re->tables + ctypes_offset; + +/* Set up the first code unit to match, if available. If there's no first code +unit there may be a bitmap of possible first characters. */ + +if ((re->flags & PCRE2_FIRSTSET) != 0) + { + has_first_cu = TRUE; + first_cu = first_cu2 = (PCRE2_UCHAR)(re->first_codeunit); + if ((re->flags & PCRE2_FIRSTCASELESS) != 0) + { + first_cu2 = TABLE_GET(first_cu, mb->fcc, first_cu); +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (first_cu > 127 && ucp && !utf) first_cu2 = UCD_OTHERCASE(first_cu); +#else + if (first_cu > 127 && (utf || ucp)) first_cu2 = UCD_OTHERCASE(first_cu); +#endif +#endif /* SUPPORT_UNICODE */ + } + } +else + if (!startline && (re->flags & PCRE2_FIRSTMAPSET) != 0) + start_bits = re->start_bitmap; + +/* There may also be a "last known required character" set. 
*/ + +if ((re->flags & PCRE2_LASTSET) != 0) + { + has_req_cu = TRUE; + req_cu = req_cu2 = (PCRE2_UCHAR)(re->last_codeunit); + if ((re->flags & PCRE2_LASTCASELESS) != 0) + { + req_cu2 = TABLE_GET(req_cu, mb->fcc, req_cu); +#ifdef SUPPORT_UNICODE +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (req_cu > 127 && ucp && !utf) req_cu2 = UCD_OTHERCASE(req_cu); +#else + if (req_cu > 127 && (utf || ucp)) req_cu2 = UCD_OTHERCASE(req_cu); +#endif +#endif /* SUPPORT_UNICODE */ + } + } + + +/* ==========================================================================*/ + +/* Loop for handling unanchored repeated matching attempts; for anchored regexs +the loop runs just once. */ + +#ifdef SUPPORT_UNICODE +FRAGMENT_RESTART: +#endif + +start_partial = match_partial = NULL; +mb->hitend = FALSE; + +#if PCRE2_CODE_UNIT_WIDTH == 8 +memchr_not_found_first_cu = FALSE; +memchr_not_found_first_cu2 = FALSE; +#endif + +for(;;) + { + PCRE2_SPTR new_start_match; + + /* ----------------- Start of match optimizations ---------------- */ + + /* There are some optimizations that avoid running the match if a known + starting point is not found, or if a known later code unit is not present. + However, there is an option (settable at compile time) that disables these, + for testing and for ensuring that all callouts do actually occur. */ + + if ((re->overall_options & PCRE2_NO_START_OPTIMIZE) == 0) + { + /* If firstline is TRUE, the start of the match is constrained to the first + line of a multiline string. That is, the match must be before or at the + first newline following the start of matching. Temporarily adjust + end_subject so that we stop the scans for a first code unit at a newline. + If the match fails at the newline, later code breaks the loop. 
*/ + + if (firstline) + { + PCRE2_SPTR t = start_match; +#ifdef SUPPORT_UNICODE + if (utf) + { + while (t < end_subject && !IS_NEWLINE(t)) + { + t++; + ACROSSCHAR(t < end_subject, t, t++); + } + } + else +#endif + while (t < end_subject && !IS_NEWLINE(t)) t++; + end_subject = t; + } + + /* Anchored: check the first code unit if one is recorded. This may seem + pointless but it can help in detecting a no match case without scanning for + the required code unit. */ + + if (anchored) + { + if (has_first_cu || start_bits != NULL) + { + BOOL ok = start_match < end_subject; + if (ok) + { + PCRE2_UCHAR c = UCHAR21TEST(start_match); + ok = has_first_cu && (c == first_cu || c == first_cu2); + if (!ok && start_bits != NULL) + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (c > 255) c = 255; +#endif + ok = (start_bits[c/8] & (1u << (c&7))) != 0; + } + } + if (!ok) + { + rc = MATCH_NOMATCH; + break; + } + } + } + + /* Not anchored. Advance to a unique first code unit if there is one. In + 8-bit mode, the use of memchr() gives a big speed up, even though we have + to call it twice in caseless mode, in order to find the earliest occurrence + of the character in either of its cases. If a call to memchr() that + searches the rest of the subject fails to find one case, remember that in + order not to keep on repeating the search. This can make a huge difference + when the strings are very long and only one case is present. 
*/ + + else + { + if (has_first_cu) + { + if (first_cu != first_cu2) /* Caseless */ + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + PCRE2_UCHAR smc; + while (start_match < end_subject && + (smc = UCHAR21TEST(start_match)) != first_cu && + smc != first_cu2) + start_match++; + +#else /* 8-bit code units */ + PCRE2_SPTR pp1 = NULL; + PCRE2_SPTR pp2 = NULL; + PCRE2_SIZE cu2size = end_subject - start_match; + + if (!memchr_not_found_first_cu) + { + pp1 = memchr(start_match, first_cu, end_subject - start_match); + if (pp1 == NULL) memchr_not_found_first_cu = TRUE; + else cu2size = pp1 - start_match; + } + + /* If pp1 is not NULL, we have arranged to search only as far as pp1, + to see if the other case is earlier, so we can set "not found" only + when both searches have returned NULL. */ + + if (!memchr_not_found_first_cu2) + { + pp2 = memchr(start_match, first_cu2, cu2size); + memchr_not_found_first_cu2 = (pp2 == NULL && pp1 == NULL); + } + + if (pp1 == NULL) + start_match = (pp2 == NULL)? end_subject : pp2; + else + start_match = (pp2 == NULL || pp1 < pp2)? pp1 : pp2; +#endif + } + + /* The caseful case */ + + else + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + while (start_match < end_subject && UCHAR21TEST(start_match) != + first_cu) + start_match++; +#else + start_match = memchr(start_match, first_cu, end_subject - start_match); + if (start_match == NULL) start_match = end_subject; +#endif + } + + /* If we can't find the required first code unit, having reached the + true end of the subject, break the bumpalong loop, to force a match + failure, except when doing partial matching, when we let the next cycle + run at the end of the subject. To see why, consider the pattern + /(?<=abc)def/, which partially matches "abc", even though the string + does not contain the starting character "d". 
If we have not reached the + true end of the subject (PCRE2_FIRSTLINE caused end_subject to be + temporarily modified) we also let the cycle run, because the matching + string is legitimately allowed to start with the first code unit of a + newline. */ + + if (mb->partial == 0 && start_match >= mb->end_subject) + { + rc = MATCH_NOMATCH; + break; + } + } + + /* If there's no first code unit, advance to just after a linebreak for a + multiline match if required. */ + + else if (startline) + { + if (start_match > mb->start_subject + start_offset) + { +#ifdef SUPPORT_UNICODE + if (utf) + { + while (start_match < end_subject && !WAS_NEWLINE(start_match)) + { + start_match++; + ACROSSCHAR(start_match < end_subject, start_match, start_match++); + } + } + else +#endif + while (start_match < end_subject && !WAS_NEWLINE(start_match)) + start_match++; + + /* If we have just passed a CR and the newline option is ANY or + ANYCRLF, and we are now at a LF, advance the match position by one + more code unit. */ + + if (start_match[-1] == CHAR_CR && + (mb->nltype == NLTYPE_ANY || mb->nltype == NLTYPE_ANYCRLF) && + start_match < end_subject && + UCHAR21TEST(start_match) == CHAR_NL) + start_match++; + } + } + + /* If there's no first code unit or a requirement for a multiline line + start, advance to a non-unique first code unit if any have been + identified. The bitmap contains only 256 bits. When code units are 16 or + 32 bits wide, all code units greater than 254 set the 255 bit. */ + + else if (start_bits != NULL) + { + while (start_match < end_subject) + { + uint32_t c = UCHAR21TEST(start_match); +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (c > 255) c = 255; +#endif + if ((start_bits[c/8] & (1u << (c&7))) != 0) break; + start_match++; + } + + /* See comment above in first_cu checking about the next few lines. 
*/ + + if (mb->partial == 0 && start_match >= mb->end_subject) + { + rc = MATCH_NOMATCH; + break; + } + } + } /* End first code unit handling */ + + /* Restore fudged end_subject */ + + end_subject = mb->end_subject; + + /* The following two optimizations must be disabled for partial matching. */ + + if (mb->partial == 0) + { + PCRE2_SPTR p; + + /* The minimum matching length is a lower bound; no string of that length + may actually match the pattern. Although the value is, strictly, in + characters, we treat it as code units to avoid spending too much time in + this optimization. */ + + if (end_subject - start_match < re->minlength) + { + rc = MATCH_NOMATCH; + break; + } + + /* If req_cu is set, we know that that code unit must appear in the + subject for the (non-partial) match to succeed. If the first code unit is + set, req_cu must be later in the subject; otherwise the test starts at + the match point. This optimization can save a huge amount of backtracking + in patterns with nested unlimited repeats that aren't going to match. + Writing separate code for caseful/caseless versions makes it go faster, + as does using an autoincrement and backing off on a match. As in the case + of the first code unit, using memchr() in the 8-bit library gives a big + speed up. Unlike the first_cu check above, we do not need to call + memchr() twice in the caseless case because we only need to check for the + presence of the character in either case, not find the first occurrence. + + The search can be skipped if the code unit was found later than the + current starting point in a previous iteration of the bumpalong loop. + + HOWEVER: when the subject string is very, very long, searching to its end + can take a long time, and give bad performance on quite ordinary + anchored patterns. This showed up when somebody was matching something + like /^\d+C/ on a 32-megabyte string... 
so we don't do this when the + string is sufficiently long, but it's worth searching a lot more for + unanchored patterns. */ + + p = start_match + (has_first_cu? 1:0); + if (has_req_cu && p > req_cu_ptr) + { + PCRE2_SIZE check_length = end_subject - start_match; + + if (check_length < REQ_CU_MAX || + (!anchored && check_length < REQ_CU_MAX * 1000)) + { + if (req_cu != req_cu2) /* Caseless */ + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + while (p < end_subject) + { + uint32_t pp = UCHAR21INCTEST(p); + if (pp == req_cu || pp == req_cu2) { p--; break; } + } +#else /* 8-bit code units */ + PCRE2_SPTR pp = p; + p = memchr(pp, req_cu, end_subject - pp); + if (p == NULL) + { + p = memchr(pp, req_cu2, end_subject - pp); + if (p == NULL) p = end_subject; + } +#endif /* PCRE2_CODE_UNIT_WIDTH != 8 */ + } + + /* The caseful case */ + + else + { +#if PCRE2_CODE_UNIT_WIDTH != 8 + while (p < end_subject) + { + if (UCHAR21INCTEST(p) == req_cu) { p--; break; } + } + +#else /* 8-bit code units */ + p = memchr(p, req_cu, end_subject - p); + if (p == NULL) p = end_subject; +#endif + } + + /* If we can't find the required code unit, break the bumpalong loop, + forcing a match failure. */ + + if (p >= end_subject) + { + rc = MATCH_NOMATCH; + break; + } + + /* If we have found the required code unit, save the point where we + found it, so that we don't search again next time round the bumpalong + loop if the start hasn't yet passed this code unit. */ + + req_cu_ptr = p; + } + } + } + } + + /* ------------ End of start of match optimizations ------------ */ + + /* Give no match if we have passed the bumpalong limit. */ + + if (start_match > bumpalong_limit) + { + rc = MATCH_NOMATCH; + break; + } + + /* OK, we can now run the match. If "hitend" is set afterwards, remember the + first starting point for which a partial match was found. 
*/ + + cb.start_match = (PCRE2_SIZE)(start_match - subject); + cb.callout_flags |= PCRE2_CALLOUT_STARTMATCH; + + mb->start_used_ptr = start_match; + mb->last_used_ptr = start_match; +#ifdef SUPPORT_UNICODE + mb->moptions = options | fragment_options; +#else + mb->moptions = options; +#endif + mb->match_call_count = 0; + mb->end_offset_top = 0; + mb->skip_arg_count = 0; + + rc = match(start_match, mb->start_code, match_data->ovector, + match_data->oveccount, re->top_bracket, frame_size, mb); + + if (mb->hitend && start_partial == NULL) + { + start_partial = mb->start_used_ptr; + match_partial = start_match; + } + + switch(rc) + { + /* If MATCH_SKIP_ARG reaches this level it means that a MARK that matched + the SKIP's arg was not found. In this circumstance, Perl ignores the SKIP + entirely. The only way we can do that is to re-do the match at the same + point, with a flag to force SKIP with an argument to be ignored. Just + treating this case as NOMATCH does not work because it does not check other + alternatives in patterns such as A(*SKIP:A)B|AC when the subject is AC. */ + + case MATCH_SKIP_ARG: + new_start_match = start_match; + mb->ignore_skip_arg = mb->skip_arg_count; + break; + + /* SKIP passes back the next starting point explicitly, but if it is no + greater than the match we have just done, treat it as NOMATCH. */ + + case MATCH_SKIP: + if (mb->verb_skip_ptr > start_match) + { + new_start_match = mb->verb_skip_ptr; + break; + } + /* Fall through */ + + /* NOMATCH and PRUNE advance by one character. THEN at this level acts + exactly like PRUNE. Unset ignore SKIP-with-argument. */ + + case MATCH_NOMATCH: + case MATCH_PRUNE: + case MATCH_THEN: + mb->ignore_skip_arg = 0; + new_start_match = start_match + 1; +#ifdef SUPPORT_UNICODE + if (utf) + ACROSSCHAR(new_start_match < end_subject, new_start_match, + new_start_match++); +#endif + break; + + /* COMMIT disables the bumpalong, but otherwise behaves as NOMATCH. 
*/ + + case MATCH_COMMIT: + rc = MATCH_NOMATCH; + goto ENDLOOP; + + /* Any other return is either a match, or some kind of error. */ + + default: + goto ENDLOOP; + } + + /* Control reaches here for the various types of "no match at this point" + result. Reset the code to MATCH_NOMATCH for subsequent checking. */ + + rc = MATCH_NOMATCH; + + /* If PCRE2_FIRSTLINE is set, the match must happen before or at the first + newline in the subject (though it may continue over the newline). Therefore, + if we have just failed to match, starting at a newline, do not continue. */ + + if (firstline && IS_NEWLINE(start_match)) break; + + /* Advance to new matching position */ + + start_match = new_start_match; + + /* Break the loop if the pattern is anchored or if we have passed the end of + the subject. */ + + if (anchored || start_match > end_subject) break; + + /* If we have just passed a CR and we are now at a LF, and the pattern does + not contain any explicit matches for \r or \n, and the newline option is CRLF + or ANY or ANYCRLF, advance the match position by one more code unit. In + normal matching start_match will always be greater than the first position at + this stage, but a failed *SKIP can cause a return at the same point, which is + why the first test exists. 
*/ + + if (start_match > subject + start_offset && + start_match[-1] == CHAR_CR && + start_match < end_subject && + *start_match == CHAR_NL && + (re->flags & PCRE2_HASCRORLF) == 0 && + (mb->nltype == NLTYPE_ANY || + mb->nltype == NLTYPE_ANYCRLF || + mb->nllen == 2)) + start_match++; + + mb->mark = NULL; /* Reset for start of next match attempt */ + } /* End of for(;;) "bumpalong" loop */ + +/* ==========================================================================*/ + +/* When we reach here, one of the following stopping conditions is true: + +(1) The match succeeded, either completely, or partially; + +(2) The pattern is anchored or the match was failed after (*COMMIT); + +(3) We are past the end of the subject or the bumpalong limit; + +(4) PCRE2_FIRSTLINE is set and we have failed to match at a newline, because + this option requests that a match occur at or before the first newline in + the subject. + +(5) Some kind of error occurred. + +*/ + +ENDLOOP: + +/* If end_subject != true_end_subject, it means we are handling invalid UTF, +and have just processed a non-terminal fragment. If this resulted in no match +or a partial match we must carry on to the next fragment (a partial match is +returned to the caller only at the very end of the subject). A loop is used to +avoid trying to match against empty fragments; if the pattern can match an +empty string it would have done so already. */ + +#ifdef SUPPORT_UNICODE +if (utf && end_subject != true_end_subject && + (rc == MATCH_NOMATCH || rc == PCRE2_ERROR_PARTIAL)) + { + for (;;) + { + /* Advance past the first bad code unit, and then skip invalid character + starting code units in 8-bit and 16-bit modes. */ + + start_match = end_subject + 1; + +#if PCRE2_CODE_UNIT_WIDTH != 32 + while (start_match < true_end_subject && NOT_FIRSTCU(*start_match)) + start_match++; +#endif + + /* If we have hit the end of the subject, there isn't another non-empty + fragment, so give up. 
*/ + + if (start_match >= true_end_subject) + { + rc = MATCH_NOMATCH; /* In case it was partial */ + break; + } + + /* Check the rest of the subject */ + + mb->check_subject = start_match; + rc = PRIV(valid_utf)(start_match, length - (start_match - subject), + &(match_data->startchar)); + + /* The rest of the subject is valid UTF. */ + + if (rc == 0) + { + mb->end_subject = end_subject = true_end_subject; + fragment_options = PCRE2_NOTBOL; + goto FRAGMENT_RESTART; + } + + /* A subsequent UTF error has been found; if the next fragment is + non-empty, set up to process it. Otherwise, let the loop advance. */ + + else if (rc < 0) + { + mb->end_subject = end_subject = start_match + match_data->startchar; + if (end_subject > start_match) + { + fragment_options = PCRE2_NOTBOL|PCRE2_NOTEOL; + goto FRAGMENT_RESTART; + } + } + } + } +#endif /* SUPPORT_UNICODE */ + +/* Release an enlarged frame vector that is on the heap. */ + +if (mb->match_frames != mb->stack_frames) + mb->memctl.free(mb->match_frames, mb->memctl.memory_data); + +/* Fill in fields that are always returned in the match data. */ + +match_data->code = re; +match_data->mark = mb->mark; +match_data->matchedby = PCRE2_MATCHEDBY_INTERPRETER; + +/* Handle a fully successful match. Set the return code to the number of +captured strings, or 0 if there were too many to fit into the ovector, and then +set the remaining returned values before returning. Make a copy of the subject +string if requested. */ + +if (rc == MATCH_MATCH) + { + match_data->rc = ((int)mb->end_offset_top >= 2 * match_data->oveccount)? + 0 : (int)mb->end_offset_top/2 + 1; + match_data->startchar = start_match - subject; + match_data->leftchar = mb->start_used_ptr - subject; + match_data->rightchar = ((mb->last_used_ptr > mb->end_match_ptr)? 
+ mb->last_used_ptr : mb->end_match_ptr) - subject; + if ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0) + { + length = CU2BYTES(length + was_zero_terminated); + match_data->subject = match_data->memctl.malloc(length, + match_data->memctl.memory_data); + if (match_data->subject == NULL) return PCRE2_ERROR_NOMEMORY; + memcpy((void *)match_data->subject, subject, length); + match_data->flags |= PCRE2_MD_COPIED_SUBJECT; + } + else match_data->subject = subject; + return match_data->rc; + } + +/* Control gets here if there has been a partial match, an error, or if the +overall match attempt has failed at all permitted starting positions. Any mark +data is in the nomatch_mark field. */ + +match_data->mark = mb->nomatch_mark; + +/* For anything other than nomatch or partial match, just return the code. */ + +if (rc != MATCH_NOMATCH && rc != PCRE2_ERROR_PARTIAL) match_data->rc = rc; + +/* Handle a partial match. If a "soft" partial match was requested, searching +for a complete match will have continued, and the value of rc at this point +will be MATCH_NOMATCH. For a "hard" partial match, it will already be +PCRE2_ERROR_PARTIAL. */ + +else if (match_partial != NULL) + { + match_data->subject = subject; + match_data->ovector[0] = match_partial - subject; + match_data->ovector[1] = end_subject - subject; + match_data->startchar = match_partial - subject; + match_data->leftchar = start_partial - subject; + match_data->rightchar = end_subject - subject; + match_data->rc = PCRE2_ERROR_PARTIAL; + } + +/* Else this is the classic nomatch case. 
*/ + +else match_data->rc = PCRE2_ERROR_NOMATCH; + +return match_data->rc; +} + +/* End of pcre2_match.c */ diff --git a/src/pcre2/src/pcre2_match_data.c b/src/pcre2/src/pcre2_match_data.c new file mode 100644 index 00000000..53e46987 --- /dev/null +++ b/src/pcre2/src/pcre2_match_data.c @@ -0,0 +1,166 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + + +/************************************************* +* Create a match data block given ovector size * +*************************************************/ + +/* A minimum of 1 is imposed on the number of ovector pairs. */ + +PCRE2_EXP_DEFN pcre2_match_data * PCRE2_CALL_CONVENTION +pcre2_match_data_create(uint32_t oveccount, pcre2_general_context *gcontext) +{ +pcre2_match_data *yield; +if (oveccount < 1) oveccount = 1; +yield = PRIV(memctl_malloc)( + offsetof(pcre2_match_data, ovector) + 2*oveccount*sizeof(PCRE2_SIZE), + (pcre2_memctl *)gcontext); +if (yield == NULL) return NULL; +yield->oveccount = oveccount; +yield->flags = 0; +return yield; +} + + + +/************************************************* +* Create a match data block using pattern data * +*************************************************/ + +/* If no context is supplied, use the memory allocator from the code. 
*/ + +PCRE2_EXP_DEFN pcre2_match_data * PCRE2_CALL_CONVENTION +pcre2_match_data_create_from_pattern(const pcre2_code *code, + pcre2_general_context *gcontext) +{ +if (gcontext == NULL) gcontext = (pcre2_general_context *)code; +return pcre2_match_data_create(((pcre2_real_code *)code)->top_bracket + 1, + gcontext); +} + + + +/************************************************* +* Free a match data block * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_match_data_free(pcre2_match_data *match_data) +{ +if (match_data != NULL) + { + if ((match_data->flags & PCRE2_MD_COPIED_SUBJECT) != 0) + match_data->memctl.free((void *)match_data->subject, + match_data->memctl.memory_data); + match_data->memctl.free(match_data, match_data->memctl.memory_data); + } +} + + + +/************************************************* +* Get last mark in match * +*************************************************/ + +PCRE2_EXP_DEFN PCRE2_SPTR PCRE2_CALL_CONVENTION +pcre2_get_mark(pcre2_match_data *match_data) +{ +return match_data->mark; +} + + + +/************************************************* +* Get pointer to ovector * +*************************************************/ + +PCRE2_EXP_DEFN PCRE2_SIZE * PCRE2_CALL_CONVENTION +pcre2_get_ovector_pointer(pcre2_match_data *match_data) +{ +return match_data->ovector; +} + + + +/************************************************* +* Get number of ovector slots * +*************************************************/ + +PCRE2_EXP_DEFN uint32_t PCRE2_CALL_CONVENTION +pcre2_get_ovector_count(pcre2_match_data *match_data) +{ +return match_data->oveccount; +} + + + +/************************************************* +* Get starting code unit in match * +*************************************************/ + +PCRE2_EXP_DEFN PCRE2_SIZE PCRE2_CALL_CONVENTION +pcre2_get_startchar(pcre2_match_data *match_data) +{ +return match_data->startchar; +} + + + +/************************************************* +* Get 
size of match data block * +*************************************************/ + +PCRE2_EXP_DEFN PCRE2_SIZE PCRE2_CALL_CONVENTION +pcre2_get_match_data_size(pcre2_match_data *match_data) +{ +return offsetof(pcre2_match_data, ovector) + + 2 * (match_data->oveccount) * sizeof(PCRE2_SIZE); +} + +/* End of pcre2_match_data.c */ diff --git a/src/pcre/pcre_newline.c b/src/pcre2/src/pcre2_newline.c similarity index 64% rename from src/pcre/pcre_newline.c rename to src/pcre2/src/pcre2_newline.c index b8f5a4de..6e9366db 100644 --- a/src/pcre/pcre_newline.c +++ b/src/pcre2/src/pcre2_newline.c @@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -41,7 +42,7 @@ POSSIBILITY OF SUCH DAMAGE. /* This module contains internal functions for testing newlines when more than one kind of newline is to be recognized. When a newline is found, its length is returned. In principle, we could implement several newline "types", each -referring to a different set of newline characters. At present, PCRE supports +referring to a different set of newline characters. At present, PCRE2 supports only NLTYPE_FIXED, which gets handled without these functions, NLTYPE_ANYCRLF, and NLTYPE_ANY. The full list of Unicode newline characters is taken from http://unicode.org/unicode/reports/tr18/. */ @@ -51,7 +52,7 @@ and NLTYPE_ANY. The full list of Unicode newline characters is taken from #include "config.h" #endif -#include "pcre_internal.h" +#include "pcre2_internal.h" @@ -59,8 +60,10 @@ and NLTYPE_ANY. 
The full list of Unicode newline characters is taken from * Check for newline at given position * *************************************************/ -/* It is guaranteed that the initial value of ptr is less than the end of the -string that is being processed. +/* This function is called only via the IS_NEWLINE macro, which does so only +when the newline type is NLTYPE_ANY or NLTYPE_ANYCRLF. The case of a fixed +newline (NLTYPE_FIXED) is handled inline. It is guaranteed that the code unit +pointed to by ptr is less than the end of the string. Arguments: ptr pointer to possible newline @@ -73,28 +76,30 @@ Returns: TRUE or FALSE */ BOOL -PRIV(is_newline)(PCRE_PUCHAR ptr, int type, PCRE_PUCHAR endptr, int *lenptr, - BOOL utf) +PRIV(is_newline)(PCRE2_SPTR ptr, uint32_t type, PCRE2_SPTR endptr, + uint32_t *lenptr, BOOL utf) { -pcre_uint32 c; -(void)utf; -#ifdef SUPPORT_UTF -if (utf) - { - GETCHAR(c, ptr); - } -else -#endif /* SUPPORT_UTF */ - c = *ptr; +uint32_t c; -/* Note that this function is called only for ANY or ANYCRLF. */ +#ifdef SUPPORT_UNICODE +if (utf) { GETCHAR(c, ptr); } else c = *ptr; +#else +(void)utf; +c = *ptr; +#endif /* SUPPORT_UNICODE */ if (type == NLTYPE_ANYCRLF) switch(c) { - case CHAR_LF: *lenptr = 1; return TRUE; - case CHAR_CR: *lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1; - return TRUE; - default: return FALSE; + case CHAR_LF: + *lenptr = 1; + return TRUE; + + case CHAR_CR: + *lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1; + return TRUE; + + default: + return FALSE; } /* NLTYPE_ANY */ @@ -106,25 +111,36 @@ else switch(c) #endif case CHAR_LF: case CHAR_VT: - case CHAR_FF: *lenptr = 1; return TRUE; + case CHAR_FF: + *lenptr = 1; + return TRUE; case CHAR_CR: *lenptr = (ptr < endptr - 1 && ptr[1] == CHAR_LF)? 2 : 1; return TRUE; #ifndef EBCDIC -#ifdef COMPILE_PCRE8 - case CHAR_NEL: *lenptr = utf? 
2 : 1; return TRUE; - case 0x2028: /* LS */ - case 0x2029: *lenptr = 3; return TRUE; /* PS */ -#else /* COMPILE_PCRE16 || COMPILE_PCRE32 */ +#if PCRE2_CODE_UNIT_WIDTH == 8 + case CHAR_NEL: + *lenptr = utf? 2 : 1; + return TRUE; + + case 0x2028: /* LS */ + case 0x2029: /* PS */ + *lenptr = 3; + return TRUE; + +#else /* 16-bit or 32-bit code units */ case CHAR_NEL: - case 0x2028: /* LS */ - case 0x2029: *lenptr = 1; return TRUE; /* PS */ -#endif /* COMPILE_PCRE8 */ -#endif /* Not EBCDIC */ + case 0x2028: /* LS */ + case 0x2029: /* PS */ + *lenptr = 1; + return TRUE; +#endif +#endif /* Not EBCDIC */ - default: return FALSE; + default: + return FALSE; } } @@ -134,8 +150,10 @@ else switch(c) * Check for newline at previous position * *************************************************/ -/* It is guaranteed that the initial value of ptr is greater than the start of -the string that is being processed. +/* This function is called only via the WAS_NEWLINE macro, which does so only +when the newline type is NLTYPE_ANY or NLTYPE_ANYCRLF. The case of a fixed +newline (NLTYPE_FIXED) is handled inline. It is guaranteed that the initial +value of ptr is greater than the start of the string that is being processed. Arguments: ptr pointer to possible newline @@ -148,23 +166,23 @@ Returns: TRUE or FALSE */ BOOL -PRIV(was_newline)(PCRE_PUCHAR ptr, int type, PCRE_PUCHAR startptr, int *lenptr, - BOOL utf) +PRIV(was_newline)(PCRE2_SPTR ptr, uint32_t type, PCRE2_SPTR startptr, + uint32_t *lenptr, BOOL utf) { -pcre_uint32 c; -(void)utf; +uint32_t c; ptr--; -#ifdef SUPPORT_UTF + +#ifdef SUPPORT_UNICODE if (utf) { BACKCHAR(ptr); GETCHAR(c, ptr); } -else -#endif /* SUPPORT_UTF */ - c = *ptr; - -/* Note that this function is called only for ANY or ANYCRLF. */ +else c = *ptr; +#else +(void)utf; +c = *ptr; +#endif /* SUPPORT_UNICODE */ if (type == NLTYPE_ANYCRLF) switch(c) { @@ -172,8 +190,12 @@ if (type == NLTYPE_ANYCRLF) switch(c) *lenptr = (ptr > startptr && ptr[-1] == CHAR_CR)? 
2 : 1; return TRUE; - case CHAR_CR: *lenptr = 1; return TRUE; - default: return FALSE; + case CHAR_CR: + *lenptr = 1; + return TRUE; + + default: + return FALSE; } /* NLTYPE_ANY */ @@ -189,22 +211,33 @@ else switch(c) #endif case CHAR_VT: case CHAR_FF: - case CHAR_CR: *lenptr = 1; return TRUE; + case CHAR_CR: + *lenptr = 1; + return TRUE; #ifndef EBCDIC -#ifdef COMPILE_PCRE8 - case CHAR_NEL: *lenptr = utf? 2 : 1; return TRUE; - case 0x2028: /* LS */ - case 0x2029: *lenptr = 3; return TRUE; /* PS */ -#else /* COMPILE_PCRE16 || COMPILE_PCRE32 */ +#if PCRE2_CODE_UNIT_WIDTH == 8 case CHAR_NEL: - case 0x2028: /* LS */ - case 0x2029: *lenptr = 1; return TRUE; /* PS */ -#endif /* COMPILE_PCRE8 */ -#endif /* NotEBCDIC */ + *lenptr = utf? 2 : 1; + return TRUE; + + case 0x2028: /* LS */ + case 0x2029: /* PS */ + *lenptr = 3; + return TRUE; + +#else /* 16-bit or 32-bit code units */ + case CHAR_NEL: + case 0x2028: /* LS */ + case 0x2029: /* PS */ + *lenptr = 1; + return TRUE; +#endif +#endif /* Not EBCDIC */ - default: return FALSE; + default: + return FALSE; } } -/* End of pcre_newline.c */ +/* End of pcre2_newline.c */ diff --git a/src/pcre/pcre_ord2utf8.c b/src/pcre2/src/pcre2_ord2utf.c similarity index 66% rename from src/pcre/pcre_ord2utf8.c rename to src/pcre2/src/pcre2_ord2utf.c index 95f1beb9..14037309 100644 --- a/src/pcre/pcre_ord2utf8.c +++ b/src/pcre2/src/pcre2_ord2utf.c @@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -38,39 +39,51 @@ POSSIBILITY OF SUCH DAMAGE. */ -/* This file contains a private PCRE function that converts an ordinal -character value into a UTF8 string. 
*/ +/* This file contains a function that converts a Unicode character code point +into a UTF string. The behaviour is different for each code unit width. */ + #ifdef HAVE_CONFIG_H #include "config.h" #endif -#define COMPILE_PCRE8 +#include "pcre2_internal.h" + + +/* If SUPPORT_UNICODE is not defined, this function will never be called. +Supply a dummy function because some compilers do not like empty source +modules. */ + +#ifndef SUPPORT_UNICODE +unsigned int +PRIV(ord2utf)(uint32_t cvalue, PCRE2_UCHAR *buffer) +{ +(void)(cvalue); +(void)(buffer); +return 0; +} +#else /* SUPPORT_UNICODE */ -#include "pcre_internal.h" /************************************************* -* Convert character value to UTF-8 * +* Convert code point to UTF * *************************************************/ -/* This function takes an integer value in the range 0 - 0x10ffff -and encodes it as a UTF-8 character in 1 to 4 pcre_uchars. - +/* Arguments: cvalue the character value - buffer pointer to buffer for result - at least 6 pcre_uchars long + buffer pointer to buffer for result -Returns: number of characters placed in the buffer +Returns: number of code units placed in the buffer */ -unsigned -int -PRIV(ord2utf)(pcre_uint32 cvalue, pcre_uchar *buffer) +unsigned int +PRIV(ord2utf)(uint32_t cvalue, PCRE2_UCHAR *buffer) { -#ifdef SUPPORT_UTF - -register int i, j; +/* Convert to UTF-8 */ +#if PCRE2_CODE_UNIT_WIDTH == 8 +int i, j; for (i = 0; i < PRIV(utf8_table1_size); i++) if ((int)cvalue <= PRIV(utf8_table1)[i]) break; buffer += i; @@ -82,13 +95,26 @@ for (j = i; j > 0; j--) *buffer = PRIV(utf8_table2)[i] | cvalue; return i + 1; -#else +/* Convert to UTF-16 */ -(void)(cvalue); /* Keep compiler happy; this function won't ever be */ -(void)(buffer); /* called when SUPPORT_UTF is not defined. 
*/ -return 0; +#elif PCRE2_CODE_UNIT_WIDTH == 16 +if (cvalue <= 0xffff) + { + *buffer = (PCRE2_UCHAR)cvalue; + return 1; + } +cvalue -= 0x10000; +*buffer++ = 0xd800 | (cvalue >> 10); +*buffer = 0xdc00 | (cvalue & 0x3ff); +return 2; + +/* Convert to UTF-32 */ +#else +*buffer = (PCRE2_UCHAR)cvalue; +return 1; #endif } +#endif /* SUPPORT_UNICODE */ -/* End of pcre_ord2utf8.c */ +/* End of pcre2_ord2utf.c */ diff --git a/src/pcre2/src/pcre2_pattern_info.c b/src/pcre2/src/pcre2_pattern_info.c new file mode 100644 index 00000000..a29f5eff --- /dev/null +++ b/src/pcre2/src/pcre2_pattern_info.c @@ -0,0 +1,432 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + +/************************************************* +* Return info about compiled pattern * +*************************************************/ + +/* +Arguments: + code points to compiled code + what what information is required + where where to put the information; if NULL, return length + +Returns: 0 when data returned + > 0 when length requested + < 0 on error or unset value +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_pattern_info(const pcre2_code *code, uint32_t what, void *where) +{ +const pcre2_real_code *re = (pcre2_real_code *)code; + +if (where == NULL) /* Requests field length */ + { + switch(what) + { + case PCRE2_INFO_ALLOPTIONS: + case PCRE2_INFO_ARGOPTIONS: + case PCRE2_INFO_BACKREFMAX: + case PCRE2_INFO_BSR: + case PCRE2_INFO_CAPTURECOUNT: + case PCRE2_INFO_DEPTHLIMIT: + case PCRE2_INFO_EXTRAOPTIONS: + case PCRE2_INFO_FIRSTCODETYPE: + case PCRE2_INFO_FIRSTCODEUNIT: + case PCRE2_INFO_HASBACKSLASHC: + case PCRE2_INFO_HASCRORLF: + case PCRE2_INFO_HEAPLIMIT: + case PCRE2_INFO_JCHANGED: + case PCRE2_INFO_LASTCODETYPE: + 
case PCRE2_INFO_LASTCODEUNIT: + case PCRE2_INFO_MATCHEMPTY: + case PCRE2_INFO_MATCHLIMIT: + case PCRE2_INFO_MAXLOOKBEHIND: + case PCRE2_INFO_MINLENGTH: + case PCRE2_INFO_NAMEENTRYSIZE: + case PCRE2_INFO_NAMECOUNT: + case PCRE2_INFO_NEWLINE: + return sizeof(uint32_t); + + case PCRE2_INFO_FIRSTBITMAP: + return sizeof(const uint8_t *); + + case PCRE2_INFO_JITSIZE: + case PCRE2_INFO_SIZE: + case PCRE2_INFO_FRAMESIZE: + return sizeof(size_t); + + case PCRE2_INFO_NAMETABLE: + return sizeof(PCRE2_SPTR); + } + } + +if (re == NULL) return PCRE2_ERROR_NULL; + +/* Check that the first field in the block is the magic number. If it is not, +return with PCRE2_ERROR_BADMAGIC. */ + +if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC; + +/* Check that this pattern was compiled in the correct bit mode */ + +if ((re->flags & (PCRE2_CODE_UNIT_WIDTH/8)) == 0) return PCRE2_ERROR_BADMODE; + +switch(what) + { + case PCRE2_INFO_ALLOPTIONS: + *((uint32_t *)where) = re->overall_options; + break; + + case PCRE2_INFO_ARGOPTIONS: + *((uint32_t *)where) = re->compile_options; + break; + + case PCRE2_INFO_BACKREFMAX: + *((uint32_t *)where) = re->top_backref; + break; + + case PCRE2_INFO_BSR: + *((uint32_t *)where) = re->bsr_convention; + break; + + case PCRE2_INFO_CAPTURECOUNT: + *((uint32_t *)where) = re->top_bracket; + break; + + case PCRE2_INFO_DEPTHLIMIT: + *((uint32_t *)where) = re->limit_depth; + if (re->limit_depth == UINT32_MAX) return PCRE2_ERROR_UNSET; + break; + + case PCRE2_INFO_EXTRAOPTIONS: + *((uint32_t *)where) = re->extra_options; + break; + + case PCRE2_INFO_FIRSTCODETYPE: + *((uint32_t *)where) = ((re->flags & PCRE2_FIRSTSET) != 0)? 1 : + ((re->flags & PCRE2_STARTLINE) != 0)? 2 : 0; + break; + + case PCRE2_INFO_FIRSTCODEUNIT: + *((uint32_t *)where) = ((re->flags & PCRE2_FIRSTSET) != 0)? + re->first_codeunit : 0; + break; + + case PCRE2_INFO_FIRSTBITMAP: + *((const uint8_t **)where) = ((re->flags & PCRE2_FIRSTMAPSET) != 0)? 
+ &(re->start_bitmap[0]) : NULL; + break; + + case PCRE2_INFO_FRAMESIZE: + *((size_t *)where) = offsetof(heapframe, ovector) + + re->top_bracket * 2 * sizeof(PCRE2_SIZE); + break; + + case PCRE2_INFO_HASBACKSLASHC: + *((uint32_t *)where) = (re->flags & PCRE2_HASBKC) != 0; + break; + + case PCRE2_INFO_HASCRORLF: + *((uint32_t *)where) = (re->flags & PCRE2_HASCRORLF) != 0; + break; + + case PCRE2_INFO_HEAPLIMIT: + *((uint32_t *)where) = re->limit_heap; + if (re->limit_heap == UINT32_MAX) return PCRE2_ERROR_UNSET; + break; + + case PCRE2_INFO_JCHANGED: + *((uint32_t *)where) = (re->flags & PCRE2_JCHANGED) != 0; + break; + + case PCRE2_INFO_JITSIZE: +#ifdef SUPPORT_JIT + *((size_t *)where) = (re->executable_jit != NULL)? + PRIV(jit_get_size)(re->executable_jit) : 0; +#else + *((size_t *)where) = 0; +#endif + break; + + case PCRE2_INFO_LASTCODETYPE: + *((uint32_t *)where) = ((re->flags & PCRE2_LASTSET) != 0)? 1 : 0; + break; + + case PCRE2_INFO_LASTCODEUNIT: + *((uint32_t *)where) = ((re->flags & PCRE2_LASTSET) != 0)? 
+ re->last_codeunit : 0; + break; + + case PCRE2_INFO_MATCHEMPTY: + *((uint32_t *)where) = (re->flags & PCRE2_MATCH_EMPTY) != 0; + break; + + case PCRE2_INFO_MATCHLIMIT: + *((uint32_t *)where) = re->limit_match; + if (re->limit_match == UINT32_MAX) return PCRE2_ERROR_UNSET; + break; + + case PCRE2_INFO_MAXLOOKBEHIND: + *((uint32_t *)where) = re->max_lookbehind; + break; + + case PCRE2_INFO_MINLENGTH: + *((uint32_t *)where) = re->minlength; + break; + + case PCRE2_INFO_NAMEENTRYSIZE: + *((uint32_t *)where) = re->name_entry_size; + break; + + case PCRE2_INFO_NAMECOUNT: + *((uint32_t *)where) = re->name_count; + break; + + case PCRE2_INFO_NAMETABLE: + *((PCRE2_SPTR *)where) = (PCRE2_SPTR)((char *)re + sizeof(pcre2_real_code)); + break; + + case PCRE2_INFO_NEWLINE: + *((uint32_t *)where) = re->newline_convention; + break; + + case PCRE2_INFO_SIZE: + *((size_t *)where) = re->blocksize; + break; + + default: return PCRE2_ERROR_BADOPTION; + } + +return 0; +} + + + +/************************************************* +* Callout enumerator * +*************************************************/ + +/* +Arguments: + code points to compiled code + callback function called for each callout block + callout_data user data passed to the callback + +Returns: 0 when successfully completed + < 0 on local error + != 0 for callback error +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_callout_enumerate(const pcre2_code *code, + int (*callback)(pcre2_callout_enumerate_block *, void *), void *callout_data) +{ +pcre2_real_code *re = (pcre2_real_code *)code; +pcre2_callout_enumerate_block cb; +PCRE2_SPTR cc; +#ifdef SUPPORT_UNICODE +BOOL utf; +#endif + +if (re == NULL) return PCRE2_ERROR_NULL; + +#ifdef SUPPORT_UNICODE +utf = (re->overall_options & PCRE2_UTF) != 0; +#endif + +/* Check that the first field in the block is the magic number. If it is not, +return with PCRE2_ERROR_BADMAGIC. 
*/ + +if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC; + +/* Check that this pattern was compiled in the correct bit mode */ + +if ((re->flags & (PCRE2_CODE_UNIT_WIDTH/8)) == 0) return PCRE2_ERROR_BADMODE; + +cb.version = 0; +cc = (PCRE2_SPTR)((uint8_t *)re + sizeof(pcre2_real_code)) + + re->name_count * re->name_entry_size; + +while (TRUE) + { + int rc; + switch (*cc) + { + case OP_END: + return 0; + + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + case OP_STAR: + case OP_MINSTAR: + case OP_PLUS: + case OP_MINPLUS: + case OP_QUERY: + case OP_MINQUERY: + case OP_UPTO: + case OP_MINUPTO: + case OP_EXACT: + case OP_POSSTAR: + case OP_POSPLUS: + case OP_POSQUERY: + case OP_POSUPTO: + case OP_STARI: + case OP_MINSTARI: + case OP_PLUSI: + case OP_MINPLUSI: + case OP_QUERYI: + case OP_MINQUERYI: + case OP_UPTOI: + case OP_MINUPTOI: + case OP_EXACTI: + case OP_POSSTARI: + case OP_POSPLUSI: + case OP_POSQUERYI: + case OP_POSUPTOI: + case OP_NOTSTAR: + case OP_NOTMINSTAR: + case OP_NOTPLUS: + case OP_NOTMINPLUS: + case OP_NOTQUERY: + case OP_NOTMINQUERY: + case OP_NOTUPTO: + case OP_NOTMINUPTO: + case OP_NOTEXACT: + case OP_NOTPOSSTAR: + case OP_NOTPOSPLUS: + case OP_NOTPOSQUERY: + case OP_NOTPOSUPTO: + case OP_NOTSTARI: + case OP_NOTMINSTARI: + case OP_NOTPLUSI: + case OP_NOTMINPLUSI: + case OP_NOTQUERYI: + case OP_NOTMINQUERYI: + case OP_NOTUPTOI: + case OP_NOTMINUPTOI: + case OP_NOTEXACTI: + case OP_NOTPOSSTARI: + case OP_NOTPOSPLUSI: + case OP_NOTPOSQUERYI: + case OP_NOTPOSUPTOI: + cc += PRIV(OP_lengths)[*cc]; +#ifdef SUPPORT_UNICODE + if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); +#endif + break; + + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEEXACT: + case OP_TYPEPOSSTAR: + case OP_TYPEPOSPLUS: + case OP_TYPEPOSQUERY: + case OP_TYPEPOSUPTO: + cc += PRIV(OP_lengths)[*cc]; +#ifdef 
SUPPORT_UNICODE + if (cc[-1] == OP_PROP || cc[-1] == OP_NOTPROP) cc += 2; +#endif + break; + +#if defined SUPPORT_UNICODE || PCRE2_CODE_UNIT_WIDTH != 8 + case OP_XCLASS: + cc += GET(cc, 1); + break; +#endif + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_SKIP_ARG: + case OP_THEN_ARG: + cc += PRIV(OP_lengths)[*cc] + cc[1]; + break; + + case OP_CALLOUT: + cb.pattern_position = GET(cc, 1); + cb.next_item_length = GET(cc, 1 + LINK_SIZE); + cb.callout_number = cc[1 + 2*LINK_SIZE]; + cb.callout_string_offset = 0; + cb.callout_string_length = 0; + cb.callout_string = NULL; + rc = callback(&cb, callout_data); + if (rc != 0) return rc; + cc += PRIV(OP_lengths)[*cc]; + break; + + case OP_CALLOUT_STR: + cb.pattern_position = GET(cc, 1); + cb.next_item_length = GET(cc, 1 + LINK_SIZE); + cb.callout_number = 0; + cb.callout_string_offset = GET(cc, 1 + 3*LINK_SIZE); + cb.callout_string_length = + GET(cc, 1 + 2*LINK_SIZE) - (1 + 4*LINK_SIZE) - 2; + cb.callout_string = cc + (1 + 4*LINK_SIZE) + 1; + rc = callback(&cb, callout_data); + if (rc != 0) return rc; + cc += GET(cc, 1 + 2*LINK_SIZE); + break; + + default: + cc += PRIV(OP_lengths)[*cc]; + break; + } + } +} + +/* End of pcre2_pattern_info.c */ diff --git a/src/pcre/pcre_printint.c b/src/pcre2/src/pcre2_printint.c similarity index 63% rename from src/pcre/pcre_printint.c rename to src/pcre2/src/pcre2_printint.c index 60dcb55e..b9bab025 100644 --- a/src/pcre/pcre_printint.c +++ b/src/pcre2/src/pcre2_printint.c @@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2012 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -40,104 +41,102 @@ POSSIBILITY OF SUCH DAMAGE. 
/* This module contains a PCRE private debugging function for printing out the internal form of a compiled regular expression, along with some supporting -local functions. This source file is used in two places: +local functions. This source file is #included in pcre2test.c at each supported +code unit width, with PCRE2_SUFFIX set appropriately, just like the functions +that comprise the library. It can also optionally be included in +pcre2_compile.c for detailed debugging in error situations. */ -(1) It is #included by pcre_compile.c when it is compiled in debugging mode -(PCRE_DEBUG defined in pcre_internal.h). It is not included in production -compiles. In this case PCRE_INCLUDED is defined. -(2) It is also compiled separately and linked with pcretest.c, which can be -asked to print out a compiled regex for debugging purposes. */ +/* Tables of operator names. The same 8-bit table is used for all code unit +widths, so it must be defined only once. The list itself is defined in +pcre2_internal.h, which is #included by pcre2test before this file. */ -#ifndef PCRE_INCLUDED - -#ifdef HAVE_CONFIG_H -#include "config.h" +#ifndef OP_LISTS_DEFINED +static const char *OP_names[] = { OP_NAME_LIST }; +#define OP_LISTS_DEFINED #endif -/* For pcretest program. */ -#define PRIV(name) name - -/* We have to include pcre_internal.h because we need the internal info for -displaying the results of pcre_study() and we also need to know about the -internal macros, structures, and other internal data values; pcretest has -"inside information" compared to a program that strictly follows the PCRE API. +/* The functions and tables herein must all have mode-dependent names. */ -Although pcre_internal.h does itself include pcre.h, we explicitly include it -here before pcre_internal.h so that the PCRE_EXP_xxx macros get set -appropriately for an application, not for building PCRE. 
*/ +#define OP_lengths PCRE2_SUFFIX(OP_lengths_) +#define get_ucpname PCRE2_SUFFIX(get_ucpname_) +#define pcre2_printint PCRE2_SUFFIX(pcre2_printint_) +#define print_char PCRE2_SUFFIX(print_char_) +#define print_custring PCRE2_SUFFIX(print_custring_) +#define print_custring_bylen PCRE2_SUFFIX(print_custring_bylen_) +#define print_prop PCRE2_SUFFIX(print_prop_) -#include "pcre.h" -#include "pcre_internal.h" +/* Table of sizes for the fixed-length opcodes. It's defined in a macro so that +the definition is next to the definition of the opcodes in pcre2_internal.h. +The contents of the table are, however, mode-dependent. */ -/* These are the funtions that are contained within. It doesn't seem worth -having a separate .h file just for this. */ +static const uint8_t OP_lengths[] = { OP_LENGTHS }; -#endif /* PCRE_INCLUDED */ -#ifdef PCRE_INCLUDED -static /* Keep the following function as private. */ -#endif -#if defined COMPILE_PCRE8 -void pcre_printint(pcre *external_re, FILE *f, BOOL print_lengths); -#elif defined COMPILE_PCRE16 -void pcre16_printint(pcre *external_re, FILE *f, BOOL print_lengths); -#elif defined COMPILE_PCRE32 -void pcre32_printint(pcre *external_re, FILE *f, BOOL print_lengths); -#endif - -/* Macro that decides whether a character should be output as a literal or in -hexadecimal. We don't use isprint() because that can vary from system to system -(even without the use of locales) and we want the output always to be the same, -for testing purposes. */ - -#ifdef EBCDIC -#define PRINTABLE(c) ((c) >= 64 && (c) < 255) -#else -#define PRINTABLE(c) ((c) >= 32 && (c) < 127) -#endif - -/* The table of operator names. */ - -static const char *priv_OP_names[] = { OP_NAME_LIST }; - -/* This table of operator lengths is not actually used by the working code, -but its size is needed for a check that ensures it is the correct size for the -number of opcodes (thus catching update omissions). 
*/ - -static const pcre_uint8 priv_OP_lengths[] = { OP_LENGTHS }; +/************************************************* +* Print one character from a string * +*************************************************/ +/* In UTF mode the character may occupy more than one code unit. +Arguments: + f file to write to + ptr pointer to first code unit of the character + utf TRUE if string is UTF (will be FALSE if UTF is not supported) -/************************************************* -* Print single- or multi-byte character * -*************************************************/ +Returns: number of additional code units used +*/ static unsigned int -print_char(FILE *f, pcre_uchar *ptr, BOOL utf) +print_char(FILE *f, PCRE2_SPTR ptr, BOOL utf) { -pcre_uint32 c = *ptr; - -#ifndef SUPPORT_UTF +uint32_t c = *ptr; +BOOL one_code_unit = !utf; -(void)utf; /* Avoid compiler warning */ -if (PRINTABLE(c)) fprintf(f, "%c", (char)c); -else if (c <= 0x80) fprintf(f, "\\x%02x", c); -else fprintf(f, "\\x{%x}", c); -return 0; +/* If UTF is supported and requested, check for a valid single code unit. */ +#ifdef SUPPORT_UNICODE +if (utf) + { +#if PCRE2_CODE_UNIT_WIDTH == 8 + one_code_unit = c < 0x80; +#elif PCRE2_CODE_UNIT_WIDTH == 16 + one_code_unit = (c & 0xfc00) != 0xd800; #else + one_code_unit = (c & 0xfffff800u) != 0xd800u; +#endif /* CODE_UNIT_WIDTH */ + } +#endif /* SUPPORT_UNICODE */ -#if defined COMPILE_PCRE8 +/* Handle a valid one-code-unit character at any width. */ -if (!utf || (c & 0xc0) != 0xc0) +if (one_code_unit) { if (PRINTABLE(c)) fprintf(f, "%c", (char)c); else if (c < 0x80) fprintf(f, "\\x%02x", c); else fprintf(f, "\\x{%02x}", c); return 0; } + +/* Code for invalid UTF code units and multi-unit UTF characters is different +for each width. If UTF is not supported, control should never get here, but we +need a return statement to keep the compiler happy. 
*/ + +#ifndef SUPPORT_UNICODE +return 0; +#else + +/* Malformed UTF-8 should occur only if the sanity check has been turned off. +Rather than swallow random bytes, just stop if we hit a bad one. Print it with +\X instead of \x as an indication. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 +if ((c & 0xc0) != 0xc0) + { + fprintf(f, "\\X{%x}", c); /* Invalid starting byte */ + return 0; + } else { int i; @@ -146,198 +145,181 @@ else c = (c & PRIV(utf8_table3)[a]) << s; for (i = 1; i <= a; i++) { - /* This is a check for malformed UTF-8; it should only occur if the sanity - check has been turned off. Rather than swallow random bytes, just stop if - we hit a bad one. Print it with \X instead of \x as an indication. */ - if ((ptr[i] & 0xc0) != 0x80) { - fprintf(f, "\\X{%x}", c); + fprintf(f, "\\X{%x}", c); /* Invalid secondary byte */ return i - 1; } - - /* The byte is OK */ - s -= 6; c |= (ptr[i] & 0x3f) << s; } fprintf(f, "\\x{%x}", c); return a; - } - -#elif defined COMPILE_PCRE16 - -if (!utf || (c & 0xfc00) != 0xd800) - { - if (PRINTABLE(c)) fprintf(f, "%c", (char)c); - else if (c <= 0x80) fprintf(f, "\\x%02x", c); - else fprintf(f, "\\x{%02x}", c); - return 0; - } -else - { - /* This is a check for malformed UTF-16; it should only occur if the sanity - check has been turned off. Rather than swallow a low surrogate, just stop if - we hit a bad one. Print it with \X instead of \x as an indication. */ - - if ((ptr[1] & 0xfc00) != 0xdc00) - { - fprintf(f, "\\X{%x}", c); - return 0; - } - - c = (((c & 0x3ff) << 10) | (ptr[1] & 0x3ff)) + 0x10000; - fprintf(f, "\\x{%x}", c); - return 1; - } +} +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ -#elif defined COMPILE_PCRE32 +/* UTF-16: rather than swallow a low surrogate, just stop if we hit a bad one. +Print it with \X instead of \x as an indication. 
*/ -if (!utf || (c & 0xfffff800u) != 0xd800u) +#if PCRE2_CODE_UNIT_WIDTH == 16 +if ((ptr[1] & 0xfc00) != 0xdc00) { - if (PRINTABLE(c)) fprintf(f, "%c", (char)c); - else if (c <= 0x80) fprintf(f, "\\x%02x", c); - else fprintf(f, "\\x{%x}", c); - return 0; - } -else - { - /* This is a check for malformed UTF-32; it should only occur if the sanity - check has been turned off. Rather than swallow a surrogate, just stop if - we hit one. Print it with \X instead of \x as an indication. */ fprintf(f, "\\X{%x}", c); return 0; } +c = (((c & 0x3ff) << 10) | (ptr[1] & 0x3ff)) + 0x10000; +fprintf(f, "\\x{%x}", c); +return 1; +#endif /* PCRE2_CODE_UNIT_WIDTH == 16 */ -#endif /* COMPILE_PCRE[8|16|32] */ +/* For UTF-32 we get here only for a malformed code unit, which should only +occur if the sanity check has been turned off. Print it with \X instead of \x +as an indication. */ -#endif /* SUPPORT_UTF */ +#if PCRE2_CODE_UNIT_WIDTH == 32 +fprintf(f, "\\X{%x}", c); +return 0; +#endif /* PCRE2_CODE_UNIT_WIDTH == 32 */ +#endif /* SUPPORT_UNICODE */ } + + /************************************************* -* Print uchar string (regardless of utf) * +* Print string as a list of code units * *************************************************/ +/* These take no account of UTF as they always print each individual code unit. +The string is zero-terminated for print_custring(); the length is given for +print_custring_bylen(). 
+ +Arguments: + f file to write to + ptr point to the string + len length for print_custring_bylen() + +Returns: nothing +*/ + static void -print_puchar(FILE *f, PCRE_PUCHAR ptr) +print_custring(FILE *f, PCRE2_SPTR ptr) { while (*ptr != '\0') { - register pcre_uint32 c = *ptr++; + uint32_t c = *ptr++; if (PRINTABLE(c)) fprintf(f, "%c", c); else fprintf(f, "\\x{%x}", c); } } +static void +print_custring_bylen(FILE *f, PCRE2_SPTR ptr, PCRE2_UCHAR len) +{ +for (; len > 0; len--) + { + uint32_t c = *ptr++; + if (PRINTABLE(c)) fprintf(f, "%c", c); else fprintf(f, "\\x{%x}", c); + } +} + + + /************************************************* * Find Unicode property name * *************************************************/ +/* When there is no UTF/UCP support, the table of names does not exist. This +function should not be called in such configurations, because a pattern that +tries to use Unicode properties won't compile. Rather than put lots of #ifdefs +into the main code, however, we just put one into this function. */ + static const char * get_ucpname(unsigned int ptype, unsigned int pvalue) { -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE int i; for (i = PRIV(utt_size) - 1; i >= 0; i--) { if (ptype == PRIV(utt)[i].type && pvalue == PRIV(utt)[i].value) break; } return (i >= 0)? PRIV(utt_names) + PRIV(utt)[i].name_offset : "??"; -#else -/* It gets harder and harder to shut off unwanted compiler warnings. */ -ptype = ptype * pvalue; -return (ptype == pvalue)? "??" : "??"; -#endif +#else /* No UTF support */ +(void)ptype; +(void)pvalue; +return "??"; +#endif /* SUPPORT_UNICODE */ } + /************************************************* * Print Unicode property value * *************************************************/ /* "Normal" properties can be printed from tables. The PT_CLIST property is a pseudo-property that contains a pointer to a list of case-equivalent -characters. This is used only when UCP support is available and UTF mode is -selected. 
It should never occur otherwise, but just in case it does, have -something ready to print. */ +characters. + +Arguments: + f file to write to + code pointer in the compiled code + before text to print before + after text to print after + +Returns: nothing +*/ static void -print_prop(FILE *f, pcre_uchar *code, const char *before, const char *after) +print_prop(FILE *f, PCRE2_SPTR code, const char *before, const char *after) { if (code[1] != PT_CLIST) { - fprintf(f, "%s%s %s%s", before, priv_OP_names[*code], get_ucpname(code[1], + fprintf(f, "%s%s %s%s", before, OP_names[*code], get_ucpname(code[1], code[2]), after); } else { const char *not = (*code == OP_PROP)? "" : "not "; -#ifndef SUPPORT_UCP - fprintf(f, "%s%sclist %d%s", before, not, code[2], after); -#else - const pcre_uint32 *p = PRIV(ucd_caseless_sets) + code[2]; + const uint32_t *p = PRIV(ucd_caseless_sets) + code[2]; fprintf (f, "%s%sclist", before, not); while (*p < NOTACHAR) fprintf(f, " %04x", *p++); fprintf(f, "%s", after); -#endif } } - /************************************************* -* Print compiled regex * +* Print compiled pattern * *************************************************/ -/* Make this function work for a regex with integers either byte order. -However, we assume that what we are passed is a compiled regex. The -print_lengths flag controls whether offsets and lengths of items are printed. -They can be turned off from pcretest so that automatic tests on bytecode can be -written that do not depend on the value of LINK_SIZE. */ +/* The print_lengths flag controls whether offsets and lengths of items are +printed. Lengths can be turned off from pcre2test so that automatic tests on +bytecode can be written that do not depend on the value of LINK_SIZE. -#ifdef PCRE_INCLUDED -static /* Keep the following function as private. 
*/ -#endif -#if defined COMPILE_PCRE8 -void -pcre_printint(pcre *external_re, FILE *f, BOOL print_lengths) -#elif defined COMPILE_PCRE16 -void -pcre16_printint(pcre *external_re, FILE *f, BOOL print_lengths) -#elif defined COMPILE_PCRE32 -void -pcre32_printint(pcre *external_re, FILE *f, BOOL print_lengths) -#endif -{ -REAL_PCRE *re = (REAL_PCRE *)external_re; -pcre_uchar *codestart, *code; -BOOL utf; +Arguments: + re a compiled pattern + f the file to write to + print_lengths show various lengths -unsigned int options = re->options; -int offset = re->name_table_offset; -int count = re->name_count; -int size = re->name_entry_size; +Returns: nothing +*/ -if (re->magic_number != MAGIC_NUMBER) - { - offset = ((offset << 8) & 0xff00) | ((offset >> 8) & 0xff); - count = ((count << 8) & 0xff00) | ((count >> 8) & 0xff); - size = ((size << 8) & 0xff00) | ((size >> 8) & 0xff); - options = ((options << 24) & 0xff000000) | - ((options << 8) & 0x00ff0000) | - ((options >> 8) & 0x0000ff00) | - ((options >> 24) & 0x000000ff); - } +static void +pcre2_printint(pcre2_code *re, FILE *f, BOOL print_lengths) +{ +PCRE2_SPTR codestart, nametable, code; +uint32_t nesize = re->name_entry_size; +BOOL utf = (re->overall_options & PCRE2_UTF) != 0; -code = codestart = (pcre_uchar *)re + offset + count * size; -/* PCRE_UTF(16|32) have the same value as PCRE_UTF8. 
*/ -utf = (options & PCRE_UTF8) != 0; +nametable = (PCRE2_SPTR)((uint8_t *)re + sizeof(pcre2_real_code)); +code = codestart = nametable + re->name_count * re->name_entry_size; for(;;) { - pcre_uchar *ccode; + PCRE2_SPTR ccode; + uint32_t c; + int i; const char *flag = " "; - pcre_uint32 c; unsigned int extra = 0; if (print_lengths) @@ -356,13 +338,13 @@ for(;;) case OP_TABLE_LENGTH: case OP_TABLE_LENGTH + - ((sizeof(priv_OP_names)/sizeof(const char *) == OP_TABLE_LENGTH) && - (sizeof(priv_OP_lengths) == OP_TABLE_LENGTH)): - break; + ((sizeof(OP_names)/sizeof(const char *) == OP_TABLE_LENGTH) && + (sizeof(OP_lengths) == OP_TABLE_LENGTH)): + return; /* ========================================================================== */ case OP_END: - fprintf(f, " %s\n", priv_OP_names[*code]); + fprintf(f, " %s\n", OP_names[*code]); fprintf(f, "------------------------------------------------------------------\n"); return; @@ -394,7 +376,7 @@ for(;;) case OP_SCBRAPOS: if (print_lengths) fprintf(f, "%3d ", GET(code, 1)); else fprintf(f, " "); - fprintf(f, "%s %d", priv_OP_names[*code], GET2(code, 1+LINK_SIZE)); + fprintf(f, "%s %d", OP_names[*code], GET2(code, 1+LINK_SIZE)); break; case OP_BRA: @@ -410,30 +392,31 @@ for(;;) case OP_ASSERT_NOT: case OP_ASSERTBACK: case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: case OP_ONCE: - case OP_ONCE_NC: + case OP_SCRIPT_RUN: case OP_COND: case OP_SCOND: case OP_REVERSE: if (print_lengths) fprintf(f, "%3d ", GET(code, 1)); else fprintf(f, " "); - fprintf(f, "%s", priv_OP_names[*code]); + fprintf(f, "%s", OP_names[*code]); break; case OP_CLOSE: - fprintf(f, " %s %d", priv_OP_names[*code], GET2(code, 1)); + fprintf(f, " %s %d", OP_names[*code], GET2(code, 1)); break; case OP_CREF: - fprintf(f, "%3d %s", GET2(code,1), priv_OP_names[*code]); + fprintf(f, "%3d %s", GET2(code,1), OP_names[*code]); break; case OP_DNCREF: { - pcre_uchar *entry = (pcre_uchar *)re + offset + (GET2(code, 1) * size) + - IMM2_SIZE; + PCRE2_SPTR 
entry = nametable + (GET2(code, 1) * nesize) + IMM2_SIZE; fprintf(f, " %s Cond ref <", flag); - print_puchar(f, entry); + print_custring(f, entry); fprintf(f, ">%d", GET2(code, 1 + IMM2_SIZE)); } break; @@ -448,16 +431,19 @@ for(;;) case OP_DNRREF: { - pcre_uchar *entry = (pcre_uchar *)re + offset + (GET2(code, 1) * size) + - IMM2_SIZE; + PCRE2_SPTR entry = nametable + (GET2(code, 1) * nesize) + IMM2_SIZE; fprintf(f, " %s Cond recurse <", flag); - print_puchar(f, entry); + print_custring(f, entry); fprintf(f, ">%d", GET2(code, 1 + IMM2_SIZE)); } break; - case OP_DEF: - fprintf(f, " Cond def"); + case OP_FALSE: + fprintf(f, " Cond false"); + break; + + case OP_TRUE: + fprintf(f, " Cond true"); break; case OP_STARI: @@ -490,6 +476,7 @@ for(;;) case OP_TYPEMINQUERY: case OP_TYPEPOSQUERY: fprintf(f, " %s ", flag); + if (*code >= OP_TYPESTAR) { if (code[1] == OP_PROP || code[1] == OP_NOTPROP) @@ -497,10 +484,10 @@ for(;;) print_prop(f, code + 1, "", " "); extra = 2; } - else fprintf(f, "%s", priv_OP_names[code[1]]); + else fprintf(f, "%s", OP_names[code[1]]); } else extra = print_char(f, code+1, utf); - fprintf(f, "%s", priv_OP_names[*code]); + fprintf(f, "%s", OP_names[*code]); break; case OP_EXACTI: @@ -531,7 +518,7 @@ for(;;) print_prop(f, code + IMM2_SIZE + 1, " ", " "); extra = 2; } - else fprintf(f, " %s", priv_OP_names[code[1 + IMM2_SIZE]]); + else fprintf(f, " %s", OP_names[code[1 + IMM2_SIZE]]); fprintf(f, "{"); if (*code != OP_TYPEEXACT) fprintf(f, "0,"); fprintf(f, "%d}", GET2(code,1)); @@ -571,7 +558,7 @@ for(;;) case OP_NOTPOSQUERY: fprintf(f, " %s [^", flag); extra = print_char(f, code + 1, utf); - fprintf(f, "]%s", priv_OP_names[*code]); + fprintf(f, "]%s", OP_names[*code]); break; case OP_NOTEXACTI: @@ -598,7 +585,7 @@ for(;;) case OP_RECURSE: if (print_lengths) fprintf(f, "%3d ", GET(code, 1)); else fprintf(f, " "); - fprintf(f, "%s", priv_OP_names[*code]); + fprintf(f, "%s", OP_names[*code]); break; case OP_REFI: @@ -606,7 +593,7 @@ for(;;) /* Fall 
through */ case OP_REF: fprintf(f, " %s \\%d", flag, GET2(code,1)); - ccode = code + priv_OP_lengths[*code]; + ccode = code + OP_lengths[*code]; goto CLASS_REF_REPEAT; case OP_DNREFI: @@ -614,18 +601,32 @@ for(;;) /* Fall through */ case OP_DNREF: { - pcre_uchar *entry = (pcre_uchar *)re + offset + (GET2(code, 1) * size) + - IMM2_SIZE; + PCRE2_SPTR entry = nametable + (GET2(code, 1) * nesize) + IMM2_SIZE; fprintf(f, " %s \\k<", flag); - print_puchar(f, entry); + print_custring(f, entry); fprintf(f, ">%d", GET2(code, 1 + IMM2_SIZE)); } - ccode = code + priv_OP_lengths[*code]; + ccode = code + OP_lengths[*code]; goto CLASS_REF_REPEAT; case OP_CALLOUT: - fprintf(f, " %s %d %d %d", priv_OP_names[*code], code[1], GET(code,2), - GET(code, 2 + LINK_SIZE)); + fprintf(f, " %s %d %d %d", OP_names[*code], code[1 + 2*LINK_SIZE], + GET(code, 1), GET(code, 1 + LINK_SIZE)); + break; + + case OP_CALLOUT_STR: + c = code[1 + 4*LINK_SIZE]; + fprintf(f, " %s %c", OP_names[*code], c); + extra = GET(code, 1 + 2*LINK_SIZE); + print_custring_bylen(f, code + 2 + 4*LINK_SIZE, extra - 3 - 4*LINK_SIZE); + for (i = 0; PRIV(callout_start_delims)[i] != 0; i++) + if (c == PRIV(callout_start_delims)[i]) + { + c = PRIV(callout_end_delims)[i]; + break; + } + fprintf(f, "%c %d %d %d", c, GET(code, 1 + 3*LINK_SIZE), GET(code, 1), + GET(code, 1 + LINK_SIZE)); break; case OP_PROP: @@ -641,12 +642,11 @@ for(;;) case OP_NCLASS: case OP_XCLASS: { - int i; unsigned int min, max; BOOL printmap; BOOL invertmap = FALSE; - pcre_uint8 *map; - pcre_uint8 inverted_map[32]; + uint8_t *map; + uint8_t inverted_map[32]; fprintf(f, " ["); @@ -672,20 +672,21 @@ for(;;) if (printmap) { - map = (pcre_uint8 *)ccode; + map = (uint8_t *)ccode; if (invertmap) { - for (i = 0; i < 32; i++) inverted_map[i] = ~map[i]; + /* Using 255 ^ instead of ~ avoids clang sanitize warning. 
*/ + for (i = 0; i < 32; i++) inverted_map[i] = 255 ^ map[i]; map = inverted_map; } for (i = 0; i < 256; i++) { - if ((map[i/8] & (1 << (i&7))) != 0) + if ((map[i/8] & (1u << (i&7))) != 0) { int j; for (j = i+1; j < 256; j++) - if ((map[j/8] & (1 << (j&7))) == 0) break; + if ((map[j/8] & (1u << (j&7))) == 0) break; if (i == '-' || i == ']') fprintf(f, "\\"); if (PRINTABLE(i)) fprintf(f, "%c", i); else fprintf(f, "\\x%02x", i); @@ -699,14 +700,14 @@ for(;;) i = j; } } - ccode += 32 / sizeof(pcre_uchar); + ccode += 32 / sizeof(PCRE2_UCHAR); } /* For an XCLASS there is always some additional data */ if (*code == OP_XCLASS) { - pcre_uchar ch; + PCRE2_UCHAR ch; while ((ch = *ccode++) != XCL_END) { BOOL not = FALSE; @@ -776,8 +777,8 @@ for(;;) case OP_CRPOSSTAR: case OP_CRPOSPLUS: case OP_CRPOSQUERY: - fprintf(f, "%s", priv_OP_names[*ccode]); - extra += priv_OP_lengths[*ccode]; + fprintf(f, "%s", OP_names[*ccode]); + extra += OP_lengths[*ccode]; break; case OP_CRRANGE: @@ -789,7 +790,7 @@ for(;;) else fprintf(f, "{%u,%u}", min, max); if (*ccode == OP_CRMINRANGE) fprintf(f, "?"); else if (*ccode == OP_CRPOSRANGE) fprintf(f, "+"); - extra += priv_OP_lengths[*ccode]; + extra += OP_lengths[*ccode]; break; /* Do nothing if it's not a repeat; this code stops picky compilers @@ -802,16 +803,17 @@ for(;;) break; case OP_MARK: + case OP_COMMIT_ARG: case OP_PRUNE_ARG: case OP_SKIP_ARG: case OP_THEN_ARG: - fprintf(f, " %s ", priv_OP_names[*code]); - print_puchar(f, code + 2); + fprintf(f, " %s ", OP_names[*code]); + print_custring_bylen(f, code + 2, code[1]); extra += code[1]; break; case OP_THEN: - fprintf(f, " %s", priv_OP_names[*code]); + fprintf(f, " %s", OP_names[*code]); break; case OP_CIRCM: @@ -822,13 +824,13 @@ for(;;) /* Anything else is just an item with no data, but possibly a flag. 
*/ default: - fprintf(f, " %s %s", flag, priv_OP_names[*code]); + fprintf(f, " %s %s", flag, OP_names[*code]); break; } - code += priv_OP_lengths[*code] + extra; + code += OP_lengths[*code] + extra; fprintf(f, "\n"); } } -/* End of pcre_printint.src */ +/* End of pcre2_printint.c */ diff --git a/src/pcre2/src/pcre2_script_run.c b/src/pcre2/src/pcre2_script_run.c new file mode 100644 index 00000000..91a48330 --- /dev/null +++ b/src/pcre2/src/pcre2_script_run.c @@ -0,0 +1,441 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +/* This module contains the function for checking a script run. */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + +/************************************************* +* Check script run * +*************************************************/ + +/* A script run is conceptually a sequence of characters all in the same +Unicode script. However, it isn't quite that simple. There are special rules +for scripts that are commonly used together, and also special rules for digits. +This function implements the appropriate checks, which is possible only when +PCRE2 is compiled with Unicode support. The function returns TRUE if there is +no Unicode support; however, it should never be called in that circumstance +because an error is given by pcre2_compile() if a script run is called for in a +version of PCRE2 compiled without Unicode support. + +Arguments: + pgr point to the first character + endptr point after the last character + utf TRUE if in UTF mode + +Returns: TRUE if this is a valid script run +*/ + +/* These dummy values must be less than the negation of the largest offset in +the PRIV(ucd_script_sets) vector, which is held in a 16-bit field in UCD +records (and is only likely to be a few hundred). 
*/ + +#define SCRIPT_UNSET (-99999) +#define SCRIPT_HANPENDING (-99998) +#define SCRIPT_HANHIRAKATA (-99997) +#define SCRIPT_HANBOPOMOFO (-99996) +#define SCRIPT_HANHANGUL (-99995) +#define SCRIPT_LIST (-99994) + +#define INTERSECTION_LIST_SIZE 50 + +BOOL +PRIV(script_run)(PCRE2_SPTR ptr, PCRE2_SPTR endptr, BOOL utf) +{ +#ifdef SUPPORT_UNICODE +int require_script = SCRIPT_UNSET; +uint8_t intersection_list[INTERSECTION_LIST_SIZE]; +const uint8_t *require_list = NULL; +uint32_t require_digitset = 0; +uint32_t c; + +#if PCRE2_CODE_UNIT_WIDTH == 32 +(void)utf; /* Avoid compiler warning */ +#endif + +/* Any string containing fewer than 2 characters is a valid script run. */ + +if (ptr >= endptr) return TRUE; +GETCHARINCTEST(c, ptr); +if (ptr >= endptr) return TRUE; + +/* Scan strings of two or more characters, checking the Unicode characteristics +of each code point. We make use of the Script Extensions property. There is +special code for scripts that can be combined with characters from the Han +Chinese script. This may be used in conjunction with four other scripts in +these combinations: + +. Han with Hiragana and Katakana is allowed (for Japanese). +. Han with Bopomofo is allowed (for Taiwanese Mandarin). +. Han with Hangul is allowed (for Korean). + +If the first significant character's script is one of the four, the required +script type is immediately known. However, if the first significant +character's script is Han, we have to keep checking for a non-Han character. +Hence the SCRIPT_HANPENDING state. */ + +for (;;) + { + const ucd_record *ucd = GET_UCD(c); + int32_t scriptx = ucd->scriptx; + + /* If the script extension is Unknown, the string is not a valid script run. + Such characters can only form script runs of length one. */ + + if (scriptx == ucp_Unknown) return FALSE; + + /* A character whose script extension is Inherited is always accepted with + any script, and plays no further part in this testing. 
A character whose + script is Common is always accepted, but must still be tested for a digit + below. The scriptx value at this point is non-zero, because zero is + ucp_Unknown, tested for above. */ + + if (scriptx != ucp_Inherited) + { + if (scriptx != ucp_Common) + { + /* If the script extension value is positive, the character is not a mark + that can be used with many scripts. In the simple case we either set or + compare with the required script. However, handling the scripts that can + combine with Han is more complicated, as is the case when the previous + characters have been many-script marks. */ + + if (scriptx > 0) + { + switch(require_script) + { + /* Either the first significant character (require_script unset) or + after only Han characters. */ + + case SCRIPT_UNSET: + case SCRIPT_HANPENDING: + switch(scriptx) + { + case ucp_Han: + require_script = SCRIPT_HANPENDING; + break; + + case ucp_Hiragana: + case ucp_Katakana: + require_script = SCRIPT_HANHIRAKATA; + break; + + case ucp_Bopomofo: + require_script = SCRIPT_HANBOPOMOFO; + break; + + case ucp_Hangul: + require_script = SCRIPT_HANHANGUL; + break; + + /* Not a Han-related script. If expecting one, fail. Otherwise set + the requirement to this script. */ + + default: + if (require_script == SCRIPT_HANPENDING) return FALSE; + require_script = scriptx; + break; + } + break; + + /* Previously encountered one of the "with Han" scripts. Check that + this character is appropriate. */ + + case SCRIPT_HANHIRAKATA: + if (scriptx != ucp_Han && scriptx != ucp_Hiragana && + scriptx != ucp_Katakana) + return FALSE; + break; + + case SCRIPT_HANBOPOMOFO: + if (scriptx != ucp_Han && scriptx != ucp_Bopomofo) return FALSE; + break; + + case SCRIPT_HANHANGUL: + if (scriptx != ucp_Han && scriptx != ucp_Hangul) return FALSE; + break; + + /* We have a list of scripts to check that is derived from one or + more previous characters.
This is either one of the lists in + ucd_script_sets[] (for one previous character) or the intersection of + several lists for multiple characters. */ + + case SCRIPT_LIST: + { + const uint8_t *list; + for (list = require_list; *list != 0; list++) + { + if (*list == scriptx) break; + } + if (*list == 0) return FALSE; + } + + /* The rest of the string must be in this script, but we have to + allow for the Han complications. */ + + switch(scriptx) + { + case ucp_Han: + require_script = SCRIPT_HANPENDING; + break; + + case ucp_Hiragana: + case ucp_Katakana: + require_script = SCRIPT_HANHIRAKATA; + break; + + case ucp_Bopomofo: + require_script = SCRIPT_HANBOPOMOFO; + break; + + case ucp_Hangul: + require_script = SCRIPT_HANHANGUL; + break; + + default: + require_script = scriptx; + break; + } + break; + + /* This is the easy case when a single script is required. */ + + default: + if (scriptx != require_script) return FALSE; + break; + } + } /* End of handling positive scriptx */ + + /* If scriptx is negative, this character is a mark-type character that + has a list of permitted scripts. */ + + else + { + uint32_t chspecial; + const uint8_t *clist, *rlist; + const uint8_t *list = PRIV(ucd_script_sets) - scriptx; + + switch(require_script) + { + case SCRIPT_UNSET: + require_list = PRIV(ucd_script_sets) - scriptx; + require_script = SCRIPT_LIST; + break; + + /* An inspection of the Unicode 11.0.0 files shows that there are the + following types of Script Extension list that involve the Han, + Bopomofo, Hiragana, Katakana, and Hangul scripts: + + . Bopomofo + Han + . Han + Hiragana + Katakana + . Hiragana + Katakana + . Bopomofo + Hangul + Han + Hiragana + Katakana + + The following code tries to make sense of this.
*/ + +#define FOUND_BOPOMOFO 1 +#define FOUND_HIRAGANA 2 +#define FOUND_KATAKANA 4 +#define FOUND_HANGUL 8 + + case SCRIPT_HANPENDING: + chspecial = 0; + for (; *list != 0; list++) + { + switch (*list) + { + case ucp_Bopomofo: chspecial |= FOUND_BOPOMOFO; break; + case ucp_Hiragana: chspecial |= FOUND_HIRAGANA; break; + case ucp_Katakana: chspecial |= FOUND_KATAKANA; break; + case ucp_Hangul: chspecial |= FOUND_HANGUL; break; + default: break; + } + } + + if (chspecial == 0) return FALSE; + + if (chspecial == FOUND_BOPOMOFO) + { + require_script = SCRIPT_HANBOPOMOFO; + } + else if (chspecial == (FOUND_HIRAGANA|FOUND_KATAKANA)) + { + require_script = SCRIPT_HANHIRAKATA; + } + + /* Otherwise it must be allowed with all of them, so remain in + the pending state. */ + + break; + + case SCRIPT_HANHIRAKATA: + for (; *list != 0; list++) + { + if (*list == ucp_Hiragana || *list == ucp_Katakana) break; + } + if (*list == 0) return FALSE; + break; + + case SCRIPT_HANBOPOMOFO: + for (; *list != 0; list++) + { + if (*list == ucp_Bopomofo) break; + } + if (*list == 0) return FALSE; + break; + + case SCRIPT_HANHANGUL: + for (; *list != 0; list++) + { + if (*list == ucp_Hangul) break; + } + if (*list == 0) return FALSE; + break; + + /* Previously encountered one or more characters that are allowed + with a list of scripts. Build the intersection of the required list + with this character's list in intersection_list[]. This code is + written so that it still works OK if the required list is already in + that vector. */ + + case SCRIPT_LIST: + { + int i = 0; + for (rlist = require_list; *rlist != 0; rlist++) + { + for (clist = list; *clist != 0; clist++) + { + if (*rlist == *clist) + { + intersection_list[i++] = *rlist; + break; + } + } + } + if (i == 0) return FALSE; /* No scripts in common */ + + /* If there's just one script in common, we can set it as the + unique required script. Otherwise, terminate the intersection list + and make it the required list. 
*/ + + if (i == 1) + { + require_script = intersection_list[0]; + } + else + { + intersection_list[i] = 0; + require_list = intersection_list; + } + } + break; + + /* The previously set required script is a single script, not + Han-related. Check that it is in this character's list. */ + + default: + for (; *list != 0; list++) + { + if (*list == require_script) break; + } + if (*list == 0) return FALSE; + break; + } + } /* End of handling negative scriptx */ + } /* End of checking non-Common character */ + + /* The character is in an acceptable script. We must now ensure that all + decimal digits in the string come from the same set. Some scripts (e.g. + Common, Arabic) have more than one set of decimal digits. This code does + not allow mixing sets, even within the same script. The vector called + PRIV(ucd_digit_sets)[] contains, in its first element, the number of + following elements, and then, in ascending order, the code points of the + '9' characters in every set of 10 digits. Each set is identified by the + offset in the vector of its '9' character. An initial check of the first + value picks up ASCII digits quickly. Otherwise, a binary chop is used. */ + + if (ucd->chartype == ucp_Nd) + { + uint32_t digitset; + + if (c <= PRIV(ucd_digit_sets)[1]) digitset = 1; else + { + int mid; + int bot = 1; + int top = PRIV(ucd_digit_sets)[0]; + for (;;) + { + if (top <= bot + 1) /* <= rather than == is paranoia */ + { + digitset = top; + break; + } + mid = (top + bot) / 2; + if (c <= PRIV(ucd_digit_sets)[mid]) top = mid; else bot = mid; + } + } + + /* A required value of 0 means "unset". */ + + if (require_digitset == 0) require_digitset = digitset; + else if (digitset != require_digitset) return FALSE; + } /* End digit handling */ + } /* End checking non-Inherited character */ + + /* If we haven't yet got to the end, pick up the next character. 
*/ + + if (ptr >= endptr) return TRUE; + GETCHARINCTEST(c, ptr); + } /* End checking loop */ + +#else /* NOT SUPPORT_UNICODE */ +(void)ptr; +(void)endptr; +(void)utf; +return TRUE; +#endif /* SUPPORT_UNICODE */ +} + +/* End of pcre2_script_run.c */ diff --git a/src/pcre2/src/pcre2_serialize.c b/src/pcre2/src/pcre2_serialize.c new file mode 100644 index 00000000..ba17a26d --- /dev/null +++ b/src/pcre2/src/pcre2_serialize.c @@ -0,0 +1,286 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +/* This module contains functions for serializing and deserializing +a sequence of compiled codes. */ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + + +#include "pcre2_internal.h" + +/* Magic number to provide a small check against being handed junk. */ + +#define SERIALIZED_DATA_MAGIC 0x50523253u + +/* Deserialization is limited to the current PCRE version and +character width. */ + +#define SERIALIZED_DATA_VERSION \ + ((PCRE2_MAJOR) | ((PCRE2_MINOR) << 16)) + +#define SERIALIZED_DATA_CONFIG \ + (sizeof(PCRE2_UCHAR) | ((sizeof(void*)) << 8) | ((sizeof(PCRE2_SIZE)) << 16)) + + + +/************************************************* +* Serialize compiled patterns * +*************************************************/ + +PCRE2_EXP_DEFN int32_t PCRE2_CALL_CONVENTION +pcre2_serialize_encode(const pcre2_code **codes, int32_t number_of_codes, + uint8_t **serialized_bytes, PCRE2_SIZE *serialized_size, + pcre2_general_context *gcontext) +{ +uint8_t *bytes; +uint8_t *dst_bytes; +int32_t i; +PCRE2_SIZE total_size; +const pcre2_real_code *re; +const uint8_t *tables; +pcre2_serialized_data *data; + +const pcre2_memctl *memctl = (gcontext != NULL) ? 
+ &gcontext->memctl : &PRIV(default_compile_context).memctl; + +if (codes == NULL || serialized_bytes == NULL || serialized_size == NULL) + return PCRE2_ERROR_NULL; + +if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA; + +/* Compute total size. */ +total_size = sizeof(pcre2_serialized_data) + TABLES_LENGTH; +tables = NULL; + +for (i = 0; i < number_of_codes; i++) + { + if (codes[i] == NULL) return PCRE2_ERROR_NULL; + re = (const pcre2_real_code *)(codes[i]); + if (re->magic_number != MAGIC_NUMBER) return PCRE2_ERROR_BADMAGIC; + if (tables == NULL) + tables = re->tables; + else if (tables != re->tables) + return PCRE2_ERROR_MIXEDTABLES; + total_size += re->blocksize; + } + +/* Initialize the byte stream. */ +bytes = memctl->malloc(total_size + sizeof(pcre2_memctl), memctl->memory_data); +if (bytes == NULL) return PCRE2_ERROR_NOMEMORY; + +/* The controller is stored as a hidden parameter. */ +memcpy(bytes, memctl, sizeof(pcre2_memctl)); +bytes += sizeof(pcre2_memctl); + +data = (pcre2_serialized_data *)bytes; +data->magic = SERIALIZED_DATA_MAGIC; +data->version = SERIALIZED_DATA_VERSION; +data->config = SERIALIZED_DATA_CONFIG; +data->number_of_codes = number_of_codes; + +/* Copy all compiled code data. */ +dst_bytes = bytes + sizeof(pcre2_serialized_data); +memcpy(dst_bytes, tables, TABLES_LENGTH); +dst_bytes += TABLES_LENGTH; + +for (i = 0; i < number_of_codes; i++) + { + re = (const pcre2_real_code *)(codes[i]); + (void)memcpy(dst_bytes, (char *)re, re->blocksize); + + /* Certain fields in the compiled code block are re-set during + deserialization. In order to ensure that the serialized data stream is always + the same for the same pattern, set them to zero here. We can't assume the + copy of the pattern is correctly aligned for accessing the fields as part of + a structure. Note the use of sizeof(void *) in the second of these, to + specify the size of a pointer. 
If sizeof(uint8_t *) is used (tables is a + pointer to uint8_t), gcc gives a warning because the first argument is also a + pointer to uint8_t. Casting the first argument to (void *) can stop this, but + it didn't stop Coverity giving the same complaint. */ + + (void)memset(dst_bytes + offsetof(pcre2_real_code, memctl), 0, + sizeof(pcre2_memctl)); + (void)memset(dst_bytes + offsetof(pcre2_real_code, tables), 0, + sizeof(void *)); + (void)memset(dst_bytes + offsetof(pcre2_real_code, executable_jit), 0, + sizeof(void *)); + + dst_bytes += re->blocksize; + } + +*serialized_bytes = bytes; +*serialized_size = total_size; +return number_of_codes; +} + + +/************************************************* +* Deserialize compiled patterns * +*************************************************/ + +PCRE2_EXP_DEFN int32_t PCRE2_CALL_CONVENTION +pcre2_serialize_decode(pcre2_code **codes, int32_t number_of_codes, + const uint8_t *bytes, pcre2_general_context *gcontext) +{ +const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes; +const pcre2_memctl *memctl = (gcontext != NULL) ? + &gcontext->memctl : &PRIV(default_compile_context).memctl; + +const uint8_t *src_bytes; +pcre2_real_code *dst_re; +uint8_t *tables; +int32_t i, j; + +/* Sanity checks. */ + +if (data == NULL || codes == NULL) return PCRE2_ERROR_NULL; +if (number_of_codes <= 0) return PCRE2_ERROR_BADDATA; +if (data->number_of_codes <= 0) return PCRE2_ERROR_BADSERIALIZEDDATA; +if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC; +if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE; +if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE; + +if (number_of_codes > data->number_of_codes) + number_of_codes = data->number_of_codes; + +src_bytes = bytes + sizeof(pcre2_serialized_data); + +/* Decode tables. The reference count for the tables is stored immediately +following them. 
*/ + +tables = memctl->malloc(TABLES_LENGTH + sizeof(PCRE2_SIZE), memctl->memory_data); +if (tables == NULL) return PCRE2_ERROR_NOMEMORY; + +memcpy(tables, src_bytes, TABLES_LENGTH); +*(PCRE2_SIZE *)(tables + TABLES_LENGTH) = number_of_codes; +src_bytes += TABLES_LENGTH; + +/* Decode the byte stream. We must not try to read the size from the compiled +code block in the stream, because it might be unaligned, which causes errors on +hardware such as Sparc-64 that doesn't like unaligned memory accesses. The type +of the blocksize field is given its own name to ensure that it is the same here +as in the block. */ + +for (i = 0; i < number_of_codes; i++) + { + CODE_BLOCKSIZE_TYPE blocksize; + memcpy(&blocksize, src_bytes + offsetof(pcre2_real_code, blocksize), + sizeof(CODE_BLOCKSIZE_TYPE)); + if (blocksize <= sizeof(pcre2_real_code)) + return PCRE2_ERROR_BADSERIALIZEDDATA; + + /* The allocator provided by gcontext replaces the original one. */ + + dst_re = (pcre2_real_code *)PRIV(memctl_malloc)(blocksize, + (pcre2_memctl *)gcontext); + if (dst_re == NULL) + { + memctl->free(tables, memctl->memory_data); + for (j = 0; j < i; j++) + { + memctl->free(codes[j], memctl->memory_data); + codes[j] = NULL; + } + return PCRE2_ERROR_NOMEMORY; + } + + /* The new allocator must be preserved. */ + + memcpy(((uint8_t *)dst_re) + sizeof(pcre2_memctl), + src_bytes + sizeof(pcre2_memctl), blocksize - sizeof(pcre2_memctl)); + if (dst_re->magic_number != MAGIC_NUMBER || + dst_re->name_entry_size > MAX_NAME_SIZE + IMM2_SIZE + 1 || + dst_re->name_count > MAX_NAME_COUNT) + { + memctl->free(dst_re, memctl->memory_data); + return PCRE2_ERROR_BADSERIALIZEDDATA; + } + + /* At the moment only one table is supported. 
*/ + + dst_re->tables = tables; + dst_re->executable_jit = NULL; + dst_re->flags |= PCRE2_DEREF_TABLES; + + codes[i] = dst_re; + src_bytes += blocksize; + } + +return number_of_codes; +} + + +/************************************************* +* Get the number of serialized patterns * +*************************************************/ + +PCRE2_EXP_DEFN int32_t PCRE2_CALL_CONVENTION +pcre2_serialize_get_number_of_codes(const uint8_t *bytes) +{ +const pcre2_serialized_data *data = (const pcre2_serialized_data *)bytes; + +if (data == NULL) return PCRE2_ERROR_NULL; +if (data->magic != SERIALIZED_DATA_MAGIC) return PCRE2_ERROR_BADMAGIC; +if (data->version != SERIALIZED_DATA_VERSION) return PCRE2_ERROR_BADMODE; +if (data->config != SERIALIZED_DATA_CONFIG) return PCRE2_ERROR_BADMODE; + +return data->number_of_codes; +} + + +/************************************************* +* Free the allocated stream * +*************************************************/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_serialize_free(uint8_t *bytes) +{ +if (bytes != NULL) + { + pcre2_memctl *memctl = (pcre2_memctl *)(bytes - sizeof(pcre2_memctl)); + memctl->free(memctl, memctl->memory_data); + } +} + +/* End of pcre2_serialize.c */ diff --git a/src/pcre2/src/pcre2_string_utils.c b/src/pcre2/src/pcre2_string_utils.c new file mode 100644 index 00000000..d6be01ac --- /dev/null +++ b/src/pcre2/src/pcre2_string_utils.c @@ -0,0 +1,237 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +/* This module contains internal functions for comparing and finding the length +of strings. These are used instead of strcmp() etc because the standard +functions work only on 8-bit data. 
*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + +/************************************************* +* Emulated memmove() for systems without it * +*************************************************/ + +/* This function can make use of bcopy() if it is available. Otherwise do it by +steam, as there are some non-Unix environments that lack both memmove() and +bcopy(). */ + +#if !defined(VPCOMPAT) && !defined(HAVE_MEMMOVE) +void * +PRIV(memmove)(void *d, const void *s, size_t n) +{ +#ifdef HAVE_BCOPY +bcopy(s, d, n); +return d; +#else +size_t i; +unsigned char *dest = (unsigned char *)d; +const unsigned char *src = (const unsigned char *)s; +if (dest > src) + { + dest += n; + src += n; + for (i = 0; i < n; ++i) *(--dest) = *(--src); + return (void *)dest; + } +else + { + for (i = 0; i < n; ++i) *dest++ = *src++; + return (void *)(dest - n); + } +#endif /* not HAVE_BCOPY */ +} +#endif /* not VPCOMPAT && not HAVE_MEMMOVE */ + + +/************************************************* +* Compare two zero-terminated PCRE2 strings * +*************************************************/ + +/* +Arguments: + str1 first string + str2 second string + +Returns: 0, 1, or -1 +*/ + +int +PRIV(strcmp)(PCRE2_SPTR str1, PCRE2_SPTR str2) +{ +PCRE2_UCHAR c1, c2; +while (*str1 != '\0' || *str2 != '\0') + { + c1 = *str1++; + c2 = *str2++; + if (c1 != c2) return ((c1 > c2) << 1) - 1; + } +return 0; +} + + +/************************************************* +* Compare zero-terminated PCRE2 & 8-bit strings * +*************************************************/ + +/* As the 8-bit string is almost always a literal, its type is specified as +const char *.
+ +Arguments: + str1 first string + str2 second string + +Returns: 0, 1, or -1 +*/ + +int +PRIV(strcmp_c8)(PCRE2_SPTR str1, const char *str2) +{ +PCRE2_UCHAR c1, c2; +while (*str1 != '\0' || *str2 != '\0') + { + c1 = *str1++; + c2 = *str2++; + if (c1 != c2) return ((c1 > c2) << 1) - 1; + } +return 0; +} + + +/************************************************* +* Compare two PCRE2 strings, given a length * +*************************************************/ + +/* +Arguments: + str1 first string + str2 second string + len the length + +Returns: 0, 1, or -1 +*/ + +int +PRIV(strncmp)(PCRE2_SPTR str1, PCRE2_SPTR str2, size_t len) +{ +PCRE2_UCHAR c1, c2; +for (; len > 0; len--) + { + c1 = *str1++; + c2 = *str2++; + if (c1 != c2) return ((c1 > c2) << 1) - 1; + } +return 0; +} + + +/************************************************* +* Compare PCRE2 string to 8-bit string by length * +*************************************************/ + +/* As the 8-bit string is almost always a literal, its type is specified as +const char *. 
+ +Arguments: + str1 first string + str2 second string + len the length + +Returns: 0, 1, or -1 +*/ + +int +PRIV(strncmp_c8)(PCRE2_SPTR str1, const char *str2, size_t len) +{ +PCRE2_UCHAR c1, c2; +for (; len > 0; len--) + { + c1 = *str1++; + c2 = *str2++; + if (c1 != c2) return ((c1 > c2) << 1) - 1; + } +return 0; +} + + +/************************************************* +* Find the length of a PCRE2 string * +*************************************************/ + +/* +Argument: the string +Returns: the length +*/ + +PCRE2_SIZE +PRIV(strlen)(PCRE2_SPTR str) +{ +PCRE2_SIZE c = 0; +while (*str++ != 0) c++; +return c; +} + + +/************************************************* +* Copy 8-bit 0-terminated string to PCRE2 string * +*************************************************/ + +/* Arguments: + str1 buffer to receive the string + str2 8-bit string to be copied + +Returns: the number of code units used (excluding trailing zero) +*/ + +PCRE2_SIZE +PRIV(strcpy_c8)(PCRE2_UCHAR *str1, const char *str2) +{ +PCRE2_UCHAR *t = str1; +while (*str2 != 0) *t++ = *str2++; +*t = 0; +return t - str1; +} + +/* End of pcre2_string_utils.c */ diff --git a/src/pcre2/src/pcre2_study.c b/src/pcre2/src/pcre2_study.c new file mode 100644 index 00000000..9bbb3757 --- /dev/null +++ b/src/pcre2/src/pcre2_study.c @@ -0,0 +1,1825 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. 
+ + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + +/* This module contains functions for scanning a compiled pattern and +collecting data (e.g. minimum matching length). 
*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + +/* The maximum remembered capturing brackets minimum. */ + +#define MAX_CACHE_BACKREF 128 + +/* Set a bit in the starting code unit bit map. */ + +#define SET_BIT(c) re->start_bitmap[(c)/8] |= (1u << ((c)&7)) + +/* Returns from set_start_bits() */ + +enum { SSB_FAIL, SSB_DONE, SSB_CONTINUE, SSB_UNKNOWN, SSB_TOODEEP }; + + +/************************************************* +* Find the minimum subject length for a group * +*************************************************/ + +/* Scan a parenthesized group and compute the minimum length of subject that +is needed to match it. This is a lower bound; it does not mean there is a +string of that length that matches. In UTF mode, the result is in characters +rather than code units. The field in a compiled pattern for storing the minimum +length is 16-bits long (on the grounds that anything longer than that is +pathological), so we give up when we reach that amount. This also means that +integer overflow for really crazy patterns cannot happen. + +Backreference minimum lengths are cached to speed up multiple references. This +function is called only when the highest back reference in the pattern is less +than or equal to MAX_CACHE_BACKREF, which is one less than the size of the +caching vector. The zeroth element contains the number of the highest set +value. + +Arguments: + re compiled pattern block + code pointer to start of group (the bracket) + startcode pointer to start of the whole pattern's code + utf UTF flag + recurses chain of recurse_check to catch mutual recursion + countptr pointer to call count (to catch over complexity) + backref_cache vector for caching back references. + +This function is no longer called when the pattern contains (*ACCEPT); however, +the old code for returning -1 is retained, just in case. 
+ +Returns: the minimum length + -1 \C in UTF-8 mode + or (*ACCEPT) + or pattern too complicated + -2 internal error (missing capturing bracket) + -3 internal error (opcode not listed) +*/ + +static int +find_minlength(const pcre2_real_code *re, PCRE2_SPTR code, + PCRE2_SPTR startcode, BOOL utf, recurse_check *recurses, int *countptr, + int *backref_cache) +{ +int length = -1; +int branchlength = 0; +int prev_cap_recno = -1; +int prev_cap_d = 0; +int prev_recurse_recno = -1; +int prev_recurse_d = 0; +uint32_t once_fudge = 0; +BOOL had_recurse = FALSE; +BOOL dupcapused = (re->flags & PCRE2_DUPCAPUSED) != 0; +PCRE2_SPTR nextbranch = code + GET(code, 1); +PCRE2_UCHAR *cc = (PCRE2_UCHAR *)code + 1 + LINK_SIZE; +recurse_check this_recurse; + +/* If this is a "could be empty" group, its minimum length is 0. */ + +if (*code >= OP_SBRA && *code <= OP_SCOND) return 0; + +/* Skip over capturing bracket number */ + +if (*code == OP_CBRA || *code == OP_CBRAPOS) cc += IMM2_SIZE; + +/* A large and/or complex regex can take too long to process. */ + +if ((*countptr)++ > 1000) return -1; + +/* Scan along the opcodes for this branch. If we get to the end of the branch, +check the length against that of the other branches. If the accumulated length +passes 16-bits, reset to that value and skip the rest of the branch. */ + +for (;;) + { + int d, min, recno; + PCRE2_UCHAR op, *cs, *ce; + + if (branchlength >= UINT16_MAX) + { + branchlength = UINT16_MAX; + cc = (PCRE2_UCHAR *)nextbranch; + } + + op = *cc; + switch (op) + { + case OP_COND: + case OP_SCOND: + + /* If there is only one branch in a condition, the implied branch has zero + length, so we don't add anything. This covers the DEFINE "condition" + automatically. If there are two branches we can treat it the same as any + other non-capturing subpattern. 
*/ + + cs = cc + GET(cc, 1); + if (*cs != OP_ALT) + { + cc = cs + 1 + LINK_SIZE; + break; + } + goto PROCESS_NON_CAPTURE; + + case OP_BRA: + /* There's a special case of OP_BRA, when it is wrapped round a repeated + OP_RECURSE. We'd like to process the latter at this level so that + remembering the value works for repeated cases. So we do nothing, but + set a fudge value to skip over the OP_KET after the recurse. */ + + if (cc[1+LINK_SIZE] == OP_RECURSE && cc[2*(1+LINK_SIZE)] == OP_KET) + { + once_fudge = 1 + LINK_SIZE; + cc += 1 + LINK_SIZE; + break; + } + /* Fall through */ + + case OP_ONCE: + case OP_SCRIPT_RUN: + case OP_SBRA: + case OP_BRAPOS: + case OP_SBRAPOS: + PROCESS_NON_CAPTURE: + d = find_minlength(re, cc, startcode, utf, recurses, countptr, + backref_cache); + if (d < 0) return d; + branchlength += d; + do cc += GET(cc, 1); while (*cc == OP_ALT); + cc += 1 + LINK_SIZE; + break; + + /* To save time for repeated capturing subpatterns, we remember the + length of the previous one. Unfortunately we can't do the same for + the unnumbered ones above. Nor can we do this if (?| is present in the + pattern because captures with the same number are not then identical. */ + + case OP_CBRA: + case OP_SCBRA: + case OP_CBRAPOS: + case OP_SCBRAPOS: + recno = (int)GET2(cc, 1+LINK_SIZE); + if (dupcapused || recno != prev_cap_recno) + { + prev_cap_recno = recno; + prev_cap_d = find_minlength(re, cc, startcode, utf, recurses, countptr, + backref_cache); + if (prev_cap_d < 0) return prev_cap_d; + } + branchlength += prev_cap_d; + do cc += GET(cc, 1); while (*cc == OP_ALT); + cc += 1 + LINK_SIZE; + break; + + /* ACCEPT makes things far too complicated; we have to give up. In fact, + from 10.34 onwards, if a pattern contains (*ACCEPT), this function is not + used. However, leave the code in place, just in case. */ + + case OP_ACCEPT: + case OP_ASSERT_ACCEPT: + return -1; + + /* Reached end of a branch; if it's a ket it is the end of a nested + call. 
If it's ALT it is an alternation in a nested call. If it is END it's + the end of the outer call. All can be handled by the same code. If the + length of any branch is zero, there is no need to scan any subsequent + branches. */ + + case OP_ALT: + case OP_KET: + case OP_KETRMAX: + case OP_KETRMIN: + case OP_KETRPOS: + case OP_END: + if (length < 0 || (!had_recurse && branchlength < length)) + length = branchlength; + if (op != OP_ALT || length == 0) return length; + nextbranch = cc + GET(cc, 1); + cc += 1 + LINK_SIZE; + branchlength = 0; + had_recurse = FALSE; + break; + + /* Skip over assertive subpatterns */ + + case OP_ASSERT: + case OP_ASSERT_NOT: + case OP_ASSERTBACK: + case OP_ASSERTBACK_NOT: + case OP_ASSERT_NA: + case OP_ASSERTBACK_NA: + do cc += GET(cc, 1); while (*cc == OP_ALT); + /* Fall through */ + + /* Skip over things that don't match chars */ + + case OP_REVERSE: + case OP_CREF: + case OP_DNCREF: + case OP_RREF: + case OP_DNRREF: + case OP_FALSE: + case OP_TRUE: + case OP_CALLOUT: + case OP_SOD: + case OP_SOM: + case OP_EOD: + case OP_EODN: + case OP_CIRC: + case OP_CIRCM: + case OP_DOLL: + case OP_DOLLM: + case OP_NOT_WORD_BOUNDARY: + case OP_WORD_BOUNDARY: + cc += PRIV(OP_lengths)[*cc]; + break; + + case OP_CALLOUT_STR: + cc += GET(cc, 1 + 2*LINK_SIZE); + break; + + /* Skip over a subpattern that has a {0} or {0,x} quantifier */ + + case OP_BRAZERO: + case OP_BRAMINZERO: + case OP_BRAPOSZERO: + case OP_SKIPZERO: + cc += PRIV(OP_lengths)[*cc]; + do cc += GET(cc, 1); while (*cc == OP_ALT); + cc += 1 + LINK_SIZE; + break; + + /* Handle literal characters and + repetitions */ + + case OP_CHAR: + case OP_CHARI: + case OP_NOT: + case OP_NOTI: + case OP_PLUS: + case OP_PLUSI: + case OP_MINPLUS: + case OP_MINPLUSI: + case OP_POSPLUS: + case OP_POSPLUSI: + case OP_NOTPLUS: + case OP_NOTPLUSI: + case OP_NOTMINPLUS: + case OP_NOTMINPLUSI: + case OP_NOTPOSPLUS: + case OP_NOTPOSPLUSI: + branchlength++; + cc += 2; +#ifdef SUPPORT_UNICODE + if (utf && 
HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); +#endif + break; + + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEPOSPLUS: + branchlength++; + cc += (cc[1] == OP_PROP || cc[1] == OP_NOTPROP)? 4 : 2; + break; + + /* Handle exact repetitions. The count is already in characters, but we + may need to skip over a multibyte character in UTF mode. */ + + case OP_EXACT: + case OP_EXACTI: + case OP_NOTEXACT: + case OP_NOTEXACTI: + branchlength += GET2(cc,1); + cc += 2 + IMM2_SIZE; +#ifdef SUPPORT_UNICODE + if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); +#endif + break; + + case OP_TYPEEXACT: + branchlength += GET2(cc,1); + cc += 2 + IMM2_SIZE + ((cc[1 + IMM2_SIZE] == OP_PROP + || cc[1 + IMM2_SIZE] == OP_NOTPROP)? 2 : 0); + break; + + /* Handle single-char non-literal matchers */ + + case OP_PROP: + case OP_NOTPROP: + cc += 2; + /* Fall through */ + + case OP_NOT_DIGIT: + case OP_DIGIT: + case OP_NOT_WHITESPACE: + case OP_WHITESPACE: + case OP_NOT_WORDCHAR: + case OP_WORDCHAR: + case OP_ANY: + case OP_ALLANY: + case OP_EXTUNI: + case OP_HSPACE: + case OP_NOT_HSPACE: + case OP_VSPACE: + case OP_NOT_VSPACE: + branchlength++; + cc++; + break; + + /* "Any newline" might match two characters, but it also might match just + one. */ + + case OP_ANYNL: + branchlength += 1; + cc++; + break; + + /* The single-byte matcher means we can't proceed in UTF mode. (In + non-UTF mode \C will actually be turned into OP_ALLANY, so won't ever + appear, but leave the code, just in case.) */ + + case OP_ANYBYTE: +#ifdef SUPPORT_UNICODE + if (utf) return -1; +#endif + branchlength++; + cc++; + break; + + /* For repeated character types, we have to test for \p and \P, which have + an extra two bytes of parameters. 
*/ + + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEPOSSTAR: + case OP_TYPEPOSQUERY: + if (cc[1] == OP_PROP || cc[1] == OP_NOTPROP) cc += 2; + cc += PRIV(OP_lengths)[op]; + break; + + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEPOSUPTO: + if (cc[1 + IMM2_SIZE] == OP_PROP + || cc[1 + IMM2_SIZE] == OP_NOTPROP) cc += 2; + cc += PRIV(OP_lengths)[op]; + break; + + /* Check a class for variable quantification */ + + case OP_CLASS: + case OP_NCLASS: +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + /* The original code caused an unsigned overflow in 64-bit systems, + so now we use a conditional statement. */ + if (op == OP_XCLASS) + cc += GET(cc, 1); + else + cc += PRIV(OP_lengths)[OP_CLASS]; +#else + cc += PRIV(OP_lengths)[OP_CLASS]; +#endif + + switch (*cc) + { + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRPOSPLUS: + branchlength++; + /* Fall through */ + + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSSTAR: + case OP_CRPOSQUERY: + cc++; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + branchlength += GET2(cc,1); + cc += 1 + 2 * IMM2_SIZE; + break; + + default: + branchlength++; + break; + } + break; + + /* Backreferences and subroutine calls (OP_RECURSE) are treated in the same + way: we find the minimum length for the subpattern. A recursion + (backreference or subroutine) causes a flag to be set that causes the + length of this branch to be ignored. The logic is that a recursion can only + make sense if there is another alternative that stops the recursing. That + will provide the minimum length (when no recursion happens). + + If PCRE2_MATCH_UNSET_BACKREF is set, a backreference to an unset bracket + matches an empty string (by default it causes a matching failure), so in + that case we must set the minimum length to zero. 
+ + For backreferences, if duplicate numbers are present in the pattern we check + for a reference to a duplicate. If it is, we don't know which version will + be referenced, so we have to set the minimum length to zero. */ + + /* Duplicate named pattern back reference. */ + + case OP_DNREF: + case OP_DNREFI: + if (!dupcapused && (re->overall_options & PCRE2_MATCH_UNSET_BACKREF) == 0) + { + int count = GET2(cc, 1+IMM2_SIZE); + PCRE2_UCHAR *slot = + (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) + + GET2(cc, 1) * re->name_entry_size; + + d = INT_MAX; + + /* Scan all groups with the same name; find the shortest. */ + + while (count-- > 0) + { + int dd, i; + recno = GET2(slot, 0); + + if (recno <= backref_cache[0] && backref_cache[recno] >= 0) + dd = backref_cache[recno]; + else + { + ce = cs = (PCRE2_UCHAR *)PRIV(find_bracket)(startcode, utf, recno); + if (cs == NULL) return -2; + do ce += GET(ce, 1); while (*ce == OP_ALT); + + dd = 0; + if (!dupcapused || + (PCRE2_UCHAR *)PRIV(find_bracket)(ce, utf, recno) == NULL) + { + if (cc > cs && cc < ce) /* Simple recursion */ + { + had_recurse = TRUE; + } + else + { + recurse_check *r = recurses; + for (r = recurses; r != NULL; r = r->prev) + if (r->group == cs) break; + if (r != NULL) /* Mutual recursion */ + { + had_recurse = TRUE; + } + else + { + this_recurse.prev = recurses; /* No recursion */ + this_recurse.group = cs; + dd = find_minlength(re, cs, startcode, utf, &this_recurse, + countptr, backref_cache); + if (dd < 0) return dd; + } + } + } + + backref_cache[recno] = dd; + for (i = backref_cache[0] + 1; i < recno; i++) backref_cache[i] = -1; + backref_cache[0] = recno; + } + + if (dd < d) d = dd; + if (d <= 0) break; /* No point looking at any more */ + slot += re->name_entry_size; + } + } + else d = 0; + cc += 1 + 2*IMM2_SIZE; + goto REPEAT_BACK_REFERENCE; + + /* Single back reference by number. References by name are converted to by + number when there is no duplication. 
*/ + + case OP_REF: + case OP_REFI: + recno = GET2(cc, 1); + if (recno <= backref_cache[0] && backref_cache[recno] >= 0) + d = backref_cache[recno]; + else + { + int i; + d = 0; + + if ((re->overall_options & PCRE2_MATCH_UNSET_BACKREF) == 0) + { + ce = cs = (PCRE2_UCHAR *)PRIV(find_bracket)(startcode, utf, recno); + if (cs == NULL) return -2; + do ce += GET(ce, 1); while (*ce == OP_ALT); + + if (!dupcapused || + (PCRE2_UCHAR *)PRIV(find_bracket)(ce, utf, recno) == NULL) + { + if (cc > cs && cc < ce) /* Simple recursion */ + { + had_recurse = TRUE; + } + else + { + recurse_check *r = recurses; + for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break; + if (r != NULL) /* Mutual recursion */ + { + had_recurse = TRUE; + } + else /* No recursion */ + { + this_recurse.prev = recurses; + this_recurse.group = cs; + d = find_minlength(re, cs, startcode, utf, &this_recurse, countptr, + backref_cache); + if (d < 0) return d; + } + } + } + } + + backref_cache[recno] = d; + for (i = backref_cache[0] + 1; i < recno; i++) backref_cache[i] = -1; + backref_cache[0] = recno; + } + + cc += 1 + IMM2_SIZE; + + /* Handle repeated back references */ + + REPEAT_BACK_REFERENCE: + switch (*cc) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSSTAR: + case OP_CRPOSQUERY: + min = 0; + cc++; + break; + + case OP_CRPLUS: + case OP_CRMINPLUS: + case OP_CRPOSPLUS: + min = 1; + cc++; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + min = GET2(cc, 1); + cc += 1 + 2 * IMM2_SIZE; + break; + + default: + min = 1; + break; + } + + /* Take care not to overflow: (1) min and d are ints, so check that their + product is not greater than INT_MAX. (2) branchlength is limited to + UINT16_MAX (checked at the top of the loop). 
*/ + + if ((d > 0 && (INT_MAX/d) < min) || UINT16_MAX - branchlength < min*d) + branchlength = UINT16_MAX; + else branchlength += min * d; + break; + + /* Recursion always refers to the first occurrence of a subpattern with a + given number. Therefore, we can always make use of caching, even when the + pattern contains multiple subpatterns with the same number. */ + + case OP_RECURSE: + cs = ce = (PCRE2_UCHAR *)startcode + GET(cc, 1); + recno = GET2(cs, 1+LINK_SIZE); + if (recno == prev_recurse_recno) + { + branchlength += prev_recurse_d; + } + else + { + do ce += GET(ce, 1); while (*ce == OP_ALT); + if (cc > cs && cc < ce) /* Simple recursion */ + had_recurse = TRUE; + else + { + recurse_check *r = recurses; + for (r = recurses; r != NULL; r = r->prev) if (r->group == cs) break; + if (r != NULL) /* Mutual recursion */ + had_recurse = TRUE; + else + { + this_recurse.prev = recurses; + this_recurse.group = cs; + prev_recurse_d = find_minlength(re, cs, startcode, utf, &this_recurse, + countptr, backref_cache); + if (prev_recurse_d < 0) return prev_recurse_d; + prev_recurse_recno = recno; + branchlength += prev_recurse_d; + } + } + } + cc += 1 + LINK_SIZE + once_fudge; + once_fudge = 0; + break; + + /* Anything else does not or need not match a character. We can get the + item's length from the table, but for those that can match zero occurrences + of a character, we must take special action for UTF-8 characters. As it + happens, the "NOT" versions of these opcodes are used at present only for + ASCII characters, so they could be omitted from this list. However, in + future that may change, so we include them here so as not to leave a + gotcha for a future maintainer. 
*/ + + case OP_UPTO: + case OP_UPTOI: + case OP_NOTUPTO: + case OP_NOTUPTOI: + case OP_MINUPTO: + case OP_MINUPTOI: + case OP_NOTMINUPTO: + case OP_NOTMINUPTOI: + case OP_POSUPTO: + case OP_POSUPTOI: + case OP_NOTPOSUPTO: + case OP_NOTPOSUPTOI: + + case OP_STAR: + case OP_STARI: + case OP_NOTSTAR: + case OP_NOTSTARI: + case OP_MINSTAR: + case OP_MINSTARI: + case OP_NOTMINSTAR: + case OP_NOTMINSTARI: + case OP_POSSTAR: + case OP_POSSTARI: + case OP_NOTPOSSTAR: + case OP_NOTPOSSTARI: + + case OP_QUERY: + case OP_QUERYI: + case OP_NOTQUERY: + case OP_NOTQUERYI: + case OP_MINQUERY: + case OP_MINQUERYI: + case OP_NOTMINQUERY: + case OP_NOTMINQUERYI: + case OP_POSQUERY: + case OP_POSQUERYI: + case OP_NOTPOSQUERY: + case OP_NOTPOSQUERYI: + + cc += PRIV(OP_lengths)[op]; +#ifdef SUPPORT_UNICODE + if (utf && HAS_EXTRALEN(cc[-1])) cc += GET_EXTRALEN(cc[-1]); +#endif + break; + + /* Skip these, but we need to add in the name length. */ + + case OP_MARK: + case OP_COMMIT_ARG: + case OP_PRUNE_ARG: + case OP_SKIP_ARG: + case OP_THEN_ARG: + cc += PRIV(OP_lengths)[op] + cc[1]; + break; + + /* The remaining opcodes are just skipped over. */ + + case OP_CLOSE: + case OP_COMMIT: + case OP_FAIL: + case OP_PRUNE: + case OP_SET_SOM: + case OP_SKIP: + case OP_THEN: + cc += PRIV(OP_lengths)[op]; + break; + + /* This should not occur: we list all opcodes explicitly so that when + new ones get added they are properly considered. */ + + default: + return -3; + } + } +/* Control never gets here */ +} + + + +/************************************************* +* Set a bit and maybe its alternate case * +*************************************************/ + +/* Given a character, set its first code unit's bit in the table, and also the +corresponding bit for the other version of a letter if we are caseless. 
+ +Arguments: + re points to the regex block + p points to the first code unit of the character + caseless TRUE if caseless + utf TRUE for UTF mode + ucp TRUE for UCP mode + +Returns: pointer after the character +*/ + +static PCRE2_SPTR +set_table_bit(pcre2_real_code *re, PCRE2_SPTR p, BOOL caseless, BOOL utf, + BOOL ucp) +{ +uint32_t c = *p++; /* First code unit */ + +(void)utf; /* Stop compiler warnings when UTF not supported */ +(void)ucp; + +/* In 16-bit and 32-bit modes, code units greater than 0xff set the bit for +0xff. */ + +#if PCRE2_CODE_UNIT_WIDTH != 8 +if (c > 0xff) SET_BIT(0xff); else +#endif + +SET_BIT(c); + +/* In UTF-8 or UTF-16 mode, pick up the remaining code units in order to find +the end of the character, even when caseless. */ + +#ifdef SUPPORT_UNICODE +if (utf) + { +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (c >= 0xc0) GETUTF8INC(c, p); +#elif PCRE2_CODE_UNIT_WIDTH == 16 + if ((c & 0xfc00) == 0xd800) GETUTF16INC(c, p); +#endif + } +#endif /* SUPPORT_UNICODE */ + +/* If caseless, handle the other case of the character. */ + +if (caseless) + { +#ifdef SUPPORT_UNICODE + if (utf || ucp) + { + c = UCD_OTHERCASE(c); +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (utf) + { + PCRE2_UCHAR buff[6]; + (void)PRIV(ord2utf)(c, buff); + SET_BIT(buff[0]); + } + else if (c < 256) SET_BIT(c); +#else /* 16-bit or 32-bit mode */ + if (c > 0xff) SET_BIT(0xff); else SET_BIT(c); +#endif + } + + else +#endif /* SUPPORT_UNICODE */ + + /* Not UTF or UCP */ + + if (MAX_255(c)) SET_BIT(re->tables[fcc_offset + c]); + } + +return p; +} + + + +/************************************************* +* Set bits for a positive character type * +*************************************************/ + +/* This function sets starting bits for a character type. In UTF-8 mode, we can +only do a direct setting for bytes less than 128, as otherwise there can be +confusion with bytes in the middle of UTF-8 characters. 
In a "traditional" +environment, the tables will only recognize ASCII characters anyway, but in at +least one Windows environment, bits for some higher bytes were set in the tables. +So we deal with that case by considering the UTF-8 encoding. + +Arguments: + re the regex block + cbit_type the type of character wanted + table_limit 32 for non-UTF-8; 16 for UTF-8 + +Returns: nothing +*/ + +static void +set_type_bits(pcre2_real_code *re, int cbit_type, unsigned int table_limit) +{ +uint32_t c; +for (c = 0; c < table_limit; c++) + re->start_bitmap[c] |= re->tables[c+cbits_offset+cbit_type]; +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 +if (table_limit == 32) return; +for (c = 128; c < 256; c++) + { + if ((re->tables[cbits_offset + c/8] & (1u << (c&7))) != 0) + { + PCRE2_UCHAR buff[6]; + (void)PRIV(ord2utf)(c, buff); + SET_BIT(buff[0]); + } + } +#endif /* UTF-8 */ +} + + +/************************************************* +* Set bits for a negative character type * +*************************************************/ + +/* This function sets starting bits for a negative character type such as \D. +In UTF-8 mode, we can only do a direct setting for bytes less than 128, as +otherwise there can be confusion with bytes in the middle of UTF-8 characters. +Unlike in the positive case, where we can set appropriate starting bits for +specific high-valued UTF-8 characters, in this case we have to set the bits for +all high-valued characters. The lowest is 0xc2, but we overkill by starting at +0xc0 (192) for simplicity. 
+ +Arguments: + re the regex block + cbit_type the type of character wanted + table_limit 32 for non-UTF-8; 16 for UTF-8 + +Returns: nothing +*/ + +static void +set_nottype_bits(pcre2_real_code *re, int cbit_type, unsigned int table_limit) +{ +uint32_t c; +for (c = 0; c < table_limit; c++) + re->start_bitmap[c] |= ~(re->tables[c+cbits_offset+cbit_type]); +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 +if (table_limit != 32) for (c = 24; c < 32; c++) re->start_bitmap[c] = 0xff; +#endif +} + + + +/************************************************* +* Create bitmap of starting code units * +*************************************************/ + +/* This function scans a compiled unanchored expression recursively and +attempts to build a bitmap of the set of possible starting code units whose +values are less than 256. In 16-bit and 32-bit mode, values above 255 all cause +the 255 bit to be set. When calling set_type_bits() or set_nottype_bits() in +UTF-8 (sic) mode we pass a value of 16 rather than 32 as the final argument. +(See comments in those functions for the reason.) + +The SSB_CONTINUE return is useful for parenthesized groups in patterns such as +(a*)b where the group provides some optional starting code units but scanning +must continue at the outer level to find at least one mandatory code unit. At +the outermost level, this function fails unless the result is SSB_DONE. + +We restrict recursion (for nested groups) to 1000 to avoid stack overflow +issues. 
+ +Arguments: + re points to the compiled regex block + code points to an expression + utf TRUE if in UTF mode + ucp TRUE if in UCP mode + depthptr pointer to recurse depth + +Returns: SSB_FAIL => Failed to find any starting code units + SSB_DONE => Found mandatory starting code units + SSB_CONTINUE => Found optional starting code units + SSB_UNKNOWN => Hit an unrecognized opcode + SSB_TOODEEP => Recursion is too deep +*/ + +static int +set_start_bits(pcre2_real_code *re, PCRE2_SPTR code, BOOL utf, BOOL ucp, + int *depthptr) +{ +uint32_t c; +int yield = SSB_DONE; + +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 +int table_limit = utf? 16:32; +#else +int table_limit = 32; +#endif + +*depthptr += 1; +if (*depthptr > 1000) return SSB_TOODEEP; + +do + { + BOOL try_next = TRUE; + PCRE2_SPTR tcode = code + 1 + LINK_SIZE; + + if (*code == OP_CBRA || *code == OP_SCBRA || + *code == OP_CBRAPOS || *code == OP_SCBRAPOS) tcode += IMM2_SIZE; + + while (try_next) /* Loop for items in this branch */ + { + int rc; + uint8_t *classmap = NULL; +#ifdef SUPPORT_WIDE_CHARS + PCRE2_UCHAR xclassflags; +#endif + + switch(*tcode) + { + /* If we reach something we don't understand, it means a new opcode has + been created that hasn't been added to this function. Hopefully this + problem will be discovered during testing. */ + + default: + return SSB_UNKNOWN; + + /* Fail for a valid opcode that implies no starting bits. 
*/ + + case OP_ACCEPT: + case OP_ASSERT_ACCEPT: + case OP_ALLANY: + case OP_ANY: + case OP_ANYBYTE: + case OP_CIRCM: + case OP_CLOSE: + case OP_COMMIT: + case OP_COMMIT_ARG: + case OP_COND: + case OP_CREF: + case OP_FALSE: + case OP_TRUE: + case OP_DNCREF: + case OP_DNREF: + case OP_DNREFI: + case OP_DNRREF: + case OP_DOLL: + case OP_DOLLM: + case OP_END: + case OP_EOD: + case OP_EODN: + case OP_EXTUNI: + case OP_FAIL: + case OP_MARK: + case OP_NOT: + case OP_NOTEXACT: + case OP_NOTEXACTI: + case OP_NOTI: + case OP_NOTMINPLUS: + case OP_NOTMINPLUSI: + case OP_NOTMINQUERY: + case OP_NOTMINQUERYI: + case OP_NOTMINSTAR: + case OP_NOTMINSTARI: + case OP_NOTMINUPTO: + case OP_NOTMINUPTOI: + case OP_NOTPLUS: + case OP_NOTPLUSI: + case OP_NOTPOSPLUS: + case OP_NOTPOSPLUSI: + case OP_NOTPOSQUERY: + case OP_NOTPOSQUERYI: + case OP_NOTPOSSTAR: + case OP_NOTPOSSTARI: + case OP_NOTPOSUPTO: + case OP_NOTPOSUPTOI: + case OP_NOTPROP: + case OP_NOTQUERY: + case OP_NOTQUERYI: + case OP_NOTSTAR: + case OP_NOTSTARI: + case OP_NOTUPTO: + case OP_NOTUPTOI: + case OP_NOT_HSPACE: + case OP_NOT_VSPACE: + case OP_PRUNE: + case OP_PRUNE_ARG: + case OP_RECURSE: + case OP_REF: + case OP_REFI: + case OP_REVERSE: + case OP_RREF: + case OP_SCOND: + case OP_SET_SOM: + case OP_SKIP: + case OP_SKIP_ARG: + case OP_SOD: + case OP_SOM: + case OP_THEN: + case OP_THEN_ARG: + return SSB_FAIL; + + /* OP_CIRC happens only at the start of an anchored branch (multiline ^ + uses OP_CIRCM). Skip over it. */ + + case OP_CIRC: + tcode += PRIV(OP_lengths)[OP_CIRC]; + break; + + /* A "real" property test implies no starting bits, but the fake property + PT_CLIST identifies a list of characters. These lists are short, as they + are used for characters with more than one "other case", so there is no + point in recognizing them for OP_NOTPROP. 
*/ + + case OP_PROP: + if (tcode[1] != PT_CLIST) return SSB_FAIL; + { + const uint32_t *p = PRIV(ucd_caseless_sets) + tcode[2]; + while ((c = *p++) < NOTACHAR) + { +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 + if (utf) + { + PCRE2_UCHAR buff[6]; + (void)PRIV(ord2utf)(c, buff); + c = buff[0]; + } +#endif + if (c > 0xff) SET_BIT(0xff); else SET_BIT(c); + } + } + try_next = FALSE; + break; + + /* We can ignore word boundary tests. */ + + case OP_WORD_BOUNDARY: + case OP_NOT_WORD_BOUNDARY: + tcode++; + break; + + /* If we hit a bracket or a positive lookahead assertion, recurse to set + bits from within the subpattern. If it can't find anything, we have to + give up. If it finds some mandatory character(s), we are done for this + branch. Otherwise, carry on scanning after the subpattern. */ + + case OP_BRA: + case OP_SBRA: + case OP_CBRA: + case OP_SCBRA: + case OP_BRAPOS: + case OP_SBRAPOS: + case OP_CBRAPOS: + case OP_SCBRAPOS: + case OP_ONCE: + case OP_SCRIPT_RUN: + case OP_ASSERT: + case OP_ASSERT_NA: + rc = set_start_bits(re, tcode, utf, ucp, depthptr); + if (rc == SSB_DONE) + { + try_next = FALSE; + } + else if (rc == SSB_CONTINUE) + { + do tcode += GET(tcode, 1); while (*tcode == OP_ALT); + tcode += 1 + LINK_SIZE; + } + else return rc; /* FAIL, UNKNOWN, or TOODEEP */ + break; + + /* If we hit ALT or KET, it means we haven't found anything mandatory in + this branch, though we might have found something optional. For ALT, we + continue with the next alternative, but we have to arrange that the final + result from the subpattern is SSB_CONTINUE rather than SSB_DONE. For KET, + return SSB_CONTINUE: if this is the top level, that indicates failure, + but after a nested subpattern, it causes scanning to continue. 
*/ + + case OP_ALT: + yield = SSB_CONTINUE; + try_next = FALSE; + break; + + case OP_KET: + case OP_KETRMAX: + case OP_KETRMIN: + case OP_KETRPOS: + return SSB_CONTINUE; + + /* Skip over callout */ + + case OP_CALLOUT: + tcode += PRIV(OP_lengths)[OP_CALLOUT]; + break; + + case OP_CALLOUT_STR: + tcode += GET(tcode, 1 + 2*LINK_SIZE); + break; + + /* Skip over lookbehind and negative lookahead assertions */ + + case OP_ASSERT_NOT: + case OP_ASSERTBACK: + case OP_ASSERTBACK_NOT: + case OP_ASSERTBACK_NA: + do tcode += GET(tcode, 1); while (*tcode == OP_ALT); + tcode += 1 + LINK_SIZE; + break; + + /* BRAZERO does the bracket, but carries on. */ + + case OP_BRAZERO: + case OP_BRAMINZERO: + case OP_BRAPOSZERO: + rc = set_start_bits(re, ++tcode, utf, ucp, depthptr); + if (rc == SSB_FAIL || rc == SSB_UNKNOWN || rc == SSB_TOODEEP) return rc; + do tcode += GET(tcode,1); while (*tcode == OP_ALT); + tcode += 1 + LINK_SIZE; + break; + + /* SKIPZERO skips the bracket. */ + + case OP_SKIPZERO: + tcode++; + do tcode += GET(tcode,1); while (*tcode == OP_ALT); + tcode += 1 + LINK_SIZE; + break; + + /* Single-char * or ? 
sets the bit and tries the next item */ + + case OP_STAR: + case OP_MINSTAR: + case OP_POSSTAR: + case OP_QUERY: + case OP_MINQUERY: + case OP_POSQUERY: + tcode = set_table_bit(re, tcode + 1, FALSE, utf, ucp); + break; + + case OP_STARI: + case OP_MINSTARI: + case OP_POSSTARI: + case OP_QUERYI: + case OP_MINQUERYI: + case OP_POSQUERYI: + tcode = set_table_bit(re, tcode + 1, TRUE, utf, ucp); + break; + + /* Single-char upto sets the bit and tries the next */ + + case OP_UPTO: + case OP_MINUPTO: + case OP_POSUPTO: + tcode = set_table_bit(re, tcode + 1 + IMM2_SIZE, FALSE, utf, ucp); + break; + + case OP_UPTOI: + case OP_MINUPTOI: + case OP_POSUPTOI: + tcode = set_table_bit(re, tcode + 1 + IMM2_SIZE, TRUE, utf, ucp); + break; + + /* At least one single char sets the bit and stops */ + + case OP_EXACT: + tcode += IMM2_SIZE; + /* Fall through */ + case OP_CHAR: + case OP_PLUS: + case OP_MINPLUS: + case OP_POSPLUS: + (void)set_table_bit(re, tcode + 1, FALSE, utf, ucp); + try_next = FALSE; + break; + + case OP_EXACTI: + tcode += IMM2_SIZE; + /* Fall through */ + case OP_CHARI: + case OP_PLUSI: + case OP_MINPLUSI: + case OP_POSPLUSI: + (void)set_table_bit(re, tcode + 1, TRUE, utf, ucp); + try_next = FALSE; + break; + + /* Special spacing and line-terminating items. These recognize specific + lists of characters. The difference between VSPACE and ANYNL is that the + latter can match the two-character CRLF sequence, but that is not + relevant for finding the first character, so their code here is + identical. */ + + case OP_HSPACE: + SET_BIT(CHAR_HT); + SET_BIT(CHAR_SPACE); + + /* For the 16-bit and 32-bit libraries (which can never be EBCDIC), set + the bits for 0xA0 and for code units >= 255, independently of UTF. */ + +#if PCRE2_CODE_UNIT_WIDTH != 8 + SET_BIT(0xA0); + SET_BIT(0xFF); +#else + /* For the 8-bit library in UTF-8 mode, set the bits for the first code + units of horizontal space characters. 
*/ + +#ifdef SUPPORT_UNICODE + if (utf) + { + SET_BIT(0xC2); /* For U+00A0 */ + SET_BIT(0xE1); /* For U+1680, U+180E */ + SET_BIT(0xE2); /* For U+2000 - U+200A, U+202F, U+205F */ + SET_BIT(0xE3); /* For U+3000 */ + } + else +#endif + /* For the 8-bit library not in UTF-8 mode, set the bit for 0xA0, unless + the code is EBCDIC. */ + { +#ifndef EBCDIC + SET_BIT(0xA0); +#endif /* Not EBCDIC */ + } +#endif /* 8-bit support */ + + try_next = FALSE; + break; + + case OP_ANYNL: + case OP_VSPACE: + SET_BIT(CHAR_LF); + SET_BIT(CHAR_VT); + SET_BIT(CHAR_FF); + SET_BIT(CHAR_CR); + + /* For the 16-bit and 32-bit libraries (which can never be EBCDIC), set + the bits for NEL and for code units >= 255, independently of UTF. */ + +#if PCRE2_CODE_UNIT_WIDTH != 8 + SET_BIT(CHAR_NEL); + SET_BIT(0xFF); +#else + /* For the 8-bit library in UTF-8 mode, set the bits for the first code + units of vertical space characters. */ + +#ifdef SUPPORT_UNICODE + if (utf) + { + SET_BIT(0xC2); /* For U+0085 (NEL) */ + SET_BIT(0xE2); /* For U+2028, U+2029 */ + } + else +#endif + /* For the 8-bit library not in UTF-8 mode, set the bit for NEL. */ + { + SET_BIT(CHAR_NEL); + } +#endif /* 8-bit support */ + + try_next = FALSE; + break; + + /* Single character types set the bits and stop. Note that if PCRE2_UCP + is set, we do not see these opcodes because \d etc are converted to + properties. Therefore, these apply in the case when only characters less + than 256 are recognized to match the types. 
*/ + + case OP_NOT_DIGIT: + set_nottype_bits(re, cbit_digit, table_limit); + try_next = FALSE; + break; + + case OP_DIGIT: + set_type_bits(re, cbit_digit, table_limit); + try_next = FALSE; + break; + + case OP_NOT_WHITESPACE: + set_nottype_bits(re, cbit_space, table_limit); + try_next = FALSE; + break; + + case OP_WHITESPACE: + set_type_bits(re, cbit_space, table_limit); + try_next = FALSE; + break; + + case OP_NOT_WORDCHAR: + set_nottype_bits(re, cbit_word, table_limit); + try_next = FALSE; + break; + + case OP_WORDCHAR: + set_type_bits(re, cbit_word, table_limit); + try_next = FALSE; + break; + + /* One or more character type fudges the pointer and restarts, knowing + it will hit a single character type and stop there. */ + + case OP_TYPEPLUS: + case OP_TYPEMINPLUS: + case OP_TYPEPOSPLUS: + tcode++; + break; + + case OP_TYPEEXACT: + tcode += 1 + IMM2_SIZE; + break; + + /* Zero or more repeats of character types set the bits and then + try again. */ + + case OP_TYPEUPTO: + case OP_TYPEMINUPTO: + case OP_TYPEPOSUPTO: + tcode += IMM2_SIZE; /* Fall through */ + + case OP_TYPESTAR: + case OP_TYPEMINSTAR: + case OP_TYPEPOSSTAR: + case OP_TYPEQUERY: + case OP_TYPEMINQUERY: + case OP_TYPEPOSQUERY: + switch(tcode[1]) + { + default: + case OP_ANY: + case OP_ALLANY: + return SSB_FAIL; + + case OP_HSPACE: + SET_BIT(CHAR_HT); + SET_BIT(CHAR_SPACE); + + /* For the 16-bit and 32-bit libraries (which can never be EBCDIC), set + the bits for 0xA0 and for code units >= 255, independently of UTF. */ + +#if PCRE2_CODE_UNIT_WIDTH != 8 + SET_BIT(0xA0); + SET_BIT(0xFF); +#else + /* For the 8-bit library in UTF-8 mode, set the bits for the first code + units of horizontal space characters. 
*/ + +#ifdef SUPPORT_UNICODE + if (utf) + { + SET_BIT(0xC2); /* For U+00A0 */ + SET_BIT(0xE1); /* For U+1680, U+180E */ + SET_BIT(0xE2); /* For U+2000 - U+200A, U+202F, U+205F */ + SET_BIT(0xE3); /* For U+3000 */ + } + else +#endif + /* For the 8-bit library not in UTF-8 mode, set the bit for 0xA0, unless + the code is EBCDIC. */ + { +#ifndef EBCDIC + SET_BIT(0xA0); +#endif /* Not EBCDIC */ + } +#endif /* 8-bit support */ + break; + + case OP_ANYNL: + case OP_VSPACE: + SET_BIT(CHAR_LF); + SET_BIT(CHAR_VT); + SET_BIT(CHAR_FF); + SET_BIT(CHAR_CR); + + /* For the 16-bit and 32-bit libraries (which can never be EBCDIC), set + the bits for NEL and for code units >= 255, independently of UTF. */ + +#if PCRE2_CODE_UNIT_WIDTH != 8 + SET_BIT(CHAR_NEL); + SET_BIT(0xFF); +#else + /* For the 8-bit library in UTF-8 mode, set the bits for the first code + units of vertical space characters. */ + +#ifdef SUPPORT_UNICODE + if (utf) + { + SET_BIT(0xC2); /* For U+0085 (NEL) */ + SET_BIT(0xE2); /* For U+2028, U+2029 */ + } + else +#endif + /* For the 8-bit library not in UTF-8 mode, set the bit for NEL. */ + { + SET_BIT(CHAR_NEL); + } +#endif /* 8-bit support */ + break; + + case OP_NOT_DIGIT: + set_nottype_bits(re, cbit_digit, table_limit); + break; + + case OP_DIGIT: + set_type_bits(re, cbit_digit, table_limit); + break; + + case OP_NOT_WHITESPACE: + set_nottype_bits(re, cbit_space, table_limit); + break; + + case OP_WHITESPACE: + set_type_bits(re, cbit_space, table_limit); + break; + + case OP_NOT_WORDCHAR: + set_nottype_bits(re, cbit_word, table_limit); + break; + + case OP_WORDCHAR: + set_type_bits(re, cbit_word, table_limit); + break; + } + + tcode += 2; + break; + + /* Extended class: if there are any property checks, or if this is a + negative XCLASS without a map, give up. If there are no property checks, + there must be wide characters on the XCLASS list, because otherwise an + XCLASS would not have been created. This means that code points >= 255 + are potential starters. 
In the UTF-8 case we can scan them and set bits + for the relevant leading bytes. */ + +#ifdef SUPPORT_WIDE_CHARS + case OP_XCLASS: + xclassflags = tcode[1 + LINK_SIZE]; + if ((xclassflags & XCL_HASPROP) != 0 || + (xclassflags & (XCL_MAP|XCL_NOT)) == XCL_NOT) + return SSB_FAIL; + + /* We have a positive XCLASS or a negative one with a map. Set up the + map pointer if there is one, and fall through. */ + + classmap = ((xclassflags & XCL_MAP) == 0)? NULL : + (uint8_t *)(tcode + 1 + LINK_SIZE + 1); + + /* In UTF-8 mode, scan the character list and set bits for leading bytes, + then jump to handle the map. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (utf && (xclassflags & XCL_NOT) == 0) + { + PCRE2_UCHAR b, e; + PCRE2_SPTR p = tcode + 1 + LINK_SIZE + 1 + ((classmap == NULL)? 0:32); + tcode += GET(tcode, 1); + + for (;;) switch (*p++) + { + case XCL_SINGLE: + b = *p++; + while ((*p & 0xc0) == 0x80) p++; + re->start_bitmap[b/8] |= (1u << (b&7)); + break; + + case XCL_RANGE: + b = *p++; + while ((*p & 0xc0) == 0x80) p++; + e = *p++; + while ((*p & 0xc0) == 0x80) p++; + for (; b <= e; b++) + re->start_bitmap[b/8] |= (1u << (b&7)); + break; + + case XCL_END: + goto HANDLE_CLASSMAP; + + default: + return SSB_UNKNOWN; /* Internal error, should not occur */ + } + } +#endif /* PCRE2_CODE_UNIT_WIDTH == 8 */ +#endif /* SUPPORT_WIDE_CHARS */ + + /* It seems that the fall through comment must be outside the #ifdef if + it is to avoid the gcc compiler warning. */ + + /* Fall through */ + + /* Enter here for a negative non-XCLASS. In the 8-bit library, if we are + in UTF mode, any byte with a value >= 0xc4 is a potentially valid starter + because it starts a character with a value > 255. In 8-bit non-UTF mode, + there is no difference between CLASS and NCLASS. In all other wide + character modes, set the 0xFF bit to indicate code units >= 255.
*/ + + case OP_NCLASS: +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 + if (utf) + { + re->start_bitmap[24] |= 0xf0; /* Bits for 0xc4 - 0xc7 */ + memset(re->start_bitmap+25, 0xff, 7); /* Bits for 0xc8 - 0xff */ + } +#elif PCRE2_CODE_UNIT_WIDTH != 8 + SET_BIT(0xFF); /* For characters >= 255 */ +#endif + /* Fall through */ + + /* Enter here for a positive non-XCLASS. If we have fallen through from + an XCLASS, classmap will already be set; just advance the code pointer. + Otherwise, set up classmap for a non-XCLASS and advance past it. */ + + case OP_CLASS: + if (*tcode == OP_XCLASS) tcode += GET(tcode, 1); else + { + classmap = (uint8_t *)(++tcode); + tcode += 32 / sizeof(PCRE2_UCHAR); + } + + /* When wide characters are supported, classmap may be NULL. In UTF-8 + (sic) mode, the bits in a class bit map correspond to character values, + not to byte values. However, the bit map we are constructing is for byte + values. So we have to do a conversion for characters whose code point is + greater than 127. In fact, there are only two possible starting bytes for + characters in the range 128 - 255. */ + +#if defined SUPPORT_WIDE_CHARS && PCRE2_CODE_UNIT_WIDTH == 8 + HANDLE_CLASSMAP: +#endif + if (classmap != NULL) + { +#if defined SUPPORT_UNICODE && PCRE2_CODE_UNIT_WIDTH == 8 + if (utf) + { + for (c = 0; c < 16; c++) re->start_bitmap[c] |= classmap[c]; + for (c = 128; c < 256; c++) + { + if ((classmap[c/8] & (1u << (c&7))) != 0) + { + int d = (c >> 6) | 0xc0; /* Set bit for this starter */ + re->start_bitmap[d/8] |= (1u << (d&7)); /* and then skip on to the */ + c = (c & 0xc0) + 0x40 - 1; /* next relevant character. */ + } + } + } + else +#endif + /* In all modes except UTF-8, the two bit maps are compatible. */ + + { + for (c = 0; c < 32; c++) re->start_bitmap[c] |= classmap[c]; + } + } + + /* Act on what follows the class. For a zero minimum repeat, continue; + otherwise stop processing.
*/ + + switch (*tcode) + { + case OP_CRSTAR: + case OP_CRMINSTAR: + case OP_CRQUERY: + case OP_CRMINQUERY: + case OP_CRPOSSTAR: + case OP_CRPOSQUERY: + tcode++; + break; + + case OP_CRRANGE: + case OP_CRMINRANGE: + case OP_CRPOSRANGE: + if (GET2(tcode, 1) == 0) tcode += 1 + 2 * IMM2_SIZE; + else try_next = FALSE; + break; + + default: + try_next = FALSE; + break; + } + break; /* End of class handling case */ + } /* End of switch for opcodes */ + } /* End of try_next loop */ + + code += GET(code, 1); /* Advance to next branch */ + } +while (*code == OP_ALT); + +return yield; +} + + + +/************************************************* +* Study a compiled expression * +*************************************************/ + +/* This function is handed a compiled expression that it must study to produce +information that will speed up the matching. + +Argument: + re points to the compiled expression + +Returns: 0 normally; non-zero should never normally occur + 1 unknown opcode in set_start_bits + 2 missing capturing bracket + 3 unknown opcode in find_minlength +*/ + +int +PRIV(study)(pcre2_real_code *re) +{ +int count = 0; +PCRE2_UCHAR *code; +BOOL utf = (re->overall_options & PCRE2_UTF) != 0; +BOOL ucp = (re->overall_options & PCRE2_UCP) != 0; + +/* Find start of compiled code */ + +code = (PCRE2_UCHAR *)((uint8_t *)re + sizeof(pcre2_real_code)) + + re->name_entry_size * re->name_count; + +/* For a pattern that has a first code unit, or a multiline pattern that +matches only at "line start", there is no point in seeking a list of starting +code units. */ + +if ((re->flags & (PCRE2_FIRSTSET|PCRE2_STARTLINE)) == 0) + { + int depth = 0; + int rc = set_start_bits(re, code, utf, ucp, &depth); + if (rc == SSB_UNKNOWN) return 1; + + /* If a list of starting code units was set up, scan the list to see if only + one or two were listed. Having only one listed is rare because usually a + single starting code unit will have been recognized and PCRE2_FIRSTSET set. 
+ If two are listed, see if they are caseless versions of the same character; + if so we can replace the list with a caseless first code unit. This gives + better performance and is plausibly worth doing for patterns such as [Ww]ord + or (word|WORD). */ + + if (rc == SSB_DONE) + { + int i; + int a = -1; + int b = -1; + uint8_t *p = re->start_bitmap; + uint32_t flags = PCRE2_FIRSTMAPSET; + + for (i = 0; i < 256; p++, i += 8) + { + uint8_t x = *p; + if (x != 0) + { + int c; + uint8_t y = x & (~x + 1); /* Least significant bit */ + if (y != x) goto DONE; /* More than one bit set */ + + /* In the 16-bit and 32-bit libraries, the bit for 0xff means "0xff and + all wide characters", so we cannot use it here. */ + +#if PCRE2_CODE_UNIT_WIDTH != 8 + if (i == 248 && x == 0x80) goto DONE; +#endif + + /* Compute the character value */ + + c = i; + switch (x) + { + case 1: break; + case 2: c += 1; break; case 4: c += 2; break; + case 8: c += 3; break; case 16: c += 4; break; + case 32: c += 5; break; case 64: c += 6; break; + case 128: c += 7; break; + } + + /* c contains the code unit value, in the range 0-255. In 8-bit UTF + mode, only values < 128 can be used. In all the other cases, c is a + character value. */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 + if (utf && c > 127) goto DONE; +#endif + if (a < 0) a = c; /* First one found, save in a */ + else if (b < 0) /* Second one found */ + { + int d = TABLE_GET((unsigned int)c, re->tables + fcc_offset, c); + +#ifdef SUPPORT_UNICODE + if (utf || ucp) + { + if (UCD_CASESET(c) != 0) goto DONE; /* Multiple case set */ + if (c > 127) d = UCD_OTHERCASE(c); + } +#endif /* SUPPORT_UNICODE */ + + if (d != a) goto DONE; /* Not the other case of a */ + b = c; /* Save second in b */ + } + else goto DONE; /* More than two characters found */ + } + } + + /* Replace the start code unit bits with a first code unit, but only if it + is not the same as a required later code unit. 
This is because a search for + a required code unit starts after an explicit first code unit, but at a + code unit found from the bitmap. Patterns such as /a*a/ don't work + if both the start unit and required unit are the same. */ + + if (a >= 0 && + ( + (re->flags & PCRE2_LASTSET) == 0 || + ( + re->last_codeunit != (uint32_t)a && + (b < 0 || re->last_codeunit != (uint32_t)b) + ) + )) + { + re->first_codeunit = a; + flags = PCRE2_FIRSTSET; + if (b >= 0) flags |= PCRE2_FIRSTCASELESS; + } + + DONE: + re->flags |= flags; + } + } + +/* Find the minimum length of subject string. If the pattern can match an empty +string, the minimum length is already known. If the pattern contains (*ACCEPT) +all bets are off, and we don't even try to find a minimum length. If there are +more back references than the size of the vector we are going to cache them in, +do nothing. A pattern that complicated will probably take a long time to +analyze and may in any case turn out to be too complicated. Note that back +reference minima are held as 16-bit numbers. */ + +if ((re->flags & (PCRE2_MATCH_EMPTY|PCRE2_HASACCEPT)) == 0 && + re->top_backref <= MAX_CACHE_BACKREF) + { + int min; + int backref_cache[MAX_CACHE_BACKREF+1]; + backref_cache[0] = 0; /* Highest one that is set */ + min = find_minlength(re, code, code, utf, NULL, &count, backref_cache); + switch(min) + { + case -1: /* \C in UTF mode or over-complex regex */ + break; /* Leave minlength unchanged (will be zero) */ + + case -2: + return 2; /* missing capturing bracket */ + + case -3: + return 3; /* unrecognized opcode */ + + default: + re->minlength = (min > UINT16_MAX)? 
UINT16_MAX : min; + break; + } + } + +return 0; +} + +/* End of pcre2_study.c */ diff --git a/src/pcre2/src/pcre2_substitute.c b/src/pcre2/src/pcre2_substitute.c new file mode 100644 index 00000000..981a106a --- /dev/null +++ b/src/pcre2/src/pcre2_substitute.c @@ -0,0 +1,987 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + +#define PTR_STACK_SIZE 20 + +#define SUBSTITUTE_OPTIONS \ + (PCRE2_SUBSTITUTE_EXTENDED|PCRE2_SUBSTITUTE_GLOBAL| \ + PCRE2_SUBSTITUTE_LITERAL|PCRE2_SUBSTITUTE_MATCHED| \ + PCRE2_SUBSTITUTE_OVERFLOW_LENGTH|PCRE2_SUBSTITUTE_REPLACEMENT_ONLY| \ + PCRE2_SUBSTITUTE_UNKNOWN_UNSET|PCRE2_SUBSTITUTE_UNSET_EMPTY) + + + +/************************************************* +* Find end of substitute text * +*************************************************/ + +/* In extended mode, we recognize ${name:+set text:unset text} and similar +constructions. This requires the identification of unescaped : and } +characters. This function scans for such. It must deal with nested ${ +constructions. The pointer to the text is updated, either to the required end +character, or to where an error was detected. 
+ +Arguments: + code points to the compiled expression (for options) + ptrptr points to the pointer to the start of the text (updated) + ptrend end of the whole string + last TRUE if the last expected string (only } recognized) + +Returns: 0 on success + negative error code on failure +*/ + +static int +find_text_end(const pcre2_code *code, PCRE2_SPTR *ptrptr, PCRE2_SPTR ptrend, + BOOL last) +{ +int rc = 0; +uint32_t nestlevel = 0; +BOOL literal = FALSE; +PCRE2_SPTR ptr = *ptrptr; + +for (; ptr < ptrend; ptr++) + { + if (literal) + { + if (ptr[0] == CHAR_BACKSLASH && ptr < ptrend - 1 && ptr[1] == CHAR_E) + { + literal = FALSE; + ptr += 1; + } + } + + else if (*ptr == CHAR_RIGHT_CURLY_BRACKET) + { + if (nestlevel == 0) goto EXIT; + nestlevel--; + } + + else if (*ptr == CHAR_COLON && !last && nestlevel == 0) goto EXIT; + + else if (*ptr == CHAR_DOLLAR_SIGN) + { + if (ptr < ptrend - 1 && ptr[1] == CHAR_LEFT_CURLY_BRACKET) + { + nestlevel++; + ptr += 1; + } + } + + else if (*ptr == CHAR_BACKSLASH) + { + int erc; + int errorcode; + uint32_t ch; + + if (ptr < ptrend - 1) switch (ptr[1]) + { + case CHAR_L: + case CHAR_l: + case CHAR_U: + case CHAR_u: + ptr += 1; + continue; + } + + ptr += 1; /* Must point after \ */ + erc = PRIV(check_escape)(&ptr, ptrend, &ch, &errorcode, + code->overall_options, code->extra_options, FALSE, NULL); + ptr -= 1; /* Back to last code unit of escape */ + if (errorcode != 0) + { + rc = errorcode; + goto EXIT; + } + + switch(erc) + { + case 0: /* Data character */ + case ESC_E: /* Isolated \E is ignored */ + break; + + case ESC_Q: + literal = TRUE; + break; + + default: + rc = PCRE2_ERROR_BADREPESCAPE; + goto EXIT; + } + } + } + +rc = PCRE2_ERROR_REPMISSINGBRACE; /* Terminator not found */ + +EXIT: +*ptrptr = ptr; +return rc; +} + + + +/************************************************* +* Match and substitute * +*************************************************/ + +/* This function applies a compiled re to a subject string and creates a new 
+string with substitutions. The first 7 arguments are the same as for +pcre2_match(). Either string length may be PCRE2_ZERO_TERMINATED. + +Arguments: + code points to the compiled expression + subject points to the subject string + length length of subject string (may contain binary zeros) + start_offset where to start in the subject string + options option bits + match_data points to a match_data block, or is NULL + mcontext points to a PCRE2 match context + replacement points to the replacement string + rlength length of replacement string + buffer where to put the substituted string + blength points to length of buffer; updated to length of string + +Returns: >= 0 number of substitutions made + < 0 an error code + PCRE2_ERROR_BADREPLACEMENT means invalid use of $ +*/ + +/* This macro checks for space in the buffer before copying into it. On +overflow, either give an error immediately, or keep on, accumulating the +length. */ + +#define CHECKMEMCPY(from,length) \ + { \ + if (!overflowed && lengthleft < length) \ + { \ + if ((suboptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) == 0) goto NOROOM; \ + overflowed = TRUE; \ + extra_needed = length - lengthleft; \ + } \ + else if (overflowed) \ + { \ + extra_needed += length; \ + } \ + else \ + { \ + memcpy(buffer + buff_offset, from, CU2BYTES(length)); \ + buff_offset += length; \ + lengthleft -= length; \ + } \ + } + +/* Here's the function */ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, + PCRE2_SIZE start_offset, uint32_t options, pcre2_match_data *match_data, + pcre2_match_context *mcontext, PCRE2_SPTR replacement, PCRE2_SIZE rlength, + PCRE2_UCHAR *buffer, PCRE2_SIZE *blength) +{ +int rc; +int subs; +int forcecase = 0; +int forcecasereset = 0; +uint32_t ovector_count; +uint32_t goptions = 0; +uint32_t suboptions; +pcre2_match_data *internal_match_data = NULL; +BOOL escaped_literal = FALSE; +BOOL overflowed = FALSE; +BOOL use_existing_match; +BOOL
replacement_only; +#ifdef SUPPORT_UNICODE +BOOL utf = (code->overall_options & PCRE2_UTF) != 0; +BOOL ucp = (code->overall_options & PCRE2_UCP) != 0; +#endif +PCRE2_UCHAR temp[6]; +PCRE2_SPTR ptr; +PCRE2_SPTR repend; +PCRE2_SIZE extra_needed = 0; +PCRE2_SIZE buff_offset, buff_length, lengthleft, fraglength; +PCRE2_SIZE *ovector; +PCRE2_SIZE ovecsave[3]; +pcre2_substitute_callout_block scb; + +/* General initialization */ + +buff_offset = 0; +lengthleft = buff_length = *blength; +*blength = PCRE2_UNSET; +ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET; + +/* Partial matching is not valid. This must come after setting *blength to +PCRE2_UNSET, so as not to imply an offset in the replacement. */ + +if ((options & (PCRE2_PARTIAL_HARD|PCRE2_PARTIAL_SOFT)) != 0) + return PCRE2_ERROR_BADOPTION; + +/* Check for using a match that has already happened. Note that the subject +pointer in the match data may be NULL after a no-match. */ + +use_existing_match = ((options & PCRE2_SUBSTITUTE_MATCHED) != 0); +replacement_only = ((options & PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) != 0); + +/* If starting from an existing match, there must be an externally provided +match data block. We create an internal match_data block in two cases: (a) an +external one is not supplied (and we are not starting from an existing match); +(b) an existing match is to be used for the first substitution. In the latter +case, we copy the existing match into the internal block. This ensures that no +changes are made to the existing match data block. */ + +if (match_data == NULL) + { + pcre2_general_context *gcontext; + if (use_existing_match) return PCRE2_ERROR_NULL; + gcontext = (mcontext == NULL)? 
+ (pcre2_general_context *)code : + (pcre2_general_context *)mcontext; + match_data = internal_match_data = + pcre2_match_data_create_from_pattern(code, gcontext); + if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY; + } + +else if (use_existing_match) + { + pcre2_general_context *gcontext = (mcontext == NULL)? + (pcre2_general_context *)code : + (pcre2_general_context *)mcontext; + int pairs = (code->top_bracket + 1 < match_data->oveccount)? + code->top_bracket + 1 : match_data->oveccount; + internal_match_data = pcre2_match_data_create(match_data->oveccount, + gcontext); + if (internal_match_data == NULL) return PCRE2_ERROR_NOMEMORY; + memcpy(internal_match_data, match_data, offsetof(pcre2_match_data, ovector) + + 2*pairs*sizeof(PCRE2_SIZE)); + match_data = internal_match_data; + } + +/* Remember ovector details */ + +ovector = pcre2_get_ovector_pointer(match_data); +ovector_count = pcre2_get_ovector_count(match_data); + +/* Fixed things in the callout block */ + +scb.version = 0; +scb.input = subject; +scb.output = (PCRE2_SPTR)buffer; +scb.ovector = ovector; + +/* Find lengths of zero-terminated strings and the end of the replacement. */ + +if (length == PCRE2_ZERO_TERMINATED) length = PRIV(strlen)(subject); +if (rlength == PCRE2_ZERO_TERMINATED) rlength = PRIV(strlen)(replacement); +repend = replacement + rlength; + +/* Check UTF replacement string if necessary. */ + +#ifdef SUPPORT_UNICODE +if (utf && (options & PCRE2_NO_UTF_CHECK) == 0) + { + rc = PRIV(valid_utf)(replacement, rlength, &(match_data->startchar)); + if (rc != 0) + { + match_data->leftchar = 0; + goto EXIT; + } + } +#endif /* SUPPORT_UNICODE */ + +/* Save the substitute options and remove them from the match options. */ + +suboptions = options & SUBSTITUTE_OPTIONS; +options &= ~SUBSTITUTE_OPTIONS; + +/* Error if the start match offset is greater than the length of the subject. 
*/ + +if (start_offset > length) + { + match_data->leftchar = 0; + rc = PCRE2_ERROR_BADOFFSET; + goto EXIT; + } + +/* Copy up to the start offset, unless only the replacement is required. */ + +if (!replacement_only) CHECKMEMCPY(subject, start_offset); + +/* Loop for global substituting. If PCRE2_SUBSTITUTE_MATCHED is set, the first +match is taken from the match_data that was passed in. */ + +subs = 0; +do + { + PCRE2_SPTR ptrstack[PTR_STACK_SIZE]; + uint32_t ptrstackptr = 0; + + if (use_existing_match) + { + rc = match_data->rc; + use_existing_match = FALSE; + } + else rc = pcre2_match(code, subject, length, start_offset, options|goptions, + match_data, mcontext); + +#ifdef SUPPORT_UNICODE + if (utf) options |= PCRE2_NO_UTF_CHECK; /* Only need to check once */ +#endif + + /* Any error other than no match returns the error code. No match when not + doing the special after-empty-match global rematch, or when at the end of the + subject, breaks the global loop. Otherwise, advance the starting point by one + character, copying it to the output, and try again. */ + + if (rc < 0) + { + PCRE2_SIZE save_start; + + if (rc != PCRE2_ERROR_NOMATCH) goto EXIT; + if (goptions == 0 || start_offset >= length) break; + + /* Advance by one code point. Then, if CRLF is a valid newline sequence and + we have advanced into the middle of it, advance one more code point. In + other words, do not start in the middle of CRLF, even if CR and LF on their + own are valid newlines. */ + + save_start = start_offset++; + if (subject[start_offset-1] == CHAR_CR && + code->newline_convention != PCRE2_NEWLINE_CR && + code->newline_convention != PCRE2_NEWLINE_LF && + start_offset < length && + subject[start_offset] == CHAR_LF) + start_offset++; + + /* Otherwise, in UTF mode, advance past any secondary code points. 
*/ + + else if ((code->overall_options & PCRE2_UTF) != 0) + { +#if PCRE2_CODE_UNIT_WIDTH == 8 + while (start_offset < length && (subject[start_offset] & 0xc0) == 0x80) + start_offset++; +#elif PCRE2_CODE_UNIT_WIDTH == 16 + while (start_offset < length && + (subject[start_offset] & 0xfc00) == 0xdc00) + start_offset++; +#endif + } + + /* Copy what we have advanced past (unless not required), reset the special + global options, and continue to the next match. */ + + fraglength = start_offset - save_start; + if (!replacement_only) CHECKMEMCPY(subject + save_start, fraglength); + goptions = 0; + continue; + } + + /* Handle a successful match. Matches that use \K to end before they start + or start before the current point in the subject are not supported. */ + + if (ovector[1] < ovector[0] || ovector[0] < start_offset) + { + rc = PCRE2_ERROR_BADSUBSPATTERN; + goto EXIT; + } + + /* Check for the same match as previous. This is legitimate after matching an + empty string that starts after the initial match offset. We have tried again + at the match point in case the pattern is one like /(?<=\G.)/ which can never + match at its starting point, so running the match achieves the bumpalong. If + we do get the same (null) match at the original match point, it isn't such a + pattern, so we now do the empty string magic. In all other cases, a repeat + match should never occur. */ + + if (ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1]) + { + if (ovector[0] == ovector[1] && ovecsave[2] != start_offset) + { + goptions = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED; + ovecsave[2] = start_offset; + continue; /* Back to the top of the loop */ + } + rc = PCRE2_ERROR_INTERNAL_DUPMATCH; + goto EXIT; + } + + /* Count substitutions with a paranoid check for integer overflow; surely no + real call to this function would ever hit this! 
*/ + + if (subs == INT_MAX) + { + rc = PCRE2_ERROR_TOOMANYREPLACE; + goto EXIT; + } + subs++; + + /* Copy the text leading up to the match (unless not required), and remember + where the insert begins and how many ovector pairs are set. */ + + if (rc == 0) rc = ovector_count; + fraglength = ovector[0] - start_offset; + if (!replacement_only) CHECKMEMCPY(subject + start_offset, fraglength); + scb.output_offsets[0] = buff_offset; + scb.oveccount = rc; + + /* Process the replacement string. If the entire replacement is literal, just + copy it with length check. */ + + ptr = replacement; + if ((suboptions & PCRE2_SUBSTITUTE_LITERAL) != 0) + { + CHECKMEMCPY(ptr, rlength); + } + + /* Within a non-literal replacement, which must be scanned character by + character, local literal mode can be set by \Q, but only in extended mode + when backslashes are being interpreted. In extended mode we must handle + nested substrings that are to be reprocessed. */ + + else for (;;) + { + uint32_t ch; + unsigned int chlen; + + /* If at the end of a nested substring, pop the stack. */ + + if (ptr >= repend) + { + if (ptrstackptr == 0) break; /* End of replacement string */ + repend = ptrstack[--ptrstackptr]; + ptr = ptrstack[--ptrstackptr]; + continue; + } + + /* Handle the next character */ + + if (escaped_literal) + { + if (ptr[0] == CHAR_BACKSLASH && ptr < repend - 1 && ptr[1] == CHAR_E) + { + escaped_literal = FALSE; + ptr += 2; + continue; + } + goto LOADLITERAL; + } + + /* Not in literal mode. 
*/ + + if (*ptr == CHAR_DOLLAR_SIGN) + { + int group, n; + uint32_t special = 0; + BOOL inparens; + BOOL star; + PCRE2_SIZE sublength; + PCRE2_SPTR text1_start = NULL; + PCRE2_SPTR text1_end = NULL; + PCRE2_SPTR text2_start = NULL; + PCRE2_SPTR text2_end = NULL; + PCRE2_UCHAR next; + PCRE2_UCHAR name[33]; + + if (++ptr >= repend) goto BAD; + if ((next = *ptr) == CHAR_DOLLAR_SIGN) goto LOADLITERAL; + + group = -1; + n = 0; + inparens = FALSE; + star = FALSE; + + if (next == CHAR_LEFT_CURLY_BRACKET) + { + if (++ptr >= repend) goto BAD; + next = *ptr; + inparens = TRUE; + } + + if (next == CHAR_ASTERISK) + { + if (++ptr >= repend) goto BAD; + next = *ptr; + star = TRUE; + } + + if (!star && next >= CHAR_0 && next <= CHAR_9) + { + group = next - CHAR_0; + while (++ptr < repend) + { + next = *ptr; + if (next < CHAR_0 || next > CHAR_9) break; + group = group * 10 + next - CHAR_0; + + /* A check for a number greater than the highest captured group + is sufficient here; no need for a separate overflow check. If unknown + groups are to be treated as unset, just skip over any remaining + digits and carry on. */ + + if (group > code->top_bracket) + { + if ((suboptions & PCRE2_SUBSTITUTE_UNKNOWN_UNSET) != 0) + { + while (++ptr < repend && *ptr >= CHAR_0 && *ptr <= CHAR_9); + break; + } + else + { + rc = PCRE2_ERROR_NOSUBSTRING; + goto PTREXIT; + } + } + } + } + else + { + const uint8_t *ctypes = code->tables + ctypes_offset; + while (MAX_255(next) && (ctypes[next] & ctype_word) != 0) + { + name[n++] = next; + if (n > 32) goto BAD; + if (++ptr >= repend) break; + next = *ptr; + } + if (n == 0) goto BAD; + name[n] = 0; + } + + /* In extended mode we recognize ${name:+set text:unset text} and + ${name:-default text}.
*/ + + if (inparens) + { + if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 && + !star && ptr < repend - 2 && next == CHAR_COLON) + { + special = *(++ptr); + if (special != CHAR_PLUS && special != CHAR_MINUS) + { + rc = PCRE2_ERROR_BADSUBSTITUTION; + goto PTREXIT; + } + + text1_start = ++ptr; + rc = find_text_end(code, &ptr, repend, special == CHAR_MINUS); + if (rc != 0) goto PTREXIT; + text1_end = ptr; + + if (special == CHAR_PLUS && *ptr == CHAR_COLON) + { + text2_start = ++ptr; + rc = find_text_end(code, &ptr, repend, TRUE); + if (rc != 0) goto PTREXIT; + text2_end = ptr; + } + } + + else + { + if (ptr >= repend || *ptr != CHAR_RIGHT_CURLY_BRACKET) + { + rc = PCRE2_ERROR_REPMISSINGBRACE; + goto PTREXIT; + } + } + + ptr++; + } + + /* Have found a syntactically correct group number or name, or *name. + Only *MARK is currently recognized. */ + + if (star) + { + if (PRIV(strcmp_c8)(name, STRING_MARK) == 0) + { + PCRE2_SPTR mark = pcre2_get_mark(match_data); + if (mark != NULL) + { + PCRE2_SPTR mark_start = mark; + while (*mark != 0) mark++; + fraglength = mark - mark_start; + CHECKMEMCPY(mark_start, fraglength); + } + } + else goto BAD; + } + + /* Substitute the contents of a group. We don't use substring_copy + functions any more, in order to support case forcing. */ + + else + { + PCRE2_SPTR subptr, subptrend; + + /* Find a number for a named group. In case there are duplicate names, + search for the first one that is set. If the name is not found when + PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set, set the group number to a + non-existent group.
*/ + + if (group < 0) + { + PCRE2_SPTR first, last, entry; + rc = pcre2_substring_nametable_scan(code, name, &first, &last); + if (rc == PCRE2_ERROR_NOSUBSTRING && + (suboptions & PCRE2_SUBSTITUTE_UNKNOWN_UNSET) != 0) + { + group = code->top_bracket + 1; + } + else + { + if (rc < 0) goto PTREXIT; + for (entry = first; entry <= last; entry += rc) + { + uint32_t ng = GET2(entry, 0); + if (ng < ovector_count) + { + if (group < 0) group = ng; /* First in ovector */ + if (ovector[ng*2] != PCRE2_UNSET) + { + group = ng; /* First that is set */ + break; + } + } + } + + /* If group is still negative, it means we did not find a group + that is in the ovector. Just set the first group. */ + + if (group < 0) group = GET2(first, 0); + } + } + + /* We now have a group that is identified by number. Find the length of + the captured string. If a group in a non-special substitution is unset + when PCRE2_SUBSTITUTE_UNSET_EMPTY is set, substitute nothing. */ + + rc = pcre2_substring_length_bynumber(match_data, group, &sublength); + if (rc < 0) + { + if (rc == PCRE2_ERROR_NOSUBSTRING && + (suboptions & PCRE2_SUBSTITUTE_UNKNOWN_UNSET) != 0) + { + rc = PCRE2_ERROR_UNSET; + } + if (rc != PCRE2_ERROR_UNSET) goto PTREXIT; /* Non-unset errors */ + if (special == 0) /* Plain substitution */ + { + if ((suboptions & PCRE2_SUBSTITUTE_UNSET_EMPTY) != 0) continue; + goto PTREXIT; /* Else error */ + } + } + + /* If special is '+' we have a 'set' and possibly an 'unset' text, + both of which are reprocessed when used. If special is '-' we have a + default text for when the group is unset; it must be reprocessed. 
*/ + + if (special != 0) + { + if (special == CHAR_MINUS) + { + if (rc == 0) goto LITERAL_SUBSTITUTE; + text2_start = text1_start; + text2_end = text1_end; + } + + if (ptrstackptr >= PTR_STACK_SIZE) goto BAD; + ptrstack[ptrstackptr++] = ptr; + ptrstack[ptrstackptr++] = repend; + + if (rc == 0) + { + ptr = text1_start; + repend = text1_end; + } + else + { + ptr = text2_start; + repend = text2_end; + } + continue; + } + + /* Otherwise we have a literal substitution of a group's contents. */ + + LITERAL_SUBSTITUTE: + subptr = subject + ovector[group*2]; + subptrend = subject + ovector[group*2 + 1]; + + /* Substitute a literal string, possibly forcing alphabetic case. */ + + while (subptr < subptrend) + { + GETCHARINCTEST(ch, subptr); + if (forcecase != 0) + { +#ifdef SUPPORT_UNICODE + if (utf || ucp) + { + uint32_t type = UCD_CHARTYPE(ch); + if (PRIV(ucp_gentype)[type] == ucp_L && + type != ((forcecase > 0)? ucp_Lu : ucp_Ll)) + ch = UCD_OTHERCASE(ch); + } + else +#endif + { + if (((code->tables + cbits_offset + + ((forcecase > 0)? cbit_upper:cbit_lower) + )[ch/8] & (1u << (ch%8))) == 0) + ch = (code->tables + fcc_offset)[ch]; + } + forcecase = forcecasereset; + } + +#ifdef SUPPORT_UNICODE + if (utf) chlen = PRIV(ord2utf)(ch, temp); else +#endif + { + temp[0] = ch; + chlen = 1; + } + CHECKMEMCPY(temp, chlen); + } + } + } + + /* Handle an escape sequence in extended mode. We can use check_escape() + to process \Q, \E, \c, \o, \x and \ followed by non-alphanumerics, but + the case-forcing escapes are not supported in pcre2_compile() so must be + recognized here. 
*/ + + else if ((suboptions & PCRE2_SUBSTITUTE_EXTENDED) != 0 && + *ptr == CHAR_BACKSLASH) + { + int errorcode; + + if (ptr < repend - 1) switch (ptr[1]) + { + case CHAR_L: + forcecase = forcecasereset = -1; + ptr += 2; + continue; + + case CHAR_l: + forcecase = -1; + forcecasereset = 0; + ptr += 2; + continue; + + case CHAR_U: + forcecase = forcecasereset = 1; + ptr += 2; + continue; + + case CHAR_u: + forcecase = 1; + forcecasereset = 0; + ptr += 2; + continue; + + default: + break; + } + + ptr++; /* Point after \ */ + rc = PRIV(check_escape)(&ptr, repend, &ch, &errorcode, + code->overall_options, code->extra_options, FALSE, NULL); + if (errorcode != 0) goto BADESCAPE; + + switch(rc) + { + case ESC_E: + forcecase = forcecasereset = 0; + continue; + + case ESC_Q: + escaped_literal = TRUE; + continue; + + case 0: /* Data character */ + goto LITERAL; + + default: + goto BADESCAPE; + } + } + + /* Handle a literal code unit */ + + else + { + LOADLITERAL: + GETCHARINCTEST(ch, ptr); /* Get character value, increment pointer */ + + LITERAL: + if (forcecase != 0) + { +#ifdef SUPPORT_UNICODE + if (utf || ucp) + { + uint32_t type = UCD_CHARTYPE(ch); + if (PRIV(ucp_gentype)[type] == ucp_L && + type != ((forcecase > 0)? ucp_Lu : ucp_Ll)) + ch = UCD_OTHERCASE(ch); + } + else +#endif + { + if (((code->tables + cbits_offset + + ((forcecase > 0)? cbit_upper:cbit_lower) + )[ch/8] & (1u << (ch%8))) == 0) + ch = (code->tables + fcc_offset)[ch]; + } + forcecase = forcecasereset; + } + +#ifdef SUPPORT_UNICODE + if (utf) chlen = PRIV(ord2utf)(ch, temp); else +#endif + { + temp[0] = ch; + chlen = 1; + } + CHECKMEMCPY(temp, chlen); + } /* End handling a literal code unit */ + } /* End of loop for scanning the replacement. */ + + /* The replacement has been copied to the output, or its size has been + remembered. Do the callout if there is one and we have done an actual + replacement. 
*/ + + if (!overflowed && mcontext != NULL && mcontext->substitute_callout != NULL) + { + scb.subscount = subs; + scb.output_offsets[1] = buff_offset; + rc = mcontext->substitute_callout(&scb, mcontext->substitute_callout_data); + + /* A non-zero return means cancel this substitution. Instead, copy the + matched string fragment. */ + + if (rc != 0) + { + PCRE2_SIZE newlength = scb.output_offsets[1] - scb.output_offsets[0]; + PCRE2_SIZE oldlength = ovector[1] - ovector[0]; + + buff_offset -= newlength; + lengthleft += newlength; + if (!replacement_only) CHECKMEMCPY(subject + ovector[0], oldlength); + + /* A negative return means do not do any more. */ + + if (rc < 0) suboptions &= (~PCRE2_SUBSTITUTE_GLOBAL); + } + } + + /* Save the details of this match. See above for how this data is used. If we + matched an empty string, do the magic for global matches. Update the start + offset to point to the rest of the subject string. If we re-used an existing + match for the first match, switch to the internal match data block. */ + + ovecsave[0] = ovector[0]; + ovecsave[1] = ovector[1]; + ovecsave[2] = start_offset; + + goptions = (ovector[0] != ovector[1] || ovector[0] > start_offset)? 0 : + PCRE2_ANCHORED|PCRE2_NOTEMPTY_ATSTART; + start_offset = ovector[1]; + } while ((suboptions & PCRE2_SUBSTITUTE_GLOBAL) != 0); /* Repeat "do" loop */ + +/* Copy the rest of the subject unless not required, and terminate the output +with a binary zero. */ + +if (!replacement_only) + { + fraglength = length - start_offset; + CHECKMEMCPY(subject + start_offset, fraglength); + } + +temp[0] = 0; +CHECKMEMCPY(temp, 1); + +/* If overflowed is set it means the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set, +and matching has carried on after a full buffer, in order to compute the length +needed. Otherwise, an overflow generates an immediate error return. 
*/ + +if (overflowed) + { + rc = PCRE2_ERROR_NOMEMORY; + *blength = buff_length + extra_needed; + } + +/* After a successful execution, return the number of substitutions and set the +length of buffer used, excluding the trailing zero. */ + +else + { + rc = subs; + *blength = buff_offset - 1; + } + +EXIT: +if (internal_match_data != NULL) pcre2_match_data_free(internal_match_data); + else match_data->rc = rc; +return rc; + +NOROOM: +rc = PCRE2_ERROR_NOMEMORY; +goto EXIT; + +BAD: +rc = PCRE2_ERROR_BADREPLACEMENT; +goto PTREXIT; + +BADESCAPE: +rc = PCRE2_ERROR_BADREPESCAPE; + +PTREXIT: +*blength = (PCRE2_SIZE)(ptr - replacement); +goto EXIT; +} + +/* End of pcre2_substitute.c */ diff --git a/src/pcre2/src/pcre2_substring.c b/src/pcre2/src/pcre2_substring.c new file mode 100644 index 00000000..ddf5774e --- /dev/null +++ b/src/pcre2/src/pcre2_substring.c @@ -0,0 +1,547 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2018 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include "pcre2_internal.h" + + + +/************************************************* +* Copy named captured string to given buffer * +*************************************************/ + +/* This function copies a single captured substring into a given buffer, +identifying it by name. If the regex permits duplicate names, the first +substring that is set is chosen. 
+ +Arguments: + match_data points to the match data + stringname the name of the required substring + buffer where to put the substring + sizeptr the size of the buffer, updated to the size of the substring + +Returns: if successful: zero + if not successful, a negative error code: + (1) an error from nametable_scan() + (2) an error from copy_bynumber() + (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector + (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_copy_byname(pcre2_match_data *match_data, PCRE2_SPTR stringname, + PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr) +{ +PCRE2_SPTR first, last, entry; +int failrc, entrysize; +if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER) + return PCRE2_ERROR_DFA_UFUNC; +entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, + &first, &last); +if (entrysize < 0) return entrysize; +failrc = PCRE2_ERROR_UNAVAILABLE; +for (entry = first; entry <= last; entry += entrysize) + { + uint32_t n = GET2(entry, 0); + if (n < match_data->oveccount) + { + if (match_data->ovector[n*2] != PCRE2_UNSET) + return pcre2_substring_copy_bynumber(match_data, n, buffer, sizeptr); + failrc = PCRE2_ERROR_UNSET; + } + } +return failrc; +} + + + +/************************************************* +* Copy numbered captured string to given buffer * +*************************************************/ + +/* This function copies a single captured substring into a given buffer, +identifying it by number. 
+ +Arguments: + match_data points to the match data + stringnumber the number of the required substring + buffer where to put the substring + sizeptr the size of the buffer, updated to the size of the substring + +Returns: if successful: 0 + if not successful, a negative error code: + PCRE2_ERROR_NOMEMORY: buffer too small + PCRE2_ERROR_NOSUBSTRING: no such substring + PCRE2_ERROR_UNAVAILABLE: ovector too small + PCRE2_ERROR_UNSET: substring is not set +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_copy_bynumber(pcre2_match_data *match_data, + uint32_t stringnumber, PCRE2_UCHAR *buffer, PCRE2_SIZE *sizeptr) +{ +int rc; +PCRE2_SIZE size; +rc = pcre2_substring_length_bynumber(match_data, stringnumber, &size); +if (rc < 0) return rc; +if (size + 1 > *sizeptr) return PCRE2_ERROR_NOMEMORY; +memcpy(buffer, match_data->subject + match_data->ovector[stringnumber*2], + CU2BYTES(size)); +buffer[size] = 0; +*sizeptr = size; +return 0; +} + + + +/************************************************* +* Extract named captured string * +*************************************************/ + +/* This function copies a single captured substring, identified by name, into +new memory. If the regex permits duplicate names, the first substring that is +set is chosen. 
+ +Arguments: + match_data pointer to match_data + stringname the name of the required substring + stringptr where to put the pointer to the new memory + sizeptr where to put the length of the substring + +Returns: if successful: zero + if not successful, a negative value: + (1) an error from nametable_scan() + (2) an error from get_bynumber() + (3) PCRE2_ERROR_UNAVAILABLE: no group is in ovector + (4) PCRE2_ERROR_UNSET: all named groups in ovector are unset +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_get_byname(pcre2_match_data *match_data, + PCRE2_SPTR stringname, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr) +{ +PCRE2_SPTR first, last, entry; +int failrc, entrysize; +if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER) + return PCRE2_ERROR_DFA_UFUNC; +entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, + &first, &last); +if (entrysize < 0) return entrysize; +failrc = PCRE2_ERROR_UNAVAILABLE; +for (entry = first; entry <= last; entry += entrysize) + { + uint32_t n = GET2(entry, 0); + if (n < match_data->oveccount) + { + if (match_data->ovector[n*2] != PCRE2_UNSET) + return pcre2_substring_get_bynumber(match_data, n, stringptr, sizeptr); + failrc = PCRE2_ERROR_UNSET; + } + } +return failrc; +} + + + +/************************************************* +* Extract captured string to new memory * +*************************************************/ + +/* This function copies a single captured substring into a piece of new +memory. 
+ +Arguments: + match_data points to match data + stringnumber the number of the required substring + stringptr where to put a pointer to the new memory + sizeptr where to put the size of the substring + +Returns: if successful: 0 + if not successful, a negative error code: + PCRE2_ERROR_NOMEMORY: failed to get memory + PCRE2_ERROR_NOSUBSTRING: no such substring + PCRE2_ERROR_UNAVAILABLE: ovector too small + PCRE2_ERROR_UNSET: substring is not set +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_get_bynumber(pcre2_match_data *match_data, + uint32_t stringnumber, PCRE2_UCHAR **stringptr, PCRE2_SIZE *sizeptr) +{ +int rc; +PCRE2_SIZE size; +PCRE2_UCHAR *yield; +rc = pcre2_substring_length_bynumber(match_data, stringnumber, &size); +if (rc < 0) return rc; +yield = PRIV(memctl_malloc)(sizeof(pcre2_memctl) + + (size + 1)*PCRE2_CODE_UNIT_WIDTH, (pcre2_memctl *)match_data); +if (yield == NULL) return PCRE2_ERROR_NOMEMORY; +yield = (PCRE2_UCHAR *)(((char *)yield) + sizeof(pcre2_memctl)); +memcpy(yield, match_data->subject + match_data->ovector[stringnumber*2], + CU2BYTES(size)); +yield[size] = 0; +*stringptr = yield; +*sizeptr = size; +return 0; +} + + + +/************************************************* +* Free memory obtained by get_substring * +*************************************************/ + +/* +Argument: the result of a previous pcre2_substring_get_byxxx() +Returns: nothing +*/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_substring_free(PCRE2_UCHAR *string) +{ +if (string != NULL) + { + pcre2_memctl *memctl = (pcre2_memctl *)((char *)string - sizeof(pcre2_memctl)); + memctl->free(memctl, memctl->memory_data); + } +} + + + +/************************************************* +* Get length of a named substring * +*************************************************/ + +/* This function returns the length of a named captured substring. If the regex +permits duplicate names, the first substring that is set is chosen. 
+ +Arguments: + match_data pointer to match data + stringname the name of the required substring + sizeptr where to put the length + +Returns: 0 if successful, else a negative error number +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_length_byname(pcre2_match_data *match_data, + PCRE2_SPTR stringname, PCRE2_SIZE *sizeptr) +{ +PCRE2_SPTR first, last, entry; +int failrc, entrysize; +if (match_data->matchedby == PCRE2_MATCHEDBY_DFA_INTERPRETER) + return PCRE2_ERROR_DFA_UFUNC; +entrysize = pcre2_substring_nametable_scan(match_data->code, stringname, + &first, &last); +if (entrysize < 0) return entrysize; +failrc = PCRE2_ERROR_UNAVAILABLE; +for (entry = first; entry <= last; entry += entrysize) + { + uint32_t n = GET2(entry, 0); + if (n < match_data->oveccount) + { + if (match_data->ovector[n*2] != PCRE2_UNSET) + return pcre2_substring_length_bynumber(match_data, n, sizeptr); + failrc = PCRE2_ERROR_UNSET; + } + } +return failrc; +} + + + +/************************************************* +* Get length of a numbered substring * +*************************************************/ + +/* This function returns the length of a captured substring. If the start is +beyond the end (which can happen when \K is used in an assertion), it sets the +length to zero. 
+ +Arguments: + match_data pointer to match data + stringnumber the number of the required substring + sizeptr where to put the length, if not NULL + +Returns: if successful: 0 + if not successful, a negative error code: + PCRE2_ERROR_NOSUBSTRING: no such substring + PCRE2_ERROR_UNAVAILABLE: ovector is too small + PCRE2_ERROR_UNSET: substring is not set +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_length_bynumber(pcre2_match_data *match_data, + uint32_t stringnumber, PCRE2_SIZE *sizeptr) +{ +PCRE2_SIZE left, right; +int count = match_data->rc; +if (count == PCRE2_ERROR_PARTIAL) + { + if (stringnumber > 0) return PCRE2_ERROR_PARTIAL; + count = 0; + } +else if (count < 0) return count; /* Match failed */ + +if (match_data->matchedby != PCRE2_MATCHEDBY_DFA_INTERPRETER) + { + if (stringnumber > match_data->code->top_bracket) + return PCRE2_ERROR_NOSUBSTRING; + if (stringnumber >= match_data->oveccount) + return PCRE2_ERROR_UNAVAILABLE; + if (match_data->ovector[stringnumber*2] == PCRE2_UNSET) + return PCRE2_ERROR_UNSET; + } +else /* Matched using pcre2_dfa_match() */ + { + if (stringnumber >= match_data->oveccount) return PCRE2_ERROR_UNAVAILABLE; + if (count != 0 && stringnumber >= (uint32_t)count) return PCRE2_ERROR_UNSET; + } + +left = match_data->ovector[stringnumber*2]; +right = match_data->ovector[stringnumber*2+1]; +if (sizeptr != NULL) *sizeptr = (left > right)? 0 : right - left; +return 0; +} + + + +/************************************************* +* Extract all captured strings to new memory * +*************************************************/ + +/* This function gets one chunk of memory and builds a list of pointers and all +the captured substrings in it. A NULL pointer is put on the end of the list. +The substrings are zero-terminated, but also, if the final argument is +non-NULL, a list of lengths is also returned. This allows binary data to be +handled. 
+ +Arguments: + match_data points to the match data + listptr set to point to the list of pointers + lengthsptr set to point to the list of lengths (may be NULL) + +Returns: if successful: 0 + if not successful, a negative error code: + PCRE2_ERROR_NOMEMORY: failed to get memory, + or a match failure code +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_list_get(pcre2_match_data *match_data, PCRE2_UCHAR ***listptr, + PCRE2_SIZE **lengthsptr) +{ +int i, count, count2; +PCRE2_SIZE size; +PCRE2_SIZE *lensp; +pcre2_memctl *memp; +PCRE2_UCHAR **listp; +PCRE2_UCHAR *sp; +PCRE2_SIZE *ovector; + +if ((count = match_data->rc) < 0) return count; /* Match failed */ +if (count == 0) count = match_data->oveccount; /* Ovector too small */ + +count2 = 2*count; +ovector = match_data->ovector; +size = sizeof(pcre2_memctl) + sizeof(PCRE2_UCHAR *); /* For final NULL */ +if (lengthsptr != NULL) size += sizeof(PCRE2_SIZE) * count; /* For lengths */ + +for (i = 0; i < count2; i += 2) + { + size += sizeof(PCRE2_UCHAR *) + CU2BYTES(1); + if (ovector[i+1] > ovector[i]) size += CU2BYTES(ovector[i+1] - ovector[i]); + } + +memp = PRIV(memctl_malloc)(size, (pcre2_memctl *)match_data); +if (memp == NULL) return PCRE2_ERROR_NOMEMORY; + +*listptr = listp = (PCRE2_UCHAR **)((char *)memp + sizeof(pcre2_memctl)); +lensp = (PCRE2_SIZE *)((char *)listp + sizeof(PCRE2_UCHAR *) * (count + 1)); + +if (lengthsptr == NULL) + { + sp = (PCRE2_UCHAR *)lensp; + lensp = NULL; + } +else + { + *lengthsptr = lensp; + sp = (PCRE2_UCHAR *)((char *)lensp + sizeof(PCRE2_SIZE) * count); + } + +for (i = 0; i < count2; i += 2) + { + size = (ovector[i+1] > ovector[i])? (ovector[i+1] - ovector[i]) : 0; + + /* Size == 0 includes the case when the capture is unset. Avoid adding + PCRE2_UNSET to match_data->subject because it overflows, even though with + zero size calling memcpy() is harmless. 
*/ + + if (size != 0) memcpy(sp, match_data->subject + ovector[i], CU2BYTES(size)); + *listp++ = sp; + if (lensp != NULL) *lensp++ = size; + sp += size; + *sp++ = 0; + } + +*listp = NULL; +return 0; +} + + + +/************************************************* +* Free memory obtained by substring_list_get * +*************************************************/ + +/* +Argument: the result of a previous pcre2_substring_list_get() +Returns: nothing +*/ + +PCRE2_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_substring_list_free(PCRE2_SPTR *list) +{ +if (list != NULL) + { + pcre2_memctl *memctl = (pcre2_memctl *)((char *)list - sizeof(pcre2_memctl)); + memctl->free(memctl, memctl->memory_data); + } +} + + + +/************************************************* +* Find (multiple) entries for named string * +*************************************************/ + +/* This function scans the nametable for a given name, using binary chop. It +returns either two pointers to the entries in the table, or, if no pointers are +given, the number of a unique group with the given name. If duplicate names are +permitted, and the name is not unique, an error is generated. 
+ +Arguments: + code the compiled regex + stringname the name whose entries required + firstptr where to put the pointer to the first entry + lastptr where to put the pointer to the last entry + +Returns: PCRE2_ERROR_NOSUBSTRING if the name is not found + otherwise, if firstptr and lastptr are NULL: + a group number for a unique substring + else PCRE2_ERROR_NOUNIQUESUBSTRING + otherwise: + the length of each entry, having set firstptr and lastptr +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_nametable_scan(const pcre2_code *code, PCRE2_SPTR stringname, + PCRE2_SPTR *firstptr, PCRE2_SPTR *lastptr) +{ +uint16_t bot = 0; +uint16_t top = code->name_count; +uint16_t entrysize = code->name_entry_size; +PCRE2_SPTR nametable = (PCRE2_SPTR)((char *)code + sizeof(pcre2_real_code)); + +while (top > bot) + { + uint16_t mid = (top + bot) / 2; + PCRE2_SPTR entry = nametable + entrysize*mid; + int c = PRIV(strcmp)(stringname, entry + IMM2_SIZE); + if (c == 0) + { + PCRE2_SPTR first; + PCRE2_SPTR last; + PCRE2_SPTR lastentry; + lastentry = nametable + entrysize * (code->name_count - 1); + first = last = entry; + while (first > nametable) + { + if (PRIV(strcmp)(stringname, (first - entrysize + IMM2_SIZE)) != 0) break; + first -= entrysize; + } + while (last < lastentry) + { + if (PRIV(strcmp)(stringname, (last + entrysize + IMM2_SIZE)) != 0) break; + last += entrysize; + } + if (firstptr == NULL) return (first == last)? + (int)GET2(entry, 0) : PCRE2_ERROR_NOUNIQUESUBSTRING; + *firstptr = first; + *lastptr = last; + return entrysize; + } + if (c > 0) bot = mid + 1; else top = mid; + } + +return PCRE2_ERROR_NOSUBSTRING; +} + + +/************************************************* +* Find number for named string * +*************************************************/ + +/* This function is a convenience wrapper for pcre2_substring_nametable_scan() +when it is known that names are unique. If there are duplicate names, it is not +defined which number is returned. 
+ +Arguments: + code the compiled regex + stringname the name whose number is required + +Returns: the number of the named parenthesis, or a negative number + PCRE2_ERROR_NOSUBSTRING if not found + PCRE2_ERROR_NOUNIQUESUBSTRING if not unique +*/ + +PCRE2_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_substring_number_from_name(const pcre2_code *code, + PCRE2_SPTR stringname) +{ +return pcre2_substring_nametable_scan(code, stringname, NULL, NULL); +} + +/* End of pcre2_substring.c */ diff --git a/src/pcre/pcre_tables.c b/src/pcre2/src/pcre2_tables.c similarity index 58% rename from src/pcre/pcre_tables.c rename to src/pcre2/src/pcre2_tables.c index 5e18e8cf..b10de45e 100644 --- a/src/pcre/pcre_tables.c +++ b/src/pcre2/src/pcre2_tables.c @@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2017 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -37,46 +38,64 @@ POSSIBILITY OF SUCH DAMAGE. ----------------------------------------------------------------------------- */ -#ifndef PCRE_INCLUDED - /* This module contains some fixed tables that are used by more than one of the -PCRE code modules. The tables are also #included by the pcretest program, which -uses macros to change their names from _pcre_xxx to xxxx, thereby avoiding name -clashes with the library. */ - +PCRE2 code modules. The tables are also #included by the pcre2test program, +which uses macros to change their names from _pcre2_xxx to xxxx, thereby +avoiding name clashes with the library. In this case, PCRE2_PCRE2TEST is +defined. 
*/ +#ifndef PCRE2_PCRE2TEST /* We're compiling the library */ #ifdef HAVE_CONFIG_H #include "config.h" #endif +#include "pcre2_internal.h" +#endif /* PCRE2_PCRE2TEST */ -#include "pcre_internal.h" - -#endif /* PCRE_INCLUDED */ /* Table of sizes for the fixed-length opcodes. It's defined in a macro so that -the definition is next to the definition of the opcodes in pcre_internal.h. */ +the definition is next to the definition of the opcodes in pcre2_internal.h. +This is mode-dependent, so is skipped when this file is included by pcre2test. */ -const pcre_uint8 PRIV(OP_lengths)[] = { OP_LENGTHS }; +#ifndef PCRE2_PCRE2TEST +const uint8_t PRIV(OP_lengths)[] = { OP_LENGTHS }; +#endif /* Tables of horizontal and vertical whitespace characters, suitable for adding to classes. */ -const pcre_uint32 PRIV(hspace_list)[] = { HSPACE_LIST }; -const pcre_uint32 PRIV(vspace_list)[] = { VSPACE_LIST }; +const uint32_t PRIV(hspace_list)[] = { HSPACE_LIST }; +const uint32_t PRIV(vspace_list)[] = { VSPACE_LIST }; +/* These tables are the pairs of delimiters that are valid for callout string +arguments. For each starting delimiter there must be a matching ending +delimiter, which in fact is different only for bracket-like delimiters. */ + +const uint32_t PRIV(callout_start_delims)[] = { + CHAR_GRAVE_ACCENT, CHAR_APOSTROPHE, CHAR_QUOTATION_MARK, + CHAR_CIRCUMFLEX_ACCENT, CHAR_PERCENT_SIGN, CHAR_NUMBER_SIGN, + CHAR_DOLLAR_SIGN, CHAR_LEFT_CURLY_BRACKET, 0 }; + +const uint32_t PRIV(callout_end_delims[]) = { + CHAR_GRAVE_ACCENT, CHAR_APOSTROPHE, CHAR_QUOTATION_MARK, + CHAR_CIRCUMFLEX_ACCENT, CHAR_PERCENT_SIGN, CHAR_NUMBER_SIGN, + CHAR_DOLLAR_SIGN, CHAR_RIGHT_CURLY_BRACKET, 0 }; /************************************************* * Tables for UTF-8 support * *************************************************/ -/* These are the breakpoints for different numbers of bytes in a UTF-8 -character. 
*/ +/* These tables are required by pcre2test in 16- or 32-bit mode, as well +as for the library in 8-bit mode, because pcre2test uses UTF-8 internally for +handling wide characters. */ -#if (defined SUPPORT_UTF && defined COMPILE_PCRE8) \ - || (defined PCRE_INCLUDED && (defined SUPPORT_PCRE16 || defined SUPPORT_PCRE32)) +#if defined PCRE2_PCRE2TEST || \ + (defined SUPPORT_UNICODE && \ + defined PCRE2_CODE_UNIT_WIDTH && \ + PCRE2_CODE_UNIT_WIDTH == 8) -/* These tables are also required by pcretest in 16- or 32-bit mode. */ +/* These are the breakpoints for different numbers of bytes in a UTF-8 +character. */ const int PRIV(utf8_table1)[] = { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff}; @@ -92,19 +111,20 @@ const int PRIV(utf8_table3)[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01}; /* Table of the number of extra bytes, indexed by the first byte masked with 0x3f. The highest number for a valid UTF-8 first byte is in fact 0x3d. */ -const pcre_uint8 PRIV(utf8_table4)[] = { +const uint8_t PRIV(utf8_table4)[] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 }; -#endif /* (SUPPORT_UTF && COMPILE_PCRE8) || (PCRE_INCLUDED && SUPPORT_PCRE[16|32])*/ +#endif /* UTF-8 support needed */ + -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE /* Table to translate from particular type value to the general value. */ -const pcre_uint32 PRIV(ucp_gentype)[] = { +const uint32_t PRIV(ucp_gentype)[] = { ucp_C, ucp_C, ucp_C, ucp_C, ucp_C, /* Cc, Cf, Cn, Co, Cs */ ucp_L, ucp_L, ucp_L, ucp_L, ucp_L, /* Ll, Lu, Lm, Lo, Lt */ ucp_M, ucp_M, ucp_M, /* Mc, Me, Mn */ @@ -117,18 +137,19 @@ const pcre_uint32 PRIV(ucp_gentype)[] = { /* This table encodes the rules for finding the end of an extended grapheme cluster. Every code point has a grapheme break property which is one of the -ucp_gbXX values defined in ucp.h. The 2-dimensional table is indexed by the -properties of two adjacent code points. 
The left property selects a word from -the table, and the right property selects a bit from that word like this: +ucp_gbXX values defined in pcre2_ucp.h. These changed between Unicode versions +10 and 11. The 2-dimensional table is indexed by the properties of two adjacent +code points. The left property selects a word from the table, and the right +property selects a bit from that word like this: - ucp_gbtable[left-property] & (1 << right-property) + PRIV(ucp_gbtable)[left-property] & (1u << right-property) The value is non-zero if a grapheme break is NOT permitted between the relevant two code points. The breaking rules are as follows: 1. Break at the start and end of text (pretty obviously). -2. Do not break between a CR and LF; otherwise, break before and after +2. Do not break between a CR and LF; otherwise, break before and after controls. 3. Do not break Hangul syllable sequences, the rules for which are: @@ -137,44 +158,54 @@ two code points. The breaking rules are as follows: LV or V may be followed by V or T LVT or T may be followed by T -4. Do not break before extending characters. +4. Do not break before extending characters or zero-width-joiner (ZWJ). -The next two rules are only for extended grapheme clusters (but that's what we +The following rules are only for extended grapheme clusters (but that's what we are implementing). 5. Do not break before SpacingMarks. 6. Do not break after Prepend characters. -7. Otherwise, break everywhere. +7. Do not break within emoji modifier sequences or emoji zwj sequences. That + is, do not break between characters with the Extended_Pictographic property. + Extend and ZWJ characters are allowed between the characters; this cannot be + represented in this table, the code has to deal with it. + +8. Do not break within emoji flag sequences. That is, do not break between + regional indicator (RI) symbols if there are an odd number of RI characters + before the break point. 
This table encodes "join RI characters"; the code + has to deal with checking for previous adjoining RIs. + +9. Otherwise, break everywhere. */ -const pcre_uint32 PRIV(ucp_gbtable[]) = { - (1< 0x10ffff is not permitted -PCRE_UTF8_ERR14 3-byte character with value 0xd000-0xdfff is not permitted -PCRE_UTF8_ERR15 Overlong 2-byte sequence -PCRE_UTF8_ERR16 Overlong 3-byte sequence -PCRE_UTF8_ERR17 Overlong 4-byte sequence -PCRE_UTF8_ERR18 Overlong 5-byte sequence (won't ever occur) -PCRE_UTF8_ERR19 Overlong 6-byte sequence (won't ever occur) -PCRE_UTF8_ERR20 Isolated 0x80 byte (not within UTF-8 character) -PCRE_UTF8_ERR21 Byte with the illegal value 0xfe or 0xff -PCRE_UTF8_ERR22 Unused (was non-character) - Arguments: string points to the string - length length of string, or -1 if the string is zero-terminated + length length of string errp pointer to an error position offset variable -Returns: = 0 if the string is a valid UTF-8 string - > 0 otherwise, setting the offset of the bad character +Returns: == 0 if the string is a valid UTF string + != 0 otherwise, setting the offset of the bad character */ int -PRIV(valid_utf)(PCRE_PUCHAR string, int length, int *erroroffset) +PRIV(valid_utf)(PCRE2_SPTR string, PCRE2_SIZE length, PCRE2_SIZE *erroroffset) { -#ifdef SUPPORT_UTF -register PCRE_PUCHAR p; +PCRE2_SPTR p; +uint32_t c; -if (length < 0) - { - for (p = string; *p != 0; p++); - length = (int)(p - string); - } +/* ----------------- Check a UTF-8 string ----------------- */ + +#if PCRE2_CODE_UNIT_WIDTH == 8 + +/* Originally, this function checked according to RFC 2279, allowing for values +in the range 0 to 0x7fffffff, up to 6 bytes long, but ensuring that they were +in the canonical format. Once somebody had pointed out RFC 3629 to me (it +obsoletes 2279), additional restrictions were applied. The values are now +limited to be between 0 and 0x0010ffff, no more than 4 bytes long, and the +subrange 0xd800 to 0xdfff is excluded.
However, the format of 5-byte and 6-byte +characters is still checked. Error returns are as follows: + +PCRE2_ERROR_UTF8_ERR1 Missing 1 byte at the end of the string +PCRE2_ERROR_UTF8_ERR2 Missing 2 bytes at the end of the string +PCRE2_ERROR_UTF8_ERR3 Missing 3 bytes at the end of the string +PCRE2_ERROR_UTF8_ERR4 Missing 4 bytes at the end of the string +PCRE2_ERROR_UTF8_ERR5 Missing 5 bytes at the end of the string +PCRE2_ERROR_UTF8_ERR6 2nd-byte's two top bits are not 0x80 +PCRE2_ERROR_UTF8_ERR7 3rd-byte's two top bits are not 0x80 +PCRE2_ERROR_UTF8_ERR8 4th-byte's two top bits are not 0x80 +PCRE2_ERROR_UTF8_ERR9 5th-byte's two top bits are not 0x80 +PCRE2_ERROR_UTF8_ERR10 6th-byte's two top bits are not 0x80 +PCRE2_ERROR_UTF8_ERR11 5-byte character is not permitted by RFC 3629 +PCRE2_ERROR_UTF8_ERR12 6-byte character is not permitted by RFC 3629 +PCRE2_ERROR_UTF8_ERR13 4-byte character with value > 0x10ffff is not permitted +PCRE2_ERROR_UTF8_ERR14 3-byte character with value 0xd800-0xdfff is not permitted +PCRE2_ERROR_UTF8_ERR15 Overlong 2-byte sequence +PCRE2_ERROR_UTF8_ERR16 Overlong 3-byte sequence +PCRE2_ERROR_UTF8_ERR17 Overlong 4-byte sequence +PCRE2_ERROR_UTF8_ERR18 Overlong 5-byte sequence (won't ever occur) +PCRE2_ERROR_UTF8_ERR19 Overlong 6-byte sequence (won't ever occur) +PCRE2_ERROR_UTF8_ERR20 Isolated 0x80 byte (not within UTF-8 character) +PCRE2_ERROR_UTF8_ERR21 Byte with the illegal value 0xfe or 0xff +*/ -for (p = string; length-- > 0; p++) +for (p = string; length > 0; p++) { - register pcre_uchar ab, c, d; + uint32_t ab, d; c = *p; + length--; + if (c < 128) continue; /* ASCII character */ if (c < 0xc0) /* Isolated 10xx xxxx byte */ { - *erroroffset = (int)(p - string); - return PCRE_UTF8_ERR20; + *erroroffset = (PCRE2_SIZE)(p - string); + return PCRE2_ERROR_UTF8_ERR20; } if (c >= 0xfe) /* Invalid 0xfe or 0xff bytes */ { - *erroroffset = (int)(p - string); - return PCRE_UTF8_ERR21; + *erroroffset = (PCRE2_SIZE)(p - string); + return 
PCRE2_ERROR_UTF8_ERR21; } - ab = PRIV(utf8_table4)[c & 0x3f]; /* Number of additional bytes */ - if (length < ab) + ab = PRIV(utf8_table4)[c & 0x3f]; /* Number of additional bytes (1-5) */ + if (length < ab) /* Missing bytes */ { - *erroroffset = (int)(p - string); /* Missing bytes */ - return ab - length; /* Codes ERR1 to ERR5 */ + *erroroffset = (PCRE2_SIZE)(p - string); + switch(ab - length) + { + case 1: return PCRE2_ERROR_UTF8_ERR1; + case 2: return PCRE2_ERROR_UTF8_ERR2; + case 3: return PCRE2_ERROR_UTF8_ERR3; + case 4: return PCRE2_ERROR_UTF8_ERR4; + case 5: return PCRE2_ERROR_UTF8_ERR5; + } } length -= ab; /* Length remaining */ @@ -147,7 +172,7 @@ for (p = string; length-- > 0; p++) if (((d = *(++p)) & 0xc0) != 0x80) { *erroroffset = (int)(p - string) - 1; - return PCRE_UTF8_ERR6; + return PCRE2_ERROR_UTF8_ERR6; } /* For each length, check that the remaining bytes start with the 0x80 bit @@ -162,7 +187,7 @@ for (p = string; length-- > 0; p++) case 1: if ((c & 0x3e) == 0) { *erroroffset = (int)(p - string) - 1; - return PCRE_UTF8_ERR15; + return PCRE2_ERROR_UTF8_ERR15; } break; @@ -174,17 +199,17 @@ for (p = string; length-- > 0; p++) if ((*(++p) & 0xc0) != 0x80) /* Third byte */ { *erroroffset = (int)(p - string) - 2; - return PCRE_UTF8_ERR7; + return PCRE2_ERROR_UTF8_ERR7; } if (c == 0xe0 && (d & 0x20) == 0) { *erroroffset = (int)(p - string) - 2; - return PCRE_UTF8_ERR16; + return PCRE2_ERROR_UTF8_ERR16; } if (c == 0xed && d >= 0xa0) { *erroroffset = (int)(p - string) - 2; - return PCRE_UTF8_ERR14; + return PCRE2_ERROR_UTF8_ERR14; } break; @@ -196,22 +221,22 @@ for (p = string; length-- > 0; p++) if ((*(++p) & 0xc0) != 0x80) /* Third byte */ { *erroroffset = (int)(p - string) - 2; - return PCRE_UTF8_ERR7; + return PCRE2_ERROR_UTF8_ERR7; } if ((*(++p) & 0xc0) != 0x80) /* Fourth byte */ { *erroroffset = (int)(p - string) - 3; - return PCRE_UTF8_ERR8; + return PCRE2_ERROR_UTF8_ERR8; } if (c == 0xf0 && (d & 0x30) == 0) { *erroroffset = (int)(p - string) - 3; 
- return PCRE_UTF8_ERR17; + return PCRE2_ERROR_UTF8_ERR17; } if (c > 0xf4 || (c == 0xf4 && d > 0x8f)) { *erroroffset = (int)(p - string) - 3; - return PCRE_UTF8_ERR13; + return PCRE2_ERROR_UTF8_ERR13; } break; @@ -227,22 +252,22 @@ for (p = string; length-- > 0; p++) if ((*(++p) & 0xc0) != 0x80) /* Third byte */ { *erroroffset = (int)(p - string) - 2; - return PCRE_UTF8_ERR7; + return PCRE2_ERROR_UTF8_ERR7; } if ((*(++p) & 0xc0) != 0x80) /* Fourth byte */ { *erroroffset = (int)(p - string) - 3; - return PCRE_UTF8_ERR8; + return PCRE2_ERROR_UTF8_ERR8; } if ((*(++p) & 0xc0) != 0x80) /* Fifth byte */ { *erroroffset = (int)(p - string) - 4; - return PCRE_UTF8_ERR9; + return PCRE2_ERROR_UTF8_ERR9; } if (c == 0xf8 && (d & 0x38) == 0) { *erroroffset = (int)(p - string) - 4; - return PCRE_UTF8_ERR18; + return PCRE2_ERROR_UTF8_ERR18; } break; @@ -253,27 +278,27 @@ for (p = string; length-- > 0; p++) if ((*(++p) & 0xc0) != 0x80) /* Third byte */ { *erroroffset = (int)(p - string) - 2; - return PCRE_UTF8_ERR7; + return PCRE2_ERROR_UTF8_ERR7; } if ((*(++p) & 0xc0) != 0x80) /* Fourth byte */ { *erroroffset = (int)(p - string) - 3; - return PCRE_UTF8_ERR8; + return PCRE2_ERROR_UTF8_ERR8; } if ((*(++p) & 0xc0) != 0x80) /* Fifth byte */ { *erroroffset = (int)(p - string) - 4; - return PCRE_UTF8_ERR9; + return PCRE2_ERROR_UTF8_ERR9; } if ((*(++p) & 0xc0) != 0x80) /* Sixth byte */ { *erroroffset = (int)(p - string) - 5; - return PCRE_UTF8_ERR10; + return PCRE2_ERROR_UTF8_ERR10; } if (c == 0xfc && (d & 0x3c) == 0) { *erroroffset = (int)(p - string) - 5; - return PCRE_UTF8_ERR19; + return PCRE2_ERROR_UTF8_ERR19; } break; } @@ -285,17 +310,89 @@ for (p = string; length-- > 0; p++) if (ab > 3) { *erroroffset = (int)(p - string) - ab; - return (ab == 4)? PCRE_UTF8_ERR11 : PCRE_UTF8_ERR12; + return (ab == 4)? 
PCRE2_ERROR_UTF8_ERR11 : PCRE2_ERROR_UTF8_ERR12; } } +return 0; -#else /* Not SUPPORT_UTF */ -(void)(string); /* Keep picky compilers happy */ -(void)(length); -(void)(erroroffset); -#endif -return PCRE_UTF8_ERR0; /* This indicates success */ +/* ----------------- Check a UTF-16 string ----------------- */ + +#elif PCRE2_CODE_UNIT_WIDTH == 16 + +/* There's not so much work, nor so many errors, for UTF-16. +PCRE2_ERROR_UTF16_ERR1 Missing low surrogate at the end of the string +PCRE2_ERROR_UTF16_ERR2 Invalid low surrogate +PCRE2_ERROR_UTF16_ERR3 Isolated low surrogate +*/ + +for (p = string; length > 0; p++) + { + c = *p; + length--; + + if ((c & 0xf800) != 0xd800) + { + /* Normal UTF-16 code point. Neither high nor low surrogate. */ + } + else if ((c & 0x0400) == 0) + { + /* High surrogate. Must be followed by a low surrogate. */ + if (length == 0) + { + *erroroffset = p - string; + return PCRE2_ERROR_UTF16_ERR1; + } + p++; + length--; + if ((*p & 0xfc00) != 0xdc00) + { + *erroroffset = p - string - 1; + return PCRE2_ERROR_UTF16_ERR2; + } + } + else + { + /* Isolated low surrogate. Always an error. */ + *erroroffset = p - string; + return PCRE2_ERROR_UTF16_ERR3; + } + } +return 0; + + + +/* ----------------- Check a UTF-32 string ----------------- */ + +#else + +/* There is very little to do for a UTF-32 string. +PCRE2_ERROR_UTF32_ERR1 Surrogate character +PCRE2_ERROR_UTF32_ERR2 Character > 0x10ffff +*/ + +for (p = string; length > 0; length--, p++) + { + c = *p; + if ((c & 0xfffff800u) != 0xd800u) + { + /* Normal UTF-32 code point. Neither high nor low surrogate.
*/ + if (c > 0x10ffffu) + { + *erroroffset = p - string; + return PCRE2_ERROR_UTF32_ERR2; + } + } + else + { + /* A surrogate */ + *erroroffset = p - string; + return PCRE2_ERROR_UTF32_ERR1; + } + } +return 0; +#endif /* CODE_UNIT_WIDTH */ } +#endif /* SUPPORT_UNICODE */ -/* End of pcre_valid_utf8.c */ +/* End of pcre2_valid_utf.c */ diff --git a/src/pcre/pcre_xclass.c b/src/pcre2/src/pcre2_xclass.c similarity index 86% rename from src/pcre/pcre_xclass.c rename to src/pcre2/src/pcre2_xclass.c index ef759a58..8b052be6 100644 --- a/src/pcre/pcre_xclass.c +++ b/src/pcre2/src/pcre2_xclass.c @@ -6,7 +6,8 @@ and semantics are as close as possible to those of the Perl 5 language. Written by Philip Hazel - Copyright (c) 1997-2013 University of Cambridge + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -37,46 +38,46 @@ POSSIBILITY OF SUCH DAMAGE. ----------------------------------------------------------------------------- */ - /* This module contains an internal function that is used to match an extended -class. It is used by both pcre_exec() and pcre_def_exec(). */ +class. It is used by pcre2_auto_possessify() and by both pcre2_match() and +pcre2_def_match(). */ #ifdef HAVE_CONFIG_H #include "config.h" #endif -#include "pcre_internal.h" +#include "pcre2_internal.h" /************************************************* * Match character against an XCLASS * *************************************************/ /* This function is called to match a character against an extended class that -might contain values > 255 and/or Unicode properties. +might contain codepoints above 255 and/or Unicode properties. 
Arguments: c the character - data points to the flag byte of the XCLASS data + data points to the flag code unit of the XCLASS data + utf TRUE if in UTF mode Returns: TRUE if character matches, else FALSE */ BOOL -PRIV(xclass)(pcre_uint32 c, const pcre_uchar *data, BOOL utf) +PRIV(xclass)(uint32_t c, PCRE2_SPTR data, BOOL utf) { -pcre_uchar t; +PCRE2_UCHAR t; BOOL negated = (*data & XCL_NOT) != 0; -(void)utf; -#ifdef COMPILE_PCRE8 +#if PCRE2_CODE_UNIT_WIDTH == 8 /* In 8 bit mode, this must always be TRUE. Help the compiler to know that. */ utf = TRUE; #endif -/* Character values < 256 are matched against a bitmap, if one is present. If -not, we still carry on, because there may be ranges that start below 256 in the +/* Code points < 256 are matched against a bitmap, if one is present. If not, +we still carry on, because there may be ranges that start below 256 in the additional data. */ if (c < 256) @@ -84,37 +85,37 @@ if (c < 256) if ((*data & XCL_HASPROP) == 0) { if ((*data & XCL_MAP) == 0) return negated; - return (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0; + return (((uint8_t *)(data + 1))[c/8] & (1u << (c&7))) != 0; } if ((*data & XCL_MAP) != 0 && - (((pcre_uint8 *)(data + 1))[c/8] & (1 << (c&7))) != 0) + (((uint8_t *)(data + 1))[c/8] & (1u << (c&7))) != 0) return !negated; /* char found */ } /* First skip the bit map if present. Then match against the list of Unicode properties or large chars or ranges that end with a large char. We won't ever -encounter XCL_PROP or XCL_NOTPROP when UCP support is not compiled. */ +encounter XCL_PROP or XCL_NOTPROP when UTF support is not compiled. 
*/ -if ((*data++ & XCL_MAP) != 0) data += 32 / sizeof(pcre_uchar); +if ((*data++ & XCL_MAP) != 0) data += 32 / sizeof(PCRE2_UCHAR); while ((t = *data++) != XCL_END) { - pcre_uint32 x, y; + uint32_t x, y; if (t == XCL_SINGLE) { -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (utf) { GETCHARINC(x, data); /* macro generates multiple statements */ } else #endif - x = *data++; + x = *data++; if (c == x) return !negated; } else if (t == XCL_RANGE) { -#ifdef SUPPORT_UTF +#ifdef SUPPORT_UNICODE if (utf) { GETCHARINC(x, data); /* macro generates multiple statements */ @@ -129,7 +130,7 @@ while ((t = *data++) != XCL_END) if (c >= x && c <= y) return !negated; } -#ifdef SUPPORT_UCP +#ifdef SUPPORT_UNICODE else /* XCL_PROP & XCL_NOTPROP */ { const ucd_record *prop = GET_UCD(c); @@ -259,10 +260,12 @@ while ((t = *data++) != XCL_END) data += 2; } -#endif /* SUPPORT_UCP */ +#else + (void)utf; /* Avoid compiler warning */ +#endif /* SUPPORT_UNICODE */ } return negated; /* char did not match */ } -/* End of pcre_xclass.c */ +/* End of pcre2_xclass.c */ diff --git a/src/pcre2/src/pcre2demo.c b/src/pcre2/src/pcre2demo.c new file mode 100644 index 00000000..a49f1f8e --- /dev/null +++ b/src/pcre2/src/pcre2demo.c @@ -0,0 +1,494 @@ +/************************************************* +* PCRE2 DEMONSTRATION PROGRAM * +*************************************************/ + +/* This is a demonstration program to illustrate a straightforward way of +using the PCRE2 regular expression library from a C program. See the +pcre2sample documentation for a short discussion ("man pcre2sample" if you have +the PCRE2 man pages installed). PCRE2 is a revised API for the library, and is +incompatible with the original PCRE API. + +There are actually three libraries, each supporting a different code unit +width. This demonstration program uses the 8-bit library. 
The default is to +process each code unit as a separate character, but if the pattern begins with +"(*UTF)", both it and the subject are treated as UTF-8 strings, where +characters may occupy multiple code units. + +In Unix-like environments, if PCRE2 is installed in your standard system +libraries, you should be able to compile this program using this command: + +cc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo + +If PCRE2 is not installed in a standard place, it is likely to be installed +with support for the pkg-config mechanism. If you have pkg-config, you can +compile this program using this command: + +cc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo + +If you do not have pkg-config, you may have to use something like this: + +cc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \ + -R/usr/local/lib -lpcre2-8 -o pcre2demo + +Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and +library files for PCRE2 are installed on your system. Only some operating +systems (Solaris is one) use the -R option. + +Building under Windows: + +If you want to statically link this program against a non-dll .a file, you must +define PCRE2_STATIC before including pcre2.h, so in this environment, uncomment +the following line. */ + +/* #define PCRE2_STATIC */ + +/* The PCRE2_CODE_UNIT_WIDTH macro must be defined before including pcre2.h. +For a program that uses only one code unit width, setting it to 8, 16, or 32 +makes it possible to use generic function names such as pcre2_compile(). Note +that just changing 8 to 16 (for example) is not sufficient to convert this +program to process 16-bit characters. Even in a fully 16-bit environment, where +string-handling functions such as strcmp() and printf() work with 16-bit +characters, the code for handling the table of named substrings will still need +to be modified. 
*/ + +#define PCRE2_CODE_UNIT_WIDTH 8 + +#include <stdio.h> +#include <string.h> +#include <pcre2.h> + + +/************************************************************************** +* Here is the program. The API includes the concept of "contexts" for * +* setting up unusual interface requirements for compiling and matching, * +* such as custom memory managers and non-standard newline definitions. * +* This program does not do any of this, so it makes no use of contexts, * +* always passing NULL where a context could be given. * +**************************************************************************/ + +int main(int argc, char **argv) +{ +pcre2_code *re; +PCRE2_SPTR pattern; /* PCRE2_SPTR is a pointer to unsigned code units of */ +PCRE2_SPTR subject; /* the appropriate width (in this case, 8 bits). */ +PCRE2_SPTR name_table; + +int crlf_is_newline; +int errornumber; +int find_all; +int i; +int rc; +int utf8; + +uint32_t option_bits; +uint32_t namecount; +uint32_t name_entry_size; +uint32_t newline; + +PCRE2_SIZE erroroffset; +PCRE2_SIZE *ovector; +PCRE2_SIZE subject_length; + +pcre2_match_data *match_data; + + +/************************************************************************** +* First, sort out the command line. There is only one possible option at * +* the moment, "-g" to request repeated matching to find all occurrences, * +* like Perl's /g option. We set the variable find_all to a non-zero value * +* if the -g option is present. * +**************************************************************************/ + +find_all = 0; +for (i = 1; i < argc; i++) + { + if (strcmp(argv[i], "-g") == 0) find_all = 1; + else if (argv[i][0] == '-') + { + printf("Unrecognised option %s\n", argv[i]); + return 1; + } + else break; + } + +/* After the options, we require exactly two arguments, which are the pattern, +and the subject string.
*/ + +if (argc - i != 2) + { + printf("Exactly two arguments required: a regex and a subject string\n"); + return 1; + } + +/* Pattern and subject are char arguments, so they can be straightforwardly +cast to PCRE2_SPTR because we are working in 8-bit code units. The subject +length is cast to PCRE2_SIZE for completeness, though PCRE2_SIZE is in fact +defined to be size_t. */ + +pattern = (PCRE2_SPTR)argv[i]; +subject = (PCRE2_SPTR)argv[i+1]; +subject_length = (PCRE2_SIZE)strlen((char *)subject); + + +/************************************************************************* +* Now we are going to compile the regular expression pattern, and handle * +* any errors that are detected. * +*************************************************************************/ + +re = pcre2_compile( + pattern, /* the pattern */ + PCRE2_ZERO_TERMINATED, /* indicates pattern is zero-terminated */ + 0, /* default options */ + &errornumber, /* for error number */ + &erroroffset, /* for error offset */ + NULL); /* use default compile context */ + +/* Compilation failed: print the error message and exit. */ + +if (re == NULL) + { + PCRE2_UCHAR buffer[256]; + pcre2_get_error_message(errornumber, buffer, sizeof(buffer)); + printf("PCRE2 compilation failed at offset %d: %s\n", (int)erroroffset, + buffer); + return 1; + } + + +/************************************************************************* +* If the compilation succeeded, we call PCRE2 again, in order to do a * +* pattern match against the subject string. This does just ONE match. If * +* further matching is needed, it will be done below. Before running the * +* match we must set up a match_data block for holding the result. Using * +* pcre2_match_data_create_from_pattern() ensures that the block is * +* exactly the right size for the number of capturing parentheses in the * +* pattern. 
If you need to know the actual size of a match_data block as * +* a number of bytes, you can find it like this: * +* * +* PCRE2_SIZE match_data_size = pcre2_get_match_data_size(match_data); * +*************************************************************************/ + +match_data = pcre2_match_data_create_from_pattern(re, NULL); + +/* Now run the match. */ + +rc = pcre2_match( + re, /* the compiled pattern */ + subject, /* the subject string */ + subject_length, /* the length of the subject */ + 0, /* start at offset 0 in the subject */ + 0, /* default options */ + match_data, /* block for storing the result */ + NULL); /* use default match context */ + +/* Matching failed: handle error cases */ + +if (rc < 0) + { + switch(rc) + { + case PCRE2_ERROR_NOMATCH: printf("No match\n"); break; + /* + Handle other special cases if you like + */ + default: printf("Matching error %d\n", rc); break; + } + pcre2_match_data_free(match_data); /* Release memory used for the match */ + pcre2_code_free(re); /* data and the compiled pattern. */ + return 1; + } + +/* Match succeeded. Get a pointer to the output vector, where string offsets are +stored. */ + +ovector = pcre2_get_ovector_pointer(match_data); +printf("Match succeeded at offset %d\n", (int)ovector[0]); + + +/************************************************************************* +* We have found the first match within the subject string. If the output * +* vector wasn't big enough, say so. Then output any substrings that were * +* captured. * +*************************************************************************/ + +/* The output vector wasn't big enough. This should not happen, because we used +pcre2_match_data_create_from_pattern() above. */ + +if (rc == 0) + printf("ovector was not big enough for all the captured substrings\n"); + +/* We must guard against patterns such as /(?=.\K)/ that use \K in an assertion +to set the start of a match later than its end.
In this demonstration program, +we just detect this case and give up. */ + +if (ovector[0] > ovector[1]) + { + printf("\\K was used in an assertion to set the match start after its end.\n" + "From end to start the match was: %.*s\n", (int)(ovector[0] - ovector[1]), + (char *)(subject + ovector[1])); + printf("Run abandoned\n"); + pcre2_match_data_free(match_data); + pcre2_code_free(re); + return 1; + } + +/* Show substrings stored in the output vector by number. Obviously, in a real +application you might want to do things other than print them. */ + +for (i = 0; i < rc; i++) + { + PCRE2_SPTR substring_start = subject + ovector[2*i]; + PCRE2_SIZE substring_length = ovector[2*i+1] - ovector[2*i]; + printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start); + } + + +/************************************************************************** +* That concludes the basic part of this demonstration program. We have * +* compiled a pattern, and performed a single match. The code that follows * +* shows first how to access named substrings, and then how to code for * +* repeated matches on the same subject. * +**************************************************************************/ + +/* See if there are any named substrings, and if so, show them by name. First +we have to extract the count of named parentheses from the pattern. */ + +(void)pcre2_pattern_info( + re, /* the compiled pattern */ + PCRE2_INFO_NAMECOUNT, /* get the number of named substrings */ + &namecount); /* where to put the answer */ + +if (namecount == 0) printf("No named substrings\n"); else + { + PCRE2_SPTR tabptr; + printf("Named substrings\n"); + + /* Before we can access the substrings, we must extract the table for + translating names to numbers, and the size of each entry in the table. 
*/ + + (void)pcre2_pattern_info( + re, /* the compiled pattern */ + PCRE2_INFO_NAMETABLE, /* address of the table */ + &name_table); /* where to put the answer */ + + (void)pcre2_pattern_info( + re, /* the compiled pattern */ + PCRE2_INFO_NAMEENTRYSIZE, /* size of each entry in the table */ + &name_entry_size); /* where to put the answer */ + + /* Now we can scan the table and, for each entry, print the number, the name, + and the substring itself. In the 8-bit library the number is held in two + bytes, most significant first. */ + + tabptr = name_table; + for (i = 0; i < namecount; i++) + { + int n = (tabptr[0] << 8) | tabptr[1]; + printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2, + (int)(ovector[2*n+1] - ovector[2*n]), subject + ovector[2*n]); + tabptr += name_entry_size; + } + } + + +/************************************************************************* +* If the "-g" option was given on the command line, we want to continue * +* to search for additional matches in the subject string, in a similar * +* way to the /g option in Perl. This turns out to be trickier than you * +* might think because of the possibility of matching an empty string. * +* What happens is as follows: * +* * +* If the previous match was NOT for an empty string, we can just start * +* the next match at the end of the previous one. * +* * +* If the previous match WAS for an empty string, we can't do that, as it * +* would lead to an infinite loop. Instead, a call of pcre2_match() is * +* made with the PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set. The * +* first of these tells PCRE2 that an empty string at the start of the * +* subject is not a valid match; other possibilities must be tried. The * +* second flag restricts PCRE2 to one match attempt at the initial string * +* position. If this match succeeds, an alternative to the empty string * +* match has been found, and we can print it and proceed round the loop, * +* advancing by the length of whatever was found. 
If this match does not * +* succeed, we still stay in the loop, advancing by just one character. * +* In UTF-8 mode, which can be set by (*UTF) in the pattern, this may be * +* more than one byte. * +* * +* However, there is a complication concerned with newlines. When the * +* newline convention is such that CRLF is a valid newline, we must * +* advance by two characters rather than one. The newline convention can * +* be set in the regex by (*CR), etc.; if not, we must find the default. * +*************************************************************************/ + +if (!find_all) /* Check for -g */ + { + pcre2_match_data_free(match_data); /* Release the memory that was used */ + pcre2_code_free(re); /* for the match data and the pattern. */ + return 0; /* Exit the program. */ + } + +/* Before running the loop, check for UTF-8 and whether CRLF is a valid newline +sequence. First, find the options with which the regex was compiled and extract +the UTF state. */ + +(void)pcre2_pattern_info(re, PCRE2_INFO_ALLOPTIONS, &option_bits); +utf8 = (option_bits & PCRE2_UTF) != 0; + +/* Now find the newline convention and see whether CRLF is a valid newline +sequence. */ + +(void)pcre2_pattern_info(re, PCRE2_INFO_NEWLINE, &newline); +crlf_is_newline = newline == PCRE2_NEWLINE_ANY || + newline == PCRE2_NEWLINE_CRLF || + newline == PCRE2_NEWLINE_ANYCRLF; + +/* Loop for second and subsequent matches */ + +for (;;) + { + uint32_t options = 0; /* Normally no options */ + PCRE2_SIZE start_offset = ovector[1]; /* Start at end of previous match */ + + /* If the previous match was for an empty string, we are finished if we are + at the end of the subject. Otherwise, arrange to run another match at the + same point to see if a non-empty match can be found. 
*/ + + if (ovector[0] == ovector[1]) + { + if (ovector[0] == subject_length) break; + options = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED; + } + + /* If the previous match was not an empty string, there is one tricky case to + consider. If a pattern contains \K within a lookbehind assertion at the + start, the end of the matched string can be at the offset where the match + started. Without special action, this leads to a loop that keeps on matching + the same substring. We must detect this case and arrange to move the start on + by one character. The pcre2_get_startchar() function returns the starting + offset that was passed to pcre2_match(). */ + + else + { + PCRE2_SIZE startchar = pcre2_get_startchar(match_data); + if (start_offset <= startchar) + { + if (startchar >= subject_length) break; /* Reached end of subject. */ + start_offset = startchar + 1; /* Advance by one character. */ + if (utf8) /* If UTF-8, it may be more */ + { /* than one code unit. */ + for (; start_offset < subject_length; start_offset++) + if ((subject[start_offset] & 0xc0) != 0x80) break; + } + } + } + + /* Run the next matching operation */ + + rc = pcre2_match( + re, /* the compiled pattern */ + subject, /* the subject string */ + subject_length, /* the length of the subject */ + start_offset, /* starting offset in the subject */ + options, /* options */ + match_data, /* block for storing the result */ + NULL); /* use default match context */ + + /* This time, a result of NOMATCH isn't an error. If the value in "options" + is zero, it just means we have found all possible matches, so the loop ends. + Otherwise, it means we have failed to find a non-empty-string match at a + point where there was a previous empty-string match. In this case, we do what + Perl does: advance the matching position by one character, and continue. We + do this by setting the "end of previous match" offset, because that is picked + up at the top of the loop as the point at which to start again. 
+ + There are two complications: (a) When CRLF is a valid newline sequence, and + the current position is just before it, advance by an extra byte. (b) + Otherwise we must ensure that we skip an entire UTF character if we are in + UTF mode. */ + + if (rc == PCRE2_ERROR_NOMATCH) + { + if (options == 0) break; /* All matches found */ + ovector[1] = start_offset + 1; /* Advance one code unit */ + if (crlf_is_newline && /* If CRLF is a newline & */ + start_offset < subject_length - 1 && /* we are at CRLF, */ + subject[start_offset] == '\r' && + subject[start_offset + 1] == '\n') + ovector[1] += 1; /* Advance by one more. */ + else if (utf8) /* Otherwise, ensure we */ + { /* advance a whole UTF-8 */ + while (ovector[1] < subject_length) /* character. */ + { + if ((subject[ovector[1]] & 0xc0) != 0x80) break; + ovector[1] += 1; + } + } + continue; /* Go round the loop again */ + } + + /* Other matching errors are not recoverable. */ + + if (rc < 0) + { + printf("Matching error %d\n", rc); + pcre2_match_data_free(match_data); + pcre2_code_free(re); + return 1; + } + + /* Match succeeded */ + + printf("\nMatch succeeded again at offset %d\n", (int)ovector[0]); + + /* The match succeeded, but the output vector wasn't big enough. This + should not happen. */ + + if (rc == 0) + printf("ovector was not big enough for all the captured substrings\n"); + + /* We must guard against patterns such as /(?=.\K)/ that use \K in an + assertion to set the start of a match later than its end. In this + demonstration program, we just detect this case and give up.
*/ + + if (ovector[0] > ovector[1]) + { + printf("\\K was used in an assertion to set the match start after its end.\n" + "From end to start the match was: %.*s\n", (int)(ovector[0] - ovector[1]), + (char *)(subject + ovector[1])); + printf("Run abandoned\n"); + pcre2_match_data_free(match_data); + pcre2_code_free(re); + return 1; + } + + /* As before, show substrings stored in the output vector by number, and then + also any named substrings. */ + + for (i = 0; i < rc; i++) + { + PCRE2_SPTR substring_start = subject + ovector[2*i]; + size_t substring_length = ovector[2*i+1] - ovector[2*i]; + printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start); + } + + if (namecount == 0) printf("No named substrings\n"); else + { + PCRE2_SPTR tabptr = name_table; + printf("Named substrings\n"); + for (i = 0; i < namecount; i++) + { + int n = (tabptr[0] << 8) | tabptr[1]; + printf("(%d) %*s: %.*s\n", n, name_entry_size - 3, tabptr + 2, + (int)(ovector[2*n+1] - ovector[2*n]), subject + ovector[2*n]); + tabptr += name_entry_size; + } + } + } /* End of loop to find second and subsequent matches */ + +printf("\n"); +pcre2_match_data_free(match_data); +pcre2_code_free(re); +return 0; +} + +/* End of pcre2demo.c */ diff --git a/src/pcre/pcregrep.c b/src/pcre2/src/pcre2grep.c similarity index 55% rename from src/pcre/pcregrep.c rename to src/pcre2/src/pcre2grep.c index 59824068..b54229b2 100644 --- a/src/pcre/pcregrep.c +++ b/src/pcre2/src/pcre2grep.c @@ -1,18 +1,19 @@ /************************************************* -* pcregrep program * +* pcre2grep program * *************************************************/ -/* This is a grep program that uses the PCRE regular expression library to do -its pattern matching. On Unix-like, Windows, and native z/OS systems it can -recurse into directories, and in z/OS it can handle PDS files. +/* This is a grep program that uses the 8-bit PCRE regular expression library +via the PCRE2 updated API to do its pattern matching. 
On Unix-like, Windows, +and native z/OS systems it can recurse into directories, and in z/OS it can +handle PDS files. Note that for native z/OS, in addition to defining the NATIVE_ZOS macro, an -additional header is required. That header is not included in the main PCRE -distribution because other apparatus is needed to compile pcregrep for z/OS. +additional header is required. That header is not included in the main PCRE2 +distribution because other apparatus is needed to compile pcre2grep for z/OS. The header can be found in the special z/OS distribution, which is available from www.zaconsultants.net or from www.cbttape.org. - Copyright (c) 1997-2014 University of Cambridge + Copyright (c) 1997-2020 University of Cambridge ----------------------------------------------------------------------------- Redistribution and use in source and binary forms, with or without @@ -57,6 +58,35 @@ POSSIBILITY OF SUCH DAMAGE. #include #include +#if (defined _WIN32 || (defined HAVE_WINDOWS_H && HAVE_WINDOWS_H)) \ + && !defined WIN32 && !defined(__CYGWIN__) +#define WIN32 +#endif + +/* Some cmake's define it still */ +#if defined(__CYGWIN__) && defined(WIN32) +#undef WIN32 +#endif + +#ifdef __VMS +#include clidef +#include descrip +#include lib$routines +#endif + +#ifdef WIN32 +#include <io.h> /* For _setmode() */ +#include <fcntl.h> /* For _O_BINARY */ +#endif + +#if defined(SUPPORT_PCRE2GREP_CALLOUT) && defined(SUPPORT_PCRE2GREP_CALLOUT_FORK) +#ifdef WIN32 +#include <process.h> +#else +#include <sys/wait.h> +#endif +#endif + #ifdef HAVE_UNISTD_H #include <unistd.h> #endif @@ -69,14 +99,36 @@ POSSIBILITY OF SUCH DAMAGE. #include #endif -#include "pcre.h" +#define PCRE2_CODE_UNIT_WIDTH 8 +#include "pcre2.h" + +/* Older versions of MSVC lack snprintf(). This define allows for +warning/error-free compilation and testing with MSVC compilers back to at least +MSVC 10/2010. Except for VC6 (which is missing some fundamentals and fails). 
*/ + +#if defined(_MSC_VER) && (_MSC_VER < 1900) +#define snprintf _snprintf +#endif + +/* VC and older compilers don't support %td or %zu, and even some that claim to +be C99 don't support it (hence DISABLE_PERCENT_ZT). */ + +#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L || defined(DISABLE_PERCENT_ZT) +#define PTR_FORM "lu" +#define SIZ_FORM "lu" +#define SIZ_CAST (unsigned long int) +#else +#define PTR_FORM "td" +#define SIZ_FORM "zu" +#define SIZ_CAST +#endif #define FALSE 0 #define TRUE 1 typedef int BOOL; -#define OFFSET_SIZE 99 +#define DEFAULT_CAPTURE_MAX 50 #if BUFSIZ > 8192 #define MAXPATLEN BUFSIZ @@ -84,7 +136,8 @@ typedef int BOOL; #define MAXPATLEN 8192 #endif -#define PATBUFSIZE (MAXPATLEN + 10) /* Allows for prefix+suffix */ +#define FNBUFSIZ 2048 +#define ERRBUFSIZ 256 /* Values for the "filenames" variable, which specifies options for file name output. The order is important; it is assumed that a file name is wanted for @@ -107,21 +160,43 @@ enum { DEE_READ, DEE_SKIP }; #define PO_LINE_MATCH 0x0002 #define PO_FIXED_STRINGS 0x0004 -/* Line ending types */ - -enum { EL_LF, EL_CR, EL_CRLF, EL_ANY, EL_ANYCRLF }; - /* Binary file options */ enum { BIN_BINARY, BIN_NOMATCH, BIN_TEXT }; +/* Return values from decode_dollar_escape() */ + +enum { DDE_ERROR, DDE_CAPTURE, DDE_CHAR }; + /* In newer versions of gcc, with FORTIFY_SOURCE set (the default in some environments), a warning is issued if the value of fwrite() is ignored. Unfortunately, casting to (void) does not suppress the warning. To get round this, we use a macro that compiles a fudge. Oddly, this does not also seem to apply to fprintf(). */ -#define FWRITE(a,b,c,d) if (fwrite(a,b,c,d)) {} +#define FWRITE_IGNORE(a,b,c,d) if (fwrite(a,b,c,d)) {} + +/* Under Windows, we have to set stdout to be binary, so that it does not +convert \r\n at the ends of output lines to \r\r\n. 
However, that means that +any messages written to stdout must have \r\n as their line terminator. This is +handled by using STDOUT_NL as the newline string. We also use a normal double +quote for the example, as single quotes aren't usually available. */ + +#ifdef WIN32 +#define STDOUT_NL "\r\n" +#define STDOUT_NL_LEN 2 +#define QUOT "\"" +#else +#define STDOUT_NL "\n" +#define STDOUT_NL_LEN 1 +#define QUOT "'" +#endif + +/* This code is returned from decode_dollar_escape() when $n is encountered, +and used to mean "output STDOUT_NL". It is, of course, not a valid Unicode code +point. */ + +#define STDOUT_NL_CODE 0x7fffffffu @@ -140,28 +215,32 @@ static const char *jfriedl_prefix = ""; static const char *jfriedl_postfix = ""; #endif -static int endlinetype; +static const char *colour_string = "1;31"; +static const char *colour_option = NULL; +static const char *dee_option = NULL; +static const char *DEE_option = NULL; +static const char *locale = NULL; +static const char *newline_arg = NULL; +static const char *om_separator = NULL; +static const char *stdin_name = "(standard input)"; +static const char *output_text = NULL; -static char *colour_string = (char *)"1;31"; -static char *colour_option = NULL; -static char *dee_option = NULL; -static char *DEE_option = NULL; -static char *locale = NULL; static char *main_buffer = NULL; -static char *newline = NULL; -static char *om_separator = (char *)""; -static char *stdin_name = (char *)"(standard input)"; - -static const unsigned char *pcretables = NULL; static int after_context = 0; static int before_context = 0; static int binary_files = BIN_BINARY; static int both_context = 0; -static int bufthird = PCREGREP_BUFSIZE; -static int bufsize = 3*PCREGREP_BUFSIZE; +static int bufthird = PCRE2GREP_BUFSIZE; +static int max_bufthird = PCRE2GREP_MAX_BUFSIZE; +static int bufsize = 3*PCRE2GREP_BUFSIZE; +static int endlinetype; + +static int count_limit = -1; /* Not long, so that it works with OP_NUMBER */ +static unsigned long 
int counts_printed = 0; +static unsigned long int total_count = 0; -#if defined HAVE_WINDOWS_H && HAVE_WINDOWS_H +#ifdef WIN32 static int dee_action = dee_SKIP; #else static int dee_action = dee_READ; @@ -170,20 +249,33 @@ static int dee_action = dee_READ; static int DEE_action = DEE_READ; static int error_count = 0; static int filenames = FN_DEFAULT; -static int pcre_options = 0; -static int process_options = 0; -#ifdef SUPPORT_PCREGREP_JIT -static int study_options = PCRE_STUDY_JIT_COMPILE; +#ifdef SUPPORT_PCRE2GREP_JIT +static BOOL use_jit = TRUE; #else -static int study_options = 0; +static BOOL use_jit = FALSE; #endif -static unsigned long int match_limit = 0; -static unsigned long int match_limit_recursion = 0; +static const uint8_t *character_tables = NULL; + +static uint32_t pcre2_options = 0; +static uint32_t extra_options = 0; +static PCRE2_SIZE heap_limit = PCRE2_UNSET; +static uint32_t match_limit = 0; +static uint32_t depth_limit = 0; + +static pcre2_compile_context *compile_context; +static pcre2_match_context *match_context; +static pcre2_match_data *match_data; +static PCRE2_SIZE *offsets; +static uint32_t offset_size; +static uint32_t capture_max = DEFAULT_CAPTURE_MAX; static BOOL count_only = FALSE; static BOOL do_colour = FALSE; +#ifdef WIN32 +static BOOL do_ansi = FALSE; +#endif static BOOL file_offsets = FALSE; static BOOL hyphenpending = FALSE; static BOOL invert = FALSE; @@ -194,9 +286,12 @@ static BOOL number = FALSE; static BOOL omit_zero_count = FALSE; static BOOL resource_error = FALSE; static BOOL quiet = FALSE; -static BOOL show_only_matching = FALSE; +static BOOL show_total_count = FALSE; static BOOL silent = FALSE; -static BOOL utf8 = FALSE; +static BOOL utf = FALSE; + +static uint8_t utf8_buffer[8]; + /* Structure for list of --only-matching capturing numbers. 
*/ @@ -207,6 +302,7 @@ typedef struct omstr { static omstr *only_matching = NULL; static omstr *only_matching_last = NULL; +static int only_matching_count; /* Structure for holding the two variables that describe a number chain. */ @@ -252,8 +348,8 @@ also for include/exclude patterns. */ typedef struct patstr { struct patstr *next; char *string; - pcre *compiled; - pcre_extra *hint; + PCRE2_SIZE length; + pcre2_code *compiled; } patstr; static patstr *patterns = NULL; @@ -289,7 +385,7 @@ static const char *incexname[4] = { "--include", "--exclude", /* Structure for options and list of them */ -enum { OP_NODATA, OP_STRING, OP_OP_STRING, OP_NUMBER, OP_LONGNUMBER, +enum { OP_NODATA, OP_STRING, OP_OP_STRING, OP_NUMBER, OP_U32NUMBER, OP_SIZE, OP_OP_NUMBER, OP_OP_NUMBERS, OP_PATLIST, OP_FILELIST, OP_BINFILES }; typedef struct option_item { @@ -315,15 +411,18 @@ used to identify them. */ #define N_LOFFSETS (-10) #define N_FOFFSETS (-11) #define N_LBUFFER (-12) -#define N_M_LIMIT (-13) -#define N_M_LIMIT_REC (-14) -#define N_BUFSIZE (-15) -#define N_NOJIT (-16) -#define N_FILE_LIST (-17) -#define N_BINARY_FILES (-18) -#define N_EXCLUDE_FROM (-19) -#define N_INCLUDE_FROM (-20) -#define N_OM_SEPARATOR (-21) +#define N_H_LIMIT (-13) +#define N_M_LIMIT (-14) +#define N_M_LIMIT_DEP (-15) +#define N_BUFSIZE (-16) +#define N_NOJIT (-17) +#define N_FILE_LIST (-18) +#define N_BINARY_FILES (-19) +#define N_EXCLUDE_FROM (-20) +#define N_INCLUDE_FROM (-21) +#define N_OM_SEPARATOR (-22) +#define N_MAX_BUFSIZE (-23) +#define N_OM_CAPTURE (-24) static option_item optionlist[] = { { OP_NODATA, N_NULL, NULL, "", "terminate options" }, @@ -332,7 +431,8 @@ static option_item optionlist[] = { { OP_NODATA, 'a', NULL, "text", "treat binary files as text" }, { OP_NUMBER, 'B', &before_context, "before-context=number", "set number of prior context lines" }, { OP_BINFILES, N_BINARY_FILES, NULL, "binary-files=word", "set treatment of binary files" }, - { OP_NUMBER, N_BUFSIZE,&bufthird, 
"buffer-size=number", "set processing buffer size parameter" }, + { OP_NUMBER, N_BUFSIZE,&bufthird, "buffer-size=number", "set processing buffer starting size" }, + { OP_NUMBER, N_MAX_BUFSIZE,&max_bufthird, "max-buffer-size=number", "set processing buffer maximum size" }, { OP_OP_STRING, N_COLOUR, &colour_option, "color=option", "matched text color option" }, { OP_OP_STRING, N_COLOUR, &colour_option, "colour=option", "matched text colour option" }, { OP_NUMBER, 'C', &both_context, "context=number", "set number of context lines, before & after" }, @@ -348,24 +448,29 @@ static option_item optionlist[] = { { OP_NODATA, 'h', NULL, "no-filename", "suppress the prefixing filename on output" }, { OP_NODATA, 'I', NULL, "", "treat binary files as not matching (ignore)" }, { OP_NODATA, 'i', NULL, "ignore-case", "ignore case distinctions" }, -#ifdef SUPPORT_PCREGREP_JIT - { OP_NODATA, N_NOJIT, NULL, "no-jit", "do not use just-in-time compiler optimization" }, -#else - { OP_NODATA, N_NOJIT, NULL, "no-jit", "ignored: this pcregrep does not support JIT" }, -#endif { OP_NODATA, 'l', NULL, "files-with-matches", "print only FILE names containing matches" }, { OP_NODATA, 'L', NULL, "files-without-match","print only FILE names not containing matches" }, { OP_STRING, N_LABEL, &stdin_name, "label=name", "set name for standard input" }, { OP_NODATA, N_LBUFFER, NULL, "line-buffered", "use line buffering" }, { OP_NODATA, N_LOFFSETS, NULL, "line-offsets", "output line numbers and offsets, not text" }, { OP_STRING, N_LOCALE, &locale, "locale=locale", "use the named locale" }, - { OP_LONGNUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE match limit option" }, - { OP_LONGNUMBER, N_M_LIMIT_REC, &match_limit_recursion, "recursion-limit=number", "set PCRE match recursion limit option" }, + { OP_SIZE, N_H_LIMIT, &heap_limit, "heap-limit=number", "set PCRE2 heap limit option (kibibytes)" }, + { OP_U32NUMBER, N_M_LIMIT, &match_limit, "match-limit=number", "set PCRE2 match limit 
option" }, + { OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "depth-limit=number", "set PCRE2 depth limit option" }, + { OP_U32NUMBER, N_M_LIMIT_DEP, &depth_limit, "recursion-limit=number", "obsolete synonym for depth-limit" }, { OP_NODATA, 'M', NULL, "multiline", "run in multiline mode" }, - { OP_STRING, 'N', &newline, "newline=type", "set newline type (CR, LF, CRLF, ANYCRLF or ANY)" }, + { OP_NUMBER, 'm', &count_limit, "max-count=number", "stop after matched lines" }, + { OP_STRING, 'N', &newline_arg, "newline=type", "set newline type (CR, LF, CRLF, ANYCRLF, ANY, or NUL)" }, { OP_NODATA, 'n', NULL, "line-number", "print line number with output lines" }, +#ifdef SUPPORT_PCRE2GREP_JIT + { OP_NODATA, N_NOJIT, NULL, "no-jit", "do not use just-in-time compiler optimization" }, +#else + { OP_NODATA, N_NOJIT, NULL, "no-jit", "ignored: this pcre2grep does not support JIT" }, +#endif + { OP_STRING, 'O', &output_text, "output=text", "show only this text (possibly expanded)" }, { OP_OP_NUMBERS, 'o', &only_matching_data, "only-matching=n", "show only the part of the line that matched" }, { OP_STRING, N_OM_SEPARATOR, &om_separator, "om-separator=text", "set separator for multiple -o output" }, + { OP_U32NUMBER, N_OM_CAPTURE, &capture_max, "om-capture=n", "set capture count for --only-matching" }, { OP_NODATA, 'q', NULL, "quiet", "suppress output, just set return code" }, { OP_NODATA, 'r', NULL, "recursive", "recursively scan sub-directories" }, { OP_PATLIST, N_EXCLUDE,&exclude_patdata, "exclude=pattern","exclude matching files when recursing" }, @@ -374,20 +479,13 @@ static option_item optionlist[] = { { OP_PATLIST, N_INCLUDE_DIR,&include_dir_patdata, "include-dir=pattern","include matching directories when recursing" }, { OP_FILELIST, N_EXCLUDE_FROM,&exclude_from_data, "exclude-from=path", "read exclude list from file" }, { OP_FILELIST, N_INCLUDE_FROM,&include_from_data, "include-from=path", "read include list from file" }, - - /* These two were accidentally implemented with 
underscores instead of - hyphens in the option names. As this was not discovered for several releases, - the incorrect versions are left in the table for compatibility. However, the - --help function misses out any option that has an underscore in its name. */ - - { OP_PATLIST, N_EXCLUDE_DIR,&exclude_dir_patdata, "exclude_dir=pattern","exclude matching directories when recursing" }, - { OP_PATLIST, N_INCLUDE_DIR,&include_dir_patdata, "include_dir=pattern","include matching directories when recursing" }, - #ifdef JFRIEDL_DEBUG { OP_OP_NUMBER, 'S', &S_arg, "jeffS", "replace matched (sub)string with X" }, #endif { OP_NODATA, 's', NULL, "no-messages", "suppress error messages" }, - { OP_NODATA, 'u', NULL, "utf-8", "use UTF-8 mode" }, + { OP_NODATA, 't', NULL, "total-count", "print total count of matching lines" }, + { OP_NODATA, 'u', NULL, "utf", "use UTF mode" }, + { OP_NODATA, 'U', NULL, "utf-allow-invalid", "use UTF mode, allow for invalid code units" }, { OP_NODATA, 'V', NULL, "version", "print version information and exit" }, { OP_NODATA, 'v', NULL, "invert-match", "select non-matching lines" }, { OP_NODATA, 'w', NULL, "word-regex(p)", "force patterns to match only as words" }, @@ -395,21 +493,19 @@ static option_item optionlist[] = { { OP_NODATA, 0, NULL, NULL, NULL } }; -/* Tables for prefixing and suffixing patterns, according to the -w, -x, and -F -options. These set the 1, 2, and 4 bits in process_options, respectively. Note -that the combination of -w and -x has the same effect as -x on its own, so we -can treat them as the same. Note that the MAXPATLEN macro assumes the longest -prefix+suffix is 10 characters; if anything longer is added, it must be -adjusted. */ +/* Table of names for newline types. Must be kept in step with the definitions +of PCRE2_NEWLINE_xx in pcre2.h. 
*/ -static const char *prefix[] = { - "", "\\b", "^(?:", "^(?:", "\\Q", "\\b\\Q", "^(?:\\Q", "^(?:\\Q" }; +static const char *newlines[] = { + "DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" }; -static const char *suffix[] = { - "", "\\b", ")$", ")$", "\\E", "\\E\\b", "\\E)$", "\\E)$" }; +/* UTF-8 tables */ -/* UTF-8 tables - used only when the newline setting is "any". */ +const int utf8_table1[] = + { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff}; +const int utf8_table1_size = sizeof(utf8_table1) / sizeof(int); +const int utf8_table2[] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc}; const int utf8_table3[] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01}; const char utf8_table4[] = { @@ -419,6 +515,116 @@ const char utf8_table4[] = { 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5 }; +#if !defined(VPCOMPAT) && !defined(HAVE_MEMMOVE) +/************************************************* +* Emulated memmove() for systems without it * +*************************************************/ + +/* This function can make use of bcopy() if it is available. Otherwise do it by +steam, as there are some non-Unix environments that lack both memmove() and +bcopy(). */ + +static void * +emulated_memmove(void *d, const void *s, size_t n) +{ +#ifdef HAVE_BCOPY +bcopy(s, d, n); +return d; +#else +size_t i; +unsigned char *dest = (unsigned char *)d; +const unsigned char *src = (const unsigned char *)s; +if (dest > src) + { + dest += n; + src += n; + for (i = 0; i < n; ++i) *(--dest) = *(--src); + return (void *)dest; + } +else + { + for (i = 0; i < n; ++i) *dest++ = *src++; + return (void *)(dest - n); + } +#endif /* not HAVE_BCOPY */ +} +#undef memmove +#define memmove(d,s,n) emulated_memmove(d,s,n) +#endif /* not VPCOMPAT && not HAVE_MEMMOVE */ + + + +/************************************************* +* Convert code point to UTF-8 * +*************************************************/ + +/* A static buffer is used. Returns the number of bytes. 
*/ + +static int +ord2utf8(uint32_t value) +{ +int i, j; +uint8_t *utf8bytes = utf8_buffer; +for (i = 0; i < utf8_table1_size; i++) + if (value <= (uint32_t)utf8_table1[i]) break; +utf8bytes += i; +for (j = i; j > 0; j--) + { + *utf8bytes-- = 0x80 | (value & 0x3f); + value >>= 6; + } +*utf8bytes = utf8_table2[i] | value; +return i + 1; +} + + + +/************************************************* +* Case-independent string compare * +*************************************************/ + +static int +strcmpic(const char *str1, const char *str2) +{ +unsigned int c1, c2; +while (*str1 != '\0' || *str2 != '\0') + { + c1 = tolower(*str1++); + c2 = tolower(*str2++); + if (c1 != c2) return ((c1 > c2) << 1) - 1; + } +return 0; +} + + +/************************************************* +* Parse GREP_COLORS * +*************************************************/ + +/* Extract ms or mt from GREP_COLORS. + +Argument: the string, possibly NULL +Returns: the value of ms or mt, or NULL if neither present +*/ + +static char * +parse_grep_colors(const char *gc) +{ +static char seq[16]; +char *col; +uint32_t len; +if (gc == NULL) return NULL; +col = strstr(gc, "ms="); +if (col == NULL) col = strstr(gc, "mt="); +if (col == NULL) return NULL; +len = 0; +col += 3; +while (*col != ':' && *col != 0 && len < sizeof(seq)-1) + seq[len++] = *col++; +seq[len] = 0; +return seq; +} + /************************************************* * Exit from the program * @@ -431,14 +637,28 @@ Returns: does not return */ static void -pcregrep_exit(int rc) +pcre2grep_exit(int rc) { +/* VMS does exit codes differently: both exit(1) and exit(0) return with a +status of 1, which is not helpful. To help with this problem, define a symbol +(akin to an environment variable) called "PCRE2GREP_RC" and put the exit code +therein. 
*/ + +#ifdef __VMS + char val_buf[4]; + $DESCRIPTOR(sym_nam, "PCRE2GREP_RC"); + $DESCRIPTOR(sym_val, val_buf); + sprintf(val_buf, "%d", rc); + sym_val.dsc$w_length = strlen(val_buf); + lib$set_symbol(&sym_nam, &sym_val); +#endif + if (resource_error) { - fprintf(stderr, "pcregrep: Error %d, %d or %d means that a resource limit " - "was exceeded.\n", PCRE_ERROR_MATCHLIMIT, PCRE_ERROR_RECURSIONLIMIT, - PCRE_ERROR_JIT_STACKLIMIT); - fprintf(stderr, "pcregrep: Check your regex for nested unlimited loops.\n"); + fprintf(stderr, "pcre2grep: Error %d, %d, %d or %d means that a resource " + "limit was exceeded.\n", PCRE2_ERROR_JIT_STACKLIMIT, PCRE2_ERROR_MATCHLIMIT, + PCRE2_ERROR_DEPTHLIMIT, PCRE2_ERROR_HEAPLIMIT); + fprintf(stderr, "pcre2grep: Check your regex for nested unlimited loops.\n"); } exit(rc); } @@ -453,31 +673,32 @@ exit(rc); Arguments: s pattern string to add + patlen length of pattern after if not NULL points to item to insert after Returns: new pattern block or NULL on error */ static patstr * -add_pattern(char *s, patstr *after) +add_pattern(char *s, PCRE2_SIZE patlen, patstr *after) { patstr *p = (patstr *)malloc(sizeof(patstr)); if (p == NULL) { - fprintf(stderr, "pcregrep: malloc failed\n"); - pcregrep_exit(2); + fprintf(stderr, "pcre2grep: malloc failed\n"); + pcre2grep_exit(2); } -if (strlen(s) > MAXPATLEN) +if (patlen > MAXPATLEN) { - fprintf(stderr, "pcregrep: pattern is too long (limit is %d bytes)\n", + fprintf(stderr, "pcre2grep: pattern is too long (limit is %d bytes)\n", MAXPATLEN); free(p); return NULL; } p->next = NULL; p->string = s; +p->length = patlen; p->compiled = NULL; -p->hint = NULL; if (after != NULL) { @@ -505,8 +726,7 @@ while (pc != NULL) { patstr *p = pc; pc = p->next; - if (p->hint != NULL) pcre_free_study(p->hint); - if (p->compiled != NULL) pcre_free(p->compiled); + if (p->compiled != NULL) pcre2_code_free(p->compiled); free(p); } } @@ -537,9 +757,83 @@ while (fn != NULL) * OS-specific functions * 
*************************************************/ -/* These functions are defined so that they can be made system specific. -At present there are versions for Unix-style environments, Windows, native -z/OS, and "no support". */ +/* These definitions are needed in all Windows environments, even those where +Unix-style directory scanning can be used (see below). */ + +#ifdef WIN32 + +#ifndef STRICT +# define STRICT +#endif +#ifndef WIN32_LEAN_AND_MEAN +# define WIN32_LEAN_AND_MEAN +#endif + +#include + +#define iswild(name) (strpbrk(name, "*?") != NULL) + +/* Convert ANSI BGR format to RGB used by Windows */ +#define BGR_RGB(x) ((x & 1 ? 4 : 0) | (x & 2) | (x & 4 ? 1 : 0)) + +static HANDLE hstdout; +static CONSOLE_SCREEN_BUFFER_INFO csbi; +static WORD match_colour; + +static WORD +decode_ANSI_colour(const char *cs) +{ +WORD result = csbi.wAttributes; +while (*cs) + { + if (isdigit(*cs)) + { + int code = atoi(cs); + if (code == 1) result |= 0x08; + else if (code == 4) result |= 0x8000; + else if (code == 5) result |= 0x80; + else if (code >= 30 && code <= 37) result = (result & 0xF8) | BGR_RGB(code - 30); + else if (code == 39) result = (result & 0xF0) | (csbi.wAttributes & 0x0F); + else if (code >= 40 && code <= 47) result = (result & 0x8F) | (BGR_RGB(code - 40) << 4); + else if (code == 49) result = (result & 0x0F) | (csbi.wAttributes & 0xF0); + /* aixterm high intensity colour codes */ + else if (code >= 90 && code <= 97) result = (result & 0xF0) | BGR_RGB(code - 90) | 0x08; + else if (code >= 100 && code <= 107) result = (result & 0x0F) | (BGR_RGB(code - 100) << 4) | 0x80; + + while (isdigit(*cs)) cs++; + } + if (*cs) cs++; + } +return result; +} + + +static void +init_colour_output() +{ +if (do_colour) + { + hstdout = GetStdHandle(STD_OUTPUT_HANDLE); + /* This fails when redirected to con; try again if so. 
*/ + if (!GetConsoleScreenBufferInfo(hstdout, &csbi) && !do_ansi) + { + HANDLE hcon = CreateFile("CONOUT$", GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL); + GetConsoleScreenBufferInfo(hcon, &csbi); + CloseHandle(hcon); + } + match_colour = decode_ANSI_colour(colour_string); + /* No valid colour found - turn off colouring */ + if (!match_colour) do_colour = FALSE; + } +} + +#endif /* WIN32 */ + + +/* The following sets of functions are defined so that they can be made system +specific. At present there are versions for Unix-style environments, Windows, +native z/OS, and "no support". */ /************* Directory scanning Unix-style and z/OS ***********/ @@ -555,7 +849,7 @@ z/OS, and "no support". */ /* However, z/OS needs the #include statements in this header */ #include "pcrzosfs.h" /* That header is not included in the main PCRE distribution because - other apparatus is needed to compile pcregrep for z/OS. The header + other apparatus is needed to compile pcre2grep for z/OS. The header can be found in the special z/OS distribution, which is available from www.zaconsultants.net or from www.cbttape.org. 
*/ +#endif @@ -569,7 +863,7 @@ isdirectory(char *filename) struct stat statbuf; if (stat(filename, &statbuf) < 0) return 0; /* In the expectation that opening as a file will fail */ -return (statbuf.st_mode & S_IFMT) == S_IFDIR; +return S_ISDIR(statbuf.st_mode); } static directory_type * @@ -606,7 +900,7 @@ isregfile(char *filename) struct stat statbuf; if (stat(filename, &statbuf) < 0) return 1; /* In the expectation that opening as a file will fail */ -return (statbuf.st_mode & S_IFMT) == S_IFREG; +return S_ISREG(statbuf.st_mode); } @@ -643,6 +937,18 @@ return isatty(fileno(f)); } #endif + +/************* Print optionally coloured match Unix-style and z/OS **********/ + +static void +print_match(const void *buf, int length) +{ +if (length == 0) return; +if (do_colour) fprintf(stdout, "%c[%sm", 0x1b, colour_string); +FWRITE_IGNORE(buf, 1, length, stdout); +if (do_colour) fprintf(stdout, "%c[0m", 0x1b); +} + /* End of Unix-style or native z/OS environment functions. */ @@ -652,19 +958,9 @@ return isatty(fileno(f)); Lionel Fourquaux. David Burgess added a patch to define INVALID_FILE_ATTRIBUTES when it did not exist. David Byron added a patch that moved the #include of <windows.h> to before the INVALID_FILE_ATTRIBUTES definition rather than after. -The double test below stops gcc 4.4.4 grumbling that HAVE_WINDOWS_H is -undefined when it is indeed undefined. 
*/ - -#elif defined HAVE_WINDOWS_H && HAVE_WINDOWS_H - -#ifndef STRICT -# define STRICT -#endif -#ifndef WIN32_LEAN_AND_MEAN -# define WIN32_LEAN_AND_MEAN -#endif +*/ -#include +#elif defined WIN32 #ifndef INVALID_FILE_ATTRIBUTES #define INVALID_FILE_ATTRIBUTES 0xFFFFFFFF @@ -700,11 +996,14 @@ pattern = (char *)malloc(len + 3); dir = (directory_type *)malloc(sizeof(*dir)); if ((pattern == NULL) || (dir == NULL)) { - fprintf(stderr, "pcregrep: malloc failed\n"); - pcregrep_exit(2); + fprintf(stderr, "pcre2grep: malloc failed\n"); + pcre2grep_exit(2); } memcpy(pattern, filename, len); -memcpy(&(pattern[len]), "\\*", 3); +if (iswild(filename)) + pattern[len] = 0; +else + memcpy(&(pattern[len]), "\\*", 3); dir->handle = FindFirstFile(pattern, &(dir->data)); if (dir->handle != INVALID_HANDLE_VALUE) { @@ -762,18 +1061,36 @@ return !isdirectory(filename); /************* Test for a terminal in Windows **********/ -/* I don't know how to do this; assume never */ - static BOOL is_stdout_tty(void) { -return FALSE; +return _isatty(_fileno(stdout)); } static BOOL is_file_tty(FILE *f) { -return FALSE; +return _isatty(_fileno(f)); +} + + +/************* Print optionally coloured match in Windows **********/ + +static void +print_match(const void *buf, int length) +{ +if (length == 0) return; +if (do_colour) + { + if (do_ansi) fprintf(stdout, "%c[%sm", 0x1b, colour_string); + else SetConsoleTextAttribute(hstdout, match_colour); + } +FWRITE_IGNORE(buf, 1, length, stdout); +if (do_colour) + { + if (do_ansi) fprintf(stdout, "%c[0m", 0x1b); + else SetConsoleTextAttribute(hstdout, csbi.wAttributes); + } } /* End of Windows functions */ @@ -815,6 +1132,16 @@ is_file_tty(FILE *f) return FALSE; } + +/************* Print optionally coloured match when we can't do it **********/ + +static void +print_match(const void *buf, int length) +{ +if (length == 0) return; +FWRITE_IGNORE(buf, 1, length, stdout); +} + #endif /* End of system-specific functions */ @@ -849,13 +1176,13 @@ static int 
usage(int rc) { option_item *op; -fprintf(stderr, "Usage: pcregrep [-"); +fprintf(stderr, "Usage: pcre2grep [-"); for (op = optionlist; op->one_char != 0; op++) { if (op->one_char > 0) fprintf(stderr, "%c", op->one_char); } fprintf(stderr, "] [long options] [pattern] [files]\n"); -fprintf(stderr, "Type `pcregrep --help' for more information and the long " +fprintf(stderr, "Type \"pcre2grep --help\" for more information and the long " "options.\n"); return rc; } @@ -871,43 +1198,44 @@ help(void) { option_item *op; -printf("Usage: pcregrep [OPTION]... [PATTERN] [FILE1 FILE2 ...]\n"); -printf("Search for PATTERN in each FILE or standard input.\n"); -printf("PATTERN must be present if neither -e nor -f is used.\n"); -printf("\"-\" can be used as a file name to mean STDIN.\n"); +printf("Usage: pcre2grep [OPTION]... [PATTERN] [FILE1 FILE2 ...]" STDOUT_NL); +printf("Search for PATTERN in each FILE or standard input." STDOUT_NL); +printf("PATTERN must be present if neither -e nor -f is used." STDOUT_NL); + +#ifdef SUPPORT_PCRE2GREP_CALLOUT +#ifdef SUPPORT_PCRE2GREP_CALLOUT_FORK +printf("All callout scripts in patterns are supported." STDOUT_NL); +#else +printf("Non-fork callout scripts in patterns are supported." STDOUT_NL); +#endif +#else +printf("Callout scripts are not supported in this pcre2grep." STDOUT_NL); +#endif + +printf("\"-\" can be used as a file name to mean STDIN." STDOUT_NL); #ifdef SUPPORT_LIBZ -printf("Files whose names end in .gz are read using zlib.\n"); +printf("Files whose names end in .gz are read using zlib." STDOUT_NL); #endif #ifdef SUPPORT_LIBBZ2 -printf("Files whose names end in .bz2 are read using bzlib2.\n"); +printf("Files whose names end in .bz2 are read using bzlib2." STDOUT_NL); #endif #if defined SUPPORT_LIBZ || defined SUPPORT_LIBBZ2 -printf("Other files and the standard input are read as plain files.\n\n"); +printf("Other files and the standard input are read as plain files." 
STDOUT_NL STDOUT_NL); #else -printf("All files are read as plain files, without any interpretation.\n\n"); +printf("All files are read as plain files, without any interpretation." STDOUT_NL STDOUT_NL); #endif -printf("Example: pcregrep -i 'hello.*world' menu.h main.c\n\n"); -printf("Options:\n"); +printf("Example: pcre2grep -i " QUOT "hello.*world" QUOT " menu.h main.c" STDOUT_NL STDOUT_NL); +printf("Options:" STDOUT_NL); for (op = optionlist; op->one_char != 0; op++) { int n; char s[4]; - /* Two options were accidentally implemented and documented with underscores - instead of hyphens in their names, something that was not noticed for quite a - few releases. When fixing this, I left the underscored versions in the list - in case people were using them. However, we don't want to display them in the - help data. There are no other options that contain underscores, and we do not - expect ever to implement such options. Therefore, just omit any option that - contains an underscore. */ - - if (strchr(op->long_name, '_') != NULL) continue; - if (op->one_char > 0 && (op->long_name)[0] == 0) n = 31 - printf(" -%c", op->one_char); else @@ -918,17 +1246,18 @@ for (op = optionlist; op->one_char != 0; op++) } if (n < 1) n = 1; - printf("%.*s%s\n", n, " ", op->help_text); + printf("%.*s%s" STDOUT_NL, n, " ", op->help_text); } -printf("\nNumbers may be followed by K or M, e.g. --buffer-size=100K.\n"); -printf("The default value for --buffer-size is %d.\n", PCREGREP_BUFSIZE); -printf("When reading patterns or file names from a file, trailing white\n"); -printf("space is removed and blank lines are ignored.\n"); -printf("The maximum size of any pattern is %d bytes.\n", MAXPATLEN); +printf(STDOUT_NL "Numbers may be followed by K or M, e.g. --max-buffer-size=100K." STDOUT_NL); +printf("The default value for --buffer-size is %d." STDOUT_NL, PCRE2GREP_BUFSIZE); +printf("The default value for --max-buffer-size is %d." 
STDOUT_NL, PCRE2GREP_MAX_BUFSIZE); +printf("When reading patterns or file names from a file, trailing white" STDOUT_NL); +printf("space is removed and blank lines are ignored." STDOUT_NL); +printf("The maximum size of any pattern is %d bytes." STDOUT_NL, MAXPATLEN); -printf("\nWith no FILEs, read standard input. If fewer than two FILEs given, assume -h.\n"); -printf("Exit status is 0 if any matches, 1 if no matches, and 2 if trouble.\n"); +printf(STDOUT_NL "With no FILEs, read standard input. If fewer than two FILEs given, assume -h." STDOUT_NL); +printf("Exit status is 0 if any matches, 1 if no matches, and 2 if trouble." STDOUT_NL); } @@ -951,11 +1280,11 @@ Returns: TRUE if the path is not excluded static BOOL test_incexc(char *path, patstr *ip, patstr *ep) { -int plen = strlen(path); +int plen = strlen((const char *)path); for (; ep != NULL; ep = ep->next) { - if (pcre_exec(ep->compiled, NULL, path, plen, 0, 0, NULL, 0) >= 0) + if (pcre2_match(ep->compiled, (PCRE2_SPTR)path, plen, 0, 0, match_data, NULL) >= 0) return FALSE; } @@ -963,7 +1292,7 @@ if (ip == NULL) return TRUE; for (; ip != NULL; ip = ip->next) { - if (pcre_exec(ip->compiled, NULL, path, plen, 0, 0, NULL, 0) >= 0) + if (pcre2_match(ip->compiled, (PCRE2_SPTR)path, plen, 0, 0, match_data, NULL) >= 0) return TRUE; } @@ -1014,13 +1343,13 @@ if (*endptr != 0) /* Error */ char *equals = strchr(op->long_name, '='); int nlen = (equals == NULL)? 
(int)strlen(op->long_name) : (int)(equals - op->long_name); - fprintf(stderr, "pcregrep: Malformed number \"%s\" after --%.*s\n", + fprintf(stderr, "pcre2grep: Malformed number \"%s\" after --%.*s\n", option_data, nlen, op->long_name); } else - fprintf(stderr, "pcregrep: Malformed number \"%s\" after -%c\n", + fprintf(stderr, "pcre2grep: Malformed number \"%s\" after -%c\n", option_data, op->one_char); - pcregrep_exit(usage(2)); + pcre2grep_exit(usage(2)); } return n; @@ -1049,8 +1378,8 @@ omstr *om = (omstr *)malloc(sizeof(omstr)); if (om == NULL) { - fprintf(stderr, "pcregrep: malloc failed\n"); - pcregrep_exit(2); + fprintf(stderr, "pcre2grep: malloc failed\n"); + pcre2grep_exit(2); } om->next = NULL; om->groupnum = n; @@ -1069,12 +1398,14 @@ return om; * Read one line of input * *************************************************/ -/* Normally, input is read using fread() into a large buffer, so many lines may -be read at once. However, doing this for tty input means that no output appears -until a lot of input has been typed. Instead, tty input is handled line by -line. We cannot use fgets() for this, because it does not stop at a binary -zero, and therefore there is no way of telling how many characters it has read, -because there may be binary zeros embedded in the data. +/* Normally, input that is to be scanned is read using fread() (or gzread, or +BZ2_read) into a large buffer, so many lines may be read at once. However, +doing this for tty input means that no output appears until a lot of input has +been typed. Instead, tty input is handled line by line. We cannot use fgets() +for this, because it does not stop at a binary zero, and therefore there is no +way of telling how many characters it has read, because there may be binary +zeros embedded in the data. This function is also used for reading patterns +from files (the -f option). Arguments: buffer the buffer to read into @@ -1084,7 +1415,7 @@ because there may be binary zeros embedded in the data. 
Returns: the number of characters read, zero at end of file */ -static unsigned int +static PCRE2_SIZE read_one_line(char *buffer, int length, FILE *f) { int c; @@ -1121,7 +1452,7 @@ end_of_line(char *p, char *endptr, int *lenptr) switch(endlinetype) { default: /* Just in case */ - case EL_LF: + case PCRE2_NEWLINE_LF: while (p < endptr && *p != '\n') p++; if (p < endptr) { @@ -1131,7 +1462,7 @@ switch(endlinetype) *lenptr = 0; return endptr; - case EL_CR: + case PCRE2_NEWLINE_CR: while (p < endptr && *p != '\r') p++; if (p < endptr) { @@ -1141,7 +1472,17 @@ switch(endlinetype) *lenptr = 0; return endptr; - case EL_CRLF: + case PCRE2_NEWLINE_NUL: + while (p < endptr && *p != '\0') p++; + if (p < endptr) + { + *lenptr = 1; + return p + 1; + } + *lenptr = 0; + return endptr; + + case PCRE2_NEWLINE_CRLF: for (;;) { while (p < endptr && *p != '\r') p++; @@ -1158,13 +1499,13 @@ switch(endlinetype) } break; - case EL_ANYCRLF: + case PCRE2_NEWLINE_ANYCRLF: while (p < endptr) { int extra = 0; - register int c = *((unsigned char *)p); + int c = *((unsigned char *)p); - if (utf8 && c >= 0xc0) + if (utf && c >= 0xc0) { int gcii, gcss; extra = utf8_table4[c & 0x3f]; /* Number of additional bytes */ @@ -1202,13 +1543,13 @@ switch(endlinetype) *lenptr = 0; /* Must have hit the end */ return endptr; - case EL_ANY: + case PCRE2_NEWLINE_ANY: while (p < endptr) { int extra = 0; - register int c = *((unsigned char *)p); + int c = *((unsigned char *)p); - if (utf8 && c >= 0xc0) + if (utf && c >= 0xc0) { int gcii, gcss; extra = utf8_table4[c & 0x3f]; /* Number of additional bytes */ @@ -1242,7 +1583,7 @@ switch(endlinetype) #ifndef EBCDIC case 0x85: /* Unicode NEL */ - *lenptr = utf8? 2 : 1; + *lenptr = utf? 
2 : 1; return p; case 0x2028: /* Unicode LS */ @@ -1282,17 +1623,22 @@ previous_line(char *p, char *startptr) switch(endlinetype) { default: /* Just in case */ - case EL_LF: + case PCRE2_NEWLINE_LF: p--; while (p > startptr && p[-1] != '\n') p--; return p; - case EL_CR: + case PCRE2_NEWLINE_CR: p--; while (p > startptr && p[-1] != '\n') p--; return p; - case EL_CRLF: + case PCRE2_NEWLINE_NUL: + p--; + while (p > startptr && p[-1] != '\0') p--; + return p; + + case PCRE2_NEWLINE_CRLF: for (;;) { p -= 2; @@ -1301,17 +1647,17 @@ switch(endlinetype) } /* Control can never get here */ - case EL_ANY: - case EL_ANYCRLF: + case PCRE2_NEWLINE_ANY: + case PCRE2_NEWLINE_ANYCRLF: if (*(--p) == '\n' && p > startptr && p[-1] == '\r') p--; - if (utf8) while ((*p & 0xc0) == 0x80) p--; + if (utf) while ((*p & 0xc0) == 0x80) p--; while (p > startptr) { - register unsigned int c; + unsigned int c; char *pp = p - 1; - if (utf8) + if (utf) { int extra = 0; while ((*pp & 0xc0) == 0x80) pp--; @@ -1331,7 +1677,7 @@ switch(endlinetype) } else c = *((unsigned char *)pp); - if (endlinetype == EL_ANYCRLF) switch (c) + if (endlinetype == PCRE2_NEWLINE_ANYCRLF) switch (c) { case '\n': /* LF */ case '\r': /* CR */ @@ -1347,7 +1693,7 @@ switch(endlinetype) case '\v': /* VT */ case '\f': /* FF */ case '\r': /* CR */ -#ifndef EBCDIE +#ifndef EBCDIC case 0x85: /* Unicode NEL */ case 0x2028: /* Unicode LS */ case 0x2029: /* Unicode PS */ @@ -1367,6 +1713,42 @@ switch(endlinetype) +/************************************************* +* Output newline at end * +*************************************************/ + +/* This function is called if the final line of a file has been written to +stdout, but it does not have a terminating newline. 
+ +Arguments: none +Returns: nothing +*/ + +static void +write_final_newline(void) +{ +switch(endlinetype) + { + default: /* Just in case */ + case PCRE2_NEWLINE_LF: + case PCRE2_NEWLINE_ANY: + case PCRE2_NEWLINE_ANYCRLF: + fprintf(stdout, "\n"); + break; + + case PCRE2_NEWLINE_CR: + fprintf(stdout, "\r"); + break; + + case PCRE2_NEWLINE_CRLF: + fprintf(stdout, "\r\n"); + break; + + case PCRE2_NEWLINE_NUL: + fprintf(stdout, "%c", 0); + break; + } +} /************************************************* @@ -1388,85 +1770,723 @@ Returns: nothing static void do_after_lines(unsigned long int lastmatchnumber, char *lastmatchrestart, - char *endptr, char *printname) + char *endptr, const char *printname) { if (after_context > 0 && lastmatchnumber > 0) { int count = 0; - while (lastmatchrestart < endptr && count++ < after_context) + int ellength = 0; + while (lastmatchrestart < endptr && count < after_context) { - int ellength; - char *pp = lastmatchrestart; + char *pp = end_of_line(lastmatchrestart, endptr, &ellength); + if (ellength == 0 && pp == main_buffer + bufsize) break; if (printname != NULL) fprintf(stdout, "%s-", printname); if (number) fprintf(stdout, "%lu-", lastmatchnumber++); - pp = end_of_line(pp, endptr, &ellength); - FWRITE(lastmatchrestart, 1, pp - lastmatchrestart, stdout); + FWRITE_IGNORE(lastmatchrestart, 1, pp - lastmatchrestart, stdout); lastmatchrestart = pp; + count++; + } + + /* If we have printed any lines, arrange for a hyphen separator if anything + else follows. Also, if the last line is the final line in the file and it had + no newline, add one. */ + + if (count > 0) + { + hyphenpending = TRUE; + if (ellength == 0 && lastmatchrestart >= endptr) + write_final_newline(); + } + } +} + + + +/************************************************* +* Apply patterns to subject till one matches * +*************************************************/ + +/* This function is called to run through all patterns, looking for a match. 
It +is used multiple times for the same subject when colouring is enabled, in order +to find all possible matches. + +Arguments: + matchptr the start of the subject + length the length of the subject to match + options options for pcre_exec + startoffset where to start matching + mrc address of where to put the result of pcre2_match() + +Returns: TRUE if there was a match + FALSE if there was no match + invert if there was a non-fatal error +*/ + +static BOOL +match_patterns(char *matchptr, PCRE2_SIZE length, unsigned int options, + PCRE2_SIZE startoffset, int *mrc) +{ +int i; +PCRE2_SIZE slen = length; +patstr *p = patterns; +const char *msg = "this text:\n\n"; + +if (slen > 200) + { + slen = 200; + msg = "text that starts:\n\n"; + } + +for (i = 1; p != NULL; p = p->next, i++) + { + *mrc = pcre2_match(p->compiled, (PCRE2_SPTR)matchptr, (int)length, + startoffset, options, match_data, match_context); + if (*mrc >= 0) return TRUE; + if (*mrc == PCRE2_ERROR_NOMATCH) continue; + fprintf(stderr, "pcre2grep: pcre2_match() gave error %d while matching ", *mrc); + if (patterns->next != NULL) fprintf(stderr, "pattern number %d to ", i); + fprintf(stderr, "%s", msg); + FWRITE_IGNORE(matchptr, 1, slen, stderr); /* In case binary zero included */ + fprintf(stderr, "\n\n"); + if (*mrc <= PCRE2_ERROR_UTF8_ERR1 && + *mrc >= PCRE2_ERROR_UTF8_ERR21) + { + unsigned char mbuffer[256]; + PCRE2_SIZE startchar = pcre2_get_startchar(match_data); + (void)pcre2_get_error_message(*mrc, mbuffer, sizeof(mbuffer)); + fprintf(stderr, "%s at offset %" SIZ_FORM "\n\n", mbuffer, + SIZ_CAST startchar); + } + if (*mrc == PCRE2_ERROR_MATCHLIMIT || *mrc == PCRE2_ERROR_DEPTHLIMIT || + *mrc == PCRE2_ERROR_HEAPLIMIT || *mrc == PCRE2_ERROR_JIT_STACKLIMIT) + resource_error = TRUE; + if (error_count++ > 20) + { + fprintf(stderr, "pcre2grep: Too many errors - abandoned.\n"); + pcre2grep_exit(2); + } + return invert; /* No more matching; don't show the line again */ + } + +return FALSE; /* No match, no 
errors */ +} + + + +/************************************************* +* Decode dollar escape sequence * +*************************************************/ + +/* Called from various places to decode $ escapes in output strings. The escape +sequences are as follows: + +$<digits> or ${<digits>} returns a capture number. However, if callout is TRUE, +zero is never returned; '0' is substituted. + +$a returns bell. +$b returns backspace. +$e returns escape. +$f returns form feed. +$n returns newline. +$r returns carriage return. +$t returns tab. +$v returns vertical tab. +$o<digits> returns the character represented by the given octal + number; up to three digits are processed. +$o{<digits>} does the same, up to 7 digits, but gives an error for mode-invalid + code points. +$x<digits> returns the character represented by the given hexadecimal + number; up to two digits are processed. +$x{= '0' && *string <= '9'); + string--; /* Point to last digit */ + + /* In a callout, capture number 0 is not available. No error can be given, + so just return the character '0'. */ + + if (callout && c == 0) + { + *value = '0'; + } + else + { + *value = c; + rc = DDE_CAPTURE; + } + break; + + /* Limit octal numbers to 3 digits without braces, or up to 7 with braces, + for valid Unicode code points. */ + + case 'o': + base = 8; + string++; + if (*string == '{') + { + brace = TRUE; + string++; + dcount = 7; + } + else dcount = 3; + for (; dcount > 0; dcount--) + { + if (*string < '0' || *string > '7') break; + c = c * 8 + (*string++ - '0'); + } + *value = c; + string--; /* Point to last digit */ + break; + + /* Limit hex numbers to 2 digits without braces, or up to 6 with braces, + for valid Unicode code points. 
*/ + + case 'x': + base = 16; + string++; + if (*string == '{') + { + brace = TRUE; + string++; + dcount = 6; + } + else dcount = 2; + for (; dcount > 0; dcount--) + { + if (!isxdigit(*string)) break; + if (*string >= '0' && *string <= '9') + c = c *16 + *string++ - '0'; + else + c = c * 16 + (*string++ | 0x20) - 'a' + 10; + } + *value = c; + string--; /* Point to last digit */ + break; + + case 'a': *value = '\a'; break; + case 'b': *value = '\b'; break; +#ifndef EBCDIC + case 'e': *value = '\033'; break; +#else + case 'e': *value = '\047'; break; +#endif + case 'f': *value = '\f'; break; + case 'n': *value = STDOUT_NL_CODE; break; + case 'r': *value = '\r'; break; + case 't': *value = '\t'; break; + case 'v': *value = '\v'; break; + + default: *value = *string; break; + } + +if (brace) + { + c = string[1]; + if (c != '}') + { + rc = DDE_ERROR; + if (!callout) + { + if ((base == 8 && c >= '0' && c <= '7') || + (base == 16 && isxdigit(c))) + { + fprintf(stderr, "pcre2grep: Error in output text at offset %d: " + "too many %s digits\n", (int)(string - begin), + (base == 8)? "octal" : "hex"); + } + else + { + fprintf(stderr, "pcre2grep: Error in output text at offset %d: %s\n", + (int)(string - begin), "missing closing brace"); + } + } + } + else string++; + } + +/* Check maximum code point values, but take note of STDOUT_NL_CODE. */ + +if (rc == DDE_CHAR && *value != STDOUT_NL_CODE) + { + uint32_t max = utf? 0x0010ffffu : 0xffu; + if (*value > max) + { + if (!callout) + fprintf(stderr, "pcre2grep: Error in output text at offset %d: " + "code point greater than 0x%x is invalid\n", (int)(string - begin), max); + rc = DDE_ERROR; + } + } + +*last = string; +return rc; +} + + + +/************************************************* +* Check output text for errors * +*************************************************/ + +/* Called early, to get errors before doing anything for -O text; also called +from callouts to check before outputting. 
+ +Arguments: + string an --output text string + callout TRUE if in a callout (stops printing errors) + +Returns: TRUE if OK, FALSE on error +*/ + +static BOOL +syntax_check_output_text(PCRE2_SPTR string, BOOL callout) +{ +uint32_t value; +PCRE2_SPTR begin = string; + +for (; *string != 0; string++) + { + if (*string == '$' && + decode_dollar_escape(begin, string, callout, &value, &string) == DDE_ERROR) + return FALSE; + } + +return TRUE; +} + + +/************************************************* +* Display output text * +*************************************************/ + +/* Display the output text, which is assumed to have already been syntax +checked. Output may contain escape sequences started by the dollar sign. + +Arguments: + string: the output text + callout: TRUE for the builtin callout, FALSE for --output + subject the start of the subject + ovector: capture offsets + capture_top: number of captures + +Returns: TRUE if something was output, other than newline + FALSE if nothing was output, or newline was last output +*/ + +static BOOL +display_output_text(PCRE2_SPTR string, BOOL callout, PCRE2_SPTR subject, + PCRE2_SIZE *ovector, PCRE2_SIZE capture_top) +{ +uint32_t value; +BOOL printed = FALSE; +PCRE2_SPTR begin = string; + +for (; *string != 0; string++) + { + if (*string == '$') + { + switch(decode_dollar_escape(begin, string, callout, &value, &string)) + { + case DDE_CHAR: + if (value == STDOUT_NL_CODE) + { + fprintf(stdout, STDOUT_NL); + printed = FALSE; + continue; + } + break; /* Will print value */ + + case DDE_CAPTURE: + if (value < capture_top) + { + PCRE2_SIZE capturesize; + value *= 2; + capturesize = ovector[value + 1] - ovector[value]; + if (capturesize > 0) + { + print_match(subject + ovector[value], capturesize); + printed = TRUE; + } + } + continue; + + default: /* Should not occur */ + break; + } } - hyphenpending = TRUE; + + else value = *string; /* Not a $ escape */ + + if (utf && value <= 127) fprintf(stdout, "%c", *string); else + 
{ + int i; + int n = ord2utf8(value); + for (i = 0; i < n; i++) fputc(utf8_buffer[i], stdout); + } + + printed = TRUE; + } + +return printed; +} + + +#ifdef SUPPORT_PCRE2GREP_CALLOUT + +/************************************************* +* Parse and execute callout scripts * +*************************************************/ + +/* If SUPPORT_PCRE2GREP_CALLOUT_FORK is defined, this function parses a callout +string block and executes the program specified by the string. The string is a +list of substrings separated by pipe characters. The first substring represents +the executable name, and the following substrings specify the arguments: + + program_name|param1|param2|... + +Any substring (including the program name) can contain escape sequences +started by the dollar character. The escape sequences are substituted as +follows: + + $<digits> or ${<digits>} is replaced by the captured substring of the given + decimal number, which must be greater than zero. If the number is greater + than the number of capturing substrings, or if the capture is unset, the + replacement is empty. + + Any other character is substituted by itself. E.g: $$ is replaced by a single + dollar or $| replaced by a pipe character. + +Alternatively, if string starts with pipe, the remainder is taken as an output +string, same as --output. This is the only form that is supported if +SUPPORT_PCRE2GREP_FORK is not defined. In this case, --om-separator is used to +separate each callout, defaulting to newline. 
+ +Example: + + echo -e "abcde\n12345" | pcre2grep \ + '(.)(..(.))(?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' - + + Output: + + Arg1: [a] [bcd] [d] Arg2: |a| () + abcde + Arg1: [1] [234] [4] Arg2: |1| () + 12345 + +Arguments: + blockptr the callout block + +Returns: currently it always returns with 0 +*/ + +static int +pcre2grep_callout(pcre2_callout_block *calloutptr, void *unused) +{ +PCRE2_SIZE length = calloutptr->callout_string_length; +PCRE2_SPTR string = calloutptr->callout_string; +PCRE2_SPTR subject = calloutptr->subject; +PCRE2_SIZE *ovector = calloutptr->offset_vector; +PCRE2_SIZE capture_top = calloutptr->capture_top; + +#ifdef SUPPORT_PCRE2GREP_CALLOUT_FORK +PCRE2_SIZE argsvectorlen = 2; +PCRE2_SIZE argslen = 1; +char *args; +char *argsptr; +char **argsvector; +char **argsvectorptr; +#ifndef WIN32 +pid_t pid; +#endif +int result = 0; +#endif /* SUPPORT_PCRE2GREP_CALLOUT_FORK */ + +(void)unused; /* Avoid compiler warning */ + +/* Only callouts with strings are supported. */ + +if (string == NULL || length == 0) return 0; + +/* If there's no command, output the remainder directly. */ + +if (*string == '|') + { + string++; + if (!syntax_check_output_text(string, TRUE)) return 0; + (void)display_output_text(string, TRUE, subject, ovector, capture_top); + return 0; + } + +#ifndef SUPPORT_PCRE2GREP_CALLOUT_FORK +return 0; +#else + +/* Checking syntax and compute the number of string fragments. Callout strings +are silently ignored in the event of a syntax error. */ + +while (length > 0) + { + if (*string == '|') + { + argsvectorlen++; + if (argsvectorlen > 10000) return 0; /* Too many args */ + } + + else if (*string == '$') + { + uint32_t value; + PCRE2_SPTR begin = string; + + switch (decode_dollar_escape(begin, string, TRUE, &value, &string)) + { + case DDE_CAPTURE: + if (value < capture_top) + { + value *= 2; + argslen += ovector[value + 1] - ovector[value]; + } + argslen--; /* Negate the effect of argslen++ below. 
*/ + break; + + case DDE_CHAR: + if (value == STDOUT_NL_CODE) argslen += STDOUT_NL_LEN - 1; + else if (utf && value > 127) argslen += ord2utf8(value) - 1; + break; + + default: /* Should not occur */ + case DDE_ERROR: + return 0; + } + + length -= (string - begin); + } + + string++; + length--; + argslen++; + } + +/* Get memory for the argument vector and its strings. */ + +args = (char*)malloc(argslen); +if (args == NULL) return 0; + +argsvector = (char**)malloc(argsvectorlen * sizeof(char*)); +if (argsvector == NULL) + { + free(args); + return 0; + } + +/* Now reprocess the string and set up the arguments. */ + +argsptr = args; +argsvectorptr = argsvector; +*argsvectorptr++ = argsptr; + +length = calloutptr->callout_string_length; +string = calloutptr->callout_string; + +while (length > 0) + { + if (*string == '|') + { + *argsptr++ = '\0'; + *argsvectorptr++ = argsptr; + } + + else if (*string == '$') + { + uint32_t value; + PCRE2_SPTR begin = string; + + switch (decode_dollar_escape(begin, string, TRUE, &value, &string)) + { + case DDE_CAPTURE: + if (value < capture_top) + { + PCRE2_SIZE capturesize; + value *= 2; + capturesize = ovector[value + 1] - ovector[value]; + memcpy(argsptr, subject + ovector[value], capturesize); + argsptr += capturesize; + } + break; + + case DDE_CHAR: + if (value == STDOUT_NL_CODE) + { + memcpy(argsptr, STDOUT_NL, STDOUT_NL_LEN); + argsptr += STDOUT_NL_LEN; + } + else if (utf && value > 127) + { + int n = ord2utf8(value); + memcpy(argsptr, utf8_buffer, n); + argsptr += n; + } + else + { + *argsptr++ = value; + } + break; + + default: /* Even though this should not occur, the string having */ + case DDE_ERROR: /* been checked above, we need to include the free() */ + free(args); /* calls so that source checkers do not complain. 
*/ + free(argsvector); + return 0; + } + + length -= (string - begin); + } + + else *argsptr++ = *string; + + /* Advance along the string */ + + string++; + length--; + } + +*argsptr++ = '\0'; +*argsvectorptr = NULL; + +/* Running an external command is system-dependent. Handle Windows and VMS as +necessary, otherwise assume fork(). */ + +#ifdef WIN32 +result = _spawnvp(_P_WAIT, argsvector[0], (const char * const *)argsvector); + +#elif defined __VMS + { + char cmdbuf[500]; + short i = 0; + int flags = CLI$M_NOCLISYM|CLI$M_NOLOGNAM|CLI$M_NOKEYPAD, status, retstat; + $DESCRIPTOR(cmd, cmdbuf); + + cmdbuf[0] = 0; + while (argsvector[i]) + { + strcat(cmdbuf, argsvector[i]); + strcat(cmdbuf, " "); + i++; + } + cmd.dsc$w_length = strlen(cmdbuf) - 1; + status = lib$spawn(&cmd, 0,0, &flags, 0,0, &retstat); + if (!(status & 1)) result = 0; + else result = retstat & 1 ? 0 : 1; } -} +#else /* Neither Windows nor VMS */ +pid = fork(); +if (pid == 0) + { + (void)execv(argsvector[0], argsvector); + /* Control gets here if there is an error, e.g. a non-existent program */ + exit(1); + } +else if (pid > 0) + (void)waitpid(pid, &result, 0); +#endif /* End Windows/VMS/other handling */ +free(args); +free(argsvector); -/************************************************* -* Apply patterns to subject till one matches * -*************************************************/ +/* Currently negative return values are not supported, only zero (match +continues) or non-zero (match fails). */ -/* This function is called to run through all patterns, looking for a match. It -is used multiple times for the same subject when colouring is enabled, in order -to find all possible matches. 
+return result != 0; +#endif /* SUPPORT_PCRE2GREP_CALLOUT_FORK */ +} +#endif /* SUPPORT_PCRE2GREP_CALLOUT */ -Arguments: - matchptr the start of the subject - length the length of the subject to match - options options for pcre_exec - startoffset where to start matching - offsets the offets vector to fill in - mrc address of where to put the result of pcre_exec() -Returns: TRUE if there was a match - FALSE if there was no match - invert if there was a non-fatal error -*/ -static BOOL -match_patterns(char *matchptr, size_t length, unsigned int options, - int startoffset, int *offsets, int *mrc) +/************************************************* +* Read a portion of the file into buffer * +*************************************************/ + +static int +fill_buffer(void *handle, int frtype, char *buffer, int length, + BOOL input_line_buffered) { -int i; -size_t slen = length; -patstr *p = patterns; -const char *msg = "this text:\n\n"; +(void)frtype; /* Avoid warning when not used */ -if (slen > 200) - { - slen = 200; - msg = "text that starts:\n\n"; - } -for (i = 1; p != NULL; p = p->next, i++) - { - *mrc = pcre_exec(p->compiled, p->hint, matchptr, (int)length, - startoffset, options, offsets, OFFSET_SIZE); - if (*mrc >= 0) return TRUE; - if (*mrc == PCRE_ERROR_NOMATCH) continue; - fprintf(stderr, "pcregrep: pcre_exec() gave error %d while matching ", *mrc); - if (patterns->next != NULL) fprintf(stderr, "pattern number %d to ", i); - fprintf(stderr, "%s", msg); - FWRITE(matchptr, 1, slen, stderr); /* In case binary zero included */ - fprintf(stderr, "\n\n"); - if (*mrc == PCRE_ERROR_MATCHLIMIT || *mrc == PCRE_ERROR_RECURSIONLIMIT || - *mrc == PCRE_ERROR_JIT_STACKLIMIT) - resource_error = TRUE; - if (error_count++ > 20) - { - fprintf(stderr, "pcregrep: Too many errors - abandoned.\n"); - pcregrep_exit(2); - } - return invert; /* No more matching; don't show the line again */ - } +#ifdef SUPPORT_LIBZ +if (frtype == FR_LIBZ) + return gzread((gzFile)handle, buffer, 
length); +else +#endif -return FALSE; /* No match, no errors */ +#ifdef SUPPORT_LIBBZ2 +if (frtype == FR_LIBBZ2) + return BZ2_bzread((BZFILE *)handle, buffer, length); +else +#endif + +return (input_line_buffered ? + read_one_line(buffer, length, (FILE *)handle) : + fread(buffer, 1, length, (FILE *)handle)); } @@ -1499,76 +2519,54 @@ Returns: 0 if there was at least one match */ static int -pcregrep(void *handle, int frtype, char *filename, char *printname) +pcre2grep(void *handle, int frtype, const char *filename, const char *printname) { int rc = 1; int filepos = 0; -int offsets[OFFSET_SIZE]; unsigned long int linenumber = 1; unsigned long int lastmatchnumber = 0; unsigned long int count = 0; -char *lastmatchrestart = NULL; +long int count_matched_lines = 0; +char *lastmatchrestart = main_buffer; char *ptr = main_buffer; char *endptr; -size_t bufflength; +PCRE2_SIZE bufflength; BOOL binary = FALSE; BOOL endhyphenpending = FALSE; +BOOL lines_printed = FALSE; BOOL input_line_buffered = line_buffered; FILE *in = NULL; /* Ensure initialized */ -#ifdef SUPPORT_LIBZ -gzFile ingz = NULL; -#endif - -#ifdef SUPPORT_LIBBZ2 -BZFILE *inbz2 = NULL; -#endif - - /* Do the first read into the start of the buffer and set up the pointer to end of what we have. In the case of libz, a non-zipped .gz file will be read as a plain file. However, if a .bz2 file isn't actually bzipped, the first read will fail. 
*/ -(void)frtype; - -#ifdef SUPPORT_LIBZ -if (frtype == FR_LIBZ) +if (frtype != FR_LIBZ && frtype != FR_LIBBZ2) { - ingz = (gzFile)handle; - bufflength = gzread (ingz, main_buffer, bufsize); + in = (FILE *)handle; + if (is_file_tty(in)) input_line_buffered = TRUE; } -else -#endif +else input_line_buffered = FALSE; + +bufflength = fill_buffer(handle, frtype, main_buffer, bufsize, + input_line_buffered); #ifdef SUPPORT_LIBBZ2 -if (frtype == FR_LIBBZ2) - { - inbz2 = (BZFILE *)handle; - bufflength = BZ2_bzread(inbz2, main_buffer, bufsize); - if ((int)bufflength < 0) return 2; /* Gotcha: bufflength is size_t; */ - } /* without the cast it is unsigned. */ -else +if (frtype == FR_LIBBZ2 && (int)bufflength < 0) return 2; /* Gotcha: bufflength is PCRE2_SIZE */ #endif - { - in = (FILE *)handle; - if (is_file_tty(in)) input_line_buffered = TRUE; - bufflength = input_line_buffered? - read_one_line(main_buffer, bufsize, in) : - fread(main_buffer, 1, bufsize, in); - } - endptr = main_buffer + bufflength; /* Unless binary-files=text, see if we have a binary file. This uses the same rule as GNU grep, namely, a search for a binary zero byte near the start of the -file. */ +file. However, when the newline convention is binary zero, we can't do this. */ if (binary_files != BIN_TEXT) { - binary = - memchr(main_buffer, 0, (bufflength > 1024)? 1024 : bufflength) != NULL; + if (endlinetype != PCRE2_NEWLINE_NUL) + binary = memchr(main_buffer, 0, (bufflength > 1024)? 
1024 : bufflength) + != NULL; if (binary && binary_files == BIN_NOMATCH) return 1; } @@ -1581,40 +2579,94 @@ while (ptr < endptr) { int endlinelength; int mrc = 0; - int startoffset = 0; - int prevoffsets[2]; unsigned int options = 0; BOOL match; - char *matchptr = ptr; + BOOL line_matched = FALSE; char *t = ptr; - size_t length, linelength; + PCRE2_SIZE length, linelength; + PCRE2_SIZE startoffset = 0; - prevoffsets[0] = prevoffsets[1] = -1; + /* If the -m option set a limit for the number of matched or non-matched + lines, check it here. A limit of zero means that no matching is ever done. + For stdin from a file, set the file position. */ + + if (count_limit >= 0 && count_matched_lines >= count_limit) + { + if (frtype == FR_PLAIN && filename == stdin_name && !is_file_tty(handle)) + (void)fseek(handle, (long int)filepos, SEEK_SET); + rc = (count_limit == 0)? 1 : 0; + break; + } /* At this point, ptr is at the start of a line. We need to find the length - of the subject string to pass to pcre_exec(). In multiline mode, it is the + of the subject string to pass to pcre2_match(). In multiline mode, it is the length remainder of the data in the buffer. Otherwise, it is the length of the next line, excluding the terminating newline. After matching, we always - advance by the length of the next line. In multiline mode the PCRE_FIRSTLINE + advance by the length of the next line. In multiline mode the PCRE2_FIRSTLINE option is used for compiling, so that any match is constrained to be in the first line. */ t = end_of_line(t, endptr, &endlinelength); linelength = t - ptr - endlinelength; - length = multiline? (size_t)(endptr - ptr) : linelength; + length = multiline? (PCRE2_SIZE)(endptr - ptr) : linelength; /* Check to see if the line we are looking at extends right to the very end of the buffer without a line terminator. This means the line is too long to - handle. */ + handle at the current buffer size. 
Until the buffer reaches its maximum size, + try doubling it and reading more data. */ if (endlinelength == 0 && t == main_buffer + bufsize) { - fprintf(stderr, "pcregrep: line %lu%s%s is too long for the internal buffer\n" - "pcregrep: check the --buffer-size option\n", - linenumber, - (filename == NULL)? "" : " of file ", - (filename == NULL)? "" : filename); - return 2; + if (bufthird < max_bufthird) + { + char *new_buffer; + int new_bufthird = 2*bufthird; + + if (new_bufthird > max_bufthird) new_bufthird = max_bufthird; + new_buffer = (char *)malloc(3*new_bufthird); + + if (new_buffer == NULL) + { + fprintf(stderr, + "pcre2grep: line %lu%s%s is too long for the internal buffer\n" + "pcre2grep: not enough memory to increase the buffer size to %d\n", + linenumber, + (filename == NULL)? "" : " of file ", + (filename == NULL)? "" : filename, + new_bufthird); + return 2; + } + + /* Copy the data and adjust pointers to the new buffer location. */ + + memcpy(new_buffer, main_buffer, bufsize); + bufthird = new_bufthird; + bufsize = 3*bufthird; + ptr = new_buffer + (ptr - main_buffer); + lastmatchrestart = new_buffer + (lastmatchrestart - main_buffer); + free(main_buffer); + main_buffer = new_buffer; + + /* Read more data into the buffer and then try to find the line ending + again. */ + + bufflength += fill_buffer(handle, frtype, main_buffer + bufflength, + bufsize - bufflength, input_line_buffered); + endptr = main_buffer + bufflength; + continue; + } + else + { + fprintf(stderr, + "pcre2grep: line %lu%s%s is too long for the internal buffer\n" + "pcre2grep: the maximum buffer size is %d\n" + "pcre2grep: use the --max-buffer-size option to change it\n", + linenumber, + (filename == NULL)? "" : " of file ", + (filename == NULL)? "" : filename, + bufthird); + return 2; + } } /* Extra processing for Jeffrey Friedl's debugging. 
*/ @@ -1635,7 +2687,7 @@ while (ptr < endptr) ptr = malloc(newlen + 1); if (!ptr) { printf("out of memory"); - pcregrep_exit(2); + pcre2grep_exit(2); } endptr = ptr; strcpy(endptr, jfriedl_prefix); endptr += strlen(jfriedl_prefix); @@ -1653,7 +2705,7 @@ while (ptr < endptr) for (i = 0; i < jfriedl_XR; i++) match = (pcre_exec(patterns->compiled, patterns->hint, ptr, length, 0, - PCRE_NOTEMPTY, offsets, OFFSET_SIZE) >= 0); + PCRE2_NOTEMPTY, offsets, offset_size) >= 0); if (gettimeofday(&end_time, &dummy) != 0) perror("bad gettimeofday"); @@ -1667,8 +2719,8 @@ while (ptr < endptr) } #endif - /* We come back here after a match when show_only_matching is set, in order - to find any further matches in the same line. This applies to + /* We come back here after a match when only_matching_count is non-zero, in + order to find any further matches in the same line. This applies to --only-matching, --file-offsets, and --line-offsets. */ ONLY_MATCHING_RESTART: @@ -1676,13 +2728,16 @@ while (ptr < endptr) /* Run through all the patterns until one matches or there is an error other than NOMATCH. This code is in a subroutine so that it can be re-used for finding subsequent matches when colouring matched lines. After finding one - match, set PCRE_NOTEMPTY to disable any further matches of null strings in + match, set PCRE2_NOTEMPTY to disable any further matches of null strings in this line. */ - match = match_patterns(matchptr, length, options, startoffset, offsets, &mrc); - options = PCRE_NOTEMPTY; + match = match_patterns(ptr, length, options, startoffset, &mrc); + options = PCRE2_NOTEMPTY; - /* If it's a match or a not-match (as required), do what's wanted. */ + /* If it's a match or a not-match (as required), do what's wanted. NOTE: Use + only FWRITE_IGNORE() - which is just a packaged fwrite() that ignores its + return code - to output data lines, so that binary zeroes are treated as just + another data character. 
*/ if (match != invert) { @@ -1692,13 +2747,17 @@ while (ptr < endptr) if (filenames == FN_NOMATCH_ONLY) return 1; - /* If all we want is a yes/no answer, stop now. */ + /* Remember that this line matched (for counting matched lines) */ + + line_matched = TRUE; + + /* If all we want is a yes/no answer, we can return immediately. */ if (quiet) return 0; /* Just count if just counting is wanted. */ - else if (count_only) count++; + else if (count_only || show_total_count) count++; /* When handling a binary file and binary-files==binary, the "binary" variable will be set true (it's false in all other cases). In this @@ -1706,127 +2765,120 @@ while (ptr < endptr) else if (binary) { - fprintf(stdout, "Binary file %s matches\n", filename); + fprintf(stdout, "Binary file %s matches" STDOUT_NL, filename); return 0; } - /* If all we want is a file name, there is no need to scan any more lines - in the file. */ + /* Likewise, if all we want is a file name, there is no need to scan any + more lines in the file. */ else if (filenames == FN_MATCH_ONLY) { - fprintf(stdout, "%s\n", printname); + fprintf(stdout, "%s" STDOUT_NL, printname); return 0; } /* The --only-matching option prints just the substring that matched, and/or one or more captured portions of it, as long as these strings are not empty. The --file-offsets and --line-offsets options output offsets for - the matching substring (all three set show_only_matching). None of these - mutually exclusive options prints any context. Afterwards, adjust the start - and then jump back to look for further matches in the same line. If we are - in invert mode, however, nothing is printed and we do not restart - this - could still be useful because the return code is set. */ + the matching substring (all three set only_matching_count non-zero). None + of these mutually exclusive options prints any context. Afterwards, adjust + the start and then jump back to look for further matches in the same line. 
+ If we are in invert mode, however, nothing is printed and we do not restart + - this could still be useful because the return code is set. */ - else if (show_only_matching) + else if (only_matching_count != 0) { if (!invert) { - int oldstartoffset = startoffset; + PCRE2_SIZE oldstartoffset; - /* It is possible, when a lookbehind assertion contains \K, for the - same string to be found again. The code below advances startoffset, but - until it is past the "bumpalong" offset that gave the match, the same - substring will be returned. The PCRE1 library does not return the - bumpalong offset, so all we can do is ignore repeated strings. (PCRE2 - does this better.) */ + if (printname != NULL) fprintf(stdout, "%s:", printname); + if (number) fprintf(stdout, "%lu:", linenumber); - if (prevoffsets[0] != offsets[0] || prevoffsets[1] != offsets[1]) - { - prevoffsets[0] = offsets[0]; - prevoffsets[1] = offsets[1]; + /* Handle --line-offsets */ + + if (line_offsets) + fprintf(stdout, "%d,%d" STDOUT_NL, (int)(ptr + offsets[0] - ptr), + (int)(offsets[1] - offsets[0])); - if (printname != NULL) fprintf(stdout, "%s:", printname); - if (number) fprintf(stdout, "%lu:", linenumber); + /* Handle --file-offsets */ - /* Handle --line-offsets */ + else if (file_offsets) + fprintf(stdout, "%d,%d" STDOUT_NL, + (int)(filepos + ptr + offsets[0] - ptr), + (int)(offsets[1] - offsets[0])); - if (line_offsets) - fprintf(stdout, "%d,%d\n", (int)(matchptr + offsets[0] - ptr), - offsets[1] - offsets[0]); + /* Handle --output (which has already been syntax checked) */ - /* Handle --file-offsets */ + else if (output_text != NULL) + { + if (display_output_text((PCRE2_SPTR)output_text, FALSE, + (PCRE2_SPTR)ptr, offsets, mrc) || printname != NULL || + number) + fprintf(stdout, STDOUT_NL); + } - else if (file_offsets) - fprintf(stdout, "%d,%d\n", - (int)(filepos + matchptr + offsets[0] - ptr), - offsets[1] - offsets[0]); + /* Handle --only-matching, which may occur many times */ - /* Handle 
--only-matching, which may occur many times */ + else + { + BOOL printed = FALSE; + omstr *om; - else + for (om = only_matching; om != NULL; om = om->next) { - BOOL printed = FALSE; - omstr *om; - - for (om = only_matching; om != NULL; om = om->next) + int n = om->groupnum; + if (n == 0 || n < mrc) { - int n = om->groupnum; - if (n < mrc) + int plen = offsets[2*n + 1] - offsets[2*n]; + if (plen > 0) { - int plen = offsets[2*n + 1] - offsets[2*n]; - if (plen > 0) - { - if (printed) fprintf(stdout, "%s", om_separator); - if (do_colour) fprintf(stdout, "%c[%sm", 0x1b, colour_string); - FWRITE(matchptr + offsets[n*2], 1, plen, stdout); - if (do_colour) fprintf(stdout, "%c[00m", 0x1b); - printed = TRUE; - } + if (printed && om_separator != NULL) + fprintf(stdout, "%s", om_separator); + print_match(ptr + offsets[n*2], plen); + printed = TRUE; } } - - if (printed || printname != NULL || number) fprintf(stdout, "\n"); } + + if (printed || printname != NULL || number) + fprintf(stdout, STDOUT_NL); } - /* Prepare to repeat to find the next match. If the patterned contained - a lookbehind tht included \K, it is possible that the end of the match - might be at or before the actual strting offset we have just used. We - need to start one character further on. Unfortunately, for unanchored - patterns, the actual start offset can be greater that the one that was - set as a result of "bumpalong". PCRE1 does not return the actual start - offset, so we have to check against the original start offset. This may - lead to duplicates - we we need the fudge above to avoid printing them. - (PCRE2 does this better.) */ + /* Prepare to repeat to find the next match in the line. */ match = FALSE; if (line_buffered) fflush(stdout); rc = 0; /* Had some success */ + /* If the pattern contained a lookbehind that included \K, it is + possible that the end of the match might be at or before the actual + starting offset we have just used. In this case, start one character + further on. 
*/ + startoffset = offsets[1]; /* Restart after the match */ + oldstartoffset = pcre2_get_startchar(match_data); if (startoffset <= oldstartoffset) { - if ((size_t)startoffset >= length) - goto END_ONE_MATCH; /* We were at the end */ + if (startoffset >= length) goto END_ONE_MATCH; /* Were at end */ startoffset = oldstartoffset + 1; - if (utf8) - while ((matchptr[startoffset] & 0xc0) == 0x80) startoffset++; + if (utf) while ((ptr[startoffset] & 0xc0) == 0x80) startoffset++; } /* If the current match ended past the end of the line (only possible in multiline mode), we must move on to the line in which it did end before searching for more matches. */ - while (startoffset > (int)linelength) + while (startoffset > linelength) { - matchptr = ptr += linelength + endlinelength; + ptr += linelength + endlinelength; filepos += (int)(linelength + endlinelength); linenumber++; startoffset -= (int)(linelength + endlinelength); t = end_of_line(ptr, endptr, &endlinelength); linelength = t - ptr - endlinelength; - length = (size_t)(endptr - ptr); + length = (PCRE2_SIZE)(endptr - ptr); } goto ONLY_MATCHING_RESTART; @@ -1839,6 +2891,8 @@ while (ptr < endptr) else { + lines_printed = TRUE; + /* See if there is a requirement to print some "after" lines from a previous match. We never print any overlaps. 
*/ @@ -1864,7 +2918,7 @@ while (ptr < endptr) if (printname != NULL) fprintf(stdout, "%s-", printname); if (number) fprintf(stdout, "%lu-", lastmatchnumber++); pp = end_of_line(pp, endptr, &ellength); - FWRITE(lastmatchrestart, 1, pp - lastmatchrestart, stdout); + FWRITE_IGNORE(lastmatchrestart, 1, pp - lastmatchrestart, stdout); lastmatchrestart = pp; } if (lastmatchrestart != ptr) hyphenpending = TRUE; @@ -1874,7 +2928,7 @@ while (ptr < endptr) if (hyphenpending) { - fprintf(stdout, "--\n"); + fprintf(stdout, "--" STDOUT_NL); hyphenpending = FALSE; hyphenprinted = TRUE; } @@ -1887,7 +2941,8 @@ while (ptr < endptr) int linecount = 0; char *p = ptr; - while (p > main_buffer && (lastmatchnumber == 0 || p > lastmatchrestart) && + while (p > main_buffer && + (lastmatchnumber == 0 || p > lastmatchrestart) && linecount < before_context) { linecount++; @@ -1895,7 +2950,7 @@ while (ptr < endptr) } if (lastmatchnumber > 0 && p > lastmatchrestart && !hyphenprinted) - fprintf(stdout, "--\n"); + fprintf(stdout, "--" STDOUT_NL); while (p < ptr) { @@ -1904,7 +2959,7 @@ while (ptr < endptr) if (printname != NULL) fprintf(stdout, "%s-", printname); if (number) fprintf(stdout, "%lu-", linenumber - linecount--); pp = end_of_line(pp, endptr, &ellength); - FWRITE(p, 1, pp - p, stdout); + FWRITE_IGNORE(p, 1, pp - p, stdout); p = pp; } } @@ -1918,27 +2973,6 @@ while (ptr < endptr) if (printname != NULL) fprintf(stdout, "%s:", printname); if (number) fprintf(stdout, "%lu:", linenumber); - /* In multiline mode, we want to print to the end of the line in which - the end of the matched string is found, so we adjust linelength and the - line number appropriately, but only when there actually was a match - (invert not set). Because the PCRE_FIRSTLINE option is set, the start of - the match will always be before the first newline sequence. 
*/ - - if (multiline & !invert) - { - char *endmatch = ptr + offsets[1]; - t = ptr; - while (t <= endmatch) - { - t = end_of_line(t, endptr, &endlinelength); - if (t < endmatch) linenumber++; else break; - } - linelength = t - ptr - endlinelength; - } - - /*** NOTE: Use only fwrite() to output the data line, so that binary - zeroes are treated as just another data character. */ - /* This extra option, for Jeffrey Friedl's debugging requirements, replaces the matched string, or a specific captured string if it exists, with X. When this happens, colouring is ignored. */ @@ -1948,47 +2982,109 @@ while (ptr < endptr) { int first = S_arg * 2; int last = first + 1; - FWRITE(ptr, 1, offsets[first], stdout); + FWRITE_IGNORE(ptr, 1, offsets[first], stdout); fprintf(stdout, "X"); - FWRITE(ptr + offsets[last], 1, linelength - offsets[last], stdout); + FWRITE_IGNORE(ptr + offsets[last], 1, linelength - offsets[last], stdout); } else #endif - /* We have to split the line(s) up if colouring, and search for further - matches, but not of course if the line is a non-match. */ + /* In multiline mode, or if colouring, we have to split the line(s) up + and search for further matches, but not of course if the line is a + non-match. In multiline mode this is necessary in case there is another + match that spans the end of the current line. When colouring we want to + colour all matches. */ - if (do_colour && !invert) + if ((multiline || do_colour) && !invert) { int plength; - FWRITE(ptr, 1, offsets[0], stdout); - fprintf(stdout, "%c[%sm", 0x1b, colour_string); - FWRITE(ptr + offsets[0], 1, offsets[1] - offsets[0], stdout); - fprintf(stdout, "%c[00m", 0x1b); + PCRE2_SIZE endprevious; + + /* The use of \K may make the end offset earlier than the start. In + this situation, swap them round. 
*/ + + if (offsets[0] > offsets[1]) + { + PCRE2_SIZE temp = offsets[0]; + offsets[0] = offsets[1]; + offsets[1] = temp; + } + + FWRITE_IGNORE(ptr, 1, offsets[0], stdout); + print_match(ptr + offsets[0], offsets[1] - offsets[0]); + for (;;) { - startoffset = offsets[1]; - if (startoffset >= (int)linelength + endlinelength || - !match_patterns(matchptr, length, options, startoffset, offsets, - &mrc)) - break; - FWRITE(matchptr + startoffset, 1, offsets[0] - startoffset, stdout); - fprintf(stdout, "%c[%sm", 0x1b, colour_string); - FWRITE(matchptr + offsets[0], 1, offsets[1] - offsets[0], stdout); - fprintf(stdout, "%c[00m", 0x1b); + PCRE2_SIZE oldstartoffset = pcre2_get_startchar(match_data); + + endprevious = offsets[1]; + startoffset = endprevious; /* Advance after previous match. */ + + /* If the pattern contained a lookbehind that included \K, it is + possible that the end of the match might be at or before the actual + starting offset we have just used. In this case, start one character + further on. */ + + if (startoffset <= oldstartoffset) + { + startoffset = oldstartoffset + 1; + if (utf) while ((ptr[startoffset] & 0xc0) == 0x80) startoffset++; + } + + /* If the current match ended past the end of the line (only possible + in multiline mode), we must move on to the line in which it did end + before searching for more matches. Because the PCRE2_FIRSTLINE option + is set, the start of the match will always be before the first + newline sequence. 
*/ + + while (startoffset > linelength + endlinelength) + { + ptr += linelength + endlinelength; + filepos += (int)(linelength + endlinelength); + linenumber++; + startoffset -= (int)(linelength + endlinelength); + endprevious -= (int)(linelength + endlinelength); + t = end_of_line(ptr, endptr, &endlinelength); + linelength = t - ptr - endlinelength; + length = (PCRE2_SIZE)(endptr - ptr); + } + + /* If startoffset is at the exact end of the line it means this + complete line was the final part of the match, so there is nothing + more to do. */ + + if (startoffset == linelength + endlinelength) break; + + /* Otherwise, run a match from within the final line, and if found, + loop for any that may follow. */ + + if (!match_patterns(ptr, length, options, startoffset, &mrc)) break; + + /* The use of \K may make the end offset earlier than the start. In + this situation, swap them round. */ + + if (offsets[0] > offsets[1]) + { + PCRE2_SIZE temp = offsets[0]; + offsets[0] = offsets[1]; + offsets[1] = temp; + } + + FWRITE_IGNORE(ptr + endprevious, 1, offsets[0] - endprevious, stdout); + print_match(ptr + offsets[0], offsets[1] - offsets[0]); } /* In multiline mode, we may have already printed the complete line and its line-ending characters (if they matched the pattern), so there may be no more to print. */ - plength = (int)((linelength + endlinelength) - startoffset); - if (plength > 0) FWRITE(ptr + startoffset, 1, plength, stdout); + plength = (int)((linelength + endlinelength) - endprevious); + if (plength > 0) FWRITE_IGNORE(ptr + endprevious, 1, plength, stdout); } - /* Not colouring; no need to search for further matches */ + /* Not colouring or multiline; no need to search for further matches. */ - else FWRITE(ptr, 1, linelength + endlinelength, stdout); + else FWRITE_IGNORE(ptr, 1, linelength + endlinelength, stdout); } /* End of doing what has to be done for a match. 
If --line-buffered was @@ -2002,6 +3098,12 @@ while (ptr < endptr) lastmatchrestart = ptr + linelength + endlinelength; lastmatchnumber = linenumber + 1; + + /* If a line was printed and we are now at the end of the file and the last + line had no newline, output one. */ + + if (lines_printed && lastmatchrestart >= endptr && endlinelength == 0) + write_final_newline(); } /* For a match in multiline inverted mode (which of course did not cause @@ -2030,10 +3132,15 @@ while (ptr < endptr) filepos += (int)(linelength + endlinelength); linenumber++; + /* If there was at least one match (or a non-match, as required) in the line, + increment the count for the -m option. */ + + if (line_matched) count_matched_lines++; + /* If input is line buffered, and the buffer is not yet full, read another line and add it into the buffer. */ - if (input_line_buffered && bufflength < (size_t)bufsize) + if (input_line_buffered && bufflength < (PCRE2_SIZE)bufsize) { int add = read_one_line(ptr, bufsize - (int)(ptr - main_buffer), in); bufflength += add; @@ -2045,39 +3152,23 @@ while (ptr < endptr) 1/3 and refill it. Before we do this, if some unprinted "after" lines are about to be lost, print them. 
*/ - if (bufflength >= (size_t)bufsize && ptr > main_buffer + 2*bufthird) + if (bufflength >= (PCRE2_SIZE)bufsize && ptr > main_buffer + 2*bufthird) { if (after_context > 0 && lastmatchnumber > 0 && lastmatchrestart < main_buffer + bufthird) { do_after_lines(lastmatchnumber, lastmatchrestart, endptr, printname); - lastmatchnumber = 0; + lastmatchnumber = 0; /* Indicates no after lines pending */ } /* Now do the shuffle */ - memmove(main_buffer, main_buffer + bufthird, 2*bufthird); + (void)memmove(main_buffer, main_buffer + bufthird, 2*bufthird); ptr -= bufthird; -#ifdef SUPPORT_LIBZ - if (frtype == FR_LIBZ) - bufflength = 2*bufthird + - gzread (ingz, main_buffer + 2*bufthird, bufthird); - else -#endif - -#ifdef SUPPORT_LIBBZ2 - if (frtype == FR_LIBBZ2) - bufflength = 2*bufthird + - BZ2_bzread(inbz2, main_buffer + 2*bufthird, bufthird); - else -#endif - - bufflength = 2*bufthird + - (input_line_buffered? - read_one_line(main_buffer + 2*bufthird, bufthird, in) : - fread(main_buffer + 2*bufthird, 1, bufthird, in)); + bufflength = 2*bufthird + fill_buffer(handle, frtype, + main_buffer + 2*bufthird, bufthird, input_line_buffered); endptr = main_buffer + bufflength; /* Adjust any last match point */ @@ -2089,7 +3180,7 @@ while (ptr < endptr) /* End of file; print final "after" lines if wanted; do_after_lines sets hyphenpending if it prints something. */ -if (!show_only_matching && !count_only) +if (only_matching_count == 0 && !(count_only|show_total_count)) { do_after_lines(lastmatchnumber, lastmatchrestart, endptr, printname); hyphenpending |= endhyphenpending; @@ -2100,7 +3191,7 @@ were none. If we found a match, we won't have got this far. 
*/ if (filenames == FN_NOMATCH_ONLY) { - fprintf(stdout, "%s\n", printname); + fprintf(stdout, "%s" STDOUT_NL, printname); return 0; } @@ -2112,10 +3203,12 @@ if (count_only && !quiet) { if (printname != NULL && filenames != FN_NONE) fprintf(stdout, "%s:", printname); - fprintf(stdout, "%lu\n", count); + fprintf(stdout, "%lu" STDOUT_NL, count); + counts_printed++; } } +total_count += count; /* Can be set without count_only */ return rc; } @@ -2171,7 +3264,7 @@ FILE *zos_test_file; if (strcmp(pathname, "-") == 0) { - return pcregrep(stdin, FR_PLAIN, stdin_name, + return pcre2grep(stdin, FR_PLAIN, stdin_name, (filenames > FN_DEFAULT || (filenames == FN_DEFAULT && !only_one_at_top))? stdin_name : NULL); } @@ -2195,7 +3288,7 @@ zos_test_file = fopen(pathname,"rb"); if (zos_test_file == NULL) { - if (!silent) fprintf(stderr, "pcregrep: failed to test next file %s\n", + if (!silent) fprintf(stderr, "pcre2grep: failed to test next file %s\n", pathname, strerror(errno)); return -1; } @@ -2234,14 +3327,14 @@ if (isdirectory(pathname)) if (dee_action == dee_RECURSE) { - char buffer[2048]; + char buffer[FNBUFSIZ]; char *nextfile; directory_type *dir = opendirectory(pathname); if (dir == NULL) { if (!silent) - fprintf(stderr, "pcregrep: Failed to open directory %s: %s\n", pathname, + fprintf(stderr, "pcre2grep: Failed to open directory %s: %s\n", pathname, strerror(errno)); return 2; } @@ -2250,9 +3343,9 @@ if (isdirectory(pathname)) { int frc; int fnlength = strlen(pathname) + strlen(nextfile) + 2; - if (fnlength > 2048) + if (fnlength > FNBUFSIZ) { - fprintf(stderr, "pcregrep: recursive filename is too long\n"); + fprintf(stderr, "pcre2grep: recursive filename is too long\n"); rc = 2; break; } @@ -2267,6 +3360,36 @@ if (isdirectory(pathname)) } } +#ifdef WIN32 +if (iswild(pathname)) + { + char buffer[1024]; + char *nextfile; + char *name; + directory_type *dir = opendirectory(pathname); + + if (dir == NULL) + return 0; + + for (nextfile = name = pathname; *nextfile != 0; 
nextfile++) + if (*nextfile == '/' || *nextfile == '\\') + name = nextfile + 1; + *name = 0; + + while ((nextfile = readdirectory(dir)) != NULL) + { + int frc; + sprintf(buffer, "%.512s%.128s", pathname, nextfile); + frc = grep_or_recurse(buffer, dir_recurse, FALSE); + if (frc > 1) rc = frc; + else if (frc == 0 && rc == 1) rc = 0; + } + + closedirectory(dir); + return rc; + } +#endif + #if defined NATIVE_ZOS } #endif @@ -2303,7 +3426,7 @@ if (pathlen > 3 && strcmp(pathname + pathlen - 3, ".gz") == 0) if (ingz == NULL) { if (!silent) - fprintf(stderr, "pcregrep: Failed to open %s: %s\n", pathname, + fprintf(stderr, "pcre2grep: Failed to open %s: %s\n", pathname, strerror(errno)); return 2; } @@ -2342,14 +3465,14 @@ an attempt to read a .bz2 file indicates that it really is a plain file. */ if (handle == NULL) { if (!silent) - fprintf(stderr, "pcregrep: Failed to open %s: %s\n", pathname, + fprintf(stderr, "pcre2grep: Failed to open %s: %s\n", pathname, strerror(errno)); return 2; } /* Now grep the file */ -rc = pcregrep(handle, frtype, pathname, (filenames > FN_DEFAULT || +rc = pcre2grep(handle, frtype, pathname, (filenames > FN_DEFAULT || (filenames == FN_DEFAULT && !only_one_at_top))? pathname : NULL); /* Close in an appropriate manner. */ @@ -2377,7 +3500,7 @@ if (frtype == FR_LIBBZ2) goto PLAIN_FILE; } else if (!silent) - fprintf(stderr, "pcregrep: Failed to read %s using bzlib: %s\n", + fprintf(stderr, "pcre2grep: Failed to read %s using bzlib: %s\n", pathname, err); rc = 2; /* The normal "something went wrong" code */ } @@ -2390,7 +3513,7 @@ else fclose(in); -/* Pass back the yield from pcregrep(). */ +/* Pass back the yield from pcre2grep(). 
*/ return rc; } @@ -2407,20 +3530,20 @@ handle_option(int letter, int options) switch(letter) { case N_FOFFSETS: file_offsets = TRUE; break; - case N_HELP: help(); pcregrep_exit(0); + case N_HELP: help(); pcre2grep_exit(0); break; /* Stops compiler warning */ case N_LBUFFER: line_buffered = TRUE; break; case N_LOFFSETS: line_offsets = number = TRUE; break; - case N_NOJIT: study_options &= ~PCRE_STUDY_JIT_COMPILE; break; + case N_NOJIT: use_jit = FALSE; break; case 'a': binary_files = BIN_TEXT; break; case 'c': count_only = TRUE; break; - case 'F': process_options |= PO_FIXED_STRINGS; break; + case 'F': options |= PCRE2_LITERAL; break; case 'H': filenames = FN_FORCE; break; case 'I': binary_files = BIN_NOMATCH; break; case 'h': filenames = FN_NONE; break; - case 'i': options |= PCRE_CASELESS; break; + case 'i': options |= PCRE2_CASELESS; break; case 'l': omit_zero_count = TRUE; filenames = FN_MATCH_ONLY; break; case 'L': filenames = FN_NOMATCH_ONLY; break; - case 'M': multiline = TRUE; options |= PCRE_MULTILINE|PCRE_FIRSTLINE; break; + case 'M': multiline = TRUE; options |= PCRE2_MULTILINE|PCRE2_FIRSTLINE; break; case 'n': number = TRUE; break; case 'o': @@ -2431,19 +3554,25 @@ switch(letter) case 'q': quiet = TRUE; break; case 'r': dee_action = dee_RECURSE; break; case 's': silent = TRUE; break; - case 'u': options |= PCRE_UTF8; utf8 = TRUE; break; + case 't': show_total_count = TRUE; break; + case 'u': options |= PCRE2_UTF; utf = TRUE; break; + case 'U': options |= PCRE2_UTF|PCRE2_MATCH_INVALID_UTF; utf = TRUE; break; case 'v': invert = TRUE; break; - case 'w': process_options |= PO_WORD_MATCH; break; - case 'x': process_options |= PO_LINE_MATCH; break; + case 'w': extra_options |= PCRE2_EXTRA_MATCH_WORD; break; + case 'x': extra_options |= PCRE2_EXTRA_MATCH_LINE; break; case 'V': - fprintf(stdout, "pcregrep version %s\n", pcre_version()); - pcregrep_exit(0); + { + unsigned char buffer[128]; + (void)pcre2_config(PCRE2_CONFIG_VERSION, buffer); + fprintf(stdout, 
"pcre2grep version %s" STDOUT_NL, buffer); + } + pcre2grep_exit(0); break; default: - fprintf(stderr, "pcregrep: Unknown option -%c\n", letter); - pcregrep_exit(usage(2)); + fprintf(stderr, "pcre2grep: Unknown option -%c\n", letter); + pcre2grep_exit(usage(2)); } return options; @@ -2451,7 +3580,6 @@ return options; - /************************************************* * Construct printed ordinal * *************************************************/ @@ -2465,6 +3593,8 @@ static char buffer[14]; char *p = buffer; sprintf(p, "%d", n); while (*p != 0) p++; +n %= 100; +if (n >= 11 && n <= 13) n = 0; switch (n%10) { case 1: strcpy(p, "st"); break; @@ -2492,7 +3622,6 @@ pattern chain. Arguments: p points to the pattern block options the PCRE options - popts the processing options fromfile TRUE if the pattern was read from a file fromtext file name or identifying text (e.g. "include") count 0 if this is the only command line pattern, or @@ -2503,18 +3632,19 @@ Returns: TRUE on success, FALSE after an error */ static BOOL -compile_pattern(patstr *p, int options, int popts, int fromfile, - const char *fromtext, int count) +compile_pattern(patstr *p, int options, int fromfile, const char *fromtext, + int count) { -char buffer[PATBUFSIZE]; -const char *error; -char *ps = p->string; -int patlen = strlen(ps); -int errptr; +char *ps; +int errcode; +PCRE2_SIZE patlen, erroffset; +PCRE2_UCHAR errmessbuffer[ERRBUFSIZ]; if (p->compiled != NULL) return TRUE; +ps = p->string; +patlen = p->length; -if ((popts & PO_FIXED_STRINGS) != 0) +if ((options & PCRE2_LITERAL) != 0) { int ellength; char *eop = ps + patlen; @@ -2522,40 +3652,44 @@ if ((popts & PO_FIXED_STRINGS) != 0) if (ellength != 0) { - if (add_pattern(pe, p) == NULL) return FALSE; - patlen = (int)(pe - ps - ellength); + patlen = pe - ps - ellength; + if (add_pattern(pe, p->length-patlen-ellength, p) == NULL) return FALSE; } } -if (snprintf(buffer, PATBUFSIZE, "%s%.*s%s", prefix[popts], patlen, ps, - suffix[popts]) > PATBUFSIZE) 
+p->compiled = pcre2_compile((PCRE2_SPTR)ps, patlen, options, &errcode, + &erroffset, compile_context); + +/* Handle successful compile. Try JIT-compiling if supported and enabled. We +ignore any JIT compiler errors, relying falling back to interpreting if +anything goes wrong with JIT. */ + +if (p->compiled != NULL) { - fprintf(stderr, "pcregrep: Buffer overflow while compiling \"%s\"\n", - ps); - return FALSE; +#ifdef SUPPORT_PCRE2GREP_JIT + if (use_jit) (void)pcre2_jit_compile(p->compiled, PCRE2_JIT_COMPLETE); +#endif + return TRUE; } -p->compiled = pcre_compile(buffer, options, &error, &errptr, pcretables); -if (p->compiled != NULL) return TRUE; - /* Handle compile errors */ -errptr -= (int)strlen(prefix[popts]); -if (errptr > patlen) errptr = patlen; +if (erroffset > patlen) erroffset = patlen; +pcre2_get_error_message(errcode, errmessbuffer, sizeof(errmessbuffer)); if (fromfile) { - fprintf(stderr, "pcregrep: Error in regex in line %d of %s " - "at offset %d: %s\n", count, fromtext, errptr, error); + fprintf(stderr, "pcre2grep: Error in regex in line %d of %s " + "at offset %d: %s\n", count, fromtext, (int)erroffset, errmessbuffer); } else { if (count == 0) - fprintf(stderr, "pcregrep: Error in %s regex at offset %d: %s\n", - fromtext, errptr, error); + fprintf(stderr, "pcre2grep: Error in %s regex at offset %d: %s\n", + fromtext, (int)erroffset, errmessbuffer); else - fprintf(stderr, "pcregrep: Error in %s %s regex at offset %d: %s\n", - ordin(count), fromtext, errptr, error); + fprintf(stderr, "pcre2grep: Error in %s %s regex at offset %d: %s\n", + ordin(count), fromtext, (int)erroffset, errmessbuffer); } return FALSE; @@ -2573,18 +3707,18 @@ return FALSE; name the name of the file; "-" is stdin patptr pointer to the pattern chain anchor patlastptr pointer to the last pattern pointer - popts the process options to pass to pattern_compile() Returns: TRUE if all went well */ static BOOL -read_pattern_file(char *name, patstr **patptr, patstr **patlastptr, int 
popts) +read_pattern_file(char *name, patstr **patptr, patstr **patlastptr) { int linenumber = 0; +PCRE2_SIZE patlen; FILE *f; -char *filename; -char buffer[PATBUFSIZE]; +const char *filename; +char buffer[MAXPATLEN+20]; if (strcmp(name, "-") == 0) { @@ -2596,26 +3730,24 @@ else f = fopen(name, "r"); if (f == NULL) { - fprintf(stderr, "pcregrep: Failed to open %s: %s\n", name, strerror(errno)); + fprintf(stderr, "pcre2grep: Failed to open %s: %s\n", name, strerror(errno)); return FALSE; } filename = name; } -while (fgets(buffer, PATBUFSIZE, f) != NULL) +while ((patlen = read_one_line(buffer, sizeof(buffer), f)) > 0) { - char *s = buffer + (int)strlen(buffer); - while (s > buffer && isspace((unsigned char)(s[-1]))) s--; - *s = 0; + while (patlen > 0 && isspace((unsigned char)(buffer[patlen-1]))) patlen--; linenumber++; - if (buffer[0] == 0) continue; /* Skip blank lines */ + if (patlen == 0) continue; /* Skip blank lines */ /* Note: this call to add_pattern() puts a pointer to the local variable "buffer" into the pattern chain. However, that pointer is used only when compiling the pattern, which happens immediately below, so we flatten it afterwards, as a precaution against any later code trying to use it. */ - *patlastptr = add_pattern(buffer, *patlastptr); + *patlastptr = add_pattern(buffer, patlen, *patlastptr); if (*patlastptr == NULL) { if (f != stdin) fclose(f); @@ -2625,12 +3757,13 @@ while (fgets(buffer, PATBUFSIZE, f) != NULL) /* This loop is needed because compiling a "pattern" when -F is set may add on additional literal patterns if the original contains a newline. In the - common case, it never will, because fgets() stops at a newline. However, - the -N option can be used to give pcregrep a different newline setting. */ + common case, it never will, because read_one_line() stops at a newline. + However, the -N option can be used to give pcre2grep a different newline + setting. 
*/ for(;;) { - if (!compile_pattern(*patlastptr, pcre_options, popts, TRUE, filename, + if (!compile_pattern(*patlastptr, pcre2_options, TRUE, filename, linenumber)) { if (f != stdin) fclose(f); @@ -2662,28 +3795,22 @@ int rc = 1; BOOL only_one_at_top; patstr *cp; fnstr *fn; +omstr *om; const char *locale_from = "--locale"; -const char *error; -#ifdef SUPPORT_PCREGREP_JIT -pcre_jit_stack *jit_stack = NULL; +#ifdef SUPPORT_PCRE2GREP_JIT +pcre2_jit_stack *jit_stack = NULL; #endif -/* Set the default line ending value from the default in the PCRE library; -"lf", "cr", "crlf", and "any" are supported. Anything else is treated as "lf". -Note that the return values from pcre_config(), though derived from the ASCII -codes, are the same in EBCDIC environments, so we must use the actual values -rather than escapes such as as '\r'. */ +/* In Windows, stdout is set up as a text stream, which means that \n is +converted to \r\n. This causes output lines that are copied from the input to +change from ....\r\n to ....\r\r\n, which is not right. We therefore ensure +that stdout is a binary stream. Note that this means all other output to stdout +must use STDOUT_NL to terminate lines. 
*/ -(void)pcre_config(PCRE_CONFIG_NEWLINE, &i); -switch(i) - { - default: newline = (char *)"lf"; break; - case 13: newline = (char *)"cr"; break; - case (13 << 8) | 10: newline = (char *)"crlf"; break; - case -1: newline = (char *)"any"; break; - case -2: newline = (char *)"anycrlf"; break; - } +#ifdef WIN32 +_setmode(_fileno(stdout), _O_BINARY); +#endif /* Process the options */ @@ -2702,7 +3829,7 @@ for (i = 1; i < argc; i++) if (argv[i][1] == 0) { if (pattern_files != NULL || patterns != NULL) break; - else pcregrep_exit(usage(2)); + else pcre2grep_exit(usage(2)); } /* Handle a long name option, or -- to terminate the options */ @@ -2764,20 +3891,22 @@ for (i = 1; i < argc; i++) { char buff1[24]; char buff2[24]; + int ret; int baselen = (int)(opbra - op->long_name); int fulllen = (int)(strchr(op->long_name, ')') - op->long_name + 1); int arglen = (argequals == NULL || equals == NULL)? (int)strlen(arg) : (int)(argequals - arg); - if (snprintf(buff1, sizeof(buff1), "%.*s", baselen, op->long_name) > - (int)sizeof(buff1) || - snprintf(buff2, sizeof(buff2), "%s%.*s", buff1, - fulllen - baselen - 2, opbra + 1) > (int)sizeof(buff2)) + if ((ret = snprintf(buff1, sizeof(buff1), "%.*s", baselen, op->long_name), + ret < 0 || ret > (int)sizeof(buff1)) || + (ret = snprintf(buff2, sizeof(buff2), "%s%.*s", buff1, + fulllen - baselen - 2, opbra + 1), + ret < 0 || ret > (int)sizeof(buff2))) { - fprintf(stderr, "pcregrep: Buffer overflow when parsing %s option\n", + fprintf(stderr, "pcre2grep: Buffer overflow when parsing %s option\n", op->long_name); - pcregrep_exit(2); + pcre2grep_exit(2); } if (strncmp(arg, buff1, arglen) == 0 || @@ -2799,8 +3928,8 @@ for (i = 1; i < argc; i++) if (op->one_char == 0) { - fprintf(stderr, "pcregrep: Unknown option %s\n", argv[i]); - pcregrep_exit(usage(2)); + fprintf(stderr, "pcre2grep: Unknown option %s\n", argv[i]); + pcre2grep_exit(usage(2)); } } @@ -2843,9 +3972,9 @@ for (i = 1; i < argc; i++) } if (op->one_char == 0) { - fprintf(stderr, 
"pcregrep: Unknown option letter '%c' in \"%s\"\n", + fprintf(stderr, "pcre2grep: Unknown option letter '%c' in \"%s\"\n", *s, argv[i]); - pcregrep_exit(usage(2)); + pcre2grep_exit(usage(2)); } option_data = s+1; @@ -2872,7 +4001,7 @@ for (i = 1; i < argc; i++) /* Handle a single-character option with no data, then loop for the next character in the string. */ - pcre_options = handle_option(*s++, pcre_options); + pcre2_options = handle_option(*s++, pcre2_options); } } @@ -2882,7 +4011,7 @@ for (i = 1; i < argc; i++) if (op->type == OP_NODATA) { - pcre_options = handle_option(op->one_char, pcre_options); + pcre2_options = handle_option(op->one_char, pcre2_options); continue; } @@ -2898,7 +4027,7 @@ for (i = 1; i < argc; i++) switch (op->one_char) { case N_COLOUR: - colour_option = (char *)"auto"; + colour_option = "auto"; break; case 'o': @@ -2921,8 +4050,8 @@ for (i = 1; i < argc; i++) { if (i >= argc - 1 || longopwasequals) { - fprintf(stderr, "pcregrep: Data missing after %s\n", argv[i]); - pcregrep_exit(usage(2)); + fprintf(stderr, "pcre2grep: Data missing after %s\n", argv[i]); + pcre2grep_exit(usage(2)); } option_data = argv[++i]; } @@ -2945,7 +4074,8 @@ for (i = 1; i < argc; i++) else if (op->type == OP_PATLIST) { patdatastr *pd = (patdatastr *)op->dataptr; - *(pd->lastptr) = add_pattern(option_data, *(pd->lastptr)); + *(pd->lastptr) = add_pattern(option_data, (PCRE2_SIZE)strlen(option_data), + *(pd->lastptr)); if (*(pd->lastptr) == NULL) goto EXIT2; if (*(pd->anchor) == NULL) *(pd->anchor) = *(pd->lastptr); } @@ -2959,7 +4089,7 @@ for (i = 1; i < argc; i++) fn = (fnstr *)malloc(sizeof(fnstr)); if (fn == NULL) { - fprintf(stderr, "pcregrep: malloc failed\n"); + fprintf(stderr, "pcre2grep: malloc failed\n"); goto EXIT2; } fn->next = NULL; @@ -2983,23 +4113,24 @@ for (i = 1; i < argc; i++) binary_files = BIN_TEXT; else { - fprintf(stderr, "pcregrep: unknown value \"%s\" for binary-files\n", + fprintf(stderr, "pcre2grep: unknown value \"%s\" for binary-files\n", 
option_data); - pcregrep_exit(usage(2)); + pcre2grep_exit(usage(2)); } } /* Otherwise, deal with a single string or numeric data value. */ - else if (op->type != OP_NUMBER && op->type != OP_LONGNUMBER && - op->type != OP_OP_NUMBER) + else if (op->type != OP_NUMBER && op->type != OP_U32NUMBER && + op->type != OP_OP_NUMBER && op->type != OP_SIZE) { *((char **)op->dataptr) = option_data; } else { unsigned long int n = decode_number(option_data, op, longop); - if (op->type == OP_LONGNUMBER) *((unsigned long int *)op->dataptr) = n; + if (op->type == OP_U32NUMBER) *((uint32_t *)op->dataptr) = n; + else if (op->type == OP_SIZE) *((PCRE2_SIZE *)op->dataptr) = n; else *((int *)op->dataptr) = n; } } @@ -3013,20 +4144,60 @@ if (both_context > 0) if (before_context == 0) before_context = both_context; } -/* Only one of --only-matching, --file-offsets, or --line-offsets is permitted. -However, all three set show_only_matching because they display, each in their -own way, only the data that has matched. */ +/* Only one of --only-matching, --output, --file-offsets, or --line-offsets is +permitted. They display, each in their own way, only the data that has matched. +*/ + +only_matching_count = (only_matching != NULL) + (output_text != NULL) + + file_offsets + line_offsets; + +if (only_matching_count > 1) + { + fprintf(stderr, "pcre2grep: Cannot mix --only-matching, --output, " + "--file-offsets and/or --line-offsets\n"); + pcre2grep_exit(usage(2)); + } + + +/* Check that there is a big enough ovector for all -o settings. 
*/ -if ((only_matching != NULL && (file_offsets || line_offsets)) || - (file_offsets && line_offsets)) +for (om = only_matching; om != NULL; om = om->next) { - fprintf(stderr, "pcregrep: Cannot mix --only-matching, --file-offsets " - "and/or --line-offsets\n"); - pcregrep_exit(usage(2)); + int n = om->groupnum; + if (n > (int)capture_max) + { + fprintf(stderr, "pcre2grep: Requested group %d cannot be captured.\n", n); + fprintf(stderr, "pcre2grep: Use --om-capture to increase the size of the capture vector.\n"); + goto EXIT2; + } } -if (only_matching != NULL || file_offsets || line_offsets) - show_only_matching = TRUE; +/* Check the text supplied to --output for errors. */ + +if (output_text != NULL && + !syntax_check_output_text((PCRE2_SPTR)output_text, FALSE)) + goto EXIT2; + +/* Set up default compile and match contexts and a match data block. */ + +offset_size = capture_max + 1; +compile_context = pcre2_compile_context_create(NULL); +match_context = pcre2_match_context_create(NULL); +match_data = pcre2_match_data_create(offset_size, NULL); +offsets = pcre2_get_ovector_pointer(match_data); + +/* If string (script) callouts are supported, set up the callout processing +function. */ + +#ifdef SUPPORT_PCRE2GREP_CALLOUT +pcre2_set_callout(match_context, pcre2grep_callout, NULL); +#endif + +/* Put limits into the match data block. */ + +if (heap_limit != PCRE2_UNSET) pcre2_set_heap_limit(match_context, heap_limit); +if (match_limit > 0) pcre2_set_match_limit(match_context, match_limit); +if (depth_limit > 0) pcre2_set_depth_limit(match_context, depth_limit); /* If a locale has not been provided as an option, see if the LC_CTYPE or LC_ALL environment variable is set, and if so, use it. */ @@ -3043,71 +4214,79 @@ if (locale == NULL) locale_from = "LC_CTYPE"; } -/* If a locale is set, use it to generate the tables the PCRE needs. Otherwise, -pcretables==NULL, which causes the use of default tables. */ +/* If a locale is set, use it to generate the tables the PCRE needs. 
Passing +NULL to pcre2_maketables() means that malloc() is used to get the memory. */ if (locale != NULL) { if (setlocale(LC_CTYPE, locale) == NULL) { - fprintf(stderr, "pcregrep: Failed to set locale %s (obtained from %s)\n", + fprintf(stderr, "pcre2grep: Failed to set locale %s (obtained from %s)\n", locale, locale_from); goto EXIT2; } - pcretables = pcre_maketables(); + character_tables = pcre2_maketables(NULL); + pcre2_set_character_tables(compile_context, character_tables); } /* Sort out colouring */ if (colour_option != NULL && strcmp(colour_option, "never") != 0) { - if (strcmp(colour_option, "always") == 0) do_colour = TRUE; + if (strcmp(colour_option, "always") == 0) +#ifdef WIN32 + do_ansi = !is_stdout_tty(), +#endif + do_colour = TRUE; else if (strcmp(colour_option, "auto") == 0) do_colour = is_stdout_tty(); else { - fprintf(stderr, "pcregrep: Unknown colour setting \"%s\"\n", + fprintf(stderr, "pcre2grep: Unknown colour setting \"%s\"\n", colour_option); goto EXIT2; } if (do_colour) { - char *cs = getenv("PCREGREP_COLOUR"); + char *cs = getenv("PCRE2GREP_COLOUR"); + if (cs == NULL) cs = getenv("PCRE2GREP_COLOR"); + if (cs == NULL) cs = getenv("PCREGREP_COLOUR"); if (cs == NULL) cs = getenv("PCREGREP_COLOR"); - if (cs != NULL) colour_string = cs; + if (cs == NULL) cs = parse_grep_colors(getenv("GREP_COLORS")); + if (cs == NULL) cs = getenv("GREP_COLOR"); + if (cs != NULL) + { + if (strspn(cs, ";0123456789") == strlen(cs)) colour_string = cs; + } +#ifdef WIN32 + init_colour_output(); +#endif } } -/* Interpret the newline type; the default settings are Unix-like. */ +/* Sort out a newline setting. 
*/ -if (strcmp(newline, "cr") == 0 || strcmp(newline, "CR") == 0) - { - pcre_options |= PCRE_NEWLINE_CR; - endlinetype = EL_CR; - } -else if (strcmp(newline, "lf") == 0 || strcmp(newline, "LF") == 0) - { - pcre_options |= PCRE_NEWLINE_LF; - endlinetype = EL_LF; - } -else if (strcmp(newline, "crlf") == 0 || strcmp(newline, "CRLF") == 0) - { - pcre_options |= PCRE_NEWLINE_CRLF; - endlinetype = EL_CRLF; - } -else if (strcmp(newline, "any") == 0 || strcmp(newline, "ANY") == 0) +if (newline_arg != NULL) { - pcre_options |= PCRE_NEWLINE_ANY; - endlinetype = EL_ANY; - } -else if (strcmp(newline, "anycrlf") == 0 || strcmp(newline, "ANYCRLF") == 0) - { - pcre_options |= PCRE_NEWLINE_ANYCRLF; - endlinetype = EL_ANYCRLF; + for (endlinetype = 1; endlinetype < (int)(sizeof(newlines)/sizeof(char *)); + endlinetype++) + { + if (strcmpic(newline_arg, newlines[endlinetype]) == 0) break; + } + if (endlinetype < (int)(sizeof(newlines)/sizeof(char *))) + pcre2_set_newline(compile_context, endlinetype); + else + { + fprintf(stderr, "pcre2grep: Invalid newline specifier \"%s\"\n", + newline_arg); + goto EXIT2; + } } + +/* Find default newline convention */ + else { - fprintf(stderr, "pcregrep: Invalid newline specifier \"%s\"\n", newline); - goto EXIT2; + (void)pcre2_config(PCRE2_CONFIG_NEWLINE, &endlinetype); } /* Interpret the text values for -d and -D */ @@ -3119,7 +4298,7 @@ if (dee_option != NULL) else if (strcmp(dee_option, "skip") == 0) dee_action = dee_SKIP; else { - fprintf(stderr, "pcregrep: Invalid value \"%s\" for -d\n", dee_option); + fprintf(stderr, "pcre2grep: Invalid value \"%s\" for -d\n", dee_option); goto EXIT2; } } @@ -3130,17 +4309,21 @@ if (DEE_option != NULL) else if (strcmp(DEE_option, "skip") == 0) DEE_action = DEE_SKIP; else { - fprintf(stderr, "pcregrep: Invalid value \"%s\" for -D\n", DEE_option); + fprintf(stderr, "pcre2grep: Invalid value \"%s\" for -D\n", DEE_option); goto EXIT2; } } +/* Set the extra options */ + 
+(void)pcre2_set_compile_extra_options(compile_context, extra_options); + /* Check the values for Jeffrey Friedl's debugging options. */ #ifdef JFRIEDL_DEBUG if (S_arg > 9) { - fprintf(stderr, "pcregrep: bad value for -S option\n"); + fprintf(stderr, "pcre2grep: bad value for -S option\n"); return 2; } if (jfriedl_XT != 0 || jfriedl_XR != 0) @@ -3150,14 +4333,30 @@ if (jfriedl_XT != 0 || jfriedl_XR != 0) } #endif +/* If use_jit is set, check whether JIT is available. If not, do not try +to use JIT. */ + +if (use_jit) + { + uint32_t answer; + (void)pcre2_config(PCRE2_CONFIG_JIT, &answer); + if (!answer) use_jit = FALSE; + } + /* Get memory for the main buffer. */ +if (bufthird <= 0) + { + fprintf(stderr, "pcre2grep: --buffer-size must be greater than zero\n"); + goto EXIT2; + } + bufsize = 3*bufthird; main_buffer = (char *)malloc(bufsize); if (main_buffer == NULL) { - fprintf(stderr, "pcregrep: malloc failed\n"); + fprintf(stderr, "pcre2grep: malloc failed\n"); goto EXIT2; } @@ -3167,7 +4366,9 @@ the first argument is the one and only pattern, and it must exist. */ if (patterns == NULL && pattern_files == NULL) { if (i >= argc) return usage(2); - patterns = patterns_last = add_pattern(argv[i++], NULL); + patterns = patterns_last = add_pattern(argv[i], (PCRE2_SIZE)strlen(argv[i]), + NULL); + i++; if (patterns == NULL) goto EXIT2; } @@ -3179,7 +4380,7 @@ chain, so we must not access the next pointer till after the compile. */ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next) { - if (!compile_pattern(cp, pcre_options, process_options, FALSE, "command-line", + if (!compile_pattern(cp, pcre2_options, FALSE, "command-line", (j == 1 && patterns->next == NULL)? 
0 : j)) goto EXIT2; } @@ -3188,71 +4389,35 @@ for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next) for (fn = pattern_files; fn != NULL; fn = fn->next) { - if (!read_pattern_file(fn->name, &patterns, &patterns_last, process_options)) - goto EXIT2; + if (!read_pattern_file(fn->name, &patterns, &patterns_last)) goto EXIT2; } -/* Study the regular expressions, as we will be running them many times. If an -extra block is needed for a limit, set PCRE_STUDY_EXTRA_NEEDED so that one is -returned, even if studying produces no data. */ - -if (match_limit > 0 || match_limit_recursion > 0) - study_options |= PCRE_STUDY_EXTRA_NEEDED; - /* Unless JIT has been explicitly disabled, arrange a stack for it to use. */ -#ifdef SUPPORT_PCREGREP_JIT -if ((study_options & PCRE_STUDY_JIT_COMPILE) != 0) - jit_stack = pcre_jit_stack_alloc(32*1024, 1024*1024); -#endif - -for (j = 1, cp = patterns; cp != NULL; j++, cp = cp->next) +#ifdef SUPPORT_PCRE2GREP_JIT +if (use_jit) { - cp->hint = pcre_study(cp->compiled, study_options, &error); - if (error != NULL) - { - if (patterns->next == NULL) - fprintf(stderr, "pcregrep: Error while studying regex: %s\n", error); - else - fprintf(stderr, "pcregrep: Error while studying regex number %d: %s\n", - j, error); - goto EXIT2; - } -#ifdef SUPPORT_PCREGREP_JIT - if (jit_stack != NULL && cp->hint != NULL) - pcre_assign_jit_stack(cp->hint, NULL, jit_stack); -#endif + jit_stack = pcre2_jit_stack_create(32*1024, 1024*1024, NULL); + if (jit_stack != NULL ) + pcre2_jit_stack_assign(match_context, NULL, jit_stack); } +#endif -/* If --match-limit or --recursion-limit was set, put the value(s) into the -pcre_extra block for each pattern. There will always be an extra block because -of the use of PCRE_STUDY_EXTRA_NEEDED above. 
*/ - -for (cp = patterns; cp != NULL; cp = cp->next) - { - if (match_limit > 0) - { - cp->hint->flags |= PCRE_EXTRA_MATCH_LIMIT; - cp->hint->match_limit = match_limit; - } +/* -F, -w, and -x do not apply to include or exclude patterns, so we must +adjust the options. */ - if (match_limit_recursion > 0) - { - cp->hint->flags |= PCRE_EXTRA_MATCH_LIMIT_RECURSION; - cp->hint->match_limit_recursion = match_limit_recursion; - } - } +pcre2_options &= ~PCRE2_LITERAL; +(void)pcre2_set_compile_extra_options(compile_context, 0); /* If there are include or exclude patterns read from the command line, compile -them. -F, -w, and -x do not apply, so the third argument of compile_pattern is -0. */ +them. */ for (j = 0; j < 4; j++) { int k; for (k = 1, cp = *(incexlist[j]); cp != NULL; k++, cp = cp->next) { - if (!compile_pattern(cp, pcre_options, 0, FALSE, incexname[j], + if (!compile_pattern(cp, pcre2_options, FALSE, incexname[j], (k == 1 && cp->next == NULL)? 0 : k)) goto EXIT2; } @@ -3262,13 +4427,13 @@ for (j = 0; j < 4; j++) for (fn = include_from; fn != NULL; fn = fn->next) { - if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last, 0)) + if (!read_pattern_file(fn->name, &include_patterns, &include_patterns_last)) goto EXIT2; } for (fn = exclude_from; fn != NULL; fn = fn->next) { - if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last, 0)) + if (!read_pattern_file(fn->name, &exclude_patterns, &exclude_patterns_last)) goto EXIT2; } @@ -3277,7 +4442,7 @@ no file arguments, search stdin, and then exit. */ if (file_lists == NULL && i >= argc) { - rc = pcregrep(stdin, FR_PLAIN, stdin_name, + rc = pcre2grep(stdin, FR_PLAIN, stdin_name, (filenames > FN_DEFAULT)? stdin_name : NULL); goto EXIT; } @@ -3287,19 +4452,19 @@ read them line by line and search the given files. 
*/ for (fn = file_lists; fn != NULL; fn = fn->next) { - char buffer[PATBUFSIZE]; + char buffer[FNBUFSIZ]; FILE *fl; if (strcmp(fn->name, "-") == 0) fl = stdin; else { fl = fopen(fn->name, "rb"); if (fl == NULL) { - fprintf(stderr, "pcregrep: Failed to open %s: %s\n", fn->name, + fprintf(stderr, "pcre2grep: Failed to open %s: %s\n", fn->name, strerror(errno)); goto EXIT2; } } - while (fgets(buffer, PATBUFSIZE, fl) != NULL) + while (fgets(buffer, sizeof(buffer), fl) != NULL) { int frc; char *end = buffer + (int)strlen(buffer); @@ -3329,13 +4494,36 @@ for (; i < argc; i++) else if (frc == 0 && rc == 1) rc = 0; } +#ifdef SUPPORT_PCRE2GREP_CALLOUT +/* If separating builtin echo callouts by implicit newline, add one more for +the final item. */ + +if (om_separator != NULL && strcmp(om_separator, STDOUT_NL) == 0) + fprintf(stdout, STDOUT_NL); +#endif + +/* Show the total number of matches if requested, but not if only one file's +count was printed. */ + +if (show_total_count && counts_printed != 1 && filenames != FN_NOMATCH_ONLY) + { + if (counts_printed != 0 && filenames >= FN_DEFAULT) + fprintf(stdout, "TOTAL:"); + fprintf(stdout, "%lu" STDOUT_NL, total_count); + } + EXIT: -#ifdef SUPPORT_PCREGREP_JIT -if (jit_stack != NULL) pcre_jit_stack_free(jit_stack); +#ifdef SUPPORT_PCRE2GREP_JIT +pcre2_jit_free_unused_memory(NULL); +if (jit_stack != NULL) pcre2_jit_stack_free(jit_stack); #endif free(main_buffer); -free((void *)pcretables); +if (character_tables != NULL) pcre2_maketables_free(NULL, character_tables); + +pcre2_compile_context_free(compile_context); +pcre2_match_context_free(match_context); +pcre2_match_data_free(match_data); free_pattern_chain(patterns); free_pattern_chain(include_patterns); @@ -3355,11 +4543,11 @@ while (only_matching != NULL) free(this); } -pcregrep_exit(rc); +pcre2grep_exit(rc); EXIT2: rc = 2; goto EXIT; } -/* End of pcregrep */ +/* End of pcre2grep */ diff --git a/src/pcre2/src/pcre2posix.c b/src/pcre2/src/pcre2posix.c new file mode 100644 
index 00000000..486bccef --- /dev/null +++ b/src/pcre2/src/pcre2posix.c @@ -0,0 +1,437 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2021 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +/* This module is a wrapper that provides a POSIX API to the underlying PCRE2 +functions. The operative functions are called pcre2_regcomp(), etc., with +wrappers that use the plain POSIX names. In addition, pcre2posix.h defines the +POSIX names as macros for the pcre2_xxx functions, so any program that includes +it and uses the POSIX names will call the base functions directly. This makes +it easier for an application to be sure it gets the PCRE2 versions in the +presence of other POSIX regex libraries. */ + + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + + +/* Ensure that the PCRE2POSIX_EXP_xxx macros are set appropriately for +compiling these functions. This must come before including pcre2posix.h, where +they are set for an application (using these functions) if they have not +previously been set. */ + +#if defined(_WIN32) && !defined(PCRE2_STATIC) +# define PCRE2POSIX_EXP_DECL extern __declspec(dllexport) +# define PCRE2POSIX_EXP_DEFN __declspec(dllexport) +#endif + +/* Older versions of MSVC lack snprintf(). This define allows for +warning/error-free compilation and testing with MSVC compilers back to at least +MSVC 10/2010. Except for VC6 (which is missing some fundamentals and fails). */ + +#if defined(_MSC_VER) && (_MSC_VER < 1900) +#define snprintf _snprintf +#endif + + +/* Compile-time error numbers start at this value. 
It should probably never be +changed. This #define is a copy of the one in pcre2_internal.h. */ + +#define COMPILE_ERROR_BASE 100 + + +/* Standard C headers */ + +#include +#include +#include +#include +#include +#include + +/* PCRE2 headers */ + +#include "pcre2.h" +#include "pcre2posix.h" + +/* When compiling with the MSVC compiler, it is sometimes necessary to include +a "calling convention" before exported function names. (This is secondhand +information; I know nothing about MSVC myself). For example, something like + + void __cdecl function(....) + +might be needed. In order to make this easy, all the exported functions have +PCRE2_CALL_CONVENTION just before their names. It is rarely needed; if not +set, we ensure here that it has no effect. */ + +#ifndef PCRE2_CALL_CONVENTION +#define PCRE2_CALL_CONVENTION +#endif + +/* Table to translate PCRE2 compile time error codes into POSIX error codes. +Only a few PCRE2 errors with a value greater than 23 turn into special POSIX +codes: most go to REG_BADPAT. The second table lists, in pairs, those that +don't. */ + +static const int eint1[] = { + 0, /* No error */ + REG_EESCAPE, /* \ at end of pattern */ + REG_EESCAPE, /* \c at end of pattern */ + REG_EESCAPE, /* unrecognized character follows \ */ + REG_BADBR, /* numbers out of order in {} quantifier */ + /* 5 */ + REG_BADBR, /* number too big in {} quantifier */ + REG_EBRACK, /* missing terminating ] for character class */ + REG_ECTYPE, /* invalid escape sequence in character class */ + REG_ERANGE, /* range out of order in character class */ + REG_BADRPT, /* nothing to repeat */ + /* 10 */ + REG_ASSERT, /* internal error: unexpected repeat */ + REG_BADPAT, /* unrecognized character after (? 
or (?- */ + REG_BADPAT, /* POSIX named classes are supported only within a class */ + REG_BADPAT, /* POSIX collating elements are not supported */ + REG_EPAREN, /* missing ) */ + /* 15 */ + REG_ESUBREG, /* reference to non-existent subpattern */ + REG_INVARG, /* pattern passed as NULL */ + REG_INVARG, /* unknown compile-time option bit(s) */ + REG_EPAREN, /* missing ) after (?# comment */ + REG_ESIZE, /* parentheses nested too deeply */ + /* 20 */ + REG_ESIZE, /* regular expression too large */ + REG_ESPACE, /* failed to get memory */ + REG_EPAREN, /* unmatched closing parenthesis */ + REG_ASSERT /* internal error: code overflow */ + }; + +static const int eint2[] = { + 30, REG_ECTYPE, /* unknown POSIX class name */ + 32, REG_INVARG, /* this version of PCRE2 does not have Unicode support */ + 37, REG_EESCAPE, /* PCRE2 does not support \L, \l, \N{name}, \U, or \u */ + 56, REG_INVARG, /* internal error: unknown newline setting */ + 92, REG_INVARG, /* invalid option bits with PCRE2_LITERAL */ +}; + +/* Table of texts corresponding to POSIX error codes */ + +static const char *const pstring[] = { + "", /* Dummy for value 0 */ + "internal error", /* REG_ASSERT */ + "invalid repeat counts in {}", /* BADBR */ + "pattern error", /* BADPAT */ + "? * + invalid", /* BADRPT */ + "unbalanced {}", /* EBRACE */ + "unbalanced []", /* EBRACK */ + "collation error - not relevant", /* ECOLLATE */ + "bad class", /* ECTYPE */ + "bad escape sequence", /* EESCAPE */ + "empty expression", /* EMPTY */ + "unbalanced ()", /* EPAREN */ + "bad range inside []", /* ERANGE */ + "expression too big", /* ESIZE */ + "failed to get memory", /* ESPACE */ + "bad back reference", /* ESUBREG */ + "bad argument", /* INVARG */ + "match failed" /* NOMATCH */ +}; + + + +#if 0 /* REMOVE THIS CODE */ + +The code below was created for 10.33 (see ChangeLog 10.33 #4) when the +POSIX functions were given pcre2_... names instead of the traditional POSIX +names. 
However, it has proved to be more troublesome than useful. There have +been at least two cases where a program links with two others, one of which +uses the POSIX library and the other uses the PCRE2 POSIX functions, thus +causing two instances of the POSIX functions to exist, leading to trouble. For +10.37 this code is commented out. In due course it can be removed if there are +no issues. The only small worry is the comment below about languages that do +not include pcre2posix.h. If there are any such cases, they will have to use +the PCRE2 names. + + +/************************************************* +* Wrappers with traditional POSIX names * +*************************************************/ + +/* Keep defining them to preserve the ABI for applications linked to the pcre2 +POSIX library before these names were changed into macros in pcre2posix.h. +This also ensures that the POSIX names are callable from languages that do not +include pcre2posix.h. It is vital to #undef the macro definitions from +pcre2posix.h! 
*/ + +#undef regerror +PCRE2POSIX_EXP_DECL size_t regerror(int, const regex_t *, char *, size_t); +PCRE2POSIX_EXP_DEFN size_t PCRE2_CALL_CONVENTION +regerror(int errcode, const regex_t *preg, char *errbuf, size_t errbuf_size) +{ +return pcre2_regerror(errcode, preg, errbuf, errbuf_size); +} + +#undef regfree +PCRE2POSIX_EXP_DECL void regfree(regex_t *); +PCRE2POSIX_EXP_DEFN void PCRE2_CALL_CONVENTION +regfree(regex_t *preg) +{ +pcre2_regfree(preg); +} + +#undef regcomp +PCRE2POSIX_EXP_DECL int regcomp(regex_t *, const char *, int); +PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION +regcomp(regex_t *preg, const char *pattern, int cflags) +{ +return pcre2_regcomp(preg, pattern, cflags); +} + +#undef regexec +PCRE2POSIX_EXP_DECL int regexec(const regex_t *, const char *, size_t, + regmatch_t *, int); +PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION +regexec(const regex_t *preg, const char *string, size_t nmatch, + regmatch_t pmatch[], int eflags) +{ +return pcre2_regexec(preg, string, nmatch, pmatch, eflags); +} +#endif + + +/************************************************* +* Translate error code to string * +*************************************************/ + +PCRE2POSIX_EXP_DEFN size_t PCRE2_CALL_CONVENTION +pcre2_regerror(int errcode, const regex_t *preg, char *errbuf, + size_t errbuf_size) +{ +int used; +const char *message; + +message = (errcode <= 0 || errcode >= (int)(sizeof(pstring)/sizeof(char *)))? 
+ "unknown error code" : pstring[errcode]; + +if (preg != NULL && (int)preg->re_erroffset != -1) + { + used = snprintf(errbuf, errbuf_size, "%s at offset %-6d", message, + (int)preg->re_erroffset); + } +else + { + used = snprintf(errbuf, errbuf_size, "%s", message); + } + +return used + 1; +} + + + +/************************************************* +* Free store held by a regex * +*************************************************/ + +PCRE2POSIX_EXP_DEFN void PCRE2_CALL_CONVENTION +pcre2_regfree(regex_t *preg) +{ +pcre2_match_data_free(preg->re_match_data); +pcre2_code_free(preg->re_pcre2_code); +} + + + +/************************************************* +* Compile a regular expression * +*************************************************/ + +/* +Arguments: + preg points to a structure for recording the compiled expression + pattern the pattern to compile + cflags compilation flags + +Returns: 0 on success + various non-zero codes on failure +*/ + +PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_regcomp(regex_t *preg, const char *pattern, int cflags) +{ +PCRE2_SIZE erroffset; +PCRE2_SIZE patlen; +int errorcode; +int options = 0; +int re_nsub = 0; + +patlen = ((cflags & REG_PEND) != 0)? 
(PCRE2_SIZE)(preg->re_endp - pattern) : + PCRE2_ZERO_TERMINATED; + +if ((cflags & REG_ICASE) != 0) options |= PCRE2_CASELESS; +if ((cflags & REG_NEWLINE) != 0) options |= PCRE2_MULTILINE; +if ((cflags & REG_DOTALL) != 0) options |= PCRE2_DOTALL; +if ((cflags & REG_NOSPEC) != 0) options |= PCRE2_LITERAL; +if ((cflags & REG_UTF) != 0) options |= PCRE2_UTF; +if ((cflags & REG_UCP) != 0) options |= PCRE2_UCP; +if ((cflags & REG_UNGREEDY) != 0) options |= PCRE2_UNGREEDY; + +preg->re_cflags = cflags; +preg->re_pcre2_code = pcre2_compile((PCRE2_SPTR)pattern, patlen, options, + &errorcode, &erroffset, NULL); +preg->re_erroffset = erroffset; + +if (preg->re_pcre2_code == NULL) + { + unsigned int i; + + /* A negative value is a UTF error; otherwise all error codes are greater + than COMPILE_ERROR_BASE, but check, just in case. */ + + if (errorcode < COMPILE_ERROR_BASE) return REG_BADPAT; + errorcode -= COMPILE_ERROR_BASE; + + if (errorcode < (int)(sizeof(eint1)/sizeof(const int))) + return eint1[errorcode]; + for (i = 0; i < sizeof(eint2)/sizeof(const int); i += 2) + if (errorcode == eint2[i]) return eint2[i+1]; + return REG_BADPAT; + } + +(void)pcre2_pattern_info((const pcre2_code *)preg->re_pcre2_code, + PCRE2_INFO_CAPTURECOUNT, &re_nsub); +preg->re_nsub = (size_t)re_nsub; +preg->re_match_data = pcre2_match_data_create(re_nsub + 1, NULL); +preg->re_erroffset = (size_t)(-1); /* No meaning after successful compile */ + +if (preg->re_match_data == NULL) + { + pcre2_code_free(preg->re_pcre2_code); + return REG_ESPACE; + } + +return 0; +} + + + +/************************************************* +* Match a regular expression * +*************************************************/ + +/* A suitable match_data block, large enough to hold all possible captures, was +obtained when the pattern was compiled, to save having to allocate and free it +for each match. 
If REG_NOSUB was specified at compile time, the nmatch and +pmatch arguments are ignored, and the only result is yes/no/error. */ + +PCRE2POSIX_EXP_DEFN int PCRE2_CALL_CONVENTION +pcre2_regexec(const regex_t *preg, const char *string, size_t nmatch, + regmatch_t pmatch[], int eflags) +{ +int rc, so, eo; +int options = 0; +pcre2_match_data *md = (pcre2_match_data *)preg->re_match_data; + +if ((eflags & REG_NOTBOL) != 0) options |= PCRE2_NOTBOL; +if ((eflags & REG_NOTEOL) != 0) options |= PCRE2_NOTEOL; +if ((eflags & REG_NOTEMPTY) != 0) options |= PCRE2_NOTEMPTY; + +/* When REG_NOSUB was specified, or if no vector has been passed in which to +put captured strings, ensure that nmatch is zero. This will stop any attempt to +write to pmatch. */ + +if ((preg->re_cflags & REG_NOSUB) != 0 || pmatch == NULL) nmatch = 0; + +/* REG_STARTEND is a BSD extension, to allow for non-NUL-terminated strings. +The man page from OS X says "REG_STARTEND affects only the location of the +string, not how it is matched". That is why the "so" value is used to bump the +start location rather than being passed as a PCRE2 "starting offset". */ + +if ((eflags & REG_STARTEND) != 0) + { + if (pmatch == NULL) return REG_INVARG; + so = pmatch[0].rm_so; + eo = pmatch[0].rm_eo; + } +else + { + so = 0; + eo = (int)strlen(string); + } + +rc = pcre2_match((const pcre2_code *)preg->re_pcre2_code, + (PCRE2_SPTR)string + so, (eo - so), 0, options, md, NULL); + +/* Successful match */ + +if (rc >= 0) + { + size_t i; + PCRE2_SIZE *ovector = pcre2_get_ovector_pointer(md); + if ((size_t)rc > nmatch) rc = (int)nmatch; + for (i = 0; i < (size_t)rc; i++) + { + pmatch[i].rm_so = (ovector[i*2] == PCRE2_UNSET)? -1 : + (int)(ovector[i*2] + so); + pmatch[i].rm_eo = (ovector[i*2+1] == PCRE2_UNSET)? 
-1 : + (int)(ovector[i*2+1] + so); + } + for (; i < nmatch; i++) pmatch[i].rm_so = pmatch[i].rm_eo = -1; + return 0; + } + +/* Unsuccessful match */ + +if (rc <= PCRE2_ERROR_UTF8_ERR1 && rc >= PCRE2_ERROR_UTF8_ERR21) + return REG_INVARG; + +switch(rc) + { + default: return REG_ASSERT; + case PCRE2_ERROR_BADMODE: return REG_INVARG; + case PCRE2_ERROR_BADMAGIC: return REG_INVARG; + case PCRE2_ERROR_BADOPTION: return REG_INVARG; + case PCRE2_ERROR_BADUTFOFFSET: return REG_INVARG; + case PCRE2_ERROR_MATCHLIMIT: return REG_ESPACE; + case PCRE2_ERROR_NOMATCH: return REG_NOMATCH; + case PCRE2_ERROR_NOMEMORY: return REG_ESPACE; + case PCRE2_ERROR_NULL: return REG_INVARG; + } +} + +/* End of pcre2posix.c */ diff --git a/src/pcre2/src/pcre2posix.h b/src/pcre2/src/pcre2posix.h new file mode 100644 index 00000000..3a663b9f --- /dev/null +++ b/src/pcre2/src/pcre2posix.h @@ -0,0 +1,170 @@ +/************************************************* +* Perl-Compatible Regular Expressions * +*************************************************/ + +/* PCRE2 is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. This is +the public header file to be #included by applications that call PCRE2 via the +POSIX wrapper interface. + + Written by Philip Hazel + Original API code Copyright (c) 1997-2012 University of Cambridge + New API code Copyright (c) 2016-2019 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. 
+ + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +/* Have to include stdlib.h in order to ensure that size_t is defined. */ + +#include <stdlib.h> + +/* Allow for C++ users */ + +#ifdef __cplusplus +extern "C" { +#endif + +/* Options, mostly defined by POSIX, but with some extras. 
*/ + +#define REG_ICASE 0x0001 /* Maps to PCRE2_CASELESS */ +#define REG_NEWLINE 0x0002 /* Maps to PCRE2_MULTILINE */ +#define REG_NOTBOL 0x0004 /* Maps to PCRE2_NOTBOL */ +#define REG_NOTEOL 0x0008 /* Maps to PCRE2_NOTEOL */ +#define REG_DOTALL 0x0010 /* NOT defined by POSIX; maps to PCRE2_DOTALL */ +#define REG_NOSUB 0x0020 /* Do not report what was matched */ +#define REG_UTF 0x0040 /* NOT defined by POSIX; maps to PCRE2_UTF */ +#define REG_STARTEND 0x0080 /* BSD feature: pass subject string by so,eo */ +#define REG_NOTEMPTY 0x0100 /* NOT defined by POSIX; maps to PCRE2_NOTEMPTY */ +#define REG_UNGREEDY 0x0200 /* NOT defined by POSIX; maps to PCRE2_UNGREEDY */ +#define REG_UCP 0x0400 /* NOT defined by POSIX; maps to PCRE2_UCP */ +#define REG_PEND 0x0800 /* GNU feature: pass end pattern by re_endp */ +#define REG_NOSPEC 0x1000 /* Maps to PCRE2_LITERAL */ + +/* This is not used by PCRE2, but by defining it we make it easier +to slot PCRE2 into existing programs that make POSIX calls. */ + +#define REG_EXTENDED 0 + +/* Error values. Not all these are relevant or used by the wrapper. */ + +enum { + REG_ASSERT = 1, /* internal error ? */ + REG_BADBR, /* invalid repeat counts in {} */ + REG_BADPAT, /* pattern error */ + REG_BADRPT, /* ? * + invalid */ + REG_EBRACE, /* unbalanced {} */ + REG_EBRACK, /* unbalanced [] */ + REG_ECOLLATE, /* collation error - not relevant */ + REG_ECTYPE, /* bad class */ + REG_EESCAPE, /* bad escape sequence */ + REG_EMPTY, /* empty expression */ + REG_EPAREN, /* unbalanced () */ + REG_ERANGE, /* bad range inside [] */ + REG_ESIZE, /* expression too big */ + REG_ESPACE, /* failed to get memory */ + REG_ESUBREG, /* bad back reference */ + REG_INVARG, /* bad argument */ + REG_NOMATCH /* match failed */ +}; + + +/* The structure representing a compiled regular expression. It is also used +for passing the pattern end pointer when REG_PEND is set. 
*/ + +typedef struct { + void *re_pcre2_code; + void *re_match_data; + const char *re_endp; + size_t re_nsub; + size_t re_erroffset; + int re_cflags; +} regex_t; + +/* The structure in which a captured offset is returned. */ + +typedef int regoff_t; + +typedef struct { + regoff_t rm_so; + regoff_t rm_eo; +} regmatch_t; + +/* When an application links to a PCRE2 DLL in Windows, the symbols that are +imported have to be identified as such. When building PCRE2, the appropriate +export settings are needed, and are set in pcre2posix.c before including this +file. */ + +#if defined(_WIN32) && !defined(PCRE2_STATIC) && !defined(PCRE2POSIX_EXP_DECL) +# define PCRE2POSIX_EXP_DECL extern __declspec(dllimport) +# define PCRE2POSIX_EXP_DEFN __declspec(dllimport) +#endif + +/* By default, we use the standard "extern" declarations. */ + +#ifndef PCRE2POSIX_EXP_DECL +# ifdef __cplusplus +# define PCRE2POSIX_EXP_DECL extern "C" +# define PCRE2POSIX_EXP_DEFN extern "C" +# else +# define PCRE2POSIX_EXP_DECL extern +# define PCRE2POSIX_EXP_DEFN extern +# endif +#endif + +/* The functions. The actual code is in functions with pcre2_xxx names for +uniqueness. POSIX names are provided as macros for API compatibility with POSIX +regex functions. It's done this way to ensure that they are always linked from +the PCRE2 library and not by accident from elsewhere (regex_t differs in size +elsewhere). */ + +PCRE2POSIX_EXP_DECL int pcre2_regcomp(regex_t *, const char *, int); +PCRE2POSIX_EXP_DECL int pcre2_regexec(const regex_t *, const char *, size_t, + regmatch_t *, int); +PCRE2POSIX_EXP_DECL size_t pcre2_regerror(int, const regex_t *, char *, size_t); +PCRE2POSIX_EXP_DECL void pcre2_regfree(regex_t *); + +#define regcomp pcre2_regcomp +#define regexec pcre2_regexec +#define regerror pcre2_regerror +#define regfree pcre2_regfree + +/* Debian had a patch that used different names. These are now here to save +them having to maintain their own patch, but are not documented by PCRE2.
*/ + +#define PCRE2regcomp pcre2_regcomp +#define PCRE2regexec pcre2_regexec +#define PCRE2regerror pcre2_regerror +#define PCRE2regfree pcre2_regfree + +#ifdef __cplusplus +} /* extern "C" */ +#endif + +/* End of pcre2posix.h */ diff --git a/src/pcre2/src/pcre2test.c b/src/pcre2/src/pcre2test.c new file mode 100644 index 00000000..aa007f8f --- /dev/null +++ b/src/pcre2/src/pcre2test.c @@ -0,0 +1,9220 @@ +/************************************************* +* PCRE2 testing program * +*************************************************/ + +/* PCRE2 is a library of functions to support regular expressions whose syntax +and semantics are as close as possible to those of the Perl 5 language. In 2014 +the API was completely revised and '2' was added to the name, because the old +API, which had lasted for 16 years, could not accommodate new requirements. At +the same time, this testing program was re-designed because its original +hacked-up (non-) design had also run out of steam. + + Written by Philip Hazel + Original code Copyright (c) 1997-2012 University of Cambridge + Rewritten code Copyright (c) 2016-2020 University of Cambridge + +----------------------------------------------------------------------------- +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + + * Redistributions of source code must retain the above copyright notice, + this list of conditions and the following disclaimer. + + * Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + + * Neither the name of the University of Cambridge nor the names of its + contributors may be used to endorse or promote products derived from + this software without specific prior written permission. 
+ +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE +POSSIBILITY OF SUCH DAMAGE. +----------------------------------------------------------------------------- +*/ + + +/* This program supports testing of the 8-bit, 16-bit, and 32-bit PCRE2 +libraries in a single program, though its input and output are always 8-bit. +It is different from modules such as pcre2_compile.c in the library itself, +which are compiled separately for each code unit width. If two widths are +enabled, for example, pcre2_compile.c is compiled twice. In contrast, +pcre2test.c is compiled only once, and linked with all the enabled libraries. +Therefore, it must not make use of any of the macros from pcre2.h or +pcre2_internal.h that depend on PCRE2_CODE_UNIT_WIDTH. It does, however, make +use of SUPPORT_PCRE2_8, SUPPORT_PCRE2_16, and SUPPORT_PCRE2_32, to ensure that +it references only the enabled library functions. */ + +#ifdef HAVE_CONFIG_H +#include "config.h" +#endif + +#include +#include +#include +#include +#include +#include +#include + +#if defined NATIVE_ZOS +#include "pcrzoscs.h" +/* That header is not included in the main PCRE2 distribution because other +apparatus is needed to compile pcre2test for z/OS. 
The header can be found in +the special z/OS distribution, which is available from www.zaconsultants.net or +from www.cbttape.org. */ +#endif + +#ifdef HAVE_UNISTD_H +#include +#endif + +/* Debugging code enabler */ + +/* #define DEBUG_SHOW_MALLOC_ADDRESSES */ + +/* Both libreadline and libedit are optionally supported. The user-supplied +original patch uses readline/readline.h for libedit, but in at least one system +it is installed as editline/readline.h, so the configuration code now looks for +that first, falling back to readline/readline.h. */ + +#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) +#if defined(SUPPORT_LIBREADLINE) +#include +#include +#else +#if defined(HAVE_EDITLINE_READLINE_H) +#include +#else +#include +#endif +#endif +#endif + +/* Put the test for interactive input into a macro so that it can be changed if +required for different environments. */ + +#define INTERACTIVE(f) isatty(fileno(f)) + + +/* ---------------------- System-specific definitions ---------------------- */ + +/* A number of things vary for Windows builds. Originally, pcretest opened its +input and output without "b"; then I was told that "b" was needed in some +environments, so it was added for release 5.0 to both the input and output. (It +makes no difference on Unix-like systems.) Later I was told that it is wrong +for the input on Windows. I've now abstracted the modes into macros that are +set here, to make it easier to fiddle with them, and removed "b" from the input +mode under Windows. The BINARY versions are used when saving/restoring compiled +patterns. 
*/ + +#if defined(_WIN32) || defined(WIN32) +#include /* For _setmode() */ +#include /* For _O_BINARY */ +#define INPUT_MODE "r" +#define OUTPUT_MODE "wb" +#define BINARY_INPUT_MODE "rb" +#define BINARY_OUTPUT_MODE "wb" + +#ifndef isatty +#define isatty _isatty /* This is what Windows calls them, I'm told, */ +#endif /* though in some environments they seem to */ + /* be already defined, hence the #ifndefs. */ +#ifndef fileno +#define fileno _fileno +#endif + +/* A user sent this fix for Borland Builder 5 under Windows. */ + +#ifdef __BORLANDC__ +#define _setmode(handle, mode) setmode(handle, mode) +#endif + +/* Not Windows */ + +#else +#include /* These two includes are needed */ +#include /* for setrlimit(). */ +#if defined NATIVE_ZOS /* z/OS uses non-binary I/O */ +#define INPUT_MODE "r" +#define OUTPUT_MODE "w" +#define BINARY_INPUT_MODE "rb" +#define BINARY_OUTPUT_MODE "wb" +#else +#define INPUT_MODE "rb" +#define OUTPUT_MODE "wb" +#define BINARY_INPUT_MODE "rb" +#define BINARY_OUTPUT_MODE "wb" +#endif +#endif + +/* VMS-specific code was included as suggested by a VMS user [1]. Another VMS +user [2] provided alternative code which worked better for him. I have +commented out the original, but kept it around just in case. */ + +#ifdef __VMS +#include +/* These two includes came from [2]. */ +#include descrip +#include lib$routines +/* void vms_setsymbol( char *, char *, int ); Original code from [1]. */ +#endif + +/* VC and older compilers don't support %td or %zu, and even some that claim to +be C99 don't support it (hence DISABLE_PERCENT_ZT). There are some non-C99 +environments where %lu gives a warning with 32-bit pointers. As there doesn't +seem to be an easy way round this, just live with it (the cases are rare). 
*/ + +#if defined(_MSC_VER) || !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L || defined(DISABLE_PERCENT_ZT) +#define PTR_FORM "lu" +#define SIZ_FORM "lu" +#define SIZ_CAST (unsigned long int) +#else +#define PTR_FORM "td" +#define SIZ_FORM "zu" +#define SIZ_CAST +#endif + +/* ------------------End of system-specific definitions -------------------- */ + +/* Glueing macros that are used in several places below. */ + +#define glue(a,b) a##b +#define G(a,b) glue(a,b) + +/* Miscellaneous parameters and manifests */ + +#ifndef CLOCKS_PER_SEC +#ifdef CLK_TCK +#define CLOCKS_PER_SEC CLK_TCK +#else +#define CLOCKS_PER_SEC 100 +#endif +#endif + +#define CFORE_UNSET UINT32_MAX /* Unset value for startend/cfail/cerror fields */ +#define CONVERT_UNSET UINT32_MAX /* Unset value for convert_type field */ +#define DFA_WS_DIMENSION 1000 /* Size of DFA workspace */ +#define DEFAULT_OVECCOUNT 15 /* Default ovector count */ +#define JUNK_OFFSET 0xdeadbeef /* For initializing ovector */ +#define LOCALESIZE 32 /* Size of locale name */ +#define LOOPREPEAT 500000 /* Default loop count for timing */ +#define MALLOCLISTSIZE 20 /* For remembering mallocs */ +#define PARENS_NEST_DEFAULT 220 /* Default parentheses nest limit */ +#define PATSTACKSIZE 20 /* Pattern stack for save/restore testing */ +#define REPLACE_MODSIZE 100 /* Field for reading 8-bit replacement */ +#define VERSION_SIZE 64 /* Size of buffer for the version strings */ + +/* Default JIT compile options */ + +#define JIT_DEFAULT (PCRE2_JIT_COMPLETE|\ + PCRE2_JIT_PARTIAL_SOFT|\ + PCRE2_JIT_PARTIAL_HARD) + +/* Make sure the buffer into which replacement strings are copied is big enough +to hold them as 32-bit code units. 
*/ + +#define REPLACE_BUFFSIZE 1024 /* This is a byte value */ + +/* Execution modes */ + +#define PCRE8_MODE 8 +#define PCRE16_MODE 16 +#define PCRE32_MODE 32 + +/* Processing returns */ + +enum { PR_OK, PR_SKIP, PR_ABEND }; + +/* The macro PRINTABLE determines whether to print an output character as-is or +as a hex value when showing compiled patterns. We use it in cases when the +locale has not been explicitly changed, so as to get consistent output from +systems that differ in their output from isprint() even in the "C" locale. */ + +#ifdef EBCDIC +#define PRINTABLE(c) ((c) >= 64 && (c) < 255) +#else +#define PRINTABLE(c) ((c) >= 32 && (c) < 127) +#endif + +#define PRINTOK(c) ((use_tables != NULL && c < 256)? isprint(c) : PRINTABLE(c)) + +/* We have to include some of the library source files because we need +to use some of the macros, internal structure definitions, and other internal +values - pcre2test has "inside information" compared to an application program +that strictly follows the PCRE2 API. + +Before including pcre2_internal.h we define PRIV so that it does not get +defined therein. This ensures that PRIV names in the included files do not +clash with those in the libraries. Also, although pcre2_internal.h does itself +include pcre2.h, we explicitly include it beforehand, along with pcre2posix.h, +so that the PCRE2_EXP_xxx macros get set appropriately for an application, not +for building the library. */ + +#define PRIV(name) name +#define PCRE2_CODE_UNIT_WIDTH 0 +#include "pcre2.h" +#include "pcre2posix.h" +#include "pcre2_internal.h" + +/* We need access to some of the data tables that PCRE2 uses. Defining +PCRE2_PCRE2TEST makes some minor changes in the files. The previous definition +of PRIV avoids name clashes. */ + +#define PCRE2_PCRE2TEST +#include "pcre2_tables.c" +#include "pcre2_ucd.c" + +/* 32-bit integer values in the input are read by strtoul() or strtol().
The +check needed for overflow depends on whether long ints are in fact longer than +ints. They are defined not to be shorter. */ + +#if ULONG_MAX > UINT32_MAX +#define U32OVERFLOW(x) (x > UINT32_MAX) +#else +#define U32OVERFLOW(x) (x == UINT32_MAX) +#endif + +#if LONG_MAX > INT32_MAX +#define S32OVERFLOW(x) (x > INT32_MAX || x < INT32_MIN) +#else +#define S32OVERFLOW(x) (x == INT32_MAX || x == INT32_MIN) +#endif + +/* When PCRE2_CODE_UNIT_WIDTH is zero, pcre2_internal.h does not include +pcre2_intmodedep.h, which is where mode-dependent macros and structures are +defined. We can now include it for each supported code unit width. Because +PCRE2_CODE_UNIT_WIDTH was defined as zero before including pcre2.h, it will +have left PCRE2_SUFFIX defined as a no-op. We must re-define it appropriately +while including these files, and then restore it to a no-op. Because LINK_SIZE +may be changed in 16-bit mode and forced to 1 in 32-bit mode, the order of +these inclusions should not be changed. */ + +#undef PCRE2_SUFFIX +#undef PCRE2_CODE_UNIT_WIDTH + +#ifdef SUPPORT_PCRE2_8 +#define PCRE2_CODE_UNIT_WIDTH 8 +#define PCRE2_SUFFIX(a) G(a,8) +#include "pcre2_intmodedep.h" +#include "pcre2_printint.c" +#undef PCRE2_CODE_UNIT_WIDTH +#undef PCRE2_SUFFIX +#endif /* SUPPORT_PCRE2_8 */ + +#ifdef SUPPORT_PCRE2_16 +#define PCRE2_CODE_UNIT_WIDTH 16 +#define PCRE2_SUFFIX(a) G(a,16) +#include "pcre2_intmodedep.h" +#include "pcre2_printint.c" +#undef PCRE2_CODE_UNIT_WIDTH +#undef PCRE2_SUFFIX +#endif /* SUPPORT_PCRE2_16 */ + +#ifdef SUPPORT_PCRE2_32 +#define PCRE2_CODE_UNIT_WIDTH 32 +#define PCRE2_SUFFIX(a) G(a,32) +#include "pcre2_intmodedep.h" +#include "pcre2_printint.c" +#undef PCRE2_CODE_UNIT_WIDTH +#undef PCRE2_SUFFIX +#endif /* SUPPORT_PCRE2_32 */ + +#define PCRE2_SUFFIX(a) a + +/* We need to be able to check input text for UTF-8 validity, whatever code +widths are actually available, because the input to pcre2test is always in +8-bit code units. 
So we include the UTF validity checking function for 8-bit +code units. */ + +extern int valid_utf(PCRE2_SPTR8, PCRE2_SIZE, PCRE2_SIZE *); + +#define PCRE2_CODE_UNIT_WIDTH 8 +#undef PCRE2_SPTR +#define PCRE2_SPTR PCRE2_SPTR8 +#include "pcre2_valid_utf.c" +#undef PCRE2_CODE_UNIT_WIDTH +#undef PCRE2_SPTR + +/* If we have 8-bit support, default to it; if there is also 16- or 32-bit +support, it can be selected by a command-line option. If there is no 8-bit +support, there must be 16-bit or 32-bit support, so default to one of them. The +config function, JIT stack, contexts, and version string are the same in all +modes, so use the form of the first that is available. */ + +#if defined SUPPORT_PCRE2_8 +#define DEFAULT_TEST_MODE PCRE8_MODE +#define VERSION_TYPE PCRE2_UCHAR8 +#define PCRE2_CONFIG pcre2_config_8 +#define PCRE2_JIT_STACK pcre2_jit_stack_8 +#define PCRE2_REAL_GENERAL_CONTEXT pcre2_real_general_context_8 +#define PCRE2_REAL_COMPILE_CONTEXT pcre2_real_compile_context_8 +#define PCRE2_REAL_CONVERT_CONTEXT pcre2_real_convert_context_8 +#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_8 + +#elif defined SUPPORT_PCRE2_16 +#define DEFAULT_TEST_MODE PCRE16_MODE +#define VERSION_TYPE PCRE2_UCHAR16 +#define PCRE2_CONFIG pcre2_config_16 +#define PCRE2_JIT_STACK pcre2_jit_stack_16 +#define PCRE2_REAL_GENERAL_CONTEXT pcre2_real_general_context_16 +#define PCRE2_REAL_COMPILE_CONTEXT pcre2_real_compile_context_16 +#define PCRE2_REAL_CONVERT_CONTEXT pcre2_real_convert_context_16 +#define PCRE2_REAL_MATCH_CONTEXT pcre2_real_match_context_16 + +#elif defined SUPPORT_PCRE2_32 +#define DEFAULT_TEST_MODE PCRE32_MODE +#define VERSION_TYPE PCRE2_UCHAR32 +#define PCRE2_CONFIG pcre2_config_32 +#define PCRE2_JIT_STACK pcre2_jit_stack_32 +#define PCRE2_REAL_GENERAL_CONTEXT pcre2_real_general_context_32 +#define PCRE2_REAL_COMPILE_CONTEXT pcre2_real_compile_context_32 +#define PCRE2_REAL_CONVERT_CONTEXT pcre2_real_convert_context_32 +#define PCRE2_REAL_MATCH_CONTEXT
pcre2_real_match_context_32 +#endif + +/* ------------- Structure and table for handling #-commands ------------- */ + +typedef struct cmdstruct { + const char *name; + int value; +} cmdstruct; + +enum { CMD_FORBID_UTF, CMD_LOAD, CMD_LOADTABLES, CMD_NEWLINE_DEFAULT, + CMD_PATTERN, CMD_PERLTEST, CMD_POP, CMD_POPCOPY, CMD_SAVE, CMD_SUBJECT, + CMD_UNKNOWN }; + +static cmdstruct cmdlist[] = { + { "forbid_utf", CMD_FORBID_UTF }, + { "load", CMD_LOAD }, + { "loadtables", CMD_LOADTABLES }, + { "newline_default", CMD_NEWLINE_DEFAULT }, + { "pattern", CMD_PATTERN }, + { "perltest", CMD_PERLTEST }, + { "pop", CMD_POP }, + { "popcopy", CMD_POPCOPY }, + { "save", CMD_SAVE }, + { "subject", CMD_SUBJECT }}; + +#define cmdlistcount (sizeof(cmdlist)/sizeof(cmdstruct)) + +/* ------------- Structures and tables for handling modifiers -------------- */ + +/* Table of names for newline types. Must be kept in step with the definitions +of PCRE2_NEWLINE_xx in pcre2.h. */ + +static const char *newlines[] = { + "DEFAULT", "CR", "LF", "CRLF", "ANY", "ANYCRLF", "NUL" }; + +/* Structure and table for handling pattern conversion types. 
*/ + +typedef struct convertstruct { + const char *name; + uint32_t option; +} convertstruct; + +static convertstruct convertlist[] = { + { "glob", PCRE2_CONVERT_GLOB }, + { "glob_no_starstar", PCRE2_CONVERT_GLOB_NO_STARSTAR }, + { "glob_no_wild_separator", PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR }, + { "posix_basic", PCRE2_CONVERT_POSIX_BASIC }, + { "posix_extended", PCRE2_CONVERT_POSIX_EXTENDED }, + { "unset", CONVERT_UNSET }}; + +#define convertlistcount (sizeof(convertlist)/sizeof(convertstruct)) + +/* Modifier types and applicability */ + +enum { MOD_CTC, /* Applies to a compile context */ + MOD_CTM, /* Applies to a match context */ + MOD_PAT, /* Applies to a pattern */ + MOD_PATP, /* Ditto, OK for Perl test */ + MOD_DAT, /* Applies to a data line */ + MOD_PD, /* Applies to a pattern or a data line */ + MOD_PDP, /* As MOD_PD, OK for Perl test */ + MOD_PND, /* As MOD_PD, but not for a default pattern */ + MOD_PNDP, /* As MOD_PND, OK for Perl test */ + MOD_CHR, /* Is a single character */ + MOD_CON, /* Is a "convert" type/options list */ + MOD_CTL, /* Is a control bit */ + MOD_BSR, /* Is a BSR value */ + MOD_IN2, /* Is one or two unsigned integers */ + MOD_INS, /* Is a signed integer */ + MOD_INT, /* Is an unsigned integer */ + MOD_IND, /* Is an unsigned integer, but no value => default */ + MOD_NL, /* Is a newline value */ + MOD_NN, /* Is a number or a name; more than one may occur */ + MOD_OPT, /* Is an option bit */ + MOD_SIZ, /* Is a PCRE2_SIZE value */ + MOD_STR }; /* Is a string */ + +/* Control bits. Some apply to compiling, some to matching, but some can be set +either on a pattern or a data line, so they must all be distinct. There are now +so many of them that they are split into two fields. 
*/ + +#define CTL_AFTERTEXT 0x00000001u +#define CTL_ALLAFTERTEXT 0x00000002u +#define CTL_ALLCAPTURES 0x00000004u +#define CTL_ALLUSEDTEXT 0x00000008u +#define CTL_ALTGLOBAL 0x00000010u +#define CTL_BINCODE 0x00000020u +#define CTL_CALLOUT_CAPTURE 0x00000040u +#define CTL_CALLOUT_INFO 0x00000080u +#define CTL_CALLOUT_NONE 0x00000100u +#define CTL_DFA 0x00000200u +#define CTL_EXPAND 0x00000400u +#define CTL_FINDLIMITS 0x00000800u +#define CTL_FRAMESIZE 0x00001000u +#define CTL_FULLBINCODE 0x00002000u +#define CTL_GETALL 0x00004000u +#define CTL_GLOBAL 0x00008000u +#define CTL_HEXPAT 0x00010000u /* Same word as USE_LENGTH */ +#define CTL_INFO 0x00020000u +#define CTL_JITFAST 0x00040000u +#define CTL_JITVERIFY 0x00080000u +#define CTL_MARK 0x00100000u +#define CTL_MEMORY 0x00200000u +#define CTL_NULLCONTEXT 0x00400000u +#define CTL_POSIX 0x00800000u +#define CTL_POSIX_NOSUB 0x01000000u +#define CTL_PUSH 0x02000000u /* These three must be */ +#define CTL_PUSHCOPY 0x04000000u /* all in the same */ +#define CTL_PUSHTABLESCOPY 0x08000000u /* word. 
*/ +#define CTL_STARTCHAR 0x10000000u +#define CTL_USE_LENGTH 0x20000000u /* Same word as HEXPAT */ +#define CTL_UTF8_INPUT 0x40000000u +#define CTL_ZERO_TERMINATE 0x80000000u + +/* Combinations */ + +#define CTL_DEBUG (CTL_FULLBINCODE|CTL_INFO) /* For setting */ +#define CTL_ANYINFO (CTL_DEBUG|CTL_BINCODE|CTL_CALLOUT_INFO) +#define CTL_ANYGLOB (CTL_ALTGLOBAL|CTL_GLOBAL) + +/* Second control word */ + +#define CTL2_SUBSTITUTE_CALLOUT 0x00000001u +#define CTL2_SUBSTITUTE_EXTENDED 0x00000002u +#define CTL2_SUBSTITUTE_LITERAL 0x00000004u +#define CTL2_SUBSTITUTE_MATCHED 0x00000008u +#define CTL2_SUBSTITUTE_OVERFLOW_LENGTH 0x00000010u +#define CTL2_SUBSTITUTE_REPLACEMENT_ONLY 0x00000020u +#define CTL2_SUBSTITUTE_UNKNOWN_UNSET 0x00000040u +#define CTL2_SUBSTITUTE_UNSET_EMPTY 0x00000080u +#define CTL2_SUBJECT_LITERAL 0x00000100u +#define CTL2_CALLOUT_NO_WHERE 0x00000200u +#define CTL2_CALLOUT_EXTRA 0x00000400u +#define CTL2_ALLVECTOR 0x00000800u + +#define CTL2_NL_SET 0x40000000u /* Informational */ +#define CTL2_BSR_SET 0x80000000u /* Informational */ + +/* These are the matching controls that may be set either on a pattern or on a +data line. They are copied from the pattern controls as initial settings for +data line controls. Note that CTL_MEMORY is not included here, because it does +different things in the two cases. */ + +#define CTL_ALLPD (CTL_AFTERTEXT|\ + CTL_ALLAFTERTEXT|\ + CTL_ALLCAPTURES|\ + CTL_ALLUSEDTEXT|\ + CTL_ALTGLOBAL|\ + CTL_GLOBAL|\ + CTL_MARK|\ + CTL_STARTCHAR|\ + CTL_UTF8_INPUT) + +#define CTL2_ALLPD (CTL2_SUBSTITUTE_CALLOUT|\ + CTL2_SUBSTITUTE_EXTENDED|\ + CTL2_SUBSTITUTE_LITERAL|\ + CTL2_SUBSTITUTE_MATCHED|\ + CTL2_SUBSTITUTE_OVERFLOW_LENGTH|\ + CTL2_SUBSTITUTE_REPLACEMENT_ONLY|\ + CTL2_SUBSTITUTE_UNKNOWN_UNSET|\ + CTL2_SUBSTITUTE_UNSET_EMPTY|\ + CTL2_ALLVECTOR) + +/* Structures for holding modifier information for patterns and subject strings +(data). 
Fields containing modifiers that can be set either for a pattern or a +subject must be at the start and in the same order in both cases so that the +same offset in the big table below works for both. */ + +typedef struct patctl { /* Structure for pattern modifiers. */ + uint32_t options; /* Must be in same position as datctl */ + uint32_t control; /* Must be in same position as datctl */ + uint32_t control2; /* Must be in same position as datctl */ + uint32_t jitstack; /* Must be in same position as datctl */ + uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ + uint32_t substitute_skip; /* Must be in same position as datctl */ + uint32_t substitute_stop; /* Must be in same position as datctl */ + uint32_t jit; + uint32_t stackguard_test; + uint32_t tables_id; + uint32_t convert_type; + uint32_t convert_length; + uint32_t convert_glob_escape; + uint32_t convert_glob_separator; + uint32_t regerror_buffsize; + uint8_t locale[LOCALESIZE]; +} patctl; + +#define MAXCPYGET 10 +#define LENCPYGET 64 + +typedef struct datctl { /* Structure for data line modifiers. */ + uint32_t options; /* Must be in same position as patctl */ + uint32_t control; /* Must be in same position as patctl */ + uint32_t control2; /* Must be in same position as patctl */ + uint32_t jitstack; /* Must be in same position as patctl */ + uint8_t replacement[REPLACE_MODSIZE]; /* So must this */ + uint32_t substitute_skip; /* Must be in same position as patctl */ + uint32_t substitute_stop; /* Must be in same position as patctl */ + uint32_t startend[2]; + uint32_t cerror[2]; + uint32_t cfail[2]; + int32_t callout_data; + int32_t copy_numbers[MAXCPYGET]; + int32_t get_numbers[MAXCPYGET]; + uint32_t oveccount; + uint32_t offset; + uint8_t copy_names[LENCPYGET]; + uint8_t get_names[LENCPYGET]; +} datctl; + +/* Ids for which context to modify.
*/ + +enum { CTX_PAT, /* Active pattern context */ + CTX_POPPAT, /* Ditto, for a popped pattern */ + CTX_DEFPAT, /* Default pattern context */ + CTX_DAT, /* Active data (match) context */ + CTX_DEFDAT }; /* Default data (match) context */ + +/* Macros to simplify the big table below. */ + +#define CO(name) offsetof(PCRE2_REAL_COMPILE_CONTEXT, name) +#define MO(name) offsetof(PCRE2_REAL_MATCH_CONTEXT, name) +#define PO(name) offsetof(patctl, name) +#define PD(name) PO(name) +#define DO(name) offsetof(datctl, name) + +/* Table of all long-form modifiers. Must be in collating sequence of modifier +name because it is searched by binary chop. */ + +typedef struct modstruct { + const char *name; + uint16_t which; + uint16_t type; + uint32_t value; + PCRE2_SIZE offset; +} modstruct; + +static modstruct modlist[] = { + { "aftertext", MOD_PNDP, MOD_CTL, CTL_AFTERTEXT, PO(control) }, + { "allaftertext", MOD_PNDP, MOD_CTL, CTL_ALLAFTERTEXT, PO(control) }, + { "allcaptures", MOD_PND, MOD_CTL, CTL_ALLCAPTURES, PO(control) }, + { "allow_empty_class", MOD_PAT, MOD_OPT, PCRE2_ALLOW_EMPTY_CLASS, PO(options) }, + { "allow_surrogate_escapes", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES, CO(extra_options) }, + { "allusedtext", MOD_PNDP, MOD_CTL, CTL_ALLUSEDTEXT, PO(control) }, + { "allvector", MOD_PND, MOD_CTL, CTL2_ALLVECTOR, PO(control2) }, + { "alt_bsux", MOD_PAT, MOD_OPT, PCRE2_ALT_BSUX, PO(options) }, + { "alt_circumflex", MOD_PAT, MOD_OPT, PCRE2_ALT_CIRCUMFLEX, PO(options) }, + { "alt_verbnames", MOD_PAT, MOD_OPT, PCRE2_ALT_VERBNAMES, PO(options) }, + { "altglobal", MOD_PND, MOD_CTL, CTL_ALTGLOBAL, PO(control) }, + { "anchored", MOD_PD, MOD_OPT, PCRE2_ANCHORED, PD(options) }, + { "auto_callout", MOD_PAT, MOD_OPT, PCRE2_AUTO_CALLOUT, PO(options) }, + { "bad_escape_is_literal", MOD_CTC, MOD_OPT, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL, CO(extra_options) }, + { "bincode", MOD_PAT, MOD_CTL, CTL_BINCODE, PO(control) }, + { "bsr", MOD_CTC, MOD_BSR, 0, CO(bsr_convention) }, + { 
"callout_capture", MOD_DAT, MOD_CTL, CTL_CALLOUT_CAPTURE, DO(control) }, + { "callout_data", MOD_DAT, MOD_INS, 0, DO(callout_data) }, + { "callout_error", MOD_DAT, MOD_IN2, 0, DO(cerror) }, + { "callout_extra", MOD_DAT, MOD_CTL, CTL2_CALLOUT_EXTRA, DO(control2) }, + { "callout_fail", MOD_DAT, MOD_IN2, 0, DO(cfail) }, + { "callout_info", MOD_PAT, MOD_CTL, CTL_CALLOUT_INFO, PO(control) }, + { "callout_no_where", MOD_DAT, MOD_CTL, CTL2_CALLOUT_NO_WHERE, DO(control2) }, + { "callout_none", MOD_DAT, MOD_CTL, CTL_CALLOUT_NONE, DO(control) }, + { "caseless", MOD_PATP, MOD_OPT, PCRE2_CASELESS, PO(options) }, + { "convert", MOD_PAT, MOD_CON, 0, PO(convert_type) }, + { "convert_glob_escape", MOD_PAT, MOD_CHR, 0, PO(convert_glob_escape) }, + { "convert_glob_separator", MOD_PAT, MOD_CHR, 0, PO(convert_glob_separator) }, + { "convert_length", MOD_PAT, MOD_INT, 0, PO(convert_length) }, + { "copy", MOD_DAT, MOD_NN, DO(copy_numbers), DO(copy_names) }, + { "copy_matched_subject", MOD_DAT, MOD_OPT, PCRE2_COPY_MATCHED_SUBJECT, DO(options) }, + { "debug", MOD_PAT, MOD_CTL, CTL_DEBUG, PO(control) }, + { "depth_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) }, + { "dfa", MOD_DAT, MOD_CTL, CTL_DFA, DO(control) }, + { "dfa_restart", MOD_DAT, MOD_OPT, PCRE2_DFA_RESTART, DO(options) }, + { "dfa_shortest", MOD_DAT, MOD_OPT, PCRE2_DFA_SHORTEST, DO(options) }, + { "dollar_endonly", MOD_PAT, MOD_OPT, PCRE2_DOLLAR_ENDONLY, PO(options) }, + { "dotall", MOD_PATP, MOD_OPT, PCRE2_DOTALL, PO(options) }, + { "dupnames", MOD_PATP, MOD_OPT, PCRE2_DUPNAMES, PO(options) }, + { "endanchored", MOD_PD, MOD_OPT, PCRE2_ENDANCHORED, PD(options) }, + { "escaped_cr_is_lf", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ESCAPED_CR_IS_LF, CO(extra_options) }, + { "expand", MOD_PAT, MOD_CTL, CTL_EXPAND, PO(control) }, + { "extended", MOD_PATP, MOD_OPT, PCRE2_EXTENDED, PO(options) }, + { "extended_more", MOD_PATP, MOD_OPT, PCRE2_EXTENDED_MORE, PO(options) }, + { "extra_alt_bsux", MOD_CTC, MOD_OPT, PCRE2_EXTRA_ALT_BSUX, 
CO(extra_options) }, + { "find_limits", MOD_DAT, MOD_CTL, CTL_FINDLIMITS, DO(control) }, + { "firstline", MOD_PAT, MOD_OPT, PCRE2_FIRSTLINE, PO(options) }, + { "framesize", MOD_PAT, MOD_CTL, CTL_FRAMESIZE, PO(control) }, + { "fullbincode", MOD_PAT, MOD_CTL, CTL_FULLBINCODE, PO(control) }, + { "get", MOD_DAT, MOD_NN, DO(get_numbers), DO(get_names) }, + { "getall", MOD_DAT, MOD_CTL, CTL_GETALL, DO(control) }, + { "global", MOD_PNDP, MOD_CTL, CTL_GLOBAL, PO(control) }, + { "heap_limit", MOD_CTM, MOD_INT, 0, MO(heap_limit) }, + { "hex", MOD_PAT, MOD_CTL, CTL_HEXPAT, PO(control) }, + { "info", MOD_PAT, MOD_CTL, CTL_INFO, PO(control) }, + { "jit", MOD_PAT, MOD_IND, 7, PO(jit) }, + { "jitfast", MOD_PAT, MOD_CTL, CTL_JITFAST, PO(control) }, + { "jitstack", MOD_PNDP, MOD_INT, 0, PO(jitstack) }, + { "jitverify", MOD_PAT, MOD_CTL, CTL_JITVERIFY, PO(control) }, + { "literal", MOD_PAT, MOD_OPT, PCRE2_LITERAL, PO(options) }, + { "locale", MOD_PAT, MOD_STR, LOCALESIZE, PO(locale) }, + { "mark", MOD_PNDP, MOD_CTL, CTL_MARK, PO(control) }, + { "match_invalid_utf", MOD_PAT, MOD_OPT, PCRE2_MATCH_INVALID_UTF, PO(options) }, + { "match_limit", MOD_CTM, MOD_INT, 0, MO(match_limit) }, + { "match_line", MOD_CTC, MOD_OPT, PCRE2_EXTRA_MATCH_LINE, CO(extra_options) }, + { "match_unset_backref", MOD_PAT, MOD_OPT, PCRE2_MATCH_UNSET_BACKREF, PO(options) }, + { "match_word", MOD_CTC, MOD_OPT, PCRE2_EXTRA_MATCH_WORD, CO(extra_options) }, + { "max_pattern_length", MOD_CTC, MOD_SIZ, 0, CO(max_pattern_length) }, + { "memory", MOD_PD, MOD_CTL, CTL_MEMORY, PD(control) }, + { "multiline", MOD_PATP, MOD_OPT, PCRE2_MULTILINE, PO(options) }, + { "never_backslash_c", MOD_PAT, MOD_OPT, PCRE2_NEVER_BACKSLASH_C, PO(options) }, + { "never_ucp", MOD_PAT, MOD_OPT, PCRE2_NEVER_UCP, PO(options) }, + { "never_utf", MOD_PAT, MOD_OPT, PCRE2_NEVER_UTF, PO(options) }, + { "newline", MOD_CTC, MOD_NL, 0, CO(newline_convention) }, + { "no_auto_capture", MOD_PAT, MOD_OPT, PCRE2_NO_AUTO_CAPTURE, PO(options) }, + { 
"no_auto_possess", MOD_PATP, MOD_OPT, PCRE2_NO_AUTO_POSSESS, PO(options) }, + { "no_dotstar_anchor", MOD_PAT, MOD_OPT, PCRE2_NO_DOTSTAR_ANCHOR, PO(options) }, + { "no_jit", MOD_DAT, MOD_OPT, PCRE2_NO_JIT, DO(options) }, + { "no_start_optimize", MOD_PATP, MOD_OPT, PCRE2_NO_START_OPTIMIZE, PO(options) }, + { "no_utf_check", MOD_PD, MOD_OPT, PCRE2_NO_UTF_CHECK, PD(options) }, + { "notbol", MOD_DAT, MOD_OPT, PCRE2_NOTBOL, DO(options) }, + { "notempty", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY, DO(options) }, + { "notempty_atstart", MOD_DAT, MOD_OPT, PCRE2_NOTEMPTY_ATSTART, DO(options) }, + { "noteol", MOD_DAT, MOD_OPT, PCRE2_NOTEOL, DO(options) }, + { "null_context", MOD_PD, MOD_CTL, CTL_NULLCONTEXT, PO(control) }, + { "offset", MOD_DAT, MOD_INT, 0, DO(offset) }, + { "offset_limit", MOD_CTM, MOD_SIZ, 0, MO(offset_limit)}, + { "ovector", MOD_DAT, MOD_INT, 0, DO(oveccount) }, + { "parens_nest_limit", MOD_CTC, MOD_INT, 0, CO(parens_nest_limit) }, + { "partial_hard", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) }, + { "partial_soft", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) }, + { "ph", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_HARD, DO(options) }, + { "posix", MOD_PAT, MOD_CTL, CTL_POSIX, PO(control) }, + { "posix_nosub", MOD_PAT, MOD_CTL, CTL_POSIX|CTL_POSIX_NOSUB, PO(control) }, + { "posix_startend", MOD_DAT, MOD_IN2, 0, DO(startend) }, + { "ps", MOD_DAT, MOD_OPT, PCRE2_PARTIAL_SOFT, DO(options) }, + { "push", MOD_PAT, MOD_CTL, CTL_PUSH, PO(control) }, + { "pushcopy", MOD_PAT, MOD_CTL, CTL_PUSHCOPY, PO(control) }, + { "pushtablescopy", MOD_PAT, MOD_CTL, CTL_PUSHTABLESCOPY, PO(control) }, + { "recursion_limit", MOD_CTM, MOD_INT, 0, MO(depth_limit) }, /* Obsolete synonym */ + { "regerror_buffsize", MOD_PAT, MOD_INT, 0, PO(regerror_buffsize) }, + { "replace", MOD_PND, MOD_STR, REPLACE_MODSIZE, PO(replacement) }, + { "stackguard", MOD_PAT, MOD_INT, 0, PO(stackguard_test) }, + { "startchar", MOD_PND, MOD_CTL, CTL_STARTCHAR, PO(control) }, + { "startoffset", MOD_DAT, MOD_INT, 0, 
DO(offset) }, + { "subject_literal", MOD_PATP, MOD_CTL, CTL2_SUBJECT_LITERAL, PO(control2) }, + { "substitute_callout", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_CALLOUT, PO(control2) }, + { "substitute_extended", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_EXTENDED, PO(control2) }, + { "substitute_literal", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_LITERAL, PO(control2) }, + { "substitute_matched", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_MATCHED, PO(control2) }, + { "substitute_overflow_length", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_OVERFLOW_LENGTH, PO(control2) }, + { "substitute_replacement_only", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_REPLACEMENT_ONLY, PO(control2) }, + { "substitute_skip", MOD_PND, MOD_INT, 0, PO(substitute_skip) }, + { "substitute_stop", MOD_PND, MOD_INT, 0, PO(substitute_stop) }, + { "substitute_unknown_unset", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNKNOWN_UNSET, PO(control2) }, + { "substitute_unset_empty", MOD_PND, MOD_CTL, CTL2_SUBSTITUTE_UNSET_EMPTY, PO(control2) }, + { "tables", MOD_PAT, MOD_INT, 0, PO(tables_id) }, + { "ucp", MOD_PATP, MOD_OPT, PCRE2_UCP, PO(options) }, + { "ungreedy", MOD_PAT, MOD_OPT, PCRE2_UNGREEDY, PO(options) }, + { "use_length", MOD_PAT, MOD_CTL, CTL_USE_LENGTH, PO(control) }, + { "use_offset_limit", MOD_PAT, MOD_OPT, PCRE2_USE_OFFSET_LIMIT, PO(options) }, + { "utf", MOD_PATP, MOD_OPT, PCRE2_UTF, PO(options) }, + { "utf8_input", MOD_PAT, MOD_CTL, CTL_UTF8_INPUT, PO(control) }, + { "zero_terminate", MOD_DAT, MOD_CTL, CTL_ZERO_TERMINATE, DO(control) } +}; + +#define MODLISTCOUNT sizeof(modlist)/sizeof(modstruct) + +/* Controls and options that are supported for use with the POSIX interface. 
*/ + +#define POSIX_SUPPORTED_COMPILE_OPTIONS ( \ + PCRE2_CASELESS|PCRE2_DOTALL|PCRE2_LITERAL|PCRE2_MULTILINE|PCRE2_UCP| \ + PCRE2_UTF|PCRE2_UNGREEDY) + +#define POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS (0) + +#define POSIX_SUPPORTED_COMPILE_CONTROLS ( \ + CTL_AFTERTEXT|CTL_ALLAFTERTEXT|CTL_EXPAND|CTL_HEXPAT|CTL_POSIX| \ + CTL_POSIX_NOSUB|CTL_USE_LENGTH) + +#define POSIX_SUPPORTED_COMPILE_CONTROLS2 (0) + +#define POSIX_SUPPORTED_MATCH_OPTIONS ( \ + PCRE2_NOTBOL|PCRE2_NOTEMPTY|PCRE2_NOTEOL) + +#define POSIX_SUPPORTED_MATCH_CONTROLS (CTL_AFTERTEXT|CTL_ALLAFTERTEXT) +#define POSIX_SUPPORTED_MATCH_CONTROLS2 (0) + +/* Control bits that are not ignored with 'push'. */ + +#define PUSH_SUPPORTED_COMPILE_CONTROLS ( \ + CTL_BINCODE|CTL_CALLOUT_INFO|CTL_FULLBINCODE|CTL_HEXPAT|CTL_INFO| \ + CTL_JITVERIFY|CTL_MEMORY|CTL_FRAMESIZE|CTL_PUSH|CTL_PUSHCOPY| \ + CTL_PUSHTABLESCOPY|CTL_USE_LENGTH) + +#define PUSH_SUPPORTED_COMPILE_CONTROLS2 (CTL2_BSR_SET|CTL2_NL_SET) + +/* Controls that apply only at compile time with 'push'. */ + +#define PUSH_COMPILE_ONLY_CONTROLS CTL_JITVERIFY +#define PUSH_COMPILE_ONLY_CONTROLS2 (0) + +/* Controls that are forbidden with #pop or #popcopy. */ + +#define NOTPOP_CONTROLS (CTL_HEXPAT|CTL_POSIX|CTL_POSIX_NOSUB|CTL_PUSH| \ + CTL_PUSHCOPY|CTL_PUSHTABLESCOPY|CTL_USE_LENGTH) + +/* Pattern controls that are mutually exclusive. At present these are all in +the first control word. Note that CTL_POSIX_NOSUB is always accompanied by +CTL_POSIX, so it doesn't need its own entries. */ + +static uint32_t exclusive_pat_controls[] = { + CTL_POSIX | CTL_PUSH, + CTL_POSIX | CTL_PUSHCOPY, + CTL_POSIX | CTL_PUSHTABLESCOPY, + CTL_PUSH | CTL_PUSHCOPY, + CTL_PUSH | CTL_PUSHTABLESCOPY, + CTL_PUSHCOPY | CTL_PUSHTABLESCOPY, + CTL_EXPAND | CTL_HEXPAT }; + +/* Data controls that are mutually exclusive. At present these are all in the +first control word. 
*/ + +static uint32_t exclusive_dat_controls[] = { + CTL_ALLUSEDTEXT | CTL_STARTCHAR, + CTL_FINDLIMITS | CTL_NULLCONTEXT }; + +/* Table of single-character abbreviated modifiers. The index field is +initialized to -1, but the first time the modifier is encountered, it is filled +in with the index of the full entry in modlist, to save repeated searching when +processing multiple test items. This short list is searched serially, so its +order does not matter. */ + +typedef struct c1modstruct { + const char *fullname; + uint32_t onechar; + int index; +} c1modstruct; + +static c1modstruct c1modlist[] = { + { "bincode", 'B', -1 }, + { "info", 'I', -1 }, + { "global", 'g', -1 }, + { "caseless", 'i', -1 }, + { "multiline", 'm', -1 }, + { "no_auto_capture", 'n', -1 }, + { "dotall", 's', -1 }, + { "extended", 'x', -1 } +}; + +#define C1MODLISTCOUNT sizeof(c1modlist)/sizeof(c1modstruct) + +/* Table of arguments for the -C command line option. Use macros to make the +table itself easier to read. */ + +#if defined SUPPORT_PCRE2_8 +#define SUPPORT_8 1 +#endif +#if defined SUPPORT_PCRE2_16 +#define SUPPORT_16 1 +#endif +#if defined SUPPORT_PCRE2_32 +#define SUPPORT_32 1 +#endif + +#ifndef SUPPORT_8 +#define SUPPORT_8 0 +#endif +#ifndef SUPPORT_16 +#define SUPPORT_16 0 +#endif +#ifndef SUPPORT_32 +#define SUPPORT_32 0 +#endif + +#ifdef EBCDIC +#define SUPPORT_EBCDIC 1 +#define EBCDIC_NL CHAR_LF +#else +#define SUPPORT_EBCDIC 0 +#define EBCDIC_NL 0 +#endif + +#ifdef NEVER_BACKSLASH_C +#define BACKSLASH_C 0 +#else +#define BACKSLASH_C 1 +#endif + +typedef struct coptstruct { + const char *name; + uint32_t type; + uint32_t value; +} coptstruct; + +enum { CONF_BSR, + CONF_FIX, + CONF_FIZ, + CONF_INT, + CONF_NL +}; + +static coptstruct coptlist[] = { + { "backslash-C", CONF_FIX, BACKSLASH_C }, + { "bsr", CONF_BSR, PCRE2_CONFIG_BSR }, + { "ebcdic", CONF_FIX, SUPPORT_EBCDIC }, + { "ebcdic-nl", CONF_FIZ, EBCDIC_NL }, + { "jit", CONF_INT, PCRE2_CONFIG_JIT }, + { "linksize", CONF_INT, 
PCRE2_CONFIG_LINKSIZE }, + { "newline", CONF_NL, PCRE2_CONFIG_NEWLINE }, + { "pcre2-16", CONF_FIX, SUPPORT_16 }, + { "pcre2-32", CONF_FIX, SUPPORT_32 }, + { "pcre2-8", CONF_FIX, SUPPORT_8 }, + { "unicode", CONF_INT, PCRE2_CONFIG_UNICODE } +}; + +#define COPTLISTCOUNT sizeof(coptlist)/sizeof(coptstruct) + +#undef SUPPORT_8 +#undef SUPPORT_16 +#undef SUPPORT_32 +#undef SUPPORT_EBCDIC + + +/* ----------------------- Static variables ------------------------ */ + +static FILE *infile; +static FILE *outfile; + +static const void *last_callout_mark; +static PCRE2_JIT_STACK *jit_stack = NULL; +static size_t jit_stack_size = 0; + +static BOOL first_callout; +static BOOL jit_was_used; +static BOOL restrict_for_perl_test = FALSE; +static BOOL show_memory = FALSE; + +static int code_unit_size; /* Bytes */ +static int jitrc; /* Return from JIT compile */ +static int test_mode = DEFAULT_TEST_MODE; +static int timeit = 0; +static int timeitm = 0; + +clock_t total_compile_time = 0; +clock_t total_jit_compile_time = 0; +clock_t total_match_time = 0; + +static uint32_t dfa_matched; +static uint32_t forbid_utf = 0; +static uint32_t maxlookbehind; +static uint32_t max_oveccount; +static uint32_t callout_count; +static uint32_t maxcapcount; + +static uint16_t local_newline_default = 0; + +static VERSION_TYPE jittarget[VERSION_SIZE]; +static VERSION_TYPE version[VERSION_SIZE]; +static VERSION_TYPE uversion[VERSION_SIZE]; + +static patctl def_patctl; +static patctl pat_patctl; +static datctl def_datctl; +static datctl dat_datctl; + +static void *patstack[PATSTACKSIZE]; +static int patstacknext = 0; + +static void *malloclist[MALLOCLISTSIZE]; +static PCRE2_SIZE malloclistlength[MALLOCLISTSIZE]; +static uint32_t malloclistptr = 0; + +#ifdef SUPPORT_PCRE2_8 +static regex_t preg = { NULL, NULL, 0, 0, 0, 0 }; +#endif + +static int *dfa_workspace = NULL; +static const uint8_t *locale_tables = NULL; +static const uint8_t *use_tables = NULL; +static uint8_t locale_name[32]; +static uint8_t 
*tables3 = NULL; /* For binary-loaded tables */ +static uint32_t loadtables_length = 0; + +/* We need buffers for building 16/32-bit strings; 8-bit strings don't need +rebuilding, but set up the same naming scheme for use in macros. The "buffer" +buffer is where all input lines are read. Its size is the same as pbuffer8. +Pattern lines are always copied to pbuffer8 for use in callouts, even if they +are actually compiled from pbuffer16 or pbuffer32. */ + +static size_t pbuffer8_size = 50000; /* Initial size, bytes */ +static uint8_t *pbuffer8 = NULL; +static uint8_t *buffer = NULL; + +/* The dbuffer is where all processed data lines are put. In non-8-bit modes it +is cast as needed. For long data lines it grows as necessary. */ + +static size_t dbuffer_size = 1u << 14; /* Initial size, bytes */ +static uint8_t *dbuffer = NULL; + + +/* ---------------- Mode-dependent variables -------------------*/ + +#ifdef SUPPORT_PCRE2_8 +static pcre2_code_8 *compiled_code8; +static pcre2_general_context_8 *general_context8, *general_context_copy8; +static pcre2_compile_context_8 *pat_context8, *default_pat_context8; +static pcre2_convert_context_8 *con_context8, *default_con_context8; +static pcre2_match_context_8 *dat_context8, *default_dat_context8; +static pcre2_match_data_8 *match_data8; +#endif + +#ifdef SUPPORT_PCRE2_16 +static pcre2_code_16 *compiled_code16; +static pcre2_general_context_16 *general_context16, *general_context_copy16; +static pcre2_compile_context_16 *pat_context16, *default_pat_context16; +static pcre2_convert_context_16 *con_context16, *default_con_context16; +static pcre2_match_context_16 *dat_context16, *default_dat_context16; +static pcre2_match_data_16 *match_data16; +static PCRE2_SIZE pbuffer16_size = 0; /* Set only when needed */ +static uint16_t *pbuffer16 = NULL; +#endif + +#ifdef SUPPORT_PCRE2_32 +static pcre2_code_32 *compiled_code32; +static pcre2_general_context_32 *general_context32, *general_context_copy32; +static pcre2_compile_context_32 
*pat_context32, *default_pat_context32; +static pcre2_convert_context_32 *con_context32, *default_con_context32; +static pcre2_match_context_32 *dat_context32, *default_dat_context32; +static pcre2_match_data_32 *match_data32; +static PCRE2_SIZE pbuffer32_size = 0; /* Set only when needed */ +static uint32_t *pbuffer32 = NULL; +#endif + + +/* ---------------- Macros that work in all modes ----------------- */ + +#define CAST8VAR(x) CASTVAR(uint8_t *, x) +#define SET(x,y) SETOP(x,y,=) +#define SETPLUS(x,y) SETOP(x,y,+=) +#define strlen8(x) strlen((char *)x) + + +/* ---------------- Mode-dependent, runtime-testing macros ------------------*/ + +/* Define macros for variables and functions that must be selected dynamically +depending on the mode setting (8, 16, 32). These are dependent on which modes +are supported. */ + +#if (defined (SUPPORT_PCRE2_8) + defined (SUPPORT_PCRE2_16) + \ + defined (SUPPORT_PCRE2_32)) >= 2 + +/* ----- All three modes supported ----- */ + +#if defined(SUPPORT_PCRE2_8) && defined(SUPPORT_PCRE2_16) && defined(SUPPORT_PCRE2_32) + +#define CASTFLD(t,a,b) ((test_mode == PCRE8_MODE)? (t)(G(a,8)->b) : \ + (test_mode == PCRE16_MODE)? (t)(G(a,16)->b) : (t)(G(a,32)->b)) + +#define CASTVAR(t,x) ( \ + (test_mode == PCRE8_MODE)? (t)G(x,8) : \ + (test_mode == PCRE16_MODE)? (t)G(x,16) : (t)G(x,32)) + +#define CODE_UNIT(a,b) ( \ + (test_mode == PCRE8_MODE)? (uint32_t)(((PCRE2_SPTR8)(a))[b]) : \ + (test_mode == PCRE16_MODE)? 
(uint32_t)(((PCRE2_SPTR16)(a))[b]) : \ + (uint32_t)(((PCRE2_SPTR32)(a))[b])) + +#define CONCTXCPY(a,b) \ + if (test_mode == PCRE8_MODE) \ + memcpy(G(a,8),G(b,8),sizeof(pcre2_convert_context_8)); \ + else if (test_mode == PCRE16_MODE) \ + memcpy(G(a,16),G(b,16),sizeof(pcre2_convert_context_16)); \ + else memcpy(G(a,32),G(b,32),sizeof(pcre2_convert_context_32)) + +#define CONVERT_COPY(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + memcpy(G(a,8),(char *)b,c); \ + else if (test_mode == PCRE16_MODE) \ + memcpy(G(a,16),(char *)b,(c)*2); \ + else if (test_mode == PCRE32_MODE) \ + memcpy(G(a,32),(char *)b,(c)*4) + +#define DATCTXCPY(a,b) \ + if (test_mode == PCRE8_MODE) \ + memcpy(G(a,8),G(b,8),sizeof(pcre2_match_context_8)); \ + else if (test_mode == PCRE16_MODE) \ + memcpy(G(a,16),G(b,16),sizeof(pcre2_match_context_16)); \ + else memcpy(G(a,32),G(b,32),sizeof(pcre2_match_context_32)) + +#define FLD(a,b) ((test_mode == PCRE8_MODE)? G(a,8)->b : \ + (test_mode == PCRE16_MODE)? G(a,16)->b : G(a,32)->b) + +#define PATCTXCPY(a,b) \ + if (test_mode == PCRE8_MODE) \ + memcpy(G(a,8),G(b,8),sizeof(pcre2_compile_context_8)); \ + else if (test_mode == PCRE16_MODE) \ + memcpy(G(a,16),G(b,16),sizeof(pcre2_compile_context_16)); \ + else memcpy(G(a,32),G(b,32),sizeof(pcre2_compile_context_32)) + +#define PCHARS(lv, p, offset, len, utf, f) \ + if (test_mode == PCRE32_MODE) \ + lv = pchars32((PCRE2_SPTR32)(p)+offset, len, utf, f); \ + else if (test_mode == PCRE16_MODE) \ + lv = pchars16((PCRE2_SPTR16)(p)+offset, len, utf, f); \ + else \ + lv = pchars8((PCRE2_SPTR8)(p)+offset, len, utf, f) + +#define PCHARSV(p, offset, len, utf, f) \ + if (test_mode == PCRE32_MODE) \ + (void)pchars32((PCRE2_SPTR32)(p)+offset, len, utf, f); \ + else if (test_mode == PCRE16_MODE) \ + (void)pchars16((PCRE2_SPTR16)(p)+offset, len, utf, f); \ + else \ + (void)pchars8((PCRE2_SPTR8)(p)+offset, len, utf, f) + +#define PCRE2_CALLOUT_ENUMERATE(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + a = 
pcre2_callout_enumerate_8(compiled_code8, \ + (int (*)(struct pcre2_callout_enumerate_block_8 *, void *))b,c); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_callout_enumerate_16(compiled_code16, \ + (int(*)(struct pcre2_callout_enumerate_block_16 *, void *))b,c); \ + else \ + a = pcre2_callout_enumerate_32(compiled_code32, \ + (int (*)(struct pcre2_callout_enumerate_block_32 *, void *))b,c) + +#define PCRE2_CODE_COPY_FROM_VOID(a,b) \ + if (test_mode == PCRE8_MODE) \ + G(a,8) = pcre2_code_copy_8(b); \ + else if (test_mode == PCRE16_MODE) \ + G(a,16) = pcre2_code_copy_16(b); \ + else \ + G(a,32) = pcre2_code_copy_32(b) + +#define PCRE2_CODE_COPY_TO_VOID(a,b) \ + if (test_mode == PCRE8_MODE) \ + a = (void *)pcre2_code_copy_8(G(b,8)); \ + else if (test_mode == PCRE16_MODE) \ + a = (void *)pcre2_code_copy_16(G(b,16)); \ + else \ + a = (void *)pcre2_code_copy_32(G(b,32)) + +#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) \ + if (test_mode == PCRE8_MODE) \ + a = (void *)pcre2_code_copy_with_tables_8(G(b,8)); \ + else if (test_mode == PCRE16_MODE) \ + a = (void *)pcre2_code_copy_with_tables_16(G(b,16)); \ + else \ + a = (void *)pcre2_code_copy_with_tables_32(G(b,32)) + +#define PCRE2_COMPILE(a,b,c,d,e,f,g) \ + if (test_mode == PCRE8_MODE) \ + G(a,8) = pcre2_compile_8(G(b,8),c,d,e,f,g); \ + else if (test_mode == PCRE16_MODE) \ + G(a,16) = pcre2_compile_16(G(b,16),c,d,e,f,g); \ + else \ + G(a,32) = pcre2_compile_32(G(b,32),c,d,e,f,g) + +#define PCRE2_CONVERTED_PATTERN_FREE(a) \ + if (test_mode == PCRE8_MODE) pcre2_converted_pattern_free_8((PCRE2_UCHAR8 *)a); \ + else if (test_mode == PCRE16_MODE) pcre2_converted_pattern_free_16((PCRE2_UCHAR16 *)a); \ + else pcre2_converted_pattern_free_32((PCRE2_UCHAR32 *)a) + +#define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_dfa_match_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h,i,j); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_dfa_match_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h,i,j); \ 
+ else \ + a = pcre2_dfa_match_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h,i,j) + +#define PCRE2_GET_ERROR_MESSAGE(r,a,b) \ + if (test_mode == PCRE8_MODE) \ + r = pcre2_get_error_message_8(a,G(b,8),G(G(b,8),_size)); \ + else if (test_mode == PCRE16_MODE) \ + r = pcre2_get_error_message_16(a,G(b,16),G(G(b,16),_size/2)); \ + else \ + r = pcre2_get_error_message_32(a,G(b,32),G(G(b,32),_size/4)) + +#define PCRE2_GET_OVECTOR_COUNT(a,b) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_get_ovector_count_8(G(b,8)); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_get_ovector_count_16(G(b,16)); \ + else \ + a = pcre2_get_ovector_count_32(G(b,32)) + +#define PCRE2_GET_STARTCHAR(a,b) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_get_startchar_8(G(b,8)); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_get_startchar_16(G(b,16)); \ + else \ + a = pcre2_get_startchar_32(G(b,32)) + +#define PCRE2_JIT_COMPILE(r,a,b) \ + if (test_mode == PCRE8_MODE) r = pcre2_jit_compile_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) r = pcre2_jit_compile_16(G(a,16),b); \ + else r = pcre2_jit_compile_32(G(a,32),b) + +#define PCRE2_JIT_FREE_UNUSED_MEMORY(a) \ + if (test_mode == PCRE8_MODE) pcre2_jit_free_unused_memory_8(G(a,8)); \ + else if (test_mode == PCRE16_MODE) pcre2_jit_free_unused_memory_16(G(a,16)); \ + else pcre2_jit_free_unused_memory_32(G(a,32)) + +#define PCRE2_JIT_MATCH(a,b,c,d,e,f,g,h) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_jit_match_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_jit_match_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h); \ + else \ + a = pcre2_jit_match_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h) + +#define PCRE2_JIT_STACK_CREATE(a,b,c,d) \ + if (test_mode == PCRE8_MODE) \ + a = (PCRE2_JIT_STACK *)pcre2_jit_stack_create_8(b,c,d); \ + else if (test_mode == PCRE16_MODE) \ + a = (PCRE2_JIT_STACK *)pcre2_jit_stack_create_16(b,c,d); \ + else \ + a = (PCRE2_JIT_STACK *)pcre2_jit_stack_create_32(b,c,d); + +#define 
PCRE2_JIT_STACK_ASSIGN(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + pcre2_jit_stack_assign_8(G(a,8),(pcre2_jit_callback_8)b,c); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_jit_stack_assign_16(G(a,16),(pcre2_jit_callback_16)b,c); \ + else \ + pcre2_jit_stack_assign_32(G(a,32),(pcre2_jit_callback_32)b,c); + +#define PCRE2_JIT_STACK_FREE(a) \ + if (test_mode == PCRE8_MODE) \ + pcre2_jit_stack_free_8((pcre2_jit_stack_8 *)a); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_jit_stack_free_16((pcre2_jit_stack_16 *)a); \ + else \ + pcre2_jit_stack_free_32((pcre2_jit_stack_32 *)a); + +#define PCRE2_MAKETABLES(a) \ + if (test_mode == PCRE8_MODE) a = pcre2_maketables_8(NULL); \ + else if (test_mode == PCRE16_MODE) a = pcre2_maketables_16(NULL); \ + else a = pcre2_maketables_32(NULL) + +#define PCRE2_MATCH(a,b,c,d,e,f,g,h) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_match_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_match_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h); \ + else \ + a = pcre2_match_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h) + +#define PCRE2_MATCH_DATA_CREATE(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + G(a,8) = pcre2_match_data_create_8(b,c); \ + else if (test_mode == PCRE16_MODE) \ + G(a,16) = pcre2_match_data_create_16(b,c); \ + else \ + G(a,32) = pcre2_match_data_create_32(b,c) + +#define PCRE2_MATCH_DATA_CREATE_FROM_PATTERN(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + G(a,8) = pcre2_match_data_create_from_pattern_8(G(b,8),c); \ + else if (test_mode == PCRE16_MODE) \ + G(a,16) = pcre2_match_data_create_from_pattern_16(G(b,16),c); \ + else \ + G(a,32) = pcre2_match_data_create_from_pattern_32(G(b,32),c) + +#define PCRE2_MATCH_DATA_FREE(a) \ + if (test_mode == PCRE8_MODE) \ + pcre2_match_data_free_8(G(a,8)); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_match_data_free_16(G(a,16)); \ + else \ + pcre2_match_data_free_32(G(a,32)) + +#define PCRE2_PATTERN_CONVERT(a,b,c,d,e,f,g) \ + if (test_mode == 
PCRE8_MODE) \ + a = pcre2_pattern_convert_8(G(b,8),c,d,(PCRE2_UCHAR8 **)e,f,G(g,8)); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_pattern_convert_16(G(b,16),c,d,(PCRE2_UCHAR16 **)e,f,G(g,16)); \ + else \ + a = pcre2_pattern_convert_32(G(b,32),c,d,(PCRE2_UCHAR32 **)e,f,G(g,32)) + +#define PCRE2_PATTERN_INFO(a,b,c,d) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_pattern_info_8(G(b,8),c,d); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_pattern_info_16(G(b,16),c,d); \ + else \ + a = pcre2_pattern_info_32(G(b,32),c,d) + +#define PCRE2_PRINTINT(a) \ + if (test_mode == PCRE8_MODE) \ + pcre2_printint_8(compiled_code8,outfile,a); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_printint_16(compiled_code16,outfile,a); \ + else \ + pcre2_printint_32(compiled_code32,outfile,a) + +#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \ + if (test_mode == PCRE8_MODE) \ + r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8)); \ + else if (test_mode == PCRE16_MODE) \ + r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16)); \ + else \ + r = pcre2_serialize_decode_32((pcre2_code_32 **)a,b,c,G(d,32)) + +#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \ + if (test_mode == PCRE8_MODE) \ + r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8)); \ + else if (test_mode == PCRE16_MODE) \ + r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16)); \ + else \ + r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32)) + +#define PCRE2_SERIALIZE_FREE(a) \ + if (test_mode == PCRE8_MODE) \ + pcre2_serialize_free_8(a); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_serialize_free_16(a); \ + else \ + pcre2_serialize_free_32(a) + +#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \ + if (test_mode == PCRE8_MODE) \ + r = pcre2_serialize_get_number_of_codes_8(a); \ + else if (test_mode == PCRE16_MODE) \ + r = pcre2_serialize_get_number_of_codes_16(a); \ + else \ + r = pcre2_serialize_get_number_of_codes_32(a); \ + +#define 
PCRE2_SET_CALLOUT(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_callout_16(G(a,16),(int (*)(pcre2_callout_block_16 *, void *))b,c); \ + else \ + pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c); + +#define PCRE2_SET_CHARACTER_TABLES(a,b) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_character_tables_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_character_tables_16(G(a,16),b); \ + else \ + pcre2_set_character_tables_32(G(a,32),b) + +#define PCRE2_SET_COMPILE_RECURSION_GUARD(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_compile_recursion_guard_8(G(a,8),b,c); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_compile_recursion_guard_16(G(a,16),b,c); \ + else \ + pcre2_set_compile_recursion_guard_32(G(a,32),b,c) + +#define PCRE2_SET_DEPTH_LIMIT(a,b) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_depth_limit_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_depth_limit_16(G(a,16),b); \ + else \ + pcre2_set_depth_limit_32(G(a,32),b) + +#define PCRE2_SET_GLOB_SEPARATOR(r,a,b) \ + if (test_mode == PCRE8_MODE) \ + r = pcre2_set_glob_separator_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + r = pcre2_set_glob_separator_16(G(a,16),b); \ + else \ + r = pcre2_set_glob_separator_32(G(a,32),b) + +#define PCRE2_SET_GLOB_ESCAPE(r,a,b) \ + if (test_mode == PCRE8_MODE) \ + r = pcre2_set_glob_escape_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + r = pcre2_set_glob_escape_16(G(a,16),b); \ + else \ + r = pcre2_set_glob_escape_32(G(a,32),b) + +#define PCRE2_SET_HEAP_LIMIT(a,b) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_heap_limit_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_heap_limit_16(G(a,16),b); \ + else \ + pcre2_set_heap_limit_32(G(a,32),b) + +#define PCRE2_SET_MATCH_LIMIT(a,b) \ + if (test_mode == PCRE8_MODE) \ + 
pcre2_set_match_limit_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_match_limit_16(G(a,16),b); \ + else \ + pcre2_set_match_limit_32(G(a,32),b) + +#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_max_pattern_length_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_max_pattern_length_16(G(a,16),b); \ + else \ + pcre2_set_max_pattern_length_32(G(a,32),b) + +#define PCRE2_SET_OFFSET_LIMIT(a,b) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_offset_limit_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_offset_limit_16(G(a,16),b); \ + else \ + pcre2_set_offset_limit_32(G(a,32),b) + +#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_parens_nest_limit_8(G(a,8),b); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_parens_nest_limit_16(G(a,16),b); \ + else \ + pcre2_set_parens_nest_limit_32(G(a,32),b) + +#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + pcre2_set_substitute_callout_8(G(a,8), \ + (int (*)(pcre2_substitute_callout_block_8 *, void *))b,c); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_set_substitute_callout_16(G(a,16), \ + (int (*)(pcre2_substitute_callout_block_16 *, void *))b,c); \ + else \ + pcre2_set_substitute_callout_32(G(a,32), \ + (int (*)(pcre2_substitute_callout_block_32 *, void *))b,c) + +#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h, \ + (PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h, \ + (PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l); \ + else \ + a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h, \ + (PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l) + +#define PCRE2_SUBSTRING_COPY_BYNAME(a,b,c,d,e) \ + if (test_mode == PCRE8_MODE) \ + a = 
pcre2_substring_copy_byname_8(G(b,8),G(c,8),(PCRE2_UCHAR8 *)d,e); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_copy_byname_16(G(b,16),G(c,16),(PCRE2_UCHAR16 *)d,e); \ + else \ + a = pcre2_substring_copy_byname_32(G(b,32),G(c,32),(PCRE2_UCHAR32 *)d,e) + +#define PCRE2_SUBSTRING_COPY_BYNUMBER(a,b,c,d,e) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substring_copy_bynumber_8(G(b,8),c,(PCRE2_UCHAR8 *)d,e); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_copy_bynumber_16(G(b,16),c,(PCRE2_UCHAR16 *)d,e); \ + else \ + a = pcre2_substring_copy_bynumber_32(G(b,32),c,(PCRE2_UCHAR32 *)d,e) + +#define PCRE2_SUBSTRING_FREE(a) \ + if (test_mode == PCRE8_MODE) pcre2_substring_free_8((PCRE2_UCHAR8 *)a); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_substring_free_16((PCRE2_UCHAR16 *)a); \ + else pcre2_substring_free_32((PCRE2_UCHAR32 *)a) + +#define PCRE2_SUBSTRING_GET_BYNAME(a,b,c,d,e) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substring_get_byname_8(G(b,8),G(c,8),(PCRE2_UCHAR8 **)d,e); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_get_byname_16(G(b,16),G(c,16),(PCRE2_UCHAR16 **)d,e); \ + else \ + a = pcre2_substring_get_byname_32(G(b,32),G(c,32),(PCRE2_UCHAR32 **)d,e) + +#define PCRE2_SUBSTRING_GET_BYNUMBER(a,b,c,d,e) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substring_get_bynumber_8(G(b,8),c,(PCRE2_UCHAR8 **)d,e); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_get_bynumber_16(G(b,16),c,(PCRE2_UCHAR16 **)d,e); \ + else \ + a = pcre2_substring_get_bynumber_32(G(b,32),c,(PCRE2_UCHAR32 **)d,e) + +#define PCRE2_SUBSTRING_LENGTH_BYNAME(a,b,c,d) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substring_length_byname_8(G(b,8),G(c,8),d); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_length_byname_16(G(b,16),G(c,16),d); \ + else \ + a = pcre2_substring_length_byname_32(G(b,32),G(c,32),d) + +#define PCRE2_SUBSTRING_LENGTH_BYNUMBER(a,b,c,d) \ + if (test_mode == PCRE8_MODE) \ + a = 
pcre2_substring_length_bynumber_8(G(b,8),c,d); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_length_bynumber_16(G(b,16),c,d); \ + else \ + a = pcre2_substring_length_bynumber_32(G(b,32),c,d) + +#define PCRE2_SUBSTRING_LIST_GET(a,b,c,d) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substring_list_get_8(G(b,8),(PCRE2_UCHAR8 ***)c,d); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_list_get_16(G(b,16),(PCRE2_UCHAR16 ***)c,d); \ + else \ + a = pcre2_substring_list_get_32(G(b,32),(PCRE2_UCHAR32 ***)c,d) + +#define PCRE2_SUBSTRING_LIST_FREE(a) \ + if (test_mode == PCRE8_MODE) \ + pcre2_substring_list_free_8((PCRE2_SPTR8 *)a); \ + else if (test_mode == PCRE16_MODE) \ + pcre2_substring_list_free_16((PCRE2_SPTR16 *)a); \ + else \ + pcre2_substring_list_free_32((PCRE2_SPTR32 *)a) + +#define PCRE2_SUBSTRING_NUMBER_FROM_NAME(a,b,c) \ + if (test_mode == PCRE8_MODE) \ + a = pcre2_substring_number_from_name_8(G(b,8),G(c,8)); \ + else if (test_mode == PCRE16_MODE) \ + a = pcre2_substring_number_from_name_16(G(b,16),G(c,16)); \ + else \ + a = pcre2_substring_number_from_name_32(G(b,32),G(c,32)) + +#define PTR(x) ( \ + (test_mode == PCRE8_MODE)? (void *)G(x,8) : \ + (test_mode == PCRE16_MODE)? (void *)G(x,16) : \ + (void *)G(x,32)) + +#define SETFLD(x,y,z) \ + if (test_mode == PCRE8_MODE) G(x,8)->y = z; \ + else if (test_mode == PCRE16_MODE) G(x,16)->y = z; \ + else G(x,32)->y = z + +#define SETFLDVEC(x,y,v,z) \ + if (test_mode == PCRE8_MODE) G(x,8)->y[v] = z; \ + else if (test_mode == PCRE16_MODE) G(x,16)->y[v] = z; \ + else G(x,32)->y[v] = z + +#define SETOP(x,y,z) \ + if (test_mode == PCRE8_MODE) G(x,8) z y; \ + else if (test_mode == PCRE16_MODE) G(x,16) z y; \ + else G(x,32) z y + +#define SETCASTPTR(x,y) \ + if (test_mode == PCRE8_MODE) \ + G(x,8) = (uint8_t *)(y); \ + else if (test_mode == PCRE16_MODE) \ + G(x,16) = (uint16_t *)(y); \ + else \ + G(x,32) = (uint32_t *)(y) + +#define STRLEN(p) ((test_mode == PCRE8_MODE)? 
((int)strlen((char *)p)) : \ + (test_mode == PCRE16_MODE)? ((int)strlen16((PCRE2_SPTR16)p)) : \ + ((int)strlen32((PCRE2_SPTR32)p))) + +#define SUB1(a,b) \ + if (test_mode == PCRE8_MODE) G(a,8)(G(b,8)); \ + else if (test_mode == PCRE16_MODE) G(a,16)(G(b,16)); \ + else G(a,32)(G(b,32)) + +#define SUB2(a,b,c) \ + if (test_mode == PCRE8_MODE) G(a,8)(G(b,8),G(c,8)); \ + else if (test_mode == PCRE16_MODE) G(a,16)(G(b,16),G(c,16)); \ + else G(a,32)(G(b,32),G(c,32)) + +#define TEST(x,r,y) ( \ + (test_mode == PCRE8_MODE && G(x,8) r (y)) || \ + (test_mode == PCRE16_MODE && G(x,16) r (y)) || \ + (test_mode == PCRE32_MODE && G(x,32) r (y))) + +#define TESTFLD(x,f,r,y) ( \ + (test_mode == PCRE8_MODE && G(x,8)->f r (y)) || \ + (test_mode == PCRE16_MODE && G(x,16)->f r (y)) || \ + (test_mode == PCRE32_MODE && G(x,32)->f r (y))) + + +/* ----- Two out of three modes are supported ----- */ + +#else + +/* We can use some macro trickery to make a single set of definitions work in +the three different cases. */ + +/* ----- 32-bit and 16-bit but not 8-bit supported ----- */ + +#if defined(SUPPORT_PCRE2_32) && defined(SUPPORT_PCRE2_16) +#define BITONE 32 +#define BITTWO 16 + +/* ----- 32-bit and 8-bit but not 16-bit supported ----- */ + +#elif defined(SUPPORT_PCRE2_32) && defined(SUPPORT_PCRE2_8) +#define BITONE 32 +#define BITTWO 8 + +/* ----- 16-bit and 8-bit but not 32-bit supported ----- */ + +#else +#define BITONE 16 +#define BITTWO 8 +#endif + + +/* ----- Common macros for two-mode cases ----- */ + +#define BYTEONE (BITONE/8) +#define BYTETWO (BITTWO/8) + +#define CASTFLD(t,a,b) \ + ((test_mode == G(G(PCRE,BITONE),_MODE))? (t)(G(a,BITONE)->b) : \ + (t)(G(a,BITTWO)->b)) + +#define CASTVAR(t,x) ( \ + (test_mode == G(G(PCRE,BITONE),_MODE))? \ + (t)G(x,BITONE) : (t)G(x,BITTWO)) + +#define CODE_UNIT(a,b) ( \ + (test_mode == G(G(PCRE,BITONE),_MODE))? 
\ + (uint32_t)(((G(PCRE2_SPTR,BITONE))(a))[b]) : \ + (uint32_t)(((G(PCRE2_SPTR,BITTWO))(a))[b])) + +#define CONCTXCPY(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + memcpy(G(a,BITONE),G(b,BITONE),sizeof(G(pcre2_convert_context_,BITONE))); \ + else \ + memcpy(G(a,BITTWO),G(b,BITTWO),sizeof(G(pcre2_convert_context_,BITTWO))) + +#define CONVERT_COPY(a,b,c) \ + (test_mode == G(G(PCRE,BITONE),_MODE))? \ + memcpy(G(a,BITONE),(char *)b,(c)*BYTEONE) : \ + memcpy(G(a,BITTWO),(char *)b,(c)*BYTETWO) + +#define DATCTXCPY(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + memcpy(G(a,BITONE),G(b,BITONE),sizeof(G(pcre2_match_context_,BITONE))); \ + else \ + memcpy(G(a,BITTWO),G(b,BITTWO),sizeof(G(pcre2_match_context_,BITTWO))) + +#define FLD(a,b) \ + ((test_mode == G(G(PCRE,BITONE),_MODE))? G(a,BITONE)->b : G(a,BITTWO)->b) + +#define PATCTXCPY(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + memcpy(G(a,BITONE),G(b,BITONE),sizeof(G(pcre2_compile_context_,BITONE))); \ + else \ + memcpy(G(a,BITTWO),G(b,BITTWO),sizeof(G(pcre2_compile_context_,BITTWO))) + +#define PCHARS(lv, p, offset, len, utf, f) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + lv = G(pchars,BITONE)((G(PCRE2_SPTR,BITONE))(p)+offset, len, utf, f); \ + else \ + lv = G(pchars,BITTWO)((G(PCRE2_SPTR,BITTWO))(p)+offset, len, utf, f) + +#define PCHARSV(p, offset, len, utf, f) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + (void)G(pchars,BITONE)((G(PCRE2_SPTR,BITONE))(p)+offset, len, utf, f); \ + else \ + (void)G(pchars,BITTWO)((G(PCRE2_SPTR,BITTWO))(p)+offset, len, utf, f) + +#define PCRE2_CALLOUT_ENUMERATE(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_callout_enumerate,BITONE)(G(compiled_code,BITONE), \ + (int (*)(struct G(pcre2_callout_enumerate_block_,BITONE) *, void *))b,c); \ + else \ + a = G(pcre2_callout_enumerate,BITTWO)(G(compiled_code,BITTWO), \ + (int (*)(struct G(pcre2_callout_enumerate_block_,BITTWO) *, void *))b,c) + +#define PCRE2_CODE_COPY_FROM_VOID(a,b) \ + if 
(test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(a,BITONE) = G(pcre2_code_copy_,BITONE)(b); \ + else \ + G(a,BITTWO) = G(pcre2_code_copy_,BITTWO)(b) + +#define PCRE2_CODE_COPY_TO_VOID(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = (void *)G(pcre2_code_copy_,BITONE)(G(b,BITONE)); \ + else \ + a = (void *)G(pcre2_code_copy_,BITTWO)(G(b,BITTWO)) + +#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = (void *)G(pcre2_code_copy_with_tables_,BITONE)(G(b,BITONE)); \ + else \ + a = (void *)G(pcre2_code_copy_with_tables_,BITTWO)(G(b,BITTWO)) + +#define PCRE2_COMPILE(a,b,c,d,e,f,g) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(a,BITONE) = G(pcre2_compile_,BITONE)(G(b,BITONE),c,d,e,f,g); \ + else \ + G(a,BITTWO) = G(pcre2_compile_,BITTWO)(G(b,BITTWO),c,d,e,f,g) + +#define PCRE2_CONVERTED_PATTERN_FREE(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_converted_pattern_free_,BITONE)((G(PCRE2_UCHAR,BITONE) *)a); \ + else \ + G(pcre2_converted_pattern_free_,BITTWO)((G(PCRE2_UCHAR,BITTWO) *)a) + +#define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_dfa_match_,BITONE)(G(b,BITONE),(G(PCRE2_SPTR,BITONE))c,d,e,f, \ + G(g,BITONE),h,i,j); \ + else \ + a = G(pcre2_dfa_match_,BITTWO)(G(b,BITTWO),(G(PCRE2_SPTR,BITTWO))c,d,e,f, \ + G(g,BITTWO),h,i,j) + +#define PCRE2_GET_ERROR_MESSAGE(r,a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_get_error_message_,BITONE)(a,G(b,BITONE),G(G(b,BITONE),_size/BYTEONE)); \ + else \ + r = G(pcre2_get_error_message_,BITTWO)(a,G(b,BITTWO),G(G(b,BITTWO),_size/BYTETWO)) + +#define PCRE2_GET_OVECTOR_COUNT(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_get_ovector_count_,BITONE)(G(b,BITONE)); \ + else \ + a = G(pcre2_get_ovector_count_,BITTWO)(G(b,BITTWO)) + +#define PCRE2_GET_STARTCHAR(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_get_startchar_,BITONE)(G(b,BITONE)); \ + else \ 
+ a = G(pcre2_get_startchar_,BITTWO)(G(b,BITTWO)) + +#define PCRE2_JIT_COMPILE(r,a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_jit_compile_,BITONE)(G(a,BITONE),b); \ + else \ + r = G(pcre2_jit_compile_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_JIT_FREE_UNUSED_MEMORY(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_jit_free_unused_memory_,BITONE)(G(a,BITONE)); \ + else \ + G(pcre2_jit_free_unused_memory_,BITTWO)(G(a,BITTWO)) + +#define PCRE2_JIT_MATCH(a,b,c,d,e,f,g,h) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_jit_match_,BITONE)(G(b,BITONE),(G(PCRE2_SPTR,BITONE))c,d,e,f, \ + G(g,BITONE),h); \ + else \ + a = G(pcre2_jit_match_,BITTWO)(G(b,BITTWO),(G(PCRE2_SPTR,BITTWO))c,d,e,f, \ + G(g,BITTWO),h) + +#define PCRE2_JIT_STACK_CREATE(a,b,c,d) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = (PCRE2_JIT_STACK *)G(pcre2_jit_stack_create_,BITONE)(b,c,d); \ + else \ + a = (PCRE2_JIT_STACK *)G(pcre2_jit_stack_create_,BITTWO)(b,c,d); \ + +#define PCRE2_JIT_STACK_ASSIGN(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_jit_stack_assign_,BITONE)(G(a,BITONE),(G(pcre2_jit_callback_,BITONE))b,c); \ + else \ + G(pcre2_jit_stack_assign_,BITTWO)(G(a,BITTWO),(G(pcre2_jit_callback_,BITTWO))b,c); + +#define PCRE2_JIT_STACK_FREE(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_jit_stack_free_,BITONE)((G(pcre2_jit_stack_,BITONE) *)a); \ + else \ + G(pcre2_jit_stack_free_,BITTWO)((G(pcre2_jit_stack_,BITTWO) *)a); + +#define PCRE2_MAKETABLES(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_maketables_,BITONE)(NULL); \ + else \ + a = G(pcre2_maketables_,BITTWO)(NULL) + +#define PCRE2_MATCH(a,b,c,d,e,f,g,h) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_match_,BITONE)(G(b,BITONE),(G(PCRE2_SPTR,BITONE))c,d,e,f, \ + G(g,BITONE),h); \ + else \ + a = G(pcre2_match_,BITTWO)(G(b,BITTWO),(G(PCRE2_SPTR,BITTWO))c,d,e,f, \ + G(g,BITTWO),h) + +#define PCRE2_MATCH_DATA_CREATE(a,b,c) \ + if 
(test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(a,BITONE) = G(pcre2_match_data_create_,BITONE)(b,c); \ + else \ + G(a,BITTWO) = G(pcre2_match_data_create_,BITTWO)(b,c) + +#define PCRE2_MATCH_DATA_CREATE_FROM_PATTERN(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(a,BITONE) = G(pcre2_match_data_create_from_pattern_,BITONE)(G(b,BITONE),c); \ + else \ + G(a,BITTWO) = G(pcre2_match_data_create_from_pattern_,BITTWO)(G(b,BITTWO),c) + +#define PCRE2_MATCH_DATA_FREE(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_match_data_free_,BITONE)(G(a,BITONE)); \ + else \ + G(pcre2_match_data_free_,BITTWO)(G(a,BITTWO)) + +#define PCRE2_PATTERN_CONVERT(a,b,c,d,e,f,g) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_pattern_convert_,BITONE)(G(b,BITONE),c,d,(G(PCRE2_UCHAR,BITONE) **)e,f,G(g,BITONE)); \ + else \ + a = G(pcre2_pattern_convert_,BITTWO)(G(b,BITTWO),c,d,(G(PCRE2_UCHAR,BITTWO) **)e,f,G(g,BITTWO)) + +#define PCRE2_PATTERN_INFO(a,b,c,d) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_pattern_info_,BITONE)(G(b,BITONE),c,d); \ + else \ + a = G(pcre2_pattern_info_,BITTWO)(G(b,BITTWO),c,d) + +#define PCRE2_PRINTINT(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_printint_,BITONE)(G(compiled_code,BITONE),outfile,a); \ + else \ + G(pcre2_printint_,BITTWO)(G(compiled_code,BITTWO),outfile,a) + +#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_serialize_decode_,BITONE)((G(pcre2_code_,BITONE) **)a,b,c,G(d,BITONE)); \ + else \ + r = G(pcre2_serialize_decode_,BITTWO)((G(pcre2_code_,BITTWO) **)a,b,c,G(d,BITTWO)) + +#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_serialize_encode_,BITONE)((G(const pcre2_code_,BITONE) **)a,b,c,d,G(e,BITONE)); \ + else \ + r = G(pcre2_serialize_encode_,BITTWO)((G(const pcre2_code_,BITTWO) **)a,b,c,d,G(e,BITTWO)) + +#define PCRE2_SERIALIZE_FREE(a) \ + if (test_mode == 
G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_serialize_free_,BITONE)(a); \ + else \ + G(pcre2_serialize_free_,BITTWO)(a) + +#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_serialize_get_number_of_codes_,BITONE)(a); \ + else \ + r = G(pcre2_serialize_get_number_of_codes_,BITTWO)(a) + +#define PCRE2_SET_CALLOUT(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_callout_,BITONE)(G(a,BITONE), \ + (int (*)(G(pcre2_callout_block_,BITONE) *, void *))b,c); \ + else \ + G(pcre2_set_callout_,BITTWO)(G(a,BITTWO), \ + (int (*)(G(pcre2_callout_block_,BITTWO) *, void *))b,c); + +#define PCRE2_SET_CHARACTER_TABLES(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_character_tables_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_character_tables_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_COMPILE_RECURSION_GUARD(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_compile_recursion_guard_,BITONE)(G(a,BITONE),b,c); \ + else \ + G(pcre2_set_compile_recursion_guard_,BITTWO)(G(a,BITTWO),b,c) + +#define PCRE2_SET_DEPTH_LIMIT(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_depth_limit_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_depth_limit_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_GLOB_ESCAPE(r,a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_set_glob_escape_,BITONE)(G(a,BITONE),b); \ + else \ + r = G(pcre2_set_glob_escape_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_GLOB_SEPARATOR(r,a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + r = G(pcre2_set_glob_separator_,BITONE)(G(a,BITONE),b); \ + else \ + r = G(pcre2_set_glob_separator_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_HEAP_LIMIT(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_heap_limit_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_heap_limit_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_MATCH_LIMIT(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ 
+ G(pcre2_set_match_limit_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_match_limit_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_max_pattern_length_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_max_pattern_length_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_OFFSET_LIMIT(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_offset_limit_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_offset_limit_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_parens_nest_limit_,BITONE)(G(a,BITONE),b); \ + else \ + G(pcre2_set_parens_nest_limit_,BITTWO)(G(a,BITTWO),b) + +#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_set_substitute_callout_,BITONE)(G(a,BITONE), \ + (int (*)(G(pcre2_substitute_callout_block_,BITONE) *, void *))b,c); \ + else \ + G(pcre2_set_substitute_callout_,BITTWO)(G(a,BITTWO), \ + (int (*)(G(pcre2_substitute_callout_block_,BITTWO) *, void *))b,c) + +#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substitute_,BITONE)(G(b,BITONE),(G(PCRE2_SPTR,BITONE))c,d,e,f, \ + G(g,BITONE),h,(G(PCRE2_SPTR,BITONE))i,j, \ + (G(PCRE2_UCHAR,BITONE) *)k,l); \ + else \ + a = G(pcre2_substitute_,BITTWO)(G(b,BITTWO),(G(PCRE2_SPTR,BITTWO))c,d,e,f, \ + G(g,BITTWO),h,(G(PCRE2_SPTR,BITTWO))i,j, \ + (G(PCRE2_UCHAR,BITTWO) *)k,l) + +#define PCRE2_SUBSTRING_COPY_BYNAME(a,b,c,d,e) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_copy_byname_,BITONE)(G(b,BITONE),G(c,BITONE),\ + (G(PCRE2_UCHAR,BITONE) *)d,e); \ + else \ + a = G(pcre2_substring_copy_byname_,BITTWO)(G(b,BITTWO),G(c,BITTWO),\ + (G(PCRE2_UCHAR,BITTWO) *)d,e) + +#define PCRE2_SUBSTRING_COPY_BYNUMBER(a,b,c,d,e) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = 
G(pcre2_substring_copy_bynumber_,BITONE)(G(b,BITONE),c,\ + (G(PCRE2_UCHAR,BITONE) *)d,e); \ + else \ + a = G(pcre2_substring_copy_bynumber_,BITTWO)(G(b,BITTWO),c,\ + (G(PCRE2_UCHAR,BITTWO) *)d,e) + +#define PCRE2_SUBSTRING_FREE(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_substring_free_,BITONE)((G(PCRE2_UCHAR,BITONE) *)a); \ + else G(pcre2_substring_free_,BITTWO)((G(PCRE2_UCHAR,BITTWO) *)a) + +#define PCRE2_SUBSTRING_GET_BYNAME(a,b,c,d,e) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_get_byname_,BITONE)(G(b,BITONE),G(c,BITONE),\ + (G(PCRE2_UCHAR,BITONE) **)d,e); \ + else \ + a = G(pcre2_substring_get_byname_,BITTWO)(G(b,BITTWO),G(c,BITTWO),\ + (G(PCRE2_UCHAR,BITTWO) **)d,e) + +#define PCRE2_SUBSTRING_GET_BYNUMBER(a,b,c,d,e) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_get_bynumber_,BITONE)(G(b,BITONE),c,\ + (G(PCRE2_UCHAR,BITONE) **)d,e); \ + else \ + a = G(pcre2_substring_get_bynumber_,BITTWO)(G(b,BITTWO),c,\ + (G(PCRE2_UCHAR,BITTWO) **)d,e) + +#define PCRE2_SUBSTRING_LENGTH_BYNAME(a,b,c,d) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_length_byname_,BITONE)(G(b,BITONE),G(c,BITONE),d); \ + else \ + a = G(pcre2_substring_length_byname_,BITTWO)(G(b,BITTWO),G(c,BITTWO),d) + +#define PCRE2_SUBSTRING_LENGTH_BYNUMBER(a,b,c,d) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_length_bynumber_,BITONE)(G(b,BITONE),c,d); \ + else \ + a = G(pcre2_substring_length_bynumber_,BITTWO)(G(b,BITTWO),c,d) + +#define PCRE2_SUBSTRING_LIST_GET(a,b,c,d) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_list_get_,BITONE)(G(b,BITONE), \ + (G(PCRE2_UCHAR,BITONE) ***)c,d); \ + else \ + a = G(pcre2_substring_list_get_,BITTWO)(G(b,BITTWO), \ + (G(PCRE2_UCHAR,BITTWO) ***)c,d) + +#define PCRE2_SUBSTRING_LIST_FREE(a) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(pcre2_substring_list_free_,BITONE)((G(PCRE2_SPTR,BITONE) *)a); \ + else \ + 
G(pcre2_substring_list_free_,BITTWO)((G(PCRE2_SPTR,BITTWO) *)a) + +#define PCRE2_SUBSTRING_NUMBER_FROM_NAME(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + a = G(pcre2_substring_number_from_name_,BITONE)(G(b,BITONE),G(c,BITONE)); \ + else \ + a = G(pcre2_substring_number_from_name_,BITTWO)(G(b,BITTWO),G(c,BITTWO)) + +#define PTR(x) ( \ + (test_mode == G(G(PCRE,BITONE),_MODE))? (void *)G(x,BITONE) : \ + (void *)G(x,BITTWO)) + +#define SETFLD(x,y,z) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) G(x,BITONE)->y = z; \ + else G(x,BITTWO)->y = z + +#define SETFLDVEC(x,y,v,z) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) G(x,BITONE)->y[v] = z; \ + else G(x,BITTWO)->y[v] = z + +#define SETOP(x,y,z) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) G(x,BITONE) z y; \ + else G(x,BITTWO) z y + +#define SETCASTPTR(x,y) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(x,BITONE) = (G(G(uint,BITONE),_t) *)(y); \ + else \ + G(x,BITTWO) = (G(G(uint,BITTWO),_t) *)(y) + +#define STRLEN(p) ((test_mode == G(G(PCRE,BITONE),_MODE))? 
\ + G(strlen,BITONE)((G(PCRE2_SPTR,BITONE))p) : \ + G(strlen,BITTWO)((G(PCRE2_SPTR,BITTWO))p)) + +#define SUB1(a,b) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(a,BITONE)(G(b,BITONE)); \ + else \ + G(a,BITTWO)(G(b,BITTWO)) + +#define SUB2(a,b,c) \ + if (test_mode == G(G(PCRE,BITONE),_MODE)) \ + G(a,BITONE)(G(b,BITONE),G(c,BITONE)); \ + else \ + G(a,BITTWO)(G(b,BITTWO),G(c,BITTWO)) + +#define TEST(x,r,y) ( \ + (test_mode == G(G(PCRE,BITONE),_MODE) && G(x,BITONE) r (y)) || \ + (test_mode == G(G(PCRE,BITTWO),_MODE) && G(x,BITTWO) r (y))) + +#define TESTFLD(x,f,r,y) ( \ + (test_mode == G(G(PCRE,BITONE),_MODE) && G(x,BITONE)->f r (y)) || \ + (test_mode == G(G(PCRE,BITTWO),_MODE) && G(x,BITTWO)->f r (y))) + + +#endif /* Two out of three modes */ + +/* ----- End of cases where more than one mode is supported ----- */ + + +/* ----- Only 8-bit mode is supported ----- */ + +#elif defined SUPPORT_PCRE2_8 +#define CASTFLD(t,a,b) (t)(G(a,8)->b) +#define CASTVAR(t,x) (t)G(x,8) +#define CODE_UNIT(a,b) (uint32_t)(((PCRE2_SPTR8)(a))[b]) +#define CONCTXCPY(a,b) memcpy(G(a,8),G(b,8),sizeof(pcre2_convert_context_8)) +#define CONVERT_COPY(a,b,c) memcpy(G(a,8),(char *)b, c) +#define DATCTXCPY(a,b) memcpy(G(a,8),G(b,8),sizeof(pcre2_match_context_8)) +#define FLD(a,b) G(a,8)->b +#define PATCTXCPY(a,b) memcpy(G(a,8),G(b,8),sizeof(pcre2_compile_context_8)) +#define PCHARS(lv, p, offset, len, utf, f) \ + lv = pchars8((PCRE2_SPTR8)(p)+offset, len, utf, f) +#define PCHARSV(p, offset, len, utf, f) \ + (void)pchars8((PCRE2_SPTR8)(p)+offset, len, utf, f) +#define PCRE2_CALLOUT_ENUMERATE(a,b,c) \ + a = pcre2_callout_enumerate_8(compiled_code8, \ + (int (*)(struct pcre2_callout_enumerate_block_8 *, void *))b,c) +#define PCRE2_CODE_COPY_FROM_VOID(a,b) G(a,8) = pcre2_code_copy_8(b) +#define PCRE2_CODE_COPY_TO_VOID(a,b) a = (void *)pcre2_code_copy_8(G(b,8)) +#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) a = (void *)pcre2_code_copy_with_tables_8(G(b,8)) +#define PCRE2_COMPILE(a,b,c,d,e,f,g) 
\ + G(a,8) = pcre2_compile_8(G(b,8),c,d,e,f,g) +#define PCRE2_CONVERTED_PATTERN_FREE(a) \ + pcre2_converted_pattern_free_8((PCRE2_UCHAR8 *)a) +#define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \ + a = pcre2_dfa_match_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h,i,j) +#define PCRE2_GET_ERROR_MESSAGE(r,a,b) \ + r = pcre2_get_error_message_8(a,G(b,8),G(G(b,8),_size)) +#define PCRE2_GET_OVECTOR_COUNT(a,b) a = pcre2_get_ovector_count_8(G(b,8)) +#define PCRE2_GET_STARTCHAR(a,b) a = pcre2_get_startchar_8(G(b,8)) +#define PCRE2_JIT_COMPILE(r,a,b) r = pcre2_jit_compile_8(G(a,8),b) +#define PCRE2_JIT_FREE_UNUSED_MEMORY(a) pcre2_jit_free_unused_memory_8(G(a,8)) +#define PCRE2_JIT_MATCH(a,b,c,d,e,f,g,h) \ + a = pcre2_jit_match_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h) +#define PCRE2_JIT_STACK_CREATE(a,b,c,d) \ + a = (PCRE2_JIT_STACK *)pcre2_jit_stack_create_8(b,c,d); +#define PCRE2_JIT_STACK_ASSIGN(a,b,c) \ + pcre2_jit_stack_assign_8(G(a,8),(pcre2_jit_callback_8)b,c); +#define PCRE2_JIT_STACK_FREE(a) pcre2_jit_stack_free_8((pcre2_jit_stack_8 *)a); +#define PCRE2_MAKETABLES(a) a = pcre2_maketables_8(NULL) +#define PCRE2_MATCH(a,b,c,d,e,f,g,h) \ + a = pcre2_match_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h) +#define PCRE2_MATCH_DATA_CREATE(a,b,c) G(a,8) = pcre2_match_data_create_8(b,c) +#define PCRE2_MATCH_DATA_CREATE_FROM_PATTERN(a,b,c) \ + G(a,8) = pcre2_match_data_create_from_pattern_8(G(b,8),c) +#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_8(G(a,8)) +#define PCRE2_PATTERN_CONVERT(a,b,c,d,e,f,g) a = pcre2_pattern_convert_8(G(b,8),c,d,(PCRE2_UCHAR8 **)e,f,G(g,8)) +#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_8(G(b,8),c,d) +#define PCRE2_PRINTINT(a) pcre2_printint_8(compiled_code8,outfile,a) +#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \ + r = pcre2_serialize_decode_8((pcre2_code_8 **)a,b,c,G(d,8)) +#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \ + r = pcre2_serialize_encode_8((const pcre2_code_8 **)a,b,c,d,G(e,8)) +#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_8(a) 
+#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \ + r = pcre2_serialize_get_number_of_codes_8(a) +#define PCRE2_SET_CALLOUT(a,b,c) \ + pcre2_set_callout_8(G(a,8),(int (*)(pcre2_callout_block_8 *, void *))b,c) +#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_8(G(a,8),b) +#define PCRE2_SET_COMPILE_RECURSION_GUARD(a,b,c) \ + pcre2_set_compile_recursion_guard_8(G(a,8),b,c) +#define PCRE2_SET_DEPTH_LIMIT(a,b) pcre2_set_depth_limit_8(G(a,8),b) +#define PCRE2_SET_GLOB_ESCAPE(r,a,b) r = pcre2_set_glob_escape_8(G(a,8),b) +#define PCRE2_SET_GLOB_SEPARATOR(r,a,b) r = pcre2_set_glob_separator_8(G(a,8),b) +#define PCRE2_SET_HEAP_LIMIT(a,b) pcre2_set_heap_limit_8(G(a,8),b) +#define PCRE2_SET_MATCH_LIMIT(a,b) pcre2_set_match_limit_8(G(a,8),b) +#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) pcre2_set_max_pattern_length_8(G(a,8),b) +#define PCRE2_SET_OFFSET_LIMIT(a,b) pcre2_set_offset_limit_8(G(a,8),b) +#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_8(G(a,8),b) +#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ + pcre2_set_substitute_callout_8(G(a,8), \ + (int (*)(pcre2_substitute_callout_block_8 *, void *))b,c) +#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ + a = pcre2_substitute_8(G(b,8),(PCRE2_SPTR8)c,d,e,f,G(g,8),h, \ + (PCRE2_SPTR8)i,j,(PCRE2_UCHAR8 *)k,l) +#define PCRE2_SUBSTRING_COPY_BYNAME(a,b,c,d,e) \ + a = pcre2_substring_copy_byname_8(G(b,8),G(c,8),(PCRE2_UCHAR8 *)d,e) +#define PCRE2_SUBSTRING_COPY_BYNUMBER(a,b,c,d,e) \ + a = pcre2_substring_copy_bynumber_8(G(b,8),c,(PCRE2_UCHAR8 *)d,e) +#define PCRE2_SUBSTRING_FREE(a) pcre2_substring_free_8((PCRE2_UCHAR8 *)a) +#define PCRE2_SUBSTRING_GET_BYNAME(a,b,c,d,e) \ + a = pcre2_substring_get_byname_8(G(b,8),G(c,8),(PCRE2_UCHAR8 **)d,e) +#define PCRE2_SUBSTRING_GET_BYNUMBER(a,b,c,d,e) \ + a = pcre2_substring_get_bynumber_8(G(b,8),c,(PCRE2_UCHAR8 **)d,e) +#define PCRE2_SUBSTRING_LENGTH_BYNAME(a,b,c,d) \ + a = pcre2_substring_length_byname_8(G(b,8),G(c,8),d) +#define 
PCRE2_SUBSTRING_LENGTH_BYNUMBER(a,b,c,d) \ + a = pcre2_substring_length_bynumber_8(G(b,8),c,d) +#define PCRE2_SUBSTRING_LIST_GET(a,b,c,d) \ + a = pcre2_substring_list_get_8(G(b,8),(PCRE2_UCHAR8 ***)c,d) +#define PCRE2_SUBSTRING_LIST_FREE(a) \ + pcre2_substring_list_free_8((PCRE2_SPTR8 *)a) +#define PCRE2_SUBSTRING_NUMBER_FROM_NAME(a,b,c) \ + a = pcre2_substring_number_from_name_8(G(b,8),G(c,8)); +#define PTR(x) (void *)G(x,8) +#define SETFLD(x,y,z) G(x,8)->y = z +#define SETFLDVEC(x,y,v,z) G(x,8)->y[v] = z +#define SETOP(x,y,z) G(x,8) z y +#define SETCASTPTR(x,y) G(x,8) = (uint8_t *)(y) +#define STRLEN(p) (int)strlen((char *)p) +#define SUB1(a,b) G(a,8)(G(b,8)) +#define SUB2(a,b,c) G(a,8)(G(b,8),G(c,8)) +#define TEST(x,r,y) (G(x,8) r (y)) +#define TESTFLD(x,f,r,y) (G(x,8)->f r (y)) + + +/* ----- Only 16-bit mode is supported ----- */ + +#elif defined SUPPORT_PCRE2_16 +#define CASTFLD(t,a,b) (t)(G(a,16)->b) +#define CASTVAR(t,x) (t)G(x,16) +#define CODE_UNIT(a,b) (uint32_t)(((PCRE2_SPTR16)(a))[b]) +#define CONCTXCPY(a,b) memcpy(G(a,16),G(b,16),sizeof(pcre2_convert_context_16)) +#define CONVERT_COPY(a,b,c) memcpy(G(a,16),(char *)b, (c)*2) +#define DATCTXCPY(a,b) memcpy(G(a,16),G(b,16),sizeof(pcre2_match_context_16)) +#define FLD(a,b) G(a,16)->b +#define PATCTXCPY(a,b) memcpy(G(a,16),G(b,16),sizeof(pcre2_compile_context_16)) +#define PCHARS(lv, p, offset, len, utf, f) \ + lv = pchars16((PCRE2_SPTR16)(p)+offset, len, utf, f) +#define PCHARSV(p, offset, len, utf, f) \ + (void)pchars16((PCRE2_SPTR16)(p)+offset, len, utf, f) +#define PCRE2_CALLOUT_ENUMERATE(a,b,c) \ + a = pcre2_callout_enumerate_16(compiled_code16, \ + (int (*)(struct pcre2_callout_enumerate_block_16 *, void *))b,c) +#define PCRE2_CODE_COPY_FROM_VOID(a,b) G(a,16) = pcre2_code_copy_16(b) +#define PCRE2_CODE_COPY_TO_VOID(a,b) a = (void *)pcre2_code_copy_16(G(b,16)) +#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) a = (void *)pcre2_code_copy_with_tables_16(G(b,16)) +#define PCRE2_COMPILE(a,b,c,d,e,f,g) \ + 
G(a,16) = pcre2_compile_16(G(b,16),c,d,e,f,g) +#define PCRE2_CONVERTED_PATTERN_FREE(a) \ + pcre2_converted_pattern_free_16((PCRE2_UCHAR16 *)a) +#define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \ + a = pcre2_dfa_match_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h,i,j) +#define PCRE2_GET_ERROR_MESSAGE(r,a,b) \ + r = pcre2_get_error_message_16(a,G(b,16),G(G(b,16),_size/2)) +#define PCRE2_GET_OVECTOR_COUNT(a,b) a = pcre2_get_ovector_count_16(G(b,16)) +#define PCRE2_GET_STARTCHAR(a,b) a = pcre2_get_startchar_16(G(b,16)) +#define PCRE2_JIT_COMPILE(r,a,b) r = pcre2_jit_compile_16(G(a,16),b) +#define PCRE2_JIT_FREE_UNUSED_MEMORY(a) pcre2_jit_free_unused_memory_16(G(a,16)) +#define PCRE2_JIT_MATCH(a,b,c,d,e,f,g,h) \ + a = pcre2_jit_match_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h) +#define PCRE2_JIT_STACK_CREATE(a,b,c,d) \ + a = (PCRE2_JIT_STACK *)pcre2_jit_stack_create_16(b,c,d); +#define PCRE2_JIT_STACK_ASSIGN(a,b,c) \ + pcre2_jit_stack_assign_16(G(a,16),(pcre2_jit_callback_16)b,c); +#define PCRE2_JIT_STACK_FREE(a) pcre2_jit_stack_free_16((pcre2_jit_stack_16 *)a); +#define PCRE2_MAKETABLES(a) a = pcre2_maketables_16(NULL) +#define PCRE2_MATCH(a,b,c,d,e,f,g,h) \ + a = pcre2_match_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h) +#define PCRE2_MATCH_DATA_CREATE(a,b,c) G(a,16) = pcre2_match_data_create_16(b,c) +#define PCRE2_MATCH_DATA_CREATE_FROM_PATTERN(a,b,c) \ + G(a,16) = pcre2_match_data_create_from_pattern_16(G(b,16),c) +#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_16(G(a,16)) +#define PCRE2_PATTERN_CONVERT(a,b,c,d,e,f,g) a = pcre2_pattern_convert_16(G(b,16),c,d,(PCRE2_UCHAR16 **)e,f,G(g,16)) +#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_16(G(b,16),c,d) +#define PCRE2_PRINTINT(a) pcre2_printint_16(compiled_code16,outfile,a) +#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \ + r = pcre2_serialize_decode_16((pcre2_code_16 **)a,b,c,G(d,16)) +#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \ + r = pcre2_serialize_encode_16((const pcre2_code_16 **)a,b,c,d,G(e,16)) 
+#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_16(a) +#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \ + r = pcre2_serialize_get_number_of_codes_16(a) +#define PCRE2_SET_CALLOUT(a,b,c) \ + pcre2_set_callout_16(G(a,16),(int (*)(pcre2_callout_block_16 *, void *))b,c); +#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_16(G(a,16),b) +#define PCRE2_SET_COMPILE_RECURSION_GUARD(a,b,c) \ + pcre2_set_compile_recursion_guard_16(G(a,16),b,c) +#define PCRE2_SET_DEPTH_LIMIT(a,b) pcre2_set_depth_limit_16(G(a,16),b) +#define PCRE2_SET_GLOB_ESCAPE(r,a,b) r = pcre2_set_glob_escape_16(G(a,16),b) +#define PCRE2_SET_GLOB_SEPARATOR(r,a,b) r = pcre2_set_glob_separator_16(G(a,16),b) +#define PCRE2_SET_HEAP_LIMIT(a,b) pcre2_set_heap_limit_16(G(a,16),b) +#define PCRE2_SET_MATCH_LIMIT(a,b) pcre2_set_match_limit_16(G(a,16),b) +#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) pcre2_set_max_pattern_length_16(G(a,16),b) +#define PCRE2_SET_OFFSET_LIMIT(a,b) pcre2_set_offset_limit_16(G(a,16),b) +#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_16(G(a,16),b) +#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ + pcre2_set_substitute_callout_16(G(a,16), \ + (int (*)(pcre2_substitute_callout_block_16 *, void *))b,c) +#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ + a = pcre2_substitute_16(G(b,16),(PCRE2_SPTR16)c,d,e,f,G(g,16),h, \ + (PCRE2_SPTR16)i,j,(PCRE2_UCHAR16 *)k,l) +#define PCRE2_SUBSTRING_COPY_BYNAME(a,b,c,d,e) \ + a = pcre2_substring_copy_byname_16(G(b,16),G(c,16),(PCRE2_UCHAR16 *)d,e) +#define PCRE2_SUBSTRING_COPY_BYNUMBER(a,b,c,d,e) \ + a = pcre2_substring_copy_bynumber_16(G(b,16),c,(PCRE2_UCHAR16 *)d,e) +#define PCRE2_SUBSTRING_FREE(a) pcre2_substring_free_16((PCRE2_UCHAR16 *)a) +#define PCRE2_SUBSTRING_GET_BYNAME(a,b,c,d,e) \ + a = pcre2_substring_get_byname_16(G(b,16),G(c,16),(PCRE2_UCHAR16 **)d,e) +#define PCRE2_SUBSTRING_GET_BYNUMBER(a,b,c,d,e) \ + a = pcre2_substring_get_bynumber_16(G(b,16),c,(PCRE2_UCHAR16 **)d,e) +#define 
PCRE2_SUBSTRING_LENGTH_BYNAME(a,b,c,d) \ + a = pcre2_substring_length_byname_16(G(b,16),G(c,16),d) +#define PCRE2_SUBSTRING_LENGTH_BYNUMBER(a,b,c,d) \ + a = pcre2_substring_length_bynumber_16(G(b,16),c,d) +#define PCRE2_SUBSTRING_LIST_GET(a,b,c,d) \ + a = pcre2_substring_list_get_16(G(b,16),(PCRE2_UCHAR16 ***)c,d) +#define PCRE2_SUBSTRING_LIST_FREE(a) \ + pcre2_substring_list_free_16((PCRE2_SPTR16 *)a) +#define PCRE2_SUBSTRING_NUMBER_FROM_NAME(a,b,c) \ + a = pcre2_substring_number_from_name_16(G(b,16),G(c,16)); +#define PTR(x) (void *)G(x,16) +#define SETFLD(x,y,z) G(x,16)->y = z +#define SETFLDVEC(x,y,v,z) G(x,16)->y[v] = z +#define SETOP(x,y,z) G(x,16) z y +#define SETCASTPTR(x,y) G(x,16) = (uint16_t *)(y) +#define STRLEN(p) (int)strlen16((PCRE2_SPTR16)p) +#define SUB1(a,b) G(a,16)(G(b,16)) +#define SUB2(a,b,c) G(a,16)(G(b,16),G(c,16)) +#define TEST(x,r,y) (G(x,16) r (y)) +#define TESTFLD(x,f,r,y) (G(x,16)->f r (y)) + + +/* ----- Only 32-bit mode is supported ----- */ + +#elif defined SUPPORT_PCRE2_32 +#define CASTFLD(t,a,b) (t)(G(a,32)->b) +#define CASTVAR(t,x) (t)G(x,32) +#define CODE_UNIT(a,b) (uint32_t)(((PCRE2_SPTR32)(a))[b]) +#define CONCTXCPY(a,b) memcpy(G(a,32),G(b,32),sizeof(pcre2_convert_context_32)) +#define CONVERT_COPY(a,b,c) memcpy(G(a,32),(char *)b, (c)*4) +#define DATCTXCPY(a,b) memcpy(G(a,32),G(b,32),sizeof(pcre2_match_context_32)) +#define FLD(a,b) G(a,32)->b +#define PATCTXCPY(a,b) memcpy(G(a,32),G(b,32),sizeof(pcre2_compile_context_32)) +#define PCHARS(lv, p, offset, len, utf, f) \ + lv = pchars32((PCRE2_SPTR32)(p)+offset, len, utf, f) +#define PCHARSV(p, offset, len, utf, f) \ + (void)pchars32((PCRE2_SPTR32)(p)+offset, len, utf, f) +#define PCRE2_CALLOUT_ENUMERATE(a,b,c) \ + a = pcre2_callout_enumerate_32(compiled_code32, \ + (int (*)(struct pcre2_callout_enumerate_block_32 *, void *))b,c) +#define PCRE2_CODE_COPY_FROM_VOID(a,b) G(a,32) = pcre2_code_copy_32(b) +#define PCRE2_CODE_COPY_TO_VOID(a,b) a = (void *)pcre2_code_copy_32(G(b,32)) 
+#define PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(a,b) a = (void *)pcre2_code_copy_with_tables_32(G(b,32)) +#define PCRE2_COMPILE(a,b,c,d,e,f,g) \ + G(a,32) = pcre2_compile_32(G(b,32),c,d,e,f,g) +#define PCRE2_CONVERTED_PATTERN_FREE(a) \ + pcre2_converted_pattern_free_32((PCRE2_UCHAR32 *)a) +#define PCRE2_DFA_MATCH(a,b,c,d,e,f,g,h,i,j) \ + a = pcre2_dfa_match_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h,i,j) +#define PCRE2_GET_ERROR_MESSAGE(r,a,b) \ + r = pcre2_get_error_message_32(a,G(b,32),G(G(b,32),_size/4)) +#define PCRE2_GET_OVECTOR_COUNT(a,b) a = pcre2_get_ovector_count_32(G(b,32)) +#define PCRE2_GET_STARTCHAR(a,b) a = pcre2_get_startchar_32(G(b,32)) +#define PCRE2_JIT_COMPILE(r,a,b) r = pcre2_jit_compile_32(G(a,32),b) +#define PCRE2_JIT_FREE_UNUSED_MEMORY(a) pcre2_jit_free_unused_memory_32(G(a,32)) +#define PCRE2_JIT_MATCH(a,b,c,d,e,f,g,h) \ + a = pcre2_jit_match_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h) +#define PCRE2_JIT_STACK_CREATE(a,b,c,d) \ + a = (PCRE2_JIT_STACK *)pcre2_jit_stack_create_32(b,c,d); +#define PCRE2_JIT_STACK_ASSIGN(a,b,c) \ + pcre2_jit_stack_assign_32(G(a,32),(pcre2_jit_callback_32)b,c); +#define PCRE2_JIT_STACK_FREE(a) pcre2_jit_stack_free_32((pcre2_jit_stack_32 *)a); +#define PCRE2_MAKETABLES(a) a = pcre2_maketables_32(NULL) +#define PCRE2_MATCH(a,b,c,d,e,f,g,h) \ + a = pcre2_match_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h) +#define PCRE2_MATCH_DATA_CREATE(a,b,c) G(a,32) = pcre2_match_data_create_32(b,c) +#define PCRE2_MATCH_DATA_CREATE_FROM_PATTERN(a,b,c) \ + G(a,32) = pcre2_match_data_create_from_pattern_32(G(b,32),c) +#define PCRE2_MATCH_DATA_FREE(a) pcre2_match_data_free_32(G(a,32)) +#define PCRE2_PATTERN_CONVERT(a,b,c,d,e,f,g) a = pcre2_pattern_convert_32(G(b,32),c,d,(PCRE2_UCHAR32 **)e,f,G(g,32)) +#define PCRE2_PATTERN_INFO(a,b,c,d) a = pcre2_pattern_info_32(G(b,32),c,d) +#define PCRE2_PRINTINT(a) pcre2_printint_32(compiled_code32,outfile,a) +#define PCRE2_SERIALIZE_DECODE(r,a,b,c,d) \ + r = pcre2_serialize_decode_32((pcre2_code_32 
**)a,b,c,G(d,32)) +#define PCRE2_SERIALIZE_ENCODE(r,a,b,c,d,e) \ + r = pcre2_serialize_encode_32((const pcre2_code_32 **)a,b,c,d,G(e,32)) +#define PCRE2_SERIALIZE_FREE(a) pcre2_serialize_free_32(a) +#define PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(r,a) \ + r = pcre2_serialize_get_number_of_codes_32(a) +#define PCRE2_SET_CALLOUT(a,b,c) \ + pcre2_set_callout_32(G(a,32),(int (*)(pcre2_callout_block_32 *, void *))b,c) +#define PCRE2_SET_CHARACTER_TABLES(a,b) pcre2_set_character_tables_32(G(a,32),b) +#define PCRE2_SET_COMPILE_RECURSION_GUARD(a,b,c) \ + pcre2_set_compile_recursion_guard_32(G(a,32),b,c) +#define PCRE2_SET_DEPTH_LIMIT(a,b) pcre2_set_depth_limit_32(G(a,32),b) +#define PCRE2_SET_GLOB_ESCAPE(r,a,b) r = pcre2_set_glob_escape_32(G(a,32),b) +#define PCRE2_SET_GLOB_SEPARATOR(r,a,b) r = pcre2_set_glob_separator_32(G(a,32),b) +#define PCRE2_SET_HEAP_LIMIT(a,b) pcre2_set_heap_limit_32(G(a,32),b) +#define PCRE2_SET_MATCH_LIMIT(a,b) pcre2_set_match_limit_32(G(a,32),b) +#define PCRE2_SET_MAX_PATTERN_LENGTH(a,b) pcre2_set_max_pattern_length_32(G(a,32),b) +#define PCRE2_SET_OFFSET_LIMIT(a,b) pcre2_set_offset_limit_32(G(a,32),b) +#define PCRE2_SET_PARENS_NEST_LIMIT(a,b) pcre2_set_parens_nest_limit_32(G(a,32),b) +#define PCRE2_SET_SUBSTITUTE_CALLOUT(a,b,c) \ + pcre2_set_substitute_callout_32(G(a,32), \ + (int (*)(pcre2_substitute_callout_block_32 *, void *))b,c) +#define PCRE2_SUBSTITUTE(a,b,c,d,e,f,g,h,i,j,k,l) \ + a = pcre2_substitute_32(G(b,32),(PCRE2_SPTR32)c,d,e,f,G(g,32),h, \ + (PCRE2_SPTR32)i,j,(PCRE2_UCHAR32 *)k,l) +#define PCRE2_SUBSTRING_COPY_BYNAME(a,b,c,d,e) \ + a = pcre2_substring_copy_byname_32(G(b,32),G(c,32),(PCRE2_UCHAR32 *)d,e) +#define PCRE2_SUBSTRING_COPY_BYNUMBER(a,b,c,d,e) \ + a = pcre2_substring_copy_bynumber_32(G(b,32),c,(PCRE2_UCHAR32 *)d,e); +#define PCRE2_SUBSTRING_FREE(a) pcre2_substring_free_32((PCRE2_UCHAR32 *)a) +#define PCRE2_SUBSTRING_GET_BYNAME(a,b,c,d,e) \ + a = pcre2_substring_get_byname_32(G(b,32),G(c,32),(PCRE2_UCHAR32 **)d,e) +#define 
PCRE2_SUBSTRING_GET_BYNUMBER(a,b,c,d,e) \ + a = pcre2_substring_get_bynumber_32(G(b,32),c,(PCRE2_UCHAR32 **)d,e) +#define PCRE2_SUBSTRING_LENGTH_BYNAME(a,b,c,d) \ + a = pcre2_substring_length_byname_32(G(b,32),G(c,32),d) +#define PCRE2_SUBSTRING_LENGTH_BYNUMBER(a,b,c,d) \ + a = pcre2_substring_length_bynumber_32(G(b,32),c,d) +#define PCRE2_SUBSTRING_LIST_GET(a,b,c,d) \ + a = pcre2_substring_list_get_32(G(b,32),(PCRE2_UCHAR32 ***)c,d) +#define PCRE2_SUBSTRING_LIST_FREE(a) \ + pcre2_substring_list_free_32((PCRE2_SPTR32 *)a) +#define PCRE2_SUBSTRING_NUMBER_FROM_NAME(a,b,c) \ + a = pcre2_substring_number_from_name_32(G(b,32),G(c,32)); +#define PTR(x) (void *)G(x,32) +#define SETFLD(x,y,z) G(x,32)->y = z +#define SETFLDVEC(x,y,v,z) G(x,32)->y[v] = z +#define SETOP(x,y,z) G(x,32) z y +#define SETCASTPTR(x,y) G(x,32) = (uint32_t *)(y) +#define STRLEN(p) (int)strlen32((PCRE2_SPTR32)p) +#define SUB1(a,b) G(a,32)(G(b,32)) +#define SUB2(a,b,c) G(a,32)(G(b,32),G(c,32)) +#define TEST(x,r,y) (G(x,32) r (y)) +#define TESTFLD(x,f,r,y) (G(x,32)->f r (y)) + +#endif + +/* ----- End of mode-specific function call macros ----- */ + + + + +/************************************************* +* Alternate character tables * +*************************************************/ + +/* By default, the "tables" pointer in the compile context when calling +pcre2_compile() is not set (= NULL), thereby using the default tables of the +library. However, the tables modifier can be used to select alternate sets of +tables, for different kinds of testing. Note that the locale modifier also +adjusts the tables. */ + +/* This is the set of tables distributed as default with PCRE2. It recognizes +only ASCII characters. */ + +static const uint8_t tables1[] = { + +/* This table is a lower casing table. 
*/ + + 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, + 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47, + 48, 49, 50, 51, 52, 53, 54, 55, + 56, 57, 58, 59, 60, 61, 62, 63, + 64, 97, 98, 99,100,101,102,103, + 104,105,106,107,108,109,110,111, + 112,113,114,115,116,117,118,119, + 120,121,122, 91, 92, 93, 94, 95, + 96, 97, 98, 99,100,101,102,103, + 104,105,106,107,108,109,110,111, + 112,113,114,115,116,117,118,119, + 120,121,122,123,124,125,126,127, + 128,129,130,131,132,133,134,135, + 136,137,138,139,140,141,142,143, + 144,145,146,147,148,149,150,151, + 152,153,154,155,156,157,158,159, + 160,161,162,163,164,165,166,167, + 168,169,170,171,172,173,174,175, + 176,177,178,179,180,181,182,183, + 184,185,186,187,188,189,190,191, + 192,193,194,195,196,197,198,199, + 200,201,202,203,204,205,206,207, + 208,209,210,211,212,213,214,215, + 216,217,218,219,220,221,222,223, + 224,225,226,227,228,229,230,231, + 232,233,234,235,236,237,238,239, + 240,241,242,243,244,245,246,247, + 248,249,250,251,252,253,254,255, + +/* This table is a case flipping table. 
*/ + + 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15, + 16, 17, 18, 19, 20, 21, 22, 23, + 24, 25, 26, 27, 28, 29, 30, 31, + 32, 33, 34, 35, 36, 37, 38, 39, + 40, 41, 42, 43, 44, 45, 46, 47, + 48, 49, 50, 51, 52, 53, 54, 55, + 56, 57, 58, 59, 60, 61, 62, 63, + 64, 97, 98, 99,100,101,102,103, + 104,105,106,107,108,109,110,111, + 112,113,114,115,116,117,118,119, + 120,121,122, 91, 92, 93, 94, 95, + 96, 65, 66, 67, 68, 69, 70, 71, + 72, 73, 74, 75, 76, 77, 78, 79, + 80, 81, 82, 83, 84, 85, 86, 87, + 88, 89, 90,123,124,125,126,127, + 128,129,130,131,132,133,134,135, + 136,137,138,139,140,141,142,143, + 144,145,146,147,148,149,150,151, + 152,153,154,155,156,157,158,159, + 160,161,162,163,164,165,166,167, + 168,169,170,171,172,173,174,175, + 176,177,178,179,180,181,182,183, + 184,185,186,187,188,189,190,191, + 192,193,194,195,196,197,198,199, + 200,201,202,203,204,205,206,207, + 208,209,210,211,212,213,214,215, + 216,217,218,219,220,221,222,223, + 224,225,226,227,228,229,230,231, + 232,233,234,235,236,237,238,239, + 240,241,242,243,244,245,246,247, + 248,249,250,251,252,253,254,255, + +/* This table contains bit maps for various character classes. Each map is 32 +bytes long and the bits run from the least significant end of each byte. The +classes that have their own maps are: space, xdigit, digit, upper, lower, word, +graph, print, punct, and cntrl. Other classes are built from combinations. 
*/ + + 0x00,0x3e,0x00,0x00,0x01,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, + 0x7e,0x00,0x00,0x00,0x7e,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0xfe,0xff,0xff,0x07,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0x07, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0x00,0x00,0xff,0x03, + 0xfe,0xff,0xff,0x87,0xfe,0xff,0xff,0x07, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0xfe,0xff,0xff,0xff, + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0xff,0xff,0xff,0xff, + 0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x7f, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0x00,0x00,0x00,0x00,0xfe,0xff,0x00,0xfc, + 0x01,0x00,0x00,0xf8,0x01,0x00,0x00,0x78, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + + 0xff,0xff,0xff,0xff,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x80, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, + +/* This table identifies various classes of character by individual bits: + 0x01 white space character + 0x02 letter + 0x04 decimal digit + 0x08 hexadecimal digit + 0x10 alphanumeric or '_' + 0x80 regular expression metacharacter or binary zero +*/ + + 
0x80,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 0- 7 */ + 0x00,0x01,0x01,0x01,0x01,0x01,0x00,0x00, /* 8- 15 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 16- 23 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 24- 31 */ + 0x01,0x00,0x00,0x00,0x80,0x00,0x00,0x00, /* - ' */ + 0x80,0x80,0x80,0x80,0x00,0x00,0x80,0x00, /* ( - / */ + 0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c,0x1c, /* 0 - 7 */ + 0x1c,0x1c,0x00,0x00,0x00,0x00,0x00,0x80, /* 8 - ? */ + 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* @ - G */ + 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* H - O */ + 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* P - W */ + 0x12,0x12,0x12,0x80,0x80,0x00,0x80,0x10, /* X - _ */ + 0x00,0x1a,0x1a,0x1a,0x1a,0x1a,0x1a,0x12, /* ` - g */ + 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* h - o */ + 0x12,0x12,0x12,0x12,0x12,0x12,0x12,0x12, /* p - w */ + 0x12,0x12,0x12,0x80,0x80,0x00,0x00,0x00, /* x -127 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 128-135 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 136-143 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 144-151 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 152-159 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 160-167 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 168-175 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 176-183 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 184-191 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 192-199 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 200-207 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 208-215 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 216-223 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 224-231 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 232-239 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00, /* 240-247 */ + 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};/* 248-255 */ + +/* This is a set of tables that came originally from a Windows user. It seems +to be at least an approximation of ISO 8859. 
In particular, there are +characters greater than 128 that are marked as spaces, letters, etc. */ + +static const uint8_t tables2[] = { +0,1,2,3,4,5,6,7, +8,9,10,11,12,13,14,15, +16,17,18,19,20,21,22,23, +24,25,26,27,28,29,30,31, +32,33,34,35,36,37,38,39, +40,41,42,43,44,45,46,47, +48,49,50,51,52,53,54,55, +56,57,58,59,60,61,62,63, +64,97,98,99,100,101,102,103, +104,105,106,107,108,109,110,111, +112,113,114,115,116,117,118,119, +120,121,122,91,92,93,94,95, +96,97,98,99,100,101,102,103, +104,105,106,107,108,109,110,111, +112,113,114,115,116,117,118,119, +120,121,122,123,124,125,126,127, +128,129,130,131,132,133,134,135, +136,137,138,139,140,141,142,143, +144,145,146,147,148,149,150,151, +152,153,154,155,156,157,158,159, +160,161,162,163,164,165,166,167, +168,169,170,171,172,173,174,175, +176,177,178,179,180,181,182,183, +184,185,186,187,188,189,190,191, +224,225,226,227,228,229,230,231, +232,233,234,235,236,237,238,239, +240,241,242,243,244,245,246,215, +248,249,250,251,252,253,254,223, +224,225,226,227,228,229,230,231, +232,233,234,235,236,237,238,239, +240,241,242,243,244,245,246,247, +248,249,250,251,252,253,254,255, +0,1,2,3,4,5,6,7, +8,9,10,11,12,13,14,15, +16,17,18,19,20,21,22,23, +24,25,26,27,28,29,30,31, +32,33,34,35,36,37,38,39, +40,41,42,43,44,45,46,47, +48,49,50,51,52,53,54,55, +56,57,58,59,60,61,62,63, +64,97,98,99,100,101,102,103, +104,105,106,107,108,109,110,111, +112,113,114,115,116,117,118,119, +120,121,122,91,92,93,94,95, +96,65,66,67,68,69,70,71, +72,73,74,75,76,77,78,79, +80,81,82,83,84,85,86,87, +88,89,90,123,124,125,126,127, +128,129,130,131,132,133,134,135, +136,137,138,139,140,141,142,143, +144,145,146,147,148,149,150,151, +152,153,154,155,156,157,158,159, +160,161,162,163,164,165,166,167, +168,169,170,171,172,173,174,175, +176,177,178,179,180,181,182,183, +184,185,186,187,188,189,190,191, +224,225,226,227,228,229,230,231, +232,233,234,235,236,237,238,239, +240,241,242,243,244,245,246,215, +248,249,250,251,252,253,254,223, 
+192,193,194,195,196,197,198,199, +200,201,202,203,204,205,206,207, +208,209,210,211,212,213,214,247, +216,217,218,219,220,221,222,255, +0,62,0,0,1,0,0,0, +0,0,0,0,0,0,0,0, +32,0,0,0,1,0,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,255,3, +126,0,0,0,126,0,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,255,3, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,12,2, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,0,0, +254,255,255,7,0,0,0,0, +0,0,0,0,0,0,0,0, +255,255,127,127,0,0,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,254,255,255,7, +0,0,0,0,0,4,32,4, +0,0,0,128,255,255,127,255, +0,0,0,0,0,0,255,3, +254,255,255,135,254,255,255,7, +0,0,0,0,0,4,44,6, +255,255,127,255,255,255,127,255, +0,0,0,0,254,255,255,255, +255,255,255,255,255,255,255,127, +0,0,0,0,254,255,255,255, +255,255,255,255,255,255,255,255, +0,2,0,0,255,255,255,255, +255,255,255,255,255,255,255,127, +0,0,0,0,255,255,255,255, +255,255,255,255,255,255,255,255, +0,0,0,0,254,255,0,252, +1,0,0,248,1,0,0,120, +0,0,0,0,254,255,255,255, +0,0,128,0,0,0,128,0, +255,255,255,255,0,0,0,0, +0,0,0,0,0,0,0,128, +255,255,255,255,0,0,0,0, +0,0,0,0,0,0,0,0, +128,0,0,0,0,0,0,0, +0,1,1,0,1,1,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,0,0, +1,0,0,0,128,0,0,0, +128,128,128,128,0,0,128,0, +28,28,28,28,28,28,28,28, +28,28,0,0,0,0,0,128, +0,26,26,26,26,26,26,18, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,18, +18,18,18,128,128,0,128,16, +0,26,26,26,26,26,26,18, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,18, +18,18,18,128,128,0,0,0, +0,0,0,0,0,1,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,0,0, +0,0,0,0,0,0,0,0, +1,0,0,0,0,0,0,0, +0,0,18,0,0,0,0,0, +0,0,20,20,0,18,0,0, +0,20,18,0,0,0,0,0, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,0, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,18, +18,18,18,18,18,18,18,0, +18,18,18,18,18,18,18,18 +}; + + + +#if !defined(VPCOMPAT) && !defined(HAVE_MEMMOVE) +/************************************************* +* Emulated memmove() for systems without it * 
+*************************************************/ + +/* This function can make use of bcopy() if it is available. Otherwise do it by +steam, as there are some non-Unix environments that lack both memmove() and +bcopy(). */ + +static void * +emulated_memmove(void *d, const void *s, size_t n) +{ +#ifdef HAVE_BCOPY +bcopy(s, d, n); +return d; +#else +size_t i; +unsigned char *dest = (unsigned char *)d; +const unsigned char *src = (const unsigned char *)s; +if (dest > src) + { + dest += n; + src += n; + for (i = 0; i < n; ++i) *(--dest) = *(--src); + return (void *)dest; + } +else + { + for (i = 0; i < n; ++i) *dest++ = *src++; + return (void *)(dest - n); + } +#endif /* not HAVE_BCOPY */ +} +#undef memmove +#define memmove(d,s,n) emulated_memmove(d,s,n) +#endif /* not VPCOMPAT && not HAVE_MEMMOVE */ + + + +#ifndef HAVE_STRERROR +/************************************************* +* Provide strerror() for non-ANSI libraries * +*************************************************/ + +/* Some old-fashioned systems (e.g. SunOS4) didn't have strerror() in their +libraries. They may no longer be around, but just in case, we can try to +provide the same facility by this simple alternative function. */ + +extern int sys_nerr; +extern char *sys_errlist[]; + +char * +strerror(int n) +{ +if (n < 0 || n >= sys_nerr) return "unknown error number"; +return sys_errlist[n]; +} +#endif /* HAVE_STRERROR */ + + + +/************************************************* +* Local memory functions * +*************************************************/ + +/* Alternative memory functions, to test functionality. 
*/ + +static void *my_malloc(PCRE2_SIZE size, void *data) +{ +void *block = malloc(size); +(void)data; +if (show_memory) + { + if (block == NULL) + { + fprintf(outfile, "** malloc() failed for %" SIZ_FORM "\n", SIZ_CAST size); + } + else + { + fprintf(outfile, "malloc %5" SIZ_FORM, SIZ_CAST size); +#ifdef DEBUG_SHOW_MALLOC_ADDRESSES + fprintf(outfile, " %p", block); /* Not portable */ +#endif + if (malloclistptr < MALLOCLISTSIZE) + { + malloclist[malloclistptr] = block; + malloclistlength[malloclistptr++] = size; + } + else + fprintf(outfile, " (not remembered)"); + fprintf(outfile, "\n"); + } + } +return block; +} + +static void my_free(void *block, void *data) +{ +(void)data; +if (show_memory) + { + uint32_t i, j; + BOOL found = FALSE; + + fprintf(outfile, "free"); + for (i = 0; i < malloclistptr; i++) + { + if (block == malloclist[i]) + { + fprintf(outfile, " %5" SIZ_FORM, SIZ_CAST malloclistlength[i]); + malloclistptr--; + for (j = i; j < malloclistptr; j++) + { + malloclist[j] = malloclist[j+1]; + malloclistlength[j] = malloclistlength[j+1]; + } + found = TRUE; + break; + } + } + if (!found) fprintf(outfile, " unremembered block"); +#ifdef DEBUG_SHOW_MALLOC_ADDRESSES + fprintf(outfile, " %p", block); /* Not portable */ +#endif + fprintf(outfile, "\n"); + } +free(block); +} + + + +/************************************************* +* Callback function for stack guard * +*************************************************/ + +/* This is set up to be called from pcre2_compile() when the stackguard=n +modifier sets a value greater than zero. The test we do is whether the +parenthesis nesting depth is greater than the value set by the modifier. 
+ +Argument: the current parenthesis nesting depth +Returns: non-zero to kill the compilation +*/ + +static int +stack_guard(uint32_t depth, void *user_data) +{ +(void)user_data; +return depth > pat_patctl.stackguard_test; +} + + +/************************************************* +* JIT memory callback * +*************************************************/ + +static PCRE2_JIT_STACK* +jit_callback(void *arg) +{ +jit_was_used = TRUE; +return (PCRE2_JIT_STACK *)arg; +} + + +/************************************************* +* Convert UTF-8 character to code point * +*************************************************/ + +/* This function reads one or more bytes that represent a UTF-8 character, +and returns the codepoint of that character. Note that the function supports +the original UTF-8 definition of RFC 2279, allowing for values in the range 0 +to 0x7fffffff, up to 6 bytes long. This makes it possible to generate +codepoints greater than 0x10ffff which are useful for testing PCRE2's error +checking, and also for generating 32-bit non-UTF data values above the UTF +limit. 
+ +Argument: + utf8bytes a pointer to the byte vector + vptr a pointer to an int to receive the value + +Returns: > 0 => the number of bytes consumed + -6 to 0 => malformed UTF-8 character at offset = (-return) +*/ + +static int +utf82ord(PCRE2_SPTR8 utf8bytes, uint32_t *vptr) +{ +uint32_t c = *utf8bytes++; +uint32_t d = c; +int i, j, s; + +for (i = -1; i < 6; i++) /* i is number of additional bytes */ + { + if ((d & 0x80) == 0) break; + d <<= 1; + } + +if (i == -1) { *vptr = c; return 1; } /* ascii character */ +if (i == 0 || i == 6) return 0; /* invalid UTF-8 */ + +/* i now has a value in the range 1-5 */ + +s = 6*i; +d = (c & utf8_table3[i]) << s; + +for (j = 0; j < i; j++) + { + c = *utf8bytes++; + if ((c & 0xc0) != 0x80) return -(j+1); + s -= 6; + d |= (c & 0x3f) << s; + } + +/* Check that encoding was the correct unique one */ + +for (j = 0; j < utf8_table1_size; j++) + if (d <= (uint32_t)utf8_table1[j]) break; +if (j != i) return -(i+1); + +/* Valid value */ + +*vptr = d; +return i+1; +} + + + +/************************************************* +* Print one character * +*************************************************/ + +/* Print a single character either literally, or as a hex escape, and count how +many printed characters are used. + +Arguments: + c the character + utf TRUE in UTF mode + f the FILE to print to, or NULL just to count characters + +Returns: number of characters written +*/ + +static int +pchar(uint32_t c, BOOL utf, FILE *f) +{ +int n = 0; +char tempbuffer[16]; + +if (PRINTOK(c)) + { + if (f != NULL) fprintf(f, "%c", c); + return 1; + } + +if (c < 0x100) + { + if (utf) + { + if (f != NULL) fprintf(f, "\\x{%02x}", c); + return 6; + } + else + { + if (f != NULL) fprintf(f, "\\x%02x", c); + return 4; + } + } + +if (f != NULL) n = fprintf(f, "\\x{%02x}", c); + else n = sprintf(tempbuffer, "\\x{%02x}", c); + +return n >= 0 ? 
n : 0; +} + + + +#ifdef SUPPORT_PCRE2_16 +/************************************************* +* Find length of 0-terminated 16-bit string * +*************************************************/ + +static size_t strlen16(PCRE2_SPTR16 p) +{ +PCRE2_SPTR16 pp = p; +while (*pp != 0) pp++; +return (int)(pp - p); +} +#endif /* SUPPORT_PCRE2_16 */ + + + +#ifdef SUPPORT_PCRE2_32 +/************************************************* +* Find length of 0-terminated 32-bit string * +*************************************************/ + +static size_t strlen32(PCRE2_SPTR32 p) +{ +PCRE2_SPTR32 pp = p; +while (*pp != 0) pp++; +return (int)(pp - p); +} +#endif /* SUPPORT_PCRE2_32 */ + + +#ifdef SUPPORT_PCRE2_8 +/************************************************* +* Print 8-bit character string * +*************************************************/ + +/* Must handle UTF-8 strings in utf8 mode. Yields number of characters printed. +For printing *MARK strings, a negative length is given, indicating that the +length is in the first code unit. If handed a NULL file, this function just +counts chars without printing (because pchar() does that). */ + +static int pchars8(PCRE2_SPTR8 p, int length, BOOL utf, FILE *f) +{ +uint32_t c = 0; +int yield = 0; +if (length < 0) length = *p++; +while (length-- > 0) + { + if (utf) + { + int rc = utf82ord(p, &c); + if (rc > 0 && rc <= length + 1) /* Mustn't run over the end */ + { + length -= rc - 1; + p += rc; + yield += pchar(c, utf, f); + continue; + } + } + c = *p++; + yield += pchar(c, utf, f); + } + +return yield; +} +#endif + + +#ifdef SUPPORT_PCRE2_16 +/************************************************* +* Print 16-bit character string * +*************************************************/ + +/* Must handle UTF-16 strings in utf mode. Yields number of characters printed. +For printing *MARK strings, a negative length is given, indicating that the +length is in the first code unit. If handed a NULL file, just counts chars +without printing. 
*/ + +static int pchars16(PCRE2_SPTR16 p, int length, BOOL utf, FILE *f) +{ +int yield = 0; +if (length < 0) length = *p++; +while (length-- > 0) + { + uint32_t c = *p++ & 0xffff; + if (utf && c >= 0xD800 && c < 0xDC00 && length > 0) + { + int d = *p & 0xffff; + if (d >= 0xDC00 && d <= 0xDFFF) + { + c = ((c & 0x3ff) << 10) + (d & 0x3ff) + 0x10000; + length--; + p++; + } + } + yield += pchar(c, utf, f); + } +return yield; +} +#endif /* SUPPORT_PCRE2_16 */ + + + +#ifdef SUPPORT_PCRE2_32 +/************************************************* +* Print 32-bit character string * +*************************************************/ + +/* Must handle UTF-32 strings in utf mode. Yields number of characters printed. +For printing *MARK strings, a negative length is given, indicating that the +length is in the first code unit. If handed a NULL file, just counts chars +without printing. */ + +static int pchars32(PCRE2_SPTR32 p, int length, BOOL utf, FILE *f) +{ +int yield = 0; +(void)(utf); /* Avoid compiler warning */ +if (length < 0) length = *p++; +while (length-- > 0) + { + uint32_t c = *p++; + yield += pchar(c, utf, f); + } +return yield; +} +#endif /* SUPPORT_PCRE2_32 */ + + + + +/************************************************* +* Convert character value to UTF-8 * +*************************************************/ + +/* This function takes an integer value in the range 0 - 0x7fffffff +and encodes it as a UTF-8 character in 0 to 6 bytes. It is needed even when the +8-bit library is not supported, to generate UTF-8 output for non-ASCII +characters. 
+ +Arguments: + cvalue the character value + utf8bytes pointer to buffer for result - at least 6 bytes long + +Returns: number of bytes placed in the buffer, or -1 if cvalue is + greater than 0x7fffffff +*/ + +static int +ord2utf8(uint32_t cvalue, uint8_t *utf8bytes) +{ +int i, j; +if (cvalue > 0x7fffffffu) + return -1; +for (i = 0; i < utf8_table1_size; i++) + if (cvalue <= (uint32_t)utf8_table1[i]) break; +utf8bytes += i; +for (j = i; j > 0; j--) + { + *utf8bytes-- = 0x80 | (cvalue & 0x3f); + cvalue >>= 6; + } +*utf8bytes = utf8_table2[i] | cvalue; +return i + 1; +} + + + +#ifdef SUPPORT_PCRE2_16 +/************************************************* +* Convert string to 16-bit * +*************************************************/ + +/* In UTF mode the input is always interpreted as a string of UTF-8 bytes using +the original UTF-8 definition of RFC 2279, which allows for up to 6 bytes, and +code values from 0 to 0x7fffffff. However, values greater than the later UTF +limit of 0x10ffff cause an error. In non-UTF mode the input is interpreted as +UTF-8 if the utf8_input modifier is set, but an error is generated for values +greater than 0xffff. + +If all the input bytes are ASCII, the space needed for a 16-bit string is +exactly double the 8-bit size. Otherwise, the size needed for a 16-bit string +is no more than double, because up to 0xffff uses no more than 3 bytes in UTF-8 +but possibly 4 in UTF-16. Higher values use 4 bytes in UTF-8 and up to 4 bytes +in UTF-16. The result is always left in pbuffer16. Impose a minimum size to +save repeated re-sizing. + +Note that this function does not object to surrogate values. This is +deliberate; it makes it possible to construct UTF-16 strings that are invalid, +for the purpose of testing that they are correctly faulted.
+ +Arguments: + p points to a byte string + utf true in UTF mode + lenptr points to number of bytes in the string (excluding trailing zero) + +Returns: 0 on success, with the length updated to the number of 16-bit + data items used (excluding the trailing zero) + OR -1 if a UTF-8 string is malformed + OR -2 if a value > 0x10ffff is encountered in UTF mode + OR -3 if a value > 0xffff is encountered when not in UTF mode +*/ + +static PCRE2_SIZE +to16(uint8_t *p, int utf, PCRE2_SIZE *lenptr) +{ +uint16_t *pp; +PCRE2_SIZE len = *lenptr; + +if (pbuffer16_size < 2*len + 2) + { + if (pbuffer16 != NULL) free(pbuffer16); + pbuffer16_size = 2*len + 2; + if (pbuffer16_size < 4096) pbuffer16_size = 4096; + pbuffer16 = (uint16_t *)malloc(pbuffer16_size); + if (pbuffer16 == NULL) + { + fprintf(stderr, "pcre2test: malloc(%" SIZ_FORM ") failed for pbuffer16\n", + SIZ_CAST pbuffer16_size); + exit(1); + } + } + +pp = pbuffer16; +if (!utf && (pat_patctl.control & CTL_UTF8_INPUT) == 0) + { + for (; len > 0; len--) *pp++ = *p++; + } +else while (len > 0) + { + uint32_t c; + int chlen = utf82ord(p, &c); + if (chlen <= 0) return -1; + if (!utf && c > 0xffff) return -3; + if (c > 0x10ffff) return -2; + p += chlen; + len -= chlen; + if (c < 0x10000) *pp++ = c; else + { + c -= 0x10000; + *pp++ = 0xD800 | (c >> 10); + *pp++ = 0xDC00 | (c & 0x3ff); + } + } + +*pp = 0; +*lenptr = pp - pbuffer16; +return 0; +} +#endif + + + +#ifdef SUPPORT_PCRE2_32 +/************************************************* +* Convert string to 32-bit * +*************************************************/ + +/* In UTF mode the input is always interpreted as a string of UTF-8 bytes using +the original UTF-8 definition of RFC 2279, which allows for up to 6 bytes, and +code values from 0 to 0x7fffffff. However, values greater than the later UTF +limit of 0x10ffff cause an error. + +In non-UTF mode the input is interpreted as UTF-8 if the utf8_input modifier +is set, and no limit is imposed. 
There is special interpretation of the 0xff +byte (which is illegal in UTF-8) in this case: it causes the top bit of the +next character to be set. This provides a way of generating 32-bit characters +greater than 0x7fffffff. + +If all the input bytes are ASCII, the space needed for a 32-bit string is +exactly four times the 8-bit size. Otherwise, the size needed for a 32-bit +string is no more than four times, because the number of characters must be +less than the number of bytes. The result is always left in pbuffer32. Impose a +minimum size to save repeated re-sizing. + +Note that this function does not object to surrogate values. This is +deliberate; it makes it possible to construct UTF-32 strings that are invalid, +for the purpose of testing that they are correctly faulted. + +Arguments: + p points to a byte string + utf true in UTF mode + lenptr points to number of bytes in the string (excluding trailing zero) + +Returns: 0 on success, with the length updated to the number of 32-bit + data items used (excluding the trailing zero) + OR -1 if a UTF-8 string is malformed + OR -2 if a value > 0x10ffff is encountered in UTF mode +*/ + +static PCRE2_SIZE +to32(uint8_t *p, int utf, PCRE2_SIZE *lenptr) +{ +uint32_t *pp; +PCRE2_SIZE len = *lenptr; + +if (pbuffer32_size < 4*len + 4) + { + if (pbuffer32 != NULL) free(pbuffer32); + pbuffer32_size = 4*len + 4; + if (pbuffer32_size < 8192) pbuffer32_size = 8192; + pbuffer32 = (uint32_t *)malloc(pbuffer32_size); + if (pbuffer32 == NULL) + { + fprintf(stderr, "pcre2test: malloc(%" SIZ_FORM ") failed for pbuffer32\n", + SIZ_CAST pbuffer32_size); + exit(1); + } + } + +pp = pbuffer32; + +if (!utf && (pat_patctl.control & CTL_UTF8_INPUT) == 0) + { + for (; len > 0; len--) *pp++ = *p++; + } + +else while (len > 0) + { + int chlen; + uint32_t c; + uint32_t topbit = 0; + if (!utf && *p == 0xff && len > 1) + { + topbit = 0x80000000u; + p++; + len--; + } + chlen = utf82ord(p, &c); + if (chlen <= 0) return -1; + if (utf && c > 
0x10ffff) return -2; + p += chlen; + len -= chlen; + *pp++ = c | topbit; + } + +*pp = 0; +*lenptr = pp - pbuffer32; +return 0; +} +#endif /* SUPPORT_PCRE2_32 */ + + + +/* This function is no longer used. Keep it around for a while, just in case it +needs to be re-instated. */ + +#ifdef NEVERNEVERNEVER + +/************************************************* +* Move back by so many characters * +*************************************************/ + +/* Given a code unit offset in a subject string, move backwards by a number of +characters, and return the resulting offset. + +Arguments: + subject pointer to the string + offset start offset + count count to move back by + utf TRUE if in UTF mode + +Returns: a possibly changed offset +*/ + +static PCRE2_SIZE +backchars(uint8_t *subject, PCRE2_SIZE offset, uint32_t count, BOOL utf) +{ +if (!utf || test_mode == PCRE32_MODE) + return (count >= offset)? 0 : (offset - count); + +else if (test_mode == PCRE8_MODE) + { + PCRE2_SPTR8 pp = (PCRE2_SPTR8)subject + offset; + for (; count > 0 && pp > (PCRE2_SPTR8)subject; count--) + { + pp--; + while ((*pp & 0xc0) == 0x80) pp--; + } + return pp - (PCRE2_SPTR8)subject; + } + +else /* 16-bit mode */ + { + PCRE2_SPTR16 pp = (PCRE2_SPTR16)subject + offset; + for (; count > 0 && pp > (PCRE2_SPTR16)subject; count--) + { + pp--; + if ((*pp & 0xfc00) == 0xdc00) pp--; + } + return pp - (PCRE2_SPTR16)subject; + } +} +#endif /* NEVERNEVERNEVER */ + + + +/************************************************* +* Expand input buffers * +*************************************************/ + +/* This function doubles the size of the input buffer and the buffer for +keeping an 8-bit copy of patterns (pbuffer8), and copies the current buffers to +the new ones. 
+ +Arguments: none +Returns: nothing (aborts if malloc() fails) +*/ + +static void +expand_input_buffers(void) +{ +int new_pbuffer8_size = 2*pbuffer8_size; +uint8_t *new_buffer = (uint8_t *)malloc(new_pbuffer8_size); +uint8_t *new_pbuffer8 = (uint8_t *)malloc(new_pbuffer8_size); + +if (new_buffer == NULL || new_pbuffer8 == NULL) + { + fprintf(stderr, "pcre2test: malloc(%d) failed\n", new_pbuffer8_size); + exit(1); + } + +memcpy(new_buffer, buffer, pbuffer8_size); +memcpy(new_pbuffer8, pbuffer8, pbuffer8_size); + +pbuffer8_size = new_pbuffer8_size; + +free(buffer); +free(pbuffer8); + +buffer = new_buffer; +pbuffer8 = new_pbuffer8; +} + + + +/************************************************* +* Read or extend an input line * +*************************************************/ + +/* Input lines are read into buffer, but both patterns and data lines can be +continued over multiple input lines. In addition, if the buffer fills up, we +want to automatically expand it so as to be able to handle extremely large +lines that are needed for certain stress tests, although this is less likely +now that there are repetition features for both patterns and data. When the +input buffer is expanded, pbuffer8 must be expanded likewise, and its +contents, which are a copy of the input for callouts, must be preserved +(for when expansion happens for a data line). This is not the most +efficient way of handling this, but hey, this is just a test program!
+ +Arguments: + f the file to read + start where in buffer to start (this *must* be within buffer) + prompt for stdin or readline() + +Returns: pointer to the start of new data + could be a copy of start, or could be moved + NULL if no data read and EOF reached +*/ + +static uint8_t * +extend_inputline(FILE *f, uint8_t *start, const char *prompt) +{ +uint8_t *here = start; + +for (;;) + { + size_t rlen = (size_t)(pbuffer8_size - (here - buffer)); + + if (rlen > 1000) + { + size_t dlen; + + /* If libreadline or libedit support is required, use readline() to read a + line if the input is a terminal. Note that readline() removes the trailing + newline, so we must put it back again, to be compatible with fgets(). */ + +#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) + if (INTERACTIVE(f)) + { + size_t len; + char *s = readline(prompt); + if (s == NULL) return (here == start)? NULL : start; + len = strlen(s); + if (len > 0) add_history(s); + if (len > rlen - 1) len = rlen - 1; + memcpy(here, s, len); + here[len] = '\n'; + here[len+1] = 0; + free(s); + } + else +#endif + + /* Read the next line by normal means, prompting if the file is a tty. */ + + { + if (INTERACTIVE(f)) printf("%s", prompt); + if (fgets((char *)here, rlen, f) == NULL) + return (here == start)? NULL : start; + } + + dlen = strlen((char *)here); + here += dlen; + + /* Check for end of line reached. Take care not to read data from before + start (dlen will be zero for a file starting with a binary zero). */ + + if (here > start && here[-1] == '\n') return start; + + /* If we have not read a newline when reading a file, we have either filled + the buffer or reached the end of the file. We can detect the former by + checking that the string fills the buffer, and the latter by feof(). If + neither of these is true, it means we read a binary zero which has caused + strlen() to give a short length. This is a hard error because pcre2test + expects to work with C strings. 
*/ + + if (!INTERACTIVE(f) && dlen < rlen - 1 && !feof(f)) + { + fprintf(outfile, "** Binary zero encountered in input\n"); + fprintf(outfile, "** pcre2test run abandoned\n"); + exit(1); + } + } + + else + { + size_t start_offset = start - buffer; + size_t here_offset = here - buffer; + expand_input_buffers(); + start = buffer + start_offset; + here = buffer + here_offset; + } + } + +/* Control never gets here */ +} + + + +/************************************************* +* Case-independent strncmp() function * +*************************************************/ + +/* +Arguments: + s first string + t second string + n number of characters to compare + +Returns: < 0, = 0, or > 0, according to the comparison +*/ + +static int +strncmpic(const uint8_t *s, const uint8_t *t, int n) +{ +while (n--) + { + int c = tolower(*s++) - tolower(*t++); + if (c != 0) return c; + } +return 0; +} + + + +/************************************************* +* Scan the main modifier list * +*************************************************/ + +/* This function searches the modifier list for a long modifier name. + +Arguments: + p start of the name + len length of the name + +Returns: an index in the modifier list, or -1 on failure +*/ + +static int +scan_modifiers(const uint8_t *p, unsigned int len) +{ +int bot = 0; +int top = MODLISTCOUNT; + +while (top > bot) + { + int mid = (bot + top)/2; + unsigned int mlen = strlen(modlist[mid].name); + int c = strncmp((char *)p, modlist[mid].name, (len < mlen)? len : mlen); + if (c == 0) + { + if (len == mlen) return mid; + c = (int)len - (int)mlen; + } + if (c > 0) bot = mid + 1; else top = mid; + } + +return -1; + +} + + + +/************************************************* +* Check a modifier and find its field * +*************************************************/ + +/* This function is called when a modifier has been identified. We check that +it is allowed here and find the field that is to be changed.
+ +Arguments: + m the modifier list entry + ctx CTX_PAT => pattern context + CTX_POPPAT => pattern context for popped pattern + CTX_DEFPAT => default pattern context + CTX_DAT => data context + CTX_DEFDAT => default data context + pctl point to pattern control block + dctl point to data control block + c a single character or 0 + +Returns: a field pointer or NULL +*/ + +static void * +check_modifier(modstruct *m, int ctx, patctl *pctl, datctl *dctl, uint32_t c) +{ +void *field = NULL; +PCRE2_SIZE offset = m->offset; + +if (restrict_for_perl_test) switch(m->which) + { + case MOD_PNDP: + case MOD_PATP: + case MOD_PDP: + break; + + default: + fprintf(outfile, "** '%s' is not allowed in a Perl-compatible test\n", + m->name); + return NULL; + } + +switch (m->which) + { + case MOD_CTC: /* Compile context modifier */ + if (ctx == CTX_DEFPAT) field = PTR(default_pat_context); + else if (ctx == CTX_PAT) field = PTR(pat_context); + break; + + case MOD_CTM: /* Match context modifier */ + if (ctx == CTX_DEFDAT) field = PTR(default_dat_context); + else if (ctx == CTX_DAT) field = PTR(dat_context); + break; + + case MOD_DAT: /* Data line modifier */ + if (dctl != NULL) field = dctl; + break; + + case MOD_PAT: /* Pattern modifier */ + case MOD_PATP: /* Allowed for Perl test */ + if (pctl != NULL) field = pctl; + break; + + case MOD_PD: /* Pattern or data line modifier */ + case MOD_PDP: /* Ditto, allowed for Perl test */ + case MOD_PND: /* Ditto, but not default pattern */ + case MOD_PNDP: /* Ditto, allowed for Perl test */ + if (dctl != NULL) field = dctl; + else if (pctl != NULL && (m->which == MOD_PD || m->which == MOD_PDP || + ctx != CTX_DEFPAT)) + field = pctl; + break; + } + +if (field == NULL) + { + if (c == 0) + fprintf(outfile, "** '%s' is not valid here\n", m->name); + else + fprintf(outfile, "** /%c is not valid here\n", c); + return NULL; + } + +return (char *)field + offset; +} + + + +/************************************************* +* Decode a modifier list * 
+*************************************************/ + +/* A pointer to a control block is NULL when called in cases when that block is +not relevant. They are never all relevant in one call. At least one of patctl +and datctl is NULL. The second argument specifies which context to use for +modifiers that apply to contexts. + +Arguments: + p point to modifier string + ctx CTX_PAT => pattern context + CTX_POPPAT => pattern context for popped pattern + CTX_DEFPAT => default pattern context + CTX_DAT => data context + CTX_DEFDAT => default data context + pctl point to pattern control block + dctl point to data control block + +Returns: TRUE if successful decode, FALSE otherwise +*/ + +static BOOL +decode_modifiers(uint8_t *p, int ctx, patctl *pctl, datctl *dctl) +{ +uint8_t *ep, *pp; +long li; +unsigned long uli; +BOOL first = TRUE; + +for (;;) + { + void *field; + modstruct *m; + BOOL off = FALSE; + unsigned int i, len; + int index; + char *endptr; + + /* Skip white space and commas. */ + + while (isspace(*p) || *p == ',') p++; + if (*p == 0) break; + + /* Find the end of the item; lose trailing whitespace at end of line. */ + + for (ep = p; *ep != 0 && *ep != ','; ep++); + if (*ep == 0) + { + while (ep > p && isspace(ep[-1])) ep--; + *ep = 0; + } + + /* Remember if the first character is '-'. */ + + if (*p == '-') + { + off = TRUE; + p++; + } + + /* Find the length of a full-length modifier name, and scan for it. */ + + pp = p; + while (pp < ep && *pp != '=') pp++; + index = scan_modifiers(p, pp - p); + + /* If the first modifier is unrecognized, try to interpret it as a sequence + of single-character abbreviated modifiers. None of these modifiers have any + associated data. They just set options or control bits. 
*/ + + if (index < 0) + { + uint32_t cc; + uint8_t *mp = p; + + if (!first) + { + fprintf(outfile, "** Unrecognized modifier '%.*s'\n", (int)(ep-p), p); + if (ep - p == 1) + fprintf(outfile, "** Single-character modifiers must come first\n"); + return FALSE; + } + + for (cc = *p; cc != ',' && cc != '\n' && cc != 0; cc = *(++p)) + { + for (i = 0; i < C1MODLISTCOUNT; i++) + if (cc == c1modlist[i].onechar) break; + + if (i >= C1MODLISTCOUNT) + { + fprintf(outfile, "** Unrecognized modifier '%c' in '%.*s'\n", + *p, (int)(ep-mp), mp); + return FALSE; + } + + if (c1modlist[i].index >= 0) + { + index = c1modlist[i].index; + } + + else + { + index = scan_modifiers((uint8_t *)(c1modlist[i].fullname), + strlen(c1modlist[i].fullname)); + if (index < 0) + { + fprintf(outfile, "** Internal error: single-character equivalent " + "modifier '%s' not found\n", c1modlist[i].fullname); + return FALSE; + } + c1modlist[i].index = index; /* Cache for next time */ + } + + field = check_modifier(modlist + index, ctx, pctl, dctl, *p); + if (field == NULL) return FALSE; + + /* /x is a special case; a second appearance changes PCRE2_EXTENDED to + PCRE2_EXTENDED_MORE. */ + + if (cc == 'x' && (*((uint32_t *)field) & PCRE2_EXTENDED) != 0) + { + *((uint32_t *)field) &= ~PCRE2_EXTENDED; + *((uint32_t *)field) |= PCRE2_EXTENDED_MORE; + } + else + *((uint32_t *)field) |= modlist[index].value; + } + + continue; /* With the next (fullname) modifier */ + } + + /* We have a match on a full-name modifier. Check for the existence of data + when needed. */ + + m = modlist + index; /* Save typing */ + if (m->type != MOD_CTL && m->type != MOD_OPT && + (m->type != MOD_IND || *pp == '=')) + { + if (*pp++ != '=') + { + fprintf(outfile, "** '=' expected after '%s'\n", m->name); + return FALSE; + } + if (off) + { + fprintf(outfile, "** '-' is not valid for '%s'\n", m->name); + return FALSE; + } + } + + /* These on/off types have no data.
*/ + + else if (*pp != ',' && *pp != '\n' && *pp != ' ' && *pp != 0) + { + fprintf(outfile, "** Unrecognized modifier '%.*s'\n", (int)(ep-p), p); + return FALSE; + } + + /* Set the data length for those types that have data. Then find the field + that is to be set. If check_modifier() returns NULL, it has already output an + error message. */ + + len = ep - pp; + field = check_modifier(m, ctx, pctl, dctl, 0); + if (field == NULL) return FALSE; + + /* Process according to data type. */ + + switch (m->type) + { + case MOD_CTL: + case MOD_OPT: + if (off) *((uint32_t *)field) &= ~m->value; + else *((uint32_t *)field) |= m->value; + break; + + case MOD_BSR: + if (len == 7 && strncmpic(pp, (const uint8_t *)"default", 7) == 0) + { +#ifdef BSR_ANYCRLF + *((uint16_t *)field) = PCRE2_BSR_ANYCRLF; +#else + *((uint16_t *)field) = PCRE2_BSR_UNICODE; +#endif + if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_BSR_SET; + else dctl->control2 &= ~CTL2_BSR_SET; + } + else + { + if (len == 7 && strncmpic(pp, (const uint8_t *)"anycrlf", 7) == 0) + *((uint16_t *)field) = PCRE2_BSR_ANYCRLF; + else if (len == 7 && strncmpic(pp, (const uint8_t *)"unicode", 7) == 0) + *((uint16_t *)field) = PCRE2_BSR_UNICODE; + else goto INVALID_VALUE; + if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_BSR_SET; + else dctl->control2 |= CTL2_BSR_SET; + } + pp = ep; + break; + + case MOD_CHR: /* A single character */ + *((uint32_t *)field) = *pp++; + break; + + case MOD_CON: /* A convert type/options list */ + for (;; pp++) + { + uint8_t *colon = (uint8_t *)strchr((const char *)pp, ':'); + len = ((colon != NULL && colon < ep)? 
colon:ep) - pp; + for (i = 0; i < convertlistcount; i++) + { + if (strncmpic(pp, (const uint8_t *)convertlist[i].name, len) == 0) + { + if (*((uint32_t *)field) == CONVERT_UNSET) + *((uint32_t *)field) = convertlist[i].option; + else + *((uint32_t *)field) |= convertlist[i].option; + break; + } + } + if (i >= convertlistcount) goto INVALID_VALUE; + pp += len; + if (*pp != ':') break; + } + break; + + case MOD_IN2: /* One or two unsigned integers */ + if (!isdigit(*pp)) goto INVALID_VALUE; + uli = strtoul((const char *)pp, &endptr, 10); + if (U32OVERFLOW(uli)) goto INVALID_VALUE; + ((uint32_t *)field)[0] = (uint32_t)uli; + if (*endptr == ':') + { + uli = strtoul((const char *)endptr+1, &endptr, 10); + if (U32OVERFLOW(uli)) goto INVALID_VALUE; + ((uint32_t *)field)[1] = (uint32_t)uli; + } + else ((uint32_t *)field)[1] = 0; + pp = (uint8_t *)endptr; + break; + + /* PCRE2_SIZE_MAX is usually SIZE_MAX, which may be greater, equal to, or + less than ULONG_MAX. So first test for overflowing the long int, and then + test for overflowing PCRE2_SIZE_MAX if it is smaller than ULONG_MAX. 
*/ + + case MOD_SIZ: /* PCRE2_SIZE value */ + if (!isdigit(*pp)) goto INVALID_VALUE; + uli = strtoul((const char *)pp, &endptr, 10); + if (uli == ULONG_MAX) goto INVALID_VALUE; +#if ULONG_MAX > PCRE2_SIZE_MAX + if (uli > PCRE2_SIZE_MAX) goto INVALID_VALUE; +#endif + *((PCRE2_SIZE *)field) = (PCRE2_SIZE)uli; + pp = (uint8_t *)endptr; + break; + + case MOD_IND: /* Unsigned integer with default */ + if (len == 0) + { + *((uint32_t *)field) = (uint32_t)(m->value); + break; + } + /* Fall through */ + + case MOD_INT: /* Unsigned integer */ + if (!isdigit(*pp)) goto INVALID_VALUE; + uli = strtoul((const char *)pp, &endptr, 10); + if (U32OVERFLOW(uli)) goto INVALID_VALUE; + *((uint32_t *)field) = (uint32_t)uli; + pp = (uint8_t *)endptr; + break; + + case MOD_INS: /* Signed integer */ + if (!isdigit(*pp) && *pp != '-') goto INVALID_VALUE; + li = strtol((const char *)pp, &endptr, 10); + if (S32OVERFLOW(li)) goto INVALID_VALUE; + *((int32_t *)field) = (int32_t)li; + pp = (uint8_t *)endptr; + break; + + case MOD_NL: + for (i = 0; i < sizeof(newlines)/sizeof(char *); i++) + if (len == strlen(newlines[i]) && + strncmpic(pp, (const uint8_t *)newlines[i], len) == 0) break; + if (i >= sizeof(newlines)/sizeof(char *)) goto INVALID_VALUE; + if (i == 0) + { + *((uint16_t *)field) = NEWLINE_DEFAULT; + if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 &= ~CTL2_NL_SET; + else dctl->control2 &= ~CTL2_NL_SET; + } + else + { + *((uint16_t *)field) = i; + if (ctx == CTX_PAT || ctx == CTX_DEFPAT) pctl->control2 |= CTL2_NL_SET; + else dctl->control2 |= CTL2_NL_SET; + } + pp = ep; + break; + + case MOD_NN: /* Name or (signed) number; may be several */ + if (isdigit(*pp) || *pp == '-') + { + int ct = MAXCPYGET - 1; + int32_t value; + li = strtol((const char *)pp, &endptr, 10); + if (S32OVERFLOW(li)) goto INVALID_VALUE; + value = (int32_t)li; + field = (char *)field - m->offset + m->value; /* Adjust field ptr */ + if (value >= 0) /* Add new number */ + { + while (*((int32_t *)field) >= 0 && 
ct-- > 0) /* Skip previous */ + field = (char *)field + sizeof(int32_t); + if (ct <= 0) + { + fprintf(outfile, "** Too many numeric '%s' modifiers\n", m->name); + return FALSE; + } + } + *((int32_t *)field) = value; + if (ct > 0) ((int32_t *)field)[1] = -1; + pp = (uint8_t *)endptr; + } + + /* Multiple strings are put end to end. */ + + else + { + char *nn = (char *)field; + if (len > 0) /* Add new name */ + { + if (len > MAX_NAME_SIZE) + { + fprintf(outfile, "** Group name in '%s' is too long\n", m->name); + return FALSE; + } + while (*nn != 0) nn += strlen(nn) + 1; + if (nn + len + 2 - (char *)field > LENCPYGET) + { + fprintf(outfile, "** Too many characters in named '%s' modifiers\n", + m->name); + return FALSE; + } + memcpy(nn, pp, len); + } + nn[len] = 0 ; + nn[len+1] = 0; + pp = ep; + } + break; + + case MOD_STR: + if (len + 1 > m->value) + { + fprintf(outfile, "** Overlong value for '%s' (max %d code units)\n", + m->name, m->value - 1); + return FALSE; + } + memcpy(field, pp, len); + ((uint8_t *)field)[len] = 0; + pp = ep; + break; + } + + if (*pp != ',' && *pp != '\n' && *pp != ' ' && *pp != 0) + { + fprintf(outfile, "** Comma expected after modifier item '%s'\n", m->name); + return FALSE; + } + + p = pp; + first = FALSE; + + if (ctx == CTX_POPPAT && + (pctl->options != 0 || + pctl->tables_id != 0 || + pctl->locale[0] != 0 || + (pctl->control & NOTPOP_CONTROLS) != 0)) + { + fprintf(outfile, "** '%s' is not valid here\n", m->name); + return FALSE; + } + } + +return TRUE; + +INVALID_VALUE: +fprintf(outfile, "** Invalid value in '%.*s'\n", (int)(ep-p), p); +return FALSE; +} + + +/************************************************* +* Get info from a pattern * +*************************************************/ + +/* A wrapped call to pcre2_pattern_info(), applied to the current compiled +pattern. 
+ +Arguments: + what code for the required information + where where to put the answer + unsetok PCRE2_ERROR_UNSET is an "expected" result + +Returns: the return from pcre2_pattern_info() +*/ + +static int +pattern_info(int what, void *where, BOOL unsetok) +{ +int rc; +PCRE2_PATTERN_INFO(rc, compiled_code, what, NULL); /* Exercise the code */ +PCRE2_PATTERN_INFO(rc, compiled_code, what, where); +if (rc >= 0) return 0; +if (rc != PCRE2_ERROR_UNSET || !unsetok) + { + fprintf(outfile, "Error %d from pcre2_pattern_info_%d(%d)\n", rc, test_mode, + what); + if (rc == PCRE2_ERROR_BADMODE) + fprintf(outfile, "Running in %d-bit mode but pattern was compiled in " + "%d-bit mode\n", test_mode, + 8 * (FLD(compiled_code, flags) & PCRE2_MODE_MASK)); + } +return rc; +} + + + +#ifdef SUPPORT_PCRE2_8 +/************************************************* +* Show something in a list * +*************************************************/ + +/* This function just helps to keep the code that uses it tidier. It's used for +various lists of things where there needs to be introductory text before the +first item. As these calls are all in the POSIX-support code, they happen only +when 8-bit mode is supported. */ + +static void +prmsg(const char **msg, const char *s) +{ +fprintf(outfile, "%s %s", *msg, s); +*msg = ""; +} +#endif /* SUPPORT_PCRE2_8 */ + + + +/************************************************* +* Show control bits * +*************************************************/ + +/* Called for mutually exclusive controls and for unsupported POSIX controls. +Because the bits are unique, this can be used for both pattern and data control +words. 
+ +Arguments: + controls control bits + controls2 more control bits + before text to print before + +Returns: nothing +*/ + +static void +show_controls(uint32_t controls, uint32_t controls2, const char *before) +{ +fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s", + before, + ((controls & CTL_AFTERTEXT) != 0)? " aftertext" : "", + ((controls & CTL_ALLAFTERTEXT) != 0)? " allaftertext" : "", + ((controls & CTL_ALLCAPTURES) != 0)? " allcaptures" : "", + ((controls & CTL_ALLUSEDTEXT) != 0)? " allusedtext" : "", + ((controls2 & CTL2_ALLVECTOR) != 0)? " allvector" : "", + ((controls & CTL_ALTGLOBAL) != 0)? " altglobal" : "", + ((controls & CTL_BINCODE) != 0)? " bincode" : "", + ((controls2 & CTL2_BSR_SET) != 0)? " bsr" : "", + ((controls & CTL_CALLOUT_CAPTURE) != 0)? " callout_capture" : "", + ((controls2 & CTL2_CALLOUT_EXTRA) != 0)? " callout_extra" : "", + ((controls & CTL_CALLOUT_INFO) != 0)? " callout_info" : "", + ((controls & CTL_CALLOUT_NONE) != 0)? " callout_none" : "", + ((controls2 & CTL2_CALLOUT_NO_WHERE) != 0)? " callout_no_where" : "", + ((controls & CTL_DFA) != 0)? " dfa" : "", + ((controls & CTL_EXPAND) != 0)? " expand" : "", + ((controls & CTL_FINDLIMITS) != 0)? " find_limits" : "", + ((controls & CTL_FRAMESIZE) != 0)? " framesize" : "", + ((controls & CTL_FULLBINCODE) != 0)? " fullbincode" : "", + ((controls & CTL_GETALL) != 0)? " getall" : "", + ((controls & CTL_GLOBAL) != 0)? " global" : "", + ((controls & CTL_HEXPAT) != 0)? " hex" : "", + ((controls & CTL_INFO) != 0)? " info" : "", + ((controls & CTL_JITFAST) != 0)? " jitfast" : "", + ((controls & CTL_JITVERIFY) != 0)? " jitverify" : "", + ((controls & CTL_MARK) != 0)? " mark" : "", + ((controls & CTL_MEMORY) != 0)? " memory" : "", + ((controls2 & CTL2_NL_SET) != 0)? " newline" : "", + ((controls & CTL_NULLCONTEXT) != 0)? " null_context" : "", + ((controls & CTL_POSIX) != 0)? " posix" : "", + ((controls & CTL_POSIX_NOSUB) != 0)? 
" posix_nosub" : "", + ((controls & CTL_PUSH) != 0)? " push" : "", + ((controls & CTL_PUSHCOPY) != 0)? " pushcopy" : "", + ((controls & CTL_PUSHTABLESCOPY) != 0)? " pushtablescopy" : "", + ((controls & CTL_STARTCHAR) != 0)? " startchar" : "", + ((controls2 & CTL2_SUBSTITUTE_CALLOUT) != 0)? " substitute_callout" : "", + ((controls2 & CTL2_SUBSTITUTE_EXTENDED) != 0)? " substitute_extended" : "", + ((controls2 & CTL2_SUBSTITUTE_LITERAL) != 0)? " substitute_literal" : "", + ((controls2 & CTL2_SUBSTITUTE_MATCHED) != 0)? " substitute_matched" : "", + ((controls2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) != 0)? " substitute_overflow_length" : "", + ((controls2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) != 0)? " substitute_replacement_only" : "", + ((controls2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) != 0)? " substitute_unknown_unset" : "", + ((controls2 & CTL2_SUBSTITUTE_UNSET_EMPTY) != 0)? " substitute_unset_empty" : "", + ((controls & CTL_USE_LENGTH) != 0)? " use_length" : "", + ((controls & CTL_UTF8_INPUT) != 0)? " utf8_input" : "", + ((controls & CTL_ZERO_TERMINATE) != 0)? " zero_terminate" : ""); +} + + + +/************************************************* +* Show compile options * +*************************************************/ + +/* Called from show_pattern_info() and for unsupported POSIX options. + +Arguments: + options an options word + before text to print before + after text to print after + +Returns: nothing +*/ + +static void +show_compile_options(uint32_t options, const char *before, const char *after) +{ +if (options == 0) fprintf(outfile, "%s %s", before, after); +else fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s", + before, + ((options & PCRE2_ALT_BSUX) != 0)? " alt_bsux" : "", + ((options & PCRE2_ALT_CIRCUMFLEX) != 0)? " alt_circumflex" : "", + ((options & PCRE2_ALT_VERBNAMES) != 0)? " alt_verbnames" : "", + ((options & PCRE2_ALLOW_EMPTY_CLASS) != 0)? " allow_empty_class" : "", + ((options & PCRE2_ANCHORED) != 0)? 
" anchored" : "", + ((options & PCRE2_AUTO_CALLOUT) != 0)? " auto_callout" : "", + ((options & PCRE2_CASELESS) != 0)? " caseless" : "", + ((options & PCRE2_DOLLAR_ENDONLY) != 0)? " dollar_endonly" : "", + ((options & PCRE2_DOTALL) != 0)? " dotall" : "", + ((options & PCRE2_DUPNAMES) != 0)? " dupnames" : "", + ((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "", + ((options & PCRE2_EXTENDED) != 0)? " extended" : "", + ((options & PCRE2_EXTENDED_MORE) != 0)? " extended_more" : "", + ((options & PCRE2_FIRSTLINE) != 0)? " firstline" : "", + ((options & PCRE2_LITERAL) != 0)? " literal" : "", + ((options & PCRE2_MATCH_INVALID_UTF) != 0)? " match_invalid_utf" : "", + ((options & PCRE2_MATCH_UNSET_BACKREF) != 0)? " match_unset_backref" : "", + ((options & PCRE2_MULTILINE) != 0)? " multiline" : "", + ((options & PCRE2_NEVER_BACKSLASH_C) != 0)? " never_backslash_c" : "", + ((options & PCRE2_NEVER_UCP) != 0)? " never_ucp" : "", + ((options & PCRE2_NEVER_UTF) != 0)? " never_utf" : "", + ((options & PCRE2_NO_AUTO_CAPTURE) != 0)? " no_auto_capture" : "", + ((options & PCRE2_NO_AUTO_POSSESS) != 0)? " no_auto_possess" : "", + ((options & PCRE2_NO_DOTSTAR_ANCHOR) != 0)? " no_dotstar_anchor" : "", + ((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "", + ((options & PCRE2_NO_START_OPTIMIZE) != 0)? " no_start_optimize" : "", + ((options & PCRE2_UCP) != 0)? " ucp" : "", + ((options & PCRE2_UNGREEDY) != 0)? " ungreedy" : "", + ((options & PCRE2_USE_OFFSET_LIMIT) != 0)? " use_offset_limit" : "", + ((options & PCRE2_UTF) != 0)? " utf" : "", + after); +} + + +/************************************************* +* Show compile extra options * +*************************************************/ + +/* Called from show_pattern_info() and for unsupported POSIX options. 
+ +Arguments: + options an options word + before text to print before + after text to print after + +Returns: nothing +*/ + +static void +show_compile_extra_options(uint32_t options, const char *before, + const char *after) +{ +if (options == 0) fprintf(outfile, "%s %s", before, after); +else fprintf(outfile, "%s%s%s%s%s%s%s%s", + before, + ((options & PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES) != 0)? " allow_surrogate_escapes" : "", + ((options & PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL) != 0)? " bad_escape_is_literal" : "", + ((options & PCRE2_EXTRA_ALT_BSUX) != 0)? " extra_alt_bsux" : "", + ((options & PCRE2_EXTRA_MATCH_WORD) != 0)? " match_word" : "", + ((options & PCRE2_EXTRA_MATCH_LINE) != 0)? " match_line" : "", + ((options & PCRE2_EXTRA_ESCAPED_CR_IS_LF) != 0)? " escaped_cr_is_lf" : "", + after); +} + + + +#ifdef SUPPORT_PCRE2_8 +/************************************************* +* Show match options * +*************************************************/ + +/* Called for unsupported POSIX options. */ + +static void +show_match_options(uint32_t options) +{ +fprintf(outfile, "%s%s%s%s%s%s%s%s%s%s%s%s%s", + ((options & PCRE2_ANCHORED) != 0)? " anchored" : "", + ((options & PCRE2_COPY_MATCHED_SUBJECT) != 0)? " copy_matched_subject" : "", + ((options & PCRE2_DFA_RESTART) != 0)? " dfa_restart" : "", + ((options & PCRE2_DFA_SHORTEST) != 0)? " dfa_shortest" : "", + ((options & PCRE2_ENDANCHORED) != 0)? " endanchored" : "", + ((options & PCRE2_NO_JIT) != 0)? " no_jit" : "", + ((options & PCRE2_NO_UTF_CHECK) != 0)? " no_utf_check" : "", + ((options & PCRE2_NOTBOL) != 0)? " notbol" : "", + ((options & PCRE2_NOTEMPTY) != 0)? " notempty" : "", + ((options & PCRE2_NOTEMPTY_ATSTART) != 0)? " notempty_atstart" : "", + ((options & PCRE2_NOTEOL) != 0)? " noteol" : "", + ((options & PCRE2_PARTIAL_HARD) != 0)? " partial_hard" : "", + ((options & PCRE2_PARTIAL_SOFT) != 0)? 
" partial_soft" : ""); +} +#endif /* SUPPORT_PCRE2_8 */ + + + +/************************************************* +* Show memory usage info for a pattern * +*************************************************/ + +static void +show_memory_info(void) +{ +uint32_t name_count, name_entry_size; +size_t size, cblock_size; + +/* One of the test_mode values will always be true, but to stop a compiler +warning we must initialize cblock_size. */ + +cblock_size = 0; +#ifdef SUPPORT_PCRE2_8 +if (test_mode == PCRE8_MODE) cblock_size = sizeof(pcre2_real_code_8); +#endif +#ifdef SUPPORT_PCRE2_16 +if (test_mode == PCRE16_MODE) cblock_size = sizeof(pcre2_real_code_16); +#endif +#ifdef SUPPORT_PCRE2_32 +if (test_mode == PCRE32_MODE) cblock_size = sizeof(pcre2_real_code_32); +#endif + +(void)pattern_info(PCRE2_INFO_SIZE, &size, FALSE); +(void)pattern_info(PCRE2_INFO_NAMECOUNT, &name_count, FALSE); +(void)pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &name_entry_size, FALSE); +fprintf(outfile, "Memory allocation (code space): %d\n", + (int)(size - name_count*name_entry_size*code_unit_size - cblock_size)); +if (pat_patctl.jit != 0) + { + (void)pattern_info(PCRE2_INFO_JITSIZE, &size, FALSE); + fprintf(outfile, "Memory allocation (JIT code): %d\n", (int)size); + } +} + + + +/************************************************* +* Show frame size info for a pattern * +*************************************************/ + +static void +show_framesize(void) +{ +size_t frame_size; +(void)pattern_info(PCRE2_INFO_FRAMESIZE, &frame_size, FALSE); +fprintf(outfile, "Frame size for pcre2_match(): %d\n", (int)frame_size); +} + + + +/************************************************* +* Get and output an error message * +*************************************************/ + +static BOOL +print_error_message(int errorcode, const char *before, const char *after) +{ +int len; +PCRE2_GET_ERROR_MESSAGE(len, errorcode, pbuffer); +if (len < 0) + { + fprintf(outfile, "\n** pcre2test internal error: cannot interpret error 
" + "number\n** Unexpected return (%d) from pcre2_get_error_message()\n", len); + } +else + { + fprintf(outfile, "%s", before); + PCHARSV(CASTVAR(void *, pbuffer), 0, len, FALSE, outfile); + fprintf(outfile, "%s", after); + } +return len >= 0; +} + + +/************************************************* +* Callback function for callout enumeration * +*************************************************/ + +/* The only differences in the callout emumeration block for different code +unit widths are that the pointers to the subject, the most recent MARK, and a +callout argument string point to strings of the appropriate width. Casts can be +used to deal with this. + +Argument: + cb pointer to enumerate block + callout_data user data + +Returns: 0 +*/ + +static int callout_callback(pcre2_callout_enumerate_block_8 *cb, + void *callout_data) +{ +uint32_t i; +BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0; + +(void)callout_data; /* Not currently displayed */ + +fprintf(outfile, "Callout "); +if (cb->callout_string != NULL) + { + uint32_t delimiter = CODE_UNIT(cb->callout_string, -1); + fprintf(outfile, "%c", delimiter); + PCHARSV(cb->callout_string, 0, + cb->callout_string_length, utf, outfile); + for (i = 0; callout_start_delims[i] != 0; i++) + if (delimiter == callout_start_delims[i]) + { + delimiter = callout_end_delims[i]; + break; + } + fprintf(outfile, "%c ", delimiter); + } +else fprintf(outfile, "%d ", cb->callout_number); + +fprintf(outfile, "%.*s\n", + (int)((cb->next_item_length == 0)? 1 : cb->next_item_length), + pbuffer8 + cb->pattern_position); + +return 0; +} + + + +/************************************************* +* Show information about a pattern * +*************************************************/ + +/* This function is called after a pattern has been compiled if any of the +information-requesting controls have been set. 
+ +Arguments: none + +Returns: PR_OK continue processing next line + PR_SKIP skip to a blank line + PR_ABEND abort the pcre2test run +*/ + +static int +show_pattern_info(void) +{ +uint32_t compile_options, overall_options, extra_options; +BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0; + +if ((pat_patctl.control & (CTL_BINCODE|CTL_FULLBINCODE)) != 0) + { + fprintf(outfile, "------------------------------------------------------------------\n"); + PCRE2_PRINTINT((pat_patctl.control & CTL_FULLBINCODE) != 0); + } + +if ((pat_patctl.control & CTL_INFO) != 0) + { + int rc; + void *nametable; + uint8_t *start_bits; + BOOL heap_limit_set, match_limit_set, depth_limit_set; + uint32_t backrefmax, bsr_convention, capture_count, first_ctype, first_cunit, + hasbackslashc, hascrorlf, jchanged, last_ctype, last_cunit, match_empty, + depth_limit, heap_limit, match_limit, minlength, nameentrysize, namecount, + newline_convention; + + /* Exercise the error route. */ + + PCRE2_PATTERN_INFO(rc, compiled_code, 999, NULL); + (void)rc; + + /* These info requests may return PCRE2_ERROR_UNSET. */ + + switch(pattern_info(PCRE2_INFO_HEAPLIMIT, &heap_limit, TRUE)) + { + case 0: + heap_limit_set = TRUE; + break; + + case PCRE2_ERROR_UNSET: + heap_limit_set = FALSE; + break; + + default: + return PR_ABEND; + } + + switch(pattern_info(PCRE2_INFO_MATCHLIMIT, &match_limit, TRUE)) + { + case 0: + match_limit_set = TRUE; + break; + + case PCRE2_ERROR_UNSET: + match_limit_set = FALSE; + break; + + default: + return PR_ABEND; + } + + switch(pattern_info(PCRE2_INFO_DEPTHLIMIT, &depth_limit, TRUE)) + { + case 0: + depth_limit_set = TRUE; + break; + + case PCRE2_ERROR_UNSET: + depth_limit_set = FALSE; + break; + + default: + return PR_ABEND; + } + + /* These info requests should always succeed. 
*/ + + if (pattern_info(PCRE2_INFO_BACKREFMAX, &backrefmax, FALSE) + + pattern_info(PCRE2_INFO_BSR, &bsr_convention, FALSE) + + pattern_info(PCRE2_INFO_CAPTURECOUNT, &capture_count, FALSE) + + pattern_info(PCRE2_INFO_FIRSTBITMAP, &start_bits, FALSE) + + pattern_info(PCRE2_INFO_FIRSTCODEUNIT, &first_cunit, FALSE) + + pattern_info(PCRE2_INFO_FIRSTCODETYPE, &first_ctype, FALSE) + + pattern_info(PCRE2_INFO_HASBACKSLASHC, &hasbackslashc, FALSE) + + pattern_info(PCRE2_INFO_HASCRORLF, &hascrorlf, FALSE) + + pattern_info(PCRE2_INFO_JCHANGED, &jchanged, FALSE) + + pattern_info(PCRE2_INFO_LASTCODEUNIT, &last_cunit, FALSE) + + pattern_info(PCRE2_INFO_LASTCODETYPE, &last_ctype, FALSE) + + pattern_info(PCRE2_INFO_MATCHEMPTY, &match_empty, FALSE) + + pattern_info(PCRE2_INFO_MINLENGTH, &minlength, FALSE) + + pattern_info(PCRE2_INFO_NAMECOUNT, &namecount, FALSE) + + pattern_info(PCRE2_INFO_NAMEENTRYSIZE, &nameentrysize, FALSE) + + pattern_info(PCRE2_INFO_NAMETABLE, &nametable, FALSE) + + pattern_info(PCRE2_INFO_NEWLINE, &newline_convention, FALSE) + != 0) + return PR_ABEND; + + fprintf(outfile, "Capture group count = %d\n", capture_count); + + if (backrefmax > 0) + fprintf(outfile, "Max back reference = %d\n", backrefmax); + + if (maxlookbehind > 0) + fprintf(outfile, "Max lookbehind = %d\n", maxlookbehind); + + if (heap_limit_set) + fprintf(outfile, "Heap limit = %u\n", heap_limit); + + if (match_limit_set) + fprintf(outfile, "Match limit = %u\n", match_limit); + + if (depth_limit_set) + fprintf(outfile, "Depth limit = %u\n", depth_limit); + + if (namecount > 0) + { + fprintf(outfile, "Named capture groups:\n"); + for (; namecount > 0; namecount--) + { + int imm2_size = test_mode == PCRE8_MODE ? 2 : 1; + uint32_t length = (uint32_t)STRLEN(nametable + imm2_size); + fprintf(outfile, " "); + + /* In UTF mode the name may be a UTF string containing non-ASCII + letters and digits. We must output it as a UTF-8 string. 
In non-UTF mode, + use the normal string printing functions, which use escapes for all + non-ASCII characters. */ + + if (utf) + { +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE) + { + PCRE2_SPTR32 nameptr = (PCRE2_SPTR32)nametable + imm2_size; + while (*nameptr != 0) + { + uint8_t u8buff[6]; + int len = ord2utf8(*nameptr++, u8buff); + fprintf(outfile, "%.*s", len, u8buff); + } + } +#endif +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE) + { + PCRE2_SPTR16 nameptr = (PCRE2_SPTR16)nametable + imm2_size; + while (*nameptr != 0) + { + int len; + uint8_t u8buff[6]; + uint32_t c = *nameptr++ & 0xffff; + if (c >= 0xD800 && c < 0xDC00) + c = ((c & 0x3ff) << 10) + (*nameptr++ & 0x3ff) + 0x10000; + len = ord2utf8(c, u8buff); + fprintf(outfile, "%.*s", len, u8buff); + } + } +#endif +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) + fprintf(outfile, "%s", (PCRE2_SPTR8)nametable + imm2_size); +#endif + } + else /* Not UTF mode */ + { + PCHARSV(nametable, imm2_size, length, FALSE, outfile); + } + + while (length++ < nameentrysize - imm2_size) putc(' ', outfile); + +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE) + fprintf(outfile, "%3d\n", (int)(((PCRE2_SPTR32)nametable)[0])); +#endif +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE) + fprintf(outfile, "%3d\n", (int)(((PCRE2_SPTR16)nametable)[0])); +#endif +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) + fprintf(outfile, "%3d\n", (int)( + ((((PCRE2_SPTR8)nametable)[0]) << 8) | ((PCRE2_SPTR8)nametable)[1])); +#endif + + nametable = (void*)((PCRE2_SPTR8)nametable + nameentrysize * code_unit_size); + } + } + + if (hascrorlf) fprintf(outfile, "Contains explicit CR or LF match\n"); + if (hasbackslashc) fprintf(outfile, "Contains \\C\n"); + if (match_empty) fprintf(outfile, "May match empty string\n"); + + pattern_info(PCRE2_INFO_ARGOPTIONS, &compile_options, FALSE); + pattern_info(PCRE2_INFO_ALLOPTIONS, &overall_options, FALSE); + pattern_info(PCRE2_INFO_EXTRAOPTIONS, &extra_options, 
FALSE); + + /* Remove UTF/UCP if they were there only because of forbid_utf. This saves + cluttering up the verification output of non-UTF test files. */ + + if ((pat_patctl.options & PCRE2_NEVER_UTF) == 0) + { + compile_options &= ~PCRE2_NEVER_UTF; + overall_options &= ~PCRE2_NEVER_UTF; + } + + if ((pat_patctl.options & PCRE2_NEVER_UCP) == 0) + { + compile_options &= ~PCRE2_NEVER_UCP; + overall_options &= ~PCRE2_NEVER_UCP; + } + + if ((compile_options|overall_options) != 0) + { + if (compile_options == overall_options) + show_compile_options(compile_options, "Options:", "\n"); + else + { + show_compile_options(compile_options, "Compile options:", "\n"); + show_compile_options(overall_options, "Overall options:", "\n"); + } + } + + if (extra_options != 0) + show_compile_extra_options(extra_options, "Extra options:", "\n"); + + if (jchanged) fprintf(outfile, "Duplicate name status changes\n"); + + if ((pat_patctl.control2 & CTL2_BSR_SET) != 0 || + (FLD(compiled_code, flags) & PCRE2_BSR_SET) != 0) + fprintf(outfile, "\\R matches %s\n", (bsr_convention == PCRE2_BSR_UNICODE)? 
+ "any Unicode newline" : "CR, LF, or CRLF"); + + if ((FLD(compiled_code, flags) & PCRE2_NL_SET) != 0) + { + switch (newline_convention) + { + case PCRE2_NEWLINE_CR: + fprintf(outfile, "Forced newline is CR\n"); + break; + + case PCRE2_NEWLINE_LF: + fprintf(outfile, "Forced newline is LF\n"); + break; + + case PCRE2_NEWLINE_CRLF: + fprintf(outfile, "Forced newline is CRLF\n"); + break; + + case PCRE2_NEWLINE_ANYCRLF: + fprintf(outfile, "Forced newline is CR, LF, or CRLF\n"); + break; + + case PCRE2_NEWLINE_ANY: + fprintf(outfile, "Forced newline is any Unicode newline\n"); + break; + + case PCRE2_NEWLINE_NUL: + fprintf(outfile, "Forced newline is NUL\n"); + break; + + default: + break; + } + } + + if (first_ctype == 2) + { + fprintf(outfile, "First code unit at start or follows newline\n"); + } + else if (first_ctype == 1) + { + const char *caseless = + ((FLD(compiled_code, flags) & PCRE2_FIRSTCASELESS) == 0)? + "" : " (caseless)"; + if (PRINTOK(first_cunit)) + fprintf(outfile, "First code unit = \'%c\'%s\n", first_cunit, caseless); + else + { + fprintf(outfile, "First code unit = "); + pchar(first_cunit, FALSE, outfile); + fprintf(outfile, "%s\n", caseless); + } + } + else if (start_bits != NULL) + { + int i; + int c = 24; + fprintf(outfile, "Starting code units: "); + for (i = 0; i < 256; i++) + { + if ((start_bits[i/8] & (1u << (i&7))) != 0) + { + if (c > 75) + { + fprintf(outfile, "\n "); + c = 2; + } + if (PRINTOK(i) && i != ' ') + { + fprintf(outfile, "%c ", i); + c += 2; + } + else + { + fprintf(outfile, "\\x%02x ", i); + c += 5; + } + } + } + fprintf(outfile, "\n"); + } + + if (last_ctype != 0) + { + const char *caseless = + ((FLD(compiled_code, flags) & PCRE2_LASTCASELESS) == 0)? 
+ "" : " (caseless)"; + if (PRINTOK(last_cunit)) + fprintf(outfile, "Last code unit = \'%c\'%s\n", last_cunit, caseless); + else + { + fprintf(outfile, "Last code unit = "); + pchar(last_cunit, FALSE, outfile); + fprintf(outfile, "%s\n", caseless); + } + } + + if ((FLD(compiled_code, overall_options) & PCRE2_NO_START_OPTIMIZE) == 0) + fprintf(outfile, "Subject length lower bound = %d\n", minlength); + + if (pat_patctl.jit != 0 && (pat_patctl.control & CTL_JITVERIFY) != 0) + { + if (FLD(compiled_code, executable_jit) != NULL) + fprintf(outfile, "JIT compilation was successful\n"); + else + { +#ifdef SUPPORT_JIT + fprintf(outfile, "JIT compilation was not successful"); + if (jitrc != 0 && !print_error_message(jitrc, " (", ")")) + return PR_ABEND; + fprintf(outfile, "\n"); +#else + fprintf(outfile, "JIT support is not available in this version of PCRE2\n"); +#endif + } + } + } + +if ((pat_patctl.control & CTL_CALLOUT_INFO) != 0) + { + int errorcode; + PCRE2_CALLOUT_ENUMERATE(errorcode, callout_callback, 0); + if (errorcode != 0) + { + fprintf(outfile, "Callout enumerate failed: error %d: ", errorcode); + if (errorcode < 0 && !print_error_message(errorcode, "", "\n")) + return PR_ABEND; + return PR_SKIP; + } + } + +return PR_OK; +} + + + +/************************************************* +* Handle serialization error * +*************************************************/ + +/* Print an error message after a serialization failure. + +Arguments: + rc the error code + msg an initial message for what failed + +Returns: FALSE if print_error_message() fails +*/ + +static BOOL +serial_error(int rc, const char *msg) +{ +fprintf(outfile, "%s failed: error %d: ", msg, rc); +return print_error_message(rc, "", "\n"); +} + + + +/************************************************* +* Open file for save/load commands * +*************************************************/ + +/* This function decodes the file name and opens the file. 
+ +Arguments: + buffptr point after the #command + mode open mode + fptr points to the FILE variable + name name of # command + +Returns: PR_OK or PR_ABEND +*/ + +static int +open_file(uint8_t *buffptr, const char *mode, FILE **fptr, const char *name) +{ +char *endf; +char *filename = (char *)buffptr; +while (isspace(*filename)) filename++; +endf = filename + strlen8(filename); +while (endf > filename && isspace(endf[-1])) endf--; + +if (endf == filename) + { + fprintf(outfile, "** File name expected after %s\n", name); + return PR_ABEND; + } + +*endf = 0; +*fptr = fopen((const char *)filename, mode); +if (*fptr == NULL) + { + fprintf(outfile, "** Failed to open '%s': %s\n", filename, strerror(errno)); + return PR_ABEND; + } + +return PR_OK; +} + + + +/************************************************* +* Process command line * +*************************************************/ + +/* This function is called for lines beginning with # and a character that is +not ! or whitespace, when encountered between tests, which means that there is +no compiled pattern (compiled_code is NULL). The line is in buffer. 
+ +Arguments: none + +Returns: PR_OK continue processing next line + PR_SKIP skip to a blank line + PR_ABEND abort the pcre2test run +*/ + +static int +process_command(void) +{ +FILE *f; +PCRE2_SIZE serial_size; +size_t i; +int rc, cmd, cmdlen, yield; +uint16_t first_listed_newline; +const char *cmdname; +uint8_t *argptr, *serial; + +yield = PR_OK; +cmd = CMD_UNKNOWN; +cmdlen = 0; + +for (i = 0; i < cmdlistcount; i++) + { + cmdname = cmdlist[i].name; + cmdlen = strlen(cmdname); + if (strncmp((char *)(buffer+1), cmdname, cmdlen) == 0 && + isspace(buffer[cmdlen+1])) + { + cmd = cmdlist[i].value; + break; + } + } + +argptr = buffer + cmdlen + 1; + +if (restrict_for_perl_test && cmd != CMD_PATTERN && cmd != CMD_SUBJECT) + { + fprintf(outfile, "** #%s is not allowed after #perltest\n", cmdname); + return PR_ABEND; + } + +switch(cmd) + { + case CMD_UNKNOWN: + fprintf(outfile, "** Unknown command: %s", buffer); + break; + + case CMD_FORBID_UTF: + forbid_utf = PCRE2_NEVER_UTF|PCRE2_NEVER_UCP; + break; + + case CMD_PERLTEST: + restrict_for_perl_test = TRUE; + break; + + /* Set default pattern modifiers */ + + case CMD_PATTERN: + (void)decode_modifiers(argptr, CTX_DEFPAT, &def_patctl, NULL); + if (def_patctl.jit == 0 && (def_patctl.control & CTL_JITVERIFY) != 0) + def_patctl.jit = JIT_DEFAULT; + break; + + /* Set default subject modifiers */ + + case CMD_SUBJECT: + (void)decode_modifiers(argptr, CTX_DEFDAT, NULL, &def_datctl); + break; + + /* Check the default newline, and if not one of those listed, set up the + first one to be forced. An empty list unsets. 
*/ + + case CMD_NEWLINE_DEFAULT: + local_newline_default = 0; /* Unset */ + first_listed_newline = 0; + for (;;) + { + while (isspace(*argptr)) argptr++; + if (*argptr == 0) break; + for (i = 1; i < sizeof(newlines)/sizeof(char *); i++) + { + size_t nlen = strlen(newlines[i]); + if (strncmpic(argptr, (const uint8_t *)newlines[i], nlen) == 0 && + isspace(argptr[nlen])) + { + if (i == NEWLINE_DEFAULT) return PR_OK; /* Default is valid */ + if (first_listed_newline == 0) first_listed_newline = i; + } + } + while (*argptr != 0 && !isspace(*argptr)) argptr++; + } + local_newline_default = first_listed_newline; + break; + + /* Pop or copy a compiled pattern off the stack. Modifiers that do not affect + the compiled pattern (e.g. to give information) are permitted. The default + pattern modifiers are ignored. */ + + case CMD_POP: + case CMD_POPCOPY: + if (patstacknext <= 0) + { + fprintf(outfile, "** Can't pop off an empty stack\n"); + return PR_SKIP; + } + memset(&pat_patctl, 0, sizeof(patctl)); /* Completely unset */ + if (!decode_modifiers(argptr, CTX_POPPAT, &pat_patctl, NULL)) + return PR_SKIP; + + if (cmd == CMD_POP) + { + SET(compiled_code, patstack[--patstacknext]); + } + else + { + PCRE2_CODE_COPY_FROM_VOID(compiled_code, patstack[patstacknext - 1]); + } + + if (pat_patctl.jit != 0) + { + PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit); + } + if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info(); + if ((pat_patctl.control & CTL_FRAMESIZE) != 0) show_framesize(); + if ((pat_patctl.control & CTL_ANYINFO) != 0) + { + rc = show_pattern_info(); + if (rc != PR_OK) return rc; + } + break; + + /* Save the stack of compiled patterns to a file, then empty the stack. 
*/ + + case CMD_SAVE: + if (patstacknext <= 0) + { + fprintf(outfile, "** No stacked patterns to save\n"); + return PR_OK; + } + + rc = open_file(argptr+1, BINARY_OUTPUT_MODE, &f, "#save"); + if (rc != PR_OK) return rc; + + PCRE2_SERIALIZE_ENCODE(rc, patstack, patstacknext, &serial, &serial_size, + general_context); + if (rc < 0) + { + fclose(f); + if (!serial_error(rc, "Serialization")) return PR_ABEND; + break; + } + + /* Write the length at the start of the file to make it straightforward to + get the right memory when re-loading. This saves having to read the file size + in different operating systems. To allow for different endianness (even + though reloading with the opposite endianness does not work), write the + length byte-by-byte. */ + + for (i = 0; i < 4; i++) fputc((serial_size >> (i*8)) & 255, f); + if (fwrite(serial, 1, serial_size, f) != serial_size) + { + fprintf(outfile, "** Wrong return from fwrite()\n"); + fclose(f); + return PR_ABEND; + } + + fclose(f); + PCRE2_SERIALIZE_FREE(serial); + while(patstacknext > 0) + { + SET(compiled_code, patstack[--patstacknext]); + SUB1(pcre2_code_free, compiled_code); + } + SET(compiled_code, NULL); + break; + + /* Load a set of compiled patterns from a file onto the stack */ + + case CMD_LOAD: + rc = open_file(argptr+1, BINARY_INPUT_MODE, &f, "#load"); + if (rc != PR_OK) return rc; + + serial_size = 0; + for (i = 0; i < 4; i++) serial_size |= fgetc(f) << (i*8); + + serial = malloc(serial_size); + if (serial == NULL) + { + fprintf(outfile, "** Failed to get memory (size %" SIZ_FORM ") for #load\n", + SIZ_CAST serial_size); + fclose(f); + return PR_ABEND; + } + + i = fread(serial, 1, serial_size, f); + fclose(f); + + if (i != serial_size) + { + fprintf(outfile, "** Wrong return from fread()\n"); + yield = PR_ABEND; + } + else + { + PCRE2_SERIALIZE_GET_NUMBER_OF_CODES(rc, serial); + if (rc < 0) + { + if (!serial_error(rc, "Get number of codes")) yield = PR_ABEND; + } + else + { + if (rc + patstacknext > 
PATSTACKSIZE) + { + fprintf(outfile, "** Not enough space on pattern stack for %d pattern%s\n", + rc, (rc == 1)? "" : "s"); + rc = PATSTACKSIZE - patstacknext; + fprintf(outfile, "** Decoding %d pattern%s\n", rc, + (rc == 1)? "" : "s"); + } + PCRE2_SERIALIZE_DECODE(rc, patstack + patstacknext, rc, serial, + general_context); + if (rc < 0) + { + if (!serial_error(rc, "Deserialization")) yield = PR_ABEND; + } + else patstacknext += rc; + } + } + + free(serial); + break; + + /* Load a set of binary tables into tables3. */ + + case CMD_LOADTABLES: + rc = open_file(argptr+1, BINARY_INPUT_MODE, &f, "#loadtables"); + if (rc != PR_OK) return rc; + + if (tables3 == NULL) + { + (void)PCRE2_CONFIG(PCRE2_CONFIG_TABLES_LENGTH, &loadtables_length); + tables3 = malloc(loadtables_length); + } + + if (tables3 == NULL) + { + fprintf(outfile, "** Failed: malloc failed for #loadtables\n"); + yield = PR_ABEND; + } + else if (fread(tables3, 1, loadtables_length, f) != loadtables_length) + { + fprintf(outfile, "** Wrong return from fread()\n"); + yield = PR_ABEND; + } + + fclose(f); + break; + } + +return yield; +} + + + +/************************************************* +* Process pattern line * +*************************************************/ + +/* This function is called when the input buffer contains the start of a +pattern. The first character is known to be a valid delimiter. The pattern is +read, modifiers are interpreted, and a suitable local context is set up for +this test. The pattern is then compiled. + +Arguments: none + +Returns: PR_OK continue processing next line + PR_SKIP skip to a blank line + PR_ABEND abort the pcre2test run +*/ + +static int +process_pattern(void) +{ +BOOL utf; +uint32_t k; +uint8_t *p = buffer; +unsigned int delimiter = *p++; +int errorcode; +void *use_pat_context; +uint32_t use_forbid_utf = forbid_utf; +PCRE2_SIZE patlen; +PCRE2_SIZE valgrind_access_length; +PCRE2_SIZE erroroffset; + +/* The perltest.sh script supports only / as a delimiter. 
*/ + +if (restrict_for_perl_test && delimiter != '/') + { + fprintf(outfile, "** The only allowed delimiter after #perltest is '/'\n"); + return PR_ABEND; + } + +/* Initialize the context and pattern/data controls for this test from the +defaults. */ + +PATCTXCPY(pat_context, default_pat_context); +memcpy(&pat_patctl, &def_patctl, sizeof(patctl)); + +/* Find the end of the pattern, reading more lines if necessary. */ + +for(;;) + { + while (*p != 0) + { + if (*p == '\\' && p[1] != 0) p++; + else if (*p == delimiter) break; + p++; + } + if (*p != 0) break; + if ((p = extend_inputline(infile, p, " > ")) == NULL) + { + fprintf(outfile, "** Unexpected EOF\n"); + return PR_ABEND; + } + if (!INTERACTIVE(infile)) fprintf(outfile, "%s", (char *)p); + } + +/* If the first character after the delimiter is backslash, make the pattern +end with backslash. This is purely to provide a way of testing for the error +message when a pattern ends with backslash. */ + +if (p[1] == '\\') *p++ = '\\'; + +/* Terminate the pattern at the delimiter, and compute the length. */ + +*p++ = 0; +patlen = p - buffer - 2; + +/* Look for modifiers and options after the final delimiter. */ + +if (!decode_modifiers(p, CTX_PAT, &pat_patctl, NULL)) return PR_SKIP; + +/* Note that the match_invalid_utf option also sets utf when passed to +pcre2_compile(). */ + +utf = (pat_patctl.options & (PCRE2_UTF|PCRE2_MATCH_INVALID_UTF)) != 0; + +/* The utf8_input modifier is not allowed in 8-bit mode, and is mutually +exclusive with the utf modifier. */ + +if ((pat_patctl.control & CTL_UTF8_INPUT) != 0) + { + if (test_mode == PCRE8_MODE) + { + fprintf(outfile, "** The utf8_input modifier is not allowed in 8-bit mode\n"); + return PR_SKIP; + } + if (utf) + { + fprintf(outfile, "** The utf and utf8_input modifiers are mutually exclusive\n"); + return PR_SKIP; + } + } + +/* The convert and posix modifiers are mutually exclusive. 
*/ + +if (pat_patctl.convert_type != CONVERT_UNSET && + (pat_patctl.control & CTL_POSIX) != 0) + { + fprintf(outfile, "** The convert and posix modifiers are mutually exclusive\n"); + return PR_SKIP; + } + +/* Check for mutually exclusive control modifiers. At present, these are all in +the first control word. */ + +for (k = 0; k < sizeof(exclusive_pat_controls)/sizeof(uint32_t); k++) + { + uint32_t c = pat_patctl.control & exclusive_pat_controls[k]; + if (c != 0 && c != (c & (~c+1))) + { + show_controls(c, 0, "** Not allowed together:"); + fprintf(outfile, "\n"); + return PR_SKIP; + } + } + +/* Assume full JIT compile for jitverify and/or jitfast if nothing else was +specified. */ + +if (pat_patctl.jit == 0 && + (pat_patctl.control & (CTL_JITVERIFY|CTL_JITFAST)) != 0) + pat_patctl.jit = JIT_DEFAULT; + +/* Now copy the pattern to pbuffer8 for use in 8-bit testing and for reflecting +in callouts. Convert from hex if requested (literal strings in quotes may be +present within the hexadecimal pairs). The result must necessarily be fewer +characters so will always fit in pbuffer8. 
*/ + +if ((pat_patctl.control & CTL_HEXPAT) != 0) + { + uint8_t *pp, *pt; + uint32_t c, d; + + pt = pbuffer8; + for (pp = buffer + 1; *pp != 0; pp++) + { + if (isspace(*pp)) continue; + c = *pp++; + + /* Handle a literal substring */ + + if (c == '\'' || c == '"') + { + uint8_t *pq = pp; + for (;; pp++) + { + d = *pp; + if (d == 0) + { + fprintf(outfile, "** Missing closing quote in hex pattern: " + "opening quote is at offset %" PTR_FORM ".\n", pq - buffer - 2); + return PR_SKIP; + } + if (d == c) break; + *pt++ = d; + } + } + + /* Expect a hex pair */ + + else + { + if (!isxdigit(c)) + { + fprintf(outfile, "** Unexpected non-hex-digit '%c' at offset %" + PTR_FORM " in hex pattern: quote missing?\n", c, pp - buffer - 2); + return PR_SKIP; + } + if (*pp == 0) + { + fprintf(outfile, "** Odd number of digits in hex pattern\n"); + return PR_SKIP; + } + d = *pp; + if (!isxdigit(d)) + { + fprintf(outfile, "** Unexpected non-hex-digit '%c' at offset %" + PTR_FORM " in hex pattern: quote missing?\n", d, pp - buffer - 1); + return PR_SKIP; + } + c = toupper(c); + d = toupper(d); + *pt++ = ((isdigit(c)? (c - '0') : (c - 'A' + 10)) << 4) + + (isdigit(d)? (d - '0') : (d - 'A' + 10)); + } + } + *pt = 0; + patlen = pt - pbuffer8; + } + +/* If not a hex string, process for repetition expansion if requested. */ + +else if ((pat_patctl.control & CTL_EXPAND) != 0) + { + uint8_t *pp, *pt; + + pt = pbuffer8; + for (pp = buffer + 1; *pp != 0; pp++) + { + uint8_t *pc = pp; + uint32_t count = 1; + size_t length = 1; + + /* Check for replication syntax; if not found, the defaults just set will + prevail and one character will be copied. 
*/ + + if (pp[0] == '\\' && pp[1] == '[') + { + uint8_t *pe; + for (pe = pp + 2; *pe != 0; pe++) + { + if (pe[0] == ']' && pe[1] == '{') + { + uint32_t clen = pe - pc - 2; + uint32_t i = 0; + unsigned long uli; + char *endptr; + + pe += 2; + uli = strtoul((const char *)pe, &endptr, 10); + if (U32OVERFLOW(uli)) + { + fprintf(outfile, "** Pattern repeat count too large\n"); + return PR_SKIP; + } + + i = (uint32_t)uli; + pe = (uint8_t *)endptr; + if (*pe == '}') + { + if (i == 0) + { + fprintf(outfile, "** Zero repeat not allowed\n"); + return PR_SKIP; + } + pc += 2; + count = i; + length = clen; + pp = pe; + break; + } + } + } + } + + /* Add to output. If the buffer is too small expand it. The function for + expanding buffers always keeps buffer and pbuffer8 in step as far as their + size goes. */ + + while (pt + count * length > pbuffer8 + pbuffer8_size) + { + size_t pc_offset = pc - buffer; + size_t pp_offset = pp - buffer; + size_t pt_offset = pt - pbuffer8; + expand_input_buffers(); + pc = buffer + pc_offset; + pp = buffer + pp_offset; + pt = pbuffer8 + pt_offset; + } + + for (; count > 0; count--) + { + memcpy(pt, pc, length); + pt += length; + } + } + + *pt = 0; + patlen = pt - pbuffer8; + + if ((pat_patctl.control & CTL_INFO) != 0) + fprintf(outfile, "Expanded: %s\n", pbuffer8); + } + +/* Neither hex nor expanded, just copy the input verbatim. 
*/ + +else + { + strncpy((char *)pbuffer8, (char *)(buffer+1), patlen + 1); + } + +/* Sort out character tables */ + +if (pat_patctl.locale[0] != 0) + { + if (pat_patctl.tables_id != 0) + { + fprintf(outfile, "** 'Locale' and 'tables' must not both be set\n"); + return PR_SKIP; + } + if (setlocale(LC_CTYPE, (const char *)pat_patctl.locale) == NULL) + { + fprintf(outfile, "** Failed to set locale '%s'\n", pat_patctl.locale); + return PR_SKIP; + } + if (strcmp((const char *)pat_patctl.locale, (const char *)locale_name) != 0) + { + strcpy((char *)locale_name, (char *)pat_patctl.locale); + if (locale_tables != NULL) free((void *)locale_tables); + PCRE2_MAKETABLES(locale_tables); + } + use_tables = locale_tables; + } + +else switch (pat_patctl.tables_id) + { + case 0: use_tables = NULL; break; + case 1: use_tables = tables1; break; + case 2: use_tables = tables2; break; + + case 3: + if (tables3 == NULL) + { + fprintf(outfile, "** 'Tables = 3' is invalid: binary tables have not " + "been loaded\n"); + return PR_SKIP; + } + use_tables = tables3; + break; + + default: + fprintf(outfile, "** 'Tables' must specify 0, 1, 2, or 3.\n"); + return PR_SKIP; + } + +PCRE2_SET_CHARACTER_TABLES(pat_context, use_tables); + +/* Set up for the stackguard test. */ + +if (pat_patctl.stackguard_test != 0) + { + PCRE2_SET_COMPILE_RECURSION_GUARD(pat_context, stack_guard, NULL); + } + +/* Handle compiling via the POSIX interface, which doesn't support the +timing, showing, or debugging options, nor the ability to pass over +local character tables. Neither does it have 16-bit or 32-bit support. */ + +if ((pat_patctl.control & CTL_POSIX) != 0) + { +#ifdef SUPPORT_PCRE2_8 + int rc; + int cflags = 0; + const char *msg = "** Ignored with POSIX interface:"; +#endif + + if (test_mode != PCRE8_MODE) + { + fprintf(outfile, "** The POSIX interface is available only in 8-bit mode\n"); + return PR_SKIP; + } + +#ifdef SUPPORT_PCRE2_8 + /* Check for features that the POSIX interface does not support. 
*/ + + if (pat_patctl.locale[0] != 0) prmsg(&msg, "locale"); + if (pat_patctl.replacement[0] != 0) prmsg(&msg, "replace"); + if (pat_patctl.tables_id != 0) prmsg(&msg, "tables"); + if (pat_patctl.stackguard_test != 0) prmsg(&msg, "stackguard"); + if (timeit > 0) prmsg(&msg, "timing"); + if (pat_patctl.jit != 0) prmsg(&msg, "JIT"); + + if ((pat_patctl.options & ~POSIX_SUPPORTED_COMPILE_OPTIONS) != 0) + { + show_compile_options( + pat_patctl.options & ~POSIX_SUPPORTED_COMPILE_OPTIONS, msg, ""); + msg = ""; + } + + if ((FLD(pat_context, extra_options) & + ~POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS) != 0) + { + show_compile_extra_options( + FLD(pat_context, extra_options) & ~POSIX_SUPPORTED_COMPILE_EXTRA_OPTIONS, + msg, ""); + msg = ""; + } + + if ((pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS) != 0 || + (pat_patctl.control2 & ~POSIX_SUPPORTED_COMPILE_CONTROLS2) != 0) + { + show_controls(pat_patctl.control & ~POSIX_SUPPORTED_COMPILE_CONTROLS, + pat_patctl.control2 & ~POSIX_SUPPORTED_COMPILE_CONTROLS2, msg); + msg = ""; + } + + if (local_newline_default != 0) prmsg(&msg, "#newline_default"); + if (FLD(pat_context, max_pattern_length) != PCRE2_UNSET) + prmsg(&msg, "max_pattern_length"); + if (FLD(pat_context, parens_nest_limit) != PARENS_NEST_DEFAULT) + prmsg(&msg, "parens_nest_limit"); + + if (msg[0] == 0) fprintf(outfile, "\n"); + + /* Translate PCRE2 options to POSIX options and then compile. 
*/ + + if (utf) cflags |= REG_UTF; + if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) cflags |= REG_NOSUB; + if ((pat_patctl.options & PCRE2_UCP) != 0) cflags |= REG_UCP; + if ((pat_patctl.options & PCRE2_CASELESS) != 0) cflags |= REG_ICASE; + if ((pat_patctl.options & PCRE2_LITERAL) != 0) cflags |= REG_NOSPEC; + if ((pat_patctl.options & PCRE2_MULTILINE) != 0) cflags |= REG_NEWLINE; + if ((pat_patctl.options & PCRE2_DOTALL) != 0) cflags |= REG_DOTALL; + if ((pat_patctl.options & PCRE2_UNGREEDY) != 0) cflags |= REG_UNGREEDY; + + if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) != 0) + { + preg.re_endp = (char *)pbuffer8 + patlen; + cflags |= REG_PEND; + } + + rc = regcomp(&preg, (char *)pbuffer8, cflags); + + /* Compiling failed */ + + if (rc != 0) + { + size_t bsize, usize; + int psize; + + preg.re_pcre2_code = NULL; /* In case something was left in there */ + preg.re_match_data = NULL; + + bsize = (pat_patctl.regerror_buffsize != 0)? + pat_patctl.regerror_buffsize : pbuffer8_size; + if (bsize + 8 < pbuffer8_size) + memcpy(pbuffer8 + bsize, "DEADBEEF", 8); + usize = regerror(rc, &preg, (char *)pbuffer8, bsize); + + /* Inside regerror(), snprintf() is used. If the buffer is too small, some + versions of snprintf() put a zero byte at the end, but others do not. + Therefore, we print a maximum of one less than the size of the buffer. */ + + psize = (int)bsize - 1; + fprintf(outfile, "Failed: POSIX code %d: %.*s\n", rc, psize, pbuffer8); + if (usize > bsize) + { + fprintf(outfile, "** regerror() message truncated\n"); + if (memcmp(pbuffer8 + bsize, "DEADBEEF", 8) != 0) + fprintf(outfile, "** regerror() buffer overflow\n"); + } + return PR_SKIP; + } + + /* Compiling succeeded. Check that the values in the preg block are sensible. + It can happen that pcre2test is accidentally linked with a different POSIX + library which succeeds, but of course puts different things into preg. 
In + this situation, calling regfree() may cause a segfault (or invalid free() in + valgrind), so ensure that preg.re_pcre2_code is NULL, which suppresses the + calling of regfree() on exit. */ + + if (preg.re_pcre2_code == NULL || + ((pcre2_real_code_8 *)preg.re_pcre2_code)->magic_number != MAGIC_NUMBER || + ((pcre2_real_code_8 *)preg.re_pcre2_code)->top_bracket != preg.re_nsub || + preg.re_match_data == NULL || + preg.re_cflags != cflags) + { + fprintf(outfile, + "** The regcomp() function returned zero (success), but the values set\n" + "** in the preg block are not valid for PCRE2. Check that pcre2test is\n" + "** linked with PCRE2's pcre2posix module (-lpcre2-posix) and not with\n" + "** some other POSIX regex library.\n**\n"); + preg.re_pcre2_code = NULL; + return PR_ABEND; + } + + return PR_OK; +#endif /* SUPPORT_PCRE2_8 */ + } + +/* Handle compiling via the native interface. Controls that act later are +ignored with "push". Replacements are locked out. */ + +if ((pat_patctl.control & (CTL_PUSH|CTL_PUSHCOPY|CTL_PUSHTABLESCOPY)) != 0) + { + if (pat_patctl.replacement[0] != 0) + { + fprintf(outfile, "** Replacement text is not supported with 'push'.\n"); + return PR_OK; + } + if ((pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS) != 0 || + (pat_patctl.control2 & ~PUSH_SUPPORTED_COMPILE_CONTROLS2) != 0) + { + show_controls(pat_patctl.control & ~PUSH_SUPPORTED_COMPILE_CONTROLS, + pat_patctl.control2 & ~PUSH_SUPPORTED_COMPILE_CONTROLS2, + "** Ignored when compiled pattern is stacked with 'push':"); + fprintf(outfile, "\n"); + } + if ((pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS) != 0 || + (pat_patctl.control2 & PUSH_COMPILE_ONLY_CONTROLS2) != 0) + { + show_controls(pat_patctl.control & PUSH_COMPILE_ONLY_CONTROLS, + pat_patctl.control2 & PUSH_COMPILE_ONLY_CONTROLS2, + "** Applies only to compile when pattern is stacked with 'push':"); + fprintf(outfile, "\n"); + } + } + +/* Convert the input in non-8-bit modes. 
*/ + +errorcode = 0; + +#ifdef SUPPORT_PCRE2_16 +if (test_mode == PCRE16_MODE) errorcode = to16(pbuffer8, utf, &patlen); +#endif + +#ifdef SUPPORT_PCRE2_32 +if (test_mode == PCRE32_MODE) errorcode = to32(pbuffer8, utf, &patlen); +#endif + +switch(errorcode) + { + case -1: + fprintf(outfile, "** Failed: invalid UTF-8 string cannot be " + "converted to %d-bit string\n", (test_mode == PCRE16_MODE)? 16:32); + return PR_SKIP; + + case -2: + fprintf(outfile, "** Failed: character value greater than 0x10ffff " + "cannot be converted to UTF\n"); + return PR_SKIP; + + case -3: + fprintf(outfile, "** Failed: character value greater than 0xffff " + "cannot be converted to 16-bit in non-UTF mode\n"); + return PR_SKIP; + + default: + break; + } + +/* The pattern is now in pbuffer[8|16|32], with the length in code units in +patlen. If it is to be converted, copy the result back afterwards so that it +ends up back in the usual place. */ + +if (pat_patctl.convert_type != CONVERT_UNSET) + { + int rc; + int convert_return = PR_OK; + uint32_t convert_options = pat_patctl.convert_type; + void *converted_pattern; + PCRE2_SIZE converted_length; + + if (pat_patctl.convert_length != 0) + { + converted_length = pat_patctl.convert_length; + converted_pattern = malloc(converted_length * code_unit_size); + if (converted_pattern == NULL) + { + fprintf(outfile, "** Failed: malloc failed for converted pattern\n"); + return PR_SKIP; + } + } + else converted_pattern = NULL; /* Let the library allocate */ + + if (utf) convert_options |= PCRE2_CONVERT_UTF; + if ((pat_patctl.options & PCRE2_NO_UTF_CHECK) != 0) + convert_options |= PCRE2_CONVERT_NO_UTF_CHECK; + + CONCTXCPY(con_context, default_con_context); + + if (pat_patctl.convert_glob_escape != 0) + { + uint32_t escape = (pat_patctl.convert_glob_escape == '0')? 
0 : + pat_patctl.convert_glob_escape; + PCRE2_SET_GLOB_ESCAPE(rc, con_context, escape); + if (rc != 0) + { + fprintf(outfile, "** Invalid glob escape '%c'\n", + pat_patctl.convert_glob_escape); + convert_return = PR_SKIP; + goto CONVERT_FINISH; + } + } + + if (pat_patctl.convert_glob_separator != 0) + { + PCRE2_SET_GLOB_SEPARATOR(rc, con_context, pat_patctl.convert_glob_separator); + if (rc != 0) + { + fprintf(outfile, "** Invalid glob separator '%c'\n", + pat_patctl.convert_glob_separator); + convert_return = PR_SKIP; + goto CONVERT_FINISH; + } + } + + PCRE2_PATTERN_CONVERT(rc, pbuffer, patlen, convert_options, + &converted_pattern, &converted_length, con_context); + + if (rc != 0) + { + fprintf(outfile, "** Pattern conversion error at offset %" SIZ_FORM ": ", + SIZ_CAST converted_length); + convert_return = print_error_message(rc, "", "\n")? PR_SKIP:PR_ABEND; + } + + /* Output the converted pattern, then copy it. */ + + else + { + PCHARSV(converted_pattern, 0, converted_length, utf, outfile); + fprintf(outfile, "\n"); + patlen = converted_length; + CONVERT_COPY(pbuffer, converted_pattern, converted_length + 1); + } + + /* Free the converted pattern. */ + + CONVERT_FINISH: + if (pat_patctl.convert_length != 0) + free(converted_pattern); + else + PCRE2_CONVERTED_PATTERN_FREE(converted_pattern); + + /* Return if conversion was unsuccessful. */ + + if (convert_return != PR_OK) return convert_return; + } + +/* By default we pass a zero-terminated pattern, but a length is passed if +"use_length" was specified or this is a hex pattern (which might contain binary +zeros). When valgrind is supported, arrange for the unused part of the buffer +to be marked as no access. 
*/ + +valgrind_access_length = patlen; +if ((pat_patctl.control & (CTL_HEXPAT|CTL_USE_LENGTH)) == 0) + { + patlen = PCRE2_ZERO_TERMINATED; + valgrind_access_length += 1; /* For the terminating zero */ + } + +#ifdef SUPPORT_VALGRIND +#ifdef SUPPORT_PCRE2_8 +if (test_mode == PCRE8_MODE && pbuffer8 != NULL) + { + VALGRIND_MAKE_MEM_NOACCESS(pbuffer8 + valgrind_access_length, + pbuffer8_size - valgrind_access_length); + } +#endif +#ifdef SUPPORT_PCRE2_16 +if (test_mode == PCRE16_MODE && pbuffer16 != NULL) + { + VALGRIND_MAKE_MEM_NOACCESS(pbuffer16 + valgrind_access_length, + pbuffer16_size - valgrind_access_length*sizeof(uint16_t)); + } +#endif +#ifdef SUPPORT_PCRE2_32 +if (test_mode == PCRE32_MODE && pbuffer32 != NULL) + { + VALGRIND_MAKE_MEM_NOACCESS(pbuffer32 + valgrind_access_length, + pbuffer32_size - valgrind_access_length*sizeof(uint32_t)); + } +#endif +#else /* Valgrind not supported */ +(void)valgrind_access_length; /* Avoid compiler warning */ +#endif + +/* If #newline_default has been used and the library was not compiled with an +appropriate default newline setting, local_newline_default will be non-zero. We +use this if there is no explicit newline modifier. */ + +if ((pat_patctl.control2 & CTL2_NL_SET) == 0 && local_newline_default != 0) + { + SETFLD(pat_context, newline_convention, local_newline_default); + } + +/* The null_context modifier is used to test calling pcre2_compile() with a +NULL context. */ + +use_pat_context = ((pat_patctl.control & CTL_NULLCONTEXT) != 0)? + NULL : PTR(pat_context); + +/* If PCRE2_LITERAL is set, set use_forbid_utf zero because PCRE2_NEVER_UTF +and PCRE2_NEVER_UCP are invalid with it. */ + +if ((pat_patctl.options & PCRE2_LITERAL) != 0) use_forbid_utf = 0; + +/* Compile many times when timing. 
*/ + +if (timeit > 0) + { + int i; + clock_t time_taken = 0; + for (i = 0; i < timeit; i++) + { + clock_t start_time = clock(); + PCRE2_COMPILE(compiled_code, pbuffer, patlen, + pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset, + use_pat_context); + time_taken += clock() - start_time; + if (TEST(compiled_code, !=, NULL)) + { SUB1(pcre2_code_free, compiled_code); } + } + total_compile_time += time_taken; + fprintf(outfile, "Compile time %.4f milliseconds\n", + (((double)time_taken * 1000.0) / (double)timeit) / + (double)CLOCKS_PER_SEC); + } + +/* A final compile that is used "for real". */ + +PCRE2_COMPILE(compiled_code, pbuffer, patlen, pat_patctl.options|use_forbid_utf, + &errorcode, &erroroffset, use_pat_context); + +/* Call the JIT compiler if requested. When timing, we must free and recompile +the pattern each time because that is the only way to free the JIT compiled +code. We know that compilation will always succeed. */ + +if (TEST(compiled_code, !=, NULL) && pat_patctl.jit != 0) + { + if (timeit > 0) + { + int i; + clock_t time_taken = 0; + + for (i = 0; i < timeit; i++) + { + clock_t start_time; + SUB1(pcre2_code_free, compiled_code); + PCRE2_COMPILE(compiled_code, pbuffer, patlen, + pat_patctl.options|use_forbid_utf, &errorcode, &erroroffset, + use_pat_context); + start_time = clock(); + PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit); + time_taken += clock() - start_time; + } + total_jit_compile_time += time_taken; + fprintf(outfile, "JIT compile %.4f milliseconds\n", + (((double)time_taken * 1000.0) / (double)timeit) / + (double)CLOCKS_PER_SEC); + } + else + { + PCRE2_JIT_COMPILE(jitrc, compiled_code, pat_patctl.jit); + } + } + +/* If valgrind is supported, mark the pbuffer as accessible again. The 16-bit +and 32-bit buffers can be marked completely undefined, but we must leave the +pattern in the 8-bit buffer defined because it may be read from a callout +during matching. 
*/ + +#ifdef SUPPORT_VALGRIND +#ifdef SUPPORT_PCRE2_8 +if (test_mode == PCRE8_MODE) + { + VALGRIND_MAKE_MEM_UNDEFINED(pbuffer8 + valgrind_access_length, + pbuffer8_size - valgrind_access_length); + } +#endif +#ifdef SUPPORT_PCRE2_16 +if (test_mode == PCRE16_MODE) + { + VALGRIND_MAKE_MEM_UNDEFINED(pbuffer16, pbuffer16_size); + } +#endif +#ifdef SUPPORT_PCRE2_32 +if (test_mode == PCRE32_MODE) + { + VALGRIND_MAKE_MEM_UNDEFINED(pbuffer32, pbuffer32_size); + } +#endif +#endif + +/* Compilation failed; go back for another re, skipping to blank line +if non-interactive. */ + +if (TEST(compiled_code, ==, NULL)) + { + fprintf(outfile, "Failed: error %d at offset %d: ", errorcode, + (int)erroroffset); + if (!print_error_message(errorcode, "", "\n")) return PR_ABEND; + return PR_SKIP; + } + +/* If forbid_utf is non-zero, we are running a non-UTF test. UTF and UCP are +locked out at compile time, but we must also check for occurrences of \P, \p, +and \X, which are only supported when Unicode is supported. */ + +if (forbid_utf != 0) + { + if ((FLD(compiled_code, flags) & PCRE2_HASBKPORX) != 0) + { + fprintf(outfile, "** \\P, \\p, and \\X are not allowed after the " + "#forbid_utf command\n"); + return PR_SKIP; + } + } + +/* Remember the maximum lookbehind, for partial matching. */ + +if (pattern_info(PCRE2_INFO_MAXLOOKBEHIND, &maxlookbehind, FALSE) != 0) + return PR_ABEND; + +/* Remember the number of captures. */ + +if (pattern_info(PCRE2_INFO_CAPTURECOUNT, &maxcapcount, FALSE) < 0) + return PR_ABEND; + +/* If an explicit newline modifier was given, set the information flag in the +pattern so that it is preserved over push/pop. */ + +if ((pat_patctl.control2 & CTL2_NL_SET) != 0) + { + SETFLD(compiled_code, flags, FLD(compiled_code, flags) | PCRE2_NL_SET); + } + +/* Output code size and other information if requested. 
*/ + +if ((pat_patctl.control & CTL_MEMORY) != 0) show_memory_info(); +if ((pat_patctl.control & CTL_FRAMESIZE) != 0) show_framesize(); +if ((pat_patctl.control & CTL_ANYINFO) != 0) + { + int rc = show_pattern_info(); + if (rc != PR_OK) return rc; + } + +/* The "push" control requests that the compiled pattern be remembered on a +stack. This is mainly for testing the serialization functionality. */ + +if ((pat_patctl.control & CTL_PUSH) != 0) + { + if (patstacknext >= PATSTACKSIZE) + { + fprintf(outfile, "** Too many pushed patterns (max %d)\n", PATSTACKSIZE); + return PR_ABEND; + } + patstack[patstacknext++] = PTR(compiled_code); + SET(compiled_code, NULL); + } + +/* The "pushcopy" and "pushtablescopy" controls are similar, but push a +copy of the pattern, the latter with a copy of its character tables. This tests +the pcre2_code_copy() and pcre2_code_copy_with_tables() functions. */ + +if ((pat_patctl.control & (CTL_PUSHCOPY|CTL_PUSHTABLESCOPY)) != 0) + { + if (patstacknext >= PATSTACKSIZE) + { + fprintf(outfile, "** Too many pushed patterns (max %d)\n", PATSTACKSIZE); + return PR_ABEND; + } + if ((pat_patctl.control & CTL_PUSHCOPY) != 0) + { + PCRE2_CODE_COPY_TO_VOID(patstack[patstacknext++], compiled_code); + } + else + { + PCRE2_CODE_COPY_WITH_TABLES_TO_VOID(patstack[patstacknext++], + compiled_code); } + } + +return PR_OK; +} + + + +/************************************************* +* Check heap, match or depth limit * +*************************************************/ + +/* This is used for DFA, normal, and JIT fast matching. For DFA matching it +should only be called with the third argument set to PCRE2_ERROR_DEPTHLIMIT. 
+ +Arguments: + pp the subject string + ulen length of subject or PCRE2_ZERO_TERMINATED + errnumber defines which limit to test + msg string to include in final message + +Returns: the return from the final match function call +*/ + +static int +check_match_limit(uint8_t *pp, PCRE2_SIZE ulen, int errnumber, const char *msg) +{ +int capcount; +uint32_t min = 0; +uint32_t mid = 64; +uint32_t max = UINT32_MAX; + +PCRE2_SET_MATCH_LIMIT(dat_context, max); +PCRE2_SET_DEPTH_LIMIT(dat_context, max); +PCRE2_SET_HEAP_LIMIT(dat_context, max); + +for (;;) + { + uint32_t stack_start = 0; + + if (errnumber == PCRE2_ERROR_HEAPLIMIT) + { + PCRE2_SET_HEAP_LIMIT(dat_context, mid); + } + else if (errnumber == PCRE2_ERROR_MATCHLIMIT) + { + PCRE2_SET_MATCH_LIMIT(dat_context, mid); + } + else + { + PCRE2_SET_DEPTH_LIMIT(dat_context, mid); + } + + if ((dat_datctl.control & CTL_DFA) != 0) + { + stack_start = DFA_START_RWS_SIZE/1024; + if (dfa_workspace == NULL) + dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int)); + if (dfa_matched++ == 0) + dfa_workspace[0] = -1; /* To catch bad restart */ + PCRE2_DFA_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset, + dat_datctl.options, match_data, + PTR(dat_context), dfa_workspace, DFA_WS_DIMENSION); + } + + else if ((pat_patctl.control & CTL_JITFAST) != 0) + PCRE2_JIT_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset, + dat_datctl.options, match_data, PTR(dat_context)); + + else + { + stack_start = START_FRAMES_SIZE/1024; + PCRE2_MATCH(capcount, compiled_code, pp, ulen, dat_datctl.offset, + dat_datctl.options, match_data, PTR(dat_context)); + } + + if (capcount == errnumber) + { + if ((mid & 0x80000000u) != 0) + { + fprintf(outfile, "Can't find minimum %s limit: check pattern for " + "restriction\n", msg); + break; + } + + min = mid; + mid = (mid == max - 1)? max : (max != UINT32_MAX)? 
(min + max)/2 : mid*2; + } + else if (capcount >= 0 || + capcount == PCRE2_ERROR_NOMATCH || + capcount == PCRE2_ERROR_PARTIAL) + { + /* If we've not hit the error with a heap limit less than the size of the + initial stack frame vector (for pcre2_match()) or the initial stack + workspace vector (for pcre2_dfa_match()), the heap is not being used, so + the minimum limit is zero; there's no need to go on. The other limits are + always greater than zero. */ + + if (errnumber == PCRE2_ERROR_HEAPLIMIT && mid < stack_start) + { + fprintf(outfile, "Minimum %s limit = 0\n", msg); + break; + } + if (mid == min + 1) + { + fprintf(outfile, "Minimum %s limit = %d\n", msg, mid); + break; + } + max = mid; + mid = (min + max)/2; + } + else break; /* Some other error */ + } + +return capcount; +} + + + +/************************************************* +* Substitute callout function * +*************************************************/ + +/* Called from pcre2_substitute() when the substitute_callout modifier is set. +Print out the data that is passed back. The substitute callout block is +identical for all code unit widths, so we just pick one. 
+ +Arguments: + scb pointer to substitute callout block + data_ptr callout data + +Returns: -1 to stop, +1 to skip, otherwise 0 +*/ + +static int +substitute_callout_function(pcre2_substitute_callout_block_8 *scb, + void *data_ptr) +{ +int yield = 0; +BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0; +(void)data_ptr; /* Not used */ + +fprintf(outfile, "%2d(%d) Old %" SIZ_FORM " %" SIZ_FORM " \"", + scb->subscount, scb->oveccount, + SIZ_CAST scb->ovector[0], SIZ_CAST scb->ovector[1]); + +PCHARSV(scb->input, scb->ovector[0], scb->ovector[1] - scb->ovector[0], + utf, outfile); + +fprintf(outfile, "\" New %" SIZ_FORM " %" SIZ_FORM " \"", + SIZ_CAST scb->output_offsets[0], SIZ_CAST scb->output_offsets[1]); + +PCHARSV(scb->output, scb->output_offsets[0], + scb->output_offsets[1] - scb->output_offsets[0], utf, outfile); + +if (scb->subscount == dat_datctl.substitute_stop) + { + yield = -1; + fprintf(outfile, " STOPPED"); + } +else if (scb->subscount == dat_datctl.substitute_skip) + { + yield = +1; + fprintf(outfile, " SKIPPED"); + } + +fprintf(outfile, "\"\n"); +return yield; +} + + +/************************************************* +* Callout function * +*************************************************/ + +/* Called from a PCRE2 library as a result of the (?C) item. We print out where +we are in the match (unless suppressed). Yield zero unless more callouts than +the fail count, or the callout data is not zero. The only differences in the +callout block for different code unit widths are that the pointers to the +subject, the most recent MARK, and a callout argument string point to strings +of the appropriate width. Casts can be used to deal with this. 
+ +Arguments: + cb a pointer to a callout block + callout_data_ptr the provided callout data + +Returns: 0 or 1 or an error, as determined by settings +*/ + +static int +callout_function(pcre2_callout_block_8 *cb, void *callout_data_ptr) +{ +FILE *f, *fdefault; +uint32_t i, pre_start, post_start, subject_length; +PCRE2_SIZE current_position; +BOOL utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0; +BOOL callout_capture = (dat_datctl.control & CTL_CALLOUT_CAPTURE) != 0; +BOOL callout_where = (dat_datctl.control2 & CTL2_CALLOUT_NO_WHERE) == 0; + +/* The FILE f is used for echoing the subject string if it is non-NULL. This +happens only once in simple cases, but we want to repeat after any additional +output caused by CALLOUT_EXTRA. */ + +fdefault = (!first_callout && !callout_capture && cb->callout_string == NULL)? + NULL : outfile; + +if ((dat_datctl.control2 & CTL2_CALLOUT_EXTRA) != 0) + { + f = outfile; + switch (cb->callout_flags) + { + case PCRE2_CALLOUT_BACKTRACK: + fprintf(f, "Backtrack\n"); + break; + + case PCRE2_CALLOUT_STARTMATCH|PCRE2_CALLOUT_BACKTRACK: + fprintf(f, "Backtrack\nNo other matching paths\n"); + /* Fall through */ + + case PCRE2_CALLOUT_STARTMATCH: + fprintf(f, "New match attempt\n"); + break; + + default: + f = fdefault; + break; + } + } +else f = fdefault; + +/* For a callout with a string argument, show the string first because there +isn't a tidy way to fit it in the rest of the data. 
*/ + +if (cb->callout_string != NULL) + { + uint32_t delimiter = CODE_UNIT(cb->callout_string, -1); + fprintf(outfile, "Callout (%" SIZ_FORM "): %c", + SIZ_CAST cb->callout_string_offset, delimiter); + PCHARSV(cb->callout_string, 0, + cb->callout_string_length, utf, outfile); + for (i = 0; callout_start_delims[i] != 0; i++) + if (delimiter == callout_start_delims[i]) + { + delimiter = callout_end_delims[i]; + break; + } + fprintf(outfile, "%c", delimiter); + if (!callout_capture) fprintf(outfile, "\n"); + } + +/* Show captured strings if required */ + +if (callout_capture) + { + if (cb->callout_string == NULL) + fprintf(outfile, "Callout %d:", cb->callout_number); + fprintf(outfile, " last capture = %d\n", cb->capture_last); + for (i = 2; i < cb->capture_top * 2; i += 2) + { + fprintf(outfile, "%2d: ", i/2); + if (cb->offset_vector[i] == PCRE2_UNSET) + fprintf(outfile, "<unset>"); + else + { + PCHARSV(cb->subject, cb->offset_vector[i], + cb->offset_vector[i+1] - cb->offset_vector[i], utf, f); + } + fprintf(outfile, "\n"); + } + } + +/* Unless suppressed, re-print the subject in canonical form (with escapes for +non-printing characters), the first time, or if giving full details. On +subsequent calls in the same match, we use PCHARS() just to find the printed +lengths of the substrings. */ + +if (callout_where) + { + if (f != NULL) fprintf(f, "--->"); + + /* The subject before the match start. */ + + PCHARS(pre_start, cb->subject, 0, cb->start_match, utf, f); + + /* If a lookbehind is involved, the current position may be earlier than the + match start. If so, use the match start instead. */ + + current_position = (cb->current_position >= cb->start_match)? + cb->current_position : cb->start_match; + + /* The subject between the match start and the current position. */ + + PCHARS(post_start, cb->subject, cb->start_match, + current_position - cb->start_match, utf, f); + + /* Print from the current position to the end. 
*/ + + PCHARSV(cb->subject, current_position, cb->subject_length - current_position, + utf, f); + + /* Calculate the total subject printed length (no print). */ + + PCHARS(subject_length, cb->subject, 0, cb->subject_length, utf, NULL); + + if (f != NULL) fprintf(f, "\n"); + + /* For automatic callouts, show the pattern offset. Otherwise, for a + numerical callout whose number has not already been shown with captured + strings, show the number here. A callout with a string argument has been + displayed above. */ + + if (cb->callout_number == 255) + { + fprintf(outfile, "%+3d ", (int)cb->pattern_position); + if (cb->pattern_position > 99) fprintf(outfile, "\n "); + } + else + { + if (callout_capture || cb->callout_string != NULL) fprintf(outfile, " "); + else fprintf(outfile, "%3d ", cb->callout_number); + } + + /* Now show position indicators */ + + for (i = 0; i < pre_start; i++) fprintf(outfile, " "); + fprintf(outfile, "^"); + + if (post_start > 0) + { + for (i = 0; i < post_start - 1; i++) fprintf(outfile, " "); + fprintf(outfile, "^"); + } + + for (i = 0; i < subject_length - pre_start - post_start + 4; i++) + fprintf(outfile, " "); + + if (cb->next_item_length != 0) + fprintf(outfile, "%.*s", (int)(cb->next_item_length), + pbuffer8 + cb->pattern_position); + else + fprintf(outfile, "End of pattern"); + + fprintf(outfile, "\n"); + } + +first_callout = FALSE; + +/* Show any mark info */ + +if (cb->mark != last_callout_mark) + { + if (cb->mark == NULL) + fprintf(outfile, "Latest Mark: <unset>\n"); + else + { + fprintf(outfile, "Latest Mark: "); + PCHARSV(cb->mark, -1, -1, utf, outfile); + putc('\n', outfile); + } + last_callout_mark = cb->mark; + } + +/* Show callout data */ + +if (callout_data_ptr != NULL) + { + int callout_data = *((int32_t *)callout_data_ptr); + if (callout_data != 0) + { + fprintf(outfile, "Callout data = %d\n", callout_data); + return callout_data; + } + } + +/* Keep count and give the appropriate return code */ + +callout_count++; + +if 
(cb->callout_number == dat_datctl.cerror[0] && + callout_count >= dat_datctl.cerror[1]) + return PCRE2_ERROR_CALLOUT; + +if (cb->callout_number == dat_datctl.cfail[0] && + callout_count >= dat_datctl.cfail[1]) + return 1; + +return 0; +} + + + +/************************************************* +* Handle *MARK and copy/get tests * +*************************************************/ + +/* This function is called after complete and partial matches. It runs the +tests for substring extraction. + +Arguments: + utf TRUE for utf + capcount return from pcre2_match() + +Returns: FALSE if print_error_message() fails +*/ + +static BOOL +copy_and_get(BOOL utf, int capcount) +{ +int i; +uint8_t *nptr; + +/* Test copy strings by number */ + +for (i = 0; i < MAXCPYGET && dat_datctl.copy_numbers[i] >= 0; i++) + { + int rc; + PCRE2_SIZE length, length2; + uint32_t copybuffer[256]; + uint32_t n = (uint32_t)(dat_datctl.copy_numbers[i]); + length = sizeof(copybuffer)/code_unit_size; + PCRE2_SUBSTRING_COPY_BYNUMBER(rc, match_data, n, copybuffer, &length); + if (rc < 0) + { + fprintf(outfile, "Copy substring %d failed (%d): ", n, rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else + { + PCRE2_SUBSTRING_LENGTH_BYNUMBER(rc, match_data, n, &length2); + if (rc < 0) + { + fprintf(outfile, "Get substring %d length failed (%d): ", n, rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else if (length2 != length) + { + fprintf(outfile, "Mismatched substring lengths: %" + SIZ_FORM " %" SIZ_FORM "\n", SIZ_CAST length, SIZ_CAST length2); + } + fprintf(outfile, "%2dC ", n); + PCHARSV(copybuffer, 0, length, utf, outfile); + fprintf(outfile, " (%" SIZ_FORM ")\n", SIZ_CAST length); + } + } + +/* Test copy strings by name */ + +nptr = dat_datctl.copy_names; +for (;;) + { + int rc; + int groupnumber; + PCRE2_SIZE length, length2; + uint32_t copybuffer[256]; + int namelen = strlen((const char *)nptr); +#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32 + 
PCRE2_SIZE cnl = namelen; +#endif + if (namelen == 0) break; + +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr); +#endif +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl); +#endif +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl); +#endif + + PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer); + if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING) + fprintf(outfile, "Number not found for group '%s'\n", nptr); + + length = sizeof(copybuffer)/code_unit_size; + PCRE2_SUBSTRING_COPY_BYNAME(rc, match_data, pbuffer, copybuffer, &length); + if (rc < 0) + { + fprintf(outfile, "Copy substring '%s' failed (%d): ", nptr, rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else + { + PCRE2_SUBSTRING_LENGTH_BYNAME(rc, match_data, pbuffer, &length2); + if (rc < 0) + { + fprintf(outfile, "Get substring '%s' length failed (%d): ", nptr, rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else if (length2 != length) + { + fprintf(outfile, "Mismatched substring lengths: %" + SIZ_FORM " %" SIZ_FORM "\n", SIZ_CAST length, SIZ_CAST length2); + } + fprintf(outfile, " C "); + PCHARSV(copybuffer, 0, length, utf, outfile); + fprintf(outfile, " (%" SIZ_FORM ") %s", SIZ_CAST length, nptr); + if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber); + else fprintf(outfile, " (non-unique)\n"); + } + nptr += namelen + 1; + } + +/* Test get strings by number */ + +for (i = 0; i < MAXCPYGET && dat_datctl.get_numbers[i] >= 0; i++) + { + int rc; + PCRE2_SIZE length; + void *gotbuffer; + uint32_t n = (uint32_t)(dat_datctl.get_numbers[i]); + PCRE2_SUBSTRING_GET_BYNUMBER(rc, match_data, n, &gotbuffer, &length); + if (rc < 0) + { + fprintf(outfile, "Get substring %d failed (%d): ", n, rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else + { + fprintf(outfile, "%2dG ", n); + PCHARSV(gotbuffer, 0, 
length, utf, outfile); + fprintf(outfile, " (%" SIZ_FORM ")\n", SIZ_CAST length); + PCRE2_SUBSTRING_FREE(gotbuffer); + } + } + +/* Test get strings by name */ + +nptr = dat_datctl.get_names; +for (;;) + { + PCRE2_SIZE length; + void *gotbuffer; + int rc; + int groupnumber; + int namelen = strlen((const char *)nptr); +#if defined SUPPORT_PCRE2_16 || defined SUPPORT_PCRE2_32 + PCRE2_SIZE cnl = namelen; +#endif + if (namelen == 0) break; + +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) strcpy((char *)pbuffer8, (char *)nptr); +#endif +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE)(void)to16(nptr, utf, &cnl); +#endif +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE)(void)to32(nptr, utf, &cnl); +#endif + + PCRE2_SUBSTRING_NUMBER_FROM_NAME(groupnumber, compiled_code, pbuffer); + if (groupnumber < 0 && groupnumber != PCRE2_ERROR_NOUNIQUESUBSTRING) + fprintf(outfile, "Number not found for group '%s'\n", nptr); + + PCRE2_SUBSTRING_GET_BYNAME(rc, match_data, pbuffer, &gotbuffer, &length); + if (rc < 0) + { + fprintf(outfile, "Get substring '%s' failed (%d): ", nptr, rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else + { + fprintf(outfile, " G "); + PCHARSV(gotbuffer, 0, length, utf, outfile); + fprintf(outfile, " (%" SIZ_FORM ") %s", SIZ_CAST length, nptr); + if (groupnumber >= 0) fprintf(outfile, " (group %d)\n", groupnumber); + else fprintf(outfile, " (non-unique)\n"); + PCRE2_SUBSTRING_FREE(gotbuffer); + } + nptr += namelen + 1; + } + +/* Test getting the complete list of captured strings. 
*/ + +if ((dat_datctl.control & CTL_GETALL) != 0) + { + int rc; + void **stringlist; + PCRE2_SIZE *lengths; + PCRE2_SUBSTRING_LIST_GET(rc, match_data, &stringlist, &lengths); + if (rc < 0) + { + fprintf(outfile, "get substring list failed (%d): ", rc); + if (!print_error_message(rc, "", "\n")) return FALSE; + } + else + { + for (i = 0; i < capcount; i++) + { + fprintf(outfile, "%2dL ", i); + PCHARSV(stringlist[i], 0, lengths[i], utf, outfile); + putc('\n', outfile); + } + if (stringlist[i] != NULL) + fprintf(outfile, "string list not terminated by NULL\n"); + PCRE2_SUBSTRING_LIST_FREE(stringlist); + } + } + +return TRUE; +} + + + +/************************************************* +* Show an entire ovector * +*************************************************/ + +/* This function is called after partial matching or match failure, when the +"allvector" modifier is set. It is a means of checking the contents of the +entire ovector, to ensure no modification of fields that should be unchanged. + +Arguments: + ovector points to the ovector + oveccount number of pairs + +Returns: nothing +*/ + +static void +show_ovector(PCRE2_SIZE *ovector, uint32_t oveccount) +{ +uint32_t i; +for (i = 0; i < 2*oveccount; i += 2) + { + PCRE2_SIZE start = ovector[i]; + PCRE2_SIZE end = ovector[i+1]; + + fprintf(outfile, "%2d: ", i/2); + if (start == PCRE2_UNSET && end == PCRE2_UNSET) + fprintf(outfile, "<unset>\n"); + else if (start == JUNK_OFFSET && end == JUNK_OFFSET) + fprintf(outfile, "<unchanged>\n"); + else + fprintf(outfile, "%ld %ld\n", (unsigned long int)start, + (unsigned long int)end); + } +} + + +/************************************************* +* Process a data line * +*************************************************/ + +/* The line is in buffer; it will not be empty. 
+ +Arguments: none + +Returns: PR_OK continue processing next line + PR_SKIP skip to a blank line + PR_ABEND abort the pcre2test run +*/ + +static int +process_data(void) +{ +PCRE2_SIZE len, ulen, arg_ulen; +uint32_t gmatched; +uint32_t c, k; +uint32_t g_notempty = 0; +uint8_t *p, *pp, *start_rep; +size_t needlen; +void *use_dat_context; +BOOL utf; +BOOL subject_literal; + +PCRE2_SIZE *ovector; +PCRE2_SIZE ovecsave[3]; +uint32_t oveccount; + +#ifdef SUPPORT_PCRE2_8 +uint8_t *q8 = NULL; +#endif +#ifdef SUPPORT_PCRE2_16 +uint16_t *q16 = NULL; +#endif +#ifdef SUPPORT_PCRE2_32 +uint32_t *q32 = NULL; +#endif + +subject_literal = (pat_patctl.control2 & CTL2_SUBJECT_LITERAL) != 0; + +/* Copy the default context and data control blocks to the active ones. Then +copy from the pattern the controls that can be set in either the pattern or the +data. This allows them to be overridden in the data line. We do not do this for +options because those that are common apply separately to compiling and +matching. */ + +DATCTXCPY(dat_context, default_dat_context); +memcpy(&dat_datctl, &def_datctl, sizeof(datctl)); +dat_datctl.control |= (pat_patctl.control & CTL_ALLPD); +dat_datctl.control2 |= (pat_patctl.control2 & CTL2_ALLPD); +strcpy((char *)dat_datctl.replacement, (char *)pat_patctl.replacement); +if (dat_datctl.jitstack == 0) dat_datctl.jitstack = pat_patctl.jitstack; + +if (dat_datctl.substitute_skip == 0) + dat_datctl.substitute_skip = pat_patctl.substitute_skip; +if (dat_datctl.substitute_stop == 0) + dat_datctl.substitute_stop = pat_patctl.substitute_stop; + +/* Initialize for scanning the data line. */ + +#ifdef SUPPORT_PCRE2_8 +utf = ((((pat_patctl.control & CTL_POSIX) != 0)? 
+ ((pcre2_real_code_8 *)preg.re_pcre2_code)->overall_options : + FLD(compiled_code, overall_options)) & PCRE2_UTF) != 0; +#else +utf = (FLD(compiled_code, overall_options) & PCRE2_UTF) != 0; +#endif + +start_rep = NULL; +len = strlen((const char *)buffer); +while (len > 0 && isspace(buffer[len-1])) len--; +buffer[len] = 0; +p = buffer; +while (isspace(*p)) p++; + +/* Check that the data is well-formed UTF-8 if we're in UTF mode. To create +invalid input to pcre2_match(), you must use \x?? or \x{} sequences. */ + +if (utf) + { + uint8_t *q; + uint32_t cc; + int n = 1; + for (q = p; n > 0 && *q; q += n) n = utf82ord(q, &cc); + if (n <= 0) + { + fprintf(outfile, "** Failed: invalid UTF-8 string cannot be used as input " + "in UTF mode\n"); + return PR_OK; + } + } + +#ifdef SUPPORT_VALGRIND +/* Mark the dbuffer as addressable but undefined again. */ +if (dbuffer != NULL) + { + VALGRIND_MAKE_MEM_UNDEFINED(dbuffer, dbuffer_size); + } +#endif + +/* Allocate a buffer to hold the data line; len+1 is an upper bound on +the number of code units that will be needed (though the buffer may have to be +extended if replication is involved). */ + +needlen = (size_t)((len+1) * code_unit_size); +if (dbuffer == NULL || needlen >= dbuffer_size) + { + while (needlen >= dbuffer_size) dbuffer_size *= 2; + dbuffer = (uint8_t *)realloc(dbuffer, dbuffer_size); + if (dbuffer == NULL) + { + fprintf(stderr, "pcre2test: realloc(%d) failed\n", (int)dbuffer_size); + exit(1); + } + } +SETCASTPTR(q, dbuffer); /* Sets q8, q16, or q32, as appropriate. */ + +/* Scan the data line, interpreting data escapes, and put the result into a +buffer of the appropriate width. In UTF mode, input is always UTF-8; otherwise, +in 16- and 32-bit modes, it can be forced to UTF-8 by the utf8_input modifier. 
+*/ + +while ((c = *p++) != 0) + { + int32_t i = 0; + size_t replen; + + /* ] may mark the end of a replicated sequence */ + + if (c == ']' && start_rep != NULL) + { + long li; + char *endptr; + size_t qoffset = CAST8VAR(q) - dbuffer; + size_t rep_offset = start_rep - dbuffer; + + if (*p++ != '{') + { + fprintf(outfile, "** Expected '{' after \\[....]\n"); + return PR_OK; + } + + li = strtol((const char *)p, &endptr, 10); + if (S32OVERFLOW(li)) + { + fprintf(outfile, "** Repeat count too large\n"); + return PR_OK; + } + + p = (uint8_t *)endptr; + if (*p++ != '}') + { + fprintf(outfile, "** Expected '}' after \\[...]{...\n"); + return PR_OK; + } + + i = (int32_t)li; + if (i-- == 0) + { + fprintf(outfile, "** Zero repeat not allowed\n"); + return PR_OK; + } + + replen = CAST8VAR(q) - start_rep; + needlen += replen * i; + + if (needlen >= dbuffer_size) + { + while (needlen >= dbuffer_size) dbuffer_size *= 2; + dbuffer = (uint8_t *)realloc(dbuffer, dbuffer_size); + if (dbuffer == NULL) + { + fprintf(stderr, "pcre2test: realloc(%d) failed\n", (int)dbuffer_size); + exit(1); + } + SETCASTPTR(q, dbuffer + qoffset); + start_rep = dbuffer + rep_offset; + } + + while (i-- > 0) + { + memcpy(CAST8VAR(q), start_rep, replen); + SETPLUS(q, replen/code_unit_size); + } + + start_rep = NULL; + continue; + } + + /* Handle a non-escaped character. In non-UTF 32-bit mode with utf8_input + set, do the fudge for setting the top bit. 
*/ + + if (c != '\\' || subject_literal) + { + uint32_t topbit = 0; + if (test_mode == PCRE32_MODE && c == 0xff && *p != 0) + { + topbit = 0x80000000; + c = *p++; + } + if ((utf || (pat_patctl.control & CTL_UTF8_INPUT) != 0) && + HASUTF8EXTRALEN(c)) { GETUTF8INC(c, p); } + c |= topbit; + } + + /* Handle backslash escapes */ + + else switch ((c = *p++)) + { + case '\\': break; + case 'a': c = CHAR_BEL; break; + case 'b': c = '\b'; break; + case 'e': c = CHAR_ESC; break; + case 'f': c = '\f'; break; + case 'n': c = '\n'; break; + case 'r': c = '\r'; break; + case 't': c = '\t'; break; + case 'v': c = '\v'; break; + + case '0': case '1': case '2': case '3': + case '4': case '5': case '6': case '7': + c -= '0'; + while (i++ < 2 && isdigit(*p) && *p != '8' && *p != '9') + c = c * 8 + *p++ - '0'; + break; + + case 'o': + if (*p == '{') + { + uint8_t *pt = p; + c = 0; + for (pt++; isdigit(*pt) && *pt != '8' && *pt != '9'; pt++) + { + if (++i == 12) + fprintf(outfile, "** Too many octal digits in \\o{...} item; " + "using only the first twelve.\n"); + else c = c * 8 + *pt - '0'; + } + if (*pt == '}') p = pt + 1; + else fprintf(outfile, "** Missing } after \\o{ (assumed)\n"); + } + break; + + case 'x': + if (*p == '{') + { + uint8_t *pt = p; + c = 0; + + /* We used to have "while (isxdigit(*(++pt)))" here, but it fails + when isxdigit() is a macro that refers to its argument more than + once. This is banned by the C Standard, but apparently happens in at + least one MacOS environment. */ + + for (pt++; isxdigit(*pt); pt++) + { + if (++i == 9) + fprintf(outfile, "** Too many hex digits in \\x{...} item; " + "using only the first eight.\n"); + else c = c * 16 + tolower(*pt) - ((isdigit(*pt))? '0' : 'a' - 10); + } + if (*pt == '}') + { + p = pt + 1; + break; + } + /* Not correct form for \x{...}; fall through */ + } + + /* \x without {} always defines just one byte in 8-bit mode. 
This + allows UTF-8 characters to be constructed byte by byte, and also allows + invalid UTF-8 sequences to be made. Just copy the byte in UTF-8 mode. + Otherwise, pass it down as data. */ + + c = 0; + while (i++ < 2 && isxdigit(*p)) + { + c = c * 16 + tolower(*p) - ((isdigit(*p))? '0' : 'a' - 10); + p++; + } +#if defined SUPPORT_PCRE2_8 + if (utf && (test_mode == PCRE8_MODE)) + { + *q8++ = c; + continue; + } +#endif + break; + + case 0: /* \ followed by EOF allows for an empty line */ + p--; + continue; + + case '=': /* \= terminates the data, starts modifiers */ + goto ENDSTRING; + + case '[': /* \[ introduces a replicated character sequence */ + if (start_rep != NULL) + { + fprintf(outfile, "** Nested replication is not supported\n"); + return PR_OK; + } + start_rep = CAST8VAR(q); + continue; + + default: + if (isalnum(c)) + { + fprintf(outfile, "** Unrecognized escape sequence \"\\%c\"\n", c); + return PR_OK; + } + } + + /* We now have a character value in c that may be greater than 255. + In 8-bit mode we convert to UTF-8 if we are in UTF mode. Values greater + than 127 in UTF mode must have come from \x{...} or octal constructs + because values from \x.. get this far only in non-UTF mode. 
*/ + +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) + { + if (utf) + { + if (c > 0x7fffffff) + { + fprintf(outfile, "** Character \\x{%x} is greater than 0x7fffffff " + "and so cannot be converted to UTF-8\n", c); + return PR_OK; + } + q8 += ord2utf8(c, q8); + } + else + { + if (c > 0xffu) + { + fprintf(outfile, "** Character \\x{%x} is greater than 255 " + "and UTF-8 mode is not enabled.\n", c); + fprintf(outfile, "** Truncation will probably give the wrong " + "result.\n"); + } + *q8++ = (uint8_t)c; + } + } +#endif +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE) + { + if (utf) + { + if (c > 0x10ffffu) + { + fprintf(outfile, "** Failed: character \\x{%x} is greater than " + "0x10ffff and so cannot be converted to UTF-16\n", c); + return PR_OK; + } + else if (c >= 0x10000u) + { + c-= 0x10000u; + *q16++ = 0xD800 | (c >> 10); + *q16++ = 0xDC00 | (c & 0x3ff); + } + else + *q16++ = c; + } + else + { + if (c > 0xffffu) + { + fprintf(outfile, "** Character \\x{%x} is greater than 0xffff " + "and UTF-16 mode is not enabled.\n", c); + fprintf(outfile, "** Truncation will probably give the wrong " + "result.\n"); + } + + *q16++ = (uint16_t)c; + } + } +#endif +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE) + { + *q32++ = c; + } +#endif + } + +ENDSTRING: +SET(*q, 0); +len = CASTVAR(uint8_t *, q) - dbuffer; /* Length in bytes */ +ulen = len/code_unit_size; /* Length in code units */ +arg_ulen = ulen; /* Value to use in match arg */ + +/* If the string was terminated by \= we must now interpret modifiers. */ + +if (p[-1] != 0 && !decode_modifiers(p, CTX_DAT, NULL, &dat_datctl)) + return PR_OK; + +/* Setting substitute_{skip,stop} implies a substitute callout. */ + +if (dat_datctl.substitute_skip != 0 || dat_datctl.substitute_stop != 0) + dat_datctl.control2 |= CTL2_SUBSTITUTE_CALLOUT; + +/* Check for mutually exclusive modifiers. At present, these are all in the +first control word. 
*/ + +for (k = 0; k < sizeof(exclusive_dat_controls)/sizeof(uint32_t); k++) + { + c = dat_datctl.control & exclusive_dat_controls[k]; + if (c != 0 && c != (c & (~c+1))) + { + show_controls(c, 0, "** Not allowed together:"); + fprintf(outfile, "\n"); + return PR_OK; + } + } + +if (pat_patctl.replacement[0] != 0) + { + if ((dat_datctl.control2 & CTL2_SUBSTITUTE_CALLOUT) != 0 && + (dat_datctl.control & CTL_NULLCONTEXT) != 0) + { + fprintf(outfile, "** Replacement callouts are not supported with null_context.\n"); + return PR_OK; + } + + if ((dat_datctl.control & CTL_ALLCAPTURES) != 0) + fprintf(outfile, "** Ignored with replacement text: allcaptures\n"); + } + +/* Warn for modifiers that are ignored for DFA. */ + +if ((dat_datctl.control & CTL_DFA) != 0) + { + if ((dat_datctl.control & CTL_ALLCAPTURES) != 0) + fprintf(outfile, "** Ignored after DFA matching: allcaptures\n"); + } + +/* We now have the subject in dbuffer, with len containing the byte length, and +ulen containing the code unit length, with a copy in arg_ulen for use in match +function arguments (this gets changed to PCRE2_ZERO_TERMINATED when the +zero_terminate modifier is present). + +Move the data to the end of the buffer so that a read over the end can be +caught by valgrind or other means. If we have explicit valgrind support, mark +the unused start of the buffer unaddressable. If we are using the POSIX +interface, or testing zero-termination, we must include the terminating zero in +the usable data. */ + +c = code_unit_size * (((pat_patctl.control & CTL_POSIX) + + (dat_datctl.control & CTL_ZERO_TERMINATE) != 0)? 1:0); +pp = memmove(dbuffer + dbuffer_size - len - c, dbuffer, len + c); +#ifdef SUPPORT_VALGRIND + VALGRIND_MAKE_MEM_NOACCESS(dbuffer, dbuffer_size - (len + c)); +#endif + +/* Now pp points to the subject string. POSIX matching is only possible in +8-bit mode, and it does not support timing or other fancy features. 
Some were +checked at compile time, but we need to check the match-time settings here. */ + +#ifdef SUPPORT_PCRE2_8 +if ((pat_patctl.control & CTL_POSIX) != 0) + { + int rc; + int eflags = 0; + regmatch_t *pmatch = NULL; + const char *msg = "** Ignored with POSIX interface:"; + + if (dat_datctl.cerror[0] != CFORE_UNSET || dat_datctl.cerror[1] != CFORE_UNSET) + prmsg(&msg, "callout_error"); + if (dat_datctl.cfail[0] != CFORE_UNSET || dat_datctl.cfail[1] != CFORE_UNSET) + prmsg(&msg, "callout_fail"); + if (dat_datctl.copy_numbers[0] >= 0 || dat_datctl.copy_names[0] != 0) + prmsg(&msg, "copy"); + if (dat_datctl.get_numbers[0] >= 0 || dat_datctl.get_names[0] != 0) + prmsg(&msg, "get"); + if (dat_datctl.jitstack != 0) prmsg(&msg, "jitstack"); + if (dat_datctl.offset != 0) prmsg(&msg, "offset"); + + if ((dat_datctl.options & ~POSIX_SUPPORTED_MATCH_OPTIONS) != 0) + { + fprintf(outfile, "%s", msg); + show_match_options(dat_datctl.options & ~POSIX_SUPPORTED_MATCH_OPTIONS); + msg = ""; + } + if ((dat_datctl.control & ~POSIX_SUPPORTED_MATCH_CONTROLS) != 0 || + (dat_datctl.control2 & ~POSIX_SUPPORTED_MATCH_CONTROLS2) != 0) + { + show_controls(dat_datctl.control & ~POSIX_SUPPORTED_MATCH_CONTROLS, + dat_datctl.control2 & ~POSIX_SUPPORTED_MATCH_CONTROLS2, msg); + msg = ""; + } + + if (msg[0] == 0) fprintf(outfile, "\n"); + + if (dat_datctl.oveccount > 0) + { + pmatch = (regmatch_t *)malloc(sizeof(regmatch_t) * dat_datctl.oveccount); + if (pmatch == NULL) + { + fprintf(outfile, "** Failed to get memory for recording matching " + "information (size set = %du)\n", dat_datctl.oveccount); + return PR_OK; + } + } + + if (dat_datctl.startend[0] != CFORE_UNSET) + { + pmatch[0].rm_so = dat_datctl.startend[0]; + pmatch[0].rm_eo = (dat_datctl.startend[1] != 0)? 
+ dat_datctl.startend[1] : len; + eflags |= REG_STARTEND; + } + + if ((dat_datctl.options & PCRE2_NOTBOL) != 0) eflags |= REG_NOTBOL; + if ((dat_datctl.options & PCRE2_NOTEOL) != 0) eflags |= REG_NOTEOL; + if ((dat_datctl.options & PCRE2_NOTEMPTY) != 0) eflags |= REG_NOTEMPTY; + + rc = regexec(&preg, (const char *)pp, dat_datctl.oveccount, pmatch, eflags); + if (rc != 0) + { + (void)regerror(rc, &preg, (char *)pbuffer8, pbuffer8_size); + fprintf(outfile, "No match: POSIX code %d: %s\n", rc, pbuffer8); + } + else if ((pat_patctl.control & CTL_POSIX_NOSUB) != 0) + fprintf(outfile, "Matched with REG_NOSUB\n"); + else if (dat_datctl.oveccount == 0) + fprintf(outfile, "Matched without capture\n"); + else + { + size_t i, j; + size_t last_printed = (size_t)dat_datctl.oveccount; + for (i = 0; i < (size_t)dat_datctl.oveccount; i++) + { + if (pmatch[i].rm_so >= 0) + { + PCRE2_SIZE start = pmatch[i].rm_so; + PCRE2_SIZE end = pmatch[i].rm_eo; + for (j = last_printed + 1; j < i; j++) + fprintf(outfile, "%2d: \n", (int)j); + last_printed = i; + if (start > end) + { + start = pmatch[i].rm_eo; + end = pmatch[i].rm_so; + fprintf(outfile, "Start of matched string is beyond its end - " + "displaying from end to start.\n"); + } + fprintf(outfile, "%2d: ", (int)i); + PCHARSV(pp, start, end - start, utf, outfile); + fprintf(outfile, "\n"); + + if ((i == 0 && (dat_datctl.control & CTL_AFTERTEXT) != 0) || + (dat_datctl.control & CTL_ALLAFTERTEXT) != 0) + { + fprintf(outfile, "%2d+ ", (int)i); + /* Note: don't use the start/end variables here because we want to + show the text from what is reported as the end. */ + PCHARSV(pp, pmatch[i].rm_eo, len - pmatch[i].rm_eo, utf, outfile); + fprintf(outfile, "\n"); } + } + } + } + free(pmatch); + return PR_OK; + } +#endif /* SUPPORT_PCRE2_8 */ + + /* Handle matching via the native interface. Check for consistency of +modifiers. 
*/ + +if (dat_datctl.startend[0] != CFORE_UNSET) + fprintf(outfile, "** \\=posix_startend ignored for non-POSIX matching\n"); + +/* ALLUSEDTEXT is not supported with JIT, but JIT is not used with DFA +matching, even if the JIT compiler was used. */ + +if ((dat_datctl.control & (CTL_ALLUSEDTEXT|CTL_DFA)) == CTL_ALLUSEDTEXT && + FLD(compiled_code, executable_jit) != NULL) + { + fprintf(outfile, "** Showing all consulted text is not supported by JIT: ignored\n"); + dat_datctl.control &= ~CTL_ALLUSEDTEXT; + } + +/* Handle passing the subject as zero-terminated. */ + +if ((dat_datctl.control & CTL_ZERO_TERMINATE) != 0) + arg_ulen = PCRE2_ZERO_TERMINATED; + +/* The nullcontext modifier is used to test calling pcre2_[jit_]match() with a +NULL context. */ + +use_dat_context = ((dat_datctl.control & CTL_NULLCONTEXT) != 0)? + NULL : PTR(dat_context); + +/* Enable display of malloc/free if wanted. We can do this only if either the +pattern or the subject is processed with a context. */ + +show_memory = (dat_datctl.control & CTL_MEMORY) != 0; + +if (show_memory && + (pat_patctl.control & dat_datctl.control & CTL_NULLCONTEXT) != 0) + fprintf(outfile, "** \\=memory requires either a pattern or a subject " + "context: ignored\n"); + +/* Create and assign a JIT stack if requested. */ + +if (dat_datctl.jitstack != 0) + { + if (dat_datctl.jitstack != jit_stack_size) + { + PCRE2_JIT_STACK_FREE(jit_stack); + PCRE2_JIT_STACK_CREATE(jit_stack, 1, dat_datctl.jitstack * 1024, NULL); + jit_stack_size = dat_datctl.jitstack; + } + PCRE2_JIT_STACK_ASSIGN(dat_context, jit_callback, jit_stack); + } + +/* Or de-assign */ + +else if (jit_stack != NULL) + { + PCRE2_JIT_STACK_ASSIGN(dat_context, NULL, NULL); + PCRE2_JIT_STACK_FREE(jit_stack); + jit_stack = NULL; + jit_stack_size = 0; + } + +/* When no JIT stack is assigned, we must ensure that there is a JIT callback +if we want to verify that JIT was actually used. 
*/ + +if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_stack == NULL) + { + PCRE2_JIT_STACK_ASSIGN(dat_context, jit_callback, NULL); + } + +/* Adjust match_data according to size of offsets required. A size of zero +causes a new match data block to be obtained that exactly fits the pattern. */ + +if (dat_datctl.oveccount == 0) + { + PCRE2_MATCH_DATA_FREE(match_data); + PCRE2_MATCH_DATA_CREATE_FROM_PATTERN(match_data, compiled_code, NULL); + PCRE2_GET_OVECTOR_COUNT(max_oveccount, match_data); + } +else if (dat_datctl.oveccount <= max_oveccount) + { + SETFLD(match_data, oveccount, dat_datctl.oveccount); + } +else + { + max_oveccount = dat_datctl.oveccount; + PCRE2_MATCH_DATA_FREE(match_data); + PCRE2_MATCH_DATA_CREATE(match_data, max_oveccount, NULL); + } + +if (CASTVAR(void *, match_data) == NULL) + { + fprintf(outfile, "** Failed to get memory for recording matching " + "information (size requested: %d)\n", dat_datctl.oveccount); + max_oveccount = 0; + return PR_OK; + } + +ovector = FLD(match_data, ovector); +PCRE2_GET_OVECTOR_COUNT(oveccount, match_data); + +/* Replacement processing is ignored for DFA matching. */ + +if (dat_datctl.replacement[0] != 0 && (dat_datctl.control & CTL_DFA) != 0) + { + fprintf(outfile, "** Ignored for DFA matching: replace\n"); + dat_datctl.replacement[0] = 0; + } + +/* If a replacement string is provided, call pcre2_substitute() instead of one +of the matching functions. First we have to convert the replacement string to +the appropriate width. 
*/ + +if (dat_datctl.replacement[0] != 0) + { + int rc; + uint8_t *pr; + uint8_t rbuffer[REPLACE_BUFFSIZE]; + uint8_t nbuffer[REPLACE_BUFFSIZE]; + uint32_t xoptions; + uint32_t emoption; /* External match option */ + PCRE2_SIZE j, rlen, nsize, erroroffset; + BOOL badutf = FALSE; + +#ifdef SUPPORT_PCRE2_8 + uint8_t *r8 = NULL; +#endif +#ifdef SUPPORT_PCRE2_16 + uint16_t *r16 = NULL; +#endif +#ifdef SUPPORT_PCRE2_32 + uint32_t *r32 = NULL; +#endif + + /* Fill the ovector with junk to detect elements that do not get set + when they should be (relevant only when "allvector" is specified). */ + + for (j = 0; j < 2*oveccount; j++) ovector[j] = JUNK_OFFSET; + + if (timeitm) + fprintf(outfile, "** Timing is not supported with replace: ignored\n"); + + if ((dat_datctl.control & CTL_ALTGLOBAL) != 0) + fprintf(outfile, "** Altglobal is not supported with replace: ignored\n"); + + /* Check for a test that does substitution after an initial external match. + If this is set, we run the external match, but leave the interpretation of + its output to pcre2_substitute(). */ + + emoption = ((dat_datctl.control2 & CTL2_SUBSTITUTE_MATCHED) == 0)? 0 : + PCRE2_SUBSTITUTE_MATCHED; + + if (emoption != 0) + { + PCRE2_MATCH(rc, compiled_code, pp, arg_ulen, dat_datctl.offset, + dat_datctl.options, match_data, use_dat_context); + } + + xoptions = emoption | + (((dat_datctl.control & CTL_GLOBAL) == 0)? 0 : + PCRE2_SUBSTITUTE_GLOBAL) | + (((dat_datctl.control2 & CTL2_SUBSTITUTE_EXTENDED) == 0)? 0 : + PCRE2_SUBSTITUTE_EXTENDED) | + (((dat_datctl.control2 & CTL2_SUBSTITUTE_LITERAL) == 0)? 0 : + PCRE2_SUBSTITUTE_LITERAL) | + (((dat_datctl.control2 & CTL2_SUBSTITUTE_OVERFLOW_LENGTH) == 0)? 0 : + PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) | + (((dat_datctl.control2 & CTL2_SUBSTITUTE_REPLACEMENT_ONLY) == 0)? 0 : + PCRE2_SUBSTITUTE_REPLACEMENT_ONLY) | + (((dat_datctl.control2 & CTL2_SUBSTITUTE_UNKNOWN_UNSET) == 0)? 
0 : + PCRE2_SUBSTITUTE_UNKNOWN_UNSET) | + (((dat_datctl.control2 & CTL2_SUBSTITUTE_UNSET_EMPTY) == 0)? 0 : + PCRE2_SUBSTITUTE_UNSET_EMPTY); + + SETCASTPTR(r, rbuffer); /* Sets r8, r16, or r32, as appropriate. */ + pr = dat_datctl.replacement; + + /* If the replacement starts with '[]' we interpret that as length + value for the replacement buffer. */ + + nsize = REPLACE_BUFFSIZE/code_unit_size; + if (*pr == '[') + { + PCRE2_SIZE n = 0; + while ((c = *(++pr)) >= CHAR_0 && c <= CHAR_9) n = n * 10 + c - CHAR_0; + if (*pr++ != ']') + { + fprintf(outfile, "Bad buffer size in replacement string\n"); + return PR_OK; + } + if (n > nsize) + { + fprintf(outfile, "Replacement buffer setting (%" SIZ_FORM ") is too " + "large (max %" SIZ_FORM ")\n", SIZ_CAST n, SIZ_CAST nsize); + return PR_OK; + } + nsize = n; + } + + /* Now copy the replacement string to a buffer of the appropriate width. No + escape processing is done for replacements. In UTF mode, check for an invalid + UTF-8 input string, and if it is invalid, just copy its code units without + UTF interpretation. This provides a means of checking that an invalid string + is detected. Otherwise, UTF-8 can be used to include wide characters in a + replacement. */ + + if (utf) badutf = valid_utf(pr, strlen((const char *)pr), &erroroffset); + + /* Not UTF or invalid UTF-8: just copy the code units. 
*/ + + if (!utf || badutf) + { + while ((c = *pr++) != 0) + { +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) *r8++ = c; +#endif +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE) *r16++ = c; +#endif +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE) *r32++ = c; +#endif + } + } + + /* Valid UTF-8 replacement string */ + + else while ((c = *pr++) != 0) + { + if (HASUTF8EXTRALEN(c)) { GETUTF8INC(c, pr); } + +#ifdef SUPPORT_PCRE2_8 + if (test_mode == PCRE8_MODE) r8 += ord2utf8(c, r8); +#endif + +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE) + { + if (c >= 0x10000u) + { + c-= 0x10000u; + *r16++ = 0xD800 | (c >> 10); + *r16++ = 0xDC00 | (c & 0x3ff); + } + else *r16++ = c; + } +#endif + +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE) *r32++ = c; +#endif + } + + SET(*r, 0); + if ((dat_datctl.control & CTL_ZERO_TERMINATE) != 0) + rlen = PCRE2_ZERO_TERMINATED; + else + rlen = (CASTVAR(uint8_t *, r) - rbuffer)/code_unit_size; + + if ((dat_datctl.control2 & CTL2_SUBSTITUTE_CALLOUT) != 0) + { + PCRE2_SET_SUBSTITUTE_CALLOUT(dat_context, substitute_callout_function, NULL); + } + else + { + PCRE2_SET_SUBSTITUTE_CALLOUT(dat_context, NULL, NULL); /* No callout */ + } + + PCRE2_SUBSTITUTE(rc, compiled_code, pp, arg_ulen, dat_datctl.offset, + dat_datctl.options|xoptions, match_data, use_dat_context, + rbuffer, rlen, nbuffer, &nsize); + + if (rc < 0) + { + fprintf(outfile, "Failed: error %d", rc); + if (rc != PCRE2_ERROR_NOMEMORY && nsize != PCRE2_UNSET) + fprintf(outfile, " at offset %ld in replacement", (long int)nsize); + fprintf(outfile, ": "); + if (!print_error_message(rc, "", "")) return PR_ABEND; + if (rc == PCRE2_ERROR_NOMEMORY && + (xoptions & PCRE2_SUBSTITUTE_OVERFLOW_LENGTH) != 0) + fprintf(outfile, ": %ld code units are needed", (long int)nsize); + } + else + { + fprintf(outfile, "%2d: ", rc); + PCHARSV(nbuffer, 0, nsize, utf, outfile); + } + + fprintf(outfile, "\n"); + show_memory = FALSE; + + /* Show final ovector contents if 
requested. */ + + if ((dat_datctl.control2 & CTL2_ALLVECTOR) != 0) + show_ovector(ovector, oveccount); + + return PR_OK; + } /* End of substitution handling */ + +/* When a replacement string is not provided, run a loop for global matching +with one of the basic matching functions. For altglobal (or first time round +the loop), set an "unset" value for the previous match info. */ + +ovecsave[0] = ovecsave[1] = ovecsave[2] = PCRE2_UNSET; + +for (gmatched = 0;; gmatched++) + { + PCRE2_SIZE j; + int capcount; + + /* Fill the ovector with junk to detect elements that do not get set + when they should be. */ + + for (j = 0; j < 2*oveccount; j++) ovector[j] = JUNK_OFFSET; + + /* When matching is via pcre2_match(), we will detect the use of JIT via the + stack callback function. */ + + jit_was_used = (pat_patctl.control & CTL_JITFAST) != 0; + + /* Do timing if required. */ + + if (timeitm > 0) + { + int i; + clock_t start_time, time_taken; + + if ((dat_datctl.control & CTL_DFA) != 0) + { + if ((dat_datctl.options & PCRE2_DFA_RESTART) != 0) + { + fprintf(outfile, "Timing DFA restarts is not supported\n"); + return PR_OK; + } + if (dfa_workspace == NULL) + dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int)); + start_time = clock(); + for (i = 0; i < timeitm; i++) + { + PCRE2_DFA_MATCH(capcount, compiled_code, pp, arg_ulen, + dat_datctl.offset, dat_datctl.options | g_notempty, match_data, + use_dat_context, dfa_workspace, DFA_WS_DIMENSION); + } + } + + else if ((pat_patctl.control & CTL_JITFAST) != 0) + { + start_time = clock(); + for (i = 0; i < timeitm; i++) + { + PCRE2_JIT_MATCH(capcount, compiled_code, pp, arg_ulen, + dat_datctl.offset, dat_datctl.options | g_notempty, match_data, + use_dat_context); + } + } + + else + { + start_time = clock(); + for (i = 0; i < timeitm; i++) + { + PCRE2_MATCH(capcount, compiled_code, pp, arg_ulen, + dat_datctl.offset, dat_datctl.options | g_notempty, match_data, + use_dat_context); + } + } + total_match_time += (time_taken = 
clock() - start_time); + fprintf(outfile, "Match time %.4f milliseconds\n", + (((double)time_taken * 1000.0) / (double)timeitm) / + (double)CLOCKS_PER_SEC); + } + + /* Find the heap, match and depth limits if requested. The depth and heap + limits are not relevant for JIT. The return from check_match_limit() is the + return from the final call to pcre2_match() or pcre2_dfa_match(). */ + + if ((dat_datctl.control & CTL_FINDLIMITS) != 0) + { + capcount = 0; /* This stops compiler warnings */ + + if (FLD(compiled_code, executable_jit) == NULL || + (dat_datctl.options & PCRE2_NO_JIT) != 0) + { + (void)check_match_limit(pp, arg_ulen, PCRE2_ERROR_HEAPLIMIT, "heap"); + } + + capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_MATCHLIMIT, + "match"); + + if (FLD(compiled_code, executable_jit) == NULL || + (dat_datctl.options & PCRE2_NO_JIT) != 0 || + (dat_datctl.control & CTL_DFA) != 0) + { + capcount = check_match_limit(pp, arg_ulen, PCRE2_ERROR_DEPTHLIMIT, + "depth"); + } + + if (capcount == 0) + { + fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n"); + capcount = dat_datctl.oveccount; + } + } + + /* Otherwise just run a single match, setting up a callout if required (the + default). There is a copy of the pattern in pbuffer8 for use by callouts. */ + + else + { + if ((dat_datctl.control & CTL_CALLOUT_NONE) == 0) + { + PCRE2_SET_CALLOUT(dat_context, callout_function, + (void *)(&dat_datctl.callout_data)); + first_callout = TRUE; + last_callout_mark = NULL; + callout_count = 0; + } + else + { + PCRE2_SET_CALLOUT(dat_context, NULL, NULL); /* No callout */ + } + + /* Run a single DFA or NFA match. 
*/ + + if ((dat_datctl.control & CTL_DFA) != 0) + { + if (dfa_workspace == NULL) + dfa_workspace = (int *)malloc(DFA_WS_DIMENSION*sizeof(int)); + if (dfa_matched++ == 0) + dfa_workspace[0] = -1; /* To catch bad restart */ + PCRE2_DFA_MATCH(capcount, compiled_code, pp, arg_ulen, + dat_datctl.offset, dat_datctl.options | g_notempty, match_data, + use_dat_context, dfa_workspace, DFA_WS_DIMENSION); + if (capcount == 0) + { + fprintf(outfile, "Matched, but offsets vector is too small to show all matches\n"); + capcount = dat_datctl.oveccount; + } + } + else + { + if ((pat_patctl.control & CTL_JITFAST) != 0) + PCRE2_JIT_MATCH(capcount, compiled_code, pp, arg_ulen, dat_datctl.offset, + dat_datctl.options | g_notempty, match_data, use_dat_context); + else + PCRE2_MATCH(capcount, compiled_code, pp, arg_ulen, dat_datctl.offset, + dat_datctl.options | g_notempty, match_data, use_dat_context); + if (capcount == 0) + { + fprintf(outfile, "Matched, but too many substrings\n"); + capcount = dat_datctl.oveccount; + } + } + } + + /* The result of the match is now in capcount. First handle a successful + match. */ + + if (capcount >= 0) + { + int i; + + if (capcount > (int)oveccount) /* Check for lunatic return value */ + { + fprintf(outfile, + "** PCRE2 error: returned count %d is too big for ovector count %d\n", + capcount, oveccount); + capcount = oveccount; + if ((dat_datctl.control & CTL_ANYGLOB) != 0) + { + fprintf(outfile, "** Global loop abandoned\n"); + dat_datctl.control &= ~CTL_ANYGLOB; /* Break g/G loop */ + } + } + + /* If PCRE2_COPY_MATCHED_SUBJECT was set, check that things are as they + should be, but not for fast JIT, where it isn't supported. 
*/ + + if ((dat_datctl.options & PCRE2_COPY_MATCHED_SUBJECT) != 0 && + (pat_patctl.control & CTL_JITFAST) == 0) + { + if ((FLD(match_data, flags) & PCRE2_MD_COPIED_SUBJECT) == 0) + fprintf(outfile, + "** PCRE2 error: flag not set after copy_matched_subject\n"); + + if (CASTFLD(void *, match_data, subject) == pp) + fprintf(outfile, + "** PCRE2 error: copy_matched_subject has not copied\n"); + + if (memcmp(CASTFLD(void *, match_data, subject), pp, ulen) != 0) + fprintf(outfile, + "** PCRE2 error: copy_matched_subject mismatch\n"); + } + + /* If this is not the first time round a global loop, check that the + returned string has changed. If it has not, check for an empty string match + at different starting offset from the previous match. This is a failed test + retry for null-matching patterns that don't match at their starting offset, + for example /(?<=\G.)/. A repeated match at the same point is not such a + pattern, and must be discarded, and we then proceed to seek a non-null + match at the current point. For any other repeated match, there is a bug + somewhere and we must break the loop because it will go on for ever. We + know that there are always at least two elements in the ovector. */ + + if (gmatched > 0 && ovecsave[0] == ovector[0] && ovecsave[1] == ovector[1]) + { + if (ovector[0] == ovector[1] && ovecsave[2] != dat_datctl.offset) + { + g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED; + ovecsave[2] = dat_datctl.offset; + continue; /* Back to the top of the loop */ + } + fprintf(outfile, + "** PCRE2 error: global repeat returned the same string as previous\n"); + fprintf(outfile, "** Global loop abandoned\n"); + dat_datctl.control &= ~CTL_ANYGLOB; /* Break g/G loop */ + } + + /* "allcaptures" requests showing of all captures in the pattern, to check + unset ones at the end. It may be set on the pattern or the data. Implement + by setting capcount to the maximum. This is not relevant for DFA matching, + so ignore it (warning given above). 
*/ + + if ((dat_datctl.control & (CTL_ALLCAPTURES|CTL_DFA)) == CTL_ALLCAPTURES) + { + capcount = maxcapcount + 1; /* Allow for full match */ + if (capcount > (int)oveccount) capcount = oveccount; + } + + /* "allvector" requests showing the entire ovector. */ + + if ((dat_datctl.control2 & CTL2_ALLVECTOR) != 0) capcount = oveccount; + + /* Output the captured substrings. Note that, for the matched string, + the use of \K in an assertion can make the start later than the end. */ + + for (i = 0; i < 2*capcount; i += 2) + { + PCRE2_SIZE lleft, lmiddle, lright; + PCRE2_SIZE start = ovector[i]; + PCRE2_SIZE end = ovector[i+1]; + + if (start > end) + { + start = ovector[i+1]; + end = ovector[i]; + fprintf(outfile, "Start of matched string is beyond its end - " + "displaying from end to start.\n"); + } + + fprintf(outfile, "%2d: ", i/2); + + /* Check for an unset group */ + + if (start == PCRE2_UNSET && end == PCRE2_UNSET) + { + fprintf(outfile, "\n"); + continue; + } + + /* Check for silly offsets, in particular, values that have not been + set when they should have been. However, if we are past the end of the + captures for this pattern ("allvector" causes this), or if we are DFA + matching, it isn't an error if the entry is unchanged. */ + + if (start > ulen || end > ulen) + { + if (((dat_datctl.control & CTL_DFA) != 0 || + i >= (int)(2*maxcapcount + 2)) && + start == JUNK_OFFSET && end == JUNK_OFFSET) + fprintf(outfile, "\n"); + else + fprintf(outfile, "ERROR: bad value(s) for offset(s): 0x%lx 0x%lx\n", + (unsigned long int)start, (unsigned long int)end); + continue; + } + + /* When JIT is not being used, ALLUSEDTEXT may be set. (If it is set with + JIT, it is disabled above, with a warning.) 
When the match is done by the + interpreter, leftchar and rightchar are available, and if ALLUSEDTEXT is + set, and if the leftmost consulted character is before the start of the + match or the rightmost consulted character is past the end of the match, + we want to show all consulted characters for the main matched string, and + indicate which were lookarounds. */ + + if (i == 0) + { + BOOL showallused; + PCRE2_SIZE leftchar, rightchar; + + if ((dat_datctl.control & CTL_ALLUSEDTEXT) != 0) + { + leftchar = FLD(match_data, leftchar); + rightchar = FLD(match_data, rightchar); + showallused = i == 0 && (leftchar < start || rightchar > end); + } + else showallused = FALSE; + + if (showallused) + { + PCHARS(lleft, pp, leftchar, start - leftchar, utf, outfile); + PCHARS(lmiddle, pp, start, end - start, utf, outfile); + PCHARS(lright, pp, end, rightchar - end, utf, outfile); + if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) + fprintf(outfile, " (JIT)"); + fprintf(outfile, "\n "); + for (j = 0; j < lleft; j++) fprintf(outfile, "<"); + for (j = 0; j < lmiddle; j++) fprintf(outfile, " "); + for (j = 0; j < lright; j++) fprintf(outfile, ">"); + } + + /* When a pattern contains \K, the start of match position may be + different to the start of the matched string. When this is the case, + show it when requested. */ + + else if ((dat_datctl.control & CTL_STARTCHAR) != 0) + { + PCRE2_SIZE startchar; + PCRE2_GET_STARTCHAR(startchar, match_data); + PCHARS(lleft, pp, startchar, start - startchar, utf, outfile); + PCHARSV(pp, start, end - start, utf, outfile); + if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) + fprintf(outfile, " (JIT)"); + if (startchar != start) + { + fprintf(outfile, "\n "); + for (j = 0; j < lleft; j++) fprintf(outfile, "^"); + } + } + + /* Otherwise, just show the matched string. 
*/ + + else + { + PCHARSV(pp, start, end - start, utf, outfile); + if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) + fprintf(outfile, " (JIT)"); + } + } + + /* Not the main matched string. Just show it unadorned. */ + + else + { + PCHARSV(pp, start, end - start, utf, outfile); + } + + fprintf(outfile, "\n"); + + /* Note: don't use the start/end variables here because we want to + show the text from what is reported as the end. */ + + if ((dat_datctl.control & CTL_ALLAFTERTEXT) != 0 || + (i == 0 && (dat_datctl.control & CTL_AFTERTEXT) != 0)) + { + fprintf(outfile, "%2d+ ", i/2); + PCHARSV(pp, ovector[i+1], ulen - ovector[i+1], utf, outfile); + fprintf(outfile, "\n"); + } + } + + /* Output (*MARK) data if requested */ + + if ((dat_datctl.control & CTL_MARK) != 0 && + TESTFLD(match_data, mark, !=, NULL)) + { + fprintf(outfile, "MK: "); + PCHARSV(CASTFLD(void *, match_data, mark), -1, -1, utf, outfile); + fprintf(outfile, "\n"); + } + + /* Process copy/get strings */ + + if (!copy_and_get(utf, capcount)) return PR_ABEND; + + } /* End of handling a successful match */ + + /* There was a partial match. The value of ovector[0] is the bumpalong point, + that is, startchar, not any \K point that might have been passed. When JIT is + not in use, "allusedtext" may be set, in which case we indicate the leftmost + consulted character. 
*/ + + else if (capcount == PCRE2_ERROR_PARTIAL) + { + PCRE2_SIZE leftchar; + int backlength; + int rubriclength = 0; + + if ((dat_datctl.control & CTL_ALLUSEDTEXT) != 0) + { + leftchar = FLD(match_data, leftchar); + } + else leftchar = ovector[0]; + + fprintf(outfile, "Partial match"); + if ((dat_datctl.control & CTL_MARK) != 0 && + TESTFLD(match_data, mark, !=, NULL)) + { + fprintf(outfile, ", mark="); + PCHARS(rubriclength, CASTFLD(void *, match_data, mark), -1, -1, utf, + outfile); + rubriclength += 7; + } + fprintf(outfile, ": "); + rubriclength += 15; + + PCHARS(backlength, pp, leftchar, ovector[0] - leftchar, utf, outfile); + PCHARSV(pp, ovector[0], ulen - ovector[0], utf, outfile); + + if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) + fprintf(outfile, " (JIT)"); + fprintf(outfile, "\n"); + + if (backlength != 0) + { + int i; + for (i = 0; i < rubriclength; i++) fprintf(outfile, " "); + for (i = 0; i < backlength; i++) fprintf(outfile, "<"); + fprintf(outfile, "\n"); + } + + if (ulen != ovector[1]) + fprintf(outfile, "** ovector[1] is not equal to the subject length: " + "%lu != %lu\n", (unsigned long int)ovector[1], (unsigned long int)ulen); + + /* Process copy/get strings */ + + if (!copy_and_get(utf, 1)) return PR_ABEND; + + /* "allvector" outputs the entire vector */ + + if ((dat_datctl.control2 & CTL2_ALLVECTOR) != 0) + show_ovector(ovector, oveccount); + + break; /* Out of the /g loop */ + } /* End of handling partial match */ + + /* Failed to match. If this is a /g or /G loop, we might previously have + set g_notempty (to PCRE2_NOTEMPTY_ATSTART|PCRE2_ANCHORED) after a null match. + If that is the case, this is not necessarily the end. We want to advance the + start offset, and continue. We won't be at the end of the string - that was + checked before setting g_notempty. We achieve the effect by pretending that a + single character was matched. 
+ + Complication arises in the case when the newline convention is "any", "crlf", + or "anycrlf". If the previous match was at the end of a line terminated by + CRLF, an advance of one character just passes the CR, whereas we should + prefer the longer newline sequence, as does the code in pcre2_match(). + + Otherwise, in the case of UTF-8 or UTF-16 matching, the advance must be one + character, not one byte. */ + + else if (g_notempty != 0) /* There was a previous null match */ + { + uint16_t nl = FLD(compiled_code, newline_convention); + PCRE2_SIZE start_offset = dat_datctl.offset; /* Where the match was */ + PCRE2_SIZE end_offset = start_offset + 1; + + if ((nl == PCRE2_NEWLINE_CRLF || nl == PCRE2_NEWLINE_ANY || + nl == PCRE2_NEWLINE_ANYCRLF) && + start_offset < ulen - 1 && + CODE_UNIT(pp, start_offset) == '\r' && + CODE_UNIT(pp, end_offset) == '\n') + end_offset++; + + else if (utf && test_mode != PCRE32_MODE) + { + if (test_mode == PCRE8_MODE) + { + for (; end_offset < ulen; end_offset++) + if ((((PCRE2_SPTR8)pp)[end_offset] & 0xc0) != 0x80) break; + } + else /* 16-bit mode */ + { + for (; end_offset < ulen; end_offset++) + if ((((PCRE2_SPTR16)pp)[end_offset] & 0xfc00) != 0xdc00) break; + } + } + + SETFLDVEC(match_data, ovector, 0, start_offset); + SETFLDVEC(match_data, ovector, 1, end_offset); + } /* End of handling null match in a global loop */ + + /* A "normal" match failure. There will be a negative error number in + capcount. 
*/ + + else + { + switch(capcount) + { + case PCRE2_ERROR_NOMATCH: + if (gmatched == 0) + { + fprintf(outfile, "No match"); + if ((dat_datctl.control & CTL_MARK) != 0 && + TESTFLD(match_data, mark, !=, NULL)) + { + fprintf(outfile, ", mark = "); + PCHARSV(CASTFLD(void *, match_data, mark), -1, -1, utf, outfile); + } + if ((pat_patctl.control & CTL_JITVERIFY) != 0 && jit_was_used) + fprintf(outfile, " (JIT)"); + fprintf(outfile, "\n"); + + /* "allvector" outputs the entire vector */ + + if ((dat_datctl.control2 & CTL2_ALLVECTOR) != 0) + show_ovector(ovector, oveccount); + } + break; + + case PCRE2_ERROR_BADUTFOFFSET: + fprintf(outfile, "Error %d (bad UTF-%d offset)\n", capcount, test_mode); + break; + + default: + fprintf(outfile, "Failed: error %d: ", capcount); + if (!print_error_message(capcount, "", "")) return PR_ABEND; + if (capcount <= PCRE2_ERROR_UTF8_ERR1 && + capcount >= PCRE2_ERROR_UTF32_ERR2) + { + PCRE2_SIZE startchar; + PCRE2_GET_STARTCHAR(startchar, match_data); + fprintf(outfile, " at offset %" SIZ_FORM, SIZ_CAST startchar); + } + fprintf(outfile, "\n"); + break; + } + + break; /* Out of the /g loop */ + } /* End of failed match handling */ + + /* Control reaches here in two circumstances: (a) after a match, and (b) + after a non-match that immediately followed a match on an empty string when + doing a global search. Such a match is done with PCRE2_NOTEMPTY_ATSTART and + PCRE2_ANCHORED set in g_notempty. The code above turns it into a fake match + of one character. So effectively we get here only after a match. If we + are not doing a global search, we are done. */ + + if ((dat_datctl.control & CTL_ANYGLOB) == 0) break; else + { + PCRE2_SIZE match_offset = FLD(match_data, ovector)[0]; + PCRE2_SIZE end_offset = FLD(match_data, ovector)[1]; + + /* We must now set up for the next iteration of a global search. If we have + matched an empty string, first check to see if we are at the end of the + subject. If so, the loop is over. 
Otherwise, mimic what Perl's /g option + does. Set PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED and try the match again + at the same point. If this fails it will be picked up above, where a fake + match is set up so that at this point we advance to the next character. + + However, in order to cope with patterns that never match at their starting + offset (e.g. /(?<=\G.)/) we don't do this when the match offset is greater + than the starting offset. This means there will be a retry with the + starting offset at the match offset. If this returns the same match again, + it is picked up above and ignored, and the special action is then taken. */ + + if (match_offset == end_offset) + { + if (end_offset == ulen) break; /* End of subject */ + if (match_offset <= dat_datctl.offset) + g_notempty = PCRE2_NOTEMPTY_ATSTART | PCRE2_ANCHORED; + } + + /* However, even after matching a non-empty string, there is still one + tricky case. If a pattern contains \K within a lookbehind assertion at the + start, the end of the matched string can be at the offset where the match + started. In the case of a normal /g iteration without special action, this + leads to a loop that keeps on returning the same substring. The loop would + be caught above, but we really want to move on to the next match. 
*/ + + else + { + g_notempty = 0; /* Set for a "normal" repeat */ + if ((dat_datctl.control & CTL_GLOBAL) != 0) + { + PCRE2_SIZE startchar; + PCRE2_GET_STARTCHAR(startchar, match_data); + if (end_offset <= startchar) + { + if (startchar >= ulen) break; /* End of subject */ + end_offset = startchar + 1; + if (utf && test_mode != PCRE32_MODE) + { + if (test_mode == PCRE8_MODE) + { + for (; end_offset < ulen; end_offset++) + if ((((PCRE2_SPTR8)pp)[end_offset] & 0xc0) != 0x80) break; + } + else /* 16-bit mode */ + { + for (; end_offset < ulen; end_offset++) + if ((((PCRE2_SPTR16)pp)[end_offset] & 0xfc00) != 0xdc00) break; + } + } + } + } + } + + /* For a normal global (/g) iteration, save the current ovector[0,1] and + the starting offset so that we can check that they do change each time. + Otherwise a matching bug that returns the same string causes an infinite + loop. It has happened! Then update the start offset, leaving other + parameters alone. */ + + if ((dat_datctl.control & CTL_GLOBAL) != 0) + { + ovecsave[0] = ovector[0]; + ovecsave[1] = ovector[1]; + ovecsave[2] = dat_datctl.offset; + dat_datctl.offset = end_offset; + } + + /* For altglobal, just update the pointer and length. 
*/ + + else + { + pp += end_offset * code_unit_size; + len -= end_offset * code_unit_size; + ulen -= end_offset; + if (arg_ulen != PCRE2_ZERO_TERMINATED) arg_ulen -= end_offset; + } + } + } /* End of global loop */ + +show_memory = FALSE; +return PR_OK; +} + + + + +/************************************************* +* Print PCRE2 version * +*************************************************/ + +static void +print_version(FILE *f) +{ +VERSION_TYPE *vp; +fprintf(f, "PCRE2 version "); +for (vp = version; *vp != 0; vp++) fprintf(f, "%c", *vp); +fprintf(f, "\n"); +} + + + +/************************************************* +* Print Unicode version * +*************************************************/ + +static void +print_unicode_version(FILE *f) +{ +VERSION_TYPE *vp; +fprintf(f, "Unicode version "); +for (vp = uversion; *vp != 0; vp++) fprintf(f, "%c", *vp); +} + + + +/************************************************* +* Print JIT target * +*************************************************/ + +static void +print_jit_target(FILE *f) +{ +VERSION_TYPE *vp; +for (vp = jittarget; *vp != 0; vp++) fprintf(f, "%c", *vp); +} + + + +/************************************************* +* Print newline configuration * +*************************************************/ + +/* Output is always to stdout. 
+ +Arguments: + rc the return code from PCRE2_CONFIG_NEWLINE + isc TRUE if called from "-C newline" +Returns: nothing +*/ + +static void +print_newline_config(uint32_t optval, BOOL isc) +{ +if (!isc) printf(" Default newline sequence is "); +if (optval < sizeof(newlines)/sizeof(char *)) + printf("%s\n", newlines[optval]); +else + printf("a non-standard value: %d\n", optval); +} + + + +/************************************************* +* Usage function * +*************************************************/ + +static void +usage(void) +{ +printf("Usage: pcre2test [options] [<input file> [<output file>]]\n\n"); +printf("Input and output default to stdin and stdout.\n"); +#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) +printf("If input is a terminal, readline() is used to read from it.\n"); +#else +printf("This version of pcre2test is not linked with readline().\n"); +#endif +printf("\nOptions:\n"); +#ifdef SUPPORT_PCRE2_8 +printf(" -8 use the 8-bit library\n"); +#endif +#ifdef SUPPORT_PCRE2_16 +printf(" -16 use the 16-bit library\n"); +#endif +#ifdef SUPPORT_PCRE2_32 +printf(" -32 use the 32-bit library\n"); +#endif +printf(" -ac set default pattern modifier PCRE2_AUTO_CALLOUT\n"); +printf(" -AC as -ac, but also set subject 'callout_extra' modifier\n"); +printf(" -b set default pattern modifier 'fullbincode'\n"); +printf(" -C show PCRE2 compile-time options and exit\n"); +printf(" -C arg show a specific compile-time option and exit with its\n"); +printf(" value if numeric (else 0).
The arg can be:\n"); +printf(" backslash-C use of \\C is enabled [0, 1]\n"); +printf(" bsr \\R type [ANYCRLF, ANY]\n"); +printf(" ebcdic compiled for EBCDIC character code [0,1]\n"); +printf(" ebcdic-nl NL code if compiled for EBCDIC\n"); +printf(" jit just-in-time compiler supported [0, 1]\n"); +printf(" linksize internal link size [2, 3, 4]\n"); +printf(" newline newline type [CR, LF, CRLF, ANYCRLF, ANY, NUL]\n"); +printf(" pcre2-8 8 bit library support enabled [0, 1]\n"); +printf(" pcre2-16 16 bit library support enabled [0, 1]\n"); +printf(" pcre2-32 32 bit library support enabled [0, 1]\n"); +printf(" unicode Unicode and UTF support enabled [0, 1]\n"); +printf(" -d set default pattern modifier 'debug'\n"); +printf(" -dfa set default subject modifier 'dfa'\n"); +printf(" -error show messages for error numbers, then exit\n"); +printf(" -help show usage information\n"); +printf(" -i set default pattern modifier 'info'\n"); +printf(" -jit set default pattern modifier 'jit'\n"); +printf(" -jitfast set default pattern modifier 'jitfast'\n"); +printf(" -jitverify set default pattern modifier 'jitverify'\n"); +printf(" -LM list pattern and subject modifiers, then exit\n"); +printf(" -q quiet: do not output PCRE2 version number at start\n"); +printf(" -pattern <s> set default pattern modifier fields\n"); +printf(" -subject <s> set default subject modifier fields\n"); +printf(" -S <n> set stack size to <n> mebibytes\n"); +printf(" -t [<n>] time compilation and execution, repeating <n> times\n"); +printf(" -tm [<n>] time execution (matching) only, repeating <n> times\n"); +printf(" -T same as -t, but show total times at the end\n"); +printf(" -TM same as -tm, but show total time at the end\n"); +printf(" -version show PCRE2 version and exit\n"); +} + + + +/************************************************* +* Handle -C option * +*************************************************/ + +/* This option outputs configuration options and sets an appropriate return +code when asked for a single option.
The code is abstracted into a separate +function because of its size. Use whichever pcre2_config() function is +available. + +Argument: an option name or NULL +Returns: the return code +*/ + +static int +c_option(const char *arg) +{ +uint32_t optval; +unsigned int i = COPTLISTCOUNT; +int yield = 0; + +if (arg != NULL && arg[0] != CHAR_MINUS) + { + for (i = 0; i < COPTLISTCOUNT; i++) + if (strcmp(arg, coptlist[i].name) == 0) break; + + if (i >= COPTLISTCOUNT) + { + fprintf(stderr, "** Unknown -C option '%s'\n", arg); + return 0; + } + + switch (coptlist[i].type) + { + case CONF_BSR: + (void)PCRE2_CONFIG(coptlist[i].value, &optval); + printf("%s\n", (optval == PCRE2_BSR_ANYCRLF)? "ANYCRLF" : "ANY"); + break; + + case CONF_FIX: + yield = coptlist[i].value; + printf("%d\n", yield); + break; + + case CONF_FIZ: + optval = coptlist[i].value; + printf("%d\n", optval); + break; + + case CONF_INT: + (void)PCRE2_CONFIG(coptlist[i].value, &yield); + printf("%d\n", yield); + break; + + case CONF_NL: + (void)PCRE2_CONFIG(coptlist[i].value, &optval); + print_newline_config(optval, TRUE); + break; + } + +/* For VMS, return the value by setting a symbol, for certain values only. This +is contributed code which the PCRE2 developers have no means of testing. */ + +#ifdef __VMS + +/* This is the original code provided by the first VMS contributor. */ +#ifdef NEVER + if (copytlist[i].type == CONF_FIX || coptlist[i].type == CONF_INT) + { + char ucname[16]; + strcpy(ucname, coptlist[i].name); + for (i = 0; ucname[i] != 0; i++) ucname[i] = toupper[ucname[i]]; + vms_setsymbol(ucname, 0, optval); + } +#endif + +/* This is the new code, provided by a second VMS contributor. 
*/ + + if (coptlist[i].type == CONF_FIX || coptlist[i].type == CONF_INT) + { + char nam_buf[22], val_buf[4]; + $DESCRIPTOR(nam, nam_buf); + $DESCRIPTOR(val, val_buf); + + strcpy(nam_buf, coptlist[i].name); + nam.dsc$w_length = strlen(nam_buf); + sprintf(val_buf, "%d", yield); + val.dsc$w_length = strlen(val_buf); + lib$set_symbol(&nam, &val); + } +#endif /* __VMS */ + + return yield; + } + +/* No argument for -C: output all configuration information. */ + +print_version(stdout); +printf("Compiled with\n"); + +#ifdef EBCDIC +printf(" EBCDIC code support: LF is 0x%02x\n", CHAR_LF); +#if defined NATIVE_ZOS +printf(" EBCDIC code page %s or similar\n", pcrz_cpversion()); +#endif +#endif + +(void)PCRE2_CONFIG(PCRE2_CONFIG_COMPILED_WIDTHS, &optval); +if (optval & 1) printf(" 8-bit support\n"); +if (optval & 2) printf(" 16-bit support\n"); +if (optval & 4) printf(" 32-bit support\n"); + +#ifdef SUPPORT_VALGRIND +printf(" Valgrind support\n"); +#endif + +(void)PCRE2_CONFIG(PCRE2_CONFIG_UNICODE, &optval); +if (optval != 0) + { + printf(" UTF and UCP support ("); + print_unicode_version(stdout); + printf(")\n"); + } +else printf(" No Unicode support\n"); + +(void)PCRE2_CONFIG(PCRE2_CONFIG_JIT, &optval); +if (optval != 0) + { + printf(" Just-in-time compiler support: "); + print_jit_target(stdout); + printf("\n"); + } +else + { + printf(" No just-in-time compiler support\n"); + } + +(void)PCRE2_CONFIG(PCRE2_CONFIG_NEWLINE, &optval); +print_newline_config(optval, FALSE); +(void)PCRE2_CONFIG(PCRE2_CONFIG_BSR, &optval); +printf(" \\R matches %s\n", + (optval == PCRE2_BSR_ANYCRLF)? "CR, LF, or CRLF only" : + "all Unicode newlines"); +(void)PCRE2_CONFIG(PCRE2_CONFIG_NEVER_BACKSLASH_C, &optval); +printf(" \\C is %ssupported\n", optval? 
"not ":""); +(void)PCRE2_CONFIG(PCRE2_CONFIG_LINKSIZE, &optval); +printf(" Internal link size = %d\n", optval); +(void)PCRE2_CONFIG(PCRE2_CONFIG_PARENSLIMIT, &optval); +printf(" Parentheses nest limit = %d\n", optval); +(void)PCRE2_CONFIG(PCRE2_CONFIG_HEAPLIMIT, &optval); +printf(" Default heap limit = %d kibibytes\n", optval); +(void)PCRE2_CONFIG(PCRE2_CONFIG_MATCHLIMIT, &optval); +printf(" Default match limit = %d\n", optval); +(void)PCRE2_CONFIG(PCRE2_CONFIG_DEPTHLIMIT, &optval); +printf(" Default depth limit = %d\n", optval); + +#if defined SUPPORT_LIBREADLINE +printf(" pcre2test has libreadline support\n"); +#elif defined SUPPORT_LIBEDIT +printf(" pcre2test has libedit support\n"); +#else +printf(" pcre2test has neither libreadline nor libedit support\n"); +#endif + +return 0; +} + + + +/************************************************* +* Display one modifier * +*************************************************/ + +static void +display_one_modifier(modstruct *m, BOOL for_pattern) +{ +uint32_t c = (!for_pattern && (m->which == MOD_PND || m->which == MOD_PNDP))? + '*' : ' '; +printf("%c%s", c, m->name); +} + + + +/************************************************* +* Display pattern or subject modifiers * +*************************************************/ + +/* In order to print in two columns, first scan without printing to get a list +of the modifiers that are required. 
+ +Arguments: + for_pattern TRUE for pattern modifiers, FALSE for subject modifiers + title string to be used in title + +Returns: nothing +*/ + +static void +display_selected_modifiers(BOOL for_pattern, const char *title) +{ +uint32_t i, j; +uint32_t n = 0; +uint32_t list[MODLISTCOUNT]; + +for (i = 0; i < MODLISTCOUNT; i++) + { + BOOL is_pattern = TRUE; + modstruct *m = modlist + i; + + switch (m->which) + { + case MOD_CTC: /* Compile context */ + case MOD_PAT: /* Pattern */ + case MOD_PATP: /* Pattern, OK for Perl-compatible test */ + break; + + /* The MOD_PND and MOD_PNDP modifiers are precisely those that affect + subjects, but can be given with a pattern. We list them as subject + modifiers, but marked with an asterisk.*/ + + case MOD_CTM: /* Match context */ + case MOD_DAT: /* Subject line */ + case MOD_PND: /* As PD, but not default pattern */ + case MOD_PNDP: /* As PND, OK for Perl-compatible test */ + is_pattern = FALSE; + break; + + default: printf("** Unknown type for modifier '%s'\n", m->name); + /* Fall through */ + case MOD_PD: /* Pattern or subject */ + case MOD_PDP: /* As PD, OK for Perl-compatible test */ + is_pattern = for_pattern; + break; + } + + if (for_pattern == is_pattern) list[n++] = i; + } + +/* Now print from the list in two columns. */ + +printf("-------------- %s MODIFIERS --------------\n", title); + +for (i = 0, j = (n+1)/2; i < (n+1)/2; i++, j++) + { + modstruct *m = modlist + list[i]; + display_one_modifier(m, for_pattern); + if (j < n) + { + uint32_t k = 27 - strlen(m->name); + while (k-- > 0) printf(" "); + display_one_modifier(modlist + list[j], for_pattern); + } + printf("\n"); + } +} + + + +/************************************************* +* Display the list of modifiers * +*************************************************/ + +static void +display_modifiers(void) +{ +printf( + "An asterisk on a subject modifier means that it may be given on a pattern\n" + "line, in order to apply to all subjects matched by that pattern. 
Modifiers\n" + "that are listed for both patterns and subjects have different effects in\n" + "each case.\n\n"); +display_selected_modifiers(TRUE, "PATTERN"); +printf("\n"); +display_selected_modifiers(FALSE, "SUBJECT"); +} + + + +/************************************************* +* Main Program * +*************************************************/ + +int +main(int argc, char **argv) +{ +uint32_t temp; +uint32_t yield = 0; +uint32_t op = 1; +BOOL notdone = TRUE; +BOOL quiet = FALSE; +BOOL showtotaltimes = FALSE; +BOOL skipping = FALSE; +char *arg_subject = NULL; +char *arg_pattern = NULL; +char *arg_error = NULL; + +/* The offsets to the options and control bits fields of the pattern and data +control blocks must be the same so that common options and controls such as +"anchored" or "memory" can work for either of them from a single table entry. +We cannot test this till runtime because "offsetof" does not work in the +preprocessor. */ + +if (PO(options) != DO(options) || PO(control) != DO(control) || + PO(control2) != DO(control2)) + { + fprintf(stderr, "** Coding error: " + "options and control offsets for pattern and data must be the same.\n"); + return 1; + } + +/* Get the PCRE2 and Unicode version number and JIT target information, at the +same time checking that a request for the length gives the same answer. Also +check lengths for non-string items. */ + +if (PCRE2_CONFIG(PCRE2_CONFIG_VERSION, NULL) != + PCRE2_CONFIG(PCRE2_CONFIG_VERSION, version) || + + PCRE2_CONFIG(PCRE2_CONFIG_UNICODE_VERSION, NULL) != + PCRE2_CONFIG(PCRE2_CONFIG_UNICODE_VERSION, uversion) || + + PCRE2_CONFIG(PCRE2_CONFIG_JITTARGET, NULL) != + PCRE2_CONFIG(PCRE2_CONFIG_JITTARGET, jittarget) || + + PCRE2_CONFIG(PCRE2_CONFIG_UNICODE, NULL) != sizeof(uint32_t) || + PCRE2_CONFIG(PCRE2_CONFIG_MATCHLIMIT, NULL) != sizeof(uint32_t)) + { + fprintf(stderr, "** Error in pcre2_config(): bad length\n"); + return 1; + } + +/* Check that bad options are diagnosed. 
*/ + +if (PCRE2_CONFIG(999, NULL) != PCRE2_ERROR_BADOPTION || + PCRE2_CONFIG(999, &temp) != PCRE2_ERROR_BADOPTION) + { + fprintf(stderr, "** Error in pcre2_config(): bad option not diagnosed\n"); + return 1; + } + +/* This configuration option is now obsolete, but running a quick check ensures +that its code is covered. */ + +(void)PCRE2_CONFIG(PCRE2_CONFIG_STACKRECURSE, &temp); + +/* Get buffers from malloc() so that valgrind will check their misuse when +debugging. They grow automatically when very long lines are read. The 16- +and 32-bit buffers (pbuffer16, pbuffer32) are obtained only if needed. */ + +buffer = (uint8_t *)malloc(pbuffer8_size); +pbuffer8 = (uint8_t *)malloc(pbuffer8_size); + +/* The following _setmode() stuff is some Windows magic that tells its runtime +library to translate CRLF into a single LF character. At least, that's what +I've been told: never having used Windows I take this all on trust. Originally +it set 0x8000, but then I was advised that _O_BINARY was better. */ + +#if defined(_WIN32) || defined(WIN32) +_setmode( _fileno( stdout ), _O_BINARY ); +#endif + +/* Initialization that does not depend on the running mode. */ + +locale_name[0] = 0; + +memset(&def_patctl, 0, sizeof(patctl)); +def_patctl.convert_type = CONVERT_UNSET; + +memset(&def_datctl, 0, sizeof(datctl)); +def_datctl.oveccount = DEFAULT_OVECCOUNT; +def_datctl.copy_numbers[0] = -1; +def_datctl.get_numbers[0] = -1; +def_datctl.startend[0] = def_datctl.startend[1] = CFORE_UNSET; +def_datctl.cerror[0] = def_datctl.cerror[1] = CFORE_UNSET; +def_datctl.cfail[0] = def_datctl.cfail[1] = CFORE_UNSET; + +/* Scan command line options. */ + +while (argc > 1 && argv[op][0] == '-' && argv[op][1] != 0) + { + char *endptr; + char *arg = argv[op]; + unsigned long uli; + + /* List modifiers and exit. */ + + if (strcmp(arg, "-LM") == 0) + { + display_modifiers(); + goto EXIT; + } + + /* Display and/or set return code for configuration options. 
*/ + + if (strcmp(arg, "-C") == 0) + { + yield = c_option(argv[op + 1]); + goto EXIT; + } + + /* Select operating mode. Ensure that pcre2_config() is called in 16-bit + and 32-bit modes because that won't happen naturally when 8-bit is also + configured. Also call some other functions that are not otherwise used. This + means that a coverage report won't claim there are uncalled functions. */ + + if (strcmp(arg, "-8") == 0) + { +#ifdef SUPPORT_PCRE2_8 + test_mode = PCRE8_MODE; + (void)pcre2_set_bsr_8(pat_context8, 999); + (void)pcre2_set_newline_8(pat_context8, 999); +#else + fprintf(stderr, + "** This version of PCRE2 was built without 8-bit support\n"); + exit(1); +#endif + } + + else if (strcmp(arg, "-16") == 0) + { +#ifdef SUPPORT_PCRE2_16 + test_mode = PCRE16_MODE; + (void)pcre2_config_16(PCRE2_CONFIG_VERSION, NULL); + (void)pcre2_set_bsr_16(pat_context16, 999); + (void)pcre2_set_newline_16(pat_context16, 999); +#else + fprintf(stderr, + "** This version of PCRE2 was built without 16-bit support\n"); + exit(1); +#endif + } + + else if (strcmp(arg, "-32") == 0) + { +#ifdef SUPPORT_PCRE2_32 + test_mode = PCRE32_MODE; + (void)pcre2_config_32(PCRE2_CONFIG_VERSION, NULL); + (void)pcre2_set_bsr_32(pat_context32, 999); + (void)pcre2_set_newline_32(pat_context32, 999); +#else + fprintf(stderr, + "** This version of PCRE2 was built without 32-bit support\n"); + exit(1); +#endif + } + + /* Set quiet (no version verification) */ + + else if (strcmp(arg, "-q") == 0) quiet = TRUE; + + /* Set system stack size */ + + else if (strcmp(arg, "-S") == 0 && argc > 2 && + ((uli = strtoul(argv[op+1], &endptr, 10)), *endptr == 0)) + { +#if defined(_WIN32) || defined(WIN32) || defined(__minix) || defined(NATIVE_ZOS) || defined(__VMS) + fprintf(stderr, "pcre2test: -S is not supported on this OS\n"); + exit(1); +#else + int rc; + uint32_t stack_size; + struct rlimit rlim; + if (U32OVERFLOW(uli)) + { + fprintf(stderr, "** Argument for -S is too big\n"); + exit(1); + } + stack_size = 
(uint32_t)uli; + getrlimit(RLIMIT_STACK, &rlim); + rlim.rlim_cur = stack_size * 1024 * 1024; + if (rlim.rlim_cur > rlim.rlim_max) + { + fprintf(stderr, + "pcre2test: requested stack size %luMiB is greater than hard limit " + "%luMiB\n", (unsigned long int)stack_size, + (unsigned long int)(rlim.rlim_max)); + exit(1); + } + rc = setrlimit(RLIMIT_STACK, &rlim); + if (rc != 0) + { + fprintf(stderr, "pcre2test: setting stack size %luMiB failed: %s\n", + (unsigned long int)stack_size, strerror(errno)); + exit(1); + } + op++; + argc--; +#endif + } + + /* Set some common pattern and subject controls */ + + else if (strcmp(arg, "-AC") == 0) + { + def_patctl.options |= PCRE2_AUTO_CALLOUT; + def_datctl.control2 |= CTL2_CALLOUT_EXTRA; + } + else if (strcmp(arg, "-ac") == 0) def_patctl.options |= PCRE2_AUTO_CALLOUT; + else if (strcmp(arg, "-b") == 0) def_patctl.control |= CTL_FULLBINCODE; + else if (strcmp(arg, "-d") == 0) def_patctl.control |= CTL_DEBUG; + else if (strcmp(arg, "-dfa") == 0) def_datctl.control |= CTL_DFA; + else if (strcmp(arg, "-i") == 0) def_patctl.control |= CTL_INFO; + else if (strcmp(arg, "-jit") == 0 || strcmp(arg, "-jitverify") == 0 || + strcmp(arg, "-jitfast") == 0) + { + if (arg[4] == 'v') def_patctl.control |= CTL_JITVERIFY; + else if (arg[4] == 'f') def_patctl.control |= CTL_JITFAST; + def_patctl.jit = JIT_DEFAULT; /* full & partial */ +#ifndef SUPPORT_JIT + fprintf(stderr, "** Warning: JIT support is not available: " + "-jit[fast|verify] calls functions that do nothing.\n"); +#endif + } + + /* Set timing parameters */ + + else if (strcmp(arg, "-t") == 0 || strcmp(arg, "-tm") == 0 || + strcmp(arg, "-T") == 0 || strcmp(arg, "-TM") == 0) + { + int both = arg[2] == 0; + showtotaltimes = arg[1] == 'T'; + if (argc > 2 && (uli = strtoul(argv[op+1], &endptr, 10), *endptr == 0)) + { + if (uli == 0) + { + fprintf(stderr, "** Argument for %s must not be zero\n", arg); + exit(1); + } + if (U32OVERFLOW(uli)) + { + fprintf(stderr, "** Argument for %s is too 
big\n", arg); + exit(1); + } + timeitm = (int)uli; + op++; + argc--; + } + else timeitm = LOOPREPEAT; + if (both) timeit = timeitm; + } + + /* Give help */ + + else if (strcmp(arg, "-help") == 0 || + strcmp(arg, "--help") == 0) + { + usage(); + goto EXIT; + } + + /* Show version */ + + else if (strcmp(arg, "-version") == 0 || + strcmp(arg, "--version") == 0) + { + print_version(stdout); + goto EXIT; + } + + /* The following options save their data for processing once we know what + the running mode is. */ + + else if (strcmp(arg, "-error") == 0) + { + arg_error = argv[op+1]; + goto CHECK_VALUE_EXISTS; + } + + else if (strcmp(arg, "-subject") == 0) + { + arg_subject = argv[op+1]; + goto CHECK_VALUE_EXISTS; + } + + else if (strcmp(arg, "-pattern") == 0) + { + arg_pattern = argv[op+1]; + CHECK_VALUE_EXISTS: + if (argc <= 2) + { + fprintf(stderr, "** Missing value for %s\n", arg); + yield = 1; + goto EXIT; + } + op++; + argc--; + } + + /* Unrecognized option */ + + else + { + fprintf(stderr, "** Unknown or malformed option '%s'\n", arg); + usage(); + yield = 1; + goto EXIT; + } + op++; + argc--; + } + +/* If -error was present, get the error numbers, show the messages, and exit. +We wait to do this until we know which mode we are in. */ + +if (arg_error != NULL) + { + int len; + int errcode; + char *endptr; + +/* Ensure the relevant non-8-bit buffer is available. Ensure that it is at +least 128 code units, because it is used for retrieving error messages. 
*/ + +#ifdef SUPPORT_PCRE2_16 + if (test_mode == PCRE16_MODE) + { + pbuffer16_size = 256; + pbuffer16 = (uint16_t *)malloc(pbuffer16_size); + if (pbuffer16 == NULL) + { + fprintf(stderr, "pcre2test: malloc(%" SIZ_FORM ") failed for pbuffer16\n", + SIZ_CAST pbuffer16_size); + yield = 1; + goto EXIT; + } + } +#endif + +#ifdef SUPPORT_PCRE2_32 + if (test_mode == PCRE32_MODE) + { + pbuffer32_size = 512; + pbuffer32 = (uint32_t *)malloc(pbuffer32_size); + if (pbuffer32 == NULL) + { + fprintf(stderr, "pcre2test: malloc(%" SIZ_FORM ") failed for pbuffer32\n", + SIZ_CAST pbuffer32_size); + yield = 1; + goto EXIT; + } + } +#endif + + /* Loop along a list of error numbers. */ + + for (;;) + { + errcode = strtol(arg_error, &endptr, 10); + if (*endptr != 0 && *endptr != CHAR_COMMA) + { + fprintf(stderr, "** '%s' is not a valid error number list\n", arg_error); + yield = 1; + goto EXIT; + } + printf("Error %d: ", errcode); + PCRE2_GET_ERROR_MESSAGE(len, errcode, pbuffer); + if (len < 0) + { + switch (len) + { + case PCRE2_ERROR_BADDATA: + printf("PCRE2_ERROR_BADDATA (unknown error number)"); + break; + + case PCRE2_ERROR_NOMEMORY: + printf("PCRE2_ERROR_NOMEMORY (buffer too small)"); + break; + + default: + printf("Unexpected return (%d) from pcre2_get_error_message()", len); + break; + } + } + else + { + PCHARSV(CASTVAR(void *, pbuffer), 0, len, FALSE, stdout); + } + printf("\n"); + if (*endptr == 0) goto EXIT; + arg_error = endptr + 1; + } + /* Control never reaches here */ + } /* End of -error handling */ + +/* Initialize things that cannot be done until we know which test mode we are +running in. Exercise the general context copying and match data size functions, +which are not otherwise used. */ + +code_unit_size = test_mode/8; +max_oveccount = DEFAULT_OVECCOUNT; + +/* Use macros to save a lot of duplication. 
*/ + +#define CREATECONTEXTS \ + G(general_context,BITS) = G(pcre2_general_context_create_,BITS)(&my_malloc, &my_free, NULL); \ + G(general_context_copy,BITS) = G(pcre2_general_context_copy_,BITS)(G(general_context,BITS)); \ + G(default_pat_context,BITS) = G(pcre2_compile_context_create_,BITS)(G(general_context,BITS)); \ + G(pat_context,BITS) = G(pcre2_compile_context_copy_,BITS)(G(default_pat_context,BITS)); \ + G(default_dat_context,BITS) = G(pcre2_match_context_create_,BITS)(G(general_context,BITS)); \ + G(dat_context,BITS) = G(pcre2_match_context_copy_,BITS)(G(default_dat_context,BITS)); \ + G(default_con_context,BITS) = G(pcre2_convert_context_create_,BITS)(G(general_context,BITS)); \ + G(con_context,BITS) = G(pcre2_convert_context_copy_,BITS)(G(default_con_context,BITS)); \ + G(match_data,BITS) = G(pcre2_match_data_create_,BITS)(max_oveccount, G(general_context,BITS)) + +#define CONTEXTTESTS \ + (void)G(pcre2_set_compile_extra_options_,BITS)(G(pat_context,BITS), 0); \ + (void)G(pcre2_set_max_pattern_length_,BITS)(G(pat_context,BITS), 0); \ + (void)G(pcre2_set_offset_limit_,BITS)(G(dat_context,BITS), 0); \ + (void)G(pcre2_set_recursion_memory_management_,BITS)(G(dat_context,BITS), my_malloc, my_free, NULL); \ + (void)G(pcre2_get_match_data_size_,BITS)(G(match_data,BITS)) + + +/* Call the appropriate functions for the current mode, and exercise some +functions that are not otherwise called. */ + +#ifdef SUPPORT_PCRE2_8 +#undef BITS +#define BITS 8 +if (test_mode == PCRE8_MODE) + { + CREATECONTEXTS; + CONTEXTTESTS; + } +#endif + +#ifdef SUPPORT_PCRE2_16 +#undef BITS +#define BITS 16 +if (test_mode == PCRE16_MODE) + { + CREATECONTEXTS; + CONTEXTTESTS; + } +#endif + +#ifdef SUPPORT_PCRE2_32 +#undef BITS +#define BITS 32 +if (test_mode == PCRE32_MODE) + { + CREATECONTEXTS; + CONTEXTTESTS; + } +#endif + +/* Set a default parentheses nest limit that is large enough to run the +standard tests (this also exercises the function). 
*/ + +PCRE2_SET_PARENS_NEST_LIMIT(default_pat_context, PARENS_NEST_DEFAULT); + +/* Handle command line modifier settings, sending any error messages to +stderr. We need to know the mode before modifying the context, and it is tidier +to do them all in the same way. */ + +outfile = stderr; +if ((arg_pattern != NULL && + !decode_modifiers((uint8_t *)arg_pattern, CTX_DEFPAT, &def_patctl, NULL)) || + (arg_subject != NULL && + !decode_modifiers((uint8_t *)arg_subject, CTX_DEFDAT, NULL, &def_datctl))) + { + yield = 1; + goto EXIT; + } + +/* Sort out the input and output files, defaulting to stdin/stdout. */ + +infile = stdin; +outfile = stdout; + +if (argc > 1 && strcmp(argv[op], "-") != 0) + { + infile = fopen(argv[op], INPUT_MODE); + if (infile == NULL) + { + printf("** Failed to open '%s': %s\n", argv[op], strerror(errno)); + yield = 1; + goto EXIT; + } + } + +#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) +if (INTERACTIVE(infile)) using_history(); +#endif + +if (argc > 2) + { + outfile = fopen(argv[op+1], OUTPUT_MODE); + if (outfile == NULL) + { + printf("** Failed to open '%s': %s\n", argv[op+1], strerror(errno)); + yield = 1; + goto EXIT; + } + } + +/* Output a heading line unless quiet, then process input lines. */ + +if (!quiet) print_version(outfile); + +SET(compiled_code, NULL); + +#ifdef SUPPORT_PCRE2_8 +preg.re_pcre2_code = NULL; +preg.re_match_data = NULL; +#endif + +while (notdone) + { + uint8_t *p; + int rc = PR_OK; + BOOL expectdata = TEST(compiled_code, !=, NULL); +#ifdef SUPPORT_PCRE2_8 + expectdata |= preg.re_pcre2_code != NULL; +#endif + + if (extend_inputline(infile, buffer, expectdata? "data> " : " re> ") == NULL) + break; + if (!INTERACTIVE(infile)) fprintf(outfile, "%s", (char *)buffer); + fflush(outfile); + p = buffer; + + /* If we have a pattern set up for testing, or we are skipping after a + compile failure, a blank line terminates this test. 
*/ + + if (expectdata || skipping) + { + while (isspace(*p)) p++; + if (*p == 0) + { +#ifdef SUPPORT_PCRE2_8 + if (preg.re_pcre2_code != NULL) + { + regfree(&preg); + preg.re_pcre2_code = NULL; + preg.re_match_data = NULL; + } +#endif /* SUPPORT_PCRE2_8 */ + if (TEST(compiled_code, !=, NULL)) + { + SUB1(pcre2_code_free, compiled_code); + SET(compiled_code, NULL); + } + skipping = FALSE; + setlocale(LC_CTYPE, "C"); + } + + /* Otherwise, if we are not skipping, and the line is not a data comment + line starting with "\=", process a data line. */ + + else if (!skipping && !(p[0] == '\\' && p[1] == '=' && isspace(p[2]))) + { + rc = process_data(); + } + } + + /* We do not have a pattern set up for testing. Lines starting with # are + either comments or special commands. Blank lines are ignored. Otherwise, the + line must start with a valid delimiter. It is then processed as a pattern + line. A copy of the pattern is left in pbuffer8 for use by callouts. Under + valgrind, make the unused part of the buffer undefined, to catch overruns. */ + + else if (*p == '#') + { + if (isspace(p[1]) || p[1] == '!' || p[1] == 0) continue; + rc = process_command(); + } + + else if (strchr("/!\"'`%&-=_:;,@~", *p) != NULL) + { + rc = process_pattern(); + dfa_matched = 0; + } + + else + { + while (isspace(*p)) p++; + if (*p != 0) + { + fprintf(outfile, "** Invalid pattern delimiter '%c' (x%x).\n", *buffer, + *buffer); + rc = PR_SKIP; + } + } + + if (rc == PR_SKIP && !INTERACTIVE(infile)) skipping = TRUE; + else if (rc == PR_ABEND) + { + fprintf(outfile, "** pcre2test run abandoned\n"); + yield = 1; + goto EXIT; + } + } + +/* Finish off a normal run. 
*/ + +if (INTERACTIVE(infile)) fprintf(outfile, "\n"); + +if (showtotaltimes) + { + const char *pad = ""; + fprintf(outfile, "--------------------------------------\n"); + if (timeit > 0) + { + fprintf(outfile, "Total compile time %.4f milliseconds\n", + (((double)total_compile_time * 1000.0) / (double)timeit) / + (double)CLOCKS_PER_SEC); + if (total_jit_compile_time > 0) + fprintf(outfile, "Total JIT compile %.4f milliseconds\n", + (((double)total_jit_compile_time * 1000.0) / (double)timeit) / + (double)CLOCKS_PER_SEC); + pad = " "; + } + fprintf(outfile, "Total match time %s%.4f milliseconds\n", pad, + (((double)total_match_time * 1000.0) / (double)timeitm) / + (double)CLOCKS_PER_SEC); + } + + +EXIT: + +#if defined(SUPPORT_LIBREADLINE) || defined(SUPPORT_LIBEDIT) +if (infile != NULL && INTERACTIVE(infile)) clear_history(); +#endif + +if (infile != NULL && infile != stdin) fclose(infile); +if (outfile != NULL && outfile != stdout) fclose(outfile); + +free(buffer); +free(dbuffer); +free(pbuffer8); +free(dfa_workspace); +free((void *)locale_tables); +free(tables3); +PCRE2_MATCH_DATA_FREE(match_data); +SUB1(pcre2_code_free, compiled_code); + +while(patstacknext-- > 0) + { + SET(compiled_code, patstack[patstacknext]); + SUB1(pcre2_code_free, compiled_code); + } + +PCRE2_JIT_FREE_UNUSED_MEMORY(general_context); +if (jit_stack != NULL) + { + PCRE2_JIT_STACK_FREE(jit_stack); + } + +#define FREECONTEXTS \ + G(pcre2_general_context_free_,BITS)(G(general_context,BITS)); \ + G(pcre2_general_context_free_,BITS)(G(general_context_copy,BITS)); \ + G(pcre2_compile_context_free_,BITS)(G(pat_context,BITS)); \ + G(pcre2_compile_context_free_,BITS)(G(default_pat_context,BITS)); \ + G(pcre2_match_context_free_,BITS)(G(dat_context,BITS)); \ + G(pcre2_match_context_free_,BITS)(G(default_dat_context,BITS)); \ + G(pcre2_convert_context_free_,BITS)(G(default_con_context,BITS)); \ + G(pcre2_convert_context_free_,BITS)(G(con_context,BITS)); + +#ifdef SUPPORT_PCRE2_8 +#undef BITS +#define 
BITS 8 +if (preg.re_pcre2_code != NULL) regfree(&preg); +FREECONTEXTS; +#endif + +#ifdef SUPPORT_PCRE2_16 +#undef BITS +#define BITS 16 +free(pbuffer16); +FREECONTEXTS; +#endif + +#ifdef SUPPORT_PCRE2_32 +#undef BITS +#define BITS 32 +free(pbuffer32); +FREECONTEXTS; +#endif + +#if defined(__VMS) + yield = SS$_NORMAL; /* Return values via DCL symbols */ +#endif + +return yield; +} + +/* End of pcre2test.c */ diff --git a/src/pcre/sljit/sljitConfig.h b/src/pcre2/src/sljit/sljitConfig.h similarity index 75% rename from src/pcre/sljit/sljitConfig.h rename to src/pcre2/src/sljit/sljitConfig.h index d54b5e6f..1c821d28 100644 --- a/src/pcre/sljit/sljitConfig.h +++ b/src/pcre2/src/sljit/sljitConfig.h @@ -24,15 +24,19 @@ * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ -#ifndef _SLJIT_CONFIG_H_ -#define _SLJIT_CONFIG_H_ +#ifndef SLJIT_CONFIG_H_ +#define SLJIT_CONFIG_H_ -/* --------------------------------------------------------------------- */ -/* Custom defines */ -/* --------------------------------------------------------------------- */ +#ifdef __cplusplus +extern "C" { +#endif -/* Put your custom defines here. This empty section will never change - which helps maintaining patches (with diff / patch utilities). */ +/* + This file contains the basic configuration options for the SLJIT compiler + and their default values. These options can be overridden in the + sljitConfigPre.h header file when SLJIT_HAVE_CONFIG_PRE is set to a + non-zero value. 
+*/ /* --------------------------------------------------------------------- */ /* Architecture */ @@ -50,7 +54,7 @@ /* #define SLJIT_CONFIG_MIPS_32 1 */ /* #define SLJIT_CONFIG_MIPS_64 1 */ /* #define SLJIT_CONFIG_SPARC_32 1 */ -/* #define SLJIT_CONFIG_TILEGX 1 */ +/* #define SLJIT_CONFIG_S390X 1 */ /* #define SLJIT_CONFIG_AUTO 1 */ /* #define SLJIT_CONFIG_UNSUPPORTED 1 */ @@ -59,18 +63,19 @@ /* Utilities */ /* --------------------------------------------------------------------- */ -/* Useful for thread-safe compiling of global functions. */ -#ifndef SLJIT_UTIL_GLOBAL_LOCK -/* Enabled by default */ -#define SLJIT_UTIL_GLOBAL_LOCK 1 -#endif - -/* Implements a stack like data structure (by using mmap / VirtualAlloc). */ +/* Implements a stack like data structure (by using mmap / VirtualAlloc */ +/* or a custom allocator). */ #ifndef SLJIT_UTIL_STACK /* Enabled by default */ #define SLJIT_UTIL_STACK 1 #endif +/* Uses user provided allocator to allocate the stack (see SLJIT_UTIL_STACK) */ +#ifndef SLJIT_UTIL_SIMPLE_STACK_ALLOCATION +/* Disabled by default */ +#define SLJIT_UTIL_SIMPLE_STACK_ALLOCATION 0 +#endif + /* Single threaded application. Does not require any locks. */ #ifndef SLJIT_SINGLE_THREADED /* Disabled by default. */ @@ -97,15 +102,31 @@ /* When SLJIT_PROT_EXECUTABLE_ALLOCATOR is enabled SLJIT uses an allocator which does not set writable and executable - permission flags at the same time. The trade-of is increased - memory consumption and disabled dynamic code modifications. */ + permission flags at the same time. + Instead, it creates a shared memory segment (usually backed by a file) + and maps it twice, with different permissions, depending on the use + case. + The trade-off is increased use of virtual memory, incompatibility with + fork(), and some possible additional security risks by the use of + publicly accessible files for the generated code. */ #ifndef SLJIT_PROT_EXECUTABLE_ALLOCATOR /* Disabled by default. 
*/ #define SLJIT_PROT_EXECUTABLE_ALLOCATOR 0 #endif +/* When SLJIT_WX_EXECUTABLE_ALLOCATOR is enabled SLJIT uses an + allocator which does not set writable and executable permission + flags at the same time. + Instead, it creates a new independent map on each invocation and + switches permissions at the underlying pages as needed. + The trade-off is increased memory use and degraded performance. */ +#ifndef SLJIT_WX_EXECUTABLE_ALLOCATOR +/* Disabled by default. */ +#define SLJIT_WX_EXECUTABLE_ALLOCATOR 0 #endif +#endif /* !SLJIT_EXECUTABLE_ALLOCATOR */ + /* Force cdecl calling convention even if a better calling convention (e.g. fastcall) is supported by the C compiler. If this option is disabled (this is the default), functions @@ -144,4 +165,8 @@ /* For further configurations, see the beginning of sljitConfigInternal.h */ +#ifdef __cplusplus +} /* extern "C" */ #endif + +#endif /* SLJIT_CONFIG_H_ */ diff --git a/src/pcre/sljit/sljitConfigInternal.h b/src/pcre2/src/sljit/sljitConfigInternal.h similarity index 82% rename from src/pcre/sljit/sljitConfigInternal.h rename to src/pcre2/src/sljit/sljitConfigInternal.h index ba60311e..ff36e5b7 100644 --- a/src/pcre/sljit/sljitConfigInternal.h +++ b/src/pcre2/src/sljit/sljitConfigInternal.h @@ -24,8 +24,22 @@ * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ -#ifndef _SLJIT_CONFIG_INTERNAL_H_ -#define _SLJIT_CONFIG_INTERNAL_H_ +#ifndef SLJIT_CONFIG_INTERNAL_H_ +#define SLJIT_CONFIG_INTERNAL_H_ + +#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) \ + || (defined SLJIT_DEBUG && SLJIT_DEBUG && (!defined(SLJIT_ASSERT) || !defined(SLJIT_UNREACHABLE))) +#include +#endif + +#if (defined SLJIT_DEBUG && SLJIT_DEBUG \ + && (!defined(SLJIT_ASSERT) || !defined(SLJIT_UNREACHABLE) || !defined(SLJIT_HALT_PROCESS))) +#include +#endif + +#ifdef __cplusplus +extern "C" { +#endif /* SLJIT defines the following architecture dependent types and macros: @@ -67,30 +81,13 @@ Other macros: SLJIT_FUNC : calling convention attribute for both calling JIT from C and C calling back from JIT - SLJIT_W(number) : defining 64 bit constants on 64 bit architectures (compiler independent helper) + SLJIT_W(number) : defining 64 bit constants on 64 bit architectures (platform independent helper) */ /*****************/ /* Sanity check. */ /*****************/ -#if !((defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) \ - || (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) \ - || (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) \ - || (defined SLJIT_CONFIG_ARM_V7 && SLJIT_CONFIG_ARM_V7) \ - || (defined SLJIT_CONFIG_ARM_THUMB2 && SLJIT_CONFIG_ARM_THUMB2) \ - || (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64) \ - || (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) \ - || (defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) \ - || (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) \ - || (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) \ - || (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) \ - || (defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) \ - || (defined SLJIT_CONFIG_AUTO && SLJIT_CONFIG_AUTO) \ - || (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED)) -#error "An architecture must be selected" -#endif - #if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) \ + (defined SLJIT_CONFIG_X86_64 && 
SLJIT_CONFIG_X86_64) \ + (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) \ @@ -99,15 +96,36 @@ + (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64) \ + (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) \ + (defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) \ - + (defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) \ + (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) \ + (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) \ + (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) \ + + (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) \ + (defined SLJIT_CONFIG_AUTO && SLJIT_CONFIG_AUTO) \ + (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED) >= 2 #error "Multiple architectures are selected" #endif +#if !(defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) \ + && !(defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) \ + && !(defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) \ + && !(defined SLJIT_CONFIG_ARM_V7 && SLJIT_CONFIG_ARM_V7) \ + && !(defined SLJIT_CONFIG_ARM_THUMB2 && SLJIT_CONFIG_ARM_THUMB2) \ + && !(defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64) \ + && !(defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) \ + && !(defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) \ + && !(defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) \ + && !(defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) \ + && !(defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) \ + && !(defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) \ + && !(defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED) \ + && !(defined SLJIT_CONFIG_AUTO && SLJIT_CONFIG_AUTO) +#if defined SLJIT_CONFIG_AUTO && !SLJIT_CONFIG_AUTO +#error "An architecture must be selected" +#else /* SLJIT_CONFIG_AUTO */ +#define SLJIT_CONFIG_AUTO 1 +#endif /* !SLJIT_CONFIG_AUTO */ +#endif /* !SLJIT_CONFIG */ + /********************************************************/ /* Automatic CPU detection (requires compiler support). 
*/ /********************************************************/ @@ -140,8 +158,8 @@ #define SLJIT_CONFIG_MIPS_64 1 #elif defined(__sparc__) || defined(__sparc) #define SLJIT_CONFIG_SPARC_32 1 -#elif defined(__tilegx__) -#define SLJIT_CONFIG_TILEGX 1 +#elif defined(__s390x__) +#define SLJIT_CONFIG_S390X 1 #else /* Unsupported architecture */ #define SLJIT_CONFIG_UNSUPPORTED 1 @@ -191,6 +209,22 @@ #define SLJIT_CONFIG_SPARC 1 #endif +/***********************************************************/ +/* Intel Control-flow Enforcement Technology (CET) spport. */ +/***********************************************************/ + +#ifdef SLJIT_CONFIG_X86 + +#if defined(__CET__) && !(defined SLJIT_CONFIG_X86_CET && SLJIT_CONFIG_X86_CET) +#define SLJIT_CONFIG_X86_CET 1 +#endif + +#if (defined SLJIT_CONFIG_X86_CET && SLJIT_CONFIG_X86_CET) && defined(__GNUC__) +#include +#endif + +#endif /* SLJIT_CONFIG_X86 */ + /**********************************/ /* External function definitions. */ /**********************************/ @@ -214,6 +248,10 @@ #define SLJIT_MEMCPY(dest, src, len) memcpy(dest, src, len) #endif +#ifndef SLJIT_MEMMOVE +#define SLJIT_MEMMOVE(dest, src, len) memmove(dest, src, len) +#endif + #ifndef SLJIT_ZEROMEM #define SLJIT_ZEROMEM(dest, len) memset(dest, 0, len) #endif @@ -261,6 +299,7 @@ /* Type of public API functions. */ /*********************************/ +#ifndef SLJIT_API_FUNC_ATTRIBUTE #if (defined SLJIT_CONFIG_STATIC && SLJIT_CONFIG_STATIC) /* Static ABI functions. For all-in-one programs. */ @@ -274,6 +313,7 @@ #else #define SLJIT_API_FUNC_ATTRIBUTE #endif /* (defined SLJIT_CONFIG_STATIC && SLJIT_CONFIG_STATIC) */ +#endif /* defined SLJIT_API_FUNC_ATTRIBUTE */ /****************************/ /* Instruction cache flush. 
*/ @@ -283,7 +323,7 @@ #if __has_builtin(__builtin___clear_cache) #define SLJIT_CACHE_FLUSH(from, to) \ - __builtin___clear_cache((char*)from, (char*)to) + __builtin___clear_cache((char*)(from), (char*)(to)) #endif /* __has_builtin(__builtin___clear_cache) */ #endif /* (!defined SLJIT_CACHE_FLUSH && defined __has_builtin) */ @@ -314,7 +354,7 @@ #elif (defined(__GNUC__) && (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))) #define SLJIT_CACHE_FLUSH(from, to) \ - __builtin___clear_cache((char*)from, (char*)to) + __builtin___clear_cache((char*)(from), (char*)(to)) #elif defined __ANDROID__ @@ -373,7 +413,7 @@ typedef long int sljit_sw; && !(defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64) \ && !(defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) \ && !(defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) \ - && !(defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) + && !(defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) #define SLJIT_32BIT_ARCHITECTURE 1 #define SLJIT_WORD_SHIFT 2 typedef unsigned int sljit_uw; @@ -415,10 +455,14 @@ typedef double sljit_f64; #if (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED) #define SLJIT_W(w) (w##l) #elif (defined SLJIT_64BIT_ARCHITECTURE && SLJIT_64BIT_ARCHITECTURE) +#ifdef _WIN64 #define SLJIT_W(w) (w##ll) -#else +#else /* !windows */ +#define SLJIT_W(w) (w##l) +#endif /* windows */ +#else /* 32 bit */ #define SLJIT_W(w) (w) -#endif +#endif /* unknown */ #endif /* !SLJIT_W */ @@ -447,7 +491,27 @@ typedef double sljit_f64; #define SLJIT_BIG_ENDIAN 1 #endif -#elif (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) +#ifndef SLJIT_MIPS_REV + +/* Auto detecting mips revision. 
*/ +#if (defined __mips_isa_rev) && (__mips_isa_rev >= 6) +#define SLJIT_MIPS_REV 6 +#elif (defined __mips_isa_rev && __mips_isa_rev >= 1) \ + || (defined __clang__ && defined _MIPS_ARCH_OCTEON) \ + || (defined __clang__ && defined _MIPS_ARCH_P5600) +/* clang either forgets to define (clang-7) __mips_isa_rev at all + * or sets it to zero (clang-8,-9) for -march=octeon (MIPS64 R2+) + * and -march=p5600 (MIPS32 R5). + * It also sets the __mips macro to 64 or 32 for -mipsN when N <= 5 + * (should be set to N exactly) so we cannot rely on this too. + */ +#define SLJIT_MIPS_REV 1 +#endif + +#endif /* !SLJIT_MIPS_REV */ + +#elif (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) \ + || (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) #define SLJIT_BIG_ENDIAN 1 @@ -474,7 +538,8 @@ typedef double sljit_f64; || (defined SLJIT_CONFIG_ARM_THUMB2 && SLJIT_CONFIG_ARM_THUMB2) \ || (defined SLJIT_CONFIG_ARM_64 && SLJIT_CONFIG_ARM_64) \ || (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) \ - || (defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) + || (defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) \ + || (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) #define SLJIT_UNALIGNED 1 #endif @@ -492,17 +557,19 @@ typedef double sljit_f64; #ifndef SLJIT_FUNC -#if (defined SLJIT_USE_CDECL_CALLING_CONVENTION && SLJIT_USE_CDECL_CALLING_CONVENTION) +#if (defined SLJIT_USE_CDECL_CALLING_CONVENTION && SLJIT_USE_CDECL_CALLING_CONVENTION) \ + || !(defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) -/* Force cdecl. */ #define SLJIT_FUNC -#elif (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) - -#if defined(__GNUC__) && !defined(__APPLE__) +#elif defined(__GNUC__) && !defined(__APPLE__) +#if __GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4) #define SLJIT_FUNC __attribute__ ((fastcall)) #define SLJIT_X86_32_FASTCALL 1 +#else +#define SLJIT_FUNC +#endif /* gcc >= 3.4 */ #elif defined(_MSC_VER) @@ -516,16 +583,10 @@ typedef double sljit_f64; #else /* Unknown compiler. 
*/ -/* The cdecl attribute is the default. */ -#define SLJIT_FUNC - -#endif - -#else /* Non x86-32 architectures. */ - +/* The cdecl calling convention is usually the x86 default. */ #define SLJIT_FUNC -#endif /* SLJIT_CONFIG_X86_32 */ +#endif /* SLJIT_USE_CDECL_CALLING_CONVENTION */ #endif /* !SLJIT_FUNC */ @@ -556,8 +617,16 @@ determine the next executed instruction after return. */ SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size); SLJIT_API_FUNC_ATTRIBUTE void sljit_free_exec(void* ptr); SLJIT_API_FUNC_ATTRIBUTE void sljit_free_unused_memory_exec(void); -#define SLJIT_MALLOC_EXEC(size) sljit_malloc_exec(size) -#define SLJIT_FREE_EXEC(ptr) sljit_free_exec(ptr) +#define SLJIT_BUILTIN_MALLOC_EXEC(size, exec_allocator_data) sljit_malloc_exec(size) +#define SLJIT_BUILTIN_FREE_EXEC(ptr, exec_allocator_data) sljit_free_exec(ptr) + +#ifndef SLJIT_MALLOC_EXEC +#define SLJIT_MALLOC_EXEC(size, exec_allocator_data) SLJIT_BUILTIN_MALLOC_EXEC((size), (exec_allocator_data)) +#endif /* SLJIT_MALLOC_EXEC */ + +#ifndef SLJIT_FREE_EXEC +#define SLJIT_FREE_EXEC(ptr, exec_allocator_data) SLJIT_BUILTIN_FREE_EXEC((ptr), (exec_allocator_data)) +#endif /* SLJIT_FREE_EXEC */ #if (defined SLJIT_PROT_EXECUTABLE_ALLOCATOR && SLJIT_PROT_EXECUTABLE_ALLOCATOR) SLJIT_API_FUNC_ATTRIBUTE sljit_sw sljit_exec_offset(void* ptr); @@ -566,7 +635,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_sw sljit_exec_offset(void* ptr); #define SLJIT_EXEC_OFFSET(ptr) 0 #endif -#endif +#endif /* SLJIT_EXECUTABLE_ALLOCATOR */ /**********************************************/ /* Registers and locals offset determination. 
*/ @@ -642,11 +711,32 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_sw sljit_exec_offset(void* ptr); #define SLJIT_LOCALS_OFFSET_BASE ((16 + 1 + 6 + 2 + 1) * sizeof(sljit_sw)) #endif -#elif (defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) +#elif (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) -#define SLJIT_NUMBER_OF_REGISTERS 10 -#define SLJIT_NUMBER_OF_SAVED_REGISTERS 5 -#define SLJIT_LOCALS_OFFSET_BASE 0 +/* + * https://refspecs.linuxbase.org/ELF/zSeries/lzsabi0_zSeries.html#STACKFRAME + * + * 160 + * .. FR6 + * .. FR4 + * .. FR2 + * 128 FR0 + * 120 R15 (used for SP) + * 112 R14 + * 104 R13 + * 96 R12 + * .. + * 48 R6 + * .. + * 16 R2 + * 8 RESERVED + * 0 SP + */ +#define SLJIT_S390X_DEFAULT_STACK_FRAME_SIZE 160 + +#define SLJIT_NUMBER_OF_REGISTERS 12 +#define SLJIT_NUMBER_OF_SAVED_REGISTERS 8 +#define SLJIT_LOCALS_OFFSET_BASE SLJIT_S390X_DEFAULT_STACK_FRAME_SIZE #elif (defined SLJIT_CONFIG_UNSUPPORTED && SLJIT_CONFIG_UNSUPPORTED) @@ -675,24 +765,16 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_sw sljit_exec_offset(void* ptr); /* Debug and verbose related macros. */ /*************************************/ -#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) -#include -#endif - #if (defined SLJIT_DEBUG && SLJIT_DEBUG) #if !defined(SLJIT_ASSERT) || !defined(SLJIT_UNREACHABLE) /* SLJIT_HALT_PROCESS must halt the process. */ #ifndef SLJIT_HALT_PROCESS -#include - #define SLJIT_HALT_PROCESS() \ abort(); #endif /* !SLJIT_HALT_PROCESS */ -#include - #endif /* !SLJIT_ASSERT || !SLJIT_UNREACHABLE */ /* Feel free to redefine these two macros. 
*/ @@ -738,4 +820,8 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_sw sljit_exec_offset(void* ptr); #endif /* !SLJIT_COMPILE_ASSERT */ +#ifdef __cplusplus +} /* extern "C" */ #endif + +#endif /* SLJIT_CONFIG_INTERNAL_H_ */ diff --git a/src/pcre/sljit/sljitExecAllocator.c b/src/pcre2/src/sljit/sljitExecAllocator.c similarity index 78% rename from src/pcre/sljit/sljitExecAllocator.c rename to src/pcre2/src/sljit/sljitExecAllocator.c index 3b37a975..6e5bf78e 100644 --- a/src/pcre/sljit/sljitExecAllocator.c +++ b/src/pcre2/src/sljit/sljitExecAllocator.c @@ -72,14 +72,14 @@ alloc_chunk / free_chunk : * allocate executable system memory chunks * the size is always divisible by CHUNK_SIZE - allocator_grab_lock / allocator_release_lock : - * make the allocator thread safe - * can be empty if the OS (or the application) does not support threading + SLJIT_ALLOCATOR_LOCK / SLJIT_ALLOCATOR_UNLOCK : + * provided as part of sljitUtils * only the allocator requires this lock, sljit is fully thread safe as it only uses local variables */ #ifdef _WIN32 +#define SLJIT_UPDATE_WX_FLAGS(from, to, enable_exec) static SLJIT_INLINE void* alloc_chunk(sljit_uw size) { @@ -92,70 +92,109 @@ static SLJIT_INLINE void free_chunk(void *chunk, sljit_uw size) VirtualFree(chunk, 0, MEM_RELEASE); } -#else +#else /* POSIX */ -#ifdef __APPLE__ -/* Configures TARGET_OS_OSX when appropriate */ +#if defined(__APPLE__) && defined(MAP_JIT) +/* + On macOS systems, returns MAP_JIT if it is defined _and_ we're running on a + version where it's OK to have more than one JIT block or where MAP_JIT is + required. + On non-macOS systems, returns MAP_JIT if it is defined. 
+*/ #include - -#if TARGET_OS_OSX && defined(MAP_JIT) +#if TARGET_OS_OSX +#if defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86 +#ifdef MAP_ANON #include -#endif /* TARGET_OS_OSX && MAP_JIT */ +#include -#ifdef MAP_JIT +#define SLJIT_MAP_JIT (get_map_jit_flag()) static SLJIT_INLINE int get_map_jit_flag() { -#if TARGET_OS_OSX - /* On macOS systems, returns MAP_JIT if it is defined _and_ we're running on a version - of macOS where it's OK to have more than one JIT block. On non-macOS systems, returns - MAP_JIT if it is defined. */ + sljit_sw page_size; + void *ptr; + struct utsname name; static int map_jit_flag = -1; - /* The following code is thread safe because multiple initialization - sets map_jit_flag to the same value and the code has no side-effects. - Changing the kernel version witout system restart is (very) unlikely. */ - if (map_jit_flag == -1) { - struct utsname name; - + if (map_jit_flag < 0) { + map_jit_flag = 0; uname(&name); - /* Kernel version for 10.14.0 (Mojave) */ - map_jit_flag = (atoi(name.release) >= 18) ? 
MAP_JIT : 0; + /* Kernel version for 10.14.0 (Mojave) or later */ + if (atoi(name.release) >= 18) { + page_size = get_page_alignment() + 1; + /* Only use MAP_JIT if a hardened runtime is used */ + ptr = mmap(NULL, page_size, PROT_WRITE | PROT_EXEC, + MAP_PRIVATE | MAP_ANON, -1, 0); + + if (ptr != MAP_FAILED) + munmap(ptr, page_size); + else + map_jit_flag = MAP_JIT; + } } - return map_jit_flag; -#else /* !TARGET_OS_OSX */ - return MAP_JIT; -#endif /* TARGET_OS_OSX */ } +#endif /* MAP_ANON */ +#else /* !SLJIT_CONFIG_X86 */ +#if !(defined SLJIT_CONFIG_ARM && SLJIT_CONFIG_ARM) +#error Unsupported architecture +#endif /* SLJIT_CONFIG_ARM */ +#include -#endif /* MAP_JIT */ +#define SLJIT_MAP_JIT (MAP_JIT) +#define SLJIT_UPDATE_WX_FLAGS(from, to, enable_exec) \ + apple_update_wx_flags(enable_exec) -#endif /* __APPLE__ */ +static SLJIT_INLINE void apple_update_wx_flags(sljit_s32 enable_exec) +{ + pthread_jit_write_protect_np(enable_exec); +} +#endif /* SLJIT_CONFIG_X86 */ +#else /* !TARGET_OS_OSX */ +#define SLJIT_MAP_JIT (MAP_JIT) +#endif /* TARGET_OS_OSX */ +#endif /* __APPLE__ && MAP_JIT */ +#ifndef SLJIT_UPDATE_WX_FLAGS +#define SLJIT_UPDATE_WX_FLAGS(from, to, enable_exec) +#endif /* !SLJIT_UPDATE_WX_FLAGS */ +#ifndef SLJIT_MAP_JIT +#define SLJIT_MAP_JIT (0) +#endif /* !SLJIT_MAP_JIT */ static SLJIT_INLINE void* alloc_chunk(sljit_uw size) { void *retval; + int prot = PROT_READ | PROT_WRITE | PROT_EXEC; + int flags = MAP_PRIVATE; + int fd = -1; + +#ifdef PROT_MAX + prot |= PROT_MAX(prot); +#endif #ifdef MAP_ANON + flags |= MAP_ANON | SLJIT_MAP_JIT; +#else /* !MAP_ANON */ + if (SLJIT_UNLIKELY((dev_zero < 0) && open_dev_zero())) + return NULL; - int flags = MAP_PRIVATE | MAP_ANON; + fd = dev_zero; +#endif /* MAP_ANON */ -#ifdef MAP_JIT - flags |= get_map_jit_flag(); -#endif + retval = mmap(NULL, size, prot, flags, fd, 0); + if (retval == MAP_FAILED) + return NULL; - retval = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC, flags, -1, 0); -#else /* !MAP_ANON */ - if 
(dev_zero < 0) { - if (open_dev_zero()) - return NULL; + if (mprotect(retval, size, PROT_READ | PROT_WRITE | PROT_EXEC) < 0) { + munmap(retval, size); + return NULL; } - retval = mmap(NULL, size, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE, dev_zero, 0); -#endif /* MAP_ANON */ - return (retval != MAP_FAILED) ? retval : NULL; + SLJIT_UPDATE_WX_FLAGS(retval, (uint8_t *)retval + size, 0); + + return retval; } static SLJIT_INLINE void free_chunk(void *chunk, sljit_uw size) @@ -163,7 +202,7 @@ static SLJIT_INLINE void free_chunk(void *chunk, sljit_uw size) munmap(chunk, size); } -#endif +#endif /* windows */ /* --------------------------------------------------------------------- */ /* Common functions */ @@ -226,7 +265,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) struct free_block *free_block; sljit_uw chunk_size; - allocator_grab_lock(); + SLJIT_ALLOCATOR_LOCK(); if (size < (64 - sizeof(struct block_header))) size = (64 - sizeof(struct block_header)); size = ALIGN_SIZE(size); @@ -235,6 +274,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) while (free_block) { if (free_block->size >= size) { chunk_size = free_block->size; + SLJIT_UPDATE_WX_FLAGS(NULL, NULL, 0); if (chunk_size > size + 64) { /* We just cut a block from the end of the free block. 
*/ chunk_size -= size; @@ -250,7 +290,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) } allocated_size += size; header->size = size; - allocator_release_lock(); + SLJIT_ALLOCATOR_UNLOCK(); return MEM_START(header); } free_block = free_block->next; @@ -259,7 +299,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) chunk_size = (size + sizeof(struct block_header) + CHUNK_SIZE - 1) & CHUNK_MASK; header = (struct block_header*)alloc_chunk(chunk_size); if (!header) { - allocator_release_lock(); + SLJIT_ALLOCATOR_UNLOCK(); return NULL; } @@ -286,7 +326,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) } next_header->size = 1; next_header->prev_size = chunk_size; - allocator_release_lock(); + SLJIT_ALLOCATOR_UNLOCK(); return MEM_START(header); } @@ -295,11 +335,12 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_free_exec(void* ptr) struct block_header *header; struct free_block* free_block; - allocator_grab_lock(); + SLJIT_ALLOCATOR_LOCK(); header = AS_BLOCK_HEADER(ptr, -(sljit_sw)sizeof(struct block_header)); allocated_size -= header->size; /* Connecting free blocks together if possible. */ + SLJIT_UPDATE_WX_FLAGS(NULL, NULL, 0); /* If header->prev_size == 0, free_block will equal to header. In this case, free_block->header.size will be > 0. 
*/ @@ -332,7 +373,8 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_free_exec(void* ptr) } } - allocator_release_lock(); + SLJIT_UPDATE_WX_FLAGS(NULL, NULL, 1); + SLJIT_ALLOCATOR_UNLOCK(); } SLJIT_API_FUNC_ATTRIBUTE void sljit_free_unused_memory_exec(void) @@ -340,7 +382,8 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_free_unused_memory_exec(void) struct free_block* free_block; struct free_block* next_free_block; - allocator_grab_lock(); + SLJIT_ALLOCATOR_LOCK(); + SLJIT_UPDATE_WX_FLAGS(NULL, NULL, 0); free_block = free_blocks; while (free_block) { @@ -355,5 +398,6 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_free_unused_memory_exec(void) } SLJIT_ASSERT((total_size && free_blocks) || (!total_size && !free_blocks)); - allocator_release_lock(); + SLJIT_UPDATE_WX_FLAGS(NULL, NULL, 1); + SLJIT_ALLOCATOR_UNLOCK(); } diff --git a/src/pcre/sljit/sljitLir.c b/src/pcre2/src/sljit/sljitLir.c similarity index 93% rename from src/pcre/sljit/sljitLir.c rename to src/pcre2/src/sljit/sljitLir.c index 5bdddc10..d817c90b 100644 --- a/src/pcre/sljit/sljitLir.c +++ b/src/pcre2/src/sljit/sljitLir.c @@ -28,7 +28,6 @@ #ifdef _WIN32 -/* For SLJIT_CACHE_FLUSH, which can expand to FlushInstructionCache. 
*/ #include #endif /* _WIN32 */ @@ -144,6 +143,7 @@ #if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) # define PATCH_MD 0x10 #endif +# define TYPE_SHIFT 13 #endif #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) || (defined SLJIT_CONFIG_ARM_V7 && SLJIT_CONFIG_ARM_V7) @@ -201,15 +201,16 @@ # define IS_CALL 0x010 # define IS_BIT26_COND 0x020 # define IS_BIT16_COND 0x040 +# define IS_BIT23_COND 0x080 -# define IS_COND (IS_BIT26_COND | IS_BIT16_COND) +# define IS_COND (IS_BIT26_COND | IS_BIT16_COND | IS_BIT23_COND) -# define PATCH_B 0x080 -# define PATCH_J 0x100 +# define PATCH_B 0x100 +# define PATCH_J 0x200 #if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) -# define PATCH_ABS32 0x200 -# define PATCH_ABS48 0x400 +# define PATCH_ABS32 0x400 +# define PATCH_ABS48 0x800 #endif /* instruction types */ @@ -221,14 +222,6 @@ # define FCSR_FCC 33 #endif -#if (defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) -# define IS_JAL 0x04 -# define IS_COND 0x08 - -# define PATCH_B 0x10 -# define PATCH_J 0x20 -#endif - #if (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) # define IS_MOVABLE 0x04 # define IS_COND 0x08 @@ -272,6 +265,8 @@ #if (defined SLJIT_PROT_EXECUTABLE_ALLOCATOR && SLJIT_PROT_EXECUTABLE_ALLOCATOR) #include "sljitProtExecAllocator.c" +#elif (defined SLJIT_WX_EXECUTABLE_ALLOCATOR && SLJIT_WX_EXECUTABLE_ALLOCATOR) +#include "sljitWXExecAllocator.c" #else #include "sljitExecAllocator.c" #endif @@ -284,6 +279,10 @@ #define SLJIT_ADD_EXEC_OFFSET(ptr, exec_offset) ((sljit_u8 *)(ptr)) #endif +#ifndef SLJIT_UPDATE_WX_FLAGS +#define SLJIT_UPDATE_WX_FLAGS(from, to, enable_exec) +#endif + /* Argument checking features. 
*/ #if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) @@ -364,7 +363,7 @@ static sljit_s32 compiler_initialized = 0; static void init_compiler(void); #endif -SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allocator_data) +SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allocator_data, void *exec_allocator_data) { struct sljit_compiler *compiler = (struct sljit_compiler*)SLJIT_MALLOC(sizeof(struct sljit_compiler), allocator_data); if (!compiler) @@ -391,6 +390,7 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allo compiler->error = SLJIT_SUCCESS; compiler->allocator_data = allocator_data; + compiler->exec_allocator_data = exec_allocator_data; compiler->buf = (struct sljit_memory_fragment*)SLJIT_MALLOC(BUF_SIZE, allocator_data); compiler->abuf = (struct sljit_memory_fragment*)SLJIT_MALLOC(ABUF_SIZE, allocator_data); @@ -483,22 +483,28 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_compiler_memory_error(struct sljit_compi } #if (defined SLJIT_CONFIG_ARM_THUMB2 && SLJIT_CONFIG_ARM_THUMB2) -SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code) +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code, void *exec_allocator_data) { + SLJIT_UNUSED_ARG(exec_allocator_data); + /* Remove thumb mode flag. */ - SLJIT_FREE_EXEC((void*)((sljit_uw)code & ~0x1)); + SLJIT_FREE_EXEC((void*)((sljit_uw)code & ~0x1), exec_allocator_data); } #elif (defined SLJIT_INDIRECT_CALL && SLJIT_INDIRECT_CALL) -SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code) +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code, void *exec_allocator_data) { + SLJIT_UNUSED_ARG(exec_allocator_data); + /* Resolve indirection. 
*/ code = (void*)(*(sljit_uw*)code); - SLJIT_FREE_EXEC(code); + SLJIT_FREE_EXEC(code, exec_allocator_data); } #else -SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code) +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code, void *exec_allocator_data) { - SLJIT_FREE_EXEC(code); + SLJIT_UNUSED_ARG(exec_allocator_data); + + SLJIT_FREE_EXEC(code, exec_allocator_data); } #endif @@ -520,6 +526,12 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_target(struct sljit_jump *jump, sljit_uw } } +SLJIT_API_FUNC_ATTRIBUTE void sljit_set_put_label(struct sljit_put_label *put_label, struct sljit_label *label) +{ + if (SLJIT_LIKELY(!!put_label)) + put_label->label = label; +} + SLJIT_API_FUNC_ATTRIBUTE void sljit_set_current_flags(struct sljit_compiler *compiler, sljit_s32 current_flags) { SLJIT_UNUSED_ARG(compiler); @@ -619,6 +631,33 @@ static SLJIT_INLINE sljit_s32 get_arg_count(sljit_s32 arg_types) return arg_count; } + +/* Only used in RISC architectures where the instruction size is constant */ +#if !(defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) \ + && !(defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) + +static SLJIT_INLINE sljit_uw compute_next_addr(struct sljit_label *label, struct sljit_jump *jump, + struct sljit_const *const_, struct sljit_put_label *put_label) +{ + sljit_uw result = ~(sljit_uw)0; + + if (label) + result = label->size; + + if (jump && jump->addr < result) + result = jump->addr; + + if (const_ && const_->addr < result) + result = const_->addr; + + if (put_label && put_label->addr < result) + result = put_label->addr; + + return result; +} + +#endif /* !SLJIT_CONFIG_X86 && !SLJIT_CONFIG_S390X */ + static SLJIT_INLINE void set_emit_enter(struct sljit_compiler *compiler, sljit_s32 options, sljit_s32 args, sljit_s32 scratches, sljit_s32 saveds, sljit_s32 fscratches, sljit_s32 fsaveds, sljit_s32 local_size) @@ -686,6 +725,19 @@ static SLJIT_INLINE void set_const(struct sljit_const *const_, struct sljit_comp compiler->last_const = const_; } +static 
SLJIT_INLINE void set_put_label(struct sljit_put_label *put_label, struct sljit_compiler *compiler, sljit_uw offset) +{ + put_label->next = NULL; + put_label->label = NULL; + put_label->addr = compiler->size - offset; + put_label->flags = 0; + if (compiler->last_put_label) + compiler->last_put_label->next = put_label; + else + compiler->put_labels = put_label; + compiler->last_put_label = put_label; +} + #define ADDRESSING_DEPENDS_ON(exp, reg) \ (((exp) & SLJIT_MEM) && (((exp) & REG_MASK) == reg || OFFS_REG(exp) == reg)) @@ -881,7 +933,8 @@ static void sljit_verbose_fparam(struct sljit_compiler *compiler, sljit_s32 p, s static const char* op0_names[] = { (char*)"breakpoint", (char*)"nop", (char*)"lmul.uw", (char*)"lmul.sw", - (char*)"divmod.u", (char*)"divmod.s", (char*)"div.u", (char*)"div.s" + (char*)"divmod.u", (char*)"divmod.s", (char*)"div.u", (char*)"div.s", + (char*)"endbr", (char*)"skip_frames_before_return" }; static const char* op1_names[] = { @@ -898,6 +951,12 @@ static const char* op2_names[] = { (char*)"shl", (char*)"lshr", (char*)"ashr", }; +static const char* op_src_names[] = { + (char*)"fast_return", (char*)"skip_frames_before_fast_return", + (char*)"prefetch_l1", (char*)"prefetch_l2", + (char*)"prefetch_l3", (char*)"prefetch_once", +}; + static const char* fop1_names[] = { (char*)"mov", (char*)"conv", (char*)"conv", (char*)"conv", (char*)"conv", (char*)"conv", (char*)"cmp", (char*)"neg", @@ -1107,37 +1166,21 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_fast_enter(struct sljit_c CHECK_RETURN_OK; } -static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ -#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) - FUNCTION_CHECK_SRC(src, srcw); - CHECK_ARGUMENT(src != SLJIT_IMM); - compiler->last_flags = 0; -#endif -#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) - if (SLJIT_UNLIKELY(!!compiler->verbose)) { - fprintf(compiler->verbose, " fast_return "); - 
sljit_verbose_param(compiler, src, srcw); - fprintf(compiler->verbose, "\n"); - } -#endif - CHECK_RETURN_OK; -} - static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_op0(struct sljit_compiler *compiler, sljit_s32 op) { #if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) CHECK_ARGUMENT((op >= SLJIT_BREAKPOINT && op <= SLJIT_LMUL_SW) - || ((op & ~SLJIT_I32_OP) >= SLJIT_DIVMOD_UW && (op & ~SLJIT_I32_OP) <= SLJIT_DIV_SW)); - CHECK_ARGUMENT(op < SLJIT_LMUL_UW || compiler->scratches >= 2); - if (op >= SLJIT_LMUL_UW) + || ((op & ~SLJIT_I32_OP) >= SLJIT_DIVMOD_UW && (op & ~SLJIT_I32_OP) <= SLJIT_DIV_SW) + || (op >= SLJIT_ENDBR && op <= SLJIT_SKIP_FRAMES_BEFORE_RETURN)); + CHECK_ARGUMENT(GET_OPCODE(op) < SLJIT_LMUL_UW || GET_OPCODE(op) >= SLJIT_ENDBR || compiler->scratches >= 2); + if ((GET_OPCODE(op) >= SLJIT_LMUL_UW && GET_OPCODE(op) <= SLJIT_DIV_SW) || op == SLJIT_SKIP_FRAMES_BEFORE_RETURN) compiler->last_flags = 0; #endif #if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) if (SLJIT_UNLIKELY(!!compiler->verbose)) { fprintf(compiler->verbose, " %s", op0_names[GET_OPCODE(op) - SLJIT_OP0_BASE]); - if (GET_OPCODE(op) >= SLJIT_DIVMOD_UW) { + if (GET_OPCODE(op) >= SLJIT_DIVMOD_UW && GET_OPCODE(op) <= SLJIT_DIV_SW) { fprintf(compiler->verbose, (op & SLJIT_I32_OP) ? 
"32" : "w"); } fprintf(compiler->verbose, "\n"); @@ -1179,7 +1222,7 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_op1(struct sljit_compiler break; } - FUNCTION_CHECK_DST(dst, dstw, 1); + FUNCTION_CHECK_DST(dst, dstw, HAS_FLAGS(op)); FUNCTION_CHECK_SRC(src, srcw); if (GET_OPCODE(op) >= SLJIT_NOT) { @@ -1259,7 +1302,7 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_op2(struct sljit_compiler break; } - FUNCTION_CHECK_DST(dst, dstw, 1); + FUNCTION_CHECK_DST(dst, dstw, HAS_FLAGS(op)); FUNCTION_CHECK_SRC(src1, src1w); FUNCTION_CHECK_SRC(src2, src2w); compiler->last_flags = GET_FLAG_TYPE(op) | (op & (SLJIT_I32_OP | SLJIT_SET_Z)); @@ -1280,6 +1323,33 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_op2(struct sljit_compiler CHECK_RETURN_OK; } +static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ +#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) + CHECK_ARGUMENT(op >= SLJIT_FAST_RETURN && op <= SLJIT_PREFETCH_ONCE); + FUNCTION_CHECK_SRC(src, srcw); + + if (op == SLJIT_FAST_RETURN || op == SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN) + { + CHECK_ARGUMENT(src != SLJIT_IMM); + compiler->last_flags = 0; + } + else if (op >= SLJIT_PREFETCH_L1 && op <= SLJIT_PREFETCH_ONCE) + { + CHECK_ARGUMENT(src & SLJIT_MEM); + } +#endif +#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) + if (SLJIT_UNLIKELY(!!compiler->verbose)) { + fprintf(compiler->verbose, " %s ", op_src_names[op - SLJIT_OP_SRC_BASE]); + sljit_verbose_param(compiler, src, srcw); + fprintf(compiler->verbose, "\n"); + } +#endif + CHECK_RETURN_OK; +} + static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_get_register_index(sljit_s32 reg) { SLJIT_UNUSED_ARG(reg); @@ -1315,6 +1385,8 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_op_custom(struct sljit_co #elif (defined SLJIT_CONFIG_ARM_THUMB2 && SLJIT_CONFIG_ARM_THUMB2) CHECK_ARGUMENT((size == 2 && (((sljit_sw)instruction) & 0x1) == 0) || (size == 4 
&& (((sljit_sw)instruction) & 0x3) == 0)); +#elif (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) + CHECK_ARGUMENT(size == 2 || size == 4 || size == 6); #else CHECK_ARGUMENT(size == 4 && (((sljit_sw)instruction) & 0x3) == 0); #endif @@ -1904,6 +1976,21 @@ static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_const(struct sljit_compil CHECK_RETURN_OK; } +static SLJIT_INLINE CHECK_RETURN_TYPE check_sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ +#if (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) + FUNCTION_CHECK_DST(dst, dstw, 0); +#endif +#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) + if (SLJIT_UNLIKELY(!!compiler->verbose)) { + fprintf(compiler->verbose, " put_label "); + sljit_verbose_param(compiler, dst, dstw); + fprintf(compiler->verbose, "\n"); + } +#endif + CHECK_RETURN_OK; +} + #endif /* SLJIT_ARGUMENT_CHECKS || SLJIT_VERBOSE */ #define SELECT_FOP1_OPERATION_WITH_CHECKS(compiler, op, dst, dstw, src, srcw) \ @@ -1956,7 +2043,7 @@ static SLJIT_INLINE sljit_s32 emit_mov_before_return(struct sljit_compiler *comp #if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) \ || (defined SLJIT_CONFIG_PPC && SLJIT_CONFIG_PPC) \ || (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) \ - || ((defined SLJIT_CONFIG_MIPS && SLJIT_CONFIG_MIPS) && !(defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1)) + || ((defined SLJIT_CONFIG_MIPS && SLJIT_CONFIG_MIPS) && !(defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1 && SLJIT_MIPS_REV < 6)) static SLJIT_INLINE sljit_s32 sljit_emit_cmov_generic(struct sljit_compiler *compiler, sljit_s32 type, sljit_s32 dst_reg, @@ -2033,8 +2120,8 @@ static SLJIT_INLINE sljit_s32 sljit_emit_cmov_generic(struct sljit_compiler *com # include "sljitNativeMIPS_common.c" #elif (defined SLJIT_CONFIG_SPARC && SLJIT_CONFIG_SPARC) # include "sljitNativeSPARC_common.c" -#elif (defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) -# include "sljitNativeTILEGX_64.c" +#elif (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) +# 
include "sljitNativeS390X.c" #endif #if !(defined SLJIT_CONFIG_MIPS && SLJIT_CONFIG_MIPS) @@ -2065,7 +2152,7 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump* sljit_emit_cmp(struct sljit_compiler #endif if (SLJIT_UNLIKELY((src1 & SLJIT_IMM) && !(src2 & SLJIT_IMM))) { - /* Immediate is prefered as second argument by most architectures. */ + /* Immediate is preferred as second argument by most architectures. */ switch (condition) { case SLJIT_LESS: condition = SLJIT_GREATER; @@ -2214,9 +2301,10 @@ SLJIT_API_FUNC_ATTRIBUTE const char* sljit_get_platform_name(void) return "unsupported"; } -SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allocator_data) +SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allocator_data, void *exec_allocator_data) { SLJIT_UNUSED_ARG(allocator_data); + SLJIT_UNUSED_ARG(exec_allocator_data); SLJIT_UNREACHABLE(); return NULL; } @@ -2264,9 +2352,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) return 0; } -SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code) +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code, void *exec_allocator_data) { SLJIT_UNUSED_ARG(code); + SLJIT_UNUSED_ARG(exec_allocator_data); SLJIT_UNREACHABLE(); } @@ -2321,15 +2410,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return SLJIT_ERR_UNSUPPORTED; } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - SLJIT_UNUSED_ARG(compiler); - SLJIT_UNUSED_ARG(src); - SLJIT_UNUSED_ARG(srcw); - SLJIT_UNREACHABLE(); - return SLJIT_ERR_UNSUPPORTED; -} - SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compiler, sljit_s32 op) { SLJIT_UNUSED_ARG(compiler); @@ -2369,6 +2449,17 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return SLJIT_ERR_UNSUPPORTED; } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct 
sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + SLJIT_UNUSED_ARG(compiler); + SLJIT_UNUSED_ARG(op); + SLJIT_UNUSED_ARG(src); + SLJIT_UNUSED_ARG(srcw); + SLJIT_UNREACHABLE(); + return SLJIT_ERR_UNSUPPORTED; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { SLJIT_UNREACHABLE(); @@ -2489,6 +2580,13 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_target(struct sljit_jump *jump, sljit_uw SLJIT_UNREACHABLE(); } +SLJIT_API_FUNC_ATTRIBUTE void sljit_set_put_label(struct sljit_put_label *put_label, struct sljit_label *label) +{ + SLJIT_UNUSED_ARG(put_label); + SLJIT_UNUSED_ARG(label); + SLJIT_UNREACHABLE(); +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_ijump(struct sljit_compiler *compiler, sljit_s32 type, sljit_s32 src, sljit_sw srcw) { SLJIT_UNUSED_ARG(compiler); @@ -2580,6 +2678,14 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi return NULL; } +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + SLJIT_UNUSED_ARG(compiler); + SLJIT_UNUSED_ARG(dst); + SLJIT_UNUSED_ARG(dstw); + return NULL; +} + SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { SLJIT_UNUSED_ARG(addr); @@ -2596,4 +2702,4 @@ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_consta SLJIT_UNREACHABLE(); } -#endif +#endif /* !SLJIT_CONFIG_UNSUPPORTED */ diff --git a/src/pcre/sljit/sljitLir.h b/src/pcre2/src/sljit/sljitLir.h similarity index 92% rename from src/pcre/sljit/sljitLir.h rename to src/pcre2/src/sljit/sljitLir.h index e71890cf..93d28046 100644 --- a/src/pcre/sljit/sljitLir.h +++ b/src/pcre2/src/sljit/sljitLir.h @@ -24,8 +24,8 @@ * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ -#ifndef _SLJIT_LIR_H_ -#define _SLJIT_LIR_H_ +#ifndef SLJIT_LIR_H_ +#define SLJIT_LIR_H_ /* ------------------------------------------------------------------------ @@ -70,9 +70,11 @@ - pass --smc-check=all argument to valgrind, since JIT is a "self-modifying code" */ -#if !(defined SLJIT_NO_DEFAULT_CONFIG && SLJIT_NO_DEFAULT_CONFIG) +#if (defined SLJIT_HAVE_CONFIG_PRE && SLJIT_HAVE_CONFIG_PRE) +#include "sljitConfigPre.h" +#endif /* SLJIT_HAVE_CONFIG_PRE */ + #include "sljitConfig.h" -#endif /* The following header file defines useful macros for fine tuning sljit based code generators. They are listed in the beginning @@ -80,6 +82,14 @@ of sljitConfigInternal.h */ #include "sljitConfigInternal.h" +#if (defined SLJIT_HAVE_CONFIG_POST && SLJIT_HAVE_CONFIG_POST) +#include "sljitConfigPost.h" +#endif /* SLJIT_HAVE_CONFIG_POST */ + +#ifdef __cplusplus +extern "C" { +#endif + /* --------------------------------------------------------------------- */ /* Error codes */ /* --------------------------------------------------------------------- */ @@ -154,10 +164,10 @@ of sljitConfigInternal.h */ */ /* When SLJIT_UNUSED is specified as the destination of sljit_emit_op1 - or sljit_emit_op2 operations the result is discarded. If no status - flags are set, no instructions are emitted for these operations. Data - prefetch is a special exception, see SLJIT_MOV operation. Other SLJIT - operations do not support SLJIT_UNUSED as a destination operand. */ + or sljit_emit_op2 operations the result is discarded. Some status + flags must be set when the destination is SLJIT_UNUSED, because the + operation would have no effect otherwise. Other SLJIT operations do + not support SLJIT_UNUSED as a destination operand. */ #define SLJIT_UNUSED 0 /* Scratch registers. 
*/ @@ -348,13 +358,20 @@ struct sljit_label { struct sljit_jump { struct sljit_jump *next; sljit_uw addr; - sljit_sw flags; + sljit_uw flags; union { sljit_uw target; - struct sljit_label* label; + struct sljit_label *label; } u; }; +struct sljit_put_label { + struct sljit_put_label *next; + struct sljit_label *label; + sljit_uw addr; + sljit_uw flags; +}; + struct sljit_const { struct sljit_const *next; sljit_uw addr; @@ -366,12 +383,15 @@ struct sljit_compiler { struct sljit_label *labels; struct sljit_jump *jumps; + struct sljit_put_label *put_labels; struct sljit_const *consts; struct sljit_label *last_label; struct sljit_jump *last_jump; struct sljit_const *last_const; + struct sljit_put_label *last_put_label; void *allocator_data; + void *exec_allocator_data; struct sljit_memory_fragment *buf; struct sljit_memory_fragment *abuf; @@ -438,9 +458,9 @@ struct sljit_compiler { sljit_sw cache_argw; #endif -#if (defined SLJIT_CONFIG_TILEGX && SLJIT_CONFIG_TILEGX) - sljit_s32 cache_arg; - sljit_sw cache_argw; +#if (defined SLJIT_CONFIG_S390X && SLJIT_CONFIG_S390X) + /* Need to allocate register save area to make calls. */ + sljit_s32 have_save_area; #endif #if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) @@ -472,10 +492,12 @@ struct sljit_compiler { custom memory managers. This pointer is passed to SLJIT_MALLOC and SLJIT_FREE macros. Most allocators (including the default one) ignores this value, and it is recommended to pass NULL - as a dummy value for allocator_data. + as a dummy value for allocator_data. The exec_allocator_data + has the same purpose but this one is passed to SLJIT_MALLOC_EXEC / + SLJIT_MALLOC_FREE functions. Returns NULL if failed. */ -SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allocator_data); +SLJIT_API_FUNC_ATTRIBUTE struct sljit_compiler* sljit_create_compiler(void *allocator_data, void *exec_allocator_data); /* Frees everything except the compiled machine code. 
*/ SLJIT_API_FUNC_ATTRIBUTE void sljit_free_compiler(struct sljit_compiler *compiler); @@ -522,7 +544,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil /* Free executable code. */ -SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code); +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_code(void* code, void *exec_allocator_data); /* When the protected executable allocator is used the JIT code is mapped @@ -558,10 +580,14 @@ static SLJIT_INLINE sljit_uw sljit_get_generated_code_size(struct sljit_compiler #define SLJIT_HAS_FPU 0 /* [Limitation] Some registers are virtual registers. */ #define SLJIT_HAS_VIRTUAL_REGISTERS 1 +/* [Emulated] Has zero register (setting a memory location to zero is efficient). */ +#define SLJIT_HAS_ZERO_REGISTER 2 /* [Emulated] Count leading zero is supported. */ -#define SLJIT_HAS_CLZ 2 +#define SLJIT_HAS_CLZ 3 +/* [Emulated] Conditional move is supported. */ +#define SLJIT_HAS_CMOV 4 /* [Emulated] Conditional move is supported. */ -#define SLJIT_HAS_CMOV 3 +#define SLJIT_HAS_PREFETCH 5 #if (defined SLJIT_CONFIG_X86 && SLJIT_CONFIG_X86) /* [Not emulated] SSE2 support is available on x86. */ @@ -649,10 +675,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_return(struct sljit_compiler *comp sljit_s32 src, sljit_sw srcw); /* Generating entry and exit points for fast call functions (see SLJIT_FAST_CALL). - Both sljit_emit_fast_enter and sljit_emit_fast_return functions preserve the + Both sljit_emit_fast_enter and SLJIT_FAST_RETURN operations preserve the values of all registers and stack frame. The return address is stored in the dst argument of sljit_emit_fast_enter, and this return address can be passed - to sljit_emit_fast_return to continue the execution after the fast call. + to SLJIT_FAST_RETURN to continue the execution after the fast call. Fast calls are cheap operations (usually only a single call instruction is emitted) but they do not preserve any registers. 
However the callee function @@ -660,16 +686,15 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_return(struct sljit_compiler *comp efficiently exploited by various optimizations. Registers can be saved manually by the callee function if needed. - Although returning to different address by sljit_emit_fast_return is possible, + Although returning to different address by SLJIT_FAST_RETURN is possible, this address usually cannot be predicted by the return address predictor of - modern CPUs which may reduce performance. Furthermore using sljit_emit_ijump - to return is also inefficient since return address prediction is usually - triggered by a specific form of ijump. + modern CPUs which may reduce performance. Furthermore certain security + enhancement technologies such as Intel Control-flow Enforcement Technology + (CET) may disallow returning to a different address. Flags: - (does not modify flags). */ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw); -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw); /* Source and destination operands for arithmetical instructions @@ -683,7 +708,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler */ /* - IMPORATNT NOTE: memory access MUST be naturally aligned except + IMPORTANT NOTE: memory access MUST be naturally aligned unless SLJIT_UNALIGNED macro is defined and its value is 1. length | alignment @@ -725,6 +750,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler mips: [reg+imm], -65536 <= imm <= 65535 sparc: [reg+imm], -4096 <= imm <= 4095 [reg+reg] is supported + s390x: [reg+imm], -2^19 <= imm < 2^19 + [reg+reg] is supported + Write-back is not supported */ /* Macros for specifying operand types. 
*/ @@ -878,6 +906,14 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler the behaviour is undefined. */ #define SLJIT_DIV_SW (SLJIT_OP0_BASE + 7) #define SLJIT_DIV_S32 (SLJIT_DIV_SW | SLJIT_I32_OP) +/* Flags: - (does not modify flags) + ENDBR32 instruction for x86-32 and ENDBR64 instruction for x86-64 + when Intel Control-flow Enforcement Technology (CET) is enabled. + No instruction for other architectures. */ +#define SLJIT_ENDBR (SLJIT_OP0_BASE + 8) +/* Flags: - (may destroy flags) + Skip stack frames before return. */ +#define SLJIT_SKIP_FRAMES_BEFORE_RETURN (SLJIT_OP0_BASE + 9) SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compiler, sljit_s32 op); @@ -895,15 +931,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile U32 - unsigned int (32 bit) data transfer S32 - signed int (32 bit) data transfer P - pointer (sljit_p) data transfer - - If the destination of a MOV instruction is SLJIT_UNUSED and the source - operand is a memory address the compiler emits a prefetch instruction - if this instruction is supported by the current CPU. Higher data sizes - bring the data closer to the core: a MOV with word size loads the data - into a higher level cache than a byte size. Otherwise the type does not - affect the prefetch instruction. Furthermore a prefetch instruction - never fails, so it can be used to prefetch a data from an address and - check whether that address is NULL afterwards. */ /* Flags: - (does not modify flags) */ @@ -1008,8 +1035,46 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile sljit_s32 src1, sljit_sw src1w, sljit_s32 src2, sljit_sw src2w); +/* Starting index of opcodes for sljit_emit_op2. */ +#define SLJIT_OP_SRC_BASE 128 + +/* Note: src cannot be an immedate value + Flags: - (does not modify flags) */ +#define SLJIT_FAST_RETURN (SLJIT_OP_SRC_BASE + 0) +/* Skip stack frames before fast return. 
+ Note: src cannot be an immedate value + Flags: may destroy flags. */ +#define SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN (SLJIT_OP_SRC_BASE + 1) +/* Prefetch value into the level 1 data cache + Note: if the target CPU does not support data prefetch, + no instructions are emitted. + Note: this instruction never fails, even if the memory address is invalid. + Flags: - (does not modify flags) */ +#define SLJIT_PREFETCH_L1 (SLJIT_OP_SRC_BASE + 2) +/* Prefetch value into the level 2 data cache + Note: same as SLJIT_PREFETCH_L1 if the target CPU + does not support this instruction form. + Note: this instruction never fails, even if the memory address is invalid. + Flags: - (does not modify flags) */ +#define SLJIT_PREFETCH_L2 (SLJIT_OP_SRC_BASE + 3) +/* Prefetch value into the level 3 data cache + Note: same as SLJIT_PREFETCH_L2 if the target CPU + does not support this instruction form. + Note: this instruction never fails, even if the memory address is invalid. + Flags: - (does not modify flags) */ +#define SLJIT_PREFETCH_L3 (SLJIT_OP_SRC_BASE + 4) +/* Prefetch a value which is only used once (and can be discarded afterwards) + Note: same as SLJIT_PREFETCH_L1 if the target CPU + does not support this instruction form. + Note: this instruction never fails, even if the memory address is invalid. + Flags: - (does not modify flags) */ +#define SLJIT_PREFETCH_ONCE (SLJIT_OP_SRC_BASE + 5) + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw); + /* Starting index of opcodes for sljit_emit_fop1. */ -#define SLJIT_FOP1_BASE 128 +#define SLJIT_FOP1_BASE 160 /* Flags: - (does not modify flags) */ #define SLJIT_MOV_F64 (SLJIT_FOP1_BASE + 0) @@ -1048,7 +1113,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fop1(struct sljit_compiler *compil sljit_s32 src, sljit_sw srcw); /* Starting index of opcodes for sljit_emit_fop2. 
*/ -#define SLJIT_FOP2_BASE 160 +#define SLJIT_FOP2_BASE 192 /* Flags: - (does not modify flags) */ #define SLJIT_ADD_F64 (SLJIT_FOP2_BASE + 0) @@ -1152,7 +1217,7 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_label* sljit_emit_label(struct sljit_compi /* Unconditional jump types. */ #define SLJIT_JUMP 24 - /* Fast calling method. See sljit_emit_fast_enter / sljit_emit_fast_return. */ + /* Fast calling method. See sljit_emit_fast_enter / SLJIT_FAST_RETURN. */ #define SLJIT_FAST_CALL 25 /* Called function must be declared with the SLJIT_FUNC attribute. */ #define SLJIT_CALL 26 @@ -1314,10 +1379,17 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fmem(struct sljit_compiler *compil Flags: - (may destroy flags) */ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_local_base(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw offset); -/* The constant can be changed runtime (see: sljit_set_const) +/* Store a value that can be changed runtime (see: sljit_get_const_addr / sljit_set_const) Flags: - (does not modify flags) */ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value); +/* Store the value of a label (see: sljit_set_put_label) + Flags: - (does not modify flags) */ +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw); + +/* Set the value stored by put_label to this label. */ +SLJIT_API_FUNC_ATTRIBUTE void sljit_set_put_label(struct sljit_put_label *put_label, struct sljit_label *label); + /* After the code generation the address for label, jump and const instructions are computed. Since these structures are freed by sljit_free_compiler, the addresses must be preserved by the user program elsewere. */ @@ -1345,12 +1417,6 @@ SLJIT_API_FUNC_ATTRIBUTE const char* sljit_get_platform_name(void); /* Portable helper function to get an offset of a member. 
*/ #define SLJIT_OFFSETOF(base, member) ((sljit_sw)(&((base*)0x10)->member) - 0x10) -#if (defined SLJIT_UTIL_GLOBAL_LOCK && SLJIT_UTIL_GLOBAL_LOCK) -/* This global lock is useful to compile common functions. */ -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_grab_lock(void); -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_release_lock(void); -#endif - #if (defined SLJIT_UTIL_STACK && SLJIT_UTIL_STACK) /* The sljit_stack structure and its manipulation functions provides @@ -1474,4 +1540,8 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_custom(struct sljit_compiler *c SLJIT_API_FUNC_ATTRIBUTE void sljit_set_current_flags(struct sljit_compiler *compiler, sljit_s32 current_flags); -#endif /* _SLJIT_LIR_H_ */ +#ifdef __cplusplus +} /* extern "C" */ +#endif + +#endif /* SLJIT_LIR_H_ */ diff --git a/src/pcre/sljit/sljitNativeARM_32.c b/src/pcre2/src/sljit/sljitNativeARM_32.c similarity index 92% rename from src/pcre/sljit/sljitNativeARM_32.c rename to src/pcre2/src/sljit/sljitNativeARM_32.c index 6d61eed9..ae8479f0 100644 --- a/src/pcre/sljit/sljitNativeARM_32.c +++ b/src/pcre2/src/sljit/sljitNativeARM_32.c @@ -467,18 +467,28 @@ static SLJIT_INLINE void inline_set_jump_addr(sljit_uw jump_ptr, sljit_sw execut sljit_s32 bl = (mov_pc & 0x0000f000) != RD(TMP_PC); sljit_sw diff = (sljit_sw)(((sljit_sw)new_addr - (sljit_sw)(inst + 2) - executable_offset) >> 2); + SLJIT_UNUSED_ARG(executable_offset); + if (diff <= 0x7fffff && diff >= -0x800000) { /* Turn to branch. 
*/ if (!bl) { + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 0); + } inst[0] = (mov_pc & COND_MASK) | (B - CONDITIONAL) | (diff & 0xffffff); if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 1); } } else { + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 0); + } inst[0] = (mov_pc & COND_MASK) | (BL - CONDITIONAL) | (diff & 0xffffff); inst[1] = NOP; if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 2); } @@ -491,28 +501,52 @@ static SLJIT_INLINE void inline_set_jump_addr(sljit_uw jump_ptr, sljit_sw execut ptr = inst + 1; if (*inst != mov_pc) { + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + (!bl ? 1 : 2), 0); + } inst[0] = mov_pc; if (!bl) { if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 1); } } else { inst[1] = BLX | RM(TMP_REG1); if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 2); } } } + + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(ptr, ptr + 1, 0); + } + *ptr = new_addr; + + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(ptr, ptr + 1, 1); + } } #else sljit_uw *inst = (sljit_uw*)jump_ptr; + + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_ASSERT((inst[0] & 0xfff00000) == MOVW && (inst[1] & 0xfff00000) == MOVT); + + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 0); + } + inst[0] = MOVW | (inst[0] & 0xf000) | ((new_addr << 4) & 0xf0000) | (new_addr & 0xfff); inst[1] = MOVT | (inst[1] & 0xf000) | ((new_addr >> 12) & 0xf0000) | ((new_addr >> 16) & 0xfff); + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); 
SLJIT_CACHE_FLUSH(inst, inst + 2); } @@ -529,10 +563,18 @@ static SLJIT_INLINE void inline_set_const(sljit_uw addr, sljit_sw executable_off sljit_uw ldr_literal = ptr[1]; sljit_uw src2; + SLJIT_UNUSED_ARG(executable_offset); + src2 = get_imm(new_constant); if (src2) { + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 0); + } + *inst = 0xe3a00000 | (ldr_literal & 0xf000) | src2; + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 1); } @@ -541,8 +583,14 @@ static SLJIT_INLINE void inline_set_const(sljit_uw addr, sljit_sw executable_off src2 = get_imm(~new_constant); if (src2) { + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 0); + } + *inst = 0xe3e00000 | (ldr_literal & 0xf000) | src2; + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 1); } @@ -555,19 +603,44 @@ static SLJIT_INLINE void inline_set_const(sljit_uw addr, sljit_sw executable_off ptr = inst + 1; if (*inst != ldr_literal) { + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 0); + } + *inst = ldr_literal; + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 1, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 1); } } + + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(ptr, ptr + 1, 0); + } + *ptr = new_constant; + + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(ptr, ptr + 1, 1); + } #else sljit_uw *inst = (sljit_uw*)addr; + + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_ASSERT((inst[0] & 0xfff00000) == MOVW && (inst[1] & 0xfff00000) == MOVT); + + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 0); + } + inst[0] = MOVW | (inst[0] & 0xf000) | ((new_constant << 4) & 0xf0000) | (new_constant & 0xfff); inst[1] = MOVT | (inst[1] & 0xf000) | ((new_constant >> 12) & 0xf0000) | ((new_constant >> 16) & 
0xfff); + if (flush_cache) { + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 2); } @@ -583,8 +656,9 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil sljit_uw *buf_end; sljit_uw size; sljit_uw word_count; + sljit_uw next_addr; sljit_sw executable_offset; - sljit_sw jump_addr; + sljit_sw addr; #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) sljit_uw cpool_size; sljit_uw cpool_skip_alignment; @@ -597,6 +671,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); @@ -610,7 +685,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil #else size = compiler->size; #endif - code = (sljit_uw*)SLJIT_MALLOC_EXEC(size * sizeof(sljit_uw)); + code = (sljit_uw*)SLJIT_MALLOC_EXEC(size * sizeof(sljit_uw), compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; @@ -625,11 +700,13 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil code_ptr = code; word_count = 0; + next_addr = 1; executable_offset = SLJIT_EXEC_OFFSET(code); label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; if (label && label->size == 0) { label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code, executable_offset); @@ -649,7 +726,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil } else { if (SLJIT_UNLIKELY(resolve_const_pool_index(compiler, &first_patch, cpool_current_index, cpool_start_address, buf_ptr))) { - SLJIT_FREE_EXEC(code); + SLJIT_FREE_EXEC(code, compiler->exec_allocator_data); compiler->error = SLJIT_ERR_ALLOC_FAILED; return NULL; } @@ -662,6 +739,8 @@ SLJIT_API_FUNC_ATTRIBUTE void* 
sljit_generate_code(struct sljit_compiler *compil label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); label->size = code_ptr - code; label = label->next; + + next_addr = compute_next_addr(label, jump, const_, put_label); } } } @@ -669,35 +748,45 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil else if ((*buf_ptr & 0xff000000) != PUSH_POOL) { #endif *code_ptr = *buf_ptr++; + if (next_addr == word_count) { + SLJIT_ASSERT(!label || label->size >= word_count); + SLJIT_ASSERT(!jump || jump->addr >= word_count); + SLJIT_ASSERT(!const_ || const_->addr >= word_count); + SLJIT_ASSERT(!put_label || put_label->addr >= word_count); + /* These structures are ordered by their address. */ - SLJIT_ASSERT(!label || label->size >= word_count); - SLJIT_ASSERT(!jump || jump->addr >= word_count); - SLJIT_ASSERT(!const_ || const_->addr >= word_count); - if (jump && jump->addr == word_count) { + if (jump && jump->addr == word_count) { #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) - if (detect_jump_type(jump, code_ptr, code, executable_offset)) - code_ptr--; - jump->addr = (sljit_uw)code_ptr; + if (detect_jump_type(jump, code_ptr, code, executable_offset)) + code_ptr--; + jump->addr = (sljit_uw)code_ptr; #else - jump->addr = (sljit_uw)(code_ptr - 2); - if (detect_jump_type(jump, code_ptr, code, executable_offset)) - code_ptr -= 2; + jump->addr = (sljit_uw)(code_ptr - 2); + if (detect_jump_type(jump, code_ptr, code, executable_offset)) + code_ptr -= 2; #endif - jump = jump->next; - } - if (label && label->size == word_count) { - /* code_ptr can be affected above. */ - label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr + 1, executable_offset); - label->size = (code_ptr + 1) - code; - label = label->next; - } - if (const_ && const_->addr == word_count) { + jump = jump->next; + } + if (label && label->size == word_count) { + /* code_ptr can be affected above. 
*/ + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr + 1, executable_offset); + label->size = (code_ptr + 1) - code; + label = label->next; + } + if (const_ && const_->addr == word_count) { #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) - const_->addr = (sljit_uw)code_ptr; + const_->addr = (sljit_uw)code_ptr; #else - const_->addr = (sljit_uw)(code_ptr - 1); + const_->addr = (sljit_uw)(code_ptr - 1); #endif - const_ = const_->next; + const_ = const_->next; + } + if (put_label && put_label->addr == word_count) { + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; + put_label = put_label->next; + } + next_addr = compute_next_addr(label, jump, const_, put_label); } code_ptr++; #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) @@ -725,6 +814,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + SLJIT_ASSERT(!put_label); #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) SLJIT_ASSERT(cpool_size == 0); @@ -739,7 +829,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil cpool_current_index = 0; while (buf_ptr < buf_end) { if (SLJIT_UNLIKELY(resolve_const_pool_index(compiler, &first_patch, cpool_current_index, cpool_start_address, buf_ptr))) { - SLJIT_FREE_EXEC(code); + SLJIT_FREE_EXEC(code, compiler->exec_allocator_data); compiler->error = SLJIT_ERR_ALLOC_FAILED; return NULL; } @@ -755,15 +845,15 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil buf_ptr = (sljit_uw *)jump->addr; if (jump->flags & PATCH_B) { - jump_addr = (sljit_sw)SLJIT_ADD_EXEC_OFFSET(buf_ptr + 2, executable_offset); + addr = (sljit_sw)SLJIT_ADD_EXEC_OFFSET(buf_ptr + 2, executable_offset); if (!(jump->flags & JUMP_ADDR)) { SLJIT_ASSERT(jump->flags & JUMP_LABEL); - SLJIT_ASSERT(((sljit_sw)jump->u.label->addr - jump_addr) <= 0x01ffffff && ((sljit_sw)jump->u.label->addr - jump_addr) >= 
-0x02000000); - *buf_ptr |= (((sljit_sw)jump->u.label->addr - jump_addr) >> 2) & 0x00ffffff; + SLJIT_ASSERT(((sljit_sw)jump->u.label->addr - addr) <= 0x01ffffff && ((sljit_sw)jump->u.label->addr - addr) >= -0x02000000); + *buf_ptr |= (((sljit_sw)jump->u.label->addr - addr) >> 2) & 0x00ffffff; } else { - SLJIT_ASSERT(((sljit_sw)jump->u.target - jump_addr) <= 0x01ffffff && ((sljit_sw)jump->u.target - jump_addr) >= -0x02000000); - *buf_ptr |= (((sljit_sw)jump->u.target - jump_addr) >> 2) & 0x00ffffff; + SLJIT_ASSERT(((sljit_sw)jump->u.target - addr) <= 0x01ffffff && ((sljit_sw)jump->u.target - addr) >= -0x02000000); + *buf_ptr |= (((sljit_sw)jump->u.target - addr) >> 2) & 0x00ffffff; } } else if (jump->flags & SLJIT_REWRITABLE_JUMP) { @@ -813,6 +903,22 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil } #endif + put_label = compiler->put_labels; + while (put_label) { + addr = put_label->label->addr; + buf_ptr = (sljit_uw*)put_label->addr; + +#if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) + SLJIT_ASSERT((buf_ptr[0] & 0xffff0000) == 0xe59f0000); + buf_ptr[((buf_ptr[0] & 0xfff) >> 2) + 2] = addr; +#else + SLJIT_ASSERT((buf_ptr[-1] & 0xfff00000) == MOVW && (buf_ptr[0] & 0xfff00000) == MOVT); + buf_ptr[-1] |= ((addr << 4) & 0xf0000) | (addr & 0xfff); + buf_ptr[0] |= ((addr >> 12) & 0xf0000) | ((addr >> 16) & 0xfff); +#endif + put_label = put_label->next; + } + SLJIT_ASSERT(code_ptr - code <= (sljit_s32)size); compiler->error = SLJIT_ERR_COMPILED; @@ -823,6 +929,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil code_ptr = (sljit_uw *)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); SLJIT_CACHE_FLUSH(code, code_ptr); + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); return code; } @@ -839,6 +946,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) case SLJIT_HAS_CLZ: case SLJIT_HAS_CMOV: +#if (defined SLJIT_CONFIG_ARM_V7 && SLJIT_CONFIG_ARM_V7) + case 
SLJIT_HAS_PREFETCH: +#endif return 1; default: @@ -1645,6 +1755,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile | (saved_reg_list[0] << 12) /* ldr rX, [sp], #8/16 */); } return SLJIT_SUCCESS; + case SLJIT_ENDBR: + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; } return SLJIT_SUCCESS; @@ -1659,14 +1772,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile ADJUST_LOCAL_OFFSET(dst, dstw); ADJUST_LOCAL_OFFSET(src, srcw); - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) { -#if (defined SLJIT_CONFIG_ARM_V7 && SLJIT_CONFIG_ARM_V7) - if (op <= SLJIT_MOV_P && (src & SLJIT_MEM)) - return emit_op_mem(compiler, PRELOAD | LOAD_DATA, TMP_PC, src, srcw, TMP_REG1); -#endif - return SLJIT_SUCCESS; - } - switch (GET_OPCODE(op)) { case SLJIT_MOV: case SLJIT_MOV_U32: @@ -1748,6 +1853,40 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return SLJIT_SUCCESS; } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + SLJIT_ASSERT(reg_map[TMP_REG2] == 14); + + if (FAST_IS_REG(src)) + FAIL_IF(push_inst(compiler, MOV | RD(TMP_REG2) | RM(src))); + else + FAIL_IF(emit_op_mem(compiler, WORD_SIZE | LOAD_DATA, TMP_REG2, src, srcw, TMP_REG1)); + + return push_inst(compiler, BX | RM(TMP_REG2)); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + return SLJIT_SUCCESS; + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: +#if (defined SLJIT_CONFIG_ARM_V7 && SLJIT_CONFIG_ARM_V7) + SLJIT_ASSERT(src & SLJIT_MEM); + return emit_op_mem(compiler, PRELOAD | LOAD_DATA, TMP_PC, src, srcw, TMP_REG1); +#else /* !SLJIT_CONFIG_ARM_V7 */ + return SLJIT_SUCCESS; +#endif /* SLJIT_CONFIG_ARM_V7 */ + } + + return 
SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -2010,22 +2149,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return emit_op_mem(compiler, WORD_SIZE, TMP_REG2, dst, dstw, TMP_REG1); } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - SLJIT_ASSERT(reg_map[TMP_REG2] == 14); - - if (FAST_IS_REG(src)) - FAIL_IF(push_inst(compiler, MOV | RD(TMP_REG2) | RM(src))); - else - FAIL_IF(emit_op_mem(compiler, WORD_SIZE | LOAD_DATA, TMP_REG2, src, srcw, TMP_REG1)); - - return push_inst(compiler, BX | RM(TMP_REG2)); -} - /* --------------------------------------------------------------------- */ /* Conditional instructions */ /* --------------------------------------------------------------------- */ @@ -2584,11 +2707,11 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_mem(struct sljit_compiler *compile } else { if (is_type1_transfer) { - if (memw > 4095 && memw < -4095) + if (memw > 4095 || memw < -4095) return SLJIT_ERR_UNSUPPORTED; } else { - if (memw > 255 && memw < -255) + if (memw > 255 || memw < -255) return SLJIT_ERR_UNSUPPORTED; } } @@ -2639,28 +2762,55 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_mem(struct sljit_compiler *compile SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value) { struct sljit_const *const_; - sljit_s32 reg; + sljit_s32 dst_r; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_emit_const(compiler, dst, dstw, init_value)); ADJUST_LOCAL_OFFSET(dst, dstw); + dst_r = SLOW_IS_REG(dst) ? 
dst : TMP_REG2; + +#if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) + PTR_FAIL_IF(push_inst_with_unique_literal(compiler, EMIT_DATA_TRANSFER(WORD_SIZE | LOAD_DATA, 1, dst_r, TMP_PC, 0), init_value)); + compiler->patches++; +#else + PTR_FAIL_IF(emit_imm(compiler, dst_r, init_value)); +#endif + const_ = (struct sljit_const*)ensure_abuf(compiler, sizeof(struct sljit_const)); PTR_FAIL_IF(!const_); + set_const(const_, compiler); + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(emit_op_mem(compiler, WORD_SIZE, TMP_REG2, dst, dstw, TMP_REG1)); + return const_; +} - reg = SLOW_IS_REG(dst) ? dst : TMP_REG2; +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_s32 dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + dst_r = SLOW_IS_REG(dst) ? dst : TMP_REG2; #if (defined SLJIT_CONFIG_ARM_V5 && SLJIT_CONFIG_ARM_V5) - PTR_FAIL_IF(push_inst_with_unique_literal(compiler, EMIT_DATA_TRANSFER(WORD_SIZE | LOAD_DATA, 1, reg, TMP_PC, 0), init_value)); + PTR_FAIL_IF(push_inst_with_unique_literal(compiler, EMIT_DATA_TRANSFER(WORD_SIZE | LOAD_DATA, 1, dst_r, TMP_PC, 0), 0)); compiler->patches++; #else - PTR_FAIL_IF(emit_imm(compiler, reg, init_value)); + PTR_FAIL_IF(emit_imm(compiler, dst_r, 0)); #endif - set_const(const_, compiler); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); if (dst & SLJIT_MEM) PTR_FAIL_IF(emit_op_mem(compiler, WORD_SIZE, TMP_REG2, dst, dstw, TMP_REG1)); - return const_; + return put_label; } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) diff --git a/src/pcre/sljit/sljitNativeARM_64.c b/src/pcre2/src/sljit/sljitNativeARM_64.c similarity index 92% rename from 
src/pcre/sljit/sljitNativeARM_64.c rename to src/pcre2/src/sljit/sljitNativeARM_64.c index b015695c..52267e7d 100644 --- a/src/pcre/sljit/sljitNativeARM_64.c +++ b/src/pcre2/src/sljit/sljitNativeARM_64.c @@ -151,17 +151,7 @@ static SLJIT_INLINE sljit_s32 emit_imm64_const(struct sljit_compiler *compiler, return push_inst(compiler, MOVK | RD(dst) | ((imm >> 48) << 5) | (3 << 21)); } -static SLJIT_INLINE void modify_imm64_const(sljit_ins* inst, sljit_uw new_imm) -{ - sljit_s32 dst = inst[0] & 0x1f; - SLJIT_ASSERT((inst[0] & 0xffe00000) == MOVZ && (inst[1] & 0xffe00000) == (MOVK | (1 << 21))); - inst[0] = MOVZ | dst | ((new_imm & 0xffff) << 5); - inst[1] = MOVK | dst | (((new_imm >> 16) & 0xffff) << 5) | (1 << 21); - inst[2] = MOVK | dst | (((new_imm >> 32) & 0xffff) << 5) | (2 << 21); - inst[3] = MOVK | dst | ((new_imm >> 48) << 5) | (3 << 21); -} - -static SLJIT_INLINE sljit_s32 detect_jump_type(struct sljit_jump *jump, sljit_ins *code_ptr, sljit_ins *code, sljit_sw executable_offset) +static SLJIT_INLINE sljit_sw detect_jump_type(struct sljit_jump *jump, sljit_ins *code_ptr, sljit_ins *code, sljit_sw executable_offset) { sljit_sw diff; sljit_uw target_addr; @@ -196,14 +186,14 @@ static SLJIT_INLINE sljit_s32 detect_jump_type(struct sljit_jump *jump, sljit_in return 4; } - if (target_addr <= 0xffffffffl) { + if (target_addr < 0x100000000l) { if (jump->flags & IS_COND) code_ptr[-5] -= (2 << 5); code_ptr[-2] = code_ptr[0]; return 2; } - if (target_addr <= 0xffffffffffffl) { + if (target_addr < 0x1000000000000l) { if (jump->flags & IS_COND) code_ptr[-5] -= (1 << 5); jump->flags |= PATCH_ABS48; @@ -215,6 +205,22 @@ static SLJIT_INLINE sljit_s32 detect_jump_type(struct sljit_jump *jump, sljit_in return 0; } +static SLJIT_INLINE sljit_sw put_label_get_length(struct sljit_put_label *put_label, sljit_uw max_label) +{ + if (max_label < 0x100000000l) { + put_label->flags = 0; + return 2; + } + + if (max_label < 0x1000000000000l) { + put_label->flags = 1; + return 1; + } + + 
put_label->flags = 2; + return 0; +} + SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compiler) { struct sljit_memory_fragment *buf; @@ -223,6 +229,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil sljit_ins *buf_ptr; sljit_ins *buf_end; sljit_uw word_count; + sljit_uw next_addr; sljit_sw executable_offset; sljit_uw addr; sljit_s32 dst; @@ -230,45 +237,59 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); reverse_buf(compiler); - code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins)); + code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins), compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; code_ptr = code; word_count = 0; + next_addr = 0; executable_offset = SLJIT_EXEC_OFFSET(code); label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; do { buf_ptr = (sljit_ins*)buf->memory; buf_end = buf_ptr + (buf->used_size >> 2); do { *code_ptr = *buf_ptr++; - /* These structures are ordered by their address. 
*/ - SLJIT_ASSERT(!label || label->size >= word_count); - SLJIT_ASSERT(!jump || jump->addr >= word_count); - SLJIT_ASSERT(!const_ || const_->addr >= word_count); - if (label && label->size == word_count) { - label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); - label->size = code_ptr - code; - label = label->next; - } - if (jump && jump->addr == word_count) { - jump->addr = (sljit_uw)(code_ptr - 4); - code_ptr -= detect_jump_type(jump, code_ptr, code, executable_offset); - jump = jump->next; - } - if (const_ && const_->addr == word_count) { - const_->addr = (sljit_uw)code_ptr; - const_ = const_->next; + if (next_addr == word_count) { + SLJIT_ASSERT(!label || label->size >= word_count); + SLJIT_ASSERT(!jump || jump->addr >= word_count); + SLJIT_ASSERT(!const_ || const_->addr >= word_count); + SLJIT_ASSERT(!put_label || put_label->addr >= word_count); + + /* These structures are ordered by their address. */ + if (label && label->size == word_count) { + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + label->size = code_ptr - code; + label = label->next; + } + if (jump && jump->addr == word_count) { + jump->addr = (sljit_uw)(code_ptr - 4); + code_ptr -= detect_jump_type(jump, code_ptr, code, executable_offset); + jump = jump->next; + } + if (const_ && const_->addr == word_count) { + const_->addr = (sljit_uw)code_ptr; + const_ = const_->next; + } + if (put_label && put_label->addr == word_count) { + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)(code_ptr - 3); + code_ptr -= put_label_get_length(put_label, (sljit_uw)(SLJIT_ADD_EXEC_OFFSET(code, executable_offset) + put_label->label->size)); + put_label = put_label->next; + } + next_addr = compute_next_addr(label, jump, const_, put_label); } code_ptr ++; word_count ++; @@ -286,6 +307,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + 
SLJIT_ASSERT(!put_label); SLJIT_ASSERT(code_ptr - code <= (sljit_sw)compiler->size); jump = compiler->jumps; @@ -323,6 +345,23 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil jump = jump->next; } + put_label = compiler->put_labels; + while (put_label) { + addr = put_label->label->addr; + buf_ptr = (sljit_ins *)put_label->addr; + + buf_ptr[0] |= (addr & 0xffff) << 5; + buf_ptr[1] |= ((addr >> 16) & 0xffff) << 5; + + if (put_label->flags >= 1) + buf_ptr[2] |= ((addr >> 32) & 0xffff) << 5; + + if (put_label->flags >= 2) + buf_ptr[3] |= ((addr >> 48) & 0xffff) << 5; + + put_label = put_label->next; + } + compiler->error = SLJIT_ERR_COMPILED; compiler->executable_offset = executable_offset; compiler->executable_size = (code_ptr - code) * sizeof(sljit_ins); @@ -331,6 +370,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil code_ptr = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); SLJIT_CACHE_FLUSH(code, code_ptr); + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); return code; } @@ -347,6 +387,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) case SLJIT_HAS_CLZ: case SLJIT_HAS_CMOV: + case SLJIT_HAS_PREFETCH: return 1; default: @@ -1105,6 +1146,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile case SLJIT_DIV_UW: case SLJIT_DIV_SW: return push_inst(compiler, ((op == SLJIT_DIV_UW ? 
UDIV : SDIV) ^ inv_bits) | RD(SLJIT_R0) | RN(SLJIT_R0) | RM(SLJIT_R1)); + case SLJIT_ENDBR: + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; } return SLJIT_SUCCESS; @@ -1122,23 +1166,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile ADJUST_LOCAL_OFFSET(dst, dstw); ADJUST_LOCAL_OFFSET(src, srcw); - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) { - if (op <= SLJIT_MOV_P && (src & SLJIT_MEM)) { - SLJIT_ASSERT(reg_map[1] == 0 && reg_map[3] == 2 && reg_map[5] == 4); - - if (op >= SLJIT_MOV_U8 && op <= SLJIT_MOV_S8) - dst = 5; - else if (op >= SLJIT_MOV_U16 && op <= SLJIT_MOV_S16) - dst = 3; - else - dst = 1; - - /* Signed word sized load is the prefetch instruction. */ - return emit_op_mem(compiler, WORD_SIZE | SIGNED, dst, src, srcw, TMP_REG1); - } - return SLJIT_SUCCESS; - } - dst_r = SLOW_IS_REG(dst) ? dst : TMP_REG1; op = GET_OPCODE(op); @@ -1278,6 +1305,46 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return SLJIT_SUCCESS; } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + if (FAST_IS_REG(src)) + FAIL_IF(push_inst(compiler, ORR | RD(TMP_LR) | RN(TMP_ZERO) | RM(src))); + else + FAIL_IF(emit_op_mem(compiler, WORD_SIZE, TMP_LR, src, srcw, TMP_REG1)); + + return push_inst(compiler, RET | RN(TMP_LR)); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + return SLJIT_SUCCESS; + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: + SLJIT_ASSERT(reg_map[1] == 0 && reg_map[3] == 2 && reg_map[5] == 4); + + /* The reg_map[op] should provide the appropriate constant. 
*/ + if (op == SLJIT_PREFETCH_L1) + op = 1; + else if (op == SLJIT_PREFETCH_L2) + op = 3; + else if (op == SLJIT_PREFETCH_L3) + op = 5; + else + op = 2; + + /* Signed word sized load is the prefetch instruction. */ + return emit_op_mem(compiler, WORD_SIZE | SIGNED, op, src, srcw, TMP_REG1); + } + + return SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -1529,20 +1596,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return emit_op_mem(compiler, WORD_SIZE | STORE, TMP_LR, dst, dstw, TMP_REG1); } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - if (FAST_IS_REG(src)) - FAIL_IF(push_inst(compiler, ORR | RD(TMP_LR) | RN(TMP_ZERO) | RM(src))); - else - FAIL_IF(emit_op_mem(compiler, WORD_SIZE, TMP_LR, src, srcw, TMP_REG1)); - - return push_inst(compiler, RET | RN(TMP_LR)); -} - /* --------------------------------------------------------------------- */ /* Conditional instructions */ /* --------------------------------------------------------------------- */ @@ -1816,7 +1869,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_mem(struct sljit_compiler *compile CHECK_ERROR(); CHECK(check_sljit_emit_mem(compiler, type, reg, mem, memw)); - if ((mem & OFFS_REG_MASK) || (memw > 255 && memw < -256)) + if ((mem & OFFS_REG_MASK) || (memw > 255 || memw < -256)) return SLJIT_ERR_UNSUPPORTED; if (type & SLJIT_MEM_SUPP) @@ -1866,7 +1919,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fmem(struct sljit_compiler *compil CHECK_ERROR(); CHECK(check_sljit_emit_fmem(compiler, type, freg, mem, memw)); - if ((mem & OFFS_REG_MASK) || (memw > 255 && memw < -256)) + if ((mem & OFFS_REG_MASK) || (memw > 255 || memw < -256)) return SLJIT_ERR_UNSUPPORTED; if (type & 
SLJIT_MEM_SUPP) @@ -1947,18 +2000,49 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi return const_; } +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_s32 dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + dst_r = FAST_IS_REG(dst) ? dst : TMP_REG1; + PTR_FAIL_IF(emit_imm64_const(compiler, dst_r, 0)); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 1); + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(emit_op_mem(compiler, WORD_SIZE | STORE, dst_r, dst, dstw, TMP_REG2)); + + return put_label; +} + SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_ins* inst = (sljit_ins*)addr; - modify_imm64_const(inst, new_target); + sljit_s32 dst; + SLJIT_UNUSED_ARG(executable_offset); + + SLJIT_UPDATE_WX_FLAGS(inst, inst + 4, 0); + + dst = inst[0] & 0x1f; + SLJIT_ASSERT((inst[0] & 0xffe00000) == MOVZ && (inst[1] & 0xffe00000) == (MOVK | (1 << 21))); + inst[0] = MOVZ | dst | ((new_target & 0xffff) << 5); + inst[1] = MOVK | dst | (((new_target >> 16) & 0xffff) << 5) | (1 << 21); + inst[2] = MOVK | dst | (((new_target >> 32) & 0xffff) << 5) | (2 << 21); + inst[3] = MOVK | dst | ((new_target >> 48) << 5) | (3 << 21); + + SLJIT_UPDATE_WX_FLAGS(inst, inst + 4, 1); inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 4); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_ins* inst = (sljit_ins*)addr; - modify_imm64_const(inst, new_constant); - inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 4); + 
sljit_set_jump_addr(addr, new_constant, executable_offset); } diff --git a/src/pcre/sljit/sljitNativeARM_T2_32.c b/src/pcre2/src/sljit/sljitNativeARM_T2_32.c similarity index 95% rename from src/pcre/sljit/sljitNativeARM_T2_32.c rename to src/pcre2/src/sljit/sljitNativeARM_T2_32.c index d7024b6d..4624882f 100644 --- a/src/pcre/sljit/sljitNativeARM_T2_32.c +++ b/src/pcre2/src/sljit/sljitNativeARM_T2_32.c @@ -365,50 +365,64 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil sljit_u16 *buf_ptr; sljit_u16 *buf_end; sljit_uw half_count; + sljit_uw next_addr; sljit_sw executable_offset; struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); reverse_buf(compiler); - code = (sljit_u16*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_u16)); + code = (sljit_u16*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_u16), compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; code_ptr = code; half_count = 0; + next_addr = 0; executable_offset = SLJIT_EXEC_OFFSET(code); label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; do { buf_ptr = (sljit_u16*)buf->memory; buf_end = buf_ptr + (buf->used_size >> 1); do { *code_ptr = *buf_ptr++; - /* These structures are ordered by their address. */ - SLJIT_ASSERT(!label || label->size >= half_count); - SLJIT_ASSERT(!jump || jump->addr >= half_count); - SLJIT_ASSERT(!const_ || const_->addr >= half_count); - if (label && label->size == half_count) { - label->addr = ((sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset)) | 0x1; - label->size = code_ptr - code; - label = label->next; - } - if (jump && jump->addr == half_count) { - jump->addr = (sljit_uw)code_ptr - ((jump->flags & IS_COND) ? 
10 : 8); - code_ptr -= detect_jump_type(jump, code_ptr, code, executable_offset); - jump = jump->next; - } - if (const_ && const_->addr == half_count) { - const_->addr = (sljit_uw)code_ptr; - const_ = const_->next; + if (next_addr == half_count) { + SLJIT_ASSERT(!label || label->size >= half_count); + SLJIT_ASSERT(!jump || jump->addr >= half_count); + SLJIT_ASSERT(!const_ || const_->addr >= half_count); + SLJIT_ASSERT(!put_label || put_label->addr >= half_count); + + /* These structures are ordered by their address. */ + if (label && label->size == half_count) { + label->addr = ((sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset)) | 0x1; + label->size = code_ptr - code; + label = label->next; + } + if (jump && jump->addr == half_count) { + jump->addr = (sljit_uw)code_ptr - ((jump->flags & IS_COND) ? 10 : 8); + code_ptr -= detect_jump_type(jump, code_ptr, code, executable_offset); + jump = jump->next; + } + if (const_ && const_->addr == half_count) { + const_->addr = (sljit_uw)code_ptr; + const_ = const_->next; + } + if (put_label && put_label->addr == half_count) { + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; + put_label = put_label->next; + } + next_addr = compute_next_addr(label, jump, const_, put_label); } code_ptr ++; half_count ++; @@ -426,6 +440,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + SLJIT_ASSERT(!put_label); SLJIT_ASSERT(code_ptr - code <= (sljit_sw)compiler->size); jump = compiler->jumps; @@ -434,6 +449,12 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil jump = jump->next; } + put_label = compiler->put_labels; + while (put_label) { + modify_imm32_const((sljit_u16 *)put_label->addr, put_label->label->addr); + put_label = put_label->next; + } + compiler->error = SLJIT_ERR_COMPILED; compiler->executable_offset = executable_offset; compiler->executable_size = (code_ptr - code) 
* sizeof(sljit_u16); @@ -442,6 +463,8 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil code_ptr = (sljit_u16 *)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); SLJIT_CACHE_FLUSH(code, code_ptr); + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); + /* Set thumb mode flag. */ return (void*)((sljit_uw)code | 0x1); } @@ -459,6 +482,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) case SLJIT_HAS_CLZ: case SLJIT_HAS_CMOV: + case SLJIT_HAS_PREFETCH: return 1; default: @@ -586,7 +610,7 @@ static sljit_s32 emit_op_imm(struct sljit_compiler *compiler, sljit_s32 flags, s Although some clever things could be done here, "NOT IMM" does not worth the efforts. */ break; case SLJIT_ADD: - nimm = -imm; + nimm = -(sljit_sw)imm; if (IS_2_LO_REGS(reg, dst)) { if (imm <= 0x7) return push_inst16(compiler, ADDSI3 | IMM3(imm) | RD3(dst) | RN3(reg)); @@ -608,7 +632,7 @@ static sljit_s32 emit_op_imm(struct sljit_compiler *compiler, sljit_s32 flags, s nimm = get_imm(imm); if (nimm != INVALID_IMM) return push_inst32(compiler, ADD_WI | (flags & SET_FLAGS) | RD4(dst) | RN4(reg) | nimm); - nimm = get_imm(-imm); + nimm = get_imm(-(sljit_sw)imm); if (nimm != INVALID_IMM) return push_inst32(compiler, SUB_WI | (flags & SET_FLAGS) | RD4(dst) | RN4(reg) | nimm); break; @@ -633,11 +657,11 @@ static sljit_s32 emit_op_imm(struct sljit_compiler *compiler, sljit_s32 flags, s nimm = get_imm(imm); if (nimm != INVALID_IMM) return push_inst32(compiler, CMPI_W | RN4(reg) | nimm); - nimm = get_imm(-imm); + nimm = get_imm(-(sljit_sw)imm); if (nimm != INVALID_IMM) return push_inst32(compiler, CMNI_W | RN4(reg) | nimm); } - nimm = -imm; + nimm = -(sljit_sw)imm; if (IS_2_LO_REGS(reg, dst)) { if (imm <= 0x7) return push_inst16(compiler, SUBSI3 | IMM3(imm) | RD3(dst) | RN3(reg)); @@ -659,7 +683,7 @@ static sljit_s32 emit_op_imm(struct sljit_compiler *compiler, sljit_s32 flags, s nimm = get_imm(imm); if (nimm != INVALID_IMM) return push_inst32(compiler, 
SUB_WI | (flags & SET_FLAGS) | RD4(dst) | RN4(reg) | nimm); - nimm = get_imm(-imm); + nimm = get_imm(-(sljit_sw)imm); if (nimm != INVALID_IMM) return push_inst32(compiler, ADD_WI | (flags & SET_FLAGS) | RD4(dst) | RN4(reg) | nimm); break; @@ -1307,6 +1331,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile } return SLJIT_SUCCESS; #endif /* __ARM_FEATURE_IDIV || __ARM_ARCH_EXT_IDIV__ */ + case SLJIT_ENDBR: + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; } return SLJIT_SUCCESS; @@ -1324,13 +1351,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile ADJUST_LOCAL_OFFSET(dst, dstw); ADJUST_LOCAL_OFFSET(src, srcw); - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) { - /* Since TMP_PC has index 15, IS_2_LO_REGS and IS_3_LO_REGS checks always fail. */ - if (op <= SLJIT_MOV_P && (src & SLJIT_MEM)) - return emit_op_mem(compiler, PRELOAD, TMP_PC, src, srcw, TMP_REG1); - return SLJIT_SUCCESS; - } - dst_r = SLOW_IS_REG(dst) ? 
dst : TMP_REG1; op = GET_OPCODE(op); @@ -1454,6 +1474,35 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return emit_op_mem(compiler, WORD_SIZE | STORE, dst_reg, dst, dstw, TMP_REG2); } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + SLJIT_ASSERT(reg_map[TMP_REG2] == 14); + + if (FAST_IS_REG(src)) + FAIL_IF(push_inst16(compiler, MOV | SET_REGS44(TMP_REG2, src))); + else + FAIL_IF(emit_op_mem(compiler, WORD_SIZE, TMP_REG2, src, srcw, TMP_REG2)); + + return push_inst16(compiler, BX | RN3(TMP_REG2)); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + return SLJIT_SUCCESS; + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: + return emit_op_mem(compiler, PRELOAD, TMP_PC, src, srcw, TMP_REG1); + } + + return SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -1707,22 +1756,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return emit_op_mem(compiler, WORD_SIZE | STORE, TMP_REG2, dst, dstw, TMP_REG1); } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - SLJIT_ASSERT(reg_map[TMP_REG2] == 14); - - if (FAST_IS_REG(src)) - FAIL_IF(push_inst16(compiler, MOV | SET_REGS44(TMP_REG2, src))); - else - FAIL_IF(emit_op_mem(compiler, WORD_SIZE, TMP_REG2, src, srcw, TMP_REG2)); - - return push_inst16(compiler, BX | RN3(TMP_REG2)); -} - /* --------------------------------------------------------------------- */ /* Conditional instructions 
*/ /* --------------------------------------------------------------------- */ @@ -2243,7 +2276,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_mem(struct sljit_compiler *compile CHECK_ERROR(); CHECK(check_sljit_emit_mem(compiler, type, reg, mem, memw)); - if ((mem & OFFS_REG_MASK) || (memw > 255 && memw < -255)) + if ((mem & OFFS_REG_MASK) || (memw > 255 || memw < -255)) return SLJIT_ERR_UNSUPPORTED; if (type & SLJIT_MEM_SUPP) @@ -2311,18 +2344,40 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi return const_; } +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_s32 dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); + + dst_r = FAST_IS_REG(dst) ? 
dst : TMP_REG1; + PTR_FAIL_IF(emit_imm32_const(compiler, dst_r, 0)); + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(emit_op_mem(compiler, WORD_SIZE | STORE, dst_r, dst, dstw, TMP_REG2)); + return put_label; +} + SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_u16 *inst = (sljit_u16*)addr; + SLJIT_UNUSED_ARG(executable_offset); + + SLJIT_UPDATE_WX_FLAGS(inst, inst + 4, 0); modify_imm32_const(inst, new_target); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 4, 1); inst = (sljit_u16 *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 4); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_u16 *inst = (sljit_u16*)addr; - modify_imm32_const(inst, new_constant); - inst = (sljit_u16 *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 4); + sljit_set_jump_addr(addr, new_constant, executable_offset); } diff --git a/src/pcre/sljit/sljitNativeMIPS_32.c b/src/pcre2/src/sljit/sljitNativeMIPS_32.c similarity index 95% rename from src/pcre/sljit/sljitNativeMIPS_32.c rename to src/pcre2/src/sljit/sljitNativeMIPS_32.c index 094c9923..f887ee13 100644 --- a/src/pcre/sljit/sljitNativeMIPS_32.c +++ b/src/pcre2/src/sljit/sljitNativeMIPS_32.c @@ -86,12 +86,12 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); if ((flags & (REG_DEST | REG2_SOURCE)) == (REG_DEST | REG2_SOURCE)) { if (op == SLJIT_MOV_S8) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) return push_inst(compiler, SEB | T(src2) | D(dst), DR(dst)); -#else +#else /* SLJIT_MIPS_REV < 1 */ FAIL_IF(push_inst(compiler, SLL | T(src2) | D(dst) | SH_IMM(24), DR(dst))); return push_inst(compiler, SRA | T(dst) | D(dst) | SH_IMM(24), DR(dst)); -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ } return push_inst(compiler, 
ANDI | S(src2) | T(dst) | IMM(0xff), DR(dst)); } @@ -105,12 +105,12 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); if ((flags & (REG_DEST | REG2_SOURCE)) == (REG_DEST | REG2_SOURCE)) { if (op == SLJIT_MOV_S16) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) return push_inst(compiler, SEH | T(src2) | D(dst), DR(dst)); -#else +#else /* SLJIT_MIPS_REV < 1 */ FAIL_IF(push_inst(compiler, SLL | T(src2) | D(dst) | SH_IMM(16), DR(dst))); return push_inst(compiler, SRA | T(dst) | D(dst) | SH_IMM(16), DR(dst)); -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ } return push_inst(compiler, ANDI | S(src2) | T(dst) | IMM(0xffff), DR(dst)); } @@ -129,12 +129,12 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl case SLJIT_CLZ: SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) if (op & SLJIT_SET_Z) FAIL_IF(push_inst(compiler, CLZ | S(src2) | TA(EQUAL_FLAG) | DA(EQUAL_FLAG), EQUAL_FLAG)); if (!(flags & UNUSED_DEST)) FAIL_IF(push_inst(compiler, CLZ | S(src2) | T(dst) | D(dst), DR(dst))); -#else +#else /* SLJIT_MIPS_REV < 1 */ if (SLJIT_UNLIKELY(flags & UNUSED_DEST)) { FAIL_IF(push_inst(compiler, SRL | T(src2) | DA(EQUAL_FLAG) | SH_IMM(31), EQUAL_FLAG)); return push_inst(compiler, XORI | SA(EQUAL_FLAG) | TA(EQUAL_FLAG) | IMM(1), EQUAL_FLAG); @@ -149,7 +149,7 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl FAIL_IF(push_inst(compiler, ADDIU | S(dst) | T(dst) | IMM(1), DR(dst))); FAIL_IF(push_inst(compiler, BGEZ | S(TMP_REG1) | IMM(-2), UNMOVABLE_INS)); FAIL_IF(push_inst(compiler, SLL | T(TMP_REG1) | D(TMP_REG1) | SH_IMM(1), UNMOVABLE_INS)); -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ return SLJIT_SUCCESS; case SLJIT_ADD: @@ -368,16 +368,22 @@ static SLJIT_INLINE sljit_s32 
emit_single_op(struct sljit_compiler *compiler, sl SLJIT_ASSERT(!(flags & SRC2_IMM)); if (GET_FLAG_TYPE(op) != SLJIT_MUL_OVERFLOW) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) return push_inst(compiler, MUL | S(src1) | T(src2) | D(dst), DR(dst)); -#else +#else /* SLJIT_MIPS_REV < 1 */ FAIL_IF(push_inst(compiler, MULT | S(src1) | T(src2), MOVABLE_INS)); return push_inst(compiler, MFLO | D(dst), DR(dst)); -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ } + +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + FAIL_IF(push_inst(compiler, MUL | S(src1) | T(src2) | D(dst), DR(dst))); + FAIL_IF(push_inst(compiler, MUH | S(src1) | T(src2) | DA(EQUAL_FLAG), EQUAL_FLAG)); +#else /* SLJIT_MIPS_REV < 6 */ FAIL_IF(push_inst(compiler, MULT | S(src1) | T(src2), MOVABLE_INS)); FAIL_IF(push_inst(compiler, MFHI | DA(EQUAL_FLAG), EQUAL_FLAG)); FAIL_IF(push_inst(compiler, MFLO | D(dst), DR(dst))); +#endif /* SLJIT_MIPS_REV >= 6 */ FAIL_IF(push_inst(compiler, SRA | T(dst) | DA(OTHER_FLAG) | SH_IMM(31), OTHER_FLAG)); return push_inst(compiler, SUBU | SA(EQUAL_FLAG) | TA(OTHER_FLAG) | DA(OTHER_FLAG), OTHER_FLAG); @@ -419,21 +425,20 @@ static SLJIT_INLINE sljit_s32 emit_const(struct sljit_compiler *compiler, sljit_ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_ins *inst = (sljit_ins *)addr; + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 0); + SLJIT_ASSERT((inst[0] & 0xffe00000) == LUI && (inst[1] & 0xfc000000) == ORI); inst[0] = (inst[0] & 0xffff0000) | ((new_target >> 16) & 0xffff); inst[1] = (inst[1] & 0xffff0000) | (new_target & 0xffff); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 2); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_ins *inst = (sljit_ins 
*)addr; - - inst[0] = (inst[0] & 0xffff0000) | ((new_constant >> 16) & 0xffff); - inst[1] = (inst[1] & 0xffff0000) | (new_constant & 0xffff); - inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 2); + sljit_set_jump_addr(addr, new_constant, executable_offset); } static sljit_s32 call_with_args(struct sljit_compiler *compiler, sljit_s32 arg_types, sljit_ins *ins_ptr) diff --git a/src/pcre/sljit/sljitNativeMIPS_64.c b/src/pcre2/src/sljit/sljitNativeMIPS_64.c similarity index 96% rename from src/pcre/sljit/sljitNativeMIPS_64.c rename to src/pcre2/src/sljit/sljitNativeMIPS_64.c index f841aef5..5ab9b7d0 100644 --- a/src/pcre/sljit/sljitNativeMIPS_64.c +++ b/src/pcre2/src/sljit/sljitNativeMIPS_64.c @@ -220,12 +220,12 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl case SLJIT_CLZ: SLJIT_ASSERT(src1 == TMP_REG1 && !(flags & SRC2_IMM)); -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) if (op & SLJIT_SET_Z) FAIL_IF(push_inst(compiler, SELECT_OP(DCLZ, CLZ) | S(src2) | TA(EQUAL_FLAG) | DA(EQUAL_FLAG), EQUAL_FLAG)); if (!(flags & UNUSED_DEST)) FAIL_IF(push_inst(compiler, SELECT_OP(DCLZ, CLZ) | S(src2) | T(dst) | D(dst), DR(dst))); -#else +#else /* SLJIT_MIPS_REV < 1 */ if (SLJIT_UNLIKELY(flags & UNUSED_DEST)) { FAIL_IF(push_inst(compiler, SELECT_OP(DSRL32, SRL) | T(src2) | DA(EQUAL_FLAG) | SH_IMM(31), EQUAL_FLAG)); return push_inst(compiler, XORI | SA(EQUAL_FLAG) | TA(EQUAL_FLAG) | IMM(1), EQUAL_FLAG); @@ -240,7 +240,7 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl FAIL_IF(push_inst(compiler, SELECT_OP(DADDIU, ADDIU) | S(dst) | T(dst) | IMM(1), DR(dst))); FAIL_IF(push_inst(compiler, BGEZ | S(TMP_REG1) | IMM(-2), UNMOVABLE_INS)); FAIL_IF(push_inst(compiler, SELECT_OP(DSLL, SLL) | T(TMP_REG1) | D(TMP_REG1) | SH_IMM(1), UNMOVABLE_INS)); -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ return SLJIT_SUCCESS; case 
SLJIT_ADD: @@ -459,19 +459,27 @@ static SLJIT_INLINE sljit_s32 emit_single_op(struct sljit_compiler *compiler, sl SLJIT_ASSERT(!(flags & SRC2_IMM)); if (GET_FLAG_TYPE(op) != SLJIT_MUL_OVERFLOW) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + return push_inst(compiler, SELECT_OP(DMUL, MUL) | S(src1) | T(src2) | D(dst), DR(dst)); +#elif (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) if (op & SLJIT_I32_OP) return push_inst(compiler, MUL | S(src1) | T(src2) | D(dst), DR(dst)); FAIL_IF(push_inst(compiler, DMULT | S(src1) | T(src2), MOVABLE_INS)); return push_inst(compiler, MFLO | D(dst), DR(dst)); -#else +#else /* SLJIT_MIPS_REV < 1 */ FAIL_IF(push_inst(compiler, SELECT_OP(DMULT, MULT) | S(src1) | T(src2), MOVABLE_INS)); return push_inst(compiler, MFLO | D(dst), DR(dst)); -#endif +#endif /* SLJIT_MIPS_REV >= 6 */ } + +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + FAIL_IF(push_inst(compiler, SELECT_OP(DMUL, MUL) | S(src1) | T(src2) | D(dst), DR(dst))); + FAIL_IF(push_inst(compiler, SELECT_OP(DMUH, MUH) | S(src1) | T(src2) | DA(EQUAL_FLAG), EQUAL_FLAG)); +#else /* SLJIT_MIPS_REV < 6 */ FAIL_IF(push_inst(compiler, SELECT_OP(DMULT, MULT) | S(src1) | T(src2), MOVABLE_INS)); FAIL_IF(push_inst(compiler, MFHI | DA(EQUAL_FLAG), EQUAL_FLAG)); FAIL_IF(push_inst(compiler, MFLO | D(dst), DR(dst))); +#endif /* SLJIT_MIPS_REV >= 6 */ FAIL_IF(push_inst(compiler, SELECT_OP(DSRA32, SRA) | T(dst) | DA(OTHER_FLAG) | SH_IMM(31), OTHER_FLAG)); return push_inst(compiler, SELECT_OP(DSUBU, SUBU) | SA(EQUAL_FLAG) | TA(OTHER_FLAG) | DA(OTHER_FLAG), OTHER_FLAG); @@ -517,25 +525,21 @@ static SLJIT_INLINE sljit_s32 emit_const(struct sljit_compiler *compiler, sljit_ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_ins *inst = (sljit_ins *)addr; + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 6, 0); inst[0] = (inst[0] & 0xffff0000) | 
((new_target >> 48) & 0xffff); inst[1] = (inst[1] & 0xffff0000) | ((new_target >> 32) & 0xffff); inst[3] = (inst[3] & 0xffff0000) | ((new_target >> 16) & 0xffff); inst[5] = (inst[5] & 0xffff0000) | (new_target & 0xffff); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 6, 1); inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 6); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_ins *inst = (sljit_ins *)addr; - - inst[0] = (inst[0] & 0xffff0000) | ((new_constant >> 48) & 0xffff); - inst[1] = (inst[1] & 0xffff0000) | ((new_constant >> 32) & 0xffff); - inst[3] = (inst[3] & 0xffff0000) | ((new_constant >> 16) & 0xffff); - inst[5] = (inst[5] & 0xffff0000) | (new_constant & 0xffff); - inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 6); + sljit_set_jump_addr(addr, new_constant, executable_offset); } static sljit_s32 call_with_args(struct sljit_compiler *compiler, sljit_s32 arg_types, sljit_ins *ins_ptr) diff --git a/src/pcre/sljit/sljitNativeMIPS_common.c b/src/pcre2/src/sljit/sljitNativeMIPS_common.c similarity index 83% rename from src/pcre/sljit/sljitNativeMIPS_common.c rename to src/pcre2/src/sljit/sljitNativeMIPS_common.c index 894e2130..ecf4dac4 100644 --- a/src/pcre/sljit/sljitNativeMIPS_common.c +++ b/src/pcre2/src/sljit/sljitNativeMIPS_common.c @@ -25,19 +25,34 @@ */ /* Latest MIPS architecture. 
*/ -/* Automatically detect SLJIT_MIPS_R1 */ + +#ifndef __mips_hard_float +/* Disable automatic detection, covers both -msoft-float and -mno-float */ +#undef SLJIT_IS_FPU_AVAILABLE +#define SLJIT_IS_FPU_AVAILABLE 0 +#endif SLJIT_API_FUNC_ATTRIBUTE const char* sljit_get_platform_name(void) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + +#if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) + return "MIPS32-R6" SLJIT_CPUINFO; +#else /* !SLJIT_CONFIG_MIPS_32 */ + return "MIPS64-R6" SLJIT_CPUINFO; +#endif /* SLJIT_CONFIG_MIPS_32 */ + +#elif (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) + #if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) return "MIPS32-R1" SLJIT_CPUINFO; -#else +#else /* !SLJIT_CONFIG_MIPS_32 */ return "MIPS64-R1" SLJIT_CPUINFO; -#endif -#else /* SLJIT_MIPS_R1 */ +#endif /* SLJIT_CONFIG_MIPS_32 */ + +#else /* SLJIT_MIPS_REV < 1 */ return "MIPS III" SLJIT_CPUINFO; -#endif +#endif /* SLJIT_MIPS_REV >= 6 */ } /* Length of an instruction word @@ -62,6 +77,7 @@ typedef sljit_u32 sljit_ins; #define TMP_FREG1 (SLJIT_NUMBER_OF_FLOAT_REGISTERS + 1) #define TMP_FREG2 (SLJIT_NUMBER_OF_FLOAT_REGISTERS + 2) +#define TMP_FREG3 (SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3) static const sljit_u8 reg_map[SLJIT_NUMBER_OF_REGISTERS + 5] = { 0, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 24, 23, 22, 21, 20, 19, 18, 17, 16, 29, 4, 25, 31 @@ -69,14 +85,14 @@ static const sljit_u8 reg_map[SLJIT_NUMBER_OF_REGISTERS + 5] = { #if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) -static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { - 0, 0, 14, 2, 4, 6, 8, 12, 10 +static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 4] = { + 0, 0, 14, 2, 4, 6, 8, 12, 10, 16 }; #else -static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { - 0, 0, 13, 14, 15, 16, 17, 12, 18 +static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 4] = { + 0, 0, 13, 14, 15, 16, 17, 12, 18, 10 
}; #endif @@ -102,6 +118,11 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { #define FR(dr) (freg_map[dr]) #define HI(opcode) ((opcode) << 26) #define LO(opcode) (opcode) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +/* CMP.cond.fmt */ +/* S = (20 << 21) D = (21 << 21) */ +#define CMP_FMT_S (20 << 21) +#endif /* SLJIT_MIPS_REV >= 6 */ /* S = (16 << 21) D = (17 << 21) */ #define FMT_S (16 << 21) #define FMT_D (17 << 21) @@ -114,8 +135,13 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { #define ANDI (HI(12)) #define B (HI(4)) #define BAL (HI(1) | (17 << 16)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define BC1EQZ (HI(17) | (9 << 21) | FT(TMP_FREG3)) +#define BC1NEZ (HI(17) | (13 << 21) | FT(TMP_FREG3)) +#else /* SLJIT_MIPS_REV < 6 */ #define BC1F (HI(17) | (8 << 21)) #define BC1T (HI(17) | (8 << 21) | (1 << 16)) +#endif /* SLJIT_MIPS_REV >= 6 */ #define BEQ (HI(4)) #define BGEZ (HI(1) | (1 << 16)) #define BGTZ (HI(7)) @@ -124,20 +150,42 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { #define BNE (HI(5)) #define BREAK (HI(0) | LO(13)) #define CFC1 (HI(17) | (2 << 21)) -#define C_UN_S (HI(17) | FMT_S | LO(49)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define C_UEQ_S (HI(17) | CMP_FMT_S | LO(3)) +#define C_ULE_S (HI(17) | CMP_FMT_S | LO(7)) +#define C_ULT_S (HI(17) | CMP_FMT_S | LO(5)) +#define C_UN_S (HI(17) | CMP_FMT_S | LO(1)) +#define C_FD (FD(TMP_FREG3)) +#else /* SLJIT_MIPS_REV < 6 */ #define C_UEQ_S (HI(17) | FMT_S | LO(51)) #define C_ULE_S (HI(17) | FMT_S | LO(55)) #define C_ULT_S (HI(17) | FMT_S | LO(53)) +#define C_UN_S (HI(17) | FMT_S | LO(49)) +#define C_FD (0) +#endif /* SLJIT_MIPS_REV >= 6 */ #define CVT_S_S (HI(17) | FMT_S | LO(32)) #define DADDIU (HI(25)) #define DADDU (HI(0) | LO(45)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define DDIV (HI(0) | (2 << 6) | LO(30)) +#define DDIVU (HI(0) | (2 << 6) | LO(31)) +#define DMOD (HI(0) | 
(3 << 6) | LO(30)) +#define DMODU (HI(0) | (3 << 6) | LO(31)) +#define DIV (HI(0) | (2 << 6) | LO(26)) +#define DIVU (HI(0) | (2 << 6) | LO(27)) +#define DMUH (HI(0) | (3 << 6) | LO(28)) +#define DMUHU (HI(0) | (3 << 6) | LO(29)) +#define DMUL (HI(0) | (2 << 6) | LO(28)) +#define DMULU (HI(0) | (2 << 6) | LO(29)) +#else /* SLJIT_MIPS_REV < 6 */ #define DDIV (HI(0) | LO(30)) #define DDIVU (HI(0) | LO(31)) #define DIV (HI(0) | LO(26)) #define DIVU (HI(0) | LO(27)) -#define DIV_S (HI(17) | FMT_S | LO(3)) #define DMULT (HI(0) | LO(28)) #define DMULTU (HI(0) | LO(29)) +#endif /* SLJIT_MIPS_REV >= 6 */ +#define DIV_S (HI(17) | FMT_S | LO(3)) #define DSLL (HI(0) | LO(56)) #define DSLL32 (HI(0) | LO(60)) #define DSLLV (HI(0) | LO(20)) @@ -151,18 +199,34 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { #define J (HI(2)) #define JAL (HI(3)) #define JALR (HI(0) | LO(9)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define JR (HI(0) | LO(9)) +#else /* SLJIT_MIPS_REV < 6 */ #define JR (HI(0) | LO(8)) +#endif /* SLJIT_MIPS_REV >= 6 */ #define LD (HI(55)) #define LUI (HI(15)) #define LW (HI(35)) #define MFC1 (HI(17)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define MOD (HI(0) | (3 << 6) | LO(26)) +#define MODU (HI(0) | (3 << 6) | LO(27)) +#else /* SLJIT_MIPS_REV < 6 */ #define MFHI (HI(0) | LO(16)) #define MFLO (HI(0) | LO(18)) +#endif /* SLJIT_MIPS_REV >= 6 */ #define MOV_S (HI(17) | FMT_S | LO(6)) #define MTC1 (HI(17) | (4 << 21)) -#define MUL_S (HI(17) | FMT_S | LO(2)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define MUH (HI(0) | (3 << 6) | LO(24)) +#define MUHU (HI(0) | (3 << 6) | LO(25)) +#define MUL (HI(0) | (2 << 6) | LO(24)) +#define MULU (HI(0) | (2 << 6) | LO(25)) +#else /* SLJIT_MIPS_REV < 6 */ #define MULT (HI(0) | LO(24)) #define MULTU (HI(0) | LO(25)) +#endif /* SLJIT_MIPS_REV >= 6 */ +#define MUL_S (HI(17) | FMT_S | LO(2)) #define NEG_S (HI(17) | FMT_S | LO(7)) #define NOP (HI(0) | LO(0)) #define NOR 
(HI(0) | LO(39)) @@ -188,19 +252,23 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { #define XOR (HI(0) | LO(38)) #define XORI (HI(14)) -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) #define CLZ (HI(28) | LO(32)) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#define DCLZ (LO(18)) +#else /* SLJIT_MIPS_REV < 6 */ #define DCLZ (HI(28) | LO(36)) #define MOVF (HI(0) | (0 << 16) | LO(1)) #define MOVN (HI(0) | LO(11)) #define MOVT (HI(0) | (1 << 16) | LO(1)) #define MOVZ (HI(0) | LO(10)) #define MUL (HI(28) | LO(2)) +#endif /* SLJIT_MIPS_REV >= 6 */ #define PREF (HI(51)) #define PREFX (HI(19) | LO(15)) #define SEB (HI(31) | (16 << 6) | LO(32)) #define SEH (HI(31) | (24 << 6) | LO(32)) -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ #if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) #define ADDU_W ADDU @@ -222,9 +290,9 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { Useful for reordering instructions in the delay slot. */ static sljit_s32 push_inst(struct sljit_compiler *compiler, sljit_ins ins, sljit_s32 delay_slot) { + sljit_ins *ptr = (sljit_ins*)ensure_buf(compiler, sizeof(sljit_ins)); SLJIT_ASSERT(delay_slot == MOVABLE_INS || delay_slot >= UNMOVABLE_INS || delay_slot == ((ins >> 11) & 0x1f) || delay_slot == ((ins >> 16) & 0x1f)); - sljit_ins *ptr = (sljit_ins*)ensure_buf(compiler, sizeof(sljit_ins)); FAIL_IF(!ptr); *ptr = ins; compiler->size++; @@ -234,7 +302,13 @@ static sljit_s32 push_inst(struct sljit_compiler *compiler, sljit_ins ins, sljit static SLJIT_INLINE sljit_ins invert_branch(sljit_s32 flags) { - return (flags & IS_BIT26_COND) ? 
(1 << 26) : (1 << 16); + if (flags & IS_BIT26_COND) + return (1 << 26); +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + if (flags & IS_BIT23_COND) + return (1 << 23); +#endif /* SLJIT_MIPS_REV >= 6 */ + return (1 << 16); } static SLJIT_INLINE sljit_ins* detect_jump_type(struct sljit_jump *jump, sljit_ins *code_ptr, sljit_ins *code, sljit_sw executable_offset) @@ -376,6 +450,55 @@ static __attribute__ ((noinline)) void sljit_cache_flush(void* code, void* code_ } #endif +#if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) + +static SLJIT_INLINE sljit_sw put_label_get_length(struct sljit_put_label *put_label, sljit_uw max_label) +{ + if (max_label < 0x80000000l) { + put_label->flags = 0; + return 1; + } + + if (max_label < 0x800000000000l) { + put_label->flags = 1; + return 3; + } + + put_label->flags = 2; + return 5; +} + +static SLJIT_INLINE void put_label_set(struct sljit_put_label *put_label) +{ + sljit_uw addr = put_label->label->addr; + sljit_ins *inst = (sljit_ins *)put_label->addr; + sljit_s32 reg = *inst; + + if (put_label->flags == 0) { + SLJIT_ASSERT(addr < 0x80000000l); + inst[0] = LUI | T(reg) | IMM(addr >> 16); + } + else if (put_label->flags == 1) { + SLJIT_ASSERT(addr < 0x800000000000l); + inst[0] = LUI | T(reg) | IMM(addr >> 32); + inst[1] = ORI | S(reg) | T(reg) | IMM((addr >> 16) & 0xffff); + inst[2] = DSLL | T(reg) | D(reg) | SH_IMM(16); + inst += 2; + } + else { + inst[0] = LUI | T(reg) | IMM(addr >> 48); + inst[1] = ORI | S(reg) | T(reg) | IMM((addr >> 32) & 0xffff); + inst[2] = DSLL | T(reg) | D(reg) | SH_IMM(16); + inst[3] = ORI | S(reg) | T(reg) | IMM((addr >> 16) & 0xffff); + inst[4] = DSLL | T(reg) | D(reg) | SH_IMM(16); + inst += 4; + } + + inst[1] = ORI | S(reg) | T(reg) | IMM(addr & 0xffff); +} + +#endif + SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compiler) { struct sljit_memory_fragment *buf; @@ -384,56 +507,73 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler 
*compil sljit_ins *buf_ptr; sljit_ins *buf_end; sljit_uw word_count; + sljit_uw next_addr; sljit_sw executable_offset; sljit_uw addr; struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); reverse_buf(compiler); - code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins)); + code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins), compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; code_ptr = code; word_count = 0; + next_addr = 0; executable_offset = SLJIT_EXEC_OFFSET(code); label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; do { buf_ptr = (sljit_ins*)buf->memory; buf_end = buf_ptr + (buf->used_size >> 2); do { *code_ptr = *buf_ptr++; - SLJIT_ASSERT(!label || label->size >= word_count); - SLJIT_ASSERT(!jump || jump->addr >= word_count); - SLJIT_ASSERT(!const_ || const_->addr >= word_count); - /* These structures are ordered by their address. */ - if (label && label->size == word_count) { - label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); - label->size = code_ptr - code; - label = label->next; - } - if (jump && jump->addr == word_count) { + if (next_addr == word_count) { + SLJIT_ASSERT(!label || label->size >= word_count); + SLJIT_ASSERT(!jump || jump->addr >= word_count); + SLJIT_ASSERT(!const_ || const_->addr >= word_count); + SLJIT_ASSERT(!put_label || put_label->addr >= word_count); + + /* These structures are ordered by their address. 
*/ + if (label && label->size == word_count) { + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + label->size = code_ptr - code; + label = label->next; + } + if (jump && jump->addr == word_count) { #if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) - jump->addr = (sljit_uw)(code_ptr - 3); + jump->addr = (sljit_uw)(code_ptr - 3); #else - jump->addr = (sljit_uw)(code_ptr - 7); + jump->addr = (sljit_uw)(code_ptr - 7); #endif - code_ptr = detect_jump_type(jump, code_ptr, code, executable_offset); - jump = jump->next; - } - if (const_ && const_->addr == word_count) { - /* Just recording the address. */ - const_->addr = (sljit_uw)code_ptr; - const_ = const_->next; + code_ptr = detect_jump_type(jump, code_ptr, code, executable_offset); + jump = jump->next; + } + if (const_ && const_->addr == word_count) { + const_->addr = (sljit_uw)code_ptr; + const_ = const_->next; + } + if (put_label && put_label->addr == word_count) { + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; +#if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) + code_ptr += put_label_get_length(put_label, (sljit_uw)(SLJIT_ADD_EXEC_OFFSET(code, executable_offset) + put_label->label->size)); + word_count += 5; +#endif + put_label = put_label->next; + } + next_addr = compute_next_addr(label, jump, const_, put_label); } code_ptr ++; word_count ++; @@ -451,6 +591,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + SLJIT_ASSERT(!put_label); SLJIT_ASSERT(code_ptr - code <= (sljit_sw)compiler->size); jump = compiler->jumps; @@ -498,6 +639,21 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil jump = jump->next; } + put_label = compiler->put_labels; + while (put_label) { +#if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) + addr = put_label->label->addr; + buf_ptr = (sljit_ins *)put_label->addr; + + 
SLJIT_ASSERT((buf_ptr[0] & 0xffe00000) == LUI && (buf_ptr[1] & 0xfc000000) == ORI); + buf_ptr[0] |= (addr >> 16) & 0xffff; + buf_ptr[1] |= addr & 0xffff; +#else + put_label_set(put_label); +#endif + put_label = put_label->next; + } + compiler->error = SLJIT_ERR_COMPILED; compiler->executable_offset = executable_offset; compiler->executable_size = (code_ptr - code) * sizeof(sljit_ins); @@ -511,6 +667,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil /* GCC workaround for invalid code generation with -O2. */ sljit_cache_flush(code, code_ptr); #endif + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); return code; } @@ -523,17 +680,20 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) #ifdef SLJIT_IS_FPU_AVAILABLE return SLJIT_IS_FPU_AVAILABLE; #elif defined(__GNUC__) - asm ("cfc1 %0, $0" : "=r"(fir)); + __asm__ ("cfc1 %0, $0" : "=r"(fir)); return (fir >> 22) & 0x1; #else #error "FIR check is not implemented for this architecture" #endif + case SLJIT_HAS_ZERO_REGISTER: + return 1; -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) case SLJIT_HAS_CLZ: case SLJIT_HAS_CMOV: + case SLJIT_HAS_PREFETCH: return 1; -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ default: return fir; @@ -1075,40 +1235,71 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile return push_inst(compiler, NOP, UNMOVABLE_INS); case SLJIT_LMUL_UW: case SLJIT_LMUL_SW: +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) + FAIL_IF(push_inst(compiler, (op == SLJIT_LMUL_UW ? DMULU : DMUL) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG3), DR(TMP_REG3))); + FAIL_IF(push_inst(compiler, (op == SLJIT_LMUL_UW ? DMUHU : DMUH) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG1), DR(TMP_REG1))); +#else /* !SLJIT_CONFIG_MIPS_64 */ + FAIL_IF(push_inst(compiler, (op == SLJIT_LMUL_UW ? 
MULU : MUL) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG3), DR(TMP_REG3))); + FAIL_IF(push_inst(compiler, (op == SLJIT_LMUL_UW ? MUHU : MUH) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG1), DR(TMP_REG1))); +#endif /* SLJIT_CONFIG_MIPS_64 */ + FAIL_IF(push_inst(compiler, ADDU_W | S(TMP_REG3) | TA(0) | D(SLJIT_R0), DR(SLJIT_R0))); + return push_inst(compiler, ADDU_W | S(TMP_REG1) | TA(0) | D(SLJIT_R1), DR(SLJIT_R1)); +#else /* SLJIT_MIPS_REV < 6 */ #if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) FAIL_IF(push_inst(compiler, (op == SLJIT_LMUL_UW ? DMULTU : DMULT) | S(SLJIT_R0) | T(SLJIT_R1), MOVABLE_INS)); -#else +#else /* !SLJIT_CONFIG_MIPS_64 */ FAIL_IF(push_inst(compiler, (op == SLJIT_LMUL_UW ? MULTU : MULT) | S(SLJIT_R0) | T(SLJIT_R1), MOVABLE_INS)); -#endif +#endif /* SLJIT_CONFIG_MIPS_64 */ FAIL_IF(push_inst(compiler, MFLO | D(SLJIT_R0), DR(SLJIT_R0))); return push_inst(compiler, MFHI | D(SLJIT_R1), DR(SLJIT_R1)); +#endif /* SLJIT_MIPS_REV >= 6 */ case SLJIT_DIVMOD_UW: case SLJIT_DIVMOD_SW: case SLJIT_DIV_UW: case SLJIT_DIV_SW: SLJIT_COMPILE_ASSERT((SLJIT_DIVMOD_UW & 0x2) == 0 && SLJIT_DIV_UW - 0x2 == SLJIT_DIVMOD_UW, bad_div_opcode_assignments); -#if !(defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) +#if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) + if (int_op) { + FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? DIVU : DIV) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG3), DR(TMP_REG3))); + FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? MODU : MOD) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG1), DR(TMP_REG1))); + } + else { + FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? DDIVU : DDIV) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG3), DR(TMP_REG3))); + FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? DMODU : DMOD) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG1), DR(TMP_REG1))); + } +#else /* !SLJIT_CONFIG_MIPS_64 */ + FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? 
DIVU : DIV) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG3), DR(TMP_REG3))); + FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? MODU : MOD) | S(SLJIT_R0) | T(SLJIT_R1) | D(TMP_REG1), DR(TMP_REG1))); +#endif /* SLJIT_CONFIG_MIPS_64 */ + FAIL_IF(push_inst(compiler, ADDU_W | S(TMP_REG3) | TA(0) | D(SLJIT_R0), DR(SLJIT_R0))); + return (op >= SLJIT_DIV_UW) ? SLJIT_SUCCESS : push_inst(compiler, ADDU_W | S(TMP_REG1) | TA(0) | D(SLJIT_R1), DR(SLJIT_R1)); +#else /* SLJIT_MIPS_REV < 6 */ +#if !(defined SLJIT_MIPS_REV) FAIL_IF(push_inst(compiler, NOP, UNMOVABLE_INS)); FAIL_IF(push_inst(compiler, NOP, UNMOVABLE_INS)); -#endif - +#endif /* !SLJIT_MIPS_REV */ #if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) if (int_op) FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? DIVU : DIV) | S(SLJIT_R0) | T(SLJIT_R1), MOVABLE_INS)); else FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? DDIVU : DDIV) | S(SLJIT_R0) | T(SLJIT_R1), MOVABLE_INS)); -#else +#else /* !SLJIT_CONFIG_MIPS_64 */ FAIL_IF(push_inst(compiler, ((op | 0x2) == SLJIT_DIV_UW ? DIVU : DIV) | S(SLJIT_R0) | T(SLJIT_R1), MOVABLE_INS)); -#endif - +#endif /* SLJIT_CONFIG_MIPS_64 */ FAIL_IF(push_inst(compiler, MFLO | D(SLJIT_R0), DR(SLJIT_R0))); return (op >= SLJIT_DIV_UW) ? 
SLJIT_SUCCESS : push_inst(compiler, MFHI | D(SLJIT_R1), DR(SLJIT_R1)); +#endif /* SLJIT_MIPS_REV >= 6 */ + case SLJIT_ENDBR: + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; } return SLJIT_SUCCESS; } -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) static sljit_s32 emit_prefetch(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) { @@ -1129,7 +1320,7 @@ static sljit_s32 emit_prefetch(struct sljit_compiler *compiler, return push_inst(compiler, PREFX | S(src & REG_MASK) | T(OFFS_REG(src)), MOVABLE_INS); } -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 dst, sljit_sw dstw, @@ -1146,14 +1337,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile ADJUST_LOCAL_OFFSET(dst, dstw); ADJUST_LOCAL_OFFSET(src, srcw); - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) - if (op <= SLJIT_MOV_P && (src & SLJIT_MEM)) - return emit_prefetch(compiler, src, srcw); -#endif - return SLJIT_SUCCESS; - } - #if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) if ((op & SLJIT_I32_OP) && GET_OPCODE(op) >= SLJIT_NOT) flags |= INT_DATA | SIGNED_DATA; @@ -1280,6 +1463,38 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile #endif } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + if (FAST_IS_REG(src)) + FAIL_IF(push_inst(compiler, ADDU_W | S(src) | TA(0) | DA(RETURN_ADDR_REG), RETURN_ADDR_REG)); + else + FAIL_IF(emit_op_mem(compiler, WORD_DATA | LOAD_DATA, RETURN_ADDR_REG, src, srcw)); + + FAIL_IF(push_inst(compiler, JR | SA(RETURN_ADDR_REG), UNMOVABLE_INS)); + return 
push_inst(compiler, NOP, UNMOVABLE_INS); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + return SLJIT_SUCCESS; + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1) + return emit_prefetch(compiler, src, srcw); +#else /* SLJIT_MIPS_REV < 1 */ + return SLJIT_SUCCESS; +#endif /* SLJIT_MIPS_REV >= 1 */ + } + + return SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -1408,8 +1623,7 @@ static SLJIT_INLINE sljit_s32 sljit_emit_fop1_cmp(struct sljit_compiler *compile inst = C_UN_S; break; } - - return push_inst(compiler, inst | FMT(op) | FT(src2) | FS(src1), UNMOVABLE_INS); + return push_inst(compiler, inst | FMT(op) | FT(src2) | FS(src1) | C_FD, UNMOVABLE_INS); } SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fop1(struct sljit_compiler *compiler, sljit_s32 op, @@ -1550,25 +1764,12 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * ADJUST_LOCAL_OFFSET(dst, dstw); if (FAST_IS_REG(dst)) - return push_inst(compiler, ADDU_W | SA(RETURN_ADDR_REG) | TA(0) | D(dst), DR(dst)); + return push_inst(compiler, ADDU_W | SA(RETURN_ADDR_REG) | TA(0) | D(dst), UNMOVABLE_INS); /* Memory. 
*/ - return emit_op_mem(compiler, WORD_DATA, RETURN_ADDR_REG, dst, dstw); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - if (FAST_IS_REG(src)) - FAIL_IF(push_inst(compiler, ADDU_W | S(src) | TA(0) | DA(RETURN_ADDR_REG), RETURN_ADDR_REG)); - else - FAIL_IF(emit_op_mem(compiler, WORD_DATA | LOAD_DATA, RETURN_ADDR_REG, src, srcw)); - - FAIL_IF(push_inst(compiler, JR | SA(RETURN_ADDR_REG), UNMOVABLE_INS)); - return push_inst(compiler, NOP, UNMOVABLE_INS); + FAIL_IF(emit_op_mem(compiler, WORD_DATA, RETURN_ADDR_REG, dst, dstw)); + compiler->delay_slot = UNMOVABLE_INS; + return SLJIT_SUCCESS; } /* --------------------------------------------------------------------- */ @@ -1608,16 +1809,30 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_label* sljit_emit_label(struct sljit_compi flags = IS_BIT26_COND; \ delay_check = src; +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + +#define BR_T() \ + inst = BC1NEZ; \ + flags = IS_BIT23_COND; \ + delay_check = FCSR_FCC; +#define BR_F() \ + inst = BC1EQZ; \ + flags = IS_BIT23_COND; \ + delay_check = FCSR_FCC; + +#else /* SLJIT_MIPS_REV < 6 */ + #define BR_T() \ inst = BC1T | JUMP_LENGTH; \ flags = IS_BIT16_COND; \ delay_check = FCSR_FCC; - #define BR_F() \ inst = BC1F | JUMP_LENGTH; \ flags = IS_BIT16_COND; \ delay_check = FCSR_FCC; +#endif /* SLJIT_MIPS_REV >= 6 */ + SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump* sljit_emit_jump(struct sljit_compiler *compiler, sljit_s32 type) { struct sljit_jump *jump; @@ -1927,7 +2142,11 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_flags(struct sljit_compiler *co case SLJIT_GREATER_EQUAL_F64: case SLJIT_UNORDERED_F64: case SLJIT_ORDERED_F64: +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 6) + FAIL_IF(push_inst(compiler, MFC1 | TA(dst_ar) | FS(TMP_FREG3), dst_ar)); +#else /* SLJIT_MIPS_REV < 6 */ 
FAIL_IF(push_inst(compiler, CFC1 | TA(dst_ar) | DA(FCSR_REG), dst_ar)); +#endif /* SLJIT_MIPS_REV >= 6 */ FAIL_IF(push_inst(compiler, SRL | TA(dst_ar) | DA(dst_ar) | SH_IMM(23), dst_ar)); FAIL_IF(push_inst(compiler, ANDI | SA(dst_ar) | TA(dst_ar) | IMM(1), dst_ar)); src_ar = dst_ar; @@ -1967,14 +2186,14 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_cmov(struct sljit_compiler *compil sljit_s32 dst_reg, sljit_s32 src, sljit_sw srcw) { -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1 && SLJIT_MIPS_REV < 6) sljit_ins ins; -#endif +#endif /* SLJIT_MIPS_REV >= 1 && SLJIT_MIPS_REV < 6 */ CHECK_ERROR(); CHECK(check_sljit_emit_cmov(compiler, type, dst_reg, src, srcw)); -#if (defined SLJIT_MIPS_R1 && SLJIT_MIPS_R1) +#if (defined SLJIT_MIPS_REV && SLJIT_MIPS_REV >= 1 && SLJIT_MIPS_REV < 6) if (SLJIT_UNLIKELY(src & SLJIT_IMM)) { #if (defined SLJIT_CONFIG_MIPS_64 && SLJIT_CONFIG_MIPS_64) @@ -2031,15 +2250,15 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_cmov(struct sljit_compiler *compil return push_inst(compiler, ins | S(src) | D(dst_reg), DR(dst_reg)); -#else +#else /* SLJIT_MIPS_REV < 1 || SLJIT_MIPS_REV >= 6 */ return sljit_emit_cmov_generic(compiler, type, dst_reg, src, srcw); -#endif +#endif /* SLJIT_MIPS_REV >= 1 */ } SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value) { struct sljit_const *const_; - sljit_s32 reg; + sljit_s32 dst_r; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_emit_const(compiler, dst, dstw, init_value)); @@ -2049,11 +2268,38 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi PTR_FAIL_IF(!const_); set_const(const_, compiler); - reg = FAST_IS_REG(dst) ? dst : TMP_REG2; - - PTR_FAIL_IF(emit_const(compiler, reg, init_value)); + dst_r = FAST_IS_REG(dst) ? 
dst : TMP_REG2; + PTR_FAIL_IF(emit_const(compiler, dst_r, init_value)); if (dst & SLJIT_MEM) PTR_FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, TMP_REG2, 0)); + return const_; } + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_s32 dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); + + dst_r = FAST_IS_REG(dst) ? dst : TMP_REG2; +#if (defined SLJIT_CONFIG_MIPS_32 && SLJIT_CONFIG_MIPS_32) + PTR_FAIL_IF(emit_const(compiler, dst_r, 0)); +#else + PTR_FAIL_IF(push_inst(compiler, dst_r, UNMOVABLE_INS)); + compiler->size += 5; +#endif + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, TMP_REG2, 0)); + + return put_label; +} diff --git a/src/pcre/sljit/sljitNativePPC_32.c b/src/pcre2/src/sljit/sljitNativePPC_32.c similarity index 97% rename from src/pcre/sljit/sljitNativePPC_32.c rename to src/pcre2/src/sljit/sljitNativePPC_32.c index fc185f78..7d9ec533 100644 --- a/src/pcre/sljit/sljitNativePPC_32.c +++ b/src/pcre2/src/sljit/sljitNativePPC_32.c @@ -258,19 +258,18 @@ static SLJIT_INLINE sljit_s32 emit_const(struct sljit_compiler *compiler, sljit_ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_ins *inst = (sljit_ins *)addr; + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 0); + SLJIT_ASSERT((inst[0] & 0xfc1f0000) == ADDIS && (inst[1] & 0xfc000000) == ORI); inst[0] = (inst[0] & 0xffff0000) | ((new_target >> 16) & 0xffff); inst[1] = (inst[1] & 0xffff0000) | (new_target & 0xffff); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = 
(sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 2); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_ins *inst = (sljit_ins *)addr; - - inst[0] = (inst[0] & 0xffff0000) | ((new_constant >> 16) & 0xffff); - inst[1] = (inst[1] & 0xffff0000) | (new_constant & 0xffff); - inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 2); + sljit_set_jump_addr(addr, new_constant, executable_offset); } diff --git a/src/pcre/sljit/sljitNativePPC_64.c b/src/pcre2/src/sljit/sljitNativePPC_64.c similarity index 96% rename from src/pcre/sljit/sljitNativePPC_64.c rename to src/pcre2/src/sljit/sljitNativePPC_64.c index 706b2ba2..92147d2a 100644 --- a/src/pcre/sljit/sljitNativePPC_64.c +++ b/src/pcre2/src/sljit/sljitNativePPC_64.c @@ -35,9 +35,6 @@ #error "Must implement count leading zeroes" #endif -#define RLDI(dst, src, sh, mb, type) \ - (HI(30) | S(src) | A(dst) | ((type) << 2) | (((sh) & 0x1f) << 11) | (((sh) & 0x20) >> 4) | (((mb) & 0x1f) << 6) | ((mb) & 0x20)) - #define PUSH_RLDICR(reg, shift) \ push_inst(compiler, RLDI(reg, reg, 63 - shift, shift, 1)) @@ -480,23 +477,19 @@ static SLJIT_INLINE sljit_s32 emit_const(struct sljit_compiler *compiler, sljit_ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_ins *inst = (sljit_ins*)addr; + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 5, 0); inst[0] = (inst[0] & 0xffff0000) | ((new_target >> 48) & 0xffff); inst[1] = (inst[1] & 0xffff0000) | ((new_target >> 32) & 0xffff); inst[3] = (inst[3] & 0xffff0000) | ((new_target >> 16) & 0xffff); inst[4] = (inst[4] & 0xffff0000) | (new_target & 0xffff); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 5, 1); inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 5); } SLJIT_API_FUNC_ATTRIBUTE void 
sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_ins *inst = (sljit_ins*)addr; - - inst[0] = (inst[0] & 0xffff0000) | ((new_constant >> 48) & 0xffff); - inst[1] = (inst[1] & 0xffff0000) | ((new_constant >> 32) & 0xffff); - inst[3] = (inst[3] & 0xffff0000) | ((new_constant >> 16) & 0xffff); - inst[4] = (inst[4] & 0xffff0000) | (new_constant & 0xffff); - inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 5); + sljit_set_jump_addr(addr, new_constant, executable_offset); } diff --git a/src/pcre/sljit/sljitNativePPC_common.c b/src/pcre2/src/sljit/sljitNativePPC_common.c similarity index 92% rename from src/pcre/sljit/sljitNativePPC_common.c rename to src/pcre2/src/sljit/sljitNativePPC_common.c index b34e3965..d84562ce 100644 --- a/src/pcre/sljit/sljitNativePPC_common.c +++ b/src/pcre2/src/sljit/sljitNativePPC_common.c @@ -231,6 +231,9 @@ static const sljit_u8 freg_map[SLJIT_NUMBER_OF_FLOAT_REGISTERS + 3] = { #define SIMM_MIN (-0x8000) #define UIMM_MAX (0xffff) +#define RLDI(dst, src, sh, mb, type) \ + (HI(30) | S(src) | A(dst) | ((type) << 2) | (((sh) & 0x1f) << 11) | (((sh) & 0x20) >> 4) | (((mb) & 0x1f) << 6) | ((mb) & 0x20)) + #if (defined SLJIT_INDIRECT_CALL && SLJIT_INDIRECT_CALL) SLJIT_API_FUNC_ATTRIBUTE void sljit_set_function_context(void** func_ptr, struct sljit_function_context* context, sljit_sw addr, void* func) { @@ -324,6 +327,55 @@ static SLJIT_INLINE sljit_s32 detect_jump_type(struct sljit_jump *jump, sljit_in return 0; } +#if (defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) + +static SLJIT_INLINE sljit_sw put_label_get_length(struct sljit_put_label *put_label, sljit_uw max_label) +{ + if (max_label < 0x100000000l) { + put_label->flags = 0; + return 1; + } + + if (max_label < 0x1000000000000l) { + put_label->flags = 1; + return 3; + } + + put_label->flags = 2; + return 4; +} + +static SLJIT_INLINE void put_label_set(struct sljit_put_label *put_label) +{ + 
sljit_uw addr = put_label->label->addr; + sljit_ins *inst = (sljit_ins *)put_label->addr; + sljit_s32 reg = *inst; + + if (put_label->flags == 0) { + SLJIT_ASSERT(addr < 0x100000000l); + inst[0] = ORIS | S(TMP_ZERO) | A(reg) | IMM(addr >> 16); + } + else { + if (put_label->flags == 1) { + SLJIT_ASSERT(addr < 0x1000000000000l); + inst[0] = ORI | S(TMP_ZERO) | A(reg) | IMM(addr >> 32); + } + else { + inst[0] = ORIS | S(TMP_ZERO) | A(reg) | IMM(addr >> 48); + inst[1] = ORI | S(reg) | A(reg) | IMM((addr >> 32) & 0xffff); + inst ++; + } + + inst[1] = RLDI(reg, reg, 32, 31, 1); + inst[2] = ORIS | S(reg) | A(reg) | IMM((addr >> 16) & 0xffff); + inst += 2; + } + + inst[1] = ORI | S(reg) | A(reg) | IMM(addr & 0xffff); +} + +#endif + SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compiler) { struct sljit_memory_fragment *buf; @@ -332,12 +384,14 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil sljit_ins *buf_ptr; sljit_ins *buf_end; sljit_uw word_count; + sljit_uw next_addr; sljit_sw executable_offset; sljit_uw addr; struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); @@ -350,77 +404,93 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil compiler->size += (sizeof(struct sljit_function_context) / sizeof(sljit_ins)); #endif #endif - code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins)); + code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins), compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; code_ptr = code; word_count = 0; + next_addr = 0; executable_offset = SLJIT_EXEC_OFFSET(code); label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; do { buf_ptr = (sljit_ins*)buf->memory; buf_end = buf_ptr + (buf->used_size >> 2); do { 
*code_ptr = *buf_ptr++; - SLJIT_ASSERT(!label || label->size >= word_count); - SLJIT_ASSERT(!jump || jump->addr >= word_count); - SLJIT_ASSERT(!const_ || const_->addr >= word_count); - /* These structures are ordered by their address. */ - if (label && label->size == word_count) { - /* Just recording the address. */ - label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); - label->size = code_ptr - code; - label = label->next; - } - if (jump && jump->addr == word_count) { + if (next_addr == word_count) { + SLJIT_ASSERT(!label || label->size >= word_count); + SLJIT_ASSERT(!jump || jump->addr >= word_count); + SLJIT_ASSERT(!const_ || const_->addr >= word_count); + SLJIT_ASSERT(!put_label || put_label->addr >= word_count); + + /* These structures are ordered by their address. */ + if (label && label->size == word_count) { + /* Just recording the address. */ + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + label->size = code_ptr - code; + label = label->next; + } + if (jump && jump->addr == word_count) { #if (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) - jump->addr = (sljit_uw)(code_ptr - 3); + jump->addr = (sljit_uw)(code_ptr - 3); #else - jump->addr = (sljit_uw)(code_ptr - 6); + jump->addr = (sljit_uw)(code_ptr - 6); #endif - if (detect_jump_type(jump, code_ptr, code, executable_offset)) { + if (detect_jump_type(jump, code_ptr, code, executable_offset)) { #if (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) - code_ptr[-3] = code_ptr[0]; - code_ptr -= 3; -#else - if (jump->flags & PATCH_ABS32) { + code_ptr[-3] = code_ptr[0]; code_ptr -= 3; - code_ptr[-1] = code_ptr[2]; - code_ptr[0] = code_ptr[3]; - } - else if (jump->flags & PATCH_ABS48) { - code_ptr--; - code_ptr[-1] = code_ptr[0]; - code_ptr[0] = code_ptr[1]; - /* rldicr rX,rX,32,31 -> rX,rX,16,47 */ - SLJIT_ASSERT((code_ptr[-3] & 0xfc00ffff) == 0x780007c6); - code_ptr[-3] ^= 0x8422; - /* oris -> ori */ - code_ptr[-2] ^= 0x4000000; - } - else { - 
code_ptr[-6] = code_ptr[0]; - code_ptr -= 6; - } +#else + if (jump->flags & PATCH_ABS32) { + code_ptr -= 3; + code_ptr[-1] = code_ptr[2]; + code_ptr[0] = code_ptr[3]; + } + else if (jump->flags & PATCH_ABS48) { + code_ptr--; + code_ptr[-1] = code_ptr[0]; + code_ptr[0] = code_ptr[1]; + /* rldicr rX,rX,32,31 -> rX,rX,16,47 */ + SLJIT_ASSERT((code_ptr[-3] & 0xfc00ffff) == 0x780007c6); + code_ptr[-3] ^= 0x8422; + /* oris -> ori */ + code_ptr[-2] ^= 0x4000000; + } + else { + code_ptr[-6] = code_ptr[0]; + code_ptr -= 6; + } #endif - if (jump->flags & REMOVE_COND) { - code_ptr[0] = BCx | (2 << 2) | ((code_ptr[0] ^ (8 << 21)) & 0x03ff0001); - code_ptr++; - jump->addr += sizeof(sljit_ins); - code_ptr[0] = Bx; - jump->flags -= IS_COND; + if (jump->flags & REMOVE_COND) { + code_ptr[0] = BCx | (2 << 2) | ((code_ptr[0] ^ (8 << 21)) & 0x03ff0001); + code_ptr++; + jump->addr += sizeof(sljit_ins); + code_ptr[0] = Bx; + jump->flags -= IS_COND; + } } + jump = jump->next; } - jump = jump->next; - } - if (const_ && const_->addr == word_count) { - const_->addr = (sljit_uw)code_ptr; - const_ = const_->next; + if (const_ && const_->addr == word_count) { + const_->addr = (sljit_uw)code_ptr; + const_ = const_->next; + } + if (put_label && put_label->addr == word_count) { + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; +#if (defined SLJIT_CONFIG_PPC_64 && SLJIT_CONFIG_PPC_64) + code_ptr += put_label_get_length(put_label, (sljit_uw)(SLJIT_ADD_EXEC_OFFSET(code, executable_offset) + put_label->label->size)); + word_count += 4; +#endif + put_label = put_label->next; + } + next_addr = compute_next_addr(label, jump, const_, put_label); } code_ptr ++; word_count ++; @@ -438,6 +508,8 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + SLJIT_ASSERT(!put_label); + #if (defined SLJIT_INDIRECT_CALL && SLJIT_INDIRECT_CALL) SLJIT_ASSERT(code_ptr - code <= (sljit_sw)compiler->size 
- (sizeof(struct sljit_function_context) / sizeof(sljit_ins))); #else @@ -503,6 +575,21 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil jump = jump->next; } + put_label = compiler->put_labels; + while (put_label) { +#if (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) + addr = put_label->label->addr; + buf_ptr = (sljit_ins *)put_label->addr; + + SLJIT_ASSERT((buf_ptr[0] & 0xfc1f0000) == ADDIS && (buf_ptr[1] & 0xfc000000) == ORI); + buf_ptr[0] |= (addr >> 16) & 0xffff; + buf_ptr[1] |= addr & 0xffff; +#else + put_label_set(put_label); +#endif + put_label = put_label->next; + } + compiler->error = SLJIT_ERR_COMPILED; compiler->executable_offset = executable_offset; compiler->executable_size = (code_ptr - code) * sizeof(sljit_ins); @@ -520,6 +607,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil code_ptr = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); SLJIT_CACHE_FLUSH(code, code_ptr); + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); #if (defined SLJIT_INDIRECT_CALL && SLJIT_INDIRECT_CALL) return code_ptr; @@ -539,7 +627,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) return 1; #endif + /* A saved register is set to a zero value. */ + case SLJIT_HAS_ZERO_REGISTER: case SLJIT_HAS_CLZ: + case SLJIT_HAS_PREFETCH: return 1; default: @@ -1071,6 +1162,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile #else return push_inst(compiler, (op == SLJIT_DIV_UW ? 
DIVWU : DIVW) | D(SLJIT_R0) | A(SLJIT_R0) | B(SLJIT_R1)); #endif + case SLJIT_ENDBR: + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; } return SLJIT_SUCCESS; @@ -1116,13 +1210,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile ADJUST_LOCAL_OFFSET(dst, dstw); ADJUST_LOCAL_OFFSET(src, srcw); - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) { - if (op <= SLJIT_MOV_P && (src & SLJIT_MEM)) - return emit_prefetch(compiler, src, srcw); - - return SLJIT_SUCCESS; - } - op = GET_OPCODE(op); if ((src & SLJIT_IMM) && srcw == 0) src = TMP_ZERO; @@ -1449,6 +1536,35 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return SLJIT_SUCCESS; } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + if (FAST_IS_REG(src)) + FAIL_IF(push_inst(compiler, MTLR | S(src))); + else { + FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, TMP_REG2, 0, TMP_REG1, 0, src, srcw)); + FAIL_IF(push_inst(compiler, MTLR | S(TMP_REG2))); + } + + return push_inst(compiler, BLR); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + return SLJIT_SUCCESS; + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: + return emit_prefetch(compiler, src, srcw); + } + + return SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -1767,22 +1883,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, TMP_REG2, 0); } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - 
CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - if (FAST_IS_REG(src)) - FAIL_IF(push_inst(compiler, MTLR | S(src))); - else { - FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, TMP_REG2, 0, TMP_REG1, 0, src, srcw)); - FAIL_IF(push_inst(compiler, MTLR | S(TMP_REG2))); - } - - return push_inst(compiler, BLR); -} - /* --------------------------------------------------------------------- */ /* Conditional instructions */ /* --------------------------------------------------------------------- */ @@ -2261,7 +2361,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fmem(struct sljit_compiler *compil SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value) { struct sljit_const *const_; - sljit_s32 reg; + sljit_s32 dst_r; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_emit_const(compiler, dst, dstw, init_value)); @@ -2271,11 +2371,38 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi PTR_FAIL_IF(!const_); set_const(const_, compiler); - reg = FAST_IS_REG(dst) ? dst : TMP_REG2; - - PTR_FAIL_IF(emit_const(compiler, reg, init_value)); + dst_r = FAST_IS_REG(dst) ? dst : TMP_REG2; + PTR_FAIL_IF(emit_const(compiler, dst_r, init_value)); if (dst & SLJIT_MEM) PTR_FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, TMP_REG2, 0)); + return const_; } + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_s32 dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); + + dst_r = FAST_IS_REG(dst) ? 
dst : TMP_REG2; +#if (defined SLJIT_CONFIG_PPC_32 && SLJIT_CONFIG_PPC_32) + PTR_FAIL_IF(emit_const(compiler, dst_r, 0)); +#else + PTR_FAIL_IF(push_inst(compiler, dst_r)); + compiler->size += 4; +#endif + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(emit_op(compiler, SLJIT_MOV, WORD_DATA, dst, dstw, TMP_REG1, 0, TMP_REG2, 0)); + + return put_label; +} diff --git a/src/pcre2/src/sljit/sljitNativeS390X.c b/src/pcre2/src/sljit/sljitNativeS390X.c new file mode 100644 index 00000000..3d007fe8 --- /dev/null +++ b/src/pcre2/src/sljit/sljitNativeS390X.c @@ -0,0 +1,2813 @@ +/* + * Stack-less Just-In-Time compiler + * + * Copyright Zoltan Herczeg (hzmester@freemail.hu). All rights reserved. + * + * Redistribution and use in source and binary forms, with or without modification, are + * permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, this list of + * conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright notice, this list + * of conditions and the following disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) AND CONTRIBUTORS ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT + * SHALL THE COPYRIGHT HOLDER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <sys/auxv.h> + +#ifdef __ARCH__ +#define ENABLE_STATIC_FACILITY_DETECTION 1 +#else +#define ENABLE_STATIC_FACILITY_DETECTION 0 +#endif +#define ENABLE_DYNAMIC_FACILITY_DETECTION 1 + +SLJIT_API_FUNC_ATTRIBUTE const char* sljit_get_platform_name(void) +{ + return "s390x" SLJIT_CPUINFO; +} + +/* Instructions. */ +typedef sljit_uw sljit_ins; + +/* Instruction tags (most significant halfword). */ +static const sljit_ins sljit_ins_const = (sljit_ins)1 << 48; + +static const sljit_u8 reg_map[SLJIT_NUMBER_OF_REGISTERS + 4] = { + 14, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 0, 1 +}; + +/* there are also a[2-15] available, but they are slower to access and + * their use is limited as mundaym explained: + * https://github.com/zherczeg/sljit/pull/91#discussion_r486895689 + */ + +/* General Purpose Registers [0-15]. */ +typedef sljit_uw sljit_gpr; + +/* + * WARNING + * the following code is non standard and should be improved for + * consistency, but doesn't use SLJIT_NUMBER_OF_REGISTERS based + * registers because r0 and r1 are the ABI recommended volatiles. 
+ * there is a gpr() function that maps sljit to physical register numbers + * that should be used instead of the usual index into reg_map[] and + * will be retired ASAP (TODO: carenas) + */ + +static const sljit_gpr r0 = 0; /* reg_map[SLJIT_NUMBER_OF_REGISTERS + 2]: 0 in address calculations; reserved */ +static const sljit_gpr r1 = 1; /* reg_map[SLJIT_NUMBER_OF_REGISTERS + 3]: reserved */ +static const sljit_gpr r2 = 2; /* reg_map[1]: 1st argument */ +static const sljit_gpr r3 = 3; /* reg_map[2]: 2nd argument */ +static const sljit_gpr r4 = 4; /* reg_map[3]: 3rd argument */ +static const sljit_gpr r5 = 5; /* reg_map[4]: 4th argument */ +static const sljit_gpr r6 = 6; /* reg_map[5]: 5th argument; 1st saved register */ +static const sljit_gpr r7 = 7; /* reg_map[6] */ +static const sljit_gpr r8 = 8; /* reg_map[7] */ +static const sljit_gpr r9 = 9; /* reg_map[8] */ +static const sljit_gpr r10 = 10; /* reg_map[9] */ +static const sljit_gpr r11 = 11; /* reg_map[10] */ +static const sljit_gpr r12 = 12; /* reg_map[11]: GOT */ +static const sljit_gpr r13 = 13; /* reg_map[12]: Literal Pool pointer */ +static const sljit_gpr r14 = 14; /* reg_map[0]: return address and flag register */ +static const sljit_gpr r15 = 15; /* reg_map[SLJIT_NUMBER_OF_REGISTERS + 1]: stack pointer */ + +/* WARNING: r12 and r13 shouldn't be used as per ABI recommendation */ +/* TODO(carenas): r12 might conflict in PIC code, reserve? */ +/* TODO(carenas): r13 is usually pointed to "pool" per ABI, using a tmp + * like we do now might be faster though, reserve? + */ + +/* TODO(carenas): should be named TMP_REG[1-2] for consistency */ +#define tmp0 r0 +#define tmp1 r1 + +/* TODO(carenas): flags should move to a different register so that + * link register doesn't need to change + */ + +/* Link registers. The normal link register is r14, but since + we use that for flags we need to use r0 instead to do fast + calls so that flags are preserved. 
*/ +static const sljit_gpr link_r = 14; /* r14 */ +static const sljit_gpr fast_link_r = 0; /* r0 */ + +/* Flag register layout: + + 0 32 33 34 36 64 + +---------------+---+---+-------+-------+ + | ZERO | 0 | 0 | C C |///////| + +---------------+---+---+-------+-------+ +*/ +static const sljit_gpr flag_r = 14; /* r14 */ + +struct sljit_s390x_const { + struct sljit_const const_; /* must be first */ + sljit_sw init_value; /* required to build literal pool */ +}; + +/* Convert SLJIT register to hardware register. */ +static SLJIT_INLINE sljit_gpr gpr(sljit_s32 r) +{ + SLJIT_ASSERT(r != SLJIT_UNUSED); + SLJIT_ASSERT(r < (sljit_s32)(sizeof(reg_map) / sizeof(reg_map[0]))); + return reg_map[r]; +} + +/* Size of instruction in bytes. Tags must already be cleared. */ +static SLJIT_INLINE sljit_uw sizeof_ins(sljit_ins ins) +{ + /* keep faulting instructions */ + if (ins == 0) + return 2; + + if ((ins & 0x00000000ffffL) == ins) + return 2; + if ((ins & 0x0000ffffffffL) == ins) + return 4; + if ((ins & 0xffffffffffffL) == ins) + return 6; + + SLJIT_UNREACHABLE(); + return (sljit_uw)-1; +} + +static sljit_s32 push_inst(struct sljit_compiler *compiler, sljit_ins ins) +{ + sljit_ins *ibuf = (sljit_ins *)ensure_buf(compiler, sizeof(sljit_ins)); + FAIL_IF(!ibuf); + *ibuf = ins; + compiler->size++; + return SLJIT_SUCCESS; +} + +static sljit_s32 encode_inst(void **ptr, sljit_ins ins) +{ + sljit_u16 *ibuf = (sljit_u16 *)*ptr; + sljit_uw size = sizeof_ins(ins); + + SLJIT_ASSERT((size & 6) == size); + switch (size) { + case 6: + *ibuf++ = (sljit_u16)(ins >> 32); + /* fallthrough */ + case 4: + *ibuf++ = (sljit_u16)(ins >> 16); + /* fallthrough */ + case 2: + *ibuf++ = (sljit_u16)(ins); + } + *ptr = (void*)ibuf; + return SLJIT_SUCCESS; +} + +/* Map the given type to a 4-bit condition code mask. 
*/ +static SLJIT_INLINE sljit_u8 get_cc(sljit_s32 type) { + const sljit_u8 eq = 1 << 3; /* equal {,to zero} */ + const sljit_u8 lt = 1 << 2; /* less than {,zero} */ + const sljit_u8 gt = 1 << 1; /* greater than {,zero} */ + const sljit_u8 ov = 1 << 0; /* {overflow,NaN} */ + + switch (type) { + case SLJIT_EQUAL: + case SLJIT_EQUAL_F64: + return eq; + + case SLJIT_NOT_EQUAL: + case SLJIT_NOT_EQUAL_F64: + return ~eq; + + case SLJIT_LESS: + case SLJIT_SIG_LESS: + case SLJIT_LESS_F64: + return lt; + + case SLJIT_LESS_EQUAL: + case SLJIT_SIG_LESS_EQUAL: + case SLJIT_LESS_EQUAL_F64: + return (lt | eq); + + case SLJIT_GREATER: + case SLJIT_SIG_GREATER: + case SLJIT_GREATER_F64: + return gt; + + case SLJIT_GREATER_EQUAL: + case SLJIT_SIG_GREATER_EQUAL: + case SLJIT_GREATER_EQUAL_F64: + return (gt | eq); + + case SLJIT_OVERFLOW: + case SLJIT_MUL_OVERFLOW: + case SLJIT_UNORDERED_F64: + return ov; + + case SLJIT_NOT_OVERFLOW: + case SLJIT_MUL_NOT_OVERFLOW: + case SLJIT_ORDERED_F64: + return ~ov; + } + + SLJIT_UNREACHABLE(); + return (sljit_u8)-1; +} + +/* Facility to bit index mappings. + Note: some facilities share the same bit index. */ +typedef sljit_uw facility_bit; +#define STORE_FACILITY_LIST_EXTENDED_FACILITY 7 +#define FAST_LONG_DISPLACEMENT_FACILITY 19 +#define EXTENDED_IMMEDIATE_FACILITY 21 +#define GENERAL_INSTRUCTION_EXTENSION_FACILITY 34 +#define DISTINCT_OPERAND_FACILITY 45 +#define HIGH_WORD_FACILITY 45 +#define POPULATION_COUNT_FACILITY 45 +#define LOAD_STORE_ON_CONDITION_1_FACILITY 45 +#define MISCELLANEOUS_INSTRUCTION_EXTENSIONS_1_FACILITY 49 +#define LOAD_STORE_ON_CONDITION_2_FACILITY 53 +#define MISCELLANEOUS_INSTRUCTION_EXTENSIONS_2_FACILITY 58 +#define VECTOR_FACILITY 129 +#define VECTOR_ENHANCEMENTS_1_FACILITY 135 + +/* Report whether a facility is known to be present due to the compiler + settings. This function should always be compiled to a constant + value given a constant argument. 
*/ +static SLJIT_INLINE int have_facility_static(facility_bit x) +{ +#if ENABLE_STATIC_FACILITY_DETECTION + switch (x) { + case FAST_LONG_DISPLACEMENT_FACILITY: + return (__ARCH__ >= 6 /* z990 */); + case EXTENDED_IMMEDIATE_FACILITY: + case STORE_FACILITY_LIST_EXTENDED_FACILITY: + return (__ARCH__ >= 7 /* z9-109 */); + case GENERAL_INSTRUCTION_EXTENSION_FACILITY: + return (__ARCH__ >= 8 /* z10 */); + case DISTINCT_OPERAND_FACILITY: + return (__ARCH__ >= 9 /* z196 */); + case MISCELLANEOUS_INSTRUCTION_EXTENSIONS_1_FACILITY: + return (__ARCH__ >= 10 /* zEC12 */); + case LOAD_STORE_ON_CONDITION_2_FACILITY: + case VECTOR_FACILITY: + return (__ARCH__ >= 11 /* z13 */); + case MISCELLANEOUS_INSTRUCTION_EXTENSIONS_2_FACILITY: + case VECTOR_ENHANCEMENTS_1_FACILITY: + return (__ARCH__ >= 12 /* z14 */); + default: + SLJIT_UNREACHABLE(); + } +#endif + return 0; +} + +static SLJIT_INLINE unsigned long get_hwcap() +{ + static unsigned long hwcap = 0; + if (SLJIT_UNLIKELY(!hwcap)) { + hwcap = getauxval(AT_HWCAP); + SLJIT_ASSERT(hwcap != 0); + } + return hwcap; +} + +static SLJIT_INLINE int have_stfle() +{ + if (have_facility_static(STORE_FACILITY_LIST_EXTENDED_FACILITY)) + return 1; + + return (get_hwcap() & HWCAP_S390_STFLE); +} + +/* Report whether the given facility is available. This function always + performs a runtime check. 
*/ +static int have_facility_dynamic(facility_bit x) +{ +#if ENABLE_DYNAMIC_FACILITY_DETECTION + static struct { + sljit_uw bits[4]; + } cpu_features; + size_t size = sizeof(cpu_features); + const sljit_uw word_index = x >> 6; + const sljit_uw bit_index = ((1UL << 63) >> (x & 63)); + + SLJIT_ASSERT(x < size * 8); + if (SLJIT_UNLIKELY(!have_stfle())) + return 0; + + if (SLJIT_UNLIKELY(cpu_features.bits[0] == 0)) { + __asm__ __volatile__ ( + "lgr %%r0, %0;" + "stfle 0(%1);" + /* outputs */: + /* inputs */: "d" ((size / 8) - 1), "a" (&cpu_features) + /* clobbers */: "r0", "cc", "memory" + ); + SLJIT_ASSERT(cpu_features.bits[0] != 0); + } + return (cpu_features.bits[word_index] & bit_index) != 0; +#else + return 0; +#endif +} + +#define HAVE_FACILITY(name, bit) \ +static SLJIT_INLINE int name() \ +{ \ + static int have = -1; \ + /* Static check first. May allow the function to be optimized away. */ \ + if (have_facility_static(bit)) \ + have = 1; \ + else if (SLJIT_UNLIKELY(have < 0)) \ + have = have_facility_dynamic(bit) ? 
1 : 0; \ +\ + return have; \ +} + +HAVE_FACILITY(have_eimm, EXTENDED_IMMEDIATE_FACILITY) +HAVE_FACILITY(have_ldisp, FAST_LONG_DISPLACEMENT_FACILITY) +HAVE_FACILITY(have_genext, GENERAL_INSTRUCTION_EXTENSION_FACILITY) +HAVE_FACILITY(have_lscond1, LOAD_STORE_ON_CONDITION_1_FACILITY) +HAVE_FACILITY(have_lscond2, LOAD_STORE_ON_CONDITION_2_FACILITY) +HAVE_FACILITY(have_misc2, MISCELLANEOUS_INSTRUCTION_EXTENSIONS_2_FACILITY) +#undef HAVE_FACILITY + +#define is_u12(d) (0 <= (d) && (d) <= 0x00000fffL) +#define is_u32(d) (0 <= (d) && (d) <= 0xffffffffL) + +#define CHECK_SIGNED(v, bitlen) \ + ((v) == (((v) << (sizeof(v) * 8 - bitlen)) >> (sizeof(v) * 8 - bitlen))) + +#define is_s16(d) CHECK_SIGNED((d), 16) +#define is_s20(d) CHECK_SIGNED((d), 20) +#define is_s32(d) CHECK_SIGNED((d), 32) + +static SLJIT_INLINE sljit_uw disp_s20(sljit_s32 d) +{ + sljit_uw dh = (d >> 12) & 0xff; + sljit_uw dl = (d << 8) & 0xfff00; + + SLJIT_ASSERT(is_s20(d)); + return dh | dl; +} + +/* TODO(carenas): variadic macro is not strictly needed */ +#define SLJIT_S390X_INSTRUCTION(op, ...) \ +static SLJIT_INLINE sljit_ins op(__VA_ARGS__) + +/* RR form instructions. */ +#define SLJIT_S390X_RR(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr dst, sljit_gpr src) \ +{ \ + return (pattern) | ((dst & 0xf) << 4) | (src & 0xf); \ +} + +/* ADD */ +SLJIT_S390X_RR(ar, 0x1a00) + +/* ADD LOGICAL */ +SLJIT_S390X_RR(alr, 0x1e00) + +/* AND */ +SLJIT_S390X_RR(nr, 0x1400) + +/* BRANCH AND SAVE */ +SLJIT_S390X_RR(basr, 0x0d00) + +/* BRANCH ON CONDITION */ +SLJIT_S390X_RR(bcr, 0x0700) /* TODO(mundaym): type for mask? 
*/ + +/* COMPARE */ +SLJIT_S390X_RR(cr, 0x1900) + +/* COMPARE LOGICAL */ +SLJIT_S390X_RR(clr, 0x1500) + +/* DIVIDE */ +SLJIT_S390X_RR(dr, 0x1d00) + +/* EXCLUSIVE OR */ +SLJIT_S390X_RR(xr, 0x1700) + +/* LOAD */ +SLJIT_S390X_RR(lr, 0x1800) + +/* LOAD COMPLEMENT */ +SLJIT_S390X_RR(lcr, 0x1300) + +/* OR */ +SLJIT_S390X_RR(or, 0x1600) + +/* SUBTRACT */ +SLJIT_S390X_RR(sr, 0x1b00) + +/* SUBTRACT LOGICAL */ +SLJIT_S390X_RR(slr, 0x1f00) + +#undef SLJIT_S390X_RR + +/* RRE form instructions */ +#define SLJIT_S390X_RRE(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr dst, sljit_gpr src) \ +{ \ + return (pattern) | ((dst & 0xf) << 4) | (src & 0xf); \ +} + +/* ADD */ +SLJIT_S390X_RRE(agr, 0xb9080000) + +/* ADD LOGICAL */ +SLJIT_S390X_RRE(algr, 0xb90a0000) + +/* ADD LOGICAL WITH CARRY */ +SLJIT_S390X_RRE(alcr, 0xb9980000) +SLJIT_S390X_RRE(alcgr, 0xb9880000) + +/* AND */ +SLJIT_S390X_RRE(ngr, 0xb9800000) + +/* COMPARE */ +SLJIT_S390X_RRE(cgr, 0xb9200000) + +/* COMPARE LOGICAL */ +SLJIT_S390X_RRE(clgr, 0xb9210000) + +/* DIVIDE LOGICAL */ +SLJIT_S390X_RRE(dlr, 0xb9970000) +SLJIT_S390X_RRE(dlgr, 0xb9870000) + +/* DIVIDE SINGLE */ +SLJIT_S390X_RRE(dsgr, 0xb90d0000) + +/* EXCLUSIVE OR */ +SLJIT_S390X_RRE(xgr, 0xb9820000) + +/* LOAD */ +SLJIT_S390X_RRE(lgr, 0xb9040000) +SLJIT_S390X_RRE(lgfr, 0xb9140000) + +/* LOAD BYTE */ +SLJIT_S390X_RRE(lbr, 0xb9260000) +SLJIT_S390X_RRE(lgbr, 0xb9060000) + +/* LOAD COMPLEMENT */ +SLJIT_S390X_RRE(lcgr, 0xb9030000) + +/* LOAD HALFWORD */ +SLJIT_S390X_RRE(lhr, 0xb9270000) +SLJIT_S390X_RRE(lghr, 0xb9070000) + +/* LOAD LOGICAL */ +SLJIT_S390X_RRE(llgfr, 0xb9160000) + +/* LOAD LOGICAL CHARACTER */ +SLJIT_S390X_RRE(llcr, 0xb9940000) +SLJIT_S390X_RRE(llgcr, 0xb9840000) + +/* LOAD LOGICAL HALFWORD */ +SLJIT_S390X_RRE(llhr, 0xb9950000) +SLJIT_S390X_RRE(llghr, 0xb9850000) + +/* MULTIPLY LOGICAL */ +SLJIT_S390X_RRE(mlgr, 0xb9860000) + +/* MULTIPLY SINGLE */ +SLJIT_S390X_RRE(msr, 0xb2520000) +SLJIT_S390X_RRE(msgr, 0xb90c0000) +SLJIT_S390X_RRE(msgfr, 
0xb91c0000) + +/* OR */ +SLJIT_S390X_RRE(ogr, 0xb9810000) + +/* SUBTRACT */ +SLJIT_S390X_RRE(sgr, 0xb9090000) + +/* SUBTRACT LOGICAL */ +SLJIT_S390X_RRE(slgr, 0xb90b0000) + +/* SUBTRACT LOGICAL WITH BORROW */ +SLJIT_S390X_RRE(slbr, 0xb9990000) +SLJIT_S390X_RRE(slbgr, 0xb9890000) + +#undef SLJIT_S390X_RRE + +/* RI-a form instructions */ +#define SLJIT_S390X_RIA(name, pattern, imm_type) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr reg, imm_type imm) \ +{ \ + return (pattern) | ((reg & 0xf) << 20) | (imm & 0xffff); \ +} + +/* ADD HALFWORD IMMEDIATE */ +SLJIT_S390X_RIA(ahi, 0xa70a0000, sljit_s16) +SLJIT_S390X_RIA(aghi, 0xa70b0000, sljit_s16) + +/* COMPARE HALFWORD IMMEDIATE */ +SLJIT_S390X_RIA(chi, 0xa70e0000, sljit_s16) +SLJIT_S390X_RIA(cghi, 0xa70f0000, sljit_s16) + +/* LOAD HALFWORD IMMEDIATE */ +SLJIT_S390X_RIA(lhi, 0xa7080000, sljit_s16) +SLJIT_S390X_RIA(lghi, 0xa7090000, sljit_s16) + +/* LOAD LOGICAL IMMEDIATE */ +SLJIT_S390X_RIA(llihh, 0xa50c0000, sljit_u16) +SLJIT_S390X_RIA(llihl, 0xa50d0000, sljit_u16) +SLJIT_S390X_RIA(llilh, 0xa50e0000, sljit_u16) +SLJIT_S390X_RIA(llill, 0xa50f0000, sljit_u16) + +/* MULTIPLY HALFWORD IMMEDIATE */ +SLJIT_S390X_RIA(mhi, 0xa70c0000, sljit_s16) +SLJIT_S390X_RIA(mghi, 0xa70d0000, sljit_s16) + +/* OR IMMEDIATE */ +SLJIT_S390X_RIA(oilh, 0xa50a0000, sljit_u16) + +/* TEST UNDER MASK */ +SLJIT_S390X_RIA(tmlh, 0xa7000000, sljit_u16) + +#undef SLJIT_S390X_RIA + +/* RIL-a form instructions (requires extended immediate facility) */ +#define SLJIT_S390X_RILA(name, pattern, imm_type) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr reg, imm_type imm) \ +{ \ + SLJIT_ASSERT(have_eimm()); \ + return (pattern) | ((sljit_ins)(reg & 0xf) << 36) | (imm & 0xffffffff); \ +} + +/* ADD IMMEDIATE */ +SLJIT_S390X_RILA(afi, 0xc20900000000, sljit_s32) +SLJIT_S390X_RILA(agfi, 0xc20800000000, sljit_s32) + +/* ADD IMMEDIATE HIGH */ +SLJIT_S390X_RILA(aih, 0xcc0800000000, sljit_s32) /* TODO(mundaym): high-word facility? 
*/ + +/* ADD LOGICAL IMMEDIATE */ +SLJIT_S390X_RILA(alfi, 0xc20b00000000, sljit_u32) +SLJIT_S390X_RILA(algfi, 0xc20a00000000, sljit_u32) + +/* AND IMMEDIATE */ +SLJIT_S390X_RILA(nihf, 0xc00a00000000, sljit_u32) +SLJIT_S390X_RILA(nilf, 0xc00b00000000, sljit_u32) + +/* COMPARE IMMEDIATE */ +SLJIT_S390X_RILA(cfi, 0xc20d00000000, sljit_s32) +SLJIT_S390X_RILA(cgfi, 0xc20c00000000, sljit_s32) + +/* COMPARE IMMEDIATE HIGH */ +SLJIT_S390X_RILA(cih, 0xcc0d00000000, sljit_s32) /* TODO(mundaym): high-word facility? */ + +/* COMPARE LOGICAL IMMEDIATE */ +SLJIT_S390X_RILA(clfi, 0xc20f00000000, sljit_u32) +SLJIT_S390X_RILA(clgfi, 0xc20e00000000, sljit_u32) + +/* EXCLUSIVE OR IMMEDIATE */ +SLJIT_S390X_RILA(xilf, 0xc00700000000, sljit_u32) + +/* INSERT IMMEDIATE */ +SLJIT_S390X_RILA(iihf, 0xc00800000000, sljit_u32) +SLJIT_S390X_RILA(iilf, 0xc00900000000, sljit_u32) + +/* LOAD IMMEDIATE */ +SLJIT_S390X_RILA(lgfi, 0xc00100000000, sljit_s32) + +/* LOAD LOGICAL IMMEDIATE */ +SLJIT_S390X_RILA(llihf, 0xc00e00000000, sljit_u32) +SLJIT_S390X_RILA(llilf, 0xc00f00000000, sljit_u32) + +/* OR IMMEDIATE */ +SLJIT_S390X_RILA(oilf, 0xc00d00000000, sljit_u32) + +#undef SLJIT_S390X_RILA + +/* RX-a form instructions */ +#define SLJIT_S390X_RXA(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr r, sljit_u16 d, sljit_gpr x, sljit_gpr b) \ +{ \ + sljit_ins ri, xi, bi, di; \ +\ + SLJIT_ASSERT((d & 0xfff) == d); \ + ri = (sljit_ins)(r & 0xf) << 20; \ + xi = (sljit_ins)(x & 0xf) << 16; \ + bi = (sljit_ins)(b & 0xf) << 12; \ + di = (sljit_ins)(d & 0xfff); \ +\ + return (pattern) | ri | xi | bi | di; \ +} + +/* ADD */ +SLJIT_S390X_RXA(a, 0x5a000000) + +/* ADD LOGICAL */ +SLJIT_S390X_RXA(al, 0x5e000000) + +/* AND */ +SLJIT_S390X_RXA(n, 0x54000000) + +/* EXCLUSIVE OR */ +SLJIT_S390X_RXA(x, 0x57000000) + +/* LOAD */ +SLJIT_S390X_RXA(l, 0x58000000) + +/* LOAD ADDRESS */ +SLJIT_S390X_RXA(la, 0x41000000) + +/* LOAD HALFWORD */ +SLJIT_S390X_RXA(lh, 0x48000000) + +/* MULTIPLY SINGLE */ +SLJIT_S390X_RXA(ms, 
0x71000000) + +/* OR */ +SLJIT_S390X_RXA(o, 0x56000000) + +/* STORE */ +SLJIT_S390X_RXA(st, 0x50000000) + +/* STORE CHARACTER */ +SLJIT_S390X_RXA(stc, 0x42000000) + +/* STORE HALFWORD */ +SLJIT_S390X_RXA(sth, 0x40000000) + +/* SUBTRACT */ +SLJIT_S390X_RXA(s, 0x5b000000) + +/* SUBTRACT LOGICAL */ +SLJIT_S390X_RXA(sl, 0x5f000000) + +#undef SLJIT_S390X_RXA + +/* RXY-a instructions */ +#define SLJIT_S390X_RXYA(name, pattern, cond) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr r, sljit_s32 d, sljit_gpr x, sljit_gpr b) \ +{ \ + sljit_ins ri, xi, bi, di; \ +\ + SLJIT_ASSERT(cond); \ + ri = (sljit_ins)(r & 0xf) << 36; \ + xi = (sljit_ins)(x & 0xf) << 32; \ + bi = (sljit_ins)(b & 0xf) << 28; \ + di = (sljit_ins)disp_s20(d) << 8; \ +\ + return (pattern) | ri | xi | bi | di; \ +} + +/* ADD */ +SLJIT_S390X_RXYA(ay, 0xe3000000005a, have_ldisp()) +SLJIT_S390X_RXYA(ag, 0xe30000000008, 1) + +/* ADD LOGICAL */ +SLJIT_S390X_RXYA(aly, 0xe3000000005e, have_ldisp()) +SLJIT_S390X_RXYA(alg, 0xe3000000000a, 1) + +/* ADD LOGICAL WITH CARRY */ +SLJIT_S390X_RXYA(alc, 0xe30000000098, 1) +SLJIT_S390X_RXYA(alcg, 0xe30000000088, 1) + +/* AND */ +SLJIT_S390X_RXYA(ny, 0xe30000000054, have_ldisp()) +SLJIT_S390X_RXYA(ng, 0xe30000000080, 1) + +/* EXCLUSIVE OR */ +SLJIT_S390X_RXYA(xy, 0xe30000000057, have_ldisp()) +SLJIT_S390X_RXYA(xg, 0xe30000000082, 1) + +/* LOAD */ +SLJIT_S390X_RXYA(ly, 0xe30000000058, have_ldisp()) +SLJIT_S390X_RXYA(lg, 0xe30000000004, 1) +SLJIT_S390X_RXYA(lgf, 0xe30000000014, 1) + +/* LOAD BYTE */ +SLJIT_S390X_RXYA(lb, 0xe30000000076, have_ldisp()) +SLJIT_S390X_RXYA(lgb, 0xe30000000077, have_ldisp()) + +/* LOAD HALFWORD */ +SLJIT_S390X_RXYA(lhy, 0xe30000000078, have_ldisp()) +SLJIT_S390X_RXYA(lgh, 0xe30000000015, 1) + +/* LOAD LOGICAL */ +SLJIT_S390X_RXYA(llgf, 0xe30000000016, 1) + +/* LOAD LOGICAL CHARACTER */ +SLJIT_S390X_RXYA(llc, 0xe30000000094, have_eimm()) +SLJIT_S390X_RXYA(llgc, 0xe30000000090, 1) + +/* LOAD LOGICAL HALFWORD */ +SLJIT_S390X_RXYA(llh, 0xe30000000095, 
have_eimm()) +SLJIT_S390X_RXYA(llgh, 0xe30000000091, 1) + +/* MULTIPLY SINGLE */ +SLJIT_S390X_RXYA(msy, 0xe30000000051, have_ldisp()) +SLJIT_S390X_RXYA(msg, 0xe3000000000c, 1) + +/* OR */ +SLJIT_S390X_RXYA(oy, 0xe30000000056, have_ldisp()) +SLJIT_S390X_RXYA(og, 0xe30000000081, 1) + +/* STORE */ +SLJIT_S390X_RXYA(sty, 0xe30000000050, have_ldisp()) +SLJIT_S390X_RXYA(stg, 0xe30000000024, 1) + +/* STORE CHARACTER */ +SLJIT_S390X_RXYA(stcy, 0xe30000000072, have_ldisp()) + +/* STORE HALFWORD */ +SLJIT_S390X_RXYA(sthy, 0xe30000000070, have_ldisp()) + +/* SUBTRACT */ +SLJIT_S390X_RXYA(sy, 0xe3000000005b, have_ldisp()) +SLJIT_S390X_RXYA(sg, 0xe30000000009, 1) + +/* SUBTRACT LOGICAL */ +SLJIT_S390X_RXYA(sly, 0xe3000000005f, have_ldisp()) +SLJIT_S390X_RXYA(slg, 0xe3000000000b, 1) + +/* SUBTRACT LOGICAL WITH BORROW */ +SLJIT_S390X_RXYA(slb, 0xe30000000099, 1) +SLJIT_S390X_RXYA(slbg, 0xe30000000089, 1) + +#undef SLJIT_S390X_RXYA + +/* RS-a instructions */ +#define SLJIT_S390X_RSA(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr reg, sljit_sw d, sljit_gpr b) \ +{ \ + sljit_ins r1 = (sljit_ins)(reg & 0xf) << 20; \ + sljit_ins b2 = (sljit_ins)(b & 0xf) << 12; \ + sljit_ins d2 = (sljit_ins)(d & 0xfff); \ + return (pattern) | r1 | b2 | d2; \ +} + +/* SHIFT LEFT SINGLE LOGICAL */ +SLJIT_S390X_RSA(sll, 0x89000000) + +/* SHIFT RIGHT SINGLE */ +SLJIT_S390X_RSA(sra, 0x8a000000) + +/* SHIFT RIGHT SINGLE LOGICAL */ +SLJIT_S390X_RSA(srl, 0x88000000) + +#undef SLJIT_S390X_RSA + +/* RSY-a instructions */ +#define SLJIT_S390X_RSYA(name, pattern, cond) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr dst, sljit_gpr src, sljit_sw d, sljit_gpr b) \ +{ \ + sljit_ins r1, r3, b2, d2; \ +\ + SLJIT_ASSERT(cond); \ + r1 = (sljit_ins)(dst & 0xf) << 36; \ + r3 = (sljit_ins)(src & 0xf) << 32; \ + b2 = (sljit_ins)(b & 0xf) << 28; \ + d2 = (sljit_ins)disp_s20(d) << 8; \ +\ + return (pattern) | r1 | r3 | b2 | d2; \ +} + +/* LOAD MULTIPLE */ +SLJIT_S390X_RSYA(lmg, 0xeb0000000004, 1) + +/* SHIFT LEFT 
LOGICAL */ +SLJIT_S390X_RSYA(sllg, 0xeb000000000d, 1) + +/* SHIFT RIGHT SINGLE */ +SLJIT_S390X_RSYA(srag, 0xeb000000000a, 1) + +/* SHIFT RIGHT SINGLE LOGICAL */ +SLJIT_S390X_RSYA(srlg, 0xeb000000000c, 1) + +/* STORE MULTIPLE */ +SLJIT_S390X_RSYA(stmg, 0xeb0000000024, 1) + +#undef SLJIT_S390X_RSYA + +/* RIE-f instructions (require general-instructions-extension facility) */ +#define SLJIT_S390X_RIEF(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr dst, sljit_gpr src, sljit_u8 start, sljit_u8 end, sljit_u8 rot) \ +{ \ + sljit_ins r1, r2, i3, i4, i5; \ +\ + SLJIT_ASSERT(have_genext()); \ + r1 = (sljit_ins)(dst & 0xf) << 36; \ + r2 = (sljit_ins)(src & 0xf) << 32; \ + i3 = (sljit_ins)start << 24; \ + i4 = (sljit_ins)end << 16; \ + i5 = (sljit_ins)rot << 8; \ +\ + return (pattern) | r1 | r2 | i3 | i4 | i5; \ +} + +/* ROTATE THEN AND SELECTED BITS */ +/* SLJIT_S390X_RIEF(rnsbg, 0xec0000000054) */ + +/* ROTATE THEN EXCLUSIVE OR SELECTED BITS */ +/* SLJIT_S390X_RIEF(rxsbg, 0xec0000000057) */ + +/* ROTATE THEN OR SELECTED BITS */ +SLJIT_S390X_RIEF(rosbg, 0xec0000000056) + +/* ROTATE THEN INSERT SELECTED BITS */ +/* SLJIT_S390X_RIEF(risbg, 0xec0000000055) */ +/* SLJIT_S390X_RIEF(risbgn, 0xec0000000059) */ + +/* ROTATE THEN INSERT SELECTED BITS HIGH */ +SLJIT_S390X_RIEF(risbhg, 0xec000000005d) + +/* ROTATE THEN INSERT SELECTED BITS LOW */ +/* SLJIT_S390X_RIEF(risblg, 0xec0000000051) */ + +#undef SLJIT_S390X_RIEF + +/* RRF-a instructions */ +#define SLJIT_S390X_RRFA(name, pattern, cond) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr dst, sljit_gpr src1, sljit_gpr src2) \ +{ \ + sljit_ins r1, r2, r3; \ +\ + SLJIT_ASSERT(cond); \ + r1 = (sljit_ins)(dst & 0xf) << 4; \ + r2 = (sljit_ins)(src1 & 0xf); \ + r3 = (sljit_ins)(src2 & 0xf) << 12; \ +\ + return (pattern) | r3 | r1 | r2; \ +} + +/* MULTIPLY */ +SLJIT_S390X_RRFA(msrkc, 0xb9fd0000, have_misc2()) +SLJIT_S390X_RRFA(msgrkc, 0xb9ed0000, have_misc2()) + +#undef SLJIT_S390X_RRFA + +/* RRF-c instructions (require 
load/store-on-condition 1 facility) */ +#define SLJIT_S390X_RRFC(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr dst, sljit_gpr src, sljit_uw mask) \ +{ \ + sljit_ins r1, r2, m3; \ +\ + SLJIT_ASSERT(have_lscond1()); \ + r1 = (sljit_ins)(dst & 0xf) << 4; \ + r2 = (sljit_ins)(src & 0xf); \ + m3 = (sljit_ins)(mask & 0xf) << 12; \ +\ + return (pattern) | m3 | r1 | r2; \ +} + +/* LOAD ON CONDITION */ +SLJIT_S390X_RRFC(locr, 0xb9f20000) +SLJIT_S390X_RRFC(locgr, 0xb9e20000) + +#undef SLJIT_S390X_RRFC + +/* RIE-g instructions (require load/store-on-condition 2 facility) */ +#define SLJIT_S390X_RIEG(name, pattern) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr reg, sljit_sw imm, sljit_uw mask) \ +{ \ + sljit_ins r1, m3, i2; \ +\ + SLJIT_ASSERT(have_lscond2()); \ + r1 = (sljit_ins)(reg & 0xf) << 36; \ + m3 = (sljit_ins)(mask & 0xf) << 32; \ + i2 = (sljit_ins)(imm & 0xffffL) << 16; \ +\ + return (pattern) | r1 | m3 | i2; \ +} + +/* LOAD HALFWORD IMMEDIATE ON CONDITION */ +SLJIT_S390X_RIEG(lochi, 0xec0000000042) +SLJIT_S390X_RIEG(locghi, 0xec0000000046) + +#undef SLJIT_S390X_RIEG + +#define SLJIT_S390X_RILB(name, pattern, cond) \ +SLJIT_S390X_INSTRUCTION(name, sljit_gpr reg, sljit_sw ri) \ +{ \ + sljit_ins r1, ri2; \ +\ + SLJIT_ASSERT(cond); \ + r1 = (sljit_ins)(reg & 0xf) << 36; \ + ri2 = (sljit_ins)(ri & 0xffffffff); \ +\ + return (pattern) | r1 | ri2; \ +} + +/* BRANCH RELATIVE AND SAVE LONG */ +SLJIT_S390X_RILB(brasl, 0xc00500000000, 1) + +/* LOAD ADDRESS RELATIVE LONG */ +SLJIT_S390X_RILB(larl, 0xc00000000000, 1) + +/* LOAD RELATIVE LONG */ +SLJIT_S390X_RILB(lgrl, 0xc40800000000, have_genext()) + +#undef SLJIT_S390X_RILB + +SLJIT_S390X_INSTRUCTION(br, sljit_gpr target) +{ + return 0x07f0 | target; +} + +SLJIT_S390X_INSTRUCTION(brcl, sljit_uw mask, sljit_sw target) +{ + sljit_ins m1 = (sljit_ins)(mask & 0xf) << 36; + sljit_ins ri2 = (sljit_ins)target & 0xffffffff; + return 0xc00400000000L | m1 | ri2; +} + +SLJIT_S390X_INSTRUCTION(flogr, sljit_gpr 
dst, sljit_gpr src) +{ + sljit_ins r1 = ((sljit_ins)dst & 0xf) << 8; + sljit_ins r2 = ((sljit_ins)src & 0xf); + SLJIT_ASSERT(have_eimm()); + return 0xb9830000 | r1 | r2; +} + +/* INSERT PROGRAM MASK */ +SLJIT_S390X_INSTRUCTION(ipm, sljit_gpr dst) +{ + return 0xb2220000 | ((sljit_ins)(dst & 0xf) << 4); +} + +/* ROTATE THEN INSERT SELECTED BITS HIGH (ZERO) */ +SLJIT_S390X_INSTRUCTION(risbhgz, sljit_gpr dst, sljit_gpr src, sljit_u8 start, sljit_u8 end, sljit_u8 rot) +{ + return risbhg(dst, src, start, 0x8 | end, rot); +} + +#undef SLJIT_S390X_INSTRUCTION + +/* load condition code as needed to match type */ +static sljit_s32 push_load_cc(struct sljit_compiler *compiler, sljit_s32 type) +{ + type &= ~SLJIT_I32_OP; + switch (type) { + case SLJIT_ZERO: + case SLJIT_NOT_ZERO: + return push_inst(compiler, cih(flag_r, 0)); + break; + default: + return push_inst(compiler, tmlh(flag_r, 0x3000)); + break; + } + return SLJIT_SUCCESS; +} + +static sljit_s32 push_store_zero_flag(struct sljit_compiler *compiler, sljit_s32 op, sljit_gpr source) +{ + /* insert low 32-bits into high 32-bits of flag register */ + FAIL_IF(push_inst(compiler, risbhgz(flag_r, source, 0, 31, 32))); + if (!(op & SLJIT_I32_OP)) { + /* OR high 32-bits with high 32-bits of flag register */ + return push_inst(compiler, rosbg(flag_r, source, 0, 31, 0)); + } + return SLJIT_SUCCESS; +} + +/* load 64-bit immediate into register without clobbering flags */ +static sljit_s32 push_load_imm_inst(struct sljit_compiler *compiler, sljit_gpr target, sljit_sw v) +{ + /* 4 byte instructions */ + if (is_s16(v)) + return push_inst(compiler, lghi(target, (sljit_s16)v)); + + if ((sljit_uw)v == (v & 0x000000000000ffffU)) + return push_inst(compiler, llill(target, (sljit_u16)v)); + + if ((sljit_uw)v == (v & 0x00000000ffff0000U)) + return push_inst(compiler, llilh(target, (sljit_u16)(v >> 16))); + + if ((sljit_uw)v == (v & 0x0000ffff00000000U)) + return push_inst(compiler, llihl(target, (sljit_u16)(v >> 32))); + + if ((sljit_uw)v 
== (v & 0xffff000000000000U)) + return push_inst(compiler, llihh(target, (sljit_u16)(v >> 48))); + + /* 6 byte instructions (requires extended immediate facility) */ + if (have_eimm()) { + if (is_s32(v)) + return push_inst(compiler, lgfi(target, (sljit_s32)v)); + + if ((sljit_uw)v == (v & 0x00000000ffffffffU)) + return push_inst(compiler, llilf(target, (sljit_u32)v)); + + if ((sljit_uw)v == (v & 0xffffffff00000000U)) + return push_inst(compiler, llihf(target, (sljit_u32)(v >> 32))); + + FAIL_IF(push_inst(compiler, llilf(target, (sljit_u32)v))); + return push_inst(compiler, iihf(target, (sljit_u32)(v >> 32))); + } + /* TODO(mundaym): instruction sequences that don't use extended immediates */ + abort(); +} + +struct addr { + sljit_gpr base; + sljit_gpr index; + sljit_sw offset; +}; + +/* transform memory operand into D(X,B) form with a signed 20-bit offset */ +static sljit_s32 make_addr_bxy(struct sljit_compiler *compiler, + struct addr *addr, sljit_s32 mem, sljit_sw off, + sljit_gpr tmp /* clobbered, must not be r0 */) +{ + sljit_gpr base = r0; + sljit_gpr index = r0; + + SLJIT_ASSERT(tmp != r0); + if (mem & REG_MASK) + base = gpr(mem & REG_MASK); + + if (mem & OFFS_REG_MASK) { + index = gpr(OFFS_REG(mem)); + if (off != 0) { + /* shift and put the result into tmp */ + SLJIT_ASSERT(0 <= off && off < 64); + FAIL_IF(push_inst(compiler, sllg(tmp, index, off, 0))); + index = tmp; + off = 0; /* clear offset */ + } + } + else if (!is_s20(off)) { + FAIL_IF(push_load_imm_inst(compiler, tmp, off)); + index = tmp; + off = 0; /* clear offset */ + } + addr->base = base; + addr->index = index; + addr->offset = off; + return SLJIT_SUCCESS; +} + +/* transform memory operand into D(X,B) form with an unsigned 12-bit offset */ +static sljit_s32 make_addr_bx(struct sljit_compiler *compiler, + struct addr *addr, sljit_s32 mem, sljit_sw off, + sljit_gpr tmp /* clobbered, must not be r0 */) +{ + sljit_gpr base = r0; + sljit_gpr index = r0; + + SLJIT_ASSERT(tmp != r0); + if (mem & 
REG_MASK) + base = gpr(mem & REG_MASK); + + if (mem & OFFS_REG_MASK) { + index = gpr(OFFS_REG(mem)); + if (off != 0) { + /* shift and put the result into tmp */ + SLJIT_ASSERT(0 <= off && off < 64); + FAIL_IF(push_inst(compiler, sllg(tmp, index, off, 0))); + index = tmp; + off = 0; /* clear offset */ + } + } + else if (!is_u12(off)) { + FAIL_IF(push_load_imm_inst(compiler, tmp, off)); + index = tmp; + off = 0; /* clear offset */ + } + addr->base = base; + addr->index = index; + addr->offset = off; + return SLJIT_SUCCESS; +} + +#define EVAL(op, r, addr) op(r, addr.offset, addr.index, addr.base) +#define WHEN(cond, r, i1, i2, addr) \ + (cond) ? EVAL(i1, r, addr) : EVAL(i2, r, addr) + +static sljit_s32 load_word(struct sljit_compiler *compiler, sljit_gpr dst, + sljit_s32 src, sljit_sw srcw, + sljit_gpr tmp /* clobbered */, sljit_s32 is_32bit) +{ + struct addr addr; + sljit_ins ins; + + SLJIT_ASSERT(src & SLJIT_MEM); + if (have_ldisp() || !is_32bit) + FAIL_IF(make_addr_bxy(compiler, &addr, src, srcw, tmp)); + else + FAIL_IF(make_addr_bx(compiler, &addr, src, srcw, tmp)); + + if (is_32bit) + ins = WHEN(is_u12(addr.offset), dst, l, ly, addr); + else + ins = lg(dst, addr.offset, addr.index, addr.base); + + return push_inst(compiler, ins); +} + +static sljit_s32 store_word(struct sljit_compiler *compiler, sljit_gpr src, + sljit_s32 dst, sljit_sw dstw, + sljit_gpr tmp /* clobbered */, sljit_s32 is_32bit) +{ + struct addr addr; + sljit_ins ins; + + SLJIT_ASSERT(dst & SLJIT_MEM); + if (have_ldisp() || !is_32bit) + FAIL_IF(make_addr_bxy(compiler, &addr, dst, dstw, tmp)); + else + FAIL_IF(make_addr_bx(compiler, &addr, dst, dstw, tmp)); + + if (is_32bit) + ins = WHEN(is_u12(addr.offset), src, st, sty, addr); + else + ins = stg(src, addr.offset, addr.index, addr.base); + + return push_inst(compiler, ins); +} + +#undef WHEN + +SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compiler) +{ + struct sljit_label *label; + struct sljit_jump *jump; + struct 
sljit_s390x_const *const_; + struct sljit_put_label *put_label; + sljit_sw executable_offset; + sljit_uw ins_size = 0; /* instructions */ + sljit_uw pool_size = 0; /* literal pool */ + sljit_uw pad_size; + sljit_uw i, j = 0; + struct sljit_memory_fragment *buf; + void *code, *code_ptr; + sljit_uw *pool, *pool_ptr; + + sljit_uw source; + sljit_sw offset; /* TODO(carenas): only need 32 bit */ + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_generate_code(compiler)); + reverse_buf(compiler); + + /* branch handling */ + label = compiler->labels; + jump = compiler->jumps; + put_label = compiler->put_labels; + + /* TODO(carenas): compiler->executable_size could be calculated + * before to avoid the following loop (except for + * pool_size) + */ + /* calculate the size of the code */ + for (buf = compiler->buf; buf != NULL; buf = buf->next) { + sljit_uw len = buf->used_size / sizeof(sljit_ins); + sljit_ins *ibuf = (sljit_ins *)buf->memory; + for (i = 0; i < len; ++i, ++j) { + sljit_ins ins = ibuf[i]; + + /* TODO(carenas): instruction tag vs size/addr == j + * using instruction tags for const is creative + * but unlike all other architectures, and is not + * done consistently for all other objects. + * This might need reviewing later. 
+ */ + if (ins & sljit_ins_const) { + pool_size += sizeof(*pool); + ins &= ~sljit_ins_const; + } + if (label && label->size == j) { + label->size = ins_size; + label = label->next; + } + if (jump && jump->addr == j) { + if ((jump->flags & SLJIT_REWRITABLE_JUMP) || (jump->flags & JUMP_ADDR)) { + /* encoded: */ + /* brasl %r14, (or brcl , ) */ + /* replace with: */ + /* lgrl %r1, */ + /* basr %r14, %r1 (or bcr , %r1) */ + pool_size += sizeof(*pool); + ins_size += 2; + } + jump = jump->next; + } + if (put_label && put_label->addr == j) { + pool_size += sizeof(*pool); + put_label = put_label->next; + } + ins_size += sizeof_ins(ins); + } + } + + /* emit trailing label */ + if (label && label->size == j) { + label->size = ins_size; + label = label->next; + } + + SLJIT_ASSERT(!label); + SLJIT_ASSERT(!jump); + SLJIT_ASSERT(!put_label); + + /* pad code size to 8 bytes so it is accessible with halfword offsets */ + /* the literal pool needs to be doubleword aligned */ + pad_size = ((ins_size + 7UL) & ~7UL) - ins_size; + SLJIT_ASSERT(pad_size < 8UL); + + /* allocate target buffer */ + code = SLJIT_MALLOC_EXEC(ins_size + pad_size + pool_size, + compiler->exec_allocator_data); + PTR_FAIL_WITH_EXEC_IF(code); + code_ptr = code; + executable_offset = SLJIT_EXEC_OFFSET(code); + + /* TODO(carenas): pool is optional, and the ABI recommends it to + * be created before the function code, instead of + * globally; if generated code is too big could + * need offsets bigger than 32bit words and assert() + */ + pool = (sljit_uw *)((sljit_uw)code + ins_size + pad_size); + pool_ptr = pool; + const_ = (struct sljit_s390x_const *)compiler->consts; + + /* update label addresses */ + label = compiler->labels; + while (label) { + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET( + (sljit_uw)code_ptr + label->size, executable_offset); + label = label->next; + } + + /* reset jumps */ + jump = compiler->jumps; + put_label = compiler->put_labels; + + /* emit the code */ + j = 0; + for (buf = compiler->buf; 
buf != NULL; buf = buf->next) { + sljit_uw len = buf->used_size / sizeof(sljit_ins); + sljit_ins *ibuf = (sljit_ins *)buf->memory; + for (i = 0; i < len; ++i, ++j) { + sljit_ins ins = ibuf[i]; + if (ins & sljit_ins_const) { + /* clear the const tag */ + ins &= ~sljit_ins_const; + + /* update instruction with relative address of constant */ + source = (sljit_uw)code_ptr; + offset = (sljit_uw)pool_ptr - source; + SLJIT_ASSERT(!(offset & 1)); + offset >>= 1; /* halfword (not byte) offset */ + SLJIT_ASSERT(is_s32(offset)); + ins |= (sljit_ins)offset & 0xffffffff; + + /* update address */ + const_->const_.addr = (sljit_uw)pool_ptr; + + /* store initial value into pool and update pool address */ + *(pool_ptr++) = const_->init_value; + + /* move to next constant */ + const_ = (struct sljit_s390x_const *)const_->const_.next; + } + if (jump && jump->addr == j) { + sljit_sw target = (jump->flags & JUMP_LABEL) ? jump->u.label->addr : jump->u.target; + if ((jump->flags & SLJIT_REWRITABLE_JUMP) || (jump->flags & JUMP_ADDR)) { + jump->addr = (sljit_uw)pool_ptr; + + /* load address into tmp1 */ + source = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + offset = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(pool_ptr, executable_offset) - source; + SLJIT_ASSERT(!(offset & 1)); + offset >>= 1; + SLJIT_ASSERT(is_s32(offset)); + encode_inst(&code_ptr, + lgrl(tmp1, offset & 0xffffffff)); + + /* store jump target into pool and update pool address */ + *(pool_ptr++) = target; + + /* branch to tmp1 */ + sljit_ins op = (ins >> 32) & 0xf; + sljit_ins arg = (ins >> 36) & 0xf; + switch (op) { + case 4: /* brcl -> bcr */ + ins = bcr(arg, tmp1); + break; + case 5: /* brasl -> basr */ + ins = basr(arg, tmp1); + break; + default: + abort(); + } + } + else { + jump->addr = (sljit_uw)code_ptr + 2; + source = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + offset = target - source; + + /* offset must be halfword aligned */ + SLJIT_ASSERT(!(offset & 1)); + offset >>= 1; + 
SLJIT_ASSERT(is_s32(offset)); /* TODO(mundaym): handle arbitrary offsets */ + + /* patch jump target */ + ins |= (sljit_ins)offset & 0xffffffff; + } + jump = jump->next; + } + if (put_label && put_label->addr == j) { + source = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; + + /* store target into pool */ + *pool_ptr = put_label->label->addr; + offset = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(pool_ptr, executable_offset) - source; + pool_ptr++; + + SLJIT_ASSERT(!(offset & 1)); + offset >>= 1; + SLJIT_ASSERT(is_s32(offset)); + ins |= (sljit_ins)offset & 0xffffffff; + + put_label = put_label->next; + } + encode_inst(&code_ptr, ins); + } + } + SLJIT_ASSERT((sljit_u8 *)code + ins_size == code_ptr); + SLJIT_ASSERT((sljit_u8 *)pool + pool_size == (sljit_u8 *)pool_ptr); + + compiler->error = SLJIT_ERR_COMPILED; + compiler->executable_offset = executable_offset; + compiler->executable_size = ins_size; + code = SLJIT_ADD_EXEC_OFFSET(code, executable_offset); + code_ptr = SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + SLJIT_CACHE_FLUSH(code, code_ptr); + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); + return code; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) +{ + /* TODO(mundaym): implement all */ + switch (feature_type) { + case SLJIT_HAS_CLZ: + return have_eimm() ? 1 : 0; /* FLOGR instruction */ + case SLJIT_HAS_CMOV: + return have_lscond1() ? 
1 : 0; + case SLJIT_HAS_FPU: + return 0; + } + return 0; +} + +/* --------------------------------------------------------------------- */ +/* Entry, exit */ +/* --------------------------------------------------------------------- */ + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_enter(struct sljit_compiler *compiler, + sljit_s32 options, sljit_s32 arg_types, sljit_s32 scratches, sljit_s32 saveds, + sljit_s32 fscratches, sljit_s32 fsaveds, sljit_s32 local_size) +{ + sljit_s32 args = get_arg_count(arg_types); + sljit_sw frame_size; + + CHECK_ERROR(); + CHECK(check_sljit_emit_enter(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size)); + set_emit_enter(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size); + + /* saved registers go in callee allocated save area */ + compiler->local_size = (local_size + 0xf) & ~0xf; + frame_size = compiler->local_size + SLJIT_S390X_DEFAULT_STACK_FRAME_SIZE; + + FAIL_IF(push_inst(compiler, stmg(r6, r15, r6 * sizeof(sljit_sw), r15))); /* save registers TODO(MGM): optimize */ + if (frame_size != 0) { + if (is_s16(-frame_size)) + FAIL_IF(push_inst(compiler, aghi(r15, -((sljit_s16)frame_size)))); + else if (is_s32(-frame_size)) + FAIL_IF(push_inst(compiler, agfi(r15, -((sljit_s32)frame_size)))); + else { + FAIL_IF(push_load_imm_inst(compiler, tmp1, -frame_size)); + FAIL_IF(push_inst(compiler, la(r15, 0, tmp1, r15))); + } + } + + if (args >= 1) + FAIL_IF(push_inst(compiler, lgr(gpr(SLJIT_S0), gpr(SLJIT_R0)))); + if (args >= 2) + FAIL_IF(push_inst(compiler, lgr(gpr(SLJIT_S1), gpr(SLJIT_R1)))); + if (args >= 3) + FAIL_IF(push_inst(compiler, lgr(gpr(SLJIT_S2), gpr(SLJIT_R2)))); + SLJIT_ASSERT(args < 4); + + return SLJIT_SUCCESS; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_set_context(struct sljit_compiler *compiler, + sljit_s32 options, sljit_s32 arg_types, sljit_s32 scratches, sljit_s32 saveds, + sljit_s32 fscratches, sljit_s32 fsaveds, sljit_s32 local_size) +{ + CHECK_ERROR(); + 
CHECK(check_sljit_set_context(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size)); + set_set_context(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size); + + /* TODO(mundaym): stack space for saved floating point registers */ + compiler->local_size = (local_size + 0xf) & ~0xf; + return SLJIT_SUCCESS; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_return(struct sljit_compiler *compiler, sljit_s32 op, sljit_s32 src, sljit_sw srcw) +{ + sljit_sw size; + sljit_gpr end; + + CHECK_ERROR(); + CHECK(check_sljit_emit_return(compiler, op, src, srcw)); + + FAIL_IF(emit_mov_before_return(compiler, op, src, srcw)); + + size = compiler->local_size + SLJIT_S390X_DEFAULT_STACK_FRAME_SIZE + (r6 * sizeof(sljit_sw)); + if (!is_s20(size)) { + FAIL_IF(push_load_imm_inst(compiler, tmp1, compiler->local_size + SLJIT_S390X_DEFAULT_STACK_FRAME_SIZE)); + FAIL_IF(push_inst(compiler, la(r15, 0, tmp1, r15))); + size = r6 * sizeof(sljit_sw); + end = r14; /* r15 has been restored already */ + } + else + end = r15; + + FAIL_IF(push_inst(compiler, lmg(r6, end, size, r15))); /* restore registers TODO(MGM): optimize */ + FAIL_IF(push_inst(compiler, br(r14))); /* return */ + + return SLJIT_SUCCESS; +} + +/* --------------------------------------------------------------------- */ +/* Operators */ +/* --------------------------------------------------------------------- */ + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compiler, sljit_s32 op) +{ + sljit_gpr arg0 = gpr(SLJIT_R0); + sljit_gpr arg1 = gpr(SLJIT_R1); + + CHECK_ERROR(); + CHECK(check_sljit_emit_op0(compiler, op)); + + op = GET_OPCODE(op) | (op & SLJIT_I32_OP); + switch (op) { + case SLJIT_BREAKPOINT: + /* The following invalid instruction is emitted by gdb. 
*/ + return push_inst(compiler, 0x0001 /* 2-byte trap */); + case SLJIT_NOP: + return push_inst(compiler, 0x0700 /* 2-byte nop */); + case SLJIT_LMUL_UW: + FAIL_IF(push_inst(compiler, mlgr(arg0, arg0))); + break; + case SLJIT_LMUL_SW: + /* signed multiplication from: */ + /* Hacker's Delight, Second Edition: Chapter 8-3. */ + FAIL_IF(push_inst(compiler, srag(tmp0, arg0, 63, 0))); + FAIL_IF(push_inst(compiler, srag(tmp1, arg1, 63, 0))); + FAIL_IF(push_inst(compiler, ngr(tmp0, arg1))); + FAIL_IF(push_inst(compiler, ngr(tmp1, arg0))); + + /* unsigned multiplication */ + FAIL_IF(push_inst(compiler, mlgr(arg0, arg0))); + + FAIL_IF(push_inst(compiler, sgr(arg0, tmp0))); + FAIL_IF(push_inst(compiler, sgr(arg0, tmp1))); + break; + case SLJIT_DIV_U32: + case SLJIT_DIVMOD_U32: + FAIL_IF(push_inst(compiler, lhi(tmp0, 0))); + FAIL_IF(push_inst(compiler, lr(tmp1, arg0))); + FAIL_IF(push_inst(compiler, dlr(tmp0, arg1))); + FAIL_IF(push_inst(compiler, lr(arg0, tmp1))); /* quotient */ + if (op == SLJIT_DIVMOD_U32) + return push_inst(compiler, lr(arg1, tmp0)); /* remainder */ + + return SLJIT_SUCCESS; + case SLJIT_DIV_S32: + case SLJIT_DIVMOD_S32: + FAIL_IF(push_inst(compiler, lhi(tmp0, 0))); + FAIL_IF(push_inst(compiler, lr(tmp1, arg0))); + FAIL_IF(push_inst(compiler, dr(tmp0, arg1))); + FAIL_IF(push_inst(compiler, lr(arg0, tmp1))); /* quotient */ + if (op == SLJIT_DIVMOD_S32) + return push_inst(compiler, lr(arg1, tmp0)); /* remainder */ + + return SLJIT_SUCCESS; + case SLJIT_DIV_UW: + case SLJIT_DIVMOD_UW: + FAIL_IF(push_inst(compiler, lghi(tmp0, 0))); + FAIL_IF(push_inst(compiler, lgr(tmp1, arg0))); + FAIL_IF(push_inst(compiler, dlgr(tmp0, arg1))); + FAIL_IF(push_inst(compiler, lgr(arg0, tmp1))); /* quotient */ + if (op == SLJIT_DIVMOD_UW) + return push_inst(compiler, lgr(arg1, tmp0)); /* remainder */ + + return SLJIT_SUCCESS; + case SLJIT_DIV_SW: + case SLJIT_DIVMOD_SW: + FAIL_IF(push_inst(compiler, lgr(tmp1, arg0))); + FAIL_IF(push_inst(compiler, dsgr(tmp0, arg1))); + 
FAIL_IF(push_inst(compiler, lgr(arg0, tmp1))); /* quotient */ + if (op == SLJIT_DIVMOD_SW) + return push_inst(compiler, lgr(arg1, tmp0)); /* remainder */ + + return SLJIT_SUCCESS; + case SLJIT_ENDBR: + return SLJIT_SUCCESS; + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; + default: + SLJIT_UNREACHABLE(); + } + /* swap result registers */ + FAIL_IF(push_inst(compiler, lgr(tmp0, arg0))); + FAIL_IF(push_inst(compiler, lgr(arg0, arg1))); + return push_inst(compiler, lgr(arg1, tmp0)); +} + +/* LEVAL will be defined later with different parameters as needed */ +#define WHEN2(cond, i1, i2) (cond) ? LEVAL(i1) : LEVAL(i2) + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 dst, sljit_sw dstw, + sljit_s32 src, sljit_sw srcw) +{ + sljit_ins ins; + struct addr mem; + sljit_gpr dst_r; + sljit_gpr src_r; + sljit_s32 opcode = GET_OPCODE(op); + + CHECK_ERROR(); + CHECK(check_sljit_emit_op1(compiler, op, dst, dstw, src, srcw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + ADJUST_LOCAL_OFFSET(src, srcw); + + if ((dst == SLJIT_UNUSED) && !HAS_FLAGS(op)) { + /* TODO(carenas): implement prefetch? 
*/ + return SLJIT_SUCCESS; + } + if (opcode >= SLJIT_MOV && opcode <= SLJIT_MOV_P) { + /* LOAD REGISTER */ + if (FAST_IS_REG(dst) && FAST_IS_REG(src)) { + dst_r = gpr(dst); + src_r = gpr(src); + switch (opcode | (op & SLJIT_I32_OP)) { + /* 32-bit */ + case SLJIT_MOV32_U8: + ins = llcr(dst_r, src_r); + break; + case SLJIT_MOV32_S8: + ins = lbr(dst_r, src_r); + break; + case SLJIT_MOV32_U16: + ins = llhr(dst_r, src_r); + break; + case SLJIT_MOV32_S16: + ins = lhr(dst_r, src_r); + break; + case SLJIT_MOV32: + ins = lr(dst_r, src_r); + break; + /* 64-bit */ + case SLJIT_MOV_U8: + ins = llgcr(dst_r, src_r); + break; + case SLJIT_MOV_S8: + ins = lgbr(dst_r, src_r); + break; + case SLJIT_MOV_U16: + ins = llghr(dst_r, src_r); + break; + case SLJIT_MOV_S16: + ins = lghr(dst_r, src_r); + break; + case SLJIT_MOV_U32: + ins = llgfr(dst_r, src_r); + break; + case SLJIT_MOV_S32: + ins = lgfr(dst_r, src_r); + break; + case SLJIT_MOV: + case SLJIT_MOV_P: + ins = lgr(dst_r, src_r); + break; + default: + ins = 0; + SLJIT_UNREACHABLE(); + } + FAIL_IF(push_inst(compiler, ins)); + if (HAS_FLAGS(op)) { + /* only handle zero flag */ + SLJIT_ASSERT(!(op & VARIABLE_FLAG_MASK)); + return push_store_zero_flag(compiler, op, dst_r); + } + return SLJIT_SUCCESS; + } + /* LOAD IMMEDIATE */ + if (FAST_IS_REG(dst) && (src & SLJIT_IMM)) { + switch (opcode) { + case SLJIT_MOV_U8: + srcw = (sljit_sw)((sljit_u8)(srcw)); + break; + case SLJIT_MOV_S8: + srcw = (sljit_sw)((sljit_s8)(srcw)); + break; + case SLJIT_MOV_U16: + srcw = (sljit_sw)((sljit_u16)(srcw)); + break; + case SLJIT_MOV_S16: + srcw = (sljit_sw)((sljit_s16)(srcw)); + break; + case SLJIT_MOV_U32: + srcw = (sljit_sw)((sljit_u32)(srcw)); + break; + case SLJIT_MOV_S32: + srcw = (sljit_sw)((sljit_s32)(srcw)); + break; + } + return push_load_imm_inst(compiler, gpr(dst), srcw); + } + /* LOAD */ + /* TODO(carenas): avoid reg being defined later */ + #define LEVAL(i) EVAL(i, reg, mem) + if (FAST_IS_REG(dst) && (src & SLJIT_MEM)) { + sljit_gpr reg = 
gpr(dst); + + FAIL_IF(make_addr_bxy(compiler, &mem, src, srcw, tmp1)); + /* TODO(carenas): convert all calls below to LEVAL */ + switch (opcode | (op & SLJIT_I32_OP)) { + case SLJIT_MOV32_U8: + ins = llc(reg, mem.offset, mem.index, mem.base); + break; + case SLJIT_MOV32_S8: + ins = lb(reg, mem.offset, mem.index, mem.base); + break; + case SLJIT_MOV32_U16: + ins = llh(reg, mem.offset, mem.index, mem.base); + break; + case SLJIT_MOV32_S16: + ins = WHEN2(is_u12(mem.offset), lh, lhy); + break; + case SLJIT_MOV32: + ins = WHEN2(is_u12(mem.offset), l, ly); + break; + case SLJIT_MOV_U8: + ins = LEVAL(llgc); + break; + case SLJIT_MOV_S8: + ins = lgb(reg, mem.offset, mem.index, mem.base); + break; + case SLJIT_MOV_U16: + ins = LEVAL(llgh); + break; + case SLJIT_MOV_S16: + ins = lgh(reg, mem.offset, mem.index, mem.base); + break; + case SLJIT_MOV_U32: + ins = LEVAL(llgf); + break; + case SLJIT_MOV_S32: + ins = lgf(reg, mem.offset, mem.index, mem.base); + break; + case SLJIT_MOV_P: + case SLJIT_MOV: + ins = lg(reg, mem.offset, mem.index, mem.base); + break; + default: + SLJIT_UNREACHABLE(); + } + FAIL_IF(push_inst(compiler, ins)); + if (HAS_FLAGS(op)) { + /* only handle zero flag */ + SLJIT_ASSERT(!(op & VARIABLE_FLAG_MASK)); + return push_store_zero_flag(compiler, op, reg); + } + return SLJIT_SUCCESS; + } + /* STORE and STORE IMMEDIATE */ + if ((dst & SLJIT_MEM) + && (FAST_IS_REG(src) || (src & SLJIT_IMM))) { + sljit_gpr reg = FAST_IS_REG(src) ? gpr(src) : tmp0; + if (src & SLJIT_IMM) { + /* TODO(mundaym): MOVE IMMEDIATE? 
*/ + FAIL_IF(push_load_imm_inst(compiler, reg, srcw)); + } + struct addr mem; + FAIL_IF(make_addr_bxy(compiler, &mem, dst, dstw, tmp1)); + switch (opcode) { + case SLJIT_MOV_U8: + case SLJIT_MOV_S8: + return push_inst(compiler, + WHEN2(is_u12(mem.offset), stc, stcy)); + case SLJIT_MOV_U16: + case SLJIT_MOV_S16: + return push_inst(compiler, + WHEN2(is_u12(mem.offset), sth, sthy)); + case SLJIT_MOV_U32: + case SLJIT_MOV_S32: + return push_inst(compiler, + WHEN2(is_u12(mem.offset), st, sty)); + case SLJIT_MOV_P: + case SLJIT_MOV: + FAIL_IF(push_inst(compiler, LEVAL(stg))); + if (HAS_FLAGS(op)) { + /* only handle zero flag */ + SLJIT_ASSERT(!(op & VARIABLE_FLAG_MASK)); + return push_store_zero_flag(compiler, op, reg); + } + return SLJIT_SUCCESS; + default: + SLJIT_UNREACHABLE(); + } + } + #undef LEVAL + /* MOVE CHARACTERS */ + if ((dst & SLJIT_MEM) && (src & SLJIT_MEM)) { + struct addr mem; + FAIL_IF(make_addr_bxy(compiler, &mem, src, srcw, tmp1)); + switch (opcode) { + case SLJIT_MOV_U8: + case SLJIT_MOV_S8: + FAIL_IF(push_inst(compiler, + EVAL(llgc, tmp0, mem))); + FAIL_IF(make_addr_bxy(compiler, &mem, dst, dstw, tmp1)); + return push_inst(compiler, + EVAL(stcy, tmp0, mem)); + case SLJIT_MOV_U16: + case SLJIT_MOV_S16: + FAIL_IF(push_inst(compiler, + EVAL(llgh, tmp0, mem))); + FAIL_IF(make_addr_bxy(compiler, &mem, dst, dstw, tmp1)); + return push_inst(compiler, + EVAL(sthy, tmp0, mem)); + case SLJIT_MOV_U32: + case SLJIT_MOV_S32: + FAIL_IF(push_inst(compiler, + EVAL(ly, tmp0, mem))); + FAIL_IF(make_addr_bxy(compiler, &mem, dst, dstw, tmp1)); + return push_inst(compiler, + EVAL(sty, tmp0, mem)); + case SLJIT_MOV_P: + case SLJIT_MOV: + FAIL_IF(push_inst(compiler, + EVAL(lg, tmp0, mem))); + FAIL_IF(make_addr_bxy(compiler, &mem, dst, dstw, tmp1)); + FAIL_IF(push_inst(compiler, + EVAL(stg, tmp0, mem))); + if (HAS_FLAGS(op)) { + /* only handle zero flag */ + SLJIT_ASSERT(!(op & VARIABLE_FLAG_MASK)); + return push_store_zero_flag(compiler, op, tmp0); + } + return 
SLJIT_SUCCESS; + default: + SLJIT_UNREACHABLE(); + } + } + SLJIT_UNREACHABLE(); + } + + SLJIT_ASSERT((src & SLJIT_IMM) == 0); /* no immediates */ + + dst_r = SLOW_IS_REG(dst) ? gpr(REG_MASK & dst) : tmp0; + src_r = FAST_IS_REG(src) ? gpr(REG_MASK & src) : tmp0; + if (src & SLJIT_MEM) + FAIL_IF(load_word(compiler, src_r, src, srcw, tmp1, src & SLJIT_I32_OP)); + + /* TODO(mundaym): optimize loads and stores */ + switch (opcode | (op & SLJIT_I32_OP)) { + case SLJIT_NOT: + /* emulate ~x with x^-1 */ + FAIL_IF(push_load_imm_inst(compiler, tmp1, -1)); + if (src_r != dst_r) + FAIL_IF(push_inst(compiler, lgr(dst_r, src_r))); + + FAIL_IF(push_inst(compiler, xgr(dst_r, tmp1))); + break; + case SLJIT_NOT32: + /* emulate ~x with x^-1 */ + if (have_eimm()) + FAIL_IF(push_inst(compiler, xilf(dst_r, -1))); + else { + FAIL_IF(push_load_imm_inst(compiler, tmp1, -1)); + if (src_r != dst_r) + FAIL_IF(push_inst(compiler, lr(dst_r, src_r))); + + FAIL_IF(push_inst(compiler, xr(dst_r, tmp1))); + } + break; + case SLJIT_NEG: + FAIL_IF(push_inst(compiler, lcgr(dst_r, src_r))); + break; + case SLJIT_NEG32: + FAIL_IF(push_inst(compiler, lcr(dst_r, src_r))); + break; + case SLJIT_CLZ: + if (have_eimm()) { + FAIL_IF(push_inst(compiler, flogr(tmp0, src_r))); /* clobbers tmp1 */ + if (dst_r != tmp0) + FAIL_IF(push_inst(compiler, lgr(dst_r, tmp0))); + } else { + abort(); /* TODO(mundaym): no eimm (?) */ + } + break; + case SLJIT_CLZ32: + if (have_eimm()) { + FAIL_IF(push_inst(compiler, sllg(tmp1, src_r, 32, 0))); + FAIL_IF(push_inst(compiler, iilf(tmp1, 0xffffffff))); + FAIL_IF(push_inst(compiler, flogr(tmp0, tmp1))); /* clobbers tmp1 */ + if (dst_r != tmp0) + FAIL_IF(push_inst(compiler, lr(dst_r, tmp0))); + } else { + abort(); /* TODO(mundaym): no eimm (?) 
*/ + } + break; + default: + SLJIT_UNREACHABLE(); + } + + /* write condition code to emulated flag register */ + if (op & VARIABLE_FLAG_MASK) + FAIL_IF(push_inst(compiler, ipm(flag_r))); + + /* write zero flag to emulated flag register */ + if (op & SLJIT_SET_Z) + FAIL_IF(push_store_zero_flag(compiler, op, dst_r)); + + /* TODO(carenas): doesn't need FAIL_IF */ + if ((dst != SLJIT_UNUSED) && (dst & SLJIT_MEM)) + FAIL_IF(store_word(compiler, dst_r, dst, dstw, tmp1, op & SLJIT_I32_OP)); + + return SLJIT_SUCCESS; +} + +static SLJIT_INLINE int is_commutative(sljit_s32 op) +{ + switch (GET_OPCODE(op)) { + case SLJIT_ADD: + case SLJIT_ADDC: + case SLJIT_MUL: + case SLJIT_AND: + case SLJIT_OR: + case SLJIT_XOR: + return 1; + } + return 0; +} + +static SLJIT_INLINE int is_shift(sljit_s32 op) { + sljit_s32 v = GET_OPCODE(op); + return (v == SLJIT_SHL || v == SLJIT_ASHR || v == SLJIT_LSHR) ? 1 : 0; +} + +static SLJIT_INLINE int sets_signed_flag(sljit_s32 op) +{ + switch (GET_FLAG_TYPE(op)) { + case SLJIT_OVERFLOW: + case SLJIT_NOT_OVERFLOW: + case SLJIT_SIG_LESS: + case SLJIT_SIG_LESS_EQUAL: + case SLJIT_SIG_GREATER: + case SLJIT_SIG_GREATER_EQUAL: + return 1; + } + return 0; +} + +/* Report whether we have an instruction for: + op dst src imm + where dst and src are separate registers. */ +static int have_op_3_imm(sljit_s32 op, sljit_sw imm) { + return 0; /* TODO(mundaym): implement */ +} + +/* Report whether we have an instruction for: + op reg imm + where reg is both a source and the destination. */ +static int have_op_2_imm(sljit_s32 op, sljit_sw imm) { + switch (GET_OPCODE(op) | (op & SLJIT_I32_OP)) { + case SLJIT_ADD32: + case SLJIT_ADD: + if (!HAS_FLAGS(op) || sets_signed_flag(op)) + return have_eimm() ? 
is_s32(imm) : is_s16(imm); + + return have_eimm() && is_u32(imm); + case SLJIT_MUL32: + case SLJIT_MUL: + /* TODO(mundaym): general extension check */ + /* for ms{,g}fi */ + if (op & VARIABLE_FLAG_MASK) + return 0; + + return have_genext() && is_s16(imm); + case SLJIT_OR32: + case SLJIT_XOR32: + case SLJIT_AND32: + /* only use if have extended immediate facility */ + /* this ensures flags are set correctly */ + return have_eimm(); + case SLJIT_AND: + case SLJIT_OR: + case SLJIT_XOR: + /* TODO(mundaym): make this more flexible */ + /* avoid using immediate variations, flags */ + /* won't be set correctly */ + return 0; + case SLJIT_ADDC32: + case SLJIT_ADDC: + /* no ADD LOGICAL WITH CARRY IMMEDIATE */ + return 0; + case SLJIT_SUB: + case SLJIT_SUB32: + case SLJIT_SUBC: + case SLJIT_SUBC32: + /* no SUBTRACT IMMEDIATE */ + /* TODO(mundaym): SUBTRACT LOGICAL IMMEDIATE */ + return 0; + } + return 0; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 dst, sljit_sw dstw, + sljit_s32 src1, sljit_sw src1w, + sljit_s32 src2, sljit_sw src2w) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op2(compiler, op, dst, dstw, src1, src1w, src2, src2w)); + ADJUST_LOCAL_OFFSET(dst, dstw); + ADJUST_LOCAL_OFFSET(src1, src1w); + ADJUST_LOCAL_OFFSET(src2, src2w); + + if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) + return SLJIT_SUCCESS; + + sljit_gpr dst_r = SLOW_IS_REG(dst) ? 
gpr(dst & REG_MASK) : tmp0; + + if (is_commutative(op)) { + #define SWAP_ARGS \ + do { \ + sljit_s32 t = src1; \ + sljit_sw tw = src1w; \ + src1 = src2; \ + src1w = src2w; \ + src2 = t; \ + src2w = tw; \ + } while(0); + + /* prefer immediate in src2 */ + if (src1 & SLJIT_IMM) { + SWAP_ARGS + } + + /* prefer to have src1 use same register as dst */ + if (FAST_IS_REG(src2) && gpr(src2 & REG_MASK) == dst_r) { + SWAP_ARGS + } + + /* prefer memory argument in src2 */ + if (FAST_IS_REG(src2) && (src1 & SLJIT_MEM)) { + SWAP_ARGS + } + #undef SWAP_ARGS + } + + /* src1 must be in a register */ + sljit_gpr src1_r = FAST_IS_REG(src1) ? gpr(src1 & REG_MASK) : tmp0; + if (src1 & SLJIT_IMM) + FAIL_IF(push_load_imm_inst(compiler, src1_r, src1w)); + + if (src1 & SLJIT_MEM) + FAIL_IF(load_word(compiler, src1_r, src1, src1w, tmp1, op & SLJIT_I32_OP)); + + /* emit comparison before subtract */ + if (GET_OPCODE(op) == SLJIT_SUB && (op & VARIABLE_FLAG_MASK)) { + sljit_sw cmp = 0; + switch (GET_FLAG_TYPE(op)) { + case SLJIT_LESS: + case SLJIT_LESS_EQUAL: + case SLJIT_GREATER: + case SLJIT_GREATER_EQUAL: + cmp = 1; /* unsigned */ + break; + case SLJIT_EQUAL: + case SLJIT_SIG_LESS: + case SLJIT_SIG_LESS_EQUAL: + case SLJIT_SIG_GREATER: + case SLJIT_SIG_GREATER_EQUAL: + cmp = -1; /* signed */ + break; + } + if (cmp) { + /* clear flags - no need to generate now */ + op &= ~VARIABLE_FLAG_MASK; + sljit_gpr src2_r = FAST_IS_REG(src2) ? 
gpr(src2 & REG_MASK) : tmp1; + if (src2 & SLJIT_IMM) { + #define LEVAL(i) i(src1_r, src2w) + if (cmp > 0 && is_u32(src2w)) { + /* unsigned */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, clfi, clgfi))); + } + else if (cmp < 0 && is_s16(src2w)) { + /* signed */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, chi, cghi))); + } + else if (cmp < 0 && is_s32(src2w)) { + /* signed */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, cfi, cgfi))); + } + #undef LEVAL + #define LEVAL(i) i(src1_r, src2_r) + else { + FAIL_IF(push_load_imm_inst(compiler, src2_r, src2w)); + if (cmp > 0) { + /* unsigned */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, clr, clgr))); + } + if (cmp < 0) { + /* signed */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, cr, cgr))); + } + } + } + else { + if (src2 & SLJIT_MEM) { + /* TODO(mundaym): comparisons with memory */ + /* load src2 into register */ + FAIL_IF(load_word(compiler, src2_r, src2, src2w, tmp1, op & SLJIT_I32_OP)); + } + if (cmp > 0) { + /* unsigned */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, clr, clgr))); + } + if (cmp < 0) { + /* signed */ + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, cr, cgr))); + } + #undef LEVAL + } + FAIL_IF(push_inst(compiler, ipm(flag_r))); + } + } + + if (!HAS_FLAGS(op) && dst == SLJIT_UNUSED) + return SLJIT_SUCCESS; + + /* need to specify signed or logical operation */ + int signed_flags = sets_signed_flag(op); + + if (is_shift(op)) { + /* handle shifts first, they have more constraints than other operations */ + sljit_sw d = 0; + sljit_gpr b = FAST_IS_REG(src2) ? gpr(src2 & REG_MASK) : r0; + if (src2 & SLJIT_IMM) + d = src2w & ((op & SLJIT_I32_OP) ? 31 : 63); + + if (src2 & SLJIT_MEM) { + /* shift amount (b) cannot be in r0 (i.e. 
tmp0) */ + FAIL_IF(load_word(compiler, tmp1, src2, src2w, tmp1, op & SLJIT_I32_OP)); + b = tmp1; + } + /* src1 and dst share the same register in the base 32-bit ISA */ + /* TODO(mundaym): not needed when distinct-operand facility is available */ + int workaround_alias = op & SLJIT_I32_OP && src1_r != dst_r; + if (workaround_alias) { + /* put src1 into tmp0 so we can overwrite it */ + FAIL_IF(push_inst(compiler, lr(tmp0, src1_r))); + src1_r = tmp0; + } + switch (GET_OPCODE(op) | (op & SLJIT_I32_OP)) { + case SLJIT_SHL: + FAIL_IF(push_inst(compiler, sllg(dst_r, src1_r, d, b))); + break; + case SLJIT_SHL32: + FAIL_IF(push_inst(compiler, sll(src1_r, d, b))); + break; + case SLJIT_LSHR: + FAIL_IF(push_inst(compiler, srlg(dst_r, src1_r, d, b))); + break; + case SLJIT_LSHR32: + FAIL_IF(push_inst(compiler, srl(src1_r, d, b))); + break; + case SLJIT_ASHR: + FAIL_IF(push_inst(compiler, srag(dst_r, src1_r, d, b))); + break; + case SLJIT_ASHR32: + FAIL_IF(push_inst(compiler, sra(src1_r, d, b))); + break; + default: + SLJIT_UNREACHABLE(); + } + if (workaround_alias && dst_r != src1_r) + FAIL_IF(push_inst(compiler, lr(dst_r, src1_r))); + + } + else if ((GET_OPCODE(op) == SLJIT_MUL) && HAS_FLAGS(op)) { + /* multiply instructions do not generally set flags so we need to manually */ + /* detect overflow conditions */ + /* TODO(mundaym): 64-bit overflow */ + SLJIT_ASSERT(GET_FLAG_TYPE(op) == SLJIT_MUL_OVERFLOW || + GET_FLAG_TYPE(op) == SLJIT_MUL_NOT_OVERFLOW); + sljit_gpr src2_r = FAST_IS_REG(src2) ? 
gpr(src2 & REG_MASK) : tmp1; + if (src2 & SLJIT_IMM) { + /* load src2 into register */ + FAIL_IF(push_load_imm_inst(compiler, src2_r, src2w)); + } + if (src2 & SLJIT_MEM) { + /* load src2 into register */ + FAIL_IF(load_word(compiler, src2_r, src2, src2w, tmp1, op & SLJIT_I32_OP)); + } + if (have_misc2()) { + #define LEVAL(i) i(dst_r, src1_r, src2_r) + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, msrkc, msgrkc))); + #undef LEVAL + } + else if (op & SLJIT_I32_OP) { + op &= ~VARIABLE_FLAG_MASK; + FAIL_IF(push_inst(compiler, lgfr(tmp0, src1_r))); + FAIL_IF(push_inst(compiler, msgfr(tmp0, src2_r))); + if (dst_r != tmp0) { + FAIL_IF(push_inst(compiler, lr(dst_r, tmp0))); + } + FAIL_IF(push_inst(compiler, aih(tmp0, 1))); + FAIL_IF(push_inst(compiler, nihf(tmp0, ~1U))); + FAIL_IF(push_inst(compiler, ipm(flag_r))); + FAIL_IF(push_inst(compiler, oilh(flag_r, 0x2000))); + } + else + return SLJIT_ERR_UNSUPPORTED; + + } + else if ((GET_OPCODE(op) == SLJIT_SUB) && (op & SLJIT_SET_Z) && !signed_flags) { + /* subtract logical instructions do not set the right flags unfortunately */ + /* instead, negate src2 and issue an add logical */ + /* TODO(mundaym): distinct operand facility where needed */ + if (src1_r != dst_r && src1_r != tmp0) { + #define LEVAL(i) i(tmp0, src1_r) + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, lr, lgr))); + src1_r = tmp0; + #undef LEVAL + } + sljit_gpr src2_r = FAST_IS_REG(src2) ? 
gpr(src2 & REG_MASK) : tmp1; + if (src2 & SLJIT_IMM) { + /* load src2 into register */ + FAIL_IF(push_load_imm_inst(compiler, src2_r, src2w)); + } + if (src2 & SLJIT_MEM) { + /* load src2 into register */ + FAIL_IF(load_word(compiler, src2_r, src2, src2w, tmp1, op & SLJIT_I32_OP)); + } + if (op & SLJIT_I32_OP) { + FAIL_IF(push_inst(compiler, lcr(tmp1, src2_r))); + FAIL_IF(push_inst(compiler, alr(src1_r, tmp1))); + if (src1_r != dst_r) + FAIL_IF(push_inst(compiler, lr(dst_r, src1_r))); + } + else { + FAIL_IF(push_inst(compiler, lcgr(tmp1, src2_r))); + FAIL_IF(push_inst(compiler, algr(src1_r, tmp1))); + if (src1_r != dst_r) + FAIL_IF(push_inst(compiler, lgr(dst_r, src1_r))); + } + } + else if ((src2 & SLJIT_IMM) && (src1_r == dst_r) && have_op_2_imm(op, src2w)) { + switch (GET_OPCODE(op) | (op & SLJIT_I32_OP)) { + #define LEVAL(i) i(dst_r, src2w) + case SLJIT_ADD: + if (!HAS_FLAGS(op) || signed_flags) { + FAIL_IF(push_inst(compiler, + WHEN2(is_s16(src2w), aghi, agfi))); + } + else + FAIL_IF(push_inst(compiler, LEVAL(algfi))); + + break; + case SLJIT_ADD32: + if (!HAS_FLAGS(op) || signed_flags) + FAIL_IF(push_inst(compiler, + WHEN2(is_s16(src2w), ahi, afi))); + else + FAIL_IF(push_inst(compiler, LEVAL(alfi))); + + break; + #undef LEVAL /* TODO(carenas): move down and refactor? 
*/ + case SLJIT_MUL: + FAIL_IF(push_inst(compiler, mghi(dst_r, src2w))); + break; + case SLJIT_MUL32: + FAIL_IF(push_inst(compiler, mhi(dst_r, src2w))); + break; + case SLJIT_OR32: + FAIL_IF(push_inst(compiler, oilf(dst_r, src2w))); + break; + case SLJIT_XOR32: + FAIL_IF(push_inst(compiler, xilf(dst_r, src2w))); + break; + case SLJIT_AND32: + FAIL_IF(push_inst(compiler, nilf(dst_r, src2w))); + break; + default: + SLJIT_UNREACHABLE(); + } + } + else if ((src2 & SLJIT_IMM) && have_op_3_imm(op, src2w)) { + abort(); /* TODO(mundaym): implement */ + } + else if ((src2 & SLJIT_MEM) && (dst_r == src1_r)) { + /* most 32-bit instructions can only handle 12-bit immediate offsets */ + int need_u12 = !have_ldisp() && + (op & SLJIT_I32_OP) && + (GET_OPCODE(op) != SLJIT_ADDC) && + (GET_OPCODE(op) != SLJIT_SUBC); + struct addr mem; + if (need_u12) + FAIL_IF(make_addr_bx(compiler, &mem, src2, src2w, tmp1)); + else + FAIL_IF(make_addr_bxy(compiler, &mem, src2, src2w, tmp1)); + + int can_u12 = is_u12(mem.offset) ? 
1 : 0; + sljit_ins ins = 0; + switch (GET_OPCODE(op) | (op & SLJIT_I32_OP)) { + /* 64-bit ops */ + #define LEVAL(i) EVAL(i, dst_r, mem) + case SLJIT_ADD: + ins = WHEN2(signed_flags, ag, alg); + break; + case SLJIT_SUB: + ins = WHEN2(signed_flags, sg, slg); + break; + case SLJIT_ADDC: + ins = LEVAL(alcg); + break; + case SLJIT_SUBC: + ins = LEVAL(slbg); + break; + case SLJIT_MUL: + ins = LEVAL(msg); + break; + case SLJIT_OR: + ins = LEVAL(og); + break; + case SLJIT_XOR: + ins = LEVAL(xg); + break; + case SLJIT_AND: + ins = LEVAL(ng); + break; + /* 32-bit ops */ + case SLJIT_ADD32: + if (signed_flags) + ins = WHEN2(can_u12, a, ay); + else + ins = WHEN2(can_u12, al, aly); + break; + case SLJIT_SUB32: + if (signed_flags) + ins = WHEN2(can_u12, s, sy); + else + ins = WHEN2(can_u12, sl, sly); + break; + case SLJIT_ADDC32: + ins = LEVAL(alc); + break; + case SLJIT_SUBC32: + ins = LEVAL(slb); + break; + case SLJIT_MUL32: + ins = WHEN2(can_u12, ms, msy); + break; + case SLJIT_OR32: + ins = WHEN2(can_u12, o, oy); + break; + case SLJIT_XOR32: + ins = WHEN2(can_u12, x, xy); + break; + case SLJIT_AND32: + ins = WHEN2(can_u12, n, ny); + break; + #undef LEVAL + default: + SLJIT_UNREACHABLE(); + } + FAIL_IF(push_inst(compiler, ins)); + } + else { + sljit_gpr src2_r = FAST_IS_REG(src2) ? 
gpr(src2 & REG_MASK) : tmp1; + if (src2 & SLJIT_IMM) { + /* load src2 into register */ + FAIL_IF(push_load_imm_inst(compiler, src2_r, src2w)); + } + if (src2 & SLJIT_MEM) { + /* load src2 into register */ + FAIL_IF(load_word(compiler, src2_r, src2, src2w, tmp1, op & SLJIT_I32_OP)); + } + /* TODO(mundaym): distinct operand facility where needed */ + #define LEVAL(i) i(tmp0, src1_r) + if (src1_r != dst_r && src1_r != tmp0) { + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, lr, lgr))); + src1_r = tmp0; + } + #undef LEVAL + sljit_ins ins = 0; + switch (GET_OPCODE(op) | (op & SLJIT_I32_OP)) { + #define LEVAL(i) i(src1_r, src2_r) + /* 64-bit ops */ + case SLJIT_ADD: + ins = WHEN2(signed_flags, agr, algr); + break; + case SLJIT_SUB: + ins = WHEN2(signed_flags, sgr, slgr); + break; + case SLJIT_ADDC: + ins = LEVAL(alcgr); + break; + case SLJIT_SUBC: + ins = LEVAL(slbgr); + break; + case SLJIT_MUL: + ins = LEVAL(msgr); + break; + case SLJIT_AND: + ins = LEVAL(ngr); + break; + case SLJIT_OR: + ins = LEVAL(ogr); + break; + case SLJIT_XOR: + ins = LEVAL(xgr); + break; + /* 32-bit ops */ + case SLJIT_ADD32: + ins = WHEN2(signed_flags, ar, alr); + break; + case SLJIT_SUB32: + ins = WHEN2(signed_flags, sr, slr); + break; + case SLJIT_ADDC32: + ins = LEVAL(alcr); + break; + case SLJIT_SUBC32: + ins = LEVAL(slbr); + break; + case SLJIT_MUL32: + ins = LEVAL(msr); + break; + case SLJIT_AND32: + ins = LEVAL(nr); + break; + case SLJIT_OR32: + ins = LEVAL(or); + break; + case SLJIT_XOR32: + ins = LEVAL(xr); + break; + #undef LEVAL + default: + SLJIT_UNREACHABLE(); + } + FAIL_IF(push_inst(compiler, ins)); + #define LEVAL(i) i(dst_r, src1_r) + if (src1_r != dst_r) + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, lr, lgr))); + #undef LEVAL + } + + /* write condition code to emulated flag register */ + if (op & VARIABLE_FLAG_MASK) + FAIL_IF(push_inst(compiler, ipm(flag_r))); + + /* write zero flag to emulated flag register */ + if (op & SLJIT_SET_Z) + 
FAIL_IF(push_store_zero_flag(compiler, op, dst_r)); + + /* finally write the result to memory if required */ + if (dst & SLJIT_MEM) { + SLJIT_ASSERT(dst_r != tmp1); + /* TODO(carenas): s/FAIL_IF/ return */ + FAIL_IF(store_word(compiler, dst_r, dst, dstw, tmp1, op & SLJIT_I32_OP)); + } + + return SLJIT_SUCCESS; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src( + struct sljit_compiler *compiler, + sljit_s32 op, sljit_s32 src, sljit_sw srcw) +{ + sljit_gpr src_r; + + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + src_r = FAST_IS_REG(src) ? gpr(src) : tmp1; + if (src & SLJIT_MEM) + FAIL_IF(load_word(compiler, tmp1, src, srcw, tmp1, 0)); + + return push_inst(compiler, br(src_r)); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + /* TODO(carenas): implement? */ + return SLJIT_SUCCESS; + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: + /* TODO(carenas): implement */ + return SLJIT_SUCCESS; + default: + /* TODO(carenas): probably should not success by default */ + return SLJIT_SUCCESS; + } + + return SLJIT_SUCCESS; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) +{ + CHECK_REG_INDEX(check_sljit_get_register_index(reg)); + return gpr(reg); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_float_register_index(sljit_s32 reg) +{ + CHECK_REG_INDEX(check_sljit_get_float_register_index(reg)); + abort(); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_custom(struct sljit_compiler *compiler, + void *instruction, sljit_s32 size) +{ + sljit_ins ins = 0; + + CHECK_ERROR(); + CHECK(check_sljit_emit_op_custom(compiler, instruction, size)); + + memcpy((sljit_u8 *)&ins + sizeof(ins) - size, instruction, size); + return push_inst(compiler, ins); +} + +/* --------------------------------------------------------------------- */ +/* Floating point operators */ +/* 
--------------------------------------------------------------------- */ + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fop1(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 dst, sljit_sw dstw, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + abort(); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fop2(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 dst, sljit_sw dstw, + sljit_s32 src1, sljit_sw src1w, + sljit_s32 src2, sljit_sw src2w) +{ + CHECK_ERROR(); + abort(); +} + +/* --------------------------------------------------------------------- */ +/* Other instructions */ +/* --------------------------------------------------------------------- */ + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_fast_enter(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + if (FAST_IS_REG(dst)) + return push_inst(compiler, lgr(gpr(dst), fast_link_r)); + + /* memory */ + return store_word(compiler, fast_link_r, dst, dstw, tmp1, 0); +} + +/* --------------------------------------------------------------------- */ +/* Conditional instructions */ +/* --------------------------------------------------------------------- */ + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_label* sljit_emit_label(struct sljit_compiler *compiler) +{ + struct sljit_label *label; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_label(compiler)); + + if (compiler->last_label && compiler->last_label->size == compiler->size) + return compiler->last_label; + + label = (struct sljit_label*)ensure_abuf(compiler, sizeof(struct sljit_label)); + PTR_FAIL_IF(!label); + set_label(label, compiler); + return label; +} + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump* sljit_emit_jump(struct sljit_compiler *compiler, sljit_s32 type) +{ + sljit_u8 mask = ((type & 0xff) < SLJIT_JUMP) ? 
get_cc(type & 0xff) : 0xf; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_jump(compiler, type)); + + /* reload condition code */ + if (mask != 0xf) + PTR_FAIL_IF(push_load_cc(compiler, type & 0xff)); + + /* record jump */ + struct sljit_jump *jump = (struct sljit_jump *) + ensure_abuf(compiler, sizeof(struct sljit_jump)); + PTR_FAIL_IF(!jump); + set_jump(jump, compiler, type & SLJIT_REWRITABLE_JUMP); + jump->addr = compiler->size; + + /* emit jump instruction */ + type &= 0xff; + if (type >= SLJIT_FAST_CALL) + PTR_FAIL_IF(push_inst(compiler, brasl(type == SLJIT_FAST_CALL ? fast_link_r : link_r, 0))); + else + PTR_FAIL_IF(push_inst(compiler, brcl(mask, 0))); + + return jump; +} + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump* sljit_emit_call(struct sljit_compiler *compiler, sljit_s32 type, + sljit_s32 arg_types) +{ + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_call(compiler, type, arg_types)); + +#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) \ + || (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) + compiler->skip_checks = 1; +#endif + + return sljit_emit_jump(compiler, type); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_ijump(struct sljit_compiler *compiler, sljit_s32 type, sljit_s32 src, sljit_sw srcw) +{ + sljit_gpr src_r = FAST_IS_REG(src) ? gpr(src) : tmp1; + + CHECK_ERROR(); + CHECK(check_sljit_emit_ijump(compiler, type, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + if (src & SLJIT_IMM) { + SLJIT_ASSERT(!(srcw & 1)); /* target address must be even */ + FAIL_IF(push_load_imm_inst(compiler, src_r, srcw)); + } + else if (src & SLJIT_MEM) + FAIL_IF(load_word(compiler, src_r, src, srcw, tmp1, 0 /* 64-bit */)); + + /* emit jump instruction */ + if (type >= SLJIT_FAST_CALL) + return push_inst(compiler, basr(type == SLJIT_FAST_CALL ? 
fast_link_r : link_r, src_r)); + + return push_inst(compiler, br(src_r)); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_icall(struct sljit_compiler *compiler, sljit_s32 type, + sljit_s32 arg_types, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_icall(compiler, type, arg_types, src, srcw)); + +#if (defined SLJIT_VERBOSE && SLJIT_VERBOSE) \ + || (defined SLJIT_ARGUMENT_CHECKS && SLJIT_ARGUMENT_CHECKS) + compiler->skip_checks = 1; +#endif + + return sljit_emit_ijump(compiler, type, src, srcw); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_flags(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 dst, sljit_sw dstw, + sljit_s32 type) +{ + sljit_u8 mask = get_cc(type & 0xff); + + CHECK_ERROR(); + CHECK(check_sljit_emit_op_flags(compiler, op, dst, dstw, type)); + + sljit_gpr dst_r = FAST_IS_REG(dst) ? gpr(dst & REG_MASK) : tmp0; + sljit_gpr loc_r = tmp1; + switch (GET_OPCODE(op)) { + case SLJIT_AND: + case SLJIT_OR: + case SLJIT_XOR: + /* dst is also source operand */ + if (dst & SLJIT_MEM) + FAIL_IF(load_word(compiler, dst_r, dst, dstw, tmp1, op & SLJIT_I32_OP)); + + break; + case SLJIT_MOV: + case (SLJIT_MOV32 & ~SLJIT_I32_OP): + /* can write straight into destination */ + loc_r = dst_r; + break; + default: + SLJIT_UNREACHABLE(); + } + + if (mask != 0xf) + FAIL_IF(push_load_cc(compiler, type & 0xff)); + + /* TODO(mundaym): fold into cmov helper function? */ + #define LEVAL(i) i(loc_r, 1, mask) + if (have_lscond2()) { + FAIL_IF(push_load_imm_inst(compiler, loc_r, 0)); + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, lochi, locghi))); + } else { + /* TODO(mundaym): no load/store-on-condition 2 facility (ipm? branch-and-set?) 
*/ + abort(); + } + #undef LEVAL + + /* apply bitwise op and set condition codes */ + switch (GET_OPCODE(op)) { + #define LEVAL(i) i(dst_r, loc_r) + case SLJIT_AND: + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, nr, ngr))); + break; + case SLJIT_OR: + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, or, ogr))); + break; + case SLJIT_XOR: + FAIL_IF(push_inst(compiler, + WHEN2(op & SLJIT_I32_OP, xr, xgr))); + break; + #undef LEVAL + } + + /* set zero flag if needed */ + if (op & SLJIT_SET_Z) + FAIL_IF(push_store_zero_flag(compiler, op, dst_r)); + + /* store result to memory if required */ + /* TODO(carenas): s/FAIL_IF/ return */ + if (dst & SLJIT_MEM) + FAIL_IF(store_word(compiler, dst_r, dst, dstw, tmp1, op & SLJIT_I32_OP)); + + return SLJIT_SUCCESS; +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_cmov(struct sljit_compiler *compiler, sljit_s32 type, + sljit_s32 dst_reg, + sljit_s32 src, sljit_sw srcw) +{ + sljit_u8 mask = get_cc(type & 0xff); + sljit_gpr dst_r = gpr(dst_reg & ~SLJIT_I32_OP); + sljit_gpr src_r = FAST_IS_REG(src) ? gpr(src) : tmp0; + + CHECK_ERROR(); + CHECK(check_sljit_emit_cmov(compiler, type, dst_reg, src, srcw)); + + if (mask != 0xf) + FAIL_IF(push_load_cc(compiler, type & 0xff)); + + if (src & SLJIT_IMM) { + /* TODO(mundaym): fast path with lscond2 */ + FAIL_IF(push_load_imm_inst(compiler, src_r, srcw)); + } + + #define LEVAL(i) i(dst_r, src_r, mask) + if (have_lscond1()) + return push_inst(compiler, + WHEN2(dst_reg & SLJIT_I32_OP, locr, locgr)); + + #undef LEVAL + + /* TODO(mundaym): implement */ + return SLJIT_ERR_UNSUPPORTED; +} + +/* --------------------------------------------------------------------- */ +/* Other instructions */ +/* --------------------------------------------------------------------- */ + +/* On s390x we build a literal pool to hold constants. This has two main + advantages: + + 1. we only need one instruction in the instruction stream (LGRL) + 2. 
we can store 64 bit addresses and use 32 bit offsets + + To retrofit the extra information needed to build the literal pool we + add a new sljit_s390x_const struct that contains the initial value but + can still be cast to a sljit_const. */ + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value) +{ + struct sljit_s390x_const *const_; + sljit_gpr dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_const(compiler, dst, dstw, init_value)); + + const_ = (struct sljit_s390x_const*)ensure_abuf(compiler, + sizeof(struct sljit_s390x_const)); + PTR_FAIL_IF(!const_); + set_const((struct sljit_const*)const_, compiler); + const_->init_value = init_value; + + dst_r = FAST_IS_REG(dst) ? gpr(dst & REG_MASK) : tmp0; + if (have_genext()) + PTR_FAIL_IF(push_inst(compiler, sljit_ins_const | lgrl(dst_r, 0))); + else { + PTR_FAIL_IF(push_inst(compiler, sljit_ins_const | larl(tmp1, 0))); + PTR_FAIL_IF(push_inst(compiler, lg(dst_r, 0, r0, tmp1))); + } + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(store_word(compiler, dst_r, dst, dstw, tmp1, 0 /* always 64-bit */)); + + return (struct sljit_const*)const_; +} + +SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) +{ + /* Update the constant pool. 
*/ + sljit_uw *ptr = (sljit_uw *)addr; + SLJIT_UNUSED_ARG(executable_offset); + + SLJIT_UPDATE_WX_FLAGS(ptr, ptr + 1, 0); + *ptr = new_target; + SLJIT_UPDATE_WX_FLAGS(ptr, ptr + 1, 1); + SLJIT_CACHE_FLUSH(ptr, ptr + 1); +} + +SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) +{ + sljit_set_jump_addr(addr, new_constant, executable_offset); +} + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label *sljit_emit_put_label( + struct sljit_compiler *compiler, + sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_gpr dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); + + dst_r = FAST_IS_REG(dst) ? gpr(dst & REG_MASK) : tmp0; + + if (have_genext()) + PTR_FAIL_IF(push_inst(compiler, lgrl(dst_r, 0))); + else { + PTR_FAIL_IF(push_inst(compiler, larl(tmp1, 0))); + PTR_FAIL_IF(push_inst(compiler, lg(dst_r, 0, r0, tmp1))); + } + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(store_word(compiler, dst_r, dst, dstw, tmp1, 0)); + + return put_label; +} + +/* TODO(carenas): EVAL probably should move up or be refactored */ +#undef WHEN2 +#undef EVAL + +#undef tmp1 +#undef tmp0 + +/* TODO(carenas): undef other macros that spill like is_u12? 
*/ diff --git a/src/pcre/sljit/sljitNativeSPARC_32.c b/src/pcre2/src/sljit/sljitNativeSPARC_32.c similarity index 97% rename from src/pcre/sljit/sljitNativeSPARC_32.c rename to src/pcre2/src/sljit/sljitNativeSPARC_32.c index 0671b130..e5167f02 100644 --- a/src/pcre/sljit/sljitNativeSPARC_32.c +++ b/src/pcre2/src/sljit/sljitNativeSPARC_32.c @@ -266,19 +266,18 @@ static SLJIT_INLINE sljit_s32 emit_const(struct sljit_compiler *compiler, sljit_ SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { sljit_ins *inst = (sljit_ins *)addr; + SLJIT_UNUSED_ARG(executable_offset); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 0); + SLJIT_ASSERT(((inst[0] & 0xc1c00000) == 0x01000000) && ((inst[1] & 0xc1f82000) == 0x80102000)); inst[0] = (inst[0] & 0xffc00000) | ((new_target >> 10) & 0x3fffff); inst[1] = (inst[1] & 0xfffffc00) | (new_target & 0x3ff); + SLJIT_UPDATE_WX_FLAGS(inst, inst + 2, 1); inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); SLJIT_CACHE_FLUSH(inst, inst + 2); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { - sljit_ins *inst = (sljit_ins *)addr; - - inst[0] = (inst[0] & 0xffc00000) | ((new_constant >> 10) & 0x3fffff); - inst[1] = (inst[1] & 0xfffffc00) | (new_constant & 0x3ff); - inst = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(inst, executable_offset); - SLJIT_CACHE_FLUSH(inst, inst + 2); + sljit_set_jump_addr(addr, new_constant, executable_offset); } diff --git a/src/pcre/sljit/sljitNativeSPARC_common.c b/src/pcre2/src/sljit/sljitNativeSPARC_common.c similarity index 92% rename from src/pcre/sljit/sljitNativeSPARC_common.c rename to src/pcre2/src/sljit/sljitNativeSPARC_common.c index 669ecd81..544d80d0 100644 --- a/src/pcre/sljit/sljitNativeSPARC_common.c +++ b/src/pcre2/src/sljit/sljitNativeSPARC_common.c @@ -298,57 +298,71 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil sljit_ins *buf_ptr; 
sljit_ins *buf_end; sljit_uw word_count; + sljit_uw next_addr; sljit_sw executable_offset; sljit_uw addr; struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); reverse_buf(compiler); - code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins)); + code = (sljit_ins*)SLJIT_MALLOC_EXEC(compiler->size * sizeof(sljit_ins), compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; code_ptr = code; word_count = 0; + next_addr = 0; executable_offset = SLJIT_EXEC_OFFSET(code); label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; do { buf_ptr = (sljit_ins*)buf->memory; buf_end = buf_ptr + (buf->used_size >> 2); do { *code_ptr = *buf_ptr++; - SLJIT_ASSERT(!label || label->size >= word_count); - SLJIT_ASSERT(!jump || jump->addr >= word_count); - SLJIT_ASSERT(!const_ || const_->addr >= word_count); - /* These structures are ordered by their address. */ - if (label && label->size == word_count) { - /* Just recording the address. */ - label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); - label->size = code_ptr - code; - label = label->next; - } - if (jump && jump->addr == word_count) { + if (next_addr == word_count) { + SLJIT_ASSERT(!label || label->size >= word_count); + SLJIT_ASSERT(!jump || jump->addr >= word_count); + SLJIT_ASSERT(!const_ || const_->addr >= word_count); + SLJIT_ASSERT(!put_label || put_label->addr >= word_count); + + /* These structures are ordered by their address. */ + if (label && label->size == word_count) { + /* Just recording the address. 
*/ + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + label->size = code_ptr - code; + label = label->next; + } + if (jump && jump->addr == word_count) { #if (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) - jump->addr = (sljit_uw)(code_ptr - 3); + jump->addr = (sljit_uw)(code_ptr - 3); #else - jump->addr = (sljit_uw)(code_ptr - 6); + jump->addr = (sljit_uw)(code_ptr - 6); #endif - code_ptr = detect_jump_type(jump, code_ptr, code, executable_offset); - jump = jump->next; - } - if (const_ && const_->addr == word_count) { - /* Just recording the address. */ - const_->addr = (sljit_uw)code_ptr; - const_ = const_->next; + code_ptr = detect_jump_type(jump, code_ptr, code, executable_offset); + jump = jump->next; + } + if (const_ && const_->addr == word_count) { + /* Just recording the address. */ + const_->addr = (sljit_uw)code_ptr; + const_ = const_->next; + } + if (put_label && put_label->addr == word_count) { + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; + put_label = put_label->next; + } + next_addr = compute_next_addr(label, jump, const_, put_label); } code_ptr ++; word_count ++; @@ -366,6 +380,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + SLJIT_ASSERT(!put_label); SLJIT_ASSERT(code_ptr - code <= (sljit_s32)compiler->size); jump = compiler->jumps; @@ -389,8 +404,9 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil /* Set the fields of immediate loads. 
*/ #if (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) - buf_ptr[0] = (buf_ptr[0] & 0xffc00000) | ((addr >> 10) & 0x3fffff); - buf_ptr[1] = (buf_ptr[1] & 0xfffffc00) | (addr & 0x3ff); + SLJIT_ASSERT(((buf_ptr[0] & 0xc1cfffff) == 0x01000000) && ((buf_ptr[1] & 0xc1f83fff) == 0x80102000)); + buf_ptr[0] |= (addr >> 10) & 0x3fffff; + buf_ptr[1] |= addr & 0x3ff; #else #error "Implementation required" #endif @@ -398,6 +414,20 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil jump = jump->next; } + put_label = compiler->put_labels; + while (put_label) { + addr = put_label->label->addr; + buf_ptr = (sljit_ins *)put_label->addr; + +#if (defined SLJIT_CONFIG_SPARC_32 && SLJIT_CONFIG_SPARC_32) + SLJIT_ASSERT(((buf_ptr[0] & 0xc1cfffff) == 0x01000000) && ((buf_ptr[1] & 0xc1f83fff) == 0x80102000)); + buf_ptr[0] |= (addr >> 10) & 0x3fffff; + buf_ptr[1] |= addr & 0x3ff; +#else +#error "Implementation required" +#endif + put_label = put_label->next; + } compiler->error = SLJIT_ERR_COMPILED; compiler->executable_offset = executable_offset; @@ -407,6 +437,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil code_ptr = (sljit_ins *)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); SLJIT_CACHE_FLUSH(code, code_ptr); + SLJIT_UPDATE_WX_FLAGS(code, code_ptr, 1); return code; } @@ -421,6 +452,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) return 1; #endif + case SLJIT_HAS_ZERO_REGISTER: + return 1; + #if (defined SLJIT_CONFIG_SPARC_64 && SLJIT_CONFIG_SPARC_64) case SLJIT_HAS_CMOV: return 1; @@ -842,6 +876,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile #else #error "Implementation required" #endif + case SLJIT_ENDBR: + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return SLJIT_SUCCESS; } return SLJIT_SUCCESS; @@ -858,9 +895,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile ADJUST_LOCAL_OFFSET(dst, dstw); 
ADJUST_LOCAL_OFFSET(src, srcw); - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) - return SLJIT_SUCCESS; - op = GET_OPCODE(op); switch (op) { case SLJIT_MOV: @@ -941,6 +975,33 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return SLJIT_SUCCESS; } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + switch (op) { + case SLJIT_FAST_RETURN: + if (FAST_IS_REG(src)) + FAIL_IF(push_inst(compiler, OR | D(TMP_LINK) | S1(0) | S2(src), DR(TMP_LINK))); + else + FAIL_IF(emit_op_mem(compiler, WORD_DATA | LOAD_DATA, TMP_LINK, src, srcw)); + + FAIL_IF(push_inst(compiler, JMPL | D(0) | S1(TMP_LINK) | IMM(8), UNMOVABLE_INS)); + return push_inst(compiler, NOP, UNMOVABLE_INS); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: + return SLJIT_SUCCESS; + } + + return SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -1185,25 +1246,12 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * ADJUST_LOCAL_OFFSET(dst, dstw); if (FAST_IS_REG(dst)) - return push_inst(compiler, OR | D(dst) | S1(0) | S2(TMP_LINK), DR(dst)); + return push_inst(compiler, OR | D(dst) | S1(0) | S2(TMP_LINK), UNMOVABLE_INS); /* Memory. 
*/ - return emit_op_mem(compiler, WORD_DATA, TMP_LINK, dst, dstw); -} - -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) -{ - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - - if (FAST_IS_REG(src)) - FAIL_IF(push_inst(compiler, OR | D(TMP_LINK) | S1(0) | S2(src), DR(TMP_LINK))); - else - FAIL_IF(emit_op_mem(compiler, WORD_DATA | LOAD_DATA, TMP_LINK, src, srcw)); - - FAIL_IF(push_inst(compiler, JMPL | D(0) | S1(TMP_LINK) | IMM(8), UNMOVABLE_INS)); - return push_inst(compiler, NOP, UNMOVABLE_INS); + FAIL_IF(emit_op_mem(compiler, WORD_DATA, TMP_LINK, dst, dstw)); + compiler->delay_slot = UNMOVABLE_INS; + return SLJIT_SUCCESS; } /* --------------------------------------------------------------------- */ @@ -1465,8 +1513,8 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_cmov(struct sljit_compiler *compil SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw, sljit_sw init_value) { - sljit_s32 reg; struct sljit_const *const_; + sljit_s32 dst_r; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_emit_const(compiler, dst, dstw, init_value)); @@ -1476,11 +1524,31 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi PTR_FAIL_IF(!const_); set_const(const_, compiler); - reg = FAST_IS_REG(dst) ? dst : TMP_REG2; - - PTR_FAIL_IF(emit_const(compiler, reg, init_value)); + dst_r = FAST_IS_REG(dst) ? 
dst : TMP_REG2; + PTR_FAIL_IF(emit_const(compiler, dst_r, init_value)); if (dst & SLJIT_MEM) PTR_FAIL_IF(emit_op_mem(compiler, WORD_DATA, TMP_REG2, dst, dstw)); return const_; } + +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_s32 dst_r; + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); + + dst_r = FAST_IS_REG(dst) ? dst : TMP_REG2; + PTR_FAIL_IF(emit_const(compiler, dst_r, 0)); + + if (dst & SLJIT_MEM) + PTR_FAIL_IF(emit_op_mem(compiler, WORD_DATA, TMP_REG2, dst, dstw)); + return put_label; +} diff --git a/src/pcre/sljit/sljitNativeX86_32.c b/src/pcre2/src/sljit/sljitNativeX86_32.c similarity index 94% rename from src/pcre/sljit/sljitNativeX86_32.c rename to src/pcre2/src/sljit/sljitNativeX86_32.c index 074e64b9..79a7e8bb 100644 --- a/src/pcre/sljit/sljitNativeX86_32.c +++ b/src/pcre2/src/sljit/sljitNativeX86_32.c @@ -38,8 +38,10 @@ static sljit_s32 emit_do_imm(struct sljit_compiler *compiler, sljit_u8 opcode, s return SLJIT_SUCCESS; } -static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_s32 type, sljit_sw executable_offset) +static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_sw executable_offset) { + sljit_s32 type = jump->flags >> TYPE_SHIFT; + if (type == SLJIT_JUMP) { *code_ptr++ = JMP_i32; jump->addr++; @@ -74,6 +76,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_enter(struct sljit_compiler *compi CHECK(check_sljit_emit_enter(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size)); set_emit_enter(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size); + /* Emit 
ENDBR32 at function entry if needed. */ + FAIL_IF(emit_endbranch(compiler)); + args = get_arg_count(arg_types); compiler->args = args; @@ -305,14 +310,11 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_return(struct sljit_compiler *comp SLJIT_SP, 0, SLJIT_SP, 0, SLJIT_IMM, compiler->local_size)); #endif - size = 2 + (compiler->scratches > 7 ? (compiler->scratches - 7) : 0) + + size = 2 + (compiler->scratches > 9 ? (compiler->scratches - 9) : 0) + (compiler->saveds <= 3 ? compiler->saveds : 3); #if (defined SLJIT_X86_32_FASTCALL && SLJIT_X86_32_FASTCALL) if (compiler->args > 2) size += 2; -#else - if (compiler->args > 0) - size += 2; #endif inst = (sljit_u8*)ensure_buf(compiler, 1 + size); FAIL_IF(!inst); @@ -365,6 +367,8 @@ static sljit_u8* emit_x86_instruction(struct sljit_compiler *compiler, sljit_s32 SLJIT_ASSERT((flags & (EX86_PREF_F2 | EX86_PREF_F3)) != (EX86_PREF_F2 | EX86_PREF_F3) && (flags & (EX86_PREF_F2 | EX86_PREF_66)) != (EX86_PREF_F2 | EX86_PREF_66) && (flags & (EX86_PREF_F3 | EX86_PREF_66)) != (EX86_PREF_F3 | EX86_PREF_66)); + /* We don't support (%ebp). 
*/ + SLJIT_ASSERT(!(b & SLJIT_MEM) || immb || reg_map[b & REG_MASK] != 5); size &= 0xf; inst_size = size; @@ -861,14 +865,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return SLJIT_SUCCESS; } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) +static sljit_s32 emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) { sljit_u8 *inst; - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - CHECK_EXTRA_REGS(src, srcw, (void)0); if (FAST_IS_REG(src)) { @@ -892,3 +892,37 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler RET(); return SLJIT_SUCCESS; } + +static sljit_s32 skip_frames_before_return(struct sljit_compiler *compiler) +{ + sljit_s32 size, saved_size; + sljit_s32 has_f64_aligment; + + /* Don't adjust shadow stack if it isn't enabled. */ + if (!cpu_has_shadow_stack ()) + return SLJIT_SUCCESS; + + SLJIT_ASSERT(compiler->args >= 0); + SLJIT_ASSERT(compiler->local_size > 0); + +#if !defined(__APPLE__) + has_f64_aligment = compiler->options & SLJIT_F64_ALIGNMENT; +#else + has_f64_aligment = 0; +#endif + + size = compiler->local_size; + saved_size = (1 + (compiler->scratches > 9 ? (compiler->scratches - 9) : 0) + (compiler->saveds <= 3 ? compiler->saveds : 3)) * sizeof(sljit_uw); + if (has_f64_aligment) { + /* mov TMP_REG1, [esp + local_size]. */ + EMIT_MOV(compiler, TMP_REG1, 0, SLJIT_MEM1(SLJIT_SP), size); + /* mov TMP_REG1, [TMP_REG1+ saved_size]. */ + EMIT_MOV(compiler, TMP_REG1, 0, SLJIT_MEM1(TMP_REG1), saved_size); + /* Move return address to [esp]. 
*/ + EMIT_MOV(compiler, SLJIT_MEM1(SLJIT_SP), 0, TMP_REG1, 0); + size = 0; + } else + size += saved_size; + + return adjust_shadow_stack(compiler, SLJIT_UNUSED, 0, SLJIT_SP, size); +} diff --git a/src/pcre/sljit/sljitNativeX86_64.c b/src/pcre2/src/sljit/sljitNativeX86_64.c similarity index 91% rename from src/pcre/sljit/sljitNativeX86_64.c rename to src/pcre2/src/sljit/sljitNativeX86_64.c index 85065656..e85b56a6 100644 --- a/src/pcre/sljit/sljitNativeX86_64.c +++ b/src/pcre2/src/sljit/sljitNativeX86_64.c @@ -39,8 +39,10 @@ static sljit_s32 emit_load_imm64(struct sljit_compiler *compiler, sljit_s32 reg, return SLJIT_SUCCESS; } -static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_s32 type) +static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr) { + sljit_s32 type = jump->flags >> TYPE_SHIFT; + int short_addr = !(jump->flags & SLJIT_REWRITABLE_JUMP) && !(jump->flags & JUMP_LABEL) && (jump->u.target <= 0xffffffff); /* The relative jump below specialized for this case. */ @@ -72,6 +74,56 @@ static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ return code_ptr; } +static sljit_u8* generate_put_label_code(struct sljit_put_label *put_label, sljit_u8 *code_ptr, sljit_uw max_label) +{ + if (max_label > HALFWORD_MAX) { + put_label->addr -= put_label->flags; + put_label->flags = PATCH_MD; + return code_ptr; + } + + if (put_label->flags == 0) { + /* Destination is register. 
*/ + code_ptr = (sljit_u8*)put_label->addr - 2 - sizeof(sljit_uw); + + SLJIT_ASSERT((code_ptr[0] & 0xf8) == REX_W); + SLJIT_ASSERT((code_ptr[1] & 0xf8) == MOV_r_i32); + + if ((code_ptr[0] & 0x07) != 0) { + code_ptr[0] = (sljit_u8)(code_ptr[0] & ~0x08); + code_ptr += 2 + sizeof(sljit_s32); + } + else { + code_ptr[0] = code_ptr[1]; + code_ptr += 1 + sizeof(sljit_s32); + } + + put_label->addr = (sljit_uw)code_ptr; + return code_ptr; + } + + code_ptr -= put_label->flags + (2 + sizeof(sljit_uw)); + SLJIT_MEMMOVE(code_ptr, code_ptr + (2 + sizeof(sljit_uw)), put_label->flags); + + SLJIT_ASSERT((code_ptr[0] & 0xf8) == REX_W); + + if ((code_ptr[1] & 0xf8) == MOV_r_i32) { + code_ptr += 2 + sizeof(sljit_uw); + SLJIT_ASSERT((code_ptr[0] & 0xf8) == REX_W); + } + + SLJIT_ASSERT(code_ptr[1] == MOV_rm_r); + + code_ptr[0] = (sljit_u8)(code_ptr[0] & ~0x4); + code_ptr[1] = MOV_rm_i32; + code_ptr[2] = (sljit_u8)(code_ptr[2] & ~(0x7 << 3)); + + code_ptr = (sljit_u8*)(put_label->addr - (2 + sizeof(sljit_uw)) + sizeof(sljit_s32)); + put_label->addr = (sljit_uw)code_ptr; + put_label->flags = 0; + return code_ptr; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_enter(struct sljit_compiler *compiler, sljit_s32 options, sljit_s32 arg_types, sljit_s32 scratches, sljit_s32 saveds, sljit_s32 fscratches, sljit_s32 fsaveds, sljit_s32 local_size) @@ -83,6 +135,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_enter(struct sljit_compiler *compi CHECK(check_sljit_emit_enter(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size)); set_emit_enter(compiler, options, arg_types, scratches, saveds, fscratches, fsaveds, local_size); + /* Emit ENDBR64 at function entry if needed. 
*/ + FAIL_IF(emit_endbranch(compiler)); + compiler->mode32 = 0; #ifdef _WIN64 @@ -744,14 +799,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_enter(struct sljit_compiler * return SLJIT_SUCCESS; } -SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) +static sljit_s32 emit_fast_return(struct sljit_compiler *compiler, sljit_s32 src, sljit_sw srcw) { sljit_u8 *inst; - CHECK_ERROR(); - CHECK(check_sljit_emit_fast_return(compiler, src, srcw)); - ADJUST_LOCAL_OFFSET(src, srcw); - if (FAST_IS_REG(src)) { if (reg_map[src] < 8) { inst = (sljit_u8*)ensure_buf(compiler, 1 + 1 + 1); @@ -846,3 +897,22 @@ static sljit_s32 emit_mov_int(struct sljit_compiler *compiler, sljit_s32 sign, return SLJIT_SUCCESS; } + +static sljit_s32 skip_frames_before_return(struct sljit_compiler *compiler) +{ + sljit_s32 tmp, size; + + /* Don't adjust shadow stack if it isn't enabled. */ + if (!cpu_has_shadow_stack ()) + return SLJIT_SUCCESS; + + size = compiler->local_size; + tmp = compiler->scratches; + if (tmp >= SLJIT_FIRST_SAVED_REG) + size += (tmp - SLJIT_FIRST_SAVED_REG + 1) * sizeof(sljit_uw); + tmp = compiler->saveds < SLJIT_NUMBER_OF_SAVED_REGISTERS ? 
(SLJIT_S0 + 1 - compiler->saveds) : SLJIT_FIRST_SAVED_REG; + if (SLJIT_S0 >= tmp) + size += (SLJIT_S0 - tmp + 1) * sizeof(sljit_uw); + + return adjust_shadow_stack(compiler, SLJIT_UNUSED, 0, SLJIT_SP, size); +} diff --git a/src/pcre/sljit/sljitNativeX86_common.c b/src/pcre2/src/sljit/sljitNativeX86_common.c similarity index 89% rename from src/pcre/sljit/sljitNativeX86_common.c rename to src/pcre2/src/sljit/sljitNativeX86_common.c index 6f02ee3e..ddcc5ebf 100644 --- a/src/pcre/sljit/sljitNativeX86_common.c +++ b/src/pcre2/src/sljit/sljitNativeX86_common.c @@ -428,13 +428,15 @@ static sljit_u8 get_jump_code(sljit_s32 type) } #if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) -static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_s32 type, sljit_sw executable_offset); +static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_sw executable_offset); #else -static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_s32 type); +static sljit_u8* generate_far_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr); +static sljit_u8* generate_put_label_code(struct sljit_put_label *put_label, sljit_u8 *code_ptr, sljit_uw max_label); #endif -static sljit_u8* generate_near_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_u8 *code, sljit_s32 type, sljit_sw executable_offset) +static sljit_u8* generate_near_jump_code(struct sljit_jump *jump, sljit_u8 *code_ptr, sljit_u8 *code, sljit_sw executable_offset) { + sljit_s32 type = jump->flags >> TYPE_SHIFT; sljit_s32 short_jump; sljit_uw label_addr; @@ -447,7 +449,7 @@ static sljit_u8* generate_near_jump_code(struct sljit_jump *jump, sljit_u8 *code #if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) if ((sljit_sw)(label_addr - (jump->addr + 1)) > HALFWORD_MAX || (sljit_sw)(label_addr - (jump->addr + 1)) < HALFWORD_MIN) - return generate_far_jump_code(jump, code_ptr, type); + return generate_far_jump_code(jump, 
code_ptr); #endif if (type == SLJIT_JUMP) { @@ -497,13 +499,14 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil struct sljit_label *label; struct sljit_jump *jump; struct sljit_const *const_; + struct sljit_put_label *put_label; CHECK_ERROR_PTR(); CHECK_PTR(check_sljit_generate_code(compiler)); reverse_buf(compiler); /* Second code generation pass. */ - code = (sljit_u8*)SLJIT_MALLOC_EXEC(compiler->size); + code = (sljit_u8*)SLJIT_MALLOC_EXEC(compiler->size, compiler->exec_allocator_data); PTR_FAIL_WITH_EXEC_IF(code); buf = compiler->buf; @@ -511,6 +514,7 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil label = compiler->labels; jump = compiler->jumps; const_ = compiler->consts; + put_label = compiler->put_labels; executable_offset = SLJIT_EXEC_OFFSET(code); do { @@ -525,27 +529,38 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil buf_ptr += len; } else { - if (*buf_ptr >= 2) { + switch (*buf_ptr) { + case 0: + label->addr = (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset); + label->size = code_ptr - code; + label = label->next; + break; + case 1: jump->addr = (sljit_uw)code_ptr; if (!(jump->flags & SLJIT_REWRITABLE_JUMP)) - code_ptr = generate_near_jump_code(jump, code_ptr, code, *buf_ptr - 2, executable_offset); + code_ptr = generate_near_jump_code(jump, code_ptr, code, executable_offset); else { #if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) - code_ptr = generate_far_jump_code(jump, code_ptr, *buf_ptr - 2, executable_offset); + code_ptr = generate_far_jump_code(jump, code_ptr, executable_offset); #else - code_ptr = generate_far_jump_code(jump, code_ptr, *buf_ptr - 2); + code_ptr = generate_far_jump_code(jump, code_ptr); #endif } jump = jump->next; - } - else if (*buf_ptr == 0) { - label->addr = ((sljit_uw)code_ptr) + executable_offset; - label->size = code_ptr - code; - label = label->next; - } - else { /* *buf_ptr is 1 */ + break; + case 
2: const_->addr = ((sljit_uw)code_ptr) - sizeof(sljit_sw); const_ = const_->next; + break; + default: + SLJIT_ASSERT(*buf_ptr == 3); + SLJIT_ASSERT(put_label->label); + put_label->addr = (sljit_uw)code_ptr; +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + code_ptr = generate_put_label_code(put_label, code_ptr, (sljit_uw)SLJIT_ADD_EXEC_OFFSET(code, executable_offset) + put_label->label->size); +#endif + put_label = put_label->next; + break; } buf_ptr++; } @@ -557,6 +572,8 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil SLJIT_ASSERT(!label); SLJIT_ASSERT(!jump); SLJIT_ASSERT(!const_); + SLJIT_ASSERT(!put_label); + SLJIT_ASSERT(code_ptr <= code + compiler->size); jump = compiler->jumps; while (jump) { @@ -591,12 +608,32 @@ SLJIT_API_FUNC_ATTRIBUTE void* sljit_generate_code(struct sljit_compiler *compil jump = jump->next; } - /* Some space may be wasted because of short jumps. */ - SLJIT_ASSERT(code_ptr <= code + compiler->size); + put_label = compiler->put_labels; + while (put_label) { +#if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) + sljit_unaligned_store_sw((void*)(put_label->addr - sizeof(sljit_sw)), (sljit_sw)put_label->label->addr); +#else + if (put_label->flags & PATCH_MD) { + SLJIT_ASSERT(put_label->label->addr > HALFWORD_MAX); + sljit_unaligned_store_sw((void*)(put_label->addr - sizeof(sljit_sw)), (sljit_sw)put_label->label->addr); + } + else { + SLJIT_ASSERT(put_label->label->addr <= HALFWORD_MAX); + sljit_unaligned_store_s32((void*)(put_label->addr - sizeof(sljit_s32)), (sljit_s32)put_label->label->addr); + } +#endif + + put_label = put_label->next; + } + compiler->error = SLJIT_ERR_COMPILED; compiler->executable_offset = executable_offset; compiler->executable_size = code_ptr - code; - return (void*)(code + executable_offset); + + code = (sljit_u8*)SLJIT_ADD_EXEC_OFFSET(code, executable_offset); + + SLJIT_UPDATE_WX_FLAGS(code, (sljit_u8*)SLJIT_ADD_EXEC_OFFSET(code_ptr, executable_offset), 1); + return 
(void*)code; } SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) @@ -624,6 +661,9 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_has_cpu_feature(sljit_s32 feature_type) get_cpu_features(); return cpu_has_cmov; + case SLJIT_HAS_PREFETCH: + return 1; + case SLJIT_HAS_SSE2: #if (defined SLJIT_DETECT_SSE2 && SLJIT_DETECT_SSE2) if (cpu_has_sse2 == -1) @@ -669,6 +709,166 @@ static SLJIT_INLINE sljit_s32 emit_sse2_store(struct sljit_compiler *compiler, static SLJIT_INLINE sljit_s32 emit_sse2_load(struct sljit_compiler *compiler, sljit_s32 single, sljit_s32 dst, sljit_s32 src, sljit_sw srcw); +static sljit_s32 emit_cmp_binary(struct sljit_compiler *compiler, + sljit_s32 src1, sljit_sw src1w, + sljit_s32 src2, sljit_sw src2w); + +static SLJIT_INLINE sljit_s32 emit_endbranch(struct sljit_compiler *compiler) +{ +#if (defined SLJIT_CONFIG_X86_CET && SLJIT_CONFIG_X86_CET) + /* Emit endbr32/endbr64 when CET is enabled. */ + sljit_u8 *inst; + inst = (sljit_u8*)ensure_buf(compiler, 1 + 4); + FAIL_IF(!inst); + INC_SIZE(4); + *inst++ = 0xf3; + *inst++ = 0x0f; + *inst++ = 0x1e; +#if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) + *inst = 0xfb; +#else + *inst = 0xfa; +#endif +#else /* !SLJIT_CONFIG_X86_CET */ + SLJIT_UNUSED_ARG(compiler); +#endif /* SLJIT_CONFIG_X86_CET */ + return SLJIT_SUCCESS; +} + +#if (defined SLJIT_CONFIG_X86_CET && SLJIT_CONFIG_X86_CET) && defined (__SHSTK__) + +static SLJIT_INLINE sljit_s32 emit_rdssp(struct sljit_compiler *compiler, sljit_s32 reg) +{ + sljit_u8 *inst; + sljit_s32 size; + +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + size = 5; +#else + size = 4; +#endif + + inst = (sljit_u8*)ensure_buf(compiler, 1 + size); + FAIL_IF(!inst); + INC_SIZE(size); + *inst++ = 0xf3; +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + *inst++ = REX_W | (reg_map[reg] <= 7 ? 
0 : REX_B); +#endif + *inst++ = 0x0f; + *inst++ = 0x1e; + *inst = (0x3 << 6) | (0x1 << 3) | (reg_map[reg] & 0x7); + return SLJIT_SUCCESS; +} + +static SLJIT_INLINE sljit_s32 emit_incssp(struct sljit_compiler *compiler, sljit_s32 reg) +{ + sljit_u8 *inst; + sljit_s32 size; + +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + size = 5; +#else + size = 4; +#endif + + inst = (sljit_u8*)ensure_buf(compiler, 1 + size); + FAIL_IF(!inst); + INC_SIZE(size); + *inst++ = 0xf3; +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + *inst++ = REX_W | (reg_map[reg] <= 7 ? 0 : REX_B); +#endif + *inst++ = 0x0f; + *inst++ = 0xae; + *inst = (0x3 << 6) | (0x5 << 3) | (reg_map[reg] & 0x7); + return SLJIT_SUCCESS; +} + +#endif /* SLJIT_CONFIG_X86_CET && __SHSTK__ */ + +static SLJIT_INLINE sljit_s32 cpu_has_shadow_stack(void) +{ +#if (defined SLJIT_CONFIG_X86_CET && SLJIT_CONFIG_X86_CET) && defined (__SHSTK__) + return _get_ssp() != 0; +#else /* !SLJIT_CONFIG_X86_CET || !__SHSTK__ */ + return 0; +#endif /* SLJIT_CONFIG_X86_CET && __SHSTK__ */ +} + +static SLJIT_INLINE sljit_s32 adjust_shadow_stack(struct sljit_compiler *compiler, + sljit_s32 src, sljit_sw srcw, sljit_s32 base, sljit_sw disp) +{ +#if (defined SLJIT_CONFIG_X86_CET && SLJIT_CONFIG_X86_CET) && defined (__SHSTK__) + sljit_u8 *inst, *jz_after_cmp_inst; + sljit_uw size_jz_after_cmp_inst; + + sljit_uw size_before_rdssp_inst = compiler->size; + + /* Generate "RDSSP TMP_REG1". */ + FAIL_IF(emit_rdssp(compiler, TMP_REG1)); + + /* Load return address on shadow stack into TMP_REG1. */ +#if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) + SLJIT_ASSERT(reg_map[TMP_REG1] == 5); + + /* Hand code unsupported "mov 0x0(%ebp),%ebp". 
*/ + inst = (sljit_u8*)ensure_buf(compiler, 1 + 3); + FAIL_IF(!inst); + INC_SIZE(3); + *inst++ = 0x8b; + *inst++ = 0x6d; + *inst = 0; +#else /* !SLJIT_CONFIG_X86_32 */ + EMIT_MOV(compiler, TMP_REG1, 0, SLJIT_MEM1(TMP_REG1), 0); +#endif /* SLJIT_CONFIG_X86_32 */ + + if (src == SLJIT_UNUSED) { + /* Return address is on stack. */ + src = SLJIT_MEM1(base); + srcw = disp; + } + + /* Compare return address against TMP_REG1. */ + FAIL_IF(emit_cmp_binary (compiler, TMP_REG1, 0, src, srcw)); + + /* Generate JZ to skip shadow stack adjustment when shadow + stack matches normal stack. */ + inst = (sljit_u8*)ensure_buf(compiler, 1 + 2); + FAIL_IF(!inst); + INC_SIZE(2); + *inst++ = get_jump_code(SLJIT_EQUAL) - 0x10; + size_jz_after_cmp_inst = compiler->size; + jz_after_cmp_inst = inst; + +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + /* REX_W is not necessary. */ + compiler->mode32 = 1; +#endif + /* Load 1 into TMP_REG1. */ + EMIT_MOV(compiler, TMP_REG1, 0, SLJIT_IMM, 1); + + /* Generate "INCSSP TMP_REG1". */ + FAIL_IF(emit_incssp(compiler, TMP_REG1)); + + /* Jump back to "RDSSP TMP_REG1" to check shadow stack again. 
*/ + inst = (sljit_u8*)ensure_buf(compiler, 1 + 2); + FAIL_IF(!inst); + INC_SIZE(2); + *inst++ = JMP_i8; + *inst = size_before_rdssp_inst - compiler->size; + + *jz_after_cmp_inst = compiler->size - size_jz_after_cmp_inst; +#else /* !SLJIT_CONFIG_X86_CET || !__SHSTK__ */ + SLJIT_UNUSED_ARG(compiler); + SLJIT_UNUSED_ARG(src); + SLJIT_UNUSED_ARG(srcw); + SLJIT_UNUSED_ARG(base); + SLJIT_UNUSED_ARG(disp); +#endif /* SLJIT_CONFIG_X86_CET && __SHSTK__ */ + return SLJIT_SUCCESS; +} + #if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) #include "sljitNativeX86_32.c" #else @@ -872,6 +1072,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op0(struct sljit_compiler *compile EMIT_MOV(compiler, SLJIT_R1, 0, TMP_REG1, 0); #endif break; + case SLJIT_ENDBR: + return emit_endbranch(compiler); + case SLJIT_SKIP_FRAMES_BEFORE_RETURN: + return skip_frames_before_return(compiler); } return SLJIT_SUCCESS; @@ -1041,12 +1245,12 @@ static sljit_s32 emit_prefetch(struct sljit_compiler *compiler, sljit_s32 op, *inst++ = GROUP_0F; *inst++ = PREFETCH; - if (op >= SLJIT_MOV_U8 && op <= SLJIT_MOV_S8) - *inst |= (3 << 3); - else if (op >= SLJIT_MOV_U16 && op <= SLJIT_MOV_S16) - *inst |= (2 << 3); - else + if (op == SLJIT_PREFETCH_L1) *inst |= (1 << 3); + else if (op == SLJIT_PREFETCH_L2) + *inst |= (2 << 3); + else if (op == SLJIT_PREFETCH_L3) + *inst |= (3 << 3); return SLJIT_SUCCESS; } @@ -1251,12 +1455,6 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op1(struct sljit_compiler *compile compiler->mode32 = op_flags & SLJIT_I32_OP; #endif - if (dst == SLJIT_UNUSED && !HAS_FLAGS(op)) { - if (op <= SLJIT_MOV_P && (src & SLJIT_MEM)) - return emit_prefetch(compiler, op, src, srcw); - return SLJIT_SUCCESS; - } - op = GET_OPCODE(op); if (op >= SLJIT_MOV && op <= SLJIT_MOV_P) { @@ -2117,6 +2315,10 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile if (!HAS_FLAGS(op)) { if ((src2 & SLJIT_IMM) && emit_lea_binary(compiler, dst, dstw, src1, src1w, SLJIT_IMM, -src2w) 
!= SLJIT_ERR_UNSUPPORTED) return compiler->error; + if (SLOW_IS_REG(dst) && src2 == dst) { + FAIL_IF(emit_non_cum_binary(compiler, BINARY_OPCODE(SUB), dst, 0, dst, 0, src1, src1w)); + return emit_unary(compiler, NEG_rm, dst, 0, dst, 0); + } } if (dst == SLJIT_UNUSED) @@ -2153,6 +2355,33 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op2(struct sljit_compiler *compile return SLJIT_SUCCESS; } +SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_op_src(struct sljit_compiler *compiler, sljit_s32 op, + sljit_s32 src, sljit_sw srcw) +{ + CHECK_ERROR(); + CHECK(check_sljit_emit_op_src(compiler, op, src, srcw)); + ADJUST_LOCAL_OFFSET(src, srcw); + + CHECK_EXTRA_REGS(src, srcw, (void)0); + + switch (op) { + case SLJIT_FAST_RETURN: + return emit_fast_return(compiler, src, srcw); + case SLJIT_SKIP_FRAMES_BEFORE_FAST_RETURN: + /* Don't adjust shadow stack if it isn't enabled. */ + if (!cpu_has_shadow_stack ()) + return SLJIT_SUCCESS; + return adjust_shadow_stack(compiler, src, srcw, SLJIT_UNUSED, 0); + case SLJIT_PREFETCH_L1: + case SLJIT_PREFETCH_L2: + case SLJIT_PREFETCH_L3: + case SLJIT_PREFETCH_ONCE: + return emit_prefetch(compiler, op, src, srcw); + } + + return SLJIT_SUCCESS; +} + SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_get_register_index(sljit_s32 reg) { CHECK_REG_INDEX(check_sljit_get_register_index(reg)); @@ -2481,7 +2710,7 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump* sljit_emit_jump(struct sljit_compile jump = (struct sljit_jump*)ensure_abuf(compiler, sizeof(struct sljit_jump)); PTR_FAIL_IF_NULL(jump); - set_jump(jump, compiler, type & SLJIT_REWRITABLE_JUMP); + set_jump(jump, compiler, (type & SLJIT_REWRITABLE_JUMP) | ((type & 0xff) << TYPE_SHIFT)); type &= 0xff; /* Worst case size. 
*/ @@ -2495,7 +2724,7 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_jump* sljit_emit_jump(struct sljit_compile PTR_FAIL_IF_NULL(inst); *inst++ = 0; - *inst++ = type + 2; + *inst++ = 1; return jump; } @@ -2513,7 +2742,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_ijump(struct sljit_compiler *compi if (src == SLJIT_IMM) { jump = (struct sljit_jump*)ensure_abuf(compiler, sizeof(struct sljit_jump)); FAIL_IF_NULL(jump); - set_jump(jump, compiler, JUMP_ADDR); + set_jump(jump, compiler, JUMP_ADDR | (type << TYPE_SHIFT)); jump->u.target = srcw; /* Worst case size. */ @@ -2527,7 +2756,7 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_s32 sljit_emit_ijump(struct sljit_compiler *compi FAIL_IF_NULL(inst); *inst++ = 0; - *inst++ = type + 2; + *inst++ = 1; } else { #if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) @@ -2831,7 +3060,7 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi PTR_FAIL_IF(!inst); *inst++ = 0; - *inst++ = 1; + *inst++ = 2; #if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) if (dst & SLJIT_MEM) @@ -2842,18 +3071,72 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_const* sljit_emit_const(struct sljit_compi return const_; } +SLJIT_API_FUNC_ATTRIBUTE struct sljit_put_label* sljit_emit_put_label(struct sljit_compiler *compiler, sljit_s32 dst, sljit_sw dstw) +{ + struct sljit_put_label *put_label; + sljit_u8 *inst; +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + sljit_s32 reg; + sljit_uw start_size; +#endif + + CHECK_ERROR_PTR(); + CHECK_PTR(check_sljit_emit_put_label(compiler, dst, dstw)); + ADJUST_LOCAL_OFFSET(dst, dstw); + + CHECK_EXTRA_REGS(dst, dstw, (void)0); + + put_label = (struct sljit_put_label*)ensure_abuf(compiler, sizeof(struct sljit_put_label)); + PTR_FAIL_IF(!put_label); + set_put_label(put_label, compiler, 0); + +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + compiler->mode32 = 0; + reg = FAST_IS_REG(dst) ? 
dst : TMP_REG1; + + if (emit_load_imm64(compiler, reg, 0)) + return NULL; +#else + if (emit_mov(compiler, dst, dstw, SLJIT_IMM, 0)) + return NULL; +#endif + +#if (defined SLJIT_CONFIG_X86_64 && SLJIT_CONFIG_X86_64) + if (dst & SLJIT_MEM) { + start_size = compiler->size; + if (emit_mov(compiler, dst, dstw, TMP_REG1, 0)) + return NULL; + put_label->flags = compiler->size - start_size; + } +#endif + + inst = (sljit_u8*)ensure_buf(compiler, 2); + PTR_FAIL_IF(!inst); + + *inst++ = 0; + *inst++ = 3; + + return put_label; +} + SLJIT_API_FUNC_ATTRIBUTE void sljit_set_jump_addr(sljit_uw addr, sljit_uw new_target, sljit_sw executable_offset) { SLJIT_UNUSED_ARG(executable_offset); + + SLJIT_UPDATE_WX_FLAGS((void*)addr, (void*)(addr + sizeof(sljit_uw)), 0); #if (defined SLJIT_CONFIG_X86_32 && SLJIT_CONFIG_X86_32) sljit_unaligned_store_sw((void*)addr, new_target - (addr + 4) - (sljit_uw)executable_offset); #else sljit_unaligned_store_sw((void*)addr, (sljit_sw) new_target); #endif + SLJIT_UPDATE_WX_FLAGS((void*)addr, (void*)(addr + sizeof(sljit_uw)), 1); } SLJIT_API_FUNC_ATTRIBUTE void sljit_set_const(sljit_uw addr, sljit_sw new_constant, sljit_sw executable_offset) { SLJIT_UNUSED_ARG(executable_offset); + + SLJIT_UPDATE_WX_FLAGS((void*)addr, (void*)(addr + sizeof(sljit_sw)), 0); sljit_unaligned_store_sw((void*)addr, new_constant); + SLJIT_UPDATE_WX_FLAGS((void*)addr, (void*)(addr + sizeof(sljit_sw)), 1); } diff --git a/src/pcre2/src/sljit/sljitProtExecAllocator.c b/src/pcre2/src/sljit/sljitProtExecAllocator.c new file mode 100644 index 00000000..147175af --- /dev/null +++ b/src/pcre2/src/sljit/sljitProtExecAllocator.c @@ -0,0 +1,474 @@ +/* + * Stack-less Just-In-Time compiler + * + * Copyright Zoltan Herczeg (hzmester@freemail.hu). All rights reserved. + * + * Redistribution and use in source and binary forms, with or without modification, are + * permitted provided that the following conditions are met: + * + * 1. 
Redistributions of source code must retain the above copyright notice, this list of + * conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright notice, this list + * of conditions and the following disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) AND CONTRIBUTORS ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT + * SHALL THE COPYRIGHT HOLDER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + This file contains a simple executable memory allocator + + It is assumed, that executable code blocks are usually medium (or sometimes + large) memory blocks, and the allocator is not too frequently called (less + optimized than other allocators). Thus, using it as a generic allocator is + not suggested. + + How does it work: + Memory is allocated in continuous memory areas called chunks by alloc_chunk() + Chunk format: + [ block ][ block ] ... [ block ][ block terminator ] + + All blocks and the block terminator is started with block_header. The block + header contains the size of the previous and the next block. These sizes + can also contain special values. + Block size: + 0 - The block is a free_block, with a different size member. + 1 - The block is a block terminator. 
+ n - The block is used at the moment, and the value contains its size. + Previous block size: + 0 - This is the first block of the memory chunk. + n - The size of the previous block. + + Using these size values we can go forward or backward on the block chain. + The unused blocks are stored in a chain list pointed by free_blocks. This + list is useful if we need to find a suitable memory area when the allocator + is called. + + When a block is freed, the new free block is connected to its adjacent free + blocks if possible. + + [ free block ][ used block ][ free block ] + and "used block" is freed, the three blocks are connected together: + [ one big free block ] +*/ + +/* --------------------------------------------------------------------- */ +/* System (OS) functions */ +/* --------------------------------------------------------------------- */ + +/* 64 KByte. */ +#define CHUNK_SIZE 0x10000 + +struct chunk_header { + void *executable; +}; + +/* + alloc_chunk / free_chunk : + * allocate executable system memory chunks + * the size is always divisible by CHUNK_SIZE + SLJIT_ALLOCATOR_LOCK / SLJIT_ALLOCATOR_UNLOCK : + * provided as part of sljitUtils + * only the allocator requires this lock, sljit is fully thread safe + as it only uses local variables +*/ + +#ifndef __NetBSD__ +#include +#include +#include +#include + +#ifndef O_NOATIME +#define O_NOATIME 0 +#endif + +/* this is a linux extension available since kernel 3.11 */ +#ifndef O_TMPFILE +#define O_TMPFILE 020200000 +#endif + +#ifndef _GNU_SOURCE +char *secure_getenv(const char *name); +int mkostemp(char *template, int flags); +#endif + +static SLJIT_INLINE int create_tempfile(void) +{ + int fd; + char tmp_name[256]; + size_t tmp_name_len = 0; + char *dir; + struct stat st; +#if defined(SLJIT_SINGLE_THREADED) && SLJIT_SINGLE_THREADED + mode_t mode; +#endif + +#ifdef HAVE_MEMFD_CREATE + /* this is a GNU extension, make sure to use -D_GNU_SOURCE */ + fd = memfd_create("sljit", MFD_CLOEXEC); + if (fd != -1) 
{ + fchmod(fd, 0); + return fd; + } +#endif + + dir = secure_getenv("TMPDIR"); + + if (dir) { + tmp_name_len = strlen(dir); + if (tmp_name_len > 0 && tmp_name_len < sizeof(tmp_name)) { + if ((stat(dir, &st) == 0) && S_ISDIR(st.st_mode)) + strcpy(tmp_name, dir); + } + } + +#ifdef P_tmpdir + if (!tmp_name_len) { + tmp_name_len = strlen(P_tmpdir); + if (tmp_name_len > 0 && tmp_name_len < sizeof(tmp_name)) + strcpy(tmp_name, P_tmpdir); + } +#endif + if (!tmp_name_len) { + strcpy(tmp_name, "/tmp"); + tmp_name_len = 4; + } + + SLJIT_ASSERT(tmp_name_len > 0 && tmp_name_len < sizeof(tmp_name)); + + if (tmp_name[tmp_name_len - 1] == '/') + tmp_name[--tmp_name_len] = '\0'; + +#ifdef __linux__ + /* + * the previous trimming might have left an empty string if TMPDIR="/" + * so work around the problem below + */ + fd = open(tmp_name_len ? tmp_name : "/", + O_TMPFILE | O_EXCL | O_RDWR | O_NOATIME | O_CLOEXEC, 0); + if (fd != -1) + return fd; +#endif + + if (tmp_name_len + 7 >= sizeof(tmp_name)) + return -1; + + strcpy(tmp_name + tmp_name_len, "/XXXXXX"); +#if defined(SLJIT_SINGLE_THREADED) && SLJIT_SINGLE_THREADED + mode = umask(0777); +#endif + fd = mkostemp(tmp_name, O_CLOEXEC | O_NOATIME); +#if defined(SLJIT_SINGLE_THREADED) && SLJIT_SINGLE_THREADED + umask(mode); +#else + fchmod(fd, 0); +#endif + + if (fd == -1) + return -1; + + if (unlink(tmp_name)) { + close(fd); + return -1; + } + + return fd; +} + +static SLJIT_INLINE struct chunk_header* alloc_chunk(sljit_uw size) +{ + struct chunk_header *retval; + int fd; + + fd = create_tempfile(); + if (fd == -1) + return NULL; + + if (ftruncate(fd, size)) { + close(fd); + return NULL; + } + + retval = (struct chunk_header *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + + if (retval == MAP_FAILED) { + close(fd); + return NULL; + } + + retval->executable = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0); + + if (retval->executable == MAP_FAILED) { + munmap((void *)retval, size); + close(fd); + return NULL; 
+ } + + close(fd); + return retval; +} +#else +/* + * MAP_REMAPDUP is a NetBSD extension available since 8.0, make sure to + * adjust your feature macros (ex: -D_NETBSD_SOURCE) as needed + */ +static SLJIT_INLINE struct chunk_header* alloc_chunk(sljit_uw size) +{ + struct chunk_header *retval; + + retval = (struct chunk_header *)mmap(NULL, size, + PROT_READ | PROT_WRITE | PROT_MPROTECT(PROT_EXEC), + MAP_ANON | MAP_SHARED, -1, 0); + + if (retval == MAP_FAILED) + return NULL; + + retval->executable = mremap(retval, size, NULL, size, MAP_REMAPDUP); + if (retval->executable == MAP_FAILED) { + munmap((void *)retval, size); + return NULL; + } + + if (mprotect(retval->executable, size, PROT_READ | PROT_EXEC) == -1) { + munmap(retval->executable, size); + munmap((void *)retval, size); + return NULL; + } + + return retval; +} +#endif /* NetBSD */ + +static SLJIT_INLINE void free_chunk(void *chunk, sljit_uw size) +{ + struct chunk_header *header = ((struct chunk_header *)chunk) - 1; + + munmap(header->executable, size); + munmap((void *)header, size); +} + +/* --------------------------------------------------------------------- */ +/* Common functions */ +/* --------------------------------------------------------------------- */ + +#define CHUNK_MASK (~(CHUNK_SIZE - 1)) + +struct block_header { + sljit_uw size; + sljit_uw prev_size; + sljit_sw executable_offset; +}; + +struct free_block { + struct block_header header; + struct free_block *next; + struct free_block *prev; + sljit_uw size; +}; + +#define AS_BLOCK_HEADER(base, offset) \ + ((struct block_header*)(((sljit_u8*)base) + offset)) +#define AS_FREE_BLOCK(base, offset) \ + ((struct free_block*)(((sljit_u8*)base) + offset)) +#define MEM_START(base) ((void*)((base) + 1)) +#define ALIGN_SIZE(size) (((size) + sizeof(struct block_header) + 7) & ~7) + +static struct free_block* free_blocks; +static sljit_uw allocated_size; +static sljit_uw total_size; + +static SLJIT_INLINE void sljit_insert_free_block(struct free_block 
*free_block, sljit_uw size) +{ + free_block->header.size = 0; + free_block->size = size; + + free_block->next = free_blocks; + free_block->prev = NULL; + if (free_blocks) + free_blocks->prev = free_block; + free_blocks = free_block; +} + +static SLJIT_INLINE void sljit_remove_free_block(struct free_block *free_block) +{ + if (free_block->next) + free_block->next->prev = free_block->prev; + + if (free_block->prev) + free_block->prev->next = free_block->next; + else { + SLJIT_ASSERT(free_blocks == free_block); + free_blocks = free_block->next; + } +} + +SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) +{ + struct chunk_header *chunk_header; + struct block_header *header; + struct block_header *next_header; + struct free_block *free_block; + sljit_uw chunk_size; + sljit_sw executable_offset; + + SLJIT_ALLOCATOR_LOCK(); + if (size < (64 - sizeof(struct block_header))) + size = (64 - sizeof(struct block_header)); + size = ALIGN_SIZE(size); + + free_block = free_blocks; + while (free_block) { + if (free_block->size >= size) { + chunk_size = free_block->size; + if (chunk_size > size + 64) { + /* We just cut a block from the end of the free block. 
*/ + chunk_size -= size; + free_block->size = chunk_size; + header = AS_BLOCK_HEADER(free_block, chunk_size); + header->prev_size = chunk_size; + header->executable_offset = free_block->header.executable_offset; + AS_BLOCK_HEADER(header, size)->prev_size = size; + } + else { + sljit_remove_free_block(free_block); + header = (struct block_header*)free_block; + size = chunk_size; + } + allocated_size += size; + header->size = size; + SLJIT_ALLOCATOR_UNLOCK(); + return MEM_START(header); + } + free_block = free_block->next; + } + + chunk_size = sizeof(struct chunk_header) + sizeof(struct block_header); + chunk_size = (chunk_size + size + CHUNK_SIZE - 1) & CHUNK_MASK; + + chunk_header = alloc_chunk(chunk_size); + if (!chunk_header) { + SLJIT_ALLOCATOR_UNLOCK(); + return NULL; + } + + executable_offset = (sljit_sw)((sljit_u8*)chunk_header->executable - (sljit_u8*)chunk_header); + + chunk_size -= sizeof(struct chunk_header) + sizeof(struct block_header); + total_size += chunk_size; + + header = (struct block_header *)(chunk_header + 1); + + header->prev_size = 0; + header->executable_offset = executable_offset; + if (chunk_size > size + 64) { + /* Cut the allocated space into a free and a used block. */ + allocated_size += size; + header->size = size; + chunk_size -= size; + + free_block = AS_FREE_BLOCK(header, size); + free_block->header.prev_size = size; + free_block->header.executable_offset = executable_offset; + sljit_insert_free_block(free_block, chunk_size); + next_header = AS_BLOCK_HEADER(free_block, chunk_size); + } + else { + /* All space belongs to this allocation. 
*/ + allocated_size += chunk_size; + header->size = chunk_size; + next_header = AS_BLOCK_HEADER(header, chunk_size); + } + next_header->size = 1; + next_header->prev_size = chunk_size; + next_header->executable_offset = executable_offset; + SLJIT_ALLOCATOR_UNLOCK(); + return MEM_START(header); +} + +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_exec(void* ptr) +{ + struct block_header *header; + struct free_block* free_block; + + SLJIT_ALLOCATOR_LOCK(); + header = AS_BLOCK_HEADER(ptr, -(sljit_sw)sizeof(struct block_header)); + header = AS_BLOCK_HEADER(header, -header->executable_offset); + allocated_size -= header->size; + + /* Connecting free blocks together if possible. */ + + /* If header->prev_size == 0, free_block will equal to header. + In this case, free_block->header.size will be > 0. */ + free_block = AS_FREE_BLOCK(header, -(sljit_sw)header->prev_size); + if (SLJIT_UNLIKELY(!free_block->header.size)) { + free_block->size += header->size; + header = AS_BLOCK_HEADER(free_block, free_block->size); + header->prev_size = free_block->size; + } + else { + free_block = (struct free_block*)header; + sljit_insert_free_block(free_block, header->size); + } + + header = AS_BLOCK_HEADER(free_block, free_block->size); + if (SLJIT_UNLIKELY(!header->size)) { + free_block->size += ((struct free_block*)header)->size; + sljit_remove_free_block((struct free_block*)header); + header = AS_BLOCK_HEADER(free_block, free_block->size); + header->prev_size = free_block->size; + } + + /* The whole chunk is free. */ + if (SLJIT_UNLIKELY(!free_block->header.prev_size && header->size == 1)) { + /* If this block is freed, we still have (allocated_size / 2) free space. 
*/ + if (total_size - free_block->size > (allocated_size * 3 / 2)) { + total_size -= free_block->size; + sljit_remove_free_block(free_block); + free_chunk(free_block, free_block->size + + sizeof(struct chunk_header) + + sizeof(struct block_header)); + } + } + + SLJIT_ALLOCATOR_UNLOCK(); +} + +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_unused_memory_exec(void) +{ + struct free_block* free_block; + struct free_block* next_free_block; + + SLJIT_ALLOCATOR_LOCK(); + + free_block = free_blocks; + while (free_block) { + next_free_block = free_block->next; + if (!free_block->header.prev_size && + AS_BLOCK_HEADER(free_block, free_block->size)->size == 1) { + total_size -= free_block->size; + sljit_remove_free_block(free_block); + free_chunk(free_block, free_block->size + + sizeof(struct chunk_header) + + sizeof(struct block_header)); + } + free_block = next_free_block; + } + + SLJIT_ASSERT((total_size && free_blocks) || (!total_size && !free_blocks)); + SLJIT_ALLOCATOR_UNLOCK(); +} + +SLJIT_API_FUNC_ATTRIBUTE sljit_sw sljit_exec_offset(void* ptr) +{ + return ((struct block_header *)(ptr))[-1].executable_offset; +} diff --git a/src/pcre/sljit/sljitUtils.c b/src/pcre2/src/sljit/sljitUtils.c similarity index 52% rename from src/pcre/sljit/sljitUtils.c rename to src/pcre2/src/sljit/sljitUtils.c index 5c2a8389..9bce7147 100644 --- a/src/pcre/sljit/sljitUtils.c +++ b/src/pcre2/src/sljit/sljitUtils.c @@ -28,214 +28,229 @@ /* Locks */ /* ------------------------------------------------------------------------ */ -#if (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) || (defined SLJIT_UTIL_GLOBAL_LOCK && SLJIT_UTIL_GLOBAL_LOCK) +/* Executable Allocator */ +#if (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) \ + && !(defined SLJIT_WX_EXECUTABLE_ALLOCATOR && SLJIT_WX_EXECUTABLE_ALLOCATOR) #if (defined SLJIT_SINGLE_THREADED && SLJIT_SINGLE_THREADED) +#define SLJIT_ALLOCATOR_LOCK() +#define SLJIT_ALLOCATOR_UNLOCK() +#elif !(defined _WIN32) +#include + 
+static pthread_mutex_t allocator_lock = PTHREAD_MUTEX_INITIALIZER; -#if (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) +#define SLJIT_ALLOCATOR_LOCK() pthread_mutex_lock(&allocator_lock) +#define SLJIT_ALLOCATOR_UNLOCK() pthread_mutex_unlock(&allocator_lock) +#else /* windows */ +static HANDLE allocator_lock; static SLJIT_INLINE void allocator_grab_lock(void) { - /* Always successful. */ + HANDLE lock; + if (SLJIT_UNLIKELY(!InterlockedCompareExchangePointer(&allocator_lock, NULL, NULL))) { + lock = CreateMutex(NULL, FALSE, NULL); + if (InterlockedCompareExchangePointer(&allocator_lock, lock, NULL)) + CloseHandle(lock); + } + WaitForSingleObject(allocator_lock, INFINITE); } -static SLJIT_INLINE void allocator_release_lock(void) -{ - /* Always successful. */ -} +#define SLJIT_ALLOCATOR_LOCK() allocator_grab_lock() +#define SLJIT_ALLOCATOR_UNLOCK() ReleaseMutex(allocator_lock) +#endif /* thread implementation */ +#endif /* SLJIT_EXECUTABLE_ALLOCATOR && !SLJIT_WX_EXECUTABLE_ALLOCATOR */ -#endif /* SLJIT_EXECUTABLE_ALLOCATOR */ +/* ------------------------------------------------------------------------ */ +/* Stack */ +/* ------------------------------------------------------------------------ */ -#if (defined SLJIT_UTIL_GLOBAL_LOCK && SLJIT_UTIL_GLOBAL_LOCK) +#if ((defined SLJIT_UTIL_STACK && SLJIT_UTIL_STACK) \ + && !(defined SLJIT_UTIL_SIMPLE_STACK_ALLOCATION && SLJIT_UTIL_SIMPLE_STACK_ALLOCATION)) \ + || ((defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) \ + && !((defined SLJIT_PROT_EXECUTABLE_ALLOCATOR && SLJIT_PROT_EXECUTABLE_ALLOCATOR) \ + || (defined SLJIT_WX_EXECUTABLE_ALLOCATOR && SLJIT_WX_EXECUTABLE_ALLOCATOR))) -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_grab_lock(void) -{ - /* Always successful. */ -} +#ifndef _WIN32 +/* Provides mmap function. */ +#include +#include -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_release_lock(void) -{ - /* Always successful. 
*/ -} +#ifndef MAP_ANON +#ifdef MAP_ANONYMOUS +#define MAP_ANON MAP_ANONYMOUS +#endif /* MAP_ANONYMOUS */ +#endif /* !MAP_ANON */ -#endif /* SLJIT_UTIL_GLOBAL_LOCK */ +#ifndef MAP_ANON -#elif defined(_WIN32) /* SLJIT_SINGLE_THREADED */ +#include -#include "windows.h" +#ifdef O_CLOEXEC +#define SLJIT_CLOEXEC O_CLOEXEC +#else /* !O_CLOEXEC */ +#define SLJIT_CLOEXEC 0 +#endif /* O_CLOEXEC */ -#if (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) +/* Some old systems do not have MAP_ANON. */ +static int dev_zero = -1; -static HANDLE allocator_mutex = 0; +#if (defined SLJIT_SINGLE_THREADED && SLJIT_SINGLE_THREADED) -static SLJIT_INLINE void allocator_grab_lock(void) +static SLJIT_INLINE int open_dev_zero(void) { - /* No idea what to do if an error occures. Static mutexes should never fail... */ - if (!allocator_mutex) - allocator_mutex = CreateMutex(NULL, TRUE, NULL); - else - WaitForSingleObject(allocator_mutex, INFINITE); -} + dev_zero = open("/dev/zero", O_RDWR | SLJIT_CLOEXEC); -static SLJIT_INLINE void allocator_release_lock(void) -{ - ReleaseMutex(allocator_mutex); + return dev_zero < 0; } -#endif /* SLJIT_EXECUTABLE_ALLOCATOR */ +#else /* !SLJIT_SINGLE_THREADED */ -#if (defined SLJIT_UTIL_GLOBAL_LOCK && SLJIT_UTIL_GLOBAL_LOCK) +#include -static HANDLE global_mutex = 0; +static pthread_mutex_t dev_zero_mutex = PTHREAD_MUTEX_INITIALIZER; -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_grab_lock(void) +static SLJIT_INLINE int open_dev_zero(void) { - /* No idea what to do if an error occures. Static mutexes should never fail... 
*/ - if (!global_mutex) - global_mutex = CreateMutex(NULL, TRUE, NULL); - else - WaitForSingleObject(global_mutex, INFINITE); -} + pthread_mutex_lock(&dev_zero_mutex); + if (SLJIT_UNLIKELY(dev_zero < 0)) + dev_zero = open("/dev/zero", O_RDWR | SLJIT_CLOEXEC); -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_release_lock(void) -{ - ReleaseMutex(global_mutex); + pthread_mutex_unlock(&dev_zero_mutex); + return dev_zero < 0; } -#endif /* SLJIT_UTIL_GLOBAL_LOCK */ - -#else /* _WIN32 */ - -#if (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) - -#include +#endif /* SLJIT_SINGLE_THREADED */ +#undef SLJIT_CLOEXEC +#endif /* !MAP_ANON */ +#endif /* !_WIN32 */ +#endif /* open_dev_zero */ -static pthread_mutex_t allocator_mutex = PTHREAD_MUTEX_INITIALIZER; +#if (defined SLJIT_UTIL_STACK && SLJIT_UTIL_STACK) \ + || (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) -static SLJIT_INLINE void allocator_grab_lock(void) -{ - pthread_mutex_lock(&allocator_mutex); -} +#ifdef _WIN32 -static SLJIT_INLINE void allocator_release_lock(void) -{ - pthread_mutex_unlock(&allocator_mutex); +static SLJIT_INLINE sljit_sw get_page_alignment(void) { + SYSTEM_INFO si; + static sljit_sw sljit_page_align; + if (!sljit_page_align) { + GetSystemInfo(&si); + sljit_page_align = si.dwPageSize - 1; + } + return sljit_page_align; } -#endif /* SLJIT_EXECUTABLE_ALLOCATOR */ - -#if (defined SLJIT_UTIL_GLOBAL_LOCK && SLJIT_UTIL_GLOBAL_LOCK) - -#include +#else -static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER; +#include -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_grab_lock(void) -{ - pthread_mutex_lock(&global_mutex); +static SLJIT_INLINE sljit_sw get_page_alignment(void) { + static sljit_sw sljit_page_align = -1; + if (sljit_page_align < 0) { +#ifdef _SC_PAGESIZE + sljit_page_align = sysconf(_SC_PAGESIZE); +#else + sljit_page_align = getpagesize(); +#endif + /* Should never happen. 
*/ + if (sljit_page_align < 0) + sljit_page_align = 4096; + sljit_page_align--; + } + return sljit_page_align; } -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_release_lock(void) -{ - pthread_mutex_unlock(&global_mutex); -} +#endif /* _WIN32 */ -#endif /* SLJIT_UTIL_GLOBAL_LOCK */ +#endif /* get_page_alignment() */ -#endif /* _WIN32 */ +#if (defined SLJIT_UTIL_STACK && SLJIT_UTIL_STACK) -/* ------------------------------------------------------------------------ */ -/* Stack */ -/* ------------------------------------------------------------------------ */ +#if (defined SLJIT_UTIL_SIMPLE_STACK_ALLOCATION && SLJIT_UTIL_SIMPLE_STACK_ALLOCATION) -#if (defined SLJIT_UTIL_STACK && SLJIT_UTIL_STACK) || (defined SLJIT_EXECUTABLE_ALLOCATOR && SLJIT_EXECUTABLE_ALLOCATOR) +SLJIT_API_FUNC_ATTRIBUTE struct sljit_stack* SLJIT_FUNC sljit_allocate_stack(sljit_uw start_size, sljit_uw max_size, void *allocator_data) +{ + struct sljit_stack *stack; + void *ptr; -#ifdef _WIN32 -#include "windows.h" -#else -/* Provides mmap function. */ -#include -/* For detecting the page size. */ -#include + SLJIT_UNUSED_ARG(allocator_data); -#ifndef MAP_ANON + if (start_size > max_size || start_size < 1) + return NULL; -#include + stack = (struct sljit_stack*)SLJIT_MALLOC(sizeof(struct sljit_stack), allocator_data); + if (stack == NULL) + return NULL; -/* Some old systems does not have MAP_ANON. 
*/ -static sljit_s32 dev_zero = -1; + ptr = SLJIT_MALLOC(max_size, allocator_data); + if (ptr == NULL) { + SLJIT_FREE(stack, allocator_data); + return NULL; + } -#if (defined SLJIT_SINGLE_THREADED && SLJIT_SINGLE_THREADED) + stack->min_start = (sljit_u8 *)ptr; + stack->end = stack->min_start + max_size; + stack->start = stack->end - start_size; + stack->top = stack->end; + return stack; +} -static SLJIT_INLINE sljit_s32 open_dev_zero(void) +SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_free_stack(struct sljit_stack *stack, void *allocator_data) { - dev_zero = open("/dev/zero", O_RDWR); - return dev_zero < 0; + SLJIT_UNUSED_ARG(allocator_data); + SLJIT_FREE((void*)stack->min_start, allocator_data); + SLJIT_FREE(stack, allocator_data); } -#else /* SLJIT_SINGLE_THREADED */ - -#include - -static pthread_mutex_t dev_zero_mutex = PTHREAD_MUTEX_INITIALIZER; - -static SLJIT_INLINE sljit_s32 open_dev_zero(void) +SLJIT_API_FUNC_ATTRIBUTE sljit_u8 *SLJIT_FUNC sljit_stack_resize(struct sljit_stack *stack, sljit_u8 *new_start) { - pthread_mutex_lock(&dev_zero_mutex); - /* The dev_zero might be initialized by another thread during the waiting. 
*/ - if (dev_zero < 0) { - dev_zero = open("/dev/zero", O_RDWR); - } - pthread_mutex_unlock(&dev_zero_mutex); - return dev_zero < 0; + if ((new_start < stack->min_start) || (new_start >= stack->end)) + return NULL; + stack->start = new_start; + return new_start; } -#endif /* SLJIT_SINGLE_THREADED */ +#else /* !SLJIT_UTIL_SIMPLE_STACK_ALLOCATION */ -#endif +#ifdef _WIN32 -#endif +SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_free_stack(struct sljit_stack *stack, void *allocator_data) +{ + SLJIT_UNUSED_ARG(allocator_data); + VirtualFree((void*)stack->min_start, 0, MEM_RELEASE); + SLJIT_FREE(stack, allocator_data); +} -#endif /* SLJIT_UTIL_STACK || SLJIT_EXECUTABLE_ALLOCATOR */ +#else /* !_WIN32 */ -#if (defined SLJIT_UTIL_STACK && SLJIT_UTIL_STACK) +SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_free_stack(struct sljit_stack *stack, void *allocator_data) +{ + SLJIT_UNUSED_ARG(allocator_data); + munmap((void*)stack->min_start, stack->end - stack->min_start); + SLJIT_FREE(stack, allocator_data); +} -/* Planning to make it even more clever in the future. */ -static sljit_sw sljit_page_align = 0; +#endif /* _WIN32 */ SLJIT_API_FUNC_ATTRIBUTE struct sljit_stack* SLJIT_FUNC sljit_allocate_stack(sljit_uw start_size, sljit_uw max_size, void *allocator_data) { struct sljit_stack *stack; void *ptr; -#ifdef _WIN32 - SYSTEM_INFO si; -#endif + sljit_sw page_align; SLJIT_UNUSED_ARG(allocator_data); + if (start_size > max_size || start_size < 1) return NULL; -#ifdef _WIN32 - if (!sljit_page_align) { - GetSystemInfo(&si); - sljit_page_align = si.dwPageSize - 1; - } -#else - if (!sljit_page_align) { - sljit_page_align = sysconf(_SC_PAGESIZE); - /* Should never happen. */ - if (sljit_page_align < 0) - sljit_page_align = 4096; - sljit_page_align--; - } -#endif - stack = (struct sljit_stack*)SLJIT_MALLOC(sizeof(struct sljit_stack), allocator_data); - if (!stack) + if (stack == NULL) return NULL; /* Align max_size. 
*/ - max_size = (max_size + sljit_page_align) & ~sljit_page_align; + page_align = get_page_alignment(); + max_size = (max_size + page_align) & ~page_align; #ifdef _WIN32 ptr = VirtualAlloc(NULL, max_size, MEM_RESERVE, PAGE_READWRITE); @@ -252,18 +267,16 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_stack* SLJIT_FUNC sljit_allocate_stack(slj sljit_free_stack(stack, allocator_data); return NULL; } -#else +#else /* !_WIN32 */ #ifdef MAP_ANON ptr = mmap(NULL, max_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); -#else - if (dev_zero < 0) { - if (open_dev_zero()) { - SLJIT_FREE(stack, allocator_data); - return NULL; - } +#else /* !MAP_ANON */ + if (SLJIT_UNLIKELY((dev_zero < 0) && open_dev_zero())) { + SLJIT_FREE(stack, allocator_data); + return NULL; } ptr = mmap(NULL, max_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, dev_zero, 0); -#endif +#endif /* MAP_ANON */ if (ptr == MAP_FAILED) { SLJIT_FREE(stack, allocator_data); return NULL; @@ -271,35 +284,28 @@ SLJIT_API_FUNC_ATTRIBUTE struct sljit_stack* SLJIT_FUNC sljit_allocate_stack(slj stack->min_start = (sljit_u8 *)ptr; stack->end = stack->min_start + max_size; stack->start = stack->end - start_size; -#endif +#endif /* _WIN32 */ + stack->top = stack->end; return stack; } -#undef PAGE_ALIGN - -SLJIT_API_FUNC_ATTRIBUTE void SLJIT_FUNC sljit_free_stack(struct sljit_stack *stack, void *allocator_data) -{ - SLJIT_UNUSED_ARG(allocator_data); -#ifdef _WIN32 - VirtualFree((void*)stack->min_start, 0, MEM_RELEASE); -#else - munmap((void*)stack->min_start, stack->end - stack->min_start); -#endif - SLJIT_FREE(stack, allocator_data); -} - SLJIT_API_FUNC_ATTRIBUTE sljit_u8 *SLJIT_FUNC sljit_stack_resize(struct sljit_stack *stack, sljit_u8 *new_start) { +#if defined _WIN32 || defined(POSIX_MADV_DONTNEED) sljit_uw aligned_old_start; sljit_uw aligned_new_start; + sljit_sw page_align; +#endif if ((new_start < stack->min_start) || (new_start >= stack->end)) return NULL; #ifdef _WIN32 - aligned_new_start = (sljit_uw)new_start & 
~sljit_page_align; - aligned_old_start = ((sljit_uw)stack->start) & ~sljit_page_align; + page_align = get_page_alignment(); + + aligned_new_start = (sljit_uw)new_start & ~page_align; + aligned_old_start = ((sljit_uw)stack->start) & ~page_align; if (aligned_new_start != aligned_old_start) { if (aligned_new_start < aligned_old_start) { if (!VirtualAlloc((void*)aligned_new_start, aligned_old_start - aligned_new_start, MEM_COMMIT, PAGE_READWRITE)) @@ -310,24 +316,26 @@ SLJIT_API_FUNC_ATTRIBUTE sljit_u8 *SLJIT_FUNC sljit_stack_resize(struct sljit_st return NULL; } } -#else - if (stack->start < new_start) { - aligned_new_start = (sljit_uw)new_start & ~sljit_page_align; - aligned_old_start = ((sljit_uw)stack->start) & ~sljit_page_align; - /* If madvise is available, we release the unnecessary space. */ -#if defined(MADV_DONTNEED) - if (aligned_new_start > aligned_old_start) - madvise((void*)aligned_old_start, aligned_new_start - aligned_old_start, MADV_DONTNEED); #elif defined(POSIX_MADV_DONTNEED) - if (aligned_new_start > aligned_old_start) + if (stack->start < new_start) { + page_align = get_page_alignment(); + + aligned_new_start = (sljit_uw)new_start & ~page_align; + aligned_old_start = ((sljit_uw)stack->start) & ~page_align; + + if (aligned_new_start > aligned_old_start) { posix_madvise((void*)aligned_old_start, aligned_new_start - aligned_old_start, POSIX_MADV_DONTNEED); -#endif +#ifdef MADV_FREE + madvise((void*)aligned_old_start, aligned_new_start - aligned_old_start, MADV_FREE); +#endif /* MADV_FREE */ + } } -#endif +#endif /* _WIN32 */ + stack->start = new_start; return new_start; } -#endif /* SLJIT_UTIL_STACK */ +#endif /* SLJIT_UTIL_SIMPLE_STACK_ALLOCATION */ -#endif +#endif /* SLJIT_UTIL_STACK */ diff --git a/src/pcre2/src/sljit/sljitWXExecAllocator.c b/src/pcre2/src/sljit/sljitWXExecAllocator.c new file mode 100644 index 00000000..72d5b8dd --- /dev/null +++ b/src/pcre2/src/sljit/sljitWXExecAllocator.c @@ -0,0 +1,229 @@ +/* + * Stack-less Just-In-Time 
compiler + * + * Copyright Zoltan Herczeg (hzmester@freemail.hu). All rights reserved. + * + * Redistribution and use in source and binary forms, with or without modification, are + * permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, this list of + * conditions and the following disclaimer. + * + * 2. Redistributions in binary form must reproduce the above copyright notice, this list + * of conditions and the following disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) AND CONTRIBUTORS ``AS IS'' AND ANY + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES + * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT + * SHALL THE COPYRIGHT HOLDER(S) OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED + * TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN + * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/* + This file contains a simple W^X executable memory allocator for POSIX + like systems and Windows + + In *NIX, MAP_ANON is required (that is considered a feature) so make + sure to set the right availability macros for your system or the code + will fail to build. + + If your system doesn't support mapping of anonymous pages (ex: IRIX) it + is also likely that it doesn't need this allocator and should be using + the standard one instead. 
+ + It allocates a separate map for each code block and may waste a lot of + memory, because whatever was requested, will be rounded up to the page + size (minimum 4KB, but could be even bigger). + + It changes the page permissions (RW <-> RX) as needed and therefore, if you + will be updating the code after it has been generated, need to make sure to + block any concurrent execution, or could result in a SIGBUS, that could + even manifest itself at a different address than the one that was being + modified. + + Only use if you are unable to use the regular allocator because of security + restrictions and adding exceptions to your application or the system are + not possible. +*/ + +#define SLJIT_UPDATE_WX_FLAGS(from, to, enable_exec) \ + sljit_update_wx_flags((from), (to), (enable_exec)) + +#ifndef _WIN32 +#include +#include + +#ifdef __NetBSD__ +#if defined(PROT_MPROTECT) +#define check_se_protected(ptr, size) (0) +#define SLJIT_PROT_WX PROT_MPROTECT(PROT_EXEC) +#else /* !PROT_MPROTECT */ +#ifdef _NETBSD_SOURCE +#include +#else /* !_NETBSD_SOURCE */ +typedef unsigned int u_int; +#define devmajor_t sljit_s32 +#endif /* _NETBSD_SOURCE */ +#include +#include + +#define check_se_protected(ptr, size) netbsd_se_protected() + +static SLJIT_INLINE int netbsd_se_protected(void) +{ + int mib[3]; + int paxflags; + size_t len = sizeof(paxflags); + + mib[0] = CTL_PROC; + mib[1] = getpid(); + mib[2] = PROC_PID_PAXFLAGS; + + if (SLJIT_UNLIKELY(sysctl(mib, 3, &paxflags, &len, NULL, 0) < 0)) + return -1; + + return (paxflags & CTL_PROC_PAXFLAGS_MPROTECT) ? 
-1 : 0; +} +#endif /* PROT_MPROTECT */ +#else /* POSIX */ +#define check_se_protected(ptr, size) generic_se_protected(ptr, size) + +static SLJIT_INLINE int generic_se_protected(void *ptr, sljit_uw size) +{ + if (SLJIT_LIKELY(!mprotect(ptr, size, PROT_EXEC))) + return mprotect(ptr, size, PROT_READ | PROT_WRITE); + + return -1; +} +#endif /* NetBSD */ + +#if defined SLJIT_SINGLE_THREADED && SLJIT_SINGLE_THREADED +#define SLJIT_SE_LOCK() +#define SLJIT_SE_UNLOCK() +#else /* !SLJIT_SINGLE_THREADED */ +#include <pthread.h> +#define SLJIT_SE_LOCK() pthread_mutex_lock(&se_lock) +#define SLJIT_SE_UNLOCK() pthread_mutex_unlock(&se_lock) +#endif /* SLJIT_SINGLE_THREADED */ + +#ifndef SLJIT_PROT_WX +#define SLJIT_PROT_WX 0 +#endif /* !SLJIT_PROT_WX */ + +SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) +{ +#if !(defined SLJIT_SINGLE_THREADED && SLJIT_SINGLE_THREADED) + static pthread_mutex_t se_lock = PTHREAD_MUTEX_INITIALIZER; +#endif + static int se_protected = !SLJIT_PROT_WX; + int prot = PROT_READ | PROT_WRITE | SLJIT_PROT_WX; + sljit_uw* ptr; + + if (SLJIT_UNLIKELY(se_protected < 0)) + return NULL; + +#ifdef PROT_MAX + prot |= PROT_MAX(PROT_READ | PROT_WRITE | PROT_EXEC); +#endif + + size += sizeof(sljit_uw); + ptr = (sljit_uw*)mmap(NULL, size, prot, MAP_PRIVATE | MAP_ANON, -1, 0); + + if (ptr == MAP_FAILED) + return NULL; + + if (SLJIT_UNLIKELY(se_protected > 0)) { + SLJIT_SE_LOCK(); + se_protected = check_se_protected(ptr, size); + SLJIT_SE_UNLOCK(); + if (SLJIT_UNLIKELY(se_protected < 0)) { + munmap((void *)ptr, size); + return NULL; + } + } + + *ptr++ = size; + return ptr; +} + +#undef SLJIT_PROT_WX +#undef SLJIT_SE_UNLOCK +#undef SLJIT_SE_LOCK + +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_exec(void* ptr) +{ + sljit_uw *start_ptr = ((sljit_uw*)ptr) - 1; + munmap((void*)start_ptr, *start_ptr); +} + +static void sljit_update_wx_flags(void *from, void *to, sljit_s32 enable_exec) +{ + sljit_uw page_mask = (sljit_uw)get_page_alignment(); + sljit_uw start = 
(sljit_uw)from; + sljit_uw end = (sljit_uw)to; + int prot = PROT_READ | (enable_exec ? PROT_EXEC : PROT_WRITE); + + SLJIT_ASSERT(start < end); + + start &= ~page_mask; + end = (end + page_mask) & ~page_mask; + + mprotect((void*)start, end - start, prot); +} + +#else /* windows */ + +SLJIT_API_FUNC_ATTRIBUTE void* sljit_malloc_exec(sljit_uw size) +{ + sljit_uw *ptr; + + size += sizeof(sljit_uw); + ptr = (sljit_uw*)VirtualAlloc(NULL, size, + MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); + + if (!ptr) + return NULL; + + *ptr++ = size; + + return ptr; +} + +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_exec(void* ptr) +{ + sljit_uw start = (sljit_uw)ptr - sizeof(sljit_uw); +#if defined(SLJIT_DEBUG) && SLJIT_DEBUG + sljit_uw page_mask = (sljit_uw)get_page_alignment(); + + SLJIT_ASSERT(!(start & page_mask)); +#endif + VirtualFree((void*)start, 0, MEM_RELEASE); +} + +static void sljit_update_wx_flags(void *from, void *to, sljit_s32 enable_exec) +{ + DWORD oldprot; + sljit_uw page_mask = (sljit_uw)get_page_alignment(); + sljit_uw start = (sljit_uw)from; + sljit_uw end = (sljit_uw)to; + DWORD prot = enable_exec ? PAGE_EXECUTE : PAGE_READWRITE; + + SLJIT_ASSERT(start < end); + + start &= ~page_mask; + end = (end + page_mask) & ~page_mask; + + VirtualProtect((void*)start, end - start, prot, &oldprot); +} + +#endif /* !windows */ + +SLJIT_API_FUNC_ATTRIBUTE void sljit_free_unused_memory_exec(void) +{ + /* This allocator does not keep unused memory for future allocations. */ +} diff --git a/src/pcre/test-driver b/src/pcre2/test-driver similarity index 93% rename from src/pcre/test-driver rename to src/pcre2/test-driver index b8521a48..9759384a 100755 --- a/src/pcre/test-driver +++ b/src/pcre2/test-driver @@ -3,7 +3,7 @@ scriptversion=2018-03-07.03; # UTC -# Copyright (C) 2011-2018 Free Software Foundation, Inc. +# Copyright (C) 2011-2020 Free Software Foundation, Inc. 
# # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by @@ -42,11 +42,13 @@ print_usage () { cat << for testing. RC=0 @@ -433,7 +434,7 @@ RC=0 597:binary RC=0 ---------------------------- Test 46 ------------------------------ -pcregrep: Error in 2nd command-line regex at offset 9: missing ) +pcre2grep: Error in 2nd command-line regex at offset 9: missing closing parenthesis RC=2 ---------------------------- Test 47 ------------------------------ AB.VE @@ -454,17 +455,22 @@ RC=1 ---------------------------- Test 51 ------------------------------ over the lazy dog. This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +A buried feline in the syndicate RC=0 ---------------------------- Test 52 ------------------------------ -fox jumps -This time it jumps and jumps and jumps. +fox jumps +This time it jumps and jumps and jumps. RC=0 ---------------------------- Test 53 ------------------------------ -36972,6 -36990,4 -37024,4 -37066,5 -37083,4 +36976,6 +36994,4 +37028,4 +37070,5 +37087,4 RC=0 ---------------------------- Test 54 ------------------------------ 595:15,6 @@ -474,14 +480,15 @@ RC=0 597:32,4 RC=0 ---------------------------- Test 55 ----------------------------- -Here is the pattern again. -That time it was on a line by itself. -This line contains pattern not on a line by itself. +Here is the pattern again. +That time it was on a line by itself. +This line contains pattern not on a line by itself. RC=0 ---------------------------- Test 56 ----------------------------- ./testdata/grepinput:456 ./testdata/grepinput3:0 ./testdata/grepinput8:0 +./testdata/grepinputM:0 ./testdata/grepinputv:1 ./testdata/grepinputx:0 RC=0 @@ -510,24 +517,24 @@ In the middle of a line, PATTERN appears. Check up on PATTERN near the end. 
RC=0 ---------------------------- Test 62 ----------------------------- -pcregrep: pcre_exec() gave error -8 while matching text that starts: +pcre2grep: pcre2_match() gave error -47 while matching text that starts: This is a file of miscellaneous text that is used as test data for checking -that the pcregrep command is working correctly. The file must be more than 24K -long so that it needs more than a single read +that the pcregrep command is working correctly. The file must be more than +24KiB long so that it needs more than a single re -pcregrep: Error -8, -21 or -27 means that a resource limit was exceeded. -pcregrep: Check your regex for nested unlimited loops. +pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded. +pcre2grep: Check your regex for nested unlimited loops. RC=1 ---------------------------- Test 63 ----------------------------- -pcregrep: pcre_exec() gave error -21 while matching text that starts: +pcre2grep: pcre2_match() gave error -53 while matching text that starts: This is a file of miscellaneous text that is used as test data for checking -that the pcregrep command is working correctly. The file must be more than 24K -long so that it needs more than a single read +that the pcregrep command is working correctly. The file must be more than +24KiB long so that it needs more than a single re -pcregrep: Error -8, -21 or -27 means that a resource limit was exceeded. -pcregrep: Check your regex for nested unlimited loops. +pcre2grep: Error -46, -47, -53 or -63 means that a resource limit was exceeded. +pcre2grep: Check your regex for nested unlimited loops. 
RC=1 ---------------------------- Test 64 ------------------------------ appears @@ -588,56 +595,84 @@ RC=0 ---------------------------- Test 70 ----------------------------- triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt -triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt +triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt -triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt +triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt -triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt +triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt -RC=0 +RC=0 +1:triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +6:triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +8:triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +13:triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +RC=0 +triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +RC=0 +1:triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +6:triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +8:triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +13:triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +RC=0 ---------------------------- Test 71 ----------------------------- 01 RC=0 ---------------------------- Test 72 ----------------------------- -010203040506 +010203040506 RC=0 ---------------------------- Test 73 ----------------------------- -01 +01 RC=0 ---------------------------- Test 74 ----------------------------- 01 02 RC=0 ---------------------------- Test 75 ----------------------------- -010203040506 +010203040506 RC=0 ---------------------------- Test 76 ----------------------------- -01 -02 +01 +02 RC=0 ---------------------------- Test 77 ----------------------------- 01 03 RC=0 ---------------------------- Test 78 ----------------------------- -010203040506 +010203040506 RC=0 ---------------------------- Test 79 
----------------------------- -01 -03 +01 +03 RC=0 ---------------------------- Test 80 ----------------------------- 01 RC=0 ---------------------------- Test 81 ----------------------------- -010203040506 +010203040506 RC=0 ---------------------------- Test 82 ----------------------------- -01 +01 RC=0 ---------------------------- Test 83 ----------------------------- -pcregrep: line 4 of file ./testdata/grepinput3 is too long for the internal buffer -pcregrep: check the --buffer-size option +pcre2grep: line 4 of file ./testdata/grepinput3 is too long for the internal buffer +pcre2grep: the maximum buffer size is 100 +pcre2grep: use the --max-buffer-size option to change it RC=2 ---------------------------- Test 84 ----------------------------- testdata/grepinputv:fox jumps @@ -701,9 +736,9 @@ RC=0 ./testdata/grepinput:zerothe. RC=0 ---------------------------- Test 101 ------------------------------ -./testdata/grepinput:.|zero|the|. -./testdata/grepinput:zero|a -./testdata/grepinput:.|zero|the|. +./testdata/grepinput:.|zero|the|. +./testdata/grepinput:zero|a +./testdata/grepinput:.|zero|the|. RC=0 ---------------------------- Test 102 ----------------------------- 2: @@ -724,21 +759,21 @@ RC=0 14: RC=0 ---------------------------- Test 105 ----------------------------- -triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt - -triple: t2_txt s1_tag s_txt p_tag p_txt o_tag -Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 
- -triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt - -triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt - -triple: t5_txt s1_tag s_txt p_tag p_txt o_tag -o_txt - -triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt - -triple: t7_txt s1_tag s_txt p_tag p_txt o_tag o_txt +triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +triple: t2_txt s1_tag s_txt p_tag p_txt o_tag +Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. + +triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt + +triple: t5_txt s1_tag s_txt p_tag p_txt o_tag +o_txt + +triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt + +triple: t7_txt s1_tag s_txt p_tag p_txt o_tag o_txt RC=0 ---------------------------- Test 106 ----------------------------- a @@ -755,3 +790,193 @@ RC=0 RC=0 ---------------------------- Test 109 ----------------------------- RC=0 +---------------------------- Test 110 ----------------------------- +match 1: + a +/1/a +match 2: + b +/2/b +match 3: + c +/3/c +match 4: + d +/4/d +match 5: + e +/5/e +RC=0 +---------------------------- Test 111 ----------------------------- +607:0,12 +609:0,12 +611:0,12 +613:0,12 +615:0,12 +RC=0 +---------------------------- Test 112 ----------------------------- +37172,12 +37184,12 +37196,12 +37208,12 +37220,12 +RC=0 +---------------------------- Test 113 ----------------------------- +480 +RC=0 +---------------------------- Test 114 ----------------------------- +testdata/grepinput:469 +testdata/grepinput3:0 +testdata/grepinput8:0 +testdata/grepinputM:2 +testdata/grepinputv:3 +testdata/grepinputx:6 
+TOTAL:480 +RC=0 +---------------------------- Test 115 ----------------------------- +testdata/grepinput:469 +testdata/grepinputM:2 +testdata/grepinputv:3 +testdata/grepinputx:6 +TOTAL:480 +RC=0 +---------------------------- Test 116 ----------------------------- +478 +RC=0 +---------------------------- Test 117 ----------------------------- +469 +0 +0 +2 +3 +6 +480 +RC=0 +---------------------------- Test 118 ----------------------------- +testdata/grepinput3 +testdata/grepinput8 +RC=0 +---------------------------- Test 119 ----------------------------- +123 +456 +789 +--- +abc +def +xyz +--- +RC=0 +---------------------------- Test 120 ------------------------------ +./testdata/grepinput:the binary zero.:zerothe. +./testdata/grepinput:a binary zero:zeroa +./testdata/grepinput:the binary zero.:zerothe. +RC=0 +---------------------------- Test 121 ----------------------------- +This line contains \E and (regex) *meta* [characters]. +RC=0 +---------------------------- Test 122 ----------------------------- +over the lazy dog. +The word is cat in this line +RC=0 +---------------------------- Test 123 ----------------------------- +over the lazy dog. +The word is cat in this line +RC=0 +---------------------------- Test 124 ----------------------------- +3:start end in between start +end and following +7:start end in between start +end and following start +end other stuff +11:start end in between start + +end +16:start end in between start +end +RC=0 +3:start end in between start +end and following +5-Other stuff +6- +7:start end in between start +end and following start +end other stuff +10- +11:start end in between start + +end +14- +15-** These two lines must be last. 
+16:start end in between start +end +RC=0 +3:start end in between start +end and following +7:start end in between start +end and following start +end other stuff +11:start end in between start + +end +16:start end in between start +end +RC=0 +3:start end in between start +end and following +5-Other stuff +6- +7:start end in between start +end and following start +end other stuff +10- +11:start end in between start + +end +14- +15-** These two lines must be last. +16:start end in between start +end +RC=0 +---------------------------- Test 125 ----------------------------- +abcd +RC=0 +abcd +RC=0 +abcd +RC=0 +abcd +RC=0 +---------------------------- Test 126 ----------------------------- +ABCXYZ +RC=0 +---------------------------- Test 127 ----------------------------- +pattern +RC=0 +---------------------------- Test 128 ----------------------------- +pcre2grep: Requested group 1 cannot be captured. +pcre2grep: Use --om-capture to increase the size of the capture vector. +RC=2 +---------------------------- Test 129 ----------------------------- +The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the +lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox +RC=0 +---------------------------- Test 130 ----------------------------- +fox +fox +fox +fox +RC=0 +---------------------------- Test 131 ----------------------------- +2 +RC=0 +---------------------------- Test 132 ----------------------------- +match 1: + a +match 2: + b +--- + a +RC=0 +---------------------------- Test 133 ----------------------------- +=AB3CD5= +RC=0 diff --git a/src/pcre2/testdata/grepoutput8 b/src/pcre2/testdata/grepoutput8 new file mode 100644 index 00000000..3888d9a8 --- /dev/null +++ b/src/pcre2/testdata/grepoutput8 @@ -0,0 +1,34 @@ +---------------------------- Test U1 ------------------------------ +1:X one +2:X two 3:X three 4:X four 5:X five +6:X six +7:X sevenÂ…8:X eight
9:X nine
10:X ten +RC=0 +---------------------------- Test U2 ------------------------------ +12-Before 111 +13-Before 222
14-Before 333Â…15:Match +16-After 111 +17-After 222
18-After 333 +RC=0 +---------------------------- Test U3 ------------------------------ +21:0,2 +22:0,2 +22:2,2 +22:4,2 +22:6,2 +22:8,2 +RC=0 +---------------------------- Test U4 ------------------------------ +pcre2grep: pcre2_match() gave error -22 while matching this text: + +Aက€CD Z + +UTF-8 error: isolated byte with 0x80 bit set at offset 4 + +RC=1 +---------------------------- Test U5 ------------------------------ +CD Z +RC=0 +---------------------------- Test U6 ----------------------------- +=ǓǤ= +RC=0 diff --git a/src/pcre2/testdata/grepoutputC b/src/pcre2/testdata/grepoutputC new file mode 100644 index 00000000..87897f05 --- /dev/null +++ b/src/pcre2/testdata/grepoutputC @@ -0,0 +1,44 @@ +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) +Arg1: [T] [his] [s] Arg2: |T| () () (0) +Arg1: [T] [his] [s] Arg2: |T| () () (0) +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) +Arg1: [T] [he ] [ ] Arg2: |T| () () (0) +The quick brown +This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +Arg1: [qu] [qu] +Arg1: [ t] [ t] +Arg1: [ l] [ l] +Arg1: [wo] [wo] +Arg1: [ca] [ca] +Arg1: [sn] [sn] +The quick brown +This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +0:T +The quick brown +0:T +This time it jumps and jumps and jumps. +0:T +This line contains \E and (regex) *meta* [characters]. +0:T +The word is cat in this line +0:T +The caterpillar sat on the mat +0:T +The snowcat is not an animal +T +T +T +T +T +T +0:T:AA +The quick brown diff --git a/src/pcre2/testdata/grepoutputCN b/src/pcre2/testdata/grepoutputCN new file mode 100644 index 00000000..838bee61 --- /dev/null +++ b/src/pcre2/testdata/grepoutputCN @@ -0,0 +1,32 @@ +The quick brown +This time it jumps and jumps and jumps. 
+This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +The quick brown +This time it jumps and jumps and jumps. +This line contains \E and (regex) *meta* [characters]. +The word is cat in this line +The caterpillar sat on the mat +The snowcat is not an animal +0:T +The quick brown +0:T +This time it jumps and jumps and jumps. +0:T +This line contains \E and (regex) *meta* [characters]. +0:T +The word is cat in this line +0:T +The caterpillar sat on the mat +0:T +The snowcat is not an animal +T +T +T +T +T +T +0:T:AA +The quick brown diff --git a/src/pcre2/testdata/grepoutputN b/src/pcre2/testdata/grepoutputN new file mode 100644 index 00000000..811c52d7 --- /dev/null +++ b/src/pcre2/testdata/grepoutputN @@ -0,0 +1,21 @@ +---------------------------- Test N1 ------------------------------ +1:abc 2:def ---------------------------- Test N2 ------------------------------ +1:abc def +2:ghi +jkl +---------------------------- Test N3 ------------------------------ +2:def 3: +ghi +jkl ---------------------------- Test N4 ------------------------------ +2:ghi +jkl +---------------------------- Test N5 ------------------------------ +1:abc 2:def +3:ghi +4:jkl +---------------------------- Test N6 ------------------------------ +1:abc 2:def +3:ghi +4:jkl +---------------------------- Test N7 ------------------------------ +1:abc@2:def@ diff --git a/src/pcre/testdata/greppatN4 b/src/pcre2/testdata/greppatN4 similarity index 100% rename from src/pcre/testdata/greppatN4 rename to src/pcre2/testdata/greppatN4 diff --git a/src/pcre2/testdata/testbtables b/src/pcre2/testdata/testbtables new file mode 100644 index 00000000..b7aeeaf0 Binary files /dev/null and b/src/pcre2/testdata/testbtables differ diff --git a/src/pcre/testdata/testinput1 b/src/pcre2/testdata/testinput1 similarity index 71% rename from src/pcre/testdata/testinput1 rename to src/pcre2/testdata/testinput1 index 
02e4f482..93b21c19 100644 --- a/src/pcre/testdata/testinput1 +++ b/src/pcre2/testdata/testinput1 @@ -1,13 +1,20 @@ -/-- This set of tests is for features that are compatible with all versions of - Perl >= 5.10, in non-UTF-8 mode. It should run clean for the 8-bit, 16-bit, - and 32-bit PCRE libraries. --/ +# This set of tests is for features that are compatible with all versions of +# Perl >= 5.10, in non-UTF mode. It should run clean for the 8-bit, 16-bit, and +# 32-bit PCRE libraries, and also using the perltest.sh script. + +# WARNING: Use only / as the pattern delimiter. Although pcre2test supports +# a number of delimiters, all those other than / give problems with the +# perltest.sh script. -< forbid 89?=ABCDEFfGILMNPTUWXZ< +#forbid_utf +#newline_default lf any anycrlf +#perltest /the quick brown fox/ the quick brown fox - The quick brown FOX What do you know about the quick brown fox? +\= Expect no match + The quick brown FOX What do you know about THE QUICK BROWN FOX? /The quick brown fox/i @@ -50,7 +57,7 @@ >>>aaabxyzpqrrrabbxyyyypqAzz >aaaabxyzpqrrrabbxyyyypqAzz >>>>abcxyzpqrrrabbxyyyypqAzz - *** Failers +\= Expect no match abxyzpqrrabbxyyyypqAzz abxyzpqrrrrabbxyyyypqAzz abxyzpqrrrabxyyyypqAzz @@ -61,7 +68,7 @@ /^(abc){1,2}zz/ abczz abcabczz - *** Failers +\= Expect no match zz abcabcabczz >>abczz @@ -75,7 +82,7 @@ aac abbbbbbbbbbbc bbbbbbbbbbbac - *** Failers +\= Expect no match aaac abbbbbbbbbbbac @@ -88,26 +95,15 @@ aac abbbbbbbbbbbc bbbbbbbbbbbac - *** Failers +\= Expect no match aaac abbbbbbbbbbbac -/^(b+|a){1,2}?bc/ - bbc - -/^(b*|ba){1,2}?bc/ - babc - bbabc - bababc - *** Failers - bababbc - babababc - /^(ba|b*){1,2}?bc/ babc bbabc bababc - *** Failers +\= Expect no match bababbc babababc @@ -121,7 +117,7 @@ cthing dthing ething - *** Failers +\= Expect no match fthing [thing \\thing @@ -131,7 +127,7 @@ cthing dthing ething - *** Failers +\= Expect no match athing fthing @@ -139,7 +135,7 @@ fthing [thing \\thing - *** Failers +\= Expect no match 
athing bthing ]thing @@ -150,7 +146,7 @@ /^[^]cde]/ athing fthing - *** Failers +\= Expect no match ]thing cthing dthing @@ -175,7 +171,7 @@ 9 10 100 - *** Failers +\= Expect no match abc /^.*nter/ @@ -186,28 +182,28 @@ /^xxx[0-9]+$/ xxx0 xxx1234 - *** Failers +\= Expect no match xxx /^.+[0-9][0-9][0-9]$/ x123 + x1234 xx123 123456 - *** Failers +\= Expect no match 123 - x1234 /^.+?[0-9][0-9][0-9]$/ x123 + x1234 xx123 123456 - *** Failers +\= Expect no match 123 - x1234 /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/ abc!pqr=apquxz.ixr.zzz.ac.uk - *** Failers +\= Expect no match !pqr=apquxz.ixr.zzz.ac.uk abc!=apquxz.ixr.zzz.ac.uk abc!pqr=apquxz:ixr.zzz.ac.uk @@ -215,7 +211,8 @@ /:/ Well, we need a colon: somewhere - *** Fail if we don't +\= Expect no match + Fail without a colon /([\da-f:]+)$/i 0abc @@ -226,7 +223,7 @@ 5f03:12C0::932e fed def Any old stuff - *** Failers +\= Expect no match 0zzz gzzz fed\x20 @@ -235,7 +232,7 @@ /^.*\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ .1.2.3 A.12.123.0 - *** Failers +\= Expect no match .1.2.3333 1.2.3 1234.2.3 @@ -243,7 +240,7 @@ /^(\d+)\s+IN\s+SOA\s+(\S+)\s+(\S+)\s*\(\s*$/ 1 IN SOA non-sp1 non-sp2( 1 IN SOA non-sp1 non-sp2 ( - *** Failers +\= Expect no match 1IN SOA non-sp1 non-sp2( /^[a-zA-Z\d][a-zA-Z\d\-]*(\.[a-zA-Z\d][a-zA-z\d\-]*)*\.$/ @@ -253,7 +250,7 @@ ab-c.pq-r. sxk.zzz.ac.uk. x-.y-. - *** Failers +\= Expect no match -abc.peq. 
/^\*\.[a-z]([a-z\-\d]*[a-z\d]+)?(\.[a-z]([a-z\-\d]*[a-z\d]+)?)*$/ @@ -261,7 +258,7 @@ *.b0-a *.c3-b.c *.c-a.b-c - *** Failers +\= Expect no match *.0 *.a- *.a-b.c- @@ -285,29 +282,30 @@ \"1234\" \"abcd\" ; \"\" ; rhubarb - *** Failers +\= Expect no match \"1234\" : things /^$/ \ - *** Failers +\= Expect no match + A non-empty line / ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/x ab c - *** Failers +\= Expect no match abc ab cde /(?x) ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/ ab c - *** Failers +\= Expect no match abc ab cde /^ a\ b[c ]d $/x a bcd a b d - *** Failers +\= Expect no match abcd ab d @@ -361,7 +359,7 @@ 1234567890 12345678ab 12345678__ - *** Failers +\= Expect no match 1234567 /^[aeiou\d]{4,5}$/ @@ -369,7 +367,7 @@ 1234 12345 aaaaa - *** Failers +\= Expect no match 123456 /^[aeiou\d]{4,5}?/ @@ -382,7 +380,7 @@ /\A(abc|def)=(\1){2,3}\Z/ abc=abcabc def=defdefdef - *** Failers +\= Expect no match abc=defdef /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/ @@ -401,7 +399,7 @@ /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/ From abcd Mon Sep 01 12:33:02 1997 From abcd Mon Sep 1 12:33:02 1997 - *** Failers +\= Expect no match From abcd Sep 01 12:33:02 1997 /^12.34/s @@ -422,7 +420,7 @@ /^(\D*)(?=\d)(?!123)/ abc456 - *** Failers +\= Expect no match abc123 /^1234(?# test newlines @@ -448,12 +446,12 @@ /(?!^)abc/ the abc - *** Failers +\= Expect no match abc /(?=^)abc/ abc - *** Failers +\= Expect no match the abc /^[ab]{1,3}(ab*|b)/ @@ -669,7 +667,7 @@ A. 
Other (a comment) \"/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/\"\@x400-re.lay A missing angle (a comment) \"/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/\"\@x400-re.lay A missing angle .*/)foo" - /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/it/you/see/ - -"(?>.*/)foo" +/(?>.*\/)foo/ /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/and/foo +\= Expect no match + /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/it/you/see/ /(?>(\.\d\d[1-9]?))\d+/ 1.230003938 1.875000282 - *** Failers +\= Expect no match 1.235 /^((?>\w+)|(?>\s+))*$/ now is the time for all good men to come to the aid of the party - *** Failers +\= Expect no match this is not a line with only words and spaces! /(\d+)(\w)/ @@ -2006,7 +1983,7 @@ /((?>\d+))(\w)/ 12345a - *** Failers +\= Expect no match 12345+ /(?>a+)b/ @@ -2027,35 +2004,35 @@ /((?>[^()]+)|\([^()]*\))+/ ((abc(ade)ufh()()x -/\(((?>[^()]+)|\([^()]+\))+\)/ +/\(((?>[^()]+)|\([^()]+\))+\)/ (abc) (abc(def)xyz) - *** Failers +\= Expect no match ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /a(?-i)b/i ab Ab - *** Failers +\= Expect no match aB AB /(a (?x)b c)d e/ a bcd e - *** Failers +\= Expect no match a b cd e abcd e a bcde /(a b(?x)c d (?-x)e f)/ a bcde f - *** Failers +\= Expect no match abcdef /(a(?i)b)c/ abc aBc - *** Failers +\= Expect no match abC aBC Abc @@ -2066,7 +2043,7 @@ /a(?i:b)c/ abc aBc - *** Failers +\= Expect no match ABC abC aBC @@ -2074,14 +2051,14 @@ /a(?i:b)*c/ aBc aBBc - *** Failers +\= Expect no match aBC aBBC /a(?=b(?i)c)\w\wd/ abcd abCd - *** Failers +\= Expect no match aBCd abcD @@ -2089,7 +2066,7 @@ more than million more than MILLION more \n than Million - *** Failers +\= Expect no match MORE THAN MILLION more \n than \n million @@ -2097,15 +2074,15 @@ more than million more than MILLION more \n than Million - *** Failers +\= Expect no match MORE THAN MILLION more \n than \n million -/(?>a(?i)b+)+c/ +/(?>a(?i)b+)+c/ abc aBbc aBBc - *** Failers +\= Expect no match Abc abAb 
abbC @@ -2113,7 +2090,7 @@ /(?=a(?i)b)\w\wc/ abc aBc - *** Failers +\= Expect no match Ab abC aBC @@ -2121,7 +2098,7 @@ /(?<=a(?i)b)(\w\w)c/ abxxc aBxxc - *** Failers +\= Expect no match Abxxc ABxxc abxxC @@ -2129,7 +2106,7 @@ /(?:(a)|b)(?(1)A|B)/ aA bB - *** Failers +\= Expect no match aB bA @@ -2137,20 +2114,23 @@ aa b bb - *** Failers +\= Expect no match ab + +# Perl gets this next one wrong if the pattern ends with $; in that case it +# fails to match "12". -/^(?(?=abc)\w{3}:|\d\d)$/ +/^(?(?=abc)\w{3}:|\d\d)/ abc: 12 - *** Failers 123 +\= Expect no match xyz /^(?(?!abc)\d\d|\w{3}:)$/ abc: 12 - *** Failers +\= Expect no match 123 xyz @@ -2159,7 +2139,7 @@ cat fcat focat - *** Failers +\= Expect no match foocat /(?(?a*)*/ a aa @@ -2274,13 +2259,13 @@ /(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /x 12-sep-98 12-09-98 - *** Failers +\= Expect no match sep-12-98 /(?<=(foo))bar\1/ foobarfoo foobarfootling - *** Failers +\= Expect no match foobar barfoo @@ -2298,7 +2283,7 @@ aBCx bbx BBx - *** Failers +\= Expect no match abcX aBCX bbX @@ -2312,7 +2297,7 @@ Europe frog France - *** Failers +\= Expect no match Africa /^(ab|a(?i)[b-c](?m-i)d|x(?i)y|z)/ @@ -2322,13 +2307,13 @@ xY zebra Zambesi - *** Failers +\= Expect no match aCD XY /(?<=foo\n)^bar/m foo\nbar - *** Failers +\= Expect no match bar baz\nbar @@ -2336,41 +2321,43 @@ barbaz barbarbaz koobarbaz - *** Failers +\= Expect no match baz foobarbaz -/The cases of aaaa and aaaaaa are missed out below because Perl does things/ -/differently. We know that odd, and maybe incorrect, things happen with/ -/recursive references in Perl, as far as 5.11.3 - see some stuff in test #2./ +# The cases of aaaa and aaaaaa are missed out below because Perl does things +# differently. We know that odd, and maybe incorrect, things happen with +# recursive references in Perl, as far as 5.11.3 - see some stuff in test #2. 
/^(a\1?){4}$/ + aaaaa + aaaaaaa + aaaaaaaaaa +\= Expect no match a aa aaa - aaaaa - aaaaaaa aaaaaaaa aaaaaaaaa - aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaa - aaaaaaaaaaaaaaaa + aaaaaaaaaaaaaaaa /^(a\1?)(a\1?)(a\2?)(a\3?)$/ - a - aa - aaa aaaa aaaaa aaaaaa aaaaaaa + aaaaaaaaaa +\= Expect no match + a + aa + aaa aaaaaaaa aaaaaaaaa - aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaa @@ -2378,14 +2365,14 @@ aaaaaaaaaaaaaaa aaaaaaaaaaaaaaaa -/The following tests are taken from the Perl 5.005 test suite; some of them/ -/are compatible with 5.004, but I'd rather not have to sort them out./ +# The following tests are taken from the Perl 5.005 test suite; some of them +# are compatible with 5.004, but I'd rather not have to sort them out. /abc/ abc xabcy ababc - *** Failers +\= Expect no match xbc axc abx @@ -2409,7 +2396,7 @@ /ab+bc/ abbc - *** Failers +\= Expect no match abc abq @@ -2428,7 +2415,7 @@ abbbbc /ab{4,5}bc/ - *** Failers +\= Expect no match abq abbbbc @@ -2449,7 +2436,7 @@ /^abc$/ abc - *** Failers +\= Expect no match abbbbc abcc @@ -2460,8 +2447,7 @@ /abc$/ aabc - *** Failers - aabc +\= Expect no match aabcd /^/ @@ -2479,7 +2465,7 @@ /a[bc]d/ abd - *** Failers +\= Expect no match axyzd abc @@ -2503,7 +2489,7 @@ /a[^bc]d/ aed - *** Failers +\= Expect no match abd abd @@ -2512,8 +2498,8 @@ /a[^]b]c/ adc - *** Failers a-c +\= Expect no match a]c /\ba\b/ @@ -2522,13 +2508,13 @@ -a- /\by\b/ - *** Failers +\= Expect no match xy yz xyz /\Ba\B/ - *** Failers +\= Expect no match a- -a -a- @@ -2547,8 +2533,7 @@ /\W/ - - *** Failers - - +\= Expect no match a /a\sb/ @@ -2556,8 +2541,7 @@ /a\Sb/ a-b - *** Failers - a-b +\= Expect no match a b /\d/ @@ -2565,8 +2549,7 @@ /\D/ - - *** Failers - - +\= Expect no match 1 /[\w]/ @@ -2574,8 +2557,7 @@ /[\W]/ - - *** Failers - - +\= Expect no match a /a[\s]b/ @@ -2583,8 +2565,7 @@ /a[\S]b/ a-b - *** Failers - a-b +\= Expect no match a b /[\d]/ @@ -2592,8 +2573,7 @@ /[\D]/ - - *** Failers - - 
+\= Expect no match 1 /ab|cd/ @@ -2613,7 +2593,7 @@ a((b /a\\b/ - a\b + a\\b /((a))/ abc @@ -2652,12 +2632,11 @@ cde /abc/ - *** Failers +\= Expect no match b - /a*/ - + \ /([abc])*d/ abbbcd @@ -2711,7 +2690,7 @@ adcdcde /a[bcd]+dcdcde/ - *** Failers +\= Expect no match abcde adcdcde @@ -2731,7 +2710,7 @@ effgz ij reffgz - *** Failers +\= Expect no match effg bcdd @@ -2745,7 +2724,7 @@ a /multiple words of text/ - *** Failers +\= Expect no match aa uh-uh @@ -2777,8 +2756,8 @@ /(a)|\1/ a - *** Failers ab +\= Expect no match x /(([a-c])b*?\2)*/ @@ -2797,7 +2776,7 @@ ABC XABCY ABABC - *** Failers +\= Expect no match aaxabxbaxbbx XBC AXC @@ -2820,7 +2799,7 @@ ABBC /ab+bc/i - *** Failers +\= Expect no match ABC ABQ @@ -2839,7 +2818,7 @@ ABBBBC /ab{4,5}?bc/i - *** Failers +\= Expect no match ABQ ABBBBC @@ -2860,7 +2839,7 @@ /^abc$/i ABC - *** Failers +\= Expect no match ABBBBC ABCC @@ -2886,8 +2865,8 @@ AXYZC /a.*c/i - *** Failers AABC +\= Expect no match AXYZD /a[bc]d/i @@ -2895,7 +2874,7 @@ /a[b-d]e/i ACE - *** Failers +\= Expect no match ABC ABD @@ -2919,7 +2898,7 @@ /a[^-b]c/i ADC - *** Failers +\= Expect no match ABD A-C @@ -2934,7 +2913,7 @@ DEF /$b/i - *** Failers +\= Expect no match A]C B @@ -2946,7 +2925,8 @@ A((B /a\\b/i - A\B + A\\b + a\\B /((a))/i ABC @@ -2993,11 +2973,6 @@ /[^ab]*/i CDE -/abc/i - -/a*/i - - /([abc])*d/i ABBBCD @@ -3024,6 +2999,7 @@ HIJ /^(ab|cd)e/i +\= Expect no match ABCDE /(abc|)ef/i @@ -3068,7 +3044,7 @@ EFFGZ IJ REFFGZ - *** Failers +\= Expect no match ADCDCDE EFFG BCDD @@ -3089,7 +3065,7 @@ C /multiple words of text/i - *** Failers +\= Expect no match AA UH-UH @@ -3185,14 +3161,14 @@ /^(a\1?){4}$/ aaaaaaaaaa - *** Failers +\= Expect no match AB aaaaaaaaa aaaaaaaaaaa /^(a(?(1)\1)){4}$/ aaaaaaaaaa - *** Failers +\= Expect no match aaaaaaaaa aaaaaaaaaaa @@ -3201,7 +3177,7 @@ /(?<=a)b/ ab - *** Failers +\= Expect no match cb b @@ -3250,7 +3226,7 @@ Ab /(?:(?i)a)b/ - *** Failers +\= Expect no match cb aB @@ -3269,7 +3245,7 @@ Ab /(?i:a)b/ - 
*** Failers +\= Expect no match aB aB @@ -3288,25 +3264,11 @@ aB /(?:(?-i)a)b/i - *** Failers aB - Ab - -/((?-i)a)b/i - -/(?:(?-i)a)b/i - aB - -/((?-i)a)b/i - aB - -/(?:(?-i)a)b/i - *** Failers +\= Expect no match Ab AB -/((?-i)a)b/i - /(?-i:a)b/i ab @@ -3320,7 +3282,7 @@ aB /(?-i:a)b/i - *** Failers +\= Expect no match AB Ab @@ -3333,14 +3295,14 @@ aB /(?-i:a)b/i - *** Failers +\= Expect no match Ab AB /((?-i:a))b/i /((?-i:a.))b/i - *** Failers +\= Expect no match AB a\nB @@ -3370,7 +3332,7 @@ aaac /(?(?(1)\.|())[^\W_](?>[a-z0-9-]*[^\W_])?)+$/ a @@ -3564,7 +3525,7 @@ the.quick.brown.fox a100.b200.300c 12-ab.1245 - *** Failers +\= Expect no match \ .a -a @@ -3582,38 +3543,40 @@ /(?>.*)(?<=(abcd|wxyz))/ alphabetabcd endingwxyz - *** Failers +\= Expect no match a rather long string that doesn't end with one of them /word (?>(?:(?!otherword)[a-zA-Z0-9]+ ){0,30})otherword/ word cat dog elephant mussel cow horse canary baboon snake shark otherword +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark /word (?>[a-zA-Z0-9]+ ){0,30}otherword/ +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark the quick brown fox and the lazy dog and several other words getting close to thirty by now I hope /(?<=\d{3}(?!999))foo/ 999foo 123999foo - *** Failers +\= Expect no match 123abcfoo /(?<=(?!...999)\d{3})foo/ 999foo 123999foo - *** Failers +\= Expect no match 123abcfoo /(?<=\d{3}(?!999)...)foo/ 123abcfoo 123456foo - *** Failers +\= Expect no match 123999foo /(?<=\d{3}...)(? 
\x09\x0a\x0c\x0d\x0b< @@ -3684,10 +3640,11 @@ ab /(?!\A)x/m - a\nxb\n + a\nxb\n /(?!^)x/m - a\nxb\n +\= Expect no match + a\nxb\n /abc\Qabc\Eabc/ abcabcabc @@ -3697,7 +3654,7 @@ / abc\Q abc\Eabc/x abc abcabc - *** Failers +\= Expect no match abcabcabc /abc#comment @@ -3729,7 +3686,7 @@ /\Gabc/ abc - *** Failers +\= Expect no match xyzabc /\Gabc./g @@ -3740,7 +3697,7 @@ /a(?x: b c )d/ XabcdY - *** Failers +\= Expect no match Xa b c d Y /((?x)x y z | a b c)/ @@ -3749,13 +3706,13 @@ /(?i)AB(?-i)C/ XabCY - *** Failers +\= Expect no match XabcY /((?i)AB(?-i)C|D)E/ abCE DE - *** Failers +\= Expect no match abcE abCe dE @@ -3773,9 +3730,9 @@ abc123abc abc123bc -/-- This tests for an IPv6 address in the form where it can have up to - eight components, one and only one of which is empty. This must be - an internal component. --/ +# This tests for an IPv6 address in the form where it can have up to +# eight components, one and only one of which is empty. This must be +# an internal component. /^(?!:) # colon disallowed at start (?: # start of item @@ -3785,14 +3742,14 @@ ){1,7} # end item; 1-7 of them required [0-9a-f]{1,4} $ # final hex number at end of string (?(1)|.) 
# check that there was an empty component - /xi + /ix a123::a123 a123:b342::abcd a123:b342::324e:abcd a123:ddde:b342::324e:abcd a123:ddde:b342::324e:dcba:abcd a123:ddde:9999:b342::324e:dcba:abcd - *** Failers +\= Expect no match 1:2:3:4:5:6:7:8 a123:bce:ddde:9999:b342::324e:dcba:abcd a123::9999:b342::324e:dcba:abcd @@ -3808,17 +3765,11 @@ - d ] - *** Failers +\= Expect no match b -/[\z\C]/ - z - C - -/\M/ - M - /(a+)*b/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /(?i)reg(?:ul(?:[aä]|ae)r|ex)/ @@ -3841,29 +3792,29 @@ /ab cd(?x) de fg/ ab cddefg - ** Failers +\= Expect no match abcddefg /(?a|)*\d/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /(?:a|)*\d/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /\Z/g abc\n /^(?s)(?>.*)(?(?>(a))b|(a)c)/ ac -/(?:(?>([ab])))+a=/+ +/(?:(?>([ab])))+a=/aftertext =ba= -/(?>([ab]))+a=/+ +/(?>([ab]))+a=/aftertext =ba= /((?>(a+)b)+(aabab))/ aaaabaaabaabab /(?>a+|ab)+?c/ +\= Expect no match aabc /(?>a+|ab)+c/ +\= Expect no match aabc /(?:a+|ab)+c/ @@ -4183,34 +4144,36 @@ ab /^(?:a|ab)++c/ +\= Expect no match aaaabc /^(?>a|ab)++c/ +\= Expect no match aaaabc /^(?:a|ab)+c/ aaaabc -/(?=abc){3}abc/+ +/(?=abc){3}abc/aftertext abcabcabc - ** Failers +\= Expect no match xyz -/(?=abc)+abc/+ +/(?=abc)+abc/aftertext abcabcabc - ** Failers +\= Expect no match xyz -/(?=abc)++abc/+ +/(?=abc)++abc/aftertext abcabcabc - ** Failers +\= Expect no match xyz /(?=abc){0}xyz/ xyz /(?=abc){1}xyz/ - ** Failers +\= Expect no match xyz /(?=(a))?./ @@ -4234,7 +4197,7 @@ /^[\g]+/ ggg<<>> - ** Failers +\= Expect no match \\ga /^[\ga]+/ @@ -4251,12 +4214,12 @@ /(?<=a{2})b/i xaabc - ** Failers +\= Expect 
no match xabc /(?XNNNYZ > X NYQZ - ** Failers +\= Expect no match >XYZ > X NY Z @@ -4357,21 +4307,21 @@ /(foo\Kbar)baz/ foobarbaz -/abc\K|def\K/g+ +/abc\K|def\K/g,aftertext Xabcdefghi -/ab\Kc|de\Kf/g+ +/ab\Kc|de\Kf/g,aftertext Xabcdefghi -/(?=C)/g+ +/(?=C)/g,aftertext ABCDECBA -/^abc\K/+ +/^abc\K/aftertext abcdef - ** Failers +\= Expect no match defabcxyz -/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/ +/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-2}Z/ ababababbbabZXXXX /(?tom|bon)-\g{A}/ @@ -4379,19 +4329,20 @@ bon-bon /(^(a|b\g{-1}))/ +\= Expect no match bacxxx /(?|(abc)|(xyz))\1/ abcabc xyzxyz - ** Failers +\= Expect no match abcxyz xyzabc /(?|(abc)|(xyz))(?1)/ abcabc xyzabc - ** Failers +\= Expect no match xyzxyz /^X(?5)(a)(?|(b)|(q))(c)(d)(Y)/ @@ -4406,14 +4357,14 @@ /(?'abc'\w+):\k{2}/ a:aaxyz ab:ababxyz - ** Failers +\= Expect no match a:axyz ab:abxyz /(?'abc'\w+):\g{abc}{2}/ a:aaxyz ab:ababxyz - ** Failers +\= Expect no match a:axyz ab:abxyz @@ -4441,7 +4392,7 @@ 1.2.3.4 131.111.10.206 10.0.0.0 - ** Failers +\= Expect no match 10.6 455.3.4.5 @@ -4449,18 +4400,18 @@ 1.2.3.4 131.111.10.206 10.0.0.0 - ** Failers +\= Expect no match 10.6 455.3.4.5 /^(\w++|\s++)*$/ now is the time for all good men to come to the aid of the party - *** Failers +\= Expect no match this is not a line with only words and spaces! /(\d++)(\w)/ 12345a - *** Failers +\= Expect no match 12345+ /a++b/ @@ -4478,14 +4429,14 @@ /\(([^()]++|\([^()]+\))+\)/ (abc) (abc(def)xyz) - *** Failers +\= Expect no match ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /^([^()]|\((?1)*\))*$/ abc a(b)c a(b(c))d - *** Failers) +\= Expect no match) a(b(c)d /^>abc>([^()]|\((?1)*\))* hij> def> - *** Failers +\= Expect no match a)(?<=b(?&X))/ @@ -4603,33 +4561,33 @@ /^(?|(abc)|(def))\1/ abcabc defdef - ** Failers +\= Expect no match abcdef defabc /^(?|(abc)|(def))(?1)/ abcabc defabc - ** Failers +\= Expect no match defdef abcdef -/(?:a(? (?')|(?")) |b(? (?')|(?")) ) (?('quote')[a-z]+|[0-9]+)/xJ +/(?:a(? (?')|(?")) |b(? 
(?')|(?")) ) (?('quote')[a-z]+|[0-9]+)/x,dupnames a\"aaaaa b\"aaaaa - ** Failers +\= Expect no match b\"11111 /(?:(?1)|B)(A(*F)|C)/ ABCD CCD - ** Failers +\= Expect no match CAD /^(?:(?1)|B)(A(*F)|C)/ CCD BCD - ** Failers +\= Expect no match ABCD CAD BAD @@ -4640,7 +4598,7 @@ BAD BCD BAX - ** Failers +\= Expect no match ACX ABC @@ -4654,12 +4612,12 @@ (ab(cd)ef) /^(?=a(*SKIP)b|ac)/ - ** Failers +\= Expect no match ac /^(?=a(*PRUNE)b)/ ab - ** Failers +\= Expect no match ac /^(?=a(*ACCEPT)b)/ @@ -4697,95 +4655,98 @@ 00 0000 -/--- This one does fail, as expected, in Perl. It needs the complex item at the - end of the pattern. A single letter instead of (B|D) makes it not fail, - which I think is a Perl bug. --- / +# This one does fail, as expected, in Perl. It needs the complex item at the +# end of the pattern. A single letter instead of (B|D) makes it not fail, which +# I think is a Perl bug. /A(*COMMIT)(B|D)/ +\= Expect no match ACABX -/--- Check the use of names for failure ---/ +# Check the use of names for failure -/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K - ** Failers +/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/mark +\= Expect no match AC CB -/--- Force no study, otherwise mark is not seen. The studied version is in - test 2 because it isn't Perl-compatible. ---/ - -/(*MARK:A)(*SKIP:B)(C|X)/KSS +/(*MARK:A)(*SKIP:B)(C|X)/mark C +\= Expect no match D -/^(A(*THEN:A)B|C(*THEN:B)D)/K - ** Failers +/^(A(*THEN:A)B|C(*THEN:B)D)/mark +\= Expect no match CB -/^(?:A(*THEN:A)B|C(*THEN:B)D)/K +/^(?:A(*THEN:A)B|C(*THEN:B)D)/mark +\= Expect no match CB -/^(?>A(*THEN:A)B|C(*THEN:B)D)/K +/^(?>A(*THEN:A)B|C(*THEN:B)D)/mark +\= Expect no match CB -/--- This should succeed, as the skip causes bump to offset 1 (the mark). Note -that we have to have something complicated such as (B|Z) at the end because, -for Perl, a simple character somehow causes an unwanted optimization to mess -with the handling of backtracking verbs. ---/ +# This should succeed, as the skip causes bump to offset 1 (the mark). 
Note +# that we have to have something complicated such as (B|Z) at the end because, +# for Perl, a simple character somehow causes an unwanted optimization to mess +# with the handling of backtracking verbs. -/A(*MARK:A)A+(*SKIP:A)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:A)(B|Z) | AC/x,mark AAAC -/--- Test skipping over a non-matching mark. ---/ +# Test skipping over a non-matching mark. -/A(*MARK:A)A+(*MARK:B)(*SKIP:A)(B|Z) | AC/xK +/A(*MARK:A)A+(*MARK:B)(*SKIP:A)(B|Z) | AC/x,mark AAAC -/--- Check shorthand for MARK ---/ +# Check shorthand for MARK. -/A(*:A)A+(*SKIP:A)(B|Z) | AC/xK +/A(*:A)A+(*SKIP:A)(B|Z) | AC/x,mark AAAC -/--- Don't loop! Force no study, otherwise mark is not seen. ---/ - -/(*:A)A+(*SKIP:A)(B|Z)/KSS +/(*:A)A+(*SKIP:A)(B|Z)/mark +\= Expect no match AAAC -/--- This should succeed, as a non-existent skip name disables the skip ---/ +# This should succeed, as a non-existent skip name disables the skip. -/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/x,mark AAAC -/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC(*:B)/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC(*:B)/x,mark AAAC -/--- COMMIT at the start of a pattern should act like an anchor. Again, -however, we need the complication for Perl. ---/ +# COMMIT at the start of a pattern should act like an anchor. Again, however, +# we need the complication for Perl. /(*COMMIT)(A|P)(B|P)(C|P)/ ABCDEFG - ** Failers +\= Expect no match DEFGABC -/--- COMMIT inside an atomic group can't stop backtracking over the group. ---/ +# COMMIT inside an atomic group can't stop backtracking over the group. /(\w+)(?>b(*COMMIT))\w{2}/ abbb /(\w+)b(*COMMIT)\w{2}/ +\= Expect no match abbb -/--- Check opening parens in comment when seeking forward reference. ---/ +# Check opening parens in comment when seeking forward reference. /(?&t)(?#()(?(DEFINE)(?a))/ bac -/--- COMMIT should override THEN ---/ +# COMMIT should override THEN. 
/(?>(*COMMIT)(?>yes|no)(*THEN)(*F))?/ +\= Expect no match yes /(?>(*COMMIT)(yes|no)(*THEN)(*F))?/ +\= Expect no match yes /b?(*SKIP)c/ @@ -4793,9 +4754,11 @@ however, we need the complication for Perl. ---/ abc /(*SKIP)bc/ +\= Expect no match a /(*SKIP)b/ +\= Expect no match a /(?P(?P=abn)xxx|)+/ @@ -4804,7 +4767,7 @@ however, we need the complication for Perl. ---/ /(?i:([^b]))(?1)/ aa aA - ** Failers +\= Expect no match ab aB Ba @@ -4812,7 +4775,7 @@ however, we need the complication for Perl. ---/ /^(?&t)*+(?(DEFINE)(?a))\w$/ aaaaaaX - ** Failers +\= Expect no match aaaaaa /^(?&t)*(?(DEFINE)(?a))\w$/ @@ -4822,24 +4785,24 @@ however, we need the complication for Perl. ---/ /^(a)*+(\w)/ aaaaX YZ - ** Failers +\= Expect no match aaaa /^(?:a)*+(\w)/ aaaaX YZ - ** Failers +\= Expect no match aaaa /^(a)++(\w)/ aaaaX - ** Failers +\= Expect no match aaaa YZ /^(?:a)++(\w)/ aaaaX - ** Failers +\= Expect no match aaaa YZ @@ -4853,13 +4816,13 @@ however, we need the complication for Perl. ---/ /^(a){2,}+(\w)/ aaaaX - ** Failers +\= Expect no match aaa YZ /^(?:a){2,}+(\w)/ aaaaX - ** Failers +\= Expect no match aaa YZ @@ -4869,12 +4832,12 @@ however, we need the complication for Perl. ---/ aab /(a)++(?1)b/ - ** Failers +\= Expect no match ab aab /(a)*+(?1)b/ - ** Failers +\= Expect no match ab aab @@ -4909,12 +4872,13 @@ however, we need the complication for Perl. ---/ aaaab /^(a)(?1)++ab/ +\= Expect no match aaaab -/^(?=a(*:M))aZ/K +/^(?=a(*:M))aZ/mark aZbc -/^(?!(*:M)b)aZ/K +/^(?!(*:M)b)aZ/mark aZbc /(?(DEFINE)(a))?b(?1)/ @@ -4942,10 +4906,10 @@ however, we need the complication for Perl. ---/ aaa /((?(R)a|(?1)))+/ - aaa + aaa /a(*:any -name)/K +name)/mark abc /(?>(?&t)c|(?&t))(?(DEFINE)(?a|b(*PRUNE)c))/ @@ -4953,11 +4917,12 @@ name)/K ba bba -/--- Checking revised (*THEN) handling ---/ +# Checking revised (*THEN) handling. -/--- Capture ---/ +# Capture /^.*? (a(*THEN)b) c/x +\= Expect no match aabc /^.*? (a(*THEN)b|(*F)) c/x @@ -4967,11 +4932,13 @@ name)/K aabc /^.*? 
( (a(*THEN)b) ) c/x +\= Expect no match aabc -/--- Non-capture ---/ +# Non-capture /^.*? (?:a(*THEN)b) c/x +\= Expect no match aabc /^.*? (?:a(*THEN)b|(*F)) c/x @@ -4981,11 +4948,13 @@ name)/K aabc /^.*? (?: (?:a(*THEN)b) ) c/x +\= Expect no match aabc -/--- Atomic ---/ +# Atomic /^.*? (?>a(*THEN)b) c/x +\= Expect no match aabc /^.*? (?>a(*THEN)b|(*F)) c/x @@ -4995,11 +4964,13 @@ name)/K aabc /^.*? (?> (?>a(*THEN)b) ) c/x +\= Expect no match aabc -/--- Possessive capture ---/ +# Possessive capture /^.*? (a(*THEN)b)++ c/x +\= Expect no match aabc /^.*? (a(*THEN)b|(*F))++ c/x @@ -5009,11 +4980,13 @@ name)/K aabc /^.*? ( (a(*THEN)b)++ )++ c/x +\= Expect no match aabc -/--- Possessive non-capture ---/ +# Possessive non-capture /^.*? (?:a(*THEN)b)++ c/x +\= Expect no match aabc /^.*? (?:a(*THEN)b|(*F))++ c/x @@ -5023,35 +4996,38 @@ name)/K aabc /^.*? (?: (?:a(*THEN)b)++ )++ c/x +\= Expect no match aabc -/--- Condition assertion ---/ +# Condition assertion /^(?(?=a(*THEN)b)ab|ac)/ ac -/--- Condition ---/ +# Condition /^.*?(?(?=a)a|b(*THEN)c)/ +\= Expect no match ba /^.*?(?:(?(?=a)a|b(*THEN)c)|d)/ ba /^.*?(?(?=a)a(*THEN)b|c)/ +\= Expect no match ac -/--- Assertion ---/ +# Assertion -/^.*(?=a(*THEN)b)/ +/^.*(?=a(*THEN)b)/ aabc -/------------------------------/ +# -------------------------- -/(?>a(*:m))/imsxSK +/(?>a(*:m))/imsx,mark a -/(?>(a)(*:m))/imsxSK +/(?>(a)(*:m))/imsx,mark a /(?<=a(*ACCEPT)b)c/ @@ -5062,14 +5038,14 @@ name)/K /(?<=(a(*COMMIT)b))c/ xabcd - ** Failers +\= Expect no match xacd /(?a?)*)*c/ - aac + aac /(?>.*?a)(?<=ba)/ aba @@ -5241,18 +5219,6 @@ name were given. ---/ /(?:.*?a)(?<=ba)/ aba -/.*?a(*PRUNE)b/ - aab - -/.*?a(*PRUNE)b/s - aab - -/^a(*PRUNE)b/s - aab - -/.*?a(*SKIP)b/ - aab - /(?>.*?a)b/s aab @@ -5260,6 +5226,7 @@ name were given. ---/ aab /(?>^a)b/s +\= Expect no match aab /(?>.*?)(?<=(abcd)|(wxyz))/ @@ -5270,10 +5237,11 @@ name were given. 
---/ alphabetabcd endingwxyz -"(?>.*)foo" +/(?>.*)foo/ +\= Expect no match abcdfooxyz -"(?>.*?)foo" +/(?>.*?)foo/ abcdfooxyz /(?:(a(*PRUNE)b)){0}(?:(?1)|ac)/ @@ -5283,33 +5251,33 @@ name were given. ---/ ac /(?<=(*SKIP)ac)a/ +\= Expect no match aa -/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/x,mark AAAC -/a(*SKIP:m)x|ac(*:n)(*SKIP:n)d|ac/K +/a(*SKIP:m)x|ac(*:n)(*SKIP:n)d|ac/mark acacd -/A(*SKIP:m)x|A(*SKIP:n)x|AB/K +/A(*SKIP:m)x|A(*SKIP:n)x|AB/mark AB -/((*SKIP:r)d){0}a(*SKIP:m)x|ac(*:n)|ac/K +/((*SKIP:r)d){0}a(*SKIP:m)x|ac(*:n)|ac/mark acacd -/-- Tests that try to figure out how Perl works. My hypothesis is that the - first verb that is backtracked onto is the one that acts. This seems to be - the case almost all the time, but there is one exception that is perhaps a - bug. --/ +# Tests that try to figure out how Perl works. My hypothesis is that the first +# verb that is backtracked onto is the one that acts. This seems to be the case +# almost all the time, but there is one exception that is perhaps a bug. -/-- This matches "aaaac"; each PRUNE advances one character until the subject - no longer starts with 5 'a's. --/ +# This matches "aaaac"; each PRUNE advances one character until the subject no +# longer starts with 5 'a's. /aaaaa(*PRUNE)b|a+c/ aaaaaac -/-- Putting SKIP in front of PRUNE makes no difference, as it is never -backtracked onto, whether or not it has a label. --/ +# Putting SKIP in front of PRUNE makes no difference, as it is never +# backtracked onto, whether or not it has a label. /aaaaa(*SKIP)(*PRUNE)b|a+c/ aaaaaac @@ -5320,70 +5288,69 @@ backtracked onto, whether or not it has a label. --/ /aaaa(*:N)a(*SKIP:N)(*PRUNE)b|a+c/ aaaaaac -/-- Putting THEN in front makes no difference. */ +# Putting THEN in front makes no difference. /aaaaa(*THEN)(*PRUNE)b|a+c/ aaaaaac -/-- However, putting COMMIT in front of the prune changes it to "no match". I - think this is inconsistent and possibly a bug. 
For the moment, running this - test is moved out of the Perl-compatible file. --/ +# However, putting COMMIT in front of the prune changes it to "no match". I +# think this is inconsistent and possibly a bug. For the moment, running this +# test is moved out of the Perl-compatible file. /aaaaa(*COMMIT)(*PRUNE)b|a+c/ +# OK, lets play the same game again using SKIP instead of PRUNE. -/---- OK, lets play the same game again using SKIP instead of PRUNE. ----/ - -/-- This matches "ac" because SKIP forces the next match to start on the - sixth "a". --/ +# This matches "ac" because SKIP forces the next match to start on the +# sixth "a". /aaaaa(*SKIP)b|a+c/ aaaaaac -/-- Putting PRUNE in front makes no difference. --/ +# Putting PRUNE in front makes no difference. /aaaaa(*PRUNE)(*SKIP)b|a+c/ aaaaaac -/-- Putting THEN in front makes no difference. --/ +# Putting THEN in front makes no difference. /aaaaa(*THEN)(*SKIP)b|a+c/ aaaaaac -/-- In this case, neither does COMMIT. This still matches "ac". --/ +# In this case, neither does COMMIT. This still matches "ac". /aaaaa(*COMMIT)(*SKIP)b|a+c/ aaaaaac -/-- This gives "no match", as expected. --/ +# This gives "no match", as expected. /aaaaa(*COMMIT)b|a+c/ +\= Expect no match aaaaaac - -/------ Tests using THEN ------/ +# ---- Tests using THEN ---- -/-- This matches "aaaaaac", as expected. --/ +# This matches "aaaaaac", as expected. /aaaaa(*THEN)b|a+c/ aaaaaac -/-- Putting SKIP in front makes no difference. --/ +# Putting SKIP in front makes no difference. /aaaaa(*SKIP)(*THEN)b|a+c/ aaaaaac -/-- Putting PRUNE in front makes no difference. --/ +# Putting PRUNE in front makes no difference. /aaaaa(*PRUNE)(*THEN)b|a+c/ aaaaaac -/-- Putting COMMIT in front makes no difference. --/ +# Putting COMMIT in front makes no difference. 
/aaaaa(*COMMIT)(*THEN)b|a+c/ aaaaaac -/-- End of "priority" tests --/ +# End of "priority" tests /aaaaa(*:m)(*PRUNE:m)(*SKIP:m)m|a+/ aaaaaa @@ -5409,7 +5376,7 @@ backtracked onto, whether or not it has a label. --/ /aaa(*MARK:A)a(*SKIP:A)b|a+c/ aaaac -/a(*:m)a(*COMMIT)(*SKIP:m)b|a+c/K +/a(*:m)a(*COMMIT)(*SKIP:m)b|a+c/mark aaaaaac /.?(a|b(*THEN)c)/ @@ -5417,6 +5384,7 @@ backtracked onto, whether or not it has a label. --/ /(a(*COMMIT)b)c|abd/ abc +\= Expect no match abd /(?=a(*COMMIT)b)abc|abd/ @@ -5428,14 +5396,16 @@ backtracked onto, whether or not it has a label. --/ abd /a(?=b(*COMMIT)c)[^d]|abd/ + abc +\= Expect no match abd - abc /a(?=bc).|abd/ abd abc /a(?>b(*COMMIT)c)d|abd/ +\= Expect no match abceabd /a(?>bc)d|abd/ @@ -5445,100 +5415,104 @@ backtracked onto, whether or not it has a label. --/ abd /(?>a(*COMMIT)c)d|abd/ +\= Expect no match abd /((?=a(*COMMIT)b)ab|ac){0}(?:(?1)|a(c))/ ac -/-- These tests were formerly in test 2, but changes in PCRE and Perl have - made them compatible. --/ +# These tests were formerly in test 2, but changes in PCRE and Perl have +# made them compatible. 
/^(a)?(?(1)a|b)+$/ - *** Failers +\= Expect no match a -/(?=a\Kb)ab/ - ab - -/(?!a\Kb)ac/ - ac - -/^abc(?<=b\Kc)d/ - abcd - -/^abc(?b))/K +/(*:m(m)(?&y)(?(DEFINE)(?b))/mark abc -/(*PRUNE:m(m)(?&y)(?(DEFINE)(?b))/K +/(*PRUNE:m(m)(?&y)(?(DEFINE)(?b))/mark abc -/(*SKIP:m(m)(?&y)(?(DEFINE)(?b))/K +/(*SKIP:m(m)(?&y)(?(DEFINE)(?b))/mark abc -/(*THEN:m(m)(?&y)(?(DEFINE)(?b))/K +/(*THEN:m(m)(?&y)(?(DEFINE)(?b))/mark abc /^\d*\w{4}/ 1234 +\= Expect no match 123 /^[^b]*\w{4}/ aaaa +\= Expect no match aaa /^[^b]*\w{4}/i aaaa +\= Expect no match aaa /^a*\w{4}/ aaaa +\= Expect no match aaa /^a*\w{4}/i aaaa +\= Expect no match aaa -/(?(?=ab)ab)/+ - ca - cd - -/(?:(?foo)|(?bar))\k/J +/(?:(?foo)|(?bar))\k/dupnames foofoo barbar -/(?A)(?:(?foo)|(?bar))\k/J +/(?A)(?:(?foo)|(?bar))\k/dupnames AfooA AbarA - ** Failers +\= Expect no match Afoofoo Abarbar /^(\d+)\s+IN\s+SOA\s+(\S+)\s+(\S+)\s*\(\s*$/ 1 IN SOA non-sp1 non-sp2( -/^ (?:(?A)|(?'B'B)(?A)) (?('A')x) (?()y)$/xJ +/^ (?:(?A)|(?'B'B)(?A)) (?('A')x) (?()y)$/x,dupnames Ax BAxy @@ -5657,52 +5633,54 @@ AbcdCBefgBhiBqz / ^ ( a + ) + + \w $ /x aaaab -/(?:a\Kb)*+/+ +/(?:a\Kb)*+/aftertext ababc -/(?>a\Kb)*/+ +/(?>a\Kb)*/aftertext ababc -/(?:a\Kb)*/+ +/(?:a\Kb)*/aftertext ababc -/(a\Kb)*+/+ +/(a\Kb)*+/aftertext ababc -/(a\Kb)*/+ +/(a\Kb)*/aftertext ababc /(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/ +\= Expect no match acb -'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++' +/\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED -'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++' +/\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED -'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++' +/\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED -'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++' +/\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED /^\w+(?>\s*)(?<=\w)/ - test test + test test -/(?Pa)(?Pb)/gJ +/(?Pa)(?Pb)/g,dupnames abbaba 
-/(?Pa)(?Pb)(?P=same)/gJ +/(?Pa)(?Pb)(?P=same)/g,dupnames abbaba -/(?P=same)?(?Pa)(?Pb)/gJ +/(?P=same)?(?Pa)(?Pb)/g,dupnames abbaba -/(?:(?P=same)?(?:(?Pa)|(?Pb))(?P=same))+/gJ +/(?:(?P=same)?(?:(?Pa)|(?Pb))(?P=same))+/g,dupnames bbbaaabaabb -/(?:(?P=same)?(?:(?P=same)(?Pa)(?P=same)|(?P=same)?(?Pb)(?P=same)){2}(?P=same)(?Pc)(?P=same)){2}(?Pz)?/gJ +/(?:(?P=same)?(?:(?P=same)(?Pa)(?P=same)|(?P=same)?(?Pb)(?P=same)){2}(?P=same)(?Pc)(?P=same)){2}(?Pz)?/g,dupnames +\= Expect no match bbbaaaccccaaabbbcc /(?Pa)?(?Pb)?(?()c|d)*l/ @@ -5720,27 +5698,628 @@ AbcdCBefgBhiBqz /[\Q]a\E]+/ aa]] +/A((((((((a))))))))\8B/ + AaaB + +/A(((((((((a)))))))))\9B/ + AaaB + +/A[\8\9]B/ + A8B + A9B + +/(|ab)*?d/ + abd + xyd + /(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/ 1234abcd +/(\2|a)(\1)/ + aaa + /(\2)(\1)/ -"Z*(|d*){216}" +/Z*(|d*){216}/ -"(?1)(?#?'){8}(a)" +/(?1)(?#?'){8}(a)/ baaaaaaaaac -"(?|(\k'Pm')|(?'Pm'))" +/((((((((((((x))))))))))))\12/ + xx + +/A[\8]B[\9]C/ + A8B9C + +/(?1)()((((((\1++))\x85)+)|))/ + \x85\x85 + +/(?|(\k'Pm')|(?'Pm'))/ abcd +/(?|(aaa)|(b))\g{1}/ + aaaaaa + bb + +/(?|(aaa)|(b))(?1)/ + aaaaaa + baaa +\= Expect no match + bb + +/(?|(aaa)|(b))/ + xaaa + xbc + +/(?|(?'a'aaa)|(?'a'b))\k'a'/ + aaaaaa + bb + +/(?|(?'a'aaa)|(?'a'b))(?'a'cccc)\k'a'/dupnames + aaaccccaaa + bccccb + +# /x does not apply to MARK labels + +/x (*MARK:ab cd # comment +ef) x/x,mark + axxz + +/(?<=a(B){0}c)X/ + acX + +/(?b)(?(DEFINE)(a+))(?&DEFINE)/ + bbbb +\= Expect no match + baaab + /(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[,;:])(?=.{8,16})(?!.*[\s])/ - \ Fred:099 + \ Fred:099 /(?=.*X)X$/ \ X - -/X+(?#comment)?/ - >XXX< + +/(?s)(?=.*?)b/ + aabc + +/(Z)(a)\2{1,2}?(?-i)\1X/i + ZaAAZX + +/(?'c')XX(?'YYYYYYYYYYYYYYYYYYYYYYYCl')/ + +/[s[:digit:]\E-H]+/ + s09-H + +/[s[:digit:]\Q\E-H]+/ + s09-H + +/a+(?:|b)a/ + aaaa + +/X?(R||){3335}/ + +/(?1)(A(*COMMIT)|B)D/ + ABD + XABD + BAD + ABXABD +\= Expect no match + ABX + +/(?(DEFINE)(? 1? (?=(?2)?) 
1 2 (?('cond')|3))) + \A + () + (?&m) + \Z/x + 123 + +/^(?: +(?: A| (1? (?=(?2)?) (1) 2 (?('cond')|3)) ) +(Z) +)+$/x + AZ123Z +\= Expect no match + AZ12Z + +/^ (?(DEFINE) ( (?!(a)\2b)..) ) ()(?1) /x + acb +\= Expect no match + aab + +/(?>ab|abab){1,5}?M/ + abababababababababababM + +/(?>ab|abab){2}?M/ + abababM + +/((?(?=(a))a)+k)/ + bbak + +/((?(?=(a))a|)+k)/ + bbak + +/(?(?!(b))a|b)+k/ + ababbalbbadabak + +/(?!(b))c|b/ + Ab + Ac + +/(?=(b))b|c/ + Ab + Ac + +/^(.|(.)(?1)\2)$/ + a + aba + abcba + ababa + abcdcba + +/^((.)(?1)\2|.?)$/ + a + aba + abba + abcba + ababa + abccba + abcdcba + abcddcba + +/^(.)(\1|a(?2))/ + bab + +/^(.|(.)(?1)?\2)$/ + abcba + +/^(?(?=(a))abc|def)/ + abc + +/^(?(?!(a))def|abc)/ + abc + +/^(?(?=(a)(*ACCEPT))abc|def)/ + abc + +/^(?(?!(a)(*ACCEPT))def|abc)/ + abc + +/^(?1)\d{3}(a)/ + a123a + +# This pattern uses a lot of named subpatterns in order to match email +# addresses in various formats. It's a heavy test for named subpatterns. In the +# group, slash is coded as \x{2f} so that this pattern can also be +# processed by perltest.sh, which does not cater for an escaped delimiter +# within the pattern. $ within the pattern must also be escaped. All $ and @ +# characters in subject strings are escaped so that Perl doesn't interpret them +# as variable insertions and " characters must also be escaped for Perl. + +# This set of subpatterns is more or less a direct transliteration of the BNF +# definitions in RFC2822, without any of the obsolete features. The addition of +# a possessive + to the definition of reduced the match limit in PCRE2 +# from over 5 million to just under 400, and eliminated a very noticeable delay +# when this file was passed to perltest.sh. + +/(?ix)(?(DEFINE) +(? (?&local_part) \@ (?&domain) ) +(? (?&CFWS)?+ < (?&addr_spec) > (?&CFWS)?+ ) +(? [a-z\d!#\$%&'*+-\x{2f}=?^_`{|}~] ) +(? (?&CFWS)?+ (?&atext)+ (?&CFWS)?+ ) +(? (?&ctext) | (?"ed_pair) | (?&comment) ) +(? [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ ()\\] ) +(? 
\( (?: (?&FWS)?+ (?&ccontent) )*+ (?&FWS)?+ \) ) +(? (?: (?&FWS)?+ (?&comment) )* (?# NOT possessive) + (?: (?&FWS)?+ (?&comment) | (?&FWS) ) ) +(? (?&dtext) | (?"ed_pair) ) +(? (?&phrase) ) +(? (?&dot_atom) | (?&domain_literal) ) +(? (?&CFWS)?+ \[ (?: (?&FWS)?+ (?&dcontent) )* (?&FWS)?+ \] + (?&CFWS)?+ ) +(? (?&CFWS)?+ (?&dot_atom_text) (?&CFWS)?+ ) +(? (?&atext)++ (?: \. (?&atext)++)*+ ) +(? [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ \[\]\\] ) +(? (?: [\t\ ]*+ \n)?+ [\t\ ]++ ) +(? (?&dot_atom) | (?"ed_string) ) +(? (?&name_addr) | (?&addr_spec) ) +(? (?&display_name)? (?&angle_addr) ) +(? (?&word)++ ) +(? (?&qtext) | (?"ed_pair) ) +(? " (?&text) ) +(? (?&CFWS)?+ " (?: (?&FWS)?+ (?&qcontent))* (?&FWS)?+ " + (?&CFWS)?+ ) +(? [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ "\\] ) +(? [^\r\n] ) +(? (?&atom) | (?"ed_string) ) +) # End DEFINE +^(?&mailbox)$/ + Alan Other + + user\@dom.ain + user\@[] + user\@[domain literal] + user\@[domain literal with \"[square brackets\"] inside] + \"A. Other\" (a comment) + A. Other (a comment) + \"/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/\"\@x400-re.lay +\= Expect no match + A missing angle (?&simple_assertion) | (?&lookaround) ) + +(? \( \? > (?®ex) \) ) + +(? \\ \d+ | + \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | + \\k <(?&groupname)> | + \\k '(?&groupname)' | + \\k \{ (?&groupname) \} | + \( \? P= (?&groupname) \) ) + +(? (?:(?&assertion) | + (?&callout) | + (?&comment) | + (?&option_setting) | + (?&qualified_item) | + (?"ed_string) | + (?"ed_string_empty) | + (?&special_escape) | + (?&verb) + )* ) + +(? \(\?C (?: \d+ | + (?: (?["'`^%\#\$]) + (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | + \{ (?: \}\} | [^}]*+ )* \} ) + )? \) ) + +(? \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? + (?®ex) \) ) + +(? \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] ) + +(? (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] ) + +(? 
(?: \[ : (?: + alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| + punct|space|upper|word|xdigit + ) : \] | + (?"ed_string) | + (?"ed_string_empty) | + (?&escaped_character) | + (?&character_type) | + [^]] ) ) + +(? \(\?\# [^)]* \) | (?"ed_string_empty) | \\E ) + +(? (?: \( [+-]? \d+ \) | + \( < (?&groupname) > \) | + \( ' (?&groupname) ' \) | + \( R \d* \) | + \( R & (?&groupname) \) | + \( (?&groupname) \) | + \( DEFINE \) | + \( VERSION >?=\d+(?:\.\d\d?)? \) | + (?&callout)?+ (?&comment)* (?&lookaround) ) ) + +(? \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) ) + +(? (? [-\x{2f}!"'`=_:;,%&@~]) (?®ex) + \k'delimiter' .* ) + +(? \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | + x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | + [aefnrt] | c[[:print:]] | + [^[:alnum:]] ) ) + +(? (?&capturing_group) | (?&non_capturing_group) | + (?&resetting_group) | (?&atomic_group) | + (?&conditional_group) ) + +(? [a-zA-Z_]\w* ) + +(? (?! (?&range_qualifier) ) [^[()|*+?.\$\\] ) + +(? \(\? (?: = | ! | <= | \(\? [iJmnsUx-]* : (?®ex) \) ) + +(? \(\? [iJmnsUx-]* \) ) + +(? (?:\. | + (?&lookaround) | + (?&back_reference) | + (?&character_class) | + (?&character_type) | + (?&escaped_character) | + (?&group) | + (?&subroutine_call) | + (?&literal_character) | + (?"ed_string) + ) (?&comment)? (?&qualifier)? ) + +(? (?: [?*+] | (?&range_qualifier) ) [+?]? ) + +(? (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) + +(? \\Q\\E ) + +(? \{ (?: \d+ (?: , \d* )? | , \d+ ) \} ) + +(? (?&start_item)* (?&branch) (?: \| (?&branch) )* ) + +(? \( \? \| (?®ex) \) ) + +(? \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z ) + +(? \\K ) + +(? \( \* (?: + ANY | + ANYCRLF | + BSR_ANYCRLF | + BSR_UNICODE | + CR | + CRLF | + LF | + LIMIT_MATCH=\d+ | + LIMIT_DEPTH=\d+ | + LIMIT_HEAP=\d+ | + NOTEMPTY | + NOTEMPTY_ATSTART | + NO_AUTO_POSSESS | + NO_DOTSTAR_ANCHOR | + NO_JIT | + NO_START_OPT | + NUL | + UTF | + UCP ) \) ) + +(? (?: \(\?R\) | \(\?[+-]?\d+\) | + \(\? 
(?: & | P> ) (?&groupname) \) | + \\g < (?&groupname) > | + \\g ' (?&groupname) ' | + \\g < [+-]? \d+ > | + \\g ' [+-]? \d+ ) ) + +(? \(\* (?: ACCEPT | FAIL | F | COMMIT | + (?:MARK)?:(?&verbname) | + (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) ) + +(? [^)]+ ) + +) # End DEFINE +# Kick it all off... +^(?&delimited_regex)$/subject_literal,jitstack=256 + /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/ + /(cat(a(ract|tonic)|erpillar)) \1()2(3)/ + /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/ + /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/ + /]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is + /^(?(DEFINE) (?
    a) (? b) ) (?&A) (?&B) / + /(?(DEFINE)(?2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/ + /\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/ + /^(\w++|\s++)*$/ + /a+b?(*THEN)c+(*FAIL)/ + /(A (A|B(*ACCEPT)|C) D)(E)/x + /^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i + /A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B + /(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info + /(?sx)(?(DEFINE)(? (?&simple_assertion) | (?&lookaround) )(? \( \? > (?®ex) \) )(? \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(? (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?"ed_string) | (?"ed_string_empty) | (?&special_escape) | (?&verb) )* )(? \(\?C (?: \d+ | (?: (?["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(? \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?®ex) \) )(? \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(? (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(? (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?"ed_string) | (?"ed_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(? \(\?\# [^)]* \) | (?"ed_string_empty) | \\E )(? (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(? \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(? (? [-\x{2f}!"'`=_:;,%&@~]) (?®ex) \k'delimiter' .* )(? \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(? 
(?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(? [a-zA-Z_]\w* )(? (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(? \(\? (?: = | ! | <= | \(\? [iJmnsUx-]* : (?®ex) \) )(? \(\? [iJmnsUx-]* \) )(? (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?"ed_string) ) (?&comment)? (?&qualifier)? )(? (?: [?*+] | (?&range_qualifier) ) [+?]? )(? (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (? \\Q\\E ) (? \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(? (?&start_item)* (?&branch) (?: \| (?&branch) )* )(? \( \? \| (?®ex) \) )(? \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(? \\K )(? \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(? (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(? \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(? 
[^)]+ ))^(?&delimited_regex)$/ +\= Expect no match + /((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/ + /(?:(?(2y)a|b)(X))+/ + /a(*MARK)b/ + /a(*CR)b/ + /(?P(?P=abn)(?/xx + < > + +/<(?:[a b])>/xx + < > + +/<(?xxx:[a b])>/ + < > + +/<(?-x:[a b])>/xx + < > + +/[[:digit:]-]+/ + 12-24 + +/((?<=((*ACCEPT)) )\1?\b) / +\= Expect no match + ((?<=((*ACCEPT)) )\\1?\\b)\x20 + +/((?<=((*ACCEPT))X)\1?Y)\1/ + XYYZ + +/((?<=((*ACCEPT))X)\1?Y(*ACCEPT))\1/ + XYYZ + +/(?(DEFINE)(?a?)X)^(?&optional_a)a$/ + aa + a + +/^(a?)b(?1)a/ + abaa + aba + baa + ba + +/^(a?)+b(?1)a/ + abaa + aba + baa + ba + +/^(a?)++b(?1)a/ + abaa + aba + baa + ba + +/^(a?)+b/ + b + ab + aaab + +/(?=a+)a(a+)++b/ + aab + +/(?<=\G.)/g,aftertext + abc + +/(?<=(?=.)?)/ + +/(?<=(?=.)?+)/ + +/(?<=(?=.)*)/ + +/(?<=(?=.){4,5})/ + +/(?<=(?=.){4,5}x)/ + +/a(?=.(*:X))(*SKIP:X)(*F)|(.)/ + abc + +/a(?>(*:X))(*SKIP:X)(*F)|(.)/ + abc + +/a(?:(*:X))(*SKIP:X)(*F)|(.)/ + abc + +#pattern no_start_optimize + +/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/ + abc + +/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ + abc + +#subject mark + +/a(*ACCEPT:X)b/ + abc + +/(?=a(*ACCEPT:QQ)bc)axyz/ + axyz + +/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/ + abc + +/a(*F:X)b/ + abc + +/(?(DEFINE)(a(*F:X)))(?1)b/ + abc + +/a(*COMMIT:X)b/ + abc + +/(?(DEFINE)(a(*COMMIT:X)))(?1)b/ + abc + +/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/ + aaaabd + +/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/ + aaaabd + +/a(*COMMIT:X)b/ + axabc + +#pattern -no_start_optimize +#subject -mark + +/(.COMMIT)(*COMMIT::::::::::interal error:::)/ + +/(*COMMIT:ÿÿ)/ + +/(*COMMIT:]w)/ + +/(?i)A(?^)B(?^x:C D)(?^i)e f/ + aBCDE F +\= Expect no match + aBCDEF + AbCDe f + +/(*pla:foo).{6}/ + abcfoobarxyz +\= Expect no match + abcfooba + +/(*positive_lookahead:foo).{6}/ + abcfoobarxyz + +/(?(*pla:foo).{6}|a..)/ + foobarbaz + abcfoobar + +/(?(*positive_lookahead:foo).{6}|a..)/ + foobarbaz + abcfoobar + +/(*plb:foo)bar/ + abcfoobar +\= Expect no match + abcbarfoo + +/(*positive_lookbehind:foo)bar/ + abcfoobar +\= Expect no match + abcbarfoo + 
+/(?(*plb:foo)bar|baz)/ + abcfoobar + bazfoobar + abcbazfoobar + foobazfoobar + +/(?(*positive_lookbehind:foo)bar|baz)/ + abcfoobar + bazfoobar + abcbazfoobar + foobazfoobar + +/(*nlb:foo)bar/ + abcbarfoo +\= Expect no match + abcfoobar + +/(*negative_lookbehind:foo)bar/ + abcbarfoo +\= Expect no match + abcfoobar + +/(?(*nlb:foo)bar|baz)/ + abcfoobaz + abcbarbaz +\= Expect no match + abcfoobar + +/(?(*negative_lookbehind:foo)bar|baz)/ + abcfoobaz + abcbarbaz +\= Expect no match + abcfoobar + +/(*atomic:a+)\w/ + aaab +\= Expect no match + aaaa / (? \w+ )* \. /xi pokus. @@ -5757,4 +6336,103 @@ AbcdCBefgBhiBqz /(?&word)* \. (? \w+ )/xi pokus.hokus -/-- End of testinput1 --/ +/a(?(?=(*:2)b).)/mark + abc + acb + +/a(?(?!(*:2)b).)/mark + acb + abc + +/(?:a|ab){1}+c/ +\= Expect no match + abc + +/(a|ab){1}+c/ + abc + +/(a+){1}+a/ +\= Expect no match + aaaa + +/(?(DEFINE)(a|ab))(?1){1}+c/ + abc + +/(?:a|(?=b)|.)*\z/ + abc + +/(?:a|(?=b)|.)*/ + abc + +/(?<=a(*SKIP)x)|c/ + abcd + +/(?<=a(*SKIP)x)|d/ + abcd + +/(?<=(?=.(?<=x)))/aftertext + abx + +/(?<=(?=(?<=a)))b/ + ab + +/^(?a)(?()b)((?<=b).*)$/ + abc + +/^(a\1?){4}$/ + aaaa + aaaaaa + +/^((\1+)|\d)+133X$/ + 111133X + +/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i + The quick brown fox jumps over the lazy dog. + Jackdaws love my big sphinx of quartz. + Pack my box with five dozen liquor jugs. +\= Expect no match + The quick brown fox jumps over the lazy cat. + Hackdaws love my big sphinx of quartz. + Pack my fox with five dozen liquor jugs. + +/^(?>.*?([A-Z])(?!.*\1)){26}/i + The quick brown fox jumps over the lazy dog. + Jackdaws love my big sphinx of quartz. + Pack my box with five dozen liquor jugs. +\= Expect no match + The quick brown fox jumps over the lazy cat. + Hackdaws love my big sphinx of quartz. + Pack my fox with five dozen liquor jugs. 
+ +/(?<=X(?(DEFINE)(A)))X(*F)/ +\= Expect no match + AXYZ + +/(?<=X(?(DEFINE)(A)))./ + AXYZ + +/(?<=X(?(DEFINE)(.*))Y)./ + AXYZ + +/(?<=X(?(DEFINE)(Y))(?1))./ + AXYZ + +/(?(DEFINE)(?bar))(?\x{8c}748364< + +/a{65536/ + >a{65536< + +/a\K.(?0)*/ + abac + +/(a\K.(?1)*)/ + abac + +# End of testinput1 diff --git a/src/pcre2/testdata/testinput10 b/src/pcre2/testdata/testinput10 new file mode 100644 index 00000000..53e37cbc --- /dev/null +++ b/src/pcre2/testdata/testinput10 @@ -0,0 +1,620 @@ +# This set of tests is for UTF-8 support and Unicode property support, with +# relevance only for the 8-bit library. + +# The next 5 patterns have UTF-8 errors + +/[Ã]/utf + +/Ã/utf + +/ÃÃÃxxx/utf + +/‚‚‚‚‚‚‚Ã/utf + +/‚‚‚‚‚‚‚Ã/match_invalid_utf + +# Now test subjects + +/badutf/utf +\= Expect UTF-8 errors + X\xdf + XX\xef + XXX\xef\x80 + X\xf7 + XX\xf7\x80 + XXX\xf7\x80\x80 + \xfb + \xfb\x80 + \xfb\x80\x80 + \xfb\x80\x80\x80 + \xfd + \xfd\x80 + \xfd\x80\x80 + \xfd\x80\x80\x80 + \xfd\x80\x80\x80\x80 + \xdf\x7f + \xef\x7f\x80 + \xef\x80\x7f + \xf7\x7f\x80\x80 + \xf7\x80\x7f\x80 + \xf7\x80\x80\x7f + \xfb\x7f\x80\x80\x80 + \xfb\x80\x7f\x80\x80 + \xfb\x80\x80\x7f\x80 + \xfb\x80\x80\x80\x7f + \xfd\x7f\x80\x80\x80\x80 + \xfd\x80\x7f\x80\x80\x80 + \xfd\x80\x80\x7f\x80\x80 + \xfd\x80\x80\x80\x7f\x80 + \xfd\x80\x80\x80\x80\x7f + \xed\xa0\x80 + \xc0\x8f + \xe0\x80\x8f + \xf0\x80\x80\x8f + \xf8\x80\x80\x80\x8f + \xfc\x80\x80\x80\x80\x8f + \x80 + \xfe + \xff + +/badutf/utf +\= Expect UTF-8 errors + XX\xfb\x80\x80\x80\x80 + XX\xfd\x80\x80\x80\x80\x80 + XX\xf7\xbf\xbf\xbf + +/shortutf/utf +\= Expect UTF-8 errors + XX\xdf\=ph + XX\xef\=ph + XX\xef\x80\=ph + \xf7\=ph + \xf7\x80\=ph + \xf7\x80\x80\=ph + \xfb\=ph + \xfb\x80\=ph + \xfb\x80\x80\=ph + \xfb\x80\x80\x80\=ph + \xfd\=ph + \xfd\x80\=ph + \xfd\x80\x80\=ph + \xfd\x80\x80\x80\=ph + \xfd\x80\x80\x80\x80\=ph + +/anything/utf +\= Expect UTF-8 errors + X\xc0\x80 + XX\xc1\x8f + XXX\xe0\x9f\x80 + \xf0\x8f\x80\x80 + \xf8\x87\x80\x80\x80 + 
\xfc\x83\x80\x80\x80\x80 + \xfe\x80\x80\x80\x80\x80 + \xff\x80\x80\x80\x80\x80 + \xf8\x88\x80\x80\x80 + \xf9\x87\x80\x80\x80 + \xfc\x84\x80\x80\x80\x80 + \xfd\x83\x80\x80\x80\x80 +\= Expect no match + \xc3\x8f + \xe0\xaf\x80 + \xe1\x80\x80 + \xf0\x9f\x80\x80 + \xf1\x8f\x80\x80 + \xf8\x88\x80\x80\x80\=no_utf_check + \xf9\x87\x80\x80\x80\=no_utf_check + \xfc\x84\x80\x80\x80\x80\=no_utf_check + \xfd\x83\x80\x80\x80\x80\=no_utf_check + +# Similar tests with offsets + +/badutf/utf +\= Expect UTF-8 errors + X\xdfabcd + X\xdfabcd\=offset=1 +\= Expect no match + X\xdfabcd\=offset=2 + +/(?<=x)badutf/utf +\= Expect UTF-8 errors + X\xdfabcd + X\xdfabcd\=offset=1 + X\xdfabcd\=offset=2 + X\xdfabcd\xdf\=offset=3 +\= Expect no match + X\xdfabcd\=offset=3 + +/(?<=xx)badutf/utf +\= Expect UTF-8 errors + X\xdfabcd + X\xdfabcd\=offset=1 + X\xdfabcd\=offset=2 + X\xdfabcd\=offset=3 + +/(?<=xxxx)badutf/utf +\= Expect UTF-8 errors + X\xdfabcd + X\xdfabcd\=offset=1 + X\xdfabcd\=offset=2 + X\xdfabcd\=offset=3 + X\xdfabc\xdf\=offset=6 + X\xdfabc\xdf\=offset=7 +\= Expect no match + X\xdfabcd\=offset=6 + +/\x{100}/IB,utf + +/\x{1000}/IB,utf + +/\x{10000}/IB,utf + +/\x{100000}/IB,utf + +/\x{10ffff}/IB,utf + +/[\x{ff}]/IB,utf + +/[\x{100}]/IB,utf + +/\x80/IB,utf + +/\xff/IB,utf + +/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf + \x{D55c}\x{ad6d}\x{C5B4} + +/\x{65e5}\x{672c}\x{8a9e}/IB,utf + \x{65e5}\x{672c}\x{8a9e} + +/\x{80}/IB,utf + +/\x{084}/IB,utf + +/\x{104}/IB,utf + +/\x{861}/IB,utf + +/\x{212ab}/IB,utf + +/[^ab\xC0-\xF0]/IB,utf + \x{f1} + \x{bf} + \x{100} + \x{1000} +\= Expect no match + \x{c0} + \x{f0} + +/Ä€{3,4}/IB,utf + \x{100}\x{100}\x{100}\x{100\x{100} + +/(\x{100}+|x)/IB,utf + +/(\x{100}*a|x)/IB,utf + +/(\x{100}{0,2}a|x)/IB,utf + +/(\x{100}{1,2}a|x)/IB,utf + +/\x{100}/IB,utf + +/a\x{100}\x{101}*/IB,utf + +/a\x{100}\x{101}+/IB,utf + +/[^\x{c4}]/IB + +/[\x{100}]/IB,utf + \x{100} + Z\x{100} + \x{100}Z + +/[\xff]/IB,utf + >\x{ff}< + +/[^\xff]/IB,utf + +/\x{100}abc(xyz(?1))/IB,utf + +/\777/I,utf + 
\x{1ff} + \777 + +/\x{100}+\x{200}/IB,utf + +/\x{100}+X/IB,utf + +/^[\QÄ€\E-\QÅ\E/B,utf + +# This tests the stricter UTF-8 check according to RFC 3629. + +/X/utf +\= Expect UTF-8 errors + \x{d800} + \x{da00} + \x{dfff} + \x{110000} + \x{2000000} + \x{7fffffff} +\= Expect no match + \x{d800}\=no_utf_check + \x{da00}\=no_utf_check + \x{dfff}\=no_utf_check + \x{110000}\=no_utf_check + \x{2000000}\=no_utf_check + \x{7fffffff}\=no_utf_check + +/(*UTF8)\x{1234}/ + abcd\x{1234}pqr + +/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I + +/\h/I,utf + ABC\x{09} + ABC\x{20} + ABC\x{a0} + ABC\x{1680} + ABC\x{180e} + ABC\x{2000} + ABC\x{202f} + ABC\x{205f} + ABC\x{3000} + +/\v/I,utf + ABC\x{0a} + ABC\x{0b} + ABC\x{0c} + ABC\x{0d} + ABC\x{85} + ABC\x{2028} + +/\h*A/I,utf + CDBABC + +/\v+A/I,utf + +/\s?xxx\s/I,utf + +/\sxxx\s/I,utf,tables=2 + AB\x{85}xxx\x{a0}XYZ + AB\x{a0}xxx\x{85}XYZ + +/\S \S/I,utf,tables=2 + \x{a2} \x{84} + A Z + +/a+/utf + a\x{123}aa\=offset=1 + a\x{123}aa\=offset=3 + a\x{123}aa\=offset=4 +\= Expect bad offset value + a\x{123}aa\=offset=6 +\= Expect bad UTF-8 offset + a\x{123}aa\=offset=2 +\= Expect no match + a\x{123}aa\=offset=5 + +/\x{1234}+/Ii,utf + +/\x{1234}+?/Ii,utf + +/\x{1234}++/Ii,utf + +/\x{1234}{2}/Ii,utf + +/[^\x{c4}]/IB,utf + +/X+\x{200}/IB,utf + +/\R/I,utf + +/\777/IB,utf + +/\w+\x{C4}/B,utf + a\x{C4}\x{C4} + +/\w+\x{C4}/B,utf,tables=2 + a\x{C4}\x{C4} + +/\W+\x{C4}/B,utf + !\x{C4} + +/\W+\x{C4}/B,utf,tables=2 + !\x{C4} + +/\W+\x{A1}/B,utf + !\x{A1} + +/\W+\x{A1}/B,utf,tables=2 + !\x{A1} + +/X\s+\x{A0}/B,utf + X\x20\x{A0}\x{A0} + +/X\s+\x{A0}/B,utf,tables=2 + X\x20\x{A0}\x{A0} + +/\S+\x{A0}/B,utf + X\x{A0}\x{A0} + +/\S+\x{A0}/B,utf,tables=2 + X\x{A0}\x{A0} + +/\x{a0}+\s!/B,utf + \x{a0}\x20! + +/\x{a0}+\s!/B,utf,tables=2 + \x{a0}\x20! 
+ +/A/utf + \x{ff000041} + \x{7f000041} + +/(*UTF8)abc/never_utf + +/abc/utf,never_utf + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf + +/AB\x{1fb0}/IB,utf + +/AB\x{1fb0}/IBi,utf + +/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf + \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + +/[â±¥]/Bi,utf + +/[^â±¥]/Bi,utf + +/\h/I + +/\v/I + +/\R/I + +/[[:blank:]]/B,ucp + +/\x{212a}+/Ii,utf + KKkk\x{212a} + +/s+/Ii,utf + SSss\x{17f} + +/\x{100}*A/IB,utf + A + +/\x{100}*\d(?R)/IB,utf + +/[Z\x{100}]/IB,utf + Z\x{100} + \x{100} + \x{100}Z + +/[z-\x{100}]/IB,utf + +/[z\Qa-d]Ä€\E]/IB,utf + \x{100} + Ä€ + +/[ab\x{100}]abc(xyz(?1))/IB,utf + +/\x{100}*\s/IB,utf + +/\x{100}*\d/IB,utf + +/\x{100}*\w/IB,utf + +/\x{100}*\D/IB,utf + +/\x{100}*\S/IB,utf + +/\x{100}*\W/IB,utf + +/[\x{105}-\x{109}]/IBi,utf + \x{104} + \x{105} + \x{109} +\= Expect no match + \x{100} + \x{10a} + +/[z-\x{100}]/IBi,utf + Z + z + \x{39c} + \x{178} + | + \x{80} + \x{ff} + \x{100} + \x{101} +\= Expect no match + \x{102} + Y + y + +/[z-\x{100}]/IBi,utf + +/\x{3a3}B/IBi,utf + +/abc/utf,replace=à + abc + +/(?<=(a)(?-1))x/I,utf + a\x80zx\=offset=3 + +/[\W\p{Any}]/B + abc + 123 + +/[\W\pL]/B + abc +\= Expect no match + 123 + +/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':Æ¿)/utf + +/[\s[:^ascii:]]/B,ucp + +# A special extra option allows excaped surrogate code points in 8-bit mode, +# but subjects containing them must not be UTF-checked. 
+ +/\x{d800}/I,utf,allow_surrogate_escapes + \x{d800}\=no_utf_check + +/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes + \x{dfff}\x{df01}\=no_utf_check + +# This has different starting code units in 8-bit mode. + +/^[^ab]/IB,utf + c + \x{ff} + \x{100} +\= Expect no match + aaa + +# Offsets are different in 8-bit mode. + +/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout + 123abcáyzabcdef789abcሴqr + +# Check name length with non-ASCII characters + +/(?'ABáC678901234567890123456789012'...)/utf + +/(?'ABáC6789012345678901234567890123'...)/utf + +/(?'ABZC6789012345678901234567890123'...)/utf + +/(?(n/utf + +/(?(á/utf + +# Invalid UTF-8 tests + +/.../g,match_invalid_utf + abcd\x80wxzy\x80pqrs + abcd\x{80}wxzy\x80pqrs + +/abc/match_invalid_utf + ab\x80ab\=ph +\= Expect no match + ab\x80cdef\=ph + +/ab$/match_invalid_utf + ab\x80cdeab +\= Expect no match + ab\x80cde + +/.../g,match_invalid_utf + abcd\x{80}wxzy\x80pqrs + +/(?<=x)../g,match_invalid_utf + abcd\x{80}wxzy\x80pqrs + abcd\x{80}wxzy\x80xpqrs + +/X$/match_invalid_utf +\= Expect no match + X\xc4 + +/(?<=..)X/match_invalid_utf,aftertext + AB\x80AQXYZ + AB\x80AQXYZ\=offset=5 + AB\x80\x80AXYZXC\=offset=5 +\= Expect no match + AB\x80XYZ + AB\x80XYZ\=offset=3 + AB\xfeXYZ + AB\xffXYZ\=offset=3 + AB\x80AXYZ + AB\x80AXYZ\=offset=4 + AB\x80\x80AXYZ\=offset=5 + +/.../match_invalid_utf + AB\xc4CCC +\= Expect no match + A\x{d800}B + A\x{110000}B + A\xc4B + +/\bX/match_invalid_utf + A\x80X + +/\BX/match_invalid_utf +\= Expect no match + A\x80X + +/(?<=...)X/match_invalid_utf + AAA\x80BBBXYZ +\= Expect no match + AAA\x80BXYZ + AAA\x80BBXYZ + +# ------------------------------------- + +/(*UTF)(?=\x{123})/I + +/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf + +/[󿾟,]/BI,utf + +/[\x{fff4}-\x{ffff8}]/I,utf + +/[\x{fff4}-\x{afff8}\x{10ffff}]/I,utf + +/[\xff\x{ffff}]/I,utf + +/[\xff\x{ff}]/I,utf + abc\x{ff}def + +/[\xff\x{ff}]/I + abc\x{ff}def + +/[Ss]/I + +/[Ss]/I,utf + +/(?:\x{ff}|\x{3000})/I,utf + +/x/utf + abxyz + \x80\=startchar 
+ abc\x80\=startchar + abc\x80\=startchar,offset=3 + +/\x{c1}+\x{e1}/iIB,ucp + \x{c1}\x{c1}\x{c1} + \x{e1}\x{e1}\x{e1} + +/a|\x{c1}/iI,ucp + \x{e1}xxx + +/a|\x{c1}/iI,utf + \x{e1}xxx + +/\x{c1}|\x{e1}/iI,ucp + +/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended + X\x{e1}Y + +/X(\x{e1})Y/i,ucp,replace=>\L$1<,substitute_extended + X\x{c1}Y + +# Without UTF or UCP characters > 127 have only one case in the default locale. + +/X(\x{e1})Y/replace=>\U$1<,substitute_extended + X\x{e1}Y + +/A/utf,match_invalid_utf,caseless + \xe5A + +/\bch\b/utf,match_invalid_utf + qchq\=ph + qchq\=ps + +# End of testinput10 diff --git a/src/pcre2/testdata/testinput11 b/src/pcre2/testdata/testinput11 new file mode 100644 index 00000000..2bc8a25e --- /dev/null +++ b/src/pcre2/testdata/testinput11 @@ -0,0 +1,374 @@ +# This set of tests is for the 16-bit and 32-bit libraries' basic (non-UTF) +# features that are not compatible with the 8-bit library, or which give +# different output in 16-bit or 32-bit mode. The output for the two widths is +# different, so they have separate output files. + +#forbid_utf +#newline_default LF ANY ANYCRLF + +/[^\x{c4}]/IB + +/\x{100}/I + +/ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # optional leading comment +(?: (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # initial word +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. 
(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) )* # further okay, if led by a period +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +# address +| # or +(?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... 
+[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # one word, optionally followed by.... +(?: +[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or... +\( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) | # comments, or... + +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +# quoted strings +)* +< (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # leading < +(?: @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* + +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* , (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +)* # further okay, if led by comma +: # closing colon +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* )? # optional route +(?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... 
+[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # initial word +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) )* # further okay, if led by a period +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +# address spec +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* > # trailing > +# name and address +) (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # optional trailing comment +/Ix + +/[\h]/B + >\x09< + +/[\h]+/B + >\x09\x20\xa0< + +/[\v]/B + +/[^\h]/B + +/\h+/I + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\xa0\x{2000} + +/[\h\x{dc00}]+/IB + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\xa0\x{2000} + +/\H+/I + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} + +/[\H\x{d800}]+/ + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} + +/\v+/I + \x{2027}\x{2030}\x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + +/[\v\x{dc00}]+/IB + \x{2027}\x{2030}\x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + +/\V+/I + \x{2028}\x{2029}\x{2027}\x{2030} + \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 + +/[\V\x{d800}]+/ + \x{2028}\x{2029}\x{2027}\x{2030} + \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 + +/\R+/I,bsr=unicode + \x{2027}\x{2030}\x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + +/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I + \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} + +/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/B + +/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/Bi + +/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/B + 
+/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/Bi + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark + XX + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark + XX + +/\u0100/B,alt_bsux,allow_empty_class,match_unset_backref + +/[\u0100-\u0200]/B,alt_bsux,allow_empty_class,match_unset_backref + +/\ud800/B,alt_bsux,allow_empty_class,match_unset_backref + +/^\x{ffff}+/i + \x{ffff} + +/^\x{ffff}?/i + \x{ffff} + +/^\x{ffff}*/i + \x{ffff} + +/^\x{ffff}{3}/i + \x{ffff}\x{ffff}\x{ffff} + +/^\x{ffff}{0,3}/i + \x{ffff} + +/[^\x00-a]{12,}[^b-\xff]*/B + +/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B + +/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/B + +/^[\x{1234}\x{4321}]{2,4}?/ + \x{1234}\x{1234}\x{1234} + +# Check maximum non-UTF character size for the 16-bit library. + +/\x{ffff}/ + A\x{ffff}B + +/\x{10000}/ + +/\o{20000}/ + +# Check maximum character size for the 32-bit library. These will all give +# errors in the 16-bit library. 
+ +/\x{110000}/ + +/\x{7fffffff}/ + +/\x{80000000}/ + +/\x{ffffffff}/ + +/\x{100000000}/ + +/\o{17777777777}/ + +/\o{20000000000}/ + +/\o{37777777777}/ + +/\o{40000000000}/ + +/\x{7fffffff}\x{7fffffff}/I + +/\x{80000000}\x{80000000}/I + +/\x{ffffffff}\x{ffffffff}/I + +# Non-UTF characters + +/.{2,3}/ + \x{400000}\x{400001}\x{400002}\x{400003} + +/\x{400000}\x{800000}/IBi + +# Check character ranges + +/[\H]/IB + +/[\V]/IB + +/(*THEN:\[A]{65501})/expand + +# We can use pcre2test's utf8_input modifier to create wide pattern characters, +# even though this test is run when UTF is not supported. + +/abý¿¿¿¿¿z/utf8_input + abý¿¿¿¿¿z + ab\x{7fffffff}z + +/abÿý¿¿¿¿¿z/utf8_input + abÿý¿¿¿¿¿z + ab\x{ffffffff}z + +/abÿAz/utf8_input + abÿAz + ab\x{80000041}z + +/(?i:A{1,}\6666666666)/ + A\x{1b6}6666666 + +# End of testinput11 diff --git a/src/pcre2/testdata/testinput12 b/src/pcre2/testdata/testinput12 new file mode 100644 index 00000000..9b4f8d34 --- /dev/null +++ b/src/pcre2/testdata/testinput12 @@ -0,0 +1,549 @@ +# This set of tests is for UTF-16 and UTF-32 support, including Unicode +# properties. It is relevant only to the 16-bit and 32-bit libraries. The +# output is different for each library, so there are separate output files. 
+ +/ÃÃÃxxx/IB,utf,no_utf_check + +/abc/utf + Ã] + +# Check maximum character size + +/\x{ffff}/IB,utf + +/\x{10000}/IB,utf + +/\x{100}/IB,utf + +/\x{1000}/IB,utf + +/\x{10000}/IB,utf + +/\x{100000}/IB,utf + +/\x{10ffff}/IB,utf + +/[\x{ff}]/IB,utf + +/[\x{100}]/IB,utf + +/\x80/IB,utf + +/\xff/IB,utf + +/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf + \x{D55c}\x{ad6d}\x{C5B4} + +/\x{65e5}\x{672c}\x{8a9e}/IB,utf + \x{65e5}\x{672c}\x{8a9e} + +/\x{80}/IB,utf + +/\x{084}/IB,utf + +/\x{104}/IB,utf + +/\x{861}/IB,utf + +/\x{212ab}/IB,utf + +/[^ab\xC0-\xF0]/IB,utf + \x{f1} + \x{bf} + \x{100} + \x{1000} +\= Expect no match + \x{c0} + \x{f0} + +/Ä€{3,4}/IB,utf + \x{100}\x{100}\x{100}\x{100\x{100} + +/(\x{100}+|x)/IB,utf + +/(\x{100}*a|x)/IB,utf + +/(\x{100}{0,2}a|x)/IB,utf + +/(\x{100}{1,2}a|x)/IB,utf + +/\x{100}/IB,utf + +/a\x{100}\x{101}*/IB,utf + +/a\x{100}\x{101}+/IB,utf + +/[^\x{c4}]/IB + +/[\x{100}]/IB,utf + \x{100} + Z\x{100} + \x{100}Z + +/[\xff]/IB,utf + >\x{ff}< + +/[^\xff]/IB,utf + +/\x{100}abc(xyz(?1))/IB,utf + +/\777/I,utf + \x{1ff} + \777 + +/\x{100}+\x{200}/IB,utf + +/\x{100}+X/IB,utf + +/^[\QÄ€\E-\QÅ\E/B,utf + +/X/utf + XX\x{d800}\=no_utf_check + XX\x{da00}\=no_utf_check + XX\x{dc00}\=no_utf_check + XX\x{de00}\=no_utf_check + XX\x{dfff}\=no_utf_check +\= Expect UTF error + XX\x{d800} + XX\x{da00} + XX\x{dc00} + XX\x{de00} + XX\x{dfff} + XX\x{110000} + XX\x{d800}\x{1234} +\= Expect no match + XX\x{d800}\=offset=3 + +/(?<=.)X/utf + XX\x{d800}\=offset=3 + +/(*UTF16)\x{11234}/ + abcd\x{11234}pqr + +/(*UTF)\x{11234}/I + abcd\x{11234}pqr + +/(*UTF-32)\x{11234}/ + abcd\x{11234}pqr + +/(*UTF-32)\x{112}/ + abcd\x{11234}pqr + +/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I + +/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I + +/\h/I,utf + ABC\x{09} + ABC\x{20} + ABC\x{a0} + ABC\x{1680} + ABC\x{180e} + ABC\x{2000} + ABC\x{202f} + ABC\x{205f} + ABC\x{3000} + +/\v/I,utf + ABC\x{0a} + ABC\x{0b} + ABC\x{0c} + ABC\x{0d} + ABC\x{85} + ABC\x{2028} + +/\h*A/I,utf + CDBABC + \x{2000}ABC + +/\R*A/I,bsr=unicode,utf + 
CDBABC + \x{2028}A + +/\v+A/I,utf + +/\s?xxx\s/I,utf + +/\sxxx\s/I,utf,tables=2 + AB\x{85}xxx\x{a0}XYZ + AB\x{a0}xxx\x{85}XYZ + +/\S \S/I,utf,tables=2 + \x{a2} \x{84} + A Z + +/a+/utf + a\x{123}aa\=offset=1 + a\x{123}aa\=offset=2 + a\x{123}aa\=offset=3 +\= Expect no match + a\x{123}aa\=offset=4 +\= Expect bad offset error + a\x{123}aa\=offset=5 + a\x{123}aa\=offset=6 + +/\x{1234}+/Ii,utf + +/\x{1234}+?/Ii,utf + +/\x{1234}++/Ii,utf + +/\x{1234}{2}/Ii,utf + +/[^\x{c4}]/IB,utf + +/X+\x{200}/IB,utf + +/\R/I,utf + +# Check bad offset + +/a/utf +\= Expect bad UTF-16 offset, or no match in 32-bit + \x{10000}\=offset=1 + \x{10000}ab\=offset=1 +\= Expect 16-bit match, 32-bit no match + \x{10000}ab\=offset=2 +\= Expect no match + \x{10000}ab\=offset=3 +\= Expect no match in 16-bit, bad offset in 32-bit + \x{10000}ab\=offset=4 +\= Expect bad offset + \x{10000}ab\=offset=5 + +/í¼€/utf + +/\w+\x{C4}/B,utf + a\x{C4}\x{C4} + +/\w+\x{C4}/B,utf,tables=2 + a\x{C4}\x{C4} + +/\W+\x{C4}/B,utf + !\x{C4} + +/\W+\x{C4}/B,utf,tables=2 + !\x{C4} + +/\W+\x{A1}/B,utf + !\x{A1} + +/\W+\x{A1}/B,utf,tables=2 + !\x{A1} + +/X\s+\x{A0}/B,utf + X\x20\x{A0}\x{A0} + +/X\s+\x{A0}/B,utf,tables=2 + X\x20\x{A0}\x{A0} + +/\S+\x{A0}/B,utf + X\x{A0}\x{A0} + +/\S+\x{A0}/B,utf,tables=2 + X\x{A0}\x{A0} + +/\x{a0}+\s!/B,utf + \x{a0}\x20! + +/\x{a0}+\s!/B,utf,tables=2 + \x{a0}\x20! 
+ +/(*UTF)abc/never_utf + +/abc/utf,never_utf + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf + +/AB\x{1fb0}/IB,utf + +/AB\x{1fb0}/IBi,utf + +/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf + \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + +/[â±¥]/Bi,utf + +/[^â±¥]/Bi,utf + +/[[:blank:]]/B,ucp + +/\x{212a}+/Ii,utf + KKkk\x{212a} + +/s+/Ii,utf + SSss\x{17f} + +# Non-UTF characters should give errors in both 16-bit and 32-bit modes. + +/\x{110000}/utf + +/\o{4200000}/utf + +/\x{100}*A/IB,utf + A + +/\x{100}*\d(?R)/IB,utf + +/[Z\x{100}]/IB,utf + Z\x{100} + \x{100} + \x{100}Z + +/[z-\x{100}]/IB,utf + +/[z\Qa-d]Ä€\E]/IB,utf + \x{100} + Ä€ + +/[ab\x{100}]abc(xyz(?1))/IB,utf + +/\x{100}*\s/IB,utf + +/\x{100}*\d/IB,utf + +/\x{100}*\w/IB,utf + +/\x{100}*\D/IB,utf + +/\x{100}*\S/IB,utf + +/\x{100}*\W/IB,utf + +/[\x{105}-\x{109}]/IBi,utf + \x{104} + \x{105} + \x{109} +\= Expect no match + \x{100} + \x{10a} + +/[z-\x{100}]/IBi,utf + Z + z + \x{39c} + \x{178} + | + \x{80} + \x{ff} + \x{100} + \x{101} +\= Expect no match + \x{102} + Y + y + +/[z-\x{100}]/IBi,utf + +/\x{3a3}B/IBi,utf + +/./utf + \x{110000} + +/(*UTF)abý¿¿¿¿¿z/B + +/abý¿¿¿¿¿z/utf + +/[\W\p{Any}]/B + abc + 123 + +/[\W\pL]/B + abc + \x{100} + \x{308} +\= Expect no match + 123 + +/[\s[:^ascii:]]/B,ucp + +/\pP/ucp + \x{7fffffff} + +# A special extra option allows excaped surrogate code points in 32-bit mode, +# but subjects containing them must not be UTF-checked. These patterns give +# errors in 16-bit mode. 
+ +/\x{d800}/I,utf,allow_surrogate_escapes + \x{d800}\=no_utf_check + +/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes + \x{dfff}\x{df01}\=no_utf_check + +# This has different starting code units in 8-bit mode. + +/^[^ab]/IB,utf + c + \x{ff} + \x{100} +\= Expect no match + aaa + +# Offsets are different in 8-bit mode. + +/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout + 123abcáyzabcdef789abcሴqr + +# A few script run tests in non-UTF mode (but they need Unicode support) + +/^(*script_run:.{4})/ + \x{3041}\x{30a1}\x{3007}\x{3007} Hiragana Katakana Han Han + \x{30a1}\x{3041}\x{3007}\x{3007} Katakana Hiragana Han Han + \x{1100}\x{2e80}\x{2e80}\x{1101} Hangul Han Han Hangul + +/^(*sr:.*)/utf,allow_surrogate_escapes + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + \x{d800}\x{dfff} Surrogates (Unknown) \=no_utf_check + +/(?(n/utf + +/(?(á/utf + +# Invalid UTF-16/32 tests. + +/.../g,match_invalid_utf + abcd\x{df00}wxzy\x{df00}pqrs + abcd\x{80}wxzy\x{df00}pqrs + +/abc/match_invalid_utf + ab\x{df00}ab\=ph +\= Expect no match + ab\x{df00}cdef\=ph + +/ab$/match_invalid_utf + ab\x{df00}cdeab +\= Expect no match + ab\x{df00}cde + +/.../g,match_invalid_utf + abcd\x{80}wxzy\x{df00}pqrs + +/(?<=x)../g,match_invalid_utf + abcd\x{80}wxzy\x{df00}pqrs + abcd\x{80}wxzy\x{df00}xpqrs + +/X$/match_invalid_utf +\= Expect no match + X\x{df00} + +/(?<=..)X/match_invalid_utf,aftertext + AB\x{df00}AQXYZ + AB\x{df00}AQXYZ\=offset=5 + AB\x{df00}\x{df00}AXYZXC\=offset=5 +\= Expect no match + AB\x{df00}XYZ + AB\x{df00}XYZ\=offset=3 + AB\x{df00}AXYZ + AB\x{df00}AXYZ\=offset=4 + AB\x{df00}\x{df00}AXYZ\=offset=5 + +/.../match_invalid_utf +\= Expect no match + A\x{d800}B + A\x{110000}B + +/aa/utf,ucp,match_invalid_utf,global + aa\x{d800}aa + +/aa/utf,ucp,match_invalid_utf,global + \x{d800}aa + +# ---------------------------------------------------- + +/(*UTF)(?=\x{123})/I + +/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf + +/[\xff\x{ffff}]/I,utf + +/[\xff\x{ff}]/I,utf + 
+/[\xff\x{ff}]/I + +/[Ss]/I + +/[Ss]/I,utf + +/(?:\x{ff}|\x{3000})/I,utf + +# ---------------------------------------------------- +# UCP and casing tests + +/\x{120}/i,I + +/\x{c1}/i,I,ucp + +/[\x{120}\x{121}]/iB,ucp + +/[ab\x{120}]+/iB,ucp + aABb\x{121}\x{120} + +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} + +/\x{120}\x{c1}/i,ucp,no_start_optimize + \x{121}\x{e1} + +/\x{120}\x{c1}/i,ucp + \x{121}\x{e1} + +/[^\x{120}]/i,no_start_optimize + \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +\= Expect no match + \x{121} + +/[^\x{120}]/i + \x{121} + +/[^\x{120}]/i,ucp +\= Expect no match + \x{121} + +/\x{120}{2}/i,ucp + \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +\= Expect no match + \x{121}\x{121} + +/\x{c1}+\x{e1}/iB,ucp + \x{c1}\x{c1}\x{c1} + +/\x{c1}+\x{e1}/iIB,ucp + \x{c1}\x{c1}\x{c1} + \x{e1}\x{e1}\x{e1} + +/a|\x{c1}/iI,ucp + \x{e1}xxx + +/\x{c1}|\x{e1}/iI,ucp + +/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended + X\x{e1}Y + +/X(\x{121})Y/ucp,replace=>\U$1<,substitute_extended + X\x{121}Y + +/s/i,ucp + \x{17f} + +/s/i,utf + \x{17f} + +/[^s]/i,ucp +\= Expect no match + \x{17f} + +/[^s]/i,utf +\= Expect no match + \x{17f} + +# ---------------------------------------------------- + +# End of testinput12 diff --git a/src/pcre/testdata/testinput20 b/src/pcre2/testdata/testinput13 similarity index 51% rename from src/pcre/testdata/testinput20 rename to src/pcre2/testdata/testinput13 index 2a6b8f23..93ac25f7 100644 --- a/src/pcre/testdata/testinput20 +++ b/src/pcre2/testdata/testinput13 @@ -1,5 +1,8 @@ -/-- These DFA tests are for the handling of characters greater than 255 in - 16- or 32-bit, non-UTF mode. --/ +# These DFA tests are for the handling of characters greater than 255 in +# 16-bit or 32-bit, non-UTF mode. 
+ +#forbid_utf +#subject dfa /^\x{ffff}+/i \x{ffff} @@ -16,4 +19,4 @@ /^\x{ffff}{0,3}/i \x{ffff} -/-- End of testinput20 --/ +# End of testinput13 diff --git a/src/pcre2/testdata/testinput14 b/src/pcre2/testdata/testinput14 new file mode 100644 index 00000000..8a17ae73 --- /dev/null +++ b/src/pcre2/testdata/testinput14 @@ -0,0 +1,81 @@ +# These test special UTF and UCP features of DFA matching. The output is +# different for the different widths. + +#subject dfa + +# ---------------------------------------------------- +# These are a selection of the more comprehensive tests that are run for +# non-DFA matching. + +/X/utf + XX\x{d800} + XX\x{d800}\=offset=3 + XX\x{d800}\=no_utf_check + XX\x{da00} + XX\x{da00}\=no_utf_check + XX\x{dc00} + XX\x{dc00}\=no_utf_check + XX\x{de00} + XX\x{de00}\=no_utf_check + XX\x{dfff} + XX\x{dfff}\=no_utf_check + XX\x{110000} + XX\x{d800}\x{1234} + +/badutf/utf + X\xdf + XX\xef + XXX\xef\x80 + X\xf7 + XX\xf7\x80 + XXX\xf7\x80\x80 + +/shortutf/utf + XX\xdf\=ph + XX\xef\=ph + XX\xef\x80\=ph + \xf7\=ph + \xf7\x80\=ph + +# ---------------------------------------------------- +# UCP and casing tests - except for the first two, these will all fail in 8-bit +# mode because they are testing UCP without UTF and use characters > 255. 
+ +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} + +/\x{c1}+\x{e1}/iB,ucp + \x{c1}\x{c1}\x{c1} + \x{e1}\x{e1}\x{e1} + +/\x{120}\x{c1}/i,ucp,no_start_optimize + \x{121}\x{e1} + +/\x{120}\x{c1}/i,ucp + \x{121}\x{e1} + +/[^\x{120}]/i,no_start_optimize + \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +\= Expect no match + \x{121} + +/[^\x{120}]/i + \x{121} + +/[^\x{120}]/i,ucp +\= Expect no match + \x{121} + +/\x{120}{2}/i,ucp + \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +\= Expect no match + \x{121}\x{121} + +# ---------------------------------------------------- + +# End of testinput14 diff --git a/src/pcre2/testdata/testinput15 b/src/pcre2/testdata/testinput15 new file mode 100644 index 00000000..5dd68979 --- /dev/null +++ b/src/pcre2/testdata/testinput15 @@ -0,0 +1,238 @@ +# These are: +# +# (1) Tests of the match-limiting features. The results are different for +# interpretive or JIT matching, so this test should not be run with JIT. The +# same tests are run using JIT in test 17. + +# (2) Other tests that must not be run with JIT. 
+ +/(a+)*zz/I + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits + aaaaaaaaaaaaaz\=find_limits + +!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I + /* this is a C style comment */\=find_limits + +/^(?>a)++/ + aa\=find_limits + aaaaaaaaa\=find_limits + +/(a)(?1)++/ + aa\=find_limits + aaaaaaaaa\=find_limits + +/a(?:.)*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits + +/a(?:.(*THEN))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits + +/a(?:.(*THEN:ABC))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits + +/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/ + aabbccddee\=find_limits + +/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/ + aabbccddee\=find_limits + +/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/ + aabbccddee\=find_limits + +/(*LIMIT_MATCH=12bc)abc/ + +/(*LIMIT_MATCH=4294967290)abc/ + +/(*LIMIT_DEPTH=4294967280)abc/I + +/(a+)*zz/ +\= Expect no match + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 + +/(a+)*zz/ +\= Expect limit exceeded + aaaaaaaaaaaaaz\=depth_limit=10 + +/(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=60000 + +/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded + aaaaaaaaaaaaaz + +/(*LIMIT_MATCH=60000)(a+)*zz/I +\= Expect no match + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 + +/(*LIMIT_DEPTH=10)(a+)*zz/I +\= Expect limit exceeded + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=depth_limit=1000 + +/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I +\= Expect no match + aaaaaaaaaaaaaz + +/(*LIMIT_DEPTH=1000)(a+)*zz/I +\= Expect no match + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=depth_limit=10 + +# These three have infinitely nested recursions. + +/((?2))((?1))/ + abc + +/((?(R2)a+|(?1)b))()/ + aaaabcde + +/(?(R)a*(?1)|((?R))b)/ + aaaabcde + +# The allusedtext modifier does not work with JIT, which does not maintain +# the leftchar/rightchar data. 
+ +/abc(?=xyz)/allusedtext + abcxyzpqr + abcxyzpqr\=aftertext + +/(?<=pqr)abc(?=xyz)/allusedtext + xyzpqrabcxyzpqr + xyzpqrabcxyzpqr\=aftertext + +/a\b/ + a.\=allusedtext + a\=allusedtext + +/abc\Kxyz/ + abcxyz\=allusedtext + +/abc(?=xyz(*ACCEPT))/ + abcxyz\=allusedtext + +/abc(?=abcde)(?=ab)/allusedtext + abcabcdefg + +#subject allusedtext + +/(?<=abc)123/ + xyzabc123pqr + xyzabc12\=ps + xyzabc12\=ph + +/\babc\b/ + +++abc+++ + +++ab\=ps + +++ab\=ph + +/(?<=abc)def/ + abc\=ph + +/(?<=123)(*MARK:xx)abc/mark + xxxx123a\=ph + xxxx123a\=ps + +/(?<=(?<=a)b)c.*/I + abc\=ph +\= Expect no match + xbc\=ph + +/(?<=ab)c.*/I + abc\=ph +\= Expect no match + xbc\=ph + +/abc(?<=bc)def/ + xxxabcd\=ph + +/(?<=ab)cdef/ + xxabcd\=ph + +/(?<=(?<=(?<=a)b)c)./I + 123abcXYZ + +/(?<=ab(cd(?<=...)))./I + abcdX + +/(?<=ab((?<=...)cd))./I + ZabcdX + +/(?<=((?<=(?<=ab).))(?1)(?1))./I + abxZ + +#subject +# ------------------------------------------------------------------- + +# These tests provoke recursion loops, which give a different error message +# when JIT is used. 
+ +/(?R)/I + abcd + +/(a|(?R))/I + abcd + defg + +/(ab|(bc|(de|(?R))))/I + abcd + fghi + +/(ab|(bc|(de|(?1))))/I + abcd + fghi + +/x(ab|(bc|(de|(?1)x)x)x)/I + xab123 + xfghi + +/(?!\w)(?R)/ + abcd + =abc + +/(?=\w)(?R)/ + =abc + abcd + +/(?testsavedregex -Capturing subpattern count = 0 -No options -First char = 'a' -Need char = 'c' -Subject length lower bound = 3 -No starting char list -JIT study was successful -Compiled pattern written to testsavedregex -Study data written to testsavedregex - -a)++/ + aa\=find_limits + aaaaaaaaa\=find_limits + +/(a)(?1)++/ + aa\=find_limits + aaaaaaaaa\=find_limits + +/a(?:.)*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits + +/a(?:.(*THEN))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits + +/a(?:.(*THEN:ABC))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits + +/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/ + aabbccddee\=find_limits + +/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/ + aabbccddee\=find_limits + +/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/ + aabbccddee\=find_limits -/-- Test pattern compilation --/ +/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/jitfast + aabbccddee\=find_limits + aabbccddee\=jitstack=1 -/(?:a|b|c|d|e)(?R)/S++ +/(a+)*zz/ +\= Expect no match + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 -/(?:a|b|c|d|e)(?R)(?R)/S++ +/(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=60000 -/(a(?:a|b|c|d|e)b){8,16}/S++ +/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I +\= Expect limit exceeded + aaaaaaaaaaaaaz -/(?:|a|){100}x/S++ +/(*LIMIT_MATCH=60000)(a+)*zz/I +\= Expect no match + aaaaaaaaaaaaaz +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 -/(x(?1)){4}/S++ +# These three have infinitely nested recursions. 
+ +/((?2))((?1))/ +\= Expect JIT stack limit reached + abc + +/((?(R2)a+|(?1)b))()/ +\= Expect JIT stack limit reached + aaaabcde + +/(?(R)a*(?1)|((?R))b)/ +\= Expect JIT stack limit reached + aaaabcde + +# Invalid options disable JIT when called via pcre2_match(), causing the +# match to happen via the interpreter, but for fast JIT invalid options are +# ignored, so an unanchored match happens. + +/abcd/ + abcd\=anchored +\= Expect no match + fail abcd\=anchored + +/abcd/jitfast + abcd\=anchored + succeed abcd\=anchored + +# Push/pop does not lose the JIT information, though jitverify applies only to +# compilation, but serializing (save/load) discards JIT data completely. + +/^abc\Kdef/info,push +#pop jitverify + abcdef + +/^abc\Kdef/info,push +#save testsaved1 +#load testsaved1 +#pop jitverify + abcdef + +#load testsaved1 +#pop jit,jitverify + abcdef + +/abcd/pushcopy,jitverify + abcd + +#pop jitverify + abcd + +# Test pattern compilation + +/(?:a|b|c|d|e)(?R)/jit=1 + +/(?:a|b|c|d|e)(?R)(?R)/jit=1 + +/(a(?:a|b|c|d|e)b){8,16}/jit=1 + +/(?:|a|){100}x/jit=1 + +# These tests provoke recursion loops, which give a different error message +# when JIT is used. 
+ +/(?R)/I + abcd + +/(a|(?R))/I + abcd + defg + +/(ab|(bc|(de|(?R))))/I + abcd + fghi + +/(ab|(bc|(de|(?1))))/I + abcd + fghi + +/x(ab|(bc|(de|(?1)x)x)x)/I + xab123 + xfghi + +/(?!\w)(?R)/ + abcd + =abc + +/(?=\w)(?R)/ + =abc + abcd + +/(?b)c/posix_nosub + abc + +/(a)\1/posix_nosub + zaay + +/a?|b?/ + abc +\= Expect no match + ddd\=notempty + +/\w+A/ + CDAAAAB + +/\w+A/ungreedy + CDAAAAB + +/\Biss\B/I,aftertext + Mississippi + +/abc/\ + +"(?(?C)" + +"(?(?C))" + +/abcd/substitute_extended + +/\[A]{1000000}**/expand,regerror_buffsize=31 + +/\[A]{1000000}**/expand,regerror_buffsize=32 + +//posix_nosub + \=offset=70000 + +/(?=(a\K))/ + a + +/^d(e)$/posix + acdef\=posix_startend=2:4 + acde\=posix_startend=2 +\= Expect no match + acdef + acdef\=posix_startend=2 + +/^a\x{00}b$/posix + a\x{00}b\=posix_startend=0:3 + +/"A" 00 "B"/hex + A\x{00}B\=posix_startend=0:3 + +/ABC/use_length + ABC + +/a\b(c/literal,posix + a\\b(c + +/a\b(c/literal,posix,dotall + +/((a)(b)?(c))/posix + 123ace + 123ace\=posix_startend=2:6 + +# End of testdata/testinput18 diff --git a/src/pcre2/testdata/testinput19 b/src/pcre2/testdata/testinput19 new file mode 100644 index 00000000..3bf1720b --- /dev/null +++ b/src/pcre2/testdata/testinput19 @@ -0,0 +1,21 @@ +# This set of tests is run only with the 8-bit library. It tests the POSIX +# interface with UTF/UCP support, which is supported only with the 8-bit +# library. This test should not be run with JIT (which is not available for the +# POSIX interface). + +#pattern posix + +/a\x{1234}b/utf + a\x{1234}b + +/\w/ +\= Expect no match + +++\x{c2} + +/\w/ucp + +++\x{c2} + +/"^AB" 00 "\x{1234}$"/hex,utf + AB\x{00}\x{1234}\=posix_startend=0:6 + +# End of testdata/testinput19 diff --git a/src/pcre2/testdata/testinput2 b/src/pcre2/testdata/testinput2 new file mode 100644 index 00000000..865c903a --- /dev/null +++ b/src/pcre2/testdata/testinput2 @@ -0,0 +1,5896 @@ +# This set of tests is not Perl-compatible. 
It checks on special features +# of PCRE2's API, error diagnostics, and the compiled code of some patterns. +# It also checks the non-Perl syntax that PCRE2 supports (Python, .NET, +# Oniguruma). There are also some tests where PCRE2 and Perl differ, +# either because PCRE2 can't be compatible, or there is a possible Perl +# bug. + +# NOTE: This is a non-UTF set of tests. When UTF support is needed, use +# test 5. + +#forbid_utf +#newline_default lf any anycrlf + +# Test binary zeroes in the pattern + +# /a\0B/ where 0 is a binary zero +/61 5c 00 62/B,hex + a\x{0}b + +# /a0b/ where 0 is a binary zero +/61 00 62/B,hex + a\x{0}b + +# /(?#B0C)DE/ where 0 is a binary zero +/28 3f 23 42 00 43 29 44 45/B,hex + DE + +/(a)b|/I + +/abc/I + abc + defabc + abc\=anchored +\= Expect no match + defabc\=anchored + ABC + +/^abc/I + abc + abc\=anchored +\= Expect no match + defabc + defabc\=anchored + +/a+bc/I + +/a*bc/I + +/a{3}bc/I + +/(abc|a+z)/I + +/^abc$/I + abc +\= Expect no match + def\nabc + +/ab\idef/ + +/(?X)ab\idef/ + +/x{5,4}/ + +/z{65536}/ + +/[abcd/ + +/[\B]/B + +/[\R]/B + +/[\X]/B + +/[z-a]/ + +/^*/ + +/(abc/ + +/(?# abc/ + +/(?z)abc/ + +/.*b/I + +/.*?b/I + +/cat|dog|elephant/I + this sentence eventually mentions a cat + this sentences rambles on and on for a while and then reaches elephant + +/cat|dog|elephant/I + this sentence eventually mentions a cat + this sentences rambles on and on for a while and then reaches elephant + +/cat|dog|elephant/Ii + this sentence eventually mentions a CAT cat + this sentences rambles on and on for a while to elephant ElePhant + +/a|[bcd]/I + +/(a|[^\dZ])/I + +/(a|b)*[\s]/I + +/(ab\2)/ + +/{4,5}abc/ + +/(a)(b)(c)\2/I + abcb + abcb\=ovector=0 + abcb\=ovector=1 + abcb\=ovector=2 + abcb\=ovector=3 + abcb\=ovector=4 + +/(a)bc|(a)(b)\2/I + abc + abc\=ovector=0 + abc\=ovector=1 + abc\=ovector=2 + aba + aba\=ovector=0 + aba\=ovector=1 + aba\=ovector=2 + aba\=ovector=3 + aba\=ovector=4 + +/abc$/I,dollar_endonly + abc +\= Expect no match + 
abc\n + abc\ndef + +/(a)(b)(c)(d)(e)\6/ + +/the quick brown fox/I + the quick brown fox + this is a line with the quick brown fox + +/the quick brown fox/I,anchored + the quick brown fox +\= Expect no match + this is a line with the quick brown fox + +/ab(?z)cd/ + +/^abc|def/I + abcdef + abcdef\=notbol + +/.*((abc)$|(def))/I + defabc + defabc\=noteol + +/)/ + +/a[]b/ + +/[^aeiou ]{3,}/I + co-processors, and for + +/<.*>/I + abcghinop + +/<.*?>/I + abcghinop + +/<.*>/I,ungreedy + abcghinop + +/(?U)<.*>/I + abcghinop + +/<.*?>/I,ungreedy + abcghinop + +/={3,}/I,ungreedy + abc========def + +/(?U)={3,}?/I + abc========def + +/(?^abc)/Im + abc + def\nabc +\= Expect no match + defabc + +/(?<=ab(c+)d)ef/ + +/(?<=ab(?<=c+)d)ef/ + +/(?<=ab(c|de)f)g/ + +/The next three are in testinput2 because they have variable length branches/ + +/(?<=bullock|donkey)-cart/I + the bullock-cart + a donkey-cart race +\= Expect no match + cart + horse-and-cart + +/(?<=ab(?i)x|y|z)/I + +/(?>.*)(?<=(abcd)|(xyz))/I + alphabetabcd + endingxyz + +/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/I + abxyZZ + abXyZZ + ZZZ + zZZ + bZZ + BZZ +\= Expect no match + ZZ + abXYZZ + zzz + bzz + +/(?[^()]+) # Either a sequence of non-brackets (no backtracking) + | # Or + (?R) # Recurse - i.e. nested bracketed string + )* # Zero or more contents + \) # Closing ) + /Ix + (abcd) + (abcd)xyz + xyz(abcd) + (ab(xy)cd)pqr + (ab(xycd)pqr + () abc () + 12(abcde(fsh)xyz(foo(bar))lmno)89 +\= Expect no match + abcd + abcd) + (abcd + +/\( ( (?>[^()]+) | (?R) )* \) /Igx + (ab(xy)cd)pqr + 1(abcd)(x(y)z)pqr + +/\( (?: (?>[^()]+) | (?R) ) \) /Ix + (abcd) + (ab(xy)cd) + (a(b(c)d)e) + ((ab)) +\= Expect no match + () + +/\( (?: (?>[^()]+) | (?R) )? \) /Ix + () + 12(abcde(fsh)xyz(foo(bar))lmno)89 + +/\( ( (?>[^()]+) | (?R) )* \) /Ix + (ab(xy)cd) + +/\( ( ( (?>[^()]+) | (?R) )* ) \) /Ix + (ab(xy)cd) + +/\( (123)? ( ( (?>[^()]+) | (?R) )* ) \) /Ix + (ab(xy)cd) + (123ab(xy)cd) + +/\( ( (123)? 
( (?>[^()]+) | (?R) )* ) \) /Ix + (ab(xy)cd) + (123ab(xy)cd) + +/\( (((((((((( ( (?>[^()]+) | (?R) )* )))))))))) \) /Ix + (ab(xy)cd) + +/\( ( ( (?>[^()<>]+) | ((?>[^()]+)) | (?R) )* ) \) /Ix + (abcd(xyz

    qrs)123) + +/\( ( ( (?>[^()]+) | ((?R)) )* ) \) /Ix + (ab(cd)ef) + (ab(cd(ef)gh)ij) + +/^[[:alnum:]]/IB + +/^[[:^alnum:]]/IB + +/^[[:alpha:]]/IB + +/^[[:^alpha:]]/IB + +/[_[:alpha:]]/I + +/^[[:ascii:]]/IB + +/^[[:^ascii:]]/IB + +/^[[:blank:]]/IB + +/^[[:^blank:]]/IB + +/[\n\x0b\x0c\x0d[:blank:]]/I + +/^[[:cntrl:]]/IB + +/^[[:digit:]]/IB + +/^[[:graph:]]/IB + +/^[[:lower:]]/IB + +/^[[:print:]]/IB + +/^[[:punct:]]/IB + +/^[[:space:]]/IB + +/^[[:upper:]]/IB + +/^[[:xdigit:]]/IB + +/^[[:word:]]/IB + +/^[[:^cntrl:]]/IB + +/^[12[:^digit:]]/IB + +/^[[:^blank:]]/IB + +/[01[:alpha:]%]/IB + +/[[.ch.]]/I + +/[[=ch=]]/I + +/[[:rhubarb:]]/I + +/[[:upper:]]/Ii + A + a + +/[[:lower:]]/Ii + A + a + +/((?-i)[[:lower:]])[[:lower:]]/Ii + ab + aB +\= Expect no match + Ab + AB + +/[\200-\110]/I + +/^(?(0)f|b)oo/I + +# This one's here because of the large output vector needed + +/(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:
\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(
?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\w+)\s+(\270)/I + 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC\=ovector=300 + +# This one's here because Perl does this differently and PCRE2 can't at present + +/(main(O)?)+/I + mainmain + mainOmain + +# These are all cases where Perl does it differently (nested captures) + +/^(a(b)?)+$/I + aba + +/^(aa(bb)?)+$/I + aabbaa + +/^(aa|aa(bb))+$/I + aabbaa + +/^(aa(bb)??)+$/I + aabbaa + +/^(?:aa(bb)?)+$/I + aabbaa + +/^(aa(b(b))?)+$/I + aabbaa + +/^(?:aa(b(b))?)+$/I + aabbaa + +/^(?:aa(b(?:b))?)+$/I + aabbaa + +/^(?:aa(bb(?:b))?)+$/I + aabbbaa + +/^(?:aa(b(?:bb))?)+$/I + aabbbaa + +/^(?:aa(?:b(b))?)+$/I + aabbaa + +/^(?:aa(?:b(bb))?)+$/I + aabbbaa + +/^(aa(b(bb))?)+$/I + aabbbaa + 
+/^(aa(bb(bb))?)+$/I + aabbbbaa + +# ---------------- + +/#/IBx + +/a#/IBx + +/[\s]/IB + +/[\S]/IB + +/a(?i)b/IB + ab + aB +\= Expect no match + AB + +/(a(?i)b)/IB + ab + aB +\= Expect no match + AB + +/ (?i)abc/IBx + +/#this is a comment + (?i)abc/IBxx/IB + +/ \Q\E/IB + +/a\Q\E/IB + abc + bca + bac + +/a\Q\Eb/IB + abc + +/\Q\Eabc/IB + +/x*+\w/IB +\= Expect no match + xxxxx + +/x?+/IB + +/x++/IB + +/x{1,3}+/B,no_auto_possess + +/x{1,3}+/Bi,no_auto_possess + +/[^x]{1,3}+/B,no_auto_possess + +/[^x]{1,3}+/Bi,no_auto_possess + +/(x)*+/IB + +/^(\w++|\s++)*$/I + now is the time for all good men to come to the aid of the party +\= Expect no match + this is not a line with only words and spaces! + +/(\d++)(\w)/I + 12345a +\= Expect no match + 12345+ + +/a++b/I + aaab + +/(a++b)/I + aaab + +/(a++)b/I + aaab + +/([^()]++|\([^()]*\))+/I + ((abc(ade)ufh()()x + +/\(([^()]++|\([^()]+\))+\)/I + (abc) + (abc(def)xyz) +\= Expect no match + ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/(abc){1,3}+/IB + +/a+?+/I + +/a{2,3}?+b/I + +/(?U)a+?+/I + +/a{2,3}?+b/I,ungreedy + +/x(?U)a++b/IB + xaaaab + +/(?U)xa++b/IB + xaaaab + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/IB + +/^x(?U)a+b/IB + +/^x(?U)(a+)b/IB + +/[.x.]/I + +/[=x=]/I + +/[:x:]/I + +/\F/I + +/\l/I + +/\L/I + +/\N{name}/I + +/\u/I + +/\U/I + +/a{1,3}b/ungreedy + ab + +/[/I + +/[a-/I + +/[[:space:]/I + +/[\s]/IB + +/[[:space:]]/IB + +/[[:space:]abcde]/IB + +/< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >/Ix + <> + + hij> + hij> + def> + +\= Expect no match + iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b/IB + 
+/\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b/IB + +/(.*)\d+\1/I + +/(.*)\d+/I + +/(.*)\d+\1/Is + +/(.*)\d+/Is + +/(.*(xyz))\d+\2/I + +/((.*))\d+\1/I + abc123bc + +/a[b]/I + +/(?=a).*/I + +/(?=abc).xyz/Ii + +/(?=abc)(?i).xyz/I + +/(?=a)(?=b)/I + +/(?=.)a/I + +/((?=abcda)a)/I + +/((?=abcda)ab)/I + +/()a/I + +/(?:(?=.)|(?abc>([^()]|\((?1)*\))*abc>123abc>1(2)3abc>(1(2)3)]*+) | (?2)) * >))/Ix + <> + + hij> + hij> + def> + +\= Expect no match + b|c)d(?Pe)/IB + abde + acde + +/(?:a(?Pc(?Pd)))(?Pa)/IB + +/(?Pa)...(?P=a)bbb(?P>a)d/IB + +/^\W*(?:(?P(?P.)\W*(?P>one)\W*(?P=two)|)|(?P(?P.)\W*(?P>three)\W*(?P=four)|\W*.\W*))\W*$/Ii + 1221 + Satan, oscillate my metallic sonatas! + A man, a plan, a canal: Panama! + Able was I ere I saw Elba. 
+\= Expect no match + The quick brown fox + +/((?(R)a|b))\1(?1)?/I + bb + bbaa + +/(.*)a/Is + +/(.*)a\1/Is + +/(.*)a(b)\2/Is + +/((.*)a|(.*)b)z/Is + +/((.*)a|(.*)b)z\1/Is + +/((.*)a|(.*)b)z\2/Is + +/((.*)a|(.*)b)z\3/Is + +/((.*)a|^(.*)b)z\3/Is + +/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a/Is + +/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\31/Is + +/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\32/Is + +/(a)(bc)/IB,no_auto_capture + abc + +/(?Pa)(bc)/IB,no_auto_capture + abc + +/(a)(?Pbc)/IB,no_auto_capture + +/(aaa(?C1)bbb|ab)/I + aaabbb + aaabbb\=callout_data=0 + aaabbb\=callout_data=1 +\= Expect no match + aaabbb\=callout_data=-1 + +/ab(?Pcd)ef(?Pgh)/I + abcdefgh + abcdefgh\=copy=1,get=two + abcdefgh\=copy=one,copy=two + abcdefgh\=copy=three + +/(?P)(?P)/IB + +/(?P)(?P)/IB + +/(?Pzz)(?Paa)/I + zzaa\=copy=Z + zzaa\=copy=A + +/(?Peks)(?Peccs)/I + +/(?Pabc(?Pdef)(?Pxyz))/I + +"\[((?P\d+)(,(?P>elem))*)\]"I + [10,20,30,5,5,4,4,2,43,23,4234] +\= Expect no match + [] + +"\[((?P\d+)(,(?P>elem))*)?\]"I + [10,20,30,5,5,4,4,2,43,23,4234] + [] + +/(a(b(?2)c))?/IB + +/(a(b(?2)c))*/IB + +/(a(b(?2)c)){0,2}/IB + +/[ab]{1}+/B + +/()(?1){1}/B + +/()(?1)/B + +/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/Ii + Baby Bjorn Active Carrier - With free SHIPPING!! + +/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/Ii + Baby Bjorn Active Carrier - With free SHIPPING!! 
+ +/a*.*b/IB + +/(a|b)*.?c/IB + +/abc(?C255)de(?C)f/IB + +/abcde/IB,auto_callout + abcde +\= Expect no match + abcdfe + +/a*b/IB,auto_callout + ab + aaaab + aaaacb + +/a*b/IB,auto_callout + ab + aaaab + aaaacb + +/a+b/IB,auto_callout + ab + aaaab +\= Expect no match + aaaacb + +/(abc|def)x/IB,auto_callout + abcx + defx +\= Expect no match + abcdefzx + +/(abc|def)x/IB,auto_callout + abcx + defx +\= Expect no match + abcdefzx + +/(ab|cd){3,4}/I,auto_callout + ababab + abcdabcd + abcdcdcdcdcd + +/([ab]{,4}c|xy)/IB,auto_callout +\= Expect no match + Note: that { does NOT introduce a quantifier + +/([ab]{,4}c|xy)/IB,auto_callout +\= Expect no match + Note: that { does NOT introduce a quantifier + +/([ab]{1,4}c|xy){4,5}?123/IB,auto_callout + aacaacaacaacaac123 + +/\b.*/I + ab cd\=offset=1 + +/\b.*/Is + ab cd\=startoffset=1 + +/(?!.bcd).*/I + Xbcd12345 + +/abcde/I + ab\=ps + abc\=ps + abcd\=ps + abcde\=ps + the quick brown abc\=ps +\= Expect no match\=ps + the quick brown abxyz fox\=ps + +"^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/(20)?\d\d$"I + 13/05/04\=ps + 13/5/2004\=ps + 02/05/09\=ps + 1\=ps + 1/2\=ps + 1/2/0\=ps + 1/2/04\=ps + 0\=ps + 02/\=ps + 02/0\=ps + 02/1\=ps +\= Expect no match\=ps + \=ps + 123\=ps + 33/4/04\=ps + 3/13/04\=ps + 0/1/2003\=ps + 0/\=ps + 02/0/\=ps + 02/13\=ps + +/0{0,2}ABC/I + +/\d{3,}ABC/I + +/\d*ABC/I + +/[abc]+DE/I + +/[abc]?123/I + 123\=ps + a\=ps + b\=ps + c\=ps + c12\=ps + c123\=ps + +/^(?:\d){3,5}X/I + 1\=ps + 123\=ps + 123X + 1234\=ps + 1234X + 12345\=ps + 12345X +\= Expect no match + 1X + 123456\=ps + +"<(\w+)/?>(.)*"Igms + \n\n\nPartner der LCO\nde\nPartner der LINEAS Consulting\nGmbH\nLINEAS Consulting GmbH Hamburg\nPartnerfirmen\n30 days\nindex,follow\n\nja\n3\nPartner\n\n\nLCO\nLINEAS Consulting\n15.10.2003\n\n\n\n\nDie Partnerfirmen der LINEAS Consulting\nGmbH\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\=jitstack=1024 + +/line\nbreak/I + this is a line\nbreak + line one\nthis is a line\nbreak in the second line + 
+/line\nbreak/I,firstline + this is a line\nbreak +\= Expect no match + line one\nthis is a line\nbreak in the second line + +/line\nbreak/Im,firstline + this is a line\nbreak +\= Expect no match + line one\nthis is a line\nbreak in the second line + +/(?i)(?-i)AbCd/I + AbCd +\= Expect no match + abcd + +/a{11111111111111111111}/I + +/(){64294967295}/I + +/(){2,4294967295}/I + +"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I + abcdefghijklAkB + +"(?Pa)(?Pb)(?Pc)(?Pd)(?Pe)(?Pf)(?Pg)(?Ph)(?Pi)(?Pj)(?Pk)(?Pl)A\11B"I + abcdefghijklAkB + +"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)A\11B"I + abcdefghijklAkB + +"(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)"I + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +"(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)"I + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/[^()]*(?:\((?R)\)[^()]*)*/I + (this(and)that + (this(and)that) + (this(and)that)stuff + +/[^()]*(?:\((?>(?R))\)[^()]*)*/I + (this(and)that + (this(and)that) + +/[^()]*(?:\((?R)\))*[^()]*/I + (this(and)that + (this(and)that) + +/(?:\((?R)\))*[^()]*/I + (this(and)that + (this(and)that) + ((this)) + +/(?:\((?R)\))|[^()]*/I + 
(this(and)that + (this(and)that) + (this) + ((this)) + +/\x{0000ff}/I + +/^((?Pa1)|(?Pa2)b)/I + +/^((?Pa1)|(?Pa2)b)/I,dupnames + a1b\=copy=A + a2b\=copy=A + a1b\=copy=Z,copy=A + +/(?|(?)(?)(?)|(?)(?)(?))/I,dupnames + +/^(?Pa)(?Pb)/I,dupnames + ab\=copy=A + +/^(?Pa)(?Pb)|cd/I,dupnames + ab\=copy=A + cd\=copy=A + +/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/I,dupnames + cdefgh\=copy=A + +/^((?Pa1)|(?Pa2)b)/I,dupnames + a1b\=get=A + a2b\=get=A + a1b\=get=Z,get=A + +/^(?Pa)(?Pb)/I,dupnames + ab\=get=A + +/^(?Pa)(?Pb)|cd/I,dupnames + ab\=get=A + cd\=get=A + +/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/I,dupnames + cdefgh\=get=A + +/(?J)^((?Pa1)|(?Pa2)b)/I + a1b\=copy=A + a2b\=copy=A + +/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I + +# In this next test, J is not set at the outer level; consequently it isn't set +# in the pattern's options; consequently pcre2_substring_get_byname() produces +# a random value. + +/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I + a bc d\=copy=A,copy=B,copy=C + +/^(?Pa)?(?(A)a|b)/I + aabc + bc +\= Expect no match + abc + +/(?:(?(ZZ)a|b)(?PX))+/I + bXaX + +/(?:(?(2y)a|b)(X))+/I + +/(?:(?(ZA)a|b)(?PX))+/I + +/(?:(?(ZZ)a|b)(?(ZZ)a|b)(?PX))+/I + bbXaaX + +/(?:(?(ZZ)a|\(b\))\\(?PX))+/I + (b)\\Xa\\X + +/(?PX|Y))+/I + bXXaYYaY + bXYaXXaX + +/()()()()()()()()()(?:(?(A)(?P=A)a|b)(?PX|Y))+/I + bXXaYYaY + +/\s*,\s*/I + \x0b,\x0b + \x0c,\x0d + +/^abc/Im,newline=lf + xyz\nabc + xyz\r\nabc +\= Expect no match + xyz\rabc + xyzabc\r + xyzabc\rpqr + xyzabc\r\n + xyzabc\r\npqr + +/^abc/Im,newline=crlf + xyz\r\nabclf> +\= Expect no match + xyz\nabclf + xyz\rabclf + +/^abc/Im,newline=cr + xyz\rabc +\= Expect no match + xyz\nabc + xyz\r\nabc + +/^abc/Im,newline=bad + +/.*/I,newline=lf + abc\ndef + abc\rdef + abc\r\ndef + +/.*/I,newline=cr + abc\ndef + abc\rdef + abc\r\ndef + +/.*/I,newline=crlf + abc\ndef + abc\rdef + abc\r\ndef + +/\w+(.)(.)?def/Is + abc\ndef + abc\rdef + abc\r\ndef + +/(?P25[0-5]|2[0-4]\d|[01]?\d?\d)(?:\.(?P>B)){3}/I + +/()()()()()()()()()()()()()()()()()()()() + 
()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + (.(.))/Ix + XY\=ovector=133 + +/(a*b|(?i:c*(?-i)d))/I + +/()[ab]xyz/I + +/(|)[ab]xyz/I + +/(|c)[ab]xyz/I + +/(|c?)[ab]xyz/I + +/(d?|c?)[ab]xyz/I + +/(d?|c)[ab]xyz/I + +/^a*b\d/IB + +/^a*+b\d/IB + +/^a*?b\d/IB + +/^a+A\d/IB + aaaA5 +\= Expect no match + aaaa5 + +/^a*A\d/IBi + aaaA5 + aaaa5 + a5 + +/(a*|b*)[cd]/I + +/(a+|b*)[cd]/I + +/(a*|b+)[cd]/I + +/(a+|b+)[cd]/I + +/(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( + (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( + ((( + a + )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) + )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) + ))) +/Ix + large nest + +/a*\d/B + +/a*\D/B + +/0*\d/B + +/0*\D/B + +/a*\s/B + +/a*\S/B + +/ *\s/B + +/ *\S/B + +/a*\w/B + +/a*\W/B + +/=*\w/B + +/=*\W/B + +/\d*a/B + +/\d*2/B + +/\d*\d/B + +/\d*\D/B + +/\d*\s/B + +/\d*\S/B + +/\d*\w/B + +/\d*\W/B + +/\D*a/B + +/\D*2/B + +/\D*\d/B + +/\D*\D/B + +/\D*\s/B + +/\D*\S/B + +/\D*\w/B + +/\D*\W/B + +/\s*a/B + +/\s*2/B + +/\s*\d/B + +/\s*\D/B + +/\s*\s/B + +/\s*\S/B + +/\s*\w/B + +/\s*\W/B + +/\S*a/B + +/\S*2/B + +/\S*\d/B + +/\S*\D/B + +/\S*\s/B + +/\S*\S/B + +/\S*\w/B + +/\S*\W/B + +/\w*a/B + +/\w*2/B + +/\w*\d/B + +/\w*\D/B + +/\w*\s/B + +/\w*\S/B + +/\w*\w/B + +/\w*\W/B + +/\W*a/B + +/\W*2/B + +/\W*\d/B + +/\W*\D/B + +/\W*\s/B + +/\W*\S/B + +/\W*\w/B + +/\W*\W/B + +/[^a]+a/B + +/[^a]+a/Bi + +/[^a]+A/Bi + +/[^a]+b/B + +/[^a]+\d/B + +/a*[^a]/B + +/(?Px)(?Py)/I + xy\=copy=abc,copy=xyz + +/(?x)(?'xyz'y)/I + xy\=copy=abc,copy=xyz + +/(?x)(?'xyz>y)/I + +/(?P'abc'x)(?Py)/I + +/^(?:(?(ZZ)a|b)(?X))+/ + bXaX + bXbX +\= Expect no match + aXaX + aXbX + +/^(?P>abc)(?xxx)/ + 
+/^(?P>abc)(?x|y)/ + xx + xy + yy + yx + +/^(?P>abc)(?Px|y)/ + xx + xy + yy + yx + +/^((?(abc)a|b)(?x|y))+/ + bxay + bxby +\= Expect no match + axby + +/^(((?P=abc)|X)(?x|y))+/ + XxXxxx + XxXyyx + XxXyxx +\= Expect no match + x + +/^(?1)(abc)/ + abcabc + +/^(?:(?:\1|X)(a|b))+/ + Xaaa + Xaba + +/^[\E\Qa\E-\Qz\E]+/B + +/^[a\Q]bc\E]/B + +/^[a-\Q\E]/B + +/^(?P>abc)[()](?)/B + +/^((?(abc)y)[()](?Px))+/B + (xy)x + +/^(?P>abc)\Q()\E(?)/B + +/^(?P>abc)[a\Q(]\E(](?)/B + +/^(?P>abc) # this is (a comment) + (?)/Bx + +/^\W*(?:(?(?.)\W*(?&one)\W*\k|)|(?(?.)\W*(?&three)\W*\k'four'|\W*.\W*))\W*$/Ii + 1221 + Satan, oscillate my metallic sonatas! + A man, a plan, a canal: Panama! + Able was I ere I saw Elba. +\= Expect no match + The quick brown fox + +/(?=(\w+))\1:/I + abcd: + +/(?=(?'abc'\w+))\k:/I + abcd: + +/(?'abc'a|b)(?d|e)\k{2}/dupnames + adaa +\= Expect no match + addd + adbb + +/(?'abc'a|b)(?d|e)(?&abc){2}/dupnames + bdaa + bdab +\= Expect no match + bddd + +/(?( (?'B' abc (?(R) (?(R&A)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x + abcabc1Xabc2XabcXabcabc + +/(? 
(?'B' abc (?(R) (?(R&C)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x + +/^(?(DEFINE) abc | xyz ) /x + +/(?(DEFINE) abc) xyz/Ix + +/(a|)*\d/ + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4\=ovector=0 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\=ovector=0 + +/^a.b/newline=lf + a\rb +\= Expect no match + a\nb + +/^a.b/newline=cr + a\nb +\= Expect no match + a\rb + +/^a.b/newline=anycrlf + a\x85b +\= Expect no match + a\rb + +/^a.b/newline=any +\= Expect no match + a\nb + a\rb + a\x85b + +/^abc./gmx,newline=any + abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK + +/abc.$/gmx,newline=any + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7 abc9 + +/^a\Rb/bsr=unicode + a\nb + a\rb + a\r\nb + a\x0bb + a\x0cb + a\x85b +\= Expect no match + a\n\rb + +/^a\R*b/bsr=unicode + ab + a\nb + a\rb + a\r\nb + a\x0bb + a\x0cb + a\x85b + a\n\rb + a\n\r\x85\x0cb + +/^a\R+b/bsr=unicode + a\nb + a\rb + a\r\nb + a\x0bb + a\x0cb + a\x85b + a\n\rb + a\n\r\x85\x0cb +\= Expect no match + ab + +/^a\R{1,3}b/bsr=unicode + a\nb + a\n\rb + a\n\r\x85b + a\r\n\r\nb + a\r\n\r\n\r\nb + a\n\r\n\rb + a\n\n\r\nb +\= Expect no match + a\n\n\n\rb + a\r + +/(?&abc)X(?P)/I + abcPXP123 + +/(?1)X(?P)/I + abcPXP123 + +/(?:a(?&abc)b)*(?x)/ + 123axbaxbaxbx456 + 123axbaxbaxb456 + +/(?:a(?&abc)b){1,5}(?x)/ + 123axbaxbaxbx456 + +/(?:a(?&abc)b){2,5}(?x)/ + 123axbaxbaxbx456 + +/(?:a(?&abc)b){2,}(?x)/ + 123axbaxbaxbx456 + +/(abc)(?i:(?1))/ + defabcabcxyz +\= Expect no match + DEFabcABCXYZ + +/(abc)(?:(?i)(?1))/ + defabcabcxyz +\= Expect no match + DEFabcABCXYZ + +/^(a)\g-2/ + +/^(a)\g/ + +/^(a)\g{0}/ + +/^(a)\g{3/ + +/^(a)\g{aa}/ + +/^a.b/newline=lf + a\rb +\= Expect no match + a\nb + +/.+foo/ + afoo +\= Expect no match + \r\nfoo + \nfoo + +/.+foo/newline=crlf + afoo + \nfoo +\= Expect no match + \r\nfoo + +/.+foo/newline=any + afoo +\= Expect no match + \nfoo + \r\nfoo + +/.+foo/s + afoo + \r\nfoo + \nfoo + +/^$/gm,newline=any + 
abc\r\rxyz + abc\n\rxyz +\= Expect no match + abc\r\nxyz + +/(?m)^$/g,newline=any,aftertext + abc\r\n\r\n + +/(?m)^$|^\r\n/g,newline=any,aftertext + abc\r\n\r\n + +/(?m)$/g,newline=any,aftertext + abc\r\n\r\n + +/abc.$/gmx,newline=anycrlf + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9 + +/^X/m + XABC +\= Expect no match + XABC\=notbol + +/(ab|c)(?-1)/B + abc + +/xy(?+1)(abc)/B + xyabcabc +\= Expect no match + xyabc + +/x(?-0)y/ + +/x(?-1)y/ + +/x(?+0)y/ + +/x(?+1)y/ + +/^(abc)?(?(-1)X|Y)/B + abcX + Y +\= Expect no match + abcY + +/^((?(+1)X|Y)(abc))+/B + YabcXabc + YabcXabcXabc +\= Expect no match + XabcXabc + +/(?(-1)a)/B + +/((?(-1)a))/B + +/((?(-2)a))/B + +/^(?(+1)X|Y)(.)/B + Y! + +/(?tom|bon)-\k{A}/ + tom-tom + bon-bon +\= Expect no match + tom-bon + +/\g{A/ + +/(?|(abc)|(xyz))/B + >abc< + >xyz< + +/(x)(?|(abc)|(xyz))(x)/B + xabcx + xxyzx + +/(x)(?|(abc)(pqr)|(xyz))(x)/B + xabcpqrx + xxyzx + +/\H++X/B +\= Expect no match + XXXX + +/\H+\hY/B + XXXX Y + +/\H+ Y/B + +/\h+A/B + +/\v*B/B + +/\V+\x0a/B + +/A+\h/B + +/ *\H/B + +/A*\v/B + +/\x0b*\V/B + +/\d+\h/B + +/\d*\v/B + +/S+\h\S+\v/B + +/\w{3,}\h\w+\v/B + +/\h+\d\h+\w\h+\S\h+\H/B + +/\v+\d\v+\w\v+\S\v+\V/B + +/\H+\h\H+\d/B + +/\V+\v\V+\w/B + +/\( (?: [^()]* | (?R) )* \)/x 
+(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(
0(0(00)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0
)0)0)0)0)\=jitstack=1024 + +/[\E]AAA/ + +/[\Q\E]AAA/ + +/[^\E]AAA/ + +/[^\Q\E]AAA/ + +/[\E^]AAA/ + +/[\Q\E^]AAA/ + +/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B + +/^a+(*FAIL)/auto_callout +\= Expect no match + aaaaaa + +/a+b?c+(*FAIL)/auto_callout +\= Expect no match + aaabccc + +/a+b?(*PRUNE)c+(*FAIL)/auto_callout +\= Expect no match + aaabccc + +/a+b?(*COMMIT)c+(*FAIL)/auto_callout +\= Expect no match + aaabccc + +/a+b?(*SKIP)c+(*FAIL)/auto_callout +\= Expect no match + aaabcccaaabccc + +/a+b?(*THEN)c+(*FAIL)/auto_callout +\= Expect no match + aaabccc + +/a(*MARK)b/ + +/\g6666666666/ + +/[\g6666666666]/B + +/(?1)\c[/ + +/.+A/newline=crlf +\= Expect no match + \r\nA + +/\nA/newline=crlf + \r\nA + +/[\r\n]A/newline=crlf + \r\nA + +/(\r|\n)A/newline=crlf + \r\nA + +/a(*CR)b/ + +/(*CR)a.b/ + a\nb +\= Expect no match + a\rb + +/(*CR)a.b/newline=lf + a\nb +\= Expect no match + a\rb + +/(*LF)a.b/newline=CRLF + a\rb +\= Expect no match + a\nb + +/(*CRLF)a.b/ + a\rb + a\nb +\= Expect no match + a\r\nb + +/(*ANYCRLF)a.b/newline=CR +\= Expect no match + a\rb + a\nb + a\r\nb + +/(*ANY)a.b/newline=cr +\= Expect no match + a\rb + a\nb + a\r\nb + a\x85b + +/(*ANY).*/g + abc\r\ndef + +/(*ANYCRLF).*/g + abc\r\ndef + +/(*CRLF).*/g + abc\r\ndef + +/(*NUL)^.*/ + a\nb\x00ccc + +/(*NUL)^.*/s + a\nb\x00ccc + +/^x/m,newline=NUL + ab\x00xy + +/'#comment' 0d 0a 00 '^x\' 0a 'y'/x,newline=nul,hex + x\nyz + +/(*NUL)^X\NY/ + X\nY + X\rY +\= Expect no match + X\x00Y + +/a\Rb/I,bsr=anycrlf + a\rb + a\nb + a\r\nb +\= Expect no match + a\x85b + a\x0bb + +/a\Rb/I,bsr=unicode + a\rb + a\nb + a\r\nb + a\x85b + a\x0bb + +/a\R?b/I,bsr=anycrlf + a\rb + a\nb + a\r\nb +\= Expect no match + a\x85b + a\x0bb + +/a\R?b/I,bsr=unicode + a\rb + a\nb + a\r\nb + a\x85b + a\x0bb + +/a\R{2,4}b/I,bsr=anycrlf + a\r\n\nb + a\n\r\rb + a\r\n\r\n\r\n\r\nb +\= Expect no match + a\x85\x85b + a\x0b\x0bb + +/a\R{2,4}b/I,bsr=unicode + a\r\rb + a\n\n\nb + a\r\n\n\r\rb + a\x85\x85b + a\x0b\x0bb +\= 
Expect no match + a\r\r\r\r\rb + +/(*BSR_ANYCRLF)a\Rb/I + a\nb + a\rb + +/(*BSR_UNICODE)a\Rb/I + a\x85b + +/(*BSR_ANYCRLF)(*CRLF)a\Rb/I + a\nb + a\rb + +/(*CRLF)(*BSR_UNICODE)a\Rb/I + a\x85b + +/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I + +/(?)(?&)/ + +/(?)(?&a)/ + +/(?)(?&aaaaaaaaaaaaaaaaaaaaaaa)/ + +/(?+-a)/ + +/(?-+a)/ + +/(?(-1))/ + +/(?(+10))/ + +/(?(10))/ + +/(?(+2))()()/ + +/(?(2))()()/ + +/\k''/ + +/\k<>/ + +/\k{}/ + +/\k/ + +/\kabc/ + +/(?P=)/ + +/(?P>)/ + +/[[:foo:]]/ + +/[[:1234:]]/ + +/[[:f\oo:]]/ + +/[[: :]]/ + +/[[:...:]]/ + +/[[:l\ower:]]/ + +/[[:abc\:]]/ + +/[abc[:x\]pqr:]]/ + +/[[:a\dz:]]/ + +/(^(a|b\g<-1'c))/ + +/^(?+1)(?x|y){0}z/ + xzxx + yzyy +\= Expect no match + xxz + +/(\3)(\1)(a)/ +\= Expect no match + cat + +/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames + cat + +/TA]/ + The ACTA] comes + +/TA]/allow_empty_class,match_unset_backref,dupnames + The ACTA] comes + +/(?2)[]a()b](abc)/ + abcbabc + +/(?2)[^]a()b](abc)/ + abcbabc + +/(?1)[]a()b](abc)/ + abcbabc +\= Expect no match + abcXabc + +/(?1)[^]a()b](abc)/ + abcXabc +\= Expect no match + abcbabc + +/(?2)[]a()b](abc)(xyz)/ + xyzbabcxyz + +/(?&N)[]a(?)](?abc)/ + abc)](abc)/ + abcY)/ + XYabcdY + +/Xa{2,4}b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/Xa{2,4}?b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/Xa{2,4}+b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\d{2,4}b/ + X\=ps + X3\=ps + X33\=ps + X333\=ps + X3333\=ps + +/X\d{2,4}?b/ + X\=ps + X3\=ps + X33\=ps + X333\=ps + X3333\=ps + +/X\d{2,4}+b/ + X\=ps + X3\=ps + X33\=ps + X333\=ps + X3333\=ps + +/X\D{2,4}b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\D{2,4}?b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\D{2,4}+b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[abc]{2,4}b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[abc]{2,4}?b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[abc]{2,4}+b/ + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[^a]{2,4}b/ + 
X\=ps + Xz\=ps + Xzz\=ps + Xzzz\=ps + Xzzzz\=ps + +/X[^a]{2,4}?b/ + X\=ps + Xz\=ps + Xzz\=ps + Xzzz\=ps + Xzzzz\=ps + +/X[^a]{2,4}+b/ + X\=ps + Xz\=ps + Xzz\=ps + Xzzz\=ps + Xzzzz\=ps + +/(Y)X\1{2,4}b/ + YX\=ps + YXY\=ps + YXYY\=ps + YXYYY\=ps + YXYYYY\=ps + +/(Y)X\1{2,4}?b/ + YX\=ps + YXY\=ps + YXYY\=ps + YXYYY\=ps + YXYYYY\=ps + +/(Y)X\1{2,4}+b/ + YX\=ps + YXY\=ps + YXYY\=ps + YXYYY\=ps + YXYYYY\=ps + +/\++\KZ|\d+X|9+Y/startchar + ++++123999\=ps + ++++123999Y\=ps + ++++Z1234\=ps + +/Z(*F)/ +\= Expect no match + Z\=ps + ZA\=ps + +/Z(?!)/ +\= Expect no match + Z\=ps + ZA\=ps + +/dog(sbody)?/ + dogs\=ps + dogs\=ph + +/dog(sbody)??/ + dogs\=ps + dogs\=ph + +/dog|dogsbody/ + dogs\=ps + dogs\=ph + +/dogsbody|dog/ + dogs\=ps + dogs\=ph + +/\bthe cat\b/ + the cat\=ps + the cat\=ph + +/abc/ + abc\=ps + abc\=ph + +/abc\K123/startchar + xyzabc123pqr + xyzabc12\=ps + xyzabc12\=ph + +/(?<=abc)123/ + xyzabc123pqr + xyzabc12\=ps + xyzabc12\=ph + +/\babc\b/ + +++abc+++ + +++ab\=ps + +++ab\=ph + +/(?&word)(?&element)(?(DEFINE)(?<[^m][^>]>[^<])(?\w*+))/B + +/(?&word)(?&element)(?(DEFINE)(?<[^\d][^>]>[^<])(?\w*+))/B + +/(ab)(x(y)z(cd(*ACCEPT)))pq/B + +/abc\K/aftertext,startchar + abcdef + abcdef\=notempty_atstart + xyzabcdef\=notempty_atstart +\= Expect no match + abcdef\=notempty + xyzabcdef\=notempty + +/^(?:(?=abc)|abc\K)/aftertext,startchar + abcdef + abcdef\=notempty_atstart +\= Expect no match + abcdef\=notempty + +/a?b?/aftertext + xyz + xyzabc + xyzabc\=notempty + xyzabc\=notempty_atstart + xyz\=notempty_atstart +\= Expect no match + xyz\=notempty + +/^a?b?/aftertext + xyz + xyzabc +\= Expect no match + xyzabc\=notempty + xyzabc\=notempty_atstart + xyz\=notempty_atstart + xyz\=notempty + +/^(?a|b\gc)/ + aaaa + bacxxx + bbaccxxx + bbbacccxx + +/^(?a|b\g'name'c)/ + aaaa + bacxxx + bbaccxxx + bbbacccxx + +/^(a|b\g<1>c)/ + aaaa + bacxxx + bbaccxxx + bbbacccxx + +/^(a|b\g'1'c)/ + aaaa + bacxxx + bbaccxxx + bbbacccxx + +/^(a|b\g'-1'c)/ + aaaa + bacxxx + bbaccxxx + bbbacccxx + 
+/(^(a|b\g<-1>c))/ + aaaa + bacxxx + bbaccxxx + bbbacccxx + +/(?-i:\g)(?i:(?a))/ + XaaX + XAAX + +/(?i:\g)(?-i:(?a))/ + XaaX +\= Expect no match + XAAX + +/(?-i:\g<+1>)(?i:(a))/ + XaaX + XAAX + +/(?=(?(?#simplesyntax)\$(?[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g)\]|->\g(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g(?\[(?:\g|'(?:\\.|[^'\\])*'|"(?:\g|\\.|[^"\\])*")\])?|\g|\$\{\g\})\}|(?#complexsyntax)\{(?\$(?\g(\g*|\(.*?\))?)(?:->\g)*|\$\g|\$\{\g\})\}))\{/ + +/(?a|b|c)\g*/ + abc + accccbbb + +/^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/ + XYabcdY + +/(?<=b(?1)|zzz)(a)/ + xbaax + xzzzax + +/(a)(?<=b\1)/ + +/(a)(?<=b+(?1))/ + +/(a+)(?<=b(?1))/ + +/(a(?<=b(?1)))/ + +/(?<=b(?1))xyz/ + +/(?<=b(?1))xyz(b+)pqrstuvew/ + +/(a|bc)\1/I + +/(a|bc)\1{2,3}/I + +/(a|bc)(?1)/I + +/(a|b\1)(a|b\1)/I + +/(a|b\1){2}/I + +/(a|bbbb\1)(a|bbbb\1)/I + +/(a|bbbb\1){2}/I + +/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/I + +/]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/Iis + +"(?>.*/)foo"I + +/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /Ix + +/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/Ii + +/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/I + +/A)|(?
    B))/I + AB\=copy=a + BA\=copy=a + +/(?|(?A)|(?B))/ + +/(?:a(? (?')|(?")) | + b(? (?')|(?")) ) + (?('quote')[a-z]+|[0-9]+)/Ix,dupnames + a"aaaaa + b"aaaaa +\= Expect no match + b"11111 + a"11111 + +/^(?|(a)(b)(c)(?d)|(?e)) (?('D')X|Y)/IBx,dupnames + abcdX + eX +\= Expect no match + abcdY + ey + +/(?a) (b)(c) (?d (?(R&A)$ | (?4)) )/IBx,dupnames + abcdd +\= Expect no match + abcdde + +/abcd*/ + xxxxabcd\=ps + xxxxabcd\=ph + +/abcd*/i + xxxxabcd\=ps + xxxxabcd\=ph + XXXXABCD\=ps + XXXXABCD\=ph + +/abc\d*/ + xxxxabc1\=ps + xxxxabc1\=ph + +/(a)bc\1*/ + xxxxabca\=ps + xxxxabca\=ph + +/abc[de]*/ + xxxxabcde\=ps + xxxxabcde\=ph + +/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames + cat + +/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames + cat + +/(\3)(\1)(a)/I +\= Expect no match + cat + +/i(?(DEFINE)(?a))/I + i + +/()i(?(1)a)/I + ia + +/(?i)a(?-i)b|c/B + XabX + XAbX + CcC +\= Expect no match + XABX + +/(?i)a(?s)b|c/B + +/(?i)a(?s-i)b|c/B + +/^(ab(c\1)d|x){2}$/B + xabcxd + +/^(?&t)*+(?(DEFINE)(?.))$/B + +/^(?&t)*(?(DEFINE)(?.))$/B + +# This one is here because Perl gives the match as "b" rather than "ab". I +# believe this to be a Perl bug. + +/(?>a\Kb)z|(ab)/ + ab\=startchar + +/(?P(?P0|)|(?P>L2)(?P>L1))/ + abcd + 0abc + +/abc(*MARK:)pqr/ + +/abc(*:)pqr/ + +/(*COMMIT:X)/B + +# This should, and does, fail. In Perl, it does not, which I think is a +# bug because replacing the B in the pattern by (B|D) does make it fail. +# Turning off Perl's optimization by inserting (??{""}) also makes it fail. + +/A(*COMMIT)B/aftertext,mark +\= Expect no match + ACABX + +# These should be different, but in Perl they are not, which I think +# is a bug in Perl. + +/A(*THEN)B|A(*THEN)C/mark + AC + +/A(*PRUNE)B|A(*PRUNE)C/mark +\= Expect no match + AC + +# Mark names can be duplicated. Perl doesn't give a mark for this one, +# though PCRE2 does. 
+ +/^A(*:A)B|^X(*:A)Y/mark +\= Expect no match + XAQQ + +# COMMIT at the start of a pattern should be the same as an anchor. Perl +# optimizations defeat this. So does the PCRE2 optimization unless we disable +# it. + +/(*COMMIT)ABC/ + ABCDEFG + +/(*COMMIT)ABC/no_start_optimize +\= Expect no match + DEFGABC + +/^(ab (c+(*THEN)cd) | xyz)/x +\= Expect no match + abcccd + +/^(ab (c+(*PRUNE)cd) | xyz)/x +\= Expect no match + abcccd + +/^(ab (c+(*FAIL)cd) | xyz)/x +\= Expect no match + abcccd + +# Perl gets some of these wrong + +/(?>.(*ACCEPT))*?5/ + abcde + +/(.(*ACCEPT))*?5/ + abcde + +/(.(*ACCEPT))5/ + abcde + +/(.(*ACCEPT))*5/ + abcde + +/A\NB./B + ACBD +\= Expect no match + A\nB + ACB\n + +/A\NB./Bs + ACBD + ACB\n +\= Expect no match + A\nB + +/A\NB/newline=crlf + A\nB + A\rB +\= Expect no match + A\r\nB + +/\R+b/B + +/\R+\n/B + +/\R+\d/B + +/\d*\R/B + +/\s*\R/B + \x20\x0a + \x20\x0d + \x20\x0d\x0a + +/\S*\R/B + a\x0a + +/X\h*\R/B + X\x20\x0a + +/X\H*\R/B + X\x0d\x0a + +/X\H+\R/B + X\x0d\x0a + +/X\H++\R/B +\= Expect no match + X\x0d\x0a + +/(?<=abc)def/ + abc\=ph + +/abc$/ + abc + abc\=ps + abc\=ph + +/abc$/m + abc + abc\n + abc\=ph + abc\n\=ph + abc\=ps + abc\n\=ps + +/abc\z/ + abc + abc\=ps + abc\=ph + +/abc\Z/ + abc + abc\=ps + abc\=ph + +/abc\b/ + abc + abc\=ps + abc\=ph + +/abc\B/ + abc\=ps + abc\=ph +\= Expect no match + abc + +/.+/ +\= Bad offsets + abc\=offset=4 + abc\=offset=-4 +\= Valid data + abc\=offset=0 + abc\=offset=1 + abc\=offset=2 +\= Expect no match + abc\=offset=3 + +/^\cÄ£/ + +/(?P(?P=abn)xxx)/B + +/(a\1z)/B + +/(?P(?P=abn)(?(?P=axn)xxx)/B + +/(?P(?P=axn)xxx)(?yy)/B + +# These tests are here because Perl gets the first one wrong. 
+ +/(\R*)(.)/s + \r\n + \r\r\n\n\r + \r\r\n\n\r\n + +/(\R)*(.)/s + \r\n + \r\r\n\n\r + \r\r\n\n\r\n + +/((?>\r\n|\n|\x0b|\f|\r|\x85)*)(.)/s + \r\n + \r\r\n\n\r + \r\r\n\n\r\n + +# ------------- + +/^abc$/B + +/^abc$/Bm + +/^(a)*+(\w)/ + aaaaX +\= Expect no match + aaaa + +/^(?:a)*+(\w)/ + aaaaX +\= Expect no match + aaaa + +/(a)++1234/IB + +/([abc])++1234/I + +/(?<=(abc)+)X/ + +/(^ab)/I + +/(^ab)++/I + +/(^ab|^)+/I + +/(^ab|^)++/I + +/(?:^ab)/I + +/(?:^ab)++/I + +/(?:^ab|^)+/I + +/(?:^ab|^)++/I + +/(.*ab)/I + +/(.*ab)++/I + +/(.*ab|.*)+/I + +/(.*ab|.*)++/I + +/(?:.*ab)/I + +/(?:.*ab)++/I + +/(?:.*ab|.*)+/I + +/(?:.*ab|.*)++/I + +/(?=a)[bcd]/I + +/((?=a))[bcd]/I + +/((?=a))+[bcd]/I + +/((?=a))++[bcd]/I + +/(?=a+)[bcd]/Ii + +/(?=a+?)[bcd]/Ii + +/(?=a++)[bcd]/Ii + +/(?=a{3})[bcd]/Ii + +/(abc)\1+/ + +# Perl doesn't get these right IMO (the 3rd is PCRE2-specific) + +/(?1)(?:(b(*ACCEPT))){0}/ + b + +/(?1)(?:(b(*ACCEPT))){0}c/ + bc +\= Expect no match + b + +/(?1)(?:((*ACCEPT))){0}c/ + c + c\=notempty + +/^.*?(?(?=a)a|b(*THEN)c)/ +\= Expect no match + ba + +/^.*?(?(?=a)a|bc)/ + ba + +/^.*?(?(?=a)a(*THEN)b|c)/ +\= Expect no match + ac + +/^.*?(?(?=a)a(*THEN)b)c/ +\= Expect no match + ac + +/^.*?(a(*THEN)b)c/ +\= Expect no match + aabc + +/^.*? (?1) c (?(DEFINE)(a(*THEN)b))/x + aabc + +/^.*?(a(*THEN)b|z)c/ + aabc + +/^.*?(z|a(*THEN)b)c/ + aabc + +# These are here because they are not Perl-compatible; the studying means the +# mark is not seen. 
+ +/(*MARK:A)(*SKIP:B)(C|X)/mark + C +\= Expect no match + D + +/(*:A)A+(*SKIP:A)(B|Z)/mark +\= Expect no match + AAAC + +# ---------------------------- + +"(?=a*(*ACCEPT)b)c" + c + c\=notempty + +/(?1)c(?(DEFINE)((*ACCEPT)b))/ + c + c\=notempty + +/(?>(*ACCEPT)b)c/ + c +\= Expect no match + c\=notempty + +/(?:(?>(a)))+a%/allaftertext + %aa% + +/(a)b|ac/allaftertext + ac\=ovector=1 + +/(a)(b)x|abc/allaftertext + abc\=ovector=2 + +/(a)bc|(a)(b)\2/ + abc\=ovector=1 + abc\=ovector=2 + aba\=ovector=1 + aba\=ovector=2 + aba\=ovector=3 + aba\=ovector=4 + +/(?(DEFINE)(a(?2)|b)(b(?1)|a))(?:(?1)|(?2))/I + +/(a(?2)|b)(b(?1)|a)(?:(?1)|(?2))/I + +/(a(?2)|b)(b(?1)|a)(?1)(?2)/I + +/(abc)(?1)/I + +/(?:(foo)|(bar)|(baz))X/allcaptures + bazfooX + foobazbarX + barfooX + bazX + foobarbazX + bazfooX\=ovector=0 + bazfooX\=ovector=1 + bazfooX\=ovector=2 + bazfooX\=ovector=3 + +/(?=abc){3}abc/B + +/(?=abc)+abc/B + +/(?=abc)++abc/B + +/(?=abc){0}xyz/B + +/(?=(a))?./B + +/(?=(a))??./B + +/^(?=(a)){0}b(?1)/B + +/(?(DEFINE)(a))?b(?1)/B + +/^(?=(?1))?[az]([abc])d/B + +/^(?!a){0}\w+/B + +/(?<=(abc))?xyz/B + +/[:a[:abc]b:]/B + +/^(a(*:A)(d|e(*:B))z|aeq)/auto_callout + adz + aez + aeqwerty + +/.(*F)/ +\= Expect no match + abc\=ph + +/\btype\b\W*?\btext\b\W*?\bjavascript\b/I + +/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|a+)(?>(z+))\w/B + aaaazzzzb +\= Expect no match + aazz + +/(.)(\1|a(?2))/ + bab + +/\1|(.)(?R)\1/ + cbbbc + +/(.)((?(1)c|a)|a(?2))/ +\= Expect no match + baa + +/(?P(?P=abn)xxx)/B + +/(a\1z)/B + +/^a\x41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz +\= Expect no match + ax41z + +/^a[m\x41]z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz + +/^a\x1z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + ax1z + +/^a\u0041z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz +\= Expect no match + au0041z + +/^a[m\u0041]z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz + 
+/^a\u041z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + au041z +\= Expect no match + aAz + +/^a\U0041z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aU0041z +\= Expect no match + aAz + +/^\u{7a}/alt_bsux + u{7a} +\= Expect no match + zoo + +/^\u{7a}/extra_alt_bsux + zoo + +/(?(?=c)c|d)++Y/B + +/(?(?=c)c|d)*+Y/B + +/a[\NB]c/ + aNc + +/a[B-\Nc]/ + +/a[B\Nc]/ + +/(a)(?2){0,1999}?(b)/ + +/(a)(?(DEFINE)(b))(?2){0,1999}?(?2)/ + +# This test, with something more complicated than individual letters, causes +# different behaviour in Perl. Perhaps it disables some optimization; no tag is +# passed back for the failures, whereas in PCRE2 there is a tag. + +/(A|P)(*:A)(B|P) | (X|P)(X|P)(*:B)(Y|P)/x,mark + AABC + XXYZ +\= Expect no match + XAQQ + XAQQXZZ + AXQQQ + AXXQQQ + +# Perl doesn't give marks for these, though it does if the alternatives are +# replaced by single letters. + +/(b|q)(*:m)f|a(*:n)w/mark + aw +\= Expect no match + abc + +/(q|b)(*:m)f|a(*:n)w/mark + aw +\= Expect no match + abc + +# After a partial match, the behaviour is as for a failure. + +/^a(*:X)bcde/mark + abc\=ps + +# These are here because Perl doesn't return a mark, except for the first. 
+ +/(?=(*:x))(q|)/aftertext,mark + abc + +/(?=(*:x))((*:y)q|)/aftertext,mark + abc + +/(?=(*:x))(?:(*:y)q|)/aftertext,mark + abc + +/(?=(*:x))(?>(*:y)q|)/aftertext,mark + abc + +/(?=a(*:x))(?!a(*:y)c)/aftertext,mark + ab + +/(?=a(*:x))(?=a(*:y)c|)/aftertext,mark + ab + +/(..)\1/ + ab\=ps + aba\=ps + abab\=ps + +/(..)\1/i + ab\=ps + abA\=ps + aBAb\=ps + +/(..)\1{2,}/ + ab\=ps + aba\=ps + abab\=ps + ababa\=ps + ababab\=ps + ababab\=ph + abababa\=ps + abababa\=ph + +/(..)\1{2,}/i + ab\=ps + aBa\=ps + aBAb\=ps + AbaBA\=ps + abABAb\=ps + aBAbaB\=ph + abABabA\=ps + abaBABa\=ph + +/(..)\1{2,}?x/i + ab\=ps + abA\=ps + aBAb\=ps + abaBA\=ps + abAbaB\=ps + abaBabA\=ps + abAbABaBx\=ps + +/^(..)\1/ + aba\=ps + +/^(..)\1{2,3}x/ + aba\=ps + ababa\=ps + ababa\=ph + abababx + ababababx + +/^(..)\1{2,3}?x/ + aba\=ps + ababa\=ps + ababa\=ph + abababx + ababababx + +/^(..)(\1{2,3})ab/ + abababab + +/^\R/ + \r\=ps + \r\=ph + +/^\R{2,3}x/ + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + \r\rx + \r\r\rx + +/^\R{2,3}?x/ + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + \r\rx + \r\r\rx + +/^\R?x/ + \r\=ps + \r\=ph + x + \rx + +/^\R+x/ + \r\=ps + \r\=ph + \r\n\=ps + \r\n\=ph + \rx + +/^a$/newline=crlf + a\r\=ps + a\r\=ph + +/^a$/m,newline=crlf + a\r\=ps + a\r\=ph + +/^(a$|a\r)/newline=crlf + a\r\=ps + a\r\=ph + +/^(a$|a\r)/m,newline=crlf + a\r\=ps + a\r\=ph + +/./newline=crlf + \r\=ps + \r\=ph + +/.{2,3}/newline=crlf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +/.{2,3}?/newline=crlf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +"AB(C(D))(E(F))?(?(?=\2)(?=\4))" + ABCDGHI\=ovector=01 + +# These are all run as real matches in test 1; here we are just checking the +# settings of the anchored and startline bits. 
+ +/(?>.*?a)(?<=ba)/I + +/(?:.*?a)(?<=ba)/I + +/.*?a(*PRUNE)b/I + +/.*?a(*PRUNE)b/Is + +/^a(*PRUNE)b/Is + +/.*?a(*SKIP)b/I + +/(?>.*?a)b/Is + +/(?>.*?a)b/I + +/(?>^a)b/Is + +/(?>.*?)(?<=(abcd)|(wxyz))/I + +/(?>.*)(?<=(abcd)|(wxyz))/I + +"(?>.*)foo"I + +"(?>.*?)foo"I + +/(?>^abc)/Im + +/(?>.*abc)/Im + +/(?:.*abc)/Im + +/(?:(a)+(?C1)bb|aa(?C2)b)/ + aab\=callout_capture + +/(?:(a)++(?C1)bb|aa(?C2)b)/ + aab\=callout_capture + +/(?:(?>(a))(?C1)bb|aa(?C2)b)/ + aab\=callout_capture + +/(?:(?1)(?C1)x|ab(?C2))((a)){0}/ + aab\=callout_capture + +/(?1)(?C1)((a)(?C2)){0}/ + aab\=callout_capture + +/(?:(a)+(?C1)bb|aa(?C2)b)++/ + aab\=callout_capture + aab\=callout_capture,ovector=1 + +/(ab)x|ab/ + ab\=ovector=0 + ab\=ovector=1 + +/(?<=123)(*MARK:xx)abc/mark + xxxx123a\=ph + xxxx123a\=ps + +/123\Kabc/startchar + xxxx123a\=ph + xxxx123a\=ps + +/^(?(?=a)aa|bb)/auto_callout + bb + +/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/ + bb + +# Perl seems to have a bug with this one. + +/aaaaa(*COMMIT)(*PRUNE)b|a+c/ + aaaaaac + +# Here are some that Perl treats differently because of the way it handles +# backtracking verbs. + +/(?!a(*COMMIT)b)ac|ad/ + ac + ad + +/^(?!a(*THEN)b|ac)../ + ad +\= Expect no match + ac + +/^(?=a(*THEN)b|ac)/ + ac + +/\A.*?(?:a|b(*THEN)c)/ + ba + +/\A.*?(?:a|b(*THEN)c)++/ + ba + +/\A.*?(?:a|b(*THEN)c|d)/ + ba + +/(?:(a(*MARK:X)a+(*SKIP:X)b)){0}(?:(?1)|aac)/ + aac + +/\A.*?(a|b(*THEN)c)/ + ba + +/^(A(*THEN)B|A(*THEN)D)/ + AD + +/(?!b(*THEN)a)bn|bnn/ + bnn + +/(?(?=b(*SKIP)a)bn|bnn)/ + bnn + +/(?=b(*THEN)a|)bn|bnn/ + bnn + +# This test causes a segfault with Perl 5.18.0 + +/^(?=(a)){0}b(?1)/ + backgammon + +/(?|(?f)|(?b))/I,dupnames + +/(?abc)(?z)\k()/IB,dupnames + +/a*[bcd]/B + +/[bcd]*a/B + +# A complete set of tests for auto-possessification of character types, but +# omitting \C because it might be disabled (it has its own tests). + +/\D+\D \D+\d \D+\S \D+\s \D+\W \D+\w \D+. 
\D+\R \D+\H \D+\h \D+\V \D+\v \D+\Z \D+\z \D+$/Bx + +/\d+\D \d+\d \d+\S \d+\s \d+\W \d+\w \d+. \d+\R \d+\H \d+\h \d+\V \d+\v \d+\Z \d+\z \d+$/Bx + +/\S+\D \S+\d \S+\S \S+\s \S+\W \S+\w \S+. \S+\R \S+\H \S+\h \S+\V \S+\v \S+\Z \S+\z \S+$/Bx + +/\s+\D \s+\d \s+\S \s+\s \s+\W \s+\w \s+. \s+\R \s+\H \s+\h \s+\V \s+\v \s+\Z \s+\z \s+$/Bx + +/\W+\D \W+\d \W+\S \W+\s \W+\W \W+\w \W+. \W+\R \W+\H \W+\h \W+\V \W+\v \W+\Z \W+\z \W+$/Bx + +/\w+\D \w+\d \w+\S \w+\s \w+\W \w+\w \w+. \w+\R \w+\H \w+\h \w+\V \w+\v \w+\Z \w+\z \w+$/Bx + +/\R+\D \R+\d \R+\S \R+\s \R+\W \R+\w \R+. \R+\R \R+\H \R+\h \R+\V \R+\v \R+\Z \R+\z \R+$/Bx + +/\H+\D \H+\d \H+\S \H+\s \H+\W \H+\w \H+. \H+\R \H+\H \H+\h \H+\V \H+\v \H+\Z \H+\z \H+$/Bx + +/\h+\D \h+\d \h+\S \h+\s \h+\W \h+\w \h+. \h+\R \h+\H \h+\h \h+\V \h+\v \h+\Z \h+\z \h+$/Bx + +/\V+\D \V+\d \V+\S \V+\s \V+\W \V+\w \V+. \V+\R \V+\H \V+\h \V+\V \V+\v \V+\Z \V+\z \V+$/Bx + +/\v+\D \v+\d \v+\S \v+\s \v+\W \v+\w \v+. \v+\R \v+\H \v+\h \v+\V \v+\v \v+\Z \v+\z \v+$/Bx + +/ a+\D a+\d a+\S a+\s a+\W a+\w a+. a+\R a+\H a+\h a+\V a+\v a+\Z a+\z a+$/Bx + +/\n+\D \n+\d \n+\S \n+\s \n+\W \n+\w \n+. \n+\R \n+\H \n+\h \n+\V \n+\v \n+\Z \n+\z \n+$/Bx + +/ .+\D .+\d .+\S .+\s .+\W .+\w .+. .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/Bx + +/ .+\D .+\d .+\S .+\s .+\W .+\w .+. 
.+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/Bsx + +/ \D+$ \d+$ \S+$ \s+$ \W+$ \w+$ \R+$ \H+$ \h+$ \V+$ \v+$ a+$ \n+$ .+$ .+$/Bmx + +/(?=a+)a(a+)++a/B + +/a+(bb|cc)a+(?:bb|cc)a+(?>bb|cc)a+(?:bb|cc)+a+(aa)a+(?:bb|aa)/B + +/a+(bb|cc)?#a+(?:bb|cc)??#a+(?:bb|cc)?+#a+(?:bb|cc)*#a+(bb|cc)?a#a+(?:aa)?/B + +/a+(?:bb)?a#a+(?:|||)#a+(?:|b)a#a+(?:|||)?a/B + +/[ab]*/B + aaaa + +/[ab]*?/B + aaaa + +/[ab]?/B + aaaa + +/[ab]??/B + aaaa + +/[ab]+/B + aaaa + +/[ab]+?/B + aaaa + +/[ab]{2,3}/B + aaaa + +/[ab]{2,3}?/B + aaaa + +/[ab]{2,}/B + aaaa + +/[ab]{2,}?/B + aaaa + +/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/B + +/[a-d]{5,12}[e-z0-9]*#[^a-z]+[b-y]*a[2-7]?[^0-9a-z]+/B + +/[a-z]*\s#[ \t]?\S#[a-c]*\S#[C-G]+?\d#[4-8]*\D#[4-9,]*\D#[!$]{0,5}\w#[M-Xf-l]+\W#[a-c,]?\W/B + +/a+(aa|bb)*c#a*(bb|cc)*a#a?(bb|cc)*d#[a-f]*(g|hh)*f/B + +/[a-f]*(g|hh|i)*i#[a-x]{4,}(y{0,6})*y#[a-k]+(ll|mm)+n/B + +/[a-f]*(?>gg|hh)+#[a-f]*(?>gg|hh)?#[a-f]*(?>gg|hh)*a#[a-f]*(?>gg|hh)*h/B + +/[a-c]*d/IB + +/[a-c]+d/IB + +/[a-c]?d/IB + +/[a-c]{4,6}d/IB + +/[a-c]{0,6}d/IB + +# End of special auto-possessive tests + +/^A\o{1239}B/ + A\123B + +/^A\oB/ + +/^A\x{zz}B/ + +/^A\x{12Z/ + +/^A\x{/ + +/[ab]++/B,no_auto_possess + +/[^ab]*+/B,no_auto_possess + +/a{4}+/B,no_auto_possess + +/a{4}+/Bi,no_auto_possess + +/[a-[:digit:]]+/ + +/[A-[:digit:]]+/ + +/[a-[.xxx.]]+/ + +/[a-[=xxx=]]+/ + +/[a-[!xxx!]]+/ + +/[A-[!xxx!]]+/ + A]]] + +/[a-\d]+/ + +/(?<0abc>xx)/ + +/(?&1abc)xx(?<1abc>y)/ + +/(?xx)/ + +/(?'0abc'xx)/ + +/(?P<0abc>xx)/ + +/\k<5ghj>/ + +/\k'5ghj'/ + +/\k{2fgh}/ + +/(?P=8yuki)/ + +/\g{4df}/ + +/(?&1abc)xx(?<1abc>y)/ + +/(?P>1abc)xx(?<1abc>y)/ + +/\g'3gh'/ + +/\g<5fg>/ + +/(?(<4gh>)abc)/ + +/(?('4gh')abc)/ + +/(?(4gh)abc)/ + +/(?(R&6yh)abc)/ + +/(((a\2)|(a*)\g<-1>))*a?/B + +# Test the ugly "start or end of word" compatibility syntax. 
+ +/[[:<:]]red[[:>:]]/B + little red riding hood + a /red/ thing + red is a colour + put it all on red +\= Expect no match + no reduction + Alfred Winifred + +/[a[:<:]] should give error/ + +/(?=ab\K)/aftertext + abcd\=startchar + +/abcd/newline=lf,firstline +\= Expect no match + xx\nxabcd + +# Test stack guard external calls. + +/(((a)))/stackguard=1 + +/(((a)))/stackguard=2 + +/(((a)))/stackguard=3 + +/(((((a)))))/ + +# End stack guard tests + +/^\w+(?>\s*)(?<=\w)/B + +/\othing/ + +/\o{}/ + +/\o{whatever}/ + +/\xthing/ + +/\x{}/ + +/\x{whatever}/ + +/A\8B/ + +/A\9B/ + +# This one is here because Perl fails to match "12" for this pattern when the $ +# is present. + +/^(?(?=abc)\w{3}:|\d\d)$/ + abc: + 12 +\= Expect no match + 123 + xyz + +# Perl gets this one wrong, giving "a" as the after text for ca and failing to +# match for cd. + +/(?(?=ab)ab)/aftertext + abxxx + ca + cd + +# This should test both paths for processing OP_RECURSE. + +/(?(R)a+|(?R)b)/ + aaaabcde + aaaabcde\=ovector=100 + +/a*?b*?/ + ab + +/(*NOTEMPTY)a*?b*?/ + ab + ba + cb + +/(*NOTEMPTY_ATSTART)a*?b*?/aftertext + ab + cdab + +/(?(VERSION>=10.0)yes|no)/I + yesno + +/(?(VERSION>=10.04)yes|no)/ + yesno + +/(?(VERSION=8)yes){3}/BI,aftertext + yesno + +/(?(VERSION=8)yes|no){3}/I + yesnononoyes +\= Expect no match + yesno + +/(?:(?abc)|xyz)(?(VERSION)yes|no)/I + abcyes + xyzno +\= Expect no match + abcno + xyzyes + +/(?(VERSION<10)yes|no)/ + +/(?(VERSION>10)yes|no)/ + +/(?(VERSION>=10.0.0)yes|no)/ + +/(?(VERSION=10.101)yes|no)/ + +/abcd/I + +/abcd/I,no_start_optimize + +/(|ab)*?d/I + abd + xyd + +/(|ab)*?d/I,no_start_optimize + abd + xyd + +/\k*(?aa)(?bb)/match_unset_backref,dupnames + aabb + +/(((((a)))))/parens_nest_limit=2 + +/abc/replace=XYZ + 123123 + 123abc123 + 123abc123abc123 + 123123\=zero_terminate + 123abc123\=zero_terminate + 123abc123abc123\=zero_terminate + +/abc/g,replace=XYZ + 123abc123 + 123abc123abc123 + +/abc/replace=X$$Z + 123abc123 + +/abc/g,replace=X$$Z + 123abc123abc123 + 
+/a(b)c(d)e/replace=X$1Y${2}Z + "abcde" + +/a(b)c(d)e/replace=X$1Y${2}Z,global + "abcde-abcde" + +/a(?b)c(?d)e/replace=X$ONE+${TWO}Z + "abcde" + +/a(?b)c(?d)e/g,replace=X$ONE+${TWO}Z + "abcde-abcde-" + +/abc/replace=a$++ + 123abc + +/abc/replace=a$bad + 123abc + +/abc/replace=a${A234567890123456789_123456789012}z + 123abc + +/abc/replace=a${A23456789012345678901234567890123}z + 123abc + +/abc/replace=a${bcd + 123abc + +/abc/replace=a${b+d}z + 123abc + +/abc/replace=[10]XYZ + 123abc123 + +/abc/replace=[9]XYZ + 123abc123 + +/abc/replace=xyz + 1abc2\=partial_hard + +/abc/replace=xyz + 123abc456 + 123abc456\=replace=pqr + 123abc456abc789 + 123abc456abc789\=g + +/(?<=abc)(|def)/g,replace=<$0> + 123abcxyzabcdef789abcpqr + +/./replace=$0 + a + +/(.)(.)/replace=$2+$1 + abc + +/(?.)(?.)/replace=$B+$A + abc + +/(.)(.)/g,replace=$2$1 + abcdefgh + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=${*MARK} + apple lemon blackberry + apple strudel + fruitless + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/replace=${*MARK} sauce, + apple lemon blackberry + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARK> + apple lemon blackberry + apple strudel + fruitless + +/(*:pear)apple/g,replace=${*MARKING} + apple lemon blackberry + +/(*:pear)apple/g,replace=${*MARK-time + apple lemon blackberry + +/(*:pear)apple/g,replace=${*mark} + apple lemon blackberry + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARKET> + apple lemon blackberry + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[22]${*MARK} + apple lemon blackberry + apple lemon blackberry\=substitute_overflow_length + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[23]${*MARK} + apple lemon blackberry + +/abc/ + 123abc123\=replace=[9]XYZ + 123abc123\=substitute_overflow_length,replace=[9]XYZ + 123abc123\=substitute_overflow_length,replace=[6]XYZ + 123abc123\=substitute_overflow_length,replace=[1]XYZ + 
123abc123\=substitute_overflow_length,replace=[0]XYZ + +/a(b)c/ + 123abc123\=replace=[9]x$1z + 123abc123\=substitute_overflow_length,replace=[9]x$1z + 123abc123\=substitute_overflow_length,replace=[6]x$1z + 123abc123\=substitute_overflow_length,replace=[1]x$1z + 123abc123\=substitute_overflow_length,replace=[0]x$1z + +"((?=(?(?=(?(?=(?(?=()))))))))" + a + +"(?(?=)==)(((((((((?=)))))))))" +\= Expect no match + a + +/(a)(b)|(c)/ + XcX\=ovector=2,get=1,get=2,get=3,get=4,getall + +/x(?=ab\K)/ + xab\=get=0 + xab\=copy=0 + xab\=getall + +/(?a)|(?b)/dupnames + a\=ovector=1,copy=A,get=A,get=2 + a\=ovector=2,copy=A,get=A,get=2 + b\=ovector=2,copy=A,get=A,get=2 + +/a(b)c(d)/ + abc\=ph,copy=0,copy=1,getall + +/^abc/info + +/^abc/info,no_dotstar_anchor + +/.*\d/info,auto_callout +\= Expect no match + aaa + +/.*\d/info,no_dotstar_anchor,auto_callout +\= Expect no match + aaa + +/.*\d/dotall,info + +/.*\d/dotall,no_dotstar_anchor,info + +/(*NO_DOTSTAR_ANCHOR)(?s).*\d/info + +'^(?:(a)|b)(?(1)A|B)' + aA123\=ovector=1 + aA123\=ovector=2 + +'^(?:(?a)|b)(?()A|B)' + aA123\=ovector=1 + aA123\=ovector=2 + +'^(?)(?:(?a)|b)(?()A|B)'dupnames + aA123\=ovector=1 + aA123\=ovector=2 + aA123\=ovector=3 + +'^(?:(?X)|)(?:(?a)|b)\k{AA}'dupnames + aa123\=ovector=1 + aa123\=ovector=2 + aa123\=ovector=3 + +/(?(?J)(?1(111111)11|)1|1|)(?()1)/ + +/(?(?J)(?))(?-J)\k/ + +# Quantifiers are not allowed on condition assertions, but are otherwise +# OK in conditions. + +/(?(?=0)?)+/ + +/(?(?=0)(?=00)?00765)/ + 00765 + +/(?(?=0)(?=00)?00765|(?!3).56)/ + 00765 + 456 +\= Expect no match + 356 + +'^(a)*+(\w)' + g + g\=ovector=1 + +'^(?:a)*+(\w)' + g + g\=ovector=1 + +# These two pattern showeds up compile-time bugs + +"((?2){0,1999}())?" 
+ +/((?+1)(\1))/B + +# Callouts with string arguments + +/a(?C"/ + +/a(?C"a/ + +/a(?C"a"/ + +/a(?C"a"bcde(?C"b")xyz/ + +/a(?C"a)b""c")/B + +/ab(?C" any text with spaces ")cde/B + abcde + 12abcde + +/^a(b)c(?C1)def/ + abcdef + +/^a(b)c(?C"AB")def/ + abcdef + +/^a(b)c(?C1)def/ + abcdef\=callout_capture + +/^a(b)c(?C{AB})def/B + abcdef\=callout_capture + +/(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info + +/(?:a(?C`code`)){3}/B + +/^(?(?C25)(?=abc)abcd|xyz)/B,callout_info + abcdefg + xyz123 + +/^(?(?C$abc$)(?=abc)abcd|xyz)/B + abcdefg + xyz123 + +/^ab(?C'first')cd(?C"second")ef/ + abcdefg + +/(?:a(?C`code`)){3}X/ + aaaXY + +# Binary zero in callout string +# a ( ? C ' x z ' ) b +/ 61 28 3f 43 27 78 00 7a 27 29 62/hex,callout_info + abcdefgh + +/(?(?!)^)/ + +/(?(?!)a|b)/ + bbb +\= Expect no match + aaa + +# JIT gives a different error message for the infinite recursion + +"(*NO_JIT)((?2)+)((?1)){" + abcd{ + +# Perl fails to diagnose the absence of an assertion + +"(?(?.*!.*)?)" + +"X((?2)()*+){2}+"B + +"X((?2)()*+){2}"B + +/(?<=\bABQ(3(?-7)))/ + +/(?<=\bABQ(3(?+7)))/ + +";(?<=()((?3))((?2)))" + +# Perl loops on this (PCRE2 used to!) + +/(?<=\Ka)/g,aftertext + aaaaa + +/(?<=\Ka)/altglobal,aftertext + aaaaa + +/((?2){73}(?2))((?1))/info + +/abc/ +\= Expect no match + \[9x!xxx(]{9999} + +/(abc)*/ + \[abc]{5} + +/^/gm + \n\n\n + +/^/gm,alt_circumflex + \n\n\n + +/((((((((x))))))))\81/ + xx1 + +/((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((x))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))\80/ + xx + +/\80/ + +/A\8B\9C/ + A8B9C + +/(?x:((?'a')) # comment (with parentheses) and | vertical +(?-x:#not a comment (?'b')) # this is a comment () +(?'c')) # not a comment (?'d')/info + +/(?|(?'a')(2)(?'b')|(?'a')(?'a')(3))/I,dupnames + A23B + B32A + +# These are some patterns that used to cause buffer overflows or other errors +# while compiling. 
+ +/.((?2)(?R)|\1|$)()/B + +/.((?3)(?R)()(?2)|\1|$)()/B + +/(\9*+(?2);\3++()2|)++{/ + +/\V\x85\9*+((?2)\3++()2)*:2/ + +/(((?(R)){0,2}) (?'x'((?'R')((?'R')))))/dupnames + +/(((?(X)){0,2}) (?'x'((?'X')((?'X')))))/dupnames + +/(((?(R)){0,2}) (?'x'((?'X')((?'R')))))/ + +"(?J)(?'d'(?'d'\g{d}))" + +"(?=!((?2)(?))({8(?<=(?1){29}8bbbb\x16\xd\xc6^($(\xa9H4){4}h}?1)B))\x15')" + +/A(?'')Z/ + +"(?J:(?|(?'R')(\k'R')|((?'R'))))" + +/(?<=|(\,\$(?73591620449005828816)\xa8.{7}){6}\x09)/ + +/^(?:(?(1)x|)+)+$()/B + +/[[:>:]](?<)/ + +/((?x)(*:0))#(?'/ + +/(?C$[$)(?<]/ + +/(?C$)$)(?<]/ + +/(?(R))*+/B + abcd + +/((?x)(?#))#(?'/ + +/((?x)(?#))#(?'abc')/I + +/[[:\\](?<[::]/ + +/[[:\\](?'abc')[a:]/I + +"[[[.\xe8Nq\xffq\xff\xe0\x2|||::Nq\xffq\xff\xe0\x6\x2|||::[[[:[::::::[[[[[::::::::[:[[[:[:::[[[[[[[[[[[[:::::::::::::::::[[.\xe8Nq\xffq\xff\xe0\x2|||::Nq\xffq\xff\xe0\x6\x2|||::[[[:[::::::[[[[[::::::::[:[[[:[:::[[[[[[[[[[[[[[:::E[[[:[:[[:[:::[[:::E[[[:[:[[:'[:::::E[[[:[::::::[[[:[[[[[[[::E[[[:[::::::[[[:[[[[[[[[:[[::[::::[[:::::::[[:[[[[[[[:[[::[:[[:[~" + +/()(?(R)0)*+/B + +/(?R-:(?>abcd<< + +/abcd/g,replace=\$1$2\,substitute_literal + XabcdYabcdZ + +/a(bc)(DE)/replace=a\u$1\U$1\E$1\l$2\L$2\Eab\Uab\LYZ\EDone,substitute_extended + abcDE + +/abcd/replace=xy\kz,substitute_extended + abcd + +/a(?:(b)|(c))/substitute_extended,replace=X${1:+1:-1}X${2:+2:-2} + ab + ac + ab\=replace=${1:+$1\:$1:$2} + ac\=replace=${1:+$1\:$1:$2} + >>ac<<\=replace=${1:+$1\:$1:$2},substitute_literal + +/a(?:(b)|(c))/substitute_extended,replace=X${1:-1:-1}X${2:-2:-2} + ab + ac + +/(a)/substitute_extended,replace=>${1:+\Q$1:{}$$\E+\U$1}< + a + +/X(b)Y/substitute_extended + XbY\=replace=x${1:+$1\U$1}y + XbY\=replace=\Ux${1:+$1$1}y + +/a/substitute_extended,replace=${*MARK:+a:b} + a + +/(abcd)/replace=${1:+xy\kz},substitute_extended + abcd + +/(abcd)/ + abcd\=replace=${1:+xy\kz},substitute_extended + +/abcd/substitute_extended,replace=>$1< + abcd + +/abcd/substitute_extended,replace=>xxx${xyz}<<< + abcd + 
+/(?J)(?:(?a)|(?b))/replace=<$A> + [a] + [b] +\= Expect error + (a)\=ovector=1 + +/(a)|(b)/replace=<$1> +\= Expect error + b + +/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1 + aaBB + +/abcd/replace=wxyz,substitute_matched + abcd + pqrs + +/abcd/g + >abcd1234abcd5678<\=replace=wxyz,substitute_matched + +/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I + +/((p(?'K/ + +/((p(?'K/no_auto_capture + +/abc/replace=A$3123456789Z + abc + +/(?$1<,substitute_unset_empty + cat + xbcom + +/a|(b)c/ + cat\=replace=>$1< + cat\=replace=>$1<,substitute_unset_empty + xbcom\=replace=>$1<,substitute_unset_empty + +/a|(b)c/substitute_extended + cat\=replace=>${2:-xx}< + cat\=replace=>${2:-xx}<,substitute_unknown_unset + cat\=replace=>${X:-xx}<,substitute_unknown_unset + +/a|(?'X'b)c/replace=>$X<,substitute_unset_empty + cat + xbcom + +/a|(?'X'b)c/replace=>$Y<,substitute_unset_empty + cat + cat\=substitute_unknown_unset + cat\=substitute_unknown_unset,-substitute_unset_empty + +/a|(b)c/replace=>$2<,substitute_unset_empty + cat + cat\=substitute_unknown_unset + cat\=substitute_unknown_unset,-substitute_unset_empty + +/()()()/use_offset_limit + \=ovector=11000000000 + \=callout_fail=11000000000 + \=callout_fail=1:11000000000 + \=callout_data=11000000000 + \=callout_data=-11000000000 + \=offset_limit=1100000000000000000000 + \=copy=11000000000 + +/(*MARK:A\x00b)/mark + abc + +/(*MARK:A\x00b)/mark,alt_verbnames + abc + +/"(*MARK:A" 00 "b)"/mark,hex + abc + +/"(*MARK:A" 00 "b)"/mark,hex,alt_verbnames + abc + +/efg/hex + +/eff/hex + +/effg/hex + +/(?J)(?'a'))(?'a')/ + +/(?<=((?C)0))/ + 9010 +\= Expect no match + abc + +/aaa/ +\[abc]{10000000000000000000000000000} +\[a]{3} + +/\[AB]{6000000000000000000000}/expand + +# Hex uses pattern length, not zero-terminated. This tests for overrunning +# the given length of a pattern. + +/'(*U'/hex + +/'(*'/hex + +/'('/hex + +//hex + +# These tests are here because Perl never allows a back reference in a +# lookbehind. 
PCRE2 supports some limited cases. + +/([ab])...(?<=\1)z/ + a11az + b11bz +\= Expect no match + b11az + +/(?|([ab]))...(?<=\1)z/ + +/([ab])(\1)...(?<=\2)z/ + aa11az + +/(a\2)(b\1)(?<=\2)/ + +/(?[ab])...(?<=\k'A')z/ + a11az + b11bz +\= Expect no match + b11az + +/(?[ab])...(?<=\k'A')(?)z/dupnames + +# Perl does not support \g+n + +/((\g+1X)?([ab]))+/ + aaXbbXa + +/ab(?C1)c/auto_callout + abc + +/'ab(?C1)c'/hex,auto_callout + abc + +# Perl accepts these, but gives a warning. We can't warn, so give an error. + +/[a-[:digit:]]+/ + a-a9-a + +/[A-[:digit:]]+/ + A-A9-A + +/[a-\d]+/ + a-a9-a + +/(?abc)(?(R)xyz)/B + +/(?abc)(?(R)xyz)/B + +/(?=.*[A-Z])/I + +/()(?<=(?0))/ + +/(?*?\g'0/use_length + +/.>*?\g'0/ + +/{„Í„ÍÍ„Í{'{22{2{{2{'{22{{22{2{'{22{2{{2{{222{{2{'{22{2{22{2{'{22{2{{2{'{22{2{22{2{'{'{22{2{22{2{'{22{2{{2{'{22{2{22{2{'{222{2Ą̈́ÍÍ„Í{'{22{2{{2{'{22{{11{2{'{22{2{{2{{'{22{2{{2{'{22{{22{1{'{22{2{{2{{222{{2{'{22{2{22{2{'{/auto_callout + +// +\=get=i00000000000000000000000000000000 +\=get=i2345678901234567890123456789012,get=i1245678901234567890123456789012 + +"(?(?C))" + +/(?(?(?(?(?(?))))))/ + +/(?<=(?1))((?s))/anchored + +/(*:ab)*/ + +%(*:(:(svvvvvvvvvv:]*[ Z!*;[]*[^[]*!^[+.+{{2,7}' _\\\\\\\\\\\\\)?.:.. *w////\\\Q\\\\\\\\\\\\\\\T\\\\\+/?/////'+\\\EEE?/////'+/*+/[^K]?]//(w)%never_backslash_c,alt_verbnames,auto_callout + +/./newline=crlf + \=ph + +/(\x0e00\000000\xc)/replace=\P,substitute_extended + \x0e00\000000\xc + +//replace=0 + \=offset=7 + +/(?<=\G.)/g,replace=+ + abc + +".+\QX\E+"B,no_auto_possess + +".+\QX\E+"B,auto_callout,no_auto_possess + +# This one is here because Perl gives an 'unmatched )' error which goes away +# if one of the \) sequences is removed - which is weird. PCRE finds it too +# complicated to find a minimum matching length. 
+ +"()X|((((((((()))))))((((())))))\2())((((((\2\2)))\2)(\22((((\2\2)2))\2)))(2\ZZZ)+:)Z^|91ZiZZnter(ZZ |91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z+:)Z|91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z((Z*(\2(Z\':))\0)i|||||||||||||||loZ\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0nte!rnal errpr\2\\21r(2\ZZZ)+:)Z!|91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0(2\ZZZ)+:)Z^|91ZiZZnter(ZZ |91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0(2\ZZZ)+:)Z^)))int \)\0(2\ZZZ)+:)Z^|91ZiZZnter(ZZernZal ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \))\ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)))\2))))((((((\2\2))))))"I + +# This checks that new code for handling groups that may match an empty string +# works on a very large number of alternatives. This pattern used to provoke a +# complaint that it was too complicated. + +/(?:\[A|B|C|D|E|F|G|H|I|J|]{200}Z)/expand + +# This one used to compile rubbish instead of a compile error, and then +# behave unpredictably at match time. 
+ +/.+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X/ + .+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X + +/[:[:alnum:]-[[a:lnum:]+/ + +/((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/ + +/((?(?C'')\Q\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/ + +/abcd/auto_callout + abcd\=callout_error=255:2 + +/()(\g+65534)/ + +/()(\g+65533)/ + +/Á\x00\x00\x00š(\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\x00k\d+\x00‎\x00\x00\x00\x00\x00\2*\x00\x00\1*.){36}int^\x00\x00ÿÿ\x00š(\1{50779}?)J\w2/I + +/(a)(b)\2\1\1\1\1/I + +/(?a)(?b)\g{b}\g{a}\g{a}\g{a}\g{a}(?xx)(?zz)/I,dupnames + +// + \=ovector=7777777777 + +# This is here because Perl matches, even though a COMMIT is encountered +# outside of the recursion. + +/(?1)(A(*COMMIT)|B)D/ + BAXBAD + +"(?1){2}(a)"B + +"(?1){2,4}(a)"B + +# This test differs from Perl for the first subject. Perl ends up with +# $1 set to 'B'; PCRE2 has it unset (which I think is right). + +/^(?: +(?:A| (?:B|B(*ACCEPT)) (?<=(.)) D) +(Z) +)+$/x + AZB + AZBDZ + +# The first of these, when run by Perl, gives the mark 'aa', which is wrong. + +'(?>a(*:aa))b|ac' mark + ac + +'(?:a(*:aa))b|ac' mark + ac + +/(R?){65}/ + (R?){65} + +/\[(a)]{60}/expand + aaaa + +/(?=999)yes)^bc/I + +# This should not be anchored. 
+ +/(?(VERSION>=999)yes|no)^bc/I + +/(*LIMIT_HEAP=0)xxx/I + +/\d{0,3}(*:abc)(?C1)xxx/callout_info + +# ---------------------------------------------------------------------- + +# These are a whole pile of tests that touch lines of code that are not +# used by any other tests (at least when these were created). + +/^a+?x/i,no_start_optimize,no_auto_possess +\= Expect no match + aaa + +/^[^a]{3,}?x/i,no_start_optimize,no_auto_possess +\= Expect no match + bbb + cc + +/^X\S/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\W/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\H/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\h/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\V/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\v/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\h/no_start_optimize,no_auto_possess +\= Expect no match + XY + +/^X\V/no_start_optimize,no_auto_possess +\= Expect no match + X\n + +/^X\v/no_start_optimize,no_auto_possess +\= Expect no match + XX + +/^X.+?/s,no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\R+?/no_start_optimize,no_auto_possess +\= Expect no match + XX + +/^X\H+?/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\h+?/no_start_optimize,no_auto_possess +\= Expect no match + X + +/^X\V+?/no_start_optimize,no_auto_possess +\= Expect no match + X + X\n + +/^X\D+?/no_start_optimize,no_auto_possess +\= Expect no match + X + X9 + +/^X\S+?/no_start_optimize,no_auto_possess +\= Expect no match + X + X\n + +/^X\W+?/no_start_optimize,no_auto_possess +\= Expect no match + X + XX + +/^X.+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n + +/(*CRLF)^X.+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\r\=ps + +/^X\R+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\nX + X\n\r\n + X\n\rY + X\n\nY + X\n\x{0c}Y + +/(*BSR_ANYCRLF)^X\R+?Z/no_start_optimize,no_auto_possess +\= Expect no match 
+ X\nX + X\n\r\n + X\n\rY + X\n\nY + X\n\x{0c}Y + +/^X\H+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\t + XYY + +/^X\h+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\t\t + X\tY + +/^X\V+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n + XYY + +/^X\v+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\n\n + X\nY + +/^X\D+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY9 + XYY + +/^X\d+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X99 + X9Y + +/^X\S+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n + XYY + +/^X\s+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\n\n + X\nY + +/^X\W+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X.A + X++ + +/^X\w+?Z/no_start_optimize,no_auto_possess +\= Expect no match + Xa. + Xaa + +/^X.{1,3}Z/s,no_start_optimize,no_auto_possess +\= Expect no match + Xa.bd + +/^X\h+Z/no_start_optimize,no_auto_possess +\= Expect no match + X\t\t + X\tY + +/^X\V+Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n + XYY + +/^(X(*THEN)Y|AB){0}(?1)/ + ABX +\= Expect no match + XAB + +/^(?!A(?C1)B)C/ + ABC\=callout_error=1,no_jit + +/^(?!A(?C1)B)C/no_start_optimize + ABC\=callout_error=1 + +/^(?(?!A(?C1)B)C)/ + ABC\=callout_error=1 + +# ---------------------------------------------------------------------- + +/[a b c]/BxxI + +/[a b c]/BxxxI + +/[a b c]/B,extended_more + +/[ a b c ]/B,extended_more + +/[a b](?xx: [ 12 ] (?-xx:[ 34 ]) )y z/B + +# Unsetting /x also unsets /xx + +/[a b](?xx: [ 12 ] (?-x:[ 34 ]) )y z/B + +/(a)(?-n:(b))(c)/nB + +# ---------------------------------------------------------------------- +# These test the dangerous PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL option. 
+ +/\j\x{z}\o{82}\L\uabcd\u\U\g{\g/B,\bad_escape_is_literal + +/\N{\c/IB,bad_escape_is_literal + +/[\j\x{z}\o\gAb\g]/B,bad_escape_is_literal + +/[Q-\N]/B,bad_escape_is_literal + +/[\s-_]/bad_escape_is_literal + +/[_-\s]/bad_escape_is_literal + +/[\B\R\X]/B + +/[\B\R\X]/B,bad_escape_is_literal + +/[A-\BP-\RV-\X]/B + +/[A-\BP-\RV-\X]/B,bad_escape_is_literal + +# ---------------------------------------------------------------------- + +/a\b(c/literal + a\\b(c + +/a\b(c/literal,caseless + a\\b(c + a\\B(c + +/a\b(c/literal,firstline + XYYa\\b(c +\= Expect no match + X\na\\b(c + +/a\b?c/literal,use_offset_limit + XXXXa\\b?c\=offset_limit=4 +\= Expect no match + XXXXa\\b?c\=offset_limit=3 + +/a\b(c/literal,anchored,endanchored + a\\b(c +\= Expect no match + Xa\\b(c + a\\b(cX + Xa\\b(cX + +//literal,extended + +/a\b(c/literal,auto_callout,no_start_optimize + XXXXa\\b(c + +/a\b(c/literal,auto_callout + XXXXa\\b(c + +/(*CR)abc/literal + (*CR)abc + +/cat|dog/I,match_word + the cat sat +\= Expect no match + caterpillar + snowcat + syndicate + +/(cat)|dog/I,match_line,literal + (cat)|dog +\= Expect no match + the cat sat + caterpillar + snowcat + syndicate + +/a whole line/match_line,multiline + Rhubarb \na whole line\n custard +\= Expect no match + Not a whole line + +# Perl gets this wrong, failing to capture 'b' in group 1. + +/^(b+|a){1,2}?bc/ + bbc + +# And again here, for the "babc" subject string. + +/^(b*|ba){1,2}?bc/ + babc + bbabc + bababc +\= Expect no match + bababbc + babababc + +/[[:digit:]-a]/ + +/[[:digit:]-[:print:]]/ + +/[\d-a]/ + +/[\H-z]/ + +/[\d-[:print:]]/ + +# Perl gets the second of these wrong, giving no match. 
+ +"(?<=(a))\1?b"I + ab + aaab + +"(?=(a))\1?b"I + ab + aaab + +# JIT does not support callout_extra + +/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess +\= Expect no match + aac\=callout_extra + +/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess +\= Expect no match + aac\=callout_extra + +/\n/firstline + xyz\nabc + +/\nabc/firstline + xyz\nabc + +/\x{0a}abc/firstline,newline=crlf +\= Expect no match + xyz\r\nabc + +/[abc]/firstline +\= Expect no match + \na + +# These tests are matched in test 1 as they are Perl compatible. Here we are +# looking at what does and does not get auto-possessified. + +/(?(DEFINE)(?a?))^(?&optional_a)a$/B + +/(?(DEFINE)(?a?)X)^(?&optional_a)a$/B + +/^(a?)b(?1)a/B + +/^(a?)+b(?1)a/B + +/^(a?)++b(?1)a/B + +/^(a?)+b/B + +/(?=a+)a(a+)++b/B + +/(?<=(?=.){4,5}x)/B + +# Perl behaves differently with these when optimization is turned off + +/a(*PRUNE:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy + +/a(*THEN:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy + +/(?^x-i)AB/ + +/(?^-i)AB/ + +/(?x-i-i)/ + +/(?(?=^))b/I + abc + +/(?(?=^)|)b/I + abc + +/(?(?=^)|^)b/I + bbc +\= Expect no match + abc + +/(?(1)^|^())/I + +/(?(1)^())b/I + +/(?(1)^())+b/I,aftertext + abc + +/(?(1)^()|^)+b/I,aftertext + bbc +\= Expect no match + abc + +/(?(1)^()|^)*b/I,aftertext + bbc + abc + xbc + +/(?(1)^())+b/I,aftertext + abc + +/(?(1)^a()|^a)+b/I,aftertext + abc +\= Expect no match + bbc + +/(?(1)^|^(a))+b/I,aftertext + abc +\= Expect no match + bbc + +/(?(1)^a()|^a)*b/I,aftertext + abc + bbc + xbc + +/a(b)c|xyz/g,allvector,replace=<$0> + abcdefabcpqr\=ovector=4 + abxyz\=ovector=4 + abcdefxyz\=ovector=4 + +/a(b)c|xyz/allvector + abcdef\=ovector=4 + abxyz\=ovector=4 + +/a(b)c|xyz/g,replace=<$0>,substitute_callout + abcdefabcpqr + abxyzpqrabcxyz + 12abc34xyz99abc55\=substitute_stop=2 + 12abc34xyz99abc55\=substitute_skip=1 + 12abc34xyz99abc55\=substitute_skip=2 + +/a(b)c|xyz/g,replace=<$0> + abcdefabcpqr + abxyzpqrabcxyz + 
12abc34xyz\=substitute_stop=2 + 12abc34xyz\=substitute_skip=1 + +/a(b)c|xyz/replace=<$0> + abcdefabcpqr + 12abc34xyz\=substitute_skip=1 + 12abc34xyz\=substitute_stop=1 + +/abc\rdef/ + abc\ndef + +/abc\rdef\x{0d}xyz/escaped_cr_is_lf + abc\ndef\rxyz +\= Expect no match + abc\ndef\nxyz + +/(?(*ACCEPT)xxx)/ + +/(?(*atomic:xx)xxx)/ + +/(?(*script_run:xxx)zzz)/ + +/foobar/ + the foobar thing\=copy_matched_subject + the foobar thing\=copy_matched_subject,zero_terminate + +/foobar/g + the foobar thing foobar again\=copy_matched_subject + +/(*:XX)^abc/I + +/(*COMMIT:XX)^abc/I + +/(*ACCEPT:XX)^abc/I + +/abc/replace=xyz + abc\=null_context + +/abc/replace=xyz,substitute_callout + abc +\= Expect error message + abc\=null_context + +/\[()]{65535}()/expand + +/\[()]{65535}(?)/expand + +/a(?:(*ACCEPT))??bc/ + abc + axy + +/a(*ACCEPT)??bc/ + abc + axy + +/a(*ACCEPT:XX)??bc/mark + abc + axy + +/(*:\)?/ + +/(*:\Q \E){5}/alt_verbnames + +/(?=abc)/I + +/(?|(X)|(XY))\1abc/I + +/(?|(a)|(bcde))(c)\2/I + +/(?|(a)|(bcde))(c)\1/I + +/(?|(?'A'a)|(?'A'bcde))(?'B'c)\k'B'(?'A')/I,dupnames + +/(?|(?'A'a)|(?'A'bcde))(?'B'c)\k'A'(?'A')/I,dupnames + +/((a|)+)+Z/I + +/((?=a))[abcd]/I + +/A(?:(*ACCEPT))?B/info + +/(A(*ACCEPT)??B)C/ + ABC + AXY + +/(?<=(?<=a)b)c.*/I + abc\=ph +\= Expect no match + xbc\=ph + +/(?<=ab)c.*/I + abc\=ph +\= Expect no match + xbc\=ph + +/(?<=a(?<=a|a)c)/I + +/(?<=a(?<=a|ba)c)/I + +/(?<=(?<=a)b)(?.*?\b\1\b){3}/ + word1 word3 word1 word2 word3 word2 word2 word1 word3 word4 + +/\A(*napla:.*\b(\w++))(?>.*?\b\1\b){3}/ + word1 word3 word1 word2 word3 word2 word2 word1 word3 word4 + +/\A(?*.*\b(\w++))(?>.*?\b\1\b){3}/ + word1 word3 word1 word2 word3 word2 word2 word1 word3 word4 + +/(*plb:(.)..|(.)...)(\1|\2)/ + abcdb\=offset=4 + abcda\=offset=4 + +/(*naplb:(.)..|(.)...)(\1|\2)/ + abcdb\=offset=4 + abcda\=offset=4 + +/(?<*(.)..|(.)...)(\1|\2)/ + abcdb\=offset=4 + abcda\=offset=4 + +/(*non_atomic_positive_lookahead:ab)/B + +/(*non_atomic_positive_lookbehind:ab)/B + +/(*pla:ab+)/B + 
+/(*napla:ab+)/B + +/(*napla:)+/ + +/(*naplb:)+/ + +/(*napla:^x|^y)/I + +/(*napla:abc|abd)/I + +/(*napla:a|(.)(*ACCEPT)zz)\1../ + abcd + +/(*napla:a(*ACCEPT)zz|(.))\1../ + abcd + +/(*napla:a|(*COMMIT)(.))\1\1/ + aabc +\= Expect no match + abbc + +/(*napla:a|(.))\1\1/ + aabc + abbc + +# ---- + +# Expect error (recursion => not fixed length) +/(\2)((?=(?<=\1)))/ + +/c*+(?<=[bc])/ + abc\=ph + ab\=ph + abc\=ps + ab\=ps + +/c++(?<=[bc])/ + abc\=ph + ab\=ph + +/(?<=(?=.(?<=x)))/ + abx + ab\=ph + bxyz + xyz + +/\z/ + abc\=ph + abc\=ps + +/\Z/ + abc\=ph + abc\=ps + abc\n\=ph + abc\n\=ps + +/(?![ab]).*/ + ab\=ph + +/c*+/ + ab\=ph,offset=2 + +/\A\s*(a|(?:[^`]{28500}){4})/I + a + +/\A\s*((?:[^`]{28500}){4})/I + +/\A\s*((?:[^`]{28500}){4}|a)/I + a + +/(?a)(?()b)((?<=b).*)/B + +/(?(1)b)((?<=b).*)/B + +/(?(R1)b)((?<=b).*)/B + +/(?(DEFINE)b)((?<=b).*)/B + +/(?(VERSION=10.4)b)((?<=b).*)/B + +/[aA]b[cC]/IB + +/[cc]abcd/I + +/[Cc]abcd/I + +/[c]abcd/I + +/(?:c|C)abcd/I + +/(a)?a/I + manm + +/^(?|(\*)(*napla:\S*_(\2?+.+))|(\w)(?=\S*_(\2?+\1)))+_\2$/ + *abc_12345abc + +/^(?|(\*)(*napla:\S*_(\3?+.+))|(\w)(?=\S*_((\2?+\1))))+_\2$/ + *abc_12345abc + +/^((\1+)(?C)|\d)+133X$/ + 111133X\=callout_capture + +/abc/replace=xyz,substitute_replacement_only + 123abc456 + +/a(?b)c(?d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only + "abcde-abcde-" + +/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only + abcdefabcpqr + abxyzpqrabcxyz + 12abc34xyz99abc55\=substitute_stop=2 + 12abc34xyz99abc55\=substitute_skip=1 + 12abc34xyz99abc55\=substitute_skip=2 + +/a(..)d/replace=>$1<,substitute_matched + xyzabcdxyzabcdxyz + xyzabcdxyzabcdxyz\=ovector=2 +\= Expect error + xyzabcdxyzabcdxyz\=ovector=1 + +/a(..)d/g,replace=>$1<,substitute_matched + xyzabcdxyzabcdxyz + xyzabcdxyzabcdxyz\=ovector=2 +\= Expect error + xyzabcdxyzabcdxyz\=ovector=1 + xyzabcdxyzabcdxyz\=ovector=1,substitute_unset_empty + +/55|a(..)d/g,replace=>$1<,substitute_matched + 
xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty +\= Expect error + xyz55abcdxyzabcdxyz\=ovector=2 + +/55|a(..)d/replace=>$1<,substitute_matched + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + +/55|a(..)d/replace=>$1< + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + +/55|a(..)d/g,replace=>$1< + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + +/abc/replace=,caseless + XabcY + XABCY + +/abc/replace=[4],caseless + XabcY + XABCY + +/abc/replace=*,caseless + XabcY + XABCY + XabcY\=replace= + +# Expect non-fixed-length error + +"(?<=X(?(DEFINE)(.*))(?1))." + +/\sxxx\s/tables=1 +\= Expect no match + AB\x{85}xxx\x{a0}XYZ + +/\sxxx\s/tables=2 + AB\x{85}xxx\x{a0}XYZ + +/^\w+/tables=2 + École + +/^\w+/tables=3 + École + +#loadtables ./testbtables + +/^\w+/tables=3 + École + +/"(*MARK:>" 00 "<).."/hex,mark,no_start_optimize + AB + A\=ph +\= Expect no match + A + +/"(*MARK:>" 00 "<).(?C1)."/hex,mark,no_start_optimize + AB + +/(?(VERSION=0.0/ + +# Perl has made \K in lookarounds an error. At the moment PCRE2 still accepts. + +/(?=a\Kb)ab/ + ab + +/(?!a\Kb)ac/ + ac + +/^abc(?<=b\Kc)d/ + abcd + +/^abc(?(?&NAME_PAT))\s+(?(?&ADDRESS_PAT)) + (?(DEFINE) + (?[a-z]+) + (?\d+) + )/x +/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i + +#save testsaved1 + +# Do it again for some more patterns. + +/(*MARK:A)(*SKIP:B)(C|X)/mark +/(?:(?foo)|(?bar))\k/dupnames + +#save testsaved2 +#pattern -push + +# Reload the patterns, then pop them one by one and check them. + +#load testsaved1 +#load testsaved2 + +#pop info + foofoo + barbar + +#pop mark + C +\= Expect no match + D + +#pop + AmanaplanacanalPanama + +#pop info + metcalfe 33 + +# Check for an error when different tables are used. 
+ +/abc/push,tables=1 +/xyz/push,tables=2 +#save testsaved1 + +#pop + xyz + +#pop + abc + +#pop should give an error + pqr + +/abcd/pushcopy + abcd + +#pop + abcd + +#pop should give an error + +/abcd/push +#popcopy + abcd + +#pop + abcd + +/abcd/push +#save testsaved1 +#pop should give an error + +#load testsaved1 +#popcopy + abcd + +#pop + abcd + +#pop should give an error + +/abcd/pushtablescopy + abcd + +#popcopy + abcd + +#pop + abcd + +# Must only specify one of these + +//push,pushcopy + +//push,pushtablescopy + +//pushcopy,pushtablescopy + +# End of testinput20 diff --git a/src/pcre2/testdata/testinput21 b/src/pcre2/testdata/testinput21 new file mode 100644 index 00000000..1d1fbedf --- /dev/null +++ b/src/pcre2/testdata/testinput21 @@ -0,0 +1,16 @@ +# These are tests of \C that do not involve UTF. They are not run when \C is +# disabled by compiling with --enable-never-backslash-C. + +/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/Bx + +/\D+\C \d+\C \S+\C \s+\C \W+\C \w+\C .+\C \R+\C \H+\C \h+\C \V+\C \v+\C a+\C \n+\C \C+\C/Bx + +/ab\Cde/never_backslash_c + +/ab\Cde/info + abXde + +/(?<=ab\Cde)X/ + abZdeX + +# End of testinput21 diff --git a/src/pcre2/testdata/testinput22 b/src/pcre2/testdata/testinput22 new file mode 100644 index 00000000..5e01fdca --- /dev/null +++ b/src/pcre2/testdata/testinput22 @@ -0,0 +1,107 @@ +# Tests of \C when Unicode support is available. Note that \C is not supported +# for DFA matching in UTF mode, so this test is not run with -dfa. The output +# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match +# in some widths and not in others. + +/ab\Cde/utf,info + abXde + +# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and +# 16-bit modes, but not in 32-bit mode. 
+ +/(?<=ab\Cde)X/utf + ab!deXYZ + +# Autopossessification tests + +/\C+\X \X+\C/Bx + +/\C+\X \X+\C/Bx,utf + +/\C\X*TÓ…; +{0,6}\v+ F +/utf +\= Expect no match + Ó…\x0a + +/\C(\W?Å¿)'?{{/utf +\= Expect no match + \\C(\\W?Å¿)'?{{ + +/X(\C{3})/utf + X\x{1234} + X\x{11234}Y + X\x{11234}YZ + +/X(\C{4})/utf + X\x{1234}YZ + X\x{11234}YZ + X\x{11234}YZW + +/X\C*/utf + XYZabcdce + +/X\C*?/utf + XYZabcde + +/X\C{3,5}/utf + Xabcdefg + X\x{1234} + X\x{1234}YZ + X\x{1234}\x{512} + X\x{1234}\x{512}YZ + X\x{11234}Y + X\x{11234}YZ + X\x{11234}\x{512} + X\x{11234}\x{512}YZ + X\x{11234}\x{512}\x{11234}Z + +/X\C{3,5}?/utf + Xabcdefg + X\x{1234} + X\x{1234}YZ + X\x{1234}\x{512} + X\x{11234}Y + X\x{11234}YZ + X\x{11234}\x{512}YZ + X\x{11234} + +/a\Cb/utf + aXb + a\nb + a\x{100}b + +/a\C\Cb/utf + a\x{100}b + a\x{12257}b + a\x{12257}\x{11234}b + +/ab\Cde/utf + abXde + +# This one is here not because it's different to Perl, but because the way +# the captured single code unit is displayed. (In Perl it becomes a character, +# and you can't tell the difference.) + +/X(\C)(.*)/utf + X\x{1234} + X\nabc + +# This one is here because Perl gives out a grumbly error message (quite +# correctly, but that messes up comparisons). + +/a\Cb/utf +\= Expect no match in 8-bit mode + a\x{100}b + +/^ab\C/utf,no_start_optimize +\= Expect no match - tests \C at end of subject + ab + +/\C[^\v]+\x80/utf + [Aá¿»BÅ€C] + +/\C[^\d]+\x80/utf + [Aá¿»BÅ€C] + +# End of testinput22 diff --git a/src/pcre2/testdata/testinput23 b/src/pcre2/testdata/testinput23 new file mode 100644 index 00000000..d0a9bc4f --- /dev/null +++ b/src/pcre2/testdata/testinput23 @@ -0,0 +1,7 @@ +# This test is run when PCRE2 has been built with --enable-never-backslash-C, +# which disables the use of \C. All we can do is check that it gives the +# correct error message. 
+ +/a\Cb/ + +# End of testinput23 diff --git a/src/pcre2/testdata/testinput24 b/src/pcre2/testdata/testinput24 new file mode 100644 index 00000000..380e23cd --- /dev/null +++ b/src/pcre2/testdata/testinput24 @@ -0,0 +1,396 @@ +# This file tests the auxiliary pattern conversion features of the PCRE2 +# library, in non-UTF mode. + +#forbid_utf +#newline_default lf any anycrlf + +# -------- Tests of glob conversion -------- + +# Set the glob separator explicitly so that different OS defaults are not a +# problem. Then test various errors. + +#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/ + +/abc/posix + +# Separator must be / \ or . + +/a*b/convert_glob_separator=% + +# Can't have separator in a class + +"[ab/cd]" + +"[,-/]" + +/[ab/ + +# Length check + +/abc/convert_length=11 + +/abc/convert_length=12 + +# Now some actual tests + +/a?b[]xy]*c/ + azb]1234c + +# Tests from the gitwildmatch list, with some additions + +/foo/ + foo +/= Expect no match + bar + +// + \ + +/???/ + foo +\= Expect no match + foobar + +/*/ + foo + \ + +/f*/ + foo + f + +/*f/ + oof +\= Expect no match + foo + +/*foo*/ + foo + food + aprilfool + +/*ob*a*r*/ + foobar + +/*ab/ + aaaaaaabababab + +/foo\*/ + foo* + +/foo\*bar/ +\= Expect no match + foobar + +/f\\oo/ + f\\oo + +/*[al]?/ + ball + +/[ten]/ +\= Expect no match + ten + +/t[a-g]n/ + ten + +/a[]]b/ + a]b + +/a[]a-]b/ + +/a[]-]b/ + a-b + a]b +\= Expect no match + aab + +/a[]a-z]b/ + aab + +/]/ + ] + +/t[!a-g]n/ + ton +\= Expect no match + ten + +'[[:alpha:]][[:digit:]][[:upper:]]' + a1B + +'[[:digit:][:upper:][:space:]]' + A + 1 + \ \= +\= Expect no match + a + . 
+ +'[a-c[:digit:]x-z]' + 5 + b + y +\= Expect no match + q + +# End of gitwildmatch tests + +/*.j?g/ + pic01.jpg + .jpg + pic02.jxg +\= Expect no match + pic03.j/g + +/A[+-0]B/ + A+B + A.B + A0B +\= Expect no match + A/B + +/*x?z/ + abc.xyz +\= Expect no match + .xyz + +/?x?z/ + axyz +\= Expect no match + .xyz + +"[,-0]x?z" + ,xyz +\= Expect no match + /xyz + .xyz + +".x*" + .xabc + +/a[--0]z/ + a-z + a.z + a0z +\= Expect no match + a/z + a1z + +/<[a-c-d]>/ + + + + + <-> + +/a[[:digit:].]z/ + a1z + a.z +\= Expect no match + a:z + +/a[[:digit].]z/ + a[.]z + a:.]z + ad.]z + +/<[[:a[:digit:]b]>/ + <[> + <:> + + <9> + +\= Expect no match + + +/a*b/convert_glob_separator=\ + +/a*b/convert_glob_separator=. + +/a*b/convert_glob_separator=/ + +# Non control character checking + +/A\B\\C\D/ + +/\\{}\?\*+\[\]()|.^$/ + +/*a*\/*b*/ + +/?a?\/?b?/ + +/[a\\b\c][]][-][\]\-]/ + +/[^a\\b\c][!]][!-][^\]\-]/ + +/[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:word:][:xdigit:]]/ + +"[/-/]" + +/[-----]/ + +/[------]/ + +/[!------]/ + +/[[:alpha:]-a]/ + +/[[:alpha:]][[:punct:]][[:ascii:]]/ + +/[a-[:alpha:]]/ + +/[[:alpha:/ + +/[[:alpha:]/ + +/[[:alphaa:]]/ + +/[[:xdigi:]]/ + +/[[:xdigit::]]/ + +/****/ + +/**\/abc/ + abc + x/abc + xabc + +/abc\/**/ + +/abc\/**\/abc/ + +/**\/*a*b*g*n*t/ + abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt + +/**\/*a*\/**/ + xx/xx/xx/xax/xx/xb + +/**\/*a*/ + xx/xx/xx/xax + xx/xx/xx/xax/xx + +/**\/*a*\/**\/*b*/ + xx/xx/xx/xax/xx/xb + xx/xx/xx/xax/xx/x + +"**a"convert=glob + a + c/b/a + c/b/aaa + +"a**/b"convert=glob + a/b + ab + +"a/**b"convert=glob + a/b + ab + +#pattern convert=glob:glob_no_starstar + +/***/ + +/**a**/ + +#pattern convert=unset +#pattern convert=glob:glob_no_wild_separator + +/*/ + +/*a*/ + +/**a**/ + +/a*b/ + +/*a*b*/ + +/??a??/ + +#pattern convert=unset +#pattern convert=glob,convert_glob_escape=0 + +/a\b\cd/ + +/**\/a/ + +/a`*b/convert_glob_escape=` + +/a`*b/convert_glob_escape=0 + 
+/a`*b/convert_glob_escape=x + +# -------- Tests of extended POSIX conversion -------- + +#pattern convert=unset:posix_extended + +/<[[:a[:digit:]b]>/ + <[> + <:> + + <9> + +\= Expect no match + + +/a+\1b\\c|d[ab\c]/ + +/<[]bc]>/ + <]> + + + +/<[^]bc]>/ + <.> +\= Expect no match + <]> + + +/(a)\1b/ + a1b +\= Expect no match + aab + +/(ab)c)d]/ + Xabc)d]Y + +/a***b/ + +# -------- Tests of basic POSIX conversion -------- + +#pattern convert=unset:posix_basic + +/a*b+c\+[def](ab)\(cd\)/ + +/\(a\)\1b/ + aab +\= Expect no match + a1b + +/how.to how\.to/ + how\nto how.to +\= Expect no match + how\x{0}to how.to + +/^how to \^how to/ + +/^*abc/ + +/*abc/ + X*abcY + +/**abc/ + XabcY + X*abcY + X**abcY + +/*ab\(*cd\)/ + +/^b\(c^d\)\(^e^f\)/ + +/a***b/ + +# End of testinput24 diff --git a/src/pcre2/testdata/testinput25 b/src/pcre2/testdata/testinput25 new file mode 100644 index 00000000..f21d9ad4 --- /dev/null +++ b/src/pcre2/testdata/testinput25 @@ -0,0 +1,18 @@ +# This file tests the auxiliary pattern conversion features of the PCRE2 +# library, in UTF mode. + +#newline_default lf any anycrlf + +# -------- Tests of glob conversion -------- + +# Set the glob separator explicitly so that different OS defaults are not a +# problem. Then test various errors. + +#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/ + +# The fact that this one works in 13 bytes in the 8-bit library shows that the +# output is in UTF-8, though pcre2test shows the character as an escape. + +/'>' c4 a3 '<'/hex,utf,convert_length=13 + +# End of testinput25 diff --git a/src/pcre2/testdata/testinput3 b/src/pcre2/testdata/testinput3 new file mode 100644 index 00000000..71e95fec --- /dev/null +++ b/src/pcre2/testdata/testinput3 @@ -0,0 +1,104 @@ +# This set of tests checks local-specific features, using the "fr_FR" locale. +# It is not Perl-compatible. When run via RunTest, the locale is edited to +# be whichever of "fr_FR", "french", or "fr" is found to exist. 
There is +# different version of this file called wintestinput3 for use on Windows, +# where the locale is called "french" and the tests are run using +# RunTest.bat. + +#forbid_utf + +/^[\w]+/ +\= Expect no match + École + +/^[\w]+/locale=fr_FR + École + +/^[\w]+/ +\= Expect no match + École + +/^[\W]+/ + École + +/^[\W]+/locale=fr_FR +\= Expect no match + École + +/[\b]/ + \b +\= Expect no match + a + +/[\b]/locale=fr_FR + \b +\= Expect no match + a + +/^\w+/ +\= Expect no match + École + +/^\w+/locale=fr_FR + École + +/(.+)\b(.+)/ + École + +/(.+)\b(.+)/locale=fr_FR +\= Expect no match + École + +/École/i + École +\= Expect no match + école + +/École/i,locale=fr_FR + École + école + +/\w/I + +/\w/I,locale=fr_FR + +# All remaining tests are in the fr_FR locale, so set the default. + +#pattern locale=fr_FR + +/^[\xc8-\xc9]/i + École + école + +/^[\xc8-\xc9]/ + École +\= Expect no match + école + +/\W+/ + >>>\xaa<<< + >>>\xba<<< + +/[\W]+/ + >>>\xaa<<< + >>>\xba<<< + +/[^[:alpha:]]+/ + >>>\xaa<<< + >>>\xba<<< + +/\w+/ + >>>\xaa<<< + >>>\xba<<< + +/[\w]+/ + >>>\xaa<<< + >>>\xba<<< + +/[[:alpha:]]+/ + >>>\xaa<<< + >>>\xba<<< + +/[[:alpha:]][[:lower:]][[:upper:]]/IB + +# End of testinput3 diff --git a/src/pcre2/testdata/testinput4 b/src/pcre2/testdata/testinput4 new file mode 100644 index 00000000..4e2a0abc --- /dev/null +++ b/src/pcre2/testdata/testinput4 @@ -0,0 +1,2498 @@ +# This set of tests is for UTF support, including Unicode properties. The +# Unicode tests are all compatible with all versions of Perl >= 5.10, but +# some of the property tests may differ because of different versions of +# Unicode in use by PCRE2 and Perl. + +# WARNING: Use only / as the pattern delimiter. Although pcre2test supports +# a number of delimiters, all those other than / give problems with the +# perltest.sh script. 
+ +#newline_default lf anycrlf any +#perltest + +/a.b/utf + acb + a\x7fb + a\x{100}b +\= Expect no match + a\nb + +/a(.{3})b/utf + a\x{4000}xyb + a\x{4000}\x7fyb + a\x{4000}\x{100}yb +\= Expect no match + a\x{4000}b + ac\ncb + +/a(.*?)(.)/ + a\xc0\x88b + +/a(.*?)(.)/utf + a\x{100}b + +/a(.*)(.)/ + a\xc0\x88b + +/a(.*)(.)/utf + a\x{100}b + +/a(.)(.)/ + a\xc0\x92bcd + +/a(.)(.)/utf + a\x{240}bcd + +/a(.?)(.)/ + a\xc0\x92bcd + +/a(.?)(.)/utf + a\x{240}bcd + +/a(.??)(.)/ + a\xc0\x92bcd + +/a(.??)(.)/utf + a\x{240}bcd + +/a(.{3})b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b +\= Expect no match + a\x{1234}b + ac\ncb + +/a(.{3,})b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b +\= Expect no match + a\x{1234}b + +/a(.{3,}?)b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b +\= Expect no match + a\x{1234}b + +/a(.{3,5})b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b + axbxxbcdefghijb + axxxxxbcdefghijb +\= Expect no match + a\x{1234}b + axxxxxxbcdefghijb + +/a(.{3,5}?)b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b + axbxxbcdefghijb + axxxxxbcdefghijb +\= Expect no match + a\x{1234}b + axxxxxxbcdefghijb + +/^[a\x{c0}]/utf +\= Expect no match + \x{100} + +/(?<=aXb)cd/utf + aXbcd + +/(?<=a\x{100}b)cd/utf + a\x{100}bcd + +/(?<=a\x{100000}b)cd/utf + a\x{100000}bcd + +/(?:\x{100}){3}b/utf + \x{100}\x{100}\x{100}b +\= Expect no match + \x{100}\x{100}b + +/\x{ab}/utf + \x{ab} + \xc2\xab +\= Expect no match + \x00{ab} + +/(?<=(.))X/utf + WXYZ + \x{256}XYZ +\= Expect no match + XYZ + +/[^a]+/g,utf + bcd + \x{100}aY\x{256}Z + +/^[^a]{2}/utf + \x{100}bc + +/^[^a]{2,}/utf + \x{100}bcAa + +/^[^a]{2,}?/utf + \x{100}bca + +/[^a]+/gi,utf + bcd + 
\x{100}aY\x{256}Z + +/^[^a]{2}/i,utf + \x{100}bc + +/^[^a]{2,}/i,utf + \x{100}bcAa + +/^[^a]{2,}?/i,utf + \x{100}bca + +/\x{100}{0,0}/utf + abcd + +/\x{100}?/utf + abcd + \x{100}\x{100} + +/\x{100}{0,3}/utf + \x{100}\x{100} + \x{100}\x{100}\x{100}\x{100} + +/\x{100}*/utf + abce + \x{100}\x{100}\x{100}\x{100} + +/\x{100}{1,1}/utf + abcd\x{100}\x{100}\x{100}\x{100} + +/\x{100}{1,3}/utf + abcd\x{100}\x{100}\x{100}\x{100} + +/\x{100}+/utf + abcd\x{100}\x{100}\x{100}\x{100} + +/\x{100}{3}/utf + abcd\x{100}\x{100}\x{100}XX + +/\x{100}{3,5}/utf + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + +/\x{100}{3,}/utf + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + +/(?<=a\x{100}{2}b)X/utf,aftertext + Xyyya\x{100}\x{100}bXzzz + +/\D*/utf + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/\D*/utf + \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{1
00}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + +/\D/utf + 1X2 + 1\x{100}2 + +/>\S/utf + > >X Y + > >\x{100} Y + +/\d/utf + \x{100}3 + +/\s/utf + \x{100} X + +/\D+/utf + 12abcd34 +\= Expect no match + 1234 + +/\D{2,3}/utf + 12abcd34 + 12ab34 +\= Expect no match + 1234 + 12a34 + +/\D{2,3}?/utf + 12abcd34 + 12ab34 +\= Expect no match + 1234 + 12a34 + +/\d+/utf + 12abcd34 + +/\d{2,3}/utf + 12abcd34 + 1234abcd +\= Expect no match + 1.4 + +/\d{2,3}?/utf + 12abcd34 + 1234abcd +\= Expect no match + 1.4 + +/\S+/utf + 12abcd34 +\= Expect no match + \ \ + +/\S{2,3}/utf + 12abcd34 + 1234abcd +\= Expect no match + \ \ + +/\S{2,3}?/utf + 12abcd34 + 1234abcd +\= Expect no match + \ \ + +/>\s+ <34 + +/>\s{2,3} \s{2,3}? 
\xff< + +/[\xff]/utf + >\x{ff}< + +/[^\xFF]/ + XYZ + +/[^\xff]/utf + XYZ + \x{123} + +/^[ac]*b/utf +\= Expect no match + xb + +/^[ac\x{100}]*b/utf +\= Expect no match + xb + +/^[^x]*b/i,utf +\= Expect no match + xb + +/^[^x]*b/utf +\= Expect no match + xb + +/^\d*b/utf +\= Expect no match + xb + +/(|a)/g,utf + catac + a\x{256}a + +/^\x{85}$/i,utf + \x{85} + +/^ሴ/utf + ሴ + +/^\ሴ/utf + ሴ + +/(?s)(.{1,5})/utf + abcdefg + ab + +/a*\x{100}*\w/utf + a + +/\S\S/g,utf + A\x{a3}BC + +/\S{2}/g,utf + A\x{a3}BC + +/\W\W/g,utf + +\x{a3}== + +/\W{2}/g,utf + +\x{a3}== + +/\S/g,utf + \x{442}\x{435}\x{441}\x{442} + +/[\S]/g,utf + \x{442}\x{435}\x{441}\x{442} + +/\D/g,utf + \x{442}\x{435}\x{441}\x{442} + +/[\D]/g,utf + \x{442}\x{435}\x{441}\x{442} + +/\W/g,utf + \x{2442}\x{2435}\x{2441}\x{2442} + +/[\W]/g,utf + \x{2442}\x{2435}\x{2441}\x{2442} + +/[\S\s]*/utf + abc\n\r\x{442}\x{435}\x{441}\x{442}xyz + +/[\x{41f}\S]/g,utf + \x{442}\x{435}\x{441}\x{442} + +/.[^\S]./g,utf + abc def\x{442}\x{443}xyz\npqr + +/.[^\S\n]./g,utf + abc def\x{442}\x{443}xyz\npqr + +/[[:^alnum:]]/g,utf + +\x{2442} + +/[[:^alpha:]]/g,utf + +\x{2442} + +/[[:^ascii:]]/g,utf + A\x{442} + +/[[:^blank:]]/g,utf + A\x{442} + +/[[:^cntrl:]]/g,utf + A\x{442} + +/[[:^digit:]]/g,utf + A\x{442} + +/[[:^graph:]]/g,utf + \x19\x{e01ff} + +/[[:^lower:]]/g,utf + A\x{422} + +/[[:^print:]]/g,utf + \x{19}\x{e01ff} + +/[[:^punct:]]/g,utf + A\x{442} + +/[[:^space:]]/g,utf + A\x{442} + +/[[:^upper:]]/g,utf + a\x{442} + +/[[:^word:]]/g,utf + +\x{2442} + +/[[:^xdigit:]]/g,utf + M\x{442} + +/[^ABCDEFGHIJKLMNOPQRSTUVWXYZÀÃÂÃÄÅÆÇÈÉÊËÌÃÃŽÃÃÑÒÓÔÕÖØÙÚÛÜÃÞĀĂĄĆĈĊČĎÄĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİIJĴĶĹĻĽĿÅŃŅŇŊŌŎÅÅ’Å”Å–Å˜ÅšÅœÅžÅ Å¢Å¤Å¦Å¨ÅªÅ¬Å®Å°Å²Å´Å¶Å¸Å¹Å»Å½ÆÆ‚Æ„Æ†Æ‡Æ‰ÆŠÆ‹ÆŽÆÆÆ‘Æ“Æ”Æ–Æ—Æ˜ÆœÆÆŸÆ 
Æ¢Æ¤Æ¦Æ§Æ©Æ¬Æ®Æ¯Æ±Æ²Æ³ÆµÆ·Æ¸Æ¼Ç„LJNJÇÇǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮDZǴǶǷǸǺǼǾȀȂȄȆȈȊȌȎÈȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺȻȽȾÉΆΈΉΊΌΎÎΑΒΓΔΕΖΗΘΙΚΛΜÎΞΟΠΡΣΤΥΦΧΨΩΪΫϒϓϔϘϚϜϞϠϢϤϦϨϪϬϮϴϷϹϺϽϾϿЀÐЂЃЄЅІЇЈЉЊЋЌÐÐŽÐÐБВГДЕЖЗИЙКЛМÐОПРСТУФХЦЧШЩЪЫЬЭЮЯѠѢѤѦѨѪѬѮѰѲѴѶѸѺѼѾҀҊҌҎÒҒҔҖҘҚҜҞҠҢҤҦҨҪҬҮҰҲҴҶҸҺҼҾӀÓÓƒÓ…Ó‡Ó‰Ó‹ÓÓӒӔӖӘӚӜӞӠӢӤӦӨӪӬӮӰӲӴӶӸԀԂԄԆԈԊԌԎԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀÕÕ‚ÕƒÕ„Õ…Õ†Õ‡ÕˆÕ‰ÕŠÕ‹ÕŒÕÕŽÕÕՑՒՓՔՕՖႠႡႢႣႤႥႦႧႨႩႪႫႬႭႮႯႰႱႲႳႴႵႶႷႸႹႺႻႼႽႾႿჀáƒáƒ‚ჃჄჅḀḂḄḆḈḊḌḎá¸á¸’ḔḖḘḚḜḞḠḢḤḦḨḪḬḮḰḲḴḶḸḺḼḾṀṂṄṆṈṊṌṎá¹á¹’ṔṖṘṚṜṞṠṢṤṦṨṪṬṮṰṲṴṶṸṺṼṾẀẂẄẆẈẊẌẎáºáº’ẔẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼẾỀỂỄỆỈỊỌỎá»á»’ỔỖỘỚỜỞỠỢỤỦỨỪỬỮỰỲỴỶỸἈἉἊἋἌá¼á¼Žá¼á¼˜á¼™á¼šá¼›á¼œá¼á¼¨á¼©á¼ªá¼«á¼¬á¼­á¼®á¼¯á¼¸á¼¹á¼ºá¼»á¼¼á¼½á¼¾á¼¿á½ˆá½‰á½Šá½‹á½Œá½á½™á½›á½á½Ÿá½¨á½©á½ªá½«á½¬á½­á½®á½¯á¾¸á¾¹á¾ºá¾»á¿ˆá¿‰á¿Šá¿‹á¿˜á¿™á¿šá¿›á¿¨á¿©á¿ªá¿«á¿¬á¿¸á¿¹á¿ºá¿»abcdefghijklmnopqrstuvwxyzªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿÄăąćĉċÄÄđēĕėęěÄğġģĥħĩīĭįıijĵķĸĺļľŀłńņňʼnŋÅÅőœŕŗřśÅÅŸÅ¡Å£Å¥Å§Å©Å«Å­Å¯Å±Å³ÅµÅ·ÅºÅ¼Å¾Å¿Æ€ÆƒÆ…ÆˆÆŒÆÆ’ƕƙƚƛƞơƣƥƨƪƫƭưƴƶƹƺƽƾƿdžljnjǎÇǒǔǖǘǚǜÇǟǡǣǥǧǩǫǭǯǰdzǵǹǻǽǿÈȃȅȇȉȋÈÈȑȓȕȗșțÈȟȡȣȥȧȩȫȭȯȱȳȴȵȶȷȸȹȼȿɀÉɑɒɓɔɕɖɗɘəɚɛɜÉɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹɺɻɼɽɾɿʀÊʂʃʄʅʆʇʈʉʊʋʌÊÊŽÊÊʑʒʓʔʕʖʗʘʙʚʛʜÊʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭʮʯÎάέήίΰαβγδεζηθικλμνξοπÏςστυφχψωϊϋόÏÏŽÏϑϕϖϗϙϛÏϟϡϣϥϧϩϫϭϯϰϱϲϳϵϸϻϼабвгдежзийклмнопрÑтуфхцчшщъыьÑÑŽÑÑёђѓєѕіїјљњћќÑўџѡѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿÒÒ‹ÒÒÒ‘Ò“Ò•Ò—Ò™Ò›ÒҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎӑӓӕӗәӛÓÓŸÓ¡Ó£Ó¥Ó§Ó©Ó«Ó­Ó¯Ó±Ó³ÓµÓ·Ó¹ÔÔƒÔ…Ô‡Ô‰Ô‹ÔÔÕ¡Õ¢Õ£Õ¤Õ¥Õ¦Õ§Õ¨Õ©ÕªÕ«Õ¬Õ­Õ®Õ¯Õ°Õ±Õ²Õ³Õ´ÕµÕ¶Õ·Õ¸Õ¹ÕºÕ»Õ¼Õ½Õ¾Õ¿Ö€Öւփքօֆևᴀá´á´‚ᴃᴄᴅᴆᴇᴈᴉᴊᴋᴌá´á´Žá´á´á´‘ᴒᴓᴔᴕᴖᴗᴘᴙᴚᴛᴜá´á´žá´Ÿá´ 
á´¡á´¢á´£á´¤á´¥á´¦á´§á´¨á´©á´ªá´«áµ¢áµ£áµ¤áµ¥áµ¦áµ§áµ¨áµ©áµªáµ«áµ¬áµ­áµ®áµ¯áµ°áµ±áµ²áµ³áµ´áµµáµ¶áµ·áµ¹áµºáµ»áµ¼áµ½áµ¾áµ¿á¶€á¶á¶‚ᶃᶄᶅᶆᶇᶈᶉᶊᶋᶌá¶á¶Žá¶á¶á¶‘ᶒᶓᶔᶕᶖᶗᶘᶙᶚá¸á¸ƒá¸…ḇḉḋá¸á¸á¸‘ḓḕḗḙḛá¸á¸Ÿá¸¡á¸£á¸¥á¸§á¸©á¸«á¸­á¸¯á¸±á¸³á¸µá¸·á¸¹á¸»á¸½á¸¿á¹á¹ƒá¹…ṇṉṋá¹á¹á¹‘ṓṕṗṙṛá¹á¹Ÿá¹¡á¹£á¹¥á¹§á¹©á¹«á¹­á¹¯á¹±á¹³á¹µá¹·á¹¹á¹»á¹½á¹¿áºáºƒáº…ẇẉẋáºáºáº‘ẓẕẖẗẘẙẚẛạảấầẩẫậắằẳẵặẹẻẽếá»á»ƒá»…ệỉịá»á»á»‘ồổỗộớá»á»Ÿá»¡á»£á»¥á»§á»©á»«á»­á»¯á»±á»³á»µá»·á»¹á¼€á¼á¼‚ἃἄἅἆἇá¼á¼‘ἒἓἔἕἠἡἢἣἤἥἦἧἰἱἲἳἴἵἶἷὀá½á½‚ὃὄὅá½á½‘ὒὓὔὕὖὗὠὡὢὣὤὥὦὧὰάὲέὴήὶίὸόὺύὼώᾀá¾á¾‚ᾃᾄᾅᾆᾇá¾á¾‘ᾒᾓᾔᾕᾖᾗᾠᾡᾢᾣᾤᾥᾦᾧᾰᾱᾲᾳᾴᾶᾷιῂῃῄῆῇá¿á¿‘ῒΐῖῗῠῡῢΰῤῥῦῧῲῳῴῶῷâ²â²ƒâ²…ⲇⲉⲋâ²â²â²‘ⲓⲕⲗⲙⲛâ²â²Ÿâ²¡â²£â²¥â²§â²©â²«â²­â²¯â²±â²³â²µâ²·â²¹â²»â²½â²¿â³â³ƒâ³…ⳇⳉⳋâ³â³â³‘ⳓⳕⳗⳙⳛâ³â³Ÿâ³¡â³£â³¤â´€â´â´‚ⴃⴄⴅⴆⴇⴈⴉⴊⴋⴌâ´â´Žâ´â´â´‘ⴒⴓⴔⴕⴖⴗⴘⴙⴚⴛⴜâ´â´žâ´Ÿâ´ â´¡â´¢â´£â´¤â´¥ï¬€ï¬ï¬‚ffifflſtstﬓﬔﬕﬖﬗ\d_^]/utf + +/^[^d]*?$/ + abc + +/^[^d]*?$/utf + abc + +/^[^d]*?$/i + abc + +/^[^d]*?$/i,utf + abc + +/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/utf + +/^[a\x{c0}]b/utf + \x{c0}b + +/^([a\x{c0}]*?)aa/utf + a\x{c0}aaaa/ + +/^([a\x{c0}]*?)aa/utf + a\x{c0}aaaa/ + a\x{c0}a\x{c0}aaa/ + +/^([a\x{c0}]*)aa/utf + a\x{c0}aaaa/ + a\x{c0}a\x{c0}aaa/ + +/^([a\x{c0}]*)a\x{c0}/utf + a\x{c0}aaaa/ + a\x{c0}a\x{c0}aaa/ + +/A*/g,utf + AAB\x{123}BAA + +/(abc)\1/i,utf +\= Expect no match + abc + +/(abc)\1/utf +\= Expect no match + abc + +/a(*:a\x{1234}b)/utf,mark + abc + +/a(*:a£b)/utf,mark + abc + +# Noncharacters + +/./utf + \x{fffe} + \x{ffff} + \x{1fffe} + \x{1ffff} + \x{2fffe} + \x{2ffff} + \x{3fffe} + \x{3ffff} + \x{4fffe} + \x{4ffff} + \x{5fffe} + \x{5ffff} + \x{6fffe} + \x{6ffff} + \x{7fffe} + \x{7ffff} + \x{8fffe} + \x{8ffff} + \x{9fffe} + \x{9ffff} + \x{afffe} + \x{affff} + \x{bfffe} + \x{bffff} + \x{cfffe} + \x{cffff} + \x{dfffe} + \x{dffff} + \x{efffe} + \x{effff} + \x{ffffe} + \x{fffff} + \x{10fffe} + \x{10ffff} + \x{fdd0} + \x{fdd1} + \x{fdd2} + \x{fdd3} + \x{fdd4} + \x{fdd5} + \x{fdd6} + \x{fdd7} + \x{fdd8} + \x{fdd9} + \x{fdda} + \x{fddb} + \x{fddc} + \x{fddd} + \x{fdde} + \x{fddf} + \x{fde0} + \x{fde1} + \x{fde2} + 
\x{fde3} + \x{fde4} + \x{fde5} + \x{fde6} + \x{fde7} + \x{fde8} + \x{fde9} + \x{fdea} + \x{fdeb} + \x{fdec} + \x{fded} + \x{fdee} + \x{fdef} + +/^\d*\w{4}/utf + 1234 +\= Expect no match + 123 + +/^[^b]*\w{4}/utf + aaaa +\= Expect no match + aaa + +/^[^b]*\w{4}/i,utf + aaaa +\= Expect no match + aaa + +/^\x{100}*.{4}/utf + \x{100}\x{100}\x{100}\x{100} +\= Expect no match + \x{100}\x{100}\x{100} + +/^\x{100}*.{4}/i,utf + \x{100}\x{100}\x{100}\x{100} +\= Expect no match + \x{100}\x{100}\x{100} + +/^a+[a\x{200}]/utf + aa + +/^.\B.\B./utf + \x{10123}\x{10124}\x{10125} + +/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/utf + #\x{10000}#\x{100}#\x{10ffff}# + +# Unicode property support tests + +/^\pC\pL\pM\pN\pP\pS\pZ\s+/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} + +/^>\pZ+/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} + +/^>[[:space:]]*/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} + +/^>[[:blank:]]*/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2000}\x{202f}\x{9}\x{b}\x{2028} + +/^[[:alpha:]]*/utf,ucp + Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d} + +/^[[:alnum:]]*/utf,ucp + Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee} + +/^[[:cntrl:]]*/utf,ucp + \x{0}\x{09}\x{1f}\x{7f}\x{9f} + +/^[[:graph:]]*/utf,ucp + A\x{a1}\x{a0} + +/^[[:print:]]*/utf,ucp + A z\x{a0}\x{a1} + +/^[[:punct:]]*/utf,ucp + .+\x{a1}\x{a0} + +/\p{Zs}*?\R/ +\= Expect no match + a\xFCb + +/\p{Zs}*\R/ +\= Expect no match + a\xFCb + +/â±¥/i,utf + â±¥ + Ⱥx + Ⱥ + +/[â±¥]/i,utf + â±¥ + Ⱥx + Ⱥ + +/Ⱥ/i,utf + Ⱥ + â±¥ + +# These are tests for extended grapheme clusters + +/^\X/utf,aftertext + G\x{34e}\x{34e}X + \x{34e}\x{34e}X + \x04X + \x{1100}X + \x{1100}\x{34e}X + \x{1b04}\x{1b04}X + *These match up to the roman letters + \x{1111}\x{1111}L,L + \x{1111}\x{1111}\x{1169}L,L,V + \x{1111}\x{ae4c}L, LV + \x{1111}\x{ad89}L, LVT + \x{1111}\x{ae4c}\x{1169}L, LV, V + \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V + 
\x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T + \x{1111}\x{ad89}\x{11fe}L, LVT, T + \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T + \x{ad89}\x{11fe}\x{11fe}LVT, T, T + *These match just the first codepoint (invalid sequence) + \x{1111}\x{11fe}L, T + \x{ae4c}\x{1111}LV, L + \x{ae4c}\x{ae4c}LV, LV + \x{ae4c}\x{ad89}LV, LVT + \x{1169}\x{1111}V, L + \x{1169}\x{ae4c}V, LV + \x{1169}\x{ad89}V, LVT + \x{ad89}\x{1111}LVT, L + \x{ad89}\x{1169}LVT, V + \x{ad89}\x{ae4c}LVT, LV + \x{ad89}\x{ad89}LVT, LVT + \x{11fe}\x{1111}T, L + \x{11fe}\x{1169}T, V + \x{11fe}\x{ae4c}T, LV + \x{11fe}\x{ad89}T, LVT + *Test extend and spacing mark + \x{1111}\x{ae4c}\x{0711}L, LV, extend + \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark + \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark + *Test CR, LF, and control + \x0d\x{0711}CR, extend + \x0d\x{1b04}CR, spacingmark + \x0a\x{0711}LF, extend + \x0a\x{1b04}LF, spacingmark + \x0b\x{0711}Control, extend + \x09\x{1b04}Control, spacingmark + *There are no Prepend characters, so we can't test Prepend, CR + +/^(?>\X{2})X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + +/^\X{2,4}X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + +/^\X{2,4}?X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + +/\X*Z/utf,no_start_optimize +\= Expect no match + A\x{300} + +/\X*(.)/utf,no_start_optimize + A\x{1111}\x{ae4c}\x{1169} + +# -------------------------------------------- + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + +/[z\x{1e9e}]+/i,utf + \x{1e9e}\x{00df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + +/[z\x{00df}]+/i,utf + \x{1e9e}\x{00df} + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + +/[z\x{1f88}]+/i,utf + \x{1f88}\x{1f80} + +# Check a reference with more than one other case + +/^(\x{00b5})\1{2}$/i,utf + 
\x{00b5}\x{039c}\x{03bc} + +# Characters with more than one other case; test in classes + +/[z\x{00b5}]+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/[z\x{039c}]+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/[z\x{03bc}]+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/[z\x{00c5}]+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/[z\x{00e5}]+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/[z\x{212b}]+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/[z\x{01c4}]+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/[z\x{01c5}]+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/[z\x{01c6}]+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/[z\x{01c7}]+/i,utf + \x{01c7}\x{01c8}\x{01c9} + +/[z\x{01c8}]+/i,utf + \x{01c7}\x{01c8}\x{01c9} + +/[z\x{01c9}]+/i,utf + \x{01c7}\x{01c8}\x{01c9} + +/[z\x{01ca}]+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/[z\x{01cb}]+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/[z\x{01cc}]+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/[z\x{01f1}]+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/[z\x{01f2}]+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/[z\x{01f3}]+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/[z\x{0345}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/[z\x{0399}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/[z\x{03b9}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/[z\x{1fbe}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/[z\x{0392}]+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/[z\x{03b2}]+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/[z\x{03d0}]+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/[z\x{0395}]+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/[z\x{03b5}]+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/[z\x{03f5}]+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/[z\x{0398}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/[z\x{03b8}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/[z\x{03d1}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/[z\x{03f4}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/[z\x{039a}]+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/[z\x{03ba}]+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/[z\x{03f0}]+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/[z\x{03a0}]+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/[z\x{03c0}]+/i,utf + 
\x{03a0}\x{03c0}\x{03d6} + +/[z\x{03d6}]+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/[z\x{03a1}]+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/[z\x{03c1}]+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/[z\x{03f1}]+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/[z\x{03a3}]+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/[z\x{03c2}]+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/[z\x{03c3}]+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/[z\x{03a6}]+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/[z\x{03c6}]+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/[z\x{03d5}]+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/[z\x{03c9}]+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/[z\x{03a9}]+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/[z\x{2126}]+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/[z\x{1e60}]+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +/[z\x{1e61}]+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +/[z\x{1e9b}]+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +# Perl 5.12.4 gets these wrong, but 5.15.3 is OK + +/[z\x{004b}]+/i,utf + \x{004b}\x{006b}\x{212a} + +/[z\x{006b}]+/i,utf + \x{004b}\x{006b}\x{212a} + +/[z\x{212a}]+/i,utf + \x{004b}\x{006b}\x{212a} + +/[z\x{0053}]+/i,utf + \x{0053}\x{0073}\x{017f} + +/[z\x{0073}]+/i,utf + \x{0053}\x{0073}\x{017f} + +/[z\x{017f}]+/i,utf + \x{0053}\x{0073}\x{017f} + +# -------------------------------------- + +/(ΣΆΜΟΣ) \1/i,utf + ΣΆΜΟΣ ΣΆΜΟΣ + ΣΆΜΟΣ σάμος + σάμος σάμος + σάμος σάμοσ + σάμος ΣΆΜΟΣ + +/(σάμος) \1/i,utf + ΣΆΜΟΣ ΣΆΜΟΣ + ΣΆΜΟΣ σάμος + σάμος σάμος + σάμος σάμοσ + σάμος ΣΆΜΟΣ + +/(ΣΆΜΟΣ) \1*/i,utf + ΣΆΜΟΣ\x20 + ΣΆΜΟΣ ΣΆΜΟΣσάμοςσάμος + +# Perl matches these + +/\x{00b5}+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/\x{039c}+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/\x{03bc}+/i,utf + \x{00b5}\x{039c}\x{03bc} + + +/\x{00c5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/\x{00e5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/\x{212b}+/i,utf + \x{00c5}\x{00e5}\x{212b} + + +/\x{01c4}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/\x{01c5}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/\x{01c6}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + + +/\x{01c7}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + +/\x{01c8}+/i,utf + 
\x{01c7}\x{01c8}\x{01c9} + +/\x{01c9}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + + +/\x{01ca}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/\x{01cb}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/\x{01cc}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + + +/\x{01f1}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/\x{01f2}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/\x{01f3}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + + +/\x{0345}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{0399}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{03b9}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{1fbe}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + + +/\x{0392}+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/\x{03b2}+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/\x{03d0}+/i,utf + \x{0392}\x{03b2}\x{03d0} + + +/\x{0395}+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/\x{03b5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/\x{03f5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + + +/\x{0398}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{03b8}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{03d1}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{03f4}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + + +/\x{039a}+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/\x{03ba}+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/\x{03f0}+/i,utf + \x{039a}\x{03ba}\x{03f0} + + +/\x{03a0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/\x{03c0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/\x{03d6}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + + +/\x{03a1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/\x{03c1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/\x{03f1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + + +/\x{03a3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/\x{03c2}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/\x{03c3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + + +/\x{03a6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/\x{03c6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/\x{03d5}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + + +/\x{03c9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/\x{03a9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/\x{2126}+/i,utf + \x{03c9}\x{03a9}\x{2126} + + +/\x{1e60}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 
+/\x{1e61}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e9b}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + +/\x{1f80}+/i,utf + \x{1f88}\x{1f80} + +# Perl 5.12.4 gets these wrong, but 5.15.3 is OK + +/\x{004b}+/i,utf + \x{004b}\x{006b}\x{212a} + +/\x{006b}+/i,utf + \x{004b}\x{006b}\x{212a} + +/\x{212a}+/i,utf + \x{004b}\x{006b}\x{212a} + + +/\x{0053}+/i,utf + \x{0053}\x{0073}\x{017f} + +/\x{0073}+/i,utf + \x{0053}\x{0073}\x{017f} + +/\x{017f}+/i,utf + \x{0053}\x{0073}\x{017f} + +/^\p{Any}*\d{4}/utf + 1234 +\= Expect no match + 123 + +/^\X*\w{4}/utf + 1234 +\= Expect no match + 123 + +/^A\s+Z/utf,ucp + A\x{2005}Z + A\x{85}\x{2005}Z + +/^A[\s]+Z/utf,ucp + A\x{2005}Z + A\x{85}\x{2005}Z + +/^[[:graph:]]+$/utf,ucp + Letter:ABC + Mark:\x{300}\x{1d172}\x{1d17b} + Number:9\x{660} + Punctuation:\x{66a},; + Symbol:\x{6de}<>\x{fffc} + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + \x{feff} + \x{fff9}\x{fffa}\x{fffb} + \x{110bd} + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + \x{e0001} + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} +\= Expect no match + \x{09} + \x{0a} + \x{1D} + \x{20} + \x{85} + \x{a0} + \x{1680} + \x{2028} + \x{2029} + \x{202f} + \x{2065} + \x{3000} + \x{e0002} + \x{e001f} + \x{e0080} + +/^[[:print:]]+$/utf,ucp + Space: \x{a0} + \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} + \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} + \x{202f}\x{205f} + \x{3000} + Letter:ABC + Mark:\x{300}\x{1d172}\x{1d17b} + Number:9\x{660} + Punctuation:\x{66a},; + Symbol:\x{6de}<>\x{fffc} + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + 
\x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + \x{202f} + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + \x{feff} + \x{fff9}\x{fffa}\x{fffb} + \x{110bd} + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + \x{e0001} + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} +\= Expect no match + \x{09} + \x{1D} + \x{85} + \x{2028} + \x{2029} + \x{2065} + \x{e0002} + \x{e001f} + \x{e0080} + +/^[[:punct:]]+$/utf,ucp + \$+<=>^`|~ + !\"#%&'()*,-./:;?@[\\]_{} + \x{a1}\x{a7} + \x{37e} +\= Expect no match + abcde + +/^[[:^graph:]]+$/utf,ucp + \x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{1680} + \x{2028}\x{2029}\x{202f}\x{2065} + \x{3000}\x{e0002}\x{e001f}\x{e0080} +\= Expect no match + Letter:ABC + Mark:\x{300}\x{1d172}\x{1d17b} + Number:9\x{660} + Punctuation:\x{66a},; + Symbol:\x{6de}<>\x{fffc} + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + \x{feff} + \x{fff9}\x{fffa}\x{fffb} + \x{110bd} + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + \x{e0001} + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} + +/^[[:^print:]]+$/utf,ucp + \x{09}\x{1D}\x{85}\x{2028}\x{2029}\x{2065} + \x{e0002}\x{e001f}\x{e0080} +\= Expect no match + Space: \x{a0} + \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} + \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} + \x{202f}\x{205f} + \x{3000} + Letter:ABC + Mark:\x{300}\x{1d172}\x{1d17b} + Number:9\x{660} + Punctuation:\x{66a},; + Symbol:\x{6de}<>\x{fffc} + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + \x{202f} + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + \x{feff} + 
\x{fff9}\x{fffa}\x{fffb} + \x{110bd} + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + \x{e0001} + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} + +/^[[:^punct:]]+$/utf,ucp + abcde +\= Expect no match + \$+<=>^`|~ + !\"#%&'()*,-./:;?@[\\]_{} + \x{a1}\x{a7} + \x{37e} + +/[RST]+/i,utf,ucp + Ss\x{17f} + +/[R-T]+/i,utf,ucp + Ss\x{17f} + +/[q-u]+/i,utf,ucp + Ss\x{17f} + +/^s?c/im,utf + scat + +# The next four tests are for repeated caseless back references when the +# code unit length of the matched text is different to that of the original +# group in the UTF-8 case. + +/^(\x{23a})\1*(.)/i,utf + \x{23a}\x{23a}\x{23a}\x{23a} + \x{23a}\x{2c65}\x{2c65}\x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + +/^(\x{23a})\1*(..)/i,utf + \x{23a}\x{2c65}\x{2c65}\x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + +/^(\x{23a})\1*(...)/i,utf + \x{23a}\x{2c65}\x{2c65}\x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + +/^(\x{23a})\1*(....)/i,utf +\= Expect no match + \x{23a}\x{2c65}\x{2c65}\x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + +/[A-`]/i,utf + abcdefghijklmno + +/[\S\V\H]/utf + +/[^\p{Any}]*+x/utf + x + +/[[:punct:]]/utf,ucp + \x{b4} + +/[[:^ascii:]]/utf,ucp + \x{100} + \x{200} + \x{300} + \x{37e} +\= Expect no match + aa + 99 + +/[[:^ascii:]\w]/utf,ucp + aa + 99 + gg + \x{100} + \x{200} + \x{300} + \x{37e} + +/[\w[:^ascii:]]/utf,ucp + aa + 99 + gg + \x{100} + \x{200} + \x{300} + \x{37e} + +/[^[:ascii:]\W]/utf,ucp + \x{100} + \x{200} +\= Expect no match + aa + 99 + gg + \x{37e} + +/[^[:^ascii:]\d]/utf,ucp + a + ~ + \a + \x{7f} +\= Expect no match + 0 + \x{389} + \x{20ac} + +/(?=.*b)\pL/ + 11bb + +/(?(?=.*b)(?=.*b)\pL|.*c)/ + 11bb + +/^\x{123}+?$/utf,no_auto_possess + \x{123}\x{123}\x{123} + +/^\x{123}+?$/i,utf,no_auto_possess + \x{123}\x{122}\x{123} +\= Expect no match + \x{123}\x{124}\x{123} + +/\N{U+1234}/utf + \x{1234} + +/[\N{U+1234}]/utf + \x{1234} + +# Test the full list of Unicode "Pattern White Space" characters that are to +# be ignored by /x. 
The pattern lines below may show up oddly in text editors +# or when listed to the screen. Note that characters such as U+2002, which are +# matched as space by \h and \v are *not* "Pattern White Space". + +/A…‎â€â€¨â€©B/x,utf + AB + +/A B/x,utf + A\x{2002}B +\= Expect no match + AB + +# ------- + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}ABC]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/i,utf + \x{99}\x{99}\x{99} + +# Script run tests + +/^(*script_run:.{4})/utf + abcd Latin x4 + \x{2e80}\x{2fa1d}\x{3041}\x{30a1} Han Han Hiragana Katakana + \x{3041}\x{30a1}\x{3007}\x{3007} Hiragana Katakana Han Han + \x{30a1}\x{3041}\x{3007}\x{3007} Katakana Hiragana Han Han + \x{1100}\x{2e80}\x{2e80}\x{1101} Hangul Han Han Hangul + \x{2e80}\x{3105}\x{2e80}\x{3105} Han Bopomofo Han Bopomofo + \x{02ea}\x{2e80}\x{2e80}\x{3105} Bopomofo-Sk Han Han Bopomofo + \x{3105}\x{2e80}\x{2e80}\x{3105} Bopomofo Han Han Bopomofo + \x{0300}cd! Inherited Latin Latin Common + \x{0391}12\x{03a9} Greek Common-digits Greek + \x{0400}12\x{fe2f} Cyrillic Common-digits Cyrillic + \x{0531}12\x{fb17} Armenian Common-digits Armenian + \x{0591}12\x{fb4f} Hebrew Common-digits Hebrew + \x{0600}12\x{1eef1} Arabic Common-digits Arabic + \x{0600}\x{0660}\x{0669}\x{1eef1} Arabic Arabic-digits Arabic + \x{0700}12\x{086a} Syriac Common-digits Syriac + \x{1200}12\x{ab2e} Ethiopic Common-digits Ethiopic + \x{1680}12\x{169c} Ogham Common-digits Ogham + \x{3041}12\x{3041} Hiragana Common-digits Hiragana + \x{0980}\x{09e6}\x{09e7}\x{0993} Bengali Bengali-digits Bengali + !cde Common Latin Latin Latin + A..B Latin Common Common Latin + 0abc Ascii-digit Latin Latin Latin + 1\x{0700}\x{0700}\x{0700} Ascii-digit Syriac x 3 + \x{1A80}\x{1A80}\x{1a40}\x{1a41} Tai Tham Hora digits, letters +\= Expect no match + a\x{370}bcd Latin Greek Latin Latin + \x{1100}\x{02ea}\x{02ea}\x{02ea} Hangul Bopomofo x3 + \x{02ea}\x{02ea}\x{02ea}\x{1100} Bopomofo x3 Hangul + 
\x{1100}\x{2e80}\x{3041}\x{1101} Hangul Han Hiragana Hangul + \x{0391}\x{09e6}\x{09e7}\x{03a9} Greek Bengali digits Greek + \x{0600}7\x{0669}\x{1eef1} Arabic ascii-digit Arabic-digit Arabic + \x{0600}\x{0669}7\x{1eef1} Arabic Arabic-digit ascii-digit Arabic + A5\x{ff19}B Latin Common-ascii/notascii-digits Latin + \x{0300}cd\x{0391} Inherited Latin Latin Greek + !cd\x{0391} Common Latin Latin Greek + \x{1A80}\x{1A90}\x{1a40}\x{1a41} Tai Tham Hora digit, Tham digit, letters + A\x{1d7ce}\x{1d7ff}B Common fancy-common-2-sets-digits Common + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + +/^(*sr:.{4}|..)/utf + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + +/^(*atomic_script_run:.{4}|..)/utf +\= Expect no match + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + +/^(*asr:.*)/utf +\= Expect no match + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + +/^(?>(*sr:.*))/utf + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + +/^(*sr:.*)/utf + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + \x{10fffd}\x{10fffd}\x{10fffd} Private use (Unknown) + +/^(*sr:\x{2e80}*)/utf + \x{2e80}\x{2e80}\x{3105} Han Han Bopomofo + +/^(*sr:\x{2e80}*)\x{2e80}/utf + \x{2e80}\x{2e80}\x{3105} Han Han Bopomofo + +/^(*sr:.*)Test/utf + Test script run on an empty string + +/^(*sr:(.{2})){2}/utf + \x{0600}7\x{0669}\x{1eef1} Arabic ascii-digit Arabic-digit Arabic + \x{1A80}\x{1A80}\x{1a40}\x{1a41} Tai Tham Hora digits, letters + \x{1A80}\x{1a40}\x{1A90}\x{1a41} Tai Tham Hora digit, letter, Tham digit, letter +\= Expect no match + \x{1100}\x{2e80}\x{3041}\x{1101} Hangul Han Hiragana Hangul + +/^(*sr:\S*)/utf + \x{1cf4}\x{20f0}\x{900}\x{11305} [Dev,Gran,Kan] [Dev,Gran,Lat] Dev Gran + \x{1cf4}\x{20f0}\x{11305}\x{900} [Dev,Gran,Kan] [Dev,Gran,Lat] Gran Dev + \x{1cf4}\x{20f0}\x{900}ABC [Dev,Gran,Kan] [Dev,Gran,Lat] Dev Lat + \x{1cf4}\x{20f0}ABC [Dev,Gran,Kan] [Dev,Gran,Lat] Lat + \x{20f0}ABC [Dev,Gran,Lat] Lat + XYZ\x{20f0}ABC Lat 
[Dev,Gran,Lat] Lat + \x{a36}\x{a33}\x{900} [Dev,...] [Dev,...] Dev + \x{3001}\x{2e80}\x{3041}\x{30a1} [Bopo, Han, etc] Han Hira Kata + \x{3001}\x{30a1}\x{2e80}\x{3041} [Bopo, Han, etc] Kata Han Hira + \x{3001}\x{3105}\x{2e80}\x{1101} [Bopo, Han, etc] Bopomofo Han Hangul + \x{3105}\x{3001}\x{2e80}\x{1101} Bopomofo [Bopo, Han, etc] Han Hangul + \x{3031}\x{3041}\x{30a1}\x{2e80} [Hira Kata] Hira Kata Han + \x{060c}\x{06d4}\x{0600}\x{10d00}\x{0700} [Arab Rohg Syrc Thaa] [Arab Rohg] Arab Rohg Syrc + \x{060c}\x{06d4}\x{0700}\x{0600}\x{10d00} [Arab Rohg Syrc Thaa] [Arab Rohg] Syrc Arab Rohg + \x{2e80}\x{3041}\x{3001}\x{3031}\x{2e80} Han Hira [Bopo, Han, etc] [Hira Kata] Han + +/(?[[:blank:]]*/utf,ucp + >\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028} + +/^A\s+Z/utf,ucp + A\x{85}\x{180e}\x{2005}Z + +/^A[\s]+Z/utf,ucp + A\x{2005}Z + A\x{85}\x{2005}Z + +/^[[:graph:]]+$/utf,ucp +\= Expect no match + \x{180e} + +/^[[:print:]]+$/utf,ucp + \x{180e} + +/^[[:^graph:]]+$/utf,ucp + \x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{61c}\x{1680}\x{180e} + +/^[[:^print:]]+$/utf,ucp +\= Expect no match + \x{180e} + +# End of U+180E tests. 
+ +# --------------------------------------------------------------------- + +/\x{110000}/IB,utf + +/\o{4200000}/IB,utf + +/\x{ffffffff}/utf + +/\o{37777777777}/utf + +/\x{100000000}/utf + +/\o{77777777777}/utf + +/\x{d800}/utf + +/\o{154000}/utf + +/\x{dfff}/utf + +/\o{157777}/utf + +/\x{d7ff}/utf + +/\o{153777}/utf + +/\x{e000}/utf + +/\o{170000}/utf + +/^\x{100}a\x{1234}/utf + \x{100}a\x{1234}bcd + +/\x{0041}\x{2262}\x{0391}\x{002e}/IB,utf + \x{0041}\x{2262}\x{0391}\x{002e} + +/.{3,5}X/IB,utf + \x{212ab}\x{212ab}\x{212ab}\x{861}X + +/.{3,5}?/IB,utf + \x{212ab}\x{212ab}\x{212ab}\x{861} + +/^[ab]/IB,utf + bar +\= Expect no match + c + \x{ff} + \x{100} + +/\x{100}*(\d+|"(?1)")/utf + 1234 + "1234" + \x{100}1234 + "\x{100}1234" + \x{100}\x{100}12ab + \x{100}\x{100}"12" +\= Expect no match + \x{100}\x{100}abcd + +/\x{100}*/IB,utf + +/a\x{100}*/IB,utf + +/ab\x{100}*/IB,utf + +/[\x{200}-\x{100}]/utf + +/[Ä€-Ä„]/utf + \x{100} + \x{104} +\= Expect no match + \x{105} + \x{ff} + +/[\xFF]/IB + >\xff< + +/[^\xFF]/IB + +/[Ä-Ü]/utf + Ö # Matches without Study + \x{d6} + +/[Ä-Ü]/utf + Ö <-- Same with Study + \x{d6} + +/[\x{c4}-\x{dc}]/utf + Ö # Matches without Study + \x{d6} + +/[\x{c4}-\x{dc}]/utf + Ö <-- Same with Study + \x{d6} + +/[^\x{100}]abc(xyz(?1))/IB,utf + +/(\x{100}(b(?2)c))?/IB,utf + +/(\x{100}(b(?2)c)){0,2}/IB,utf + +/(\x{100}(b(?1)c))?/IB,utf + +/(\x{100}(b(?1)c)){0,2}/IB,utf + +/\W/utf + A.B + A\x{100}B + +/\w/utf + \x{100}X + +# Use no_start_optimize because the first code unit is different in 8-bit from +# the wider modes. 
+ +/^\ሴ/IB,utf,no_start_optimize + +/()()()()()()()()()() + ()()()()()()()()()() + ()()()()()()()()()() + ()()()()()()()()()() + A (x) (?41) B/x,utf + AxxB + +/^[\x{100}\E-\Q\E\x{150}]/B,utf + +/^[\QÄ€\E-\QÅ\E]/B,utf + +/^abc./gmx,newline=any,utf + abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK + +/abc.$/gmx,newline=any,utf + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 + +/^a\Rb/bsr=unicode,utf + a\nb + a\rb + a\r\nb + a\x0bb + a\x0cb + a\x{85}b + a\x{2028}b + a\x{2029}b +\= Expect no match + a\n\rb + +/^a\R*b/bsr=unicode,utf + ab + a\nb + a\rb + a\r\nb + a\x0bb + a\x0c\x{2028}\x{2029}b + a\x{85}b + a\n\rb + a\n\r\x{85}\x0cb + +/^a\R+b/bsr=unicode,utf + a\nb + a\rb + a\r\nb + a\x0bb + a\x0c\x{2028}\x{2029}b + a\x{85}b + a\n\rb + a\n\r\x{85}\x0cb +\= Expect no match + ab + +/^a\R{1,3}b/bsr=unicode,utf + a\nb + a\n\rb + a\n\r\x{85}b + a\r\n\r\nb + a\r\n\r\n\r\nb + a\n\r\n\rb + a\n\n\r\nb +\= Expect no match + a\n\n\n\rb + a\r + +/\H\h\V\v/utf + X X\x0a + X\x09X\x0b +\= Expect no match + \x{a0} X\x0a + +/\H*\h+\V?\v{3,4}/utf + \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a + \x09\x20\x{a0}\x0a\x0b\x0c +\= Expect no match + \x09\x20\x{a0}\x0a\x0b + +/\H\h\V\v/utf + \x{3001}\x{3000}\x{2030}\x{2028} + X\x{180e}X\x{85} +\= Expect no match + \x{2009} X\x0a + +/\H*\h+\V?\v{3,4}/utf + \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a + \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a + \x09\x20\x{202f}\x0a\x0b\x0c +\= Expect no match + \x09\x{200a}\x{a0}\x{2028}\x0b + +/[\h]/B,utf + >\x{1680} + +/[\h]{3,}/B,utf + >\x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000}< + +/[\v]/B,utf + +/[\H]/B,utf + +/[\V]/B,utf + +/.*$/newline=any,utf + \x{1ec5} + +/a\Rb/I,bsr=anycrlf,utf + a\rb + a\nb + a\r\nb +\= Expect no match + a\x{85}b + a\x0bb + +/a\Rb/I,bsr=unicode,utf + a\rb + a\nb + a\r\nb + a\x{85}b + a\x0bb + +/a\R?b/I,bsr=anycrlf,utf + 
a\rb + a\nb + a\r\nb +\= Expect no match + a\x{85}b + a\x0bb + +/a\R?b/I,bsr=unicode,utf + a\rb + a\nb + a\r\nb + a\x{85}b + a\x0bb + +/.*a.*=.b.*/utf,newline=any + QQQ\x{2029}ABCaXYZ=!bPQR +\= Expect no match + a\x{2029}b + \x61\xe2\x80\xa9\x62 + +/[[:a\x{100}b:]]/utf + +/a[^]b/utf,allow_empty_class,match_unset_backref + a\x{1234}b + a\nb +\= Expect no match + ab + +/a[^]+b/utf,allow_empty_class,match_unset_backref + aXb + a\nX\nX\x{1234}b +\= Expect no match + ab + +/(\x{de})\1/ + \x{de}\x{de} + +/X/newline=any,utf,firstline + A\x{1ec5}ABCXYZ + +/Xa{2,4}b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/Xa{2,4}?b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/Xa{2,4}+b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\x{123}{2,4}b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X\x{123}{2,4}?b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X\x{123}{2,4}+b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X\x{123}{2,4}b/utf +\= Expect no match + Xx\=ps + X\x{123}x\=ps + X\x{123}\x{123}x\=ps + X\x{123}\x{123}\x{123}x\=ps + X\x{123}\x{123}\x{123}\x{123}x\=ps + +/X\x{123}{2,4}?b/utf +\= Expect no match + Xx\=ps + X\x{123}x\=ps + X\x{123}\x{123}x\=ps + X\x{123}\x{123}\x{123}x\=ps + X\x{123}\x{123}\x{123}\x{123}x\=ps + +/X\x{123}{2,4}+b/utf +\= Expect no match + Xx\=ps + X\x{123}x\=ps + X\x{123}\x{123}x\=ps + X\x{123}\x{123}\x{123}x\=ps + X\x{123}\x{123}\x{123}\x{123}x\=ps + +/X\d{2,4}b/utf + X\=ps + X3\=ps + X33\=ps + X333\=ps + X3333\=ps + +/X\d{2,4}?b/utf + X\=ps + X3\=ps + X33\=ps + X333\=ps + X3333\=ps + +/X\d{2,4}+b/utf + X\=ps + X3\=ps + X33\=ps + X333\=ps + X3333\=ps + +/X\D{2,4}b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\D{2,4}?b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\D{2,4}+b/utf + 
X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X\D{2,4}b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X\D{2,4}?b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X\D{2,4}+b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X[abc]{2,4}b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[abc]{2,4}?b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[abc]{2,4}+b/utf + X\=ps + Xa\=ps + Xaa\=ps + Xaaa\=ps + Xaaaa\=ps + +/X[abc\x{123}]{2,4}b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X[abc\x{123}]{2,4}?b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X[abc\x{123}]{2,4}+b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X[^a]{2,4}b/utf + X\=ps + Xz\=ps + Xzz\=ps + Xzzz\=ps + Xzzzz\=ps + +/X[^a]{2,4}?b/utf + X\=ps + Xz\=ps + Xzz\=ps + Xzzz\=ps + Xzzzz\=ps + +/X[^a]{2,4}+b/utf + X\=ps + Xz\=ps + Xzz\=ps + Xzzz\=ps + Xzzzz\=ps + +/X[^a]{2,4}b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X[^a]{2,4}?b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/X[^a]{2,4}+b/utf + X\=ps + X\x{123}\=ps + X\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\=ps + X\x{123}\x{123}\x{123}\x{123}\=ps + +/(Y)X\1{2,4}b/utf + YX\=ps + YXY\=ps + YXYY\=ps + YXYYY\=ps + YXYYYY\=ps + +/(Y)X\1{2,4}?b/utf + YX\=ps + YXY\=ps + YXYY\=ps + YXYYY\=ps + YXYYYY\=ps + +/(Y)X\1{2,4}+b/utf + YX\=ps + YXY\=ps + YXYY\=ps + YXYYY\=ps + YXYYYY\=ps + +/(\x{123})X\1{2,4}b/utf + \x{123}X\=ps + \x{123}X\x{123}\=ps + \x{123}X\x{123}\x{123}\=ps + 
\x{123}X\x{123}\x{123}\x{123}\=ps + \x{123}X\x{123}\x{123}\x{123}\x{123}\=ps + +/(\x{123})X\1{2,4}?b/utf + \x{123}X\=ps + \x{123}X\x{123}\=ps + \x{123}X\x{123}\x{123}\=ps + \x{123}X\x{123}\x{123}\x{123}\=ps + \x{123}X\x{123}\x{123}\x{123}\x{123}\=ps + +/(\x{123})X\1{2,4}+b/utf + \x{123}X\=ps + \x{123}X\x{123}\=ps + \x{123}X\x{123}\x{123}\=ps + \x{123}X\x{123}\x{123}\x{123}\=ps + \x{123}X\x{123}\x{123}\x{123}\x{123}\=ps + +/\bthe cat\b/utf + the cat\=ps + the cat\=ph + +/abcd*/utf + xxxxabcd\=ps + xxxxabcd\=ph + +/abcd*/i,utf + xxxxabcd\=ps + xxxxabcd\=ph + XXXXABCD\=ps + XXXXABCD\=ph + +/abc\d*/utf + xxxxabc1\=ps + xxxxabc1\=ph + +/(a)bc\1*/utf + xxxxabca\=ps + xxxxabca\=ph + +/abc[de]*/utf + xxxxabcde\=ps + xxxxabcde\=ph + +/X\W{3}X/utf + X\=ps + +/\sxxx\s/utf,tables=2 + AB\x{85}xxx\x{a0}XYZ + AB\x{a0}xxx\x{85}XYZ + +/\S \S/utf,tables=2 + \x{a2} \x{84} + +'A#хц'Bx,newline=any,utf + +'A#хц + PQ'Bx,newline=any,utf + +/a+#Ñ…aa + z#XX?/Bx,newline=any,utf + +/a+#Ñ…aa + z#Ñ…?/Bx,newline=any,utf + +/\g{A}xxx#bXX(?'A'123) (?'A'456)/Bx,newline=any,utf + +/\g{A}xxx#bÑ…(?'A'123) (?'A'456)/Bx,newline=any,utf + +/^\cÄ£/utf + +/(\R*)(.)/s,utf + \r\n + \r\r\n\n\r + \r\r\n\n\r\n + +/(\R)*(.)/s,utf + \r\n + \r\r\n\n\r + \r\r\n\n\r\n + +/[^\x{1234}]+/Ii,utf + +/[^\x{1234}]+?/Ii,utf + +/[^\x{1234}]++/Ii,utf + +/[^\x{1234}]{2}/Ii,utf + +/f.*/ + for\=ph + +/f.*/s + for\=ph + +/f.*/utf + for\=ph + +/f.*/s,utf + for\=ph + +/\x{d7ff}\x{e000}/utf + +/\x{d800}/utf + +/\x{dfff}/utf + +/\h+/utf + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} + +/[\h\x{e000}]+/B,utf + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} + +/\H+/utf + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} + +/[\H\x{d7ff}]+/B,utf + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 
\x{2000}\x{200a}\x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} + +/\v+/utf + \x{2027}\x{2030}\x{2028}\x{2029} + \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d + +/[\v\x{e000}]+/B,utf + \x{2027}\x{2030}\x{2028}\x{2029} + \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d + +/\V+/utf + \x{2028}\x{2029}\x{2027}\x{2030} + \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} + +/[\V\x{d7ff}]+/B,utf + \x{2028}\x{2029}\x{2027}\x{2030} + \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} + +/\R+/bsr=unicode,utf + \x{2027}\x{2030}\x{2028}\x{2029} + \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d + +/(..)\1/utf + ab\=ps + aba\=ps + abab\=ps + +/(..)\1/i,utf + ab\=ps + abA\=ps + aBAb\=ps + +/(..)\1{2,}/utf + ab\=ps + aba\=ps + abab\=ps + ababa\=ps + ababab\=ps + ababab\=ph + abababa\=ps + abababa\=ph + +/(..)\1{2,}/i,utf + ab\=ps + aBa\=ps + aBAb\=ps + AbaBA\=ps + abABAb\=ps + aBAbaB\=ph + abABabA\=ps + abaBABa\=ph + +/(..)\1{2,}?x/i,utf + ab\=ps + abA\=ps + aBAb\=ps + abaBA\=ps + abAbaB\=ps + abaBabA\=ps + abAbABaBx\=ps + +/./utf,newline=crlf + \r\=ps + \r\=ph + +/.{2,3}/utf,newline=crlf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +/.{2,3}?/utf,newline=crlf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/B,utf + +/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/Bi,utf + +/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/B,utf + +/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/Bi,utf + +/(?<=\x{1234}\x{1234})\bxy/I,utf + +/(?\p{Xsp}/utf + >\x{1680}\x{2028}\x{0b} + >\x{a0} +\= Expect no match + \x{0b} + +/^>\p{Xsp}+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}+?/utf + >\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}*/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}{2,9}/utf + > 
\x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}{2,9}?/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>[\p{Xsp}]/utf + >\x{2028}\x{0b} + +/^>[\p{Xsp}]+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}/utf + >\x{1680}\x{2028}\x{0b} + >\x{a0} +\= Expect no match + \x{0b} + +/^>\p{Xps}+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}+?/utf + >\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}*/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}?/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>[\p{Xps}]/utf + >\x{2028}\x{0b} + +/^>[\p{Xps}]+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^\p{Xwd}/utf + ABCD + 1234 + \x{6ca} + \x{a6c} + \x{10a7} + _ABC +\= Expect no match + [] + +/^\p{Xwd}+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}+?/utf + \x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}*/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}{2,9}/utf + A_B12\x{6ca}\x{a6c}\x{10a7} + +/^\p{Xwd}{2,9}?/utf + \x{6ca}\x{a6c}\x{10a7}_ + +/^[\p{Xwd}]/utf + ABCD1234_ + 1234abcd_ + \x{6ca} + \x{a6c} + \x{10a7} + _ABC +\= Expect no match + [] + +/^[\p{Xwd}]+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +# A check not in UTF-8 mode + +/^[\p{Xwd}]+/ + ABCD1234_ + +# Some negative checks + +/^[\P{Xwd}]+/utf + !.+\x{019}\x{35a}AB + +/^[\p{^Xwd}]+/utf + !.+\x{019}\x{35a}AB + +/[\D]/B,utf,ucp + 1\x{3c8}2 + +/[\d]/B,utf,ucp + >\x{6f4}< + +/[\S]/B,utf,ucp + \x{1680}\x{6f4}\x{1680} + +/[\s]/B,utf,ucp + >\x{1680}< + +/[\W]/B,utf,ucp + A\x{1712}B + +/[\w]/B,utf,ucp + >\x{1723}< + +/\D/B,utf,ucp + 1\x{3c8}2 + +/\d/B,utf,ucp + >\x{6f4}< + +/\S/B,utf,ucp + \x{1680}\x{6f4}\x{1680} + +/\s/B,utf,ucp + >\x{1680}> + +/\W/B,utf,ucp + A\x{1712}B + +/\w/B,utf,ucp + >\x{1723}< + +/[[:alpha:]]/B,ucp + +/[[:lower:]]/B,ucp + +/[[:upper:]]/B,ucp + +/[[:alnum:]]/B,ucp + +/[[:ascii:]]/B,ucp + 
+/[[:cntrl:]]/B,ucp + +/[[:digit:]]/B,ucp + +/[[:graph:]]/B,ucp + +/[[:print:]]/B,ucp + +/[[:punct:]]/B,ucp + +/[[:space:]]/B,ucp + +/[[:word:]]/B,ucp + +/[[:xdigit:]]/B,ucp + +# Unicode properties for \b abd \B + +/\b...\B/utf,ucp + abc_ + \x{37e}abc\x{376} + \x{37e}\x{376}\x{371}\x{393}\x{394} + !\x{c0}++\x{c1}\x{c2} + !\x{c0}+++++ + +# Without PCRE_UCP, non-ASCII always fail, even if < 256 + +/\b...\B/utf + abc_ +\= Expect no match + \x{37e}abc\x{376} + \x{37e}\x{376}\x{371}\x{393}\x{394} + !\x{c0}++\x{c1}\x{c2} + !\x{c0}+++++ + +# With PCRE_UCP, non-UTF8 chars that are < 256 still check properties + +/\b...\B/ucp + abc_ + !\x{c0}++\x{c1}\x{c2} + !\x{c0}+++++ + +# Some of these are silly, but they check various combinations + +/[[:^alpha:][:^cntrl:]]+/B,utf,ucp + 123 + abc + +/[[:^cntrl:][:^alpha:]]+/B,utf,ucp + 123 + abc + +/[[:alpha:]]+/B,utf,ucp + abc + +/[[:^alpha:]\S]+/B,utf,ucp + 123 + abc + +/[^\d]+/B,utf,ucp + abc123 + abc\x{123} + \x{660}abc + +/\p{Lu}+9\p{Lu}+B\p{Lu}+b/B + +/\p{^Lu}+9\p{^Lu}+B\p{^Lu}+b/B + +/\P{Lu}+9\P{Lu}+B\P{Lu}+b/B + +/\p{Han}+X\p{Greek}+\x{370}/B,utf + +/\p{Xan}+!\p{Xan}+A/B + +/\p{Xsp}+!\p{Xsp}\t/B + +/\p{Xps}+!\p{Xps}\t/B + +/\p{Xwd}+!\p{Xwd}_/B + +/A+\p{N}A+\dB+\p{N}*B+\d*/B,ucp + +# These behaved oddly in Perl, so they are kept in this test + +/(\x{23a}\x{23a}\x{23a})?\1/i,utf +\= Expect no match + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} + +/(ȺȺȺ)?\1/i,utf +\= Expect no match + ȺȺȺⱥⱥ + +/(\x{23a}\x{23a}\x{23a})?\1/i,utf + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + +/(ȺȺȺ)?\1/i,utf + ȺȺȺⱥⱥⱥ + +/(\x{23a}\x{23a}\x{23a})\1/i,utf +\= Expect no match + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} + +/(ȺȺȺ)\1/i,utf +\= Expect no match + ȺȺȺⱥⱥ + +/(\x{23a}\x{23a}\x{23a})\1/i,utf + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + +/(ȺȺȺ)\1/i,utf + ȺȺȺⱥⱥⱥ + +/(\x{2c65}\x{2c65})\1/i,utf + \x{2c65}\x{2c65}\x{23a}\x{23a} + +/(ⱥⱥ)\1/i,utf + ⱥⱥȺȺ + +/(\x{23a}\x{23a}\x{23a})\1Y/i,utf + X\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}YZ + 
+/(\x{2c65}\x{2c65})\1Y/i,utf + X\x{2c65}\x{2c65}\x{23a}\x{23a}YZ + +# These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE + +/^[\p{Batak}]/utf + \x{1bc0} + \x{1bff} +\= Expect no match + \x{1bf4} + +/^[\p{Brahmi}]/utf + \x{11000} + \x{1106f} +\= Expect no match + \x{1104e} + +/^[\p{Mandaic}]/utf + \x{840} + \x{85e} +\= Expect no match + \x{85c} + \x{85d} + +/(\X*)(.)/s,utf + A\x{300} + +/^S(\X*)e(\X*)$/utf + SteÌreÌo + +/^\X/utf + ÌreÌo + +/^a\X41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aX41z +\= Expect no match + aAz + +/\X/ + a\=ps + a\=ph + +/\Xa/ + aa\=ps + aa\=ph + +/\X{2}/ + aa\=ps + aa\=ph + +/\X+a/ + a\=ps + aa\=ps + aa\=ph + +/\X+?a/ + a\=ps + ab\=ps + aa\=ps + aa\=ph + aba\=ps + +# These Unicode 6.1.0 scripts are not known to Perl. + +/\p{Chakma}\d/utf,ucp + \x{11100}\x{1113c} + +/\p{Takri}\d/utf,ucp + \x{11680}\x{116c0} + +/^\X/utf + A\=ps + A\=ph + A\x{300}\x{301}\=ps + A\x{300}\x{301}\=ph + A\x{301}\=ps + A\x{301}\=ph + +/^\X{2,3}/utf + A\=ps + A\=ph + AA\=ps + AA\=ph + A\x{300}\x{301}\=ps + A\x{300}\x{301}\=ph + A\x{300}\x{301}A\x{300}\x{301}\=ps + A\x{300}\x{301}A\x{300}\x{301}\=ph + +/^\X{2}/utf + AA\=ps + AA\=ph + A\x{300}\x{301}A\x{300}\x{301}\=ps + A\x{300}\x{301}A\x{300}\x{301}\=ph + +/^\X+/utf + AA\=ps + AA\=ph + +/^\X+?Z/utf + AA\=ps + AA\=ph + +/A\x{3a3}B/IBi,utf + +/[\x{3a3}]/Bi,utf + +/[^\x{3a3}]/Bi,utf + +/[\x{3a3}]+/Bi,utf + +/[^\x{3a3}]+/Bi,utf + +/a*\x{3a3}/Bi,utf + +/\x{3a3}+a/Bi,utf + +/\x{3a3}*\x{3c2}/Bi,utf + +/\x{3a3}{3}/i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}{2,4}/i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}{2,4}?/i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}+./i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}++./i,utf,aftertext +\= Expect no match + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}*\x{3c2}/Bi,utf + +/[^\x{3a3}]*\x{3c2}/Bi,utf + +/[^a]*\x{3c2}/Bi,utf + 
+/ist/Bi,utf +\= Expect no match + ikt + +/is+t/i,utf + iSs\x{17f}t +\= Expect no match + ikt + +/is+?t/i,utf +\= Expect no match + ikt + +/is?t/i,utf +\= Expect no match + ikt + +/is{2}t/i,utf +\= Expect no match + iskt + +# This property is a PCRE special + +/^\p{Xuc}/utf + $abc + @abc + `abc + \x{1234}abc +\= Expect no match + abc + +/^\p{Xuc}+/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}+?/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}+?\*/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}++/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}{3,5}/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}{3,5}?/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^[\p{Xuc}]/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^[\p{Xuc}]+/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\P{Xuc}/utf + abc +\= Expect no match + $abc + @abc + `abc + \x{1234}abc + +/^[\P{Xuc}]/utf + abc +\= Expect no match + $abc + @abc + `abc + \x{1234}abc + +# Some auto-possessification tests + +/\pN+\z/B + +/\PN+\z/B + +/\pN+/B + +/\PN+/B + +/\p{Any}+\p{Any} \p{Any}+\P{Any} \p{Any}+\p{L&} \p{Any}+\p{L} \p{Any}+\p{Lu} \p{Any}+\p{Han} \p{Any}+\p{Xan} \p{Any}+\p{Xsp} \p{Any}+\p{Xps} \p{Xwd}+\p{Any} \p{Any}+\p{Xuc}/Bx,ucp + +/\p{L&}+\p{Any} \p{L&}+\p{L&} \P{L&}+\p{L&} \p{L&}+\p{L} \p{L&}+\p{Lu} \p{L&}+\p{Han} \p{L&}+\p{Xan} \p{L&}+\P{Xan} \p{L&}+\p{Xsp} \p{L&}+\p{Xps} \p{Xwd}+\p{L&} \p{L&}+\p{Xuc}/Bx,ucp + +/\p{N}+\p{Any} \p{N}+\p{L&} \p{N}+\p{L} \p{N}+\P{L} \p{N}+\P{N} \p{N}+\p{Lu} \p{N}+\p{Han} \p{N}+\p{Xan} \p{N}+\p{Xsp} \p{N}+\p{Xps} \p{Xwd}+\p{N} \p{N}+\p{Xuc}/Bx,ucp + +/\p{Lu}+\p{Any} \p{Lu}+\p{L&} \p{Lu}+\p{L} \p{Lu}+\p{Lu} \P{Lu}+\p{Lu} \p{Lu}+\p{Nd} \p{Lu}+\P{Nd} \p{Lu}+\p{Han} \p{Lu}+\p{Xan} \p{Lu}+\p{Xsp} \p{Lu}+\p{Xps} \p{Xwd}+\p{Lu} \p{Lu}+\p{Xuc}/Bx,ucp + +/\p{Han}+\p{Lu} \p{Han}+\p{L&} \p{Han}+\p{L} 
\p{Han}+\p{Lu} \p{Han}+\p{Arabic} \p{Arabic}+\p{Arabic} \p{Han}+\p{Xan} \p{Han}+\p{Xsp} \p{Han}+\p{Xps} \p{Xwd}+\p{Han} \p{Han}+\p{Xuc}/Bx,ucp + +/\p{Xan}+\p{Any} \p{Xan}+\p{L&} \P{Xan}+\p{L&} \p{Xan}+\p{L} \p{Xan}+\p{Lu} \p{Xan}+\p{Han} \p{Xan}+\p{Xan} \p{Xan}+\P{Xan} \p{Xan}+\p{Xsp} \p{Xan}+\p{Xps} \p{Xwd}+\p{Xan} \p{Xan}+\p{Xuc}/Bx,ucp + +/\p{Xsp}+\p{Any} \p{Xsp}+\p{L&} \p{Xsp}+\p{L} \p{Xsp}+\p{Lu} \p{Xsp}+\p{Han} \p{Xsp}+\p{Xan} \p{Xsp}+\p{Xsp} \P{Xsp}+\p{Xsp} \p{Xsp}+\p{Xps} \p{Xwd}+\p{Xsp} \p{Xsp}+\p{Xuc}/Bx,ucp + +/\p{Xwd}+\p{Any} \p{Xwd}+\p{L&} \p{Xwd}+\p{L} \p{Xwd}+\p{Lu} \p{Xwd}+\p{Han} \p{Xwd}+\p{Xan} \p{Xwd}+\p{Xsp} \p{Xwd}+\p{Xps} \p{Xwd}+\p{Xwd} \p{Xwd}+\P{Xwd} \p{Xwd}+\p{Xuc}/Bx,ucp + +/\p{Xuc}+\p{Any} \p{Xuc}+\p{L&} \p{Xuc}+\p{L} \p{Xuc}+\p{Lu} \p{Xuc}+\p{Han} \p{Xuc}+\p{Xan} \p{Xuc}+\p{Xsp} \p{Xuc}+\p{Xps} \p{Xwd}+\p{Xuc} \p{Xuc}+\p{Xuc} \p{Xuc}+\P{Xuc}/Bx,ucp + +/\p{N}+\p{Ll} \p{N}+\p{Nd} \p{N}+\P{Nd}/Bx,ucp + +/\p{Xan}+\p{L} \p{Xan}+\p{N} \p{Xan}+\p{C} \p{Xan}+\P{L} \P{Xan}+\p{N} \p{Xan}+\P{C}/Bx,ucp + +/\p{L}+\p{Xan} \p{N}+\p{Xan} \p{C}+\p{Xan} \P{L}+\p{Xan} \p{N}+\p{Xan} \P{C}+\p{Xan} \p{L}+\P{Xan}/Bx,ucp + +/\p{Xan}+\p{Lu} \p{Xan}+\p{Nd} \p{Xan}+\p{Cc} \p{Xan}+\P{Ll} \P{Xan}+\p{No} \p{Xan}+\P{Cf}/Bx,ucp + +/\p{Lu}+\p{Xan} \p{Nd}+\p{Xan} \p{Cs}+\p{Xan} \P{Lt}+\p{Xan} \p{Nl}+\p{Xan} \P{Cc}+\p{Xan} \p{Lt}+\P{Xan}/Bx,ucp + +/\w+\p{P} \w+\p{Po} \w+\s \p{Xan}+\s \s+\p{Xan} \s+\w/Bx,ucp + +/\w+\P{P} \W+\p{Po} \w+\S \P{Xan}+\s \s+\P{Xan} \s+\W/Bx,ucp + +/\w+\p{Po} \w+\p{Pc} \W+\p{Po} \W+\p{Pc} \w+\P{Po} \w+\P{Pc}/Bx,ucp + +/\p{Nl}+\p{Xan} \P{Nl}+\p{Xan} \p{Nl}+\P{Xan} \P{Nl}+\P{Xan}/Bx,ucp + +/\p{Xan}+\p{Nl} \P{Xan}+\p{Nl} \p{Xan}+\P{Nl} \P{Xan}+\P{Nl}/Bx,ucp + +/\p{Xan}+\p{Nd} \P{Xan}+\p{Nd} \p{Xan}+\P{Nd} \P{Xan}+\P{Nd}/Bx,ucp + +# End auto-possessification tests + +/\w+/B,utf,ucp,auto_callout + abcd + +/[\p{N}]?+/B,no_auto_possess + +/[\p{L}ab]{2,3}+/B,no_auto_possess + +/\D+\X \d+\X \S+\X \s+\X \W+\X \w+\X \R+\X \H+\X \h+\X \V+\X \v+\X a+\X \n+\X 
.+\X/Bx + +/.+\X/Bsx + +/\X+$/Bmx + +/\X+\D \X+\d \X+\S \X+\s \X+\W \X+\w \X+. \X+\R \X+\H \X+\h \X+\V \X+\v \X+\X \X+\Z \X+\z \X+$/Bx + +/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/B,utf,ucp + +/[RST]+/Bi,utf,ucp + +/[R-T]+/Bi,utf,ucp + +/[Q-U]+/Bi,utf,ucp + +/^s?c/Iim,utf + scat + +/\X?abc/utf,no_start_optimize + \xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\=no_utf_check,offset=06 + +/\x{100}\x{200}\K\x{300}/utf,startchar + \x{100}\x{200}\x{300} + +# Test UTF characters in a substitution + +/ábc/utf,replace=XሴZ + 123ábc123 + +/(?<=abc)(|def)/g,utf,replace=<$0> + 123abcáyzabcdef789abcሴqr + +/[A-`]/iB,utf + abcdefghijklmno + +/(?<=\K\x{17f})/g,utf,aftertext + \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} + +/(?<=\K\x{17f})/altglobal,utf,aftertext + \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} + +"\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?'X'u'(?'c'(?'z'(?\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?())(?))(?\g{d});\g{x}\x11\g{d}\x81\|$((?'X'\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?'X'28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5" + +/$(&.+[\p{Me}].\s\xdcC*?(?())(?)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/ + +"(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'" + +/[\pS#moq]/ + = + +/(*:a\x{12345}b\t(d\)c)xxx/utf,alt_verbnames,mark + cxxxz + +/abcd/utf,replace=x\x{824}y\o{3333}z(\Q12\$34$$\x34\E5$$),substitute_extended + abcd + 
+/a(\x{e0}\x{101})(\x{c0}\x{102})/utf,replace=a\u$1\U$1\E$1\l$2\L$2\Eab\U\x{e0}\x{101}\L\x{d0}\x{160}\EDone,substitute_extended + a\x{e0}\x{101}\x{c0}\x{102} + +/((?\d)|(?\p{L}))/g,substitute_extended,replace=<${digit:+digit; :not digit; }${letter:+letter:not a letter}> + ab12cde + +/(*UCP)(*UTF)[[:>:]]X/B + +/abc/utf,replace=xyz + abc\=zero_terminate + +/a[[:punct:]b]/ucp,bincode + +/a[[:punct:]b]/utf,ucp,bincode + +/a[b[:punct:]]/utf,ucp,bincode + +/[[:^ascii:]]/utf,ucp,bincode + +/[[:^ascii:]\w]/utf,ucp,bincode + +/[\w[:^ascii:]]/utf,ucp,bincode + +/[^[:ascii:]\W]/utf,ucp,bincode + \x{de} + \x{200} +\= Expect no match + \x{300} + \x{37e} + +/[[:^ascii:]a]/utf,ucp,bincode + +/L(?#(|++3 - *** Failers + xyzabc\=offset=3 +\= Expect no match xyzabc - xyzabc\>2 + xyzabc\=offset=2 /x\dy\Dz/ x9yzz x0y+z - *** Failers +\= Expect no match xyz xxy0z /x\sy\Sz/ x yzz x y+z - *** Failers +\= Expect no match xyz xxyyz /x\wy\Wz/ xxy+z - *** Failers +\= Expect no match xxy0z x+y+z /x.y/ x+y x-y - *** Failers +\= Expect no match x\ny /x.y/s @@ -96,46 +98,45 @@ a+bc+dp+q a+bc\ndp+q x\nyp+q - *** Failers +\= Expect no match a\nbc\ndp+q a+bc\ndp\nq x\nyp\nq /a\d\z/ ba0 - *** Failers +\= Expect no match ba0\n ba0\ncd /a\d\z/m ba0 - *** Failers +\= Expect no match ba0\n ba0\ncd /a\d\Z/ ba0 ba0\n - *** Failers +\= Expect no match ba0\ncd /a\d\Z/m ba0 ba0\n - *** Failers +\= Expect no match ba0\ncd /a\d$/ ba0 ba0\n - *** Failers +\= Expect no match ba0\ncd /a\d$/m ba0 ba0\n ba0\ncd - *** Failers /abc/i abc @@ -156,14 +157,14 @@ axxyz axxxyzq axxxxyzq - *** Failers +\= Expect no match ax axx /x{3}yz/ axxxyzq axxxxyzq - *** Failers +\= Expect no match ax axx ayzq @@ -174,30 +175,29 @@ axxyz axxxyzq axxxxyzq - *** Failers +\= Expect no match ax axx ayzq axyzq -/[^a]+/O +/[^a]+/no_auto_possess bac bcdefax - *** Failers +\= Expect no match aaaaa -/[^a]*/O +/[^a]*/no_auto_possess bac bcdefax - *** Failers aaaaa -/[^a]{3,5}/O +/[^a]{3,5}/no_auto_possess xyz awxyza abcdefa abcdefghijk - *** 
Failers +\= Expect no match axya axa aaaaa @@ -212,25 +212,24 @@ /\d+/ ab1234c56 - *** Failers +\= Expect no match xyz /\D+/ ab123c56 - *** Failers +\= Expect no match 789 /\d?A/ 045ABC ABC - *** Failers +\= Expect no match XYZ /\D?A/ ABC BAC 9ABC - *** Failers /a+/ aaaa @@ -242,7 +241,7 @@ /^.+xyz/ abcdxyz axyz - *** Failers +\= Expect no match xyz /^.?xyz/ @@ -252,7 +251,7 @@ /^\d{2,3}X/ 12X 123X - *** Failers +\= Expect no match X 1X 1234X @@ -262,7 +261,7 @@ b93 c99z d04 - *** Failers +\= Expect no match e45 abcd abcd1234 @@ -275,7 +274,7 @@ d04 abcd1234 1234 - *** Failers +\= Expect no match e45 abcd @@ -285,7 +284,7 @@ c99z d04 abcd1234 - *** Failers +\= Expect no match 1234 e45 abcd @@ -300,14 +299,14 @@ c99z d04 1234 - *** Failers +\= Expect no match abcd1234 e45 /^[abcd]{2,3}\d/ ab45 bcd93 - *** Failers +\= Expect no match 1234 a36 abcd1234 @@ -317,24 +316,23 @@ abc45 abcabcabc45 42xyz - *** Failers /^(abc)+\d/ abc45 abcabcabc45 - *** Failers +\= Expect no match 42xyz /^(abc)?\d/ abc45 42xyz - *** Failers +\= Expect no match abcabcabc45 /^(abc){2,3}\d/ abcabc45 abcabcabc45 - *** Failers +\= Expect no match abcabcabcabc45 abc45 42xyz @@ -353,7 +351,7 @@ abc a(b)c a(b(c))d - *** Failers) +\= Expect no match) a(b(c)d /^>abc>([^()]|\((?1)*\))*a*)\d/ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa9876 - *** Failers +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >/x @@ -373,33 +371,33 @@ hij> def> - *** Failers +\= Expect no match 3 - *** Failers + defabcxyz\=offset=3 +\= Expect no match defabcxyz /^abcdef/ - ab\P - abcde\P - abcdef\P - *** Failers - abx\P + ab\=ps + abcde\=ps + abcdef\=ps +\= Expect no match + abx\=ps /^a{2,4}\d+z/ - a\P - aa\P - aa2\P - aaa\P - aaa23\P - aaaa12345\P - aa0z\P - aaaa4444444444444z\P - *** Failers - az\P - aaaaa\P - a56\P + a\=ps + aa\=ps + aa2\=ps + aaa\=ps + aaa23\=ps + aaaa12345\=ps + aa0z\=ps + aaaa4444444444444z\=ps +\= Expect no match + az\=ps + 
aaaaa\=ps + a56\=ps /^abcdef/ - abc\P - def\R + abc\=ps + def\=dfa_restart /(?<=foo)bar/ - xyzfo\P - foob\P\>2 - foobar...\R\P\>4 - xyzfo\P - foobar\>2 - *** Failers - xyzfo\P - obar\R + foob\=ps,offset=2,allusedtext + foobar...\=ps,dfa_restart,offset=4 + foobar\=offset=2 +\= Expect no match + xyzfo\=ps + obar\=dfa_restart /(ab*(cd|ef))+X/ - adfadadaklhlkalkajhlkjahdfasdfasdfladsfjkj\P\Z - lkjhlkjhlkjhlkjhabbbbbbcdaefabbbbbbbefa\P\B\Z - cdabbbbbbbb\P\R\B\Z - efabbbbbbbbbbbbbbbb\P\R\B\Z - bbbbbbbbbbbbcdXyasdfadf\P\R\B\Z - -/(a|b)/SF>testsavedregex ->>aaabxyzpqrrrabbxyyyypqAzz >aaaabxyzpqrrrabbxyyyypqAzz >>>>abcxyzpqrrrabbxyyyypqAzz - *** Failers +\= Expect no match abxyzpqrrabbxyyyypqAzz abxyzpqrrrrabbxyyyypqAzz abxyzpqrrrabxyyyypqAzz @@ -567,7 +559,7 @@ /^(abc){1,2}zz/ abczz abcabczz - *** Failers +\= Expect no match zz abcabcabczz >>abczz @@ -581,7 +573,7 @@ aac abbbbbbbbbbbc bbbbbbbbbbbac - *** Failers +\= Expect no match aaac abbbbbbbbbbbac @@ -594,7 +586,7 @@ aac abbbbbbbbbbbc bbbbbbbbbbbac - *** Failers +\= Expect no match aaac abbbbbbbbbbbac @@ -605,7 +597,7 @@ babc bbabc bababc - *** Failers +\= Expect no match bababbc babababc @@ -613,7 +605,7 @@ babc bbabc bababc - *** Failers +\= Expect no match bababbc babababc @@ -627,7 +619,7 @@ cthing dthing ething - *** Failers +\= Expect no match fthing [thing \\thing @@ -637,7 +629,7 @@ cthing dthing ething - *** Failers +\= Expect no match athing fthing @@ -645,7 +637,7 @@ fthing [thing \\thing - *** Failers +\= Expect no match athing bthing ]thing @@ -656,7 +648,7 @@ /^[^]cde]/ athing fthing - *** Failers +\= Expect no match ]thing cthing dthing @@ -681,7 +673,7 @@ 9 10 100 - *** Failers +\= Expect no match abc /^.*nter/ @@ -692,28 +684,28 @@ /^xxx[0-9]+$/ xxx0 xxx1234 - *** Failers +\= Expect no match xxx /^.+[0-9][0-9][0-9]$/ x123 xx123 123456 - *** Failers - 123 x1234 +\= Expect no match + 123 /^.+?[0-9][0-9][0-9]$/ x123 xx123 123456 - *** Failers - 123 x1234 +\= Expect no match + 123 
/^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/ abc!pqr=apquxz.ixr.zzz.ac.uk - *** Failers +\= Expect no match !pqr=apquxz.ixr.zzz.ac.uk abc!=apquxz.ixr.zzz.ac.uk abc!pqr=apquxz:ixr.zzz.ac.uk @@ -721,7 +713,8 @@ /:/ Well, we need a colon: somewhere - *** Fail if we don't +\= Expect no match + No match without a colon /([\da-f:]+)$/i 0abc @@ -732,7 +725,7 @@ 5f03:12C0::932e fed def Any old stuff - *** Failers +\= Expect no match 0zzz gzzz fed\x20 @@ -741,7 +734,7 @@ /^.*\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ .1.2.3 A.12.123.0 - *** Failers +\= Expect no match .1.2.3333 1.2.3 1234.2.3 @@ -749,7 +742,7 @@ /^(\d+)\s+IN\s+SOA\s+(\S+)\s+(\S+)\s*\(\s*$/ 1 IN SOA non-sp1 non-sp2( 1 IN SOA non-sp1 non-sp2 ( - *** Failers +\= Expect no match 1IN SOA non-sp1 non-sp2( /^[a-zA-Z\d][a-zA-Z\d\-]*(\.[a-zA-Z\d][a-zA-z\d\-]*)*\.$/ @@ -759,7 +752,7 @@ ab-c.pq-r. sxk.zzz.ac.uk. x-.y-. - *** Failers +\= Expect no match -abc.peq. /^\*\.[a-z]([a-z\-\d]*[a-z\d]+)?(\.[a-z]([a-z\-\d]*[a-z\d]+)?)*$/ @@ -767,7 +760,7 @@ *.b0-a *.c3-b.c *.c-a.b-c - *** Failers +\= Expect no match *.0 *.a- *.a-b.c- @@ -791,29 +784,28 @@ \"1234\" \"abcd\" ; \"\" ; rhubarb - *** Failers +\= Expect no match \"1234\" : things /^$/ \ - *** Failers / ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/x ab c - *** Failers +\= Expect no match abc ab cde /(?x) ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/ ab c - *** Failers +\= Expect no match abc ab cde /^ a\ b[c ]d $/x a bcd a b d - *** Failers +\= Expect no match abcd ab d @@ -867,7 +859,7 @@ 1234567890 12345678ab 12345678__ - *** Failers +\= Expect no match 1234567 /^[aeiou\d]{4,5}$/ @@ -875,7 +867,7 @@ 1234 12345 aaaaa - *** Failers +\= Expect no match 123456 /^[aeiou\d]{4,5}?/ @@ -891,7 +883,7 @@ /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/ From abcd Mon Sep 01 12:33:02 1997 From abcd Mon Sep 1 12:33:02 1997 - *** Failers +\= Expect no match From abcd Sep 01 12:33:02 1997 /^12.34/s @@ -912,7 +904,7 @@ /^(\D*)(?=\d)(?!123)/ abc456 - *** Failers 
+\= Expect no match abc123 /^1234(?# test newlines @@ -932,24 +924,24 @@ /(?!^)abc/ the abc - *** Failers +\= Expect no match abc /(?=^)abc/ abc - *** Failers +\= Expect no match the abc -/^[ab]{1,3}(ab*|b)/O +/^[ab]{1,3}(ab*|b)/no_auto_possess aabbbbb -/^[ab]{1,3}?(ab*|b)/O +/^[ab]{1,3}?(ab*|b)/no_auto_possess aabbbbb -/^[ab]{1,3}?(ab*?|b)/O +/^[ab]{1,3}?(ab*?|b)/no_auto_possess aabbbbb -/^[ab]{1,3}(ab*?|b)/O +/^[ab]{1,3}(ab*?|b)/no_auto_possess aabbbbb / (?: [\040\t] | \( @@ -1153,7 +1145,7 @@ A. Other (a comment) \"/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/\"\@x400-re.lay A missing angle (a comment) \"/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/\"\@x400-re.lay A missing angle +/^a.b/newline=lf a\rb - *** Failers +\= Expect no match a\nb /abc$/ abc abc\n - *** Failers +\= Expect no match abc\ndef /(abc)\123/ @@ -1926,22 +1913,9 @@ abc\100\060 abc\100\60 -/^A\8B\9C$/ - A8B9C - *** Failers - A\08B\09C - -/^[A\8B\9C]+$/ - A8B9C - *** Failers - A8B9C\x00 - /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\12\123/ abcdefghijk\12S -/ab\idef/ - abidef - /a{0}bc/ bc @@ -1965,7 +1939,7 @@ baNOTcccd baNOTccd bacccd - *** Failers +\= Expect no match anything b\bc baccd @@ -1987,14 +1961,14 @@ /[^k]$/ abc - *** Failers +\= Expect no match abk /[^k]{2,3}$/ abc kbc kabc - *** Failers +\= Expect no match abk akb akk @@ -2002,7 +1976,7 @@ /^\d{8,}\@.+[^k]$/ 12345678\@a.b.c.d 123456789\@x.y.z - *** Failers +\= Expect no match 12345678\@x.y.uk 1234567\@a.b.c.d @@ -2039,7 +2013,7 @@ /(\.\d\d((?=0)|\d(?=\d)))/ 1.230003938 1.875000282 - *** Failers +\= Expect no match 1.235 /a(?)b/ @@ -2051,16 +2025,16 @@ /foo(.*)bar/ The food is under the bar in the barn. -/foo(.*?)bar/ +/foo(.*?)bar/ The food is under the bar in the barn. 
-/(.*)(\d*)/O +/(.*)(\d*)/no_auto_possess I have 2 numbers: 53147 /(.*)(\d+)/ I have 2 numbers: 53147 -/(.*?)(\d*)/O +/(.*?)(\d*)/no_auto_possess I have 2 numbers: 53147 /(.*?)(\d+)/ @@ -2083,13 +2057,13 @@ /^(\D*)(?=\d)(?!123)/ ABC445 - *** Failers +\= Expect no match ABC123 /^[W-]46]/ W46]789 -46]789 - *** Failers +\= Expect no match Wall Zebra 42 @@ -2105,7 +2079,7 @@ [abcd] ]abcd[ \\backslash - *** Failers +\= Expect no match -46]789 well @@ -2114,9 +2088,11 @@ /word (?:[a-zA-Z0-9]+ ){0,10}otherword/ word cat dog elephant mussel cow horse canary baboon snake shark otherword +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark /word (?:[a-zA-Z0-9]+ ){0,300}otherword/ +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark the quick brown fox and the lazy dog and several other words getting close to thirty by now I hope /^(a){0,0}/ @@ -2148,27 +2124,31 @@ aaaaaaaa /^(a){1,1}/ - bcd abc aab +\= Expect no match + bcd /^(a){1,2}/ - bcd abc aab +\= Expect no match + bcd /^(a){1,3}/ - bcd abc aab aaa +\= Expect no match + bcd /^(a){1,}/ - bcd abc aab aaa aaaaaaaa +\= Expect no match + bcd /.*\.gif/ borfle\nbib.gif\nno @@ -2212,7 +2192,7 @@ /(.*X|^B)/ abcde\n1234Xyz BarFoo - *** Failers +\= Expect no match abcde\nBar /(.*X|^B)/m @@ -2223,7 +2203,7 @@ /(.*X|^B)/s abcde\n1234Xyz BarFoo - *** Failers +\= Expect no match abcde\nBar /(.*X|^B)/ms @@ -2234,17 +2214,17 @@ /(?s)(.*X|^B)/ abcde\n1234Xyz BarFoo - *** Failers +\= Expect no match abcde\nBar /(?s:.*X|^B)/ abcde\n1234Xyz BarFoo - *** Failers +\= Expect no match abcde\nBar /^.*B/ - **** Failers +\= Expect no match abc\nB /(?s)^.*B/ @@ -2282,47 +2262,47 @@ /^[abcdefghijklmnopqrstuvwxy0123456789]/ n - *** Failers +\= Expect no match z /abcde{0,0}/ abcd - *** Failers +\= Expect no match abce /ab[cd]{0,0}e/ abe - *** Failers +\= Expect no match abcde /ab(c){0,0}d/ abd - *** Failers +\= Expect no match abcd /a(b*)/ a ab abbbb - *** Failers +\= Expect no match 
bbbbb /ab\d{0}e/ abe - *** Failers +\= Expect no match ab1e /"([^\\"]+|\\.)*"/ the \"quick\" brown fox \"the \\\"quick\\\" brown fox\" -/.*?/g+ +/.*?/g,aftertext abc -/\b/g+ +/\b/g,aftertext abc -/\b/+g +/\b/g,aftertext abc //g @@ -2337,7 +2317,7 @@ /a.b/ acb - *** Failers +\= Expect no match a\nb /a[^a]b/s @@ -2363,8 +2343,9 @@ bbbbbac /(?!\A)x/m - x\nb\n a\bx\n +\= Expect no match + x\nb\n /\x0{ab}/ \0{ab} @@ -2380,32 +2361,33 @@ catfood arfootle rfoosh - *** Failers +\= Expect no match barfoo towbarfoo /\w{3}(?.*/)foo" +\= Expect no match /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/it/you/see/ "(?>.*/)foo" @@ -2414,12 +2396,12 @@ /(?>(\.\d\d[1-9]?))\d+/ 1.230003938 1.875000282 - *** Failers +\= Expect no match 1.235 /^((?>\w+)|(?>\s+))*$/ now is the time for all good men to come to the aid of the party - *** Failers +\= Expect no match this is not a line with only words and spaces! /(\d+)(\w)/ @@ -2428,7 +2410,7 @@ /((?>\d+))(\w)/ 12345a - *** Failers +\= Expect no match 12345+ /(?>a+)b/ @@ -2452,35 +2434,35 @@ /((?>[^()]+)|\([^()]*\))+/ ((abc(ade)ufh()()x -/\(((?>[^()]+)|\([^()]+\))+\)/ +/\(((?>[^()]+)|\([^()]+\))+\)/ (abc) (abc(def)xyz) - *** Failers +\= Expect no match ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /a(?-i)b/i ab Ab - *** Failers +\= Expect no match aB AB /(a (?x)b c)d e/ a bcd e - *** Failers +\= Expect no match a b cd e abcd e a bcde /(a b(?x)c d (?-x)e f)/ a bcde f - *** Failers +\= Expect no match abcdef /(a(?i)b)c/ abc aBc - *** Failers +\= Expect no match abC aBC Abc @@ -2491,7 +2473,7 @@ /a(?i:b)c/ abc aBc - *** Failers +\= Expect no match ABC abC aBC @@ -2499,14 +2481,14 @@ /a(?i:b)*c/ aBc aBBc - *** Failers +\= Expect no match aBC aBBC /a(?=b(?i)c)\w\wd/ abcd abCd - *** Failers +\= Expect no match aBCd abcD @@ -2514,7 +2496,7 @@ more than million more than MILLION more \n than Million - *** Failers +\= Expect no match MORE THAN MILLION more \n than \n million @@ -2522,15 +2504,15 @@ more than million more than 
MILLION more \n than Million - *** Failers +\= Expect no match MORE THAN MILLION more \n than \n million -/(?>a(?i)b+)+c/ +/(?>a(?i)b+)+c/ abc aBbc aBBc - *** Failers +\= Expect no match Abc abAb abbC @@ -2538,7 +2520,7 @@ /(?=a(?i)b)\w\wc/ abc aBc - *** Failers +\= Expect no match Ab abC aBC @@ -2546,7 +2528,7 @@ /(?<=a(?i)b)(\w\w)c/ abxxc aBxxc - *** Failers +\= Expect no match Abxxc ABxxc abxxC @@ -2554,14 +2536,14 @@ /^(?(?=abc)\w{3}:|\d\d)$/ abc: 12 - *** Failers +\= Expect no match 123 xyz /^(?(?!abc)\d\d|\w{3}:)$/ abc: 12 - *** Failers +\= Expect no match 123 xyz @@ -2570,7 +2552,7 @@ cat fcat focat - *** Failers +\= Expect no match foocat /(?(?a*)*/ @@ -2647,7 +2629,7 @@ /(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /x 12-sep-98 12-09-98 - *** Failers +\= Expect no match sep-12-98 /(?i:saturday|sunday)/ @@ -2664,7 +2646,7 @@ aBCx bbx BBx - *** Failers +\= Expect no match abcX aBCX bbX @@ -2678,7 +2660,7 @@ Europe frog France - *** Failers +\= Expect no match Africa /^(ab|a(?i)[b-c](?m-i)d|x(?i)y|z)/ @@ -2688,13 +2670,13 @@ xY zebra Zambesi - *** Failers +\= Expect no match aCD XY /(?<=foo\n)^bar/m foo\nbar - *** Failers +\= Expect no match bar baz\nbar @@ -2702,18 +2684,18 @@ barbaz barbarbaz koobarbaz - *** Failers +\= Expect no match baz foobarbaz -/The following tests are taken from the Perl 5.005 test suite; some of them/ -/are compatible with 5.004, but I'd rather not have to sort them out./ +# The following tests are taken from the Perl 5.005 test suite; some of them +# are compatible with 5.004, but I'd rather not have to sort them out. 
/abc/ abc xabcy ababc - *** Failers +\= Expect no match xbc axc abx @@ -2737,7 +2719,7 @@ /ab+bc/ abbc - *** Failers +\= Expect no match abc abq @@ -2754,7 +2736,7 @@ abbbbc /ab{4,5}bc/ - *** Failers +\= Expect no match abq abbbbc @@ -2775,7 +2757,7 @@ /^abc$/ abc - *** Failers +\= Expect no match abbbbc abcc @@ -2786,8 +2768,8 @@ /abc$/ aabc - *** Failers aabc +\= Expect no match aabcd /^/ @@ -2805,7 +2787,7 @@ /a[bc]d/ abd - *** Failers +\= Expect no match axyzd abc @@ -2829,7 +2811,7 @@ /a[^bc]d/ aed - *** Failers +\= Expect no match abd abd @@ -2838,8 +2820,8 @@ /a[^]b]c/ adc - *** Failers a-c +\= Expect no match a]c /\ba\b/ @@ -2848,13 +2830,13 @@ -a- /\by\b/ - *** Failers +\= Expect no match xy yz xyz /\Ba\B/ - *** Failers +\= Expect no match a- -a -a- @@ -2873,8 +2855,7 @@ /\W/ - - *** Failers - - +\= Expect no match a /a\sb/ @@ -2882,8 +2863,7 @@ /a\Sb/ a-b - *** Failers - a-b +\= Expect no match a b /\d/ @@ -2891,8 +2871,7 @@ /\D/ - - *** Failers - - +\= Expect no match 1 /[\w]/ @@ -2900,8 +2879,7 @@ /[\W]/ - - *** Failers - - +\= Expect no match a /a[\s]b/ @@ -2909,8 +2887,7 @@ /a[\S]b/ a-b - *** Failers - a-b +\= Expect no match a b /[\d]/ @@ -2918,8 +2895,7 @@ /[\D]/ - - *** Failers - - +\= Expect no match 1 /ab|cd/ @@ -2939,6 +2915,8 @@ a((b /a\\b/ + a\\b +\= Expect no match a\b /((a))/ @@ -2978,12 +2956,10 @@ cde /abc/ - *** Failers +\= Expect no match b - /a*/ - /([abc])*d/ abbbcd @@ -3037,7 +3013,7 @@ adcdcde /a[bcd]+dcdcde/ - *** Failers +\= Expect no match abcde adcdcde @@ -3057,7 +3033,7 @@ effgz ij reffgz - *** Failers +\= Expect no match effg bcdd @@ -3068,7 +3044,7 @@ a /multiple words of text/ - *** Failers +\= Expect no match aa uh-uh @@ -3096,7 +3072,7 @@ ABC XABCY ABABC - *** Failers +\= Expect no match aaxabxbaxbbx XBC AXC @@ -3119,7 +3095,7 @@ ABBC /ab+bc/i - *** Failers +\= Expect no match ABC ABQ @@ -3138,7 +3114,7 @@ ABBBBC /ab{4,5}?bc/i - *** Failers +\= Expect no match ABQ ABBBBC @@ -3159,7 +3135,7 @@ /^abc$/i ABC - *** Failers +\= 
Expect no match ABBBBC ABCC @@ -3185,8 +3161,8 @@ AXYZC /a.*c/i - *** Failers AABC +\= Expect no match AXYZD /a[bc]d/i @@ -3194,7 +3170,7 @@ /a[b-d]e/i ACE - *** Failers +\= Expect no match ABC ABD @@ -3218,7 +3194,7 @@ /a[^-b]c/i ADC - *** Failers +\= Expect no match ABD A-C @@ -3233,7 +3209,7 @@ DEF /$b/i - *** Failers +\= Expect no match A]C B @@ -3245,7 +3221,8 @@ A((B /a\\b/i - A\B +\= Expect no match + A\=notbol /((a))/i ABC @@ -3295,7 +3272,6 @@ /abc/i /a*/i - /([abc])*d/i ABBBCD @@ -3323,6 +3299,7 @@ HIJ /^(ab|cd)e/i +\= Expect no match ABCDE /(abc|)ef/i @@ -3367,7 +3344,7 @@ EFFGZ IJ REFFGZ - *** Failers +\= Expect no match ADCDCDE EFFG BCDD @@ -3385,7 +3362,7 @@ C /multiple words of text/i - *** Failers +\= Expect no match AA UH-UH @@ -3478,7 +3455,7 @@ /(?<=a)b/ ab - *** Failers +\= Expect no match cb b @@ -3524,7 +3501,7 @@ Ab /(?:(?i)a)b/ - *** Failers +\= Expect no match cb aB @@ -3543,7 +3520,7 @@ Ab /(?i:a)b/ - *** Failers +\= Expect no match aB aB @@ -3562,8 +3539,8 @@ aB /(?:(?-i)a)b/i - *** Failers aB +\= Expect no match Ab /((?-i)a)b/i @@ -3575,7 +3552,7 @@ aB /(?:(?-i)a)b/i - *** Failers +\= Expect no match Ab AB @@ -3594,7 +3571,7 @@ aB /(?-i:a)b/i - *** Failers +\= Expect no match AB Ab @@ -3607,14 +3584,14 @@ aB /(?-i:a)b/i - *** Failers +\= Expect no match Ab AB /((?-i:a))b/i /((?-i:a.))b/i - *** Failers +\= Expect no match AB a\nB @@ -3640,7 +3617,7 @@ aaac /(?.*)(?<=(abcd|wxyz))/ alphabetabcd endingwxyz - *** Failers +\= Expect no match a rather long string that doesn't end with one of them /word (?>(?:(?!otherword)[a-zA-Z0-9]+ ){0,30})otherword/ word cat dog elephant mussel cow horse canary baboon snake shark otherword +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark /word (?>[a-zA-Z0-9]+ ){0,30}otherword/ +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark the quick brown fox and the lazy dog and several other words getting close to thirty by now I hope 
/(?<=\d{3}(?!999))foo/ 999foo 123999foo - *** Failers +\= Expect no match 123abcfoo /(?<=(?!...999)\d{3})foo/ 999foo 123999foo - *** Failers +\= Expect no match 123abcfoo /(?<=\d{3}(?!999)...)foo/ 123abcfoo 123456foo - *** Failers +\= Expect no match 123999foo /(?<=\d{3}...)(? \x09\x0a\x0c\x0d\x0b< @@ -3856,7 +3827,8 @@ a\nxb\n /(?!^)x/m - a\nxb\n +\= Expect no match + a\nxb\n /abc\Qabc\Eabc/ abcabcabc @@ -3866,7 +3838,7 @@ / abc\Q abc\Eabc/x abc abcabc - *** Failers +\= Expect no match abcabcabc /abc#comment @@ -3898,7 +3870,7 @@ /\Gabc/ abc - *** Failers +\= Expect no match xyzabc /\Gabc./g @@ -3909,7 +3881,7 @@ /a(?x: b c )d/ XabcdY - *** Failers +\= Expect no match Xa b c d Y /((?x)x y z | a b c)/ @@ -3918,13 +3890,13 @@ /(?i)AB(?-i)C/ XabCY - *** Failers +\= Expect no match XabcY /((?i)AB(?-i)C|D)E/ abCE DE - *** Failers +\= Expect no match abcE abCe dE @@ -3936,17 +3908,11 @@ - d ] - *** Failers +\= Expect no match b -/[\z\C]/ - z - C - -/\M/ - M - /(a+)*b/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /(?i)reg(?:ul(?:[aä]|ae)r|ex)/ @@ -3971,57 +3937,34 @@ ac bbbbc -/abc/SS>testsavedregex -testsavedregex -testsavedregex -testsavedregex - - xyz\r\nabc\ - xyz\rabc\ - xyz\r\nabc\ - ** Failers - xyz\nabc\ - xyz\r\nabc\ - xyz\nabc\ - xyz\rabc\ - xyz\rabc\ - -/abc$/m - xyzabc - xyzabc\n - xyzabc\npqr - xyzabc\r\ - xyzabc\rpqr\ - xyzabc\r\n\ - xyzabc\r\npqr\ - ** Failers - xyzabc\r - xyzabc\rpqr - xyzabc\r\n - xyzabc\r\npqr + xyz\r\nabc +\= Expect no match + xyz\rabc + xyzabc\r + xyzabc\rpqr + xyzabc\r\n + xyzabc\r\npqr + +/^abc/Im,newline=crlf + xyz\r\nabclf> +\= Expect no match + xyz\nabclf + xyz\rabclf -/^abc/m - xyz\rabcdef - xyz\nabcdef\ - ** Failers - xyz\nabcdef - -/^abc/m - xyz\nabcdef - xyz\rabcdef\ - ** Failers - xyz\rabcdef - -/^abc/m - xyz\r\nabcdef - xyz\rabcdef\ - ** Failers - xyz\rabcdef - -/.*/ +/^abc/Im,newline=cr + xyz\rabc +\= Expect no match + xyz\nabc + xyz\r\nabc + +/.*/I,newline=lf + abc\ndef + 
abc\rdef + abc\r\ndef + +/.*/I,newline=cr + abc\ndef + abc\rdef + abc\r\ndef + +/.*/I,newline=crlf + abc\ndef + abc\rdef + abc\r\ndef + +/\w+(.)(.)?def/Is abc\ndef abc\rdef abc\r\ndef - \abc\ndef - \abc\rdef - \abc\r\ndef - \abc\ndef - \abc\rdef - \abc\r\ndef /\w+(.)(.)?def/s abc\ndef @@ -4129,43 +4058,58 @@ aaaa /(a|)*\d/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /(?>a|)*\d/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa /(?:a|)*\d/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/^a.b/newline=lf + a\rb +\= Expect no match + a\nb -/^a.b/ +/^a.b/newline=cr + a\nb +\= Expect no match + a\rb + +/^a.b/newline=anycrlf + a\x85b +\= Expect no match a\rb - a\nb\ - ** Failers + +/^a.b/newline=any +\= Expect no match a\nb - a\nb\ - a\rb\ - a\rb\ + a\rb + a\x85b -/^abc./mgx +/^abc./gmx,newline=any abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK -/abc.$/mgx +/abc.$/gmx,newline=any abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9 -/^a\Rb/ +/^a\Rb/bsr=unicode a\nb a\rb a\r\nb a\x0bb a\x0cb a\x85b - ** Failers +\= Expect no match a\n\rb -/^a\R*b/ +/^a\R*b/bsr=unicode ab a\nb a\rb @@ -4176,7 +4120,7 @@ a\n\rb a\n\r\x85\x0cb -/^a\R+b/ +/^a\R+b/bsr=unicode a\nb a\rb a\r\nb @@ -4185,10 +4129,10 @@ a\x85b a\n\rb a\n\r\x85\x0cb - ** Failers +\= Expect no match ab -/^a\R{1,3}b/ +/^a\R{1,3}b/bsr=unicode a\nb a\n\rb a\n\r\x85b @@ -4196,30 +4140,25 @@ a\r\n\r\n\r\nb a\n\r\n\rb a\n\n\r\nb - ** Failers +\= Expect no match a\n\n\n\rb a\r -/^a[\R]b/ - aRb - ** Failers - a\nb 
- /.+foo/ afoo - ** Failers +\= Expect no match \r\nfoo \nfoo -/.+foo/ +/.+foo/newline=crlf afoo \nfoo - ** Failers +\= Expect no match \r\nfoo -/.+foo/ +/.+foo/newline=any afoo - ** Failers +\= Expect no match \nfoo \r\nfoo @@ -4228,24 +4167,24 @@ \r\nfoo \nfoo -/^$/mg +/^$/gm,newline=any abc\r\rxyz abc\n\rxyz - ** Failers +\= Expect no match abc\r\nxyz /^X/m XABC - ** Failers - XABC\B +\= Expect no match + XABC\=notbol -/(?m)^$/g+ +/(?m)^$/g,newline=any,aftertext abc\r\n\r\n -/(?m)^$|^\r\n/g+ +/(?m)^$|^\r\n/g,newline=any,aftertext abc\r\n\r\n -/(?m)$/g+ +/(?m)$/g,newline=any,aftertext abc\r\n\r\n /(?|(abc)|(xyz))/ @@ -4263,20 +4202,20 @@ /(?|(abc)|(xyz))(?1)/ abcabc xyzabc - ** Failers +\= Expect no match xyzxyz /\H\h\V\v/ X X\x0a X\x09X\x0b - ** Failers +\= Expect no match \xa0 X\x0a -/\H*\h+\V?\v{3,4}/ +/\H*\h+\V?\v{3,4}/ \x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a \x09\x20\xa0\x0a\x0b\x0c\x0d\x0a \x09\x20\xa0\x0a\x0b\x0c - ** Failers +\= Expect no match \x09\x20\xa0\x0a\x0b /\H{3,4}/ @@ -4289,7 +4228,7 @@ /\h*X\h?\H+Y\H?Z/ >XNNNYZ > X NYQZ - ** Failers +\= Expect no match >XYZ > X NY Z @@ -4297,248 +4236,241 @@ >XY\x0aZ\x0aA\x0bNN\x0c >\x0a\x0dX\x0aY\x0a\x0bZZZ\x0aAAA\x0bNNN\x0c -/.+A/ +/.+A/newline=crlf +\= Expect no match \r\nA -/\nA/ +/\nA/newline=crlf \r\nA -/[\r\n]A/ +/[\r\n]A/newline=crlf \r\nA -/(\r|\n)A/ +/(\r|\n)A/newline=crlf \r\nA -/a\Rb/I +/a\Rb/I,bsr=anycrlf a\rb a\nb a\r\nb - ** Failers +\= Expect no match a\x85b a\x0bb -/a\Rb/I +/a\Rb/I,bsr=unicode a\rb a\nb a\r\nb a\x85b a\x0bb - ** Failers - a\x85b\ - a\x0bb\ -/a\R?b/I +/a\R?b/I,bsr=anycrlf a\rb a\nb a\r\nb - ** Failers +\= Expect no match a\x85b a\x0bb -/a\R?b/I +/a\R?b/I,bsr=unicode a\rb a\nb a\r\nb a\x85b a\x0bb - ** Failers - a\x85b\ - a\x0bb\ -/a\R{2,4}b/I +/a\R{2,4}b/I,bsr=anycrlf a\r\n\nb a\n\r\rb a\r\n\r\n\r\n\r\nb - ** Failers - a\x85\85b - a\x0b\0bb +\= Expect no match + a\x0b\x0bb + a\x85\x85b -/a\R{2,4}b/I +/a\R{2,4}b/I,bsr=unicode a\r\rb a\n\n\nb a\r\n\n\r\rb - a\x85\85b - a\x0b\0bb - ** 
Failers + a\x85\x85b + a\x0b\x0bb +\= Expect no match a\r\r\r\r\rb - a\x85\85b\ - a\x0b\0bb\ /a(?!)|\wbc/ abc -/a[]b/ - ** Failers +/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames +\= Expect no match ab -/a[]+b/ - ** Failers +/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames +\= Expect no match ab -/a[]*+b/ - ** Failers +/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames +\= Expect no match ab -/a[^]b/ +/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames aXb a\nb - ** Failers +\= Expect no match ab -/a[^]+b/ +/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames aXb a\nX\nXb - ** Failers +\= Expect no match ab -/X$/E +/X$/dollar_endonly X - ** Failers +\= Expect no match X\n /X$/ X X\n -/xyz/C +/xyz/auto_callout xyz abcxyz - abcxyz\Y - ** Failers +\= Expect no match abc - abc\Y abcxypqr - abcxypqr\Y -/(*NO_START_OPT)xyz/C +/xyz/auto_callout,no_start_optimize + abcxyz +\= Expect no match + abc + abcxypqr + +/(*NO_START_OPT)xyz/auto_callout abcxyz /(?C)ab/ ab - \C-ab + ab\=callout_none -/ab/C +/ab/auto_callout ab - \C-ab + ab\=callout_none -/^"((?(?=[a])[^"])|b)*"$/C +/^"((?(?=[a])[^"])|b)*"$/auto_callout "ab" - \C-"ab" + "ab"\=callout_none /\d+X|9+Y/ - ++++123999\P - ++++123999Y\P + ++++123999\=ps + ++++123999Y\=ps /Z(*F)/ - Z\P - ZA\P +\= Expect no match + Z\=ps + ZA\=ps /Z(?!)/ - Z\P - ZA\P +\= Expect no match + Z\=ps + ZA\=ps /dog(sbody)?/ - dogs\P - dogs\P\P + dogs\=ps + dogs\=ph /dog(sbody)??/ - dogs\P - dogs\P\P + dogs\=ps + dogs\=ph /dog|dogsbody/ - dogs\P - dogs\P\P + dogs\=ps + dogs\=ph /dogsbody|dog/ - dogs\P - dogs\P\P + dogs\=ps + dogs\=ph /Z(*F)Q|ZXY/ - Z\P - ZA\P - X\P + Z\=ps +\= Expect no match + ZA\=ps + X\=ps /\bthe cat\b/ - the cat\P - the cat\P\P + the cat\=ps + the cat\=ph /dog(sbody)?/ - dogs\D\P - body\D\R + dogs\=ps + body\=dfa_restart /dog(sbody)?/ - dogs\D\P\P - body\D\R + dogs\=ph + body\=dfa_restart /abc/ - abc\P - abc\P\P + abc\=ps + abc\=ph /abc\K123/ xyzabc123pqr 
-/(?<=abc)123/ +/(?<=abc)123/allusedtext xyzabc123pqr - xyzabc12\P - xyzabc12\P\P + xyzabc12\=ps + xyzabc12\=ph -/\babc\b/ +/\babc\b/allusedtext +++abc+++ - +++ab\P - +++ab\P\P + +++ab\=ps + +++ab\=ph -/(?=C)/g+ +/(?=C)/g,aftertext ABCDECBA /(abc|def|xyz)/I terhjk;abcdaadsfe the quick xyz brown fox - \Yterhjk;abcdaadsfe - \Ythe quick xyz brown fox - ** Failers +\= Expect no match thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd - \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd -/(abc|def|xyz)/SI +/(abc|def|xyz)/I,no_start_optimize terhjk;abcdaadsfe - the quick xyz brown fox - \Yterhjk;abcdaadsfe - \Ythe quick xyz brown fox - ** Failers + the quick xyz brown fox +\= Expect no match thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd - \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd -/abcd*/+ - xxxxabcd\P - xxxxabcd\P\P - dddxxx\R - xxxxabcd\P\P - xxx\R +/abcd*/aftertext + xxxxabcd\=ps + xxxxabcd\=ph + dddxxx\=dfa_restart + xxxxabcd\=ph + xxx\=dfa_restart /abcd*/i - xxxxabcd\P - xxxxabcd\P\P - XXXXABCD\P - XXXXABCD\P\P + xxxxabcd\=ps + xxxxabcd\=ph + XXXXABCD\=ps + XXXXABCD\=ph /abc\d*/ - xxxxabc1\P - xxxxabc1\P\P + xxxxabc1\=ps + xxxxabc1\=ph /abc[de]*/ - xxxxabcde\P - xxxxabcde\P\P + xxxxabcde\=ps + xxxxabcde\=ph /(?:(?1)|B)(A(*F)|C)/ ABCD CCD - ** Failers +\= Expect no match CAD /^(?:(?1)|B)(A(*F)|C)/ CCD BCD - ** Failers +\= Expect no match ABCD CAD BAD @@ -4547,7 +4479,6 @@ ac /^(?=a(*SKIP)b|ac)/ - ** Failers ac /^(?=a(*THEN)b|ac)/ @@ -4555,106 +4486,107 @@ /^(?=a(*PRUNE)b)/ ab - ** Failers - ac /^(?(?!a(*SKIP)b))/ ac -/(?<=abc)def/ - abc\P\P +/(?<=abc)def/allusedtext + abc\=ph /abc$/ abc - abc\P - abc\P\P + abc\=ps + abc\=ph /abc$/m abc abc\n - abc\P\P - abc\n\P\P - abc\P - abc\n\P + abc\=ph + abc\n\=ph + abc\=ps + abc\n\=ps /abc\z/ abc - abc\P - abc\P\P + abc\=ps + abc\=ph /abc\Z/ abc - abc\P - abc\P\P + abc\=ps + abc\=ph /abc\b/ abc - abc\P - abc\P\P + abc\=ps + abc\=ph /abc\B/ + abc\=ps + abc\=ph +\= Expect no match abc - abc\P - abc\P\P /.+/ - abc\>0 - abc\>1 - abc\>2 - 
abc\>3 - abc\>4 - abc\>-4 + abc\=offset=0 + abc\=offset=1 + abc\=offset=2 +\= Bad offsets + abc\=offset=4 + abc\=offset=-4 +\= Expect no match + abc\=offset=3 /^(?:a)++\w/ aaaab - ** Failers +\= Expect no match aaaa bbb /^(?:aa|(?:a)++\w)/ aaaab aaaa - ** Failers +\= Expect no match bbb /^(?:a)*+\w/ aaaab bbb - ** Failers +\= Expect no match aaaa /^(a)++\w/ aaaab - ** Failers +\= Expect no match aaaa bbb /^(a|)++\w/ aaaab - ** Failers +\= Expect no match aaaa bbb -/(?=abc){3}abc/+ +/(?=abc){3}abc/aftertext abcabcabc - ** Failers +\= Expect no match xyz -/(?=abc)+abc/+ +/(?=abc)+abc/aftertext abcabcabc - ** Failers +\= Expect no match xyz -/(?=abc)++abc/+ +/(?=abc)++abc/aftertext abcabcabc - ** Failers +\= Expect no match xyz /(?=abc){0}xyz/ xyz /(?=abc){1}xyz/ - ** Failers +\= Expect no match xyz /(?=(a))?./ @@ -4691,99 +4623,93 @@ /((?(R)a+|(?1)b))/ aaaabcde -/((?(R2)a+|(?1)b))/ +/((?(R2)a+|(?1)b))()/ aaaabcde /(?(R)a*(?1)|((?R))b)/ aaaabcde -/(a+)/O - \O6aaaa - \O8aaaa - -/ab\Cde/ - abXde - -/(?<=ab\Cde)X/ - abZdeX +/(a+)/no_auto_possess + aaaa\=ovector=3 + aaaa\=ovector=4 /^\R/ - \r\P - \r\P\P + \r\=ps + \r\=ph /^\R{2,3}x/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph \r\rx \r\r\rx /^\R{2,3}?x/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph \r\rx \r\r\rx /^\R?x/ - \r\P - \r\P\P + \r\=ps + \r\=ph x \rx /^\R+x/ - \r\P - \r\P\P - \r\n\P - \r\n\P\P + \r\=ps + \r\=ph + \r\n\=ps + \r\n\=ph \rx -/^a$/ - a\r\P - a\r\P\P +/^a$/newline=crlf + a\r\=ps + a\r\=ph -/^a$/m - a\r\P - a\r\P\P +/^a$/m,newline=crlf + a\r\=ps + a\r\=ph -/^(a$|a\r)/ - a\r\P - a\r\P\P +/^(a$|a\r)/newline=crlf + a\r\=ps + a\r\=ph -/^(a$|a\r)/m - a\r\P - a\r\P\P +/^(a$|a\r)/m,newline=crlf + a\r\=ps + a\r\=ph -/./ - \r\P - \r\P\P +/./newline=crlf + \r\=ps + \r\=ph -/.{2,3}/ - \r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -/.{2,3}?/ - 
\r\P - \r\P\P - \r\r\P - \r\r\P\P - \r\r\r\P - \r\r\r\P\P - -/-- Test simple validity check for restarts --/ +/.{2,3}/newline=crlf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +/.{2,3}?/newline=crlf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +# Test simple validity check for restarts /abcdef/ - abc\R + abc\=dfa_restart /)(.)|(?R))++)*F>/ text text xxxxx text F> text2 more text. @@ -4797,9 +4723,9 @@ xx\xa0xxxxxabcd /abcd/ - abcd\O0 + abcd\=ovector=0 -/-- These tests show up auto-possessification --/ +# These tests show up auto-possessification /[ab]*/ aaaa @@ -4837,15 +4763,267 @@ '\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++' NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED +/abc(?=xyz)/allusedtext + abcxyzpqr + abcxyzpqr\=aftertext + +/(?<=pqr)abc(?=xyz)/allusedtext + xyzpqrabcxyzpqr + xyzpqrabcxyzpqr\=aftertext + +/a\b/ + a.\=allusedtext + a\=allusedtext + +/abc(?=abcde)(?=ab)/allusedtext + abcabcdefg + +/a*?b*?/ + ab + +/(*NOTEMPTY)a*?b*?/ + ab + ba + cb + +/(*NOTEMPTY_ATSTART)a*?b*?/aftertext + ab + cdab + +/(a)(b)|(c)/ + XcX\=ovector=2,get=1,get=2,get=3,get=4,getall + +/(?aa)/ + aa\=get=A + aa\=copy=A + +/a+/no_auto_possess + a\=ovector=2,get=1,get=2,getall + aaa\=ovector=2,get=1,get=2,getall + +/a(b)c(d)/ + abc\=ph,copy=0,copy=1,getall + +/ab(?C" any text with spaces ")cde/B + abcde + 12abcde + +/^a(b)c(?C1)def/ + abcdef + +/^a(b)c(?C"AB")def/ + abcdef + +/^a(b)c(?C1)def/ + abcdef\=callout_capture + +/^a(b)c(?C{AB})def/B + abcdef\=callout_capture + +/^(?(?C25)(?=abc)abcd|xyz)/B + abcdefg + xyz123 + +/^(?(?C$abc$)(?=abc)abcd|xyz)/B + abcdefg + xyz123 + +/^ab(?C'first')cd(?C"second")ef/ + abcdefg + +/(?:a(?C`code`)){3}X/ + aaaXY + +# Binary zero in callout string +/"a(?C'x" 00 "z')b"/hex + abcdefgh + /(?(?!)a|b)/ bbb +\= Expect no match aaa -/()()a+/O= - aaa\D - a\D +/^/gm + \n\n\n + +/^/gm,alt_circumflex + \n\n\n + +/abc/use_offset_limit + 1234abcde\=offset_limit=100 + 1234abcde\=offset_limit=9 + 
1234abcde\=offset_limit=4 + 1234abcde\=offset_limit=4,offset=4 +\= Expect no match + 1234abcde\=offset_limit=4,offset=5 + 1234abcde\=offset_limit=3 + +/(?<=abc)/use_offset_limit + 1234abc\=offset_limit=7 +\= Expect no match + 1234abc\=offset_limit=6 + +/abcd/null_context + abcd\=null_context + +/()()a+/no_auto_possess + aaa\=allcaptures + a\=allcaptures + +/(*LIMIT_DEPTH=100)^((.)(?1)|.)$/ +\= Expect depth limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + +/(*LIMIT_HEAP=0)^((.)(?1)|.)$/ +\= Expect heap limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + +/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/ +\= Expect success + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] /(02-)?[0-9]{3}-[0-9]{3}/ 02-123-123 -/-- End of testinput8 --/ +/^(a(?2))(b)(?1)/ + abbab\=find_limits + +/abc/endanchored + xyzabc +\= Expect no match + xyzabcdef +\= Expect error + xyzabc\=ph + +/abc/ + xyzabc\=endanchored +\= Expect no match + xyzabcdef\=endanchored +\= Expect error + xyzabc\=ps,endanchored + +/abc|bcd/endanchored + xyzabcd +\= Expect no match + xyzabcdef + +/(*NUL)^.*/ + a\nb\x00ccc + +/(*NUL)^.*/s + a\nb\x00ccc + +/^x/m,newline=nul + ab\x00xy + +/'#comment' 0d 0a 00 '^x\' 0a 'y'/x,newline=nul,hex + x\nyz + +/(*NUL)^X\NY/ + X\nY + X\rY +\= Expect no match + X\x00Y + +/(?<=abc|)/ + abcde\=aftertext + +/(?<=|abc)/ + abcde\=aftertext + +/(?<=abc|)/endanchored + abcde\=aftertext + +/(?<=|abc)/endanchored + abcde\=aftertext + +/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 
\x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor +\= Expect limit exceeded +.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?);); + +/\n/firstline + xyz\nabc + +/\nabc/firstline + xyz\nabc + +/\x{0a}abc/firstline,newline=crlf +\= Expect no match + xyz\r\nabc + +/[abc]/firstline +\= Expect no match + \na + +/foobar/ + the foobar thing\=copy_matched_subject + the foobar thing\=copy_matched_subject,zero_terminate + +/foobar/g + the foobar thing foobar again\=copy_matched_subject + +/(?(VERSION>=0)^B0W)/ + B0W-W0W +\= Expect no match + 0 + +/(?(VERSION>=1000)^B0W|W0W)/ + B0W-W0W +\= Expect no match + 0 + +/(?<=pqr)abc(?=xyz)/ + 123pqrabcxy\=ps,allusedtext + 123pqrabcxyz\=ps,allusedtext + +/(?>a+b)/ + aaaa\=ps + aaaab\=ps + +/(abc)(?1)/ + abca\=ps + abcabc\=ps + +/(?(?=abc).*|Z)/ + ab\=ps + abcxyz\=ps + +/(abc)++x/ + abcab\=ps + abc\=ps + ab\=ps + abcx + +/\z/ + abc\=ph + abc\=ps + +/\Z/ + abc\=ph + abc\=ps + abc\n\=ph + abc\n\=ps + +/c*+(?<=[bc])/ + abc\=ph + ab\=ph + abc\=ps + ab\=ps + +/c++(?<=[bc])/ + abc\=ph + ab\=ph + +/(?<=(?=.(?<=x)))/ + abx + ab\=ph + bxyz + xyz + +/(?![ab]).*/ + ab\=ph + +/c*+/ + ab\=ph,offset=2 + +# End of testinput6 diff --git a/src/pcre2/testdata/testinput7 b/src/pcre2/testdata/testinput7 new file mode 100644 index 00000000..ef302235 --- /dev/null +++ b/src/pcre2/testdata/testinput7 @@ -0,0 +1,2096 @@ +# This set of tests checks UTF and Unicode 
property support with the DFA +# matching functionality of pcre_dfa_match(). A default subject modifier is +# used to force DFA matching for all tests. + +#subject dfa +#newline_default LF any anyCRLF + +/\x{100}ab/utf + \x{100}ab + +/a\x{100}*b/utf + ab + a\x{100}b + a\x{100}\x{100}b + +/a\x{100}+b/utf + a\x{100}b + a\x{100}\x{100}b +\= Expect no match + ab + +/\bX/utf + Xoanon + +Xoanon + \x{300}Xoanon +\= Expect no match + YXoanon + +/\BX/utf + YXoanon +\= Expect no match + Xoanon + +Xoanon + \x{300}Xoanon + +/X\b/utf + X+oanon + ZX\x{300}oanon + FAX +\= Expect no match + Xoanon + +/X\B/utf + Xoanon +\= Expect no match + X+oanon + ZX\x{300}oanon + FAX + +/[^a]/utf + abcd + a\x{100} + +/^[abc\x{123}\x{400}-\x{402}]{2,3}\d/utf + ab99 + \x{123}\x{123}45 + \x{400}\x{401}\x{402}6 +\= Expect no match + d99 + \x{123}\x{122}4 + \x{400}\x{403}6 + \x{400}\x{401}\x{402}\x{402}6 + +/a.b/utf + acb + a\x7fb + a\x{100}b +\= Expect no match + a\nb + +/a(.{3})b/utf + a\x{4000}xyb + a\x{4000}\x7fyb + a\x{4000}\x{100}yb +\= Expect no match + a\x{4000}b + ac\ncb + +/a(.*?)(.)/ + a\xc0\x88b + +/a(.*?)(.)/utf + a\x{100}b + +/a(.*)(.)/ + a\xc0\x88b + +/a(.*)(.)/utf + a\x{100}b + +/a(.)(.)/ + a\xc0\x92bcd + +/a(.)(.)/utf + a\x{240}bcd + +/a(.?)(.)/ + a\xc0\x92bcd + +/a(.?)(.)/utf + a\x{240}bcd + +/a(.??)(.)/ + a\xc0\x92bcd + +/a(.??)(.)/utf + a\x{240}bcd + +/a(.{3})b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b +\= Expect no match + a\x{1234}b + ac\ncb + +/a(.{3,})b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b +\= Expect no match + a\x{1234}b + +/a(.{3,}?)b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b +\= Expect no match + a\x{1234}b + +/a(.{3,5})b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b + axbxxbcdefghijb + 
axxxxxbcdefghijb +\= Expect no match + a\x{1234}b + axxxxxxbcdefghijb + +/a(.{3,5}?)b/utf + a\x{1234}xyb + a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + a\x{1234}\x{4321}\x{3412}\x{3421}b + axbxxbcdefghijb + axxxxxbcdefghijb +\= Expect no match + a\x{1234}b + axxxxxxbcdefghijb + +/^[a\x{c0}]/utf +\= Expect no match + \x{100} + +/(?<=aXb)cd/utf + aXbcd + +/(?<=a\x{100}b)cd/utf + a\x{100}bcd + +/(?<=a\x{100000}b)cd/utf + a\x{100000}bcd + +/(?:\x{100}){3}b/utf + \x{100}\x{100}\x{100}b +\= Expect no match + \x{100}\x{100}b + +/\x{ab}/utf + \x{ab} + \xc2\xab +\= Expect no match + \x00{ab} + +/(?<=(.))X/utf + WXYZ + \x{256}XYZ +\= Expect no match + XYZ + +/[^a]+/g,utf + bcd + \x{100}aY\x{256}Z + +/^[^a]{2}/utf + \x{100}bc + +/^[^a]{2,}/utf + \x{100}bcAa + +/^[^a]{2,}?/utf + \x{100}bca + +/[^a]+/gi,utf + bcd + \x{100}aY\x{256}Z + +/^[^a]{2}/i,utf + \x{100}bc + +/^[^a]{2,}/i,utf + \x{100}bcAa + +/^[^a]{2,}?/i,utf + \x{100}bca + +/\x{100}{0,0}/utf + abcd + +/\x{100}?/utf + abcd + \x{100}\x{100} + +/\x{100}{0,3}/utf + \x{100}\x{100} + \x{100}\x{100}\x{100}\x{100} + +/\x{100}*/utf + abce + \x{100}\x{100}\x{100}\x{100} + +/\x{100}{1,1}/utf + abcd\x{100}\x{100}\x{100}\x{100} + +/\x{100}{1,3}/utf + abcd\x{100}\x{100}\x{100}\x{100} + +/\x{100}+/utf + abcd\x{100}\x{100}\x{100}\x{100} + +/\x{100}{3}/utf + abcd\x{100}\x{100}\x{100}XX + +/\x{100}{3,5}/utf + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + +/\x{100}{3,}/utf,no_auto_possess + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + +/(?<=a\x{100}{2}b)X/utf + Xyyya\x{100}\x{100}bXzzz + +/\D*/utf,no_auto_possess + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/\D*/utf,no_auto_possess + 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + +/\D/utf + 1X2 + 1\x{100}2 + +/>\S/utf + > >X Y + > >\x{100} Y + +/\d/utf + \x{100}3 + +/\s/utf + \x{100} X + +/\D+/utf + 12abcd34 +\= Expect no match + 1234 + +/\D{2,3}/utf + 12abcd34 + 12ab34 +\= Expect no match + 1234 + 12a34 + +/\D{2,3}?/utf + 12abcd34 + 12ab34 +\= Expect no match + 1234 + 12a34 + +/\d+/utf + 12abcd34 + +/\d{2,3}/utf + 12abcd34 + 1234abcd +\= Expect no match + 1.4 + +/\d{2,3}?/utf + 12abcd34 + 1234abcd +\= Expect no 
match + 1.4 + +/\S+/utf + 12abcd34 +\= Expect no match + \ \ + +/\S{2,3}/utf + 12abcd34 + 1234abcd +\= Expect no match + \ \ + +/\S{2,3}?/utf + 12abcd34 + 1234abcd +\= Expect no match + \ \ + +/>\s+ <34 + +/>\s{2,3} \s{2,3}? \xff< + +/[\xff]/utf + >\x{ff}< + +/[^\xFF]/ + XYZ + +/[^\xff]/utf + XYZ + \x{123} + +/^[ac]*b/utf +\= Expect no match + xb + +/^[ac\x{100}]*b/utf +\= Expect no match + xb + +/^[^x]*b/i,utf +\= Expect no match + xb + +/^[^x]*b/utf +\= Expect no match + xb + +/^\d*b/utf +\= Expect no match + xb + +/(|a)/g,utf + catac + a\x{256}a + +/^\x{85}$/i,utf + \x{85} + +/^abc./gmx,newline=any,utf + abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK + +/abc.$/gmx,newline=any,utf + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 + +/^a\Rb/bsr=unicode,utf + a\nb + a\rb + a\r\nb + a\x0bb + a\x0cb + a\x{85}b + a\x{2028}b + a\x{2029}b +\= Expect no match + a\n\rb + +/^a\R*b/bsr=unicode,utf + ab + a\nb + a\rb + a\r\nb + a\x0bb + a\x0c\x{2028}\x{2029}b + a\x{85}b + a\n\rb + a\n\r\x{85}\x0cb + +/^a\R+b/bsr=unicode,utf + a\nb + a\rb + a\r\nb + a\x0bb + a\x0c\x{2028}\x{2029}b + a\x{85}b + a\n\rb + a\n\r\x{85}\x0cb +\= Expect no match + ab + +/^a\R{1,3}b/bsr=unicode,utf + a\nb + a\n\rb + a\n\r\x{85}b + a\r\n\r\nb + a\r\n\r\n\r\nb + a\n\r\n\rb + a\n\n\r\nb +\= Expect no match + a\n\n\n\rb + a\r + +/\h+\V?\v{3,4}/utf,no_auto_possess + \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + +/\V?\v{3,4}/utf,no_auto_possess + \x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + +/\h+\V?\v{3,4}/utf,no_auto_possess + >\x09\x20\x{a0}X\x0a\x0a\x0a< + +/\V?\v{3,4}/utf,no_auto_possess + >\x09\x20\x{a0}X\x0a\x0a\x0a< + +/\H\h\V\v/utf + X X\x0a + X\x09X\x0b +\= Expect no match + \x{a0} X\x0a + +/\H*\h+\V?\v{3,4}/utf,no_auto_possess + \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a + \x09\x20\x{a0}\x0a\x0b\x0c +\= Expect no match + \x09\x20\x{a0}\x0a\x0b + +/\H\h\V\v/utf + 
\x{3001}\x{3000}\x{2030}\x{2028} + X\x{180e}X\x{85} +\= Expect no match + \x{2009} X\x0a + +/\H*\h+\V?\v{3,4}/utf,no_auto_possess + \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a + \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a + \x09\x20\x{202f}\x0a\x0b\x0c +\= Expect no match + \x09\x{200a}\x{a0}\x{2028}\x0b + +/a\Rb/I,bsr=anycrlf,utf + a\rb + a\nb + a\r\nb +\= Expect no match + a\x{85}b + a\x0bb + +/a\Rb/I,bsr=unicode,utf + a\rb + a\nb + a\r\nb + a\x{85}b + a\x0bb + +/a\R?b/I,bsr=anycrlf,utf + a\rb + a\nb + a\r\nb +\= Expect no match + a\x{85}b + a\x0bb + +/a\R?b/I,bsr=unicode,utf + a\rb + a\nb + a\r\nb + a\x{85}b + a\x0bb + +/X/newline=any,utf,firstline + A\x{1ec5}ABCXYZ + +/abcd*/utf + xxxxabcd\=ps + xxxxabcd\=ph + +/abcd*/i,utf + xxxxabcd\=ps + xxxxabcd\=ph + XXXXABCD\=ps + XXXXABCD\=ph + +/abc\d*/utf + xxxxabc1\=ps + xxxxabc1\=ph + +/abc[de]*/utf + xxxxabcde\=ps + xxxxabcde\=ph + +/\bthe cat\b/utf + the cat\=ps + the cat\=ph + +/./newline=crlf,utf + \r\=ps + \r\=ph + +/.{2,3}/newline=crlf,utf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +/.{2,3}?/newline=crlf,utf + \r\=ps + \r\=ph + \r\r\=ps + \r\r\=ph + \r\r\r\=ps + \r\r\r\=ph + +/[^\x{100}]/utf + \x{100}\x{101}X + +/[^\x{100}]+/utf + \x{100}\x{101}X + +/\pL\P{Nd}/utf + AB +\= Expect no match + A0 + 00 + +/\X./utf + AB + A\x{300}BC + A\x{300}\x{301}\x{302}BC +\= Expect no match + \x{300} + +/\X\X/utf + ABC + A\x{300}B\x{300}\x{301}C + A\x{300}\x{301}\x{302}BC +\= Expect no match + \x{300} + +/^\pL+/utf + abcd + a + +/^\PL+/utf + 1234 + = +\= Expect no match + abcd + +/^\X+/utf + abcdA\x{300}\x{301}\x{302} + A\x{300}\x{301}\x{302} + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302} + a + \x{300}\x{301}\x{302} + +/\X?abc/utf + abc + A\x{300}abc + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + \x{300}abc + +/^\X?abc/utf + abc + A\x{300}abc + \x{300}abc +\= Expect no match + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + +/\X*abc/utf + abc + A\x{300}abc + 
A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + \x{300}abc + +/^\X*abc/utf + abc + A\x{300}abc + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + \x{300}abc + +/^\pL?=./utf + A=b + =c +\= Expect no match + 1=2 + AAAA=b + +/^\pL*=./utf + AAAA=b + =c +\= Expect no match + 1=2 + +/^\X{2,3}X/utf + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X +\= Expect no match + X + A\x{300}\x{301}\x{302}X + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X + +/^\pC\pL\pM\pN\pP\pS\pZ\p{Xsp}/utf + >\x{1680}\x{2028}\x{0b} +\= Expect no match + \x{0b} + +/^>\p{Xsp}+/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}*/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}{2,9}/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>[\p{Xsp}]/utf,no_auto_possess + >\x{2028}\x{0b} + +/^>[\p{Xsp}]+/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}/utf + >\x{1680}\x{2028}\x{0b} + >\x{a0} +\= Expect no match + \x{0b} + +/^>\p{Xps}+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}+?/utf + >\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}*/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}?/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>[\p{Xps}]/utf + >\x{2028}\x{0b} + +/^>[\p{Xps}]+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^\p{Xwd}/utf + ABCD + 1234 + \x{6ca} + \x{a6c} + \x{10a7} + _ABC +\= Expect no match + [] + +/^\p{Xwd}+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}*/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}{2,9}/utf + A_12\x{6ca}\x{a6c}\x{10a7} + +/^[\p{Xwd}]/utf + ABCD1234_ + 1234abcd_ + \x{6ca} + \x{a6c} + \x{10a7} + _ABC +\= Expect no match + [] 
+ +/^[\p{Xwd}]+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +# Unicode properties for \b abd \B + +/\b...\B/utf,ucp + abc_ + \x{37e}abc\x{376} + \x{37e}\x{376}\x{371}\x{393}\x{394} + !\x{c0}++\x{c1}\x{c2} + !\x{c0}+++++ + +# Without PCRE_UCP, non-ASCII always fail, even if < 256 + +/\b...\B/utf + abc_ +\= Expect no match + \x{37e}abc\x{376} + \x{37e}\x{376}\x{371}\x{393}\x{394} + !\x{c0}++\x{c1}\x{c2} + !\x{c0}+++++ + +# With PCRE_UCP, non-UTF8 chars that are < 256 still check properties + +/\b...\B/ucp + abc_ + !\x{c0}++\x{c1}\x{c2} + !\x{c0}+++++ + +# Caseless single negated characters > 127 need UCP support + +/[^\x{100}]/i,utf + \x{100}\x{101}X + +/[^\x{100}]+/i,utf + \x{100}\x{101}XX + +/^\X/utf + A\=ps + A\=ph + A\x{300}\x{301}\=ps + A\x{300}\x{301}\=ph + A\x{301}\=ps + A\x{301}\=ph + +/^\X{2,3}/utf + A\=ps + A\=ph + AA\=ps + AA\=ph + A\x{300}\x{301}\=ps + A\x{300}\x{301}\=ph + A\x{300}\x{301}A\x{300}\x{301}\=ps + A\x{300}\x{301}A\x{300}\x{301}\=ph + +/^\X{2}/utf + AA\=ps + AA\=ph + A\x{300}\x{301}A\x{300}\x{301}\=ps + A\x{300}\x{301}A\x{300}\x{301}\=ph + +/^\X+/utf + AA\=ps + AA\=ph + +/^\X+?Z/utf + AA\=ps + AA\=ph + +# These are tests for extended grapheme clusters + +/^\X/utf,aftertext + G\x{34e}\x{34e}X + \x{34e}\x{34e}X + \x04X + \x{1100}X + \x{1100}\x{34e}X + \x{1b04}\x{1b04}X +\= These match up to the roman letters + \x{1111}\x{1111}L,L + \x{1111}\x{1111}\x{1169}L,L,V + \x{1111}\x{ae4c}L, LV + \x{1111}\x{ad89}L, LVT + \x{1111}\x{ae4c}\x{1169}L, LV, V + \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V + \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T + \x{1111}\x{ad89}\x{11fe}L, LVT, T + \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T + \x{ad89}\x{11fe}\x{11fe}LVT, T, T +\= These match just the first codepoint (invalid sequence) + \x{1111}\x{11fe}L, T + \x{ae4c}\x{1111}LV, L + \x{ae4c}\x{ae4c}LV, LV + \x{ae4c}\x{ad89}LV, LVT + \x{1169}\x{1111}V, L + \x{1169}\x{ae4c}V, LV + \x{1169}\x{ad89}V, LVT + \x{ad89}\x{1111}LVT, L + \x{ad89}\x{1169}LVT, V + \x{ad89}\x{ae4c}LVT, 
LV + \x{ad89}\x{ad89}LVT, LVT + \x{11fe}\x{1111}T, L + \x{11fe}\x{1169}T, V + \x{11fe}\x{ae4c}T, LV + \x{11fe}\x{ad89}T, LVT +\= Test extend and spacing mark + \x{1111}\x{ae4c}\x{0711}L, LV, extend + \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark + \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark +\= Test CR, LF, and control + \x0d\x{0711}CR, extend + \x0d\x{1b04}CR, spacingmark + \x0a\x{0711}LF, extend + \x0a\x{1b04}LF, spacingmark + \x0b\x{0711}Control, extend + \x09\x{1b04}Control, spacingmark +\= There are no Prepend characters, so we can't test Prepend, CR + +/^(?>\X{2})X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + +/^\X{2,4}X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + +/^\X{2,4}?X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + +/[z\x{1e9e}]+/i,utf + \x{1e9e}\x{00df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + +/[z\x{00df}]+/i,utf + \x{1e9e}\x{00df} + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + +/[z\x{1f88}]+/i,utf + \x{1f88}\x{1f80} + +# Perl matches these + +/\x{00b5}+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/\x{039c}+/i,utf + \x{00b5}\x{039c}\x{03bc} + +/\x{03bc}+/i,utf + \x{00b5}\x{039c}\x{03bc} + + +/\x{00c5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/\x{00e5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/\x{212b}+/i,utf + \x{00c5}\x{00e5}\x{212b} + +/\x{01c4}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/\x{01c5}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/\x{01c6}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + +/\x{01c7}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + +/\x{01c8}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + +/\x{01c9}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + + +/\x{01ca}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/\x{01cb}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + +/\x{01cc}+/i,utf + 
\x{01ca}\x{01cb}\x{01cc} + +/\x{01f1}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/\x{01f2}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/\x{01f3}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + +/\x{0345}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{0399}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{03b9}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{1fbe}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + +/\x{0392}+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/\x{03b2}+/i,utf + \x{0392}\x{03b2}\x{03d0} + +/\x{03d0}+/i,utf + \x{0392}\x{03b2}\x{03d0} + + +/\x{0395}+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/\x{03b5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/\x{03f5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + +/\x{0398}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{03b8}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{03d1}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{03f4}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + +/\x{039a}+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/\x{03ba}+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/\x{03f0}+/i,utf + \x{039a}\x{03ba}\x{03f0} + +/\x{03a0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/\x{03c0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/\x{03d6}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + +/\x{03a1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/\x{03c1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/\x{03f1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + +/\x{03a3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/\x{03c2}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/\x{03c3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + +/\x{03a6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/\x{03c6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/\x{03d5}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + +/\x{03c9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/\x{03a9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/\x{2126}+/i,utf + \x{03c9}\x{03a9}\x{2126} + +/\x{1e60}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e61}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e9b}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + 
+/\x{1f80}+/i,utf + \x{1f88}\x{1f80} + +/\x{004b}+/i,utf + \x{004b}\x{006b}\x{212a} + +/\x{006b}+/i,utf + \x{004b}\x{006b}\x{212a} + +/\x{212a}+/i,utf + \x{004b}\x{006b}\x{212a} + +/\x{0053}+/i,utf + \x{0053}\x{0073}\x{017f} + +/\x{0073}+/i,utf + \x{0053}\x{0073}\x{017f} + +/\x{017f}+/i,utf + \x{0053}\x{0073}\x{017f} + +/ist/i,utf +\= Expect no match + ikt + +/is+t/i,utf + iSs\x{17f}t +\= Expect no match + ikt + +/is+?t/i,utf +\= Expect no match + ikt + +/is?t/i,utf +\= Expect no match + ikt + +/is{2}t/i,utf +\= Expect no match + iskt + +/^\p{Xuc}/utf + $abc + @abc + `abc + \x{1234}abc +\= Expect no match + abc + +/^\p{Xuc}+/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}+?/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}+?\*/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}++/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}{3,5}/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\p{Xuc}{3,5}?/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^[\p{Xuc}]/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^[\p{Xuc}]+/utf + $@`\x{a0}\x{1234}\x{e000}** +\= Expect no match + \x{9f} + +/^\P{Xuc}/utf + abc +\= Expect no match + $abc + @abc + `abc + \x{1234}abc + +/^[\P{Xuc}]/utf + abc +\= Expect no match + $abc + @abc + `abc + \x{1234}abc + +/^A\s+Z/utf,ucp + A\x{2005}Z + A\x{85}\x{180e}\x{2005}Z + +/^A[\s]+Z/utf,ucp + A\x{2005}Z + A\x{85}\x{180e}\x{2005}Z + +/(?<=\x{100})\x{200}(?=\x{300})/utf,allusedtext + \x{100}\x{200}\x{300} + +# End of testinput7 diff --git a/src/pcre2/testdata/testinput8 b/src/pcre2/testdata/testinput8 new file mode 100644 index 00000000..550631db --- /dev/null +++ b/src/pcre2/testdata/testinput8 @@ -0,0 +1,189 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. 
This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. + +#pattern fullbincode,memory + +/((?i)b)/ + +/(?s)(.*X|^B)/ + +/(?s:.*X|^B)/ + +/^[[:alnum:]]/ + +/#/Ix + +/a#/Ix + +/x?+/ + +/x++/ + +/x{1,3}+/ + +/(x)*+/ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" + +/(a(?1)b)/ + +/(a(?1)+b)/ + +/a(?Pb|c)d(?Pe)/ + +/(?:a(?Pc(?Pd)))(?Pa)/ + +/(?Pa)...(?P=a)bbb(?P>a)d/ + +/abc(?C255)de(?C)f/ + +/abcde/auto_callout + +/\x{100}/utf + +/\x{1000}/utf + +/\x{10000}/utf + +/\x{100000}/utf + +/\x{10ffff}/utf + +/\x{110000}/utf + +/[\x{ff}]/utf + +/[\x{100}]/utf + +/\x80/utf + +/\xff/utf + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf + 
 +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +/\x{65e5}\x{672c}\x{8a9e}/I,utf +/[\x{100}]/utf +/[Z\x{100}]/utf +/^[\x{100}\E-\Q\E\x{150}]/utf +/^[\QĀ\E-\QŐ\E]/utf +/^[\QĀ\E-\QŐ\E/utf +/[\p{L}]/ +/[\p{^L}]/ +/[\P{L}]/ +/[\P{^L}]/ +/[abc\p{L}\x{0660}]/utf +/[\p{Nd}]/utf +/[\p{Nd}+-]+/utf +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +/[\x{105}-\x{109}]/i,utf +/( ( (?(1)0|) )* )/x +/( (?(1)0|)* )/x +/[a]/ +/[a]/utf +/[\xaa]/ +/[\xaa]/utf +/[^a]/ +/[^a]/utf +/[^\xaa]/ +/[^\xaa]/utf +#pattern -memory +/[^\d]/utf,ucp +/[[:^alpha:][:^cntrl:]]+/utf,ucp +/[[:^cntrl:][:^alpha:]]+/utf,ucp +/[[:alpha:]]+/utf,ucp +/[[:^alpha:]\S]+/utf,ucp +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +/(((a\2)|(a*)\g<-1>))*a?/ +/((?+1)(\1))/ +"(?1)(?#?'){2}(a)" +/.((?2)(?R)|\1|$)()/ +/.((?3)(?R)()(?2)|\1|$)()/ +/(?1)()((((((\1++))\x85)+)|))/ +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. 
+ +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 + +/(?(1)(?1)){8,}+()/debug + abcd + +/(?(1)|a(?1)b){2,}+()/debug + abcde + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug + +/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/ + +/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre/testdata/testinput14 b/src/pcre2/testdata/testinput9 similarity index 79% rename from src/pcre/testdata/testinput14 rename to src/pcre2/testdata/testinput9 index 192b8d64..4eb228af 100644 --- a/src/pcre/testdata/testinput14 +++ b/src/pcre2/testdata/testinput9 @@ -1,94 +1,13 @@ -/-- This set of 
tests is run only with the 8-bit library. They do not require - UTF-8 or Unicode property support. The file starts with all the tests of - the POSIX interface, because that is supported only with the 8-bit library. - --/ +# This set of tests is run only with the 8-bit library. They must not require +# UTF-8 or Unicode property support. */ -< forbid 8W +#forbid_utf +#newline_default lf any anycrlf -/abc/P - abc - *** Failers - -/^abc|def/P - abcdef - abcdef\B - -/.*((abc)$|(def))/P - defabc - \Zdefabc - -/the quick brown fox/P - the quick brown fox - *** Failers - The Quick Brown Fox - -/the quick brown fox/Pi - the quick brown fox - The Quick Brown Fox - -/abc.def/P - *** Failers - abc\ndef - -/abc$/P - abc - abc\n - -/(abc)\2/P - -/(abc\1)/P - abc - -/a*(b+)(z)(z)/P - aaaabbbbzzzz - aaaabbbbzzzz\O0 - aaaabbbbzzzz\O1 - aaaabbbbzzzz\O2 - aaaabbbbzzzz\O3 - aaaabbbbzzzz\O4 - aaaabbbbzzzz\O5 - -/ab.cd/P - ab-cd - ab=cd - ** Failers - ab\ncd - -/ab.cd/Ps - ab-cd - ab=cd - ab\ncd - -/a(b)c/PN - abc - -/a(?Pb)c/PN - abc - -/a?|b?/P - abc - ** Failers - ddd\N - -/\w+A/P - CDAAAAB - -/\w+A/PU - CDAAAAB - -/\Biss\B/I+P - Mississippi - -/abc/\P - -/-- End of POSIX tests --/ - -/a\Cb/ - aXb - a\nb - ** Failers (too big char) - A\x{123}B - A\o{443}B +/ab/ +\= Expect error message (too big char) and no match + A\x{123}B + A\o{443}B /\x{100}/I @@ -287,59 +206,61 @@ ) (?: [\040\t] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* \) )* # optional trailing comment -/xSI - -/-- Although this saved pattern was compiled with link-size=2, it does no harm -to run this test with other link sizes because it is going to generated a -"compiled in wrong mode" error as soon as it is loaded, so the link size does -not matter. 
--/ - -\x09< -/[\h]+/BZ +/[\h]+/B >\x09\x20\xa0< -/[\v]/BZ +/[\v]/B -/[\H]/BZ +/[\H]/B -/[^\h]/BZ +/[^\h]/B -/[\V]/BZ +/[\V]/B -/[\x0a\V]/BZ +/[\x0amark + XX + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames XX -/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/K +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark XX -/\u0100/ +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark,alt_verbnames + XX + +/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames + +/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames + +/[^\x00-a]{12,}[^b-\xff]*/B -/[\u0100-\u0200]/ +/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B -/[^\x00-a]{12,}[^b-\xff]*/BZ +/(*MARK:a\x{100}b)z/alt_verbnames -/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ +/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':Æ¿)/ 
-/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/ +/(?i:A{1,}\6666666666)/ + A\x{1b6}6666666 -/-- End of testinput14 --/ +# End of testinput9 diff --git a/src/pcre2/testdata/testinputEBC b/src/pcre2/testdata/testinputEBC new file mode 100644 index 00000000..36df20b8 --- /dev/null +++ b/src/pcre2/testdata/testinputEBC @@ -0,0 +1,137 @@ +# This is a specialized test for checking, when PCRE2 is compiled with the +# EBCDIC option but in an ASCII environment, that newline, white space, and \c +# functionality is working. It catches cases where explicit values such as 0x0a +# have been used instead of names like CHAR_LF. Needless to say, it is not a +# genuine EBCDIC test! In patterns, alphabetic characters that follow a +# backslash must be in EBCDIC code. In data, NL, NEL, LF, ESC, and DEL must be +# in EBCDIC, but can of course be specified as escapes. 
+ +# Test default newline and variations + +/^A/m + ABC + 12\x15ABC + +/^A/m,newline=any + 12\x15ABC + 12\x0dABC + 12\x0d\x15ABC + 12\x25ABC + +/^A/m,newline=anycrlf + 12\x15ABC + 12\x0dABC + 12\x0d\x15ABC + ** Fail + 12\x25ABC + +# Test \h + +/^A\ˆ/ + A B + A\x41B + +# Test \H + +/^A\È/ + AB + A\x42B + ** Fail + A B + A\x41B + +# Test \R + +/^A\Ù/ + A\x15B + A\x0dB + A\x25B + A\x0bB + A\x0cB + ** Fail + A B + +# Test \v + +/^A\¥/ + A\x15B + A\x0dB + A\x25B + A\x0bB + A\x0cB + ** Fail + A B + +# Test \V + +/^A\å/ + A B + ** Fail + A\x15B + A\x0dB + A\x25B + A\x0bB + A\x0cB + +# For repeated items, use an atomic group so that the output is the same +# for DFA matching (otherwise it may show multiple matches). + +# Test \h+ + +/^A(?>\ˆ+)/ + A B + +# Test \H+ + +/^A(?>\È+)/ + AB + ** Fail + A B + +# Test \R+ + +/^A(?>\Ù+)/ + A\x15B + A\x0dB + A\x25B + A\x0bB + A\x0cB + ** Fail + A B + +# Test \v+ + +/^A(?>\¥+)/ + A\x15B + A\x0dB + A\x25B + A\x0bB + A\x0cB + ** Fail + A B + +# Test \V+ + +/^A(?>\å+)/ + A B + ** Fail + A\x15B + A\x0dB + A\x25B + A\x0bB + A\x0cB + +# Test \c functionality + +/\ƒ@\ƒA\ƒb\ƒC\ƒd\ƒE\ƒf\ƒG\ƒh\ƒI\ƒJ\ƒK\ƒl\ƒm\ƒN\ƒO\ƒp\ƒq\ƒr\ƒS\ƒT\ƒu\ƒV\ƒW\ƒX\ƒy\ƒZ/ + \x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f + +/\ƒ[\ƒ\\ƒ]\ƒ^\ƒ_/ + \x18\x19\x1a\x1b\x1c\x1d\x1e\x1f + +/\ƒ?/ + A\xffB + +/\ƒ&/ + +# End diff --git a/src/pcre/testdata/testoutput1 b/src/pcre2/testdata/testoutput1 similarity index 74% rename from src/pcre/testdata/testoutput1 rename to src/pcre2/testdata/testoutput1 index e6147e60..5b1686ce 100644 --- a/src/pcre/testdata/testoutput1 +++ b/src/pcre2/testdata/testoutput1 @@ -1,16 +1,23 @@ -/-- This set of tests is for features that are compatible with all versions of - Perl >= 5.10, in non-UTF-8 mode. It should run clean for the 8-bit, 16-bit, - and 32-bit PCRE libraries. 
--/ +# This set of tests is for features that are compatible with all versions of +# Perl >= 5.10, in non-UTF mode. It should run clean for the 8-bit, 16-bit, and +# 32-bit PCRE libraries, and also using the perltest.sh script. + +# WARNING: Use only / as the pattern delimiter. Although pcre2test supports +# a number of delimiters, all those other than / give problems with the +# perltest.sh script. -< forbid 89?=ABCDEFfGILMNPTUWXZ< +#forbid_utf +#newline_default lf any anycrlf +#perltest /the quick brown fox/ the quick brown fox 0: the quick brown fox - The quick brown FOX -No match What do you know about the quick brown fox? 0: the quick brown fox +\= Expect no match + The quick brown FOX +No match What do you know about THE QUICK BROWN FOX? No match @@ -89,8 +96,7 @@ No match 0: aaaabxyzpqrrrabbxyyyypqAzz >>>>abcxyzpqrrrabbxyyyypqAzz 0: abcxyzpqrrrabbxyyyypqAzz - *** Failers -No match +\= Expect no match abxyzpqrrabbxyyyypqAzz No match abxyzpqrrrrabbxyyyypqAzz @@ -111,8 +117,7 @@ No match abcabczz 0: abcabczz 1: abc - *** Failers -No match +\= Expect no match zz No match abcabcabczz @@ -145,8 +150,7 @@ No match bbbbbbbbbbbac 0: bbbbbbbbbbbac 1: a - *** Failers -No match +\= Expect no match aaac No match abbbbbbbbbbbac @@ -177,35 +181,12 @@ No match bbbbbbbbbbbac 0: bbbbbbbbbbbac 1: a - *** Failers -No match +\= Expect no match aaac No match abbbbbbbbbbbac No match -/^(b+|a){1,2}?bc/ - bbc - 0: bbc - 1: b - -/^(b*|ba){1,2}?bc/ - babc - 0: babc - 1: ba - bbabc - 0: bbabc - 1: ba - bababc - 0: bababc - 1: ba - *** Failers -No match - bababbc -No match - babababc -No match - /^(ba|b*){1,2}?bc/ babc 0: babc @@ -216,8 +197,7 @@ No match bababc 0: bababc 1: ba - *** Failers -No match +\= Expect no match bababbc No match babababc @@ -240,8 +220,7 @@ No match 0: d ething 0: e - *** Failers -No match +\= Expect no match fthing No match [thing @@ -258,8 +237,7 @@ No match 0: d ething 0: e - *** Failers -No match +\= Expect no match athing No match fthing @@ -272,8 +250,7 
@@ No match 0: [ \\thing 0: \ - *** Failers - 0: * +\= Expect no match athing No match bthing @@ -292,8 +269,7 @@ No match 0: a fthing 0: f - *** Failers - 0: * +\= Expect no match ]thing No match cthing @@ -336,8 +312,7 @@ No match 0: 10 100 0: 100 - *** Failers -No match +\= Expect no match abc No match @@ -354,46 +329,42 @@ No match 0: xxx0 xxx1234 0: xxx1234 - *** Failers -No match +\= Expect no match xxx No match /^.+[0-9][0-9][0-9]$/ x123 0: x123 + x1234 + 0: x1234 xx123 0: xx123 123456 0: 123456 - *** Failers -No match +\= Expect no match 123 No match - x1234 - 0: x1234 /^.+?[0-9][0-9][0-9]$/ x123 0: x123 + x1234 + 0: x1234 xx123 0: xx123 123456 0: 123456 - *** Failers -No match +\= Expect no match 123 No match - x1234 - 0: x1234 /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/ abc!pqr=apquxz.ixr.zzz.ac.uk 0: abc!pqr=apquxz.ixr.zzz.ac.uk 1: abc 2: pqr - *** Failers -No match +\= Expect no match !pqr=apquxz.ixr.zzz.ac.uk No match abc!=apquxz.ixr.zzz.ac.uk @@ -406,7 +377,8 @@ No match /:/ Well, we need a colon: somewhere 0: : - *** Fail if we don't +\= Expect no match + Fail without a colon No match /([\da-f:]+)$/i @@ -434,8 +406,7 @@ No match Any old stuff 0: ff 1: ff - *** Failers -No match +\= Expect no match 0zzz No match gzzz @@ -456,8 +427,7 @@ No match 1: 12 2: 123 3: 0 - *** Failers -No match +\= Expect no match .1.2.3333 No match 1.2.3 @@ -476,8 +446,7 @@ No match 1: 1 2: non-sp1 3: non-sp2 - *** Failers -No match +\= Expect no match 1IN SOA non-sp1 non-sp2( No match @@ -497,8 +466,7 @@ No match x-.y-. 0: x-.y-. 1: .y- - *** Failers -No match +\= Expect no match -abc.peq. 
No match @@ -517,8 +485,7 @@ No match 1: -a 2: .b-c 3: -c - *** Failers -No match +\= Expect no match *.0 No match *.a- @@ -569,22 +536,21 @@ No match \"\" ; rhubarb 0: "" ; rhubarb 1: ; rhubarb - *** Failers -No match +\= Expect no match \"1234\" : things No match /^$/ \ 0: - *** Failers +\= Expect no match + A non-empty line No match / ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/x ab c 0: ab c - *** Failers -No match +\= Expect no match abc No match ab cde @@ -593,8 +559,7 @@ No match /(?x) ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/ ab c 0: ab c - *** Failers -No match +\= Expect no match abc No match ab cde @@ -605,8 +570,7 @@ No match 0: a bcd a b d 0: a b d - *** Failers -No match +\= Expect no match abcd No match ab d @@ -715,8 +679,7 @@ No match 0: 12345678ab 12345678__ 0: 12345678__ - *** Failers -No match +\= Expect no match 1234567 No match @@ -729,8 +692,7 @@ No match 0: 12345 aaaaa 0: aaaaa - *** Failers -No match +\= Expect no match 123456 No match @@ -755,8 +717,7 @@ No match 0: def=defdefdef 1: def 2: def - *** Failers -No match +\= Expect no match abc=defdef No match @@ -826,8 +787,7 @@ No match From abcd Mon Sep 1 12:33:02 1997 0: From abcd Mon Sep 1 12:33 1: Sep - *** Failers -No match +\= Expect no match From abcd Sep 01 12:33:02 1997 No match @@ -864,8 +824,7 @@ No match abc456 0: abc 1: abc - *** Failers -No match +\= Expect no match abc123 No match @@ -909,16 +868,14 @@ No match /(?!^)abc/ the abc 0: abc - *** Failers -No match +\= Expect no match abc No match /(?=^)abc/ abc 0: abc - *** Failers -No match +\= Expect no match the abc No match @@ -1150,8 +1107,7 @@ No match 0: "/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/"@x400-re.lay A missing angle .*/)foo" - /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/it/you/see/ -No match - -"(?>.*/)foo" +/(?>.*\/)foo/ /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/and/foo 0: /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/and/foo +\= 
Expect no match + /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/it/you/see/ +No match /(?>(\.\d\d[1-9]?))\d+/ 1.230003938 @@ -3095,8 +2967,7 @@ No match 1.875000282 0: .875000282 1: .875 - *** Failers -No match +\= Expect no match 1.235 No match @@ -3104,8 +2975,7 @@ No match now is the time for all good men to come to the aid of the party 0: now is the time for all good men to come to the aid of the party 1: party - *** Failers -No match +\= Expect no match this is not a line with only words and spaces! No match @@ -3124,8 +2994,7 @@ No match 0: 12345a 1: 12345 2: a - *** Failers -No match +\= Expect no match 12345+ No match @@ -3156,15 +3025,14 @@ No match 0: abc(ade)ufh()()x 1: x -/\(((?>[^()]+)|\([^()]+\))+\)/ +/\(((?>[^()]+)|\([^()]+\))+\)/ (abc) 0: (abc) 1: abc (abc(def)xyz) 0: (abc(def)xyz) 1: xyz - *** Failers -No match +\= Expect no match ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match @@ -3173,8 +3041,7 @@ No match 0: ab Ab 0: Ab - *** Failers -No match +\= Expect no match aB No match AB @@ -3184,8 +3051,7 @@ No match a bcd e 0: a bcd e 1: a bc - *** Failers -No match +\= Expect no match a b cd e No match abcd e @@ -3197,8 +3063,7 @@ No match a bcde f 0: a bcde f 1: a bcde f - *** Failers -No match +\= Expect no match abcdef No match @@ -3209,8 +3074,7 @@ No match aBc 0: aBc 1: aB - *** Failers -No match +\= Expect no match abC No match aBC @@ -3229,8 +3093,7 @@ No match 0: abc aBc 0: aBc - *** Failers -No match +\= Expect no match ABC No match abC @@ -3243,8 +3106,7 @@ No match 0: aBc aBBc 0: aBBc - *** Failers -No match +\= Expect no match aBC No match aBBC @@ -3255,8 +3117,7 @@ No match 0: abcd abCd 0: abCd - *** Failers -No match +\= Expect no match aBCd No match abcD @@ -3269,8 +3130,7 @@ No match 0: more than MILLION more \n than Million 0: more \x0a than Million - *** Failers -No match +\= Expect no match MORE THAN MILLION No match more \n than \n million @@ -3283,22 +3143,20 @@ No match 0: more than MILLION more \n than 
Million 0: more \x0a than Million - *** Failers -No match +\= Expect no match MORE THAN MILLION No match more \n than \n million No match -/(?>a(?i)b+)+c/ +/(?>a(?i)b+)+c/ abc 0: abc aBbc 0: aBbc aBBc 0: aBBc - *** Failers -No match +\= Expect no match Abc No match abAb @@ -3311,8 +3169,7 @@ No match 0: abc aBc 0: aBc - *** Failers -No match +\= Expect no match Ab No match abC @@ -3327,8 +3184,7 @@ No match aBxxc 0: xxc 1: xx - *** Failers -No match +\= Expect no match Abxxc No match ABxxc @@ -3342,8 +3198,7 @@ No match 1: a bB 0: bB - *** Failers -No match +\= Expect no match aB No match bA @@ -3357,20 +3212,21 @@ No match 0: b bb 0: bb - *** Failers -No match +\= Expect no match ab No match + +# Perl gets this next one wrong if the pattern ends with $; in that case it +# fails to match "12". -/^(?(?=abc)\w{3}:|\d\d)$/ +/^(?(?=abc)\w{3}:|\d\d)/ abc: 0: abc: 12 0: 12 - *** Failers -No match 123 -No match + 0: 12 +\= Expect no match xyz No match @@ -3379,8 +3235,7 @@ No match 0: abc: 12 0: 12 - *** Failers -No match +\= Expect no match 123 No match xyz @@ -3395,8 +3250,7 @@ No match 0: cat focat 0: cat - *** Failers -No match +\= Expect no match foocat No match @@ -3409,8 +3263,7 @@ No match 0: cat focat 0: cat - *** Failers -No match +\= Expect no match foocat No match @@ -3449,8 +3302,7 @@ No match 0: 12aa 1: 1 2: 2 - *** Failers -No match +\= Expect no match 1234 No match @@ -3467,8 +3319,7 @@ No match blaH blaH 0: blaH blaH 1: blaH - *** Failers -No match +\= Expect no match blah BLAH No match Blah blah @@ -3499,6 +3350,14 @@ No match 0: blaH blah 1: blaH +/((?i)blah)\s+(?m)A(?i:\1)/ + blah ABLAH + 0: blah ABLAH + 1: blah +\= Expect no match + blah aBLAH +No match + /(?>a*)*/ a 0: a @@ -3636,8 +3495,7 @@ No match 0: 12-sep-98 12-09-98 0: 12-09-98 - *** Failers -No match +\= Expect no match sep-12-98 No match @@ -3648,8 +3506,7 @@ No match foobarfootling 0: barfoo 1: foo - *** Failers -No match +\= Expect no match foobar No match barfoo @@ -3684,8 +3541,7 @@ No 
match BBx 0: BBx 1: BB - *** Failers -No match +\= Expect no match abcX No match aBCX @@ -3717,8 +3573,7 @@ No match France 0: F 1: F - *** Failers -No match +\= Expect no match Africa No match @@ -3741,8 +3596,7 @@ No match Zambesi 0: Z 1: Z - *** Failers -No match +\= Expect no match aCD No match XY @@ -3751,8 +3605,7 @@ No match /(?<=foo\n)^bar/m foo\nbar 0: bar - *** Failers -No match +\= Expect no match bar No match baz\nbar @@ -3765,39 +3618,37 @@ No match 0: baz koobarbaz 0: baz - *** Failers -No match +\= Expect no match baz No match foobarbaz No match -/The cases of aaaa and aaaaaa are missed out below because Perl does things/ -/differently. We know that odd, and maybe incorrect, things happen with/ -No match -/recursive references in Perl, as far as 5.11.3 - see some stuff in test #2./ -No match +# The cases of aaaa and aaaaaa are missed out below because Perl does things +# differently. We know that odd, and maybe incorrect, things happen with +# recursive references in Perl, as far as 5.11.3 - see some stuff in test #2. 
/^(a\1?){4}$/ - a -No match - aa -No match - aaa -No match aaaaa 0: aaaaa 1: a aaaaaaa 0: aaaaaaa 1: a + aaaaaaaaaa + 0: aaaaaaaaaa + 1: aaaa +\= Expect no match + a +No match + aa +No match + aaa +No match aaaaaaaa No match aaaaaaaaa No match - aaaaaaaaaa - 0: aaaaaaaaaa - 1: aaaa aaaaaaaaaaa No match aaaaaaaaaaaa @@ -3808,16 +3659,10 @@ No match No match aaaaaaaaaaaaaaa No match - aaaaaaaaaaaaaaaa + aaaaaaaaaaaaaaaa No match /^(a\1?)(a\1?)(a\2?)(a\3?)$/ - a -No match - aa -No match - aaa -No match aaaa 0: aaaa 1: a @@ -3842,16 +3687,23 @@ No match 2: aa 3: aaa 4: a - aaaaaaaa -No match - aaaaaaaaa -No match aaaaaaaaaa 0: aaaaaaaaaa 1: a 2: aa 3: aaa 4: aaaa +\= Expect no match + a +No match + aa +No match + aaa +No match + aaaaaaaa +No match + aaaaaaaaa +No match aaaaaaaaaaa No match aaaaaaaaaaaa @@ -3865,9 +3717,8 @@ No match aaaaaaaaaaaaaaaa No match -/The following tests are taken from the Perl 5.005 test suite; some of them/ -/are compatible with 5.004, but I'd rather not have to sort them out./ -No match +# The following tests are taken from the Perl 5.005 test suite; some of them +# are compatible with 5.004, but I'd rather not have to sort them out. 
/abc/ abc @@ -3876,8 +3727,7 @@ No match 0: abc ababc 0: abc - *** Failers -No match +\= Expect no match xbc No match axc @@ -3912,8 +3762,7 @@ No match /ab+bc/ abbc 0: abbc - *** Failers -No match +\= Expect no match abc No match abq @@ -3938,8 +3787,7 @@ No match 0: abbbbc /ab{4,5}bc/ - *** Failers -No match +\= Expect no match abq No match abbbbc @@ -3968,8 +3816,7 @@ No match /^abc$/ abc 0: abc - *** Failers -No match +\= Expect no match abbbbc No match abcc @@ -3984,10 +3831,7 @@ No match /abc$/ aabc 0: abc - *** Failers -No match - aabc - 0: abc +\= Expect no match aabcd No match @@ -4012,8 +3856,7 @@ No match /a[bc]d/ abd 0: abd - *** Failers -No match +\= Expect no match axyzd No match abc @@ -4046,8 +3889,7 @@ No match /a[^bc]d/ aed 0: aed - *** Failers -No match +\= Expect no match abd No match abd @@ -4060,10 +3902,9 @@ No match /a[^]b]c/ adc 0: adc - *** Failers -No match a-c 0: a-c +\= Expect no match a]c No match @@ -4076,8 +3917,7 @@ No match 0: a /\by\b/ - *** Failers -No match +\= Expect no match xy No match yz @@ -4086,8 +3926,7 @@ No match No match /\Ba\B/ - *** Failers - 0: a +\= Expect no match a- No match -a @@ -4114,10 +3953,7 @@ No match /\W/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match a No match @@ -4128,10 +3964,7 @@ No match /a\Sb/ a-b 0: a-b - *** Failers -No match - a-b - 0: a-b +\= Expect no match a b No match @@ -4142,10 +3975,7 @@ No match /\D/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match 1 No match @@ -4156,10 +3986,7 @@ No match /[\W]/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match a No match @@ -4170,10 +3997,7 @@ No match /a[\S]b/ a-b 0: a-b - *** Failers -No match - a-b - 0: a-b +\= Expect no match a b No match @@ -4184,10 +4008,7 @@ No match /[\D]/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match 1 No match @@ -4215,8 +4036,8 @@ No match 0: a((b /a\\b/ - a\b -No match + a\\b + 0: a\b /((a))/ abc @@ -4277,14 +4098,13 @@ No match 0: cde /abc/ - *** Failers -No match +\= Expect no 
match b No match - /a*/ - + \ + 0: /([abc])*d/ abbbcd @@ -4369,8 +4189,7 @@ No match 0: adcdcde /a[bcd]+dcdcde/ - *** Failers -No match +\= Expect no match abcde No match adcdcde @@ -4408,8 +4227,7 @@ No match reffgz 0: effgz 1: effgz - *** Failers -No match +\= Expect no match effg No match bcdd @@ -4457,8 +4275,7 @@ No match 9: a /multiple words of text/ - *** Failers -No match +\= Expect no match aa No match uh-uh @@ -4508,13 +4325,11 @@ No match /(a)|\1/ a 0: a - 1: a - *** Failers - 0: a 1: a ab 0: a 1: a +\= Expect no match x No match @@ -4551,8 +4366,7 @@ No match 0: ABC ABABC 0: ABC - *** Failers -No match +\= Expect no match aaxabxbaxbbx No match XBC @@ -4585,8 +4399,7 @@ No match 0: ABBC /ab+bc/i - *** Failers -No match +\= Expect no match ABC No match ABQ @@ -4611,8 +4424,7 @@ No match 0: ABBBBC /ab{4,5}?bc/i - *** Failers -No match +\= Expect no match ABQ No match ABBBBC @@ -4641,8 +4453,7 @@ No match /^abc$/i ABC 0: ABC - *** Failers -No match +\= Expect no match ABBBBC No match ABCC @@ -4677,10 +4488,9 @@ No match 0: AXYZC /a.*c/i - *** Failers -No match AABC 0: AABC +\= Expect no match AXYZD No match @@ -4691,8 +4501,7 @@ No match /a[b-d]e/i ACE 0: ACE - *** Failers -No match +\= Expect no match ABC No match ABD @@ -4725,8 +4534,7 @@ No match /a[^-b]c/i ADC 0: ADC - *** Failers -No match +\= Expect no match ABD No match A-C @@ -4748,8 +4556,7 @@ No match 1: /$b/i - *** Failers -No match +\= Expect no match A]C No match B @@ -4766,8 +4573,10 @@ No match 0: A((B /a\\b/i - A\B -No match + A\\b + 0: A\b + a\\B + 0: a\B /((a))/i ABC @@ -4839,11 +4648,6 @@ No match CDE 0: CDE -/abc/i - -/a*/i - - /([abc])*d/i ABBBCD 0: ABBBCD @@ -4883,6 +4687,7 @@ No match 0: HIJ /^(ab|cd)e/i +\= Expect no match ABCDE No match @@ -4962,8 +4767,7 @@ No match REFFGZ 0: EFFGZ 1: EFFGZ - *** Failers -No match +\= Expect no match ADCDCDE No match EFFG @@ -5023,8 +4827,7 @@ No match 1: C /multiple words of text/i - *** Failers -No match +\= Expect no match AA No match UH-UH @@ 
-5182,8 +4985,7 @@ No match aaaaaaaaaa 0: aaaaaaaaaa 1: aaaa - *** Failers -No match +\= Expect no match AB No match aaaaaaaaa @@ -5195,8 +4997,7 @@ No match aaaaaaaaaa 0: aaaaaaaaaa 1: aaaa - *** Failers -No match +\= Expect no match aaaaaaaaa No match aaaaaaaaaaa @@ -5215,8 +5016,7 @@ No match /(?<=a)b/ ab 0: b - *** Failers -No match +\= Expect no match cb No match b @@ -5292,8 +5092,7 @@ No match 1: A /(?:(?i)a)b/ - *** Failers -No match +\= Expect no match cb No match aB @@ -5320,8 +5119,7 @@ No match 1: A /(?i:a)b/ - *** Failers -No match +\= Expect no match aB No match aB @@ -5348,34 +5146,14 @@ No match 1: a /(?:(?-i)a)b/i - *** Failers -No match - aB - 0: aB - Ab -No match - -/((?-i)a)b/i - -/(?:(?-i)a)b/i - aB - 0: aB - -/((?-i)a)b/i aB 0: aB - 1: a - -/(?:(?-i)a)b/i - *** Failers -No match +\= Expect no match Ab No match AB No match -/((?-i)a)b/i - /(?-i:a)b/i ab 0: ab @@ -5395,8 +5173,7 @@ No match 1: a /(?-i:a)b/i - *** Failers -No match +\= Expect no match AB No match Ab @@ -5414,8 +5191,7 @@ No match 1: a /(?-i:a)b/i - *** Failers -No match +\= Expect no match Ab No match AB @@ -5424,8 +5200,7 @@ No match /((?-i:a))b/i /((?-i:a.))b/i - *** Failers -No match +\= Expect no match AB No match a\nB @@ -5470,8 +5245,7 @@ No match 0: aaac /(? 
- 2: Failers +\= Expect no match abcd: No match abcd: @@ -5754,8 +5518,7 @@ No match 1: x /a\Z/ - *** Failers -No match +\= Expect no match aaab No match a\nb\n @@ -5774,8 +5537,6 @@ No match /b\z/ a\nb 0: b - *** Failers -No match /^(?>(?(1)\.|())[^\W_](?>[a-z0-9-]*[^\W_])?)+$/ a @@ -5805,8 +5566,7 @@ No match 12-ab.1245 0: 12-ab.1245 1: - *** Failers -No match +\= Expect no match \ No match .a @@ -5841,18 +5601,19 @@ No match endingwxyz 0: endingwxyz 1: wxyz - *** Failers -No match +\= Expect no match a rather long string that doesn't end with one of them No match /word (?>(?:(?!otherword)[a-zA-Z0-9]+ ){0,30})otherword/ word cat dog elephant mussel cow horse canary baboon snake shark otherword 0: word cat dog elephant mussel cow horse canary baboon snake shark otherword +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark No match /word (?>[a-zA-Z0-9]+ ){0,30}otherword/ +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark the quick brown fox and the lazy dog and several other words getting close to thirty by now I hope No match @@ -5861,8 +5622,7 @@ No match 0: foo 123999foo 0: foo - *** Failers -No match +\= Expect no match 123abcfoo No match @@ -5871,8 +5631,7 @@ No match 0: foo 123999foo 0: foo - *** Failers -No match +\= Expect no match 123abcfoo No match @@ -5881,8 +5640,7 @@ No match 0: foo 123456foo 0: foo - *** Failers -No match +\= Expect no match 123999foo No match @@ -5891,8 +5649,7 @@ No match 0: foo 123456foo 0: foo - *** Failers -No match +\= Expect no match 123999foo No match @@ -5991,18 +5748,6 @@ No match 0: 0: -/^[\d-a]/ - abcde - 0: a - -things - 0: - - 0digit - 0: 0 - *** Failers -No match - bcdef -No match - /[[:space:]]+/ > \x09\x0a\x0c\x0d\x0b< 0: \x09\x0a\x0c\x0d\x0b @@ -6024,11 +5769,12 @@ No match 0: ab /(?!\A)x/m - a\nxb\n + a\nxb\n 0: x /(?!^)x/m - a\nxb\n +\= Expect no match + a\nxb\n No match /abc\Qabc\Eabc/ @@ -6042,8 +5788,7 @@ No match / abc\Q abc\Eabc/x abc abcabc 0: 
abc abcabc - *** Failers -No match +\= Expect no match abcabcabc No match @@ -6083,8 +5828,7 @@ No match /\Gabc/ abc 0: abc - *** Failers -No match +\= Expect no match xyzabc No match @@ -6102,8 +5846,7 @@ No match /a(?x: b c )d/ XabcdY 0: abcd - *** Failers -No match +\= Expect no match Xa b c d Y No match @@ -6118,8 +5861,7 @@ No match /(?i)AB(?-i)C/ XabCY 0: abC - *** Failers -No match +\= Expect no match XabcY No match @@ -6130,8 +5872,7 @@ No match DE 0: DE 1: D - *** Failers -No match +\= Expect no match abcE No match abCe @@ -6167,9 +5908,9 @@ No match 1: bc 2: bc -/-- This tests for an IPv6 address in the form where it can have up to - eight components, one and only one of which is empty. This must be - an internal component. --/ +# This tests for an IPv6 address in the form where it can have up to +# eight components, one and only one of which is empty. This must be +# an internal component. /^(?!:) # colon disallowed at start (?: # start of item @@ -6179,7 +5920,7 @@ No match ){1,7} # end item; 1-7 of them required [0-9a-f]{1,4} $ # final hex number at end of string (?(1)|.) 
# check that there was an empty component - /xi + /ix a123::a123 0: a123::a123 1: @@ -6198,8 +5939,7 @@ No match a123:ddde:9999:b342::324e:dcba:abcd 0: a123:ddde:9999:b342::324e:dcba:abcd 1: - *** Failers -No match +\= Expect no match 1:2:3:4:5:6:7:8 No match a123:bce:ddde:9999:b342::324e:dcba:abcd @@ -6228,22 +5968,12 @@ No match 0: d ] 0: ] - *** Failers - 0: a +\= Expect no match b No match -/[\z\C]/ - z - 0: z - C - 0: C - -/\M/ - M - 0: M - /(a+)*b/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match @@ -6278,8 +6008,7 @@ No match /ab cd(?x) de fg/ ab cddefg 0: ab cddefg - ** Failers -No match +\= Expect no match abcddefg No match @@ -6287,28 +6016,25 @@ No match foobarX 0: bar 1: bar - ** Failers -No match +\= Expect no match boobarX No match /(?a|)*\d/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match + +/(?>a|)*\d/ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 - -/(?:a|)*\d/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match + +/(?:a|)*\d/ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +No match /\Z/g abc\n @@ -6556,18 +6281,21 @@ No match /^(?s)(?>.*)(? 
2: a -/(?:(?>([ab])))+a=/+ +/(?:(?>([ab])))+a=/aftertext =ba= 0: ba= 0+ 1: b -/(?>([ab]))+a=/+ +/(?>([ab]))+a=/aftertext =ba= 0: ba= 0+ @@ -6827,10 +6552,12 @@ No match 3: aabab /(?>a+|ab)+?c/ +\= Expect no match aabc No match /(?>a+|ab)+c/ +\= Expect no match aabc No match @@ -6850,10 +6577,12 @@ No match 2: b /^(?:a|ab)++c/ +\= Expect no match aaaabc No match /^(?>a|ab)++c/ +\= Expect no match aaaabc No match @@ -6861,30 +6590,27 @@ No match aaaabc 0: aaaabc -/(?=abc){3}abc/+ +/(?=abc){3}abc/aftertext abcabcabc 0: abc 0+ abcabc - ** Failers -No match +\= Expect no match xyz No match -/(?=abc)+abc/+ +/(?=abc)+abc/aftertext abcabcabc 0: abc 0+ abcabc - ** Failers -No match +\= Expect no match xyz No match -/(?=abc)++abc/+ +/(?=abc)++abc/aftertext abcabcabc 0: abc 0+ abcabc - ** Failers -No match +\= Expect no match xyz No match @@ -6893,8 +6619,7 @@ No match 0: xyz /(?=abc){1}xyz/ - ** Failers -No match +\= Expect no match xyz No match @@ -6933,8 +6658,7 @@ No match /^[\g]+/ ggg<<>> 0: ggg<<>> - ** Failers -No match +\= Expect no match \\ga No match @@ -6957,16 +6681,14 @@ No match /(?<=a{2})b/i xaabc 0: b - ** Failers -No match +\= Expect no match xabc No match /(? 
X NYQZ 0: X NYQZ - ** Failers -No match +\= Expect no match >XYZ No match > X NY Z @@ -7125,37 +6821,36 @@ No match 0: barbaz 1: foobar -/abc\K|def\K/g+ +/abc\K|def\K/g,aftertext Xabcdefghi 0: 0+ defghi 0: 0+ ghi -/ab\Kc|de\Kf/g+ +/ab\Kc|de\Kf/g,aftertext Xabcdefghi 0: c 0+ defghi 0: f 0+ ghi -/(?=C)/g+ +/(?=C)/g,aftertext ABCDECBA 0: 0+ CDECBA 0: 0+ CBA -/^abc\K/+ +/^abc\K/aftertext abcdef 0: 0+ def - ** Failers -No match +\= Expect no match defabcxyz No match -/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-02}Z/ +/^(a(b))\1\g1\g{1}\g-1\g{-1}\g{-2}Z/ ababababbbabZXXXX 0: ababababbbabZ 1: ab @@ -7170,6 +6865,7 @@ No match 1: bon /(^(a|b\g{-1}))/ +\= Expect no match bacxxx No match @@ -7180,8 +6876,7 @@ No match xyzxyz 0: xyzxyz 1: xyz - ** Failers -No match +\= Expect no match abcxyz No match xyzabc @@ -7194,8 +6889,7 @@ No match xyzabc 0: xyzabc 1: xyz - ** Failers -No match +\= Expect no match xyzxyz No match @@ -7237,8 +6931,7 @@ No match ab:ababxyz 0: ab:abab 1: ab - ** Failers -No match +\= Expect no match a:axyz No match ab:abxyz @@ -7251,8 +6944,7 @@ No match ab:ababxyz 0: ab:abab 1: ab - ** Failers -No match +\= Expect no match a:axyz No match ab:abxyz @@ -7302,8 +6994,7 @@ No match 0: 10.0.0.0 1: 2: .0 - ** Failers -No match +\= Expect no match 10.6 No match 455.3.4.5 @@ -7319,8 +7010,7 @@ No match 10.0.0.0 0: 10.0.0.0 1: .0 - ** Failers -No match +\= Expect no match 10.6 No match 455.3.4.5 @@ -7330,8 +7020,7 @@ No match now is the time for all good men to come to the aid of the party 0: now is the time for all good men to come to the aid of the party 1: party - *** Failers -No match +\= Expect no match this is not a line with only words and spaces! 
No match @@ -7340,8 +7029,7 @@ No match 0: 12345a 1: 12345 2: a - *** Failers -No match +\= Expect no match 12345+ No match @@ -7371,8 +7059,7 @@ No match (abc(def)xyz) 0: (abc(def)xyz) 1: xyz - *** Failers -No match +\= Expect no match ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match @@ -7386,8 +7073,7 @@ No match a(b(c))d 0: a(b(c))d 1: d - *** Failers) -No match +\= Expect no match) a(b(c)d No match @@ -7425,8 +7111,7 @@ No match 2: 3: AblewasIereIsawElba 4: A - *** Failers -No match +\= Expect no match Thequickbrownfox No match @@ -7441,8 +7126,7 @@ No match -12 0: -12 1: -12 - *** Failers -No match +\= Expect no match ((2+2)*-3)-7) No match @@ -7455,8 +7139,7 @@ No match 0: xxyzxyzz 1: xxyzxyzz 2: xyzxyz - *** Failers -No match +\= Expect no match xxyzz No match xxyzxyzxyzz @@ -7487,28 +7170,32 @@ No match 0: <> 1: <> 2: <> - *** Failers -No match +\= Expect no match 3: Able was I ere I saw Elba 4: A - *** Failers -No match +\= Expect no match The quick brown fox No match @@ -7631,6 +7317,7 @@ No match 0: ablewasiereisawelba 1: ablewasiereisawelba 2: a +\= Expect no match rhubarb No match the quick brown fox @@ -7640,8 +7327,7 @@ No match baz 0: a 1: a - ** Failers -No match +\= Expect no match caz No match @@ -7649,8 +7335,7 @@ No match zbaaz 0: a 1: a - ** Failers -No match +\= Expect no match aaa No match @@ -7666,8 +7351,7 @@ No match defdef 0: defdef 1: def - ** Failers -No match +\= Expect no match abcdef No match defabc @@ -7680,14 +7364,13 @@ No match defabc 0: defabc 1: def - ** Failers -No match +\= Expect no match defdef No match abcdef No match -/(?:a(? (?')|(?")) |b(? (?')|(?")) ) (?('quote')[a-z]+|[0-9]+)/xJ +/(?:a(? (?')|(?")) |b(? 
(?')|(?")) ) (?('quote')[a-z]+|[0-9]+)/x,dupnames a\"aaaaa 0: a"aaaaa 1: " @@ -7701,8 +7384,7 @@ No match 4: " 5: 6: " - ** Failers -No match +\= Expect no match b\"11111 No match @@ -7713,8 +7395,7 @@ No match CCD 0: CC 1: C - ** Failers -No match +\= Expect no match CAD No match @@ -7725,8 +7406,7 @@ No match BCD 0: BC 1: C - ** Failers -No match +\= Expect no match ABCD No match CAD @@ -7750,8 +7430,7 @@ No match BAX 0: BA 1: A - ** Failers -No match +\= Expect no match ACX No match ABC @@ -7772,16 +7451,14 @@ No match 2: ef /^(?=a(*SKIP)b|ac)/ - ** Failers -No match +\= Expect no match ac No match /^(?=a(*PRUNE)b)/ ab 0: - ** Failers -No match +\= Expect no match ac No match @@ -7850,89 +7527,87 @@ No match 1: 0 2: 0 -/--- This one does fail, as expected, in Perl. It needs the complex item at the - end of the pattern. A single letter instead of (B|D) makes it not fail, - which I think is a Perl bug. --- / +# This one does fail, as expected, in Perl. It needs the complex item at the +# end of the pattern. A single letter instead of (B|D) makes it not fail, which +# I think is a Perl bug. /A(*COMMIT)(B|D)/ +\= Expect no match ACABX No match -/--- Check the use of names for failure ---/ +# Check the use of names for failure -/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K - ** Failers -No match +/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/mark +\= Expect no match AC No match, mark = A CB No match, mark = B -/--- Force no study, otherwise mark is not seen. The studied version is in - test 2 because it isn't Perl-compatible. 
---/ - -/(*MARK:A)(*SKIP:B)(C|X)/KSS +/(*MARK:A)(*SKIP:B)(C|X)/mark C 0: C 1: C MK: A +\= Expect no match D No match, mark = A -/^(A(*THEN:A)B|C(*THEN:B)D)/K - ** Failers -No match +/^(A(*THEN:A)B|C(*THEN:B)D)/mark +\= Expect no match CB No match, mark = B -/^(?:A(*THEN:A)B|C(*THEN:B)D)/K +/^(?:A(*THEN:A)B|C(*THEN:B)D)/mark +\= Expect no match CB No match, mark = B -/^(?>A(*THEN:A)B|C(*THEN:B)D)/K +/^(?>A(*THEN:A)B|C(*THEN:B)D)/mark +\= Expect no match CB No match, mark = B -/--- This should succeed, as the skip causes bump to offset 1 (the mark). Note -that we have to have something complicated such as (B|Z) at the end because, -for Perl, a simple character somehow causes an unwanted optimization to mess -with the handling of backtracking verbs. ---/ +# This should succeed, as the skip causes bump to offset 1 (the mark). Note +# that we have to have something complicated such as (B|Z) at the end because, +# for Perl, a simple character somehow causes an unwanted optimization to mess +# with the handling of backtracking verbs. -/A(*MARK:A)A+(*SKIP:A)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:A)(B|Z) | AC/x,mark AAAC 0: AC -/--- Test skipping over a non-matching mark. ---/ +# Test skipping over a non-matching mark. -/A(*MARK:A)A+(*MARK:B)(*SKIP:A)(B|Z) | AC/xK +/A(*MARK:A)A+(*MARK:B)(*SKIP:A)(B|Z) | AC/x,mark AAAC 0: AC -/--- Check shorthand for MARK ---/ +# Check shorthand for MARK. -/A(*:A)A+(*SKIP:A)(B|Z) | AC/xK +/A(*:A)A+(*SKIP:A)(B|Z) | AC/x,mark AAAC 0: AC -/--- Don't loop! Force no study, otherwise mark is not seen. ---/ - -/(*:A)A+(*SKIP:A)(B|Z)/KSS +/(*:A)A+(*SKIP:A)(B|Z)/mark +\= Expect no match AAAC No match, mark = A -/--- This should succeed, as a non-existent skip name disables the skip ---/ +# This should succeed, as a non-existent skip name disables the skip. 
-/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/x,mark AAAC 0: AC -/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC(*:B)/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC(*:B)/x,mark AAAC 0: AC MK: B -/--- COMMIT at the start of a pattern should act like an anchor. Again, -however, we need the complication for Perl. ---/ +# COMMIT at the start of a pattern should act like an anchor. Again, however, +# we need the complication for Perl. /(*COMMIT)(A|P)(B|P)(C|P)/ ABCDEFG @@ -7940,12 +7615,11 @@ however, we need the complication for Perl. ---/ 1: A 2: B 3: C - ** Failers -No match +\= Expect no match DEFGABC No match -/--- COMMIT inside an atomic group can't stop backtracking over the group. ---/ +# COMMIT inside an atomic group can't stop backtracking over the group. /(\w+)(?>b(*COMMIT))\w{2}/ abbb @@ -7953,22 +7627,25 @@ No match 1: a /(\w+)b(*COMMIT)\w{2}/ +\= Expect no match abbb No match -/--- Check opening parens in comment when seeking forward reference. ---/ +# Check opening parens in comment when seeking forward reference. /(?&t)(?#()(?(DEFINE)(?a))/ bac 0: a -/--- COMMIT should override THEN ---/ +# COMMIT should override THEN. 
/(?>(*COMMIT)(?>yes|no)(*THEN)(*F))?/ +\= Expect no match yes No match /(?>(*COMMIT)(yes|no)(*THEN)(*F))?/ +\= Expect no match yes No match @@ -7979,10 +7656,12 @@ No match 0: bc /(*SKIP)bc/ +\= Expect no match a No match /(*SKIP)b/ +\= Expect no match a No match @@ -7998,9 +7677,7 @@ No match aA 0: aA 1: a - ** Failers - 0: ** - 1: * +\= Expect no match ab No match aB @@ -8013,8 +7690,7 @@ No match /^(?&t)*+(?(DEFINE)(?a))\w$/ aaaaaaX 0: aaaaaaX - ** Failers -No match +\= Expect no match aaaaaa No match @@ -8033,8 +7709,7 @@ No match 0: Y 1: 2: Y - ** Failers -No match +\= Expect no match aaaa No match @@ -8045,8 +7720,7 @@ No match YZ 0: Y 1: Y - ** Failers -No match +\= Expect no match aaaa No match @@ -8055,8 +7729,7 @@ No match 0: aaaaX 1: a 2: X - ** Failers -No match +\= Expect no match aaaa No match YZ @@ -8066,8 +7739,7 @@ No match aaaaX 0: aaaaX 1: X - ** Failers -No match +\= Expect no match aaaa No match YZ @@ -8096,8 +7768,7 @@ No match 0: aaaaX 1: a 2: X - ** Failers -No match +\= Expect no match aaa No match YZ @@ -8107,8 +7778,7 @@ No match aaaaX 0: aaaaX 1: X - ** Failers -No match +\= Expect no match aaa No match YZ @@ -8126,16 +7796,14 @@ No match 1: /(a)++(?1)b/ - ** Failers -No match +\= Expect no match ab No match aab No match /(a)*+(?1)b/ - ** Failers -No match +\= Expect no match ab No match aab @@ -8192,15 +7860,16 @@ No match 1: a /^(a)(?1)++ab/ +\= Expect no match aaaab No match -/^(?=a(*:M))aZ/K +/^(?=a(*:M))aZ/mark aZbc 0: aZ MK: M -/^(?!(*:M)b)aZ/K +/^(?!(*:M)b)aZ/mark aZbc 0: aZ @@ -8241,12 +7910,12 @@ MK: M 1: a /((?(R)a|(?1)))+/ - aaa + aaa 0: aaa 1: a /a(*:any -name)/K +name)/mark abc 0: a MK: any \x0aname @@ -8259,11 +7928,12 @@ MK: any \x0aname bba 0: a -/--- Checking revised (*THEN) handling ---/ +# Checking revised (*THEN) handling. -/--- Capture ---/ +# Capture /^.*? (a(*THEN)b) c/x +\= Expect no match aabc No match @@ -8279,12 +7949,14 @@ No match 2: ab /^.*? 
( (a(*THEN)b) ) c/x +\= Expect no match aabc No match -/--- Non-capture ---/ +# Non-capture /^.*? (?:a(*THEN)b) c/x +\= Expect no match aabc No match @@ -8297,12 +7969,14 @@ No match 0: aabc /^.*? (?: (?:a(*THEN)b) ) c/x +\= Expect no match aabc No match -/--- Atomic ---/ +# Atomic /^.*? (?>a(*THEN)b) c/x +\= Expect no match aabc No match @@ -8315,12 +7989,14 @@ No match 0: aabc /^.*? (?> (?>a(*THEN)b) ) c/x +\= Expect no match aabc No match -/--- Possessive capture ---/ +# Possessive capture /^.*? (a(*THEN)b)++ c/x +\= Expect no match aabc No match @@ -8336,12 +8012,14 @@ No match 2: ab /^.*? ( (a(*THEN)b)++ )++ c/x +\= Expect no match aabc No match -/--- Possessive non-capture ---/ +# Possessive non-capture /^.*? (?:a(*THEN)b)++ c/x +\= Expect no match aabc No match @@ -8354,18 +8032,20 @@ No match 0: aabc /^.*? (?: (?:a(*THEN)b)++ )++ c/x +\= Expect no match aabc No match -/--- Condition assertion ---/ +# Condition assertion /^(?(?=a(*THEN)b)ab|ac)/ ac 0: ac -/--- Condition ---/ +# Condition /^.*?(?(?=a)a|b(*THEN)c)/ +\= Expect no match ba No match @@ -8374,23 +8054,24 @@ No match 0: ba /^.*?(?(?=a)a(*THEN)b|c)/ +\= Expect no match ac No match -/--- Assertion ---/ +# Assertion -/^.*(?=a(*THEN)b)/ +/^.*(?=a(*THEN)b)/ aabc 0: a -/------------------------------/ +# -------------------------- -/(?>a(*:m))/imsxSK +/(?>a(*:m))/imsx,mark a 0: a MK: m -/(?>(a)(*:m))/imsxSK +/(?>(a)(*:m))/imsx,mark a 0: a 1: a @@ -8409,8 +8090,7 @@ MK: m xabcd 0: c 1: ab - ** Failers -No match +\= Expect no match xacd No match @@ -8420,7 +8100,7 @@ No match acd 0: c -/(?<=a(*:N)b)c/K +/(?<=a(*:N)b)c/mark xabcd 0: c MK: N @@ -8443,82 +8123,90 @@ MK: N 1: a 2: d -/(*MARK:A)(*PRUNE:B)(C|X)/KS +/(*MARK:A)(*PRUNE:B)(C|X)/mark C 0: C 1: C MK: B +\= Expect no match D No match, mark = B -/(*MARK:A)(*PRUNE:B)(C|X)/KSS +/(*MARK:A)(*PRUNE:B)(C|X)/mark C 0: C 1: C MK: B +\= Expect no match D No match, mark = B -/(*MARK:A)(*THEN:B)(C|X)/KS +/(*MARK:A)(*THEN:B)(C|X)/mark C 0: C 1: C MK: B +\= Expect 
no match D No match, mark = B -/(*MARK:A)(*THEN:B)(C|X)/KSY +/(*MARK:A)(*THEN:B)(C|X)/mark,no_start_optimize C 0: C 1: C MK: B +\= Expect no match D No match, mark = B -/(*MARK:A)(*THEN:B)(C|X)/KSS +/(*MARK:A)(*THEN:B)(C|X)/mark C 0: C 1: C MK: B +\= Expect no match D No match, mark = B -/--- This should fail, as the skip causes a bump to offset 3 (the skip) ---/ +# This should fail, as the skip causes a bump to offset 3 (the skip). -/A(*MARK:A)A+(*SKIP)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP)(B|Z) | AC/x,mark +\= Expect no match AAAC No match, mark = A -/--- Same --/ +# Same -/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/xK +/A(*MARK:A)A+(*MARK:B)(*SKIP:B)(B|Z) | AC/x,mark +\= Expect no match AAAC No match, mark = B -/A(*:A)A+(*SKIP)(B|Z) | AC/xK +/A(*:A)A+(*SKIP)(B|Z) | AC/x,mark +\= Expect no match AAAC No match, mark = A -/--- This should fail, as a null name is the same as no name ---/ +# This should fail, as a null name is the same as no name. -/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:)(B|Z) | AC/x,mark +\= Expect no match AAAC No match, mark = A -/--- A check on what happens after hitting a mark and them bumping along to -something that does not even start. Perl reports tags after the failures here, -though it does not when the individual letters are made into something -more complicated. ---/ +# A check on what happens after hitting a mark and them bumping along to +# something that does not even start. Perl reports tags after the failures +# here, though it does not when the individual letters are made into something +# more complicated. 
-/A(*:A)B|XX(*:B)Y/K +/A(*:A)B|XX(*:B)Y/mark AABC 0: AB MK: A XXYZ 0: XXY MK: B - ** Failers -No match +\= Expect no match XAQQ No match, mark = A XAQQXZZ @@ -8528,7 +8216,7 @@ No match, mark = A AXXQQQ No match, mark = B -/^(A(*THEN:A)B|C(*THEN:B)D)/K +/^(A(*THEN:A)B|C(*THEN:B)D)/mark AB 0: AB 1: AB @@ -8537,14 +8225,13 @@ MK: A 0: CD 1: CD MK: B - ** Failers -No match +\= Expect no match AC No match, mark = A CB No match, mark = B -/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/K +/^(A(*PRUNE:A)B|C(*PRUNE:B)D)/mark AB 0: AB 1: AB @@ -8553,17 +8240,16 @@ MK: A 0: CD 1: CD MK: B - ** Failers -No match +\= Expect no match AC No match, mark = A CB No match, mark = B -/--- An empty name does not pass back an empty string. It is the same as if no -name were given. ---/ +# An empty name does not pass back an empty string. It is the same as if no +# name were given. -/^(A(*PRUNE:)B|C(*PRUNE:B)D)/K +/^(A(*PRUNE:)B|C(*PRUNE:B)D)/mark AB 0: AB 1: AB @@ -8572,16 +8258,16 @@ name were given. ---/ 1: CD MK: B -/--- PRUNE goes to next bumpalong; COMMIT does not. ---/ +# PRUNE goes to next bumpalong; COMMIT does not. -/A(*PRUNE:A)B/K +/A(*PRUNE:A)B/mark ACAB 0: AB MK: A -/--- Mark names can be duplicated ---/ +# Mark names can be duplicated. 
-/A(*:A)B|X(*:A)Y/K +/A(*:A)B|X(*:A)Y/mark AABC 0: AB MK: A @@ -8589,92 +8275,72 @@ MK: A 0: XY MK: A -/b(*:m)f|a(*:n)w/K +/b(*:m)f|a(*:n)w/mark aw 0: aw MK: n - ** Failers -No match, mark = n +\= Expect no match abc No match, mark = m -/b(*:m)f|aw/K +/b(*:m)f|aw/mark abaw 0: aw - ** Failers -No match +\= Expect no match abc No match, mark = m abax No match, mark = m -/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AAC/x,mark AAAC 0: AAC -/a(*PRUNE:X)bc|qq/KY - ** Failers -No match, mark = X - axy -No match, mark = X - -/a(*THEN:X)bc|qq/KY - ** Failers -No match, mark = X - axy -No match, mark = X - -/(?=a(*MARK:A)b)..x/K +/(?=a(*MARK:A)b)..x/mark abxy 0: abx MK: A - ** Failers -No match +\= Expect no match abpq No match -/(?=a(*MARK:A)b)..(*:Y)x/K +/(?=a(*MARK:A)b)..(*:Y)x/mark abxy 0: abx MK: Y - ** Failers -No match +\= Expect no match abpq No match -/(?=a(*PRUNE:A)b)..x/K +/(?=a(*PRUNE:A)b)..x/mark abxy 0: abx MK: A - ** Failers -No match +\= Expect no match abpq No match -/(?=a(*PRUNE:A)b)..(*:Y)x/K +/(?=a(*PRUNE:A)b)..(*:Y)x/mark abxy 0: abx MK: Y - ** Failers -No match +\= Expect no match abpq No match -/(?=a(*THEN:A)b)..x/K +/(?=a(*THEN:A)b)..x/mark abxy 0: abx MK: A - ** Failers -No match +\= Expect no match abpq No match -/(?=a(*THEN:A)b)..(*:Y)x/K +/(?=a(*THEN:A)b)..(*:Y)x/mark abxy 0: abx MK: Y - ** Failers -No match +\= Expect no match abpq No match @@ -8685,6 +8351,7 @@ No match 2: /(another)?(\1+)test/ +\= Expect no match hello world test No match @@ -8693,12 +8360,12 @@ No match 0: aac /((?:a?)*)*c/ - aac + aac 0: aac 1: /((?>a?)*)*c/ - aac + aac 0: aac 1: @@ -8710,22 +8377,6 @@ No match aba 0: aba -/.*?a(*PRUNE)b/ - aab - 0: ab - -/.*?a(*PRUNE)b/s - aab - 0: ab - -/^a(*PRUNE)b/s - aab -No match - -/.*?a(*SKIP)b/ - aab - 0: ab - /(?>.*?a)b/s aab 0: ab @@ -8735,6 +8386,7 @@ No match 0: ab /(?>^a)b/s +\= Expect no match aab No match @@ -8756,11 +8408,12 @@ No match 1: 2: wxyz -"(?>.*)foo" +/(?>.*)foo/ +\= Expect no match abcdfooxyz 
No match -"(?>.*?)foo" +/(?>.*?)foo/ abcdfooxyz 0: foo @@ -8773,41 +8426,41 @@ No match 0: ac /(?<=(*SKIP)ac)a/ +\= Expect no match aa No match -/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/xK +/A(*MARK:A)A+(*SKIP:B)(B|Z) | AC/x,mark AAAC 0: AC -/a(*SKIP:m)x|ac(*:n)(*SKIP:n)d|ac/K +/a(*SKIP:m)x|ac(*:n)(*SKIP:n)d|ac/mark acacd 0: acd MK: n -/A(*SKIP:m)x|A(*SKIP:n)x|AB/K +/A(*SKIP:m)x|A(*SKIP:n)x|AB/mark AB 0: AB -/((*SKIP:r)d){0}a(*SKIP:m)x|ac(*:n)|ac/K +/((*SKIP:r)d){0}a(*SKIP:m)x|ac(*:n)|ac/mark acacd 0: ac MK: n -/-- Tests that try to figure out how Perl works. My hypothesis is that the - first verb that is backtracked onto is the one that acts. This seems to be - the case almost all the time, but there is one exception that is perhaps a - bug. --/ +# Tests that try to figure out how Perl works. My hypothesis is that the first +# verb that is backtracked onto is the one that acts. This seems to be the case +# almost all the time, but there is one exception that is perhaps a bug. -/-- This matches "aaaac"; each PRUNE advances one character until the subject - no longer starts with 5 'a's. --/ +# This matches "aaaac"; each PRUNE advances one character until the subject no +# longer starts with 5 'a's. /aaaaa(*PRUNE)b|a+c/ aaaaaac 0: aaaac -/-- Putting SKIP in front of PRUNE makes no difference, as it is never -backtracked onto, whether or not it has a label. --/ +# Putting SKIP in front of PRUNE makes no difference, as it is never +# backtracked onto, whether or not it has a label. /aaaaa(*SKIP)(*PRUNE)b|a+c/ aaaaaac @@ -8821,80 +8474,79 @@ backtracked onto, whether or not it has a label. --/ aaaaaac 0: aaaac -/-- Putting THEN in front makes no difference. */ +# Putting THEN in front makes no difference. /aaaaa(*THEN)(*PRUNE)b|a+c/ aaaaaac 0: aaaac -/-- However, putting COMMIT in front of the prune changes it to "no match". I - think this is inconsistent and possibly a bug. For the moment, running this - test is moved out of the Perl-compatible file. 
--/ +# However, putting COMMIT in front of the prune changes it to "no match". I +# think this is inconsistent and possibly a bug. For the moment, running this +# test is moved out of the Perl-compatible file. /aaaaa(*COMMIT)(*PRUNE)b|a+c/ +# OK, lets play the same game again using SKIP instead of PRUNE. -/---- OK, lets play the same game again using SKIP instead of PRUNE. ----/ - -/-- This matches "ac" because SKIP forces the next match to start on the - sixth "a". --/ +# This matches "ac" because SKIP forces the next match to start on the +# sixth "a". /aaaaa(*SKIP)b|a+c/ aaaaaac 0: ac -/-- Putting PRUNE in front makes no difference. --/ +# Putting PRUNE in front makes no difference. /aaaaa(*PRUNE)(*SKIP)b|a+c/ aaaaaac 0: ac -/-- Putting THEN in front makes no difference. --/ +# Putting THEN in front makes no difference. /aaaaa(*THEN)(*SKIP)b|a+c/ aaaaaac 0: ac -/-- In this case, neither does COMMIT. This still matches "ac". --/ +# In this case, neither does COMMIT. This still matches "ac". /aaaaa(*COMMIT)(*SKIP)b|a+c/ aaaaaac 0: ac -/-- This gives "no match", as expected. --/ +# This gives "no match", as expected. /aaaaa(*COMMIT)b|a+c/ +\= Expect no match aaaaaac No match - -/------ Tests using THEN ------/ +# ---- Tests using THEN ---- -/-- This matches "aaaaaac", as expected. --/ +# This matches "aaaaaac", as expected. /aaaaa(*THEN)b|a+c/ aaaaaac 0: aaaaaac -/-- Putting SKIP in front makes no difference. --/ +# Putting SKIP in front makes no difference. /aaaaa(*SKIP)(*THEN)b|a+c/ aaaaaac 0: aaaaaac -/-- Putting PRUNE in front makes no difference. --/ +# Putting PRUNE in front makes no difference. /aaaaa(*PRUNE)(*THEN)b|a+c/ aaaaaac 0: aaaaaac -/-- Putting COMMIT in front makes no difference. --/ +# Putting COMMIT in front makes no difference. 
/aaaaa(*COMMIT)(*THEN)b|a+c/ aaaaaac 0: aaaaaac -/-- End of "priority" tests --/ +# End of "priority" tests /aaaaa(*:m)(*PRUNE:m)(*SKIP:m)m|a+/ aaaaaa @@ -8928,7 +8580,7 @@ No match aaaac 0: ac -/a(*:m)a(*COMMIT)(*SKIP:m)b|a+c/K +/a(*:m)a(*COMMIT)(*SKIP:m)b|a+c/mark aaaaaac 0: ac @@ -8941,6 +8593,7 @@ No match abc 0: abc 1: ab +\= Expect no match abd No match @@ -8957,10 +8610,11 @@ No match 0: abd /a(?=b(*COMMIT)c)[^d]|abd/ + abc + 0: ab +\= Expect no match abd No match - abc - 0: ab /a(?=bc).|abd/ abd @@ -8969,6 +8623,7 @@ No match 0: ab /a(?>b(*COMMIT)c)d|abd/ +\= Expect no match abceabd No match @@ -8981,6 +8636,7 @@ No match 0: abd /(?>a(*COMMIT)c)d|abd/ +\= Expect no match abd No match @@ -8990,33 +8646,16 @@ No match 1: 2: c -/-- These tests were formerly in test 2, but changes in PCRE and Perl have - made them compatible. --/ +# These tests were formerly in test 2, but changes in PCRE and Perl have +# made them compatible. /^(a)?(?(1)a|b)+$/ - *** Failers -No match +\= Expect no match a No match -/(?=a\Kb)ab/ - ab - 0: b - -/(?!a\Kb)ac/ - ac - 0: ac - -/^abc(?<=b\Kc)d/ - abcd - 0: cd - -/^abc(?b))/K +/(*:m(m)(?&y)(?(DEFINE)(?b))/mark abc 0: b MK: m(m -/(*PRUNE:m(m)(?&y)(?(DEFINE)(?b))/K +/(*PRUNE:m(m)(?&y)(?(DEFINE)(?b))/mark abc 0: b MK: m(m -/(*SKIP:m(m)(?&y)(?(DEFINE)(?b))/K +/(*SKIP:m(m)(?&y)(?(DEFINE)(?b))/mark abc 0: b -/(*THEN:m(m)(?&y)(?(DEFINE)(?b))/K +/(*THEN:m(m)(?&y)(?(DEFINE)(?b))/mark abc 0: b MK: m(m @@ -9186,42 +8840,39 @@ MK: m(m /^\d*\w{4}/ 1234 0: 1234 +\= Expect no match 123 No match /^[^b]*\w{4}/ aaaa 0: aaaa +\= Expect no match aaa No match /^[^b]*\w{4}/i aaaa 0: aaaa +\= Expect no match aaa No match /^a*\w{4}/ aaaa 0: aaaa +\= Expect no match aaa No match /^a*\w{4}/i aaaa 0: aaaa +\= Expect no match aaa No match -/(?(?=ab)ab)/+ - ca - 0: - 0+ ca - cd - 0: - 0+ cd - -/(?:(?foo)|(?bar))\k/J +/(?:(?foo)|(?bar))\k/dupnames foofoo 0: foofoo 1: foo @@ -9230,7 +8881,7 @@ No match 1: 2: bar -/(?A)(?:(?foo)|(?bar))\k/J 
+/(?A)(?:(?foo)|(?bar))\k/dupnames AfooA 0: AfooA 1: A @@ -9240,8 +8891,7 @@ No match 1: A 2: 3: bar - ** Failers -No match +\= Expect no match Afoofoo No match Abarbar @@ -9254,7 +8904,7 @@ No match 2: non-sp1 3: non-sp2 -/^ (?:(?A)|(?'B'B)(?A)) (?('A')x) (?()y)$/xJ +/^ (?:(?A)|(?'B'B)(?A)) (?('A')x) (?()y)$/x,dupnames Ax 0: Ax 1: A @@ -9296,60 +8946,61 @@ No match 0: aaaab 1: aaaa -/(?:a\Kb)*+/+ +/(?:a\Kb)*+/aftertext ababc 0: b 0+ c -/(?>a\Kb)*/+ +/(?>a\Kb)*/aftertext ababc 0: b 0+ c -/(?:a\Kb)*/+ +/(?:a\Kb)*/aftertext ababc 0: b 0+ c -/(a\Kb)*+/+ +/(a\Kb)*+/aftertext ababc 0: b 0+ c 1: ab -/(a\Kb)*/+ +/(a\Kb)*/aftertext ababc 0: b 0+ c 1: ab /(?:x|(?:(xx|yy)+|x|x|x|x|x)|a|a|a)bc/ +\= Expect no match acb No match -'\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++' +/\A(?:[^\"]++|\"(?:[^\"]*+|\"\")*+\")++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED 0: NON QUOTED "QUOT""ED" AFTER -'\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++' +/\A(?:[^\"]++|\"(?:[^\"]++|\"\")*+\")++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED 0: NON QUOTED "QUOT""ED" AFTER -'\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++' +/\A(?:[^\"]++|\"(?:[^\"]++|\"\")++\")++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED 0: NON QUOTED "QUOT""ED" AFTER -'\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++' +/\A([^\"1]++|[\"2]([^\"3]*+|[\"4][\"5])*+[\"6])++/ NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED 0: NON QUOTED "QUOT""ED" AFTER 1: AFTER 2: /^\w+(?>\s*)(?<=\w)/ - test test + test test 0: tes -/(?Pa)(?Pb)/gJ +/(?Pa)(?Pb)/g,dupnames abbaba 0: ab 1: a @@ -9358,13 +9009,13 @@ No match 1: a 2: b -/(?Pa)(?Pb)(?P=same)/gJ +/(?Pa)(?Pb)(?P=same)/g,dupnames abbaba 0: aba 1: a 2: b -/(?P=same)?(?Pa)(?Pb)/gJ +/(?P=same)?(?Pa)(?Pb)/g,dupnames abbaba 0: ab 1: a @@ -9373,7 +9024,7 @@ No match 1: a 2: b -/(?:(?P=same)?(?:(?Pa)|(?Pb))(?P=same))+/gJ +/(?:(?P=same)?(?:(?Pa)|(?Pb))(?P=same))+/g,dupnames bbbaaabaabb 0: bbbaaaba 1: a @@ -9382,7 +9033,8 @@ No match 1: 2: b 
-/(?:(?P=same)?(?:(?P=same)(?Pa)(?P=same)|(?P=same)?(?Pb)(?P=same)){2}(?P=same)(?Pc)(?P=same)){2}(?Pz)?/gJ +/(?:(?P=same)?(?:(?P=same)(?Pa)(?P=same)|(?P=same)?(?Pb)(?P=same)){2}(?P=same)(?Pc)(?P=same)){2}(?Pz)?/g,dupnames +\= Expect no match bbbaaaccccaaabbbcc No match @@ -9411,40 +9063,957 @@ No match aa]] 0: aa]] -/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/ - 1234abcd - 0: - 1: - 2: - 3: - 4: - 5: - -/(\2)(\1)/ - -"Z*(|d*){216}" - -"(?1)(?#?'){8}(a)" - baaaaaaaaac - 0: aaaaaaaaa +/A((((((((a))))))))\8B/ + AaaB + 0: AaaB 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a -"(?|(\k'Pm')|(?'Pm'))" - abcd +/A(((((((((a)))))))))\9B/ + AaaB + 0: AaaB + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a + +/A[\8\9]B/ + A8B + 0: A8B + A9B + 0: A9B + +/(|ab)*?d/ + abd + 0: abd + 1: ab + xyd + 0: d + +/(?:((abcd))|(((?:(?:(?:(?:abc|(?:abcdef))))b)abcdefghi)abc)|((*ACCEPT)))/ + 1234abcd + 0: + 1: + 2: + 3: + 4: + 5: + +/(\2|a)(\1)/ + aaa + 0: aa + 1: a + 2: a + +/(\2)(\1)/ + +/Z*(|d*){216}/ + +/(?1)(?#?'){8}(a)/ + baaaaaaaaac + 0: aaaaaaaaa + 1: a + +/((((((((((((x))))))))))))\12/ + xx + 0: xx + 1: x + 2: x + 3: x + 4: x + 5: x + 6: x + 7: x + 8: x + 9: x +10: x +11: x +12: x + +/A[\8]B[\9]C/ + A8B9C + 0: A8B9C + +/(?1)()((((((\1++))\x85)+)|))/ + \x85\x85 + 0: \x85\x85 + 1: + 2: \x85\x85 + 3: \x85\x85 + 4: \x85\x85 + 5: \x85 + 6: + 7: + +/(?|(\k'Pm')|(?'Pm'))/ + abcd 0: 1: +/(?|(aaa)|(b))\g{1}/ + aaaaaa + 0: aaaaaa + 1: aaa + bb + 0: bb + 1: b + +/(?|(aaa)|(b))(?1)/ + aaaaaa + 0: aaaaaa + 1: aaa + baaa + 0: baaa + 1: b +\= Expect no match + bb +No match + +/(?|(aaa)|(b))/ + xaaa + 0: aaa + 1: aaa + xbc + 0: b + 1: b + +/(?|(?'a'aaa)|(?'a'b))\k'a'/ + aaaaaa + 0: aaaaaa + 1: aaa + bb + 0: bb + 1: b + +/(?|(?'a'aaa)|(?'a'b))(?'a'cccc)\k'a'/dupnames + aaaccccaaa + 0: aaaccccaaa + 1: aaa + 2: cccc + bccccb + 0: bccccb + 1: b + 2: cccc + +# /x does not apply to MARK labels + +/x (*MARK:ab cd # comment +ef) x/x,mark + axxz + 0: xx +MK: ab 
cd # comment\x0aef + +/(?<=a(B){0}c)X/ + acX + 0: X + +/(?b)(?(DEFINE)(a+))(?&DEFINE)/ + bbbb + 0: bb + 1: b +\= Expect no match + baaab +No match + /(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(?=.*[,;:])(?=.{8,16})(?!.*[\s])/ - \ Fred:099 + \ Fred:099 0: /(?=.*X)X$/ \ X 0: X - -/X+(?#comment)?/ - >XXX< - 0: X + +/(?s)(?=.*?)b/ + aabc + 0: b + +/(Z)(a)\2{1,2}?(?-i)\1X/i + ZaAAZX + 0: ZaAAZX + 1: Z + 2: a + +/(?'c')XX(?'YYYYYYYYYYYYYYYYYYYYYYYCl')/ + +/[s[:digit:]\E-H]+/ + s09-H + 0: s09-H + +/[s[:digit:]\Q\E-H]+/ + s09-H + 0: s09-H + +/a+(?:|b)a/ + aaaa + 0: aaaa + +/X?(R||){3335}/ + +/(?1)(A(*COMMIT)|B)D/ + ABD + 0: ABD + 1: B + XABD + 0: ABD + 1: B + BAD + 0: BAD + 1: A + ABXABD + 0: ABD + 1: B +\= Expect no match + ABX +No match + +/(?(DEFINE)(? 1? (?=(?2)?) 1 2 (?('cond')|3))) + \A + () + (?&m) + \Z/x + 123 + 0: 123 + 1: + 2: + 3: + +/^(?: +(?: A| (1? (?=(?2)?) (1) 2 (?('cond')|3)) ) +(Z) +)+$/x + AZ123Z + 0: AZ123Z + 1: 123 + 2: + 3: 1 + 4: Z +\= Expect no match + AZ12Z +No match + +/^ (?(DEFINE) ( (?!(a)\2b)..) 
) ()(?1) /x + acb + 0: ac + 1: + 2: + 3: +\= Expect no match + aab +No match + +/(?>ab|abab){1,5}?M/ + abababababababababababM + 0: abababababM + +/(?>ab|abab){2}?M/ + abababM + 0: ababM + +/((?(?=(a))a)+k)/ + bbak + 0: ak + 1: ak + 2: a + +/((?(?=(a))a|)+k)/ + bbak + 0: ak + 1: ak + 2: a + +/(?(?!(b))a|b)+k/ + ababbalbbadabak + 0: abak + 1: b + +/(?!(b))c|b/ + Ab + 0: b + Ac + 0: c + +/(?=(b))b|c/ + Ab + 0: b + 1: b + Ac + 0: c + +/^(.|(.)(?1)\2)$/ + a + 0: a + 1: a + aba + 0: aba + 1: aba + 2: a + abcba + 0: abcba + 1: abcba + 2: a + ababa + 0: ababa + 1: ababa + 2: a + abcdcba + 0: abcdcba + 1: abcdcba + 2: a + +/^((.)(?1)\2|.?)$/ + a + 0: a + 1: a + aba + 0: aba + 1: aba + 2: a + abba + 0: abba + 1: abba + 2: a + abcba + 0: abcba + 1: abcba + 2: a + ababa + 0: ababa + 1: ababa + 2: a + abccba + 0: abccba + 1: abccba + 2: a + abcdcba + 0: abcdcba + 1: abcdcba + 2: a + abcddcba + 0: abcddcba + 1: abcddcba + 2: a + +/^(.)(\1|a(?2))/ + bab + 0: bab + 1: b + 2: ab + +/^(.|(.)(?1)?\2)$/ + abcba + 0: abcba + 1: abcba + 2: a + +/^(?(?=(a))abc|def)/ + abc + 0: abc + 1: a + +/^(?(?!(a))def|abc)/ + abc + 0: abc + 1: a + +/^(?(?=(a)(*ACCEPT))abc|def)/ + abc + 0: abc + 1: a + +/^(?(?!(a)(*ACCEPT))def|abc)/ + abc + 0: abc + 1: a + +/^(?1)\d{3}(a)/ + a123a + 0: a123a + 1: a + +# This pattern uses a lot of named subpatterns in order to match email +# addresses in various formats. It's a heavy test for named subpatterns. In the +# group, slash is coded as \x{2f} so that this pattern can also be +# processed by perltest.sh, which does not cater for an escaped delimiter +# within the pattern. $ within the pattern must also be escaped. All $ and @ +# characters in subject strings are escaped so that Perl doesn't interpret them +# as variable insertions and " characters must also be escaped for Perl. + +# This set of subpatterns is more or less a direct transliteration of the BNF +# definitions in RFC2822, without any of the obsolete features. 
The addition of +# a possessive + to the definition of reduced the match limit in PCRE2 +# from over 5 million to just under 400, and eliminated a very noticeable delay +# when this file was passed to perltest.sh. + +/(?ix)(?(DEFINE) +(? (?&local_part) \@ (?&domain) ) +(? (?&CFWS)?+ < (?&addr_spec) > (?&CFWS)?+ ) +(? [a-z\d!#\$%&'*+-\x{2f}=?^_`{|}~] ) +(? (?&CFWS)?+ (?&atext)+ (?&CFWS)?+ ) +(? (?&ctext) | (?"ed_pair) | (?&comment) ) +(? [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ ()\\] ) +(? \( (?: (?&FWS)?+ (?&ccontent) )*+ (?&FWS)?+ \) ) +(? (?: (?&FWS)?+ (?&comment) )* (?# NOT possessive) + (?: (?&FWS)?+ (?&comment) | (?&FWS) ) ) +(? (?&dtext) | (?"ed_pair) ) +(? (?&phrase) ) +(? (?&dot_atom) | (?&domain_literal) ) +(? (?&CFWS)?+ \[ (?: (?&FWS)?+ (?&dcontent) )* (?&FWS)?+ \] + (?&CFWS)?+ ) +(? (?&CFWS)?+ (?&dot_atom_text) (?&CFWS)?+ ) +(? (?&atext)++ (?: \. (?&atext)++)*+ ) +(? [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ \[\]\\] ) +(? (?: [\t\ ]*+ \n)?+ [\t\ ]++ ) +(? (?&dot_atom) | (?"ed_string) ) +(? (?&name_addr) | (?&addr_spec) ) +(? (?&display_name)? (?&angle_addr) ) +(? (?&word)++ ) +(? (?&qtext) | (?"ed_pair) ) +(? " (?&text) ) +(? (?&CFWS)?+ " (?: (?&FWS)?+ (?&qcontent))* (?&FWS)?+ " + (?&CFWS)?+ ) +(? [^\x{9}\x{10}\x{13}\x{7f}-\x{ff}\ "\\] ) +(? [^\r\n] ) +(? (?&atom) | (?"ed_string) ) +) # End DEFINE +^(?&mailbox)$/ + Alan Other + 0: Alan Other + + 0: + user\@dom.ain + 0: user@dom.ain + user\@[] + 0: user@[] + user\@[domain literal] + 0: user@[domain literal] + user\@[domain literal with \"[square brackets\"] inside] + 0: user@[domain literal with "[square brackets"] inside] + \"A. Other\" (a comment) + 0: "A. Other" (a comment) + A. Other (a comment) + 0: A. Other (a comment) + \"/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/\"\@x400-re.lay + 0: "/s=user/ou=host/o=place/prmd=uu.yy/admd= /c=gb/"@x400-re.lay +\= Expect no match + A missing angle (?&simple_assertion) | (?&lookaround) ) + +(? \( \? > (?®ex) \) ) + +(? 
\\ \d+ | + \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | + \\k <(?&groupname)> | + \\k '(?&groupname)' | + \\k \{ (?&groupname) \} | + \( \? P= (?&groupname) \) ) + +(? (?:(?&assertion) | + (?&callout) | + (?&comment) | + (?&option_setting) | + (?&qualified_item) | + (?"ed_string) | + (?"ed_string_empty) | + (?&special_escape) | + (?&verb) + )* ) + +(? \(\?C (?: \d+ | + (?: (?["'`^%\#\$]) + (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | + \{ (?: \}\} | [^}]*+ )* \} ) + )? \) ) + +(? \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? + (?®ex) \) ) + +(? \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] ) + +(? (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] ) + +(? (?: \[ : (?: + alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| + punct|space|upper|word|xdigit + ) : \] | + (?"ed_string) | + (?"ed_string_empty) | + (?&escaped_character) | + (?&character_type) | + [^]] ) ) + +(? \(\?\# [^)]* \) | (?"ed_string_empty) | \\E ) + +(? (?: \( [+-]? \d+ \) | + \( < (?&groupname) > \) | + \( ' (?&groupname) ' \) | + \( R \d* \) | + \( R & (?&groupname) \) | + \( (?&groupname) \) | + \( DEFINE \) | + \( VERSION >?=\d+(?:\.\d\d?)? \) | + (?&callout)?+ (?&comment)* (?&lookaround) ) ) + +(? \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) ) + +(? (? [-\x{2f}!"'`=_:;,%&@~]) (?®ex) + \k'delimiter' .* ) + +(? \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | + x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | + [aefnrt] | c[[:print:]] | + [^[:alnum:]] ) ) + +(? (?&capturing_group) | (?&non_capturing_group) | + (?&resetting_group) | (?&atomic_group) | + (?&conditional_group) ) + +(? [a-zA-Z_]\w* ) + +(? (?! (?&range_qualifier) ) [^[()|*+?.\$\\] ) + +(? \(\? (?: = | ! | <= | \(\? [iJmnsUx-]* : (?®ex) \) ) + +(? \(\? [iJmnsUx-]* \) ) + +(? (?:\. | + (?&lookaround) | + (?&back_reference) | + (?&character_class) | + (?&character_type) | + (?&escaped_character) | + (?&group) | + (?&subroutine_call) | + (?&literal_character) | + (?"ed_string) + ) (?&comment)? 
(?&qualifier)? ) + +(? (?: [?*+] | (?&range_qualifier) ) [+?]? ) + +(? (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) + +(? \\Q\\E ) + +(? \{ (?: \d+ (?: , \d* )? | , \d+ ) \} ) + +(? (?&start_item)* (?&branch) (?: \| (?&branch) )* ) + +(? \( \? \| (?®ex) \) ) + +(? \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z ) + +(? \\K ) + +(? \( \* (?: + ANY | + ANYCRLF | + BSR_ANYCRLF | + BSR_UNICODE | + CR | + CRLF | + LF | + LIMIT_MATCH=\d+ | + LIMIT_DEPTH=\d+ | + LIMIT_HEAP=\d+ | + NOTEMPTY | + NOTEMPTY_ATSTART | + NO_AUTO_POSSESS | + NO_DOTSTAR_ANCHOR | + NO_JIT | + NO_START_OPT | + NUL | + UTF | + UCP ) \) ) + +(? (?: \(\?R\) | \(\?[+-]?\d+\) | + \(\? (?: & | P> ) (?&groupname) \) | + \\g < (?&groupname) > | + \\g ' (?&groupname) ' | + \\g < [+-]? \d+ > | + \\g ' [+-]? \d+ ) ) + +(? \(\* (?: ACCEPT | FAIL | F | COMMIT | + (?:MARK)?:(?&verbname) | + (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) ) + +(? [^)]+ ) + +) # End DEFINE +# Kick it all off... +^(?&delimited_regex)$/subject_literal,jitstack=256 + /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/ + 0: /^(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\11*(\3\4)\1(?#)2$/ + /(cat(a(ract|tonic)|erpillar)) \1()2(3)/ + 0: /(cat(a(ract|tonic)|erpillar)) \1()2(3)/ + /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/ + 0: /^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/ + /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/ + 0: /^From\s+\S+\s+([a-zA-Z]{3}\s+){2}\d{1,2}\s+\d\d:\d\d/ + /]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is + 0: /]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/is + /^(?(DEFINE) (?
    a) (? b) ) (?&A) (?&B) / + 0: /^(?(DEFINE) (? a) (? b) ) (?&A) (?&B) / + /(?(DEFINE)(?2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/ + 0: /(?(DEFINE)(?2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))\b(?&byte)(\.(?&byte)){3}/ + /\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/ + 0: /\b(?&byte)(\.(?&byte)){3}(?(DEFINE)(?2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d))/ + /^(\w++|\s++)*$/ + 0: /^(\w++|\s++)*$/ + /a+b?(*THEN)c+(*FAIL)/ + 0: /a+b?(*THEN)c+(*FAIL)/ + /(A (A|B(*ACCEPT)|C) D)(E)/x + 0: /(A (A|B(*ACCEPT)|C) D)(E)/x + /^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i + 0: /^\W*+(?:((.)\W*+(?1)\W*+\2|)|((.)\W*+(?3)\W*+\4|\W*+.\W*+))\W*+$/i + /A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B + 0: /A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B + /(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info + 0: /(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info + /(?sx)(?(DEFINE)(? (?&simple_assertion) | (?&lookaround) )(? \( \? > (?®ex) \) )(? \\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(? (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?"ed_string) | (?"ed_string_empty) | (?&special_escape) | (?&verb) )* )(? \(\?C (?: \d+ | (?: (?["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(? \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?®ex) \) )(? \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(? (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(? (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?"ed_string) | (?"ed_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(? \(\?\# [^)]* \) | (?"ed_string_empty) | \\E )(? (?: \( [+-]? 
\d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(? \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(? (? [-\x{2f}!"'`=_:;,%&@~]) (?®ex) \k'delimiter' .* )(? \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(? (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(? [a-zA-Z_]\w* )(? (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(? \(\? (?: = | ! | <= | \(\? [iJmnsUx-]* : (?®ex) \) )(? \(\? [iJmnsUx-]* \) )(? (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?"ed_string) ) (?&comment)? (?&qualifier)? )(? (?: [?*+] | (?&range_qualifier) ) [+?]? )(? (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (? \\Q\\E ) (? \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(? (?&start_item)* (?&branch) (?: \| (?&branch) )* )(? \( \? \| (?®ex) \) )(? \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(? \\K )(? \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(? (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(? \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(? [^)]+ ))^(?&delimited_regex)$/ + 0: /(?sx)(?(DEFINE)(? (?&simple_assertion) | (?&lookaround) )(? \( \? > (?®ex) \) )(? 
\\ \d+ | \\g (?: [+-]?\d+ | \{ (?: [+-]?\d+ | (?&groupname) ) \} ) | \\k <(?&groupname)> | \\k '(?&groupname)' | \\k \{ (?&groupname) \} | \( \? P= (?&groupname) \) )(? (?:(?&assertion) | (?&callout) | (?&comment) | (?&option_setting) | (?&qualified_item) | (?"ed_string) | (?"ed_string_empty) | (?&special_escape) | (?&verb) )* )(? \(\?C (?: \d+ | (?: (?["'`^%\#\$]) (?: \k'D'\k'D' | (?!\k'D') . )* \k'D' | \{ (?: \}\} | [^}]*+ )* \} ) )? \) )(? \( (?: \? P? < (?&groupname) > | \? ' (?&groupname) ' )? (?®ex) \) )(? \[ \^?+ (?: \] (?&class_item)* | (?&class_item)+ ) \] )(? (?! \\N\{\w+\} ) \\ [dDsSwWhHvVRN] )(? (?: \[ : (?: alnum|alpha|ascii|blank|cntrl|digit|graph|lower|print| punct|space|upper|word|xdigit ) : \] | (?"ed_string) | (?"ed_string_empty) | (?&escaped_character) | (?&character_type) | [^]] ) )(? \(\?\# [^)]* \) | (?"ed_string_empty) | \\E )(? (?: \( [+-]? \d+ \) | \( < (?&groupname) > \) | \( ' (?&groupname) ' \) | \( R \d* \) | \( R & (?&groupname) \) | \( (?&groupname) \) | \( DEFINE \) | \( VERSION >?=\d+(?:\.\d\d?)? \) | (?&callout)?+ (?&comment)* (?&lookaround) ) )(? \(\? (?&condition) (?&branch) (?: \| (?&branch) )? \) )(? (? [-\x{2f}!"'`=_:;,%&@~]) (?®ex) \k'delimiter' .* )(? \\ (?: 0[0-7]{1,2} | [0-7]{1,3} | o\{ [0-7]+ \} | x \{ (*COMMIT) [[:xdigit:]]* \} | x [[:xdigit:]]{0,2} | [aefnrt] | c[[:print:]] | [^[:alnum:]] ) )(? (?&capturing_group) | (?&non_capturing_group) | (?&resetting_group) | (?&atomic_group) | (?&conditional_group) )(? [a-zA-Z_]\w* )(? (?! (?&range_qualifier) ) [^[()|*+?.\$\\] )(? \(\? (?: = | ! | <= | \(\? [iJmnsUx-]* : (?®ex) \) )(? \(\? [iJmnsUx-]* \) )(? (?:\. | (?&lookaround) | (?&back_reference) | (?&character_class) | (?&character_type) | (?&escaped_character) | (?&group) | (?&subroutine_call) | (?&literal_character) | (?"ed_string) ) (?&comment)? (?&qualifier)? )(? (?: [?*+] | (?&range_qualifier) ) [+?]? )(? (?: \\Q (?: (?!\\E | \k'delimiter') . )++ (?: \\E | ) ) ) (? \\Q\\E ) (? \{ (?: \d+ (?: , \d* )? | , \d+ ) \} )(? 
(?&start_item)* (?&branch) (?: \| (?&branch) )* )(? \( \? \| (?®ex) \) )(? \^ | \$ | \\A | \\b | \\B | \\G | \\z | \\Z )(? \\K )(? \( \* (?: ANY | ANYCRLF | BSR_ANYCRLF | BSR_UNICODE | CR | CRLF | LF | LIMIT_MATCH=\d+ | LIMIT_DEPTH=\d+ | LIMIT_HEAP=\d+ | NOTEMPTY | NOTEMPTY_ATSTART | NO_AUTO_POSSESS | NO_DOTSTAR_ANCHOR | NO_JIT | NO_START_OPT | NUL | UTF | UCP ) \) )(? (?: \(\?R\) | \(\?[+-]?\d+\) | \(\? (?: & | P> ) (?&groupname) \) | \\g < (?&groupname) > | \\g ' (?&groupname) ' | \\g < [+-]? \d+ > | \\g ' [+-]? \d+ ) )(? \(\* (?: ACCEPT | FAIL | F | COMMIT | (?:MARK)?:(?&verbname) | (?:PRUNE|SKIP|THEN) (?: : (?&verbname)? )? ) \) )(? [^)]+ ))^(?&delimited_regex)$/ +\= Expect no match + /((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/ +No match + /(?:(?(2y)a|b)(X))+/ +No match + /a(*MARK)b/ +No match + /a(*CR)b/ +No match + /(?P(?P=abn)(?/xx + < > + 0: < > + +/<(?:[a b])>/xx + < > +No match + +/<(?xxx:[a b])>/ + < > +No match + +/<(?-x:[a b])>/xx + < > + 0: < > + +/[[:digit:]-]+/ + 12-24 + 0: 12-24 + +/((?<=((*ACCEPT)) )\1?\b) / +\= Expect no match + ((?<=((*ACCEPT)) )\\1?\\b)\x20 +No match + +/((?<=((*ACCEPT))X)\1?Y)\1/ + XYYZ + 0: YY + 1: Y + 2: + +/((?<=((*ACCEPT))X)\1?Y(*ACCEPT))\1/ + XYYZ + 0: Y + 1: Y + 2: + +/(?(DEFINE)(?a?)X)^(?&optional_a)a$/ + aa + 0: aa + a + 0: a + +/^(a?)b(?1)a/ + abaa + 0: abaa + 1: a + aba + 0: aba + 1: a + baa + 0: baa + 1: + ba + 0: ba + 1: + +/^(a?)+b(?1)a/ + abaa + 0: abaa + 1: + aba + 0: aba + 1: + baa + 0: baa + 1: + ba + 0: ba + 1: + +/^(a?)++b(?1)a/ + abaa + 0: abaa + 1: + aba + 0: aba + 1: + baa + 0: baa + 1: + ba + 0: ba + 1: + +/^(a?)+b/ + b + 0: b + 1: + ab + 0: ab + 1: + aaab + 0: aaab + 1: + +/(?=a+)a(a+)++b/ + aab + 0: aab + 1: a + +/(?<=\G.)/g,aftertext + abc + 0: + 0+ bc + 0: + 0+ c + 0: + 0+ + +/(?<=(?=.)?)/ + +/(?<=(?=.)?+)/ + +/(?<=(?=.)*)/ + +/(?<=(?=.){4,5})/ + +/(?<=(?=.){4,5}x)/ + +/a(?=.(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: a + 1: a + +/a(?>(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: a + 1: a + 
+/a(?:(*:X))(*SKIP:X)(*F)|(.)/ + abc + 0: b + 1: b + +#pattern no_start_optimize + +/(?>a(*:1))(?>b(*:1))(*SKIP:1)x|.*/ + abc + 0: abc + +/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ + abc + 0: abc + +#subject mark + +/a(*ACCEPT:X)b/ + abc + 0: a +MK: X + +/(?=a(*ACCEPT:QQ)bc)axyz/ + axyz + 0: axyz +MK: QQ + +/(?(DEFINE)(a(*ACCEPT:X)))(?1)b/ + abc + 0: ab +MK: X + +/a(*F:X)b/ + abc +No match, mark = X + +/(?(DEFINE)(a(*F:X)))(?1)b/ + abc +No match, mark = X + +/a(*COMMIT:X)b/ + abc + 0: ab +MK: X + +/(?(DEFINE)(a(*COMMIT:X)))(?1)b/ + abc + 0: ab +MK: X + +/a+(*:Z)b(*COMMIT:X)(*SKIP:Z)c|.*/ + aaaabd + 0: bd + +/a+(*:Z)b(*COMMIT:X)(*SKIP:X)c|.*/ + aaaabd +No match, mark = X + +/a(*COMMIT:X)b/ + axabc +No match, mark = X + +#pattern -no_start_optimize +#subject -mark + +/(.COMMIT)(*COMMIT::::::::::interal error:::)/ + +/(*COMMIT:ÿÿ)/ + +/(*COMMIT:]w)/ + +/(?i)A(?^)B(?^x:C D)(?^i)e f/ + aBCDE F + 0: aBCDE F +\= Expect no match + aBCDEF +No match + AbCDe f +No match + +/(*pla:foo).{6}/ + abcfoobarxyz + 0: foobar +\= Expect no match + abcfooba +No match + +/(*positive_lookahead:foo).{6}/ + abcfoobarxyz + 0: foobar + +/(?(*pla:foo).{6}|a..)/ + foobarbaz + 0: foobar + abcfoobar + 0: abc + +/(?(*positive_lookahead:foo).{6}|a..)/ + foobarbaz + 0: foobar + abcfoobar + 0: abc + +/(*plb:foo)bar/ + abcfoobar + 0: bar +\= Expect no match + abcbarfoo +No match + +/(*positive_lookbehind:foo)bar/ + abcfoobar + 0: bar +\= Expect no match + abcbarfoo +No match + +/(?(*plb:foo)bar|baz)/ + abcfoobar + 0: bar + bazfoobar + 0: baz + abcbazfoobar + 0: baz + foobazfoobar + 0: bar + +/(?(*positive_lookbehind:foo)bar|baz)/ + abcfoobar + 0: bar + bazfoobar + 0: baz + abcbazfoobar + 0: baz + foobazfoobar + 0: bar + +/(*nlb:foo)bar/ + abcbarfoo + 0: bar +\= Expect no match + abcfoobar +No match + +/(*negative_lookbehind:foo)bar/ + abcbarfoo + 0: bar +\= Expect no match + abcfoobar +No match + +/(?(*nlb:foo)bar|baz)/ + abcfoobaz + 0: baz + abcbarbaz + 0: bar +\= Expect no match + abcfoobar +No match + 
+/(?(*negative_lookbehind:foo)bar|baz)/ + abcfoobaz + 0: baz + abcbarbaz + 0: bar +\= Expect no match + abcfoobar +No match + +/(*atomic:a+)\w/ + aaab + 0: aaab +\= Expect no match + aaaa +No match / (? \w+ )* \. /xi pokus. @@ -9470,4 +10039,162 @@ No match 0: pokus.hokus 1: hokus -/-- End of testinput1 --/ +/a(?(?=(*:2)b).)/mark + abc + 0: ab +MK: 2 + acb + 0: a + +/a(?(?!(*:2)b).)/mark + acb + 0: ac + abc + 0: a +MK: 2 + +/(?:a|ab){1}+c/ +\= Expect no match + abc +No match + +/(a|ab){1}+c/ + abc +No match + +/(a+){1}+a/ +\= Expect no match + aaaa +No match + +/(?(DEFINE)(a|ab))(?1){1}+c/ + abc +No match + +/(?:a|(?=b)|.)*\z/ + abc + 0: abc + +/(?:a|(?=b)|.)*/ + abc + 0: a + +/(?<=a(*SKIP)x)|c/ + abcd +No match + +/(?<=a(*SKIP)x)|d/ + abcd + 0: d + +/(?<=(?=.(?<=x)))/aftertext + abx + 0: + 0+ x + +/(?<=(?=(?<=a)))b/ + ab + 0: b + +/^(?a)(?()b)((?<=b).*)$/ + abc + 0: abc + 1: a + 2: c + +/^(a\1?){4}$/ + aaaa + 0: aaaa + 1: a + aaaaaa + 0: aaaaaa + 1: aa + +/^((\1+)|\d)+133X$/ + 111133X + 0: 111133X + 1: 11 + 2: 11 + +/^(?=.*(?=(([A-Z]).*(?(1)\1)))(?!.+\2)){26}/i + The quick brown fox jumps over the lazy dog. + 0: + 1: quick brown fox jumps over the lazy dog. + 2: q + Jackdaws love my big sphinx of quartz. + 0: + 1: Jackdaws love my big sphinx of quartz. + 2: J + Pack my box with five dozen liquor jugs. + 0: + 1: Pack my box with five dozen liquor jugs. + 2: P +\= Expect no match + The quick brown fox jumps over the lazy cat. +No match + Hackdaws love my big sphinx of quartz. +No match + Pack my fox with five dozen liquor jugs. +No match + +/^(?>.*?([A-Z])(?!.*\1)){26}/i + The quick brown fox jumps over the lazy dog. + 0: The quick brown fox jumps over the lazy dog + 1: g + Jackdaws love my big sphinx of quartz. + 0: Jackdaws love my big sphinx of quartz + 1: z + Pack my box with five dozen liquor jugs. + 0: Pack my box with five dozen liquor jugs + 1: s +\= Expect no match + The quick brown fox jumps over the lazy cat. 
+No match + Hackdaws love my big sphinx of quartz. +No match + Pack my fox with five dozen liquor jugs. +No match + +/(?<=X(?(DEFINE)(A)))X(*F)/ +\= Expect no match + AXYZ +No match + +/(?<=X(?(DEFINE)(A)))./ + AXYZ + 0: Y + +/(?<=X(?(DEFINE)(.*))Y)./ + AXYZ + 0: Z + +/(?<=X(?(DEFINE)(Y))(?1))./ + AXYZ + 0: Z + +/(?(DEFINE)(?bar))(?\x{8c}748364< + 0: \x8c748364 + +/a{65536/ + >a{65536< + 0: a{65536 + +/a\K.(?0)*/ + abac + 0: c + +/(a\K.(?1)*)/ + abac + 0: c + 1: abac + +# End of testinput1 diff --git a/src/pcre2/testdata/testoutput10 b/src/pcre2/testdata/testoutput10 new file mode 100644 index 00000000..d4085106 --- /dev/null +++ b/src/pcre2/testdata/testoutput10 @@ -0,0 +1,1884 @@ +# This set of tests is for UTF-8 support and Unicode property support, with +# relevance only for the 8-bit library. + +# The next 5 patterns have UTF-8 errors + +/[Ã]/utf +Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80 + +/Ã/utf +Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end + +/ÃÃÃxxx/utf +Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80 + +/‚‚‚‚‚‚‚Ã/utf +Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set + +/‚‚‚‚‚‚‚Ã/match_invalid_utf +Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set + +# Now test subjects + +/badutf/utf +\= Expect UTF-8 errors + X\xdf +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1 + XX\xef +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 + XXX\xef\x80 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 + X\xf7 +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1 + XX\xf7\x80 +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 + XXX\xf7\x80\x80 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 + \xfb +Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 + \xfb\x80 +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 + \xfb\x80\x80 
+Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 + \xfb\x80\x80\x80 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 + \xfd +Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0 + \xfd\x80 +Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 + \xfd\x80\x80 +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 + \xfd\x80\x80\x80 +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 + \xfd\x80\x80\x80\x80 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 + \xdf\x7f +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 + \xef\x7f\x80 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 + \xef\x80\x7f +Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 + \xf7\x7f\x80\x80 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 + \xf7\x80\x7f\x80 +Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 + \xf7\x80\x80\x7f +Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 + \xfb\x7f\x80\x80\x80 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 + \xfb\x80\x7f\x80\x80 +Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 + \xfb\x80\x80\x7f\x80 +Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 + \xfb\x80\x80\x80\x7f +Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0 + \xfd\x7f\x80\x80\x80\x80 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 + \xfd\x80\x7f\x80\x80\x80 +Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 + \xfd\x80\x80\x7f\x80\x80 +Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 + \xfd\x80\x80\x80\x7f\x80 +Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0 + \xfd\x80\x80\x80\x80\x7f +Failed: error -12: UTF-8 error: byte 6 top bits not 0x80 at offset 0 + \xed\xa0\x80 +Failed: error -16: UTF-8 error: code points 
0xd800-0xdfff are not defined at offset 0 + \xc0\x8f +Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 0 + \xe0\x80\x8f +Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 0 + \xf0\x80\x80\x8f +Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0 + \xf8\x80\x80\x80\x8f +Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0 + \xfc\x80\x80\x80\x80\x8f +Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0 + \x80 +Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0 + \xfe +Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 + \xff +Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 + +/badutf/utf +\= Expect UTF-8 errors + XX\xfb\x80\x80\x80\x80 +Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 2 + XX\xfd\x80\x80\x80\x80\x80 +Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 2 + XX\xf7\xbf\xbf\xbf +Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2 + +/shortutf/utf +\= Expect UTF-8 errors + XX\xdf\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 + XX\xef\=ph +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 + XX\xef\x80\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 + \xf7\=ph +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 + \xf7\x80\=ph +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 + \xf7\x80\x80\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 + \xfb\=ph +Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 + \xfb\x80\=ph +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 + \xfb\x80\x80\=ph +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 + \xfb\x80\x80\x80\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 + \xfd\=ph +Failed: 
error -7: UTF-8 error: 5 bytes missing at end at offset 0 + \xfd\x80\=ph +Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 + \xfd\x80\x80\=ph +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 + \xfd\x80\x80\x80\=ph +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 + \xfd\x80\x80\x80\x80\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 + +/anything/utf +\= Expect UTF-8 errors + X\xc0\x80 +Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 1 + XX\xc1\x8f +Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 2 + XXX\xe0\x9f\x80 +Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 3 + \xf0\x8f\x80\x80 +Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0 + \xf8\x87\x80\x80\x80 +Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0 + \xfc\x83\x80\x80\x80\x80 +Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0 + \xfe\x80\x80\x80\x80\x80 +Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 + \xff\x80\x80\x80\x80\x80 +Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 + \xf8\x88\x80\x80\x80 +Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 + \xf9\x87\x80\x80\x80 +Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 + \xfc\x84\x80\x80\x80\x80 +Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 + \xfd\x83\x80\x80\x80\x80 +Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 +\= Expect no match + \xc3\x8f +No match + \xe0\xaf\x80 +No match + \xe1\x80\x80 +No match + \xf0\x9f\x80\x80 +No match + \xf1\x8f\x80\x80 +No match + \xf8\x88\x80\x80\x80\=no_utf_check +No match + \xf9\x87\x80\x80\x80\=no_utf_check +No match + \xfc\x84\x80\x80\x80\x80\=no_utf_check +No match + \xfd\x83\x80\x80\x80\x80\=no_utf_check +No match + +# Similar tests 
with offsets + +/badutf/utf +\= Expect UTF-8 errors + X\xdfabcd +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=1 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 +\= Expect no match + X\xdfabcd\=offset=2 +No match + +/(?<=x)badutf/utf +\= Expect UTF-8 errors + X\xdfabcd +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=1 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=2 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\xdf\=offset=3 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 6 +\= Expect no match + X\xdfabcd\=offset=3 +No match + +/(?<=xx)badutf/utf +\= Expect UTF-8 errors + X\xdfabcd +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=1 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=2 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=3 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + +/(?<=xxxx)badutf/utf +\= Expect UTF-8 errors + X\xdfabcd +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=1 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=2 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabcd\=offset=3 +Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 + X\xdfabc\xdf\=offset=6 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 5 + X\xdfabc\xdf\=offset=7 +Failed: error -33: bad offset value +\= Expect no match + X\xdfabcd\=offset=6 +No match + +/\x{100}/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = 
\x80 +Subject length lower bound = 1 + +/\x{1000}/IB,utf +------------------------------------------------------------------ + Bra + \x{1000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xe1 +Last code unit = \x80 +Subject length lower bound = 1 + +/\x{10000}/IB,utf +------------------------------------------------------------------ + Bra + \x{10000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xf0 +Last code unit = \x80 +Subject length lower bound = 1 + +/\x{100000}/IB,utf +------------------------------------------------------------------ + Bra + \x{100000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xf4 +Last code unit = \x80 +Subject length lower bound = 1 + +/\x{10ffff}/IB,utf +------------------------------------------------------------------ + Bra + \x{10ffff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xf4 +Last code unit = \xbf +Subject length lower bound = 1 + +/[\x{ff}]/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc3 +Last code unit = \xbf +Subject length lower bound = 1 + +/[\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = \x80 +Subject length lower bound = 1 + +/\x80/IB,utf +------------------------------------------------------------------ + Bra + \x{80} + Ket + End 
+------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc2 +Last code unit = \x80 +Subject length lower bound = 1 + +/\xff/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc3 +Last code unit = \xbf +Subject length lower bound = 1 + +/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf +------------------------------------------------------------------ + Bra + \x{d55c}\x{ad6d}\x{c5b4} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xed +Last code unit = \xb4 +Subject length lower bound = 3 + \x{D55c}\x{ad6d}\x{C5B4} + 0: \x{d55c}\x{ad6d}\x{c5b4} + +/\x{65e5}\x{672c}\x{8a9e}/IB,utf +------------------------------------------------------------------ + Bra + \x{65e5}\x{672c}\x{8a9e} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xe6 +Last code unit = \x9e +Subject length lower bound = 3 + \x{65e5}\x{672c}\x{8a9e} + 0: \x{65e5}\x{672c}\x{8a9e} + +/\x{80}/IB,utf +------------------------------------------------------------------ + Bra + \x{80} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc2 +Last code unit = \x80 +Subject length lower bound = 1 + +/\x{084}/IB,utf +------------------------------------------------------------------ + Bra + \x{84} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc2 +Last code unit = \x84 +Subject length lower bound = 1 + +/\x{104}/IB,utf +------------------------------------------------------------------ + Bra + \x{104} + Ket + End 
+------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = \x84 +Subject length lower bound = 1 + +/\x{861}/IB,utf +------------------------------------------------------------------ + Bra + \x{861} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xe0 +Last code unit = \xa1 +Subject length lower bound = 1 + +/\x{212ab}/IB,utf +------------------------------------------------------------------ + Bra + \x{212ab} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xf0 +Last code unit = \xab +Subject length lower bound = 1 + +/[^ab\xC0-\xF0]/IB,utf +------------------------------------------------------------------ + Bra + [\x00-`c-\xbf\xf1-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 + \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf + \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee + \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd + \xfe \xff +Subject length lower bound = 1 + \x{f1} + 0: \x{f1} + \x{bf} + 0: \x{bf} + \x{100} + 0: \x{100} + \x{1000} + 0: \x{1000} +\= Expect no match + \x{c0} +No match + \x{f0} +No match + +/Ä€{3,4}/IB,utf +------------------------------------------------------------------ + Bra + \x{100}{3} + \x{100}?+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = \x80 +Subject length lower bound = 3 + \x{100}\x{100}\x{100}\x{100\x{100} + 0: \x{100}\x{100}\x{100} + +/(\x{100}+|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}++ + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: x \xc4 +Subject length lower bound = 1 + +/(\x{100}*a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}*+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a x \xc4 +Subject length lower bound = 1 + +/(\x{100}{0,2}a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}{0,2}+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a x \xc4 +Subject length lower bound = 1 + 
+/(\x{100}{1,2}a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100} + \x{100}{0,1}+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: x \xc4 +Subject length lower bound = 1 + +/\x{100}/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = \x80 +Subject length lower bound = 1 + +/a\x{100}\x{101}*/IB,utf +------------------------------------------------------------------ + Bra + a\x{100} + \x{101}*+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = \x80 +Subject length lower bound = 2 + +/a\x{100}\x{101}+/IB,utf +------------------------------------------------------------------ + Bra + a\x{100} + \x{101}++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = \x81 +Subject length lower bound = 3 + +/[^\x{c4}]/IB +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = \x80 +Subject length lower bound = 1 + \x{100} + 0: \x{100} + Z\x{100} + 0: \x{100} + \x{100}Z + 0: \x{100} + +/[\xff]/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + 
Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc3 +Last code unit = \xbf +Subject length lower bound = 1 + >\x{ff}< + 0: \x{ff} + +/[^\xff]/IB,utf +------------------------------------------------------------------ + Bra + [^\x{ff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + +/\x{100}abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + \x{100}abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +First code unit = \xc4 +Last code unit = 'z' +Subject length lower bound = 7 + +/\777/I,utf +Capture group count = 0 +Options: utf +First code unit = \xc7 +Last code unit = \xbf +Subject length lower bound = 1 + \x{1ff} + 0: \x{1ff} + \777 + 0: \x{1ff} + +/\x{100}+\x{200}/IB,utf +------------------------------------------------------------------ + Bra + \x{100}++ + \x{200} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = \x80 +Subject length lower bound = 2 + +/\x{100}+X/IB,utf +------------------------------------------------------------------ + Bra + \x{100}++ + X + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc4 +Last code unit = 'X' +Subject length lower bound = 2 + +/^[\QÄ€\E-\QÅ\E/B,utf +Failed: error 106 at offset 15: missing terminating ] for character class + +# This tests the stricter UTF-8 check according to RFC 3629. 
+ +/X/utf +\= Expect UTF-8 errors + \x{d800} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 + \x{da00} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 + \x{dfff} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 + \x{110000} +Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 0 + \x{2000000} +Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 + \x{7fffffff} +Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 +\= Expect no match + \x{d800}\=no_utf_check +No match + \x{da00}\=no_utf_check +No match + \x{dfff}\=no_utf_check +No match + \x{110000}\=no_utf_check +No match + \x{2000000}\=no_utf_check +No match + \x{7fffffff}\=no_utf_check +No match + +/(*UTF8)\x{1234}/ + abcd\x{1234}pqr + 0: \x{1234} + +/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I +Capture group count = 0 +Compile options: +Overall options: utf +\R matches any Unicode newline +Forced newline is CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + +/\h/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3 +Subject length lower bound = 1 + ABC\x{09} + 0: \x{09} + ABC\x{20} + 0: + ABC\x{a0} + 0: \x{a0} + ABC\x{1680} + 0: \x{1680} + ABC\x{180e} + 0: \x{180e} + ABC\x{2000} + 0: \x{2000} + ABC\x{202f} + 0: \x{202f} + ABC\x{205f} + 0: \x{205f} + ABC\x{3000} + 0: \x{3000} + +/\v/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 +Subject length lower bound = 1 + ABC\x{0a} + 0: \x{0a} + ABC\x{0b} + 0: \x{0b} + ABC\x{0c} + 0: \x{0c} + ABC\x{0d} + 0: \x{0d} + ABC\x{85} + 0: \x{85} + ABC\x{2028} + 0: \x{2028} + +/\h*A/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 +Last code unit = 'A' +Subject length lower bound = 1 + CDBABC + 0: A 
+ +/\v+A/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 +Last code unit = 'A' +Subject length lower bound = 2 + +/\s?xxx\s/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x +Last code unit = 'x' +Subject length lower bound = 4 + +/\sxxx\s/I,utf,tables=2 +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2 +Last code unit = 'x' +Subject length lower bound = 5 + AB\x{85}xxx\x{a0}XYZ + 0: \x{85}xxx\x{a0} + AB\x{a0}xxx\x{85}XYZ + 0: \x{a0}xxx\x{85} + +/\S \S/I,utf,tables=2 +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 + \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 + \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 + \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 + \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Last code unit = ' ' +Subject length lower bound = 3 + \x{a2} \x{84} + 0: \x{a2} \x{84} + A Z + 0: A Z + +/a+/utf + a\x{123}aa\=offset=1 + 0: aa + a\x{123}aa\=offset=3 + 0: aa + a\x{123}aa\=offset=4 + 0: a +\= Expect bad offset value + a\x{123}aa\=offset=6 +Failed: error -33: bad offset value +\= Expect bad UTF-8 offset + a\x{123}aa\=offset=2 +Error -36 (bad UTF-8 offset) +\= Expect no match + a\x{123}aa\=offset=5 +No match + +/\x{1234}+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: \xe1 +Subject length lower bound = 1 + +/\x{1234}+?/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: \xe1 
+Subject length lower bound = 1 + +/\x{1234}++/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: \xe1 +Subject length lower bound = 1 + +/\x{1234}{2}/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: \xe1 +Subject length lower bound = 2 + +/[^\x{c4}]/IB,utf +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + +/X+\x{200}/IB,utf +------------------------------------------------------------------ + Bra + X++ + \x{200} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'X' +Last code unit = \x80 +Subject length lower bound = 2 + +/\R/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 +Subject length lower bound = 1 + +/\777/IB,utf +------------------------------------------------------------------ + Bra + \x{1ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xc7 +Last code unit = \xbf +Subject length lower bound = 1 + +/\w+\x{C4}/B,utf +------------------------------------------------------------------ + Bra + \w++ + \x{c4} + Ket + End +------------------------------------------------------------------ + a\x{C4}\x{C4} + 0: a\x{c4} + +/\w+\x{C4}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \w+ + \x{c4} + Ket + End +------------------------------------------------------------------ + a\x{C4}\x{C4} + 0: a\x{c4}\x{c4} + +/\W+\x{C4}/B,utf +------------------------------------------------------------------ + Bra + \W+ + \x{c4} + Ket + End +------------------------------------------------------------------ + !\x{C4} + 0: !\x{c4} + +/\W+\x{C4}/B,utf,tables=2 
+------------------------------------------------------------------ + Bra + \W++ + \x{c4} + Ket + End +------------------------------------------------------------------ + !\x{C4} + 0: !\x{c4} + +/\W+\x{A1}/B,utf +------------------------------------------------------------------ + Bra + \W+ + \x{a1} + Ket + End +------------------------------------------------------------------ + !\x{A1} + 0: !\x{a1} + +/\W+\x{A1}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \W+ + \x{a1} + Ket + End +------------------------------------------------------------------ + !\x{A1} + 0: !\x{a1} + +/X\s+\x{A0}/B,utf +------------------------------------------------------------------ + Bra + X + \s++ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x20\x{A0}\x{A0} + 0: X \x{a0} + +/X\s+\x{A0}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + X + \s+ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x20\x{A0}\x{A0} + 0: X \x{a0}\x{a0} + +/\S+\x{A0}/B,utf +------------------------------------------------------------------ + Bra + \S+ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x{A0}\x{A0} + 0: X\x{a0}\x{a0} + +/\S+\x{A0}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \S++ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x{A0}\x{A0} + 0: X\x{a0} + +/\x{a0}+\s!/B,utf +------------------------------------------------------------------ + Bra + \x{a0}++ + \s + ! + Ket + End +------------------------------------------------------------------ + \x{a0}\x20! + 0: \x{a0} ! + +/\x{a0}+\s!/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \x{a0}+ + \s + ! + Ket + End +------------------------------------------------------------------ + \x{a0}\x20! 
+ 0: \x{a0} ! + +/A/utf + \x{ff000041} +** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8 + \x{7f000041} +Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 + +/(*UTF8)abc/never_utf +Failed: error 174 at offset 7: using UTF is disabled by the application + +/abc/utf,never_utf +Failed: error 174 at offset 0: using UTF is disabled by the application + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf +------------------------------------------------------------------ + Bra + /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Subject length lower bound = 5 + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf +------------------------------------------------------------------ + Bra + A\x{391}\x{10427}\x{ff3a}\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = \xb0 +Subject length lower bound = 5 + +/AB\x{1fb0}/IB,utf +------------------------------------------------------------------ + Bra + AB\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = \xb0 +Subject length lower bound = 3 + +/AB\x{1fb0}/IBi,utf +------------------------------------------------------------------ + Bra + /i AB\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Last code unit = 'B' (caseless) +Subject length lower bound = 3 + +/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: \xd0 \xd1 +Subject 
length lower bound = 17 + \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + +/[â±¥]/Bi,utf +------------------------------------------------------------------ + Bra + /i \x{2c65} + Ket + End +------------------------------------------------------------------ + +/[^â±¥]/Bi,utf +------------------------------------------------------------------ + Bra + /i [^\x{2c65}] + Ket + End +------------------------------------------------------------------ + +/\h/I +Capture group count = 0 +Starting code units: \x09 \x20 \xa0 +Subject length lower bound = 1 + +/\v/I +Capture group count = 0 +Starting code units: \x0a \x0b \x0c \x0d \x85 +Subject length lower bound = 1 + +/\R/I +Capture group count = 0 +Starting code units: \x0a \x0b \x0c \x0d \x85 +Subject length lower bound = 1 + +/[[:blank:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x09 \xa0] + Ket + End +------------------------------------------------------------------ + +/\x{212a}+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: K k \xe2 +Subject length lower bound = 1 + KKkk\x{212a} + 0: KKkk\x{212a} + +/s+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: S s \xc5 +Subject length lower bound = 1 + SSss\x{17f} + 0: SSss\x{17f} + +/\x{100}*A/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + A + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: A \xc4 +Last 
code unit = 'A' +Subject length lower bound = 1 + A + 0: A + +/\x{100}*\d(?R)/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \d + Recurse + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 +Subject length lower bound = 1 + +/[Z\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + [Z\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: Z \xc4 +Subject length lower bound = 1 + Z\x{100} + 0: Z + \x{100} + 0: \x{100} + \x{100}Z + 0: \x{100} + +/[z-\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + [z-\xff\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 +Subject length lower bound = 1 + +/[z\Qa-d]Ä€\E]/IB,utf +------------------------------------------------------------------ + Bra + [\-\]adz\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: - ] a d z \xc4 +Subject length lower bound = 1 + \x{100} + 0: \x{100} + Ä€ + 0: \x{100} + +/[ab\x{100}]abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + [ab\x{100}] + abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a b \xc4 +Last code unit = 'z' +Subject length lower bound = 7 + +/\x{100}*\s/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \s + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf 
+Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4 +Subject length lower bound = 1 + +/\x{100}*\d/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 +Subject length lower bound = 1 + +/\x{100}*\w/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \w + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z + \xc4 +Subject length lower bound = 1 + +/\x{100}*\D/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \D + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c + d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Subject length lower bound = 1 + +/\x{100}*\S/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \S + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 + \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 + \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 + \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 + \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/\x{100}*\W/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \W + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? 
@ [ \ ] ^ ` { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 + \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 + \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 + \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 + \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[\x{105}-\x{109}]/IBi,utf +------------------------------------------------------------------ + Bra + [\x{104}-\x{109}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: \xc4 +Subject length lower bound = 1 + \x{104} + 0: \x{104} + \x{105} + 0: \x{105} + \x{109} + 0: \x{109} +\= Expect no match + \x{100} +No match + \x{10a} +No match + +/[z-\x{100}]/IBi,utf +------------------------------------------------------------------ + Bra + [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xce \xe1 \xe2 +Subject length lower bound = 1 + Z + 0: Z + z + 0: z + \x{39c} + 0: \x{39c} + \x{178} + 0: \x{178} + | + 0: | + \x{80} + 0: \x{80} + \x{ff} + 0: \x{ff} + \x{100} + 0: \x{100} + \x{101} + 0: \x{101} +\= Expect no match + \x{102} +No match + Y +No match + y +No match + +/[z-\x{100}]/IBi,utf +------------------------------------------------------------------ + Bra + [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xce \xe1 \xe2 +Subject length lower bound = 1 + +/\x{3a3}B/IBi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 + /i B + Ket 
+ End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: \xce \xcf +Last code unit = 'B' (caseless) +Subject length lower bound = 2 + +/abc/utf,replace=à + abc +Failed: error -3: UTF-8 error: 1 byte missing at end + +/(?<=(a)(?-1))x/I,utf +Capture group count = 1 +Max lookbehind = 2 +Options: utf +First code unit = 'x' +Subject length lower bound = 1 + a\x80zx\=offset=3 +Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 1 + +/[\W\p{Any}]/B +------------------------------------------------------------------ + Bra + [\x00-/:-@[-^`{-\xff\p{Any}] + Ket + End +------------------------------------------------------------------ + abc + 0: a + 123 + 0: 1 + +/[\W\pL]/B +------------------------------------------------------------------ + Bra + [\x00-/:-@[-^`{-\xff\p{L}] + Ket + End +------------------------------------------------------------------ + abc + 0: a +\= Expect no match + 123 +No match + +/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':Æ¿)/utf +Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) + +/[\s[:^ascii:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x80-\xff\p{Xsp}] + Ket + End +------------------------------------------------------------------ + +# A special extra option allows excaped surrogate code points in 8-bit mode, +# but subjects containing them must not be UTF-checked. 
+ +/\x{d800}/I,utf,allow_surrogate_escapes +Capture group count = 0 +Options: utf +Extra options: allow_surrogate_escapes +First code unit = \xed +Last code unit = \x80 +Subject length lower bound = 1 + \x{d800}\=no_utf_check + 0: \x{d800} + +/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes + \x{dfff}\x{df01}\=no_utf_check + 0: \x{dfff}\x{df01} + +# This has different starting code units in 8-bit mode. + +/^[^ab]/IB,utf +------------------------------------------------------------------ + Bra + ^ + [\x00-`c-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: utf +Overall options: anchored utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 + \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf + \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee + \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd + \xfe \xff +Subject length lower bound = 1 + c + 0: c + \x{ff} + 0: \x{ff} + \x{100} + 0: \x{100} +\= Expect no match + aaa +No match + +# Offsets are different in 8-bit mode. 
+ +/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout + 123abcáyzabcdef789abcሴqr + 1(2) Old 6 6 "" New 6 8 "<>" + 2(2) Old 13 13 "" New 15 17 "<>" + 3(2) Old 13 16 "def" New 17 22 "" + 4(2) Old 22 22 "" New 28 30 "<>" + 4: 123abc<>\x{e1}yzabc<>789abc<>\x{1234}qr + +# Check name length with non-ASCII characters + +/(?'ABáC678901234567890123456789012'...)/utf + +/(?'ABáC6789012345678901234567890123'...)/utf +Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units) + +/(?'ABZC6789012345678901234567890123'...)/utf + +/(?(n/utf +Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) + +/(?(á/utf +Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?) + +# Invalid UTF-8 tests + +/.../g,match_invalid_utf + abcd\x80wxzy\x80pqrs + 0: abc + 0: wxz + 0: pqr + abcd\x{80}wxzy\x80pqrs + 0: abc + 0: d\x{80}w + 0: xzy + 0: pqr + +/abc/match_invalid_utf + ab\x80ab\=ph +Partial match: ab +\= Expect no match + ab\x80cdef\=ph +No match + +/ab$/match_invalid_utf + ab\x80cdeab + 0: ab +\= Expect no match + ab\x80cde +No match + +/.../g,match_invalid_utf + abcd\x{80}wxzy\x80pqrs + 0: abc + 0: d\x{80}w + 0: xzy + 0: pqr + +/(?<=x)../g,match_invalid_utf + abcd\x{80}wxzy\x80pqrs + 0: zy + abcd\x{80}wxzy\x80xpqrs + 0: zy + 0: pq + +/X$/match_invalid_utf +\= Expect no match + X\xc4 +No match + +/(?<=..)X/match_invalid_utf,aftertext + AB\x80AQXYZ + 0: X + 0+ YZ + AB\x80AQXYZ\=offset=5 + 0: X + 0+ YZ + AB\x80\x80AXYZXC\=offset=5 + 0: X + 0+ C +\= Expect no match + AB\x80XYZ +No match + AB\x80XYZ\=offset=3 +No match + AB\xfeXYZ +No match + AB\xffXYZ\=offset=3 +No match + AB\x80AXYZ +No match + AB\x80AXYZ\=offset=4 +No match + AB\x80\x80AXYZ\=offset=5 +No match + +/.../match_invalid_utf + AB\xc4CCC + 0: CCC +\= Expect no match + A\x{d800}B +No match + A\x{110000}B +No match + A\xc4B +No match + +/\bX/match_invalid_utf + A\x80X + 0: X + +/\BX/match_invalid_utf +\= Expect no match + A\x80X +No match + 
+/(?<=...)X/match_invalid_utf + AAA\x80BBBXYZ + 0: X +\= Expect no match + AAA\x80BXYZ +No match + AAA\x80BBXYZ +No match + +# ------------------------------------- + +/(*UTF)(?=\x{123})/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: utf +First code unit = \xc4 +Last code unit = \xa3 +Subject length lower bound = 1 + +/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xc3 +Last code unit = 'X' +Subject length lower bound = 3 + +/[󿾟,]/BI,utf +------------------------------------------------------------------ + Bra + [,\x{fff9f}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: , \xf3 +Subject length lower bound = 1 + +/[\x{fff4}-\x{ffff8}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xef \xf0 \xf1 \xf2 \xf3 +Subject length lower bound = 1 + +/[\x{fff4}-\x{afff8}\x{10ffff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xef \xf0 \xf1 \xf2 \xf4 +Subject length lower bound = 1 + +/[\xff\x{ffff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xc3 \xef +Subject length lower bound = 1 + +/[\xff\x{ff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xc3 +Subject length lower bound = 1 + abc\x{ff}def + 0: \x{ff} + +/[\xff\x{ff}]/I +Capture group count = 0 +First code unit = \xff +Subject length lower bound = 1 + abc\x{ff}def + 0: \xff + +/[Ss]/I +Capture group count = 0 +First code unit = 'S' (caseless) +Subject length lower bound = 1 + +/[Ss]/I,utf +Capture group count = 0 +Options: utf +Starting code units: S s +Subject length lower bound = 1 + +/(?:\x{ff}|\x{3000})/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xc3 \xe3 +Subject length lower bound = 1 + +/x/utf + abxyz + 0: x + \x80\=startchar +Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0 + 
abc\x80\=startchar +Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 3 + abc\x80\=startchar,offset=3 +Error -36 (bad UTF-8 offset) + +/\x{c1}+\x{e1}/iIB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Last code unit = \xe1 (caseless) +Subject length lower bound = 2 + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + \x{e1}\x{e1}\x{e1} + 0: \xe1\xe1\xe1 + +/a|\x{c1}/iI,ucp +Capture group count = 0 +Options: caseless ucp +Starting code units: A a \xc1 \xe1 +Subject length lower bound = 1 + \x{e1}xxx + 0: \xe1 + +/a|\x{c1}/iI,utf +Capture group count = 0 +Options: caseless utf +Starting code units: A a \xc3 +Subject length lower bound = 1 + \x{e1}xxx + 0: \x{e1} + +/\x{c1}|\x{e1}/iI,ucp +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Subject length lower bound = 1 + +/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended + X\x{e1}Y + 1: >\xc1< + +/X(\x{e1})Y/i,ucp,replace=>\L$1<,substitute_extended + X\x{c1}Y + 1: >\xe1< + +# Without UTF or UCP characters > 127 have only one case in the default locale. + +/X(\x{e1})Y/replace=>\U$1<,substitute_extended + X\x{e1}Y + 1: >\xe1< + +/A/utf,match_invalid_utf,caseless + \xe5A + 0: A + +/\bch\b/utf,match_invalid_utf + qchq\=ph +Partial match: + qchq\=ps +Partial match: + +# End of testinput10 diff --git a/src/pcre2/testdata/testoutput11-16 b/src/pcre2/testdata/testoutput11-16 new file mode 100644 index 00000000..87687857 --- /dev/null +++ b/src/pcre2/testdata/testoutput11-16 @@ -0,0 +1,668 @@ +# This set of tests is for the 16-bit and 32-bit libraries' basic (non-UTF) +# features that are not compatible with the 8-bit library, or which give +# different output in 16-bit or 32-bit mode. The output for the two widths is +# different, so they have separate output files. 
+ +#forbid_utf +#newline_default LF ANY ANYCRLF + +/[^\x{c4}]/IB +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/\x{100}/I +Capture group count = 0 +First code unit = \x{100} +Subject length lower bound = 1 + +/ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # optional leading comment +(?: (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # initial word +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) )* # further okay, if led by a period +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +# address +| # or +(?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # one word, optionally followed by.... +(?: +[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or... +\( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) | # comments, or... + +" (?: # opening quote... 
+[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +# quoted strings +)* +< (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # leading < +(?: @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* + +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* , (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +)* # further okay, if led by comma +: # closing colon +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* )? # optional route +(?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # initial word +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... 
+[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) )* # further okay, if led by a period +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +# address spec +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* > # trailing > +# name and address +) (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # optional trailing comment +/Ix +Capture group count = 0 +Contains explicit CR or LF match +Options: extended +Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 + 9 = ? 
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e + f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff +Subject length lower bound = 3 + +/[\h]/B +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + >\x09< + 0: \x09 + +/[\h]+/B +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]++ + Ket + End +------------------------------------------------------------------ + >\x09\x20\xa0< + 0: \x09 \xa0 + +/[\v]/B +------------------------------------------------------------------ + Bra + [\x0a-\x0d\x85\x{2028}-\x{2029}] + Ket + End +------------------------------------------------------------------ + +/[^\h]/B +------------------------------------------------------------------ + Bra + [^\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + +/\h+/I +Capture group count = 0 +Starting code units: \x09 \x20 \xa0 \xff +Subject length lower bound = 1 + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + 0: \x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\xa0\x{2000} + 0: \x{200a}\xa0\x{2000} + +/[\h\x{dc00}]+/IB +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}\x{dc00}]++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x09 \x20 \xa0 \xff +Subject length lower bound = 1 + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + 0: \x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\xa0\x{2000} + 0: \x{200a}\xa0\x{2000} + +/\H+/I +Capture group count = 0 +Subject length lower bound = 1 + 
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 0: \x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + 0: \x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + 0: \x{202e}\x{2030}\x{205e}\x{2060} + \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} + 0: \x9f\xa1\x{2fff}\x{3001} + +/[\H\x{d800}]+/ + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 0: \x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + 0: \x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + 0: \x{202e}\x{2030}\x{205e}\x{2060} + \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} + 0: \x9f\xa1\x{2fff}\x{3001} + +/\v+/I +Capture group count = 0 +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + 0: \x85\x0a\x0b\x0c\x0d + +/[\v\x{dc00}]+/IB +------------------------------------------------------------------ + Bra + [\x0a-\x0d\x85\x{2028}-\x{2029}\x{dc00}]++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + 0: \x85\x0a\x0b\x0c\x0d + +/\V+/I +Capture group count = 0 +Subject length lower bound = 1 + \x{2028}\x{2029}\x{2027}\x{2030} + 0: \x{2027}\x{2030} + \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 + 0: \x09\x0e\x84\x86 + +/[\V\x{d800}]+/ + \x{2028}\x{2029}\x{2027}\x{2030} + 0: \x{2027}\x{2030} + \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 + 0: \x09\x0e\x84\x86 + +/\R+/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + 0: \x85\x0a\x0b\x0c\x0d + +/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I +Capture group count 
= 0 +First code unit = \x{d800} +Last code unit = \x{dd00} +Subject length lower bound = 6 + \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} + 0: \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} + +/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/B +------------------------------------------------------------------ + Bra + [^\x{80}] + [^\x{ff}] + [^\x{100}] + [^\x{1000}] + [^\x{ffff}] + Ket + End +------------------------------------------------------------------ + +/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/Bi +------------------------------------------------------------------ + Bra + /i [^\x{80}] + /i [^\x{ff}] + /i [^\x{100}] + /i [^\x{1000}] + /i [^\x{ffff}] + Ket + End +------------------------------------------------------------------ + +/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/B +------------------------------------------------------------------ + Bra + [^\x{100}]* + [^\x{1000}]+ + [^\x{ffff}]?? + [^\x{8000}]{4} + [^\x{8000}]* + [^\x{7fff}]{2} + [^\x{7fff}]{0,7}? + [^\x{100}]{5} + [^\x{100}]?+ + Ket + End +------------------------------------------------------------------ + +/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/Bi +------------------------------------------------------------------ + Bra + /i [^\x{100}]* + /i [^\x{1000}]+ + /i [^\x{ffff}]?? + /i [^\x{8000}]{4} + /i [^\x{8000}]* + /i [^\x{7fff}]{2} + /i [^\x{7fff}]{0,7}? 
+ /i [^\x{100}]{5} + /i [^\x{100}]?+ + Ket + End +------------------------------------------------------------------ + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark + XX + 0: XX +MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark + XX + 0: XX +MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE + +/\u0100/B,alt_bsux,allow_empty_class,match_unset_backref +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ + +/[\u0100-\u0200]/B,alt_bsux,allow_empty_class,match_unset_backref +------------------------------------------------------------------ + Bra + [\x{100}-\x{200}] + Ket + End +------------------------------------------------------------------ + +/\ud800/B,alt_bsux,allow_empty_class,match_unset_backref +------------------------------------------------------------------ + Bra + \x{d800} + Ket + End +------------------------------------------------------------------ + +/^\x{ffff}+/i + \x{ffff} + 0: \x{ffff} + +/^\x{ffff}?/i + \x{ffff} + 0: \x{ffff} + 
+/^\x{ffff}*/i + \x{ffff} + 0: \x{ffff} + +/^\x{ffff}{3}/i + \x{ffff}\x{ffff}\x{ffff} + 0: \x{ffff}\x{ffff}\x{ffff} + +/^\x{ffff}{0,3}/i + \x{ffff} + 0: \x{ffff} + +/[^\x00-a]{12,}[^b-\xff]*/B +------------------------------------------------------------------ + Bra + [b-\xff] (neg){12,} + [\x00-a] (neg)*+ + Ket + End +------------------------------------------------------------------ + +/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B +------------------------------------------------------------------ + Bra + [\x00-\x08\x0e-\x1f!-\xff] (neg)* + \s* + + [0-9A-Z_a-z]++ + \W+ + + [\x00-/:-\xff] (neg)*? + \d + 0 + [\x00-/:-@[-^`{-\xff] (neg){4,6}? + \w* + A + Ket + End +------------------------------------------------------------------ + +/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/B +------------------------------------------------------------------ + Bra + a* + [b-\xff\x{100}-\x{200}]?+ + a# + a*+ + [b-\xff\x{100}-\x{200}]? + b# + [a-f]*+ + [g-\xff\x{100}-\x{200}]*+ + # + [g-\xff\x{100}-\x{200}]*+ + [a-c]*+ + # + [g-\xff\x{100}-\x{200}]* + [a-h]*+ + Ket + End +------------------------------------------------------------------ + +/^[\x{1234}\x{4321}]{2,4}?/ + \x{1234}\x{1234}\x{1234} + 0: \x{1234}\x{1234} + +# Check maximum non-UTF character size for the 16-bit library. + +/\x{ffff}/ + A\x{ffff}B + 0: \x{ffff} + +/\x{10000}/ +Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large + +/\o{20000}/ + +# Check maximum character size for the 32-bit library. These will all give +# errors in the 16-bit library. 
+ +/\x{110000}/ +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/\x{7fffffff}/ +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +/\x{80000000}/ +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +/\x{ffffffff}/ +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +/\x{100000000}/ +Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large + +/\o{17777777777}/ +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\o{20000000000}/ +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\o{37777777777}/ +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\o{40000000000}/ +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\x{7fffffff}\x{7fffffff}/I +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +/\x{80000000}\x{80000000}/I +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +/\x{ffffffff}\x{ffffffff}/I +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +# Non-UTF characters + +/.{2,3}/ + \x{400000}\x{400001}\x{400002}\x{400003} +** Character \x{400000} is greater than 0xffff and UTF-16 mode is not enabled. +** Truncation will probably give the wrong result. +** Character \x{400001} is greater than 0xffff and UTF-16 mode is not enabled. +** Truncation will probably give the wrong result. +** Character \x{400002} is greater than 0xffff and UTF-16 mode is not enabled. +** Truncation will probably give the wrong result. +** Character \x{400003} is greater than 0xffff and UTF-16 mode is not enabled. +** Truncation will probably give the wrong result. 
+ 0: \x00\x01\x02 + +/\x{400000}\x{800000}/IBi +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +# Check character ranges + +/[\H]/IB +------------------------------------------------------------------ + Bra + [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b + \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a + \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 + : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ + _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 + \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f + \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e + \x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae + \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd + \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc + \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb + \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea + \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 + \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[\V]/IB +------------------------------------------------------------------ + Bra + [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e + \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d + \x1e \x1f \x20 
! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > + ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c + d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 + \x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 + \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 + \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 + \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf + \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce + \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd + \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec + \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb + \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/(*THEN:\[A]{65501})/expand + +# We can use pcre2test's utf8_input modifier to create wide pattern characters, +# even though this test is run when UTF is not supported. + +/abý¿¿¿¿¿z/utf8_input +** Failed: character value greater than 0xffff cannot be converted to 16-bit in non-UTF mode + abý¿¿¿¿¿z + ab\x{7fffffff}z + +/abÿý¿¿¿¿¿z/utf8_input +** Failed: invalid UTF-8 string cannot be converted to 16-bit string + abÿý¿¿¿¿¿z + ab\x{ffffffff}z + +/abÿAz/utf8_input +** Failed: invalid UTF-8 string cannot be converted to 16-bit string + abÿAz + ab\x{80000041}z + +/(?i:A{1,}\6666666666)/ + A\x{1b6}6666666 + 0: A\x{1b6}6666666 + +# End of testinput11 diff --git a/src/pcre2/testdata/testoutput11-32 b/src/pcre2/testdata/testoutput11-32 new file mode 100644 index 00000000..2c95f615 --- /dev/null +++ b/src/pcre2/testdata/testoutput11-32 @@ -0,0 +1,674 @@ +# This set of tests is for the 16-bit and 32-bit libraries' basic (non-UTF) +# features that are not compatible with the 8-bit library, or which give +# different output in 16-bit or 32-bit mode. 
The output for the two widths is +# different, so they have separate output files. + +#forbid_utf +#newline_default LF ANY ANYCRLF + +/[^\x{c4}]/IB +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/\x{100}/I +Capture group count = 0 +First code unit = \x{100} +Subject length lower bound = 1 + +/ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # optional leading comment +(?: (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # initial word +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... 
+[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) )* # further okay, if led by a period +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +# address +| # or +(?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # one word, optionally followed by.... +(?: +[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037] | # atom and space parts, or... +\( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) | # comments, or... 
+ +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +# quoted strings +)* +< (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # leading < +(?: @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* + +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* , (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... 
+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +)* # further okay, if led by comma +: # closing colon +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* )? # optional route +(?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... +[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) # initial word +(?: (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| +" (?: # opening quote... 
+[^\\\x80-\xff\n\015"] # Anything except backslash and quote +| # or +\\ [^\x80-\xff] # Escaped something (something != CR) +)* " # closing quote +) )* # further okay, if led by a period +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* @ (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # initial subdomain +(?: # +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* \. # if led by a period... +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* (?: +[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+ # some number of atom characters... +(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]) # ..not followed by something that could be part of an atom +| \[ # [ +(?: [^\\\x80-\xff\n\015\[\]] | \\ [^\x80-\xff] )* # stuff +\] # ] +) # ...further okay +)* +# address spec +(?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* > # trailing > +# name and address +) (?: [\040\t] | \( +(?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* +\) )* # optional trailing comment +/Ix +Capture group count = 0 +Contains explicit CR or LF match +Options: extended +Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 + 9 = ? 
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e + f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xff +Subject length lower bound = 3 + +/[\h]/B +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + >\x09< + 0: \x09 + +/[\h]+/B +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]++ + Ket + End +------------------------------------------------------------------ + >\x09\x20\xa0< + 0: \x09 \xa0 + +/[\v]/B +------------------------------------------------------------------ + Bra + [\x0a-\x0d\x85\x{2028}-\x{2029}] + Ket + End +------------------------------------------------------------------ + +/[^\h]/B +------------------------------------------------------------------ + Bra + [^\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + +/\h+/I +Capture group count = 0 +Starting code units: \x09 \x20 \xa0 \xff +Subject length lower bound = 1 + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + 0: \x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\xa0\x{2000} + 0: \x{200a}\xa0\x{2000} + +/[\h\x{dc00}]+/IB +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}\x{dc00}]++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x09 \x20 \xa0 \xff +Subject length lower bound = 1 + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + 0: \x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\xa0\x{2000} + 0: \x{200a}\xa0\x{2000} + +/\H+/I +Capture group count = 0 +Subject length lower bound = 1 + 
\x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 0: \x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + 0: \x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + 0: \x{202e}\x{2030}\x{205e}\x{2060} + \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} + 0: \x9f\xa1\x{2fff}\x{3001} + +/[\H\x{d800}]+/ + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 0: \x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + 0: \x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + 0: \x{202e}\x{2030}\x{205e}\x{2060} + \xa0\x{3000}\x9f\xa1\x{2fff}\x{3001} + 0: \x9f\xa1\x{2fff}\x{3001} + +/\v+/I +Capture group count = 0 +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + 0: \x85\x0a\x0b\x0c\x0d + +/[\v\x{dc00}]+/IB +------------------------------------------------------------------ + Bra + [\x0a-\x0d\x85\x{2028}-\x{2029}\x{dc00}]++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + 0: \x85\x0a\x0b\x0c\x0d + +/\V+/I +Capture group count = 0 +Subject length lower bound = 1 + \x{2028}\x{2029}\x{2027}\x{2030} + 0: \x{2027}\x{2030} + \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 + 0: \x09\x0e\x84\x86 + +/[\V\x{d800}]+/ + \x{2028}\x{2029}\x{2027}\x{2030} + 0: \x{2027}\x{2030} + \x85\x0a\x0b\x0c\x0d\x09\x0e\x84\x86 + 0: \x09\x0e\x84\x86 + +/\R+/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x84\x86\x85\x0a\x0b\x0c\x0d + 0: \x85\x0a\x0b\x0c\x0d + +/\x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00}/I +Capture group count 
= 0 +First code unit = \x{d800} +Last code unit = \x{dd00} +Subject length lower bound = 6 + \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} + 0: \x{d800}\x{d7ff}\x{dc00}\x{dc00}\x{dcff}\x{dd00} + +/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/B +------------------------------------------------------------------ + Bra + [^\x{80}] + [^\x{ff}] + [^\x{100}] + [^\x{1000}] + [^\x{ffff}] + Ket + End +------------------------------------------------------------------ + +/[^\x{80}][^\x{ff}][^\x{100}][^\x{1000}][^\x{ffff}]/Bi +------------------------------------------------------------------ + Bra + /i [^\x{80}] + /i [^\x{ff}] + /i [^\x{100}] + /i [^\x{1000}] + /i [^\x{ffff}] + Ket + End +------------------------------------------------------------------ + +/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/B +------------------------------------------------------------------ + Bra + [^\x{100}]* + [^\x{1000}]+ + [^\x{ffff}]?? + [^\x{8000}]{4} + [^\x{8000}]* + [^\x{7fff}]{2} + [^\x{7fff}]{0,7}? + [^\x{100}]{5} + [^\x{100}]?+ + Ket + End +------------------------------------------------------------------ + +/[^\x{100}]*[^\x{1000}]+[^\x{ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{100}]{5,6}+/Bi +------------------------------------------------------------------ + Bra + /i [^\x{100}]* + /i [^\x{1000}]+ + /i [^\x{ffff}]?? + /i [^\x{8000}]{4} + /i [^\x{8000}]* + /i [^\x{7fff}]{2} + /i [^\x{7fff}]{0,7}? 
+ /i [^\x{100}]{5} + /i [^\x{100}]?+ + Ket + End +------------------------------------------------------------------ + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark + XX + 0: XX +MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark + XX + 0: XX +MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE + +/\u0100/B,alt_bsux,allow_empty_class,match_unset_backref +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ + +/[\u0100-\u0200]/B,alt_bsux,allow_empty_class,match_unset_backref +------------------------------------------------------------------ + Bra + [\x{100}-\x{200}] + Ket + End +------------------------------------------------------------------ + +/\ud800/B,alt_bsux,allow_empty_class,match_unset_backref +------------------------------------------------------------------ + Bra + \x{d800} + Ket + End +------------------------------------------------------------------ + +/^\x{ffff}+/i + \x{ffff} + 0: \x{ffff} + +/^\x{ffff}?/i + \x{ffff} + 0: \x{ffff} + 
+/^\x{ffff}*/i + \x{ffff} + 0: \x{ffff} + +/^\x{ffff}{3}/i + \x{ffff}\x{ffff}\x{ffff} + 0: \x{ffff}\x{ffff}\x{ffff} + +/^\x{ffff}{0,3}/i + \x{ffff} + 0: \x{ffff} + +/[^\x00-a]{12,}[^b-\xff]*/B +------------------------------------------------------------------ + Bra + [b-\xff] (neg){12,} + [\x00-a] (neg)*+ + Ket + End +------------------------------------------------------------------ + +/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B +------------------------------------------------------------------ + Bra + [\x00-\x08\x0e-\x1f!-\xff] (neg)* + \s* + + [0-9A-Z_a-z]++ + \W+ + + [\x00-/:-\xff] (neg)*? + \d + 0 + [\x00-/:-@[-^`{-\xff] (neg){4,6}? + \w* + A + Ket + End +------------------------------------------------------------------ + +/a*[b-\x{200}]?a#a*[b-\x{200}]?b#[a-f]*[g-\x{200}]*#[g-\x{200}]*[a-c]*#[g-\x{200}]*[a-h]*/B +------------------------------------------------------------------ + Bra + a* + [b-\xff\x{100}-\x{200}]?+ + a# + a*+ + [b-\xff\x{100}-\x{200}]? + b# + [a-f]*+ + [g-\xff\x{100}-\x{200}]*+ + # + [g-\xff\x{100}-\x{200}]*+ + [a-c]*+ + # + [g-\xff\x{100}-\x{200}]* + [a-h]*+ + Ket + End +------------------------------------------------------------------ + +/^[\x{1234}\x{4321}]{2,4}?/ + \x{1234}\x{1234}\x{1234} + 0: \x{1234}\x{1234} + +# Check maximum non-UTF character size for the 16-bit library. + +/\x{ffff}/ + A\x{ffff}B + 0: \x{ffff} + +/\x{10000}/ + +/\o{20000}/ + +# Check maximum character size for the 32-bit library. These will all give +# errors in the 16-bit library. 
+ +/\x{110000}/ + +/\x{7fffffff}/ + +/\x{80000000}/ + +/\x{ffffffff}/ + +/\x{100000000}/ +Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large + +/\o{17777777777}/ + +/\o{20000000000}/ + +/\o{37777777777}/ + +/\o{40000000000}/ +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\x{7fffffff}\x{7fffffff}/I +Capture group count = 0 +First code unit = \x{7fffffff} +Last code unit = \x{7fffffff} +Subject length lower bound = 2 + +/\x{80000000}\x{80000000}/I +Capture group count = 0 +First code unit = \x{80000000} +Last code unit = \x{80000000} +Subject length lower bound = 2 + +/\x{ffffffff}\x{ffffffff}/I +Capture group count = 0 +First code unit = \x{ffffffff} +Last code unit = \x{ffffffff} +Subject length lower bound = 2 + +# Non-UTF characters + +/.{2,3}/ + \x{400000}\x{400001}\x{400002}\x{400003} + 0: \x{400000}\x{400001}\x{400002} + +/\x{400000}\x{800000}/IBi +------------------------------------------------------------------ + Bra + /i \x{400000}\x{800000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless +First code unit = \x{400000} +Last code unit = \x{800000} +Subject length lower bound = 2 + +# Check character ranges + +/[\H]/IB +------------------------------------------------------------------ + Bra + [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{ffffffff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b + \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a + \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 + : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ + _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 + \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f + \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e + \x9f \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae + \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd + \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc + \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb + \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea + \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 + \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[\V]/IB +------------------------------------------------------------------ + Bra + [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{ffffffff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e + \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d + \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > + ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c + d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 + \x83 \x84 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 + \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 + \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 + \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf + \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce + \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd + \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec + \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb + \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/(*THEN:\[A]{65501})/expand + +# We can use pcre2test's utf8_input modifier to create wide pattern characters, +# even though this test is run when UTF is not supported. + +/abý¿¿¿¿¿z/utf8_input + abý¿¿¿¿¿z + 0: ab\x{7fffffff}z + ab\x{7fffffff}z + 0: ab\x{7fffffff}z + +/abÿý¿¿¿¿¿z/utf8_input + abÿý¿¿¿¿¿z + 0: ab\x{ffffffff}z + ab\x{ffffffff}z + 0: ab\x{ffffffff}z + +/abÿAz/utf8_input + abÿAz + 0: ab\x{80000041}z + ab\x{80000041}z + 0: ab\x{80000041}z + +/(?i:A{1,}\6666666666)/ + A\x{1b6}6666666 + 0: A\x{1b6}6666666 + +# End of testinput11 diff --git a/src/pcre2/testdata/testoutput12-16 b/src/pcre2/testdata/testoutput12-16 new file mode 100644 index 00000000..84c48581 --- /dev/null +++ b/src/pcre2/testdata/testoutput12-16 @@ -0,0 +1,1784 @@ +# This set of tests is for UTF-16 and UTF-32 support, including Unicode +# properties. It is relevant only to the 16-bit and 32-bit libraries. The +# output is different for each library, so there are separate output files. 
+ +/ÃÃÃxxx/IB,utf,no_utf_check +** Failed: invalid UTF-8 string cannot be converted to 16-bit string + +/abc/utf + Ã] +** Failed: invalid UTF-8 string cannot be used as input in UTF mode + +# Check maximum character size + +/\x{ffff}/IB,utf +------------------------------------------------------------------ + Bra + \x{ffff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{ffff} +Subject length lower bound = 1 + +/\x{10000}/IB,utf +------------------------------------------------------------------ + Bra + \x{10000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d800} +Last code unit = \x{dc00} +Subject length lower bound = 1 + +/\x{100}/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + +/\x{1000}/IB,utf +------------------------------------------------------------------ + Bra + \x{1000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{1000} +Subject length lower bound = 1 + +/\x{10000}/IB,utf +------------------------------------------------------------------ + Bra + \x{10000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d800} +Last code unit = \x{dc00} +Subject length lower bound = 1 + +/\x{100000}/IB,utf +------------------------------------------------------------------ + Bra + \x{100000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{dbc0} +Last code unit = \x{dc00} +Subject length 
lower bound = 1 + +/\x{10ffff}/IB,utf +------------------------------------------------------------------ + Bra + \x{10ffff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{dbff} +Last code unit = \x{dfff} +Subject length lower bound = 1 + +/[\x{ff}]/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xff +Subject length lower bound = 1 + +/[\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + +/\x80/IB,utf +------------------------------------------------------------------ + Bra + \x{80} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x80 +Subject length lower bound = 1 + +/\xff/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xff +Subject length lower bound = 1 + +/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf +------------------------------------------------------------------ + Bra + \x{d55c}\x{ad6d}\x{c5b4} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + \x{D55c}\x{ad6d}\x{C5B4} + 0: \x{d55c}\x{ad6d}\x{c5b4} + +/\x{65e5}\x{672c}\x{8a9e}/IB,utf +------------------------------------------------------------------ + Bra + \x{65e5}\x{672c}\x{8a9e} + Ket + End 
+------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + \x{65e5}\x{672c}\x{8a9e} + 0: \x{65e5}\x{672c}\x{8a9e} + +/\x{80}/IB,utf +------------------------------------------------------------------ + Bra + \x{80} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x80 +Subject length lower bound = 1 + +/\x{084}/IB,utf +------------------------------------------------------------------ + Bra + \x{84} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x84 +Subject length lower bound = 1 + +/\x{104}/IB,utf +------------------------------------------------------------------ + Bra + \x{104} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{104} +Subject length lower bound = 1 + +/\x{861}/IB,utf +------------------------------------------------------------------ + Bra + \x{861} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{861} +Subject length lower bound = 1 + +/\x{212ab}/IB,utf +------------------------------------------------------------------ + Bra + \x{212ab} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d844} +Last code unit = \x{deab} +Subject length lower bound = 1 + +/[^ab\xC0-\xF0]/IB,utf +------------------------------------------------------------------ + Bra + [\x00-`c-\xbf\xf1-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 
\x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e + \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d + \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac + \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb + \xbc \xbd \xbe \xbf \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb + \xfc \xfd \xfe \xff +Subject length lower bound = 1 + \x{f1} + 0: \x{f1} + \x{bf} + 0: \x{bf} + \x{100} + 0: \x{100} + \x{1000} + 0: \x{1000} +\= Expect no match + \x{c0} +No match + \x{f0} +No match + +/Ä€{3,4}/IB,utf +------------------------------------------------------------------ + Bra + \x{100}{3} + \x{100}?+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Last code unit = \x{100} +Subject length lower bound = 3 + \x{100}\x{100}\x{100}\x{100\x{100} + 0: \x{100}\x{100}\x{100} + +/(\x{100}+|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}++ + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: x \xff +Subject length lower bound = 1 + +/(\x{100}*a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}*+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a x \xff +Subject length lower bound = 1 + +/(\x{100}{0,2}a|x)/IB,utf 
+------------------------------------------------------------------ + Bra + CBra 1 + \x{100}{0,2}+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a x \xff +Subject length lower bound = 1 + +/(\x{100}{1,2}a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100} + \x{100}{0,1}+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: x \xff +Subject length lower bound = 1 + +/\x{100}/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + +/a\x{100}\x{101}*/IB,utf +------------------------------------------------------------------ + Bra + a\x{100} + \x{101}*+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = \x{100} +Subject length lower bound = 2 + +/a\x{100}\x{101}+/IB,utf +------------------------------------------------------------------ + Bra + a\x{100} + \x{101}++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = \x{101} +Subject length lower bound = 3 + +/[^\x{c4}]/IB +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ 
+Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + \x{100} + 0: \x{100} + Z\x{100} + 0: \x{100} + \x{100}Z + 0: \x{100} + +/[\xff]/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xff +Subject length lower bound = 1 + >\x{ff}< + 0: \x{ff} + +/[^\xff]/IB,utf +------------------------------------------------------------------ + Bra + [^\x{ff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + +/\x{100}abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + \x{100}abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +First code unit = \x{100} +Last code unit = 'z' +Subject length lower bound = 7 + +/\777/I,utf +Capture group count = 0 +Options: utf +First code unit = \x{1ff} +Subject length lower bound = 1 + \x{1ff} + 0: \x{1ff} + \777 + 0: \x{1ff} + +/\x{100}+\x{200}/IB,utf +------------------------------------------------------------------ + Bra + \x{100}++ + \x{200} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Last code unit = \x{200} +Subject length lower bound = 2 + +/\x{100}+X/IB,utf +------------------------------------------------------------------ + Bra + \x{100}++ + X + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Last code unit = 'X' +Subject length lower bound = 2 + +/^[\QÄ€\E-\QÅ\E/B,utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/X/utf + 
XX\x{d800}\=no_utf_check + 0: X + XX\x{da00}\=no_utf_check + 0: X + XX\x{dc00}\=no_utf_check + 0: X + XX\x{de00}\=no_utf_check + 0: X + XX\x{dfff}\=no_utf_check + 0: X +\= Expect UTF error + XX\x{d800} +Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2 + XX\x{da00} +Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2 + XX\x{dc00} +Failed: error -26: UTF-16 error: isolated low surrogate at offset 2 + XX\x{de00} +Failed: error -26: UTF-16 error: isolated low surrogate at offset 2 + XX\x{dfff} +Failed: error -26: UTF-16 error: isolated low surrogate at offset 2 + XX\x{110000} +** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16 + XX\x{d800}\x{1234} +Failed: error -25: UTF-16 error: invalid low surrogate at offset 2 +\= Expect no match + XX\x{d800}\=offset=3 +No match + +/(?<=.)X/utf + XX\x{d800}\=offset=3 +Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2 + +/(*UTF16)\x{11234}/ + abcd\x{11234}pqr + 0: \x{11234} + +/(*UTF)\x{11234}/I +Capture group count = 0 +Compile options: +Overall options: utf +First code unit = \x{d804} +Last code unit = \x{de34} +Subject length lower bound = 1 + abcd\x{11234}pqr + 0: \x{11234} + +/(*UTF-32)\x{11234}/ +Failed: error 160 at offset 5: (*VERB) not recognized or malformed + abcd\x{11234}pqr + +/(*UTF-32)\x{112}/ +Failed: error 160 at offset 5: (*VERB) not recognized or malformed + abcd\x{11234}pqr + +/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I +Capture group count = 0 +Compile options: +Overall options: utf +\R matches any Unicode newline +Forced newline is CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + +/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I +Failed: error 160 at offset 14: (*VERB) not recognized or malformed + +/\h/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x20 \xa0 \xff +Subject length lower bound = 1 + ABC\x{09} + 0: \x{09} + ABC\x{20} + 0: + ABC\x{a0} + 0: \x{a0} 
+ ABC\x{1680} + 0: \x{1680} + ABC\x{180e} + 0: \x{180e} + ABC\x{2000} + 0: \x{2000} + ABC\x{202f} + 0: \x{202f} + ABC\x{205f} + 0: \x{205f} + ABC\x{3000} + 0: \x{3000} + +/\v/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + ABC\x{0a} + 0: \x{0a} + ABC\x{0b} + 0: \x{0b} + ABC\x{0c} + 0: \x{0c} + ABC\x{0d} + 0: \x{0d} + ABC\x{85} + 0: \x{85} + ABC\x{2028} + 0: \x{2028} + +/\h*A/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x20 A \xa0 \xff +Last code unit = 'A' +Subject length lower bound = 1 + CDBABC + 0: A + \x{2000}ABC + 0: \x{2000}A + +/\R*A/I,bsr=unicode,utf +Capture group count = 0 +Options: utf +\R matches any Unicode newline +Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff +Last code unit = 'A' +Subject length lower bound = 1 + CDBABC + 0: A + \x{2028}A + 0: \x{2028}A + +/\v+A/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Last code unit = 'A' +Subject length lower bound = 2 + +/\s?xxx\s/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x +Last code unit = 'x' +Subject length lower bound = 4 + +/\sxxx\s/I,utf,tables=2 +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0 +Last code unit = 'x' +Subject length lower bound = 5 + AB\x{85}xxx\x{a0}XYZ + 0: \x{85}xxx\x{a0} + AB\x{a0}xxx\x{85}XYZ + 0: \x{a0}xxx\x{85} + +/\S \S/I,utf,tables=2 +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? 
@ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 + \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 + \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4 + \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 + \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Last code unit = ' ' +Subject length lower bound = 3 + \x{a2} \x{84} + 0: \x{a2} \x{84} + A Z + 0: A Z + +/a+/utf + a\x{123}aa\=offset=1 + 0: aa + a\x{123}aa\=offset=2 + 0: aa + a\x{123}aa\=offset=3 + 0: a +\= Expect no match + a\x{123}aa\=offset=4 +No match +\= Expect bad offset error + a\x{123}aa\=offset=5 +Failed: error -33: bad offset value + a\x{123}aa\=offset=6 +Failed: error -33: bad offset value + +/\x{1234}+/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Subject length lower bound = 1 + +/\x{1234}+?/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Subject length lower bound = 1 + +/\x{1234}++/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Subject length lower bound = 1 + +/\x{1234}{2}/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Last code unit = \x{1234} +Subject length lower bound = 2 + +/[^\x{c4}]/IB,utf +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + 
+/X+\x{200}/IB,utf +------------------------------------------------------------------ + Bra + X++ + \x{200} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'X' +Last code unit = \x{200} +Subject length lower bound = 2 + +/\R/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + +# Check bad offset + +/a/utf +\= Expect bad UTF-16 offset, or no match in 32-bit + \x{10000}\=offset=1 +Error -36 (bad UTF-16 offset) + \x{10000}ab\=offset=1 +Error -36 (bad UTF-16 offset) +\= Expect 16-bit match, 32-bit no match + \x{10000}ab\=offset=2 + 0: a +\= Expect no match + \x{10000}ab\=offset=3 +No match +\= Expect no match in 16-bit, bad offset in 32-bit + \x{10000}ab\=offset=4 +No match +\= Expect bad offset + \x{10000}ab\=offset=5 +Failed: error -33: bad offset value + +/í¼€/utf +Failed: error -26 at offset 0: UTF-16 error: isolated low surrogate + +/\w+\x{C4}/B,utf +------------------------------------------------------------------ + Bra + \w++ + \x{c4} + Ket + End +------------------------------------------------------------------ + a\x{C4}\x{C4} + 0: a\x{c4} + +/\w+\x{C4}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \w+ + \x{c4} + Ket + End +------------------------------------------------------------------ + a\x{C4}\x{C4} + 0: a\x{c4}\x{c4} + +/\W+\x{C4}/B,utf +------------------------------------------------------------------ + Bra + \W+ + \x{c4} + Ket + End +------------------------------------------------------------------ + !\x{C4} + 0: !\x{c4} + +/\W+\x{C4}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \W++ + \x{c4} + Ket + End +------------------------------------------------------------------ + !\x{C4} + 0: !\x{c4} + +/\W+\x{A1}/B,utf +------------------------------------------------------------------ + Bra + \W+ 
+ \x{a1} + Ket + End +------------------------------------------------------------------ + !\x{A1} + 0: !\x{a1} + +/\W+\x{A1}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \W+ + \x{a1} + Ket + End +------------------------------------------------------------------ + !\x{A1} + 0: !\x{a1} + +/X\s+\x{A0}/B,utf +------------------------------------------------------------------ + Bra + X + \s++ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x20\x{A0}\x{A0} + 0: X \x{a0} + +/X\s+\x{A0}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + X + \s+ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x20\x{A0}\x{A0} + 0: X \x{a0}\x{a0} + +/\S+\x{A0}/B,utf +------------------------------------------------------------------ + Bra + \S+ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x{A0}\x{A0} + 0: X\x{a0}\x{a0} + +/\S+\x{A0}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \S++ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x{A0}\x{A0} + 0: X\x{a0} + +/\x{a0}+\s!/B,utf +------------------------------------------------------------------ + Bra + \x{a0}++ + \s + ! + Ket + End +------------------------------------------------------------------ + \x{a0}\x20! + 0: \x{a0} ! + +/\x{a0}+\s!/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \x{a0}+ + \s + ! + Ket + End +------------------------------------------------------------------ + \x{a0}\x20! + 0: \x{a0} ! 
+ +/(*UTF)abc/never_utf +Failed: error 174 at offset 6: using UTF is disabled by the application + +/abc/utf,never_utf +Failed: error 174 at offset 0: using UTF is disabled by the application + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf +------------------------------------------------------------------ + Bra + /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Last code unit = \x{1fb0} (caseless) +Subject length lower bound = 5 + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf +------------------------------------------------------------------ + Bra + A\x{391}\x{10427}\x{ff3a}\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = \x{1fb0} +Subject length lower bound = 5 + +/AB\x{1fb0}/IB,utf +------------------------------------------------------------------ + Bra + AB\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = \x{1fb0} +Subject length lower bound = 3 + +/AB\x{1fb0}/IBi,utf +------------------------------------------------------------------ + Bra + /i AB\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Last code unit = \x{1fb0} (caseless) +Subject length lower bound = 3 + +/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{401} (caseless) +Last code unit = \x{42f} (caseless) +Subject length lower bound = 17 + 
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + +/[â±¥]/Bi,utf +------------------------------------------------------------------ + Bra + /i \x{2c65} + Ket + End +------------------------------------------------------------------ + +/[^â±¥]/Bi,utf +------------------------------------------------------------------ + Bra + /i [^\x{2c65}] + Ket + End +------------------------------------------------------------------ + +/[[:blank:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + +/\x{212a}+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: K k \xff +Subject length lower bound = 1 + KKkk\x{212a} + 0: KKkk\x{212a} + +/s+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: S s \xff +Subject length lower bound = 1 + SSss\x{17f} + 0: SSss\x{17f} + +# Non-UTF characters should give errors in both 16-bit and 32-bit modes. 
+ +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/\o{4200000}/utf +Failed: error 134 at offset 10: character code point value in \x{} or \o{} is too large + +/\x{100}*A/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + A + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: A \xff +Last code unit = 'A' +Subject length lower bound = 1 + A + 0: A + +/\x{100}*\d(?R)/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \d + Recurse + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff +Subject length lower bound = 1 + +/[Z\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + [Z\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: Z \xff +Subject length lower bound = 1 + Z\x{100} + 0: Z + \x{100} + 0: \x{100} + \x{100}Z + 0: \x{100} + +/[z-\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + [z-\xff\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 + \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 + \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 + \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 + \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 + \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 + \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 + \xe2 
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 + \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[z\Qa-d]Ä€\E]/IB,utf +------------------------------------------------------------------ + Bra + [\-\]adz\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: - ] a d z \xff +Subject length lower bound = 1 + \x{100} + 0: \x{100} + Ä€ + 0: \x{100} + +/[ab\x{100}]abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + [ab\x{100}] + abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a b \xff +Last code unit = 'z' +Subject length lower bound = 7 + +/\x{100}*\s/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \s + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff +Subject length lower bound = 1 + +/\x{100}*\d/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff +Subject length lower bound = 1 + +/\x{100}*\w/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \w + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z + \xff +Subject length lower bound = 1 + +/\x{100}*\D/IB,utf 
+------------------------------------------------------------------ + Bra + \x{100}* + \D + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c + d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 + \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 + \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 + \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf + \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe + \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd + \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc + \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb + \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa + \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/\x{100}*\S/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \S + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? 
@ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 + \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 + \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 + \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 + \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 + \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf + \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde + \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed + \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc + \xfd \xfe \xff +Subject length lower bound = 1 + +/\x{100}*\W/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \W + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? 
@ [ \ ] ^ ` { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 + \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 + \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 + \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 + \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 + \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 + \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 + \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 + \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[\x{105}-\x{109}]/IBi,utf +------------------------------------------------------------------ + Bra + [\x{104}-\x{109}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: \xff +Subject length lower bound = 1 + \x{104} + 0: \x{104} + \x{105} + 0: \x{105} + \x{109} + 0: \x{109} +\= Expect no match + \x{100} +No match + \x{10a} +No match + +/[z-\x{100}]/IBi,utf +------------------------------------------------------------------ + Bra + [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 + \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 + \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 + \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 + \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Subject length lower bound = 1 + Z + 0: Z + z + 0: z + \x{39c} + 0: \x{39c} + \x{178} + 0: \x{178} + | + 0: | + \x{80} + 0: \x{80} + \x{ff} + 0: \x{ff} + \x{100} + 0: \x{100} + \x{101} + 0: \x{101} +\= Expect no match + \x{102} +No match + Y +No match + y +No match + +/[z-\x{100}]/IBi,utf +------------------------------------------------------------------ + Bra + [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 + \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 + \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 + \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 + \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Subject length lower bound = 1 + +/\x{3a3}B/IBi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 + /i B + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: \xff +Last code unit = 'B' (caseless) +Subject length lower bound = 2 + +/./utf + \x{110000} +** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16 + +/(*UTF)abý¿¿¿¿¿z/B +------------------------------------------------------------------ + 
Bra + ab\x{fd}\x{bf}\x{bf}\x{bf}\x{bf}\x{bf}z + Ket + End +------------------------------------------------------------------ + +/abý¿¿¿¿¿z/utf +** Failed: character value greater than 0x10ffff cannot be converted to UTF + +/[\W\p{Any}]/B +------------------------------------------------------------------ + Bra + [\x00-/:-@[-^`{-\xff\p{Any}\x{100}-\x{ffff}] + Ket + End +------------------------------------------------------------------ + abc + 0: a + 123 + 0: 1 + +/[\W\pL]/B +------------------------------------------------------------------ + Bra + [\x00-/:-@[-^`{-\xff\p{L}\x{100}-\x{ffff}] + Ket + End +------------------------------------------------------------------ + abc + 0: a + \x{100} + 0: \x{100} + \x{308} + 0: \x{308} +\= Expect no match + 123 +No match + +/[\s[:^ascii:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x80-\xff\p{Xsp}\x{100}-\x{ffff}] + Ket + End +------------------------------------------------------------------ + +/\pP/ucp + \x{7fffffff} +** Character \x{7fffffff} is greater than 0xffff and UTF-16 mode is not enabled. +** Truncation will probably give the wrong result. +No match + +# A special extra option allows excaped surrogate code points in 32-bit mode, +# but subjects containing them must not be UTF-checked. These patterns give +# errors in 16-bit mode. + +/\x{d800}/I,utf,allow_surrogate_escapes +Failed: error 191 at offset 0: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode + \x{d800}\=no_utf_check + +/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes +Failed: error 191 at offset 0: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode + \x{dfff}\x{df01}\=no_utf_check + +# This has different starting code units in 8-bit mode. 
+ +/^[^ab]/IB,utf +------------------------------------------------------------------ + Bra + ^ + [\x00-`c-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: utf +Overall options: anchored utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e + \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d + \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac + \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb + \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca + \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 + \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 + \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 + \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + c + 0: c + \x{ff} + 0: \x{ff} + \x{100} + 0: \x{100} +\= Expect no match + aaa +No match + +# Offsets are different in 8-bit mode. 
+ +/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout + 123abcáyzabcdef789abcሴqr + 1(2) Old 6 6 "" New 6 8 "<>" + 2(2) Old 12 12 "" New 14 16 "<>" + 3(2) Old 12 15 "def" New 16 21 "" + 4(2) Old 21 21 "" New 27 29 "<>" + 4: 123abc<>\x{e1}yzabc<>789abc<>\x{1234}qr + +# A few script run tests in non-UTF mode (but they need Unicode support) + +/^(*script_run:.{4})/ + \x{3041}\x{30a1}\x{3007}\x{3007} Hiragana Katakana Han Han + 0: \x{3041}\x{30a1}\x{3007}\x{3007} + \x{30a1}\x{3041}\x{3007}\x{3007} Katakana Hiragana Han Han + 0: \x{30a1}\x{3041}\x{3007}\x{3007} + \x{1100}\x{2e80}\x{2e80}\x{1101} Hangul Han Han Hangul + 0: \x{1100}\x{2e80}\x{2e80}\x{1101} + +/^(*sr:.*)/utf,allow_surrogate_escapes +Failed: error 191 at offset 0: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is not allowed in UTF-16 mode + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + \x{d800}\x{dfff} Surrogates (Unknown) \=no_utf_check + +/(?(n/utf +Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) + +/(?(á/utf +Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) + +# Invalid UTF-16/32 tests. 
+ +/.../g,match_invalid_utf + abcd\x{df00}wxzy\x{df00}pqrs + 0: abc + 0: wxz + 0: pqr + abcd\x{80}wxzy\x{df00}pqrs + 0: abc + 0: d\x{80}w + 0: xzy + 0: pqr + +/abc/match_invalid_utf + ab\x{df00}ab\=ph +Partial match: ab +\= Expect no match + ab\x{df00}cdef\=ph +No match + +/ab$/match_invalid_utf + ab\x{df00}cdeab + 0: ab +\= Expect no match + ab\x{df00}cde +No match + +/.../g,match_invalid_utf + abcd\x{80}wxzy\x{df00}pqrs + 0: abc + 0: d\x{80}w + 0: xzy + 0: pqr + +/(?<=x)../g,match_invalid_utf + abcd\x{80}wxzy\x{df00}pqrs + 0: zy + abcd\x{80}wxzy\x{df00}xpqrs + 0: zy + 0: pq + +/X$/match_invalid_utf +\= Expect no match + X\x{df00} +No match + +/(?<=..)X/match_invalid_utf,aftertext + AB\x{df00}AQXYZ + 0: X + 0+ YZ + AB\x{df00}AQXYZ\=offset=5 + 0: X + 0+ YZ + AB\x{df00}\x{df00}AXYZXC\=offset=5 + 0: X + 0+ C +\= Expect no match + AB\x{df00}XYZ +No match + AB\x{df00}XYZ\=offset=3 +No match + AB\x{df00}AXYZ +No match + AB\x{df00}AXYZ\=offset=4 +No match + AB\x{df00}\x{df00}AXYZ\=offset=5 +No match + +/.../match_invalid_utf +\= Expect no match + A\x{d800}B +No match + A\x{110000}B +** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16 + +/aa/utf,ucp,match_invalid_utf,global + aa\x{d800}aa + 0: aa + 0: aa + +/aa/utf,ucp,match_invalid_utf,global + \x{d800}aa + 0: aa + +# ---------------------------------------------------- + +/(*UTF)(?=\x{123})/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: utf +First code unit = \x{123} +Subject length lower bound = 1 + +/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf +Capture group count = 0 +Options: utf +First code unit = \xc1 (caseless) +Last code unit = \x{145} (caseless) +Subject length lower bound = 3 + +/[\xff\x{ffff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xff +Subject length lower bound = 1 + +/[\xff\x{ff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xff +Subject length lower bound = 1 + +/[\xff\x{ff}]/I 
+Capture group count = 0 +Starting code units: \xff +Subject length lower bound = 1 + +/[Ss]/I +Capture group count = 0 +First code unit = 'S' (caseless) +Subject length lower bound = 1 + +/[Ss]/I,utf +Capture group count = 0 +Options: utf +Starting code units: S s +Subject length lower bound = 1 + +/(?:\x{ff}|\x{3000})/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xff +Subject length lower bound = 1 + +# ---------------------------------------------------- +# UCP and casing tests + +/\x{120}/i,I +Capture group count = 0 +Options: caseless +First code unit = \x{120} +Subject length lower bound = 1 + +/\x{c1}/i,I,ucp +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Subject length lower bound = 1 + +/[\x{120}\x{121}]/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{120} + Ket + End +------------------------------------------------------------------ + +/[ab\x{120}]+/iB,ucp +------------------------------------------------------------------ + Bra + [ABab\x{120}-\x{121}]++ + Ket + End +------------------------------------------------------------------ + aABb\x{121}\x{120} + 0: aABb\x{121}\x{120} + +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} +No match + +/\x{120}\x{c1}/i,ucp,no_start_optimize + \x{121}\x{e1} + 0: \x{121}\xe1 + +/\x{120}\x{c1}/i,ucp + \x{121}\x{e1} + 0: \x{121}\xe1 + +/[^\x{120}]/i,no_start_optimize + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +\= Expect no match + \x{121} +No match + +/[^\x{120}]/i + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp +\= Expect no match + \x{121} +No match + +/\x{120}{2}/i,ucp + \x{121}\x{121} + 0: \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +\= Expect no match + \x{121}\x{121} +No match + +/\x{c1}+\x{e1}/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ + \x{c1}\x{c1}\x{c1} 
+ 0: \xc1\xc1\xc1 + +/\x{c1}+\x{e1}/iIB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Last code unit = \xe1 (caseless) +Subject length lower bound = 2 + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + \x{e1}\x{e1}\x{e1} + 0: \xe1\xe1\xe1 + +/a|\x{c1}/iI,ucp +Capture group count = 0 +Options: caseless ucp +Starting code units: A a \xc1 \xe1 +Subject length lower bound = 1 + \x{e1}xxx + 0: \xe1 + +/\x{c1}|\x{e1}/iI,ucp +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Subject length lower bound = 1 + +/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended + X\x{e1}Y + 1: >\xc1< + +/X(\x{121})Y/ucp,replace=>\U$1<,substitute_extended + X\x{121}Y + 1: >\x{120}< + +/s/i,ucp + \x{17f} + 0: \x{17f} + +/s/i,utf + \x{17f} + 0: \x{17f} + +/[^s]/i,ucp +\= Expect no match + \x{17f} +No match + +/[^s]/i,utf +\= Expect no match + \x{17f} +No match + +# ---------------------------------------------------- + +# End of testinput12 diff --git a/src/pcre2/testdata/testoutput12-32 b/src/pcre2/testdata/testoutput12-32 new file mode 100644 index 00000000..03b6e394 --- /dev/null +++ b/src/pcre2/testdata/testoutput12-32 @@ -0,0 +1,1782 @@ +# This set of tests is for UTF-16 and UTF-32 support, including Unicode +# properties. It is relevant only to the 16-bit and 32-bit libraries. The +# output is different for each library, so there are separate output files. 
+ +/ÃÃÃxxx/IB,utf,no_utf_check +** Failed: invalid UTF-8 string cannot be converted to 32-bit string + +/abc/utf + Ã] +** Failed: invalid UTF-8 string cannot be used as input in UTF mode + +# Check maximum character size + +/\x{ffff}/IB,utf +------------------------------------------------------------------ + Bra + \x{ffff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{ffff} +Subject length lower bound = 1 + +/\x{10000}/IB,utf +------------------------------------------------------------------ + Bra + \x{10000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{10000} +Subject length lower bound = 1 + +/\x{100}/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + +/\x{1000}/IB,utf +------------------------------------------------------------------ + Bra + \x{1000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{1000} +Subject length lower bound = 1 + +/\x{10000}/IB,utf +------------------------------------------------------------------ + Bra + \x{10000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{10000} +Subject length lower bound = 1 + +/\x{100000}/IB,utf +------------------------------------------------------------------ + Bra + \x{100000} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100000} +Subject length lower bound = 1 + +/\x{10ffff}/IB,utf 
+------------------------------------------------------------------ + Bra + \x{10ffff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{10ffff} +Subject length lower bound = 1 + +/[\x{ff}]/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xff +Subject length lower bound = 1 + +/[\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + +/\x80/IB,utf +------------------------------------------------------------------ + Bra + \x{80} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x80 +Subject length lower bound = 1 + +/\xff/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xff +Subject length lower bound = 1 + +/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf +------------------------------------------------------------------ + Bra + \x{d55c}\x{ad6d}\x{c5b4} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + \x{D55c}\x{ad6d}\x{C5B4} + 0: \x{d55c}\x{ad6d}\x{c5b4} + +/\x{65e5}\x{672c}\x{8a9e}/IB,utf +------------------------------------------------------------------ + Bra + \x{65e5}\x{672c}\x{8a9e} + Ket + End +------------------------------------------------------------------ 
+Capture group count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + \x{65e5}\x{672c}\x{8a9e} + 0: \x{65e5}\x{672c}\x{8a9e} + +/\x{80}/IB,utf +------------------------------------------------------------------ + Bra + \x{80} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x80 +Subject length lower bound = 1 + +/\x{084}/IB,utf +------------------------------------------------------------------ + Bra + \x{84} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x84 +Subject length lower bound = 1 + +/\x{104}/IB,utf +------------------------------------------------------------------ + Bra + \x{104} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{104} +Subject length lower bound = 1 + +/\x{861}/IB,utf +------------------------------------------------------------------ + Bra + \x{861} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{861} +Subject length lower bound = 1 + +/\x{212ab}/IB,utf +------------------------------------------------------------------ + Bra + \x{212ab} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{212ab} +Subject length lower bound = 1 + +/[^ab\xC0-\xF0]/IB,utf +------------------------------------------------------------------ + Bra + [\x00-`c-\xbf\xf1-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a 
\x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e + \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d + \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac + \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb + \xbc \xbd \xbe \xbf \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb + \xfc \xfd \xfe \xff +Subject length lower bound = 1 + \x{f1} + 0: \x{f1} + \x{bf} + 0: \x{bf} + \x{100} + 0: \x{100} + \x{1000} + 0: \x{1000} +\= Expect no match + \x{c0} +No match + \x{f0} +No match + +/Ä€{3,4}/IB,utf +------------------------------------------------------------------ + Bra + \x{100}{3} + \x{100}?+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Last code unit = \x{100} +Subject length lower bound = 3 + \x{100}\x{100}\x{100}\x{100\x{100} + 0: \x{100}\x{100}\x{100} + +/(\x{100}+|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}++ + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: x \xff +Subject length lower bound = 1 + +/(\x{100}*a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}*+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a x \xff +Subject length lower bound = 1 + +/(\x{100}{0,2}a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100}{0,2}+ + a + Alt + x + Ket + Ket + End 
+------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a x \xff +Subject length lower bound = 1 + +/(\x{100}{1,2}a|x)/IB,utf +------------------------------------------------------------------ + Bra + CBra 1 + \x{100} + \x{100}{0,1}+ + a + Alt + x + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: x \xff +Subject length lower bound = 1 + +/\x{100}/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + +/a\x{100}\x{101}*/IB,utf +------------------------------------------------------------------ + Bra + a\x{100} + \x{101}*+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = \x{100} +Subject length lower bound = 2 + +/a\x{100}\x{101}+/IB,utf +------------------------------------------------------------------ + Bra + a\x{100} + \x{101}++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = \x{101} +Subject length lower bound = 3 + +/[^\x{c4}]/IB +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + \x{100} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Subject length lower bound = 1 + \x{100} + 0: \x{100} + Z\x{100} + 
0: \x{100} + \x{100}Z + 0: \x{100} + +/[\xff]/IB,utf +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xff +Subject length lower bound = 1 + >\x{ff}< + 0: \x{ff} + +/[^\xff]/IB,utf +------------------------------------------------------------------ + Bra + [^\x{ff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + +/\x{100}abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + \x{100}abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +First code unit = \x{100} +Last code unit = 'z' +Subject length lower bound = 7 + +/\777/I,utf +Capture group count = 0 +Options: utf +First code unit = \x{1ff} +Subject length lower bound = 1 + \x{1ff} + 0: \x{1ff} + \777 + 0: \x{1ff} + +/\x{100}+\x{200}/IB,utf +------------------------------------------------------------------ + Bra + \x{100}++ + \x{200} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Last code unit = \x{200} +Subject length lower bound = 2 + +/\x{100}+X/IB,utf +------------------------------------------------------------------ + Bra + \x{100}++ + X + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{100} +Last code unit = 'X' +Subject length lower bound = 2 + +/^[\QÄ€\E-\QÅ\E/B,utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/X/utf + XX\x{d800}\=no_utf_check + 0: X + XX\x{da00}\=no_utf_check + 0: X + XX\x{dc00}\=no_utf_check + 0: X + XX\x{de00}\=no_utf_check + 0: X + 
XX\x{dfff}\=no_utf_check + 0: X +\= Expect UTF error + XX\x{d800} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{da00} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{dc00} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{de00} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{dfff} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{110000} +Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 2 + XX\x{d800}\x{1234} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 +\= Expect no match + XX\x{d800}\=offset=3 +No match + +/(?<=.)X/utf + XX\x{d800}\=offset=3 +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + +/(*UTF16)\x{11234}/ +Failed: error 160 at offset 7: (*VERB) not recognized or malformed + abcd\x{11234}pqr + +/(*UTF)\x{11234}/I +Capture group count = 0 +Compile options: +Overall options: utf +First code unit = \x{11234} +Subject length lower bound = 1 + abcd\x{11234}pqr + 0: \x{11234} + +/(*UTF-32)\x{11234}/ +Failed: error 160 at offset 5: (*VERB) not recognized or malformed + abcd\x{11234}pqr + +/(*UTF-32)\x{112}/ +Failed: error 160 at offset 5: (*VERB) not recognized or malformed + abcd\x{11234}pqr + +/(*CRLF)(*UTF16)(*BSR_UNICODE)a\Rb/I +Failed: error 160 at offset 14: (*VERB) not recognized or malformed + +/(*CRLF)(*UTF32)(*BSR_UNICODE)a\Rb/I +Capture group count = 0 +Compile options: +Overall options: utf +\R matches any Unicode newline +Forced newline is CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + +/\h/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x20 \xa0 \xff +Subject length lower bound = 1 + ABC\x{09} + 0: \x{09} + ABC\x{20} + 0: + ABC\x{a0} + 
0: \x{a0} + ABC\x{1680} + 0: \x{1680} + ABC\x{180e} + 0: \x{180e} + ABC\x{2000} + 0: \x{2000} + ABC\x{202f} + 0: \x{202f} + ABC\x{205f} + 0: \x{205f} + ABC\x{3000} + 0: \x{3000} + +/\v/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + ABC\x{0a} + 0: \x{0a} + ABC\x{0b} + 0: \x{0b} + ABC\x{0c} + 0: \x{0c} + ABC\x{0d} + 0: \x{0d} + ABC\x{85} + 0: \x{85} + ABC\x{2028} + 0: \x{2028} + +/\h*A/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x20 A \xa0 \xff +Last code unit = 'A' +Subject length lower bound = 1 + CDBABC + 0: A + \x{2000}ABC + 0: \x{2000}A + +/\R*A/I,bsr=unicode,utf +Capture group count = 0 +Options: utf +\R matches any Unicode newline +Starting code units: \x0a \x0b \x0c \x0d A \x85 \xff +Last code unit = 'A' +Subject length lower bound = 1 + CDBABC + 0: A + \x{2028}A + 0: \x{2028}A + +/\v+A/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Last code unit = 'A' +Subject length lower bound = 2 + +/\s?xxx\s/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x +Last code unit = 'x' +Subject length lower bound = 4 + +/\sxxx\s/I,utf,tables=2 +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \x85 \xa0 +Last code unit = 'x' +Subject length lower bound = 5 + AB\x{85}xxx\x{a0}XYZ + 0: \x{85}xxx\x{a0} + AB\x{a0}xxx\x{85}XYZ + 0: \x{a0}xxx\x{85} + +/\S \S/I,utf,tables=2 +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? 
@ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 + \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 + \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa1 \xa2 \xa3 \xa4 + \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 + \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Last code unit = ' ' +Subject length lower bound = 3 + \x{a2} \x{84} + 0: \x{a2} \x{84} + A Z + 0: A Z + +/a+/utf + a\x{123}aa\=offset=1 + 0: aa + a\x{123}aa\=offset=2 + 0: aa + a\x{123}aa\=offset=3 + 0: a +\= Expect no match + a\x{123}aa\=offset=4 +No match +\= Expect bad offset error + a\x{123}aa\=offset=5 +Failed: error -33: bad offset value + a\x{123}aa\=offset=6 +Failed: error -33: bad offset value + +/\x{1234}+/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Subject length lower bound = 1 + +/\x{1234}+?/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Subject length lower bound = 1 + +/\x{1234}++/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Subject length lower bound = 1 + +/\x{1234}{2}/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{1234} +Last code unit = \x{1234} +Subject length lower bound = 2 + +/[^\x{c4}]/IB,utf +------------------------------------------------------------------ + Bra + [^\x{c4}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + 
+/X+\x{200}/IB,utf +------------------------------------------------------------------ + Bra + X++ + \x{200} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'X' +Last code unit = \x{200} +Subject length lower bound = 2 + +/\R/I,utf +Capture group count = 0 +Options: utf +Starting code units: \x0a \x0b \x0c \x0d \x85 \xff +Subject length lower bound = 1 + +# Check bad offset + +/a/utf +\= Expect bad UTF-16 offset, or no match in 32-bit + \x{10000}\=offset=1 +No match + \x{10000}ab\=offset=1 + 0: a +\= Expect 16-bit match, 32-bit no match + \x{10000}ab\=offset=2 +No match +\= Expect no match + \x{10000}ab\=offset=3 +No match +\= Expect no match in 16-bit, bad offset in 32-bit + \x{10000}ab\=offset=4 +Failed: error -33: bad offset value +\= Expect bad offset + \x{10000}ab\=offset=5 +Failed: error -33: bad offset value + +/í¼€/utf +Failed: error -27 at offset 0: UTF-32 error: code points 0xd800-0xdfff are not defined + +/\w+\x{C4}/B,utf +------------------------------------------------------------------ + Bra + \w++ + \x{c4} + Ket + End +------------------------------------------------------------------ + a\x{C4}\x{C4} + 0: a\x{c4} + +/\w+\x{C4}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \w+ + \x{c4} + Ket + End +------------------------------------------------------------------ + a\x{C4}\x{C4} + 0: a\x{c4}\x{c4} + +/\W+\x{C4}/B,utf +------------------------------------------------------------------ + Bra + \W+ + \x{c4} + Ket + End +------------------------------------------------------------------ + !\x{C4} + 0: !\x{c4} + +/\W+\x{C4}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \W++ + \x{c4} + Ket + End +------------------------------------------------------------------ + !\x{C4} + 0: !\x{c4} + +/\W+\x{A1}/B,utf +------------------------------------------------------------------ + Bra + 
\W+ + \x{a1} + Ket + End +------------------------------------------------------------------ + !\x{A1} + 0: !\x{a1} + +/\W+\x{A1}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \W+ + \x{a1} + Ket + End +------------------------------------------------------------------ + !\x{A1} + 0: !\x{a1} + +/X\s+\x{A0}/B,utf +------------------------------------------------------------------ + Bra + X + \s++ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x20\x{A0}\x{A0} + 0: X \x{a0} + +/X\s+\x{A0}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + X + \s+ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x20\x{A0}\x{A0} + 0: X \x{a0}\x{a0} + +/\S+\x{A0}/B,utf +------------------------------------------------------------------ + Bra + \S+ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x{A0}\x{A0} + 0: X\x{a0}\x{a0} + +/\S+\x{A0}/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \S++ + \x{a0} + Ket + End +------------------------------------------------------------------ + X\x{A0}\x{A0} + 0: X\x{a0} + +/\x{a0}+\s!/B,utf +------------------------------------------------------------------ + Bra + \x{a0}++ + \s + ! + Ket + End +------------------------------------------------------------------ + \x{a0}\x20! + 0: \x{a0} ! + +/\x{a0}+\s!/B,utf,tables=2 +------------------------------------------------------------------ + Bra + \x{a0}+ + \s + ! + Ket + End +------------------------------------------------------------------ + \x{a0}\x20! + 0: \x{a0} ! 
+ +/(*UTF)abc/never_utf +Failed: error 174 at offset 6: using UTF is disabled by the application + +/abc/utf,never_utf +Failed: error 174 at offset 0: using UTF is disabled by the application + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf +------------------------------------------------------------------ + Bra + /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Last code unit = \x{1fb0} (caseless) +Subject length lower bound = 5 + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf +------------------------------------------------------------------ + Bra + A\x{391}\x{10427}\x{ff3a}\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = \x{1fb0} +Subject length lower bound = 5 + +/AB\x{1fb0}/IB,utf +------------------------------------------------------------------ + Bra + AB\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = \x{1fb0} +Subject length lower bound = 3 + +/AB\x{1fb0}/IBi,utf +------------------------------------------------------------------ + Bra + /i AB\x{1fb0} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Last code unit = \x{1fb0} (caseless) +Subject length lower bound = 3 + +/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf +Capture group count = 0 +Options: caseless utf +First code unit = \x{401} (caseless) +Last code unit = \x{42f} (caseless) +Subject length lower bound = 17 + 
\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} + \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} + +/[â±¥]/Bi,utf +------------------------------------------------------------------ + Bra + /i \x{2c65} + Ket + End +------------------------------------------------------------------ + +/[^â±¥]/Bi,utf +------------------------------------------------------------------ + Bra + /i [^\x{2c65}] + Ket + End +------------------------------------------------------------------ + +/[[:blank:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + +/\x{212a}+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: K k \xff +Subject length lower bound = 1 + KKkk\x{212a} + 0: KKkk\x{212a} + +/s+/Ii,utf +Capture group count = 0 +Options: caseless utf +Starting code units: S s \xff +Subject length lower bound = 1 + SSss\x{17f} + 0: SSss\x{17f} + +# Non-UTF characters should give errors in both 16-bit and 32-bit modes. 
+ +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/\o{4200000}/utf +Failed: error 134 at offset 10: character code point value in \x{} or \o{} is too large + +/\x{100}*A/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + A + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: A \xff +Last code unit = 'A' +Subject length lower bound = 1 + A + 0: A + +/\x{100}*\d(?R)/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \d + Recurse + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff +Subject length lower bound = 1 + +/[Z\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + [Z\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: Z \xff +Subject length lower bound = 1 + Z\x{100} + 0: Z + \x{100} + 0: \x{100} + \x{100}Z + 0: \x{100} + +/[z-\x{100}]/IB,utf +------------------------------------------------------------------ + Bra + [z-\xff\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 + \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 + \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 + \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 + \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 + \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 + \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 + \xe2 
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 + \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[z\Qa-d]Ä€\E]/IB,utf +------------------------------------------------------------------ + Bra + [\-\]adz\x{100}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: - ] a d z \xff +Subject length lower bound = 1 + \x{100} + 0: \x{100} + Ä€ + 0: \x{100} + +/[ab\x{100}]abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + [ab\x{100}] + abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Starting code units: a b \xff +Last code unit = 'z' +Subject length lower bound = 7 + +/\x{100}*\s/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \s + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xff +Subject length lower bound = 1 + +/\x{100}*\d/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 \xff +Subject length lower bound = 1 + +/\x{100}*\w/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + \w + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z + \xff +Subject length lower bound = 1 + +/\x{100}*\D/IB,utf 
+------------------------------------------------------------------ + Bra + \x{100}* + \D + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c + d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 + \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 + \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 + \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf + \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe + \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd + \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc + \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb + \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa + \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/\x{100}*\S/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \S + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? 
@ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 + \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 + \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 + \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 + \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 + \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf + \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde + \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed + \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc + \xfd \xfe \xff +Subject length lower bound = 1 + +/\x{100}*\W/IB,utf +------------------------------------------------------------------ + Bra + \x{100}* + \W + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? 
@ [ \ ] ^ ` { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 + \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 + \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 + \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 + \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 + \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 + \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 + \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 + \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[\x{105}-\x{109}]/IBi,utf +------------------------------------------------------------------ + Bra + [\x{104}-\x{109}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: \xff +Subject length lower bound = 1 + \x{104} + 0: \x{104} + \x{105} + 0: \x{105} + \x{109} + 0: \x{109} +\= Expect no match + \x{100} +No match + \x{10a} +No match + +/[z-\x{100}]/IBi,utf +------------------------------------------------------------------ + Bra + [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 + \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 + \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 + \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 + \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 
\xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Subject length lower bound = 1 + Z + 0: Z + z + 0: z + \x{39c} + 0: \x{39c} + \x{178} + 0: \x{178} + | + 0: | + \x{80} + 0: \x{80} + \x{ff} + 0: \x{ff} + \x{100} + 0: \x{100} + \x{101} + 0: \x{101} +\= Expect no match + \x{102} +No match + Y +No match + y +No match + +/[z-\x{100}]/IBi,utf +------------------------------------------------------------------ + Bra + [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: Z z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 + \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 + \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 + \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 + \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 + \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 + \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 + \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef + \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe + \xff +Subject length lower bound = 1 + +/\x{3a3}B/IBi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 + /i B + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +Starting code units: \xff +Last code unit = 'B' (caseless) +Subject length lower bound = 2 + +/./utf + \x{110000} +Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 0 + +/(*UTF)abý¿¿¿¿¿z/B +------------------------------------------------------------------ + 
Bra + ab\x{fd}\x{bf}\x{bf}\x{bf}\x{bf}\x{bf}z + Ket + End +------------------------------------------------------------------ + +/abý¿¿¿¿¿z/utf +** Failed: character value greater than 0x10ffff cannot be converted to UTF + +/[\W\p{Any}]/B +------------------------------------------------------------------ + Bra + [\x00-/:-@[-^`{-\xff\p{Any}\x{100}-\x{ffffffff}] + Ket + End +------------------------------------------------------------------ + abc + 0: a + 123 + 0: 1 + +/[\W\pL]/B +------------------------------------------------------------------ + Bra + [\x00-/:-@[-^`{-\xff\p{L}\x{100}-\x{ffffffff}] + Ket + End +------------------------------------------------------------------ + abc + 0: a + \x{100} + 0: \x{100} + \x{308} + 0: \x{308} +\= Expect no match + 123 +No match + +/[\s[:^ascii:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x80-\xff\p{Xsp}\x{100}-\x{ffffffff}] + Ket + End +------------------------------------------------------------------ + +/\pP/ucp + \x{7fffffff} +No match + +# A special extra option allows excaped surrogate code points in 32-bit mode, +# but subjects containing them must not be UTF-checked. These patterns give +# errors in 16-bit mode. + +/\x{d800}/I,utf,allow_surrogate_escapes +Capture group count = 0 +Options: utf +Extra options: allow_surrogate_escapes +First code unit = \x{d800} +Subject length lower bound = 1 + \x{d800}\=no_utf_check + 0: \x{d800} + +/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes + \x{dfff}\x{df01}\=no_utf_check + 0: \x{dfff}\x{df01} + +# This has different starting code units in 8-bit mode. 
+ +/^[^ab]/IB,utf +------------------------------------------------------------------ + Bra + ^ + [\x00-`c-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: utf +Overall options: anchored utf +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e + \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d + \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac + \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb + \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca + \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 + \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 + \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 + \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + c + 0: c + \x{ff} + 0: \x{ff} + \x{100} + 0: \x{100} +\= Expect no match + aaa +No match + +# Offsets are different in 8-bit mode. 
+ +/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout + 123abcáyzabcdef789abcሴqr + 1(2) Old 6 6 "" New 6 8 "<>" + 2(2) Old 12 12 "" New 14 16 "<>" + 3(2) Old 12 15 "def" New 16 21 "" + 4(2) Old 21 21 "" New 27 29 "<>" + 4: 123abc<>\x{e1}yzabc<>789abc<>\x{1234}qr + +# A few script run tests in non-UTF mode (but they need Unicode support) + +/^(*script_run:.{4})/ + \x{3041}\x{30a1}\x{3007}\x{3007} Hiragana Katakana Han Han + 0: \x{3041}\x{30a1}\x{3007}\x{3007} + \x{30a1}\x{3041}\x{3007}\x{3007} Katakana Hiragana Han Han + 0: \x{30a1}\x{3041}\x{3007}\x{3007} + \x{1100}\x{2e80}\x{2e80}\x{1101} Hangul Han Han Hangul + 0: \x{1100}\x{2e80}\x{2e80}\x{1101} + +/^(*sr:.*)/utf,allow_surrogate_escapes + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + 0: \x{2e80}\x{3105}\x{2e80} + \x{d800}\x{dfff} Surrogates (Unknown) \=no_utf_check + 0: \x{d800} + +/(?(n/utf +Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) + +/(?(á/utf +Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) + +# Invalid UTF-16/32 tests. 
+ +/.../g,match_invalid_utf + abcd\x{df00}wxzy\x{df00}pqrs + 0: abc + 0: wxz + 0: pqr + abcd\x{80}wxzy\x{df00}pqrs + 0: abc + 0: d\x{80}w + 0: xzy + 0: pqr + +/abc/match_invalid_utf + ab\x{df00}ab\=ph +Partial match: ab +\= Expect no match + ab\x{df00}cdef\=ph +No match + +/ab$/match_invalid_utf + ab\x{df00}cdeab + 0: ab +\= Expect no match + ab\x{df00}cde +No match + +/.../g,match_invalid_utf + abcd\x{80}wxzy\x{df00}pqrs + 0: abc + 0: d\x{80}w + 0: xzy + 0: pqr + +/(?<=x)../g,match_invalid_utf + abcd\x{80}wxzy\x{df00}pqrs + 0: zy + abcd\x{80}wxzy\x{df00}xpqrs + 0: zy + 0: pq + +/X$/match_invalid_utf +\= Expect no match + X\x{df00} +No match + +/(?<=..)X/match_invalid_utf,aftertext + AB\x{df00}AQXYZ + 0: X + 0+ YZ + AB\x{df00}AQXYZ\=offset=5 + 0: X + 0+ YZ + AB\x{df00}\x{df00}AXYZXC\=offset=5 + 0: X + 0+ C +\= Expect no match + AB\x{df00}XYZ +No match + AB\x{df00}XYZ\=offset=3 +No match + AB\x{df00}AXYZ +No match + AB\x{df00}AXYZ\=offset=4 +No match + AB\x{df00}\x{df00}AXYZ\=offset=5 +No match + +/.../match_invalid_utf +\= Expect no match + A\x{d800}B +No match + A\x{110000}B +No match + +/aa/utf,ucp,match_invalid_utf,global + aa\x{d800}aa + 0: aa + 0: aa + +/aa/utf,ucp,match_invalid_utf,global + \x{d800}aa + 0: aa + +# ---------------------------------------------------- + +/(*UTF)(?=\x{123})/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: utf +First code unit = \x{123} +Subject length lower bound = 1 + +/[\x{c1}\x{e1}]X[\x{145}\x{146}]/I,utf +Capture group count = 0 +Options: utf +First code unit = \xc1 (caseless) +Last code unit = \x{145} (caseless) +Subject length lower bound = 3 + +/[\xff\x{ffff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xff +Subject length lower bound = 1 + +/[\xff\x{ff}]/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xff +Subject length lower bound = 1 + +/[\xff\x{ff}]/I +Capture group count = 0 +Starting code units: \xff +Subject length lower bound = 1 + 
+/[Ss]/I +Capture group count = 0 +First code unit = 'S' (caseless) +Subject length lower bound = 1 + +/[Ss]/I,utf +Capture group count = 0 +Options: utf +Starting code units: S s +Subject length lower bound = 1 + +/(?:\x{ff}|\x{3000})/I,utf +Capture group count = 0 +Options: utf +Starting code units: \xff +Subject length lower bound = 1 + +# ---------------------------------------------------- +# UCP and casing tests + +/\x{120}/i,I +Capture group count = 0 +Options: caseless +First code unit = \x{120} +Subject length lower bound = 1 + +/\x{c1}/i,I,ucp +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Subject length lower bound = 1 + +/[\x{120}\x{121}]/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{120} + Ket + End +------------------------------------------------------------------ + +/[ab\x{120}]+/iB,ucp +------------------------------------------------------------------ + Bra + [ABab\x{120}-\x{121}]++ + Ket + End +------------------------------------------------------------------ + aABb\x{121}\x{120} + 0: aABb\x{121}\x{120} + +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} +No match + +/\x{120}\x{c1}/i,ucp,no_start_optimize + \x{121}\x{e1} + 0: \x{121}\xe1 + +/\x{120}\x{c1}/i,ucp + \x{121}\x{e1} + 0: \x{121}\xe1 + +/[^\x{120}]/i,no_start_optimize + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +\= Expect no match + \x{121} +No match + +/[^\x{120}]/i + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp +\= Expect no match + \x{121} +No match + +/\x{120}{2}/i,ucp + \x{121}\x{121} + 0: \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +\= Expect no match + \x{121}\x{121} +No match + +/\x{c1}+\x{e1}/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + +/\x{c1}+\x{e1}/iIB,ucp 
+------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Last code unit = \xe1 (caseless) +Subject length lower bound = 2 + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + \x{e1}\x{e1}\x{e1} + 0: \xe1\xe1\xe1 + +/a|\x{c1}/iI,ucp +Capture group count = 0 +Options: caseless ucp +Starting code units: A a \xc1 \xe1 +Subject length lower bound = 1 + \x{e1}xxx + 0: \xe1 + +/\x{c1}|\x{e1}/iI,ucp +Capture group count = 0 +Options: caseless ucp +First code unit = \xc1 (caseless) +Subject length lower bound = 1 + +/X(\x{e1})Y/ucp,replace=>\U$1<,substitute_extended + X\x{e1}Y + 1: >\xc1< + +/X(\x{121})Y/ucp,replace=>\U$1<,substitute_extended + X\x{121}Y + 1: >\x{120}< + +/s/i,ucp + \x{17f} + 0: \x{17f} + +/s/i,utf + \x{17f} + 0: \x{17f} + +/[^s]/i,ucp +\= Expect no match + \x{17f} +No match + +/[^s]/i,utf +\= Expect no match + \x{17f} +No match + +# ---------------------------------------------------- + +# End of testinput12 diff --git a/src/pcre/testdata/testoutput20 b/src/pcre2/testdata/testoutput13 similarity index 61% rename from src/pcre/testdata/testoutput20 rename to src/pcre2/testdata/testoutput13 index c1b20ee8..f737ebe9 100644 --- a/src/pcre/testdata/testoutput20 +++ b/src/pcre2/testdata/testoutput13 @@ -1,5 +1,8 @@ -/-- These DFA tests are for the handling of characters greater than 255 in - 16- or 32-bit, non-UTF mode. --/ +# These DFA tests are for the handling of characters greater than 255 in +# 16-bit or 32-bit, non-UTF mode. 
+ +#forbid_utf +#subject dfa /^\x{ffff}+/i \x{ffff} @@ -21,4 +24,4 @@ \x{ffff} 0: \x{ffff} -/-- End of testinput20 --/ +# End of testinput13 diff --git a/src/pcre2/testdata/testoutput14-16 b/src/pcre2/testdata/testoutput14-16 new file mode 100644 index 00000000..61541f61 --- /dev/null +++ b/src/pcre2/testdata/testoutput14-16 @@ -0,0 +1,125 @@ +# These test special UTF and UCP features of DFA matching. The output is +# different for the different widths. + +#subject dfa + +# ---------------------------------------------------- +# These are a selection of the more comprehensive tests that are run for +# non-DFA matching. + +/X/utf + XX\x{d800} +Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2 + XX\x{d800}\=offset=3 +No match + XX\x{d800}\=no_utf_check + 0: X + XX\x{da00} +Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2 + XX\x{da00}\=no_utf_check + 0: X + XX\x{dc00} +Failed: error -26: UTF-16 error: isolated low surrogate at offset 2 + XX\x{dc00}\=no_utf_check + 0: X + XX\x{de00} +Failed: error -26: UTF-16 error: isolated low surrogate at offset 2 + XX\x{de00}\=no_utf_check + 0: X + XX\x{dfff} +Failed: error -26: UTF-16 error: isolated low surrogate at offset 2 + XX\x{dfff}\=no_utf_check + 0: X + XX\x{110000} +** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16 + XX\x{d800}\x{1234} +Failed: error -25: UTF-16 error: invalid low surrogate at offset 2 + +/badutf/utf + X\xdf +No match + XX\xef +No match + XXX\xef\x80 +No match + X\xf7 +No match + XX\xf7\x80 +No match + XXX\xf7\x80\x80 +No match + +/shortutf/utf + XX\xdf\=ph +No match + XX\xef\=ph +No match + XX\xef\x80\=ph +No match + \xf7\=ph +No match + \xf7\x80\=ph +No match + +# ---------------------------------------------------- +# UCP and casing tests - except for the first two, these will all fail in 8-bit +# mode because they are testing UCP without UTF and use characters > 255. 
+ +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} +No match + +/\x{c1}+\x{e1}/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + 1: \xc1\xc1 + \x{e1}\x{e1}\x{e1} + 0: \xe1\xe1\xe1 + 1: \xe1\xe1 + +/\x{120}\x{c1}/i,ucp,no_start_optimize + \x{121}\x{e1} + 0: \x{121}\xe1 + +/\x{120}\x{c1}/i,ucp + \x{121}\x{e1} + 0: \x{121}\xe1 + +/[^\x{120}]/i,no_start_optimize + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +\= Expect no match + \x{121} +No match + +/[^\x{120}]/i + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp +\= Expect no match + \x{121} +No match + +/\x{120}{2}/i,ucp + \x{121}\x{121} + 0: \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +\= Expect no match + \x{121}\x{121} +No match + +# ---------------------------------------------------- + +# End of testinput14 diff --git a/src/pcre2/testdata/testoutput14-32 b/src/pcre2/testdata/testoutput14-32 new file mode 100644 index 00000000..f1f65b74 --- /dev/null +++ b/src/pcre2/testdata/testoutput14-32 @@ -0,0 +1,125 @@ +# These test special UTF and UCP features of DFA matching. The output is +# different for the different widths. + +#subject dfa + +# ---------------------------------------------------- +# These are a selection of the more comprehensive tests that are run for +# non-DFA matching. 
+ +/X/utf + XX\x{d800} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{d800}\=offset=3 +No match + XX\x{d800}\=no_utf_check + 0: X + XX\x{da00} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{da00}\=no_utf_check + 0: X + XX\x{dc00} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{dc00}\=no_utf_check + 0: X + XX\x{de00} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{de00}\=no_utf_check + 0: X + XX\x{dfff} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{dfff}\=no_utf_check + 0: X + XX\x{110000} +Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 2 + XX\x{d800}\x{1234} +Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2 + +/badutf/utf + X\xdf +No match + XX\xef +No match + XXX\xef\x80 +No match + X\xf7 +No match + XX\xf7\x80 +No match + XXX\xf7\x80\x80 +No match + +/shortutf/utf + XX\xdf\=ph +No match + XX\xef\=ph +No match + XX\xef\x80\=ph +No match + \xf7\=ph +No match + \xf7\x80\=ph +No match + +# ---------------------------------------------------- +# UCP and casing tests - except for the first two, these will all fail in 8-bit +# mode because they are testing UCP without UTF and use characters > 255. 
+ +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} +No match + +/\x{c1}+\x{e1}/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + 1: \xc1\xc1 + \x{e1}\x{e1}\x{e1} + 0: \xe1\xe1\xe1 + 1: \xe1\xe1 + +/\x{120}\x{c1}/i,ucp,no_start_optimize + \x{121}\x{e1} + 0: \x{121}\xe1 + +/\x{120}\x{c1}/i,ucp + \x{121}\x{e1} + 0: \x{121}\xe1 + +/[^\x{120}]/i,no_start_optimize + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +\= Expect no match + \x{121} +No match + +/[^\x{120}]/i + \x{121} + 0: \x{121} + +/[^\x{120}]/i,ucp +\= Expect no match + \x{121} +No match + +/\x{120}{2}/i,ucp + \x{121}\x{121} + 0: \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +\= Expect no match + \x{121}\x{121} +No match + +# ---------------------------------------------------- + +# End of testinput14 diff --git a/src/pcre2/testdata/testoutput14-8 b/src/pcre2/testdata/testoutput14-8 new file mode 100644 index 00000000..aa624141 --- /dev/null +++ b/src/pcre2/testdata/testoutput14-8 @@ -0,0 +1,125 @@ +# These test special UTF and UCP features of DFA matching. The output is +# different for the different widths. + +#subject dfa + +# ---------------------------------------------------- +# These are a selection of the more comprehensive tests that are run for +# non-DFA matching. 
+ +/X/utf + XX\x{d800} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{d800}\=offset=3 +Error -36 (bad UTF-8 offset) + XX\x{d800}\=no_utf_check + 0: X + XX\x{da00} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{da00}\=no_utf_check + 0: X + XX\x{dc00} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{dc00}\=no_utf_check + 0: X + XX\x{de00} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{de00}\=no_utf_check + 0: X + XX\x{dfff} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 + XX\x{dfff}\=no_utf_check + 0: X + XX\x{110000} +Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2 + XX\x{d800}\x{1234} +Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 + +/badutf/utf + X\xdf +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1 + XX\xef +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 + XXX\xef\x80 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 + X\xf7 +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1 + XX\xf7\x80 +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 + XXX\xf7\x80\x80 +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 + +/shortutf/utf + XX\xdf\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 + XX\xef\=ph +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 + XX\xef\x80\=ph +Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 + \xf7\=ph +Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 + \xf7\x80\=ph +Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 + +# ---------------------------------------------------- +# UCP and casing tests - except for the first two, these will all fail in 8-bit +# 
mode because they are testing UCP without UTF and use characters > 255. + +/\x{c1}/i,no_start_optimize +\= Expect no match + \x{e1} +No match + +/\x{c1}+\x{e1}/iB,ucp +------------------------------------------------------------------ + Bra + /i \x{c1}+ + /i \x{e1} + Ket + End +------------------------------------------------------------------ + \x{c1}\x{c1}\x{c1} + 0: \xc1\xc1\xc1 + 1: \xc1\xc1 + \x{e1}\x{e1}\x{e1} + 0: \xe1\xe1\xe1 + 1: \xe1\xe1 + +/\x{120}\x{c1}/i,ucp,no_start_optimize +Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large + \x{121}\x{e1} + +/\x{120}\x{c1}/i,ucp +Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large + \x{121}\x{e1} + +/[^\x{120}]/i,no_start_optimize +Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large + \x{121} + +/[^\x{120}]/i,ucp,no_start_optimize +Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large +\= Expect no match + \x{121} + +/[^\x{120}]/i +Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large + \x{121} + +/[^\x{120}]/i,ucp +Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large +\= Expect no match + \x{121} + +/\x{120}{2}/i,ucp +Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large + \x{121}\x{121} + +/[^\x{120}]{2}/i,ucp +Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large +\= Expect no match + \x{121}\x{121} + +# ---------------------------------------------------- + +# End of testinput14 diff --git a/src/pcre2/testdata/testoutput15 b/src/pcre2/testdata/testoutput15 new file mode 100644 index 00000000..9154e5f9 --- /dev/null +++ b/src/pcre2/testdata/testoutput15 @@ -0,0 +1,536 @@ +# These are: +# +# (1) Tests of the match-limiting features. The results are different for +# interpretive or JIT matching, so this test should not be run with JIT. 
The +# same tests are run using JIT in test 17. + +# (2) Other tests that must not be run with JIT. + +/(a+)*zz/I +Capture group count = 1 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits +Minimum heap limit = 0 +Minimum match limit = 7 +Minimum depth limit = 7 + 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz + 1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + aaaaaaaaaaaaaz\=find_limits +Minimum heap limit = 0 +Minimum match limit = 20481 +Minimum depth limit = 30 +No match + +!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I +Capture group count = 1 +May match empty string +Subject length lower bound = 0 + /* this is a C style comment */\=find_limits +Minimum heap limit = 0 +Minimum match limit = 64 +Minimum depth limit = 7 + 0: /* this is a C style comment */ + 1: /* this is a C style comment */ + +/^(?>a)++/ + aa\=find_limits +Minimum heap limit = 0 +Minimum match limit = 5 +Minimum depth limit = 3 + 0: aa + aaaaaaaaa\=find_limits +Minimum heap limit = 0 +Minimum match limit = 12 +Minimum depth limit = 3 + 0: aaaaaaaaa + +/(a)(?1)++/ + aa\=find_limits +Minimum heap limit = 0 +Minimum match limit = 7 +Minimum depth limit = 5 + 0: aa + 1: a + aaaaaaaaa\=find_limits +Minimum heap limit = 0 +Minimum match limit = 21 +Minimum depth limit = 5 + 0: aaaaaaaaa + 1: a + +/a(?:.)*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits +Minimum heap limit = 0 +Minimum match limit = 24 +Minimum depth limit = 3 + 0: abbbbbbbbbbbbbbbbbbbbba + +/a(?:.(*THEN))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits +Minimum heap limit = 0 +Minimum match limit = 66 +Minimum depth limit = 45 + 0: abbbbbbbbbbbbbbbbbbbbba + +/a(?:.(*THEN:ABC))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits +Minimum heap limit = 0 +Minimum match limit = 66 +Minimum depth limit = 45 + 0: abbbbbbbbbbbbbbbbbbbbba + +/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/ + aabbccddee\=find_limits 
+Minimum heap limit = 0 +Minimum match limit = 7 +Minimum depth limit = 7 + 0: aabbccddee + +/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/ + aabbccddee\=find_limits +Minimum heap limit = 0 +Minimum match limit = 12 +Minimum depth limit = 12 + 0: aabbccddee + 1: aa + 2: bb + 3: cc + 4: dd + 5: ee + +/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/ + aabbccddee\=find_limits +Minimum heap limit = 0 +Minimum match limit = 10 +Minimum depth limit = 10 + 0: aabbccddee + 1: aa + 2: cc + 3: ee + +/(*LIMIT_MATCH=12bc)abc/ +Failed: error 160 at offset 17: (*VERB) not recognized or malformed + +/(*LIMIT_MATCH=4294967290)abc/ +Failed: error 160 at offset 24: (*VERB) not recognized or malformed + +/(*LIMIT_DEPTH=4294967280)abc/I +Capture group count = 0 +Depth limit = 4294967280 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/(a+)*zz/ +\= Expect no match + aaaaaaaaaaaaaz +No match +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 +Failed: error -47: match limit exceeded + +/(a+)*zz/ +\= Expect limit exceeded + aaaaaaaaaaaaaz\=depth_limit=10 +Failed: error -53: matching depth limit exceeded + +/(*LIMIT_MATCH=3000)(a+)*zz/I +Capture group count = 1 +Match limit = 3000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +\= Expect limit exceeded + aaaaaaaaaaaaaz +Failed: error -47: match limit exceeded +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=60000 +Failed: error -47: match limit exceeded + +/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I +Capture group count = 1 +Match limit = 3000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +\= Expect limit exceeded + aaaaaaaaaaaaaz +Failed: error -47: match limit exceeded + +/(*LIMIT_MATCH=60000)(a+)*zz/I +Capture group count = 1 +Match limit = 60000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +\= Expect no match + aaaaaaaaaaaaaz +No match +\= Expect limit exceeded + 
aaaaaaaaaaaaaz\=match_limit=3000 +Failed: error -47: match limit exceeded + +/(*LIMIT_DEPTH=10)(a+)*zz/I +Capture group count = 1 +Depth limit = 10 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +\= Expect limit exceeded + aaaaaaaaaaaaaz +Failed: error -53: matching depth limit exceeded +\= Expect limit exceeded + aaaaaaaaaaaaaz\=depth_limit=1000 +Failed: error -53: matching depth limit exceeded + +/(*LIMIT_DEPTH=10)(*LIMIT_DEPTH=1000)(a+)*zz/I +Capture group count = 1 +Depth limit = 1000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +\= Expect no match + aaaaaaaaaaaaaz +No match + +/(*LIMIT_DEPTH=1000)(a+)*zz/I +Capture group count = 1 +Depth limit = 1000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +\= Expect no match + aaaaaaaaaaaaaz +No match +\= Expect limit exceeded + aaaaaaaaaaaaaz\=depth_limit=10 +Failed: error -53: matching depth limit exceeded + +# These three have infinitely nested recursions. + +/((?2))((?1))/ + abc +Failed: error -52: nested recursion at the same subject position + +/((?(R2)a+|(?1)b))()/ + aaaabcde +Failed: error -52: nested recursion at the same subject position + +/(?(R)a*(?1)|((?R))b)/ + aaaabcde +Failed: error -52: nested recursion at the same subject position + +# The allusedtext modifier does not work with JIT, which does not maintain +# the leftchar/rightchar data. + +/abc(?=xyz)/allusedtext + abcxyzpqr + 0: abcxyz + >>> + abcxyzpqr\=aftertext + 0: abcxyz + >>> + 0+ xyzpqr + +/(?<=pqr)abc(?=xyz)/allusedtext + xyzpqrabcxyzpqr + 0: pqrabcxyz + <<< >>> + xyzpqrabcxyzpqr\=aftertext + 0: pqrabcxyz + <<< >>> + 0+ xyzpqr + +/a\b/ + a.\=allusedtext + 0: a. 
+ > + a\=allusedtext + 0: a + +/abc\Kxyz/ + abcxyz\=allusedtext + 0: abcxyz + <<< + +/abc(?=xyz(*ACCEPT))/ + abcxyz\=allusedtext + 0: abcxyz + >>> + +/abc(?=abcde)(?=ab)/allusedtext + abcabcdefg + 0: abcabcde + >>>>> + +#subject allusedtext + +/(?<=abc)123/ + xyzabc123pqr + 0: abc123 + <<< + xyzabc12\=ps +Partial match: abc12 + <<< + xyzabc12\=ph +Partial match: abc12 + <<< + +/\babc\b/ + +++abc+++ + 0: +abc+ + < > + +++ab\=ps +Partial match: +ab + < + +++ab\=ph +Partial match: +ab + < + +/(?<=abc)def/ + abc\=ph +Partial match: abc + <<< + +/(?<=123)(*MARK:xx)abc/mark + xxxx123a\=ph +Partial match, mark=xx: 123a + <<< + xxxx123a\=ps +Partial match, mark=xx: 123a + <<< + +/(?<=(?<=a)b)c.*/I +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'c' +Subject length lower bound = 1 + abc\=ph +Partial match: abc + << +\= Expect no match + xbc\=ph +No match + +/(?<=ab)c.*/I +Capture group count = 0 +Max lookbehind = 2 +First code unit = 'c' +Subject length lower bound = 1 + abc\=ph +Partial match: abc + << +\= Expect no match + xbc\=ph +No match + +/abc(?<=bc)def/ + xxxabcd\=ph +Partial match: abcd + +/(?<=ab)cdef/ + xxabcd\=ph +Partial match: abcd + << + +/(?<=(?<=(?<=a)b)c)./I +Capture group count = 0 +Max lookbehind = 1 +Subject length lower bound = 1 + 123abcXYZ + 0: abcX + <<< + +/(?<=ab(cd(?<=...)))./I +Capture group count = 1 +Max lookbehind = 4 +Subject length lower bound = 1 + abcdX + 0: abcdX + <<<< + 1: cd + +/(?<=ab((?<=...)cd))./I +Capture group count = 1 +Max lookbehind = 4 +Subject length lower bound = 1 + ZabcdX + 0: ZabcdX + <<<<< + 1: cd + +/(?<=((?<=(?<=ab).))(?1)(?1))./I +Capture group count = 1 +Max lookbehind = 2 +Subject length lower bound = 1 + abxZ + 0: abxZ + <<< + 1: + +#subject +# ------------------------------------------------------------------- + +# These tests provoke recursion loops, which give a different error message +# when JIT is used. 
+ +/(?R)/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + abcd +Failed: error -52: nested recursion at the same subject position + +/(a|(?R))/I +Capture group count = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: a + 1: a + defg +Failed: error -52: nested recursion at the same subject position + +/(ab|(bc|(de|(?R))))/I +Capture group count = 3 +May match empty string +Subject length lower bound = 0 + abcd + 0: ab + 1: ab + fghi +Failed: error -52: nested recursion at the same subject position + +/(ab|(bc|(de|(?1))))/I +Capture group count = 3 +May match empty string +Subject length lower bound = 0 + abcd + 0: ab + 1: ab + fghi +Failed: error -52: nested recursion at the same subject position + +/x(ab|(bc|(de|(?1)x)x)x)/I +Capture group count = 3 +First code unit = 'x' +Subject length lower bound = 3 + xab123 + 0: xab + 1: ab + xfghi +Failed: error -52: nested recursion at the same subject position + +/(?!\w)(?R)/ + abcd +Failed: error -52: nested recursion at the same subject position + =abc +Failed: error -52: nested recursion at the same subject position + +/(?=\w)(?R)/ + =abc +Failed: error -52: nested recursion at the same subject position + abcd +Failed: error -52: nested recursion at the same subject position + +/(?abc + 1 ^ ^ End of pattern + 1 ^ ^ End of pattern + 1 ^^ End of pattern + 1 ^ ^ End of pattern + 1 ^^ End of pattern + 1 ^^ End of pattern +No match + +/(*NO_AUTO_POSSESS)\w+(?C1)/BI +------------------------------------------------------------------ + Bra + \w+ + Callout 1 26 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: no_auto_possess +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + abc\=callout_fail=1 +--->abc + 1 ^ ^ End of pattern + 1 ^ ^ End of pattern + 
1 ^^ End of pattern + 1 ^ ^ End of pattern + 1 ^^ End of pattern + 1 ^^ End of pattern +No match + +# This test breaks the JIT stack limit + +/(|]+){2,2452}/ + (|]+){2,2452} + 0: + 1: + +/(*LIMIT_HEAP=21)\[(a)]{60}/expand + \[a]{60} +Failed: error -63: heap limit exceeded + +/b(? + abcz + 0: abcz + < >> + +# End of testinput15 diff --git a/src/pcre2/testdata/testoutput16 b/src/pcre2/testdata/testoutput16 new file mode 100644 index 00000000..78d43bda --- /dev/null +++ b/src/pcre2/testdata/testoutput16 @@ -0,0 +1,17 @@ +# This test is run only when JIT support is not available. It checks that an +# attempt to use it has the expected behaviour. It also tests things that +# are different without JIT. + +/abc/I,jit,jitverify +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 +JIT support is not available in this version of PCRE2 + +/a*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + +# End of testinput16 diff --git a/src/pcre/testdata/testinput12 b/src/pcre2/testdata/testoutput17 similarity index 63% rename from src/pcre/testdata/testinput12 rename to src/pcre2/testdata/testoutput17 index 89ed4564..b66cfa32 100644 --- a/src/pcre/testdata/testinput12 +++ b/src/pcre2/testdata/testoutput17 @@ -1,109 +1,553 @@ -/-- This test is run only when JIT support is available. It checks for a -successful and an unsuccessful JIT compile and save and restore behaviour, -and a couple of things that are different with JIT. --/ +# This test is run only when JIT support is available. It checks JIT complete +# and partial modes, and things that are different with JIT. -/abc/S+I +#pattern jitverify -/(?(?C1)(?=a)a)/S+I +# JIT does not support this pattern (callout at start of condition). 
-/(?(?C1)(?=a)a)/S!+I +/(?(?C1)(?=a)a)/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 +JIT compilation was not successful (no more memory) -/b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*/S+I +# The following pattern cannot be compiled by JIT. -/abc/S+I>testsavedregex +/b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b
*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*b*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 +JIT compilation was not successful (no more memory) -a)++/ + aa\=find_limits +Minimum match limit = 1 + 0: aa (JIT) + aaaaaaaaa\=find_limits +Minimum match limit = 1 + 0: aaaaaaaaa (JIT) + +/(a)(?1)++/ + aa\=find_limits +Minimum match limit = 1 + 0: aa (JIT) + 1: a + aaaaaaaaa\=find_limits +Minimum match limit = 1 + 0: aaaaaaaaa (JIT) + 1: a + +/a(?:.)*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits +Minimum match limit = 22 + 0: abbbbbbbbbbbbbbbbbbbbba (JIT) + +/a(?:.(*THEN))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits +Minimum match limit = 22 + 0: abbbbbbbbbbbbbbbbbbbbba (JIT) + +/a(?:.(*THEN:ABC))*?a/ims + abbbbbbbbbbbbbbbbbbbbba\=find_limits +Minimum match limit = 22 + 0: abbbbbbbbbbbbbbbbbbbbba (JIT) + +/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/ + aabbccddee\=find_limits +Minimum match limit = 5 + 0: aabbccddee (JIT) + +/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/ + aabbccddee\=find_limits +Minimum match limit = 5 + 0: aabbccddee (JIT) + 1: aa + 2: bb + 3: cc + 4: dd + 5: ee + +/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/ + aabbccddee\=find_limits +Minimum match limit = 5 + 0: aabbccddee (JIT) + 1: aa + 2: cc + 3: ee + +/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/jitfast + aabbccddee\=find_limits +Minimum match limit = 5 + 0: aabbccddee (JIT) + 1: aa + 2: cc + 3: ee + aabbccddee\=jitstack=1 + 0: aabbccddee (JIT) + 1: aa + 2: cc + 3: ee + +/(a+)*zz/ +\= Expect no match + aaaaaaaaaaaaaz +No match (JIT) +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 +Failed: error -47: match limit exceeded + +/(*LIMIT_MATCH=3000)(a+)*zz/I +Capture group count = 1 +Match limit = 3000 +Starting code units: a z +Last code unit = 'z' +Subject length 
lower bound = 2 +JIT compilation was successful +\= Expect limit exceeded + aaaaaaaaaaaaaz +Failed: error -47: match limit exceeded +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=60000 +Failed: error -47: match limit exceeded + +/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I +Capture group count = 1 +Match limit = 3000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +JIT compilation was successful +\= Expect limit exceeded + aaaaaaaaaaaaaz +Failed: error -47: match limit exceeded + +/(*LIMIT_MATCH=60000)(a+)*zz/I +Capture group count = 1 +Match limit = 60000 +Starting code units: a z +Last code unit = 'z' +Subject length lower bound = 2 +JIT compilation was successful +\= Expect no match + aaaaaaaaaaaaaz +No match (JIT) +\= Expect limit exceeded + aaaaaaaaaaaaaz\=match_limit=3000 +Failed: error -47: match limit exceeded + +# These three have infinitely nested recursions. + +/((?2))((?1))/ +\= Expect JIT stack limit reached + abc +Failed: error -46: JIT stack limit reached + +/((?(R2)a+|(?1)b))()/ +\= Expect JIT stack limit reached + aaaabcde +Failed: error -46: JIT stack limit reached + +/(?(R)a*(?1)|((?R))b)/ +\= Expect JIT stack limit reached + aaaabcde +Failed: error -46: JIT stack limit reached + +# Invalid options disable JIT when called via pcre2_match(), causing the +# match to happen via the interpreter, but for fast JIT invalid options are +# ignored, so an unanchored match happens. + +/abcd/ + abcd\=anchored + 0: abcd +\= Expect no match + fail abcd\=anchored +No match + +/abcd/jitfast + abcd\=anchored + 0: abcd (JIT) + succeed abcd\=anchored + 0: abcd (JIT) + +# Push/pop does not lose the JIT information, though jitverify applies only to +# compilation, but serializing (save/load) discards JIT data completely. 
+ +/^abc\Kdef/info,push +** Applies only to compile when pattern is stacked with 'push': jitverify +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 6 +JIT compilation was successful +#pop jitverify + abcdef + 0: def (JIT) + +/^abc\Kdef/info,push +** Applies only to compile when pattern is stacked with 'push': jitverify +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 6 +JIT compilation was successful +#save testsaved1 +#load testsaved1 +#pop jitverify + abcdef + 0: def + +#load testsaved1 +#pop jit,jitverify + abcdef + 0: def (JIT) + +/abcd/pushcopy,jitverify +** Applies only to compile when pattern is stacked with 'push': jitverify + abcd + 0: abcd (JIT) + +#pop jitverify + abcd + 0: abcd + +# Test pattern compilation + +/(?:a|b|c|d|e)(?R)/jit=1 + +/(?:a|b|c|d|e)(?R)(?R)/jit=1 + +/(a(?:a|b|c|d|e)b){8,16}/jit=1 + +/(?:|a|){100}x/jit=1 + +# These tests provoke recursion loops, which give a different error message +# when JIT is used. 
+ +/(?R)/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 +JIT compilation was successful + abcd +Failed: error -46: JIT stack limit reached + +/(a|(?R))/I +Capture group count = 1 +May match empty string +Subject length lower bound = 0 +JIT compilation was successful + abcd + 0: a (JIT) + 1: a + defg +Failed: error -46: JIT stack limit reached + +/(ab|(bc|(de|(?R))))/I +Capture group count = 3 +May match empty string +Subject length lower bound = 0 +JIT compilation was successful + abcd + 0: ab (JIT) + 1: ab + fghi +Failed: error -46: JIT stack limit reached + +/(ab|(bc|(de|(?1))))/I +Capture group count = 3 +May match empty string +Subject length lower bound = 0 +JIT compilation was successful + abcd + 0: ab (JIT) + 1: ab + fghi +Failed: error -46: JIT stack limit reached + +/x(ab|(bc|(de|(?1)x)x)x)/I +Capture group count = 3 +First code unit = 'x' +Subject length lower bound = 3 +JIT compilation was successful + xab123 + 0: xab (JIT) + 1: ab + xfghi +Failed: error -46: JIT stack limit reached + +/(?!\w)(?R)/ + abcd +Failed: error -46: JIT stack limit reached + =abc +Failed: error -46: JIT stack limit reached + +/(?=\w)(?R)/ + =abc +Failed: error -46: JIT stack limit reached + abcd +Failed: error -46: JIT stack limit reached -/-- Test pattern compilation --/ +/(? 
+ 3: def + +/the quick brown fox/ + the quick brown fox + 0: the quick brown fox +\= Expect no match + The Quick Brown Fox +No match: POSIX code 17: match failed + +/the quick brown fox/i + the quick brown fox + 0: the quick brown fox + The Quick Brown Fox + 0: The Quick Brown Fox + +/(*LF)abc.def/ +\= Expect no match + abc\ndef +No match: POSIX code 17: match failed + +/(*LF)abc$/ + abc + 0: abc + abc\n + 0: abc + +/(abc)\2/ +Failed: POSIX code 15: bad back reference at offset 6 + +/(abc\1)/ +\= Expect no match + abc +No match: POSIX code 17: match failed + +/a*(b+)(z)(z)/ + aaaabbbbzzzz + 0: aaaabbbbzz + 1: bbbb + 2: z + 3: z + aaaabbbbzzzz\=ovector=0 +Matched without capture + aaaabbbbzzzz\=ovector=1 + 0: aaaabbbbzz + aaaabbbbzzzz\=ovector=2 + 0: aaaabbbbzz + 1: bbbb + +/(*ANY)ab.cd/ + ab-cd + 0: ab-cd + ab=cd + 0: ab=cd +\= Expect no match + ab\ncd +No match: POSIX code 17: match failed + +/ab.cd/s + ab-cd + 0: ab-cd + ab=cd + 0: ab=cd + ab\ncd + 0: ab\x0acd + +/a(b)c/posix_nosub + abc +Matched with REG_NOSUB + +/a(?Pb)c/posix_nosub + abc +Matched with REG_NOSUB + +/(a)\1/posix_nosub + zaay +Matched with REG_NOSUB + +/a?|b?/ + abc + 0: a +\= Expect no match + ddd\=notempty +No match: POSIX code 17: match failed + +/\w+A/ + CDAAAAB + 0: CDAAAA + +/\w+A/ungreedy + CDAAAAB + 0: CDA + +/\Biss\B/I,aftertext +** Ignored with POSIX interface: info + Mississippi + 0: iss + 0+ issippi + +/abc/\ +Failed: POSIX code 9: bad escape sequence at offset 4 + +"(?(?C)" +Failed: POSIX code 11: unbalanced () at offset 6 + +"(?(?C))" +Failed: POSIX code 3: pattern error at offset 6 + +/abcd/substitute_extended +** Ignored with POSIX interface: substitute_extended + +/\[A]{1000000}**/expand,regerror_buffsize=31 +Failed: POSIX code 4: ? * + invalid at offset 100000 +** regerror() message truncated + +/\[A]{1000000}**/expand,regerror_buffsize=32 +Failed: POSIX code 4: ? 
* + invalid at offset 1000001 + +//posix_nosub + \=offset=70000 +** Ignored with POSIX interface: offset +Matched with REG_NOSUB + +/(?=(a\K))/ + a +Start of matched string is beyond its end - displaying from end to start. + 0: a + 1: a + +/^d(e)$/posix + acdef\=posix_startend=2:4 + 0: de + 1: e + acde\=posix_startend=2 + 0: de + 1: e +\= Expect no match + acdef +No match: POSIX code 17: match failed + acdef\=posix_startend=2 +No match: POSIX code 17: match failed + +/^a\x{00}b$/posix + a\x{00}b\=posix_startend=0:3 + 0: a\x00b + +/"A" 00 "B"/hex + A\x{00}B\=posix_startend=0:3 + 0: A\x00B + +/ABC/use_length + ABC + 0: ABC + +/a\b(c/literal,posix + a\\b(c + 0: a\b(c + +/a\b(c/literal,posix,dotall +Failed: POSIX code 16: bad argument at offset 0 + +/((a)(b)?(c))/posix + 123ace + 0: ac + 1: ac + 2: a + 3: + 4: c + 123ace\=posix_startend=2:6 + 0: ac + 1: ac + 2: a + 3: + 4: c + +# End of testdata/testinput18 diff --git a/src/pcre2/testdata/testoutput19 b/src/pcre2/testdata/testoutput19 new file mode 100644 index 00000000..a4a8b1a7 --- /dev/null +++ b/src/pcre2/testdata/testoutput19 @@ -0,0 +1,25 @@ +# This set of tests is run only with the 8-bit library. It tests the POSIX +# interface with UTF/UCP support, which is supported only with the 8-bit +# library. This test should not be run with JIT (which is not available for the +# POSIX interface). + +#pattern posix + +/a\x{1234}b/utf + a\x{1234}b + 0: a\x{1234}b + +/\w/ +\= Expect no match + +++\x{c2} +No match: POSIX code 17: match failed + +/\w/ucp + +++\x{c2} + 0: \xc2 + +/"^AB" 00 "\x{1234}$"/hex,utf + AB\x{00}\x{1234}\=posix_startend=0:6 + 0: AB\x{00}\x{1234} + +# End of testdata/testinput19 diff --git a/src/pcre2/testdata/testoutput2 b/src/pcre2/testdata/testoutput2 new file mode 100644 index 00000000..6065ed71 --- /dev/null +++ b/src/pcre2/testdata/testoutput2 @@ -0,0 +1,17673 @@ +# This set of tests is not Perl-compatible. 
It checks on special features +# of PCRE2's API, error diagnostics, and the compiled code of some patterns. +# It also checks the non-Perl syntax that PCRE2 supports (Python, .NET, +# Oniguruma). There are also some tests where PCRE2 and Perl differ, +# either because PCRE2 can't be compatible, or there is a possible Perl +# bug. + +# NOTE: This is a non-UTF set of tests. When UTF support is needed, use +# test 5. + +#forbid_utf +#newline_default lf any anycrlf + +# Test binary zeroes in the pattern + +# /a\0B/ where 0 is a binary zero +/61 5c 00 62/B,hex +------------------------------------------------------------------ + Bra + a\x00b + Ket + End +------------------------------------------------------------------ + a\x{0}b + 0: a\x00b + +# /a0b/ where 0 is a binary zero +/61 00 62/B,hex +------------------------------------------------------------------ + Bra + a\x00b + Ket + End +------------------------------------------------------------------ + a\x{0}b + 0: a\x00b + +# /(?#B0C)DE/ where 0 is a binary zero +/28 3f 23 42 00 43 29 44 45/B,hex +------------------------------------------------------------------ + Bra + DE + Ket + End +------------------------------------------------------------------ + DE + 0: DE + +/(a)b|/I +Capture group count = 1 +May match empty string +Subject length lower bound = 0 + +/abc/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + abc + 0: abc + defabc + 0: abc + abc\=anchored + 0: abc +\= Expect no match + defabc\=anchored +No match + ABC +No match + +/^abc/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + abc + 0: abc + abc\=anchored + 0: abc +\= Expect no match + defabc +No match + defabc\=anchored +No match + +/a+bc/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/a*bc/I +Capture group count = 0 +Starting code units: a b +Last code unit = 
'c' +Subject length lower bound = 2 + +/a{3}bc/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 5 + +/(abc|a+z)/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 2 + +/^abc$/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + abc + 0: abc +\= Expect no match + def\nabc +No match + +/ab\idef/ +Failed: error 103 at offset 3: unrecognized character follows \ + +/(?X)ab\idef/ +Failed: error 111 at offset 2: unrecognized character after (? or (?- + +/x{5,4}/ +Failed: error 104 at offset 5: numbers out of order in {} quantifier + +/z{65536}/ +Failed: error 105 at offset 7: number too big in {} quantifier + +/[abcd/ +Failed: error 106 at offset 5: missing terminating ] for character class + +/[\B]/B +Failed: error 107 at offset 2: escape sequence is invalid in character class + +/[\R]/B +Failed: error 107 at offset 2: escape sequence is invalid in character class + +/[\X]/B +Failed: error 107 at offset 2: escape sequence is invalid in character class + +/[z-a]/ +Failed: error 108 at offset 3: range out of order in character class + +/^*/ +Failed: error 109 at offset 1: quantifier does not follow a repeatable item + +/(abc/ +Failed: error 114 at offset 4: missing closing parenthesis + +/(?# abc/ +Failed: error 118 at offset 7: missing ) after (?# comment + +/(?z)abc/ +Failed: error 111 at offset 2: unrecognized character after (? 
or (?- + +/.*b/I +Capture group count = 0 +First code unit at start or follows newline +Last code unit = 'b' +Subject length lower bound = 1 + +/.*?b/I +Capture group count = 0 +First code unit at start or follows newline +Last code unit = 'b' +Subject length lower bound = 1 + +/cat|dog|elephant/I +Capture group count = 0 +Starting code units: c d e +Subject length lower bound = 3 + this sentence eventually mentions a cat + 0: cat + this sentences rambles on and on for a while and then reaches elephant + 0: elephant + +/cat|dog|elephant/I +Capture group count = 0 +Starting code units: c d e +Subject length lower bound = 3 + this sentence eventually mentions a cat + 0: cat + this sentences rambles on and on for a while and then reaches elephant + 0: elephant + +/cat|dog|elephant/Ii +Capture group count = 0 +Options: caseless +Starting code units: C D E c d e +Subject length lower bound = 3 + this sentence eventually mentions a CAT cat + 0: CAT + this sentences rambles on and on for a while to elephant ElePhant + 0: elephant + +/a|[bcd]/I +Capture group count = 0 +Starting code units: a b c d +Subject length lower bound = 1 + +/(a|[^\dZ])/I +Capture group count = 1 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y [ \ ] ^ _ ` a b c d + e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 + \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 + \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 + \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 + \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf + \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce + \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd + \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec + \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb + \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/(a|b)*[\s]/I +Capture group count = 1 +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 a b +Subject length lower bound = 1 + +/(ab\2)/ +Failed: error 115 at offset 4: reference to non-existent subpattern + +/{4,5}abc/ +Failed: error 109 at offset 4: quantifier does not follow a repeatable item + +/(a)(b)(c)\2/I +Capture group count = 3 +Max back reference = 2 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 4 + abcb + 0: abcb + 1: a + 2: b + 3: c + abcb\=ovector=0 + 0: abcb + 1: a + 2: b + 3: c + abcb\=ovector=1 +Matched, but too many substrings + 0: abcb + abcb\=ovector=2 +Matched, but too many substrings + 0: abcb + 1: a + abcb\=ovector=3 +Matched, but too many substrings + 0: abcb + 1: a + 2: b + abcb\=ovector=4 + 0: abcb + 1: a + 2: b + 3: c + +/(a)bc|(a)(b)\2/I +Capture group count = 3 +Max back reference = 2 +First code unit = 'a' +Subject length lower bound = 3 + abc + 0: abc + 1: a + abc\=ovector=0 + 0: abc + 1: a + abc\=ovector=1 +Matched, but too many substrings + 0: abc + abc\=ovector=2 + 0: abc + 1: a + aba + 0: aba + 1: + 2: a + 3: b + aba\=ovector=0 + 0: aba + 1: + 2: a + 3: b + aba\=ovector=1 +Matched, but too 
many substrings + 0: aba + aba\=ovector=2 +Matched, but too many substrings + 0: aba + 1: + aba\=ovector=3 +Matched, but too many substrings + 0: aba + 1: + 2: a + aba\=ovector=4 + 0: aba + 1: + 2: a + 3: b + +/abc$/I,dollar_endonly +Capture group count = 0 +Options: dollar_endonly +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + abc + 0: abc +\= Expect no match + abc\n +No match + abc\ndef +No match + +/(a)(b)(c)(d)(e)\6/ +Failed: error 115 at offset 16: reference to non-existent subpattern + +/the quick brown fox/I +Capture group count = 0 +First code unit = 't' +Last code unit = 'x' +Subject length lower bound = 19 + the quick brown fox + 0: the quick brown fox + this is a line with the quick brown fox + 0: the quick brown fox + +/the quick brown fox/I,anchored +Capture group count = 0 +Options: anchored +First code unit = 't' +Subject length lower bound = 19 + the quick brown fox + 0: the quick brown fox +\= Expect no match + this is a line with the quick brown fox +No match + +/ab(?z)cd/ +Failed: error 111 at offset 4: unrecognized character after (? or (?- + +/^abc|def/I +Capture group count = 0 +Starting code units: a d +Subject length lower bound = 3 + abcdef + 0: abc + abcdef\=notbol + 0: def + +/.*((abc)$|(def))/I +Capture group count = 3 +First code unit at start or follows newline +Subject length lower bound = 3 + defabc + 0: defabc + 1: abc + 2: abc + defabc\=noteol + 0: def + 1: def + 2: + 3: def + +/)/ +Failed: error 122 at offset 0: unmatched closing parenthesis + +/a[]b/ +Failed: error 106 at offset 4: missing terminating ] for character class + +/[^aeiou ]{3,}/I +Capture group count = 0 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 + 7 8 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ + \ ] ^ _ ` b c d f g h j k l m n p q r s t v w x y z { | } ~ \x7f \x80 \x81 + \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 + \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f + \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae + \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd + \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc + \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb + \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea + \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 + \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 3 + co-processors, and for + 0: -pr + +/<.*>/I +Capture group count = 0 +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + abcghinop + 0: ghi + +/<.*?>/I +Capture group count = 0 +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + abcghinop + 0: + +/<.*>/I,ungreedy +Capture group count = 0 +Options: ungreedy +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + abcghinop + 0: + +/(?U)<.*>/I +Capture group count = 0 +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + abcghinop + 0: + +/<.*?>/I,ungreedy +Capture group count = 0 +Options: ungreedy +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + abcghinop + 0: ghi + +/={3,}/I,ungreedy +Capture group count = 0 +Options: ungreedy +First code unit = '=' +Last code unit = '=' +Subject length lower bound = 3 + abc========def + 0: === + +/(?U)={3,}?/I +Capture group count = 0 +First code unit = '=' +Last code unit = '=' +Subject length lower bound = 3 + abc========def + 0: ======== + +/(? 
+Overall options: anchored +First code unit = '1' +Subject length lower bound = 4 + +/(^b|(?i)^d)/I +Capture group count = 1 +Compile options: +Overall options: anchored +Starting code units: D b d +Subject length lower bound = 1 + +/(?s).*/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + +/[abcd]/I +Capture group count = 0 +Starting code units: a b c d +Subject length lower bound = 1 + +/(?i)[abcd]/I +Capture group count = 0 +Starting code units: A B C D a b c d +Subject length lower bound = 1 + +/(?m)[xy]|(b|c)/I +Capture group count = 1 +Starting code units: b c x y +Subject length lower bound = 1 + +/(^a|^b)/Im +Capture group count = 1 +Options: multiline +First code unit at start or follows newline +Subject length lower bound = 1 + +/(?i)(^a|^b)/Im +Capture group count = 1 +Options: multiline +First code unit at start or follows newline +Subject length lower bound = 1 + +/(a)(?(1)a|b|c)/ +Failed: error 127 at offset 3: conditional subpattern contains more than two branches + +/(?(?=a)a|b|c)/ +Failed: error 127 at offset 0: conditional subpattern contains more than two branches + +/(?(1a)/ +Failed: error 124 at offset 4: missing closing parenthesis for condition + +/(?(1a))/ +Failed: error 124 at offset 4: missing closing parenthesis for condition + +/(?(?i))/ +Failed: error 128 at offset 2: assertion expected after (?( or (?(?C) + +/(?(abc))/ +Failed: error 115 at offset 3: reference to non-existent subpattern + +/(?(? 
+Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + aaaaabbbbbcccccdef + 0: aaaaabbbbbcccccdef + 1: aaaaabbbbbcccccdef + 2: aaaaa + 3: b + 4: bbbbccccc + 5: def + +/(?<=foo)[ab]/I +Capture group count = 0 +Max lookbehind = 3 +Starting code units: a b +Subject length lower bound = 1 + +/(?^abc)/Im +Capture group count = 0 +Options: multiline +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + abc + 0: abc + def\nabc + 0: abc +\= Expect no match + defabc +No match + +/(?<=ab(c+)d)ef/ +Failed: error 125 at offset 0: lookbehind assertion is not fixed length + +/(?<=ab(?<=c+)d)ef/ +Failed: error 125 at offset 6: lookbehind assertion is not fixed length + +/(?<=ab(c|de)f)g/ +Failed: error 125 at offset 0: lookbehind assertion is not fixed length + +/The next three are in testinput2 because they have variable length branches/ + +/(?<=bullock|donkey)-cart/I +Capture group count = 0 +Max lookbehind = 7 +First code unit = '-' +Last code unit = 't' +Subject length lower bound = 5 + the bullock-cart + 0: -cart + a donkey-cart race + 0: -cart +\= Expect no match + cart +No match + horse-and-cart +No match + +/(?<=ab(?i)x|y|z)/I +Capture group count = 0 +Max lookbehind = 3 +May match empty string +Subject length lower bound = 0 + +/(?>.*)(?<=(abcd)|(xyz))/I +Capture group count = 2 +Max lookbehind = 4 +May match empty string +Subject length lower bound = 0 + alphabetabcd + 0: alphabetabcd + 1: abcd + endingxyz + 0: endingxyz + 1: + 2: xyz + +/(?<=ab(?i)x(?-i)y|(?i)z|b)ZZ/I +Capture group count = 0 +Max lookbehind = 4 +First code unit = 'Z' +Last code unit = 'Z' +Subject length lower bound = 2 + abxyZZ + 0: ZZ + abXyZZ + 0: ZZ + ZZZ + 0: ZZ + zZZ + 0: ZZ + bZZ + 0: ZZ + BZZ + 0: ZZ +\= Expect no match + ZZ +No match + abXYZZ +No match + zzz +No match + bzz +No match + +/(? 
+Overall options: anchored +Starting code units: a b +Subject length lower bound = 4 + adef\=get=1,get=2,get=3,get=4,getall + 0: adef + 1: a + 2: + 3: f + 1G a (1) +Get substring 2 failed (-55): requested value is not set + 3G f (1) +Get substring 4 failed (-49): unknown substring + 0L adef + 1L a + 2L + 3L f + bcdef\=get=1,get=2,get=3,get=4,getall + 0: bcdef + 1: bc + 2: bc + 3: f + 1G bc (2) + 2G bc (2) + 3G f (1) +Get substring 4 failed (-49): unknown substring + 0L bcdef + 1L bc + 2L bc + 3L f + adefghijk\=copy=0 + 0: adef + 1: a + 2: + 3: f + 0C adef (4) + +/^abc\00def/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 7 + abc\00def\=copy=0,getall + 0: abc\x00def + 0C abc\x00def (7) + 0L abc\x00def + +/word ((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ +)((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ )((?:[a-zA-Z0-9]+ +)?)?)?)?)?)?)?)?)?otherword/I +Capture group count = 8 +Contains explicit CR or LF match +First code unit = 'w' +Last code unit = 'd' +Subject length lower bound = 14 + +/.*X/IB +------------------------------------------------------------------ + Bra + Any* + X + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit at start or follows newline +Last code unit = 'X' +Subject length lower bound = 1 + +/.*X/IBs +------------------------------------------------------------------ + Bra + AllAny* + X + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: dotall +Overall options: anchored dotall +Last code unit = 'X' +Subject length lower bound = 1 + +/(.*X|^B)/IB +------------------------------------------------------------------ + Bra + CBra 1 + Any* + X + Alt + ^ + B + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +First code unit at start or follows 
newline +Subject length lower bound = 1 + +/(.*X|^B)/IBs +------------------------------------------------------------------ + Bra + CBra 1 + AllAny* + X + Alt + ^ + B + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Compile options: dotall +Overall options: anchored dotall +Subject length lower bound = 1 + +/(?s)(.*X|^B)/IB +------------------------------------------------------------------ + Bra + CBra 1 + AllAny* + X + Alt + ^ + B + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Compile options: +Overall options: anchored +Subject length lower bound = 1 + +/(?s:.*X|^B)/IB +------------------------------------------------------------------ + Bra + Bra + AllAny* + X + Alt + ^ + B + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Subject length lower bound = 1 + +/\Biss\B/I,aftertext +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'i' +Last code unit = 's' +Subject length lower bound = 3 + Mississippi + 0: iss + 0+ issippi + +/iss/I,aftertext,altglobal +Capture group count = 0 +First code unit = 'i' +Last code unit = 's' +Subject length lower bound = 3 + Mississippi + 0: iss + 0+ issippi + 0: iss + 0+ ippi + +/\Biss\B/I,aftertext,altglobal +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'i' +Last code unit = 's' +Subject length lower bound = 3 + Mississippi + 0: iss + 0+ issippi + +/\Biss\B/Ig,aftertext +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'i' +Last code unit = 's' +Subject length lower bound = 3 + Mississippi + 0: iss + 0+ issippi + 0: iss + 0+ ippi +\= Expect no match + Mississippi\=anchored +No match + +/(?<=[Ms])iss/Ig,aftertext +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'i' +Last code unit = 's' +Subject length lower bound = 3 + Mississippi + 0: iss + 
0+ issippi + 0: iss + 0+ ippi + +/(?<=[Ms])iss/I,aftertext,altglobal +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'i' +Last code unit = 's' +Subject length lower bound = 3 + Mississippi + 0: iss + 0+ issippi + +/^iss/Ig,aftertext +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'i' +Subject length lower bound = 3 + ississippi + 0: iss + 0+ issippi + +/.*iss/Ig,aftertext +Capture group count = 0 +First code unit at start or follows newline +Last code unit = 's' +Subject length lower bound = 3 + abciss\nxyzisspqr + 0: abciss + 0+ \x0axyzisspqr + 0: xyziss + 0+ pqr + +/.i./Ig,aftertext +Capture group count = 0 +Last code unit = 'i' +Subject length lower bound = 3 + Mississippi + 0: Mis + 0+ sissippi + 0: sis + 0+ sippi + 0: sip + 0+ pi + Mississippi\=anchored + 0: Mis + 0+ sissippi + 0: sis + 0+ sippi + 0: sip + 0+ pi + Missouri river + 0: Mis + 0+ souri river + 0: ri + 0+ river + 0: riv + 0+ er + Missouri river\=anchored + 0: Mis + 0+ souri river + +/^.is/Ig,aftertext +Capture group count = 0 +Compile options: +Overall options: anchored +Subject length lower bound = 3 + Mississippi + 0: Mis + 0+ sissippi + +/^ab\n/Ig,aftertext +Capture group count = 0 +Contains explicit CR or LF match +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + ab\nab\ncd + 0: ab\x0a + 0+ ab\x0acd + +/^ab\n/Igm,aftertext +Capture group count = 0 +Contains explicit CR or LF match +Options: multiline +First code unit at start or follows newline +Last code unit = \x0a +Subject length lower bound = 3 + ab\nab\ncd + 0: ab\x0a + 0+ ab\x0acd + 0: ab\x0a + 0+ cd + +/^/gm,newline=any + a\rb\nc\r\nxyz\=aftertext + 0: + 0+ a\x0db\x0ac\x0d\x0axyz + 0: + 0+ b\x0ac\x0d\x0axyz + 0: + 0+ c\x0d\x0axyz + 0: + 0+ xyz + +/abc/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/abc|bac/I +Capture group count = 0 +Starting code units: a b +Last code 
unit = 'c' +Subject length lower bound = 3 + +/(abc|bac)/I +Capture group count = 1 +Starting code units: a b +Last code unit = 'c' +Subject length lower bound = 3 + +/(abc|(c|dc))/I +Capture group count = 2 +Starting code units: a c d +Last code unit = 'c' +Subject length lower bound = 1 + +/(abc|(d|de)c)/I +Capture group count = 2 +Starting code units: a d +Last code unit = 'c' +Subject length lower bound = 2 + +/a*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + +/a+/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 1 + +/(baa|a+)/I +Capture group count = 1 +Starting code units: a b +Last code unit = 'a' +Subject length lower bound = 1 + +/a{0,3}/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + +/baa{3,}/I +Capture group count = 0 +First code unit = 'b' +Last code unit = 'a' +Subject length lower bound = 5 + +/"([^\\"]+|\\.)*"/I +Capture group count = 1 +First code unit = '"' +Last code unit = '"' +Subject length lower bound = 2 + +/(abc|ab[cd])/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 3 + +/(a|.)/I +Capture group count = 1 +Subject length lower bound = 1 + +/a|ba|\w/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/abc(?=pqr)/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'r' +Subject length lower bound = 3 + +/...(?<=abc)/I +Capture group count = 0 +Max lookbehind = 3 +Subject length lower bound = 3 + +/abc(?!pqr)/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/ab./I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + +/ab[xyz]/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 
+ +/abc*/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/ab.c*/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + +/a.c*/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 2 + +/.c*/I +Capture group count = 0 +Subject length lower bound = 1 + +/ac*/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 1 + +/(a.c*|b.c*)/I +Capture group count = 1 +Starting code units: a b +Subject length lower bound = 2 + +/a.c*|aba/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 2 + +/.+a/I +Capture group count = 0 +Last code unit = 'a' +Subject length lower bound = 2 + +/(?=abcda)a.*/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'a' +Subject length lower bound = 2 + +/(?=a)a.*/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 1 + +/a(b)*/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/a\d*/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 1 + +/ab\d*/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/a(\d)*/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/abcde{0,0}/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'd' +Subject length lower bound = 4 + +/ab\d+/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + +/a(?(1)b)(.)/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' +Subject length lower bound = 2 + +/a(?(1)bag|big)(.)/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' +Last code unit = 'g' +Subject length lower bound = 5 + +/a(?(1)bag|big)*(.)/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' +Subject length lower bound = 2 
+ +/a(?(1)bag|big)+(.)/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' +Last code unit = 'g' +Subject length lower bound = 5 + +/a(?(1)b..|b..)(.)/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 5 + +/ab\d{0}e/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 3 + +/a?b?/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + a + 0: a + b + 0: b + ab + 0: ab + \ + 0: +\= Expect no match + \=notempty +No match + +/|-/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + abcd + 0: + -abc + 0: + ab-c\=notempty + 0: - +\= Expect no match + abc\=notempty +No match + +/^.?abcd/I +Capture group count = 0 +Compile options: +Overall options: anchored +Last code unit = 'd' +Subject length lower bound = 4 + +/\( # ( at start + (?: # Non-capturing bracket + (?>[^()]+) # Either a sequence of non-brackets (no backtracking) + | # Or + (?R) # Recurse - i.e. 
nested bracketed string + )* # Zero or more contents + \) # Closing ) + /Ix +Capture group count = 0 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (abcd) + 0: (abcd) + (abcd)xyz + 0: (abcd) + xyz(abcd) + 0: (abcd) + (ab(xy)cd)pqr + 0: (ab(xy)cd) + (ab(xycd)pqr + 0: (xycd) + () abc () + 0: () + 12(abcde(fsh)xyz(foo(bar))lmno)89 + 0: (abcde(fsh)xyz(foo(bar))lmno) +\= Expect no match + abcd +No match + abcd) +No match + (abcd +No match + +/\( ( (?>[^()]+) | (?R) )* \) /Igx +Capture group count = 1 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(xy)cd)pqr + 0: (ab(xy)cd) + 1: cd + 1(abcd)(x(y)z)pqr + 0: (abcd) + 1: abcd + 0: (x(y)z) + 1: z + +/\( (?: (?>[^()]+) | (?R) ) \) /Ix +Capture group count = 0 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 3 + (abcd) + 0: (abcd) + (ab(xy)cd) + 0: (xy) + (a(b(c)d)e) + 0: (c) + ((ab)) + 0: ((ab)) +\= Expect no match + () +No match + +/\( (?: (?>[^()]+) | (?R) )? \) /Ix +Capture group count = 0 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + () + 0: () + 12(abcde(fsh)xyz(foo(bar))lmno)89 + 0: (fsh) + +/\( ( (?>[^()]+) | (?R) )* \) /Ix +Capture group count = 1 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(xy)cd) + 0: (ab(xy)cd) + 1: cd + +/\( ( ( (?>[^()]+) | (?R) )* ) \) /Ix +Capture group count = 2 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(xy)cd) + 0: (ab(xy)cd) + 1: ab(xy)cd + 2: cd + +/\( (123)? ( ( (?>[^()]+) | (?R) )* ) \) /Ix +Capture group count = 3 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(xy)cd) + 0: (ab(xy)cd) + 1: + 2: ab(xy)cd + 3: cd + (123ab(xy)cd) + 0: (123ab(xy)cd) + 1: 123 + 2: ab(xy)cd + 3: cd + +/\( ( (123)? 
( (?>[^()]+) | (?R) )* ) \) /Ix +Capture group count = 3 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(xy)cd) + 0: (ab(xy)cd) + 1: ab(xy)cd + 2: + 3: cd + (123ab(xy)cd) + 0: (123ab(xy)cd) + 1: 123ab(xy)cd + 2: 123 + 3: cd + +/\( (((((((((( ( (?>[^()]+) | (?R) )* )))))))))) \) /Ix +Capture group count = 11 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(xy)cd) + 0: (ab(xy)cd) + 1: ab(xy)cd + 2: ab(xy)cd + 3: ab(xy)cd + 4: ab(xy)cd + 5: ab(xy)cd + 6: ab(xy)cd + 7: ab(xy)cd + 8: ab(xy)cd + 9: ab(xy)cd +10: ab(xy)cd +11: cd + +/\( ( ( (?>[^()<>]+) | ((?>[^()]+)) | (?R) )* ) \) /Ix +Capture group count = 3 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (abcd(xyz<p>qrs)123) + 0: (abcd(xyz<p>qrs)123) + 1: abcd(xyz<p>qrs)123 + 2: 123 + +/\( ( ( (?>[^()]+) | ((?R)) )* ) \) /Ix +Capture group count = 3 +Options: extended +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 2 + (ab(cd)ef) + 0: (ab(cd)ef) + 1: ab(cd)ef + 2: ef + 3: (cd) + (ab(cd(ef)gh)ij) + 0: (ab(cd(ef)gh)ij) + 1: ab(cd(ef)gh)ij + 2: ij + 3: (cd(ef)gh) + +/^[[:alnum:]]/IB +------------------------------------------------------------------ + Bra + ^ + [0-9A-Za-z] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/^[[:^alnum:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-/:-@[-`{-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > + ?
@ [ \ ] ^ _ ` { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 + \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 + \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 + \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 + \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 + \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 + \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 + \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 + \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/^[[:alpha:]]/IB +------------------------------------------------------------------ + Bra + ^ + [A-Za-z] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/^[[:^alpha:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-@[-`{-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? 
@ [ \ ] ^ _ ` { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 + \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 + \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 + \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 + \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 + \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf + \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde + \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed + \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc + \xfd \xfe \xff +Subject length lower bound = 1 + +/[_[:alpha:]]/I +Capture group count = 0 +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/^[[:ascii:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-\x7f] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 + 5 6 7 8 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y + Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ + \x7f +Subject length lower bound = 1 + +/^[[:^ascii:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x80-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a + \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 + \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 + \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 + \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 + \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 + \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 + \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 + \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/^[[:blank:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x09 ] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x09 \x20 +Subject length lower bound = 1 + +/^[[:^blank:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-\x08\x0a-\x1f!-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b + \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a + \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 + : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ + _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 + \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f + \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e + \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad + \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc + \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb + \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda + \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 + \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 + \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[\n\x0b\x0c\x0d[:blank:]]/I +Capture group count = 0 +Contains explicit CR or LF match +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 +Subject length lower bound = 1 + +/^[[:cntrl:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-\x1f\x7f] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x7f +Subject length lower bound = 1 + +/^[[:digit:]]/IB +------------------------------------------------------------------ + Bra + ^ + [0-9] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Subject length lower bound = 1 + +/^[[:graph:]]/IB +------------------------------------------------------------------ + Bra + ^ + [!-~] + Ket + End 
+------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : + ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ + ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ +Subject length lower bound = 1 + +/^[[:lower:]]/IB +------------------------------------------------------------------ + Bra + ^ + [a-z] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/^[[:print:]]/IB +------------------------------------------------------------------ + Bra + ^ + [ -~] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 + 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] + ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ +Subject length lower bound = 1 + +/^[[:punct:]]/IB +------------------------------------------------------------------ + Bra + ^ + [!-/:-@[-`{-~] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: ! " # $ % & ' ( ) * + , - . / : ; < = > ? 
@ [ \ ] ^ + _ ` { | } ~ +Subject length lower bound = 1 + +/^[[:space:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x09-\x0d ] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 +Subject length lower bound = 1 + +/^[[:upper:]]/IB +------------------------------------------------------------------ + Bra + ^ + [A-Z] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z +Subject length lower bound = 1 + +/^[[:xdigit:]]/IB +------------------------------------------------------------------ + Bra + ^ + [0-9A-Fa-f] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f +Subject length lower bound = 1 + +/^[[:word:]]/IB +------------------------------------------------------------------ + Bra + ^ + [0-9A-Z_a-z] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/^[[:^cntrl:]]/IB +------------------------------------------------------------------ + Bra + ^ + [ -~\x80-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 + 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] + ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x80 \x81 + \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 + \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f + \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae + \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd + \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc + \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb + \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea + \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 + \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/^[12[:^digit:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-/12:-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 1 2 : ; < + = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a + b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 + \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 + \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 + \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf + \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe + \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd + \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc + \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb + \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa + \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/^[[:^blank:]]/IB +------------------------------------------------------------------ + Bra + ^ + [\x00-\x08\x0a-\x1f!-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0a \x0b + \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a + \x1b \x1c \x1d \x1e \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 + : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ + _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 + \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f + \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e + \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad + \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc + \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb + \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda + \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 + \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 + \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + +/[01[:alpha:]%]/IB +------------------------------------------------------------------ + Bra + [%01A-Za-z] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: % 0 1 A B C D E F G H I J K L M N O P Q R S T U V W + X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/[[.ch.]]/I +Failed: error 113 at offset 1: POSIX collating elements are not supported + +/[[=ch=]]/I +Failed: error 113 at offset 1: POSIX collating elements are not supported + +/[[:rhubarb:]]/I +Failed: error 130 at offset 3: unknown POSIX class name + +/[[:upper:]]/Ii +Capture group count = 0 +Options: caseless +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + A + 0: A + a + 0: a + +/[[:lower:]]/Ii +Capture group count = 0 +Options: caseless +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + A + 0: A + a + 0: a + +/((?-i)[[:lower:]])[[:lower:]]/Ii +Capture 
group count = 1 +Options: caseless +Starting code units: a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 2 + ab + 0: ab + 1: a + aB + 0: aB + 1: a +\= Expect no match + Ab +No match + AB +No match + +/[\200-\110]/I +Failed: error 108 at offset 9: range out of order in character class + +/^(?(0)f|b)oo/I +Failed: error 115 at offset 5: reference to non-existent subpattern + +# This one's here because of the large output vector needed + +/(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(
?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\d+(?:\s|$))(\w+)\s+(\270)/I +Capture 
group count = 271 +Max back reference = 270 +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Subject length lower bound = 1 + 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC\=ovector=300 + 0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 
249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 ABC ABC + 1: 1 + 2: 2 + 3: 3 + 4: 4 + 5: 5 + 6: 6 + 7: 7 + 8: 8 + 9: 9 +10: 10 +11: 11 +12: 12 +13: 13 +14: 14 +15: 15 +16: 16 +17: 17 +18: 18 +19: 19 +20: 20 +21: 21 +22: 22 +23: 23 +24: 24 +25: 25 +26: 26 +27: 27 +28: 28 +29: 29 +30: 30 +31: 31 +32: 32 +33: 33 +34: 34 +35: 35 +36: 36 +37: 37 +38: 38 +39: 39 +40: 40 +41: 41 +42: 42 +43: 43 +44: 44 +45: 45 +46: 46 +47: 47 +48: 48 +49: 49 +50: 50 +51: 51 +52: 52 +53: 53 +54: 54 +55: 55 +56: 56 +57: 57 +58: 58 +59: 59 +60: 60 +61: 61 +62: 62 +63: 63 +64: 64 +65: 65 +66: 66 +67: 67 +68: 68 +69: 69 +70: 70 +71: 71 +72: 72 +73: 73 +74: 74 +75: 75 +76: 76 +77: 77 +78: 78 +79: 79 +80: 80 +81: 81 +82: 82 +83: 83 +84: 84 +85: 85 +86: 86 +87: 87 +88: 88 +89: 89 +90: 90 +91: 91 +92: 92 +93: 93 +94: 94 +95: 95 +96: 96 +97: 97 +98: 98 +99: 99 +100: 100 +101: 101 +102: 102 +103: 103 +104: 104 +105: 105 +106: 106 +107: 107 +108: 108 +109: 109 +110: 110 +111: 111 +112: 112 +113: 113 +114: 114 +115: 115 +116: 116 +117: 117 +118: 118 +119: 119 +120: 120 +121: 121 +122: 122 +123: 123 +124: 124 +125: 125 +126: 126 +127: 127 +128: 128 +129: 129 +130: 130 +131: 131 +132: 132 +133: 133 +134: 134 +135: 135 +136: 136 +137: 137 +138: 138 +139: 139 +140: 140 +141: 141 +142: 142 +143: 143 +144: 144 +145: 145 +146: 146 +147: 147 +148: 148 +149: 149 +150: 150 +151: 151 +152: 152 +153: 153 +154: 154 +155: 155 +156: 156 +157: 157 +158: 158 +159: 159 +160: 160 +161: 161 +162: 162 +163: 163 +164: 164 +165: 165 +166: 166 +167: 167 +168: 168 +169: 169 +170: 170 +171: 171 +172: 172 +173: 173 +174: 174 +175: 175 +176: 176 +177: 177 +178: 178 +179: 179 +180: 180 +181: 181 +182: 182 +183: 183 +184: 184 +185: 185 +186: 186 +187: 187 +188: 188 +189: 189 +190: 190 +191: 191 +192: 192 +193: 193 +194: 194 +195: 195 +196: 196 +197: 197 +198: 198 +199: 199 +200: 200 +201: 201 +202: 202 +203: 203 +204: 204 +205: 205 +206: 206 +207: 207 +208: 208 +209: 209 +210: 210 +211: 211 
+212: 212 +213: 213 +214: 214 +215: 215 +216: 216 +217: 217 +218: 218 +219: 219 +220: 220 +221: 221 +222: 222 +223: 223 +224: 224 +225: 225 +226: 226 +227: 227 +228: 228 +229: 229 +230: 230 +231: 231 +232: 232 +233: 233 +234: 234 +235: 235 +236: 236 +237: 237 +238: 238 +239: 239 +240: 240 +241: 241 +242: 242 +243: 243 +244: 244 +245: 245 +246: 246 +247: 247 +248: 248 +249: 249 +250: 250 +251: 251 +252: 252 +253: 253 +254: 254 +255: 255 +256: 256 +257: 257 +258: 258 +259: 259 +260: 260 +261: 261 +262: 262 +263: 263 +264: 264 +265: 265 +266: 266 +267: 267 +268: 268 +269: 269 +270: ABC +271: ABC + +# This one's here because Perl does this differently and PCRE2 can't at present + +/(main(O)?)+/I +Capture group count = 2 +First code unit = 'm' +Last code unit = 'n' +Subject length lower bound = 4 + mainmain + 0: mainmain + 1: main + mainOmain + 0: mainOmain + 1: main + 2: O + +# These are all cases where Perl does it differently (nested captures) + +/^(a(b)?)+$/I +Capture group count = 2 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 1 + aba + 0: aba + 1: a + 2: b + +/^(aa(bb)?)+$/I +Capture group count = 2 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: aa + 2: bb + +/^(aa|aa(bb))+$/I +Capture group count = 2 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: aa + 2: bb + +/^(aa(bb)??)+$/I +Capture group count = 2 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: aa + 2: bb + +/^(?:aa(bb)?)+$/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: bb + +/^(aa(b(b))?)+$/I +Capture group count = 3 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + 
aabbaa + 0: aabbaa + 1: aa + 2: bb + 3: b + +/^(?:aa(b(b))?)+$/I +Capture group count = 2 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: bb + 2: b + +/^(?:aa(b(?:b))?)+$/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: bb + +/^(?:aa(bb(?:b))?)+$/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbbaa + 0: aabbbaa + 1: bbb + +/^(?:aa(b(?:bb))?)+$/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbbaa + 0: aabbbaa + 1: bbb + +/^(?:aa(?:b(b))?)+$/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbaa + 0: aabbaa + 1: b + +/^(?:aa(?:b(bb))?)+$/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbbaa + 0: aabbbaa + 1: bb + +/^(aa(b(bb))?)+$/I +Capture group count = 3 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbbaa + 0: aabbbaa + 1: aa + 2: bbb + 3: bb + +/^(aa(bb(bb))?)+$/I +Capture group count = 3 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + aabbbbaa + 0: aabbbbaa + 1: aa + 2: bbbb + 3: bb + +# ---------------- + +/#/IBx +------------------------------------------------------------------ + Bra + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/IBx +------------------------------------------------------------------ + Bra + a + Ket + End +------------------------------------------------------------------ +Capture group 
count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/[\s]/IB +------------------------------------------------------------------ + Bra + [\x09-\x0d ] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 +Subject length lower bound = 1 + +/[\S]/IB +------------------------------------------------------------------ + Bra + [\x00-\x08\x0e-\x1f!-\xff] (neg) + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f + \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e + \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C + D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h + i j k l m n o p q r s t u v w x y z { | } ~ \x7f \x80 \x81 \x82 \x83 \x84 + \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e \x8f \x90 \x91 \x92 \x93 + \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d \x9e \x9f \xa0 \xa1 \xa2 + \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac \xad \xae \xaf \xb0 \xb1 + \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb \xbc \xbd \xbe \xbf \xc0 + \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf + \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde + \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed + \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc + \xfd \xfe \xff +Subject length lower bound = 1 + +/a(?i)b/IB +------------------------------------------------------------------ + Bra + a + /i b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' (caseless) +Subject length lower bound = 2 + ab + 0: ab + aB + 0: aB +\= Expect no match + AB +No 
match + +/(a(?i)b)/IB +------------------------------------------------------------------ + Bra + CBra 1 + a + /i b + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +First code unit = 'a' +Last code unit = 'b' (caseless) +Subject length lower bound = 2 + ab + 0: ab + 1: ab + aB + 0: aB + 1: aB +\= Expect no match + AB +No match + +/ (?i)abc/IBx +------------------------------------------------------------------ + Bra + /i abc + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' (caseless) +Last code unit = 'c' (caseless) +Subject length lower bound = 3 + +/#this is a comment + (?i)abc/IBx +------------------------------------------------------------------ + Bra + /i abc + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' (caseless) +Last code unit = 'c' (caseless) +Subject length lower bound = 3 + +/123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/IB +------------------------------------------------------------------ + Bra + 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = '1' +Last code unit = '0' +Subject length lower bound = 300 + 
+/\Q123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890/IB +------------------------------------------------------------------ + Bra + 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890 + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = '1' +Last code unit = '0' +Subject length lower bound = 300 + +/\Q\E/IB +------------------------------------------------------------------ + Bra + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + \ + 0: + +/\Q\Ex/IB +------------------------------------------------------------------ + Bra + x + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'x' +Subject length lower bound = 1 + +/ \Q\E/IB +------------------------------------------------------------------ + Bra + + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = ' ' +Subject length lower bound = 1 + +/a\Q\E/IB +------------------------------------------------------------------ + Bra + a + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 1 + abc + 0: a + bca + 0: a + bac + 0: a + +/a\Q\Eb/IB +------------------------------------------------------------------ + 
Bra + ab + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + abc + 0: ab + +/\Q\Eabc/IB +------------------------------------------------------------------ + Bra + abc + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/x*+\w/IB +------------------------------------------------------------------ + Bra + x*+ + \w + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 +\= Expect no match + xxxxx +No match + +/x?+/IB +------------------------------------------------------------------ + Bra + x?+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + +/x++/IB +------------------------------------------------------------------ + Bra + x++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'x' +Subject length lower bound = 1 + +/x{1,3}+/B,no_auto_possess +------------------------------------------------------------------ + Bra + x + x{0,2}+ + Ket + End +------------------------------------------------------------------ + +/x{1,3}+/Bi,no_auto_possess +------------------------------------------------------------------ + Bra + /i x + /i x{0,2}+ + Ket + End +------------------------------------------------------------------ + +/[^x]{1,3}+/B,no_auto_possess +------------------------------------------------------------------ + Bra + [^x] + [^x]{0,2}+ + Ket + End 
+------------------------------------------------------------------ + +/[^x]{1,3}+/Bi,no_auto_possess +------------------------------------------------------------------ + Bra + /i [^x] + /i [^x]{0,2}+ + Ket + End +------------------------------------------------------------------ + +/(x)*+/IB +------------------------------------------------------------------ + Bra + Braposzero + CBraPos 1 + x + KetRpos + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +May match empty string +Subject length lower bound = 0 + +/^(\w++|\s++)*$/I +Capture group count = 1 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + now is the time for all good men to come to the aid of the party + 0: now is the time for all good men to come to the aid of the party + 1: party +\= Expect no match + this is not a line with only words and spaces! +No match + +/(\d++)(\w)/I +Capture group count = 2 +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Subject length lower bound = 2 + 12345a + 0: 12345a + 1: 12345 + 2: a +\= Expect no match + 12345+ +No match + +/a++b/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + aaab + 0: aaab + +/(a++b)/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + aaab + 0: aaab + 1: aaab + +/(a++)b/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + aaab + 0: aaab + 1: aaa + +/([^()]++|\([^()]*\))+/I +Capture group count = 1 +Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a + \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 + \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( * + , - . / 0 1 2 3 4 5 + 6 7 8 9 : ; < = > ? 
@ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f + \x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d \x8e + \x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c \x9d + \x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab \xac + \xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba \xbb + \xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca + \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 + \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 + \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 + \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff +Subject length lower bound = 1 + ((abc(ade)ufh()()x + 0: abc(ade)ufh()()x + 1: x + +/\(([^()]++|\([^()]+\))+\)/I +Capture group count = 1 +First code unit = '(' +Last code unit = ')' +Subject length lower bound = 3 + (abc) + 0: (abc) + 1: abc + (abc(def)xyz) + 0: (abc(def)xyz) + 1: xyz +\= Expect no match + ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +No match + +/(abc){1,3}+/IB +------------------------------------------------------------------ + Bra + Once + CBra 1 + abc + Ket + Brazero + Bra + CBra 1 + abc + Ket + Brazero + CBra 1 + abc + Ket + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/a+?+/I +Failed: error 109 at offset 3: quantifier does not follow a repeatable item + +/a{2,3}?+b/I +Failed: error 109 at offset 7: quantifier does not follow a repeatable item + +/(?U)a+?+/I +Failed: error 109 at offset 7: quantifier does not follow a repeatable item + +/a{2,3}?+b/I,ungreedy +Failed: error 109 at offset 7: quantifier does not follow a repeatable item + +/x(?U)a++b/IB 
+------------------------------------------------------------------ + Bra + x + a++ + b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'x' +Last code unit = 'b' +Subject length lower bound = 3 + xaaaab + 0: xaaaab + +/(?U)xa++b/IB +------------------------------------------------------------------ + Bra + x + a++ + b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'x' +Last code unit = 'b' +Subject length lower bound = 3 + xaaaab + 0: xaaaab + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/IB +------------------------------------------------------------------ + Bra + ^ + CBra 1 + CBra 2 + a+ + Ket + CBra 3 + [ab]+? + Ket + CBra 4 + [bc]+ + Ket + CBra 5 + \w*+ + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 5 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + +/^x(?U)a+b/IB +------------------------------------------------------------------ + Bra + ^ + x + a++ + b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'x' +Last code unit = 'b' +Subject length lower bound = 3 + +/^x(?U)(a+)b/IB +------------------------------------------------------------------ + Bra + ^ + x + CBra 1 + a+? 
+ Ket + b + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'x' +Last code unit = 'b' +Subject length lower bound = 3 + +/[.x.]/I +Failed: error 113 at offset 0: POSIX collating elements are not supported + +/[=x=]/I +Failed: error 113 at offset 0: POSIX collating elements are not supported + +/[:x:]/I +Failed: error 112 at offset 0: POSIX named classes are supported only within a class + +/\F/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +/\l/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +/\L/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +/\N{name}/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +/\u/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +/\U/I +Failed: error 137 at offset 2: PCRE2 does not support \F, \L, \l, \N{name}, \U, or \u + +/a{1,3}b/ungreedy + ab + 0: ab + +/[/I +Failed: error 106 at offset 1: missing terminating ] for character class + +/[a-/I +Failed: error 106 at offset 3: missing terminating ] for character class + +/[[:space:]/I +Failed: error 106 at offset 10: missing terminating ] for character class + +/[\s]/IB +------------------------------------------------------------------ + Bra + [\x09-\x0d ] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 +Subject length lower bound = 1 + +/[[:space:]]/IB +------------------------------------------------------------------ + Bra + [\x09-\x0d ] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 +Subject length lower bound = 1 + +/[[:space:]abcde]/IB 
+------------------------------------------------------------------ + Bra + [\x09-\x0d a-e] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 a b c d e +Subject length lower bound = 1 + +/< (?: (?(R) \d++ | [^<>]*+) | (?R)) * >/Ix +Capture group count = 0 +Options: extended +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + <> + 0: <> + + 0: + hij> + 0: hij> + hij> + 0: + def> + 0: def> + + 0: <> +\= Expect no match + iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b/IB +------------------------------------------------------------------ + Bra + 8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X + \b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Max lookbehind = 1 +First code unit = '8' +Last code unit = 'X' +Subject length lower bound = 409 + 
+/\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b/IB +------------------------------------------------------------------ + Bra + $<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X + \b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Max lookbehind = 1 +First code unit = '$' +Last code unit = 'X' +Subject length lower bound = 404 + +/(.*)\d+\1/I +Capture group count = 1 +Max back reference = 1 +Subject length lower bound = 1 + +/(.*)\d+/I +Capture group count = 1 +First code unit at start or follows newline +Subject length lower bound = 1 + +/(.*)\d+\1/Is +Capture group count = 1 +Max back reference = 1 +Options: dotall +Subject length lower bound = 1 + +/(.*)\d+/Is +Capture group count = 1 +Compile options: dotall +Overall options: anchored dotall +Subject length lower bound = 1 + +/(.*(xyz))\d+\2/I +Capture group count = 2 +Max back reference = 2 +First code unit at start or follows newline +Last code unit = 'z' +Subject length lower bound = 7 + +/((.*))\d+\1/I +Capture group count = 2 +Max back reference = 1 +Subject length lower bound = 1 + abc123bc + 0: bc123bc + 1: bc + 2: bc + +/a[b]/I +Capture group count = 0 +First 
code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/(?=a).*/I +Capture group count = 0 +May match empty string +First code unit = 'a' +Subject length lower bound = 1 + +/(?=abc).xyz/Ii +Capture group count = 0 +Options: caseless +First code unit = 'a' (caseless) +Last code unit = 'z' (caseless) +Subject length lower bound = 4 + +/(?=abc)(?i).xyz/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'z' (caseless) +Subject length lower bound = 4 + +/(?=a)(?=b)/I +Capture group count = 0 +May match empty string +First code unit = 'a' +Subject length lower bound = 1 + +/(?=.)a/I +Capture group count = 0 +First code unit = 'a' +Subject length lower bound = 1 + +/((?=abcda)a)/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'a' +Subject length lower bound = 2 + +/((?=abcda)ab)/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/()a/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/(?:(?=.)|(?abcdef + 0 ^ ^ d + 0: abcdef + 1234abcdef +--->1234abcdef + 0 ^ ^ d + 0: abcdef +\= Expect no match + abcxyz +No match + abcxyzf +--->abcxyzf + 0 ^ ^ d +No match + +/abc(?C)de(?C1)f/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'f' +Subject length lower bound = 6 + 123abcdef +--->123abcdef + 0 ^ ^ d + 1 ^ ^ f + 0: abcdef + +/(?C1)\dabc(?C2)def/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Last code unit = 'f' +Subject length lower bound = 7 + 1234abcdef +--->1234abcdef + 1 ^ \d + 1 ^ \d + 1 ^ \d + 1 ^ \d + 2 ^ ^ d + 0: 4abcdef +\= Expect no match + abcdef +No match + +/(?C1)\dabc(?C2)def/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Last code unit = 'f' +Subject length lower bound = 7 + 1234abcdef +--->1234abcdef + 1 ^ \d + 1 ^ \d + 1 ^ \d + 1 ^ \d + 2 ^ ^ d + 0: 4abcdef +\= Expect no match + abcdef +No match + +/(?C255)ab/I +Capture group count = 0 +First code unit = 
'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/(?C256)ab/I +Failed: error 138 at offset 6: number after (?C is greater than 255 + +/(?Cab)xx/I +Failed: error 182 at offset 3: unrecognized string delimiter follows (?C + +/(?C12vr)x/I +Failed: error 139 at offset 5: closing parenthesis for (?C expected + +/abc(?C)def/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'f' +Subject length lower bound = 6 + \x83\x0\x61bcdef +--->\x83\x00abcdef + 0 ^ ^ d + 0: abcdef + +/(abc)(?C)de(?C1)f/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'f' +Subject length lower bound = 6 + 123abcdef +--->123abcdef + 0 ^ ^ d + 1 ^ ^ f + 0: abcdef + 1: abc + 123abcdef\=callout_capture +Callout 0: last capture = 1 + 1: abc +--->123abcdef + ^ ^ d +Callout 1: last capture = 1 + 1: abc +--->123abcdef + ^ ^ f + 0: abcdef + 1: abc + 123abcdefC-\=callout_none + 0: abcdef + 1: abc +\= Expect no match + 123abcdef\=callout_fail=1 +--->123abcdef + 0 ^ ^ d + 1 ^ ^ f +No match + +/(?C0)(abc(?C1))*/I +Capture group count = 1 +May match empty string +Subject length lower bound = 0 + abcabcabc +--->abcabcabc + 0 ^ ( + 1 ^ ^ )* + 1 ^ ^ )* + 1 ^ ^ )* + 0: abcabcabc + 1: abc + abcabc\=callout_fail=1:4 +--->abcabc + 0 ^ ( + 1 ^ ^ )* + 1 ^ ^ )* + 0: abcabc + 1: abc + abcabcabc\=callout_fail=1:4 +--->abcabcabc + 0 ^ ( + 1 ^ ^ )* + 1 ^ ^ )* + 1 ^ ^ )* + 0: abcabc + 1: abc + +/(\d{3}(?C))*/I +Capture group count = 1 +May match empty string +Subject length lower bound = 0 + 123\=callout_capture +Callout 0: last capture = 0 +--->123 + ^ ^ )* + 0: 123 + 1: 123 + 123456\=callout_capture +Callout 0: last capture = 0 +--->123456 + ^ ^ )* +Callout 0: last capture = 1 + 1: 123 +--->123456 + ^ ^ )* + 0: 123456 + 1: 456 + 123456789\=callout_capture +Callout 0: last capture = 0 +--->123456789 + ^ ^ )* +Callout 0: last capture = 1 + 1: 123 +--->123456789 + ^ ^ )* +Callout 0: last capture = 1 + 1: 456 +--->123456789 + ^ ^ )* + 0: 123456789 + 1: 789 + 
+/((xyz)(?C)p|(?C1)xyzabc)/I +Capture group count = 2 +First code unit = 'x' +Subject length lower bound = 4 + xyzabc\=callout_capture +Callout 0: last capture = 2 + 1: + 2: xyz +--->xyzabc + ^ ^ p +Callout 1: last capture = 0 +--->xyzabc + ^ x + 0: xyzabc + 1: xyzabc + +/(X)((xyz)(?C)p|(?C1)xyzabc)/I +Capture group count = 3 +First code unit = 'X' +Last code unit = 'x' +Subject length lower bound = 5 + Xxyzabc\=callout_capture +Callout 0: last capture = 3 + 1: X + 2: + 3: xyz +--->Xxyzabc + ^ ^ p +Callout 1: last capture = 1 + 1: X +--->Xxyzabc + ^^ x + 0: Xxyzabc + 1: X + 2: xyzabc + +/(?=(abc))(?C)abcdef/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'f' +Subject length lower bound = 6 + abcdef\=callout_capture +Callout 0: last capture = 1 + 1: abc +--->abcdef + ^ a + 0: abcdef + 1: abc + +/(?!(abc)(?C1)d)(?C2)abcxyz/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'z' +Subject length lower bound = 6 + abcxyz\=callout_capture +Callout 1: last capture = 1 + 1: abc +--->abcxyz + ^ ^ d +Callout 2: last capture = 0 +--->abcxyz + ^ a + 0: abcxyz + +/(?<=(abc)(?C))xyz/I +Capture group count = 1 +Max lookbehind = 3 +First code unit = 'x' +Last code unit = 'z' +Subject length lower bound = 3 + abcxyz\=callout_capture +Callout 0: last capture = 1 + 1: abc +--->abcxyz + ^ ) + 0: xyz + 1: abc + +/a(b+)(c*)(?C1)/I +Capture group count = 2 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 +\= Expect no match + abbbbbccc\=callout_data=1 +--->abbbbbccc + 1 ^ ^ End of pattern +Callout data = 1 +No match + +/a(b+?)(c*?)(?C1)/I +Capture group count = 2 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 +\= Expect no match + abbbbbccc\=callout_data=1 +--->abbbbbccc + 1 ^ ^ End of pattern +Callout data = 1 + 1 ^ ^ End of pattern +Callout data = 1 + 1 ^ ^ End of pattern +Callout data = 1 + 1 ^ ^ End of pattern +Callout data = 1 + 1 ^ ^ End of pattern +Callout data = 1 + 1 ^ ^ End of pattern 
+Callout data = 1 + 1 ^ ^ End of pattern +Callout data = 1 + 1 ^ ^ End of pattern +Callout data = 1 +No match + +/(?C)abc/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/(?C)^abc/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + +/(?C)a|b/I +Capture group count = 0 +Starting code units: a b +Subject length lower bound = 1 + +/a|(b)(?C)/I +Capture group count = 1 +Starting code units: a b +Subject length lower bound = 1 + b +--->b + 0 ^^ End of pattern + 0: b + 1: b + +/x(ab|(bc|(de|(?R))))/I +Capture group count = 3 +First code unit = 'x' +Subject length lower bound = 3 + xab + 0: xab + 1: ab + xbc + 0: xbc + 1: bc + 2: bc + xde + 0: xde + 1: de + 2: de + 3: de + xxab + 0: xxab + 1: xab + 2: xab + 3: xab + xxxab + 0: xxxab + 1: xxab + 2: xxab + 3: xxab +\= Expect no match + xyab +No match + +/^([^()]|\((?1)*\))*$/I +Capture group count = 1 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + abc + 0: abc + 1: c + a(b)c + 0: a(b)c + 1: c + a(b(c))d + 0: a(b(c))d + 1: d +\= Expect no match) + a(b(c)d +No match + +/^>abc>([^()]|\((?1)*\))* +Overall options: anchored +First code unit = '>' +Last code unit = '<' +Subject length lower bound = 10 + >abc>123abc>123abc>1(2)3abc>1(2)3abc>(1(2)3)abc>(1(2)3) +Overall options: anchored +Starting code units: ( - 0 1 2 3 4 5 6 7 8 9 +Subject length lower bound = 1 + 12 + 0: 12 + 1: 12 + (((2+2)*-3)-7) + 0: (((2+2)*-3)-7) + 1: (((2+2)*-3)-7) + 2: - + -12 + 0: -12 + 1: -12 +\= Expect no match + ((2+2)*-3)-7) +No match + +/^(x(y|(?1){2})z)/I +Capture group count = 2 +Compile options: +Overall options: anchored +First code unit = 'x' +Subject length lower bound = 3 + xyz + 0: xyz + 1: xyz + 2: y + xxyzxyzz + 0: xxyzxyzz + 1: xxyzxyzz + 2: xyzxyz +\= Expect no match + xxyzz +No match + xxyzxyzxyzz +No match + +/((< (?: (?(R) \d++ | [^<>]*+) | (?2)) 
* >))/Ix +Capture group count = 2 +Options: extended +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 2 + <> + 0: <> + 1: <> + 2: <> + + 0: + 1: + 2: + hij> + 0: hij> + 1: hij> + 2: hij> + hij> + 0: + 1: + 2: + def> + 0: def> + 1: def> + 2: def> + + 0: <> + 1: <> + 2: <> +\= Expect no match + +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 9 + abcdefabc + 0: abcdefabc + 1: abc + +/^(a|b|c)=(?1)+/I +Capture group count = 1 +Compile options: +Overall options: anchored +Starting code units: a b c +Subject length lower bound = 2 + a=a + 0: a=a + 1: a + a=b + 0: a=b + 1: a + a=bc + 0: a=bc + 1: a + +/^(a|b|c)=((?1))+/I +Capture group count = 2 +Compile options: +Overall options: anchored +Starting code units: a b c +Subject length lower bound = 2 + a=a + 0: a=a + 1: a + 2: a + a=b + 0: a=b + 1: a + 2: b + a=bc + 0: a=bc + 1: a + 2: c + +/a(?Pb|c)d(?Pe)/IB +------------------------------------------------------------------ + Bra + a + CBra 1 + b + Alt + c + Ket + d + CBra 2 + e + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +Named capture groups: + longername2 2 + name1 1 +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 4 + abde + 0: abde + 1: b + 2: e + acde + 0: acde + 1: c + 2: e + +/(?:a(?Pc(?Pd)))(?Pa)/IB +------------------------------------------------------------------ + Bra + Bra + a + CBra 1 + c + CBra 2 + d + Ket + Ket + Ket + CBra 3 + a + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 3 +Named capture groups: + a 3 + c 1 + d 2 +First code unit = 'a' +Last code unit = 'a' +Subject length lower bound = 4 + +/(?Pa)...(?P=a)bbb(?P>a)d/IB +------------------------------------------------------------------ + Bra + CBra 1 + a + Ket + Any + Any + Any + \1 + bbb + Recurse + d + Ket + End +------------------------------------------------------------------ +Capture 
group count = 1 +Max back reference = 1 +Named capture groups: + a 1 +First code unit = 'a' +Last code unit = 'd' +Subject length lower bound = 10 + +/^\W*(?:(?P(?P.)\W*(?P>one)\W*(?P=two)|)|(?P(?P.)\W*(?P>three)\W*(?P=four)|\W*.\W*))\W*$/Ii +Capture group count = 4 +Max back reference = 4 +Named capture groups: + four 4 + one 1 + three 3 + two 2 +May match empty string +Compile options: caseless +Overall options: anchored caseless +Subject length lower bound = 0 + 1221 + 0: 1221 + 1: 1221 + 2: 1 + Satan, oscillate my metallic sonatas! + 0: Satan, oscillate my metallic sonatas! + 1: + 2: + 3: Satan, oscillate my metallic sonatas + 4: S + A man, a plan, a canal: Panama! + 0: A man, a plan, a canal: Panama! + 1: + 2: + 3: A man, a plan, a canal: Panama + 4: A + Able was I ere I saw Elba. + 0: Able was I ere I saw Elba. + 1: + 2: + 3: Able was I ere I saw Elba + 4: A +\= Expect no match + The quick brown fox +No match + +/((?(R)a|b))\1(?1)?/I +Capture group count = 1 +Max back reference = 1 +Subject length lower bound = 2 + bb + 0: bb + 1: b + bbaa + 0: bba + 1: b + +/(.*)a/Is +Capture group count = 1 +Compile options: dotall +Overall options: anchored dotall +Last code unit = 'a' +Subject length lower bound = 1 + +/(.*)a\1/Is +Capture group count = 1 +Max back reference = 1 +Options: dotall +Last code unit = 'a' +Subject length lower bound = 1 + +/(.*)a(b)\2/Is +Capture group count = 2 +Max back reference = 2 +Compile options: dotall +Overall options: anchored dotall +Last code unit = 'b' +Subject length lower bound = 3 + +/((.*)a|(.*)b)z/Is +Capture group count = 3 +Compile options: dotall +Overall options: anchored dotall +Last code unit = 'z' +Subject length lower bound = 2 + +/((.*)a|(.*)b)z\1/Is +Capture group count = 3 +Max back reference = 1 +Options: dotall +Last code unit = 'z' +Subject length lower bound = 3 + +/((.*)a|(.*)b)z\2/Is +Capture group count = 3 +Max back reference = 2 +Options: dotall +Last code unit = 'z' +Subject length lower bound = 2 + 
+/((.*)a|(.*)b)z\3/Is +Capture group count = 3 +Max back reference = 3 +Options: dotall +Last code unit = 'z' +Subject length lower bound = 2 + +/((.*)a|^(.*)b)z\3/Is +Capture group count = 3 +Max back reference = 3 +Compile options: dotall +Overall options: anchored dotall +Last code unit = 'z' +Subject length lower bound = 2 + +/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a/Is +Capture group count = 31 +May match empty string +Compile options: dotall +Overall options: anchored dotall +Subject length lower bound = 0 + +/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\31/Is +Capture group count = 31 +Max back reference = 31 +May match empty string +Options: dotall +Subject length lower bound = 0 + +/(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)|(.*)a\32/Is +Capture group count = 32 +Max back reference = 32 +May match empty string +Options: dotall +Subject length lower bound = 0 + +/(a)(bc)/IB,no_auto_capture +------------------------------------------------------------------ + Bra + Bra + a + Ket + Bra + bc + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: no_auto_capture +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + abc + 0: abc + +/(?Pa)(bc)/IB,no_auto_capture +------------------------------------------------------------------ + Bra + CBra 1 + a + Ket + Bra + bc + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Named capture groups: + one 1 +Options: no_auto_capture +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + abc + 0: abc + 1: a + 
+/(a)(?Pbc)/IB,no_auto_capture +------------------------------------------------------------------ + Bra + Bra + a + Ket + CBra 1 + bc + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Named capture groups: + named 1 +Options: no_auto_capture +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 3 + +/(aaa(?C1)bbb|ab)/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + aaabbb +--->aaabbb + 1 ^ ^ b + 0: aaabbb + 1: aaabbb + aaabbb\=callout_data=0 +--->aaabbb + 1 ^ ^ b + 0: aaabbb + 1: aaabbb + aaabbb\=callout_data=1 +--->aaabbb + 1 ^ ^ b +Callout data = 1 + 0: ab + 1: ab +\= Expect no match + aaabbb\=callout_data=-1 +--->aaabbb + 1 ^ ^ b +Callout data = -1 +No match + +/ab(?Pcd)ef(?Pgh)/I +Capture group count = 2 +Named capture groups: + one 1 + two 2 +First code unit = 'a' +Last code unit = 'h' +Subject length lower bound = 8 + abcdefgh + 0: abcdefgh + 1: cd + 2: gh + abcdefgh\=copy=1,get=two + 0: abcdefgh + 1: cd + 2: gh + 1C cd (2) + G gh (2) two (group 2) + abcdefgh\=copy=one,copy=two + 0: abcdefgh + 1: cd + 2: gh + C cd (2) one (group 1) + C gh (2) two (group 2) + abcdefgh\=copy=three + 0: abcdefgh + 1: cd + 2: gh +Number not found for group 'three' +Copy substring 'three' failed (-49): unknown substring + +/(?P)(?P)/IB +------------------------------------------------------------------ + Bra + CBra 1 + Ket + CBra 2 + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +Named capture groups: + Tes 1 + Test 2 +May match empty string +Subject length lower bound = 0 + +/(?P)(?P)/IB +------------------------------------------------------------------ + Bra + CBra 1 + Ket + CBra 2 + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +Named capture groups: + Tes 2 + Test 1 +May match empty string +Subject length lower 
bound = 0 + +/(?Pzz)(?Paa)/I +Capture group count = 2 +Named capture groups: + A 2 + Z 1 +First code unit = 'z' +Last code unit = 'a' +Subject length lower bound = 4 + zzaa\=copy=Z + 0: zzaa + 1: zz + 2: aa + C zz (2) Z (group 1) + zzaa\=copy=A + 0: zzaa + 1: zz + 2: aa + C aa (2) A (group 2) + +/(?Peks)(?Peccs)/I +Failed: error 143 at offset 16: two named subpatterns have the same name (PCRE2_DUPNAMES not set) + +/(?Pabc(?Pdef)(?Pxyz))/I +Failed: error 143 at offset 31: two named subpatterns have the same name (PCRE2_DUPNAMES not set) + +"\[((?P\d+)(,(?P>elem))*)\]"I +Capture group count = 3 +Named capture groups: + elem 2 +First code unit = '[' +Last code unit = ']' +Subject length lower bound = 3 + [10,20,30,5,5,4,4,2,43,23,4234] + 0: [10,20,30,5,5,4,4,2,43,23,4234] + 1: 10,20,30,5,5,4,4,2,43,23,4234 + 2: 10 + 3: ,4234 +\= Expect no match + [] +No match + +"\[((?P\d+)(,(?P>elem))*)?\]"I +Capture group count = 3 +Named capture groups: + elem 2 +First code unit = '[' +Last code unit = ']' +Subject length lower bound = 2 + [10,20,30,5,5,4,4,2,43,23,4234] + 0: [10,20,30,5,5,4,4,2,43,23,4234] + 1: 10,20,30,5,5,4,4,2,43,23,4234 + 2: 10 + 3: ,4234 + [] + 0: [] + +/(a(b(?2)c))?/IB +------------------------------------------------------------------ + Bra + Brazero + CBra 1 + a + CBra 2 + b + Recurse + c + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Subject length lower bound = 0 + +/(a(b(?2)c))*/IB +------------------------------------------------------------------ + Bra + Brazero + CBra 1 + a + CBra 2 + b + Recurse + c + Ket + KetRmax + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Subject length lower bound = 0 + +/(a(b(?2)c)){0,2}/IB +------------------------------------------------------------------ + Bra + Brazero + Bra + CBra 1 + a + CBra 2 + b + Recurse + c + Ket + Ket + Brazero + CBra 1 + 
a + CBra 2 + b + Recurse + c + Ket + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Subject length lower bound = 0 + +/[ab]{1}+/B +------------------------------------------------------------------ + Bra + [ab] + Ket + End +------------------------------------------------------------------ + +/()(?1){1}/B +------------------------------------------------------------------ + Bra + CBra 1 + Ket + Recurse + Ket + End +------------------------------------------------------------------ + +/()(?1)/B +------------------------------------------------------------------ + Bra + CBra 1 + Ket + Recurse + Ket + End +------------------------------------------------------------------ + +/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/Ii +Capture group count = 3 +Options: caseless +Last code unit = 'g' (caseless) +Subject length lower bound = 8 + Baby Bjorn Active Carrier - With free SHIPPING!! + 0: Baby Bjorn Active Carrier - With free SHIPPING!! + 1: Baby Bjorn Active Carrier - With free SHIPPING!! + +/((w\/|-|with)*(free|immediate)*.*?shipping\s*[!.-]*)/Ii +Capture group count = 3 +Options: caseless +Last code unit = 'g' (caseless) +Subject length lower bound = 8 + Baby Bjorn Active Carrier - With free SHIPPING!! + 0: Baby Bjorn Active Carrier - With free SHIPPING!! + 1: Baby Bjorn Active Carrier - With free SHIPPING!! + +/a*.*b/IB +------------------------------------------------------------------ + Bra + a* + Any* + b + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Last code unit = 'b' +Subject length lower bound = 1 + +/(a|b)*.?c/IB +------------------------------------------------------------------ + Bra + Brazero + CBra 1 + a + Alt + b + KetRmax + Any? 
+ c + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Last code unit = 'c' +Subject length lower bound = 1 + +/abc(?C255)de(?C)f/IB +------------------------------------------------------------------ + Bra + abc + Callout 255 10 1 + de + Callout 0 16 1 + f + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'a' +Last code unit = 'f' +Subject length lower bound = 6 + +/abcde/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 1 + a + Callout 255 1 1 + b + Callout 255 2 1 + c + Callout 255 3 1 + d + Callout 255 4 1 + e + Callout 255 5 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: auto_callout +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 5 + abcde +--->abcde + +0 ^ a + +1 ^^ b + +2 ^ ^ c + +3 ^ ^ d + +4 ^ ^ e + +5 ^ ^ End of pattern + 0: abcde +\= Expect no match + abcdfe +--->abcdfe + +0 ^ a + +1 ^^ b + +2 ^ ^ c + +3 ^ ^ d + +4 ^ ^ e +No match + +/a*b/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 2 + a*+ + Callout 255 2 1 + b + Callout 255 3 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: auto_callout +Starting code units: a b +Last code unit = 'b' +Subject length lower bound = 1 + ab +--->ab + +0 ^ a* + +2 ^^ b + +3 ^ ^ End of pattern + 0: ab + aaaab +--->aaaab + +0 ^ a* + +2 ^ ^ b + +3 ^ ^ End of pattern + 0: aaaab + aaaacb +--->aaaacb + +0 ^ a* + +2 ^ ^ b + +0 ^ a* + +2 ^ ^ b + +0 ^ a* + +2 ^ ^ b + +0 ^ a* + +2 ^^ b + +0 ^ a* + +2 ^ b + +3 ^^ End of pattern + 0: b + +/a*b/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 2 + a*+ + Callout 255 2 1 + b + Callout 255 3 0 + Ket + End 
+------------------------------------------------------------------ +Capture group count = 0 +Options: auto_callout +Starting code units: a b +Last code unit = 'b' +Subject length lower bound = 1 + ab +--->ab + +0 ^ a* + +2 ^^ b + +3 ^ ^ End of pattern + 0: ab + aaaab +--->aaaab + +0 ^ a* + +2 ^ ^ b + +3 ^ ^ End of pattern + 0: aaaab + aaaacb +--->aaaacb + +0 ^ a* + +2 ^ ^ b + +0 ^ a* + +2 ^ ^ b + +0 ^ a* + +2 ^ ^ b + +0 ^ a* + +2 ^^ b + +0 ^ a* + +2 ^ b + +3 ^^ End of pattern + 0: b + +/a+b/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 2 + a++ + Callout 255 2 1 + b + Callout 255 3 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: auto_callout +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + ab +--->ab + +0 ^ a+ + +2 ^^ b + +3 ^ ^ End of pattern + 0: ab + aaaab +--->aaaab + +0 ^ a+ + +2 ^ ^ b + +3 ^ ^ End of pattern + 0: aaaab +\= Expect no match + aaaacb +--->aaaacb + +0 ^ a+ + +2 ^ ^ b + +0 ^ a+ + +2 ^ ^ b + +0 ^ a+ + +2 ^ ^ b + +0 ^ a+ + +2 ^^ b +No match + +/(abc|def)x/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 1 + CBra 1 + Callout 255 1 1 + a + Callout 255 2 1 + b + Callout 255 3 1 + c + Callout 255 4 1 + Alt + Callout 255 5 1 + d + Callout 255 6 1 + e + Callout 255 7 1 + f + Callout 255 8 1 + Ket + Callout 255 9 1 + x + Callout 255 10 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: auto_callout +Starting code units: a d +Last code unit = 'x' +Subject length lower bound = 4 + abcx +--->abcx + +0 ^ ( + +1 ^ a + +2 ^^ b + +3 ^ ^ c + +4 ^ ^ | + +9 ^ ^ x ++10 ^ ^ End of pattern + 0: abcx + 1: abc + defx +--->defx + +0 ^ ( + +1 ^ a + +5 ^ d + +6 ^^ e + +7 ^ ^ f + +8 ^ ^ ) + +9 ^ ^ x ++10 ^ ^ End of pattern + 0: defx + 1: def +\= Expect no match + abcdefzx +--->abcdefzx + +0 ^ ( + 
+1 ^ a + +2 ^^ b + +3 ^ ^ c + +4 ^ ^ | + +9 ^ ^ x + +5 ^ d + +0 ^ ( + +1 ^ a + +5 ^ d + +6 ^^ e + +7 ^ ^ f + +8 ^ ^ ) + +9 ^ ^ x +No match + +/(abc|def)x/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 1 + CBra 1 + Callout 255 1 1 + a + Callout 255 2 1 + b + Callout 255 3 1 + c + Callout 255 4 1 + Alt + Callout 255 5 1 + d + Callout 255 6 1 + e + Callout 255 7 1 + f + Callout 255 8 1 + Ket + Callout 255 9 1 + x + Callout 255 10 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: auto_callout +Starting code units: a d +Last code unit = 'x' +Subject length lower bound = 4 + abcx +--->abcx + +0 ^ ( + +1 ^ a + +2 ^^ b + +3 ^ ^ c + +4 ^ ^ | + +9 ^ ^ x ++10 ^ ^ End of pattern + 0: abcx + 1: abc + defx +--->defx + +0 ^ ( + +1 ^ a + +5 ^ d + +6 ^^ e + +7 ^ ^ f + +8 ^ ^ ) + +9 ^ ^ x ++10 ^ ^ End of pattern + 0: defx + 1: def +\= Expect no match + abcdefzx +--->abcdefzx + +0 ^ ( + +1 ^ a + +2 ^^ b + +3 ^ ^ c + +4 ^ ^ | + +9 ^ ^ x + +5 ^ d + +0 ^ ( + +1 ^ a + +5 ^ d + +6 ^^ e + +7 ^ ^ f + +8 ^ ^ ) + +9 ^ ^ x +No match + +/(ab|cd){3,4}/I,auto_callout +Capture group count = 1 +Options: auto_callout +Starting code units: a c +Subject length lower bound = 6 + ababab +--->ababab + +0 ^ ( + +1 ^ a + +2 ^^ b + +3 ^ ^ | + +1 ^ ^ a + +2 ^ ^ b + +3 ^ ^ | + +1 ^ ^ a + +2 ^ ^ b + +3 ^ ^ | + +1 ^ ^ a + +4 ^ ^ c ++12 ^ ^ End of pattern + 0: ababab + 1: ab + abcdabcd +--->abcdabcd + +0 ^ ( + +1 ^ a + +2 ^^ b + +3 ^ ^ | + +1 ^ ^ a + +4 ^ ^ c + +5 ^ ^ d + +6 ^ ^ ){3,4} + +1 ^ ^ a + +2 ^ ^ b + +3 ^ ^ | + +1 ^ ^ a + +4 ^ ^ c + +5 ^ ^ d + +6 ^ ^ ){3,4} ++12 ^ ^ End of pattern + 0: abcdabcd + 1: cd + abcdcdcdcdcd +--->abcdcdcdcdcd + +0 ^ ( + +1 ^ a + +2 ^^ b + +3 ^ ^ | + +1 ^ ^ a + +4 ^ ^ c + +5 ^ ^ d + +6 ^ ^ ){3,4} + +1 ^ ^ a + +4 ^ ^ c + +5 ^ ^ d + +6 ^ ^ ){3,4} + +1 ^ ^ a + +4 ^ ^ c + +5 ^ ^ d + +6 ^ ^ ){3,4} ++12 ^ ^ End of pattern + 0: abcdcdcd + 1: cd + 
+/([ab]{,4}c|xy)/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 1 + CBra 1 + Callout 255 1 4 + [ab] + Callout 255 5 1 + { + Callout 255 6 1 + , + Callout 255 7 1 + 4 + Callout 255 8 1 + } + Callout 255 9 1 + c + Callout 255 10 1 + Alt + Callout 255 11 1 + x + Callout 255 12 1 + y + Callout 255 13 1 + Ket + Callout 255 14 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: auto_callout +Starting code units: a b x +Subject length lower bound = 2 +\= Expect no match + Note: that { does NOT introduce a quantifier +--->Note: that { does NOT introduce a quantifier + +0 ^ ( + +1 ^ [ab] + +5 ^^ { ++11 ^ x + +0 ^ ( + +1 ^ [ab] + +5 ^^ { ++11 ^ x + +0 ^ ( + +1 ^ [ab] + +5 ^^ { ++11 ^ x +No match + +/([ab]{,4}c|xy)/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 1 + CBra 1 + Callout 255 1 4 + [ab] + Callout 255 5 1 + { + Callout 255 6 1 + , + Callout 255 7 1 + 4 + Callout 255 8 1 + } + Callout 255 9 1 + c + Callout 255 10 1 + Alt + Callout 255 11 1 + x + Callout 255 12 1 + y + Callout 255 13 1 + Ket + Callout 255 14 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: auto_callout +Starting code units: a b x +Subject length lower bound = 2 +\= Expect no match + Note: that { does NOT introduce a quantifier +--->Note: that { does NOT introduce a quantifier + +0 ^ ( + +1 ^ [ab] + +5 ^^ { ++11 ^ x + +0 ^ ( + +1 ^ [ab] + +5 ^^ { ++11 ^ x + +0 ^ ( + +1 ^ [ab] + +5 ^^ { ++11 ^ x +No match + +/([ab]{1,4}c|xy){4,5}?123/IB,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 1 + CBra 1 + Callout 255 1 9 + [ab]{1,4}+ + Callout 255 10 1 + c + Callout 255 11 1 + Alt + Callout 255 12 1 + x + Callout 255 13 1 + y + Callout 255 14 7 + Ket + CBra 1 + Callout 255 1 9 + [ab]{1,4}+ + Callout 255 10 1 + c 
+ Callout 255 11 1 + Alt + Callout 255 12 1 + x + Callout 255 13 1 + y + Callout 255 14 7 + Ket + CBra 1 + Callout 255 1 9 + [ab]{1,4}+ + Callout 255 10 1 + c + Callout 255 11 1 + Alt + Callout 255 12 1 + x + Callout 255 13 1 + y + Callout 255 14 7 + Ket + CBra 1 + Callout 255 1 9 + [ab]{1,4}+ + Callout 255 10 1 + c + Callout 255 11 1 + Alt + Callout 255 12 1 + x + Callout 255 13 1 + y + Callout 255 14 7 + Ket + Braminzero + CBra 1 + Callout 255 1 9 + [ab]{1,4}+ + Callout 255 10 1 + c + Callout 255 11 1 + Alt + Callout 255 12 1 + x + Callout 255 13 1 + y + Callout 255 14 7 + Ket + Callout 255 21 1 + 1 + Callout 255 22 1 + 2 + Callout 255 23 1 + 3 + Callout 255 24 0 + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: auto_callout +Starting code units: a b x +Last code unit = '3' +Subject length lower bound = 11 + aacaacaacaacaac123 +--->aacaacaacaacaac123 + +0 ^ ( + +1 ^ [ab]{1,4} ++10 ^ ^ c ++11 ^ ^ | + +1 ^ ^ [ab]{1,4} ++10 ^ ^ c ++11 ^ ^ | + +1 ^ ^ [ab]{1,4} ++10 ^ ^ c ++11 ^ ^ | + +1 ^ ^ [ab]{1,4} ++10 ^ ^ c ++11 ^ ^ | ++21 ^ ^ 1 + +1 ^ ^ [ab]{1,4} ++10 ^ ^ c ++11 ^ ^ | ++21 ^ ^ 1 ++22 ^ ^ 2 ++23 ^ ^ 3 ++24 ^ ^ End of pattern + 0: aacaacaacaacaac123 + 1: aac + +/\b.*/I +Capture group count = 0 +Max lookbehind = 1 +May match empty string +Subject length lower bound = 0 + ab cd\=offset=1 + 0: cd + +/\b.*/Is +Capture group count = 0 +Max lookbehind = 1 +May match empty string +Options: dotall +Subject length lower bound = 0 + ab cd\=startoffset=1 + 0: cd + +/(?!.bcd).*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + Xbcd12345 + 0: bcd12345 + +/abcde/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 5 + ab\=ps +Partial match: ab + abc\=ps +Partial match: abc + abcd\=ps +Partial match: abcd + abcde\=ps + 0: abcde + the quick brown abc\=ps +Partial match: abc +\= Expect no match\=ps + the quick brown abxyz fox\=ps +No 
match + +"^(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/(20)?\d\d$"I +Capture group count = 3 +Compile options: +Overall options: anchored +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Last code unit = '/' +Subject length lower bound = 6 + 13/05/04\=ps + 0: 13/05/04 + 1: 13 + 2: 05 + 13/5/2004\=ps + 0: 13/5/2004 + 1: 13 + 2: 5 + 3: 20 + 02/05/09\=ps + 0: 02/05/09 + 1: 02 + 2: 05 + 1\=ps +Partial match: 1 + 1/2\=ps +Partial match: 1/2 + 1/2/0\=ps +Partial match: 1/2/0 + 1/2/04\=ps + 0: 1/2/04 + 1: 1 + 2: 2 + 0\=ps +Partial match: 0 + 02/\=ps +Partial match: 02/ + 02/0\=ps +Partial match: 02/0 + 02/1\=ps +Partial match: 02/1 +\= Expect no match\=ps + \=ps +No match + 123\=ps +No match + 33/4/04\=ps +No match + 3/13/04\=ps +No match + 0/1/2003\=ps +No match + 0/\=ps +No match + 02/0/\=ps +No match + 02/13\=ps +No match + +/0{0,2}ABC/I +Capture group count = 0 +Starting code units: 0 A +Last code unit = 'C' +Subject length lower bound = 3 + +/\d{3,}ABC/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Last code unit = 'C' +Subject length lower bound = 6 + +/\d*ABC/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A +Last code unit = 'C' +Subject length lower bound = 3 + +/[abc]+DE/I +Capture group count = 0 +Starting code units: a b c +Last code unit = 'E' +Subject length lower bound = 3 + +/[abc]?123/I +Capture group count = 0 +Starting code units: 1 a b c +Last code unit = '3' +Subject length lower bound = 3 + 123\=ps + 0: 123 + a\=ps +Partial match: a + b\=ps +Partial match: b + c\=ps +Partial match: c + c12\=ps +Partial match: c12 + c123\=ps + 0: c123 + +/^(?:\d){3,5}X/I +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Last code unit = 'X' +Subject length lower bound = 4 + 1\=ps +Partial match: 1 + 123\=ps +Partial match: 123 + 123X + 0: 123X + 1234\=ps +Partial match: 1234 + 1234X + 0: 1234X + 12345\=ps +Partial match: 12345 + 12345X + 0: 12345X +\= Expect no match + 
1X +No match + 123456\=ps +No match + +"<(\w+)/?>(.)*"Igms +Capture group count = 3 +Max back reference = 1 +Options: dotall multiline +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 7 + \n\n\nPartner der LCO\nde\nPartner der LINEAS Consulting\nGmbH\nLINEAS Consulting GmbH Hamburg\nPartnerfirmen\n30 days\nindex,follow\n\nja\n3\nPartner\n\n\nLCO\nLINEAS Consulting\n15.10.2003\n\n\n\n\nDie Partnerfirmen der LINEAS Consulting\nGmbH\n\n\n \n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\=jitstack=1024 + 0: \x0a\x0aPartner der LCO\x0ade\x0aPartner der LINEAS Consulting\x0aGmbH\x0aLINEAS Consulting GmbH Hamburg\x0aPartnerfirmen\x0a30 days\x0aindex,follow\x0a\x0aja\x0a3\x0aPartner\x0a\x0a\x0aLCO\x0aLINEAS Consulting\x0a15.10.2003\x0a\x0a\x0a\x0a\x0aDie Partnerfirmen der LINEAS Consulting\x0aGmbH\x0a\x0a\x0a \x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a\x0a + 1: seite + 2: \x0a + 3: seite + +/line\nbreak/I +Capture group count = 0 +Contains explicit CR or LF match +First code unit = 'l' +Last code unit = 'k' +Subject length lower bound = 10 + this is a line\nbreak + 0: line\x0abreak + line one\nthis is a line\nbreak in the second line + 0: line\x0abreak + +/line\nbreak/I,firstline +Capture group count = 0 +Contains explicit CR or LF match +Options: firstline +First code unit = 'l' +Last code unit = 'k' +Subject length lower bound = 10 + this is a line\nbreak + 0: line\x0abreak +\= Expect no match + line one\nthis is a line\nbreak in the second line +No match + +/line\nbreak/Im,firstline +Capture group count = 0 +Contains explicit CR or LF match +Options: firstline multiline +First code unit = 'l' +Last code unit = 'k' +Subject length lower bound = 10 + this is a line\nbreak + 0: line\x0abreak +\= Expect no match + line one\nthis is a line\nbreak in the second line +No match + +/(?i)(?-i)AbCd/I +Capture group count = 0 +First code unit = 'A' +Last code unit = 'd' +Subject length lower bound = 4 + AbCd + 0: AbCd 
+\= Expect no match + abcd +No match + +/a{11111111111111111111}/I +Failed: error 105 at offset 8: number too big in {} quantifier + +/(){64294967295}/I +Failed: error 105 at offset 9: number too big in {} quantifier + +/(){2,4294967295}/I +Failed: error 105 at offset 11: number too big in {} quantifier + +"(?i:a)(?i:b)(?i:c)(?i:d)(?i:e)(?i:f)(?i:g)(?i:h)(?i:i)(?i:j)(k)(?i:l)A\1B"I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' (caseless) +Last code unit = 'B' +Subject length lower bound = 15 + abcdefghijklAkB + 0: abcdefghijklAkB + 1: k + +"(?Pa)(?Pb)(?Pc)(?Pd)(?Pe)(?Pf)(?Pg)(?Ph)(?Pi)(?Pj)(?Pk)(?Pl)A\11B"I +Capture group count = 12 +Max back reference = 11 +Named capture groups: + n0 1 + n1 2 + n10 11 + n11 12 + n2 3 + n3 4 + n4 5 + n5 6 + n6 7 + n7 8 + n8 9 + n9 10 +First code unit = 'a' +Last code unit = 'B' +Subject length lower bound = 15 + abcdefghijklAkB + 0: abcdefghijklAkB + 1: a + 2: b + 3: c + 4: d + 5: e + 6: f + 7: g + 8: h + 9: i +10: j +11: k +12: l + +"(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)A\11B"I +Capture group count = 12 +Max back reference = 11 +First code unit = 'a' +Last code unit = 'B' +Subject length lower bound = 15 + abcdefghijklAkB + 0: abcdefghijklAkB + 1: a + 2: b + 3: c + 4: d + 5: e + 6: f + 7: g + 8: h + 9: i +10: j +11: k +12: l + +"(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)(?Pa)"I +Capture group count = 101 +Named capture groups: + name0 1 + name1 2 + name10 11 + name100 101 + name11 12 + name12 13 + name13 14 + name14 15 + name15 16 + name16 17 + name17 18 + 
name18 19 + name19 20 + name2 3 + name20 21 + name21 22 + name22 23 + name23 24 + name24 25 + name25 26 + name26 27 + name27 28 + name28 29 + name29 30 + name3 4 + name30 31 + name31 32 + name32 33 + name33 34 + name34 35 + name35 36 + name36 37 + name37 38 + name38 39 + name39 40 + name4 5 + name40 41 + name41 42 + name42 43 + name43 44 + name44 45 + name45 46 + name46 47 + name47 48 + name48 49 + name49 50 + name5 6 + name50 51 + name51 52 + name52 53 + name53 54 + name54 55 + name55 56 + name56 57 + name57 58 + name58 59 + name59 60 + name6 7 + name60 61 + name61 62 + name62 63 + name63 64 + name64 65 + name65 66 + name66 67 + name67 68 + name68 69 + name69 70 + name7 8 + name70 71 + name71 72 + name72 73 + name73 74 + name74 75 + name75 76 + name76 77 + name77 78 + name78 79 + name79 80 + name8 9 + name80 81 + name81 82 + name82 83 + name83 84 + name84 85 + name85 86 + name86 87 + name87 88 + name88 89 + name89 90 + name9 10 + name90 91 + name91 92 + name92 93 + name93 94 + name94 95 + name95 96 + name96 97 + name97 98 + name98 99 + name99 100 +First code unit = 'a' +Last code unit = 'a' +Subject length lower bound = 101 + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +Matched, but too many substrings + 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a +10: a +11: a +12: a +13: a +14: a + +"(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)(a)"I +Capture group count = 101 +First code unit = 'a' +Last code unit = 'a' +Subject length lower bound = 101 + 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +Matched, but too many substrings + 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a +10: a +11: a +12: a +13: a +14: a + +/[^()]*(?:\((?R)\)[^()]*)*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + (this(and)that + 0: + (this(and)that) + 0: (this(and)that) + (this(and)that)stuff + 0: (this(and)that)stuff + +/[^()]*(?:\((?>(?R))\)[^()]*)*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + (this(and)that + 0: + (this(and)that) + 0: (this(and)that) + +/[^()]*(?:\((?R)\))*[^()]*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + (this(and)that + 0: + (this(and)that) + 0: (this(and)that) + +/(?:\((?R)\))*[^()]*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + (this(and)that + 0: + (this(and)that) + 0: + ((this)) + 0: ((this)) + +/(?:\((?R)\))|[^()]*/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + (this(and)that + 0: + (this(and)that) + 0: + (this) + 0: (this) + ((this)) + 0: ((this)) + +/\x{0000ff}/I +Capture group count = 0 +First code unit = \xff +Subject length lower bound = 1 + +/^((?Pa1)|(?Pa2)b)/I +Failed: error 143 at offset 18: two named subpatterns have the same name (PCRE2_DUPNAMES not set) + +/^((?Pa1)|(?Pa2)b)/I,dupnames +Capture group count = 3 +Named capture groups: + A 2 + A 3 +Compile options: dupnames +Overall options: anchored dupnames +First code unit = 'a' +Subject length lower bound = 2 + a1b\=copy=A + 0: a1 + 1: a1 + 2: a1 + C a1 (2) A (non-unique) + a2b\=copy=A + 0: a2b + 1: a2b + 2: + 3: a2 + C a2 (2) A (non-unique) + a1b\=copy=Z,copy=A + 0: a1 + 1: a1 + 2: a1 +Number not found for group 'Z' +Copy substring 'Z' failed (-49): unknown substring + C a1 (2) A 
(non-unique) + +/(?|(?)(?)(?)|(?)(?)(?))/I,dupnames +Capture group count = 3 +Named capture groups: + a 1 + a 3 + b 2 +May match empty string +Options: dupnames +Subject length lower bound = 0 + +/^(?Pa)(?Pb)/I,dupnames +Capture group count = 2 +Named capture groups: + A 1 + A 2 +Compile options: dupnames +Overall options: anchored dupnames +First code unit = 'a' +Subject length lower bound = 2 + ab\=copy=A + 0: ab + 1: a + 2: b + C a (1) A (non-unique) + +/^(?Pa)(?Pb)|cd/I,dupnames +Capture group count = 2 +Named capture groups: + A 1 + A 2 +Options: dupnames +Starting code units: a c +Subject length lower bound = 2 + ab\=copy=A + 0: ab + 1: a + 2: b + C a (1) A (non-unique) + cd\=copy=A + 0: cd +Copy substring 'A' failed (-55): requested value is not set + +/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/I,dupnames +Capture group count = 4 +Named capture groups: + A 1 + A 2 + A 3 + A 4 +Options: dupnames +Starting code units: a c +Subject length lower bound = 2 + cdefgh\=copy=A + 0: cdefgh + 1: + 2: + 3: ef + 4: gh + C ef (2) A (non-unique) + +/^((?Pa1)|(?Pa2)b)/I,dupnames +Capture group count = 3 +Named capture groups: + A 2 + A 3 +Compile options: dupnames +Overall options: anchored dupnames +First code unit = 'a' +Subject length lower bound = 2 + a1b\=get=A + 0: a1 + 1: a1 + 2: a1 + G a1 (2) A (non-unique) + a2b\=get=A + 0: a2b + 1: a2b + 2: + 3: a2 + G a2 (2) A (non-unique) + a1b\=get=Z,get=A + 0: a1 + 1: a1 + 2: a1 +Number not found for group 'Z' +Get substring 'Z' failed (-49): unknown substring + G a1 (2) A (non-unique) + +/^(?Pa)(?Pb)/I,dupnames +Capture group count = 2 +Named capture groups: + A 1 + A 2 +Compile options: dupnames +Overall options: anchored dupnames +First code unit = 'a' +Subject length lower bound = 2 + ab\=get=A + 0: ab + 1: a + 2: b + G a (1) A (non-unique) + +/^(?Pa)(?Pb)|cd/I,dupnames +Capture group count = 2 +Named capture groups: + A 1 + A 2 +Options: dupnames +Starting code units: a c +Subject length lower bound = 2 + ab\=get=A + 0: ab + 1: a + 2: b 
+ G a (1) A (non-unique) + cd\=get=A + 0: cd +Get substring 'A' failed (-55): requested value is not set + +/^(?Pa)(?Pb)|cd(?Pef)(?Pgh)/I,dupnames +Capture group count = 4 +Named capture groups: + A 1 + A 2 + A 3 + A 4 +Options: dupnames +Starting code units: a c +Subject length lower bound = 2 + cdefgh\=get=A + 0: cdefgh + 1: + 2: + 3: ef + 4: gh + G ef (2) A (non-unique) + +/(?J)^((?Pa1)|(?Pa2)b)/I +Capture group count = 3 +Named capture groups: + A 2 + A 3 +Compile options: +Overall options: anchored +Duplicate name status changes +First code unit = 'a' +Subject length lower bound = 2 + a1b\=copy=A + 0: a1 + 1: a1 + 2: a1 + C a1 (2) A (non-unique) + a2b\=copy=A + 0: a2b + 1: a2b + 2: + 3: a2 + C a2 (2) A (non-unique) + +/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I +Failed: error 143 at offset 38: two named subpatterns have the same name (PCRE2_DUPNAMES not set) + +# In this next test, J is not set at the outer level; consequently it isn't set +# in the pattern's options; consequently pcre2_substring_get_byname() produces +# a random value. 
+ +/^(?Pa) (?J:(?Pb)(?Pc)) (?Pd)/I +Capture group count = 4 +Named capture groups: + A 1 + B 2 + B 3 + C 4 +Compile options: +Overall options: anchored +Duplicate name status changes +First code unit = 'a' +Subject length lower bound = 6 + a bc d\=copy=A,copy=B,copy=C + 0: a bc d + 1: a + 2: b + 3: c + 4: d + C a (1) A (group 1) + C b (1) B (non-unique) + C d (1) C (group 4) + +/^(?Pa)?(?(A)a|b)/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + A 1 +Compile options: +Overall options: anchored +Subject length lower bound = 1 + aabc + 0: aa + 1: a + bc + 0: b +\= Expect no match + abc +No match + +/(?:(?(ZZ)a|b)(?PX))+/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + ZZ 1 +Last code unit = 'X' +Subject length lower bound = 2 + bXaX + 0: bXaX + 1: X + +/(?:(?(2y)a|b)(X))+/I +Failed: error 124 at offset 7: missing closing parenthesis for condition + +/(?:(?(ZA)a|b)(?PX))+/I +Failed: error 115 at offset 6: reference to non-existent subpattern + +/(?:(?(ZZ)a|b)(?(ZZ)a|b)(?PX))+/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + ZZ 1 +Last code unit = 'X' +Subject length lower bound = 3 + bbXaaX + 0: bbXaaX + 1: X + +/(?:(?(ZZ)a|\(b\))\\(?PX))+/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + ZZ 1 +Last code unit = 'X' +Subject length lower bound = 3 + (b)\\Xa\\X + 0: (b)\Xa\X + 1: X + +/(?PX|Y))+/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + A 1 +Subject length lower bound = 2 + bXXaYYaY + 0: bXXaYYaY + 1: Y + bXYaXXaX + 0: bX + 1: X + +/()()()()()()()()()(?:(?(A)(?P=A)a|b)(?PX|Y))+/I +Capture group count = 10 +Max back reference = 10 +Named capture groups: + A 10 +Subject length lower bound = 2 + bXXaYYaY + 0: bXXaYYaY + 1: + 2: + 3: + 4: + 5: + 6: + 7: + 8: + 9: +10: Y + +/\s*,\s*/I +Capture group count = 0 +Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 , +Last code unit = ',' +Subject length lower bound = 1 + \x0b,\x0b + 0: 
\x0b,\x0b + \x0c,\x0d + 0: \x0c,\x0d + +/^abc/Im,newline=lf +Capture group count = 0 +Options: multiline +Forced newline is LF +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + xyz\nabc + 0: abc + xyz\r\nabc + 0: abc +\= Expect no match + xyz\rabc +No match + xyzabc\r +No match + xyzabc\rpqr +No match + xyzabc\r\n +No match + xyzabc\r\npqr +No match + +/^abc/Im,newline=crlf +Capture group count = 0 +Options: multiline +Forced newline is CRLF +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + xyz\r\nabclf> + 0: abc +\= Expect no match + xyz\nabclf +No match + xyz\rabclf +No match + +/^abc/Im,newline=cr +Capture group count = 0 +Options: multiline +Forced newline is CR +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + xyz\rabc + 0: abc +\= Expect no match + xyz\nabc +No match + xyz\r\nabc +No match + +/^abc/Im,newline=bad +** Invalid value in 'newline=bad' + +/.*/I,newline=lf +Capture group count = 0 +May match empty string +Forced newline is LF +First code unit at start or follows newline +Subject length lower bound = 0 + abc\ndef + 0: abc + abc\rdef + 0: abc\x0ddef + abc\r\ndef + 0: abc\x0d + +/.*/I,newline=cr +Capture group count = 0 +May match empty string +Forced newline is CR +First code unit at start or follows newline +Subject length lower bound = 0 + abc\ndef + 0: abc\x0adef + abc\rdef + 0: abc + abc\r\ndef + 0: abc + +/.*/I,newline=crlf +Capture group count = 0 +May match empty string +Forced newline is CRLF +First code unit at start or follows newline +Subject length lower bound = 0 + abc\ndef + 0: abc\x0adef + abc\rdef + 0: abc\x0ddef + abc\r\ndef + 0: abc + +/\w+(.)(.)?def/Is +Capture group count = 2 +Options: dotall +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Last code unit = 'f' +Subject length 
lower bound = 5 + abc\ndef + 0: abc\x0adef + 1: \x0a + abc\rdef + 0: abc\x0ddef + 1: \x0d + abc\r\ndef + 0: abc\x0d\x0adef + 1: \x0d + 2: \x0a + +/(?P25[0-5]|2[0-4]\d|[01]?\d?\d)(?:\.(?P>B)){3}/I +Capture group count = 1 +Named capture groups: + B 1 +Starting code units: 0 1 2 3 4 5 6 7 8 9 +Last code unit = '.' +Subject length lower bound = 7 + +/()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + ()()()()()()()()()()()()()()()()()()()() + (.(.))/Ix +Capture group count = 102 +Options: extended +Subject length lower bound = 2 + XY\=ovector=133 + 0: XY + 1: + 2: + 3: + 4: + 5: + 6: + 7: + 8: + 9: +10: +11: +12: +13: +14: +15: +16: +17: +18: +19: +20: +21: +22: +23: +24: +25: +26: +27: +28: +29: +30: +31: +32: +33: +34: +35: +36: +37: +38: +39: +40: +41: +42: +43: +44: +45: +46: +47: +48: +49: +50: +51: +52: +53: +54: +55: +56: +57: +58: +59: +60: +61: +62: +63: +64: +65: +66: +67: +68: +69: +70: +71: +72: +73: +74: +75: +76: +77: +78: +79: +80: +81: +82: +83: +84: +85: +86: +87: +88: +89: +90: +91: +92: +93: +94: +95: +96: +97: +98: +99: +100: +101: XY +102: Y + +/(a*b|(?i:c*(?-i)d))/I +Capture group count = 1 +Starting code units: C a b c d +Subject length lower bound = 1 + +/()[ab]xyz/I +Capture group count = 1 +Starting code units: a b +Last code unit = 'z' +Subject length lower bound = 4 + +/(|)[ab]xyz/I +Capture group count = 1 +Starting code units: a b +Last code unit = 'z' +Subject length lower bound = 4 + +/(|c)[ab]xyz/I +Capture group count = 1 +Starting code units: a b c +Last code unit = 'z' +Subject length lower bound = 4 + +/(|c?)[ab]xyz/I +Capture group count = 1 +Starting code units: a b c +Last code unit = 'z' +Subject length lower bound = 4 + +/(d?|c?)[ab]xyz/I +Capture group count = 1 +Starting code units: a b c d +Last code unit = 'z' +Subject length lower bound = 4 + +/(d?|c)[ab]xyz/I +Capture group count = 1 +Starting code 
units: a b c d +Last code unit = 'z' +Subject length lower bound = 4 + +/^a*b\d/IB +------------------------------------------------------------------ + Bra + ^ + a*+ + b + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: a b +Last code unit = 'b' +Subject length lower bound = 2 + +/^a*+b\d/IB +------------------------------------------------------------------ + Bra + ^ + a*+ + b + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: a b +Last code unit = 'b' +Subject length lower bound = 2 + +/^a*?b\d/IB +------------------------------------------------------------------ + Bra + ^ + a*+ + b + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +Starting code units: a b +Last code unit = 'b' +Subject length lower bound = 2 + +/^a+A\d/IB +------------------------------------------------------------------ + Bra + ^ + a++ + A + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Last code unit = 'A' +Subject length lower bound = 3 + aaaA5 + 0: aaaA5 +\= Expect no match + aaaa5 +No match + +/^a*A\d/IBi +------------------------------------------------------------------ + Bra + ^ + /i a* + /i A + \d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: caseless +Overall options: anchored caseless +Starting code units: A a +Last code unit = 'A' (caseless) +Subject length lower bound = 2 + aaaA5 + 0: aaaA5 + aaaa5 + 0: aaaa5 + a5 + 0: a5 + +/(a*|b*)[cd]/I +Capture group count = 1 +Starting code units: a b c d +Subject length lower bound = 
1 + +/(a+|b*)[cd]/I +Capture group count = 1 +Starting code units: a b c d +Subject length lower bound = 1 + +/(a*|b+)[cd]/I +Capture group count = 1 +Starting code units: a b c d +Subject length lower bound = 1 + +/(a+|b+)[cd]/I +Capture group count = 1 +Starting code units: a b +Subject length lower bound = 2 + +/(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( + (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( + ((( + a + )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) + )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) + ))) +/Ix +Capture group count = 203 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + large nest +Matched, but too many substrings + 0: a + 1: a + 2: a + 3: a + 4: a + 5: a + 6: a + 7: a + 8: a + 9: a +10: a +11: a +12: a +13: a +14: a + +/a*\d/B +------------------------------------------------------------------ + Bra + a*+ + \d + Ket + End +------------------------------------------------------------------ + +/a*\D/B +------------------------------------------------------------------ + Bra + a* + \D + Ket + End +------------------------------------------------------------------ + +/0*\d/B +------------------------------------------------------------------ + Bra + 0* + \d + Ket + End +------------------------------------------------------------------ + +/0*\D/B +------------------------------------------------------------------ + Bra + 0*+ + \D + Ket + End +------------------------------------------------------------------ + +/a*\s/B +------------------------------------------------------------------ + Bra + a*+ + \s + Ket + End +------------------------------------------------------------------ + +/a*\S/B +------------------------------------------------------------------ + Bra + a* + \S + Ket + End 
+------------------------------------------------------------------ + +/ *\s/B +------------------------------------------------------------------ + Bra + * + \s + Ket + End +------------------------------------------------------------------ + +/ *\S/B +------------------------------------------------------------------ + Bra + *+ + \S + Ket + End +------------------------------------------------------------------ + +/a*\w/B +------------------------------------------------------------------ + Bra + a* + \w + Ket + End +------------------------------------------------------------------ + +/a*\W/B +------------------------------------------------------------------ + Bra + a*+ + \W + Ket + End +------------------------------------------------------------------ + +/=*\w/B +------------------------------------------------------------------ + Bra + =*+ + \w + Ket + End +------------------------------------------------------------------ + +/=*\W/B +------------------------------------------------------------------ + Bra + =* + \W + Ket + End +------------------------------------------------------------------ + +/\d*a/B +------------------------------------------------------------------ + Bra + \d*+ + a + Ket + End +------------------------------------------------------------------ + +/\d*2/B +------------------------------------------------------------------ + Bra + \d* + 2 + Ket + End +------------------------------------------------------------------ + +/\d*\d/B +------------------------------------------------------------------ + Bra + \d* + \d + Ket + End +------------------------------------------------------------------ + +/\d*\D/B +------------------------------------------------------------------ + Bra + \d*+ + \D + Ket + End +------------------------------------------------------------------ + +/\d*\s/B +------------------------------------------------------------------ + Bra + \d*+ + \s + Ket + End 
+------------------------------------------------------------------ + +/\d*\S/B +------------------------------------------------------------------ + Bra + \d* + \S + Ket + End +------------------------------------------------------------------ + +/\d*\w/B +------------------------------------------------------------------ + Bra + \d* + \w + Ket + End +------------------------------------------------------------------ + +/\d*\W/B +------------------------------------------------------------------ + Bra + \d*+ + \W + Ket + End +------------------------------------------------------------------ + +/\D*a/B +------------------------------------------------------------------ + Bra + \D* + a + Ket + End +------------------------------------------------------------------ + +/\D*2/B +------------------------------------------------------------------ + Bra + \D*+ + 2 + Ket + End +------------------------------------------------------------------ + +/\D*\d/B +------------------------------------------------------------------ + Bra + \D*+ + \d + Ket + End +------------------------------------------------------------------ + +/\D*\D/B +------------------------------------------------------------------ + Bra + \D* + \D + Ket + End +------------------------------------------------------------------ + +/\D*\s/B +------------------------------------------------------------------ + Bra + \D* + \s + Ket + End +------------------------------------------------------------------ + +/\D*\S/B +------------------------------------------------------------------ + Bra + \D* + \S + Ket + End +------------------------------------------------------------------ + +/\D*\w/B +------------------------------------------------------------------ + Bra + \D* + \w + Ket + End +------------------------------------------------------------------ + +/\D*\W/B +------------------------------------------------------------------ + Bra + \D* + \W + Ket + End 
+------------------------------------------------------------------ + +/\s*a/B +------------------------------------------------------------------ + Bra + \s*+ + a + Ket + End +------------------------------------------------------------------ + +/\s*2/B +------------------------------------------------------------------ + Bra + \s*+ + 2 + Ket + End +------------------------------------------------------------------ + +/\s*\d/B +------------------------------------------------------------------ + Bra + \s*+ + \d + Ket + End +------------------------------------------------------------------ + +/\s*\D/B +------------------------------------------------------------------ + Bra + \s* + \D + Ket + End +------------------------------------------------------------------ + +/\s*\s/B +------------------------------------------------------------------ + Bra + \s* + \s + Ket + End +------------------------------------------------------------------ + +/\s*\S/B +------------------------------------------------------------------ + Bra + \s*+ + \S + Ket + End +------------------------------------------------------------------ + +/\s*\w/B +------------------------------------------------------------------ + Bra + \s*+ + \w + Ket + End +------------------------------------------------------------------ + +/\s*\W/B +------------------------------------------------------------------ + Bra + \s* + \W + Ket + End +------------------------------------------------------------------ + +/\S*a/B +------------------------------------------------------------------ + Bra + \S* + a + Ket + End +------------------------------------------------------------------ + +/\S*2/B +------------------------------------------------------------------ + Bra + \S* + 2 + Ket + End +------------------------------------------------------------------ + +/\S*\d/B +------------------------------------------------------------------ + Bra + \S* + \d + Ket + End 
+------------------------------------------------------------------ + +/\S*\D/B +------------------------------------------------------------------ + Bra + \S* + \D + Ket + End +------------------------------------------------------------------ + +/\S*\s/B +------------------------------------------------------------------ + Bra + \S*+ + \s + Ket + End +------------------------------------------------------------------ + +/\S*\S/B +------------------------------------------------------------------ + Bra + \S* + \S + Ket + End +------------------------------------------------------------------ + +/\S*\w/B +------------------------------------------------------------------ + Bra + \S* + \w + Ket + End +------------------------------------------------------------------ + +/\S*\W/B +------------------------------------------------------------------ + Bra + \S* + \W + Ket + End +------------------------------------------------------------------ + +/\w*a/B +------------------------------------------------------------------ + Bra + \w* + a + Ket + End +------------------------------------------------------------------ + +/\w*2/B +------------------------------------------------------------------ + Bra + \w* + 2 + Ket + End +------------------------------------------------------------------ + +/\w*\d/B +------------------------------------------------------------------ + Bra + \w* + \d + Ket + End +------------------------------------------------------------------ + +/\w*\D/B +------------------------------------------------------------------ + Bra + \w* + \D + Ket + End +------------------------------------------------------------------ + +/\w*\s/B +------------------------------------------------------------------ + Bra + \w*+ + \s + Ket + End +------------------------------------------------------------------ + +/\w*\S/B +------------------------------------------------------------------ + Bra + \w* + \S + Ket + End 
+------------------------------------------------------------------ + +/\w*\w/B +------------------------------------------------------------------ + Bra + \w* + \w + Ket + End +------------------------------------------------------------------ + +/\w*\W/B +------------------------------------------------------------------ + Bra + \w*+ + \W + Ket + End +------------------------------------------------------------------ + +/\W*a/B +------------------------------------------------------------------ + Bra + \W*+ + a + Ket + End +------------------------------------------------------------------ + +/\W*2/B +------------------------------------------------------------------ + Bra + \W*+ + 2 + Ket + End +------------------------------------------------------------------ + +/\W*\d/B +------------------------------------------------------------------ + Bra + \W*+ + \d + Ket + End +------------------------------------------------------------------ + +/\W*\D/B +------------------------------------------------------------------ + Bra + \W* + \D + Ket + End +------------------------------------------------------------------ + +/\W*\s/B +------------------------------------------------------------------ + Bra + \W* + \s + Ket + End +------------------------------------------------------------------ + +/\W*\S/B +------------------------------------------------------------------ + Bra + \W* + \S + Ket + End +------------------------------------------------------------------ + +/\W*\w/B +------------------------------------------------------------------ + Bra + \W*+ + \w + Ket + End +------------------------------------------------------------------ + +/\W*\W/B +------------------------------------------------------------------ + Bra + \W* + \W + Ket + End +------------------------------------------------------------------ + +/[^a]+a/B +------------------------------------------------------------------ + Bra + [^a]++ + a + Ket + End 
+------------------------------------------------------------------ + +/[^a]+a/Bi +------------------------------------------------------------------ + Bra + /i [^a]++ + /i a + Ket + End +------------------------------------------------------------------ + +/[^a]+A/Bi +------------------------------------------------------------------ + Bra + /i [^a]++ + /i A + Ket + End +------------------------------------------------------------------ + +/[^a]+b/B +------------------------------------------------------------------ + Bra + [^a]+ + b + Ket + End +------------------------------------------------------------------ + +/[^a]+\d/B +------------------------------------------------------------------ + Bra + [^a]+ + \d + Ket + End +------------------------------------------------------------------ + +/a*[^a]/B +------------------------------------------------------------------ + Bra + a*+ + [^a] + Ket + End +------------------------------------------------------------------ + +/(?P<abc>x)(?P<xyz>y)/I +Capture group count = 2 +Named capture groups: + abc 1 + xyz 2 +First code unit = 'x' +Last code unit = 'y' +Subject length lower bound = 2 + xy\=copy=abc,copy=xyz + 0: xy + 1: x + 2: y + C x (1) abc (group 1) + C y (1) xyz (group 2) + +/(?<abc>x)(?'xyz'y)/I +Capture group count = 2 +Named capture groups: + abc 1 + xyz 2 +First code unit = 'x' +Last code unit = 'y' +Subject length lower bound = 2 + xy\=copy=abc,copy=xyz + 0: xy + 1: x + 2: y + C x (1) abc (group 1) + C y (1) xyz (group 2) + +/(?<abc>x)(?'xyz>y)/I +Failed: error 142 at offset 15: syntax error in subpattern name (missing terminator?)
+ +/(?P'abc'x)(?Py)/I +Failed: error 141 at offset 3: unrecognized character after (?P + +/^(?:(?(ZZ)a|b)(?X))+/ + bXaX + 0: bXaX + 1: X + bXbX + 0: bX + 1: X +\= Expect no match + aXaX +No match + aXbX +No match + +/^(?P>abc)(?xxx)/ +Failed: error 115 at offset 5: reference to non-existent subpattern + +/^(?P>abc)(?x|y)/ + xx + 0: xx + 1: x + xy + 0: xy + 1: y + yy + 0: yy + 1: y + yx + 0: yx + 1: x + +/^(?P>abc)(?Px|y)/ + xx + 0: xx + 1: x + xy + 0: xy + 1: y + yy + 0: yy + 1: y + yx + 0: yx + 1: x + +/^((?(abc)a|b)(?x|y))+/ + bxay + 0: bxay + 1: ay + 2: y + bxby + 0: bx + 1: bx + 2: x +\= Expect no match + axby +No match + +/^(((?P=abc)|X)(?x|y))+/ + XxXxxx + 0: XxXxxx + 1: xx + 2: x + 3: x + XxXyyx + 0: XxXyyx + 1: yx + 2: y + 3: x + XxXyxx + 0: XxXy + 1: Xy + 2: X + 3: y +\= Expect no match + x +No match + +/^(?1)(abc)/ + abcabc + 0: abcabc + 1: abc + +/^(?:(?:\1|X)(a|b))+/ + Xaaa + 0: Xaaa + 1: a + Xaba + 0: Xa + 1: a + +/^[\E\Qa\E-\Qz\E]+/B +------------------------------------------------------------------ + Bra + ^ + [a-z]++ + Ket + End +------------------------------------------------------------------ + +/^[a\Q]bc\E]/B +------------------------------------------------------------------ + Bra + ^ + [\]a-c] + Ket + End +------------------------------------------------------------------ + +/^[a-\Q\E]/B +------------------------------------------------------------------ + Bra + ^ + [\-a] + Ket + End +------------------------------------------------------------------ + +/^(?P>abc)[()](?)/B +------------------------------------------------------------------ + Bra + ^ + Recurse + [()] + CBra 1 + Ket + Ket + End +------------------------------------------------------------------ + +/^((?(abc)y)[()](?Px))+/B +------------------------------------------------------------------ + Bra + ^ + CBra 1 + Cond + 2 Cond ref + y + Ket + [()] + CBra 2 + x + Ket + KetRmax + Ket + End +------------------------------------------------------------------ + (xy)x + 0: (xy)x + 1: 
y)x + 2: x + +/^(?P>abc)\Q()\E(?)/B +------------------------------------------------------------------ + Bra + ^ + Recurse + () + CBra 1 + Ket + Ket + End +------------------------------------------------------------------ + +/^(?P>abc)[a\Q(]\E(](?)/B +------------------------------------------------------------------ + Bra + ^ + Recurse + [(\]a] + CBra 1 + Ket + Ket + End +------------------------------------------------------------------ + +/^(?P>abc) # this is (a comment) + (?)/Bx +------------------------------------------------------------------ + Bra + ^ + Recurse + CBra 1 + Ket + Ket + End +------------------------------------------------------------------ + +/^\W*(?:(?(?.)\W*(?&one)\W*\k|)|(?(?.)\W*(?&three)\W*\k'four'|\W*.\W*))\W*$/Ii +Capture group count = 4 +Max back reference = 4 +Named capture groups: + four 4 + one 1 + three 3 + two 2 +May match empty string +Compile options: caseless +Overall options: anchored caseless +Subject length lower bound = 0 + 1221 + 0: 1221 + 1: 1221 + 2: 1 + Satan, oscillate my metallic sonatas! + 0: Satan, oscillate my metallic sonatas! + 1: + 2: + 3: Satan, oscillate my metallic sonatas + 4: S + A man, a plan, a canal: Panama! + 0: A man, a plan, a canal: Panama! + 1: + 2: + 3: A man, a plan, a canal: Panama + 4: A + Able was I ere I saw Elba. + 0: Able was I ere I saw Elba. 
+ 1: + 2: + 3: Able was I ere I saw Elba + 4: A +\= Expect no match + The quick brown fox +No match + +/(?=(\w+))\1:/I +Capture group count = 1 +Max back reference = 1 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Last code unit = ':' +Subject length lower bound = 2 + abcd: + 0: abcd: + 1: abcd + +/(?=(?'abc'\w+))\k:/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + abc 1 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Last code unit = ':' +Subject length lower bound = 2 + abcd: + 0: abcd: + 1: abcd + +/(?'abc'a|b)(?d|e)\k{2}/dupnames + adaa + 0: adaa + 1: a + 2: d +\= Expect no match + addd +No match + adbb +No match + +/(?'abc'a|b)(?d|e)(?&abc){2}/dupnames + bdaa + 0: bdaa + 1: b + 2: d + bdab + 0: bdab + 1: b + 2: d +\= Expect no match + bddd +No match + +/(?( (?'B' abc (?(R) (?(R&A)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x + abcabc1Xabc2XabcXabcabc + 0: abcabc1Xabc2XabcX + 1: abcabc1Xabc2XabcX + 2: abcabc1Xabc2XabcX + +/(? 
(?'B' abc (?(R) (?(R&C)1) (?(R&B)2) X | (?1) (?2) (?R) ))) /x +Failed: error 115 at offset 27: reference to non-existent subpattern + +/^(?(DEFINE) abc | xyz ) /x +Failed: error 154 at offset 4: DEFINE subpattern contains more than one branch + +/(?(DEFINE) abc) xyz/Ix +Capture group count = 0 +Options: extended +First code unit = 'x' +Last code unit = 'z' +Subject length lower bound = 3 + +/(a|)*\d/ + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4\=ovector=0 + 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 + 1: +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\=ovector=0 +No match + +/^a.b/newline=lf + a\rb + 0: a\x0db +\= Expect no match + a\nb +No match + +/^a.b/newline=cr + a\nb + 0: a\x0ab +\= Expect no match + a\rb +No match + +/^a.b/newline=anycrlf + a\x85b + 0: a\x85b +\= Expect no match + a\rb +No match + +/^a.b/newline=any +\= Expect no match + a\nb +No match + a\rb +No match + a\x85b +No match + +/^abc./gmx,newline=any + abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK + 0: abc1 + 0: abc2 + 0: abc3 + 0: abc4 + 0: abc5 + 0: abc6 + 0: abc7 + +/abc.$/gmx,newline=any + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc7 abc9 + 0: abc1 + 0: abc2 + 0: abc3 + 0: abc4 + 0: abc5 + 0: abc6 + 0: abc9 + +/^a\Rb/bsr=unicode + a\nb + 0: a\x0ab + a\rb + 0: a\x0db + a\r\nb + 0: a\x0d\x0ab + a\x0bb + 0: a\x0bb + a\x0cb + 0: a\x0cb + a\x85b + 0: a\x85b +\= Expect no match + a\n\rb +No match + +/^a\R*b/bsr=unicode + ab + 0: ab + a\nb + 0: a\x0ab + a\rb + 0: a\x0db + a\r\nb + 0: a\x0d\x0ab + a\x0bb + 0: a\x0bb + a\x0cb + 0: a\x0cb + a\x85b + 0: a\x85b + a\n\rb + 0: a\x0a\x0db + a\n\r\x85\x0cb + 0: a\x0a\x0d\x85\x0cb + +/^a\R+b/bsr=unicode + a\nb + 0: a\x0ab + a\rb + 0: a\x0db + a\r\nb + 0: a\x0d\x0ab + a\x0bb + 0: a\x0bb + a\x0cb + 0: a\x0cb + a\x85b + 0: a\x85b + a\n\rb + 0: a\x0a\x0db + a\n\r\x85\x0cb + 0: a\x0a\x0d\x85\x0cb +\= Expect no match + ab +No match + 
+/^a\R{1,3}b/bsr=unicode + a\nb + 0: a\x0ab + a\n\rb + 0: a\x0a\x0db + a\n\r\x85b + 0: a\x0a\x0d\x85b + a\r\n\r\nb + 0: a\x0d\x0a\x0d\x0ab + a\r\n\r\n\r\nb + 0: a\x0d\x0a\x0d\x0a\x0d\x0ab + a\n\r\n\rb + 0: a\x0a\x0d\x0a\x0db + a\n\n\r\nb + 0: a\x0a\x0a\x0d\x0ab +\= Expect no match + a\n\n\n\rb +No match + a\r +No match + +/(?&abc)X(?<abc>P)/I +Capture group count = 1 +Named capture groups: + abc 1 +Last code unit = 'P' +Subject length lower bound = 3 + abcPXP123 + 0: PXP + 1: P + +/(?1)X(?<abc>P)/I +Capture group count = 1 +Named capture groups: + abc 1 +Last code unit = 'P' +Subject length lower bound = 3 + abcPXP123 + 0: PXP + 1: P + +/(?:a(?&abc)b)*(?<abc>x)/ + 123axbaxbaxbx456 + 0: axbaxbaxbx + 1: x + 123axbaxbaxb456 + 0: x + 1: x + +/(?:a(?&abc)b){1,5}(?<abc>x)/ + 123axbaxbaxbx456 + 0: axbaxbaxbx + 1: x + +/(?:a(?&abc)b){2,5}(?<abc>x)/ + 123axbaxbaxbx456 + 0: axbaxbaxbx + 1: x + +/(?:a(?&abc)b){2,}(?<abc>x)/ + 123axbaxbaxbx456 + 0: axbaxbaxbx + 1: x + +/(abc)(?i:(?1))/ + defabcabcxyz + 0: abcabc + 1: abc +\= Expect no match + DEFabcABCXYZ +No match + +/(abc)(?:(?i)(?1))/ + defabcabcxyz + 0: abcabc + 1: abc +\= Expect no match + DEFabcABCXYZ +No match + +/^(a)\g-2/ +Failed: error 115 at offset 8: reference to non-existent subpattern + +/^(a)\g/ +Failed: error 157 at offset 6: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/^(a)\g{0}/ +Failed: error 115 at offset 9: reference to non-existent subpattern + +/^(a)\g{3/ +Failed: error 157 at offset 6: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/^(a)\g{aa}/ +Failed: error 115 at offset 7: reference to non-existent subpattern + +/^a.b/newline=lf + a\rb + 0: a\x0db +\= Expect no match + a\nb +No match + +/.+foo/ + afoo + 0: afoo +\= Expect no match + \r\nfoo +No match + \nfoo +No match + +/.+foo/newline=crlf + afoo + 0: afoo + \nfoo + 0: \x0afoo +\= Expect no match + \r\nfoo +No match + +/.+foo/newline=any + afoo + 0: afoo +\= Expect no match + \nfoo
+No match + \r\nfoo +No match + +/.+foo/s + afoo + 0: afoo + \r\nfoo + 0: \x0d\x0afoo + \nfoo + 0: \x0afoo + +/^$/gm,newline=any + abc\r\rxyz + 0: + abc\n\rxyz + 0: +\= Expect no match + abc\r\nxyz +No match + +/(?m)^$/g,newline=any,aftertext + abc\r\n\r\n + 0: + 0+ \x0d\x0a + +/(?m)^$|^\r\n/g,newline=any,aftertext + abc\r\n\r\n + 0: + 0+ \x0d\x0a + 0: \x0d\x0a + 0+ + +/(?m)$/g,newline=any,aftertext + abc\r\n\r\n + 0: + 0+ \x0d\x0a\x0d\x0a + 0: + 0+ \x0d\x0a + 0: + 0+ + +/abc.$/gmx,newline=anycrlf + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9 + 0: abc1 + 0: abc4 + 0: abc5 + 0: abc9 + +/^X/m + XABC + 0: X +\= Expect no match + XABC\=notbol +No match + +/(ab|c)(?-1)/B +------------------------------------------------------------------ + Bra + CBra 1 + ab + Alt + c + Ket + Recurse + Ket + End +------------------------------------------------------------------ + abc + 0: abc + 1: ab + +/xy(?+1)(abc)/B +------------------------------------------------------------------ + Bra + xy + Recurse + CBra 1 + abc + Ket + Ket + End +------------------------------------------------------------------ + xyabcabc + 0: xyabcabc + 1: abc +\= Expect no match + xyabc +No match + +/x(?-0)y/ +Failed: error 126 at offset 5: a relative value of zero is not allowed + +/x(?-1)y/ +Failed: error 115 at offset 5: reference to non-existent subpattern + +/x(?+0)y/ +Failed: error 126 at offset 5: a relative value of zero is not allowed + +/x(?+1)y/ +Failed: error 115 at offset 5: reference to non-existent subpattern + +/^(abc)?(?(-1)X|Y)/B +------------------------------------------------------------------ + Bra + ^ + Brazero + CBra 1 + abc + Ket + Cond + 1 Cond ref + X + Alt + Y + Ket + Ket + End +------------------------------------------------------------------ + abcX + 0: abcX + 1: abc + Y + 0: Y +\= Expect no match + abcY +No match + +/^((?(+1)X|Y)(abc))+/B +------------------------------------------------------------------ + Bra + ^ + CBra 1 + Cond + 2 Cond ref + X + Alt + Y 
+ Ket + CBra 2 + abc + Ket + KetRmax + Ket + End +------------------------------------------------------------------ + YabcXabc + 0: YabcXabc + 1: Xabc + 2: abc + YabcXabcXabc + 0: YabcXabcXabc + 1: Xabc + 2: abc +\= Expect no match + XabcXabc +No match + +/(?(-1)a)/B +Failed: error 115 at offset 5: reference to non-existent subpattern + +/((?(-1)a))/B +------------------------------------------------------------------ + Bra + CBra 1 + Cond + 1 Cond ref + a + Ket + Ket + Ket + End +------------------------------------------------------------------ + +/((?(-2)a))/B +Failed: error 115 at offset 6: reference to non-existent subpattern + +/^(?(+1)X|Y)(.)/B +------------------------------------------------------------------ + Bra + ^ + Cond + 1 Cond ref + X + Alt + Y + Ket + CBra 1 + Any + Ket + Ket + End +------------------------------------------------------------------ + Y! + 0: Y! + 1: ! + +/(?tom|bon)-\k{A}/ + tom-tom + 0: tom-tom + 1: tom + bon-bon + 0: bon-bon + 1: bon +\= Expect no match + tom-bon +No match + +/\g{A/ +Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) 
+ +/(?|(abc)|(xyz))/B +------------------------------------------------------------------ + Bra + Bra + CBra 1 + abc + Ket + Alt + CBra 1 + xyz + Ket + Ket + Ket + End +------------------------------------------------------------------ + >abc< + 0: abc + 1: abc + >xyz< + 0: xyz + 1: xyz + +/(x)(?|(abc)|(xyz))(x)/B +------------------------------------------------------------------ + Bra + CBra 1 + x + Ket + Bra + CBra 2 + abc + Ket + Alt + CBra 2 + xyz + Ket + Ket + CBra 3 + x + Ket + Ket + End +------------------------------------------------------------------ + xabcx + 0: xabcx + 1: x + 2: abc + 3: x + xxyzx + 0: xxyzx + 1: x + 2: xyz + 3: x + +/(x)(?|(abc)(pqr)|(xyz))(x)/B +------------------------------------------------------------------ + Bra + CBra 1 + x + Ket + Bra + CBra 2 + abc + Ket + CBra 3 + pqr + Ket + Alt + CBra 2 + xyz + Ket + Ket + CBra 4 + x + Ket + Ket + End +------------------------------------------------------------------ + xabcpqrx + 0: xabcpqrx + 1: x + 2: abc + 3: pqr + 4: x + xxyzx + 0: xxyzx + 1: x + 2: xyz + 3: + 4: x + +/\H++X/B +------------------------------------------------------------------ + Bra + \H++ + X + Ket + End +------------------------------------------------------------------ +\= Expect no match + XXXX +No match + +/\H+\hY/B +------------------------------------------------------------------ + Bra + \H++ + \h + Y + Ket + End +------------------------------------------------------------------ + XXXX Y + 0: XXXX Y + +/\H+ Y/B +------------------------------------------------------------------ + Bra + \H++ + Y + Ket + End +------------------------------------------------------------------ + +/\h+A/B +------------------------------------------------------------------ + Bra + \h++ + A + Ket + End +------------------------------------------------------------------ + +/\v*B/B +------------------------------------------------------------------ + Bra + \v*+ + B + Ket + End 
+------------------------------------------------------------------ + +/\V+\x0a/B +------------------------------------------------------------------ + Bra + \V++ + \x0a + Ket + End +------------------------------------------------------------------ + +/A+\h/B +------------------------------------------------------------------ + Bra + A++ + \h + Ket + End +------------------------------------------------------------------ + +/ *\H/B +------------------------------------------------------------------ + Bra + *+ + \H + Ket + End +------------------------------------------------------------------ + +/A*\v/B +------------------------------------------------------------------ + Bra + A*+ + \v + Ket + End +------------------------------------------------------------------ + +/\x0b*\V/B +------------------------------------------------------------------ + Bra + \x0b*+ + \V + Ket + End +------------------------------------------------------------------ + +/\d+\h/B +------------------------------------------------------------------ + Bra + \d++ + \h + Ket + End +------------------------------------------------------------------ + +/\d*\v/B +------------------------------------------------------------------ + Bra + \d*+ + \v + Ket + End +------------------------------------------------------------------ + +/S+\h\S+\v/B +------------------------------------------------------------------ + Bra + S++ + \h + \S++ + \v + Ket + End +------------------------------------------------------------------ + +/\w{3,}\h\w+\v/B +------------------------------------------------------------------ + Bra + \w{3} + \w*+ + \h + \w++ + \v + Ket + End +------------------------------------------------------------------ + +/\h+\d\h+\w\h+\S\h+\H/B +------------------------------------------------------------------ + Bra + \h++ + \d + \h++ + \w + \h++ + \S + \h++ + \H + Ket + End +------------------------------------------------------------------ + +/\v+\d\v+\w\v+\S\v+\V/B 
+------------------------------------------------------------------ + Bra + \v++ + \d + \v++ + \w + \v++ + \S + \v++ + \V + Ket + End +------------------------------------------------------------------ + +/\H+\h\H+\d/B +------------------------------------------------------------------ + Bra + \H++ + \h + \H+ + \d + Ket + End +------------------------------------------------------------------ + +/\V+\v\V+\w/B +------------------------------------------------------------------ + Bra + \V++ + \v + \V+ + \w + Ket + End +------------------------------------------------------------------ + +/\( (?: [^()]* | (?R) )* \)/x +(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0
(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(00)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)
0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)\=jitstack=1024 + 0: (0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(
0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(0(00)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0
)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0)0) + +/[\E]AAA/ +Failed: error 106 at offset 7: missing terminating ] for character class + +/[\Q\E]AAA/ +Failed: error 106 at offset 9: missing terminating ] for character class + +/[^\E]AAA/ +Failed: error 106 at offset 8: missing terminating ] for character class + +/[^\Q\E]AAA/ +Failed: error 106 at offset 10: missing terminating ] for character class + +/[\E^]AAA/ +Failed: error 106 at offset 8: missing terminating ] for character class + +/[\Q\E^]AAA/ +Failed: error 106 at offset 10: missing terminating ] for character class + +/A(*PRUNE)B(*SKIP)C(*THEN)D(*COMMIT)E(*F)F(*FAIL)G(?!)H(*ACCEPT)I/B +------------------------------------------------------------------ + Bra + A + *PRUNE + B + *SKIP + C + *THEN + D + *COMMIT + E + *FAIL + F + *FAIL + G + *FAIL + H + *ACCEPT + I + Ket + End +------------------------------------------------------------------ + +/^a+(*FAIL)/auto_callout +\= Expect no match + aaaaaa +--->aaaaaa + +0 ^ ^ + +1 ^ a+ + +3 ^ ^ (*FAIL) + +3 ^ ^ (*FAIL) + +3 ^ ^ (*FAIL) + +3 ^ ^ (*FAIL) + +3 ^ ^ (*FAIL) + +3 ^^ (*FAIL) +No match + +/a+b?c+(*FAIL)/auto_callout +\= Expect no match + aaabccc +--->aaabccc + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ c+ + +6 ^ ^ (*FAIL) + +6 ^ ^ (*FAIL) + +6 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ c+ + +6 ^ ^ (*FAIL) + +6 ^ ^ (*FAIL) + +6 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^^ b? 
+ +4 ^ ^ c+ + +6 ^ ^ (*FAIL) + +6 ^ ^ (*FAIL) + +6 ^ ^ (*FAIL) +No match + +/a+b?(*PRUNE)c+(*FAIL)/auto_callout +\= Expect no match + aaabccc +--->aaabccc + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*PRUNE) ++12 ^ ^ c+ ++14 ^ ^ (*FAIL) ++14 ^ ^ (*FAIL) ++14 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*PRUNE) ++12 ^ ^ c+ ++14 ^ ^ (*FAIL) ++14 ^ ^ (*FAIL) ++14 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^^ b? + +4 ^ ^ (*PRUNE) ++12 ^ ^ c+ ++14 ^ ^ (*FAIL) ++14 ^ ^ (*FAIL) ++14 ^ ^ (*FAIL) +No match + +/a+b?(*COMMIT)c+(*FAIL)/auto_callout +\= Expect no match + aaabccc +--->aaabccc + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*COMMIT) ++13 ^ ^ c+ ++15 ^ ^ (*FAIL) ++15 ^ ^ (*FAIL) ++15 ^ ^ (*FAIL) +No match + +/a+b?(*SKIP)c+(*FAIL)/auto_callout +\= Expect no match + aaabcccaaabccc +--->aaabcccaaabccc + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*SKIP) ++11 ^ ^ c+ ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*SKIP) ++11 ^ ^ c+ ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) +No match + +/a+b?(*THEN)c+(*FAIL)/auto_callout +\= Expect no match + aaabccc +--->aaabccc + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*THEN) ++11 ^ ^ c+ ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^ ^ b? + +4 ^ ^ (*THEN) ++11 ^ ^ c+ ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) + +0 ^ a+ + +2 ^^ b? 
+ +4 ^ ^ (*THEN) ++11 ^ ^ c+ ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) ++13 ^ ^ (*FAIL) +No match + +/a(*MARK)b/ +Failed: error 166 at offset 7: (*MARK) must have an argument + +/\g6666666666/ +Failed: error 161 at offset 7: subpattern number is too big + +/[\g6666666666]/B +------------------------------------------------------------------ + Bra + [6g] + Ket + End +------------------------------------------------------------------ + +/(?1)\c[/ +Failed: error 115 at offset 3: reference to non-existent subpattern + +/.+A/newline=crlf +\= Expect no match + \r\nA +No match + +/\nA/newline=crlf + \r\nA + 0: \x0aA + +/[\r\n]A/newline=crlf + \r\nA + 0: \x0aA + +/(\r|\n)A/newline=crlf + \r\nA + 0: \x0aA + 1: \x0a + +/a(*CR)b/ +Failed: error 160 at offset 5: (*VERB) not recognized or malformed + +/(*CR)a.b/ + a\nb + 0: a\x0ab +\= Expect no match + a\rb +No match + +/(*CR)a.b/newline=lf + a\nb + 0: a\x0ab +\= Expect no match + a\rb +No match + +/(*LF)a.b/newline=CRLF + a\rb + 0: a\x0db +\= Expect no match + a\nb +No match + +/(*CRLF)a.b/ + a\rb + 0: a\x0db + a\nb + 0: a\x0ab +\= Expect no match + a\r\nb +No match + +/(*ANYCRLF)a.b/newline=CR +\= Expect no match + a\rb +No match + a\nb +No match + a\r\nb +No match + +/(*ANY)a.b/newline=cr +\= Expect no match + a\rb +No match + a\nb +No match + a\r\nb +No match + a\x85b +No match + +/(*ANY).*/g + abc\r\ndef + 0: abc + 0: + 0: def + 0: + +/(*ANYCRLF).*/g + abc\r\ndef + 0: abc + 0: + 0: def + 0: + +/(*CRLF).*/g + abc\r\ndef + 0: abc + 0: + 0: def + 0: + +/(*NUL)^.*/ + a\nb\x00ccc + 0: a\x0ab + +/(*NUL)^.*/s + a\nb\x00ccc + 0: a\x0ab\x00ccc + +/^x/m,newline=NUL + ab\x00xy + 0: x + +/'#comment' 0d 0a 00 '^x\' 0a 'y'/x,newline=nul,hex + x\nyz + 0: x\x0ay + +/(*NUL)^X\NY/ + X\nY + 0: X\x0aY + X\rY + 0: X\x0dY +\= Expect no match + X\x00Y +No match + +/a\Rb/I,bsr=anycrlf +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\rb + 0: a\x0db + a\nb + 0: a\x0ab + 
a\r\nb + 0: a\x0d\x0ab +\= Expect no match + a\x85b +No match + a\x0bb +No match + +/a\Rb/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\rb + 0: a\x0db + a\nb + 0: a\x0ab + a\r\nb + 0: a\x0d\x0ab + a\x85b + 0: a\x85b + a\x0bb + 0: a\x0bb + +/a\R?b/I,bsr=anycrlf +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + a\rb + 0: a\x0db + a\nb + 0: a\x0ab + a\r\nb + 0: a\x0d\x0ab +\= Expect no match + a\x85b +No match + a\x0bb +No match + +/a\R?b/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + a\rb + 0: a\x0db + a\nb + 0: a\x0ab + a\r\nb + 0: a\x0d\x0ab + a\x85b + 0: a\x85b + a\x0bb + 0: a\x0bb + +/a\R{2,4}b/I,bsr=anycrlf +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 4 + a\r\n\nb + 0: a\x0d\x0a\x0ab + a\n\r\rb + 0: a\x0a\x0d\x0db + a\r\n\r\n\r\n\r\nb + 0: a\x0d\x0a\x0d\x0a\x0d\x0a\x0d\x0ab +\= Expect no match + a\x85\x85b +No match + a\x0b\x0bb +No match + +/a\R{2,4}b/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 4 + a\r\rb + 0: a\x0d\x0db + a\n\n\nb + 0: a\x0a\x0a\x0ab + a\r\n\n\r\rb + 0: a\x0d\x0a\x0a\x0d\x0db + a\x85\x85b + 0: a\x85\x85b + a\x0b\x0bb + 0: a\x0b\x0bb +\= Expect no match + a\r\r\r\r\rb +No match + +/(*BSR_ANYCRLF)a\Rb/I +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\nb + 0: a\x0ab + a\rb + 0: a\x0db + +/(*BSR_UNICODE)a\Rb/I +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\x85b + 0: a\x85b + +/(*BSR_ANYCRLF)(*CRLF)a\Rb/I 
+Capture group count = 0 +\R matches CR, LF, or CRLF +Forced newline is CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\nb + 0: a\x0ab + a\rb + 0: a\x0db + +/(*CRLF)(*BSR_UNICODE)a\Rb/I +Capture group count = 0 +\R matches any Unicode newline +Forced newline is CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\x85b + 0: a\x85b + +/(*CRLF)(*BSR_ANYCRLF)(*CR)ab/I +Capture group count = 0 +\R matches CR, LF, or CRLF +Forced newline is CR +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/(?)(?&)/ +Failed: error 162 at offset 9: subpattern name expected + +/(?)(?&a)/ +Failed: error 115 at offset 11: reference to non-existent subpattern + +/(?)(?&aaaaaaaaaaaaaaaaaaaaaaa)/ +Failed: error 115 at offset 9: reference to non-existent subpattern + +/(?+-a)/ +Failed: error 129 at offset 2: digit expected after (?+ or (?- + +/(?-+a)/ +Failed: error 111 at offset 3: unrecognized character after (? or (?- + +/(?(-1))/ +Failed: error 115 at offset 5: reference to non-existent subpattern + +/(?(+10))/ +Failed: error 115 at offset 4: reference to non-existent subpattern + +/(?(10))/ +Failed: error 115 at offset 3: reference to non-existent subpattern + +/(?(+2))()()/ + +/(?(2))()()/ + +/\k''/ +Failed: error 162 at offset 3: subpattern name expected + +/\k<>/ +Failed: error 162 at offset 3: subpattern name expected + +/\k{}/ +Failed: error 162 at offset 3: subpattern name expected + +/\k/ +Failed: error 169 at offset 2: \k is not followed by a braced, angle-bracketed, or quoted name + +/\kabc/ +Failed: error 169 at offset 2: \k is not followed by a braced, angle-bracketed, or quoted name + +/(?P=)/ +Failed: error 162 at offset 4: subpattern name expected + +/(?P>)/ +Failed: error 162 at offset 4: subpattern name expected + +/[[:foo:]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/[[:1234:]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/[[:f\oo:]]/ 
+Failed: error 130 at offset 3: unknown POSIX class name + +/[[: :]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/[[:...:]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/[[:l\ower:]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/[[:abc\:]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/[abc[:x\]pqr:]]/ +Failed: error 130 at offset 6: unknown POSIX class name + +/[[:a\dz:]]/ +Failed: error 130 at offset 3: unknown POSIX class name + +/(^(a|b\g<-1'c))/ +Failed: error 157 at offset 8: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/^(?+1)(?x|y){0}z/ + xzxx + 0: xz + yzyy + 0: yz +\= Expect no match + xxz +No match + +/(\3)(\1)(a)/ +\= Expect no match + cat +No match + +/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames + cat + 0: a + 1: + 2: + 3: a + +/TA]/ + The ACTA] comes + 0: TA] + +/TA]/allow_empty_class,match_unset_backref,dupnames + The ACTA] comes + 0: TA] + +/(?2)[]a()b](abc)/ +Failed: error 115 at offset 3: reference to non-existent subpattern + abcbabc + +/(?2)[^]a()b](abc)/ +Failed: error 115 at offset 3: reference to non-existent subpattern + abcbabc + +/(?1)[]a()b](abc)/ + abcbabc + 0: abcbabc + 1: abc +\= Expect no match + abcXabc +No match + +/(?1)[^]a()b](abc)/ + abcXabc + 0: abcXabc + 1: abc +\= Expect no match + abcbabc +No match + +/(?2)[]a()b](abc)(xyz)/ + xyzbabcxyz + 0: xyzbabcxyz + 1: abc + 2: xyz + +/(?&N)[]a(?)](?abc)/ +Failed: error 115 at offset 3: reference to non-existent subpattern + abc)](abc)/ +Failed: error 115 at offset 3: reference to non-existent subpattern + abcadc + +0 ^ (? + +2 ^ (?= + +5 ^ .* + +7 ^ ^ b + +7 ^ ^ b + +7 ^^ b + +7 ^ b ++11 ^ ^ ++12 ^ ) ++13 ^ End of pattern + 0: + abc +--->abc + +0 ^ (? + +2 ^ (?= + +5 ^ .* + +7 ^ ^ b + +7 ^ ^ b + +7 ^^ b + +8 ^ ^ ) + +9 ^ b + +0 ^ (? 
+ +2 ^ (?= + +5 ^ .* + +7 ^ ^ b + +7 ^^ b + +7 ^ b + +8 ^^ ) + +9 ^ b ++10 ^^ | ++13 ^^ End of pattern + 0: b + +/(?(?=b).*b|^d)/I +Capture group count = 0 +Subject length lower bound = 1 + +/(?(?=.*b).*b|^d)/I +Capture group count = 0 +Subject length lower bound = 1 + +/xyz/auto_callout + xyz +--->xyz + +0 ^ x + +1 ^^ y + +2 ^ ^ z + +3 ^ ^ End of pattern + 0: xyz + abcxyz +--->abcxyz + +0 ^ x + +1 ^^ y + +2 ^ ^ z + +3 ^ ^ End of pattern + 0: xyz +\= Expect no match + abc +No match + abcxypqr +No match + +/xyz/auto_callout,no_start_optimize + abcxyz +--->abcxyz + +0 ^ x + +0 ^ x + +0 ^ x + +0 ^ x + +1 ^^ y + +2 ^ ^ z + +3 ^ ^ End of pattern + 0: xyz +\= Expect no match + abc +--->abc + +0 ^ x + +0 ^ x + +0 ^ x + +0 ^ x +No match + abcxypqr +--->abcxypqr + +0 ^ x + +0 ^ x + +0 ^ x + +0 ^ x + +1 ^^ y + +2 ^ ^ z + +0 ^ x + +0 ^ x + +0 ^ x + +0 ^ x + +0 ^ x +No match + +/(*NO_START_OPT)xyz/auto_callout + abcxyz +--->abcxyz ++15 ^ x ++15 ^ x ++15 ^ x ++15 ^ x ++16 ^^ y ++17 ^ ^ z ++18 ^ ^ End of pattern + 0: xyz + +/(*NO_AUTO_POSSESS)a+b/B +------------------------------------------------------------------ + Bra + a+ + b + Ket + End +------------------------------------------------------------------ + +/xyz/auto_callout,no_start_optimize + abcxyz +--->abcxyz + +0 ^ x + +0 ^ x + +0 ^ x + +0 ^ x + +1 ^^ y + +2 ^ ^ z + +3 ^ ^ End of pattern + 0: xyz + +/^"((?(?=[a])[^"])|b)*"$/auto_callout + "ab" +--->"ab" + +0 ^ ^ + +1 ^ " + +2 ^^ ( + +3 ^^ (? + +5 ^^ (?= + +8 ^^ [a] ++11 ^ ^ ) ++12 ^^ [^"] ++16 ^ ^ ) ++17 ^ ^ | + +3 ^ ^ (? + +5 ^ ^ (?= + +8 ^ ^ [a] ++17 ^ ^ | ++21 ^ ^ " ++18 ^ ^ b ++19 ^ ^ )* + +3 ^ ^ (? 
+ +5 ^ ^ (?= + +8 ^ ^ [a] ++17 ^ ^ | ++21 ^ ^ " ++22 ^ ^ $ ++23 ^ ^ End of pattern + 0: "ab" + 1: + +/^"((?(?=[a])[^"])|b)*"$/ + "ab" + 0: "ab" + 1: + +/^X(?5)(a)(?|(b)|(q))(c)(d)Y/ +Failed: error 115 at offset 5: reference to non-existent subpattern + XYabcdY + +/^X(?&N)(a)(?|(b)|(q))(c)(d)(?Y)/ + XYabcdY + 0: XYabcdY + 1: a + 2: b + 3: c + 4: d + 5: Y + +/Xa{2,4}b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/Xa{2,4}?b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/Xa{2,4}+b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\d{2,4}b/ + X\=ps +Partial match: X + X3\=ps +Partial match: X3 + X33\=ps +Partial match: X33 + X333\=ps +Partial match: X333 + X3333\=ps +Partial match: X3333 + +/X\d{2,4}?b/ + X\=ps +Partial match: X + X3\=ps +Partial match: X3 + X33\=ps +Partial match: X33 + X333\=ps +Partial match: X333 + X3333\=ps +Partial match: X3333 + +/X\d{2,4}+b/ + X\=ps +Partial match: X + X3\=ps +Partial match: X3 + X33\=ps +Partial match: X33 + X333\=ps +Partial match: X333 + X3333\=ps +Partial match: X3333 + +/X\D{2,4}b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\D{2,4}?b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\D{2,4}+b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X[abc]{2,4}b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial 
match: Xaaaa + +/X[abc]{2,4}?b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X[abc]{2,4}+b/ + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X[^a]{2,4}b/ + X\=ps +Partial match: X + Xz\=ps +Partial match: Xz + Xzz\=ps +Partial match: Xzz + Xzzz\=ps +Partial match: Xzzz + Xzzzz\=ps +Partial match: Xzzzz + +/X[^a]{2,4}?b/ + X\=ps +Partial match: X + Xz\=ps +Partial match: Xz + Xzz\=ps +Partial match: Xzz + Xzzz\=ps +Partial match: Xzzz + Xzzzz\=ps +Partial match: Xzzzz + +/X[^a]{2,4}+b/ + X\=ps +Partial match: X + Xz\=ps +Partial match: Xz + Xzz\=ps +Partial match: Xzz + Xzzz\=ps +Partial match: Xzzz + Xzzzz\=ps +Partial match: Xzzzz + +/(Y)X\1{2,4}b/ + YX\=ps +Partial match: YX + YXY\=ps +Partial match: YXY + YXYY\=ps +Partial match: YXYY + YXYYY\=ps +Partial match: YXYYY + YXYYYY\=ps +Partial match: YXYYYY + +/(Y)X\1{2,4}?b/ + YX\=ps +Partial match: YX + YXY\=ps +Partial match: YXY + YXYY\=ps +Partial match: YXYY + YXYYY\=ps +Partial match: YXYYY + YXYYYY\=ps +Partial match: YXYYYY + +/(Y)X\1{2,4}+b/ + YX\=ps +Partial match: YX + YXY\=ps +Partial match: YXY + YXYY\=ps +Partial match: YXYY + YXYYY\=ps +Partial match: YXYYY + YXYYYY\=ps +Partial match: YXYYYY + +/\++\KZ|\d+X|9+Y/startchar + ++++123999\=ps +Partial match: 123999 + ++++123999Y\=ps + 0: 999Y + ++++Z1234\=ps + 0: ++++Z + ^^^^ + +/Z(*F)/ +\= Expect no match + Z\=ps +No match + ZA\=ps +No match + +/Z(?!)/ +\= Expect no match + Z\=ps +No match + ZA\=ps +No match + +/dog(sbody)?/ + dogs\=ps + 0: dog + dogs\=ph +Partial match: dogs + +/dog(sbody)??/ + dogs\=ps + 0: dog + dogs\=ph + 0: dog + +/dog|dogsbody/ + dogs\=ps + 0: dog + dogs\=ph + 0: dog + +/dogsbody|dog/ + dogs\=ps + 0: dog + dogs\=ph +Partial match: dogs + +/\bthe cat\b/ + the cat\=ps + 0: the cat + the cat\=ph +Partial match: the cat + 
+/abc/ + abc\=ps + 0: abc + abc\=ph + 0: abc + +/abc\K123/startchar + xyzabc123pqr + 0: abc123 + ^^^ + xyzabc12\=ps +Partial match: abc12 + xyzabc12\=ph +Partial match: abc12 + +/(?<=abc)123/ + xyzabc123pqr + 0: 123 + xyzabc12\=ps +Partial match: 12 + xyzabc12\=ph +Partial match: 12 + +/\babc\b/ + +++abc+++ + 0: abc + +++ab\=ps +Partial match: ab + +++ab\=ph +Partial match: ab + +/(?&word)(?&element)(?(DEFINE)(?<[^m][^>]>[^<])(?\w*+))/B +------------------------------------------------------------------ + Bra + Recurse + Recurse + Cond + Cond false + CBra 1 + < + [^m] + [^>] + > + [^<] + Ket + CBra 2 + \w*+ + Ket + Ket + Ket + End +------------------------------------------------------------------ + +/(?&word)(?&element)(?(DEFINE)(?<[^\d][^>]>[^<])(?\w*+))/B +------------------------------------------------------------------ + Bra + Recurse + Recurse + Cond + Cond false + CBra 1 + < + [\x00-/:-\xff] (neg) + [^>] + > + [^<] + Ket + CBra 2 + \w*+ + Ket + Ket + Ket + End +------------------------------------------------------------------ + +/(ab)(x(y)z(cd(*ACCEPT)))pq/B +------------------------------------------------------------------ + Bra + CBra 1 + ab + Ket + CBra 2 + x + CBra 3 + y + Ket + z + CBra 4 + cd + Close 4 + Close 2 + *ACCEPT + Ket + Ket + pq + Ket + End +------------------------------------------------------------------ + +/abc\K/aftertext,startchar + abcdef + 0: abc + ^^^ + 0+ def + abcdef\=notempty_atstart + 0: abc + ^^^ + 0+ def + xyzabcdef\=notempty_atstart + 0: abc + ^^^ + 0+ def +\= Expect no match + abcdef\=notempty +No match + xyzabcdef\=notempty +No match + +/^(?:(?=abc)|abc\K)/aftertext,startchar + abcdef + 0: + 0+ abcdef + abcdef\=notempty_atstart + 0: abc + ^^^ + 0+ def +\= Expect no match + abcdef\=notempty +No match + +/a?b?/aftertext + xyz + 0: + 0+ xyz + xyzabc + 0: + 0+ xyzabc + xyzabc\=notempty + 0: ab + 0+ c + xyzabc\=notempty_atstart + 0: + 0+ yzabc + xyz\=notempty_atstart + 0: + 0+ yz +\= Expect no match + xyz\=notempty +No match + 
+/^a?b?/aftertext + xyz + 0: + 0+ xyz + xyzabc + 0: + 0+ xyzabc +\= Expect no match + xyzabc\=notempty +No match + xyzabc\=notempty_atstart +No match + xyz\=notempty_atstart +No match + xyz\=notempty +No match + +/^(?a|b\gc)/ + aaaa + 0: a + 1: a + bacxxx + 0: bac + 1: bac + bbaccxxx + 0: bbacc + 1: bbacc + bbbacccxx + 0: bbbaccc + 1: bbbaccc + +/^(?a|b\g'name'c)/ + aaaa + 0: a + 1: a + bacxxx + 0: bac + 1: bac + bbaccxxx + 0: bbacc + 1: bbacc + bbbacccxx + 0: bbbaccc + 1: bbbaccc + +/^(a|b\g<1>c)/ + aaaa + 0: a + 1: a + bacxxx + 0: bac + 1: bac + bbaccxxx + 0: bbacc + 1: bbacc + bbbacccxx + 0: bbbaccc + 1: bbbaccc + +/^(a|b\g'1'c)/ + aaaa + 0: a + 1: a + bacxxx + 0: bac + 1: bac + bbaccxxx + 0: bbacc + 1: bbacc + bbbacccxx + 0: bbbaccc + 1: bbbaccc + +/^(a|b\g'-1'c)/ + aaaa + 0: a + 1: a + bacxxx + 0: bac + 1: bac + bbaccxxx + 0: bbacc + 1: bbacc + bbbacccxx + 0: bbbaccc + 1: bbbaccc + +/(^(a|b\g<-1>c))/ + aaaa + 0: a + 1: a + 2: a + bacxxx + 0: bac + 1: bac + 2: bac + bbaccxxx + 0: bbacc + 1: bbacc + 2: bbacc + bbbacccxx + 0: bbbaccc + 1: bbbaccc + 2: bbbaccc + +/(?-i:\g)(?i:(?a))/ + XaaX + 0: aa + 1: a + XAAX + 0: AA + 1: A + +/(?i:\g)(?-i:(?a))/ + XaaX + 0: aa + 1: a +\= Expect no match + XAAX +No match + +/(?-i:\g<+1>)(?i:(a))/ + XaaX + 0: aa + 1: a + XAAX + 0: AA + 1: A + +/(?=(?(?#simplesyntax)\$(?[a-zA-Z_\x{7f}-\x{ff}][a-zA-Z0-9_\x{7f}-\x{ff}]*)(?:\[(?[a-zA-Z0-9_\x{7f}-\x{ff}]+|\$\g)\]|->\g(\(.*?\))?)?|(?#simple syntax withbraces)\$\{(?:\g(?\[(?:\g|'(?:\\.|[^'\\])*'|"(?:\g|\\.|[^"\\])*")\])?|\g|\$\{\g\})\}|(?#complexsyntax)\{(?\$(?\g(\g*|\(.*?\))?)(?:->\g)*|\$\g|\$\{\g\})\}))\{/ + +/(?a|b|c)\g*/ + abc + 0: abc + 1: a + accccbbb + 0: accccbbb + 1: a + +/^X(?7)(a)(?|(b)|(q)(r)(s))(c)(d)(Y)/ + XYabcdY + 0: XYabcdY + 1: a + 2: b + 3: + 4: + 5: c + 6: d + 7: Y + +/(?<=b(?1)|zzz)(a)/ + xbaax + 0: a + 1: a + xzzzax + 0: a + 1: a + +/(a)(?<=b\1)/ + +/(a)(?<=b+(?1))/ +Failed: error 125 at offset 3: lookbehind assertion is not fixed length + +/(a+)(?<=b(?1))/ 
+Failed: error 125 at offset 4: lookbehind assertion is not fixed length + +/(a(?<=b(?1)))/ +Failed: error 125 at offset 2: lookbehind assertion is not fixed length + +/(?<=b(?1))xyz/ +Failed: error 115 at offset 8: reference to non-existent subpattern + +/(?<=b(?1))xyz(b+)pqrstuvew/ +Failed: error 125 at offset 0: lookbehind assertion is not fixed length + +/(a|bc)\1/I +Capture group count = 1 +Max back reference = 1 +Starting code units: a b +Subject length lower bound = 2 + +/(a|bc)\1{2,3}/I +Capture group count = 1 +Max back reference = 1 +Starting code units: a b +Subject length lower bound = 3 + +/(a|bc)(?1)/I +Capture group count = 1 +Starting code units: a b +Subject length lower bound = 2 + +/(a|b\1)(a|b\1)/I +Capture group count = 2 +Max back reference = 1 +Starting code units: a b +Subject length lower bound = 2 + +/(a|b\1){2}/I +Capture group count = 1 +Max back reference = 1 +Starting code units: a b +Subject length lower bound = 2 + +/(a|bbbb\1)(a|bbbb\1)/I +Capture group count = 2 +Max back reference = 1 +Starting code units: a b +Subject length lower bound = 2 + +/(a|bbbb\1){2}/I +Capture group count = 1 +Max back reference = 1 +Starting code units: a b +Subject length lower bound = 2 + +/^From +([^ ]+) +[a-zA-Z][a-zA-Z][a-zA-Z] +[a-zA-Z][a-zA-Z][a-zA-Z] +[0-9]?[0-9] +[0-9][0-9]:[0-9][0-9]/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'F' +Last code unit = ':' +Subject length lower bound = 22 + +/]{0,})>]{0,})>([\d]{0,}\.)(.*)((
    ([\w\W\s\d][^<>]{0,})|[\s]{0,}))<\/a><\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD>]{0,})>([\w\W\s\d][^<>]{0,})<\/TD><\/TR>/Iis +Capture group count = 11 +Options: caseless dotall +First code unit = '<' +Last code unit = '>' +Subject length lower bound = 47 + +"(?>.*/)foo"I +Capture group count = 0 +Last code unit = 'o' +Subject length lower bound = 4 + +/(?(?=[^a-z]+[a-z]) \d{2}-[a-z]{3}-\d{2} | \d{2}-\d{2}-\d{2} ) /Ix +Capture group count = 0 +Options: extended +Last code unit = '-' +Subject length lower bound = 8 + +/(?:(?:(?:(?:(?:(?:(?:(?:(?:(a|b|c))))))))))/Ii +Capture group count = 1 +Options: caseless +Starting code units: A B C a b c +Subject length lower bound = 1 + +/(?:c|d)(?:)(?:aaaaaaaa(?:)(?:bbbbbbbb)(?:bbbbbbbb(?:))(?:bbbbbbbb(?:)(?:bbbbbbbb)))/I +Capture group count = 0 +Starting code units: c d +Last code unit = 'b' +Subject length lower bound = 41 + +/A)|(?
    B))/I +Capture group count = 1 +Named capture groups: + a 1 +Starting code units: A B +Subject length lower bound = 1 + AB\=copy=a + 0: A + 1: A + C A (1) a (group 1) + BA\=copy=a + 0: B + 1: B + C B (1) a (group 1) + +/(?|(?A)|(?B))/ +Failed: error 165 at offset 16: different names for subpatterns of the same number are not allowed + +/(?:a(? (?')|(?")) | + b(? (?')|(?")) ) + (?('quote')[a-z]+|[0-9]+)/Ix,dupnames +Capture group count = 6 +Max back reference = 4 +Named capture groups: + apostrophe 2 + apostrophe 5 + quote 1 + quote 4 + realquote 3 + realquote 6 +Options: dupnames extended +Starting code units: a b +Subject length lower bound = 3 + a"aaaaa + 0: a"aaaaa + 1: " + 2: + 3: " + b"aaaaa + 0: b"aaaaa + 1: + 2: + 3: + 4: " + 5: + 6: " +\= Expect no match + b"11111 +No match + a"11111 +No match + +/^(?|(a)(b)(c)(?d)|(?e)) (?('D')X|Y)/IBx,dupnames +------------------------------------------------------------------ + Bra + ^ + Bra + CBra 1 + a + Ket + CBra 2 + b + Ket + CBra 3 + c + Ket + CBra 4 + d + Ket + Alt + CBra 1 + e + Ket + Ket + Cond + Cond ref 2 + X + Alt + Y + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 4 +Max back reference = 4 +Named capture groups: + D 4 + D 1 +Compile options: dupnames extended +Overall options: anchored dupnames extended +Starting code units: a e +Subject length lower bound = 2 + abcdX + 0: abcdX + 1: a + 2: b + 3: c + 4: d + eX + 0: eX + 1: e +\= Expect no match + abcdY +No match + ey +No match + +/(?a) (b)(c) (?d (?(R&A)$ | (?4)) )/IBx,dupnames +------------------------------------------------------------------ + Bra + CBra 1 + a + Ket + CBra 2 + b + Ket + CBra 3 + c + Ket + CBra 4 + d + Cond + Cond recurse 2 + $ + Alt + Recurse + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 4 +Max back reference = 4 +Named capture groups: + A 1 + A 4 +Options: dupnames extended +First code unit = 'a' +Last 
code unit = 'd' +Subject length lower bound = 4 + abcdd + 0: abcdd + 1: a + 2: b + 3: c + 4: dd +\= Expect no match + abcdde +No match + +/abcd*/ + xxxxabcd\=ps + 0: abcd + xxxxabcd\=ph +Partial match: abcd + +/abcd*/i + xxxxabcd\=ps + 0: abcd + xxxxabcd\=ph +Partial match: abcd + XXXXABCD\=ps + 0: ABCD + XXXXABCD\=ph +Partial match: ABCD + +/abc\d*/ + xxxxabc1\=ps + 0: abc1 + xxxxabc1\=ph +Partial match: abc1 + +/(a)bc\1*/ + xxxxabca\=ps + 0: abca + 1: a + xxxxabca\=ph +Partial match: abca + +/abc[de]*/ + xxxxabcde\=ps + 0: abcde + xxxxabcde\=ph +Partial match: abcde + +/(\3)(\1)(a)/allow_empty_class,match_unset_backref,dupnames + cat + 0: a + 1: + 2: + 3: a + +/(\3)(\1)(a)/I,allow_empty_class,match_unset_backref,dupnames +Capture group count = 3 +Max back reference = 3 +Options: allow_empty_class dupnames match_unset_backref +Last code unit = 'a' +Subject length lower bound = 1 + cat + 0: a + 1: + 2: + 3: a + +/(\3)(\1)(a)/I +Capture group count = 3 +Max back reference = 3 +Last code unit = 'a' +Subject length lower bound = 3 +\= Expect no match + cat +No match + +/i(?(DEFINE)(?a))/I +Capture group count = 1 +Named capture groups: + s 1 +First code unit = 'i' +Subject length lower bound = 1 + i + 0: i + +/()i(?(1)a)/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'i' +Subject length lower bound = 1 + ia + 0: ia + 1: + +/(?i)a(?-i)b|c/B +------------------------------------------------------------------ + Bra + /i a + b + Alt + c + Ket + End +------------------------------------------------------------------ + XabX + 0: ab + XAbX + 0: Ab + CcC + 0: c +\= Expect no match + XABX +No match + +/(?i)a(?s)b|c/B +------------------------------------------------------------------ + Bra + /i ab + Alt + /i c + Ket + End +------------------------------------------------------------------ + +/(?i)a(?s-i)b|c/B +------------------------------------------------------------------ + Bra + /i a + b + Alt + c + Ket + End 
+------------------------------------------------------------------ + +/^(ab(c\1)d|x){2}$/B +------------------------------------------------------------------ + Bra + ^ + CBra 1 + ab + CBra 2 + c + \1 + Ket + d + Alt + x + Ket + CBra 1 + ab + CBra 2 + c + \1 + Ket + d + Alt + x + Ket + $ + Ket + End +------------------------------------------------------------------ + xabcxd + 0: xabcxd + 1: abcxd + 2: cx + +/^(?&t)*+(?(DEFINE)(?.))$/B +------------------------------------------------------------------ + Bra + ^ + Braposzero + SBraPos + Recurse + KetRpos + Cond + Cond false + CBra 1 + Any + Ket + Ket + $ + Ket + End +------------------------------------------------------------------ + +/^(?&t)*(?(DEFINE)(?.))$/B +------------------------------------------------------------------ + Bra + ^ + Brazero + SBra + Recurse + KetRmax + Cond + Cond false + CBra 1 + Any + Ket + Ket + $ + Ket + End +------------------------------------------------------------------ + +# This one is here because Perl gives the match as "b" rather than "ab". I +# believe this to be a Perl bug. + +/(?>a\Kb)z|(ab)/ + ab\=startchar + 0: ab + 1: ab + +/(?P(?P0|)|(?P>L2)(?P>L1))/ + abcd + 0: + 1: + 2: + 0abc + 0: 0 + 1: 0 + 2: 0 + +/abc(*MARK:)pqr/ +Failed: error 166 at offset 10: (*MARK) must have an argument + +/abc(*:)pqr/ +Failed: error 166 at offset 6: (*MARK) must have an argument + +/(*COMMIT:X)/B +------------------------------------------------------------------ + Bra + *COMMIT X + Ket + End +------------------------------------------------------------------ + +# This should, and does, fail. In Perl, it does not, which I think is a +# bug because replacing the B in the pattern by (B|D) does make it fail. +# Turning off Perl's optimization by inserting (??{""}) also makes it fail. + +/A(*COMMIT)B/aftertext,mark +\= Expect no match + ACABX +No match + +# These should be different, but in Perl they are not, which I think +# is a bug in Perl. 
+ +/A(*THEN)B|A(*THEN)C/mark + AC + 0: AC + +/A(*PRUNE)B|A(*PRUNE)C/mark +\= Expect no match + AC +No match + +# Mark names can be duplicated. Perl doesn't give a mark for this one, +# though PCRE2 does. + +/^A(*:A)B|^X(*:A)Y/mark +\= Expect no match + XAQQ +No match, mark = A + +# COMMIT at the start of a pattern should be the same as an anchor. Perl +# optimizations defeat this. So does the PCRE2 optimization unless we disable +# it. + +/(*COMMIT)ABC/ + ABCDEFG + 0: ABC + +/(*COMMIT)ABC/no_start_optimize +\= Expect no match + DEFGABC +No match + +/^(ab (c+(*THEN)cd) | xyz)/x +\= Expect no match + abcccd +No match + +/^(ab (c+(*PRUNE)cd) | xyz)/x +\= Expect no match + abcccd +No match + +/^(ab (c+(*FAIL)cd) | xyz)/x +\= Expect no match + abcccd +No match + +# Perl gets some of these wrong + +/(?>.(*ACCEPT))*?5/ + abcde + 0: a + +/(.(*ACCEPT))*?5/ + abcde + 0: a + 1: a + +/(.(*ACCEPT))5/ + abcde + 0: a + 1: a + +/(.(*ACCEPT))*5/ + abcde + 0: a + 1: a + +/A\NB./B +------------------------------------------------------------------ + Bra + A + Any + B + Any + Ket + End +------------------------------------------------------------------ + ACBD + 0: ACBD +\= Expect no match + A\nB +No match + ACB\n +No match + +/A\NB./Bs +------------------------------------------------------------------ + Bra + A + Any + B + AllAny + Ket + End +------------------------------------------------------------------ + ACBD + 0: ACBD + ACB\n + 0: ACB\x0a +\= Expect no match + A\nB +No match + +/A\NB/newline=crlf + A\nB + 0: A\x0aB + A\rB + 0: A\x0dB +\= Expect no match + A\r\nB +No match + +/\R+b/B +------------------------------------------------------------------ + Bra + \R++ + b + Ket + End +------------------------------------------------------------------ + +/\R+\n/B +------------------------------------------------------------------ + Bra + \R+ + \x0a + Ket + End +------------------------------------------------------------------ + +/\R+\d/B 
+------------------------------------------------------------------ + Bra + \R++ + \d + Ket + End +------------------------------------------------------------------ + +/\d*\R/B +------------------------------------------------------------------ + Bra + \d*+ + \R + Ket + End +------------------------------------------------------------------ + +/\s*\R/B +------------------------------------------------------------------ + Bra + \s* + \R + Ket + End +------------------------------------------------------------------ + \x20\x0a + 0: \x0a + \x20\x0d + 0: \x0d + \x20\x0d\x0a + 0: \x0d\x0a + +/\S*\R/B +------------------------------------------------------------------ + Bra + \S*+ + \R + Ket + End +------------------------------------------------------------------ + a\x0a + 0: a\x0a + +/X\h*\R/B +------------------------------------------------------------------ + Bra + X + \h*+ + \R + Ket + End +------------------------------------------------------------------ + X\x20\x0a + 0: X \x0a + +/X\H*\R/B +------------------------------------------------------------------ + Bra + X + \H* + \R + Ket + End +------------------------------------------------------------------ + X\x0d\x0a + 0: X\x0d\x0a + +/X\H+\R/B +------------------------------------------------------------------ + Bra + X + \H+ + \R + Ket + End +------------------------------------------------------------------ + X\x0d\x0a + 0: X\x0d\x0a + +/X\H++\R/B +------------------------------------------------------------------ + Bra + X + \H++ + \R + Ket + End +------------------------------------------------------------------ +\= Expect no match + X\x0d\x0a +No match + +/(?<=abc)def/ + abc\=ph +Partial match: + +/abc$/ + abc + 0: abc + abc\=ps + 0: abc + abc\=ph +Partial match: abc + +/abc$/m + abc + 0: abc + abc\n + 0: abc + abc\=ph +Partial match: abc + abc\n\=ph + 0: abc + abc\=ps + 0: abc + abc\n\=ps + 0: abc + +/abc\z/ + abc + 0: abc + abc\=ps + 0: abc + abc\=ph +Partial match: abc + +/abc\Z/ + abc + 0: abc + 
 abc\=ps + 0: abc + abc\=ph +Partial match: abc + +/abc\b/ + abc + 0: abc + abc\=ps + 0: abc + abc\=ph +Partial match: abc + +/abc\B/ + abc\=ps +Partial match: abc + abc\=ph +Partial match: abc +\= Expect no match + abc +No match + +/.+/ +\= Bad offsets + abc\=offset=4 +Failed: error -33: bad offset value + abc\=offset=-4 +** Invalid value in 'offset=-4' +\= Valid data + abc\=offset=0 + 0: abc + abc\=offset=1 + 0: bc + abc\=offset=2 + 0: c +\= Expect no match + abc\=offset=3 +No match + +/^\cģ/ +Failed: error 168 at offset 3: \c must be followed by a printable ASCII character + +/(?P<abn>(?P=abn)xxx)/B +------------------------------------------------------------------ + Bra + CBra 1 + \1 + xxx + Ket + Ket + End +------------------------------------------------------------------ + +/(a\1z)/B +------------------------------------------------------------------ + Bra + CBra 1 + a + \1 + z + Ket + Ket + End +------------------------------------------------------------------ + +/(?P(?P=abn)(?(?P=axn)xxx)/B +Failed: error 115 at offset 12: reference to non-existent subpattern + +/(?P<abn>(?P=axn)xxx)(?<axn>yy)/B +------------------------------------------------------------------ + Bra + CBra 1 + \2 + xxx + Ket + CBra 2 + yy + Ket + Ket + End +------------------------------------------------------------------ + +# These tests are here because Perl gets the first one wrong. 
+ +/(\R*)(.)/s + \r\n + 0: \x0d + 1: + 2: \x0d + \r\r\n\n\r + 0: \x0d\x0d\x0a\x0a\x0d + 1: \x0d\x0d\x0a\x0a + 2: \x0d + \r\r\n\n\r\n + 0: \x0d\x0d\x0a\x0a\x0d + 1: \x0d\x0d\x0a\x0a + 2: \x0d + +/(\R)*(.)/s + \r\n + 0: \x0d + 1: + 2: \x0d + \r\r\n\n\r + 0: \x0d\x0d\x0a\x0a\x0d + 1: \x0a + 2: \x0d + \r\r\n\n\r\n + 0: \x0d\x0d\x0a\x0a\x0d + 1: \x0a + 2: \x0d + +/((?>\r\n|\n|\x0b|\f|\r|\x85)*)(.)/s + \r\n + 0: \x0d + 1: + 2: \x0d + \r\r\n\n\r + 0: \x0d\x0d\x0a\x0a\x0d + 1: \x0d\x0d\x0a\x0a + 2: \x0d + \r\r\n\n\r\n + 0: \x0d\x0d\x0a\x0a\x0d + 1: \x0d\x0d\x0a\x0a + 2: \x0d + +# ------------- + +/^abc$/B +------------------------------------------------------------------ + Bra + ^ + abc + $ + Ket + End +------------------------------------------------------------------ + +/^abc$/Bm +------------------------------------------------------------------ + Bra + /m ^ + abc + /m $ + Ket + End +------------------------------------------------------------------ + +/^(a)*+(\w)/ + aaaaX + 0: aaaaX + 1: a + 2: X +\= Expect no match + aaaa +No match + +/^(?:a)*+(\w)/ + aaaaX + 0: aaaaX + 1: X +\= Expect no match + aaaa +No match + +/(a)++1234/IB +------------------------------------------------------------------ + Bra + CBraPos 1 + a + KetRpos + 1234 + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +First code unit = 'a' +Last code unit = '4' +Subject length lower bound = 5 + +/([abc])++1234/I +Capture group count = 1 +Starting code units: a b c +Last code unit = '4' +Subject length lower bound = 5 + +/(?<=(abc)+)X/ +Failed: error 125 at offset 0: lookbehind assertion is not fixed length + +/(^ab)/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + +/(^ab)++/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + +/(^ab|^)+/I +Capture group count = 1 +May match empty string +Compile 
options: +Overall options: anchored +Subject length lower bound = 0 + +/(^ab|^)++/I +Capture group count = 1 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + +/(?:^ab)/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + +/(?:^ab)++/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 2 + +/(?:^ab|^)+/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + +/(?:^ab|^)++/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + +/(.*ab)/I +Capture group count = 1 +First code unit at start or follows newline +Last code unit = 'b' +Subject length lower bound = 2 + +/(.*ab)++/I +Capture group count = 1 +First code unit at start or follows newline +Last code unit = 'b' +Subject length lower bound = 2 + +/(.*ab|.*)+/I +Capture group count = 1 +May match empty string +First code unit at start or follows newline +Subject length lower bound = 0 + +/(.*ab|.*)++/I +Capture group count = 1 +May match empty string +First code unit at start or follows newline +Subject length lower bound = 0 + +/(?:.*ab)/I +Capture group count = 0 +First code unit at start or follows newline +Last code unit = 'b' +Subject length lower bound = 2 + +/(?:.*ab)++/I +Capture group count = 0 +First code unit at start or follows newline +Last code unit = 'b' +Subject length lower bound = 2 + +/(?:.*ab|.*)+/I +Capture group count = 0 +May match empty string +First code unit at start or follows newline +Subject length lower bound = 0 + +/(?:.*ab|.*)++/I +Capture group count = 0 +May match empty string +First code unit at start or follows newline +Subject length lower bound = 0 + +/(?=a)[bcd]/I +Capture group count = 0 +First code unit = 'a' +Subject length lower 
bound = 1 + +/((?=a))[bcd]/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/((?=a))+[bcd]/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/((?=a))++[bcd]/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/(?=a+)[bcd]/Ii +Capture group count = 0 +Options: caseless +First code unit = 'a' (caseless) +Subject length lower bound = 1 + +/(?=a+?)[bcd]/Ii +Capture group count = 0 +Options: caseless +First code unit = 'a' (caseless) +Subject length lower bound = 1 + +/(?=a++)[bcd]/Ii +Capture group count = 0 +Options: caseless +First code unit = 'a' (caseless) +Subject length lower bound = 1 + +/(?=a{3})[bcd]/Ii +Capture group count = 0 +Options: caseless +First code unit = 'a' (caseless) +Last code unit = 'a' (caseless) +Subject length lower bound = 2 + +/(abc)\1+/ + +# Perl doesn't get these right IMO (the 3rd is PCRE2-specific) + +/(?1)(?:(b(*ACCEPT))){0}/ + b + 0: b + +/(?1)(?:(b(*ACCEPT))){0}c/ + bc + 0: bc +\= Expect no match + b +No match + +/(?1)(?:((*ACCEPT))){0}c/ + c + 0: c + c\=notempty + 0: c + +/^.*?(?(?=a)a|b(*THEN)c)/ +\= Expect no match + ba +No match + +/^.*?(?(?=a)a|bc)/ + ba + 0: ba + +/^.*?(?(?=a)a(*THEN)b|c)/ +\= Expect no match + ac +No match + +/^.*?(?(?=a)a(*THEN)b)c/ +\= Expect no match + ac +No match + +/^.*?(a(*THEN)b)c/ +\= Expect no match + aabc +No match + +/^.*? (?1) c (?(DEFINE)(a(*THEN)b))/x + aabc + 0: aabc + +/^.*?(a(*THEN)b|z)c/ + aabc + 0: aabc + 1: ab + +/^.*?(z|a(*THEN)b)c/ + aabc + 0: aabc + 1: ab + +# These are here because they are not Perl-compatible; the studying means the +# mark is not seen. 
+ +/(*MARK:A)(*SKIP:B)(C|X)/mark + C + 0: C + 1: C +MK: A +\= Expect no match + D +No match, mark = A + +/(*:A)A+(*SKIP:A)(B|Z)/mark +\= Expect no match + AAAC +No match, mark = A + +# ---------------------------- + +"(?=a*(*ACCEPT)b)c" + c + 0: c + c\=notempty + 0: c + +/(?1)c(?(DEFINE)((*ACCEPT)b))/ + c + 0: c + c\=notempty + 0: c + +/(?>(*ACCEPT)b)c/ + c + 0: +\= Expect no match + c\=notempty +No match + +/(?:(?>(a)))+a%/allaftertext + %aa% + 0: aa% + 0+ + 1: a + 1+ a% + +/(a)b|ac/allaftertext + ac\=ovector=1 + 0: ac + 0+ + +/(a)(b)x|abc/allaftertext + abc\=ovector=2 + 0: abc + 0+ + +/(a)bc|(a)(b)\2/ + abc\=ovector=1 +Matched, but too many substrings + 0: abc + abc\=ovector=2 + 0: abc + 1: a + aba\=ovector=1 +Matched, but too many substrings + 0: aba + aba\=ovector=2 +Matched, but too many substrings + 0: aba + 1: + aba\=ovector=3 +Matched, but too many substrings + 0: aba + 1: + 2: a + aba\=ovector=4 + 0: aba + 1: + 2: a + 3: b + +/(?(DEFINE)(a(?2)|b)(b(?1)|a))(?:(?1)|(?2))/I +Capture group count = 2 +May match empty string +Subject length lower bound = 0 + +/(a(?2)|b)(b(?1)|a)(?:(?1)|(?2))/I +Capture group count = 2 +Starting code units: a b +Subject length lower bound = 3 + +/(a(?2)|b)(b(?1)|a)(?1)(?2)/I +Capture group count = 2 +Starting code units: a b +Subject length lower bound = 4 + +/(abc)(?1)/I +Capture group count = 1 +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 6 + +/(?:(foo)|(bar)|(baz))X/allcaptures + bazfooX + 0: fooX + 1: foo + 2: + 3: + foobazbarX + 0: barX + 1: + 2: bar + 3: + barfooX + 0: fooX + 1: foo + 2: + 3: + bazX + 0: bazX + 1: + 2: + 3: baz + foobarbazX + 0: bazX + 1: + 2: + 3: baz + bazfooX\=ovector=0 + 0: fooX + 1: foo + 2: + 3: + bazfooX\=ovector=1 +Matched, but too many substrings + 0: fooX + bazfooX\=ovector=2 + 0: fooX + 1: foo + bazfooX\=ovector=3 + 0: fooX + 1: foo + 2: + +/(?=abc){3}abc/B +------------------------------------------------------------------ + Bra + Assert + abc + Ket + Assert + abc + 
Ket + Assert + abc + Ket + abc + Ket + End +------------------------------------------------------------------ + +/(?=abc)+abc/B +------------------------------------------------------------------ + Bra + Assert + abc + Ket + Brazero + Assert + abc + Ket + abc + Ket + End +------------------------------------------------------------------ + +/(?=abc)++abc/B +------------------------------------------------------------------ + Bra + Once + Assert + abc + Ket + Brazero + Assert + abc + Ket + Ket + abc + Ket + End +------------------------------------------------------------------ + +/(?=abc){0}xyz/B +------------------------------------------------------------------ + Bra + Skip zero + Assert + abc + Ket + xyz + Ket + End +------------------------------------------------------------------ + +/(?=(a))?./B +------------------------------------------------------------------ + Bra + Brazero + Assert + CBra 1 + a + Ket + Ket + Any + Ket + End +------------------------------------------------------------------ + +/(?=(a))??./B +------------------------------------------------------------------ + Bra + Braminzero + Assert + CBra 1 + a + Ket + Ket + Any + Ket + End +------------------------------------------------------------------ + +/^(?=(a)){0}b(?1)/B +------------------------------------------------------------------ + Bra + ^ + Skip zero + Assert + CBra 1 + a + Ket + Ket + b + Recurse + Ket + End +------------------------------------------------------------------ + +/(?(DEFINE)(a))?b(?1)/B +------------------------------------------------------------------ + Bra + Cond + Cond false + CBra 1 + a + Ket + Ket + b + Recurse + Ket + End +------------------------------------------------------------------ + +/^(?=(?1))?[az]([abc])d/B +------------------------------------------------------------------ + Bra + ^ + Brazero + Assert + Recurse + Ket + [az] + CBra 1 + [a-c] + Ket + d + Ket + End +------------------------------------------------------------------ + +/^(?!a){0}\w+/B 
+------------------------------------------------------------------ + Bra + ^ + Skip zero + Assert not + a + Ket + \w++ + Ket + End +------------------------------------------------------------------ + +/(?<=(abc))?xyz/B +------------------------------------------------------------------ + Bra + Brazero + Assert back + Reverse + CBra 1 + abc + Ket + Ket + xyz + Ket + End +------------------------------------------------------------------ + +/[:a[:abc]b:]/B +------------------------------------------------------------------ + Bra + [:[a-c] + b:] + Ket + End +------------------------------------------------------------------ + +/^(a(*:A)(d|e(*:B))z|aeq)/auto_callout + adz +--->adz + +0 ^ ^ + +1 ^ ( + +2 ^ a + +3 ^^ (*:A) + +8 ^^ ( +Latest Mark: A + +9 ^^ d ++10 ^ ^ | ++18 ^ ^ z ++19 ^ ^ | ++24 ^ ^ End of pattern + 0: adz + 1: adz + 2: d + aez +--->aez + +0 ^ ^ + +1 ^ ( + +2 ^ a + +3 ^^ (*:A) + +8 ^^ ( +Latest Mark: A + +9 ^^ d ++11 ^^ e ++12 ^ ^ (*:B) ++17 ^ ^ ) +Latest Mark: B ++18 ^ ^ z ++19 ^ ^ | ++24 ^ ^ End of pattern + 0: aez + 1: aez + 2: e + aeqwerty +--->aeqwerty + +0 ^ ^ + +1 ^ ( + +2 ^ a + +3 ^^ (*:A) + +8 ^^ ( +Latest Mark: A + +9 ^^ d ++11 ^^ e ++12 ^ ^ (*:B) ++17 ^ ^ ) +Latest Mark: B ++18 ^ ^ z ++20 ^ a ++21 ^^ e ++22 ^ ^ q ++23 ^ ^ ) ++24 ^ ^ End of pattern + 0: aeq + 1: aeq + +/.(*F)/ +\= Expect no match + abc\=ph +No match + +/\btype\b\W*?\btext\b\W*?\bjavascript\b/I +Capture group count = 0 +Max lookbehind = 1 +First code unit = 't' +Last code unit = 't' +Subject length lower bound = 18 + +/\btype\b\W*?\btext\b\W*?\bjavascript\b|\burl\b\W*?\bshell:|a+)(?>(z+))\w/B +------------------------------------------------------------------ + Bra + ^ + Once + a++ + Ket + Once + CBra 1 + z++ + Ket + Ket + \w + Ket + End +------------------------------------------------------------------ + aaaazzzzb + 0: aaaazzzzb + 1: zzzz +\= Expect no match + aazz +No match + +/(.)(\1|a(?2))/ + bab + 0: bab + 1: b + 2: ab + +/\1|(.)(?R)\1/ + cbbbc + 0: cbbbc + 1: c + 
 +/(.)((?(1)c|a)|a(?2))/ +\= Expect no match + baa +No match + +/(?P<abn>(?P=abn)xxx)/B +------------------------------------------------------------------ + Bra + CBra 1 + \1 + xxx + Ket + Ket + End +------------------------------------------------------------------ + +/(a\1z)/B +------------------------------------------------------------------ + Bra + CBra 1 + a + \1 + z + Ket + Ket + End +------------------------------------------------------------------ + +/^a\x41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz + 0: aAz +\= Expect no match + ax41z +No match + +/^a[m\x41]z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz + 0: aAz + +/^a\x1z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + ax1z + 0: ax1z + +/^a\u0041z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz + 0: aAz +\= Expect no match + au0041z +No match + +/^a[m\u0041]z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aAz + 0: aAz + +/^a\u041z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + au041z + 0: au041z +\= Expect no match + aAz +No match + +/^a\U0041z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aU0041z + 0: aU0041z +\= Expect no match + aAz +No match + +/^\u{7a}/alt_bsux + u{7a} + 0: u{7a} +\= Expect no match + zoo +No match + +/^\u{7a}/extra_alt_bsux + zoo + 0: z + +/(?(?=c)c|d)++Y/B +------------------------------------------------------------------ + Bra + BraPos + Cond + Assert + c + Ket + c + Alt + d + Ket + KetRpos + Y + Ket + End +------------------------------------------------------------------ + +/(?(?=c)c|d)*+Y/B +------------------------------------------------------------------ + Bra + Braposzero + BraPos + Cond + Assert + c + Ket + c + Alt + d + Ket + KetRpos + Y + Ket + End +------------------------------------------------------------------ + +/a[\NB]c/ +Failed: error 171 at offset 4: \N is not supported in a class + aNc + +/a[B-\Nc]/ +Failed: error 150 at offset 6: invalid range in character class + 
+/a[B\Nc]/ +Failed: error 171 at offset 5: \N is not supported in a class + +/(a)(?2){0,1999}?(b)/ + +/(a)(?(DEFINE)(b))(?2){0,1999}?(?2)/ + +# This test, with something more complicated than individual letters, causes +# different behaviour in Perl. Perhaps it disables some optimization; no tag is +# passed back for the failures, whereas in PCRE2 there is a tag. + +/(A|P)(*:A)(B|P) | (X|P)(X|P)(*:B)(Y|P)/x,mark + AABC + 0: AB + 1: A + 2: B +MK: A + XXYZ + 0: XXY + 1: + 2: + 3: X + 4: X + 5: Y +MK: B +\= Expect no match + XAQQ +No match, mark = A + XAQQXZZ +No match, mark = A + AXQQQ +No match, mark = A + AXXQQQ +No match, mark = B + +# Perl doesn't give marks for these, though it does if the alternatives are +# replaced by single letters. + +/(b|q)(*:m)f|a(*:n)w/mark + aw + 0: aw +MK: n +\= Expect no match + abc +No match, mark = m + +/(q|b)(*:m)f|a(*:n)w/mark + aw + 0: aw +MK: n +\= Expect no match + abc +No match, mark = m + +# After a partial match, the behaviour is as for a failure. + +/^a(*:X)bcde/mark + abc\=ps +Partial match, mark=X: abc + +# These are here because Perl doesn't return a mark, except for the first. 
+ +/(?=(*:x))(q|)/aftertext,mark + abc + 0: + 0+ abc + 1: +MK: x + +/(?=(*:x))((*:y)q|)/aftertext,mark + abc + 0: + 0+ abc + 1: +MK: x + +/(?=(*:x))(?:(*:y)q|)/aftertext,mark + abc + 0: + 0+ abc +MK: x + +/(?=(*:x))(?>(*:y)q|)/aftertext,mark + abc + 0: + 0+ abc +MK: x + +/(?=a(*:x))(?!a(*:y)c)/aftertext,mark + ab + 0: + 0+ ab +MK: x + +/(?=a(*:x))(?=a(*:y)c|)/aftertext,mark + ab + 0: + 0+ ab +MK: x + +/(..)\1/ + ab\=ps +Partial match: ab + aba\=ps +Partial match: aba + abab\=ps + 0: abab + 1: ab + +/(..)\1/i + ab\=ps +Partial match: ab + abA\=ps +Partial match: abA + aBAb\=ps + 0: aBAb + 1: aB + +/(..)\1{2,}/ + ab\=ps +Partial match: ab + aba\=ps +Partial match: aba + abab\=ps +Partial match: abab + ababa\=ps +Partial match: ababa + ababab\=ps + 0: ababab + 1: ab + ababab\=ph +Partial match: ababab + abababa\=ps + 0: ababab + 1: ab + abababa\=ph +Partial match: abababa + +/(..)\1{2,}/i + ab\=ps +Partial match: ab + aBa\=ps +Partial match: aBa + aBAb\=ps +Partial match: aBAb + AbaBA\=ps +Partial match: AbaBA + abABAb\=ps + 0: abABAb + 1: ab + aBAbaB\=ph +Partial match: aBAbaB + abABabA\=ps + 0: abABab + 1: ab + abaBABa\=ph +Partial match: abaBABa + +/(..)\1{2,}?x/i + ab\=ps +Partial match: ab + abA\=ps +Partial match: abA + aBAb\=ps +Partial match: aBAb + abaBA\=ps +Partial match: abaBA + abAbaB\=ps +Partial match: abAbaB + abaBabA\=ps +Partial match: abaBabA + abAbABaBx\=ps + 0: abAbABaBx + 1: ab + +/^(..)\1/ + aba\=ps +Partial match: aba + +/^(..)\1{2,3}x/ + aba\=ps +Partial match: aba + ababa\=ps +Partial match: ababa + ababa\=ph +Partial match: ababa + abababx + 0: abababx + 1: ab + ababababx + 0: ababababx + 1: ab + +/^(..)\1{2,3}?x/ + aba\=ps +Partial match: aba + ababa\=ps +Partial match: ababa + ababa\=ph +Partial match: ababa + abababx + 0: abababx + 1: ab + ababababx + 0: ababababx + 1: ab + +/^(..)(\1{2,3})ab/ + abababab + 0: abababab + 1: ab + 2: abab + +/^\R/ + \r\=ps + 0: \x0d + \r\=ph +Partial match: \x0d + +/^\R{2,3}x/ + \r\=ps +Partial match: \x0d + 
\r\=ph +Partial match: \x0d + \r\r\=ps +Partial match: \x0d\x0d + \r\r\=ph +Partial match: \x0d\x0d + \r\r\r\=ps +Partial match: \x0d\x0d\x0d + \r\r\r\=ph +Partial match: \x0d\x0d\x0d + \r\rx + 0: \x0d\x0dx + \r\r\rx + 0: \x0d\x0d\x0dx + +/^\R{2,3}?x/ + \r\=ps +Partial match: \x0d + \r\=ph +Partial match: \x0d + \r\r\=ps +Partial match: \x0d\x0d + \r\r\=ph +Partial match: \x0d\x0d + \r\r\r\=ps +Partial match: \x0d\x0d\x0d + \r\r\r\=ph +Partial match: \x0d\x0d\x0d + \r\rx + 0: \x0d\x0dx + \r\r\rx + 0: \x0d\x0d\x0dx + +/^\R?x/ + \r\=ps +Partial match: \x0d + \r\=ph +Partial match: \x0d + x + 0: x + \rx + 0: \x0dx + +/^\R+x/ + \r\=ps +Partial match: \x0d + \r\=ph +Partial match: \x0d + \r\n\=ps +Partial match: \x0d\x0a + \r\n\=ph +Partial match: \x0d\x0a + \rx + 0: \x0dx + +/^a$/newline=crlf + a\r\=ps +Partial match: a\x0d + a\r\=ph +Partial match: a\x0d + +/^a$/m,newline=crlf + a\r\=ps +Partial match: a\x0d + a\r\=ph +Partial match: a\x0d + +/^(a$|a\r)/newline=crlf + a\r\=ps + 0: a\x0d + 1: a\x0d + a\r\=ph +Partial match: a\x0d + +/^(a$|a\r)/m,newline=crlf + a\r\=ps + 0: a\x0d + 1: a\x0d + a\r\=ph +Partial match: a\x0d + +/./newline=crlf + \r\=ps + 0: \x0d + \r\=ph +Partial match: \x0d + +/.{2,3}/newline=crlf + \r\=ps +Partial match: \x0d + \r\=ph +Partial match: \x0d + \r\r\=ps + 0: \x0d\x0d + \r\r\=ph +Partial match: \x0d\x0d + \r\r\r\=ps + 0: \x0d\x0d\x0d + \r\r\r\=ph +Partial match: \x0d\x0d\x0d + +/.{2,3}?/newline=crlf + \r\=ps +Partial match: \x0d + \r\=ph +Partial match: \x0d + \r\r\=ps + 0: \x0d\x0d + \r\r\=ph +Partial match: \x0d\x0d + \r\r\r\=ps + 0: \x0d\x0d + \r\r\r\=ph + 0: \x0d\x0d + +"AB(C(D))(E(F))?(?(?=\2)(?=\4))" + ABCDGHI\=ovector=01 +Matched, but too many substrings + 0: ABCD + +# These are all run as real matches in test 1; here we are just checking the +# settings of the anchored and startline bits. 
+ +/(?>.*?a)(?<=ba)/I +Capture group count = 0 +Max lookbehind = 2 +Last code unit = 'a' +Subject length lower bound = 1 + +/(?:.*?a)(?<=ba)/I +Capture group count = 0 +Max lookbehind = 2 +First code unit at start or follows newline +Last code unit = 'a' +Subject length lower bound = 1 + +/.*?a(*PRUNE)b/I +Capture group count = 0 +Last code unit = 'b' +Subject length lower bound = 2 + +/.*?a(*PRUNE)b/Is +Capture group count = 0 +Options: dotall +Last code unit = 'b' +Subject length lower bound = 2 + +/^a(*PRUNE)b/Is +Capture group count = 0 +Compile options: dotall +Overall options: anchored dotall +First code unit = 'a' +Subject length lower bound = 2 + +/.*?a(*SKIP)b/I +Capture group count = 0 +Last code unit = 'b' +Subject length lower bound = 2 + +/(?>.*?a)b/Is +Capture group count = 0 +Options: dotall +Last code unit = 'b' +Subject length lower bound = 2 + +/(?>.*?a)b/I +Capture group count = 0 +Last code unit = 'b' +Subject length lower bound = 2 + +/(?>^a)b/Is +Capture group count = 0 +Compile options: dotall +Overall options: anchored dotall +First code unit = 'a' +Subject length lower bound = 2 + +/(?>.*?)(?<=(abcd)|(wxyz))/I +Capture group count = 2 +Max lookbehind = 4 +May match empty string +Subject length lower bound = 0 + +/(?>.*)(?<=(abcd)|(wxyz))/I +Capture group count = 2 +Max lookbehind = 4 +May match empty string +Subject length lower bound = 0 + +"(?>.*)foo"I +Capture group count = 0 +Last code unit = 'o' +Subject length lower bound = 3 + +"(?>.*?)foo"I +Capture group count = 0 +Last code unit = 'o' +Subject length lower bound = 3 + +/(?>^abc)/Im +Capture group count = 0 +Options: multiline +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + +/(?>.*abc)/Im +Capture group count = 0 +Options: multiline +Last code unit = 'c' +Subject length lower bound = 3 + +/(?:.*abc)/Im +Capture group count = 0 +Options: multiline +First code unit at start or follows newline +Last code unit = 'c' +Subject length 
lower bound = 3 + +/(?:(a)+(?C1)bb|aa(?C2)b)/ + aab\=callout_capture +Callout 1: last capture = 1 + 1: a +--->aab + ^ ^ b +Callout 1: last capture = 1 + 1: a +--->aab + ^^ b +Callout 2: last capture = 0 +--->aab + ^ ^ b + 0: aab + +/(?:(a)++(?C1)bb|aa(?C2)b)/ + aab\=callout_capture +Callout 1: last capture = 1 + 1: a +--->aab + ^ ^ b +Callout 2: last capture = 0 +--->aab + ^ ^ b + 0: aab + +/(?:(?>(a))(?C1)bb|aa(?C2)b)/ + aab\=callout_capture +Callout 1: last capture = 1 + 1: a +--->aab + ^^ b +Callout 2: last capture = 0 +--->aab + ^ ^ b + 0: aab + +/(?:(?1)(?C1)x|ab(?C2))((a)){0}/ + aab\=callout_capture +Callout 1: last capture = 0 +--->aab + ^^ x +Callout 1: last capture = 0 +--->aab + ^^ x +Callout 2: last capture = 0 +--->aab + ^ ^ ) + 0: ab + +/(?1)(?C1)((a)(?C2)){0}/ + aab\=callout_capture +Callout 2: last capture = 2 + 1: + 2: a +--->aab + ^^ ){0} +Callout 1: last capture = 0 +--->aab + ^^ ( + 0: a + +/(?:(a)+(?C1)bb|aa(?C2)b)++/ + aab\=callout_capture +Callout 1: last capture = 1 + 1: a +--->aab + ^ ^ b +Callout 1: last capture = 1 + 1: a +--->aab + ^^ b +Callout 2: last capture = 0 +--->aab + ^ ^ b + 0: aab + aab\=callout_capture,ovector=1 +Callout 1: last capture = 1 + 1: a +--->aab + ^ ^ b +Callout 1: last capture = 1 + 1: a +--->aab + ^^ b +Callout 2: last capture = 0 +--->aab + ^ ^ b + 0: aab + +/(ab)x|ab/ + ab\=ovector=0 + 0: ab + ab\=ovector=1 + 0: ab + +/(?<=123)(*MARK:xx)abc/mark + xxxx123a\=ph +Partial match, mark=xx: a + xxxx123a\=ps +Partial match, mark=xx: a + +/123\Kabc/startchar + xxxx123a\=ph +Partial match: 123a + xxxx123a\=ps +Partial match: 123a + +/^(?(?=a)aa|bb)/auto_callout + bb +--->bb + +0 ^ ^ + +1 ^ (? + +3 ^ (?= + +6 ^ a ++11 ^ b ++12 ^^ b ++13 ^ ^ ) ++14 ^ ^ End of pattern + 0: bb + +/(?C1)^(?C2)(?(?C99)(?=(?C3)a(?C4))(?C5)a(?C6)a(?C7)|(?C8)b(?C9)b(?C10))(?C11)/ + bb +--->bb + 1 ^ ^ + 2 ^ (? + 99 ^ (?= + 3 ^ a + 8 ^ b + 9 ^^ b + 10 ^ ^ ) + 11 ^ ^ End of pattern + 0: bb + +# Perl seems to have a bug with this one. 
 +/aaaaa(*COMMIT)(*PRUNE)b|a+c/ + aaaaaac + 0: aaaac + +# Here are some that Perl treats differently because of the way it handles +# backtracking verbs. + +/(?!a(*COMMIT)b)ac|ad/ + ac + 0: ac + ad + 0: ad + +/^(?!a(*THEN)b|ac)../ + ad + 0: ad +\= Expect no match + ac +No match + +/^(?=a(*THEN)b|ac)/ + ac + 0: + +/\A.*?(?:a|b(*THEN)c)/ + ba + 0: ba + +/\A.*?(?:a|b(*THEN)c)++/ + ba + 0: ba + +/\A.*?(?:a|b(*THEN)c|d)/ + ba + 0: ba + +/(?:(a(*MARK:X)a+(*SKIP:X)b)){0}(?:(?1)|aac)/ + aac + 0: aac + +/\A.*?(a|b(*THEN)c)/ + ba + 0: ba + 1: a + +/^(A(*THEN)B|A(*THEN)D)/ + AD + 0: AD + 1: AD + +/(?!b(*THEN)a)bn|bnn/ + bnn + 0: bn + +/(?(?=b(*SKIP)a)bn|bnn)/ + bnn + 0: bnn + +/(?=b(*THEN)a|)bn|bnn/ + bnn + 0: bn + +# This test causes a segfault with Perl 5.18.0 + +/^(?=(a)){0}b(?1)/ + backgammon + 0: ba + +/(?|(?<n>f)|(?<n>b))/I,dupnames +Capture group count = 1 +Named capture groups: + n 1 +Options: dupnames +Starting code units: b f +Subject length lower bound = 1 + +/(?<a>abc)(?<a>z)\k<a>()/IB,dupnames +------------------------------------------------------------------ + Bra + CBra 1 + abc + Ket + CBra 2 + z + Ket + \k2 + CBra 3 + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 3 +Max back reference = 2 +Named capture groups: + a 1 + a 2 +Options: dupnames +First code unit = 'a' +Last code unit = 'z' +Subject length lower bound = 5 + +/a*[bcd]/B +------------------------------------------------------------------ + Bra + a*+ + [b-d] + Ket + End +------------------------------------------------------------------ + +/[bcd]*a/B +------------------------------------------------------------------ + Bra + [b-d]*+ + a + Ket + End +------------------------------------------------------------------ + +# A complete set of tests for auto-possessification of character types, but +# omitting \C because it might be disabled (it has its own tests). + +/\D+\D \D+\d \D+\S \D+\s \D+\W \D+\w \D+. 
\D+\R \D+\H \D+\h \D+\V \D+\v \D+\Z \D+\z \D+$/Bx +------------------------------------------------------------------ + Bra + \D+ + \D + \D++ + \d + \D+ + \S + \D+ + \s + \D+ + \W + \D+ + \w + \D+ + Any + \D+ + \R + \D+ + \H + \D+ + \h + \D+ + \V + \D+ + \v + \D+ + \Z + \D++ + \z + \D+ + $ + Ket + End +------------------------------------------------------------------ + +/\d+\D \d+\d \d+\S \d+\s \d+\W \d+\w \d+. \d+\R \d+\H \d+\h \d+\V \d+\v \d+\Z \d+\z \d+$/Bx +------------------------------------------------------------------ + Bra + \d++ + \D + \d+ + \d + \d+ + \S + \d++ + \s + \d++ + \W + \d+ + \w + \d+ + Any + \d++ + \R + \d+ + \H + \d++ + \h + \d+ + \V + \d++ + \v + \d++ + \Z + \d++ + \z + \d++ + $ + Ket + End +------------------------------------------------------------------ + +/\S+\D \S+\d \S+\S \S+\s \S+\W \S+\w \S+. \S+\R \S+\H \S+\h \S+\V \S+\v \S+\Z \S+\z \S+$/Bx +------------------------------------------------------------------ + Bra + \S+ + \D + \S+ + \d + \S+ + \S + \S++ + \s + \S+ + \W + \S+ + \w + \S+ + Any + \S++ + \R + \S+ + \H + \S++ + \h + \S+ + \V + \S++ + \v + \S++ + \Z + \S++ + \z + \S++ + $ + Ket + End +------------------------------------------------------------------ + +/\s+\D \s+\d \s+\S \s+\s \s+\W \s+\w \s+. \s+\R \s+\H \s+\h \s+\V \s+\v \s+\Z \s+\z \s+$/Bx +------------------------------------------------------------------ + Bra + \s+ + \D + \s++ + \d + \s++ + \S + \s+ + \s + \s+ + \W + \s++ + \w + \s+ + Any + \s+ + \R + \s+ + \H + \s+ + \h + \s+ + \V + \s+ + \v + \s+ + \Z + \s++ + \z + \s+ + $ + Ket + End +------------------------------------------------------------------ + +/\W+\D \W+\d \W+\S \W+\s \W+\W \W+\w \W+. 
\W+\R \W+\H \W+\h \W+\V \W+\v \W+\Z \W+\z \W+$/Bx +------------------------------------------------------------------ + Bra + \W+ + \D + \W++ + \d + \W+ + \S + \W+ + \s + \W+ + \W + \W++ + \w + \W+ + Any + \W+ + \R + \W+ + \H + \W+ + \h + \W+ + \V + \W+ + \v + \W+ + \Z + \W++ + \z + \W+ + $ + Ket + End +------------------------------------------------------------------ + +/\w+\D \w+\d \w+\S \w+\s \w+\W \w+\w \w+. \w+\R \w+\H \w+\h \w+\V \w+\v \w+\Z \w+\z \w+$/Bx +------------------------------------------------------------------ + Bra + \w+ + \D + \w+ + \d + \w+ + \S + \w++ + \s + \w++ + \W + \w+ + \w + \w+ + Any + \w++ + \R + \w+ + \H + \w++ + \h + \w+ + \V + \w++ + \v + \w++ + \Z + \w++ + \z + \w++ + $ + Ket + End +------------------------------------------------------------------ + +/\R+\D \R+\d \R+\S \R+\s \R+\W \R+\w \R+. \R+\R \R+\H \R+\h \R+\V \R+\v \R+\Z \R+\z \R+$/Bx +------------------------------------------------------------------ + Bra + \R+ + \D + \R++ + \d + \R+ + \S + \R++ + \s + \R+ + \W + \R++ + \w + \R++ + Any + \R+ + \R + \R+ + \H + \R++ + \h + \R+ + \V + \R+ + \v + \R+ + \Z + \R++ + \z + \R+ + $ + Ket + End +------------------------------------------------------------------ + +/\H+\D \H+\d \H+\S \H+\s \H+\W \H+\w \H+. \H+\R \H+\H \H+\h \H+\V \H+\v \H+\Z \H+\z \H+$/Bx +------------------------------------------------------------------ + Bra + \H+ + \D + \H+ + \d + \H+ + \S + \H+ + \s + \H+ + \W + \H+ + \w + \H+ + Any + \H+ + \R + \H+ + \H + \H++ + \h + \H+ + \V + \H+ + \v + \H+ + \Z + \H++ + \z + \H+ + $ + Ket + End +------------------------------------------------------------------ + +/\h+\D \h+\d \h+\S \h+\s \h+\W \h+\w \h+. 
\h+\R \h+\H \h+\h \h+\V \h+\v \h+\Z \h+\z \h+$/Bx +------------------------------------------------------------------ + Bra + \h+ + \D + \h++ + \d + \h++ + \S + \h+ + \s + \h+ + \W + \h++ + \w + \h+ + Any + \h++ + \R + \h++ + \H + \h+ + \h + \h+ + \V + \h++ + \v + \h+ + \Z + \h++ + \z + \h+ + $ + Ket + End +------------------------------------------------------------------ + +/\V+\D \V+\d \V+\S \V+\s \V+\W \V+\w \V+. \V+\R \V+\H \V+\h \V+\V \V+\v \V+\Z \V+\z \V+$/Bx +------------------------------------------------------------------ + Bra + \V+ + \D + \V+ + \d + \V+ + \S + \V+ + \s + \V+ + \W + \V+ + \w + \V+ + Any + \V++ + \R + \V+ + \H + \V+ + \h + \V+ + \V + \V++ + \v + \V+ + \Z + \V++ + \z + \V+ + $ + Ket + End +------------------------------------------------------------------ + +/\v+\D \v+\d \v+\S \v+\s \v+\W \v+\w \v+. \v+\R \v+\H \v+\h \v+\V \v+\v \v+\Z \v+\z \v+$/Bx +------------------------------------------------------------------ + Bra + \v+ + \D + \v++ + \d + \v++ + \S + \v+ + \s + \v+ + \W + \v++ + \w + \v+ + Any + \v+ + \R + \v+ + \H + \v++ + \h + \v++ + \V + \v+ + \v + \v+ + \Z + \v++ + \z + \v+ + $ + Ket + End +------------------------------------------------------------------ + +/ a+\D a+\d a+\S a+\s a+\W a+\w a+. a+\R a+\H a+\h a+\V a+\v a+\Z a+\z a+$/Bx +------------------------------------------------------------------ + Bra + a+ + \D + a++ + \d + a+ + \S + a++ + \s + a++ + \W + a+ + \w + a+ + Any + a++ + \R + a+ + \H + a++ + \h + a+ + \V + a++ + \v + a++ + \Z + a++ + \z + a++ + $ + Ket + End +------------------------------------------------------------------ + +/\n+\D \n+\d \n+\S \n+\s \n+\W \n+\w \n+. 
\n+\R \n+\H \n+\h \n+\V \n+\v \n+\Z \n+\z \n+$/Bx +------------------------------------------------------------------ + Bra + \x0a+ + \D + \x0a++ + \d + \x0a++ + \S + \x0a+ + \s + \x0a+ + \W + \x0a++ + \w + \x0a+ + Any + \x0a+ + \R + \x0a+ + \H + \x0a++ + \h + \x0a++ + \V + \x0a+ + \v + \x0a+ + \Z + \x0a++ + \z + \x0a+ + $ + Ket + End +------------------------------------------------------------------ + +/ .+\D .+\d .+\S .+\s .+\W .+\w .+. .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/Bx +------------------------------------------------------------------ + Bra + Any+ + \D + Any+ + \d + Any+ + \S + Any+ + \s + Any+ + \W + Any+ + \w + Any+ + Any + Any++ + \R + Any+ + \H + Any+ + \h + Any+ + \V + Any+ + \v + Any+ + \Z + Any++ + \z + Any+ + $ + Ket + End +------------------------------------------------------------------ + +/ .+\D .+\d .+\S .+\s .+\W .+\w .+. .+\R .+\H .+\h .+\V .+\v .+\Z .+\z .+$/Bsx +------------------------------------------------------------------ + Bra + AllAny+ + \D + AllAny+ + \d + AllAny+ + \S + AllAny+ + \s + AllAny+ + \W + AllAny+ + \w + AllAny+ + AllAny + AllAny+ + \R + AllAny+ + \H + AllAny+ + \h + AllAny+ + \V + AllAny+ + \v + AllAny+ + \Z + AllAny++ + \z + AllAny+ + $ + Ket + End +------------------------------------------------------------------ + +/ \D+$ \d+$ \S+$ \s+$ \W+$ \w+$ \R+$ \H+$ \h+$ \V+$ \v+$ a+$ \n+$ .+$ .+$/Bmx +------------------------------------------------------------------ + Bra + \D+ + /m $ + \d++ + /m $ + \S++ + /m $ + \s+ + /m $ + \W+ + /m $ + \w++ + /m $ + \R+ + /m $ + \H+ + /m $ + \h+ + /m $ + \V+ + /m $ + \v+ + /m $ + a+ + /m $ + \x0a+ + /m $ + Any+ + /m $ + Any+ + /m $ + Ket + End +------------------------------------------------------------------ + +/(?=a+)a(a+)++a/B +------------------------------------------------------------------ + Bra + Assert + a++ + Ket + a + CBraPos 1 + a+ + KetRpos + a + Ket + End +------------------------------------------------------------------ + 
+/a+(bb|cc)a+(?:bb|cc)a+(?>bb|cc)a+(?:bb|cc)+a+(aa)a+(?:bb|aa)/B +------------------------------------------------------------------ + Bra + a++ + CBra 1 + bb + Alt + cc + Ket + a++ + Bra + bb + Alt + cc + Ket + a++ + Once + bb + Alt + cc + Ket + a++ + Bra + bb + Alt + cc + KetRmax + a+ + CBra 2 + aa + Ket + a+ + Bra + bb + Alt + aa + Ket + Ket + End +------------------------------------------------------------------ + +/a+(bb|cc)?#a+(?:bb|cc)??#a+(?:bb|cc)?+#a+(?:bb|cc)*#a+(bb|cc)?a#a+(?:aa)?/B +------------------------------------------------------------------ + Bra + a++ + Brazero + CBra 1 + bb + Alt + cc + Ket + # + a++ + Braminzero + Bra + bb + Alt + cc + Ket + # + a++ + Once + Brazero + Bra + bb + Alt + cc + Ket + Ket + # + a++ + Brazero + Bra + bb + Alt + cc + KetRmax + # + a+ + Brazero + CBra 2 + bb + Alt + cc + Ket + a# + a+ + Brazero + Bra + aa + Ket + Ket + End +------------------------------------------------------------------ + +/a+(?:bb)?a#a+(?:|||)#a+(?:|b)a#a+(?:|||)?a/B +------------------------------------------------------------------ + Bra + a+ + Brazero + Bra + bb + Ket + a# + a++ + Bra + Alt + Alt + Alt + Ket + # + a+ + Bra + Alt + b + Ket + a# + a+ + Brazero + Bra + Alt + Alt + Alt + Ket + a + Ket + End +------------------------------------------------------------------ + +/[ab]*/B +------------------------------------------------------------------ + Bra + [ab]*+ + Ket + End +------------------------------------------------------------------ + aaaa + 0: aaaa + +/[ab]*?/B +------------------------------------------------------------------ + Bra + [ab]*? + Ket + End +------------------------------------------------------------------ + aaaa + 0: + +/[ab]?/B +------------------------------------------------------------------ + Bra + [ab]?+ + Ket + End +------------------------------------------------------------------ + aaaa + 0: a + +/[ab]??/B +------------------------------------------------------------------ + Bra + [ab]?? 
+ Ket + End +------------------------------------------------------------------ + aaaa + 0: + +/[ab]+/B +------------------------------------------------------------------ + Bra + [ab]++ + Ket + End +------------------------------------------------------------------ + aaaa + 0: aaaa + +/[ab]+?/B +------------------------------------------------------------------ + Bra + [ab]+? + Ket + End +------------------------------------------------------------------ + aaaa + 0: a + +/[ab]{2,3}/B +------------------------------------------------------------------ + Bra + [ab]{2,3}+ + Ket + End +------------------------------------------------------------------ + aaaa + 0: aaa + +/[ab]{2,3}?/B +------------------------------------------------------------------ + Bra + [ab]{2,3}? + Ket + End +------------------------------------------------------------------ + aaaa + 0: aa + +/[ab]{2,}/B +------------------------------------------------------------------ + Bra + [ab]{2,}+ + Ket + End +------------------------------------------------------------------ + aaaa + 0: aaaa + +/[ab]{2,}?/B +------------------------------------------------------------------ + Bra + [ab]{2,}? + Ket + End +------------------------------------------------------------------ + aaaa + 0: aa + +/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/B +------------------------------------------------------------------ + Bra + \d++ + \s{0,5}+ + = + \s*+ + \S? 
+ = + \w{0,4}+ + \W*+ + Ket + End +------------------------------------------------------------------ + +/[a-d]{5,12}[e-z0-9]*#[^a-z]+[b-y]*a[2-7]?[^0-9a-z]+/B +------------------------------------------------------------------ + Bra + [a-d]{5,12}+ + [0-9e-z]*+ + # + [\x00-`{-\xff] (neg)++ + [b-y]*+ + a + [2-7]?+ + [\x00-/:-`{-\xff] (neg)++ + Ket + End +------------------------------------------------------------------ + +/[a-z]*\s#[ \t]?\S#[a-c]*\S#[C-G]+?\d#[4-8]*\D#[4-9,]*\D#[!$]{0,5}\w#[M-Xf-l]+\W#[a-c,]?\W/B +------------------------------------------------------------------ + Bra + [a-z]*+ + \s + # + [\x09 ]?+ + \S + # + [a-c]* + \S + # + [C-G]++ + \d + # + [4-8]*+ + \D + # + [,4-9]* + \D + # + [!$]{0,5}+ + \w + # + [M-Xf-l]++ + \W + # + [,a-c]? + \W + Ket + End +------------------------------------------------------------------ + +/a+(aa|bb)*c#a*(bb|cc)*a#a?(bb|cc)*d#[a-f]*(g|hh)*f/B +------------------------------------------------------------------ + Bra + a+ + Brazero + CBra 1 + aa + Alt + bb + KetRmax + c# + a* + Brazero + CBra 2 + bb + Alt + cc + KetRmax + a# + a?+ + Brazero + CBra 3 + bb + Alt + cc + KetRmax + d# + [a-f]* + Brazero + CBra 4 + g + Alt + hh + KetRmax + f + Ket + End +------------------------------------------------------------------ + +/[a-f]*(g|hh|i)*i#[a-x]{4,}(y{0,6})*y#[a-k]+(ll|mm)+n/B +------------------------------------------------------------------ + Bra + [a-f]*+ + Brazero + CBra 1 + g + Alt + hh + Alt + i + KetRmax + i# + [a-x]{4,} + Brazero + SCBra 2 + y{0,6} + KetRmax + y# + [a-k]++ + CBra 3 + ll + Alt + mm + KetRmax + n + Ket + End +------------------------------------------------------------------ + +/[a-f]*(?>gg|hh)+#[a-f]*(?>gg|hh)?#[a-f]*(?>gg|hh)*a#[a-f]*(?>gg|hh)*h/B +------------------------------------------------------------------ + Bra + [a-f]*+ + Once + gg + Alt + hh + KetRmax + # + [a-f]*+ + Brazero + Once + gg + Alt + hh + Ket + # + [a-f]* + Brazero + Once + gg + Alt + hh + KetRmax + a# + [a-f]*+ + Brazero + 
Once + gg + Alt + hh + KetRmax + h + Ket + End +------------------------------------------------------------------ + +/[a-c]*d/IB +------------------------------------------------------------------ + Bra + [a-c]*+ + d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: a b c d +Last code unit = 'd' +Subject length lower bound = 1 + +/[a-c]+d/IB +------------------------------------------------------------------ + Bra + [a-c]++ + d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: a b c +Last code unit = 'd' +Subject length lower bound = 2 + +/[a-c]?d/IB +------------------------------------------------------------------ + Bra + [a-c]?+ + d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: a b c d +Last code unit = 'd' +Subject length lower bound = 1 + +/[a-c]{4,6}d/IB +------------------------------------------------------------------ + Bra + [a-c]{4,6}+ + d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: a b c +Last code unit = 'd' +Subject length lower bound = 5 + +/[a-c]{0,6}d/IB +------------------------------------------------------------------ + Bra + [a-c]{0,6}+ + d + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: a b c d +Last code unit = 'd' +Subject length lower bound = 1 + +# End of special auto-possessive tests + +/^A\o{1239}B/ +Failed: error 164 at offset 8: non-octal character in \o{} (closing brace missing?) + A\123B + +/^A\oB/ +Failed: error 155 at offset 4: missing opening brace after \o + +/^A\x{zz}B/ +Failed: error 167 at offset 5: non-hex character in \x{} (closing brace missing?) 
+ +/^A\x{12Z/ +Failed: error 167 at offset 7: non-hex character in \x{} (closing brace missing?) + +/^A\x{/ +Failed: error 178 at offset 5: digits missing in \x{} or \o{} or \N{U+} + +/[ab]++/B,no_auto_possess +------------------------------------------------------------------ + Bra + [ab]++ + Ket + End +------------------------------------------------------------------ + +/[^ab]*+/B,no_auto_possess +------------------------------------------------------------------ + Bra + [\x00-`c-\xff] (neg)*+ + Ket + End +------------------------------------------------------------------ + +/a{4}+/B,no_auto_possess +------------------------------------------------------------------ + Bra + a{4} + Ket + End +------------------------------------------------------------------ + +/a{4}+/Bi,no_auto_possess +------------------------------------------------------------------ + Bra + /i a{4} + Ket + End +------------------------------------------------------------------ + +/[a-[:digit:]]+/ +Failed: error 150 at offset 4: invalid range in character class + +/[A-[:digit:]]+/ +Failed: error 150 at offset 4: invalid range in character class + +/[a-[.xxx.]]+/ +Failed: error 150 at offset 4: invalid range in character class + +/[a-[=xxx=]]+/ +Failed: error 150 at offset 4: invalid range in character class + +/[a-[!xxx!]]+/ +Failed: error 108 at offset 3: range out of order in character class + +/[A-[!xxx!]]+/ + A]]] + 0: A]]] + +/[a-\d]+/ +Failed: error 150 at offset 5: invalid range in character class + +/(?<0abc>xx)/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/(?&1abc)xx(?<1abc>y)/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/(?xx)/ +Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?) 
+ +/(?'0abc'xx)/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/(?P<0abc>xx)/ +Failed: error 144 at offset 4: subpattern name must start with a non-digit + +/\k<5ghj>/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/\k'5ghj'/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/\k{2fgh}/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/(?P=8yuki)/ +Failed: error 144 at offset 4: subpattern name must start with a non-digit + +/\g{4df}/ +Failed: error 157 at offset 2: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/(?&1abc)xx(?<1abc>y)/ +Failed: error 144 at offset 3: subpattern name must start with a non-digit + +/(?P>1abc)xx(?<1abc>y)/ +Failed: error 144 at offset 4: subpattern name must start with a non-digit + +/\g'3gh'/ +Failed: error 157 at offset 2: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/\g<5fg>/ +Failed: error 157 at offset 2: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/(?(<4gh>)abc)/ +Failed: error 144 at offset 4: subpattern name must start with a non-digit + +/(?('4gh')abc)/ +Failed: error 144 at offset 4: subpattern name must start with a non-digit + +/(?(4gh)abc)/ +Failed: error 124 at offset 4: missing closing parenthesis for condition + +/(?(R&6yh)abc)/ +Failed: error 144 at offset 5: subpattern name must start with a non-digit + +/(((a\2)|(a*)\g<-1>))*a?/B +------------------------------------------------------------------ + Bra + Brazero + SCBra 1 + CBra 2 + CBra 3 + a + \2 + Ket + Alt + CBra 4 + a* + Ket + Recurse + Ket + KetRmax + a?+ + Ket + End +------------------------------------------------------------------ + +# Test the ugly "start or end of word" compatibility syntax. 
+ +/[[:<:]]red[[:>:]]/B +------------------------------------------------------------------ + Bra + \b + Assert + \w + Ket + red + \b + Assert back + Reverse + \w + Ket + Ket + End +------------------------------------------------------------------ + little red riding hood + 0: red + a /red/ thing + 0: red + red is a colour + 0: red + put it all on red + 0: red +\= Expect no match + no reduction +No match + Alfred Winifred +No match + +/[a[:<:]] should give error/ +Failed: error 130 at offset 4: unknown POSIX class name + +/(?=ab\K)/aftertext + abcd\=startchar +Start of matched string is beyond its end - displaying from end to start. + 0: ab + 0+ abcd + +/abcd/newline=lf,firstline +\= Expect no match + xx\nxabcd +No match + +# Test stack guard external calls. + +/(((a)))/stackguard=1 +Failed: error 133 at offset 7: parentheses are too deeply nested (stack check) + +/(((a)))/stackguard=2 +Failed: error 133 at offset 7: parentheses are too deeply nested (stack check) + +/(((a)))/stackguard=3 + +/(((((a)))))/ + +# End stack guard tests + +/^\w+(?>\s*)(?<=\w)/B +------------------------------------------------------------------ + Bra + ^ + \w+ + Once + \s*+ + Ket + Assert back + Reverse + \w + Ket + Ket + End +------------------------------------------------------------------ + +/\othing/ +Failed: error 155 at offset 2: missing opening brace after \o + +/\o{}/ +Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+} + +/\o{whatever}/ +Failed: error 164 at offset 3: non-octal character in \o{} (closing brace missing?) + +/\xthing/ + +/\x{}/ +Failed: error 178 at offset 3: digits missing in \x{} or \o{} or \N{U+} + +/\x{whatever}/ +Failed: error 167 at offset 3: non-hex character in \x{} (closing brace missing?) 
+ +/A\8B/ +Failed: error 115 at offset 2: reference to non-existent subpattern + +/A\9B/ +Failed: error 115 at offset 2: reference to non-existent subpattern + +# This one is here because Perl fails to match "12" for this pattern when the $ +# is present. + +/^(?(?=abc)\w{3}:|\d\d)$/ + abc: + 0: abc: + 12 + 0: 12 +\= Expect no match + 123 +No match + xyz +No match + +# Perl gets this one wrong, giving "a" as the after text for ca and failing to +# match for cd. + +/(?(?=ab)ab)/aftertext + abxxx + 0: ab + 0+ xxx + ca + 0: + 0+ ca + cd + 0: + 0+ cd + +# This should test both paths for processing OP_RECURSE. + +/(?(R)a+|(?R)b)/ + aaaabcde + 0: aaaab + aaaabcde\=ovector=100 + 0: aaaab + +/a*?b*?/ + ab + 0: + +/(*NOTEMPTY)a*?b*?/ + ab + 0: a + ba + 0: b + cb + 0: b + +/(*NOTEMPTY_ATSTART)a*?b*?/aftertext + ab + 0: a + 0+ b + cdab + 0: + 0+ dab + +/(?(VERSION>=10.0)yes|no)/I +Capture group count = 0 +Subject length lower bound = 2 + yesno + 0: yes + +/(?(VERSION>=10.04)yes|no)/ + yesno + 0: yes + +/(?(VERSION=8)yes){3}/BI,aftertext +------------------------------------------------------------------ + Bra + Cond + Cond false + yes + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + yesno + 0: + 0+ yesno + +/(?(VERSION=8)yes|no){3}/I +Capture group count = 0 +Subject length lower bound = 6 + yesnononoyes + 0: nonono +\= Expect no match + yesno +No match + +/(?:(?abc)|xyz)(?(VERSION)yes|no)/I +Capture group count = 1 +Max back reference = 1 +Named capture groups: + VERSION 1 +Starting code units: a x +Subject length lower bound = 5 + abcyes + 0: abcyes + 1: abc + xyzno + 0: xyzno +\= Expect no match + abcno +No match + xyzyes +No match + +/(?(VERSION<10)yes|no)/ +Failed: error 179 at offset 10: syntax error or number too big in (?(VERSION condition + +/(?(VERSION>10)yes|no)/ +Failed: error 179 at offset 11: syntax error or number too big in (?(VERSION 
condition + +/(?(VERSION>=10.0.0)yes|no)/ +Failed: error 179 at offset 16: syntax error or number too big in (?(VERSION condition + +/(?(VERSION=10.101)yes|no)/ +Failed: error 179 at offset 16: syntax error or number too big in (?(VERSION condition + +/abcd/I +Capture group count = 0 +First code unit = 'a' +Last code unit = 'd' +Subject length lower bound = 4 + +/abcd/I,no_start_optimize +Capture group count = 0 +Options: no_start_optimize + +/(|ab)*?d/I +Capture group count = 1 +Starting code units: a d +Last code unit = 'd' +Subject length lower bound = 1 + abd + 0: abd + 1: ab + xyd + 0: d + +/(|ab)*?d/I,no_start_optimize +Capture group count = 1 +Options: no_start_optimize + abd + 0: abd + 1: ab + xyd + 0: d + +/\k*(?aa)(?bb)/match_unset_backref,dupnames + aabb + 0: aabb + 1: aa + 2: bb + +/(((((a)))))/parens_nest_limit=2 +Failed: error 119 at offset 3: parentheses are too deeply nested + +/abc/replace=XYZ + 123123 + 0: 123123 + 123abc123 + 1: 123XYZ123 + 123abc123abc123 + 1: 123XYZ123abc123 + 123123\=zero_terminate + 0: 123123 + 123abc123\=zero_terminate + 1: 123XYZ123 + 123abc123abc123\=zero_terminate + 1: 123XYZ123abc123 + +/abc/g,replace=XYZ + 123abc123 + 1: 123XYZ123 + 123abc123abc123 + 2: 123XYZ123XYZ123 + +/abc/replace=X$$Z + 123abc123 + 1: 123X$Z123 + +/abc/g,replace=X$$Z + 123abc123abc123 + 2: 123X$Z123X$Z123 + +/a(b)c(d)e/replace=X$1Y${2}Z + "abcde" + 1: "XbYdZ" + +/a(b)c(d)e/replace=X$1Y${2}Z,global + "abcde-abcde" + 2: "XbYdZ-XbYdZ" + +/a(?b)c(?d)e/replace=X$ONE+${TWO}Z + "abcde" + 1: "Xb+dZ" + +/a(?b)c(?d)e/g,replace=X$ONE+${TWO}Z + "abcde-abcde-" + 2: "Xb+dZ-Xb+dZ-" + +/abc/replace=a$++ + 123abc +Failed: error -35 at offset 2 in replacement: invalid replacement string + +/abc/replace=a$bad + 123abc +Failed: error -49 at offset 5 in replacement: unknown substring + +/abc/replace=a${A234567890123456789_123456789012}z + 123abc +Failed: error -49 at offset 36 in replacement: unknown substring + +/abc/replace=a${A23456789012345678901234567890123}z + 
123abc +Failed: error -35 at offset 35 in replacement: invalid replacement string + +/abc/replace=a${bcd + 123abc +Failed: error -58 at offset 6 in replacement: expected closing curly bracket in replacement string + +/abc/replace=a${b+d}z + 123abc +Failed: error -58 at offset 4 in replacement: expected closing curly bracket in replacement string + +/abc/replace=[10]XYZ + 123abc123 + 1: 123XYZ123 + +/abc/replace=[9]XYZ + 123abc123 +Failed: error -48: no more memory + +/abc/replace=xyz + 1abc2\=partial_hard +Failed: error -34: bad option value + +/abc/replace=xyz + 123abc456 + 1: 123xyz456 + 123abc456\=replace=pqr + 1: 123pqr456 + 123abc456abc789 + 1: 123xyz456abc789 + 123abc456abc789\=g + 2: 123xyz456xyz789 + +/(?<=abc)(|def)/g,replace=<$0> + 123abcxyzabcdef789abcpqr + 4: 123abc<>xyzabc<>789abc<>pqr + +/./replace=$0 + a + 1: a + +/(.)(.)/replace=$2+$1 + abc + 1: b+ac + +/(?.)(?.)/replace=$B+$A + abc + 1: b+ac + +/(.)(.)/g,replace=$2$1 + abcdefgh + 4: badcfehg + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=${*MARK} + apple lemon blackberry + 3: pear orange strawberry + apple strudel + 1: pear strudel + fruitless + 0: fruitless + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/replace=${*MARK} sauce, + apple lemon blackberry + 1: pear sauce lemon blackberry + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARK> + apple lemon blackberry + 3: + apple strudel + 1: strudel + fruitless + 0: fruitless + +/(*:pear)apple/g,replace=${*MARKING} + apple lemon blackberry +Failed: error -35 at offset 11 in replacement: invalid replacement string + +/(*:pear)apple/g,replace=${*MARK-time + apple lemon blackberry +Failed: error -58 at offset 7 in replacement: expected closing curly bracket in replacement string + +/(*:pear)apple/g,replace=${*mark} + apple lemon blackberry +Failed: error -35 at offset 8 in replacement: invalid replacement string + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=<$*MARKET> + apple 
lemon blackberry +Failed: error -35 at offset 9 in replacement: invalid replacement string + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[22]${*MARK} + apple lemon blackberry +Failed: error -48: no more memory + apple lemon blackberry\=substitute_overflow_length +Failed: error -48: no more memory: 23 code units are needed + +/(*:pear)apple|(*:orange)lemon|(*:strawberry)blackberry/g,replace=[23]${*MARK} + apple lemon blackberry + 3: pear orange strawberry + +/abc/ + 123abc123\=replace=[9]XYZ +Failed: error -48: no more memory + 123abc123\=substitute_overflow_length,replace=[9]XYZ +Failed: error -48: no more memory: 10 code units are needed + 123abc123\=substitute_overflow_length,replace=[6]XYZ +Failed: error -48: no more memory: 10 code units are needed + 123abc123\=substitute_overflow_length,replace=[1]XYZ +Failed: error -48: no more memory: 10 code units are needed + 123abc123\=substitute_overflow_length,replace=[0]XYZ +Failed: error -48: no more memory: 10 code units are needed + +/a(b)c/ + 123abc123\=replace=[9]x$1z +Failed: error -48: no more memory + 123abc123\=substitute_overflow_length,replace=[9]x$1z +Failed: error -48: no more memory: 10 code units are needed + 123abc123\=substitute_overflow_length,replace=[6]x$1z +Failed: error -48: no more memory: 10 code units are needed + 123abc123\=substitute_overflow_length,replace=[1]x$1z +Failed: error -48: no more memory: 10 code units are needed + 123abc123\=substitute_overflow_length,replace=[0]x$1z +Failed: error -48: no more memory: 10 code units are needed + +"((?=(?(?=(?(?=(?(?=()))))))))" + a + 0: + 1: + 2: + +"(?(?=)==)(((((((((?=)))))))))" +\= Expect no match + a +No match + +/(a)(b)|(c)/ + XcX\=ovector=2,get=1,get=2,get=3,get=4,getall +Matched, but too many substrings + 0: c + 1: +Get substring 1 failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available +Get substring 3 failed (-54): requested value is not available +Get substring 4 
failed (-49): unknown substring + 0L c + 1L + +/x(?=ab\K)/ + xab\=get=0 +Start of matched string is beyond its end - displaying from end to start. + 0: ab + 0G (0) + xab\=copy=0 +Start of matched string is beyond its end - displaying from end to start. + 0: ab + 0C (0) + xab\=getall +Start of matched string is beyond its end - displaying from end to start. + 0: ab + 0L + +/(?a)|(?b)/dupnames + a\=ovector=1,copy=A,get=A,get=2 +Matched, but too many substrings + 0: a +Copy substring 'A' failed (-54): requested value is not available +Get substring 2 failed (-54): requested value is not available +Get substring 'A' failed (-54): requested value is not available + a\=ovector=2,copy=A,get=A,get=2 + 0: a + 1: a + C a (1) A (non-unique) +Get substring 2 failed (-54): requested value is not available + G a (1) A (non-unique) + b\=ovector=2,copy=A,get=A,get=2 +Matched, but too many substrings + 0: b + 1: +Copy substring 'A' failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available +Get substring 'A' failed (-55): requested value is not set + +/a(b)c(d)/ + abc\=ph,copy=0,copy=1,getall +Partial match: abc + 0C abc (3) +Copy substring 1 failed (-2): partial match +get substring list failed (-2): partial match + +/^abc/info +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + +/^abc/info,no_dotstar_anchor +Capture group count = 0 +Compile options: no_dotstar_anchor +Overall options: anchored no_dotstar_anchor +First code unit = 'a' +Subject length lower bound = 3 + +/.*\d/info,auto_callout +Capture group count = 0 +Options: auto_callout +First code unit at start or follows newline +Subject length lower bound = 1 +\= Expect no match + aaa +--->aaa + +0 ^ .* + +2 ^ ^ \d + +2 ^ ^ \d + +2 ^^ \d + +2 ^ \d +No match + +/.*\d/info,no_dotstar_anchor,auto_callout +Capture group count = 0 +Options: auto_callout no_dotstar_anchor +Subject length lower bound = 1 +\= 
Expect no match + aaa +--->aaa + +0 ^ .* + +2 ^ ^ \d + +2 ^ ^ \d + +2 ^^ \d + +2 ^ \d + +0 ^ .* + +2 ^ ^ \d + +2 ^^ \d + +2 ^ \d + +0 ^ .* + +2 ^^ \d + +2 ^ \d +No match + +/.*\d/dotall,info +Capture group count = 0 +Compile options: dotall +Overall options: anchored dotall +Subject length lower bound = 1 + +/.*\d/dotall,no_dotstar_anchor,info +Capture group count = 0 +Options: dotall no_dotstar_anchor +Subject length lower bound = 1 + +/(*NO_DOTSTAR_ANCHOR)(?s).*\d/info +Capture group count = 0 +Compile options: +Overall options: no_dotstar_anchor +Subject length lower bound = 1 + +'^(?:(a)|b)(?(1)A|B)' + aA123\=ovector=1 +Matched, but too many substrings + 0: aA + aA123\=ovector=2 + 0: aA + 1: a + +'^(?:(?a)|b)(?()A|B)' + aA123\=ovector=1 +Matched, but too many substrings + 0: aA + aA123\=ovector=2 + 0: aA + 1: a + +'^(?)(?:(?a)|b)(?()A|B)'dupnames + aA123\=ovector=1 +Matched, but too many substrings + 0: aA + aA123\=ovector=2 +Matched, but too many substrings + 0: aA + 1: + aA123\=ovector=3 + 0: aA + 1: + 2: a + +'^(?:(?X)|)(?:(?a)|b)\k{AA}'dupnames + aa123\=ovector=1 +Matched, but too many substrings + 0: aa + aa123\=ovector=2 +Matched, but too many substrings + 0: aa + 1: + aa123\=ovector=3 + 0: aa + 1: + 2: a + +/(?(?J)(?1(111111)11|)1|1|)(?()1)/ + +/(?(?J)(?))(?-J)\k/ + +# Quantifiers are not allowed on condition assertions, but are otherwise +# OK in conditions. + +/(?(?=0)?)+/ +Failed: error 109 at offset 7: quantifier does not follow a repeatable item + +/(?(?=0)(?=00)?00765)/ + 00765 + 0: 00765 + +/(?(?=0)(?=00)?00765|(?!3).56)/ + 00765 + 0: 00765 + 456 + 0: 456 +\= Expect no match + 356 +No match + +'^(a)*+(\w)' + g + 0: g + 1: + 2: g + g\=ovector=1 +Matched, but too many substrings + 0: g + +'^(?:a)*+(\w)' + g + 0: g + 1: g + g\=ovector=1 +Matched, but too many substrings + 0: g + +# These two pattern showeds up compile-time bugs + +"((?2){0,1999}())?" 
+ +/((?+1)(\1))/B +------------------------------------------------------------------ + Bra + CBra 1 + Recurse + CBra 2 + \1 + Ket + Ket + Ket + End +------------------------------------------------------------------ + +# Callouts with string arguments + +/a(?C"/ +Failed: error 181 at offset 4: missing terminating delimiter for callout with string argument + +/a(?C"a/ +Failed: error 181 at offset 4: missing terminating delimiter for callout with string argument + +/a(?C"a"/ +Failed: error 139 at offset 7: closing parenthesis for (?C expected + +/a(?C"a"bcde(?C"b")xyz/ +Failed: error 139 at offset 7: closing parenthesis for (?C expected + +/a(?C"a)b""c")/B +------------------------------------------------------------------ + Bra + a + CalloutStr "a)b"c" 5 13 0 + Ket + End +------------------------------------------------------------------ + +/ab(?C" any text with spaces ")cde/B +------------------------------------------------------------------ + Bra + ab + CalloutStr " any text with spaces " 6 30 1 + cde + Ket + End +------------------------------------------------------------------ + abcde +Callout (6): " any text with spaces " +--->abcde + ^ ^ c + 0: abcde + 12abcde +Callout (6): " any text with spaces " +--->12abcde + ^ ^ c + 0: abcde + +/^a(b)c(?C1)def/ + abcdef +--->abcdef + 1 ^ ^ d + 0: abcdef + 1: b + +/^a(b)c(?C"AB")def/ + abcdef +Callout (10): "AB" +--->abcdef + ^ ^ d + 0: abcdef + 1: b + +/^a(b)c(?C1)def/ + abcdef\=callout_capture +Callout 1: last capture = 1 + 1: b +--->abcdef + ^ ^ d + 0: abcdef + 1: b + +/^a(b)c(?C{AB})def/B +------------------------------------------------------------------ + Bra + ^ + a + CBra 1 + b + Ket + c + CalloutStr {AB} 10 14 1 + def + Ket + End +------------------------------------------------------------------ + abcdef\=callout_capture +Callout (10): {AB} last capture = 1 + 1: b +--->abcdef + ^ ^ d + 0: abcdef + 1: b + +/(?C`a``b`)(?C'a''b')(?C"a""b")(?C^a^^b^)(?C%a%%b%)(?C#a##b#)(?C$a$$b$)(?C{a}}b})/B,callout_info 
+------------------------------------------------------------------ + Bra + CalloutStr `a`b` 4 10 0 + CalloutStr 'a'b' 14 20 0 + CalloutStr "a"b" 24 30 0 + CalloutStr ^a^b^ 34 40 0 + CalloutStr %a%b% 44 50 0 + CalloutStr #a#b# 54 60 0 + CalloutStr $a$b$ 64 70 0 + CalloutStr {a}b} 74 80 0 + Ket + End +------------------------------------------------------------------ +Callout `a`b` ( +Callout 'a'b' ( +Callout "a"b" ( +Callout ^a^b^ ( +Callout %a%b% ( +Callout #a#b# ( +Callout $a$b$ ( +Callout {a}b} + +/(?:a(?C`code`)){3}/B +------------------------------------------------------------------ + Bra + Bra + a + CalloutStr `code` 8 14 4 + Ket + Bra + a + CalloutStr `code` 8 14 4 + Ket + Bra + a + CalloutStr `code` 8 14 4 + Ket + Ket + End +------------------------------------------------------------------ + +/^(?(?C25)(?=abc)abcd|xyz)/B,callout_info +------------------------------------------------------------------ + Bra + ^ + Cond + Callout 25 9 3 + Assert + abc + Ket + abcd + Alt + xyz + Ket + Ket + End +------------------------------------------------------------------ +Callout 25 (?= + abcdefg +--->abcdefg + 25 ^ (?= + 0: abcd + xyz123 +--->xyz123 + 25 ^ (?= + 0: xyz + +/^(?(?C$abc$)(?=abc)abcd|xyz)/B +------------------------------------------------------------------ + Bra + ^ + Cond + CalloutStr $abc$ 7 12 3 + Assert + abc + Ket + abcd + Alt + xyz + Ket + Ket + End +------------------------------------------------------------------ + abcdefg +Callout (7): $abc$ +--->abcdefg + ^ (?= + 0: abcd + xyz123 +Callout (7): $abc$ +--->xyz123 + ^ (?= + 0: xyz + +/^ab(?C'first')cd(?C"second")ef/ + abcdefg +Callout (7): 'first' +--->abcdefg + ^ ^ c +Callout (20): "second" +--->abcdefg + ^ ^ e + 0: abcdef + +/(?:a(?C`code`)){3}X/ + aaaXY +Callout (8): `code` +--->aaaXY + ^^ ){3} +Callout (8): `code` +--->aaaXY + ^ ^ ){3} +Callout (8): `code` +--->aaaXY + ^ ^ ){3} + 0: aaaX + +# Binary zero in callout string +# a ( ? 
C ' x z ' ) b +/ 61 28 3f 43 27 78 00 7a 27 29 62/hex,callout_info +Callout 'x\x00z' b + abcdefgh +Callout (5): 'x\x00z' +--->abcdefgh + ^^ b + 0: ab + +/(?(?!)^)/ + +/(?(?!)a|b)/ + bbb + 0: b +\= Expect no match + aaa +No match + +# JIT gives a different error message for the infinite recursion + +"(*NO_JIT)((?2)+)((?1)){" + abcd{ +Failed: error -52: nested recursion at the same subject position + +# Perl fails to diagnose the absence of an assertion + +"(?(?.*!.*)?)" +Failed: error 128 at offset 2: assertion expected after (?( or (?(?C) + +"X((?2)()*+){2}+"B +------------------------------------------------------------------ + Bra + X + Once + CBra 1 + Recurse + Braposzero + SCBraPos 2 + KetRpos + Ket + CBra 1 + Recurse + Braposzero + SCBraPos 2 + KetRpos + Ket + Ket + Ket + End +------------------------------------------------------------------ + +"X((?2)()*+){2}"B +------------------------------------------------------------------ + Bra + X + CBra 1 + Recurse + Braposzero + SCBraPos 2 + KetRpos + Ket + CBra 1 + Recurse + Braposzero + SCBraPos 2 + KetRpos + Ket + Ket + End +------------------------------------------------------------------ + +/(?<=\bABQ(3(?-7)))/ +Failed: error 115 at offset 15: reference to non-existent subpattern + +/(?<=\bABQ(3(?+7)))/ +Failed: error 115 at offset 15: reference to non-existent subpattern + +";(?<=()((?3))((?2)))" +Failed: error 125 at offset 1: lookbehind assertion is not fixed length + +# Perl loops on this (PCRE2 used to!) 
+ +/(?<=\Ka)/g,aftertext + aaaaa + 0: a + 0+ aaaa + 0: a + 0+ aaa + 0: a + 0+ aa + 0: a + 0+ a + 0: a + 0+ + +/(?<=\Ka)/altglobal,aftertext + aaaaa + 0: a + 0+ aaaa + 0: a + 0+ aaa + 0: a + 0+ aa + 0: a + 0+ a + 0: a + 0+ + +/((?2){73}(?2))((?1))/info +Capture group count = 2 +May match empty string +Subject length lower bound = 0 + +/abc/ +\= Expect no match + \[9x!xxx(]{9999} +No match + +/(abc)*/ + \[abc]{5} + 0: abcabcabcabcabc + 1: abc + +/^/gm + \n\n\n + 0: + 0: + 0: + +/^/gm,alt_circumflex + \n\n\n + 0: + 0: + 0: + 0: + +/((((((((x))))))))\81/ +Failed: error 115 at offset 19: reference to non-existent subpattern + xx1 + +/((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((x))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))\80/ + xx +Matched, but too many substrings + 0: xx + 1: x + 2: x + 3: x + 4: x + 5: x + 6: x + 7: x + 8: x + 9: x +10: x +11: x +12: x +13: x +14: x + +/\80/ +Failed: error 115 at offset 2: reference to non-existent subpattern + +/A\8B\9C/ +Failed: error 115 at offset 2: reference to non-existent subpattern + A8B9C + +/(?x:((?'a')) # comment (with parentheses) and | vertical +(?-x:#not a comment (?'b')) # this is a comment () +(?'c')) # not a comment (?'d')/info +Capture group count = 5 +Named capture groups: + a 2 + b 3 + c 4 + d 5 +First code unit = '#' +Last code unit = ' ' +Subject length lower bound = 32 + +/(?|(?'a')(2)(?'b')|(?'a')(?'a')(3))/I,dupnames +Capture group count = 3 +Named capture groups: + a 1 + a 2 + b 3 +Options: dupnames +Starting code units: 2 3 +Subject length lower bound = 1 + A23B + 0: 2 + 1: + 2: 2 + 3: + B32A + 0: 3 + 1: + 2: + 3: 3 + +# These are some patterns that used to cause buffer overflows or other errors +# while compiling. 
+ +/.((?2)(?R)|\1|$)()/B +------------------------------------------------------------------ + Bra + Any + CBra 1 + Recurse + Recurse + Alt + \1 + Alt + $ + Ket + CBra 2 + Ket + Ket + End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/B +------------------------------------------------------------------ + Bra + Any + CBra 1 + Recurse + Recurse + CBra 2 + Ket + Recurse + Alt + \1 + Alt + $ + Ket + CBra 3 + Ket + Ket + End +------------------------------------------------------------------ + +/(\9*+(?2);\3++()2|)++{/ +Failed: error 115 at offset 2: reference to non-existent subpattern + +/\V\x85\9*+((?2)\3++()2)*:2/ +Failed: error 115 at offset 7: reference to non-existent subpattern + +/(((?(R)){0,2}) (?'x'((?'R')((?'R')))))/dupnames + +/(((?(X)){0,2}) (?'x'((?'X')((?'X')))))/dupnames + +/(((?(R)){0,2}) (?'x'((?'X')((?'R')))))/ + +"(?J)(?'d'(?'d'\g{d}))" + +"(?=!((?2)(?))({8(?<=(?1){29}8bbbb\x16\xd\xc6^($(\xa9H4){4}h}?1)B))\x15')" +Failed: error 125 at offset 16: lookbehind assertion is not fixed length + +/A(?'')Z/ +Failed: error 162 at offset 4: subpattern name expected + +"(?J:(?|(?'R')(\k'R')|((?'R'))))" + +/(?<=|(\,\$(?73591620449005828816)\xa8.{7}){6}\x09)/ +Failed: error 161 at offset 17: subpattern number is too big + +/^(?:(?(1)x|)+)+$()/B +------------------------------------------------------------------ + Bra + ^ + SBra + SCond + 1 Cond ref + x + Alt + KetRmax + KetRmax + $ + CBra 1 + Ket + Ket + End +------------------------------------------------------------------ + +/[[:>:]](?<)/ +Failed: error 162 at offset 10: subpattern name expected + +/((?x)(*:0))#(?'/ +Failed: error 162 at offset 15: subpattern name expected + +/(?C$[$)(?<]/ +Failed: error 162 at offset 10: subpattern name expected + +/(?C$)$)(?<]/ +Failed: error 162 at offset 10: subpattern name expected + +/(?(R))*+/B +------------------------------------------------------------------ + Bra + Braposzero + SBraPos + SCond + Cond recurse any + 
Ket + KetRpos + Ket + End +------------------------------------------------------------------ + abcd + 0: + +/((?x)(?#))#(?'/ +Failed: error 162 at offset 14: subpattern name expected + +/((?x)(?#))#(?'abc')/I +Capture group count = 2 +Named capture groups: + abc 2 +First code unit = '#' +Subject length lower bound = 1 + +/[[:\\](?<[::]/ +Failed: error 162 at offset 9: subpattern name expected + +/[[:\\](?'abc')[a:]/I +Capture group count = 1 +Named capture groups: + abc 1 +Starting code units: : [ \ +Subject length lower bound = 2 + +"[[[.\xe8Nq\xffq\xff\xe0\x2|||::Nq\xffq\xff\xe0\x6\x2|||::[[[:[::::::[[[[[::::::::[:[[[:[:::[[[[[[[[[[[[:::::::::::::::::[[.\xe8Nq\xffq\xff\xe0\x2|||::Nq\xffq\xff\xe0\x6\x2|||::[[[:[::::::[[[[[::::::::[:[[[:[:::[[[[[[[[[[[[[[:::E[[[:[:[[:[:::[[:::E[[[:[:[[:'[:::::E[[[:[::::::[[[:[[[[[[[::E[[[:[::::::[[[:[[[[[[[[:[[::[::::[[:::::::[[:[[[[[[[:[[::[:[[:[~" +Failed: error 106 at offset 353: missing terminating ] for character class + +/()(?(R)0)*+/B +------------------------------------------------------------------ + Bra + CBra 1 + Ket + Braposzero + SBraPos + SCond + Cond recurse any + 0 + Ket + KetRpos + Ket + End +------------------------------------------------------------------ + +/(?R-:(?>abcd<< + 1: >>w\rx\x82y\o{333}z(\Q12\$34$$\x34\E5$$)<< + +/abcd/g,replace=\$1$2\,substitute_literal + XabcdYabcdZ + 2: X\$1$2\Y\$1$2\Z + +/a(bc)(DE)/replace=a\u$1\U$1\E$1\l$2\L$2\Eab\Uab\LYZ\EDone,substitute_extended + abcDE + 1: aBcBCbcdEdeabAByzDone + +/abcd/replace=xy\kz,substitute_extended + abcd +Failed: error -57 at offset 4 in replacement: bad escape sequence in replacement string + +/a(?:(b)|(c))/substitute_extended,replace=X${1:+1:-1}X${2:+2:-2} + ab + 1: X1X-2 + ac + 1: X-1X2 + ab\=replace=${1:+$1\:$1:$2} + 1: b:b + ac\=replace=${1:+$1\:$1:$2} + 1: c + >>ac<<\=replace=${1:+$1\:$1:$2},substitute_literal + 1: >>${1:+$1\:$1:$2}<< + +/a(?:(b)|(c))/substitute_extended,replace=X${1:-1:-1}X${2:-2:-2} + ab + 1: XbX2:-2 + ac + 1: X1:-1Xc + 
+/(a)/substitute_extended,replace=>${1:+\Q$1:{}$$\E+\U$1}< + a + 1: >$1:{}$$+A< + +/X(b)Y/substitute_extended + XbY\=replace=x${1:+$1\U$1}y + 1: xbBY + XbY\=replace=\Ux${1:+$1$1}y + 1: XBBY + +/a/substitute_extended,replace=${*MARK:+a:b} + a +Failed: error -58 at offset 7 in replacement: expected closing curly bracket in replacement string + +/(abcd)/replace=${1:+xy\kz},substitute_extended + abcd +Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string + +/(abcd)/ + abcd\=replace=${1:+xy\kz},substitute_extended +Failed: error -57 at offset 8 in replacement: bad escape sequence in replacement string + +/abcd/substitute_extended,replace=>$1< + abcd +Failed: error -49 at offset 3 in replacement: unknown substring + +/abcd/substitute_extended,replace=>xxx${xyz}<<< + abcd +Failed: error -49 at offset 10 in replacement: unknown substring + +/(?J)(?:(?a)|(?b))/replace=<$A> + [a] + 1: [] + [b] + 1: [] +\= Expect error + (a)\=ovector=1 +Failed: error -54 at offset 3 in replacement: requested value is not available + +/(a)|(b)/replace=<$1> +\= Expect error + b +Failed: error -55 at offset 3 in replacement: requested value is not set + +/(aa)(BB)/substitute_extended,replace=\U$1\L$2\E$1..\U$1\l$2$1 + aaBB + 1: AAbbaa..AAbBaa + +/abcd/replace=wxyz,substitute_matched + abcd + 1: wxyz + pqrs + 0: pqrs + +/abcd/g + >abcd1234abcd5678<\=replace=wxyz,substitute_matched + 2: >wxyz1234wxyz5678< + +/^(o(\1{72}{\"{\\{00000059079}\d*){74}}){19}/I +Capture group count = 2 +Max back reference = 1 +Compile options: +Overall options: anchored +First code unit = 'o' +Last code unit = '}' +Subject length lower bound = 65535 + +/((p(?'K/ +Failed: error 142 at offset 7: syntax error in subpattern name (missing terminator?) + +/((p(?'K/no_auto_capture +Failed: error 142 at offset 7: syntax error in subpattern name (missing terminator?) 
+ +/abc/replace=A$3123456789Z + abc +Failed: error -49 at offset 3 in replacement: unknown substring + +/(?a[bc]d + +0 ^ ( + +1 ^ )\Q\E* + +7 ^ ] + +8 ^^ End of pattern + 0: ] + 1: + +/\x8a+f|;T?(*:;.'?`(\xeap ){![^()!y*''C*(?';]{1;(\x08)/B,alt_verbnames,dupnames,extended +------------------------------------------------------------------ + Bra + \x{8a}++ + f + Alt + ; + T? + *MARK ;.'?`(\x{ea}p + {! + [\x00- "-&+-:<->@-BD-xz-\xff] (neg) + {1; + CBra 1 + \x08 + Ket + Ket + End +------------------------------------------------------------------ + +# Tests for NULL characters in comments and verb "names" and callouts + +# /A#B\x00C\x0aZ/ +/41 23 42 00 43 0a 5a/Bx,hex +------------------------------------------------------------------ + Bra + AZ + Ket + End +------------------------------------------------------------------ + +# /A+#B\x00C\x0a+/ +/41 2b 23 42 00 43 0a 2b/Bx,hex +------------------------------------------------------------------ + Bra + A++ + Ket + End +------------------------------------------------------------------ + +# /A(*:B\x00W#X\00Y\x0aC)Z/ +/41 28 2a 3a 42 00 57 23 58 00 59 0a 43 29 5a/Bx,hex,alt_verbnames +------------------------------------------------------------------ + Bra + A + *MARK B\x{0}WC + Z + Ket + End +------------------------------------------------------------------ + +# /A(*:B\x00W#X\00Y\x0aC)Z/ +/41 28 2a 3a 42 00 57 23 58 00 59 0a 43 29 5a/Bx,hex +------------------------------------------------------------------ + Bra + A + *MARK B\x{0}W#X\x{0}Y\x{a}C + Z + Ket + End +------------------------------------------------------------------ + +# /A(?C{X\x00Y})B/ +/41 28 3f 43 7b 58 00 59 7d 29 42/B,hex +------------------------------------------------------------------ + Bra + A + CalloutStr {X\x{0}Y} 5 10 1 + B + Ket + End +------------------------------------------------------------------ + +# /A(?#X\x00Y)B/ +/41 28 3f 23 7b 00 7d 29 42/B,hex +------------------------------------------------------------------ + Bra + AB + Ket + 
End +------------------------------------------------------------------ + +# Tests for leading comment in extended patterns + +/ (?-x):?/extended + +/ (?-x):?/extended + +/0b 28 3f 2d 78 29 3a/hex,extended + +/#comment +(?-x):?/extended + +/(8(*:6^\x09x\xa6l\)6!|\xd0:[^:|)\x09d\Z\d{85*m(?'(?<1!)*\W[*\xff]!!h\w]*\xbe;/alt_bsux,alt_verbnames,allow_empty_class,dollar_endonly,extended,multiline,never_utf,no_dotstar_anchor,no_start_optimize +Failed: error 162 at offset 49: subpattern name expected + +/a|(b)c/replace=>$1<,substitute_unset_empty + cat + 1: c>b$1< +Failed: error -55 at offset 3 in replacement: requested value is not set + cat\=replace=>$1<,substitute_unset_empty + 1: c>$1<,substitute_unset_empty + 1: x>b${2:-xx}< +Failed: error -49 at offset 9 in replacement: unknown substring + cat\=replace=>${2:-xx}<,substitute_unknown_unset + 1: c>xx${X:-xx}<,substitute_unknown_unset + 1: c>xx$X<,substitute_unset_empty + cat + 1: c>b$Y<,substitute_unset_empty + cat +Failed: error -49 at offset 3 in replacement: unknown substring + cat\=substitute_unknown_unset + 1: c>$2<,substitute_unset_empty + cat +Failed: error -49 at offset 3 in replacement: unknown substring + cat\=substitute_unknown_unset + 1: c>9010 + 0 ^ 0 + 0 ^ 0 + 0: + 1: 0 +\= Expect no match + abc +--->abc + 0 ^ 0 + 0 ^ 0 + 0 ^ 0 +No match + +/aaa/ +\[abc]{10000000000000000000000000000} +** Repeat count too large +\[a]{3} + 0: aaa + +/\[AB]{6000000000000000000000}/expand +** Pattern repeat count too large + +# Hex uses pattern length, not zero-terminated. This tests for overrunning +# the given length of a pattern. + +/'(*U'/hex +Failed: error 160 at offset 3: (*VERB) not recognized or malformed + +/'(*'/hex +Failed: error 109 at offset 1: quantifier does not follow a repeatable item + +/'('/hex +Failed: error 114 at offset 1: missing closing parenthesis + +//hex + +# These tests are here because Perl never allows a back reference in a +# lookbehind. PCRE2 supports some limited cases. 
+ +/([ab])...(?<=\1)z/ + a11az + 0: a11az + 1: a + b11bz + 0: b11bz + 1: b +\= Expect no match + b11az +No match + +/(?|([ab]))...(?<=\1)z/ +Failed: error 125 at offset 13: lookbehind assertion is not fixed length + +/([ab])(\1)...(?<=\2)z/ + aa11az + 0: aa11az + 1: a + 2: a + +/(a\2)(b\1)(?<=\2)/ +Failed: error 125 at offset 10: lookbehind assertion is not fixed length + +/(?[ab])...(?<=\k'A')z/ + a11az + 0: a11az + 1: a + b11bz + 0: b11bz + 1: b +\= Expect no match + b11az +No match + +/(?[ab])...(?<=\k'A')(?)z/dupnames +Failed: error 125 at offset 13: lookbehind assertion is not fixed length + +# Perl does not support \g+n + +/((\g+1X)?([ab]))+/ + aaXbbXa + 0: aaXbbXa + 1: bXa + 2: bX + 3: a + +/ab(?C1)c/auto_callout + abc +--->abc + +0 ^ a + +1 ^^ b + 1 ^ ^ c + +8 ^ ^ End of pattern + 0: abc + +/'ab(?C1)c'/hex,auto_callout + abc +--->abc + +0 ^ a + +1 ^^ b + 1 ^ ^ c + +8 ^ ^ End of pattern + 0: abc + +# Perl accepts these, but gives a warning. We can't warn, so give an error. + +/[a-[:digit:]]+/ +Failed: error 150 at offset 4: invalid range in character class + a-a9-a + +/[A-[:digit:]]+/ +Failed: error 150 at offset 4: invalid range in character class + A-A9-A + +/[a-\d]+/ +Failed: error 150 at offset 5: invalid range in character class + a-a9-a + +/(?abc)(?(R)xyz)/B +------------------------------------------------------------------ + Bra + CBra 1 + abc + Ket + Cond + Cond recurse any + xyz + Ket + Ket + End +------------------------------------------------------------------ + +/(?abc)(?(R)xyz)/B +------------------------------------------------------------------ + Bra + CBra 1 + abc + Ket + Cond + 1 Cond ref + xyz + Ket + Ket + End +------------------------------------------------------------------ + +/(?=.*[A-Z])/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + +/()(?<=(?0))/ +Failed: error 125 at offset 2: lookbehind assertion is not fixed length + +/(?*?\g'0/use_length +Failed: error 157 at offset 6: \g is not followed 
by a braced, angle-bracketed, or quoted name/number or by a plain number + +/.>*?\g'0/ +Failed: error 157 at offset 6: \g is not followed by a braced, angle-bracketed, or quoted name/number or by a plain number + +/{„Í„ÍÍ„Í{'{22{2{{2{'{22{{22{2{'{22{2{{2{{222{{2{'{22{2{22{2{'{22{2{{2{'{22{2{22{2{'{'{22{2{22{2{'{22{2{{2{'{22{2{22{2{'{222{2Ą̈́ÍÍ„Í{'{22{2{{2{'{22{{11{2{'{22{2{{2{{'{22{2{{2{'{22{{22{1{'{22{2{{2{{222{{2{'{22{2{22{2{'{/auto_callout + +// +\=get=i00000000000000000000000000000000 +** Group name in 'get' is too long +\=get=i2345678901234567890123456789012,get=i1245678901234567890123456789012 +** Too many characters in named 'get' modifiers + +"(?(?C))" +Failed: error 128 at offset 6: assertion expected after (?( or (?(?C) + +/(?(?(?(?(?(?))))))/ +Failed: error 128 at offset 2: assertion expected after (?( or (?(?C) + +/(?<=(?1))((?s))/anchored + +/(*:ab)*/ +Failed: error 109 at offset 6: quantifier does not follow a repeatable item + +%(*:(:(svvvvvvvvvv:]*[ Z!*;[]*[^[]*!^[+.+{{2,7}' _\\\\\\\\\\\\\)?.:.. 
*w////\\\Q\\\\\\\\\\\\\\\T\\\\\+/?/////'+\\\EEE?/////'+/*+/[^K]?]//(w)%never_backslash_c,alt_verbnames,auto_callout + +/./newline=crlf + \=ph +No match + +/(\x0e00\000000\xc)/replace=\P,substitute_extended + \x0e00\000000\xc +Failed: error -57 at offset 2 in replacement: bad escape sequence in replacement string + +//replace=0 + \=offset=7 +Failed: error -33: bad offset value + +/(?<=\G.)/g,replace=+ + abc + 3: a+b+c+ + +".+\QX\E+"B,no_auto_possess +------------------------------------------------------------------ + Bra + Any+ + X+ + Ket + End +------------------------------------------------------------------ + +".+\QX\E+"B,auto_callout,no_auto_possess +------------------------------------------------------------------ + Bra + Callout 255 0 4 + Any+ + Callout 255 4 4 + X+ + Callout 255 8 0 + Ket + End +------------------------------------------------------------------ + +# This one is here because Perl gives an 'unmatched )' error which goes away +# if one of the \) sequences is removed - which is weird. PCRE finds it too +# complicated to find a minimum matching length. 
+ +"()X|((((((((()))))))((((())))))\2())((((((\2\2)))\2)(\22((((\2\2)2))\2)))(2\ZZZ)+:)Z^|91ZiZZnter(ZZ |91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z+:)Z|91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z((Z*(\2(Z\':))\0)i|||||||||||||||loZ\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0nte!rnal errpr\2\\21r(2\ZZZ)+:)Z!|91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0(2\ZZZ)+:)Z^|91ZiZZnter(ZZ |91Z(ZZ ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \)\0(2\ZZZ)+:)Z^)))int \)\0(2\ZZZ)+:)Z^|91ZiZZnter(ZZernZal ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)2))\2Z)))int \))\ZZ(\r2Z( or#(\Z2(Z\Z(\2\2)2))\2Z)Z(\22Z((\Z2(Z\Z(\2\2)))\2))))((((((\2\2))))))"I +Capture group count = 108 +Max back reference = 22 +Contains explicit CR or LF match +Subject length lower bound = 1 + +# This checks that new code for handling groups that may match an empty string +# works on a very large number of alternatives. This pattern used to provoke a +# complaint that it was too complicated. + +/(?:\[A|B|C|D|E|F|G|H|I|J|]{200}Z)/expand + +# This one used to compile rubbish instead of a compile error, and then +# behave unpredictably at match time. 
+ +/.+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X/ +Failed: error 128 at offset 63: assertion expected after (?( or (?(?C) + .+(?(?C'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'))?!XXXX.=X + +/[:[:alnum:]-[[a:lnum:]+/ +Failed: error 150 at offset 11: invalid range in character class + +/((?(?C'')\QX\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/ +Failed: error 128 at offset 11: assertion expected after (?( or (?(?C) + +/((?(?C'')\Q\E(?!((?(?C'')(?!X=X));=)r*X=X));=)/ + +/abcd/auto_callout + abcd\=callout_error=255:2 +--->abcd + +0 ^ a + +1 ^^ b +Failed: error -37: callout error code + +/()(\g+65534)/ +Failed: error 161 at offset 11: subpattern number is too big + +/()(\g+65533)/ +Failed: error 115 at offset 10: reference to non-existent subpattern + +/Á\x00\x00\x00š(\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\2*\x00k\d+\x00‎\x00\x00\x00\x00\x00\2*\x00\x00\1*.){36}int^\x00\x00ÿÿ\x00š(\1{50779}?)J\w2/I +Capture group count = 2 +Max back reference = 2 +First code unit = \xc1 +Last code unit = '2' +Subject length lower bound = 65535 + +/(a)(b)\2\1\1\1\1/I +Capture group count = 2 +Max back reference = 2 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 7 + +/(?a)(?b)\g{b}\g{a}\g{a}\g{a}\g{a}(?xx)(?zz)/I,dupnames +Capture group count = 4 +Max back reference = 4 +Named capture groups: + a 1 + a 3 + b 2 + b 4 +Options: dupnames +First code unit = 'a' +Last code unit = 'z' +Subject 
length lower bound = 11 + +// + \=ovector=7777777777 +** Invalid value in 'ovector=7777777777' + +# This is here because Perl matches, even though a COMMIT is encountered +# outside of the recursion. + +/(?1)(A(*COMMIT)|B)D/ + BAXBAD +No match + +"(?1){2}(a)"B +------------------------------------------------------------------ + Bra + Recurse + Recurse + CBra 1 + a + Ket + Ket + End +------------------------------------------------------------------ + +"(?1){2,4}(a)"B +------------------------------------------------------------------ + Bra + Recurse + Recurse + Brazero + Bra + Bra + Recurse + Ket + Brazero + Bra + Recurse + Ket + Ket + CBra 1 + a + Ket + Ket + End +------------------------------------------------------------------ + +# This test differs from Perl for the first subject. Perl ends up with +# $1 set to 'B'; PCRE2 has it unset (which I think is right). + +/^(?: +(?:A| (?:B|B(*ACCEPT)) (?<=(.)) D) +(Z) +)+$/x + AZB + 0: AZB + 1: + 2: Z + AZBDZ + 0: AZBDZ + 1: B + 2: Z + +# The first of these, when run by Perl, gives the mark 'aa', which is wrong. + +'(?>a(*:aa))b|ac' mark + ac + 0: ac + +'(?:a(*:aa))b|ac' mark + ac + 0: ac + +/(R?){65}/ + (R?){65} + 0: + 1: + +/\[(a)]{60}/expand + aaaa +No match + +/(?abcdabcd + ^^ ( +Callout 1: last capture = 1 + 1: abcd + 2: b + 3: c +--->abcdabcd + ^ ^ ( + 0: abcdabcd + 1: abcd + 2: b + 3: c + +# Perl matches this one, but PCRE does not because (*ACCEPT) clears out any +# pending backtracks in the recursion. + +/^ (?(DEFINE) (..(*ACCEPT)|...) 
) (?1)$/x +\= Expect no match + abc +No match + +# Perl gives no match for this one + +/(a(*MARK:m)(*ACCEPT)){0}(?1)/mark + abc + 0: a +MK: m + +/abc/endanchored + xyzabc + 0: abc +\= Expect no match + xyzabcdef +No match +\= Expect error + xyzabc\=ph +Failed: error -34: bad option value + +/abc/ + xyzabc\=endanchored + 0: abc +\= Expect no match + xyzabcdef\=endanchored +No match +\= Expect error + xyzabc\=ps,endanchored +Failed: error -34: bad option value + +/abc(*ACCEPT)d/endanchored + xyzabc + 0: abc +\= Expect no match + xyzabcdef +No match + +/abc|bcd/endanchored + xyzabcd + 0: bcd +\= Expect no match + xyzabcdef +No match + +/a(*ACCEPT)x|aa/endanchored + aaa + 0: a + +# Check auto-anchoring when there is a group that is never obeyed at +# the start of a branch. + +/(?(DEFINE)(a))^bc/I +Capture group count = 1 +Compile options: +Overall options: anchored +First code unit = 'b' +Subject length lower bound = 2 + +/(a){0}.*bc/sI +Capture group count = 1 +Compile options: dotall +Overall options: anchored dotall +Last code unit = 'c' +Subject length lower bound = 2 + +# This should be anchored, as the condition is always false and there is +# no alternative branch. + +/(?(VERSION>=999)yes)^bc/I +Capture group count = 0 +Compile options: +Overall options: anchored +Subject length lower bound = 2 + +# This should not be anchored. + +/(?(VERSION>=999)yes|no)^bc/I +Capture group count = 0 +Last code unit = 'c' +Subject length lower bound = 4 + +/(*LIMIT_HEAP=0)xxx/I +Capture group count = 0 +Heap limit = 0 +First code unit = 'x' +Last code unit = 'x' +Subject length lower bound = 3 + +/\d{0,3}(*:abc)(?C1)xxx/callout_info +Callout 1 x + +# ---------------------------------------------------------------------- + +# These are a whole pile of tests that touch lines of code that are not +# used by any other tests (at least when these were created). 
+ +/^a+?x/i,no_start_optimize,no_auto_possess +\= Expect no match + aaa +No match + +/^[^a]{3,}?x/i,no_start_optimize,no_auto_possess +\= Expect no match + bbb +No match + cc +No match + +/^X\S/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\W/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\H/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\h/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\V/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\v/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\h/no_start_optimize,no_auto_possess +\= Expect no match + XY +No match + +/^X\V/no_start_optimize,no_auto_possess +\= Expect no match + X\n +No match + +/^X\v/no_start_optimize,no_auto_possess +\= Expect no match + XX +No match + +/^X.+?/s,no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\R+?/no_start_optimize,no_auto_possess +\= Expect no match + XX +No match + +/^X\H+?/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\h+?/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + +/^X\V+?/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + X\n +No match + +/^X\D+?/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + X9 +No match + +/^X\S+?/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + X\n +No match + +/^X\W+?/no_start_optimize,no_auto_possess +\= Expect no match + X +No match + XX +No match + +/^X.+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n +No match + +/(*CRLF)^X.+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\r\=ps +Partial match: XY\x0d + +/^X\R+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\nX +No match + X\n\r\n +No match + X\n\rY +No match + X\n\nY +No match + X\n\x{0c}Y +No match + +/(*BSR_ANYCRLF)^X\R+?Z/no_start_optimize,no_auto_possess +\= 
Expect no match + X\nX +No match + X\n\r\n +No match + X\n\rY +No match + X\n\nY +No match + X\n\x{0c}Y +No match + +/^X\H+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\t +No match + XYY +No match + +/^X\h+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\t\t +No match + X\tY +No match + +/^X\V+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n +No match + XYY +No match + +/^X\v+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\n\n +No match + X\nY +No match + +/^X\D+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY9 +No match + XYY +No match + +/^X\d+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X99 +No match + X9Y +No match + +/^X\S+?Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n +No match + XYY +No match + +/^X\s+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X\n\n +No match + X\nY +No match + +/^X\W+?Z/no_start_optimize,no_auto_possess +\= Expect no match + X.A +No match + X++ +No match + +/^X\w+?Z/no_start_optimize,no_auto_possess +\= Expect no match + Xa. 
+No match + Xaa +No match + +/^X.{1,3}Z/s,no_start_optimize,no_auto_possess +\= Expect no match + Xa.bd +No match + +/^X\h+Z/no_start_optimize,no_auto_possess +\= Expect no match + X\t\t +No match + X\tY +No match + +/^X\V+Z/no_start_optimize,no_auto_possess +\= Expect no match + XY\n +No match + XYY +No match + +/^(X(*THEN)Y|AB){0}(?1)/ + ABX + 0: AB +\= Expect no match + XAB +No match + +/^(?!A(?C1)B)C/ + ABC\=callout_error=1,no_jit +No match + +/^(?!A(?C1)B)C/no_start_optimize + ABC\=callout_error=1 +--->ABC + 1 ^^ B +Failed: error -37: callout error code + +/^(?(?!A(?C1)B)C)/ + ABC\=callout_error=1 +--->ABC + 1 ^^ B +Failed: error -37: callout error code + +# ---------------------------------------------------------------------- + +/[a b c]/BxxI +------------------------------------------------------------------ + Bra + [a-c] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended_more +Starting code units: a b c +Subject length lower bound = 1 + +/[a b c]/BxxxI +------------------------------------------------------------------ + Bra + [a-c] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended extended_more +Starting code units: a b c +Subject length lower bound = 1 + +/[a b c]/B,extended_more +------------------------------------------------------------------ + Bra + [a-c] + Ket + End +------------------------------------------------------------------ + +/[ a b c ]/B,extended_more +------------------------------------------------------------------ + Bra + [a-c] + Ket + End +------------------------------------------------------------------ + +/[a b](?xx: [ 12 ] (?-xx:[ 34 ]) )y z/B +------------------------------------------------------------------ + Bra + [ ab] + Bra + [12] + Bra + [ 34] + Ket + Ket + y z + Ket + End +------------------------------------------------------------------ + +# Unsetting /x also unsets /xx + 
+/[a b](?xx: [ 12 ] (?-x:[ 34 ]) )y z/B +------------------------------------------------------------------ + Bra + [ ab] + Bra + [12] + Bra + [ 34] + Ket + Ket + y z + Ket + End +------------------------------------------------------------------ + +/(a)(?-n:(b))(c)/nB +------------------------------------------------------------------ + Bra + Bra + a + Ket + Bra + CBra 1 + b + Ket + Ket + Bra + c + Ket + Ket + End +------------------------------------------------------------------ + +# ---------------------------------------------------------------------- +# These test the dangerous PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL option. + +/\j\x{z}\o{82}\L\uabcd\u\U\g{\g/B,\bad_escape_is_literal +** Unrecognized modifier '\' in '\bad_escape_is_literal' + +/\N{\c/IB,bad_escape_is_literal +------------------------------------------------------------------ + Bra + N{c + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Extra options: bad_escape_is_literal +First code unit = 'N' +Last code unit = 'c' +Subject length lower bound = 3 + +/[\j\x{z}\o\gAb\g]/B,bad_escape_is_literal +------------------------------------------------------------------ + Bra + [Abgjoxz{}] + Ket + End +------------------------------------------------------------------ + +/[Q-\N]/B,bad_escape_is_literal +Failed: error 150 at offset 5: invalid range in character class + +/[\s-_]/bad_escape_is_literal +Failed: error 150 at offset 3: invalid range in character class + +/[_-\s]/bad_escape_is_literal +Failed: error 150 at offset 5: invalid range in character class + +/[\B\R\X]/B +Failed: error 107 at offset 2: escape sequence is invalid in character class + +/[\B\R\X]/B,bad_escape_is_literal +Failed: error 107 at offset 2: escape sequence is invalid in character class + +/[A-\BP-\RV-\X]/B +Failed: error 107 at offset 4: escape sequence is invalid in character class + +/[A-\BP-\RV-\X]/B,bad_escape_is_literal +Failed: error 107 at offset 4: escape sequence is 
invalid in character class + +# ---------------------------------------------------------------------- + +/a\b(c/literal + a\\b(c + 0: a\b(c + +/a\b(c/literal,caseless + a\\b(c + 0: a\b(c + a\\B(c + 0: a\B(c + +/a\b(c/literal,firstline + XYYa\\b(c + 0: a\b(c +\= Expect no match + X\na\\b(c +No match + +/a\b?c/literal,use_offset_limit + XXXXa\\b?c\=offset_limit=4 + 0: a\b?c +\= Expect no match + XXXXa\\b?c\=offset_limit=3 +No match + +/a\b(c/literal,anchored,endanchored + a\\b(c + 0: a\b(c +\= Expect no match + Xa\\b(c +No match + a\\b(cX +No match + Xa\\b(cX +No match + +//literal,extended +Failed: error 192 at offset 0: invalid option bits with PCRE2_LITERAL + +/a\b(c/literal,auto_callout,no_start_optimize + XXXXa\\b(c +--->XXXXa\b(c + +0 ^ a + +0 ^ a + +0 ^ a + +0 ^ a + +0 ^ a + +1 ^^ \ + +2 ^ ^ b + +3 ^ ^ ( + +4 ^ ^ c + +5 ^ ^ End of pattern + 0: a\b(c + +/a\b(c/literal,auto_callout + XXXXa\\b(c +--->XXXXa\b(c + +0 ^ a + +1 ^^ \ + +2 ^ ^ b + +3 ^ ^ ( + +4 ^ ^ c + +5 ^ ^ End of pattern + 0: a\b(c + +/(*CR)abc/literal + (*CR)abc + 0: (*CR)abc + +/cat|dog/I,match_word +Capture group count = 0 +Max lookbehind = 1 +Extra options: match_word +Starting code units: c d +Subject length lower bound = 3 + the cat sat + 0: cat +\= Expect no match + caterpillar +No match + snowcat +No match + syndicate +No match + +/(cat)|dog/I,match_line,literal +Capture group count = 0 +Compile options: literal +Overall options: anchored literal +Extra options: match_line +First code unit = '(' +Subject length lower bound = 9 + (cat)|dog + 0: (cat)|dog +\= Expect no match + the cat sat +No match + caterpillar +No match + snowcat +No match + syndicate +No match + +/a whole line/match_line,multiline + Rhubarb \na whole line\n custard + 0: a whole line +\= Expect no match + Not a whole line +No match + +# Perl gets this wrong, failing to capture 'b' in group 1. + +/^(b+|a){1,2}?bc/ + bbc + 0: bbc + 1: b + +# And again here, for the "babc" subject string. 
+ +/^(b*|ba){1,2}?bc/ + babc + 0: babc + 1: ba + bbabc + 0: bbabc + 1: ba + bababc + 0: bababc + 1: ba +\= Expect no match + bababbc +No match + babababc +No match + +/[[:digit:]-a]/ +Failed: error 150 at offset 10: invalid range in character class + +/[[:digit:]-[:print:]]/ +Failed: error 150 at offset 10: invalid range in character class + +/[\d-a]/ +Failed: error 150 at offset 3: invalid range in character class + +/[\H-z]/ +Failed: error 150 at offset 3: invalid range in character class + +/[\d-[:print:]]/ +Failed: error 150 at offset 3: invalid range in character class + +# Perl gets the second of these wrong, giving no match. + +"(?<=(a))\1?b"I +Capture group count = 1 +Max back reference = 1 +Max lookbehind = 1 +Last code unit = 'b' +Subject length lower bound = 1 + ab + 0: b + 1: a + aaab + 0: ab + 1: a + +"(?=(a))\1?b"I +Capture group count = 1 +Max back reference = 1 +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + ab + 0: ab + 1: a + aaab + 0: ab + 1: a + +# JIT does not support callout_extra + +/(*NO_JIT)(a+)b/auto_callout,no_start_optimize,no_auto_possess +\= Expect no match + aac\=callout_extra +New match attempt +--->aac + +9 ^ ( ++10 ^ a+ ++12 ^ ^ ) ++13 ^ ^ b +Backtrack +--->aac ++12 ^^ ) ++13 ^^ b +Backtrack +No other matching paths +New match attempt +--->aac + +9 ^ ( ++10 ^ a+ ++12 ^^ ) ++13 ^^ b +Backtrack +No other matching paths +New match attempt +--->aac + +9 ^ ( ++10 ^ a+ +Backtrack +No other matching paths +New match attempt +--->aac + +9 ^ ( ++10 ^ a+ +No match + +/(*NO_JIT)a+(?C'XXX')b/no_start_optimize,no_auto_possess +\= Expect no match + aac\=callout_extra +New match attempt +Callout (15): 'XXX' +--->aac + ^ ^ b +Backtrack +Callout (15): 'XXX' +--->aac + ^^ b +Backtrack +No other matching paths +New match attempt +Callout (15): 'XXX' +--->aac + ^^ b +No match + +/\n/firstline + xyz\nabc + 0: \x0a + +/\nabc/firstline + xyz\nabc + 0: \x0aabc + +/\x{0a}abc/firstline,newline=crlf +\= Expect no match + 
xyz\r\nabc +No match + +/[abc]/firstline +\= Expect no match + \na +No match + +# These tests are matched in test 1 as they are Perl compatible. Here we are +# looking at what does and does not get auto-possessified. + +/(?(DEFINE)(?a?))^(?&optional_a)a$/B +------------------------------------------------------------------ + Bra + Cond + Cond false + CBra 1 + a? + Ket + Ket + ^ + Recurse + a + $ + Ket + End +------------------------------------------------------------------ + +/(?(DEFINE)(?a?)X)^(?&optional_a)a$/B +------------------------------------------------------------------ + Bra + Cond + Cond false + CBra 1 + a? + Ket + X + Ket + ^ + Recurse + a + $ + Ket + End +------------------------------------------------------------------ + +/^(a?)b(?1)a/B +------------------------------------------------------------------ + Bra + ^ + CBra 1 + a? + Ket + b + Recurse + a + Ket + End +------------------------------------------------------------------ + +/^(a?)+b(?1)a/B +------------------------------------------------------------------ + Bra + ^ + SCBra 1 + a? + KetRmax + b + Recurse + a + Ket + End +------------------------------------------------------------------ + +/^(a?)++b(?1)a/B +------------------------------------------------------------------ + Bra + ^ + SCBraPos 1 + a? + KetRpos + b + Recurse + a + Ket + End +------------------------------------------------------------------ + +/^(a?)+b/B +------------------------------------------------------------------ + Bra + ^ + SCBra 1 + a? 
+ KetRmax + b + Ket + End +------------------------------------------------------------------ + +/(?=a+)a(a+)++b/B +------------------------------------------------------------------ + Bra + Assert + a++ + Ket + a + CBraPos 1 + a++ + KetRpos + b + Ket + End +------------------------------------------------------------------ + +/(?<=(?=.){4,5}x)/B +------------------------------------------------------------------ + Bra + Assert back + Reverse + Assert + Any + Ket + Assert + Any + Ket + Assert + Any + Ket + Assert + Any + Ket + Brazero + Assert + Any + Ket + x + Ket + Ket + End +------------------------------------------------------------------ + +# Perl behaves differently with these when optimization is turned off + +/a(*PRUNE:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy +No match, mark = X + +/a(*THEN:X)bc|qq/mark,no_start_optimize +\= Expect no match + axy +No match, mark = X + +/(?^x-i)AB/ +Failed: error 194 at offset 4: invalid hyphen in option setting + +/(?^-i)AB/ +Failed: error 194 at offset 3: invalid hyphen in option setting + +/(?x-i-i)/ +Failed: error 194 at offset 5: invalid hyphen in option setting + +/(?(?=^))b/I +Capture group count = 0 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + +/(?(?=^)|)b/I +Capture group count = 0 +First code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + +/(?(?=^)|^)b/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'b' +Subject length lower bound = 1 + bbc + 0: b +\= Expect no match + abc +No match + +/(?(1)^|^())/I +Capture group count = 1 +Max back reference = 1 +May match empty string +Compile options: +Overall options: anchored +Subject length lower bound = 0 + +/(?(1)^())b/I +Capture group count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + +/(?(1)^())+b/I,aftertext +Capture group count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + 0+ c + 
+/(?(1)^()|^)+b/I,aftertext +Capture group count = 1 +Max back reference = 1 +Compile options: +Overall options: anchored +First code unit = 'b' +Subject length lower bound = 1 + bbc + 0: b + 0+ bc +\= Expect no match + abc +No match + +/(?(1)^()|^)*b/I,aftertext +Capture group count = 1 +Max back reference = 1 +First code unit = 'b' +Subject length lower bound = 1 + bbc + 0: b + 0+ bc + abc + 0: b + 0+ c + xbc + 0: b + 0+ c + +/(?(1)^())+b/I,aftertext +Capture group count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: b + 0+ c + +/(?(1)^a()|^a)+b/I,aftertext +Capture group count = 1 +Max back reference = 1 +Compile options: +Overall options: anchored +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + abc + 0: ab + 0+ c +\= Expect no match + bbc +No match + +/(?(1)^|^(a))+b/I,aftertext +Capture group count = 1 +Max back reference = 1 +Compile options: +Overall options: anchored +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: ab + 0+ c + 1: a +\= Expect no match + bbc +No match + +/(?(1)^a()|^a)*b/I,aftertext +Capture group count = 1 +Max back reference = 1 +Last code unit = 'b' +Subject length lower bound = 1 + abc + 0: ab + 0+ c + bbc + 0: b + 0+ bc + xbc + 0: b + 0+ c + +/a(b)c|xyz/g,allvector,replace=<$0> + abcdefabcpqr\=ovector=4 + 2: defpqr + 0: 6 9 + 1: 7 8 + 2: + 3: + abxyz\=ovector=4 + 1: ab + 0: 2 5 + 1: + 2: + 3: + abcdefxyz\=ovector=4 + 2: def + 0: 6 9 + 1: + 2: + 3: + +/a(b)c|xyz/allvector + abcdef\=ovector=4 + 0: abc + 1: b + 2: + 3: + abxyz\=ovector=4 + 0: xyz + 1: + 2: + 3: + +/a(b)c|xyz/g,replace=<$0>,substitute_callout + abcdefabcpqr + 1(2) Old 0 3 "abc" New 0 5 "" + 2(2) Old 6 9 "abc" New 8 13 "" + 2: defpqr + abxyzpqrabcxyz + 1(1) Old 2 5 "xyz" New 2 7 "" + 2(2) Old 8 11 "abc" New 10 15 "" + 3(1) Old 11 14 "xyz" New 15 20 "" + 3: abpqr + 12abc34xyz99abc55\=substitute_stop=2 + 1(2) Old 2 5 "abc" New 2 7 "" + 2(1) Old 7 10 "xyz" New 9 14 " STOPPED" + 2: 
1234xyz99abc55 + 12abc34xyz99abc55\=substitute_skip=1 + 1(2) Old 2 5 "abc" New 2 7 " SKIPPED" + 2(1) Old 7 10 "xyz" New 7 12 "" + 3(2) Old 12 15 "abc" New 14 19 "" + 3: 12abc349955 + 12abc34xyz99abc55\=substitute_skip=2 + 1(2) Old 2 5 "abc" New 2 7 "" + 2(1) Old 7 10 "xyz" New 9 14 " SKIPPED" + 3(2) Old 12 15 "abc" New 14 19 "" + 3: 1234xyz9955 + +/a(b)c|xyz/g,replace=<$0> + abcdefabcpqr + 2: defpqr + abxyzpqrabcxyz + 3: abpqr + 12abc34xyz\=substitute_stop=2 + 1(2) Old 2 5 "abc" New 2 7 "" + 2(1) Old 7 10 "xyz" New 9 14 " STOPPED" + 2: 1234xyz + 12abc34xyz\=substitute_skip=1 + 1(2) Old 2 5 "abc" New 2 7 " SKIPPED" + 2(1) Old 7 10 "xyz" New 7 12 "" + 2: 12abc34 + +/a(b)c|xyz/replace=<$0> + abcdefabcpqr + 1: defabcpqr + 12abc34xyz\=substitute_skip=1 + 1(2) Old 2 5 "abc" New 2 7 " SKIPPED" + 1: 12abc34xyz + 12abc34xyz\=substitute_stop=1 + 1(2) Old 2 5 "abc" New 2 7 " STOPPED" + 1: 12abc34xyz + +/abc\rdef/ + abc\ndef +No match + +/abc\rdef\x{0d}xyz/escaped_cr_is_lf + abc\ndef\rxyz + 0: abc\x0adef\x0dxyz +\= Expect no match + abc\ndef\nxyz +No match + +/(?(*ACCEPT)xxx)/ +Failed: error 128 at offset 2: assertion expected after (?( or (?(?C) + +/(?(*atomic:xx)xxx)/ +Failed: error 128 at offset 10: assertion expected after (?( or (?(?C) + +/(?(*script_run:xxx)zzz)/ +Failed: error 128 at offset 14: assertion expected after (?( or (?(?C) + +/foobar/ + the foobar thing\=copy_matched_subject + 0: foobar + the foobar thing\=copy_matched_subject,zero_terminate + 0: foobar + +/foobar/g + the foobar thing foobar again\=copy_matched_subject + 0: foobar + 0: foobar + +/(*:XX)^abc/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + +/(*COMMIT:XX)^abc/I +Capture group count = 0 +Compile options: +Overall options: anchored +First code unit = 'a' +Subject length lower bound = 3 + +/(*ACCEPT:XX)^abc/I +Capture group count = 0 +May match empty string +Subject length lower bound = 0 + +/abc/replace=xyz + 
abc\=null_context + 1: xyz + +/abc/replace=xyz,substitute_callout + abc + 1(1) Old 0 3 "abc" New 0 3 "xyz" + 1: xyz +\= Expect error message + abc\=null_context +** Replacement callouts are not supported with null_context. + +/\[()]{65535}()/expand +Failed: error 197 at offset 131071: too many capturing groups (maximum 65535) + +/\[()]{65535}(?)/expand +Failed: error 197 at offset 131075: too many capturing groups (maximum 65535) + +/a(?:(*ACCEPT))??bc/ + abc + 0: abc + axy + 0: a + +/a(*ACCEPT)??bc/ + abc + 0: abc + axy + 0: a + +/a(*ACCEPT:XX)??bc/mark + abc + 0: abc + axy + 0: a +MK: XX + +/(*:\)?/ +Failed: error 109 at offset 5: quantifier does not follow a repeatable item + +/(*:\Q \E){5}/alt_verbnames +Failed: error 109 at offset 11: quantifier does not follow a repeatable item + +/(?=abc)/I +Capture group count = 0 +May match empty string +First code unit = 'a' +Last code unit = 'c' +Subject length lower bound = 2 + +/(?|(X)|(XY))\1abc/I +Capture group count = 1 +Max back reference = 1 +First code unit = 'X' +Last code unit = 'c' +Subject length lower bound = 4 + +/(?|(a)|(bcde))(c)\2/I +Capture group count = 2 +Max back reference = 2 +Starting code units: a b +Last code unit = 'c' +Subject length lower bound = 3 + +/(?|(a)|(bcde))(c)\1/I +Capture group count = 2 +Max back reference = 1 +Starting code units: a b +Last code unit = 'c' +Subject length lower bound = 2 + +/(?|(?'A'a)|(?'A'bcde))(?'B'c)\k'B'(?'A')/I,dupnames +Capture group count = 3 +Max back reference = 2 +Named capture groups: + A 1 + A 3 + B 2 +Options: dupnames +Starting code units: a b +Last code unit = 'c' +Subject length lower bound = 3 + +/(?|(?'A'a)|(?'A'bcde))(?'B'c)\k'A'(?'A')/I,dupnames +Capture group count = 3 +Max back reference = 3 +Named capture groups: + A 1 + A 3 + B 2 +Options: dupnames +Starting code units: a b +Last code unit = 'c' +Subject length lower bound = 2 + +/((a|)+)+Z/I +Capture group count = 2 +Starting code units: Z a +Last code unit = 'Z' +Subject length lower 
bound = 1 + +/((?=a))[abcd]/I +Capture group count = 1 +First code unit = 'a' +Subject length lower bound = 1 + +/A(?:(*ACCEPT))?B/info +Capture group count = 0 +First code unit = 'A' +Subject length lower bound = 1 + +/(A(*ACCEPT)??B)C/ + ABC + 0: ABC + 1: AB + AXY + 0: A + 1: A + +/(?<=(?<=a)b)c.*/I +Capture group count = 0 +Max lookbehind = 1 +First code unit = 'c' +Subject length lower bound = 1 + abc\=ph +Partial match: c +\= Expect no match + xbc\=ph +No match + +/(?<=ab)c.*/I +Capture group count = 0 +Max lookbehind = 2 +First code unit = 'c' +Subject length lower bound = 1 + abc\=ph +Partial match: c +\= Expect no match + xbc\=ph +No match + +/(?<=a(?<=a|a)c)/I +Capture group count = 0 +Max lookbehind = 2 +May match empty string +Subject length lower bound = 0 + +/(?<=a(?<=a|ba)c)/I +Capture group count = 0 +Max lookbehind = 2 +May match empty string +Subject length lower bound = 0 + +/(?<=(?<=a)b)(?.*?\b\1\b){3}/ + word1 word3 word1 word2 word3 word2 word2 word1 word3 word4 +No match + +/\A(*napla:.*\b(\w++))(?>.*?\b\1\b){3}/ + word1 word3 word1 word2 word3 word2 word2 word1 word3 word4 + 0: word1 word3 word1 word2 word3 word2 word2 word1 word3 + 1: word3 + +/\A(?*.*\b(\w++))(?>.*?\b\1\b){3}/ + word1 word3 word1 word2 word3 word2 word2 word1 word3 word4 + 0: word1 word3 word1 word2 word3 word2 word2 word1 word3 + 1: word3 + +/(*plb:(.)..|(.)...)(\1|\2)/ + abcdb\=offset=4 + 0: b + 1: b + 2: + 3: b + abcda\=offset=4 +No match + +/(*naplb:(.)..|(.)...)(\1|\2)/ + abcdb\=offset=4 + 0: b + 1: b + 2: + 3: b + abcda\=offset=4 + 0: a + 1: + 2: a + 3: a + +/(?<*(.)..|(.)...)(\1|\2)/ + abcdb\=offset=4 + 0: b + 1: b + 2: + 3: b + abcda\=offset=4 + 0: a + 1: + 2: a + 3: a + +/(*non_atomic_positive_lookahead:ab)/B +------------------------------------------------------------------ + Bra + Non-atomic assert + ab + Ket + Ket + End +------------------------------------------------------------------ + +/(*non_atomic_positive_lookbehind:ab)/B 
+------------------------------------------------------------------ + Bra + Non-atomic assert back + Reverse + ab + Ket + Ket + End +------------------------------------------------------------------ + +/(*pla:ab+)/B +------------------------------------------------------------------ + Bra + Assert + a + b++ + Ket + Ket + End +------------------------------------------------------------------ + +/(*napla:ab+)/B +------------------------------------------------------------------ + Bra + Non-atomic assert + a + b+ + Ket + Ket + End +------------------------------------------------------------------ + +/(*napla:)+/ + +/(*naplb:)+/ + +/(*napla:^x|^y)/I +Capture group count = 0 +May match empty string +Compile options: +Overall options: anchored +Starting code units: x y +Subject length lower bound = 1 + +/(*napla:abc|abd)/I +Capture group count = 0 +May match empty string +First code unit = 'a' +Subject length lower bound = 1 + +/(*napla:a|(.)(*ACCEPT)zz)\1../ + abcd + 0: abc + 1: a + +/(*napla:a(*ACCEPT)zz|(.))\1../ + abcd + 0: bcd + 1: b + +/(*napla:a|(*COMMIT)(.))\1\1/ + aabc + 0: aa + 1: a +\= Expect no match + abbc +No match + +/(*napla:a|(.))\1\1/ + aabc + 0: aa + 1: a + abbc + 0: bb + 1: b + +# ---- + +# Expect error (recursion => not fixed length) +/(\2)((?=(?<=\1)))/ +Failed: error 125 at offset 8: lookbehind assertion is not fixed length + +/c*+(?<=[bc])/ + abc\=ph +Partial match: c + ab\=ph +Partial match: + abc\=ps + 0: c + ab\=ps + 0: + +/c++(?<=[bc])/ + abc\=ph +Partial match: c + ab\=ph +Partial match: + +/(?<=(?=.(?<=x)))/ + abx + 0: + ab\=ph +Partial match: + bxyz + 0: + xyz + 0: + +/\z/ + abc\=ph +Partial match: + abc\=ps + 0: + +/\Z/ + abc\=ph +Partial match: + abc\=ps + 0: + abc\n\=ph +Partial match: \x0a + abc\n\=ps + 0: + +/(?![ab]).*/ + ab\=ph +Partial match: + +/c*+/ + ab\=ph,offset=2 +Partial match: + +/\A\s*(a|(?:[^`]{28500}){4})/I +Capture group count = 1 +Max lookbehind = 1 +Compile options: +Overall options: anchored +Subject length lower 
bound = 1 + a + 0: a + 1: a + +/\A\s*((?:[^`]{28500}){4})/I +Capture group count = 1 +Max lookbehind = 1 +Compile options: +Overall options: anchored +Subject length lower bound = 65535 + +/\A\s*((?:[^`]{28500}){4}|a)/I +Capture group count = 1 +Max lookbehind = 1 +Compile options: +Overall options: anchored +Subject length lower bound = 1 + a + 0: a + 1: a + +/(?a)(?()b)((?<=b).*)/B +------------------------------------------------------------------ + Bra + CBra 1 + a + Ket + Cond + 1 Cond ref + b + Ket + CBra 2 + Assert back + Reverse + b + Ket + Any*+ + Ket + Ket + End +------------------------------------------------------------------ + +/(?(1)b)((?<=b).*)/B +------------------------------------------------------------------ + Bra + Cond + 1 Cond ref + b + Ket + CBra 1 + Assert back + Reverse + b + Ket + Any*+ + Ket + Ket + End +------------------------------------------------------------------ + +/(?(R1)b)((?<=b).*)/B +------------------------------------------------------------------ + Bra + Cond + Cond recurse 1 + b + Ket + CBra 1 + Assert back + Reverse + b + Ket + Any*+ + Ket + Ket + End +------------------------------------------------------------------ + +/(?(DEFINE)b)((?<=b).*)/B +------------------------------------------------------------------ + Bra + Cond + Cond false + b + Ket + CBra 1 + Assert back + Reverse + b + Ket + Any*+ + Ket + Ket + End +------------------------------------------------------------------ + +/(?(VERSION=10.4)b)((?<=b).*)/B +------------------------------------------------------------------ + Bra + Cond + Cond false + b + Ket + CBra 1 + Assert back + Reverse + b + Ket + Any*+ + Ket + Ket + End +------------------------------------------------------------------ + +/[aA]b[cC]/IB +------------------------------------------------------------------ + Bra + /i a + b + /i c + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = 'a' (caseless) +Last code unit = 'c' 
(caseless) +Subject length lower bound = 3 + +/[cc]abcd/I +Capture group count = 0 +First code unit = 'c' +Last code unit = 'd' +Subject length lower bound = 5 + +/[Cc]abcd/I +Capture group count = 0 +First code unit = 'C' (caseless) +Last code unit = 'd' +Subject length lower bound = 5 + +/[c]abcd/I +Capture group count = 0 +First code unit = 'c' +Last code unit = 'd' +Subject length lower bound = 5 + +/(?:c|C)abcd/I +Capture group count = 0 +First code unit = 'C' (caseless) +Last code unit = 'd' +Subject length lower bound = 5 + +/(a)?a/I +Capture group count = 1 +Starting code units: a +Last code unit = 'a' +Subject length lower bound = 1 + manm + 0: a + +/^(?|(\*)(*napla:\S*_(\2?+.+))|(\w)(?=\S*_(\2?+\1)))+_\2$/ + *abc_12345abc + 0: *abc_12345abc + 1: c + 2: 12345abc + +/^(?|(\*)(*napla:\S*_(\3?+.+))|(\w)(?=\S*_((\2?+\1))))+_\2$/ + *abc_12345abc + 0: *abc_12345abc + 1: c + 2: 12345abc + 3: 12345abc + +/^((\1+)(?C)|\d)+133X$/ + 111133X\=callout_capture +Callout 0: last capture = 2 + 1: 1 + 2: 111 +--->111133X + ^ ^ | +Callout 0: last capture = 2 + 1: 3 + 2: 3 +--->111133X + ^ ^ | +Callout 0: last capture = 2 + 1: 1 + 2: 11 +--->111133X + ^ ^ | +Callout 0: last capture = 2 + 1: 3 + 2: 3 +--->111133X + ^ ^ | + 0: 111133X + 1: 11 + 2: 11 + +/abc/replace=xyz,substitute_replacement_only + 123abc456 + 1: xyz + +/a(?b)c(?d)e/g,replace=X$ONE+${TWO}Z,substitute_replacement_only + "abcde-abcde-" + 2: Xb+dZXb+dZ + +/a(b)c|xyz/g,replace=<$0>,substitute_callout,substitute_replacement_only + abcdefabcpqr + 1(2) Old 0 3 "abc" New 0 5 "" + 2(2) Old 6 9 "abc" New 5 10 "" + 2: + abxyzpqrabcxyz + 1(1) Old 2 5 "xyz" New 0 5 "" + 2(2) Old 8 11 "abc" New 5 10 "" + 3(1) Old 11 14 "xyz" New 10 15 "" + 3: + 12abc34xyz99abc55\=substitute_stop=2 + 1(2) Old 2 5 "abc" New 0 5 "" + 2(1) Old 7 10 "xyz" New 5 10 " STOPPED" + 2: + 12abc34xyz99abc55\=substitute_skip=1 + 1(2) Old 2 5 "abc" New 0 5 " SKIPPED" + 2(1) Old 7 10 "xyz" New 0 5 "" + 3(2) Old 12 15 "abc" New 5 10 "" + 3: + 
12abc34xyz99abc55\=substitute_skip=2 + 1(2) Old 2 5 "abc" New 0 5 "" + 2(1) Old 7 10 "xyz" New 5 10 " SKIPPED" + 3(2) Old 12 15 "abc" New 5 10 "" + 3: + +/a(..)d/replace=>$1<,substitute_matched + xyzabcdxyzabcdxyz + 1: xyz>bcbc$1<,substitute_matched + xyzabcdxyzabcdxyz + 2: xyz>bcbcbcbc$1<,substitute_matched + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + 3: xyz><>bcbc$1<,substitute_matched + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + 1: xyz>$1< + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + 1: xyz>$1< + xyz55abcdxyzabcdxyz\=ovector=2,substitute_unset_empty + 3: xyz><>bcbc" 00 "<).."/hex,mark,no_start_optimize + AB + 0: AB +MK: >\x00< + A\=ph +Partial match, mark=>\x00<: A +\= Expect no match + A +No match, mark = >\x00< + +/"(*MARK:>" 00 "<).(?C1)."/hex,mark,no_start_optimize + AB +--->AB + 1 ^^ . +Latest Mark: >\x00< + 0: AB +MK: >\x00< + +/(?(VERSION=0.0/ +Failed: error 179 at offset 14: syntax error or number too big in (?(VERSION condition + +# Perl has made \K in lookarounds an error. At the moment PCRE2 still accepts. + +/(?=a\Kb)ab/ + ab + 0: b + +/(?!a\Kb)ac/ + ac + 0: ac + +/^abc(?<=b\Kc)d/ + abcd + 0: cd + +/^abc(?(?&NAME_PAT))\s+(?(?&ADDRESS_PAT)) + (?(DEFINE) + (?[a-z]+) + (?\d+) + )/x +/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i + +#save testsaved1 + +# Do it again for some more patterns. + +/(*MARK:A)(*SKIP:B)(C|X)/mark +** Ignored when compiled pattern is stacked with 'push': mark +/(?:(?foo)|(?bar))\k/dupnames + +#save testsaved2 +#pattern -push + +# Reload the patterns, then pop them one by one and check them. 
+ +#load testsaved1 +#load testsaved2 + +#pop info +Capture group count = 2 +Max back reference = 2 +Named capture groups: + n 1 + n 2 +Options: dupnames +Starting code units: b f +Subject length lower bound = 6 + foofoo + 0: foofoo + 1: foo + barbar + 0: barbar + 1: + 2: bar + +#pop mark + C + 0: C + 1: C +MK: A +\= Expect no match + D +No match, mark = A + +#pop + AmanaplanacanalPanama + 0: AmanaplanacanalPanama + 1: + 2: + 3: AmanaplanacanalPanama + 4: A + +#pop info +Capture group count = 4 +Named capture groups: + ADDR 2 + ADDRESS_PAT 4 + NAME 1 + NAME_PAT 3 +Options: extended +Subject length lower bound = 3 + metcalfe 33 + 0: metcalfe 33 + 1: metcalfe + 2: 33 + +# Check for an error when different tables are used. + +/abc/push,tables=1 +/xyz/push,tables=2 +#save testsaved1 +Serialization failed: error -30: patterns do not all use the same character tables + +#pop + xyz + 0: xyz + +#pop + abc + 0: abc + +#pop should give an error +** Can't pop off an empty stack + pqr + +/abcd/pushcopy + abcd + 0: abcd + +#pop + abcd + 0: abcd + +#pop should give an error +** Can't pop off an empty stack + +/abcd/push +#popcopy + abcd + 0: abcd + +#pop + abcd + 0: abcd + +/abcd/push +#save testsaved1 +#pop should give an error +** Can't pop off an empty stack + +#load testsaved1 +#popcopy + abcd + 0: abcd + +#pop + abcd + 0: abcd + +#pop should give an error +** Can't pop off an empty stack + +/abcd/pushtablescopy + abcd + 0: abcd + +#popcopy + abcd + 0: abcd + +#pop + abcd + 0: abcd + +# Must only specify one of these + +//push,pushcopy +** Not allowed together: push pushcopy + +//push,pushtablescopy +** Not allowed together: push pushtablescopy + +//pushcopy,pushtablescopy +** Not allowed together: pushcopy pushtablescopy + +# End of testinput20 diff --git a/src/pcre2/testdata/testoutput21 b/src/pcre2/testdata/testoutput21 new file mode 100644 index 00000000..fbd74004 --- /dev/null +++ b/src/pcre2/testdata/testoutput21 @@ -0,0 +1,94 @@ +# These are tests of \C that do not 
involve UTF. They are not run when \C is +# disabled by compiling with --enable-never-backslash-C. + +/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/Bx +------------------------------------------------------------------ + Bra + AllAny+ + \D + AllAny+ + \d + AllAny+ + \S + AllAny+ + \s + AllAny+ + \W + AllAny+ + \w + AllAny+ + Any + AllAny+ + \R + AllAny+ + \H + AllAny+ + \h + AllAny+ + \V + AllAny+ + \v + AllAny+ + \Z + AllAny++ + \z + AllAny+ + $ + Ket + End +------------------------------------------------------------------ + +/\D+\C \d+\C \S+\C \s+\C \W+\C \w+\C .+\C \R+\C \H+\C \h+\C \V+\C \v+\C a+\C \n+\C \C+\C/Bx +------------------------------------------------------------------ + Bra + \D+ + AllAny + \d+ + AllAny + \S+ + AllAny + \s+ + AllAny + \W+ + AllAny + \w+ + AllAny + Any+ + AllAny + \R+ + AllAny + \H+ + AllAny + \h+ + AllAny + \V+ + AllAny + \v+ + AllAny + a+ + AllAny + \x0a+ + AllAny + AllAny+ + AllAny + Ket + End +------------------------------------------------------------------ + +/ab\Cde/never_backslash_c +Failed: error 183 at offset 4: using \C is disabled by the application + +/ab\Cde/info +Capture group count = 0 +Contains \C +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 5 + abXde + 0: abXde + +/(?<=ab\Cde)X/ + abZdeX + 0: X + +# End of testinput21 diff --git a/src/pcre2/testdata/testoutput22-16 b/src/pcre2/testdata/testoutput22-16 new file mode 100644 index 00000000..54218540 --- /dev/null +++ b/src/pcre2/testdata/testoutput22-16 @@ -0,0 +1,182 @@ +# Tests of \C when Unicode support is available. Note that \C is not supported +# for DFA matching in UTF mode, so this test is not run with -dfa. The output +# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match +# in some widths and not in others. 
+ +/ab\Cde/utf,info +Capture group count = 0 +Contains \C +Options: utf +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 2 + abXde + 0: abXde + +# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and +# 16-bit modes, but not in 32-bit mode. + +/(?<=ab\Cde)X/utf +Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-16 mode + ab!deXYZ + +# Autopossessification tests + +/\C+\X \X+\C/Bx +------------------------------------------------------------------ + Bra + AllAny+ + extuni + extuni+ + AllAny + Ket + End +------------------------------------------------------------------ + +/\C+\X \X+\C/Bx,utf +------------------------------------------------------------------ + Bra + Anybyte+ + extuni + extuni+ + Anybyte + Ket + End +------------------------------------------------------------------ + +/\C\X*TÓ…; +{0,6}\v+ F +/utf +\= Expect no match + Ó…\x0a +No match + +/\C(\W?Å¿)'?{{/utf +\= Expect no match + \\C(\\W?Å¿)'?{{ +No match + +/X(\C{3})/utf + X\x{1234} +No match + X\x{11234}Y + 0: X\x{11234}Y + 1: \x{11234}Y + X\x{11234}YZ + 0: X\x{11234}Y + 1: \x{11234}Y + +/X(\C{4})/utf + X\x{1234}YZ +No match + X\x{11234}YZ + 0: X\x{11234}YZ + 1: \x{11234}YZ + X\x{11234}YZW + 0: X\x{11234}YZ + 1: \x{11234}YZ + +/X\C*/utf + XYZabcdce + 0: XYZabcdce + +/X\C*?/utf + XYZabcde + 0: X + +/X\C{3,5}/utf + Xabcdefg + 0: Xabcde + X\x{1234} +No match + X\x{1234}YZ + 0: X\x{1234}YZ + X\x{1234}\x{512} +No match + X\x{1234}\x{512}YZ + 0: X\x{1234}\x{512}YZ + X\x{11234}Y + 0: X\x{11234}Y + X\x{11234}YZ + 0: X\x{11234}YZ + X\x{11234}\x{512} + 0: X\x{11234}\x{512} + X\x{11234}\x{512}YZ + 0: X\x{11234}\x{512}YZ + X\x{11234}\x{512}\x{11234}Z + 0: X\x{11234}\x{512}\x{11234} + +/X\C{3,5}?/utf + Xabcdefg + 0: Xabc + X\x{1234} +No match + X\x{1234}YZ + 0: X\x{1234}YZ + X\x{1234}\x{512} +No match + X\x{11234}Y + 0: X\x{11234}Y + X\x{11234}YZ + 0: X\x{11234}Y + X\x{11234}\x{512}YZ + 0: X\x{11234}\x{512} + X\x{11234} +No match + 
+/a\Cb/utf + aXb + 0: aXb + a\nb + 0: a\x{0a}b + a\x{100}b + 0: a\x{100}b + +/a\C\Cb/utf + a\x{100}b +No match + a\x{12257}b + 0: a\x{12257}b + a\x{12257}\x{11234}b +No match + +/ab\Cde/utf + abXde + 0: abXde + +# This one is here not because it's different to Perl, but because the way +# the captured single code unit is displayed. (In Perl it becomes a character, +# and you can't tell the difference.) + +/X(\C)(.*)/utf + X\x{1234} + 0: X\x{1234} + 1: \x{1234} + 2: + X\nabc + 0: X\x{0a}abc + 1: \x{0a} + 2: abc + +# This one is here because Perl gives out a grumbly error message (quite +# correctly, but that messes up comparisons). + +/a\Cb/utf +\= Expect no match in 8-bit mode + a\x{100}b + 0: a\x{100}b + +/^ab\C/utf,no_start_optimize +\= Expect no match - tests \C at end of subject + ab +No match + +/\C[^\v]+\x80/utf + [Aá¿»BÅ€C] +No match + +/\C[^\d]+\x80/utf + [Aá¿»BÅ€C] +No match + +# End of testinput22 diff --git a/src/pcre2/testdata/testoutput22-32 b/src/pcre2/testdata/testoutput22-32 new file mode 100644 index 00000000..e96696a9 --- /dev/null +++ b/src/pcre2/testdata/testoutput22-32 @@ -0,0 +1,180 @@ +# Tests of \C when Unicode support is available. Note that \C is not supported +# for DFA matching in UTF mode, so this test is not run with -dfa. The output +# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match +# in some widths and not in others. + +/ab\Cde/utf,info +Capture group count = 0 +Contains \C +Options: utf +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 5 + abXde + 0: abXde + +# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and +# 16-bit modes, but not in 32-bit mode. 
+ +/(?<=ab\Cde)X/utf + ab!deXYZ + 0: X + +# Autopossessification tests + +/\C+\X \X+\C/Bx +------------------------------------------------------------------ + Bra + AllAny+ + extuni + extuni+ + AllAny + Ket + End +------------------------------------------------------------------ + +/\C+\X \X+\C/Bx,utf +------------------------------------------------------------------ + Bra + AllAny+ + extuni + extuni+ + AllAny + Ket + End +------------------------------------------------------------------ + +/\C\X*TÓ…; +{0,6}\v+ F +/utf +\= Expect no match + Ó…\x0a +No match + +/\C(\W?Å¿)'?{{/utf +\= Expect no match + \\C(\\W?Å¿)'?{{ +No match + +/X(\C{3})/utf + X\x{1234} +No match + X\x{11234}Y +No match + X\x{11234}YZ + 0: X\x{11234}YZ + 1: \x{11234}YZ + +/X(\C{4})/utf + X\x{1234}YZ +No match + X\x{11234}YZ +No match + X\x{11234}YZW + 0: X\x{11234}YZW + 1: \x{11234}YZW + +/X\C*/utf + XYZabcdce + 0: XYZabcdce + +/X\C*?/utf + XYZabcde + 0: X + +/X\C{3,5}/utf + Xabcdefg + 0: Xabcde + X\x{1234} +No match + X\x{1234}YZ + 0: X\x{1234}YZ + X\x{1234}\x{512} +No match + X\x{1234}\x{512}YZ + 0: X\x{1234}\x{512}YZ + X\x{11234}Y +No match + X\x{11234}YZ + 0: X\x{11234}YZ + X\x{11234}\x{512} +No match + X\x{11234}\x{512}YZ + 0: X\x{11234}\x{512}YZ + X\x{11234}\x{512}\x{11234}Z + 0: X\x{11234}\x{512}\x{11234}Z + +/X\C{3,5}?/utf + Xabcdefg + 0: Xabc + X\x{1234} +No match + X\x{1234}YZ + 0: X\x{1234}YZ + X\x{1234}\x{512} +No match + X\x{11234}Y +No match + X\x{11234}YZ + 0: X\x{11234}YZ + X\x{11234}\x{512}YZ + 0: X\x{11234}\x{512}Y + X\x{11234} +No match + +/a\Cb/utf + aXb + 0: aXb + a\nb + 0: a\x{0a}b + a\x{100}b + 0: a\x{100}b + +/a\C\Cb/utf + a\x{100}b +No match + a\x{12257}b +No match + a\x{12257}\x{11234}b + 0: a\x{12257}\x{11234}b + +/ab\Cde/utf + abXde + 0: abXde + +# This one is here not because it's different to Perl, but because the way +# the captured single code unit is displayed. (In Perl it becomes a character, +# and you can't tell the difference.) 
+ +/X(\C)(.*)/utf + X\x{1234} + 0: X\x{1234} + 1: \x{1234} + 2: + X\nabc + 0: X\x{0a}abc + 1: \x{0a} + 2: abc + +# This one is here because Perl gives out a grumbly error message (quite +# correctly, but that messes up comparisons). + +/a\Cb/utf +\= Expect no match in 8-bit mode + a\x{100}b + 0: a\x{100}b + +/^ab\C/utf,no_start_optimize +\= Expect no match - tests \C at end of subject + ab +No match + +/\C[^\v]+\x80/utf + [Aá¿»BÅ€C] +No match + +/\C[^\d]+\x80/utf + [Aá¿»BÅ€C] +No match + +# End of testinput22 diff --git a/src/pcre2/testdata/testoutput22-8 b/src/pcre2/testdata/testoutput22-8 new file mode 100644 index 00000000..eab410eb --- /dev/null +++ b/src/pcre2/testdata/testoutput22-8 @@ -0,0 +1,184 @@ +# Tests of \C when Unicode support is available. Note that \C is not supported +# for DFA matching in UTF mode, so this test is not run with -dfa. The output +# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match +# in some widths and not in others. + +/ab\Cde/utf,info +Capture group count = 0 +Contains \C +Options: utf +First code unit = 'a' +Last code unit = 'e' +Subject length lower bound = 2 + abXde + 0: abXde + +# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and +# 16-bit modes, but not in 32-bit mode. 
+ +/(?<=ab\Cde)X/utf +Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-8 mode + ab!deXYZ + +# Autopossessification tests + +/\C+\X \X+\C/Bx +------------------------------------------------------------------ + Bra + AllAny+ + extuni + extuni+ + AllAny + Ket + End +------------------------------------------------------------------ + +/\C+\X \X+\C/Bx,utf +------------------------------------------------------------------ + Bra + Anybyte+ + extuni + extuni+ + Anybyte + Ket + End +------------------------------------------------------------------ + +/\C\X*TÓ…; +{0,6}\v+ F +/utf +\= Expect no match + Ó…\x0a +No match + +/\C(\W?Å¿)'?{{/utf +\= Expect no match + \\C(\\W?Å¿)'?{{ +No match + +/X(\C{3})/utf + X\x{1234} + 0: X\x{1234} + 1: \x{1234} + X\x{11234}Y + 0: X\x{f0}\x{91}\x{88} + 1: \x{f0}\x{91}\x{88} + X\x{11234}YZ + 0: X\x{f0}\x{91}\x{88} + 1: \x{f0}\x{91}\x{88} + +/X(\C{4})/utf + X\x{1234}YZ + 0: X\x{1234}Y + 1: \x{1234}Y + X\x{11234}YZ + 0: X\x{11234} + 1: \x{11234} + X\x{11234}YZW + 0: X\x{11234} + 1: \x{11234} + +/X\C*/utf + XYZabcdce + 0: XYZabcdce + +/X\C*?/utf + XYZabcde + 0: X + +/X\C{3,5}/utf + Xabcdefg + 0: Xabcde + X\x{1234} + 0: X\x{1234} + X\x{1234}YZ + 0: X\x{1234}YZ + X\x{1234}\x{512} + 0: X\x{1234}\x{512} + X\x{1234}\x{512}YZ + 0: X\x{1234}\x{512} + X\x{11234}Y + 0: X\x{11234}Y + X\x{11234}YZ + 0: X\x{11234}Y + X\x{11234}\x{512} + 0: X\x{11234}\x{d4} + X\x{11234}\x{512}YZ + 0: X\x{11234}\x{d4} + X\x{11234}\x{512}\x{11234}Z + 0: X\x{11234}\x{d4} + +/X\C{3,5}?/utf + Xabcdefg + 0: Xabc + X\x{1234} + 0: X\x{1234} + X\x{1234}YZ + 0: X\x{1234} + X\x{1234}\x{512} + 0: X\x{1234} + X\x{11234}Y + 0: X\x{f0}\x{91}\x{88} + X\x{11234}YZ + 0: X\x{f0}\x{91}\x{88} + X\x{11234}\x{512}YZ + 0: X\x{f0}\x{91}\x{88} + X\x{11234} + 0: X\x{f0}\x{91}\x{88} + +/a\Cb/utf + aXb + 0: aXb + a\nb + 0: a\x{0a}b + a\x{100}b +No match + +/a\C\Cb/utf + a\x{100}b + 0: a\x{100}b + a\x{12257}b +No match + a\x{12257}\x{11234}b +No match + +/ab\Cde/utf + 
abXde + 0: abXde + +# This one is here not because it's different to Perl, but because the way +# the captured single code unit is displayed. (In Perl it becomes a character, +# and you can't tell the difference.) + +/X(\C)(.*)/utf + X\x{1234} + 0: X\x{1234} + 1: \x{e1} + 2: \x{88}\x{b4} + X\nabc + 0: X\x{0a}abc + 1: \x{0a} + 2: abc + +# This one is here because Perl gives out a grumbly error message (quite +# correctly, but that messes up comparisons). + +/a\Cb/utf +\= Expect no match in 8-bit mode + a\x{100}b +No match + +/^ab\C/utf,no_start_optimize +\= Expect no match - tests \C at end of subject + ab +No match + +/\C[^\v]+\x80/utf + [Aá¿»BÅ€C] +No match + +/\C[^\d]+\x80/utf + [Aá¿»BÅ€C] +No match + +# End of testinput22 diff --git a/src/pcre2/testdata/testoutput23 b/src/pcre2/testdata/testoutput23 new file mode 100644 index 00000000..c6f0aa21 --- /dev/null +++ b/src/pcre2/testdata/testoutput23 @@ -0,0 +1,8 @@ +# This test is run when PCRE2 has been built with --enable-never-backslash-C, +# which disables the use of \C. All we can do is check that it gives the +# correct error message. + +/a\Cb/ +Failed: error 185 at offset 3: using \C is disabled in this PCRE2 library + +# End of testinput23 diff --git a/src/pcre2/testdata/testoutput24 b/src/pcre2/testdata/testoutput24 new file mode 100644 index 00000000..9c598938 --- /dev/null +++ b/src/pcre2/testdata/testoutput24 @@ -0,0 +1,624 @@ +# This file tests the auxiliary pattern conversion features of the PCRE2 +# library, in non-UTF mode. + +#forbid_utf +#newline_default lf any anycrlf + +# -------- Tests of glob conversion -------- + +# Set the glob separator explicitly so that different OS defaults are not a +# problem. Then test various errors. + +#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/ + +/abc/posix +** The convert and posix modifiers are mutually exclusive + +# Separator must be / \ or . 
+ +/a*b/convert_glob_separator=% +** Invalid glob separator '%' + +# Can't have separator in a class + +"[ab/cd]" +(?s)\A[ab/cd](?/ +(?s)\A<[a-c\-d]>\z + + 0: + + 0: + + 0: + + 0: + <-> + 0: <-> + +/a[[:digit:].]z/ +(?s)\Aa[[:digit:].]z\z + a1z + 0: a1z + a.z + 0: a.z +\= Expect no match + a:z +No match + +/a[[:digit].]z/ +(?s)\Aa[\[:digit]\.\]z\z + a[.]z + 0: a[.]z + a:.]z + 0: a:.]z + ad.]z + 0: ad.]z + +/<[[:a[:digit:]b]>/ +(?s)\A<[\[:a[:digit:]b]>\z + <[> + 0: <[> + <:> + 0: <:> + + 0: + <9> + 0: <9> + + 0: +\= Expect no match + +No match + +/a*b/convert_glob_separator=\ +(?s)\Aa(*COMMIT)[^\\]*?b\z + +/a*b/convert_glob_separator=. +(?s)\Aa(*COMMIT)[^\.]*?b\z + +/a*b/convert_glob_separator=/ +(?s)\Aa(*COMMIT)[^/]*?b\z + +# Non control character checking + +/A\B\\C\D/ +(?s)\AAB\\CD\z + +/\\{}\?\*+\[\]()|.^$/ +(?s)\A\\\{\}\?\*\+\[\]\(\)\|\.\^\$\z + +/*a*\/*b*/ +(?s)\A[^/]*?a(*COMMIT)[^/]*?/(*COMMIT)[^/]*?b(*COMMIT)[^/]*+\z + +/?a?\/?b?/ +(?s)\A[^/]a[^/]/[^/]b[^/]\z + +/[a\\b\c][]][-][\]\-]/ +(?s)\A[a\\bc][\]][\-][\]\-]\z + +/[^a\\b\c][!]][!-][^\]\-]/ +(?s)\A[^/a\\bc][^/\]][^/\-][^/\]\-]\z + +/[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:word:][:xdigit:]]/ +(?s)\A[[:alnum:][:alpha:][:blank:][:cntrl:][:digit:][:graph:][:lower:][:print:][:punct:][:space:][:upper:][:word:][:xdigit:]](?[^/]*?a)(?>[^/]*?b)(?>[^/]*?g)(?>[^/]*?n)(?>[^/]*?t\z) + abcd/abcdefg/abcdefghijk/abcdefghijklmnop.txt + 0: /abcdefghijklmnop.txt + +/**\/*a*\/**/ +(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*?/) + xx/xx/xx/xax/xx/xb + 0: /xax/ + +/**\/*a*/ +(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*+\z) + xx/xx/xx/xax + 0: /xax + xx/xx/xx/xax/xx +No match + +/**\/*a*\/**\/*b*/ +(?s)(?:\A|/)(?>[^/]*?a)(?>[^/]*?/)(*COMMIT)(?:.*?/)??(?>[^/]*?b)(?>[^/]*+\z) + xx/xx/xx/xax/xx/xb + 0: /xax/xx/xb + xx/xx/xx/xax/xx/x +No match + +"**a"convert=glob +(?s)a\z + a + 0: a + c/b/a + 0: a + c/b/aaa + 0: a + +"a**/b"convert=glob +(?s)\Aa(*COMMIT).*?/b\z + a/b + 0: a/b + ab +No match + 
+"a/**b"convert=glob +(?s)\Aa/(*COMMIT).*?b\z + a/b + 0: a/b + ab +No match + +#pattern convert=glob:glob_no_starstar + +/***/ +(?s)\A[^/]*+\z + +/**a**/ +(?s)\A[^/]*?a(*COMMIT)[^/]*+\z + +#pattern convert=unset +#pattern convert=glob:glob_no_wild_separator + +/*/ +(?s) + +/*a*/ +(?s)a + +/**a**/ +(?s)a + +/a*b/ +(?s)\Aa(*COMMIT).*?b\z + +/*a*b*/ +(?s)a(*COMMIT).*?b + +/??a??/ +(?s)\A..a..\z + +#pattern convert=unset +#pattern convert=glob,convert_glob_escape=0 + +/a\b\cd/ +(?s)\Aa\\b\\cd\z + +/**\/a/ +(?s)\\/a\z + +/a`*b/convert_glob_escape=` +(?s)\Aa\*b\z + +/a`*b/convert_glob_escape=0 +(?s)\Aa`(*COMMIT)[^/]*?b\z + +/a`*b/convert_glob_escape=x +** Invalid glob escape 'x' + +# -------- Tests of extended POSIX conversion -------- + +#pattern convert=unset:posix_extended + +/<[[:a[:digit:]b]>/ +(*NUL)<[[:a[:digit:]b]> + <[> + 0: <[> + <:> + 0: <:> + + 0: + <9> + 0: <9> + + 0: +\= Expect no match + +No match + +/a+\1b\\c|d[ab\c]/ +(*NUL)a+1b\\c|d[ab\\c] + +/<[]bc]>/ +(*NUL)<[]bc]> + <]> + 0: <]> + + 0: + + 0: + +/<[^]bc]>/ +(*NUL)<[^]bc]> + <.> + 0: <.> +\= Expect no match + <]> +No match + +No match + +/(a)\1b/ +(*NUL)(a)1b + a1b + 0: a1b + 1: a +\= Expect no match + aab +No match + +/(ab)c)d]/ +(*NUL)(ab)c\)d\] + Xabc)d]Y + 0: abc)d] + 1: ab + +/a***b/ +(*NUL)a*b + +# -------- Tests of basic POSIX conversion -------- + +#pattern convert=unset:posix_basic + +/a*b+c\+[def](ab)\(cd\)/ +(*NUL)a*b\+c\+[def]\(ab\)(cd) + +/\(a\)\1b/ +(*NUL)(a)\1b + aab + 0: aab + 1: a +\= Expect no match + a1b +No match + +/how.to how\.to/ +(*NUL)how.to how\.to + how\nto how.to + 0: how\x0ato how.to +\= Expect no match + how\x{0}to how.to +No match + +/^how to \^how to/ +(*NUL)^how to \^how to + +/^*abc/ +(*NUL)^\*abc + +/*abc/ +(*NUL)\*abc + X*abcY + 0: *abc + +/**abc/ +(*NUL)\**abc + XabcY + 0: abc + X*abcY + 0: *abc + X**abcY + 0: **abc + +/*ab\(*cd\)/ +(*NUL)\*ab(\*cd) + +/^b\(c^d\)\(^e^f\)/ +(*NUL)^b(c\^d)(^e\^f) + +/a***b/ +(*NUL)a*b + +# End of testinput24 diff --git 
a/src/pcre2/testdata/testoutput25 b/src/pcre2/testdata/testoutput25 new file mode 100644 index 00000000..49902937 --- /dev/null +++ b/src/pcre2/testdata/testoutput25 @@ -0,0 +1,19 @@ +# This file tests the auxiliary pattern conversion features of the PCRE2 +# library, in UTF mode. + +#newline_default lf any anycrlf + +# -------- Tests of glob conversion -------- + +# Set the glob separator explicitly so that different OS defaults are not a +# problem. Then test various errors. + +#pattern convert=glob,convert_glob_escape=\,convert_glob_separator=/ + +# The fact that this one works in 13 bytes in the 8-bit library shows that the +# output is in UTF-8, though pcre2test shows the character as an escape. + +/'>' c4 a3 '<'/hex,utf,convert_length=13 +(?s)\A>\x{123}<\z + +# End of testinput25 diff --git a/src/pcre2/testdata/testoutput3 b/src/pcre2/testdata/testoutput3 new file mode 100644 index 00000000..801966a9 --- /dev/null +++ b/src/pcre2/testdata/testoutput3 @@ -0,0 +1,163 @@ +# This set of tests checks local-specific features, using the "fr_FR" locale. +# It is not Perl-compatible. When run via RunTest, the locale is edited to +# be whichever of "fr_FR", "french", or "fr" is found to exist. There is +# different version of this file called wintestinput3 for use on Windows, +# where the locale is called "french" and the tests are run using +# RunTest.bat. 
+ +#forbid_utf + +/^[\w]+/ +\= Expect no match + École +No match + +/^[\w]+/locale=fr_FR + École + 0: École + +/^[\w]+/ +\= Expect no match + École +No match + +/^[\W]+/ + École + 0: \xc9 + +/^[\W]+/locale=fr_FR +\= Expect no match + École +No match + +/[\b]/ + \b + 0: \x08 +\= Expect no match + a +No match + +/[\b]/locale=fr_FR + \b + 0: \x08 +\= Expect no match + a +No match + +/^\w+/ +\= Expect no match + École +No match + +/^\w+/locale=fr_FR + École + 0: École + +/(.+)\b(.+)/ + École + 0: \xc9cole + 1: \xc9 + 2: cole + +/(.+)\b(.+)/locale=fr_FR +\= Expect no match + École +No match + +/École/i + École + 0: \xc9cole +\= Expect no match + école +No match + +/École/i,locale=fr_FR + École + 0: École + école + 0: école + +/\w/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/\w/I,locale=fr_FR +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z + ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â + ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 1 + +# All remaining tests are in the fr_FR locale, so set the default. 
+ +#pattern locale=fr_FR + +/^[\xc8-\xc9]/i + École + 0: É + école + 0: é + +/^[\xc8-\xc9]/ + École + 0: É +\= Expect no match + école +No match + +/\W+/ + >>>\xaa<<< + 0: >>> + >>>\xba<<< + 0: >>> + +/[\W]+/ + >>>\xaa<<< + 0: >>> + >>>\xba<<< + 0: >>> + +/[^[:alpha:]]+/ + >>>\xaa<<< + 0: >>> + >>>\xba<<< + 0: >>> + +/\w+/ + >>>\xaa<<< + 0: ª + >>>\xba<<< + 0: º + +/[\w]+/ + >>>\xaa<<< + 0: ª + >>>\xba<<< + 0: º + +/[[:alpha:]]+/ + >>>\xaa<<< + 0: ª + >>>\xba<<< + 0: º + +/[[:alpha:]][[:lower:]][[:upper:]]/IB +------------------------------------------------------------------ + Bra + [A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff] + [a-z\xb5\xdf-\xf6\xf8-\xff] + [A-Z\xc0-\xd6\xd8-\xde] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç + È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í + î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 3 + +# End of testinput3 diff --git a/src/pcre2/testdata/testoutput3A b/src/pcre2/testdata/testoutput3A new file mode 100644 index 00000000..d7a223ab --- /dev/null +++ b/src/pcre2/testdata/testoutput3A @@ -0,0 +1,163 @@ +# This set of tests checks local-specific features, using the "fr_FR" locale. +# It is not Perl-compatible. When run via RunTest, the locale is edited to +# be whichever of "fr_FR", "french", or "fr" is found to exist. There is +# different version of this file called wintestinput3 for use on Windows, +# where the locale is called "french" and the tests are run using +# RunTest.bat. 
+ +#forbid_utf + +/^[\w]+/ +\= Expect no match + École +No match + +/^[\w]+/locale=fr_FR + École + 0: École + +/^[\w]+/ +\= Expect no match + École +No match + +/^[\W]+/ + École + 0: \xc9 + +/^[\W]+/locale=fr_FR +\= Expect no match + École +No match + +/[\b]/ + \b + 0: \x08 +\= Expect no match + a +No match + +/[\b]/locale=fr_FR + \b + 0: \x08 +\= Expect no match + a +No match + +/^\w+/ +\= Expect no match + École +No match + +/^\w+/locale=fr_FR + École + 0: École + +/(.+)\b(.+)/ + École + 0: \xc9cole + 1: \xc9 + 2: cole + +/(.+)\b(.+)/locale=fr_FR +\= Expect no match + École +No match + +/École/i + École + 0: \xc9cole +\= Expect no match + école +No match + +/École/i,locale=fr_FR + École + 0: École + école + 0: école + +/\w/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Subject length lower bound = 1 + +/\w/I,locale=fr_FR +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z + ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â + ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 1 + +# All remaining tests are in the fr_FR locale, so set the default. 
+ +#pattern locale=fr_FR + +/^[\xc8-\xc9]/i + École + 0: É + école + 0: é + +/^[\xc8-\xc9]/ + École + 0: É +\= Expect no match + école +No match + +/\W+/ + >>>\xaa<<< + 0: >>> + >>>\xba<<< + 0: >>> + +/[\W]+/ + >>>\xaa<<< + 0: >>> + >>>\xba<<< + 0: >>> + +/[^[:alpha:]]+/ + >>>\xaa<<< + 0: >>> + >>>\xba<<< + 0: >>> + +/\w+/ + >>>\xaa<<< + 0: ª + >>>\xba<<< + 0: º + +/[\w]+/ + >>>\xaa<<< + 0: ª + >>>\xba<<< + 0: º + +/[[:alpha:]]+/ + >>>\xaa<<< + 0: ª + >>>\xba<<< + 0: º + +/[[:alpha:]][[:lower:]][[:upper:]]/IB +------------------------------------------------------------------ + Bra + [A-Za-z\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff] + [a-z\xaa\xb5\xba\xdf-\xf6\xf8-\xff] + [A-Z\xc0-\xd6\xd8-\xde] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç + È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í + î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 3 + +# End of testinput3 diff --git a/src/pcre/testdata/testoutput3B b/src/pcre2/testdata/testoutput3B similarity index 50% rename from src/pcre/testdata/testoutput3B rename to src/pcre2/testdata/testoutput3B index 8d9fe7df..b18d441b 100644 --- a/src/pcre/testdata/testoutput3B +++ b/src/pcre2/testdata/testoutput3B @@ -1,25 +1,23 @@ -/-- This set of tests checks local-specific features, using the "fr_FR" locale. - It is not Perl-compatible. When run via RunTest, the locale is edited to - be whichever of "fr_FR", "french", or "fr" is found to exist. There is - different version of this file called wintestinput3 for use on Windows, - where the locale is called "french" and the tests are run using - RunTest.bat. --/ +# This set of tests checks local-specific features, using the "fr_FR" locale. +# It is not Perl-compatible. 
When run via RunTest, the locale is edited to +# be whichever of "fr_FR", "french", or "fr" is found to exist. There is +# different version of this file called wintestinput3 for use on Windows, +# where the locale is called "french" and the tests are run using +# RunTest.bat. -< forbid 8W +#forbid_utf /^[\w]+/ - *** Failers -No match +\= Expect no match École No match -/^[\w]+/Lfr_FR +/^[\w]+/locale=fr_FR École 0: École /^[\w]+/ - *** Failers -No match +\= Expect no match École No match @@ -27,35 +25,31 @@ No match École 0: \xc9 -/^[\W]+/Lfr_FR - *** Failers - 0: *** +/^[\W]+/locale=fr_FR +\= Expect no match École No match /[\b]/ \b 0: \x08 - *** Failers -No match +\= Expect no match a No match -/[\b]/Lfr_FR +/[\b]/locale=fr_FR \b 0: \x08 - *** Failers -No match +\= Expect no match a No match /^\w+/ - *** Failers -No match +\= Expect no match École No match -/^\w+/Lfr_FR +/^\w+/locale=fr_FR École 0: École @@ -65,99 +59,92 @@ No match 1: \xc9 2: cole -/(.+)\b(.+)/Lfr_FR - *** Failers - 0: *** Failers - 1: *** - 2: Failers +/(.+)\b(.+)/locale=fr_FR +\= Expect no match École No match /École/i École 0: \xc9cole - *** Failers -No match +\= Expect no match école No match -/École/iLfr_FR +/École/i,locale=fr_FR École 0: École école 0: école -/\w/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P +/\w/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z - -/\w/ISLfr_FR -Capturing subpattern count = 0 -No options -No first char -No need char Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + +/\w/I,locale=fr_FR +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z 
ª µ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 1 + +# All remaining tests are in the fr_FR locale, so set the default. -/^[\xc8-\xc9]/iLfr_FR +#pattern locale=fr_FR + +/^[\xc8-\xc9]/i École 0: É école 0: é -/^[\xc8-\xc9]/Lfr_FR +/^[\xc8-\xc9]/ École 0: É - *** Failers -No match +\= Expect no match école No match -/\W+/Lfr_FR +/\W+/ >>>\xaa<<< 0: >>> >>>\xba<<< 0: >>> -/[\W]+/Lfr_FR +/[\W]+/ >>>\xaa<<< 0: >>> >>>\xba<<< 0: >>> -/[^[:alpha:]]+/Lfr_FR +/[^[:alpha:]]+/ >>>\xaa<<< 0: >>> >>>\xba<<< 0: >>> -/\w+/Lfr_FR +/\w+/ >>>\xaa<<< 0: ª >>>\xba<<< 0: º -/[\w]+/Lfr_FR +/[\w]+/ >>>\xaa<<< 0: ª >>>\xba<<< 0: º -/[[:alpha:]]+/Lfr_FR +/[[:alpha:]]+/ >>>\xaa<<< 0: ª >>>\xba<<< 0: º -/[[:alpha:]][[:lower:]][[:upper:]]/DZLfr_FR +/[[:alpha:]][[:lower:]][[:upper:]]/IB ------------------------------------------------------------------ Bra [A-Za-z\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff] @@ -166,9 +153,11 @@ No match Ket End ------------------------------------------------------------------ -Capturing subpattern count = 0 -No options -No first char -No need char - -/-- End of testinput3 --/ +Capture group count = 0 +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z ª µ º À Á Â Ã Ä Å Æ Ç + È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í + î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 3 + +# End of testinput3 diff --git a/src/pcre2/testdata/testoutput4 b/src/pcre2/testdata/testoutput4 new file mode 100644 index 00000000..f43d9405 --- /dev/null +++ b/src/pcre2/testdata/testoutput4 @@ -0,0 +1,4035 @@ +# This set of tests is for UTF support, including Unicode properties. 
The +# Unicode tests are all compatible with all versions of Perl >= 5.10, but +# some of the property tests may differ because of different versions of +# Unicode in use by PCRE2 and Perl. + +# WARNING: Use only / as the pattern delimiter. Although pcre2test supports +# a number of delimiters, all those other than / give problems with the +# perltest.sh script. + +#newline_default lf anycrlf any +#perltest + +/a.b/utf + acb + 0: acb + a\x7fb + 0: a\x{7f}b + a\x{100}b + 0: a\x{100}b +\= Expect no match + a\nb +No match + +/a(.{3})b/utf + a\x{4000}xyb + 0: a\x{4000}xyb + 1: \x{4000}xy + a\x{4000}\x7fyb + 0: a\x{4000}\x{7f}yb + 1: \x{4000}\x{7f}y + a\x{4000}\x{100}yb + 0: a\x{4000}\x{100}yb + 1: \x{4000}\x{100}y +\= Expect no match + a\x{4000}b +No match + ac\ncb +No match + +/a(.*?)(.)/ + a\xc0\x88b + 0: a\xc0 + 1: + 2: \xc0 + +/a(.*?)(.)/utf + a\x{100}b + 0: a\x{100} + 1: + 2: \x{100} + +/a(.*)(.)/ + a\xc0\x88b + 0: a\xc0\x88b + 1: \xc0\x88 + 2: b + +/a(.*)(.)/utf + a\x{100}b + 0: a\x{100}b + 1: \x{100} + 2: b + +/a(.)(.)/ + a\xc0\x92bcd + 0: a\xc0\x92 + 1: \xc0 + 2: \x92 + +/a(.)(.)/utf + a\x{240}bcd + 0: a\x{240}b + 1: \x{240} + 2: b + +/a(.?)(.)/ + a\xc0\x92bcd + 0: a\xc0\x92 + 1: \xc0 + 2: \x92 + +/a(.?)(.)/utf + a\x{240}bcd + 0: a\x{240}b + 1: \x{240} + 2: b + +/a(.??)(.)/ + a\xc0\x92bcd + 0: a\xc0 + 1: + 2: \xc0 + +/a(.??)(.)/utf + a\x{240}bcd + 0: a\x{240} + 1: + 2: \x{240} + +/a(.{3})b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + 1: \x{1234}xy + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + 1: \x{1234}\x{4321}y + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + 1: \x{1234}\x{4321}\x{3412} +\= Expect no match + a\x{1234}b +No match + ac\ncb +No match + +/a(.{3,})b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + 1: \x{1234}xy + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + 1: \x{1234}\x{4321}y + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + 1: \x{1234}\x{4321}\x{3412} + axxxxbcdefghijb + 0: axxxxbcdefghijb + 1: xxxxbcdefghij + 
a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b + 1: \x{1234}\x{4321}\x{3412}\x{3421} +\= Expect no match + a\x{1234}b +No match + +/a(.{3,}?)b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + 1: \x{1234}xy + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + 1: \x{1234}\x{4321}y + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + 1: \x{1234}\x{4321}\x{3412} + axxxxbcdefghijb + 0: axxxxb + 1: xxxx + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b + 1: \x{1234}\x{4321}\x{3412}\x{3421} +\= Expect no match + a\x{1234}b +No match + +/a(.{3,5})b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + 1: \x{1234}xy + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + 1: \x{1234}\x{4321}y + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + 1: \x{1234}\x{4321}\x{3412} + axxxxbcdefghijb + 0: axxxxb + 1: xxxx + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b + 1: \x{1234}\x{4321}\x{3412}\x{3421} + axbxxbcdefghijb + 0: axbxxb + 1: xbxx + axxxxxbcdefghijb + 0: axxxxxb + 1: xxxxx +\= Expect no match + a\x{1234}b +No match + axxxxxxbcdefghijb +No match + +/a(.{3,5}?)b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + 1: \x{1234}xy + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + 1: \x{1234}\x{4321}y + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + 1: \x{1234}\x{4321}\x{3412} + axxxxbcdefghijb + 0: axxxxb + 1: xxxx + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b + 1: \x{1234}\x{4321}\x{3412}\x{3421} + axbxxbcdefghijb + 0: axbxxb + 1: xbxx + axxxxxbcdefghijb + 0: axxxxxb + 1: xxxxx +\= Expect no match + a\x{1234}b +No match + axxxxxxbcdefghijb +No match + +/^[a\x{c0}]/utf +\= Expect no match + \x{100} +No match + +/(?<=aXb)cd/utf + aXbcd + 0: cd + +/(?<=a\x{100}b)cd/utf + a\x{100}bcd + 0: cd + +/(?<=a\x{100000}b)cd/utf + a\x{100000}bcd + 0: cd + +/(?:\x{100}){3}b/utf + \x{100}\x{100}\x{100}b + 0: \x{100}\x{100}\x{100}b +\= Expect no match + \x{100}\x{100}b +No match + +/\x{ab}/utf + 
\x{ab} + 0: \x{ab} + \xc2\xab + 0: \x{ab} +\= Expect no match + \x00{ab} +No match + +/(?<=(.))X/utf + WXYZ + 0: X + 1: W + \x{256}XYZ + 0: X + 1: \x{256} +\= Expect no match + XYZ +No match + +/[^a]+/g,utf + bcd + 0: bcd + \x{100}aY\x{256}Z + 0: \x{100} + 0: Y\x{256}Z + +/^[^a]{2}/utf + \x{100}bc + 0: \x{100}b + +/^[^a]{2,}/utf + \x{100}bcAa + 0: \x{100}bcA + +/^[^a]{2,}?/utf + \x{100}bca + 0: \x{100}b + +/[^a]+/gi,utf + bcd + 0: bcd + \x{100}aY\x{256}Z + 0: \x{100} + 0: Y\x{256}Z + +/^[^a]{2}/i,utf + \x{100}bc + 0: \x{100}b + +/^[^a]{2,}/i,utf + \x{100}bcAa + 0: \x{100}bc + +/^[^a]{2,}?/i,utf + \x{100}bca + 0: \x{100}b + +/\x{100}{0,0}/utf + abcd + 0: + +/\x{100}?/utf + abcd + 0: + \x{100}\x{100} + 0: \x{100} + +/\x{100}{0,3}/utf + \x{100}\x{100} + 0: \x{100}\x{100} + \x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100} + +/\x{100}*/utf + abce + 0: + \x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100}\x{100} + +/\x{100}{1,1}/utf + abcd\x{100}\x{100}\x{100}\x{100} + 0: \x{100} + +/\x{100}{1,3}/utf + abcd\x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100} + +/\x{100}+/utf + abcd\x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100}\x{100} + +/\x{100}{3}/utf + abcd\x{100}\x{100}\x{100}XX + 0: \x{100}\x{100}\x{100} + +/\x{100}{3,5}/utf + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + 0: \x{100}\x{100}\x{100}\x{100}\x{100} + +/\x{100}{3,}/utf + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + 0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + +/(?<=a\x{100}{2}b)X/utf,aftertext + Xyyya\x{100}\x{100}bXzzz + 0: X + 0+ zzz + +/\D*/utf + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 0: 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/\D*/utf + \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 0: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + +/\D/utf + 1X2 + 0: X + 1\x{100}2 + 0: \x{100} + +/>\S/utf + > >X Y + 0: >X + > >\x{100} Y + 0: >\x{100} + +/\d/utf + \x{100}3 + 0: 3 + +/\s/utf + \x{100} X + 0: + +/\D+/utf + 12abcd34 + 0: abcd +\= Expect no match + 1234 +No match + +/\D{2,3}/utf + 12abcd34 + 0: abc + 12ab34 + 0: ab +\= Expect no match + 1234 +No match + 12a34 +No match + +/\D{2,3}?/utf + 12abcd34 + 0: ab + 12ab34 + 0: ab +\= Expect no match + 1234 +No match + 12a34 +No 
match + +/\d+/utf + 12abcd34 + 0: 12 + +/\d{2,3}/utf + 12abcd34 + 0: 12 + 1234abcd + 0: 123 +\= Expect no match + 1.4 +No match + +/\d{2,3}?/utf + 12abcd34 + 0: 12 + 1234abcd + 0: 12 +\= Expect no match + 1.4 +No match + +/\S+/utf + 12abcd34 + 0: 12abcd34 +\= Expect no match + \ \ +No match + +/\S{2,3}/utf + 12abcd34 + 0: 12a + 1234abcd + 0: 123 +\= Expect no match + \ \ +No match + +/\S{2,3}?/utf + 12abcd34 + 0: 12 + 1234abcd + 0: 12 +\= Expect no match + \ \ +No match + +/>\s+ <34 + 0: > < + 0+ 34 + +/>\s{2,3} < + 0+ cd + ab> < + 0+ ce +\= Expect no match + ab> \s{2,3}? < + 0+ cd + ab> < + 0+ ce +\= Expect no match + ab> \xff< + 0: \xff + +/[\xff]/utf + >\x{ff}< + 0: \x{ff} + +/[^\xFF]/ + XYZ + 0: X + +/[^\xff]/utf + XYZ + 0: X + \x{123} + 0: \x{123} + +/^[ac]*b/utf +\= Expect no match + xb +No match + +/^[ac\x{100}]*b/utf +\= Expect no match + xb +No match + +/^[^x]*b/i,utf +\= Expect no match + xb +No match + +/^[^x]*b/utf +\= Expect no match + xb +No match + +/^\d*b/utf +\= Expect no match + xb +No match + +/(|a)/g,utf + catac + 0: + 1: + 0: + 1: + 0: a + 1: a + 0: + 1: + 0: + 1: + 0: a + 1: a + 0: + 1: + 0: + 1: + a\x{256}a + 0: + 1: + 0: a + 1: a + 0: + 1: + 0: + 1: + 0: a + 1: a + 0: + 1: + +/^\x{85}$/i,utf + \x{85} + 0: \x{85} + +/^ሴ/utf + ሴ + 0: \x{1234} + +/^\ሴ/utf + ሴ + 0: \x{1234} + +/(?s)(.{1,5})/utf + abcdefg + 0: abcde + 1: abcde + ab + 0: ab + 1: ab + +/a*\x{100}*\w/utf + a + 0: a + +/\S\S/g,utf + A\x{a3}BC + 0: A\x{a3} + 0: BC + +/\S{2}/g,utf + A\x{a3}BC + 0: A\x{a3} + 0: BC + +/\W\W/g,utf + +\x{a3}== + 0: +\x{a3} + 0: == + +/\W{2}/g,utf + +\x{a3}== + 0: +\x{a3} + 0: == + +/\S/g,utf + \x{442}\x{435}\x{441}\x{442} + 0: \x{442} + 0: \x{435} + 0: \x{441} + 0: \x{442} + +/[\S]/g,utf + \x{442}\x{435}\x{441}\x{442} + 0: \x{442} + 0: \x{435} + 0: \x{441} + 0: \x{442} + +/\D/g,utf + \x{442}\x{435}\x{441}\x{442} + 0: \x{442} + 0: \x{435} + 0: \x{441} + 0: \x{442} + +/[\D]/g,utf + \x{442}\x{435}\x{441}\x{442} + 0: \x{442} + 0: \x{435} + 0: \x{441} + 0: 
\x{442} + +/\W/g,utf + \x{2442}\x{2435}\x{2441}\x{2442} + 0: \x{2442} + 0: \x{2435} + 0: \x{2441} + 0: \x{2442} + +/[\W]/g,utf + \x{2442}\x{2435}\x{2441}\x{2442} + 0: \x{2442} + 0: \x{2435} + 0: \x{2441} + 0: \x{2442} + +/[\S\s]*/utf + abc\n\r\x{442}\x{435}\x{441}\x{442}xyz + 0: abc\x{0a}\x{0d}\x{442}\x{435}\x{441}\x{442}xyz + +/[\x{41f}\S]/g,utf + \x{442}\x{435}\x{441}\x{442} + 0: \x{442} + 0: \x{435} + 0: \x{441} + 0: \x{442} + +/.[^\S]./g,utf + abc def\x{442}\x{443}xyz\npqr + 0: c d + 0: z\x{0a}p + +/.[^\S\n]./g,utf + abc def\x{442}\x{443}xyz\npqr + 0: c d + +/[[:^alnum:]]/g,utf + +\x{2442} + 0: + + 0: \x{2442} + +/[[:^alpha:]]/g,utf + +\x{2442} + 0: + + 0: \x{2442} + +/[[:^ascii:]]/g,utf + A\x{442} + 0: \x{442} + +/[[:^blank:]]/g,utf + A\x{442} + 0: A + 0: \x{442} + +/[[:^cntrl:]]/g,utf + A\x{442} + 0: A + 0: \x{442} + +/[[:^digit:]]/g,utf + A\x{442} + 0: A + 0: \x{442} + +/[[:^graph:]]/g,utf + \x19\x{e01ff} + 0: \x{19} + 0: \x{e01ff} + +/[[:^lower:]]/g,utf + A\x{422} + 0: A + 0: \x{422} + +/[[:^print:]]/g,utf + \x{19}\x{e01ff} + 0: \x{19} + 0: \x{e01ff} + +/[[:^punct:]]/g,utf + A\x{442} + 0: A + 0: \x{442} + +/[[:^space:]]/g,utf + A\x{442} + 0: A + 0: \x{442} + +/[[:^upper:]]/g,utf + a\x{442} + 0: a + 0: \x{442} + +/[[:^word:]]/g,utf + +\x{2442} + 0: + + 0: \x{2442} + +/[[:^xdigit:]]/g,utf + M\x{442} + 0: M + 0: \x{442} + +/[^ABCDEFGHIJKLMNOPQRSTUVWXYZÀÃÂÃÄÅÆÇÈÉÊËÌÃÃŽÃÃÑÒÓÔÕÖØÙÚÛÜÃÞĀĂĄĆĈĊČĎÄĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİIJĴĶĹĻĽĿÅŃŅŇŊŌŎÅÅ’Å”Å–Å˜ÅšÅœÅžÅ Å¢Å¤Å¦Å¨ÅªÅ¬Å®Å°Å²Å´Å¶Å¸Å¹Å»Å½ÆÆ‚Æ„Æ†Æ‡Æ‰ÆŠÆ‹ÆŽÆÆÆ‘Æ“Æ”Æ–Æ—Æ˜ÆœÆÆŸÆ 
Æ¢Æ¤Æ¦Æ§Æ©Æ¬Æ®Æ¯Æ±Æ²Æ³ÆµÆ·Æ¸Æ¼Ç„LJNJÇÇǑǓǕǗǙǛǞǠǢǤǦǨǪǬǮDZǴǶǷǸǺǼǾȀȂȄȆȈȊȌȎÈȒȔȖȘȚȜȞȠȢȤȦȨȪȬȮȰȲȺȻȽȾÉΆΈΉΊΌΎÎΑΒΓΔΕΖΗΘΙΚΛΜÎΞΟΠΡΣΤΥΦΧΨΩΪΫϒϓϔϘϚϜϞϠϢϤϦϨϪϬϮϴϷϹϺϽϾϿЀÐЂЃЄЅІЇЈЉЊЋЌÐÐŽÐÐБВГДЕЖЗИЙКЛМÐОПРСТУФХЦЧШЩЪЫЬЭЮЯѠѢѤѦѨѪѬѮѰѲѴѶѸѺѼѾҀҊҌҎÒҒҔҖҘҚҜҞҠҢҤҦҨҪҬҮҰҲҴҶҸҺҼҾӀÓÓƒÓ…Ó‡Ó‰Ó‹ÓÓӒӔӖӘӚӜӞӠӢӤӦӨӪӬӮӰӲӴӶӸԀԂԄԆԈԊԌԎԱԲԳԴԵԶԷԸԹԺԻԼԽԾԿՀÕÕ‚ÕƒÕ„Õ…Õ†Õ‡ÕˆÕ‰ÕŠÕ‹ÕŒÕÕŽÕÕՑՒՓՔՕՖႠႡႢႣႤႥႦႧႨႩႪႫႬႭႮႯႰႱႲႳႴႵႶႷႸႹႺႻႼႽႾႿჀáƒáƒ‚ჃჄჅḀḂḄḆḈḊḌḎá¸á¸’ḔḖḘḚḜḞḠḢḤḦḨḪḬḮḰḲḴḶḸḺḼḾṀṂṄṆṈṊṌṎá¹á¹’ṔṖṘṚṜṞṠṢṤṦṨṪṬṮṰṲṴṶṸṺṼṾẀẂẄẆẈẊẌẎáºáº’ẔẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼẾỀỂỄỆỈỊỌỎá»á»’ỔỖỘỚỜỞỠỢỤỦỨỪỬỮỰỲỴỶỸἈἉἊἋἌá¼á¼Žá¼á¼˜á¼™á¼šá¼›á¼œá¼á¼¨á¼©á¼ªá¼«á¼¬á¼­á¼®á¼¯á¼¸á¼¹á¼ºá¼»á¼¼á¼½á¼¾á¼¿á½ˆá½‰á½Šá½‹á½Œá½á½™á½›á½á½Ÿá½¨á½©á½ªá½«á½¬á½­á½®á½¯á¾¸á¾¹á¾ºá¾»á¿ˆá¿‰á¿Šá¿‹á¿˜á¿™á¿šá¿›á¿¨á¿©á¿ªá¿«á¿¬á¿¸á¿¹á¿ºá¿»abcdefghijklmnopqrstuvwxyzªµºßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿÄăąćĉċÄÄđēĕėęěÄğġģĥħĩīĭįıijĵķĸĺļľŀłńņňʼnŋÅÅőœŕŗřśÅÅŸÅ¡Å£Å¥Å§Å©Å«Å­Å¯Å±Å³ÅµÅ·ÅºÅ¼Å¾Å¿Æ€ÆƒÆ…ÆˆÆŒÆÆ’ƕƙƚƛƞơƣƥƨƪƫƭưƴƶƹƺƽƾƿdžljnjǎÇǒǔǖǘǚǜÇǟǡǣǥǧǩǫǭǯǰdzǵǹǻǽǿÈȃȅȇȉȋÈÈȑȓȕȗșțÈȟȡȣȥȧȩȫȭȯȱȳȴȵȶȷȸȹȼȿɀÉɑɒɓɔɕɖɗɘəɚɛɜÉɞɟɠɡɢɣɤɥɦɧɨɩɪɫɬɭɮɯɰɱɲɳɴɵɶɷɸɹɺɻɼɽɾɿʀÊʂʃʄʅʆʇʈʉʊʋʌÊÊŽÊÊʑʒʓʔʕʖʗʘʙʚʛʜÊʞʟʠʡʢʣʤʥʦʧʨʩʪʫʬʭʮʯÎάέήίΰαβγδεζηθικλμνξοπÏςστυφχψωϊϋόÏÏŽÏϑϕϖϗϙϛÏϟϡϣϥϧϩϫϭϯϰϱϲϳϵϸϻϼабвгдежзийклмнопрÑтуфхцчшщъыьÑÑŽÑÑёђѓєѕіїјљњћќÑўџѡѣѥѧѩѫѭѯѱѳѵѷѹѻѽѿÒÒ‹ÒÒÒ‘Ò“Ò•Ò—Ò™Ò›ÒҟҡңҥҧҩҫҭүұҳҵҷҹһҽҿӂӄӆӈӊӌӎӑӓӕӗәӛÓÓŸÓ¡Ó£Ó¥Ó§Ó©Ó«Ó­Ó¯Ó±Ó³ÓµÓ·Ó¹ÔÔƒÔ…Ô‡Ô‰Ô‹ÔÔÕ¡Õ¢Õ£Õ¤Õ¥Õ¦Õ§Õ¨Õ©ÕªÕ«Õ¬Õ­Õ®Õ¯Õ°Õ±Õ²Õ³Õ´ÕµÕ¶Õ·Õ¸Õ¹ÕºÕ»Õ¼Õ½Õ¾Õ¿Ö€Öւփքօֆևᴀá´á´‚ᴃᴄᴅᴆᴇᴈᴉᴊᴋᴌá´á´Žá´á´á´‘ᴒᴓᴔᴕᴖᴗᴘᴙᴚᴛᴜá´á´žá´Ÿá´ 
á´¡á´¢á´£á´¤á´¥á´¦á´§á´¨á´©á´ªá´«áµ¢áµ£áµ¤áµ¥áµ¦áµ§áµ¨áµ©áµªáµ«áµ¬áµ­áµ®áµ¯áµ°áµ±áµ²áµ³áµ´áµµáµ¶áµ·áµ¹áµºáµ»áµ¼áµ½áµ¾áµ¿á¶€á¶á¶‚ᶃᶄᶅᶆᶇᶈᶉᶊᶋᶌá¶á¶Žá¶á¶á¶‘ᶒᶓᶔᶕᶖᶗᶘᶙᶚá¸á¸ƒá¸…ḇḉḋá¸á¸á¸‘ḓḕḗḙḛá¸á¸Ÿá¸¡á¸£á¸¥á¸§á¸©á¸«á¸­á¸¯á¸±á¸³á¸µá¸·á¸¹á¸»á¸½á¸¿á¹á¹ƒá¹…ṇṉṋá¹á¹á¹‘ṓṕṗṙṛá¹á¹Ÿá¹¡á¹£á¹¥á¹§á¹©á¹«á¹­á¹¯á¹±á¹³á¹µá¹·á¹¹á¹»á¹½á¹¿áºáºƒáº…ẇẉẋáºáºáº‘ẓẕẖẗẘẙẚẛạảấầẩẫậắằẳẵặẹẻẽếá»á»ƒá»…ệỉịá»á»á»‘ồổỗộớá»á»Ÿá»¡á»£á»¥á»§á»©á»«á»­á»¯á»±á»³á»µá»·á»¹á¼€á¼á¼‚ἃἄἅἆἇá¼á¼‘ἒἓἔἕἠἡἢἣἤἥἦἧἰἱἲἳἴἵἶἷὀá½á½‚ὃὄὅá½á½‘ὒὓὔὕὖὗὠὡὢὣὤὥὦὧὰάὲέὴήὶίὸόὺύὼώᾀá¾á¾‚ᾃᾄᾅᾆᾇá¾á¾‘ᾒᾓᾔᾕᾖᾗᾠᾡᾢᾣᾤᾥᾦᾧᾰᾱᾲᾳᾴᾶᾷιῂῃῄῆῇá¿á¿‘ῒΐῖῗῠῡῢΰῤῥῦῧῲῳῴῶῷâ²â²ƒâ²…ⲇⲉⲋâ²â²â²‘ⲓⲕⲗⲙⲛâ²â²Ÿâ²¡â²£â²¥â²§â²©â²«â²­â²¯â²±â²³â²µâ²·â²¹â²»â²½â²¿â³â³ƒâ³…ⳇⳉⳋâ³â³â³‘ⳓⳕⳗⳙⳛâ³â³Ÿâ³¡â³£â³¤â´€â´â´‚ⴃⴄⴅⴆⴇⴈⴉⴊⴋⴌâ´â´Žâ´â´â´‘ⴒⴓⴔⴕⴖⴗⴘⴙⴚⴛⴜâ´â´žâ´Ÿâ´ â´¡â´¢â´£â´¤â´¥ï¬€ï¬ï¬‚ffifflſtstﬓﬔﬕﬖﬗ\d_^]/utf + +/^[^d]*?$/ + abc + 0: abc + +/^[^d]*?$/utf + abc + 0: abc + +/^[^d]*?$/i + abc + 0: abc + +/^[^d]*?$/i,utf + abc + 0: abc + +/(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]/utf + +/^[a\x{c0}]b/utf + \x{c0}b + 0: \x{c0}b + +/^([a\x{c0}]*?)aa/utf + a\x{c0}aaaa/ + 0: a\x{c0}aa + 1: a\x{c0} + +/^([a\x{c0}]*?)aa/utf + a\x{c0}aaaa/ + 0: a\x{c0}aa + 1: a\x{c0} + a\x{c0}a\x{c0}aaa/ + 0: a\x{c0}a\x{c0}aa + 1: a\x{c0}a\x{c0} + +/^([a\x{c0}]*)aa/utf + a\x{c0}aaaa/ + 0: a\x{c0}aaaa + 1: a\x{c0}aa + a\x{c0}a\x{c0}aaa/ + 0: a\x{c0}a\x{c0}aaa + 1: a\x{c0}a\x{c0}a + +/^([a\x{c0}]*)a\x{c0}/utf + a\x{c0}aaaa/ + 0: a\x{c0} + 1: + a\x{c0}a\x{c0}aaa/ + 0: a\x{c0}a\x{c0} + 1: a\x{c0} + +/A*/g,utf + AAB\x{123}BAA + 0: AA + 0: + 0: + 0: + 0: AA + 0: + +/(abc)\1/i,utf +\= Expect no match + abc +No match + +/(abc)\1/utf +\= Expect no match + abc +No match + +/a(*:a\x{1234}b)/utf,mark + abc + 0: a +MK: a\x{1234}b + +/a(*:a£b)/utf,mark + abc + 0: a +MK: a\x{a3}b + +# Noncharacters + +/./utf + \x{fffe} + 0: \x{fffe} + \x{ffff} + 0: \x{ffff} + \x{1fffe} + 0: \x{1fffe} + \x{1ffff} + 0: \x{1ffff} + \x{2fffe} + 0: \x{2fffe} + \x{2ffff} + 0: \x{2ffff} + \x{3fffe} + 0: \x{3fffe} + \x{3ffff} + 0: \x{3ffff} + \x{4fffe} + 0: \x{4fffe} + 
\x{4ffff} + 0: \x{4ffff} + \x{5fffe} + 0: \x{5fffe} + \x{5ffff} + 0: \x{5ffff} + \x{6fffe} + 0: \x{6fffe} + \x{6ffff} + 0: \x{6ffff} + \x{7fffe} + 0: \x{7fffe} + \x{7ffff} + 0: \x{7ffff} + \x{8fffe} + 0: \x{8fffe} + \x{8ffff} + 0: \x{8ffff} + \x{9fffe} + 0: \x{9fffe} + \x{9ffff} + 0: \x{9ffff} + \x{afffe} + 0: \x{afffe} + \x{affff} + 0: \x{affff} + \x{bfffe} + 0: \x{bfffe} + \x{bffff} + 0: \x{bffff} + \x{cfffe} + 0: \x{cfffe} + \x{cffff} + 0: \x{cffff} + \x{dfffe} + 0: \x{dfffe} + \x{dffff} + 0: \x{dffff} + \x{efffe} + 0: \x{efffe} + \x{effff} + 0: \x{effff} + \x{ffffe} + 0: \x{ffffe} + \x{fffff} + 0: \x{fffff} + \x{10fffe} + 0: \x{10fffe} + \x{10ffff} + 0: \x{10ffff} + \x{fdd0} + 0: \x{fdd0} + \x{fdd1} + 0: \x{fdd1} + \x{fdd2} + 0: \x{fdd2} + \x{fdd3} + 0: \x{fdd3} + \x{fdd4} + 0: \x{fdd4} + \x{fdd5} + 0: \x{fdd5} + \x{fdd6} + 0: \x{fdd6} + \x{fdd7} + 0: \x{fdd7} + \x{fdd8} + 0: \x{fdd8} + \x{fdd9} + 0: \x{fdd9} + \x{fdda} + 0: \x{fdda} + \x{fddb} + 0: \x{fddb} + \x{fddc} + 0: \x{fddc} + \x{fddd} + 0: \x{fddd} + \x{fdde} + 0: \x{fdde} + \x{fddf} + 0: \x{fddf} + \x{fde0} + 0: \x{fde0} + \x{fde1} + 0: \x{fde1} + \x{fde2} + 0: \x{fde2} + \x{fde3} + 0: \x{fde3} + \x{fde4} + 0: \x{fde4} + \x{fde5} + 0: \x{fde5} + \x{fde6} + 0: \x{fde6} + \x{fde7} + 0: \x{fde7} + \x{fde8} + 0: \x{fde8} + \x{fde9} + 0: \x{fde9} + \x{fdea} + 0: \x{fdea} + \x{fdeb} + 0: \x{fdeb} + \x{fdec} + 0: \x{fdec} + \x{fded} + 0: \x{fded} + \x{fdee} + 0: \x{fdee} + \x{fdef} + 0: \x{fdef} + +/^\d*\w{4}/utf + 1234 + 0: 1234 +\= Expect no match + 123 +No match + +/^[^b]*\w{4}/utf + aaaa + 0: aaaa +\= Expect no match + aaa +No match + +/^[^b]*\w{4}/i,utf + aaaa + 0: aaaa +\= Expect no match + aaa +No match + +/^\x{100}*.{4}/utf + \x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100}\x{100} +\= Expect no match + \x{100}\x{100}\x{100} +No match + +/^\x{100}*.{4}/i,utf + \x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100}\x{100} +\= Expect no match + \x{100}\x{100}\x{100} +No match + 
+/^a+[a\x{200}]/utf + aa + 0: aa + +/^.\B.\B./utf + \x{10123}\x{10124}\x{10125} + 0: \x{10123}\x{10124}\x{10125} + +/^#[^\x{ffff}]#[^\x{ffff}]#[^\x{ffff}]#/utf + #\x{10000}#\x{100}#\x{10ffff}# + 0: #\x{10000}#\x{100}#\x{10ffff}# + +# Unicode property support tests + +/^\pC\pL\pM\pN\pP\pS\pZ\s+/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} + 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b} + +/^>\pZ+/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} + 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f} + +/^>[[:space:]]*/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{9}\x{b} + 0: > \x{a0}\x{1680}\x{2028}\x{2029}\x{202f}\x{09}\x{0b} + +/^>[[:blank:]]*/utf,ucp + >\x{20}\x{a0}\x{1680}\x{2000}\x{202f}\x{9}\x{b}\x{2028} + 0: > \x{a0}\x{1680}\x{2000}\x{202f}\x{09} + +/^[[:alpha:]]*/utf,ucp + Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d} + 0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d} + +/^[[:alnum:]]*/utf,ucp + Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee} + 0: Az\x{aa}\x{c0}\x{1c5}\x{2b0}\x{3b6}\x{1d7c9}\x{2fa1d}1\x{660}\x{bef}\x{16ee} + +/^[[:cntrl:]]*/utf,ucp + \x{0}\x{09}\x{1f}\x{7f}\x{9f} + 0: \x{00}\x{09}\x{1f}\x{7f}\x{9f} + +/^[[:graph:]]*/utf,ucp + A\x{a1}\x{a0} + 0: A\x{a1} + +/^[[:print:]]*/utf,ucp + A z\x{a0}\x{a1} + 0: A z\x{a0}\x{a1} + +/^[[:punct:]]*/utf,ucp + .+\x{a1}\x{a0} + 0: .+\x{a1} + +/\p{Zs}*?\R/ +\= Expect no match + a\xFCb +No match + +/\p{Zs}*\R/ +\= Expect no match + a\xFCb +No match + +/ⱥ/i,utf + ⱥ + 0: \x{2c65} + Ⱥx + 0: \x{23a} + Ⱥ + 0: \x{23a} + +/[ⱥ]/i,utf + ⱥ + 0: \x{2c65} + Ⱥx + 0: \x{23a} + Ⱥ + 0: \x{23a} + +/Ⱥ/i,utf + Ⱥ + 0: \x{23a} + ⱥ + 0: \x{2c65} + +# These are tests for extended grapheme clusters + +/^\X/utf,aftertext + G\x{34e}\x{34e}X + 0: G\x{34e}\x{34e} + 0+ X + \x{34e}\x{34e}X + 0: \x{34e}\x{34e} + 0+ X + \x04X + 0: \x{04} + 0+ X + \x{1100}X + 0: \x{1100} + 0+ X + \x{1100}\x{34e}X + 0: \x{1100}\x{34e} + 0+ X +
\x{1b04}\x{1b04}X + 0: \x{1b04}\x{1b04} + 0+ X + *These match up to the roman letters + 0: * + 0+ These match up to the roman letters + \x{1111}\x{1111}L,L + 0: \x{1111}\x{1111} + 0+ L,L + \x{1111}\x{1111}\x{1169}L,L,V + 0: \x{1111}\x{1111}\x{1169} + 0+ L,L,V + \x{1111}\x{ae4c}L, LV + 0: \x{1111}\x{ae4c} + 0+ L, LV + \x{1111}\x{ad89}L, LVT + 0: \x{1111}\x{ad89} + 0+ L, LVT + \x{1111}\x{ae4c}\x{1169}L, LV, V + 0: \x{1111}\x{ae4c}\x{1169} + 0+ L, LV, V + \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V + 0: \x{1111}\x{ae4c}\x{1169}\x{1169} + 0+ L, LV, V, V + \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T + 0: \x{1111}\x{ae4c}\x{1169}\x{11fe} + 0+ L, LV, V, T + \x{1111}\x{ad89}\x{11fe}L, LVT, T + 0: \x{1111}\x{ad89}\x{11fe} + 0+ L, LVT, T + \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T + 0: \x{1111}\x{ad89}\x{11fe}\x{11fe} + 0+ L, LVT, T, T + \x{ad89}\x{11fe}\x{11fe}LVT, T, T + 0: \x{ad89}\x{11fe}\x{11fe} + 0+ LVT, T, T + *These match just the first codepoint (invalid sequence) + 0: * + 0+ These match just the first codepoint (invalid sequence) + \x{1111}\x{11fe}L, T + 0: \x{1111} + 0+ \x{11fe}L, T + \x{ae4c}\x{1111}LV, L + 0: \x{ae4c} + 0+ \x{1111}LV, L + \x{ae4c}\x{ae4c}LV, LV + 0: \x{ae4c} + 0+ \x{ae4c}LV, LV + \x{ae4c}\x{ad89}LV, LVT + 0: \x{ae4c} + 0+ \x{ad89}LV, LVT + \x{1169}\x{1111}V, L + 0: \x{1169} + 0+ \x{1111}V, L + \x{1169}\x{ae4c}V, LV + 0: \x{1169} + 0+ \x{ae4c}V, LV + \x{1169}\x{ad89}V, LVT + 0: \x{1169} + 0+ \x{ad89}V, LVT + \x{ad89}\x{1111}LVT, L + 0: \x{ad89} + 0+ \x{1111}LVT, L + \x{ad89}\x{1169}LVT, V + 0: \x{ad89} + 0+ \x{1169}LVT, V + \x{ad89}\x{ae4c}LVT, LV + 0: \x{ad89} + 0+ \x{ae4c}LVT, LV + \x{ad89}\x{ad89}LVT, LVT + 0: \x{ad89} + 0+ \x{ad89}LVT, LVT + \x{11fe}\x{1111}T, L + 0: \x{11fe} + 0+ \x{1111}T, L + \x{11fe}\x{1169}T, V + 0: \x{11fe} + 0+ \x{1169}T, V + \x{11fe}\x{ae4c}T, LV + 0: \x{11fe} + 0+ \x{ae4c}T, LV + \x{11fe}\x{ad89}T, LVT + 0: \x{11fe} + 0+ \x{ad89}T, LVT + *Test extend and spacing mark + 0: * + 0+ Test extend and spacing mark + 
\x{1111}\x{ae4c}\x{0711}L, LV, extend + 0: \x{1111}\x{ae4c}\x{711} + 0+ L, LV, extend + \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark + 0: \x{1111}\x{ae4c}\x{1b04} + 0+ L, LV, spacing mark + \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark + 0: \x{1111}\x{ae4c}\x{1b04}\x{711}\x{1b04} + 0+ L, LV, spacing mark, extend, spacing mark + *Test CR, LF, and control + 0: * + 0+ Test CR, LF, and control + \x0d\x{0711}CR, extend + 0: \x{0d} + 0+ \x{711}CR, extend + \x0d\x{1b04}CR, spacingmark + 0: \x{0d} + 0+ \x{1b04}CR, spacingmark + \x0a\x{0711}LF, extend + 0: \x{0a} + 0+ \x{711}LF, extend + \x0a\x{1b04}LF, spacingmark + 0: \x{0a} + 0+ \x{1b04}LF, spacingmark + \x0b\x{0711}Control, extend + 0: \x{0b} + 0+ \x{711}Control, extend + \x09\x{1b04}Control, spacingmark + 0: \x{09} + 0+ \x{1b04}Control, spacingmark + *There are no Prepend characters, so we can't test Prepend, CR + 0: * + 0+ There are no Prepend characters, so we can't test Prepend, CR + +/^(?>\X{2})X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + +/^\X{2,4}X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + +/^\X{2,4}?X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + +/\X*Z/utf,no_start_optimize +\= Expect no match + A\x{300} +No match + +/\X*(.)/utf,no_start_optimize + A\x{1111}\x{ae4c}\x{1169} + 0: A\x{1111} + 1: \x{1111} + +# 
-------------------------------------------- + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/[z\x{1e9e}]+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/[z\x{00df}]+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +/[z\x{1f88}]+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +# Check a reference with more than one other case + +/^(\x{00b5})\1{2}$/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + 1: \x{b5} + +# Characters with more than one other case; test in classes + +/[z\x{00b5}]+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/[z\x{039c}]+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/[z\x{03bc}]+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/[z\x{00c5}]+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/[z\x{00e5}]+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/[z\x{212b}]+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/[z\x{01c4}]+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/[z\x{01c5}]+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/[z\x{01c6}]+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/[z\x{01c7}]+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/[z\x{01c8}]+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/[z\x{01c9}]+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/[z\x{01ca}]+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/[z\x{01cb}]+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/[z\x{01cc}]+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/[z\x{01f1}]+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + +/[z\x{01f2}]+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + +/[z\x{01f3}]+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: 
\x{1f1}\x{1f2}\x{1f3} + +/[z\x{0345}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/[z\x{0399}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/[z\x{03b9}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/[z\x{1fbe}]+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/[z\x{0392}]+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/[z\x{03b2}]+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/[z\x{03d0}]+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/[z\x{0395}]+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/[z\x{03b5}]+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/[z\x{03f5}]+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/[z\x{0398}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/[z\x{03b8}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/[z\x{03d1}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/[z\x{03f4}]+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/[z\x{039a}]+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/[z\x{03ba}]+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/[z\x{03f0}]+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/[z\x{03a0}]+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/[z\x{03c0}]+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/[z\x{03d6}]+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/[z\x{03a1}]+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/[z\x{03c1}]+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/[z\x{03f1}]+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/[z\x{03a3}]+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + 
+/[z\x{03c2}]+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/[z\x{03c3}]+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/[z\x{03a6}]+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/[z\x{03c6}]+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/[z\x{03d5}]+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/[z\x{03c9}]+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/[z\x{03a9}]+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/[z\x{2126}]+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/[z\x{1e60}]+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/[z\x{1e61}]+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/[z\x{1e9b}]+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +# Perl 5.12.4 gets these wrong, but 5.15.3 is OK + +/[z\x{004b}]+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/[z\x{006b}]+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/[z\x{212a}]+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/[z\x{0053}]+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/[z\x{0073}]+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/[z\x{017f}]+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +# -------------------------------------- + +/(ΣΆΜΟΣ) \1/i,utf + ΣΆΜΟΣ ΣΆΜΟΣ + 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + ΣΆΜΟΣ σάμος + 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + σάμος σάμος + 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + σάμος σάμοσ + 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c3} + 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + σάμος ΣΆΜΟΣ + 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + 1: 
\x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + +/(σάμος) \1/i,utf + ΣΆΜΟΣ ΣΆΜΟΣ + 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + ΣΆΜΟΣ σάμος + 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + σάμος σάμος + 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + σάμος σάμοσ + 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c3} + 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + σάμος ΣΆΜΟΣ + 0: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + 1: \x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + +/(ΣΆΜΟΣ) \1*/i,utf + ΣΆΜΟΣ\x20 + 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + ΣΆΜΟΣ ΣΆΜΟΣσάμοςσάμος + 0: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3}\x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2}\x{3c3}\x{3ac}\x{3bc}\x{3bf}\x{3c2} + 1: \x{3a3}\x{386}\x{39c}\x{39f}\x{3a3} + +# Perl matches these + +/\x{00b5}+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/\x{039c}+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/\x{03bc}+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + + +/\x{00c5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/\x{00e5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/\x{212b}+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + + +/\x{01c4}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/\x{01c5}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/\x{01c6}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + + +/\x{01c7}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/\x{01c8}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/\x{01c9}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + + +/\x{01ca}+/i,utf + 
\x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/\x{01cb}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/\x{01cc}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + + +/\x{01f1}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + +/\x{01f2}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + +/\x{01f3}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + + +/\x{0345}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{0399}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{03b9}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{1fbe}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + + +/\x{0392}+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/\x{03b2}+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/\x{03d0}+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + + +/\x{0395}+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/\x{03b5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/\x{03f5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + + +/\x{0398}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{03b8}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{03d1}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{03f4}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + + +/\x{039a}+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/\x{03ba}+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/\x{03f0}+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + + +/\x{03a0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/\x{03c0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/\x{03d6}+/i,utf + 
\x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + + +/\x{03a1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/\x{03c1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/\x{03f1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + + +/\x{03a3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/\x{03c2}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/\x{03c3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + + +/\x{03a6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/\x{03c6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/\x{03d5}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + + +/\x{03c9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/\x{03a9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/\x{2126}+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + + +/\x{1e60}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e61}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e9b}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +/\x{1f80}+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +# Perl 5.12.4 gets these wrong, but 5.15.3 is OK + +/\x{004b}+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/\x{006b}+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/\x{212a}+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + + +/\x{0053}+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/\x{0073}+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/\x{017f}+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/^\p{Any}*\d{4}/utf + 1234 + 0: 1234 +\= Expect no match + 123 +No match + +/^\X*\w{4}/utf + 1234 + 0: 1234 +\= Expect no match + 123 +No 
match + +/^A\s+Z/utf,ucp + A\x{2005}Z + 0: A\x{2005}Z + A\x{85}\x{2005}Z + 0: A\x{85}\x{2005}Z + +/^A[\s]+Z/utf,ucp + A\x{2005}Z + 0: A\x{2005}Z + A\x{85}\x{2005}Z + 0: A\x{85}\x{2005}Z + +/^[[:graph:]]+$/utf,ucp + Letter:ABC + 0: Letter:ABC + Mark:\x{300}\x{1d172}\x{1d17b} + 0: Mark:\x{300}\x{1d172}\x{1d17b} + Number:9\x{660} + 0: Number:9\x{660} + Punctuation:\x{66a},; + 0: Punctuation:\x{66a},; + Symbol:\x{6de}<>\x{fffc} + 0: Symbol:\x{6de}<>\x{fffc} + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + 0: Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + 0: \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + 0: \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + 0: \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + 0: \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + \x{feff} + 0: \x{feff} + \x{fff9}\x{fffa}\x{fffb} + 0: \x{fff9}\x{fffa}\x{fffb} + \x{110bd} + 0: \x{110bd} + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + 0: \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + \x{e0001} + 0: \x{e0001} + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} + 0: \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} +\= Expect no match + \x{09} +No match + \x{0a} +No match + \x{1D} +No match + \x{20} +No match + \x{85} +No match + \x{a0} +No match + \x{1680} +No match + \x{2028} +No match + \x{2029} +No match + \x{202f} +No match + \x{2065} +No match + \x{3000} +No match + \x{e0002} +No match + \x{e001f} +No match + \x{e0080} +No match + +/^[[:print:]]+$/utf,ucp + Space: \x{a0} + 0: Space: \x{a0} + \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} + 0: \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} + \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} + 0: \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} + 
\x{202f}\x{205f} + 0: \x{202f}\x{205f} + \x{3000} + 0: \x{3000} + Letter:ABC + 0: Letter:ABC + Mark:\x{300}\x{1d172}\x{1d17b} + 0: Mark:\x{300}\x{1d172}\x{1d17b} + Number:9\x{660} + 0: Number:9\x{660} + Punctuation:\x{66a},; + 0: Punctuation:\x{66a},; + Symbol:\x{6de}<>\x{fffc} + 0: Symbol:\x{6de}<>\x{fffc} + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + 0: Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + 0: \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + 0: \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} + \x{202f} + 0: \x{202f} + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + 0: \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + 0: \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} + \x{feff} + 0: \x{feff} + \x{fff9}\x{fffa}\x{fffb} + 0: \x{fff9}\x{fffa}\x{fffb} + \x{110bd} + 0: \x{110bd} + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + 0: \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} + \x{e0001} + 0: \x{e0001} + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} + 0: \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} +\= Expect no match + \x{09} +No match + \x{1D} +No match + \x{85} +No match + \x{2028} +No match + \x{2029} +No match + \x{2065} +No match + \x{e0002} +No match + \x{e001f} +No match + \x{e0080} +No match + +/^[[:punct:]]+$/utf,ucp + \$+<=>^`|~ + 0: $+<=>^`|~ + !\"#%&'()*,-./:;?@[\\]_{} + 0: !"#%&'()*,-./:;?@[\]_{} + \x{a1}\x{a7} + 0: \x{a1}\x{a7} + \x{37e} + 0: \x{37e} +\= Expect no match + abcde +No match + +/^[[:^graph:]]+$/utf,ucp + \x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{1680} + 0: \x{09}\x{0a}\x{1d} \x{85}\x{a0}\x{1680} + \x{2028}\x{2029}\x{202f}\x{2065} + 0: \x{2028}\x{2029}\x{202f}\x{2065} + \x{3000}\x{e0002}\x{e001f}\x{e0080} + 0: \x{3000}\x{e0002}\x{e001f}\x{e0080} +\= Expect no match + Letter:ABC +No match 
+ Mark:\x{300}\x{1d172}\x{1d17b} +No match + Number:9\x{660} +No match + Punctuation:\x{66a},; +No match + Symbol:\x{6de}<>\x{fffc} +No match + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} +No match + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} +No match + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} +No match + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} +No match + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} +No match + \x{feff} +No match + \x{fff9}\x{fffa}\x{fffb} +No match + \x{110bd} +No match + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} +No match + \x{e0001} +No match + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} +No match + +/^[[:^print:]]+$/utf,ucp + \x{09}\x{1D}\x{85}\x{2028}\x{2029}\x{2065} + 0: \x{09}\x{1d}\x{85}\x{2028}\x{2029}\x{2065} + \x{e0002}\x{e001f}\x{e0080} + 0: \x{e0002}\x{e001f}\x{e0080} +\= Expect no match + Space: \x{a0} +No match + \x{1680}\x{2000}\x{2001}\x{2002}\x{2003}\x{2004}\x{2005} +No match + \x{2006}\x{2007}\x{2008}\x{2009}\x{200a} +No match + \x{202f}\x{205f} +No match + \x{3000} +No match + Letter:ABC +No match + Mark:\x{300}\x{1d172}\x{1d17b} +No match + Number:9\x{660} +No match + Punctuation:\x{66a},; +No match + Symbol:\x{6de}<>\x{fffc} +No match + Cf-property:\x{ad}\x{600}\x{601}\x{602}\x{603}\x{604}\x{6dd}\x{70f} +No match + \x{200b}\x{200c}\x{200d}\x{200e}\x{200f} +No match + \x{202a}\x{202b}\x{202c}\x{202d}\x{202e} +No match + \x{202f} +No match + \x{2060}\x{2061}\x{2062}\x{2063}\x{2064} +No match + \x{206a}\x{206b}\x{206c}\x{206d}\x{206e}\x{206f} +No match + \x{feff} +No match + \x{fff9}\x{fffa}\x{fffb} +No match + \x{110bd} +No match + \x{1d173}\x{1d174}\x{1d175}\x{1d176}\x{1d177}\x{1d178}\x{1d179}\x{1d17a} +No match + \x{e0001} +No match + \x{e0020}\x{e0030}\x{e0040}\x{e0050}\x{e0060}\x{e0070}\x{e007f} +No match + +/^[[:^punct:]]+$/utf,ucp + abcde + 0: abcde +\= Expect no match + \$+<=>^`|~ +No match + !\"#%&'()*,-./:;?@[\\]_{} +No match + \x{a1}\x{a7} 
+No match + \x{37e} +No match + +/[RST]+/i,utf,ucp + Ss\x{17f} + 0: Ss\x{17f} + +/[R-T]+/i,utf,ucp + Ss\x{17f} + 0: Ss\x{17f} + +/[q-u]+/i,utf,ucp + Ss\x{17f} + 0: Ss\x{17f} + +/^s?c/im,utf + scat + 0: sc + +# The next four tests are for repeated caseless back references when the +# code unit length of the matched text is different to that of the original +# group in the UTF-8 case. + +/^(\x{23a})\1*(.)/i,utf + \x{23a}\x{23a}\x{23a}\x{23a} + 0: \x{23a}\x{23a}\x{23a}\x{23a} + 1: \x{23a} + 2: \x{23a} + \x{23a}\x{2c65}\x{2c65}\x{2c65} + 0: \x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a} + 2: \x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + 0: \x{23a}\x{23a}\x{2c65}\x{23a} + 1: \x{23a} + 2: \x{23a} + +/^(\x{23a})\1*(..)/i,utf + \x{23a}\x{2c65}\x{2c65}\x{2c65} + 0: \x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a} + 2: \x{2c65}\x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + 0: \x{23a}\x{23a}\x{2c65}\x{23a} + 1: \x{23a} + 2: \x{2c65}\x{23a} + +/^(\x{23a})\1*(...)/i,utf + \x{23a}\x{2c65}\x{2c65}\x{2c65} + 0: \x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a} + 2: \x{2c65}\x{2c65}\x{2c65} + \x{23a}\x{23a}\x{2c65}\x{23a} + 0: \x{23a}\x{23a}\x{2c65}\x{23a} + 1: \x{23a} + 2: \x{23a}\x{2c65}\x{23a} + +/^(\x{23a})\1*(....)/i,utf +\= Expect no match + \x{23a}\x{2c65}\x{2c65}\x{2c65} +No match + \x{23a}\x{23a}\x{2c65}\x{23a} +No match + +/[A-`]/i,utf + abcdefghijklmno + 0: a + +/[\S\V\H]/utf + +/[^\p{Any}]*+x/utf + x + 0: x + +/[[:punct:]]/utf,ucp + \x{b4} +No match + +/[[:^ascii:]]/utf,ucp + \x{100} + 0: \x{100} + \x{200} + 0: \x{200} + \x{300} + 0: \x{300} + \x{37e} + 0: \x{37e} +\= Expect no match + aa +No match + 99 +No match + +/[[:^ascii:]\w]/utf,ucp + aa + 0: a + 99 + 0: 9 + gg + 0: g + \x{100} + 0: \x{100} + \x{200} + 0: \x{200} + \x{300} + 0: \x{300} + \x{37e} + 0: \x{37e} + +/[\w[:^ascii:]]/utf,ucp + aa + 0: a + 99 + 0: 9 + gg + 0: g + \x{100} + 0: \x{100} + \x{200} + 0: \x{200} + \x{300} + 0: \x{300} + \x{37e} + 0: \x{37e} + +/[^[:ascii:]\W]/utf,ucp + \x{100} + 0: \x{100} + \x{200} + 0: 
\x{200} +\= Expect no match + aa +No match + 99 +No match + gg +No match + \x{37e} +No match + +/[^[:^ascii:]\d]/utf,ucp + a + 0: a + ~ + 0: ~ + \a + 0: \x{07} + \x{7f} + 0: \x{7f} +\= Expect no match + 0 +No match + \x{389} +No match + \x{20ac} +No match + +/(?=.*b)\pL/ + 11bb + 0: b + +/(?(?=.*b)(?=.*b)\pL|.*c)/ + 11bb + 0: b + +/^\x{123}+?$/utf,no_auto_possess + \x{123}\x{123}\x{123} + 0: \x{123}\x{123}\x{123} + +/^\x{123}+?$/i,utf,no_auto_possess + \x{123}\x{122}\x{123} + 0: \x{123}\x{122}\x{123} +\= Expect no match + \x{123}\x{124}\x{123} +No match + +/\N{U+1234}/utf + \x{1234} + 0: \x{1234} + +/[\N{U+1234}]/utf + \x{1234} + 0: \x{1234} + +# Test the full list of Unicode "Pattern White Space" characters that are to +# be ignored by /x. The pattern lines below may show up oddly in text editors +# or when listed to the screen. Note that characters such as U+2002, which are +# matched as space by \h and \v are *not* "Pattern White Space". + +/A…‎â€â€¨â€©B/x,utf + AB + 0: AB + +/A B/x,utf + A\x{2002}B + 0: A\x{2002}B +\= Expect no match + AB +No match + +# ------- + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + 0: \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}ABC]*[\x80-\xff]/utf + \x{99}\x{99}\x{99} + 0: \x{99}\x{99}\x{99} + +/[^\x{100}-\x{ffff}]*[\x80-\xff]/i,utf + \x{99}\x{99}\x{99} + 0: \x{99}\x{99}\x{99} + +# Script run tests + +/^(*script_run:.{4})/utf + abcd Latin x4 + 0: abcd + \x{2e80}\x{2fa1d}\x{3041}\x{30a1} Han Han Hiragana Katakana + 0: \x{2e80}\x{2fa1d}\x{3041}\x{30a1} + \x{3041}\x{30a1}\x{3007}\x{3007} Hiragana Katakana Han Han + 0: \x{3041}\x{30a1}\x{3007}\x{3007} + \x{30a1}\x{3041}\x{3007}\x{3007} Katakana Hiragana Han Han + 0: \x{30a1}\x{3041}\x{3007}\x{3007} + \x{1100}\x{2e80}\x{2e80}\x{1101} Hangul Han Han Hangul + 0: \x{1100}\x{2e80}\x{2e80}\x{1101} + \x{2e80}\x{3105}\x{2e80}\x{3105} Han Bopomofo Han Bopomofo + 0: \x{2e80}\x{3105}\x{2e80}\x{3105} + \x{02ea}\x{2e80}\x{2e80}\x{3105} Bopomofo-Sk Han Han Bopomofo + 0: 
\x{2ea}\x{2e80}\x{2e80}\x{3105} + \x{3105}\x{2e80}\x{2e80}\x{3105} Bopomofo Han Han Bopomofo + 0: \x{3105}\x{2e80}\x{2e80}\x{3105} + \x{0300}cd! Inherited Latin Latin Common + 0: \x{300}cd! + \x{0391}12\x{03a9} Greek Common-digits Greek + 0: \x{391}12\x{3a9} + \x{0400}12\x{fe2f} Cyrillic Common-digits Cyrillic + 0: \x{400}12\x{fe2f} + \x{0531}12\x{fb17} Armenian Common-digits Armenian + 0: \x{531}12\x{fb17} + \x{0591}12\x{fb4f} Hebrew Common-digits Hebrew + 0: \x{591}12\x{fb4f} + \x{0600}12\x{1eef1} Arabic Common-digits Arabic + 0: \x{600}12\x{1eef1} + \x{0600}\x{0660}\x{0669}\x{1eef1} Arabic Arabic-digits Arabic + 0: \x{600}\x{660}\x{669}\x{1eef1} + \x{0700}12\x{086a} Syriac Common-digits Syriac + 0: \x{700}12\x{86a} + \x{1200}12\x{ab2e} Ethiopic Common-digits Ethiopic + 0: \x{1200}12\x{ab2e} + \x{1680}12\x{169c} Ogham Common-digits Ogham + 0: \x{1680}12\x{169c} + \x{3041}12\x{3041} Hiragana Common-digits Hiragana + 0: \x{3041}12\x{3041} + \x{0980}\x{09e6}\x{09e7}\x{0993} Bengali Bengali-digits Bengali + 0: \x{980}\x{9e6}\x{9e7}\x{993} + !cde Common Latin Latin Latin + 0: !cde + A..B Latin Common Common Latin + 0: A..B + 0abc Ascii-digit Latin Latin Latin + 0: 0abc + 1\x{0700}\x{0700}\x{0700} Ascii-digit Syriac x 3 + 0: 1\x{700}\x{700}\x{700} + \x{1A80}\x{1A80}\x{1a40}\x{1a41} Tai Tham Hora digits, letters + 0: \x{1a80}\x{1a80}\x{1a40}\x{1a41} +\= Expect no match + a\x{370}bcd Latin Greek Latin Latin +No match + \x{1100}\x{02ea}\x{02ea}\x{02ea} Hangul Bopomofo x3 +No match + \x{02ea}\x{02ea}\x{02ea}\x{1100} Bopomofo x3 Hangul +No match + \x{1100}\x{2e80}\x{3041}\x{1101} Hangul Han Hiragana Hangul +No match + \x{0391}\x{09e6}\x{09e7}\x{03a9} Greek Bengali digits Greek +No match + \x{0600}7\x{0669}\x{1eef1} Arabic ascii-digit Arabic-digit Arabic +No match + \x{0600}\x{0669}7\x{1eef1} Arabic Arabic-digit ascii-digit Arabic +No match + A5\x{ff19}B Latin Common-ascii/notascii-digits Latin +No match + \x{0300}cd\x{0391} Inherited Latin Latin Greek +No match + 
!cd\x{0391} Common Latin Latin Greek +No match + \x{1A80}\x{1A90}\x{1a40}\x{1a41} Tai Tham Hora digit, Tham digit, letters +No match + A\x{1d7ce}\x{1d7ff}B Common fancy-common-2-sets-digits Common +No match + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana +No match + +/^(*sr:.{4}|..)/utf + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + 0: \x{2e80}\x{3105} + +/^(*atomic_script_run:.{4}|..)/utf +\= Expect no match + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana +No match + +/^(*asr:.*)/utf +\= Expect no match + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana +No match + +/^(?>(*sr:.*))/utf + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + 0: \x{2e80}\x{3105}\x{2e80} + +/^(*sr:.*)/utf + \x{2e80}\x{3105}\x{2e80}\x{30a1} Han Bopomofo Han Katakana + 0: \x{2e80}\x{3105}\x{2e80} + \x{10fffd}\x{10fffd}\x{10fffd} Private use (Unknown) + 0: \x{10fffd} + +/^(*sr:\x{2e80}*)/utf + \x{2e80}\x{2e80}\x{3105} Han Han Bopomofo + 0: \x{2e80}\x{2e80} + +/^(*sr:\x{2e80}*)\x{2e80}/utf + \x{2e80}\x{2e80}\x{3105} Han Han Bopomofo + 0: \x{2e80}\x{2e80} + +/^(*sr:.*)Test/utf + Test script run on an empty string + 0: Test + +/^(*sr:(.{2})){2}/utf + \x{0600}7\x{0669}\x{1eef1} Arabic ascii-digit Arabic-digit Arabic + 0: \x{600}7\x{669}\x{1eef1} + 1: \x{669}\x{1eef1} + \x{1A80}\x{1A80}\x{1a40}\x{1a41} Tai Tham Hora digits, letters + 0: \x{1a80}\x{1a80}\x{1a40}\x{1a41} + 1: \x{1a40}\x{1a41} + \x{1A80}\x{1a40}\x{1A90}\x{1a41} Tai Tham Hora digit, letter, Tham digit, letter + 0: \x{1a80}\x{1a40}\x{1a90}\x{1a41} + 1: \x{1a90}\x{1a41} +\= Expect no match + \x{1100}\x{2e80}\x{3041}\x{1101} Hangul Han Hiragana Hangul +No match + +/^(*sr:\S*)/utf + \x{1cf4}\x{20f0}\x{900}\x{11305} [Dev,Gran,Kan] [Dev,Gran,Lat] Dev Gran + 0: \x{1cf4}\x{20f0}\x{900} + \x{1cf4}\x{20f0}\x{11305}\x{900} [Dev,Gran,Kan] [Dev,Gran,Lat] Gran Dev + 0: \x{1cf4}\x{20f0}\x{11305} + \x{1cf4}\x{20f0}\x{900}ABC [Dev,Gran,Kan] [Dev,Gran,Lat] Dev Lat + 0: 
\x{1cf4}\x{20f0}\x{900} + \x{1cf4}\x{20f0}ABC [Dev,Gran,Kan] [Dev,Gran,Lat] Lat + 0: \x{1cf4}\x{20f0} + \x{20f0}ABC [Dev,Gran,Lat] Lat + 0: \x{20f0}ABC + XYZ\x{20f0}ABC Lat [Dev,Gran,Lat] Lat + 0: XYZ\x{20f0}ABC + \x{a36}\x{a33}\x{900} [Dev,...] [Dev,...] Dev + 0: \x{a36}\x{a33} + \x{3001}\x{2e80}\x{3041}\x{30a1} [Bopo, Han, etc] Han Hira Kata + 0: \x{3001}\x{2e80}\x{3041}\x{30a1} + \x{3001}\x{30a1}\x{2e80}\x{3041} [Bopo, Han, etc] Kata Han Hira + 0: \x{3001}\x{30a1}\x{2e80}\x{3041} + \x{3001}\x{3105}\x{2e80}\x{1101} [Bopo, Han, etc] Bopomofo Han Hangul + 0: \x{3001}\x{3105}\x{2e80} + \x{3105}\x{3001}\x{2e80}\x{1101} Bopomofo [Bopo, Han, etc] Han Hangul + 0: \x{3105}\x{3001}\x{2e80} + \x{3031}\x{3041}\x{30a1}\x{2e80} [Hira Kata] Hira Kata Han + 0: \x{3031}\x{3041}\x{30a1}\x{2e80} + \x{060c}\x{06d4}\x{0600}\x{10d00}\x{0700} [Arab Rohg Syrc Thaa] [Arab Rohg] Arab Rohg Syrc + 0: \x{60c}\x{6d4}\x{600} + \x{060c}\x{06d4}\x{0700}\x{0600}\x{10d00} [Arab Rohg Syrc Thaa] [Arab Rohg] Syrc Arab Rohg + 0: \x{60c}\x{6d4} + \x{2e80}\x{3041}\x{3001}\x{3031}\x{2e80} Han Hira [Bopo, Han, etc] [Hira Kata] Han + 0: \x{2e80}\x{3041}\x{3001}\x{3031}\x{2e80} + +/(?[[:blank:]]*/utf,ucp + >\x{20}\x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{9}\x{b}\x{2028} + 0: > \x{a0}\x{1680}\x{180e}\x{2000}\x{202f}\x{09} + +/^A\s+Z/utf,ucp + A\x{85}\x{180e}\x{2005}Z + 0: A\x{85}\x{180e}\x{2005}Z + +/^A[\s]+Z/utf,ucp + A\x{2005}Z + 0: A\x{2005}Z + A\x{85}\x{2005}Z + 0: A\x{85}\x{2005}Z + +/^[[:graph:]]+$/utf,ucp +\= Expect no match + \x{180e} +No match + +/^[[:print:]]+$/utf,ucp + \x{180e} + 0: \x{180e} + +/^[[:^graph:]]+$/utf,ucp + \x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{61c}\x{1680}\x{180e} + 0: \x{09}\x{0a}\x{1d} \x{85}\x{a0}\x{61c}\x{1680}\x{180e} + +/^[[:^print:]]+$/utf,ucp +\= Expect no match + \x{180e} +No match + +# End of U+180E tests. 
+ +# --------------------------------------------------------------------- + +/\x{110000}/IB,utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/\o{4200000}/IB,utf +Failed: error 134 at offset 10: character code point value in \x{} or \o{} is too large + +/\x{ffffffff}/utf +Failed: error 134 at offset 11: character code point value in \x{} or \o{} is too large + +/\o{37777777777}/utf +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\x{100000000}/utf +Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large + +/\o{77777777777}/utf +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/\x{d800}/utf +Failed: error 173 at offset 7: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) + +/\o{154000}/utf +Failed: error 173 at offset 9: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) + +/\x{dfff}/utf +Failed: error 173 at offset 7: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) + +/\o{157777}/utf +Failed: error 173 at offset 9: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) + +/\x{d7ff}/utf + +/\o{153777}/utf + +/\x{e000}/utf + +/\o{170000}/utf + +/^\x{100}a\x{1234}/utf + \x{100}a\x{1234}bcd + 0: \x{100}a\x{1234} + +/\x{0041}\x{2262}\x{0391}\x{002e}/IB,utf +------------------------------------------------------------------ + Bra + A\x{2262}\x{391}. + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' +Subject length lower bound = 4 + \x{0041}\x{2262}\x{0391}\x{002e} + 0: A\x{2262}\x{391}. 
+ +/.{3,5}X/IB,utf +------------------------------------------------------------------ + Bra + Any{3} + Any{0,2} + X + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Last code unit = 'X' +Subject length lower bound = 4 + \x{212ab}\x{212ab}\x{212ab}\x{861}X + 0: \x{212ab}\x{212ab}\x{212ab}\x{861}X + +/.{3,5}?/IB,utf +------------------------------------------------------------------ + Bra + Any{3} + Any{0,2}? + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 3 + \x{212ab}\x{212ab}\x{212ab}\x{861} + 0: \x{212ab}\x{212ab}\x{212ab} + +/^[ab]/IB,utf +------------------------------------------------------------------ + Bra + ^ + [ab] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: utf +Overall options: anchored utf +Starting code units: a b +Subject length lower bound = 1 + bar + 0: b +\= Expect no match + c +No match + \x{ff} +No match + \x{100} +No match + +/\x{100}*(\d+|"(?1)")/utf + 1234 + 0: 1234 + 1: 1234 + "1234" + 0: "1234" + 1: "1234" + \x{100}1234 + 0: \x{100}1234 + 1: 1234 + "\x{100}1234" + 0: \x{100}1234 + 1: 1234 + \x{100}\x{100}12ab + 0: \x{100}\x{100}12 + 1: 12 + \x{100}\x{100}"12" + 0: \x{100}\x{100}"12" + 1: "12" +\= Expect no match + \x{100}\x{100}abcd +No match + +/\x{100}*/IB,utf +------------------------------------------------------------------ + Bra + \x{100}*+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: utf +Subject length lower bound = 0 + +/a\x{100}*/IB,utf +------------------------------------------------------------------ + Bra + a + \x{100}*+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Subject length lower bound 
= 1 + +/ab\x{100}*/IB,utf +------------------------------------------------------------------ + Bra + ab + \x{100}*+ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + +/[\x{200}-\x{100}]/utf +Failed: error 108 at offset 15: range out of order in character class + +/[Ā-Ą]/utf + \x{100} + 0: \x{100} + \x{104} + 0: \x{104} +\= Expect no match + \x{105} +No match + \x{ff} +No match + +/[\xFF]/IB +------------------------------------------------------------------ + Bra + \x{ff} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +First code unit = \xff +Subject length lower bound = 1 + >\xff< + 0: \xff + +/[^\xFF]/IB +------------------------------------------------------------------ + Bra + [^\x{ff}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[Ä-Ü]/utf + Ö # Matches without Study + 0: \x{d6} + \x{d6} + 0: \x{d6} + +/[Ä-Ü]/utf + Ö <-- Same with Study + 0: \x{d6} + \x{d6} + 0: \x{d6} + +/[\x{c4}-\x{dc}]/utf + Ö # Matches without Study + 0: \x{d6} + \x{d6} + 0: \x{d6} + +/[\x{c4}-\x{dc}]/utf + Ö <-- Same with Study + 0: \x{d6} + \x{d6} + 0: \x{d6} + +/[^\x{100}]abc(xyz(?1))/IB,utf +------------------------------------------------------------------ + Bra + [^\x{100}] + abc + CBra 1 + xyz + Recurse + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 1 +Options: utf +Last code unit = 'z' +Subject length lower bound = 7 + +/(\x{100}(b(?2)c))?/IB,utf +------------------------------------------------------------------ + Bra + Brazero + CBra 1 + \x{100} + CBra 2 + b + Recurse + c + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Options:
utf +Subject length lower bound = 0 + +/(\x{100}(b(?2)c)){0,2}/IB,utf +------------------------------------------------------------------ + Bra + Brazero + Bra + CBra 1 + \x{100} + CBra 2 + b + Recurse + c + Ket + Ket + Brazero + CBra 1 + \x{100} + CBra 2 + b + Recurse + c + Ket + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Options: utf +Subject length lower bound = 0 + +/(\x{100}(b(?1)c))?/IB,utf +------------------------------------------------------------------ + Bra + Brazero + CBra 1 + \x{100} + CBra 2 + b + Recurse + c + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Options: utf +Subject length lower bound = 0 + +/(\x{100}(b(?1)c)){0,2}/IB,utf +------------------------------------------------------------------ + Bra + Brazero + Bra + CBra 1 + \x{100} + CBra 2 + b + Recurse + c + Ket + Ket + Brazero + CBra 1 + \x{100} + CBra 2 + b + Recurse + c + Ket + Ket + Ket + Ket + End +------------------------------------------------------------------ +Capture group count = 2 +May match empty string +Options: utf +Subject length lower bound = 0 + +/\W/utf + A.B + 0: . + A\x{100}B + 0: \x{100} + +/\w/utf + \x{100}X + 0: X + +# Use no_start_optimize because the first code unit is different in 8-bit from +# the wider modes. 
+ +/^\ሴ/IB,utf,no_start_optimize +------------------------------------------------------------------ + Bra + ^ + \x{1234} + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Compile options: no_start_optimize utf +Overall options: anchored no_start_optimize utf + +/()()()()()()()()()() + ()()()()()()()()()() + ()()()()()()()()()() + ()()()()()()()()()() + A (x) (?41) B/x,utf + AxxB +Matched, but too many substrings + 0: AxxB + 1: + 2: + 3: + 4: + 5: + 6: + 7: + 8: + 9: +10: +11: +12: +13: +14: + +/^[\x{100}\E-\Q\E\x{150}]/B,utf +------------------------------------------------------------------ + Bra + ^ + [\x{100}-\x{150}] + Ket + End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/B,utf +------------------------------------------------------------------ + Bra + ^ + [\x{100}-\x{150}] + Ket + End +------------------------------------------------------------------ + +/^abc./gmx,newline=any,utf + abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK + 0: abc1 + 0: abc2 + 0: abc3 + 0: abc4 + 0: abc5 + 0: abc6 + 0: abc7 + 0: abc8 + 0: abc9 + +/abc.$/gmx,newline=any,utf + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 + 0: abc1 + 0: abc2 + 0: abc3 + 0: abc4 + 0: abc5 + 0: abc6 + 0: abc7 + 0: abc8 + 0: abc9 + +/^a\Rb/bsr=unicode,utf + a\nb + 0: a\x{0a}b + a\rb + 0: a\x{0d}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x0bb + 0: a\x{0b}b + a\x0cb + 0: a\x{0c}b + a\x{85}b + 0: a\x{85}b + a\x{2028}b + 0: a\x{2028}b + a\x{2029}b + 0: a\x{2029}b +\= Expect no match + a\n\rb +No match + +/^a\R*b/bsr=unicode,utf + ab + 0: ab + a\nb + 0: a\x{0a}b + a\rb + 0: a\x{0d}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x0bb + 0: a\x{0b}b + a\x0c\x{2028}\x{2029}b + 0: a\x{0c}\x{2028}\x{2029}b + a\x{85}b + 0: a\x{85}b + a\n\rb + 0: a\x{0a}\x{0d}b + a\n\r\x{85}\x0cb + 0: a\x{0a}\x{0d}\x{85}\x{0c}b + +/^a\R+b/bsr=unicode,utf +
a\nb + 0: a\x{0a}b + a\rb + 0: a\x{0d}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x0bb + 0: a\x{0b}b + a\x0c\x{2028}\x{2029}b + 0: a\x{0c}\x{2028}\x{2029}b + a\x{85}b + 0: a\x{85}b + a\n\rb + 0: a\x{0a}\x{0d}b + a\n\r\x{85}\x0cb + 0: a\x{0a}\x{0d}\x{85}\x{0c}b +\= Expect no match + ab +No match + +/^a\R{1,3}b/bsr=unicode,utf + a\nb + 0: a\x{0a}b + a\n\rb + 0: a\x{0a}\x{0d}b + a\n\r\x{85}b + 0: a\x{0a}\x{0d}\x{85}b + a\r\n\r\nb + 0: a\x{0d}\x{0a}\x{0d}\x{0a}b + a\r\n\r\n\r\nb + 0: a\x{0d}\x{0a}\x{0d}\x{0a}\x{0d}\x{0a}b + a\n\r\n\rb + 0: a\x{0a}\x{0d}\x{0a}\x{0d}b + a\n\n\r\nb + 0: a\x{0a}\x{0a}\x{0d}\x{0a}b +\= Expect no match + a\n\n\n\rb +No match + a\r +No match + +/\H\h\V\v/utf + X X\x0a + 0: X X\x{0a} + X\x09X\x0b + 0: X\x{09}X\x{0b} +\= Expect no match + \x{a0} X\x0a +No match + +/\H*\h+\V?\v{3,4}/utf + \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d} + \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a + 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}\x{0d} + \x09\x20\x{a0}\x0a\x0b\x0c + 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c} +\= Expect no match + \x09\x20\x{a0}\x0a\x0b +No match + +/\H\h\V\v/utf + \x{3001}\x{3000}\x{2030}\x{2028} + 0: \x{3001}\x{3000}\x{2030}\x{2028} + X\x{180e}X\x{85} + 0: X\x{180e}X\x{85} +\= Expect no match + \x{2009} X\x0a +No match + +/\H*\h+\V?\v{3,4}/utf + \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a + 0: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c}\x{0d} + \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a + 0: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c}\x{2028} + \x09\x20\x{202f}\x0a\x0b\x0c + 0: \x{09} \x{202f}\x{0a}\x{0b}\x{0c} +\= Expect no match + \x09\x{200a}\x{a0}\x{2028}\x0b +No match + +/[\h]/B,utf +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}] + Ket + End +------------------------------------------------------------------ + >\x{1680} + 0: \x{1680} + +/[\h]{3,}/B,utf +------------------------------------------------------------------ 
+ Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}]{3,}+ + Ket + End +------------------------------------------------------------------ + >\x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000}< + 0: \x{1680}\x{180e}\x{2000}\x{2003}\x{200a}\x{202f}\x{205f}\x{3000} + +/[\v]/B,utf +------------------------------------------------------------------ + Bra + [\x0a-\x0d\x85\x{2028}-\x{2029}] + Ket + End +------------------------------------------------------------------ + +/[\H]/B,utf +------------------------------------------------------------------ + Bra + [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}] + Ket + End +------------------------------------------------------------------ + +/[\V]/B,utf +------------------------------------------------------------------ + Bra + [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{10ffff}] + Ket + End +------------------------------------------------------------------ + +/.*$/newline=any,utf + \x{1ec5} + 0: \x{1ec5} + +/a\Rb/I,bsr=anycrlf,utf +Capture group count = 0 +Options: utf +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b +\= Expect no match + a\x{85}b +No match + a\x0bb +No match + +/a\Rb/I,bsr=unicode,utf +Capture group count = 0 +Options: utf +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x{85}b + 0: a\x{85}b + a\x0bb + 0: a\x{0b}b + +/a\R?b/I,bsr=anycrlf,utf +Capture group count = 0 +Options: utf +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b +\= Expect no match + a\x{85}b +No match + 
a\x0bb +No match + +/a\R?b/I,bsr=unicode,utf +Capture group count = 0 +Options: utf +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x{85}b + 0: a\x{85}b + a\x0bb + 0: a\x{0b}b + +/.*a.*=.b.*/utf,newline=any + QQQ\x{2029}ABCaXYZ=!bPQR + 0: ABCaXYZ=!bPQR +\= Expect no match + a\x{2029}b +No match + \x61\xe2\x80\xa9\x62 +No match + +/[[:a\x{100}b:]]/utf +Failed: error 130 at offset 3: unknown POSIX class name + +/a[^]b/utf,allow_empty_class,match_unset_backref + a\x{1234}b + 0: a\x{1234}b + a\nb + 0: a\x{0a}b +\= Expect no match + ab +No match + +/a[^]+b/utf,allow_empty_class,match_unset_backref + aXb + 0: aXb + a\nX\nX\x{1234}b + 0: a\x{0a}X\x{0a}X\x{1234}b +\= Expect no match + ab +No match + +/(\x{de})\1/ + \x{de}\x{de} + 0: \xde\xde + 1: \xde + +/X/newline=any,utf,firstline + A\x{1ec5}ABCXYZ + 0: X + +/Xa{2,4}b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/Xa{2,4}?b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/Xa{2,4}+b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\x{123}{2,4}b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X\x{123}{2,4}?b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial 
match: X\x{123}\x{123}\x{123}\x{123} + +/X\x{123}{2,4}+b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X\x{123}{2,4}b/utf +\= Expect no match + Xx\=ps +No match + X\x{123}x\=ps +No match + X\x{123}\x{123}x\=ps +No match + X\x{123}\x{123}\x{123}x\=ps +No match + X\x{123}\x{123}\x{123}\x{123}x\=ps +No match + +/X\x{123}{2,4}?b/utf +\= Expect no match + Xx\=ps +No match + X\x{123}x\=ps +No match + X\x{123}\x{123}x\=ps +No match + X\x{123}\x{123}\x{123}x\=ps +No match + X\x{123}\x{123}\x{123}\x{123}x\=ps +No match + +/X\x{123}{2,4}+b/utf +\= Expect no match + Xx\=ps +No match + X\x{123}x\=ps +No match + X\x{123}\x{123}x\=ps +No match + X\x{123}\x{123}\x{123}x\=ps +No match + X\x{123}\x{123}\x{123}\x{123}x\=ps +No match + +/X\d{2,4}b/utf + X\=ps +Partial match: X + X3\=ps +Partial match: X3 + X33\=ps +Partial match: X33 + X333\=ps +Partial match: X333 + X3333\=ps +Partial match: X3333 + +/X\d{2,4}?b/utf + X\=ps +Partial match: X + X3\=ps +Partial match: X3 + X33\=ps +Partial match: X33 + X333\=ps +Partial match: X333 + X3333\=ps +Partial match: X3333 + +/X\d{2,4}+b/utf + X\=ps +Partial match: X + X3\=ps +Partial match: X3 + X33\=ps +Partial match: X33 + X333\=ps +Partial match: X333 + X3333\=ps +Partial match: X3333 + +/X\D{2,4}b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\D{2,4}?b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\D{2,4}+b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X\D{2,4}b/utf + X\=ps 
+Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X\D{2,4}?b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X\D{2,4}+b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X[abc]{2,4}b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X[abc]{2,4}?b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X[abc]{2,4}+b/utf + X\=ps +Partial match: X + Xa\=ps +Partial match: Xa + Xaa\=ps +Partial match: Xaa + Xaaa\=ps +Partial match: Xaaa + Xaaaa\=ps +Partial match: Xaaaa + +/X[abc\x{123}]{2,4}b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X[abc\x{123}]{2,4}?b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X[abc\x{123}]{2,4}+b/utf + X\=ps +Partial 
match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X[^a]{2,4}b/utf + X\=ps +Partial match: X + Xz\=ps +Partial match: Xz + Xzz\=ps +Partial match: Xzz + Xzzz\=ps +Partial match: Xzzz + Xzzzz\=ps +Partial match: Xzzzz + +/X[^a]{2,4}?b/utf + X\=ps +Partial match: X + Xz\=ps +Partial match: Xz + Xzz\=ps +Partial match: Xzz + Xzzz\=ps +Partial match: Xzzz + Xzzzz\=ps +Partial match: Xzzzz + +/X[^a]{2,4}+b/utf + X\=ps +Partial match: X + Xz\=ps +Partial match: Xz + Xzz\=ps +Partial match: Xzz + Xzzz\=ps +Partial match: Xzzz + Xzzzz\=ps +Partial match: Xzzzz + +/X[^a]{2,4}b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X[^a]{2,4}?b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/X[^a]{2,4}+b/utf + X\=ps +Partial match: X + X\x{123}\=ps +Partial match: X\x{123} + X\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123} + X\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123} + X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: X\x{123}\x{123}\x{123}\x{123} + +/(Y)X\1{2,4}b/utf + YX\=ps +Partial match: YX + YXY\=ps +Partial match: YXY + YXYY\=ps +Partial match: YXYY + YXYYY\=ps +Partial match: YXYYY + YXYYYY\=ps +Partial match: YXYYYY + +/(Y)X\1{2,4}?b/utf + YX\=ps +Partial match: YX + YXY\=ps +Partial match: YXY + YXYY\=ps +Partial match: YXYY + YXYYY\=ps +Partial match: YXYYY + YXYYYY\=ps +Partial match: YXYYYY 
+ +/(Y)X\1{2,4}+b/utf + YX\=ps +Partial match: YX + YXY\=ps +Partial match: YXY + YXYY\=ps +Partial match: YXYY + YXYYY\=ps +Partial match: YXYYY + YXYYYY\=ps +Partial match: YXYYYY + +/(\x{123})X\1{2,4}b/utf + \x{123}X\=ps +Partial match: \x{123}X + \x{123}X\x{123}\=ps +Partial match: \x{123}X\x{123} + \x{123}X\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123} + \x{123}X\x{123}\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123}\x{123} + \x{123}X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123} + +/(\x{123})X\1{2,4}?b/utf + \x{123}X\=ps +Partial match: \x{123}X + \x{123}X\x{123}\=ps +Partial match: \x{123}X\x{123} + \x{123}X\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123} + \x{123}X\x{123}\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123}\x{123} + \x{123}X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123} + +/(\x{123})X\1{2,4}+b/utf + \x{123}X\=ps +Partial match: \x{123}X + \x{123}X\x{123}\=ps +Partial match: \x{123}X\x{123} + \x{123}X\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123} + \x{123}X\x{123}\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123}\x{123} + \x{123}X\x{123}\x{123}\x{123}\x{123}\=ps +Partial match: \x{123}X\x{123}\x{123}\x{123}\x{123} + +/\bthe cat\b/utf + the cat\=ps + 0: the cat + the cat\=ph +Partial match: the cat + +/abcd*/utf + xxxxabcd\=ps + 0: abcd + xxxxabcd\=ph +Partial match: abcd + +/abcd*/i,utf + xxxxabcd\=ps + 0: abcd + xxxxabcd\=ph +Partial match: abcd + XXXXABCD\=ps + 0: ABCD + XXXXABCD\=ph +Partial match: ABCD + +/abc\d*/utf + xxxxabc1\=ps + 0: abc1 + xxxxabc1\=ph +Partial match: abc1 + +/(a)bc\1*/utf + xxxxabca\=ps + 0: abca + 1: a + xxxxabca\=ph +Partial match: abca + +/abc[de]*/utf + xxxxabcde\=ps + 0: abcde + xxxxabcde\=ph +Partial match: abcde + +/X\W{3}X/utf + X\=ps +Partial match: X + +/\sxxx\s/utf,tables=2 + AB\x{85}xxx\x{a0}XYZ + 0: \x{85}xxx\x{a0} + AB\x{a0}xxx\x{85}XYZ + 0: \x{a0}xxx\x{85} + +/\S 
\S/utf,tables=2 + \x{a2} \x{84} + 0: \x{a2} \x{84} + +'A#хц'Bx,newline=any,utf +------------------------------------------------------------------ + Bra + A + Ket + End +------------------------------------------------------------------ + +'A#хц + PQ'Bx,newline=any,utf +------------------------------------------------------------------ + Bra + APQ + Ket + End +------------------------------------------------------------------ + +/a+#хaa + z#XX?/Bx,newline=any,utf +------------------------------------------------------------------ + Bra + a++ + z + Ket + End +------------------------------------------------------------------ + +/a+#хaa + z#х?/Bx,newline=any,utf +------------------------------------------------------------------ + Bra + a++ + z + Ket + End +------------------------------------------------------------------ + +/\g{A}xxx#bXX(?'A'123) (?'A'456)/Bx,newline=any,utf +------------------------------------------------------------------ + Bra + \1 + xxx + CBra 1 + 456 + Ket + Ket + End +------------------------------------------------------------------ + +/\g{A}xxx#bх(?'A'123) (?'A'456)/Bx,newline=any,utf +------------------------------------------------------------------ + Bra + \1 + xxx + CBra 1 + 456 + Ket + Ket + End +------------------------------------------------------------------ + +/^\cģ/utf +Failed: error 168 at offset 3: \c must be followed by a printable ASCII character + +/(\R*)(.)/s,utf + \r\n + 0: \x{0d} + 1: + 2: \x{0d} + \r\r\n\n\r + 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} + 1: \x{0d}\x{0d}\x{0a}\x{0a} + 2: \x{0d} + \r\r\n\n\r\n + 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} + 1: \x{0d}\x{0d}\x{0a}\x{0a} + 2: \x{0d} + +/(\R)*(.)/s,utf + \r\n + 0: \x{0d} + 1: + 2: \x{0d} + \r\r\n\n\r + 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} + 1: \x{0a} + 2: \x{0d} + \r\r\n\n\r\n + 0: \x{0d}\x{0d}\x{0a}\x{0a}\x{0d} + 1: \x{0a} + 2: \x{0d} + +/[^\x{1234}]+/Ii,utf +Capture group count = 0 +Options: caseless utf +Subject length lower bound = 1 + +/[^\x{1234}]+?/Ii,utf +Capture
group count = 0 +Options: caseless utf +Subject length lower bound = 1 + +/[^\x{1234}]++/Ii,utf +Capture group count = 0 +Options: caseless utf +Subject length lower bound = 1 + +/[^\x{1234}]{2}/Ii,utf +Capture group count = 0 +Options: caseless utf +Subject length lower bound = 2 + +/f.*/ + for\=ph +Partial match: for + +/f.*/s + for\=ph +Partial match: for + +/f.*/utf + for\=ph +Partial match: for + +/f.*/s,utf + for\=ph +Partial match: for + +/\x{d7ff}\x{e000}/utf + +/\x{d800}/utf +Failed: error 173 at offset 7: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) + +/\x{dfff}/utf +Failed: error 173 at offset 7: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) + +/\h+/utf + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + 0: \x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} + 0: \x{200a}\x{a0}\x{2000} + +/[\h\x{e000}]+/B,utf +------------------------------------------------------------------ + Bra + [\x09 \xa0\x{1680}\x{180e}\x{2000}-\x{200a}\x{202f}\x{205f}\x{3000}\x{e000}]++ + Ket + End +------------------------------------------------------------------ + \x{1681}\x{200b}\x{1680}\x{2000}\x{202f}\x{3000} + 0: \x{1680}\x{2000}\x{202f}\x{3000} + \x{3001}\x{2fff}\x{200a}\x{a0}\x{2000} + 0: \x{200a}\x{a0}\x{2000} + +/\H+/utf + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 0: \x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + 0: \x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + 0: \x{202e}\x{2030}\x{205e}\x{2060} + \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} + 0: \x{9f}\x{a1}\x{2fff}\x{3001} + +/[\H\x{d7ff}]+/B,utf +------------------------------------------------------------------ + Bra + [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}\x{d7ff}]++ + Ket + End +------------------------------------------------------------------ + \x{1680}\x{180e}\x{167f}\x{1681}\x{180d}\x{180f} + 0: 
\x{167f}\x{1681}\x{180d}\x{180f} + \x{2000}\x{200a}\x{1fff}\x{200b} + 0: \x{1fff}\x{200b} + \x{202f}\x{205f}\x{202e}\x{2030}\x{205e}\x{2060} + 0: \x{202e}\x{2030}\x{205e}\x{2060} + \x{a0}\x{3000}\x{9f}\x{a1}\x{2fff}\x{3001} + 0: \x{9f}\x{a1}\x{2fff}\x{3001} + +/\v+/utf + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d + 0: \x{85}\x{0a}\x{0b}\x{0c}\x{0d} + +/[\v\x{e000}]+/B,utf +------------------------------------------------------------------ + Bra + [\x0a-\x0d\x85\x{2028}-\x{2029}\x{e000}]++ + Ket + End +------------------------------------------------------------------ + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d + 0: \x{85}\x{0a}\x{0b}\x{0c}\x{0d} + +/\V+/utf + \x{2028}\x{2029}\x{2027}\x{2030} + 0: \x{2027}\x{2030} + \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} + 0: \x{09}\x{0e}\x{84}\x{86} + +/[\V\x{d7ff}]+/B,utf +------------------------------------------------------------------ + Bra + [\x00-\x09\x0e-\x84\x86-\xff\x{100}-\x{2027}\x{202a}-\x{10ffff}\x{d7ff}]++ + Ket + End +------------------------------------------------------------------ + \x{2028}\x{2029}\x{2027}\x{2030} + 0: \x{2027}\x{2030} + \x{85}\x0a\x0b\x0c\x0d\x09\x0e\x{84}\x{86} + 0: \x{09}\x{0e}\x{84}\x{86} + +/\R+/bsr=unicode,utf + \x{2027}\x{2030}\x{2028}\x{2029} + 0: \x{2028}\x{2029} + \x09\x0e\x{84}\x{86}\x{85}\x0a\x0b\x0c\x0d + 0: \x{85}\x{0a}\x{0b}\x{0c}\x{0d} + +/(..)\1/utf + ab\=ps +Partial match: ab + aba\=ps +Partial match: aba + abab\=ps + 0: abab + 1: ab + +/(..)\1/i,utf + ab\=ps +Partial match: ab + abA\=ps +Partial match: abA + aBAb\=ps + 0: aBAb + 1: aB + +/(..)\1{2,}/utf + ab\=ps +Partial match: ab + aba\=ps +Partial match: aba + abab\=ps +Partial match: abab + ababa\=ps +Partial match: ababa + ababab\=ps + 0: ababab + 1: ab + ababab\=ph +Partial match: ababab + abababa\=ps + 0: ababab + 1: ab + abababa\=ph +Partial match: abababa + +/(..)\1{2,}/i,utf + ab\=ps +Partial match: 
ab + aBa\=ps +Partial match: aBa + aBAb\=ps +Partial match: aBAb + AbaBA\=ps +Partial match: AbaBA + abABAb\=ps + 0: abABAb + 1: ab + aBAbaB\=ph +Partial match: aBAbaB + abABabA\=ps + 0: abABab + 1: ab + abaBABa\=ph +Partial match: abaBABa + +/(..)\1{2,}?x/i,utf + ab\=ps +Partial match: ab + abA\=ps +Partial match: abA + aBAb\=ps +Partial match: aBAb + abaBA\=ps +Partial match: abaBA + abAbaB\=ps +Partial match: abAbaB + abaBabA\=ps +Partial match: abaBabA + abAbABaBx\=ps + 0: abAbABaBx + 1: ab + +/./utf,newline=crlf + \r\=ps + 0: \x{0d} + \r\=ph +Partial match: \x{0d} + +/.{2,3}/utf,newline=crlf + \r\=ps +Partial match: \x{0d} + \r\=ph +Partial match: \x{0d} + \r\r\=ps + 0: \x{0d}\x{0d} + \r\r\=ph +Partial match: \x{0d}\x{0d} + \r\r\r\=ps + 0: \x{0d}\x{0d}\x{0d} + \r\r\r\=ph +Partial match: \x{0d}\x{0d}\x{0d} + +/.{2,3}?/utf,newline=crlf + \r\=ps +Partial match: \x{0d} + \r\=ph +Partial match: \x{0d} + \r\r\=ps + 0: \x{0d}\x{0d} + \r\r\=ph +Partial match: \x{0d}\x{0d} + \r\r\r\=ps + 0: \x{0d}\x{0d} + \r\r\r\=ph + 0: \x{0d}\x{0d} + +/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/B,utf +------------------------------------------------------------------ + Bra + [^\x{100}] + [^\x{1234}] + [^\x{ffff}] + [^\x{10000}] + [^\x{10ffff}] + Ket + End +------------------------------------------------------------------ + +/[^\x{100}][^\x{1234}][^\x{ffff}][^\x{10000}][^\x{10ffff}]/Bi,utf +------------------------------------------------------------------ + Bra + /i [^\x{100}] + /i [^\x{1234}] + /i [^\x{ffff}] + /i [^\x{10000}] + /i [^\x{10ffff}] + Ket + End +------------------------------------------------------------------ + +/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/B,utf +------------------------------------------------------------------ + Bra + [^\x{100}]* + [^\x{10000}]+ + [^\x{10ffff}]?? + [^\x{8000}]{4} + [^\x{8000}]* + [^\x{7fff}]{2} + [^\x{7fff}]{0,7}? 
+ [^\x{fffff}]{5} + [^\x{fffff}]?+ + Ket + End +------------------------------------------------------------------ + +/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/Bi,utf +------------------------------------------------------------------ + Bra + /i [^\x{100}]* + /i [^\x{10000}]+ + /i [^\x{10ffff}]?? + /i [^\x{8000}]{4} + /i [^\x{8000}]* + /i [^\x{7fff}]{2} + /i [^\x{7fff}]{0,7}? + /i [^\x{fffff}]{5} + /i [^\x{fffff}]?+ + Ket + End +------------------------------------------------------------------ + +/(?<=\x{1234}\x{1234})\bxy/I,utf +Capture group count = 0 +Max lookbehind = 2 +Options: utf +First code unit = 'x' +Last code unit = 'y' +Subject length lower bound = 2 + +/(?= 0xd800 && <= 0xdfff) + +/^\u{0000000000010ffff}/utf,extra_alt_bsux + \x{10ffff} + 0: \x{10ffff} + +/\u/utf,alt_bsux + \\u + 0: u + +/^a+[a\x{200}]/B,utf +------------------------------------------------------------------ + Bra + ^ + a+ + [a\x{200}] + Ket + End +------------------------------------------------------------------ + aa + 0: aa + +/[b-d\x{200}-\x{250}]*[ae-h]?#[\x{200}-\x{250}]{0,8}[\x00-\xff]*#[\x{200}-\x{250}]+[a-z]/B,utf +------------------------------------------------------------------ + Bra + [b-d\x{200}-\x{250}]*+ + [ae-h]?+ + # + [\x{200}-\x{250}]{0,8}+ + [\x00-\xff]* + # + [\x{200}-\x{250}]++ + [a-z] + Ket + End +------------------------------------------------------------------ + +/[\p{L}]/IB +------------------------------------------------------------------ + Bra + [\p{L}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[\p{^L}]/IB +------------------------------------------------------------------ + Bra + [\P{L}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[\P{L}]/IB +------------------------------------------------------------------ + Bra 
+ [\P{L}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[\P{^L}]/IB +------------------------------------------------------------------ + Bra + [\p{L}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Subject length lower bound = 1 + +/[abc\p{L}\x{0660}]/IB,utf +------------------------------------------------------------------ + Bra + [a-c\p{L}\x{660}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + +/[\p{Nd}]/IB,utf +------------------------------------------------------------------ + Bra + [\p{Nd}] + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + 1234 + 0: 1 + +/[\p{Nd}+-]+/IB,utf +------------------------------------------------------------------ + Bra + [+\-\p{Nd}]++ + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +Subject length lower bound = 1 + 1234 + 0: 1234 + 12-34 + 0: 12-34 + 12+\x{661}-34 + 0: 12+\x{661}-34 +\= Expect no match + abcd +No match + +/(?:[\PPa*]*){8,}/ + +/[\P{Any}]/B +------------------------------------------------------------------ + Bra + [\P{Any}] + Ket + End +------------------------------------------------------------------ + +/[\P{Any}\E]/B +------------------------------------------------------------------ + Bra + [\P{Any}] + Ket + End +------------------------------------------------------------------ + +/(\P{Yi}+\277)/ + +/(\P{Yi}+\277)?/ + +/(?<=\P{Yi}{3}A)X/ + +/\p{Yi}+(\P{Yi}+)(?1)/ + +/(\P{Yi}{2}\277)?/ + +/[\P{Yi}A]/ + +/[\P{Yi}\P{Yi}\P{Yi}A]/ + +/[^\P{Yi}A]/ + +/[^\P{Yi}\P{Yi}\P{Yi}A]/ + +/(\P{Yi}*\277)*/ + +/(\P{Yi}*?\277)*/ + +/(\p{Yi}*+\277)*/ + +/(\P{Yi}?\277)*/ + +/(\P{Yi}??\277)*/ + 
+/(\p{Yi}?+\277)*/ + +/(\P{Yi}{0,3}\277)*/ + +/(\P{Yi}{0,3}?\277)*/ + +/(\p{Yi}{0,3}+\277)*/ + +/\p{Zl}{2,3}+/B,utf +------------------------------------------------------------------ + Bra + prop Zl {2} + prop Zl ?+ + Ket + End +------------------------------------------------------------------ + 

 + 0: \x{2028}\x{2028} + \x{2028}\x{2028}\x{2028} + 0: \x{2028}\x{2028}\x{2028} + +/\p{Zl}/B,utf +------------------------------------------------------------------ + Bra + prop Zl + Ket + End +------------------------------------------------------------------ + +/\p{Lu}{3}+/B,utf +------------------------------------------------------------------ + Bra + prop Lu {3} + Ket + End +------------------------------------------------------------------ + +/\pL{2}+/B,utf +------------------------------------------------------------------ + Bra + prop L {2} + Ket + End +------------------------------------------------------------------ + +/\p{Cc}{2}+/B,utf +------------------------------------------------------------------ + Bra + prop Cc {2} + Ket + End +------------------------------------------------------------------ + +/^\p{Cf}/utf + \x{180e} + 0: \x{180e} + \x{061c} + 0: \x{61c} + \x{2066} + 0: \x{2066} + \x{2067} + 0: \x{2067} + \x{2068} + 0: \x{2068} + \x{2069} + 0: \x{2069} + +/^\p{Cs}/utf + \x{dfff}\=no_utf_check + 0: \x{dfff} +\= Expect no match + \x{09f} +No match + +/^\p{Mn}/utf + \x{1a1b} + 0: \x{1a1b} + +/^\p{Pe}/utf + \x{2309} + 0: \x{2309} + \x{230b} + 0: \x{230b} + +/^\p{Ps}/utf + \x{2308} + 0: \x{2308} + \x{230a} + 0: \x{230a} + +/^\p{Sc}+/utf + $\x{a2}\x{a3}\x{a4}\x{a5}\x{a6} + 0: $\x{a2}\x{a3}\x{a4}\x{a5} + \x{9f2} + 0: \x{9f2} +\= Expect no match + X +No match + \x{2c2} +No match + +/^\p{Zs}/utf + \ \ + 0: + \x{a0} + 0: \x{a0} + \x{1680} + 0: \x{1680} + \x{2000} + 0: \x{2000} + \x{2001} + 0: \x{2001} +\= Expect no match + \x{2028} +No match + \x{200d} +No match + +# These are here because Perl has problems with the negative versions of the +# properties and has changed how it behaves for caseless matching. 
+ +/\p{^Lu}/i,utf + 1234 + 0: 1 +\= Expect no match + ABC +No match + +/\P{Lu}/i,utf + 1234 + 0: 1 +\= Expect no match + ABC +No match + +/\p{Ll}/i,utf + a + 0: a + Az + 0: z +\= Expect no match + ABC +No match + +/\p{Lu}/i,utf + A + 0: A + a\x{10a0}B + 0: \x{10a0} +\= Expect no match + a +No match + \x{1d00} +No match + +/\p{Lu}/i,utf + A + 0: A + aZ + 0: Z +\= Expect no match + abc +No match + +/[\x{c0}\x{391}]/i,utf + \x{c0} + 0: \x{c0} + \x{e0} + 0: \x{e0} + +# The next two are special cases where the lengths of the different cases of +# the same character differ. The first went wrong with heap frame storage; the +# second was broken in all cases. + +/^\x{023a}+?(\x{0130}+)/i,utf + \x{023a}\x{2c65}\x{0130} + 0: \x{23a}\x{2c65}\x{130} + 1: \x{130} + +/^\x{023a}+([^X])/i,utf + \x{023a}\x{2c65}X + 0: \x{23a}\x{2c65} + 1: \x{2c65} + +/\x{c0}+\x{116}+/i,utf + \x{c0}\x{e0}\x{116}\x{117} + 0: \x{c0}\x{e0}\x{116}\x{117} + +/[\x{c0}\x{116}]+/i,utf + \x{c0}\x{e0}\x{116}\x{117} + 0: \x{c0}\x{e0}\x{116}\x{117} + +/(\x{de})\1/i,utf + \x{de}\x{de} + 0: \x{de}\x{de} + 1: \x{de} + \x{de}\x{fe} + 0: \x{de}\x{fe} + 1: \x{de} + \x{fe}\x{fe} + 0: \x{fe}\x{fe} + 1: \x{fe} + \x{fe}\x{de} + 0: \x{fe}\x{de} + 1: \x{fe} + +/^\x{c0}$/i,utf + \x{c0} + 0: \x{c0} + \x{e0} + 0: \x{e0} + +/^\x{e0}$/i,utf + \x{c0} + 0: \x{c0} + \x{e0} + 0: \x{e0} + +# The next two should be Perl-compatible, but it fails to match \x{e0}. PCRE +# will match it only with UCP support, because without that it has no notion +# of case for anything other than the ASCII letters. + +/((?i)[\x{c0}])/utf + \x{c0} + 0: \x{c0} + 1: \x{c0} + \x{e0} + 0: \x{e0} + 1: \x{e0} + +/(?i:[\x{c0}])/utf + \x{c0} + 0: \x{c0} + \x{e0} + 0: \x{e0} + +# These are PCRE's extra properties to help with Unicodizing \d etc. 
+ +/^\p{Xan}/utf + ABCD + 0: A + 1234 + 0: 1 + \x{6ca} + 0: \x{6ca} + \x{a6c} + 0: \x{a6c} + \x{10a7} + 0: \x{10a7} +\= Expect no match + _ABC +No match + +/^\p{Xan}+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7} +\= Expect no match + _ABC +No match + +/^\p{Xan}+?/utf + \x{6ca}\x{a6c}\x{10a7}_ + 0: \x{6ca} + +/^\p{Xan}*/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7} + +/^\p{Xan}{2,9}/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca} + +/^\p{Xan}{2,9}?/utf + \x{6ca}\x{a6c}\x{10a7}_ + 0: \x{6ca}\x{a6c} + +/^[\p{Xan}]/utf + ABCD1234_ + 0: A + 1234abcd_ + 0: 1 + \x{6ca} + 0: \x{6ca} + \x{a6c} + 0: \x{a6c} + \x{10a7} + 0: \x{10a7} +\= Expect no match + _ABC +No match + +/^[\p{Xan}]+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7} +\= Expect no match + _ABC +No match + +/^>\p{Xsp}/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680} + >\x{a0} + 0: >\x{a0} +\= Expect no match + \x{0b} +No match + +/^>\p{Xsp}+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}+?/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680} + +/^>\p{Xsp}*/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}{2,9}/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xsp}{2,9}?/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09} + +/^>[\p{Xsp}]/utf + >\x{2028}\x{0b} + 0: >\x{2028} + +/^>[\p{Xsp}]+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680} + >\x{a0} + 0: >\x{a0} +\= Expect no match + \x{0b} +No match + +/^>\p{Xps}+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} 
+ +/^>\p{Xps}+?/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680} + +/^>\p{Xps}*/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}?/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09} + +/^>[\p{Xps}]/utf + >\x{2028}\x{0b} + 0: >\x{2028} + +/^>[\p{Xps}]+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^\p{Xwd}/utf + ABCD + 0: A + 1234 + 0: 1 + \x{6ca} + 0: \x{6ca} + \x{a6c} + 0: \x{a6c} + \x{10a7} + 0: \x{10a7} + _ABC + 0: _ +\= Expect no match + [] +No match + +/^\p{Xwd}+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}+?/utf + \x{6ca}\x{a6c}\x{10a7}_ + 0: \x{6ca} + +/^\p{Xwd}*/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}{2,9}/utf + A_B12\x{6ca}\x{a6c}\x{10a7} + 0: A_B12\x{6ca}\x{a6c}\x{10a7} + +/^\p{Xwd}{2,9}?/utf + \x{6ca}\x{a6c}\x{10a7}_ + 0: \x{6ca}\x{a6c} + +/^[\p{Xwd}]/utf + ABCD1234_ + 0: A + 1234abcd_ + 0: 1 + \x{6ca} + 0: \x{6ca} + \x{a6c} + 0: \x{a6c} + \x{10a7} + 0: \x{10a7} + _ABC + 0: _ +\= Expect no match + [] +No match + +/^[\p{Xwd}]+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +# A check not in UTF-8 mode + +/^[\p{Xwd}]+/ + ABCD1234_ + 0: ABCD1234_ + +# Some negative checks + +/^[\P{Xwd}]+/utf + !.+\x{019}\x{35a}AB + 0: !.+\x{19}\x{35a} + +/^[\p{^Xwd}]+/utf + !.+\x{019}\x{35a}AB + 0: !.+\x{19}\x{35a} + +/[\D]/B,utf,ucp +------------------------------------------------------------------ + Bra + [\P{Nd}] + Ket + End +------------------------------------------------------------------ + 1\x{3c8}2 + 0: \x{3c8} + +/[\d]/B,utf,ucp +------------------------------------------------------------------ + Bra + [\p{Nd}] + Ket + 
End +------------------------------------------------------------------ + >\x{6f4}< + 0: \x{6f4} + +/[\S]/B,utf,ucp +------------------------------------------------------------------ + Bra + [\P{Xsp}] + Ket + End +------------------------------------------------------------------ + \x{1680}\x{6f4}\x{1680} + 0: \x{6f4} + +/[\s]/B,utf,ucp +------------------------------------------------------------------ + Bra + [\p{Xsp}] + Ket + End +------------------------------------------------------------------ + >\x{1680}< + 0: \x{1680} + +/[\W]/B,utf,ucp +------------------------------------------------------------------ + Bra + [\P{Xwd}] + Ket + End +------------------------------------------------------------------ + A\x{1712}B + 0: \x{1712} + +/[\w]/B,utf,ucp +------------------------------------------------------------------ + Bra + [\p{Xwd}] + Ket + End +------------------------------------------------------------------ + >\x{1723}< + 0: \x{1723} + +/\D/B,utf,ucp +------------------------------------------------------------------ + Bra + notprop Nd + Ket + End +------------------------------------------------------------------ + 1\x{3c8}2 + 0: \x{3c8} + +/\d/B,utf,ucp +------------------------------------------------------------------ + Bra + prop Nd + Ket + End +------------------------------------------------------------------ + >\x{6f4}< + 0: \x{6f4} + +/\S/B,utf,ucp +------------------------------------------------------------------ + Bra + notprop Xsp + Ket + End +------------------------------------------------------------------ + \x{1680}\x{6f4}\x{1680} + 0: \x{6f4} + +/\s/B,utf,ucp +------------------------------------------------------------------ + Bra + prop Xsp + Ket + End +------------------------------------------------------------------ + >\x{1680}> + 0: \x{1680} + +/\W/B,utf,ucp +------------------------------------------------------------------ + Bra + notprop Xwd + Ket + End +------------------------------------------------------------------ + 
A\x{1712}B + 0: \x{1712} + +/\w/B,utf,ucp +------------------------------------------------------------------ + Bra + prop Xwd + Ket + End +------------------------------------------------------------------ + >\x{1723}< + 0: \x{1723} + +/[[:alpha:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{L}] + Ket + End +------------------------------------------------------------------ + +/[[:lower:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Ll}] + Ket + End +------------------------------------------------------------------ + +/[[:upper:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Lu}] + Ket + End +------------------------------------------------------------------ + +/[[:alnum:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Xan}] + Ket + End +------------------------------------------------------------------ + +/[[:ascii:]]/B,ucp +------------------------------------------------------------------ + Bra + [\x00-\x7f] + Ket + End +------------------------------------------------------------------ + +/[[:cntrl:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Cc}] + Ket + End +------------------------------------------------------------------ + +/[[:digit:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Nd}] + Ket + End +------------------------------------------------------------------ + +/[[:graph:]]/B,ucp +------------------------------------------------------------------ + Bra + [[:graph:]] + Ket + End +------------------------------------------------------------------ + +/[[:print:]]/B,ucp +------------------------------------------------------------------ + Bra + [[:print:]] + Ket + End +------------------------------------------------------------------ + +/[[:punct:]]/B,ucp 
+------------------------------------------------------------------ + Bra + [[:punct:]] + Ket + End +------------------------------------------------------------------ + +/[[:space:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Xps}] + Ket + End +------------------------------------------------------------------ + +/[[:word:]]/B,ucp +------------------------------------------------------------------ + Bra + [\p{Xwd}] + Ket + End +------------------------------------------------------------------ + +/[[:xdigit:]]/B,ucp +------------------------------------------------------------------ + Bra + [0-9A-Fa-f] + Ket + End +------------------------------------------------------------------ + +# Unicode properties for \b abd \B + +/\b...\B/utf,ucp + abc_ + 0: abc + \x{37e}abc\x{376} + 0: abc + \x{37e}\x{376}\x{371}\x{393}\x{394} + 0: \x{376}\x{371}\x{393} + !\x{c0}++\x{c1}\x{c2} + 0: ++\x{c1} + !\x{c0}+++++ + 0: \x{c0}++ + +# Without PCRE_UCP, non-ASCII always fail, even if < 256 + +/\b...\B/utf + abc_ + 0: abc +\= Expect no match + \x{37e}abc\x{376} +No match + \x{37e}\x{376}\x{371}\x{393}\x{394} +No match + !\x{c0}++\x{c1}\x{c2} +No match + !\x{c0}+++++ +No match + +# With PCRE_UCP, non-UTF8 chars that are < 256 still check properties + +/\b...\B/ucp + abc_ + 0: abc + !\x{c0}++\x{c1}\x{c2} + 0: ++\xc1 + !\x{c0}+++++ + 0: \xc0++ + +# Some of these are silly, but they check various combinations + +/[[:^alpha:][:^cntrl:]]+/B,utf,ucp +------------------------------------------------------------------ + Bra + [\P{L}\P{Cc}]++ + Ket + End +------------------------------------------------------------------ + 123 + 0: 123 + abc + 0: abc + +/[[:^cntrl:][:^alpha:]]+/B,utf,ucp +------------------------------------------------------------------ + Bra + [\P{Cc}\P{L}]++ + Ket + End +------------------------------------------------------------------ + 123 + 0: 123 + abc + 0: abc + +/[[:alpha:]]+/B,utf,ucp 
+------------------------------------------------------------------ + Bra + [\p{L}]++ + Ket + End +------------------------------------------------------------------ + abc + 0: abc + +/[[:^alpha:]\S]+/B,utf,ucp +------------------------------------------------------------------ + Bra + [\P{L}\P{Xsp}]++ + Ket + End +------------------------------------------------------------------ + 123 + 0: 123 + abc + 0: abc + +/[^\d]+/B,utf,ucp +------------------------------------------------------------------ + Bra + [^\p{Nd}]++ + Ket + End +------------------------------------------------------------------ + abc123 + 0: abc + abc\x{123} + 0: abc\x{123} + \x{660}abc + 0: abc + +/\p{Lu}+9\p{Lu}+B\p{Lu}+b/B +------------------------------------------------------------------ + Bra + prop Lu ++ + 9 + prop Lu + + B + prop Lu ++ + b + Ket + End +------------------------------------------------------------------ + +/\p{^Lu}+9\p{^Lu}+B\p{^Lu}+b/B +------------------------------------------------------------------ + Bra + notprop Lu + + 9 + notprop Lu ++ + B + notprop Lu + + b + Ket + End +------------------------------------------------------------------ + +/\P{Lu}+9\P{Lu}+B\P{Lu}+b/B +------------------------------------------------------------------ + Bra + notprop Lu + + 9 + notprop Lu ++ + B + notprop Lu + + b + Ket + End +------------------------------------------------------------------ + +/\p{Han}+X\p{Greek}+\x{370}/B,utf +------------------------------------------------------------------ + Bra + prop Han ++ + X + prop Greek + + \x{370} + Ket + End +------------------------------------------------------------------ + +/\p{Xan}+!\p{Xan}+A/B +------------------------------------------------------------------ + Bra + prop Xan ++ + ! + prop Xan + + A + Ket + End +------------------------------------------------------------------ + +/\p{Xsp}+!\p{Xsp}\t/B +------------------------------------------------------------------ + Bra + prop Xsp ++ + ! 
+ prop Xsp + \x09 + Ket + End +------------------------------------------------------------------ + +/\p{Xps}+!\p{Xps}\t/B +------------------------------------------------------------------ + Bra + prop Xps ++ + ! + prop Xps + \x09 + Ket + End +------------------------------------------------------------------ + +/\p{Xwd}+!\p{Xwd}_/B +------------------------------------------------------------------ + Bra + prop Xwd ++ + ! + prop Xwd + _ + Ket + End +------------------------------------------------------------------ + +/A+\p{N}A+\dB+\p{N}*B+\d*/B,ucp +------------------------------------------------------------------ + Bra + A++ + prop N + A++ + prop Nd + B+ + prop N *+ + B++ + prop Nd *+ + Ket + End +------------------------------------------------------------------ + +# These behaved oddly in Perl, so they are kept in this test + +/(\x{23a}\x{23a}\x{23a})?\1/i,utf +\= Expect no match + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} +No match + +/(ȺȺȺ)?\1/i,utf +\= Expect no match + ȺȺȺⱥⱥ +No match + +/(\x{23a}\x{23a}\x{23a})?\1/i,utf + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a}\x{23a}\x{23a} + +/(ȺȺȺ)?\1/i,utf + ȺȺȺⱥⱥⱥ + 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a}\x{23a}\x{23a} + +/(\x{23a}\x{23a}\x{23a})\1/i,utf +\= Expect no match + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65} +No match + +/(ȺȺȺ)\1/i,utf +\= Expect no match + ȺȺȺⱥⱥ +No match + +/(\x{23a}\x{23a}\x{23a})\1/i,utf + \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a}\x{23a}\x{23a} + +/(ȺȺȺ)\1/i,utf + ȺȺȺⱥⱥⱥ + 0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65} + 1: \x{23a}\x{23a}\x{23a} + +/(\x{2c65}\x{2c65})\1/i,utf + \x{2c65}\x{2c65}\x{23a}\x{23a} + 0: \x{2c65}\x{2c65}\x{23a}\x{23a} + 1: \x{2c65}\x{2c65} + +/(ⱥⱥ)\1/i,utf + ⱥⱥȺȺ + 0: \x{2c65}\x{2c65}\x{23a}\x{23a} + 1: \x{2c65}\x{2c65} + +/(\x{23a}\x{23a}\x{23a})\1Y/i,utf + X\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}YZ + 
0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}Y + 1: \x{23a}\x{23a}\x{23a} + +/(\x{2c65}\x{2c65})\1Y/i,utf + X\x{2c65}\x{2c65}\x{23a}\x{23a}YZ + 0: \x{2c65}\x{2c65}\x{23a}\x{23a}Y + 1: \x{2c65}\x{2c65} + +# These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE + +/^[\p{Batak}]/utf + \x{1bc0} + 0: \x{1bc0} + \x{1bff} + 0: \x{1bff} +\= Expect no match + \x{1bf4} +No match + +/^[\p{Brahmi}]/utf + \x{11000} + 0: \x{11000} + \x{1106f} + 0: \x{1106f} +\= Expect no match + \x{1104e} +No match + +/^[\p{Mandaic}]/utf + \x{840} + 0: \x{840} + \x{85e} + 0: \x{85e} +\= Expect no match + \x{85c} +No match + \x{85d} +No match + +/(\X*)(.)/s,utf + A\x{300} + 0: A + 1: + 2: A + +/^S(\X*)e(\X*)$/utf + SteÌreÌo + 0: Ste\x{301}re\x{301}o + 1: te\x{301}r + 2: \x{301}o + +/^\X/utf + ÌreÌo + 0: \x{301} + +/^a\X41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames + aX41z + 0: aX41z +\= Expect no match + aAz +No match + +/\X/ + a\=ps + 0: a + a\=ph +Partial match: a + +/\Xa/ + aa\=ps + 0: aa + aa\=ph + 0: aa + +/\X{2}/ + aa\=ps + 0: aa + aa\=ph +Partial match: aa + +/\X+a/ + a\=ps +Partial match: a + aa\=ps + 0: aa + aa\=ph +Partial match: aa + +/\X+?a/ + a\=ps +Partial match: a + ab\=ps +Partial match: ab + aa\=ps + 0: aa + aa\=ph + 0: aa + aba\=ps + 0: aba + +# These Unicode 6.1.0 scripts are not known to Perl. 
+ +/\p{Chakma}\d/utf,ucp + \x{11100}\x{1113c} + 0: \x{11100}\x{1113c} + +/\p{Takri}\d/utf,ucp + \x{11680}\x{116c0} + 0: \x{11680}\x{116c0} + +/^\X/utf + A\=ps + 0: A + A\=ph +Partial match: A + A\x{300}\x{301}\=ps + 0: A\x{300}\x{301} + A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301} + A\x{301}\=ps + 0: A\x{301} + A\x{301}\=ph +Partial match: A\x{301} + +/^\X{2,3}/utf + A\=ps +Partial match: A + A\=ph +Partial match: A + AA\=ps + 0: AA + AA\=ph +Partial match: AA + A\x{300}\x{301}\=ps +Partial match: A\x{300}\x{301} + A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301} + A\x{300}\x{301}A\x{300}\x{301}\=ps + 0: A\x{300}\x{301}A\x{300}\x{301} + A\x{300}\x{301}A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301}A\x{300}\x{301} + +/^\X{2}/utf + AA\=ps + 0: AA + AA\=ph +Partial match: AA + A\x{300}\x{301}A\x{300}\x{301}\=ps + 0: A\x{300}\x{301}A\x{300}\x{301} + A\x{300}\x{301}A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301}A\x{300}\x{301} + +/^\X+/utf + AA\=ps + 0: AA + AA\=ph +Partial match: AA + +/^\X+?Z/utf + AA\=ps +Partial match: AA + AA\=ph +Partial match: AA + +/A\x{3a3}B/IBi,utf +------------------------------------------------------------------ + Bra + /i A + clist 03a3 03c2 03c3 + /i B + Ket + End +------------------------------------------------------------------ +Capture group count = 0 +Options: caseless utf +First code unit = 'A' (caseless) +Last code unit = 'B' (caseless) +Subject length lower bound = 3 + +/[\x{3a3}]/Bi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 + Ket + End +------------------------------------------------------------------ + +/[^\x{3a3}]/Bi,utf +------------------------------------------------------------------ + Bra + not clist 03a3 03c2 03c3 + Ket + End +------------------------------------------------------------------ + +/[\x{3a3}]+/Bi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 ++ + Ket + End 
+------------------------------------------------------------------ + +/[^\x{3a3}]+/Bi,utf +------------------------------------------------------------------ + Bra + not clist 03a3 03c2 03c3 ++ + Ket + End +------------------------------------------------------------------ + +/a*\x{3a3}/Bi,utf +------------------------------------------------------------------ + Bra + /i a*+ + clist 03a3 03c2 03c3 + Ket + End +------------------------------------------------------------------ + +/\x{3a3}+a/Bi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 ++ + /i a + Ket + End +------------------------------------------------------------------ + +/\x{3a3}*\x{3c2}/Bi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 * + clist 03a3 03c2 03c3 + Ket + End +------------------------------------------------------------------ + +/\x{3a3}{3}/i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + 0: \x{3a3}\x{3c3}\x{3c2} + 0+ \x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}{2,4}/i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + 0: \x{3a3}\x{3c3}\x{3c2}\x{3a3} + 0+ \x{3c3}\x{3c2} + +/\x{3a3}{2,4}?/i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + 0: \x{3a3}\x{3c3} + 0+ \x{3c2}\x{3a3}\x{3c3}\x{3c2} + +/\x{3a3}+./i,utf,aftertext + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + 0: \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} + 0+ + +/\x{3a3}++./i,utf,aftertext +\= Expect no match + \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2} +No match + +/\x{3a3}*\x{3c2}/Bi,utf +------------------------------------------------------------------ + Bra + clist 03a3 03c2 03c3 * + clist 03a3 03c2 03c3 + Ket + End +------------------------------------------------------------------ + +/[^\x{3a3}]*\x{3c2}/Bi,utf +------------------------------------------------------------------ + Bra + not clist 03a3 03c2 03c3 *+ + clist 03a3 03c2 03c3 + Ket + End 
+------------------------------------------------------------------ + +/[^a]*\x{3c2}/Bi,utf +------------------------------------------------------------------ + Bra + /i [^a]* + clist 03a3 03c2 03c3 + Ket + End +------------------------------------------------------------------ + +/ist/Bi,utf +------------------------------------------------------------------ + Bra + /i i + clist 0053 0073 017f + /i t + Ket + End +------------------------------------------------------------------ +\= Expect no match + ikt +No match + +/is+t/i,utf + iSs\x{17f}t + 0: iSs\x{17f}t +\= Expect no match + ikt +No match + +/is+?t/i,utf +\= Expect no match + ikt +No match + +/is?t/i,utf +\= Expect no match + ikt +No match + +/is{2}t/i,utf +\= Expect no match + iskt +No match + +# This property is a PCRE special + +/^\p{Xuc}/utf + $abc + 0: $ + @abc + 0: @ + `abc + 0: ` + \x{1234}abc + 0: \x{1234} +\= Expect no match + abc +No match + +/^\p{Xuc}+/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000} +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}+?/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $ +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}+?\*/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000}* +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}++/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000} +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}{3,5}/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234} +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}{3,5}?/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@` +\= Expect no match + \x{9f} +No match + +/^[\p{Xuc}]/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $ +\= Expect no match + \x{9f} +No match + +/^[\p{Xuc}]+/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000} +\= Expect no match + \x{9f} +No match + +/^\P{Xuc}/utf + abc + 0: a +\= Expect no match + $abc +No match + @abc +No match + `abc +No match + \x{1234}abc +No match + +/^[\P{Xuc}]/utf + abc + 0: a +\= Expect no match + $abc +No match 
+ @abc +No match + `abc +No match + \x{1234}abc +No match + +# Some auto-possessification tests + +/\pN+\z/B +------------------------------------------------------------------ + Bra + prop N ++ + \z + Ket + End +------------------------------------------------------------------ + +/\PN+\z/B +------------------------------------------------------------------ + Bra + notprop N ++ + \z + Ket + End +------------------------------------------------------------------ + +/\pN+/B +------------------------------------------------------------------ + Bra + prop N ++ + Ket + End +------------------------------------------------------------------ + +/\PN+/B +------------------------------------------------------------------ + Bra + notprop N ++ + Ket + End +------------------------------------------------------------------ + +/\p{Any}+\p{Any} \p{Any}+\P{Any} \p{Any}+\p{L&} \p{Any}+\p{L} \p{Any}+\p{Lu} \p{Any}+\p{Han} \p{Any}+\p{Xan} \p{Any}+\p{Xsp} \p{Any}+\p{Xps} \p{Xwd}+\p{Any} \p{Any}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + AllAny+ + AllAny + AllAny+ + notprop Any + AllAny+ + prop L& + AllAny+ + prop L + AllAny+ + prop Lu + AllAny+ + prop Han + AllAny+ + prop Xan + AllAny+ + prop Xsp + AllAny+ + prop Xps + prop Xwd + + AllAny + AllAny+ + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{L&}+\p{Any} \p{L&}+\p{L&} \P{L&}+\p{L&} \p{L&}+\p{L} \p{L&}+\p{Lu} \p{L&}+\p{Han} \p{L&}+\p{Xan} \p{L&}+\P{Xan} \p{L&}+\p{Xsp} \p{L&}+\p{Xps} \p{Xwd}+\p{L&} \p{L&}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop L& + + AllAny + prop L& + + prop L& + notprop L& ++ + prop L& + prop L& + + prop L + prop L& + + prop Lu + prop L& + + prop Han + prop L& + + prop Xan + prop L& ++ + notprop Xan + prop L& ++ + prop Xsp + prop L& ++ + prop Xps + prop Xwd + + prop L& + prop L& + + prop Xuc + Ket + End 
+------------------------------------------------------------------ + +/\p{N}+\p{Any} \p{N}+\p{L&} \p{N}+\p{L} \p{N}+\P{L} \p{N}+\P{N} \p{N}+\p{Lu} \p{N}+\p{Han} \p{N}+\p{Xan} \p{N}+\p{Xsp} \p{N}+\p{Xps} \p{Xwd}+\p{N} \p{N}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop N + + AllAny + prop N + + prop L& + prop N ++ + prop L + prop N + + notprop L + prop N ++ + notprop N + prop N ++ + prop Lu + prop N + + prop Han + prop N + + prop Xan + prop N ++ + prop Xsp + prop N ++ + prop Xps + prop Xwd + + prop N + prop N + + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{Lu}+\p{Any} \p{Lu}+\p{L&} \p{Lu}+\p{L} \p{Lu}+\p{Lu} \P{Lu}+\p{Lu} \p{Lu}+\p{Nd} \p{Lu}+\P{Nd} \p{Lu}+\p{Han} \p{Lu}+\p{Xan} \p{Lu}+\p{Xsp} \p{Lu}+\p{Xps} \p{Xwd}+\p{Lu} \p{Lu}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Lu + + AllAny + prop Lu + + prop L& + prop Lu + + prop L + prop Lu + + prop Lu + notprop Lu ++ + prop Lu + prop Lu ++ + prop Nd + prop Lu + + notprop Nd + prop Lu + + prop Han + prop Lu + + prop Xan + prop Lu ++ + prop Xsp + prop Lu ++ + prop Xps + prop Xwd + + prop Lu + prop Lu + + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{Han}+\p{Lu} \p{Han}+\p{L&} \p{Han}+\p{L} \p{Han}+\p{Lu} \p{Han}+\p{Arabic} \p{Arabic}+\p{Arabic} \p{Han}+\p{Xan} \p{Han}+\p{Xsp} \p{Han}+\p{Xps} \p{Xwd}+\p{Han} \p{Han}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Han + + prop Lu + prop Han + + prop L& + prop Han + + prop L + prop Han + + prop Lu + prop Han ++ + prop Arabic + prop Arabic + + prop Arabic + prop Han + + prop Xan + prop Han + + prop Xsp + prop Han + + prop Xps + prop Xwd + + prop Han + prop Han + + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{Xan}+\p{Any} \p{Xan}+\p{L&} \P{Xan}+\p{L&} \p{Xan}+\p{L} 
\p{Xan}+\p{Lu} \p{Xan}+\p{Han} \p{Xan}+\p{Xan} \p{Xan}+\P{Xan} \p{Xan}+\p{Xsp} \p{Xan}+\p{Xps} \p{Xwd}+\p{Xan} \p{Xan}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xan + + AllAny + prop Xan + + prop L& + notprop Xan ++ + prop L& + prop Xan + + prop L + prop Xan + + prop Lu + prop Xan + + prop Han + prop Xan + + prop Xan + prop Xan ++ + notprop Xan + prop Xan ++ + prop Xsp + prop Xan ++ + prop Xps + prop Xwd + + prop Xan + prop Xan + + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{Xsp}+\p{Any} \p{Xsp}+\p{L&} \p{Xsp}+\p{L} \p{Xsp}+\p{Lu} \p{Xsp}+\p{Han} \p{Xsp}+\p{Xan} \p{Xsp}+\p{Xsp} \P{Xsp}+\p{Xsp} \p{Xsp}+\p{Xps} \p{Xwd}+\p{Xsp} \p{Xsp}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xsp + + AllAny + prop Xsp ++ + prop L& + prop Xsp ++ + prop L + prop Xsp ++ + prop Lu + prop Xsp + + prop Han + prop Xsp ++ + prop Xan + prop Xsp + + prop Xsp + notprop Xsp ++ + prop Xsp + prop Xsp + + prop Xps + prop Xwd ++ + prop Xsp + prop Xsp + + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{Xwd}+\p{Any} \p{Xwd}+\p{L&} \p{Xwd}+\p{L} \p{Xwd}+\p{Lu} \p{Xwd}+\p{Han} \p{Xwd}+\p{Xan} \p{Xwd}+\p{Xsp} \p{Xwd}+\p{Xps} \p{Xwd}+\p{Xwd} \p{Xwd}+\P{Xwd} \p{Xwd}+\p{Xuc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xwd + + AllAny + prop Xwd + + prop L& + prop Xwd + + prop L + prop Xwd + + prop Lu + prop Xwd + + prop Han + prop Xwd + + prop Xan + prop Xwd ++ + prop Xsp + prop Xwd ++ + prop Xps + prop Xwd + + prop Xwd + prop Xwd ++ + notprop Xwd + prop Xwd + + prop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{Xuc}+\p{Any} \p{Xuc}+\p{L&} \p{Xuc}+\p{L} \p{Xuc}+\p{Lu} \p{Xuc}+\p{Han} \p{Xuc}+\p{Xan} \p{Xuc}+\p{Xsp} \p{Xuc}+\p{Xps} \p{Xwd}+\p{Xuc} \p{Xuc}+\p{Xuc} \p{Xuc}+\P{Xuc}/Bx,ucp 
+------------------------------------------------------------------ + Bra + prop Xuc + + AllAny + prop Xuc + + prop L& + prop Xuc + + prop L + prop Xuc + + prop Lu + prop Xuc + + prop Han + prop Xuc + + prop Xan + prop Xuc + + prop Xsp + prop Xuc + + prop Xps + prop Xwd + + prop Xuc + prop Xuc + + prop Xuc + prop Xuc ++ + notprop Xuc + Ket + End +------------------------------------------------------------------ + +/\p{N}+\p{Ll} \p{N}+\p{Nd} \p{N}+\P{Nd}/Bx,ucp +------------------------------------------------------------------ + Bra + prop N ++ + prop Ll + prop N + + prop Nd + prop N + + notprop Nd + Ket + End +------------------------------------------------------------------ + +/\p{Xan}+\p{L} \p{Xan}+\p{N} \p{Xan}+\p{C} \p{Xan}+\P{L} \P{Xan}+\p{N} \p{Xan}+\P{C}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xan + + prop L + prop Xan + + prop N + prop Xan ++ + prop C + prop Xan + + notprop L + notprop Xan ++ + prop N + prop Xan + + notprop C + Ket + End +------------------------------------------------------------------ + +/\p{L}+\p{Xan} \p{N}+\p{Xan} \p{C}+\p{Xan} \P{L}+\p{Xan} \p{N}+\p{Xan} \P{C}+\p{Xan} \p{L}+\P{Xan}/Bx,ucp +------------------------------------------------------------------ + Bra + prop L + + prop Xan + prop N + + prop Xan + prop C ++ + prop Xan + notprop L + + prop Xan + prop N + + prop Xan + notprop C + + prop Xan + prop L ++ + notprop Xan + Ket + End +------------------------------------------------------------------ + +/\p{Xan}+\p{Lu} \p{Xan}+\p{Nd} \p{Xan}+\p{Cc} \p{Xan}+\P{Ll} \P{Xan}+\p{No} \p{Xan}+\P{Cf}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xan + + prop Lu + prop Xan + + prop Nd + prop Xan ++ + prop Cc + prop Xan + + notprop Ll + notprop Xan ++ + prop No + prop Xan + + notprop Cf + Ket + End +------------------------------------------------------------------ + +/\p{Lu}+\p{Xan} \p{Nd}+\p{Xan} \p{Cs}+\p{Xan} \P{Lt}+\p{Xan} \p{Nl}+\p{Xan} 
\P{Cc}+\p{Xan} \p{Lt}+\P{Xan}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Lu + + prop Xan + prop Nd + + prop Xan + prop Cs ++ + prop Xan + notprop Lt + + prop Xan + prop Nl + + prop Xan + notprop Cc + + prop Xan + prop Lt ++ + notprop Xan + Ket + End +------------------------------------------------------------------ + +/\w+\p{P} \w+\p{Po} \w+\s \p{Xan}+\s \s+\p{Xan} \s+\w/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xwd + + prop P + prop Xwd + + prop Po + prop Xwd ++ + prop Xsp + prop Xan ++ + prop Xsp + prop Xsp ++ + prop Xan + prop Xsp ++ + prop Xwd + Ket + End +------------------------------------------------------------------ + +/\w+\P{P} \W+\p{Po} \w+\S \P{Xan}+\s \s+\P{Xan} \s+\W/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xwd + + notprop P + notprop Xwd + + prop Po + prop Xwd + + notprop Xsp + notprop Xan + + prop Xsp + prop Xsp + + notprop Xan + prop Xsp + + notprop Xwd + Ket + End +------------------------------------------------------------------ + +/\w+\p{Po} \w+\p{Pc} \W+\p{Po} \W+\p{Pc} \w+\P{Po} \w+\P{Pc}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xwd + + prop Po + prop Xwd ++ + prop Pc + notprop Xwd + + prop Po + notprop Xwd + + prop Pc + prop Xwd + + notprop Po + prop Xwd + + notprop Pc + Ket + End +------------------------------------------------------------------ + +/\p{Nl}+\p{Xan} \P{Nl}+\p{Xan} \p{Nl}+\P{Xan} \P{Nl}+\P{Xan}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Nl + + prop Xan + notprop Nl + + prop Xan + prop Nl ++ + notprop Xan + notprop Nl + + notprop Xan + Ket + End +------------------------------------------------------------------ + +/\p{Xan}+\p{Nl} \P{Xan}+\p{Nl} \p{Xan}+\P{Nl} \P{Xan}+\P{Nl}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xan + + prop Nl + notprop Xan ++ + 
prop Nl + prop Xan + + notprop Nl + notprop Xan + + notprop Nl + Ket + End +------------------------------------------------------------------ + +/\p{Xan}+\p{Nd} \P{Xan}+\p{Nd} \p{Xan}+\P{Nd} \P{Xan}+\P{Nd}/Bx,ucp +------------------------------------------------------------------ + Bra + prop Xan + + prop Nd + notprop Xan ++ + prop Nd + prop Xan + + notprop Nd + notprop Xan + + notprop Nd + Ket + End +------------------------------------------------------------------ + +# End auto-possessification tests + +/\w+/B,utf,ucp,auto_callout +------------------------------------------------------------------ + Bra + Callout 255 0 3 + prop Xwd ++ + Callout 255 3 0 + Ket + End +------------------------------------------------------------------ + abcd +--->abcd + +0 ^ \w+ + +3 ^ ^ End of pattern + 0: abcd + +/[\p{N}]?+/B,no_auto_possess +------------------------------------------------------------------ + Bra + [\p{N}]?+ + Ket + End +------------------------------------------------------------------ + +/[\p{L}ab]{2,3}+/B,no_auto_possess +------------------------------------------------------------------ + Bra + [ab\p{L}]{2,3}+ + Ket + End +------------------------------------------------------------------ + +/\D+\X \d+\X \S+\X \s+\X \W+\X \w+\X \R+\X \H+\X \h+\X \V+\X \v+\X a+\X \n+\X .+\X/Bx +------------------------------------------------------------------ + Bra + \D+ + extuni + \d+ + extuni + \S+ + extuni + \s+ + extuni + \W+ + extuni + \w+ + extuni + \R+ + extuni + \H+ + extuni + \h+ + extuni + \V+ + extuni + \v+ + extuni + a+ + extuni + \x0a+ + extuni + Any+ + extuni + Ket + End +------------------------------------------------------------------ + +/.+\X/Bsx +------------------------------------------------------------------ + Bra + AllAny+ + extuni + Ket + End +------------------------------------------------------------------ + +/\X+$/Bmx +------------------------------------------------------------------ + Bra + extuni+ + /m $ + Ket + End 
+------------------------------------------------------------------ + +/\X+\D \X+\d \X+\S \X+\s \X+\W \X+\w \X+. \X+\R \X+\H \X+\h \X+\V \X+\v \X+\X \X+\Z \X+\z \X+$/Bx +------------------------------------------------------------------ + Bra + extuni+ + \D + extuni+ + \d + extuni+ + \S + extuni+ + \s + extuni+ + \W + extuni+ + \w + extuni+ + Any + extuni+ + \R + extuni+ + \H + extuni+ + \h + extuni+ + \V + extuni+ + \v + extuni+ + extuni + extuni+ + \Z + extuni++ + \z + extuni+ + $ + Ket + End +------------------------------------------------------------------ + +/\d+\s{0,5}=\s*\S?=\w{0,4}\W*/B,utf,ucp +------------------------------------------------------------------ + Bra + prop Nd ++ + prop Xsp {0,5}+ + = + prop Xsp *+ + notprop Xsp ? + = + prop Xwd {0,4}+ + notprop Xwd *+ + Ket + End +------------------------------------------------------------------ + +/[RST]+/Bi,utf,ucp +------------------------------------------------------------------ + Bra + [R-Tr-t\x{17f}]++ + Ket + End +------------------------------------------------------------------ + +/[R-T]+/Bi,utf,ucp +------------------------------------------------------------------ + Bra + [R-Tr-t\x{17f}]++ + Ket + End +------------------------------------------------------------------ + +/[Q-U]+/Bi,utf,ucp +------------------------------------------------------------------ + Bra + [Q-Uq-u\x{17f}]++ + Ket + End +------------------------------------------------------------------ + +/^s?c/Iim,utf +Capture group count = 0 +Options: caseless multiline utf +First code unit at start or follows newline +Last code unit = 'c' (caseless) +Subject length lower bound = 1 + scat + 0: sc + +/\X?abc/utf,no_start_optimize + \xff\x7f\x00\x00\x03\x00\x41\xcc\x80\x41\x{300}\x61\x62\x63\x00\=no_utf_check,offset=06 + 0: A\x{300}abc + +/\x{100}\x{200}\K\x{300}/utf,startchar + \x{100}\x{200}\x{300} + 0: \x{100}\x{200}\x{300} + ^^^^^^^^^^^^^^ + +# Test UTF characters in a substitution + +/ábc/utf,replace=XሴZ + 123ábc123 + 1: 
123X\x{1234}Z123 + +/(?<=abc)(|def)/g,utf,replace=<$0> + 123abcáyzabcdef789abcሴqr + 4: 123abc<>\x{e1}yzabc<>789abc<>\x{1234}qr + +/[A-`]/iB,utf +------------------------------------------------------------------ + Bra + [A-z\x{212a}\x{17f}] + Ket + End +------------------------------------------------------------------ + abcdefghijklmno + 0: a + +/(?<=\K\x{17f})/g,utf,aftertext + \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f}\x{17f}\x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f}\x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f} + 0: \x{17f} + 0+ + +/(?<=\K\x{17f})/altglobal,utf,aftertext + \x{17f}\x{17f}\x{17f}\x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f}\x{17f}\x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f}\x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f}\x{17f} + 0: \x{17f} + 0+ \x{17f} + 0: \x{17f} + 0+ + +"\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?'X'u'(?'c'(?'z'(?\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?())(?))(?\g{d});\g{x}\x11\g{d}\x81\|$((?'X'\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?'X'28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5" +Failed: error 122 at offset 1227: unmatched closing parenthesis + +/$(&.+[\p{Me}].\s\xdcC*?(?())(?)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/ + +"(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'" +Failed: error 162 at offset 113: subpattern name expected + +/[\pS#moq]/ + = + 0: = + 
+/(*:a\x{12345}b\t(d\)c)xxx/utf,alt_verbnames,mark + cxxxz + 0: xxx +MK: a\x{12345}b\x{09}(d)c + +/abcd/utf,replace=x\x{824}y\o{3333}z(\Q12\$34$$\x34\E5$$),substitute_extended + abcd + 1: x\x{824}y\x{6db}z(12\$34$$\x345$) + +/a(\x{e0}\x{101})(\x{c0}\x{102})/utf,replace=a\u$1\U$1\E$1\l$2\L$2\Eab\U\x{e0}\x{101}\L\x{d0}\x{160}\EDone,substitute_extended + a\x{e0}\x{101}\x{c0}\x{102} + 1: a\x{c0}\x{101}\x{c0}\x{100}\x{e0}\x{101}\x{e0}\x{102}\x{e0}\x{103}ab\x{c0}\x{100}\x{f0}\x{161}Done + +/((?\d)|(?\p{L}))/g,substitute_extended,replace=<${digit:+digit; :not digit; }${letter:+letter:not a letter}> + ab12cde + 7: + +/(*UCP)(*UTF)[[:>:]]X/B +------------------------------------------------------------------ + Bra + \b + Assert back + Reverse + prop Xwd + Ket + X + Ket + End +------------------------------------------------------------------ + +/abc/utf,replace=xyz + abc\=zero_terminate + 1: xyz + +/a[[:punct:]b]/ucp,bincode +------------------------------------------------------------------ + Bra + a + [b[:punct:]] + Ket + End +------------------------------------------------------------------ + +/a[[:punct:]b]/utf,ucp,bincode +------------------------------------------------------------------ + Bra + a + [b[:punct:]] + Ket + End +------------------------------------------------------------------ + +/a[b[:punct:]]/utf,ucp,bincode +------------------------------------------------------------------ + Bra + a + [b[:punct:]] + Ket + End +------------------------------------------------------------------ + +/[[:^ascii:]]/utf,ucp,bincode +------------------------------------------------------------------ + Bra + [\x80-\xff] (neg) + Ket + End +------------------------------------------------------------------ + +/[[:^ascii:]\w]/utf,ucp,bincode +------------------------------------------------------------------ + Bra + [\x80-\xff\p{Xwd}\x{100}-\x{10ffff}] + Ket + End +------------------------------------------------------------------ + +/[\w[:^ascii:]]/utf,ucp,bincode 
+------------------------------------------------------------------ + Bra + [\x80-\xff\p{Xwd}\x{100}-\x{10ffff}] + Ket + End +------------------------------------------------------------------ + +/[^[:ascii:]\W]/utf,ucp,bincode +------------------------------------------------------------------ + Bra + [^\x00-\x7f\P{Xwd}] + Ket + End +------------------------------------------------------------------ + \x{de} + 0: \x{de} + \x{200} + 0: \x{200} +\= Expect no match + \x{300} +No match + \x{37e} +No match + +/[[:^ascii:]a]/utf,ucp,bincode +------------------------------------------------------------------ + Bra + [a\x80-\xff] (neg) + Ket + End +------------------------------------------------------------------ + +/L(?#(|++\x{0a}\x{123}\x{123}\x{123}\x{123} + +0 ^ . + +0 ^ . + +1 ^ ^ . + +2 ^ ^ End of pattern + 0: \x{123}\x{123} + +# This tests processing wide characters in extended mode. + +/XÈ€/x,utf + +# These three test a bug fix that was not clearing up after a locale setting +# when the test or a subsequent one matched a wide character. + +//locale=C + +/[\P{Yi}]/utf +\x{2f000} + 0: \x{2f000} + +/[\P{Yi}]/utf,locale=C +\x{2f000} + 0: \x{2f000} + +/^(? +Overall options: anchored +Last code unit = 'z' +Subject length lower bound = 3 + +/(|ß)7/caseless,ucp + +/(\xc1)\1/i,ucp + \xc1\xe1\=no_jit + 0: \xc1\xe1 + 1: \xc1 + +# End of testinput5 diff --git a/src/pcre/testdata/testoutput8 b/src/pcre2/testdata/testoutput6 similarity index 80% rename from src/pcre/testdata/testoutput8 rename to src/pcre2/testdata/testoutput6 index 4984376d..607b572b 100644 --- a/src/pcre/testdata/testoutput8 +++ b/src/pcre2/testdata/testoutput6 @@ -1,8 +1,10 @@ -/-- This set of tests check the DFA matching functionality of pcre_dfa_exec(), - excluding UTF and Unicode property support. The -dfa flag must be used with - pcretest when running it. --/ +# This set of tests check the DFA matching functionality of pcre2_dfa_match(), +# excluding UTF and Unicode property support. 
All matches are done using DFA, +# forced by setting a default subject modifier at the start. -< forbid 8W +#forbid_utf +#subject dfa +#newline_default lf anycrlf any /abc/ abc @@ -21,18 +23,18 @@ 0: abc abbbbbbc 0: abbbbbbc - *** Failers -No match +\= Expect no match ac No match ab No match -/a*/O +/a*/no_auto_possess a 0: a 1: aaaaaaaaaaaaaaaaa +Matched, but offsets vector is too small to show all matches 0: aaaaaaaaaaaaaaaaa 1: aaaaaaaaaaaaaaaa 2: aaaaaaaaaaaaaaa @@ -48,10 +50,7 @@ No match 12: aaaaa 13: aaaa 14: aaa -15: aa -16: a -17: - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\=ovector=10 Matched, but offsets vector is too small to show all matches 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaa @@ -63,19 +62,7 @@ Matched, but offsets vector is too small to show all matches 7: aaaaaaaaaaaaaaaaaaaaaaa 8: aaaaaaaaaaaaaaaaaaaaaa 9: aaaaaaaaaaaaaaaaaaaaa -10: aaaaaaaaaaaaaaaaaaaa -11: aaaaaaaaaaaaaaaaaaa -12: aaaaaaaaaaaaaaaaaa -13: aaaaaaaaaaaaaaaaa -14: aaaaaaaaaaaaaaaa -15: aaaaaaaaaaaaaaa -16: aaaaaaaaaaaaaa -17: aaaaaaaaaaaaa -18: aaaaaaaaaaaa -19: aaaaaaaaaaa -20: aaaaaaaaaa -21: aaaaaaaaa - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\F + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa\=dfa_shortest 0: /(a|abcd|african)/ @@ -91,8 +78,7 @@ Matched, but offsets vector is too small to show all matches /^abc/ abcdef 0: abc - *** Failers -No match +\= Expect no match xyzabc No match xyz\nabc @@ -103,16 +89,14 @@ No match 0: abc xyz\nabc 0: abc - *** Failers -No match +\= Expect no match xyzabc No match /\Aabc/ abcdef 0: abc - *** Failers -No match +\= Expect no match xyzabc No match xyz\nabc @@ -121,8 +105,7 @@ No match /\Aabc/m abcdef 0: abc - *** Failers -No match +\= Expect no match xyzabc No match xyz\nabc @@ -131,13 +114,12 @@ No match /\Gabc/ abcdef 0: abc - xyzabc\>3 + xyzabc\=offset=3 0: abc - *** Failers -No match +\= Expect no match xyzabc No match - xyzabc\>2 + xyzabc\=offset=2 No match /x\dy\Dz/ @@ -145,8 +127,7 @@ No match 0: x9yzz x0y+z 
0: x0y+z - *** Failers -No match +\= Expect no match xyz No match xxy0z @@ -157,8 +138,7 @@ No match 0: x yzz x y+z 0: x y+z - *** Failers -No match +\= Expect no match xyz No match xxyyz @@ -167,8 +147,7 @@ No match /x\wy\Wz/ xxy+z 0: xxy+z - *** Failers -No match +\= Expect no match xxy0z No match x+y+z @@ -179,8 +158,7 @@ No match 0: x+y x-y 0: x-y - *** Failers -No match +\= Expect no match x\ny No match @@ -199,8 +177,7 @@ No match 0: a+bc\x0adp+q x\nyp+q 0: x\x0ayp+q - *** Failers -No match +\= Expect no match a\nbc\ndp+q No match a+bc\ndp\nq @@ -211,8 +188,7 @@ No match /a\d\z/ ba0 0: a0 - *** Failers -No match +\= Expect no match ba0\n No match ba0\ncd @@ -221,8 +197,7 @@ No match /a\d\z/m ba0 0: a0 - *** Failers -No match +\= Expect no match ba0\n No match ba0\ncd @@ -233,8 +208,7 @@ No match 0: a0 ba0\n 0: a0 - *** Failers -No match +\= Expect no match ba0\ncd No match @@ -243,8 +217,7 @@ No match 0: a0 ba0\n 0: a0 - *** Failers -No match +\= Expect no match ba0\ncd No match @@ -253,8 +226,7 @@ No match 0: a0 ba0\n 0: a0 - *** Failers -No match +\= Expect no match ba0\ncd No match @@ -265,8 +237,6 @@ No match 0: a0 ba0\ncd 0: a0 - *** Failers -No match /abc/i abc @@ -301,8 +271,7 @@ No match 0: xxxyz axxxxyzq 0: xxxyz - *** Failers -No match +\= Expect no match ax No match axx @@ -313,8 +282,7 @@ No match 0: xxxyz axxxxyzq 0: xxxyz - *** Failers -No match +\= Expect no match ax No match axx @@ -333,8 +301,7 @@ No match 0: xxxyz axxxxyzq 0: xxxyz - *** Failers -No match +\= Expect no match ax No match axx @@ -344,7 +311,7 @@ No match axyzq No match -/[^a]+/O +/[^a]+/no_auto_possess bac 0: b bcdefax @@ -353,16 +320,11 @@ No match 2: bcd 3: bc 4: b - *** Failers - 0: *** F - 1: *** - 2: *** - 3: ** - 4: * +\= Expect no match aaaaa No match -/[^a]*/O +/[^a]*/no_auto_possess bac 0: b 1: @@ -372,18 +334,11 @@ No match 2: bcd 3: bc 4: b - 5: - *** Failers - 0: *** F - 1: *** - 2: *** - 3: ** - 4: * 5: aaaaa 0: -/[^a]{3,5}/O +/[^a]{3,5}/no_auto_possess xyz 0: xyz 
awxyza @@ -397,10 +352,7 @@ No match 0: bcdef 1: bcde 2: bcd - *** Failers - 0: *** F - 1: *** - 2: *** +\= Expect no match axya No match axa @@ -423,16 +375,14 @@ No match /\d+/ ab1234c56 0: 1234 - *** Failers -No match +\= Expect no match xyz No match /\D+/ ab123c56 0: ab - *** Failers - 0: *** Failers +\= Expect no match 789 No match @@ -441,8 +391,7 @@ No match 0: 5A ABC 0: A - *** Failers -No match +\= Expect no match XYZ No match @@ -453,8 +402,6 @@ No match 0: BA 9ABC 0: A - *** Failers -No match /a+/ aaaa @@ -471,8 +418,7 @@ No match 0: abcdxyz axyz 0: axyz - *** Failers -No match +\= Expect no match xyz No match @@ -487,8 +433,7 @@ No match 0: 12X 123X 0: 123X - *** Failers -No match +\= Expect no match X No match 1X @@ -505,8 +450,7 @@ No match 0: c9 d04 0: d0 - *** Failers -No match +\= Expect no match e45 No match abcd @@ -529,8 +473,7 @@ No match 0: abcd1 1234 0: 1 - *** Failers -No match +\= Expect no match e45 No match abcd @@ -547,8 +490,7 @@ No match 0: d0 abcd1234 0: abcd1 - *** Failers -No match +\= Expect no match 1234 No match e45 @@ -573,8 +515,7 @@ No match 0: d0 1234 0: 1 - *** Failers -No match +\= Expect no match abcd1234 No match e45 @@ -585,8 +526,7 @@ No match 0: ab4 bcd93 0: bcd9 - *** Failers -No match +\= Expect no match 1234 No match a36 @@ -603,16 +543,13 @@ No match 0: abcabcabc4 42xyz 0: 4 - *** Failers -No match /^(abc)+\d/ abc45 0: abc4 abcabcabc45 0: abcabcabc4 - *** Failers -No match +\= Expect no match 42xyz No match @@ -621,8 +558,7 @@ No match 0: abc4 42xyz 0: 4 - *** Failers -No match +\= Expect no match abcabcabc45 No match @@ -631,8 +567,7 @@ No match 0: abcabc4 abcabcabc45 0: abcabcabc4 - *** Failers -No match +\= Expect no match abcabcabcabc45 No match abc45 @@ -663,8 +598,7 @@ No match 0: a(b)c a(b(c))d 0: a(b(c))d - *** Failers) -No match +\= Expect no match) a(b(c)d No match @@ -679,8 +613,7 @@ No match /^(?>a*)\d/ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa9876 0: 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa9 - *** Failers -No match +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match @@ -697,40 +630,36 @@ No match 0: def> 0: <> - *** Failers -No match +\= Expect no match ababab - +0 ^ (ab|cd){3,4} + +0 ^ ( +1 ^ a +4 ^ c +2 ^^ b @@ -800,13 +726,13 @@ No match +4 ^ ^ c +2 ^ ^ b +3 ^ ^ | -+12 ^ ^ ++12 ^ ^ End of pattern +1 ^ ^ a +4 ^ ^ c 0: ababab abcdabcd --->abcdabcd - +0 ^ (ab|cd){3,4} + +0 ^ ( +1 ^ a +4 ^ c +2 ^^ b @@ -814,22 +740,22 @@ No match +1 ^ ^ a +4 ^ ^ c +5 ^ ^ d - +6 ^ ^ ) + +6 ^ ^ ){3,4} +1 ^ ^ a +4 ^ ^ c +2 ^ ^ b +3 ^ ^ | -+12 ^ ^ ++12 ^ ^ End of pattern +1 ^ ^ a +4 ^ ^ c +5 ^ ^ d - +6 ^ ^ ) -+12 ^ ^ + +6 ^ ^ ){3,4} ++12 ^ ^ End of pattern 0: abcdabcd 1: abcdab abcdcdcdcdcd --->abcdcdcdcdcd - +0 ^ (ab|cd){3,4} + +0 ^ ( +1 ^ a +4 ^ c +2 ^^ b @@ -837,26 +763,25 @@ No match +1 ^ ^ a +4 ^ ^ c +5 ^ ^ d - +6 ^ ^ ) + +6 ^ ^ ){3,4} +1 ^ ^ a +4 ^ ^ c +5 ^ ^ d - +6 ^ ^ ) -+12 ^ ^ + +6 ^ ^ ){3,4} ++12 ^ ^ End of pattern +1 ^ ^ a +4 ^ ^ c +5 ^ ^ d - +6 ^ ^ ) -+12 ^ ^ + +6 ^ ^ ){3,4} ++12 ^ ^ End of pattern 0: abcdcdcd 1: abcdcd /^abc/ abcdef 0: abc - *** Failers -No match - abcdef\B +\= Expect no match + abcdef\=notbol No match /^(a*|xyz)/ @@ -867,11 +792,10 @@ No match xyz 0: xyz 1: - xyz\N + xyz\=notempty 0: xyz - *** Failers - 0: - bcd\N +\= Expect no match + bcd\=notempty No match /xyz$/ @@ -879,11 +803,10 @@ No match 0: xyz xyz\n 0: xyz - *** Failers -No match - xyz\Z +\= Expect no match + xyz\=noteol No match - xyz\n\Z + xyz\n\=noteol No match /xyz$/m @@ -893,119 +816,100 @@ No match 0: xyz abcxyz\npqr 0: xyz - abcxyz\npqr\Z + abcxyz\npqr\=noteol 0: xyz - xyz\n\Z + xyz\n\=noteol 0: xyz - *** Failers -No match - xyz\Z +\= Expect no match + xyz\=noteol No match /\Gabc/ abcdef 0: abc - defabcxyz\>3 + defabcxyz\=offset=3 0: abc - *** Failers -No match +\= Expect no match defabcxyz No match /^abcdef/ - ab\P + ab\=ps Partial match: ab - abcde\P + abcde\=ps Partial match: abcde - 
abcdef\P + abcdef\=ps 0: abcdef - *** Failers -No match - abx\P +\= Expect no match + abx\=ps No match /^a{2,4}\d+z/ - a\P + a\=ps Partial match: a - aa\P + aa\=ps Partial match: aa - aa2\P + aa2\=ps Partial match: aa2 - aaa\P + aaa\=ps Partial match: aaa - aaa23\P + aaa23\=ps Partial match: aaa23 - aaaa12345\P + aaaa12345\=ps Partial match: aaaa12345 - aa0z\P + aa0z\=ps 0: aa0z - aaaa4444444444444z\P + aaaa4444444444444z\=ps 0: aaaa4444444444444z - *** Failers -No match - az\P +\= Expect no match + az\=ps No match - aaaaa\P + aaaaa\=ps No match - a56\P + a56\=ps No match /^abcdef/ - abc\P + abc\=ps Partial match: abc - def\R + def\=dfa_restart 0: def /(?<=foo)bar/ - xyzfo\P -No match - foob\P\>2 -Partial match at offset 3: foob - foobar...\R\P\>4 + foob\=ps,offset=2,allusedtext +Partial match: foob + <<< + foobar...\=ps,dfa_restart,offset=4 0: ar - xyzfo\P -No match - foobar\>2 + foobar\=offset=2 0: bar - *** Failers +\= Expect no match + xyzfo\=ps No match - xyzfo\P -No match - obar\R + obar\=dfa_restart No match /(ab*(cd|ef))+X/ - adfadadaklhlkalkajhlkjahdfasdfasdfladsfjkj\P\Z -No match - lkjhlkjhlkjhlkjhabbbbbbcdaefabbbbbbbefa\P\B\Z + lkjhlkjhlkjhlkjhabbbbbbcdaefabbbbbbbefa\=ps,notbol,noteol Partial match: abbbbbbcdaefabbbbbbbefa - cdabbbbbbbb\P\R\B\Z + cdabbbbbbbb\=ps,notbol,dfa_restart,noteol Partial match: cdabbbbbbbb - efabbbbbbbbbbbbbbbb\P\R\B\Z + efabbbbbbbbbbbbbbbb\=ps,notbol,dfa_restart,noteol Partial match: efabbbbbbbbbbbbbbbb - bbbbbbbbbbbbcdXyasdfadf\P\R\B\Z + bbbbbbbbbbbbcdXyasdfadf\=ps,notbol,dfa_restart,noteol 0: bbbbbbbbbbbbcdX - -/(a|b)/SF>testsavedregex -Compiled pattern written to testsavedregex -Study data written to testsavedregex ->>>abcxyzpqrrrabbxyyyypqAzz 0: abcxyzpqrrrabbxyyyypqAzz - *** Failers -No match +\= Expect no match abxyzpqrrabbxyyyypqAzz No match abxyzpqrrrrabbxyyyypqAzz @@ -1104,8 +1007,7 @@ No match 0: abczz abcabczz 0: abcabczz - *** Failers -No match +\= Expect no match zz No match abcabcabczz @@ -1130,8 +1032,7 @@ No 
match 0: abbbbbbbbbbbc bbbbbbbbbbbac 0: bbbbbbbbbbbac - *** Failers -No match +\= Expect no match aaac No match abbbbbbbbbbbac @@ -1154,8 +1055,7 @@ No match 0: abbbbbbbbbbbc bbbbbbbbbbbac 0: bbbbbbbbbbbac - *** Failers -No match +\= Expect no match aaac No match abbbbbbbbbbbac @@ -1172,8 +1072,7 @@ No match 0: bbabc bababc 0: bababc - *** Failers -No match +\= Expect no match bababbc No match babababc @@ -1186,8 +1085,7 @@ No match 0: bbabc bababc 0: bababc - *** Failers -No match +\= Expect no match bababbc No match babababc @@ -1210,8 +1108,7 @@ No match 0: d ething 0: e - *** Failers -No match +\= Expect no match fthing No match [thing @@ -1228,8 +1125,7 @@ No match 0: d ething 0: e - *** Failers -No match +\= Expect no match athing No match fthing @@ -1242,8 +1138,7 @@ No match 0: [ \\thing 0: \ - *** Failers - 0: * +\= Expect no match athing No match bthing @@ -1262,8 +1157,7 @@ No match 0: a fthing 0: f - *** Failers - 0: * +\= Expect no match ]thing No match cthing @@ -1306,8 +1200,7 @@ No match 0: 10 100 0: 100 - *** Failers -No match +\= Expect no match abc No match @@ -1324,8 +1217,7 @@ No match 0: xxx0 xxx1234 0: xxx1234 - *** Failers -No match +\= Expect no match xxx No match @@ -1336,12 +1228,11 @@ No match 0: xx123 123456 0: 123456 - *** Failers -No match - 123 -No match x1234 0: x1234 +\= Expect no match + 123 +No match /^.+?[0-9][0-9][0-9]$/ x123 @@ -1350,18 +1241,16 @@ No match 0: xx123 123456 0: 123456 - *** Failers -No match - 123 -No match x1234 0: x1234 +\= Expect no match + 123 +No match /^([^!]+)!(.+)=apquxz\.ixr\.zzz\.ac\.uk$/ abc!pqr=apquxz.ixr.zzz.ac.uk 0: abc!pqr=apquxz.ixr.zzz.ac.uk - *** Failers -No match +\= Expect no match !pqr=apquxz.ixr.zzz.ac.uk No match abc!=apquxz.ixr.zzz.ac.uk @@ -1374,7 +1263,8 @@ No match /:/ Well, we need a colon: somewhere 0: : - *** Fail if we don't +\= Expect no match + No match without a colon No match /([\da-f:]+)$/i @@ -1394,8 +1284,7 @@ No match 0: def Any old stuff 0: ff - *** Failers -No match +\= 
Expect no match 0zzz No match gzzz @@ -1410,8 +1299,7 @@ No match 0: .1.2.3 A.12.123.0 0: A.12.123.0 - *** Failers -No match +\= Expect no match .1.2.3333 No match 1.2.3 @@ -1424,8 +1312,7 @@ No match 0: 1 IN SOA non-sp1 non-sp2( 1 IN SOA non-sp1 non-sp2 ( 0: 1 IN SOA non-sp1 non-sp2 ( - *** Failers -No match +\= Expect no match 1IN SOA non-sp1 non-sp2( No match @@ -1442,8 +1329,7 @@ No match 0: sxk.zzz.ac.uk. x-.y-. 0: x-.y-. - *** Failers -No match +\= Expect no match -abc.peq. No match @@ -1456,8 +1342,7 @@ No match 0: *.c3-b.c *.c-a.b-c 0: *.c-a.b-c - *** Failers -No match +\= Expect no match *.0 No match *.a- @@ -1494,22 +1379,18 @@ No match 0: "abcd" ; \"\" ; rhubarb 0: "" ; rhubarb - *** Failers -No match +\= Expect no match \"1234\" : things No match /^$/ \ 0: - *** Failers -No match / ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/x ab c 0: ab c - *** Failers -No match +\= Expect no match abc No match ab cde @@ -1518,8 +1399,7 @@ No match /(?x) ^ a (?# begins with a) b\sc (?# then b c) $ (?# then end)/ ab c 0: ab c - *** Failers -No match +\= Expect no match abc No match ab cde @@ -1530,8 +1410,7 @@ No match 0: a bcd a b d 0: a b d - *** Failers -No match +\= Expect no match abcd No match ab d @@ -1646,8 +1525,7 @@ No match 0: 12345678ab 12345678__ 0: 12345678__ - *** Failers -No match +\= Expect no match 1234567 No match @@ -1660,8 +1538,7 @@ No match 0: 12345 aaaaa 0: aaaaa - *** Failers -No match +\= Expect no match 123456 No match @@ -1689,8 +1566,7 @@ No match 0: From abcd Mon Sep 01 12:33 From abcd Mon Sep 1 12:33:02 1997 0: From abcd Mon Sep 1 12:33 - *** Failers -No match +\= Expect no match From abcd Sep 01 12:33:02 1997 No match @@ -1721,8 +1597,7 @@ No match /^(\D*)(?=\d)(?!123)/ abc456 0: abc - *** Failers -No match +\= Expect no match abc123 No match @@ -1748,20 +1623,18 @@ No match /(?!^)abc/ the abc 0: abc - *** Failers -No match +\= Expect no match abc No match /(?=^)abc/ abc 0: abc - *** Failers -No match +\= Expect no match the 
abc No match -/^[ab]{1,3}(ab*|b)/O +/^[ab]{1,3}(ab*|b)/no_auto_possess aabbbbb 0: aabbbbb 1: aabbbb @@ -1770,7 +1643,7 @@ No match 4: aab 5: aa -/^[ab]{1,3}?(ab*|b)/O +/^[ab]{1,3}?(ab*|b)/no_auto_possess aabbbbb 0: aabbbbb 1: aabbbb @@ -1779,7 +1652,7 @@ No match 4: aab 5: aa -/^[ab]{1,3}?(ab*?|b)/O +/^[ab]{1,3}?(ab*?|b)/no_auto_possess aabbbbb 0: aabbbbb 1: aabbbb @@ -1788,7 +1661,7 @@ No match 4: aab 5: aa -/^[ab]{1,3}(ab*?|b)/O +/^[ab]{1,3}(ab*?|b)/no_auto_possess aabbbbb 0: aabbbbb 1: aabbbb @@ -2013,8 +1886,7 @@ No match A missing angle +/^a.b/newline=lf a\rb 0: a\x0db - *** Failers -No match +\= Expect no match a\nb No match @@ -2863,8 +2713,7 @@ No match 0: abc abc\n 0: abc - *** Failers -No match +\= Expect no match abc\ndef No match @@ -2900,30 +2749,10 @@ No match abc\100\60 0: abc@0 -/^A\8B\9C$/ - A8B9C - 0: A8B9C - *** Failers -No match - A\08B\09C -No match - -/^[A\8B\9C]+$/ - A8B9C - 0: A8B9C - *** Failers -No match - A8B9C\x00 -No match - /(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)\12\123/ abcdefghijk\12S 0: abcdefghijk\x0aS -/ab\idef/ - abidef - 0: abidef - /a{0}bc/ bc 0: bc @@ -2963,13 +2792,7 @@ No match 1: baNOTc bacccd 0: baccc - *** Failers - 0: *** Failers - 1: *** Failer - 2: *** Faile - 3: *** Fail - 4: *** Fai - 5: *** Fa +\= Expect no match anything No match b\bc @@ -3000,8 +2823,7 @@ No match /[^k]$/ abc 0: c - *** Failers - 0: s +\= Expect no match abk No match @@ -3012,8 +2834,7 @@ No match 0: bc kabc 0: abc - *** Failers - 0: ers +\= Expect no match abk No match akb @@ -3026,8 +2847,7 @@ No match 0: 12345678@a.b.c.d 123456789\@x.y.z 0: 123456789@x.y.z - *** Failers -No match +\= Expect no match 12345678\@x.y.uk No match 1234567\@a.b.c.d @@ -3083,8 +2903,7 @@ No match 1: .23 1.875000282 0: .875 - *** Failers -No match +\= Expect no match 1.235 No match @@ -3101,12 +2920,12 @@ No match 0: food is under the bar in the bar 1: food is under the bar -/foo(.*?)bar/ +/foo(.*?)bar/ The food is under the bar in the barn. 
0: food is under the bar in the bar 1: food is under the bar -/(.*)(\d*)/O +/(.*)(\d*)/no_auto_possess I have 2 numbers: 53147 Matched, but offsets vector is too small to show all matches 0: I have 2 numbers: 53147 @@ -3124,20 +2943,13 @@ Matched, but offsets vector is too small to show all matches 12: I have 2 nu 13: I have 2 n 14: I have 2 -15: I have 2 -16: I have -17: I have -18: I hav -19: I ha -20: I h -21: I /(.*)(\d+)/ I have 2 numbers: 53147 0: I have 2 numbers: 53147 1: I have 2 -/(.*?)(\d*)/O +/(.*?)(\d*)/no_auto_possess I have 2 numbers: 53147 Matched, but offsets vector is too small to show all matches 0: I have 2 numbers: 53147 @@ -3155,13 +2967,6 @@ Matched, but offsets vector is too small to show all matches 12: I have 2 nu 13: I have 2 n 14: I have 2 -15: I have 2 -16: I have -17: I have -18: I hav -19: I ha -20: I h -21: I /(.*?)(\d+)/ I have 2 numbers: 53147 @@ -3193,8 +2998,7 @@ Matched, but offsets vector is too small to show all matches /^(\D*)(?=\d)(?!123)/ ABC445 0: ABC - *** Failers -No match +\= Expect no match ABC123 No match @@ -3203,8 +3007,7 @@ No match 0: W46] -46]789 0: -46] - *** Failers -No match +\= Expect no match Wall No match Zebra @@ -3233,8 +3036,7 @@ No match 0: ] \\backslash 0: \ - *** Failers -No match +\= Expect no match -46]789 No match well @@ -3247,10 +3049,12 @@ No match /word (?:[a-zA-Z0-9]+ ){0,10}otherword/ word cat dog elephant mussel cow horse canary baboon snake shark otherword 0: word cat dog elephant mussel cow horse canary baboon snake shark otherword +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark No match /word (?:[a-zA-Z0-9]+ ){0,300}otherword/ +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark the quick brown fox and the lazy dog and several other words getting close to thirty by now I hope No match @@ -3326,25 +3130,25 @@ No match 8: /^(a){1,1}/ - bcd -No match abc 0: a aab 0: a - -/^(a){1,2}/ +\= Expect no match bcd No match + 
+/^(a){1,2}/ abc 0: a aab 0: aa 1: a - -/^(a){1,3}/ +\= Expect no match bcd No match + +/^(a){1,3}/ abc 0: a aab @@ -3354,10 +3158,11 @@ No match 0: aaa 1: aa 2: a - -/^(a){1,}/ +\= Expect no match bcd No match + +/^(a){1,}/ abc 0: a aab @@ -3376,6 +3181,9 @@ No match 5: aaa 6: aa 7: a +\= Expect no match + bcd +No match /.*\.gif/ borfle\nbib.gif\nno @@ -3440,8 +3248,7 @@ No match 0: 1234X BarFoo 0: B - *** Failers -No match +\= Expect no match abcde\nBar No match @@ -3458,8 +3265,7 @@ No match 0: abcde\x0a1234X BarFoo 0: B - *** Failers -No match +\= Expect no match abcde\nBar No match @@ -3476,8 +3282,7 @@ No match 0: abcde\x0a1234X BarFoo 0: B - *** Failers -No match +\= Expect no match abcde\nBar No match @@ -3486,14 +3291,12 @@ No match 0: abcde\x0a1234X BarFoo 0: B - *** Failers -No match +\= Expect no match abcde\nBar No match /^.*B/ - **** Failers -No match +\= Expect no match abc\nB No match @@ -3544,32 +3347,28 @@ No match /^[abcdefghijklmnopqrstuvwxy0123456789]/ n 0: n - *** Failers -No match +\= Expect no match z No match /abcde{0,0}/ abcd 0: abcd - *** Failers -No match +\= Expect no match abce No match /ab[cd]{0,0}e/ abe 0: abe - *** Failers -No match +\= Expect no match abcde No match /ab(c){0,0}d/ abd 0: abd - *** Failers -No match +\= Expect no match abcd No match @@ -3580,16 +3379,14 @@ No match 0: ab abbbb 0: abbbb - *** Failers - 0: a +\= Expect no match bbbbb No match /ab\d{0}e/ abe 0: abe - *** Failers -No match +\= Expect no match ab1e No match @@ -3599,7 +3396,7 @@ No match \"the \\\"quick\\\" brown fox\" 0: "the \"quick\" brown fox" -/.*?/g+ +/.*?/g,aftertext abc 0: abc 0+ @@ -3609,14 +3406,14 @@ No match 0: 0+ -/\b/g+ +/\b/g,aftertext abc 0: 0+ abc 0: 0+ -/\b/+g +/\b/g,aftertext abc 0: 0+ abc @@ -3643,8 +3440,7 @@ No match /a.b/ acb 0: acb - *** Failers -No match +\= Expect no match a\nb No match @@ -3685,10 +3481,11 @@ No match 0: bbbbbac /(?!\A)x/m - x\nb\n -No match a\bx\n 0: x +\= Expect no match + x\nb\n +No match /\x0{ab}/ \0{ab} @@ 
-3711,8 +3508,7 @@ No match 0: foo rfoosh 0: foo - *** Failers -No match +\= Expect no match barfoo No match towbarfoo @@ -3721,8 +3517,7 @@ No match /\w{3}(?.*/)foo" +\= Expect no match /this/is/a/very/long/line/in/deed/with/very/many/slashes/in/it/you/see/ No match @@ -3767,16 +3561,14 @@ No match 0: .230003938 1.875000282 0: .875000282 - *** Failers -No match +\= Expect no match 1.235 No match /^((?>\w+)|(?>\s+))*$/ now is the time for all good men to come to the aid of the party 0: now is the time for all good men to come to the aid of the party - *** Failers -No match +\= Expect no match this is not a line with only words and spaces! No match @@ -3796,8 +3588,7 @@ No match /((?>\d+))(\w)/ 12345a 0: 12345a - *** Failers -No match +\= Expect no match 12345+ No match @@ -3840,13 +3631,12 @@ No match 4: abc(ade) 5: abc -/\(((?>[^()]+)|\([^()]+\))+\)/ +/\(((?>[^()]+)|\([^()]+\))+\)/ (abc) 0: (abc) (abc(def)xyz) 0: (abc(def)xyz) - *** Failers -No match +\= Expect no match ((()aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match @@ -3855,8 +3645,7 @@ No match 0: ab Ab 0: Ab - *** Failers -No match +\= Expect no match aB No match AB @@ -3865,8 +3654,7 @@ No match /(a (?x)b c)d e/ a bcd e 0: a bcd e - *** Failers -No match +\= Expect no match a b cd e No match abcd e @@ -3877,8 +3665,7 @@ No match /(a b(?x)c d (?-x)e f)/ a bcde f 0: a bcde f - *** Failers -No match +\= Expect no match abcdef No match @@ -3887,8 +3674,7 @@ No match 0: abc aBc 0: aBc - *** Failers -No match +\= Expect no match abC No match aBC @@ -3907,8 +3693,7 @@ No match 0: abc aBc 0: aBc - *** Failers -No match +\= Expect no match ABC No match abC @@ -3921,8 +3706,7 @@ No match 0: aBc aBBc 0: aBBc - *** Failers -No match +\= Expect no match aBC No match aBBC @@ -3933,8 +3717,7 @@ No match 0: abcd abCd 0: abCd - *** Failers -No match +\= Expect no match aBCd No match abcD @@ -3947,8 +3730,7 @@ No match 0: more than MILLION more \n than Million 0: more \x0a than Million - *** Failers -No match +\= 
Expect no match MORE THAN MILLION No match more \n than \n million @@ -3961,22 +3743,20 @@ No match 0: more than MILLION more \n than Million 0: more \x0a than Million - *** Failers -No match +\= Expect no match MORE THAN MILLION No match more \n than \n million No match -/(?>a(?i)b+)+c/ +/(?>a(?i)b+)+c/ abc 0: abc aBbc 0: aBbc aBBc 0: aBBc - *** Failers -No match +\= Expect no match Abc No match abAb @@ -3989,8 +3769,7 @@ No match 0: abc aBc 0: aBc - *** Failers -No match +\= Expect no match Ab No match abC @@ -4003,8 +3782,7 @@ No match 0: xxc aBxxc 0: xxc - *** Failers -No match +\= Expect no match Abxxc No match ABxxc @@ -4017,8 +3795,7 @@ No match 0: abc: 12 0: 12 - *** Failers -No match +\= Expect no match 123 No match xyz @@ -4029,8 +3806,7 @@ No match 0: abc: 12 0: 12 - *** Failers -No match +\= Expect no match 123 No match xyz @@ -4045,8 +3821,7 @@ No match 0: cat focat 0: cat - *** Failers -No match +\= Expect no match foocat No match @@ -4059,8 +3834,7 @@ No match 0: cat focat 0: cat - *** Failers -No match +\= Expect no match foocat No match @@ -4241,8 +4015,7 @@ No match 0: 12-sep-98 12-09-98 0: 12-09-98 - *** Failers -No match +\= Expect no match sep-12-98 No match @@ -4271,8 +4044,7 @@ No match 0: bbx BBx 0: BBx - *** Failers -No match +\= Expect no match abcX No match aBCX @@ -4297,8 +4069,7 @@ No match 0: f France 0: F - *** Failers -No match +\= Expect no match Africa No match @@ -4315,8 +4086,7 @@ No match 0: z Zambesi 0: Z - *** Failers -No match +\= Expect no match aCD No match XY @@ -4325,8 +4095,7 @@ No match /(?<=foo\n)^bar/m foo\nbar 0: bar - *** Failers -No match +\= Expect no match bar No match baz\nbar @@ -4339,16 +4108,14 @@ No match 0: baz koobarbaz 0: baz - *** Failers -No match +\= Expect no match baz No match foobarbaz No match -/The following tests are taken from the Perl 5.005 test suite; some of them/ -/are compatible with 5.004, but I'd rather not have to sort them out./ -No match +# The following tests are taken from the Perl 
5.005 test suite; some of them +# are compatible with 5.004, but I'd rather not have to sort them out. /abc/ abc @@ -4357,8 +4124,7 @@ No match 0: abc ababc 0: abc - *** Failers -No match +\= Expect no match xbc No match axc @@ -4393,8 +4159,7 @@ No match /ab+bc/ abbc 0: abbc - *** Failers -No match +\= Expect no match abc No match abq @@ -4417,8 +4182,7 @@ No match 0: abbbbc /ab{4,5}bc/ - *** Failers -No match +\= Expect no match abq No match abbbbc @@ -4447,8 +4211,7 @@ No match /^abc$/ abc 0: abc - *** Failers -No match +\= Expect no match abbbbc No match abcc @@ -4463,10 +4226,9 @@ No match /abc$/ aabc 0: abc - *** Failers -No match aabc 0: abc +\= Expect no match aabcd No match @@ -4491,8 +4253,7 @@ No match /a[bc]d/ abd 0: abd - *** Failers -No match +\= Expect no match axyzd No match abc @@ -4525,8 +4286,7 @@ No match /a[^bc]d/ aed 0: aed - *** Failers -No match +\= Expect no match abd No match abd @@ -4539,10 +4299,9 @@ No match /a[^]b]c/ adc 0: adc - *** Failers -No match a-c 0: a-c +\= Expect no match a]c No match @@ -4555,8 +4314,7 @@ No match 0: a /\by\b/ - *** Failers -No match +\= Expect no match xy No match yz @@ -4565,8 +4323,7 @@ No match No match /\Ba\B/ - *** Failers - 0: a +\= Expect no match a- No match -a @@ -4593,10 +4350,7 @@ No match /\W/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match a No match @@ -4607,10 +4361,7 @@ No match /a\Sb/ a-b 0: a-b - *** Failers -No match - a-b - 0: a-b +\= Expect no match a b No match @@ -4621,10 +4372,7 @@ No match /\D/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match 1 No match @@ -4635,10 +4383,7 @@ No match /[\W]/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match a No match @@ -4649,10 +4394,7 @@ No match /a[\S]b/ a-b 0: a-b - *** Failers -No match - a-b - 0: a-b +\= Expect no match a b No match @@ -4663,10 +4405,7 @@ No match /[\D]/ - 0: - - *** Failers - 0: * - - - 0: - +\= Expect no match 1 No match @@ -4693,6 +4432,9 @@ No match 0: a((b /a\\b/ + a\\b + 0: a\b +\= Expect 
no match a\b No match @@ -4754,14 +4496,11 @@ No match 0: cde /abc/ - *** Failers -No match +\= Expect no match b No match - /a*/ - /([abc])*d/ abbbcd @@ -4833,8 +4572,7 @@ No match 0: adcdcde /a[bcd]+dcdcde/ - *** Failers -No match +\= Expect no match abcde No match adcdcde @@ -4863,8 +4601,7 @@ No match 0: ij reffgz 0: effgz - *** Failers -No match +\= Expect no match effg No match bcdd @@ -4879,8 +4616,7 @@ No match 0: a /multiple words of text/ - *** Failers -No match +\= Expect no match aa No match uh-uh @@ -4919,8 +4655,7 @@ No match 0: ABC ABABC 0: ABC - *** Failers -No match +\= Expect no match aaxabxbaxbbx No match XBC @@ -4953,8 +4688,7 @@ No match 0: ABBC /ab+bc/i - *** Failers -No match +\= Expect no match ABC No match ABQ @@ -4979,8 +4713,7 @@ No match 0: ABBBBC /ab{4,5}?bc/i - *** Failers -No match +\= Expect no match ABQ No match ABBBBC @@ -5009,8 +4742,7 @@ No match /^abc$/i ABC 0: ABC - *** Failers -No match +\= Expect no match ABBBBC No match ABCC @@ -5045,10 +4777,9 @@ No match 0: AXYZC /a.*c/i - *** Failers -No match AABC 0: AABC +\= Expect no match AXYZD No match @@ -5059,8 +4790,7 @@ No match /a[b-d]e/i ACE 0: ACE - *** Failers -No match +\= Expect no match ABC No match ABD @@ -5093,8 +4823,7 @@ No match /a[^-b]c/i ADC 0: ADC - *** Failers -No match +\= Expect no match ABD No match A-C @@ -5115,8 +4844,7 @@ No match 0: EF /$b/i - *** Failers -No match +\= Expect no match A]C No match B @@ -5133,7 +4861,8 @@ No match 0: A((B /a\\b/i - A\B +\= Expect no match + A\=notbol No match /((a))/i @@ -5211,7 +4940,6 @@ No match /abc/i /a*/i - /([abc])*d/i ABBBCD @@ -5248,6 +4976,7 @@ No match 0: HIJ /^(ab|cd)e/i +\= Expect no match ABCDE No match @@ -5309,8 +5038,7 @@ No match 0: IJ REFFGZ 0: EFFGZ - *** Failers -No match +\= Expect no match ADCDCDE No match EFFG @@ -5335,8 +5063,7 @@ No match 0: C /multiple words of text/i - *** Failers -No match +\= Expect no match AA No match UH-UH @@ -5482,8 +5209,7 @@ No match /(?<=a)b/ ab 0: b - *** Failers -No 
match +\= Expect no match cb No match b @@ -5548,8 +5274,7 @@ No match 0: Ab /(?:(?i)a)b/ - *** Failers -No match +\= Expect no match cb No match aB @@ -5574,8 +5299,7 @@ No match 0: Ab /(?i:a)b/ - *** Failers -No match +\= Expect no match aB No match aB @@ -5600,10 +5324,9 @@ No match 0: aB /(?:(?-i)a)b/i - *** Failers -No match aB 0: aB +\= Expect no match Ab No match @@ -5618,8 +5341,7 @@ No match 0: aB /(?:(?-i)a)b/i - *** Failers -No match +\= Expect no match Ab No match AB @@ -5644,8 +5366,7 @@ No match 0: aB /(?-i:a)b/i - *** Failers -No match +\= Expect no match AB No match Ab @@ -5662,8 +5383,7 @@ No match 0: aB /(?-i:a)b/i - *** Failers -No match +\= Expect no match Ab No match AB @@ -5672,8 +5392,7 @@ No match /((?-i:a))b/i /((?-i:a.))b/i - *** Failers -No match +\= Expect no match AB No match a\nB @@ -5709,8 +5428,7 @@ No match 0: aaac /(?.*)(?<=(abcd|wxyz))/ alphabetabcd 0: alphabetabcd endingwxyz 0: endingwxyz - *** Failers -No match +\= Expect no match a rather long string that doesn't end with one of them No match /word (?>(?:(?!otherword)[a-zA-Z0-9]+ ){0,30})otherword/ word cat dog elephant mussel cow horse canary baboon snake shark otherword 0: word cat dog elephant mussel cow horse canary baboon snake shark otherword +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark No match /word (?>[a-zA-Z0-9]+ ){0,30}otherword/ +\= Expect no match word cat dog elephant mussel cow horse canary baboon snake shark the quick brown fox and the lazy dog and several other words getting close to thirty by now I hope No match @@ -5928,8 +5640,7 @@ No match 0: foo 123999foo 0: foo - *** Failers -No match +\= Expect no match 123abcfoo No match @@ -5938,8 +5649,7 @@ No match 0: foo 123999foo 0: foo - *** Failers -No match +\= Expect no match 123abcfoo No match @@ -5948,8 +5658,7 @@ No match 0: foo 123456foo 0: foo - *** Failers -No match +\= Expect no match 123999foo No match @@ -5958,8 +5667,7 @@ No match 0: foo 123456foo 0: foo - *** 
Failers -No match +\= Expect no match 123999foo No match @@ -6000,18 +5708,6 @@ No match 0: 0: -/^[\d-a]/ - abcde - 0: a - -things - 0: - - 0digit - 0: 0 - *** Failers -No match - bcdef -No match - /[[:space:]]+/ > \x09\x0a\x0c\x0d\x0b< 0: \x09\x0a\x0c\x0d\x0b @@ -6037,7 +5733,8 @@ No match 0: x /(?!^)x/m - a\nxb\n +\= Expect no match + a\nxb\n No match /abc\Qabc\Eabc/ @@ -6051,8 +5748,7 @@ No match / abc\Q abc\Eabc/x abc abcabc 0: abc abcabc - *** Failers -No match +\= Expect no match abcabcabc No match @@ -6092,8 +5788,7 @@ No match /\Gabc/ abc 0: abc - *** Failers -No match +\= Expect no match xyzabc No match @@ -6111,8 +5806,7 @@ No match /a(?x: b c )d/ XabcdY 0: abcd - *** Failers -No match +\= Expect no match Xa b c d Y No match @@ -6125,8 +5819,7 @@ No match /(?i)AB(?-i)C/ XabCY 0: abC - *** Failers -No match +\= Expect no match XabcY No match @@ -6135,8 +5828,7 @@ No match 0: abCE DE 0: DE - *** Failers -No match +\= Expect no match abcE No match abCe @@ -6157,22 +5849,12 @@ No match 0: d ] 0: ] - *** Failers - 0: a +\= Expect no match b No match -/[\z\C]/ - z - 0: z - C - 0: C - -/\M/ - M - 0: M - /(a+)*b/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match @@ -6202,7 +5884,7 @@ No match /^(?(2)a|(1)(2))+$/ 123a -Error -17 (backreference condition or recursion test not supported for DFA matching) +Failed: error -40: backreference condition or recursion test is not supported for DFA matching /(?<=a|bbbb)c/ ac @@ -6210,91 +5892,40 @@ Error -17 (backreference condition or recursion test not supported for DFA match bbbbc 0: c -/abc/SS>testsavedregex -Compiled pattern written to testsavedregex -testsavedregex -Compiled pattern written to testsavedregex -testsavedregex -Compiled pattern written to testsavedregex -Study data written to testsavedregex -testsavedregex -Compiled pattern written to testsavedregex -Study data written to testsavedregex - - 0: abc - xyz\r\nabc\ + xyz\r\nabc 0: abc - xyz\rabc\ - 0: abc - 
xyz\r\nabc\ - 0: abc - ** Failers -No match - xyz\nabc\ +\= Expect no match + xyz\rabc No match - xyz\r\nabc\ + xyzabc\r No match - xyz\nabc\ + xyzabc\rpqr No match - xyz\rabc\ + xyzabc\r\n No match - xyz\rabc\ + xyzabc\r\npqr No match - -/abc$/m - xyzabc - 0: abc - xyzabc\n - 0: abc - xyzabc\npqr - 0: abc - xyzabc\r\ - 0: abc - xyzabc\rpqr\ - 0: abc - xyzabc\r\n\ - 0: abc - xyzabc\r\npqr\ + +/^abc/Im,newline=crlf +Capture group count = 0 +Options: multiline +Forced newline is CRLF +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + xyz\r\nabclf> 0: abc - ** Failers -No match - xyzabc\r -No match - xyzabc\rpqr +\= Expect no match + xyz\nabclf No match - xyzabc\r\n -No match - xyzabc\r\npqr + xyz\rabclf No match -/^abc/m - xyz\rabcdef - 0: abc - xyz\nabcdef\ - 0: abc - ** Failers -No match - xyz\nabcdef -No match - -/^abc/m - xyz\nabcdef - 0: abc - xyz\rabcdef\ - 0: abc - ** Failers -No match - xyz\rabcdef -No match - -/^abc/m - xyz\r\nabcdef - 0: abc - xyz\rabcdef\ +/^abc/Im,newline=cr +Capture group count = 0 +Options: multiline +Forced newline is CR +First code unit at start or follows newline +Last code unit = 'c' +Subject length lower bound = 3 + xyz\rabc 0: abc - ** Failers +\= Expect no match + xyz\nabc No match - xyz\rabcdef + xyz\r\nabc No match - -/.*/ + +/.*/I,newline=lf +Capture group count = 0 +May match empty string +Forced newline is LF +First code unit at start or follows newline +Subject length lower bound = 0 abc\ndef 0: abc abc\rdef 0: abc\x0ddef abc\r\ndef 0: abc\x0d - \abc\ndef + +/.*/I,newline=cr +Capture group count = 0 +May match empty string +Forced newline is CR +First code unit at start or follows newline +Subject length lower bound = 0 + abc\ndef 0: abc\x0adef - \abc\rdef + abc\rdef 0: abc - \abc\r\ndef + abc\r\ndef 0: abc - \abc\ndef + +/.*/I,newline=crlf +Capture group count = 0 +May match empty string +Forced newline is CRLF +First code unit at start or follows newline +Subject length 
lower bound = 0 + abc\ndef 0: abc\x0adef - \abc\rdef + abc\rdef 0: abc\x0ddef - \abc\r\ndef + abc\r\ndef 0: abc +/\w+(.)(.)?def/Is +Capture group count = 2 +Options: dotall +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z +Last code unit = 'f' +Subject length lower bound = 5 + abc\ndef + 0: abc\x0adef + abc\rdef + 0: abc\x0ddef + abc\r\ndef + 0: abc\x0d\x0adef + /\w+(.)(.)?def/s abc\ndef 0: abc\x0adef @@ -6497,40 +6134,57 @@ No match 3: a /(a|)*\d/ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -No match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 - -/(?>a|)*\d/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match + +/(?>a|)*\d/ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 - -/(?:a|)*\d/ +\= Expect no match aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa No match + +/(?:a|)*\d/ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa4 +\= Expect no match + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +No match -/^a.b/ +/^a.b/newline=lf a\rb 0: a\x0db - a\nb\ - 0: a\x0ab - ** Failers +\= Expect no match + a\nb No match + +/^a.b/newline=cr a\nb + 0: a\x0ab +\= Expect no match + a\rb No match - a\nb\ + +/^a.b/newline=anycrlf + a\x85b + 0: a\x85b +\= Expect no match + a\rb No match - a\rb\ + +/^a.b/newline=any +\= Expect no match + a\nb No match - a\rb\ + a\rb +No match + a\x85b No match -/^abc./mgx +/^abc./gmx,newline=any abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x85abc7 JUNK 0: abc1 0: abc2 @@ -6540,7 +6194,7 @@ No match 0: abc6 0: abc7 -/abc.$/mgx +/abc.$/gmx,newline=any abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x85 abc9 0: abc1 
0: abc2 @@ -6550,7 +6204,7 @@ No match 0: abc6 0: abc9 -/^a\Rb/ +/^a\Rb/bsr=unicode a\nb 0: a\x0ab a\rb @@ -6563,12 +6217,11 @@ No match 0: a\x0cb a\x85b 0: a\x85b - ** Failers -No match +\= Expect no match a\n\rb No match -/^a\R*b/ +/^a\R*b/bsr=unicode ab 0: ab a\nb @@ -6588,7 +6241,7 @@ No match a\n\r\x85\x0cb 0: a\x0a\x0d\x85\x0cb -/^a\R+b/ +/^a\R+b/bsr=unicode a\nb 0: a\x0ab a\rb @@ -6605,12 +6258,11 @@ No match 0: a\x0a\x0db a\n\r\x85\x0cb 0: a\x0a\x0d\x85\x0cb - ** Failers -No match +\= Expect no match ab No match -/^a\R{1,3}b/ +/^a\R{1,3}b/bsr=unicode a\nb 0: a\x0ab a\n\rb @@ -6625,46 +6277,34 @@ No match 0: a\x0a\x0d\x0a\x0db a\n\n\r\nb 0: a\x0a\x0a\x0d\x0ab - ** Failers -No match +\= Expect no match a\n\n\n\rb No match a\r No match -/^a[\R]b/ - aRb - 0: aRb - ** Failers -No match - a\nb -No match - /.+foo/ afoo 0: afoo - ** Failers -No match +\= Expect no match \r\nfoo No match \nfoo No match -/.+foo/ +/.+foo/newline=crlf afoo 0: afoo \nfoo 0: \x0afoo - ** Failers -No match +\= Expect no match \r\nfoo No match -/.+foo/ +/.+foo/newline=any afoo 0: afoo - ** Failers -No match +\= Expect no match \nfoo No match \r\nfoo @@ -6678,36 +6318,34 @@ No match \nfoo 0: \x0afoo -/^$/mg +/^$/gm,newline=any abc\r\rxyz 0: abc\n\rxyz 0: - ** Failers -No match +\= Expect no match abc\r\nxyz No match /^X/m XABC 0: X - ** Failers -No match - XABC\B +\= Expect no match + XABC\=notbol No match -/(?m)^$/g+ +/(?m)^$/g,newline=any,aftertext abc\r\n\r\n 0: 0+ \x0d\x0a -/(?m)^$|^\r\n/g+ +/(?m)^$|^\r\n/g,newline=any,aftertext abc\r\n\r\n 0: \x0d\x0a 0+ 1: -/(?m)$/g+ +/(?m)$/g,newline=any,aftertext abc\r\n\r\n 0: 0+ \x0d\x0a\x0d\x0a @@ -6739,8 +6377,7 @@ No match 0: abcabc xyzabc 0: xyzabc - ** Failers -No match +\= Expect no match xyzxyz No match @@ -6749,20 +6386,18 @@ No match 0: X X\x0a X\x09X\x0b 0: X\x09X\x0b - ** Failers -No match +\= Expect no match \xa0 X\x0a No match -/\H*\h+\V?\v{3,4}/ +/\H*\h+\V?\v{3,4}/ \x09\x20\xa0X\x0a\x0b\x0c\x0d\x0a 0: \x09 \xa0X\x0a\x0b\x0c\x0d 
\x09\x20\xa0\x0a\x0b\x0c\x0d\x0a 0: \x09 \xa0\x0a\x0b\x0c\x0d \x09\x20\xa0\x0a\x0b\x0c 0: \x09 \xa0\x0a\x0b\x0c - ** Failers -No match +\= Expect no match \x09\x20\xa0\x0a\x0b No match @@ -6782,8 +6417,7 @@ No match 0: XNNNYZ > X NYQZ 0: X NYQZ - ** Failers -No match +\= Expect no match >XYZ No match > X NY Z @@ -6795,45 +6429,47 @@ No match >\x0a\x0dX\x0aY\x0a\x0bZZZ\x0aAAA\x0bNNN\x0c 0: \x0a\x0dX\x0aY\x0a\x0bZZZ\x0aAAA\x0bNNN\x0c -/.+A/ +/.+A/newline=crlf +\= Expect no match \r\nA No match -/\nA/ +/\nA/newline=crlf \r\nA 0: \x0aA -/[\r\n]A/ +/[\r\n]A/newline=crlf \r\nA 0: \x0aA -/(\r|\n)A/ +/(\r|\n)A/newline=crlf \r\nA 0: \x0aA -/a\Rb/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' +/a\Rb/I,bsr=anycrlf +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 a\rb 0: a\x0db a\nb 0: a\x0ab a\r\nb 0: a\x0d\x0ab - ** Failers -No match +\= Expect no match a\x85b No match a\x0bb No match -/a\Rb/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' +/a\Rb/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 a\rb 0: a\x0db a\nb @@ -6844,36 +6480,31 @@ Need char = 'b' 0: a\x85b a\x0bb 0: a\x0bb - ** Failers -No match - a\x85b\ -No match - a\x0bb\ -No match -/a\R?b/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' +/a\R?b/I,bsr=anycrlf +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 a\rb 0: a\x0db a\nb 0: a\x0ab a\r\nb 0: a\x0d\x0ab - ** Failers -No match +\= Expect no match a\x85b No match a\x0bb No match -/a\R?b/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' +/a\R?b/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last 
code unit = 'b' +Subject length lower bound = 2 a\rb 0: a\x0db a\nb @@ -6884,102 +6515,86 @@ Need char = 'b' 0: a\x85b a\x0bb 0: a\x0bb - ** Failers -No match - a\x85b\ -No match - a\x0bb\ -No match -/a\R{2,4}b/I -Capturing subpattern count = 0 -Options: bsr_anycrlf -First char = 'a' -Need char = 'b' +/a\R{2,4}b/I,bsr=anycrlf +Capture group count = 0 +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 4 a\r\n\nb 0: a\x0d\x0a\x0ab a\n\r\rb 0: a\x0a\x0d\x0db a\r\n\r\n\r\n\r\nb 0: a\x0d\x0a\x0d\x0a\x0d\x0a\x0d\x0ab - ** Failers -No match - a\x85\85b +\= Expect no match + a\x0b\x0bb No match - a\x0b\0bb + a\x85\x85b No match -/a\R{2,4}b/I -Capturing subpattern count = 0 -Options: bsr_unicode -First char = 'a' -Need char = 'b' +/a\R{2,4}b/I,bsr=unicode +Capture group count = 0 +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 4 a\r\rb 0: a\x0d\x0db a\n\n\nb 0: a\x0a\x0a\x0ab a\r\n\n\r\rb 0: a\x0d\x0a\x0a\x0d\x0db - a\x85\85b -No match - a\x0b\0bb -No match - ** Failers -No match + a\x85\x85b + 0: a\x85\x85b + a\x0b\x0bb + 0: a\x0b\x0bb +\= Expect no match a\r\r\r\r\rb No match - a\x85\85b\ -No match - a\x0b\0bb\ -No match /a(?!)|\wbc/ abc 0: abc -/a[]b/ - ** Failers -No match +/a[]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames +\= Expect no match ab No match -/a[]+b/ - ** Failers -No match +/a[]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames +\= Expect no match ab No match -/a[]*+b/ - ** Failers -No match +/a[]*+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames +\= Expect no match ab No match -/a[^]b/ +/a[^]b/alt_bsux,allow_empty_class,match_unset_backref,dupnames aXb 0: aXb a\nb 0: a\x0ab - ** Failers -No match +\= Expect no match ab No match -/a[^]+b/ +/a[^]+b/alt_bsux,allow_empty_class,match_unset_backref,dupnames aXb 0: aXb a\nX\nXb 0: a\x0aX\x0aXb - ** Failers -No match +\= Expect no match ab No match -/X$/E +/X$/dollar_endonly X 
0: X - ** Failers -No match +\= Expect no match X\n No match @@ -6989,22 +6604,29 @@ No match X\n 0: X -/xyz/C +/xyz/auto_callout xyz --->xyz +0 ^ x +1 ^^ y +2 ^ ^ z - +3 ^ ^ + +3 ^ ^ End of pattern 0: xyz abcxyz --->abcxyz +0 ^ x +1 ^^ y +2 ^ ^ z - +3 ^ ^ + +3 ^ ^ End of pattern 0: xyz - abcxyz\Y +\= Expect no match + abc +No match + abcxypqr +No match + +/xyz/auto_callout,no_start_optimize + abcxyz --->abcxyz +0 ^ x +0 ^ x @@ -7012,13 +6634,10 @@ No match +0 ^ x +1 ^^ y +2 ^ ^ z - +3 ^ ^ + +3 ^ ^ End of pattern 0: xyz - ** Failers -No match +\= Expect no match abc -No match - abc\Y --->abc +0 ^ x +0 ^ x @@ -7026,8 +6645,6 @@ No match +0 ^ x No match abcxypqr -No match - abcxypqr\Y --->abcxypqr +0 ^ x +0 ^ x @@ -7042,7 +6659,7 @@ No match +0 ^ x No match -/(*NO_START_OPT)xyz/C +/(*NO_START_OPT)xyz/auto_callout abcxyz --->abcxyz +15 ^ x @@ -7051,7 +6668,7 @@ No match +15 ^ x +16 ^^ y +17 ^ ^ z -+18 ^ ^ ++18 ^ ^ End of pattern 0: xyz /(?C)ab/ @@ -7059,147 +6676,156 @@ No match --->ab 0 ^ a 0: ab - \C-ab + ab\=callout_none 0: ab -/ab/C +/ab/auto_callout ab --->ab +0 ^ a +1 ^^ b - +2 ^ ^ + +2 ^ ^ End of pattern 0: ab - \C-ab + ab\=callout_none 0: ab -/^"((?(?=[a])[^"])|b)*"$/C +/^"((?(?=[a])[^"])|b)*"$/auto_callout "ab" --->"ab" +0 ^ ^ +1 ^ " - +2 ^^ ((?(?=[a])[^"])|b)* + +2 ^^ ( +21 ^^ " - +3 ^^ (?(?=[a])[^"]) + +3 ^^ (? +18 ^^ b - +5 ^^ (?=[a]) + +5 ^^ (?= +8 ^ [a] +11 ^^ ) +12 ^^ [^"] +16 ^ ^ ) +17 ^ ^ | +21 ^ ^ " - +3 ^ ^ (?(?=[a])[^"]) + +3 ^ ^ (? +18 ^ ^ b - +5 ^ ^ (?=[a]) + +5 ^ ^ (?= +8 ^ [a] -+19 ^ ^ ) ++19 ^ ^ )* +21 ^ ^ " - +3 ^ ^ (?(?=[a])[^"]) + +3 ^ ^ (? 
+18 ^ ^ b - +5 ^ ^ (?=[a]) + +5 ^ ^ (?= +8 ^ [a] +17 ^ ^ | +22 ^ ^ $ -+23 ^ ^ ++23 ^ ^ End of pattern 0: "ab" - \C-"ab" + "ab"\=callout_none 0: "ab" /\d+X|9+Y/ - ++++123999\P + ++++123999\=ps Partial match: 123999 - ++++123999Y\P + ++++123999Y\=ps 0: 999Y /Z(*F)/ - Z\P +\= Expect no match + Z\=ps No match - ZA\P + ZA\=ps No match /Z(?!)/ - Z\P +\= Expect no match + Z\=ps No match - ZA\P + ZA\=ps No match /dog(sbody)?/ - dogs\P + dogs\=ps 0: dog - dogs\P\P + dogs\=ph Partial match: dogs /dog(sbody)??/ - dogs\P + dogs\=ps 0: dog - dogs\P\P + dogs\=ph Partial match: dogs /dog|dogsbody/ - dogs\P + dogs\=ps 0: dog - dogs\P\P + dogs\=ph Partial match: dogs /dogsbody|dog/ - dogs\P + dogs\=ps 0: dog - dogs\P\P + dogs\=ph Partial match: dogs /Z(*F)Q|ZXY/ - Z\P + Z\=ps Partial match: Z - ZA\P +\= Expect no match + ZA\=ps No match - X\P + X\=ps No match /\bthe cat\b/ - the cat\P + the cat\=ps 0: the cat - the cat\P\P + the cat\=ph Partial match: the cat /dog(sbody)?/ - dogs\D\P + dogs\=ps 0: dog - body\D\R + body\=dfa_restart 0: body /dog(sbody)?/ - dogs\D\P\P + dogs\=ph Partial match: dogs - body\D\R + body\=dfa_restart 0: body /abc/ - abc\P + abc\=ps 0: abc - abc\P\P + abc\=ph 0: abc /abc\K123/ xyzabc123pqr -Error -16 (item unsupported for DFA matching) +Failed: error -42: pattern contains an item that is not supported for DFA matching -/(?<=abc)123/ +/(?<=abc)123/allusedtext xyzabc123pqr - 0: 123 - xyzabc12\P -Partial match at offset 6: abc12 - xyzabc12\P\P -Partial match at offset 6: abc12 - -/\babc\b/ + 0: abc123 + <<< + xyzabc12\=ps +Partial match: abc12 + <<< + xyzabc12\=ph +Partial match: abc12 + <<< + +/\babc\b/allusedtext +++abc+++ - 0: abc - +++ab\P -Partial match at offset 3: +ab - +++ab\P\P -Partial match at offset 3: +ab - -/(?=C)/g+ + 0: +abc+ + < > + +++ab\=ps +Partial match: +ab + < + +++ab\=ph +Partial match: +ab + < + +/(?=C)/g,aftertext ABCDECBA 0: 0+ CDECBA @@ -7207,82 +6833,63 @@ Partial match at offset 3: +ab 0+ CBA /(abc|def|xyz)/I -Capturing 
subpattern count = 1 -No options -No first char -No need char +Capture group count = 1 +Starting code units: a d x +Subject length lower bound = 3 terhjk;abcdaadsfe 0: abc the quick xyz brown fox 0: xyz - \Yterhjk;abcdaadsfe - 0: abc - \Ythe quick xyz brown fox - 0: xyz - ** Failers -No match +\= Expect no match thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd No match - \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd -No match -/(abc|def|xyz)/SI -Capturing subpattern count = 1 -No options -No first char -No need char -Subject length lower bound = 3 -Starting chars: a d x +/(abc|def|xyz)/I,no_start_optimize +Capture group count = 1 +Options: no_start_optimize terhjk;abcdaadsfe 0: abc - the quick xyz brown fox - 0: xyz - \Yterhjk;abcdaadsfe - 0: abc - \Ythe quick xyz brown fox + the quick xyz brown fox 0: xyz - ** Failers -No match +\= Expect no match thejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd No match - \Ythejk;adlfj aenjl;fda asdfasd ehj;kjxyasiupd -No match -/abcd*/+ - xxxxabcd\P +/abcd*/aftertext + xxxxabcd\=ps 0: abcd 0+ - xxxxabcd\P\P + xxxxabcd\=ph Partial match: abcd - dddxxx\R + dddxxx\=dfa_restart 0: ddd 0+ xxx - xxxxabcd\P\P + xxxxabcd\=ph Partial match: abcd - xxx\R + xxx\=dfa_restart 0: 0+ xxx /abcd*/i - xxxxabcd\P + xxxxabcd\=ps 0: abcd - xxxxabcd\P\P + xxxxabcd\=ph Partial match: abcd - XXXXABCD\P + XXXXABCD\=ps 0: ABCD - XXXXABCD\P\P + XXXXABCD\=ph Partial match: ABCD /abc\d*/ - xxxxabc1\P + xxxxabc1\=ps 0: abc1 - xxxxabc1\P\P + xxxxabc1\=ph Partial match: abc1 /abc[de]*/ - xxxxabcde\P + xxxxabcde\=ps 0: abcde - xxxxabcde\P\P + xxxxabcde\=ph Partial match: abcde /(?:(?1)|B)(A(*F)|C)/ @@ -7290,8 +6897,7 @@ Partial match: abcde 0: BC CCD 0: CC - ** Failers -No match +\= Expect no match CAD No match @@ -7300,8 +6906,7 @@ No match 0: CC BCD 0: BC - ** Failers -No match +\= Expect no match ABCD No match CAD @@ -7311,40 +6916,35 @@ No match /^(?!a(*SKIP)b)/ ac -Error -16 (item unsupported for DFA matching) +Failed: error -42: pattern contains an item that is 
not supported for DFA matching /^(?=a(*SKIP)b|ac)/ - ** Failers -No match ac -Error -16 (item unsupported for DFA matching) +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?=a(*THEN)b|ac)/ ac -Error -16 (item unsupported for DFA matching) +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?=a(*PRUNE)b)/ ab -Error -16 (item unsupported for DFA matching) - ** Failers -No match - ac -Error -16 (item unsupported for DFA matching) +Failed: error -42: pattern contains an item that is not supported for DFA matching /^(?(?!a(*SKIP)b))/ ac -Error -16 (item unsupported for DFA matching) +Failed: error -42: pattern contains an item that is not supported for DFA matching -/(?<=abc)def/ - abc\P\P -Partial match at offset 3: abc +/(?<=abc)def/allusedtext + abc\=ph +Partial match: abc + <<< /abc$/ abc 0: abc - abc\P + abc\=ps 0: abc - abc\P\P + abc\=ph Partial match: abc /abc$/m @@ -7352,66 +6952,68 @@ Partial match: abc 0: abc abc\n 0: abc - abc\P\P + abc\=ph Partial match: abc - abc\n\P\P + abc\n\=ph 0: abc - abc\P + abc\=ps 0: abc - abc\n\P + abc\n\=ps 0: abc /abc\z/ abc 0: abc - abc\P + abc\=ps 0: abc - abc\P\P + abc\=ph Partial match: abc /abc\Z/ abc 0: abc - abc\P + abc\=ps 0: abc - abc\P\P + abc\=ph Partial match: abc /abc\b/ abc 0: abc - abc\P + abc\=ps 0: abc - abc\P\P + abc\=ph Partial match: abc /abc\B/ - abc -No match - abc\P + abc\=ps Partial match: abc - abc\P\P + abc\=ph Partial match: abc +\= Expect no match + abc +No match /.+/ - abc\>0 + abc\=offset=0 0: abc - abc\>1 + abc\=offset=1 0: bc - abc\>2 + abc\=offset=2 0: c - abc\>3 +\= Bad offsets + abc\=offset=4 +Failed: error -33: bad offset value + abc\=offset=-4 +** Invalid value in 'offset=-4' +\= Expect no match + abc\=offset=3 No match - abc\>4 -Error -24 (bad offset value) - abc\>-4 -Error -24 (bad offset value) /^(?:a)++\w/ aaaab 0: aaaab - ** Failers -No match +\= Expect no match aaaa No match bbb @@ -7423,8 +7025,7 @@ No match 1: aa 
aaaa 0: aa - ** Failers -No match +\= Expect no match bbb No match @@ -7433,16 +7034,14 @@ No match 0: aaaab bbb 0: b - ** Failers -No match +\= Expect no match aaaa No match /^(a)++\w/ aaaab 0: aaaab - ** Failers -No match +\= Expect no match aaaa No match bbb @@ -7451,37 +7050,33 @@ No match /^(a|)++\w/ aaaab 0: aaaab - ** Failers -No match +\= Expect no match aaaa No match bbb No match -/(?=abc){3}abc/+ +/(?=abc){3}abc/aftertext abcabcabc 0: abc 0+ abcabc - ** Failers -No match +\= Expect no match xyz No match -/(?=abc)+abc/+ +/(?=abc)+abc/aftertext abcabcabc 0: abc 0+ abcabc - ** Failers -No match +\= Expect no match xyz No match -/(?=abc)++abc/+ +/(?=abc)++abc/aftertext abcabcabc 0: abc 0+ abcabc - ** Failers -No match +\= Expect no match xyz No match @@ -7490,8 +7085,7 @@ No match 0: xyz /(?=abc){1}xyz/ - ** Failers -No match +\= Expect no match xyz No match @@ -7529,7 +7123,7 @@ No match /((?2))((?1))/ abc -Error -26 (nested recursion at the same subject position) +Failed: error -52: nested recursion at the same subject position /(?(R)a+|(?R)b)/ aaaabcde @@ -7543,52 +7137,44 @@ Error -26 (nested recursion at the same subject position) aaaabcde 0: aaaab -/((?(R2)a+|(?1)b))/ +/((?(R2)a+|(?1)b))()/ aaaabcde -Error -17 (backreference condition or recursion test not supported for DFA matching) +Failed: error -40: backreference condition or recursion test is not supported for DFA matching /(?(R)a*(?1)|((?R))b)/ aaaabcde -Error -26 (nested recursion at the same subject position) +Failed: error -52: nested recursion at the same subject position -/(a+)/O - \O6aaaa +/(a+)/no_auto_possess + aaaa\=ovector=3 Matched, but offsets vector is too small to show all matches 0: aaaa 1: aaa 2: aa - \O8aaaa + aaaa\=ovector=4 0: aaaa 1: aaa 2: aa 3: a -/ab\Cde/ - abXde - 0: abXde - -/(?<=ab\Cde)X/ - abZdeX - 0: X - /^\R/ - \r\P + \r\=ps 0: \x0d - \r\P\P + \r\=ph Partial match: \x0d /^\R{2,3}x/ - \r\P + \r\=ps Partial match: \x0d - \r\P\P + \r\=ph Partial match: \x0d - \r\r\P + 
\r\r\=ps Partial match: \x0d\x0d - \r\r\P\P + \r\r\=ph Partial match: \x0d\x0d - \r\r\r\P + \r\r\r\=ps Partial match: \x0d\x0d\x0d - \r\r\r\P\P + \r\r\r\=ph Partial match: \x0d\x0d\x0d \r\rx 0: \x0d\x0dx @@ -7596,17 +7182,17 @@ Partial match: \x0d\x0d\x0d 0: \x0d\x0d\x0dx /^\R{2,3}?x/ - \r\P + \r\=ps Partial match: \x0d - \r\P\P + \r\=ph Partial match: \x0d - \r\r\P + \r\r\=ps Partial match: \x0d\x0d - \r\r\P\P + \r\r\=ph Partial match: \x0d\x0d - \r\r\r\P + \r\r\r\=ps Partial match: \x0d\x0d\x0d - \r\r\r\P\P + \r\r\r\=ph Partial match: \x0d\x0d\x0d \r\rx 0: \x0d\x0dx @@ -7614,9 +7200,9 @@ Partial match: \x0d\x0d\x0d 0: \x0d\x0d\x0dx /^\R?x/ - \r\P + \r\=ps Partial match: \x0d - \r\P\P + \r\=ph Partial match: \x0d x 0: x @@ -7624,81 +7210,81 @@ Partial match: \x0d 0: \x0dx /^\R+x/ - \r\P + \r\=ps Partial match: \x0d - \r\P\P + \r\=ph Partial match: \x0d - \r\n\P + \r\n\=ps Partial match: \x0d\x0a - \r\n\P\P + \r\n\=ph Partial match: \x0d\x0a \rx 0: \x0dx -/^a$/ - a\r\P +/^a$/newline=crlf + a\r\=ps Partial match: a\x0d - a\r\P\P + a\r\=ph Partial match: a\x0d -/^a$/m - a\r\P +/^a$/m,newline=crlf + a\r\=ps Partial match: a\x0d - a\r\P\P + a\r\=ph Partial match: a\x0d -/^(a$|a\r)/ - a\r\P +/^(a$|a\r)/newline=crlf + a\r\=ps 0: a\x0d - a\r\P\P + a\r\=ph Partial match: a\x0d -/^(a$|a\r)/m - a\r\P +/^(a$|a\r)/m,newline=crlf + a\r\=ps 0: a\x0d - a\r\P\P + a\r\=ph Partial match: a\x0d -/./ - \r\P +/./newline=crlf + \r\=ps 0: \x0d - \r\P\P + \r\=ph Partial match: \x0d -/.{2,3}/ - \r\P +/.{2,3}/newline=crlf + \r\=ps Partial match: \x0d - \r\P\P + \r\=ph Partial match: \x0d - \r\r\P + \r\r\=ps 0: \x0d\x0d - \r\r\P\P + \r\r\=ph Partial match: \x0d\x0d - \r\r\r\P + \r\r\r\=ps 0: \x0d\x0d\x0d - \r\r\r\P\P + \r\r\r\=ph Partial match: \x0d\x0d\x0d -/.{2,3}?/ - \r\P +/.{2,3}?/newline=crlf + \r\=ps Partial match: \x0d - \r\P\P + \r\=ph Partial match: \x0d - \r\r\P + \r\r\=ps 0: \x0d\x0d - \r\r\P\P + \r\r\=ph Partial match: \x0d\x0d - \r\r\r\P + \r\r\r\=ps 0: \x0d\x0d\x0d 1: \x0d\x0d 
- \r\r\r\P\P + \r\r\r\=ph Partial match: \x0d\x0d\x0d -/-- Test simple validity check for restarts --/ +# Test simple validity check for restarts /abcdef/ - abc\R -Error -30 (invalid data in workspace for DFA restart) + abc\=dfa_restart +Failed: error -38: invalid data in workspace for DFA restart /)(.)|(?R))++)*F>/ text text xxxxx text F> text2 more text. @@ -7721,10 +7307,10 @@ Error -30 (invalid data in workspace for DFA restart) 1: xx\xa0xxxxxabc /abcd/ - abcd\O0 -Matched, but offsets vector is too small to show all matches + abcd\=ovector=0 + 0: abcd -/-- These tests show up auto-possessification --/ +# These tests show up auto-possessification /[ab]*/ aaaa @@ -7785,24 +7371,528 @@ Matched, but offsets vector is too small to show all matches NON QUOTED \"QUOT\"\"ED\" AFTER \"NOT MATCHED 0: NON QUOTED "QUOT""ED" AFTER +/abc(?=xyz)/allusedtext + abcxyzpqr + 0: abcxyz + >>> + abcxyzpqr\=aftertext + 0: abcxyz + >>> + 0+ xyzpqr + +/(?<=pqr)abc(?=xyz)/allusedtext + xyzpqrabcxyzpqr + 0: pqrabcxyz + <<< >>> + xyzpqrabcxyzpqr\=aftertext + 0: pqrabcxyz + <<< >>> + 0+ xyzpqr + +/a\b/ + a.\=allusedtext + 0: a. 
+ > + a\=allusedtext + 0: a + +/abc(?=abcde)(?=ab)/allusedtext + abcabcdefg + 0: abcabcde + >>>>> + +/a*?b*?/ + ab + 0: ab + 1: a + 2: + +/(*NOTEMPTY)a*?b*?/ + ab + 0: ab + 1: a + ba + 0: b + cb + 0: b + +/(*NOTEMPTY_ATSTART)a*?b*?/aftertext + ab + 0: ab + 0+ + 1: a + cdab + 0: + 0+ dab + +/(a)(b)|(c)/ + XcX\=ovector=2,get=1,get=2,get=3,get=4,getall + 0: c +Get substring 1 failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available +Get substring 3 failed (-54): requested value is not available +Get substring 4 failed (-54): requested value is not available + 0L c + +/(?aa)/ + aa\=get=A + 0: aa +Get substring 'A' failed (-41): function is not supported for DFA matching + aa\=copy=A + 0: aa +Copy substring 'A' failed (-41): function is not supported for DFA matching + +/a+/no_auto_possess + a\=ovector=2,get=1,get=2,getall + 0: a +Get substring 1 failed (-55): requested value is not set +Get substring 2 failed (-54): requested value is not available + 0L a + aaa\=ovector=2,get=1,get=2,getall +Matched, but offsets vector is too small to show all matches + 0: aaa + 1: aa + 1G aa (2) +Get substring 2 failed (-54): requested value is not available + 0L aaa + 1L aa + +/a(b)c(d)/ + abc\=ph,copy=0,copy=1,getall +Partial match: abc + 0C abc (3) +Copy substring 1 failed (-2): partial match +get substring list failed (-2): partial match + +/ab(?C" any text with spaces ")cde/B +------------------------------------------------------------------ + Bra + ab + CalloutStr " any text with spaces " 6 30 1 + cde + Ket + End +------------------------------------------------------------------ + abcde +Callout (6): " any text with spaces " +--->abcde + ^ ^ c + 0: abcde + 12abcde +Callout (6): " any text with spaces " +--->12abcde + ^ ^ c + 0: abcde + +/^a(b)c(?C1)def/ + abcdef +--->abcdef + 1 ^ ^ d + 0: abcdef + +/^a(b)c(?C"AB")def/ + abcdef +Callout (10): "AB" +--->abcdef + ^ ^ d + 0: abcdef + +/^a(b)c(?C1)def/ + abcdef\=callout_capture 
+Callout 1: last capture = 0 +--->abcdef + ^ ^ d + 0: abcdef + +/^a(b)c(?C{AB})def/B +------------------------------------------------------------------ + Bra + ^ + a + CBra 1 + b + Ket + c + CalloutStr {AB} 10 14 1 + def + Ket + End +------------------------------------------------------------------ + abcdef\=callout_capture +Callout (10): {AB} last capture = 0 +--->abcdef + ^ ^ d + 0: abcdef + +/^(?(?C25)(?=abc)abcd|xyz)/B +------------------------------------------------------------------ + Bra + ^ + Cond + Callout 25 9 3 + Assert + abc + Ket + abcd + Alt + xyz + Ket + Ket + End +------------------------------------------------------------------ + abcdefg +--->abcdefg + 25 ^ (?= + 0: abcd + xyz123 +--->xyz123 + 25 ^ (?= + 0: xyz + +/^(?(?C$abc$)(?=abc)abcd|xyz)/B +------------------------------------------------------------------ + Bra + ^ + Cond + CalloutStr $abc$ 7 12 3 + Assert + abc + Ket + abcd + Alt + xyz + Ket + Ket + End +------------------------------------------------------------------ + abcdefg +Callout (7): $abc$ +--->abcdefg + ^ (?= + 0: abcd + xyz123 +Callout (7): $abc$ +--->xyz123 + ^ (?= + 0: xyz + +/^ab(?C'first')cd(?C"second")ef/ + abcdefg +Callout (7): 'first' +--->abcdefg + ^ ^ c +Callout (20): "second" +--->abcdefg + ^ ^ e + 0: abcdef + +/(?:a(?C`code`)){3}X/ + aaaXY +Callout (8): `code` +--->aaaXY + ^^ ){3} +Callout (8): `code` +--->aaaXY + ^ ^ ){3} +Callout (8): `code` +--->aaaXY + ^ ^ ){3} + 0: aaaX + +# Binary zero in callout string +/"a(?C'x" 00 "z')b"/hex + abcdefgh +Callout (5): 'x\x00z' +--->abcdefgh + ^^ b + 0: ab + /(?(?!)a|b)/ bbb 0: b +\= Expect no match aaa No match -/()()a+/O= - aaa\D -** Show all captures ignored after DFA matching +/^/gm + \n\n\n + 0: + 0: + 0: + +/^/gm,alt_circumflex + \n\n\n + 0: + 0: + 0: + 0: + +/abc/use_offset_limit + 1234abcde\=offset_limit=100 + 0: abc + 1234abcde\=offset_limit=9 + 0: abc + 1234abcde\=offset_limit=4 + 0: abc + 1234abcde\=offset_limit=4,offset=4 + 0: abc +\= Expect no match + 
1234abcde\=offset_limit=4,offset=5 +No match + 1234abcde\=offset_limit=3 +No match + +/(?<=abc)/use_offset_limit + 1234abc\=offset_limit=7 + 0: +\= Expect no match + 1234abc\=offset_limit=6 +No match + +/abcd/null_context + abcd\=null_context + 0: abcd + +/()()a+/no_auto_possess + aaa\=allcaptures +** Ignored after DFA matching: allcaptures 0: aaa 1: aa 2: a - a\D -** Show all captures ignored after DFA matching + a\=allcaptures +** Ignored after DFA matching: allcaptures 0: a +/(*LIMIT_DEPTH=100)^((.)(?1)|.)$/ +\= Expect depth limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] +Failed: error -53: matching depth limit exceeded + +/(*LIMIT_HEAP=0)^((.)(?1)|.)$/ +\= Expect heap limit exceeded + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] +Failed: error -63: heap limit exceeded + +/(*LIMIT_HEAP=50000)^((.)(?1)|.)$/ +\= Expect success + a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + 0: a[00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00] + /(02-)?[0-9]{3}-[0-9]{3}/ 02-123-123 0: 02-123-123 -/-- End of testinput8 --/ +/^(a(?2))(b)(?1)/ + abbab\=find_limits +Minimum heap limit = 0 +Minimum match limit = 4 +Minimum depth limit = 2 + 0: abbab + +/abc/endanchored + xyzabc + 0: abc +\= Expect no match + xyzabcdef +No match +\= Expect error + xyzabc\=ph +Failed: error -34: bad option value + +/abc/ + xyzabc\=endanchored + 0: abc +\= Expect no match + xyzabcdef\=endanchored +No match +\= Expect error + xyzabc\=ps,endanchored +Failed: error -34: bad option value + +/abc|bcd/endanchored + xyzabcd + 0: bcd +\= Expect no match + xyzabcdef +No match + +/(*NUL)^.*/ + a\nb\x00ccc + 0: a\x0ab + +/(*NUL)^.*/s + a\nb\x00ccc + 0: a\x0ab\x00ccc + +/^x/m,newline=nul + ab\x00xy + 0: x + 
+/'#comment' 0d 0a 00 '^x\' 0a 'y'/x,newline=nul,hex + x\nyz + 0: x\x0ay + +/(*NUL)^X\NY/ + X\nY + 0: X\x0aY + X\rY + 0: X\x0dY +\= Expect no match + X\x00Y +No match + +/(?<=abc|)/ + abcde\=aftertext + 0: + 0+ abcde + +/(?<=|abc)/ + abcde\=aftertext + 0: + 0+ abcde + +/(?<=abc|)/endanchored + abcde\=aftertext + 0: + 0+ + +/(?<=|abc)/endanchored + abcde\=aftertext + 0: + 0+ + +/(*LIMIT_MATCH=100).*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););/no_dotstar_anchor +\= Expect limit exceeded +.*(?![|H]?.*(?![|H]?););.*(?![|H]?.*(?![|H]?););\x00\x00\x00\x00\x00\x00\x00(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?!(?![|);)?.*(![|H]?);)?.*(?![|H]?);)?.*(?![|H]?);)?.*(?![|H]););![|H]?););[|H]?);|H]?);)\x00\x00\x00 \x00\x00\x00H]?););?![|H]?);)?.*(?![|H]?););[||H]?);)?.*(?![|H]?););[|H]?);(?![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?););;[\x00\x00\x00\x00\x00\x00\x00![|H]?););![|H]?););[|H]?);|H]?);)?.*(?![|H]?);); +Failed: error -47: match limit exceeded + +/\n/firstline + xyz\nabc + 0: \x0a + +/\nabc/firstline + xyz\nabc + 0: \x0aabc + +/\x{0a}abc/firstline,newline=crlf +\= Expect no match + xyz\r\nabc +No match + +/[abc]/firstline +\= Expect no match + \na +No match + +/foobar/ + the foobar thing\=copy_matched_subject + 0: foobar + the foobar thing\=copy_matched_subject,zero_terminate + 0: foobar + +/foobar/g + the foobar thing foobar again\=copy_matched_subject + 0: foobar + 0: foobar + +/(?(VERSION>=0)^B0W)/ + B0W-W0W + 0: B0W +\= Expect no match + 0 +No match + +/(?(VERSION>=1000)^B0W|W0W)/ 
+ B0W-W0W + 0: W0W +\= Expect no match + 0 +No match + +/(?<=pqr)abc(?=xyz)/ + 123pqrabcxy\=ps,allusedtext +Partial match: pqrabcxy + <<< + 123pqrabcxyz\=ps,allusedtext + 0: pqrabcxyz + <<< >>> + +/(?>a+b)/ + aaaa\=ps +Partial match: aaaa + aaaab\=ps + 0: aaaab + +/(abc)(?1)/ + abca\=ps +Partial match: abca + abcabc\=ps + 0: abcabc + +/(?(?=abc).*|Z)/ + ab\=ps +Partial match: ab + abcxyz\=ps + 0: abcxyz + +/(abc)++x/ + abcab\=ps +Partial match: abcab + abc\=ps +Partial match: abc + ab\=ps +Partial match: ab + abcx + 0: abcx + +/\z/ + abc\=ph +Partial match: + abc\=ps + 0: + +/\Z/ + abc\=ph +Partial match: + abc\=ps + 0: + abc\n\=ph +Partial match: \x0a + abc\n\=ps + 0: + +/c*+(?<=[bc])/ + abc\=ph +Partial match: c + ab\=ph +Partial match: + abc\=ps + 0: c + ab\=ps + 0: + +/c++(?<=[bc])/ + abc\=ph +Partial match: c + ab\=ph +Partial match: + +/(?<=(?=.(?<=x)))/ + abx + 0: + ab\=ph +Partial match: + bxyz + 0: + xyz + 0: + +/(?![ab]).*/ + ab\=ph +Partial match: + +/c*+/ + ab\=ph,offset=2 +Partial match: + +# End of testinput6 diff --git a/src/pcre2/testdata/testoutput7 b/src/pcre2/testdata/testoutput7 new file mode 100644 index 00000000..004186e9 --- /dev/null +++ b/src/pcre2/testdata/testoutput7 @@ -0,0 +1,3542 @@ +# This set of tests checks UTF and Unicode property support with the DFA +# matching functionality of pcre_dfa_match(). A default subject modifier is +# used to force DFA matching for all tests. 
+ +#subject dfa +#newline_default LF any anyCRLF + +/\x{100}ab/utf + \x{100}ab + 0: \x{100}ab + +/a\x{100}*b/utf + ab + 0: ab + a\x{100}b + 0: a\x{100}b + a\x{100}\x{100}b + 0: a\x{100}\x{100}b + +/a\x{100}+b/utf + a\x{100}b + 0: a\x{100}b + a\x{100}\x{100}b + 0: a\x{100}\x{100}b +\= Expect no match + ab +No match + +/\bX/utf + Xoanon + 0: X + +Xoanon + 0: X + \x{300}Xoanon + 0: X +\= Expect no match + YXoanon +No match + +/\BX/utf + YXoanon + 0: X +\= Expect no match + Xoanon +No match + +Xoanon +No match + \x{300}Xoanon +No match + +/X\b/utf + X+oanon + 0: X + ZX\x{300}oanon + 0: X + FAX + 0: X +\= Expect no match + Xoanon +No match + +/X\B/utf + Xoanon + 0: X +\= Expect no match + X+oanon +No match + ZX\x{300}oanon +No match + FAX +No match + +/[^a]/utf + abcd + 0: b + a\x{100} + 0: \x{100} + +/^[abc\x{123}\x{400}-\x{402}]{2,3}\d/utf + ab99 + 0: ab9 + \x{123}\x{123}45 + 0: \x{123}\x{123}4 + \x{400}\x{401}\x{402}6 + 0: \x{400}\x{401}\x{402}6 +\= Expect no match + d99 +No match + \x{123}\x{122}4 +No match + \x{400}\x{403}6 +No match + \x{400}\x{401}\x{402}\x{402}6 +No match + +/a.b/utf + acb + 0: acb + a\x7fb + 0: a\x{7f}b + a\x{100}b + 0: a\x{100}b +\= Expect no match + a\nb +No match + +/a(.{3})b/utf + a\x{4000}xyb + 0: a\x{4000}xyb + a\x{4000}\x7fyb + 0: a\x{4000}\x{7f}yb + a\x{4000}\x{100}yb + 0: a\x{4000}\x{100}yb +\= Expect no match + a\x{4000}b +No match + ac\ncb +No match + +/a(.*?)(.)/ + a\xc0\x88b + 0: a\xc0\x88b + 1: a\xc0\x88 + 2: a\xc0 + +/a(.*?)(.)/utf + a\x{100}b + 0: a\x{100}b + 1: a\x{100} + +/a(.*)(.)/ + a\xc0\x88b + 0: a\xc0\x88b + 1: a\xc0\x88 + 2: a\xc0 + +/a(.*)(.)/utf + a\x{100}b + 0: a\x{100}b + 1: a\x{100} + +/a(.)(.)/ + a\xc0\x92bcd + 0: a\xc0\x92 + +/a(.)(.)/utf + a\x{240}bcd + 0: a\x{240}b + +/a(.?)(.)/ + a\xc0\x92bcd + 0: a\xc0\x92 + 1: a\xc0 + +/a(.?)(.)/utf + a\x{240}bcd + 0: a\x{240}b + 1: a\x{240} + +/a(.??)(.)/ + a\xc0\x92bcd + 0: a\xc0\x92 + 1: a\xc0 + +/a(.??)(.)/utf + a\x{240}bcd + 0: a\x{240}b + 1: a\x{240} + +/a(.{3})b/utf + 
a\x{1234}xyb + 0: a\x{1234}xyb + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b +\= Expect no match + a\x{1234}b +No match + ac\ncb +No match + +/a(.{3,})b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + 0: axxxxbcdefghijb + 1: axxxxb + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b +\= Expect no match + a\x{1234}b +No match + +/a(.{3,}?)b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + 0: axxxxbcdefghijb + 1: axxxxb + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b +\= Expect no match + a\x{1234}b +No match + +/a(.{3,5})b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + 0: axxxxb + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b + axbxxbcdefghijb + 0: axbxxb + axxxxxbcdefghijb + 0: axxxxxb +\= Expect no match + a\x{1234}b +No match + axxxxxxbcdefghijb +No match + +/a(.{3,5}?)b/utf + a\x{1234}xyb + 0: a\x{1234}xyb + a\x{1234}\x{4321}yb + 0: a\x{1234}\x{4321}yb + a\x{1234}\x{4321}\x{3412}b + 0: a\x{1234}\x{4321}\x{3412}b + axxxxbcdefghijb + 0: axxxxb + a\x{1234}\x{4321}\x{3412}\x{3421}b + 0: a\x{1234}\x{4321}\x{3412}\x{3421}b + axbxxbcdefghijb + 0: axbxxb + axxxxxbcdefghijb + 0: axxxxxb +\= Expect no match + a\x{1234}b +No match + axxxxxxbcdefghijb +No match + +/^[a\x{c0}]/utf +\= Expect no match + \x{100} +No match + +/(?<=aXb)cd/utf + aXbcd + 0: cd + +/(?<=a\x{100}b)cd/utf + a\x{100}bcd + 0: cd + +/(?<=a\x{100000}b)cd/utf + a\x{100000}bcd + 0: cd + +/(?:\x{100}){3}b/utf + \x{100}\x{100}\x{100}b + 0: \x{100}\x{100}\x{100}b +\= Expect no match + \x{100}\x{100}b +No match + 
+/\x{ab}/utf + \x{ab} + 0: \x{ab} + \xc2\xab + 0: \x{ab} +\= Expect no match + \x00{ab} +No match + +/(?<=(.))X/utf + WXYZ + 0: X + \x{256}XYZ + 0: X +\= Expect no match + XYZ +No match + +/[^a]+/g,utf + bcd + 0: bcd + \x{100}aY\x{256}Z + 0: \x{100} + 0: Y\x{256}Z + +/^[^a]{2}/utf + \x{100}bc + 0: \x{100}b + +/^[^a]{2,}/utf + \x{100}bcAa + 0: \x{100}bcA + +/^[^a]{2,}?/utf + \x{100}bca + 0: \x{100}bc + 1: \x{100}b + +/[^a]+/gi,utf + bcd + 0: bcd + \x{100}aY\x{256}Z + 0: \x{100} + 0: Y\x{256}Z + +/^[^a]{2}/i,utf + \x{100}bc + 0: \x{100}b + +/^[^a]{2,}/i,utf + \x{100}bcAa + 0: \x{100}bc + +/^[^a]{2,}?/i,utf + \x{100}bca + 0: \x{100}bc + 1: \x{100}b + +/\x{100}{0,0}/utf + abcd + 0: + +/\x{100}?/utf + abcd + 0: + \x{100}\x{100} + 0: \x{100} + +/\x{100}{0,3}/utf + \x{100}\x{100} + 0: \x{100}\x{100} + \x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100} + +/\x{100}*/utf + abce + 0: + \x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100}\x{100} + +/\x{100}{1,1}/utf + abcd\x{100}\x{100}\x{100}\x{100} + 0: \x{100} + +/\x{100}{1,3}/utf + abcd\x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100} + +/\x{100}+/utf + abcd\x{100}\x{100}\x{100}\x{100} + 0: \x{100}\x{100}\x{100}\x{100} + +/\x{100}{3}/utf + abcd\x{100}\x{100}\x{100}XX + 0: \x{100}\x{100}\x{100} + +/\x{100}{3,5}/utf + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + 0: \x{100}\x{100}\x{100}\x{100}\x{100} + +/\x{100}{3,}/utf,no_auto_possess + abcd\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}XX + 0: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 1: \x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 2: \x{100}\x{100}\x{100}\x{100}\x{100} + 3: \x{100}\x{100}\x{100}\x{100} + 4: \x{100}\x{100}\x{100} + +/(?<=a\x{100}{2}b)X/utf + Xyyya\x{100}\x{100}bXzzz + 0: X + +/\D*/utf,no_auto_possess + 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +Matched, but offsets vector is too small to show all matches + 0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 2: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 3: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 4: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 5: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 6: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 7: 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 8: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + 9: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +10: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +11: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +12: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +13: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa +14: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa + +/\D*/utf,no_auto_possess + 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} +Matched, but offsets vector is too small to show all matches + 0: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 1: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 2: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 3: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 4: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 5: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 6: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 7: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 8: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + 9: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} +10: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} +11: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} +12: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} +13: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} +14: 
\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100}\x{100} + +/\D/utf + 1X2 + 0: X + 1\x{100}2 + 0: \x{100} + +/>\S/utf + > >X Y + 0: >X + > >\x{100} Y + 0: >\x{100} + +/\d/utf + \x{100}3 + 0: 3 + +/\s/utf + \x{100} X + 0: + +/\D+/utf + 12abcd34 + 0: abcd +\= Expect no match + 1234 +No match + +/\D{2,3}/utf + 12abcd34 + 0: abc + 12ab34 + 0: ab +\= Expect no match + 1234 +No match + 12a34 +No match + +/\D{2,3}?/utf + 12abcd34 + 0: abc + 1: ab + 12ab34 + 0: ab +\= Expect no match + 1234 +No match + 12a34 +No match + +/\d+/utf + 12abcd34 + 0: 12 + +/\d{2,3}/utf + 12abcd34 + 0: 12 + 1234abcd + 0: 
123 +\= Expect no match + 1.4 +No match + +/\d{2,3}?/utf + 12abcd34 + 0: 12 + 1234abcd + 0: 123 + 1: 12 +\= Expect no match + 1.4 +No match + +/\S+/utf + 12abcd34 + 0: 12abcd34 +\= Expect no match + \ \ +No match + +/\S{2,3}/utf + 12abcd34 + 0: 12a + 1234abcd + 0: 123 +\= Expect no match + \ \ +No match + +/\S{2,3}?/utf + 12abcd34 + 0: 12a + 1: 12 + 1234abcd + 0: 123 + 1: 12 +\= Expect no match + \ \ +No match + +/>\s+ <34 + 0: > < + +/>\s{2,3} < + ab> < +\= Expect no match + ab> \s{2,3}? < + ab> < +\= Expect no match + ab> \xff< + 0: \xff + +/[\xff]/utf + >\x{ff}< + 0: \x{ff} + +/[^\xFF]/ + XYZ + 0: X + +/[^\xff]/utf + XYZ + 0: X + \x{123} + 0: \x{123} + +/^[ac]*b/utf +\= Expect no match + xb +No match + +/^[ac\x{100}]*b/utf +\= Expect no match + xb +No match + +/^[^x]*b/i,utf +\= Expect no match + xb +No match + +/^[^x]*b/utf +\= Expect no match + xb +No match + +/^\d*b/utf +\= Expect no match + xb +No match + +/(|a)/g,utf + catac + 0: + 0: a + 1: + 0: + 0: a + 1: + 0: + 0: + a\x{256}a + 0: a + 1: + 0: + 0: a + 1: + 0: + +/^\x{85}$/i,utf + \x{85} + 0: \x{85} + +/^abc./gmx,newline=any,utf + abc1 \x0aabc2 \x0babc3xx \x0cabc4 \x0dabc5xx \x0d\x0aabc6 \x{0085}abc7 \x{2028}abc8 \x{2029}abc9 JUNK + 0: abc1 + 0: abc2 + 0: abc3 + 0: abc4 + 0: abc5 + 0: abc6 + 0: abc7 + 0: abc8 + 0: abc9 + +/abc.$/gmx,newline=any,utf + abc1\x0a abc2\x0b abc3\x0c abc4\x0d abc5\x0d\x0a abc6\x{0085} abc7\x{2028} abc8\x{2029} abc9 + 0: abc1 + 0: abc2 + 0: abc3 + 0: abc4 + 0: abc5 + 0: abc6 + 0: abc7 + 0: abc8 + 0: abc9 + +/^a\Rb/bsr=unicode,utf + a\nb + 0: a\x{0a}b + a\rb + 0: a\x{0d}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x0bb + 0: a\x{0b}b + a\x0cb + 0: a\x{0c}b + a\x{85}b + 0: a\x{85}b + a\x{2028}b + 0: a\x{2028}b + a\x{2029}b + 0: a\x{2029}b +\= Expect no match + a\n\rb +No match + +/^a\R*b/bsr=unicode,utf + ab + 0: ab + a\nb + 0: a\x{0a}b + a\rb + 0: a\x{0d}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x0bb + 0: a\x{0b}b + a\x0c\x{2028}\x{2029}b + 0: a\x{0c}\x{2028}\x{2029}b + a\x{85}b + 0: a\x{85}b + 
a\n\rb + 0: a\x{0a}\x{0d}b + a\n\r\x{85}\x0cb + 0: a\x{0a}\x{0d}\x{85}\x{0c}b + +/^a\R+b/bsr=unicode,utf + a\nb + 0: a\x{0a}b + a\rb + 0: a\x{0d}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x0bb + 0: a\x{0b}b + a\x0c\x{2028}\x{2029}b + 0: a\x{0c}\x{2028}\x{2029}b + a\x{85}b + 0: a\x{85}b + a\n\rb + 0: a\x{0a}\x{0d}b + a\n\r\x{85}\x0cb + 0: a\x{0a}\x{0d}\x{85}\x{0c}b +\= Expect no match + ab +No match + +/^a\R{1,3}b/bsr=unicode,utf + a\nb + 0: a\x{0a}b + a\n\rb + 0: a\x{0a}\x{0d}b + a\n\r\x{85}b + 0: a\x{0a}\x{0d}\x{85}b + a\r\n\r\nb + 0: a\x{0d}\x{0a}\x{0d}\x{0a}b + a\r\n\r\n\r\nb + 0: a\x{0d}\x{0a}\x{0d}\x{0a}\x{0d}\x{0a}b + a\n\r\n\rb + 0: a\x{0a}\x{0d}\x{0a}\x{0d}b + a\n\n\r\nb + 0: a\x{0a}\x{0a}\x{0d}\x{0a}b +\= Expect no match + a\n\n\n\rb +No match + a\r +No match + +/\h+\V?\v{3,4}/utf,no_auto_possess + \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d} + 1: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c} + +/\V?\v{3,4}/utf,no_auto_possess + \x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + 0: X\x{0a}\x{0b}\x{0c}\x{0d} + 1: X\x{0a}\x{0b}\x{0c} + +/\h+\V?\v{3,4}/utf,no_auto_possess + >\x09\x20\x{a0}X\x0a\x0a\x0a< + 0: \x{09} \x{a0}X\x{0a}\x{0a}\x{0a} + +/\V?\v{3,4}/utf,no_auto_possess + >\x09\x20\x{a0}X\x0a\x0a\x0a< + 0: X\x{0a}\x{0a}\x{0a} + +/\H\h\V\v/utf + X X\x0a + 0: X X\x{0a} + X\x09X\x0b + 0: X\x{09}X\x{0b} +\= Expect no match + \x{a0} X\x0a +No match + +/\H*\h+\V?\v{3,4}/utf,no_auto_possess + \x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a + 0: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c}\x{0d} + 1: \x{09} \x{a0}X\x{0a}\x{0b}\x{0c} + \x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a + 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}\x{0d} + 1: \x{09} \x{a0}\x{0a}\x{0b}\x{0c} + \x09\x20\x{a0}\x0a\x0b\x0c + 0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c} +\= Expect no match + \x09\x20\x{a0}\x0a\x0b +No match + +/\H\h\V\v/utf + \x{3001}\x{3000}\x{2030}\x{2028} + 0: \x{3001}\x{3000}\x{2030}\x{2028} + X\x{180e}X\x{85} + 0: X\x{180e}X\x{85} +\= Expect no match + \x{2009} X\x0a +No match + +/\H*\h+\V?\v{3,4}/utf,no_auto_possess + 
\x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a + 0: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c}\x{0d} + 1: \x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x{0c} + \x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a + 0: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c}\x{2028} + 1: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c} + \x09\x20\x{202f}\x0a\x0b\x0c + 0: \x{09} \x{202f}\x{0a}\x{0b}\x{0c} +\= Expect no match + \x09\x{200a}\x{a0}\x{2028}\x0b +No match + +/a\Rb/I,bsr=anycrlf,utf +Capture group count = 0 +Options: utf +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b +\= Expect no match + a\x{85}b +No match + a\x0bb +No match + +/a\Rb/I,bsr=unicode,utf +Capture group count = 0 +Options: utf +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 3 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x{85}b + 0: a\x{85}b + a\x0bb + 0: a\x{0b}b + +/a\R?b/I,bsr=anycrlf,utf +Capture group count = 0 +Options: utf +\R matches CR, LF, or CRLF +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b +\= Expect no match + a\x{85}b +No match + a\x0bb +No match + +/a\R?b/I,bsr=unicode,utf +Capture group count = 0 +Options: utf +\R matches any Unicode newline +First code unit = 'a' +Last code unit = 'b' +Subject length lower bound = 2 + a\rb + 0: a\x{0d}b + a\nb + 0: a\x{0a}b + a\r\nb + 0: a\x{0d}\x{0a}b + a\x{85}b + 0: a\x{85}b + a\x0bb + 0: a\x{0b}b + +/X/newline=any,utf,firstline + A\x{1ec5}ABCXYZ + 0: X + +/abcd*/utf + xxxxabcd\=ps + 0: abcd + xxxxabcd\=ph +Partial match: abcd + +/abcd*/i,utf + xxxxabcd\=ps + 0: abcd + xxxxabcd\=ph +Partial match: abcd + XXXXABCD\=ps + 0: ABCD + XXXXABCD\=ph +Partial match: ABCD + +/abc\d*/utf + xxxxabc1\=ps + 0: abc1 + xxxxabc1\=ph +Partial match: abc1 + 
+/abc[de]*/utf + xxxxabcde\=ps + 0: abcde + xxxxabcde\=ph +Partial match: abcde + +/\bthe cat\b/utf + the cat\=ps + 0: the cat + the cat\=ph +Partial match: the cat + +/./newline=crlf,utf + \r\=ps + 0: \x{0d} + \r\=ph +Partial match: \x{0d} + +/.{2,3}/newline=crlf,utf + \r\=ps +Partial match: \x{0d} + \r\=ph +Partial match: \x{0d} + \r\r\=ps + 0: \x{0d}\x{0d} + \r\r\=ph +Partial match: \x{0d}\x{0d} + \r\r\r\=ps + 0: \x{0d}\x{0d}\x{0d} + \r\r\r\=ph +Partial match: \x{0d}\x{0d}\x{0d} + +/.{2,3}?/newline=crlf,utf + \r\=ps +Partial match: \x{0d} + \r\=ph +Partial match: \x{0d} + \r\r\=ps + 0: \x{0d}\x{0d} + \r\r\=ph +Partial match: \x{0d}\x{0d} + \r\r\r\=ps + 0: \x{0d}\x{0d}\x{0d} + 1: \x{0d}\x{0d} + \r\r\r\=ph +Partial match: \x{0d}\x{0d}\x{0d} + +/[^\x{100}]/utf + \x{100}\x{101}X + 0: \x{101} + +/[^\x{100}]+/utf + \x{100}\x{101}X + 0: \x{101}X + +/\pL\P{Nd}/utf + AB + 0: AB +\= Expect no match + A0 +No match + 00 +No match + +/\X./utf + AB + 0: AB + A\x{300}BC + 0: A\x{300}B + A\x{300}\x{301}\x{302}BC + 0: A\x{300}\x{301}\x{302}B +\= Expect no match + \x{300} +No match + +/\X\X/utf + ABC + 0: AB + A\x{300}B\x{300}\x{301}C + 0: A\x{300}B\x{300}\x{301} + A\x{300}\x{301}\x{302}BC + 0: A\x{300}\x{301}\x{302}B +\= Expect no match + \x{300} +No match + +/^\pL+/utf + abcd + 0: abcd + a + 0: a + +/^\PL+/utf + 1234 + 0: 1234 + = + 0: = +\= Expect no match + abcd +No match + +/^\X+/utf + abcdA\x{300}\x{301}\x{302} + 0: abcdA\x{300}\x{301}\x{302} + A\x{300}\x{301}\x{302} + 0: A\x{300}\x{301}\x{302} + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302} + 0: A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302} + a + 0: a + \x{300}\x{301}\x{302} + 0: \x{300}\x{301}\x{302} + +/\X?abc/utf + abc + 0: abc + A\x{300}abc + 0: A\x{300}abc + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + 0: A\x{300}abc + \x{300}abc + 0: \x{300}abc + +/^\X?abc/utf + abc + 0: abc + A\x{300}abc + 0: A\x{300}abc + \x{300}abc + 0: \x{300}abc +\= Expect no match + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz 
+No match + +/\X*abc/utf + abc + 0: abc + A\x{300}abc + 0: A\x{300}abc + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + 0: A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abc + \x{300}abc + 0: \x{300}abc + +/^\X*abc/utf + abc + 0: abc + A\x{300}abc + 0: A\x{300}abc + A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abcxyz + 0: A\x{300}\x{301}\x{302}A\x{300}A\x{300}A\x{300}abc + \x{300}abc + 0: \x{300}abc + +/^\pL?=./utf + A=b + 0: A=b + =c + 0: =c +\= Expect no match + 1=2 +No match + AAAA=b +No match + +/^\pL*=./utf + AAAA=b + 0: AAAA=b + =c + 0: =c +\= Expect no match + 1=2 +No match + +/^\X{2,3}X/utf + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X + 0: A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X + 0: A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X +\= Expect no match + X +No match + A\x{300}\x{301}\x{302}X +No match + A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}A\x{300}\x{301}\x{302}X +No match + +/^\pC\pL\pM\pN\pP\pS\pZ\p{Xsp}/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680} +\= Expect no match + \x{0b} +No match + +/^>\p{Xsp}+/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} + 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} + 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} + 4: > \x{09}\x{0a}\x{0c}\x{0d} + 5: > \x{09}\x{0a}\x{0c} + 6: > \x{09}\x{0a} + 7: > \x{09} + 8: > + +/^>\p{Xsp}*/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} + 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} + 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} + 4: > \x{09}\x{0a}\x{0c}\x{0d} + 5: > \x{09}\x{0a}\x{0c} + 6: > \x{09}\x{0a} + 7: > \x{09} + 8: > + 9: > + +/^>\p{Xsp}{2,9}/utf,no_auto_possess + > 
\x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} + 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} + 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} + 4: > \x{09}\x{0a}\x{0c}\x{0d} + 5: > \x{09}\x{0a}\x{0c} + 6: > \x{09}\x{0a} + 7: > \x{09} + +/^>[\p{Xsp}]/utf,no_auto_possess + >\x{2028}\x{0b} + 0: >\x{2028} + +/^>[\p{Xsp}]+/utf,no_auto_possess + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} + 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} + 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} + 4: > \x{09}\x{0a}\x{0c}\x{0d} + 5: > \x{09}\x{0a}\x{0c} + 6: > \x{09}\x{0a} + 7: > \x{09} + 8: > + +/^>\p{Xps}/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680} + >\x{a0} + 0: >\x{a0} +\= Expect no match + \x{0b} +No match + +/^>\p{Xps}+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}+?/utf + >\x{1680}\x{2028}\x{0b} + 0: >\x{1680}\x{2028}\x{0b} + 1: >\x{1680}\x{2028} + 2: >\x{1680} + +/^>\p{Xps}*/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^>\p{Xps}{2,9}?/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 1: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028} + 2: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680} + 3: > \x{09}\x{0a}\x{0c}\x{0d}\x{a0} + 4: > \x{09}\x{0a}\x{0c}\x{0d} + 5: > \x{09}\x{0a}\x{0c} + 6: > \x{09}\x{0a} + 7: > \x{09} + +/^>[\p{Xps}]/utf + >\x{2028}\x{0b} + 0: >\x{2028} + +/^>[\p{Xps}]+/utf + > \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + 0: > 
\x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b} + +/^\p{Xwd}/utf + ABCD + 0: A + 1234 + 0: 1 + \x{6ca} + 0: \x{6ca} + \x{a6c} + 0: \x{a6c} + \x{10a7} + 0: \x{10a7} + _ABC + 0: _ +\= Expect no match + [] +No match + +/^\p{Xwd}+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}*/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +/^\p{Xwd}{2,9}/utf + A_12\x{6ca}\x{a6c}\x{10a7} + 0: A_12\x{6ca}\x{a6c}\x{10a7} + +/^[\p{Xwd}]/utf + ABCD1234_ + 0: A + 1234abcd_ + 0: 1 + \x{6ca} + 0: \x{6ca} + \x{a6c} + 0: \x{a6c} + \x{10a7} + 0: \x{10a7} + _ABC + 0: _ +\= Expect no match + [] +No match + +/^[\p{Xwd}]+/utf + ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + 0: ABCD1234\x{6ca}\x{a6c}\x{10a7}_ + +# Unicode properties for \b abd \B + +/\b...\B/utf,ucp + abc_ + 0: abc + \x{37e}abc\x{376} + 0: abc + \x{37e}\x{376}\x{371}\x{393}\x{394} + 0: \x{376}\x{371}\x{393} + !\x{c0}++\x{c1}\x{c2} + 0: ++\x{c1} + !\x{c0}+++++ + 0: \x{c0}++ + +# Without PCRE_UCP, non-ASCII always fail, even if < 256 + +/\b...\B/utf + abc_ + 0: abc +\= Expect no match + \x{37e}abc\x{376} +No match + \x{37e}\x{376}\x{371}\x{393}\x{394} +No match + !\x{c0}++\x{c1}\x{c2} +No match + !\x{c0}+++++ +No match + +# With PCRE_UCP, non-UTF8 chars that are < 256 still check properties + +/\b...\B/ucp + abc_ + 0: abc + !\x{c0}++\x{c1}\x{c2} + 0: ++\xc1 + !\x{c0}+++++ + 0: \xc0++ + +# Caseless single negated characters > 127 need UCP support + +/[^\x{100}]/i,utf + \x{100}\x{101}X + 0: X + +/[^\x{100}]+/i,utf + \x{100}\x{101}XX + 0: XX + +/^\X/utf + A\=ps + 0: A + A\=ph +Partial match: A + A\x{300}\x{301}\=ps + 0: A\x{300}\x{301} + A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301} + A\x{301}\=ps + 0: A\x{301} + A\x{301}\=ph +Partial match: A\x{301} + +/^\X{2,3}/utf + A\=ps +Partial match: A + A\=ph +Partial match: A + AA\=ps + 0: AA + AA\=ph +Partial match: AA + A\x{300}\x{301}\=ps +Partial match: A\x{300}\x{301} + A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301} + 
A\x{300}\x{301}A\x{300}\x{301}\=ps + 0: A\x{300}\x{301}A\x{300}\x{301} + A\x{300}\x{301}A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301}A\x{300}\x{301} + +/^\X{2}/utf + AA\=ps + 0: AA + AA\=ph +Partial match: AA + A\x{300}\x{301}A\x{300}\x{301}\=ps + 0: A\x{300}\x{301}A\x{300}\x{301} + A\x{300}\x{301}A\x{300}\x{301}\=ph +Partial match: A\x{300}\x{301}A\x{300}\x{301} + +/^\X+/utf + AA\=ps + 0: AA + AA\=ph +Partial match: AA + +/^\X+?Z/utf + AA\=ps +Partial match: AA + AA\=ph +Partial match: AA + +# These are tests for extended grapheme clusters + +/^\X/utf,aftertext + G\x{34e}\x{34e}X + 0: G\x{34e}\x{34e} + 0+ X + \x{34e}\x{34e}X + 0: \x{34e}\x{34e} + 0+ X + \x04X + 0: \x{04} + 0+ X + \x{1100}X + 0: \x{1100} + 0+ X + \x{1100}\x{34e}X + 0: \x{1100}\x{34e} + 0+ X + \x{1b04}\x{1b04}X + 0: \x{1b04}\x{1b04} + 0+ X +\= These match up to the roman letters + \x{1111}\x{1111}L,L + 0: \x{1111}\x{1111} + 0+ L,L + \x{1111}\x{1111}\x{1169}L,L,V + 0: \x{1111}\x{1111}\x{1169} + 0+ L,L,V + \x{1111}\x{ae4c}L, LV + 0: \x{1111}\x{ae4c} + 0+ L, LV + \x{1111}\x{ad89}L, LVT + 0: \x{1111}\x{ad89} + 0+ L, LVT + \x{1111}\x{ae4c}\x{1169}L, LV, V + 0: \x{1111}\x{ae4c}\x{1169} + 0+ L, LV, V + \x{1111}\x{ae4c}\x{1169}\x{1169}L, LV, V, V + 0: \x{1111}\x{ae4c}\x{1169}\x{1169} + 0+ L, LV, V, V + \x{1111}\x{ae4c}\x{1169}\x{11fe}L, LV, V, T + 0: \x{1111}\x{ae4c}\x{1169}\x{11fe} + 0+ L, LV, V, T + \x{1111}\x{ad89}\x{11fe}L, LVT, T + 0: \x{1111}\x{ad89}\x{11fe} + 0+ L, LVT, T + \x{1111}\x{ad89}\x{11fe}\x{11fe}L, LVT, T, T + 0: \x{1111}\x{ad89}\x{11fe}\x{11fe} + 0+ L, LVT, T, T + \x{ad89}\x{11fe}\x{11fe}LVT, T, T + 0: \x{ad89}\x{11fe}\x{11fe} + 0+ LVT, T, T +\= These match just the first codepoint (invalid sequence) + \x{1111}\x{11fe}L, T + 0: \x{1111} + 0+ \x{11fe}L, T + \x{ae4c}\x{1111}LV, L + 0: \x{ae4c} + 0+ \x{1111}LV, L + \x{ae4c}\x{ae4c}LV, LV + 0: \x{ae4c} + 0+ \x{ae4c}LV, LV + \x{ae4c}\x{ad89}LV, LVT + 0: \x{ae4c} + 0+ \x{ad89}LV, LVT + \x{1169}\x{1111}V, L + 0: \x{1169} + 0+ \x{1111}V, L + 
\x{1169}\x{ae4c}V, LV + 0: \x{1169} + 0+ \x{ae4c}V, LV + \x{1169}\x{ad89}V, LVT + 0: \x{1169} + 0+ \x{ad89}V, LVT + \x{ad89}\x{1111}LVT, L + 0: \x{ad89} + 0+ \x{1111}LVT, L + \x{ad89}\x{1169}LVT, V + 0: \x{ad89} + 0+ \x{1169}LVT, V + \x{ad89}\x{ae4c}LVT, LV + 0: \x{ad89} + 0+ \x{ae4c}LVT, LV + \x{ad89}\x{ad89}LVT, LVT + 0: \x{ad89} + 0+ \x{ad89}LVT, LVT + \x{11fe}\x{1111}T, L + 0: \x{11fe} + 0+ \x{1111}T, L + \x{11fe}\x{1169}T, V + 0: \x{11fe} + 0+ \x{1169}T, V + \x{11fe}\x{ae4c}T, LV + 0: \x{11fe} + 0+ \x{ae4c}T, LV + \x{11fe}\x{ad89}T, LVT + 0: \x{11fe} + 0+ \x{ad89}T, LVT +\= Test extend and spacing mark + \x{1111}\x{ae4c}\x{0711}L, LV, extend + 0: \x{1111}\x{ae4c}\x{711} + 0+ L, LV, extend + \x{1111}\x{ae4c}\x{1b04}L, LV, spacing mark + 0: \x{1111}\x{ae4c}\x{1b04} + 0+ L, LV, spacing mark + \x{1111}\x{ae4c}\x{1b04}\x{0711}\x{1b04}L, LV, spacing mark, extend, spacing mark + 0: \x{1111}\x{ae4c}\x{1b04}\x{711}\x{1b04} + 0+ L, LV, spacing mark, extend, spacing mark +\= Test CR, LF, and control + \x0d\x{0711}CR, extend + 0: \x{0d} + 0+ \x{711}CR, extend + \x0d\x{1b04}CR, spacingmark + 0: \x{0d} + 0+ \x{1b04}CR, spacingmark + \x0a\x{0711}LF, extend + 0: \x{0a} + 0+ \x{711}LF, extend + \x0a\x{1b04}LF, spacingmark + 0: \x{0a} + 0+ \x{1b04}LF, spacingmark + \x0b\x{0711}Control, extend + 0: \x{0b} + 0+ \x{711}Control, extend + \x09\x{1b04}Control, spacingmark + 0: \x{09} + 0+ \x{1b04}Control, spacingmark +\= There are no Prepend characters, so we can't test Prepend, CR + +/^(?>\X{2})X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + +/^\X{2,4}X/utf,aftertext + \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + +/^\X{2,4}?X/utf,aftertext + 
\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0: \x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}\x{1111}\x{ae4c}X + 0+ + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/[z\x{1e9e}]+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/[z\x{00df}]+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +/[z\x{1f88}]+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +# Perl matches these + +/\x{00b5}+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/\x{039c}+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + +/\x{03bc}+/i,utf + \x{00b5}\x{039c}\x{03bc} + 0: \x{b5}\x{39c}\x{3bc} + + +/\x{00c5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/\x{00e5}+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/\x{212b}+/i,utf + \x{00c5}\x{00e5}\x{212b} + 0: \x{c5}\x{e5}\x{212b} + +/\x{01c4}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/\x{01c5}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/\x{01c6}+/i,utf + \x{01c4}\x{01c5}\x{01c6} + 0: \x{1c4}\x{1c5}\x{1c6} + +/\x{01c7}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/\x{01c8}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + +/\x{01c9}+/i,utf + \x{01c7}\x{01c8}\x{01c9} + 0: \x{1c7}\x{1c8}\x{1c9} + + +/\x{01ca}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/\x{01cb}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/\x{01cc}+/i,utf + \x{01ca}\x{01cb}\x{01cc} + 0: \x{1ca}\x{1cb}\x{1cc} + +/\x{01f1}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + +/\x{01f2}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + 
+/\x{01f3}+/i,utf + \x{01f1}\x{01f2}\x{01f3} + 0: \x{1f1}\x{1f2}\x{1f3} + +/\x{0345}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{0399}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{03b9}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{1fbe}+/i,utf + \x{0345}\x{0399}\x{03b9}\x{1fbe} + 0: \x{345}\x{399}\x{3b9}\x{1fbe} + +/\x{0392}+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/\x{03b2}+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + +/\x{03d0}+/i,utf + \x{0392}\x{03b2}\x{03d0} + 0: \x{392}\x{3b2}\x{3d0} + + +/\x{0395}+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/\x{03b5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/\x{03f5}+/i,utf + \x{0395}\x{03b5}\x{03f5} + 0: \x{395}\x{3b5}\x{3f5} + +/\x{0398}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{03b8}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{03d1}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{03f4}+/i,utf + \x{0398}\x{03b8}\x{03d1}\x{03f4} + 0: \x{398}\x{3b8}\x{3d1}\x{3f4} + +/\x{039a}+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/\x{03ba}+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/\x{03f0}+/i,utf + \x{039a}\x{03ba}\x{03f0} + 0: \x{39a}\x{3ba}\x{3f0} + +/\x{03a0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/\x{03c0}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/\x{03d6}+/i,utf + \x{03a0}\x{03c0}\x{03d6} + 0: \x{3a0}\x{3c0}\x{3d6} + +/\x{03a1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/\x{03c1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/\x{03f1}+/i,utf + \x{03a1}\x{03c1}\x{03f1} + 0: \x{3a1}\x{3c1}\x{3f1} + +/\x{03a3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/\x{03c2}+/i,utf + 
\x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/\x{03c3}+/i,utf + \x{03A3}\x{03C2}\x{03C3} + 0: \x{3a3}\x{3c2}\x{3c3} + +/\x{03a6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/\x{03c6}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/\x{03d5}+/i,utf + \x{03a6}\x{03c6}\x{03d5} + 0: \x{3a6}\x{3c6}\x{3d5} + +/\x{03c9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/\x{03a9}+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/\x{2126}+/i,utf + \x{03c9}\x{03a9}\x{2126} + 0: \x{3c9}\x{3a9}\x{2126} + +/\x{1e60}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e61}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e9b}+/i,utf + \x{1e60}\x{1e61}\x{1e9b} + 0: \x{1e60}\x{1e61}\x{1e9b} + +/\x{1e9e}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{00df}+/i,utf + \x{1e9e}\x{00df} + 0: \x{1e9e}\x{df} + +/\x{1f88}+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +/\x{1f80}+/i,utf + \x{1f88}\x{1f80} + 0: \x{1f88}\x{1f80} + +/\x{004b}+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/\x{006b}+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/\x{212a}+/i,utf + \x{004b}\x{006b}\x{212a} + 0: Kk\x{212a} + +/\x{0053}+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/\x{0073}+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/\x{017f}+/i,utf + \x{0053}\x{0073}\x{017f} + 0: Ss\x{17f} + +/ist/i,utf +\= Expect no match + ikt +No match + +/is+t/i,utf + iSs\x{17f}t + 0: iSs\x{17f}t +\= Expect no match + ikt +No match + +/is+?t/i,utf +\= Expect no match + ikt +No match + +/is?t/i,utf +\= Expect no match + ikt +No match + +/is{2}t/i,utf +\= Expect no match + iskt +No match + +/^\p{Xuc}/utf + $abc + 0: $ + @abc + 0: @ + `abc + 0: ` + \x{1234}abc + 0: \x{1234} +\= Expect no match + abc +No match + +/^\p{Xuc}+/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000} +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}+?/utf + $@`\x{a0}\x{1234}\x{e000}** + 
0: $@`\x{a0}\x{1234}\x{e000} + 1: $@`\x{a0}\x{1234} + 2: $@`\x{a0} + 3: $@` + 4: $@ + 5: $ +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}+?\*/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000}* +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}++/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000} +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}{3,5}/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234} +\= Expect no match + \x{9f} +No match + +/^\p{Xuc}{3,5}?/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234} + 1: $@`\x{a0} + 2: $@` +\= Expect no match + \x{9f} +No match + +/^[\p{Xuc}]/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $ +\= Expect no match + \x{9f} +No match + +/^[\p{Xuc}]+/utf + $@`\x{a0}\x{1234}\x{e000}** + 0: $@`\x{a0}\x{1234}\x{e000} +\= Expect no match + \x{9f} +No match + +/^\P{Xuc}/utf + abc + 0: a +\= Expect no match + $abc +No match + @abc +No match + `abc +No match + \x{1234}abc +No match + +/^[\P{Xuc}]/utf + abc + 0: a +\= Expect no match + $abc +No match + @abc +No match + `abc +No match + \x{1234}abc +No match + +/^A\s+Z/utf,ucp + A\x{2005}Z + 0: A\x{2005}Z + A\x{85}\x{180e}\x{2005}Z + 0: A\x{85}\x{180e}\x{2005}Z + +/^A[\s]+Z/utf,ucp + A\x{2005}Z + 0: A\x{2005}Z + A\x{85}\x{180e}\x{2005}Z + 0: A\x{85}\x{180e}\x{2005}Z + +/(?<=\x{100})\x{200}(?=\x{300})/utf,allusedtext + \x{100}\x{200}\x{300} + 0: \x{100}\x{200}\x{300} + <<<<<<< >>>>>>> + +# End of testinput7 diff --git a/src/pcre/testdata/testoutput11-16 b/src/pcre2/testdata/testoutput8-16-2 similarity index 61% rename from src/pcre/testdata/testoutput11-16 rename to src/pcre2/testdata/testoutput8-16-2 index 3c485da7..569a8603 100644 --- a/src/pcre/testdata/testoutput11-16 +++ b/src/pcre2/testdata/testoutput8-16-2 @@ -1,10 +1,15 @@ -/-- These are a few representative patterns whose lengths and offsets are to be -shown when the link size is 2. This is just a doublecheck test to ensure the -sizes don't go horribly wrong when something is changed. 
The pattern contents -are all themselves checked in other tests. Unicode, including property support, -is required for these tests. --/ - -/((?i)b)/BM +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. + +#pattern fullbincode,memory + +/((?i)b)/ Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -15,7 +20,7 @@ Memory allocation (code space): 24 11 End ------------------------------------------------------------------ -/(?s)(.*X|^B)/BM +/(?s)(.*X|^B)/ Memory allocation (code space): 38 ------------------------------------------------------------------ 0 16 Bra @@ -30,7 +35,7 @@ Memory allocation (code space): 38 18 End ------------------------------------------------------------------ -/(?s:.*X|^B)/BM +/(?s:.*X|^B)/ Memory allocation (code space): 36 ------------------------------------------------------------------ 0 15 Bra @@ -45,7 +50,7 @@ Memory allocation (code space): 36 17 End ------------------------------------------------------------------ -/^[[:alnum:]]/BM +/^[[:alnum:]]/ Memory allocation (code space): 46 ------------------------------------------------------------------ 0 20 Bra @@ -55,20 +60,19 @@ Memory allocation (code space): 46 22 End ------------------------------------------------------------------ -/#/IxMD +/#/Ix Memory allocation (code space): 10 ------------------------------------------------------------------ 0 2 Bra 2 2 Ket 4 End ------------------------------------------------------------------ -Capturing subpattern 
count = 0 +Capture group count = 0 May match empty string Options: extended -No first char -No need char +Subject length lower bound = 0 -/a#/IxMD +/a#/Ix Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -76,12 +80,12 @@ Memory allocation (code space): 14 4 4 Ket 6 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: extended -First char = 'a' -No need char +First code unit = 'a' +Subject length lower bound = 1 -/x?+/BM +/x?+/ Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -90,7 +94,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/x++/BM +/x++/ Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -99,7 +103,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/x{1,3}+/BM +/x{1,3}+/ Memory allocation (code space): 20 ------------------------------------------------------------------ 0 7 Bra @@ -109,7 +113,7 @@ Memory allocation (code space): 20 9 End ------------------------------------------------------------------ -/(x)*+/BM +/(x)*+/ Memory allocation (code space): 26 ------------------------------------------------------------------ 0 10 Bra @@ -121,7 +125,7 @@ Memory allocation (code space): 26 12 End ------------------------------------------------------------------ -/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/BM +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ Memory allocation (code space): 142 ------------------------------------------------------------------ 0 68 Bra @@ -144,7 +148,7 @@ Memory allocation (code space): 142 70 End ------------------------------------------------------------------ 
-|8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" Memory allocation (code space): 1648 ------------------------------------------------------------------ 0 821 Bra @@ -154,7 +158,7 @@ Memory allocation (code space): 1648 823 End ------------------------------------------------------------------ -|\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM 
+"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" Memory allocation (code space): 1628 ------------------------------------------------------------------ 0 811 Bra @@ -164,7 +168,7 @@ Memory allocation (code space): 1628 813 End ------------------------------------------------------------------ -/(a(?1)b)/BM +/(a(?1)b)/ Memory allocation (code space): 32 ------------------------------------------------------------------ 0 13 Bra @@ -177,13 +181,13 @@ Memory allocation (code space): 32 15 End ------------------------------------------------------------------ -/(a(?1)+b)/BM +/(a(?1)+b)/ Memory allocation (code space): 40 ------------------------------------------------------------------ 0 17 Bra 2 13 CBra 1 5 a - 7 4 Once + 7 4 SBra 9 2 Recurse 11 4 KetRmax 13 b @@ -192,8 +196,8 @@ Memory allocation (code space): 40 19 End ------------------------------------------------------------------ -/a(?Pb|c)d(?Pe)/BM -Memory allocation (code space): 80 +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 54 ------------------------------------------------------------------ 0 24 Bra 2 a @@ -210,8 +214,8 @@ Memory allocation (code space): 80 26 End ------------------------------------------------------------------ -/(?:a(?Pc(?Pd)))(?Pa)/BM -Memory allocation (code space): 73 +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 64 ------------------------------------------------------------------ 0 29 Bra 2 18 Bra @@ -230,8 +234,8 @@ Memory allocation (code space): 73 31 End 
------------------------------------------------------------------ -/(?Pa)...(?P=a)bbb(?P>a)d/BM -Memory allocation (code space): 93 +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 54 ------------------------------------------------------------------ 0 24 Bra 2 5 CBra 1 @@ -248,7 +252,7 @@ Memory allocation (code space): 93 26 End ------------------------------------------------------------------ -/abc(?C255)de(?C)f/BM +/abc(?C255)de(?C)f/ Memory allocation (code space): 50 ------------------------------------------------------------------ 0 22 Bra @@ -261,7 +265,7 @@ Memory allocation (code space): 50 24 End ------------------------------------------------------------------ -/abcde/CBM +/abcde/auto_callout Memory allocation (code space): 78 ------------------------------------------------------------------ 0 36 Bra @@ -280,7 +284,7 @@ Memory allocation (code space): 78 38 End ------------------------------------------------------------------ -/\x{100}/8BM +/\x{100}/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -289,7 +293,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/\x{1000}/8BM +/\x{1000}/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -298,7 +302,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/\x{10000}/8BM +/\x{10000}/utf Memory allocation (code space): 16 ------------------------------------------------------------------ 0 5 Bra @@ -307,7 +311,7 @@ Memory allocation (code space): 16 7 End ------------------------------------------------------------------ -/\x{100000}/8BM +/\x{100000}/utf Memory allocation (code space): 16 ------------------------------------------------------------------ 0 5 Bra @@ -316,7 +320,7 @@ Memory allocation (code space): 16 7 End 
------------------------------------------------------------------ -/\x{10ffff}/8BM +/\x{10ffff}/utf Memory allocation (code space): 16 ------------------------------------------------------------------ 0 5 Bra @@ -325,10 +329,10 @@ Memory allocation (code space): 16 7 End ------------------------------------------------------------------ -/\x{110000}/8BM -Failed: character value in \x{} or \o{} is too large at offset 9 +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large -/[\x{ff}]/8BM +/[\x{ff}]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -337,7 +341,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[\x{100}]/8BM +/[\x{100}]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -346,16 +350,16 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/\x80/8BM +/\x80/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra - 2 \x80 + 2 \x{80} 4 4 Ket 6 End ------------------------------------------------------------------ -/\xff/8BM +/\xff/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -364,7 +368,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/\x{0041}\x{2262}\x{0391}\x{002e}/D8M +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf Memory allocation (code space): 26 ------------------------------------------------------------------ 0 10 Bra @@ -372,12 +376,13 @@ Memory allocation (code space): 26 10 10 Ket 12 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = 'A' -Need char = '.' 
- -/\x{D55c}\x{ad6d}\x{C5B4}/D8M +First code unit = 'A' +Last code unit = '.' +Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf Memory allocation (code space): 22 ------------------------------------------------------------------ 0 8 Bra @@ -385,12 +390,13 @@ Memory allocation (code space): 22 8 8 Ket 10 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = \x{d55c} -Need char = \x{c5b4} +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 -/\x{65e5}\x{672c}\x{8a9e}/D8M +/\x{65e5}\x{672c}\x{8a9e}/I,utf Memory allocation (code space): 22 ------------------------------------------------------------------ 0 8 Bra @@ -398,12 +404,13 @@ Memory allocation (code space): 22 8 8 Ket 10 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = \x{65e5} -Need char = \x{8a9e} +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 -/[\x{100}]/8BM +/[\x{100}]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -412,7 +419,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[Z\x{100}]/8BM +/[Z\x{100}]/utf Memory allocation (code space): 54 ------------------------------------------------------------------ 0 24 Bra @@ -421,7 +428,7 @@ Memory allocation (code space): 54 26 End ------------------------------------------------------------------ -/^[\x{100}\E-\Q\E\x{150}]/B8M +/^[\x{100}\E-\Q\E\x{150}]/utf Memory allocation (code space): 26 ------------------------------------------------------------------ 0 10 Bra @@ -431,7 +438,7 @@ Memory allocation (code space): 26 12 End ------------------------------------------------------------------ -/^[\QÄ€\E-\QÅ\E]/B8M 
+/^[\QÄ€\E-\QÅ\E]/utf Memory allocation (code space): 26 ------------------------------------------------------------------ 0 10 Bra @@ -441,10 +448,10 @@ Memory allocation (code space): 26 12 End ------------------------------------------------------------------ -/^[\QÄ€\E-\QÅ\E/B8M -Failed: missing terminating ] for character class at offset 13 +/^[\QÄ€\E-\QÅ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class -/[\p{L}]/BM +/[\p{L}]/ Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -453,7 +460,7 @@ Memory allocation (code space): 24 11 End ------------------------------------------------------------------ -/[\p{^L}]/BM +/[\p{^L}]/ Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -462,7 +469,7 @@ Memory allocation (code space): 24 11 End ------------------------------------------------------------------ -/[\P{L}]/BM +/[\P{L}]/ Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -471,7 +478,7 @@ Memory allocation (code space): 24 11 End ------------------------------------------------------------------ -/[\P{^L}]/BM +/[\P{^L}]/ Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -480,7 +487,7 @@ Memory allocation (code space): 24 11 End ------------------------------------------------------------------ -/[abc\p{L}\x{0660}]/8BM +/[abc\p{L}\x{0660}]/utf Memory allocation (code space): 60 ------------------------------------------------------------------ 0 27 Bra @@ -489,7 +496,7 @@ Memory allocation (code space): 60 29 End ------------------------------------------------------------------ -/[\p{Nd}]/8BM +/[\p{Nd}]/utf Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -498,7 +505,7 @@ Memory allocation (code space): 24 11 End 
------------------------------------------------------------------ -/[\p{Nd}+-]+/8BM +/[\p{Nd}+-]+/utf Memory allocation (code space): 58 ------------------------------------------------------------------ 0 26 Bra @@ -507,7 +514,7 @@ Memory allocation (code space): 58 28 End ------------------------------------------------------------------ -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iBM +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf Memory allocation (code space): 32 ------------------------------------------------------------------ 0 13 Bra @@ -516,7 +523,7 @@ Memory allocation (code space): 32 15 End ------------------------------------------------------------------ -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8BM +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf Memory allocation (code space): 32 ------------------------------------------------------------------ 0 13 Bra @@ -525,7 +532,7 @@ Memory allocation (code space): 32 15 End ------------------------------------------------------------------ -/[\x{105}-\x{109}]/8iBM +/[\x{105}-\x{109}]/i,utf Memory allocation (code space): 24 ------------------------------------------------------------------ 0 9 Bra @@ -534,7 +541,7 @@ Memory allocation (code space): 24 11 End ------------------------------------------------------------------ -/( ( (?(1)0|) )* )/xBM +/( ( (?(1)0|) )* )/x Memory allocation (code space): 52 ------------------------------------------------------------------ 0 23 Bra @@ -552,7 +559,7 @@ Memory allocation (code space): 52 25 End ------------------------------------------------------------------ -/( (?(1)0|)* )/xBM +/( (?(1)0|)* )/x Memory allocation (code space): 42 ------------------------------------------------------------------ 0 18 Bra @@ -568,7 +575,7 @@ Memory allocation (code space): 42 20 End ------------------------------------------------------------------ -/[a]/BM +/[a]/ Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -577,7 +584,7 @@ Memory 
allocation (code space): 14 6 End ------------------------------------------------------------------ -/[a]/8BM +/[a]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -586,7 +593,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[\xaa]/BM +/[\xaa]/ Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -595,7 +602,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[\xaa]/8BM +/[\xaa]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -604,7 +611,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[^a]/BM +/[^a]/ Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -613,7 +620,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[^a]/8BM +/[^a]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -622,7 +629,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[^\xaa]/BM +/[^\xaa]/ Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -631,7 +638,7 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[^\xaa]/8BM +/[^\xaa]/utf Memory allocation (code space): 14 ------------------------------------------------------------------ 0 4 Bra @@ -640,7 +647,9 @@ Memory allocation (code space): 14 6 End ------------------------------------------------------------------ -/[^\d]/8WB +#pattern -memory + +/[^\d]/utf,ucp 
------------------------------------------------------------------ 0 9 Bra 2 [^\p{Nd}] @@ -648,23 +657,23 @@ Memory allocation (code space): 14 11 End ------------------------------------------------------------------ -/[[:^alpha:][:^cntrl:]]+/8WB +/[[:^alpha:][:^cntrl:]]+/utf,ucp ------------------------------------------------------------------ - 0 30 Bra - 2 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++ - 30 30 Ket - 32 End + 0 13 Bra + 2 [\P{L}\P{Cc}]++ + 13 13 Ket + 15 End ------------------------------------------------------------------ -/[[:^cntrl:][:^alpha:]]+/8WB +/[[:^cntrl:][:^alpha:]]+/utf,ucp ------------------------------------------------------------------ - 0 30 Bra - 2 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++ - 30 30 Ket - 32 End + 0 13 Bra + 2 [\P{Cc}\P{L}]++ + 13 13 Ket + 15 End ------------------------------------------------------------------ -/[[:alpha:]]+/8WB +/[[:alpha:]]+/utf,ucp ------------------------------------------------------------------ 0 10 Bra 2 [\p{L}]++ @@ -672,7 +681,7 @@ Memory allocation (code space): 14 12 End ------------------------------------------------------------------ -/[[:^alpha:]\S]+/8WB +/[[:^alpha:]\S]+/utf,ucp ------------------------------------------------------------------ 0 13 Bra 2 [\P{L}\P{Xsp}]++ @@ -680,7 +689,7 @@ Memory allocation (code space): 14 15 End ------------------------------------------------------------------ -/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ ------------------------------------------------------------------ 0 60 Bra 2 abc @@ -709,63 +718,303 @@ Memory allocation (code space): 14 62 End ------------------------------------------------------------------ -/(((a\2)|(a*)\g<-1>))*a?/B +/(((a\2)|(a*)\g<-1>))*a?/ ------------------------------------------------------------------ - 0 39 Bra + 0 35 Bra 2 Brazero - 3 32 SCBra 1 - 6 27 Once - 8 12 CBra 2 - 11 7 CBra 3 - 14 a - 16 \2 - 18 7 Ket - 20 11 Alt - 22 5 CBra 4 - 25 a* - 27 5 Ket - 
29 22 Recurse - 31 23 Ket - 33 27 Ket - 35 32 KetRmax - 37 a?+ - 39 39 Ket - 41 End + 3 28 SCBra 1 + 6 12 CBra 2 + 9 7 CBra 3 + 12 a + 14 \2 + 16 7 Ket + 18 11 Alt + 20 5 CBra 4 + 23 a* + 25 5 Ket + 27 20 Recurse + 29 23 Ket + 31 28 KetRmax + 33 a?+ + 35 35 Ket + 37 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 16 Bra + 2 12 CBra 1 + 5 7 Recurse + 7 5 CBra 2 + 10 \1 + 12 5 Ket + 14 12 Ket + 16 16 Ket + 18 End ------------------------------------------------------------------ -/((?+1)(\1))/B +"(?1)(?#?'){2}(a)" ------------------------------------------------------------------ - 0 20 Bra - 2 16 Once - 4 12 CBra 1 - 7 9 Recurse - 9 5 CBra 2 - 12 \1 - 14 5 Ket - 16 12 Ket - 18 16 Ket - 20 20 Ket - 22 End + 0 13 Bra + 2 6 Recurse + 4 6 Recurse + 6 5 CBra 1 + 9 a + 11 5 Ket + 13 13 Ket + 15 End ------------------------------------------------------------------ -/.((?2)(?R)\1)()/B +/.((?2)(?R)|\1|$)()/ ------------------------------------------------------------------ - 0 23 Bra + 0 24 Bra 2 Any - 3 13 Once - 5 9 CBra 1 - 8 18 Recurse - 10 0 Recurse + 3 7 CBra 1 + 6 19 Recurse + 8 0 Recurse + 10 4 Alt 12 \1 - 14 9 Ket - 16 13 Ket - 18 3 CBra 2 - 21 3 Ket - 23 23 Ket - 25 End + 14 3 Alt + 16 $ + 17 14 Ket + 19 3 CBra 2 + 22 3 Ket + 24 24 Ket + 26 End ------------------------------------------------------------------ +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 31 Bra + 2 Any + 3 14 CBra 1 + 6 26 Recurse + 8 0 Recurse + 10 3 CBra 2 + 13 3 Ket + 15 10 Recurse + 17 4 Alt + 19 \1 + 21 3 Alt + 23 $ + 24 21 Ket + 26 3 CBra 3 + 29 3 Ket + 31 31 Ket + 33 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 50 Bra + 2 4 Recurse + 4 3 CBra 1 + 7 3 Ket + 9 39 CBra 2 + 12 32 CBra 3 + 15 27 CBra 
4 + 18 22 CBra 5 + 21 15 CBra 6 + 24 10 CBra 7 + 27 5 Once + 29 \1+ + 32 5 Ket + 34 10 Ket + 36 15 Ket + 38 \x{85} + 40 22 KetRmax + 42 27 Ket + 44 2 Alt + 46 34 Ket + 48 39 Ket + 50 50 Ket + 52 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. + +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode +Failed: error 184 at offset 1504: (?| and/or (?J: or (?x: parentheses are too deeply nested + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. + +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 79 Bra + 2 70 Once + 4 6 Cond + 6 1 Cond ref + 8 74 Recurse + 10 6 Ket + 12 6 Cond + 14 1 Cond ref + 16 74 Recurse + 18 6 Ket + 20 6 Cond + 22 1 Cond ref + 24 74 Recurse + 26 6 Ket + 28 6 Cond + 30 1 Cond ref + 32 74 Recurse + 34 6 Ket + 36 6 Cond + 38 1 Cond ref + 40 74 Recurse + 42 6 Ket + 44 6 Cond + 46 1 Cond ref + 48 74 Recurse + 50 6 Ket + 52 6 Cond + 54 1 Cond ref + 56 74 Recurse + 58 6 Ket + 60 10 SBraPos + 62 6 SCond + 64 1 Cond ref + 66 74 Recurse + 68 6 Ket + 70 10 KetRpos + 72 70 Ket + 74 3 CBra 1 + 77 3 Ket + 79 79 Ket + 81 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 43 Bra + 2 34 Once + 4 4 Cond + 6 1 Cond ref + 8 8 Alt + 10 a + 12 38 Recurse + 14 b + 16 12 Ket + 18 16 SBraPos + 20 4 SCond + 22 1 Cond ref + 24 8 Alt + 26 a + 28 38 Recurse + 30 b + 32 12 Ket + 34 16 KetRpos + 36 34 Ket + 38 3 CBra 1 + 41 3 Ket 
+ 43 43 Ket + 45 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 133 Bra + 2 41 CBra 1 + 5 2 Recurse + 7 88 Recurse + 9 93 Recurse + 11 98 Recurse + 13 103 Recurse + 15 108 Recurse + 17 113 Recurse + 19 118 Recurse + 21 123 Recurse + 23 123 Recurse + 25 118 Recurse + 27 113 Recurse + 29 108 Recurse + 31 103 Recurse + 33 98 Recurse + 35 93 Recurse + 37 88 Recurse + 39 2 Recurse + 41 0 Recurse + 43 41 Ket + 45 41 SCBra 1 + 48 2 Recurse + 50 88 Recurse + 52 93 Recurse + 54 98 Recurse + 56 103 Recurse + 58 108 Recurse + 60 113 Recurse + 62 118 Recurse + 64 123 Recurse + 66 123 Recurse + 68 118 Recurse + 70 113 Recurse + 72 108 Recurse + 74 103 Recurse + 76 98 Recurse + 78 93 Recurse + 80 88 Recurse + 82 2 Recurse + 84 0 Recurse + 86 41 KetRmax + 88 3 CBra 2 + 91 3 Ket + 93 3 CBra 3 + 96 3 Ket + 98 3 CBra 4 +101 3 Ket +103 3 CBra 5 +106 3 Ket +108 3 CBra 6 +111 3 Ket +113 3 CBra 7 +116 3 Ket +118 3 CBra 8 +121 3 Ket +123 3 CBra 9 +126 3 Ket +128 3 CBra 10 +131 3 Ket +133 133 Ket +135 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: regular expression is too complicated at offset 490 +Failed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand +Failed: error 120 at offset 131070: regular expression is too large -/-- End of testinput11 --/ +# End of testinput8 diff --git a/src/pcre2/testdata/testoutput8-16-3 b/src/pcre2/testdata/testoutput8-16-3 new file mode 100644 index 00000000..80ee1c99 --- /dev/null +++ b/src/pcre2/testdata/testoutput8-16-3 @@ -0,0 +1,1018 @@ +# There are two 
sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. + +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 6 CBra 1 + 7 /i b + 9 6 Ket + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 20 Bra + 3 8 CBra 1 + 7 AllAny* + 9 X + 11 6 Alt + 14 ^ + 15 B + 17 14 Ket + 20 20 Ket + 23 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 46 +------------------------------------------------------------------ + 0 19 Bra + 3 7 Bra + 6 AllAny* + 8 X + 10 6 Alt + 13 ^ + 14 B + 16 13 Ket + 19 19 Ket + 22 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 50 +------------------------------------------------------------------ + 0 21 Bra + 3 ^ + 4 [0-9A-Za-z] + 21 21 Ket + 24 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 3 Bra + 3 3 Ket + 6 End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix +Memory allocation (code space): 18 
+------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/x?+/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 x?+ + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 x++ + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 24 +------------------------------------------------------------------ + 0 8 Bra + 3 x + 5 x{0,2}+ + 8 8 Ket + 11 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 34 +------------------------------------------------------------------ + 0 13 Bra + 3 Braposzero + 4 6 CBraPos 1 + 8 x + 10 6 KetRpos + 13 13 Ket + 16 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 166 +------------------------------------------------------------------ + 0 79 Bra + 3 ^ + 4 72 CBra 1 + 8 6 CBra 2 + 12 a+ + 14 6 Ket + 17 22 CBra 3 + 21 [ab]+? 
+ 39 22 Ket + 42 22 CBra 4 + 46 [bc]+ + 64 22 Ket + 67 6 CBra 5 + 71 \w*+ + 73 6 Ket + 76 72 Ket + 79 79 Ket + 82 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 1652 +------------------------------------------------------------------ + 0 822 Bra + 3 8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +821 \b +822 822 Ket +825 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 1632 +------------------------------------------------------------------ + 0 812 Bra + 3 
$<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +811 \b +812 812 Ket +815 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 42 +------------------------------------------------------------------ + 0 17 Bra + 3 11 CBra 1 + 7 a + 9 3 Recurse + 12 b + 14 11 Ket + 17 17 Ket + 20 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 54 +------------------------------------------------------------------ + 0 23 Bra + 3 17 CBra 1 + 7 a + 9 6 SBra + 12 3 Recurse + 15 6 KetRmax + 18 b + 20 17 Ket + 23 23 Ket + 26 End +------------------------------------------------------------------ + +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 68 +------------------------------------------------------------------ + 0 30 Bra + 3 a + 5 6 CBra 1 + 9 b + 11 5 Alt + 14 c + 16 11 Ket + 19 d + 21 6 CBra 2 + 25 e + 27 6 Ket + 30 30 Ket + 33 End +------------------------------------------------------------------ + +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 38 Bra + 3 23 Bra + 6 a + 8 15 CBra 1 + 12 c + 14 6 CBra 2 + 18 d + 20 6 Ket + 23 15 Ket + 26 23 Ket + 29 6 CBra 3 + 33 a + 35 6 Ket + 38 38 Ket + 41 End +------------------------------------------------------------------ + +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 28 Bra + 3 6 CBra 1 + 7 a + 9 6 Ket + 12 Any + 13 Any + 14 Any + 15 \1 + 17 bbb + 23 3 Recurse + 26 d + 28 28 Ket + 31 End 
+------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 62 +------------------------------------------------------------------ + 0 27 Bra + 3 abc + 9 Callout 255 10 1 + 15 de + 19 Callout 0 16 1 + 25 f + 27 27 Ket + 30 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 106 +------------------------------------------------------------------ + 0 49 Bra + 3 Callout 255 0 1 + 9 a + 11 Callout 255 1 1 + 17 b + 19 Callout 255 2 1 + 25 c + 27 Callout 255 3 1 + 33 d + 35 Callout 255 4 1 + 41 e + 43 Callout 255 5 0 + 49 49 Ket + 52 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{1000} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{10000} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{100000} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{10ffff} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} 
is too large + +/[\x{ff}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{ff} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{80} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{ff} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 A\x{2262}\x{391}. + 11 11 Ket + 14 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' 
+Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 9 Bra + 3 \x{d55c}\x{ad6d}\x{c5b4} + 9 9 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 9 Bra + 3 \x{65e5}\x{672c}\x{8a9e} + 9 9 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 26 Bra + 3 [Z\x{100}] + 26 26 Ket + 29 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 ^ + 4 [\x{100}-\x{150}] + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/^[\QÄ€\E-\QÅ\E]/utf +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 ^ + 4 [\x{100}-\x{150}] + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/^[\QÄ€\E-\QÅ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 30 
+------------------------------------------------------------------ + 0 11 Bra + 3 [\p{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\P{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\P{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\p{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 66 +------------------------------------------------------------------ + 0 29 Bra + 3 [a-c\p{L}\x{660}] + 29 29 Ket + 32 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\p{Nd}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 28 Bra + 3 [+\-\p{Nd}]++ + 28 28 Ket + 31 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 36 +------------------------------------------------------------------ + 0 14 Bra + 3 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 14 14 Ket + 17 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 36 
+------------------------------------------------------------------ + 0 14 Bra + 3 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 14 14 Ket + 17 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\x{104}-\x{109}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 70 +------------------------------------------------------------------ + 0 31 Bra + 3 25 CBra 1 + 7 Brazero + 8 17 SCBra 2 + 12 7 Cond + 15 1 Cond ref + 17 0 + 19 3 Alt + 22 10 Ket + 25 17 KetRmax + 28 25 Ket + 31 31 Ket + 34 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 56 +------------------------------------------------------------------ + 0 24 Bra + 3 18 CBra 1 + 7 Brazero + 8 7 SCond + 11 1 Cond ref + 13 0 + 15 3 Alt + 18 10 KetRmax + 21 18 Ket + 24 24 Ket + 27 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{aa} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{aa} + 5 5 Ket + 8 End 
+------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^a] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^a] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^\x{aa}] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^\x{aa}] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 11 Bra + 3 [^\p{Nd}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{L}\P{Cc}]++ + 15 15 Ket + 18 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{Cc}\P{L}]++ + 15 15 Ket + 18 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 12 Bra + 3 [\p{L}]++ + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{L}\P{Xsp}]++ + 15 15 Ket + 18 End 
+------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 70 Bra + 3 abc + 9 6 CBra 1 + 13 d + 15 5 Alt + 18 e + 20 11 Ket + 23 *THEN + 24 x + 26 13 CBra 2 + 30 123 + 36 *THEN + 37 4 + 39 28 Alt + 42 567 + 48 6 CBra 3 + 52 b + 54 5 Alt + 57 q + 59 11 Ket + 62 *THEN + 63 xx + 67 41 Ket + 70 70 Ket + 73 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 46 Bra + 3 Brazero + 4 37 SCBra 1 + 8 15 CBra 2 + 12 8 CBra 3 + 16 a + 18 \2 + 20 8 Ket + 23 15 Alt + 26 6 CBra 4 + 30 a* + 32 6 Ket + 35 26 Recurse + 38 30 Ket + 41 37 KetRmax + 44 a?+ + 46 46 Ket + 49 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 22 Bra + 3 16 CBra 1 + 7 10 Recurse + 10 6 CBra 2 + 14 \1 + 16 6 Ket + 19 16 Ket + 22 22 Ket + 25 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 18 Bra + 3 9 Recurse + 6 9 Recurse + 9 6 CBra 1 + 13 a + 15 6 Ket + 18 18 Ket + 21 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 33 Bra + 3 Any + 4 10 CBra 1 + 8 26 Recurse + 11 0 Recurse + 14 5 Alt + 17 \1 + 19 4 Alt + 22 $ + 23 19 Ket + 26 4 CBra 2 + 30 4 Ket + 33 33 Ket + 36 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 43 Bra + 3 Any + 4 20 CBra 1 + 8 36 Recurse + 11 0 Recurse + 14 4 CBra 2 + 18 4 Ket + 21 14 Recurse + 24 5 Alt + 27 \1 + 29 4 Alt + 32 $ + 33 29 Ket + 36 4 CBra 3 + 40 4 Ket + 43 43 Ket 
+ 46 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 69 Bra + 3 6 Recurse + 6 4 CBra 1 + 10 4 Ket + 13 53 CBra 2 + 17 43 CBra 3 + 21 36 CBra 4 + 25 29 CBra 5 + 29 20 CBra 6 + 33 13 CBra 7 + 37 6 Once + 40 \1+ + 43 6 Ket + 46 13 Ket + 49 20 Ket + 52 \x{85} + 54 29 KetRmax + 57 36 Ket + 60 3 Alt + 63 46 Ket + 66 53 Ket + 69 69 Ket + 72 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. + +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?
|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. + +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 110 Bra + 3 97 Once + 6 8 Cond + 9 1 Cond ref + 11 103 Recurse + 14 8 Ket + 17 8 Cond + 20 1 Cond ref + 22 103 Recurse + 25 8 Ket + 28 8 Cond + 31 1 Cond ref + 33 103 Recurse + 36 8 Ket + 39 8 Cond + 42 1 Cond ref + 44 103 Recurse + 47 8 Ket + 50 8 Cond + 53 1 Cond ref + 55 103 Recurse + 58 8 Ket + 61 8 Cond + 64 1 Cond ref + 66 103 Recurse + 69 8 Ket + 72 8 Cond + 75 1 Cond ref + 77 103 Recurse + 80 8 Ket + 83 14 SBraPos + 86 8 SCond + 89 1 Cond ref + 91 103 Recurse + 94 8 Ket + 97 14 KetRpos +100 97 Ket +103 4 CBra 1 +107 4 Ket +110 110 Ket +113 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 58 Bra + 3 45 Once + 6 5 Cond + 9 1 Cond ref + 11 10 Alt + 
14 a + 16 51 Recurse + 19 b + 21 15 Ket + 24 21 SBraPos + 27 5 SCond + 30 1 Cond ref + 32 10 Alt + 35 a + 37 51 Recurse + 40 b + 42 15 Ket + 45 21 KetRpos + 48 45 Ket + 51 4 CBra 1 + 55 4 Ket + 58 58 Ket + 61 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 194 Bra + 3 61 CBra 1 + 7 3 Recurse + 10 131 Recurse + 13 138 Recurse + 16 145 Recurse + 19 152 Recurse + 22 159 Recurse + 25 166 Recurse + 28 173 Recurse + 31 180 Recurse + 34 180 Recurse + 37 173 Recurse + 40 166 Recurse + 43 159 Recurse + 46 152 Recurse + 49 145 Recurse + 52 138 Recurse + 55 131 Recurse + 58 3 Recurse + 61 0 Recurse + 64 61 Ket + 67 61 SCBra 1 + 71 3 Recurse + 74 131 Recurse + 77 138 Recurse + 80 145 Recurse + 83 152 Recurse + 86 159 Recurse + 89 166 Recurse + 92 173 Recurse + 95 180 Recurse + 98 180 Recurse +101 173 Recurse +104 166 Recurse +107 159 Recurse +110 152 Recurse +113 145 Recurse +116 138 Recurse +119 131 Recurse +122 3 Recurse +125 0 Recurse +128 61 KetRmax +131 4 CBra 2 +135 4 Ket +138 4 CBra 3 +142 4 Ket +145 4 CBra 4 +149 4 Ket +152 4 CBra 5 +156 4 Ket +159 4 CBra 6 +163 4 Ket +166 4 CBra 7 +170 4 Ket +173 4 CBra 8 +177 4 Ket +180 4 CBra 9 +184 4 Ket +187 4 CBra 10 +191 4 Ket +194 194 Ket +197 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre2/testdata/testoutput8-16-4 b/src/pcre2/testdata/testoutput8-16-4 new file mode 100644 index 00000000..80ee1c99 --- /dev/null +++ 
b/src/pcre2/testdata/testoutput8-16-4 @@ -0,0 +1,1018 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. + +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 6 CBra 1 + 7 /i b + 9 6 Ket + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 20 Bra + 3 8 CBra 1 + 7 AllAny* + 9 X + 11 6 Alt + 14 ^ + 15 B + 17 14 Ket + 20 20 Ket + 23 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 46 +------------------------------------------------------------------ + 0 19 Bra + 3 7 Bra + 6 AllAny* + 8 X + 10 6 Alt + 13 ^ + 14 B + 16 13 Ket + 19 19 Ket + 22 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 50 +------------------------------------------------------------------ + 0 21 Bra + 3 ^ + 4 [0-9A-Za-z] + 21 21 Ket + 24 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 3 Bra + 3 3 Ket + 6 End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix 
+Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/x?+/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 x?+ + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 x++ + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 24 +------------------------------------------------------------------ + 0 8 Bra + 3 x + 5 x{0,2}+ + 8 8 Ket + 11 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 34 +------------------------------------------------------------------ + 0 13 Bra + 3 Braposzero + 4 6 CBraPos 1 + 8 x + 10 6 KetRpos + 13 13 Ket + 16 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 166 +------------------------------------------------------------------ + 0 79 Bra + 3 ^ + 4 72 CBra 1 + 8 6 CBra 2 + 12 a+ + 14 6 Ket + 17 22 CBra 3 + 21 [ab]+? 
+ 39 22 Ket + 42 22 CBra 4 + 46 [bc]+ + 64 22 Ket + 67 6 CBra 5 + 71 \w*+ + 73 6 Ket + 76 72 Ket + 79 79 Ket + 82 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 1652 +------------------------------------------------------------------ + 0 822 Bra + 3 8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +821 \b +822 822 Ket +825 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 1632 +------------------------------------------------------------------ + 0 812 Bra + 3 
$<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +811 \b +812 812 Ket +815 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 42 +------------------------------------------------------------------ + 0 17 Bra + 3 11 CBra 1 + 7 a + 9 3 Recurse + 12 b + 14 11 Ket + 17 17 Ket + 20 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 54 +------------------------------------------------------------------ + 0 23 Bra + 3 17 CBra 1 + 7 a + 9 6 SBra + 12 3 Recurse + 15 6 KetRmax + 18 b + 20 17 Ket + 23 23 Ket + 26 End +------------------------------------------------------------------ + +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 68 +------------------------------------------------------------------ + 0 30 Bra + 3 a + 5 6 CBra 1 + 9 b + 11 5 Alt + 14 c + 16 11 Ket + 19 d + 21 6 CBra 2 + 25 e + 27 6 Ket + 30 30 Ket + 33 End +------------------------------------------------------------------ + +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 38 Bra + 3 23 Bra + 6 a + 8 15 CBra 1 + 12 c + 14 6 CBra 2 + 18 d + 20 6 Ket + 23 15 Ket + 26 23 Ket + 29 6 CBra 3 + 33 a + 35 6 Ket + 38 38 Ket + 41 End +------------------------------------------------------------------ + +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 28 Bra + 3 6 CBra 1 + 7 a + 9 6 Ket + 12 Any + 13 Any + 14 Any + 15 \1 + 17 bbb + 23 3 Recurse + 26 d + 28 28 Ket + 31 End 
+------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 62 +------------------------------------------------------------------ + 0 27 Bra + 3 abc + 9 Callout 255 10 1 + 15 de + 19 Callout 0 16 1 + 25 f + 27 27 Ket + 30 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 106 +------------------------------------------------------------------ + 0 49 Bra + 3 Callout 255 0 1 + 9 a + 11 Callout 255 1 1 + 17 b + 19 Callout 255 2 1 + 25 c + 27 Callout 255 3 1 + 33 d + 35 Callout 255 4 1 + 41 e + 43 Callout 255 5 0 + 49 49 Ket + 52 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{1000} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{10000} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{100000} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 6 Bra + 3 \x{10ffff} + 6 6 Ket + 9 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} 
is too large + +/[\x{ff}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{ff} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{80} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{ff} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 A\x{2262}\x{391}. + 11 11 Ket + 14 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' 
+Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 9 Bra + 3 \x{d55c}\x{ad6d}\x{c5b4} + 9 9 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 9 Bra + 3 \x{65e5}\x{672c}\x{8a9e} + 9 9 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{100} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 26 Bra + 3 [Z\x{100}] + 26 26 Ket + 29 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 ^ + 4 [\x{100}-\x{150}] + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/utf +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 12 Bra + 3 ^ + 4 [\x{100}-\x{150}] + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 30
+------------------------------------------------------------------ + 0 11 Bra + 3 [\p{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\P{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\P{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\p{L}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 66 +------------------------------------------------------------------ + 0 29 Bra + 3 [a-c\p{L}\x{660}] + 29 29 Ket + 32 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\p{Nd}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 28 Bra + 3 [+\-\p{Nd}]++ + 28 28 Ket + 31 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 36 +------------------------------------------------------------------ + 0 14 Bra + 3 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 14 14 Ket + 17 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 36 
+------------------------------------------------------------------ + 0 14 Bra + 3 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 14 14 Ket + 17 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 11 Bra + 3 [\x{104}-\x{109}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 70 +------------------------------------------------------------------ + 0 31 Bra + 3 25 CBra 1 + 7 Brazero + 8 17 SCBra 2 + 12 7 Cond + 15 1 Cond ref + 17 0 + 19 3 Alt + 22 10 Ket + 25 17 KetRmax + 28 25 Ket + 31 31 Ket + 34 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 56 +------------------------------------------------------------------ + 0 24 Bra + 3 18 CBra 1 + 7 Brazero + 8 7 SCond + 11 1 Cond ref + 13 0 + 15 3 Alt + 18 10 KetRmax + 21 18 Ket + 24 24 Ket + 27 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 a + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{aa} + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 \x{aa} + 5 5 Ket + 8 End 
+------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^a] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^a] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^\x{aa}] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 5 Bra + 3 [^\x{aa}] + 5 5 Ket + 8 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 11 Bra + 3 [^\p{Nd}] + 11 11 Ket + 14 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{L}\P{Cc}]++ + 15 15 Ket + 18 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{Cc}\P{L}]++ + 15 15 Ket + 18 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 12 Bra + 3 [\p{L}]++ + 12 12 Ket + 15 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 3 [\P{L}\P{Xsp}]++ + 15 15 Ket + 18 End 
+------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 70 Bra + 3 abc + 9 6 CBra 1 + 13 d + 15 5 Alt + 18 e + 20 11 Ket + 23 *THEN + 24 x + 26 13 CBra 2 + 30 123 + 36 *THEN + 37 4 + 39 28 Alt + 42 567 + 48 6 CBra 3 + 52 b + 54 5 Alt + 57 q + 59 11 Ket + 62 *THEN + 63 xx + 67 41 Ket + 70 70 Ket + 73 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 46 Bra + 3 Brazero + 4 37 SCBra 1 + 8 15 CBra 2 + 12 8 CBra 3 + 16 a + 18 \2 + 20 8 Ket + 23 15 Alt + 26 6 CBra 4 + 30 a* + 32 6 Ket + 35 26 Recurse + 38 30 Ket + 41 37 KetRmax + 44 a?+ + 46 46 Ket + 49 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 22 Bra + 3 16 CBra 1 + 7 10 Recurse + 10 6 CBra 2 + 14 \1 + 16 6 Ket + 19 16 Ket + 22 22 Ket + 25 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 18 Bra + 3 9 Recurse + 6 9 Recurse + 9 6 CBra 1 + 13 a + 15 6 Ket + 18 18 Ket + 21 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 33 Bra + 3 Any + 4 10 CBra 1 + 8 26 Recurse + 11 0 Recurse + 14 5 Alt + 17 \1 + 19 4 Alt + 22 $ + 23 19 Ket + 26 4 CBra 2 + 30 4 Ket + 33 33 Ket + 36 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 43 Bra + 3 Any + 4 20 CBra 1 + 8 36 Recurse + 11 0 Recurse + 14 4 CBra 2 + 18 4 Ket + 21 14 Recurse + 24 5 Alt + 27 \1 + 29 4 Alt + 32 $ + 33 29 Ket + 36 4 CBra 3 + 40 4 Ket + 43 43 Ket 
+ 46 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 69 Bra + 3 6 Recurse + 6 4 CBra 1 + 10 4 Ket + 13 53 CBra 2 + 17 43 CBra 3 + 21 36 CBra 4 + 25 29 CBra 5 + 29 20 CBra 6 + 33 13 CBra 7 + 37 6 Once + 40 \1+ + 43 6 Ket + 46 13 Ket + 49 20 Ket + 52 \x{85} + 54 29 KetRmax + 57 36 Ket + 60 3 Alt + 63 46 Ket + 66 53 Ket + 69 69 Ket + 72 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. + +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?
|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. + +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 110 Bra + 3 97 Once + 6 8 Cond + 9 1 Cond ref + 11 103 Recurse + 14 8 Ket + 17 8 Cond + 20 1 Cond ref + 22 103 Recurse + 25 8 Ket + 28 8 Cond + 31 1 Cond ref + 33 103 Recurse + 36 8 Ket + 39 8 Cond + 42 1 Cond ref + 44 103 Recurse + 47 8 Ket + 50 8 Cond + 53 1 Cond ref + 55 103 Recurse + 58 8 Ket + 61 8 Cond + 64 1 Cond ref + 66 103 Recurse + 69 8 Ket + 72 8 Cond + 75 1 Cond ref + 77 103 Recurse + 80 8 Ket + 83 14 SBraPos + 86 8 SCond + 89 1 Cond ref + 91 103 Recurse + 94 8 Ket + 97 14 KetRpos +100 97 Ket +103 4 CBra 1 +107 4 Ket +110 110 Ket +113 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 58 Bra + 3 45 Once + 6 5 Cond + 9 1 Cond ref + 11 10 Alt + 
14 a + 16 51 Recurse + 19 b + 21 15 Ket + 24 21 SBraPos + 27 5 SCond + 30 1 Cond ref + 32 10 Alt + 35 a + 37 51 Recurse + 40 b + 42 15 Ket + 45 21 KetRpos + 48 45 Ket + 51 4 CBra 1 + 55 4 Ket + 58 58 Ket + 61 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 194 Bra + 3 61 CBra 1 + 7 3 Recurse + 10 131 Recurse + 13 138 Recurse + 16 145 Recurse + 19 152 Recurse + 22 159 Recurse + 25 166 Recurse + 28 173 Recurse + 31 180 Recurse + 34 180 Recurse + 37 173 Recurse + 40 166 Recurse + 43 159 Recurse + 46 152 Recurse + 49 145 Recurse + 52 138 Recurse + 55 131 Recurse + 58 3 Recurse + 61 0 Recurse + 64 61 Ket + 67 61 SCBra 1 + 71 3 Recurse + 74 131 Recurse + 77 138 Recurse + 80 145 Recurse + 83 152 Recurse + 86 159 Recurse + 89 166 Recurse + 92 173 Recurse + 95 180 Recurse + 98 180 Recurse +101 173 Recurse +104 166 Recurse +107 159 Recurse +110 152 Recurse +113 145 Recurse +116 138 Recurse +119 131 Recurse +122 3 Recurse +125 0 Recurse +128 61 KetRmax +131 4 CBra 2 +135 4 Ket +138 4 CBra 3 +142 4 Ket +145 4 CBra 4 +149 4 Ket +152 4 CBra 5 +156 4 Ket +159 4 CBra 6 +163 4 Ket +166 4 CBra 7 +170 4 Ket +173 4 CBra 8 +177 4 Ket +180 4 CBra 9 +184 4 Ket +187 4 CBra 10 +191 4 Ket +194 194 Ket +197 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre/testdata/testoutput11-32 b/src/pcre2/testdata/testoutput8-32-2 similarity index 62% rename from src/pcre/testdata/testoutput11-32 rename to 
src/pcre2/testdata/testoutput8-32-2 index e19518db..91d96c94 100644 --- a/src/pcre/testdata/testoutput11-32 +++ b/src/pcre2/testdata/testoutput8-32-2 @@ -1,10 +1,15 @@ -/-- These are a few representative patterns whose lengths and offsets are to be -shown when the link size is 2. This is just a doublecheck test to ensure the -sizes don't go horribly wrong when something is changed. The pattern contents -are all themselves checked in other tests. Unicode, including property support, -is required for these tests. --/ - -/((?i)b)/BM +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. 
+ +#pattern fullbincode,memory + +/((?i)b)/ Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -15,7 +20,7 @@ Memory allocation (code space): 48 11 End ------------------------------------------------------------------ -/(?s)(.*X|^B)/BM +/(?s)(.*X|^B)/ Memory allocation (code space): 76 ------------------------------------------------------------------ 0 16 Bra @@ -30,7 +35,7 @@ Memory allocation (code space): 76 18 End ------------------------------------------------------------------ -/(?s:.*X|^B)/BM +/(?s:.*X|^B)/ Memory allocation (code space): 72 ------------------------------------------------------------------ 0 15 Bra @@ -45,7 +50,7 @@ Memory allocation (code space): 72 17 End ------------------------------------------------------------------ -/^[[:alnum:]]/BM +/^[[:alnum:]]/ Memory allocation (code space): 60 ------------------------------------------------------------------ 0 12 Bra @@ -55,20 +60,19 @@ Memory allocation (code space): 60 14 End ------------------------------------------------------------------ -/#/IxMD +/#/Ix Memory allocation (code space): 20 ------------------------------------------------------------------ 0 2 Bra 2 2 Ket 4 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 May match empty string Options: extended -No first char -No need char +Subject length lower bound = 0 -/a#/IxMD +/a#/Ix Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -76,12 +80,12 @@ Memory allocation (code space): 28 4 4 Ket 6 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: extended -First char = 'a' -No need char +First code unit = 'a' +Subject length lower bound = 1 -/x?+/BM +/x?+/ Memory allocation (code space): 28 
------------------------------------------------------------------ 0 4 Bra @@ -90,7 +94,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/x++/BM +/x++/ Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -99,7 +103,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/x{1,3}+/BM +/x{1,3}+/ Memory allocation (code space): 40 ------------------------------------------------------------------ 0 7 Bra @@ -109,7 +113,7 @@ Memory allocation (code space): 40 9 End ------------------------------------------------------------------ -/(x)*+/BM +/(x)*+/ Memory allocation (code space): 52 ------------------------------------------------------------------ 0 10 Bra @@ -121,7 +125,7 @@ Memory allocation (code space): 52 12 End ------------------------------------------------------------------ -/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/BM +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ Memory allocation (code space): 220 ------------------------------------------------------------------ 0 52 Bra @@ -144,7 +148,7 @@ Memory allocation (code space): 220 54 End ------------------------------------------------------------------ -|8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM 
+"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" Memory allocation (code space): 3296 ------------------------------------------------------------------ 0 821 Bra @@ -154,7 +158,7 @@ Memory allocation (code space): 3296 823 End ------------------------------------------------------------------ -|\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" Memory allocation (code space): 3256 ------------------------------------------------------------------ 0 811 Bra @@ -164,7 +168,7 @@ Memory allocation (code space): 3256 813 End 
------------------------------------------------------------------ -/(a(?1)b)/BM +/(a(?1)b)/ Memory allocation (code space): 64 ------------------------------------------------------------------ 0 13 Bra @@ -177,13 +181,13 @@ Memory allocation (code space): 64 15 End ------------------------------------------------------------------ -/(a(?1)+b)/BM +/(a(?1)+b)/ Memory allocation (code space): 80 ------------------------------------------------------------------ 0 17 Bra 2 13 CBra 1 5 a - 7 4 Once + 7 4 SBra 9 2 Recurse 11 4 KetRmax 13 b @@ -192,8 +196,8 @@ Memory allocation (code space): 80 19 End ------------------------------------------------------------------ -/a(?Pb|c)d(?Pe)/BM -Memory allocation (code space): 186 +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 108 ------------------------------------------------------------------ 0 24 Bra 2 a @@ -210,8 +214,8 @@ Memory allocation (code space): 186 26 End ------------------------------------------------------------------ -/(?:a(?Pc(?Pd)))(?Pa)/BM -Memory allocation (code space): 155 +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 128 ------------------------------------------------------------------ 0 29 Bra 2 18 Bra @@ -230,8 +234,8 @@ Memory allocation (code space): 155 31 End ------------------------------------------------------------------ -/(?Pa)...(?P=a)bbb(?P>a)d/BM -Memory allocation (code space): 189 +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 108 ------------------------------------------------------------------ 0 24 Bra 2 5 CBra 1 @@ -248,7 +252,7 @@ Memory allocation (code space): 189 26 End ------------------------------------------------------------------ -/abc(?C255)de(?C)f/BM +/abc(?C255)de(?C)f/ Memory allocation (code space): 100 ------------------------------------------------------------------ 0 22 Bra @@ -261,7 +265,7 @@ Memory allocation (code space): 100 24 End ------------------------------------------------------------------ -/abcde/CBM +/abcde/auto_callout 
Memory allocation (code space): 156 ------------------------------------------------------------------ 0 36 Bra @@ -280,7 +284,7 @@ Memory allocation (code space): 156 38 End ------------------------------------------------------------------ -/\x{100}/8BM +/\x{100}/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -289,7 +293,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x{1000}/8BM +/\x{1000}/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -298,7 +302,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x{10000}/8BM +/\x{10000}/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -307,7 +311,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x{100000}/8BM +/\x{100000}/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -316,7 +320,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x{10ffff}/8BM +/\x{10ffff}/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -325,10 +329,10 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x{110000}/8BM -Failed: character value in \x{} or \o{} is too large at offset 9 +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large -/[\x{ff}]/8BM +/[\x{ff}]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -337,7 +341,7 @@ Memory allocation (code space): 28 6 End 
------------------------------------------------------------------ -/[\x{100}]/8BM +/[\x{100}]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -346,16 +350,16 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x80/8BM +/\x80/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra - 2 \x80 + 2 \x{80} 4 4 Ket 6 End ------------------------------------------------------------------ -/\xff/8BM +/\xff/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -364,7 +368,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/\x{0041}\x{2262}\x{0391}\x{002e}/D8M +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf Memory allocation (code space): 52 ------------------------------------------------------------------ 0 10 Bra @@ -372,12 +376,13 @@ Memory allocation (code space): 52 10 10 Ket 12 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = 'A' -Need char = '.' - -/\x{D55c}\x{ad6d}\x{C5B4}/D8M +First code unit = 'A' +Last code unit = '.' 
+Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf Memory allocation (code space): 44 ------------------------------------------------------------------ 0 8 Bra @@ -385,12 +390,13 @@ Memory allocation (code space): 44 8 8 Ket 10 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = \x{d55c} -Need char = \x{c5b4} +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 -/\x{65e5}\x{672c}\x{8a9e}/D8M +/\x{65e5}\x{672c}\x{8a9e}/I,utf Memory allocation (code space): 44 ------------------------------------------------------------------ 0 8 Bra @@ -398,12 +404,13 @@ Memory allocation (code space): 44 8 8 Ket 10 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = \x{65e5} -Need char = \x{8a9e} +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 -/[\x{100}]/8BM +/[\x{100}]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -412,7 +419,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[Z\x{100}]/8BM +/[Z\x{100}]/utf Memory allocation (code space): 76 ------------------------------------------------------------------ 0 16 Bra @@ -421,7 +428,7 @@ Memory allocation (code space): 76 18 End ------------------------------------------------------------------ -/^[\x{100}\E-\Q\E\x{150}]/B8M +/^[\x{100}\E-\Q\E\x{150}]/utf Memory allocation (code space): 52 ------------------------------------------------------------------ 0 10 Bra @@ -431,7 +438,7 @@ Memory allocation (code space): 52 12 End ------------------------------------------------------------------ -/^[\QĀ\E-\QŐ\E]/B8M +/^[\QĀ\E-\QŐ\E]/utf Memory allocation (code space): 52
------------------------------------------------------------------ 0 10 Bra @@ -441,10 +448,10 @@ Memory allocation (code space): 52 12 End ------------------------------------------------------------------ -/^[\QĀ\E-\QŐ\E/B8M -Failed: missing terminating ] for character class at offset 13 +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class -/[\p{L}]/BM +/[\p{L}]/ Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -453,7 +460,7 @@ Memory allocation (code space): 48 11 End ------------------------------------------------------------------ -/[\p{^L}]/BM +/[\p{^L}]/ Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -462,7 +469,7 @@ Memory allocation (code space): 48 11 End ------------------------------------------------------------------ -/[\P{L}]/BM +/[\P{L}]/ Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -471,7 +478,7 @@ Memory allocation (code space): 48 11 End ------------------------------------------------------------------ -/[\P{^L}]/BM +/[\P{^L}]/ Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -480,7 +487,7 @@ Memory allocation (code space): 48 11 End ------------------------------------------------------------------ -/[abc\p{L}\x{0660}]/8BM +/[abc\p{L}\x{0660}]/utf Memory allocation (code space): 88 ------------------------------------------------------------------ 0 19 Bra @@ -489,7 +496,7 @@ Memory allocation (code space): 88 21 End ------------------------------------------------------------------ -/[\p{Nd}]/8BM +/[\p{Nd}]/utf Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -498,7 +505,7 @@ Memory allocation (code space): 48 11 End
------------------------------------------------------------------ -/[\p{Nd}+-]+/8BM +/[\p{Nd}+-]+/utf Memory allocation (code space): 84 ------------------------------------------------------------------ 0 18 Bra @@ -507,7 +514,7 @@ Memory allocation (code space): 84 20 End ------------------------------------------------------------------ -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iBM +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf Memory allocation (code space): 60 ------------------------------------------------------------------ 0 12 Bra @@ -516,7 +523,7 @@ Memory allocation (code space): 60 14 End ------------------------------------------------------------------ -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8BM +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf Memory allocation (code space): 60 ------------------------------------------------------------------ 0 12 Bra @@ -525,7 +532,7 @@ Memory allocation (code space): 60 14 End ------------------------------------------------------------------ -/[\x{105}-\x{109}]/8iBM +/[\x{105}-\x{109}]/i,utf Memory allocation (code space): 48 ------------------------------------------------------------------ 0 9 Bra @@ -534,7 +541,7 @@ Memory allocation (code space): 48 11 End ------------------------------------------------------------------ -/( ( (?(1)0|) )* )/xBM +/( ( (?(1)0|) )* )/x Memory allocation (code space): 104 ------------------------------------------------------------------ 0 23 Bra @@ -552,7 +559,7 @@ Memory allocation (code space): 104 25 End ------------------------------------------------------------------ -/( (?(1)0|)* )/xBM +/( (?(1)0|)* )/x Memory allocation (code space): 84 ------------------------------------------------------------------ 0 18 Bra @@ -568,7 +575,7 @@ Memory allocation (code space): 84 20 End ------------------------------------------------------------------ -/[a]/BM +/[a]/ Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -577,7 +584,7 @@ Memory 
allocation (code space): 28 6 End ------------------------------------------------------------------ -/[a]/8BM +/[a]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -586,7 +593,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[\xaa]/BM +/[\xaa]/ Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -595,7 +602,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[\xaa]/8BM +/[\xaa]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -604,7 +611,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[^a]/BM +/[^a]/ Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -613,7 +620,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[^a]/8BM +/[^a]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -622,7 +629,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[^\xaa]/BM +/[^\xaa]/ Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -631,7 +638,7 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[^\xaa]/8BM +/[^\xaa]/utf Memory allocation (code space): 28 ------------------------------------------------------------------ 0 4 Bra @@ -640,7 +647,9 @@ Memory allocation (code space): 28 6 End ------------------------------------------------------------------ -/[^\d]/8WB +#pattern -memory + +/[^\d]/utf,ucp 
------------------------------------------------------------------ 0 9 Bra 2 [^\p{Nd}] @@ -648,23 +657,23 @@ Memory allocation (code space): 28 11 End ------------------------------------------------------------------ -/[[:^alpha:][:^cntrl:]]+/8WB +/[[:^alpha:][:^cntrl:]]+/utf,ucp ------------------------------------------------------------------ - 0 21 Bra - 2 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++ - 21 21 Ket - 23 End + 0 13 Bra + 2 [\P{L}\P{Cc}]++ + 13 13 Ket + 15 End ------------------------------------------------------------------ -/[[:^cntrl:][:^alpha:]]+/8WB +/[[:^cntrl:][:^alpha:]]+/utf,ucp ------------------------------------------------------------------ - 0 21 Bra - 2 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++ - 21 21 Ket - 23 End + 0 13 Bra + 2 [\P{Cc}\P{L}]++ + 13 13 Ket + 15 End ------------------------------------------------------------------ -/[[:alpha:]]+/8WB +/[[:alpha:]]+/utf,ucp ------------------------------------------------------------------ 0 10 Bra 2 [\p{L}]++ @@ -672,7 +681,7 @@ Memory allocation (code space): 28 12 End ------------------------------------------------------------------ -/[[:^alpha:]\S]+/8WB +/[[:^alpha:]\S]+/utf,ucp ------------------------------------------------------------------ 0 13 Bra 2 [\P{L}\P{Xsp}]++ @@ -680,7 +689,7 @@ Memory allocation (code space): 28 15 End ------------------------------------------------------------------ -/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ ------------------------------------------------------------------ 0 60 Bra 2 abc @@ -709,63 +718,301 @@ Memory allocation (code space): 28 62 End ------------------------------------------------------------------ -/(((a\2)|(a*)\g<-1>))*a?/B +/(((a\2)|(a*)\g<-1>))*a?/ ------------------------------------------------------------------ - 0 39 Bra + 0 35 Bra 2 Brazero - 3 32 SCBra 1 - 6 27 Once - 8 12 CBra 2 - 11 7 CBra 3 - 14 a - 16 \2 - 18 7 Ket - 20 11 Alt - 22 5 CBra 4 - 25 a* - 27 5 Ket - 
29 22 Recurse - 31 23 Ket - 33 27 Ket - 35 32 KetRmax - 37 a?+ - 39 39 Ket - 41 End ------------------------------------------------------------------- - -/((?+1)(\1))/B ------------------------------------------------------------------- - 0 20 Bra - 2 16 Once - 4 12 CBra 1 - 7 9 Recurse - 9 5 CBra 2 - 12 \1 - 14 5 Ket - 16 12 Ket - 18 16 Ket - 20 20 Ket - 22 End + 3 28 SCBra 1 + 6 12 CBra 2 + 9 7 CBra 3 + 12 a + 14 \2 + 16 7 Ket + 18 11 Alt + 20 5 CBra 4 + 23 a* + 25 5 Ket + 27 20 Recurse + 29 23 Ket + 31 28 KetRmax + 33 a?+ + 35 35 Ket + 37 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 16 Bra + 2 12 CBra 1 + 5 7 Recurse + 7 5 CBra 2 + 10 \1 + 12 5 Ket + 14 12 Ket + 16 16 Ket + 18 End ------------------------------------------------------------------ -/.((?2)(?R)\1)()/B +"(?1)(?#?'){2}(a)" ------------------------------------------------------------------ - 0 23 Bra + 0 13 Bra + 2 6 Recurse + 4 6 Recurse + 6 5 CBra 1 + 9 a + 11 5 Ket + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 24 Bra 2 Any - 3 13 Once - 5 9 CBra 1 - 8 18 Recurse - 10 0 Recurse + 3 7 CBra 1 + 6 19 Recurse + 8 0 Recurse + 10 4 Alt 12 \1 - 14 9 Ket - 16 13 Ket - 18 3 CBra 2 - 21 3 Ket - 23 23 Ket - 25 End + 14 3 Alt + 16 $ + 17 14 Ket + 19 3 CBra 2 + 22 3 Ket + 24 24 Ket + 26 End ------------------------------------------------------------------ +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 31 Bra + 2 Any + 3 14 CBra 1 + 6 26 Recurse + 8 0 Recurse + 10 3 CBra 2 + 13 3 Ket + 15 10 Recurse + 17 4 Alt + 19 \1 + 21 3 Alt + 23 $ + 24 21 Ket + 26 3 CBra 3 + 29 3 Ket + 31 31 Ket + 33 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ 
+------------------------------------------------------------------ + 0 50 Bra + 2 4 Recurse + 4 3 CBra 1 + 7 3 Ket + 9 39 CBra 2 + 12 32 CBra 3 + 15 27 CBra 4 + 18 22 CBra 5 + 21 15 CBra 6 + 24 10 CBra 7 + 27 5 Once + 29 \1+ + 32 5 Ket + 34 10 Ket + 36 15 Ket + 38 \x{85} + 40 22 KetRmax + 42 27 Ket + 44 2 Alt + 46 34 Ket + 48 39 Ket + 50 50 Ket + 52 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. + +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(
?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?| +))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) +/parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 79 Bra + 2 70 Once + 4 6 Cond + 6 1 Cond ref + 8 74 Recurse + 10 6 Ket + 12 6 Cond + 14 1 Cond ref + 16 74 Recurse + 18 6 Ket + 20 6 Cond + 22 1 Cond ref + 24 74 Recurse + 26 6 Ket + 28 6 Cond + 30 1 Cond ref + 32 74 Recurse + 34 6 Ket + 36 6 Cond + 38 1 Cond ref + 40 74 Recurse + 42 6 Ket + 44 6 Cond + 46 1 Cond ref + 48 74 Recurse + 50 6 Ket + 52 6 Cond + 54 1 Cond ref + 56 74 Recurse + 58 6 Ket + 60 10 SBraPos + 62 6 SCond + 64 1 Cond ref + 66 74 Recurse + 68 6 Ket + 70 10 KetRpos + 72 70 Ket + 74 3 CBra 1 + 77 3 Ket + 79 79 Ket + 81 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 43 Bra + 2 34 Once + 4 4 Cond + 6 1 Cond ref + 8 8 Alt + 10 a + 12 38 Recurse + 14 b + 16 12 Ket + 18 16 SBraPos + 20 4 SCond + 22 1 Cond ref + 24 8 Alt + 26 a + 28 38 Recurse + 30 b + 32 12 Ket + 34 16 KetRpos + 36 34 Ket + 38 3 CBra 1 + 41 3 Ket + 43 43 Ket + 45 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 133 Bra + 2 41 CBra 1 + 5 2 Recurse + 7 88 Recurse + 9 93 
Recurse + 11 98 Recurse + 13 103 Recurse + 15 108 Recurse + 17 113 Recurse + 19 118 Recurse + 21 123 Recurse + 23 123 Recurse + 25 118 Recurse + 27 113 Recurse + 29 108 Recurse + 31 103 Recurse + 33 98 Recurse + 35 93 Recurse + 37 88 Recurse + 39 2 Recurse + 41 0 Recurse + 43 41 Ket + 45 41 SCBra 1 + 48 2 Recurse + 50 88 Recurse + 52 93 Recurse + 54 98 Recurse + 56 103 Recurse + 58 108 Recurse + 60 113 Recurse + 62 118 Recurse + 64 123 Recurse + 66 123 Recurse + 68 118 Recurse + 70 113 Recurse + 72 108 Recurse + 74 103 Recurse + 76 98 Recurse + 78 93 Recurse + 80 88 Recurse + 82 2 Recurse + 84 0 Recurse + 86 41 KetRmax + 88 3 CBra 2 + 91 3 Ket + 93 3 CBra 3 + 96 3 Ket + 98 3 CBra 4 +101 3 Ket +103 3 CBra 5 +106 3 Ket +108 3 CBra 6 +111 3 Ket +113 3 CBra 7 +116 3 Ket +118 3 CBra 8 +121 3 Ket +123 3 CBra 9 +126 3 Ket +128 3 CBra 10 +131 3 Ket +133 133 Ket +135 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: missing ) at offset 509 +Failed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand -/-- End of testinput11 --/ +# End of testinput8 diff --git a/src/pcre2/testdata/testoutput8-32-3 b/src/pcre2/testdata/testoutput8-32-3 new file mode 100644 index 00000000..91d96c94 --- /dev/null +++ b/src/pcre2/testdata/testoutput8-32-3 @@ -0,0 +1,1018 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. 
+ +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 5 CBra 1 + 5 /i b + 7 5 Ket + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 76 +------------------------------------------------------------------ + 0 16 Bra + 2 7 CBra 1 + 5 AllAny* + 7 X + 9 5 Alt + 11 ^ + 12 B + 14 12 Ket + 16 16 Ket + 18 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 72 +------------------------------------------------------------------ + 0 15 Bra + 2 6 Bra + 4 AllAny* + 6 X + 8 5 Alt + 10 ^ + 11 B + 13 11 Ket + 15 15 Ket + 17 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 12 Bra + 2 ^ + 3 [0-9A-Za-z] + 12 12 Ket + 14 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 2 Bra + 2 2 Ket + 4 End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 a + 4 4 Ket + 6 End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/x?+/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 x?+ + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 28 
+------------------------------------------------------------------ + 0 4 Bra + 2 x++ + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 40 +------------------------------------------------------------------ + 0 7 Bra + 2 x + 4 x{0,2}+ + 7 7 Ket + 9 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 Braposzero + 3 5 CBraPos 1 + 6 x + 8 5 KetRpos + 10 10 Ket + 12 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 220 +------------------------------------------------------------------ + 0 52 Bra + 2 ^ + 3 47 CBra 1 + 6 5 CBra 2 + 9 a+ + 11 5 Ket + 13 13 CBra 3 + 16 [ab]+? + 26 13 Ket + 28 13 CBra 4 + 31 [bc]+ + 41 13 Ket + 43 5 CBra 5 + 46 \w*+ + 48 5 Ket + 50 47 Ket + 52 52 Ket + 54 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 3296 +------------------------------------------------------------------ + 0 821 Bra + 2 
8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +820 \b +821 821 Ket +823 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 3256 +------------------------------------------------------------------ + 0 811 Bra + 2 $<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +810 \b +811 811 Ket +813 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 13 Bra + 2 9 CBra 1 + 5 a + 7 2 Recurse + 9 b + 11 9 Ket + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 80 
+------------------------------------------------------------------ + 0 17 Bra + 2 13 CBra 1 + 5 a + 7 4 SBra + 9 2 Recurse + 11 4 KetRmax + 13 b + 15 13 Ket + 17 17 Ket + 19 End +------------------------------------------------------------------ + +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 108 +------------------------------------------------------------------ + 0 24 Bra + 2 a + 4 5 CBra 1 + 7 b + 9 4 Alt + 11 c + 13 9 Ket + 15 d + 17 5 CBra 2 + 20 e + 22 5 Ket + 24 24 Ket + 26 End +------------------------------------------------------------------ + +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 128 +------------------------------------------------------------------ + 0 29 Bra + 2 18 Bra + 4 a + 6 12 CBra 1 + 9 c + 11 5 CBra 2 + 14 d + 16 5 Ket + 18 12 Ket + 20 18 Ket + 22 5 CBra 3 + 25 a + 27 5 Ket + 29 29 Ket + 31 End +------------------------------------------------------------------ + +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 108 +------------------------------------------------------------------ + 0 24 Bra + 2 5 CBra 1 + 5 a + 7 5 Ket + 9 Any + 10 Any + 11 Any + 12 \1 + 14 bbb + 20 2 Recurse + 22 d + 24 24 Ket + 26 End +------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 100 +------------------------------------------------------------------ + 0 22 Bra + 2 abc + 8 Callout 255 10 1 + 12 de + 16 Callout 0 16 1 + 20 f + 22 22 Ket + 24 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 156 +------------------------------------------------------------------ + 0 36 Bra + 2 Callout 255 0 1 + 6 a + 8 Callout 255 1 1 + 12 b + 14 Callout 255 2 1 + 18 c + 20 Callout 255 3 1 + 24 d + 26 Callout 255 4 1 + 30 e + 32 Callout 255 5 0 + 36 36 Ket + 38 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 28 
+------------------------------------------------------------------ + 0 4 Bra + 2 \x{100} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{1000} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{10000} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{100000} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{10ffff} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/[\x{ff}]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{ff} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{100} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{80} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 28 
+------------------------------------------------------------------ + 0 4 Bra + 2 \x{ff} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 A\x{2262}\x{391}. + 10 10 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' +Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 44 +------------------------------------------------------------------ + 0 8 Bra + 2 \x{d55c}\x{ad6d}\x{c5b4} + 8 8 Ket + 10 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 44 +------------------------------------------------------------------ + 0 8 Bra + 2 \x{65e5}\x{672c}\x{8a9e} + 8 8 Ket + 10 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{100} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 76 +------------------------------------------------------------------ + 0 16 Bra + 2 [Z\x{100}] + 16 16 Ket + 18 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 ^ + 3 [\x{100}-\x{150}] + 10 10 Ket + 
12 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/utf +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 ^ + 3 [\x{100}-\x{150}] + 10 10 Ket + 12 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\p{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\P{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\P{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\p{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 88 +------------------------------------------------------------------ + 0 19 Bra + 2 [a-c\p{L}\x{660}] + 19 19 Ket + 21 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\p{Nd}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 18 Bra + 2 [+\-\p{Nd}]++ + 18 18 Ket + 20 End
+------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 12 Bra + 2 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 12 12 Ket + 14 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 12 Bra + 2 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 12 12 Ket + 14 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\x{104}-\x{109}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 104 +------------------------------------------------------------------ + 0 23 Bra + 2 19 CBra 1 + 5 Brazero + 6 13 SCBra 2 + 9 6 Cond + 11 1 Cond ref + 13 0 + 15 2 Alt + 17 8 Ket + 19 13 KetRmax + 21 19 Ket + 23 23 Ket + 25 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 18 Bra + 2 14 CBra 1 + 5 Brazero + 6 6 SCond + 8 1 Cond ref + 10 0 + 12 2 Alt + 14 8 KetRmax + 16 14 Ket + 18 18 Ket + 20 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 a + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 a + 4 4 Ket + 6 End 
+------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{aa} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{aa} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^a] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^a] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^\x{aa}] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^\x{aa}] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 9 Bra + 2 [^\p{Nd}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 2 [\P{L}\P{Cc}]++ + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 2 [\P{Cc}\P{L}]++ + 13 13 
Ket + 15 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 10 Bra + 2 [\p{L}]++ + 10 10 Ket + 12 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 2 [\P{L}\P{Xsp}]++ + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 60 Bra + 2 abc + 8 5 CBra 1 + 11 d + 13 4 Alt + 15 e + 17 9 Ket + 19 *THEN + 20 x + 22 12 CBra 2 + 25 123 + 31 *THEN + 32 4 + 34 24 Alt + 36 567 + 42 5 CBra 3 + 45 b + 47 4 Alt + 49 q + 51 9 Ket + 53 *THEN + 54 xx + 58 36 Ket + 60 60 Ket + 62 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 35 Bra + 2 Brazero + 3 28 SCBra 1 + 6 12 CBra 2 + 9 7 CBra 3 + 12 a + 14 \2 + 16 7 Ket + 18 11 Alt + 20 5 CBra 4 + 23 a* + 25 5 Ket + 27 20 Recurse + 29 23 Ket + 31 28 KetRmax + 33 a?+ + 35 35 Ket + 37 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 16 Bra + 2 12 CBra 1 + 5 7 Recurse + 7 5 CBra 2 + 10 \1 + 12 5 Ket + 14 12 Ket + 16 16 Ket + 18 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 13 Bra + 2 6 Recurse + 4 6 Recurse + 6 5 CBra 1 + 9 a + 11 5 Ket + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 24 Bra + 2 Any + 3 7 CBra 1 + 6 19 Recurse + 8 0 Recurse + 10 4 Alt + 12 \1 + 
14 3 Alt + 16 $ + 17 14 Ket + 19 3 CBra 2 + 22 3 Ket + 24 24 Ket + 26 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 31 Bra + 2 Any + 3 14 CBra 1 + 6 26 Recurse + 8 0 Recurse + 10 3 CBra 2 + 13 3 Ket + 15 10 Recurse + 17 4 Alt + 19 \1 + 21 3 Alt + 23 $ + 24 21 Ket + 26 3 CBra 3 + 29 3 Ket + 31 31 Ket + 33 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 50 Bra + 2 4 Recurse + 4 3 CBra 1 + 7 3 Ket + 9 39 CBra 2 + 12 32 CBra 3 + 15 27 CBra 4 + 18 22 CBra 5 + 21 15 CBra 6 + 24 10 CBra 7 + 27 5 Once + 29 \1+ + 32 5 Ket + 34 10 Ket + 36 15 Ket + 38 \x{85} + 40 22 KetRmax + 42 27 Ket + 44 2 Alt + 46 34 Ket + 48 39 Ket + 50 50 Ket + 52 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. 
+ +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 79 Bra + 2 70 Once + 4 6 Cond + 6 1 Cond ref + 8 74 Recurse + 10 6 Ket + 12 6 Cond + 14 1 Cond ref + 16 74 Recurse + 18 6 Ket + 20 6 Cond + 22 1 Cond ref + 24 74 Recurse + 26 6 Ket + 28 6 Cond + 30 1 Cond ref + 32 74 Recurse + 34 6 Ket + 36 6 Cond + 38 1 Cond ref + 40 74 Recurse + 42 6 Ket + 44 6 Cond + 46 1 Cond ref + 48 74 Recurse + 50 6 Ket + 52 6 Cond + 54 1 Cond ref + 56 74 Recurse + 58 6 Ket + 60 10 SBraPos + 62 6 SCond + 64 1 Cond ref + 66 74 Recurse + 68 6 Ket + 70 10 KetRpos + 72 70 Ket + 74 3 CBra 1 + 77 3 Ket + 79 79 Ket + 81 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 43 Bra + 2 34 Once + 4 4 Cond + 6 1 Cond ref + 8 8 Alt + 10 a + 12 38 Recurse + 14 b + 16 12 Ket + 18 16 SBraPos + 20 4 SCond + 22 1 Cond ref + 24 8 Alt + 26 a + 28 38 Recurse + 30 b + 32 12 Ket + 34 16 KetRpos + 36 34 Ket + 38 3 CBra 1 + 41 3 Ket + 43 43 Ket + 45 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 133 Bra + 2 41 CBra 1 + 5 2 Recurse + 7 88 Recurse + 9 93 
Recurse + 11 98 Recurse + 13 103 Recurse + 15 108 Recurse + 17 113 Recurse + 19 118 Recurse + 21 123 Recurse + 23 123 Recurse + 25 118 Recurse + 27 113 Recurse + 29 108 Recurse + 31 103 Recurse + 33 98 Recurse + 35 93 Recurse + 37 88 Recurse + 39 2 Recurse + 41 0 Recurse + 43 41 Ket + 45 41 SCBra 1 + 48 2 Recurse + 50 88 Recurse + 52 93 Recurse + 54 98 Recurse + 56 103 Recurse + 58 108 Recurse + 60 113 Recurse + 62 118 Recurse + 64 123 Recurse + 66 123 Recurse + 68 118 Recurse + 70 113 Recurse + 72 108 Recurse + 74 103 Recurse + 76 98 Recurse + 78 93 Recurse + 80 88 Recurse + 82 2 Recurse + 84 0 Recurse + 86 41 KetRmax + 88 3 CBra 2 + 91 3 Ket + 93 3 CBra 3 + 96 3 Ket + 98 3 CBra 4 +101 3 Ket +103 3 CBra 5 +106 3 Ket +108 3 CBra 6 +111 3 Ket +113 3 CBra 7 +116 3 Ket +118 3 CBra 8 +121 3 Ket +123 3 CBra 9 +126 3 Ket +128 3 CBra 10 +131 3 Ket +133 133 Ket +135 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre2/testdata/testoutput8-32-4 b/src/pcre2/testdata/testoutput8-32-4 new file mode 100644 index 00000000..91d96c94 --- /dev/null +++ b/src/pcre2/testdata/testoutput8-32-4 @@ -0,0 +1,1018 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. 
+ +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 5 CBra 1 + 5 /i b + 7 5 Ket + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 76 +------------------------------------------------------------------ + 0 16 Bra + 2 7 CBra 1 + 5 AllAny* + 7 X + 9 5 Alt + 11 ^ + 12 B + 14 12 Ket + 16 16 Ket + 18 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 72 +------------------------------------------------------------------ + 0 15 Bra + 2 6 Bra + 4 AllAny* + 6 X + 8 5 Alt + 10 ^ + 11 B + 13 11 Ket + 15 15 Ket + 17 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 12 Bra + 2 ^ + 3 [0-9A-Za-z] + 12 12 Ket + 14 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 2 Bra + 2 2 Ket + 4 End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 a + 4 4 Ket + 6 End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/x?+/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 x?+ + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 28 
+------------------------------------------------------------------ + 0 4 Bra + 2 x++ + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 40 +------------------------------------------------------------------ + 0 7 Bra + 2 x + 4 x{0,2}+ + 7 7 Ket + 9 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 Braposzero + 3 5 CBraPos 1 + 6 x + 8 5 KetRpos + 10 10 Ket + 12 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 220 +------------------------------------------------------------------ + 0 52 Bra + 2 ^ + 3 47 CBra 1 + 6 5 CBra 2 + 9 a+ + 11 5 Ket + 13 13 CBra 3 + 16 [ab]+? + 26 13 Ket + 28 13 CBra 4 + 31 [bc]+ + 41 13 Ket + 43 5 CBra 5 + 46 \w*+ + 48 5 Ket + 50 47 Ket + 52 52 Ket + 54 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 3296 +------------------------------------------------------------------ + 0 821 Bra + 2 
8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +820 \b +821 821 Ket +823 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 3256 +------------------------------------------------------------------ + 0 811 Bra + 2 $<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +810 \b +811 811 Ket +813 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 64 +------------------------------------------------------------------ + 0 13 Bra + 2 9 CBra 1 + 5 a + 7 2 Recurse + 9 b + 11 9 Ket + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 80 
+------------------------------------------------------------------ + 0 17 Bra + 2 13 CBra 1 + 5 a + 7 4 SBra + 9 2 Recurse + 11 4 KetRmax + 13 b + 15 13 Ket + 17 17 Ket + 19 End +------------------------------------------------------------------ + +/a(?P<name1>b|c)d(?P<longername2>e)/ +Memory allocation (code space): 108 +------------------------------------------------------------------ + 0 24 Bra + 2 a + 4 5 CBra 1 + 7 b + 9 4 Alt + 11 c + 13 9 Ket + 15 d + 17 5 CBra 2 + 20 e + 22 5 Ket + 24 24 Ket + 26 End +------------------------------------------------------------------ + +/(?:a(?P<c>c(?P<d>d)))(?P<a>a)/ +Memory allocation (code space): 128 +------------------------------------------------------------------ + 0 29 Bra + 2 18 Bra + 4 a + 6 12 CBra 1 + 9 c + 11 5 CBra 2 + 14 d + 16 5 Ket + 18 12 Ket + 20 18 Ket + 22 5 CBra 3 + 25 a + 27 5 Ket + 29 29 Ket + 31 End +------------------------------------------------------------------ + +/(?P<a>a)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 108 +------------------------------------------------------------------ + 0 24 Bra + 2 5 CBra 1 + 5 a + 7 5 Ket + 9 Any + 10 Any + 11 Any + 12 \1 + 14 bbb + 20 2 Recurse + 22 d + 24 24 Ket + 26 End +------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 100 +------------------------------------------------------------------ + 0 22 Bra + 2 abc + 8 Callout 255 10 1 + 12 de + 16 Callout 0 16 1 + 20 f + 22 22 Ket + 24 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 156 +------------------------------------------------------------------ + 0 36 Bra + 2 Callout 255 0 1 + 6 a + 8 Callout 255 1 1 + 12 b + 14 Callout 255 2 1 + 18 c + 20 Callout 255 3 1 + 24 d + 26 Callout 255 4 1 + 30 e + 32 Callout 255 5 0 + 36 36 Ket + 38 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 28
+------------------------------------------------------------------ + 0 4 Bra + 2 \x{100} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{1000} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{10000} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{100000} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{10ffff} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/[\x{ff}]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{ff} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{100} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{80} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 28 
+------------------------------------------------------------------ + 0 4 Bra + 2 \x{ff} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 A\x{2262}\x{391}. + 10 10 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' +Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 44 +------------------------------------------------------------------ + 0 8 Bra + 2 \x{d55c}\x{ad6d}\x{c5b4} + 8 8 Ket + 10 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{d55c} +Last code unit = \x{c5b4} +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 44 +------------------------------------------------------------------ + 0 8 Bra + 2 \x{65e5}\x{672c}\x{8a9e} + 8 8 Ket + 10 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \x{65e5} +Last code unit = \x{8a9e} +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{100} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 76 +------------------------------------------------------------------ + 0 16 Bra + 2 [Z\x{100}] + 16 16 Ket + 18 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 ^ + 3 [\x{100}-\x{150}] + 10 10 Ket + 
12 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/utf +Memory allocation (code space): 52 +------------------------------------------------------------------ + 0 10 Bra + 2 ^ + 3 [\x{100}-\x{150}] + 10 10 Ket + 12 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 13: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\p{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\P{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\P{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\p{L}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 88 +------------------------------------------------------------------ + 0 19 Bra + 2 [a-c\p{L}\x{660}] + 19 19 Ket + 21 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\p{Nd}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 18 Bra + 2 [+\-\p{Nd}]++ + 18 18 Ket + 20 End
+------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 12 Bra + 2 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 12 12 Ket + 14 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 60 +------------------------------------------------------------------ + 0 12 Bra + 2 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 12 12 Ket + 14 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 48 +------------------------------------------------------------------ + 0 9 Bra + 2 [\x{104}-\x{109}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 104 +------------------------------------------------------------------ + 0 23 Bra + 2 19 CBra 1 + 5 Brazero + 6 13 SCBra 2 + 9 6 Cond + 11 1 Cond ref + 13 0 + 15 2 Alt + 17 8 Ket + 19 13 KetRmax + 21 19 Ket + 23 23 Ket + 25 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 84 +------------------------------------------------------------------ + 0 18 Bra + 2 14 CBra 1 + 5 Brazero + 6 6 SCond + 8 1 Cond ref + 10 0 + 12 2 Alt + 14 8 KetRmax + 16 14 Ket + 18 18 Ket + 20 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 a + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 a + 4 4 Ket + 6 End 
+------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{aa} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 \x{aa} + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^a] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^a] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^\x{aa}] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 4 Bra + 2 [^\x{aa}] + 4 4 Ket + 6 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 9 Bra + 2 [^\p{Nd}] + 9 9 Ket + 11 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 2 [\P{L}\P{Cc}]++ + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 2 [\P{Cc}\P{L}]++ + 13 13 
Ket + 15 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 10 Bra + 2 [\p{L}]++ + 10 10 Ket + 12 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 2 [\P{L}\P{Xsp}]++ + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 60 Bra + 2 abc + 8 5 CBra 1 + 11 d + 13 4 Alt + 15 e + 17 9 Ket + 19 *THEN + 20 x + 22 12 CBra 2 + 25 123 + 31 *THEN + 32 4 + 34 24 Alt + 36 567 + 42 5 CBra 3 + 45 b + 47 4 Alt + 49 q + 51 9 Ket + 53 *THEN + 54 xx + 58 36 Ket + 60 60 Ket + 62 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 35 Bra + 2 Brazero + 3 28 SCBra 1 + 6 12 CBra 2 + 9 7 CBra 3 + 12 a + 14 \2 + 16 7 Ket + 18 11 Alt + 20 5 CBra 4 + 23 a* + 25 5 Ket + 27 20 Recurse + 29 23 Ket + 31 28 KetRmax + 33 a?+ + 35 35 Ket + 37 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 16 Bra + 2 12 CBra 1 + 5 7 Recurse + 7 5 CBra 2 + 10 \1 + 12 5 Ket + 14 12 Ket + 16 16 Ket + 18 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 13 Bra + 2 6 Recurse + 4 6 Recurse + 6 5 CBra 1 + 9 a + 11 5 Ket + 13 13 Ket + 15 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 24 Bra + 2 Any + 3 7 CBra 1 + 6 19 Recurse + 8 0 Recurse + 10 4 Alt + 12 \1 + 
14 3 Alt + 16 $ + 17 14 Ket + 19 3 CBra 2 + 22 3 Ket + 24 24 Ket + 26 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 31 Bra + 2 Any + 3 14 CBra 1 + 6 26 Recurse + 8 0 Recurse + 10 3 CBra 2 + 13 3 Ket + 15 10 Recurse + 17 4 Alt + 19 \1 + 21 3 Alt + 23 $ + 24 21 Ket + 26 3 CBra 3 + 29 3 Ket + 31 31 Ket + 33 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 50 Bra + 2 4 Recurse + 4 3 CBra 1 + 7 3 Ket + 9 39 CBra 2 + 12 32 CBra 3 + 15 27 CBra 4 + 18 22 CBra 5 + 21 15 CBra 6 + 24 10 CBra 7 + 27 5 Once + 29 \1+ + 32 5 Ket + 34 10 Ket + 36 15 Ket + 38 \x{85} + 40 22 KetRmax + 42 27 Ket + 44 2 Alt + 46 34 Ket + 48 39 Ket + 50 50 Ket + 52 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. 
+ +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 79 Bra + 2 70 Once + 4 6 Cond + 6 1 Cond ref + 8 74 Recurse + 10 6 Ket + 12 6 Cond + 14 1 Cond ref + 16 74 Recurse + 18 6 Ket + 20 6 Cond + 22 1 Cond ref + 24 74 Recurse + 26 6 Ket + 28 6 Cond + 30 1 Cond ref + 32 74 Recurse + 34 6 Ket + 36 6 Cond + 38 1 Cond ref + 40 74 Recurse + 42 6 Ket + 44 6 Cond + 46 1 Cond ref + 48 74 Recurse + 50 6 Ket + 52 6 Cond + 54 1 Cond ref + 56 74 Recurse + 58 6 Ket + 60 10 SBraPos + 62 6 SCond + 64 1 Cond ref + 66 74 Recurse + 68 6 Ket + 70 10 KetRpos + 72 70 Ket + 74 3 CBra 1 + 77 3 Ket + 79 79 Ket + 81 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 43 Bra + 2 34 Once + 4 4 Cond + 6 1 Cond ref + 8 8 Alt + 10 a + 12 38 Recurse + 14 b + 16 12 Ket + 18 16 SBraPos + 20 4 SCond + 22 1 Cond ref + 24 8 Alt + 26 a + 28 38 Recurse + 30 b + 32 12 Ket + 34 16 KetRpos + 36 34 Ket + 38 3 CBra 1 + 41 3 Ket + 43 43 Ket + 45 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 133 Bra + 2 41 CBra 1 + 5 2 Recurse + 7 88 Recurse + 9 93 
Recurse + 11 98 Recurse + 13 103 Recurse + 15 108 Recurse + 17 113 Recurse + 19 118 Recurse + 21 123 Recurse + 23 123 Recurse + 25 118 Recurse + 27 113 Recurse + 29 108 Recurse + 31 103 Recurse + 33 98 Recurse + 35 93 Recurse + 37 88 Recurse + 39 2 Recurse + 41 0 Recurse + 43 41 Ket + 45 41 SCBra 1 + 48 2 Recurse + 50 88 Recurse + 52 93 Recurse + 54 98 Recurse + 56 103 Recurse + 58 108 Recurse + 60 113 Recurse + 62 118 Recurse + 64 123 Recurse + 66 123 Recurse + 68 118 Recurse + 70 113 Recurse + 72 108 Recurse + 74 103 Recurse + 76 98 Recurse + 78 93 Recurse + 80 88 Recurse + 82 2 Recurse + 84 0 Recurse + 86 41 KetRmax + 88 3 CBra 2 + 91 3 Ket + 93 3 CBra 3 + 96 3 Ket + 98 3 CBra 4 +101 3 Ket +103 3 CBra 5 +106 3 Ket +108 3 CBra 6 +111 3 Ket +113 3 CBra 7 +116 3 Ket +118 3 CBra 8 +121 3 Ket +123 3 CBra 9 +126 3 Ket +128 3 CBra 10 +131 3 Ket +133 133 Ket +135 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre/testdata/testoutput11-8 b/src/pcre2/testdata/testoutput8-8-2 similarity index 62% rename from src/pcre/testdata/testoutput11-8 rename to src/pcre2/testdata/testoutput8-8-2 index 5a4fbb23..8393d5c5 100644 --- a/src/pcre/testdata/testoutput11-8 +++ b/src/pcre2/testdata/testoutput8-8-2 @@ -1,10 +1,15 @@ -/-- These are a few representative patterns whose lengths and offsets are to be -shown when the link size is 2. This is just a doublecheck test to ensure the -sizes don't go horribly wrong when something is changed. The pattern contents -are all themselves checked in other tests. Unicode, including property support, -is required for these tests. --/ - -/((?i)b)/BM +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. 
This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. + +#pattern fullbincode,memory + +/((?i)b)/ Memory allocation (code space): 17 ------------------------------------------------------------------ 0 13 Bra @@ -15,7 +20,7 @@ Memory allocation (code space): 17 16 End ------------------------------------------------------------------ -/(?s)(.*X|^B)/BM +/(?s)(.*X|^B)/ Memory allocation (code space): 25 ------------------------------------------------------------------ 0 21 Bra @@ -30,7 +35,7 @@ Memory allocation (code space): 25 24 End ------------------------------------------------------------------ -/(?s:.*X|^B)/BM +/(?s:.*X|^B)/ Memory allocation (code space): 23 ------------------------------------------------------------------ 0 19 Bra @@ -45,7 +50,7 @@ Memory allocation (code space): 23 22 End ------------------------------------------------------------------ -/^[[:alnum:]]/BM +/^[[:alnum:]]/ Memory allocation (code space): 41 ------------------------------------------------------------------ 0 37 Bra @@ -55,20 +60,19 @@ Memory allocation (code space): 41 40 End ------------------------------------------------------------------ -/#/IxMD +/#/Ix Memory allocation (code space): 7 ------------------------------------------------------------------ 0 3 Bra 3 3 Ket 6 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 May match empty string Options: extended -No first char -No need char +Subject length lower bound = 0 -/a#/IxMD +/a#/Ix Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -76,12 +80,12 @@ 
Memory allocation (code space): 9 5 5 Ket 8 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: extended -First char = 'a' -No need char +First code unit = 'a' +Subject length lower bound = 1 -/x?+/BM +/x?+/ Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -90,7 +94,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/x++/BM +/x++/ Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -99,7 +103,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/x{1,3}+/BM +/x{1,3}+/ Memory allocation (code space): 13 ------------------------------------------------------------------ 0 9 Bra @@ -109,7 +113,7 @@ Memory allocation (code space): 13 12 End ------------------------------------------------------------------ -/(x)*+/BM +/(x)*+/ Memory allocation (code space): 18 ------------------------------------------------------------------ 0 14 Bra @@ -121,7 +125,7 @@ Memory allocation (code space): 18 17 End ------------------------------------------------------------------ -/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/BM +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ Memory allocation (code space): 120 ------------------------------------------------------------------ 0 116 Bra @@ -144,7 +148,7 @@ Memory allocation (code space): 120 119 End ------------------------------------------------------------------ 
-|8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" Memory allocation (code space): 826 ------------------------------------------------------------------ 0 822 Bra @@ -154,7 +158,7 @@ Memory allocation (code space): 826 825 End ------------------------------------------------------------------ -|\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b|BM 
+"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" Memory allocation (code space): 816 ------------------------------------------------------------------ 0 812 Bra @@ -164,7 +168,7 @@ Memory allocation (code space): 816 815 End ------------------------------------------------------------------ -/(a(?1)b)/BM +/(a(?1)b)/ Memory allocation (code space): 22 ------------------------------------------------------------------ 0 18 Bra @@ -177,13 +181,13 @@ Memory allocation (code space): 22 21 End ------------------------------------------------------------------ -/(a(?1)+b)/BM +/(a(?1)+b)/ Memory allocation (code space): 28 ------------------------------------------------------------------ 0 24 Bra 3 18 CBra 1 8 a - 10 6 Once + 10 6 SBra 13 3 Recurse 16 6 KetRmax 19 b @@ -192,7 +196,7 @@ Memory allocation (code space): 28 27 End ------------------------------------------------------------------ -/a(?Pb|c)d(?Pe)/BM +/a(?Pb|c)d(?Pe)/ Memory allocation (code space): 36 ------------------------------------------------------------------ 0 32 Bra @@ -210,7 +214,7 @@ Memory allocation (code space): 36 35 End ------------------------------------------------------------------ -/(?:a(?Pc(?Pd)))(?Pa)/BM +/(?:a(?Pc(?Pd)))(?Pa)/ Memory allocation (code space): 45 ------------------------------------------------------------------ 0 41 Bra @@ -230,8 +234,8 @@ Memory allocation (code space): 45 44 End ------------------------------------------------------------------ -/(?Pa)...(?P=a)bbb(?P>a)d/BM -Memory allocation (code space): 62 
+/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 34 ------------------------------------------------------------------ 0 30 Bra 3 7 CBra 1 @@ -248,7 +252,7 @@ Memory allocation (code space): 62 33 End ------------------------------------------------------------------ -/abc(?C255)de(?C)f/BM +/abc(?C255)de(?C)f/ Memory allocation (code space): 31 ------------------------------------------------------------------ 0 27 Bra @@ -261,7 +265,7 @@ Memory allocation (code space): 31 30 End ------------------------------------------------------------------ -/abcde/CBM +/abcde/auto_callout Memory allocation (code space): 53 ------------------------------------------------------------------ 0 49 Bra @@ -280,7 +284,7 @@ Memory allocation (code space): 53 52 End ------------------------------------------------------------------ -/\x{100}/8BM +/\x{100}/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -289,7 +293,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/\x{1000}/8BM +/\x{1000}/utf Memory allocation (code space): 11 ------------------------------------------------------------------ 0 7 Bra @@ -298,7 +302,7 @@ Memory allocation (code space): 11 10 End ------------------------------------------------------------------ -/\x{10000}/8BM +/\x{10000}/utf Memory allocation (code space): 12 ------------------------------------------------------------------ 0 8 Bra @@ -307,7 +311,7 @@ Memory allocation (code space): 12 11 End ------------------------------------------------------------------ -/\x{100000}/8BM +/\x{100000}/utf Memory allocation (code space): 12 ------------------------------------------------------------------ 0 8 Bra @@ -316,7 +320,7 @@ Memory allocation (code space): 12 11 End ------------------------------------------------------------------ -/\x{10ffff}/8BM +/\x{10ffff}/utf Memory allocation (code space): 12 
------------------------------------------------------------------ 0 8 Bra @@ -325,10 +329,10 @@ Memory allocation (code space): 12 11 End ------------------------------------------------------------------ -/\x{110000}/8BM -Failed: character value in \x{} or \o{} is too large at offset 9 +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large -/[\x{ff}]/8BM +/[\x{ff}]/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -337,7 +341,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/[\x{100}]/8BM +/[\x{100}]/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -346,7 +350,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/\x80/8BM +/\x80/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -355,7 +359,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/\xff/8BM +/\xff/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -364,7 +368,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/\x{0041}\x{2262}\x{0391}\x{002e}/D8M +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf Memory allocation (code space): 18 ------------------------------------------------------------------ 0 14 Bra @@ -372,12 +376,13 @@ Memory allocation (code space): 18 14 14 Ket 17 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = 'A' -Need char = '.' - -/\x{D55c}\x{ad6d}\x{C5B4}/D8M +First code unit = 'A' +Last code unit = '.' 
+Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf Memory allocation (code space): 19 ------------------------------------------------------------------ 0 15 Bra @@ -385,12 +390,13 @@ Memory allocation (code space): 19 15 15 Ket 18 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = \x{ed} -Need char = \x{b4} +First code unit = \xed +Last code unit = \xb4 +Subject length lower bound = 3 -/\x{65e5}\x{672c}\x{8a9e}/D8M +/\x{65e5}\x{672c}\x{8a9e}/I,utf Memory allocation (code space): 19 ------------------------------------------------------------------ 0 15 Bra @@ -398,12 +404,13 @@ Memory allocation (code space): 19 15 15 Ket 18 End ------------------------------------------------------------------ -Capturing subpattern count = 0 +Capture group count = 0 Options: utf -First char = \x{e6} -Need char = \x{9e} +First code unit = \xe6 +Last code unit = \x9e +Subject length lower bound = 3 -/[\x{100}]/8BM +/[\x{100}]/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -412,7 +419,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/[Z\x{100}]/8BM +/[Z\x{100}]/utf Memory allocation (code space): 47 ------------------------------------------------------------------ 0 43 Bra @@ -421,7 +428,7 @@ Memory allocation (code space): 47 46 End ------------------------------------------------------------------ -/^[\x{100}\E-\Q\E\x{150}]/B8M +/^[\x{100}\E-\Q\E\x{150}]/utf Memory allocation (code space): 18 ------------------------------------------------------------------ 0 14 Bra @@ -431,7 +438,7 @@ Memory allocation (code space): 18 17 End ------------------------------------------------------------------ -/^[\QÄ€\E-\QÅ\E]/B8M +/^[\QÄ€\E-\QÅ\E]/utf Memory allocation (code space): 18 
------------------------------------------------------------------ 0 14 Bra @@ -441,10 +448,10 @@ Memory allocation (code space): 18 17 End ------------------------------------------------------------------ -/^[\QÄ€\E-\QÅ\E/B8M -Failed: missing terminating ] for character class at offset 15 +/^[\QÄ€\E-\QÅ\E/utf +Failed: error 106 at offset 15: missing terminating ] for character class -/[\p{L}]/BM +/[\p{L}]/ Memory allocation (code space): 15 ------------------------------------------------------------------ 0 11 Bra @@ -453,7 +460,7 @@ Memory allocation (code space): 15 14 End ------------------------------------------------------------------ -/[\p{^L}]/BM +/[\p{^L}]/ Memory allocation (code space): 15 ------------------------------------------------------------------ 0 11 Bra @@ -462,7 +469,7 @@ Memory allocation (code space): 15 14 End ------------------------------------------------------------------ -/[\P{L}]/BM +/[\P{L}]/ Memory allocation (code space): 15 ------------------------------------------------------------------ 0 11 Bra @@ -471,7 +478,7 @@ Memory allocation (code space): 15 14 End ------------------------------------------------------------------ -/[\P{^L}]/BM +/[\P{^L}]/ Memory allocation (code space): 15 ------------------------------------------------------------------ 0 11 Bra @@ -480,7 +487,7 @@ Memory allocation (code space): 15 14 End ------------------------------------------------------------------ -/[abc\p{L}\x{0660}]/8BM +/[abc\p{L}\x{0660}]/utf Memory allocation (code space): 50 ------------------------------------------------------------------ 0 46 Bra @@ -489,7 +496,7 @@ Memory allocation (code space): 50 49 End ------------------------------------------------------------------ -/[\p{Nd}]/8BM +/[\p{Nd}]/utf Memory allocation (code space): 15 ------------------------------------------------------------------ 0 11 Bra @@ -498,7 +505,7 @@ Memory allocation (code space): 15 14 End 
------------------------------------------------------------------ -/[\p{Nd}+-]+/8BM +/[\p{Nd}+-]+/utf Memory allocation (code space): 48 ------------------------------------------------------------------ 0 44 Bra @@ -507,7 +514,7 @@ Memory allocation (code space): 48 47 End ------------------------------------------------------------------ -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8iBM +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf Memory allocation (code space): 25 ------------------------------------------------------------------ 0 21 Bra @@ -516,7 +523,7 @@ Memory allocation (code space): 25 24 End ------------------------------------------------------------------ -/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/8BM +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf Memory allocation (code space): 25 ------------------------------------------------------------------ 0 21 Bra @@ -525,7 +532,7 @@ Memory allocation (code space): 25 24 End ------------------------------------------------------------------ -/[\x{105}-\x{109}]/8iBM +/[\x{105}-\x{109}]/i,utf Memory allocation (code space): 17 ------------------------------------------------------------------ 0 13 Bra @@ -534,7 +541,7 @@ Memory allocation (code space): 17 16 End ------------------------------------------------------------------ -/( ( (?(1)0|) )* )/xBM +/( ( (?(1)0|) )* )/x Memory allocation (code space): 38 ------------------------------------------------------------------ 0 34 Bra @@ -552,7 +559,7 @@ Memory allocation (code space): 38 37 End ------------------------------------------------------------------ -/( (?(1)0|)* )/xBM +/( (?(1)0|)* )/x Memory allocation (code space): 30 ------------------------------------------------------------------ 0 26 Bra @@ -568,7 +575,7 @@ Memory allocation (code space): 30 29 End ------------------------------------------------------------------ -/[a]/BM +/[a]/ Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -577,7 +584,7 @@ Memory 
allocation (code space): 9 8 End ------------------------------------------------------------------ -/[a]/8BM +/[a]/utf Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -586,7 +593,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/[\xaa]/BM +/[\xaa]/ Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -595,7 +602,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/[\xaa]/8BM +/[\xaa]/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -604,7 +611,7 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/[^a]/BM +/[^a]/ Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -613,7 +620,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/[^a]/8BM +/[^a]/utf Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -622,7 +629,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/[^\xaa]/BM +/[^\xaa]/ Memory allocation (code space): 9 ------------------------------------------------------------------ 0 5 Bra @@ -631,7 +638,7 @@ Memory allocation (code space): 9 8 End ------------------------------------------------------------------ -/[^\xaa]/8BM +/[^\xaa]/utf Memory allocation (code space): 10 ------------------------------------------------------------------ 0 6 Bra @@ -640,7 +647,9 @@ Memory allocation (code space): 10 9 End ------------------------------------------------------------------ -/[^\d]/8WB +#pattern -memory + +/[^\d]/utf,ucp 
------------------------------------------------------------------ 0 11 Bra 3 [^\p{Nd}] @@ -648,23 +657,23 @@ Memory allocation (code space): 10 14 End ------------------------------------------------------------------ -/[[:^alpha:][:^cntrl:]]+/8WB +/[[:^alpha:][:^cntrl:]]+/utf,ucp ------------------------------------------------------------------ - 0 51 Bra - 3 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++ - 51 51 Ket - 54 End + 0 15 Bra + 3 [\P{L}\P{Cc}]++ + 15 15 Ket + 18 End ------------------------------------------------------------------ -/[[:^cntrl:][:^alpha:]]+/8WB +/[[:^cntrl:][:^alpha:]]+/utf,ucp ------------------------------------------------------------------ - 0 51 Bra - 3 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++ - 51 51 Ket - 54 End + 0 15 Bra + 3 [\P{Cc}\P{L}]++ + 15 15 Ket + 18 End ------------------------------------------------------------------ -/[[:alpha:]]+/8WB +/[[:alpha:]]+/utf,ucp ------------------------------------------------------------------ 0 12 Bra 3 [\p{L}]++ @@ -672,7 +681,7 @@ Memory allocation (code space): 10 15 End ------------------------------------------------------------------ -/[[:^alpha:]\S]+/8WB +/[[:^alpha:]\S]+/utf,ucp ------------------------------------------------------------------ 0 15 Bra 3 [\P{L}\P{Xsp}]++ @@ -680,7 +689,7 @@ Memory allocation (code space): 10 18 End ------------------------------------------------------------------ -/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/B +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ ------------------------------------------------------------------ 0 73 Bra 3 abc @@ -709,63 +718,303 @@ Memory allocation (code space): 10 76 End ------------------------------------------------------------------ -/(((a\2)|(a*)\g<-1>))*a?/B +/(((a\2)|(a*)\g<-1>))*a?/ ------------------------------------------------------------------ - 0 57 Bra + 0 51 Bra 3 Brazero - 4 48 SCBra 1 - 9 40 Once - 12 18 CBra 2 - 17 10 CBra 3 - 22 a - 24 \2 - 27 10 Ket - 30 16 Alt - 33 7 CBra 4 - 38 a* - 40 7 
Ket - 43 33 Recurse - 46 34 Ket - 49 40 Ket - 52 48 KetRmax - 55 a?+ - 57 57 Ket - 60 End ------------------------------------------------------------------- - -/((?+1)(\1))/B ------------------------------------------------------------------- - 0 31 Bra - 3 25 Once - 6 19 CBra 1 - 11 14 Recurse - 14 8 CBra 2 - 19 \1 - 22 8 Ket - 25 19 Ket - 28 25 Ket - 31 31 Ket - 34 End + 4 42 SCBra 1 + 9 18 CBra 2 + 14 10 CBra 3 + 19 a + 21 \2 + 24 10 Ket + 27 16 Alt + 30 7 CBra 4 + 35 a* + 37 7 Ket + 40 30 Recurse + 43 34 Ket + 46 42 KetRmax + 49 a?+ + 51 51 Ket + 54 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 25 Bra + 3 19 CBra 1 + 8 11 Recurse + 11 8 CBra 2 + 16 \1 + 19 8 Ket + 22 19 Ket + 25 25 Ket + 28 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 19 Bra + 3 9 Recurse + 6 9 Recurse + 9 7 CBra 1 + 14 a + 16 7 Ket + 19 19 Ket + 22 End ------------------------------------------------------------------ -/.((?2)(?R)\1)()/B +/.((?2)(?R)|\1|$)()/ ------------------------------------------------------------------ - 0 35 Bra + 0 36 Bra 3 Any - 4 20 Once - 7 14 CBra 1 - 12 27 Recurse - 15 0 Recurse + 4 11 CBra 1 + 9 28 Recurse + 12 0 Recurse + 15 6 Alt 18 \1 - 21 14 Ket - 24 20 Ket - 27 5 CBra 2 - 32 5 Ket - 35 35 Ket - 38 End + 21 4 Alt + 24 $ + 25 21 Ket + 28 5 CBra 2 + 33 5 Ket + 36 36 Ket + 39 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ ------------------------------------------------------------------ + 0 47 Bra + 3 Any + 4 22 CBra 1 + 9 39 Recurse + 12 0 Recurse + 15 5 CBra 2 + 20 5 Ket + 23 15 Recurse + 26 6 Alt + 29 \1 + 32 4 Alt + 35 $ + 36 32 Ket + 39 5 CBra 3 + 44 5 Ket + 47 47 Ket + 50 End +------------------------------------------------------------------ + 
+/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 77 Bra + 3 6 Recurse + 6 5 CBra 1 + 11 5 Ket + 14 60 CBra 2 + 19 49 CBra 3 + 24 41 CBra 4 + 29 33 CBra 5 + 34 23 CBra 6 + 39 15 CBra 7 + 44 7 Once + 47 \1+ + 51 7 Ket + 54 15 Ket + 57 23 Ket + 60 \x{85} + 62 33 KetRmax + 65 41 Ket + 68 3 Alt + 71 52 Ket + 74 60 Ket + 77 77 Ket + 80 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. + +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode +Failed: error 184 at offset 1504: (?| and/or (?J: or (?x: parentheses are too deeply nested + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. + +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 119 Bra + 3 105 Once + 6 9 Cond + 9 1 Cond ref + 12 111 Recurse + 15 9 Ket + 18 9 Cond + 21 1 Cond ref + 24 111 Recurse + 27 9 Ket + 30 9 Cond + 33 1 Cond ref + 36 111 Recurse + 39 9 Ket + 42 9 Cond + 45 1 Cond ref + 48 111 Recurse + 51 9 Ket + 54 9 Cond + 57 1 Cond ref + 60 111 Recurse + 63 9 Ket + 66 9 Cond + 69 1 Cond ref + 72 111 Recurse + 75 9 Ket + 78 9 Cond + 81 1 Cond ref + 84 111 Recurse + 87 9 Ket + 90 15 SBraPos + 93 9 SCond + 96 1 Cond ref + 99 111 Recurse +102 9 Ket +105 15 KetRpos +108 105 Ket +111 5 CBra 1 +116 5 Ket +119 119 Ket +122 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 61 Bra + 3 47 Once + 6 6 Cond + 9 1 Cond 
ref + 12 10 Alt + 15 a + 17 53 Recurse + 20 b + 22 16 Ket + 25 22 SBraPos + 28 6 SCond + 31 1 Cond ref + 34 10 Alt + 37 a + 39 53 Recurse + 42 b + 44 16 Ket + 47 22 KetRpos + 50 47 Ket + 53 5 CBra 1 + 58 5 Ket + 61 61 Ket + 64 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 205 Bra + 3 62 CBra 1 + 8 3 Recurse + 11 133 Recurse + 14 141 Recurse + 17 149 Recurse + 20 157 Recurse + 23 165 Recurse + 26 173 Recurse + 29 181 Recurse + 32 189 Recurse + 35 189 Recurse + 38 181 Recurse + 41 173 Recurse + 44 165 Recurse + 47 157 Recurse + 50 149 Recurse + 53 141 Recurse + 56 133 Recurse + 59 3 Recurse + 62 0 Recurse + 65 62 Ket + 68 62 SCBra 1 + 73 3 Recurse + 76 133 Recurse + 79 141 Recurse + 82 149 Recurse + 85 157 Recurse + 88 165 Recurse + 91 173 Recurse + 94 181 Recurse + 97 189 Recurse +100 189 Recurse +103 181 Recurse +106 173 Recurse +109 165 Recurse +112 157 Recurse +115 149 Recurse +118 141 Recurse +121 133 Recurse +124 3 Recurse +127 0 Recurse +130 62 KetRmax +133 5 CBra 2 +138 5 Ket +141 5 CBra 3 +146 5 Ket +149 5 CBra 4 +154 5 Ket +157 5 CBra 5 +162 5 Ket +165 5 CBra 6 +170 5 Ket +173 5 CBra 7 +178 5 Ket +181 5 CBra 8 +186 5 Ket +189 5 CBra 9 +194 5 Ket +197 5 CBra 10 +202 5 Ket +205 205 Ket +208 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: missing ) at offset 509 +Failed: error 114 at offset 509: missing closing parenthesis + 
+/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand +Failed: error 120 at offset 131070: regular expression is too large -/-- End of testinput11 --/ +# End of testinput8 diff --git a/src/pcre2/testdata/testoutput8-8-3 b/src/pcre2/testdata/testoutput8-8-3 new file mode 100644 index 00000000..963700a3 --- /dev/null +++ b/src/pcre2/testdata/testoutput8-8-3 @@ -0,0 +1,1018 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. 
+ +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 16 Bra + 4 8 CBra 1 + 10 /i b + 12 8 Ket + 16 16 Ket + 20 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 30 +------------------------------------------------------------------ + 0 25 Bra + 4 10 CBra 1 + 10 AllAny* + 12 X + 14 7 Alt + 18 ^ + 19 B + 21 17 Ket + 25 25 Ket + 29 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 28 +------------------------------------------------------------------ + 0 23 Bra + 4 8 Bra + 8 AllAny* + 10 X + 12 7 Alt + 16 ^ + 17 B + 19 15 Ket + 23 23 Ket + 27 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 43 +------------------------------------------------------------------ + 0 38 Bra + 4 ^ + 5 [0-9A-Za-z] + 38 38 Ket + 42 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 9 +------------------------------------------------------------------ + 0 4 Bra + 4 4 Ket + 8 End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 a + 6 6 Ket + 10 End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/x?+/ +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 x?+ + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 11 
+------------------------------------------------------------------ + 0 6 Bra + 4 x++ + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 15 +------------------------------------------------------------------ + 0 10 Bra + 4 x + 6 x{0,2}+ + 10 10 Ket + 14 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 22 +------------------------------------------------------------------ + 0 17 Bra + 4 Braposzero + 5 8 CBraPos 1 + 11 x + 13 8 KetRpos + 17 17 Ket + 21 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 132 +------------------------------------------------------------------ + 0 127 Bra + 4 ^ + 5 118 CBra 1 + 11 8 CBra 2 + 17 a+ + 19 8 Ket + 23 40 CBra 3 + 29 [ab]+? + 63 40 Ket + 67 40 CBra 4 + 73 [bc]+ +107 40 Ket +111 8 CBra 5 +117 \w*+ +119 8 Ket +123 118 Ket +127 127 Ket +131 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 828 +------------------------------------------------------------------ + 0 823 Bra + 4 
8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +822 \b +823 823 Ket +827 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 818 +------------------------------------------------------------------ + 0 813 Bra + 4 $<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +812 \b +813 813 Ket +817 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 27 +------------------------------------------------------------------ + 0 22 Bra + 4 14 CBra 1 + 10 a + 12 4 Recurse + 16 b + 18 14 Ket + 22 22 Ket + 26 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 35 
+------------------------------------------------------------------ + 0 30 Bra + 4 22 CBra 1 + 10 a + 12 8 SBra + 16 4 Recurse + 20 8 KetRmax + 24 b + 26 22 Ket + 30 30 Ket + 34 End +------------------------------------------------------------------ + +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 43 +------------------------------------------------------------------ + 0 38 Bra + 4 a + 6 8 CBra 1 + 12 b + 14 6 Alt + 18 c + 20 14 Ket + 24 d + 26 8 CBra 2 + 32 e + 34 8 Ket + 38 38 Ket + 42 End +------------------------------------------------------------------ + +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 55 +------------------------------------------------------------------ + 0 50 Bra + 4 30 Bra + 8 a + 10 20 CBra 1 + 16 c + 18 8 CBra 2 + 24 d + 26 8 Ket + 30 20 Ket + 34 30 Ket + 38 8 CBra 3 + 44 a + 46 8 Ket + 50 50 Ket + 54 End +------------------------------------------------------------------ + +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 39 +------------------------------------------------------------------ + 0 34 Bra + 4 8 CBra 1 + 10 a + 12 8 Ket + 16 Any + 17 Any + 18 Any + 19 \1 + 22 bbb + 28 4 Recurse + 32 d + 34 34 Ket + 38 End +------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 37 +------------------------------------------------------------------ + 0 32 Bra + 4 abc + 10 Callout 255 10 1 + 18 de + 22 Callout 0 16 1 + 30 f + 32 32 Ket + 36 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 67 +------------------------------------------------------------------ + 0 62 Bra + 4 Callout 255 0 1 + 12 a + 14 Callout 255 1 1 + 22 b + 24 Callout 255 2 1 + 32 c + 34 Callout 255 3 1 + 42 d + 44 Callout 255 4 1 + 52 e + 54 Callout 255 5 0 + 62 62 Ket + 66 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 12 
+------------------------------------------------------------------ + 0 7 Bra + 4 \x{100} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 8 Bra + 4 \x{1000} + 8 8 Ket + 12 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 9 Bra + 4 \x{10000} + 9 9 Ket + 13 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 9 Bra + 4 \x{100000} + 9 9 Ket + 13 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 9 Bra + 4 \x{10ffff} + 9 9 Ket + 13 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/[\x{ff}]/utf +Memory allocation (code space): 12 +------------------------------------------------------------------ + 0 7 Bra + 4 \x{ff} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 12 +------------------------------------------------------------------ + 0 7 Bra + 4 \x{100} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 12 +------------------------------------------------------------------ + 0 7 Bra + 4 \x{80} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 12 
+------------------------------------------------------------------ + 0 7 Bra + 4 \x{ff} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 15 Bra + 4 A\x{2262}\x{391}. + 15 15 Ket + 19 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' +Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 16 Bra + 4 \x{d55c}\x{ad6d}\x{c5b4} + 16 16 Ket + 20 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xed +Last code unit = \xb4 +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 16 Bra + 4 \x{65e5}\x{672c}\x{8a9e} + 16 16 Ket + 20 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xe6 +Last code unit = \x9e +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 12 +------------------------------------------------------------------ + 0 7 Bra + 4 \x{100} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 50 +------------------------------------------------------------------ + 0 45 Bra + 4 [Z\x{100}] + 45 45 Ket + 49 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 16 Bra + 4 ^ + 5 [\x{100}-\x{150}] + 16 16 Ket + 20 End 
+------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/utf +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 16 Bra + 4 ^ + 5 [\x{100}-\x{150}] + 16 16 Ket + 20 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 15: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 13 Bra + 4 [\p{L}] + 13 13 Ket + 17 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 13 Bra + 4 [\P{L}] + 13 13 Ket + 17 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 13 Bra + 4 [\P{L}] + 13 13 Ket + 17 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 13 Bra + 4 [\p{L}] + 13 13 Ket + 17 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 53 +------------------------------------------------------------------ + 0 48 Bra + 4 [a-c\p{L}\x{660}] + 48 48 Ket + 52 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 18 +------------------------------------------------------------------ + 0 13 Bra + 4 [\p{Nd}] + 13 13 Ket + 17 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 51 +------------------------------------------------------------------ + 0 46 Bra + 4 [+\-\p{Nd}]++ + 46 46 Ket + 50 
End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 27 +------------------------------------------------------------------ + 0 22 Bra + 4 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 22 22 Ket + 26 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 27 +------------------------------------------------------------------ + 0 22 Bra + 4 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 22 22 Ket + 26 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 20 +------------------------------------------------------------------ + 0 15 Bra + 4 [\x{104}-\x{109}] + 15 15 Ket + 19 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 47 +------------------------------------------------------------------ + 0 42 Bra + 4 34 CBra 1 + 10 Brazero + 11 23 SCBra 2 + 17 9 Cond + 21 1 Cond ref + 24 0 + 26 4 Alt + 30 13 Ket + 34 23 KetRmax + 38 34 Ket + 42 42 Ket + 46 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 37 +------------------------------------------------------------------ + 0 32 Bra + 4 24 CBra 1 + 10 Brazero + 11 9 SCond + 15 1 Cond ref + 18 0 + 20 4 Alt + 24 13 KetRmax + 28 24 Ket + 32 32 Ket + 36 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 a + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 a + 6 6 Ket + 10 End 
+------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 \x{aa} + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 12 +------------------------------------------------------------------ + 0 7 Bra + 4 \x{aa} + 7 7 Ket + 11 End +------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 [^a] + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 [^a] + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 6 Bra + 4 [^\x{aa}] + 6 6 Ket + 10 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 12 +------------------------------------------------------------------ + 0 7 Bra + 4 [^\x{aa}] + 7 7 Ket + 11 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 13 Bra + 4 [^\p{Nd}] + 13 13 Ket + 17 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 17 Bra + 4 [\P{L}\P{Cc}]++ + 17 17 Ket + 21 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 17 Bra + 4 [\P{Cc}\P{L}]++ 
+ 17 17 Ket + 21 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 14 Bra + 4 [\p{L}]++ + 14 14 Ket + 18 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 17 Bra + 4 [\P{L}\P{Xsp}]++ + 17 17 Ket + 21 End +------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 83 Bra + 4 abc + 10 8 CBra 1 + 16 d + 18 6 Alt + 22 e + 24 14 Ket + 28 *THEN + 29 x + 31 15 CBra 2 + 37 123 + 43 *THEN + 44 4 + 46 33 Alt + 50 567 + 56 8 CBra 3 + 62 b + 64 6 Alt + 68 q + 70 14 Ket + 74 *THEN + 75 xx + 79 48 Ket + 83 83 Ket + 87 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 62 Bra + 4 Brazero + 5 51 SCBra 1 + 11 21 CBra 2 + 17 11 CBra 3 + 23 a + 25 \2 + 28 11 Ket + 32 20 Alt + 36 8 CBra 4 + 42 a* + 44 8 Ket + 48 36 Recurse + 52 41 Ket + 56 51 KetRmax + 60 a?+ + 62 62 Ket + 66 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 31 Bra + 4 23 CBra 1 + 10 14 Recurse + 14 9 CBra 2 + 20 \1 + 23 9 Ket + 27 23 Ket + 31 31 Ket + 35 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 24 Bra + 4 12 Recurse + 8 12 Recurse + 12 8 CBra 1 + 18 a + 20 8 Ket + 24 24 Ket + 28 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 45 Bra + 4 Any + 5 14 CBra 1 + 11 35 Recurse + 15 0 
Recurse + 19 7 Alt + 23 \1 + 26 5 Alt + 30 $ + 31 26 Ket + 35 6 CBra 2 + 41 6 Ket + 45 45 Ket + 49 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 59 Bra + 4 Any + 5 28 CBra 1 + 11 49 Recurse + 15 0 Recurse + 19 6 CBra 2 + 25 6 Ket + 29 19 Recurse + 33 7 Alt + 37 \1 + 40 5 Alt + 44 $ + 45 40 Ket + 49 6 CBra 3 + 55 6 Ket + 59 59 Ket + 63 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 96 Bra + 4 8 Recurse + 8 6 CBra 1 + 14 6 Ket + 18 74 CBra 2 + 24 60 CBra 3 + 30 50 CBra 4 + 36 40 CBra 5 + 42 28 CBra 6 + 48 18 CBra 7 + 54 8 Once + 58 \1+ + 62 8 Ket + 66 18 Ket + 70 28 Ket + 74 \x{85} + 76 40 KetRmax + 80 50 Ket + 84 4 Alt + 88 64 Ket + 92 74 Ket + 96 96 Ket +100 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. 
+ +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 150 Bra + 4 132 Once + 8 11 Cond + 12 1 Cond ref + 15 140 Recurse + 19 11 Ket + 23 11 Cond + 27 1 Cond ref + 30 140 Recurse + 34 11 Ket + 38 11 Cond + 42 1 Cond ref + 45 140 Recurse + 49 11 Ket + 53 11 Cond + 57 1 Cond ref + 60 140 Recurse + 64 11 Ket + 68 11 Cond + 72 1 Cond ref + 75 140 Recurse + 79 11 Ket + 83 11 Cond + 87 1 Cond ref + 90 140 Recurse + 94 11 Ket + 98 11 Cond +102 1 Cond ref +105 140 Recurse +109 11 Ket +113 19 SBraPos +117 11 SCond +121 1 Cond ref +124 140 Recurse +128 11 Ket +132 19 KetRpos +136 132 Ket +140 6 CBra 1 +146 6 Ket +150 150 Ket +154 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 76 Bra + 4 58 Once + 8 7 Cond + 12 1 Cond ref + 15 12 Alt + 19 a + 21 66 Recurse + 25 b + 27 19 Ket + 31 27 SBraPos + 35 7 SCond + 39 1 Cond ref + 42 12 Alt + 46 a + 48 66 Recurse + 52 b + 54 19 Ket + 58 27 KetRpos + 62 58 Ket + 66 6 CBra 1 + 72 6 Ket + 76 76 Ket + 80 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 266 Bra + 4 82 CBra 1 + 10 
4 Recurse + 14 176 Recurse + 18 186 Recurse + 22 196 Recurse + 26 206 Recurse + 30 216 Recurse + 34 226 Recurse + 38 236 Recurse + 42 246 Recurse + 46 246 Recurse + 50 236 Recurse + 54 226 Recurse + 58 216 Recurse + 62 206 Recurse + 66 196 Recurse + 70 186 Recurse + 74 176 Recurse + 78 4 Recurse + 82 0 Recurse + 86 82 Ket + 90 82 SCBra 1 + 96 4 Recurse +100 176 Recurse +104 186 Recurse +108 196 Recurse +112 206 Recurse +116 216 Recurse +120 226 Recurse +124 236 Recurse +128 246 Recurse +132 246 Recurse +136 236 Recurse +140 226 Recurse +144 216 Recurse +148 206 Recurse +152 196 Recurse +156 186 Recurse +160 176 Recurse +164 4 Recurse +168 0 Recurse +172 82 KetRmax +176 6 CBra 2 +182 6 Ket +186 6 CBra 3 +192 6 Ket +196 6 CBra 4 +202 6 Ket +206 6 CBra 5 +212 6 Ket +216 6 CBra 6 +222 6 Ket +226 6 CBra 7 +232 6 Ket +236 6 CBra 8 +242 6 Ket +246 6 CBra 9 +252 6 Ket +256 6 CBra 10 +262 6 Ket +266 266 Ket +270 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre2/testdata/testoutput8-8-4 b/src/pcre2/testdata/testoutput8-8-4 new file mode 100644 index 00000000..8e19908e --- /dev/null +++ b/src/pcre2/testdata/testoutput8-8-4 @@ -0,0 +1,1018 @@ +# There are two sorts of patterns in this test. A number of them are +# representative patterns whose lengths and offsets are checked. This is just a +# doublecheck test to ensure the sizes don't go horribly wrong when something +# is changed. The operation of these patterns is checked in other tests. +# +# This file also contains tests whose output varies with code unit size and/or +# link size. Unicode support is required for these tests. There are separate +# output files for each code unit size and link size. 
+ +#pattern fullbincode,memory + +/((?i)b)/ +Memory allocation (code space): 25 +------------------------------------------------------------------ + 0 19 Bra + 5 9 CBra 1 + 12 /i b + 14 9 Ket + 19 19 Ket + 24 End +------------------------------------------------------------------ + +/(?s)(.*X|^B)/ +Memory allocation (code space): 35 +------------------------------------------------------------------ + 0 29 Bra + 5 11 CBra 1 + 12 AllAny* + 14 X + 16 8 Alt + 21 ^ + 22 B + 24 19 Ket + 29 29 Ket + 34 End +------------------------------------------------------------------ + +/(?s:.*X|^B)/ +Memory allocation (code space): 33 +------------------------------------------------------------------ + 0 27 Bra + 5 9 Bra + 10 AllAny* + 12 X + 14 8 Alt + 19 ^ + 20 B + 22 17 Ket + 27 27 Ket + 32 End +------------------------------------------------------------------ + +/^[[:alnum:]]/ +Memory allocation (code space): 45 +------------------------------------------------------------------ + 0 39 Bra + 5 ^ + 6 [0-9A-Za-z] + 39 39 Ket + 44 End +------------------------------------------------------------------ + +/#/Ix +Memory allocation (code space): 11 +------------------------------------------------------------------ + 0 5 Bra + 5 5 Ket + 10 End +------------------------------------------------------------------ +Capture group count = 0 +May match empty string +Options: extended +Subject length lower bound = 0 + +/a#/Ix +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 a + 7 7 Ket + 12 End +------------------------------------------------------------------ +Capture group count = 0 +Options: extended +First code unit = 'a' +Subject length lower bound = 1 + +/x?+/ +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 x?+ + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/x++/ +Memory allocation (code space): 13 
+------------------------------------------------------------------ + 0 7 Bra + 5 x++ + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/x{1,3}+/ +Memory allocation (code space): 17 +------------------------------------------------------------------ + 0 11 Bra + 5 x + 7 x{0,2}+ + 11 11 Ket + 16 End +------------------------------------------------------------------ + +/(x)*+/ +Memory allocation (code space): 26 +------------------------------------------------------------------ + 0 20 Bra + 5 Braposzero + 6 9 CBraPos 1 + 13 x + 15 9 KetRpos + 20 20 Ket + 25 End +------------------------------------------------------------------ + +/^((a+)(?U)([ab]+)(?-U)([bc]+)(\w*))/ +Memory allocation (code space): 144 +------------------------------------------------------------------ + 0 138 Bra + 5 ^ + 6 127 CBra 1 + 13 9 CBra 2 + 20 a+ + 22 9 Ket + 27 41 CBra 3 + 34 [ab]+? + 68 41 Ket + 73 41 CBra 4 + 80 [bc]+ +114 41 Ket +119 9 CBra 5 +126 \w*+ +128 9 Ket +133 127 Ket +138 138 Ket +143 End +------------------------------------------------------------------ + +"8J\$WE\<\.rX\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 830 +------------------------------------------------------------------ + 0 824 Bra + 5 
8J$WE<.rX+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +823 \b +824 824 Ket +829 End +------------------------------------------------------------------ + +"\$\<\.X\+ix\[d1b\!H\#\?vV0vrK\:ZH1\=2M\>iV\;\?aPhFB\<\*vW\@QW\@sO9\}cfZA\-i\'w\%hKd6gt1UJP\,15_\#QY\$M\^Mss_U\/\]\&LK9\[5vQub\^w\[KDD\qmj\;2\}YWFdYx\.Ap\]hjCPTP\(n28k\+3\;o\&WXqs\/gOXdr\$\:r\'do0\;b4c\(f_Gr\=\"\\4\)\[01T7ajQJvL\$W\~mL_sS\/4h\:x\*\[ZN\=KLs\&L5zX\/\/\>it\,o\:aU\(\;Z\>pW\&T7oP\'2K\^E\:x9\'c\[\%z\-\,64JQ5AeH_G\#KijUKghQw\^\\vea3a\?kka_G\$8\#\`\*kynsxzBLru\'\]k_\[7FrVx\}\^\=\$blx\>s\-N\%j\;D\*aZDnsw\:YKZ\%Q\.Kne9\#hP\?\+b3\(SOvL\,\^\;\&u5\@\?5C5Bhb\=m\-vEh_L15Jl\]U\)0RP6\{q\%L\^_z5E\'Dw6X\b" +Memory allocation (code space): 820 +------------------------------------------------------------------ + 0 814 Bra + 5 $<.X+ix[d1b!H#?vV0vrK:ZH1=2M>iV;?aPhFB<*vW@QW@sO9}cfZA-i'w%hKd6gt1UJP,15_#QY$M^Mss_U/]&LK9[5vQub^w[KDDqmj;2}YWFdYx.Ap]hjCPTP(n28k+3;o&WXqs/gOXdr$:r'do0;b4c(f_Gr="\4)[01T7ajQJvL$W~mL_sS/4h:x*[ZN=KLs&L5zX//>it,o:aU(;Z>pW&T7oP'2K^E:x9'c[%z-,64JQ5AeH_G#KijUKghQw^\vea3a?kka_G$8#`*kynsxzBLru']k_[7FrVx}^=$blx>s-N%j;D*aZDnsw:YKZ%Q.Kne9#hP?+b3(SOvL,^;&u5@?5C5Bhb=m-vEh_L15Jl]U)0RP6{q%L^_z5E'Dw6X +813 \b +814 814 Ket +819 End +------------------------------------------------------------------ + +/(a(?1)b)/ +Memory allocation (code space): 32 +------------------------------------------------------------------ + 0 26 Bra + 5 16 CBra 1 + 12 a + 14 5 Recurse + 19 b + 21 16 Ket + 26 26 Ket + 31 End +------------------------------------------------------------------ + +/(a(?1)+b)/ +Memory allocation (code space): 42 
+------------------------------------------------------------------ + 0 36 Bra + 5 26 CBra 1 + 12 a + 14 10 SBra + 19 5 Recurse + 24 10 KetRmax + 29 b + 31 26 Ket + 36 36 Ket + 41 End +------------------------------------------------------------------ + +/a(?Pb|c)d(?Pe)/ +Memory allocation (code space): 50 +------------------------------------------------------------------ + 0 44 Bra + 5 a + 7 9 CBra 1 + 14 b + 16 7 Alt + 21 c + 23 16 Ket + 28 d + 30 9 CBra 2 + 37 e + 39 9 Ket + 44 44 Ket + 49 End +------------------------------------------------------------------ + +/(?:a(?Pc(?Pd)))(?Pa)/ +Memory allocation (code space): 65 +------------------------------------------------------------------ + 0 59 Bra + 5 35 Bra + 10 a + 12 23 CBra 1 + 19 c + 21 9 CBra 2 + 28 d + 30 9 Ket + 35 23 Ket + 40 35 Ket + 45 9 CBra 3 + 52 a + 54 9 Ket + 59 59 Ket + 64 End +------------------------------------------------------------------ + +/(?Pa)...(?P=a)bbb(?P>a)d/ +Memory allocation (code space): 44 +------------------------------------------------------------------ + 0 38 Bra + 5 9 CBra 1 + 12 a + 14 9 Ket + 19 Any + 20 Any + 21 Any + 22 \1 + 25 bbb + 31 5 Recurse + 36 d + 38 38 Ket + 43 End +------------------------------------------------------------------ + +/abc(?C255)de(?C)f/ +Memory allocation (code space): 43 +------------------------------------------------------------------ + 0 37 Bra + 5 abc + 11 Callout 255 10 1 + 21 de + 25 Callout 0 16 1 + 35 f + 37 37 Ket + 42 End +------------------------------------------------------------------ + +/abcde/auto_callout +Memory allocation (code space): 81 +------------------------------------------------------------------ + 0 75 Bra + 5 Callout 255 0 1 + 15 a + 17 Callout 255 1 1 + 27 b + 29 Callout 255 2 1 + 39 c + 41 Callout 255 3 1 + 51 d + 53 Callout 255 4 1 + 63 e + 65 Callout 255 5 0 + 75 75 Ket + 80 End +------------------------------------------------------------------ + +/\x{100}/utf +Memory allocation (code space): 14 
+------------------------------------------------------------------ + 0 8 Bra + 5 \x{100} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/\x{1000}/utf +Memory allocation (code space): 15 +------------------------------------------------------------------ + 0 9 Bra + 5 \x{1000} + 9 9 Ket + 14 End +------------------------------------------------------------------ + +/\x{10000}/utf +Memory allocation (code space): 16 +------------------------------------------------------------------ + 0 10 Bra + 5 \x{10000} + 10 10 Ket + 15 End +------------------------------------------------------------------ + +/\x{100000}/utf +Memory allocation (code space): 16 +------------------------------------------------------------------ + 0 10 Bra + 5 \x{100000} + 10 10 Ket + 15 End +------------------------------------------------------------------ + +/\x{10ffff}/utf +Memory allocation (code space): 16 +------------------------------------------------------------------ + 0 10 Bra + 5 \x{10ffff} + 10 10 Ket + 15 End +------------------------------------------------------------------ + +/\x{110000}/utf +Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too large + +/[\x{ff}]/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 8 Bra + 5 \x{ff} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/[\x{100}]/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 8 Bra + 5 \x{100} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/\x80/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 8 Bra + 5 \x{80} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/\xff/utf +Memory allocation (code space): 14 
+------------------------------------------------------------------ + 0 8 Bra + 5 \x{ff} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf +Memory allocation (code space): 22 +------------------------------------------------------------------ + 0 16 Bra + 5 A\x{2262}\x{391}. + 16 16 Ket + 21 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = 'A' +Last code unit = '.' +Subject length lower bound = 4 + +/\x{D55c}\x{ad6d}\x{C5B4}/I,utf +Memory allocation (code space): 23 +------------------------------------------------------------------ + 0 17 Bra + 5 \x{d55c}\x{ad6d}\x{c5b4} + 17 17 Ket + 22 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xed +Last code unit = \xb4 +Subject length lower bound = 3 + +/\x{65e5}\x{672c}\x{8a9e}/I,utf +Memory allocation (code space): 23 +------------------------------------------------------------------ + 0 17 Bra + 5 \x{65e5}\x{672c}\x{8a9e} + 17 17 Ket + 22 End +------------------------------------------------------------------ +Capture group count = 0 +Options: utf +First code unit = \xe6 +Last code unit = \x9e +Subject length lower bound = 3 + +/[\x{100}]/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 8 Bra + 5 \x{100} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/[Z\x{100}]/utf +Memory allocation (code space): 53 +------------------------------------------------------------------ + 0 47 Bra + 5 [Z\x{100}] + 47 47 Ket + 52 End +------------------------------------------------------------------ + +/^[\x{100}\E-\Q\E\x{150}]/utf +Memory allocation (code space): 24 +------------------------------------------------------------------ + 0 18 Bra + 5 ^ + 6 [\x{100}-\x{150}] + 18 18 Ket + 23 End 
+------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E]/utf +Memory allocation (code space): 24 +------------------------------------------------------------------ + 0 18 Bra + 5 ^ + 6 [\x{100}-\x{150}] + 18 18 Ket + 23 End +------------------------------------------------------------------ + +/^[\QĀ\E-\QŐ\E/utf +Failed: error 106 at offset 15: missing terminating ] for character class + +/[\p{L}]/ +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 15 Bra + 5 [\p{L}] + 15 15 Ket + 20 End +------------------------------------------------------------------ + +/[\p{^L}]/ +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 15 Bra + 5 [\P{L}] + 15 15 Ket + 20 End +------------------------------------------------------------------ + +/[\P{L}]/ +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 15 Bra + 5 [\P{L}] + 15 15 Ket + 20 End +------------------------------------------------------------------ + +/[\P{^L}]/ +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 15 Bra + 5 [\p{L}] + 15 15 Ket + 20 End +------------------------------------------------------------------ + +/[abc\p{L}\x{0660}]/utf +Memory allocation (code space): 56 +------------------------------------------------------------------ + 0 50 Bra + 5 [a-c\p{L}\x{660}] + 50 50 Ket + 55 End +------------------------------------------------------------------ + +/[\p{Nd}]/utf +Memory allocation (code space): 21 +------------------------------------------------------------------ + 0 15 Bra + 5 [\p{Nd}] + 15 15 Ket + 20 End +------------------------------------------------------------------ + +/[\p{Nd}+-]+/utf +Memory allocation (code space): 54 +------------------------------------------------------------------ + 0 48 Bra + 5 [+\-\p{Nd}]++ + 48 48 Ket + 53
End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/i,utf +Memory allocation (code space): 29 +------------------------------------------------------------------ + 0 23 Bra + 5 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 23 23 Ket + 28 End +------------------------------------------------------------------ + +/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/utf +Memory allocation (code space): 29 +------------------------------------------------------------------ + 0 23 Bra + 5 A\x{391}\x{10427}\x{ff3a}\x{1fb0} + 23 23 Ket + 28 End +------------------------------------------------------------------ + +/[\x{105}-\x{109}]/i,utf +Memory allocation (code space): 23 +------------------------------------------------------------------ + 0 17 Bra + 5 [\x{104}-\x{109}] + 17 17 Ket + 22 End +------------------------------------------------------------------ + +/( ( (?(1)0|) )* )/x +Memory allocation (code space): 56 +------------------------------------------------------------------ + 0 50 Bra + 5 40 CBra 1 + 12 Brazero + 13 27 SCBra 2 + 20 10 Cond + 25 1 Cond ref + 28 0 + 30 5 Alt + 35 15 Ket + 40 27 KetRmax + 45 40 Ket + 50 50 Ket + 55 End +------------------------------------------------------------------ + +/( (?(1)0|)* )/x +Memory allocation (code space): 44 +------------------------------------------------------------------ + 0 38 Bra + 5 28 CBra 1 + 12 Brazero + 13 10 SCond + 18 1 Cond ref + 21 0 + 23 5 Alt + 28 15 KetRmax + 33 28 Ket + 38 38 Ket + 43 End +------------------------------------------------------------------ + +/[a]/ +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 a + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/[a]/utf +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 a + 7 7 Ket + 12 End 
+------------------------------------------------------------------ + +/[\xaa]/ +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 \x{aa} + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/[\xaa]/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 8 Bra + 5 \x{aa} + 8 8 Ket + 13 End +------------------------------------------------------------------ + +/[^a]/ +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 [^a] + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/[^a]/utf +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 [^a] + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/[^\xaa]/ +Memory allocation (code space): 13 +------------------------------------------------------------------ + 0 7 Bra + 5 [^\x{aa}] + 7 7 Ket + 12 End +------------------------------------------------------------------ + +/[^\xaa]/utf +Memory allocation (code space): 14 +------------------------------------------------------------------ + 0 8 Bra + 5 [^\x{aa}] + 8 8 Ket + 13 End +------------------------------------------------------------------ + +#pattern -memory + +/[^\d]/utf,ucp +------------------------------------------------------------------ + 0 15 Bra + 5 [^\p{Nd}] + 15 15 Ket + 20 End +------------------------------------------------------------------ + +/[[:^alpha:][:^cntrl:]]+/utf,ucp +------------------------------------------------------------------ + 0 19 Bra + 5 [\P{L}\P{Cc}]++ + 19 19 Ket + 24 End +------------------------------------------------------------------ + +/[[:^cntrl:][:^alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 19 Bra + 5 [\P{Cc}\P{L}]++ 
+ 19 19 Ket + 24 End +------------------------------------------------------------------ + +/[[:alpha:]]+/utf,ucp +------------------------------------------------------------------ + 0 16 Bra + 5 [\p{L}]++ + 16 16 Ket + 21 End +------------------------------------------------------------------ + +/[[:^alpha:]\S]+/utf,ucp +------------------------------------------------------------------ + 0 19 Bra + 5 [\P{L}\P{Xsp}]++ + 19 19 Ket + 24 End +------------------------------------------------------------------ + +/abc(d|e)(*THEN)x(123(*THEN)4|567(b|q)(*THEN)xx)/ +------------------------------------------------------------------ + 0 93 Bra + 5 abc + 11 9 CBra 1 + 18 d + 20 7 Alt + 25 e + 27 16 Ket + 32 *THEN + 33 x + 35 16 CBra 2 + 42 123 + 48 *THEN + 49 4 + 51 37 Alt + 56 567 + 62 9 CBra 3 + 69 b + 71 7 Alt + 76 q + 78 16 Ket + 83 *THEN + 84 xx + 88 53 Ket + 93 93 Ket + 98 End +------------------------------------------------------------------ + +/(((a\2)|(a*)\g<-1>))*a?/ +------------------------------------------------------------------ + 0 73 Bra + 5 Brazero + 6 60 SCBra 1 + 13 24 CBra 2 + 20 12 CBra 3 + 27 a + 29 \2 + 32 12 Ket + 37 24 Alt + 42 9 CBra 4 + 49 a* + 51 9 Ket + 56 42 Recurse + 61 48 Ket + 66 60 KetRmax + 71 a?+ + 73 73 Ket + 78 End +------------------------------------------------------------------ + +/((?+1)(\1))/ +------------------------------------------------------------------ + 0 37 Bra + 5 27 CBra 1 + 12 17 Recurse + 17 10 CBra 2 + 24 \1 + 27 10 Ket + 32 27 Ket + 37 37 Ket + 42 End +------------------------------------------------------------------ + +"(?1)(?#?'){2}(a)" +------------------------------------------------------------------ + 0 29 Bra + 5 15 Recurse + 10 15 Recurse + 15 9 CBra 1 + 22 a + 24 9 Ket + 29 29 Ket + 34 End +------------------------------------------------------------------ + +/.((?2)(?R)|\1|$)()/ +------------------------------------------------------------------ + 0 54 Bra + 5 Any + 6 17 CBra 1 + 13 42 Recurse + 18 0 
Recurse + 23 8 Alt + 28 \1 + 31 6 Alt + 36 $ + 37 31 Ket + 42 7 CBra 2 + 49 7 Ket + 54 54 Ket + 59 End +------------------------------------------------------------------ + +/.((?3)(?R)()(?2)|\1|$)()/ +------------------------------------------------------------------ + 0 71 Bra + 5 Any + 6 34 CBra 1 + 13 59 Recurse + 18 0 Recurse + 23 7 CBra 2 + 30 7 Ket + 35 23 Recurse + 40 8 Alt + 45 \1 + 48 6 Alt + 53 $ + 54 48 Ket + 59 7 CBra 3 + 66 7 Ket + 71 71 Ket + 76 End +------------------------------------------------------------------ + +/(?1)()((((((\1++))\x85)+)|))/ +------------------------------------------------------------------ + 0 115 Bra + 5 10 Recurse + 10 7 CBra 1 + 17 7 Ket + 22 88 CBra 2 + 29 71 CBra 3 + 36 59 CBra 4 + 43 47 CBra 5 + 50 33 CBra 6 + 57 21 CBra 7 + 64 9 Once + 69 \1+ + 73 9 Ket + 78 21 Ket + 83 33 Ket + 88 \x{85} + 90 47 KetRmax + 95 59 Ket +100 5 Alt +105 76 Ket +110 88 Ket +115 115 Ket +120 End +------------------------------------------------------------------ + +# Check the absolute limit on nesting (?| etc. This varies with code unit +# width because the workspace is a different number of bytes. It will fail +# with link size 2 in 8-bit and 16-bit but not in 32-bit. 
+ +/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|parens_nest_limit=1000,-fullbincode + +# Use "expand" to create some very long patterns with nested parentheses, in +# order to test workspace overflow. Again, this varies with code unit width, +# and even when it fails in two modes, the error offset differs. It also varies +# with link size - hence multiple tests with different values. 
+ +/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000 + +/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000 +Failed: error 186 at offset 12820: regular expression is too complicated + +/(?(1)(?1)){8,}+()/debug +------------------------------------------------------------------ + 0 181 Bra + 5 159 Once + 10 13 Cond + 15 1 Cond ref + 18 169 Recurse + 23 13 Ket + 28 13 Cond + 33 1 Cond ref + 36 169 Recurse + 41 13 Ket + 46 13 Cond + 51 1 Cond ref + 54 169 Recurse + 59 13 Ket + 64 13 Cond + 69 1 Cond ref + 72 169 Recurse + 77 13 Ket + 82 13 Cond + 87 1 Cond ref + 90 169 Recurse + 95 13 Ket +100 13 Cond +105 1 Cond ref +108 169 Recurse +113 13 Ket +118 13 Cond +123 1 Cond ref +126 169 Recurse +131 13 Ket +136 23 SBraPos +141 13 SCond +146 1 Cond ref +149 169 Recurse +154 13 Ket +159 23 KetRpos +164 159 Ket +169 7 CBra 1 +176 7 Ket +181 181 Ket +186 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcd + 0: + 1: + +/(?(1)|a(?1)b){2,}+()/debug +------------------------------------------------------------------ + 0 91 Bra + 5 69 Once + 10 8 Cond + 15 1 Cond ref + 18 14 Alt + 23 a + 25 79 Recurse + 30 b + 32 22 Ket + 37 32 SBraPos + 42 8 SCond + 47 1 Cond ref + 50 14 Alt + 55 a + 57 79 Recurse + 62 b + 64 22 Ket + 69 32 KetRpos + 74 69 Ket + 79 7 CBra 1 + 86 7 Ket + 91 91 Ket + 96 End +------------------------------------------------------------------ +Capture group count = 1 +Max back reference = 1 +May match empty string +Subject length lower bound = 0 + abcde +No match + +/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug +------------------------------------------------------------------ + 0 327 Bra + 5 102 CBra 1 + 
12 5 Recurse + 17 219 Recurse + 22 231 Recurse + 27 243 Recurse + 32 255 Recurse + 37 267 Recurse + 42 279 Recurse + 47 291 Recurse + 52 303 Recurse + 57 303 Recurse + 62 291 Recurse + 67 279 Recurse + 72 267 Recurse + 77 255 Recurse + 82 243 Recurse + 87 231 Recurse + 92 219 Recurse + 97 5 Recurse +102 0 Recurse +107 102 Ket +112 102 SCBra 1 +119 5 Recurse +124 219 Recurse +129 231 Recurse +134 243 Recurse +139 255 Recurse +144 267 Recurse +149 279 Recurse +154 291 Recurse +159 303 Recurse +164 303 Recurse +169 291 Recurse +174 279 Recurse +179 267 Recurse +184 255 Recurse +189 243 Recurse +194 231 Recurse +199 219 Recurse +204 5 Recurse +209 0 Recurse +214 102 KetRmax +219 7 CBra 2 +226 7 Ket +231 7 CBra 3 +238 7 Ket +243 7 CBra 4 +250 7 Ket +255 7 CBra 5 +262 7 Ket +267 7 CBra 6 +274 7 Ket +279 7 CBra 7 +286 7 Ket +291 7 CBra 8 +298 7 Ket +303 7 CBra 9 +310 7 Ket +315 7 CBra 10 +322 7 Ket +327 327 Ket +332 End +------------------------------------------------------------------ +Capture group count = 10 +May match empty string +Subject length lower boundailed: error 114 at offset 509: missing closing parenthesisfullbincode + +#pattern -fullbincode + +/\[()]{65535}/expand + +# End of testinput8 diff --git a/src/pcre/testdata/testoutput14 b/src/pcre2/testdata/testoutput9 similarity index 73% rename from src/pcre/testdata/testoutput14 rename to src/pcre2/testdata/testoutput9 index 020f51e3..1ec43177 100644 --- a/src/pcre/testdata/testoutput14 +++ b/src/pcre2/testdata/testoutput9 @@ -1,165 +1,25 @@ -/-- This set of tests is run only with the 8-bit library. They do not require - UTF-8 or Unicode property support. The file starts with all the tests of - the POSIX interface, because that is supported only with the 8-bit library. - --/ +# This set of tests is run only with the 8-bit library. They must not require +# UTF-8 or Unicode property support. 
*/ -< forbid 8W +#forbid_utf +#newline_default lf any anycrlf -/abc/P - abc - 0: abc - *** Failers -No match: POSIX code 17: match failed - -/^abc|def/P - abcdef - 0: abc - abcdef\B - 0: def - -/.*((abc)$|(def))/P - defabc - 0: defabc - 1: abc - 2: abc - \Zdefabc - 0: def - 1: def - 3: def - -/the quick brown fox/P - the quick brown fox - 0: the quick brown fox - *** Failers -No match: POSIX code 17: match failed - The Quick Brown Fox -No match: POSIX code 17: match failed - -/the quick brown fox/Pi - the quick brown fox - 0: the quick brown fox - The Quick Brown Fox - 0: The Quick Brown Fox - -/abc.def/P - *** Failers -No match: POSIX code 17: match failed - abc\ndef -No match: POSIX code 17: match failed - -/abc$/P - abc - 0: abc - abc\n - 0: abc - -/(abc)\2/P -Failed: POSIX code 15: bad back reference at offset 7 - -/(abc\1)/P - abc -No match: POSIX code 17: match failed - -/a*(b+)(z)(z)/P - aaaabbbbzzzz - 0: aaaabbbbzz - 1: bbbb - 2: z - 3: z - aaaabbbbzzzz\O0 - aaaabbbbzzzz\O1 - 0: aaaabbbbzz - aaaabbbbzzzz\O2 - 0: aaaabbbbzz - 1: bbbb - aaaabbbbzzzz\O3 - 0: aaaabbbbzz - 1: bbbb - 2: z - aaaabbbbzzzz\O4 - 0: aaaabbbbzz - 1: bbbb - 2: z - 3: z - aaaabbbbzzzz\O5 - 0: aaaabbbbzz - 1: bbbb - 2: z - 3: z - -/ab.cd/P - ab-cd - 0: ab-cd - ab=cd - 0: ab=cd - ** Failers -No match: POSIX code 17: match failed - ab\ncd -No match: POSIX code 17: match failed - -/ab.cd/Ps - ab-cd - 0: ab-cd - ab=cd - 0: ab=cd - ab\ncd - 0: ab\x0acd - -/a(b)c/PN - abc -Matched with REG_NOSUB - -/a(?Pb)c/PN - abc -Matched with REG_NOSUB - -/a?|b?/P - abc - 0: a - ** Failers - 0: - ddd\N -No match: POSIX code 17: match failed - -/\w+A/P - CDAAAAB - 0: CDAAAA - -/\w+A/PU - CDAAAAB - 0: CDA - -/\Biss\B/I+P - Mississippi - 0: iss - 0+ issippi - -/abc/\P -Failed: POSIX code 9: bad escape sequence at offset 4 - -/-- End of POSIX tests --/ - -/a\Cb/ - aXb - 0: aXb - a\nb - 0: a\x0ab - ** Failers (too big char) -No match - A\x{123}B +/ab/ +\= Expect error message (too big char) and no match + 
A\x{123}B ** Character \x{123} is greater than 255 and UTF-8 mode is not enabled. ** Truncation will probably give the wrong result. No match - A\o{443}B + A\o{443}B ** Character \x{123} is greater than 255 and UTF-8 mode is not enabled. ** Truncation will probably give the wrong result. No match /\x{100}/I -Failed: character value in \x{} or \o{} is too large at offset 6 +Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large /\o{400}/I -Failed: character value in \x{} or \o{} is too large at offset 6 +Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large / (?: [\040\t] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* @@ -354,75 +214,39 @@ Failed: character value in \x{} or \o{} is too large at offset 6 ) (?: [\040\t] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] | \( (?: [^\\\x80-\xff\n\015()] | \\ [^\x80-\xff] )* \) )* \) )* # optional trailing comment -/xSI -Capturing subpattern count = 0 +/Ix +Capture group count = 0 Contains explicit CR or LF match Options: extended -No first char -No need char -Subject length lower bound = 3 -Starting chars: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 +Starting code units: \x09 \x20 ! " # $ % & ' ( * + - / 0 1 2 3 4 5 6 7 8 9 = ? A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f +Subject length lower bound = 3 -/-- Although this saved pattern was compiled with link-size=2, it does no harm -to run this test with other link sizes because it is going to generated a -"compiled in wrong mode" error as soon as it is loaded, so the link size does -not matter. 
--/ - -\x09< 0: \x09 -/[\h]+/BZ +/[\h]+/B ------------------------------------------------------------------ Bra [\x09 \xa0]++ @@ -442,7 +266,7 @@ Starting chars: \x0a \x0b \x0c \x0d \x85 >\x09\x20\xa0< 0: \x09 \xa0 -/[\v]/BZ +/[\v]/B ------------------------------------------------------------------ Bra [\x0a-\x0d\x85] @@ -450,7 +274,7 @@ Starting chars: \x0a \x0b \x0c \x0d \x85 End ------------------------------------------------------------------ -/[\H]/BZ +/[\H]/B ------------------------------------------------------------------ Bra [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff] @@ -458,7 +282,7 @@ Starting chars: \x0a \x0b \x0c \x0d \x85 End ------------------------------------------------------------------ -/[^\h]/BZ +/[^\h]/B ------------------------------------------------------------------ Bra [\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff] (neg) @@ -466,7 +290,7 @@ Starting chars: \x0a \x0b \x0c \x0d \x85 End ------------------------------------------------------------------ -/[\V]/BZ +/[\V]/B ------------------------------------------------------------------ Bra [\x00-\x09\x0e-\x84\x86-\xff] @@ -474,7 +298,7 @@ Starting chars: \x0a \x0b \x0c \x0d \x85 End ------------------------------------------------------------------ -/[\x0a\V]/BZ +/[\x0a\V]/B ------------------------------------------------------------------ Bra [\x00-\x0a\x0e-\x84\x86-\xff] @@ -483,23 +307,33 @@ Starting chars: \x0a \x0b \x0c \x0d \x85 ------------------------------------------------------------------ /\777/I -Failed: octal value is greater than \377 in 8-bit non-UTF-8 mode at offset 3 +Failed: error 151 at offset 4: octal value is greater than \377 in 8-bit non-UTF-8 mode -/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/K -Failed: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) 
at offset 259 +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark +Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) + XX + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames +Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) + XX + +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark + XX + 0: XX +MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE -/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/K +/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark,alt_verbnames XX 0: XX MK: 
0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE -/\u0100/ -Failed: character value in \u.... sequence is too large at offset 5 +/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames +Failed: error 177 at offset 6: character code point value in \u.... sequence is too large -/[\u0100-\u0200]/ -Failed: character value in \u.... sequence is too large at offset 6 +/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames +Failed: error 177 at offset 7: character code point value in \u.... sequence is too large -/[^\x00-a]{12,}[^b-\xff]*/BZ +/[^\x00-a]{12,}[^b-\xff]*/B ------------------------------------------------------------------ Bra [b-\xff] (neg){12,}+ @@ -508,7 +342,7 @@ Failed: character value in \u.... sequence is too large at offset 6 End ------------------------------------------------------------------ -/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ +/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B ------------------------------------------------------------------ Bra [\x00-\x08\x0e-\x1f!-\xff] (neg)*+ @@ -527,6 +361,14 @@ Failed: character value in \u.... 
sequence is too large at offset 6 End ------------------------------------------------------------------ -/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/ +/(*MARK:a\x{100}b)z/alt_verbnames +Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large + +/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':Æ¿)/ +Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) + +/(?i:A{1,}\6666666666)/ +Failed: error 151 at offset 13: octal value is greater than \377 in 8-bit non-UTF-8 mode + A\x{1b6}6666666 -/-- End of testinput14 --/ +# End of testinput9 diff --git a/src/pcre2/testdata/testoutputEBC b/src/pcre2/testdata/testoutputEBC new file mode 100644 index 00000000..4edc8f99 --- /dev/null +++ b/src/pcre2/testdata/testoutputEBC @@ -0,0 +1,206 @@ +PCRE2 version 
10.32-RC1 2018-02-19 +# This is a specialized test for checking, when PCRE2 is compiled with the +# EBCDIC option but in an ASCII environment, that newline, white space, and \c +# functionality is working. It catches cases where explicit values such as 0x0a +# have been used instead of names like CHAR_LF. Needless to say, it is not a +# genuine EBCDIC test! In patterns, alphabetic characters that follow a +# backslash must be in EBCDIC code. In data, NL, NEL, LF, ESC, and DEL must be +# in EBCDIC, but can of course be specified as escapes. + +# Test default newline and variations + +/^A/m + ABC + 0: A + 12\x15ABC + 0: A + +/^A/m,newline=any + 12\x15ABC + 0: A + 12\x0dABC + 0: A + 12\x0d\x15ABC + 0: A + 12\x25ABC + 0: A + +/^A/m,newline=anycrlf + 12\x15ABC + 0: A + 12\x0dABC + 0: A + 12\x0d\x15ABC + 0: A + ** Fail +No match + 12\x25ABC +No match + +# Test \h + +/^A\ˆ/ + A B + 0: A\x20 + A\x41B + 0: AA + +# Test \H + +/^A\È/ + AB + 0: AB + A\x42B + 0: AB + ** Fail +No match + A B +No match + A\x41B +No match + +# Test \R + +/^A\Ù/ + A\x15B + 0: A\x15 + A\x0dB + 0: A\x0d + A\x25B + 0: A\x25 + A\x0bB + 0: A\x0b + A\x0cB + 0: A\x0c + ** Fail +No match + A B +No match + +# Test \v + +/^A\¥/ + A\x15B + 0: A\x15 + A\x0dB + 0: A\x0d + A\x25B + 0: A\x25 + A\x0bB + 0: A\x0b + A\x0cB + 0: A\x0c + ** Fail +No match + A B +No match + +# Test \V + +/^A\å/ + A B + 0: A\x20 + ** Fail +No match + A\x15B +No match + A\x0dB +No match + A\x25B +No match + A\x0bB +No match + A\x0cB +No match + +# For repeated items, use an atomic group so that the output is the same +# for DFA matching (otherwise it may show multiple matches). 
+ +# Test \h+ + +/^A(?>\ˆ+)/ + A B + 0: A\x20 + +# Test \H+ + +/^A(?>\È+)/ + AB + 0: AB + ** Fail +No match + A B +No match + +# Test \R+ + +/^A(?>\Ù+)/ + A\x15B + 0: A\x15 + A\x0dB + 0: A\x0d + A\x25B + 0: A\x25 + A\x0bB + 0: A\x0b + A\x0cB + 0: A\x0c + ** Fail +No match + A B +No match + +# Test \v+ + +/^A(?>\¥+)/ + A\x15B + 0: A\x15 + A\x0dB + 0: A\x0d + A\x25B + 0: A\x25 + A\x0bB + 0: A\x0b + A\x0cB + 0: A\x0c + ** Fail +No match + A B +No match + +# Test \V+ + +/^A(?>\å+)/ + A B + 0: A\x20B + ** Fail +No match + A\x15B +No match + A\x0dB +No match + A\x25B +No match + A\x0bB +No match + A\x0cB +No match + +# Test \c functionality + +/\ƒ@\ƒA\ƒb\ƒC\ƒd\ƒE\ƒf\ƒG\ƒh\ƒI\ƒJ\ƒK\ƒl\ƒm\ƒN\ƒO\ƒp\ƒq\ƒr\ƒS\ƒT\ƒu\ƒV\ƒW\ƒX\ƒy\ƒZ/ + \x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f + 0: \x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a + +/\ƒ[\ƒ\\ƒ]\ƒ^\ƒ_/ + \x18\x19\x1a\x1b\x1c\x1d\x1e\x1f + 0: \x1b\x1c\x1d\x1e\x1f + +/\ƒ?/ + A\xffB + 0: \xff + +/\ƒ&/ +Failed: error 168 at offset 3: \c\x20must\x20be\x20followed\x20by\x20a\x20letter\x20or\x20one\x20of\x20[\]^_\x3f + +# End diff --git a/src/pcre/testdata/valgrind-jit.supp b/src/pcre2/testdata/valgrind-jit.supp similarity index 100% rename from src/pcre/testdata/valgrind-jit.supp rename to src/pcre2/testdata/valgrind-jit.supp diff --git a/src/pcre2/testdata/wintestinput3 b/src/pcre2/testdata/wintestinput3 new file mode 100644 index 00000000..8d8017a6 --- /dev/null +++ b/src/pcre2/testdata/wintestinput3 @@ -0,0 +1,104 @@ +# This set of tests checks local-specific features, using the "fr_FR" locale. +# It is not Perl-compatible. When run via RunTest, the locale is edited to +# be whichever of "fr_FR", "french", or "fr" is found to exist. 
There is +# different version of this file called wintestinput3 for use on Windows, +# where the locale is called "french" and the tests are run using +# RunTest.bat. + +#forbid_utf + +/^[\w]+/ + *** Failers + École + +/^[\w]+/locale=french + École + +/^[\w]+/ + *** Failers + École + +/^[\W]+/ + École + +/^[\W]+/locale=french + *** Failers + École + +/[\b]/ + \b + *** Failers + a + +/[\b]/locale=french + \b + *** Failers + a + +/^\w+/ + *** Failers + École + +/^\w+/locale=french + École + +/(.+)\b(.+)/ + École + +/(.+)\b(.+)/locale=french + *** Failers + École + +/École/i + École + *** Failers + école + +/École/i,locale=french + École + école + +/\w/I + +/\w/I,locale=french + +# All remaining tests are in the french locale, so set the default. + +#pattern locale=french + +/^[\xc8-\xc9]/i + École + école + +/^[\xc8-\xc9]/ + École + *** Failers + école + +/\W+/ + >>>\xaa<<< + >>>\xba<<< + +/[\W]+/ + >>>\xaa<<< + >>>\xba<<< + +/[^[:alpha:]]+/ + >>>\xaa<<< + >>>\xba<<< + +/\w+/ + >>>\xaa<<< + >>>\xba<<< + +/[\w]+/ + >>>\xaa<<< + >>>\xba<<< + +/[[:alpha:]]+/ + >>>\xaa<<< + >>>\xba<<< + +/[[:alpha:]][[:lower:]][[:upper:]]/IB + +# End of testinput3 diff --git a/src/pcre/testdata/wintestoutput3 b/src/pcre2/testdata/wintestoutput3 similarity index 57% rename from src/pcre/testdata/wintestoutput3 rename to src/pcre2/testdata/wintestoutput3 index 456ad196..b1894b66 100644 --- a/src/pcre/testdata/wintestoutput3 +++ b/src/pcre2/testdata/wintestoutput3 @@ -1,10 +1,19 @@ +# This set of tests checks local-specific features, using the "fr_FR" locale. +# It is not Perl-compatible. When run via RunTest, the locale is edited to +# be whichever of "fr_FR", "french", or "fr" is found to exist. There is +# different version of this file called wintestinput3 for use on Windows, +# where the locale is called "french" and the tests are run using +# RunTest.bat. 
+ +#forbid_utf + /^[\w]+/ *** Failers No match École No match -/^[\w]+/Lfrench +/^[\w]+/locale=french École 0: École @@ -18,7 +27,7 @@ No match École 0: \xc9 -/^[\W]+/Lfrench +/^[\W]+/locale=french *** Failers 0: *** École @@ -32,7 +41,7 @@ No match a No match -/[\b]/Lfrench +/[\b]/locale=french \b 0: \x08 *** Failers @@ -46,7 +55,7 @@ No match École No match -/^\w+/Lfrench +/^\w+/locale=french École 0: École @@ -56,7 +65,7 @@ No match 1: \xc9 2: cole -/(.+)\b(.+)/Lfrench +/(.+)\b(.+)/locale=french *** Failers 0: *** Failers 1: *** @@ -72,40 +81,38 @@ No match école No match -/École/iLfrench +/École/i,locale=french École 0: École école 0: école -/\w/IS -Capturing subpattern count = 0 -No options -No first char -No need char -Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P +/\w/I +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z - -/\w/ISLfrench -Capturing subpattern count = 0 -No options -No first char -No need char Subject length lower bound = 1 -Starting chars: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P + +/\w/I,locale=french +Capture group count = 0 +Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z ƒ Š Œ Ž š œ ž Ÿ ª ² ³ µ ¹ º À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 1 + +# All remaining tests are in the french locale, so set the default. 
-/^[\xc8-\xc9]/iLfrench +#pattern locale=french + +/^[\xc8-\xc9]/i École 0: É école 0: é -/^[\xc8-\xc9]/Lfrench +/^[\xc8-\xc9]/ École 0: É *** Failers @@ -113,43 +120,43 @@ No match école No match -/\W+/Lfrench +/\W+/ >>>\xaa<<< 0: >>> >>>\xba<<< 0: >>> -/[\W]+/Lfrench +/[\W]+/ >>>\xaa<<< 0: >>> >>>\xba<<< 0: >>> -/[^[:alpha:]]+/Lfrench +/[^[:alpha:]]+/ >>>\xaa<<< 0: >>> >>>\xba<<< 0: >>> -/\w+/Lfrench +/\w+/ >>>\xaa<<< 0: ª >>>\xba<<< 0: º -/[\w]+/Lfrench +/[\w]+/ >>>\xaa<<< 0: ª >>>\xba<<< 0: º -/[[:alpha:]]+/Lfrench +/[[:alpha:]]+/ >>>\xaa<<< 0: ª >>>\xba<<< 0: º -/[[:alpha:]][[:lower:]][[:upper:]]/DZLfrench +/[[:alpha:]][[:lower:]][[:upper:]]/IB ------------------------------------------------------------------ Bra [A-Za-z\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff] @@ -158,9 +165,11 @@ No match Ket End ------------------------------------------------------------------ -Capturing subpattern count = 0 -No options -No first char -No need char - -/ End of testinput3 / +Capture group count = 0 +Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z + a b c d e f g h i j k l m n o p q r s t u v w x y z ƒ Š Œ Ž š œ ž Ÿ ª µ º + À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å + æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ +Subject length lower bound = 3 + +# End of testinput3 diff --git a/src/print.c b/src/print.c index a3f9be22..5da5ec66 100644 --- a/src/print.c +++ b/src/print.c @@ -3,10 +3,11 @@ ** VeriFone Inc./Hewlett-Packard. All Rights Reserved. ** Kevin Hughes, kev@kevcom.com 3/11/94 ** Kent Landfield, kent@landfield.com 4/6/97 +** Hypermail Project 1998-2023 ** ** This program and library is free software; you can redistribute it and/or ** modify it under the terms of the GNU (Library) General Public License -** as published by the Free Software Foundation; either version 2 +** as published by the Free Software Foundation; either version 3 ** of the License, or any later version. 
** ** This program is distributed in the hope that it will be useful, @@ -20,9 +21,9 @@ */ #include -#include #include "hypermail.h" +#include "dmatch.h" #include "setup.h" #include "struct.h" #include "printfile.h" @@ -30,6 +31,7 @@ #include "parse.h" #include "txt2html.h" #include "finelink.h" +#include "getname.h" #include "threadprint.h" @@ -135,13 +137,15 @@ int togdbm(void *gp, struct emailinfo *ep) #endif - /* Uses threadlist to find the next message after * msgnum in the thread containing msgnum. * Returns NULL if there are no more messages in * the thread. + * The skip_deleted argument instructs the function to + * skip all deleted mails in a thread until it finds + * a non-deleted one. */ -struct emailinfo *nextinthread(int msgnum) +static struct emailinfo *nextinthread_common(int msgnum, bool skip_deleted) { struct reply *rp = threadlist; @@ -161,6 +165,12 @@ struct emailinfo *nextinthread(int msgnum) rp = rp->next; + if (skip_deleted) { + while (rp && rp->frommsgnum != -1 && rp->data->is_deleted) { + rp = rp->next; + } + } + if ((rp == NULL) || (rp->frommsgnum == -1)) { /*end of thread - no next msg */ return NULL; @@ -168,6 +178,24 @@ struct emailinfo *nextinthread(int msgnum) return rp->data; } +/* Uses threadlist to find the next message after + * msgnum in the thread containing msgnum. + * Returns NULL if there are no more messages in + * the thread. 
+ */ +struct emailinfo *nextinthread(int msgnum) +{ + return nextinthread_common(msgnum, FALSE); +} + +/* similar to nextinthread but skips all deleted messages +** in the thread +*/ +struct emailinfo *nextinthread_skip_deleted(int msgnum) +{ + return nextinthread_common(msgnum, TRUE); +} + #if 0 /* ** Output a menu line with hyperlinks for table display @@ -219,7 +247,7 @@ void fprint_menu(FILE *fp, mindex_t idx, char *archives, char *currentid, char * if (pos == PAGE_TOP) fprintf(fp, "%s\n", lang[MSG_END_OF_MESSAGES]); else - fprintf(fp, "%s\n", lang[MSG_START_OF_MESSAGES]); + fprintf(fp, "%s\n", lang[MSG_START_OF_MESSAGES]); } for (i = 0; i <= AUTHOR_INDEX; ++i) { @@ -277,19 +305,20 @@ void fprint_menu0(FILE *fp, struct emailinfo *email, int pos) if (!(set_show_msg_links && set_show_msg_links != loc_cmp) || (set_show_index_links && set_show_index_links != loc_cmp)) { - fprintf(fp, "

      \n"); + fprintf(fp, "\n"); + fprintf (fp, "
    \n"); is_first = FALSE; } else is_first = TRUE; - snprintf(date_str, sizeof(date_str), "
  • %s%s
      \n", - (is_first) ? first_attributes : "", tmp); + trio_snprintf(date_str, sizeof(date_str), "%s\n
        \n", + (is_first) ? first_attributes : "", tmp); fprintf (fp, "%s", date_str); strcpy (prev_date_str, tmp); } date_str[0] = 0; startline = "
      • "; - break_str = " "; + break_str = " "; endline = "
      • "; subj_tag = ""; subj_end_tag = ""; } - fprintf(fp,"%s%s%s%s%s%s%s%s%s\n", - startline, msg_href(em, subdir_email, FALSE), + fprintf(fp,"%s%s%s%s%s%s%s%s%s\n", + startline, + set_fragment_prefix, em->msgnum, + msg_href(em, subdir_email, FALSE), subj_tag, subject, subj_end_tag, break_str, - set_fragment_prefix, em->msgnum, set_fragment_prefix, em->msgnum, name, break_str, date_str, endline); @@ -853,7 +907,7 @@ int printattachments(FILE *fp, struct header *hp, struct emailinfo *subdir_email char *attdir; char *msgnum; int nb_attach = 0; - static char *first_attributes = ""; + static char *first_attributes = " id=\"first\""; const char *rel_path_to_top = (subdir_email ? subdir_email->subdir->rel_path_to_top : ""); @@ -875,15 +929,14 @@ int printattachments(FILE *fp, struct header *hp, struct emailinfo *subdir_email /* consider that if there's an attachment directory, there are attachments */ nb_attach++; if (set_indextable) { - fprintf(fp, "%s%s%s" "%s\n", msg_href(em, subdir_email, TRUE), subject, set_fragment_prefix, em->msgnum, set_fragment_prefix, em->msgnum, name, getindexdatestr(em->date)); + fprintf(fp, "%s%s%s" "%s\n", msg_href(em, subdir_email, TRUE), subject, set_fragment_prefix, em->msgnum, name, getindexdatestr(em->date)); } else { - fprintf(fp, "
      • %s%s%s " - "%s (%s)\n", + fprintf(fp, "%s " + "%s (%s)\n", (*is_first) ? first_attributes : "", - msg_href(em, subdir_email, TRUE), subject, - set_fragment_prefix, em->msgnum, - set_fragment_prefix, em->msgnum, + set_fragment_prefix, em->msgnum, + msg_href(em, subdir_email, FALSE), subject, name, getindexdatestr(em->date)); if (*is_first) @@ -907,7 +960,7 @@ int printattachments(FILE *fp, struct header *hp, struct emailinfo *subdir_email nb_attach++; if (first_time && !set_indextable) { first_time = 0; - fprintf(fp, "
          \n"); + fprintf(fp, "
            \n"); } trio_asprintf(&filename, "%s%c%s", attdir, PATH_SEPARATOR, entry->d_name); if (!stat(filename, &fileinfo)) @@ -922,7 +975,7 @@ int printattachments(FILE *fp, struct header *hp, struct emailinfo *subdir_email free(filename); } if (!first_time && !set_indextable) { - fprintf(fp, "
          \n"); + fprintf(fp, "
        \n"); } closedir(dir); } @@ -940,12 +993,20 @@ int printattachments(FILE *fp, struct header *hp, struct emailinfo *subdir_email return nb_attach; } -int showheader(char *header) +int showheader_list(char *header) { return (!inlist (set_skip_headers, header) && (inlist(set_show_headers, header) || inlist(set_show_headers, "*"))); } +int showheader_match(char *header, char *wildcard) +{ + if (Match(header, wildcard) || Match("*", wildcard)) + return 1; + else + return 0; +} + /* * ConvURLsWithHrefs handles lines with URLs that are already written as * href's, to avoid having ConvURLsString add a second href to those URLs. @@ -1078,8 +1139,13 @@ char *ConvURLsString(char *line, char *mailid, char *mailsubject, char *charset) if (set_href_detection) { if ((c = strcasestr(line, "")) + if (parsed && strcasestr(parsed, "")) { +#ifdef HAVE_ICONV + if (tmpptr) + free(tmpptr); +#endif return parsed; - + } + /* we didn't find any previous href convertion, we try to do a mailto: convertion */ if (use_mailcommand) { /* Exclude headers that are not mail type headers */ @@ -1109,92 +1185,177 @@ char *ConvURLsString(char *line, char *mailid, char *mailsubject, char *charset) newparse = parseemail(parsed, /* source */ mailid, /* mail's Message-Id: */ mailsubject, /* mail's Subject: */ - MAKEMAILCOMMAND); /* make a mailto: */ + /* either obfuscate the address or convert it to a mailto: */ + (use_mailcommand == 2) ? 
OBFUSCATE_ADDRESS : MAKEMAILCOMMAND); free(parsed); parsed = newparse; } } #ifdef HAVE_ICONV if(tmpptr) - free(tmpptr); + free(tmpptr); #endif return parsed; } -void printheaders (FILE *fp, struct emailinfo *email) +static struct body *printheaders(FILE *fp, struct emailinfo *email, struct body *from_bp, bool msg_rfc822) { - struct body *bp = email->bodylist; + struct body *bp = NULL; char *id = email->msgid; char *subject = email->subject; char head[128]; char head_lower[128]; char *header_content; + struct hmlist *show_headers_list; + if (REMOVE_MESSAGE(email)) { + /* the following message is now shown when printing the body; we + ** only need to return */ +#if 0 int d_index = MSG_DELETED; if (email->is_deleted == 2) d_index = MSG_EXPIRED; if (email->is_deleted == 4 || email->is_deleted == 8) d_index = MSG_FILTERED_OUT; - fprintf(fp, ""); - fprintf(fp, "(%s)\n", lang[d_index]); - return; + fprintf(fp, "(%s)\n", lang[d_index]); +#endif + return NULL; } - - if (set_show_headers) { - while (bp != NULL && bp->header) { - if ((bp->line)[0] == '\n') { /* don't try to convert newline */ - break; - } - - if (sscanf(bp->line, "%127[^:]", head) == 1 && showheader(head)) { - /* this is a header we want to show */ - - strcpy (head_lower, head); - strtolower (head_lower); - - /* we print the header, escaping it as needed */ - - header_content = bp->line + strlen (head) + 2; - fprintf (fp, "%s: ", - head_lower, head); + /* if we are dealing with a message/rfc822 attachment, use + set_show_msg_rfc_headers if defined, otherwise fall back + to set_show_headers */ + if (msg_rfc822 && set_show_headers_msg_rfc822) { + show_headers_list = set_show_headers_msg_rfc822; + } else { + show_headers_list = set_show_headers; + } + + if (show_headers_list) { + struct hmlist *shp; + + for (shp = show_headers_list; shp != NULL; shp = shp->next) { + + /* if from_bp is initialized, we are dealing with an attachment, + ** we want to print out the headers we usually skip */ + if 
(inlist(set_skip_headers, shp->val)) { + continue; + } - /* JK: avoid converting Message-Id: headers */ - if (!strcmp(head_lower, "message-id") && use_mailcommand) { - /* we desactivate it just during this conversion */ - use_mailcommand = 0; - ConvURLs(fp, header_content, id, subject, email->charset); - use_mailcommand = 1; - } - else{ + if (from_bp) { + bp = from_bp; + } else { + bp = email->bodylist; + } + + while (bp != NULL && bp->header) { + if ((bp->line)[0] == '\n') { /* don't try to convert newline */ + break; + } + + /* skip invalid header lines */ + if (bp->invalid_header) { + bp = bp->next; + continue; + } + + if (sscanf(bp->line, "%127[^:]", head) == 1 && showheader_match(head, shp->val)) { + /* this is a header we want to show */ + + strcpy (head_lower, head); + strtolower (head_lower); + + /* we print the header, escaping it as needed */ + + header_content = bp->line + strlen (head) + 2; + fprintf (fp, "
      • %s: ", + head_lower, head); + + + /* JK: avoid converting Message-Id: headers for all messages. + We also avoid converting all mail address headers in + message/rfc822 attachments */ + if (use_mailcommand + && (!strcmp(head_lower, "message-id") + || msg_rfc822)) + { + int prev_use_mail_command = use_mailcommand; + /* we instruct ConvURls to only obfuscate addresses */ + use_mailcommand = 2; + ConvURLs(fp, header_content, id, subject, email->charset); + use_mailcommand = prev_use_mail_command; + } + else { #ifdef HAVE_ICONV - size_t tmplen; - char *tmpptr=i18n_convstring(header_content,"UTF-8",email->charset,&tmplen); - ConvURLs(fp, tmpptr, id, subject, email->charset); - if (tmpptr) - free(tmpptr); + size_t tmplen; + char *tmpptr=i18n_convstring(header_content,"UTF-8",email->charset,&tmplen); + ConvURLs(fp, tmpptr, id, subject, email->charset); + if (tmpptr) + free(tmpptr); #else - ConvURLs(fp, header_content, id, subject, email->charset); + ConvURLs(fp, header_content, id, subject, email->charset); #endif - } - fprintf (fp, "
        \n"); - } + } + fprintf (fp, "
      • \n"); + } - /* go to the next header or stop if we reached the end of the headers - (signaled thru the \n char). */ - if ((bp->line)[0] != '\n') { - bp = bp->next; - continue; - } - else - break; - } + /* go to the next header or stop if we reached the end of the headers + (signaled thru the \n char). */ + if ((bp->line)[0] != '\n') { + bp = bp->next; + continue; + } + else + break; + } + } + } else { + /* if we didn't print the main mail headers, caller expects bp to + point to next line that has to be processed */ + bp = from_bp; } + + /* returns last line that was processed */ + return bp; + } /* printheaders */ +/* Closes all open sections and resets the flags. This +** function factorizes code that is used in multiple places +** in printbody() +*/ +static void close_open_sections(FILE *fp, int *pre_open, int *showhtml_open, + int *inlinehtml_open, int *attachment_open) +{ + /* close all open tag sections */ + + /* pre is used for signatures even when + ** showhtml is enabled + */ + if (*pre_open) { + fprintf(fp, "
  • \n"); + *pre_open = FALSE; + } + + if (set_showhtml == 2 && !*inlinehtml_open) { + end_txt2html(fp); + } + + if (*showhtml_open || *inlinehtml_open) { + fprintf(fp, "\n"); + *showhtml_open = FALSE; + *inlinehtml_open = FALSE; + } + + if (*attachment_open) { + /* fprintf(fp, "\n"); */ + *attachment_open = FALSE; + } +} + /* ** The heuristics for displaying an otherwise ordinary line (a non-quote, ** non-sig, non-inhtml, non-blank line) if 'showhtml' is 1 (which really means @@ -1208,249 +1369,518 @@ void printheaders (FILE *fp, struct emailinfo *email) void printbody(FILE *fp, struct emailinfo *email, int maybe_reply, int is_reply) { - int insig, inblank; + int insig = 0; + int inblank = 1; struct body *bp = email->bodylist; char *id = email->msgid; char *subject = email->subject; int msgnum = email->msgnum; + char *body_start_attribute = " id=\"start\""; char inheader = FALSE; /* we always start in a mail header */ - int pre = FALSE; - - int inquote; - int quote_num; + int body_start = TRUE; /* used to put the anchor to the first line of the body */ + int pre_open = FALSE; /* controls if a
    <pre> is open */
    +    int showhtml_open = FALSE;  /* if using showhtml controls if the special 
    +				** 
    surrounding that content is open */ + int inlinehtml_open = FALSE; /* if using inline_html controls if + ** the special
    surrounding that content is open */ + int attachment_open = FALSE; /* if we generated a list of attachments, controls if the + ** the
    surrounding the list is open */ + int attachment_link_open = FALSE; /* set to true if we opened a section for the list pointing + to external attachments */ + int inquote = 0; + int quote_num = 0; int quoted_percent; bool replace_quoted; - + bool prefered_charset_is_utf8; + + /* used to generate both unique ids for each forwarded-message + (message/rfc822) section, list of stored attachments and + indications helping users of screen-readers better identify + how a forwarded message is binded with a list of stored + attachments */ + unsigned int nesting_level_sequence[MAX_FWD_MSG_NESTING_LEVEL]; + int nesting_level = 0; + /* we only use this when we read nesting_level >= MAX_FWD_MSG_NESTING_LEVEL + so that we can continue to ensure xmlwf and unique ids */ + int forwarded_message_count = 0; + + memset(nesting_level_sequence, 0, sizeof(nesting_level_sequence)); + if (set_linkquotes || set_showhtml == 2) - /* should be changed to unconditional after tested for a while? - - pcm@rahul.net 1999-09-09 */ - find_quote_prefix(email->bodylist, is_reply); + /* should be changed to unconditional after tested for a while? 
+ - pcm@rahul.net 1999-09-09 */ + find_quote_prefix(email->bodylist, is_reply); if (set_quote_hide_threshold <= 100) quoted_percent = compute_quoted_percent(bp); else - quoted_percent = 100; + quoted_percent = 100; replace_quoted = (quoted_percent > set_quote_hide_threshold); if (set_showprogress && replace_quoted) - printf("\nMessage %d quoted text (%d %%) replaced by links\n", msgnum, quoted_percent); + printf("\nMessage %d quoted text (%d %%) replaced by links\n", msgnum, quoted_percent); + + if (!strncasecmp (email->charset, "UTF-8", 5)) { + prefered_charset_is_utf8 = TRUE; + } else { + prefered_charset_is_utf8 = FALSE; + } + /* for deleted messages, print a specific message, either default + ** or configuration given */ if (email->is_deleted && set_delete_level != DELETE_LEAVES_TEXT && !(email->is_deleted == 2 && set_delete_level == DELETE_LEAVES_EXPIRED_TEXT)) { - int d_index = MSG_DELETED; - if (email->is_deleted == 2) - d_index = MSG_EXPIRED; - if (email->is_deleted == 4 || email->is_deleted == 8) - d_index = MSG_FILTERED_OUT; - if (email->is_deleted == 64) - d_index = MSG_DELETED_OTHER; - switch(d_index) { - case MSG_DELETED: - if(set_htmlmessage_deleted_spam){ - fprintf(fp,"%s\n",set_htmlmessage_deleted_spam); - break; - } - case MSG_DELETED_OTHER: - if(set_htmlmessage_deleted_other){ - fprintf(fp,"%s\n",set_htmlmessage_deleted_other); - break; + int d_index = MSG_DELETED; + if (email->is_deleted == 2) + d_index = MSG_EXPIRED; + if (email->is_deleted == 4 || email->is_deleted == 8) + d_index = MSG_FILTERED_OUT; + if (email->is_deleted == 64) + d_index = MSG_DELETED_OTHER; + switch(d_index) { + case MSG_DELETED: + if(set_htmlmessage_deleted_spam){ + fprintf(fp,"%s\n",set_htmlmessage_deleted_spam); + break; + } + case MSG_DELETED_OTHER: + if(set_htmlmessage_deleted_other){ + fprintf(fp,"%s\n",set_htmlmessage_deleted_other); + break; } - default: - fprintf(fp, ""); - fprintf(fp, "

    %s

    \n", lang[d_index]); - } - return; + default: + fprintf(fp, "\n", body_start_attribute); + fprintf(fp, "%s\n", lang[d_index]); + fprintf(fp, "
    "); + } + return; } - + + /* deal with messages that were edited */ if (email->annotation_content == ANNOTATION_CONTENT_EDITED) { - if (set_htmlmessage_edited) - fprintf(fp,"%s\n",set_htmlmessage_edited); - else - fprintf(fp, "

    %s

    \n", lang[MSG_EDITED]); + if (set_htmlmessage_edited) + fprintf(fp,"%s\n",set_htmlmessage_edited); + else { + fprintf(fp, "%s

    \n", body_start_attribute, + lang[MSG_EDITED]); + body_start = FALSE; + } } - if (!set_showhtml) { - fprintf(fp, "
    <pre>\n");
    -	pre = TRUE;
    -    }
    +    while (bp != NULL) {
     
    -    /* tag the start of the message body */
    -    fprintf(fp, "");
    +#ifdef PRINTBODY_DEBUG
    +        fprintf(stderr, "========================\n");
    +        fprintf(stderr, "%s\n", bp->line);
    +        fprintf(stderr, "header: %d\n", bp->header);
    +        fprintf(stderr, "parsed: %d\n", bp->parsedheader);
    +#ifdef DELETE_ME
    +        fprintf(stderr, "attached: %d\n", bp->attached);
    +#endif
    +	fprintf(stderr, "attachment_rfc822: %d\n", bp->attachment_rfc822);
    +        fprintf(stderr, "demimed: %d\n", bp->demimed);
    +        fprintf(stderr, "html: %d\n", bp->html);
    +#endif
    +        
    +        /* skip all headers */
    +        if (bp->header) {
    +            
    +            if (!inheader) {
    +                inheader= TRUE;
     
    -    if (set_showhtml == 2)
    -      init_txt2html();
    -    inquote = 0;
    -    quote_num = 0;
    +                /* close open sections */
    +                close_open_sections(fp, &pre_open, &showhtml_open,
    +                                    &inlinehtml_open, &attachment_open);
    +            }
    +            bp = bp->next;
    +            continue;
    +        }
     
    -    inblank = 1;
    -    insig = 0;
    +        /* we're not in a header anymore */
    +        
    +        if (inheader) {
    +            inheader = FALSE;
    +            /* initialize the variables used for each attachment */
    +            inquote = 0;
    +            quote_num = 0;
    +            inblank = 1;
    +            insig = 0;
    +        }
     
    -    while (bp != NULL) {
    -	if (bp->html) {
    -	  /* already in HTML, don't touch */
    -  	  if (pre) {
    -	    fprintf(fp, "</pre>
    \n"); - pre = FALSE; - } - printhtml(fp, bp->line); - inheader = FALSE; /* this can't be a header if already in HTML */ - bp = bp->next; - continue; - } + if (bp->attachment_flags) { + if (bp->attachment_flags & BODY_ATTACHMENT_START) { + /* close open sections */ + close_open_sections(fp, &pre_open, &showhtml_open, + &inlinehtml_open, &attachment_open); + if (bp->attachment_rfc822) { + char *unique_id; + + forwarded_message_count++; + + /* update sequence count */ + nesting_level++; + + if (nesting_level < MAX_FWD_MSG_NESTING_LEVEL) { + /* + trying to use a simpler numbering scheme by skipping + the nesting level and just using the forwarded_message + count + */ + /* + nesting_level_sequence[nesting_level]++; + trio_asprintf(&unique_id, "%d-%d", nesting_level, + nesting_level_sequence[nesting_level]); + */ + + nesting_level_sequence[nesting_level] = forwarded_message_count; + + trio_asprintf(&unique_id, "%d", forwarded_message_count, + forwarded_message_count); + } else { + /* this won't be useful for associating a fwd message + with a corresponding list of stored attachments, but + at least we won't generate duplicate ids */ + trio_asprintf(&unique_id, "%d-%d", nesting_level, + forwarded_message_count); + } + + fprintf(fp, "\n", + (body_start) ? body_start_attribute : ""); + fprintf(fp, "
    \n", + unique_id); + fprintf(fp, "

    %s %s

    \n",
    +                            unique_id,
    +                            lang[MSG_FORWARDED_MESSAGE_NOTICE],
    +                            unique_id);
    +
    +                    free(unique_id);
    +
    +                } else {
    +                    fprintf(fp, "\n",
    +                            (body_start) ? body_start_attribute : "");
    +                }
    +
    +                if (body_start) {
    +                    body_start = FALSE;
    +                }
    +
    +                /* when debugging, reveal the sections */
    +                if (set_debug_level == 4 && !bp->attachment_rfc822) {
    +                    fprintf(fp, "

    %s:

    \n",
    +                            lang[MSG_ATTACHED_MESSAGE_NOTICE]);
    +                }
    +
    +                if (bp->attachment_rfc822 && bp->next) {
    +                    bp = bp->next;
    +                    if (bp->header) {
    +                        /* if it's a header and the user wants to show them, then print it
    +                           and all the other headers that follow */
    +
    +                        /* @@ check for duplicate ids */
    +                        bp = print_headers_rfc822_att(fp, email, bp);
    +                        /* reset this flag to as we just printed the attachment headers */
    +                        inblank = 1;
    +                        if (bp)
    +                            bp = bp->next;
    +                        continue;
    +                    }
    +                }
    +            }
    +            else if (bp->attachment_flags & BODY_ATTACHMENT_END) {
    +                /* close open sections */
    +                close_open_sections(fp, &pre_open, &showhtml_open,
    +                                    &inlinehtml_open, &attachment_open);
    +                if (bp->attachment_rfc822) {
    +                    fprintf(fp, "
    \n");
    +                    nesting_level--;
    +                }
    +                fprintf(fp, "\n");
    +            }
    +
    +            bp = bp->next;
    +            continue;
    +        }
    -	if (bp->header) {
    -	    char head[128];
    -	    if (!inheader) {
    -	      /* JK: I'm not sure why, but I had a !set_showhtml here */
    -	      if (!set_showhtml && !pre && set_showheaders) {
    -		fprintf(fp, "
    \n");
    -		pre = TRUE;
    -	      }
    -	      inheader = TRUE;
    -	    }
    -	    if (sscanf(bp->line, "%127[^:]", head) == 1 && set_show_headers && !showheader(head)) {
    -	      /* the show header keyword has been used, then we skip all those
    -		 that aren't mentioned! */
    -	      if (isalnum(*head) || !set_showheaders) {
    -		/* this check is only to make sure that the last line among 
    -		   the headers (the "\n" one) won't be filtered off */
    -		bp = bp->next;
    -		continue;
    -	      }
    -	    }
    -	}
    -	else {
    -	  if (inheader) {
    -	    insig = 0;
    -	    if (set_showhtml) {
    -	      if (pre) {
    -		fprintf(fp, "
    \n");
    -		pre = FALSE;
    -	      }
    -	      fprintf(fp, "
    \n");
    -	    }
    -	    else {
    -	      if (!pre) {
    -		fprintf(fp, "
    \n");
    -		pre = TRUE;
    -	      }
    -	    }
    -	    inheader = FALSE;
    -	  }
    +        /* handle start and end markup for list of external attachments.
    +           we need to improve this so it is attached to the body or message/rfc822
    +           and avoid closing all open sections */
    +        else if (bp->attachment_links_flags) {
    +            if (bp->attachment_links_flags & BODY_ATTACHMENT_LINKS_START) {
    +                char *unique_id;
    +                
    +                /* close open sections */
    +                close_open_sections(fp, &pre_open, &showhtml_open,
    +                                    &inlinehtml_open, &attachment_open);
    +
    +                if (nesting_level < MAX_FWD_MSG_NESTING_LEVEL) {
    +                    /*
    +                      trio_asprintf(&unique_id, "%d-%d", nesting_level,
    +                      nesting_level_sequence[nesting_level]);
    +                    */
    +                    trio_asprintf(&unique_id, "%d", 
    +                                  nesting_level_sequence[nesting_level]);
    +                } else {
    +                    trio_asprintf(&unique_id, "%d-%d", nesting_level,
    +                                  forwarded_message_count);                    
    +                }
    +                
    +                fprintf(fp, "\n",
    +                        (body_start) ? body_start_attribute : "",
    +                        unique_id);
    +                
    +                if (nesting_level > 0) {
    +                    fprintf(fp, "

    %s %s

    \n",
    +                            unique_id,
    +                            lang[MSG_ATTACHMENTS_FOR_MESSAGE_NOTICE],
    +                            unique_id);
    +                } else {
    +                    fprintf(fp, "

    %s

    \n",
    +                            unique_id,
    +                            lang[MSG_ATTACHMENTS_NOTICE]);
    +                }
    +
    +                free(unique_id);
    +
    +                fprintf(fp, "
      \n");
    +                attachment_link_open = TRUE;
    +
    +            } else if (bp->attachment_links_flags & BODY_ATTACHMENT_LINKS_END) {
    +                /* close open list and open section */
    +                fprintf(fp, "
    \n" + "\n"); + attachment_link_open = FALSE; + } + bp = bp->next; + continue; + } + + /* skip any trailing newlines at the beginning of the attachment */ + if (!bp->attachment_links && (bp->line)[0] == '\n' && inblank) { + bp = bp->next; + continue; } + else + inblank = 0; + +#if 0 + /* if we have headers that are inside an rfc822, they + ** are marked as headers and attached, we print them out + ** in that case */ + if (bp->attachment_rfc822 && set_show_headers && bp->next) { + char head[128]; + + /* close open sections */ + close_open_sections(fp, &pre_open, &showhtml_open, + &inlinehtml_open, &attachment_open); + + fprintf(fp, "\n", + (body_start) ? body_start_attribute : ""); + if (set_debug_level == 4) { + fprintf(fp, "

    %s:

    \n", + lang[MSG_ATTACHED_MESSAGE_NOTICE]); + } + + fprintf(fp, "%s\n", bp->line); + attachment_open = TRUE; + bp = bp->next; + if (sscanf(bp->line, "%127[^:]", head) == 1) { + /* if it's a header and the user wants to show them, then print it + and all the other headers that follow */ + + /* @@ check for duplicate ids */ + bp = print_headers_rfc822_att(fp, email, bp); + /* reset this flag to as we just printed the attachment headers */ + inblank = 1; + if (bp) + bp = bp->next; + continue; + } + } +#endif - if (((bp->line)[0] != '\n') && (bp->header && !set_showheaders)) { - bp = bp->next; - continue; + if (bp->html) { + /* already in HTML, don't touch. It may be either inline + * html or an attachment list */ + + if (bp->attachment_links) { + if (!attachment_link_open) { + if (!attachment_open) { + /* close open sections */ + close_open_sections(fp, &pre_open, &showhtml_open, + &inlinehtml_open, &attachment_open); + + fprintf(fp, "\n", + (body_start) ? body_start_attribute : ""); + + if (set_debug_level == 4) { + fprintf(fp, "

    %s:

    \n", + lang[MSG_ATTACHED_MESSAGE_NOTICE]); + } + attachment_link_open = TRUE; + } + } + + /* attachment descriptions and comments are coded in utf-8. + If the prefered charset for the message is not utf-8, + we convert the line before printing it */ +#ifdef HAVE_ICONV + if (!prefered_charset_is_utf8) { + size_t tmplen; + char *tmpline = i18n_convstring( bp->line, "UTF-8", email->charset, &tmplen); + + free(bp->line); + bp->line = tmpline; + } +#endif + } + + else if (!inlinehtml_open) { + /* close open sections */ + close_open_sections(fp, &pre_open, &showhtml_open, + &inlinehtml_open, &attachment_open); + fprintf(fp, "\n", + (body_start) ? body_start_attribute : ""); + inlinehtml_open = TRUE; + } + + if (body_start) { + body_start = FALSE; + } + + fprintf(fp, "%s", bp->line); + bp = bp->next; + continue; + } + + /* if we get here, decide if we need to convert the body to html or + ** print it inside a pre, with an exception for sigs which are always + ** in pre sections when showhtml == 1*/ + if (set_showhtml) { + if (!showhtml_open) { + fprintf(fp, "\n", + (body_start) ? body_start_attribute : ""); + if (body_start) { + body_start = FALSE; + } + if (set_showhtml == 2) { + init_txt2html(); + } + showhtml_open = TRUE; + } } - if (set_showhtml == 2 && !inheader) { + if (set_showhtml == 2) { txt2html(fp, email, bp, replace_quoted, maybe_reply); - bp = bp->next; - continue; + bp = bp->next; + continue; } - if (bp->header && set_showheaders && !pre) { - fprintf(fp, "
    \n");
    -	  pre = TRUE;
    -	}
    - 
    -	if ((bp->line)[0] == '\n' && inblank) {
    -	  bp = bp->next;
    -	  continue;
    -	}
    -	else
    -	  inblank = 0;
    -	
     	if (set_showhtml) {
    -	  if (is_sig_start(bp->line)) {
    -	    insig = 1;
    -	    if (!pre) {
    -	      fprintf(fp, "
    \n");
    -	      pre = TRUE;
    -	    }
    -	  }
    +            if (is_sig_start(bp->line)) {
    +                insig = 1;
    +                if (!pre_open) {
    +                    fprintf(fp, "
    \n");
    +                    pre_open = TRUE;
    +                }
    +            }
     	  
    -	  if (!inheader && (bp->line)[0] == '\n')
    -	    /* within the 
     statements you do not need to
    -	       insert 

    statements since text is already preformated.
    -	       the W3C HTML validation script fails for such pages
    -	       Akis Karnouskos */
    -	    {
    -	      if (!pre)
    -		fprintf(fp, "
    "); - } - else { - if (insig) { - ConvURLs(fp, bp->line, id, subject, email->charset); - } - else if (isquote(bp->line)) { - if (set_linkquotes) { - if (handle_quoted_text(fp, email, bp, bp->line, inquote, quote_num, replace_quoted, maybe_reply)) { - ++quote_num; - inquote = 1; - } - } - else { - fprintf(fp, "<%s class=\"%s\">", set_iquotes ? "em" : "span", find_quote_class(bp->line)); - - ConvURLs(fp, bp->line, id, subject, email->charset); - - fprintf(fp, "%s
    \n", (set_iquotes) ? "" : ""); - } - } - else if ((bp->line)[0] != '\0' && !bp->header) { - char *sp; - sp = print_leading_whitespace(fp, bp->line); + if ((bp->line)[0] == '\n') + /* within the

     statements you do not need to
    +                   insert 

    statements since text is already preformated.
    +                   the W3C HTML validation script fails for such pages
    +                   Akis Karnouskos */
    +                {
    +                    if (!pre_open)
    +                        fprintf(fp, "
    \n"); + } + else { + if (insig) { + ConvURLs(fp, bp->line, id, subject, email->charset); + } + else if (isquote(bp->line)) { + if (set_linkquotes) { + if (handle_quoted_text(fp, email, bp, bp->line, inquote, quote_num, replace_quoted, maybe_reply)) { + ++quote_num; + inquote = 1; + } + } + else { + fprintf(fp, "<%s class=\"quote %s\">", set_iquotes ? "em" : "span", find_quote_class(bp->line)); + + ConvURLs(fp, bp->line, id, subject, email->charset); + + fprintf(fp, "%s
    \n", (set_iquotes) ? "" : ""); + } + } + else if ((bp->line)[0] != '\0' && !bp->header) { + char *sp; + sp = print_leading_whitespace(fp, bp->line); - /* JK: avoid converting Message-Id: headers */ - if (bp->header && bp->parsedheader && !strncasecmp(bp->line, "Message-Id:", 11) - && use_mailcommand) { - /* we desactivate it just during this conversion */ - use_mailcommand = 0; - ConvURLs(fp, sp, id, subject, email->charset); - use_mailcommand = 1; - } - else - ConvURLs(fp, sp, id, subject, email->charset); + /* JK: avoid converting Message-Id: headers */ + /* @@ change this for a strstr */ + if (bp->header && bp->parsedheader && !strncasecmp(bp->line, "Message-Id:", 11) + && use_mailcommand) { + /* we desactivate it just during this conversion */ + use_mailcommand = 0; + ConvURLs(fp, sp, id, subject, email->charset); + use_mailcommand = 1; + } + else + ConvURLs(fp, sp, id, subject, email->charset); - /* - * Determine whether we should break. - * We could check for leading spaces - * or quote lines, but in general, - * non-alphanumeric lines should be - * broken before. - */ + /* + * Determine whether we should break. + * We could check for leading spaces + * or quote lines, but in general, + * non-alphanumeric lines should be + * broken before. + */ - if ((set_showbr && !bp->header) || ((bp->next != NULL) && !isalnum(bp->next->line[0]))) - fprintf(fp, "
    "); - if (!bp->header) { - fprintf(fp, "\n"); - } - } - - } - } + if ((set_showbr) || ((bp->next != NULL) && !isalnum(bp->next->line[0]))) + fprintf(fp, "
    "); + fprintf(fp, "\n"); + } + } + } + + /* this section prints the text message inside

     */
     	else if ((bp->line)[0] != '\0' && !bp->header) {
    -	  /* JK: avoid converting Message-Id: headers */
    -	  if (bp->header && bp->parsedheader && !strncasecmp(bp->line, "Message-Id:", 11)
    -	      && use_mailcommand) {
    -	    /* we desactivate it just during this conversion */
    -	    use_mailcommand = 0;
    -	    ConvURLs(fp, bp->line, id, subject, email->charset);
    -	    use_mailcommand = 1;
    -	  }
    -	  else
    -	    ConvURLs(fp, bp->line, id, subject, email->charset);
    +            if (!pre_open) {
    +                fprintf(fp, "\n", (body_start) ? body_start_attribute : "");
    +                pre_open = TRUE;
    +                if (body_start) {
    +                    body_start = FALSE;
    +                }
    +            }
    +            
    +            /* JK: avoid converting Message-Id: headers that may appear in replies */
    +            /* @@ add strstr here */
    +            if (bp->header && bp->parsedheader && !strncasecmp(bp->line, "Message-Id:", 11)
    +                && use_mailcommand) {
    +                /* we desactivate it just during this conversion */
    +                use_mailcommand = 0;
    +                ConvURLs(fp, bp->line, id, subject, email->charset);
    +                use_mailcommand = 1;
    +            }
    +            else
    +                ConvURLs(fp, bp->line, id, subject, email->charset);
     	}
    +        
     	if (!isquote(bp->line))
    -	  inquote = 0;
    +            inquote = 0;
    +        
     	bp = bp->next;
         }
     
    -    if (pre)
    -      fprintf(fp, "
    \n"); - else if (set_showhtml == 2) - end_txt2html(fp); + if (attachment_link_open) { + fprintf(fp, "\n"); + attachment_link_open = FALSE; + } + + /* close all open tags */ + close_open_sections(fp, &pre_open, &showhtml_open, + &inlinehtml_open, &attachment_open); } char *print_leading_whitespace(FILE *fp, char *sp) @@ -1466,14 +1896,14 @@ char *print_leading_whitespace(FILE *fp, char *sp) } void print_headers(FILE *fp, struct emailinfo *email, int in_thread_file) -{ - /* - * Print the message's author info and date. - * General form: from:: name
    - */ - - fprintf(fp, "
    \n"); +{ + /* + * Print the message's author info and date. + * General form:
  • from:: name
  • + */ + char *tmp_oea; + #ifdef HAVE_ICONV size_t tmplen; char *tmpsubject=i18n_convstring(email->subject,"UTF-8",email->charset,&tmplen); @@ -1485,9 +1915,13 @@ void print_headers(FILE *fp, struct emailinfo *email, int in_thread_file) char *tmpname=convchars(email->name, email->charset); #endif + tmp_oea = obfuscate_email_address(email->emailaddr); + + fprintf(fp, "
      \n"); + /* the from header */ - fprintf (fp, "\n"); - fprintf (fp, "%s: ", lang[MSG_FROM]); + fprintf (fp, "
    • \n"); + fprintf (fp, "%s: ", lang[MSG_FROM]); if (REMOVE_MESSAGE(email)) { /* don't show the email address and name if we have deleted the message */ fprintf(fp, "<%s>", lang[MSG_SENDER_DELETED]); @@ -1502,9 +1936,9 @@ void print_headers(FILE *fp, struct emailinfo *email, int in_thread_file) email->msgid, email->subject); #endif fprintf(fp, "<%s>", ptr ? ptr : "", - obfuscate_email_address(email->emailaddr)); + tmp_oea); if (ptr) - free(ptr); + free(ptr); } else fprintf(fp, "%s", tmpname); @@ -1519,30 +1953,33 @@ void print_headers(FILE *fp, struct emailinfo *email, int in_thread_file) email->msgid, email->subject); #endif fprintf(fp, "%s <%s>", tmpname, ptr ? ptr : "", - obfuscate_email_address(email->emailaddr)); + tmp_oea); if (ptr) free(ptr); } else { - fprintf(fp, "%s <%s>", tmpname, + fprintf(fp, "%s <%s>", tmpname, (strcmp(email->emailaddr, "(no email)") != 0) ? email->emailaddr : "no email"); } } - fprintf (fp, "\n
      \n"); + fprintf (fp, "\n
    • \n"); /* subject */ if (in_thread_file) #ifdef HAVE_ICONV - fprintf(fp, "%s: %s
      \n", lang[MSG_SUBJECT], tmpsubject); + fprintf(fp, "
    • %s: %s
    • \n", lang[MSG_SUBJECT], tmpsubject); #else - fprintf(fp, "%s: %s
      \n", lang[MSG_SUBJECT], tmpsubject=convchars(email->subject,email->charset)); + fprintf(fp, "
    • %s: %s
    • \n", lang[MSG_SUBJECT], tmpsubject=convchars(email->subject,email->charset)); #endif /* date */ - fprintf(fp, "%s: %s
      \n", lang[MSG_CDATE], email->datestr); + fprintf(fp, "
    • %s: %s
    • \n", lang[MSG_CDATE], email->datestr); + + printheaders(fp, email, NULL, FALSE); - printheaders (fp, email); + fprintf(fp, "
    \n"); - fprintf(fp, "
    \n"); + if (set_email_address_obfuscation && tmp_oea) + free(tmp_oea); if(tmpsubject) free(tmpsubject); @@ -1550,6 +1987,151 @@ void print_headers(FILE *fp, struct emailinfo *email, int in_thread_file) free(tmpname); } +struct body *print_headers_rfc822_att(FILE *fp, struct emailinfo *email, struct body *bp) +{ + struct body *head; + char *date = NULL; + char *namep = NULL; + char *emailp = NULL; + char *subject = NULL; + bool hasdate = FALSE; + bool hasfrom = FALSE; + bool hassubject = FALSE; + char *tmpsubject = NULL; + char *tmpname = NULL; + + /* find the date, author, and subject, from a message/rfc822 attachment */ + + for (head = bp; head->header; head = head->next) { + char head_name[128]; + + /* + if (head->header && !head->demimed) { + head->line = + mdecodeRFC2047(head->line, strlen(head->line),charsetsave); + */ + if (!sscanf(head->line, "%127[^:]", head_name)) + continue; + + if (!strncasecmp(head->line, "Date:", 5)) { + date = getmaildate(head->line); + hasdate = TRUE; + } + else if (!strncasecmp(head->line, "From:", 5)) { + getname(head->line, &namep, &emailp); + head->parsedheader = TRUE; + if (set_spamprotect) { + char *tmp; + + tmp = spamify(emailp); + free(emailp); + emailp = tmp; + /* we need to "fix" the name as well, as sometimes + the email ends up in the name part */ + tmp = spamify(namep); + free(namep); + namep = tmp; + } + hasfrom = TRUE; + } + else if (!strncasecmp(head->line, "Subject:", 8)) { + subject = getsubject(head->line); + hassubject = TRUE; + head->parsedheader = TRUE; + } + + if (hassubject && hasfrom && hasdate) { + break; + } + } + + /* + * Print the message's author info and date. + * General form: from:: name
    + */ + + fprintf(fp, "
      \n"); + +#ifdef HAVE_ICONV + if (subject) { + size_t tmplen; + char *tmptmpsubject; + tmptmpsubject = i18n_convstring( subject, "UTF-8", email->charset, &tmplen); + tmpsubject = convchars(tmptmpsubject, "utf-8"); + free(tmptmpsubject); + } + if (namep) { + char *tmptmpname; + size_t tmplen; + + tmptmpname = i18n_convstring(namep, "UTF-8", email->charset, &tmplen); + tmpname = convchars(tmptmpname, "utf-8"); + free(tmptmpname); + } +#else + tmpsubject = NULL; + tmpname = convchars(namep, email->charset); +#endif + + /* the from header */ + if (hasfrom) { + char *tmp_oea = NULL; + + if (emailp) { + tmp_oea = obfuscate_email_address(emailp); + } + + fprintf (fp, "
    • \n"); + fprintf (fp, "%s: ", lang[MSG_FROM]); + fprintf (fp, "%s <%s>", + (tmpname) ? tmpname : "", + (emailp) ? tmp_oea : NOEMAIL); + fprintf (fp, "\n
    • \n"); + + if (set_email_address_obfuscation && tmp_oea) { + free(tmp_oea); + } + } + + if (hasdate) { + /* date */ + fprintf(fp, "
    • %s: %s
    • \n", + lang[MSG_CDATE], (date) ? date : NODATE); + } + + if (hassubject) { + /* subject */ +#ifdef HAVE_ICONV + fprintf(fp, "
    • %s: %s
    • \n", + lang[MSG_CSUBJECT], (tmpsubject) ? tmpsubject : NOSUBJECT); +#else + fprintf(fp, "
    • %s: %s
    • \n", + lang[MSG_CSUBJECT], tmpsubject=convchars((subject) ? subject : NOSUBJECT, + email->charset)); +#endif + } + + /* print the rest of the headers the user wants */ + bp = printheaders(fp, email, bp, TRUE); + + fprintf(fp, "
    \n"); + + if (namep) + free(namep); + if (emailp) + free(emailp); + if (date) + free(date); + if (subject) + free(subject); + if (tmpsubject) + free(tmpsubject); + if (tmpname) + free(tmpname); + + return bp; +} + static char *href01(struct emailinfo *email, struct emailinfo *email2, int in_thread_file, bool generate_markup) { @@ -1579,28 +2161,31 @@ print_replies(FILE *fp, struct emailinfo *email, int num, int in_thread_file) for (rp = replylist; rp != NULL; rp = rp->next) { if (rp->frommsgnum == num && hashnumlookup(rp->msgnum, &email2)) { #endif - char *del_msg = (email2->is_deleted ? lang[MSG_DEL_SHORT] : ""); + if (email2->is_deleted) + continue; + if (!list_started) { list_started = TRUE; - fprintf (fp, "
  • \n"); + fprintf (fp, "
  • "); } else fprintf (fp, "
  • "); if (rp->maybereply) - fprintf(fp, "%s:", lang[MSG_MAYBE_REPLY]); + fprintf(fp, "%s: ", lang[MSG_MAYBE_REPLY]); else - fprintf(fp, "%s:", lang[MSG_REPLY]); - fprintf(fp, "%s ", del_msg, - href01(email, email2, in_thread_file, FALSE), - lang[MSG_LTITLE_REPLIES]); + fprintf(fp, "%s: ", lang[MSG_REPLY]); + fprintf(fp, "", + href01(email, email2, in_thread_file, FALSE)); #ifdef HAVE_ICONV - char *tmpptr; - ptr = i18n_utf2numref(email2->subject,1); - tmpptr = i18n_utf2numref(email2->name,1); - fprintf(fp, "%s: \"%s\"
  • \n", tmpptr, ptr); - if (tmpptr) - free(tmpptr); + { + char *tmpptr; + ptr = i18n_utf2numref(email2->subject,1); + tmpptr = i18n_utf2numref(email2->name,1); + fprintf(fp, "%s: \"%s\"
  • \n", tmpptr, ptr); + if (tmpptr) + free(tmpptr); + } #else ptr = convchars(email2->subject, email2->charset); fprintf(fp, "%s: \"%s\"\n", email2->name, ptr); @@ -1612,7 +2197,7 @@ print_replies(FILE *fp, struct emailinfo *email, int num, int in_thread_file) printcomment(fp, "lreply", "end"); } -int print_links_up(FILE *fp, struct emailinfo *email, int pos, int in_thread_file) +static int print_links_up(FILE *fp, struct emailinfo *email, int pos, int in_thread_file) { int num = email->msgnum; int subjmatch; @@ -1628,18 +2213,19 @@ int print_links_up(FILE *fp, struct emailinfo *email, int pos, int in_thread_fil */ if (set_show_msg_links && set_show_msg_links != loc_cmp) { + if (pos == PAGE_TOP) { + fprintf(fp, "\n"); if (set_txtsuffix) { fprintf(fp, "

    %s", email->msgnum, set_txtsuffix, lang[MSG_TXT_VERSION]); } - - printfooter(fp, mhtmlfooterfile, set_label, set_dir, email->subject, filename, FALSE); - + + printfooter(fp, mhtmlfooterfile, set_label, set_dir, email->subject, filename, FALSE); + fprintf (fp, "\n"); + fprintf(fp, "\n\n"); + fclose(fp); if (get_new_reply_to() != -1) { @@ -2379,7 +2936,7 @@ void writearticles(int startnum, int maxnum) } if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } @@ -2442,7 +2999,7 @@ void writedates(int amountmsgs, struct emailinfo *email) newfile = 1; if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); progerr(errmsg); } @@ -2452,38 +3009,43 @@ void writedates(int amountmsgs, struct emailinfo *email) /* * Print out the index file header */ - print_index_header(fp, set_label, set_dir, lang[MSG_BY_DATE], datename); + print_index_header(fp, set_label, set_dir, lang[MSG_BY_DATE], datename, email); /* * Print out archive information links at the top of the index */ print_index_header_links(fp, DATE_INDEX, start_date_num, end_date_num, amountmsgs, email ? email->subdir : NULL); - fprintf (fp, "\n"); + fprintf (fp, "\n"); /* * Print out the actual message index lists. Here's the beef. */ - if (set_indextable) - fprintf(fp, "

    \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); - else { - fprintf (fp, "
    \n"); - fprintf(fp, "
      \n"); + if (amountmsgs > 0) { + if (set_indextable) + fprintf(fp, "
      \n
    %s%s%s
    \n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); + else { + fprintf (fp, "
    \n"); + } + prev_date_str[0] = '\0'; + printdates(fp, datelist, -1, -1, email, prev_date_str); + + if (set_indextable) { + fprintf(fp, "
    %s%s%s
    \n
    \n"); + printlaststats (fp, end_date_num); + } else { + if (*prev_date_str) /* close the previous date item */ + fprintf (fp, "\n"); + printlaststats (fp, end_date_num); + fprintf (fp, "\n"); + } + } else { + /* print notice that the archive has no messages. + This can happen if all the messages in the archive + have been annotated as either spam or deleted */ + print_empty_archive_notice(fp, end_date_num); } - prev_date_str[0] = '\0'; - printdates(fp, datelist, -1, -1, email, prev_date_str); - - if (set_indextable) - fprintf(fp, "\n\n"); - else - { - if (*prev_date_str) /* close the previous date item */ - fprintf (fp, "\n"); - fprintf(fp, "\n"); - printlaststats (fp, end_date_num); - fprintf (fp, "\n"); - } - + /* * Print out archive information links at the bottom of the index */ @@ -2494,12 +3056,12 @@ void writedates(int amountmsgs, struct emailinfo *email) * Print the index page footer. */ printfooter(fp, ihtmlfooterfile, set_label, set_dir, lang[MSG_BY_DATE], datename, TRUE); - + fclose(fp); /* AUDIT biege: depending on the direc. it better to use fchmod(). */ if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } free(filename); @@ -2530,8 +3092,8 @@ void writeattachments(int amountmsgs, struct emailinfo *email) newfile = 1; if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? 
*/ - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); - progerr(errmsg); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); + progerr(errmsg); } if (set_showprogress) @@ -2540,37 +3102,46 @@ void writeattachments(int amountmsgs, struct emailinfo *email) /* * Print out the index file header */ - print_index_header(fp, set_label, set_dir, lang[MSG_BY_ATTACHMENT], attname); + print_index_header(fp, set_label, set_dir, lang[MSG_BY_ATTACHMENT], attname, email); /* * Print out archive information links at the top of the index */ print_index_header_links(fp, ATTACHMENT_INDEX, start_date_num, end_date_num, amountmsgs, email ? email->subdir : NULL); - fprintf (fp, "\n"); + + fprintf (fp, "\n"); /* * Print out the actual message index lists. Here's the beef. */ - if (set_indextable) { - fprintf(fp, "
    \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); - printattachments(fp, datelist, email, &is_first); - fprintf(fp, "
    %s%s%s
    \n
    \n"); - } - else { - fprintf (fp, "
    \n"); - fprintf(fp, "
      \n"); - if (printattachments(fp, datelist, email, &is_first) == 0) - fprintf(fp, "
    • Nothing received yet!
    • \n"); - fprintf(fp, "
    \n"); - fprintf(fp, "
    \n"); + if (amountmsgs > 0) { + if (set_indextable) { + fprintf(fp, "
    \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); + printattachments(fp, datelist, email, &is_first); + fprintf(fp, "
    %s%s%s
    \n
    \n"); + printlaststats (fp, end_date_num); + } + else { + fprintf (fp, "
    \n"); + if (printattachments(fp, datelist, email, &is_first) == 0) { + fprintf(fp, "

    %s

    \n", lang[MSG_EMPTY_ARCHIVE]); + } else { + printlaststats (fp, end_date_num); + } + fprintf(fp, "
    \n"); + } + } else { + /* print notice that the archive has no messages. + This can happen if all the messages in the archive + have been annotated as either spam or deleted */ + print_empty_archive_notice(fp, end_date_num); } /* * Print out archive information links at the bottom of the index */ - printlaststats (fp, end_date_num); print_index_footer_links(fp, ATTACHMENT_INDEX, end_date_num, amountmsgs, email ? email->subdir : NULL); @@ -2582,7 +3153,7 @@ void writeattachments(int amountmsgs, struct emailinfo *email) fclose(fp); if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } free(filename); @@ -2622,48 +3193,55 @@ void writethreads(int amountmsgs, struct emailinfo *email) newfile = 1; if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); progerr(errmsg); } if (set_showprogress) printf("%s \"%s\"...", lang[MSG_WRITING_THREAD_INDEX], filename); - print_index_header(fp, set_label, set_dir, lang[MSG_BY_THREAD], thrdname); + print_index_header(fp, set_label, set_dir, lang[MSG_BY_THREAD], thrdname, email); /* * Print out the index page links */ print_index_header_links(fp, THREAD_INDEX, start_date_num, end_date_num, amountmsgs, email ? email->subdir : NULL); - fprintf (fp, "\n"); - if (set_indextable) { - fprintf(fp, "
    \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); - print_all_threads(fp, -1, -1, email); - fprintf(fp, "
    %s%s %s
    \n
    \n"); - } - else { - fprintf (fp, "
    \n"); - fprintf(fp, "
      \n"); - print_all_threads(fp, -1, -1, email); - fprintf(fp, "
    \n"); - fprintf (fp, "
    "); - } + fprintf (fp, "\n"); + + if (amountmsgs > 0) { + if (set_indextable) { + fprintf(fp, "
    \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); + print_all_threads(fp, -1, -1, email); + fprintf(fp, "
    %s%s %s
    \n
    \n"); + printlaststats (fp, end_date_num); + } + else { + fprintf (fp, "
    \n"); + print_all_threads(fp, -1, -1, email); + printlaststats (fp, end_date_num); + fprintf (fp, "
    \n"); + } + } else { + /* print notice that the archive has no messages. + This can happen if all the messages in the archive + have been annotated as either spam or deleted */ + print_empty_archive_notice(fp, end_date_num); + } /* * Print out archive information links at the bottom of the index */ - printlaststats (fp, end_date_num); print_index_footer_links(fp, THREAD_INDEX, end_date_num, amountmsgs, email ? email->subdir : NULL); - + printfooter(fp, ihtmlfooterfile, set_label, set_dir, lang[MSG_BY_THREAD], thrdname, TRUE); fclose(fp); if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } free(filename); @@ -2684,7 +3262,7 @@ void printsubjects(FILE *fp, struct header *hp, char **oldsubject, const char *break_str; const char *endline; static char date_str[DATESTRLEN+40]; /* made static for smaller stack */ - static char *first_attributes = ""; + static char *first_attributes = " id=\"first\""; if (hp != NULL) { printsubjects(fp, hp->left, oldsubject, year, month, subdir_email); @@ -2710,37 +3288,38 @@ void printsubjects(FILE *fp, struct header *hp, char **oldsubject, else { bool is_first; if (*oldsubject && *oldsubject[0] != '\0') { /* close the previous open list */ - fprintf(fp, "\n"); + fprintf(fp, "\n"); is_first = FALSE; } else is_first = TRUE; - fprintf(fp, "
  • %s%s\n", + fprintf(fp, "%s\n", (is_first) ? first_attributes : "", subject); fprintf(fp, "
      \n"); } } - if(set_indextable) { + if (set_indextable) { startline = " "; break_str = ""; strcpy(date_str, getindexdatestr(hp->data->date)); endline = ""; } else { - startline = "
    • "; + startline = "
    • "; break_str = ""; - snprintf(date_str, sizeof(date_str), "(%s)", getindexdatestr(hp->data->date)); + trio_snprintf(date_str, sizeof(date_str), "(%s)", getindexdatestr(hp->data->date)); endline = "
    • "; } fprintf(fp, - "%s%s%s%s %s%s\n", startline, - msg_href(hp->data, subdir_email, TRUE), + "%s%s%s %s%s\n", + startline, + set_fragment_prefix, hp->data->msgnum, + msg_href(hp->data, subdir_email, FALSE), name, break_str, - set_fragment_prefix, hp->data->msgnum, - set_fragment_prefix, hp->data->msgnum, date_str, endline); + date_str, endline); *oldsubject = hp->data->unre_subject; - + free(subject); free(name); } @@ -2768,48 +3347,55 @@ void writesubjects(int amountmsgs, struct emailinfo *email) else newfile = 1; - if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); - progerr(errmsg); + if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); + progerr(errmsg); } if (set_showprogress) printf("%s \"%s\"...", lang[MSG_WRITING_SUBJECT_INDEX], filename); - print_index_header(fp, set_label, set_dir, lang[MSG_BY_SUBJECT], subjname); + print_index_header(fp, set_label, set_dir, lang[MSG_BY_SUBJECT], subjname, email); + + /* + * Print out the index page links + */ + print_index_header_links(fp, SUBJECT_INDEX, start_date_num, end_date_num, + amountmsgs, email ? email->subdir : NULL); + + fprintf (fp, "\n"); - /* - * Print out the index page links - */ - print_index_header_links(fp, SUBJECT_INDEX, start_date_num, end_date_num, - amountmsgs, email ? email->subdir : NULL); - fprintf (fp, "\n"); - - if (set_indextable) { - fprintf(fp, "
      \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); - } - else { - fprintf (fp, "
      \n"); - fprintf(fp, "
        \n"); - } - { - char *oldsubject = ""; /* dummy to start with */ - printsubjects(fp, subjectlist, &oldsubject, -1, -1, email); - } - if (set_indextable) { - fprintf(fp, "
      %s%s %s
      \n
      \n"); - } - else { - fprintf(fp, "
  • \n"); - fprintf(fp, "\n"); - fprintf (fp, ""); + if (amountmsgs > 0) { + if (set_indextable) { + fprintf(fp, "
    \n\n\n", lang[MSG_CSUBJECT], lang[MSG_CAUTHOR], lang[MSG_CDATE]); + } + else { + fprintf (fp, "
    \n"); + } + { + char *oldsubject = ""; /* dummy to start with */ + printsubjects(fp, subjectlist, &oldsubject, -1, -1, email); + } + if (set_indextable) { + fprintf(fp, "
    %s%s %s
    \n
    \n"); + printlaststats (fp, end_date_num); + } + else { + fprintf(fp, "\n"); + printlaststats (fp, end_date_num); + fprintf (fp, "\n"); + } + } else { + /* print notice that the archive has no messages. + This can happen if all the messages in the archive + have been annotated as either spam or deleted */ + print_empty_archive_notice(fp, end_date_num); } /* * Print out archive information links at the bottom of the index */ - printlaststats (fp, end_date_num); print_index_footer_links(fp, SUBJECT_INDEX, end_date_num, amountmsgs, email ? email->subdir : NULL); @@ -2818,7 +3404,7 @@ void writesubjects(int amountmsgs, struct emailinfo *email) fclose(fp); if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } free(filename); @@ -2839,7 +3425,7 @@ void printauthors(FILE *fp, struct header *hp, char **oldname, const char *break_str; const char *endline; static char date_str[DATESTRLEN+40]; /* made static for smaller stack */ - static char *first_attributes = ""; + static char *first_attributes = " id=\"first\""; if (hp != NULL) { printauthors(fp, hp->left, oldname, year, month, subdir_email); @@ -2865,13 +3451,13 @@ void printauthors(FILE *fp, struct header *hp, char **oldname, bool is_first; if (*oldname && *oldname[0] != '\0') { /* close the previous open list */ - fprintf(fp, "\n"); + fprintf(fp, "\n"); is_first = FALSE; } else is_first = TRUE; - fprintf(fp, "
  • %s%s\n", + fprintf(fp, "%s\n", (is_first) ? first_attributes : "", tmpname); fprintf(fp, "
      \n"); @@ -2884,14 +3470,16 @@ void printauthors(FILE *fp, struct header *hp, char **oldname, endline = ""; } else { - startline = "
    • "; - break_str = " "; - snprintf(date_str, sizeof(date_str), "(%s)", getindexdatestr(hp->data->date)); + startline = "
    • "; + break_str = ""; + trio_snprintf(date_str, sizeof(date_str), "(%s)", getindexdatestr(hp->data->date)); endline = "
    • "; } - fprintf(fp,"%s%s%s%s%s%s\n", - startline, msg_href(hp->data, subdir_email, TRUE), subj, break_str, - set_fragment_prefix, hp->data->msgnum, set_fragment_prefix, hp->data->msgnum, + + fprintf(fp,"%s%s%s %s%s\n", + startline, + set_fragment_prefix, hp->data->msgnum, + msg_href(hp->data, subdir_email, FALSE), subj, break_str, date_str, endline); if(subj) free(subj); @@ -2924,41 +3512,49 @@ void writeauthors(int amountmsgs, struct emailinfo *email) else newfile = 1; - if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); - progerr(errmsg); + if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); + progerr(errmsg); } if (set_showprogress) printf("%s \"%s\"...", lang[MSG_WRITING_AUTHOR_INDEX], filename); - print_index_header(fp, set_label, set_dir, lang[MSG_BY_AUTHOR], authname); + print_index_header(fp, set_label, set_dir, lang[MSG_BY_AUTHOR], authname, email); /* * Print out the index page links */ print_index_header_links(fp, AUTHOR_INDEX, start_date_num, end_date_num, amountmsgs, email ? email->subdir : NULL); - fprintf (fp, "\n"); - if (set_indextable) { - fprintf(fp, "
      \n\n\n", lang[MSG_CAUTHOR], lang[MSG_CSUBJECT], lang[MSG_CDATE]); - } - else { - fprintf(fp, "
      \n"); - fprintf(fp, "
        \n"); - } - { - char *prevauthor = ""; - printauthors(fp, authorlist, &prevauthor, -1, -1, email); - } - if (set_indextable) { - fprintf(fp, "
      %s%s %s
      \n
      \n"); - } - else { - fprintf(fp, "
  • \n"); - fprintf(fp, "\n"); - fprintf(fp, "\n"); + fprintf (fp, "\n"); + + if (amountmsgs > 0) { + if (set_indextable) { + fprintf(fp, "
    \n\n\n", lang[MSG_CAUTHOR], lang[MSG_CSUBJECT], lang[MSG_CDATE]); + } + else { + fprintf(fp, "
    \n"); + } + { + char *prevauthor = ""; + printauthors(fp, authorlist, &prevauthor, -1, -1, email); + } + if (set_indextable) { + fprintf(fp, "
    %s%s %s
    \n
    \n"); + printlaststats (fp, end_date_num); + } + else { + fprintf(fp, "\n"); + printlaststats (fp, end_date_num); + fprintf(fp, "\n"); + } + } else { + /* print notice that the archive has no messages. + This can happen if all the messages in the archive + have been annotated as either spam or deleted */ + print_empty_archive_notice(fp, end_date_num); } /* @@ -2966,7 +3562,6 @@ void writeauthors(int amountmsgs, struct emailinfo *email) * of the index page */ - printlaststats (fp, end_date_num); print_index_footer_links(fp, AUTHOR_INDEX, end_date_num, amountmsgs, email ? email->subdir : NULL); @@ -2975,7 +3570,7 @@ void writeauthors(int amountmsgs, struct emailinfo *email) fclose(fp); if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } free(filename); @@ -2987,7 +3582,7 @@ void writeauthors(int amountmsgs, struct emailinfo *email) /* ** Pretty-prints the items for the haof */ -void printhaofitems(FILE *fp, struct header *hp, int year, int month, struct emailinfo *subdir_email) +static void printhaofitems(FILE *fp, struct header *hp, int year, int month, struct emailinfo *subdir_email) { char *subj, *from_name, *from_emailaddr; @@ -3035,9 +3630,9 @@ void writehaof(int amountmsgs, struct emailinfo *email) else newfile = 1; - if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? */ - snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); - progerr(errmsg); + if ((fp = fopen(filename, "w")) == NULL) { /* AUDIT biege: where? 
*/ + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\".", lang[MSG_COULD_NOT_WRITE], filename); + progerr(errmsg); } if (set_showprogress) @@ -3050,7 +3645,7 @@ void writehaof(int amountmsgs, struct emailinfo *email) #endif fprintf(fp, " \n\n"); fprintf(fp, " \n\n"); - fprintf(fp, " \n\n"); + fprintf(fp, " \n\n"); print_haof_indices(fp, email ? email->subdir : NULL); @@ -3062,7 +3657,7 @@ void writehaof(int amountmsgs, struct emailinfo *email) fclose(fp); if (newfile && chmod(filename, set_filemode) == -1) { - snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); + trio_snprintf(errmsg, sizeof(errmsg), "%s \"%s\": %o.", lang[MSG_CANNOT_CHMOD], filename, set_filemode); progerr(errmsg); } free(filename); @@ -3103,10 +3698,21 @@ static void printmonths(FILE *fp, char *summary_filename, int amountmsgs) for (j = 0; j <= AUTHOR_INDEX; ++j) save_name[j] = index_name[0][j]; - print_index_header(fp, set_label, set_dir, subject, summary_filename); + print_index_header(fp, set_label, set_dir, subject, summary_filename, NULL); + fprintf(fp, "\n"); + fprintf(fp, "
    \n"); fprintf(fp, "\n"); + fprintf(fp, "\n", lang[MSG_ACCESS_MAIL_ARCHIVES_BY_CAPTION]); + fprintf(fp, "\n"); + fprintf(fp, "\n"); + fprintf(fp, "\n", lang[MSG_PERIOD]); + fprintf(fp, "\n", lang[MSG_ARTICLES]); + fprintf (fp, "\n", lang[MSG_SORTED_BY]); + fprintf(fp, "\n"); + fprintf(fp, "\n"); + for (y = first_year; y <= last_year; ++y) { - for (m = (set_monthly_index ? 0 : -1); m < (set_monthly_index ? 12 : 0); ++m) { + for (m = (set_monthly_index ? 0 : -1); m < (set_monthly_index ? 12 : 0); ++m) { char month_str[80]; char month_str_pub[80]; int started_line = 0; @@ -3138,52 +3744,63 @@ static void printmonths(FILE *fp, char *summary_filename, int amountmsgs) char subject_title[128]; if (!show_index[0][j]) continue; - snprintf(buf1, sizeof(buf1), "%sby%s", month_str, save_name[j]); + trio_snprintf(buf1, sizeof(buf1), "%sby%s", month_str, save_name[j]); filename = htmlfilename(buf1, NULL, ""); fp1 = fopen(filename, "w"); if (!fp1) { - snprintf(errmsg, sizeof(errmsg), "can't open %s", filename); + trio_snprintf(errmsg, sizeof(errmsg), "can't open %s", filename); progerr(errmsg); } - snprintf(subject_title, sizeof(subject_title), "%s %s", month_str_pub, indextypename[j]); - print_index_header(fp1, set_label, set_dir, subject_title, filename); + trio_snprintf(subject_title, sizeof(subject_title), "%s %s", month_str_pub, indextypename[j]); + print_index_header(fp1, set_label, set_dir, subject_title, filename, NULL); /* * Print out the index page links */ print_index_header_links(fp1, j, first_date, last_date, count, NULL); - + + fprintf (fp1, "\n"); if (set_indextable) { fprintf(fp1, "
    \n
    %s
    %s%s%s
    \n\n", lang[j == AUTHOR_INDEX ? MSG_CAUTHOR : MSG_CSUBJECT], lang[j == AUTHOR_INDEX ? MSG_CSUBJECT : MSG_CAUTHOR], lang[MSG_CDATE]); } else { - fprintf(fp1, "
      \n"); + fprintf (fp1, "
      \n"); } - switch (j) { - case DATE_INDEX: - { - char prev_date_str[DATESTRLEN + 40]; - prev_date_str[0] = '\0'; - printdates(fp1, datelist, y, m, NULL, prev_date_str); - if (*prev_date_str) /* close the previous date item */ - fprintf (fp, "
    \n"); - break; - } - case THREAD_INDEX: - print_all_threads(fp1, y, m, NULL); - break; - case SUBJECT_INDEX: - printsubjects(fp1, subjectlist, &prev_text, y, m, NULL); + + switch (j) + { + case DATE_INDEX: + { + char prev_date_str[DATESTRLEN + 40]; + prev_date_str[0] = '\0'; + printdates(fp1, datelist, y, m, NULL, prev_date_str); + if (!set_indextable) { + if (*prev_date_str) /* close the previous date item */ + fprintf (fp1, "\n"); + } + break; + } + case THREAD_INDEX: + print_all_threads(fp1, y, m, NULL); + break; + case SUBJECT_INDEX: + printsubjects(fp1, subjectlist, &prev_text, y, m, NULL); + if (!set_indextable) { + fprintf(fp1, "\n"); + } break; case AUTHOR_INDEX: printauthors(fp1, authorlist, &prev_text, y, m, NULL); + if (!set_indextable) { + fprintf(fp1, "\n"); + } break; - } - + } + if (set_indextable) { fprintf(fp1, "
    %s%s %s
    \n\n"); } else { - fprintf(fp1, "\n"); + fprintf(fp1, "
    \n"); } /* @@ -3194,7 +3811,7 @@ static void printmonths(FILE *fp, char *summary_filename, int amountmsgs) print_index_footer_links(fp1, j, last_date, count, NULL); printfooter(fp1, ihtmlfooterfile, set_label, set_dir, subject_title, - save_name[j], FALSE); + save_name[j], TRUE); fclose(fp1); if (!count) { remove(filename); @@ -3205,7 +3822,7 @@ static void printmonths(FILE *fp, char *summary_filename, int amountmsgs) } else { if (!started_line) { - fprintf(fp, "%s%d %s", month_str_pub, count, lang[MSG_ARTICLES]); + fprintf(fp, "%s%d %s", month_str_pub, count, lang[MSG_ARTICLES]); while (empties--) fprintf(fp, ""); started_line = 1; @@ -3219,8 +3836,10 @@ static void printmonths(FILE *fp, char *summary_filename, int amountmsgs) fprintf(fp, "\n"); } } - fprintf(fp, "\n"); - printfooter(fp, ihtmlfooterfile, set_label, set_dir, subject, summary_filename, FALSE); + fprintf(fp, "\n"); + fprintf(fp, "\n"); + fprintf (fp, "